With a growing demand for quasi-instantaneous communication services such as real-time video streaming, cloud gaming, and industry 4.0 applications, multi-constraint Traffic Engineering (TE) becomes increasingly important. While legacy TE management planes have proven laborious to deploy, Segment Routing (SR) drastically eases the deployment of TE paths and is thus increasingly adopted by Internet Service Providers (ISP). There is a clear need in computing and deploying Delay-Constrained Least-Cost paths (DCLC) with SR for real-time interactive services. However, most current DCLC solutions are not tailored for SR. They also often lack efficiency or guarantees. Similarly to approximation schemes, we argue that the challenge is to design an algorithm providing both performances and guarantees. However, conversely to most of these schemes, we also consider operational constraints to provide a practical, high-performance implementation. We leverage the inherent limitations of delay measurements and account for the operational constraint added by SR to design a new algorithm, best2cop, providing guarantees and performance in all cases. Best2cop outperforms a state-of-the-art algorithm on both random and real networks of up to 1000 nodes. Relying on commodity hardware with a single thread, our algorithm retrieves all non-superfluous 3-dimensional routes in only 250ms and 100ms respectively. This execution time is further reduced using multiple threads, as the design of best2cop enables a speedup almost linear in the number of cores. Finally, we extend best2cop to deal with massive scale ISP by leveraging the multi-area partitioning of these deployments. Thanks to our new topology generator specifically designed to model the realistic patterns of such massive IP networks, we show that best2cop solves DCLC-SR in approximately 1 second even for ISP having more than 100000 routers.