FORS: Fault-adaptive Optimized Routing and Scheduling for DAQ Networks
Eloise Stein, Quentin Bramas, Flavio Pisani, Tommaso Colombo, and Cristel Pelsser
This 2025 international journal article, by Eloise Stein and 4 coauthors, was published in Computing and Software for Big Science. Topics covered include all-to-all, fat-tree networks, integer linear programming, optimal routing, fault-tolerance, and data acquisition.
Full author list: Eloise Stein, Quentin Bramas, Flavio Pisani, Tommaso Colombo, and Cristel Pelsser.
Abstract
Data acquisition (DAQ) networks, widely used in scientific research and industrial applications, are composed of numerous interconnected servers exchanging substantial data volumes produced by large scientific instruments. One traffic matrix commonly used in such networks is the all-to-all collective exchange, which demands substantial network resources, making network failures particularly challenging to mitigate. If not mitigated, network failures severely hamper the performance of the DAQ network, potentially leading to the loss of valuable experimental data. In the context of DAQ networks using a fat-tree topology, we propose FORS: a scheduling and associated routing solution to support the all-to-all collective exchange under network failures. FORS optimizes bandwidth utilization in any failure scenario, ensuring robust performance compared to the existing approaches. We propose an algorithm to solve the scheduling problem. For the routing, we design an algorithm for simple failure scenarios, along with a linear programming model to address more complex failure scenarios. We validate our proposed solution using a real-world DAQ network as a case study. Results demonstrate significant performance degradation in existing approaches and FORS’ consistent ability to achieve higher throughput across various failure scenarios.
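The all-to-all collective exchange described in the abstract can be illustrated with a minimal Python sketch. It assumes a textbook "linear shift" schedule (in phase k, node i sends to node (i + k) mod n), chosen only for illustration and not the FORS scheduling algorithm, together with a deliberately simplified two-level fat-tree model: equal-size flows, a hypothetical node-to-leaf assignment, and perfect spreading of each leaf's off-leaf traffic over its healthy uplinks. The helper names `linear_shift_schedule` and `max_uplink_load` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A phase is a list of (source, destination) flows that run concurrently.
Phase = List[Tuple[int, int]]


def linear_shift_schedule(n: int) -> List[Phase]:
    """All-to-all as n-1 phases: in phase k, node i sends to (i + k) % n.

    Textbook linear-shift schedule used purely for illustration; it is
    NOT the FORS scheduling algorithm. Each phase is a permutation, and
    every ordered pair (src, dst) with src != dst appears exactly once.
    """
    if n < 2:
        raise ValueError("need at least two nodes")
    return [[(i, (i + k) % n) for i in range(n)] for k in range(1, n)]


def max_uplink_load(phase: Phase, leaf_of: Dict[int, int],
                    healthy_uplinks: Dict[int, int]) -> float:
    """Worst per-uplink load (in flows) over all leaves for one phase.

    Simplified model (an assumption): equal-size flows, only traffic that
    leaves its source leaf uses that leaf's uplinks, and such traffic is
    spread perfectly over the leaf's healthy uplinks. Downlinks are ignored.
    """
    outgoing = Counter(leaf_of[src] for src, dst in phase
                       if leaf_of[src] != leaf_of[dst])
    return max((count / healthy_uplinks[leaf]
                for leaf, count in outgoing.items()), default=0.0)


if __name__ == "__main__":
    # Hypothetical toy fabric: 8 nodes on 2 leaves, 4 uplinks per leaf.
    leaf_of = {node: node // 4 for node in range(8)}
    schedule = linear_shift_schedule(8)
    healthy = {0: 4, 1: 4}
    degraded = {0: 3, 1: 4}  # one uplink of leaf 0 has failed
    for k, phase in enumerate(schedule, start=1):
        print(f"phase {k}: healthy={max_uplink_load(phase, leaf_of, healthy):.2f} "
              f"degraded={max_uplink_load(phase, leaf_of, degraded):.2f}")
```

In this toy setting, losing one of the four uplinks on leaf 0 raises its worst-case per-uplink load in phase 4 (where all of its nodes send off-leaf) from 1.0 to about 1.33 flows. This hints at why failures translate into throughput loss for the all-to-all exchange; FORS itself addresses this with its own scheduling and routing, which this sketch does not reproduce.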
Publication Details
- Publication Type
- Journal Article
- Publication Date
- April 2025
- Published In
- Computing and Software for Big Science
- External Link
- https://link.springer.com/journal/41781
Suggested citation
Eloise Stein, Quentin Bramas, Flavio Pisani, Tommaso Colombo, and Cristel Pelsser. 2025. FORS: Fault-adaptive Optimized Routing and Scheduling for DAQ Networks. Computing and Software for Big Science (Apr. 2025).
BibTeX Citation
@article{Stein2025,
title = {FORS: Fault-adaptive Optimized Routing and Scheduling for DAQ Networks},
author = {Stein, Eloise and Bramas, Quentin and Pisani, Flavio and Colombo, Tommaso and Pelsser, Cristel},
year = 2025,
month = apr,
journal = {Computing and Software for Big Science},
url = {https://link.springer.com/journal/41781},
abstract = {Data acquisition (DAQ) networks, widely used in scientific research and industrial applications, are composed of numerous interconnected servers exchanging substantial data volumes produced by large scientific instruments. One traffic matrix commonly used in such networks is the all-to-all collective exchange, which demands substantial network resources, making network failures particularly challenging to mitigate. If not mitigated, network failures severely hamper the performance of the DAQ network, potentially leading to the loss of valuable experimental data. In the context of DAQ networks using a fat-tree topology, we propose FORS: a scheduling and associated routing solution to support the all-to-all collective exchange under network failures. FORS optimizes bandwidth utilization in any failure scenario, ensuring robust performance compared to the existing approaches. We propose an algorithm to solve the scheduling problem. For the routing, we design an algorithm for simple failure scenarios, along with a linear programming model to address more complex failure scenarios. We validate our proposed solution using a real-world DAQ network as a case study. Results demonstrate significant performance degradation in existing approaches and FORS’ consistent ability to achieve higher throughput across various failure scenarios.},
groups = {International Journals and Magazines},
keywords = {all-to-all, fat-tree networks, integer linear programming, optimal routing, fault-tolerance, data acquisition}
}
Related publications
Fault-adaptive Scheduling for Data Acquisition Networks
Eloise Noelle Stein, Cristel Pelsser, and Quentin Bramas, et al.
The 48th IEEE Conference on Local Computer Networks (LCN), 2023
Measuring Performance Under Failures in the LHCb Data Acquisition Network
Eloise Stein, Flavio Pisani, and Tommaso Colombo, et al.
IEEE Transactions on Nuclear Science, 2024
Measuring Performance Under Failures in the LHCb Data Acquisition Network
Eloise Noelle Stein, Cristel Pelsser, and Flavio Pisani, et al.
24th IEEE Real Time Conference - ICISE, 2024
Deploying Near-Optimal Delay-Constrained Paths with Segment Routing in Massive-Scale Networks
Jean-Romain Luttringer, Thomas Alfroy, and Pascal Mérindol, et al.
Computer Networks, 2022