Fault-adaptive Scheduling for Data Acquisition Networks
Eloise Noelle Stein , Cristel Pelsser , Quentin Bramas and Tommaso Colombo
Abstract
Supporting such an all-to-all traffic matrix is challenging as it can easily lead to congestion. Scheduling patterns are designed to avoid such congestion by spreading the communications over time: time is divided into phases, and the communications are spread across these phases. However, current scheduling algorithms are not fault-tolerant. In this paper, we propose a fault-adaptive, congestion-free scheduling to support an all-to-all exchange in a fat-tree topology. Our approach consists of computing the minimum number of communication phases required to support the all-to-all exchange with the available links, and of scheduling the communications on these phases. It makes it possible to recover from failures and makes optimal use of the remaining bandwidth. We show that our scheduling approach performs better than the most common approach, linear-shift scheduling. With our approach, throughput improves by roughly 80% for as little as one link failure.
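To illustrate the idea of spreading an all-to-all exchange over phases, the following is a minimal sketch of linear-shift scheduling as it is commonly described: in phase s, node i sends to node (i + s) mod N. This is an assumption about the baseline, not code from the paper; the function names are hypothetical, and the sketch ignores the fat-tree topology, routing, and link failures.

```python
def linear_shift_schedule(num_nodes):
    """Build a linear-shift all-to-all schedule (illustrative sketch).

    Returns a list of phases. Each phase is a list of (source, destination)
    pairs in which node `src` sends to node `(src + shift) % num_nodes`.
    """
    return [
        [(src, (src + shift) % num_nodes) for src in range(num_nodes)]
        for shift in range(1, num_nodes)
    ]


def is_permutation_phase(phase, num_nodes):
    """Check that every node sends exactly once and receives exactly once."""
    sources = sorted(src for src, _ in phase)
    destinations = sorted(dst for _, dst in phase)
    expected = list(range(num_nodes))
    return sources == expected and destinations == expected


if __name__ == "__main__":
    schedule = linear_shift_schedule(4)
    for phase_number, phase in enumerate(schedule, start=1):
        print(phase_number, phase, is_permutation_phase(phase, 4))
```

With N nodes, this pattern uses N - 1 phases, and each node sends to and receives from exactly one peer per phase. Whether a phase is actually congestion-free also depends on how the flows are routed over the available links, which this sketch does not model.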
Publication Details
- Publication Type: Conference Paper
- Publication Date: October 2023
- Published In: The 48th IEEE Conference on Local Computer Networks (LCN)
- Publisher: IEEE
- Location: Daytona Beach, Florida, USA
BibTeX Citation
@inproceedings{Stein2023,
title = {Fault-adaptive Scheduling for Data Acquisition Networks},
author = {Stein, Eloise Noelle and Pelsser, Cristel and Bramas, Quentin and Colombo, Tommaso},
year = 2023,
month = oct,
booktitle = {The 48th IEEE Conference on Local Computer Networks (LCN)},
publisher = {IEEE},
address = {Daytona Beach, Florida, USA},
organization = {IEEE},
abstract = {Supporting such an all-to-all traffic matrix is challenging as it can easily lead to congestion. Scheduling patterns are designed to avoid such congestion by spreading the communications over time: time is divided into phases, and the communications are spread across these phases. However, current scheduling algorithms are not fault-tolerant. In this paper, we propose a fault-adaptive, congestion-free scheduling to support an all-to-all exchange in a fat-tree topology. Our approach consists of computing the minimum number of communication phases required to support the all-to-all exchange with the available links, and of scheduling the communications on these phases. It makes it possible to recover from failures and makes optimal use of the remaining bandwidth. We show that our scheduling approach performs better than the most common approach, linear-shift scheduling. With our approach, throughput improves by roughly 80% for as little as one link failure.},
groups = {International Conferences},
keywords = {all-to-all, fat-tree networks, integer linear programming}
}
Related publications
FORS: Fault-adaptive Optimized Routing and Scheduling for DAQ Networks
Eloise Stein, Quentin Bramas, Flavio Pisani, et al.
Computing and Software for Big Science, 2025
Measuring Performance Under Failures in the LHCb Data Acquisition Network
Eloise Noelle Stein, Cristel Pelsser, Flavio Pisani, et al.
24th IEEE Real Time Conference - ICISE, 2024
Measuring Performance Under Failures in the LHCb Data Acquisition Network
Eloise Stein, Flavio Pisani, Tommaso Colombo, et al.
IEEE Transactions on Nuclear Science, 2024
Deploying Near-Optimal Delay-Constrained Paths with Segment Routing in Massive-Scale Networks
Jean-Romain Luttringer, Thomas Alfroy, Pascal Mérindol, et al.
Computer Networks, 2022