Measuring Performance Under Failures in the LHCb Data Acquisition Network
Eloise Stein, Flavio Pisani, Tommaso Colombo, and Cristel Pelsser
This 2024 international journal article, by Eloise Stein and three coauthors, was published in IEEE Transactions on Nuclear Science. Topics covered include servers, data acquisition, the Large Hadron Collider, bandwidth, throughput, sensors, computers, event building, failure analysis, network fault tolerance, and networks.
Full author list: Eloise Stein, Flavio Pisani, Tommaso Colombo, and Cristel Pelsser.
Abstract
For the Large Hadron Collider beauty (LHCb) experiment, achieving high throughput in the data acquisition (DAQ) network is crucial for supporting scientific applications. However, failures within DAQ networks can lead to significant performance degradation. In this study, we investigate the frequency, duration, and causes of failures in the LHCb DAQ network over a two-month period to illustrate how common these events are. This insight is essential for developing strategies to optimize performance during data taking periods. We further study the performance degradation upon failure. We explore the performance for two potential approaches to high-performance event building on the DAQ network: synchronized and non-synchronized designs. We use live experiments to demonstrate that a synchronized design, which carefully schedules network communications to avoid congestion, can achieve significantly better performance when the network is used at full capacity. However, this approach comes at the expense of reduced fault tolerance compared to the non-synchronized approach. This study highlights that it is essential for the network to handle failures more efficiently to sustainably maintain high data rates.
Publication Details
- Publication Type: Journal Article
- Publication Date: August 2024
- Published In: IEEE Transactions on Nuclear Science
- Volume & Issue: Vol. PP
- Pages: 1--1
- Digital Object Identifier (DOI): 10.1109/TNS.2024.3451177
Suggested citation
Eloise Stein, Flavio Pisani, Tommaso Colombo, and Cristel Pelsser. 2024. Measuring Performance Under Failures in the LHCb Data Acquisition Network. IEEE Transactions on Nuclear Science PP (Aug. 2024), 1–1. https://doi.org/10.1109/TNS.2024.3451177
BibTeX Citation
@article{Stein2024a,
title = {Measuring Performance Under Failures in the LHCb Data Acquisition Network},
author = {Stein, Eloise and Pisani, Flavio and Colombo, Tommaso and Pelsser, Cristel},
year = 2024,
month = aug,
journal = {IEEE Transactions on Nuclear Science},
volume = {PP},
pages = {1--1},
doi = {10.1109/TNS.2024.3451177},
abstract = {For the Large Hadron Collider beauty (LHCb) experiment, achieving high throughput in the data acquisition (DAQ) network is crucial for supporting scientific applications. However, failures within DAQ networks can lead to significant performance degradation. In this study, we investigate the frequency, duration, and causes of failures in the LHCb DAQ network over a two-month period to illustrate how common these events are. This insight is essential for developing strategies to optimize performance during data taking periods. We further study the performance degradation upon failure. We explore the performance for two potential approaches to high-performance event building on the DAQ network: synchronized and non-synchronized designs. We use live experiments to demonstrate that a synchronized design, which carefully schedules network communications to avoid congestion, can achieve significantly better performance when the network is used at full capacity. However, this approach comes at the expense of reduced fault tolerance compared to the non-synchronized approach. This study highlights that it is essential for the network to handle failures more efficiently to sustainably maintain high data rates.},
groups = {International Journals and Magazines},
keywords = {Servers,Data acquisition,Large Hadron Collider,Bandwidth,Throughput,Sensors,Computers,data acquisition,event building,failure analysis,network fault tolerance,networks}
}
Related publications
FORS: Fault-adaptive Optimized Routing and Scheduling for DAQ Networks
Eloise Stein, Quentin Bramas, Flavio Pisani, et al.
Computing and Software for Big Science, 2025
The Forest Behind the Tree: Revealing Hidden Smart Home Communication Patterns
François De Keersmaeker, Rémi Van Boxem, Cristel Pelsser, et al.
Proceedings of the 33rd IEEE International Conference on Network Protocols (ICNP '25), 2025
Impact of Road Congestion on Mobile Networks
Alexandre Vogel, Dena Markudova, Andra Lutu, et al.
9th IEEE/IFIP Network Traffic Measurement and Analysis Conference (TMA 2025), 2025
Measuring Performance Under Failures in the LHCb Data Acquisition Network
Eloise Noelle Stein, Cristel Pelsser, Flavio Pisani, et al.
24th IEEE Real Time Conference - ICISE, 2024