Measuring Performance Under Failures in the LHCb Data Acquisition Network

Authors: Eloise Noelle Stein, Cristel Pelsser, Flavio Pisani, Tommaso Colombo

Year: 2024

Published in: 24th IEEE Real Time Conference - ICISE

Abstract: In this paper, we study two possible approaches to high-performance event building on the data acquisition (DAQ) system of the LHCb experiment. We show, using live experiments, that a synchronized design, which carefully schedules network communications to avoid congestion, can achieve significantly better performance than a looser approach. However, this comes at the price of fault tolerance: we study the performance degradation of the DAQ system in the presence of various link failures and show that, in these scenarios, the synchronized approach is not optimal. Finally, we derive design recommendations to help synchronized designs cope with network failures.
