Measuring Performance Under Failures in the LHCb Data Acquisition Network


In this paper, we study two possible approaches to high-performance event building in the data acquisition (DAQ) system of the LHCb experiment. Using live experiments, we show that a synchronized design, which carefully schedules network communications to avoid congestion, can achieve significantly better performance than a looser approach. However, this comes at the price of fault tolerance: we study the performance degradation of the DAQ system under various link failures and show that, in these scenarios, the synchronized approach is not optimal. Finally, we derive design recommendations to help synchronized designs cope with network failures.

24th IEEE Real Time Conference - ICISE
Cristel Pelsser
Critical embedded systems, Computer networking, Researcher, Professor

The focus of my research is on network operations, routing, Internet measurements, protocols and security.