Measuring Performance Under Failures in the LHCb Data Acquisition Network

Eloise Noelle Stein, Cristel Pelsser, Flavio Pisani, and Tommaso Colombo

Abstract

In this paper, we study two possible approaches to high-performance event building in the data acquisition (DAQ) system of the LHCb experiment. We show, using live experiments, that a synchronized design that carefully schedules network communications to avoid congestion can achieve significantly better performance than a looser approach. However, this comes at the price of fault tolerance: we study the performance degradation of the DAQ system in the presence of various link failures and show that, in these scenarios, the synchronized approach is not optimal. Finally, we derive design recommendations to help synchronized designs cope with network failures.

Publication Details

Publication Type
poster
Publication Date
April 2024
Published In
24th IEEE Real Time Conference - ICISE
Location
Quy Nhon, Vietnam

Suggested citation

Eloise Noelle Stein, Cristel Pelsser, Flavio Pisani, and Tommaso Colombo. 2024. Measuring Performance Under Failures in the LHCb Data Acquisition Network. In 24th IEEE Real Time Conference - ICISE. Quy Nhon, Vietnam.

BibTeX Citation

@poster{Stein2024,
	title        = {Measuring Performance Under Failures in the LHCb Data Acquisition Network},
	author       = {Eloise Noelle Stein and Cristel Pelsser and Flavio Pisani and Tommaso Colombo},
	year         = 2024,
	month        = apr,
	day          = 23,
	booktitle    = {24th IEEE Real Time Conference - ICISE},
	address      = {Quy Nhon, Vietnam},
	url          = {https://indico.cern.ch/event/940112/contributions/5765003/},
	abstract     = {In this paper, we study two possible approaches to high-performance event building in the data acquisition (DAQ) system of the LHCb experiment. We show, using live experiments, that a synchronized design that carefully schedules network communications to avoid congestion can achieve significantly better performance than a looser approach. However, this comes at the price of fault tolerance: we study the performance degradation of the DAQ system in the presence of various link failures and show that, in these scenarios, the synchronized approach is not optimal. Finally, we derive design recommendations to help synchronized designs cope with network failures.},
	affiliation  = {Universite de Strasbourg (FR)},
	coauthors    = {Cristel Pelsser (UCLouvain and University of Strasbourg), Flavio Pisani (CERN), Tommaso Colombo (CERN)},
	duration     = {1h},
	groups       = {Posters},
	keywords     = {Data Acquisition, Network Failures, Performance Measurement, LHCb},
	session      = {Data Acquisition and Trigger Architectures Poster A},
	speaker      = {Eloise Noelle Stein},
	time         = {11:55 AM},
	type         = {Mini Oral and Poster}
}

Related publications