
Intelligent monitoring and fault diagnosis for ATLAS TDAQ: a complex event processing solution

Magnoni, Luca (2012) Intelligent monitoring and fault diagnosis for ATLAS TDAQ: a complex event processing solution. PhD Thesis, Università degli studi di Ferrara.


    Effective monitoring and analysis tools are fundamental in modern IT infrastructures to gain insight into overall system behavior and to deal promptly and effectively with failures. In recent years, Complex Event Processing (CEP) technologies have emerged as effective solutions for information processing in disparate fields, from wireless sensor networks to financial analysis. This thesis proposes an innovative approach to monitoring and operating complex, distributed computing systems, with particular reference to the ATLAS Trigger and Data Acquisition (TDAQ) system currently in use at the European Organization for Nuclear Research (CERN). The result of this research, the AAL project, is currently used to provide ATLAS data acquisition operators with automated error detection and intelligent system analysis. The thesis begins by describing the TDAQ system and the controlling architecture, with a focus on the monitoring infrastructure and the expert system used for error detection and automated recovery. It then discusses the limitations of the current approach and how it can be improved to maximize ATLAS TDAQ operational efficiency. Event processing methodologies are then laid out, with a focus on CEP techniques for stream processing and pattern recognition. The open-source Esper engine, the CEP solution adopted by the project, is subsequently analyzed and discussed. Next, the AAL project is introduced as the automated and intelligent monitoring solution developed as the result of this research. AAL requirements and governing factors are listed, with a focus on how stream processing functionalities can enhance the TDAQ monitoring experience. The AAL processing model is then introduced and the architectural choices are justified. Finally, real applications to TDAQ error detection are presented. The main conclusion from this work is that CEP techniques can be successfully applied to detect error conditions and system misbehavior. Moreover, the AAL project demonstrates a real application of CEP concepts for intelligent monitoring in the demanding TDAQ scenario. The adoption of AAL by several TDAQ communities shows that automation and intelligent system analysis were not properly addressed in the previous infrastructure. The results of this thesis will benefit researchers evaluating intelligent monitoring techniques on large-scale distributed computing systems.

    Item Type: Thesis (PhD Thesis)
    Date: 30 March 2012
    Tutor: Luppi, Eleonora - Lehmann Miotto, Giovanna
    Coordinator: Ruggiero, Valeria
    Institution: Università degli studi di Ferrara
    Divisions: Dipartimento > Matematica
    Subjects: Area 01 - Scienze matematiche e informatiche > INF/01 Informatica
    Uncontrolled Keywords: event processing, cep, intelligent monitoring, aal, shifter assistant, atlas tdaq
    Deposited on: 27 Feb 2013 08:38

