The paper "Knowledge-aware Alert Aggregation in Large-scale Cloud Systems" discusses the challenge of handling alert storms in cloud systems and proposes COLA, a novel hybrid approach to alert aggregation. The method draws on external knowledge from standard operating procedures (SOPs) to improve the efficiency and accuracy of aggregation.
Existing alert aggregation methods either overlook the causal rationale behind alerts or struggle with infrequent alerts. COLA addresses both limitations by combining correlation mining with LLM reasoning. Drawing on domain-specific knowledge, it achieves high F1-scores with efficiency comparable to existing methods.
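To make the hybrid idea concrete, here is a minimal sketch, not the authors' implementation. It assumes alerts are reduced to type names observed in time windows, uses Jaccard co-occurrence as a stand-in for correlation mining, and replaces the LLM with a hypothetical `knowledge_check` callback that answers whether two alert types share a cause. The function names and the `threshold` value are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_cooccurrence(windows):
    """Score each pair of alert types by Jaccard co-occurrence over time windows.

    `windows` is an iterable of sets of alert-type names (an assumed input shape).
    """
    single = Counter()
    pair = Counter()
    for window in windows:
        single.update(window)
        pair.update(frozenset(p) for p in combinations(sorted(window), 2))
    scores = {}
    for p, both in pair.items():
        a, b = tuple(p)
        scores[p] = both / (single[a] + single[b] - both)
    return scores


def aggregate(alerts, scores, knowledge_check, threshold=0.8):
    """Group alert types that appear to share a cause, using union-find.

    A pair is merged when its mined score is confident enough; otherwise the
    decision is deferred to `knowledge_check(a, b)`, a hypothetical stand-in
    for reasoning over SOP knowledge (for example, an LLM call).
    """
    parent = {a: a for a in alerts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(alerts, 2):
        score = scores.get(frozenset((a, b)), 0.0)
        if score >= threshold or knowledge_check(a, b):
            parent[find(a)] = find(b)

    groups = {}
    for a in alerts:
        groups.setdefault(find(a), []).append(a)
    return sorted(sorted(g) for g in groups.values())
```

In this sketch, frequent pairs are merged from statistics alone, while a rarely seen alert type with no co-occurrence history can still be grouped if the knowledge check links it to another alert. A real system would restrict the knowledge check to uncertain pairs to limit cost.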
The paper stresses that automatically aggregating alerts that share a root cause helps engineers resolve failures more efficiently. It also argues that external knowledge such as SOPs is needed for effective alert aggregation.
COLA is evaluated on three datasets from a real-world cloud platform, where it outperforms state-of-the-art methods. The authors also share their experience deploying COLA in Cloud X for the benefit of the community.
by Jinxi Kuang,... at arxiv.org, 03-12-2024
https://arxiv.org/pdf/2403.06485.pdf