title: Event Correlation
type: concept
tags: aiops, monitoring, incident-management, operations
date: 2025-03-01

Definition

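Event Correlation is one of the core techniques of AIOps. It uses algorithms to group large numbers of scattered monitoring alerts and system events into a small number of meaningful event groups, which reduces alert noise and speeds up Incident-Management and Root-Cause-Analysis.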

The Problem

Without Event Correlation:
─────────────────────────────
Alert #1: CPU High on Server A
Alert #2: Memory High on Server A
Alert #3: Disk I/O High on Server A
Alert #4: Network Latency on Server A
Alert #5: App Response Slow
Alert #6: Database Connection Pool Full
Alert #7: API Timeout
... (100+ alerts for ONE root cause)

Event Correlation Techniques

1. Rule-Based Correlation

IF alerts occur within time window T
AND involve same source/host/service
THEN group as single incident
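The rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Alert` class, the `correlate_by_rule` function, the host names, and the 300-second window are all hypothetical, and "same source" is simplified to "same host".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    host: str
    message: str


def correlate_by_rule(alerts: List[Alert], window_seconds: float) -> List[List[Alert]]:
    """Group alerts from the same host that arrive within window_seconds
    of the first alert in that host's current group."""
    incidents: List[List[Alert]] = []
    open_incident: Dict[str, List[Alert]] = {}  # host -> most recent group

    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_incident.get(alert.host)
        if current is not None and alert.timestamp - current[0].timestamp <= window_seconds:
            current.append(alert)  # same host, inside the window: same incident
        else:
            current = [alert]  # otherwise start a new incident
            open_incident[alert.host] = current
            incidents.append(current)
    return incidents


# Hypothetical sample data
alerts = [
    Alert(0.0, "server-a", "CPU High"),
    Alert(20.0, "server-a", "Memory High"),
    Alert(45.0, "server-a", "Disk I/O High"),
    Alert(30.0, "server-b", "Network Latency"),
    Alert(900.0, "server-a", "CPU High"),
]
incidents = correlate_by_rule(alerts, window_seconds=300)
for incident in incidents:
    print(incident[0].host, [a.message for a in incident])
```

The window here is measured from the first alert of each group. A sliding window measured from the most recent alert is an equally plausible reading of the rule and would keep long-running bursts in one incident.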

2. Statistical Correlation

  • Time series analysis
  • Pattern matching
  • Anomaly detection
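As one possible illustration of the time series approach, the sketch below computes the Pearson correlation between per-minute alert counts from different sources and treats strongly correlated series as candidates for the same incident. The function names, the sample counts, and the 0.8 threshold are assumptions made for this example only.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def are_correlated(xs: Sequence[float], ys: Sequence[float], threshold: float = 0.8) -> bool:
    """True when the two series rise and fall together strongly enough."""
    return pearson(xs, ys) >= threshold


# Hypothetical per-minute alert counts
cpu_alerts = [0, 1, 4, 9, 7, 2, 0, 0]
db_pool_alerts = [0, 0, 3, 8, 8, 3, 1, 0]
disk_alerts = [2, 0, 1, 0, 2, 1, 0, 2]

print(are_correlated(cpu_alerts, db_pool_alerts))
print(are_correlated(cpu_alerts, disk_alerts))
```

Correlation between two series does not establish which one caused the other, so a result like this only suggests which alerts to group.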

3. AI/ML Correlation

  • Root cause inference
  • Causal graph models
  • Predictive correlation
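Root cause inference over a graph can be illustrated with a deliberately simple heuristic: among the components that are alerting, keep only those whose own dependencies are healthy. The sketch below assumes a hand-written dependency graph and hypothetical component names. It is not a learned causal model, which would have to estimate the graph from data.

```python
from typing import Dict, List, Set


def root_cause_candidates(depends_on: Dict[str, List[str]], alerting: Set[str]) -> Set[str]:
    """Return alerting components none of whose direct dependencies are alerting."""
    return {
        node
        for node in alerting
        if not any(dep in alerting for dep in depends_on.get(node, []))
    }


# Hypothetical dependency graph: each component lists what it depends on
depends_on = {
    "api": ["app"],
    "app": ["database"],
    "database": ["server-a"],
    "server-a": [],
}
alerting = {"api", "app", "database", "server-a"}

candidates = root_cause_candidates(depends_on, alerting)
print(candidates)
```

With every component in the chain alerting, only `server-a` remains as a candidate, because each of the others has a failing dependency that could explain its alerts.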

Benefits

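Benefit                 Description
Alert noise reduction   Cuts alert noise by 90% or more
Faster RCA              Root causes are located quickly
Lower MTTR              Less time spent on manual analysis
SLA assurance           Faster response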

In ITSM Context

Within Incident-Management in ITSM 2.0, event correlation is a key capability:

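Incident Management 2.0
├── Event Correlation (ML-enhanced)
│   ├── Alert deduplication
│   ├── Root cause inference
│   └── Correlation reasoning
├── AIOps-powered Analysis
│   ├── Anomaly detection
│   ├── Pattern recognition
│   └── Predictive analytics
└── Self-Healing Automation
    ├── Automated diagnosis
    └── Automated remediation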
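Alert deduplication, the first item under Event Correlation in the tree, is often done by giving each alert a fingerprint and collapsing repeats. The sketch below is one minimal way to do that, assuming an alert is identified by a hypothetical (host, check) pair. Real systems usually fingerprint on a richer set of labels.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def fingerprint(host: str, check: str) -> str:
    """Stable short identifier for alerts that describe the same condition."""
    return hashlib.sha256(f"{host}|{check}".encode("utf-8")).hexdigest()[:16]


def deduplicate(alerts: Iterable[Tuple[str, str]]) -> List[dict]:
    """Collapse repeated (host, check) alerts into one record with a count."""
    seen: Dict[str, dict] = {}
    for host, check in alerts:
        key = fingerprint(host, check)
        if key in seen:
            seen[key]["count"] += 1  # repeat of a known alert
        else:
            seen[key] = {"host": host, "check": check, "count": 1}
    return list(seen.values())  # first-seen order is preserved


# Hypothetical raw alert stream
raw = [
    ("server-a", "cpu_high"),
    ("server-a", "cpu_high"),
    ("server-a", "memory_high"),
    ("server-a", "cpu_high"),
]
unique = deduplicate(raw)
for record in unique:
    print(record)
```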

Sources