Update nexus: fix conflicts and sync local changes

2026-04-26 12:06:50 +08:00
parent 191797c01b
commit f09834b5a5
2443 changed files with 254323 additions and 255154 deletions
--- a/wiki/concepts/Incident-Management.md
+++ b/wiki/concepts/Incident-Management.md
@@ -1,74 +1,74 @@
---
-title: "Incident Management"
-type: concept
-tags: [itsm, operations, reliability]
-date: 2025-03-01
---
-
-## Definition
-
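-Incident Management is one of the core [[ITSM]] processes. It focuses on **restoring normal service operation quickly** and minimizing the business impact of service outages or degradation.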
-
-## Incident Lifecycle
-
-```
-┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
-│  Event  │ →  │ Detect  │ →  │ Triage  │ →  │ Resolve │ →  │ Review  │
-│ Occurs  │    │ & Alert │    │ & Prior │    │& Recover│    │ & Learn │
-└─────────┘    └─────────┘    └─────────┘    └─────────┘    └─────────┘
-```
-
-## Modern Incident Management (ITSM 2.0)
-
-In [[ITSM 2.0]], incident management is driven by [[AIOps]] and [[Self-Healing-Systems]]:
-
-### Key Capabilities
-
-| Capability | Description | Technology |
-|------|------|------|
-| Real-time Observability | Real-time observability | Metrics, Logs, Traces |
-| Automated Remediation | Automated remediation | AIOps, Runbooks |
-| Dynamic Prioritization | Dynamic prioritization | ML Models |
-| Auto-escalation | Automatic escalation | Alert Routing |
-| Self-Healing | Self-healing | Automated Recovery |
-
-### AIOps-Powered Incident Response
-
-```
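-Monitoring & Detection → Intelligent Classification → Automatic Routing → Automated Remediation → SLA Monitoring
-           ↓                          ↓                       ↓                     ↓                    ↓
-         AIOps                    ML models          Skill-based routing        Runbooks         Alert escalation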
-```
-
-## Key Metrics
-
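-| Metric | Description |
-|------|------|
-| [[MTTR]] | Mean Time to Recovery: average time to restore service after an incident |
-| [[MTTD]] | Mean Time to Detect: average time to detect an incident |
-| MTTA | Mean Time to Acknowledge: average time to acknowledge an alert |
-| Change Failure Rate | Percentage of changes that lead to a failure |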
-
-## Priority Levels
-
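-| Priority | Description | SLA |
-|--------|------|-----|
-| P1/Critical | Core service unavailable | 15 minutes |
-| P2/High | Major functionality unavailable | 1 hour |
-| P3/Medium | Minor functionality affected | 4 hours |
-| P4/Low | Minor impact | 24 hours |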
-
-## Related Concepts
-
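- [[ITSM]]: parent framework
- [[Problem-Management]]: problem management
- [[AIOps]]: AI-driven operations capabilities
- [[Self-Healing-Systems]]: self-healing systems
- [[MTTR]]: mean time to recovery
- [[MTTD]]: mean time to detect
- [[Event-Correlation]]: event correlation
- [[Root-Cause-Analysis]]: root cause analysis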
-
-## Sources
-
- [[understanding-complete-itsm]] — AIOps-driven Incident Management
+---
+title: "Incident Management"
+type: concept
+tags: [itsm, operations, reliability]
+date: 2025-03-01
+---
+
+## Definition
+
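+Incident Management is one of the core [[ITSM]] processes. It focuses on **restoring normal service operation quickly** and minimizing the business impact of service outages or degradation.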
+
+## Incident Lifecycle
+
+```
+┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
+│  Event  │ →  │ Detect  │ →  │ Triage  │ →  │ Resolve │ →  │ Review  │
+│ Occurs  │    │ & Alert │    │ & Prior │    │& Recover│    │ & Learn │
+└─────────┘    └─────────┘    └─────────┘    └─────────┘    └─────────┘
+```
+
+## Modern Incident Management (ITSM 2.0)
+
+In [[ITSM 2.0]], incident management is driven by [[AIOps]] and [[Self-Healing-Systems]]:
+
+### Key Capabilities
+
+| Capability | Description | Technology |
+|------|------|------|
+| Real-time Observability | Real-time observability | Metrics, Logs, Traces |
+| Automated Remediation | Automated remediation | AIOps, Runbooks |
+| Dynamic Prioritization | Dynamic prioritization | ML Models |
+| Auto-escalation | Automatic escalation | Alert Routing |
+| Self-Healing | Self-healing | Automated Recovery |
+
+### AIOps-Powered Incident Response
+
+```
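+Monitoring & Detection → Intelligent Classification → Automatic Routing → Automated Remediation → SLA Monitoring
+           ↓                          ↓                       ↓                     ↓                    ↓
+         AIOps                    ML models          Skill-based routing        Runbooks         Alert escalation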
+```
+
+## Key Metrics
+
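+| Metric | Description |
+|------|------|
+| [[MTTR]] | Mean Time to Recovery: average time to restore service after an incident |
+| [[MTTD]] | Mean Time to Detect: average time to detect an incident |
+| MTTA | Mean Time to Acknowledge: average time to acknowledge an alert |
+| Change Failure Rate | Percentage of changes that lead to a failure |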
+
+## Priority Levels
+
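+| Priority | Description | SLA |
+|--------|------|-----|
+| P1/Critical | Core service unavailable | 15 minutes |
+| P2/High | Major functionality unavailable | 1 hour |
+| P3/Medium | Minor functionality affected | 4 hours |
+| P4/Low | Minor impact | 24 hours |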
+
+## Related Concepts
+
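+- [[ITSM]]: parent framework
+- [[Problem-Management]]: problem management
+- [[AIOps]]: AI-driven operations capabilities
+- [[Self-Healing-Systems]]: self-healing systems
+- [[MTTR]]: mean time to recovery
+- [[MTTD]]: mean time to detect
+- [[Event-Correlation]]: event correlation
+- [[Root-Cause-Analysis]]: root cause analysis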
+
+## Sources
+
+- [[understanding-complete-itsm]] — AIOps-driven Incident Management
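
The Priority Levels and Key Metrics tables in the page above can be sketched in code. This is only an illustration and not part of the commit: the `Incident` record, its field names, and the sample timestamps are hypothetical. The page does not say what the SLA column measures, so the sketch assumes the clock runs from detection to resolution. It also assumes MTTR is measured from occurrence to resolution; definitions vary between teams.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean

# SLA targets copied from the Priority Levels table.
SLA_TARGETS = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
    "P4": timedelta(hours=24),
}


@dataclass
class Incident:
    """Hypothetical incident record with one timestamp per lifecycle stage."""
    priority: str
    occurred: datetime
    detected: datetime
    acknowledged: datetime
    resolved: datetime


def sla_breached(incident: Incident) -> bool:
    # Assumption: the SLA clock runs from detection to resolution.
    elapsed = incident.resolved - incident.detected
    return elapsed > SLA_TARGETS[incident.priority]


def mean_minutes(incidents, start: str, end: str) -> float:
    # Average gap in minutes between two lifecycle timestamps.
    return mean(
        (getattr(i, end) - getattr(i, start)).total_seconds() / 60
        for i in incidents
    )


def key_metrics(incidents) -> dict:
    return {
        "MTTD": mean_minutes(incidents, "occurred", "detected"),
        "MTTA": mean_minutes(incidents, "detected", "acknowledged"),
        "MTTR": mean_minutes(incidents, "occurred", "resolved"),
    }


# Made-up sample data for illustration only.
T0 = datetime(2025, 3, 1, 9, 0)
INCIDENTS = [
    Incident("P1", T0, T0 + timedelta(minutes=2),
             T0 + timedelta(minutes=5), T0 + timedelta(minutes=20)),
    Incident("P3", T0, T0 + timedelta(minutes=4),
             T0 + timedelta(minutes=9), T0 + timedelta(minutes=60)),
]

for incident in INCIDENTS:
    print(incident.priority, "SLA breached:", sla_breached(incident))
print(key_metrics(INCIDENTS))
```

Keeping the targets in a single mapping means a change to the priority table touches one place, and the metric helper works for any pair of lifecycle timestamps.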