Update nexus: fix conflicts and sync local changes

2026-04-26 12:06:50 +08:00
parent 191797c01b
commit f09834b5a5
2443 changed files with 254323 additions and 255154 deletions
--- a/wiki/concepts/Self-Healing-Systems.md
+++ b/wiki/concepts/Self-Healing-Systems.md
@@ -1,73 +1,73 @@
----
-title: "Self-Healing Systems"
-type: concept
-tags: [aiops, automation, reliability, agentic-ai]
-date: 2026-04-14
-aliases:
-  - Self-Healing
----
-
-## Definition
-
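-Self-healing systems are intelligent systems that **automatically detect anomalies, diagnose problems, and execute repair actions**, returning to a normal operating state without human intervention. This is one of the core capabilities of [[Agentic AI]] and [[AIOps]].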
-
-## How It Works
-
-```
-┌──────────────┐    ┌──────────────┐    ┌──────────────┐
-│   Anomaly    │ →  │   Diagnosis   │ →  │    Repair    │
-│   Detection   │    │    & Root    │    │   Action     │
-│              │    │   Cause       │    │              │
-└──────────────┘    └──────────────┘    └──────────────┘
-       ↓                  ↓                   ↓
-   AI/ML Model       Decision Tree       Automated Script
-   + Metrics         + Knowledge Base     + Runbooks
-                                                  ↓
-                    ┌──────────────┐    ┌──────────────┐
-                    │  Monitoring  │ ←  │ Verification │
-                    │    Close     │    │   & Report   │
-                    └──────────────┘    └──────────────┘
-```
-
-## Self-Healing Actions
-
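-| Action type | Description | Example |
-|----------|------|------|
-| Restart | Service restart | Pod restart, process restart |
-| Scale | Scale out or in | Add Pod replicas, expand resources |
-| Evict | Evict a faulty node | Kubernetes node eviction |
-| Cleanup | Resource cleanup | Clear disk space, release connection pools |
-| Rollback | Version rollback | Return to the last stable version |
-| Reroute | Traffic switching | DNS switchover, load balancer adjustment |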
-
-## In ITSM Context
-
-Within [[Incident-Management]] in [[ITSM 2.0]], self-healing is a key capability:
-
-### AIOps-Powered Self-Healing
-- Real-time observability drives rapid detection
-- ML models predict failure before it happens
-- Automated runbooks execute recovery
-- Continuous learning improves future responses
-
-### Kubernetes Self-Healing
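-[[Kubernetes]] provides native self-healing mechanisms:
-- **Liveness Probes**: automatically restart unhealthy containers
-- **Readiness Probes**: stop routing traffic to unhealthy Pods
-- **Node Failure Detection**: automatically reschedule Pods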
-
-## Related Concepts
-
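-- [[Agentic AI]]: the driver of self-healing
-- [[AIOps]]: the analytics engine for self-healing
-- [[Incident-Management]]: where self-healing is applied
-- [[Kubernetes]]: the primary platform for self-healing
-- [[Root-Cause-Analysis]]: the diagnostic step before self-healing
-- [[MTTR]]: the key metric that self-healing improves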
-
-## Sources
-
-- [[how-agentic-ai-can-help-for-cloud-devops]]: Agentic AI self-healing scenarios
-- [[understanding-complete-itsm]]: ITSM 2.0 self-healing capabilities
-- [[Agentic-AI]]: self-healing as described on the entity page
-- [[Kubernetes]]: Kubernetes self-healing mechanisms
+---
+title: "Self-Healing Systems"
+type: concept
+tags: [aiops, automation, reliability, agentic-ai]
+date: 2026-04-14
+aliases:
+  - Self-Healing
+---
+
+## Definition
+
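+Self-healing systems are intelligent systems that **automatically detect anomalies, diagnose problems, and execute repair actions**, returning to a normal operating state without human intervention. This is one of the core capabilities of [[Agentic AI]] and [[AIOps]].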
+
+## How It Works
+
+```
+┌──────────────┐    ┌──────────────┐    ┌──────────────┐
+│   Anomaly    │ →  │   Diagnosis   │ →  │    Repair    │
+│   Detection   │    │    & Root    │    │   Action     │
+│              │    │   Cause       │    │              │
+└──────────────┘    └──────────────┘    └──────────────┘
+       ↓                  ↓                   ↓
+   AI/ML Model       Decision Tree       Automated Script
+   + Metrics         + Knowledge Base     + Runbooks
+                                                  ↓
+                    ┌──────────────┐    ┌──────────────┐
+                    │  Monitoring  │ ←  │ Verification │
+                    │    Close     │    │   & Report   │
+                    └──────────────┘    └──────────────┘
+```
+
+## Self-Healing Actions
+
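+| Action type | Description | Example |
+|----------|------|------|
+| Restart | Service restart | Pod restart, process restart |
+| Scale | Scale out or in | Add Pod replicas, expand resources |
+| Evict | Evict a faulty node | Kubernetes node eviction |
+| Cleanup | Resource cleanup | Clear disk space, release connection pools |
+| Rollback | Version rollback | Return to the last stable version |
+| Reroute | Traffic switching | DNS switchover, load balancer adjustment |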
+
+## In ITSM Context
+
+Within [[Incident-Management]] in [[ITSM 2.0]], self-healing is a key capability:
+
+### AIOps-Powered Self-Healing
+- Real-time observability drives rapid detection
+- ML models predict failure before it happens
+- Automated runbooks execute recovery
+- Continuous learning improves future responses
+
+### Kubernetes Self-Healing
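+[[Kubernetes]] provides native self-healing mechanisms:
+- **Liveness Probes**: automatically restart unhealthy containers
+- **Readiness Probes**: stop routing traffic to unhealthy Pods
+- **Node Failure Detection**: automatically reschedule Pods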
+
+## Related Concepts
+
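+- [[Agentic AI]]: the driver of self-healing
+- [[AIOps]]: the analytics engine for self-healing
+- [[Incident-Management]]: where self-healing is applied
+- [[Kubernetes]]: the primary platform for self-healing
+- [[Root-Cause-Analysis]]: the diagnostic step before self-healing
+- [[MTTR]]: the key metric that self-healing improves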
+
+## Sources
+
+- [[how-agentic-ai-can-help-for-cloud-devops]]: Agentic AI self-healing scenarios
+- [[understanding-complete-itsm]]: ITSM 2.0 self-healing capabilities
+- [[Agentic-AI]]: self-healing as described on the entity page
+- [[Kubernetes]]: Kubernetes self-healing mechanisms
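
The detect, diagnose, repair, verify loop drawn in the page's "How It Works" diagram can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not part of the wiki page or of any real library: the `Metrics` fields, the 90% thresholds, the `RUNBOOKS` table, and the `heal` function are hypothetical names. Simple thresholds stand in for the AI/ML model, and the runbooks only simulate a repair rather than touching a real system.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass
class Metrics:
    """Hypothetical snapshot of one service's health signals."""
    cpu_percent: float
    disk_percent: float
    healthy: bool


def detect(m: Metrics) -> bool:
    """Anomaly detection: fixed thresholds stand in for an ML model."""
    return (not m.healthy) or m.cpu_percent > 90 or m.disk_percent > 90


def diagnose(m: Metrics) -> str:
    """Diagnosis: a tiny decision tree mapping symptoms to an action name."""
    if m.disk_percent > 90:
        return "cleanup"
    if m.cpu_percent > 90:
        return "scale"
    return "restart"


# Simulated runbooks keyed by action type (names taken from the actions table).
RUNBOOKS: Dict[str, Callable[[Metrics], Metrics]] = {
    "cleanup": lambda m: replace(m, disk_percent=40.0),
    "scale": lambda m: replace(m, cpu_percent=45.0),
    "restart": lambda m: replace(m, healthy=True),
}


def heal(m: Metrics,
         runbooks: Dict[str, Callable[[Metrics], Metrics]],
         log: List[str]) -> Metrics:
    """Run one detect -> diagnose -> repair -> verify pass and record the result."""
    if not detect(m):
        log.append("ok")
        return m
    action = diagnose(m)
    repaired = runbooks[action](m)
    # Verification: re-run detection; if the anomaly persists, flag for escalation.
    verified = not detect(repaired)
    log.append(f"{action}:{'verified' if verified else 'escalate'}")
    return repaired
```

A single pass applies only one action, so a service with two simultaneous problems is reported as `escalate` rather than silently retried. That keeps the sketch short and mirrors the "Verification & Report" step, where an unverified repair would be handed to a human or to another pass of the loop.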