25 lines
1.0 KiB
Markdown
25 lines
1.0 KiB
Markdown
---
|
||
title: "Self-Healing Systems"
|
||
type: concept
|
||
tags: [automation, resilience, fault-tolerance]
|
||
sources: [How-Agentic-AI-can-help-for-Cloud-DevOps]
|
||
last_updated: 2026-04-16
|
||
---
|
||
|
||
## Summary
|
||
Self-Healing Systems(自愈系统)是指能够主动检测异常并自动修复问题的系统,无需人工干预即可恢复正常运行状态。
|
||
|
||
## Definition
|
||
具备自动检测、诊断和修复故障能力的系统,能够在问题发生时自动恢复服务。
|
||
|
||
## Key Mechanisms
|
||
- **异常检测**:持续监控关键指标,检测偏离正常模式的行为
|
||
- **自动诊断**:分析日志和指标,确定故障根本原因
|
||
- **自动修复**:执行预定义或 AI 生成的修复脚本
|
||
- **扩缩容**:根据负载自动调整资源分配
|
||
|
||
## Connections
|
||
- [[Agentic AI]] ← enables ← [[Self-Healing Systems]]:Agentic AI 实现自愈能力
|
||
- [[Kubernetes]] ← hosts ← [[Self-Healing Systems]]:K8s 提供自愈机制(Pod 重启、节点替换)
|
||
- [[混沌工程]] ← tests ← [[Self-Healing Systems]]:混沌工程验证自愈系统有效性
|