Update nexus: fix conflicts and sync local changes

This commit is contained in:
Shen Wei
2026-04-26 12:06:50 +08:00
parent 191797c01b
commit f09834b5a5
2443 changed files with 254323 additions and 255154 deletions

View File

@@ -1,69 +1,69 @@
---
title: "Predictive Maintenance"
tags:
- devops
- reliability
- ai
- operations
created: 2026-04-25
---
# Predictive Maintenance
## Definition
Predictive Maintenance 是基于历史故障模式学习,**主动建议补丁或变更**以预防非计划停机的方法。Agentic AI 分析历史运维数据,预测潜在故障并提前采取预防措施。
## Mechanism
```
Historical Data → Pattern Learning → Failure Prediction → Proactive Action
运维日志、告警历史、变更记录、监控数据
ML 模型识别故障前兆模式
- 磁盘 I/O 逐渐下降 → 预测磁盘故障 → 建议迁移
- 内存使用率周期性峰值 → 预测 OOM → 建议扩容
- API 响应时间逐步增加 → 预测容量瓶颈 → 建议扩缩容
```
## 与 Self-Healing Systems 的关系
| 维度 | Reactive (Self-Healing) | Predictive (Predictive Maintenance) |
|------|------------------------|-----------------------------------|
| 时机 | 故障发生后修复 | 故障发生前预防 |
| 目标 | 减少 MTTR | 减少 MTBF (Mean Time Between Failures) |
| 成本 | 被动投入 | 主动投入,高 ROI |
| 成熟度 | Level 4 AIOps | Level 5 AIOps |
## 示例
> Agentic AI analyzes 6 months of Kubernetes pod restart logs and identifies:
> - Pods restart every 48-72 hours
> - Pattern correlates with memory leak in v2.3.1 of service
> - **Predicts**: Next scheduled restart will cause cascade failure
> - **Proposes**: Patch to v2.3.2 + preventive restart during low-traffic window
## 与 [[AIOps]] 的关系
Predictive Maintenance 是 [[AIOps]] Level 5 (Optimizing) 的核心能力:
```python
DevOps_Maturity_AIOps = {
"Level 3 - Defined": "Smart Alerting",
"Level 4 - Advanced": "Self-Healing: Automated Remediation",
"Level 5 - Optimizing": "Predictive Maintenance ←" # ← 本页
}
```
## Related Concepts
- [[Self-Healing Systems]] — Predictive 是 Reactive 的进化
- [[AIOps]] — Predictive Maintenance 是 AIOps 的高级能力
- [[MTTR]] — Predictive 改善 MTBFMTTR 不变但故障减少
- [[Availability]] — Predictive 直接提升可用性
## Related Sources
- [[how-agentic-ai-can-help-for-cloud-devops]]
---
title: "Predictive Maintenance"
tags:
- devops
- reliability
- ai
- operations
created: 2026-04-25
---
# Predictive Maintenance
## Definition
Predictive Maintenance 是基于历史故障模式学习,**主动建议补丁或变更**以预防非计划停机的方法。Agentic AI 分析历史运维数据,预测潜在故障并提前采取预防措施。
## Mechanism
```
Historical Data → Pattern Learning → Failure Prediction → Proactive Action
运维日志、告警历史、变更记录、监控数据
ML 模型识别故障前兆模式
- 磁盘 I/O 逐渐下降 → 预测磁盘故障 → 建议迁移
- 内存使用率周期性峰值 → 预测 OOM → 建议扩容
- API 响应时间逐步增加 → 预测容量瓶颈 → 建议扩缩容
```
## 与 Self-Healing Systems 的关系
| 维度 | Reactive (Self-Healing) | Predictive (Predictive Maintenance) |
|------|------------------------|-----------------------------------|
| 时机 | 故障发生后修复 | 故障发生前预防 |
| 目标 | 减少 MTTR | 减少 MTBF (Mean Time Between Failures) |
| 成本 | 被动投入 | 主动投入,高 ROI |
| 成熟度 | Level 4 AIOps | Level 5 AIOps |
## 示例
> Agentic AI analyzes 6 months of Kubernetes pod restart logs and identifies:
> - Pods restart every 48-72 hours
> - Pattern correlates with memory leak in v2.3.1 of service
> - **Predicts**: Next scheduled restart will cause cascade failure
> - **Proposes**: Patch to v2.3.2 + preventive restart during low-traffic window
## 与 [[AIOps]] 的关系
Predictive Maintenance 是 [[AIOps]] Level 5 (Optimizing) 的核心能力:
```python
DevOps_Maturity_AIOps = {
"Level 3 - Defined": "Smart Alerting",
"Level 4 - Advanced": "Self-Healing: Automated Remediation",
"Level 5 - Optimizing": "Predictive Maintenance ←" # ← 本页
}
```
## Related Concepts
- [[Self-Healing Systems]] — Predictive 是 Reactive 的进化
- [[AIOps]] — Predictive Maintenance 是 AIOps 的高级能力
- [[MTTR]] — Predictive 改善 MTBFMTTR 不变但故障减少
- [[Availability]] — Predictive 直接提升可用性
## Related Sources
- [[how-agentic-ai-can-help-for-cloud-devops]]