Update nexus: fix conflicts and sync local changes

2026-04-26 12:06:50 +08:00
parent 191797c01b
commit f09834b5a5
2443 changed files with 254323 additions and 255154 deletions
--- a/wiki/concepts/MTTR.md
+++ b/wiki/concepts/MTTR.md
@@ -1,66 +1,66 @@
-# MTTR (Mean Time to Recovery)
-
-## Definition
-MTTR (Mean Time to Recovery) is the average time required to recover from a failure — from the moment a failure is detected to the moment service is fully restored to normal operation.
-
-MTTR is one of the four core **DORA metrics** used to measure DevOps performance.
-
-## Key Components
-
-MTTR can be broken down into:
-1. **MTTD (Mean Time to Detect)** — Average time to identify a problem
-2. **MTTA (Mean Time to Acknowledge)** — Average time to acknowledge and begin addressing a problem
-3. **Mean Time to Repair/Restore** — Actual time to fix and restore service
-4. **MTTR = MTTD + MTTA + Mean Time to Repair**
-
-## Across DevOps Maturity Levels
-
-| Maturity | Detection & Recovery Capability |
-|----------|--------------------------------|
-| Phase 1 | Long MTTD and MTTR — outages reported by users (reactive), no proactive monitoring |
-| Phase 2 | Better MTTD — essential monitoring tools alert teams when issues affect users |
-| Phase 3 | Improved — security scans integrated earlier, but monitoring unchanged from Phase 2 |
-| Phase 4 | Continuous monitoring tracks system health, enabling early detection and root cause analysis |
-| Phase 5 | Max uptime — high collaboration, rapid data-driven decision-making, minimal customer interruptions |
-
-## MTTD and MTTA
-
-### MTTD (Mean Time to Detect)
-- The average time to identify that a problem has occurred
-- Lower is better — faster detection means faster recovery
-- Requires: comprehensive monitoring, alerting, and observability
-
-### MTTA (Mean Time to Acknowledge)
-- The average time from detection to someone actively working on the issue
-- Includes time to notify on-call staff, triage, and begin investigation
-- Requires: clear incident response processes and on-call coverage
-
-## Elite Performance Benchmark (DORA)
-- **Elite performers**: MTTR < 1 hour
-- Short MTTR indicates:
-  - Robust incident detection and alerting
-  - Clear incident response processes
-  - Well-practiced on-call procedures
-  - Effective automation for rollback and recovery
-  - Good observability and debugging tools
-
-## How to Reduce MTTR
-- Implement comprehensive monitoring and alerting
-- Practice chaos engineering and incident simulations
-- Automate rollback procedures
-- Use feature flags to isolate failures
-- Maintain runbooks for common failures
-- Foster blameless post-mortem culture
-- Use observability tools for faster root cause analysis
-
-## Sources
-- [[sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md]]
-- [[sources/cloud-devop-maturity-guideline.md]]
-
-## Related Concepts
-- [[concepts/DORA-Metrics]]
-- [[concepts/MTTD]]
-- [[concepts/MTTA]]
-- [[concepts/Error-Budget]]
-- [[concepts/Change-Failure-Rate]]
-- [[concepts/DevOps-Maturity]]
+# MTTR (Mean Time to Recovery)
+
+## Definition
+MTTR (Mean Time to Recovery) is the average time required to recover from a failure — from the moment a failure is detected to the moment service is fully restored to normal operation.
+
+MTTR is one of the four core **DORA metrics** used to measure DevOps performance.
+
+## Key Components
+
+MTTR can be broken down into:
+1. **MTTD (Mean Time to Detect)** — Average time to identify a problem
+2. **MTTA (Mean Time to Acknowledge)** — Average time to acknowledge and begin addressing a problem
+3. **Mean Time to Repair/Restore** — Actual time to fix and restore service
+4. **MTTR = MTTD + MTTA + Mean Time to Repair**
+
+## Across DevOps Maturity Levels
+
+| Maturity | Detection & Recovery Capability |
+|----------|--------------------------------|
+| Phase 1 | Long MTTD and MTTR — outages reported by users (reactive), no proactive monitoring |
+| Phase 2 | Better MTTD — essential monitoring tools alert teams when issues affect users |
+| Phase 3 | Improved — security scans integrated earlier, but monitoring unchanged from Phase 2 |
+| Phase 4 | Continuous monitoring tracks system health, enabling early detection and root cause analysis |
+| Phase 5 | Max uptime — high collaboration, rapid data-driven decision-making, minimal customer interruptions |
+
+## MTTD and MTTA
+
+### MTTD (Mean Time to Detect)
+- The average time to identify that a problem has occurred
+- Lower is better — faster detection means faster recovery
+- Requires: comprehensive monitoring, alerting, and observability
+
+### MTTA (Mean Time to Acknowledge)
+- The average time from detection to someone actively working on the issue
+- Includes time to notify on-call staff, triage, and begin investigation
+- Requires: clear incident response processes and on-call coverage
+
+## Elite Performance Benchmark (DORA)
+- **Elite performers**: MTTR < 1 hour
+- Short MTTR indicates:
+  - Robust incident detection and alerting
+  - Clear incident response processes
+  - Well-practiced on-call procedures
+  - Effective automation for rollback and recovery
+  - Good observability and debugging tools
+
+## How to Reduce MTTR
+- Implement comprehensive monitoring and alerting
+- Practice chaos engineering and incident simulations
+- Automate rollback procedures
+- Use feature flags to isolate failures
+- Maintain runbooks for common failures
+- Foster blameless post-mortem culture
+- Use observability tools for faster root cause analysis
+
+## Sources
+- [[sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md]]
+- [[sources/cloud-devop-maturity-guideline.md]]
+
+## Related Concepts
+- [[concepts/DORA-Metrics]]
+- [[concepts/MTTD]]
+- [[concepts/MTTA]]
+- [[concepts/Error-Budget]]
+- [[concepts/Change-Failure-Rate]]
+- [[concepts/DevOps-Maturity]]