Update nexus: fix conflicts and sync local changes
@@ -1,83 +1,83 @@
# Change Failure Rate

## Definition
Change Failure Rate (CFR) is the percentage of deployments that cause a failure in production, such as a service outage, degraded performance, or an incident that requires a hotfix, rollback, or patch.

CFR is one of the four core **DORA metrics** used to measure DevOps performance.
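
As a rough illustration of the definition, CFR can be computed as failed deployments divided by total deployments, times 100. The sketch below is a minimal example, not a prescribed implementation: the `Deployment` record, its `caused_failure` flag, and the sample history are all hypothetical, and deciding which deployments count as failures is left to the team's own incident data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """One production deployment (hypothetical record shape)."""
    deploy_id: str
    # True if the deployment led to an outage, degradation, hotfix, rollback, or patch.
    caused_failure: bool


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Return CFR as the percentage of deployments that caused a production failure."""
    if not deployments:
        raise ValueError("CFR is undefined when there are no deployments")
    failed = sum(1 for d in deployments if d.caused_failure)
    return 100.0 * failed / len(deployments)


# Illustrative data only: 1 failed deployment out of 4.
history = [
    Deployment("d1", False),
    Deployment("d2", True),
    Deployment("d3", False),
    Deployment("d4", False),
]
print(f"CFR: {change_failure_rate(history):.1f}%")  # prints "CFR: 25.0%"
```

Raising an error on an empty history is a design choice here: reporting 0% for a period with no deployments would make an inactive team look elite.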

## Why Change Failure Rate Matters

A low change failure rate indicates:
- High confidence in the deployment process
- Robust testing and quality assurance
- Effective risk management
- Mature operational practices

A high change failure rate means:
- Frequent production incidents
- Unstable deployments
- Low team confidence
- Customer impact

## Across DevOps Maturity Levels

| Maturity | Change Failure Rate Characteristic |
|----------|-----------------------------------|
| Phase 1 | High: manual processes, no automated testing, siloed teams, security checked only at release |
| Phase 2 | Improving: unit, integration, and end-to-end tests implemented, but security is still handled separately |
| Phase 3 | Lower: automated infrastructure, security scans integrated throughout development |
| Phase 4 | Significantly reduced: performance/load testing, immutable infrastructure, dependency vulnerability management |
| Phase 5 | 0-15% (elite): zero human intervention, real-time data-driven decisions, high-level security integration that prevents non-compliant code |

## Elite Performance Benchmark (DORA)
- **Elite performers**: 0-15% change failure rate
- **High performers**: 16-30% change failure rate
- **Medium performers**: 16-30% change failure rate
- **Low performers**: 31-100% change failure rate
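
The bands above can be turned into a simple lookup. This is a sketch under two assumptions: the function name `cfr_band` is hypothetical, and fractional values that fall between the listed integer ranges (for example 15.5%) are assigned to the next band up. Because high and medium performers are listed with the same 16-30% range, CFR alone cannot tell them apart, so the function reports both.

```python
def cfr_band(cfr_percent: float) -> str:
    """Map a CFR percentage to the performer band(s) listed above."""
    if not 0.0 <= cfr_percent <= 100.0:
        raise ValueError("CFR must be between 0 and 100 percent")
    if cfr_percent <= 15.0:
        return "elite"
    if cfr_percent <= 30.0:
        # High and medium share the 16-30% band in the list above,
        # so another metric is needed to separate them.
        return "high or medium"
    return "low"


for value in (10.0, 25.0, 45.0):
    print(f"{value:.0f}% -> {cfr_band(value)}")
```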

## Types of Failed Changes
- Production outages
- Service degradations
- Data corruption
- Security vulnerabilities introduced
- Performance regressions
- Failed rollbacks

## How to Reduce Change Failure Rate

### Technical Practices
- Comprehensive test automation (unit, integration, E2E)
- Feature flags for gradual rollouts
- Canary deployments
- Blue-green deployments
- Automated rollback mechanisms
- Chaos engineering to find weaknesses before they cause production failures

### Process Improvements
- Code review requirements
- Security scanning in CI/CD pipeline
- Staging environment parity with production
- Small batch sizes to limit blast radius
- Dependency management and vulnerability scanning

### Cultural Factors
- Blameless post-mortems
- Learning from failures
- Psychological safety to report issues
- Shared ownership of reliability

## Relationship with Other DORA Metrics
- **Deployment Frequency**: High deployment frequency combined with a low CFR indicates elite performance
- **Lead Time**: Short lead times combined with a stable or low CFR indicate high performance
- **MTTR**: A lower CFR means fewer incidents to recover from, which reduces the total time spent on recovery even when MTTR per incident is unchanged
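
One way to read the CFR and MTTR relationship is through simple arithmetic: with MTTR per incident held fixed, the expected time spent restoring service scales with the number of failed deployments. The sketch below assumes that each failed deployment produces one incident; the function name and all figures are illustrative, not measured data.

```python
def expected_recovery_hours(deployments: int, cfr_percent: float, mttr_hours: float) -> float:
    """Expected total hours spent restoring service over a period.

    Assumes one incident per failed deployment (a simplification).
    """
    expected_failures = deployments * cfr_percent / 100.0
    return expected_failures * mttr_hours


# Illustrative figures: same deployment count and MTTR, different CFR.
before = expected_recovery_hours(deployments=200, cfr_percent=30.0, mttr_hours=2.0)
after = expected_recovery_hours(deployments=200, cfr_percent=10.0, mttr_hours=2.0)
print(f"CFR 30%: {before:.0f} h, CFR 10%: {after:.0f} h")  # prints "CFR 30%: 120 h, CFR 10%: 40 h"
```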

## Sources
- [[sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md]]
- [[sources/cloud-devop-maturity-guideline.md]]

## Related Concepts
- [[concepts/DORA-Metrics]]
- [[concepts/Continuous-Deployment]]
- [[concepts/DevOps-Maturity]]
- [[concepts/Error-Budget]]
- [[concepts/Rollback-Rate]]