Change Failure Rate
Definition
Change Failure Rate (CFR) is the percentage of deployments that cause failures in production — such as service outages, degraded performance, or incidents requiring hotfixes, rollbacks, or patches.
Change Failure Rate is one of the four core DORA metrics used to measure DevOps performance.
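As a formula, the definition above amounts to failed deployments divided by total deployments, times 100. The following is a minimal sketch of that calculation. The `Deployment` record and its `caused_failure` field are hypothetical names chosen for illustration; how a deployment gets flagged as failed depends on your incident and rollback tracking.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical record; field names are illustrative assumptions.
    id: str
    caused_failure: bool  # True if it led to an outage, degradation, rollback, or hotfix


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Return CFR as the percentage of deployments that caused a production failure."""
    if not deployments:
        raise ValueError("CFR is undefined when there are no deployments")
    failed = sum(1 for d in deployments if d.caused_failure)
    return 100.0 * failed / len(deployments)


history = [
    Deployment("d1", False),
    Deployment("d2", True),
    Deployment("d3", False),
    Deployment("d4", False),
]
print(change_failure_rate(history))  # 25.0
```

Raising an error on an empty history, rather than returning 0, avoids reporting a perfect score for a period in which nothing was deployed.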
Why Change Failure Rate Matters
A low change failure rate indicates:
- High confidence in the deployment process
- Robust testing and quality assurance
- Effective risk management
- Mature operational practices
A high change failure rate means:
- Frequent production incidents
- Unstable deployments
- Low team confidence
- Customer impact
Across DevOps Maturity Levels
| Maturity | Change Failure Rate Characteristic |
|---|---|
| Phase 1 | High: manual processes, no automated testing, siloed teams, security checked only at release |
| Phase 2 | Improving: unit, integration, and end-to-end tests are in place, but security is still handled separately |
| Phase 3 | Lower: automated infrastructure, security scans integrated throughout development |
| Phase 4 | Significantly reduced: performance and load testing, immutable infrastructure, dependency vulnerability management |
| Phase 5 | 0-15% (elite): zero human intervention, decisions driven by real-time data, deeply integrated security that blocks non-compliant code |
Performance Benchmarks (DORA)
- Elite performers: 0-15% change failure rate
- High performers: 16-30% change failure rate
- Medium performers: 16-30% change failure rate (same band as high performers)
- Low performers: 31-100% change failure rate
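The bands above can be turned into a simple lookup. This is a sketch under stated assumptions: the function name `classify_cfr` is hypothetical, the high and medium bands are identical in the list above so they are reported together, and fractional values between the listed bands (for example 15.5%) are assigned to the next band up, which is a choice made here and not part of the benchmark.

```python
def classify_cfr(cfr_percent: float) -> str:
    """Map a CFR percentage to the performance band listed above."""
    if not 0 <= cfr_percent <= 100:
        raise ValueError("CFR must be between 0 and 100 percent")
    if cfr_percent <= 15:
        return "elite"
    if cfr_percent <= 30:
        # High and medium share the 16-30% band, so CFR alone cannot separate them.
        return "high/medium"
    return "low"


print(classify_cfr(25.0))  # high/medium
```

Because two tiers share a band, CFR has to be read alongside the other DORA metrics to tell high performers from medium ones.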
Types of Failed Changes
- Production outages
- Service degradations
- Data corruption
- Security vulnerabilities introduced
- Performance regressions
- Failed rollbacks
How to Reduce Change Failure Rate
Technical Practices
- Comprehensive test automation (unit, integration, E2E)
- Feature flags for gradual rollouts
- Canary deployments
- Blue-green deployments
- Automated rollback mechanisms
- Chaos engineering to find weaknesses before they cause production incidents
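Canary deployments and automated rollback are often combined into a single gate: a small share of traffic goes to the new version, and its error rate decides whether to promote or roll back. The sketch below shows one possible decision rule. The function name, the `max_ratio`, `floor`, and `min_requests` parameters, and their default values are all illustrative assumptions, not recommendations from any particular tool.

```python
def canary_decision(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_ratio: float = 2.0,    # assumed: canary may have up to 2x the baseline error rate
    floor: float = 0.001,      # assumed: always tolerate up to 0.1% errors
    min_requests: int = 100,   # assumed: minimum canary traffic before deciding
) -> str:
    """Return "wait", "promote", or "rollback" for a canary release."""
    if canary_requests < min_requests:
        return "wait"  # not enough traffic to judge the canary yet

    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests

    # The floor keeps a near-zero baseline from forcing a rollback on a single error.
    allowed = max(baseline_rate * max_ratio, floor)
    return "rollback" if canary_rate > allowed else "promote"


print(canary_decision(10, 10_000, 50, 1_000))  # rollback
```

A real gate would typically also watch latency and saturation and use a statistical comparison over a time window; the ratio check here only illustrates the promote-or-rollback shape of the decision.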
Process Improvements
- Code review requirements
- Security scanning in CI/CD pipeline
- Staging environment parity with production
- Small batch sizes to limit blast radius
- Dependency management and vulnerability scanning
Cultural Factors
- Blameless post-mortems
- Learning from failures
- Psychological safety to report issues
- Shared ownership of reliability
Relationship with Other DORA Metrics
- Deployment Frequency: High deployment frequency combined with a low CFR indicates elite performance
- Lead Time: Shorter lead times with a sustained low CFR indicate high performance
- MTTR (mean time to restore): Lower CFR means fewer incidents to recover from and less total downtime, while MTTR measures how quickly each incident is resolved
Sources
- sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md
- sources/cloud-devop-maturity-guideline.md