
# Change Failure Rate

## Definition
Change Failure Rate (CFR) is the percentage of production deployments that result in a failure, such as a service outage, degraded performance, or an incident that requires a hotfix, rollback, or patch.

Change Failure Rate is one of the four core **DORA metrics** used to measure DevOps performance.
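
The definition reduces to a simple ratio: failed deployments divided by total deployments, times 100. The sketch below is a minimal illustration of that arithmetic. The `Deployment` record and its `caused_failure` field are hypothetical names, and what counts as a "failure" is assumed to be decided upstream (for example, by linking incidents to deployments).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """One production deployment (hypothetical record shape)."""
    deploy_id: str
    caused_failure: bool  # True if it led to an incident, rollback, or hotfix


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Return CFR as a percentage; 0.0 when there are no deployments."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d.caused_failure)
    return 100.0 * failed / len(deployments)


history = [
    Deployment("d1", False),
    Deployment("d2", True),
    Deployment("d3", False),
    Deployment("d4", False),
]
print(change_failure_rate(history))  # 25.0 (1 failed out of 4)
```

Returning 0.0 for an empty history is a design choice made here for convenience; a team may prefer to report "no data" instead.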

## Why Change Failure Rate Matters

A low change failure rate indicates:
- High confidence in the deployment process
- Robust testing and quality assurance
- Effective risk management
- Mature operational practices

A high change failure rate means:
- Frequent production incidents
- Unstable deployments
- Low team confidence
- Customer impact

## Across DevOps Maturity Levels

| Maturity | Change Failure Rate Characteristic |
|----------|-----------------------------------|
| Phase 1 | High: manual processes, no automated testing, siloed teams, security checked only at release |
| Phase 2 | Improving: unit, integration, and end-to-end tests in place, but security still handled separately |
| Phase 3 | Lower: automated infrastructure, security scans integrated throughout development |
| Phase 4 | Significantly reduced: performance and load testing, immutable infrastructure, dependency vulnerability management |
| Phase 5 | 0-15% (elite): zero human intervention, decisions driven by real-time data, integrated security checks that block non-compliant code |

## Elite Performance Benchmark (DORA)
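- **Elite performers**: 0-15% change failure rate
- **High performers**: 16-30% change failure rate
- **Medium performers**: 16-30% change failure rate
- **Low performers**: 31-100% change failure rate

High and medium performers share a band in this breakdown. DORA's published thresholds have changed between annual reports, so check the report year before comparing figures.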
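
As a rough illustration, the bands above can be turned into a lookup. This is a sketch, not an official DORA tool: the function name `classify_cfr` is hypothetical, and because the list gives high and medium performers the same band, the sketch returns a combined label for it. Fractional values between the listed integer bands (for example 15.5) are assumed to fall into the next band up.

```python
def classify_cfr(cfr_percent: float) -> str:
    """Map a CFR percentage to the performance bands listed above."""
    if not 0.0 <= cfr_percent <= 100.0:
        raise ValueError("CFR must be between 0 and 100")
    if cfr_percent <= 15.0:
        return "elite"
    if cfr_percent <= 30.0:
        # High and medium share the 16-30% band in this breakdown.
        return "high/medium"
    return "low"


for value in (4.0, 22.5, 48.0):
    print(value, classify_cfr(value))
```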

## Types of Failed Changes
- Production outages
- Service degradations
- Data corruption
- Security vulnerabilities introduced
- Performance regressions
- Failed rollbacks

## How to Reduce Change Failure Rate

### Technical Practices
- Comprehensive test automation (unit, integration, E2E)
- Feature flags for gradual rollouts
- Canary deployments
- Blue-green deployments
- Automated rollback mechanisms
- Chaos engineering to find weaknesses before they cause production incidents
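
To make the canary and automated-rollback items concrete, here is a minimal sketch of a promotion gate that compares the canary's error rate against the current baseline. Everything in it is an assumption for illustration: the function name `canary_decision`, the `max_ratio` and `min_requests` thresholds, and the 0.1% absolute floor are hypothetical and would need tuning for a real service.

```python
def canary_decision(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_ratio: float = 1.5,   # assumed tolerance: canary may be 1.5x baseline
    min_requests: int = 100,  # assumed minimum sample before deciding
) -> str:
    """Return 'wait', 'promote', or 'rollback' for a canary release."""
    if canary_requests < min_requests:
        return "wait"  # not enough traffic to judge yet

    baseline_rate = (
        baseline_errors / baseline_requests if baseline_requests else 0.0
    )
    canary_rate = canary_errors / canary_requests

    # Absolute floor so a near-zero baseline does not make any single
    # canary error trigger a rollback.
    allowed = max(baseline_rate * max_ratio, 0.001)
    return "rollback" if canary_rate > allowed else "promote"


print(canary_decision(10, 10_000, 1, 50))     # wait
print(canary_decision(10, 10_000, 1, 1_000))  # promote
print(canary_decision(10, 10_000, 5, 1_000))  # rollback
```

A gate like this limits how many users a bad change reaches. Whether a rolled-back canary counts toward CFR depends on the team's definition of a failed change.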

### Process Improvements
- Code review requirements
- Security scanning in CI/CD pipeline
- Staging environment parity with production
- Small batch sizes to limit blast radius
- Dependency management and vulnerability scanning

### Cultural Factors
- Blameless post-mortems
- Learning from failures
- Psychological safety to report issues
- Shared ownership of reliability

## Relationship with Other DORA Metrics
- **Deployment Frequency**: High deployment frequency combined with a low CFR indicates elite performance
- **Lead Time**: Short lead times with a consistently low CFR indicate high performance
- **MTTR**: A lower CFR means fewer incidents to recover from, which reduces total downtime; MTTR itself measures how quickly each incident is resolved
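
Because these metrics are read together, it can help to derive them from the same deployment log. The sketch below is one possible way to do that, under loud assumptions: the `Deploy` record, its fields, and the `summarize` function are hypothetical, and restore time is measured from the deployment timestamp, which assumes the failure began at deploy time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Deploy:
    """Hypothetical deployment log entry."""
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once service is restored


def summarize(deploys: list[Deploy], window_days: int) -> dict:
    """Compute deployment frequency, CFR, and mean restore time together."""
    total = len(deploys)
    failures = [d for d in deploys if d.failed]
    # Assumption: the failure started when the change was deployed.
    restore_times = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]
    mean_restore = None
    if restore_times:
        total_seconds = sum(restore_times, timedelta()).total_seconds()
        mean_restore = total_seconds / 60 / len(restore_times)
    return {
        "deploys_per_day": total / window_days,
        "cfr_percent": 100.0 * len(failures) / total if total else 0.0,
        "mean_restore_minutes": mean_restore,
    }


start = datetime(2024, 1, 1)
deploys = [Deploy(start + timedelta(hours=12 * i)) for i in range(8)]
deploys.append(
    Deploy(start + timedelta(days=4), True,
           start + timedelta(days=4, minutes=30))
)
deploys.append(
    Deploy(start + timedelta(days=4, hours=12), True,
           start + timedelta(days=4, hours=12, minutes=90))
)

report = summarize(deploys, window_days=5)
print(report)
```

In this sample, ten deployments over five days with two failures give 2.0 deployments per day, a CFR of 20.0%, and a mean restore time of 60.0 minutes.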

## Sources
- [[sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md]]
- [[sources/cloud-devop-maturity-guideline.md]]

## Related Concepts
- [[concepts/DORA-Metrics]]
- [[concepts/Continuous-Deployment]]
- [[concepts/DevOps-Maturity]]
- [[concepts/Error-Budget]]
- [[concepts/Rollback-Rate]]