3.1 KiB
3.1 KiB
Rollback Rate
Definition
Rollback Rate is the proportion of deployments that are reverted (rolled back) to a previous stable version after being deployed to production. It measures how often deployments fail to the point where reverting becomes necessary.
The DevOps Maturity Model explicitly lists Rollback Rate as one of the metrics for measuring DevOps maturity.
Why Rollback Rate Matters
A high rollback rate indicates:
- Deployment quality issues
- Insufficient testing before deployment
- Gap between staging and production environments
- Unstable or risky deployment processes
A low rollback rate indicates:
- High confidence in the deployment pipeline
- Comprehensive pre-production testing
- Stable deployment processes
Across DevOps Maturity Levels
| Maturity | Rollback Rate Characteristic |
|---|---|
| Phase 1 | High rollback rate — manual deployments, no automated testing, siloed teams, manual infrastructure |
| Phase 2 | Improving — automation reduces some risks, but manual interventions still cause rollbacks |
| Phase 3 | Lower — automated infrastructure and security scans reduce failures before deployment |
| Phase 4 | Reduced — performance testing, immutable infrastructure, dependency vulnerability management |
| Phase 5 | Minimal — zero human intervention, real-time decisions, rollback automation for fast recovery |
Relationship with Other Metrics
Rollback Rate and Change Failure Rate
- Change Failure Rate: All deployments that cause failures (regardless of rollback)
- Rollback Rate: Only deployments where the team explicitly chose to roll back
A high CFR but low Rollback Rate could mean failures were fixed without rollback. A low CFR but high Rollback Rate suggests teams are overly cautious.
Rollback Rate and MTTR
- Rollback is often a strategy for reducing MTTR
- Fast rollback mechanisms enable quick recovery
- Organizations with mature CI/CD pipelines have both low rollback rates AND fast rollback capabilities
How to Reduce Rollback Rate
Technical Strategies
- Comprehensive pre-production testing
- Feature flags for gradual rollouts
- Canary deployments (route small % of traffic to new version)
- Blue-green deployments
- Comprehensive observability to detect issues before users notice
- A/B testing in production
Process Improvements
- Small batch deployments to limit blast radius
- Strict deployment criteria (all tests green, no open severity-1 bugs)
- Deployment freeze periods for critical systems
- Change advisory board for high-risk changes
Cultural Factors
- Psychological safety to admit when a deployment is failing
- Clear criteria for when to rollback vs fix-forward
- Blameless post-mortems to learn from rollbacks
- On-call engineers empowered to make rollback decisions