2.5 KiB
2.5 KiB
Availability
Definition
Availability is the time a system remains operational and accessible to users. It is typically expressed as a percentage of uptime over a defined period (e.g., monthly or yearly).
The DevOps Maturity Model explicitly lists Availability as one of the key metrics for measuring DevOps maturity.
Availability SLAs
Common availability targets:
| Availability | Downtime/Year | Downtime/Month | Downtime/Week |
|---|---|---|---|
| 99% | 3.65 days | 7.31 hours | 1.68 hours |
| 99.9% | 8.76 hours | 43.83 minutes | 10.08 minutes |
| 99.99% | 52.60 minutes | 4.38 minutes | 1.01 minutes |
| 99.999% | 5.26 minutes | 26.30 seconds | 6.05 seconds |
Across DevOps Maturity Levels
| Maturity | Availability Capability |
|---|---|
| Phase 1 | Poor — reactive monitoring, siloed teams, manual processes cause frequent outages |
| Phase 2 | Improving — essential monitoring detects issues, but manual intervention required |
| Phase 3 | Better — automated infrastructure reduces human errors, faster recovery |
| Phase 4 | High — continuous monitoring for early detection, root cause analysis capability |
| Phase 5 | Max uptime — no interruptions to customer experience, rapid data-driven decisions |
Key Practices for High Availability
Architecture
- Redundancy at every layer
- Load balancing
- Geographic distribution
- Graceful degradation
- Circuit breakers
Operations
- Continuous monitoring
- Automated failover
- Disaster recovery planning
- Regular maintenance windows
- Capacity planning
Development
- Robust error handling
- Idempotent operations
- Transaction management
- Feature flags for rapid rollback
- Chaos engineering
Relationship with Other Metrics
| Metric | Relationship with Availability |
|---|---|
| MTTD | Faster detection = shorter outage = higher availability |
| MTTR | Faster recovery = shorter outage = higher availability |
| Error Budget | Availability target defines the error budget |
| Change Failure Rate | Fewer failed deployments = fewer outages = higher availability |
| Scalability | Better scalability prevents availability degradation under load |