# Availability

## Definition

Availability is the proportion of time a system remains operational and accessible to users. It is typically expressed as a percentage of uptime over a defined period (e.g., monthly or yearly).

The DevOps Maturity Model explicitly lists Availability as one of the key metrics for measuring DevOps maturity.

## Availability SLAs

Common availability targets:

| Availability | Downtime/Year | Downtime/Month | Downtime/Week |
|-------------|---------------|----------------|---------------|
| 99% | 3.65 days | 7.31 hours | 1.68 hours |
| 99.9% | 8.76 hours | 43.83 minutes | 10.08 minutes |
| 99.99% | 52.60 minutes | 4.38 minutes | 1.01 minutes |
| 99.999% | 5.26 minutes | 26.30 seconds | 6.05 seconds |

## Across DevOps Maturity Levels

| Maturity | Availability Capability |
|----------|----------------------|
| Phase 1 | Poor: reactive monitoring, siloed teams, and manual processes cause frequent outages |
| Phase 2 | Improving: essential monitoring detects issues, but manual intervention is still required |
| Phase 3 | Better: automated infrastructure reduces human error and speeds up recovery |
| Phase 4 | High: continuous monitoring enables early detection and root cause analysis |
| Phase 5 | Maximum uptime: no interruptions to customer experience, rapid data-driven decisions |

## Key Practices for High Availability

### Architecture

- Redundancy at every layer
- Load balancing
- Geographic distribution
- Graceful degradation
- Circuit breakers

### Operations

- Continuous monitoring
- Automated failover
- Disaster recovery planning
- Regular maintenance windows
- Capacity planning

### Development

- Robust error handling
- Idempotent operations
- Transaction management
- Feature flags for rapid rollback
- Chaos engineering

## Relationship with Other Metrics

| Metric | Relationship with Availability |
|--------|-------------------------------|
| **MTTD** | Faster detection = shorter outage = higher availability |
| **MTTR** | Faster recovery = shorter outage = higher availability |
| **Error Budget** | Availability target defines the error budget |
| **Change Failure Rate** | Fewer failed deployments = fewer outages = higher availability |
| **Scalability** | Better scalability prevents availability degradation under load |

## Sources

- [[sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md]]

## Related Concepts

- [[concepts/High-Availability]]
- [[concepts/MTTR]]
- [[concepts/Error-Budget]]
- [[concepts/Scalability]]
- [[concepts/Disaster-Recovery]]
- [[concepts/DevOps-Maturity]]
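
## Worked Example: Downtime and Error Budget

The downtime figures in the Availability SLAs table, and the statement that the availability target defines the error budget, both follow from one formula: allowed downtime = period length × (1 − availability). The sketch below is illustrative only. The function names are hypothetical, and it assumes an average year of 365.25 days with a month taken as one twelfth of that; a 365-day year shifts some values in the last digit.

```python
# Assumption: average year of 365.25 days; a month is one twelfth of a year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
PERIOD_SECONDS = {
    "year": SECONDS_PER_YEAR,
    "month": SECONDS_PER_YEAR / 12,
    "week": 7 * 24 * 3600,
}


def allowed_downtime_seconds(availability_pct: float, period: str) -> float:
    """Return the downtime (in seconds) permitted by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return PERIOD_SECONDS[period] * (1 - availability_pct / 100)


def measured_availability_pct(downtime_seconds: float, period: str) -> float:
    """Return the availability percentage achieved for an observed downtime."""
    return 100 * (1 - downtime_seconds / PERIOD_SECONDS[period])


def error_budget_remaining_seconds(
    target_pct: float, downtime_seconds: float, period: str
) -> float:
    """Return the unspent error budget; a negative value means the target was missed."""
    return allowed_downtime_seconds(target_pct, period) - downtime_seconds


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_seconds(target, "month") / 60
        print(f"{target}% allows about {minutes:.2f} minutes of downtime per month")

    # Hypothetical month with a single 30-minute outage against a 99.9% target.
    remaining = error_budget_remaining_seconds(99.9, 30 * 60, "month")
    print(f"Remaining error budget: {remaining / 60:.2f} minutes")
```

Under these assumptions, a 99.9% monthly target leaves roughly 43.8 minutes of budget, so a single 30-minute outage consumes most of it. This is also why shorter detection and recovery times (MTTD, MTTR) translate directly into higher availability.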