# Availability

## Definition

Availability is the proportion of time a system remains operational and accessible to users. It is typically expressed as a percentage of uptime over a defined period (e.g., monthly or yearly).

The DevOps Maturity Model explicitly lists Availability as one of the key metrics for measuring DevOps maturity.
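As an illustration of the percentage definition above, here is a minimal Python sketch that turns a list of outage durations into an availability figure for one measurement period. The function name `availability_pct` is hypothetical. It assumes the outages do not overlap and lie entirely inside the period, so their durations can simply be added.

```python
from datetime import timedelta


def availability_pct(period: timedelta, outages: list[timedelta]) -> float:
    """Return uptime as a percentage of `period` (hypothetical helper).

    Assumes the outages are non-overlapping and lie inside the period,
    so summing their durations gives the total downtime.
    """
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    downtime = sum(outages, timedelta(0))
    if downtime > period:
        raise ValueError("total downtime cannot exceed the period")
    # timedelta / timedelta yields a float ratio.
    return 100 * (1 - downtime / period)


# Example: a 30-day month with two outages totalling 43.2 minutes.
month = timedelta(days=30)
outages = [timedelta(minutes=30), timedelta(minutes=13.2)]
print(f"{availability_pct(month, outages):.3f}%")
```

With these example numbers the month comes out at 99.900%: 43.2 minutes is exactly the downtime a 99.9% target allows in a 30-day month.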
## Availability SLAs

Common availability targets:

| Availability | Downtime/Year | Downtime/Month | Downtime/Week |
|--------------|---------------|----------------|---------------|
| 99% | 3.65 days | 7.31 hours | 1.68 hours |
| 99.9% | 8.77 hours | 43.83 minutes | 10.08 minutes |
| 99.99% | 52.60 minutes | 4.38 minutes | 1.01 minutes |
| 99.999% | 5.26 minutes | 26.30 seconds | 6.05 seconds |

Figures assume a 365.25-day year, a month of one twelfth of that year (about 30.44 days), and a 7-day week.
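The downtime figures follow from one multiplication: allowed downtime = (1 - target) × period length. The minimal Python sketch below shows that arithmetic, assuming a 365.25-day year, a month of one twelfth of that year, and a 7-day week. `allowed_downtime` is an illustrative helper name, not part of any library.

```python
from datetime import timedelta

# Period lengths are assumptions: a 365.25-day year (averaging leap years),
# a month of one twelfth of that year (about 30.44 days), and a 7-day week.
YEAR = timedelta(days=365.25)
MONTH = YEAR / 12
WEEK = timedelta(weeks=1)


def allowed_downtime(target_pct: float, period: timedelta) -> timedelta:
    """Maximum downtime over `period` that still meets `target_pct` availability."""
    if not 0 <= target_pct <= 100:
        raise ValueError("target_pct must be between 0 and 100")
    # The unavailable fraction of the period is the downtime allowance.
    return period * (1 - target_pct / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    year, month, week = (allowed_downtime(target, p) for p in (YEAR, MONTH, WEEK))
    print(
        f"{target}%: {year.total_seconds() / 3600:.2f} h/year, "
        f"{month.total_seconds() / 60:.2f} min/month, "
        f"{week.total_seconds():.2f} s/week"
    )
```

The loop prints every row in hours, minutes, and seconds so the rows stay comparable. For 99.9% the yearly figure comes out at about 8.77 hours.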
## Across DevOps Maturity Levels

| Maturity | Availability Capability |
|----------|-------------------------|
| Phase 1 | Poor: reactive monitoring, siloed teams, and manual processes cause frequent outages |
| Phase 2 | Improving: essential monitoring detects issues, but manual intervention is still required |
| Phase 3 | Better: automated infrastructure reduces human errors, faster recovery |
| Phase 4 | High: continuous monitoring for early detection, root cause analysis capability |
| Phase 5 | Maximum uptime: no interruptions to customer experience, rapid data-driven decisions |
## Key Practices for High Availability

### Architecture

- Redundancy at every layer
- Load balancing
- Geographic distribution
- Graceful degradation
- Circuit breakers
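To make the circuit-breaker item concrete, here is a minimal, single-threaded Python sketch of the pattern. After a run of consecutive failures the breaker "opens" and rejects calls immediately, then lets a trial call through once a cooldown has passed. The class names, default thresholds, and the injectable `clock` parameter are illustrative assumptions, not taken from any particular library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical single-threaded circuit breaker sketch."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            # Open: fail fast instead of calling an unhealthy dependency.
            raise CircuitOpenError("circuit open; call rejected")
        # Closed, or open with the cooldown elapsed (half-open): try the call.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Trip the breaker, or re-trip it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each dependency call, e.g. `breaker.call(lambda: fetch_inventory())` where `fetch_inventory` is a hypothetical remote call. Failing fast keeps a struggling dependency from tying up the caller's own resources, which supports graceful degradation.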
### Operations

- Continuous monitoring
- Automated failover
- Disaster recovery planning
- Regular maintenance windows
- Capacity planning
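Continuous monitoring and automated failover work together: repeated health checks decide when traffic should move to a standby. The Python sketch below is a hypothetical, single-threaded illustration. It assumes a fixed list of interchangeable endpoints and a caller-supplied `is_healthy` probe, and the class name and the three-failure default are illustrative choices.

```python
from typing import Callable, Sequence


class FailoverSelector:
    """Pick the active endpoint, failing over after repeated health-check failures.

    Hypothetical sketch: endpoints are plain strings, and `is_healthy` stands in
    for whatever probe (HTTP health endpoint, TCP connect, ...) is actually used.
    """

    def __init__(self, endpoints: Sequence[str], is_healthy: Callable[[str], bool],
                 failures_before_failover: int = 3) -> None:
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self.endpoints = list(endpoints)
        self.is_healthy = is_healthy
        self.failures_before_failover = failures_before_failover
        self.active_index = 0
        self.consecutive_failures = 0

    @property
    def active(self) -> str:
        return self.endpoints[self.active_index]

    def check(self) -> str:
        """Run one health check on the active endpoint; return the endpoint to use."""
        if self.is_healthy(self.active):
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failures_before_failover:
                # Requiring several misses avoids flapping on a single lost probe.
                self.active_index = (self.active_index + 1) % len(self.endpoints)
                self.consecutive_failures = 0
        return self.active
```

Calling `check()` on a timer combines both practices in one loop. A real deployment would also need alerting and a policy for failing back, which this sketch omits.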
### Development

- Robust error handling
- Idempotent operations
- Transaction management
- Feature flags for rapid rollback
- Chaos engineering
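Idempotent operations matter for availability because clients can safely retry a request after a timeout or failover without repeating its side effect. A minimal Python sketch of the common idempotency-key technique follows. `PaymentService`, its in-memory key store, and the `ch_` id format are all hypothetical; a real service would persist keys durably and expire them.

```python
import uuid
from typing import Dict, Tuple


class PaymentService:
    """Hypothetical service whose `charge` operation is idempotent per request key."""

    def __init__(self) -> None:
        # idempotency key -> ((account, amount_cents), charge id)
        self._seen: Dict[str, Tuple[Tuple[str, int], str]] = {}
        self.charges_made = 0  # counts real side effects, for illustration

    def charge(self, idempotency_key: str, account: str, amount_cents: int) -> str:
        request = (account, amount_cents)
        if idempotency_key in self._seen:
            seen_request, charge_id = self._seen[idempotency_key]
            if seen_request != request:
                raise ValueError("idempotency key reused with different parameters")
            # A retry of the same request: return the original result, charge nothing.
            return charge_id
        charge_id = f"ch_{uuid.uuid4().hex[:12]}"
        self.charges_made += 1  # stand-in for actually moving money
        self._seen[idempotency_key] = (request, charge_id)
        return charge_id


svc = PaymentService()
first = svc.charge("req-123", "acct-42", 1999)
retry = svc.charge("req-123", "acct-42", 1999)  # e.g. the client timed out and retried
print(first == retry, svc.charges_made)
```

Rejecting a reused key with different parameters catches client bugs instead of silently returning an unrelated result.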
## Relationship with Other Metrics

| Metric | Relationship with Availability |
|--------|--------------------------------|
| **MTTD** (mean time to detect) | Faster detection = shorter outage = higher availability |
| **MTTR** (mean time to recovery) | Faster recovery = shorter outage = higher availability |
| **Error Budget** | Availability target defines the error budget |
| **Change Failure Rate** | Fewer failed deployments = fewer outages = higher availability |
| **Scalability** | Better scalability prevents availability degradation under load |
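Because the availability target defines the error budget, budget consumption can be tracked with simple arithmetic. The Python sketch below uses a hypothetical helper, `budget_status`, and assumes the outages are non-overlapping and already measured. It reports how much of the period's budget remains and what fraction has been spent.

```python
from datetime import timedelta


def budget_status(target_pct: float, period: timedelta,
                  outages: list[timedelta]) -> tuple[timedelta, float]:
    """Return (remaining error budget, fraction of budget consumed).

    The error budget is the downtime the availability target permits over
    `period`; a negative remaining budget means the target has been missed.
    """
    if not 0 <= target_pct < 100:
        raise ValueError("target_pct must be in [0, 100)")  # 100% leaves no budget
    budget = period * (1 - target_pct / 100)
    spent = sum(outages, timedelta(0))
    return budget - spent, spent / budget


remaining, consumed = budget_status(99.9, timedelta(days=30), [timedelta(minutes=25)])
print(f"{consumed:.0%} of the budget used, {remaining} left")
```

For a 99.9% target over 30 days the budget is 43.2 minutes, so a single 25-minute outage spends roughly 58% of it.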
## Sources

- [[sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md]]

## Related Concepts

- [[concepts/High-Availability]]
- [[concepts/MTTR]]
- [[concepts/Error-Budget]]
- [[concepts/Scalability]]
- [[concepts/Disaster-Recovery]]
- [[concepts/DevOps-Maturity]]