# Availability
## Definition
Availability is the proportion of time a system remains operational and accessible to users. It is typically expressed as a percentage of uptime over a defined period (e.g., monthly or yearly).
The DevOps Maturity Model explicitly lists Availability as one of the key metrics for measuring DevOps maturity.
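
The definition above can be made concrete with a short calculation. The following is a minimal sketch, assuming uptime and downtime are measured in seconds over the same period; the function name `availability_percent` is hypothetical and not taken from the source.

```python
def availability_percent(uptime_seconds: float, downtime_seconds: float) -> float:
    """Return availability as a percentage of the total observed period."""
    total = uptime_seconds + downtime_seconds
    if total <= 0:
        raise ValueError("observation period must be positive")
    return 100.0 * uptime_seconds / total


# Example: a 30-day month with 43 minutes of downtime.
month_seconds = 30 * 24 * 3600
downtime_seconds = 43 * 60
print(f"{availability_percent(month_seconds - downtime_seconds, downtime_seconds):.3f}%")
# prints 99.900%
```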
## Availability SLAs
Common availability targets:
| Availability | Downtime/Year | Downtime/Month | Downtime/Week |
|-------------|---------------|----------------|---------------|
| 99% | 3.65 days | 7.31 hours | 1.68 hours |
| 99.9% | 8.77 hours | 43.83 minutes | 10.08 minutes |
| 99.99% | 52.60 minutes | 4.38 minutes | 1.01 minutes |
| 99.999% | 5.26 minutes | 26.30 seconds | 6.05 seconds |
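
The downtime allowances in the table follow directly from the availability percentage and the length of the period. The sketch below shows the arithmetic, assuming an average year of 365.25 days and a month of one twelfth of that year; these period lengths are an assumption chosen to be consistent with the table's figures, and the names `PERIODS` and `allowed_downtime` are hypothetical.

```python
from datetime import timedelta

# Assumed period lengths: average year of 365.25 days, month = year / 12.
PERIODS = {
    "year": timedelta(days=365.25),
    "month": timedelta(days=365.25 / 12),
    "week": timedelta(days=7),
}


def allowed_downtime(availability_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime permitted within `period` for a given target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return period * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    row = {name: allowed_downtime(target, length) for name, length in PERIODS.items()}
    print(target, {name: f"{d.total_seconds() / 60:.2f} min" for name, d in row.items()})
```

Each additional "nine" divides the allowance by ten, which is why the jump from 99.9% to 99.99% usually requires automated rather than manual recovery.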
## Across DevOps Maturity Levels
| Maturity | Availability Capability |
|----------|----------------------|
| Phase 1 | Poor — reactive monitoring, siloed teams, manual processes cause frequent outages |
| Phase 2 | Improving — essential monitoring detects issues, but manual intervention required |
| Phase 3 | Better — automated infrastructure reduces human errors, faster recovery |
| Phase 4 | High — continuous monitoring for early detection, root cause analysis capability |
| Phase 5 | Max uptime — no interruptions to customer experience, rapid data-driven decisions |
## Key Practices for High Availability
### Architecture
- Redundancy at every layer
- Load balancing
- Geographic distribution
- Graceful degradation
- Circuit breakers
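
Of the practices above, the circuit breaker is the easiest to illustrate in a few lines. The following is a minimal, single-threaded sketch of the general pattern, not a production implementation: the class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions made for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of sending more load to a failing dependency.
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, so one trial call is let through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # A successful call closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller can catch `CircuitOpenError` and return a cached or reduced response, which is one way to combine a circuit breaker with graceful degradation. This sketch is not thread-safe; concurrent use would need a lock around the state changes.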
### Operations
- Continuous monitoring
- Automated failover
- Disaster recovery planning
- Regular maintenance windows
- Capacity planning
### Development
- Robust error handling
- Idempotent operations
- Transaction management
- Feature flags for rapid rollback
- Chaos engineering
## Relationship with Other Metrics
| Metric | Relationship with Availability |
|--------|-------------------------------|
| **MTTD** | Faster detection = shorter outage = higher availability |
| **MTTR** | Faster recovery = shorter outage = higher availability |
| **Error Budget** | Availability target defines the error budget |
| **Change Failure Rate** | Fewer failed deployments = fewer outages = higher availability |
| **Scalability** | Better scalability prevents availability degradation under load |
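
Two of the relationships in the table can be expressed numerically. The sketch below assumes the common convention that the error budget is the complement of the availability target, and uses the standard steady-state formula availability = MTBF / (MTBF + MTTR). MTBF (mean time between failures) is not mentioned in the source and is introduced here only for illustration, as are the function names.

```python
def error_budget_minutes(target_percent: float, period_minutes: float) -> float:
    """Downtime allowed in the period before the availability target is missed."""
    return period_minutes * (1 - target_percent / 100)


def budget_remaining(target_percent: float, period_minutes: float,
                     downtime_minutes: float) -> float:
    """Error budget left after the downtime already incurred (negative if overspent)."""
    return error_budget_minutes(target_percent, period_minutes) - downtime_minutes


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given mean time between failures and to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# A 99.9% target over a 30-day month allows 43.2 minutes of downtime.
print(f"{error_budget_minutes(99.9, 30 * 24 * 60):.1f} min")

# Halving MTTR from 1 hour to 30 minutes, with failures every 500 hours on average.
print(f"{steady_state_availability(500, 1.0):.3%}")
print(f"{steady_state_availability(500, 0.5):.3%}")
```

With the failure rate held constant, halving recovery time roughly halves the unavailability, which is the effect the MTTR row describes.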
## Sources
- [[sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md]]
## Related Concepts
- [[concepts/High-Availability]]
- [[concepts/MTTR]]
- [[concepts/Error-Budget]]
- [[concepts/Scalability]]
- [[concepts/Disaster-Recovery]]
- [[concepts/DevOps-Maturity]]