# Availability

## Definition
Availability is the proportion of time a system remains operational and accessible to users. It is typically expressed as a percentage of uptime over a defined period (e.g., monthly or yearly).

The DevOps Maturity Model explicitly lists Availability as one of the key metrics for measuring DevOps maturity.

## Availability SLAs

Common availability targets and the maximum downtime each allows (figures assume a 365.25-day year and a month of one twelfth of a year):

| Availability | Downtime/Year | Downtime/Month | Downtime/Week |
|-------------|---------------|----------------|---------------|
| 99% | 3.65 days | 7.31 hours | 1.68 hours |
| 99.9% | 8.77 hours | 43.83 minutes | 10.08 minutes |
| 99.99% | 52.60 minutes | 4.38 minutes | 1.01 minutes |
| 99.999% | 5.26 minutes | 26.30 seconds | 6.05 seconds |
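The downtime figures in the table follow directly from the availability percentage: allowed downtime is the period length multiplied by the unavailable fraction. A minimal sketch of that arithmetic, assuming a 365.25-day average year and a month of one twelfth of that year (the function and constant names are illustrative, not from the source):

```python
from datetime import timedelta

# Assumption: average year of 365.25 days; a month is one twelfth of it.
PERIODS = {
    "year": timedelta(days=365.25),
    "month": timedelta(days=365.25 / 12),
    "week": timedelta(weeks=1),
}


def allowed_downtime(availability_pct: float, period: str) -> timedelta:
    """Return the downtime an availability target permits over a period."""
    unavailable_fraction = 1 - availability_pct / 100
    return PERIODS[period] * unavailable_fraction


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        row = ", ".join(
            f"{name}: {allowed_downtime(target, name)}" for name in PERIODS
        )
        print(f"{target}% -> {row}")
```

Using a 365-day year instead shifts the yearly and monthly values slightly, so it is worth stating the basis whenever an SLA quotes downtime in absolute terms.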

## Across DevOps Maturity Levels

| Maturity | Availability Capability |
|----------|-------------------------|
| Phase 1 | Poor: reactive monitoring, siloed teams, and manual processes cause frequent outages |
| Phase 2 | Improving: essential monitoring detects issues, but manual intervention is still required |
| Phase 3 | Better: automated infrastructure reduces human error and speeds up recovery |
| Phase 4 | High: continuous monitoring for early detection, plus root cause analysis capability |
| Phase 5 | Maximum uptime: no interruptions to the customer experience, with rapid data-driven decisions |

## Key Practices for High Availability

### Architecture
- Redundancy at every layer
- Load balancing
- Geographic distribution
- Graceful degradation
- Circuit breakers
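As an illustration of the last item, a circuit breaker stops calling a failing dependency for a while so that callers fail fast instead of piling up on a service that is already down. The sketch below is a simplified, single-threaded version; the class name, thresholds, and the injectable `clock` parameter are assumptions made for this example rather than any particular library's API:

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open
        self._clock = clock                 # injectable for testing
        self._failures = 0
        self._opened_at = None              # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            half_open = self._opened_at is not None
            if half_open or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open)
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile`, and pair the `CircuitOpenError` path with graceful degradation such as a cached or default response.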

### Operations
- Continuous monitoring
- Automated failover
- Disaster recovery planning
- Regular maintenance windows
- Capacity planning

### Development
- Robust error handling
- Idempotent operations
- Transaction management
- Feature flags for rapid rollback
- Chaos engineering
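Idempotent operations matter for availability because they make retries safe: a client that times out can resend a request without risking a duplicate side effect. One common way to achieve this is an idempotency key supplied by the caller. The sketch below keeps results in memory and uses hypothetical names (`PaymentService`, `charge`); a real service would persist the keys in a durable store:

```python
class PaymentService:
    """Toy service showing idempotency keys; storage is in-memory only."""

    def __init__(self):
        self._results = {}     # idempotency key -> result of the first call
        self.charges_made = 0  # counts real side effects

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key returns the stored result without charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": self.charges_made, "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
first = service.charge("order-42", 500)
retry = service.charge("order-42", 500)  # e.g. the client retried after a timeout
```

Here `retry` is the same result as `first`, and only one charge is recorded, so automated retries and failover can replay requests safely.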

## Relationship with Other Metrics

| Metric | Relationship with Availability |
|--------|-------------------------------|
| **MTTD** (mean time to detect) | Faster detection shortens each outage, which raises availability |
| **MTTR** (mean time to recovery) | Faster recovery shortens each outage, which raises availability |
| **Error Budget** | The availability target defines the error budget: a 99.9% target leaves a 0.1% budget for downtime |
| **Change Failure Rate** | Fewer failed deployments mean fewer outages and higher availability |
| **Scalability** | Better scalability prevents availability from degrading under load |
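The relationships in the table can be made concrete with a little arithmetic. The sketch below assumes a 30-day window and treats each outage as a single duration in minutes (detection plus recovery time combined); both function names are illustrative:

```python
def error_budget_minutes(target_pct: float, window_days: int = 30) -> float:
    """Downtime budget implied by an availability target over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - target_pct / 100)


def measured_availability(outage_minutes: list[float],
                          window_days: int = 30) -> float:
    """Availability percentage given the outage durations in a window."""
    window_minutes = window_days * 24 * 60
    return 100 * (1 - sum(outage_minutes) / window_minutes)


budget = error_budget_minutes(99.9)              # minutes of allowed downtime
slow = measured_availability([40.0, 35.0])       # two long outages
fast = measured_availability([10.0, 8.0])        # same incidents, faster MTTD/MTTR
fewer = measured_availability([40.0])            # one incident avoided
within_budget = sum([10.0, 8.0]) <= budget
```

With these inputs, shortening each outage or avoiding one outright both raise the measured availability, and comparing total outage minutes against `budget` shows whether the target is still being met.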

## Sources
- [[sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md]]

## Related Concepts
- [[concepts/High-Availability]]
- [[concepts/MTTR]]
- [[concepts/Error-Budget]]
- [[concepts/Scalability]]
- [[concepts/Disaster-Recovery]]
- [[concepts/DevOps-Maturity]]