# MTTA (Mean Time to Acknowledge)
## Definition
MTTA (Mean Time to Acknowledge) is the average time between an alert being triggered for a detected problem and a team member acknowledging it and starting work on a resolution. It measures the speed of human response after an alert is triggered.
MTTA is a component of MTTR, sitting between MTTD and Mean Time to Repair.
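As a minimal sketch of the definition above, MTTA can be computed by averaging the gap between each alert's trigger time and its acknowledgment time. The function name, the tuple layout, and the sample timestamps are illustrative assumptions, not the export format of any particular tool.

```python
from datetime import datetime, timedelta

def mean_time_to_acknowledge(incidents):
    """Average (acknowledged_at - triggered_at) over acknowledged incidents.

    `incidents` is a list of (triggered_at, acknowledged_at) pairs; this
    layout is a hypothetical one chosen for illustration. Incidents that
    were never acknowledged (acknowledged_at is None) are skipped.
    """
    deltas = [ack - trig for trig, ack in incidents if ack is not None]
    if not deltas:
        return None  # nothing acknowledged yet, so MTTA is undefined
    return sum(deltas, timedelta()) / len(deltas)

# Made-up sample data: acknowledgment delays of 4, 12, and 2 minutes.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 4)),
    (datetime(2024, 1, 2, 3, 30), datetime(2024, 1, 2, 3, 42)),
    (datetime(2024, 1, 3, 15, 0), datetime(2024, 1, 3, 15, 2)),
]

mtta = mean_time_to_acknowledge(incidents)
print(mtta)  # 0:06:00
```

Skipping unacknowledged incidents is a design choice of this sketch; a team may prefer to count them up to the current time so that ignored alerts are not hidden from the average.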
## Why MTTA Matters
MTTA reflects:
- On-call response effectiveness
- Alert severity and clarity
- Incident management process efficiency
- Team availability and readiness
A short MTTA means that once a problem is detected, recovery work begins promptly.
## Across DevOps Maturity Levels
| Maturity | Acknowledgment Capability |
|----------|--------------------------|
| Phase 1 | Long MTTA: unclear ownership, manual processes, reactive responses |
| Phase 2 | Improving: essential monitoring alerts the team when issues affect users, and ops staff intervene manually |
| Phase 3 | Better process: the ops team adopts automation techniques, but monitoring is unchanged |
| Phase 4 | Efficient acknowledgment: continuous monitoring with clear escalation paths, so root cause analysis starts quickly |
| Phase 5 | Rapid: high collaboration, rapid data-driven decision-making, minimal customer interruptions |
## Key Factors Affecting MTTA
### On-Call Practices
- Clear on-call rotations
- Fast escalation policies
- Adequate staffing levels
- Compensation for on-call duty
### Alert Quality
- Actionable alerts (not noise)
- Clear severity levels
- Sufficient context in alerts
- Pre-configured runbook links
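The alert-quality factors above can be sketched as a simple check on an alert record. The `Alert` fields, the severity names, and the example URL are hypothetical, chosen only to illustrate the idea; real alerting tools define their own schemas.

```python
from dataclasses import dataclass, field

# Hypothetical severity levels for this sketch.
SEVERITIES = {"critical", "high", "low"}

@dataclass
class Alert:
    title: str
    severity: str
    context: dict = field(default_factory=dict)  # e.g. host, metric, threshold
    runbook_url: str = ""

def quality_gaps(alert):
    """Return a list of reasons the alert may be slow to acknowledge."""
    gaps = []
    if alert.severity not in SEVERITIES:
        gaps.append("unknown severity")
    if not alert.context:
        gaps.append("missing context")
    if not alert.runbook_url:
        gaps.append("missing runbook link")
    return gaps

good = Alert(
    title="Checkout error rate above 5%",
    severity="critical",
    context={"service": "checkout", "error_rate": 0.07},
    runbook_url="https://example.com/runbooks/checkout-errors",  # placeholder URL
)
vague = Alert(title="Something is wrong", severity="urgent")

print(quality_gaps(good))   # []
print(quality_gaps(vague))  # ['unknown severity', 'missing context', 'missing runbook link']
```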
### Incident Response Process
- Clear ownership and accountability
- Pre-defined roles (incident commander, communications lead)
- Escalation procedures
- Communication channels
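An escalation procedure can be pictured as a list of time thresholds: the longer an alert stays unacknowledged, the more responders are paged. The policy below is a hypothetical example, with made-up roles and delays, and does not mirror any specific tool's configuration.

```python
# Hypothetical policy: (minutes without acknowledgment, who gets paged).
ESCALATION_POLICY = [
    (0, "primary on-call"),
    (5, "secondary on-call"),
    (15, "engineering manager"),
]

def paged_so_far(minutes_unacknowledged, policy=ESCALATION_POLICY):
    """Return everyone who should have been paged after the given delay."""
    return [who for after, who in policy if minutes_unacknowledged >= after]

print(paged_so_far(0))   # ['primary on-call']
print(paged_so_far(7))   # ['primary on-call', 'secondary on-call']
print(paged_so_far(20))  # ['primary on-call', 'secondary on-call', 'engineering manager']
```

Shorter thresholds cap the worst-case acknowledgment time, at the cost of paging more people for alerts the primary responder would have handled.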
## MTTA as Part of MTTR
```
MTTR = MTTD + MTTA + Mean Time to Repair
```
All three components must be optimized for minimal MTTR. Even with perfect MTTD (instant detection), a long MTTA will result in poor overall recovery times.
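A short worked example, using made-up numbers in minutes, shows how the formula above behaves when detection is instant but acknowledgment is slow. The function name and all values are illustrative assumptions.

```python
def mttr_minutes(mttd, mtta, repair):
    """MTTR = MTTD + MTTA + Mean Time to Repair, all in minutes."""
    return mttd + mtta + repair

# Instant detection, but the alert sits unacknowledged for 45 minutes.
fast_detect_slow_ack = mttr_minutes(mttd=0, mtta=45, repair=30)

# Slower detection, but the alert is acknowledged within 5 minutes.
balanced = mttr_minutes(mttd=5, mtta=5, repair=30)

print(fast_detect_slow_ack)  # 75
print(balanced)              # 40
```

With the same repair time, the scenario with perfect detection still ends up with nearly double the MTTR because of its long MTTA.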
## How to Improve MTTA
- Adopt an incident management tool such as PagerDuty or Opsgenie
- Create clear escalation policies
- Practice incident response with regular game days
- Improve alert quality to reduce noise and fatigue
- Ensure adequate on-call coverage
- Pre-build runbooks for common incidents
## Sources
- [[sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md]]
## Related Concepts
- [[concepts/MTTR]]
- [[concepts/MTTD]]
- [[concepts/DORA-Metrics]]
- [[concepts/DevOps-Maturity]]