# MTTA (Mean Time to Acknowledge)

## Definition

MTTA (Mean Time to Acknowledge) is the average time between the moment a problem is detected and an alert fires, and the moment a team member acknowledges it and begins working on a resolution. It measures how quickly people respond once an alert is triggered.

MTTA is one component of MTTR: it sits between MTTD (Mean Time to Detect) and Mean Time to Repair.

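As a rough illustration of the definition, MTTA can be computed by averaging the trigger-to-acknowledge delay over a set of incidents. The sketch below is only that: the incident timestamps are invented sample data, and `mean_time_to_acknowledge` is a hypothetical helper, not part of any incident management tool.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (alert triggered, alert acknowledged).
incidents = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 9, 4)),
    (datetime(2024, 1, 9, 22, 30), datetime(2024, 1, 9, 22, 42)),
    (datetime(2024, 1, 17, 3, 15), datetime(2024, 1, 17, 3, 17)),
]


def mean_time_to_acknowledge(records):
    """Return the average trigger-to-acknowledge delay as a timedelta."""
    if not records:
        raise ValueError("at least one incident is required")
    delays = [acked - triggered for triggered, acked in records]
    # sum() needs a timedelta start value; dividing by an int yields a timedelta.
    return sum(delays, timedelta()) / len(delays)


mtta = mean_time_to_acknowledge(incidents)
print(mtta)  # 0:06:00 (delays of 4, 12, and 2 minutes)
```

In practice the two timestamps would come from the alerting or paging system's incident log rather than a hard-coded list.
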
## Why MTTA Matters

MTTA reflects:

- On-call response effectiveness
- Alert severity and clarity
- Incident management process efficiency
- Team availability and readiness

A short MTTA means that recovery work begins promptly once a problem is detected.

## Across DevOps Maturity Levels

| Maturity | Acknowledgment Capability |
|----------|--------------------------|
| Phase 1 | Long MTTA: unclear ownership, manual processes, reactive responses |
| Phase 2 | Improving: essential monitoring alerts the team when issues affect users, ops staff intervene manually |
| Phase 3 | Better process: ops team adopts automation techniques, but monitoring is unchanged |
| Phase 4 | Efficient acknowledgment: continuous monitoring with clear escalation paths, root cause analysis starts quickly |
| Phase 5 | Rapid: high collaboration, rapid data-driven decision-making, minimal customer interruptions |

## Key Factors Affecting MTTA

### On-Call Practices
- Clear on-call rotations
- Fast escalation policies
- Adequate staffing levels
- Compensation for on-call duty

### Alert Quality
- Actionable alerts (not noise)
- Clear severity levels
- Sufficient context in alerts
- Pre-configured runbook links

### Incident Response Process
- Clear ownership and accountability
- Pre-defined roles (incident commander, communications lead)
- Escalation procedures
- Communication channels

## MTTA as Part of MTTR

```
MTTR = MTTD + MTTA + Mean Time to Repair
```

All three components must be kept short to minimize MTTR. Even with a perfect MTTD (instant detection), a long MTTA results in poor overall recovery times.

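To make the formula concrete, the sketch below adds up the three components for two scenarios. All durations are made-up illustrative values, and `mean_time_to_recovery` is a hypothetical helper that simply applies the sum above.

```python
from datetime import timedelta


def mean_time_to_recovery(mttd, mtta, repair):
    """Apply MTTR = MTTD + MTTA + Mean Time to Repair."""
    return mttd + mtta + repair


# Instant detection, but the alert sits unacknowledged for 45 minutes.
slow_ack = mean_time_to_recovery(
    mttd=timedelta(0), mtta=timedelta(minutes=45), repair=timedelta(minutes=30)
)

# Slower detection, but the alert is acknowledged within 5 minutes.
fast_ack = mean_time_to_recovery(
    mttd=timedelta(minutes=5), mtta=timedelta(minutes=5), repair=timedelta(minutes=30)
)

print(slow_ack)  # 1:15:00
print(fast_ack)  # 0:40:00
```

With the same repair time in both cases, the scenario with instant detection still recovers more slowly because of its long acknowledgment delay.
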
## How to Improve MTTA

- Adopt an incident management tool such as PagerDuty or Opsgenie
- Create clear escalation policies
- Practice incident response with regular game days
- Improve alert quality to reduce noise and alert fatigue
- Ensure adequate on-call coverage
- Pre-build runbooks for common incidents

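One way to picture a clear escalation policy from the list above is as an ordered set of responders, each with a fixed window to acknowledge before the alert moves on. The sketch below is a generic model under that assumption; the responder names, wait times, and the `EscalationStep` and `responder_at` names are all hypothetical and do not represent the PagerDuty or Opsgenie APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    responder: str
    wait_minutes: int  # time this responder has to acknowledge before escalation


# Hypothetical three-level policy.
POLICY = [
    EscalationStep("primary on-call", 5),
    EscalationStep("secondary on-call", 10),
    EscalationStep("engineering manager", 15),
]


def responder_at(minutes_since_alert, policy=POLICY):
    """Return who should currently be paged for an unacknowledged alert."""
    deadline = 0
    for step in policy:
        deadline += step.wait_minutes
        if minutes_since_alert < deadline:
            return step.responder
    # Past the final window: keep paging the last level.
    return policy[-1].responder


print(responder_at(2))   # primary on-call
print(responder_at(7))   # secondary on-call
print(responder_at(20))  # engineering manager
```

Writing the policy down explicitly makes the acknowledgment windows visible: in this example, an alert reaches the last escalation level 15 minutes after it fires.
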
## Sources
- [[sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md]]

## Related Concepts
- [[concepts/MTTR]]
- [[concepts/MTTD]]
- [[concepts/DORA-Metrics]]
- [[concepts/DevOps-Maturity]]