# MTTD (Mean Time to Detect)

## Definition

MTTD (Mean Time to Detect) is the average time between the onset of a problem or failure in a system and the moment it is identified. It measures the effectiveness of monitoring, alerting, and observability practices. MTTD is a component of MTTR and represents the first phase of incident response.

## Why MTTD Matters

A short MTTD means:

- Failures are caught before they cascade into larger outages
- Customer impact is minimized
- The team can begin recovery faster
- Root cause analysis starts sooner

A long MTTD means:

- Problems can escalate undetected
- User experience degrades for longer periods
- More customers are affected
- Root cause analysis becomes harder as the incident grows

## Across DevOps Maturity Levels

| Maturity | Detection Capability |
|----------|---------------------|
| Phase 1 | Long MTTD: outages are reported by users, with no proactive monitoring and a reactive approach |
| Phase 2 | Better MTTD: essential monitoring tools alert teams as soon as issues affect users |
| Phase 3 | Improved detection: automated monitoring continues, and security scans are added earlier in the pipeline |
| Phase 4 | Continuous monitoring: system health is tracked for early problem detection and root cause analysis |
| Phase 5 | Minimal MTTD: maximum uptime with high collaboration and continuous monitoring, and no customer interruptions |

## Key Practices for Low MTTD

### Monitoring & Alerting

- Comprehensive application performance monitoring (APM)
- Infrastructure monitoring
- Log aggregation and analysis
- Real-user monitoring (RUM)
- Synthetic monitoring

### Alerting Best Practices

- Meaningful alert thresholds (avoid alert fatigue)
- Alert routing to appropriate on-call staff
- Clear alert context for rapid triage
- Correlation of related alerts

### Observability

- Structured logging
- Distributed tracing
- Metrics dashboards
- Error tracking

## MTTD vs Other Metrics

- **MTTR**: MTTD is the first component of MTTR when MTTR is measured from the start of the failure (MTTR = MTTD + MTTA + Mean Time to Repair)
- **Availability**: High availability depends partly on short MTTD
- **Change Failure Rate**: Fewer failures reaching production reduce MTTD pressure

## Sources

- [[sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md]]

## Related Concepts

- [[concepts/MTTR]]
- [[concepts/MTTA]]
- [[concepts/DORA-Metrics]]
- [[concepts/APM]]
- [[concepts/DevOps-Maturity]]
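
## Example: Computing MTTD from Incident Timestamps

The relationship MTTR = MTTD + MTTA + Mean Time to Repair can be checked on a small set of incident records. The sketch below is illustrative only: the `Incident` record, its field names, and the two sample incidents are assumptions made for this example, not data from the source. It also assumes the true start time of each failure is known, which in practice is often estimated after the fact.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass(frozen=True)
class Incident:
    # Hypothetical record shape; the field names are assumptions.
    started_at: datetime       # when the failure actually began
    detected_at: datetime      # when monitoring or a person first noticed it
    acknowledged_at: datetime  # when an on-call responder took ownership
    resolved_at: datetime      # when service was restored


def mean_delta(deltas: Sequence[timedelta]) -> timedelta:
    """Average a non-empty sequence of durations."""
    if not deltas:
        raise ValueError("need at least one incident")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: Sequence[Incident]) -> timedelta:
    """Mean time from failure start to detection."""
    return mean_delta([i.detected_at - i.started_at for i in incidents])


def mtta(incidents: Sequence[Incident]) -> timedelta:
    """Mean time from detection to acknowledgement."""
    return mean_delta([i.acknowledged_at - i.detected_at for i in incidents])


def mean_time_to_repair(incidents: Sequence[Incident]) -> timedelta:
    """Mean time from acknowledgement to resolution."""
    return mean_delta([i.resolved_at - i.acknowledged_at for i in incidents])


def mttr(incidents: Sequence[Incident]) -> timedelta:
    """Mean time from failure start to resolution (end to end)."""
    return mean_delta([i.resolved_at - i.started_at for i in incidents])


# Made-up sample data for illustration.
INCIDENTS = [
    Incident(
        started_at=datetime(2024, 1, 10, 10, 0),
        detected_at=datetime(2024, 1, 10, 10, 4),
        acknowledged_at=datetime(2024, 1, 10, 10, 9),
        resolved_at=datetime(2024, 1, 10, 10, 39),
    ),
    Incident(
        started_at=datetime(2024, 1, 12, 14, 0),
        detected_at=datetime(2024, 1, 12, 14, 10),
        acknowledged_at=datetime(2024, 1, 12, 14, 13),
        resolved_at=datetime(2024, 1, 12, 15, 3),
    ),
]

if __name__ == "__main__":
    print("MTTD:", mttd(INCIDENTS))  # 0:07:00
    print("MTTA:", mtta(INCIDENTS))  # 0:04:00
    print("Mean time to repair:", mean_time_to_repair(INCIDENTS))  # 0:40:00
    print("MTTR:", mttr(INCIDENTS))  # 0:51:00
```

With these sample incidents, detection took 4 and 10 minutes, so MTTD is 7 minutes. The three phase averages (7 + 4 + 40 minutes) add up to the 51-minute end-to-end MTTR, because every incident contributes to all three phases.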