RTO (Recovery Time Objective)

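RTO (Recovery Time Objective) is the maximum downtime a business can tolerate after a system failure. It measures recovery speed: the window between the system going offline and users being able to use it again.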

Definition

"RTO is about getting back online. It's the clock that starts ticking the moment your system goes down." — LaunchDarkly

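RTO is one of the core metrics in disaster recovery planning. Together with RPO (Recovery Point Objective), it defines the recovery targets for a system.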

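RTO and RPO are often confused, but they measure entirely different dimensions: RTO bounds how long the system may stay down, while RPO bounds how much recent data, measured as a span of time, may be lost.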

The two can be set independently: fast recovery does not guarantee that no data is lost, and vice versa.
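To make the independence of the two metrics concrete, here is a minimal sketch that derives both from a single incident timeline. The function name `recovery_metrics`, the timestamps, and the 5-minute RTO and 15-minute RPO are illustrative assumptions, not values from any real incident.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup, outage_start, service_restored):
    """Return (downtime, data_loss_window) for one incident."""
    downtime = service_restored - outage_start  # compared against RTO
    data_loss = outage_start - last_backup      # compared against RPO
    return downtime, data_loss


# Hypothetical timeline: backup at 02:00, outage at 02:50, restored at 02:54.
last_backup = datetime(2024, 1, 1, 2, 0)
outage_start = datetime(2024, 1, 1, 2, 50)
service_restored = datetime(2024, 1, 1, 2, 54)

downtime, data_loss = recovery_metrics(last_backup, outage_start, service_restored)

rto = timedelta(minutes=5)   # assumed target
rpo = timedelta(minutes=15)  # assumed target

print(downtime <= rto)   # True: back online within 4 minutes
print(data_loss <= rpo)  # False: 50 minutes of writes are gone
```

In this scenario the service meets its RTO and still misses its RPO, which is the case the paragraph above warns about.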

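Tier	Scenario	RTO target	Notes
Critical	Payment processing, user authentication	< 5 minutes	Business stops immediately; warrants a 3 AM page
Important	Admin console, reporting	< 1 hour	Business slows down but does not stop
Nice-to-have	Internal tools, documentation site	< 4 hours	Causes inconvenience only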
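The tiers above can be encoded as data so that an incident review can check measured downtime against the target. This is a sketch only: the dictionary keys and the `meets_rto` helper are hypothetical names, and the strict `<` comparison mirrors the "<" in the table.

```python
from datetime import timedelta

# RTO targets copied from the tier table.
RTO_TARGETS = {
    "critical": timedelta(minutes=5),
    "important": timedelta(hours=1),
    "nice-to-have": timedelta(hours=4),
}


def meets_rto(tier, downtime):
    """Return True if the measured downtime is under the tier's RTO target."""
    return downtime < RTO_TARGETS[tier]


print(meets_rto("critical", timedelta(minutes=3)))       # True
print(meets_rto("critical", timedelta(minutes=12)))      # False
print(meets_rto("nice-to-have", timedelta(minutes=12)))  # True
```

The same 12-minute outage is a miss for a Critical service and comfortably within target for a Nice-to-have one, which is why the tier has to be decided before the incident, not during it.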

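Traditional disaster recovery planning assumes that RTO applies to hardware failures (server crashes, data center power outages), but in modern continuous delivery the biggest risk comes from code changes.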

Feature flags cut RTO from hours to seconds: recovery only requires toggling a configuration value, with no code redeployment.
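A minimal sketch of the kill-switch pattern follows. `FlagStore` is a hypothetical in-memory stand-in for a real flag service, and the flag name `new-checkout` and both checkout functions are invented for illustration; no vendor SDK is being modeled here.

```python
class FlagStore:
    """Hypothetical in-memory flag store; a real one would query a flag service."""

    def __init__(self):
        self._flags = {}

    def set(self, name, enabled):
        self._flags[name] = enabled

    def is_enabled(self, name, default=False):
        return self._flags.get(name, default)


flags = FlagStore()
flags.set("new-checkout", True)  # the new code path is live


def legacy_checkout(cart):
    return sum(cart)


def new_checkout(cart):
    # Stand-in for a freshly shipped code path that turns out to be broken.
    raise RuntimeError("bug in new checkout")


def checkout(cart):
    # The flag is read on every call, so a toggle takes effect immediately.
    if flags.is_enabled("new-checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)


try:
    checkout([10, 20, 30])
    outage = False
except RuntimeError:
    outage = True

# Recovery: flip the flag. No build, no deploy.
flags.set("new-checkout", False)
total = checkout([10, 20, 30])
print(outage, total)  # True 60
```

The design point is that the old code path stays deployed next to the new one, so recovery is a data change and not a release.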

"What does an hour of downtime actually cost your business? If it's $10K, don't spend $100K/year on infrastructure to prevent it."
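The comparison in the quote can be written out as simple arithmetic. The sketch below assumes a deliberately crude model (cost per hour times hours of downtime avoided per year); the function names and the 1-hour and 24-hour scenarios are illustrative assumptions.

```python
def expected_annual_downtime_cost(cost_per_hour, hours_per_year):
    """Crude estimate: hourly cost of downtime times expected hours per year."""
    return cost_per_hour * hours_per_year


def worth_spending(annual_spend, cost_per_hour, hours_avoided_per_year):
    """Return True if the avoided downtime cost exceeds the yearly spend."""
    avoided = expected_annual_downtime_cost(cost_per_hour, hours_avoided_per_year)
    return avoided > annual_spend


# The quote's numbers: $10K per hour of downtime, $100K/year of infrastructure.
print(worth_spending(100_000, 10_000, 1))   # False: avoids $10K, costs $100K
print(worth_spending(100_000, 10_000, 24))  # True: avoids $240K, costs $100K
```

The answer flips only when the expected downtime avoided is large enough, so the RTO target should follow from the cost estimate and not the other way around.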