Auto-sync: 2026-04-22 04:02

2026-04-22 04:03:04 +08:00
parent 24218550d2
commit de096f2f88
232 changed files with 16604 additions and 514 deletions
--- a/wiki/sources/rto-vs-rpo-key-differences-for-modern-disaster-recovery.md
+++ b/wiki/sources/rto-vs-rpo-key-differences-for-modern-disaster-recovery.md
@@ -0,0 +1,89 @@
+---
+title: "RTO vs RPO: Key Differences for Modern Disaster Recovery"
+type: source
+tags: [cloud, devops, disaster-recovery, feature-flags, continuous-delivery]
+date: 2025-07-26
+---
+
+## Source File
+- [[raw/Cloud & DevOps/RTO vs RPO Key Differences for Modern Disaster Recovery.md]]
+
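+## Summary
+- **Core topic**: how RTO (Recovery Time Objective) and RPO (Recovery Point Objective) differ in modern continuous delivery, and how Feature Flags enable recovery within seconds
+- **Problem domain**: traditional disaster recovery focuses only on hardware failure, whereas the biggest risk in modern software delivery comes from code changes themselves
+- **Methods/mechanisms**:
+  - RTO measures how long a system is down; RPO measures how much data is lost
+  - Feature Flags decouple deployment from release and support micro-recovery (feature-level rollback)
+  - A Kill Switch enables configuration-level hot switching with no redeployment
+  - Progressive Rollout limits the blast radius by ramping up exposure in stages
+- **Conclusion/value**: prevention beats recovery; Feature Flag tools such as LaunchDarkly can deliver an RTO of seconds and a near-zero RPO, far more cost-effectively than traditional disaster recovery infrastructure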
+
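+## Key Claims
+- Feature Flags decouple deploy from release, enabling configuration-level hotfixes → RTO drops from hours to seconds
+- Progressive Rollout limits exposure to 1% of users → damage is contained and RTO is measured in seconds
+- A Kill Switch supports hot switching of any component, such as a payment gateway, search algorithm, or AI model → no code redeployment required
+- A Feature Flag rollback loses no data (it only switches the code path) → RPO stays near zero
+- Traditional disaster recovery planning focuses on hardware failure, but in modern delivery code changes are more frequent and carry more risk
+- Apply tiered protection (Tier 1/2/3) to applications instead of treating every system as Tier 1
+- HP cut rollback time from hours to minutes; Christian Dior went from 15 minutes to instant switching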
+
+## Key Quotes
+> "RTO is about getting back online. It's the clock that starts ticking the moment your system goes down." (RTO is a countdown that begins the moment the system goes offline)
+> "RPO is about protecting data. It's measured backwards from the moment of failure." (RPO looks backward from the moment of failure to the acceptable window of data loss)
+> "Deploy whenever you want, release when you're ready." (the core idea of Feature Flags: separate deployment from release)
+> "Prevention beats cure." (preventing failures is worth more than recovering from them quickly)
+> "Your RTO drops to seconds because fixing issues becomes a configuration change, not a code deployment." (Feature Flags turn a fix into a configuration change rather than a code deployment)
+> "86% of surveyed LaunchDarkly customers recover from incidents within a day." (LaunchDarkly customer incident-recovery data)
+
+## Key Concepts
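+- [[RTO]]: Recovery Time Objective, the maximum downtime a system can tolerate; measures recovery speed
+- [[RPO]]: Recovery Point Objective, the maximum acceptable data loss; measures the level of data protection
+- [[Feature Flag]]: a toggle that decouples code deployment from feature release and supports hot switching
+- [[Kill Switch]]: an emergency cutoff that bypasses a failing component during an incident
+- [[Progressive Rollout]]: releasing a new feature to the user base in stages
+- [[Micro-Recovery]]: fine-grained, feature-level recovery with no need to roll back the whole deployment
+- [[Deployment-vs-Release]]: the separation of deployment (code reaches production) from release (users can see it)
+- [[Business Impact Analysis]]: the analysis used to decide each application's tiered protection level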
+
+## Key Entities
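+- [[LaunchDarkly]]: Feature Flag management platform; source of the RTO/RPO improvement case studies for HP, Christian Dior, and other companies
+- [[Veeam]]: traditional disaster recovery tool (database backups, server images)
+- [[Acronis]]: traditional disaster recovery tool (cross-region replication)
+- [[HP]]: case study in which Feature Flags cut rollback time from hours to minutes
+- [[Christian Dior]]: case study in which rollback went from 15 minutes to instant switching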
+
+## Connections
+- [[Disaster Recovery]] ← extends ← [[RTO]] + [[RPO]] (RTO and RPO are the core metrics of disaster recovery)
+- [[Deployment-Automation]] ← depends_on ← [[Feature Flag]] (Feature Flags are foundational infrastructure for modern deployment automation)
+- [[CI-CD-Pipeline]] ← extends ← [[Deployment-vs-Release]] (separating deployment from release within continuous delivery)
+- [[High Availability]] ← depends_on ← [[Kill Switch]] (a Kill Switch is the emergency safeguard for HA)
+- [[LaunchDarkly]] ← implements ← [[Feature Flag]] (LaunchDarkly is a commercial implementation of Feature Flags)
+- [[Feature Flag]] ← enables ← [[Progressive Rollout]] (Feature Flags make Progressive Rollout possible)
+
+## Contradictions
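+- Conflict with the traditional disaster recovery view:
+  - **Point of conflict**: traditional disaster recovery investment (hot standby servers, cross-region replication) vs. the Feature Flag approach
+  - **Current view** (this article): a software-first approach (Feature Flag + Kill Switch) has a higher ROI; the HP case study reports that 8% of customers cut operations costs by more than 50%
+  - **Opposing view** (traditional DR): business-critical systems need full infrastructure redundancy (Active-Active, cross-region hot standby)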
+
+## Tiering Reference Table
+
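+| Tier | Scenario | RTO target | RPO target | Investment strategy |
+|------|------|----------|----------|----------|
+| (1) Critical | Payment processing, user authentication | < 5 minutes | < 1 minute | Feature Flag + automated monitoring + 3 AM alerting |
+| (2) Important | Admin console, reporting | < 1 hour | < 15 minutes | Feature Flag (major releases) + business-hours monitoring |
+| (3) Nice-to-have | Internal tools, documentation site | < 4 hours | < 1 hour | Basic monitoring + manual recovery procedures |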
+
+## Application Criticality Questions
+
+**If down for an hour:**
+- Lost revenue? How much?
+- Angry customers? How many?
+- Blocked employees? Can they work around it?
+- Regulatory issues? Legal problems?
+
+**If losing last hour of data:**
+- Can we recreate it?
+- Does it contain money/transactions?
+- Will users notice?
+- Is it required for compliance?
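
The mechanisms summarized in the note above (Kill Switch, Progressive Rollout, tiered RTO/RPO targets) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `FeatureFlag`, `pick_gateway`, and `meets_targets` are hypothetical names, the flag lives in memory, and none of this is LaunchDarkly's API. The hash-based bucketing is one common way to keep a user's rollout assignment stable, not necessarily what any particular vendor does. The tier targets are transcribed from the table in the note.

```python
import hashlib
from dataclasses import dataclass

# Tier targets transcribed from the table in the note, in minutes: (RTO, RPO).
TIER_TARGETS = {1: (5, 1), 2: (60, 15), 3: (240, 60)}


@dataclass
class FeatureFlag:
    """Hypothetical in-memory flag; a real platform would store this state remotely."""
    key: str
    enabled: bool = True       # kill switch: False turns the feature off for everyone
    rollout_percent: int = 0   # progressive rollout: share of users (0-100) on the new path

    def is_on(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        # Hash the flag key and user id so each user gets a stable bucket in 0..99.
        digest = hashlib.sha256(f"{self.key}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < self.rollout_percent


def pick_gateway(user_id: str, flag: FeatureFlag) -> str:
    # Both code paths are already deployed; the flag decides which one is released.
    return "new-gateway" if flag.is_on(user_id) else "legacy-gateway"


def meets_targets(tier: int, downtime_min: float, data_loss_min: float) -> bool:
    """Check a measured incident against the RTO/RPO targets for its tier."""
    rto, rpo = TIER_TARGETS[tier]
    return downtime_min < rto and data_loss_min < rpo


if __name__ == "__main__":
    flag = FeatureFlag("new-payment-gateway", rollout_percent=1)
    exposed = sum(flag.is_on(f"user-{i}") for i in range(10_000))
    print(f"users on the new path at 1%: {exposed}")

    flag.enabled = False  # flip the kill switch: a config change, not a redeploy
    print(pick_gateway("user-42", flag))
    print(meets_targets(1, downtime_min=0.5, data_loss_min=0))
```

Flipping `enabled` changes no stored data, only which code path runs, which is the reasoning behind the note's claim that a flag rollback keeps RPO near zero while RTO shrinks to the time it takes to change a setting.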