---
title:
source:
author: shenwei
published:
created:
description:
tags: []
---

## **1. Objective**

Ensure business continuity and data protection for the customer by implementing a disaster recovery (DR) strategy built on EFS replication, RDS point-in-time recovery (PITR), and tiered failover methods.

## **2. DR Scenarios & Recovery Options**

| **Service Tier** | **Method** | **RDS Recovery** | **EFS Recovery** | **Failover Steps** | **Estimated Downtime (RTO)** | **RPO** | **Cost Impact** |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DR Basic Service | **Cold Backup-Restore** | Snapshot (6h) | Backup Restore (6h) | 1. Restore RDS from snapshot (6h) <br>2. Restore EFS from snapshot (6h) <br>3. Recover EKS (4h) | **24 hours** | 4 hours | **Base Cost** |
| DR Premium Service | **EFS Replica Only (RDS PITR)** | PITR (6h) | EFS Replica + Restore (0.2h) | 1. RDS recovery from PITR (6h) <br>2. Stop EFS sync (0.2h) <br>3. Full EKS recovery | **6 hours** | 15 min | **+30% Cost** |

---

## **3. Downtime Estimation & RTO Considerations**

- **EFS Replica Only (RDS PITR)**
  - **6-hour RTO**, significantly reducing downtime compared with the 24-hour RTO of Cold Backup-Restore.
  - **15-minute RPO**, limiting data loss to at most the last 15 minutes of changes.
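The RTO figures can be sanity-checked against the step durations listed in the failover steps. The sketch below is illustrative only: the names `TIERS`, `sequential_minutes`, and `parallel_minutes` are hypothetical, durations are converted to minutes (0.2 h = 12 min), and EKS recovery for the Premium tier is treated as 0 minutes because section 4.3 describes it as immediate. Under these assumptions, the stated 6-hour RTO for the Premium tier matches the case where RDS and EFS recovery run in parallel (running them back to back gives 6.2 hours), while the stated 24-hour RTO for the Basic tier leaves headroom over both the sequential (16 h) and parallel (10 h) totals.

```python
# Step durations in minutes, taken from the failover steps in the table.
# Assumption: Premium-tier EKS recovery counts as 0 min ("immediate" in 4.3).
TIERS = {
    "Cold Backup-Restore": {"rds": 360, "efs": 360, "eks": 240, "stated_rto": 24 * 60},
    "EFS Replica Only (RDS PITR)": {"rds": 360, "efs": 12, "eks": 0, "stated_rto": 6 * 60},
}


def sequential_minutes(tier):
    """Total time if RDS, EFS, and EKS recovery run one after another."""
    return tier["rds"] + tier["efs"] + tier["eks"]


def parallel_minutes(tier):
    """Total time if RDS and EFS recover concurrently, followed by EKS."""
    return max(tier["rds"], tier["efs"]) + tier["eks"]


for name, tier in TIERS.items():
    seq_h = sequential_minutes(tier) / 60
    par_h = parallel_minutes(tier) / 60
    stated_h = tier["stated_rto"] / 60
    print(f"{name}: sequential {seq_h:.1f} h, parallel {par_h:.1f} h, stated RTO {stated_h:.0f} h")
```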

---

## **4. DR Execution Plan**

### **4.1 Pre-DR Readiness Checks**

- Ensure **EFS replication** is active and functioning correctly.
- Verify that **RDS PITR backups** are enabled and that retention policies cover the required recovery window.
- Pre-configure **EKS deployment templates (Velero)** for rapid recovery.
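The first two checks above could be automated roughly as follows. This is a minimal sketch, not the customer's actual tooling: the function names, the `min_retention_days` parameter, and the choice to reuse the 15-minute RPO as the maximum acceptable lag are all assumptions. The functions take response dictionaries in the shape returned by the boto3 calls `efs.describe_replication_configurations` and `rds.describe_db_instances`, so they can be exercised without AWS access.

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(minutes=15)  # Premium-tier RPO, reused here as the lag limit (assumption)


def efs_replication_ok(response, now, max_lag=RPO):
    """True if every replication destination is ENABLED and recently synced."""
    destinations = [
        dest
        for replication in response.get("Replications", [])
        for dest in replication.get("Destinations", [])
    ]
    if not destinations:
        return False  # no replication configured at all
    for dest in destinations:
        last = dest.get("LastReplicatedTimestamp")
        if dest.get("Status") != "ENABLED" or last is None or now - last > max_lag:
            return False
    return True


def rds_pitr_ok(response, now, max_lag=RPO, min_retention_days=1):
    """True if automated backups are retained and the restorable time is fresh."""
    instances = response.get("DBInstances", [])
    if not instances:
        return False
    db = instances[0]
    latest = db.get("LatestRestorableTime")
    retained = db.get("BackupRetentionPeriod", 0) >= min_retention_days
    return retained and latest is not None and now - latest <= max_lag


# Hypothetical usage with boto3 (identifiers are placeholders):
#   import boto3
#   now = datetime.now(timezone.utc)
#   efs = boto3.client("efs")
#   rds = boto3.client("rds")
#   efs_replication_ok(efs.describe_replication_configurations(FileSystemId="fs-EXAMPLE"), now)
#   rds_pitr_ok(rds.describe_db_instances(DBInstanceIdentifier="db-EXAMPLE"), now)
```

Keeping the checks as pure functions over response dictionaries makes them easy to run in a scheduled job and to unit-test with recorded responses.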

### **4.2 Disaster Recovery Trigger**

- DR activation is **initiated upon a major failure event** in the primary environment.
- Decision criteria include **regional failure, prolonged service outage, or severe data corruption**.

### **4.3 Execution Steps**

#### **EFS Replica Only (RDS PITR)**

1. **Recover RDS** from PITR (**6 hours**).
2. **Stop EFS replication sync** (**0.2 hours**).
3. **Recover EKS cluster** and validate application (**immediate**).
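The three steps above can be sketched as an ordered plan that a runbook script prints for review before anything is executed. This is a hedged illustration, not the documented procedure: `build_failover_plan` and `describe_plan` are hypothetical helpers, and all identifiers are placeholders. It assumes that "stop EFS sync" maps to deleting the replication configuration (which makes the destination file system writable) and that EKS workloads are restored from a Velero backup. A cross-Region restore after a regional failure would need different RDS parameters than the in-Region ones shown here.

```python
def build_failover_plan(source_db, target_db, source_fs, velero_backup):
    """Return the ordered failover actions as (description, target, payload)."""
    return [
        (
            "Recover RDS from PITR",
            "boto3 rds.restore_db_instance_to_point_in_time",
            {
                "SourceDBInstanceIdentifier": source_db,
                "TargetDBInstanceIdentifier": target_db,
                "UseLatestRestorableTime": True,  # restore to the newest available point
            },
        ),
        (
            "Stop EFS replication sync",
            "boto3 efs.delete_replication_configuration",  # assumed meaning of "stop sync"
            {"SourceFileSystemId": source_fs},
        ),
        (
            "Recover EKS cluster",
            "velero CLI",
            ["velero", "restore", "create", "--from-backup", velero_backup],
        ),
    ]


def describe_plan(plan):
    """Render the plan as numbered lines for a dry-run review."""
    return [
        f"{number}. {description}: {target} {payload}"
        for number, (description, target, payload) in enumerate(plan, start=1)
    ]


# Placeholder identifiers, for illustration only.
for line in describe_plan(build_failover_plan("prod-db", "dr-db", "fs-EXAMPLE", "daily-backup")):
    print(line)
```

Separating plan construction from execution lets the operator confirm the parameters during a DR drill without touching live resources.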

### **4.4 Post-Failover Validation**

- Confirm **data consistency** between DR and primary environments.
- Validate **application services and connectivity**.
- Communicate DR activation and service restoration to stakeholders.

---

## **5. DR Testing & Cost Estimation**

- An **annual DR validation test** is required, adding an **estimated 2 months of AWS costs**.
- **EFS Replica Only (RDS PITR):**
  - **$20.8K/month**
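As a rough worked example of the figures above, the annual cost can be estimated as twelve months of running cost plus the test overhead. This sketch assumes the "2 months of AWS costs" for the annual test are priced at the same $20.8K monthly rate; the source does not state this explicitly, and the names below are hypothetical.

```python
MONTHLY_COST_USD = 20_800  # EFS Replica Only (RDS PITR), from the figure above
TEST_COST_MONTHS = 2       # annual DR test, expressed in months of AWS cost


def annual_cost_usd(monthly=MONTHLY_COST_USD, test_months=TEST_COST_MONTHS):
    """Return (running cost, test cost, total) for one year in USD."""
    running = 12 * monthly
    # Assumption: test months are billed at the same monthly rate.
    test = test_months * monthly
    return running, test, running + test


running, test, total = annual_cost_usd()
print(f"Running: ${running:,}  Test: ${test:,}  Total: ${total:,}")
# prints "Running: $249,600  Test: $41,600  Total: $291,200"
```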