---
title: "Public Cloud Learning Sessions (OpenText) - Evolving from DR to Recovery Assurance - 20240723 160210-Meeting Recording"
type: cloud-learning
source-type: video
category: "DevOps & SRE/10_OpenText-Series"
tags:
- OpenText
- DR
- Recovery
- BCP
date-added: 2026-04-14
video-source: "nas:///volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions (OpenText) - Evolving from DR to Recovery Assurance - 20240723_160210-Meeting Recording.mp4"
audio-source: ""
status: summarized (Gemini summary)
---
# Public Cloud Learning Sessions (OpenText) - Evolving from DR to Recovery Assurance - 20240723 160210-Meeting Recording

**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions (OpenText) - Evolving from DR to Recovery Assurance - 20240723_160210-Meeting Recording.mp4`

**Type:** VIDEO | **Category:** 10_OpenText-Series

**Status:** 🟡 Awaiting Whisper transcription → Summary

---

This learning session, presented by Jim Rose, covers the evolution from disaster recovery (DR) to recovery assurance. Its main objectives are to explain the current state of DR for OpenText solutions and the trend toward using site reliability engineering (SRE) and observability engineering to strengthen recovery assurance.

Rose opens with the CrowdStrike incident, in which a faulty software update caused widespread system outages, to underline the importance of robust DR strategies. *CrowdStrike was not us, but we have had some disruptions.* He also cites the 2003 power grid outage and the 2017 WannaCry ransomware attack to illustrate the potential impact of a disaster. OpenText's own incidents are driving the need for better end-to-end system management.

Key DR terms include Recovery Time Objective (RTO), the target time within which services must be restored after an event, and Recovery Point Objective (RPO), the maximum amount of data, usually expressed as a window of time, that may be lost. OpenText's RTO and RPO range from minutes to days depending on the customer contract. Testing is often reactive, manual, and customer-scheduled, involving many teams and significant effort. *Every person who is a SME on some part of this has to be involved in developing a plan.* OpenText aims to shift to a more proactive approach that scales better.

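
As a minimal sketch of how the two objectives above can be checked against a drill or incident timeline, the following Python compares the achieved recovery time and data-loss window with contracted targets. The names `RecoveryObjectives` and `evaluate_recovery`, the timestamps, and the target values are all hypothetical illustrations, not taken from the session or from any OpenText tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Contracted targets (hypothetical values; real ones vary by contract)."""
    rto: timedelta  # maximum tolerated time to restore service
    rpo: timedelta  # maximum tolerated window of lost data


def evaluate_recovery(
    objectives: RecoveryObjectives,
    last_good_backup: datetime,
    outage_start: datetime,
    service_restored: datetime,
) -> dict:
    """Compare achieved recovery time and data-loss window with the targets."""
    achieved_rto = service_restored - outage_start
    achieved_rpo = outage_start - last_good_backup
    return {
        "achieved_rto": achieved_rto,
        "achieved_rpo": achieved_rpo,
        "rto_met": achieved_rto <= objectives.rto,
        "rpo_met": achieved_rpo <= objectives.rpo,
    }


# Illustrative drill timeline (made-up timestamps and targets).
result = evaluate_recovery(
    RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(minutes=15)),
    last_good_backup=datetime(2024, 7, 23, 9, 50),
    outage_start=datetime(2024, 7, 23, 10, 0),
    service_restored=datetime(2024, 7, 23, 13, 30),
)
print(result["rto_met"], result["rpo_met"])
```

In this made-up timeline, service returns 3 hours 30 minutes after the outage begins and the last good backup is 10 minutes old, so both hypothetical targets are met. Recording these two durations for every drill is one way to move from ad hoc testing toward a repeatable measure of recovery.
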
Several factors are driving change, including the growing use of AWS, GCP, and Azure to host solutions. DR testing on hyperscalers has limitations: it tends to focus on zone failures rather than other failure modes. Hybrid solutions, where only part of the service can be failed over, pose additional challenges. The current model also lacks a consistent approach across the organization, especially for systems that have not been tested.

The discussion covers four key areas: design, software, build, and environments. Recoverability should be a design principle, with mechanisms for data and environment recovery planned early. Software should emit telemetry so that system health can be assessed continuously, and it should have self-healing capabilities. The build process should include a customer-zero environment for validating new products and releases. Environments should leverage observability engineering and SRE practices to improve resilience and capacity. Automation is seen as a future opportunity to reduce the manual effort and time delays in DR processes.
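
The idea of telemetry-driven self-healing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `HealthSample` fields, the thresholds in `is_healthy`, and the `self_heal` loop are hypothetical and do not describe how OpenText implements health checks or remediation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class HealthSample:
    """One telemetry reading (hypothetical fields)."""
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float  # 95th percentile request latency


def is_healthy(
    sample: HealthSample,
    max_error_rate: float = 0.05,   # assumed threshold
    max_latency_ms: float = 500.0,  # assumed threshold
) -> bool:
    """Return True when the sample is within both thresholds."""
    return (
        sample.error_rate <= max_error_rate
        and sample.p95_latency_ms <= max_latency_ms
    )


def self_heal(
    probe: Callable[[], HealthSample],
    remediate: Callable[[], None],
    max_attempts: int = 3,
) -> Tuple[bool, int]:
    """Probe the service; while unhealthy, remediate and probe again.

    Returns (healthy, remediations_run). Gives up after max_attempts
    remediations so that a human can be paged instead.
    """
    attempts = 0
    while not is_healthy(probe()):
        if attempts >= max_attempts:
            return False, attempts
        remediate()
        attempts += 1
    return True, attempts


# Simulated service that recovers after two restarts (made-up behavior).
state = {"restarts": 0}


def fake_probe() -> HealthSample:
    degraded = state["restarts"] < 2
    return HealthSample(
        error_rate=0.30 if degraded else 0.01,
        p95_latency_ms=900.0 if degraded else 120.0,
    )


def fake_remediate() -> None:
    state["restarts"] += 1


print(self_heal(fake_probe, fake_remediate))  # (True, 2)
```

Capping the number of remediation attempts is a deliberate choice in this sketch: automation handles the routine cases, while persistent failures still escalate to people rather than looping indefinitely.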