Auto-sync: 2026-04-24 04:02

2026-04-24 04:02:45 +08:00
parent 4e9ee6f51e
commit a96baa8fb7
40 changed files with 1934 additions and 89 deletions


@@ -0,0 +1,32 @@
---
title: "Agile Ceremonies"
type: concept
tags: []
sources: []
last_updated: 2026-04-24
---
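## Definition
Agile ceremonies are the recurring meetings and activities that an agile framework prescribes to promote team collaboration, communication, and continuous improvement.
## Standard Scrum Ceremonies
- **Sprint Planning:** Held at the start of a Sprint to decide what work the Sprint will deliver
- **Daily Stand-up:** A short meeting at a fixed time each day that answers three questions: what was finished yesterday, what will be done today, and what is blocking progress
  - Duration: 15-30 minutes in this team's practice (the Scrum Guide timeboxes the Daily Scrum at 15 minutes)
  - Run around the Kanban board tool
- **Sprint Review:** Held at the end of a Sprint to demonstrate the completed work to stakeholders
- **Sprint Retrospective:** Held at the end of a Sprint so the team can reflect and identify improvements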
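## Ceremony Retention Strategy (Hybrid Framework)
The cloud transformation team keeps two core Scrum ceremonies:
- **Daily Stand-up:** Keeps team status in sync quickly
- **Retrospective:** Drives a fast feedback loop and continuous improvement of the product and the development culture

The fixed Sprint cadence is dropped so that changes can be made continuously.
## Best Practices
- Every action item must have an owner
- Retrospectives must produce concrete, actionable improvements
## Sources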
- [[ctp-topic-4-using-agile-to-run-the-cloud-transformation-program]]

wiki/concepts/CAPA.md

@@ -0,0 +1,51 @@
---
title: "CAPA"
type: concept
tags: [Incident-Management, Post-mortem, Root-Cause, Preventive-Action]
last_updated: 2026-04-14
---
## Definition
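CAPA (Corrective and Preventive Action) is the process that must follow every Emergency Change. It extracts the root cause from the incident and prevents the same class of problem from recurring. CAPA is usually run together with a post-mortem review.
## Components
### Corrective Action
- The immediate fix for the current incident
- Goal: restore the service to its desired state
- Usually completed during the execution phase of the Emergency Change
### Preventive Action
- Long-term measures that keep the same class of problem from recurring
- Goal: eliminate the root cause
- Usually defined after the post-mortem analysis
## Relationship with Emergency Change
CAPA is a key part of the Emergency Change process: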
```
Incident occurs
  ↓
Emergency Change executed (Corrective Action)
  ↓
CAPA/Post-mortem analysis
  ↓
Preventive Action defined
  ↓
May become a Standard Change (avoiding similar incidents in future)
```
## Process
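1. **Incident Review**: Review how the incident unfolded
2. **Root Cause Analysis**: Identify the root cause
3. **Corrective Action**: Carry out the immediate fix
4. **Preventive Action**: Define long-term preventive measures
5. **Follow-up**: Track the execution of those measures
6. **Closure**: Complete the CAPA record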
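
The six steps above form a strict sequence, which can be sketched as a small record type that only moves forward one stage at a time. This is a minimal illustration, not a description of any tool used in the source: `CapaStage`, `CapaRecord`, and the rule that every stage needs a non-empty outcome note are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class CapaStage(IntEnum):
    """The six CAPA steps, in the order listed in the Process section."""
    INCIDENT_REVIEW = 1
    ROOT_CAUSE_ANALYSIS = 2
    CORRECTIVE_ACTION = 3
    PREVENTIVE_ACTION = 4
    FOLLOW_UP = 5
    CLOSURE = 6


@dataclass
class CapaRecord:
    """Hypothetical CAPA record that tracks the current stage and its notes."""
    incident_id: str
    stage: CapaStage = CapaStage.INCIDENT_REVIEW
    notes: dict = field(default_factory=dict)

    def complete_stage(self, note: str) -> None:
        """Store the outcome of the current stage, then advance by one stage."""
        if not note.strip():
            raise ValueError(f"{self.stage.name} needs a non-empty outcome")
        self.notes[self.stage] = note
        if self.stage < CapaStage.CLOSURE:
            self.stage = CapaStage(self.stage + 1)

    @property
    def closed(self) -> bool:
        """The record is closed once the CLOSURE stage has an outcome."""
        return CapaStage.CLOSURE in self.notes
```

Because stages cannot be skipped, a record can only reach `closed` after a root cause and a preventive action have been written down, which is the point of running CAPA after an Emergency Change.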
## Sources
- [[ctp-topic-30-managing-change]]


@@ -0,0 +1,24 @@
---
title: "Continuous Delivery"
type: concept
tags: []
sources: []
last_updated: 2026-04-24
---
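## Definition
Continuous Delivery is a software engineering approach in which an automated pipeline keeps code changes in a state where they can be deployed to production safely at any time.
## Core Characteristics
- **Always releasable:** Once a code change passes automated testing, it can be deployed at any time
- **No waiting for a fixed cycle:** Breaks through Sprint/iteration boundaries
- **Automated pipeline:** Build, test, and deployment are automated end to end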
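
The "always releasable" property comes from gating: a change is deployable only if every automated stage has passed. The sketch below models that gate in plain Python. The `run_pipeline` function and the stage names in the usage note are hypothetical; a real pipeline would be defined in a CI/CD system rather than in application code.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a zero-argument check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (releasable, log). The change is releasable only when
    every stage passed.
    """
    log: List[str] = []
    for name, check in stages:
        ok = check()
        log.append(f"{name}: {'passed' if ok else 'failed'}")
        if not ok:
            return False, log
    return True, log
```

For example, `run_pipeline([("build", build), ("test", test), ("deploy", deploy)])` never reaches `deploy` when `test` returns `False`, so a failing change cannot be released.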
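## Relationship with Kanban
Continuous delivery is one of the core advantages of the Kanban framework: when requirements can enter the board at any time and be delivered as soon as they are done, the team can respond to change faster.
## Comparison with Scrum
Compared with continuous delivery, Scrum's batch release model (releasing everything at once at the end of a Sprint) delays the delivery of value, and the more changes accumulate, the higher the release risk.
## Sources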
- [[ctp-topic-4-using-agile-to-run-the-cloud-transformation-program]]


@@ -0,0 +1,61 @@
---
title: "Early Live Support"
type: concept
tags: [SRE, Cloud-Transformation, Go-Live, Support-Model]
last_updated: 2026-04-14
---
## Definition
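Early Live Support is the transition phase between Build and BAU (business as usual). During this phase the SRE team works closely with the product team to make sure the new service goes live smoothly and to establish the support model for ongoing operations.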
## Phase Overview
```
Build ──────→ Early Live Support ──────→ BAU
  ↑                   ↑                   ↑
Product team-led      ↑                SRE-led
      SRE + product team collaboration
```
## Key Activities
### 1. Go-Live Checklist
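The Early Live Support phase must complete the following checklist:
| Item | Description |
|------|-------------|
| Monitoring coverage | All critical services and infrastructure are adequately monitored |
| Support model | The on-call schedule and escalation path are defined |
| Incident response process | A clear incident response and escalation process is in place |
| SLO/SLI definition | Service level objectives and indicators are defined |
| Documentation handover | Technical documentation and runbooks are complete |
### 2. SRE Collaboration
During this phase, the SRE team:
- Provides technical support and helps resolve go-live issues
- Verifies that monitoring and alerting are effective
- Establishes communication channels with the product team
- Collects the operational requirements for the BAU phase
### 3. Handoff Criteria
Criteria for moving from Early Live Support to BAU:
- All monitoring dashboards are operating normally
- The incident response process has been rehearsed
- The on-call schedule is integrated with the Service Desk
- The product team has completed operations training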
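
The handoff criteria work as an all-or-nothing gate, which the following sketch expresses as a checklist evaluation. The key names in `HANDOFF_CRITERIA` and both function names are hypothetical labels chosen for this example; the source only lists the four criteria in prose.

```python
# Hypothetical keys, one per handoff criterion listed in the section.
HANDOFF_CRITERIA = (
    "dashboards_operational",
    "incident_response_rehearsed",
    "oncall_integrated_with_service_desk",
    "product_team_trained",
)


def handoff_gaps(status: dict) -> list:
    """Return the criteria that are not yet met; missing keys count as unmet."""
    return [name for name in HANDOFF_CRITERIA if not status.get(name, False)]


def ready_for_bau(status: dict) -> bool:
    """The service moves to BAU only when no criterion is outstanding."""
    return not handoff_gaps(status)
```

Reporting the gaps, rather than a bare yes or no, gives the SRE and product teams a concrete list to work through before the transition.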
## Relationship with SRE
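Early Live Support is the middle stage of the three-phase SRE support model:
1. **Build**: SRE takes part in architecture reviews and defines SLOs/SLIs
2. **Early Live Support**: SRE provides go-live support and ensures a smooth transition
3. **BAU**: SRE provides ongoing monitoring and reliability assurance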
## Sources
- [[ctp-topic-30-managing-change]]


@@ -0,0 +1,44 @@
---
title: "Emergency Change"
type: concept
tags: [Change-Management, ITSM, Incident-Response, CAPA]
last_updated: 2026-04-14
---
## Definition
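An Emergency Change is a change that must be executed immediately to mitigate an incident and restore the service to its desired state as quickly as possible. Unlike a Normal Change, an Emergency Change can be executed without prior CAB approval, but the root cause must be fixed afterwards through the CAPA/Post-mortem process.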
## Characteristics
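| Attribute | Value |
|-----------|--------|
| Approval Required | Approved after the fact |
| CAB Involvement | Reported after the fact |
| Automation Level | Can be partially automated |
| Risk Level | High (executed under emergency conditions) |
| Change Window | Executed immediately |
| Post-Process | CAPA/Post-mortem is mandatory |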
## Process Flow
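1. **Trigger**: An incident triggers the emergency response
2. **Assessment**: Quickly assess the situation and decide whether to execute an Emergency Change
3. **Execution**: Execute the change immediately to mitigate the incident
4. **Communication**: Notify the relevant stakeholders
5. **CAPA**: Fix the root cause afterwards through the CAPA process
6. **Documentation**: Complete the change record and the post-mortem
## Key Principle
> An Emergency Change is not meant to be a permanent patch; the root cause is identified and a long-term solution is defined afterwards through CAPA/Post-mortem.
## CAPA Process
CAPA (Corrective and Preventive Action) has two phases:
- **Corrective Action**: The immediate measure that fixes the current problem
- **Preventive Action**: The long-term measure that prevents the same class of problem from recurring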
## Sources
- [[ctp-topic-30-managing-change]]

wiki/concepts/HCX.md

@@ -0,0 +1,37 @@
---
title: "Hybrid Cloud Extension (HCX)"
type: concept
tags:
- VMware
- Hybrid-Cloud
- Migration
last_updated: 2026-04-25
---
## Hybrid Cloud Extension (HCX)
VMware's hybrid cloud extension technology that enables any-to-any vSphere workload migration between on-premises and cloud environments.
## Definition
HCX enables bidirectional workload migration between any vSphere environments, supporting seamless movement of applications between on-premises data centers and VMware Cloud on AWS.
## Key Features
- **Any-to-Any Migration**: Migrate workloads between any vSphere environments
- **Bidirectional**: Supports migration in both directions (on-prem → cloud and cloud → on-prem)
- **Fast Migration**: Workloads can move in seconds
- **No Re-architecture Required**: Applications can migrate without code changes
## Use Cases
- Cloud migration
- Disaster recovery
- Bursting to cloud
- Data center evacuation
- Cloud repatriation
## Connections
- [[VMware-Cloud-on-AWS]] ← enables ← [[HCX]]
- [[ctp-topic-43-vmware-cloud-on-aws]] ← source ← [[HCX]]
- [[ctp-topic-69-best-practices-for-migrating-on-premises-iod-virtual-machines-to-vm]] ← related_to ← [[HCX]]
## Sources
- [[ctp-topic-43-vmware-cloud-on-aws]]


@@ -0,0 +1,60 @@
---
title: "Hybrid DNS Resolution"
type: concept
tags: []
sources: []
last_updated: 2026-04-24
---
## Hybrid DNS Resolution
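Hybrid DNS resolution is the ability to resolve domain names across environments, between AWS VPCs and the on-premises data center. It is key infrastructure for enterprise cloud migration and hybrid cloud architectures.
## Problem
When an enterprise migrates to an AWS Landing Zone, the following DNS resolution needs arise:
- VPCs in AWS need to resolve internal domain names of the on-premises data center (for example `corp.internal`)
- Servers in the on-premises data center need to resolve private domain names inside AWS VPCs (for example `int-sas.local`)
- VPCs in different accounts need to resolve each other's names

Traditional decentralized DNS management cannot meet these needs effectively.
## Solution: Route 53 Resolver Endpoints
AWS Route 53 Resolver provides two key components for hybrid DNS:
### Inbound Endpoints
- Purpose: receive DNS queries coming from the **on-premises data center**
- Mechanism: on-premises DNS servers forward queries for AWS private domain names to the Inbound Endpoint's IP addresses
- Scenario: on-premises users access private services inside AWS (for example `*.int-sas.local`)
### Outbound Endpoints
- Purpose: forward DNS queries originating **inside AWS VPCs** to on-premises DNS servers
- Mechanism: Resolver Rules define which domain names are forwarded and to which IP addresses
- Scenario: AWS workloads need to reach on-premises resources (for example GitHub Enterprise or legacy databases)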
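
The way Resolver Rules select a forwarding target can be illustrated with a small suffix-matching function. When several forwarding rules match a query, Route 53 Resolver uses the rule with the most specific domain name, and the sketch below mimics only that selection step. It is a toy model, not the AWS API: `pick_forward_targets` is a hypothetical helper, and the IP addresses and the `db.corp.internal` sub-domain in the usage note are made up for illustration.

```python
from typing import Dict, List, Optional


def pick_forward_targets(query: str, rules: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the target IPs of the most specific rule that matches `query`.

    `rules` maps a domain name to the on-premises DNS server IPs that
    queries for that domain (and its sub-domains) are forwarded to.
    Returns None when no rule matches, meaning the query is not forwarded.
    """
    name = query.rstrip(".").lower()
    normalized = {domain.rstrip(".").lower(): ips for domain, ips in rules.items()}
    # A rule matches the domain itself and anything underneath it.
    candidates = [d for d in normalized if name == d or name.endswith("." + d)]
    if not candidates:
        return None
    # All candidates are suffixes of the same name, so the longest is the most specific.
    return normalized[max(candidates, key=len)]
```

With rules for `corp.internal` and a hypothetical `db.corp.internal`, a query for `pg.db.corp.internal` follows the second rule, while `git.corp.internal` falls back to the first.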
## Cross-Account Architecture
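In an AWS Landing Zone, the standard architecture for centralized DNS management is:
1. **Dedicated DNS account**: A dedicated DNS account is set up in the Landing Zone (formerly known as the InfoBlocks account)
2. **Private Hosted Zones (PHZ)**: All private hosted zones are managed centrally in the DNS account
3. **AWS RAM sharing**: Resolver Rules are shared with the workload accounts through Resource Access Manager
4. **VPC association authorization**: For a cross-account association, the PHZ owner must authorize first, and then the VPC owner performs the association
5. **Terraform automation**: Rule sharing and VPC association are completed automatically when a new account is created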
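
The authorize-then-associate order in step 4 is easy to get wrong, so here is a small in-memory model of it. In Route 53 the two steps correspond to the `CreateVPCAssociationAuthorization` and `AssociateVPCWithHostedZone` API actions; the code below does not call AWS at all. The `PrivateHostedZone` class, the account names, and the VPC id in the usage note are hypothetical and exist only to show the ordering constraint.

```python
class PrivateHostedZone:
    """Toy model of a PHZ that enforces the two-step cross-account flow."""

    def __init__(self, zone_name: str, owner_account: str):
        self.zone_name = zone_name
        self.owner_account = owner_account
        self.authorized = set()   # VPC ids the zone owner has authorized
        self.associated = set()   # VPC ids that are actually associated

    def authorize(self, caller_account: str, vpc_id: str) -> None:
        """Step 1: only the account that owns the zone may authorize a VPC."""
        if caller_account != self.owner_account:
            raise PermissionError("only the PHZ owner can authorize a VPC")
        self.authorized.add(vpc_id)

    def associate(self, caller_account: str, vpc_owner: str, vpc_id: str) -> None:
        """Step 2: the VPC owner associates, and only after authorization."""
        if caller_account != vpc_owner:
            raise PermissionError("only the VPC owner can perform the association")
        if vpc_id not in self.authorized:
            raise PermissionError("VPC has not been authorized by the PHZ owner")
        self.associated.add(vpc_id)
```

Calling `associate` for `vpc-123` before the DNS account has called `authorize` raises `PermissionError`, which mirrors why the automation has to run the DNS-account step first.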
## Key Concepts
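- [[Private Hosted Zone]]: An AWS Route 53 private hosted zone, which resolves custom domain names inside the specified VPCs
- [[Route 53 Resolver Rules]]: Resolution rules that define the forwarding path for a domain name
- [[VPC Association Authorization]]: The authorize-then-associate flow for cross-account associations
- [[AWS RAM]]: The cross-account resource sharing mechanism
- [[AWS Landing Zone]]: The foundation that hosts the DNS architecture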
## Aliases
- Hybrid DNS
- Cross-Cloud DNS
- On-Premises DNS Integration

wiki/concepts/Kanban.md

@@ -0,0 +1,33 @@
---
title: "Kanban"
type: concept
tags: []
sources: []
last_updated: 2026-04-24
---
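## Definition
Kanban is an agile framework that emphasizes continuous flow: it has no fixed Sprint boundaries and allows priorities and requirements to be adjusted at any time.
## Core Characteristics
- **Continuous delivery:** Work can be released without waiting for a Sprint to end
- **Change at any time:** Priorities can be adjusted at any moment
- **Visual board:** Task status is managed through named columns
- **Work-in-progress (WIP) limits:** Cap the amount of work in progress at the same time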
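
A WIP limit is simply a cap on how many cards a column may hold, checked whenever a card enters that column. The sketch below is a minimal model of that rule. `KanbanBoard`, its method names, and the column names in the usage note are hypothetical and not taken from any board tool mentioned in the source.

```python
class KanbanBoard:
    """Minimal board with per-column WIP limits (None means unlimited)."""

    def __init__(self, wip_limits: dict):
        self.wip_limits = dict(wip_limits)
        self.columns = {name: [] for name in wip_limits}

    def add(self, card: str, column: str) -> None:
        """Place a card in a column, refusing if the column is at its limit."""
        limit = self.wip_limits[column]
        if limit is not None and len(self.columns[column]) >= limit:
            raise RuntimeError(f"WIP limit reached for column '{column}'")
        self.columns[column].append(card)

    def move(self, card: str, src: str, dst: str) -> None:
        """Move a card between columns; a blocked move leaves it where it was."""
        if card not in self.columns[src]:
            raise ValueError(f"'{card}' is not in column '{src}'")
        self.add(card, dst)          # raises before the card leaves src
        self.columns[src].remove(card)
```

With `KanbanBoard({"todo": None, "doing": 1, "done": None})`, a second card cannot enter `doing` until the first one moves to `done`, which pushes the team to finish work before starting more.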
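## Comparison with Scrum
| Dimension | Scrum | Kanban |
|------|-------|--------|
| Iteration cycle | Fixed Sprint (1-4 weeks) | No fixed cycle |
| When changes are allowed | Not during a Sprint | Any time |
| Release cadence | Batch release at the end of a Sprint | Continuous delivery |
| Ceremonies | Sprint Planning/Review/Retrospective | Ceremonies optional |
## Enterprise Practice: Hybrid Framework
The cloud transformation team uses a hybrid approach: Kanban as the primary framework, plus retained Scrum ceremonies (daily stand-up and retrospective), which balances flexibility with a feedback loop.
## Sources
- [[ctp-topic-4-using-agile-to-run-the-cloud-transformation-program]]
## Aliases
- Kanban Framework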


@@ -0,0 +1,49 @@
---
title: "Normal Change"
type: concept
tags: [Change-Management, ITSM, CAB, Change-Window]
last_updated: 2026-04-14
---
## Definition
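A Normal Change is a change type that carries some risk or impact. It requires approval from the Change Advisory Board (CAB) and may require cross-team coordination. Unlike a Standard Change, a Normal Change needs an explicit change window in which to execute.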
## Characteristics
| Attribute | Value |
|-----------|--------|
| Approval Required | Yes (CAB approval) |
| CAB Involvement | Mandatory |
| Automation Level | Partial or none |
| Risk Level | Medium to high (must be assessed) |
| Change Window | An explicit time window is required |
## Process Flow
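1. **Request**: The product team submits a change request
2. **Assessment**: Assess the risk and impact of the change
3. **CAB Review**: The Change Advisory Board reviews and approves
4. **Scheduling**: Schedule the change window
5. **Execution**: Execute within the change window
6. **Verification**: Verify the result of the change
7. **Closure**: Complete the change record
## Relationship with Standard Change
The ideal path for a Normal Change is to move it into the Standard Change category step by step through automation:
- Identify recurring Normal Changes
- Assess the risk and define a standardized procedure
- Automate with IaC + CI/CD
- Reclassify the Normal Change as a Standard Change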
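
The first step, identifying recurring Normal Changes, can be approximated by counting how often the same kind of change appears in the change log. The sketch below assumes a hypothetical log format in which each entry carries a `type` and a `template` label, and it uses an arbitrary threshold of three occurrences; neither the format nor the threshold comes from the source.

```python
from collections import Counter
from typing import Dict, List


def standardization_candidates(change_log: List[Dict[str, str]],
                               min_occurrences: int = 3) -> List[str]:
    """Return templates of Normal Changes that recur often enough to review.

    Each log entry is assumed to look like
    {"type": "Normal", "template": "sg-rule-update"} (hypothetical fields).
    """
    counts = Counter(
        entry["template"] for entry in change_log if entry["type"] == "Normal"
    )
    return sorted(t for t, n in counts.items() if n >= min_occurrences)
```

The result is only a shortlist: each candidate still needs the risk assessment and automation steps before it can be reclassified as a Standard Change.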
## Example Use Cases
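- Cross-account network architecture changes
- Security policy updates that require CAB approval
- Infrastructure migrations that need coordination across multiple teams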
## Sources
- [[ctp-topic-30-managing-change]]

wiki/concepts/SDDC.md

@@ -0,0 +1,35 @@
---
title: "Software-Defined Data Center (SDDC)"
type: concept
tags:
- VMware
- SDDC
- Cloud
last_updated: 2026-04-25
---
## Software-Defined Data Center (SDDC)
A data center approach where all infrastructure is virtualized and delivered as a service. In the context of VMC on AWS, the SDDC is the core deployment unit managed through vCenter.
## Definition
SDDC extends the concept of software-defined computing (hypervisor), software-defined storage, and software-defined networking to the entire data center infrastructure. VMware Cloud on AWS deploys SDDCs on AWS infrastructure, managed through vCenter Server.
## Key Characteristics
- **Virtualized Compute**: VMware vSphere runs on bare metal servers
- **Virtualized Storage**: Software-defined storage integrated with AWS storage
- **Virtualized Networking**: NSX provides software-defined networking
- **Unified Management**: vCenter Server manages the entire SDDC
- **Cloud-Native Integration**: Native access to AWS services
## SDDC Management
- **VMware Cloud Services Portal**: Web-based portal for SDDC management
- **Developer Center**: API Explorer for programmatic access
- **Follow-Me-Help**: Access to VMware engineers for assistance
## Connections
- [[VMware-Cloud-on-AWS]] ← deploys ← [[SDDC]]
- [[ctp-topic-43-vmware-cloud-on-aws]] ← source ← [[SDDC]]
## Sources
- [[ctp-topic-43-vmware-cloud-on-aws]]

wiki/concepts/Scrum.md

@@ -0,0 +1,29 @@
---
title: "Scrum"
type: concept
tags: []
sources: []
last_updated: 2026-04-24
---
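## Definition
Scrum is an agile framework that organizes work into fixed-length Sprints, each typically 1-4 weeks long.
## Core Components
- **Product Backlog:** A prioritized list of requirements
- **Sprint Planning:** The planning meeting at the start of each Sprint
- **Daily Scrum:** The daily stand-up (15 minutes or less)
- **Sprint Review:** A demonstration of the completed work at the end of the Sprint
- **Sprint Retrospective:** A review of possible improvements at the end of the Sprint
## Limitations
- No changes are allowed during a Sprint
- A fixed cadence suits stable requirements but not projects with a high rate of change, such as cloud transformation
- The cloud transformation team therefore moved to Kanban
## Relationship with Other Frameworks
- **vs [[Kanban]]:** Scrum has fixed Sprint boundaries; Kanban has no fixed iteration cycle
- **Hybrid approach:** A common enterprise practice combines Scrum ceremonies (stand-up, retrospective) with Kanban's continuous flow
## Sources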
- [[ctp-topic-4-using-agile-to-run-the-cloud-transformation-program]]


@@ -0,0 +1,56 @@
---
title: "Standard Change"
type: concept
tags: [Change-Management, ITSM, Automation, IaC, CI-CD]
last_updated: 2026-04-14
---
## Definition
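A Standard Change is a pre-approved change type that does not need approval from the Change Advisory Board (CAB). In ideal DevOps/SRE practice, standard changes are fully automated through IaC (Infrastructure as Code) and a CI/CD pipeline.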
## Characteristics
| Attribute | Value |
|-----------|--------|
| Approval Required | No (pre-approved) |
| CAB Involvement | Not required |
| Automation Level | Fully automated (IaC + CI/CD) |
| Risk Level | Low (known change, already assessed) |
| Change Window | Unrestricted |
## Implementation
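Standard changes rely on the following technology stack:
1. **IaC (Infrastructure as Code)**: Terraform/Terragrunt define infrastructure declaratively
2. **CI/CD Pipeline**: Jenkins/GitHub Actions run plan/apply automatically
3. **Automated Testing**: Infrastructure changes are validated automatically before deployment
4. **Tagging Standards**: Resources comply with the AWS tagging standard so that the Checkpoint firewall allows their traffic correctly
## Example Use Cases
- AWS tag validation (Tag Validation Tool): automatically scans resources for compliance
- Automated VPC provisioning (Topic 61): CIDR ≤ /22 is approved automatically
- Standard AMI updates (Topic 50): built and published automatically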
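
The VPC provisioning case shows how a pre-approval rule can be encoded as a check that a pipeline runs before applying a change. The source's "CIDR ≤ /22" is ambiguous, so the sketch below makes an explicit assumption: it reads the rule as "the requested IPv4 network is no larger than a /22", meaning a prefix length of 22 or more. If the intended meaning is the opposite, the comparison has to be flipped. The function name and constant are hypothetical.

```python
import ipaddress

# Assumption: "<= /22" means "no larger than a /22", i.e. prefix length >= 22.
MIN_AUTO_APPROVE_PREFIX = 22


def vpc_request_is_standard(cidr: str) -> bool:
    """Return True if a VPC CIDR request qualifies for automatic approval.

    Raises ValueError for malformed input or when host bits are set
    (for example "10.0.0.1/22"), so bad requests are never auto-approved.
    """
    network = ipaddress.ip_network(cidr, strict=True)
    return network.version == 4 and network.prefixlen >= MIN_AUTO_APPROVE_PREFIX
```

Under this reading, `10.0.0.0/22` and `10.0.4.0/24` would be handled as Standard Changes, while a request for `10.0.0.0/16` would fall through to the Normal Change process.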
## Relationship with Other Change Types
```
Normal Change ──(promoted through automation)──→ Standard Change
  CAB approval
  Carries risk and impact
Emergency Change ──(root cause fixed through CAPA)──→ Standard Change
  Executed immediately
```
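
The attribute tables for the three change types can be condensed into a small decision function. The sketch below is illustrative only: `ChangePolicy`, `classify_change`, and the two boolean inputs are hypothetical names, and giving incident-driven changes priority over pre-approval is an assumption rather than something the source states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangePolicy:
    change_type: str
    cab_approval: str   # "none", "before execution", or "reported afterwards"
    needs_capa: bool    # True when a CAPA/Post-mortem must follow


def classify_change(incident_driven: bool, pre_approved: bool) -> ChangePolicy:
    """Map two facts about a change onto the three change types."""
    if incident_driven:
        # Executed immediately; CAB is informed afterwards and CAPA is mandatory.
        return ChangePolicy("Emergency", "reported afterwards", True)
    if pre_approved:
        # Known, low-risk change that runs through IaC + CI/CD without CAB.
        return ChangePolicy("Standard", "none", False)
    # Everything else needs CAB review and a scheduled change window.
    return ChangePolicy("Normal", "before execution", False)
```

Keeping the rules in one place makes the goal measurable: the larger the share of changes that classify as `Standard`, the less manual approval work remains.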
## Goal
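Classify as many changes as possible as standard changes, using automation to reduce the manual approval burden and to increase change frequency and reliability.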
## Sources
- [[ctp-topic-30-managing-change]]


@@ -0,0 +1,37 @@
---
title: "Stretched Cluster"
type: concept
tags:
- VMware
- High-Availability
- AWS
last_updated: 2026-04-25
---
## Stretched Cluster
A cluster architecture that spans multiple availability zones, providing high availability and disaster recovery capabilities across geographic locations.
## Definition
In the context of VMware Cloud on AWS, a stretched cluster extends across availability zones (AZs) to provide increased resilience. If one AZ experiences failure, workloads can continue running in the other AZ.
## Key Characteristics
- **Cross-AZ Deployment**: Cluster nodes distributed across multiple availability zones
- **High Availability**: Automatic failover if one AZ becomes unavailable
- **Disaster Recovery**: Built-in DR capability without additional configuration
- **Low Latency**: AWS AZs are designed for low latency interconnectivity
## Use Cases
- Mission-critical applications requiring high availability
- Disaster recovery planning
- Compliance requirements for geographic redundancy
- Applications with strict RTO/RPO requirements
## Connections
- [[VMware-Cloud-on-AWS]] ← enables ← [[Stretched-Cluster]]
- [[ctp-topic-43-vmware-cloud-on-aws]] ← source ← [[Stretched-Cluster]]
- [[High-Availability]] ← implements ← [[Stretched-Cluster]]
- [[Disaster-Recovery]] ← supports ← [[Stretched-Cluster]]
## Sources
- [[ctp-topic-43-vmware-cloud-on-aws]]


@@ -0,0 +1,54 @@
---
title: "VMware Cloud on AWS (VMC on AWS)"
type: concept
tags:
- VMware
- AWS
- Hybrid-Cloud
- VMC
last_updated: 2026-04-25
---
## VMware Cloud on AWS (VMC on AWS)
A jointly engineered cloud service by VMware and AWS, where the VMware hypervisor runs natively on AWS bare metal servers. This is not simply deploying VMware software onto cloud infrastructure — it is a true joint engineering collaboration.
## Definition
VMC on AWS provides a middle ground for organizations not ready for a full native cloud migration. It allows vSphere workloads to move back and forth between on-premises and AWS cloud environments in seconds.
## Key Characteristics
- **Native Hypervisor**: VMware vSphere 8 runs natively on AWS hardware (i3.metal / i3en.metal)
- **Joint Engineering**: Not a simple software deployment; VMware and Amazon engineer the service together
- **Same Toolset**: Organizations use the same vSphere tools they use on-premises
- **AWS Service Integration**: Native access to AWS services with low latency
- **On-Demand Scalability**: Scale resources up or down as needed
- **HCX Migration**: Hybrid Cloud Extension enables any-to-any vSphere workload migration
- **27% Cost Savings**: Compared to using regular cloud compute services
## Use Cases
- Next-generation application development
- Cloud migration
- Virtual desktops (VDI)
- Disaster recovery
## Infrastructure
- **Server Hosts**: i3.metal and i3en.metal bare metal servers
- **Organization**: Clusters within availability zones and regions
- **Stretched Clusters**: Cross-AZ clusters for increased resilience
- **Management**: vCenter (same as on-premises)
## Cost Model
- VMware sells an entire host (enabling over-provisioning and cost reduction)
- Cloud economics team can perform TCO (Total Cost of Ownership) calculations
- Compare costs with on-premises or other hyperscalers
## Connections
- [[VMware]] ← provides ← [[VMware-Cloud-on-AWS]]
- [[AWS]] ← hosts ← [[VMware-Cloud-on-AWS]]
- [[HCX]] ← enables ← [[VMware-Cloud-on-AWS]] migration
- [[SDDC]] ← architecture ← [[VMware-Cloud-on-AWS]]
- [[Stretched-Cluster]] ← extends ← [[VMware-Cloud-on-AWS]] resilience
- [[Hybrid-Cloud]] ← implements ← [[VMware-Cloud-on-AWS]]
## Sources
- [[ctp-topic-43-vmware-cloud-on-aws]]