---
id: ctp-topic-41-nfrs-and-error-budgets
title: "CTP Topic 41 NFR's and Error Budgets"
type: source
tags: [cloud-learning, devops, sre]
date: 2026-04-14
sources:
- raw/Cloud & DevOps/Public-Cloud-Learning-Sessions/10_OpenText-Series/ctp-topic-41-nfrs-and-error-budgets.md
---

## Source File

- [[raw/Cloud & DevOps/Public-Cloud-Learning-Sessions/10_OpenText-Series/ctp-topic-41-nfrs-and-error-budgets.md]]
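
## Summary

- Core topic: applying NFRs (non-functional requirements) and error budgets in cloud and agile development
- Problem domain: how to balance rapid feature delivery against system reliability requirements
- Methods/mechanisms: SRE practices, the SLI/SLO/SLA framework, chaos engineering
- Conclusion/value: error budgets normalize failure and bridge the gap between development and operations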
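
## Key Claims

- NFRs (Non-Functional Requirements) are the criteria used to judge how a system operates; they determine attributes such as availability, performance, and security
- An error budget is the maximum amount of time a system can be unreliable without impacting customers
- Error budget = 1 - availability SLO; for example, a 99.9% SLO corresponds to a 0.1% error budget
- Chaos engineering deliberately induces failures to test system resilience and confirm that NFRs are met
- Under the AWS shared responsibility model, the enterprise must architect and manage its cloud services itself to meet its NFRs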
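
The relation "error budget = 1 - availability SLO" can be turned into a concrete downtime allowance once a measurement window is chosen. The following is a minimal sketch; the function names and the 30-day window are illustrative assumptions, not something stated in the session.

```python
from datetime import timedelta


def error_budget_fraction(slo: float) -> float:
    """Return the error budget as a fraction of the window: 1 - SLO."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    return 1.0 - slo


def allowed_downtime(slo: float, window: timedelta) -> timedelta:
    """Translate the error budget into time within a window.

    The window length is an assumption supplied by the caller;
    the source does not prescribe one.
    """
    return window * error_budget_fraction(slo)


# Hypothetical example: a 99.9% availability SLO over a 30-day window.
budget = allowed_downtime(0.999, timedelta(days=30))
```

With these assumed inputs, `budget` works out to 43 minutes and 12 seconds of permitted unreliability per 30 days.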

## Key Quotes

> "We want to drive collaboration across our product groups and operations to ensure our obligation to our customers." — Brendan Standing

> "Error budgets normalize failure as part of the development process." — Brendan Standing

> "Perfect availability is 100%, and the error budget falls between the SLO and 100%." — Brendan Standing
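
## Key Concepts

- [[NFR(非功能需求)|NFR (Non-Functional Requirements)]]: criteria for judging how a system operates, such as availability, performance, and security
- [[Error Budget(错误预算)|Error Budget]]: the amount of time a system is allowed to be unreliable without impacting customers
- [[SLI(服务等级指标)|SLI (Service Level Indicator)]]: a quantifiable measure of reliability
- [[SLO(服务等级目标)|SLO (Service Level Objective)]]: the performance/reliability target a service should meet
- [[SLA(服务等级协议)|SLA (Service Level Agreement)]]: a formal agreement at the customer level
- [[混沌工程|Chaos Engineering]]: the practice of proactively injecting failures to test system resilience
- [[SRE(站点可靠性工程)|SRE (Site Reliability Engineering)]]: a discipline that applies software engineering approaches to operations problems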
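
To make the SLI/SLO relationship concrete, one common way to express an availability SLI is the ratio of successful requests to total requests, which is then compared against the SLO target. The sketch below assumes that request-based definition; the class and function names are hypothetical and the session does not specify how the SLI is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestCounts:
    """Request totals for one measurement window (hypothetical shape)."""
    total: int
    good: int


def availability_sli(counts: RequestCounts) -> float:
    """Availability SLI as the fraction of good requests."""
    if counts.total == 0:
        # Assumption: with no traffic, treat the service as fully available.
        return 1.0
    return counts.good / counts.total


def meets_slo(counts: RequestCounts, slo: float) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return availability_sli(counts) >= slo
```

An SLA would typically sit on top of this as the contractual commitment, usually set no stricter than the internal SLO.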

## Key Entities

- [[Brendan Standing]]: head of SRE at Micro Focus; the speaker
- [[AWS]]: Amazon Web Services, a cloud service provider; operates under the shared responsibility model
- [[Micro Focus]]: software company; the organization the SRE team belongs to
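
## Connections

- [[SRE]] ← implements ← [[NFR(非功能需求)|NFR (Non-Functional Requirements)]]
- [[SRE]] ← uses ← [[Error Budget(错误预算)|Error Budget]]
- [[SLO(服务等级目标)|SLO (Service Level Objective)]] ← derives ← [[Error Budget(错误预算)|Error Budget]]
- [[SLI(服务等级指标)|SLI (Service Level Indicator)]] ← measures ← [[SLO(服务等级目标)|SLO (Service Level Objective)]]
- [[混沌工程|Chaos Engineering]] ← validates ← [[NFR(非功能需求)|NFR (Non-Functional Requirements)]]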

## Contradictions

- (None yet)
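
## Notes

- NFR Epic goal: integrate the NFR template into the sprint backlog so that every major change takes NFRs into account
- NFRs should be more standardized in the cloud, making use of cloud-native services (e.g., AWS Backup to define backup policies and test frequency)
- Monitoring capability is essential for measuring whether the error budget has been exhausted
- Next steps: work with product teams to integrate NFRs into the backlog and define SLOs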
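
The note on monitoring can be illustrated with a small check of how much of the error budget a window has already consumed. This is a sketch under assumptions: it counts failed requests rather than downtime, takes the SLO as a decimal string so the arithmetic stays exact, and uses hypothetical function names that do not come from the session.

```python
from fractions import Fraction


def budget_consumed(total: int, failed: int, slo: str) -> Fraction:
    """Fraction of the error budget used so far in the window.

    `slo` is a decimal string such as "0.999" (an assumption made here
    to avoid floating-point rounding at the exhaustion boundary).
    """
    allowed_failures = (1 - Fraction(slo)) * total
    if allowed_failures == 0:
        raise ValueError("no error budget: SLO is 100% or there is no traffic")
    return Fraction(failed) / allowed_failures


def budget_exhausted(total: int, failed: int, slo: str) -> bool:
    """True once failures have used up the whole error budget."""
    return budget_consumed(total, failed, slo) >= 1
```

A team might use a check like this to decide when to pause feature releases in favor of reliability work, though that policy is a design choice rather than something this note prescribes.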