---
title: "CTP Topic 59 Achieving reliability with Amazon EKS"
type: source
tags: [AWS, EKS, Kubernetes, Reliability, CTP]
date: 2026-04-14
---
## Source File
- [[raw/Cloud & DevOps/Public-Cloud-Learning-Sessions/04_EKS/ctp-topic-59-achieving-reliability-with-amazon-eks.md]]
## Summary
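- Core topic: reliability practices on Amazon EKS, covering container service selection, the shared responsibility model, and a three-layer reliability design
- Problem domain: how to build a highly reliable Kubernetes cluster on EKS
- Methods/mechanisms: application reliability (Pod distribution, HPA, VPA, probes); control plane reliability (monitoring, authentication, cluster upgrades); data plane reliability (node problem detection, resource reservation, QoS)
- Conclusion/value: EKS reliability has to be addressed across the application, control plane, and data plane layers; AWS and the customer divide the work according to the shared responsibility model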
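
The HPA mechanism listed above can be illustrated with the replica calculation described in the upstream Kubernetes documentation. The talk itself does not walk through this formula, so treat the following as a minimal sketch: the function name, the `min_replicas`/`max_replicas` defaults, and the sample numbers are illustrative assumptions, and the real controller also handles missing metrics, pod readiness, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     tolerance=0.1, min_replicas=1, max_replicas=10):
    """Approximate the HPA scaling rule:
    desired = ceil(current_replicas * current_metric / target_metric).

    tolerance mirrors the documented default of 0.1: if the ratio is
    within 10% of 1.0, no scaling happens. min_replicas/max_replicas
    are illustrative defaults, not values from the source.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds, as the HPA spec does.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 Pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

Because the result is rounded up, the autoscaler errs on the side of extra capacity, which fits the reliability focus of this topic.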
## Key Claims
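- ECS suits users who are new to containers; EKS suits users already familiar with the Kubernetes ecosystem
- Reliability means that a system still behaves predictably when failures occur
- AWS manages the control plane (API Server, etcd, Scheduler, Controller Manager); the customer is responsible for the data plane (worker nodes, OS, application configuration)
- With Fargate, the customer does not have to manage nodes or handle patching and upgrades
- Application reliability is achieved through Pod anti-affinity, topology spread constraints, HPA/VPA, probes, and Pod disruption budgets
- Control plane reliability is achieved by monitoring control plane metrics, secure authentication, carefully configured webhooks, and cluster upgrades
- Data plane reliability is achieved through the node problem detector, system resource reservation, and QoS resource quotas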
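
To make the QoS point in the data plane claim more concrete, the sketch below shows how Kubernetes derives a Pod's QoS class from its containers' CPU and memory requests and limits, following the rules in the upstream Kubernetes documentation. This is not taken from the talk. The function name and the dict layout are assumptions for illustration, and quantities are compared as plain strings, so `"500m"` and `"0.5"` would be treated as different here even though Kubernetes considers them equal.

```python
def qos_class(containers):
    """Classify a Pod as Guaranteed, Burstable, or BestEffort.

    containers: list of dicts shaped like the `containers` entries in
    a Pod spec (only the `resources` key is inspected).
    """
    any_set = False      # True once any CPU/memory request or limit is seen
    guaranteed = True    # stays True only if every request equals its limit
    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(name, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pod = [{"resources": {"requests": {"cpu": "500m", "memory": "256Mi"},
                          "limits": {"cpu": "500m", "memory": "256Mi"}}}]
    print(qos_class(pod))
```

Under node memory pressure, BestEffort Pods are generally evicted first and Guaranteed Pods last, so setting requests equal to limits on critical workloads is one way to protect them.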
## Key Quotes
> "Reliability in a system means it offers predictable behavior even when failures occur." — Surav Paul
> "ECS is a more AWS opinionated way of running containers." — Surav Paul
> "With Fargate, you don't have to worry about managing the nodes or worrying about patching or upgrading the nodes." — Surav Paul
## Key Concepts
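- [[EKS 可靠性|EKS reliability]]: the system still behaves predictably when failures occur
- [[共享责任模型|Shared responsibility model]]: AWS manages the control plane; the customer is responsible for the data plane and applications
- [[Pod 反亲和性|Pod anti-affinity]]: keeps Pods from being placed on the same node or in the same Availability Zone
- [[拓扑分布约束|Topology spread constraints]]: fine-grained control over how Pods are distributed across Availability Zones
- [[HPA]]: Horizontal Pod Autoscaler; scales the number of Pods automatically based on CPU/memory
- [[VPA]]: Vertical Pod Autoscaler; automatically adjusts Pod resource requests
- [[探针|Probes]]: liveness, readiness, and startup probes used for Pod health checks
- [[Pod 中断预算|Pod disruption budget]]: ensures a minimum service level is maintained during maintenance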
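
Several of the concepts above (topology spread constraints, probes, and the Pod disruption budget) can be combined in one workload definition. The following is a hedged sketch that builds the manifests as Python dicts and prints them as JSON. The app name, image, port `8080`, probe paths, and replica counts are all hypothetical placeholders and do not come from the talk.

```python
import json


def build_manifests(app="web", image="example.com/web:1.0",
                    replicas=3, min_available=2):
    """Return a Deployment and a PodDisruptionBudget as plain dicts.

    All argument defaults are hypothetical values for illustration.
    """
    labels = {"app": app}

    def http_probe(path, period_seconds, failure_threshold=3):
        # Port 8080 and the paths passed in are assumed, not prescribed.
        return {"httpGet": {"path": path, "port": 8080},
                "periodSeconds": period_seconds,
                "failureThreshold": failure_threshold}

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Spread Pods evenly across Availability Zones.
                    "topologySpreadConstraints": [{
                        "maxSkew": 1,
                        "topologyKey": "topology.kubernetes.io/zone",
                        "whenUnsatisfiable": "DoNotSchedule",
                        "labelSelector": {"matchLabels": labels},
                    }],
                    "containers": [{
                        "name": app,
                        "image": image,
                        # Startup probe gives slow-starting apps time to boot.
                        "startupProbe": http_probe("/healthz", 5, 30),
                        "livenessProbe": http_probe("/healthz", 10),
                        "readinessProbe": http_probe("/ready", 5),
                    }],
                },
            },
        },
    }
    pdb = {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app}-pdb"},
        # Keep at least min_available Pods up during voluntary disruptions.
        "spec": {"minAvailable": min_available,
                 "selector": {"matchLabels": labels}},
    }
    return [deployment, pdb]


if __name__ == "__main__":
    manifest_list = {"apiVersion": "v1", "kind": "List",
                     "items": build_manifests()}
    print(json.dumps(manifest_list, indent=2))
```

The Pod disruption budget uses the same label selector as the Deployment, so node drains during maintenance or cluster upgrades should leave at least `min_available` Pods running.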
## Key Entities
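- [[Surav Paul]]: AWS Senior Solutions Architect and the speaker for this topic
- [[AWS]]: public cloud platform that provides the EKS service
- [[EKS]]: Elastic Kubernetes Service, AWS's managed Kubernetes service
- [[ECS]]: Elastic Container Service, AWS's container service
- [[Fargate]]: AWS's serverless container runtime environment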
## Connections
- [[EKS]] ← uses [[共享责任模型|shared responsibility model]] ← [[AWS]]
- [[Surav Paul]] ← presents [[CTP Topic 59 Achieving reliability with Amazon EKS]]
- [[CTP Topic 59 Achieving reliability with Amazon EKS]] ← depends on [[EKS]]
- [[CTP Topic 70 EKS Deployment using IAC]] ← related topic
## Contradictions
- (none yet)