Sync: update nexus knowledgebase content
wiki/concepts/Amazon-EKS.md
@@ -0,0 +1,68 @@
---
title: "Amazon EKS"
type: concept
tags: [AWS, Kubernetes, Managed Service, Container Orchestration]
sources: [ctp-topic-70-eks-deployment-using-iac, public-cloud-learning-sessions-eks-optimization-part-3-of-3-introduction-to-eks, ctp-topic-59-achieving-reliability-with-amazon-eks, ctp-topic-64-scaling-out-with-amazon-eks]
last_updated: 2026-04-24
---

# Amazon EKS

## Overview
Amazon Elastic Kubernetes Service (EKS) is the managed Kubernetes service from AWS. It provides a fully managed control plane and supports zero-downtime rolling deployments and automatic scaling of worker nodes.
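
## Key Features
- **Fully managed control plane**: AWS manages the availability and scaling of the Kubernetes control plane
- **Zero-downtime rolling deployments**: worker node updates roll out without downtime
- **IAM RBAC mapping**: maps IAM identities to Kubernetes RBAC so that access to the EKS cluster follows the principle of least privilege
- **Multi-AZ high availability**: the control plane is deployed across multiple Availability Zones automatically
- **AWS service integration**: native integration with VPC, CNI, IAM, CloudWatch, ALB, and others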

## Deployment Methods
### Terraform
Cluster parameters are defined in the `terragrunt.hcl` file:
- Environment variable configuration
- EKS cluster version
- Worker node type (CPU/GPU/Default)
- AWS Secrets Manager integration (engineering contact notifications)

### AWS Service Catalog
Modular deployment through product portfolios:
- Version selection
- Worker node type configuration
- Finer-grained security and permission controls

## Networking
### EMI (ENI Multi-IP)
A custom networking solution that works around VPC CIDR limits:
- Assigns additional IP addresses to Pods
- Implemented through virtual elastic network interfaces (ENIs)
- Supports higher Pod density
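
As a rough illustration of how ENI and IP limits bound Pod density, the sketch below applies the formula commonly cited for the Amazon VPC CNI in its default secondary-IP mode: `max_pods = ENIs × (IPs per ENI − 1) + 2`. The function name and the sample limits (3 ENIs with 10 IPv4 addresses each, the values usually quoted for an `m5.large`) are assumptions for illustration and do not come from the source sessions; check the limits for your own instance type and networking mode.

```python
def max_pods(enis: int, ips_per_eni: int) -> int:
    """Approximate Pod capacity of a node in secondary-IP mode.

    One address on each ENI is the ENI's own primary IP, so it is not
    available to Pods. The "+ 2" accounts for host-network Pods
    (commonly aws-node and kube-proxy), which do not consume a Pod IP.
    """
    if enis < 1 or ips_per_eni < 1:
        raise ValueError("enis and ips_per_eni must be positive")
    return enis * (ips_per_eni - 1) + 2


# Assumed limits for illustration: 3 ENIs, 10 IPv4 addresses per ENI.
print(max_pods(enis=3, ips_per_eni=10))  # 29
```

The point of the arithmetic is that Pod density is capped by network interface limits rather than by CPU or memory alone, which is why IP allocation strategy matters when sizing node groups.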

### ALB Ingress Controller
Integration with the AWS Load Balancer Controller:
- Manages Application Load Balancer resources
- Provides Layer 7 load balancing for Kubernetes Services
- Configures routing rules automatically

## Autoscaling
### Cluster Autoscaler
Kubernetes Cluster Autoscaler:
- Scales worker nodes out and in based on resource demand
- Integrates with AWS Auto Scaling groups
- Karpenter is planned for the future to provision instance types more efficiently

## Monitoring
- **CloudWatch Agent + Fluent Bit**: deployed as DaemonSets to collect logs and metrics
- **Container Insights**: publishes container-level metrics to CloudWatch
- **AWS OpenTelemetry**: a unified collection pipeline for observability data
- **Grafana**: visualizes metrics through templated dashboards

## Related Concepts
- [[Kubernetes]]: the underlying technology of EKS
- [[Infrastructure as Code]]: the recommended way to deploy EKS
- [[AWS Service Catalog]]: the Service Catalog route for deploying EKS
- [[Cluster Autoscaler]]: automatic scaling of worker nodes
- [[EMI]]: custom networking solution for EKS
- [[CloudWatch Container Insights]]: monitoring solution for EKS
- [[AWS OpenTelemetry]]: observability data collection
wiki/concepts/Cloud-Monitoring.md
@@ -0,0 +1,42 @@
---
title: Cloud Monitoring
type: concept
tags: [AWS, CloudOps, Observability, CTP, Monitoring]
date: 2026-04-14
---
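
## Definition
Cloud monitoring is the systematic practice of continuously monitoring and collecting events from data sources such as infrastructure, servers, applications, hardware, and networks in public cloud environments (AWS/Azure/GCP). Its central challenge is how dynamic the cloud is: resources are short-lived, numerous, and spread across many accounts and regions, so traditional tools built around static servers struggle to cover them.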
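
## Core Properties
- **Dynamic discovery**: resources are created and destroyed at any time, so monitoring must rely on automatic discovery rather than static configuration
- **Multi-account coverage**: a multi-account AWS Organizations setup requires centralized monitoring
- **Agentless collection**: cloud monitoring tends to pull data through APIs (such as CloudWatch) instead of installing an agent on each monitored target
- **Cross-platform support**: modern monitoring solutions need to support multiple clouds such as AWS, Azure, and GCP
- **Policy-driven**: monitoring rules are defined through Policies / Management Packs so that they can be managed at scale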
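
## Key Mechanisms
- **CloudWatch API**: the AWS metrics and logs service, which acts as the unified data source for monitoring on AWS
- **Cross-account access via IAM roles**: a role trust relationship lets the monitoring account read data from monitored accounts securely, without sharing access keys
- **Management Pack**: the policy package of a monitoring platform (such as OBM); it defines the collection interval, metrics, thresholds, and data sources
- **Global/regional tiered architecture**: regional OBM instances collect data → the global OBM aggregates it → the ticketing system triggers event handling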
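
The cross-account mechanism listed above rests on an IAM trust policy attached to the role in each monitored account. The sketch below builds such a policy document as a plain Python dictionary. The helper name `build_trust_policy` and the account ID `111122223333` are placeholders for illustration, not values from the source; a real setup would usually narrow the principal to a specific role and may add conditions such as an external ID.

```python
import json


def build_trust_policy(monitoring_account_id: str) -> dict:
    """Return a trust policy that lets the monitoring account assume a role.

    The policy is attached to a role in the *monitored* account, so no
    long-lived access keys need to be shared between accounts.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Placeholder principal: the monitoring account as a whole.
                "Principal": {"AWS": f"arn:aws:iam::{monitoring_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Hypothetical monitoring account ID, used only for illustration.
print(json.dumps(build_trust_policy("111122223333"), indent=2))
```

The monitoring platform then obtains temporary credentials by assuming this role, which is what removes the need for shared access keys.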
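
## Comparison with Traditional Monitoring
| Dimension | Traditional monitoring | Cloud monitoring |
|------|---------|--------|
| Target discovery | Added manually | Discovered automatically |
| Deployment model | Agent installed on each monitored target | API pull (agentless) |
| Account coverage | Single-point monitoring | Centralized collection across accounts |
| Scalability | Fixed capacity | Elastic, on demand |
| Credential management | Shared access keys | IAM role trust relationships |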
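
## Related Concepts
- [[Multi-Account-Deployment]]: the multi-account foundation for cloud monitoring
- [[Landing-Zone-Architecture]]: the monitoring account is part of the Landing Zone
- [[IAM-Role]]: the core mechanism for secure cross-account access
- [[Management-Pack]]: the concrete implementation of policy-based monitoring management
- [[Cloud-Native]]: cloud monitoring as a natural extension of cloud-native monitoring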

## References
- [[ctp-topic-8-obm-cloud-monitoring]]: complete implementation of OBM cloud monitoring on AWS
- [[ctp-topic-29-cloud-monitoring-saas-lz-accounts]]: monitoring account architecture for the SaaS Landing Zone
wiki/concepts/Cluster-Autoscaler.md
@@ -0,0 +1,30 @@
---
title: "Cluster Autoscaler"
type: concept
tags: [Kubernetes, Autoscaling, Cloud Native]
sources: [ctp-topic-70-eks-deployment-using-iac, ctp-topic-64-scaling-out-with-amazon-eks]
last_updated: 2026-04-24
---

# Cluster Autoscaler

## Overview
Cluster Autoscaler is a Kubernetes autoscaling component that adjusts the number of worker nodes to match resource demand, giving the infrastructure elastic capacity.
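
## How It Works
1. **Monitor resource usage**: periodically check the scheduling status of Pods
2. **Detect insufficient capacity**: a scale-up is triggered when Pods cannot be scheduled for lack of resources
3. **Call the cloud provider API**: on AWS, this means integrating with Auto Scaling groups
4. **Launch new nodes**: new EC2 instances are started in an Availability Zone
5. **Detect scale-down opportunities**: a scale-down is triggered when a node's utilization is low and its Pods can be evicted safely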
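
The steps above can be condensed into a small decision function. This is a deliberately simplified sketch, not the actual Cluster Autoscaler algorithm: the real component simulates scheduling, respects cooldowns and PodDisruptionBudgets, and can add several nodes at once. The `NodeGroup` class and `next_desired_capacity` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    """Hypothetical stand-in for an AWS Auto Scaling group."""
    name: str
    desired: int
    min_size: int
    max_size: int


def next_desired_capacity(group: NodeGroup, unschedulable_pods: int,
                          removable_nodes: int) -> int:
    """Return the desired capacity after one (simplified) autoscaler pass."""
    # Scale up: Pods are pending for lack of resources; stay within max_size.
    if unschedulable_pods > 0:
        return min(group.desired + 1, group.max_size)
    # Scale down: nothing is pending and at least one underused node can
    # have its Pods evicted safely; stay within min_size.
    if removable_nodes > 0:
        return max(group.desired - 1, group.min_size)
    return group.desired


group = NodeGroup(name="workers", desired=2, min_size=1, max_size=3)
print(next_desired_capacity(group, unschedulable_pods=4, removable_nodes=0))  # 3
print(next_desired_capacity(group, unschedulable_pods=0, removable_nodes=1))  # 1
```

On AWS, applying the returned value would correspond to updating the desired capacity of the Auto Scaling group, which then launches or terminates EC2 instances.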

## AWS Integration
- Integrates closely with AWS Auto Scaling groups
- Supports multiple Auto Scaling groups
- Chooses which Auto Scaling group to expand, and therefore which instance type, based on the requirements of pending Pods

## Related Concepts
- [[Amazon EKS]]: the target platform on which Cluster Autoscaler is deployed
- [[Kubernetes]]: Cluster Autoscaler is a Kubernetes project component, installed as a cluster add-on
- [[Horizontal Pod Autoscaler (HPA)]]: horizontal scaling at the Pod level (HPA scales Pods, CA scales nodes)
- [[Vertical Pod Autoscaler (VPA)]]: vertical scaling at the Pod level
wiki/concepts/EKS-Auto-Mode.md
@@ -0,0 +1,35 @@
---
title: "EKS Auto Mode"
type: concept
tags: []
last_updated: 2025-03-04
---
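
## Summary
EKS Auto Mode is a semi-managed compute mode for Amazon EKS that shifts lifecycle management of the Kubernetes data plane (compute nodes) from the user to AWS. Users only need to handle VPC configuration, cluster configuration, and workload configuration; AWS takes care of node provisioning, OS patching, security updates, and rolling upgrades automatically.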
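
## Definition
A semi-managed compute option for Amazon EKS. It uses the Karpenter controller to manage the node pool lifecycle automatically, including instance provisioning, operating system (Bottlerocket) maintenance, security patching, and version upgrades. It is compatible with all Kubernetes-conformant workloads and requires no changes to application code.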
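
## Key Components
- **Karpenter Controller**: the compute controller, running inside the cluster; it manages the node pool lifecycle, AMI versions, and rolling-upgrade orchestration
- **Bottlerocket OS**: a minimal, container-focused Linux operating system developed by Amazon; Auto Mode uses it as the node OS and applies security patches automatically
- **Default Node Pools**: two built-in node pools (General Purpose, locked to AMD64, and System, which carries a taint); both have a weight of zero so that custom pools can take priority
- **Core Capabilities**: compute (Karpenter), networking (AWS Load Balancer Controller), storage (EBS CSI), and security (Pod Identity Associations)
- **Prefix Delegation**: a VPC CNI feature that assigns /28 IP prefix blocks for Pods; enabled by default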
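
To make the `/28` figure concrete, the short sketch below uses Python's standard `ipaddress` module to count the addresses in one delegated prefix. The CIDR `10.0.1.16/28` and the function name are arbitrary examples chosen for illustration, not values from the source.

```python
import ipaddress


def addresses_in_prefix(cidr: str) -> int:
    """Return the number of IPv4 addresses contained in a CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses


# Example prefix, chosen arbitrarily: a /28 leaves 4 host bits, so 2**4 addresses.
print(addresses_in_prefix("10.0.1.16/28"))  # 16
```

Each delegated prefix therefore supplies 16 Pod addresses in a single allocation, instead of one address per allocation.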
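
## Mechanism
1. Once the user enables Auto Mode, AWS deploys the Karpenter controller in the cluster (as a core capability)
2. Karpenter watches for control plane version changes and identifies the AMI that matches the new version
3. After the control plane upgrade completes, Karpenter triggers a rolling AMI upgrade of the nodes
4. A 12% premium on instance cost covers the automated management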
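
## Trade-offs
- **Pros**: greatly reduces the Kubernetes operations burden; automates OS security patching; automates version upgrades
- **Cons**: each Auto Mode instance carries a 12% management premium; node configuration is less flexible; bare-metal instances are not supported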
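
The cost side of the trade-off is simple arithmetic. The sketch below applies the 12% premium quoted on this page to an hourly instance price; the base price of `0.096` is a made-up figure for illustration, and actual Auto Mode charges should be taken from current AWS pricing.

```python
from decimal import Decimal

# Premium as quoted on this page; treat it as illustrative, not a price list.
AUTO_MODE_PREMIUM = Decimal("0.12")


def effective_hourly_cost(base_hourly_price: Decimal) -> Decimal:
    """Return the hourly price including the Auto Mode management premium."""
    return base_hourly_price * (Decimal("1") + AUTO_MODE_PREMIUM)


# Hypothetical base price of 0.096 per instance-hour.
print(effective_hourly_cost(Decimal("0.096")))  # 0.10752
```

`Decimal` is used instead of floats so that the result is exact, which matters when the per-hour figure is multiplied across many nodes and hours.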

## Related Concepts
- [[Carpenter Controller]]: the compute controller (Karpenter) behind EKS Auto Mode
- [[Bottlerocket OS]]: the default operating system in Auto Mode
- [[Pod Identity Associations]]: the Pod-level IAM permission mechanism in Auto Mode
- [[Kubernetes]]: the standard container orchestration platform that EKS Auto Mode builds on
@@ -1,51 +1,37 @@
# Infrastructure as Code (IaC)
---
title: "Infrastructure as Code"
type: concept
tags: [DevOps, Automation, Configuration Management]
sources: [ctp-topic-70-eks-deployment-using-iac, learning-sessions-ecs-deployment-using-iac-20230808-183322-meeting-recording, ctp-topic-16-cross-account-terraform-modules, ctp-topic-12-using-ses-smtp-service-terraform-module, learning-sessions-cloud-transformation-programme-deploying-rds-via-terraform]
last_updated: 2026-04-24
---

## Definition
Infrastructure as Code is the practice of managing and provisioning infrastructure through machine-readable configuration files rather than manual processes.
# Infrastructure as Code

## Key Principles
- **Version Control**: All infrastructure configurations are stored in version control
- **Idempotency**: Running the same configuration produces the same result
- **Automation**: Infrastructure provisioning is automated and repeatable
- **Documentation**: Code serves as documentation
## Overview
Infrastructure as Code (IaC) is an approach that defines and manages infrastructure through code, making deployments standardized, auditable, and repeatable.

## Tools
- **Terraform**: Multi-cloud IaC tool using HCL
- **Ansible**: Configuration management and orchestration
- **CloudFormation**: AWS-native infrastructure provisioning
- **CloudFormation StackSets**: AWS-native cross-account/cross-region deployment extension for CloudFormation
- **Pulumi**: IaC using general-purpose programming languages
- **Terragrunt**: Thin wrapper for Terraform that keeps configurations DRY and organized across environments
## Core Principles
- **Declarative configuration**: define the desired state rather than the specific steps to execute
- **Version control**: all infrastructure configuration is kept under Git version control
- **Automated deployment**: deployments run automatically through CI/CD pipelines
- **Idempotency**: re-applying the same configuration produces no further side effects
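
The declarative and idempotent principles can be illustrated with a toy reconciler: it compares a desired state with the actual state and changes only what differs, so a second run has nothing left to do. This is a conceptual sketch and not how Terraform or CloudFormation are implemented; the `apply` function and the resource names are invented for the example.

```python
def apply(desired: dict, actual: dict) -> list:
    """Reconcile `actual` toward `desired` in place and return the changes made."""
    changes = []
    # Create or update every resource whose spec differs from the desired one.
    for name, spec in desired.items():
        if actual.get(name) != spec:
            actual[name] = spec
            changes.append(f"upsert {name}")
    # Remove resources that exist but are no longer declared.
    for name in [n for n in actual if n not in desired]:
        del actual[name]
        changes.append(f"delete {name}")
    return changes


# Hypothetical resources, for illustration only.
desired = {"vpc": {"cidr": "10.0.0.0/16"}, "cluster": {"version": "1.29"}}
actual = {"legacy-bucket": {}}

print(apply(desired, actual))  # first run: creates and deletes resources
print(apply(desired, actual))  # second run: []
```

Because the function describes the end state rather than a sequence of steps, it can be run from a CI/CD pipeline as often as needed without drifting.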

## Best Practices
1. Use modules for reusable components
2. Separate state management (remote state with locking)
3. Implement proper access controls
4. Use workspaces for environment separation
5. Enable drift detection
6. Implement automated testing for IaC
## Key Tools
- **Terraform**: HashiCorp's infrastructure orchestration tool, with multi-cloud support
- **AWS CloudFormation**: the AWS-native IaC service
- **AWS Service Catalog**: the AWS service catalog, which packages standardized product portfolios
- **Pulumi**: defines infrastructure in general-purpose programming languages (Python, TypeScript, and others)

## IaC Across DevOps Maturity Levels

| Maturity | IaC Maturity |
|----------|-------------|
| Phase 1 | Manual infrastructure management, servers managed individually, error-prone and slow |
| Phase 2 | Version control used for environments and configurations, but provisioning still manual |
| Phase 3 | Most infrastructure automated, provisioning repeatable and reliable |
| Phase 4 | Immutable infrastructure: old servers are replaced rather than updated, managed through CI/CD pipelines |
| Phase 5 | Full automation, zero human intervention, infrastructure changes flow through automated pipelines |

## Sources
- [[sources/cloud-devop-maturity-guideline.md]]
- [[sources/devops-maturity-model-from-traditional-it-to-advanced-devops.md]]
## Key Concepts
- **HCL (HashiCorp Configuration Language)**: Terraform's configuration language
- **State Management**: Terraform tracks resources in a state file
- **Modules**: reusable infrastructure components
- **Remote State**: remote state storage that supports team collaboration

## Related Concepts
- [[concepts/DevOps-Maturity]]
- [[concepts/CI-CD-Pipeline]]
- [[concepts/GitOps]]
- [[concepts/Scalability]]
- [[concepts/Cloud-Native]]

## Ingested
- Date: 2026-04-21
- Date: 2026-04-24 (updated with maturity level progression)
- [[Terraform]]: one of the most popular IaC tools
- [[AWS Service Catalog]]: the AWS IaC product catalog
- [[GitOps]]: a Git-based operations methodology
- [[CI/CD Pipeline]]: automated deployment pipeline
- [[DevOps Culture]]: IaC is a core part of DevOps practice
wiki/concepts/Kubernetes.md
@@ -0,0 +1,43 @@
---
title: "Kubernetes"
type: concept
tags: [Container Orchestration, Cloud Native, Distributed Systems]
sources: [ctp-topic-70-eks-deployment-using-iac]
last_updated: 2026-04-24
---
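
# Kubernetes

## Overview
Kubernetes (K8s) is a container orchestration platform open-sourced by Google for running distributed systems resiliently. It provides core capabilities such as automated deployment, scaling, load balancing, rolling updates, and rollbacks.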
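
## Key Features
- **Automated rollouts and rollbacks**: manages application versions automatically from declarative configuration
- **Service discovery and load balancing**: built-in DNS and load-balancing mechanisms
- **Self-healing**: restarts failed containers automatically and reschedules Pods away from unhealthy nodes
- **Horizontal scaling**: adjusts the number of Pods automatically based on CPU/memory metrics
- **Storage orchestration**: supports multiple storage backends (AWS EBS, NFS, Ceph, and others)
- **Secret and configuration management**: manages sensitive data and configuration without rebuilding images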
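
Horizontal scaling is driven by a ratio between the observed metric and its target. The sketch below implements the core formula documented for the Horizontal Pod Autoscaler, `desired = ceil(current_replicas × current_metric / target_metric)`. It leaves out the tolerance band, stabilization windows, and min/max replica bounds that the real controller applies, and the function name is an assumption made for illustration.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Core HPA calculation, without tolerance or min/max bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)


# 4 Pods averaging 90% CPU against a 60% target: scale out.
print(desired_replicas(4, current_metric=90, target_metric=60))   # 6
# 3 Pods averaging 50% CPU against a 100% target: scale in.
print(desired_replicas(3, current_metric=50, target_metric=100))  # 2
```

Rounding up means the autoscaler errs on the side of slightly more capacity rather than running above the target.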

## Architecture
- **Control Plane**: the master nodes, comprising the API Server, Scheduler, Controller Manager, and etcd
- **Worker Nodes**: the nodes that run Pods, comprising kubelet, kube-proxy, and the container runtime
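
## Key Concepts
- **Pod**: the smallest deployable unit; a Pod can contain one or more containers
- **Deployment**: declarative updates; manages the Pod replica count and the rolling update strategy
- **Service**: a stable network endpoint that routes traffic to Pods through label selectors
- **Ingress**: manages HTTP/HTTPS routing and provides Layer 7 load balancing
- **ConfigMap/Secret**: store configuration and sensitive data
- **Namespace**: resource isolation and access control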
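
Label selectors are what tie a Service to its Pods. As a minimal sketch of equality-based matching, the function below returns the Pods whose labels contain every key/value pair in the selector. The Pod names and labels are invented for the example, and set-based selectors (`in`, `notin`) are not covered.

```python
def select_pods(pods: dict, selector: dict) -> list:
    """Return sorted names of Pods whose labels satisfy every selector pair."""
    return sorted(
        name
        for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Hypothetical Pods and labels, for illustration only.
pods = {
    "web-1": {"app": "web", "tier": "frontend"},
    "web-2": {"app": "web", "tier": "frontend"},
    "db-1": {"app": "db", "tier": "backend"},
}

print(select_pods(pods, {"app": "web"}))  # ['web-1', 'web-2']
```

Because matching is by label rather than by Pod name or IP, Pods can be replaced during a rolling update without the Service definition changing.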

## Related Concepts
- [[Amazon EKS]]: the managed Kubernetes service from AWS
- [[Cluster Autoscaler]]: the Kubernetes node autoscaling component
- [[Infrastructure as Code]]: used to manage Kubernetes resources declaratively
- [[Cloud-Native]]: the core philosophy behind cloud-native applications
- [[Container(容器)]]: the basic runtime unit inside a Pod

## Related Entities
- [[Google]]: the company that created Kubernetes (donated to the CNCF in 2015)
- [[CNCF]]: the open-source foundation that hosts the Kubernetes project
wiki/concepts/Management-Pack.md
@@ -0,0 +1,37 @@
---
title: Management Pack
type: concept
tags: [Micro-Focus, OBM, Monitoring, Policy, AWS]
date: 2026-04-14
---
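
## Definition
A Management Pack is the policy package mechanism of enterprise monitoring platforms such as Micro Focus Operations Bridge Manager (OBM). It uses declarative configuration to define monitoring targets, collection intervals, specific metrics, threshold rules, and data sources, so that monitoring policy can be managed uniformly and at scale in cloud environments.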
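
## Core Properties
- **Declarative configuration**: monitoring rules are defined as Policies instead of being configured by hand one by one
- **Automatic discovery**: new resources are matched to a Policy automatically, with no manual intervention
- **Threshold-driven**: thresholds are built into the Policy, and a metric that crosses its threshold raises an event automatically
- **Cross-platform abstraction**: the same Management Pack framework can be adapted to AWS, Azure, GCP, and other clouds
- **Live updates**: Policy changes are pushed to the Agent automatically, without restarting services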
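
## How It Works (OBM AWS Management Pack)
1. **Policy definition**: create an AWS Management Pack Policy in the OBM console. The settings include:
   - Role ARN (the cross-account IAM role)
   - Account ID (the monitored account)
   - Namespace/Service (the AWS service to monitor, such as EC2/RDS/Lambda)
   - Metrics (the list of specific metrics)
   - Thresholds (threshold rules)
   - Monitoring Frequency (collection frequency)
   - Title Format (the alert title format, used by the service desk team)
2. **Agent execution**: after receiving the Policy, the Operations Agent calls the CloudWatch API as configured to collect data
3. **Event triggering**: data is compared against the thresholds; a breach generates an event and raises a ticket through SMACKS
4. **Automatic onboarding**: when new instances appear in a monitored account, the Agent detects them and brings them into monitoring scope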
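
The threshold comparison in step 3 can be sketched in a few lines. This is a conceptual model only and does not use OBM's real policy format or API: the `Policy` class, its field names, and the `evaluate` function are hypothetical, and the datapoints are supplied directly rather than fetched from CloudWatch.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical, simplified view of one monitoring rule."""
    namespace: str      # AWS service namespace, e.g. "AWS/EC2"
    metric: str         # metric name to evaluate
    threshold: float    # an event is raised when a value exceeds this
    title_format: str   # alert title template for the service desk


def evaluate(policy: Policy, datapoints: dict) -> list:
    """Return one alert title per resource whose value exceeds the threshold."""
    events = []
    for resource_id, value in datapoints.items():
        if value > policy.threshold:
            events.append(policy.title_format.format(
                metric=policy.metric, resource=resource_id, value=value))
    return events


policy = Policy(namespace="AWS/EC2", metric="CPUUtilization", threshold=80.0,
                title_format="{metric} high on {resource}: {value}")

# Made-up datapoints keyed by instance ID.
print(evaluate(policy, {"i-a": 91.5, "i-b": 40.0}))
# ['CPUUtilization high on i-a: 91.5']
```

Keeping the title template inside the policy mirrors the Title Format setting above: the service desk sees consistent alert titles regardless of which account or resource raised the event.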

## Related Concepts
- [[Cloud-Monitoring]]: Management Pack is the core tool for policy-based cloud monitoring management
- [[IAM-Role]]: Management Pack uses IAM roles for secure cross-account access
- [[Cloud-Monitoring]]: Management Pack addresses the central challenge of monitoring dynamic cloud environments

## References
- [[ctp-topic-8-obm-cloud-monitoring]]: complete hands-on walkthrough of the AWS Management Pack