Files
nexus/wiki/entities/Kubernetes.md
2026-04-27 16:26:34 +08:00

4.2 KiB
Raw Blame History

title, type, tags, sources, created
title type tags sources created
Kubernetes entity
cloud
container
orchestration
devops
cloud-operating-model-key-strategies-and-best-practices
2026-04-25

Kubernetes

Definition

Kubernetes (K8s) 是 Google 开源的容器编排平台,用于自动化容器化应用的部署、扩缩容和管理。是云原生 (Cloud-Native) 架构的核心基础设施,也是 Agentic AI 自主修复 (Self-Healing) 的主要目标环境。

Aliases

  • K8s
  • Kubernetes
  • Container Orchestration Platform

Major Cloud Implementations

Provider Service Description
AWS EKS (Elastic Kubernetes Service) 托管 Kubernetes on AWS
GCP GKE (Google Kubernetes Engine) 托管 Kubernetes on GCP
Azure AKS (Azure Kubernetes Service) 托管 Kubernetes on Azure

Kubernetes Self-Healing Capabilities

Kubernetes 原生提供基础 Self-Healing 能力:

# Kubernetes Self-Healing 原生机制
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
  template:
    spec:
      terminationGracePeriodSeconds: 30
# 内置机制:
# - 自动重启失败的容器
# - 替换不健康的 Pod
# - 滚动更新确保服务可用

Agentic AI 在原生能力基础上提供更高级的自我修复

能力 Kubernetes 原生 Agentic AI Enhanced
Pod 重启 自动重启崩溃容器 智能分析根因 + 预防性重启
扩缩容 HPA 基于指标 预测性扩缩容
节点恢复 节点故障迁移 主动健康检查 + 预防性迁移
配置修复 需人工介入 AI 自动修正 ConfigMap/Secret

Agentic AI Monitoring Targets

┌─────────────────────────────────────────────────┐
│              Agentic AI for Kubernetes            │
├─────────────────────────────────────────────────┤
│  监控层                                          │
│  ├── Pod Metrics (CPU/Memory/Network)           │
│  ├── Workload Health (Deployment/ReplicaSet)   │
│  ├── Node Status (Ready/Condition)              │
│  └── Cluster Components (etcd, API Server)     │
│                                                  │
│  决策层                                          │
│  ├── Anomaly Detection (AI)                    │
│  ├── Root Cause Analysis (AI)                   │
│  └── Action Planning (AI)                       │
│                                                  │
│  执行层                                          │
│  ├── kubectl API (restart/migrate/scale)       │
│  ├── HPA Override (AI-driven scaling)          │
│  └── Config Updates (AI-driven fixes)           │
└─────────────────────────────────────────────────┘

Example

An AI agent monitoring AWS EKS clusters detects high CPU usage due to a rogue pod:

  • Pod payment-service-v2-abc123 CPU usage: 95%
  • AI correlates with recent deployment timestamp
  • AI identifies: Memory leak in new version
  • AI Actions:
    1. Scale deployment to 3 replicas (distribute load)
    2. Create rollback ticket
    3. Notify team via Slack
    4. Auto-rollback after approval
  • Agentic AI — Kubernetes 是 Agentic AI 的管理对象
  • EKS, GKE, AKS — 具体云服务商实现