---
title: "Autoscaling"
type: concept
tags: [sre, cloud, scalability, reliability, kubernetes]
last_updated: 2026-04-20
---
# Autoscaling
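Autoscaling is the mechanism in cloud-native systems that automatically adjusts resource capacity in response to load, but it differs fundamentally from true elasticity.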
## Definition
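Autoscaling automatically adds or removes compute resources according to predefined rules (such as CPU utilization or request queue length). It is a **passive, reactive** mechanism.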
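
One common shape for such a predefined rule is the proportional formula documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. The sketch below is illustrative only; the function name `desired_replicas` and its parameters are hypothetical and not taken from any library.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Proportional rule: scale the replica count by observed/target, rounded up.

    The rule is reactive: it only responds to a metric value that has
    already been observed, never to load that is about to arrive.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)
```

For example, four replicas observing 90% CPU against a 60% target yield a proposal of six replicas. Nothing in this rule bounds the result or checks whether the metric can be trusted.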
## Key Limitation
> "Autoscaling is reactive, not resilient. Without caps, metrics, or overrides, it can worsen failures." — David Iyanu Jonathan
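Without the following safeguards, autoscaling can **worsen failures**:
- **Caps**: prevent unbounded scale-out
- **Metrics**: ensure scaling decisions rest on reliable data
- **Overrides**: allow human intervention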
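
The three safeguards can be sketched as a thin wrapper around whatever replica count the scaling rule proposes. Everything here is an assumption made for illustration: the `Guardrails` fields, the use of sample age as a stand-in for metric reliability, and the choice to let an operator override win over the caps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guardrails:
    min_replicas: int                      # floor for scale-in
    max_replicas: int                      # cap for scale-out
    max_metric_age_s: float                # older samples count as unreliable
    manual_override: Optional[int] = None  # operator-pinned replica count


def guarded_replicas(proposed: int, current: int, metric_age_s: float, g: Guardrails) -> int:
    """Apply override, metric-freshness, and cap checks to a proposed replica count."""
    # Override: a human decision takes precedence over the automatic rule.
    if g.manual_override is not None:
        return g.manual_override
    # Metrics: with stale data, hold the current size instead of acting on it.
    if metric_age_s > g.max_metric_age_s:
        return current
    # Caps: clamp the proposal into [min_replicas, max_replicas].
    return max(g.min_replicas, min(proposed, g.max_replicas))
```

With `Guardrails(min_replicas=2, max_replicas=10, max_metric_age_s=60.0)`, a proposal of 50 replicas is clamped to 10, and the same proposal based on a five-minute-old sample leaves the current size unchanged.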
## Autoscaling vs. Elasticity
| Aspect | Autoscaling | [[Elasticity]] |
|--------|-------------|----------------|
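| Nature | Passive, reactive | Proactive, forward-looking |
| Trigger | Metric thresholds | Policy and planning |
| Safeguards | May be missing | Required |
| Behavior during failure | May worsen the failure | Designed to keep failures from spreading |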
## Anti-Patterns
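- **Autoscaling to Death**: the system scales out without bound during a load peak and exhausts resources
- **No Upper Limits**: missing caps cause costs to explode
- **Metrics Blindness**: relying on a single metric while ignoring overall system health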
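
A toy simulation makes the first two anti-patterns concrete. It assumes, purely for illustration, that the observed metric stays pinned above its target no matter how many replicas exist (for instance because the real bottleneck sits outside the scaled tier), and it applies a proportional scaling rule at every step. The function `simulate` and its parameters are hypothetical.

```python
import math
from typing import List, Optional


def simulate(steps: int, start: int, metric: float, target: float,
             max_replicas: Optional[int] = None) -> List[int]:
    """Return the replica count after each step while the metric never improves."""
    history = []
    replicas = start
    for _ in range(steps):
        # Proportional rule: the metric is still high, so scale out again.
        replicas = math.ceil(replicas * metric / target)
        # An optional cap is the only thing that stops the growth.
        if max_replicas is not None:
            replicas = min(replicas, max_replicas)
        history.append(replicas)
    return history


uncapped = simulate(5, start=2, metric=90.0, target=60.0)
capped = simulate(5, start=2, metric=90.0, target=60.0, max_replicas=10)
```

Under these assumptions the uncapped run grows by roughly half at every step and never converges, while the capped run levels off at 10 replicas.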
## Best Practices
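1. Set sensible upper limits for scale-out and lower limits for scale-in
2. Configure multi-dimensional metrics (not just CPU)
3. Establish a manual override mechanism
4. Test scaling policies in non-production environments
5. Monitor the behavior of autoscaling itself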
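
Practices 2 and 5 can be sketched in a few lines. The first function combines several metrics by taking the largest per-metric proposal, which is how the Kubernetes Horizontal Pod Autoscaler is documented to handle multiple metrics. The second counts direction reversals in the replica history as one possible signal that the autoscaler is flapping. Both function names and the metric names in the example are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def replicas_from_metrics(current: int, metrics: Dict[str, Tuple[float, float]]) -> int:
    """metrics maps a name to (observed, target); return the largest proposal."""
    if not metrics:
        raise ValueError("at least one metric is required")
    return max(
        math.ceil(current * observed / target)
        for observed, target in metrics.values()
    )


def count_reversals(history: List[int]) -> int:
    """Count how often scaling switches direction (up then down, or down then up)."""
    reversals = 0
    last_direction = 0  # 0 means no scaling event has been seen yet
    for prev, cur in zip(history, history[1:]):
        if cur == prev:
            continue  # no scaling event between these two samples
        direction = 1 if cur > prev else -1
        if last_direction and direction != last_direction:
            reversals += 1
        last_direction = direction
    return reversals
```

For instance, with four replicas, CPU at half its target, and queue length at twice its target, `replicas_from_metrics` proposes eight replicas, where a CPU-only rule would have proposed scaling in to two. A high reversal count over a short window suggests the thresholds or cooldowns need review.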
## Related Concepts
- [[Elasticity]]
- [[Scalability]]
- [[Cluster-Autoscaler]]
- [[Cost-Optimization]]
## Source
- SRE Weekly Issue #513 — [[sre-weekly-issue-513]]