Auto-sync: 2026-04-19 06:32
31
wiki/concepts/Observability-Engineering.md
Normal file
@@ -0,0 +1,31 @@
---
title: "Observability Engineering"
type: concept
tags: [monitoring, sre]
---

## Definition
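Observability engineering is the practice of collecting, analyzing, and acting on a system's runtime data (metrics, logs, and traces) in order to continuously understand its health.
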
## Three Pillars
1. **Metrics**: numeric data, such as CPU utilization and request latency
2. **Logs**: event records that describe system activity in detail
3. **Traces**: the complete call path of a request through the system

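As a minimal sketch of how the three signals relate, the following stdlib-only Python records a metric sample, a log line, and a trace span for one simulated request. The names `handle_request`, `METRICS`, `SPANS`, and `log_stream` are hypothetical and exist only for illustration; a real deployment would ship these signals to dedicated backends rather than in-memory structures.

```python
import io
import json
import logging
import time
import uuid
from collections import defaultdict

# Hypothetical in-memory stores standing in for real backends.
METRICS = defaultdict(list)  # metric name -> list of numeric samples
SPANS = []                   # finished trace spans

# Capture log output in memory so the example is self-contained.
log_stream = io.StringIO()
logger = logging.getLogger("observability_demo")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False


def handle_request(path, trace_id=None):
    """Simulate one request and emit all three signals for it."""
    trace_id = trace_id or uuid.uuid4().hex
    start = time.perf_counter()
    # ... the real work would happen here ...
    duration = time.perf_counter() - start

    # Metric: a numeric sample that can be aggregated over time.
    METRICS["request_latency_seconds"].append(duration)
    # Log: a structured event record describing what happened.
    logger.info(json.dumps({"event": "request_done", "path": path, "trace_id": trace_id}))
    # Trace: a span that places this step on the request's call path.
    SPANS.append({"trace_id": trace_id, "name": f"GET {path}", "duration": duration})
    return trace_id


tid = handle_request("/orders")
```

Sharing one `trace_id` across the log line and the span is the design choice that matters here: it lets an engineer jump from an anomalous metric to the logs and spans of the specific requests behind it.
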
## Goal
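The goal is to know not only whether the system is running normally but also why it behaves the way it does, which enables:
- Fast problem localization
- Root cause analysis
- Proactive operations
- Capacity planning
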
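The difference between "whether" and "why" can be illustrated with a small sketch. It assumes made-up span data, an assumed latency threshold `SLO_MS`, and that span durations within a trace are sequential and simply add up; all names are hypothetical. The first step flags which requests are slow, and the second uses the spans to point at the service that dominates each slow request.

```python
from collections import defaultdict

# Hypothetical spans: (trace_id, service, duration_ms).
spans = [
    ("t1", "gateway", 5),
    ("t1", "orders", 40),
    ("t1", "database", 900),
    ("t2", "gateway", 4),
    ("t2", "orders", 35),
    ("t2", "database", 20),
]

SLO_MS = 500  # assumed latency threshold for a single request


def total_latency(spans):
    """Sum span durations per trace (assumes sequential, non-overlapping spans)."""
    totals = defaultdict(int)
    for trace_id, _service, ms in spans:
        totals[trace_id] += ms
    return dict(totals)


def slowest_service(spans, trace_id):
    """Return the service with the longest span in the given trace."""
    candidates = [(ms, service) for t, service, ms in spans if t == trace_id]
    return max(candidates)[1]


totals = total_latency(spans)
# "Whether": which requests breach the threshold.
slow = [t for t, ms in totals.items() if ms > SLO_MS]
# "Why": which service dominates each slow request.
culprits = {t: slowest_service(spans, t) for t in slow}
```
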
## Related Tools
- Prometheus: metrics collection and storage
- Grafana: visualization
- Jaeger: distributed tracing
- ELK Stack: log analysis

## Related Concepts
- [[SRE]]: site reliability engineering
- [[Monitoring]]: monitoring
- [[Alerting]]: alerting