nexus/wiki/concepts/Observability-Engineering.md
2026-04-19 06:32:15 +08:00

---
title: "Observability Engineering"
type: concept
tags: [monitoring, sre]
---
## Definition
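Observability engineering is the ability to continuously understand a system's health by collecting, analyzing, and using its runtime data (metrics, logs, and traces).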
## Three Pillars
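1. **Metrics**: numeric data, such as CPU utilization and request latency
2. **Logs**: event records that describe system activity in detail
3. **Traces**: the complete call path of a request through the system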
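
To make the distinction between the three pillars concrete, here is a minimal sketch that emits all three signals for a single simulated request, using only the Python standard library. The in-memory stores, the `handle_request` and `record_span` helpers, and the metric name `request_latency_ms` are hypothetical names chosen for illustration; a real system would export this data to a backend instead.

```python
import json
import time
import uuid
from collections import defaultdict

# Hypothetical in-memory stores (assumption for illustration only);
# a real system would ship these signals to a metrics, log, or trace backend.
metrics = defaultdict(list)   # metric name -> list of observed values
logs = []                     # structured log lines (JSON strings)
spans = []                    # finished trace spans


def record_span(trace_id, name, start, end, parent=None):
    """Trace: store one timed step of a request, linked by trace_id."""
    spans.append({
        "trace_id": trace_id,
        "name": name,
        "parent": parent,
        "duration_ms": (end - start) * 1000,
    })


def handle_request(path):
    """Handle a simulated request and emit a metric, a log, and two spans."""
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()

    db_start = time.perf_counter()
    time.sleep(0.01)  # stand-in for a database call
    record_span(trace_id, "db.query", db_start, time.perf_counter(),
                parent="handle_request")

    end = time.perf_counter()
    record_span(trace_id, "handle_request", start, end)

    # Metric: a numeric sample that can be aggregated over time.
    metrics["request_latency_ms"].append((end - start) * 1000)

    # Log: a discrete event with context, tied to the trace by trace_id.
    logs.append(json.dumps({
        "event": "request_handled",
        "path": path,
        "trace_id": trace_id,
    }))
    return trace_id


trace_id = handle_request("/checkout")
```

The shared `trace_id` is what lets the log line and the spans be joined later: the metric shows that latency changed, the log shows what happened, and the spans show where in the call path the time was spent.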
## Goal
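The goal is not only to know "whether the system is running normally" but also to understand "why the system behaves the way it does", enabling:
- Fast problem localization
- Root cause analysis
- Proactive operations
- Capacity planning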
## Related Tools
- Prometheus: metrics collection and storage
- Grafana: visualization
- Jaeger: distributed tracing
- ELK Stack: log analysis
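
As an illustration of how metrics reach a tool like Prometheus, the sketch below renders a counter in the Prometheus text exposition format, which Prometheus scrapes over HTTP. The `render_counter` helper and the metric and label names are hypothetical, and the sketch does not escape special characters in label values; in practice an official Prometheus client library would normally handle this.

```python
def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    `samples` maps a tuple of (label_name, label_value) pairs to a value.
    Note: label values are not escaped here (simplifying assumption).
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts, keyed by label set.
samples = {
    (("method", "GET"), ("status", "200")): 42,
    (("method", "GET"), ("status", "500")): 3,
}
output = render_counter("http_requests_total",
                        "Total HTTP requests handled.", samples)
print(output, end="")
```

Each label combination becomes its own time series, which is what lets a dashboard in a tool such as Grafana break the same counter down by status code or method.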
## Related Concepts
- [[SRE]]: Site Reliability Engineering
- [[Monitoring]]
- [[Alerting]]