---
title: "Monitoring System"
type: concept
tags: [monitoring, prometheus, grafana, metrics]
last_updated: 2025-11-11
---
## Definition
A monitoring system is a complete solution for collecting, visualizing, and alerting on system and application metrics.
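## Core Components
| Component | Purpose |
|-----------|---------|
| Prometheus | Time-series database; scrapes and stores metrics |
| Grafana | Visualization dashboards and alert management |
| Alertmanager | Alert routing (email/Slack/webhook) |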
## Data Collection Layer (Exporters)
| Exporter | What it collects | Port |
|----------|------------------|------|
| node_exporter | Host metrics (CPU/memory/disk/network) | 9100 |
| cAdvisor | Docker container metrics | 8080 |
| blackbox_exporter | HTTP/TCP/DNS probing | 9115 |
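
As a rough illustration of the table above: each exporter serves its metrics over HTTP, conventionally at `/metrics`, in the Prometheus text exposition format. The sketch below builds those URLs from the ports in the table and parses a few lines of that format. The host name `homeserver.local` and the helper names `metrics_urls` and `parse_samples` are assumptions made for this example, and the parser is deliberately minimal (no timestamps, no escaped label values).

```python
import re

# Ports copied from the exporter table above.
EXPORTER_PORTS = {
    "node_exporter": 9100,
    "cAdvisor": 8080,
    "blackbox_exporter": 9115,
}


def metrics_urls(host):
    """Return the conventional /metrics URL of each exporter on `host`."""
    return {
        name: f"http://{host}:{port}/metrics"
        for name, port in EXPORTER_PORTS.items()
    }


# A sample line is: metric name, optional {labels}, whitespace, value.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Parse simple exposition-format lines into (name, labels, value) tuples.

    Comment lines (# HELP, # TYPE) and lines that do not match are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    # Hypothetical host name, used only for illustration.
    for name, url in metrics_urls("homeserver.local").items():
        print(name, url)
```

In a real deployment Prometheus does this scraping itself; the code only shows what travels over those ports. Note that blackbox_exporter's `/metrics` endpoint describes the exporter process, while the probes themselves are normally requested through its `/probe` endpoint.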
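## Monitoring Dimensions
1. **Host layer**: CPU, memory, disk, network, I/O
2. **Container layer**: running state, restart count, resource limits
3. **Service layer**: HTTP availability, response codes, latency, error rate, TLS certificates
4. **Log layer**: application errors and exceptions (optional: Loki)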
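
To make the service-layer signals concrete, here is a minimal sketch of an HTTP check that records availability, response code, and latency. It resembles the kind of check a blackbox-style prober performs, but it is not blackbox_exporter's implementation. The function names, the 5-second timeout, and the choice to count only 2xx responses as "up" are assumptions for this example.

```python
import time
import urllib.error
import urllib.request


def summarize_probe(status, latency_s):
    """Turn a raw status code (or None) and a latency into a probe result."""
    # Assumption: only 2xx responses count as "up".
    up = status is not None and 200 <= status < 300
    return {"up": up, "status": status, "latency_s": latency_s}


def probe_http(url, timeout_s=5.0):
    """Fetch `url` once and report availability, status code, and latency."""
    started = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code   # the server answered, but with a 4xx/5xx code
    except OSError:
        status = None       # connection refused, DNS failure, timeout, ...
    return summarize_probe(status, time.perf_counter() - started)
```

Error rate then follows from repeating the probe: it is the share of results where `up` is false over some window.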
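## Recommended Alert Rules
- CPU usage > 85% for 2 minutes
- Free disk space < 10%
- Available memory < 15%
- HTTP probe failing for 2 consecutive minutes
- TLS certificate expires in < 14 days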
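
The thresholds above can be sketched in plain Python to make the "for 2 minutes" condition concrete. This only illustrates the logic; it is not how Prometheus evaluates rules (there, the duration is usually expressed with a rule's `for:` clause). The `Sample` type, the rule names, and the 30-second sampling interval in the usage example are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp: float  # seconds; only differences between timestamps matter
    value: float


def sustained(samples, breached, duration_s):
    """True if `breached(value)` has held without interruption for at least
    `duration_s` seconds, counting back from the most recent sample."""
    if not samples:
        return False
    ordered = sorted(samples, key=lambda s: s.timestamp)
    latest = ordered[-1].timestamp
    start = None
    for sample in reversed(ordered):
        if not breached(sample.value):
            break
        start = sample.timestamp
    return start is not None and latest - start >= duration_s


def evaluate(cpu_pct, disk_free_pct, mem_available_pct, probe_success, tls_days_left):
    """Return the names of the rules from the list above that are firing."""
    firing = []
    if sustained(cpu_pct, lambda v: v > 85.0, 120):
        firing.append("cpu_high")
    if disk_free_pct < 10.0:
        firing.append("disk_low")
    if mem_available_pct < 15.0:
        firing.append("memory_low")
    # probe_success samples are 1 for a passing probe and 0 for a failing one.
    if sustained(probe_success, lambda v: v == 0, 120):
        firing.append("http_probe_failed")
    if tls_days_left < 14:
        firing.append("tls_expiring")
    return firing


if __name__ == "__main__":
    cpu = [Sample(t, 92.0) for t in range(0, 121, 30)]     # above 85% for 120 s
    probe = [Sample(0, 0), Sample(60, 1), Sample(120, 0)]  # recovered at t=60
    print(evaluate(cpu, 8.0, 40.0, probe, 30))
    # prints ['cpu_high', 'disk_low']
```

The duration condition exists to suppress flapping: a single failed probe or a brief CPU spike does not fire, because the breach has to hold across the whole window.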
## Related Documents
- [[家庭监控方案Prometheus + Grafana + Node Exporter + cAdvisor + Blackbox|Home Monitoring Setup: Prometheus + Grafana + Node Exporter + cAdvisor + Blackbox]]