---
title: "Home Monitoring Setup: Prometheus + Grafana + Node Exporter + cAdvisor + Blackbox"
type: source
tags: [monitoring, prometheus, grafana, self-hosted]
date: 2025-11-11
---

## Source File

- [[raw/Home Office/家庭监控方案:Prometheus + Grafana + Node Exporter + cAdvisor +Blackbox.md]]

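## Summary

- Core topic: a Docker-based observability and monitoring setup for home and small-lab environments, covering the host, container, and service layers plus synthetic monitoring
- Problem domain: how to build a complete monitoring and alerting stack at low cost with open-source tools
- Method / mechanism: Prometheus pull-based scraping + Grafana for visualization + Alertmanager for alert routing; cAdvisor collects container metrics; blackbox_exporter runs HTTP/TCP/DNS synthetic probes; node_exporter collects host metrics
- Conclusion / value: provides two docker-compose templates (lightweight and PoC), plus ready-to-copy prometheus.yml, alerting rules, and Alertmanager configuration
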
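## Key Claims

- Prometheus's pull-based model suits multi-host monitoring: it scrapes each exporter's metrics as defined in scrape_configs
- cAdvisor needs /var/lib/docker/ mounted to collect complete per-container resource usage
- blackbox_exporter supports HTTP, TCP, ICMP, and DNS probes, so it can monitor the availability of internal and external services as well as TLS certificate expiry
- Alertmanager delivers alerts via email, Slack, webhooks, and PagerDuty, and groups and inhibits related alerts to prevent alert storms
- A single docker-compose file brings up Prometheus + Grafana + cAdvisor + blackbox_exporter + Alertmanager with one command
- Importing a Grafana dashboard only requires its ID (Node Exporter Full: 1860, cAdvisor: 14282, Blackbox: 7587)
- Mounting the Docker socket is a security risk: a container with access to it effectively gains root on the host
- TLS certificate expiry can be monitored through the probe_ssl_earliest_cert_expiry metric, alerting 14 days in advance
- Keep monitoring traffic on a management VLAN, or restrict access with a firewall
- Prometheus's local disk usage grows with the number of series and the retention window (15 days by default, set with the --storage.tsdb.retention.time flag); for long-term retention, configure remote_write to a remote store such as VictoriaMetrics
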
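The grouping and inhibition behavior in the Alertmanager claim can be sketched as an alertmanager.yml along these lines. This is a minimal sketch, not the source's own configuration: the receiver name, the webhook URL, the timings, and the choice to mute a host's warning alerts while a critical alert fires for the same instance are all assumptions.

```yaml
# alertmanager.yml (sketch; receiver name, URL and timings are assumptions)
route:
  receiver: home-webhook          # hypothetical default receiver
  # Alerts sharing these labels are batched into one notification
  group_by: [alertname, instance]
  group_wait: 30s                 # wait before the first notification for a new group
  group_interval: 5m              # wait before notifying about alerts added to an existing group
  repeat_interval: 4h             # re-send while an alert keeps firing

receivers:
  - name: home-webhook
    webhook_configs:
      # Hypothetical endpoint; email_configs, slack_configs or pagerduty_configs work the same way
      - url: http://192.168.1.50:5001/alert

inhibit_rules:
  # While a critical alert fires for an instance, suppress warning-level alerts for that instance
  - source_matchers: [severity="critical"]
    target_matchers: [severity="warning"]
    equal: [instance]
```

`amtool check-config alertmanager.yml` validates a file like this before Alertmanager is started or reloaded.
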
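## Key Quotes

> "Prometheus's local disk usage grows; for long-term retention, consider remote storage or periodic snapshots" (production storage advice)

> "Prometheus supports setting a download delay plus randomized access for the same website to avoid getting blocked" (crawler anti-blocking strategy)
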
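## Key Concepts

- [[Prometheus]]: open-source time-series database and monitoring/alerting system, with the PromQL query language and an alerting-rule engine
- [[Grafana]]: open-source observability platform for time-series visualization, dashboards, and alert notifications
- [[Alertmanager]]: the Prometheus ecosystem's alert-dispatch component, supporting grouping, inhibition, and routing
- [[cAdvisor]]: Google's open-source container resource monitor, collecting CPU, memory, network, and disk I/O metrics
- [[node_exporter]]: the official Prometheus exporter for host metrics (CPU, memory, disk, network)
- [[blackbox_exporter]]: the official Prometheus black-box exporter, supporting HTTP/TCP/DNS/ICMP probes
- [[PromQL]]: Prometheus Query Language, used to query and aggregate time-series metrics
- [[可观测性|Observability]]: the three pillars of monitoring (metrics, logs, traces)
- [[合成监测|Synthetic monitoring]]: checking service availability with probes that simulate user requests
- [[Prometheus告警规则|Prometheus alerting rules]]: PromQL expressions evaluated at a fixed interval; an alert fires once the expression has returned results (for example, a threshold is crossed) for the duration set in the rule's for field
- [[Docker Socket安全|Docker socket security]]: mounting /var/run/docker.sock into a container is equivalent to giving it root on the host
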
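blackbox_exporter picks the probe type per module, and the scrape config in this note requests the http_2xx module. A sketch of a blackbox.yml with one module per probe type named above; the module names other than http_2xx and the DNS query target are assumptions (a module name is only a label, the prober field selects the behavior).

```yaml
# blackbox.yml (sketch; module names besides http_2xx are illustrative)
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      # Success = any 2xx status by default; https targets also export probe_ssl_earliest_cert_expiry
      preferred_ip_protocol: ip4
  tcp_connect:                  # plain TCP connect check
    prober: tcp
    timeout: 5s
  icmp:                         # ping check
    prober: icmp
    timeout: 5s
  dns_lookup:                   # hypothetical name: DNS resolution check
    prober: dns
    timeout: 5s
    dns:
      query_name: example.com   # hypothetical record to resolve
      query_type: A
```

The prom/blackbox-exporter image reads /etc/blackbox_exporter/config.yml by default, so a file like this would be mounted over that path (for example ./blackbox/blackbox.yml:/etc/blackbox_exporter/config.yml:ro, host path hypothetical). On Linux, ICMP probes typically also require elevated privileges (root or the CAP_NET_RAW capability).
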
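## Key Entities

- [[Uptime Kuma]]: self-hosted website monitoring tool supporting HTTP/TCP/DNS/TLS checks; a good fit as a user-facing UI layer for synthetic monitoring
- [[Loki]]: Grafana Labs' lightweight log aggregation system, natively integrated with Prometheus and Grafana
- [[VictoriaMetrics]]: high-performance time-series database compatible with the Prometheus remote_write API, suited to long-term storage
- [[Portainer]]: Docker management GUI; it does not replace Prometheus but makes day-to-day operations easier
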
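Because VictoriaMetrics accepts Prometheus remote_write, long-term storage can be added with one block in prometheus.yml while local retention stays short. A sketch assuming a single-node VictoriaMetrics reachable as victoriametrics on its default port 8428 (the host name is hypothetical):

```yaml
# Top-level section of prometheus.yml (sketch; host name is an assumption)
remote_write:
  # Single-node VictoriaMetrics ingests the Prometheus remote_write protocol at /api/v1/write
  - url: http://victoriametrics:8428/api/v1/write
```

Local retention is still governed by the Prometheus flag --storage.tsdb.retention.time (15d by default); shortening it keeps the local TSDB small once VictoriaMetrics holds the history. Grafana can query VictoriaMetrics directly with a Prometheus-type data source, since it serves the Prometheus query API on the same port.
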
## Connections

- [[Prometheus]] ← scrape_configs ← [[node_exporter]]
- [[Prometheus]] ← scrape_configs ← [[cAdvisor]]
- [[Prometheus]] ← scrape_configs ← [[blackbox_exporter]]
- [[Grafana]] ← data source ← [[Prometheus]]
- [[Alertmanager]] ← alert delivery ← [[Prometheus]]
- [[Grafana]] ← dashboards ← [[cAdvisor]] / [[node_exporter]] / [[blackbox_exporter]]
- [[VictoriaMetrics]] ← remote_write ← [[Prometheus]]

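The Grafana ← data source ← Prometheus edge can be provisioned from a file instead of being set up by hand in the UI. A sketch assuming the file is mounted into the Grafana container under /etc/grafana/provisioning/datasources/ (Grafana's default provisioning directory) and that Prometheus is reachable by its compose service name; the file name is an assumption.

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yml (file name is an assumption)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy              # the Grafana server, not the browser, sends the queries
    url: http://prometheus:9090
    isDefault: true
```

With the data source in place, the dashboards listed in Key Claims (1860, 14282, 7587) can be imported by ID and pointed at it.
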
## Contradictions

- None apparent

## Infrastructure Code

### docker-compose.yml core configuration

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml:ro
      # Target files read by file_sd_configs in prometheus.yml (host directory assumed)
      - ./prometheus/targets:/etc/prometheus/targets:ro
      - prometheus-data:/prometheus
    command: ['--config.file=/etc/prometheus/prometheus.yml', '--storage.tsdb.path=/prometheus', '--web.enable-lifecycle']

  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    environment:
      # Anonymous read-only access; only appropriate on a trusted network
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer
    volumes:
      # Persist dashboards, users and settings across container re-creation
      - grafana-data:/var/lib/grafana

  node_exporter:
    image: prom/node-exporter:latest
    network_mode: "host"
    pid: "host"
    # Point node_exporter at the host mounts below; without these flags the mounts go unused
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    ports: ["8080:8080"]
    volumes:
      - /:/rootfs:ro
      # /var/run contains docker.sock, so this mount carries the Docker socket risk
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

  blackbox:
    image: prom/blackbox-exporter:latest
    ports: ["9115:9115"]

# Named volumes must be declared at the top level
volumes:
  prometheus-data:
  grafana-data:
```

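The Key Claims list includes Alertmanager in the compose stack, but the core excerpt leaves it out. A sketch of that service written as a standalone override file; the file name docker-compose.alertmanager.yml and the host path ./alertmanager/alertmanager.yml are hypothetical.

```yaml
# docker-compose.alertmanager.yml (hypothetical override file)
services:
  alertmanager:
    image: prom/alertmanager:latest
    ports: ["9093:9093"]
    volumes:
      # Hypothetical host path; the image reads /etc/alertmanager/alertmanager.yml by default
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
```

Compose merges files passed with repeated -f flags, so `docker compose -f docker-compose.yml -f docker-compose.alertmanager.yml up -d` starts it in the same project and default network as Prometheus, where it is reachable as alertmanager:9093.
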
### prometheus.yml scrape_configs

```yaml
scrape_configs:
  - job_name: 'node_exporter'
    file_sd_configs:
      - files: ['/etc/prometheus/targets/node.yml']
  - job_name: 'cadvisor'
    file_sd_configs:
      - files: ['/etc/prometheus/targets/cadvisor.yml']
  - job_name: 'blackbox_http'
    metrics_path: /probe
    params: { module: [http_2xx] }
    file_sd_configs:
      - files: ['/etc/prometheus/targets/blackbox.yml']
    relabel_configs:
      # Pass each listed URL to blackbox_exporter as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label so each target gets its own series
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to blackbox_exporter (compose service "blackbox")
      - target_label: __address__
        replacement: blackbox:9115
```

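Each file_sd_configs entry points at a target file under /etc/prometheus/targets/, a directory the compose file has to mount. A sketch of the three files' format, concatenated here as separate YAML documents; every address, URL, and label is hypothetical. Because node_exporter runs with network_mode: "host", it is reached through the host's LAN IP on its default port 9100, whereas cAdvisor is reachable by its service name.

```yaml
# /etc/prometheus/targets/node.yml
- targets: ['192.168.1.10:9100']    # hypothetical host LAN IP
  labels:
    host: homelab                   # hypothetical extra label
---
# /etc/prometheus/targets/cadvisor.yml
- targets: ['cadvisor:8080']        # compose service name and container port
---
# /etc/prometheus/targets/blackbox.yml
# Each entry is a URL to probe; relabel_configs copies it into the ?target= parameter
- targets:
    - https://example.com           # hypothetical external site
    - http://192.168.1.10:8123      # hypothetical internal service
```

Prometheus watches file_sd files and applies edits without a restart, so adding a host or URL is a one-line change.
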
### Core alerting rules

```yaml
groups:
  - name: home-monitoring            # illustrative group name; rule files require a named group
    rules:
      - alert: HostHighCPU
        # Busy (non-idle) CPU percentage per host, averaged across its cores
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m]))) > 85
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage"

      - alert: HostLowDisk
        # Less than 10% free space on a real (non-tmpfs, non-overlay) filesystem
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) < 0.10
        for: 5m
        labels:
          severity: critical

      - alert: TLSCertExpiring
        # Earliest certificate in the probed chain expires within 14 days
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 14
        for: 1h
        labels:
          severity: warning
```
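
The rules only take effect once prometheus.yml loads the mounted alerts.yml and knows where to send firing alerts. A sketch of the two top-level sections, assuming an Alertmanager container reachable as alertmanager:9093 on the compose network:

```yaml
# Top-level sections of prometheus.yml, next to scrape_configs
rule_files:
  - /etc/prometheus/alerts.yml        # container path of the alerts.yml volume mount

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # assumed service name; 9093 is Alertmanager's default port
```

`promtool check config` and `promtool check rules`, both shipped in the prom/prometheus image, validate the files. Because Prometheus runs with --web.enable-lifecycle, `curl -X POST http://localhost:9090/-/reload` then applies them without restarting the container.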