Batch ingest: Multi-Agent Team / DevOps Maturity / 一语点醒梦中人 / NodeWarden

Sources:
- Agent-usecases-multi-Agent-Team.md
- DevOps-Maturity-Model-From-Traditional-IT-to-Advanced-DevOps.md
- AI-一语点醒梦中人.md
- Home-Office-NodeWarden-把-Bitwarden-搬上-Cloudflare-Workers彻底告别服务器.md

Entities: Trebuh, Cloudflare
Concepts: DevOps成熟度模型 (DevOps maturity model), 共享内存模式 (shared-memory pattern), 空性智慧 (wisdom of emptiness), 绝处逢生 (recovery from a dead end)
2026-04-15 18:05:17 +08:00
parent 426d48b57d
commit 5789476c23
60 changed files with 1577 additions and 118 deletions

@@ -0,0 +1,153 @@
---
title: "Home Monitoring Stack: Prometheus + Grafana + Node Exporter + cAdvisor + Blackbox"
type: source
tags: [monitoring, prometheus, grafana, self-hosted]
date: 2025-11-11
---
## Source File
- [[raw/Home Office/家庭监控方案Prometheus + Grafana + Node Exporter + cAdvisor +Blackbox.md]]
## Summary
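- Core topic: a Docker-based observability stack for home and small-lab environments, covering the host, container, and service layers plus synthetic monitoring
- Problem domain: how to build a complete monitoring and alerting system at low cost with open-source tools
- Method/mechanism: Prometheus pull-based scraping, Grafana visualization, and Alertmanager alert routing; cAdvisor collects container metrics, blackbox_exporter runs HTTP/TCP/DNS synthetic probes, and node_exporter collects host metrics
- Conclusion/value: provides two docker-compose templates (lightweight/PoC), plus ready-to-copy prometheus.yml, alert rules, and Alertmanager configuration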
## Key Claims
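- Prometheus's pull-based model suits multi-host monitoring: it scrapes each exporter's metrics through scrape_configs
- cAdvisor needs /var/lib/docker/ mounted to collect complete container resource usage
- blackbox_exporter supports four probe types (HTTP, TCP, ICMP, DNS) and can monitor the availability of internal and external services as well as TLS certificate expiry
- Alertmanager delivers alerts by email, Slack, webhook, or PagerDuty, and supports grouping and inhibition to prevent alert storms
- A docker-compose deployment starts Prometheus, Grafana, cAdvisor, blackbox_exporter, and Alertmanager with a single command
- Importing a Grafana dashboard requires only its ID: Node Exporter Full: 1860, cAdvisor: 14282, Blackbox: 7587
- Mounting the Docker socket is a security risk: the container can gain privileges equivalent to root on the host
- TLS certificate expiry can be monitored with the probe_ssl_earliest_cert_expiry metric, alerting 14 days in advance
- Monitoring traffic should sit on a management VLAN or be restricted by a firewall
- Prometheus's local disk usage keeps growing; long-term retention requires remote_write to remote storage such as VictoriaMetrics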
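
The grouping and inhibition claim above can be made concrete with a minimal alertmanager.yml sketch. This is not the source's configuration: the receiver name, the webhook URL, and all timing values are assumptions chosen for illustration.

```yaml
# alertmanager.yml (illustrative sketch, not taken from the source)
route:
  receiver: default-webhook             # hypothetical receiver name
  group_by: ['alertname', 'instance']   # batch related alerts into one notification
  group_wait: 30s                       # wait before sending the first notification for a group
  group_interval: 5m                    # wait before notifying about new alerts in the group
  repeat_interval: 4h                   # re-send interval while alerts keep firing

receivers:
  - name: default-webhook
    webhook_configs:
      - url: 'http://example.internal:5001/alerts'   # hypothetical endpoint

inhibit_rules:
  # While a critical alert fires for an instance, mute warnings for the same instance.
  - source_matchers: ['severity="critical"']
    target_matchers: ['severity="warning"']
    equal: ['instance']
```

Grouping cuts the number of notifications per incident, while inhibition hides lower-severity alerts that a higher-severity alert already explains. Both are the mechanisms usually meant by "avoiding alert storms".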
## Key Quotes
> "Prometheus's local disk usage will grow; for long-term retention, consider remote storage or periodic snapshots" (production-grade storage advice)
> "Prometheus supports setting a download delay plus randomized access for the same website to avoid being banned" (crawler anti-blocking strategy)
## Key Concepts
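- [[Prometheus]]: open-source time-series database and monitoring/alerting system, with the PromQL query language and an alerting rule engine
- [[Grafana]]: open-source observability platform for time-series visualization, dashboards, and alert notifications
- [[Alertmanager]]: alert routing component of the Prometheus ecosystem; supports grouping, inhibition, and routing
- [[cAdvisor]]: Google's open-source container resource monitor; collects CPU, memory, network, and disk I/O metrics
- [[node_exporter]]: official Prometheus host metrics exporter; collects CPU, memory, disk, and network metrics
- [[blackbox_exporter]]: official Prometheus black-box probing exporter; supports HTTP, TCP, DNS, and ICMP probes
- [[PromQL]]: Prometheus Query Language, used to query and aggregate time-series metrics
- [[可观测性|Observability]]: the three pillars of monitoring: metrics, logs, and traces
- [[合成监测|Synthetic monitoring]]: probes simulate user requests to check service availability
- [[Prometheus告警规则|Prometheus alerting rules]]: PromQL expressions evaluated continuously; an alert fires when the threshold is met
- [[Docker Socket安全|Docker socket security]]: mounting /var/run/docker.sock effectively gives the container root privileges on the host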
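
As a rough illustration of the four probe types listed for blackbox_exporter, a module file for the exporter might look like the sketch below. The module names other than `http_2xx`, the timeouts, and the DNS query name are assumptions, not values from the source.

```yaml
# blackbox exporter module configuration (illustrative sketch)
modules:
  http_2xx:                 # by default expects a 2xx response
    prober: http
    timeout: 5s
    http:
      preferred_ip_protocol: ip4
  tcp_connect:              # succeeds if a TCP connection can be opened
    prober: tcp
    timeout: 5s
  icmp:                     # ping-style probe; typically needs extra privileges in a container
    prober: icmp
    timeout: 5s
  dns_lookup:               # hypothetical module name
    prober: dns
    timeout: 5s
    dns:
      query_name: example.com   # placeholder domain
      query_type: A
```

Prometheus selects a module per scrape job through the `module` query parameter, so each probe type normally gets its own job.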
## Key Entities
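- [[Uptime Kuma]]: self-hosted website monitoring tool supporting HTTP/TCP/DNS/TLS probes; works well as an outer UI layer for synthetic monitoring
- [[Loki]]: Grafana Labs' lightweight log aggregation system, natively integrated with Prometheus and Grafana
- [[VictoriaMetrics]]: high-performance time-series database compatible with the Prometheus remote_write API; suited to long-term storage
- [[Portainer]]: Docker management UI; it does not replace Prometheus but makes day-to-day operations easier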
## Connections
- [[Prometheus]] ← scrape_configs ← [[node_exporter]]
- [[Prometheus]] ← scrape_configs ← [[cAdvisor]]
- [[Prometheus]] ← scrape_configs ← [[blackbox_exporter]]
- [[Grafana]] ← data source ← [[Prometheus]]
- [[Alertmanager]] ← alert delivery ← [[Prometheus]]
- [[Grafana]] ← dashboards ← [[cAdvisor]] / [[node_exporter]] / [[blackbox_exporter]]
- [[VictoriaMetrics]] ← remote_write ← [[Prometheus]]
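
The remote-storage link between Prometheus and VictoriaMetrics could be wired with a short prometheus.yml fragment like the one below. It is a sketch that assumes a single-node VictoriaMetrics reachable under the hypothetical hostname `victoriametrics` on its default port 8428.

```yaml
# prometheus.yml fragment (illustrative sketch)
remote_write:
  # Hostname is an assumption; /api/v1/write is the Prometheus remote-write
  # endpoint exposed by single-node VictoriaMetrics.
  - url: http://victoriametrics:8428/api/v1/write
```

With this in place, Prometheus keeps its short local retention and VictoriaMetrics holds the long-term copy, which Grafana can query as a Prometheus-compatible data source.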
## Contradictions
- No obvious conflicts
## Infrastructure Code
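### docker-compose.yml core configuration
```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml:ro
      # Assumed host path for the file_sd target lists that prometheus.yml references
      - ./prometheus/targets:/etc/prometheus/targets:ro
      - prometheus-data:/prometheus
    command: ['--config.file=/etc/prometheus/prometheus.yml', '--storage.tsdb.path=/prometheus', '--web.enable-lifecycle']
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    environment:
      # Anonymous read-only access; only sensible on a trusted network
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer
  node_exporter:
    image: prom/node-exporter:latest
    network_mode: "host"
    pid: "host"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    # Read host metrics from the mounted paths rather than the container's own /proc and /sys
    command: ['--path.procfs=/host/proc', '--path.sysfs=/host/sys', '--path.rootfs=/rootfs']
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    ports: ["8080:8080"]
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
  blackbox:
    image: prom/blackbox-exporter:latest
    ports: ["9115:9115"]

# Named volume used by the prometheus service above
volumes:
  prometheus-data:
```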
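
The summary lists Alertmanager as part of the stack, but the excerpt above does not show its service. A hedged sketch of what such a service entry could look like follows; the host path `./alertmanager/alertmanager.yml` is an assumption that follows the layout used for Prometheus.

```yaml
# Additional entry under services: (illustrative sketch, not from the source)
services:
  alertmanager:
    image: prom/alertmanager:latest
    ports: ["9093:9093"]
    volumes:
      # Assumed host path for the Alertmanager configuration
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    command: ['--config.file=/etc/alertmanager/alertmanager.yml']
```

Prometheus would then need an `alerting.alertmanagers` entry pointing at `alertmanager:9093` in prometheus.yml to deliver alerts to it.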
### prometheus.yml scrape_configs
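```yaml
scrape_configs:
  - job_name: 'node_exporter'
    file_sd_configs:
      - files: ['/etc/prometheus/targets/node.yml']
  - job_name: 'cadvisor'
    file_sd_configs:
      - files: ['/etc/prometheus/targets/cadvisor.yml']
  - job_name: 'blackbox_http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    file_sd_configs:
      - files: ['/etc/prometheus/targets/blackbox.yml']
    relabel_configs:
      # Pass each listed target to the exporter as the ?target= query parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label instead of the exporter's address
      - source_labels: [__param_target]
        target_label: instance
      # Send the scrape itself to the blackbox exporter
      - target_label: __address__
        replacement: blackbox:9115
```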
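
The scrape jobs above load their targets from files. The source does not show those files, so the following is only a sketch of the file_sd format; the IP addresses, the `env` label, and the probed URLs are placeholders.

```yaml
# /etc/prometheus/targets/node.yml (hypothetical hosts)
- targets: ['192.168.1.10:9100', '192.168.1.11:9100']
  labels:
    env: home   # illustrative label attached to every target in this group
```

```yaml
# /etc/prometheus/targets/blackbox.yml (hypothetical URLs to probe)
- targets:
    - https://example.com
    - https://example.org
```

Prometheus watches these files and picks up changes without a restart, so adding a host is a one-line edit.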
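### Core alert rules
```yaml
groups:
  - name: core   # placeholder group name; rules must sit inside a group to load
    rules:
      - alert: HostHighCPU
        # Busy CPU per host: 100% minus the idle share, so system and iowait time also count
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 85
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage"
      - alert: HostLowDisk
        # Less than 10% free space on real filesystems
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) < 0.10
        for: 5m
        labels:
          severity: critical
      - alert: TLSCertExpiring
        # Certificate expires in under 14 days (86400 seconds per day)
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 14
        for: 1h
        labels:
          severity: warning
```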
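
The rules above cover host resources and certificate expiry. For the availability side of synthetic monitoring, a rule on the blackbox exporter's `probe_success` metric is a common complement. The sketch below is not from the source: the group name, the alert name, and the `for` duration are assumptions, and the summary assumes the `instance` label carries the probed URL.

```yaml
groups:
  - name: blackbox   # hypothetical group name
    rules:
      - alert: EndpointDown   # hypothetical alert name
        # probe_success is 1 when the last probe succeeded and 0 when it failed
        expr: probe_success == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Probe failed for {{ $labels.instance }}"
```

The `for: 2m` window means a single failed probe does not page anyone; the endpoint has to stay down across several scrapes first.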