nexus/wiki/concepts/AI-ChatOps.md
2026-04-22 04:03:04 +08:00


title: AI ChatOps
tags: devops, chatops, ai, collaboration, observability
created: 2026-04-25

AI ChatOps

Definition

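AI ChatOps is an operations collaboration model in which engineers troubleshoot incidents through a natural-language interface (Slack, Teams, or a CLI) while AI supplies log analysis and suggested fixes. Agentic AI acts as a 24/7 operations assistant, so engineers can get immediate support through conversation at any time.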

Differences from Traditional ChatOps

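| Dimension | Traditional ChatOps | AI ChatOps |
| --- | --- | --- |
| Responsiveness | Depends on a human being online | 24/7 immediate response |
| Problem diagnosis | Engineers search logs manually | AI analyzes automatically and suggests fixes |
| Knowledge dependency | Relies on individual experience | Aggregates knowledge across teams |
| Learning ability | Experience cannot be replicated | Continuous learning and knowledge accumulation |
| Average response time | Minutes to hours | Milliseconds |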

Agentic AI ChatOps Capabilities

ChatOps_Capabilities = {
    "Log Query": "Query logs in natural language: 'Show me errors from API service in last hour'",
    "Incident Summary": "AI generates an incident summary: 'This is caused by X, fix is Y'",
    "Runbook Suggestion": "AI recommends a runbook: 'Based on error pattern, try runbook #42'",
    "Metric Correlation": "AI correlates metrics: 'CPU spike correlates with DB connection pool'",
    "Action Execution": "AI executes an operation: '/runbook restart-service api-gateway'",
    "Post-mortem": "AI generates the post-mortem report: automatically builds the incident timeline"
}
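The "Action Execution" entry implies that a chat message such as `/runbook restart-service api-gateway` has to be parsed and validated before anything runs. The following is a minimal sketch of that step, not an implementation from this page: the names `RunbookCommand`, `ALLOWED_RUNBOOKS`, and `parse_runbook_command` are hypothetical, and the allow-list is an assumed safety measure.

```python
import shlex
from dataclasses import dataclass
from typing import Optional

# Hypothetical allow-list: only runbooks named here may be triggered from chat.
ALLOWED_RUNBOOKS = {"restart-service"}


@dataclass(frozen=True)
class RunbookCommand:
    runbook: str  # e.g. "restart-service"
    target: str   # e.g. "api-gateway"


def parse_runbook_command(message: str) -> Optional[RunbookCommand]:
    """Parse '/runbook <name> <target>'; return None for anything else."""
    try:
        parts = shlex.split(message.strip())
    except ValueError:
        # Unbalanced quotes or similar malformed input.
        return None
    if len(parts) != 3 or parts[0] != "/runbook":
        return None
    if parts[1] not in ALLOWED_RUNBOOKS:
        return None
    return RunbookCommand(runbook=parts[1], target=parts[2])
```

Returning `None` for unknown or malformed commands keeps the decision to execute in the caller, which can then reply with a help message instead of guessing at intent.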

Example

Engineer in Slack: @ai-ops Our API is slow, users are complaining

AI Response:

🔍 Analysis complete:

Root Cause: External payment API timeout (upstream)
- Payment API p99 latency: 15,000ms (normally 200ms)
- Correlated: API gateway retries causing backpressure

Suggested Actions:
1. Enable circuit breaker (auto-deploy via /ops deploy)
2. Fallback to cache for payment status (auto via /ops deploy)
3. Monitor: https://grafana.link/d/abc123

Shall I proceed with option 1? (yes/no)
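The exchange above ends with the assistant waiting for an explicit yes/no before it acts. One way such a confirmation gate could be sketched is shown here; `PendingAction` and `ConfirmationGate` are hypothetical names, and the `execute` callable stands in for whatever deployment mechanism sits behind `/ops deploy`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PendingAction:
    description: str
    execute: Callable[[], str]  # assumed to return a human-readable result


class ConfirmationGate:
    """Holds one proposed action until the engineer answers yes or no."""

    def __init__(self) -> None:
        self._pending: Optional[PendingAction] = None

    def propose(self, action: PendingAction) -> str:
        self._pending = action
        return f"Shall I proceed with: {action.description}? (yes/no)"

    def reply(self, text: str) -> str:
        if self._pending is None:
            return "Nothing is awaiting confirmation."
        answer = text.strip().lower()
        if answer == "yes":
            action, self._pending = self._pending, None
            return action.execute()
        if answer == "no":
            action, self._pending = self._pending, None
            return f"Cancelled: {action.description}"
        # Ambiguous answers never trigger the action; the proposal stays open.
        return "Please answer yes or no."
```

Usage under these assumptions: call `gate.propose(PendingAction("Enable circuit breaker", deploy_fn))` when the analysis finishes, then pass the engineer's next message to `gate.reply(...)`. Only a literal "yes" runs the action.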

Relationship to AIOps

AI ChatOps is the user-interaction layer of the AIOps capability matrix:

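AIOps_Capabilities = {
    "Anomaly Detection": "Detect anomalous patterns",
    "Root Cause Analysis": "Automated diagnosis",
    "Predictive Maintenance": "Predictive maintenance",
    "Smart Alerting": "Reduce alert fatigue",
    "Automated Remediation": "Automated remediation",
    "Capacity Optimization": "Capacity optimization",
    "AI ChatOps ←": "Natural-language interaction layer"  # ← this page
}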
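To illustrate what "interaction layer" could mean in practice, the sketch below routes a chat message to one of the backend capabilities named in the matrix. It is an assumption-laden toy: the handlers are stubs, `KEYWORDS` and `chat_layer` are hypothetical names, and simple keyword matching stands in for the natural-language understanding a real system would use.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def detect_anomalies(query: str) -> str:
    # Stub: a real backend would scan metrics or logs.
    return f"[Anomaly Detection] scanning: {query}"


def analyze_root_cause(query: str) -> str:
    # Stub: a real backend would run automated diagnosis.
    return f"[Root Cause Analysis] diagnosing: {query}"


# Backend capabilities that the chat layer sits in front of.
BACKENDS: Dict[str, Handler] = {
    "Anomaly Detection": detect_anomalies,
    "Root Cause Analysis": analyze_root_cause,
}

# Hypothetical keyword-to-capability map, standing in for intent detection.
KEYWORDS: Dict[str, str] = {
    "spike": "Anomaly Detection",
    "anomaly": "Anomaly Detection",
    "why": "Root Cause Analysis",
    "cause": "Root Cause Analysis",
}


def chat_layer(message: str) -> str:
    """Route a chat message to the first capability whose keyword matches."""
    lowered = message.lower()
    for keyword, capability in KEYWORDS.items():
        if keyword in lowered:
            return BACKENDS[capability](message)
    return "Sorry, I could not map that request to an AIOps capability."
```

The point of the structure is that the chat layer owns no analysis logic of its own: it only translates a conversation into calls on the other capabilities.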