---
title: "Infrastructure Maintainer"
type: source
tags: [agent, infrastructure, devops]
date: 2026-04-21
---

## Source File

- [[raw/Agent/agency-agents/support/support-infrastructure-maintainer.md]]

## Summary

- Core topic: a complete definition of the Infrastructure Maintainer agent's professional role
- Problem domain: system reliability, performance optimization, technical operations management
- Methods/mechanisms: IaC, monitoring, automation, security hardening, disaster recovery, cost optimization
- Conclusion/value: provides operations capability at the 99.9%+ level and achieves infrastructure observability through standardized deliverables and processes

## Key Claims

- The Infrastructure Maintainer ensures 99.9%+ system uptime
- The IaC framework (Terraform) enables declarative management of cross-platform infrastructure
- The Prometheus monitoring configuration supports multi-tier alerting (infrastructure/application/database)
- The automated backup system achieves disaster recovery through encryption and S3 storage
- Security Hardening is integrated into every infrastructure change
- Cost optimization strategies achieve 20%+ annual efficiency gains

## Key Quotes

> "Monitoring indicates 85% disk usage on DB server - scaling scheduled for tomorrow" (Proactive communication style)

> "Implemented redundant load balancers achieving 99.99% uptime target" (Reliability focus)

> "Auto-scaling policies reduced costs 23% while maintaining <200ms response times" (Systematic optimization)

## Key Concepts

- [[Infrastructure as Code (IaC)]]: consistent, version-controlled infrastructure management through code
- [[Prometheus Monitoring]]: a monitoring solution built on a time-series database, supporting multi-dimensional alert rules
- [[Terraform]]: an infrastructure-as-code tool for declaratively configuring cross-platform cloud resources
- [[Disaster Recovery]]: disaster recovery strategy, with RTO/RPO as the core metrics
- [[Security Hardening]]: the security hardening process, covering zero-trust architecture and the principle of least privilege
- [[Cost Optimization]]: cloud cost optimization strategy, covering Right-Sizing and Reserved Instances

## Key Entities

- [[The Agency]]: an open-source collection of AI agents; Infrastructure Maintainer is one of its Support roles
- [[AWS]]: the infrastructure cloud platform, providing services such as VPC, RDS, and EC2
- [[Prometheus]]: an open-source monitoring and alerting tool
- [[Terraform]]: HashiCorp's infrastructure-as-code tool

## Connections

- [[Support Infrastructure Maintainer]] ← is_a ← [[The Agency Agent]]
- [[DevOps 成熟度模型|DevOps Maturity Model]] ← relates_to ← [[Infrastructure as Code (IaC)]]
- [[ITSM(IT 服务管理)|ITSM (IT Service Management)]] ← relates_to ← [[Disaster Recovery]]

## Contradictions

- No conflicts with existing wiki content were detected

## Workflow Deliverables

### Monitoring System

- Prometheus scrape_configs: infrastructure (30s), application (15s), database (30s)
- Alert rules: HighCPUUsage, HighMemoryUsage, DiskSpaceLow, ServiceDown

### IaC Framework

- Terraform backend: S3 + DynamoDB state locking
- VPC with private/public subnets across availability zones
- Auto Scaling Group with ELB health checks
- RDS PostgreSQL with encrypted storage and backup retention

### Backup & Recovery

- Encrypted backup script (GPG AES256)
- S3 storage with STANDARD_IA
- Retention: 30 days local, lifecycle managed in S3
- Verification and Slack notification

## Agent Characteristics

- **Role**: System reliability, infrastructure optimization, operations specialist
- **Personality**: Proactive, systematic, reliability-focused, security-conscious
- **Success Metrics**: 99.9%+ uptime, MTTR <4 hours, 20%+ cost efficiency, 70%+ reduction in manual operational tasks through automation

## Advanced Capabilities

- Multi-cloud architecture design
- Container orchestration (Kubernetes)
- Zero-trust security architecture
- Compliance automation (SOC2, ISO27001)
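
The uptime and MTTR figures listed under Success Metrics can be related through a simple downtime budget. The following is a minimal sketch and is not taken from the source agent definition: the function names `downtime_budget_minutes` and `incidents_within_budget`, the 30-day and 365-day measurement windows, and the simplification that every incident lasts exactly the MTTR ceiling are all assumptions made for illustration.

```python
def downtime_budget_minutes(uptime_pct: float, period_days: float = 30.0) -> float:
    """Return the minutes of downtime allowed in a period at a given uptime target.

    The measurement window (period_days) is an assumption; the source note
    states the uptime target but not the window it is measured over.
    """
    if not 0.0 <= uptime_pct <= 100.0:
        raise ValueError("uptime_pct must be between 0 and 100")
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - uptime_pct / 100.0)


def incidents_within_budget(uptime_pct: float, mttr_minutes: float,
                            period_days: float = 365.0) -> int:
    """Return how many incidents of length mttr_minutes fit in the downtime budget.

    Simplification (hypothetical): each incident lasts exactly mttr_minutes.
    """
    if mttr_minutes <= 0:
        raise ValueError("mttr_minutes must be positive")
    budget = downtime_budget_minutes(uptime_pct, period_days)
    return int(budget // mttr_minutes)


if __name__ == "__main__":
    for target in (99.9, 99.99):
        monthly = downtime_budget_minutes(target, period_days=30)
        yearly = downtime_budget_minutes(target, period_days=365)
        print(f"{target}% uptime: {monthly:.2f} min per 30 days, "
              f"{yearly:.1f} min per 365 days")
    # 4 hours is the MTTR ceiling named in the Success Metrics.
    print("4-hour incidents per year at 99.9%:",
          incidents_within_budget(99.9, mttr_minutes=4 * 60, period_days=365))
```

Under these assumptions, a 99.9% target allows about 43.2 minutes of downtime per 30 days and about 525.6 minutes (8.76 hours) per 365 days. A single incident at the 4-hour MTTR ceiling would therefore exceed a 30-day budget, and roughly two such incidents would fit in a yearly one, which suggests the two metrics are only compatible when measured over a long window or when typical recovery is much faster than the ceiling.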