nexus/wiki/sources/support-infrastructure-maintainer.md
2026-04-21 04:02:47 +08:00

---
title: "Infrastructure Maintainer"
type: source
tags: [agent, infrastructure, devops]
date: 2026-04-21
---
## Source File
- [[raw/Agent/agency-agents/support/support-infrastructure-maintainer.md]]
## Summary
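- Core topic: a complete definition of the Infrastructure Maintainer agent's professional role
- Problem domain: system reliability, performance optimization, technical operations management
- Methods/mechanisms: IaC, monitoring, automation, security hardening, disaster recovery, cost optimization
- Conclusion/value: provides operations capability targeting 99.9%+ uptime, and achieves infrastructure observability through standardized deliverables and processes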
## Key Claims
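- Infrastructure Maintainer ensures 99.9%+ system uptime
- The IaC framework (Terraform) enables declarative, cross-platform infrastructure management
- The Prometheus monitoring configuration supports multi-tier alerting (infrastructure/application/database)
- The automated backup system achieves disaster recovery through encryption and S3 storage
- Security Hardening is integrated into all infrastructure changes
- The cost optimization strategy delivers 20%+ annual efficiency gains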
## Key Quotes
> "Monitoring indicates 85% disk usage on DB server - scaling scheduled for tomorrow" — Proactive communication style
> "Implemented redundant load balancers achieving 99.99% uptime target" — Reliability focus
> "Auto-scaling policies reduced costs 23% while maintaining <200ms response times" — Systematic optimization
## Key Concepts
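- [[Infrastructure as Code (IaC)]]: managing infrastructure through code for consistency and version control
- [[Prometheus Monitoring]]: monitoring solution built on a time-series database, supporting multi-dimensional alert rules
- [[Terraform]]: infrastructure-as-code tool for declaratively configuring cloud resources across platforms
- [[Disaster Recovery]]: disaster recovery strategy, with RTO/RPO as the core metrics
- [[Security Hardening]]: security hardening process, based on zero-trust architecture and the principle of least privilege
- [[Cost Optimization]]: cloud cost optimization strategy, using Right-Sizing and Reserved Instances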
## Key Entities
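- [[The Agency]]: open-source collection of AI agents; Infrastructure Maintainer is one of its Support roles
- [[AWS]]: cloud infrastructure platform, providing services such as VPC, RDS, and EC2
- [[Prometheus]]: open-source monitoring and alerting tool
- [[Terraform]]: HashiCorp's infrastructure-as-code tool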
## Connections
- [[Support Infrastructure Maintainer]] ← is_a ← [[The Agency Agent]]
- [[DevOps 成熟度模型|DevOps Maturity Model]] ← relates_to ← [[Infrastructure as Code (IaC)]]
- [[ITSM（IT 服务管理)|ITSM (IT Service Management)]] ← relates_to ← [[Disaster Recovery]]
## Contradictions
- No conflicts with existing wiki content were detected
## Workflow Deliverables
### Monitoring System
- Prometheus scrape_configs: infrastructure(30s), application(15s), database(30s)
- Alert rules: HighCPUUsage, HighMemoryUsage, DiskSpaceLow, ServiceDown
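The scrape intervals listed above can be sketched as a small generator for the `scrape_configs` section of `prometheus.yml`. This is a minimal illustration, not the configuration from the source file: the job names mirror the three tiers, while the target addresses and the `render_scrape_configs` helper are hypothetical placeholders.

```python
# Scrape intervals are taken from the deliverable list above.
# Target addresses are hypothetical placeholders, not values from the source.
SCRAPE_JOBS = [
    {"job_name": "infrastructure", "scrape_interval": "30s", "targets": ["localhost:9100"]},
    {"job_name": "application", "scrape_interval": "15s", "targets": ["localhost:8080"]},
    {"job_name": "database", "scrape_interval": "30s", "targets": ["localhost:9187"]},
]


def render_scrape_configs(jobs):
    """Render job dicts as the scrape_configs section of a prometheus.yml file."""
    lines = ["scrape_configs:"]
    for job in jobs:
        lines.append(f"  - job_name: '{job['job_name']}'")
        lines.append(f"    scrape_interval: {job['scrape_interval']}")
        lines.append("    static_configs:")
        targets = ", ".join(f"'{t}'" for t in job["targets"])
        lines.append(f"      - targets: [{targets}]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_scrape_configs(SCRAPE_JOBS))
```

Keeping the tiers in one data structure makes the per-tier interval choice (15s for the application tier, 30s for the others) easy to review in a single place.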
### IaC Framework
- Terraform backend: S3 + DynamoDB state locking
- VPC with private/public subnets across availability zones
- Auto Scaling Group with ELB health checks
- RDS PostgreSQL with encrypted storage and backup retention
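The "private/public subnets across availability zones" layout can be illustrated by carving a VPC address range into one public and one private block per zone. The sketch below uses Python's standard `ipaddress` module; the CIDR range, the zone names, the `/24` subnet size, and the `plan_subnets` helper are all assumptions chosen for illustration, and the result would still have to be expressed as Terraform resources.

```python
import ipaddress


def plan_subnets(vpc_cidr, zones, new_prefix=24):
    """Assign one public and one private subnet to each availability zone."""
    subnets = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {}
    for zone in zones:
        # Consecutive blocks: the first goes to the public tier, the next to the private tier.
        plan[zone] = {"public": str(next(subnets)), "private": str(next(subnets))}
    return plan


if __name__ == "__main__":
    # Hypothetical VPC range and zone names.
    for zone, tiers in plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"]).items():
        print(zone, tiers)
```

Spreading each tier over at least two zones is what lets the Auto Scaling Group and the load balancer keep serving traffic when a single zone fails.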
### Backup & Recovery
- Encrypted backup script (GPG AES256)
- S3 storage with STANDARD_IA
- Retention: 30 days local, lifecycle managed in S3
- Verification and Slack notification
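The encryption, upload, and local retention steps above might look roughly like the following. This is a sketch under stated assumptions rather than the script from the source: symmetric GPG mode, the `.gpg` file suffix, the bucket and key names, and every function name are hypothetical. The commands are only built, not executed, and passphrase handling, verification, and the Slack notification are left out.

```python
import time
from pathlib import Path

RETENTION_DAYS = 30  # local retention from the deliverable list


def gpg_encrypt_command(src, dest):
    """Build a gpg command for AES256 encryption (symmetric mode is an assumption)."""
    return ["gpg", "--symmetric", "--cipher-algo", "AES256",
            "--output", str(dest), str(src)]


def s3_upload_command(src, bucket, key):
    """Build an AWS CLI upload command that uses the STANDARD_IA storage class."""
    return ["aws", "s3", "cp", str(src), f"s3://{bucket}/{key}",
            "--storage-class", "STANDARD_IA"]


def select_expired(mtimes, now, retention_days=RETENTION_DAYS):
    """Return the entries whose modification time is older than the retention window."""
    cutoff = now - retention_days * 86400
    return sorted(name for name, mtime in mtimes.items() if mtime < cutoff)


def prune_local_backups(directory, now=None):
    """Delete local *.gpg backups older than the retention window."""
    now = time.time() if now is None else now
    files = {p: p.stat().st_mtime for p in Path(directory).glob("*.gpg") if p.is_file()}
    expired = select_expired(files, now)
    for path in expired:
        path.unlink()
    return expired
```

Separating `select_expired` from the deletion step keeps the retention rule testable without touching the filesystem. Retention inside S3 is assumed to be handled by a bucket lifecycle policy rather than by this code.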
## Agent Characteristics
- **Role**: System reliability, infrastructure optimization, operations specialist
- **Personality**: Proactive, systematic, reliability-focused, security-conscious
- **Success Metrics**: 99.9%+ uptime, MTTR <4 hours, 20%+ cost efficiency, 70%+ reduction in manual operational tasks through automation
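To make the uptime targets concrete, the arithmetic below converts an availability percentage into an allowed-downtime budget. It is a worked example only; the 365-day year and the `downtime_budget_hours` helper are assumptions, not part of the source.

```python
def downtime_budget_hours(availability_pct, period_hours=365 * 24):
    """Hours of downtime allowed per period at the given availability percentage."""
    return period_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    print(f"{downtime_budget_hours(99.9):.2f} hours/year")          # 8.76 hours/year
    print(f"{downtime_budget_hours(99.99) * 60:.2f} minutes/year")  # 52.56 minutes/year
```

At 99.9% the yearly budget is about 8.76 hours, so two incidents that each take the full 4-hour MTTR bound would consume most of it.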
## Advanced Capabilities
- Multi-cloud architecture design
- Container orchestration (Kubernetes)
- Zero-trust security architecture
- Compliance automation (SOC2, ISO27001)