nexus/wiki/concepts/StackSets-Deployment-Visibility.md
2026-04-22 04:03:04 +08:00

---
title: StackSets Deployment Visibility
type: concept
tags: [AWS, CloudFormation, StackSets, Observability, CloudOps]
date: 2025-10-24
---
## Definition
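StackSets Deployment Visibility is the ability to centrally monitor and troubleshoot the deployment status of CloudFormation StackSets in AWS multi-account, multi-region environments, using EventBridge together with CloudWatch Logs. Its core goal is to eliminate monitoring blind spots in multi-account deployments and to provide a single, cross-account observability view.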
## Core Properties
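- **Event capture**: EventBridge Rules capture all CloudFormation operation events (CREATE/UPDATE/DELETE)
- **Cross-account forwarding**: events are forwarded from member accounts to the management account through an EventBridge Custom Event Bus
- **Centralized storage**: a CloudWatch Log Group aggregates the CloudFormation logs from all accounts
- **Unified querying**: CloudWatch Logs Insights supports structured log analysis across accounts and regions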
## Event Flow
```
Member Account CloudFormation (CREATE/UPDATE/DELETE)
→ EventBridge Rule (pattern: CloudFormation events)
→ Event Bus (Custom, in Management Account)
→ CloudWatch Log Group (central-cloudformation-logs)
→ CloudWatch Logs Insights (aggregated queries)
```
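The first hop of the flow above can be sketched as an EventBridge event pattern plus the ARN of the central bus that a member-account rule would target. This is a minimal sketch, not a definitive setup: the account ID `111111111111` and the bus name `central-cloudformation-bus` are hypothetical placeholders, and the pattern is restricted to the stack and resource status-change detail types that CloudFormation publishes under the `aws.cloudformation` source.

```python
import json

# Hypothetical placeholders: substitute your own management account and bus name.
MANAGEMENT_ACCOUNT_ID = "111111111111"
CENTRAL_BUS_NAME = "central-cloudformation-bus"


def build_event_pattern() -> dict:
    """Event pattern matching CloudFormation stack and resource status changes."""
    return {
        "source": ["aws.cloudformation"],
        "detail-type": [
            "CloudFormation Stack Status Change",
            "CloudFormation Resource Status Change",
        ],
    }


def central_bus_arn(region: str) -> str:
    """ARN of the custom event bus in the management account (the rule target)."""
    return (
        f"arn:aws:events:{region}:{MANAGEMENT_ACCOUNT_ID}"
        f":event-bus/{CENTRAL_BUS_NAME}"
    )


pattern_json = json.dumps(build_event_pattern(), indent=2)
print(pattern_json)
print(central_bus_arn("us-east-1"))
```

The printed JSON would serve as the rule's event pattern in each member account, with the ARN as the rule target. The central bus also needs a resource-based policy that allows the member accounts to put events onto it.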
## Related Concepts
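- [[Multi-Account Deployment]]: StackSets deployment visibility is a core support for cross-account deployment operations
- [[AWS CloudFormation StackSets]]: the deployment tool being monitored
- [[Amazon EventBridge]]: the core component for event capture and cross-account routing
- [[Amazon CloudWatch Logs]]: centralized log storage
- [[Centralized Logging]]: deployment visibility is a specific application of centralized logging
- [[Cross-Account Monitoring]]: shares the same cross-account monitoring infrastructure
- [[Cloud Service Delivery]]: StackSets deployment visibility is an important part of cloud service delivery operations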
## Monitorable Events
- Stack CREATE operation started/completed/failed
- Stack UPDATE operation started/completed/failed
- Stack DELETE operation started/completed/failed
- Resource creation/update/deletion events
- Stack set operation preferences (parallelism, fault tolerance)
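The started/completed/failed phases listed above correspond to the suffixes of CloudFormation status strings such as `CREATE_IN_PROGRESS` or `UPDATE_FAILED`. The helper below is a rough sketch of that mapping; `classify_status` is a hypothetical name, and lumping every rollback status under a separate `ROLLBACK` operation is a simplifying assumption, as is ignoring finer distinctions such as cleanup phases.

```python
def classify_status(status: str) -> tuple[str, str]:
    """Split a CloudFormation status into (operation, phase).

    Example: 'UPDATE_FAILED' -> ('UPDATE', 'failed').
    """
    # Assumption: any status mentioning ROLLBACK is reported as its own operation.
    if "ROLLBACK" in status:
        operation = "ROLLBACK"
    else:
        operation = status.split("_", 1)[0]

    if status.endswith("_IN_PROGRESS"):
        phase = "started"
    elif status.endswith("_FAILED"):
        phase = "failed"
    elif status.endswith("_COMPLETE"):
        phase = "completed"
    else:
        phase = "unknown"
    return operation, phase


for s in ("CREATE_IN_PROGRESS", "UPDATE_FAILED", "DELETE_COMPLETE"):
    print(s, classify_status(s))
```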
## Query Patterns (CloudWatch Logs Insights)
```sql
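# account and region are top-level fields of the EventBridge event envelope.
# CloudFormation statuses look like CREATE_FAILED, so filter on the substring.
fields @timestamp, account, region
| parse @message /"resource-type":"(?<resource_type>[^"]+)"/
| parse @message /"status":"(?<status>[^"]+)"/
| parse @message /"logical-resource-id":"(?<logical_resource_id>[^"]+)"/
| filter status like /FAILED/
| sort @timestamp desc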
```
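Before running the query against real logs, the three regular expressions can be checked locally. The sketch below reproduces them with Python's `(?P<name>...)` group syntax. It assumes the log message is the compact JSON form of the event, and the sample event is made up for illustration. Failures are detected with a substring match because CloudFormation statuses take forms such as `CREATE_FAILED`.

```python
import json
import re

# Same expressions as the parse steps in the query, in Python group syntax.
PATTERNS = {
    "resource_type": re.compile(r'"resource-type":"(?P<v>[^"]+)"'),
    "status": re.compile(r'"status":"(?P<v>[^"]+)"'),
    "logical_resource_id": re.compile(r'"logical-resource-id":"(?P<v>[^"]+)"'),
}


def extract(message: str) -> dict:
    """Return the parsed fields, or None for any field that is absent."""
    out = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(message)
        out[name] = match.group("v") if match else None
    return out


# Made-up sample event, for illustration only.
sample_event = {
    "account": "222222222222",
    "region": "us-east-1",
    "detail": {
        "logical-resource-id": "AppBucket",
        "resource-type": "AWS::S3::Bucket",
        "status-details": {"status": "CREATE_FAILED", "status-reason": "example"},
    },
}

# Compact separators mimic a log line without whitespace between tokens.
message = json.dumps(sample_event, separators=(",", ":"))
fields = extract(message)
is_failure = fields["status"] is not None and "FAILED" in fields["status"]
print(fields, is_failure)
```

If the stored log lines contain whitespace after the colons, the expressions would need an optional `\s*` at that position, both here and in the query.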
## Key Metrics to Track
- Deployment success/failure rate by account
- Time-to-deploy by resource type
- Regional distribution of deployments
- Failed operations and affected accounts
- Deployment timeline and operation duration
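Two of these metrics, failure rate by account and operation duration, can be derived from query results with a few lines of post-processing. The sketch below assumes each record is a dict with `account` and `status` keys and that timestamps are ISO 8601 strings with an explicit UTC offset. The record shape, the function names, and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def failure_rate_by_account(records: list) -> dict:
    """Fraction of finished operations per account that ended in *_FAILED."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for record in records:
        status = record["status"]
        # Operations still in progress have no outcome yet, so skip them.
        if status.endswith("_IN_PROGRESS"):
            continue
        totals[record["account"]] += 1
        if status.endswith("_FAILED"):
            failures[record["account"]] += 1
    return {account: failures[account] / totals[account] for account in totals}


def operation_duration_seconds(started: str, finished: str) -> float:
    """Elapsed seconds between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(finished) - datetime.fromisoformat(started)
    return delta.total_seconds()


# Made-up sample records, for illustration only.
records = [
    {"account": "111111111111", "status": "CREATE_COMPLETE"},
    {"account": "111111111111", "status": "UPDATE_FAILED"},
    {"account": "222222222222", "status": "CREATE_COMPLETE"},
    {"account": "222222222222", "status": "CREATE_IN_PROGRESS"},
]
rates = failure_rate_by_account(records)
duration = operation_duration_seconds(
    "2025-10-24T10:00:00+00:00", "2025-10-24T10:02:30+00:00"
)
print(rates, duration)
```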