---
title: "Medallion Architecture"
type: concept
tags: [data-engineering, lakehouse, architecture]
sources: [engineering-data-engineer]
last_updated: 2026-05-02
---
## Definition
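Medallion Architecture is a layered design for the data lakehouse. Data moves through three layers, Bronze → Silver → Gold, and is progressively refined from raw input into business-ready tables.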
## Three Layers
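### Bronze Layer (Raw)
- **Characteristics**: raw, immutable, append-only
- **Rules**: never transform data in place; retain the full source file, ingestion timestamp, and source system metadata
- **Schema**: schema-on-read (inferred at read time)
- **Partitioning**: partition by ingestion date, which makes historical replay cheap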
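
The Bronze rules can be sketched in plain Python, independent of any storage engine. This is a minimal illustration, not a reference implementation: the in-memory `table` dict stands in for a partitioned table, and the `_source_file` column name and the helper names are assumptions made for the example (only `_ingested_at` and `_source_system` are named on this page).

```python
from datetime import datetime, timezone


def to_bronze_record(raw_line: str, source_file: str, source_system: str,
                     ingested_at: datetime) -> dict:
    """Wrap one raw input line with ingestion metadata; the payload is not parsed or changed."""
    return {
        "raw": raw_line,                        # stored exactly as received
        "_source_file": source_file,            # hypothetical column name
        "_source_system": source_system,
        "_ingested_at": ingested_at.isoformat(),
    }


def bronze_partition(ingested_at: datetime) -> str:
    """Partition key derived from the ingestion date, not the event date."""
    return f"ingestion_date={ingested_at.date().isoformat()}"


def append_to_bronze(table: dict, records: list) -> None:
    """Append-only write: partitions are only ever extended, never rewritten."""
    for rec in records:
        key = bronze_partition(datetime.fromisoformat(rec["_ingested_at"]))
        table.setdefault(key, []).append(rec)


# Usage: ingesting the same file twice adds rows; deduplication is left to Silver.
ts = datetime(2026, 5, 2, 12, 0, tzinfo=timezone.utc)
rec = to_bronze_record('{"id": 1}', "orders.json", "erp", ts)
table = {}
append_to_bronze(table, [rec])
append_to_bronze(table, [rec])
```

Because the raw payload is kept verbatim and partitions follow ingestion date, replaying a given day only requires re-reading that day's partition.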
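### Silver Layer (Cleansed)
- **Characteristics**: cleansed, deduplicated, conformed to a unified format
- **Rules**: must be joinable across domains; handle nulls explicitly (impute/flag/reject); standardize data types, date formats, currency codes, and country codes
- **Implementation**: SCD Type 2 to track historical changes; deduplicate on primary key + event timestamp
- **Quality**: the null-handling rule for every field must be explicitly documented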
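
As a rough sketch of the Silver rules, the snippet below applies a per-field null policy and then deduplicates on the pair (primary key, event timestamp). The field names, the `NULL_RULES` table, and the `_null_flags` column are hypothetical. Treating "primary key + event timestamp" as a composite deduplication key is one reading of the rule; keeping only the latest row per key would be another.

```python
from typing import Optional

# Hypothetical per-field policy: every field declares impute, flag, or reject.
NULL_RULES = {
    "order_id": ("reject", None),
    "currency": ("impute", "USD"),
    "country": ("flag", None),
}


def apply_null_rules(row: dict) -> Optional[dict]:
    """Return a cleaned copy of row, or None if a 'reject' field is null."""
    out = dict(row)
    flags = []
    for field, (action, default) in NULL_RULES.items():
        if out.get(field) is not None:
            continue
        if action == "reject":
            return None
        if action == "impute":
            out[field] = default
        elif action == "flag":
            flags.append(field)
    out["_null_flags"] = flags
    return out


def deduplicate(rows: list, key: str = "order_id", ts: str = "event_ts") -> list:
    """Drop rows that repeat an already seen (primary key, event timestamp) pair."""
    seen = set()
    unique = []
    for row in rows:
        marker = (row[key], row[ts])
        if marker in seen:
            continue
        seen.add(marker)
        unique.append(row)
    return unique


rows = [
    {"order_id": 1, "event_ts": "2026-05-01T00:00:00", "currency": None, "country": None},
    {"order_id": 1, "event_ts": "2026-05-01T00:00:00", "currency": None, "country": None},
    {"order_id": None, "event_ts": "2026-05-01T00:00:00", "currency": "EUR", "country": "DE"},
]
cleaned = [r for r in map(apply_null_rules, rows) if r is not None]
silver = deduplicate(cleaned)
```

Keeping the policy in one table makes the per-field null handling easy to document and review alongside the code.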
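### Gold Layer (Business)
- **Characteristics**: business-ready, SLA-backed, optimized for query patterns
- **Rules**: Gold consumers must not read Bronze or Silver directly; every row must carry a row-level data quality score; use `replaceWhere` for atomic overwrites
- **Optimization**: Z-Ordering (multi-dimensional clustering), partition pruning, pre-aggregation
- **SLA**: state the refresh frequency explicitly (e.g. "refreshed every 15 minutes")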
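
The overwrite rule can be illustrated without a Spark cluster. The sketch below emulates the semantics of a predicate-scoped overwrite on an in-memory list and attaches a row-level quality score. The `replace_where` function here is a plain-Python stand-in, not the Delta Lake API, and the scoring rule and the `_dq_score` column name are assumptions. In Delta Lake itself the equivalent is an overwrite-mode write with the `replaceWhere` option set to a predicate string.

```python
def quality_score(row: dict, required: tuple) -> float:
    """Fraction of required fields that are populated (a hypothetical scoring rule)."""
    filled = sum(1 for field in required if row.get(field) is not None)
    return filled / len(required)


def replace_where(table: list, predicate, new_rows: list) -> list:
    """Return a new table in which all rows matching predicate are replaced by new_rows.

    A fresh list is built and returned in one step, so callers see either the
    old slice or the new one; rows outside the predicate are left untouched.
    """
    for row in new_rows:
        if not predicate(row):
            raise ValueError("new row falls outside the replaced slice")
    kept = [row for row in table if not predicate(row)]
    return kept + new_rows


gold = [
    {"day": "2026-05-01", "revenue": 10.0},
    {"day": "2026-05-02", "revenue": 5.0},
]
fresh = [{"day": "2026-05-02", "revenue": 7.5}]
for row in fresh:
    row["_dq_score"] = quality_score(row, ("day", "revenue"))


def is_target_day(row: dict) -> bool:
    return row["day"] == "2026-05-02"


once = replace_where(gold, is_target_day, fresh)
twice = replace_where(once, is_target_day, fresh)  # a re-run yields the same table
```

Replacing a whole slice rather than appending is what lets a refresh be re-run safely: the second run produces the same table as the first.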
## Core Principles
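- **Immutability**: Bronze is never overwritten; every record carries `_ingested_at` and `_source_system`
- **Progressive quality**: data quality improves step by step at each layer, Bronze → Silver → Gold
- **Consumer protection**: upstream schema changes are captured with `mergeSchema=true` but do not automatically pollute downstream tables
- **Idempotency**: every pipeline step from Silver to Gold must be idempotent; re-running a step must not produce duplicates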
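
The consumer-protection principle can be sketched as two separate steps: record upstream schema drift, then project onto an agreed column list before anything reaches consumers. The contract columns and function names below are hypothetical, and the set union only mimics the effect of schema merging; it is not how `mergeSchema` is implemented.

```python
CONTRACT_COLUMNS = ("order_id", "amount")  # hypothetical downstream contract


def merge_schema(known: set, row: dict) -> set:
    """Capture upstream drift: return the known columns plus any new ones in row."""
    return known | set(row)


def project_for_consumers(row: dict) -> dict:
    """Expose only contracted columns so new upstream fields do not leak downstream."""
    return {col: row.get(col) for col in CONTRACT_COLUMNS}


known_columns = {"order_id", "amount"}
incoming = {"order_id": 1, "amount": 9.5, "coupon_code": "X"}

known_columns = merge_schema(known_columns, incoming)  # drift is recorded upstream
published = project_for_consumers(incoming)            # consumers see the contract only
```

Here the new `coupon_code` column is retained upstream, while the published row is unchanged until the contract is deliberately extended.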
## Related Concepts
- [[CDC (Change Data Capture)]]
- [[Data Contract]]
- [[Data Lineage]]
- [[SCD Type 2]]
## Related Entities
- [[Delta Lake]] (storage format for Bronze/Silver/Gold)
- [[Apache Spark]] (compute engine)
- [[Databricks]] (managed platform)
- [[Apache Iceberg]] (alternative open table format)