---
title: "Data Contract"
type: concept
tags: [data-engineering, data-quality, schema, SLA]
sources: [engineering-data-engineer]
last_updated: 2026-05-02
---

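## Definition

A data contract is an explicit agreement between data producers and data consumers. It defines a dataset's expected schema, data types, SLAs, ownership, and consumers. In a Medallion Architecture, data contracts are the core quality-assurance mechanism at the Silver→Gold boundary.
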
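## Components

### Schema Contract
- Field names, types, and constraints (not_null, unique, foreign key)
- Schema evolution rules: adding nullable fields is allowed; removing fields or changing their types is forbidden
- `mergeSchema=true`: permits schema evolution, but a change must trigger an alert rather than automatically contaminate downstream data
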
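The schema evolution rules above can be sketched as a small compatibility check. This is a minimal illustration, not part of any library: the `Field` class and the `check_schema_evolution` function are hypothetical names, and treating a newly added non-nullable field as a violation is an assumption drawn from the "nullable fields only" rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One column in a schema contract (hypothetical structure)."""
    name: str
    data_type: str
    nullable: bool = True


def check_schema_evolution(old: list[Field], new: list[Field]) -> list[str]:
    """Return the violations found when evolving `old` into `new`.

    An empty list means the change satisfies the evolution rules.
    """
    old_by_name = {f.name: f for f in old}
    new_by_name = {f.name: f for f in new}
    violations = []

    # Existing fields may be neither removed nor retyped.
    for name, old_field in old_by_name.items():
        new_field = new_by_name.get(name)
        if new_field is None:
            violations.append(f"removed field: {name}")
        elif new_field.data_type != old_field.data_type:
            violations.append(
                f"type change on {name}: "
                f"{old_field.data_type} -> {new_field.data_type}"
            )

    # New fields are allowed only when they are nullable.
    for name, new_field in new_by_name.items():
        if name not in old_by_name and not new_field.nullable:
            violations.append(f"added non-nullable field: {name}")

    return violations
```

A pipeline could run such a check before writing and route any non-empty result to an alert, so that drift is reported instead of being merged silently.
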
### SLA Contract
- Refresh frequency (e.g. "refreshed every 15 minutes")
- Data freshness threshold (e.g. "new data must arrive within 1 hour")
- Availability commitment (e.g. "99.9% availability for the Gold layer")

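A freshness threshold like the one above reduces to comparing the newest load timestamp against the current time. The sketch below assumes timezone-aware UTC timestamps; `freshness_breached` and its parameters are illustrative names, and the one-hour default simply mirrors the example threshold.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def freshness_breached(
    last_loaded_at: datetime,
    max_age: timedelta = timedelta(hours=1),
    now: Optional[datetime] = None,
) -> bool:
    """Return True when the newest data is older than the freshness threshold.

    `last_loaded_at` is assumed to be a timezone-aware UTC timestamp.
    `now` can be injected to make the check deterministic in tests.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at > max_age
```

In practice `last_loaded_at` would come from something like the maximum ingestion timestamp of the table; how that value is obtained depends on the platform and is left out here.
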
### Ownership Contract
- Data owner
- Data consumers
- Support contact

## Enforcement

### dbt Contract Enforcement
```yaml
models:
  - name: silver_orders
    config:
      contract:
        enforced: true  # Enforce the schema contract; the build fails on a type mismatch
    columns:
      - name: order_id
        data_type: string
        constraints:
          - type: not_null
          - type: unique
```

### Great Expectations (Data Quality Validation)
- Row-level data quality scores must be attached at the Gold layer
- Null-rate alert thresholds (e.g. the `customer_id` null rate jumping from 0.1% to 4.2% triggers a PagerDuty alert)

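The null-rate alert above can be illustrated in plain Python, independent of any validation framework. This is a sketch under assumptions: `null_rate` and `should_alert` are hypothetical helpers, the 1% threshold is an invented value chosen only so that the 0.1% baseline passes and the 4.2% spike fails, and the actual paging step is omitted.

```python
from typing import Iterable, Optional


def null_rate(values: Iterable[Optional[object]]) -> float:
    """Return the fraction of values that are None (0.0 for empty input)."""
    values = list(values)
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def should_alert(values: Iterable[Optional[object]], threshold: float = 0.01) -> bool:
    """Return True when the null rate exceeds the threshold.

    The 1% default is an assumed value for illustration, not a recommendation.
    """
    return null_rate(values) > threshold


# 1 null in 1,000 rows (0.1%) stays below the assumed threshold.
baseline = ["c1"] * 999 + [None]
# 42 nulls in 1,000 rows (4.2%) exceeds it and would be routed to on-call.
spike = ["c1"] * 958 + [None] * 42
```

A fixed threshold is the simplest policy; comparing against a recent baseline would also catch jumps that stay under an absolute limit.
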
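## Key Rules

- **Schema drift must raise an alert**: it must never silently corrupt downstream data
- **Null handling must be explicit**: nulls must not propagate implicitly into the Gold layer
- **Confirm with consumers before release**: a Gold-layer pipeline may be deployed only after the data contract has been signed
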
## Related Concepts
- [[Medallion Architecture]]
- [[Great Expectations]] (data quality validation tool)
- [[Data Lineage]]
- [[SCD Type 2]]