---
title: "dbt (data build tool)"
type: entity
tags: [data-engineering, data-transformation, SQL, data-quality]
sources: [engineering-data-engineer]
last_updated: 2026-05-02
---

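## Overview

dbt (data build tool) is a SQL-first tool for data transformation and data quality management. It lets analysts and engineers define transformations, tests, and quality contracts using SQL and YAML. The Data Engineer Agent uses dbt Cloud to define the schema contracts and data quality tests for the Silver layer of the Medallion Architecture.
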
## Core Capabilities

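### Data Transformation

- Transformation models are defined in SQL under `models/`
- Jinja templating in SQL makes logic reusable
- Incremental models (`incremental` materialization) process only new or changed rows instead of recomputing the full table
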
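As a minimal sketch of the incremental materialization above, the model can be configured in a properties YAML file. It assumes an adapter that supports the `merge` strategy (dbt-databricks on Delta tables does) and reuses `silver_orders` and `order_id` from the contract example. The model's SQL still has to restrict each run to new rows inside an `{% if is_incremental() %}` block, typically by comparing a timestamp column with its `max()` in `{{ this }}`.

```yaml
models:
  - name: silver_orders
    config:
      materialized: incremental    # build the table once, then add/merge new rows on later runs
      unique_key: order_id         # rows whose key already exists are updated, not duplicated
      incremental_strategy: merge  # requires adapter support (e.g. Delta Lake on Databricks)
      on_schema_change: fail       # stop the run if the incoming columns differ from the existing table
```

A model can only be described once across properties files, so in a real project these `config` keys would sit in the same `silver_orders` entry as the contract definition.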
### Schema Contract Enforcement

```yaml
models:
  - name: silver_orders
    config:
      contract:
        enforced: true  # fail the build if the model's output schema does not match
    columns:
      - name: order_id
        data_type: string
        # constraints: added to the table DDL where the platform supports them
        constraints:
          - type: not_null
          - type: unique
        # data tests re-check the rules against the built data,
        # since not every platform enforces every constraint
        tests:
          - not_null
          - unique
      - name: revenue
        data_type: decimal(18, 2)
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 1000000
```

### Data Testing

- Column tests (`not_null`, `unique`, `relationships`)
- dbt Expectations extension (value ranges, distributions, freshness)
- Recency tests (e.g. "new data must arrive within the last hour")

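A sketch of how the relationship and recency tests listed above might be declared, assuming the `dbt_utils` package is installed (its `recency` test covers the one-hour check). The model `silver_order_items` and its `updated_at` column are hypothetical; `silver_orders.order_id` comes from the contract example.

```yaml
models:
  - name: silver_order_items
    tests:
      # Model-level recency test: fail if the newest updated_at is older than one hour
      - dbt_utils.recency:
          datepart: hour
          field: updated_at
          interval: 1
    columns:
      - name: order_id
        tests:
          - not_null
          # Referential integrity: every order_id must exist in silver_orders
          - relationships:
              to: ref('silver_orders')
              field: order_id
```

For raw sources, `dbt source freshness` with `loaded_at_field` and `freshness` thresholds on the source definition serves the same purpose before any model runs.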
### Semantic Layer (dbt Cloud)

- Define metrics once and reuse them across multiple BI tools
- Unify business metric definitions, removing duplicated logic from the BI layer

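The define-once idea can be sketched with a MetricFlow semantic model and a simple metric, the YAML spec behind the dbt Semantic Layer. The names `orders`, `order_date`, and `total_revenue` are hypothetical (the `order_date` column is assumed to exist in `silver_orders`); `order_id` and `revenue` reuse the contract example's columns. Exact keys can vary between dbt versions.

```yaml
semantic_models:
  - name: orders
    model: ref('silver_orders')
    defaults:
      agg_time_dimension: order_date  # time axis used when the metric is queried over time
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: revenue
        agg: sum

metrics:
  - name: total_revenue
    label: Total Revenue
    type: simple
    type_params:
      measure: revenue  # every BI tool querying total_revenue gets this same SUM
```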
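## Integration with Lakehouse

- **dbt + Spark/Delta Lake**: cleansing and conforming data into the Silver layer
- **dbt + Kafka**: dbt does not consume Kafka directly; once a streaming job writes Kafka events into Bronze tables, frequently scheduled incremental dbt models keep the Silver layer updated in near real time
- **dbt + Databricks**: native Unity Catalog integration
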
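A minimal `profiles.yml` sketch for the dbt-databricks adapter, where the `catalog` field selects the Unity Catalog catalog that models are written to. The profile name, catalog, schema, and the placeholder host and HTTP path are hypothetical, and the token is read from an environment variable rather than stored in the file. With dbt Cloud, as used by the Data Engineer Agent, the connection is configured in the dbt Cloud UI instead, so this applies to dbt Core.

```yaml
lakehouse:                     # must match `profile:` in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: main            # Unity Catalog catalog
      schema: silver           # schema that receives the Silver-layer models
      host: <workspace-host>   # workspace hostname, without https://
      http_path: <sql-warehouse-http-path>
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      threads: 4
```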
## Related Concepts

- [[Medallion Architecture]]
- [[Data Contract]]
- [[Delta Lake]]