---
title: "dbt (data build tool)"
type: entity
tags: [data-engineering, data-transformation, SQL, data-quality]
sources: [engineering-data-engineer]
last_updated: 2026-05-02
---
## Overview
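dbt (data build tool) is a SQL-first tool for data transformation and data quality management. It lets analysts and engineers define transformations in SQL and declare tests and quality contracts alongside them in YAML. The Data Engineer Agent uses dbt Cloud to define schema contracts and data quality tests for the Silver layer of the Medallion Architecture.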
## Core Capabilities
### Data Transformation
- Define `models/` (transformation models) in SQL
- Use Jinja2-templated SQL to reuse logic
- Use incremental models (`incremental` materialization) to avoid full recomputation
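
As a minimal sketch, the incremental materialization might be configured in a properties file as follows, assuming the `silver_orders` model from this page with `order_id` as its key. The `merge` strategy and the `on_schema_change` value are illustrative choices, and which strategies are available depends on the adapter. The model's SQL would still need an `is_incremental()` filter to limit the rows it scans.

```yaml
models:
  - name: silver_orders
    config:
      materialized: incremental          # process only new or changed rows
      unique_key: order_id               # assumed key used to match existing rows
      incremental_strategy: merge        # illustrative; adapter-dependent
      on_schema_change: append_new_columns
```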
### Schema Contract Enforcement
```yaml
models:
  - name: silver_orders
    config:
      contract:
        enforced: true  # build fails when the schema does not match
    columns:
      - name: order_id
        data_type: string
        constraints:
          - type: not_null
          - type: unique
        tests:
          - not_null
          - unique
      - name: revenue
        data_type: decimal(18, 2)
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 1000000
```
### Data Testing
- Column tests (`not_null`, `unique`, `relationships`)
- dbt Expectations extension (value ranges, distributions, freshness)
- Recency tests, e.g. "new data must arrive within the last hour"
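
The test types above could be declared roughly as follows. This is a sketch under assumptions: the `silver_customers` model and the `customer_id` and `created_at` columns are hypothetical names, and `dbt_utils.recency` comes from the separate `dbt_utils` package, which would need to be installed.

```yaml
models:
  - name: silver_orders
    tests:
      # Recency: fail if no row has created_at within the last hour
      - dbt_utils.recency:
          datepart: hour
          field: created_at            # hypothetical timestamp column
          interval: 1
    columns:
      - name: customer_id              # hypothetical foreign key column
        tests:
          - not_null
          # Referential integrity against a hypothetical parent model
          - relationships:
              to: ref('silver_customers')
              field: customer_id
```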
### Semantic Layer (dbt Cloud)
- Define Metrics once and reuse them across multiple BI tools
- Unify business metric definitions and eliminate duplicated logic in the BI layer
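
A metric definition might look like the sketch below, which follows the MetricFlow-style YAML spec used by the dbt Semantic Layer. The exact keys vary between dbt versions, so treat it as illustrative. The `order_date` column and the `revenue_sum` and `total_revenue` names are assumptions that do not appear elsewhere on this page.

```yaml
semantic_models:
  - name: orders
    model: ref('silver_orders')
    defaults:
      agg_time_dimension: order_date   # hypothetical date column
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: revenue_sum
        agg: sum
        expr: revenue

metrics:
  - name: total_revenue                # defined once, queried from any BI tool
    label: Total revenue
    type: simple
    type_params:
      measure: revenue_sum
```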
## Integration with Lakehouse
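- **dbt + Spark/Delta Lake**: cleansing and conforming data in the Silver layer
- **dbt + Kafka**: paired with streaming writes to keep the Silver layer updated in near real time
- **dbt + Databricks**: native Unity Catalog integration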
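
For the Databricks case, a dbt Core `profiles.yml` for the `dbt-databricks` adapter could look roughly like this. It is only a sketch: the profile name, the catalog and schema names, and the environment variable names are all assumptions, and dbt Cloud sets up connections through its UI rather than through this file.

```yaml
nexus_lakehouse:                       # hypothetical profile name
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: main                    # Unity Catalog catalog (assumed name)
      schema: silver                   # assumed schema for Silver-layer models
      host: "{{ env_var('DATABRICKS_HOST') }}"
      http_path: "{{ env_var('DATABRICKS_HTTP_PATH') }}"
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
```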
## Related Concepts
- [[Medallion Architecture]]
- [[Data Contract]]
- [[Delta Lake]]