Update nexus wiki content

2026-05-03 05:42:06 +08:00
parent 90f3811b83
commit 111bc65b7b
707 changed files with 32306 additions and 7289 deletions

@@ -0,0 +1,55 @@
---
title: "Delta Lake"
type: entity
tags: [data-engineering, lakehouse, open-table-format, ACID]
sources: [engineering-data-engineer]
last_updated: 2026-05-02
---
## Overview
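Delta Lake is an open table format open-sourced by Databricks that adds ACID transactions, time travel, Z-Ordering, and related capabilities to data lakes. The Data Engineer Agent uses Delta Lake as the unified storage format for all three layers (Bronze/Silver/Gold) of its Medallion Architecture.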
## Key Features
### ACID Transactions
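- Writes commit atomically, so readers always see a consistent snapshot of the table
- Concurrent writers never leave partial writes behind; conflicts are resolved through optimistic concurrency control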
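The atomic-commit idea can be sketched as a put-if-absent on a numbered log file. This is a toy model under stated assumptions, not Delta Lake's implementation: `try_commit` and `latest_version` are hypothetical helpers, the `add` actions are heavily simplified, and a real log store must also make the file's contents appear atomically.

```python
import json
import os
import tempfile


def try_commit(log_dir: str, version: int, actions: list) -> bool:
    """Try to publish `actions` as commit `version`; return False if that version exists."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_EXCL makes creation fail if the file exists, so only one writer claims a version.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return True


def latest_version(log_dir: str) -> int:
    """Readers resolve table state from the highest committed version (-1 if empty)."""
    versions = [int(n.split(".")[0]) for n in os.listdir(log_dir) if n.endswith(".json")]
    return max(versions, default=-1)


log_dir = tempfile.mkdtemp()
first = try_commit(log_dir, 0, [{"add": {"path": "part-0000.parquet"}}])
# A second writer racing for the same version loses and must retry on the next one.
conflict = try_commit(log_dir, 0, [{"add": {"path": "part-0001.parquet"}}])
retry = try_commit(log_dir, latest_version(log_dir) + 1, [{"add": {"path": "part-0001.parquet"}}])
```

Because a commit either exists in full as a numbered entry or does not exist at all, a reader that lists the log never observes half of a write.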
### Time Travel
- Query the table as of any retained version or timestamp (`VERSION AS OF`, `TIMESTAMP AS OF`)
- Used for auditing, compliance, and rollback
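A minimal sketch of how the two time travel clauses might be assembled. The helper `time_travel_query` and the table name `silver.orders` are hypothetical and used only for illustration.

```python
from typing import Optional


def time_travel_query(table: str, version: Optional[int] = None,
                      timestamp: Optional[str] = None) -> str:
    """Build a SELECT that reads `table` at a version or a timestamp (exactly one)."""
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        return f"SELECT * FROM {table} VERSION AS OF {int(version)}"
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


audit_sql = time_travel_query("silver.orders", version=12)
rollback_sql = time_travel_query("silver.orders", timestamp="2026-05-01 00:00:00")
```

The resulting strings would be passed to `spark.sql(...)` on a Delta-enabled session. The DataFrame reader offers the same choice through the `versionAsOf` and `timestampAsOf` options.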
### Schema Enforcement & Evolution
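- `mergeSchema=true`: allows schema evolution, so upstream changes such as new columns are picked up
- Dropping required columns is not allowed, and type changes must be declared explicitly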
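The rules above can be modeled as a small pre-write check. This is a simplified sketch, not Delta Lake's validation logic: `check_write_schema` is a hypothetical helper, and schemas are represented as plain dicts of `{column: (type_name, required)}`.

```python
def check_write_schema(table_schema: dict, incoming: dict, merge_schema: bool = False) -> dict:
    """Return the table schema after the write, or raise ValueError if it must be rejected."""
    for name, (col_type, required) in table_schema.items():
        if name not in incoming:
            if required:
                raise ValueError(f"required column missing: {name}")
            continue
        if incoming[name][0] != col_type:
            raise ValueError(f"type change for {name} must be declared explicitly")
    new_columns = {n: spec for n, spec in incoming.items() if n not in table_schema}
    if new_columns and not merge_schema:
        raise ValueError(f"unexpected columns: {sorted(new_columns)}")
    # With merge_schema enabled, new upstream columns are appended to the table schema.
    return {**table_schema, **new_columns}


table = {"id": ("long", True), "amount": ("double", False)}
batch = {"id": ("long", True), "amount": ("double", False), "currency": ("string", False)}
evolved = check_write_schema(table, batch, merge_schema=True)
```

Calling `check_write_schema(table, batch)` without `merge_schema=True` raises, which mirrors enforcement being the default and evolution being opt-in.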
### Z-Ordering
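- Multi-dimensional clustering that physically co-locates related data in the same files
- Speeds up queries that filter on several columns at once, because more files can be skipped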
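The general idea of a Z-order curve can be shown by interleaving the bits of two column values and sorting on the result. This sketch illustrates the concept only; Delta Lake's actual implementation differs in its details, and `z_value` is a hypothetical helper.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return key


points = [(x, y) for x in range(4) for y in range(4)]
# Sorting by the interleaved key keeps points that are close in both dimensions adjacent.
ordered = sorted(points, key=lambda p: z_value(*p))
files = [ordered[i:i + 4] for i in range(0, len(ordered), 4)]  # 4 rows per "file"
```

Each "file" ends up covering one 2x2 block of the grid, so a filter such as `x < 2 AND y < 2` touches a single file. On a Delta table the clustering is applied with `OPTIMIZE <table> ZORDER BY (col1, col2)`.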
### Liquid Clustering (Delta Lake 3.x+)
- Automatic compaction and clustering that adapts to the workload
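A hedged sketch of the statements involved, written as Python strings so they can be handed to a Spark session. The table and column names are hypothetical; the statements follow Delta Lake's documented `CLUSTER BY` syntax as I understand it, so check them against the version in use.

```python
# Hypothetical table and column names, for illustration only.
create_stmt = (
    "CREATE TABLE gold.events (event_id BIGINT, event_date DATE) "
    "USING DELTA CLUSTER BY (event_date)"
)
# Clustering columns can be changed later without redefining the table.
recluster_stmt = "ALTER TABLE gold.events CLUSTER BY (event_id)"
# OPTIMIZE triggers clustering of the data according to the current keys.
optimize_stmt = "OPTIMIZE gold.events"

statements = [create_stmt, recluster_stmt, optimize_stmt]
# With a Delta-enabled SparkSession named `spark`: for s in statements: spark.sql(s)
```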
### UPSERT / MERGE
```python
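from delta.tables import DeltaTable

# target_path is a placeholder; source is a DataFrame of incoming rows.
target = DeltaTable.forPath(spark, target_path)
target.alias("target").merge(source.alias("source"), merge_condition) \
    .whenMatchedUpdateAll() \
    .whenNotMatchedInsertAll() \
    .execute()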
```
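This implements idempotent incremental updates: replaying the same batch leaves the table unchanged, provided `merge_condition` matches on a unique key.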
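The idempotency claim can be checked with a toy in-memory model of the update-all / insert-all semantics. This is a sketch, not Delta Lake code: `merge_upsert` is a hypothetical helper, and the table is a plain dict keyed by a unique `id`.

```python
def merge_upsert(target: dict, source_rows: list, key: str) -> dict:
    """Return a new table state: update rows whose key matches, insert the rest."""
    merged = dict(target)
    for row in source_rows:
        merged[row[key]] = row  # matched -> update all columns; not matched -> insert
    return merged


table = {1: {"id": 1, "status": "new"}}
batch = [{"id": 1, "status": "shipped"}, {"id": 2, "status": "new"}]

once = merge_upsert(table, batch, key="id")
twice = merge_upsert(once, batch, key="id")  # replaying the same batch changes nothing
```

The property holds only when the key is unique within the source batch, so deduplicating the source before merging is a common precaution.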
## Alternative Formats
- [[Apache Iceberg]]: another open table format specification, with cross-engine interoperability (Spark/Trino/Presto)
- Apache Hudi: supports incremental processing (configured through `hoodie.*` options)
## Used By
- [[Databricks]] (native support)
- [[Apache Spark]] (direct support through the `delta` format)
- AWS Glue, Snowflake (via connectors)
## Related Concepts
- [[Medallion Architecture]]
- [[Apache Spark]]
- [[SCD Type 2]]