Update nexus wiki content
41
wiki/entities/Databricks.md
Normal file
@@ -0,0 +1,41 @@
---
title: "Databricks"
type: entity
tags: [data-engineering, lakehouse, analytics-platform, cloud]
sources: [engineering-data-engineer]
last_updated: 2026-05-02
---

## Overview

Databricks is a unified analytics and AI platform built on Apache Spark. It provides the Lakehouse architecture, notebooks, MLflow, Delta Live Tables (DLT), and Unity Catalog. The Data Engineer Agent uses Databricks as its primary managed execution environment.

## Key Products for Data Engineering

### Unity Catalog
- Unified governance: a single data catalog and permission model across clouds (AWS/Azure/GCP)
- Fine-grained row-level security and column masking
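
To make the two fine-grained controls concrete, the following plain-Python sketch mimics what a row filter and a column mask do at query time. It does not call Unity Catalog: the `Reader` type, the `region` and `email` fields, and the `admins` group are hypothetical names chosen for illustration. In Unity Catalog itself, equivalent logic is typically declared as functions attached to a table rather than written in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reader:
    """Hypothetical caller identity: a name plus group memberships."""
    name: str
    groups: frozenset


def row_visible(reader: Reader, row: dict) -> bool:
    """Row filter: admins see every row, everyone else only the 'US' region."""
    return "admins" in reader.groups or row["region"] == "US"


def mask_email(reader: Reader, email: str) -> str:
    """Column mask: hide the local part of the address from non-admins."""
    if "admins" in reader.groups:
        return email
    _, _, domain = email.partition("@")
    return "***@" + domain


def read_table(reader: Reader, rows: list) -> list:
    """Apply the row filter first, then the column mask, to each visible row."""
    return [
        {**row, "email": mask_email(reader, row["email"])}
        for row in rows
        if row_visible(reader, row)
    ]


# Hypothetical sample data.
ROWS = [
    {"id": 1, "region": "US", "email": "ana@example.com"},
    {"id": 2, "region": "EU", "email": "bo@example.org"},
]

analyst = Reader("analyst", frozenset({"analysts"}))
admin = Reader("admin", frozenset({"admins"}))

print(read_table(analyst, ROWS))  # only the US row, with the email masked
print(read_table(admin, ROWS))    # both rows, emails intact
```

The point of the design is that the policy lives with the data, so every reader gets the same filtering and masking without each query having to repeat it.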

### Delta Live Tables (DLT)
- Declarative streaming and batch pipelines
- Automatically manages infrastructure, checkpoints, and data quality
- Built-in expectations: data quality rules are declared once and validated automatically
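
Expectations pair a named rule with an action to take when a record violates it. The sketch below imitates three such behaviours in plain Python (keep and count, drop, or fail). It is not the DLT API: `Expectation`, `apply_expectations`, and the sample rules are hypothetical names. In a real pipeline the rules would be attached to a table definition and evaluated by the DLT runtime.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Expectation:
    """Hypothetical rule: a name, a predicate, and a violation policy."""
    name: str
    check: Callable[[dict], bool]
    on_violation: str = "warn"  # one of "warn", "drop", "fail"


def apply_expectations(rows, expectations):
    """Return (kept_rows, violation_counts) after applying every rule."""
    violations = {e.name: 0 for e in expectations}
    kept = []
    for row in rows:
        keep = True
        for exp in expectations:
            if exp.check(row):
                continue
            violations[exp.name] += 1
            if exp.on_violation == "fail":
                # Abort the whole run on the first violating record.
                raise ValueError(f"expectation {exp.name!r} failed for {row}")
            if exp.on_violation == "drop":
                keep = False
            # "warn" keeps the record and only counts the violation.
        if keep:
            kept.append(row)
    return kept, violations


RULES = [
    Expectation("valid_id", lambda r: r["id"] is not None, "drop"),
    Expectation("non_negative_amount", lambda r: r["amount"] >= 0, "warn"),
]

# Hypothetical input records.
ORDERS = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -2.0},
]

kept, violations = apply_expectations(ORDERS, RULES)
print(kept)
print(violations)
```

Here the record with a missing `id` is dropped, while the negative amount is kept and only counted, which is the usual trade-off between protecting downstream tables and not losing data silently.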

### Databricks Workflows
- Orchestrates multi-task pipelines (notebooks + SQL + JAR)
- Supports CI/CD integration (Asset Bundles)
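
A multi-task job is essentially a dependency graph: each task names the tasks it waits for, and the scheduler runs whatever is unblocked. The sketch below derives a valid execution order for a hypothetical three-task job using Python's standard `graphlib`. The task keys, the `type` field, and the simplified `depends_on` lists are assumptions for illustration; the dictionary only loosely resembles a job definition and is not sent to any Databricks API.

```python
from graphlib import TopologicalSorter

# Hypothetical job: one notebook task, one SQL task, one JAR task.
JOB_TASKS = [
    {"task_key": "ingest", "type": "notebook", "depends_on": []},
    {"task_key": "transform", "type": "sql", "depends_on": ["ingest"]},
    {"task_key": "publish", "type": "jar", "depends_on": ["transform"]},
]


def execution_order(tasks):
    """Return task keys in an order that respects every dependency."""
    # Map each task to the set of tasks that must finish before it starts.
    graph = {t["task_key"]: set(t["depends_on"]) for t in tasks}
    # TopologicalSorter raises graphlib.CycleError if dependencies form a loop.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(JOB_TASKS))
```

Declaring dependencies rather than a fixed sequence lets independent tasks run in parallel and makes a cyclic definition fail early instead of hanging at run time.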

### Asset Bundles
- Manages Databricks resources as infrastructure as code (IaC)
- Integrates with GitHub Actions for automated deployment

## Cloud Platforms
- **AWS**: S3 + Databricks
- **Azure**: ADLS + Databricks (Microsoft Fabric integration)
- **GCP**: GCS + Databricks
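
Each cloud pairs Databricks with its own object store, and the most visible difference in pipeline code is the storage URI scheme. The helper below builds a URI per cloud. It is a minimal sketch: `storage_uri` is a hypothetical helper rather than a Databricks function, and the bucket, container, and account names are placeholders.

```python
def storage_uri(cloud: str, container: str, path: str = "", account: str = "") -> str:
    """Build an object-store URI for the given cloud.

    `container` is the S3 bucket, ADLS container, or GCS bucket.
    `account` is only needed for Azure (the ADLS Gen2 storage account).
    """
    path = path.lstrip("/")
    if cloud == "aws":
        return f"s3://{container}/{path}"
    if cloud == "azure":
        if not account:
            raise ValueError("Azure ADLS Gen2 URIs need a storage account name")
        return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"
    if cloud == "gcp":
        return f"gs://{container}/{path}"
    raise ValueError(f"unknown cloud: {cloud!r}")


# Placeholder names, for illustration only.
print(storage_uri("aws", "my-bucket", "bronze/orders"))
print(storage_uri("azure", "lake", "bronze/orders", account="myaccount"))
print(storage_uri("gcp", "my-bucket", "bronze/orders"))
```

Keeping the scheme behind one function is one way to let the same pipeline code run on any of the three clouds with only configuration changes.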

## Related Concepts
- [[Medallion Architecture]]
- [[Delta Lake]] (Databricks is a primary contributor and promoter)
- [[Apache Spark]]
- [[dbt]] (dbt Cloud integrates deeply with Databricks)