---
title: "Databricks"
type: entity
tags: [data-engineering, lakehouse, analytics-platform, cloud]
sources: [engineering-data-engineer]
last_updated: 2026-05-02
---
## Overview
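Databricks is a unified analytics and AI platform built on Apache Spark. It provides capabilities such as the Lakehouse, Notebooks, MLflow, Delta Live Tables (DLT), and Unity Catalog. The Data Engineer Agent uses Databricks as its primary managed execution environment.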
## Key Products for Data Engineering
### Unity Catalog
- Unified governance of data catalogs and access permissions across clouds (AWS/Azure/GCP)
- Fine-grained row-level security and column masking
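The effect of these two controls can be illustrated with a small pure-Python model. In Unity Catalog itself, row filters and column masks are SQL user-defined functions attached to a table and evaluated at query time. The `GovernedTable` class, the `region_*` group naming, and the policy functions below are hypothetical. They only mimic the semantics: rows are filtered first, then masked columns are rewritten according to who is querying.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class User:
    name: str
    groups: FrozenSet[str]


@dataclass
class GovernedTable:
    """Hypothetical stand-in for a table with a row filter and column masks attached."""
    rows: List[Row]
    # Returns True if this user may see the row (models a row-filter function).
    row_filter: Optional[Callable[[User, Row], bool]] = None
    # Column name -> function that rewrites the value for this user (models column masks).
    column_masks: Optional[Dict[str, Callable[[User, Any], Any]]] = None

    def select(self, user: User) -> List[Row]:
        result = []
        for row in self.rows:
            # Row-level security: skip rows the filter rejects for this user.
            if self.row_filter is not None and not self.row_filter(user, row):
                continue
            out = dict(row)
            # Column masking: replace sensitive values instead of dropping the column.
            for column, mask in (self.column_masks or {}).items():
                if column in out:
                    out[column] = mask(user, out[column])
            result.append(out)
        return result


def region_filter(user: User, row: Row) -> bool:
    # Admins see every row; others only see rows for regions whose group they belong to.
    return "admin" in user.groups or f"region_{row['region']}" in user.groups


def mask_email(user: User, value: Any) -> Any:
    # Only admins see the raw e-mail address.
    return value if "admin" in user.groups else "***"


orders = GovernedTable(
    rows=[
        {"id": 1, "region": "us", "email": "a@example.com"},
        {"id": 2, "region": "eu", "email": "b@example.com"},
    ],
    row_filter=region_filter,
    column_masks={"email": mask_email},
)
analyst = User("ana", frozenset({"region_us"}))
print(orders.select(analyst))  # [{'id': 1, 'region': 'us', 'email': '***'}]
```

The masked column stays in the result, so downstream queries keep the same schema for every user. Only the visible rows and values differ.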
### Delta Live Tables (DLT)
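- Declarative streaming and batch pipelines
- Automatically manages infrastructure, checkpoints, and data quality
- Built-in expectations define data-quality rules that are validated automatically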
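An expectation can react to a violating record in three ways, and the differences are easiest to see side by side. In the DLT Python API these correspond to the `@dlt.expect`, `@dlt.expect_or_drop`, and `@dlt.expect_or_fail` decorators: keep the record and record the violation, drop the record, or fail the update. The sketch below is a plain-Python model of that behavior, not the DLT runtime. `Expectation`, `OnViolation`, `ExpectationFailed`, and `apply_expectations` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


class OnViolation(Enum):
    WARN = "warn"  # keep the row, only count the violation (like @dlt.expect)
    DROP = "drop"  # remove the row from the output (like @dlt.expect_or_drop)
    FAIL = "fail"  # abort the whole update (like @dlt.expect_or_fail)


class ExpectationFailed(Exception):
    pass


@dataclass(frozen=True)
class Expectation:
    name: str
    check: Callable[[Row], bool]
    action: OnViolation = OnViolation.WARN


def apply_expectations(
    rows: List[Row], expectations: List[Expectation]
) -> Tuple[List[Row], Dict[str, int]]:
    """Return the rows that survive plus a per-expectation violation count."""
    violations = {e.name: 0 for e in expectations}
    kept = []
    for row in rows:
        keep = True
        for e in expectations:
            if e.check(row):
                continue
            violations[e.name] += 1
            if e.action is OnViolation.FAIL:
                raise ExpectationFailed(f"expectation '{e.name}' violated by {row!r}")
            if e.action is OnViolation.DROP:
                keep = False
        if keep:
            kept.append(row)
    return kept, violations


rows = [{"id": 1, "amount": 10}, {"id": None, "amount": 5}, {"id": 3, "amount": -2}]
rules = [
    Expectation("valid_id", lambda r: r["id"] is not None, OnViolation.DROP),
    Expectation("non_negative_amount", lambda r: r["amount"] >= 0),  # WARN by default
]
kept, stats = apply_expectations(rows, rules)
print(len(kept), stats)  # 2 {'valid_id': 1, 'non_negative_amount': 1}
```

The row with a missing `id` is dropped. The row with a negative amount is kept but counted, which mirrors the use of warn-only expectations for monitoring data quality without blocking the pipeline.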
### Databricks Workflows
- Orchestrates multi-task pipelines (notebooks + SQL + JAR)
- Supports CI/CD integration (via Asset Bundles)
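In a multi-task job, each task declares its upstream tasks, and a task starts only after those tasks have finished. In the Jobs API, a task lists these upstream tasks by `task_key` under `depends_on`. The sketch below models that ordering with Python's standard `graphlib` and groups tasks into waves that could run in parallel. The task names and the `execution_waves` helper are hypothetical. The real scheduler also deals with retries and failed upstream tasks, which this model ignores.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; every task in a wave has all its upstream tasks finished."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # assume the whole wave succeeded
    return waves


# Hypothetical job mixing the task types listed above.
job = {
    "ingest_raw": set(),                                  # notebook task
    "clean_orders": {"ingest_raw"},                       # SQL task
    "clean_customers": {"ingest_raw"},                    # SQL task
    "build_report": {"clean_orders", "clean_customers"},  # JAR task
}
print(execution_waves(job))
# [['ingest_raw'], ['clean_customers', 'clean_orders'], ['build_report']]
```

Sorting each wave only makes the output deterministic. Within a wave, tasks have no ordering constraint between them.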
### Asset Bundles
- Manages Databricks resources as infrastructure as code (IaC)
- Can be integrated with GitHub Actions for automated deployment
## Cloud Platforms
- **AWS**: S3 + Databricks
- **Azure**: ADLS + Databricks (Microsoft Fabric integration)
- **GCP**: GCS + Databricks
## Related Concepts
- [[Medallion Architecture]]
- [[Delta Lake]] (Databricks is its main contributor and promoter)
- [[Apache Spark]]
- [[dbt]] (dbt Cloud integrates deeply with Databricks)