Auto-sync: 2026-04-24 00:02
wiki/concepts/Columnar-Storage.md
@@ -0,0 +1,39 @@
---
title: "Columnar Storage"
type: concept
tags:
- Data-Warehouse
- Storage
- Performance
sources:
- ctp-topic-68-introduction-to-redshift
last_updated: 2026-04-23
---

## Overview
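Columnar storage is a data storage format that organizes data by column rather than by row. It is designed for analytical (OLAP) workloads: compared with traditional row-oriented storage, it can significantly speed up aggregation queries and large scans that touch only a few columns, and it usually reduces the storage footprint because columns compress well.
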
## How It Works
Row-oriented storage lays values out row by row: `[row1_col1, row1_col2, row1_col3, row2_col1, row2_col2, row2_col3, ...]`

Columnar storage lays values out column by column: `[col1_row1, col1_row2, ..., col2_row1, col2_row2, ..., col3_row1, col3_row2, ...]`

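The two layouts can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine stores data; the table contents and the helper names `to_row_layout` and `to_column_layout` are made up for the example.

```python
# Hypothetical three-column table (id, name, amount); values are illustrative only.
rows = [
    (1, "alice", 30.0),
    (2, "bob", 12.5),
    (3, "carol", 7.25),
]

def to_row_layout(rows):
    """Flatten row by row: r1c1, r1c2, r1c3, r2c1, ..."""
    return [value for row in rows for value in row]

def to_column_layout(rows):
    """Flatten column by column: c1r1, c1r2, ..., c2r1, c2r2, ..."""
    return [value for column in zip(*rows) for value in column]

row_layout = to_row_layout(rows)
column_layout = to_column_layout(rows)

n_rows, n_cols = len(rows), len(rows[0])

# Summing the third column strides across the whole row-layout buffer,
# but reads one contiguous slice of the column-layout buffer.
total_from_rows = sum(row_layout[2::n_cols])
total_from_columns = sum(column_layout[2 * n_rows : 3 * n_rows])
```

Both sums give the same answer; the difference is that the column layout keeps the needed values adjacent, which is what lets a real engine skip the other columns on disk.
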
## Key Advantages
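- **Query performance**: only the columns a query references are read, which avoids the I/O cost of reading entire rows
- **Compression efficiency**: values in a column share one data type and tend to be similar, so they compress better (e.g. Dictionary Encoding, Run-Length Encoding)
- **Vectorized execution**: contiguous column data lends itself to SIMD vectorized processing, which improves CPU utilization
- **Aggregation-friendly**: aggregates such as COUNT/SUM/AVG only need to read the relevant columns
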
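The two encodings named above can be sketched as follows. This is a simplified illustration under assumed data (a hypothetical `region` column); real formats combine these encodings with bit-packing and block-level compression, which is not shown here.

```python
from itertools import groupby

def dictionary_encode(values):
    """Replace each value with a small integer code plus a lookup table."""
    dictionary = []   # code -> original value
    index = {}        # original value -> code
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]

def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]

# Hypothetical low-cardinality column with long runs of equal values.
region = ["us", "us", "us", "eu", "eu", "us", "ap", "ap", "ap", "ap"]

dictionary, codes = dictionary_encode(region)
runs = run_length_encode(codes)
```

Here ten strings shrink to a three-entry dictionary plus four runs. The gain depends on the data: a column with few distinct values and long runs (often the result of sorting) benefits most.
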
## Trade-offs
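- **Inefficient single-row access**: reading one full row means gathering a value from every column, and a single-row insert or update has to read and rewrite stored (often compressed) data for every affected column
- **Write amplification**: updating one row modifies several columns, each of which is stored separately
- **Limited applicability**: well suited to read-heavy analytics, poorly suited to transaction processing with frequent updates
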
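A toy model makes the write cost concrete. The classes `RowStore` and `ColumnStore` below are hypothetical, and the `writes` counter only counts logical write locations; it ignores block rewrites and recompression, which make the gap larger in practice.

```python
class RowStore:
    """Toy row store: each row is one contiguous record."""

    def __init__(self):
        self.rows = []
        self.writes = 0

    def insert(self, row):
        self.rows.append(tuple(row))
        self.writes += 1  # one record, one write location


class ColumnStore:
    """Toy column store: each column is kept in its own list."""

    def __init__(self, n_columns):
        self.columns = [[] for _ in range(n_columns)]
        self.writes = 0

    def insert(self, row):
        for column, value in zip(self.columns, row):
            column.append(value)
            self.writes += 1  # one write location per column

    def get_row(self, i):
        # A point lookup has to gather one value from every column.
        return tuple(column[i] for column in self.columns)


row_store = RowStore()
column_store = ColumnStore(n_columns=4)
for row in [(1, "alice", "us", 30.0), (2, "bob", "eu", 12.5)]:
    row_store.insert(row)
    column_store.insert(row)
```

After two inserts into a four-column table, the row store has recorded two writes and the column store eight, which is the amplification the list above describes.
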
## Applications
- **Data warehouses**: Amazon Redshift, Google BigQuery, Snowflake, ClickHouse
- **Columnar file formats**: Apache Parquet, Apache ORC
- **Analytical databases**: Apache Druid, Apache Kylin

## Related Concepts
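- [[MPP]]: columnar storage is often combined with an MPP architecture for large-scale parallel analytics
- [[Sort-Key]]: in columnar storage, a sort key can further speed up range queries
- [[Data Compression]]: columnar storage lends itself naturally to high compression ratios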
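
Why a sort key helps range queries can be sketched with a binary search over a sorted column. This is a simplification under assumed data: the column names `order_date` and `amount` and the function `sum_amount_between` are hypothetical, and real engines typically skip data using per-block min/max metadata rather than searching individual values.

```python
from bisect import bisect_left, bisect_right

# Hypothetical columns, both stored in order of order_date (the sort key).
order_date = [20240101, 20240103, 20240103, 20240107, 20240110, 20240115]
amount = [10.0, 4.5, 8.0, 20.0, 1.5, 6.0]

def sum_amount_between(lo, hi):
    """Sum amount for rows with lo <= order_date <= hi."""
    # Because order_date is sorted, the matching rows form one contiguous range.
    start = bisect_left(order_date, lo)
    stop = bisect_right(order_date, hi)
    # Only that slice of the amount column needs to be read.
    return sum(amount[start:stop]), stop - start

total, rows_read = sum_amount_between(20240103, 20240110)
```

Without the sort order, every value of `order_date` would have to be compared against the range; with it, the query reads only the matching slice of `amount`.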