---
title: "MPP (Massively Parallel Processing)"
type: concept
tags:
- Distributed Computing
- Data-Warehouse
- Performance
sources:
- ctp-topic-68-introduction-to-redshift
last_updated: 2026-04-23
---
## Overview
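MPP (Massively Parallel Processing) is a distributed computing architecture in which multiple compute nodes execute queries and data-processing tasks in parallel, significantly improving query speed and system throughput on large datasets.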
## How It Works
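1. **Task decomposition**: The coordinator node (Leader/Coordinator) breaks a large query into multiple subtasks.
2. **Parallel distribution**: The subtasks are dispatched to multiple compute nodes (Compute Node).
3. **Independent execution**: Each node runs its computation in parallel on its local subset of the data (Slice/Partition).
4. **Result aggregation**: Each node returns its result to the coordinator, which performs the final aggregation and produces the output.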
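
The four steps above can be sketched as a small scatter-gather program. This is a minimal illustration only: worker threads stand in for compute nodes (a real MPP system runs them on separate machines), the round-robin split is just one possible distribution, and the names `partition_rows`, `local_aggregate`, and `run_query` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, num_nodes):
    """Steps 1-2: split the rows into one slice per node (round-robin)."""
    return [rows[i::num_nodes] for i in range(num_nodes)]


def local_aggregate(slice_rows):
    """Step 3: a node sums the amount per region over its local slice only."""
    partial = Counter()
    for region, amount in slice_rows:
        partial[region] += amount
    return partial


def run_query(rows, num_nodes=4):
    """Leader: scatter the slices, run the nodes in parallel, merge the partials."""
    slices = partition_rows(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, slices))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Step 4: final aggregation on the leader
    return dict(total)


# Toy data for illustration: (region, amount) pairs.
rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 8)]
print(run_query(rows))
```

Because each partial result is a per-region sum, the leader only has to add the partials together; the totals do not depend on how many nodes the rows were split across.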
## Key Benefits
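- **Linear scalability**: Adding nodes can improve query performance close to linearly, provided data is evenly distributed and cross-node communication stays limited.
- **High throughput**: Well suited to complex analytical queries and large-scale data aggregation.
- **Fault tolerance**: A single-node failure does not bring down the whole system (in some implementations).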
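
How close scaling gets to linear depends on how much of a query must run serially on the coordinator (decomposition and final aggregation). A rough sketch using Amdahl's law, which is brought in here for illustration; the 5% serial fraction and the function name `speedup` are assumptions, not measurements from any system:

```python
def speedup(num_nodes, serial_fraction):
    """Amdahl's law: ideal speedup when only part of the work parallelizes."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_nodes)


# Assumed 5% serial work: speedup grows with node count but flattens out.
for n in (1, 2, 4, 8, 16, 32):
    print(n, round(speedup(n, 0.05), 2))
```

With no serial work the speedup equals the node count; with any serial fraction it is capped at `1 / serial_fraction` no matter how many nodes are added.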
## Trade-offs
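- **Data skew**: Uneven data distribution leaves some nodes overloaded.
- **Cross-node communication**: Transferring data between nodes adds latency.
- **Complex query optimization**: The data distribution strategy must be designed carefully.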
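
Data skew can be made concrete with a small simulation of hash distribution. This is a sketch under stated assumptions: rows are assigned to nodes by a SHA-256 hash of a hypothetical `customer` key, and the helper names `assign_node`, `rows_per_node`, and `skew_ratio` are illustrative rather than taken from any particular engine.

```python
import hashlib
from collections import Counter


def assign_node(key, num_nodes):
    """Hash-distribute a row to a node based on its distribution key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def rows_per_node(keys, num_nodes):
    """Count how many rows land on each node."""
    counts = Counter(assign_node(k, num_nodes) for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]


def skew_ratio(counts):
    """Largest node load divided by the mean load; 1.0 is perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


num_nodes = 4
# 10,000 rows with distinct keys versus 10,000 rows where one key dominates.
even_keys = [f"customer-{i}" for i in range(10_000)]
hot_keys = ["customer-0"] * 7_000 + [f"customer-{i}" for i in range(3_000)]

even = rows_per_node(even_keys, num_nodes)
skewed = rows_per_node(hot_keys, num_nodes)
print("even:  ", even, round(skew_ratio(even), 2))
print("skewed:", skewed, round(skew_ratio(skewed), 2))
```

Since every row with the same key hashes to the same node, the hot key puts at least 7,000 of the 10,000 rows on one node. A parallel query finishes only when its slowest node does, so that node sets the overall runtime.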
## Applications
- **Data warehouses**: Amazon Redshift, Snowflake, Google BigQuery
- **Big data processing**: Apache Spark (Spark SQL), Presto/Trino
- **Scientific computing**: Distributed matrix operations, genomic analysis
## Related Concepts
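- [[Columnar-Storage]]: Columnar storage works together with MPP to optimize analytical queries
- [[Distribution-Key]]: The data distribution strategy affects MPP performance
- [[Sort-Key]]: Sort keys improve data locality, raising efficiency within each MPP node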