Update nexus: fix conflicts and sync local changes

2026-04-26 12:06:50 +08:00
parent 191797c01b
commit f09834b5a5
2443 changed files with 254323 additions and 255154 deletions
--- a/wiki/sources/ctp-topic-68-introduction-to-redshift.md
+++ b/wiki/sources/ctp-topic-68-introduction-to-redshift.md
@@ -1,63 +1,63 @@
---
-title: "CTP Topic 68 Introduction to Redshift"
-type: source
-tags:
-  - AWS
-  - Redshift
-  - Data-Warehouse
-  - CTP
-date: 2026-04-14
---
-
-## Source File
- [[Cloud & DevOps/Public-Cloud-Learning-Sessions/01_AWS-Landing-Zone/ctp-topic-68-introduction-to-redshift]]
-
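-## Summary
- Core topic: the basic architecture, core components, and key features of the AWS Redshift data warehouse service
- Problem domain: selecting and designing the architecture of a petabyte-scale cloud data warehouse
- Methods/mechanisms: Leader Node + Compute Node MPP parallel architecture, columnar storage, row-based storage, data compression (ZSTD/LZO), Sort Key, Distribution Key
- Conclusion/value: Redshift is a fully managed, petabyte-scale cloud data warehouse for OLAP workloads. The Leader Node handles query planning and metadata management, and Compute Nodes execute queries in parallel across Slices; the RA3 instance type offers the best cost-effectiveness and uses AWS-managed NVMe storage; Sort Key and Dist Key are the key settings for performance tuning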
-
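-## Key Claims
- Redshift uses the Leader Node to manage schemas, metadata, and query plans and to distribute instructions to Compute Nodes for execution; this MPP (massively parallel processing) design significantly improves query speed and response time
- Redshift supports both columnar storage (suited to data warehouse operations) and row-based storage; columnar storage is the better fit for OLAP because of its faster query performance and more efficient memory use
- The RA3 instance type is recommended for its cost-effectiveness and large storage capacity; it is backed by AWS-managed NVMe storage
- Sort Key and Dist Key are Redshift's core performance-tuning mechanisms; they determine data distribution and query execution efficiency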
-
-## Key Quotes
-> "Redshift is a fully managed, petabyte-scale data warehouse solution in the cloud. It is designed for data warehousing, enabling quick data retrieval from large datasets." - video summary
-
-> "The leader node manages schema, warehouse metadata, and query planning, distributing instructions to compute nodes." - Redshift architecture overview
-
-> "Compute nodes, determined by the instance type, execute queries across slices, processing data and returning results to the leader node." - how Compute Nodes work
-
-> "RA3 is noted for its cost-effectiveness and large storage capacity, utilizing AWS-managed NVMe storage." - advantages of RA3 instances
-
-## Key Concepts
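- [[MPP (Massively Parallel Processing)]]: multiple Compute Nodes process a query in parallel, improving query speed and response time on large datasets
- [[列式存储（Columnar Storage)]]: data is stored by column rather than by row, which suits the aggregation and scan queries typical of data warehouses and gives faster queries and more efficient memory use
- [[数据压缩（Data Compression)]]: compression algorithms such as ZSTD and LZO shrink the data, improving I/O efficiency and query performance
- [[Sort Key（排序键)]]: determines the physical order of data on disk; strongly affects the performance of range queries and filters
- [[Distribution Key（分布键)]]: determines how data is distributed across Compute Nodes; affects data skew and inter-node data transfer
- [[OLAP（在线分析处理)]]: the workload type built around complex analytical queries; Redshift's core design target
- [[Leader Node（主节点)]]: the coordinating node in the Redshift architecture; handles client connections, schema management, metadata storage, and query plan generation
- [[Compute Node（计算节点)]]: the execution node in the Redshift architecture; runs queries on its Slices and returns the results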
-
-## Key Entities
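- [[Amazon Redshift]]: AWS's massively parallel processing data warehouse service; supports petabyte-scale storage and targets OLAP workloads
- [[AWS]]: Amazon Web Services, the cloud provider that hosts Redshift
- [[RA3]]: Redshift's cost-effective instance type, with AWS-managed NVMe storage; suited to large-capacity storage scenarios
- [[Dense Compute]]: Redshift's compute-dense instance type, suited to compute-intensive queries
- [[Dense Storage]]: Redshift's storage-dense instance type, suited to storage-intensive workloads
- [[JDBC/ODBC]]: the client driver protocols for Redshift; client applications connect to a Redshift Cluster over JDBC/ODBC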
-
-## Connections
- [[ctp-topic-51-purpose-built-databases]] ← related_to ← [[Amazon Redshift]]
- [[ctp-topic-66-rds-vs-aurora]] ← related_to ← [[Amazon Redshift]]
- [[ctp-topic-40-saas-database-architecture-on-aws-cloud]] ← related_to ← [[Amazon Redshift]]
-
-## Contradictions
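- Data write model, versus [[ctp-topic-66-rds-vs-aurora]]:
-  - Point of conflict: Aurora uses a shared-storage architecture (6 replicas across 3 AZs), whereas Redshift uses independent Compute Nodes; Aurora is better suited to write-intensive OLTP, Redshift to analytics-intensive OLAP
-  - This source's view: Redshift's columnar storage + MPP is the optimal architecture for large-scale data analytics
-  - Opposing view: Aurora's shared storage simplifies HA and DR, and its Blue-Green deployment support is more flexible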
+---
+title: "CTP Topic 68 Introduction to Redshift"
+type: source
+tags:
+  - AWS
+  - Redshift
+  - Data-Warehouse
+  - CTP
+date: 2026-04-14
+---
+
+## Source File
+- [[Cloud & DevOps/Public-Cloud-Learning-Sessions/01_AWS-Landing-Zone/ctp-topic-68-introduction-to-redshift]]
+
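+## Summary
+- Core topic: the basic architecture, core components, and key features of the AWS Redshift data warehouse service
+- Problem domain: selecting and designing the architecture of a petabyte-scale cloud data warehouse
+- Methods/mechanisms: Leader Node + Compute Node MPP parallel architecture, columnar storage, row-based storage, data compression (ZSTD/LZO), Sort Key, Distribution Key
+- Conclusion/value: Redshift is a fully managed, petabyte-scale cloud data warehouse for OLAP workloads. The Leader Node handles query planning and metadata management, and Compute Nodes execute queries in parallel across Slices; the RA3 instance type offers the best cost-effectiveness and uses AWS-managed NVMe storage; Sort Key and Dist Key are the key settings for performance tuning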
+
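+## Key Claims
+- Redshift uses the Leader Node to manage schemas, metadata, and query plans and to distribute instructions to Compute Nodes for execution; this MPP (massively parallel processing) design significantly improves query speed and response time
+- Redshift supports both columnar storage (suited to data warehouse operations) and row-based storage; columnar storage is the better fit for OLAP because of its faster query performance and more efficient memory use
+- The RA3 instance type is recommended for its cost-effectiveness and large storage capacity; it is backed by AWS-managed NVMe storage
+- Sort Key and Dist Key are Redshift's core performance-tuning mechanisms; they determine data distribution and query execution efficiency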
+
+## Key Quotes
+> "Redshift is a fully managed, petabyte-scale data warehouse solution in the cloud. It is designed for data warehousing, enabling quick data retrieval from large datasets." - video summary
+
+> "The leader node manages schema, warehouse metadata, and query planning, distributing instructions to compute nodes." - Redshift architecture overview
+
+> "Compute nodes, determined by the instance type, execute queries across slices, processing data and returning results to the leader node." - how Compute Nodes work
+
+> "RA3 is noted for its cost-effectiveness and large storage capacity, utilizing AWS-managed NVMe storage." - advantages of RA3 instances
+
+## Key Concepts
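+- [[MPP (Massively Parallel Processing)]]: multiple Compute Nodes process a query in parallel, improving query speed and response time on large datasets
+- [[列式存储（Columnar Storage)]]: data is stored by column rather than by row, which suits the aggregation and scan queries typical of data warehouses and gives faster queries and more efficient memory use
+- [[数据压缩（Data Compression)]]: compression algorithms such as ZSTD and LZO shrink the data, improving I/O efficiency and query performance
+- [[Sort Key（排序键)]]: determines the physical order of data on disk; strongly affects the performance of range queries and filters
+- [[Distribution Key（分布键)]]: determines how data is distributed across Compute Nodes; affects data skew and inter-node data transfer
+- [[OLAP（在线分析处理)]]: the workload type built around complex analytical queries; Redshift's core design target
+- [[Leader Node（主节点)]]: the coordinating node in the Redshift architecture; handles client connections, schema management, metadata storage, and query plan generation
+- [[Compute Node（计算节点)]]: the execution node in the Redshift architecture; runs queries on its Slices and returns the results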
+
+## Key Entities
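+- [[Amazon Redshift]]: AWS's massively parallel processing data warehouse service; supports petabyte-scale storage and targets OLAP workloads
+- [[AWS]]: Amazon Web Services, the cloud provider that hosts Redshift
+- [[RA3]]: Redshift's cost-effective instance type, with AWS-managed NVMe storage; suited to large-capacity storage scenarios
+- [[Dense Compute]]: Redshift's compute-dense instance type, suited to compute-intensive queries
+- [[Dense Storage]]: Redshift's storage-dense instance type, suited to storage-intensive workloads
+- [[JDBC/ODBC]]: the client driver protocols for Redshift; client applications connect to a Redshift Cluster over JDBC/ODBC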
+
+## Connections
+- [[ctp-topic-51-purpose-built-databases]] ← related_to ← [[Amazon Redshift]]
+- [[ctp-topic-66-rds-vs-aurora]] ← related_to ← [[Amazon Redshift]]
+- [[ctp-topic-40-saas-database-architecture-on-aws-cloud]] ← related_to ← [[Amazon Redshift]]
+
+## Contradictions
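+- Data write model, versus [[ctp-topic-66-rds-vs-aurora]]:
+  - Point of conflict: Aurora uses a shared-storage architecture (6 replicas across 3 AZs), whereas Redshift uses independent Compute Nodes; Aurora is better suited to write-intensive OLTP, Redshift to analytics-intensive OLAP
+  - This source's view: Redshift's columnar storage + MPP is the optimal architecture for large-scale data analytics
+  - Opposing view: Aurora's shared storage simplifies HA and DR, and its Blue-Green deployment support is more flexible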