Auto-sync: update nexus workspace

2026-04-28 07:26:52 +08:00
parent b83b4e3105
commit 3224ec4787
436 changed files with 17107 additions and 15920 deletions
--- a/wiki/concepts/PagedAttention.md
+++ b/wiki/concepts/PagedAttention.md
@@ -0,0 +1,24 @@
+---
+title: "PagedAttention"
+type: concept
+tags: [paged-attention, vllm, inference, optimization]
+aliases: [PagedAttention, 分页注意力]
+last_updated: 2025-12-20
+---
+
+## Definition
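+PagedAttention is the core attention-mechanism innovation in vLLM: it splits the [[KV Cache]] into fixed-size blocks and manages them through a page-table-style mapping, similar to how an operating system manages virtual memory.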
+
+## Key Facts
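+- Traditional approach: reserve one large contiguous memory region per sequence, which causes fragmentation and OOM (out of GPU memory) errors
+- PagedAttention's solution: split the KV Cache into fixed-size blocks, track them with a page table, and schedule them flexibly
+- Advantages: avoids fragmentation, supports dynamic concurrency, and supports KV block reuse (multi-branch or repeated-prefix scenarios)
+- Can significantly reduce prefill time when the KV blocks of a repeated prefix are reused instead of recomputed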
+
+## Connections
+- [[vLLM]] ← used by ← [[PagedAttention]]
+- [[KV Cache]] ← optimizes management of ← [[PagedAttention]]
+- [[Continuous Batching]] ← works together with ← [[PagedAttention]]
+
+## Sources
+- [[大模型相关术语和框架总结｜llm-mcp-prompt-rag-vllm-token-数据蒸馏]]
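
The block-table idea described under Key Facts can be illustrated with a minimal sketch. This is not vLLM's implementation: the names `BlockAllocator`, `Sequence`, and `BLOCK_SIZE`, and the block size of 4 tokens, are hypothetical and chosen only for illustration. The sketch shows logical-to-physical block mapping, on-demand allocation, and reference-counted sharing of blocks between branches of the same prefix.

```python
BLOCK_SIZE = 4  # tokens per KV block (hypothetical value for illustration)


class BlockAllocator:
    """Hands out physical block ids from a fixed pool and counts references."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))
        self.ref_count = {}

    def allocate(self):
        if not self.free:
            raise MemoryError("no free KV blocks")
        block = self.free.pop()
        self.ref_count[block] = 1
        return block

    def share(self, block):
        # A second sequence points at the same physical block.
        self.ref_count[block] += 1
        return block

    def release(self, block):
        self.ref_count[block] -= 1
        if self.ref_count[block] == 0:
            del self.ref_count[block]
            self.free.append(block)


class Sequence:
    """Tracks one sequence's block table: logical block index -> physical id."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []
        self.num_tokens = 0

    def append_token(self):
        offset = self.num_tokens % BLOCK_SIZE
        if offset == 0:
            # Last block is full (or there is none): grab a new block on demand.
            self.block_table.append(self.allocator.allocate())
        elif self.allocator.ref_count[self.block_table[-1]] > 1:
            # Copy-on-write: the partially filled last block is shared, so this
            # sequence moves to a private block (the data copy itself is omitted).
            old = self.block_table[-1]
            self.block_table[-1] = self.allocator.allocate()
            self.allocator.release(old)
        self.num_tokens += 1

    def fork(self):
        # A new branch reuses every existing block instead of copying it.
        child = Sequence(self.allocator)
        child.block_table = [self.allocator.share(b) for b in self.block_table]
        child.num_tokens = self.num_tokens
        return child

    def physical_slot(self, token_index):
        # Translate a token position into (physical block id, offset in block).
        block = self.block_table[token_index // BLOCK_SIZE]
        return block, token_index % BLOCK_SIZE

    def release_all(self):
        for block in self.block_table:
            self.allocator.release(block)
        self.block_table = []
        self.num_tokens = 0
```

In this sketch a sequence of 6 tokens occupies only 2 blocks, and a forked branch consumes no extra blocks until it writes into a shared, partially filled block. That is the sense in which paging avoids large contiguous per-sequence reservations.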