---
title: "PagedAttention"
type: concept
tags: [paged-attention, vllm, inference, optimization]
aliases: [PagedAttention, 分页注意力]
last_updated: 2025-12-20
---
## Definition
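PagedAttention is vLLM's core innovation in the attention mechanism. It splits the [[KV Cache]] into fixed-size blocks and manages them through a page-table-style mapping, much like virtual memory paging in an operating system.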
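
The page-table analogy can be made concrete with a minimal Python sketch of a per-sequence block table. It assumes a hypothetical `BlockAllocator`/`Sequence` pair and an illustrative block size of 16 tokens. It is not vLLM's real block manager API; it only shows how logical token positions map onto physical blocks that are allocated on demand.

```python
# Minimal sketch of a paged KV-cache block table. Class names and BLOCK_SIZE are
# hypothetical/illustrative; this is not vLLM's actual block manager API.
BLOCK_SIZE = 16  # tokens per KV block (assumed value)


class BlockAllocator:
    """Hands out fixed-size physical KV blocks from a free list."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("out of KV blocks")
        return self.free_blocks.pop()

    def release(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


class Sequence:
    """Per-sequence page table: logical block i -> physical block id."""

    def __init__(self, allocator: BlockAllocator) -> None:
        self.allocator = allocator
        self.block_table: list[int] = []
        self.num_tokens = 0

    def append_token(self) -> None:
        # Allocate a new physical block only when the last one is full,
        # so memory grows on demand instead of being reserved up front.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def physical_slot(self, token_idx: int) -> tuple[int, int]:
        # Translate a logical token position into (physical block, offset),
        # analogous to a virtual-to-physical address lookup.
        return self.block_table[token_idx // BLOCK_SIZE], token_idx % BLOCK_SIZE

    def release_all(self) -> None:
        # Return every block to the free list when the sequence finishes.
        for block_id in self.block_table:
            self.allocator.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


allocator = BlockAllocator(num_blocks=8)
seq = Sequence(allocator)
for _ in range(20):
    seq.append_token()
print(seq.block_table)        # [7, 6]: two blocks cover 20 tokens
print(seq.physical_slot(17))  # (6, 1)
```

Because the block table is the only thing that ties a sequence to memory, blocks need not be adjacent, and a finished sequence hands its blocks straight back to the free list for other requests.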
## Key Facts
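- Traditional approach: allocate one large contiguous memory region per sequence, sized for its maximum possible length, which causes fragmentation and OOM (out-of-GPU-memory) errors
- PagedAttention's solution: split the KV Cache into fixed-size blocks that need not be contiguous, map them through a page table, and allocate them flexibly as sequences grow
- Advantages: avoids fragmentation, supports more concurrent sequences with dynamic lengths, and allows KV blocks to be shared and reused (multi-branch decoding, repeated prefixes)
- Can significantly reduce prefill time when prefixes repeat, because the KV blocks of an already-computed prefix are reused instead of recomputed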
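
The fragmentation and block-reuse points above can be sketched as follows. The helpers `contiguous_waste`, `paged_waste`, and `RefCountedAllocator` are hypothetical, and the 16-token block size is an assumption. Sharing is modeled with reference counts plus copy-on-write, one common way to realize block reuse; this is an illustration, not vLLM's exact code.

```python
# Minimal sketch of block-level sharing with reference counts and copy-on-write.
# All names are hypothetical; BLOCK_SIZE is an assumed value.
BLOCK_SIZE = 16


def contiguous_waste(max_len: int, actual_len: int) -> int:
    # Contiguous pre-allocation reserves max_len slots; the unused tail is wasted.
    return max_len - actual_len


def paged_waste(actual_len: int) -> int:
    # With fixed-size blocks only the last, partially filled block has idle slots.
    return -actual_len % BLOCK_SIZE


class RefCountedAllocator:
    def __init__(self, num_blocks: int) -> None:
        self.free_blocks = list(range(num_blocks))
        self.ref_count: dict[int, int] = {}

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("out of KV blocks")
        block_id = self.free_blocks.pop()
        self.ref_count[block_id] = 1
        return block_id

    def release(self, block_id: int) -> None:
        # A block returns to the free list only when no sequence references it.
        self.ref_count[block_id] -= 1
        if self.ref_count[block_id] == 0:
            del self.ref_count[block_id]
            self.free_blocks.append(block_id)

    def fork(self, block_table: list[int]) -> list[int]:
        # A new branch points at the parent's physical blocks; no KV data is copied.
        for block_id in block_table:
            self.ref_count[block_id] += 1
        return list(block_table)

    def copy_on_write(self, block_table: list[int], i: int) -> None:
        # Before a branch writes into a shared block, give it a private copy.
        block_id = block_table[i]
        if self.ref_count[block_id] > 1:
            new_id = self.allocate()
            # A real engine would copy the KV tensors from block_id to new_id here.
            self.release(block_id)
            block_table[i] = new_id


print(contiguous_waste(2048, 20), paged_waste(20))  # 2028 12

allocator = RefCountedAllocator(num_blocks=8)
parent = [allocator.allocate(), allocator.allocate()]  # shared prefix: [7, 6]
child = allocator.fork(parent)     # both branches reference blocks 7 and 6
allocator.copy_on_write(child, 1)  # child is about to append into its last block
print(parent, child)               # [7, 6] [7, 5]
```

Only the block about to be written is duplicated; block 7 stays shared by both branches, which is where the memory savings for multi-branch and shared-prefix workloads come from.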
## Connections
- [[vLLM]] ← uses ← [[PagedAttention]]
- [[KV Cache]] ← optimizes management of ← [[PagedAttention]]
- [[Continuous Batching]] ← works together with ← [[PagedAttention]]
## Sources
- [[大模型相关术语和框架总结llm-mcp-prompt-rag-vllm-token-数据蒸馏]]