---
title: "KV Cache"
type: concept
tags: [kv-cache, inference, llm, optimization]
aliases: [KV Cache, Key-Value Cache, KV缓存]
last_updated: 2025-12-20
---
## Definition
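The KV Cache is a caching mechanism used during large language model inference. K (Key) and V (Value) are two kinds of vectors obtained by applying linear transformations to each token's vector, and both are used in the attention computation. The KV Cache stores the K/V vectors of earlier tokens so that subsequent decoding steps do not recompute them.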
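
As a minimal sketch of the idea, the following pure-Python example runs single-head attention one token at a time, once with a cache and once recomputing every K/V from scratch, and checks that both give the same output. The names `KVCache`, `decode_step`, `decode_step_no_cache`, and the tiny 2x2 weight matrices are illustrative assumptions, not part of any real framework.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w / total * v[i] for w, v in zip(weights, values)) for i in range(dim)]

class KVCache:
    """Hypothetical container holding the K/V vectors of all earlier tokens."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def decode_step(x, Wq, Wk, Wv, cache):
    """With a cache: project only the newest token, reuse stored K/V."""
    cache.append(matvec(Wk, x), matvec(Wv, x))
    return attend(matvec(Wq, x), cache.keys, cache.values)

def decode_step_no_cache(xs, Wq, Wk, Wv):
    """Baseline: re-project K/V for every token seen so far."""
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)

# Toy weights and token vectors (assumed values, for illustration only).
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[0.5, 0.1], [0.2, 0.3]]
Wv = [[1.0, 2.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cache = KVCache()
outputs_match = True
for t in range(len(tokens)):
    cached = decode_step(tokens[t], Wq, Wk, Wv, cache)
    baseline = decode_step_no_cache(tokens[: t + 1], Wq, Wk, Wv)
    outputs_match = outputs_match and all(
        math.isclose(a, b) for a, b in zip(cached, baseline)
    )
```

With the cache, each step performs one K projection and one V projection; the baseline performs one of each per token seen so far, so its projection work grows with the sequence length at every step.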
## Key Facts
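- Saves computation: each decoding step reuses the stored K/V of earlier tokens instead of recomputing them
- GPU memory cost: the KV Cache grows linearly with context length, number of layers, number of heads, and head dimension, and is one of the largest sources of GPU memory usage during inference
- A core optimization target of [[vLLM]]
- [[PagedAttention]] addresses its fragmentation problem through block-based management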
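
The linear growth noted above can be made concrete with a back-of-the-envelope estimate. This is a rough sketch: the function name `kv_cache_bytes` and the example configuration (32 layers, 32 heads, head dimension 128, 2-byte elements) are assumptions chosen for illustration, and the batch-size factor is an addition not listed in the note.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes.

    The leading factor 2 accounts for storing both K and V.
    bytes_per_elem=2 assumes a 16-bit dtype such as fp16.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# Hypothetical model configuration, one sequence of 4096 tokens.
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
print(f"{size / 2**30:.1f} GiB")  # prints "2.0 GiB"
```

Under these assumptions each token costs 512 KiB, so doubling the context length or the batch size doubles the cache.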
## Connections
- [[vLLM]] ← optimizes ← [[KV Cache]]
- [[PagedAttention]] ← solves ← the fragmentation problem of the [[KV Cache]]
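
To illustrate what block-based management means, here is a toy allocator that hands out fixed-size blocks on demand and records them in a per-sequence block table. It is a simplified sketch of the general paging idea only: the class `PagedKVAllocator` and its methods are hypothetical and do not reflect the actual API or implementation of vLLM or PagedAttention.

```python
class PagedKVAllocator:
    """Toy paged allocator: sequences map logical blocks to physical blocks."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one token; return (physical_block, offset)."""
        n = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % self.block_size == 0:  # first token, or the last block is full
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1
        return table[-1], n % self.block_size

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

alloc = PagedKVAllocator(num_blocks=4, block_size=2)
slots_a = [alloc.append_token("a") for _ in range(3)]  # needs 2 blocks
slot_b = alloc.append_token("b")                       # needs 1 block
free_before = len(alloc.free_blocks)
alloc.free("a")
free_after = len(alloc.free_blocks)
```

Because blocks are allocated only when a sequence actually reaches them, unused space is limited to the tail of each sequence's last block, and blocks freed by one sequence can be reused by any other without needing a contiguous region.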
## Sources
- [[大模型相关术语和框架总结llm-mcp-prompt-rag-vllm-token-数据蒸馏]]