wiki-ingest: summary of LLM-related terms and frameworks
@@ -1,8 +1,11 @@
# Hallucination (幻觉)

## Definition

The phenomenon where an LLM generates information that appears plausible but is actually false, fabricated, or not grounded in its input or training data. The model "makes things up" with confidence, presenting fiction as fact.

In the context of Chinese documentation: a large model always answers in a perfectly serious tone while actually talking nonsense. Faced with an unfamiliar domain, an LLM can only write the word "解" ("Solution:") in its answer, because its knowledge is limited to a specific dataset, and then it lets loose, generating content that looks reasonable but is actually wrong.

## Key Statistics

- If a single model hallucinates 20% of the time
- 3 models hallucinating the exact same lie: at most 0.8% (0.2³ = 0.008), assuming their errors are independent
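The arithmetic in this list can be checked with a short sketch. It assumes that every model hallucinates independently and at the same rate; `joint_hallucination_rate` is a hypothetical helper name, not something defined in this wiki. The product is the chance that all models hallucinate at once, so the chance that they agree on the exact same fabrication can only be lower.

```python
def joint_hallucination_rate(single_rate: float, n_models: int) -> float:
    """Probability that n independent models all hallucinate on the same query."""
    if not 0.0 <= single_rate <= 1.0:
        raise ValueError("single_rate must be in [0, 1]")
    if n_models < 1:
        raise ValueError("n_models must be at least 1")
    # Independence assumption: joint probability is the product of the rates.
    return single_rate ** n_models


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} model(s): {joint_hallucination_rate(0.2, n):.3%}")
```

In practice models trained on overlapping data tend to make correlated errors, so the independent figure is better read as an optimistic estimate.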
@@ -30,4 +33,6 @@ The phenomenon where an LLM generates information that appears plausible but is
- [[Context Drift]]
- [[Multi-Agent Consensus]]
- [[Validator]]
- [[LLM Reliability Engineering]]
- [[RAG]]: retrieval-augmented generation; addresses hallucination by retrieving external knowledge, and can raise accuracy from 60% to 90%
- [[Embedding]]: vectorization technique that underpins semantic retrieval in RAG
23
wiki/concepts/KV-Cache.md
Normal file
@@ -0,0 +1,23 @@
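# KV Cache

## Metadata

- **Type**: Concept
- **Category**: AI/LLM/Inference Optimization

## Definition

KV Cache (Key-Value Cache) is a key technique for optimizing Transformer inference. K (Key) and V (Value) are the two sets of vectors obtained by applying linear projections to each token's vector representation, and they are used in the attention computation. The KV Cache stores the K/V vectors of earlier tokens so that subsequent decoding steps do not recompute them, which speeds up inference.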
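To make the idea concrete, here is a minimal single-head sketch in plain Python. The names (`KVCache`, `decode_with_cache`, `decode_without_cache`) and the tiny random weights are assumptions made for illustration; real inference engines work on batched tensors across many layers and heads. The sketch only shows that projecting K/V once per token and storing them gives the same attention output as recomputing them for the whole prefix at every step.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class KVCache:
    """Stores the K/V vectors of tokens that were already processed."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_with_cache(tokens, Wq, Wk, Wv):
    """Causal attention that projects K/V only for the newest token."""
    cache = KVCache()
    outputs = []
    for x in tokens:
        cache.append(matvec(Wk, x), matvec(Wv, x))  # one K/V projection per step
        outputs.append(attend(matvec(Wq, x), cache.keys, cache.values))
    return outputs


def decode_without_cache(tokens, Wq, Wk, Wv):
    """Same result, but K/V for the whole prefix are recomputed every step."""
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        outputs.append(attend(matvec(Wq, tokens[t]), keys, values))
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4
    Wq, Wk, Wv = ([[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
                  for _ in range(3))
    tokens = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(5)]
    cached = decode_with_cache(tokens, Wq, Wk, Wv)
    plain = decode_without_cache(tokens, Wq, Wk, Wv)
    assert all(math.isclose(a, b)
               for ra, rb in zip(cached, plain) for a, b in zip(ra, rb))
    print("cached and uncached outputs match")
```

Without the cache, step t performs t + 1 K/V projections, so the total work grows quadratically with sequence length; with the cache it is one projection per step, at the cost of keeping every earlier K/V vector in memory.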
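
## Details

- **Source of K and V**: linear projections of each token's vector representation
- **Purpose**: avoids redundant computation and improves inference efficiency
- **Limitation**: the KV Cache grows linearly with context length, number of layers, number of heads, and head dimension, making it one of the main GPU memory costs during inference
- **Optimization**: vLLM's PagedAttention splits the KV Cache into fixed-size blocks for management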
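The linear growth described in this list can be turned into a back-of-the-envelope estimate. The function name `kv_cache_bytes` and the model shape used below (32 layers, 32 KV heads, head dimension 128, 16-bit values) are illustrative assumptions, not figures from this wiki. The leading factor of 2 accounts for storing both K and V.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_element: int = 2) -> int:
    """Bytes needed to hold K and V for every layer, head, and token."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_element)


if __name__ == "__main__":
    # Hypothetical model shape with 16-bit (2-byte) cache entries.
    shape = dict(num_layers=32, num_kv_heads=32, head_dim=128)
    for seq_len in (1024, 4096, 16384):
        gib = kv_cache_bytes(seq_len=seq_len, **shape) / 2**30
        print(f"{seq_len:>6} tokens -> {gib:.2f} GiB per sequence")
```

For this assumed shape each token costs 512 KiB, so a single 4096-token sequence needs 2 GiB of cache, and the total scales directly with the number of concurrent sequences.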

## Related Concepts

- [[vLLM]]
- [[PagedAttention]]
- [[LLM]]
27
wiki/concepts/PagedAttention.md
Normal file
@@ -0,0 +1,27 @@
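# PagedAttention

## Metadata

- **Type**: Concept
- **Category**: AI/LLM/Inference Optimization

## Definition

PagedAttention is an attention optimization algorithm developed by the vLLM project. It splits the KV Cache into fixed-size blocks and manages them through a page-table-style mapping, much like virtual memory in an operating system. This avoids the fragmentation and OOM (out-of-memory) errors caused by allocating one large contiguous buffer per sequence, and it supports dynamic concurrency and block reuse.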
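The page-table analogy can be sketched in a few lines. This is a toy model of the bookkeeping only, not vLLM's actual code: `BlockAllocator`, `Sequence`, and their methods are hypothetical names, and no tensors are stored. Each sequence keeps a block table that maps its logical block index to whatever physical block happened to be free, so a sequence's blocks need not be contiguous.

```python
class BlockAllocator:
    """Hands out physical block ids from a fixed-size pool."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("no free KV blocks")
        return self.free_blocks.pop()

    def release(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


class Sequence:
    """Tracks one sequence's logical-to-physical block mapping."""

    def __init__(self, allocator: BlockAllocator, block_size: int):
        self.allocator = allocator
        self.block_size = block_size
        self.block_table = []  # index = logical block, value = physical block id
        self.num_tokens = 0

    def append_token(self) -> None:
        # A new physical block is requested only when the last one is full.
        if self.num_tokens % self.block_size == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def locate(self, token_index: int):
        """Return (physical block id, offset inside the block) for a token."""
        if not 0 <= token_index < self.num_tokens:
            raise IndexError("token index out of range")
        block = self.block_table[token_index // self.block_size]
        return block, token_index % self.block_size

    def free(self) -> None:
        for block_id in self.block_table:
            self.allocator.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


if __name__ == "__main__":
    allocator = BlockAllocator(num_blocks=8)
    a = Sequence(allocator, block_size=4)
    b = Sequence(allocator, block_size=4)
    for _ in range(6):  # interleave growth of two concurrent sequences
        a.append_token()
        b.append_token()
    print("a:", a.block_table, "b:", b.block_table)
    a.free()  # a's blocks return to the pool for other sequences
    print("free blocks after releasing a:", len(allocator.free_blocks))
```

Because memory is claimed one block at a time, the waste per sequence is bounded by the unused tail of its last block instead of a buffer pre-sized for the maximum length.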
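
## Details

- **Core idea**: manage the KV Cache in blocks, similar to operating-system virtual memory
- **Block size**: fixed-size blocks
- **Management**: page-table-style mapping
- **Advantages**:
  - Avoids fragmentation and OOM
  - Supports dynamic concurrency
  - Supports reuse of KV blocks across sequences that share a prefix (e.g. beam search and repeated-prefix workloads)
  - Reduces prefill time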
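The prefix-reuse advantage can be illustrated with a reference-counted block pool. This is a simplified sketch under stated assumptions, not a description of how vLLM implements it: `SharedBlockPool` and `build_block_table` are hypothetical names, blocks are keyed by the full token prefix that ends at the block boundary (a block's K/V depend on every earlier token), and the partially filled tail block is ignored.

```python
class SharedBlockPool:
    """Maps a token prefix to the block holding its last block_size tokens."""

    def __init__(self):
        self.block_of_prefix = {}  # prefix tuple -> block id
        self.ref_count = {}        # block id -> number of sequences using it
        self.next_id = 0

    def get_block(self, prefix):
        """Return (block id, reused) for the block that ends this prefix."""
        if prefix in self.block_of_prefix:
            block_id = self.block_of_prefix[prefix]
            self.ref_count[block_id] += 1
            return block_id, True
        block_id = self.next_id
        self.next_id += 1
        self.block_of_prefix[prefix] = block_id
        self.ref_count[block_id] = 1
        return block_id, False


def build_block_table(pool, tokens, block_size):
    """Map each full block of `tokens` to a pool block, reusing shared prefixes."""
    table = []
    reused = 0
    full = len(tokens) - len(tokens) % block_size  # tokens covered by full blocks
    for end in range(block_size, full + 1, block_size):
        block_id, hit = pool.get_block(tuple(tokens[:end]))
        table.append(block_id)
        reused += int(hit)
    return table, reused


if __name__ == "__main__":
    pool = SharedBlockPool()
    system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical shared prefix
    first, _ = build_block_table(pool, system_prompt + [9, 10, 11, 12], 4)
    second, hits = build_block_table(pool, system_prompt + [13, 14, 15, 16], 4)
    print("first:", first, "second:", second, "blocks reused:", hits)
```

Every reused block is one whose K/V do not have to be computed again during prefill, which is where the reduction in prefill time comes from.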

## Related Concepts

- [[vLLM]]
- [[KV Cache]]
- [[LLM]]