---
id: vllm
title: "vLLM"
type: concept
tags: [LLM, inference, GPU, optimization]
sources:
- "[[LLM Terms Framework]]"
last_updated: 2025-12-20
---
## Definition
vLLM is a high-throughput LLM inference framework that raises GPU utilization through KV caching and continuous batching.
## Key Optimizations
### KV Cache
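- Caches the key and value matrices already computed for earlier tokens
- Avoids recomputing them at every decoding step
- Substantially speeds up inference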
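
The caching idea above can be sketched in a few lines of plain Python. This is a toy single-head attention loop, not vLLM's implementation: `project`, `attend`, `KVCache`, and `decode_without_cache` are hypothetical names, the "projections" are simple scalings standing in for learned weight matrices, and the input vector doubles as the query.

```python
import math


def project(x, scale):
    # Hypothetical stand-in for a learned linear projection (assumption).
    return [scale * xi for xi in x]


def attend(q, keys, values):
    # Scaled dot-product attention for one query over all given positions.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(d)]


class KVCache:
    """Keeps the keys and values of earlier tokens so each is projected once."""

    def __init__(self):
        self.keys, self.values = [], []
        self.projections = 0  # number of tokens projected to K/V so far

    def step(self, x):
        # Only the newest token is projected; older K/V come from the cache.
        self.keys.append(project(x, 0.5))
        self.values.append(project(x, 2.0))
        self.projections += 1
        return attend(x, self.keys, self.values)


def decode_without_cache(tokens):
    # Baseline: re-project every earlier token at every decoding step.
    outputs, projections = [], 0
    for t in range(1, len(tokens) + 1):
        keys = [project(x, 0.5) for x in tokens[:t]]
        values = [project(x, 2.0) for x in tokens[:t]]
        projections += t
        outputs.append(attend(tokens[t - 1], keys, values))
    return outputs, projections


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
cache = KVCache()
cached_outputs = [cache.step(x) for x in tokens]
naive_outputs, naive_projections = decode_without_cache(tokens)
```

Both paths produce the same attention outputs, but for four tokens the cached path projects 4 positions while the baseline projects 1 + 2 + 3 + 4 = 10. The gap grows quadratically with sequence length, which is where the speedup comes from.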
### Continuous Batching
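- Dynamically batches multiple requests, admitting new ones as running ones finish
- Improves GPU utilization
- Reduces latency, since a request need not wait for a whole batch to complete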
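
To make the scheduling difference concrete, here is a minimal simulation, assuming each request needs a fixed, positive number of decoding steps and the GPU can run `batch_size` requests per step. The function names are hypothetical and the model ignores prefill cost and memory limits, so it is an illustration of the idea rather than vLLM's scheduler.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    # Each batch runs until its longest request finishes; freed slots stay idle.
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    # A waiting request is admitted as soon as any slot frees up.
    waiting = deque(lengths)
    running = []  # remaining decoding steps for each active request
    steps = 0
    while waiting or running:
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decoding step: every active request emits one token.
        running = [r - 1 for r in running]
        running = [r for r in running if r > 0]
        steps += 1
    return steps


# Hypothetical output lengths (in tokens) for six requests.
lengths = [2, 8, 3, 1, 4, 2]
static_steps = static_batching_steps(lengths, batch_size=2)
continuous_steps = continuous_batching_steps(lengths, batch_size=2)
```

With these example lengths, static batching takes 15 steps while continuous batching takes 10, which is the minimum possible for 20 tokens on 2 slots. The saving comes entirely from refilling slots that short requests free up early.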
## Why It Matters
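- Inference with stock HuggingFace Transformers is comparatively slow
- vLLM can deliver roughly 10-24x higher throughput than that baseline, depending on model and workload
- Supports high-concurrency inference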
## Connections
- [[LLM]] ← uses ← [[vLLM]]
- [[推理优化|Inference Optimization]] ← uses ← [[vLLM]]
- [[GPU利用率|GPU Utilization]] ← improves ← [[vLLM]]