Auto-sync: update nexus workspace
23
wiki/concepts/Continuous-Batching.md
Normal file
@@ -0,0 +1,23 @@
---
title: "Continuous Batching"
type: concept
tags: [continuous-batching, vllm, inference, gpu]
aliases: [Continuous Batching, 连续批处理, Iteration-Level Scheduling]
last_updated: 2025-12-20
---

## Definition
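Continuous batching (also called iteration-level scheduling) is an inference optimization technique used by vLLM. Traditional batching waits for a batch to fill and then processes it as a unit. Continuous batching instead reassembles the batch of active requests at every decoding step (one iteration per token), so the GPU stays close to fully utilized.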
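
The per-step batch assembly described above can be sketched as a toy scheduling loop. This is a minimal illustration, not vLLM's scheduler: the `Request` class, `run_continuous`, and `max_batch` are hypothetical names, each request is reduced to a count of remaining decode steps, and all requests are assumed to be queued before the loop starts.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: str
    tokens_left: int  # decode steps still needed (stand-in for real generation)


def run_continuous(requests, max_batch):
    """Return, for each decode step, the ids of the requests in that step's batch."""
    waiting = deque(requests)
    active = []
    trace = []
    while waiting or active:
        # Reassemble the batch for this step: fill free slots from the queue.
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        trace.append([r.rid for r in active])
        # One decode step: every active request emits one token.
        for r in active:
            r.tokens_left -= 1
        # Finished requests leave at once, freeing their slots for the next step.
        active = [r for r in active if r.tokens_left > 0]
    return trace


trace = run_continuous(
    [Request("A", 3), Request("B", 1), Request("C", 2)], max_batch=2
)
print(trace)  # [['A', 'B'], ['A', 'C'], ['A', 'C']]
```

Request `C` takes over the slot `B` frees after the first step, without waiting for `A` to finish.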

## Key Facts
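- Traditional (static) batching: waits for a full batch before running, so short requests are blocked behind long ones (head-of-line blocking)
- Continuous batching: the batch is reassembled at every decoding step, so new requests can join without waiting for the whole batch to finish
- Implemented with the block-based KV-cache memory of [[PagedAttention]] plus an iteration-level (per-step) scheduler
- Improves GPU concurrency and fairness across requests, and makes fuller use of GPU compute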
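
To make the head-of-line blocking and utilization points concrete, here is a small simulation comparing the two policies. It is a sketch under simplifying assumptions, not a model of vLLM: the function names are hypothetical, every request arrives at step 0 and needs at least one decode step, static batching is assumed to return results only when the whole batch ends, and memory management (the PagedAttention side) is left out entirely.

```python
from collections import deque


def static_batching(lengths, max_batch):
    """Each batch runs until its longest member finishes (assumed baseline)."""
    indexed = list(enumerate(lengths))
    finish, clock, busy = {}, 0, 0
    for start in range(0, len(indexed), max_batch):
        batch = indexed[start:start + max_batch]
        duration = max(n for _, n in batch)
        clock += duration
        for rid, n in batch:
            finish[rid] = clock  # short requests wait for the longest one
            busy += n            # slot-steps spent producing tokens
    return finish, busy / (clock * max_batch)


def continuous_batching(lengths, max_batch):
    """Free slots are refilled from the queue at every decode step."""
    waiting = deque(enumerate(lengths))
    active = {}  # request id -> decode steps left
    finish, clock, busy = {}, 0, 0
    while waiting or active:
        while waiting and len(active) < max_batch:
            rid, n = waiting.popleft()
            active[rid] = n
        clock += 1
        busy += len(active)
        for rid in list(active):
            active[rid] -= 1
            if active[rid] <= 0:
                finish[rid] = clock
                del active[rid]
    return finish, busy / (clock * max_batch)


lengths = [8, 1, 1, 1]  # one long request followed by three short ones
static_finish, static_util = static_batching(lengths, max_batch=2)
cont_finish, cont_util = continuous_batching(lengths, max_batch=2)
print(static_finish)  # {0: 8, 1: 8, 2: 9, 3: 9}
print(cont_finish)    # {1: 1, 2: 2, 3: 3, 0: 8}
```

In this toy workload the short requests finish at steps 1, 2, and 3 under continuous batching instead of 8 or 9, and slot utilization rises from 11/18 to 11/16.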

## Connections
- [[vLLM]] ← uses ← [[Continuous Batching]]
- [[PagedAttention]] ← works with ← [[Continuous Batching]]

## Sources
- [[大模型相关术语和框架总结|llm-mcp-prompt-rag-vllm-token-数据蒸馏]]