---
title: "Continuous Batching"
type: concept
tags: [continuous-batching, vllm, inference, gpu]
aliases: [Continuous Batching, 连续批处理, Iteration-Level Scheduling]
last_updated: 2025-12-20
---
## Definition
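Continuous Batching is an inference optimization technique used by vLLM. Unlike traditional batching, which waits for a batch to fill up before processing it, Continuous Batching dynamically reassembles the batch of active requests at every decoding step (one iteration per token), which keeps the GPU running at close to full load.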
## Key Facts
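- Traditional batching: waits until a batch is full before running it, so short requests are blocked behind long ones (head-of-line blocking)
- Continuous Batching: assembles a new batch at every decoding step, so new requests can be inserted without waiting for the whole batch to finish
- Implemented with the block-based memory of [[PagedAttention]] plus a step-level scheduler
- Improves GPU concurrency and fairness, and makes fuller use of GPU compute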
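
The difference between the two scheduling styles can be illustrated with a toy simulation. This is a minimal sketch, not vLLM's scheduler: the function names `static_batching` and `continuous_batching` are hypothetical, every request is assumed to be already queued, each request needs a known positive number of decode steps, and prefill cost and memory limits are ignored.

```python
from collections import deque


def static_batching(lengths, batch_size):
    """Toy model of traditional batching (hypothetical helper).

    lengths[i] is the number of decode steps request i needs (assumed > 0).
    Returns {request_id: step at which its last token is produced}.
    """
    finish = {}
    step = 0
    for start in range(0, len(lengths), batch_size):
        batch = range(start, min(start + batch_size, len(lengths)))
        for rid in batch:
            finish[rid] = step + lengths[rid]
        # The next batch cannot start until the longest request is done,
        # so slots freed by short requests sit idle in the meantime.
        step += max(lengths[rid] for rid in batch)
    return finish


def continuous_batching(lengths, batch_size):
    """Toy model of iteration-level scheduling (hypothetical helper)."""
    waiting = deque(range(len(lengths)))
    remaining = {}  # request_id -> decode steps still needed
    finish = {}
    step = 0
    while waiting or remaining:
        # Before every decode step, fill free slots with waiting requests.
        while waiting and len(remaining) < batch_size:
            rid = waiting.popleft()
            remaining[rid] = lengths[rid]
        step += 1  # one decode step produces one token per active request
        for rid in list(remaining):
            remaining[rid] -= 1
            if remaining[rid] == 0:
                finish[rid] = step
                del remaining[rid]  # the slot is reusable on the next step
    return finish


if __name__ == "__main__":
    lengths = [8, 2, 2, 2]  # one long request followed by three short ones
    print("static:    ", static_batching(lengths, batch_size=2))
    print("continuous:", continuous_batching(lengths, batch_size=2))
```

With `lengths = [8, 2, 2, 2]` and `batch_size=2`, the static model finishes requests 2 and 3 at step 10 because they wait for the long request in the first batch, while the continuous model finishes them at steps 4 and 6 by reusing the slot that request 1 frees at step 2.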
## Connections
- [[vLLM]] ← uses ← [[Continuous Batching]]
- [[PagedAttention]] ← works with ← [[Continuous Batching]]
## Sources
- [[大模型相关术语和框架总结llm-mcp-prompt-rag-vllm-token-数据蒸馏]]