Auto-sync: update nexus workspace

2026-04-28 07:26:52 +08:00
parent b83b4e3105
commit 3224ec4787
436 changed files with 17107 additions and 15920 deletions

@@ -0,0 +1,23 @@
---
title: "Continuous Batching"
type: concept
tags: [continuous-batching, vllm, inference, gpu]
aliases: [Continuous Batching, 连续批处理, Iteration-Level Scheduling]
last_updated: 2025-12-20
---
## Definition
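Continuous Batching (连续批处理) is an inference optimization technique used by vLLM. Traditional batching waits until a full batch has accumulated before processing it. Continuous Batching instead reassembles the batch of active requests dynamically at every decoding step (that is, at each token iteration), which keeps the GPU running close to full load.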
## Key Facts
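- Traditional batching: wait until a batch is full, then run it; short requests are blocked behind long ones (head-of-line blocking).
- Continuous Batching: a new batch is assembled at every decoding step, so new requests can be inserted without waiting for the whole batch to finish.
- Implemented with the block-based memory of [[PagedAttention]] plus a step-level (iteration-level) scheduler.
- Improves GPU concurrency and fairness across requests, and makes fuller use of GPU compute.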
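
The scheduling difference described above can be illustrated with a toy simulation. This is a minimal sketch and not vLLM's scheduler: the function names `static_batching` and `continuous_batching` are hypothetical, each request is reduced to a positive number of tokens to decode, one decode step costs one time unit, and prefill and KV-cache memory limits (the part handled by PagedAttention) are ignored.

```python
from collections import deque


def static_batching(lengths, batch_size):
    """Toy model: each batch holds its slots until its longest request ends.

    Returns {request_id: step at which the request's batch is released}.
    """
    finish = {}
    step = 0
    requests = list(enumerate(lengths))
    for start in range(0, len(requests), batch_size):
        batch = requests[start:start + batch_size]
        # The whole batch is released only when its longest request is done.
        step += max(n for _, n in batch)
        for rid, _ in batch:
            finish[rid] = step
    return finish


def continuous_batching(lengths, batch_size):
    """Toy model: free slots are refilled before every decode step.

    Returns {request_id: step at which the request finishes}.
    """
    waiting = deque(enumerate(lengths))
    running = {}  # request id -> tokens still to generate
    finish = {}
    step = 0
    while waiting or running:
        # Admit waiting requests into any free slots at the step boundary.
        while waiting and len(running) < batch_size:
            rid, n = waiting.popleft()
            running[rid] = n
        step += 1  # one decode step: every running request emits one token
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]  # slot becomes free for the next step
                finish[rid] = step
    return finish


# One long request (8 tokens) followed by three short ones, 2 slots.
lengths = [8, 2, 2, 2]
print(static_batching(lengths, 2))      # {0: 8, 1: 8, 2: 10, 3: 10}
print(continuous_batching(lengths, 2))  # {1: 2, 2: 4, 3: 6, 0: 8}
```

In this toy workload the short requests complete at steps 2, 4, and 6 instead of 8, 10, and 10, and the total run drops from 10 steps to 8, because the slot freed by a finished short request is reused immediately rather than sitting idle behind the long one.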
## Connections
- [[vLLM]] ← uses ← [[Continuous Batching]]
- [[PagedAttention]] ← works together with ← [[Continuous Batching]]
## Sources
- [[大模型相关术语和框架总结llm-mcp-prompt-rag-vllm-token-数据蒸馏]]