OverlapAwareChunking (Overlap-Aware Chunking)

Definition

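Overlap-Aware Chunking splits very long audio (over 30 minutes) into overlapping chunks that are transcribed separately. The overlap window (30 s by default) is trimmed during the merge step, so that words straddling a cut point are neither duplicated nor dropped.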

The Problem

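Whisper-style models have a maximum input duration (typically 30 seconds to 10 minutes, depending on the model). If audio that exceeds the limit is simply truncated, the result is:

Words cut at the split point → garbled or repeated words
The tail of the last chunk truncated → lost content
Loss of semantic continuity across chunks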

The Solution

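Original audio (120 minutes)
       ↓ chunk (30 minutes per chunk, 30-second overlap)
chunk_0000: [0:00 - 30:30]
chunk_0001: [30:00 - 60:30]  ← 30-second overlap
chunk_0002: [60:00 - 90:30]  ← 30-second overlap
...
       ↓ transcribe each chunk with FasterWhisper
       ↓ merge (trim the overlap regions)
Final transcript (no duplicates or omissions)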
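
The boundaries in the diagram can be computed with a few lines of Python. This is a minimal sketch, not the pipeline's actual code: the function name `plan_chunks` is hypothetical, and it assumes that each chunk starts every `chunk_duration` seconds and extends `overlap` seconds past its nominal end, as the timestamps above suggest.

```python
from typing import List, Tuple


def plan_chunks(
    total: float,
    chunk_duration: float = 1800.0,
    overlap: float = 30.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for each chunk.

    Assumption: chunks start every `chunk_duration` seconds and each one
    runs `overlap` seconds past its nominal end, clamped to the audio length.
    """
    if chunk_duration <= 0 or overlap < 0:
        raise ValueError("chunk_duration must be > 0 and overlap must be >= 0")
    total = float(total)
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < total:
        end = min(start + chunk_duration + overlap, total)
        chunks.append((start, end))
        if end >= total:
            # This chunk already reaches the end of the audio.
            break
        start += chunk_duration
    return chunks
```

For a 120-minute file, `plan_chunks(7200)` yields four chunks; the first three match the diagram, and the last one is clamped to the end of the audio.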

Key Parameters

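Parameter	Default	Description
`chunk_duration`	1800s (30 minutes)	Length of each chunk, before the overlap is added
`overlap`	30s	Size of the overlap window
Merge-time trim	30s	From the second chunk onward, drop the first 30 seconds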
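
The merge-time trim can be sketched as follows. This is an illustration under stated assumptions, not the actual merge code: `Segment` and `merge_chunks` are hypothetical names, each chunk's transcription is assumed to arrive as a list of segments with chunk-relative timestamps, and a segment is dropped whenever it starts inside the first `overlap` seconds of any chunk after the first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for one transcribed segment."""
    start: float  # seconds, relative to the start of its chunk
    end: float
    text: str


def merge_chunks(
    chunks: List[List[Segment]],
    chunk_duration: float = 1800.0,
    overlap: float = 30.0,
) -> List[Segment]:
    """Merge per-chunk segments into one list with global timestamps."""
    merged: List[Segment] = []
    for index, segments in enumerate(chunks):
        offset = index * chunk_duration  # global start time of this chunk
        for seg in segments:
            # From the second chunk onward, skip the overlap region:
            # the previous chunk already covered it.
            if index > 0 and seg.start < overlap:
                continue
            merged.append(Segment(seg.start + offset, seg.end + offset, seg.text))
    return merged
```

Trimming by segment start time is only one possible rule. A segment that straddles the 30-second mark would be dropped by this sketch, so a real merge step may prefer a midpoint test or word-level timestamps.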

Critical Insight

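"Overflow is silent and corrupts output without error."

Mistakes in the chunking strategy do not raise exceptions. They only show up as garbled, repeated, or missing text in the final merged transcript, which makes them hard to detect after the fact.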
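
Because these failures are silent, one option is to check the chunk plan explicitly before transcribing. The sketch below is an assumption about how such a check might look, not part of any library: `validate_plan` is a hypothetical helper that takes `(start, end)` pairs in seconds and reports gaps or missing overlap as readable messages.

```python
from typing import List, Tuple


def validate_plan(
    chunks: List[Tuple[float, float]],
    total: float,
    overlap: float = 30.0,
    tol: float = 1e-6,
) -> List[str]:
    """Return a list of problems found in a chunk plan (empty if none)."""
    if not chunks:
        return ["plan contains no chunks"]
    problems: List[str] = []
    if abs(chunks[0][0]) > tol:
        problems.append("first chunk does not start at 0")
    if abs(chunks[-1][1] - total) > tol:
        problems.append("last chunk does not end at the end of the audio")
    # Each chunk must share at least `overlap` seconds with its predecessor.
    for i in range(1, len(chunks)):
        shared = chunks[i - 1][1] - chunks[i][0]
        if shared < overlap - tol:
            problems.append(
                f"chunk {i} shares {shared:.1f}s with the previous chunk, "
                f"expected at least {overlap:.1f}s"
            )
    return problems
```

Turning a non-empty result into an exception gives the pipeline the loud failure that the chunking step otherwise lacks.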

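Related Concepts

FasterWhisper: the engine that transcribes each audio chunk
VoiceActivityDetection: VAD filtering is more effective when run before chunking
StructuredTranscriptJSON: the final structured output format after the chunks are merged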

Related Sources

engineering-voice-ai-integration-engineer
