
---
title: "OverlapAwareChunking"
type: concept
tags: ["voice-ai", "audio-processing", "transcription", "chunking"]
last_updated: 2026-05-02
---
# OverlapAwareChunking (Overlap-Aware Chunking)
## Definition
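Overlap-Aware Chunking is a technique for splitting very long audio (>30 minutes) into multiple overlapping chunks that are transcribed separately. The overlap window (30 s by default) is trimmed during the merge stage, so that a word straddling a split point is not cut in half and does not show up as duplicated or missing text.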
## The Problem
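Whisper-style models have a maximum input duration (typically 30 seconds to 10 minutes, depending on the model). If audio that exceeds the limit is simply cut at the limit, the result is:
1. Words are cut at the split point → garbled or duplicated words
2. The tail of the last chunk is truncated → lost content
3. Semantic continuity across chunks is lost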
## The Solution
```
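Original audio (120 minutes)
↓ Split into chunks (30 minutes each, 30-second overlap)
chunk_0000: [0:00 - 30:30]
chunk_0001: [30:00 - 60:30] ← 30-second overlap
chunk_0002: [60:00 - 90:30] ← 30-second overlap
...
↓ Transcribe each chunk with FasterWhisper
↓ Merge (trim the overlap regions)
Final transcript (no duplicates or omissions)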
```
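The boundaries in the diagram can be reproduced with a short planning routine. This is a minimal sketch, not the pipeline's actual code: `Chunk` and `plan_chunks` are hypothetical names, and it assumes each chunk starts every `chunk_duration` seconds and runs `overlap` seconds past the start of the next one, which is what the `[0:00 - 30:30]`, `[30:00 - 60:30]` pattern suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int
    start: float  # seconds from the start of the original audio
    end: float    # seconds from the start of the original audio

    @property
    def name(self) -> str:
        return f"chunk_{self.index:04d}"


def plan_chunks(total: float, chunk_duration: float = 1800.0,
                overlap: float = 30.0) -> list[Chunk]:
    """Plan chunk boundaries (hypothetical helper, assumed layout)."""
    if chunk_duration <= 0 or overlap < 0:
        raise ValueError("chunk_duration must be > 0 and overlap >= 0")
    chunks: list[Chunk] = []
    start = 0.0
    while start < total:
        # Each chunk extends `overlap` seconds into the next chunk's range,
        # clamped to the end of the audio.
        end = min(start + chunk_duration + overlap, total)
        chunks.append(Chunk(len(chunks), start, end))
        if end >= total:
            break  # the audio is fully covered; avoid an overlap-only chunk
        start += chunk_duration
    return chunks


if __name__ == "__main__":
    for c in plan_chunks(120 * 60):
        print(c.name, c.start, c.end)
```

For a 120-minute file this yields four chunks, the last one clamped to the end of the audio rather than extended by the overlap.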
## Key Parameters
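| Parameter | Default | Description |
|-----------|---------|-------------|
| `chunk_duration` | 1800 s (30 minutes) | Duration of each chunk |
| `overlap` | 30 s | Size of the overlap window |
| Trim at merge | 30 s | Starting with the second chunk, trim the first 30 seconds |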
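The merge-time trim in the last row can be sketched as follows. Everything here is an assumption made for illustration: `Segment` and `merge_chunks` are hypothetical names, segments are taken to carry `start`, `end`, and `text` with times relative to their own chunk, and a segment is dropped when it *starts* inside the trimmed window (a real pipeline may need a finer rule for segments that straddle the boundary).

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds, relative to the start of its chunk
    end: float    # seconds, relative to the start of its chunk
    text: str


def merge_chunks(chunk_segments: list[list[Segment]],
                 chunk_duration: float = 1800.0,
                 overlap: float = 30.0) -> list[Segment]:
    """Merge per-chunk segments into one timeline (hypothetical helper)."""
    merged: list[Segment] = []
    for index, segments in enumerate(chunk_segments):
        offset = index * chunk_duration       # chunk start in the original audio
        trim = overlap if index > 0 else 0.0  # the first chunk is kept whole
        for seg in segments:
            if seg.start < trim:
                # Already covered by the tail of the previous chunk.
                continue
            merged.append(Segment(seg.start + offset, seg.end + offset, seg.text))
    return merged


if __name__ == "__main__":
    chunk0 = [Segment(0.0, 5.0, "a"), Segment(1805.0, 1829.0, "b")]
    chunk1 = [Segment(5.0, 29.0, "b"), Segment(31.0, 40.0, "c")]
    print(" ".join(s.text for s in merge_chunks([chunk0, chunk1])))
```

In the usage example the phrase `"b"` is transcribed twice, once at the tail of the first chunk and once at the head of the second; the trim keeps only the first copy.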
## Critical Insight
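> "Overflow is silent and corrupts output without error."

Mistakes in the chunking strategy do not raise exceptions. They only surface as garbled, duplicated, or missing text in the final merged transcript, which makes them hard to detect after the fact.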
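One way to make such mistakes loud is to validate the chunk plan before transcription starts. The check below is a sketch under stated assumptions, not part of any library: `validate_plan` is a hypothetical helper, boundaries are `(start, end)` pairs in seconds, and neighbouring chunks are expected to share exactly `overlap` seconds because the merge trims exactly that much.

```python
def validate_plan(boundaries: list[tuple[float, float]], total: float,
                  overlap: float = 30.0, tol: float = 1e-6) -> None:
    """Raise ValueError if the plan would silently lose or duplicate audio."""
    if not boundaries:
        raise ValueError("no chunks planned")
    if boundaries[0][0] > tol:
        raise ValueError("first chunk does not start at 0: head of audio lost")
    if boundaries[-1][1] < total - tol:
        raise ValueError("last chunk ends before the audio does: tail lost")
    pairs = zip(boundaries, boundaries[1:])
    for i, ((_, prev_end), (next_start, _)) in enumerate(pairs, start=1):
        shared = prev_end - next_start
        # Too little shared audio risks cut words or gaps; too much leaves
        # duplicated text behind after a fixed-size trim.
        if abs(shared - overlap) > tol:
            raise ValueError(
                f"chunks {i - 1} and {i} share {shared:.1f}s, "
                f"expected {overlap:.1f}s"
            )


if __name__ == "__main__":
    validate_plan([(0, 1830), (1800, 3630), (3600, 5430), (5400, 7200)], 7200)
    print("plan ok")
```

A plan with back-to-back chunks and no overlap, such as `[(0, 1800), (1800, 3600)]`, is rejected up front instead of producing a subtly damaged transcript.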
## Related Concepts
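- [[FasterWhisper]]: the engine that transcribes the audio chunks
- [[VoiceActivityDetection]]: VAD filtering works better when it runs before chunking
- [[StructuredTranscriptJSON]]: the final structured output format after the chunks are merged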
## Related Sources
- [[engineering-voice-ai-integration-engineer]]