---
title: "OverlapAwareChunking"
type: concept
tags: ["voice-ai", "audio-processing", "transcription", "chunking"]
last_updated: 2026-05-02
---

# OverlapAwareChunking (Overlap-Aware Chunking)

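## Definition

Overlap-Aware Chunking is a technique that splits very long audio (over 30 minutes) into overlapping chunks and transcribes each chunk separately. The overlap window (30 s by default) is trimmed during the merge stage, so that a word straddling a cut point is neither duplicated nor dropped.
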
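## The Problem

Whisper-style models have a maximum input duration (typically between 30 seconds and 10 minutes, depending on the model). Hard-cutting audio that exceeds this limit causes:

1. A word is cut at the split point → garbled or repeated words
2. The tail of the last chunk is truncated → content is lost
3. Semantic continuity across chunks is lost
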
## The Solution

```
Source audio (120 minutes)
↓ Chunking (30 minutes per chunk, 30-second overlap)
chunk_0000: [0:00 - 30:30]
chunk_0001: [30:00 - 60:30] ← 30-second overlap
chunk_0002: [60:00 - 90:30] ← 30-second overlap
...
↓ Transcribe each chunk with FasterWhisper
↓ Merge (trim the overlap regions)
Final transcript (no duplicates or omissions)
```
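
The chunking step in the diagram can be sketched as a small boundary calculation. This is a minimal illustration, not the pipeline's actual code: `plan_chunks` and `fmt` are hypothetical names, and the sketch assumes each window starts every `chunk_duration` seconds and runs `overlap` seconds past its nominal end, as the diagram suggests.

```python
def plan_chunks(total_s: float,
                chunk_duration: float = 1800.0,
                overlap: float = 30.0) -> list[tuple[float, float]]:
    """Return (start, end) windows in seconds covering [0, total_s].

    Assumption: window i starts at i * chunk_duration and extends
    `overlap` seconds past its nominal end, clamped to the audio length.
    """
    if chunk_duration <= 0 or overlap < 0:
        raise ValueError("chunk_duration must be > 0 and overlap >= 0")
    if total_s <= 0:
        return []
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_duration + overlap, total_s)
        windows.append((start, end))
        if end >= total_s:
            # The audio is fully covered; do not emit a chunk that would
            # lie entirely inside the previous chunk's overlap.
            break
        start += chunk_duration
    return windows


def fmt(seconds: float) -> str:
    """Format seconds as m:ss, matching the notation in the diagram."""
    whole = int(seconds)
    return f"{whole // 60}:{whole % 60:02d}"
```

For a 120-minute file, `plan_chunks(7200.0)` yields four windows, the first being `(0.0, 1830.0)`, which `fmt` renders as `0:00` and `30:30`. The last window is clamped to the end of the audio rather than padded.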

## Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `chunk_duration` | 1800 s (30 minutes) | Duration of each chunk |
| `overlap` | 30 s | Size of the overlap window |
| Trim on merge | 30 s | From the second chunk onward, drop the first 30 seconds |
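
The trim-on-merge row can be illustrated with a short sketch. It assumes each chunk's transcript is a list of segments whose timestamps are relative to the start of that chunk; `Segment` and `merge_chunks` are hypothetical names introduced here. Deciding by segment start time is one simple rule and an assumption of this sketch; word-level timestamps would allow a finer cut.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds, relative to the start of its chunk
    end: float    # seconds, relative to the start of its chunk
    text: str


def merge_chunks(chunks: list[list[Segment]],
                 chunk_duration: float = 1800.0,
                 overlap: float = 30.0) -> list[Segment]:
    """Merge per-chunk segments into one list with global timestamps.

    Chunk 0 is kept whole. For every later chunk, segments that start
    inside the first `overlap` seconds are dropped, because the previous
    chunk already covers that audio.
    """
    merged: list[Segment] = []
    for index, segments in enumerate(chunks):
        offset = index * chunk_duration          # global start of this chunk
        trim = overlap if index > 0 else 0.0     # nothing to trim in chunk 0
        for seg in segments:
            if seg.start < trim:
                continue                         # duplicate of the previous chunk's tail
            merged.append(Segment(seg.start + offset, seg.end + offset, seg.text))
    return merged
```

With the defaults, a segment that starts 31 s into the second chunk ends up at 1831 s in the merged transcript, while one that starts 6 s into it is discarded.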

## Critical Insight

> "Overflow is silent and corrupts output without error."

Mistakes in the chunking strategy do not raise exceptions. They surface only as garbled, repeated, or missing text in the final merged transcript, which makes them hard to detect after the fact.
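
Because these failures are silent, an explicit check on the merged result is one way to surface them. The sketch below is an assumption-laden illustration: `find_merge_anomalies` is a hypothetical helper, segments are plain `(start, end, text)` tuples in global seconds, and the 60-second `max_gap` threshold is arbitrary. Long silences can legitimately produce gaps, so flagged items are candidates for review, not confirmed errors.

```python
def find_merge_anomalies(segments: list[tuple[float, float, str]],
                         total_s: float,
                         max_gap: float = 60.0) -> list[tuple[str, float]]:
    """Return (kind, time) pairs for suspicious spots in a merged transcript.

    Kinds: "overlap" (a segment starts before the previous one ended),
    "gap" (more than max_gap seconds without text), "duplicate" (same text
    as the previous segment), "tail" (transcript ends long before the audio).
    """
    problems: list[tuple[str, float]] = []
    prev_end = 0.0
    prev_text = None
    for start, end, text in segments:
        if start < prev_end - 1e-6:
            problems.append(("overlap", start))
        elif start - prev_end > max_gap:
            problems.append(("gap", prev_end))
        cleaned = text.strip()
        if cleaned and cleaned == prev_text:
            problems.append(("duplicate", start))
        prev_end = max(prev_end, end)
        prev_text = cleaned
    if total_s - prev_end > max_gap:
        problems.append(("tail", prev_end))
    return problems
```

An empty result does not prove the transcript is correct; it only means none of these coarse symptoms was found.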

## Related Concepts
- [[FasterWhisper]]: the engine that transcribes the chunked audio
- [[VoiceActivityDetection]]: VAD filtering works best when run before chunking
- [[StructuredTranscriptJSON]]: the final structured output format after the chunks are merged

## Related Sources
- [[engineering-voice-ai-integration-engineer]]