---
title: "VoiceActivityDetection"
type: concept
tags: ["voice-ai", "audio-processing", "whisper"]
last_updated: 2026-05-02
---
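# VoiceActivityDetection (VAD)

## Definition

Voice Activity Detection (VAD) identifies the segments of an audio stream that contain human speech so that silent regions can be skipped. In Whisper-style transcription pipelines it runs as a preprocessing stage: the model only sees the speech segments, which substantially reduces the amount of audio to transcribe and improves throughput and cost efficiency.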
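
As a rough illustration of the definition above, the sketch below marks each short frame as speech or non-speech by comparing its RMS level against a fixed threshold. This is a deliberately simple energy-based detector, not what FasterWhisper uses (faster-whisper integrates the neural Silero VAD model). The function name `frame_is_speech`, the 30 ms frame size and the -40 dBFS threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence


def frame_is_speech(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_ms: int = 30,
    threshold_dbfs: float = -40.0,
) -> List[bool]:
    """Hypothetical energy-threshold VAD: one True/False decision per frame.

    `samples` are floats in [-1.0, 1.0]. A frame counts as speech when its
    RMS level relative to full scale (1.0), in dB, reaches `threshold_dbfs`.
    """
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    if frame_len <= 0:
        raise ValueError("frame_ms is too short for this sample_rate")
    decisions: List[bool] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]  # last frame may be shorter
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # Digital silence has rms == 0; avoid log10(0).
        level_dbfs = 20 * math.log10(rms) if rms > 0 else float("-inf")
        decisions.append(level_dbfs >= threshold_dbfs)
    return decisions
```

A fixed energy threshold is easy to reason about but is exactly what the misclassification risk below describes: anything loud enough (music, effects) passes, so production pipelines prefer a trained model.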
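## Key Properties

- **Typical threshold**: `min_silence_duration_ms=500` (a pause must last at least 500 ms before it is treated as silence and skipped; shorter pauses stay inside the surrounding speech segment)
- **Role in Whisper**: in FasterWhisper, VAD is enabled with `vad_filter=True` and tuned through the `vad_parameters` argument
- **Effect**: skipping silence removes roughly 30-50% of the audio to be processed, raising throughput by about 1.5-2x (the gain depends on how much silence the recording contains)
- **Misclassification risk**: music, background speech, and non-speech sound effects may be misclassified as silence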
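
The threshold semantics can be sketched by grouping per-frame speech decisions into segments and cutting only at pauses that reach `min_silence_duration_ms`. This is a simplified reading of the parameter, not faster-whisper's implementation (which works on Silero speech probabilities and also pads segments); `speech_segments` and `ideal_speedup` are hypothetical helpers. `ideal_speedup` assumes decoding cost scales linearly with audio duration, so skipping a fraction f of the audio gives a speedup of 1/(1-f): about 1.4x at 30% and 2x at 50%, roughly in line with the figures above.

```python
from typing import List, Sequence, Tuple


def speech_segments(
    is_speech: Sequence[bool],
    frame_ms: int = 30,
    min_silence_duration_ms: int = 500,
) -> List[Tuple[int, int]]:
    """Group per-frame decisions into (start_ms, end_ms) speech segments.

    A silent gap shorter than min_silence_duration_ms is absorbed into the
    surrounding speech, so only pauses of at least that length are cut.
    """
    segments: List[Tuple[int, int]] = []
    for i, speech in enumerate(is_speech):
        if not speech:
            continue
        start, end = i * frame_ms, (i + 1) * frame_ms
        if segments and start - segments[-1][1] < min_silence_duration_ms:
            # Gap since the previous segment is too short: extend that segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


def ideal_speedup(skipped_fraction: float) -> float:
    """Throughput gain if cost is proportional to the audio duration kept."""
    if not 0.0 <= skipped_fraction < 1.0:
        raise ValueError("skipped_fraction must be in [0, 1)")
    return 1.0 / (1.0 - skipped_fraction)
```

In faster-whisper itself, the same threshold is passed as `model.transcribe("audio.mp3", vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500))`.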
## Usage in Pipeline

```
Raw audio → VAD filtering → non-silent segments → FasterWhisper transcription
```
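
The pipeline above can be sketched as: slice out each VAD segment, transcribe it, and shift the returned timestamps back onto the original timeline (otherwise they would be relative to the trimmed audio). `transcribe_non_silent` and the `transcribe_fn` callback are hypothetical stand-ins for the FasterWhisper call, and transcribing each segment separately is a simplification; faster-whisper does the equivalent timestamp bookkeeping internally when `vad_filter=True`.

```python
from typing import Callable, List, Sequence, Tuple

# (start_s, end_s, text), with times relative to the audio passed in.
Segment = Tuple[float, float, str]


def transcribe_non_silent(
    samples: Sequence[float],
    sample_rate: int,
    speech_ms: Sequence[Tuple[int, int]],
    transcribe_fn: Callable[[Sequence[float]], List[Segment]],
) -> List[Segment]:
    """Transcribe only the VAD-selected spans and map timestamps back.

    `speech_ms` holds (start_ms, end_ms) spans from the VAD stage;
    `transcribe_fn` is a placeholder for the ASR model call.
    """
    results: List[Segment] = []
    for start_ms, end_ms in speech_ms:
        lo = start_ms * sample_rate // 1000
        hi = end_ms * sample_rate // 1000
        offset_s = start_ms / 1000
        for seg_start, seg_end, text in transcribe_fn(samples[lo:hi]):
            # Shift chunk-relative times back onto the original timeline.
            results.append((seg_start + offset_s, seg_end + offset_s, text))
    return results
```

Keeping the ASR call behind a callback makes the VAD stage testable without loading a model.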
## Related Concepts

- [[FasterWhisper]]: the main consumer of VAD output
- [[OverlapAwareChunking]]: chunking of long audio after VAD
- [[EBUR128LoudnessNormalization]]: loudness normalization applied before VAD, so that VAD thresholds stay accurate across recordings
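
Why normalization comes before VAD: a detector with a fixed level threshold (say -40 dBFS) can miss speech in a quiet recording that it would catch in a loud one. The sketch below brings every input to the same overall level first. It is a simplified stand-in using plain RMS gain, not EBU R128, which measures integrated loudness in LUFS with K-weighting and gating (ITU-R BS.1770). `normalize_rms` and its -23 dB default (borrowed loosely from the R128 -23 LUFS target) are assumptions.

```python
import math
from typing import List, Sequence


def normalize_rms(samples: Sequence[float], target_dbfs: float = -23.0) -> List[float]:
    """Scale samples so their overall RMS level equals target_dbfs.

    Hypothetical simplification of loudness normalization: plain RMS,
    no K-weighting or gating. Output is clipped to [-1.0, 1.0].
    """
    if not samples:
        return []
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0:
        return list(samples)  # digital silence: nothing to scale
    gain = 10 ** (target_dbfs / 20) / rms
    return [max(-1.0, min(1.0, x * gain)) for x in samples]
```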
## Related Sources
- [[engineering-voice-ai-integration-engineer]]