nexus/wiki/concepts/VoiceActivityDetection.md
2026-05-03 05:42:12 +08:00

---
title: "VoiceActivityDetection"
type: concept
tags: ["voice-ai", "audio-processing", "whisper"]
last_updated: 2026-05-02
---
# VoiceActivityDetection (VAD)
## Definition
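Voice Activity Detection (VAD) is a technique that identifies the segments of an audio stream containing human speech so that silent regions can be skipped. In pipelines built on Whisper-family transcription models it runs as a preprocessing stage. It substantially reduces the amount of audio that must be transcribed, which improves both throughput and cost efficiency.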
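The core idea of the definition, deciding per frame whether speech is present, can be illustrated with a deliberately simple energy-threshold detector. This is not the method faster-whisper uses: it bundles the neural Silero VAD model. The frame size `frame_ms=30`, the RMS `threshold=0.02`, and the function names are illustrative assumptions. Samples are assumed to be floats in [-1.0, 1.0].

```python
import math


def frame_rms(samples, frame_len):
    """Split samples into consecutive frames and return each frame's RMS energy.

    The final frame may be shorter than frame_len.
    """
    rms = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return rms


def detect_speech_frames(samples, sample_rate, frame_ms=30, threshold=0.02):
    """Return one flag per frame: True if the frame's RMS energy exceeds threshold.

    Hypothetical helper for illustration only. threshold=0.02 assumes float
    samples normalized to [-1.0, 1.0]. Real VADs such as Silero use a trained
    model instead of a fixed energy threshold.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    return [r > threshold for r in frame_rms(samples, frame_len)]
```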
## Key Properties
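- **Typical threshold**: `min_silence_duration_ms=500` (a silence must last at least 500 ms before it is skipped)
- **Role in Whisper pipelines**: enabled in faster-whisper's `transcribe` via `vad_filter=True` and tuned via `vad_parameters`
- **Effect**: skipping silence cuts processing by ~30-50%, raising throughput by ~1.5-2x
- **Misclassification risk**: music, background speech, and non-speech sound effects may be misjudged as silence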
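The following minimal sketch shows how a `min_silence_duration_ms`-style setting turns per-frame speech flags into segments. Pauses shorter than the threshold stay inside a segment, and only longer silences split segments and get skipped. The sketch also computes the throughput gain implied by the audio that is kept. That calculation rests on two simplifying assumptions: transcription cost scales linearly with audio length, and VAD cost is negligible. Under those assumptions, trimming a third to a half of the audio corresponds to 1.5-2x. Both function names are hypothetical. Silero VAD's actual logic differs in detail, for example by padding segments (`speech_pad_ms`).

```python
import math


def speech_segments(flags, frame_ms, min_silence_duration_ms=500):
    """Group per-frame speech flags into (start_ms, end_ms) segments.

    A run of non-speech frames splits segments only if it lasts at least
    min_silence_duration_ms (rounded up to whole frames); shorter pauses
    are kept inside the current segment.
    """
    min_silence_frames = math.ceil(min_silence_duration_ms / frame_ms)
    segments = []
    start = None      # first frame of the open segment, or None
    silence_run = 0   # consecutive non-speech frames since the last speech frame
    for i, is_speech in enumerate(flags):
        if is_speech:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                end = i - silence_run + 1  # frame just after the last speech frame
                segments.append((start * frame_ms, end * frame_ms))
                start, silence_run = None, 0
    if start is not None:  # close a segment still open at the end of the audio
        end = len(flags) - silence_run
        segments.append((start * frame_ms, end * frame_ms))
    return segments


def estimated_speedup(segments, total_ms):
    """Throughput gain assuming cost is linear in audio length and VAD is free."""
    kept_ms = sum(end - start for start, end in segments)
    if kept_ms == 0:
        return math.inf  # nothing left to transcribe
    return total_ms / kept_ms
```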
## Usage in Pipeline
```
Raw audio → VAD filter → non-silent segments → FasterWhisper transcription
```
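The pipeline can be sketched as a function that runs a VAD, transcribes only the non-silent slices, and maps each result back to timestamps on the original audio. The `vad` and `transcribe` callables are hypothetical stand-ins, injected so that the sketch runs without a model. This simplified version transcribes each segment separately. faster-whisper instead performs the filtering internally when called as `model.transcribe(audio, vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500))`.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms) on the original audio


def transcribe_with_vad(
    samples: Sequence[float],
    sample_rate: int,
    vad: Callable[[Sequence[float], int], List[Segment]],
    transcribe: Callable[[Sequence[float]], str],
) -> List[Tuple[float, float, str]]:
    """Transcribe only the speech segments reported by `vad`.

    Returns (start_s, end_s, text) tuples whose timestamps refer to the
    original, unfiltered audio.
    """
    results = []
    for start_ms, end_ms in vad(samples, sample_rate):
        # Convert millisecond boundaries to sample indices.
        lo = start_ms * sample_rate // 1000
        hi = end_ms * sample_rate // 1000
        text = transcribe(samples[lo:hi])
        results.append((start_ms / 1000, end_ms / 1000, text))
    return results
```

Injecting the VAD and the transcriber keeps the stage order from the diagram explicit. It also lets either stage be swapped out, for example replacing an energy detector with Silero or a stub with a real model.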
## Related Concepts
- [[FasterWhisper]]: the main consumer of VAD
- [[OverlapAwareChunking]]: chunking of long audio after VAD
- [[EBUR128LoudnessNormalization]]: loudness normalization applied before VAD (keeps VAD thresholds accurate)
## Related Sources
- [[engineering-voice-ai-integration-engineer]]