---
title: "FasterWhisper"
type: concept
tags: ["voice-ai", "whisper", "local-llm", "ctranslate2"]
last_updated: 2026-05-02
---
# FasterWhisper
## Definition
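Faster-Whisper is a reimplementation of OpenAI Whisper on top of CTranslate2. With accuracy comparable to the original, it runs roughly 2-4x faster and uses 50%+ less memory. It is the preferred implementation for local transcription in production.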
## Key Properties
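- **Low-level optimization**: CTranslate2 (custom CUDA/AVX kernels) replaces the native PyTorch implementation
- **Speed comparison**: Whisper large-v3 runs at about 2-3x real time on an A10G GPU; Faster-Whisper reaches 8-10x real time
- **Accuracy**: nearly identical to the original Whisper (<0.5% WER difference)
- **Model sizes**: tiny / base / small / medium / large-v2 / large-v3
- **Key parameters**: `beam_size=5` (accuracy first), `word_timestamps=True` (required for subtitle-accurate timing), `vad_filter=True`
- **CPU support**: CPU inference is supported (a viable option when no GPU is available, though slower)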
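
The key parameters above map onto a `transcribe` call roughly as follows. This is a minimal sketch, assuming the `faster-whisper` package is installed and using its `WhisperModel` class; the helper names `load_model` and `transcribe_file`, the default model size, and the `device` / `compute_type` defaults are illustrative assumptions, not something this page prescribes.

```python
from typing import Any, Dict, List


def load_model(size: str = "small", device: str = "cpu", compute_type: str = "int8") -> Any:
    """Load a Faster-Whisper model (hypothetical helper; defaults are assumptions)."""
    # Imported lazily so transcribe_file() can be exercised without the package.
    from faster_whisper import WhisperModel  # pip install faster-whisper
    return WhisperModel(size, device=device, compute_type=compute_type)


def transcribe_file(model: Any, audio_path: str) -> List[Dict[str, Any]]:
    """Transcribe one file with the key parameters listed above."""
    segments, _info = model.transcribe(
        audio_path,
        beam_size=5,           # accuracy first
        word_timestamps=True,  # per-word timing, needed for subtitles
        vad_filter=True,       # drop non-speech before decoding
    )
    # `segments` is a lazy generator; iterating it runs the actual decoding.
    return [
        {"start": seg.start, "end": seg.end, "text": seg.text.strip()}
        for seg in segments
    ]
```

Typical use would be `transcribe_file(load_model("large-v3", device="cuda", compute_type="float16"), "meeting.wav")`, where the file name is a placeholder.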
## Model Size Selection Guide
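| Scenario | Recommended model | Hardware requirement |
|------|---------|---------|
| Real-time / quick testing | tiny / base | CPU is sufficient |
| Balanced (most production scenarios) | small / medium | GPU (RTX 3080+) |
| Highest accuracy (medical / legal) | large-v3 | GPU (A10G / A100) |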
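
The table can be encoded as a small lookup so that model choice lives in configuration rather than in call sites. This is a sketch only: the scenario keys, the `ModelChoice` type, the choice of the larger model in each row, and the CPU fallback are all assumptions made for illustration.

```python
from typing import NamedTuple


class ModelChoice(NamedTuple):
    model_size: str
    device: str


# Hypothetical scenario keys mirroring the three table rows.
# Where a row lists two sizes, the larger one is picked (an assumption).
_GUIDE = {
    "realtime": ModelChoice("base", "cpu"),
    "balanced": ModelChoice("medium", "cuda"),
    "max_accuracy": ModelChoice("large-v3", "cuda"),
}


def pick_model(scenario: str, has_gpu: bool) -> ModelChoice:
    """Return the table's recommendation for a scenario."""
    choice = _GUIDE[scenario]
    if choice.device == "cuda" and not has_gpu:
        # Assumed fallback: same model on CPU, which works but is slower.
        return ModelChoice(choice.model_size, "cpu")
    return choice
```

For example, `pick_model("balanced", has_gpu=True)` yields `medium` on `cuda`, while an unknown scenario raises `KeyError` rather than silently choosing a default.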
## Pipeline Role
```
Audio preprocessing → FasterWhisper.transcribe() → timestamped transcript segments
```
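
As one illustration of consuming the timestamped segments at the end of this pipeline, the sketch below renders them as SRT subtitle text. The segment shape (mappings with `start`, `end` in seconds, and `text`) and the function names are assumptions; the page's actual output format is described under [[StructuredTranscriptJSON]].

```python
from typing import Iterable, Mapping


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments: Iterable[Mapping]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(cues)
```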
## Related Concepts
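- [[VoiceActivityDetection]]: silence filtering before transcription
- [[SpeakerDiarization]]: merging transcription results with speaker labels
- [[OverlapAwareChunking]]: splitting long audio into chunks and transcribing each chunk
- [[EBUR128LoudnessNormalization]]: normalizing audio to protect transcription accuracy
- [[PIIRedaction]]: redacting personally identifiable information after transcription
- [[StructuredTranscriptJSON]]: the final output format for transcription results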
## Related Entities
- [[OpenAIWhisper]]: the base model that Faster-Whisper builds on
- [[pyannote.audio]]: speaker diarization, used alongside Faster-Whisper
## Related Sources
- [[engineering-voice-ai-integration-engineer]]