---
title: "FasterWhisper"
type: concept
tags: ["voice-ai", "whisper", "local-llm", "ctranslate2"]
last_updated: 2026-05-02
---

# FasterWhisper
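
## Definition

Faster-Whisper is a reimplementation of OpenAI Whisper on top of the CTranslate2 inference engine. At accuracy comparable to the original, it runs 2-4x faster and uses more than 50% less memory, which makes it the preferred implementation for local transcription in production.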
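
## Key Properties

- **Low-level optimization**: CTranslate2 (custom CUDA/AVX kernels) replaces the native PyTorch implementation
- **Speed comparison**: on an A10G GPU, Whisper large-v3 runs at about 2-3x real time, while Faster-Whisper can reach 8-10x real time (roughly 6 to 7.5 minutes of compute per hour of audio)
- **Accuracy**: nearly identical to the original Whisper (<0.5% WER difference)
- **Model sizes**: tiny / base / small / medium / large-v2 / large-v3
- **Key parameters**: `beam_size=5` (favors accuracy), `word_timestamps=True` (required for accurate subtitle timing), `vad_filter=True` (filters out non-speech audio)
- **CPU support**: CPU inference is supported (a viable option when no GPU is available, though slower)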
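The key parameters listed above map onto the `faster_whisper` Python package roughly as follows. This is a minimal sketch rather than a reference configuration: the helper name `transcribe_file`, the `TRANSCRIBE_OPTIONS` dict, and the default `model_size`, `device`, and `compute_type` values are assumptions chosen for illustration.

```python
from typing import Any

# Decoding options taken from the Key Properties list.
TRANSCRIBE_OPTIONS: dict[str, Any] = {
    "beam_size": 5,           # wider beam search, favors accuracy over speed
    "word_timestamps": True,  # per-word timing, needed for subtitle alignment
    "vad_filter": True,       # skip non-speech audio before decoding
}


def transcribe_file(
    audio_path: str,
    model_size: str = "small",   # assumption: any size from the list works here
    device: str = "cpu",         # assumption: CPU default so the sketch runs anywhere
    compute_type: str = "int8",  # assumption: quantized weights for CPU inference
) -> list[dict[str, Any]]:
    """Transcribe one audio file into plain-dict segments (hypothetical helper)."""
    # Imported lazily so this module still loads where faster-whisper is absent.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(audio_path, **TRANSCRIBE_OPTIONS)

    # `segments` is a generator: decoding happens while it is iterated.
    results = []
    for seg in segments:
        results.append({
            "start": seg.start,
            "end": seg.end,
            "text": seg.text.strip(),
            "words": [
                {"start": w.start, "end": w.end, "word": w.word}
                for w in (seg.words or [])
            ],
        })
    return results
```

On a GPU host, one would typically pass `device="cuda"` and `compute_type="float16"` instead of the CPU defaults assumed here.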

## Model Size Selection Guide

| Scenario | Recommended model | Hardware requirement |
|------|---------|---------|
| Real-time / quick tests | tiny / base | CPU is sufficient |
| Balanced (most production workloads) | small / medium | GPU (RTX 3080+) |
| Highest accuracy (medical / legal) | large-v3 | GPU (A10G / A100) |
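The table above can be carried into code as a small lookup, for example to choose a model from a deployment config. The sketch below is illustrative only: the scenario keys (`realtime`, `balanced`, `max_accuracy`) and the function name `recommend_models` are hypothetical and do not come from any library.

```python
# Hypothetical lookup mirroring the selection table:
# scenario key -> (candidate model sizes, hardware note).
MODEL_GUIDE: dict[str, tuple[tuple[str, ...], str]] = {
    "realtime": (("tiny", "base"), "CPU is sufficient"),
    "balanced": (("small", "medium"), "GPU (RTX 3080+)"),
    "max_accuracy": (("large-v3",), "GPU (A10G / A100)"),
}


def recommend_models(scenario: str) -> tuple[tuple[str, ...], str]:
    """Return the candidate model sizes and hardware note for a scenario."""
    try:
        return MODEL_GUIDE[scenario]
    except KeyError:
        known = ", ".join(sorted(MODEL_GUIDE))
        raise ValueError(f"unknown scenario {scenario!r}; expected one of: {known}")


if __name__ == "__main__":
    models, hardware = recommend_models("balanced")
    print(f"candidates: {', '.join(models)} | hardware: {hardware}")
```

Returning every candidate for a row, rather than picking one, leaves the final choice between (say) small and medium to whoever knows the latency budget.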

## Pipeline Role

```
Audio preprocessing → FasterWhisper.transcribe() → timestamped transcript segments
```
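The flow in the diagram can be sketched as a function that chains a preprocessing step and a transcription step, then renders timestamped lines. Everything named here (`Segment`, `format_timestamp`, `run_pipeline`) is a hypothetical stand-in for illustration. The two stages are passed in as callables so the sketch runs without audio files or a model.

```python
from typing import Callable, Iterable, NamedTuple


class Segment(NamedTuple):
    """Minimal stand-in for a transcription segment (times in seconds)."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def run_pipeline(
    audio_path: str,
    preprocess: Callable[[str], str],
    transcribe: Callable[[str], Iterable[Segment]],
) -> list[str]:
    """Preprocess the audio, transcribe it, and return timestamped lines."""
    prepared_path = preprocess(audio_path)
    return [
        f"[{format_timestamp(s.start)} -> {format_timestamp(s.end)}] {s.text.strip()}"
        for s in transcribe(prepared_path)
    ]


if __name__ == "__main__":
    # Stub stages: a real setup would normalize audio and call a Whisper model.
    lines = run_pipeline(
        "meeting.wav",
        preprocess=lambda path: path,
        transcribe=lambda path: [Segment(0.0, 1.5, " hello "), Segment(1.5, 3.0, " world ")],
    )
    print("\n".join(lines))
```

Segments produced by `faster_whisper` expose `start`, `end`, and `text` attributes, so a real transcription callable could be dropped in where the stub is used.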
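
## Related Concepts
- [[VoiceActivityDetection]]: silence filtering before transcription
- [[SpeakerDiarization]]: merging transcription output with speaker labels
- [[OverlapAwareChunking]]: chunk-by-chunk transcription of long audio after splitting
- [[EBUR128LoudnessNormalization]]: normalizing audio loudness to protect transcription accuracy
- [[PIIRedaction]]: redacting personally identifiable information after transcription
- [[StructuredTranscriptJSON]]: the final output format for transcription results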

## Related Entities
- [[OpenAIWhisper]]: the base model that Faster-Whisper builds on
- [[pyannote.audio]]: speaker diarization, used alongside Faster-Whisper

## Related Sources
- [[engineering-voice-ai-integration-engineer]]