| title | type | tags | sources | last_updated |
| --- | --- | --- | --- | --- |
| pyannote.audio | entity | speaker-diarization, open-source, huggingface | engineering-voice-ai-integration-engineer | 2026-05-02 |
Aliases
Definition
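pyannote.audio is an open-source speaker diarization library whose models (pyannote/speaker-diarization-3.1) are distributed through the Hugging Face Hub. It automatically detects "who spoke when" in an audio stream, and is used together with Whisper-style transcription models to produce transcripts with speaker attribution.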
Key Properties
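- Model: `pyannote/speaker-diarization-3.1`
- Access: Hugging Face token (you must first request access and accept the pyannote model's user conditions)
- Hardware: GPU (CUDA) recommended; runs on CPU, but more slowly
- Input: audio files in common formats (WAV, MP3, FLAC, etc.)
- Output: a list of speaker segments of the form `[{start, end, speaker}]`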
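
The properties above can be tied together in a short sketch. It assumes the pyannote.audio 3.x API (`Pipeline.from_pretrained` with a `use_auth_token` keyword, which may be named differently in other releases) and that the pipeline returns an annotation object exposing `itertracks(yield_label=True)`. The helper names `annotation_to_segments` and `diarize` are hypothetical and chosen only for illustration; the conversion step is what produces the `[{start, end, speaker}]` shape listed above.

```python
from typing import Any, Dict, List, Optional


def annotation_to_segments(annotation: Any) -> List[Dict[str, Any]]:
    """Flatten an annotation-like object into [{start, end, speaker}] dicts.

    Assumes the object yields (turn, track, label) triples from
    itertracks(yield_label=True), where turn has .start and .end in seconds.
    """
    segments = []
    for turn, _track, speaker in annotation.itertracks(yield_label=True):
        segments.append(
            {
                "start": float(turn.start),
                "end": float(turn.end),
                "speaker": str(speaker),
            }
        )
    return segments


def diarize(
    audio_path: str, hf_token: str, device: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Run the diarization pipeline on one file (hypothetical wrapper)."""
    # Imported lazily so annotation_to_segments works without pyannote installed.
    import torch
    from pyannote.audio import Pipeline

    # The token must belong to an account that accepted the model's conditions.
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=hf_token,  # assumption: 3.x keyword name
    )

    # Prefer CUDA when available; CPU works but is slower.
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    pipeline.to(torch.device(device))

    return annotation_to_segments(pipeline(audio_path))
```

A call such as `diarize("meeting.wav", hf_token="...")` would then return the segment list; the file name here is a placeholder.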
Usage
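
One common use, following the definition above, is attaching speaker labels to a transcript. The sketch below is a minimal illustration rather than a prescribed method: it assumes transcript segments are dicts with `start`, `end`, and `text` keys (the shape Whisper-style models typically emit) and diarization segments follow the `[{start, end, speaker}]` form. Each transcript segment is given the speaker with the largest total time overlap. The function names and the `UNKNOWN` fallback label are hypothetical.

```python
from typing import Any, Dict, List


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    transcript: List[Dict[str, Any]],
    diarization: List[Dict[str, Any]],
    unknown: str = "UNKNOWN",
) -> List[Dict[str, Any]]:
    """Label each transcript segment with its most-overlapping speaker."""
    labeled = []
    for seg in transcript:
        # Accumulate overlap per speaker, since one speaker may have many turns.
        totals: Dict[str, float] = {}
        for turn in diarization:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + shared
        # Fall back to a placeholder when no diarization turn overlaps.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append({**seg, "speaker": speaker})
    return labeled


if __name__ == "__main__":
    transcript = [
        {"start": 0.0, "end": 2.0, "text": "Hello."},
        {"start": 2.0, "end": 4.0, "text": "Hi there."},
    ]
    diarization = [
        {"start": 0.0, "end": 2.5, "speaker": "SPEAKER_00"},
        {"start": 2.5, "end": 5.0, "speaker": "SPEAKER_01"},
    ]
    for row in assign_speakers(transcript, diarization):
        print(f'{row["speaker"]}: {row["text"]}')
```

Maximum overlap is a simple heuristic; segments that span a speaker change are attributed entirely to the dominant speaker, so word-level timestamps would be needed for finer attribution.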
Connections
Sources