---
title: "pyannote.audio"
type: entity
tags: ["speaker-diarization", "open-source", "huggingface"]
sources: ["engineering-voice-ai-integration-engineer"]
last_updated: 2026-05-02
---

## Aliases
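- pyannote
- pyannote.audio

## Definition

pyannote.audio is an open-source speaker diarization library whose pretrained models (for example `pyannote/speaker-diarization-3.1`) are distributed through the Hugging Face Hub. It automatically detects "who spoke when" in an audio recording, and it is typically paired with Whisper-style transcription models to produce transcripts with speaker attribution.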
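
## Key Properties

- **Model**: `pyannote/speaker-diarization-3.1`
- **Access**: Hugging Face token (you must first accept the pyannote model's user conditions on the Hub)
- **Hardware**: GPU (CUDA) recommended; runs on CPU, but more slowly
- **Input**: audio files in common formats (WAV, MP3, FLAC, etc.); which formats decode depends on the installed audio backend
- **Output**: speaker segments, conceptually a list of `[{start, end, speaker}]` records; the pipeline returns an annotation object that is iterated with `itertracks`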
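
## Usage

```python
from pyannote.audio import Pipeline

hf_token = "YOUR_HF_TOKEN"   # placeholder: your Hugging Face access token
audio_path = "audio.wav"     # placeholder: path to the audio file

# Download the pretrained pipeline from the Hugging Face Hub
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=hf_token,
)

# num_speakers is optional; omit it to let the pipeline estimate the count
diarization = pipeline(audio_path, num_speakers=2)

# Each track yields a time segment (turn) and its speaker label
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: [{turn.start:.1f}s - {turn.end:.1f}s]")
```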
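The speaker turns can be flattened into the `[{start, end, speaker}]` shape and then merged with transcript segments. The following is a minimal sketch, not pyannote's or Whisper's own merging logic: the helper names `to_segments` and `assign_speakers` are hypothetical, the transcript is assumed to be a list of dicts with `start`, `end`, and `text` keys (the shape Whisper-style tools commonly return), and each transcript segment is simply given the speaker with the largest total time overlap. The sample turns and transcript are made-up values for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Turn = Tuple[float, float, str]  # (start, end, speaker)


def to_segments(turns: Iterable[Turn]) -> List[Dict]:
    """Convert (start, end, speaker) tuples into [{start, end, speaker}] records."""
    return [{"start": s, "end": e, "speaker": spk} for s, e, spk in turns]


def assign_speakers(transcript: List[Dict], segments: List[Dict]) -> List[Dict]:
    """Label each transcript segment with the speaker that overlaps it longest."""
    labeled = []
    for seg in transcript:
        overlap_by_speaker: Dict[str, float] = {}
        for d in segments:
            # Length of the intersection of the two time intervals (<= 0 if disjoint)
            overlap = min(seg["end"], d["end"]) - max(seg["start"], d["start"])
            if overlap > 0:
                total = overlap_by_speaker.get(d["speaker"], 0.0) + overlap
                overlap_by_speaker[d["speaker"]] = total
        speaker = None  # stays None when no speaker turn overlaps the segment
        if overlap_by_speaker:
            speaker = max(overlap_by_speaker, key=overlap_by_speaker.get)
        labeled.append({**seg, "speaker": speaker})
    return labeled


# With pyannote, the turns would come from the pipeline output, e.g.:
#   turns = [(t.start, t.end, spk)
#            for t, _, spk in diarization.itertracks(yield_label=True)]
# Illustrative values are used here so the sketch runs on its own.
turns = [(0.0, 4.0, "SPEAKER_00"), (4.0, 9.0, "SPEAKER_01")]
transcript = [
    {"start": 0.5, "end": 3.5, "text": "Hello there."},
    {"start": 3.8, "end": 8.0, "text": "Hi, how are you?"},
]

labeled = assign_speakers(transcript, to_segments(turns))
for seg in labeled:
    print(f'{seg["speaker"]}: {seg["text"]}')
```

Assigning by largest overlap is the simplest choice; it ignores overlapping speech and can mislabel a transcript segment that spans a speaker change, so word-level timestamps may be preferable when they are available.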
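
## Connections
- [[OpenAIWhisper]] / [[FasterWhisper]]: companion transcription tools; diarization results are merged with the transcript segments
- [[SpeakerDiarization]]: pyannote.audio serves as the reference implementation of this concept
- [[AssemblyAI]]: cloud ASR alternative (built-in speaker diarization)
- [[Deepgram]]: another cloud ASR alternative (built-in speaker diarization)

## Sources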
- [[engineering-voice-ai-integration-engineer]]