---
title: "pyannote.audio"
type: entity
tags: ["speaker-diarization", "open-source", "huggingface"]
sources: ["engineering-voice-ai-integration-engineer"]
last_updated: 2026-05-02
---
## Aliases
- pyannote
- pyannote.audio
## Definition
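pyannote.audio is an open-source speaker diarization toolkit whose models are distributed through the Hugging Face Hub (`pyannote/speaker-diarization-3.1`). It automatically detects "who spoke when" in an audio stream and is used alongside Whisper-style transcription models to produce speaker-attributed transcripts.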
## Key Properties
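- **Model**: `pyannote/speaker-diarization-3.1`
- **Access**: Hugging Face token (requires requesting access and accepting the pyannote model's user conditions)
- **Hardware**: GPU (CUDA) recommended; runs on CPU, but more slowly
- **Input**: audio in common formats (WAV/MP3/FLAC, etc.)
- **Output**: a list of speaker segments in the form `[{start, end, speaker}]`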
## Usage
```python
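import os

from pyannote.audio import Pipeline

hf_token = os.environ["HF_TOKEN"]  # assumed env var holding your Hugging Face token
audio_path = "audio.wav"           # placeholder path to the audio file

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=hf_token,
)
diarization = pipeline(audio_path, num_speakers=2)  # num_speakers is optional
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: [{turn.start:.1f}s - {turn.end:.1f}s]")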
```
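To get the `[{start, end, speaker}]` shape listed under Key Properties rather than printed lines, the tracks can be collected into plain dictionaries. This is a minimal sketch, assuming `diarization` is the object returned by the pipeline call above; the helper name `to_segments` is hypothetical and not part of pyannote.audio.

```python
from typing import Any, Dict, List


def to_segments(diarization: Any) -> List[Dict[str, Any]]:
    """Flatten a diarization result into [{start, end, speaker}] dicts.

    Illustrative helper, not a pyannote.audio API. It relies only on
    `itertracks(yield_label=True)` yielding (segment, track, label)
    triples, where each segment exposes `.start` and `.end` in seconds.
    """
    segments = []
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        segments.append(
            {"start": float(turn.start), "end": float(turn.end), "speaker": speaker}
        )
    # Sort by start time so downstream merging can assume chronological order.
    segments.sort(key=lambda s: (s["start"], s["end"]))
    return segments
```

Calling `to_segments(diarization)` after the pipeline run should then give a list that can be serialized to JSON or passed to a transcript-merging step.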
## Connections
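- [[OpenAIWhisper]] / [[FasterWhisper]]: companion transcription tools; diarization results are merged with the transcript segments
- [[SpeakerDiarization]]: pyannote.audio is the reference implementation of this concept
- [[AssemblyAI]]: cloud ASR alternative (built-in speaker diarization)
- [[Deepgram]]: another cloud ASR alternative (built-in speaker diarization)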
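
Merging diarization output with transcript segments can be sketched as follows. This assumes the transcriber returns segments with `start`, `end`, and `text` fields (as Whisper-style tools typically do) and that speaker turns are already in `[{start, end, speaker}]` form. The function names and the maximum-overlap rule are illustrative choices, not something pyannote.audio or the source prescribes.

```python
from typing import Dict, List, Optional


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript: List[Dict], turns: List[Dict]) -> List[Dict]:
    """Label each transcript segment with the speaker that overlaps it most."""
    labeled = []
    for seg in transcript:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        # Segments with no overlapping turn keep speaker=None.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


# Illustrative data only, not real model output.
transcript = [
    {"start": 0.0, "end": 2.0, "text": "Hello there."},
    {"start": 2.0, "end": 4.5, "text": "Hi, how are you?"},
]
turns = [
    {"start": 0.0, "end": 2.1, "speaker": "SPEAKER_00"},
    {"start": 2.1, "end": 4.6, "speaker": "SPEAKER_01"},
]
for seg in assign_speakers(transcript, turns):
    print(f'{seg["speaker"]}: {seg["text"]}')
```

Assigning by maximum overlap keeps the merge simple, but a transcript segment that spans a speaker change is attributed entirely to one speaker; word-level timestamps would be needed to split it.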
## Sources
- [[engineering-voice-ai-integration-engineer]]