nexus/wiki/entities/pyannote.audio.md

---
title: "pyannote.audio"
type: entity
tags: ["speaker-diarization", "open-source", "huggingface"]
sources: ["engineering-voice-ai-integration-engineer"]
last_updated: 2026-05-02
---

## Aliases
- pyannote
- pyannote.audio

## Definition

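pyannote.audio is an open-source speaker diarization library whose pretrained models are distributed through the Hugging Face Hub (for example `pyannote/speaker-diarization-3.1`). It automatically detects "who spoke when" in audio, and is typically paired with Whisper-style transcription models to produce transcripts with speaker attribution.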

## Key Properties

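- **Model**: `pyannote/speaker-diarization-3.1`
- **Access**: Hugging Face access token (requires requesting access and accepting the pyannote model's user conditions)
- **Hardware**: GPU (CUDA) recommended; runs on CPU, but more slowly
- **Input**: audio files in common formats (WAV, MP3, FLAC, etc.), subject to what the installed audio backend can decode
- **Output**: speaker turns that map to a `[{start, end, speaker}]` list (the pipeline returns an annotation object that is iterated with `itertracks`)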
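
The `[{start, end, speaker}]` shape listed above is not what the pipeline hands back directly, so a small conversion step is usually needed. The following is a minimal sketch, assuming the diarization result exposes `itertracks(yield_label=True)` as a `pyannote.core.Annotation` does; the helper name `annotation_to_segments` is hypothetical and not part of pyannote.audio.

```python
from typing import Any, Dict, List


def annotation_to_segments(diarization: Any) -> List[Dict[str, Any]]:
    """Flatten a diarization result into [{start, end, speaker}] dicts.

    Assumption: `diarization` provides itertracks(yield_label=True), yielding
    (turn, track, speaker) triples where `turn` has .start and .end in seconds.
    """
    segments = []
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        segments.append(
            {"start": float(turn.start), "end": float(turn.end), "speaker": speaker}
        )
    # Sort chronologically so downstream code can scan the turns in order.
    segments.sort(key=lambda s: (s["start"], s["end"]))
    return segments
```

Plain dicts are convenient here because they serialize to JSON without further work and do not require downstream consumers to import pyannote.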

## Usage

```python
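import os

from pyannote.audio import Pipeline

hf_token = os.environ["HF_TOKEN"]  # access token; the variable name is an assumption
audio_path = "audio.wav"           # placeholder path to the recording

# Load the pretrained pipeline from the Hugging Face Hub.
# Newer pyannote.audio releases may expect `token=` instead of `use_auth_token=`.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=hf_token,
)
diarization = pipeline(audio_path, num_speakers=2)  # num_speakers is optional
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: [{turn.start:.1f}s - {turn.end:.1f}s]")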
```
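
Pairing the diarization with a Whisper-style transcript comes down to deciding which speaker each transcript segment belongs to. The sketch below shows one common heuristic, picking the speaker turn with the largest time overlap. It is not a pyannote.audio API: the function names are hypothetical, and it assumes both inputs are plain dicts with `start` and `end` in seconds (transcript segments also carrying `text`, speaker turns carrying `speaker`).

```python
from typing import Any, Dict, List


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    transcript: List[Dict[str, Any]],
    turns: List[Dict[str, Any]],
    unknown: str = "UNKNOWN",
) -> List[Dict[str, Any]]:
    """Label each transcript segment with the speaker it overlaps most.

    Segments that overlap no speaker turn are labelled with `unknown`.
    """
    merged = []
    for seg in transcript:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        # Copy the segment so the caller's transcript is left unmodified.
        merged.append({**seg, "speaker": best_speaker})
    return merged
```

Maximum overlap is simple but coarse: a transcript segment that spans a speaker change is attributed entirely to one speaker. Word-level timestamps, where the transcription model provides them, allow a finer split.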

## Connections
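- [[OpenAIWhisper]] / [[FasterWhisper]]: companion transcription tools; diarization results are merged with their transcript segments
- [[SpeakerDiarization]]: pyannote.audio is a reference implementation of this concept
- [[AssemblyAI]]: cloud ASR alternative (built-in speaker diarization)
- [[Deepgram]]: another cloud ASR alternative (built-in speaker diarization)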

## Sources
- [[engineering-voice-ai-integration-engineer]]