Update nexus wiki content

2026-05-03 05:42:06 +08:00
parent 90f3811b83
commit 111bc65b7b
707 changed files with 32306 additions and 7289 deletions

@@ -0,0 +1,45 @@
---
title: "pyannote.audio"
type: entity
tags: ["speaker-diarization", "open-source", "huggingface"]
sources: ["engineering-voice-ai-integration-engineer"]
last_updated: 2026-05-02
---
## Aliases
- pyannote
- pyannote.audio
## Definition
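pyannote.audio is an open-source toolkit for speaker diarization. Its pretrained models are distributed through the Hugging Face Hub (for example `pyannote/speaker-diarization-3.1`). It automatically detects "who spoke when" in an audio stream and is typically paired with Whisper-style transcription models to produce transcripts with speaker attribution.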
## Key Properties
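- **Model**: `pyannote/speaker-diarization-3.1`
- **Access**: Hugging Face access token (you must first request access and accept the pyannote model's user conditions)
- **Hardware**: GPU (CUDA) recommended; runs on CPU, but more slowly
- **Input**: audio files in common formats (WAV, MP3, FLAC, etc.)
- **Output**: a list of speaker segments, conceptually `[{start, end, speaker}]`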
## Usage
```python
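import os

from pyannote.audio import Pipeline

# Placeholders: adjust the environment variable name and file path to your setup.
hf_token = os.environ["HF_TOKEN"]  # Hugging Face access token
audio_path = "audio.wav"           # audio file to diarize

# Load the pretrained diarization pipeline from the Hugging Face Hub.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=hf_token,
)
diarization = pipeline(audio_path, num_speakers=2)  # num_speakers is optional
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: [{turn.start:.1f}s - {turn.end:.1f}s]")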
```
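The `[{start, end, speaker}]` shape listed under Key Properties is not necessarily what the pipeline object returns directly. A minimal sketch of one way to flatten the result into that shape follows. The helper name `annotation_to_segments` is hypothetical, and it assumes only that the result exposes `itertracks(yield_label=True)` yielding `(turn, track, speaker)` tuples, as in the loop above.

```python
from typing import Any, Dict, List


def annotation_to_segments(diarization: Any) -> List[Dict[str, Any]]:
    """Flatten a diarization result into [{start, end, speaker}] dicts.

    Hypothetical helper: assumes `diarization.itertracks(yield_label=True)`
    yields (turn, track, speaker) tuples where `turn` has `.start` and
    `.end` attributes expressed in seconds.
    """
    segments = []
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        segments.append(
            {"start": float(turn.start), "end": float(turn.end), "speaker": speaker}
        )
    # Sort chronologically so downstream code can rely on the order.
    segments.sort(key=lambda s: (s["start"], s["end"]))
    return segments
```

Plain dicts like these serialize directly to JSON, which may be convenient when handing the segments to a separate transcription step.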
## Connections
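- [[OpenAIWhisper]] / [[FasterWhisper]]: companion transcription tools; diarization results are merged with the transcribed segments
- [[SpeakerDiarization]]: pyannote.audio is the reference implementation of this concept
- [[AssemblyAI]]: cloud ASR alternative (built-in speaker diarization)
- [[Deepgram]]: another cloud ASR alternative (built-in speaker diarization)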
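Merging diarization output with transcript segments can be sketched as follows. This is one simple heuristic, not the method of any particular library: each transcript segment takes the speaker of the single diarization turn it overlaps most. The function names, the dict fields (`start`, `end`, `text`, `speaker`), and the sample data are all illustrative assumptions.

```python
from typing import Dict, List, Optional


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    transcript: List[Dict], turns: List[Dict], unknown: Optional[str] = None
) -> List[Dict]:
    """Label each transcript segment with the speaker of its best-overlapping turn."""
    labeled = []
    for seg in transcript:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            ov = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if ov > best_overlap:
                best_speaker, best_overlap = turn["speaker"], ov
        # Segments that overlap no turn keep the `unknown` label.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


# Illustrative data: transcript segments and diarization turns, in seconds.
transcript = [
    {"start": 0.0, "end": 2.0, "text": "Hello there."},
    {"start": 2.0, "end": 5.0, "text": "Hi, how are you?"},
]
turns = [
    {"start": 0.0, "end": 2.2, "speaker": "SPEAKER_00"},
    {"start": 2.2, "end": 5.0, "speaker": "SPEAKER_01"},
]
for seg in assign_speakers(transcript, turns):
    print(f'{seg["speaker"]}: {seg["text"]}')
```

Segment-level assignment like this can mislabel a segment that spans a speaker change. Word-level timestamps, where the transcription tool provides them, would allow a finer split.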
## Sources
- [[engineering-voice-ai-integration-engineer]]