Update nexus wiki content

2026-05-03 05:42:06 +08:00
parent 90f3811b83
commit 111bc65b7b
707 changed files with 32306 additions and 7289 deletions
--- a/wiki/entities/pyannote.audio.md
+++ b/wiki/entities/pyannote.audio.md
@@ -0,0 +1,45 @@
+---
+title: "pyannote.audio"
+type: entity
+tags: ["speaker-diarization", "open-source", "huggingface"]
+sources: ["engineering-voice-ai-integration-engineer"]
+last_updated: 2026-05-02
+---
+
+## Aliases
+- pyannote
+- pyannote.audio
+
+## Definition
+
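+pyannote.audio is an open-source speaker diarization library whose models are distributed through the Hugging Face Hub (`pyannote/speaker-diarization-3.1`). It automatically detects "who spoke when" in an audio stream, and it is used alongside Whisper-style transcription models to produce transcripts with speaker attribution.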
+
+## Key Properties
+
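+- **Model**: `pyannote/speaker-diarization-3.1`
+- **Access**: Hugging Face token (you must first request access and accept the pyannote model's user conditions)
+- **Hardware**: GPU recommended (CUDA); runs on CPU, but more slowly
+- **Input**: audio in any common format (WAV/MP3/FLAC, etc.)
+- **Output**: a list of speaker segments of the form `[{start, end, speaker}]`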
+
+## Usage
+
+```python
+from pyannote.audio import Pipeline
+pipeline = Pipeline.from_pretrained(
+    "pyannote/speaker-diarization-3.1",
+    use_auth_token=hf_token  # hf_token: your Hugging Face access token (str)
+)
+diarization = pipeline(audio_path, num_speakers=2)  # audio_path: path to the audio file; num_speakers is optional
+for turn, _, speaker in diarization.itertracks(yield_label=True):
+    print(f"{speaker}: [{turn.start:.1f}s - {turn.end:.1f}s]")
+```
+
+## Connections
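+- [[OpenAIWhisper]] / [[FasterWhisper]]: companion transcription tools; diarization results are merged with transcript segments
+- [[SpeakerDiarization]]: pyannote.audio is the reference implementation of this concept
+- [[AssemblyAI]]: cloud ASR alternative (built-in speaker diarization)
+- [[Deepgram]]: another cloud ASR alternative (built-in speaker diarization)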
+
+## Sources
+- [[engineering-voice-ai-integration-engineer]]
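
The page above says diarization output is a list of `{start, end, speaker}` segments and that it is merged with transcript segments, but it does not show the merge. The following is a minimal sketch of one way to do it, not the method used by pyannote.audio or Whisper. It assumes transcript segments are dicts with `start`, `end`, and `text` keys, and it assigns each one the speaker whose turns overlap it for the longest total time. The helper names `turns_to_segments`, `overlap`, and `assign_speakers` are hypothetical and exist only for this illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed shape of one diarization turn: (start_seconds, end_seconds, speaker_label)
Turn = Tuple[float, float, str]


def turns_to_segments(turns: Iterable[Turn]) -> List[Dict]:
    """Convert (start, end, speaker) tuples into [{start, end, speaker}] dicts."""
    return [{"start": s, "end": e, "speaker": spk} for s, e, spk in turns]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript: List[Dict], speaker_segments: List[Dict]) -> List[Dict]:
    """Label each transcript segment with the speaker that overlaps it the most.

    Segments that overlap no speaker turn get speaker=None.
    """
    labeled = []
    for seg in transcript:
        totals: Dict[str, float] = {}
        for sp in speaker_segments:
            dur = overlap(seg["start"], seg["end"], sp["start"], sp["end"])
            if dur > 0:
                totals[sp["speaker"]] = totals.get(sp["speaker"], 0.0) + dur
        speaker: Optional[str] = max(totals, key=totals.get) if totals else None
        labeled.append({**seg, "speaker": speaker})
    return labeled
```

With the Usage snippet above, the turns could be collected as `(turn.start, turn.end, speaker)` tuples from `diarization.itertracks(yield_label=True)` and passed to `turns_to_segments`. Maximum-overlap assignment is a simple heuristic; a transcript segment that spans a speaker change still receives a single label.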