Update nexus wiki content
45
wiki/entities/pyannote.audio.md
Normal file
@@ -0,0 +1,45 @@
---
title: "pyannote.audio"
type: entity
tags: ["speaker-diarization", "open-source", "huggingface"]
sources: ["engineering-voice-ai-integration-engineer"]
last_updated: 2026-05-02
---

## Aliases
- pyannote
- pyannote.audio

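## Definition

pyannote.audio is an open-source speaker diarization library whose models are distributed through the Hugging Face Hub (for example `pyannote/speaker-diarization-3.1`). It automatically detects "who spoke when" in an audio stream and is typically paired with Whisper-family transcription models to produce transcripts with speaker attribution.
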
## Key Properties

- **Model**: `pyannote/speaker-diarization-3.1`
- **Access**: Hugging Face token (you must first request access and accept the pyannote model terms)
- **Hardware**: GPU (CUDA) recommended; runs on CPU, but more slowly
- **Input**: audio files in common formats (WAV, MP3, FLAC, etc.)
- **Output**: a list of speaker segments in the form `[{start, end, speaker}]`

## Usage

```python
from pyannote.audio import Pipeline

# hf_token: Hugging Face access token; audio_path: path to the audio file.
# Both are expected to be defined by the caller.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=hf_token
)
diarization = pipeline(audio_path, num_speakers=2)  # num_speakers is optional
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: [{turn.start:.1f}s - {turn.end:.1f}s]")
```

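The `[{start, end, speaker}]` output listed under Key Properties is not something the loop above produces directly; it has to be built from the `(turn, track, speaker)` triples that `itertracks(yield_label=True)` yields. A minimal sketch of that conversion follows. The helper name `to_segments` and the `Turn` stand-in are hypothetical, introduced here only so the example runs without downloading a model.

```python
from collections import namedtuple


def to_segments(tracks):
    """Convert (turn, track, speaker) triples into a list of plain dicts.

    `tracks` is any iterable shaped like the one returned by
    diarization.itertracks(yield_label=True): each turn exposes
    .start and .end in seconds.
    """
    return [
        {"start": round(turn.start, 3), "end": round(turn.end, 3), "speaker": speaker}
        for turn, _, speaker in tracks
    ]


# Hypothetical stand-in for a pyannote segment, used only for this demo.
Turn = namedtuple("Turn", ["start", "end"])

example_tracks = [
    (Turn(0.0, 2.5), "A", "SPEAKER_00"),
    (Turn(2.5, 4.0), "B", "SPEAKER_01"),
]
segments = to_segments(example_tracks)
print(segments)
```

With a real pipeline result, the call would presumably be `to_segments(diarization.itertracks(yield_label=True))`, assuming the 3.1 pipeline behaviour shown in the usage snippet.
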
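## Connections
- [[OpenAIWhisper]] / [[FasterWhisper]]: companion transcription tools; diarization results are merged with the transcribed segments
- [[SpeakerDiarization]]: pyannote.audio is the reference implementation of this concept
- [[AssemblyAI]]: cloud ASR alternative (built-in speaker diarization)
- [[Deepgram]]: another cloud ASR alternative (built-in speaker diarization)
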
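The merge of diarization results with transcribed segments mentioned above can be sketched as a maximum-overlap assignment: each transcript segment takes the speaker whose turn overlaps it the longest. This is one common heuristic, not necessarily the approach used in the source. The function name `assign_speakers` is hypothetical, and the transcript dicts are assumed to carry `start`, `end`, and `text` keys, as Whisper-style segment output does.

```python
def assign_speakers(transcript, diarization_segments):
    """Label each transcript segment with the speaker of maximum time overlap.

    Both inputs are lists of dicts with "start" and "end" in seconds.
    Segments that overlap no speaker turn get speaker=None.
    """
    labelled = []
    for seg in transcript:
        best_speaker, best_overlap = None, 0.0
        for turn in diarization_segments:
            # Length of the intersection of the two intervals (negative if disjoint).
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        labelled.append({**seg, "speaker": best_speaker})
    return labelled


# Illustrative data only; not taken from a real recording.
turns = [
    {"start": 0.0, "end": 2.5, "speaker": "SPEAKER_00"},
    {"start": 2.5, "end": 6.0, "speaker": "SPEAKER_01"},
]
transcript = [
    {"start": 0.2, "end": 2.0, "text": "Hello there."},
    {"start": 2.3, "end": 5.0, "text": "Hi, how are you?"},
    {"start": 7.0, "end": 8.0, "text": "(outside any turn)"},
]
labelled = assign_speakers(transcript, turns)
for seg in labelled:
    print(seg["speaker"], seg["text"])
```

The nested loop is quadratic in the number of segments, which is usually acceptable for a single recording; a sorted two-pointer sweep would be the natural refinement for long audio.
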
## Sources
- [[engineering-voice-ai-integration-engineer]]