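---
title: "pyannote.audio"
type: entity
tags: ["speaker-diarization", "open-source", "huggingface"]
sources: ["engineering-voice-ai-integration-engineer"]
last_updated: 2026-05-02
---

## Aliases

- pyannote
- pyannote.audio

## Definition

pyannote.audio is an open-source speaker diarization library that distributes its pretrained pipelines through the Hugging Face Hub (e.g. `pyannote/speaker-diarization-3.1`). It automatically determines "who spoke when" in an audio stream. Paired with a Whisper-style transcription model, it produces speaker-attributed transcripts.

## Key Properties

- **Model**: `pyannote/speaker-diarization-3.1`
- **Access**: a Hugging Face access token; you must first accept the user conditions for the pyannote models on Hugging Face
- **Hardware**: GPU (CUDA) recommended; runs on CPU, but more slowly
- **Input**: audio files in common formats (WAV/MP3/FLAC, etc.); the 3.1 pipeline downmixes multi-channel audio to mono and resamples to 16 kHz on load
- **Output**: a `pyannote.core.Annotation` of speaker turns, which can be flattened into a list of `{start, end, speaker}` segments

## Usage

```python
import torch
from pyannote.audio import Pipeline

hf_token = "hf_..."         # placeholder: your Hugging Face access token
audio_path = "meeting.wav"  # placeholder: path to the input audio file

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
)
# Move the pipeline to GPU when one is available (CPU works, but is slower)
if torch.cuda.is_available():
    pipeline.to(torch.device("cuda"))

# num_speakers is optional; min_speakers / max_speakers can bound it instead
diarization = pipeline(audio_path, num_speakers=2)

# diarization is a pyannote.core.Annotation; iterate over its speaker turns
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: [{turn.start:.1f}s - {turn.end:.1f}s]")
```

## Connections

- [[OpenAIWhisper]] / [[FasterWhisper]]: companion transcription tools; diarization results are merged with their transcript segments
- [[SpeakerDiarization]]: pyannote.audio is the reference implementation of this concept
- [[AssemblyAI]]: cloud ASR alternative with built-in speaker diarization
- [[Deepgram]]: another cloud ASR alternative with built-in speaker diarization

## Sources

- [[engineering-voice-ai-integration-engineer]]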
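
## Merging with Transcripts

The Connections entry for Whisper notes that diarization output is merged with transcript segments. A minimal sketch of one common heuristic, assuming both sides have already been reduced to plain tuples: each transcript segment is labeled with the speaker whose turns overlap it the longest in total. The `overlap` and `assign_speakers` functions and the tuple layouts are hypothetical illustrations, not part of the pyannote.audio API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical plain-tuple layouts (not pyannote or Whisper types):
#   a diarization turn is (start, end, speaker), e.g. collected from
#   Annotation.itertracks(yield_label=True);
#   a transcript segment is (start, end, text), e.g. from Whisper-style output.
Turn = Tuple[float, float, str]
Segment = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn]
) -> List[Tuple[float, float, Optional[str], str]]:
    """Label each transcript segment with the speaker whose turns overlap it most.

    Segments that overlap no turn at all get speaker None.
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        # Sum overlap per speaker: one speaker may have several short turns
        # inside a single transcript segment.
        totals: Dict[str, float] = {}
        for turn_start, turn_end, speaker in turns:
            ov = overlap(seg_start, seg_end, turn_start, turn_end)
            if ov > 0:
                totals[speaker] = totals.get(speaker, 0.0) + ov
        best = max(totals, key=lambda s: totals[s]) if totals else None
        labeled.append((seg_start, seg_end, best, text))
    return labeled


if __name__ == "__main__":
    turns = [(0.0, 4.0, "SPEAKER_00"), (4.0, 9.0, "SPEAKER_01")]
    segments = [
        (0.2, 3.5, "Hello there."),
        (3.8, 8.0, "Hi, how are you?"),
        (10.0, 11.0, "Bye."),
    ]
    for start, end, speaker, text in assign_speakers(segments, turns):
        print(f"[{start:.1f}-{end:.1f}] {speaker}: {text}")
```

With this heuristic, a segment that straddles a speaker change goes to the dominant speaker; when the transcriber provides word-level timestamps, assigning speakers per word is a finer-grained alternative.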