nexus/wiki/entities/pyannote.audio.md at b40abbcd473a7093d8261e212e3d6de97c1e516a

weishen 111bc65b7b Update nexus wiki content

2026-05-03 05:42:12 +08:00

Aliases

pyannote
pyannote.audio

Definition

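pyannote.audio is an open-source speaker diarization library whose pretrained models are distributed through the Hugging Face Hub (for example, pyannote/speaker-diarization-3.1). It automatically detects "who spoke when" in an audio stream and is typically paired with Whisper-style transcription models to produce transcripts with speaker attribution.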

Key Properties

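Model: pyannote/speaker-diarization-3.1
Access: Hugging Face token (requires accepting the user conditions of the gated pyannote models)
Hardware: GPU recommended (CUDA); runs on CPU, but more slowly
Input: audio files in common formats (WAV, MP3, FLAC, and others the audio backend can decode)
Output: a pyannote.core.Annotation of speaker turns, which can be flattened into a list of {start, end, speaker} segments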

Usage

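import os
from pyannote.audio import Pipeline

hf_token = os.environ["HF_TOKEN"]  # Hugging Face access token, assumed to be set in the environment
audio_path = "audio.wav"  # placeholder path to the input audio

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=hf_token,
)
diarization = pipeline(audio_path, num_speakers=2)  # num_speakers is optional
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: [{turn.start:.1f}s - {turn.end:.1f}s]")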
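
Downstream code usually wants plain dictionaries rather than the pipeline's result object. The following is a minimal sketch of that conversion. The helper name `annotation_to_segments` is hypothetical, and it assumes only that the result exposes `itertracks(yield_label=True)`, as in the loop above.

```python
from typing import Any, Dict, List


def annotation_to_segments(annotation: Any) -> List[Dict[str, Any]]:
    """Flatten a diarization result into [{start, end, speaker}] dicts.

    `annotation` is assumed to expose itertracks(yield_label=True) and to
    yield (turn, track, speaker) triples where turn has .start and .end.
    """
    segments = []
    for turn, _, speaker in annotation.itertracks(yield_label=True):
        segments.append(
            {"start": float(turn.start), "end": float(turn.end), "speaker": speaker}
        )
    # Sort by start time so later merging steps can assume chronological order.
    segments.sort(key=lambda s: (s["start"], s["end"]))
    return segments
```

Calling `annotation_to_segments(diarization)` after the pipeline run would give a list that is straightforward to serialize as JSON or to merge with transcript segments.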

Connections

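OpenAIWhisper / FasterWhisper: companion transcription tools; diarization results are merged with their transcript segments
SpeakerDiarization: pyannote.audio is a reference implementation of this concept
AssemblyAI: cloud ASR alternative (built-in speaker diarization)
Deepgram: another cloud ASR alternative (built-in speaker diarization)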
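
One simple way to merge diarization output with transcript segments is to give each transcript segment the speaker whose turns overlap it the longest. The sketch below illustrates that heuristic only; it is not the method of any particular library. The function names are hypothetical, and both inputs are assumed to be lists of dicts with `start` and `end` in seconds (plus `text` for transcript segments and `speaker` for turns).

```python
from typing import Any, Dict, List


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    transcript: List[Dict[str, Any]], turns: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Label each transcript segment with the speaker that overlaps it most."""
    labeled = []
    for seg in transcript:
        totals: Dict[str, float] = {}
        for turn in turns:
            dur = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if dur > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + dur
        # Segments with no overlapping turn get speaker=None.
        speaker = max(totals, key=totals.get) if totals else None
        labeled.append({**seg, "speaker": speaker})
    return labeled


# Made-up example data, for illustration only.
transcript = [
    {"start": 0.0, "end": 2.0, "text": "Hello."},
    {"start": 2.0, "end": 4.0, "text": "Hi there."},
]
turns = [
    {"start": 0.0, "end": 2.2, "speaker": "SPEAKER_00"},
    {"start": 2.1, "end": 4.0, "speaker": "SPEAKER_01"},
]
merged = assign_speakers(transcript, turns)
```

Word-level timestamps, where the transcription tool provides them, allow a finer-grained assignment, since a single transcript segment can span a speaker change.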

Sources

engineering-voice-ai-integration-engineer
