nexus/wiki/entities/pyannote.audio.md
2026-05-03 05:42:12 +08:00


title: pyannote.audio
type: entity
tags: speaker-diarization, open-source, huggingface
sources: engineering-voice-ai-integration-engineer
last_updated: 2026-05-02

Aliases

  • pyannote
  • pyannote.audio

Definition

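pyannote.audio is an open-source speaker diarization toolkit whose models (such as pyannote/speaker-diarization-3.1) are distributed through the Hugging Face Hub. It automatically detects "who spoke when" in an audio stream, and it is typically paired with Whisper-style transcription models to produce transcripts with speaker attribution.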
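The pairing with a transcription model can be sketched as follows. This is a minimal illustration, not the method of pyannote.audio or Whisper: it assumes both outputs have already been flattened into plain dicts with start and end times in seconds, and it labels each transcript segment with the speaker whose turns overlap it the most. The function names (`overlap`, `assign_speakers`) and the sample data are hypothetical.

```python
from typing import Dict, List, Optional


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript: List[Dict], turns: List[Dict]) -> List[Dict]:
    """Label each transcript segment with the speaker whose turns overlap it most."""
    labeled = []
    for seg in transcript:
        # Accumulate overlap per speaker, since one speaker may have several turns.
        totals: Dict[str, float] = {}
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + shared
        # Segments that overlap no speaker turn are left unlabeled (None).
        speaker: Optional[str] = max(totals, key=totals.get) if totals else None
        labeled.append({**seg, "speaker": speaker})
    return labeled


# Hypothetical data for illustration only.
transcript = [
    {"start": 0.0, "end": 2.5, "text": "Hello, thanks for joining."},
    {"start": 2.5, "end": 5.0, "text": "Happy to be here."},
    {"start": 9.0, "end": 10.0, "text": "(background noise)"},
]
turns = [
    {"start": 0.0, "end": 2.8, "speaker": "SPEAKER_00"},
    {"start": 2.8, "end": 5.2, "speaker": "SPEAKER_01"},
]

labeled = assign_speakers(transcript, turns)
for seg in labeled:
    print(f'{seg["speaker"]}: {seg["text"]}')
```

Maximum overlap is only one possible rule. Word-level timestamps or a minimum-overlap threshold may work better on recordings with frequent interruptions.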

Key Properties

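  • Model: pyannote/speaker-diarization-3.1
  • Access: Hugging Face token (requires accepting the pyannote model's user conditions on the Hub)
  • Hardware: GPU (CUDA) recommended; runs on CPU, but more slowly
  • Input: audio files in common formats (WAV, MP3, FLAC, etc.)
  • Output: a list of speaker segments of the form [{start, end, speaker}]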

Usage

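import os
from pyannote.audio import Pipeline

# Assumption: the Hugging Face token is stored in the HF_TOKEN environment variable.
hf_token = os.environ["HF_TOKEN"]
audio_path = "audio.wav"  # placeholder path to the input recording

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=hf_token,
)
diarization = pipeline(audio_path, num_speakers=2)  # num_speakers is optional
# Each track yields a time segment (turn) and its speaker label.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: [{turn.start:.1f}s - {turn.end:.1f}s]")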
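To get the [{start, end, speaker}] list described under Key Properties, the same iteration can collect dicts instead of printing. The sketch below is illustrative only: `tracks_to_segments` is a hypothetical helper, and `Turn` is a stand-in for the time segment yielded by `itertracks`, so the example runs without downloading a model. With a real result, `diarization.itertracks(yield_label=True)` would be passed in place of `fake_tracks`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Turn:
    """Stand-in for the time segment yielded by itertracks (has .start and .end)."""
    start: float
    end: float


def tracks_to_segments(tracks: Iterable[Tuple]) -> List[Dict]:
    """Convert (turn, track, speaker) triples into [{start, end, speaker}] dicts."""
    return [
        {"start": float(turn.start), "end": float(turn.end), "speaker": speaker}
        for turn, _, speaker in tracks
    ]


# Hypothetical triples standing in for diarization.itertracks(yield_label=True).
fake_tracks = [
    (Turn(0.0, 2.8), "A", "SPEAKER_00"),
    (Turn(2.8, 5.2), "B", "SPEAKER_01"),
]
segments = tracks_to_segments(fake_tracks)
print(segments)
```

Plain dicts like these are straightforward to serialize to JSON or to merge with transcript segments from a separate transcription step.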

Connections

Sources