nexus/wiki/concepts/SpeakerDiarization.md
2026-05-03 05:42:12 +08:00

---
title: "SpeakerDiarization"
type: concept
tags: ["voice-ai", "speech-processing", "speaker-attribution"]
last_updated: 2026-05-02
---
# SpeakerDiarization (speaker diarization)
## Definition
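Speaker diarization is the task of automatically determining "who spoke when" in an audio recording. It clusters voiceprint (speaker-embedding) features to split continuous audio into per-speaker segments, and tags each segment with a speaker label (`SPEAKER_00`, `SPEAKER_01`, and so on).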
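
To make the clustering idea concrete, here is a toy sketch in pure Python. It assumes one embedding vector per fixed-length audio window and uses a simple greedy cosine-similarity threshold, which is a deliberate simplification and not how `pyannote.audio` or any production system clusters speakers. The function names, the `threshold` value, and the `window_sec` parameter are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_windows(embeddings, threshold=0.8):
    """Greedy clustering (illustrative only): attach each window to the most
    similar existing speaker, or open a new speaker if none is similar enough."""
    centroids = []  # running sum of embeddings per speaker (direction is what matters)
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            best = len(centroids) - 1
        else:
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
        labels.append(f"SPEAKER_{best:02d}")
    return labels


def windows_to_segments(labels, window_sec=1.0):
    """Collapse consecutive windows with the same label into {start, end, speaker}."""
    segments = []
    for i, speaker in enumerate(labels):
        end = (i + 1) * window_sec
        if segments and segments[-1]["speaker"] == speaker:
            segments[-1]["end"] = end
        else:
            segments.append({"start": i * window_sec, "end": end, "speaker": speaker})
    return segments
```

Real systems derive the embeddings from a neural speaker model and use more robust clustering; the sketch only shows how "cluster, then label contiguous runs" yields speaker segments.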
## Key Properties
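- **Mainstream tools**: `pyannote.audio` (open source), AssemblyAI (built in), Deepgram (built in)
- **Input**: raw audio (or audio that has already been split into chunks)
- **Output**: a list of speaker segments in the form `[{start, end, speaker}]`
- **Factors affecting accuracy**: whether the number of speakers is known or unknown (knowing it can significantly improve accuracy), audio quality, and overlapping speech
- **Integration with transcription**: diarization results are merged with transcript segments by time overlap, producing `TranscriptSegment` objects (each carrying a speaker label)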
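
The time-overlap merge can be sketched as follows. This is a minimal illustration, assuming speaker segments shaped like `{start, end, speaker}` and transcript segments shaped like `{start, end, text}` with times in seconds. The fields of `TranscriptSegment`, the `assign_speakers` helper, and the `"UNKNOWN"` fallback are assumptions made for the example, not a schema taken from this wiki.

```python
from dataclasses import dataclass


@dataclass
class TranscriptSegment:
    # Hypothetical field layout; the real TranscriptSegment may differ.
    start: float
    end: float
    text: str
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript, speaker_segments, unknown="UNKNOWN"):
    """Give each transcript segment the speaker whose turns overlap it the most."""
    merged = []
    for seg in transcript:
        totals = {}  # speaker label -> total overlapping seconds
        for turn in speaker_segments:
            dur = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if dur > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + dur
        speaker = max(totals, key=totals.get) if totals else unknown
        merged.append(TranscriptSegment(seg["start"], seg["end"], seg["text"], speaker))
    return merged
```

Picking the speaker with the largest total overlap is one simple policy; a transcript segment that straddles a speaker change is attributed entirely to the dominant speaker, which may or may not be acceptable depending on how the transcript is used downstream.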
## Pipeline Role
```
audio → pyannote.audio → speaker segments
                            ↓ merge (time-overlap matching)
        FasterWhisper  → transcript segments
                            ↓
        transcript segments with speaker attribution
```
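
The two upstream stages of this pipeline might be wired up roughly as below. This is a sketch under stated assumptions, not the project's actual code: it assumes the `pyannote.audio` 3.x interface (`Pipeline.from_pretrained`, `itertracks`) and the `faster-whisper` package (`WhisperModel.transcribe`); the model id `pyannote/speaker-diarization-3.1`, the `hf_token` parameter, the `"small"` model size, and all helper names are illustrative choices. Newer library releases may use different argument names or return types.

```python
from typing import Iterable, Optional


def turns_to_segments(tracks: Iterable[tuple]) -> list:
    """Convert (turn, track_id, speaker) triples into [{start, end, speaker}]."""
    return [
        {"start": float(turn.start), "end": float(turn.end), "speaker": speaker}
        for turn, _track_id, speaker in tracks
    ]


def diarize(audio_path: str, hf_token: str, num_speakers: Optional[int] = None) -> list:
    """Run pyannote.audio and return speaker segments (assumes the 3.x API)."""
    from pyannote.audio import Pipeline  # imported lazily; heavy optional dependency

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",  # assumed model id
        use_auth_token=hf_token,
    )
    # Pass the speaker count only when it is known; otherwise let the pipeline estimate it.
    kwargs = {} if num_speakers is None else {"num_speakers": num_speakers}
    annotation = pipeline(audio_path, **kwargs)
    return turns_to_segments(annotation.itertracks(yield_label=True))


def transcribe(audio_path: str, model_size: str = "small") -> list:
    """Run FasterWhisper and return [{start, end, text}] transcript segments."""
    from faster_whisper import WhisperModel  # imported lazily; heavy optional dependency

    model = WhisperModel(model_size)
    segments, _info = model.transcribe(audio_path)
    return [{"start": s.start, "end": s.end, "text": s.text.strip()} for s in segments]
```

The two returned lists are the inputs to the time-overlap merge step in the diagram. Keeping both as plain dictionaries means the merge logic does not depend on either library's own types.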
## Related Concepts
- [[VoiceActivityDetection]]: preprocessing step that runs before diarization
- [[StructuredTranscriptJSON]]: output target for diarization results
- [[LLMHandoff]]: passes the speaker-attributed transcript downstream to an LLM
## Related Entities
- [[pyannote.audio]]: the main open-source diarization tool
- [[AssemblyAI]]: cloud ASR alternative with built-in diarization
- [[Deepgram]]: another cloud ASR option with built-in diarization
## Related Sources
- [[engineering-voice-ai-integration-engineer]]