Update nexus wiki content
43
wiki/concepts/SpeakerDiarization.md
Normal file
@@ -0,0 +1,43 @@
---
title: "SpeakerDiarization"
type: concept
tags: ["voice-ai", "speech-processing", "speaker-attribution"]
last_updated: 2026-05-02
---

# SpeakerDiarization (speaker diarization)

## Definition
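
Speaker diarization is the task of automatically determining "who spoke when" in an audio recording. It clusters voiceprint (speaker embedding) features to split continuous audio into segments that belong to different speakers, and tags each segment with a speaker label (`SPEAKER_00`, `SPEAKER_01`, and so on).
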
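As a rough illustration of the clustering idea in the definition, the sketch below labels a sequence of audio windows by greedily grouping their embeddings with cosine similarity. It is a toy under stated assumptions: the 2-D vectors stand in for real voiceprint embeddings, `label_windows` and the `0.8` threshold are hypothetical, and production tools such as `pyannote.audio` use considerably more elaborate models and clustering.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_windows(embeddings, threshold=0.8):
    """Greedy clustering (toy): assign each window to the first known
    speaker whose reference embedding is similar enough, otherwise
    open a new speaker."""
    references = []  # one reference embedding per discovered speaker
    labels = []
    for emb in embeddings:
        for idx, ref in enumerate(references):
            if cosine(emb, ref) >= threshold:
                labels.append(f"SPEAKER_{idx:02d}")
                break
        else:
            # No existing speaker matched: register a new one.
            references.append(emb)
            labels.append(f"SPEAKER_{len(references) - 1:02d}")
    return labels


# Hypothetical embeddings for five consecutive windows (two voices).
windows = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (1.0, 0.0), (0.2, 0.9)]
labels = label_windows(windows)
```

Consecutive windows that share a label can then be merged into one speaker segment with a start and end time.
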
## Key Properties
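
- **Common tools**: `pyannote.audio` (open source); built-in diarization in AssemblyAI and Deepgram
- **Input**: raw audio (or audio that has already been chunked)
- **Output**: a list of speaker segments in the form `[{start, end, speaker}]`
- **Factors affecting accuracy**: whether the number of speakers is known or unknown (knowing it can significantly improve accuracy), audio quality, and overlapping speech
- **Integration with transcription**: diarization results are merged with transcript segments by time overlap, producing `TranscriptSegment` objects that carry a speaker label
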
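The time-overlap merge described in the last bullet can be sketched as follows. This is a minimal version under assumptions: the fields of `TranscriptSegment` other than `speaker` are guesses, `assign_speakers` and the `"UNKNOWN"` fallback are hypothetical, and each transcript segment is given the speaker with the largest total overlap.

```python
from dataclasses import dataclass


@dataclass
class TranscriptSegment:
    # Field layout is an assumption; the note only says the segment
    # carries a speaker label.
    start: float
    end: float
    text: str
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript, turns, unknown="UNKNOWN"):
    """Label each transcript segment with the speaker whose diarization
    turns overlap it the most; fall back to `unknown` if none overlap."""
    merged = []
    for seg in transcript:
        totals = {}
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else unknown
        merged.append(TranscriptSegment(seg["start"], seg["end"], seg["text"], speaker))
    return merged


# Diarization output in the [{start, end, speaker}] form.
turns = [
    {"start": 0.0, "end": 4.0, "speaker": "SPEAKER_00"},
    {"start": 4.0, "end": 9.0, "speaker": "SPEAKER_01"},
]
# Made-up transcript segments for illustration.
transcript = [
    {"start": 0.5, "end": 3.5, "text": "Hello, thanks for joining."},
    {"start": 3.8, "end": 8.0, "text": "Glad to be here."},
    {"start": 9.5, "end": 10.0, "text": "(inaudible)"},
]
merged = assign_speakers(transcript, turns)
```

Picking the speaker with the largest overlap keeps one label per transcript segment. Splitting a segment at a speaker change would need word-level timestamps, which this sketch does not attempt.
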
## Pipeline Role

```
audio → pyannote.audio → speaker segments
                              ↓ merge (match by time overlap)
        FasterWhisper → transcript segments
                              ↓
        speaker-attributed transcript segments
```

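The two branches of the diagram could be wired up roughly as below. Treat this as a sketch under assumptions: the function names are hypothetical, the pretrained pipeline name and the `use_auth_token` argument match `pyannote.audio` 3.x and may differ in other versions, and the model size is arbitrary. Library imports are deferred into the functions so the module loads even where the packages are not installed.

```python
def tracks_to_turns(tracks):
    """Convert (segment, track, label) triples, as yielded by pyannote's
    itertracks(yield_label=True), into [{start, end, speaker}] dicts."""
    return [
        {"start": seg.start, "end": seg.end, "speaker": label}
        for seg, _track, label in tracks
    ]


def diarize(audio_path, hf_token, num_speakers=None):
    """Run pyannote.audio on a file and return speaker turns."""
    from pyannote.audio import Pipeline  # assumes pyannote.audio 3.x

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",  # assumed model name
        use_auth_token=hf_token,
    )
    # Pass the speaker count only when it is known in advance.
    kwargs = {} if num_speakers is None else {"num_speakers": num_speakers}
    annotation = pipeline(audio_path, **kwargs)
    return tracks_to_turns(annotation.itertracks(yield_label=True))


def transcribe(audio_path, model_size="small"):
    """Run faster-whisper on a file and return timed transcript segments."""
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)
    segments, _info = model.transcribe(audio_path)
    return [
        {"start": s.start, "end": s.end, "text": s.text.strip()}
        for s in segments
    ]
```

Both functions return plain dicts with `start` and `end` in seconds, so their results can be matched by time overlap in a separate merge step.
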
## Related Concepts
- [[VoiceActivityDetection]]: preprocessing step that runs before diarization
- [[StructuredTranscriptJSON]]: output target for diarization results
- [[LLMHandoff]]: speaker-attributed transcript text is passed downstream to an LLM

## Related Entities
- [[pyannote.audio]]: the main open-source diarization tool
- [[AssemblyAI]]: cloud ASR alternative with built-in diarization
- [[Deepgram]]: another cloud ASR option with built-in diarization

## Related Sources
- [[engineering-voice-ai-integration-engineer]]