---
title: "StructuredTranscriptJSON"
type: concept
tags: ["voice-ai", "transcription", "json-schema", "interoperability"]
last_updated: 2026-05-02
---

# StructuredTranscriptJSON (Structured Transcript JSON)

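## Definition

Structured Transcript JSON is the stable-schema JSON format emitted by the transcription pipeline. It contains per-segment timestamps, speaker labels, confidence scores, the full text, and metadata. Design principle: **add fields, never remove or rename them**, because downstream consumers (CMS, LLM agents, CI tools) depend on schema stability.
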
## Schema Design

```json
{
  "schema_version": "1.0",
  "metadata": {
    "source_file": "...",
    "duration": 3600.5,
    "language": "en",
    "transcription_date": "2026-05-02"
  },
  "segments": [
    {
      "index": 0,
      "start": 0.0,
      "end": 5.2,
      "duration": 5.2,
      "speaker": "SPEAKER_00",
      "text": "Hello, welcome to the meeting.",
      "confidence": -0.31
    }
  ],
  "full_text": "Hello, welcome to the meeting...",
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "total_duration": 3600.5
}
```

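A consumer may want to sanity-check a document before using it. The following is a minimal Python sketch of such a check, using only the field names from the example above. The function name `check_transcript`, the tolerance, and the two cross-field rules (that `duration` equals `end - start`, and that every segment speaker appears in `speakers`) are assumptions for illustration, not rules stated by the schema.

```python
import math

# Field names are taken from the example document above.
TOP_LEVEL_FIELDS = ("schema_version", "metadata", "segments",
                    "full_text", "speakers", "total_duration")
SEGMENT_FIELDS = ("index", "start", "end", "duration",
                  "speaker", "text", "confidence")


def check_transcript(doc: dict, tol: float = 1e-3) -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for key in TOP_LEVEL_FIELDS:
        if key not in doc:
            problems.append(f"missing top-level field: {key}")

    known_speakers = set(doc.get("speakers", []))
    for pos, seg in enumerate(doc.get("segments", [])):
        missing = [k for k in SEGMENT_FIELDS if k not in seg]
        if missing:
            problems.append(f"segment {pos}: missing fields {missing}")
            continue
        if seg["end"] < seg["start"]:
            problems.append(f"segment {pos}: ends before it starts")
        # Assumption: duration is meant to equal end - start.
        if not math.isclose(seg["duration"], seg["end"] - seg["start"], abs_tol=tol):
            problems.append(f"segment {pos}: duration does not match end - start")
        # Assumption: every segment speaker appears in the top-level list.
        if seg["speaker"] not in known_speakers:
            problems.append(f"segment {pos}: unknown speaker {seg['speaker']}")
    return problems
```

The check only tests for the presence of known fields and never rejects extra ones, so it stays compatible with documents that add fields later.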
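## Schema Versioning Rules

- **Backward compatible**: adding fields is safe (consumers ignore unknown fields)
- **Breaking change**: removing or renaming a field breaks every consumer, so it requires a major version bump
- **Version declaration**: every document must include a `schema_version` field
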
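On the consumer side, these rules suggest a tolerant reader: pick out only the fields you need, ignore everything else, and refuse documents whose major version you do not understand. The sketch below shows one way to do that in Python. It assumes `schema_version` is a `"MAJOR.MINOR"` string as in the example; `SUPPORTED_MAJOR`, `Segment`, and `read_segments` are hypothetical names.

```python
from dataclasses import dataclass

SUPPORTED_MAJOR = 1  # assumption: this consumer was written against 1.x


@dataclass
class Segment:
    start: float
    end: float
    speaker: str
    text: str


def read_segments(doc: dict) -> list[Segment]:
    """Extract the fields this consumer needs, ignoring any unknown fields."""
    if "schema_version" not in doc:
        raise ValueError("document has no schema_version field")
    # Assumption: the version string has the form "MAJOR.MINOR".
    major = int(str(doc["schema_version"]).split(".")[0])
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported major schema version: {major}")
    # Only named fields are read, so fields added in later minor
    # versions pass through without breaking this consumer.
    return [
        Segment(start=s["start"], end=s["end"], speaker=s["speaker"], text=s["text"])
        for s in doc["segments"]
    ]
```

A document at version `1.3` with extra fields parses normally, while a `2.0` document raises `ValueError` instead of being silently misread.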
## Output Format Variants

| Format | Use case | Contents |
|--------|----------|----------|
| JSON | LLM agents, CMS APIs | Full structure (timestamps, speakers, confidence) |
| SRT | Subtitle files, video embedding | Timestamps + speaker prefix + text |
| VTT | Web subtitles | Timestamps + speaker prefix + text (WebVTT format) |
| TXT | Quick reference | Plain text, no metadata |

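The SRT and VTT variants can be derived from the JSON `segments`. The following Python sketch illustrates the idea under stated assumptions: the speaker prefix is rendered as `SPEAKER: text`, and the helper names `format_timestamp`, `to_srt`, and `to_vtt` are hypothetical. SRT uses a comma before the milliseconds and numbers its cues from 1; WebVTT uses a period and starts with a `WEBVTT` header line.

```python
def format_timestamp(seconds: float, ms_sep: str) -> str:
    """Render seconds as HH:MM:SS<sep>mmm (sep is ',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{ms_sep}{ms:03d}"


def _cue(seg: dict, ms_sep: str) -> str:
    # Assumption: the speaker prefix is written as "SPEAKER: text".
    start = format_timestamp(seg["start"], ms_sep)
    end = format_timestamp(seg["end"], ms_sep)
    return f"{start} --> {end}\n{seg['speaker']}: {seg['text']}"


def to_srt(segments: list[dict]) -> str:
    """SRT: numbered cues separated by blank lines."""
    blocks = [f"{n}\n{_cue(seg, ',')}" for n, seg in enumerate(segments, start=1)]
    return "\n\n".join(blocks) + "\n"


def to_vtt(segments: list[dict]) -> str:
    """WebVTT: a WEBVTT header, then cues separated by blank lines."""
    blocks = ["WEBVTT"] + [_cue(seg, ".") for seg in segments]
    return "\n\n".join(blocks) + "\n"
```

For the example segment, `to_srt` produces a cue spanning `00:00:00,000 --> 00:00:05,200` with the text line `SPEAKER_00: Hello, welcome to the meeting.`.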
## Downstream Consumers

- [[LangChain]] / CrewAI: JSON as `Document` or `Conversation` input
- CMS (Drupal/WordPress): JSON stored in `field_transcript_json`, with `full_text` stored as the body
- GitHub Actions: JSON as a CI artifact that triggers downstream processing pipelines
- [[LLMHandoff]]: formats the `segments` in the JSON as timestamped text lines for LLM summarization and Q&A

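As a rough illustration of the [[LLMHandoff]] step, the sketch below turns `segments` into timestamped text lines in Python. The line layout `[HH:MM:SS] SPEAKER: text` and the function name `to_prompt_lines` are assumptions; the actual handoff format is not specified here.

```python
def to_prompt_lines(doc: dict) -> str:
    """Format each segment as '[HH:MM:SS] SPEAKER: text', one per line."""
    lines = []
    for seg in doc["segments"]:
        # Truncate the start time to whole seconds for readability.
        total = int(seg["start"])
        hours, rem = divmod(total, 3600)
        minutes, secs = divmod(rem, 60)
        lines.append(
            f"[{hours:02d}:{minutes:02d}:{secs:02d}] {seg['speaker']}: {seg['text']}"
        )
    return "\n".join(lines)
```

The resulting string can be placed directly into a summarization or question-answering prompt, and the timestamps let the model cite where in the recording a statement was made.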
## Related Concepts
- [[PIIRedaction]]: PII redaction before output
- [[SpeakerDiarization]]: the data source for the `speaker` field
- [[LLMHandoff]]: the standard interface for consuming Structured JSON

## Related Sources
- [[engineering-voice-ai-integration-engineer]]