---
title: "ffmpeg"
type: entity
tags: ["audio-processing", "multimedia", "open-source", "cli"]
sources: ["engineering-voice-ai-integration-engineer"]
last_updated: 2026-05-02
---
## Aliases
- ffmpeg
- ffprobe (companion tool shipped with ffmpeg)
## Definition
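FFmpeg is an open-source multimedia processing toolchain. In the Voice AI Integration Engineer's pipeline it handles all audio format probing, conversion, and preprocessing, and it is an indispensable low-level tool in Whisper-style transcription pipelines.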
## Core Tools Used
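| Tool | Purpose |
|------|---------|
| `ffprobe` | Probes audio format, duration, codec, sample rate, and channel count (format validation) |
| `ffmpeg` | Format conversion, resampling, loudness normalization, silence trimming (preprocessing) |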
## Key Operations in Voice AI Pipeline
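```bash
# 1. Probe audio properties (validation)
ffprobe -v quiet -print_format json \
  -show_streams -show_format input.wav

# 2. Whisper preprocessing: 16 kHz mono 16-bit PCM + EBU R128 loudness normalization
#    (loudnorm targets: -16 LUFS integrated, -1.5 dBTP true peak, 11 LU loudness range)
ffmpeg -y -i input.mp4 -vn \
  -acodec pcm_s16le \
  -ar 16000 \
  -ac 1 \
  -af "loudnorm=I=-16:TP=-1.5:LRA=11" \
  output.wav

# 3. Split long audio into chunks (this cuts the first 30-minute chunk, 0 to 1800 s)
ffmpeg -y -i long_audio.wav \
  -ss 0 -to 1800 \
  -acodec copy chunk_0000.wav
```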
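Step 3 above cuts only the first chunk. A minimal sketch of covering a whole file, assuming fixed 1800 s chunks that overlap by a hypothetical 5 s so that words straddling a boundary land in both neighboring chunks; the actual overlap strategy of [[OverlapAwareChunking]] is not specified here. `plan_chunks` and `split_audio` are hypothetical helper names; the ffprobe and ffmpeg options are standard ones.

```bash
#!/usr/bin/env bash
# Sketch: overlap-aware splitting of a long recording.
# ASSUMPTIONS: 1800 s chunks and a 5 s overlap are illustrative defaults;
# plan_chunks and split_audio are hypothetical helper names.

# Print one "start end" pair (integer seconds) per chunk covering [0, duration].
# Every chunk after the first starts `overlap` seconds before the previous end.
plan_chunks() {
  local duration="$1" chunk="${2:-1800}" overlap="${3:-5}"
  local start=0 end
  if [ "$overlap" -ge "$chunk" ]; then
    echo "overlap must be shorter than the chunk length" >&2
    return 1
  fi
  while [ "$start" -lt "$duration" ]; do
    end=$((start + chunk))
    if [ "$end" -gt "$duration" ]; then end=$duration; fi
    echo "$start $end"
    if [ "$end" -eq "$duration" ]; then break; fi
    start=$((end - overlap))
  done
}

# Cut every planned chunk from a PCM WAV file without re-encoding.
split_audio() {
  local input="$1" chunk="${2:-1800}" overlap="${3:-5}" dur
  # Container duration in seconds, e.g. "3699.520000" (assumed numeric).
  dur=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$input") || return 1
  # Drop the fraction and add 1 s so the tail is not cut off.
  dur=$(( ${dur%.*} + 1 ))
  plan_chunks "$dur" "$chunk" "$overlap" | {
    i=0
    while read -r start end; do
      # -nostdin stops ffmpeg from consuming the chunk list on stdin.
      ffmpeg -nostdin -y -v error -i "$input" -ss "$start" -to "$end" \
        -acodec copy "$(printf 'chunk_%04d.wav' "$i")"
      i=$((i + 1))
    done
  }
}
```

For example, `plan_chunks 3700` prints `0 1800`, `1795 3595` and `3590 3700`. Keeping the boundary arithmetic separate from the ffmpeg calls makes the plan easy to check before any audio is cut.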
## Critical Role in Quality Assurance
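> "Never trust file extensions — always probe the actual container." The actual format must always be probed with ffprobe.

FFmpeg is at the core of audio quality validation. Probing answers:
- Is the sample rate correct? Whisper expects 16 kHz.
- Is there an audio stream at all? An .mp4 may contain video only.
- Is the audio codec supported?
- Is the duration within the model's limits?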
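These checks can be scripted against the output of step 2 before it is handed to the model. A minimal sketch, assuming the model-ready file should be 16-bit PCM, 16 kHz, mono, and that 3600 s is the duration limit (a placeholder; the real limit depends on the model or service). `check_audio_props` and `validate_audio` are hypothetical names. The parser reads ffprobe's `key=value` output on stdin, so it can be exercised without a media file.

```bash
#!/usr/bin/env bash
# Sketch: validate a model-ready file with ffprobe.
# ASSUMPTIONS: expected format is pcm_s16le / 16000 Hz / mono, and the
# 3600 s default limit is a placeholder. Helper names are hypothetical.

# Parse ffprobe "key=value" lines from stdin; return non-zero on any failure.
check_audio_props() {
  local max_seconds="${1:-3600}" key value
  local codec="" rate="" channels="" duration=""
  while IFS='=' read -r key value; do
    case "$key" in
      codec_name)  codec=$value ;;
      sample_rate) rate=$value ;;
      channels)    channels=$value ;;
      duration)    duration=$value ;;
    esac
  done
  # A video-only .mp4 yields no audio stream entries at all.
  if [ -z "$codec" ]; then echo "no audio stream" >&2; return 1; fi
  if [ "$codec" != "pcm_s16le" ]; then echo "codec $codec, expected pcm_s16le" >&2; return 1; fi
  if [ "$rate" != "16000" ]; then echo "sample rate $rate, expected 16000" >&2; return 1; fi
  if [ "$channels" != "1" ]; then echo "channels $channels, expected 1" >&2; return 1; fi
  case "$duration" in
    ""|N/A) echo "duration unknown" >&2; return 1 ;;
  esac
  # Compare whole seconds (fraction dropped) against the limit.
  if [ "${duration%.*}" -gt "$max_seconds" ]; then
    echo "duration ${duration}s exceeds ${max_seconds}s" >&2; return 1
  fi
}

# Probe the first audio stream plus the container duration, then validate.
validate_audio() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name,sample_rate,channels:format=duration \
    -of default=noprint_wrappers=1 "$1" | check_audio_props "${2:-3600}"
}
```

`validate_audio output.wav` returns 0 silently when every check passes; otherwise it prints the first failed check to stderr and returns 1.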
## Key Principles
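1. **Always probe, never guess**: use `ffprobe` instead of relying on the file extension.
2. **16 kHz mono is the Whisper standard**: unless a specific multichannel scenario has to be handled.
3. **R128 loudness normalization comes before VAD**: this keeps VAD thresholds accurate.
4. **Use `-vn` to drop the video track**: video streams in containers such as .mp4/.mov can interfere with processing.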
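Principles 3 and 4 can be seen in a single ffmpeg invocation. A sketch, assuming ffmpeg's `silenceremove` filter as a simple threshold-based stand-in for a VAD stage (the source does not name the VAD used) with illustrative, untuned thresholds; `whisper_filter_chain` and `preprocess_for_whisper` are hypothetical names. Because the filters in `-af` run left to right, `loudnorm` levels the signal before the fixed -40dB silence threshold is applied.

```bash
#!/usr/bin/env bash
# Sketch: principles 3 and 4 in one preprocessing command.
# ASSUMPTIONS: silenceremove stands in for VAD; -40dB and 0.5 s are
# illustrative values. Function names are hypothetical.

# Filters in a chain run left to right: normalize loudness first (principle 3),
# then trim silent stretches longer than 0.5 s anywhere in the file.
whisper_filter_chain() {
  printf '%s\n' "loudnorm=I=-16:TP=-1.5:LRA=11,silenceremove=stop_periods=-1:stop_duration=0.5:stop_threshold=-40dB"
}

preprocess_for_whisper() {
  local input="$1" output="$2"
  # -vn discards any video track (principle 4); the output is 16 kHz mono PCM.
  ffmpeg -nostdin -y -v error -i "$input" -vn \
    -af "$(whisper_filter_chain)" \
    -acodec pcm_s16le -ar 16000 -ac 1 "$output"
}
```

Removing silence shortens the audio, so transcript timestamps no longer line up with the original recording; keep this in mind if they must map back to the source.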
## Connections
- [[EBUR128LoudnessNormalization]]: implemented via `-af loudnorm`
- [[FasterWhisper]]: consumes audio preprocessed by ffmpeg
- [[OverlapAwareChunking]]: chunking implemented via ffmpeg time slicing
- [[VoiceActivityDetection]]: after VAD, ffmpeg can also trim silence
## Sources
- [[engineering-voice-ai-integration-engineer]]