Update nexus: fix conflicts and sync local changes

2026-04-26 12:06:50 +08:00
parent 191797c01b
commit f09834b5a5
2443 changed files with 254323 additions and 255154 deletions
--- a/openclaw/xingshu/whisper-guide.md
+++ b/openclaw/xingshu/whisper-guide.md
@@ -1,261 +1,261 @@
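-# The Complete Guide to Local Speech Transcription with Whisper
-
-> Document version: 2026-04-15
-> Maintainer: 星枢 (xingshu)
-> Status: ✅ Verified working on a Mac mini
-
----
-
-## 1. What Is Whisper
-
-Whisper is an open-source automatic speech recognition (ASR) model from OpenAI that transcribes audio files into text. It supports 99 languages and is especially accurate on English.
-
-**Two ways to use it:**
-
-| Method | Description | Cost |
-|---|---|---|
-| **Run locally** | Model is downloaded to a local Mac/PC | **Free** |
-| OpenAI API | Calls the OpenAI Whisper API | Billed per minute |
-
-This guide uses the **local** method.
-
----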
-
-## 2. Supported Models
-
-| Model | Parameters | English WER* | Chinese CER* | Local memory usage | Mac mini compatibility |
-|---|---|---|---|---|---|
-| `tiny` | 39M | 5.2% | ~10% | ~1GB | ✅ |
-| `base` | 74M | 3.5% | ~8% | ~1GB | ✅ |
-| **`small`** | 244M | 2.7% | ~5% | ~1.5GB | **✅ Recommended** |
-| `medium` | 769M | 2.3% | ~4% | ~5GB | ⚠️ May run out of memory (OOM) |
-| `large` | 1550M | 2.0% | ~3% | ~10GB | ❌ OOM |
-
-> \* WER = Word Error Rate, CER = Character Error Rate; lower is more accurate.
-
-**Recommended: the `small` model** (the best balance between accuracy and resource usage)
-
----
-
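-## 3. Installation
-
-### 3.1 Prerequisites
-
-```bash
-# Check the Python version (3.8+ required)
-python3 --version
-
-# Check that pip is available
-pip3 --version
-
-# Check that FFmpeg is installed (Whisper uses it to decode audio)
-ffmpeg -version
-```
-
-### 3.2 Install Whisper
-
-```bash
-pip3 install openai-whisper
-```
-
-**If you hit a permission error (macOS):**
-```bash
-pip3 install --user openai-whisper
-```
-
-**The model file is downloaded automatically on first run** (~500MB for the `small` model); no manual download is needed.
-
----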
-
-## 4. Quick Test
-
-### 4.1 Single-file test (`tiny` model, fastest)
-
-```python
-import whisper
-
-model = whisper.load_model("tiny")          # Downloads the model on first run
-result = model.transcribe("audio.mp3", language="en")
-print(result["text"])
-```
-
-### 4.2 Full example (`small` model)
-
-```python
-import whisper
-
-# Load the model (only needs to be done once)
-model = whisper.load_model("small")
-
-# Transcribe
-result = model.transcribe(
-    "audio.mp3",
-    language="en",    # Set the language; auto-detected if omitted
-    fp16=False,       # The Mac mini runs on CPU; False avoids the FP16 warning
-    verbose=True,     # Print each segment as it is decoded
-)
-
-print("Language:", result["language"])
-print("Transcript:", result["text"])
-print("Segments:", len(result["segments"]))
-```
-
-### 4.3 Command-line test
-
-```bash
-# After installation, the CLI can be used directly
-whisper audio.mp3 --model small --language en
-```
-
----
-
-## 5. Python API in Detail
-
-### 5.1 Core method
-
-```python
-import whisper
-
-model = whisper.load_model("small")
-
-# Commonly used parameters
-result = model.transcribe(
-    audio="path/to/file.mp3",
-
-    # Language
-    language="en",           # Set the language; auto-detected if omitted
-    # initial_prompt="",     # Optional: steer the model (e.g. proper nouns)
-
-    # Decoding control
-    fp16=False,              # Use False on CPU; True is possible on a GPU
-    temperature=0.0,         # 0 = deterministic, >0 = sampling
-    condition_on_previous_text=True,  # Use the previous segment as context
-
-    # Task mode
-    task="transcribe",       # "transcribe", or "translate" (into English)
-
-    # Timestamps
-    word_timestamps=False,    # True = start/end time for every word
-
-    # Logging
-    verbose=True,
-)
-```
-
-### 5.2 Return value structure
-
-```python
-{
-    "text": "The full transcript...",
-    "language": "en",
-    "segments": [
-        {
-            "id": 0,
-            "start": 0.0,      # seconds
-            "end": 5.5,
-            "text": " Can you see my screen already?",
-            "words": [...]       # only when word_timestamps=True
-        },
-        ...
-    ],
-}
-```
-
-### 5.3 Batch transcription
-
-```python
-import whisper
-import glob
-
-model = whisper.load_model("small")
-audio_files = glob.glob("*.mp3")
-
-for audio_file in audio_files:
-    print(f"Processing: {audio_file}")
-    result = model.transcribe(audio_file, language="en", fp16=False)
-
-    # Save the transcript next to the audio file
-    with open(audio_file + ".txt", "w", encoding="utf-8") as f:
-        f.write(result["text"])
-```
-
----
-
-## 6. Measured Performance on a Mac mini M4 Pro
-
-| Audio duration | File size | Model | Transcription time | Speed |
-|---|---|---|---|---|
-| ~54 min | 3MB | `small` | ~43s | ~75x realtime |
-| ~54 min | 3MB | `tiny` | ~10s | ~320x realtime |
-| ~1 hour | 22MB | `small` | ~90s | ~40x realtime |
-
-**Rule of thumb:** the `small` model processes 1 hour of audio in roughly 1-2 minutes, with memory usage steady at ~1.5GB.
-
----
-
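-## 7. Use in the Pipeline
-
-This project does not call the Whisper API; a Python script invokes the local model instead:
-
-```python
-import functools
-
-import whisper
-
-@functools.lru_cache(maxsize=1)
-def get_model():
-    """Load the small model on first use and reuse it afterwards."""
-    return whisper.load_model("small")
-
-def whisper_transcribe(mp3_path: str) -> str:
-    """Transcribe a single file and return the English transcript."""
-    result = get_model().transcribe(
-        mp3_path,
-        language="en",
-        fp16=False,
-    )
-    return result["text"].strip()
-
-# Usage
-transcript = whisper_transcribe("/path/to/audio.mp3")
-```
-
----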
-
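-## 8. FAQ
-
-### Q1: `fp16 is not supported on CPU` warning
-**This is expected.** The Mac mini runs Whisper on the CPU, so Whisper automatically falls back to FP32. Accuracy is not affected.
-
-### Q2: `SIGKILL` / process killed
-**Out of memory**: the model is too large. Switch to a smaller model:
-```python
-model = whisper.load_model("tiny")   # Lowest memory usage
-```
-
-### Q3: Chinese is recognized poorly
-Specify the language explicitly to improve accuracy:
-```python
-result = model.transcribe("audio.mp3", language="zh")  # Chinese
-result = model.transcribe("audio.mp3", language="en")  # English
-```
-
-### Q4: How to speed up transcription
-- Use the `tiny` or `base` model (trading accuracy for speed)
-- No special tuning is needed on M-series Mac mini hardware; note that openai-whisper runs on the CPU by default there and does not use the Neural Engine automatically
-- Avoid running several transcription jobs at the same time
-
-### Q5: Which audio formats are supported
-Every format FFmpeg can decode: `mp3`, `wav`, `m4a`, `flac`, `ogg`, `webm`, and so on.
-
----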
-
-## 9. Uninstall
-
-```bash
-pip3 uninstall openai-whisper
-
-# Delete the downloaded models (default cache location)
-rm -rf ~/.cache/whisper
-```
-
----
-
-## 10. Related Resources
-
-- **GitHub**: https://github.com/openai/whisper
-- **Model download**: automatic on the first call to `load_model()`
-- **Cache location**: `~/.cache/whisper/`
-- **This project's script**: `~/.openclaw/temp/xingshu/scripts/nas_whisper_gemini_summarize.py`
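+# The Complete Guide to Local Speech Transcription with Whisper
+
+> Document version: 2026-04-15
+> Maintainer: 星枢 (xingshu)
+> Status: ✅ Verified working on a Mac mini
+
+---
+
+## 1. What Is Whisper
+
+Whisper is an open-source automatic speech recognition (ASR) model from OpenAI that transcribes audio files into text. It supports 99 languages and is especially accurate on English.
+
+**Two ways to use it:**
+
+| Method | Description | Cost |
+|---|---|---|
+| **Run locally** | Model is downloaded to a local Mac/PC | **Free** |
+| OpenAI API | Calls the OpenAI Whisper API | Billed per minute |
+
+This guide uses the **local** method.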
+
+---
+
+## 2. Supported Models
+
+| Model | Parameters | English WER* | Chinese CER* | Local memory usage | Mac mini compatibility |
+|---|---|---|---|---|---|
+| `tiny` | 39M | 5.2% | ~10% | ~1GB | ✅ |
+| `base` | 74M | 3.5% | ~8% | ~1GB | ✅ |
+| **`small`** | 244M | 2.7% | ~5% | ~1.5GB | **✅ Recommended** |
+| `medium` | 769M | 2.3% | ~4% | ~5GB | ⚠️ May run out of memory (OOM) |
+| `large` | 1550M | 2.0% | ~3% | ~10GB | ❌ OOM |
+
+> \* WER = Word Error Rate, CER = Character Error Rate; lower is more accurate.
+
+**Recommended: the `small` model** (the best balance between accuracy and resource usage)
+
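For readers unfamiliar with the two metrics in the footnote, the sketch below shows one common way to compute them: edit distance divided by the reference length, over words for WER and over characters for CER. The function names are hypothetical, the text is assumed to be already normalized, and the reference is assumed to be non-empty; this is not necessarily how the figures in the table were produced.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

For example, one wrong word in a five-word reference gives a WER of 0.2, which is 20%.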
+---
+
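+## 3. Installation
+
+### 3.1 Prerequisites
+
+```bash
+# Check the Python version (3.8+ required)
+python3 --version
+
+# Check that pip is available
+pip3 --version
+
+# Check that FFmpeg is installed (Whisper uses it to decode audio)
+ffmpeg -version
+```
+
+### 3.2 Install Whisper
+
+```bash
+pip3 install openai-whisper
+```
+
+**If you hit a permission error (macOS):**
+```bash
+pip3 install --user openai-whisper
+```
+
+**The model file is downloaded automatically on first run** (~500MB for the `small` model); no manual download is needed.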
+
+---
+
+## 4. Quick Test
+
+### 4.1 Single-file test (`tiny` model, fastest)
+
+```python
+import whisper
+
+model = whisper.load_model("tiny")          # Downloads the model on first run
+result = model.transcribe("audio.mp3", language="en")
+print(result["text"])
+```
+
+### 4.2 Full example (`small` model)
+
+```python
+import whisper
+
+# Load the model (only needs to be done once)
+model = whisper.load_model("small")
+
+# Transcribe
+result = model.transcribe(
+    "audio.mp3",
+    language="en",    # Set the language; auto-detected if omitted
+    fp16=False,       # The Mac mini runs on CPU; False avoids the FP16 warning
+    verbose=True,     # Print each segment as it is decoded
+)
+
+print("Language:", result["language"])
+print("Transcript:", result["text"])
+print("Segments:", len(result["segments"]))
+```
+
+### 4.3 Command-line test
+
+```bash
+# After installation, the CLI can be used directly
+whisper audio.mp3 --model small --language en
+```
+
+---
+
+## 5. Python API in Detail
+
+### 5.1 Core method
+
+```python
+import whisper
+
+model = whisper.load_model("small")
+
+# Commonly used parameters
+result = model.transcribe(
+    audio="path/to/file.mp3",
+
+    # Language
+    language="en",           # Set the language; auto-detected if omitted
+    # initial_prompt="",     # Optional: steer the model (e.g. proper nouns)
+
+    # Decoding control
+    fp16=False,              # Use False on CPU; True is possible on a GPU
+    temperature=0.0,         # 0 = deterministic, >0 = sampling
+    condition_on_previous_text=True,  # Use the previous segment as context
+
+    # Task mode
+    task="transcribe",       # "transcribe", or "translate" (into English)
+
+    # Timestamps
+    word_timestamps=False,    # True = start/end time for every word
+
+    # Logging
+    verbose=True,
+)
+```
+
+### 5.2 Return value structure
+
+```python
+{
+    "text": "The full transcript...",
+    "language": "en",
+    "segments": [
+        {
+            "id": 0,
+            "start": 0.0,      # seconds
+            "end": 5.5,
+            "text": " Can you see my screen already?",
+            "words": [...]       # only when word_timestamps=True
+        },
+        ...
+    ],
+}
+```
+
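Because each segment carries `start`, `end`, and `text`, the result converts naturally into subtitles. The sketch below is a minimal illustration using only the standard library; `format_timestamp` and `segments_to_srt` are hypothetical helper names, and the sample data simply mirrors the structure shown in 5.2 rather than real output.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render Whisper-style segments as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Sample data shaped like result["segments"]; the values are illustrative.
sample = [
    {"id": 0, "start": 0.0, "end": 5.5, "text": " Can you see my screen already?"},
]
print(segments_to_srt(sample))
```

In practice the `whisper` command-line tool can also write subtitle files directly through its `--output_format srt` option, so a helper like this is mainly useful when post-processing results inside Python.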
+### 5.3 Batch transcription
+
+```python
+import whisper
+import glob
+
+model = whisper.load_model("small")
+audio_files = glob.glob("*.mp3")
+
+for audio_file in audio_files:
+    print(f"Processing: {audio_file}")
+    result = model.transcribe(audio_file, language="en", fp16=False)
+
+    # Save the transcript next to the audio file
+    with open(audio_file + ".txt", "w", encoding="utf-8") as f:
+        f.write(result["text"])
+```
+
+---
+
+## 6. Measured Performance on a Mac mini M4 Pro
+
+| Audio duration | File size | Model | Transcription time | Speed |
+|---|---|---|---|---|
+| ~54 min | 3MB | `small` | ~43s | ~75x realtime |
+| ~54 min | 3MB | `tiny` | ~10s | ~320x realtime |
+| ~1 hour | 22MB | `small` | ~90s | ~40x realtime |
+
+**Rule of thumb:** the `small` model processes 1 hour of audio in roughly 1-2 minutes, with memory usage steady at ~1.5GB.
+
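The speed column is the audio duration divided by the transcription time. The sketch below reproduces that arithmetic from the approximate figures in the table; the function names are hypothetical, and the results are only as precise as the rounded measurements they start from.

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / wall_seconds


def estimated_wall_seconds(audio_seconds: float, factor: float) -> float:
    """Expected transcription time for a given realtime factor."""
    return audio_seconds / factor


# Rows from the table above: (audio seconds, transcription seconds, model)
rows = [(54 * 60, 43, "small"), (54 * 60, 10, "tiny"), (60 * 60, 90, "small")]
for audio_s, wall_s, model_name in rows:
    print(f"{model_name}: {realtime_factor(audio_s, wall_s):.0f}x realtime")
```

The same ratio can be inverted for planning: at about 40x realtime, a two-hour recording would take roughly three minutes.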
+---
+
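+## 7. Use in the Pipeline
+
+This project does not call the Whisper API; a Python script invokes the local model instead:
+
+```python
+import functools
+
+import whisper
+
+@functools.lru_cache(maxsize=1)
+def get_model():
+    """Load the small model on first use and reuse it afterwards."""
+    return whisper.load_model("small")
+
+def whisper_transcribe(mp3_path: str) -> str:
+    """Transcribe a single file and return the English transcript."""
+    result = get_model().transcribe(
+        mp3_path,
+        language="en",
+        fp16=False,
+    )
+    return result["text"].strip()
+
+# Usage
+transcript = whisper_transcribe("/path/to/audio.mp3")
+```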
+
+---
+
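+## 8. FAQ
+
+### Q1: `fp16 is not supported on CPU` warning
+**This is expected.** The Mac mini runs Whisper on the CPU, so Whisper automatically falls back to FP32. Accuracy is not affected.
+
+### Q2: `SIGKILL` / process killed
+**Out of memory**: the model is too large. Switch to a smaller model:
+```python
+model = whisper.load_model("tiny")   # Lowest memory usage
+```
+
+### Q3: Chinese is recognized poorly
+Specify the language explicitly to improve accuracy:
+```python
+result = model.transcribe("audio.mp3", language="zh")  # Chinese
+result = model.transcribe("audio.mp3", language="en")  # English
+```
+
+### Q4: How to speed up transcription
+- Use the `tiny` or `base` model (trading accuracy for speed)
+- No special tuning is needed on M-series Mac mini hardware; note that openai-whisper runs on the CPU by default there and does not use the Neural Engine automatically
+- Avoid running several transcription jobs at the same time
+
+### Q5: Which audio formats are supported
+Every format FFmpeg can decode: `mp3`, `wav`, `m4a`, `flac`, `ogg`, `webm`, and so on.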
+
+---
+
+## 9. Uninstall
+
+```bash
+pip3 uninstall openai-whisper
+
+# Delete the downloaded models (default cache location)
+rm -rf ~/.cache/whisper
+```
+
+---
+
+## 10. Related Resources
+
+- **GitHub**: https://github.com/openai/whisper
+- **Model download**: automatic on the first call to `load_model()`
+- **Cache location**: `~/.cache/whisper/`
+- **This project's script**: `~/.openclaw/temp/xingshu/scripts/nas_whisper_gemini_summarize.py`