Update nexus: fix conflicts and sync local changes
@@ -1,108 +1,108 @@
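# MEMORY.md - Long-Term Memory

## My Identity

- **Name**: 星枢 (Xingshu)
- **Role**: Supreme Commander / Master Orchestrator
- **Responsibility**: Coordinates and schedules all Agents
- **Subordinates**: 星曜 (Xingyao, IT steward), 星辉 (Xinghui, personal assistant)
- **Avatar**: ./avatars/xingshu.jpg

---

### ⚠️ Key Principles (always keep in mind)

**During discussion / brainstorming:**

- **Do not** install any program, skill, or tool without the user's permission
- **Do not** write any code without the user's permission
- **Do not** create any file or project without the user's permission
- Move on to the next steps only after the user has confirmed the complete plan
- The user controls the pace; every action waits for an instruction

---

### :star: Daily Must-Dos

1. **On the first conversation of each day**: automatically create that day's memory file `memory/YYYY-MM-DD.md`
2. **What to record**: important operations, decisions, and user requests from the conversation
3. **User requests**: when the user says "please remember xxxx", it must be recorded in the memory file
4. **Sync rule**: whenever MEMORY.md is updated, copy it to the Obsidian notes directory
   - Notes path: `/Users/weishen/Workspace/nexus/openclaw/xingshu/MEMORY.md`

*This is a daily routine and must never be skipped.*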
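
The two file-level steps of this routine (creating the day's file and syncing MEMORY.md to Obsidian) can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the agent's actual implementation: the `workspace` root that holds `memory/` is an assumption, the function names are hypothetical, and the heading written into a new daily file is a placeholder because the note format is not specified above.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def ensure_daily_memory_file(workspace: Path, today: Optional[date] = None) -> Path:
    """Create memory/YYYY-MM-DD.md under the workspace if it does not exist yet."""
    today = today or date.today()
    path = workspace / "memory" / f"{today.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        # Placeholder heading; an existing file is never overwritten
        path.write_text(f"# {today.isoformat()}\n\n", encoding="utf-8")
    return path


def sync_memory(memory_md: Path, obsidian_copy: Path) -> None:
    """Overwrite the Obsidian copy of MEMORY.md with the current version."""
    obsidian_copy.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(memory_md, obsidian_copy)  # copies content and modification time
```

For example, `sync_memory(workspace / "MEMORY.md", Path("/Users/weishen/Workspace/nexus/openclaw/xingshu/MEMORY.md"))` would perform the sync rule after each update. Checking `exists()` before writing keeps a second "first conversation" on the same day from wiping notes already recorded.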

---

## 🖥️ Server Architecture

| Server | IP | Agents running |
|--------|-----|-------------|
| **Mac Mini** (central control node) | 192.168.3.189 | xingshu (星枢), xingyao (星曜), xinghui (星辉), RabbitMQ |
| **Ubuntu2** (development server) | 192.168.3.45 | yunhan, yunce, yunjiang, yunzhi |
| **Ubuntu1** (pre-production server) | 192.168.3.47 | fengheng, fengchi, fengji |

---

## 🦞 Lobster Workflow Standard (verified on 2026-04-19)

### Call chain (standard)

User → xingshu → tool_call(tool="lobster", action="run", pipeline="...", argsJson="...", timeoutMs=600000) → executed by the Gateway

### Standard .lobster file format

```yaml
name: workflow-name
args:
  arg1:
    description: Description
    required: true
    default: "default value"
steps:
  - id: step1
    command: |
      openclaw.invoke --tool sessions_send --action json --args-json '{
        "sessionKey": "agent:xxx:...",
        "message": "Instruction text ${args.arg1}",
        "timeoutSeconds": 300
      }'
  - id: step2
    command: |
      openclaw.invoke --tool sessions_send ...
    stdin: $step1.stdout
  - id: approve
    command: approve --preview-from-stdin --prompt "Confirm?"
    stdin: $step2.stdout
    approval: required
  - id: deliver
    command: |
      openclaw.invoke --tool sessions_send ...
    condition: $approve.approved
    stdin: $step2.stdout
```

### Key rules

1. Wrap the argsJson in **single quotes** (`'{...}'`) so the shell does not expand `${}` variables
2. `${args.xxx}` is expanded by the lobster runner, not by the shell
3. Session key format: `agent:{agentId}:{channel}:direct:{chatId}`, optionally with a thread
4. Approval gate: `approval: required` + `condition: $step.approved`
5. Data passing: `stdin: $step.stdout`
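
Rules 1 and 3 can be checked with a small Python sketch that uses only the standard library (`json`, `shlex`, `subprocess`) and a POSIX `sh`. It builds a direct-chat session key in the documented shape and shows that a single-quoted JSON argument reaches the command with `${...}` untouched. The channel `telegram` and chat ID `123456` are made-up values, the function names are hypothetical, and this is not how the lobster runner itself is implemented.

```python
import json
import shlex
import subprocess


def session_key(agent_id: str, channel: str, chat_id: str) -> str:
    """Direct-chat key in the shape agent:{agentId}:{channel}:direct:{chatId}."""
    return f"agent:{agent_id}:{channel}:direct:{chat_id}"


def args_json_literal(payload: dict) -> str:
    """Serialize the payload and wrap it in single quotes for a POSIX shell."""
    # shlex.quote() emits '...' for any string containing shell metacharacters
    return shlex.quote(json.dumps(payload, ensure_ascii=False))


payload = {
    "sessionKey": session_key("yunjiang", "telegram", "123456"),
    "message": "literal ${HOME} stays unexpanded",
    "timeoutSeconds": 300,
}
quoted = args_json_literal(payload)

# printf receives the JSON exactly as written: the shell did not expand ${HOME}
echoed = subprocess.run(
    ["sh", "-c", f"printf '%s' {quoted}"],
    capture_output=True, text=True, check=True,
).stdout
```

If the same string were wrapped in double quotes instead, `sh` would substitute `${HOME}` before the command ever saw it, which is exactly what rule 1 guards against.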

### OpenClaw configuration requirements

- `plugins.allow` includes `lobster`
- `plugins.entries.lobster.enabled: true`
- `agents.list[xingshu].tools.alsoAllow` includes `lobster`

### Workflow file location

`/Users/weishen/.openclaw/workspace-agent-xingshu/workflows/`

### Tool-call parameters

- tool: lobster
- action: run
- pipeline: /absolute/path/to/workflow.lobster
- argsJson: {"arg1":"value1",...}
- timeoutMs: 600000
- cwd: /Users/weishen/.openclaw/workspace-agent-xingshu
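
A minimal sketch that assembles the parameters listed above into a single dict, assuming workflows live in the `workflows/` directory named earlier. The function name and the workflow file name used in practice are hypothetical, and the dict is simply the documented field set, not an official OpenClaw client API.

```python
import json
from pathlib import PurePosixPath

WORKSPACE = PurePosixPath("/Users/weishen/.openclaw/workspace-agent-xingshu")


def lobster_run_call(workflow_file: str, args: dict, timeout_ms: int = 600_000) -> dict:
    """Build the lobster tool-call parameters for one workflow run."""
    return {
        "tool": "lobster",
        "action": "run",
        # pipeline must be an absolute path to the .lobster file
        "pipeline": str(WORKSPACE / "workflows" / workflow_file),
        # argsJson is a JSON *string*, not a nested object
        "argsJson": json.dumps(args, ensure_ascii=False),
        "timeoutMs": timeout_ms,
        "cwd": str(WORKSPACE),
    }
```

Serializing `args` with `json.dumps` rather than formatting the string by hand keeps quotes and non-ASCII instructions valid JSON.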
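
### ⚠️ Important limitations

- Telegram sessions are synchronous, so the Agent cannot run background tools while it is replying to the user
- Lobster tool calls therefore have to be run by a sub-Agent spawned with `sessions_spawn`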
- The sub-Agent must be triggered from xingshu's main session; it cannot run on its own

@@ -1,204 +1,204 @@
# PST Email Processing Workflow Summary

> Created: 2026-04-13
> Input: Shen Wei 2025.pst (15 GB, 55,647 messages)

---

## 📋 End-to-End Overview

### Stage 1: PST extraction (PST → mbox)

```
Source file: Shen Wei 2025.pst (15 GB)
↓
Tool: readpst (already installed on this Mac)
↓
Output: extracted_2025/Shen Wei 2025/*/mbox (30 folders)
Time: ~10-15 minutes
```

**Key points:**

- `readpst -r <pst_file> -o <output_dir>` unpacks the PST into one file named `mbox` per folder, mirroring the PST's folder structure
- Within each mbox, messages are separated by lines that start with `From `
- The original PST is left completely untouched
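
Because each folder ends up as a single `mbox` file, counting the messages per folder is a quick way to confirm that the extraction is complete. A minimal sketch using the standard-library `mailbox` module, assuming the `extracted_2025/Shen Wei 2025/*/mbox` layout shown above; `count_messages` is a hypothetical helper name.

```python
import mailbox
import os


def count_messages(extract_root: str) -> dict:
    """Map each extracted folder (path relative to extract_root) to its message count."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(extract_root):
        if "mbox" not in filenames:
            continue
        box = mailbox.mbox(os.path.join(dirpath, "mbox"), create=False)
        try:
            # len() makes mailbox scan the file and split it on the "From " separator lines
            counts[os.path.relpath(dirpath, extract_root)] = len(box)
        finally:
            box.close()
    return counts
```

Running it on `extracted_2025/Shen Wei 2025` and summing the values gives a cross-check against the 30 folders and 55,647 messages quoted above.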

---

### Stage 2: Build the index (mbox → CSV)

```
Source mbox (30 folders, 55,647 messages)
↓
Python: mailbox module + parsedate_to_datetime
↓
Output: ~/pst-processing/2025/YYYY-MM.csv (83 months)
```

**CSV fields:**

| Field | Description |
|------|------|
| folder | Folder path (e.g. Inbox/SaaS Notification/AWS) |
| year_month | Year and month (e.g. 2025-01) |
| message_id | Unique message identifier (used to match messages across files) |
| subject | Subject, truncated to 80 characters (used as the per-subject key) |
| sender | Sender (full display name + address) |
| recipient | Recipient |
| date | Date (raw RFC 2822 string) |
| has_attachment | Y/N (detected via `Content-Disposition: attachment`) |
| attachment_size | Total attachment size in bytes |
| email_size | Size of the whole message in bytes |

**Processing time:** ~4 minutes (55,647 messages)
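
A condensed sketch of this indexing step, assuming one mbox per folder as produced in Stage 1. It uses only the standard library (`mailbox`, `csv`, `email.utils.parsedate_to_datetime`) and folds in the pitfalls noted later in this document: plain iteration instead of slicing, `str` vs `bytes` payload sizes, and an `Unknown` month for unparsable dates. The helper names are hypothetical; this is not the author's indexing script.

```python
import csv
import mailbox
import os
from collections import defaultdict
from email.utils import parsedate_to_datetime

FIELDS = ["folder", "year_month", "message_id", "subject", "sender", "recipient",
          "date", "has_attachment", "attachment_size", "email_size"]


def year_month(raw_date: str) -> str:
    """'YYYY-MM' from an RFC 2822 Date header, or 'Unknown' if it cannot be parsed."""
    try:
        return parsedate_to_datetime(raw_date).strftime("%Y-%m")
    except (TypeError, ValueError, IndexError):
        return "Unknown"


def payload_size(payload) -> int:
    """Byte length of a payload: str must be encoded first, bytes are measured directly."""
    if isinstance(payload, str):
        return len(payload.encode("utf-8"))
    return len(payload or b"")


def index_row(folder: str, msg: mailbox.mboxMessage) -> dict:
    raw_date = str(msg.get("Date", ""))
    attachments = [p for p in msg.walk() if p.get_content_disposition() == "attachment"]
    return {
        "folder": folder,
        "year_month": year_month(raw_date),
        "message_id": str(msg.get("Message-ID", "")).strip(),
        "subject": str(msg.get("Subject", ""))[:80],
        "sender": str(msg.get("From", "")),
        "recipient": str(msg.get("To", "")),
        "date": raw_date,
        "has_attachment": "Y" if attachments else "N",
        # decode=True undoes base64/quoted-printable, so this counts the real file bytes
        "attachment_size": sum(payload_size(p.get_payload(decode=True)) for p in attachments),
        "email_size": len(msg.as_bytes()),
    }


def index_mbox(path: str, folder: str) -> list:
    box = mailbox.mbox(path, create=False)
    try:
        return [index_row(folder, msg) for msg in box]  # plain iteration: mbox has no slicing
    finally:
        box.close()


def write_month_csvs(rows: list, out_dir: str) -> list:
    """Write one YYYY-MM.csv per month (plus Unknown.csv) and return the file paths."""
    by_month = defaultdict(list)
    for row in rows:
        by_month[row["year_month"]].append(row)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for month, month_rows in sorted(by_month.items()):
        path = os.path.join(out_dir, f"{month}.csv")
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(month_rows)
        written.append(path)
    return written
```

Grouping rows by month before writing mirrors the one-file-per-month layout above, so each CSV stays small enough to review in Excel or Numbers.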

---

### Stage 3: Define the deletion rules

**Rules file:** `~/pst-processing/rules/delete_rules.json`

| # | Rule ID | Folder match | Action | Notes |
|---|--------|----------|------|------|
| 1 | aws_notification | contains "AWS Notification" | keep_sample: 5 | keep 1 per subject, at most 5 |
| 2 | prisma_cloud | contains "Prisma Cloud" | keep_sample: 5 | keep 1 per subject, at most 5 |
| 3 | x4x_tenant_provisioning | contains "X4X-Tenant Provisioning" | keep_sample: 5 | keep 1 per subject, at most 5 |
| 4 | qualys | contains "Qualys" | keep_sample: 5 | keep 1 per subject, at most 5 |
| 5 | teams_notification | contains "Teams Notification" | keep_if_attachment | keep if it has an attachment, otherwise delete |
| 6 | sma_notification | contains "SMA Notficiation" | keep_sample: 10 | keep 1 per subject, at most 10 per month |
| 7 | ppm_saas_change | contains "PPM SaaS Change" | delete_all | delete all |
| 8 | cloudhealth | contains "CloudHealth" | keep | keep all |
| 9 | saas_bi_report | contains "SaaS BI Report" | keep | keep all |
| 10 | x4x_decommissioning | contains "X4X-Tenant Decommissioning" | delete_all | delete all |
| 11 | x4x_license_renewal | contains "X4X-License Renewal" | delete_all | delete all |

**Rule action types:**

- `keep`: keep everything
- `delete_all`: delete everything
- `keep_sample`: keep one message per subject, up to N messages
- `keep_if_attachment`: keep messages that have attachments, delete the rest

**Execution script:** `~/pst-processing/rules/apply_rules.py`
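
A sketch of how the four action types could be applied to one month's index rows, setting the `delete_flag` column that Stage 4 matches on. It assumes the rules run once per `YYYY-MM.csv`, so the `keep_sample` cap acts as a per-month quota (consistent with rule 6 and the monthly-quota principle below). Unmatched folders are kept, and the first matching rule wins; both are assumptions. The in-memory `RULES` list is a hypothetical stand-in, since the real `delete_rules.json` schema is not shown here, and this is not the contents of `apply_rules.py`.

```python
from collections import defaultdict

# Hypothetical in-memory form of four of the rules above (not the real JSON schema)
RULES = [
    {"id": "aws_notification", "match": "AWS Notification", "action": "keep_sample", "n": 5},
    {"id": "teams_notification", "match": "Teams Notification", "action": "keep_if_attachment"},
    {"id": "ppm_saas_change", "match": "PPM SaaS Change", "action": "delete_all"},
    {"id": "cloudhealth", "match": "CloudHealth", "action": "keep"},
]


def find_rule(folder, rules):
    """Return the first rule whose match string occurs in the folder path, else None."""
    return next((r for r in rules if r["match"] in folder), None)


def apply_rules(rows, rules=RULES):
    """Set row["delete_flag"] to "Y" or "N" for one month of index rows."""
    kept = defaultdict(int)            # messages kept so far per keep_sample rule
    seen_subjects = defaultdict(set)   # subjects already sampled per rule
    for row in rows:
        rule = find_rule(row["folder"], rules)
        action = rule["action"] if rule else "keep"   # unmatched folders are kept
        if action == "keep":
            delete = False
        elif action == "delete_all":
            delete = True
        elif action == "keep_if_attachment":
            delete = row["has_attachment"] != "Y"
        elif action == "keep_sample":
            rid, subject = rule["id"], row["subject"]
            # keep only a subject not seen before, and only while under the cap
            delete = subject in seen_subjects[rid] or kept[rid] >= rule["n"]
            if not delete:
                seen_subjects[rid].add(subject)
                kept[rid] += 1
        else:
            raise ValueError(f"unknown action: {action!r}")
        row["delete_flag"] = "Y" if delete else "N"
    return rows
```

Keying the sample on the truncated subject means near-identical notifications collapse to one representative, while distinct message formats each keep a sample until the cap is hit.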

---

### Stage 4: Filter and assemble the new mbox

```
Source mbox (30 folders, 55,647 messages)
↓
Match delete_flag by message_id
↓
Kept: 29,088 messages (52%)
↓
Extract the full content (body + attachments)
↓
Output: shenwei2025-clean.mbox (9.6 GB)
```

**Key findings:**

- 117 messages could not be matched in the source mbox (their IDs differed after the PST → mbox conversion)
- SMA Notification: 6,681 messages originally → only 40 kept after applying the rules
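
A sketch of this stage under stated assumptions: the monthly CSVs carry a `delete_flag` column with `N` for kept messages, and the source folders use the `*/mbox` layout from Stage 1. It reads the kept IDs, makes one sequential pass over each source mbox (no random access), appends full messages to a single output mbox, and returns the kept IDs that were never found. Those are the "match failed" cases, like the 117 above. Function names are hypothetical; the planned `build_clean_mbox.py` may differ.

```python
import csv
import glob
import mailbox
import os


def load_keep_ids(csv_dir: str) -> set:
    """message_ids flagged delete_flag == 'N' across every YYYY-MM.csv in csv_dir."""
    keep = set()
    for path in sorted(glob.glob(os.path.join(csv_dir, "*.csv"))):
        with open(path, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                if row.get("delete_flag") == "N":
                    keep.add(row["message_id"].strip())
    return keep


def build_clean_mbox(source_root: str, keep_ids: set, output_path: str) -> set:
    """Copy every kept message (headers, body and attachments) into one mbox.

    Returns the kept IDs that were never found in the source, i.e. the failed matches.
    """
    sources = sorted(os.path.join(d, "mbox")
                     for d, _, files in os.walk(source_root) if "mbox" in files)
    found = set()
    out = mailbox.mbox(output_path)  # created if missing; appends if it already exists
    out.lock()
    try:
        for path in sources:
            src = mailbox.mbox(path, create=False)
            try:
                for msg in src:  # one sequential pass per source file
                    mid = str(msg.get("Message-ID", "")).strip()
                    if mid in keep_ids:
                        out.add(msg)
                        found.add(mid)
            finally:
                src.close()
        out.flush()
    finally:
        out.unlock()
        out.close()
    return keep_ids - found
```

Because the output mbox is opened in append mode, a rerun should start from a fresh output path to avoid duplicating messages.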

---

## 💡 Lessons Learned

### 1. Toolchain

| Scenario | Tool | Notes |
|------|------|------|
| Reading PST | `readpst` | Already installed on this Mac (`/opt/homebrew/bin/readpst`) |
| PST Python library | ❌ pypff (installation blocked by PEP 668) | No replacement needed; readpst + mbox is enough |
| mbox processing | Python `mailbox` | Standard library; sufficient |
| Date parsing | `email.utils.parsedate_to_datetime` | Handles RFC 2822 dates |

### 2. Build a CSV index before processing the mbox files

- Random access into an mbox is very slow (it requires a full scan)
- Building an index first and matching by `message_id` is much more efficient
- The CSVs can be opened in Excel/Numbers, which makes manual review easy

### 3. Deletion-rule design principles

- **By folder**: system notifications (AWS/Qualys/Prisma) → bulk delete or sample
- **By attachment presence**: Teams notifications → keep only those with attachments
- **By subject sampling**: keep a representative sample of each message type so that no format is lost
- **By monthly quota**: high-frequency notifications → keep at most N per month

### 4. Pitfalls to avoid

- `mailbox.mbox` does not support slicing → iterate with `iter()`
- Subjects may contain non-ASCII characters → truncate with `[:80]` so the subject works as a stable key
- Attachment size: for `str` payloads use `len(payload.encode())`; for binary payloads use `len(payload)` directly
- Date parsing fails → file the message under an "Unknown" month

---

## 📊 Shen Wei 2025 Results

| Metric | Before | After |
|------|--------|--------|
| Total messages | 55,647 | 29,088 |
| Deleted | — | 26,559 |
| Messages with attachments | — | 20,171 (69%) |
| Total attachment size | — | 8.6 GB |
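
The derived figures in the table can be rechecked with a few lines of arithmetic. Note that the 69% is relative to the 29,088 kept messages, not to the original 55,647.

```python
total_before = 55_647
total_after = 29_088
with_attachments = 20_171

deleted = total_before - total_after              # messages removed by the rules
kept_share = total_after / total_before            # share of the original mailbox kept
attachment_share = with_attachments / total_after  # share of the *kept* messages
print(f"{deleted:,} deleted, {kept_share:.0%} kept, {attachment_share:.0%} with attachments")
```

This prints `26,559 deleted, 52% kept, 69% with attachments`, matching both the table and the 52% quoted in Stage 4.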

---

## 🔄 Batch Processing SOP (pending)

### Inputs

- Copy new PST files into `~/pst-processing/incoming/`
- The rules file is ready: `~/pst-processing/rules/delete_rules.json`

### Steps

```bash
# 1. Extract PST → mbox
mkdir -p ~/pst-processing/incoming/<pst_name>/
readpst -r ~/pst-processing/incoming/<file>.pst -o ~/pst-processing/incoming/<pst_name>/

# 2. Build the CSV index
python3 ~/pst-processing/rules/index_pst.py <source_dir> <output_csv_dir>

# 3. Apply the deletion rules (reusing the existing rules)
python3 ~/pst-processing/rules/apply_rules.py <csv_dir>

# 4. Assemble the new mbox (matching the source mbox by message_id)
python3 ~/pst-processing/rules/build_clean_mbox.py <source_mbox_dir> <output_mbox>
```

### Scripts to be developed

- [ ] `index_pst.py`: generic PST indexing script
- [ ] `batch_process.sh`: batch-processing wrapper
- [ ] `rules/delete_rules.json`: already exists and can be reused as-is
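
Until `batch_process.sh` exists, the four SOP steps can be driven from Python. This is a minimal sketch under several assumptions. Each PST's file name (without `.pst`) is used as `<pst_name>`. The CSV directory and clean-mbox locations (`<name>-csv/`, `<name>-clean.mbox`) are hypothetical choices, because the SOP leaves them as placeholders. `index_pst.py` and `build_clean_mbox.py` are still to be developed, so the commands only work once they exist. It defaults to a dry run that just prints the commands.

```python
import shlex
import subprocess
from pathlib import Path

BASE = Path.home() / "pst-processing"
RULES = BASE / "rules"


def pipeline_for(pst: Path) -> list:
    """Commands for SOP steps 1-4 for one PST (output locations are assumptions)."""
    name = pst.stem                                   # PST file name without .pst
    mbox_dir = pst.parent / name
    csv_dir = pst.parent / f"{name}-csv"              # hypothetical location
    clean_mbox = pst.parent / f"{name}-clean.mbox"    # hypothetical location
    return [
        ["mkdir", "-p", str(mbox_dir)],
        ["readpst", "-r", str(pst), "-o", str(mbox_dir)],
        ["python3", str(RULES / "index_pst.py"), str(mbox_dir), str(csv_dir)],
        ["python3", str(RULES / "apply_rules.py"), str(csv_dir)],
        ["python3", str(RULES / "build_clean_mbox.py"), str(mbox_dir), str(clean_mbox)],
    ]


def run_batch(incoming: Path = BASE / "incoming", dry_run: bool = True) -> None:
    for pst in sorted(incoming.glob("*.pst")):
        for cmd in pipeline_for(pst):
            print(shlex.join(cmd))               # quoted, so names with spaces stay intact
            if not dry_run:
                subprocess.run(cmd, check=True)  # abort on the first failing step
```

Passing argument lists to `subprocess.run` instead of a shell string avoids quoting problems with names like `Shen Wei 2025`.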

---

## 📁 Current File Layout

```
~/pst-processing/
├── rules/
│   ├── delete_rules.json          # rule definitions
│   └── apply_rules.py             # rule execution script
├── 2025/                          # original index CSVs (83 months)
│   └── YYYY-MM.csv
├── shenwei2025-clean.mbox         # cleaned full messages (9.6 GB, 29,088 messages)
├── shenwei2025-clean-index.csv    # new index (12.3 MB)
├── shenwei2025-new.mbox           # earlier version (headers only; kept)
└── extracted_2025/                # raw extracted mbox files
    └── Shen Wei 2025/
        └── */mbox
```

---

## 📅 Follow-up Tasks

- [ ] Develop the generic `index_pst.py` script
- [ ] Develop the `build_clean_mbox.py` script
- [ ] Batch-process the other PST files
- [ ] Sync the notes to Git

@@ -1,180 +1,180 @@
---
title: Integrating the Superpowers Methodology with Agent-Based Projects
source:
author: shenwei
published:
created:
description:
tags: [multi-agent, superpowers]
---

# Integrating the Superpowers Methodology with Agent-Based Projects

> **Created:** 2026-04-05
> **Source:** discussion with 比利哥
> **Tags:** #方法论 #multi-agent #superpowers

---

## 1. What Is Superpowers

Superpowers is a **software-development workflow framework** built by Jesse Vincent (Best Practical) for AI coding agents (Claude Code, Cursor, Codex, OpenCode).

**Core idea:** "Don't just jump in — step back, question, plan, then execute."

---

## 2. Core Workflow (Basic Workflow)

| Stage | When it triggers | Key output |
|---|---|---|
| `brainstorming` | Before writing code | Clarified intent, design document |
| `using-git-worktrees` | After the design is approved | Isolated working branch |
| `writing-plans` | Once the design is final | Actionable list of small tasks (2-5 minutes each) |
| `subagent-driven-development` / `executing-plans` | Once the plan is ready | Subtasks dispatched in parallel, two-stage review |
| `test-driven-development` | During implementation | RED-GREEN-REFACTOR cycle |
| `requesting-code-review` | Between tasks | Issues reported by severity |
| `finishing-a-development-branch` | When the tasks are done | Test verification + PR/merge options |

---

## 3. Core Skills

### Testing
- `test-driven-development`: RED-GREEN-REFACTOR cycle

### Debugging
- `systematic-debugging`: 4-step root-cause analysis
- `verification-before-completion`: makes sure the fix actually works

### Collaboration
- `brainstorming`: Socratic design refinement
- `writing-plans`: detailed implementation plans
- `executing-plans`: batch execution with checkpoints
- `dispatching-parallel-agents`: concurrent subagent workflows
- `requesting-code-review`: pre-review checklist
- `receiving-code-review`: responding to feedback
- `using-git-worktrees`: parallel development branches
- `finishing-a-development-branch`: merge/PR decision
- `subagent-driven-development`: two-stage review (spec compliance → code quality)

### Meta
- `writing-skills`: creating new skills
- `using-superpowers`: introduction to the skill system

---

## 4. Philosophy

- **Test-Driven Development**: write tests first, always
- **Systematic over ad-hoc**: process over guesswork
- **Complexity reduction**: simplicity is the primary goal
- **Evidence over claims**: only verified results count as success

---

## 5. Mapping to the OpenClaw Agent Matrix

### 5.1 Agent types and fit

| Agent type | Example | Superpowers fit |
|---|---|---|
| **Coding** | xingjiang (星匠) | ✅ High: usable directly |
| **Operations** | xingyao (星曜), yunhan | ✅ Medium: some skills apply |
| **Coordination** | xingshu (星枢 / me) | ❌ Low: useful as a methodology, not as an execution framework |

### 5.2 Skill mapping

| Superpowers general stage | Mapping to multi-Agent collaboration |
|---|---|
| `brainstorming` → intent clarification | On receiving an instruction, ask clarifying questions first instead of executing immediately |
| `writing-plans` → task breakdown | Break the work down and dispatch it to the yun-* and feng-* Agents |
| `verification-before-completion` → delivery check | Verify results before reporting them |
| `finishing-a-development-branch` → wrap-up and archiving | Write the results to Obsidian + notify the user |

---

## 6. Integration Paths

### Path A: Direct adoption (suited to coding subtasks)

Superpowers can be installed as a Claude Code plugin (in Skill format), but:
- **It is only available in Claude Code sessions**
- It does not automatically spread to other OpenClaw Agents

**Install commands (Claude Code):**
```
/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace
```

**Trigger examples:**

| What you say to Claude Code | Skill activated |
|---|---|
| "Help me plan this feature" | `brainstorming` |
| "Write an implementation plan" | `writing-plans` |
| "Start executing" | `executing-plans` |
| "Review my code" | `requesting-code-review` |
| "Fix this bug" | `systematic-debugging` |
| "The feature is done, help me wrap up" | `finishing-a-development-branch` |

### Path B: A "lightweight" methodology tailored to the team (recommended)

Extract the general-purpose parts of Superpowers, turn them into collaboration conventions, and apply them to every Agent.

---

## 7. Installing and Using Superpowers

### 7.1 Supported platforms

| Platform | Installation |
|---|---|
| Claude Code (official marketplace) | `/plugin install superpowers@claude-plugins-official` |
| Claude Code (plugin marketplace) | `/plugin marketplace add obra/superpowers-marketplace`, then `/plugin install superpowers@superpowers-marketplace` |
| Cursor | `/add-plugin superpowers` |
| Codex | Fetch `https://raw.githubusercontent.com/obra/superpowers/refs/heads/main/.codex/INSTALL.md` |
| OpenCode | Fetch `https://raw.githubusercontent.com/obra/superpowers/refs/heads/main/.opencode/INSTALL.md` |
| Gemini CLI | `gemini extensions install https://github.com/obra/superpowers` |

### 7.2 Verifying the installation

Start a new session and ask "Help me plan this feature" or "Help me debug"; the Agent should trigger the relevant skill automatically.

---

## 8. Suggested First Step

**Pick one concrete task and run it end to end with the Superpowers methodology as a demo:**

Example task: "Check the status of all servers and produce a report"

Workflow:

1. **Clarify intent**: a summary report or live status? What report format? What triggers it?
2. **Write up the plan** → user confirms
3. **Dispatch subtasks** → execute in parallel
4. **Cross-check** → consolidate and deliver

---

## 9. Project Information

- **Author:** Jesse Vincent (Best Practical)
- **License:** MIT
- **Repository:** https://github.com/obra/superpowers
- **Marketplace:** https://github.com/obra/superpowers-marketplace
- **Discord:** https://discord.gg/Jd8Vphy9jq
- **Blog:** https://blog.fsck.com/2025/10/09/superpowers/

---

## 10. To-Do

- [ ] Install Superpowers in Claude Code (not doing this for now)
- [ ] Pick a concrete task and run a demo
- [ ] Write the team's lightweight methodology document
- [ ] Sync the methodology into MEMORY.md

---

*Compiled by 星枢 (Xingshu), based on the 2026-04-05 discussion with 比利哥*

@@ -1,215 +1,215 @@
#!/usr/bin/env python3
"""
Agent task polling script.

Each Agent runs this periodically to look up the tasks assigned to it and execute them.
"""

import os
import sys
import time
import logging
import requests
from datetime import datetime

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Environment variables / configuration
# The Notion integration token must come from the environment; never hard-code it in source
NOTION_TOKEN = os.environ.get("NOTION_TOKEN", "")
AGENT_ID = os.environ.get("AGENT_ID", "yunjiang")  # ID of the current Agent
POLL_INTERVAL = int(os.environ.get("POLL_INTERVAL", "180"))  # polling interval in seconds (default: 3 minutes)

# Database IDs
TASKS_DB_ID = "32847fe1-da27-8135-af44-eefdbd3b1640"
AGENTS_DB_ID = "32847fe1-da27-8101-8758-d416db87d4de"

# Notion API base URL
NOTION_API_BASE = "https://api.notion.com/v1"


def notion_request(method, endpoint, **kwargs):
    """Thin wrapper around the Notion REST API."""
    if not NOTION_TOKEN:
        raise RuntimeError("NOTION_TOKEN environment variable is not set")
    url = f"{NOTION_API_BASE}{endpoint}"
    headers = {
        "Authorization": f"Bearer {NOTION_TOKEN}",
        "Notion-Version": "2022-06-28",
        "Content-Type": "application/json"
    }
    # Without a timeout, a stalled connection would block the polling loop forever
    kwargs.setdefault("timeout", 30)

    if method == "GET":
        response = requests.get(url, headers=headers, **kwargs)
    elif method == "POST":
        response = requests.post(url, headers=headers, **kwargs)
    elif method == "PATCH":
        response = requests.patch(url, headers=headers, **kwargs)
    else:
        raise ValueError(f"Unsupported method: {method}")

    response.raise_for_status()
    return response.json()


def get_agent_info(agent_id):
    """Look up this Agent's record in the Agents database."""
    # Property names ("Agent ID", "名称" = name, "状态" = status) must match the Notion schema.
    # Note: only the first page of query results (up to 100 rows) is checked.
    response = notion_request("POST", f"/databases/{AGENTS_DB_ID}/query")

    for page in response.get("results", []):
        props = page.get("properties", {})
        if "Agent ID" in props:
            title = props["Agent ID"]["title"]
            if title and title[0]["plain_text"] == agent_id:
                # "离线" ("offline") is the fallback status
                return {
                    "id": page["id"],
                    "name": props["名称"]["rich_text"][0]["plain_text"] if props["名称"]["rich_text"] else agent_id,
                    "status": props["状态"]["select"]["name"] if props["状态"].get("select") else "离线"
                }
    return None


def query_todo_tasks(agent_page_id=None):
    """Query tasks whose status is TODO."""
    logger.info(f"Querying TODO tasks for {AGENT_ID}...")

    # Filter on the status property ("状态") for TODO
    filter_dict = {
        "property": "状态",
        "select": {
            "equals": "TODO"
        }
    }

    # Simplified for now: this fetches ALL TODO tasks and agent_page_id is not used yet.
    # It should later also filter on the executor (assignee) Relation using agent_page_id.

    try:
        response = notion_request(
            "POST",
            f"/databases/{TASKS_DB_ID}/query",
            json={"filter": filter_dict}
        )
        tasks = response.get("results", [])
        logger.info(f"Found {len(tasks)} TODO task(s)")
        return tasks
    except Exception as e:
        logger.error(f"Failed to query tasks: {e}")
        return []


def claim_task(task_id):
    """Claim a task by setting its status to "进行中" (in progress)."""
    try:
        notion_request(
            "PATCH",
            f"/pages/{task_id}",
            json={
                "properties": {
                    "状态": {
                        "select": {"name": "进行中"}
                    }
                }
            }
        )
        logger.info(f"✓ Claimed task: {task_id}")
        return True
    except Exception as e:
        logger.error(f"Failed to claim task: {e}")
        return False
|
||||
|
||||
|
||||
def complete_task(task_id, report_link):
|
||||
"""完成任务:将状态改为待验收"""
|
||||
try:
|
||||
notion_request(
|
||||
"PATCH",
|
||||
f"/pages/{task_id}",
|
||||
json={
|
||||
"properties": {
|
||||
"状态": {
|
||||
"select": {"name": "待验收"}
|
||||
},
|
||||
"报告链接": {
|
||||
"url": report_link
|
||||
}
|
||||
}
|
||||
}
|
||||
)
|
||||
logger.info(f"✓ 完成任务: {task_id}")
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.error(f"完成任务失败: {e}")
|
||||
return False
|
||||
|
||||
|
||||
def execute_task(task):
|
||||
"""执行任务的逻辑(可自定义)"""
|
||||
# 获取任务信息
|
||||
props = task.get("properties", {})
|
||||
task_name = props.get("任务名", {}).get("title", [{}])[0].get("plain_text", "未命名任务")
|
||||
task_id = task["id"]
|
||||
|
||||
logger.info(f"开始执行任务: {task_name}")
|
||||
|
||||
# 1. 领取任务
|
||||
if not claim_task(task_id):
|
||||
return False
|
||||
|
||||
# 2. 执行任务(这里只是示例,实际应根据任务类型执行不同操作)
|
||||
# 模拟执行
|
||||
time.sleep(2)
|
||||
|
||||
# 3. 完成任务(生成报告链接)
|
||||
# 这里应该生成实际的 Obsidian 报告
|
||||
report_link = f"https://example.com/report/{task_id}"
|
||||
|
||||
return complete_task(task_id, report_link)
|
||||
|
||||
|
||||
def polling_loop():
|
||||
"""轮询主循环"""
|
||||
logger.info(f"🚀 Agent {AGENT_ID} 任务轮询启动")
|
||||
logger.info(f"轮询间隔: {POLL_INTERVAL} 秒")
|
||||
|
||||
# 获取 Agent 信息
|
||||
agent_info = get_agent_info(AGENT_ID)
|
||||
if agent_info:
|
||||
logger.info(f"Agent 信息: {agent_info['name']} (状态: {agent_info['status']})")
|
||||
else:
|
||||
logger.warning(f"未找到 Agent: {AGENT_ID}")
|
||||
|
||||
while True:
|
||||
try:
|
||||
# 查询 TODO 任务
|
||||
tasks = query_todo_tasks(None)
|
||||
|
||||
if tasks:
|
||||
logger.info(f"发现 {len(tasks)} 个待处理任务")
|
||||
for task in tasks:
|
||||
execute_task(task)
|
||||
else:
|
||||
logger.debug("没有待处理任务")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"轮询异常: {e}")
|
||||
|
||||
time.sleep(POLL_INTERVAL)
|
||||
|
||||
|
||||
def main():
|
||||
"""主入口"""
|
||||
if len(sys.argv) > 1:
|
||||
global AGENT_ID
|
||||
AGENT_ID = sys.argv[1]
|
||||
|
||||
if len(sys.argv) > 2:
|
||||
global POLL_INTERVAL
|
||||
POLL_INTERVAL = int(sys.argv[2])
|
||||
|
||||
polling_loop()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -1,300 +1,300 @@
# Audio/Video Transcription and Summarization Pipeline

> Document version: 2026-04-15
> Maintainer: 星枢 (xingshu)
> Status: ✅ Verified working
> Use cases: transcribing instructional videos, lectures, podcasts, meeting recordings, or any other audio/video content and importing it into the knowledge base

---

## 1. Overall Architecture

```
Video/audio source (local or NAS)
│
▼
[Stage 1] FFmpeg audio extraction
│ MP4/AVI/MKV → MP3
▼
Local/shared MP3 library
│
▼
[Stage 2] Whisper transcription
│ MP3 → English transcript
▼
Local transcript
│
▼
[Stage 3] Gemini Flash summarization
│ transcript → structured Chinese notes
▼
Obsidian knowledge base
```

---

## 2. Stage Details

### Stage 1: FFmpeg Audio Extraction

| Item | Details |
|---|---|
| **Input** | `.mp4`, `.avi`, `.mkv`, `.mov`, and other common video formats |
| **Output** | An `.mp3` file with the same name in the same directory |
| **Tool** | FFmpeg (installed on the Mac mini at `/opt/homebrew/bin/ffmpeg`) |
| **Encoding parameters** | `-vn -acodec libmp3lame -ab 64k -ar 22050 -ac 1` (64 kbps CBR mono, tuned for speech) |
| **Transfer** | `ssh cat` pipe (the NAS does not need to be mounted) |
| **Speed** | ~400x realtime (1 hour of video ≈ 9 s to extract) |

**Example commands:**

```bash
# NAS → FFmpeg on the Mac mini → write back to the NAS
ssh shenwei@192.168.3.17 "cat '/path/to/video/VIDEO.mp4'" \
  | /opt/homebrew/bin/ffmpeg -i pipe:0 -vn -acodec libmp3lame -ab 64k -ar 22050 -ac 1 -f mp3 pipe:1 \
  | ssh shenwei@192.168.3.17 "cat > '/path/to/video/VIDEO.mp3'"

# Convert a local file directly
/opt/homebrew/bin/ffmpeg -i "/path/to/video/VIDEO.mp4" -vn -acodec libmp3lame -ab 64k -ar 22050 -ac 1 "/path/to/video/VIDEO.mp3"
```
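
A minimal Python sketch of the local conversion, assuming the script shells out to the FFmpeg binary listed above (the real `nas_audio_extract_v3.py` may differ; `extract_cmd` and `extract` are hypothetical names). Passing an argument list to `subprocess.run` bypasses the shell, so file names with spaces or parentheses need no escaping. One caveat for the NAS pipe variant: FFmpeg may fail to read an MP4 from a pipe when the file's index (the `moov` atom) sits at the end of the file, because a pipe cannot be seeked; the local form avoids this.

```python
import subprocess
from pathlib import Path

FFMPEG = "/opt/homebrew/bin/ffmpeg"  # path from the table above
# Encoding parameters from the table: drop video, 64 kbps MP3, 22.05 kHz, mono
AUDIO_ARGS = ["-vn", "-acodec", "libmp3lame", "-ab", "64k", "-ar", "22050", "-ac", "1"]


def extract_cmd(video: str) -> list[str]:
    """ffmpeg argv that writes <same name>.mp3 next to the input video."""
    mp3 = str(Path(video).with_suffix(".mp3"))
    return [FFMPEG, "-i", video, *AUDIO_ARGS, mp3]


def extract(video: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(extract_cmd(video), check=True)
```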

**Script location:** `~/.openclaw/temp/xingshu/scripts/nas_audio_extract_v3.py`

---

### Stage 2: Whisper Transcription

| Item | Details |
|---|---|
| **Input** | `.mp3` files (produced by Stage 1, or supplied directly) |
| **Output** | English transcript (plain text) |
| **Tool** | `openai-whisper` (Python package, installed with `pip install openai-whisper`) |
| **Model** | `small` (balances accuracy and speed; runs well on Apple Silicon) |
| **Hardware** | Runs locally on the Mac mini (PyTorch on CPU, hence `fp16=False` below) |
| **Speed** | ~50x realtime (about 72 s per hour of audio) |
| **Memory** | ~1.5 GB (`small` model) |
| **Cost** | Free (runs locally, no API) |

**Install:**

```bash
pip3 install openai-whisper
```

Whisper also calls the `ffmpeg` command-line tool to decode audio; it is already installed on the Mac mini (see Stage 1).

**Usage example:**

```python
import whisper

model = whisper.load_model("small")
# fp16=False: FP16 is not supported on CPU, so decode in FP32
result = model.transcribe("audio.mp3", language="en", fp16=False)
print(result["text"])  # English transcript
```
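
Stage 2 can emit subtitles as well as plain text. In addition to `result["text"]`, `transcribe()` returns a `segments` list whose entries carry `start`/`end` times in seconds and the segment `text`. The sketch below turns those segments into SRT; `to_srt` and `srt_time` are hypothetical helpers, and it is not claimed that the pipeline script writes SRT files today.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Number each segment and emit 'index / start --> end / text' blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        timing = f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}"
        blocks.append(f"{i}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Usage with the Stage 2 result:
# result = model.transcribe("audio.mp3", language="en", fp16=False)
# with open("audio.srt", "w", encoding="utf-8") as f:
#     f.write(to_srt(result["segments"]))
```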

**Transcript length reference:** 1 hour of audio ≈ 6,000-8,000 tokens (English)

**Choosing a model:**

| Model | Memory | Speed | Accuracy | Use case |
|---|---|---|---|---|
| `tiny` | ~1 GB | Very fast | Low | Quick previews, low-noise content |
| `small` | ~1.5 GB | Fast | Medium | **Recommended for daily use** |
| `medium` | ~3 GB | Slow | High | Important content, heavy accents or dialects |
| `large` | ~5 GB+ | Very slow | Highest | Maximum accuracy (not recommended on the Mac mini) |

---

### Stage 3: Gemini Flash Summarization

| Item | Details |
|---|---|
| **Input** | English transcript from Whisper |
| **Output** | Structured Chinese notes (summary + key concepts + related links) |
| **Tool** | Google Gemini API (direct HTTP calls, no dependency on the summarize CLI) |
| **Model** | `gemini-3-flash-preview` |
| **Cost** | ~$0.075 per million input tokens (3,000 minutes of audio ≈ ~$0.15) |
| **API key** | *(redacted; keep the key out of version control)* (must be valid) |
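
A minimal standard-library sketch of the direct HTTP call. It assumes the key is exported as `GEMINI_API_KEY` (a name chosen here, not taken from the script) and sends it in the `x-goog-api-key` header so it does not end up in URLs or logs. The prompt wording is left to the caller, and `build_payload`, `extract_text`, and `summarize` are hypothetical helpers, not functions from `nas_whisper_gemini_summarize.py`.

```python
import json
import os
import urllib.request

API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def build_payload(transcript: str, prompt: str) -> dict:
    """generateContent request body: one turn holding the prompt and the transcript."""
    return {"contents": [{"parts": [{"text": f"{prompt}\n\n{transcript}"}]}]}


def extract_text(response: dict) -> str:
    """Concatenate the text parts of the first candidate."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(p.get("text", "") for p in parts)


def summarize(transcript: str, prompt: str, model: str = "gemini-3-flash-preview") -> str:
    """POST the request and return the generated text."""
    req = urllib.request.Request(
        API_URL.format(model=model),
        data=json.dumps(build_payload(transcript, prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": os.environ["GEMINI_API_KEY"],  # assumed env var name
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_text(json.load(resp))
```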

**Summary output format** (the headings stay in Chinese because the notes are written in Chinese, and the pipeline script locates sections by the literal `## 摘要` and `## 相关笔记` headings):

```markdown
## 摘要

> [300-500-character summary in Chinese]

---

## 关键概念

- **[Concept name]**: [one-sentence explanation]

---

## 相关笔记

> [!info]+ 交叉引用
> [[Related note title]] — reason for the link
```

---

## 3. Full Pipeline Script

**Script location:** `~/.openclaw/temp/xingshu/scripts/nas_whisper_gemini_summarize.py`

**Core logic:**
1. Scan the configured directory for `.mp3` files, skipping any already recorded in the `.done` file
2. Download each one to a local temp directory on the Mac mini via `ssh cat`
3. Transcribe with Whisper `small`
4. Summarize with Gemini Flash
5. Update the matching Obsidian note, replacing everything between `## 摘要` and `## 相关笔记`
6. Delete the local temp file
7. Append the file to the `.done` progress file, so an interrupted run can resume
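
A minimal sketch of the resume logic in steps 1 and 7, assuming the `.done` file holds one processed file name per line (consistent with counting it with `wc -l` in the quick commands). The helper names are hypothetical. Recording a file only after its note is written means an interrupted run retries that file instead of skipping it.

```python
from pathlib import Path


def load_done(done_file: Path) -> set[str]:
    """Names recorded by earlier runs (one per line); empty if the file is missing."""
    if not done_file.exists():
        return set()
    lines = done_file.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def pending(mp3_names: list[str], done: set[str]) -> list[str]:
    """Files still to process, in a stable (sorted) order."""
    return sorted(name for name in mp3_names if name not in done)


def mark_done(done_file: Path, name: str) -> None:
    """Append one name; call only after the Obsidian note has been updated."""
    with done_file.open("a", encoding="utf-8") as f:
        f.write(name + "\n")
```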

**Start command:**

```bash
cd ~/.openclaw/temp/xingshu
nohup python3 scripts/nas_whisper_gemini_summarize.py > nas_whisper_summarize_stdout.log 2>&1 &
echo "PID=$!"
```

**Check progress:**

```bash
tail -f ~/.openclaw/temp/xingshu/logs/nas_whisper_summarize.log
```

---

## 4. Obsidian Note Template

Notes live in `~/Workspace/nexus/knowledgebase/`.

```markdown
---
title: "Audio/video title"
type: transcription
source-type: video
category: "your/category/path"
tags:
- tag1
- tag2
date-added: 2026-04-15
audio-source: "/path/to/audio.mp3"
transcript-source: "/path/to/transcript.txt"
status: summarized # raw → summarized
---

# Audio/video title

**Source:** /path/to/video.mp4
**Type:** VIDEO | **Category:** your category

**Status:** ✅ Done

---

## 摘要

> [Chinese summary generated by Gemini Flash]

---

## 关键概念

- **[Concept name]**: [one-sentence explanation]

---

## 相关笔记

> [!info]+ 交叉引用
> [[Related note title]] — reason for the link
```
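
Step 5 of the pipeline rewrites only the region between the `## 摘要` and `## 相关笔记` headings, leaving the frontmatter and the related-notes section intact. The following regex-based sketch shows that replacement. `replace_summary_block` is a hypothetical name, and the sketch assumes each heading appears once, on its own line.

```python
import re

START = "## 摘要"    # summary heading
END = "## 相关笔记"  # related-notes heading

_BLOCK = re.compile(
    rf"^{re.escape(START)}\s*$.*?(?=^{re.escape(END)}\s*$)",
    re.MULTILINE | re.DOTALL,
)


def replace_summary_block(note: str, new_block: str) -> str:
    """Replace text from the 摘要 heading up to (not including) the 相关笔记 heading.

    new_block should itself start with "## 摘要". Raises ValueError when either
    heading is missing, so a malformed note is never silently rewritten.
    """
    if not _BLOCK.search(note):
        raise ValueError("note lacks the 摘要 / 相关笔记 headings")
    # A function replacement avoids backslash escapes in new_block being interpreted
    return _BLOCK.sub(lambda _m: new_block.rstrip("\n") + "\n\n", note, count=1)
```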

---

## 5. API Key Summary

| Service | Key | Purpose | Status |
|---|---|---|---|
| Google Gemini | *(redacted; keep the key out of version control)* | Summary generation | ✅ Valid |

---

## 6. Cost Estimate (100 videos, ~3,000 minutes total)

| Stage | Tool | Cost |
|---|---|---|
| Audio extraction | FFmpeg | $0 |
| Transcription | Whisper (local) | $0 |
| Summarization | Gemini Flash | ~$0.15 |
| **Total** | | **~$0.15** |
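
This is a quick check of the arithmetic behind the Gemini line, using the document's own figures. 3,000 minutes is 50 hours; the transcript runs 6,000-8,000 tokens per hour (Stage 2); and input costs $0.075 per million tokens (Stage 3). The transcripts alone come to roughly $0.02-0.03. The ~$0.15 total therefore presumably also covers the prompt text and the output tokens, which are billed separately and are not modeled in this sketch.

```python
def input_cost_usd(total_minutes: float, tokens_per_hour: float, usd_per_million: float) -> float:
    """Gemini input cost for the transcripts alone (no prompt or output tokens)."""
    total_tokens = total_minutes / 60 * tokens_per_hour
    return total_tokens / 1_000_000 * usd_per_million


low = input_cost_usd(3000, 6000, 0.075)   # 300k tokens
high = input_cost_usd(3000, 8000, 0.075)  # 400k tokens
print(f"transcript input cost: ${low:.4f} to ${high:.4f}")
```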
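
---

## 7. Known Limitations and Caveats

1. **The Gemini API key must be valid**: check its status every 24 hours
2. **Whisper model choice**: `tiny` is fastest but least accurate, `small` is balanced, and the Mac mini lacks the memory for `medium`/`large`
3. **Audio quality**: Whisper is sensitive to audio quality; background noise degrades the transcript
4. **Resume support**: the script records finished files in the `.done` file, so a restart does not reprocess them
5. **File names**: spaces and special characters such as `(` and `)` in file names must be quoted correctly
6. **Language**: transcription defaults to English; for another language pass e.g. `language="zh"`, or omit `language` (`language=None`) to let Whisper auto-detect it
7. **Video source path**: set the `SOURCE_DIR` variable in the script to wherever the files actually live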
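
The `ssh cat` commands in Stage 1 wrap paths in hand-written single quotes, which break if a file name itself contains a `'`. Because ssh passes its command string to the remote shell, a safer approach is to quote the path with `shlex.quote` and pass the result as an argument list. A minimal sketch with hypothetical helper names:

```python
import shlex


def remote_cat(user: str, host: str, path: str) -> list[str]:
    """ssh argv that streams a remote file to stdout, path quoted for the remote shell."""
    return ["ssh", f"{user}@{host}", f"cat {shlex.quote(path)}"]


def remote_write(user: str, host: str, path: str) -> list[str]:
    """ssh argv that writes stdin to a remote file, path quoted for the remote shell."""
    return ["ssh", f"{user}@{host}", f"cat > {shlex.quote(path)}"]
```

Each list can be handed to `subprocess.run` / `subprocess.Popen` without `shell=True`, so the only shell that interprets the path is the remote one.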

---

## 8. Quick Command Reference

```bash
# Audio extraction progress
tail -n 10 ~/.openclaw/temp/xingshu/logs/nas_audio_v3.log

# Transcription/summarization progress
tail -f ~/.openclaw/temp/xingshu/logs/nas_whisper_summarize.log

# Count mp3 files in a directory
ssh shenwei@192.168.3.17 "ls '/path/to/videos/'*.mp3 2>/dev/null | wc -l"

# Count processed summaries
wc -l < ~/.openclaw/temp/xingshu/nas_whisper_summarize.done

# Quick Whisper test
python3 -c "import whisper; m=whisper.load_model('small'); print(m.transcribe('test.mp3', fp16=False)['text'][:100])"

# Quick Gemini test (assumes the key is exported as GEMINI_API_KEY)
curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents":[{"parts":[{"text":"say hi in 3 words"}]}]}'
```

---

## 9. Configuration

### Changing the video source path

Edit `nas_whisper_gemini_summarize.py` and change these variables:

```python
# NAS source video directory (contains the mp4 files)
SOURCE_DIR = "/volume2/work/Public Cloud Learning Sessions/"

# NAS output audio directory (where the mp3 files go)
AUDIO_DIR = "/volume2/work/Public Cloud Learning Sessions/"

# Obsidian notes directory
OBSIDIAN_NOTE_DIR = "/Users/weishen/Workspace/nexus/knowledgebase/"

# NAS SSH connection
NAS_HOST = "192.168.3.17"
NAS_USER = "shenwei"
```

### Adapting to other video sources

```python
# Option 1: a different NAS path
SOURCE_DIR = "/volume2/work/OtherVideos/"

# Option 2: a local directory (no SSH needed)
SOURCE_DIR = "/Users/weishen/Workspace/videos/"

# Option 3: a different server
NAS_HOST = "192.168.3.45"  # Ubuntu2
```
@@ -1,303 +1,303 @@
---
title: AgentBase Project Design Document
source:
author: shenwei
published:
created:
description:
tags: []
---

# AgentBase Project Design Document

> Design date: 2026-04-05
> Status: approved, awaiting implementation
> Project location: `~/Workspace/agentbase/`

---

## 1. Project Overview

**Project name**: agentbase
**Project type**: Django web application + data archiving system
**Core goal**: walk the OpenClaw agents' session JSONL files, parse them into MariaDB, and provide a web query interface through Django Admin.

**Use cases**:
- Find all of an agent's conversations on a given day
- Find which tools an agent called while working on a task (especially the exact `exec` commands)
- Inspect an agent's reasoning (thinking blocks)
- Parse incrementally, touching only files that have not been processed yet

---

## 2. Database Design

### 2.1 Servers and agents

| Server | Identifier | Session root directory |
|--------|------|----------------|
| Mac Mini | `macmini` | `/Users/weishen/.openclaw/agents/` |
| Ubuntu1 | `ubuntu1` | `/home/shenwei/.openclaw/agents/` |
| Ubuntu2 | `ubuntu2` | `/home/shenwei/.openclaw/agents/` |

**Agents included**: main, xingyao, xinghui, xingjiang, opencode, sisyphus

### 2.2 Session file types

Every file whose name contains `.jsonl` is parsed, including:
- `*.jsonl`: regular sessions
- `*.jsonl.reset.*`: reset snapshots
- `*.jsonl.deleted.*`: deletion snapshots
- `*-topic-*.jsonl`: thread (topic) sessions

**Determining session_type**:
- file name contains `.reset.` → `reset`
- file name contains `.deleted.` → `deleted`
- file name contains `-topic-` → `topic`
- otherwise → `normal`

**Ignored file**: `sessions.json` (index file, not parsed)
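
The file-type and naming rules above can be sketched as follows. The sketch assumes the rules are checked in the order listed (so `a-topic-1.jsonl.reset.x` counts as `reset`) and that `session_uuid` is everything before the first `.jsonl`; both are readings of this section, not confirmed behavior, and the function names are hypothetical. One consequence worth confirming: a `.reset.` or `.deleted.` snapshot then yields the same UUID as its live file, so the `UNIQUE(session_uuid, server, agent_id)` key on `sessions` (2.3) would treat the two as one row.

```python
from pathlib import PurePosixPath


def session_type(filename: str) -> str:
    """Classify a session file, checking the rules in the order listed."""
    if ".reset." in filename:
        return "reset"
    if ".deleted." in filename:
        return "deleted"
    if "-topic-" in filename:
        return "topic"
    return "normal"


def is_session_file(path: str) -> bool:
    """Any name containing ".jsonl" counts; the sessions.json index never matches."""
    name = PurePosixPath(path).name
    return ".jsonl" in name and name != "sessions.json"


def session_uuid(filename: str) -> str:
    """Drop ".jsonl" and anything after it (e.g. ".reset.<suffix>")."""
    return filename.split(".jsonl", 1)[0]
```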

### 2.3 Database schema

#### Table: `parsed_files` (incremental parse control)

| Column | Type | Description |
|------|------|------|
| `id` | INT AUTO_INCREMENT | Primary key |
| `server` | VARCHAR(32) | Server identifier |
| `agent_id` | VARCHAR(64) | Agent ID |
| `file_path` | VARCHAR(512) | Absolute file path |
| `file_mtime` | BIGINT | Last modification time (Unix timestamp) |
| `file_size` | BIGINT | File size (bytes) |
| `status` | VARCHAR(16) | `pending` / `success` / `failed` |
| `parsed_at` | DATETIME | Time of the latest parse |
| `error_message` | TEXT | Error message on failure (nullable) |

**UNIQUE constraint**: `UNIQUE(server, agent_id, file_path)`

**Incremental parse logic**:

```sql
-- Before parsing: skip the file if this exact version (same mtime and size) was already parsed successfully
SELECT 1 FROM parsed_files
WHERE server = ? AND agent_id = ? AND file_path = ?
  AND file_mtime = ? AND file_size = ? AND status = 'success'
LIMIT 1;
```
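
The following self-contained sketch of the skip rule uses SQLite in place of MariaDB so it runs anywhere. The upsert uses SQLite's `ON CONFLICT ... DO UPDATE`; the MariaDB equivalent is `INSERT ... ON DUPLICATE KEY UPDATE`. The table is trimmed to the columns the rule needs, and the function names are hypothetical.

```python
import sqlite3

# Trimmed SQLite stand-in for the MariaDB table
SCHEMA = """
CREATE TABLE parsed_files (
    server TEXT, agent_id TEXT, file_path TEXT,
    file_mtime INTEGER, file_size INTEGER, status TEXT,
    UNIQUE (server, agent_id, file_path)
)"""


def should_skip(conn, server, agent_id, path, mtime, size) -> bool:
    """True when this exact file version was already parsed successfully."""
    row = conn.execute(
        "SELECT 1 FROM parsed_files WHERE server=? AND agent_id=? AND file_path=?"
        " AND file_mtime=? AND file_size=? AND status='success' LIMIT 1",
        (server, agent_id, path, mtime, size),
    ).fetchone()
    return row is not None


def record(conn, server, agent_id, path, mtime, size, status) -> None:
    """Upsert on the UNIQUE key so a changed file overwrites its previous row."""
    conn.execute(
        "INSERT INTO parsed_files VALUES (?, ?, ?, ?, ?, ?)"
        " ON CONFLICT (server, agent_id, file_path) DO UPDATE SET"
        " file_mtime=excluded.file_mtime, file_size=excluded.file_size,"
        " status=excluded.status",
        (server, agent_id, path, mtime, size, status),
    )
```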

---

#### Table: `sessions` (session root records)

| Column | Type | Description |
|------|------|------|
| `id` | INT AUTO_INCREMENT | Primary key |
| `server` | VARCHAR(32) | Server identifier |
| `agent_id` | VARCHAR(64) | Agent ID |
| `session_uuid` | VARCHAR(64) | Session UUID (file name with `.jsonl` and any following suffix removed) |
| `file_path` | VARCHAR(512) | Source file path |
| `session_type` | VARCHAR(32) | `normal` / `topic` / `reset` / `deleted` |
| `cwd` | VARCHAR(512) | Session working directory |
| `started_at` | DATETIME(3) | Session start time |
| `first_message_at` | DATETIME(3) | Time of the first message |
| `last_message_at` | DATETIME(3) | Time of the last message |
| `message_count` | INT | Number of messages in the session |
| `created_at` | DATETIME | Record creation time |

**UNIQUE constraint**: `UNIQUE(session_uuid, server, agent_id)`

---

#### Table: `messages` (message content)

| Column | Type | Description |
|------|------|------|
| `id` | INT AUTO_INCREMENT | Primary key |
| `session_id` | INT | Foreign key → `sessions.id` |
| `server` | VARCHAR(32) | Server identifier |
| `agent_id` | VARCHAR(64) | Agent ID |
| `session_uuid` | VARCHAR(64) | Session UUID (denormalized for easier queries) |
| `message_uuid` | VARCHAR(64) | Message UUID |
| `parent_message_uuid` | VARCHAR(64) | Parent message UUID (nullable) |
| `role` | VARCHAR(32) | `user` / `assistant` / `toolResult` |
| `content_blocks` | JSON | **Raw**: the original content array |
| `text_preview` | VARCHAR(512) | Plain-text preview (first 512 characters) |
| `first_tool_call` | VARCHAR(128) | Name of the first toolCall |
| `tool_call_count` | INT | Total number of toolCalls |
| `tool_calls_json` | JSON | **Extracted**: all toolCall blocks |
| `thinking_text` | TEXT | **Extracted**: the first thinking block |
| `has_thinking` | TINYINT | Whether there is a thinking block (0/1) |
| `has_tool_calls` | TINYINT | Whether there is a toolCall (0/1) |
| `is_error` | TINYINT | Message-level isError flag |
| `provider` | VARCHAR(64) | AI provider (e.g. `minimax`) |
| `model` | VARCHAR(128) | Model ID (e.g. `MiniMax-M2.7`) |
| `api` | VARCHAR(64) | API type (e.g. `anthropic-messages`) |
| `stop_reason` | VARCHAR(64) | Value of the stopReason field |
| `input_tokens` | INT | Input token count |
| `output_tokens` | INT | Output token count |
| `cache_read_tokens` | BIGINT | Cache-read token count |
| `cache_write_tokens` | BIGINT | Cache-write token count |
| `total_tokens` | INT | Total token count |
| `cost_usd` | DECIMAL(12,8) | Cost of this message in USD |
| `timestamp` | DATETIME(3) | Message time |
| `created_at` | DATETIME | Record creation time |

**Indexes**:
- `idx_agent_timestamp` ON `(agent_id, timestamp)`
- `idx_first_tool_call` ON `(first_tool_call)`
- `idx_session_id` ON `(session_id)`
- `idx_role` ON `(role)`

### 2.4 Content block structure

Each `type:message` record in the JSONL has a `content` array whose members have these types:

| content block type | Key fields to extract |
|---|---|
| `text` | `text` (plain text) |
| `thinking` | `thinking`, `thinkingSignature` |
| `toolCall` | `id`, `name`, `arguments` (JSON object) |
| `toolResult` | `toolCallId`, `toolName`, `content`, `isError`, `details` |
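
The denormalized columns of `messages` can be derived from the `content` array using only the block types and fields listed above. The minimal sketch below uses a hypothetical function name, `derive_fields`. It returns `tool_calls_json` as a JSON string, which suits a MariaDB `JSON` column, and it takes `thinking_text` from the first thinking block, as the schema specifies.

```python
import json


def derive_fields(content: list[dict]) -> dict:
    """Compute the toolCall/thinking columns of a messages row from its content array."""
    tool_calls = [b for b in content if b.get("type") == "toolCall"]
    thinking = [b for b in content if b.get("type") == "thinking"]
    return {
        "first_tool_call": tool_calls[0].get("name") if tool_calls else None,
        "tool_call_count": len(tool_calls),
        "tool_calls_json": json.dumps(tool_calls, ensure_ascii=False),
        "thinking_text": thinking[0].get("thinking") if thinking else None,
        "has_thinking": int(bool(thinking)),
        "has_tool_calls": int(bool(tool_calls)),
    }
```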
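
**`text_preview` extraction rules** (in priority order):
1. Take the `text` field of the first block with `type=="text"`
2. Failing that, take the `thinking` field of a block with `type=="thinking"`
3. Failing that, take the text of the first child block with `type=="text"` inside a `type=="toolResult"` block
4. Strip HTML, then keep the first 512 characters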
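
The four rules can be sketched as below. The sketch assumes a `toolResult` block's `content` is a list of child blocks, as rule 3 implies. It strips tags with a simple regex plus `html.unescape`; a real HTML parser would be more robust. `text_preview` is a hypothetical helper name.

```python
import re
from html import unescape
from typing import Optional

TAG_RE = re.compile(r"<[^>]+>")


def _tool_result_text(block: dict) -> Optional[str]:
    """Rule 3 helper: text of the first child block with type == "text"."""
    for child in block.get("content") or []:
        if isinstance(child, dict) and child.get("type") == "text":
            return child.get("text")
    return None


def text_preview(content: list[dict], limit: int = 512) -> str:
    """Rules 1-3 in priority order, then strip HTML and keep the first `limit` chars."""
    candidates = (
        [b.get("text") for b in content if b.get("type") == "text"]  # rule 1
        + [b.get("thinking") for b in content if b.get("type") == "thinking"]  # rule 2
        + [_tool_result_text(b) for b in content if b.get("type") == "toolResult"]  # rule 3
    )
    raw = next((c for c in candidates if c), "")
    plain = unescape(TAG_RE.sub("", raw)).strip()
    return plain[:limit]
```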

---

## 3. Django Project Structure

```
~/Workspace/agentbase/            # Git repository root
├── manage.py
├── agentbase/                    # Django project
│   ├── __init__.py
│   ├── settings.py               # database settings live here
│   ├── urls.py
│   └── wsgi.py
├── messages/                     # Django app
│   ├── __init__.py
│   ├── models.py                 # ParsedFile / Session / Message
│   ├── admin.py                  # Django Admin configuration
│   ├── views.py                  # web query views
│   ├── urls.py
│   ├── management/
│   │   └── commands/
│   │       └── parse_sessions.py # Django management command
│   └── templates/
│       └── messages/
│           └── message_list.html
├── scripts/
│   └── parse_and_import.py       # entry script invoked by the OpenClaw cron job
├── tests/
├── requirements.txt
└── README.md
```

Note: an app named `messages` gets the default app label `messages`. That label clashes with `django.contrib.messages`, which the admin requires in `INSTALLED_APPS`, and Django refuses to start when two apps share a label. Give this app a distinct `label` in its `AppConfig`, or choose another package name.

### 3.1 Configuration

Database connection settings go directly in `settings.py` (no config.yaml).

### 3.2 Expected Django Admin interface

**Messages (core)**:
- The list view filters by `agent_id` + `timestamp` range + `role` + `first_tool_call`
- The detail view shows the raw `content_blocks` JSON, `thinking_text`, and `tool_calls_json`
- Suited to questions like "all of agent X's conversations and reasoning on day Y"

**Sessions**: lists all sessions, filterable by server/agent_id.

**ParsedFiles**: lists the parsed files, searchable by server/agent_id/file_path.

---

## 4. Parsing Script Design

### 4.1 Entry script

`scripts/parse_and_import.py`

- Invoked by an OpenClaw cron job
- Can also run standalone: `python parse_and_import.py --server macmini --agent xingyao`
- Loads the Django settings internally and writes to the database through the Django ORM

### 4.2 Parsing flow

1. **Walk servers and agents**: macmini/ubuntu1/ubuntu2 → each agent directory
2. **Scan session files**: list every file whose name contains `.jsonl` (including reset/deleted snapshots), skipping `sessions.json`
3. **Incremental check**: look up `parsed_files`; if the file's mtime and size are unchanged and status=success, skip it
4. **Parse the JSONL**: read line by line, extracting the `type:session` and `type:message` records
5. **Write to the database**: insert into `sessions` first, then bulk-insert `messages`
6. **Update `parsed_files`**: status=success, or status=failed plus error_message

### 4.3 OpenClaw cron job

Runs daily at 00:05, staggered after the 00:00 photo-organizing job:

```bash
python ~/Workspace/agentbase/scripts/parse_and_import.py
```

- Runs on the Mac Mini
- Reaches the ubuntu1/2 session directories over SSH
- Parses the day's new sessions for every agent
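
Because the Ubuntu session directories are read over SSH, the incremental check needs each remote file's mtime and size. One approach, sketched below, lists files as `mtime size path` lines and parses them. It assumes GNU `find` on the Ubuntu hosts (`-printf` is a GNU extension) and uses a hypothetical host alias and helper names. The mtime is truncated to whole seconds to match the `file_mtime` BIGINT column. Local Mac Mini files would need the same truncation (e.g. `int(os.stat(p).st_mtime)`) for the comparison to line up.

```python
import shlex


def remote_list_cmd(host: str, root: str) -> list[str]:
    """ssh argv listing session files as 'mtime size path' lines (GNU find)."""
    remote = f"find {shlex.quote(root)} -type f -name '*.jsonl*' -printf '%T@ %s %p\\n'"
    return ["ssh", host, remote]


def parse_listing(output: str) -> list[tuple[int, int, str]]:
    """Turn 'mtime size path' lines into (mtime, size, path); paths may contain spaces."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        mtime, size, path = line.split(" ", 2)
        rows.append((int(float(mtime)), int(size), path))
    return rows
```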
|
||||
|
||||
---
|
||||
|
||||
## 5. 典型查询示例
|
||||
|
||||
### 5.1 查某 Agent 某天的所有消息
|
||||
|
||||
```sql
|
||||
SELECT id, message_uuid, role, text_preview, has_thinking, first_tool_call, timestamp
|
||||
FROM messages
|
||||
WHERE agent_id = 'xingyao'
|
||||
AND timestamp BETWEEN '2026-04-05 00:00:00' AND '2026-04-05 23:59:59'
|
||||
ORDER BY timestamp;
|
||||
```
|
||||
|
||||
### 5.2 查某 Agent 某天调用过的所有 exec 命令
|
||||
|
||||
```sql
|
||||
SELECT m.id, m.timestamp, m.text_preview, m.session_uuid
|
||||
FROM messages m
|
||||
WHERE m.agent_id = 'xingyao'
|
||||
AND m.timestamp BETWEEN '2026-04-05 00:00:00' AND '2026-04-05 23:59:59'
|
||||
AND m.first_tool_call = 'exec';
|
||||
```
|
||||
|
||||
### 5.3 查包含 thinking 的消息
|
||||
|
||||
```sql
|
||||
SELECT id, message_uuid, role, thinking_text, timestamp
|
||||
FROM messages
|
||||
WHERE agent_id = 'xingyao'
|
||||
AND has_thinking = 1
|
||||
AND timestamp BETWEEN '2026-04-05 00:00:00' AND '2026-04-05 23:59:59';
|
||||
```
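
For the exec lookup in 5.2, the command text lives in `tool_calls_json`, not in `text_preview`. Below is a small hypothetical helper, not part of the design, that collects the arguments of every `exec` block from that column. It assumes the command is stored under an `arguments` key named `command`. The design only says a toolCall block has `id`, `name`, and `arguments`, so that key is a guess.

```python
import json
from typing import Any, List, Union

ToolCalls = Union[str, List[dict], None]


def exec_commands(tool_calls_json: ToolCalls, tool_name: str = "exec",
                  arg_key: str = "command") -> List[Any]:
    """Collect arguments[arg_key] from every toolCall block named tool_name.

    arg_key="command" is an ASSUMPTION about how exec arguments are keyed.
    """
    if not tool_calls_json:
        return []
    # Raw SQL clients may return the JSON column as text; an ORM JSONField returns a list.
    blocks = json.loads(tool_calls_json) if isinstance(tool_calls_json, str) else tool_calls_json
    commands = []
    for block in blocks:
        if block.get("name") != tool_name:
            continue
        args = block.get("arguments") or {}
        if arg_key in args:
            commands.append(args[arg_key])
    return commands
```

The helper scans every block, so it also finds `exec` calls that are not the message's first tool call.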

---

## 6. Open Questions

- [ ] Database engine: MariaDB? SQLite (for development/testing)?
- [ ] Database name
- [ ] Where will MariaDB be deployed? (Locally on the Mac Mini? The NAS? A dedicated server?)

---

## 7. Next Steps

1. User confirms the design above
2. Create the Git repository `~/Workspace/agentbase/`
3. Initialize the Django project
4. Write `settings.py` (including the database configuration)
5. Write `messages/models.py` (the three tables)
6. Write `messages/admin.py` (Admin configuration)
7. Write `scripts/parse_and_import.py` (parsing entry point)
8. Write `messages/management/commands/parse_sessions.py` (Django command)
9. Configure the Django Admin template (`message_list.html`)
10. Create the OpenClaw cron job
@@ -1,377 +1,377 @@
---
title: AgentBase Requirements Document
source:
author: shenwei
published:
created:
description:
tags: []
---

# AgentBase Requirements Document

> **Document version:** v1.0
> **Created:** 2026-04-05
> **Status:** Pending confirmation
> **Owners:** 星枢 (Xingshu, coordination), 星匠 (Xingjiang, implementation)
> **Project location:** `~/Workspace/agentbase/`

---

## 1. Project Overview

### 1.1 Background

The OpenClaw multi-agent system runs on three nodes: Mac Mini, Ubuntu1, and Ubuntu2. Together they produce a large number of session JSONL files every day. These files hold each agent's complete conversation history, reasoning (thinking blocks), tool calls (toolCall), and other core data. Today they exist only as raw files scattered across the servers and cannot be queried efficiently.

### 1.2 Goals

Build a **session parsing and archiving system** that parses JSONL session data from every node and agent into a single database. Django Admin will provide a web management interface for searching, filtering, and tracing that data.

### 1.3 Core Value

- **Auditable**: every operation by every agent can be traced
- **Analyzable**: agent behavior patterns can be analyzed along several dimensions, such as agent, time, and tool type
- **Optimizable**: agent workflows can be tuned using real call data

---

## 2. Scope

### 2.1 Servers in Scope

| Server | Role | Session root directory |
|---|---|---|
| Mac Mini (`macmini`) | Central control node | `/Users/weishen/.openclaw/agents/` |
| Ubuntu1 (`ubuntu1`) | Staging (pre-production) server | `/home/shenwei/.openclaw/agents/` |
| Ubuntu2 (`ubuntu2`) | Development server | `/home/shenwei/.openclaw/agents/` |

### 2.2 Agents in Scope

`main` / `xingyao` / `xinghui` / `xingjiang` / `opencode` / `sisyphus`

### 2.3 File Types in Scope

| File pattern | Description | session_type |
|---|---|---|
| `*.jsonl` | Regular session | `normal` |
| `*.jsonl.reset.*` | Reset snapshot | `reset` |
| `*.jsonl.deleted.*` | Deletion snapshot | `deleted` |
| `*-topic-*.jsonl` | Thread (topic) session | `topic` |

**Ignored file:** `sessions.json` (index file, not parsed)
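
The table above could be applied to file names roughly as in the following sketch. The function names are hypothetical. The sketch assumes that the snapshot suffixes (`.jsonl.reset.` / `.jsonl.deleted.`) take precedence over `-topic-`. It also assumes that `session_uuid` is the file name with `.jsonl` and any snapshot suffix removed, which is how the design document describes that column.

```python
import re
from typing import Optional

IGNORED_FILES = {"sessions.json"}  # index file, never parsed


def session_type(filename: str) -> Optional[str]:
    """Map a file name to session_type, or None if the file should not be parsed."""
    if filename in IGNORED_FILES:
        return None
    # Assumed precedence: snapshot suffixes win over the -topic- marker.
    if ".jsonl.reset." in filename:
        return "reset"
    if ".jsonl.deleted." in filename:
        return "deleted"
    if not filename.endswith(".jsonl"):
        return None  # not a session file
    if "-topic-" in filename:
        return "topic"
    return "normal"


def session_uuid(filename: str) -> str:
    """Strip `.jsonl` and anything after it (e.g. `.reset.<suffix>`)."""
    return re.sub(r"\.jsonl(\..*)?$", "", filename)
```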

### 2.4 Out of Scope

- Real-time streaming parsing (parsing runs only as a scheduled batch)
- Editing session content (the archive is read-only)
- Cross-agent correlation analysis (not in v1)
- Mobile interface

---

## 3. Functional Requirements

### 3.1 Incremental Parsing Engine

**FR-PARSE-001**: Scan the session directory of a given server + agent and list every session file that matches the types in section 2.3

**FR-PARSE-002**: For each file, decide from "file path + mtime + size" whether it needs to be (re)parsed (incremental control)

**FR-PARSE-003**: Parse the `type:session` records in the JSONL and extract session metadata

**FR-PARSE-004**: Parse the `type:message` records in the JSONL and extract message content, toolCalls, thinking, etc.

**FR-PARSE-005**: For `type:message`, correctly extract and split out the following content block types:
- `text` → plain text content
- `thinking` → reasoning text
- `toolCall` → tool call (id/name/arguments)
- `toolResult` → tool result (toolCallId/content/isError)
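
FR-PARSE-005 can be illustrated with a sketch that derives the extracted `messages` columns from a raw `content` array. `summarize_content` is a hypothetical helper. For `text_preview` it follows the priority given in the design document: the first `text` block, otherwise the first `thinking` block, otherwise the first `text` sub-block of a `toolResult`, truncated to 512 characters. The HTML stripping that the same rule mentions is left out here.

```python
from typing import Any, Dict, List, Optional

PREVIEW_LEN = 512  # text_preview is VARCHAR(512)


def _first(blocks: List[Dict[str, Any]], block_type: str, key: str) -> Optional[str]:
    """Return block[key] for the first block of block_type whose value is non-empty."""
    for block in blocks:
        if block.get("type") == block_type and block.get(key):
            return block[key]
    return None


def summarize_content(blocks: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Derive the extracted `messages` columns from a raw `content` array."""
    tool_calls = [b for b in blocks if b.get("type") == "toolCall"]
    thinking = _first(blocks, "thinking", "thinking")  # first thinking block only

    # text_preview priority: text block -> thinking block -> text inside a toolResult
    preview = _first(blocks, "text", "text") or thinking
    if preview is None:
        for block in blocks:
            inner = block.get("content")
            if block.get("type") == "toolResult" and isinstance(inner, list):
                preview = _first(inner, "text", "text")
                if preview:
                    break

    return {
        "text_preview": (preview or "")[:PREVIEW_LEN],
        "thinking_text": thinking,
        "has_thinking": thinking is not None,
        "tool_calls_json": tool_calls,
        "tool_call_count": len(tool_calls),
        "has_tool_calls": bool(tool_calls),
        "first_tool_call": tool_calls[0].get("name") if tool_calls else None,
    }
```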

**FR-PARSE-006**: Extract metering data from the message metadata: AI provider / model / API / token counts / cost

**FR-PARSE-007**: After parsing, update the `parsed_files` table (status: success, or failed + error_message)

### 3.2 Database Storage

**FR-DB-001**: Maintain a `parsed_files` table that records each parsed file's metadata and status (incremental control)

**FR-DB-002**: Maintain a `sessions` table that records each session's root information (deduplicated on session_uuid + server + agent_id)

**FR-DB-003**: Maintain a `messages` table that records every message in full (original content, extracted fields, metering data)

**FR-DB-004**: Index the `messages` table to support the common query patterns, such as `agent_id + timestamp` / `first_tool_call` / `session_id`

### 3.3 Django Admin Interface

**FR-ADMIN-001**: `Messages` list page
- Filter by `agent_id`
- Filter by `timestamp` range
- Filter by `role` (user/assistant/toolResult)
- Filter by `first_tool_call`
- Columns: id / message_uuid / role / text_preview (first 100 characters) / has_thinking / first_tool_call / timestamp

**FR-ADMIN-002**: `Messages` detail page
- Show the raw `content_blocks` JSON (fully expanded)
- Show `thinking_text` (as a separate field)
- Show `tool_calls_json` (fully expanded)
- Show all metering fields (tokens / cost)

**FR-ADMIN-003**: `Sessions` list page
- Filter by `server` / `agent_id` / `session_type`
- Columns: id / session_uuid / server / agent_id / session_type / started_at / message_count

**FR-ADMIN-004**: `ParsedFiles` list page
- Filter by `server` / `agent_id` / `status`
- Columns: id / server / agent_id / file_path / file_mtime / status / parsed_at
- Search by `file_path`

### 3.4 Scheduled Job

**FR-CRON-001**: Run the parsing job automatically once a day (00:05 suggested), covering all servers and all agents

**FR-CRON-002**: OpenClaw cron triggers the job, which calls `scripts/parse_and_import.py`

### 3.5 Command-Line Interface

**FR-CLI-001**: Support limiting the parsing scope by server + agent
```
python parse_and_import.py --server macmini --agent xingyao
```

**FR-CLI-002**: Support a `--dry-run` flag that only scans files and does not write to the database

**FR-CLI-003**: Support a `--force` flag that forces re-parsing (ignoring the incremental state)
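
A minimal `argparse` front end covering FR-CLI-001 to FR-CLI-003 could look like the sketch below. The server and agent lists mirror sections 2.1 and 2.2. Omitting `--server` or `--agent` is assumed to mean "all". `resolve_scope` is a hypothetical helper.

```python
import argparse
from typing import List, Tuple

SERVERS = ("macmini", "ubuntu1", "ubuntu2")                                  # section 2.1
AGENTS = ("main", "xingyao", "xinghui", "xingjiang", "opencode", "sisyphus")  # section 2.2


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Parse OpenClaw session JSONL files into the AgentBase database.")
    parser.add_argument("--server", choices=SERVERS,
                        help="only this server (default: all servers)")
    parser.add_argument("--agent", choices=AGENTS,
                        help="only this agent (default: all agents)")
    parser.add_argument("--dry-run", action="store_true",
                        help="scan files only, do not write to the database")
    parser.add_argument("--force", action="store_true",
                        help="re-parse every file, ignoring parsed_files state")
    return parser


def resolve_scope(args: argparse.Namespace) -> List[Tuple[str, str]]:
    """Expand the optional --server/--agent filters into (server, agent) pairs."""
    servers = [args.server] if args.server else list(SERVERS)
    agents = [args.agent] if args.agent else list(AGENTS)
    return [(s, a) for s in servers for a in agents]
```

`argparse` exposes `--dry-run` as `args.dry_run`, and `choices` rejects a mistyped server or agent name before any scanning starts.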
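
---

## 4. Non-Functional Requirements

### 4.1 Performance

**NFR-PERF-001**: Parsing 10,000 messages in a single run should finish within 60 seconds

**NFR-PERF-002**: Django Admin list pages should load within 3 seconds (with millions of rows)

**NFR-PERF-003**: JSONL files are parsed line by line (streaming) and never loaded into memory all at once

### 4.2 Reliability

**NFR-RELI-001**: A file that fails to parse must have its `error_message` recorded and must not affect the other files in the same batch

**NFR-RELI-002**: Database operations use transactions to guarantee consistency (if a single file fails, its writes are rolled back)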
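
Taken together, NFR-RELI-001 and NFR-RELI-002 amount to one transaction per file, with failures recorded rather than raised. In Django this would normally be `django.db.transaction.atomic()`. So that the sketch runs without Django, it uses the standard-library `sqlite3` module as a stand-in, with trimmed, hypothetical table definitions rather than the real schema.

```python
import sqlite3
from typing import List, Optional

# Trimmed, hypothetical versions of the three tables (not the real schema).
SCHEMA = """
CREATE TABLE sessions (id INTEGER PRIMARY KEY, session_uuid TEXT UNIQUE);
CREATE TABLE messages (id INTEGER PRIMARY KEY, session_id INTEGER,
                       message_uuid TEXT NOT NULL);
CREATE TABLE parsed_files (file_path TEXT PRIMARY KEY, status TEXT, error_message TEXT);
"""


def import_file(conn: sqlite3.Connection, file_path: str, session_uuid: str,
                message_uuids: List[Optional[str]]) -> bool:
    """Import one file atomically; on failure roll back and record the error."""
    try:
        with conn:  # sqlite3: commits if the block succeeds, rolls back if it raises
            cur = conn.execute(
                "INSERT INTO sessions (session_uuid) VALUES (?)", (session_uuid,))
            conn.executemany(
                "INSERT INTO messages (session_id, message_uuid) VALUES (?, ?)",
                [(cur.lastrowid, m) for m in message_uuids])
            conn.execute(
                "INSERT OR REPLACE INTO parsed_files VALUES (?, 'success', NULL)",
                (file_path,))
        return True
    except Exception as exc:
        # The failed file's rows are already rolled back; record why and move on.
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO parsed_files VALUES (?, 'failed', ?)",
                (file_path, str(exc)))
        return False
```

If a message row fails halfway through a file, its `sessions` row disappears with it, and the batch moves on to the next file.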

**NFR-RELI-003**: Re-parsing the same file with unchanged mtime+size is skipped, and nothing is written twice

### 4.3 Maintainability

**NFR-MAIN-001**: Database schema changes are managed with Django migrations

**NFR-MAIN-002**: Configuration (the database connection) is kept in one place, `settings.py`, rather than scattered across files

### 4.4 Security

**NFR-SEC-001**: Database credentials are not hard-coded; they are managed through environment variables or Django settings
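
One way to satisfy NFR-SEC-001 and still keep the connection settings in `settings.py` (NFR-MAIN-002) is to read the values from environment variables there. In this sketch the `AGENTBASE_DB_*` variable names and the fallback defaults are placeholders. That includes the database name, which is still an open question. `django.db.backends.mysql` is the Django backend that also serves MariaDB.

```python
import os

# settings.py excerpt (sketch): the settings live here, the secrets come from the environment.
# The AGENTBASE_DB_* names and the fallback values are placeholders, not decided values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # Django's MySQL backend also serves MariaDB
        "NAME": os.environ.get("AGENTBASE_DB_NAME", "agentbase"),
        "USER": os.environ.get("AGENTBASE_DB_USER", "agentbase"),
        "PASSWORD": os.environ.get("AGENTBASE_DB_PASSWORD", ""),
        "HOST": os.environ.get("AGENTBASE_DB_HOST", "127.0.0.1"),
        "PORT": os.environ.get("AGENTBASE_DB_PORT", "3306"),
        "OPTIONS": {"charset": "utf8mb4"},  # full Unicode, including emoji
    }
}
```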

**NFR-SEC-002**: Django Admin is reachable only locally (no remote access for now)

---

## 5. Database Schema (Summary)

### 5.1 `parsed_files`

| Column | Type | Constraints |
|---|---|---|
| id | INT AUTO_INCREMENT | PK |
| server | VARCHAR(32) | NOT NULL |
| agent_id | VARCHAR(64) | NOT NULL |
| file_path | VARCHAR(512) | NOT NULL |
| file_mtime | BIGINT | NOT NULL |
| file_size | BIGINT | NOT NULL |
| status | VARCHAR(16) | NOT NULL |
| parsed_at | DATETIME | |
| error_message | TEXT | NULL |

**UNIQUE**: `(server, agent_id, file_path)`

### 5.2 `sessions`

| Column | Type | Constraints |
|---|---|---|
| id | INT AUTO_INCREMENT | PK |
| server | VARCHAR(32) | NOT NULL |
| agent_id | VARCHAR(64) | NOT NULL |
| session_uuid | VARCHAR(64) | NOT NULL |
| file_path | VARCHAR(512) | |
| session_type | VARCHAR(32) | |
| cwd | VARCHAR(512) | |
| started_at | DATETIME(3) | |
| first_message_at | DATETIME(3) | |
| last_message_at | DATETIME(3) | |
| message_count | INT | DEFAULT 0 |

**UNIQUE**: `(session_uuid, server, agent_id)`

### 5.3 `messages`

| Column | Type | Constraints / notes |
|---|---|---|
| id | INT AUTO_INCREMENT | PK |
| session_id | INT | FK → sessions.id |
| server | VARCHAR(32) | NOT NULL |
| agent_id | VARCHAR(64) | NOT NULL |
| session_uuid | VARCHAR(64) | NOT NULL |
| message_uuid | VARCHAR(64) | NOT NULL |
| parent_message_uuid | VARCHAR(64) | NULL |
| role | VARCHAR(32) | NOT NULL |
| content_blocks | JSON | raw original content |
| text_preview | VARCHAR(512) | plain-text summary |
| first_tool_call | VARCHAR(128) | NULL |
| tool_call_count | INT | DEFAULT 0 |
| tool_calls_json | JSON | extracted toolCall blocks |
| thinking_text | TEXT | NULL |
| has_thinking | TINYINT | DEFAULT 0 |
| has_tool_calls | TINYINT | DEFAULT 0 |
| is_error | TINYINT | DEFAULT 0 |
| provider | VARCHAR(64) | |
| model | VARCHAR(128) | |
| api | VARCHAR(64) | |
| stop_reason | VARCHAR(64) | |
| input_tokens | INT | |
| output_tokens | INT | |
| cache_read_tokens | BIGINT | |
| cache_write_tokens | BIGINT | |
| total_tokens | INT | |
| cost_usd | DECIMAL(12,8) | |
| timestamp | DATETIME(3) | |

**INDEX**: `idx_agent_timestamp(agent_id, timestamp)` / `idx_first_tool_call(first_tool_call)` / `idx_session_id(session_id)` / `idx_role(role)`

---

## 6. User Stories

### US-001: View an agent's complete conversation for a given day

> As an administrator, I want to see "all of 星曜's messages on 2026-04-05" so that I can audit that day's operations

**Acceptance criteria:**
- On the Messages list page, filter by agent_id = xingyao + date range = 2026-04-05
- Results are sorted by timestamp in ascending order
- Each row shows text_preview / has_thinking / first_tool_call

### US-002: View the full reasoning behind a message

> As an administrator, I want to see a message's thinking_text and toolCalls in detail so that I can analyze the agent's decision logic

**Acceptance criteria:**
- Clicking any message opens its detail page
- thinking_text is shown in full (no truncation)
- tool_calls_json is shown as readable JSON

### US-003: Find every exec command an agent ran on a given day

> As an administrator, I want to see "which exec commands 星曜 ran today" so that I can audit system operations

**Acceptance criteria:**
- Filters: first_tool_call = exec + agent_id = xingyao + date range
- Results include each exec's text_preview (first 512 characters)

### US-004: Track parsed-file status

> As an administrator, I want to see which session files were parsed successfully and which failed so that I can monitor data ingestion

**Acceptance criteria:**
- The ParsedFiles list page shows the parsing status of every file
- Failed entries show error_message
- Filterable by server / agent_id / status

### US-005: Incrementally sync the latest sessions

> As the system, I need the daily scheduled job to parse newly added session files automatically, without re-parsing files that are already stored and unchanged

**Acceptance criteria:**
- When a file's mtime+size are unchanged, its parsed_files record with status=success is recognized as "already parsed"
- New or changed files are parsed and stored correctly

---

## 7. Technology Choices

| Component | Choice | Notes |
|---|---|---|
| Web framework | Django 4.x | Mature and stable, with a powerful Admin |
| Database | MariaDB | Compatible with the NAS / existing infrastructure |
| Python version | 3.10+ | Compatible with the OpenClaw ecosystem |
| Deployment | Mac Mini | Same node as OpenClaw; reaches ubuntu1/2 over SSH |
| ORM | Django ORM | Tightly integrated with Django |
| Scheduler | OpenClaw cron | Consistent with the existing task system |

---

## 8. Project Directory Structure

```
~/Workspace/agentbase/         # Git repository
├── manage.py
├── agentbase/                 # Django project
│   ├── __init__.py
│   ├── settings.py            # database configuration
│   ├── urls.py
│   └── wsgi.py
├── messages/                  # Django app
│   ├── __init__.py
│   ├── models.py              # the three tables
│   ├── admin.py               # Admin configuration
│   ├── views.py               # web views
│   ├── urls.py
│   ├── management/
│   │   └── commands/
│   │       └── parse_sessions.py
│   └── templates/
│       └── messages/
├── scripts/
│   └── parse_and_import.py    # CLI entry script
├── tests/
├── requirements.txt
└── README.md
```

---

## 9. Next Steps (to be carried out after user confirmation)

- [ ] Confirm where the database is deployed (local MariaDB on the Mac Mini? The NAS? Somewhere else?)
- [ ] Confirm the database name
- [ ] Create the Git repository
- [ ] Initialize the Django project
- [ ] Implement the parsing engine
- [ ] Configure Django Admin
- [ ] Write the scheduled job
- [ ] Write tests
- [ ] Deploy to production

---

## 10. Appendix: Reference Queries

```sql
-- US-001: all messages from an agent on a given day
-- Half-open day range: timestamp is DATETIME(3), so BETWEEN ... '23:59:59' would drop the last second's milliseconds
SELECT id, message_uuid, role, text_preview, has_thinking, first_tool_call, timestamp
FROM messages
WHERE agent_id = 'xingyao'
  AND timestamp >= '2026-04-05 00:00:00'
  AND timestamp <  '2026-04-06 00:00:00'
ORDER BY timestamp;

-- US-003: all exec calls by an agent on a given day
SELECT m.id, m.timestamp, m.text_preview, m.session_uuid
FROM messages m
WHERE m.agent_id = 'xingyao'
  AND m.timestamp >= '2026-04-05 00:00:00'
  AND m.timestamp <  '2026-04-06 00:00:00'
  AND m.first_tool_call = 'exec';
```

---

*Compiled by 星枢 (Xingshu), based on the 2026-04-05 discussion with 比利哥*
|
||||
---
|
||||
title: AgentBase 项目需求文档
|
||||
source:
|
||||
author: shenwei
|
||||
published:
|
||||
created:
|
||||
description:
|
||||
tags: []
|
||||
---
|
||||
|
||||
# AgentBase 项目需求文档
|
||||
|
||||
> **文档版本:** v1.0
|
||||
> **创建日期:** 2026-04-05
|
||||
> **状态:** 待确认
|
||||
> **负责人:** 星枢(协调)、星匠(实施)
|
||||
> **项目位置:** `~/Workspace/agentbase/`
|
||||
|
||||
---
|
||||
|
||||
## 1. 项目概述
|
||||
|
||||
### 1.1 项目背景
|
||||
|
||||
OpenClaw 多 Agent 系统运行于 Mac Mini + Ubuntu1 + Ubuntu2 三节点,每日产生大量 session JSONL 文件。这些文件记录了 Agent 的完整对话历史、思考过程(thinking block)、工具调用(toolCall)等核心数据,目前以原始文件形式散落各服务器,无法高效查询。
|
||||
|
||||
### 1.2 项目目标
|
||||
|
||||
构建一套 **session 解析与归档系统**,将多节点、多 Agent 的 JSONL 会话数据统一解析入库,通过 Django Admin 提供可查询、可筛选、可追溯的 Web 管理界面。
|
||||
|
||||
### 1.3 核心价值
|
||||
|
||||
- **可审计**:任何 Agent 的任何操作记录可追溯
|
||||
- **可分析**:按 Agent/时间/工具类型等多维度分析 Agent 行为模式
|
||||
- **可优化**:基于真实调用数据优化 Agent 工作流
|
||||
|
||||
---
|
||||
|
||||
## 2. 系统范围
|
||||
|
||||
### 2.1 纳入系统
|
||||
|
||||
| 服务器 | 角色 | Session 根目录 |
|
||||
|---|---|---|
|
||||
| Mac Mini (`macmini`) | 中央控制节点 | `/Users/weishen/.openclaw/agents/` |
|
||||
| Ubuntu1 (`ubuntu1`) | 准生产服务器 | `/home/shenwei/.openclaw/agents/` |
|
||||
| Ubuntu2 (`ubuntu2`) | 开发服务器 | `/home/shenwei/.openclaw/agents/` |
|
||||
|
||||
### 2.2 纳入 Agent
|
||||
|
||||
`main` / `xingyao` / `xinghui` / `xingjiang` / `opencode` / `sisyphus`
|
||||
|
||||
### 2.3 纳入文件类型
|
||||
|
||||
| 文件类型 | 说明 | session_type |
|
||||
|---|---|---|
|
||||
| `*.jsonl` | 正常 session | `normal` |
|
||||
| `*.jsonl.reset.*` | reset 快照 | `reset` |
|
||||
| `*.jsonl.deleted.*` | 删除快照 | `deleted` |
|
||||
| `*-topic-*.jsonl` | 线程 session | `topic` |
|
||||
|
||||
**忽略文件:** `sessions.json`(索引文件,不解析)
|
||||
|
||||
### 2.4 不纳入范围
|
||||
|
||||
- 实时流式解析(仅定时批量解析)
|
||||
- Session 内容编辑(只读归档)
|
||||
- 跨 Agent 关联分析(v1)
|
||||
- 移动端界面
|
||||
|
||||
---
|
||||
|
||||
## 3. 功能需求
|
||||
|
||||
### 3.1 增量解析引擎
|
||||
|
||||
**FR-PARSE-001**:扫描指定服务器 + Agent 的 session 目录,列出所有符合类型的 `.jsonl` 文件
|
||||
|
||||
**FR-PARSE-002**:对每个文件,基于「文件路径 + mtime + size」判断是否需要重新解析(增量控制)
|
||||
|
||||
**FR-PARSE-003**:解析 JSONL 中的 `type:session` 记录,提取 session 元信息
|
||||
|
||||
**FR-PARSE-004**:解析 JSONL 中的 `type:message` 记录,提取消息内容、toolCall、thinking 等
|
||||
|
||||
**FR-PARSE-005**:对于 `type:message`,正确提取并拆分以下 content block 类型:
|
||||
- `text` → 纯文本内容
|
||||
- `thinking` → 思考过程文本
|
||||
- `toolCall` → 工具调用(id/name/arguments)
|
||||
- `toolResult` → 工具结果(toolCallId/content/isError)
|
||||
|
||||
**FR-PARSE-006**:从 message 元数据中提取 AI Provider / Model / API / Token 计数 / Cost 等计量数据
|
||||
|
||||
**FR-PARSE-007**:解析完成后更新 `parsed_files` 表(status: success / failed + error_message)
|
||||
|
||||
### 3.2 数据库存储
|
||||
|
||||
**FR-DB-001**:维护 `parsed_files` 表,记录每个已解析文件的元数据及状态(增量控制)
|
||||
|
||||
**FR-DB-002**:维护 `sessions` 表,记录每个 session 的根信息(去重依据:session_uuid + server + agent_id)
|
||||
|
||||
**FR-DB-003**:维护 `messages` 表,记录每条消息的完整信息(content 原文、提取字段、计量数据)
|
||||
|
||||
**FR-DB-004**:对 `messages` 表建立合理索引,支持按 `agent_id + timestamp` / `first_tool_call` / `session_id` 等常用查询模式
|
||||
|
||||
### 3.3 Django Admin 管理界面
|
||||
|
||||
**FR-ADMIN-001**:`Messages` 列表页
|
||||
- 支持按 `agent_id` 过滤
|
||||
- 支持按 `timestamp` 范围过滤
|
||||
- 支持按 `role`(user/assistant/toolResult)过滤
|
||||
- 支持按 `first_tool_call` 过滤
|
||||
- 列表显示:id / message_uuid / role / text_preview(前100字符) / has_thinking / first_tool_call / timestamp
|
||||
|
||||
**FR-ADMIN-002**:`Messages` 详情页
|
||||
- 显示 `content_blocks` 原始 JSON(完整展开)
|
||||
- 显示 `thinking_text`(独立字段)
|
||||
- 显示 `tool_calls_json`(完整展开)
|
||||
- 显示所有计量字段(tokens / cost)
|
||||
|
||||
**FR-ADMIN-003**:`Sessions` 列表页
|
||||
- 支持按 `server` / `agent_id` / `session_type` 过滤
|
||||
- 列表显示:id / session_uuid / server / agent_id / session_type / started_at / message_count
|
||||
|
||||
**FR-ADMIN-004**:`ParsedFiles` 列表页
|
||||
- 支持按 `server` / `agent_id` / `status` 过滤
|
||||
- 列表显示:id / server / agent_id / file_path / file_mtime / status / parsed_at
|
||||
- 支持按 `file_path` 搜索
|
||||
|
||||
### 3.4 定时任务
|
||||
|
||||
**FR-CRON-001**:每日定时(建议 00:05)自动执行解析任务,覆盖所有服务器、所有 Agent
|
||||
|
||||
**FR-CRON-002**:解析任务通过 OpenClaw cron 触发,调用 `scripts/parse_and_import.py`
|
||||
|
||||
### 3.5 命令行接口
|
||||
|
||||
**FR-CLI-001**:支持按服务器 + Agent 指定解析范围
|
||||
```
|
||||
python parse_and_import.py --server macmini --agent xingyao
|
||||
```
|
||||
|
||||
**FR-CLI-002**:支持 `--dry-run` 参数,仅扫描文件不写入数据库
|
||||
|
||||
**FR-CLI-003**:支持 `--force` 参数,强制重新解析(忽略增量状态)
|
||||
|
||||
---
|
||||
|
||||
## 4. 非功能需求
|
||||
|
||||
### 4.1 性能
|
||||
|
||||
**NFR-PERF-001**:单次解析 10,000 条消息应在 60 秒内完成
|
||||
|
||||
**NFR-PERF-002**:Django Admin 列表页加载时间不超过 3 秒(百万级数据量下)
|
||||
|
||||
**NFR-PERF-003**:JSONL 文件逐行解析,不一次性加载到内存(流式处理)
|
||||
|
||||
### 4.2 可靠性
|
||||
|
||||
**NFR-RELI-001**:解析失败的文件需记录 `error_message`,不影响同批次其他文件
|
||||
|
||||
**NFR-RELI-002**:数据库操作使用事务保证一致性(单文件解析失败回滚)
|
||||
|
||||
**NFR-RELI-003**:重复解析同一文件(mtime+size 未变)应被跳过,不重复写入
|
||||
|
||||
### 4.3 可维护性
|
||||
|
||||
**NFR-MAIN-001**:数据库 schema 变更通过 Django Migration 管理
|
||||
|
||||
**NFR-MAIN-002**:配置信息(数据库连接)集中写在 `settings.py`,不散落多处
|
||||
|
||||
### 4.4 安全性
|
||||
|
||||
**NFR-SEC-001**:数据库凭据不硬编码在代码中,通过环境变量或 Django settings 管理
|
||||
|
||||
**NFR-SEC-002**:Django Admin 仅本地访问(暂不开放远程)
|
||||
|
||||
---
|
||||
|
||||
## 5. 数据库 Schema(摘要)
|
||||
|
||||
### 5.1 `parsed_files`
|
||||
|
||||
| 字段 | 类型 | 约束 |
|
||||
|---|---|---|
|
||||
| id | INT AUTO_INCREMENT | PK |
|
||||
| server | VARCHAR(32) | NOT NULL |
|
||||
| agent_id | VARCHAR(64) | NOT NULL |
|
||||
| file_path | VARCHAR(512) | NOT NULL |
|
||||
| file_mtime | BIGINT | NOT NULL |
|
||||
| file_size | BIGINT | NOT NULL |
|
||||
| status | VARCHAR(16) | NOT NULL |
|
||||
| parsed_at | DATETIME | |
|
||||
| error_message | TEXT | NULL |
|
||||
|
||||
**UNIQUE**:`(server, agent_id, file_path)`
|
||||
|
||||
### 5.2 `sessions`
|
||||
|
||||
| 字段 | 类型 | 约束 |
|
||||
|---|---|---|
|
||||
| id | INT AUTO_INCREMENT | PK |
|
||||
| server | VARCHAR(32) | NOT NULL |
|
||||
| agent_id | VARCHAR(64) | NOT NULL |
|
||||
| session_uuid | VARCHAR(64) | NOT NULL |
|
||||
| file_path | VARCHAR(512) | |
|
||||
| session_type | VARCHAR(32) | |
|
||||
| cwd | VARCHAR(512) | |
|
||||
| started_at | DATETIME(3) | |
|
||||
| first_message_at | DATETIME(3) | |
|
||||
| last_message_at | DATETIME(3) | |
|
||||
| message_count | INT | DEFAULT 0 |
|
||||
|
||||
**UNIQUE**:`(session_uuid, server, agent_id)`
|
||||
|
||||
### 5.3 `messages`
|
||||
|
||||
| 字段 | 类型 | 约束 |
|
||||
|---|---|---|
|
||||
| id | INT AUTO_INCREMENT | PK |
|
||||
| session_id | INT | FK → sessions.id |
|
||||
| server | VARCHAR(32) | NOT NULL |
|
||||
| agent_id | VARCHAR(64) | NOT NULL |
|
||||
| session_uuid | VARCHAR(64) | NOT NULL |
|
||||
| message_uuid | VARCHAR(64) | NOT NULL |
|
||||
| parent_message_uuid | VARCHAR(64) | NULL |
|
||||
| role | VARCHAR(32) | NOT NULL |
|
||||
| content_blocks | JSON | 原文 |
|
||||
| text_preview | VARCHAR(512) | 摘要 |
|
||||
| first_tool_call | VARCHAR(128) | NULL |
|
||||
| tool_call_count | INT | DEFAULT 0 |
|
||||
| tool_calls_json | JSON | 拆出 |
|
||||
| thinking_text | TEXT | NULL |
|
||||
| has_thinking | TINYINT | DEFAULT 0 |
|
||||
| has_tool_calls | TINYINT | DEFAULT 0 |
|
||||
| is_error | TINYINT | DEFAULT 0 |
|
||||
| provider | VARCHAR(64) | |
|
||||
| model | VARCHAR(128) | |
|
||||
| api | VARCHAR(64) | |
|
||||
| stop_reason | VARCHAR(64) | |
|
||||
| input_tokens | INT | |
|
||||
| output_tokens | INT | |
|
||||
| cache_read_tokens | BIGINT | |
|
||||
| cache_write_tokens | BIGINT | |
|
||||
| total_tokens | INT | |
|
||||
| cost_usd | DECIMAL(12,8) | |
|
||||
| timestamp | DATETIME(3) | |
|
||||
|
||||
**INDEX**:`idx_agent_timestamp(agent_id, timestamp)` / `idx_first_tool_call(first_tool_call)` / `idx_session_id(session_id)` / `idx_role(role)`
|
||||
|
||||
---
|
||||
|
||||
## 6. 用户故事
|
||||
|
||||
### US-001:查询某 Agent 某天的完整对话
|
||||
|
||||
> 作为管理员,我想查看「星曜 2026-04-05 所有消息」,以便审计当天的操作记录
|
||||
|
||||
**验收标准:**
|
||||
- 在 Messages 列表页输入 agent_id = xingyao + date range = 2026-04-05
|
||||
- 返回结果按 timestamp 升序排列
|
||||
- 每条结果显示 text_preview / has_thinking / first_tool_call
|
||||
|
||||
### US-002:查看某条消息的完整思考过程
|
||||
|
||||
> 作为管理员,我想查看某条消息的 thinking_text 和 toolCalls 详情,以便分析 Agent 的决策逻辑
|
||||
|
||||
**验收标准:**
|
||||
- 点击任意消息进入详情页
|
||||
- thinking_text 字段完整展示(无截断)
|
||||
- tool_calls_json 以可读 JSON 格式展示
|
||||
|
||||
### US-003:查询某天某 Agent 执行过的所有 exec 命令
|
||||
|
||||
> 作为管理员,我想查看「星曜今天执行了哪些 exec 命令」,以便审计系统操作
|
||||
|
||||
**验收标准:**
|
||||
- 过滤器 first_tool_call = exec + agent_id = xingyao + date range
|
||||
- 返回结果包含每条 exec 的 text_preview(截取前512字符)
|
||||
|
||||
### US-004:追踪已解析文件状态
|
||||
|
||||
> 作为管理员,我想查看哪些 session 文件已成功解析、哪些失败,以便监控数据入库情况
|
||||
|
||||
**验收标准:**
|
||||
- ParsedFiles 列表页显示所有文件的解析状态
|
||||
- 失败条目显示 error_message
|
||||
- 可按 server / agent_id / status 过滤
|
||||
|
||||
### US-005:增量同步最新 session
|
||||
|
||||
> 作为系统,我需要在每日定时任务中自动解析新增的 session 文件,不重复解析已入库且未变化的文件
|
||||
|
||||
**验收标准:**
|
||||
- 同一文件 mtime+size 未变时,parsed_files 中 status=success 的记录被识别为「已解析」
|
||||
- 新增文件或变化文件被正确解析入库
|
||||
|
||||
---
|
||||
|
||||
## 7. 技术选型
|
||||
|
||||
| 组件 | 选型 | 说明 |
|
||||
|---|---|---|
|
||||
| Web 框架 | Django 4.x | 成熟稳定,Admin 功能强大 |
|
||||
| 数据库 | MariaDB | 与 NAS/现有基础设施兼容 |
|
||||
| Python 版本 | 3.10+ | OpenClaw 生态兼容 |
|
||||
| 部署位置 | Mac Mini | 与 OpenClaw 同节点,SSH 访问 ubuntu1/2 |
|
||||
| ORM | Django ORM | 与 Django 深度集成 |
|
||||
| 定时任务 | OpenClaw cron | 与现有任务系统统一 |

---

## 8. Project Directory Structure

```
~/Workspace/agentbase/          # Git repository
├── manage.py
├── agentbase/                  # Django project
│   ├── __init__.py
│   ├── settings.py             # Database configuration
│   ├── urls.py
│   └── wsgi.py
├── messages/                   # Django app
│   ├── __init__.py
│   ├── models.py               # The three tables
│   ├── admin.py                # Admin configuration
│   ├── views.py                # Web views
│   ├── urls.py
│   ├── management/
│   │   └── commands/
│   │       └── parse_sessions.py
│   └── templates/
│       └── messages/
├── scripts/
│   └── parse_and_import.py     # CLI entry script
├── tests/
├── requirements.txt
└── README.md
```

Note: an app package named `messages` gets the default app label `messages`, which collides with `django.contrib.messages` (required by Django Admin). Set a distinct `label` on the app's `AppConfig`, or rename the package.

---

## 9. Next Steps (to be carried out after user confirmation)

- [ ] Confirm where the database is deployed (local MariaDB on the Mac Mini? The NAS? Elsewhere?)
- [ ] Confirm the database name
- [ ] Create the Git repository
- [ ] Initialize the Django project
- [ ] Implement the parsing engine
- [ ] Configure Django Admin
- [ ] Write the scheduled job
- [ ] Write tests
- [ ] Deploy to production

---

## 10. Appendix: Reference Queries

```sql
-- US-001: all messages from one agent on one day
-- (half-open range, so DATETIME(3) values after 23:59:59.000 are included)
SELECT id, message_uuid, role, text_preview, has_thinking, first_tool_call, timestamp
FROM messages
WHERE agent_id = 'xingyao'
  AND timestamp >= '2026-04-05 00:00:00' AND timestamp < '2026-04-06 00:00:00'
ORDER BY timestamp;

-- US-003: all exec calls from one agent on one day
SELECT m.id, m.timestamp, m.text_preview, m.session_uuid
FROM messages m
WHERE m.agent_id = 'xingyao'
  AND m.timestamp >= '2026-04-05 00:00:00' AND m.timestamp < '2026-04-06 00:00:00'
  AND m.first_tool_call = 'exec'
ORDER BY m.timestamp;
```
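
When you filter a `DATETIME(3)` column by calendar day, a half-open range (`timestamp >= day AND timestamp < next_day`) covers the whole day. This includes fractional seconds after 23:59:59, which `BETWEEN ... AND '... 23:59:59'` misses. The following is a minimal sketch for computing those bounds as query parameters. It assumes the stored timestamps and the requested day use the same timezone. `day_bounds` is a hypothetical helper. The `%s` placeholders are the style used by MariaDB/MySQL DB-API drivers such as mysqlclient and by Django's `connection.cursor()`.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def day_bounds(day: date) -> Tuple[datetime, datetime]:
    """Return [start, end) covering one calendar day.

    `timestamp >= start AND timestamp < end` also matches rows such as
    23:59:59.500, which BETWEEN ... AND '23:59:59' would skip on DATETIME(3).
    """
    start = datetime.combine(day, time.min)
    return start, start + timedelta(days=1)


if __name__ == "__main__":
    start, end = day_bounds(date(2026, 4, 5))
    sql = (
        "SELECT id, message_uuid, role, text_preview, has_thinking, first_tool_call, timestamp "
        "FROM messages WHERE agent_id = %s AND timestamp >= %s AND timestamp < %s "
        "ORDER BY timestamp"
    )
    # Pass the values separately to cursor.execute(sql, params); never format them into the SQL
    print(sql, ("xingyao", start, end))
```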

---

*This document was compiled by 星枢, based on the 2026-04-05 discussion with 比利哥.*

@@ -1,101 +1,101 @@
#!/usr/bin/env python3
"""
PST email cleanup rule runner v1.2
Usage: python3 apply_rules.py <csv_file>
"""
import csv
import json
import sys
from collections import defaultdict

RULES_FILE = '/Users/weishen/pst-processing/rules/delete_rules.json'


def load_rules():
    with open(RULES_FILE, encoding='utf-8') as f:
        return json.load(f)['rules']


def apply_rules(csv_path):
    rules = load_rules()
    with open(csv_path, encoding='utf-8', newline='') as f:
        rows = list(csv.DictReader(f))
    if not rows:
        print(f"No rows found in {csv_path}; nothing to do")
        return

    # Every email starts as "keep"; rules only ever flip the flag to 'Y'
    for r in rows:
        r['delete_flag'] = 'N'

    for rule in rules:
        folder_pattern = rule['folder_contains']
        action = rule['action']

        # Substring match against the folder path
        folder_rows = [r for r in rows if folder_pattern in r['folder']]
        if not folder_rows:
            continue

        deleted_count = 0
        kept_count = 0

        if action == 'keep':
            kept_count = len(folder_rows)

        elif action == 'delete_all':
            for r in folder_rows:
                r['delete_flag'] = 'Y'
                deleted_count += 1

        elif action == 'keep_sample':
            # Group by the first 80 characters of the subject, largest groups first;
            # keep one email per distinct subject until keep_count is reached
            keep_limit = rule['keep_count']
            subject_map = defaultdict(list)
            for r in folder_rows:
                subj = r['subject'].strip()[:80]
                subject_map[subj].append(r)
            sorted_subjects = sorted(subject_map.items(), key=lambda x: -len(x[1]))

            for subj, emails in sorted_subjects:
                if kept_count >= keep_limit:
                    for r in emails:
                        r['delete_flag'] = 'Y'
                        deleted_count += 1
                else:
                    for i, r in enumerate(emails):
                        if i == 0:
                            kept_count += 1
                        else:
                            r['delete_flag'] = 'Y'
                            deleted_count += 1

        elif action == 'keep_if_attachment':
            for r in folder_rows:
                if r['has_attachment'] == 'Y':
                    kept_count += 1
                else:
                    r['delete_flag'] = 'Y'
                    deleted_count += 1

        print(f"  {folder_pattern}: {len(folder_rows)} | kept {kept_count} | deleted {deleted_count}")

    # Count from the flags so a row matched by several rules is only counted once
    total_deleted = sum(1 for r in rows if r['delete_flag'] == 'Y')
    total_kept = len(rows) - total_deleted
    print(f"📊 Total: kept {total_kept} | deleted {total_deleted}")

    fieldnames = list(rows[0].keys())
    out_marked = csv_path.replace('.csv', '_marked.csv')
    out_delete = csv_path.replace('.csv', '_delete_list.csv')

    with open(out_marked, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    with open(out_delete, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows([r for r in rows if r['delete_flag'] == 'Y'])

    print(f"✅ Output: {out_marked}")
    print(f"✅ Delete list: {out_delete} ({total_deleted} emails)")


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("Usage: python3 apply_rules.py <csv_file>")
        sys.exit(1)
    apply_rules(sys.argv[1])
@@ -1,84 +1,84 @@
{
  "version": "1.2",
  "description": "PST email cleanup rules",
  "rules": [
    {
      "id": "aws_notification",
      "folder_contains": "AWS Notification",
      "action": "keep_sample",
      "keep_count": 5,
      "keep_by": "unique_subject",
      "description": "AWS alert notifications: keep 1 email per distinct subject, 5 at most"
    },
    {
      "id": "prisma_cloud",
      "folder_contains": "Prisma Cloud Notifications",
      "action": "keep_sample",
      "keep_count": 5,
      "keep_by": "unique_subject",
      "description": "Prisma Cloud notifications: keep 1 email per distinct subject, 5 at most"
    },
    {
      "id": "x4x_tenant_provisioning",
      "folder_contains": "X4X-Tenant Provisioning",
      "action": "keep_sample",
      "keep_count": 5,
      "keep_by": "unique_subject",
      "description": "X4X tenant provisioning notifications: keep 1 email per distinct subject, 5 at most"
    },
    {
      "id": "qualys",
      "folder_contains": "Qualys",
      "action": "keep_sample",
      "keep_count": 5,
      "keep_by": "unique_subject",
      "description": "Qualys security scan notifications: keep 1 email per distinct subject, 5 at most"
    },
    {
      "id": "teams_notification",
      "folder_contains": "Teams Notification",
      "action": "keep_if_attachment",
      "description": "Teams meeting notifications: keep if there is an attachment, otherwise delete"
    },
    {
      "id": "sma_notification",
      "folder_contains": "SMA Notficiation",
      "action": "keep_sample",
      "keep_count": 10,
      "keep_by": "unique_subject",
      "description": "SMA ticket notifications: keep 10 emails with distinct subjects per month, delete the rest"
    },
    {
      "id": "ppm_saas_change",
      "folder_contains": "PPM SaaS Change",
      "action": "delete_all",
      "description": "PPM incident-resolution notifications: delete all"
    },
    {
      "id": "cloudhealth",
      "folder_contains": "CloudHealth",
      "action": "keep",
      "description": "CloudHealth cost reports: keep all"
    },
    {
      "id": "saas_bi_report",
      "folder_contains": "SaaS BI Report",
      "action": "keep",
      "description": "BI data pushes: keep all"
    },
    {
      "id": "x4x_tenant_decommissioning",
      "folder_contains": "X4X-Tenant Decommissioning",
      "action": "delete_all",
      "description": "Tenant decommissioning notifications: delete all"
    },
    {
      "id": "x4x_license_renewal",
      "folder_contains": "X4X-License Renewal",
      "action": "delete_all",
      "description": "License renewal notifications: delete all"
    }
  ],
  "default_action": "keep",
  "updated": "2026-04-13"
}
File diff suppressed because it is too large
@@ -1,261 +1,261 @@
# Complete Guide to Local Speech Transcription with Whisper

> Document version: 2026-04-15
> Maintainer: 星枢 (xingshu)
> Status: ✅ Verified working on the Mac mini

---

## 1. What Whisper Is

Whisper is an open-source automatic speech recognition (ASR) model from OpenAI that transcribes audio files into text. It supports 99 languages and is especially accurate on English.

**Two ways to use it:**

| Option | Description | Cost |
|---|---|---|
| **Run locally** | Model downloaded to a local Mac/PC | **Free** |
| OpenAI API | Call the OpenAI Whisper API | Billed per minute |

This guide uses the **local** option.

---

## 2. Supported Models

| Model | Parameters | English WER* | Chinese CER* | Local memory | Mac mini compatibility |
|---|---|---|---|---|---|
| `tiny` | 39M | 5.2% | ~10% | ~1GB | ✅ |
| `base` | 74M | 3.5% | ~8% | ~1GB | ✅ |
| **`small`** | 244M | 2.7% | ~5% | ~1.5GB | **✅ Recommended** |
| `medium` | 769M | 2.3% | ~4% | ~5GB | ⚠️ May run out of memory (OOM) |
| `large` | 1550M | 2.0% | ~3% | ~10GB | ❌ OOM |

> \* WER = Word Error Rate, CER = Character Error Rate; lower is more accurate.

**Recommendation: the `small` model** (the best balance between accuracy and resource use)
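
WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model's output, divided by the number of reference words. CER is the same computation over characters, which suits Chinese because it has no spaces between words. The following is a minimal sketch of that definition using plain Levenshtein distance. Published benchmark figures also normalize text (case, punctuation) first, and this sketch omits that step.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (costs 0 if the tokens match)
            )
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate over whitespace-separated words (reference must be non-empty)."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate, ignoring spaces (reference must be non-empty)."""
    ref = reference.replace(" ", "")
    return edit_distance(list(ref), list(hypothesis.replace(" ", ""))) / len(ref)
```

For example, one wrong word in a five-word sentence gives a WER of 0.2 (20%).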

---

## 3. Installation

### 3.1 Prerequisites

```bash
# Check the Python version (3.8+ required)
python3 --version

# Check that pip is available
pip3 --version

# Whisper decodes audio with the ffmpeg command-line tool; install it if missing (macOS/Homebrew)
ffmpeg -version || brew install ffmpeg
```
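
The same prerequisite check can be done from Python, so that a script fails fast before it loads a model. This is a minimal sketch that uses only the standard library. `check_prerequisites` is a hypothetical helper. `shutil.which` looks for `ffmpeg` on `PATH`, which is where the whisper package expects to find it when it decodes audio.

```python
import shutil
import sys
from typing import List, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 8)) -> List[str]:
    """Return a list of problems; an empty list means the environment looks OK."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {sys.version.split()[0]}"
        )
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH (macOS: brew install ffmpeg)")
    return problems


if __name__ == "__main__":
    for msg in check_prerequisites():
        print("✗", msg)
```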

### 3.2 Install Whisper

```bash
pip3 install openai-whisper
```

**If you hit a permissions error (macOS):**
```bash
pip3 install --user openai-whisper
```

**The model file is downloaded automatically on first use** (~500MB for the `small` model); no manual download is needed.

---

## 4. Quick Test

### 4.1 Single-file test (`tiny` model, fastest)

```python
import whisper

model = whisper.load_model("tiny")  # the model is downloaded on first run
result = model.transcribe("audio.mp3", language="en", fp16=False)
print(result["text"])
```

### 4.2 Full example (`small` model)

```python
import whisper

# Load the model (only needs to happen once)
model = whisper.load_model("small")

# Transcribe
result = model.transcribe(
    "audio.mp3",
    language="en",  # set the language explicitly; omit it to auto-detect
    fp16=False,     # the Mac mini runs Whisper on the CPU, so disable FP16
    verbose=True,   # print each segment's text as it is decoded
)

print("Language:", result["language"])
print("Transcript:", result["text"])
print("Segments:", len(result["segments"]))
```

### 4.3 Command-line test

```bash
# After installation, the whisper command is available directly
whisper audio.mp3 --model small --language en
```

---

## 5. Python API in Detail

### 5.1 Core method

```python
import whisper

model = whisper.load_model("small")

# Commonly used parameters
result = model.transcribe(
    audio="path/to/file.mp3",

    # Language
    language="en",                    # set the language; omit to auto-detect
    # initial_prompt="",              # optional: bias the model toward given terms (e.g. proper nouns)

    # Decoding control
    fp16=False,                       # False on CPU (Whisper falls back to FP32 anyway); True on a CUDA GPU
    temperature=0.0,                  # 0 = deterministic; > 0 adds randomness
    condition_on_previous_text=True,  # use the previous segment's text as context

    # Task
    task="transcribe",                # "transcribe", or "translate" (any language -> English)

    # Timestamps
    word_timestamps=False,            # True = also return start/end times for each word

    # Logging
    verbose=True,
)
```

### 5.2 Return value structure

```python
# Shape of the dict returned by transcribe() (abridged)
{
    "text": "The full transcript...",
    "language": "en",
    "segments": [
        {
            "id": 0,
            "start": 0.0,  # seconds
            "end": 5.5,
            "text": " Can you see my screen already?",
            "words": [...],  # only present when word_timestamps=True
            # ...plus decoding details such as tokens, avg_logprob, no_speech_prob
        },
        ...
    ],
}
```
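
The `segments` list contains enough information to produce subtitles. This is a minimal sketch that turns segments shaped like the structure above into SRT text, using only `start`, `end`, and `text`. `segments_to_srt` is a hypothetical helper. The `whisper` CLI can also write SRT files itself via `--output_format srt`.

```python
from typing import Dict, Iterable


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Dict]) -> str:
    """Build SRT text from Whisper-style segments (uses start, end, text only)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive SRT entries
    return "\n".join(blocks)
```

Usage: `open("audio.srt", "w", encoding="utf-8").write(segments_to_srt(result["segments"]))`.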

### 5.3 Batch transcription

```python
import glob

import whisper

model = whisper.load_model("small")  # load once, reuse for every file
audio_files = glob.glob("*.mp3")

for audio_file in audio_files:
    print(f"Processing: {audio_file}")
    result = model.transcribe(audio_file, language="en", fp16=False)

    # Save the transcript next to the audio file (e.g. talk.mp3 -> talk.mp3.txt)
    with open(audio_file + ".txt", "w", encoding="utf-8") as f:
        f.write(result["text"])
```
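
If a batch run is interrupted, rerunning the loop transcribes every file again. This is a minimal sketch for selecting only the files that have no transcript yet. It assumes the `<audio file name>.txt` naming used in the batch example. `pending_audio_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def pending_audio_files(directory: str, pattern: str = "*.mp3") -> List[Path]:
    """Return audio files in `directory` that have no '<file name>.txt' transcript yet."""
    pending = []
    for audio in sorted(Path(directory).glob(pattern)):
        transcript = audio.with_name(audio.name + ".txt")  # talk.mp3 -> talk.mp3.txt
        if not transcript.exists():
            pending.append(audio)
    return pending
```

To use it, replace `glob.glob("*.mp3")` with `pending_audio_files(".")`, and pass `str(path)` to `model.transcribe`.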

---

## 6. Measured Performance on the Mac mini (M4 Pro)

| Audio length | File size | Model | Transcription time | Speed |
|---|---|---|---|---|
| ~54 min | 3MB | `small` | ~43s | ~75x realtime |
| ~54 min | 3MB | `tiny` | ~10s | ~320x realtime |
| ~1 hour | 22MB | `small` | ~90s | ~40x realtime |

**Rule of thumb:** the `small` model takes about 1-2 minutes per hour of audio, with memory use steady at about 1.5GB.
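
The "speed" column is the audio duration divided by the wall-clock transcription time. The sketch below checks the table rows and turns a measured factor into a rough time estimate for another file. Both helpers are hypothetical. The estimate only holds for similar audio, model, and load conditions.

```python
def realtime_factor(audio_seconds: float, elapsed_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / elapsed_seconds


def estimated_seconds(audio_seconds: float, factor: float) -> float:
    """Rough transcription time for a file, given a measured realtime factor."""
    return audio_seconds / factor


if __name__ == "__main__":
    print(round(realtime_factor(54 * 60, 43), 1))  # first row: 75.3
    print(realtime_factor(54 * 60, 10))            # second row: 324.0
    print(realtime_factor(60 * 60, 90))            # third row: 40.0
```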

---

## 7. Use in the Pipeline

This project does not use the Whisper API. Instead, a Python script calls the local model:

```python
from functools import lru_cache

import whisper


@lru_cache(maxsize=1)
def get_model():
    """Load the small model on first use and reuse it for later calls."""
    return whisper.load_model("small")


def whisper_transcribe(mp3_path: str) -> str:
    """Transcribe one file and return the English transcript."""
    result = get_model().transcribe(
        mp3_path,
        language="en",
        fp16=False,
    )
    return result["text"].strip()


# Usage
transcript = whisper_transcribe("/path/to/audio.mp3")
```

---

## 8. FAQ

### Q1: `FP16 is not supported on CPU; using FP32 instead` warning
**This is expected.** The Mac mini runs Whisper on the CPU, so Whisper automatically falls back to FP32, and accuracy is not affected. Passing `fp16=False` silences the warning.

### Q2: `SIGKILL` / process killed
**Out of memory**: the model is too large. Switch to a smaller model:
```python
model = whisper.load_model("tiny")  # lowest memory use
```

### Q3: Poor Chinese recognition
Specify the language explicitly to improve accuracy:
```python
result = model.transcribe("audio.mp3", language="zh")  # Chinese
result = model.transcribe("audio.mp3", language="en")  # English
```

### Q4: How to speed up transcription
- Use the `tiny` or `base` model (trading accuracy for speed)
- Do not count on automatic hardware acceleration: the `openai-whisper` package runs on the CPU here (see Q1) and does not use the Apple Neural Engine
- Avoid running several transcription jobs at the same time

### Q5: Supported audio formats
Any format FFmpeg can decode: `mp3`, `wav`, `m4a`, `flac`, `ogg`, `webm`, and so on.

---

## 9. Uninstall

```bash
pip3 uninstall openai-whisper

# Delete downloaded models (default cache location)
rm -rf ~/.cache/whisper
```

---

## 10. Related Resources

- **GitHub**: https://github.com/openai/whisper
- **Model download**: automatic on the first `load_model()` call
- **Cache location**: `~/.cache/whisper/`
- **This project's script**: `~/.openclaw/temp/xingshu/scripts/nas_whisper_gemini_summarize.py`
@@ -1,147 +1,147 @@
---
title: Agents Dispatched by 星枢
source:
author: shenwei
published:
created:
description:
tags: []
---

# Agents Dispatched by 星枢

> Created: 2026-03-17
> Author: 星枢

---

## Overview

This document records every agent that 星枢 (the top-level orchestrator) dispatches, including each agent's server, name, and assigned responsibilities.
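
---

## Agent Architecture

### Three-tier architecture

| Tier | Series | Meaning | Main responsibilities |
|------|------|------|----------|
| Control tier | 星 (Star) series | Command over the stars | Dispatch, management, intelligent decision-making |
| Technical tier | 云 (Cloud) series | Computing power vast as a sea of clouds | Development, architecture, monitoring |
| Execution tier | 风 (Wind) series | Travelling far like the wind | Testing, business execution, process handling |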

---

## Agent Details

### Mac Mini (central control node)

| Server | Agent ID | Role | Telegram Account | Responsibilities |
|--------|----------|------|----------------|------|
| Mac Mini | main | **星枢** | xingshu | Top-level orchestrator / overall dispatch |
| Mac Mini | xingyao | 星曜 | xingyao | IT steward / operations management |
| Mac Mini | xinghui | 星辉 | xinghui | Personal assistant / schedule management |

### Ubuntu2 (development server)

| Server | Agent ID | Role | Responsibilities |
|--------|----------|------|------|
| Ubuntu2 (192.168.3.45) | yunhan | 云瀚 | Monitoring officer / system monitoring, status checks |
| Ubuntu2 (192.168.3.45) | yunce | 云策 | Architect / technical design, system planning |
| Ubuntu2 (192.168.3.45) | yunjiang | 云匠 | Craftsman / code development, engineering implementation |
| Ubuntu2 (192.168.3.45) | yunzhi | 云织 | Automation engineer / CI/CD, workflow orchestration |

### Ubuntu1 (pre-production server)

| Server | Agent ID | Role | Responsibilities |
|--------|----------|------|------|
| Ubuntu1 (192.168.3.47) | fengheng | 风衡 | QA officer / QA testing, quality control |
| Ubuntu1 (192.168.3.47) | fengchi | 风驰 | Executor / task execution, business processes |
| Ubuntu1 (192.168.3.47) | fengji | 风纪 | Auditor / rule audits, compliance checks |

---

## Responsibility Assignment

| Task type | Executor | Server |
|----------|--------|---------|
| IT operations / server management | xingyao (星曜) | Mac Mini |
| Schedule / personal affairs | xinghui (星辉) | Mac Mini |
| Monitoring / architecture / automation | yunhan/yunce/yunzhi | Ubuntu2 |
| Code development | yunjiang (云匠) | Ubuntu2 |
| QA testing | fengheng (风衡) | Ubuntu1 |
| Automated execution | fengchi (风驰) | Ubuntu1 |
| Audit / compliance | fengji (风纪) | Ubuntu1 |
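
A dispatcher could keep the assignment table as a lookup that it consults before routing a task. This is a minimal sketch that assumes short string labels for task types. The `ROUTING` dict, the `route` function, and the fallback to the orchestrator (`main`) are hypothetical illustrations, not an existing OpenClaw feature. The combined "monitoring / architecture / automation" row is split per agent, following the detailed lists above.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    agent_id: str
    server: str


# Mirrors the assignment table above; the keys are illustrative task-type labels
ROUTING = {
    "it_ops": Assignment("xingyao", "Mac Mini"),
    "personal": Assignment("xinghui", "Mac Mini"),
    "monitoring": Assignment("yunhan", "Ubuntu2"),
    "architecture": Assignment("yunce", "Ubuntu2"),
    "automation": Assignment("yunzhi", "Ubuntu2"),
    "development": Assignment("yunjiang", "Ubuntu2"),
    "qa": Assignment("fengheng", "Ubuntu1"),
    "execution": Assignment("fengchi", "Ubuntu1"),
    "audit": Assignment("fengji", "Ubuntu1"),
}


def route(task_type: str) -> Assignment:
    """Look up the executor for a task type; unknown types stay with the orchestrator."""
    return ROUTING.get(task_type, Assignment("main", "Mac Mini"))
```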

---

## Dispatch Commands

### Local dispatch (Mac Mini)

```bash
# Dispatch 星曜
openclaw agent --agent xingyao --message "task description" --deliver

# Dispatch 星辉
openclaw agent --agent xinghui --message "task description" --deliver
```

### Remote dispatch (Ubuntu2)

```bash
# Dispatch a 云-series agent
ssh ubuntu2 "openclaw agent --agent yunce --message 'task description'"
```

### Remote dispatch (Ubuntu1)

```bash
# Dispatch a 风-series agent
ssh ubuntu1 "openclaw agent --agent fengheng --message 'task description'"
```
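
The remote commands wrap the message in single quotes, which breaks as soon as the task text itself contains a `'`. This is a minimal sketch that builds the same command safely with `shlex.quote`. It assumes the `openclaw agent --agent ... --message ...` syntax shown above, SSH host aliases `ubuntu1`/`ubuntu2`, and a POSIX shell on the remote side. `build_remote_dispatch` is a hypothetical helper.

```python
import shlex
from typing import List


def build_remote_dispatch(host: str, agent: str, message: str) -> List[str]:
    """Return an argv list for subprocess.run() that runs `openclaw agent` over SSH.

    ssh hands the remote command to the remote shell as one string, so each
    argument is quoted with shlex.quote before joining.
    """
    remote_cmd = " ".join(
        ["openclaw", "agent", "--agent", shlex.quote(agent), "--message", shlex.quote(message)]
    )
    return ["ssh", host, remote_cmd]


if __name__ == "__main__":
    argv = build_remote_dispatch("ubuntu1", "fengheng", "Check the user's test report")
    print(argv)
    # To execute: subprocess.run(argv, check=True)
```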

---

## Scheduled Tasks

| Task | Schedule | What it does | Executor |
|----------|----------|----------|--------|
| Daily backup | Every day at 22:00 | Back up the Mac Mini + Ubuntu2 | xinghui |
| Daily scheduled-task check | Every day at 09:00 | Check the status of all scheduled tasks | xinghui |

---

## Skills

### Mac Mini skills

- task-summary
- proactive-agent-lite
- openclaw-tavily-search
- self-improving-agent
- (60+ other skills)

### Ubuntu2 skills

- task-summary
- proactive-agent-lite
- self-improving-agent
- 1password
- agent-browser-clawdbot
- docker
- ontology

---

## Changelog

| Date | Changes |
|------|----------|
| 2026-03-17 | Initial version |

---

*Last updated: 2026-03-17*
*Recorded by: 星枢*