chore: ignore raw and wiki, update remote

2026-04-16 13:13:32 +08:00
parent b02eb12d1d
commit 753f7841e8
11 changed files with 1038 additions and 155 deletions

2
.gitignore vendored Normal file

@@ -0,0 +1,2 @@
raw/
wiki/

352
CLAUDE.md

@@ -1,78 +1,132 @@
# LLM Wiki Agent — Schema & Workflow Instructions
# LLM Wiki Agent — Schema & Workflow Instructions (Chinese-language enhanced specification)
This wiki is maintained entirely by Claude Code. No API key or Python scripts needed — just open this repo in Claude Code and talk to it.
This wiki is maintained entirely and automatically by Claude Code. No API key or Python scripts are needed: just open this repository in Claude Code and talk to it.
## Slash Commands (Claude Code)
---
# 🔴 Global Mandatory Rules (CRITICAL)
| Command | What to say |
|---|---|
| `/wiki-ingest` | `ingest raw/my-article.md` |
| `/wiki-query` | `query: what are the main themes?` |
| `/wiki-lint` | `lint the wiki` |
| `/wiki-graph` | `build the knowledge graph` |
## 1. Output Language (Mandatory)
Or just describe what you want in plain English:
- *"Ingest this file: raw/papers/attention-is-all-you-need.md"*
- *"What does the wiki say about transformer models?"*
- *"Check the wiki for orphan pages and contradictions"*
- *"Build the graph and show me what's connected to RAG"*
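- All output must be in **Simplified Chinese**
- Proper nouns may stay in English, but the first occurrence must carry a Chinese explanation
- If the original filename is Chinese, name the source page in Chinese where possible rather than in pinyin; special characters may be dropped
- Mixed Chinese-English sentences are forbidden (terminology excepted)
- Pure-English summaries or analyses are not allowed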
Claude Code reads this file automatically and follows the workflows below.
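Example:
Transformer（变压器模型）：一种基于注意力机制的神经网络架构 (in English: "Transformer (transformer model): a neural network architecture based on the attention mechanism")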
---
## Directory Layout
## 2. Output Style (Strictly Enforced)
```
raw/ # Immutable source documents — never modify these
wiki/ # Claude owns this layer entirely
  index.md # Catalog of all pages — update on every ingest
  log.md # Append-only chronological record
  overview.md # Living synthesis across all sources
  sources/ # One summary page per source document
  entities/ # People, companies, projects, products
  concepts/ # Ideas, frameworks, methods, theories
  syntheses/ # Saved query answers
graph/ # Auto-generated graph data
tools/ # Optional standalone Python scripts (require ANTHROPIC_API_KEY)
```
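All output must:
- Drop rhetoric (narrative style is forbidden)
- Drop vagueness (hedging words such as "可能" (maybe) and "大概" (probably) are forbidden)
- Maximize information density
- Target knowledge structuring, not reading experience
Priority:
Structure > Relations > Conclusions > Description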
---
## Page Format
## 3. Structured Semantics (Mandatory)
Every wiki page uses this frontmatter:
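All pages must follow the structured-semantics rules:
- Summary must use the fixed fields
- Claim must follow the standard grammar
- Connections must use relation types
- No free-form improvisation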
---
# Slash Commands (Claude Code)
| Command | Usage |
| -------------- | --------------------------- |
| `/wiki-ingest` | `ingest raw/your-file.md` |
| `/wiki-query` | `query: your question` |
| `/wiki-lint` | `lint the wiki` |
| `/wiki-graph` | `build the knowledge graph` |
---
## Natural-Language Examples
- ingest raw/papers/attention-is-all-you-need.md
- query: What is the core mechanism of the Transformer?
- lint the wiki
- build the graph and analyze RAG
Claude Code reads this file automatically and runs the workflows below.
---
# Directory Layout
```
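raw/ # Source documents (must not be modified)
wiki/ # Knowledge layer (maintained entirely by Claude)
  index.md # Page index (must be updated on every ingest)
  log.md # Append-only log
  overview.md # Global knowledge summary
  sources/ # One page per source document
  entities/ # Entities (people/companies/products/projects)
  concepts/ # Concepts (methods/theories/frameworks)
  syntheses/ # Saved query results
graph/ # Auto-generated graph data
tools/ # Optional Python tools (require ANTHROPIC_API_KEY)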
````
---
# Page Format
Every page must include:
```yaml
---
id: unique_id
title: "Page Title"
type: source | entity | concept | synthesis
tags: []
sources: [] # list of source slugs that inform this page
sources: [] # sources
last_updated: YYYY-MM-DD
---
```
````
Use `[[PageName]]` wikilinks to link to other wiki pages.
Links must use the `[[PageName]]` wikilink syntax.
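
The frontmatter template above lends itself to a mechanical check. The following is a minimal sketch, not part of the repository: it assumes the six fields shown in the template are all required, and it uses a deliberately naive line parser (top-level `key: value` pairs only) instead of a YAML library. The names `parse_frontmatter` and `frontmatter_problems` are hypothetical.

```python
import re

# Assumption: every field in the frontmatter template is required.
REQUIRED_FIELDS = {"id", "title", "type", "tags", "sources", "last_updated"}
ALLOWED_TYPES = {"source", "entity", "concept", "synthesis"}


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a leading --- block as raw strings."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        match = re.match(r"^([A-Za-z_]+):\s*(.*)$", line)
        if match:
            # Drop a trailing " # comment" and surrounding double quotes.
            value = match.group(2).split(" #")[0].strip().strip('"')
            fields[match.group(1)] = value
    return fields


def frontmatter_problems(text: str) -> list[str]:
    """List missing fields, unknown page types, and malformed dates."""
    fields = parse_frontmatter(text)
    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - fields.keys())]
    if "type" in fields and fields["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {fields['type']}")
    if "last_updated" in fields and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", fields["last_updated"]):
        problems.append("last_updated is not YYYY-MM-DD")
    return problems
```

An empty result means the page passed these checks; nested YAML or multi-line values would need a real YAML parser.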
---
## Ingest Workflow
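# Ingest Workflow
**Important:** Follow the ingest workflow strictly. For every document analyzed, create or update the source page, entity pages, concept pages, and so on. Nothing may be skipped.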
Triggered by: *"ingest <file>"* or `/wiki-ingest`
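Triggers:
- `/wiki-ingest`
- or: `ingest <file>`
## Steps (strict order)
1. Read the source document in full with the Read tool
2. Read `wiki/index.md` and `wiki/overview.md`
3. Write `wiki/sources/<original-Chinese-name>.md` (use `slug.md` for non-Chinese names)
4. Update `wiki/index.md`
5. Update `wiki/overview.md` (if needed)
6. Create or update Entity pages
7. Create or update Concept pages
8. Detect and record conflicts
9. Append to `wiki/log.md`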
Steps (in order):
1. Read the source document fully using the Read tool
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
3. Write `wiki/sources/<slug>.md` — use the source page format below
4. Update `wiki/index.md` — add entry under Sources section
5. Update `wiki/overview.md` — revise synthesis if warranted
6. Update/create entity pages for key people, companies, projects mentioned
7. Update/create concept pages for key ideas and frameworks discussed
8. Flag any contradictions with existing wiki content
9. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`
---
### Source Page Format
# Source Page Format (Enhanced Structure)
```markdown
---
@@ -80,32 +134,46 @@ title: "Source Title"
type: source
tags: []
date: YYYY-MM-DD
source_file: raw/...
---
## Source File
- [[raw/...]]
## Summary
2-4 sentence summary.
- Core topic:
- Problem domain:
- Method/mechanism:
- Conclusion/value:
## Key Claims
- Claim 1
- Claim 2
- (must follow the pattern: subject + mechanism + result)
## Key Quotes
> "Quote here" — context
> "Quoted text" — context note
## Key Concepts
- [[ConceptName]]: definition
## Key Entities
- [[EntityName]]: role
## Connections
- [[EntityName]] — how they relate
- [[ConceptName]] — how it connects
- [[A]] ← depends_on ← [[B]]
- [[C]] ← extends ← [[D]]
## Contradictions
- Contradicts [[OtherPage]] on: ...
- Conflicts with [[OtherPage]]:
  - Point of conflict:
  - This page's view:
  - Other page's view:
```
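
The typed `Connections` lines in the template above (`[[A]] ← depends_on ← [[B]]`) are regular enough to parse mechanically. The sketch below is an illustration only: the function name `parse_connections` is hypothetical, relation names are assumed to be lowercase with underscores, and the result is returned as `(left, relation, right)` in written order because the template does not state which side is the subject.

```python
import re

# Assumption: relation names are lowercase identifiers such as depends_on or extends.
CONNECTION = re.compile(r"\[\[([^\]]+)\]\]\s*←\s*([a-z_]+)\s*←\s*\[\[([^\]]+)\]\]")


def parse_connections(markdown: str) -> list[tuple[str, str, str]]:
    """Return (left_page, relation, right_page) for each typed connection line."""
    return [match.groups() for match in CONNECTION.finditer(markdown)]
```

Lines that link pages without a relation type are simply not returned, which makes it easy to compare the count of parsed connections against the count of bullets in the section.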
### Domain-Specific Templates
---
If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:
# Domain-Specific Templates
## Diary / Journal
#### Diary / Journal Template
```markdown
---
title: "YYYY-MM-DD Diary"
@@ -114,18 +182,16 @@ tags: [diary]
date: YYYY-MM-DD
---
## Event Summary
...
## Key Decisions
...
## Energy & Mood
...
## Connections
...
## Shifts & Contradictions
...
```
#### Meeting Notes Template
---
## Meeting Notes
```markdown
---
title: "Meeting Title"
@@ -134,97 +200,153 @@ tags: [meeting]
date: YYYY-MM-DD
---
## Goal
...
## Key Discussions
...
## Decisions Made
...
## Action Items
...
```
---
## Query Workflow
# Entity & Concept Rules (Key Enhancement)
Triggered by: *"query: <question>"* or `/wiki-query`
## Entity
Steps:
1. Read `wiki/index.md` to identify relevant pages
2. Read those pages with the Read tool
3. Synthesize an answer with inline citations as `[[PageName]]` wikilinks
4. Ask the user if they want the answer filed as `wiki/syntheses/<slug>.md`
Creation criteria:
- Appears ≥ 2 times
- Has a key influence on the topic
Types:
- Person / company / product / project
---
## Lint Workflow
## Concept
Creation criteria:
- Can be abstracted
- Reusable
- Not a concrete instance
---
Triggered by: *"lint the wiki"* or `/wiki-lint`
## Naming Rules (Mandatory)
- Use a single canonical name
- Record all aliases on the page:
Use Grep and Read tools to check for:
- **Orphan pages** — wiki pages with no inbound `[[links]]` from other pages
- **Broken links** — `[[WikiLinks]]` pointing to pages that don't exist
- **Contradictions** — claims that conflict across pages
- **Stale summaries** — pages not updated after newer sources
- **Missing entity pages** — entities mentioned in 3+ pages but lacking their own page
- **Data gaps** — questions the wiki can't answer; suggest new sources
Output a lint report and ask if the user wants it saved to `wiki/lint-report.md`.
```markdown
## Aliases
- GPT4
- GPT-4
```
---
## Graph Workflow
## Deduplication (Mandatory)
Triggered by: *"build the knowledge graph"* or `/wiki-graph`
When the user asks to build the graph, run `tools/build_graph.py` which:
- Pass 1: Parses all `[[wikilinks]]` → deterministic `EXTRACTED` edges
- Pass 2: Infers implicit relationships → `INFERRED` edges with confidence scores
- Runs Louvain community detection
- Outputs `graph/graph.json` + `graph/graph.html`
If the user doesn't have Python/dependencies set up, instead generate the graph data manually:
1. Use Grep to find all `[[wikilinks]]` across wiki pages
2. Build a node/edge list
3. Write `graph/graph.json` directly
4. Write `graph/graph.html` using the vis.js template
Before creating a page:
1. Search the index
2. Determine whether the page already exists
3. If it exists, update it instead
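
The check-before-create rule can be sketched as an alias-aware lookup. This is an illustration under stated assumptions, not the agent's actual procedure: `normalize` and `find_existing` are hypothetical names, the `pages` mapping (canonical name to alias list) stands in for whatever the index and `## Aliases` sections contain, and the folding rule (case-insensitive, ignoring spaces, hyphens, and underscores) is one plausible choice.

```python
from typing import Optional


def normalize(name: str) -> str:
    """Fold case and drop spaces, hyphens, and underscores so 'GPT-4' and 'gpt4' compare equal."""
    return "".join(ch for ch in name.casefold() if ch not in " -_")


def find_existing(name: str, pages: dict[str, list[str]]) -> Optional[str]:
    """Return the canonical page whose title or alias matches `name`, else None."""
    wanted = normalize(name)
    for canonical, aliases in pages.items():
        if wanted in {normalize(canonical), *map(normalize, aliases)}:
            return canonical
    return None
```

A non-`None` result means the existing page should be updated; `None` means a new page may be created under the canonical name.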
---
## Naming Conventions
# Query Workflow
- Source slugs: `kebab-case` matching source filename
- Entity pages: `TitleCase.md` (e.g. `OpenAI.md`, `SamAltman.md`)
- Concept pages: `TitleCase.md` (e.g. `ReinforcementLearning.md`, `RAG.md`)
- Source pages: `kebab-case.md`
Triggers:
- `/wiki-query`
- or: `query: <question>`
## Index Format
---
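## Steps
1. Read the index
2. Identify the relevant pages
3. Load them with the Read tool
4. Produce a structured answer
5. Cite pages with `[[Page]]`
6. Ask whether to save the answer as a synthesis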
---
# Lint Workflow
Checks:
- Orphan pages
- Broken links
- Conflicts
- Stale content
- Missing Entity pages
- Missing Concept pages
- Knowledge gaps
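
Two of these checks, orphan pages and broken links, reduce to set arithmetic over wikilinks. The sketch below assumes pages are already loaded into a dict of page name to Markdown body; `lint_links` is a hypothetical helper, and the pattern ignores any `|label` or `#anchor` suffix inside a link.

```python
import re

# Capture the target of [[Target]], [[Target|label]], or [[Target#anchor]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint_links(pages: dict[str, str]) -> tuple[set[str], set[str]]:
    """Return (orphan_pages, broken_targets) for a mapping of page name -> body."""
    linked = set()
    for name, body in pages.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target != name:  # a self-link does not count as an inbound link
                linked.add(target)
    orphans = set(pages) - linked
    broken = linked - set(pages)
    return orphans, broken
```

In a real wiki, entry points such as `index.md` and `overview.md` would likely show up as orphans under this definition and would need to be excluded explicitly.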
---
# Graph Workflow (Knowledge Graph)
Trigger:
- `/wiki-graph`
---
Execution:
- Prefer running `tools/build_graph.py`
- Otherwise build the graph manually:
Steps:
1. Extract all `[[links]]`
2. Build nodes and edges
3. Write `graph.json`
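
The manual fallback can be sketched in a few lines. This is not the logic of `tools/build_graph.py`, which is not shown here; `build_graph` is a hypothetical function, the `nodes`/`edges` JSON layout is an assumption, and the `EXTRACTED` label is borrowed from the earlier English description of wikilink-derived edges.

```python
import json
import re

WIKILINK = re.compile(r"\[\[([^\]]+)\]\]")


def build_graph(pages: dict[str, str]) -> dict:
    """Turn a mapping of page name -> Markdown body into a node/edge dict."""
    edges = set()
    for source, body in pages.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target != source:  # skip self-links
                edges.add((source, target))
    # Include link targets that have no page of their own as nodes too.
    nodes = sorted(set(pages) | {target for _, target in edges})
    return {
        "nodes": [{"id": name} for name in nodes],
        "edges": [{"from": s, "to": t, "type": "EXTRACTED"} for s, t in sorted(edges)],
    }


if __name__ == "__main__":
    demo = {"RAG": "Builds on [[Transformer]].", "Transformer": ""}
    print(json.dumps(build_graph(demo), ensure_ascii=False, indent=2))
```

Using a set for edges collapses repeated links between the same pair of pages; a weighted graph would count them instead.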
---
# Naming Conventions
- Source: keep the original Chinese name with special characters removed; use kebab-case for non-Chinese names
- Entity: TitleCase
- Concept: TitleCase
---
# Index Format
```markdown
# Wiki Index
## Overview
- [Overview](overview.md) — living synthesis
- [Overview](overview.md)
## Sources
- [Source Title](sources/slug.md) — one-line summary
- [Title](sources/<original-Chinese-name>.md)
## Entities
- [Entity Name](entities/EntityName.md) — one-line description
- [Entity](entities/Entity.md)
## Concepts
- [Concept Name](concepts/ConceptName.md) — one-line description
- [Concept](concepts/Concept.md)
## Syntheses
- [Analysis Title](syntheses/slug.md) — what question it answers
- [Title](syntheses/slug.md)
```
## Log Format
---
Each entry starts with `## [YYYY-MM-DD] <operation> | <title>` so it's grep-parseable:
# Log Format
```
grep "^## \[" wiki/log.md | tail -10
## [YYYY-MM-DD] ingest | Title
```
Operations: `ingest`, `query`, `lint`, `graph`
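
Because every entry starts with the same heading shape, the log can also be read programmatically. A minimal sketch, assuming the four operations listed above are the only valid ones; `parse_log` is a hypothetical helper.

```python
import re

LOG_HEADING = re.compile(
    r"^## \[(\d{4}-\d{2}-\d{2})\] (ingest|query|lint|graph) \| (.+)$",
    re.MULTILINE,
)


def parse_log(text: str) -> list[tuple[str, str, str]]:
    """Return (date, operation, title) for every well-formed log heading."""
    return [match.groups() for match in LOG_HEADING.finditer(text)]
```

Headings with an unknown operation or a malformed date are skipped rather than reported, so a stricter linter would compare the parsed count with the number of `## [` lines.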
---
# ✅ End Goal
This system is used for:
- Knowledge accumulation
- Structured understanding
- Automatic graph construction
- Agent reasoning support
---
# END

230
CLAUDE.md.bak Normal file

@@ -0,0 +1,230 @@
# LLM Wiki Agent — Schema & Workflow Instructions
This wiki is maintained entirely by Claude Code. No API key or Python scripts needed — just open this repo in Claude Code and talk to it.
## Slash Commands (Claude Code)
| Command | What to say |
|---|---|
| `/wiki-ingest` | `ingest raw/my-article.md` |
| `/wiki-query` | `query: what are the main themes?` |
| `/wiki-lint` | `lint the wiki` |
| `/wiki-graph` | `build the knowledge graph` |
Or just describe what you want in plain English:
- *"Ingest this file: raw/papers/attention-is-all-you-need.md"*
- *"What does the wiki say about transformer models?"*
- *"Check the wiki for orphan pages and contradictions"*
- *"Build the graph and show me what's connected to RAG"*
Claude Code reads this file automatically and follows the workflows below.
---
## Directory Layout
```
raw/ # Immutable source documents — never modify these
wiki/ # Claude owns this layer entirely
  index.md # Catalog of all pages — update on every ingest
  log.md # Append-only chronological record
  overview.md # Living synthesis across all sources
  sources/ # One summary page per source document
  entities/ # People, companies, projects, products
  concepts/ # Ideas, frameworks, methods, theories
  syntheses/ # Saved query answers
graph/ # Auto-generated graph data
tools/ # Optional standalone Python scripts (require ANTHROPIC_API_KEY)
```
---
## Page Format
Every wiki page uses this frontmatter:
```yaml
---
title: "Page Title"
type: source | entity | concept | synthesis
tags: []
sources: [] # list of source slugs that inform this page
last_updated: YYYY-MM-DD
---
```
Use `[[PageName]]` wikilinks to link to other wiki pages.
---
## Ingest Workflow
Triggered by: *"ingest <file>"* or `/wiki-ingest`
Steps (in order):
1. Read the source document fully using the Read tool
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
3. Write `wiki/sources/<slug>.md` — use the source page format below
4. Update `wiki/index.md` — add entry under Sources section
5. Update `wiki/overview.md` — revise synthesis if warranted
6. Update/create entity pages for key people, companies, projects mentioned
7. Update/create concept pages for key ideas and frameworks discussed
8. Flag any contradictions with existing wiki content
9. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`
### Source Page Format
```markdown
---
title: "Source Title"
type: source
tags: []
date: YYYY-MM-DD
source_file: raw/...
---
## Summary
2-4 sentence summary.
## Key Claims
- Claim 1
- Claim 2
## Key Quotes
> "Quote here" — context
## Connections
- [[EntityName]] — how they relate
- [[ConceptName]] — how it connects
## Contradictions
- Contradicts [[OtherPage]] on: ...
```
### Domain-Specific Templates
If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:
#### Diary / Journal Template
```markdown
---
title: "YYYY-MM-DD Diary"
type: source
tags: [diary]
date: YYYY-MM-DD
---
## Event Summary
...
## Key Decisions
...
## Energy & Mood
...
## Connections
...
## Shifts & Contradictions
...
```
#### Meeting Notes Template
```markdown
---
title: "Meeting Title"
type: source
tags: [meeting]
date: YYYY-MM-DD
---
## Goal
...
## Key Discussions
...
## Decisions Made
...
## Action Items
...
```
---
## Query Workflow
Triggered by: *"query: <question>"* or `/wiki-query`
Steps:
1. Read `wiki/index.md` to identify relevant pages
2. Read those pages with the Read tool
3. Synthesize an answer with inline citations as `[[PageName]]` wikilinks
4. Ask the user if they want the answer filed as `wiki/syntheses/<slug>.md`
---
## Lint Workflow
Triggered by: *"lint the wiki"* or `/wiki-lint`
Use Grep and Read tools to check for:
- **Orphan pages** — wiki pages with no inbound `[[links]]` from other pages
- **Broken links** — `[[WikiLinks]]` pointing to pages that don't exist
- **Contradictions** — claims that conflict across pages
- **Stale summaries** — pages not updated after newer sources
- **Missing entity pages** — entities mentioned in 3+ pages but lacking their own page
- **Data gaps** — questions the wiki can't answer; suggest new sources
Output a lint report and ask if the user wants it saved to `wiki/lint-report.md`.
---
## Graph Workflow
Triggered by: *"build the knowledge graph"* or `/wiki-graph`
When the user asks to build the graph, run `tools/build_graph.py` which:
- Pass 1: Parses all `[[wikilinks]]` → deterministic `EXTRACTED` edges
- Pass 2: Infers implicit relationships → `INFERRED` edges with confidence scores
- Runs Louvain community detection
- Outputs `graph/graph.json` + `graph/graph.html`
If the user doesn't have Python/dependencies set up, instead generate the graph data manually:
1. Use Grep to find all `[[wikilinks]]` across wiki pages
2. Build a node/edge list
3. Write `graph/graph.json` directly
4. Write `graph/graph.html` using the vis.js template
---
## Naming Conventions
- Source slugs: `kebab-case` matching source filename
- Entity pages: `TitleCase.md` (e.g. `OpenAI.md`, `SamAltman.md`)
- Concept pages: `TitleCase.md` (e.g. `ReinforcementLearning.md`, `RAG.md`)
- Source pages: `kebab-case.md`
## Index Format
```markdown
# Wiki Index
## Overview
- [Overview](overview.md) — living synthesis
## Sources
- [Source Title](sources/slug.md) — one-line summary
## Entities
- [Entity Name](entities/EntityName.md) — one-line description
## Concepts
- [Concept Name](concepts/ConceptName.md) — one-line description
## Syntheses
- [Analysis Title](syntheses/slug.md) — what question it answers
```
## Log Format
Each entry starts with `## [YYYY-MM-DD] <operation> | <title>` so it's grep-parseable:
```
grep "^## \[" wiki/log.md | tail -10
```
Operations: `ingest`, `query`, `lint`, `graph`

1
raw Symbolic link

@@ -0,0 +1 @@
/Users/weishen/Workspace/nexus/raw


Binary file not shown.

567
tools/sync.py Executable file

@@ -0,0 +1,567 @@
#!/usr/bin/env python3
"""
Wiki ↔ Raw three-way sync tool

Features:
- Detect file changes under raw/ (added / modified / deleted)
- Call ingest.py automatically to sync
- Maintain the manifest.json state mapping
- Detect orphan entity/concept pages (report only, never delete)

Usage:
    python tools/sync.py --check      Preview changes (nothing is executed)
    python tools/sync.py --sync       Run the sync
    python tools/sync.py --rebuild    Rebuild wiki/index from the manifest (fallback)
    python tools/sync.py --bootstrap  Generate the manifest from existing wiki sources (first use; skips files already ingested)

manifest.json format:
    {
      "version": 1,
      "updated_at": "ISO timestamp",
      "files": {
        "relative/path/to/file.md": {
          "hash": "sha256",
          "modified": "ISO timestamp",
          "slug": "wiki-source-slug",
          "source_path": "wiki/sources/slug.md",
          "ingested": true
        }
      }
    }
"""
import os
import sys
import json
import hashlib
import subprocess
from pathlib import Path
from datetime import datetime, timezone
REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
MANIFEST_FILE = WIKI_DIR / "manifest.json"
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"
# ─── Utility functions ───────────────────────────────────────────────


def green(text):
    return f"\033[92m{text}\033[0m"


def yellow(text):
    return f"\033[93m{text}\033[0m"


def red(text):
    return f"\033[91m{text}\033[0m"


def dim(text):
    return f"\033[2m{text}\033[0m"


def bold(text):
    return f"\033[1m{text}\033[0m"


def log(msg, style="normal"):
    prefixes = {
        "normal": " ",
        "info": " ",
        "success": "",
        "warn": "",
        "error": "",
        "section": "\n── ",
    }
    print(f"{prefixes.get(style, ' ')}{msg}")


def sha256_file(path: Path) -> str:
    """Return the first 16 hex characters of the file's SHA-256 digest."""
    h = hashlib.sha256()
    h.update(path.read_bytes())
    return h.hexdigest()[:16]


def iso_now():
    return datetime.now(timezone.utc).isoformat()


def load_manifest() -> dict:
    if MANIFEST_FILE.exists():
        try:
            return json.loads(MANIFEST_FILE.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, IOError):
            pass  # unreadable manifest: fall through to an empty one
    return {"version": 1, "updated_at": iso_now(), "files": {}}


def save_manifest(manifest: dict):
    manifest["updated_at"] = iso_now()
    MANIFEST_FILE.write_text(json.dumps(manifest, ensure_ascii=False, indent=2), encoding="utf-8")
def scan_raw() -> dict[str, dict]:
    """Return {relative_path: {hash, modified, size, abs_path}}."""
    raw_dir = REPO_ROOT / "raw"
    result = {}
    if not raw_dir.exists():
        return result
    for p in raw_dir.rglob("*.md"):
        if p.is_file() and not p.name.startswith("."):
            rel = str(p.relative_to(REPO_ROOT))
            stat = p.stat()
            result[rel] = {
                "hash": sha256_file(p),
                "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
                "size": stat.st_size,
                "abs_path": str(p),
            }
    return result


def build_slug_from_path(rel_path: str) -> str:
    """Build a slug from a relative path (keeps Chinese characters where possible; kebab-case)."""
    name = Path(rel_path).stem
    name = name.replace(" ", "-").replace("/", "-").replace("\\", "-")
    # Keep letters and digits (including CJK) plus "-", "_", "·"; replace the rest with "-"
    name = "".join(c if c.isalnum() or c in ("-", "_", "·") else "-" for c in name)
    name = name.strip("-")
    return name or "untitled"
def call_ingest(source_path: str, slug: str = None) -> dict:
    """Call ingest.py and return the result (slug is currently not passed on)."""
    cmd = [sys.executable, str(REPO_ROOT / "tools" / "ingest.py"), source_path]
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=300,
            cwd=str(REPO_ROOT),
        )
        return {
            "success": result.returncode == 0,
            "stdout": result.stdout,
            "stderr": result.stderr,
        }
    except subprocess.TimeoutExpired:
        return {"success": False, "stdout": "", "stderr": "Timeout (>5min)"}
    except Exception as e:
        return {"success": False, "stdout": "", "stderr": str(e)}
def find_orphan_entity_concept(manifest: dict) -> tuple[list, list]:
    """Detect entity and concept pages not referenced by any source page."""
    # Extract [[wikilinks]] from the content of every source page
    import re

    wikilink_pattern = re.compile(r"\[\[([^\]]+)\]\]")
    sources_dir = WIKI_DIR / "sources"
    referenced_entities = set()
    referenced_concepts = set()
    if sources_dir.exists():
        for src in sources_dir.glob("*.md"):
            content = src.read_text(encoding="utf-8")
            for link in wikilink_pattern.findall(content):
                name = link.strip()
                if name.startswith("entities/"):
                    referenced_entities.add(Path(name).stem)
                elif name.startswith("concepts/"):
                    referenced_concepts.add(Path(name).stem)
                elif "/" not in name:
                    # Bare wikilink: could be an entity or a concept
                    referenced_entities.add(name)
                    referenced_concepts.add(name)
    # Check the entities directory
    orphan_entities = []
    entities_dir = WIKI_DIR / "entities"
    if entities_dir.exists():
        for f in entities_dir.glob("*.md"):
            if f.stem not in referenced_entities:
                orphan_entities.append(f.name)
    # Check the concepts directory
    orphan_concepts = []
    concepts_dir = WIKI_DIR / "concepts"
    if concepts_dir.exists():
        for f in concepts_dir.glob("*.md"):
            if f.stem not in referenced_concepts:
                orphan_concepts.append(f.name)
    return orphan_entities, orphan_concepts
# ─── Core sync logic ───────────────────────────────────────────────


def check_changes(manifest: dict, raw_files: dict) -> dict:
    """Compare the manifest with the actual raw files and return the changes."""
    changes = {"new": [], "updated": [], "deleted": [], "unchanged": []}
    manifest_files = manifest.get("files", {})
    # Walk the current raw files
    for rel_path, info in raw_files.items():
        if rel_path not in manifest_files:
            changes["new"].append({"rel_path": rel_path, **info})
        elif info["hash"] != manifest_files[rel_path]["hash"]:
            changes["updated"].append({
                "rel_path": rel_path,
                "old_hash": manifest_files[rel_path]["hash"],
                **info,
            })
        else:
            changes["unchanged"].append(rel_path)
    # Walk the manifest to find deleted files
    for rel_path in manifest_files:
        abs_path = REPO_ROOT / rel_path
        if not abs_path.exists():
            changes["deleted"].append({
                "rel_path": rel_path,
                "slug": manifest_files[rel_path].get("slug", build_slug_from_path(rel_path)),
                "source_path": manifest_files[rel_path].get("source_path"),
            })
    return changes
def run_sync(dry_run: bool = False, verbose: bool = False):
    print(f"\n{bold('=== Wiki Sync')}\n")
    print(f" Date: {datetime.now().strftime('%Y-%m-%d %H:%M')}")
    print(f" Raw: {REPO_ROOT / 'raw'}")
    print(f" Wiki: {WIKI_DIR}")
    print(f" Mode: {'DRY-RUN (preview only)' if dry_run else 'LIVE SYNC'}")
    print()
    # Step 1: load manifest
    manifest = load_manifest()
    log("manifest.json loaded", "info")
    # Step 2: scan raw/
    raw_files = scan_raw()
    log(f"raw/ scan: {len(raw_files)} .md files found", "info")
    # Step 3: check changes
    changes = check_changes(manifest, raw_files)
    total_changes = len(changes["new"]) + len(changes["updated"]) + len(changes["deleted"])
    if total_changes == 0:
        log("No changes detected — wiki is up to date.", "success")
        return
    # ─── Report ───
    print(f"\n{bold('--- Changes ---')}")
    print(f" {green('+')} New: {len(changes['new'])}")
    print(f" {yellow('~')} Updated: {len(changes['updated'])}")
    print(f" {red('-')} Deleted: {len(changes['deleted'])}")
    if verbose or not dry_run:
        if changes["new"]:
            print(f"\n {bold('New Files:')}")
            for f in changes["new"]:
                log(f"{green('[+]')} {f['rel_path']}", "normal")
        if changes["updated"]:
            print(f"\n {bold('Updated Files:')}")
            for f in changes["updated"]:
                log(f"{yellow('[~]')} {f['rel_path']} (hash changed)", "normal")
        if changes["deleted"]:
            print(f"\n {bold('Deleted Files:')}")
            for f in changes["deleted"]:
                log(f"{red('[-]')} {f['rel_path']}", "normal")
    if dry_run:
        log("\nDry-run complete. Run with --sync to apply.", "warn")
        return
    # ─── Apply Sync ───
    print(f"\n{bold('--- Applying Sync ---')}")
    updated_manifest = manifest.copy()
    updated_manifest["files"] = manifest.get("files", {}).copy()
    # ① New → ingest
    for f in changes["new"]:
        rel_path = f["rel_path"]
        abs_path = f["abs_path"]
        slug = build_slug_from_path(rel_path)
        print(f"\n {green('[+]')} New: {rel_path}")
        print(f" slug: {slug}")
        result = call_ingest(abs_path, slug)
        if result["success"]:
            log(f"Ingested: {slug}.md", "success")
            updated_manifest["files"][rel_path] = {
                "hash": f["hash"],
                "modified": f["modified"],
                "slug": slug,
                "source_path": f"wiki/sources/{slug}.md",
                "ingested": True,
                "ingested_at": iso_now(),
            }
        else:
            log(f"Failed: {result['stderr'][:200]}", "error")
            # Record it anyway (avoids repeated ingest attempts)
            updated_manifest["files"][rel_path] = {
                "hash": f["hash"],
                "modified": f["modified"],
                "slug": slug,
                "source_path": f"wiki/sources/{slug}.md",
                "ingested": False,
                "ingested_at": None,
                "error": result["stderr"][:500],
            }
    # ② Modified → re-ingest
    for f in changes["updated"]:
        rel_path = f["rel_path"]
        abs_path = f["abs_path"]
        old_slug = manifest["files"].get(rel_path, {}).get("slug") or build_slug_from_path(rel_path)
        print(f"\n {yellow('[~]')} Updated: {rel_path}")
        result = call_ingest(abs_path, old_slug)
        if result["success"]:
            log(f"Re-ingested: {old_slug}.md", "success")
            updated_manifest["files"][rel_path] = {
                **updated_manifest["files"].get(rel_path, {}),
                "hash": f["hash"],
                "modified": f["modified"],
                "slug": old_slug,
                "source_path": f"wiki/sources/{old_slug}.md",
                "ingested": True,
                "ingested_at": iso_now(),
            }
        else:
            log(f"Failed: {result['stderr'][:200]}", "error")
    # ③ Deleted → keep the wiki content and only remove the manifest entry
    #    (orphans are kept at the user's request)
    for f in changes["deleted"]:
        rel_path = f["rel_path"]
        source_path = f.get("source_path")
        print(f"\n {red('[-]')} Deleted: {rel_path}")
        if source_path:
            # source_path already starts with "wiki/", so join it to REPO_ROOT, not WIKI_DIR
            sp = REPO_ROOT / source_path
            log(f" Wiki source kept: {sp}", "warn")
        # Remove from the manifest (wiki files are not deleted)
        if rel_path in updated_manifest["files"]:
            del updated_manifest["files"][rel_path]
    # Step 4: Save manifest
    save_manifest(updated_manifest)
    log(f"\nmanifest.json updated ({len(updated_manifest['files'])} entries)", "success")
    # Step 5: Orphan detection
    orphan_entities, orphan_concepts = find_orphan_entity_concept(updated_manifest)
    if orphan_entities or orphan_concepts:
        print(f"\n{bold('--- Orphan Report (kept as requested) ---')}")
        if orphan_entities:
            print(f" {bold('Orphan Entities')} ({len(orphan_entities)}):")
            for e in sorted(orphan_entities):
                print(f" {dim('?')} {e}")
        if orphan_concepts:
            print(f" {bold('Orphan Concepts')} ({len(orphan_concepts)}):")
            for c in sorted(orphan_concepts):
                print(f" {dim('?')} {c}")
        log("\nOrphan pages are kept (not deleted per user request).", "info")
    else:
        log("No orphan entity/concept detected.", "success")
    print(f"\n{bold('Done.')}")
def run_bootstrap():
    """Generate the manifest from existing wiki sources (skips files already ingested)."""
    import re

    print(f"\n{bold('=== Wiki Bootstrap')}\n")
    print(f" Scanning existing wiki sources to build manifest ...\n")
    sources_dir = WIKI_DIR / "sources"
    if not sources_dir.exists():
        print(f" {red('')} No wiki/sources/ directory found. Nothing to bootstrap.")
        return
    wikilink_pattern = re.compile(r"\[\[?raw/([^\]\s]+\.md)\]?]?", re.IGNORECASE)
    manifest = {"version": 1, "updated_at": iso_now(), "files": {}}
    raw_dir = (REPO_ROOT / "raw").resolve()  # resolve the symlink to the real path
    repo_raw_prefix = str(REPO_ROOT / "raw")  # for stripping the prefix to get a relative path
    bootstrapped = 0
    skipped_not_found = 0
    skipped_no_source_field = 0
    for src in sources_dir.glob("*.md"):
        content = src.read_text(encoding="utf-8")
        # Try to extract the original path from the "## Source File" field
        match = wikilink_pattern.search(content)
        if not match:
            skipped_no_source_field += 1
            continue
        # raw_rel looks like "Agent/usecases/xxx.md" (without the raw/ prefix)
        raw_rel = match.group(1).lstrip("/")
        # Join with the resolved raw_dir (follows the symlink)
        raw_path = raw_dir / raw_rel
        if not raw_path.exists():
            # File was deleted: keep the source page but leave it out of the manifest
            skipped_not_found += 1
            continue
        stat = raw_path.stat()
        file_hash = sha256_file(raw_path)
        slug = src.stem
        # Manifest keys use the "raw/Agent/xxx.md" form (relative to REPO_ROOT)
        manifest_key = f"raw/{raw_rel}"
        manifest["files"][manifest_key] = {
            "hash": file_hash,
            "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
            "slug": slug,
            "source_path": f"wiki/sources/{slug}.md",
            "ingested": True,
            "ingested_at": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        }
        bootstrapped += 1
    save_manifest(manifest)
    print(f" {bold('Result:')}")
    print(f" {green('')} Manifest entries created: {bootstrapped}")
    print(f" {yellow('~')} Skipped (source file deleted): {skipped_not_found}")
    print(f" {dim('-')} Skipped (no source_file field): {skipped_no_source_field}")
    print(f"\n {green('')} manifest.json created at: {MANIFEST_FILE}")
    print(f"\n Run now: {bold('python tools/sync.py --check')} to preview new/updated files.\n")
def run_check():
    """Preview changes only; nothing is executed."""
    manifest = load_manifest()
    raw_files = scan_raw()
    changes = check_changes(manifest, raw_files)
    total = len(changes["new"]) + len(changes["updated"]) + len(changes["deleted"])
    print(f"\n{bold('=== Wiki Sync Check')} (preview mode)\n")
    print(f" Raw files: {len(raw_files)}")
    print(f" Manifest entries: {len(manifest.get('files', {}))}")
    print(f" {green('+')} New: {len(changes['new'])}")
    print(f" {yellow('~')} Updated: {len(changes['updated'])}")
    print(f" {red('-')} Deleted: {len(changes['deleted'])}")
    if total > 0:
        if changes["new"]:
            print(f"\n {bold('New Files:')}")
            for f in changes["new"]:
                print(f" {green('[+]')} {f['rel_path']}")
        if changes["updated"]:
            print(f"\n {bold('Updated Files:')}")
            for f in changes["updated"]:
                print(f" {yellow('[~]')} {f['rel_path']} (was {f['old_hash']}, now {f['hash']})")
        if changes["deleted"]:
            print(f"\n {bold('Deleted Files:')}")
            for f in changes["deleted"]:
                print(f" {red('[-]')} {f['rel_path']}")
    else:
        print(f"\n {green('No changes — wiki is in sync.')}")
    print()
def run_rebuild():
    """Rebuild wiki/index.md from the manifest (fallback)."""
    manifest = load_manifest()
    print(f"\n{bold('=== Wiki Rebuild from Manifest')}\n")
    print(f" Manifest entries: {len(manifest.get('files', {}))}")
    print(f" Rebuilding index.md ...\n")
    index_lines = [
        "# Wiki Index\n",
        "\n## Overview\n",
        "- [Overview](overview.md) — living synthesis\n",
        "\n## Sources\n",
    ]
    files = manifest.get("files", {})
    # Sort by modified time, newest first
    sorted_files = sorted(files.items(), key=lambda x: x[1].get("modified", ""), reverse=True)
    for rel_path, info in sorted_files:
        slug = info.get("slug", build_slug_from_path(rel_path))
        source_md_path = WIKI_DIR / "sources" / f"{slug}.md"
        if source_md_path.exists():
            title = source_md_path.read_text(encoding="utf-8").split("\n")[0].lstrip("# ").strip()
            index_lines.append(f"- [{title}](sources/{slug}.md)\n")
        else:
            index_lines.append(f"- [{slug}](sources/{slug}.md) — (source missing)\n")
    index_lines.append("\n## Entities\n\n## Concepts\n\n## Syntheses\n")
    index_file = WIKI_DIR / "index.md"
    index_file.write_text("".join(index_lines), encoding="utf-8")
    print(f" {green('')} index.md rebuilt with {len(sorted_files)} sources")
    # Orphan report
    orphan_entities, orphan_concepts = find_orphan_entity_concept(manifest)
    if orphan_entities:
        print(f" {dim('?')} Orphan entities: {len(orphan_entities)}")
    if orphan_concepts:
        print(f" {dim('?')} Orphan concepts: {len(orphan_concepts)}")
    print(f"\nDone.")
# ─── CLI entry point ───────────────────────────────────────────────

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(
        description="Wiki ↔ Raw three-way sync tool",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument(
        "--check",
        action="store_true",
        help="Preview changes without syncing",
    )
    parser.add_argument(
        "--sync",
        action="store_true",
        help="Run the full sync (added/modified/deleted + orphan detection)",
    )
    parser.add_argument(
        "--rebuild",
        action="store_true",
        help="Rebuild wiki/index.md from the manifest (fallback)",
    )
    parser.add_argument(
        "--bootstrap",
        action="store_true",
        help="Generate the manifest from existing wiki sources (first use; skips files already ingested)",
    )
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Verbose output",
    )
    args = parser.parse_args()
    if args.bootstrap:
        run_bootstrap()
    elif args.rebuild:
        run_rebuild()
    elif args.check:
        run_check()
    elif args.sync:
        run_sync(dry_run=False, verbose=args.verbose)
    else:
        parser.print_help()
        print("\nExamples:")
        print(" python tools/sync.py --check # preview changes")
        print(" python tools/sync.py --sync # run the sync")
        print(" python tools/sync.py --sync -v # verbose mode")
        print(" python tools/sync.py --rebuild # rebuild the index")
        print(" python tools/sync.py --bootstrap # first use: generate the manifest from wiki sources")

1
wiki Symbolic link

@@ -0,0 +1 @@
/Users/weishen/Workspace/nexus/wiki


@@ -1,14 +0,0 @@
# Wiki Index
This file is maintained by the LLM. Updated on every ingest.
## Overview
- [Overview](overview.md) — living synthesis across all sources
## Sources
## Entities
## Concepts
## Syntheses


@@ -1,9 +0,0 @@
# Wiki Log
Append-only chronological record of all operations.
Format: `## [YYYY-MM-DD] <operation> | <title>`
Parse recent entries: `grep "^## \[" wiki/log.md | tail -10`
---


@@ -1,17 +0,0 @@
---
title: "Overview"
type: synthesis
tags: []
sources: []
last_updated: ""
---
# Overview
*This page is maintained by the LLM. It is updated on every ingest to reflect the current synthesis across all sources.*
No sources ingested yet. Add your first source with:
```bash
python tools/ingest.py raw/your-source.md
```