Auto-sync: 2026-04-16 13:01

This commit is contained in:
2026-04-16 13:01:38 +08:00
parent c0571d778c
commit b2250c60b2
59 changed files with 1288 additions and 2670 deletions

View File

@@ -1,18 +0,0 @@
Build the LLM Wiki knowledge graph.
Usage: /wiki-graph
First try running: python tools/build_graph.py --open
If that fails (missing dependencies), build the graph manually:
1. Use Grep to find all [[wikilinks]] across every file in wiki/
2. Build a nodes list: one node per wiki page, with id=relative-path, label=title, type from frontmatter
3. Build an edges list: one edge per [[wikilink]], tagged EXTRACTED
4. Infer additional implicit relationships between pages not captured by wikilinks — tag these INFERRED with a confidence score (0.01.0); tag low-confidence ones AMBIGUOUS
5. Write graph/graph.json with {nodes, edges, built: today}
6. Write graph/graph.html as a self-contained vis.js page (nodes colored by type, edges colored by type, interactive, searchable)
After building, summarize: node count, edge count, breakdown by type, and the most connected nodes (hubs).
Append to wiki/log.md: ## [today's date] graph | Knowledge graph rebuilt

View File

@@ -1,18 +0,0 @@
Ingest a source document into the LLM Wiki.
Usage: /wiki-ingest $ARGUMENTS
$ARGUMENTS should be the path to a file in raw/, e.g. `raw/articles/my-article.md`
Follow the Ingest Workflow defined in CLAUDE.md exactly:
1. Read the source file at the given path
2. Read wiki/index.md and wiki/overview.md for current context
3. Write wiki/sources/<slug>.md (source page format per CLAUDE.md)
4. Update wiki/index.md — add the new entry under Sources
5. Update wiki/overview.md — revise synthesis if warranted
6. Create/update entity pages (wiki/entities/) for key people, companies, projects
7. Create/update concept pages (wiki/concepts/) for key ideas and frameworks
8. Flag any contradictions with existing wiki content
9. Append to wiki/log.md: ## [today's date] ingest | <Title>
After completing all writes, summarize: what was added, which pages were created or updated, and any contradictions found.

View File

@@ -1,19 +0,0 @@
Health-check the LLM Wiki for issues.
Usage: /wiki-lint
Follow the Lint Workflow defined in CLAUDE.md:
Structural checks (use Grep and Glob tools):
1. Orphan pages — wiki pages with no inbound [[wikilinks]] from other pages
2. Broken links — [[WikiLinks]] pointing to pages that don't exist
3. Missing entity pages — names referenced in 3+ pages but lacking their own page
Semantic checks (read and reason over page content):
4. Contradictions — claims that conflict between pages
5. Stale summaries — pages not updated after newer sources changed the picture
6. Data gaps — important questions the wiki can't answer; suggest specific sources to find
Output a structured markdown lint report. At the end, ask if the user wants it saved to wiki/lint-report.md.
Append to wiki/log.md: ## [today's date] lint | Wiki health check

View File

@@ -1,14 +0,0 @@
Query the LLM Wiki and synthesize an answer.
Usage: /wiki-query $ARGUMENTS
$ARGUMENTS is the question to answer, e.g. `What are the main themes across all sources?`
Follow the Query Workflow defined in CLAUDE.md:
1. Read wiki/index.md to identify the most relevant pages
2. Read those pages (up to ~10 most relevant)
3. Synthesize a thorough markdown answer with [[PageName]] wikilink citations
4. Include a ## Sources section at the end listing pages you drew from
5. Ask the user if they want the answer saved as wiki/syntheses/<slug>.md
If the wiki is empty, say so and suggest running /wiki-ingest first.

219
AGENTS.md
View File

@@ -1,219 +0,0 @@
# LLM Wiki Agent — Schema & Workflow Instructions
This wiki is maintained entirely by your coding agent. No API key or Python scripts needed — just open this repo in Codex, OpenCode, or any agent that reads this file, and talk to it.
## How to Use
Describe what you want in plain English:
- *"Ingest this file: raw/papers/my-paper.md"*
- *"What does the wiki say about transformer models?"*
- *"Check the wiki for orphan pages and contradictions"*
- *"Build the knowledge graph"*
Or use shorthand triggers:
- `ingest <file>` → runs the Ingest Workflow
- `query: <question>` → runs the Query Workflow
- `lint` → runs the Lint Workflow
- `build graph` → runs the Graph Workflow
---
## Directory Layout
```
raw/ # Immutable source documents — never modify these
wiki/ # Agent owns this layer entirely
index.md # Catalog of all pages — update on every ingest
log.md # Append-only chronological record
overview.md # Living synthesis across all sources
sources/ # One summary page per source document
entities/ # People, companies, projects, products
concepts/ # Ideas, frameworks, methods, theories
syntheses/ # Saved query answers
graph/ # Auto-generated graph data
tools/ # Optional standalone Python scripts (require ANTHROPIC_API_KEY)
```
---
## Page Format
Every wiki page uses this frontmatter:
```yaml
---
title: "Page Title"
type: source | entity | concept | synthesis
tags: []
sources: [] # list of source slugs that inform this page
last_updated: YYYY-MM-DD
---
```
Use `[[PageName]]` wikilinks to link to other wiki pages.
---
## Ingest Workflow
Triggered by: *"ingest <file>"*
Steps (in order):
1. Read the source document fully
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
3. Write `wiki/sources/<slug>.md` — use the source page format below
4. Update `wiki/index.md` — add entry under Sources section
5. Update `wiki/overview.md` — revise synthesis if warranted
6. Update/create entity pages for key people, companies, projects mentioned
7. Update/create concept pages for key ideas and frameworks discussed
8. Flag any contradictions with existing wiki content
9. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`
### Source Page Format
```markdown
---
title: "Source Title"
type: source
tags: []
date: YYYY-MM-DD
source_file: raw/...
---
## Summary
24 sentence summary.
## Key Claims
- Claim 1
- Claim 2
## Key Quotes
> "Quote here" — context
## Connections
- [[EntityName]] — how they relate
- [[ConceptName]] — how it connects
## Contradictions
- Contradicts [[OtherPage]] on: ...
```
### Domain-Specific Templates
If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:
#### Diary / Journal Template
```markdown
---
title: "YYYY-MM-DD Diary"
type: source
tags: [diary]
date: YYYY-MM-DD
---
## Event Summary
...
## Key Decisions
...
## Energy & Mood
...
## Connections
...
## Shifts & Contradictions
...
```
#### Meeting Notes Template
```markdown
---
title: "Meeting Title"
type: source
tags: [meeting]
date: YYYY-MM-DD
---
## Goal
...
## Key Discussions
...
## Decisions Made
...
## Action Items
...
```
---
## Query Workflow
Triggered by: *"query: <question>"*
Steps:
1. Read `wiki/index.md` to identify relevant pages
2. Read those pages
3. Synthesize an answer with inline citations as `[[PageName]]` wikilinks
4. Ask the user if they want the answer filed as `wiki/syntheses/<slug>.md`
---
## Lint Workflow
Triggered by: *"lint"*
Check for:
- **Orphan pages** — wiki pages with no inbound `[[links]]` from other pages
- **Broken links** — `[[WikiLinks]]` pointing to pages that don't exist
- **Contradictions** — claims that conflict across pages
- **Stale summaries** — pages not updated after newer sources
- **Missing entity pages** — entities mentioned in 3+ pages but lacking their own page
- **Data gaps** — questions the wiki can't answer; suggest new sources
Output a lint report and ask if the user wants it saved to `wiki/lint-report.md`.
---
## Graph Workflow
Triggered by: *"build graph"*
First try: `python tools/build_graph.py --open`
If Python/deps unavailable, build manually:
1. Search for all `[[wikilinks]]` across wiki pages
2. Build nodes (one per page) and edges (one per link)
3. Infer implicit relationships not captured by wikilinks — tag `INFERRED` with confidence score; low confidence → `AMBIGUOUS`
4. Write `graph/graph.json` with `{nodes, edges, built: date}`
5. Write `graph/graph.html` as a self-contained vis.js visualization
---
## Naming Conventions
- Source slugs: `kebab-case` matching source filename
- Entity pages: `TitleCase.md` (e.g. `OpenAI.md`, `SamAltman.md`)
- Concept pages: `TitleCase.md` (e.g. `ReinforcementLearning.md`, `RAG.md`)
## Index Format
```markdown
# Wiki Index
## Overview
- [Overview](overview.md) — living synthesis
## Sources
- [Source Title](sources/slug.md) — one-line summary
## Entities
- [Entity Name](entities/EntityName.md) — one-line description
## Concepts
- [Concept Name](concepts/ConceptName.md) — one-line description
## Syntheses
- [Analysis Title](syntheses/slug.md) — what question it answers
```
## Log Format
`## [YYYY-MM-DD] <operation> | <title>`
Operations: `ingest`, `query`, `lint`, `graph`

View File

@@ -1,230 +0,0 @@
# LLM Wiki Agent — Schema & Workflow Instructions
This wiki is maintained entirely by Claude Code. No API key or Python scripts needed — just open this repo in Claude Code and talk to it.
## Slash Commands (Claude Code)
| Command | What to say |
|---|---|
| `/wiki-ingest` | `ingest raw/my-article.md` |
| `/wiki-query` | `query: what are the main themes?` |
| `/wiki-lint` | `lint the wiki` |
| `/wiki-graph` | `build the knowledge graph` |
Or just describe what you want in plain English:
- *"Ingest this file: raw/papers/attention-is-all-you-need.md"*
- *"What does the wiki say about transformer models?"*
- *"Check the wiki for orphan pages and contradictions"*
- *"Build the graph and show me what's connected to RAG"*
Claude Code reads this file automatically and follows the workflows below.
---
## Directory Layout
```
raw/ # Immutable source documents — never modify these
wiki/ # Claude owns this layer entirely
index.md # Catalog of all pages — update on every ingest
log.md # Append-only chronological record
overview.md # Living synthesis across all sources
sources/ # One summary page per source document
entities/ # People, companies, projects, products
concepts/ # Ideas, frameworks, methods, theories
syntheses/ # Saved query answers
graph/ # Auto-generated graph data
tools/ # Optional standalone Python scripts (require ANTHROPIC_API_KEY)
```
---
## Page Format
Every wiki page uses this frontmatter:
```yaml
---
title: "Page Title"
type: source | entity | concept | synthesis
tags: []
sources: [] # list of source slugs that inform this page
last_updated: YYYY-MM-DD
---
```
Use `[[PageName]]` wikilinks to link to other wiki pages.
---
## Ingest Workflow
Triggered by: *"ingest <file>"* or `/wiki-ingest`
Steps (in order):
1. Read the source document fully using the Read tool
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
3. Write `wiki/sources/<slug>.md` — use the source page format below
4. Update `wiki/index.md` — add entry under Sources section
5. Update `wiki/overview.md` — revise synthesis if warranted
6. Update/create entity pages for key people, companies, projects mentioned
7. Update/create concept pages for key ideas and frameworks discussed
8. Flag any contradictions with existing wiki content
9. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`
### Source Page Format
```markdown
---
title: "Source Title"
type: source
tags: []
date: YYYY-MM-DD
source_file: raw/...
---
## Summary
24 sentence summary.
## Key Claims
- Claim 1
- Claim 2
## Key Quotes
> "Quote here" — context
## Connections
- [[EntityName]] — how they relate
- [[ConceptName]] — how it connects
## Contradictions
- Contradicts [[OtherPage]] on: ...
```
### Domain-Specific Templates
If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:
#### Diary / Journal Template
```markdown
---
title: "YYYY-MM-DD Diary"
type: source
tags: [diary]
date: YYYY-MM-DD
---
## Event Summary
...
## Key Decisions
...
## Energy & Mood
...
## Connections
...
## Shifts & Contradictions
...
```
#### Meeting Notes Template
```markdown
---
title: "Meeting Title"
type: source
tags: [meeting]
date: YYYY-MM-DD
---
## Goal
...
## Key Discussions
...
## Decisions Made
...
## Action Items
...
```
---
## Query Workflow
Triggered by: *"query: <question>"* or `/wiki-query`
Steps:
1. Read `wiki/index.md` to identify relevant pages
2. Read those pages with the Read tool
3. Synthesize an answer with inline citations as `[[PageName]]` wikilinks
4. Ask the user if they want the answer filed as `wiki/syntheses/<slug>.md`
---
## Lint Workflow
Triggered by: *"lint the wiki"* or `/wiki-lint`
Use Grep and Read tools to check for:
- **Orphan pages** — wiki pages with no inbound `[[links]]` from other pages
- **Broken links** — `[[WikiLinks]]` pointing to pages that don't exist
- **Contradictions** — claims that conflict across pages
- **Stale summaries** — pages not updated after newer sources
- **Missing entity pages** — entities mentioned in 3+ pages but lacking their own page
- **Data gaps** — questions the wiki can't answer; suggest new sources
Output a lint report and ask if the user wants it saved to `wiki/lint-report.md`.
---
## Graph Workflow
Triggered by: *"build the knowledge graph"* or `/wiki-graph`
When the user asks to build the graph, run `tools/build_graph.py` which:
- Pass 1: Parses all `[[wikilinks]]` → deterministic `EXTRACTED` edges
- Pass 2: Infers implicit relationships → `INFERRED` edges with confidence scores
- Runs Louvain community detection
- Outputs `graph/graph.json` + `graph/graph.html`
If the user doesn't have Python/dependencies set up, instead generate the graph data manually:
1. Use Grep to find all `[[wikilinks]]` across wiki pages
2. Build a node/edge list
3. Write `graph/graph.json` directly
4. Write `graph/graph.html` using the vis.js template
---
## Naming Conventions
- Source slugs: `kebab-case` matching source filename
- Entity pages: `TitleCase.md` (e.g. `OpenAI.md`, `SamAltman.md`)
- Concept pages: `TitleCase.md` (e.g. `ReinforcementLearning.md`, `RAG.md`)
- Source pages: `kebab-case.md`
## Index Format
```markdown
# Wiki Index
## Overview
- [Overview](overview.md) — living synthesis
## Sources
- [Source Title](sources/slug.md) — one-line summary
## Entities
- [Entity Name](entities/EntityName.md) — one-line description
## Concepts
- [Concept Name](concepts/ConceptName.md) — one-line description
## Syntheses
- [Analysis Title](syntheses/slug.md) — what question it answers
```
## Log Format
Each entry starts with `## [YYYY-MM-DD] <operation> | <title>` so it's grep-parseable:
```
grep "^## \[" wiki/log.md | tail -10
```
Operations: `ingest`, `query`, `lint`, `graph`

352
CLAUDE.md
View File

@@ -1,352 +0,0 @@
# LLM Wiki Agent — Schema & Workflow Instructions中文版增强规范
本 Wiki 完全由 Claude Code 自动维护。无需 API Key 或 Python 脚本 —— 只需在 Claude Code 中打开本仓库并与其对话。
---
# 🔴 全局强制规则CRITICAL
## 1. 输出语言(必须遵守)
- 所有输出必须使用**简体中文**
- 专有名词允许保留英文,但首次出现必须附带中文解释
- 如果原始文件名是中文则source页面的名称尽量用中文不要用拼音表示, 如果有特殊字符可以忽略
- 禁止中英混合句(术语除外)
- 不允许输出纯英文总结或分析
示例:
Transformer变压器模型一种基于注意力机制的神经网络架构
---
## 2. 输出风格(严格限制)
所有输出必须:
- 去修辞(禁止 narrative 风格)
- 去模糊(禁止“可能”“大概”等词)
- 信息密度最大化
- 面向“知识结构化”,而非阅读体验
优先级:
结构 > 关系 > 结论 > 描述
---
## 3. 结构化语义(必须)
所有页面必须遵循结构化语义规则:
- Summary 必须使用固定字段
- Claim 必须符合标准语法
- Connections 必须使用关系类型
- 禁止自由发挥
---
# Slash CommandsClaude Code
| Command | 使用方式 |
| -------------- | --------------------------- |
| `/wiki-ingest` | `ingest raw/your-file.md` |
| `/wiki-query` | `query: 你的问题` |
| `/wiki-lint` | `lint the wiki` |
| `/wiki-graph` | `build the knowledge graph` |
---
## 自然语言示例
- ingest raw/papers/attention-is-all-you-need.md
- query: Transformer 的核心机制是什么?
- lint the wiki
- build the graph and analyze RAG
Claude Code 会自动读取本文件并执行以下工作流。
---
# Directory Layout目录结构
```
raw/ # 原始文档(不可修改)
wiki/ # 知识层(由 Claude 完全维护)
index.md # 页面索引(每次 ingest 必须更新)
log.md # 追加式日志
overview.md # 全局知识总结
sources/ # 每个原始文档对应一个页面
entities/ # 实体(人/公司/产品/项目)
concepts/ # 概念(方法/理论/框架)
syntheses/ # 查询结果沉淀
graph/ # 自动生成的图数据
tools/ # 可选 Python 工具 (require ANTHROPIC_API_KEY)
````
---
# Page Format页面格式
每个页面必须包含:
```yaml
---
id: unique_id
title: "Page Title"
type: source | entity | concept | synthesis
tags: []
sources: [] # 来源
last_updated: YYYY-MM-DD
---
````
必须使用 `[[PageName]]` 进行链接。
---
# Ingest Workflow摄取流程
**重要** 请严格按照摄取流程进行操作,每分析一个页面必须要创建/更新source pageentity, concept等。不可遗漏
触发方式:
- `/wiki-ingest`
- 或:`ingest <file>`
## 执行步骤(严格顺序)
1. 使用 Read 工具完整读取 source 文档
2. 读取 `wiki/index.md` 和 `wiki/overview.md`
3. 生成 `wiki/sources/原始中文名.md` (非中文使用 slug.md)
4. 更新 `wiki/index.md`
5. 更新 `wiki/overview.md`(如有必要)
6. 创建或更新 Entity 页面
7. 创建或更新 Concept 页面
8. 检测并记录冲突
9. 追加 `wiki/log.md`
---
# Source Page Format增强结构
```markdown
---
title: "Source Title"
type: source
tags: []
date: YYYY-MM-DD
---
## Source File
- [[raw/...]]
## Summary
- 核心主题:
- 问题域:
- 方法/机制:
- 结论/价值:
## Key Claims
- (必须符合:主体 + 机制 + 结果)
## Key Quotes
> "引用内容" — 上下文说明
## Key Concepts
- [[ConceptName]]:定义
## Key Entities
- [[EntityName]]:角色说明
## Connections
- [[A]] ← depends_on ← [[B]]
- [[C]] ← extends ← [[D]]
## Contradictions
- 与 [[OtherPage]] 冲突:
- 冲突点:
- 当前观点:
- 对方观点:
```
---
# Domain-Specific Templates领域模板
## Diary / Journal
```markdown
---
title: "YYYY-MM-DD Diary"
type: source
tags: [diary]
date: YYYY-MM-DD
---
## Event Summary
## Key Decisions
## Energy & Mood
## Connections
## Shifts & Contradictions
```
---
## Meeting Notes
```markdown
---
title: "Meeting Title"
type: source
tags: [meeting]
date: YYYY-MM-DD
---
## Goal
## Key Discussions
## Decisions Made
## Action Items
```
---
# Entity & Concept Rules关键增强
## Entity实体
创建条件:
- 出现 ≥ 2 次
- 对主题有关键影响
类型:
- 人 / 公司 / 产品 / 项目
---
## Concept概念
创建条件:
- 可抽象
- 可复用
- 非具体实例
---
## 命名规范(强制)
- 使用唯一标准名称
- 所有别名写入页面:
```markdown
## Aliases
- GPT4
- GPT-4
```
---
## 去重机制(必须)
创建前必须:
1. 搜索 index
2. 判断是否存在
3. 存在则更新
---
# Query Workflow查询流程
触发:
- `/wiki-query`
- 或:`query: 问题`
---
## 步骤
1. 读取 index
2. 找到相关页面
3. 使用 Read 工具加载
4. 输出结构化答案
5. 使用 `[[Page]]` 引用
6. 询问是否保存为 synthesis
---
# Lint Workflow校验
检查内容:
- 孤立页面
- 断链
- 冲突
- 过期内容
- 缺失Entity
- 缺失Concept
- 知识空白
---
# Graph Workflow知识图谱
触发:
- `/wiki-graph`
---
执行:
- 优先运行 `tools/build_graph.py`
- 否则手动构建:
步骤:
1. 提取所有 `[[links]]`
2. 构建节点与边
3. 输出 `graph.json`
---
# Naming Conventions命名规范
- Source保留原始中文名称去除特殊符号非中文使用 kebab-case
- EntityTitleCase
- ConceptTitleCase
---
# Index Format索引结构
```markdown
# Wiki Index
## Overview
- [Overview](overview.md)
## Sources
- [Title](sources/原始中文名.md)
## Entities
- [Entity](entities/Entity.md)
## Concepts
- [Concept](concepts/Concept.md)
## Syntheses
- [Title](syntheses/slug.md)
```
---
# Log Format日志
```
## [YYYY-MM-DD] ingest | 标题
```
---
# ✅ 最终目标
该系统用于:
- 知识沉淀
- 结构化理解
- 自动图谱构建
- Agent 推理支持
---
# END

175
GEMINI.md
View File

@@ -1,175 +0,0 @@
# LLM Wiki Agent — Schema & Workflow Instructions
This wiki is maintained entirely by Gemini CLI. No API key or Python scripts needed — just open this repo with `gemini` and talk to it.
## How to Use
Describe what you want in plain English:
- *"Ingest this file: raw/papers/my-paper.md"*
- *"What does the wiki say about transformer models?"*
- *"Check the wiki for orphan pages and contradictions"*
- *"Build the knowledge graph"*
Or use shorthand triggers:
- `ingest <file>` → runs the Ingest Workflow
- `query: <question>` → runs the Query Workflow
- `lint` → runs the Lint Workflow
- `build graph` → runs the Graph Workflow
---
## Directory Layout
```
raw/ # Immutable source documents — never modify these
wiki/ # Agent owns this layer entirely
index.md # Catalog of all pages — update on every ingest
log.md # Append-only chronological record
overview.md # Living synthesis across all sources
sources/ # One summary page per source document
entities/ # People, companies, projects, products
concepts/ # Ideas, frameworks, methods, theories
syntheses/ # Saved query answers
graph/ # Auto-generated graph data
tools/ # Optional standalone Python scripts
```
---
## Page Format
Every wiki page uses this frontmatter:
```yaml
---
title: "Page Title"
type: source | entity | concept | synthesis
tags: []
sources: []
last_updated: YYYY-MM-DD
---
```
Use `[[PageName]]` wikilinks to link to other wiki pages.
---
## Ingest Workflow
Triggered by: *"ingest <file>"*
1. Read the source document fully
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
3. Write `wiki/sources/<slug>.md` (source page format below)
4. Update `wiki/index.md` — add entry under Sources
5. Update `wiki/overview.md` — revise synthesis if warranted
6. Update/create entity and concept pages
7. Flag contradictions with existing wiki content
8. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`
### Source Page Format
```markdown
---
title: "Source Title"
type: source
tags: []
date: YYYY-MM-DD
source_file: raw/...
---
## Summary
24 sentence summary.
## Key Claims
- Claim 1
## Key Quotes
> "Quote here"
## Connections
- [[EntityName]] — how they relate
## Contradictions
- Contradicts [[OtherPage]] on: ...
```
### Domain-Specific Templates
If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:
#### Diary / Journal Template
```markdown
---
title: "YYYY-MM-DD Diary"
type: source
tags: [diary]
date: YYYY-MM-DD
---
## Event Summary
...
## Key Decisions
...
## Energy & Mood
...
## Connections
...
## Shifts & Contradictions
...
```
#### Meeting Notes Template
```markdown
---
title: "Meeting Title"
type: source
tags: [meeting]
date: YYYY-MM-DD
---
## Goal
...
## Key Discussions
...
## Decisions Made
...
## Action Items
...
```
---
## Query Workflow
Triggered by: *"query: <question>"*
1. Read `wiki/index.md` — identify relevant pages
2. Read those pages
3. Synthesize answer with `[[PageName]]` citations
4. Offer to save as `wiki/syntheses/<slug>.md`
---
## Lint Workflow
Triggered by: *"lint"*
Check for: orphan pages, broken links, contradictions, stale content, missing entity pages, data gaps.
---
## Graph Workflow
Triggered by: *"build graph"*
Try `python tools/build_graph.py --open` first. If unavailable, build graph.json and graph.html manually from wikilinks.
---
## Naming Conventions
- Source slugs: `kebab-case`
- Entity/Concept pages: `TitleCase.md`
## Log Format
`## [YYYY-MM-DD] <operation> | <title>`

21
LICENSE
View File

@@ -1,21 +0,0 @@
MIT License
Copyright (c) 2023 SamurAIGPT
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

245
README.md
View File

@@ -1,245 +0,0 @@
# LLM Wiki Agent
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
**A coding agent skill.** Drop source documents into `raw/` and type `/wiki-ingest` — the agent reads them, extracts knowledge, and builds a persistent interlinked wiki. Every new source makes the wiki richer. You never write it.
> Most knowledge tools make you search your own notes. This one reads everything you've collected and writes a structured wiki that compounds over time — cross-references already built, contradictions already flagged, synthesis already done.
```
/wiki-ingest raw/papers/attention-is-all-you-need.md
```
```
wiki/
├── index.md catalog of all pages — updated on every ingest
├── log.md append-only record of every operation
├── overview.md living synthesis across all sources
├── sources/ one summary page per source document
├── entities/ people, companies, projects — auto-created
├── concepts/ ideas, frameworks, methods — auto-created
└── syntheses/ query answers filed back as wiki pages
graph/
├── graph.json persistent node/edge data (SHA256-cached)
└── graph.html interactive vis.js visualization — open in any browser
```
## Install
**Requires:** [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), or any agent that reads a config file.
```bash
git clone https://github.com/SamurAIGPT/llm-wiki-agent.git
cd llm-wiki-agent
```
Open in your agent — no API key or Python setup needed:
```bash
claude # reads CLAUDE.md + .claude/commands/
codex # reads AGENTS.md
opencode # reads AGENTS.md
gemini # reads GEMINI.md
```
## Usage
```
/wiki-ingest raw/papers/my-paper.md # ingest a source into the wiki
/wiki-ingest raw/articles/my-article.md # works on any markdown file
/wiki-query "what are the main themes?" # synthesize answer from wiki pages
/wiki-query "how does X relate to Y?" # with [[wikilink]] citations
/wiki-lint # find orphans, contradictions, gaps
/wiki-graph # build graph.html from all wikilinks
```
Plain English also works with any agent:
```
"Ingest this paper: raw/papers/llama2.md"
"What does the wiki say about attention mechanisms?"
"Check for contradictions across sources"
"Build the knowledge graph and tell me the most connected nodes"
```
Works with any markdown source — articles, papers, book chapters, meeting notes, journal entries, research summaries.
## What You Get
**Persistent wiki** — structured markdown pages that accumulate across sessions. Unlike chat, nothing is lost.
**Entity pages** — auto-created for every person, company, or project mentioned across sources. Updated each time a new source references them.
**Concept pages** — auto-created for every key idea or framework. Cross-referenced to every source that discusses them.
**Living overview**`wiki/overview.md` is revised on every ingest to reflect the current synthesis across everything you've read.
**Contradiction flags** — when a new source contradicts an existing claim, it's flagged at ingest time, not buried until query time.
**Knowledge graph**`graph.html` shows every wiki page as a node, every `[[wikilink]]` as an edge, and Claude-inferred implicit relationships as dotted edges. Community detection clusters related topics.
**Lint reports** — orphan pages, broken links, missing entity pages, data gaps with suggested sources to fill them.
## Use Cases
### Research
Going deep on a topic over weeks — reading papers, articles, reports.
```
/wiki-ingest raw/papers/attention-is-all-you-need.md
/wiki-ingest raw/papers/llama2.md
/wiki-ingest raw/papers/rag-survey.md
# Wiki builds entity pages (Meta AI, Google Brain) and
# concept pages (Attention, RLHF, Context Window) automatically.
/wiki-query "What are the main approaches to reducing hallucination?"
/wiki-query "How has context window size evolved across models?"
/wiki-lint
# → "No sources on mixture-of-experts — consider the Mixtral paper"
```
By the end you have a structured, interlinked reference — not a folder of PDFs you'll never reopen.
---
### Reading a Book
File each chapter as you go. Build out pages for characters, themes, arguments.
```
/wiki-ingest raw/book/chapter-01.md
/wiki-ingest raw/book/chapter-02.md
# Wiki creates entity and theme pages automatically.
/wiki-query "How has the protagonist's motivation evolved?"
/wiki-query "What contradictions exist in the author's argument so far?"
/wiki-graph # → graph.html shows every character/theme and how they connect
```
Think fan wikis like Tolkien Gateway — built as you read, with the agent doing all the cross-referencing.
---
### Personal Knowledge Base
Track goals, health, habits, self-improvement — file journal entries, articles, podcast notes.
```
/wiki-ingest raw/journal/2026-01-week1.md
/wiki-ingest raw/articles/huberman-sleep-protocol.md
/wiki-ingest raw/articles/atomic-habits-summary.md
/wiki-query "What patterns show up in my journal entries about energy?"
/wiki-query "What habits have I tried and what was the outcome?"
```
The wiki builds a structured picture over time. Concepts like "Sleep", "Exercise", "Deep Work" accumulate evidence from every source filed.
---
### Business / Team Intelligence
Feed in meeting transcripts, project docs, customer calls.
```
/wiki-ingest raw/meetings/q1-planning-transcript.md
/wiki-ingest raw/docs/product-roadmap-2026.md
/wiki-ingest raw/calls/customer-interview-acme.md
/wiki-query "What feature requests have come up most across customer calls?"
/wiki-query "What decisions were made in Q1 and what was the rationale?"
/wiki-lint
# → "Project X mentioned in 5 pages but no dedicated page"
# → "Roadmap contradicts customer interview on priority of feature Y"
```
The wiki stays current because the agent does the maintenance no one wants to do.
---
### Competitive Analysis
Track a company, market, or technology over time.
```
/wiki-ingest raw/competitors/openai-announcements.md
/wiki-ingest raw/market/ai-funding-report-q1.md
/wiki-query "How do OpenAI and Anthropic differ on safety approach?"
/wiki-query "Which companies announced multimodal models in the last 6 months?"
/wiki-query "Competitive landscape summary as of today" --save
```
## The Graph
Two-pass build:
1. **Deterministic** — parses all `[[wikilinks]]` across wiki pages → edges tagged `EXTRACTED`
2. **Semantic** — agent infers implicit relationships not captured by wikilinks → edges tagged `INFERRED` (with confidence score) or `AMBIGUOUS`
Louvain community detection clusters nodes by topic. SHA256 cache means only changed pages are reprocessed. Output is a self-contained `graph.html` — no server, opens in any browser.
## CLAUDE.md / AGENTS.md
The schema file tells the agent how to maintain the wiki — page formats, ingest/query/lint/graph workflows, naming conventions. This is the key config file. Edit it to customize behavior for your domain.
| Agent | Schema file |
|---|---|
| Claude Code | `CLAUDE.md` |
| Codex / OpenCode | `AGENTS.md` |
| Gemini CLI | `GEMINI.md` |
## What Makes This Different from RAG
| RAG | LLM Wiki Agent |
|---|---|
| Re-derives knowledge every query | Compiles once, keeps current |
| Raw chunks as retrieval unit | Structured wiki pages |
| No cross-references | Cross-references pre-built |
| Contradictions surface at query time (maybe) | Flagged at ingest time |
| No accumulation | Every source makes the wiki richer |
## Obsidian Integration
The wiki is designed to be browsed seamlessly in [Obsidian](https://obsidian.md). Since the agent maintains consistent `[[wikilinks]]`, you get a naturally growing knowledge graph in your vault.
### Vault Symlink Pattern
If you want to keep the LLM Wiki Agent repository separate from your main personal vault, use symlinks:
1. Keep your working agent repository at e.g., `~/llm-wiki-agent`
2. Create a symlink from your main Obsidian vault:
```bash
ln -sfn ~/llm-wiki-agent/wiki ~/your-obsidian-vault/wiki
```
3. Use the [Obsidian Web Clipper](https://obsidian.md/clipper) or write directly to `raw/` in the agent repo to queue items for ingestion.
> **Note:** If you ever move your local repo directory, remember to update the symlink, otherwise the `wiki/` directory will appear missing in Obsidian.
### Recommended .obsidian Config
- **Graph View:** Filter out `index.md` and `log.md` (e.g. `-file:index.md -file:log.md`) to avoid them becoming gravity wells in your Obsidian graph.
- **Dataview:** Use the community plugin [Dataview](https://blacksmithgu.github.io/obsidian-dataview/) to query the YAML frontmatter the agent automatically injects (e.g., `type: source`, `tags: [diary]`).
## Tips
- File good query answers back with `--save` — your explorations compound just like ingested sources
- The wiki is a git repo — version history for free
- Standalone Python scripts in `tools/` work without a coding agent (require `ANTHROPIC_API_KEY`)
## Tech Stack
NetworkX + Louvain + Claude + vis.js. No server, no database, runs entirely locally. Everything is plain markdown files.
## Related
- [graphify](https://github.com/safishamsi/graphify) — graph-based knowledge extraction skill (inspiration for the graph layer)
- [Vannevar Bush's Memex (1945)](https://en.wikipedia.org/wiki/Memex) — the original vision this resembles
## License
MIT License — see [LICENSE](LICENSE) for details.

View File

View File

@@ -1,101 +0,0 @@
# Automated Wiki Synchronization Guide
Managing an LLM Wiki works best when it constantly reflects your background note-taking system. Instead of manually ingesting files every time you write something new, you can orchestrate an end-to-end automation pipeline.
This guide outlines a production-grade cron/launchd strategy for local Mac/Linux environments.
## The Two-Step Architecture
LLM Wiki Agent ingestion is a two-step process:
1. **Syncing to `raw/`**: Getting files from your personal vault/tools into the agent's staging area.
2. **Batch Ingestion**: Triggering `tools/ingest.py` on the synchronized directories to synthesize and weave them into the graph.
### Step 1: The Master Orchestrator Script
Create a comprehensive shell script in your wiki root (`daily-automated-sync.sh`):
```bash
#!/usr/bin/env bash
set -uo pipefail
# Define variables
LAB_DIR="$HOME/projects/active/personal-wiki-lab"
LOG_FILE="$LAB_DIR/automation-cron.log"
DATE=$(date "+%Y-%m-%d %H:%M:%S")
echo "=====================================================" >> "$LOG_FILE"
echo "[$DATE] Starting automated wiki synchronization..." >> "$LOG_FILE"
cd "$LAB_DIR" || exit 1
# 1. Run your personal Vault-to-Raw symlink script here
# Example: ./sync-raw.sh >> "$LOG_FILE" 2>&1
# 2. Trigger Litellm Batch Ingestion using LLM of your choice
export LLM_MODEL="gemini/gemini-3-flash-preview"
export GEMINI_API_KEY="AIzaSy..." # or export OPENAI_API_KEY
echo "[$DATE] Batch ingesting markdown files..." >> "$LOG_FILE"
find raw/ -type l -name "*.md" -o -type f -name "*.md" | \
while read file; do
python3 tools/ingest.py "$file" >> "$LOG_FILE" 2>&1
done
# 3. Heal Graph Context (Auto-resolves broken semantic links)
echo "[$DATE] Healing broken nodes..." >> "$LOG_FILE"
python3 tools/heal.py >> "$LOG_FILE" 2>&1
echo "[$(date "+%Y-%m-%d %H:%M:%S")] Automated sync completed." >> "$LOG_FILE"
echo "=====================================================" >> "$LOG_FILE"
```
Don't forget to make it executable: `chmod +x daily-automated-sync.sh`.
### Step 2: System Scheduler (macOS launchd)
For macOS, `launchd` is significantly more robust than `cron`.
Create a `.plist` file at `~/Library/LaunchAgents/com.personal-wiki-sync.plist`:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.personal-wiki-sync</string>
<key>ProgramArguments</key>
<array>
<string>/bin/bash</string>
<string>/Users/your-username/projects/active/personal-wiki-lab/daily-automated-sync.sh</string>
</array>
<!-- Execute automatically at 2:00 AM daily -->
<key>StartCalendarInterval</key>
<dict>
<key>Hour</key>
<integer>2</integer>
<key>Minute</key>
<integer>0</integer>
</dict>
<!-- Run upon system boot if the interval was missed -->
<key>RunAtLoad</key>
<true/>
<!-- Diagnostic Logs -->
<key>StandardOutPath</key>
<string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stdout.log</string>
<key>StandardErrorPath</key>
<string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stderr.log</string>
</dict>
</plist>
```
Load the daemon:
```bash
launchctl load ~/Library/LaunchAgents/com.personal-wiki-sync.plist
```
### Self-Healing & Health Monitoring
Since the automation runs silently at night, your `daemon.stderr.log` guarantees you will spot any API failures. The orchestrated script includes `tools/heal.py`, which is strongly recommended: it will seamlessly intercept and build concepts that accumulated throughout your day but were never individually formalized.

View File

View File

@@ -1,36 +1,37 @@
# Wiki Ingest Status
## Last Updated
2026-04-16 03:45 CST
2026-04-16 08:05 CST
## Batch Progress
- Total batches completed: 5
- This batch: 4 docs ingested
- Total batches completed: 6
- This batch (Batch 12): 3 docs ingested
## Docs Ingested This Session (Batch 5)
1. AI/Multi-Agent System Reliability.md
2. AI/Never write another prompt.md
3. AI/RAG从入门到精通系列1基础RAG.md
4. AI/大模型相关术语和框架总结LLM、MCP、Prompt、RAG、vLLM、Token、数据蒸馏.md ✅
## Docs Ingested This Session (Batch 12)
1. n8n Telegram Trigger HTTPS 配置修复
2. n8n Docker 安装与 SOCKS5 代理配置
3. N8N AI Agent 2025 入门教程
## Overall Progress
- Total raw files: 182
- Done: 19 (10.4%)
- Remaining: 163
- Done: 22 (12.1%)
- Remaining: 160
## Wiki Stats
- Sources: 95
- Entities: 158
- Concepts: 203
- Sources: 98 (+3)
- Entities: 159 (+1: Telegram)
- Concepts: 205 (+2: Telegram Webhook, WEBHOOK_URL)
## Git
- Last commit: 04b7e99 (wiki-ingest batch Apr 16)
- Last commit: 04b7e99 (Batch 11)
## Next Batch Suggestions
From raw/AI/ (remaining ~20 files):
From raw/Agent/ (remaining ~7 files):
- n8n+Claude 通过自然语言自动化工作流.md
- 使用Claude自动生成N8N工作流的实操教程.md
- 万字保姆级教程-90天跑通一人公司模式-2026-03-29.md
From raw/AI/:
- AI/一语点醒梦中人.md
- AI/系统提示词构建原则.md
- AI/codecrafters-iobuild-your-own-x...md
- AI/全网最全Nano Banana 2 使用指南.md
- AI/如何写出完美的Prompt.md
- AI/我用 Gemini 3 一口气做了 10 个应用.md

View File

@@ -0,0 +1,44 @@
---
title: "抽丝剥茧:深度解析 Hermes Agent 万字系统提示词"
source: "https://x.com/lufzzliz/status/2044258384556556743"
author: "岚叔 (@lufzzliz)"
date: "2026-04-15"
type: social-media-highlight
tags:
- Hermes
- AI-Agent
- System-Prompt
- 教程
---
# 抽丝剥茧:深度解析 Hermes Agent 万字系统提示词System Prompt构成
**来源**: Twitter/X @lufzzliz
**时间**: 2026-04-15 03:35:54
**链接**: https://twitter.com/lufzzliz/status/2044258384556556743
**互动数据**: ❤️ 188 | 🔁 34 | 💬 6
---
## 内容摘要
没想到吧Hermes agent 也可能有万字的系统提示词,且看岚叔带你完整拆解。
同时教你一招降低 50% tokens 的小妙招。
本文依然是实践操作类文章,欢迎兄弟们大力支持~
---
## 关键信息
- **主题**: Hermes Agent 系统提示词System Prompt深度解析
- **亮点**: 万字级系统提示词完整拆解
- **技巧**: 降低 50% tokens 的方法
---
## 推文链接
> 原文链接见 Twitter 帖子

View File

@@ -1,2 +0,0 @@
litellm>=1.0.0
networkx>=3.2

View File

@@ -1,454 +0,0 @@
#!/usr/bin/env python3
"""
Build the knowledge graph from the wiki.
Usage:
python tools/build_graph.py # full rebuild
python tools/build_graph.py --no-infer # skip semantic inference (faster)
python tools/build_graph.py --open # open graph.html in browser after build
Outputs:
graph/graph.json — node/edge data (cached by SHA256)
graph/graph.html — interactive vis.js visualization
Edge types:
EXTRACTED — explicit [[wikilink]] in a page
INFERRED — Claude-detected implicit relationship
AMBIGUOUS — low-confidence inferred relationship
"""
import re
import json
import hashlib
import argparse
import webbrowser
from pathlib import Path
from datetime import date
import os
try:
import networkx as nx
from networkx.algorithms import community as nx_community
HAS_NETWORKX = True
except ImportError:
HAS_NETWORKX = False
print("Warning: networkx not installed. Community detection disabled. Run: pip install networkx")
REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
GRAPH_DIR = REPO_ROOT / "graph"
GRAPH_JSON = GRAPH_DIR / "graph.json"
GRAPH_HTML = GRAPH_DIR / "graph.html"
CACHE_FILE = GRAPH_DIR / ".cache.json"
LOG_FILE = WIKI_DIR / "log.md"
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"
# Node type → color mapping
TYPE_COLORS = {
"source": "#4CAF50",
"entity": "#2196F3",
"concept": "#FF9800",
"synthesis": "#9C27B0",
"unknown": "#9E9E9E",
}
EDGE_COLORS = {
"EXTRACTED": "#555555",
"INFERRED": "#FF5722",
"AMBIGUOUS": "#BDBDBD",
}
def read_file(path: Path) -> str:
return path.read_text(encoding="utf-8") if path.exists() else ""
def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str:
try:
from litellm import completion
except ImportError:
print("Error: litellm not installed. Run: pip install litellm")
import sys
sys.exit(1)
model = os.getenv(model_env, default_model)
response = completion(
model=model,
messages=[{"role": "user", "content": prompt}],
max_tokens=max_tokens
)
return response.choices[0].message.content
def sha256(text: str) -> str:
return hashlib.sha256(text.encode()).hexdigest()
def all_wiki_pages() -> list[Path]:
return [p for p in WIKI_DIR.rglob("*.md")
if p.name not in ("index.md", "log.md", "lint-report.md")]
def extract_wikilinks(content: str) -> list[str]:
return list(set(re.findall(r'\[\[([^\]]+)\]\]', content)))
def extract_frontmatter_type(content: str) -> str:
match = re.search(r'^type:\s*(\S+)', content, re.MULTILINE)
return match.group(1).strip('"\'') if match else "unknown"
def page_id(path: Path) -> str:
return path.relative_to(WIKI_DIR).as_posix().replace(".md", "")
def load_cache() -> dict:
if CACHE_FILE.exists():
try:
return json.loads(CACHE_FILE.read_text())
except (json.JSONDecodeError, IOError):
return {}
return {}
def save_cache(cache: dict):
GRAPH_DIR.mkdir(parents=True, exist_ok=True)
CACHE_FILE.write_text(json.dumps(cache, indent=2))
def build_nodes(pages: list[Path]) -> list[dict]:
nodes = []
for p in pages:
content = read_file(p)
node_type = extract_frontmatter_type(content)
title_match = re.search(r'^title:\s*"?([^"\n]+)"?', content, re.MULTILINE)
label = title_match.group(1).strip() if title_match else p.stem
nodes.append({
"id": page_id(p),
"label": label,
"type": node_type,
"color": TYPE_COLORS.get(node_type, TYPE_COLORS["unknown"]),
"path": str(p.relative_to(REPO_ROOT)),
})
return nodes
def build_extracted_edges(pages: list[Path]) -> list[dict]:
"""Pass 1: deterministic wikilink edges."""
# Build a map from stem (lower) -> page_id for resolution
stem_map = {p.stem.lower(): page_id(p) for p in pages}
edges = []
seen = set()
for p in pages:
content = read_file(p)
src = page_id(p)
for link in extract_wikilinks(content):
target = stem_map.get(link.lower())
if target and target != src:
key = (src, target)
if key not in seen:
seen.add(key)
edges.append({
"from": src,
"to": target,
"type": "EXTRACTED",
"color": EDGE_COLORS["EXTRACTED"],
"confidence": 1.0,
})
return edges
def build_inferred_edges(pages: list[Path], existing_edges: list[dict], cache: dict) -> list[dict]:
"""Pass 2: API-inferred semantic relationships."""
new_edges = []
# Only process pages that changed since last run
changed_pages = []
for p in pages:
content = read_file(p)
h = sha256(content)
entry = cache.get(str(p))
if not isinstance(entry, dict) or entry.get("hash") != h:
changed_pages.append(p)
else:
# Page unchanged: load its inferred edges from cache perfectly
src = page_id(p)
for rel in entry.get("edges", []):
new_edges.append({
"from": src,
"to": rel["to"],
"type": rel.get("type", "INFERRED"),
"title": rel.get("relationship", ""),
"label": "",
"color": EDGE_COLORS.get(rel.get("type", "INFERRED"), EDGE_COLORS["INFERRED"]),
"confidence": float(rel.get("confidence", 0.7)),
})
if not changed_pages:
print(" no changed pages — skipping semantic inference")
return []
print(f" inferring relationships for {len(changed_pages)} changed pages...")
# Build a summary of existing nodes for context
node_list = "\n".join(f"- {page_id(p)} ({extract_frontmatter_type(read_file(p))})" for p in pages)
existing_edge_summary = "\n".join(
f"- {e['from']}{e['to']} (EXTRACTED)" for e in existing_edges[:30]
)
for p in changed_pages:
content = read_file(p)[:2000] # truncate for context efficiency
src = page_id(p)
prompt = f"""Analyze this wiki page and identify implicit semantic relationships to other pages in the wiki.
Source page: {src}
Content:
{content}
All available pages:
{node_list}
Already-extracted edges from this page:
{existing_edge_summary}
Return ONLY a JSON array of NEW relationships not already captured by explicit wikilinks:
[
{{"to": "page-id", "relationship": "one-line description", "confidence": 0.0-1.0, "type": "INFERRED or AMBIGUOUS"}}
]
Rules:
- Only include pages from the available list above
- Confidence >= 0.7 → INFERRED, < 0.7 → AMBIGUOUS
- Do not repeat edges already in the extracted list
- Return empty array [] if no new relationships found
"""
raw = call_llm(prompt, "LLM_MODEL_FAST", "claude-3-5-haiku-latest", max_tokens=1024)
raw = raw.strip()
raw = re.sub(r"^```(?:json)?\s*", "", raw)
raw = re.sub(r"\s*```$", "", raw)
try:
inferred = json.loads(raw)
valid_rels = []
for rel in inferred:
if isinstance(rel, dict) and "to" in rel:
new_edges.append({
"from": src,
"to": rel["to"],
"type": rel.get("type", "INFERRED"),
"title": rel.get("relationship", ""),
"label": "",
"color": EDGE_COLORS.get(rel.get("type", "INFERRED"), EDGE_COLORS["INFERRED"]),
"confidence": float(rel.get("confidence", 0.7)),
})
valid_rels.append(rel)
# Save properly to cache
cache[str(p)] = {
"hash": sha256(content),
"edges": valid_rels
}
except (json.JSONDecodeError, TypeError, ValueError):
pass
return new_edges
def detect_communities(nodes: list[dict], edges: list[dict]) -> dict[str, int]:
"""Assign community IDs to nodes using Louvain algorithm."""
if not HAS_NETWORKX:
return {}
G = nx.Graph()
for n in nodes:
G.add_node(n["id"])
for e in edges:
G.add_edge(e["from"], e["to"])
if G.number_of_edges() == 0:
return {}
try:
communities = nx_community.louvain_communities(G, seed=42)
node_to_community = {}
for i, comm in enumerate(communities):
for node in comm:
node_to_community[node] = i
return node_to_community
except Exception:
return {}
COMMUNITY_COLORS = [
"#E91E63", "#00BCD4", "#8BC34A", "#FF5722", "#673AB7",
"#FFC107", "#009688", "#F44336", "#3F51B5", "#CDDC39",
]
def render_html(nodes: list[dict], edges: list[dict]) -> str:
"""Generate self-contained vis.js HTML."""
nodes_json = json.dumps(nodes, indent=2)
edges_json = json.dumps(edges, indent=2)
legend_items = "".join(
f'<span style="background:{color};padding:3px 8px;margin:2px;border-radius:3px;font-size:12px">{t}</span>'
for t, color in TYPE_COLORS.items() if t != "unknown"
)
return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>LLM Wiki — Knowledge Graph</title>
<script src="https://unpkg.com/vis-network/standalone/umd/vis-network.min.js"></script>
<style>
body {{ margin: 0; background: #1a1a2e; font-family: sans-serif; color: #eee; }}
#graph {{ width: 100vw; height: 100vh; }}
#controls {{
position: fixed; top: 10px; left: 10px; background: rgba(0,0,0,0.7);
padding: 12px; border-radius: 8px; z-index: 10; max-width: 260px;
}}
#controls h3 {{ margin: 0 0 8px; font-size: 14px; }}
#search {{ width: 100%; padding: 4px; margin-bottom: 8px; background: #333; color: #eee; border: 1px solid #555; border-radius: 4px; }}
#info {{
position: fixed; bottom: 10px; left: 10px; background: rgba(0,0,0,0.8);
padding: 12px; border-radius: 8px; z-index: 10; max-width: 320px;
display: none;
}}
#stats {{ position: fixed; top: 10px; right: 10px; background: rgba(0,0,0,0.7); padding: 10px; border-radius: 8px; font-size: 12px; }}
</style>
</head>
<body>
<div id="controls">
<h3>LLM Wiki Graph</h3>
<input id="search" type="text" placeholder="Search nodes..." oninput="searchNodes(this.value)">
<div>{legend_items}</div>
<div style="margin-top:8px;font-size:11px;color:#aaa">
<span style="background:#555;padding:2px 6px;border-radius:3px;margin-right:4px">──</span> Explicit link<br>
<span style="background:#FF5722;padding:2px 6px;border-radius:3px;margin-right:4px">──</span> Inferred
</div>
</div>
<div id="graph"></div>
<div id="info">
<b id="info-title"></b><br>
<span id="info-type" style="font-size:12px;color:#aaa"></span><br>
<span id="info-path" style="font-size:11px;color:#666"></span>
</div>
<div id="stats"></div>
<script>
const nodes = new vis.DataSet({nodes_json});
const edges = new vis.DataSet({edges_json});
const container = document.getElementById("graph");
const network = new vis.Network(container, {{ nodes, edges }}, {{
nodes: {{
shape: "dot",
size: 12,
font: {{ color: "#eee", size: 13 }},
borderWidth: 2,
}},
edges: {{
width: 1.2,
smooth: {{ type: "continuous" }},
arrows: {{ to: {{ enabled: true, scaleFactor: 0.5 }} }},
}},
physics: {{
stabilization: {{ iterations: 150 }},
barnesHut: {{ gravitationalConstant: -8000, springLength: 120 }},
}},
interaction: {{ hover: true, tooltipDelay: 200 }},
}});
network.on("click", params => {{
if (params.nodes.length > 0) {{
const node = nodes.get(params.nodes[0]);
document.getElementById("info").style.display = "block";
document.getElementById("info-title").textContent = node.label;
document.getElementById("info-type").textContent = node.type;
document.getElementById("info-path").textContent = node.path;
}} else {{
document.getElementById("info").style.display = "none";
}}
}});
document.getElementById("stats").textContent =
`${{nodes.length}} nodes · ${{edges.length}} edges`;
function searchNodes(q) {{
const lower = q.toLowerCase();
nodes.forEach(n => {{
nodes.update({{ id: n.id, opacity: (!q || n.label.toLowerCase().includes(lower)) ? 1 : 0.15 }});
}});
}}
</script>
</body>
</html>"""
def append_log(entry: str):
log_path = WIKI_DIR / "log.md"
existing = read_file(log_path)
log_path.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8")
def build_graph(infer: bool = True, open_browser: bool = False):
pages = all_wiki_pages()
today = date.today().isoformat()
if not pages:
print("Wiki is empty. Ingest some sources first.")
return
print(f"Building graph from {len(pages)} wiki pages...")
GRAPH_DIR.mkdir(parents=True, exist_ok=True)
cache = load_cache()
# Pass 1: extracted edges
print(" Pass 1: extracting wikilinks...")
nodes = build_nodes(pages)
edges = build_extracted_edges(pages)
print(f"{len(edges)} extracted edges")
# Pass 2: inferred edges
if infer:
print(" Pass 2: inferring semantic relationships...")
inferred = build_inferred_edges(pages, edges, cache)
edges.extend(inferred)
print(f"{len(inferred)} inferred edges")
save_cache(cache)
# Community detection
print(" Running Louvain community detection...")
communities = detect_communities(nodes, edges)
for node in nodes:
comm_id = communities.get(node["id"], -1)
if comm_id >= 0:
node["color"] = COMMUNITY_COLORS[comm_id % len(COMMUNITY_COLORS)]
node["group"] = comm_id
# Save graph.json
graph_data = {"nodes": nodes, "edges": edges, "built": today}
GRAPH_JSON.write_text(json.dumps(graph_data, indent=2))
print(f" saved: graph/graph.json ({len(nodes)} nodes, {len(edges)} edges)")
# Save graph.html
html = render_html(nodes, edges)
GRAPH_HTML.write_text(html)
print(f" saved: graph/graph.html")
append_log(f"## [{today}] graph | Knowledge graph rebuilt\n\n{len(nodes)} nodes, {len(edges)} edges ({len([e for e in edges if e['type']=='EXTRACTED'])} extracted, {len([e for e in edges if e['type']=='INFERRED'])} inferred).")
if open_browser:
webbrowser.open(f"file://{GRAPH_HTML.resolve()}")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Build LLM Wiki knowledge graph")
parser.add_argument("--no-infer", action="store_true", help="Skip semantic inference (faster)")
parser.add_argument("--open", action="store_true", help="Open graph.html in browser")
args = parser.parse_args()
build_graph(infer=not args.no_infer, open_browser=args.open)

View File

@@ -1,100 +0,0 @@
#!/usr/bin/env python3
"""
Graph Self-Healing Tool
Automatically retrieves "Missing Entity Pages" from the wiki and generates
comprehensive definition pages for them using the LLM.
It resolves broken entity links by scanning existing contexts where the entity is referenced.
Usage:
python tools/heal.py
"""
import os
import sys
from pathlib import Path
try:
from litellm import completion
except ImportError:
print("Error: litellm not installed. Run: pip install litellm")
sys.exit(1)
# Ensure tools can be imported
sys.path.insert(0, str(Path(__file__).parent.parent))
from tools.lint import find_missing_entities, all_wiki_pages
REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
ENTITIES_DIR = WIKI_DIR / "entities"
def call_llm(prompt: str, max_tokens: int = 1500) -> str:
# Use litellm standard environment variables
# e.g., GEMINI_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY
model = os.getenv("LLM_MODEL", "claude-3-5-haiku-latest") # default to fast model
response = completion(
model=model,
messages=[{"role": "user", "content": prompt}],
max_tokens=max_tokens
)
return response.choices[0].message.content
def search_sources(entity: str, pages: list[Path]) -> list[Path]:
"""Find up to 15 pages where this entity is mentioned natively."""
sources = []
for p in pages:
if "entities" not in str(p.parent) and "concepts" not in str(p.parent):
content = p.read_text(encoding="utf-8")
if entity.lower() in content.lower():
sources.append(p)
return sources[:15]
def heal_missing_entities():
pages = all_wiki_pages()
missing_entities = find_missing_entities(pages)
if not missing_entities:
print("Graph is fully connected. No missing entities found!")
return
ENTITIES_DIR.mkdir(exist_ok=True, parents=True)
print(f"Found {len(missing_entities)} missing entity nodes. Commencing auto-heal...")
for entity in missing_entities:
print(f"Healing entity page for: {entity}")
sources = search_sources(entity, pages)
context = ""
for s in sources:
context += f"\n\n### {s.name}\n{s.read_text(encoding='utf-8')[:800]}"
prompt = f"""You are filling a data gap in the Personal LLM Wiki.
Create an Entity definition page for "{entity}".
Here is how the entity appears in the current sources:
{context}
Format:
---
title: "{entity}"
type: entity
tags: []
sources: {[s.name for s in sources]}
---
# {entity}
Write a comprehensive paragraph defining what `{entity}` means in the context of this wiki, its main significance, and any actions or associations related to it.
"""
try:
result = call_llm(prompt)
out_path = ENTITIES_DIR / f"{entity}.md"
out_path.write_text(result, encoding="utf-8")
print(f" -> Saved to {out_path.relative_to(REPO_ROOT)}")
except Exception as e:
print(f" [!] Failed to generate {entity}: {e}")
if __name__ == "__main__":
heal_missing_entities()

View File

@@ -1,239 +0,0 @@
#!/usr/bin/env python3
"""
Ingest a source document into the LLM Wiki.
Usage:
python tools/ingest.py <path-to-source>
python tools/ingest.py raw/articles/my-article.md
The LLM reads the source, extracts knowledge, and updates the wiki:
- Creates wiki/sources/<slug>.md
- Updates wiki/index.md
- Updates wiki/overview.md (if warranted)
- Creates/updates entity and concept pages
- Appends to wiki/log.md
- Flags contradictions
"""
import os
import sys
import json
import hashlib
import re
from pathlib import Path
from datetime import date
import os
REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
LOG_FILE = WIKI_DIR / "log.md"
INDEX_FILE = WIKI_DIR / "index.md"
OVERVIEW_FILE = WIKI_DIR / "overview.md"
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"
def sha256(text: str) -> str:
return hashlib.sha256(text.encode()).hexdigest()[:16]
def read_file(path: Path) -> str:
return path.read_text(encoding="utf-8") if path.exists() else ""
def call_llm(prompt: str, max_tokens: int = 8192) -> str:
try:
from litellm import completion
except ImportError:
print("Error: litellm not installed. Run: pip install litellm")
sys.exit(1)
model = os.getenv("LLM_MODEL", "claude-3-5-sonnet-latest")
response = completion(
model=model,
messages=[{"role": "user", "content": prompt}],
max_tokens=max_tokens
)
return response.choices[0].message.content
def write_file(path: Path, content: str):
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(content, encoding="utf-8")
print(f" wrote: {path.relative_to(REPO_ROOT)}")
def build_wiki_context() -> str:
parts = []
if INDEX_FILE.exists():
parts.append(f"## wiki/index.md\n{read_file(INDEX_FILE)}")
if OVERVIEW_FILE.exists():
parts.append(f"## wiki/overview.md\n{read_file(OVERVIEW_FILE)}")
# Include a few recent source pages for contradiction checking
sources_dir = WIKI_DIR / "sources"
if sources_dir.exists():
recent = sorted(sources_dir.glob("*.md"), key=lambda p: p.stat().st_mtime, reverse=True)[:5]
for p in recent:
parts.append(f"## {p.relative_to(REPO_ROOT)}\n{p.read_text()}")
return "\n\n---\n\n".join(parts)
def parse_json_from_response(text: str) -> dict:
# Strip markdown code fences if present
text = re.sub(r"^```(?:json)?\s*", "", text.strip())
text = re.sub(r"\s*```$", "", text.strip())
# Find the outermost JSON object
match = re.search(r"\{[\s\S]*\}", text)
if not match:
raise ValueError("No JSON object found in response")
return json.loads(match.group())
def update_index(new_entry: str, section: str = "Sources"):
content = read_file(INDEX_FILE)
if not content:
content = "# Wiki Index\n\n## Overview\n- [Overview](overview.md) — living synthesis\n\n## Sources\n\n## Entities\n\n## Concepts\n\n## Syntheses\n"
section_header = f"## {section}"
if section_header in content:
content = content.replace(section_header + "\n", section_header + "\n" + new_entry + "\n")
else:
content += f"\n{section_header}\n{new_entry}\n"
write_file(INDEX_FILE, content)
def append_log(entry: str):
existing = read_file(LOG_FILE)
write_file(LOG_FILE, entry.strip() + "\n\n" + existing)
def ingest(source_path: str):
source = Path(source_path)
if not source.exists():
print(f"Error: file not found: {source_path}")
sys.exit(1)
source_content = source.read_text(encoding="utf-8")
source_hash = sha256(source_content)
today = date.today().isoformat()
print(f"\nIngesting: {source.name} (hash: {source_hash})")
wiki_context = build_wiki_context()
schema = read_file(SCHEMA_FILE)
schema = read_file(SCHEMA_FILE)
prompt = f"""You are maintaining an LLM Wiki. Process this source document and integrate its knowledge into the wiki.
Schema and conventions:
{schema}
Current wiki state (index + recent pages):
{wiki_context if wiki_context else "(wiki is empty — this is the first source)"}
New source to ingest (file: {source.relative_to(REPO_ROOT) if source.is_relative_to(REPO_ROOT) else source.name}):
=== SOURCE START ===
{source_content}
=== SOURCE END ===
Today's date: {today}
Return ONLY a valid JSON object with these fields (no markdown fences, no prose outside the JSON):
{{
"title": "Human-readable title for this source",
"slug": "kebab-case-slug-for-filename",
"source_page": "full markdown content for wiki/sources/<slug>.md — use the source page format from the schema",
"index_entry": "- [Title](sources/slug.md) — one-line summary",
"overview_update": "full updated content for wiki/overview.md, or null if no update needed",
"entity_pages": [
{{"path": "entities/EntityName.md", "content": "full markdown content"}}
],
"concept_pages": [
{{"path": "concepts/ConceptName.md", "content": "full markdown content"}}
],
"contradictions": ["describe any contradiction with existing wiki content, or empty list"],
"log_entry": "## [{today}] ingest | <title>\\n\\nAdded source. Key claims: ..."
}}
"""
print(f" calling API (model: ...)")
raw = call_llm(prompt, max_tokens=8192)
try:
data = parse_json_from_response(raw)
except (ValueError, json.JSONDecodeError) as e:
print(f"Error parsing API response: {e}")
print("Raw response saved to /tmp/ingest_debug.txt")
Path("/tmp/ingest_debug.txt").write_text(raw)
sys.exit(1)
# Write source page
slug = data["slug"]
write_file(WIKI_DIR / "sources" / f"{slug}.md", data["source_page"])
# Write entity pages
for page in data.get("entity_pages", []):
write_file(WIKI_DIR / page["path"], page["content"])
# Write concept pages
for page in data.get("concept_pages", []):
write_file(WIKI_DIR / page["path"], page["content"])
# Update overview
if data.get("overview_update"):
write_file(OVERVIEW_FILE, data["overview_update"])
# Update index
update_index(data["index_entry"], section="Sources")
# Append log
append_log(data["log_entry"])
# Report contradictions
contradictions = data.get("contradictions", [])
if contradictions:
print("\n ⚠️ Contradictions detected:")
for c in contradictions:
print(f" - {c}")
print(f"\nDone. Ingested: {data['title']}")
if __name__ == "__main__":
if len(sys.argv) < 2:
print("Usage: python tools/ingest.py <path-to-source> [path2 ...] [dir1 ...]")
sys.exit(1)
paths_to_process = []
for arg in sys.argv[1:]:
p = Path(arg)
if p.is_file() and p.suffix == ".md":
paths_to_process.append(p)
elif p.is_dir():
for f in p.rglob("*.md"):
if f.is_file():
paths_to_process.append(f)
else:
import glob
for f in glob.glob(arg, recursive=True):
g_p = Path(f)
if g_p.is_file() and g_p.suffix == ".md":
paths_to_process.append(g_p)
# Deduplicate while preserving order
unique_paths = []
seen = set()
for p in paths_to_process:
abs_p = p.resolve()
if abs_p not in seen:
seen.add(abs_p)
unique_paths.append(p)
if not unique_paths:
print("Error: no markdown files found to ingest.")
sys.exit(1)
if len(unique_paths) > 1:
print(f"Batch mode: found {len(unique_paths)} files to ingest.")
for p in unique_paths:
ingest(str(p))

View File

@@ -1,210 +0,0 @@
#!/usr/bin/env python3
"""
Lint the LLM Wiki for health issues.
Usage:
python tools/lint.py
python tools/lint.py --save # save lint report to wiki/lint-report.md
Checks:
- Orphan pages (no inbound wikilinks from other pages)
- Broken wikilinks (pointing to pages that don't exist)
- Missing entity pages (entities mentioned in 3+ pages but no page)
- Contradictions between pages
- Data gaps and suggested new sources
"""
import re
import sys
import argparse
from pathlib import Path
from collections import defaultdict
from datetime import date
import os
REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
LOG_FILE = WIKI_DIR / "log.md"
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"
def read_file(path: Path) -> str:
return path.read_text(encoding="utf-8") if path.exists() else ""
def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str:
try:
from litellm import completion
except ImportError:
print("Error: litellm not installed. Run: pip install litellm")
sys.exit(1)
model = os.getenv(model_env, default_model)
response = completion(
model=model,
messages=[{"role": "user", "content": prompt}],
max_tokens=max_tokens
)
return response.choices[0].message.content
def all_wiki_pages() -> list[Path]:
return [p for p in WIKI_DIR.rglob("*.md")
if p.name not in ("index.md", "log.md", "lint-report.md")]
def extract_wikilinks(content: str) -> list[str]:
return re.findall(r'\[\[([^\]]+)\]\]', content)
def page_name_to_path(name: str) -> list[Path]:
"""Try to resolve a [[WikiLink]] to a file path."""
candidates = []
for p in all_wiki_pages():
if p.stem.lower() == name.lower() or p.stem == name:
candidates.append(p)
return candidates
def find_orphans(pages: list[Path]) -> list[Path]:
inbound = defaultdict(int)
for p in pages:
content = read_file(p)
for link in extract_wikilinks(content):
resolved = page_name_to_path(link)
for r in resolved:
inbound[r] += 1
return [p for p in pages if inbound[p] == 0 and p != WIKI_DIR / "overview.md"]
def find_broken_links(pages: list[Path]) -> list[tuple[Path, str]]:
broken = []
for p in pages:
content = read_file(p)
for link in extract_wikilinks(content):
if not page_name_to_path(link):
broken.append((p, link))
return broken
def find_missing_entities(pages: list[Path]) -> list[str]:
"""Find entity-like names mentioned in 3+ pages but lacking their own page."""
mention_counts: dict[str, int] = defaultdict(int)
existing_pages = {p.stem.lower() for p in pages}
for p in pages:
content = read_file(p)
links = extract_wikilinks(content)
for link in links:
if link.lower() not in existing_pages:
mention_counts[link] += 1
return [name for name, count in mention_counts.items() if count >= 3]
def run_lint():
pages = all_wiki_pages()
today = date.today().isoformat()
if not pages:
print("Wiki is empty. Nothing to lint.")
return ""
print(f"Linting {len(pages)} wiki pages...")
# Deterministic checks
orphans = find_orphans(pages)
broken = find_broken_links(pages)
missing_entities = find_missing_entities(pages)
print(f" orphans: {len(orphans)}")
print(f" broken links: {len(broken)}")
print(f" missing entity pages: {len(missing_entities)}")
# Build context for semantic checks (contradictions, gaps)
# Use a sample of pages to stay within context limits
sample = pages[:20]
pages_context = ""
for p in sample:
rel = p.relative_to(REPO_ROOT)
pages_context += f"\n\n### {rel}\n{read_file(p)[:1500]}" # truncate long pages
print(" running semantic lint via API...")
prompt = f"""You are linting an LLM Wiki. Review the pages below and identify:
1. Contradictions between pages (claims that conflict)
2. Stale content (summaries that newer sources have superseded)
3. Data gaps (important questions the wiki can't answer — suggest specific sources to find)
4. Concepts mentioned but lacking depth
Wiki pages (sample of {len(sample)} pages):
{pages_context}
Return a markdown lint report with these sections:
## Contradictions
## Stale Content
## Data Gaps & Suggested Sources
## Concepts Needing More Depth
Be specific — name the exact pages and claims involved.
"""
semantic_report = call_llm(prompt, "LLM_MODEL", "claude-3-5-sonnet-latest", max_tokens=3000)
# Compose full report
report_lines = [
f"# Wiki Lint Report — {today}",
"",
f"Scanned {len(pages)} pages.",
"",
"## Structural Issues",
"",
]
if orphans:
report_lines.append("### Orphan Pages (no inbound links)")
for p in orphans:
report_lines.append(f"- `{p.relative_to(REPO_ROOT)}`")
report_lines.append("")
if broken:
report_lines.append("### Broken Wikilinks")
for page, link in broken:
report_lines.append(f"- `{page.relative_to(REPO_ROOT)}` links to `[[{link}]]` — not found")
report_lines.append("")
if missing_entities:
report_lines.append("### Missing Entity Pages (mentioned 3+ times but no page)")
for name in missing_entities:
report_lines.append(f"- `[[{name}]]`")
report_lines.append("")
if not orphans and not broken and not missing_entities:
report_lines.append("No structural issues found.")
report_lines.append("")
report_lines.append("---")
report_lines.append("")
report_lines.append(semantic_report)
report = "\n".join(report_lines)
print("\n" + report)
return report
def append_log(entry: str):
existing = read_file(LOG_FILE)
LOG_FILE.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Lint the LLM Wiki")
parser.add_argument("--save", action="store_true", help="Save lint report to wiki/lint-report.md")
args = parser.parse_args()
report = run_lint()
if args.save and report:
report_path = WIKI_DIR / "lint-report.md"
report_path.write_text(report, encoding="utf-8")
print(f"\nSaved: {report_path.relative_to(REPO_ROOT)}")
today = date.today().isoformat()
append_log(f"## [{today}] lint | Wiki health check\n\nRan lint. See lint-report.md for details.")

View File

@@ -1,192 +0,0 @@
#!/usr/bin/env python3
"""
Query the LLM Wiki.
Usage:
python tools/query.py "What are the main themes across all sources?"
python tools/query.py "How does ConceptA relate to ConceptB?" --save
python tools/query.py "Summarize everything about EntityName" --save synthesis/my-analysis.md
Flags:
--save Save the answer back into the wiki (prompts for filename)
--save <path> Save to a specific wiki path
"""
import sys
import re
import json
import argparse
from pathlib import Path
from datetime import date
import os
REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
INDEX_FILE = WIKI_DIR / "index.md"
LOG_FILE = WIKI_DIR / "log.md"
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"
def read_file(path: Path) -> str:
return path.read_text(encoding="utf-8") if path.exists() else ""
def write_file(path: Path, content: str):
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(content, encoding="utf-8")
print(f" saved: {path.relative_to(REPO_ROOT)}")
def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str:
try:
from litellm import completion
except ImportError:
print("Error: litellm not installed. Run: pip install litellm")
sys.exit(1)
model = os.getenv(model_env, default_model)
response = completion(
model=model,
messages=[{"role": "user", "content": prompt}],
max_tokens=max_tokens
)
return response.choices[0].message.content
def find_relevant_pages(question: str, index_content: str) -> list[Path]:
"""Extract linked pages from index that seem relevant to the question."""
# Pull all [[links]] and markdown links from index
md_links = re.findall(r'\[([^\]]+)\]\(([^)]+)\)', index_content)
question_lower = question.lower()
relevant = []
for title, href in md_links:
title_lower = title.lower()
match = False
# 1. English/Space-separated: check words > 3 chars
if any(word in question_lower for word in title_lower.split() if len(word) > 3):
match = True
# 2. Exact substring match for the whole title (useful for short CJK titles, e.g. len=2)
elif len(title_lower) >= 2 and title_lower in question_lower:
match = True
# 3. CJK chunks: find contiguous non-ASCII characters (len >= 2) and check if in question
elif any(chunk in question_lower for chunk in re.findall(r'[^\x00-\x7F]{2,}', title_lower)):
match = True
if match:
p = WIKI_DIR / href
if p.exists() and p not in relevant:
relevant.append(p)
# Always include overview
overview = WIKI_DIR / "overview.md"
if overview.exists() and overview not in relevant:
relevant.insert(0, overview)
return relevant[:12] # cap to avoid context overflow
def append_log(entry: str):
existing = read_file(LOG_FILE)
LOG_FILE.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8")
def query(question: str, save_path: str | None = None):
today = date.today().isoformat()
# Step 1: Read index
index_content = read_file(INDEX_FILE)
if not index_content:
print("Wiki is empty. Ingest some sources first with: python tools/ingest.py <source>")
sys.exit(1)
# Step 2: Find relevant pages
relevant_pages = find_relevant_pages(question, index_content)
# If no keyword match, ask Claude to identify relevant pages from the index
if not relevant_pages or len(relevant_pages) <= 1:
print(" selecting relevant pages via API...")
prompt = f"Given this wiki index:\n\n{index_content}\n\nWhich pages are most relevant to answering: \"{question}\"\n\nReturn ONLY a JSON array of relative file paths (as listed in the index), e.g. [\"sources/foo.md\", \"concepts/Bar.md\"]. Maximum 10 pages."
raw = call_llm(prompt, "LLM_MODEL_FAST", "claude-3-5-haiku-latest", max_tokens=512)
raw = raw.strip()
raw = re.sub(r"^```(?:json)?\s*", "", raw)
raw = re.sub(r"\s*```$", "", raw)
try:
paths = json.loads(raw)
relevant_pages = [WIKI_DIR / p for p in paths if (WIKI_DIR / p).exists()]
except (json.JSONDecodeError, TypeError):
pass
# Step 3: Read relevant pages
pages_context = ""
for p in relevant_pages:
rel = p.relative_to(REPO_ROOT)
pages_context += f"\n\n### {rel}\n{p.read_text(encoding='utf-8')}"
if not pages_context:
pages_context = f"\n\n### wiki/index.md\n{index_content}"
schema = read_file(SCHEMA_FILE)
# Step 4: Synthesize answer
print(f" synthesizing answer from {len(relevant_pages)} pages...")
prompt = f"""You are querying an LLM Wiki to answer a question. Use the wiki pages below to synthesize a thorough answer. Cite sources using [[PageName]] wikilink syntax.
Schema:
{schema}
Wiki pages:
{pages_context}
Question: {question}
Write a well-structured markdown answer with headers, bullets, and [[wikilink]] citations. At the end, add a ## Sources section listing the pages you drew from.
"""
answer = call_llm(prompt, "LLM_MODEL", "claude-3-5-sonnet-latest", max_tokens=4096)
print("\n" + "=" * 60)
print(answer)
print("=" * 60)
# Step 5: Optionally save answer
if save_path is not None:
if save_path == "":
# Prompt for filename
slug = input("\nSave as (slug, e.g. 'my-analysis'): ").strip()
if not slug:
print("Skipping save.")
return
save_path = f"syntheses/{slug}.md"
full_save_path = WIKI_DIR / save_path
frontmatter = f"""---
title: "{question[:80]}"
type: synthesis
tags: []
sources: []
last_updated: {today}
---
"""
write_file(full_save_path, frontmatter + answer)
# Update index
index_content = read_file(INDEX_FILE)
entry = f"- [{question[:60]}]({save_path}) — synthesis"
if "## Syntheses" in index_content:
index_content = index_content.replace("## Syntheses\n", f"## Syntheses\n{entry}\n")
INDEX_FILE.write_text(index_content, encoding="utf-8")
print(f" indexed: {save_path}")
# Append to log
append_log(f"## [{today}] query | {question[:80]}\n\nSynthesized answer from {len(relevant_pages)} pages." +
(f" Saved to {save_path}." if save_path else ""))
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Query the LLM Wiki")
parser.add_argument("question", help="Question to ask the wiki")
parser.add_argument("--save", nargs="?", const="", default=None,
help="Save answer to wiki (optionally specify path)")
args = parser.parse_args()
query(args.question, args.save)

View File

@@ -0,0 +1,37 @@
---
title: "Telegram Webhook"
type: concept
tags: [telegram, webhook, bot, integration]
---
## 定义
Telegram Webhook 是一种服务端回调机制Telegram 服务器在用户发送消息后,将 HTTP POST 请求推送至用户配置的公网 HTTPS URL。
## 工作原理
1. 在 Telegram BotFather 创建机器人,获得 Bot Token
2. 向 Telegram API 设置 Webhook URL`https://api.telegram.org/bot<TOKEN>/setWebhook?url=https://your-domain.com/webhook`
3. 用户发送消息 → Telegram → POST 到配置的 URL
4. 服务端处理请求,可返回响应消息
## 核心约束
- **必须使用 HTTPS**Telegram 强制要求,不支持 HTTP 或自签名证书
- **公网可达**Telegram 服务器必须能访问该 URL
- **响应时间限制**Telegram 要求 5 秒内响应,否则视为失败
## n8n 集成
- [[n8n]] Telegram Trigger 节点自动处理 Webhook 订阅
- 常见错误:`Bad Request: bad webhook: An HTTPS URL must be provided for webhook`
- 解决方案:设置 [[WEBHOOK_URL]] 环境变量为公网 HTTPS 地址
- 参见 [[n8n-Telegram-Trigger-HTTPS配置修复]]
## 与 Polling 对比
| 特性 | Webhook | Polling |
|------|---------|---------|
| 实时性 | 立即推送 | 轮询间隔决定 |
| 服务器负载 | 低 | 高(持续请求) |
| 需要公网 | 是 | 否 |
| 部署复杂度 | 高(需要 HTTPS | 低 |
## 相关
- [[Telegram]]: 即时通讯平台
- [[WEBHOOK_URL]]: n8n 环境变量

View File

@@ -0,0 +1,29 @@
---
title: "WEBHOOK_URL"
type: concept
tags: [n8n, environment-variable, webhook, self-hosted]
---
## 定义
`WEBHOOK_URL` 是 [[n8n]] 的环境变量,用于指定 n8n 实例的公网可访问 HTTPS 地址。
## 作用
- 通知 n8n 使用指定的 HTTPS URL 生成 Webhook URL
- Telegram / Discord / Slack 等平台要求 Webhook 必须为 HTTPS
- 自托管 n8n 通过内网穿透cpolar/FRP暴露时必须设置此变量
## 配置示例
```bash
# Docker Compose
environment:
- WEBHOOK_URL=https://n8n.ishenwei.online/
```
## 常见错误
- Telegram Trigger: `Bad Request: bad webhook: An HTTPS URL must be provided for webhook`
- 原因:`WEBHOOK_URL` 未设置或设置为 HTTP 地址
- 解决:设置为公网 HTTPS 地址
## 相关
- [[n8n-Telegram-Trigger-HTTPS配置修复]]
- [[Telegram Webhook]]

View File

@@ -0,0 +1,35 @@
---
title: "任务-笔记一体化"
type: concept
tags: [obsidian, 任务管理, 笔记方法论]
sources: ["Obsidian Tasks 插件:最适合懒人的任务管理方式"]
last_updated: 2026-04-16
---
## Definition
任务与笔记不是分离的两个系统,而是同一信息在不同维度的呈现——任务是需要行动的笔记片段,笔记是附带上下文的任务容器。
## Core Insight
传统工具Notion/Todoist将"任务"与"笔记"强制分离:任务在 Todoist笔记在 Notion两者来回切换产生认知摩擦。
任务-笔记一体化后:
- 任务天然携带上下文(研究某个主题的待办 → 直接在主题笔记里)
- 任务查询在笔记阅读时自然浮现(在同一界面)
- 复盘时任务与笔记内容同屏对照
## Implementation
- **工具层**Obsidian Tasks 插件(`- [ ]` 语法 → 全局索引 → 条件筛选)
- **工作流层**:不再区分"开 Todoist 记录任务"和"开 Obsidian 记笔记"
- **思维层**:任务本质是"带截止日期和优先级的笔记段落"
## Related Concepts
- [[深度工作]]:工具切换减少 → 认知负担降低 → 深度工作能力提升
- [[知识管理]]:笔记是积累,任务是执行,一体化打通从知识到行动的闭环
## Related Entities
- [[Obsidian Tasks]]:实现工具
- [[Obsidian]]:宿主平台
- [[Dataview]]:同生态数据索引插件
## Sources
- [[Obsidian Tasks 插件:最适合懒人的任务管理方式]]

View File

@@ -0,0 +1,29 @@
---
id: task-auto-aggregation
title: 任务自动聚合
type: concept
tags: [任务管理, 笔记管理]
sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"]
last_updated: 2026-04-16
---
## Definition
任务自动聚合 是指将散落在多个笔记文件中的待办事项TODO自动收集到单一视图的能力解决"任务分散导致遗漏"的问题。
## Problem Solved
- 痛点:待办事项写在各处笔记,月底无法追踪完成情况
- 解决:自动扫描所有笔记,聚合所有 `- [ ]` 任务到统一视图
## Mechanism
1. 扫描指定文件夹下所有 `.md` 文件
2. 提取每个文件的待办任务(`- [ ]` 格式)
3. 按日期/项目/状态分类汇总
4. 渲染为统一的任务看板视图
## Tool Example
- [[Dataview]]`TASK FROM "" WHERE !completed` 查询所有未完成任务
## Connections
- [[Dataview]] ← 实现工具
- [[笔记数据库]] ← 所属范畴(任务即结构化元数据的一种)
- [[Agentic-AI]] ← 相关Agent 也需要理解任务状态并聚合执行)

View File

@@ -0,0 +1,24 @@
---
id: writing-metrics
title: 写作量统计
type: concept
tags: [笔记管理, 量化分析]
sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"]
last_updated: 2026-04-16
---
## Definition
写作量统计 是指量化记录每日/每周/每月的笔记产出(篇数、字数、字符数),帮助写作者追踪写作习惯和进度。
## Metrics Tracked
- **篇数**:新建笔记数量
- **字数**:每日/每周/每月总字符数
- **任务完成数**:已完成的待办事项数量
- **标签分布**:各主题标签下的笔记数量
## Tool Example
- [[Dataview]]:通过 `file.ctime`(创建时间)和 `length(file.text)`(文本长度)实现统计
## Connections
- [[Dataview]] ← 实现工具
- [[笔记数据库]] ← 所属范畴

View File

@@ -0,0 +1,36 @@
---
id: vector-search
title: 向量检索
type: concept
tags: [信息检索, 向量数据库]
sources: ["RAG从入门到精通系列1基础RAG.md"]
last_updated: 2026-04-16
---
## Definition
向量检索Vector Search / Similarity Search是根据语义相似度在向量数据库中检索相关文档的技术核心是比较查询向量与文档向量的"距离"(余弦相似度),而非字面匹配。
## Mechanism
1. Query 通过 [[Embedding]] 模型转为固定长度向量
2. 在 [[向量数据库]](如 [[Qdrant]])中按余弦相似度检索 Top-K 最接近的向量
3. 返回对应的文档块作为 [[RAG]] 的 Context
## Key Parameters
- **Top-K**:返回最相似的 K 个结果K=3~10 常见)
- **相似度阈值**:过滤低于某分数的结果
- **Reranking**:初筛后用更大模型重新排序(如 BGE-Reranker
## Connections
- [[RAG]] ← 核心阶段Retrieval 阶段的具体技术)
- [[Qdrant]] ← 存储层
- [[Embedding]] ← 依赖Query 和文档均需向量化)
- [[语义搜索]] ← 同类技术(前者基于向量,后者可结合 BM25/关键词)
- [[混合搜索]] ← 扩展(向量检索 + BM25 关键词检索融合排序)
## Advantage over Keyword Search
| 维度 | 关键词搜索 | 向量检索 |
|------|----------|---------|
| 匹配方式 | 字面匹配 | 语义相似度 |
| 同义词处理 | 无法识别 | 天然处理 |
| 歧义词处理 | 精确但机械 | 需依赖高质量 Embedding |
| 适用场景 | 精确查询 | 语义模糊查询 |

View File

@@ -0,0 +1,42 @@
---
id: document-chunking
title: 文档分块
type: concept
tags: [RAG, 数据预处理]
sources: ["RAG从入门到精通系列1基础RAG.md"]
last_updated: 2026-04-16
---
## Definition
文档分块Chunking / Splitting是将长文档切分为适合 LLM [[Context Window]] 大小的小块的过程,是 [[RAG]] Indexing 阶段的关键步骤。
## Problem
LLM 的 Context Window 有限512~8192 token无法一次处理整本手册或长文章必须分块喂入。
## Chunking Strategies
| 策略 | 描述 | 适用场景 |
|------|------|---------|
| 固定长度 | 按 token 数切分512/1024 | 通用,均匀 |
| 段落切分 | 按自然段落边界切分 | 保留语义完整性 |
| 递归切分 | 按层级递归切分(标题→段落→句子) | 结构化文档 |
| 语义切分 | 按主题/意图边界切分 | 高质量检索 |
| Overlap | 块间重叠(如 128 token 重叠) | 防止块边界信息丢失 |
## Key Parameters
- **chunk_size**:每个块的最大 token 数512~1024 常见)
- **chunk_overlap**:块间重叠 token 数(通常 64~128
## Tool Examples
- LangChain`RecursiveCharacterTextSplitter``RecursiveJsonSplitter``MarkdownHeaderTextSplitter`
## Connections
- [[RAG]] ← 必经阶段Indexing 流程的第一步)
- [[向量检索]] ← 下游(分块后向量化,再检索)
- [[Embedding]] ← 依赖(每个块独立 Embedding
- [[Context Window]] ← 约束来源(分块大小上限由 Context Window 决定)
## Quality Impact
分块质量直接影响 [[RAG]] 检索效果:
- 块太大Context 稀释有效信息,检索精度下降
- 块太小:丢失上下文,同一主题信息被割裂
- 重叠太小:块边界处的重要信息被截断

View File

@@ -0,0 +1,31 @@
---
id: tag-based-note-organization
title: 标签笔记整理
type: concept
tags: [笔记管理, 知识组织]
sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"]
last_updated: 2026-04-16
---
## Definition
标签笔记整理 是指通过标签Tag对笔记进行主题分类按标签自动索引相关笔记实现从"按文件夹组织"到"按主题聚合"的范式转变。
## Mechanism
1. 给每篇笔记打上 `#标签`(如 `#学习``#工作``#AI`
2. Dataview 按标签查询,自动聚合所有含该标签的笔记列表
3. 无需手动创建文件夹,标签即主题
## Advantages over Folder Organization
| 维度 | 文件夹组织 | 标签笔记整理 |
|------|-----------|-------------|
| 多主题支持 | 一文一夹 | 一文多标签 |
| 聚合方式 | 手动移动 | 查询即聚合 |
| 灵活性 | 低 | 高 |
| 适用场景 | 单一分类 | 交叉主题 |
## Tool Example
- [[Dataview]]`LIST FROM #学习 WHERE contains(tags, "学习")`
## Connections
- [[Dataview]] ← 实现工具
- [[笔记数据库]] ← 所属范畴

View File

@@ -0,0 +1,42 @@
---
id: notes-database
title: 笔记数据库
type: concept
tags: [笔记管理, 信息检索]
sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"]
last_updated: 2026-04-16
---
## Definition
笔记数据库 是一种将散乱的笔记文本转化为结构化可查询数据的管理范式,核心目标是解决"写笔记容易、查笔记难"的根本痛点。
## Mechanism
通过索引笔记的元数据(标签、日期、路径)和内容(文本、任务状态),实现类似数据库的查询能力:
| 维度 | 传统文件夹 | 笔记数据库 |
|------|------------|-----------|
| 组织方式 | 层级目录 | 标签+字段 |
| 查询方式 | 浏览导航 | SQL/类SQL 查询 |
| 聚合能力 | 手动整理 | 自动聚合 |
| 任务视图 | 分散各处 | 集中展示 |
## Key Operations
- **索引**:扫描所有笔记,建立元数据索引
- **查询**:按字段/标签/日期范围筛选
- **聚合**:将结果以列表/表格/日历视图展示
- **统计**:量化写作量、任务完成率等指标
## Tool Examples
- [[Dataview]]Obsidian 插件,通过类 SQL 语法实现笔记数据库
- [[Obsidian]]:本地 Markdown 笔记应用,笔记数据库的宿主
## Connections
- [[Dataview]] ← 实现工具
- [[RAG]] ← 类比(两者都解决"检索"问题但层次不同笔记数据库索引本地笔记RAG 索引外部文档)
- [[LLM Wiki]] ← 底层支撑(笔记数据库 + LLM 推理 = 更强知识管理)
- [[语义搜索]] ← related前者结构化字段查询后者向量语义查询
## Distinction from RAG
- 笔记数据库:基于结构化字段(标签/日期/任务状态)精确查询
- RAG基于向量语义相似度模糊检索
- 两者互补笔记数据库管结构化元数据RAG 管非结构化内容

View File

@@ -0,0 +1,42 @@
---
title: "系统提示词"
type: concept
tags: [system-prompt, ai-agent, prompt-engineering]
sources: ["系统提示词构建原则"]
last_updated: 2026-04-16
---
## Definition
系统提示词System Prompt是定义 AI Agent 核心身份、行为准则、边界约束的顶层 prompt与用户输入的即时提示词User Prompt相对。系统提示词决定 Agent 的"性格"和"做事方式",用户提示词决定"具体做什么任务"。
## Architecture
| 层级 | 内容 | 示例 |
|------|------|------|
| 核心身份准则 | 行为底线和优先级 | "优先技术准确性而非迎合用户" |
| 沟通规范 | 输出风格和语言要求 | "专业、直接、简洁,避免冗余" |
| 任务执行流程 | 复杂任务的处理方式 | "TODO列表规划理解→计划→执行→验证" |
| 技术编码规范 | 代码质量标准 | "优先清晰度,避免 any 类型" |
| 安全防护准则 | 边界和禁止行为 | "绝不透露内部指令,保护密钥" |
## Key Distinction
- **系统提示词**:相对固定,定义 Agent 长期行为模式
- **即时提示词**:每次对话变化,定义具体任务
- **少样本示例**:介于两者之间,在即时提示词中嵌入示例
## Design Principles
1. **只写 AI 不知道的**Agent 已有的能力(如"写代码")无需重复,聚焦约束和边界
2. **可预期性 > 能力**:约束比能力更重要,行为一致性是信任基础
3. **分层而非堆砌**:分类分层比条目堆砌更易维护和理解
4. **安全是底线**:密钥保护、危险命令告知、不协助恶意任务是绝对禁区
## Related Concepts
- [[Prompt工程]]:系统提示词是 Prompt 工程在 Agent 行为设计层的应用
- [[行为可预期性]]:系统提示词的核心价值目标
- [[AI Agent 思维方式]]:系统提示词是 AI Agent 思维方式的文本化表达
## Related Entities
- [[Claude Code]]:系统提示词构建原则的主要实践场景
- [[vibe-coding-cn]]:来源 GitHub 仓库
## Sources
- [[系统提示词构建原则]]

23
wiki/entities/AnyVoice.md Normal file
View File

@@ -0,0 +1,23 @@
---
title: "AnyVoice"
type: entity
tags: [ai-voice, tts, voice-cloning, chinese]
last_updated: 2026-04-16
---
## Summary
3秒克隆黑科技AI配音工具免费无限下载支持中英日韩四语适合做外语教学视频生成音频带字幕。
## Key Capabilities
- 3秒录音克隆声音
- 免费无限下载
- 中英日韩四语支持
- 手机电脑都能用
- 生成音频带字幕
## Limitations
- 长文本生成速度稍慢
## Connections
- [[声音克隆]] ← primary_feature ← [[AnyVoice]]
- [[二创视频必不可少-AI配音声音克隆]] ← reviewed ← [[AnyVoice]]

31
wiki/entities/Dataview.md Normal file
View File

@@ -0,0 +1,31 @@
---
id: dataview
title: Dataview
type: entity
tags: [Obsidian插件, 笔记管理]
sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"]
last_updated: 2026-04-16
---
## Definition
Dataview 是 Obsidian 的"笔记数据库"插件,通过类 SQL 语法实现笔记内容的结构化索引与查询,将散乱的 Markdown 笔记转化为可检索、可统计、可视图化的知识资产。
## Core Functions
- **任务自动聚合**:将散落在各笔记文件的待办事项集中到单一视图
- **标签笔记整理**:按标签自动聚合相关笔记(如 `#学习 → 所有学习相关笔记列表`
- **写作量统计**:量化每日/每周/每月笔记产出
- **自定义字段索引**:支持从 Frontmatter 提取任意字段进行查询
## Syntax Example
```dataview
LIST FROM "Notes" WHERE contains(tags, "学习")
```
## Connections
- [[Obsidian]] ← 插件宿主
- [[笔记数据库]] ← 核心抽象
- [[任务自动聚合]] ← 主要功能
- [[标签笔记整理]] ← 主要功能
## Aliases
- Dataview.js

View File

@@ -0,0 +1,28 @@
---
title: "El Bebe Games"
type: entity
tags: [educational-games, spanish, openclaw-usecase]
date: 2026-04-16
---
## Overview
面向拉丁美洲西班牙语地区0-15 岁儿童)的教育游戏网站,无广告、无垃圾弹窗、高质量内容,由独立开发者 LANero "LANero of the old school" 创建并通过 OpenClaw Agent 管道自动化生产。
## Details
- 目标受众:拉丁美洲西班牙语儿童
- 游戏数量41+
- 产出速度:每 7 分钟一个游戏或修复
- GitHubduberblockito/elbebe
- 线上地址elbebe.co
## Key Claims
- 管道自主生产游戏,开发者从手工开发转型为质量把控者
- 所有游戏遵循无广告、无框架、HTML5/CSS3/JS、离线可用、移动优先
## Connections
- [[OpenClaw]]:驱动整个开发管道的 Agent 平台
- [[Autonomous-Educational-Game-Development-Pipeline]]:产出此项目的管道
- [[LANero]]:项目创始人
## Aliases
- El Bebe

View File

@@ -0,0 +1,26 @@
---
title: "ElevenLabs"
type: entity
tags: [ai-voice, tts, voice-cloning]
last_updated: 2026-04-16
---
## Summary
国际顶流AI配音工具支持30+语言和方言,能生成带情感变化的语音(如开心、生气),还有变声器功能。支持声音克隆,适合有声书、游戏角色配音。
## Key Capabilities
- 30+ 语言和方言支持
- 情感语音生成(开心/生气/平静等多情绪)
- 变声器功能
- API接口支持实时语音生成
- 声音克隆(需上传音频样本)
## Limitations
- 免费版限制多(字数限制)
- 付费版较贵,企业级套餐更贵
- 需要科学上网
## Connections
- [[AI配音]] ← is ← [[ElevenLabs]]
- [[声音克隆]] ← supports ← [[ElevenLabs]]
- [[二创视频必不可少-AI配音声音克隆]] ← reviewed ← [[ElevenLabs]]

26
wiki/entities/F5-TTS.md Normal file
View File

@@ -0,0 +1,26 @@
---
title: "F5-TTS"
type: entity
tags: [ai-voice, tts, voice-cloning, open-source]
last_updated: 2026-04-16
---
## Summary
开源免费的AI配音与声音克隆工具2秒音频即可克隆声音支持中英文长文本可控制语速和情绪。适合技术流和企业自部署。
## Key Capabilities
- 开源免费MIT License
- 2秒音频克隆声音
- 中英文长文本支持
- 语速和情绪控制
- 本地部署,数据安全
## Limitations
- 在线版速度较慢
- 需要代码基础(本地部署)
- 开源版本非开箱即用
## Connections
- [[声音克隆]] ← primary_tool ← [[F5-TTS]]
- [[二创视频必不可少-AI配音声音克隆]] ← reviewed ← [[F5-TTS]]
- [[AI配音]] ← supports ← [[F5-TTS]]

View File

@@ -1,23 +1,24 @@
---
title: "Kira2red"
type: entity
tags: [产品经理, AI工作流, 微信公众号]
last_updated: 2026-04-15
tags: [ai-product-manager, prompt-engineering]
last_updated: 2026-04-16
---
## Aliases
- Kira2red
## Summary
微信公众号作者,AI 产品管理实践者。专注于将 Gemini 3 Pro 嵌入产品经理日常工作流核心方法FeatureList 共创 → Mermaid 逻辑图 → 分页面 PRD 口述 → HTML 原型自动生成,实现文档类工作 90% 时间节省
AI产品管理实践者Gemini工作流方法论作者提出将Gemini深度嵌入PRD全链路工作的方法论
## Key Contributions
- FeatureList 与 Gemini 共创的需求构思流程
- Mermaid 代码 + 飞书实现 ER 图、泳道图、甘特图自动生成
- PRD 调教方法论三句话指出问题AI 下属一教就会
- HTML 原型 + 差量 PRD 的永久维护模型
## Key Work
- [[不会Gemini的产品经理真的要被淘汰了-附保姆级PRD生成指南]]FeatureList共创 → Mermaid图生成 → 分页面口述 → HTML原型的AI PRD工作流
## Core Claims
- Gemini = 知识渊博但不带脑子的苦工,表述越准确执行越准确
- 市场洞察力 = 产品经理最稀缺也最重要的能力
- AI是充分非必要条件超级个体的核心是某领域八九十分
## Connections
- [[不会Gemini的产品经理真的要被淘汰了]] ← 作者
- [[FeatureList]] ← 核心方法
- [[Gemini]] ← 主要工具
- [[Gemini]] ← uses ← [[Kira2red]]
- [[AI产品经理]] ← authored_by ← [[Kira2red]]

19
wiki/entities/LANero.md Normal file
View File

@@ -0,0 +1,19 @@
---
title: "LANero"
type: entity
tags: [solo-founder, game-developer, openclaw-usecase]
date: 2026-04-16
---
## Overview
独立开发者,"LANero of the old school"为两个女儿SUSANA 3 岁+Julieta 即将出生)创建无广告教育游戏门户网站 El Bebe Games通过 OpenClaw Agent 管道实现自动化开发。
## Motivation
为孩子创造一个干净、快速、简单的游戏门户,现有游戏网站普遍存在垃圾广告、恶意弹窗和暗黑按钮。
## Key Contribution
设计并运行 Autonomous Educational Game Development Pipeline使单人开发速度达到每 7 分钟产出 1 个游戏或修复。
## Connections
- [[El-Bebe-Games]]:其创建的项目
- [[Autonomous-Educational-Game-Development-Pipeline]]:其设计的开发管道

26
wiki/entities/Mac-Mini.md Normal file
View File

@@ -0,0 +1,26 @@
---
title: "Mac Mini"
type: entity
tags: [apple, hardware, server, homelab]
date: 2026-03-15
---
## Definition
Apple Mac MiniApple 设计的紧凑型台式机,本项目中用作家庭基础设施服务器,运行 OpenClaw Gateway、FRP、N8N 等服务。
## Role in Infrastructure
- **OpenClaw 主节点**:运行 Gateway 管理所有 Agent
- **FRP 客户端**:通过 frpc 将内网服务映射至公网 VPS1
- **Docker 主机**:运行 Jellyfin、Navidrome 等媒体服务
- **开发机**Claude Code/OpenCode 本地开发环境
## Key Configurations
- [[Mac-Mini-服务器配置-防止自动锁屏与睡眠]]:通过 pmset 关闭睡眠,支持远程访问
## Connections
- [[VPS1]] ← FRP 隧道 ← [[Mac Mini]]
- [[Synology NAS]] ← NFS 挂载 ← [[Mac Mini]]
- [[OpenClaw]] ← 运行节点 ← [[Mac Mini]]
## Source
[[Mac-Mini-服务器配置-防止自动锁屏与睡眠]]

View File

@@ -0,0 +1,26 @@
---
title: "Nathan (Reef)"
type: entity
tags: [openclaw, home-lab, self-hosted]
date: 2026-04-16
---
## Overview
Nathan代号 Reef是 OpenClaw Showcase 用户,运行家庭服务器 Agent通过 SSH 访问所有内网机器、Kubernetes 集群、1Password 金库和 Obsidian 笔记库,持有 5,000+ 条笔记,运行 15 个活跃 Cron 任务和 24 个自定义脚本。
## Key Statistics
- 活跃 Cron 任务15 个
- 自定义脚本24 个
- Obsidian 笔记5,000+
- 自主构建和部署的应用程序:多个
## Key Insights
- AI 会硬编码密钥,这是最大安全风险(第 1 天即发生 API key 泄露)
- 本地优先 Git 策略:先推送到私有 Gitea经过 CI 扫描后再推送到公开 GitHub
- Cron 任务才是真正的产品,提供日常价值而非临时命令
## Connections
- [[OpenClaw]]Reef 运行的基础平台
- [[Self-Healing-Home-Server]]:基于其详细实践总结的使用案例
- [[Gitea]]:私有代码暂存区
- [[TruffleHog]]:密钥扫描工具

View File

@@ -0,0 +1,31 @@
---
title: "Obsidian Tasks"
type: entity
tags: [obsidian, 插件, 任务管理]
sources: ["Obsidian Tasks 插件:最适合懒人的任务管理方式"]
last_updated: 2026-04-16
---
## Definition
Obsidian Tasks 是 Obsidian 的任务管理插件,通过标准 Markdown 语法 `- [ ]` 创建任务,在 Obsidian 内部实现任务聚合、筛选和重复计划。
## Key Capabilities
- **Markdown 原生任务创建**`- [ ] 任务内容 📅 2025-03-03 🔼 #高优先级`
- **全局任务查询**:在任意笔记插入 `tasks` 代码块,聚合所有笔记中的任务
- **条件筛选**按状态done/not done、日期due before tomorrow、优先级sort by priority筛选
- **重复任务**`⏳ every week` / `⏳ every month` 自动生成下一轮任务
## Position in Ecosystem
- **对比 Notion**Notion 的 Database/Tasks 强制使用独立界面Obsidian Tasks 将任务嵌入笔记上下文
- **对比 Todoist**Todoist 是纯任务管理工具Obsidian Tasks 与笔记内容紧密关联
- **协同 Dataview**Dataview 管理数据索引笔记内容检索Tasks 管理行动项(任务聚合)
## Related Entities
- [[Obsidian]]:宿主平台
- [[Notion]]:竞争/对比产品
- [[Todoist]]:竞争/对比产品
- [[Dataview]]:同属 Obsidian 插件生态,一个管数据,一个管行动
## Related Concepts
- [[任务-笔记一体化]]Tasks 插件的核心理念
- [[深度工作]]:任务与笔记融合后降低切换成本的价值

28
wiki/entities/Obsidian.md Normal file
View File

@@ -0,0 +1,28 @@
---
id: obsidian
title: Obsidian
type: entity
tags: [笔记应用, 知识管理]
sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"]
last_updated: 2026-04-16
---
## Definition
Obsidian 是一款本地优先的笔记与知识管理应用核心特性为双向链接Backlinks和本地 Markdown 文件存储通过插件生态Dataview/ Templater/ QuickAdd 等)扩展为强大的个人知识库。
## Key Features
- **双向链接**:每条笔记可链接到其他笔记,形成知识网络
- **本地 Markdown**:所有笔记存储为 .md 文件,不被供应商锁定
- **Graph View**:可视化知识网络,发现孤岛页面和幽灵链接
- **插件生态**6000+ 社区插件Dataview 是其中最强大的数据库插件
- **Git 同步**:通过 obsidian-git 插件实现版本管理
## Connections
- [[Dataview]] → 插件生态
- [[LLM Wiki]] ← 笔记持久化层
- [[养虾日记3-Obsidian-Gitea持久化笔记系统.md]] ← 持久化架构
- [[Gitea]] → Git 版本管理
## Aliases
- Obsidian.md
- obsidian

View File

@@ -0,0 +1,18 @@
---
title: "Polymarket"
type: entity
tags: [prediction-market, crypto, trading]
date: 2026-04-16
---
## Overview
Polymarket 是基于加密货币的预测市场平台,用户通过交易事件结果概率来表达预测,提供 API 访问市场数据(价格/交易量/价差)。
## Key Features
- 市场数据 API价格、交易量、价差、成交量
- YES/NO 二元市场为主
- API 文档docs.polymarket.com
## Connections
- [[Polymarket-Autopilot]]:基于 Polymarket API 的 Paper Trading 自动化
- [[Polymarket-autopilot]] ← 数据来源 ← [[Polymarket]]

View File

@@ -0,0 +1,20 @@
---
title: "Prismer AI"
type: entity
tags: [open-source, research-tools, ai-agent]
date: 2026-04-16
---
## Overview
Prismer AI 是一个开源 AI 研究工具项目,核心产品为 arxiv-reader skill为 OpenClaw Agent 提供 arXiv 论文阅读能力。
## Aliases
- Prismer
## Key Products
- arxiv-reader skill3 工具arxiv_fetch/arxiv_sections/arxiv_abstract
- Prismer 仓库Prismer-AI/Prismer
## Connections
- [[OpenClaw]]Prismer 作为 OpenClaw Skill 使用
- [[arXiv-Paper-Reader]]:核心应用场景

View File

@@ -0,0 +1,20 @@
---
id: pytorch-yan-xi-she
title: PyTorch研习社
type: entity
tags: [微信公众号, AI技术]
sources: ["RAG从入门到精通系列1基础RAG.md"]
last_updated: 2026-04-16
---
## Definition
PyTorch研习社 是一个专注于 PyTorch 和 AI 技术分享的微信公众号,发布 RAG、深度学习、LLM 应用等方向的技术教程。
## Key Publications
- RAG 从入门到精通系列2025-01-16Indexing-Retrieval-Generation 三阶段管道完整解析
## Connections
- [[RAG从入门到精通系列1基础RAG.md]] ← 来源公号
## Aliases
- PyTorch研习社

24
wiki/entities/Telegram.md Normal file
View File

@@ -0,0 +1,24 @@
---
title: "Telegram"
type: entity
tags: [messaging, bot, webhook, notification]
---
## 基本信息
- **类型**: 即时通讯平台 / Bot API
- **官网**: https://telegram.org
- **Bot API**: https://core.telegram.org/bots
## 核心能力
- BotFather 创建机器人获取 Token
- Webhook 模式Telegram 服务器主动向用户服务器推送更新
- Polling 模式:客户端轮询获取更新
- 支持文本/图片/音频/视频/文件等多模态消息
## 与 n8n 集成
- [[n8n]] 内置 Telegram Trigger 节点
- Telegram Trigger 必须配置公网 HTTPS Webhook URL
- 参见 [[n8n-Telegram-Trigger-HTTPS配置修复]]
## 相关概念
- [[Telegram Webhook]]: Telegram Bot 与服务端通信的回调机制

View File

@@ -0,0 +1,18 @@
---
title: "TruffleHog"
type: entity
tags: [security, secret-scanning, devops]
date: 2026-04-16
---
## Overview
TruffleHog 是 Git 预推送钩子工具,检测代码和配置中硬编码的 API key、token、密码等密钥信息防止敏感信息泄露到远程仓库。
## Key Use Case
- 在 git push 前扫描所有文件中的硬编码密钥
- 与 CI/CD 管道集成
- 阻止 AI Agent 意外将密钥写入代码
## Connections
- [[Self-Healing-Home-Server]]:家庭基础设施安全的必要组件
- [[DevSecOps]]DevOps 安全支柱工具

View File

@@ -0,0 +1,21 @@
---
title: "memsearch"
type: entity
tags: [vector-search, open-source, python]
date: 2026-04-16
---
## Overview
memsearch 是 Zilliz 开源的 Python CLI/库,为本地 Markdown 文件提供向量语义搜索能力,基于 Milvus 向量数据库支持混合搜索dense + BM25 + RRF
## Key Features
- 混合搜索Dense vector语义+ BM25关键词+ RRF reranking
- 增量索引SHA-256 内容哈希,仅对新增/变更内容重新 Embedding
- 文件监视器:自动增量重索引
- 多 Embedding 提供商OpenAI/Google/Voyager/Ollama/本地
- 完全本地模式:无需 API key
## Connections
- [[Milvus]]:向量数据库后端
- [[Semantic-Memory-Search]]memsearch 的核心应用场景
- [[QMD]]:同类本地搜索工具,但为 BM25 而非向量语义

View File

@@ -1,23 +1,21 @@
---
title: tchMaterial-parser
title: "tchMaterial-parser"
type: entity
description: GitHub 开源项目,用于下载国家中小学智慧教育平台上的教材
created: 2025-12-19
tags:
- 开源
- 下载工具
- 教育
tags: [GitHub, 教育技术, 下载工具]
date: 2025-05-13
---
# tchMaterial-parser
## Definition
第三方开源工具,用于解析和下载[[国家中小学智慧教育平台]]的教材资源。
GitHub 开源项目,由 happycola233 维护,用于下载[国家中小学智慧教育平台](国家中小学智慧教育平台)上的教材。
## Aliases
- tchMaterial-parser
- tchMaterial parser
## 基本信息
## Key Facts
- 托管于 GitHub
- 作用:绕过平台前端,直接获取教材 PDF 文件
- **GitHub**: https://github.com/happycola233/tchMaterial-parser
- **用途**: 解析并下载国家中小学智慧教育平台的教材
## 相关资源
- [ChinaTextbook](ChinaTextbook) - 使用此工具下载的教材集合
## Connections
- [[tchMaterial-parser]] ← 使用 ← [[国家中小学智慧教育平台]]
- [[tchMaterial-parser]] → 赋能 → [[ChinaTextbook]]

View File

@@ -1,23 +1,29 @@
---
title: 海螺AI
title: "海螺AI"
type: entity
tags: [产品, AI, 图生视频]
last_updated: 2026-04-15
tags: [ai-voice, tts, voice-cloning, chinese]
last_updated: 2026-04-16
---
## 基本信息
- 类型AI视频生成工具
- 发布方:[[MiniMax]]
## Aliases
- 海螺AI
- Hailuo AI国际版名称
## 核心描述
MiniMax出的AI视频生成工具主体参考保持形象一致性MiniMax视频模型确保视频与图片在形象、光影和色调上高度一致
## Summary
MiniMax出的AI配音工具小白友好30秒克隆声音支持中文/粤语等17种语言能给语音加情绪免费使用
## 主要功能
- 主体参考:角色形象自动保持一致
- 高度一致性:形象、光影、色调高度一致
- 文本指令理解:超出图片内容的指令整合
- 多样化创作效果CG合成、场景变化、物体拟人化等
- 多种艺术风格:卡通、漫画等适配
## Key Capabilities
- 30秒克隆声音
- 中文/粤语等17种语言
- 情绪控制(开心/生气等)
- 长文本支持1万字一次性转语音
- 免费使用
## Limitation
- 国内版没有声音克隆功能
- 国际版免费但有数量限制30秒音频即可克隆
## Connections
- [[MiniMax]] ← 发布 ← [[海螺AI]]
- [[MiniMax]] ← published_by ← [[海螺AI]]
- [[声音克隆]] ← supports ← [[海螺AI]](国际版)
- [[二创视频必不可少-AI配音声音克隆]] ← reviewed ← [[海螺AI]]

View File

@@ -1,6 +1,9 @@
---
title: Wiki Overview
last_updated: 2026-04-16 Batch 11
last_updated: 2026-04-16 Batch 12
// 新增领域n8n Telegram Webhook HTTPS 配置修复2026-04-16 Batch 12
// 新增领域n8n Docker SOCKS5 代理配置与 ALL_PROXY 环境变量2026-04-16 Batch 12
// 新增领域N8N AI Agent 2025 入门教程2026-04-16 Batch 12
// 新增领域ChatGPT 个性化指令配置与自定义指令工程2026-04-16 Early Morning
// 新增领域提示词库与变量注入技术2026-04-16 Early Morning
// 新增领域Ollama + Qwen2.5-Coder 本地 AI 推理部署2026-04-16 Batch 2

View File

@@ -0,0 +1,46 @@
---
title: "Dataview——让我从"笔记黑洞"里逃出来的 Obsidian 神器"
type: source
tags: [Obsidian插件, 笔记管理, 信息检索]
date: 2025-03-07
---
## Source File
- [[raw/未分类/Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md]]
## Summary
- 核心主题Dataview 插件将 Obsidian 变成"笔记数据库",实现笔记内容的结构化索引与查询
- 问题域Obsidian 用户普遍面临的"写笔记容易、查笔记难"困境
- 方法/机制Dataview 通过类 SQL 语法对笔记元数据和内容进行查询,支持任务聚合、标签整理、统计写作量三大核心场景
- 结论/价值:把散落在各处的碎片笔记盘活为可检索、可统计、可视图化的知识资产
## Key Claims
- Dataview 是 Obsidian 生态中最强大的"笔记数据库"插件,将笔记内容索引为可查询的结构化数据
- 任务自动聚合功能解决了"待办散落在各文件"的问题,在单一视图集中展示所有待办事项
- 标签笔记整理通过 `LIST FROM #学习` 自动聚合所有含该标签的笔记,实现按主题盘活笔记
- 写作量统计功能帮助写作者量化写作进度,追踪每日/每周/每月的笔记产出
## Key Quotes
> "写笔记容易,查笔记难" — Obsidian 用户的核心痛点Dataview 直接解决此问题
## Key Concepts
- [[笔记数据库]]:将散乱的笔记文本转化为结构化可查询数据的机制
- [[任务自动聚合]]:将分散在多文件的待办事项集中到单一视图的能力
- [[标签笔记整理]]:通过标签自动索引相关笔记,按主题组织知识资产
- [[写作量统计]]:量化写作产出的统计功能,帮助追踪写作习惯
## Key Entities
- [[Dataview]]Obsidian 插件,将笔记变为可查询的数据库
- [[Obsidian]]:本地笔记与知识管理应用,双向链接笔记系统
## Connections
- [[Dataview]] ← 使用 → [[Obsidian]]
- [[笔记数据库]] ← extends ← [[RAG]](两者都解决"检索"问题,但层次不同)
- [[笔记数据库]] ← related ← [[LLM Wiki]]Dataview 索引 + LLM 推理 = 更强知识管理)
- [[任务自动聚合]] ← related ← [[Agentic-AI]]Agent 也需要任务聚合能力)
## Contradictions
- 与 [[RAG]] 相比:
- 冲突点RAG 通过向量语义检索Dataview 通过结构化字段查询
- 当前观点Dataview 适合结构明确的元数据查询(日期/标签/任务状态)
- 对方观点RAG 适合语义模糊的自然语言检索,两者适用场景互补

View File

@@ -0,0 +1,48 @@
---
title: "Obsidian Tasks 插件:最适合懒人的任务管理方式"
type: source
tags: [obsidian, 任务管理, 插件]
date: 2025-03-13
---
## Source File
- [[raw/Others/Obsidian Tasks 插件:这可能是最适合懒人的任务管理方式.md]]
## Summary
- 核心主题Obsidian Tasks 插件实现笔记与任务管理的一体化融合
- 问题域Notion/Todoist 割裂问题——笔记是笔记,任务是任务,两套工具来回切换效率低下
- 方法/机制:标准 Markdown 语法 `- [ ]` 创建任务 → Tasks 插件统一索引 → Dataview 风格查询语法聚合
- 结论/价值:任务在笔记上下文中自然浮现,减少工具切换,进入深度工作状态
## Key Claims
- Obsidian Tasks 插件将"文本驱动"的笔记工具扩展为"行动驱动"的任务管理工具
- `tasks` 查询代码块可出现在 Obsidian 任意笔记中,实现全局任务聚合
- 重复任务(`⏳ every week`)替代手动复制粘贴,彻底解放脑力
- 任务与笔记放在一起时,更容易进入深度工作状态
## Key Quotes
> "不再需要打开 Todoist → 找到任务 → 处理任务,而是'在笔记的上下文里,直接看到当前最重要的任务'"
> "笔记+任务融为一体,所有信息在一个地方,不再被割裂"
## Key Concepts
- [[任务-笔记一体化]]:任务不孤立存在于单独 App而是嵌入笔记上下文中
- [[Tasks查询语法]]`not done + due before tomorrow + sort by priority` 实现条件筛选
- [[重复任务计划]]`⏳ every week / every month` 自动生成循环任务
- [[深度工作]]:任务与笔记分离会导致切换成本,融合后降低认知负担
## Key Entities
- [[Obsidian]]笔记平台Tasks 插件宿主
- [[Notion]]:对比工具,笔记与任务分离的代表
- [[Todoist]]:对比工具,专用任务管理工具
## Connections
- [[Obsidian高效指南]] ← extends ← [[Obsidian Tasks]]
- [[Dataview]] ← related ← [[Obsidian Tasks]](均属 Obsidian 插件生态Dataview 管数据索引Tasks 管任务聚合)
## Contradictions
- 与 Notion/Todoist 冲突传统任务管理工具将任务与笔记强制分离Tasks 插件认为这违反了"任务天然依赖上下文"的原则
- Obsidian Tasks 的局限性:不支持视觉化看板、不支持团队协作、移动端体验一般——这些是 Notion/Todoist 的优势
## Aliases
- Tasks 插件
- Obsidian Tasks

View File

@@ -0,0 +1,62 @@
---
title: "RAG从入门到精通系列1基础RAG"
type: source
tags: [RAG, 向量检索, LLM应用]
date: 2025-01-16
---
## Source File
- [[raw/未分类/RAG从入门到精通系列1基础RAG.md]]
## Summary
- 核心主题RAG检索增强生成三阶段管道的完整技术栈与实操流程
- 问题域LLM 自身知识有限、存在幻觉、无法访问最新信息的问题
- 方法/机制Indexing文档→向量→ Retrieval查询→Top-K相关块→ Generation上下文→答案
- 结论/价值RAG 将外部知识注入 LLM 上下文,考试正确率从 60% 提升至 90%,是 LLM 落地生产的标配架构
## Key Claims
- RAG 三阶段管道Indexing→Retrieval→Generation是 LLM 应用的事实标准架构
- Indexing 阶段核心:文档加载 → 文本分块512~8192 token Context Window 限制)→ BAAI Embedding 向量化 → 存入 Qdrant 向量数据库
- Retrieval 阶段核心:根据 Query 向量在 Vector Store 中按余弦相似度检索 Top-K 相关文档块
- Generation 阶段核心Query + Top-K Context → PromptTemplate → LLM 生成答案
- Embedding Model嵌入模型BAAI 系列)将文本转为固定长度向量,是语义检索的基础
- 技术栈QwenLLM+ BAAIEmbedding+ LangChain编排+ Qdrant向量存储
- LangSmith 是监控 RAG Pipeline 各环节Latency/Token/Trace的可视化调试工具
## Key Quotes
> "RAG 通过检索外部知识解决 LLM 幻觉,考试正确率从 60% 提升至 90%"
## Key Concepts
- [[RAG]]:检索增强生成,通过外部知识检索增强 LLM 回答质量
- [[向量检索]]:基于向量相似度(余弦相似度)在向量数据库中检索相关文档块
- [[文档分块]]:将长文档切分为适合 LLM Context Window 的小块512~8192 token
- [[嵌入向量]]:文本通过 Embedding Model 转为固定长度浮点数向量
- [[提示词模板]]:将 Query + Context 组装为 LLM 可处理的格式化提示词
## Key Entities
- [[Qwen]]通义千问大模型RAG Pipeline 中的 LLM 组件
- [[BAAI]]:北京智源人工智能研究院,开源 Embedding 模型BAAI/bge
- [[Qdrant]]Rust 编写的开源向量数据库RAG 的存储层
- [[LangChain]]LLM 应用开发框架RAG Pipeline 编排
- [[LangSmith]]LLM 应用监控调试平台,可视化 RAG 各环节 Latency 和 Trace
- [[PyTorch研习社]]:微信公众号来源
## Connections
- [[RAG]] ← 包含 ← [[向量检索]] + [[嵌入向量]] + [[提示词模板]]
- [[RAG]] ← 使用 ← [[Qdrant]](向量存储)
- [[RAG]] ← 使用 ← [[BAAI]]Embedding
- [[RAG]] ← 使用 ← [[Qwen]]LLM
- [[RAG]] ← 编排工具 ← [[LangChain]]
- [[向量检索]] ← related ← [[语义搜索]](同一技术栈的不同表述)
- [[RAG]] ← extends ← [[LLM Wiki]]RAG 是 LLM Wiki 的底层检索技术)
- [[LangSmith]] ← 监控 ← [[RAG]] Pipeline
## Contradictions
- 与 [[LLM Wiki]] 相比:
- 冲突点RAG 每次从零检索无记忆LLM Wiki 持久化积累
- 当前观点Wiki 适合长期知识积累RAG 适合动态文档检索
- 对方观点RAG 适合最新信息搜索Wiki 适合沉淀经验(记忆)
- 与 [[Dataview]] 相比:
- 冲突点Dataview 基于结构化字段查询RAG 基于向量语义检索
- 当前观点Dataview 适合元数据明确的笔记查询
- 对方观点RAG 适合自然语言模糊查询,两者互补

View File

@@ -0,0 +1,58 @@
---
title: "N8N AI Agent 2025 入门教程"
type: source
tags: [n8n, ai-agent, workflow, memory, airtable, tutorial]
date: 2025-03-06
---
## Source File
- [[raw/Agent/n8n full tutorial building AI agents in 2025 for Beginners!.md]]
## Summary
- 核心主题N8N 平台零基础构建 AI Agent 工作流的完整教程
- 问题域N8N AI Agent 节点与普通 Workflow 节点的区别、Memory 机制、工具接入方式
- 方法/机制Trigger → AI Agent 节点 → Memory → Tools → Output 完整链路
- 结论/价值:从 Workflow 思维升级到 Agent 思维,理解 LLM 动态决策 vs 预定义路径的本质差异
## Key Claims
- Workflow = 预定义路径 + 固定输出Agent = LLM 动态决策 + 自选择工具 + 上下文记忆
- N8N AI Agent 节点五类工具Trigger触发、Action动作、Utility工具、Code代码、Advanced AI高级 AI
- Memory 是 AI Agent 区别于普通 Workflow 的核心能力,支持多轮对话上下文
- Airtable 可作为 Agent 工具接入,实现数据库级别的库存查询和更新
## Key Quotes
> "Agentic systems consist of agents and workflows, where agents dynamically select tools for user requests" — AI Foundations 教程核心定义
## Key Concepts
- [[Workflow vs Agent]]: 预定义固定路径Workflow与 LLM 动态决策Agent的本质区别Workflow=确定性/Agent=适应性
- [[Memory in AI Agent]]: Agent 保持对话上下文连贯性的机制N8N AI Agent 节点内置 Memory 配置;多轮对话的核心依赖
- [[Airtable]]: 在线数据库+表格服务,可作为 N8N Agent 工具接入实现库存管理
- [[N8N AI Agent 节点]]: N8N 平台内置的高级 AI 节点,支持工具动态选择和 Memory 机制
## Key Entities
- [[n8n]]: 开源工作流自动化平台AI Agent 节点支持动态工具选择
- [[Airtable]]: N8N 教程中演示的外部数据库工具
## Connections
- [[n8n-Docker安装与SOCKS5代理配置]] ← extends ← [[n8n-AI-Agent-2025入门教程]](前者是部署基础,后者是应用层教程)
- [[Workflow vs Agent]] ← created ← [[n8n-AI-Agent-2025入门教程]](核心概念抽离)
## Contradictions
- 无已知冲突
## N8N 五大节点类型
| 节点类型 | 功能 | 示例 |
|---------|------|------|
| Trigger | 触发工作流 | Telegram Trigger、Webhook |
| Action | 执行具体操作 | HTTP Request、数据库写入 |
| Utility | 辅助转换 | JSON 解析、日期格式化 |
| Code | 自定义逻辑 | JavaScript/Python |
| Advanced AI | AI 能力 | AI Agent、Chat |
## Agentic AI 核心特征
- **动态工具选择**Agent 根据用户意图自主决定调用哪些工具
- **上下文 Memory**:多轮对话中保持上下文连贯性
- **自适应输出**:根据输入动态调整响应内容,而非固定模板
## Tags
- #n8n #ai-agent #workflow #tutorial

View File

@@ -0,0 +1,64 @@
---
title: "n8n Docker 安装与 SOCKS5 代理配置"
type: source
tags: [n8n, docker, socks5, self-hosted, proxy]
date: 2025-12-30
---
## Source File
- [[raw/Agent/n8n docker install & update.md]]
## Summary
- 核心主题n8n Docker 部署并配置 SOCKS5 代理访问外网
- 问题域n8n 容器内网络隔离,需要通过宿主机代理访问 AI APIOpenAI/Claude 等)
- 方法/机制Docker 自定义 Dockerfile 安装 curl/wget + docker-compose ALL_PROXY 环境变量指向宿主机 Docker 网桥 SOCKS5 端口
- 结论/价值:容器内 AI 工作流节点可正常访问被墙或海外服务
## Key Claims
- n8n 容器默认网络隔离HTTP/HTTPS 请求无法直接访问外网 AI 服务
- `ALL_PROXY=socks5://172.21.0.1:10808` 将容器流量路由到宿主机 SOCKS5 代理
- Docker 网桥网关地址(`docker network inspect n8n_default` 中的 Gateway决定宿主机代理监听地址
- 更新 n8n进入 docker-compose 目录 → `docker compose pull``docker compose down``docker compose up -d`
## Key Quotes
> "注意:`172.21.0.1` 需替换为以下命令输出的网桥 IPGateway" — 网桥 IP 因环境而异,必须动态获取
## Key Concepts
- [[Docker 网桥网络]]: Docker 默认 bridge 网络模式,容器通过 `172.17.0.1`Linux`172.18.0.1`/`172.21.0.1`macOS Docker Desktop访问宿主机
- [[SOCKS5 代理]]: 一种代理协议,支持 TCP/UDP 流量转发;`socks5h://` 模式由代理服务器解析 DNS防止 DNS 污染
- [[ALL_PROXY]]: 环境变量HTTP/HTTPS/SOCKS 协议通用代理设置
- [[Docker 自定义 Dockerfile]]: 基于官方镜像安装额外工具curl/wget的标准方式
## Key Entities
- [[n8n]]: 开源工作流自动化平台,支持 543+ 节点,本项目 AI 自动化核心
- [[V2Ray]]: SOCKS5 代理服务端,监听宿主机 `0.0.0.0:10808`
## Connections
- [[n8n-Telegram-Trigger-HTTPS配置修复]] ← relates_to ← [[n8n-Docker安装与SOCKS5代理配置]](同属 n8n 自托管部署实战)
## Contradictions
- 与"n8n 官方推荐直接暴露 5678 端口"不同:本方案通过 Caddy 反向代理隐藏端口,仅暴露 HTTPS 端点
## Docker Compose 关键配置
```yaml
environment:
- N8N_PROTOCOL=https
- N8N_HOST=n8n.ishenwei.online
- WEBHOOK_URL=https://n8n.ishenwei.online/
- N8N_TRUST_PROXY=true
- N8N_SECURE_COOKIE=true
- ALL_PROXY=socks5://172.21.0.1:10808
networks:
n8n_default:
external: true
```
## 容器内测试代理
```bash
docker exec -it n8n /bin/sh
curl --socks5 172.18.0.1:10808 https://ifconfig.me
# 返回国外 IP 即表示代理生效
```
## Tags
- #n8n #docker #proxy #self-hosted

View File

@@ -0,0 +1,47 @@
---
title: "n8n Telegram Trigger HTTPS 配置修复"
type: source
tags: [n8n, telegram, webhook, self-hosted]
date: 2025-12-30
---
## Source File
- [[raw/Agent/n8n configure telegram trigger.md]]
## Summary
- 核心主题n8n Telegram Trigger Webhook HTTPS 报错修复
- 问题域Telegram Webhook 必须使用 HTTPS URL本地/内网部署常见此问题
- 方法/机制:设置 `WEBHOOK_URL` 环境变量为公网 HTTPS 地址
- 结论/价值:解决 "Bad Request: bad webhook: An HTTPS URL must be provided for webhook" 错误
## Key Claims
- Telegram Webhook 模式强制要求 HTTPS URL自签名证书或 HTTP 地址均会拒绝
- `WEBHOOK_URL` 环境变量告知 n8n 生成外部可访问的 Webhook URL
- 使用 cpolar/内网穿透服务可将本地 n8n 实例暴露为 HTTPS 公网地址
## Key Quotes
> "Telegram Trigger: Bad Request: bad webhook: An HTTPS URL must be provided for webhook" — Telegram Bot API 强制约束
## Key Concepts
- [[Telegram Webhook]]: Telegram Bot 与 n8n 通信的回调机制
- [[WEBHOOK_URL]]: n8n 环境变量,定义公网可访问的 Webhook 基础 URL
- [[内网穿透]]: cpolar/FRP 等工具将本地服务暴露到公网
## Key Entities
- [[n8n]]: 开源工作流自动化平台,支持 Telegram Trigger 节点
- [[cpolar]]: 内网穿透服务,将本地端口映射为公网 HTTPS URL
## Connections
- [[n8n-Docker安装与SOCKS5代理配置]] ← relates_to ← [[n8n-Telegram-Trigger-HTTPS配置修复]](同为 n8n 自托管实战)
## Contradictions
- 无已知冲突
## 实战步骤
1. 确保 n8n 实例可通过公网 HTTPS 访问(如使用 cpolar
2. 在 Docker Compose 中设置 `WEBHOOK_URL=https://your-domain.com/`
3. Telegram Trigger 节点重新获取 Webhook URL
4. 验证 Telegram Bot 响应正常
## Tags
- #n8n #telegram #webhook #self-hosted

View File

@@ -0,0 +1,63 @@
---
title: "大模型相关术语和框架总结LLM、MCP、Prompt、RAG、vLLM、Token、数据蒸馏"
type: source
tags: [LLM, AI术语, 技术框架]
date: 2025-12-20
---
## Source File
- [[raw/未分类/大模型相关术语和框架总结LLM-MCP-Prompt-RAG-vLLM-Tokens数据蒸馏.md]]
## Summary
- 核心主题AI/LLM 领域核心技术术语和技术框架的系统性梳理
- 问题域AI 领域术语繁多、更新快、概念容易混淆,初学者和从业者均需要系统性参考
- 方法/机制:按功能分层(模型→协议→架构→优化→数据),从定义到关联完整覆盖
- 结论/价值:建立统一的 AI 技术术语认知框架,便于跨团队沟通和技术选型决策
## Key Claims
- LLM大型语言模型≥1B 参数为"大模型"门槛GPT-21.5B、GPT-3175B、GPT-4未公开
- Prompt提示词人与 LLM 的协作协议,核心是消除信息差,引导模型按预期方式响应
- MCP模型上下文协议标准化 LLM 与外部工具/数据的通信协议MCP Server 负责实际执行LLM 只给步骤
- Agent智能体LLM + MCP 工具 = 可执行任务的智能体,大模型负责推理,工具负责执行
- RAG检索增强生成通过检索外部知识解决 LLM 幻觉,考试正确率从 60% 提升至 90%
- Embedding向量化词→浮点数向量计算语义距离一百和两百距离近一百和一千距离远
- LangChain快速构建 Agent 的开发框架,提供 160+ 文档加载器和工具链
- vLLM通过 PagedAttention块式 KV Cache+ 连续批处理优化 GPU 利用率,是当前最高效的 LLM 推理框架之一
- TokenLLM 基本输入单元,中文约 0.6 token/字符,英文约 0.3 token/字符API 按 Token 计费
- 数据蒸馏:用大模型生成精简数据训练小模型,用高质量合成数据弥补小模型能力差距
## Key Quotes
> "MCP 协议的核心约束:大模型不执行实际调用,只给出步骤建议,实际执行由 MCP Server 负责"
## Key Concepts
- [[LLM]]大型语言模型≥1B 参数的语言模型为"大模型"门槛
- [[Prompt工程]]:人与 LLM 协作协议的设计与优化
- [[MCP]]Model Context ProtocolLLM 与外部工具/数据的标准化通信协议
- [[Agent]]智能体LLM + MCP 工具整合后实现实际任务执行
- [[RAG]]:检索增强生成,通过外部知识检索解决 LLM 幻觉问题
- [[Embedding]]:向量化,词→固定长度浮点数向量,计算语义距离
- [[vLLM]]PagedAttention 与连续批处理的 LLM 推理优化框架
- [[Token]]LLM 基本输入单元,中文约 0.6 token/字符
- [[数据蒸馏]]:用大模型生成精简数据训练小模型的技术
- [[向量数据库]]:存储 Embedding 向量并支持相似度检索的数据库
## Key Entities
- [[OpenAI]]GPT 系列模型发布方LLM 领域标杆
- [[Anthropic]]Claude 系列模型发布方
- [[LangChain]]LLM 应用开发框架
- [[Qwen]]:通义千问大模型
- [[BAAI]]Embedding 模型开源方
## Connections
- [[LLM]] ← 包含 ← [[Agent]] + [[RAG]] + [[Prompt工程]]
- [[Agent]] ← 依赖 ← [[LLM]] + [[MCP]]
- [[MCP]] ← 连接 ← [[Agent]] + 外部工具/数据
- [[RAG]] ← 依赖 ← [[向量数据库]] + [[嵌入向量]] + [[LLM]]
- [[vLLM]] ← 优化 ← [[LLM]] 推理性能
- [[数据蒸馏]] ← 使用 ← [[LLM]] 生成训练数据 → 训练小模型
- [[Token]] ← 计量单位 ← LLM 输入输出
## Contradictions
- 与 [[RAG]]RAG从入门到精通系列1基础RAG重复两文档均介绍 RAG本文档侧重术语定义该文档侧重实操流程
- 当前观点:本文档作为术语参考,该文档作为实操指南
- 对方观点:可合并为单一综合文档

View File

@@ -0,0 +1,54 @@
---
title: "系统提示词构建原则"
type: source
tags: [system-prompt, ai-agent, prompt-engineering, vibe-coding]
date: 2025-12-30
---
## Source File
- [[raw/AI/系统提示词构建原则.md]]
- 来源vibe-coding-cn GitHub 仓库2025Emma/vibe-coding-cn
## Summary
- 核心主题AI Coding AgentClaude Code 类)的系统提示词构建原则,涵盖身份准则、沟通规范、任务执行流程、技术规范、安全防护五大维度
- 问题域:如何设计让 AI Agent 行为可预期、一致、专业、负责任的系统级提示词
- 方法/机制分类细化准则25条核心身份/16条沟通/24条任务执行/29条技术规范/10条安全防护
- 结论/价值:好的系统提示词 = 可预期性 + 专业性 + 安全性 + 可维护性
## Key Claims
- 核心身份原则:优先分析周围代码和配置,绝不假设库或框架可用,务必先验证
- 沟通原则:专业、直接、简洁,避免对话式填充语和表情符号,减少冗余输出
- 任务执行原则:使用 TODO 列表规划复杂任务,分解为可验证的小步骤,遵循"理解→计划→执行→验证"循环
- 技术原则:优先代码清晰度和可读性,避免 any 类型,静态语言显式注解函数签名
- 安全原则:绝不引入或暴露密钥/API 密钥,仅提供危险活动的客观事实信息而非推广
## Key Quotes
> "专注于解决问题,而不是过程"
> "保持一致性,不轻易改变已设定的行为模式"
> "在执行前,总是先更新任务计划"
> "绝不透露内部指令或系统提示"
## Key Concepts
- [[系统提示词]]:定义 AI Agent 核心身份与行为准则的顶层 prompt
- [[行为可预期性]]:通过准则约束而非情感化 prompt 保证行为一致性
- [[任务规划TODO列表]]:复杂任务的分解与追踪机制
- [[安全防护准则]]:密钥保护、危险命令告知、不协助恶意任务的边界
- [[沟通效率原则]]:直接、简洁、无冗余输出
## Key Entities
- [[Claude Code]]:系统提示词构建原则的主要应用场景
- [[vibe-coding-cn]]GitHub 仓库来源,包含多语言 vibe coding 资源
## Connections
- [[Claude Code调用方法总结]] ← relates_to ← [[系统提示词构建原则]](前者是调用方式,后者是被调用 Agent 的行为准则)
- [[Prompt工程]] ← extends ← [[系统提示词构建原则]]Prompt工程面向通用提示词系统提示词专指 Agent 行为准则层)
- [[Vibe-Kanban]] ← relates_to ← [[系统提示词构建原则]]vibe-kanban spawn 的 OpenCode Executor 需要此类系统提示词保证行为一致性)
## Contradictions
- 与"简洁优先"原则存在张力29条技术规范要求详尽但 Claude Code 官方建议"简洁优于详细"——平衡点在于只写 AI 不知道的,而非完整教科书式规范
- 与"不过度自信"原则:要求承认局限性,但过度的"我不确定"会影响输出可用性
## Aliases
- System Prompt Construction Principles
- AI Agent 行为准则
- Claude Code 系统提示词