Auto-sync: 2026-04-16 13:01
This commit is contained in:
@@ -1,18 +0,0 @@
|
||||
Build the LLM Wiki knowledge graph.
|
||||
|
||||
Usage: /wiki-graph
|
||||
|
||||
First try running: python tools/build_graph.py --open
|
||||
|
||||
If that fails (missing dependencies), build the graph manually:
|
||||
|
||||
1. Use Grep to find all [[wikilinks]] across every file in wiki/
|
||||
2. Build a nodes list: one node per wiki page, with id=relative-path, label=title, type from frontmatter
|
||||
3. Build an edges list: one edge per [[wikilink]], tagged EXTRACTED
|
||||
4. Infer additional implicit relationships between pages not captured by wikilinks — tag these INFERRED with a confidence score (0.0–1.0); tag low-confidence ones AMBIGUOUS
|
||||
5. Write graph/graph.json with {nodes, edges, built: today}
|
||||
6. Write graph/graph.html as a self-contained vis.js page (nodes colored by type, edges colored by type, interactive, searchable)
|
||||
|
||||
After building, summarize: node count, edge count, breakdown by type, and the most connected nodes (hubs).
|
||||
|
||||
Append to wiki/log.md: ## [today's date] graph | Knowledge graph rebuilt
|
||||
@@ -1,18 +0,0 @@
|
||||
Ingest a source document into the LLM Wiki.
|
||||
|
||||
Usage: /wiki-ingest $ARGUMENTS
|
||||
|
||||
$ARGUMENTS should be the path to a file in raw/, e.g. `raw/articles/my-article.md`
|
||||
|
||||
Follow the Ingest Workflow defined in CLAUDE.md exactly:
|
||||
1. Read the source file at the given path
|
||||
2. Read wiki/index.md and wiki/overview.md for current context
|
||||
3. Write wiki/sources/<slug>.md (source page format per CLAUDE.md)
|
||||
4. Update wiki/index.md — add the new entry under Sources
|
||||
5. Update wiki/overview.md — revise synthesis if warranted
|
||||
6. Create/update entity pages (wiki/entities/) for key people, companies, projects
|
||||
7. Create/update concept pages (wiki/concepts/) for key ideas and frameworks
|
||||
8. Flag any contradictions with existing wiki content
|
||||
9. Append to wiki/log.md: ## [today's date] ingest | <Title>
|
||||
|
||||
After completing all writes, summarize: what was added, which pages were created or updated, and any contradictions found.
|
||||
@@ -1,19 +0,0 @@
|
||||
Health-check the LLM Wiki for issues.
|
||||
|
||||
Usage: /wiki-lint
|
||||
|
||||
Follow the Lint Workflow defined in CLAUDE.md:
|
||||
|
||||
Structural checks (use Grep and Glob tools):
|
||||
1. Orphan pages — wiki pages with no inbound [[wikilinks]] from other pages
|
||||
2. Broken links — [[WikiLinks]] pointing to pages that don't exist
|
||||
3. Missing entity pages — names referenced in 3+ pages but lacking their own page
|
||||
|
||||
Semantic checks (read and reason over page content):
|
||||
4. Contradictions — claims that conflict between pages
|
||||
5. Stale summaries — pages not updated after newer sources changed the picture
|
||||
6. Data gaps — important questions the wiki can't answer; suggest specific sources to find
|
||||
|
||||
Output a structured markdown lint report. At the end, ask if the user wants it saved to wiki/lint-report.md.
|
||||
|
||||
Append to wiki/log.md: ## [today's date] lint | Wiki health check
|
||||
@@ -1,14 +0,0 @@
|
||||
Query the LLM Wiki and synthesize an answer.
|
||||
|
||||
Usage: /wiki-query $ARGUMENTS
|
||||
|
||||
$ARGUMENTS is the question to answer, e.g. `What are the main themes across all sources?`
|
||||
|
||||
Follow the Query Workflow defined in CLAUDE.md:
|
||||
1. Read wiki/index.md to identify the most relevant pages
|
||||
2. Read those pages (up to ~10 most relevant)
|
||||
3. Synthesize a thorough markdown answer with [[PageName]] wikilink citations
|
||||
4. Include a ## Sources section at the end listing pages you drew from
|
||||
5. Ask the user if they want the answer saved as wiki/syntheses/<slug>.md
|
||||
|
||||
If the wiki is empty, say so and suggest running /wiki-ingest first.
|
||||
219
AGENTS.md
219
AGENTS.md
@@ -1,219 +0,0 @@
|
||||
# LLM Wiki Agent — Schema & Workflow Instructions
|
||||
|
||||
This wiki is maintained entirely by your coding agent. No API key or Python scripts needed — just open this repo in Codex, OpenCode, or any agent that reads this file, and talk to it.
|
||||
|
||||
## How to Use
|
||||
|
||||
Describe what you want in plain English:
|
||||
- *"Ingest this file: raw/papers/my-paper.md"*
|
||||
- *"What does the wiki say about transformer models?"*
|
||||
- *"Check the wiki for orphan pages and contradictions"*
|
||||
- *"Build the knowledge graph"*
|
||||
|
||||
Or use shorthand triggers:
|
||||
- `ingest <file>` → runs the Ingest Workflow
|
||||
- `query: <question>` → runs the Query Workflow
|
||||
- `lint` → runs the Lint Workflow
|
||||
- `build graph` → runs the Graph Workflow
|
||||
|
||||
---
|
||||
|
||||
## Directory Layout
|
||||
|
||||
```
|
||||
raw/ # Immutable source documents — never modify these
|
||||
wiki/ # Agent owns this layer entirely
|
||||
index.md # Catalog of all pages — update on every ingest
|
||||
log.md # Append-only chronological record
|
||||
overview.md # Living synthesis across all sources
|
||||
sources/ # One summary page per source document
|
||||
entities/ # People, companies, projects, products
|
||||
concepts/ # Ideas, frameworks, methods, theories
|
||||
syntheses/ # Saved query answers
|
||||
graph/ # Auto-generated graph data
|
||||
tools/ # Optional standalone Python scripts (require ANTHROPIC_API_KEY)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Page Format
|
||||
|
||||
Every wiki page uses this frontmatter:
|
||||
|
||||
```yaml
|
||||
---
|
||||
title: "Page Title"
|
||||
type: source | entity | concept | synthesis
|
||||
tags: []
|
||||
sources: [] # list of source slugs that inform this page
|
||||
last_updated: YYYY-MM-DD
|
||||
---
|
||||
```
|
||||
|
||||
Use `[[PageName]]` wikilinks to link to other wiki pages.
|
||||
|
||||
---
|
||||
|
||||
## Ingest Workflow
|
||||
|
||||
Triggered by: *"ingest <file>"*
|
||||
|
||||
Steps (in order):
|
||||
1. Read the source document fully
|
||||
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
|
||||
3. Write `wiki/sources/<slug>.md` — use the source page format below
|
||||
4. Update `wiki/index.md` — add entry under Sources section
|
||||
5. Update `wiki/overview.md` — revise synthesis if warranted
|
||||
6. Update/create entity pages for key people, companies, projects mentioned
|
||||
7. Update/create concept pages for key ideas and frameworks discussed
|
||||
8. Flag any contradictions with existing wiki content
|
||||
9. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`
|
||||
|
||||
### Source Page Format
|
||||
|
||||
```markdown
|
||||
---
|
||||
title: "Source Title"
|
||||
type: source
|
||||
tags: []
|
||||
date: YYYY-MM-DD
|
||||
source_file: raw/...
|
||||
---
|
||||
|
||||
## Summary
|
||||
2–4 sentence summary.
|
||||
|
||||
## Key Claims
|
||||
- Claim 1
|
||||
- Claim 2
|
||||
|
||||
## Key Quotes
|
||||
> "Quote here" — context
|
||||
|
||||
## Connections
|
||||
- [[EntityName]] — how they relate
|
||||
- [[ConceptName]] — how it connects
|
||||
|
||||
## Contradictions
|
||||
- Contradicts [[OtherPage]] on: ...
|
||||
```
|
||||
|
||||
### Domain-Specific Templates
|
||||
|
||||
If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:
|
||||
|
||||
#### Diary / Journal Template
|
||||
```markdown
|
||||
---
|
||||
title: "YYYY-MM-DD Diary"
|
||||
type: source
|
||||
tags: [diary]
|
||||
date: YYYY-MM-DD
|
||||
---
|
||||
## Event Summary
|
||||
...
|
||||
## Key Decisions
|
||||
...
|
||||
## Energy & Mood
|
||||
...
|
||||
## Connections
|
||||
...
|
||||
## Shifts & Contradictions
|
||||
...
|
||||
```
|
||||
|
||||
#### Meeting Notes Template
|
||||
```markdown
|
||||
---
|
||||
title: "Meeting Title"
|
||||
type: source
|
||||
tags: [meeting]
|
||||
date: YYYY-MM-DD
|
||||
---
|
||||
## Goal
|
||||
...
|
||||
## Key Discussions
|
||||
...
|
||||
## Decisions Made
|
||||
...
|
||||
## Action Items
|
||||
...
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Query Workflow
|
||||
|
||||
Triggered by: *"query: <question>"*
|
||||
|
||||
Steps:
|
||||
1. Read `wiki/index.md` to identify relevant pages
|
||||
2. Read those pages
|
||||
3. Synthesize an answer with inline citations as `[[PageName]]` wikilinks
|
||||
4. Ask the user if they want the answer filed as `wiki/syntheses/<slug>.md`
|
||||
|
||||
---
|
||||
|
||||
## Lint Workflow
|
||||
|
||||
Triggered by: *"lint"*
|
||||
|
||||
Check for:
|
||||
- **Orphan pages** — wiki pages with no inbound `[[links]]` from other pages
|
||||
- **Broken links** — `[[WikiLinks]]` pointing to pages that don't exist
|
||||
- **Contradictions** — claims that conflict across pages
|
||||
- **Stale summaries** — pages not updated after newer sources
|
||||
- **Missing entity pages** — entities mentioned in 3+ pages but lacking their own page
|
||||
- **Data gaps** — questions the wiki can't answer; suggest new sources
|
||||
|
||||
Output a lint report and ask if the user wants it saved to `wiki/lint-report.md`.
|
||||
|
||||
---
|
||||
|
||||
## Graph Workflow
|
||||
|
||||
Triggered by: *"build graph"*
|
||||
|
||||
First try: `python tools/build_graph.py --open`
|
||||
|
||||
If Python/deps unavailable, build manually:
|
||||
1. Search for all `[[wikilinks]]` across wiki pages
|
||||
2. Build nodes (one per page) and edges (one per link)
|
||||
3. Infer implicit relationships not captured by wikilinks — tag `INFERRED` with confidence score; low confidence → `AMBIGUOUS`
|
||||
4. Write `graph/graph.json` with `{nodes, edges, built: date}`
|
||||
5. Write `graph/graph.html` as a self-contained vis.js visualization
|
||||
|
||||
---
|
||||
|
||||
## Naming Conventions
|
||||
|
||||
- Source slugs: `kebab-case` matching source filename
|
||||
- Entity pages: `TitleCase.md` (e.g. `OpenAI.md`, `SamAltman.md`)
|
||||
- Concept pages: `TitleCase.md` (e.g. `ReinforcementLearning.md`, `RAG.md`)
|
||||
|
||||
## Index Format
|
||||
|
||||
```markdown
|
||||
# Wiki Index
|
||||
|
||||
## Overview
|
||||
- [Overview](overview.md) — living synthesis
|
||||
|
||||
## Sources
|
||||
- [Source Title](sources/slug.md) — one-line summary
|
||||
|
||||
## Entities
|
||||
- [Entity Name](entities/EntityName.md) — one-line description
|
||||
|
||||
## Concepts
|
||||
- [Concept Name](concepts/ConceptName.md) — one-line description
|
||||
|
||||
## Syntheses
|
||||
- [Analysis Title](syntheses/slug.md) — what question it answers
|
||||
```
|
||||
|
||||
## Log Format
|
||||
|
||||
`## [YYYY-MM-DD] <operation> | <title>`
|
||||
|
||||
Operations: `ingest`, `query`, `lint`, `graph`
|
||||
@@ -1,230 +0,0 @@
|
||||
# LLM Wiki Agent — Schema & Workflow Instructions
|
||||
|
||||
This wiki is maintained entirely by Claude Code. No API key or Python scripts needed — just open this repo in Claude Code and talk to it.
|
||||
|
||||
## Slash Commands (Claude Code)
|
||||
|
||||
| Command | What to say |
|
||||
|---|---|
|
||||
| `/wiki-ingest` | `ingest raw/my-article.md` |
|
||||
| `/wiki-query` | `query: what are the main themes?` |
|
||||
| `/wiki-lint` | `lint the wiki` |
|
||||
| `/wiki-graph` | `build the knowledge graph` |
|
||||
|
||||
Or just describe what you want in plain English:
|
||||
- *"Ingest this file: raw/papers/attention-is-all-you-need.md"*
|
||||
- *"What does the wiki say about transformer models?"*
|
||||
- *"Check the wiki for orphan pages and contradictions"*
|
||||
- *"Build the graph and show me what's connected to RAG"*
|
||||
|
||||
Claude Code reads this file automatically and follows the workflows below.
|
||||
|
||||
---
|
||||
|
||||
## Directory Layout
|
||||
|
||||
```
|
||||
raw/ # Immutable source documents — never modify these
|
||||
wiki/ # Claude owns this layer entirely
|
||||
index.md # Catalog of all pages — update on every ingest
|
||||
log.md # Append-only chronological record
|
||||
overview.md # Living synthesis across all sources
|
||||
sources/ # One summary page per source document
|
||||
entities/ # People, companies, projects, products
|
||||
concepts/ # Ideas, frameworks, methods, theories
|
||||
syntheses/ # Saved query answers
|
||||
graph/ # Auto-generated graph data
|
||||
tools/ # Optional standalone Python scripts (require ANTHROPIC_API_KEY)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Page Format
|
||||
|
||||
Every wiki page uses this frontmatter:
|
||||
|
||||
```yaml
|
||||
---
|
||||
title: "Page Title"
|
||||
type: source | entity | concept | synthesis
|
||||
tags: []
|
||||
sources: [] # list of source slugs that inform this page
|
||||
last_updated: YYYY-MM-DD
|
||||
---
|
||||
```
|
||||
|
||||
Use `[[PageName]]` wikilinks to link to other wiki pages.
|
||||
|
||||
---
|
||||
|
||||
## Ingest Workflow
|
||||
|
||||
Triggered by: *"ingest <file>"* or `/wiki-ingest`
|
||||
|
||||
Steps (in order):
|
||||
1. Read the source document fully using the Read tool
|
||||
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
|
||||
3. Write `wiki/sources/<slug>.md` — use the source page format below
|
||||
4. Update `wiki/index.md` — add entry under Sources section
|
||||
5. Update `wiki/overview.md` — revise synthesis if warranted
|
||||
6. Update/create entity pages for key people, companies, projects mentioned
|
||||
7. Update/create concept pages for key ideas and frameworks discussed
|
||||
8. Flag any contradictions with existing wiki content
|
||||
9. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`
|
||||
|
||||
### Source Page Format
|
||||
|
||||
```markdown
|
||||
---
|
||||
title: "Source Title"
|
||||
type: source
|
||||
tags: []
|
||||
date: YYYY-MM-DD
|
||||
source_file: raw/...
|
||||
---
|
||||
|
||||
## Summary
|
||||
2–4 sentence summary.
|
||||
|
||||
## Key Claims
|
||||
- Claim 1
|
||||
- Claim 2
|
||||
|
||||
## Key Quotes
|
||||
> "Quote here" — context
|
||||
|
||||
## Connections
|
||||
- [[EntityName]] — how they relate
|
||||
- [[ConceptName]] — how it connects
|
||||
|
||||
## Contradictions
|
||||
- Contradicts [[OtherPage]] on: ...
|
||||
```
|
||||
|
||||
### Domain-Specific Templates
|
||||
|
||||
If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:
|
||||
|
||||
#### Diary / Journal Template
|
||||
```markdown
|
||||
---
|
||||
title: "YYYY-MM-DD Diary"
|
||||
type: source
|
||||
tags: [diary]
|
||||
date: YYYY-MM-DD
|
||||
---
|
||||
## Event Summary
|
||||
...
|
||||
## Key Decisions
|
||||
...
|
||||
## Energy & Mood
|
||||
...
|
||||
## Connections
|
||||
...
|
||||
## Shifts & Contradictions
|
||||
...
|
||||
```
|
||||
|
||||
#### Meeting Notes Template
|
||||
```markdown
|
||||
---
|
||||
title: "Meeting Title"
|
||||
type: source
|
||||
tags: [meeting]
|
||||
date: YYYY-MM-DD
|
||||
---
|
||||
## Goal
|
||||
...
|
||||
## Key Discussions
|
||||
...
|
||||
## Decisions Made
|
||||
...
|
||||
## Action Items
|
||||
...
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Query Workflow
|
||||
|
||||
Triggered by: *"query: <question>"* or `/wiki-query`
|
||||
|
||||
Steps:
|
||||
1. Read `wiki/index.md` to identify relevant pages
|
||||
2. Read those pages with the Read tool
|
||||
3. Synthesize an answer with inline citations as `[[PageName]]` wikilinks
|
||||
4. Ask the user if they want the answer filed as `wiki/syntheses/<slug>.md`
|
||||
|
||||
---
|
||||
|
||||
## Lint Workflow
|
||||
|
||||
Triggered by: *"lint the wiki"* or `/wiki-lint`
|
||||
|
||||
Use Grep and Read tools to check for:
|
||||
- **Orphan pages** — wiki pages with no inbound `[[links]]` from other pages
|
||||
- **Broken links** — `[[WikiLinks]]` pointing to pages that don't exist
|
||||
- **Contradictions** — claims that conflict across pages
|
||||
- **Stale summaries** — pages not updated after newer sources
|
||||
- **Missing entity pages** — entities mentioned in 3+ pages but lacking their own page
|
||||
- **Data gaps** — questions the wiki can't answer; suggest new sources
|
||||
|
||||
Output a lint report and ask if the user wants it saved to `wiki/lint-report.md`.
|
||||
|
||||
---
|
||||
|
||||
## Graph Workflow
|
||||
|
||||
Triggered by: *"build the knowledge graph"* or `/wiki-graph`
|
||||
|
||||
When the user asks to build the graph, run `tools/build_graph.py` which:
|
||||
- Pass 1: Parses all `[[wikilinks]]` → deterministic `EXTRACTED` edges
|
||||
- Pass 2: Infers implicit relationships → `INFERRED` edges with confidence scores
|
||||
- Runs Louvain community detection
|
||||
- Outputs `graph/graph.json` + `graph/graph.html`
|
||||
|
||||
If the user doesn't have Python/dependencies set up, instead generate the graph data manually:
|
||||
1. Use Grep to find all `[[wikilinks]]` across wiki pages
|
||||
2. Build a node/edge list
|
||||
3. Write `graph/graph.json` directly
|
||||
4. Write `graph/graph.html` using the vis.js template
|
||||
|
||||
---
|
||||
|
||||
## Naming Conventions
|
||||
|
||||
- Source slugs: `kebab-case` matching source filename
|
||||
- Entity pages: `TitleCase.md` (e.g. `OpenAI.md`, `SamAltman.md`)
|
||||
- Concept pages: `TitleCase.md` (e.g. `ReinforcementLearning.md`, `RAG.md`)
|
||||
- Source pages: `kebab-case.md`
|
||||
|
||||
## Index Format
|
||||
|
||||
```markdown
|
||||
# Wiki Index
|
||||
|
||||
## Overview
|
||||
- [Overview](overview.md) — living synthesis
|
||||
|
||||
## Sources
|
||||
- [Source Title](sources/slug.md) — one-line summary
|
||||
|
||||
## Entities
|
||||
- [Entity Name](entities/EntityName.md) — one-line description
|
||||
|
||||
## Concepts
|
||||
- [Concept Name](concepts/ConceptName.md) — one-line description
|
||||
|
||||
## Syntheses
|
||||
- [Analysis Title](syntheses/slug.md) — what question it answers
|
||||
```
|
||||
|
||||
## Log Format
|
||||
|
||||
Each entry starts with `## [YYYY-MM-DD] <operation> | <title>` so it's grep-parseable:
|
||||
|
||||
```
|
||||
grep "^## \[" wiki/log.md | tail -10
|
||||
```
|
||||
|
||||
Operations: `ingest`, `query`, `lint`, `graph`
|
||||
352
CLAUDE.md
352
CLAUDE.md
@@ -1,352 +0,0 @@
|
||||
# LLM Wiki Agent — Schema & Workflow Instructions(中文版增强规范)
|
||||
|
||||
本 Wiki 完全由 Claude Code 自动维护。无需 API Key 或 Python 脚本 —— 只需在 Claude Code 中打开本仓库并与其对话。
|
||||
|
||||
---
|
||||
# 🔴 全局强制规则(CRITICAL)
|
||||
|
||||
## 1. 输出语言(必须遵守)
|
||||
|
||||
- 所有输出必须使用**简体中文**
|
||||
- 专有名词允许保留英文,但首次出现必须附带中文解释
|
||||
- 如果原始文件名是中文,则source页面的名称尽量用中文,不要用拼音表示, 如果有特殊字符可以忽略
|
||||
- 禁止中英混合句(术语除外)
|
||||
- 不允许输出纯英文总结或分析
|
||||
|
||||
示例:
|
||||
|
||||
Transformer(变压器模型,一种基于注意力机制的神经网络架构)
|
||||
|
||||
---
|
||||
|
||||
## 2. 输出风格(严格限制)
|
||||
|
||||
所有输出必须:
|
||||
|
||||
- 去修辞(禁止 narrative 风格)
|
||||
- 去模糊(禁止“可能”“大概”等词)
|
||||
- 信息密度最大化
|
||||
- 面向“知识结构化”,而非阅读体验
|
||||
|
||||
优先级:
|
||||
|
||||
结构 > 关系 > 结论 > 描述
|
||||
|
||||
---
|
||||
|
||||
## 3. 结构化语义(必须)
|
||||
|
||||
所有页面必须遵循结构化语义规则:
|
||||
|
||||
- Summary 必须使用固定字段
|
||||
- Claim 必须符合标准语法
|
||||
- Connections 必须使用关系类型
|
||||
- 禁止自由发挥
|
||||
|
||||
---
|
||||
|
||||
# Slash Commands(Claude Code)
|
||||
|
||||
| Command | 使用方式 |
|
||||
| -------------- | --------------------------- |
|
||||
| `/wiki-ingest` | `ingest raw/your-file.md` |
|
||||
| `/wiki-query` | `query: 你的问题` |
|
||||
| `/wiki-lint` | `lint the wiki` |
|
||||
| `/wiki-graph` | `build the knowledge graph` |
|
||||
|
||||
---
|
||||
|
||||
## 自然语言示例
|
||||
|
||||
- ingest raw/papers/attention-is-all-you-need.md
|
||||
- query: Transformer 的核心机制是什么?
|
||||
- lint the wiki
|
||||
- build the graph and analyze RAG
|
||||
|
||||
Claude Code 会自动读取本文件并执行以下工作流。
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
# Directory Layout(目录结构)
|
||||
|
||||
```
|
||||
raw/ # 原始文档(不可修改)
|
||||
wiki/ # 知识层(由 Claude 完全维护)
|
||||
index.md # 页面索引(每次 ingest 必须更新)
|
||||
log.md # 追加式日志
|
||||
overview.md # 全局知识总结
|
||||
sources/ # 每个原始文档对应一个页面
|
||||
entities/ # 实体(人/公司/产品/项目)
|
||||
concepts/ # 概念(方法/理论/框架)
|
||||
syntheses/ # 查询结果沉淀
|
||||
graph/ # 自动生成的图数据
|
||||
tools/ # 可选 Python 工具 (require ANTHROPIC_API_KEY)
|
||||
````
|
||||
|
||||
|
||||
---
|
||||
|
||||
# Page Format(页面格式)
|
||||
|
||||
每个页面必须包含:
|
||||
|
||||
```yaml
|
||||
---
|
||||
id: unique_id
|
||||
title: "Page Title"
|
||||
type: source | entity | concept | synthesis
|
||||
tags: []
|
||||
sources: [] # 来源
|
||||
last_updated: YYYY-MM-DD
|
||||
---
|
||||
````
|
||||
|
||||
必须使用 `[[PageName]]` 进行链接。
|
||||
|
||||
---
|
||||
|
||||
# Ingest Workflow(摄取流程)
|
||||
**重要** 请严格按照摄取流程进行操作,每分析一个页面必须要创建/更新source page,entity, concept等。不可遗漏!
|
||||
|
||||
触发方式:
|
||||
- `/wiki-ingest`
|
||||
- 或:`ingest <file>`
|
||||
## 执行步骤(严格顺序)
|
||||
1. 使用 Read 工具完整读取 source 文档
|
||||
2. 读取 `wiki/index.md` 和 `wiki/overview.md`
|
||||
3. 生成 `wiki/sources/原始中文名.md` (非中文使用 slug.md)
|
||||
4. 更新 `wiki/index.md`
|
||||
5. 更新 `wiki/overview.md`(如有必要)
|
||||
6. 创建或更新 Entity 页面
|
||||
7. 创建或更新 Concept 页面
|
||||
8. 检测并记录冲突
|
||||
9. 追加 `wiki/log.md`
|
||||
|
||||
---
|
||||
|
||||
# Source Page Format(增强结构)
|
||||
|
||||
```markdown
|
||||
---
|
||||
title: "Source Title"
|
||||
type: source
|
||||
tags: []
|
||||
date: YYYY-MM-DD
|
||||
---
|
||||
|
||||
## Source File
|
||||
- [[raw/...]]
|
||||
|
||||
## Summary
|
||||
- 核心主题:
|
||||
- 问题域:
|
||||
- 方法/机制:
|
||||
- 结论/价值:
|
||||
|
||||
## Key Claims
|
||||
- (必须符合:主体 + 机制 + 结果)
|
||||
|
||||
## Key Quotes
|
||||
> "引用内容" — 上下文说明
|
||||
|
||||
## Key Concepts
|
||||
- [[ConceptName]]:定义
|
||||
|
||||
## Key Entities
|
||||
- [[EntityName]]:角色说明
|
||||
|
||||
## Connections
|
||||
- [[A]] ← depends_on ← [[B]]
|
||||
- [[C]] ← extends ← [[D]]
|
||||
|
||||
## Contradictions
|
||||
- 与 [[OtherPage]] 冲突:
|
||||
- 冲突点:
|
||||
- 当前观点:
|
||||
- 对方观点:
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
# Domain-Specific Templates(领域模板)
|
||||
|
||||
## Diary / Journal
|
||||
|
||||
```markdown
|
||||
---
|
||||
title: "YYYY-MM-DD Diary"
|
||||
type: source
|
||||
tags: [diary]
|
||||
date: YYYY-MM-DD
|
||||
---
|
||||
## Event Summary
|
||||
## Key Decisions
|
||||
## Energy & Mood
|
||||
## Connections
|
||||
## Shifts & Contradictions
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Meeting Notes
|
||||
|
||||
```markdown
|
||||
---
|
||||
title: "Meeting Title"
|
||||
type: source
|
||||
tags: [meeting]
|
||||
date: YYYY-MM-DD
|
||||
---
|
||||
## Goal
|
||||
## Key Discussions
|
||||
## Decisions Made
|
||||
## Action Items
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
# Entity & Concept Rules(关键增强)
|
||||
|
||||
## Entity(实体)
|
||||
|
||||
创建条件:
|
||||
- 出现 ≥ 2 次
|
||||
或
|
||||
- 对主题有关键影响
|
||||
|
||||
类型:
|
||||
- 人 / 公司 / 产品 / 项目
|
||||
|
||||
---
|
||||
|
||||
## Concept(概念)
|
||||
创建条件:
|
||||
- 可抽象
|
||||
- 可复用
|
||||
- 非具体实例
|
||||
---
|
||||
|
||||
## 命名规范(强制)
|
||||
- 使用唯一标准名称
|
||||
- 所有别名写入页面:
|
||||
|
||||
```markdown
|
||||
## Aliases
|
||||
- GPT4
|
||||
- GPT-4
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 去重机制(必须)
|
||||
|
||||
创建前必须:
|
||||
1. 搜索 index
|
||||
2. 判断是否存在
|
||||
3. 存在则更新
|
||||
|
||||
---
|
||||
|
||||
# Query Workflow(查询流程)
|
||||
|
||||
触发:
|
||||
- `/wiki-query`
|
||||
- 或:`query: 问题`
|
||||
|
||||
---
|
||||
|
||||
## 步骤
|
||||
|
||||
1. 读取 index
|
||||
2. 找到相关页面
|
||||
3. 使用 Read 工具加载
|
||||
4. 输出结构化答案
|
||||
5. 使用 `[[Page]]` 引用
|
||||
6. 询问是否保存为 synthesis
|
||||
|
||||
---
|
||||
|
||||
# Lint Workflow(校验)
|
||||
|
||||
检查内容:
|
||||
|
||||
- 孤立页面
|
||||
- 断链
|
||||
- 冲突
|
||||
- 过期内容
|
||||
- 缺失Entity
|
||||
- 缺失Concept
|
||||
- 知识空白
|
||||
|
||||
---
|
||||
|
||||
# Graph Workflow(知识图谱)
|
||||
|
||||
触发:
|
||||
- `/wiki-graph`
|
||||
|
||||
---
|
||||
|
||||
执行:
|
||||
- 优先运行 `tools/build_graph.py`
|
||||
- 否则手动构建:
|
||||
|
||||
步骤:
|
||||
1. 提取所有 `[[links]]`
|
||||
2. 构建节点与边
|
||||
3. 输出 `graph.json`
|
||||
|
||||
---
|
||||
|
||||
# Naming Conventions(命名规范)
|
||||
- Source:保留原始中文名称(去除特殊符号),非中文使用 kebab-case
|
||||
- Entity:TitleCase
|
||||
- Concept:TitleCase
|
||||
|
||||
---
|
||||
|
||||
# Index Format(索引结构)
|
||||
|
||||
```markdown
|
||||
# Wiki Index
|
||||
|
||||
## Overview
|
||||
- [Overview](overview.md)
|
||||
|
||||
## Sources
|
||||
- [Title](sources/原始中文名.md)
|
||||
|
||||
## Entities
|
||||
- [Entity](entities/Entity.md)
|
||||
|
||||
## Concepts
|
||||
- [Concept](concepts/Concept.md)
|
||||
|
||||
## Syntheses
|
||||
- [Title](syntheses/slug.md)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
# Log Format(日志)
|
||||
|
||||
```
|
||||
## [YYYY-MM-DD] ingest | 标题
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
# ✅ 最终目标
|
||||
|
||||
该系统用于:
|
||||
|
||||
- 知识沉淀
|
||||
- 结构化理解
|
||||
- 自动图谱构建
|
||||
- Agent 推理支持
|
||||
|
||||
---
|
||||
|
||||
# END
|
||||
175
GEMINI.md
175
GEMINI.md
@@ -1,175 +0,0 @@
|
||||
# LLM Wiki Agent — Schema & Workflow Instructions
|
||||
|
||||
This wiki is maintained entirely by Gemini CLI. No API key or Python scripts needed — just open this repo with `gemini` and talk to it.
|
||||
|
||||
## How to Use
|
||||
|
||||
Describe what you want in plain English:
|
||||
- *"Ingest this file: raw/papers/my-paper.md"*
|
||||
- *"What does the wiki say about transformer models?"*
|
||||
- *"Check the wiki for orphan pages and contradictions"*
|
||||
- *"Build the knowledge graph"*
|
||||
|
||||
Or use shorthand triggers:
|
||||
- `ingest <file>` → runs the Ingest Workflow
|
||||
- `query: <question>` → runs the Query Workflow
|
||||
- `lint` → runs the Lint Workflow
|
||||
- `build graph` → runs the Graph Workflow
|
||||
|
||||
---
|
||||
|
||||
## Directory Layout
|
||||
|
||||
```
|
||||
raw/ # Immutable source documents — never modify these
|
||||
wiki/ # Agent owns this layer entirely
|
||||
index.md # Catalog of all pages — update on every ingest
|
||||
log.md # Append-only chronological record
|
||||
overview.md # Living synthesis across all sources
|
||||
sources/ # One summary page per source document
|
||||
entities/ # People, companies, projects, products
|
||||
concepts/ # Ideas, frameworks, methods, theories
|
||||
syntheses/ # Saved query answers
|
||||
graph/ # Auto-generated graph data
|
||||
tools/ # Optional standalone Python scripts
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Page Format
|
||||
|
||||
Every wiki page uses this frontmatter:
|
||||
|
||||
```yaml
|
||||
---
|
||||
title: "Page Title"
|
||||
type: source | entity | concept | synthesis
|
||||
tags: []
|
||||
sources: []
|
||||
last_updated: YYYY-MM-DD
|
||||
---
|
||||
```
|
||||
|
||||
Use `[[PageName]]` wikilinks to link to other wiki pages.
|
||||
|
||||
---
|
||||
|
||||
## Ingest Workflow
|
||||
|
||||
Triggered by: *"ingest <file>"*
|
||||
|
||||
1. Read the source document fully
|
||||
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
|
||||
3. Write `wiki/sources/<slug>.md` (source page format below)
|
||||
4. Update `wiki/index.md` — add entry under Sources
|
||||
5. Update `wiki/overview.md` — revise synthesis if warranted
|
||||
6. Update/create entity and concept pages
|
||||
7. Flag contradictions with existing wiki content
|
||||
8. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`
|
||||
|
||||
### Source Page Format
|
||||
|
||||
```markdown
|
||||
---
|
||||
title: "Source Title"
|
||||
type: source
|
||||
tags: []
|
||||
date: YYYY-MM-DD
|
||||
source_file: raw/...
|
||||
---
|
||||
|
||||
## Summary
|
||||
2–4 sentence summary.
|
||||
|
||||
## Key Claims
|
||||
- Claim 1
|
||||
|
||||
## Key Quotes
|
||||
> "Quote here"
|
||||
|
||||
## Connections
|
||||
- [[EntityName]] — how they relate
|
||||
|
||||
## Contradictions
|
||||
- Contradicts [[OtherPage]] on: ...
|
||||
```
|
||||
|
||||
### Domain-Specific Templates
|
||||
|
||||
If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:
|
||||
|
||||
#### Diary / Journal Template
|
||||
```markdown
|
||||
---
|
||||
title: "YYYY-MM-DD Diary"
|
||||
type: source
|
||||
tags: [diary]
|
||||
date: YYYY-MM-DD
|
||||
---
|
||||
## Event Summary
|
||||
...
|
||||
## Key Decisions
|
||||
...
|
||||
## Energy & Mood
|
||||
...
|
||||
## Connections
|
||||
...
|
||||
## Shifts & Contradictions
|
||||
...
|
||||
```
|
||||
|
||||
#### Meeting Notes Template
|
||||
```markdown
|
||||
---
|
||||
title: "Meeting Title"
|
||||
type: source
|
||||
tags: [meeting]
|
||||
date: YYYY-MM-DD
|
||||
---
|
||||
## Goal
|
||||
...
|
||||
## Key Discussions
|
||||
...
|
||||
## Decisions Made
|
||||
...
|
||||
## Action Items
|
||||
...
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Query Workflow
|
||||
|
||||
Triggered by: *"query: <question>"*
|
||||
|
||||
1. Read `wiki/index.md` — identify relevant pages
|
||||
2. Read those pages
|
||||
3. Synthesize answer with `[[PageName]]` citations
|
||||
4. Offer to save as `wiki/syntheses/<slug>.md`
|
||||
|
||||
---
|
||||
|
||||
## Lint Workflow
|
||||
|
||||
Triggered by: *"lint"*
|
||||
|
||||
Check for: orphan pages, broken links, contradictions, stale content, missing entity pages, data gaps.
|
||||
|
||||
---
|
||||
|
||||
## Graph Workflow
|
||||
|
||||
Triggered by: *"build graph"*
|
||||
|
||||
Try `python tools/build_graph.py --open` first. If unavailable, build graph.json and graph.html manually from wikilinks.
|
||||
|
||||
---
|
||||
|
||||
## Naming Conventions
|
||||
|
||||
- Source slugs: `kebab-case`
|
||||
- Entity/Concept pages: `TitleCase.md`
|
||||
|
||||
## Log Format
|
||||
|
||||
`## [YYYY-MM-DD] <operation> | <title>`
|
||||
21
LICENSE
21
LICENSE
@@ -1,21 +0,0 @@
|
||||
MIT License
|
||||
|
||||
Copyright (c) 2023 SamurAIGPT
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
||||
245
README.md
245
README.md
@@ -1,245 +0,0 @@
|
||||
# LLM Wiki Agent
|
||||
|
||||
[](LICENSE)
|
||||
|
||||
**A coding agent skill.** Drop source documents into `raw/` and type `/wiki-ingest` — the agent reads them, extracts knowledge, and builds a persistent interlinked wiki. Every new source makes the wiki richer. You never write it.
|
||||
|
||||
> Most knowledge tools make you search your own notes. This one reads everything you've collected and writes a structured wiki that compounds over time — cross-references already built, contradictions already flagged, synthesis already done.
|
||||
|
||||
```
|
||||
/wiki-ingest raw/papers/attention-is-all-you-need.md
|
||||
```
|
||||
|
||||
```
|
||||
wiki/
|
||||
├── index.md catalog of all pages — updated on every ingest
|
||||
├── log.md append-only record of every operation
|
||||
├── overview.md living synthesis across all sources
|
||||
├── sources/ one summary page per source document
|
||||
├── entities/ people, companies, projects — auto-created
|
||||
├── concepts/ ideas, frameworks, methods — auto-created
|
||||
└── syntheses/ query answers filed back as wiki pages
|
||||
graph/
|
||||
├── graph.json persistent node/edge data (SHA256-cached)
|
||||
└── graph.html interactive vis.js visualization — open in any browser
|
||||
```
|
||||
|
||||
## Install
|
||||
|
||||
**Requires:** [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), or any agent that reads a config file.
|
||||
|
||||
```bash
|
||||
git clone https://github.com/SamurAIGPT/llm-wiki-agent.git
|
||||
cd llm-wiki-agent
|
||||
```
|
||||
|
||||
Open in your agent — no API key or Python setup needed:
|
||||
|
||||
```bash
|
||||
claude # reads CLAUDE.md + .claude/commands/
|
||||
codex # reads AGENTS.md
|
||||
opencode # reads AGENTS.md
|
||||
gemini # reads GEMINI.md
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
/wiki-ingest raw/papers/my-paper.md # ingest a source into the wiki
|
||||
/wiki-ingest raw/articles/my-article.md # works on any markdown file
|
||||
|
||||
/wiki-query "what are the main themes?" # synthesize answer from wiki pages
|
||||
/wiki-query "how does X relate to Y?" # with [[wikilink]] citations
|
||||
|
||||
/wiki-lint # find orphans, contradictions, gaps
|
||||
/wiki-graph # build graph.html from all wikilinks
|
||||
```
|
||||
|
||||
Plain English also works with any agent:
|
||||
```
|
||||
"Ingest this paper: raw/papers/llama2.md"
|
||||
"What does the wiki say about attention mechanisms?"
|
||||
"Check for contradictions across sources"
|
||||
"Build the knowledge graph and tell me the most connected nodes"
|
||||
```
|
||||
|
||||
Works with any markdown source — articles, papers, book chapters, meeting notes, journal entries, research summaries.
|
||||
|
||||
## What You Get
|
||||
|
||||
**Persistent wiki** — structured markdown pages that accumulate across sessions. Unlike chat, nothing is lost.
|
||||
|
||||
**Entity pages** — auto-created for every person, company, or project mentioned across sources. Updated each time a new source references them.
|
||||
|
||||
**Concept pages** — auto-created for every key idea or framework. Cross-referenced to every source that discusses them.
|
||||
|
||||
**Living overview** — `wiki/overview.md` is revised on every ingest to reflect the current synthesis across everything you've read.
|
||||
|
||||
**Contradiction flags** — when a new source contradicts an existing claim, it's flagged at ingest time, not buried until query time.
|
||||
|
||||
**Knowledge graph** — `graph.html` shows every wiki page as a node, every `[[wikilink]]` as an edge, and Claude-inferred implicit relationships as dotted edges. Community detection clusters related topics.
|
||||
|
||||
**Lint reports** — orphan pages, broken links, missing entity pages, data gaps with suggested sources to fill them.
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Research
|
||||
|
||||
Going deep on a topic over weeks — reading papers, articles, reports.
|
||||
|
||||
```
|
||||
/wiki-ingest raw/papers/attention-is-all-you-need.md
|
||||
/wiki-ingest raw/papers/llama2.md
|
||||
/wiki-ingest raw/papers/rag-survey.md
|
||||
|
||||
# Wiki builds entity pages (Meta AI, Google Brain) and
|
||||
# concept pages (Attention, RLHF, Context Window) automatically.
|
||||
|
||||
/wiki-query "What are the main approaches to reducing hallucination?"
|
||||
/wiki-query "How has context window size evolved across models?"
|
||||
|
||||
/wiki-lint
|
||||
# → "No sources on mixture-of-experts — consider the Mixtral paper"
|
||||
```
|
||||
|
||||
By the end you have a structured, interlinked reference — not a folder of PDFs you'll never reopen.
|
||||
|
||||
---
|
||||
|
||||
### Reading a Book
|
||||
|
||||
File each chapter as you go. Build out pages for characters, themes, arguments.
|
||||
|
||||
```
|
||||
/wiki-ingest raw/book/chapter-01.md
|
||||
/wiki-ingest raw/book/chapter-02.md
|
||||
|
||||
# Wiki creates entity and theme pages automatically.
|
||||
|
||||
/wiki-query "How has the protagonist's motivation evolved?"
|
||||
/wiki-query "What contradictions exist in the author's argument so far?"
|
||||
|
||||
/wiki-graph # → graph.html shows every character/theme and how they connect
|
||||
```
|
||||
|
||||
Think fan wikis like Tolkien Gateway — built as you read, with the agent doing all the cross-referencing.
|
||||
|
||||
---
|
||||
|
||||
### Personal Knowledge Base
|
||||
|
||||
Track goals, health, habits, self-improvement — file journal entries, articles, podcast notes.
|
||||
|
||||
```
|
||||
/wiki-ingest raw/journal/2026-01-week1.md
|
||||
/wiki-ingest raw/articles/huberman-sleep-protocol.md
|
||||
/wiki-ingest raw/articles/atomic-habits-summary.md
|
||||
|
||||
/wiki-query "What patterns show up in my journal entries about energy?"
|
||||
/wiki-query "What habits have I tried and what was the outcome?"
|
||||
```
|
||||
|
||||
The wiki builds a structured picture over time. Concepts like "Sleep", "Exercise", "Deep Work" accumulate evidence from every source filed.
|
||||
|
||||
---
|
||||
|
||||
### Business / Team Intelligence
|
||||
|
||||
Feed in meeting transcripts, project docs, customer calls.
|
||||
|
||||
```
|
||||
/wiki-ingest raw/meetings/q1-planning-transcript.md
|
||||
/wiki-ingest raw/docs/product-roadmap-2026.md
|
||||
/wiki-ingest raw/calls/customer-interview-acme.md
|
||||
|
||||
/wiki-query "What feature requests have come up most across customer calls?"
|
||||
/wiki-query "What decisions were made in Q1 and what was the rationale?"
|
||||
|
||||
/wiki-lint
|
||||
# → "Project X mentioned in 5 pages but no dedicated page"
|
||||
# → "Roadmap contradicts customer interview on priority of feature Y"
|
||||
```
|
||||
|
||||
The wiki stays current because the agent does the maintenance no one wants to do.
|
||||
|
||||
---
|
||||
|
||||
### Competitive Analysis
|
||||
|
||||
Track a company, market, or technology over time.
|
||||
|
||||
```
|
||||
/wiki-ingest raw/competitors/openai-announcements.md
|
||||
/wiki-ingest raw/market/ai-funding-report-q1.md
|
||||
|
||||
/wiki-query "How do OpenAI and Anthropic differ on safety approach?"
|
||||
/wiki-query "Which companies announced multimodal models in the last 6 months?"
|
||||
/wiki-query "Competitive landscape summary as of today" --save
|
||||
```
|
||||
|
||||
## The Graph
|
||||
|
||||
Two-pass build:
|
||||
|
||||
1. **Deterministic** — parses all `[[wikilinks]]` across wiki pages → edges tagged `EXTRACTED`
|
||||
2. **Semantic** — agent infers implicit relationships not captured by wikilinks → edges tagged `INFERRED` (with confidence score) or `AMBIGUOUS`
|
||||
|
||||
Louvain community detection clusters nodes by topic. SHA256 cache means only changed pages are reprocessed. Output is a self-contained `graph.html` — no server, opens in any browser.
|
||||
|
||||
## CLAUDE.md / AGENTS.md
|
||||
|
||||
The schema file tells the agent how to maintain the wiki — page formats, ingest/query/lint/graph workflows, naming conventions. This is the key config file. Edit it to customize behavior for your domain.
|
||||
|
||||
| Agent | Schema file |
|
||||
|---|---|
|
||||
| Claude Code | `CLAUDE.md` |
|
||||
| Codex / OpenCode | `AGENTS.md` |
|
||||
| Gemini CLI | `GEMINI.md` |
|
||||
|
||||
## What Makes This Different from RAG
|
||||
|
||||
| RAG | LLM Wiki Agent |
|
||||
|---|---|
|
||||
| Re-derives knowledge every query | Compiles once, keeps current |
|
||||
| Raw chunks as retrieval unit | Structured wiki pages |
|
||||
| No cross-references | Cross-references pre-built |
|
||||
| Contradictions surface at query time (maybe) | Flagged at ingest time |
|
||||
| No accumulation | Every source makes the wiki richer |
|
||||
|
||||
## Obsidian Integration
|
||||
|
||||
The wiki is designed to be browsed seamlessly in [Obsidian](https://obsidian.md). Since the agent maintains consistent `[[wikilinks]]`, you get a naturally growing knowledge graph in your vault.
|
||||
|
||||
### Vault Symlink Pattern
|
||||
If you want to keep the LLM Wiki Agent repository separate from your main personal vault, use symlinks:
|
||||
1. Keep your working agent repository at e.g., `~/llm-wiki-agent`
|
||||
2. Create a symlink from your main Obsidian vault:
|
||||
```bash
|
||||
ln -sfn ~/llm-wiki-agent/wiki ~/your-obsidian-vault/wiki
|
||||
```
|
||||
3. Use the [Obsidian Web Clipper](https://obsidian.md/clipper) or write directly to `raw/` in the agent repo to queue items for ingestion.
|
||||
|
||||
> **Note:** If you ever move your local repo directory, remember to update the symlink, otherwise the `wiki/` directory will appear missing in Obsidian.
|
||||
|
||||
### Recommended .obsidian Config
|
||||
- **Graph View:** Filter out `index.md` and `log.md` (e.g. `-file:index.md -file:log.md`) to avoid them becoming gravity wells in your Obsidian graph.
|
||||
- **Dataview:** Use the community plugin [Dataview](https://blacksmithgu.github.io/obsidian-dataview/) to query the YAML frontmatter the agent automatically injects (e.g., `type: source`, `tags: [diary]`).
|
||||
|
||||
## Tips
|
||||
|
||||
- File good query answers back with `--save` — your explorations compound just like ingested sources
|
||||
- The wiki is a git repo — version history for free
|
||||
- Standalone Python scripts in `tools/` work without a coding agent (require `ANTHROPIC_API_KEY`)
|
||||
|
||||
## Tech Stack
|
||||
|
||||
NetworkX + Louvain + Claude + vis.js. No server, no database, runs entirely locally. Everything is plain markdown files.
|
||||
|
||||
## Related
|
||||
|
||||
- [graphify](https://github.com/safishamsi/graphify) — graph-based knowledge extraction skill (inspiration for the graph layer)
|
||||
- [Vannevar Bush's Memex (1945)](https://en.wikipedia.org/wiki/Memex) — the original vision this resembles
|
||||
|
||||
## License
|
||||
|
||||
MIT License — see [LICENSE](LICENSE) for details.
|
||||
@@ -1,101 +0,0 @@
|
||||
# Automated Wiki Synchronization Guide
|
||||
|
||||
Managing an LLM Wiki works best when it constantly reflects your background note-taking system. Instead of manually ingesting files every time you write something new, you can orchestrate an end-to-end automation pipeline.
|
||||
|
||||
This guide outlines a production-grade cron/launchd strategy for local Mac/Linux environments.
|
||||
|
||||
## The Two-Step Architecture
|
||||
|
||||
LLM Wiki Agent ingestion is a two-step process:
|
||||
1. **Syncing to `raw/`**: Getting files from your personal vault/tools into the agent's staging area.
|
||||
2. **Batch Ingestion**: Triggering `tools/ingest.py` on the synchronized directories to synthesize and weave them into the graph.
|
||||
|
||||
### Step 1: The Master Orchestrator Script
|
||||
|
||||
Create a comprehensive shell script in your wiki root (`daily-automated-sync.sh`):
|
||||
|
||||
```bash
|
||||
#!/usr/bin/env bash
|
||||
set -uo pipefail
|
||||
|
||||
# Define variables
|
||||
LAB_DIR="$HOME/projects/active/personal-wiki-lab"
|
||||
LOG_FILE="$LAB_DIR/automation-cron.log"
|
||||
DATE=$(date "+%Y-%m-%d %H:%M:%S")
|
||||
|
||||
echo "=====================================================" >> "$LOG_FILE"
|
||||
echo "[$DATE] Starting automated wiki synchronization..." >> "$LOG_FILE"
|
||||
|
||||
cd "$LAB_DIR" || exit 1
|
||||
|
||||
# 1. Run your personal Vault-to-Raw symlink script here
|
||||
# Example: ./sync-raw.sh >> "$LOG_FILE" 2>&1
|
||||
|
||||
# 2. Trigger Litellm Batch Ingestion using LLM of your choice
|
||||
export LLM_MODEL="gemini/gemini-3-flash-preview"
|
||||
export GEMINI_API_KEY="AIzaSy..." # or export OPENAI_API_KEY
|
||||
|
||||
echo "[$DATE] Batch ingesting markdown files..." >> "$LOG_FILE"
|
||||
find raw/ -type l -name "*.md" -o -type f -name "*.md" | \
|
||||
while read file; do
|
||||
python3 tools/ingest.py "$file" >> "$LOG_FILE" 2>&1
|
||||
done
|
||||
|
||||
# 3. Heal Graph Context (Auto-resolves broken semantic links)
|
||||
echo "[$DATE] Healing broken nodes..." >> "$LOG_FILE"
|
||||
python3 tools/heal.py >> "$LOG_FILE" 2>&1
|
||||
|
||||
echo "[$(date "+%Y-%m-%d %H:%M:%S")] Automated sync completed." >> "$LOG_FILE"
|
||||
echo "=====================================================" >> "$LOG_FILE"
|
||||
```
|
||||
|
||||
Don't forget to make it executable: `chmod +x daily-automated-sync.sh`.
|
||||
|
||||
### Step 2: System Scheduler (macOS launchd)
|
||||
|
||||
For macOS, `launchd` is significantly more robust than `cron`.
|
||||
|
||||
Create a `.plist` file at `~/Library/LaunchAgents/com.personal-wiki-sync.plist`:
|
||||
|
||||
```xml
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
||||
<plist version="1.0">
|
||||
<dict>
|
||||
<key>Label</key>
|
||||
<string>com.personal-wiki-sync</string>
|
||||
<key>ProgramArguments</key>
|
||||
<array>
|
||||
<string>/bin/bash</string>
|
||||
<string>/Users/your-username/projects/active/personal-wiki-lab/daily-automated-sync.sh</string>
|
||||
</array>
|
||||
|
||||
<!-- Execute automatically at 2:00 AM daily -->
|
||||
<key>StartCalendarInterval</key>
|
||||
<dict>
|
||||
<key>Hour</key>
|
||||
<integer>2</integer>
|
||||
<key>Minute</key>
|
||||
<integer>0</integer>
|
||||
</dict>
|
||||
|
||||
<!-- Run upon system boot if the interval was missed -->
|
||||
<key>RunAtLoad</key>
|
||||
<true/>
|
||||
|
||||
<!-- Diagnostic Logs -->
|
||||
<key>StandardOutPath</key>
|
||||
<string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stdout.log</string>
|
||||
<key>StandardErrorPath</key>
|
||||
<string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stderr.log</string>
|
||||
</dict>
|
||||
</plist>
|
||||
```
|
||||
|
||||
Load the daemon:
|
||||
```bash
|
||||
launchctl load ~/Library/LaunchAgents/com.personal-wiki-sync.plist
|
||||
```
|
||||
|
||||
### Self-Healing & Health Monitoring
|
||||
Since the automation runs silently at night, your `daemon.stderr.log` guarantees you will spot any API failures. The orchestrated script includes `tools/heal.py`, which is strongly recommended: it will seamlessly intercept and build concepts that accumulated throughout your day but were never individually formalized.
|
||||
@@ -1,36 +1,37 @@
|
||||
# Wiki Ingest Status
|
||||
|
||||
## Last Updated
|
||||
2026-04-16 03:45 CST
|
||||
2026-04-16 08:05 CST
|
||||
|
||||
## Batch Progress
|
||||
- Total batches completed: 5
|
||||
- This batch: 4 docs ingested
|
||||
- Total batches completed: 6
|
||||
- This batch (Batch 12): 3 docs ingested
|
||||
|
||||
## Docs Ingested This Session (Batch 5)
|
||||
1. AI/Multi-Agent System Reliability.md ✅
|
||||
2. AI/Never write another prompt.md ✅
|
||||
3. AI/RAG从入门到精通系列1:基础RAG.md ✅
|
||||
4. AI/大模型相关术语和框架总结|LLM、MCP、Prompt、RAG、vLLM、Token、数据蒸馏.md ✅
|
||||
## Docs Ingested This Session (Batch 12)
|
||||
1. n8n Telegram Trigger HTTPS 配置修复 ✅
|
||||
2. n8n Docker 安装与 SOCKS5 代理配置 ✅
|
||||
3. N8N AI Agent 2025 入门教程 ✅
|
||||
|
||||
## Overall Progress
|
||||
- Total raw files: 182
|
||||
- Done: 19 (10.4%)
|
||||
- Remaining: 163
|
||||
- Done: 22 (12.1%)
|
||||
- Remaining: 160
|
||||
|
||||
## Wiki Stats
|
||||
- Sources: 95
|
||||
- Entities: 158
|
||||
- Concepts: 203
|
||||
- Sources: 98 (+3)
|
||||
- Entities: 159 (+1: Telegram)
|
||||
- Concepts: 205 (+2: Telegram Webhook, WEBHOOK_URL)
|
||||
|
||||
## Git
|
||||
- Last commit: 04b7e99 (wiki-ingest batch Apr 16)
|
||||
- Last commit: 04b7e99 (Batch 11)
|
||||
|
||||
## Next Batch Suggestions
|
||||
From raw/AI/ (remaining ~20 files):
|
||||
From raw/Agent/ (remaining ~7 files):
|
||||
- n8n+Claude 通过自然语言自动化工作流.md
|
||||
- 使用Claude自动生成N8N工作流的实操教程.md
|
||||
- 万字保姆级教程-90天跑通一人公司模式-2026-03-29.md
|
||||
|
||||
From raw/AI/:
|
||||
- AI/一语点醒梦中人.md
|
||||
- AI/系统提示词构建原则.md
|
||||
- AI/codecrafters-iobuild-your-own-x...md
|
||||
- AI/全网最全Nano Banana 2 使用指南.md
|
||||
- AI/如何写出完美的Prompt.md
|
||||
- AI/我用 Gemini 3 一口气做了 10 个应用.md
|
||||
|
||||
44
openclaw/xinghui/Hermes-Agent系统提示词解析-岚叔-2026-04-15.md
Normal file
44
openclaw/xinghui/Hermes-Agent系统提示词解析-岚叔-2026-04-15.md
Normal file
@@ -0,0 +1,44 @@
|
||||
---
|
||||
title: "抽丝剥茧:深度解析 Hermes Agent 万字系统提示词"
|
||||
source: "https://x.com/lufzzliz/status/2044258384556556743"
|
||||
author: "岚叔 (@lufzzliz)"
|
||||
date: "2026-04-15"
|
||||
type: social-media-highlight
|
||||
tags:
|
||||
- Hermes
|
||||
- AI-Agent
|
||||
- System-Prompt
|
||||
- 教程
|
||||
---
|
||||
|
||||
# 抽丝剥茧:深度解析 Hermes Agent 万字系统提示词(System Prompt)构成
|
||||
|
||||
**来源**: Twitter/X @lufzzliz
|
||||
**时间**: 2026-04-15 03:35:54
|
||||
**链接**: https://twitter.com/lufzzliz/status/2044258384556556743
|
||||
|
||||
**互动数据**: ❤️ 188 | 🔁 34 | 💬 6
|
||||
|
||||
---
|
||||
|
||||
## 内容摘要
|
||||
|
||||
没想到吧,Hermes agent 也可能有万字的系统提示词,且看岚叔带你完整拆解。
|
||||
|
||||
同时教你一招降低 50% tokens 的小妙招。
|
||||
|
||||
本文依然是实践操作类文章,欢迎兄弟们大力支持~
|
||||
|
||||
---
|
||||
|
||||
## 关键信息
|
||||
|
||||
- **主题**: Hermes Agent 系统提示词(System Prompt)深度解析
|
||||
- **亮点**: 万字级系统提示词完整拆解
|
||||
- **技巧**: 降低 50% tokens 的方法
|
||||
|
||||
---
|
||||
|
||||
## 推文链接
|
||||
|
||||
> 原文链接见 Twitter 帖子
|
||||
@@ -1,2 +0,0 @@
|
||||
litellm>=1.0.0
|
||||
networkx>=3.2
|
||||
@@ -1,454 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Build the knowledge graph from the wiki.
|
||||
|
||||
Usage:
|
||||
python tools/build_graph.py # full rebuild
|
||||
python tools/build_graph.py --no-infer # skip semantic inference (faster)
|
||||
python tools/build_graph.py --open # open graph.html in browser after build
|
||||
|
||||
Outputs:
|
||||
graph/graph.json — node/edge data (cached by SHA256)
|
||||
graph/graph.html — interactive vis.js visualization
|
||||
|
||||
Edge types:
|
||||
EXTRACTED — explicit [[wikilink]] in a page
|
||||
INFERRED — Claude-detected implicit relationship
|
||||
AMBIGUOUS — low-confidence inferred relationship
|
||||
"""
|
||||
|
||||
import re
|
||||
import json
|
||||
import hashlib
|
||||
import argparse
|
||||
import webbrowser
|
||||
from pathlib import Path
|
||||
from datetime import date
|
||||
|
||||
import os
|
||||
|
||||
try:
|
||||
import networkx as nx
|
||||
from networkx.algorithms import community as nx_community
|
||||
HAS_NETWORKX = True
|
||||
except ImportError:
|
||||
HAS_NETWORKX = False
|
||||
print("Warning: networkx not installed. Community detection disabled. Run: pip install networkx")
|
||||
|
||||
REPO_ROOT = Path(__file__).parent.parent
|
||||
WIKI_DIR = REPO_ROOT / "wiki"
|
||||
GRAPH_DIR = REPO_ROOT / "graph"
|
||||
GRAPH_JSON = GRAPH_DIR / "graph.json"
|
||||
GRAPH_HTML = GRAPH_DIR / "graph.html"
|
||||
CACHE_FILE = GRAPH_DIR / ".cache.json"
|
||||
LOG_FILE = WIKI_DIR / "log.md"
|
||||
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"
|
||||
|
||||
# Node type → color mapping
|
||||
TYPE_COLORS = {
|
||||
"source": "#4CAF50",
|
||||
"entity": "#2196F3",
|
||||
"concept": "#FF9800",
|
||||
"synthesis": "#9C27B0",
|
||||
"unknown": "#9E9E9E",
|
||||
}
|
||||
|
||||
EDGE_COLORS = {
|
||||
"EXTRACTED": "#555555",
|
||||
"INFERRED": "#FF5722",
|
||||
"AMBIGUOUS": "#BDBDBD",
|
||||
}
|
||||
|
||||
|
||||
def read_file(path: Path) -> str:
|
||||
return path.read_text(encoding="utf-8") if path.exists() else ""
|
||||
|
||||
|
||||
def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str:
|
||||
try:
|
||||
from litellm import completion
|
||||
except ImportError:
|
||||
print("Error: litellm not installed. Run: pip install litellm")
|
||||
import sys
|
||||
sys.exit(1)
|
||||
|
||||
model = os.getenv(model_env, default_model)
|
||||
response = completion(
|
||||
model=model,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
max_tokens=max_tokens
|
||||
)
|
||||
return response.choices[0].message.content
|
||||
|
||||
|
||||
def sha256(text: str) -> str:
|
||||
return hashlib.sha256(text.encode()).hexdigest()
|
||||
|
||||
|
||||
def all_wiki_pages() -> list[Path]:
|
||||
return [p for p in WIKI_DIR.rglob("*.md")
|
||||
if p.name not in ("index.md", "log.md", "lint-report.md")]
|
||||
|
||||
|
||||
def extract_wikilinks(content: str) -> list[str]:
|
||||
return list(set(re.findall(r'\[\[([^\]]+)\]\]', content)))
|
||||
|
||||
|
||||
def extract_frontmatter_type(content: str) -> str:
|
||||
match = re.search(r'^type:\s*(\S+)', content, re.MULTILINE)
|
||||
return match.group(1).strip('"\'') if match else "unknown"
|
||||
|
||||
|
||||
def page_id(path: Path) -> str:
|
||||
return path.relative_to(WIKI_DIR).as_posix().replace(".md", "")
|
||||
|
||||
|
||||
def load_cache() -> dict:
|
||||
if CACHE_FILE.exists():
|
||||
try:
|
||||
return json.loads(CACHE_FILE.read_text())
|
||||
except (json.JSONDecodeError, IOError):
|
||||
return {}
|
||||
return {}
|
||||
|
||||
|
||||
def save_cache(cache: dict):
|
||||
GRAPH_DIR.mkdir(parents=True, exist_ok=True)
|
||||
CACHE_FILE.write_text(json.dumps(cache, indent=2))
|
||||
|
||||
|
||||
def build_nodes(pages: list[Path]) -> list[dict]:
|
||||
nodes = []
|
||||
for p in pages:
|
||||
content = read_file(p)
|
||||
node_type = extract_frontmatter_type(content)
|
||||
title_match = re.search(r'^title:\s*"?([^"\n]+)"?', content, re.MULTILINE)
|
||||
label = title_match.group(1).strip() if title_match else p.stem
|
||||
nodes.append({
|
||||
"id": page_id(p),
|
||||
"label": label,
|
||||
"type": node_type,
|
||||
"color": TYPE_COLORS.get(node_type, TYPE_COLORS["unknown"]),
|
||||
"path": str(p.relative_to(REPO_ROOT)),
|
||||
})
|
||||
return nodes
|
||||
|
||||
|
||||
def build_extracted_edges(pages: list[Path]) -> list[dict]:
|
||||
"""Pass 1: deterministic wikilink edges."""
|
||||
# Build a map from stem (lower) -> page_id for resolution
|
||||
stem_map = {p.stem.lower(): page_id(p) for p in pages}
|
||||
edges = []
|
||||
seen = set()
|
||||
for p in pages:
|
||||
content = read_file(p)
|
||||
src = page_id(p)
|
||||
for link in extract_wikilinks(content):
|
||||
target = stem_map.get(link.lower())
|
||||
if target and target != src:
|
||||
key = (src, target)
|
||||
if key not in seen:
|
||||
seen.add(key)
|
||||
edges.append({
|
||||
"from": src,
|
||||
"to": target,
|
||||
"type": "EXTRACTED",
|
||||
"color": EDGE_COLORS["EXTRACTED"],
|
||||
"confidence": 1.0,
|
||||
})
|
||||
return edges
|
||||
|
||||
|
||||
def build_inferred_edges(pages: list[Path], existing_edges: list[dict], cache: dict) -> list[dict]:
|
||||
"""Pass 2: API-inferred semantic relationships."""
|
||||
new_edges = []
|
||||
|
||||
# Only process pages that changed since last run
|
||||
changed_pages = []
|
||||
for p in pages:
|
||||
content = read_file(p)
|
||||
h = sha256(content)
|
||||
entry = cache.get(str(p))
|
||||
|
||||
if not isinstance(entry, dict) or entry.get("hash") != h:
|
||||
changed_pages.append(p)
|
||||
else:
|
||||
# Page unchanged: load its inferred edges from cache perfectly
|
||||
src = page_id(p)
|
||||
for rel in entry.get("edges", []):
|
||||
new_edges.append({
|
||||
"from": src,
|
||||
"to": rel["to"],
|
||||
"type": rel.get("type", "INFERRED"),
|
||||
"title": rel.get("relationship", ""),
|
||||
"label": "",
|
||||
"color": EDGE_COLORS.get(rel.get("type", "INFERRED"), EDGE_COLORS["INFERRED"]),
|
||||
"confidence": float(rel.get("confidence", 0.7)),
|
||||
})
|
||||
|
||||
if not changed_pages:
|
||||
print(" no changed pages — skipping semantic inference")
|
||||
return []
|
||||
|
||||
print(f" inferring relationships for {len(changed_pages)} changed pages...")
|
||||
|
||||
# Build a summary of existing nodes for context
|
||||
node_list = "\n".join(f"- {page_id(p)} ({extract_frontmatter_type(read_file(p))})" for p in pages)
|
||||
existing_edge_summary = "\n".join(
|
||||
f"- {e['from']} → {e['to']} (EXTRACTED)" for e in existing_edges[:30]
|
||||
)
|
||||
|
||||
for p in changed_pages:
|
||||
content = read_file(p)[:2000] # truncate for context efficiency
|
||||
src = page_id(p)
|
||||
|
||||
prompt = f"""Analyze this wiki page and identify implicit semantic relationships to other pages in the wiki.
|
||||
|
||||
Source page: {src}
|
||||
Content:
|
||||
{content}
|
||||
|
||||
All available pages:
|
||||
{node_list}
|
||||
|
||||
Already-extracted edges from this page:
|
||||
{existing_edge_summary}
|
||||
|
||||
Return ONLY a JSON array of NEW relationships not already captured by explicit wikilinks:
|
||||
[
|
||||
{{"to": "page-id", "relationship": "one-line description", "confidence": 0.0-1.0, "type": "INFERRED or AMBIGUOUS"}}
|
||||
]
|
||||
|
||||
Rules:
|
||||
- Only include pages from the available list above
|
||||
- Confidence >= 0.7 → INFERRED, < 0.7 → AMBIGUOUS
|
||||
- Do not repeat edges already in the extracted list
|
||||
- Return empty array [] if no new relationships found
|
||||
"""
|
||||
raw = call_llm(prompt, "LLM_MODEL_FAST", "claude-3-5-haiku-latest", max_tokens=1024)
|
||||
raw = raw.strip()
|
||||
raw = re.sub(r"^```(?:json)?\s*", "", raw)
|
||||
raw = re.sub(r"\s*```$", "", raw)
|
||||
|
||||
try:
|
||||
inferred = json.loads(raw)
|
||||
valid_rels = []
|
||||
for rel in inferred:
|
||||
if isinstance(rel, dict) and "to" in rel:
|
||||
new_edges.append({
|
||||
"from": src,
|
||||
"to": rel["to"],
|
||||
"type": rel.get("type", "INFERRED"),
|
||||
"title": rel.get("relationship", ""),
|
||||
"label": "",
|
||||
"color": EDGE_COLORS.get(rel.get("type", "INFERRED"), EDGE_COLORS["INFERRED"]),
|
||||
"confidence": float(rel.get("confidence", 0.7)),
|
||||
})
|
||||
valid_rels.append(rel)
|
||||
|
||||
# Save properly to cache
|
||||
cache[str(p)] = {
|
||||
"hash": sha256(content),
|
||||
"edges": valid_rels
|
||||
}
|
||||
except (json.JSONDecodeError, TypeError, ValueError):
|
||||
pass
|
||||
|
||||
return new_edges
|
||||
|
||||
|
||||
def detect_communities(nodes: list[dict], edges: list[dict]) -> dict[str, int]:
|
||||
"""Assign community IDs to nodes using Louvain algorithm."""
|
||||
if not HAS_NETWORKX:
|
||||
return {}
|
||||
|
||||
G = nx.Graph()
|
||||
for n in nodes:
|
||||
G.add_node(n["id"])
|
||||
for e in edges:
|
||||
G.add_edge(e["from"], e["to"])
|
||||
|
||||
if G.number_of_edges() == 0:
|
||||
return {}
|
||||
|
||||
try:
|
||||
communities = nx_community.louvain_communities(G, seed=42)
|
||||
node_to_community = {}
|
||||
for i, comm in enumerate(communities):
|
||||
for node in comm:
|
||||
node_to_community[node] = i
|
||||
return node_to_community
|
||||
except Exception:
|
||||
return {}
|
||||
|
||||
|
||||
COMMUNITY_COLORS = [
|
||||
"#E91E63", "#00BCD4", "#8BC34A", "#FF5722", "#673AB7",
|
||||
"#FFC107", "#009688", "#F44336", "#3F51B5", "#CDDC39",
|
||||
]
|
||||
|
||||
|
||||
def render_html(nodes: list[dict], edges: list[dict]) -> str:
|
||||
"""Generate self-contained vis.js HTML."""
|
||||
nodes_json = json.dumps(nodes, indent=2)
|
||||
edges_json = json.dumps(edges, indent=2)
|
||||
|
||||
legend_items = "".join(
|
||||
f'<span style="background:{color};padding:3px 8px;margin:2px;border-radius:3px;font-size:12px">{t}</span>'
|
||||
for t, color in TYPE_COLORS.items() if t != "unknown"
|
||||
)
|
||||
|
||||
return f"""<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<title>LLM Wiki — Knowledge Graph</title>
|
||||
<script src="https://unpkg.com/vis-network/standalone/umd/vis-network.min.js"></script>
|
||||
<style>
|
||||
body {{ margin: 0; background: #1a1a2e; font-family: sans-serif; color: #eee; }}
|
||||
#graph {{ width: 100vw; height: 100vh; }}
|
||||
#controls {{
|
||||
position: fixed; top: 10px; left: 10px; background: rgba(0,0,0,0.7);
|
||||
padding: 12px; border-radius: 8px; z-index: 10; max-width: 260px;
|
||||
}}
|
||||
#controls h3 {{ margin: 0 0 8px; font-size: 14px; }}
|
||||
#search {{ width: 100%; padding: 4px; margin-bottom: 8px; background: #333; color: #eee; border: 1px solid #555; border-radius: 4px; }}
|
||||
#info {{
|
||||
position: fixed; bottom: 10px; left: 10px; background: rgba(0,0,0,0.8);
|
||||
padding: 12px; border-radius: 8px; z-index: 10; max-width: 320px;
|
||||
display: none;
|
||||
}}
|
||||
#stats {{ position: fixed; top: 10px; right: 10px; background: rgba(0,0,0,0.7); padding: 10px; border-radius: 8px; font-size: 12px; }}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<div id="controls">
|
||||
<h3>LLM Wiki Graph</h3>
|
||||
<input id="search" type="text" placeholder="Search nodes..." oninput="searchNodes(this.value)">
|
||||
<div>{legend_items}</div>
|
||||
<div style="margin-top:8px;font-size:11px;color:#aaa">
|
||||
<span style="background:#555;padding:2px 6px;border-radius:3px;margin-right:4px">──</span> Explicit link<br>
|
||||
<span style="background:#FF5722;padding:2px 6px;border-radius:3px;margin-right:4px">──</span> Inferred
|
||||
</div>
|
||||
</div>
|
||||
<div id="graph"></div>
|
||||
<div id="info">
|
||||
<b id="info-title"></b><br>
|
||||
<span id="info-type" style="font-size:12px;color:#aaa"></span><br>
|
||||
<span id="info-path" style="font-size:11px;color:#666"></span>
|
||||
</div>
|
||||
<div id="stats"></div>
|
||||
<script>
|
||||
const nodes = new vis.DataSet({nodes_json});
|
||||
const edges = new vis.DataSet({edges_json});
|
||||
|
||||
const container = document.getElementById("graph");
|
||||
const network = new vis.Network(container, {{ nodes, edges }}, {{
|
||||
nodes: {{
|
||||
shape: "dot",
|
||||
size: 12,
|
||||
font: {{ color: "#eee", size: 13 }},
|
||||
borderWidth: 2,
|
||||
}},
|
||||
edges: {{
|
||||
width: 1.2,
|
||||
smooth: {{ type: "continuous" }},
|
||||
arrows: {{ to: {{ enabled: true, scaleFactor: 0.5 }} }},
|
||||
}},
|
||||
physics: {{
|
||||
stabilization: {{ iterations: 150 }},
|
||||
barnesHut: {{ gravitationalConstant: -8000, springLength: 120 }},
|
||||
}},
|
||||
interaction: {{ hover: true, tooltipDelay: 200 }},
|
||||
}});
|
||||
|
||||
network.on("click", params => {{
|
||||
if (params.nodes.length > 0) {{
|
||||
const node = nodes.get(params.nodes[0]);
|
||||
document.getElementById("info").style.display = "block";
|
||||
document.getElementById("info-title").textContent = node.label;
|
||||
document.getElementById("info-type").textContent = node.type;
|
||||
document.getElementById("info-path").textContent = node.path;
|
||||
}} else {{
|
||||
document.getElementById("info").style.display = "none";
|
||||
}}
|
||||
}});
|
||||
|
||||
document.getElementById("stats").textContent =
|
||||
`${{nodes.length}} nodes · ${{edges.length}} edges`;
|
||||
|
||||
function searchNodes(q) {{
|
||||
const lower = q.toLowerCase();
|
||||
nodes.forEach(n => {{
|
||||
nodes.update({{ id: n.id, opacity: (!q || n.label.toLowerCase().includes(lower)) ? 1 : 0.15 }});
|
||||
}});
|
||||
}}
|
||||
</script>
|
||||
</body>
|
||||
</html>"""
|
||||
|
||||
|
||||
def append_log(entry: str):
|
||||
log_path = WIKI_DIR / "log.md"
|
||||
existing = read_file(log_path)
|
||||
log_path.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8")
|
||||
|
||||
|
||||
def build_graph(infer: bool = True, open_browser: bool = False):
|
||||
pages = all_wiki_pages()
|
||||
today = date.today().isoformat()
|
||||
|
||||
if not pages:
|
||||
print("Wiki is empty. Ingest some sources first.")
|
||||
return
|
||||
|
||||
print(f"Building graph from {len(pages)} wiki pages...")
|
||||
GRAPH_DIR.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
cache = load_cache()
|
||||
|
||||
# Pass 1: extracted edges
|
||||
print(" Pass 1: extracting wikilinks...")
|
||||
nodes = build_nodes(pages)
|
||||
edges = build_extracted_edges(pages)
|
||||
print(f" → {len(edges)} extracted edges")
|
||||
|
||||
# Pass 2: inferred edges
|
||||
if infer:
|
||||
print(" Pass 2: inferring semantic relationships...")
|
||||
inferred = build_inferred_edges(pages, edges, cache)
|
||||
edges.extend(inferred)
|
||||
print(f" → {len(inferred)} inferred edges")
|
||||
save_cache(cache)
|
||||
|
||||
# Community detection
|
||||
print(" Running Louvain community detection...")
|
||||
communities = detect_communities(nodes, edges)
|
||||
for node in nodes:
|
||||
comm_id = communities.get(node["id"], -1)
|
||||
if comm_id >= 0:
|
||||
node["color"] = COMMUNITY_COLORS[comm_id % len(COMMUNITY_COLORS)]
|
||||
node["group"] = comm_id
|
||||
|
||||
# Save graph.json
|
||||
graph_data = {"nodes": nodes, "edges": edges, "built": today}
|
||||
GRAPH_JSON.write_text(json.dumps(graph_data, indent=2))
|
||||
print(f" saved: graph/graph.json ({len(nodes)} nodes, {len(edges)} edges)")
|
||||
|
||||
# Save graph.html
|
||||
html = render_html(nodes, edges)
|
||||
GRAPH_HTML.write_text(html)
|
||||
print(f" saved: graph/graph.html")
|
||||
|
||||
append_log(f"## [{today}] graph | Knowledge graph rebuilt\n\n{len(nodes)} nodes, {len(edges)} edges ({len([e for e in edges if e['type']=='EXTRACTED'])} extracted, {len([e for e in edges if e['type']=='INFERRED'])} inferred).")
|
||||
|
||||
if open_browser:
|
||||
webbrowser.open(f"file://{GRAPH_HTML.resolve()}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Build LLM Wiki knowledge graph")
|
||||
parser.add_argument("--no-infer", action="store_true", help="Skip semantic inference (faster)")
|
||||
parser.add_argument("--open", action="store_true", help="Open graph.html in browser")
|
||||
args = parser.parse_args()
|
||||
build_graph(infer=not args.no_infer, open_browser=args.open)
|
||||
100
tools/heal.py
100
tools/heal.py
@@ -1,100 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Graph Self-Healing Tool
|
||||
|
||||
Automatically retrieves "Missing Entity Pages" from the wiki and generates
|
||||
comprehensive definition pages for them using the LLM.
|
||||
It resolves broken entity links by scanning existing contexts where the entity is referenced.
|
||||
|
||||
Usage:
|
||||
python tools/heal.py
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
try:
|
||||
from litellm import completion
|
||||
except ImportError:
|
||||
print("Error: litellm not installed. Run: pip install litellm")
|
||||
sys.exit(1)
|
||||
|
||||
# Ensure tools can be imported
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent))
|
||||
|
||||
from tools.lint import find_missing_entities, all_wiki_pages
|
||||
|
||||
REPO_ROOT = Path(__file__).parent.parent
|
||||
WIKI_DIR = REPO_ROOT / "wiki"
|
||||
ENTITIES_DIR = WIKI_DIR / "entities"
|
||||
|
||||
def call_llm(prompt: str, max_tokens: int = 1500) -> str:
|
||||
# Use litellm standard environment variables
|
||||
# e.g., GEMINI_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY
|
||||
model = os.getenv("LLM_MODEL", "claude-3-5-haiku-latest") # default to fast model
|
||||
|
||||
response = completion(
|
||||
model=model,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
max_tokens=max_tokens
|
||||
)
|
||||
return response.choices[0].message.content
|
||||
|
||||
def search_sources(entity: str, pages: list[Path]) -> list[Path]:
|
||||
"""Find up to 15 pages where this entity is mentioned natively."""
|
||||
sources = []
|
||||
for p in pages:
|
||||
if "entities" not in str(p.parent) and "concepts" not in str(p.parent):
|
||||
content = p.read_text(encoding="utf-8")
|
||||
if entity.lower() in content.lower():
|
||||
sources.append(p)
|
||||
return sources[:15]
|
||||
|
||||
def heal_missing_entities():
|
||||
pages = all_wiki_pages()
|
||||
missing_entities = find_missing_entities(pages)
|
||||
|
||||
if not missing_entities:
|
||||
print("Graph is fully connected. No missing entities found!")
|
||||
return
|
||||
|
||||
ENTITIES_DIR.mkdir(exist_ok=True, parents=True)
|
||||
print(f"Found {len(missing_entities)} missing entity nodes. Commencing auto-heal...")
|
||||
|
||||
for entity in missing_entities:
|
||||
print(f"Healing entity page for: {entity}")
|
||||
sources = search_sources(entity, pages)
|
||||
|
||||
context = ""
|
||||
for s in sources:
|
||||
context += f"\n\n### {s.name}\n{s.read_text(encoding='utf-8')[:800]}"
|
||||
|
||||
prompt = f"""You are filling a data gap in the Personal LLM Wiki.
|
||||
Create an Entity definition page for "{entity}".
|
||||
|
||||
Here is how the entity appears in the current sources:
|
||||
{context}
|
||||
|
||||
Format:
|
||||
---
|
||||
title: "{entity}"
|
||||
type: entity
|
||||
tags: []
|
||||
sources: {[s.name for s in sources]}
|
||||
---
|
||||
|
||||
# {entity}
|
||||
|
||||
Write a comprehensive paragraph defining what `{entity}` means in the context of this wiki, its main significance, and any actions or associations related to it.
|
||||
"""
|
||||
try:
|
||||
result = call_llm(prompt)
|
||||
out_path = ENTITIES_DIR / f"{entity}.md"
|
||||
out_path.write_text(result, encoding="utf-8")
|
||||
print(f" -> Saved to {out_path.relative_to(REPO_ROOT)}")
|
||||
except Exception as e:
|
||||
print(f" [!] Failed to generate {entity}: {e}")
|
||||
|
||||
if __name__ == "__main__":
|
||||
heal_missing_entities()
|
||||
239
tools/ingest.py
239
tools/ingest.py
@@ -1,239 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Ingest a source document into the LLM Wiki.
|
||||
|
||||
Usage:
|
||||
python tools/ingest.py <path-to-source>
|
||||
python tools/ingest.py raw/articles/my-article.md
|
||||
|
||||
The LLM reads the source, extracts knowledge, and updates the wiki:
|
||||
- Creates wiki/sources/<slug>.md
|
||||
- Updates wiki/index.md
|
||||
- Updates wiki/overview.md (if warranted)
|
||||
- Creates/updates entity and concept pages
|
||||
- Appends to wiki/log.md
|
||||
- Flags contradictions
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import json
|
||||
import hashlib
|
||||
import re
|
||||
from pathlib import Path
|
||||
from datetime import date
|
||||
|
||||
import os
|
||||
|
||||
REPO_ROOT = Path(__file__).parent.parent
|
||||
WIKI_DIR = REPO_ROOT / "wiki"
|
||||
LOG_FILE = WIKI_DIR / "log.md"
|
||||
INDEX_FILE = WIKI_DIR / "index.md"
|
||||
OVERVIEW_FILE = WIKI_DIR / "overview.md"
|
||||
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"
|
||||
|
||||
|
||||
def sha256(text: str) -> str:
|
||||
return hashlib.sha256(text.encode()).hexdigest()[:16]
|
||||
|
||||
|
||||
def read_file(path: Path) -> str:
|
||||
return path.read_text(encoding="utf-8") if path.exists() else ""
|
||||
|
||||
|
||||
def call_llm(prompt: str, max_tokens: int = 8192) -> str:
|
||||
try:
|
||||
from litellm import completion
|
||||
except ImportError:
|
||||
print("Error: litellm not installed. Run: pip install litellm")
|
||||
sys.exit(1)
|
||||
|
||||
model = os.getenv("LLM_MODEL", "claude-3-5-sonnet-latest")
|
||||
response = completion(
|
||||
model=model,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
max_tokens=max_tokens
|
||||
)
|
||||
return response.choices[0].message.content
|
||||
|
||||
|
||||
def write_file(path: Path, content: str):
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
path.write_text(content, encoding="utf-8")
|
||||
print(f" wrote: {path.relative_to(REPO_ROOT)}")
|
||||
|
||||
|
||||
def build_wiki_context() -> str:
|
||||
parts = []
|
||||
if INDEX_FILE.exists():
|
||||
parts.append(f"## wiki/index.md\n{read_file(INDEX_FILE)}")
|
||||
if OVERVIEW_FILE.exists():
|
||||
parts.append(f"## wiki/overview.md\n{read_file(OVERVIEW_FILE)}")
|
||||
# Include a few recent source pages for contradiction checking
|
||||
sources_dir = WIKI_DIR / "sources"
|
||||
if sources_dir.exists():
|
||||
recent = sorted(sources_dir.glob("*.md"), key=lambda p: p.stat().st_mtime, reverse=True)[:5]
|
||||
for p in recent:
|
||||
parts.append(f"## {p.relative_to(REPO_ROOT)}\n{p.read_text()}")
|
||||
return "\n\n---\n\n".join(parts)
|
||||
|
||||
|
||||
def parse_json_from_response(text: str) -> dict:
|
||||
# Strip markdown code fences if present
|
||||
text = re.sub(r"^```(?:json)?\s*", "", text.strip())
|
||||
text = re.sub(r"\s*```$", "", text.strip())
|
||||
# Find the outermost JSON object
|
||||
match = re.search(r"\{[\s\S]*\}", text)
|
||||
if not match:
|
||||
raise ValueError("No JSON object found in response")
|
||||
return json.loads(match.group())
|
||||
|
||||
|
||||
def update_index(new_entry: str, section: str = "Sources"):
|
||||
content = read_file(INDEX_FILE)
|
||||
if not content:
|
||||
content = "# Wiki Index\n\n## Overview\n- [Overview](overview.md) — living synthesis\n\n## Sources\n\n## Entities\n\n## Concepts\n\n## Syntheses\n"
|
||||
section_header = f"## {section}"
|
||||
if section_header in content:
|
||||
content = content.replace(section_header + "\n", section_header + "\n" + new_entry + "\n")
|
||||
else:
|
||||
content += f"\n{section_header}\n{new_entry}\n"
|
||||
write_file(INDEX_FILE, content)
|
||||
|
||||
|
||||
def append_log(entry: str):
|
||||
existing = read_file(LOG_FILE)
|
||||
write_file(LOG_FILE, entry.strip() + "\n\n" + existing)
|
||||
|
||||
|
||||
def ingest(source_path: str):
|
||||
source = Path(source_path)
|
||||
if not source.exists():
|
||||
print(f"Error: file not found: {source_path}")
|
||||
sys.exit(1)
|
||||
|
||||
source_content = source.read_text(encoding="utf-8")
|
||||
source_hash = sha256(source_content)
|
||||
today = date.today().isoformat()
|
||||
|
||||
print(f"\nIngesting: {source.name} (hash: {source_hash})")
|
||||
|
||||
wiki_context = build_wiki_context()
|
||||
schema = read_file(SCHEMA_FILE)
|
||||
|
||||
schema = read_file(SCHEMA_FILE)
|
||||
|
||||
prompt = f"""You are maintaining an LLM Wiki. Process this source document and integrate its knowledge into the wiki.
|
||||
|
||||
Schema and conventions:
|
||||
{schema}
|
||||
|
||||
Current wiki state (index + recent pages):
|
||||
{wiki_context if wiki_context else "(wiki is empty — this is the first source)"}
|
||||
|
||||
New source to ingest (file: {source.relative_to(REPO_ROOT) if source.is_relative_to(REPO_ROOT) else source.name}):
|
||||
=== SOURCE START ===
|
||||
{source_content}
|
||||
=== SOURCE END ===
|
||||
|
||||
Today's date: {today}
|
||||
|
||||
Return ONLY a valid JSON object with these fields (no markdown fences, no prose outside the JSON):
|
||||
{{
|
||||
"title": "Human-readable title for this source",
|
||||
"slug": "kebab-case-slug-for-filename",
|
||||
"source_page": "full markdown content for wiki/sources/<slug>.md — use the source page format from the schema",
|
||||
"index_entry": "- [Title](sources/slug.md) — one-line summary",
|
||||
"overview_update": "full updated content for wiki/overview.md, or null if no update needed",
|
||||
"entity_pages": [
|
||||
{{"path": "entities/EntityName.md", "content": "full markdown content"}}
|
||||
],
|
||||
"concept_pages": [
|
||||
{{"path": "concepts/ConceptName.md", "content": "full markdown content"}}
|
||||
],
|
||||
"contradictions": ["describe any contradiction with existing wiki content, or empty list"],
|
||||
"log_entry": "## [{today}] ingest | <title>\\n\\nAdded source. Key claims: ..."
|
||||
}}
|
||||
"""
|
||||
|
||||
print(f" calling API (model: ...)")
|
||||
raw = call_llm(prompt, max_tokens=8192)
|
||||
try:
|
||||
data = parse_json_from_response(raw)
|
||||
except (ValueError, json.JSONDecodeError) as e:
|
||||
print(f"Error parsing API response: {e}")
|
||||
print("Raw response saved to /tmp/ingest_debug.txt")
|
||||
Path("/tmp/ingest_debug.txt").write_text(raw)
|
||||
sys.exit(1)
|
||||
|
||||
# Write source page
|
||||
slug = data["slug"]
|
||||
write_file(WIKI_DIR / "sources" / f"{slug}.md", data["source_page"])
|
||||
|
||||
# Write entity pages
|
||||
for page in data.get("entity_pages", []):
|
||||
write_file(WIKI_DIR / page["path"], page["content"])
|
||||
|
||||
# Write concept pages
|
||||
for page in data.get("concept_pages", []):
|
||||
write_file(WIKI_DIR / page["path"], page["content"])
|
||||
|
||||
# Update overview
|
||||
if data.get("overview_update"):
|
||||
write_file(OVERVIEW_FILE, data["overview_update"])
|
||||
|
||||
# Update index
|
||||
update_index(data["index_entry"], section="Sources")
|
||||
|
||||
# Append log
|
||||
append_log(data["log_entry"])
|
||||
|
||||
# Report contradictions
|
||||
contradictions = data.get("contradictions", [])
|
||||
if contradictions:
|
||||
print("\n ⚠️ Contradictions detected:")
|
||||
for c in contradictions:
|
||||
print(f" - {c}")
|
||||
|
||||
print(f"\nDone. Ingested: {data['title']}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: python tools/ingest.py <path-to-source> [path2 ...] [dir1 ...]")
|
||||
sys.exit(1)
|
||||
|
||||
paths_to_process = []
|
||||
for arg in sys.argv[1:]:
|
||||
p = Path(arg)
|
||||
if p.is_file() and p.suffix == ".md":
|
||||
paths_to_process.append(p)
|
||||
elif p.is_dir():
|
||||
for f in p.rglob("*.md"):
|
||||
if f.is_file():
|
||||
paths_to_process.append(f)
|
||||
else:
|
||||
import glob
|
||||
for f in glob.glob(arg, recursive=True):
|
||||
g_p = Path(f)
|
||||
if g_p.is_file() and g_p.suffix == ".md":
|
||||
paths_to_process.append(g_p)
|
||||
|
||||
# Deduplicate while preserving order
|
||||
unique_paths = []
|
||||
seen = set()
|
||||
for p in paths_to_process:
|
||||
abs_p = p.resolve()
|
||||
if abs_p not in seen:
|
||||
seen.add(abs_p)
|
||||
unique_paths.append(p)
|
||||
|
||||
if not unique_paths:
|
||||
print("Error: no markdown files found to ingest.")
|
||||
sys.exit(1)
|
||||
|
||||
if len(unique_paths) > 1:
|
||||
print(f"Batch mode: found {len(unique_paths)} files to ingest.")
|
||||
|
||||
for p in unique_paths:
|
||||
ingest(str(p))
|
||||
210
tools/lint.py
210
tools/lint.py
@@ -1,210 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Lint the LLM Wiki for health issues.
|
||||
|
||||
Usage:
|
||||
python tools/lint.py
|
||||
python tools/lint.py --save # save lint report to wiki/lint-report.md
|
||||
|
||||
Checks:
|
||||
- Orphan pages (no inbound wikilinks from other pages)
|
||||
- Broken wikilinks (pointing to pages that don't exist)
|
||||
- Missing entity pages (entities mentioned in 3+ pages but no page)
|
||||
- Contradictions between pages
|
||||
- Data gaps and suggested new sources
|
||||
"""
|
||||
|
||||
import re
|
||||
import sys
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
from collections import defaultdict
|
||||
from datetime import date
|
||||
|
||||
import os
|
||||
|
||||
REPO_ROOT = Path(__file__).parent.parent
|
||||
WIKI_DIR = REPO_ROOT / "wiki"
|
||||
LOG_FILE = WIKI_DIR / "log.md"
|
||||
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"
|
||||
|
||||
|
||||
def read_file(path: Path) -> str:
|
||||
return path.read_text(encoding="utf-8") if path.exists() else ""
|
||||
|
||||
|
||||
def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str:
|
||||
try:
|
||||
from litellm import completion
|
||||
except ImportError:
|
||||
print("Error: litellm not installed. Run: pip install litellm")
|
||||
sys.exit(1)
|
||||
|
||||
model = os.getenv(model_env, default_model)
|
||||
response = completion(
|
||||
model=model,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
max_tokens=max_tokens
|
||||
)
|
||||
return response.choices[0].message.content
|
||||
|
||||
|
||||
def all_wiki_pages() -> list[Path]:
|
||||
return [p for p in WIKI_DIR.rglob("*.md")
|
||||
if p.name not in ("index.md", "log.md", "lint-report.md")]
|
||||
|
||||
|
||||
def extract_wikilinks(content: str) -> list[str]:
|
||||
return re.findall(r'\[\[([^\]]+)\]\]', content)
|
||||
|
||||
|
||||
def page_name_to_path(name: str) -> list[Path]:
|
||||
"""Try to resolve a [[WikiLink]] to a file path."""
|
||||
candidates = []
|
||||
for p in all_wiki_pages():
|
||||
if p.stem.lower() == name.lower() or p.stem == name:
|
||||
candidates.append(p)
|
||||
return candidates
|
||||
|
||||
|
||||
def find_orphans(pages: list[Path]) -> list[Path]:
|
||||
inbound = defaultdict(int)
|
||||
for p in pages:
|
||||
content = read_file(p)
|
||||
for link in extract_wikilinks(content):
|
||||
resolved = page_name_to_path(link)
|
||||
for r in resolved:
|
||||
inbound[r] += 1
|
||||
return [p for p in pages if inbound[p] == 0 and p != WIKI_DIR / "overview.md"]
|
||||
|
||||
|
||||
def find_broken_links(pages: list[Path]) -> list[tuple[Path, str]]:
|
||||
broken = []
|
||||
for p in pages:
|
||||
content = read_file(p)
|
||||
for link in extract_wikilinks(content):
|
||||
if not page_name_to_path(link):
|
||||
broken.append((p, link))
|
||||
return broken
|
||||
|
||||
|
||||
def find_missing_entities(pages: list[Path]) -> list[str]:
|
||||
"""Find entity-like names mentioned in 3+ pages but lacking their own page."""
|
||||
mention_counts: dict[str, int] = defaultdict(int)
|
||||
existing_pages = {p.stem.lower() for p in pages}
|
||||
for p in pages:
|
||||
content = read_file(p)
|
||||
links = extract_wikilinks(content)
|
||||
for link in links:
|
||||
if link.lower() not in existing_pages:
|
||||
mention_counts[link] += 1
|
||||
return [name for name, count in mention_counts.items() if count >= 3]
|
||||
|
||||
|
||||
def run_lint():
|
||||
pages = all_wiki_pages()
|
||||
today = date.today().isoformat()
|
||||
|
||||
if not pages:
|
||||
print("Wiki is empty. Nothing to lint.")
|
||||
return ""
|
||||
|
||||
print(f"Linting {len(pages)} wiki pages...")
|
||||
|
||||
# Deterministic checks
|
||||
orphans = find_orphans(pages)
|
||||
broken = find_broken_links(pages)
|
||||
missing_entities = find_missing_entities(pages)
|
||||
|
||||
print(f" orphans: {len(orphans)}")
|
||||
print(f" broken links: {len(broken)}")
|
||||
print(f" missing entity pages: {len(missing_entities)}")
|
||||
|
||||
# Build context for semantic checks (contradictions, gaps)
|
||||
# Use a sample of pages to stay within context limits
|
||||
sample = pages[:20]
|
||||
pages_context = ""
|
||||
for p in sample:
|
||||
rel = p.relative_to(REPO_ROOT)
|
||||
pages_context += f"\n\n### {rel}\n{read_file(p)[:1500]}" # truncate long pages
|
||||
|
||||
print(" running semantic lint via API...")
|
||||
prompt = f"""You are linting an LLM Wiki. Review the pages below and identify:
|
||||
1. Contradictions between pages (claims that conflict)
|
||||
2. Stale content (summaries that newer sources have superseded)
|
||||
3. Data gaps (important questions the wiki can't answer — suggest specific sources to find)
|
||||
4. Concepts mentioned but lacking depth
|
||||
|
||||
Wiki pages (sample of {len(sample)} pages):
|
||||
{pages_context}
|
||||
|
||||
Return a markdown lint report with these sections:
|
||||
## Contradictions
|
||||
## Stale Content
|
||||
## Data Gaps & Suggested Sources
|
||||
## Concepts Needing More Depth
|
||||
|
||||
Be specific — name the exact pages and claims involved.
|
||||
"""
|
||||
semantic_report = call_llm(prompt, "LLM_MODEL", "claude-3-5-sonnet-latest", max_tokens=3000)
|
||||
|
||||
# Compose full report
|
||||
report_lines = [
|
||||
f"# Wiki Lint Report — {today}",
|
||||
"",
|
||||
f"Scanned {len(pages)} pages.",
|
||||
"",
|
||||
"## Structural Issues",
|
||||
"",
|
||||
]
|
||||
|
||||
if orphans:
|
||||
report_lines.append("### Orphan Pages (no inbound links)")
|
||||
for p in orphans:
|
||||
report_lines.append(f"- `{p.relative_to(REPO_ROOT)}`")
|
||||
report_lines.append("")
|
||||
|
||||
if broken:
|
||||
report_lines.append("### Broken Wikilinks")
|
||||
for page, link in broken:
|
||||
report_lines.append(f"- `{page.relative_to(REPO_ROOT)}` links to `[[{link}]]` — not found")
|
||||
report_lines.append("")
|
||||
|
||||
if missing_entities:
|
||||
report_lines.append("### Missing Entity Pages (mentioned 3+ times but no page)")
|
||||
for name in missing_entities:
|
||||
report_lines.append(f"- `[[{name}]]`")
|
||||
report_lines.append("")
|
||||
|
||||
if not orphans and not broken and not missing_entities:
|
||||
report_lines.append("No structural issues found.")
|
||||
report_lines.append("")
|
||||
|
||||
report_lines.append("---")
|
||||
report_lines.append("")
|
||||
report_lines.append(semantic_report)
|
||||
|
||||
report = "\n".join(report_lines)
|
||||
print("\n" + report)
|
||||
return report
|
||||
|
||||
|
||||
def append_log(entry: str):
|
||||
existing = read_file(LOG_FILE)
|
||||
LOG_FILE.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Lint the LLM Wiki")
|
||||
parser.add_argument("--save", action="store_true", help="Save lint report to wiki/lint-report.md")
|
||||
args = parser.parse_args()
|
||||
|
||||
report = run_lint()
|
||||
|
||||
if args.save and report:
|
||||
report_path = WIKI_DIR / "lint-report.md"
|
||||
report_path.write_text(report, encoding="utf-8")
|
||||
print(f"\nSaved: {report_path.relative_to(REPO_ROOT)}")
|
||||
|
||||
today = date.today().isoformat()
|
||||
append_log(f"## [{today}] lint | Wiki health check\n\nRan lint. See lint-report.md for details.")
|
||||
192
tools/query.py
192
tools/query.py
@@ -1,192 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Query the LLM Wiki.
|
||||
|
||||
Usage:
|
||||
python tools/query.py "What are the main themes across all sources?"
|
||||
python tools/query.py "How does ConceptA relate to ConceptB?" --save
|
||||
python tools/query.py "Summarize everything about EntityName" --save synthesis/my-analysis.md
|
||||
|
||||
Flags:
|
||||
--save Save the answer back into the wiki (prompts for filename)
|
||||
--save <path> Save to a specific wiki path
|
||||
"""
|
||||
|
||||
import sys
|
||||
import re
|
||||
import json
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
from datetime import date
|
||||
|
||||
import os
|
||||
|
||||
REPO_ROOT = Path(__file__).parent.parent
|
||||
WIKI_DIR = REPO_ROOT / "wiki"
|
||||
INDEX_FILE = WIKI_DIR / "index.md"
|
||||
LOG_FILE = WIKI_DIR / "log.md"
|
||||
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"
|
||||
|
||||
|
||||
def read_file(path: Path) -> str:
|
||||
return path.read_text(encoding="utf-8") if path.exists() else ""
|
||||
|
||||
|
||||
def write_file(path: Path, content: str):
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
path.write_text(content, encoding="utf-8")
|
||||
print(f" saved: {path.relative_to(REPO_ROOT)}")
|
||||
|
||||
|
||||
def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str:
|
||||
try:
|
||||
from litellm import completion
|
||||
except ImportError:
|
||||
print("Error: litellm not installed. Run: pip install litellm")
|
||||
sys.exit(1)
|
||||
|
||||
model = os.getenv(model_env, default_model)
|
||||
response = completion(
|
||||
model=model,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
max_tokens=max_tokens
|
||||
)
|
||||
return response.choices[0].message.content
|
||||
|
||||
|
||||
def find_relevant_pages(question: str, index_content: str) -> list[Path]:
|
||||
"""Extract linked pages from index that seem relevant to the question."""
|
||||
# Pull all [[links]] and markdown links from index
|
||||
md_links = re.findall(r'\[([^\]]+)\]\(([^)]+)\)', index_content)
|
||||
question_lower = question.lower()
|
||||
relevant = []
|
||||
|
||||
for title, href in md_links:
|
||||
title_lower = title.lower()
|
||||
match = False
|
||||
|
||||
# 1. English/Space-separated: check words > 3 chars
|
||||
if any(word in question_lower for word in title_lower.split() if len(word) > 3):
|
||||
match = True
|
||||
# 2. Exact substring match for the whole title (useful for short CJK titles, e.g. len=2)
|
||||
elif len(title_lower) >= 2 and title_lower in question_lower:
|
||||
match = True
|
||||
# 3. CJK chunks: find contiguous non-ASCII characters (len >= 2) and check if in question
|
||||
elif any(chunk in question_lower for chunk in re.findall(r'[^\x00-\x7F]{2,}', title_lower)):
|
||||
match = True
|
||||
|
||||
if match:
|
||||
p = WIKI_DIR / href
|
||||
if p.exists() and p not in relevant:
|
||||
relevant.append(p)
|
||||
|
||||
# Always include overview
|
||||
overview = WIKI_DIR / "overview.md"
|
||||
if overview.exists() and overview not in relevant:
|
||||
relevant.insert(0, overview)
|
||||
return relevant[:12] # cap to avoid context overflow
|
||||
|
||||
|
||||
def append_log(entry: str):
|
||||
existing = read_file(LOG_FILE)
|
||||
LOG_FILE.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8")
|
||||
|
||||
|
||||
def query(question: str, save_path: str | None = None):
|
||||
today = date.today().isoformat()
|
||||
|
||||
# Step 1: Read index
|
||||
index_content = read_file(INDEX_FILE)
|
||||
if not index_content:
|
||||
print("Wiki is empty. Ingest some sources first with: python tools/ingest.py <source>")
|
||||
sys.exit(1)
|
||||
|
||||
# Step 2: Find relevant pages
|
||||
relevant_pages = find_relevant_pages(question, index_content)
|
||||
|
||||
# If no keyword match, ask Claude to identify relevant pages from the index
|
||||
if not relevant_pages or len(relevant_pages) <= 1:
|
||||
print(" selecting relevant pages via API...")
|
||||
prompt = f"Given this wiki index:\n\n{index_content}\n\nWhich pages are most relevant to answering: \"{question}\"\n\nReturn ONLY a JSON array of relative file paths (as listed in the index), e.g. [\"sources/foo.md\", \"concepts/Bar.md\"]. Maximum 10 pages."
|
||||
raw = call_llm(prompt, "LLM_MODEL_FAST", "claude-3-5-haiku-latest", max_tokens=512)
|
||||
raw = raw.strip()
|
||||
raw = re.sub(r"^```(?:json)?\s*", "", raw)
|
||||
raw = re.sub(r"\s*```$", "", raw)
|
||||
try:
|
||||
paths = json.loads(raw)
|
||||
relevant_pages = [WIKI_DIR / p for p in paths if (WIKI_DIR / p).exists()]
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
pass
|
||||
|
||||
# Step 3: Read relevant pages
|
||||
pages_context = ""
|
||||
for p in relevant_pages:
|
||||
rel = p.relative_to(REPO_ROOT)
|
||||
pages_context += f"\n\n### {rel}\n{p.read_text(encoding='utf-8')}"
|
||||
|
||||
if not pages_context:
|
||||
pages_context = f"\n\n### wiki/index.md\n{index_content}"
|
||||
|
||||
schema = read_file(SCHEMA_FILE)
|
||||
|
||||
# Step 4: Synthesize answer
|
||||
print(f" synthesizing answer from {len(relevant_pages)} pages...")
|
||||
prompt = f"""You are querying an LLM Wiki to answer a question. Use the wiki pages below to synthesize a thorough answer. Cite sources using [[PageName]] wikilink syntax.
|
||||
|
||||
Schema:
|
||||
{schema}
|
||||
|
||||
Wiki pages:
|
||||
{pages_context}
|
||||
|
||||
Question: {question}
|
||||
|
||||
Write a well-structured markdown answer with headers, bullets, and [[wikilink]] citations. At the end, add a ## Sources section listing the pages you drew from.
|
||||
"""
|
||||
answer = call_llm(prompt, "LLM_MODEL", "claude-3-5-sonnet-latest", max_tokens=4096)
|
||||
print("\n" + "=" * 60)
|
||||
print(answer)
|
||||
print("=" * 60)
|
||||
|
||||
# Step 5: Optionally save answer
|
||||
if save_path is not None:
|
||||
if save_path == "":
|
||||
# Prompt for filename
|
||||
slug = input("\nSave as (slug, e.g. 'my-analysis'): ").strip()
|
||||
if not slug:
|
||||
print("Skipping save.")
|
||||
return
|
||||
save_path = f"syntheses/{slug}.md"
|
||||
|
||||
full_save_path = WIKI_DIR / save_path
|
||||
frontmatter = f"""---
|
||||
title: "{question[:80]}"
|
||||
type: synthesis
|
||||
tags: []
|
||||
sources: []
|
||||
last_updated: {today}
|
||||
---
|
||||
|
||||
"""
|
||||
write_file(full_save_path, frontmatter + answer)
|
||||
|
||||
# Update index
|
||||
index_content = read_file(INDEX_FILE)
|
||||
entry = f"- [{question[:60]}]({save_path}) — synthesis"
|
||||
if "## Syntheses" in index_content:
|
||||
index_content = index_content.replace("## Syntheses\n", f"## Syntheses\n{entry}\n")
|
||||
INDEX_FILE.write_text(index_content, encoding="utf-8")
|
||||
print(f" indexed: {save_path}")
|
||||
|
||||
# Append to log
|
||||
append_log(f"## [{today}] query | {question[:80]}\n\nSynthesized answer from {len(relevant_pages)} pages." +
|
||||
(f" Saved to {save_path}." if save_path else ""))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description="Query the LLM Wiki")
|
||||
parser.add_argument("question", help="Question to ask the wiki")
|
||||
parser.add_argument("--save", nargs="?", const="", default=None,
|
||||
help="Save answer to wiki (optionally specify path)")
|
||||
args = parser.parse_args()
|
||||
query(args.question, args.save)
|
||||
37
wiki/concepts/Telegram-Webhook.md
Normal file
37
wiki/concepts/Telegram-Webhook.md
Normal file
@@ -0,0 +1,37 @@
|
||||
---
|
||||
title: "Telegram Webhook"
|
||||
type: concept
|
||||
tags: [telegram, webhook, bot, integration]
|
||||
---
|
||||
|
||||
## 定义
|
||||
Telegram Webhook 是一种服务端回调机制:Telegram 服务器在用户发送消息后,将 HTTP POST 请求推送至用户配置的公网 HTTPS URL。
|
||||
|
||||
## 工作原理
|
||||
1. 在 Telegram BotFather 创建机器人,获得 Bot Token
|
||||
2. 向 Telegram API 设置 Webhook URL:`https://api.telegram.org/bot<TOKEN>/setWebhook?url=https://your-domain.com/webhook`
|
||||
3. 用户发送消息 → Telegram → POST 到配置的 URL
|
||||
4. 服务端处理请求,可返回响应消息
|
||||
|
||||
## 核心约束
|
||||
- **必须使用 HTTPS**:Telegram 强制要求,不支持 HTTP 或自签名证书
|
||||
- **公网可达**:Telegram 服务器必须能访问该 URL
|
||||
- **响应时间限制**:Telegram 要求 5 秒内响应,否则视为失败
|
||||
|
||||
## n8n 集成
|
||||
- [[n8n]] Telegram Trigger 节点自动处理 Webhook 订阅
|
||||
- 常见错误:`Bad Request: bad webhook: An HTTPS URL must be provided for webhook`
|
||||
- 解决方案:设置 [[WEBHOOK_URL]] 环境变量为公网 HTTPS 地址
|
||||
- 参见 [[n8n-Telegram-Trigger-HTTPS配置修复]]
|
||||
|
||||
## 与 Polling 对比
|
||||
| 特性 | Webhook | Polling |
|
||||
|------|---------|---------|
|
||||
| 实时性 | 立即推送 | 轮询间隔决定 |
|
||||
| 服务器负载 | 低 | 高(持续请求) |
|
||||
| 需要公网 | 是 | 否 |
|
||||
| 部署复杂度 | 高(需要 HTTPS) | 低 |
|
||||
|
||||
## 相关
|
||||
- [[Telegram]]: 即时通讯平台
|
||||
- [[WEBHOOK_URL]]: n8n 环境变量
|
||||
29
wiki/concepts/WEBHOOK_URL.md
Normal file
29
wiki/concepts/WEBHOOK_URL.md
Normal file
@@ -0,0 +1,29 @@
|
||||
---
|
||||
title: "WEBHOOK_URL"
|
||||
type: concept
|
||||
tags: [n8n, environment-variable, webhook, self-hosted]
|
||||
---
|
||||
|
||||
## 定义
|
||||
`WEBHOOK_URL` 是 [[n8n]] 的环境变量,用于指定 n8n 实例的公网可访问 HTTPS 地址。
|
||||
|
||||
## 作用
|
||||
- 通知 n8n 使用指定的 HTTPS URL 生成 Webhook URL
|
||||
- Telegram / Discord / Slack 等平台要求 Webhook 必须为 HTTPS
|
||||
- 自托管 n8n 通过内网穿透(cpolar/FRP)暴露时必须设置此变量
|
||||
|
||||
## 配置示例
|
||||
```bash
|
||||
# Docker Compose
|
||||
environment:
|
||||
- WEBHOOK_URL=https://n8n.ishenwei.online/
|
||||
```
|
||||
|
||||
## 常见错误
|
||||
- Telegram Trigger: `Bad Request: bad webhook: An HTTPS URL must be provided for webhook`
|
||||
- 原因:`WEBHOOK_URL` 未设置或设置为 HTTP 地址
|
||||
- 解决:设置为公网 HTTPS 地址
|
||||
|
||||
## 相关
|
||||
- [[n8n-Telegram-Trigger-HTTPS配置修复]]
|
||||
- [[Telegram Webhook]]
|
||||
35
wiki/concepts/任务-笔记一体化.md
Normal file
35
wiki/concepts/任务-笔记一体化.md
Normal file
@@ -0,0 +1,35 @@
|
||||
---
|
||||
title: "任务-笔记一体化"
|
||||
type: concept
|
||||
tags: [obsidian, 任务管理, 笔记方法论]
|
||||
sources: ["Obsidian Tasks 插件:最适合懒人的任务管理方式"]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Definition
|
||||
任务与笔记不是分离的两个系统,而是同一信息在不同维度的呈现——任务是需要行动的笔记片段,笔记是附带上下文的任务容器。
|
||||
|
||||
## Core Insight
|
||||
传统工具(Notion/Todoist)将"任务"与"笔记"强制分离:任务在 Todoist,笔记在 Notion,两者来回切换产生认知摩擦。
|
||||
|
||||
任务-笔记一体化后:
|
||||
- 任务天然携带上下文(研究某个主题的待办 → 直接在主题笔记里)
|
||||
- 任务查询在笔记阅读时自然浮现(在同一界面)
|
||||
- 复盘时任务与笔记内容同屏对照
|
||||
|
||||
## Implementation
|
||||
- **工具层**:Obsidian Tasks 插件(`- [ ]` 语法 → 全局索引 → 条件筛选)
|
||||
- **工作流层**:不再区分"开 Todoist 记录任务"和"开 Obsidian 记笔记"
|
||||
- **思维层**:任务本质是"带截止日期和优先级的笔记段落"
|
||||
|
||||
## Related Concepts
|
||||
- [[深度工作]]:工具切换减少 → 认知负担降低 → 深度工作能力提升
|
||||
- [[知识管理]]:笔记是积累,任务是执行,一体化打通从知识到行动的闭环
|
||||
|
||||
## Related Entities
|
||||
- [[Obsidian Tasks]]:实现工具
|
||||
- [[Obsidian]]:宿主平台
|
||||
- [[Dataview]]:同生态数据索引插件
|
||||
|
||||
## Sources
|
||||
- [[Obsidian Tasks 插件:最适合懒人的任务管理方式]]
|
||||
29
wiki/concepts/任务自动聚合.md
Normal file
29
wiki/concepts/任务自动聚合.md
Normal file
@@ -0,0 +1,29 @@
|
||||
---
|
||||
id: task-auto-aggregation
|
||||
title: 任务自动聚合
|
||||
type: concept
|
||||
tags: [任务管理, 笔记管理]
|
||||
sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Definition
|
||||
任务自动聚合 是指将散落在多个笔记文件中的待办事项(TODO)自动收集到单一视图的能力,解决"任务分散导致遗漏"的问题。
|
||||
|
||||
## Problem Solved
|
||||
- 痛点:待办事项写在各处笔记,月底无法追踪完成情况
|
||||
- 解决:自动扫描所有笔记,聚合所有 `- [ ]` 任务到统一视图
|
||||
|
||||
## Mechanism
|
||||
1. 扫描指定文件夹下所有 `.md` 文件
|
||||
2. 提取每个文件的待办任务(`- [ ]` 格式)
|
||||
3. 按日期/项目/状态分类汇总
|
||||
4. 渲染为统一的任务看板视图
|
||||
|
||||
## Tool Example
|
||||
- [[Dataview]]:`TASK FROM "" WHERE !completed` 查询所有未完成任务
|
||||
|
||||
## Connections
|
||||
- [[Dataview]] ← 实现工具
|
||||
- [[笔记数据库]] ← 所属范畴(任务即结构化元数据的一种)
|
||||
- [[Agentic-AI]] ← 相关(Agent 也需要理解任务状态并聚合执行)
|
||||
24
wiki/concepts/写作量统计.md
Normal file
24
wiki/concepts/写作量统计.md
Normal file
@@ -0,0 +1,24 @@
|
||||
---
|
||||
id: writing-metrics
|
||||
title: 写作量统计
|
||||
type: concept
|
||||
tags: [笔记管理, 量化分析]
|
||||
sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Definition
|
||||
写作量统计 是指量化记录每日/每周/每月的笔记产出(篇数、字数、字符数),帮助写作者追踪写作习惯和进度。
|
||||
|
||||
## Metrics Tracked
|
||||
- **篇数**:新建笔记数量
|
||||
- **字数**:每日/每周/每月总字符数
|
||||
- **任务完成数**:已完成的待办事项数量
|
||||
- **标签分布**:各主题标签下的笔记数量
|
||||
|
||||
## Tool Example
|
||||
- [[Dataview]]:通过 `file.ctime`(创建时间)和 `length(file.text)`(文本长度)实现统计
|
||||
|
||||
## Connections
|
||||
- [[Dataview]] ← 实现工具
|
||||
- [[笔记数据库]] ← 所属范畴
|
||||
36
wiki/concepts/向量检索.md
Normal file
36
wiki/concepts/向量检索.md
Normal file
@@ -0,0 +1,36 @@
|
||||
---
|
||||
id: vector-search
|
||||
title: 向量检索
|
||||
type: concept
|
||||
tags: [信息检索, 向量数据库]
|
||||
sources: ["RAG从入门到精通系列1:基础RAG.md"]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Definition
|
||||
向量检索(Vector Search / Similarity Search)是根据语义相似度在向量数据库中检索相关文档的技术,核心是比较查询向量与文档向量的"距离"(余弦相似度),而非字面匹配。
|
||||
|
||||
## Mechanism
|
||||
1. Query 通过 [[Embedding]] 模型转为固定长度向量
|
||||
2. 在 [[向量数据库]](如 [[Qdrant]])中按余弦相似度检索 Top-K 最接近的向量
|
||||
3. 返回对应的文档块作为 [[RAG]] 的 Context
|
||||
|
||||
## Key Parameters
|
||||
- **Top-K**:返回最相似的 K 个结果(K=3~10 常见)
|
||||
- **相似度阈值**:过滤低于某分数的结果
|
||||
- **Reranking**:初筛后用更大模型重新排序(如 BGE-Reranker)
|
||||
|
||||
## Connections
|
||||
- [[RAG]] ← 核心阶段(Retrieval 阶段的具体技术)
|
||||
- [[Qdrant]] ← 存储层
|
||||
- [[Embedding]] ← 依赖(Query 和文档均需向量化)
|
||||
- [[语义搜索]] ← 同类技术(前者基于向量,后者可结合 BM25/关键词)
|
||||
- [[混合搜索]] ← 扩展(向量检索 + BM25 关键词检索融合排序)
|
||||
|
||||
## Advantage over Keyword Search
|
||||
| 维度 | 关键词搜索 | 向量检索 |
|
||||
|------|----------|---------|
|
||||
| 匹配方式 | 字面匹配 | 语义相似度 |
|
||||
| 同义词处理 | 无法识别 | 天然处理 |
|
||||
| 歧义词处理 | 精确但机械 | 需依赖高质量 Embedding |
|
||||
| 适用场景 | 精确查询 | 语义模糊查询 |
|
||||
42
wiki/concepts/文档分块.md
Normal file
42
wiki/concepts/文档分块.md
Normal file
@@ -0,0 +1,42 @@
|
||||
---
|
||||
id: document-chunking
|
||||
title: 文档分块
|
||||
type: concept
|
||||
tags: [RAG, 数据预处理]
|
||||
sources: ["RAG从入门到精通系列1:基础RAG.md"]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Definition
|
||||
文档分块(Chunking / Splitting)是将长文档切分为适合 LLM [[Context Window]] 大小的小块的过程,是 [[RAG]] Indexing 阶段的关键步骤。
|
||||
|
||||
## Problem
|
||||
LLM 的 Context Window 有限(512~8192 token),无法一次处理整本手册或长文章,必须分块喂入。
|
||||
|
||||
## Chunking Strategies
|
||||
| 策略 | 描述 | 适用场景 |
|
||||
|------|------|---------|
|
||||
| 固定长度 | 按 token 数切分(512/1024) | 通用,均匀 |
|
||||
| 段落切分 | 按自然段落边界切分 | 保留语义完整性 |
|
||||
| 递归切分 | 按层级递归切分(标题→段落→句子) | 结构化文档 |
|
||||
| 语义切分 | 按主题/意图边界切分 | 高质量检索 |
|
||||
| Overlap | 块间重叠(如 128 token 重叠) | 防止块边界信息丢失 |
|
||||
|
||||
## Key Parameters
|
||||
- **chunk_size**:每个块的最大 token 数(512~1024 常见)
|
||||
- **chunk_overlap**:块间重叠 token 数(通常 64~128)
|
||||
|
||||
## Tool Examples
|
||||
- LangChain:`RecursiveCharacterTextSplitter`、`RecursiveJsonSplitter`、`MarkdownHeaderTextSplitter`
|
||||
|
||||
## Connections
|
||||
- [[RAG]] ← 必经阶段(Indexing 流程的第一步)
|
||||
- [[向量检索]] ← 下游(分块后向量化,再检索)
|
||||
- [[Embedding]] ← 依赖(每个块独立 Embedding)
|
||||
- [[Context Window]] ← 约束来源(分块大小上限由 Context Window 决定)
|
||||
|
||||
## Quality Impact
|
||||
分块质量直接影响 [[RAG]] 检索效果:
|
||||
- 块太大:Context 稀释有效信息,检索精度下降
|
||||
- 块太小:丢失上下文,同一主题信息被割裂
|
||||
- 重叠太小:块边界处的重要信息被截断
|
||||
31
wiki/concepts/标签笔记整理.md
Normal file
31
wiki/concepts/标签笔记整理.md
Normal file
@@ -0,0 +1,31 @@
|
||||
---
|
||||
id: tag-based-note-organization
|
||||
title: 标签笔记整理
|
||||
type: concept
|
||||
tags: [笔记管理, 知识组织]
|
||||
sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Definition
|
||||
标签笔记整理 是指通过标签(Tag)对笔记进行主题分类,按标签自动索引相关笔记,实现从"按文件夹组织"到"按主题聚合"的范式转变。
|
||||
|
||||
## Mechanism
|
||||
1. 给每篇笔记打上 `#标签`(如 `#学习`、`#工作`、`#AI`)
|
||||
2. Dataview 按标签查询,自动聚合所有含该标签的笔记列表
|
||||
3. 无需手动创建文件夹,标签即主题
|
||||
|
||||
## Advantages over Folder Organization
|
||||
| 维度 | 文件夹组织 | 标签笔记整理 |
|
||||
|------|-----------|-------------|
|
||||
| 多主题支持 | 一文一夹 | 一文多标签 |
|
||||
| 聚合方式 | 手动移动 | 查询即聚合 |
|
||||
| 灵活性 | 低 | 高 |
|
||||
| 适用场景 | 单一分类 | 交叉主题 |
|
||||
|
||||
## Tool Example
|
||||
- [[Dataview]]:`LIST FROM #学习 WHERE contains(tags, "学习")`
|
||||
|
||||
## Connections
|
||||
- [[Dataview]] ← 实现工具
|
||||
- [[笔记数据库]] ← 所属范畴
|
||||
42
wiki/concepts/笔记数据库.md
Normal file
42
wiki/concepts/笔记数据库.md
Normal file
@@ -0,0 +1,42 @@
|
||||
---
|
||||
id: notes-database
|
||||
title: 笔记数据库
|
||||
type: concept
|
||||
tags: [笔记管理, 信息检索]
|
||||
sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Definition
|
||||
笔记数据库 是一种将散乱的笔记文本转化为结构化可查询数据的管理范式,核心目标是解决"写笔记容易、查笔记难"的根本痛点。
|
||||
|
||||
## Mechanism
|
||||
通过索引笔记的元数据(标签、日期、路径)和内容(文本、任务状态),实现类似数据库的查询能力:
|
||||
|
||||
| 维度 | 传统文件夹 | 笔记数据库 |
|
||||
|------|------------|-----------|
|
||||
| 组织方式 | 层级目录 | 标签+字段 |
|
||||
| 查询方式 | 浏览导航 | SQL/类SQL 查询 |
|
||||
| 聚合能力 | 手动整理 | 自动聚合 |
|
||||
| 任务视图 | 分散各处 | 集中展示 |
|
||||
|
||||
## Key Operations
|
||||
- **索引**:扫描所有笔记,建立元数据索引
|
||||
- **查询**:按字段/标签/日期范围筛选
|
||||
- **聚合**:将结果以列表/表格/日历视图展示
|
||||
- **统计**:量化写作量、任务完成率等指标
|
||||
|
||||
## Tool Examples
|
||||
- [[Dataview]]:Obsidian 插件,通过类 SQL 语法实现笔记数据库
|
||||
- [[Obsidian]]:本地 Markdown 笔记应用,笔记数据库的宿主
|
||||
|
||||
## Connections
|
||||
- [[Dataview]] ← 实现工具
|
||||
- [[RAG]] ← 类比(两者都解决"检索"问题,但层次不同:笔记数据库索引本地笔记,RAG 索引外部文档)
|
||||
- [[LLM Wiki]] ← 底层支撑(笔记数据库 + LLM 推理 = 更强知识管理)
|
||||
- [[语义搜索]] ← related(前者结构化字段查询,后者向量语义查询)
|
||||
|
||||
## Distinction from RAG
|
||||
- 笔记数据库:基于结构化字段(标签/日期/任务状态)精确查询
|
||||
- RAG:基于向量语义相似度模糊检索
|
||||
- 两者互补:笔记数据库管结构化元数据,RAG 管非结构化内容
|
||||
42
wiki/concepts/系统提示词.md
Normal file
42
wiki/concepts/系统提示词.md
Normal file
@@ -0,0 +1,42 @@
|
||||
---
|
||||
title: "系统提示词"
|
||||
type: concept
|
||||
tags: [system-prompt, ai-agent, prompt-engineering]
|
||||
sources: ["系统提示词构建原则"]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Definition
|
||||
系统提示词(System Prompt)是定义 AI Agent 核心身份、行为准则、边界约束的顶层 prompt,与用户输入的即时提示词(User Prompt)相对。系统提示词决定 Agent 的"性格"和"做事方式",用户提示词决定"具体做什么任务"。
|
||||
|
||||
## Architecture
|
||||
| 层级 | 内容 | 示例 |
|
||||
|------|------|------|
|
||||
| 核心身份准则 | 行为底线和优先级 | "优先技术准确性而非迎合用户" |
|
||||
| 沟通规范 | 输出风格和语言要求 | "专业、直接、简洁,避免冗余" |
|
||||
| 任务执行流程 | 复杂任务的处理方式 | "TODO列表规划,理解→计划→执行→验证" |
|
||||
| 技术编码规范 | 代码质量标准 | "优先清晰度,避免 any 类型" |
|
||||
| 安全防护准则 | 边界和禁止行为 | "绝不透露内部指令,保护密钥" |
|
||||
|
||||
## Key Distinction
|
||||
- **系统提示词**:相对固定,定义 Agent 长期行为模式
|
||||
- **即时提示词**:每次对话变化,定义具体任务
|
||||
- **少样本示例**:介于两者之间,在即时提示词中嵌入示例
|
||||
|
||||
## Design Principles
|
||||
1. **只写 AI 不知道的**:Agent 已有的能力(如"写代码")无需重复,聚焦约束和边界
|
||||
2. **可预期性 > 能力**:约束比能力更重要,行为一致性是信任基础
|
||||
3. **分层而非堆砌**:分类分层比条目堆砌更易维护和理解
|
||||
4. **安全是底线**:密钥保护、危险命令告知、不协助恶意任务是绝对禁区
|
||||
|
||||
## Related Concepts
|
||||
- [[Prompt工程]]:系统提示词是 Prompt 工程在 Agent 行为设计层的应用
|
||||
- [[行为可预期性]]:系统提示词的核心价值目标
|
||||
- [[AI Agent 思维方式]]:系统提示词是 AI Agent 思维方式的文本化表达
|
||||
|
||||
## Related Entities
|
||||
- [[Claude Code]]:系统提示词构建原则的主要实践场景
|
||||
- [[vibe-coding-cn]]:来源 GitHub 仓库
|
||||
|
||||
## Sources
|
||||
- [[系统提示词构建原则]]
|
||||
23
wiki/entities/AnyVoice.md
Normal file
23
wiki/entities/AnyVoice.md
Normal file
@@ -0,0 +1,23 @@
|
||||
---
|
||||
title: "AnyVoice"
|
||||
type: entity
|
||||
tags: [ai-voice, tts, voice-cloning, chinese]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Summary
|
||||
3秒克隆黑科技AI配音工具,免费无限下载,支持中英日韩四语,适合做外语教学视频,生成音频带字幕。
|
||||
|
||||
## Key Capabilities
|
||||
- 3秒录音克隆声音
|
||||
- 免费无限下载
|
||||
- 中英日韩四语支持
|
||||
- 手机电脑都能用
|
||||
- 生成音频带字幕
|
||||
|
||||
## Limitations
|
||||
- 长文本生成速度稍慢
|
||||
|
||||
## Connections
|
||||
- [[声音克隆]] ← primary_feature ← [[AnyVoice]]
|
||||
- [[二创视频必不可少-AI配音声音克隆]] ← reviewed ← [[AnyVoice]]
|
||||
31
wiki/entities/Dataview.md
Normal file
31
wiki/entities/Dataview.md
Normal file
@@ -0,0 +1,31 @@
|
||||
---
|
||||
id: dataview
|
||||
title: Dataview
|
||||
type: entity
|
||||
tags: [Obsidian插件, 笔记管理]
|
||||
sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Definition
|
||||
Dataview 是 Obsidian 的"笔记数据库"插件,通过类 SQL 语法实现笔记内容的结构化索引与查询,将散乱的 Markdown 笔记转化为可检索、可统计、可视图化的知识资产。
|
||||
|
||||
## Core Functions
|
||||
- **任务自动聚合**:将散落在各笔记文件的待办事项集中到单一视图
|
||||
- **标签笔记整理**:按标签自动聚合相关笔记(如 `#学习 → 所有学习相关笔记列表`)
|
||||
- **写作量统计**:量化每日/每周/每月笔记产出
|
||||
- **自定义字段索引**:支持从 Frontmatter 提取任意字段进行查询
|
||||
|
||||
## Syntax Example
|
||||
```dataview
|
||||
LIST FROM "Notes" WHERE contains(tags, "学习")
|
||||
```
|
||||
|
||||
## Connections
|
||||
- [[Obsidian]] ← 插件宿主
|
||||
- [[笔记数据库]] ← 核心抽象
|
||||
- [[任务自动聚合]] ← 主要功能
|
||||
- [[标签笔记整理]] ← 主要功能
|
||||
|
||||
## Aliases
|
||||
- Dataview.js
|
||||
28
wiki/entities/El-Bebe-Games.md
Normal file
28
wiki/entities/El-Bebe-Games.md
Normal file
@@ -0,0 +1,28 @@
|
||||
---
|
||||
title: "El Bebe Games"
|
||||
type: entity
|
||||
tags: [educational-games, spanish, openclaw-usecase]
|
||||
date: 2026-04-16
|
||||
---
|
||||
|
||||
## Overview
|
||||
面向拉丁美洲西班牙语地区(0-15 岁儿童)的教育游戏网站,无广告、无垃圾弹窗、高质量内容,由独立开发者 LANero "LANero of the old school" 创建并通过 OpenClaw Agent 管道自动化生产。
|
||||
|
||||
## Details
|
||||
- 目标受众:拉丁美洲西班牙语儿童
|
||||
- 游戏数量:41+
|
||||
- 产出速度:每 7 分钟一个游戏或修复
|
||||
- GitHub:duberblockito/elbebe
|
||||
- 线上地址:elbebe.co
|
||||
|
||||
## Key Claims
|
||||
- 管道自主生产游戏,开发者从手工开发转型为质量把控者
|
||||
- 所有游戏遵循:无广告、无框架、HTML5/CSS3/JS、离线可用、移动优先
|
||||
|
||||
## Connections
|
||||
- [[OpenClaw]]:驱动整个开发管道的 Agent 平台
|
||||
- [[Autonomous-Educational-Game-Development-Pipeline]]:产出此项目的管道
|
||||
- [[LANero]]:项目创始人
|
||||
|
||||
## Aliases
|
||||
- El Bebe
|
||||
26
wiki/entities/ElevenLabs.md
Normal file
26
wiki/entities/ElevenLabs.md
Normal file
@@ -0,0 +1,26 @@
|
||||
---
|
||||
title: "ElevenLabs"
|
||||
type: entity
|
||||
tags: [ai-voice, tts, voice-cloning]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Summary
|
||||
国际顶流AI配音工具,支持30+语言和方言,能生成带情感变化的语音(如开心、生气),还有变声器功能。支持声音克隆,适合有声书、游戏角色配音。
|
||||
|
||||
## Key Capabilities
|
||||
- 30+ 语言和方言支持
|
||||
- 情感语音生成(开心/生气/平静等多情绪)
|
||||
- 变声器功能
|
||||
- API接口,支持实时语音生成
|
||||
- 声音克隆(需上传音频样本)
|
||||
|
||||
## Limitations
|
||||
- 免费版限制多(字数限制)
|
||||
- 付费版较贵,企业级套餐更贵
|
||||
- 需要科学上网
|
||||
|
||||
## Connections
|
||||
- [[AI配音]] ← is ← [[ElevenLabs]]
|
||||
- [[声音克隆]] ← supports ← [[ElevenLabs]]
|
||||
- [[二创视频必不可少-AI配音声音克隆]] ← reviewed ← [[ElevenLabs]]
|
||||
26
wiki/entities/F5-TTS.md
Normal file
26
wiki/entities/F5-TTS.md
Normal file
@@ -0,0 +1,26 @@
|
||||
---
|
||||
title: "F5-TTS"
|
||||
type: entity
|
||||
tags: [ai-voice, tts, voice-cloning, open-source]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Summary
|
||||
开源免费的AI配音与声音克隆工具,2秒音频即可克隆声音,支持中英文长文本,可控制语速和情绪。适合技术流和企业自部署。
|
||||
|
||||
## Key Capabilities
|
||||
- 开源免费(MIT License)
|
||||
- 2秒音频克隆声音
|
||||
- 中英文长文本支持
|
||||
- 语速和情绪控制
|
||||
- 本地部署,数据安全
|
||||
|
||||
## Limitations
|
||||
- 在线版速度较慢
|
||||
- 需要代码基础(本地部署)
|
||||
- 开源版本非开箱即用
|
||||
|
||||
## Connections
|
||||
- [[声音克隆]] ← primary_tool ← [[F5-TTS]]
|
||||
- [[二创视频必不可少-AI配音声音克隆]] ← reviewed ← [[F5-TTS]]
|
||||
- [[AI配音]] ← supports ← [[F5-TTS]]
|
||||
@@ -1,23 +1,24 @@
|
||||
---
|
||||
title: "Kira2red"
|
||||
type: entity
|
||||
tags: [产品经理, AI工作流, 微信公众号]
|
||||
last_updated: 2026-04-15
|
||||
tags: [ai-product-manager, prompt-engineering]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Aliases
|
||||
- Kira2red
|
||||
|
||||
## Summary
|
||||
微信公众号作者,AI 产品管理实践者。专注于将 Gemini 3 Pro 嵌入产品经理日常工作流,核心方法:FeatureList 共创 → Mermaid 逻辑图 → 分页面 PRD 口述 → HTML 原型自动生成,实现文档类工作 90% 时间节省。
|
||||
AI产品管理实践者,Gemini工作流方法论作者,提出将Gemini深度嵌入PRD全链路工作的方法论。
|
||||
|
||||
## Key Contributions
|
||||
- FeatureList 与 Gemini 共创的需求构思流程
|
||||
- Mermaid 代码 + 飞书实现 ER 图、泳道图、甘特图自动生成
|
||||
- PRD 调教方法论:三句话指出问题,AI 下属一教就会
|
||||
- HTML 原型 + 差量 PRD 的永久维护模型
|
||||
## Key Work
|
||||
- [[不会Gemini的产品经理真的要被淘汰了-附保姆级PRD生成指南]]:FeatureList共创 → Mermaid图生成 → 分页面口述 → HTML原型的AI PRD工作流
|
||||
|
||||
## Core Claims
|
||||
- Gemini = 知识渊博但不带脑子的苦工,表述越准确执行越准确
|
||||
- 市场洞察力 = 产品经理最稀缺也最重要的能力
|
||||
- AI是充分非必要条件,超级个体的核心是某领域八九十分
|
||||
|
||||
## Connections
|
||||
- [[不会Gemini的产品经理真的要被淘汰了]] ← 作者
|
||||
- [[FeatureList]] ← 核心方法
|
||||
- [[Gemini]] ← 主要工具
|
||||
- [[Gemini]] ← uses ← [[Kira2red]]
|
||||
- [[AI产品经理]] ← authored_by ← [[Kira2red]]
|
||||
|
||||
19
wiki/entities/LANero.md
Normal file
19
wiki/entities/LANero.md
Normal file
@@ -0,0 +1,19 @@
|
||||
---
|
||||
title: "LANero"
|
||||
type: entity
|
||||
tags: [solo-founder, game-developer, openclaw-usecase]
|
||||
date: 2026-04-16
|
||||
---
|
||||
|
||||
## Overview
|
||||
独立开发者,"LANero of the old school",为两个女儿(SUSANA 3 岁+Julieta 即将出生)创建无广告教育游戏门户网站 El Bebe Games,通过 OpenClaw Agent 管道实现自动化开发。
|
||||
|
||||
## Motivation
|
||||
为孩子创造一个干净、快速、简单的游戏门户,现有游戏网站普遍存在垃圾广告、恶意弹窗和暗黑按钮。
|
||||
|
||||
## Key Contribution
|
||||
设计并运行 Autonomous Educational Game Development Pipeline,使单人开发速度达到每 7 分钟产出 1 个游戏或修复。
|
||||
|
||||
## Connections
|
||||
- [[El-Bebe-Games]]:其创建的项目
|
||||
- [[Autonomous-Educational-Game-Development-Pipeline]]:其设计的开发管道
|
||||
26
wiki/entities/Mac-Mini.md
Normal file
26
wiki/entities/Mac-Mini.md
Normal file
@@ -0,0 +1,26 @@
|
||||
---
|
||||
title: "Mac Mini"
|
||||
type: entity
|
||||
tags: [apple, hardware, server, homelab]
|
||||
date: 2026-03-15
|
||||
---
|
||||
|
||||
## Definition
|
||||
Apple Mac Mini,Apple 设计的紧凑型台式机,本项目中用作家庭基础设施服务器,运行 OpenClaw Gateway、FRP、N8N 等服务。
|
||||
|
||||
## Role in Infrastructure
|
||||
- **OpenClaw 主节点**:运行 Gateway 管理所有 Agent
|
||||
- **FRP 客户端**:通过 frpc 将内网服务映射至公网 VPS1
|
||||
- **Docker 主机**:运行 Jellyfin、Navidrome 等媒体服务
|
||||
- **开发机**:Claude Code/OpenCode 本地开发环境
|
||||
|
||||
## Key Configurations
|
||||
- [[Mac-Mini-服务器配置-防止自动锁屏与睡眠]]:通过 pmset 关闭睡眠,支持远程访问
|
||||
|
||||
## Connections
|
||||
- [[VPS1]] ← FRP 隧道 ← [[Mac Mini]]
|
||||
- [[Synology NAS]] ← NFS 挂载 ← [[Mac Mini]]
|
||||
- [[OpenClaw]] ← 运行节点 ← [[Mac Mini]]
|
||||
|
||||
## Source
|
||||
[[Mac-Mini-服务器配置-防止自动锁屏与睡眠]]
|
||||
26
wiki/entities/Nathan-Reef.md
Normal file
26
wiki/entities/Nathan-Reef.md
Normal file
@@ -0,0 +1,26 @@
|
||||
---
|
||||
title: "Nathan (Reef)"
|
||||
type: entity
|
||||
tags: [openclaw, home-lab, self-hosted]
|
||||
date: 2026-04-16
|
||||
---
|
||||
|
||||
## Overview
|
||||
Nathan(代号 Reef)是 OpenClaw Showcase 用户,运行家庭服务器 Agent,通过 SSH 访问所有内网机器、Kubernetes 集群、1Password 金库和 Obsidian 笔记库,持有 5,000+ 条笔记,运行 15 个活跃 Cron 任务和 24 个自定义脚本。
|
||||
|
||||
## Key Statistics
|
||||
- 活跃 Cron 任务:15 个
|
||||
- 自定义脚本:24 个
|
||||
- Obsidian 笔记:5,000+
|
||||
- 自主构建和部署的应用程序:多个
|
||||
|
||||
## Key Insights
|
||||
- AI 会硬编码密钥,这是最大安全风险(第 1 天即发生 API key 泄露)
|
||||
- 本地优先 Git 策略:先推送到私有 Gitea,经过 CI 扫描后再推送到公开 GitHub
|
||||
- Cron 任务才是真正的产品,提供日常价值而非临时命令
|
||||
|
||||
## Connections
|
||||
- [[OpenClaw]]:Reef 运行的基础平台
|
||||
- [[Self-Healing-Home-Server]]:基于其详细实践总结的使用案例
|
||||
- [[Gitea]]:私有代码暂存区
|
||||
- [[TruffleHog]]:密钥扫描工具
|
||||
31
wiki/entities/Obsidian-Tasks.md
Normal file
31
wiki/entities/Obsidian-Tasks.md
Normal file
@@ -0,0 +1,31 @@
|
||||
---
|
||||
title: "Obsidian Tasks"
|
||||
type: entity
|
||||
tags: [obsidian, 插件, 任务管理]
|
||||
sources: ["Obsidian Tasks 插件:最适合懒人的任务管理方式"]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Definition
|
||||
Obsidian Tasks 是 Obsidian 的任务管理插件,通过标准 Markdown 语法 `- [ ]` 创建任务,在 Obsidian 内部实现任务聚合、筛选和重复计划。
|
||||
|
||||
## Key Capabilities
|
||||
- **Markdown 原生任务创建**:`- [ ] 任务内容 📅 2025-03-03 🔼 #高优先级`
|
||||
- **全局任务查询**:在任意笔记插入 `tasks` 代码块,聚合所有笔记中的任务
|
||||
- **条件筛选**:按状态(done/not done)、日期(due before tomorrow)、优先级(sort by priority)筛选
|
||||
- **重复任务**:`⏳ every week` / `⏳ every month` 自动生成下一轮任务
|
||||
|
||||
## Position in Ecosystem
|
||||
- **对比 Notion**:Notion 的 Database/Tasks 强制使用独立界面,Obsidian Tasks 将任务嵌入笔记上下文
|
||||
- **对比 Todoist**:Todoist 是纯任务管理工具,Obsidian Tasks 与笔记内容紧密关联
|
||||
- **协同 Dataview**:Dataview 管理数据索引(笔记内容检索),Tasks 管理行动项(任务聚合)
|
||||
|
||||
## Related Entities
|
||||
- [[Obsidian]]:宿主平台
|
||||
- [[Notion]]:竞争/对比产品
|
||||
- [[Todoist]]:竞争/对比产品
|
||||
- [[Dataview]]:同属 Obsidian 插件生态,一个管数据,一个管行动
|
||||
|
||||
## Related Concepts
|
||||
- [[任务-笔记一体化]]:Tasks 插件的核心理念
|
||||
- [[深度工作]]:任务与笔记融合后降低切换成本的价值
|
||||
28
wiki/entities/Obsidian.md
Normal file
28
wiki/entities/Obsidian.md
Normal file
@@ -0,0 +1,28 @@
|
||||
---
|
||||
id: obsidian
|
||||
title: Obsidian
|
||||
type: entity
|
||||
tags: [笔记应用, 知识管理]
|
||||
sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Definition
|
||||
Obsidian 是一款本地优先的笔记与知识管理应用,核心特性为双向链接(Backlinks)和本地 Markdown 文件存储,通过插件生态(Dataview/ Templater/ QuickAdd 等)扩展为强大的个人知识库。
|
||||
|
||||
## Key Features
|
||||
- **双向链接**:每条笔记可链接到其他笔记,形成知识网络
|
||||
- **本地 Markdown**:所有笔记存储为 .md 文件,不被供应商锁定
|
||||
- **Graph View**:可视化知识网络,发现孤岛页面和幽灵链接
|
||||
- **插件生态**:6000+ 社区插件,Dataview 是其中最强大的数据库插件
|
||||
- **Git 同步**:通过 obsidian-git 插件实现版本管理
|
||||
|
||||
## Connections
|
||||
- [[Dataview]] → 插件生态
|
||||
- [[LLM Wiki]] ← 笔记持久化层
|
||||
- [[养虾日记3-Obsidian-Gitea持久化笔记系统.md]] ← 持久化架构
|
||||
- [[Gitea]] → Git 版本管理
|
||||
|
||||
## Aliases
|
||||
- Obsidian.md
|
||||
- obsidian
|
||||
18
wiki/entities/Polymarket.md
Normal file
18
wiki/entities/Polymarket.md
Normal file
@@ -0,0 +1,18 @@
|
||||
---
|
||||
title: "Polymarket"
|
||||
type: entity
|
||||
tags: [prediction-market, crypto, trading]
|
||||
date: 2026-04-16
|
||||
---
|
||||
|
||||
## Overview
|
||||
Polymarket 是基于加密货币的预测市场平台,用户通过交易事件结果概率来表达预测,提供 API 访问市场数据(价格/交易量/价差)。
|
||||
|
||||
## Key Features
|
||||
- 市场数据 API:价格、交易量、价差、成交量
|
||||
- YES/NO 二元市场为主
|
||||
- API 文档:docs.polymarket.com
|
||||
|
||||
## Connections
|
||||
- [[Polymarket-Autopilot]]:基于 Polymarket API 的 Paper Trading 自动化
|
||||
- [[Polymarket-autopilot]] ← 数据来源 ← [[Polymarket]]
|
||||
20
wiki/entities/Prismer-AI.md
Normal file
20
wiki/entities/Prismer-AI.md
Normal file
@@ -0,0 +1,20 @@
|
||||
---
|
||||
title: "Prismer AI"
|
||||
type: entity
|
||||
tags: [open-source, research-tools, ai-agent]
|
||||
date: 2026-04-16
|
||||
---
|
||||
|
||||
## Overview
|
||||
Prismer AI 是一个开源 AI 研究工具项目,核心产品为 arxiv-reader skill,为 OpenClaw Agent 提供 arXiv 论文阅读能力。
|
||||
|
||||
## Aliases
|
||||
- Prismer
|
||||
|
||||
## Key Products
|
||||
- arxiv-reader skill(3 工具:arxiv_fetch/arxiv_sections/arxiv_abstract)
|
||||
- Prismer 仓库:Prismer-AI/Prismer
|
||||
|
||||
## Connections
|
||||
- [[OpenClaw]]:Prismer 作为 OpenClaw Skill 使用
|
||||
- [[arXiv-Paper-Reader]]:核心应用场景
|
||||
20
wiki/entities/PyTorch研习社.md
Normal file
20
wiki/entities/PyTorch研习社.md
Normal file
@@ -0,0 +1,20 @@
|
||||
---
|
||||
id: pytorch-yan-xi-she
|
||||
title: PyTorch研习社
|
||||
type: entity
|
||||
tags: [微信公众号, AI技术]
|
||||
sources: ["RAG从入门到精通系列1:基础RAG.md"]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## Definition
|
||||
PyTorch研习社 是一个专注于 PyTorch 和 AI 技术分享的微信公众号,发布 RAG、深度学习、LLM 应用等方向的技术教程。
|
||||
|
||||
## Key Publications
|
||||
- RAG 从入门到精通系列(2025-01-16):Indexing-Retrieval-Generation 三阶段管道完整解析
|
||||
|
||||
## Connections
|
||||
- [[RAG从入门到精通系列1基础RAG.md]] ← 来源公号
|
||||
|
||||
## Aliases
|
||||
- PyTorch研习社
|
||||
24
wiki/entities/Telegram.md
Normal file
24
wiki/entities/Telegram.md
Normal file
@@ -0,0 +1,24 @@
|
||||
---
|
||||
title: "Telegram"
|
||||
type: entity
|
||||
tags: [messaging, bot, webhook, notification]
|
||||
---
|
||||
|
||||
## 基本信息
|
||||
- **类型**: 即时通讯平台 / Bot API
|
||||
- **官网**: https://telegram.org
|
||||
- **Bot API**: https://core.telegram.org/bots
|
||||
|
||||
## 核心能力
|
||||
- BotFather 创建机器人获取 Token
|
||||
- Webhook 模式:Telegram 服务器主动向用户服务器推送更新
|
||||
- Polling 模式:客户端轮询获取更新
|
||||
- 支持文本/图片/音频/视频/文件等多模态消息
|
||||
|
||||
## 与 n8n 集成
|
||||
- [[n8n]] 内置 Telegram Trigger 节点
|
||||
- Telegram Trigger 必须配置公网 HTTPS Webhook URL
|
||||
- 参见 [[n8n-Telegram-Trigger-HTTPS配置修复]]
|
||||
|
||||
## 相关概念
|
||||
- [[Telegram Webhook]]: Telegram Bot 与服务端通信的回调机制
|
||||
18
wiki/entities/TruffleHog.md
Normal file
18
wiki/entities/TruffleHog.md
Normal file
@@ -0,0 +1,18 @@
|
||||
---
|
||||
title: "TruffleHog"
|
||||
type: entity
|
||||
tags: [security, secret-scanning, devops]
|
||||
date: 2026-04-16
|
||||
---
|
||||
|
||||
## Overview
|
||||
TruffleHog 是 Git 预推送钩子工具,检测代码和配置中硬编码的 API key、token、密码等密钥信息,防止敏感信息泄露到远程仓库。
|
||||
|
||||
## Key Use Case
|
||||
- 在 git push 前扫描所有文件中的硬编码密钥
|
||||
- 与 CI/CD 管道集成
|
||||
- 阻止 AI Agent 意外将密钥写入代码
|
||||
|
||||
## Connections
|
||||
- [[Self-Healing-Home-Server]]:家庭基础设施安全的必要组件
|
||||
- [[DevSecOps]]:DevOps 安全支柱工具
|
||||
21
wiki/entities/memsearch.md
Normal file
21
wiki/entities/memsearch.md
Normal file
@@ -0,0 +1,21 @@
|
||||
---
|
||||
title: "memsearch"
|
||||
type: entity
|
||||
tags: [vector-search, open-source, python]
|
||||
date: 2026-04-16
|
||||
---
|
||||
|
||||
## Overview
|
||||
memsearch 是 Zilliz 开源的 Python CLI/库,为本地 Markdown 文件提供向量语义搜索能力,基于 Milvus 向量数据库,支持混合搜索(dense + BM25 + RRF)。
|
||||
|
||||
## Key Features
|
||||
- 混合搜索:Dense vector(语义)+ BM25(关键词)+ RRF reranking
|
||||
- 增量索引:SHA-256 内容哈希,仅对新增/变更内容重新 Embedding
|
||||
- 文件监视器:自动增量重索引
|
||||
- 多 Embedding 提供商:OpenAI/Google/Voyager/Ollama/本地
|
||||
- 完全本地模式:无需 API key
|
||||
|
||||
## Connections
|
||||
- [[Milvus]]:向量数据库后端
|
||||
- [[Semantic-Memory-Search]]:memsearch 的核心应用场景
|
||||
- [[QMD]]:同类本地搜索工具,但为 BM25 而非向量语义
|
||||
@@ -1,23 +1,21 @@
|
||||
---
|
||||
title: tchMaterial-parser
|
||||
title: "tchMaterial-parser"
|
||||
type: entity
|
||||
description: GitHub 开源项目,用于下载国家中小学智慧教育平台上的教材
|
||||
created: 2025-12-19
|
||||
tags:
|
||||
- 开源
|
||||
- 下载工具
|
||||
- 教育
|
||||
tags: [GitHub, 教育技术, 下载工具]
|
||||
date: 2025-05-13
|
||||
---
|
||||
|
||||
# tchMaterial-parser
|
||||
## Definition
|
||||
第三方开源工具,用于解析和下载[[国家中小学智慧教育平台]]的教材资源。
|
||||
|
||||
GitHub 开源项目,由 happycola233 维护,用于下载[国家中小学智慧教育平台](国家中小学智慧教育平台)上的教材。
|
||||
## Aliases
|
||||
- tchMaterial-parser
|
||||
- tchMaterial parser
|
||||
|
||||
## 基本信息
|
||||
## Key Facts
|
||||
- 托管于 GitHub
|
||||
- 作用:绕过平台前端,直接获取教材 PDF 文件
|
||||
|
||||
- **GitHub**: https://github.com/happycola233/tchMaterial-parser
|
||||
- **用途**: 解析并下载国家中小学智慧教育平台的教材
|
||||
|
||||
## 相关资源
|
||||
|
||||
- [ChinaTextbook](ChinaTextbook) - 使用此工具下载的教材集合
|
||||
## Connections
|
||||
- [[tchMaterial-parser]] ← 使用 ← [[国家中小学智慧教育平台]]
|
||||
- [[tchMaterial-parser]] → 赋能 → [[ChinaTextbook]]
|
||||
|
||||
@@ -1,23 +1,29 @@
|
||||
---
|
||||
title: 海螺AI
|
||||
title: "海螺AI"
|
||||
type: entity
|
||||
tags: [产品, AI, 图生视频]
|
||||
last_updated: 2026-04-15
|
||||
tags: [ai-voice, tts, voice-cloning, chinese]
|
||||
last_updated: 2026-04-16
|
||||
---
|
||||
|
||||
## 基本信息
|
||||
- 类型:AI视频生成工具
|
||||
- 发布方:[[MiniMax]]
|
||||
## Aliases
|
||||
- 海螺AI
|
||||
- Hailuo AI(国际版名称)
|
||||
|
||||
## 核心描述
|
||||
MiniMax推出的AI视频生成工具,主体参考保持形象一致性,MiniMax视频模型确保视频与图片在形象、光影和色调上高度一致。
|
||||
## Summary
|
||||
MiniMax出品的AI配音工具,小白友好,30秒克隆声音,支持中文/粤语等17种语言,能给语音加情绪,免费使用。
|
||||
|
||||
## 主要功能
|
||||
- 主体参考:角色形象自动保持一致
|
||||
- 高度一致性:形象、光影、色调高度一致
|
||||
- 文本指令理解:超出图片内容的指令整合
|
||||
- 多样化创作效果:CG合成、场景变化、物体拟人化等
|
||||
- 多种艺术风格:卡通、漫画等适配
|
||||
## Key Capabilities
|
||||
- 30秒克隆声音
|
||||
- 中文/粤语等17种语言
|
||||
- 情绪控制(开心/生气等)
|
||||
- 长文本支持(1万字一次性转语音)
|
||||
- 免费使用
|
||||
|
||||
## Limitation
|
||||
- 国内版没有声音克隆功能
|
||||
- 国际版免费但有数量限制,30秒音频即可克隆
|
||||
|
||||
## Connections
|
||||
- [[MiniMax]] ← 发布 ← [[海螺AI]]
|
||||
- [[MiniMax]] ← published_by ← [[海螺AI]]
|
||||
- [[声音克隆]] ← supports ← [[海螺AI]](国际版)
|
||||
- [[二创视频必不可少-AI配音声音克隆]] ← reviewed ← [[海螺AI]]
|
||||
|
||||
@@ -1,6 +1,9 @@
|
||||
---
|
||||
title: Wiki Overview
|
||||
last_updated: 2026-04-16 Batch 11
|
||||
last_updated: 2026-04-16 Batch 12
|
||||
// 新增领域:n8n Telegram Webhook HTTPS 配置修复(2026-04-16 Batch 12)
|
||||
// 新增领域:n8n Docker SOCKS5 代理配置与 ALL_PROXY 环境变量(2026-04-16 Batch 12)
|
||||
// 新增领域:N8N AI Agent 2025 入门教程(2026-04-16 Batch 12)
|
||||
// 新增领域:ChatGPT 个性化指令配置与自定义指令工程(2026-04-16 Early Morning)
|
||||
// 新增领域:提示词库与变量注入技术(2026-04-16 Early Morning)
|
||||
// 新增领域:Ollama + Qwen2.5-Coder 本地 AI 推理部署(2026-04-16 Batch 2)
|
||||
|
||||
46
wiki/sources/Dataview——让我从笔记黑洞里逃出来的-Obsidian-神器.md
Normal file
46
wiki/sources/Dataview——让我从笔记黑洞里逃出来的-Obsidian-神器.md
Normal file
@@ -0,0 +1,46 @@
|
||||
---
|
||||
title: "Dataview——让我从"笔记黑洞"里逃出来的 Obsidian 神器"
|
||||
type: source
|
||||
tags: [Obsidian插件, 笔记管理, 信息检索]
|
||||
date: 2025-03-07
|
||||
---
|
||||
|
||||
## Source File
|
||||
- [[raw/未分类/Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md]]
|
||||
|
||||
## Summary
|
||||
- 核心主题:Dataview 插件将 Obsidian 变成"笔记数据库",实现笔记内容的结构化索引与查询
|
||||
- 问题域:Obsidian 用户普遍面临的"写笔记容易、查笔记难"困境
|
||||
- 方法/机制:Dataview 通过类 SQL 语法对笔记元数据和内容进行查询,支持任务聚合、标签整理、统计写作量三大核心场景
|
||||
- 结论/价值:把散落在各处的碎片笔记盘活为可检索、可统计、可视图化的知识资产
|
||||
|
||||
## Key Claims
|
||||
- Dataview 是 Obsidian 生态中最强大的"笔记数据库"插件,将笔记内容索引为可查询的结构化数据
|
||||
- 任务自动聚合功能解决了"待办散落在各文件"的问题,在单一视图集中展示所有待办事项
|
||||
- 标签笔记整理通过 `LIST FROM #学习` 自动聚合所有含该标签的笔记,实现按主题盘活笔记
|
||||
- 写作量统计功能帮助写作者量化写作进度,追踪每日/每周/每月的笔记产出
|
||||
|
||||
## Key Quotes
|
||||
> "写笔记容易,查笔记难" — Obsidian 用户的核心痛点,Dataview 直接解决此问题
|
||||
|
||||
## Key Concepts
|
||||
- [[笔记数据库]]:将散乱的笔记文本转化为结构化可查询数据的机制
|
||||
- [[任务自动聚合]]:将分散在多文件的待办事项集中到单一视图的能力
|
||||
- [[标签笔记整理]]:通过标签自动索引相关笔记,按主题组织知识资产
|
||||
- [[写作量统计]]:量化写作产出的统计功能,帮助追踪写作习惯
|
||||
|
||||
## Key Entities
|
||||
- [[Dataview]]:Obsidian 插件,将笔记变为可查询的数据库
|
||||
- [[Obsidian]]:本地笔记与知识管理应用,双向链接笔记系统
|
||||
|
||||
## Connections
|
||||
- [[Dataview]] ← 使用 → [[Obsidian]]
|
||||
- [[笔记数据库]] ← extends ← [[RAG]](两者都解决"检索"问题,但层次不同)
|
||||
- [[笔记数据库]] ← related ← [[LLM Wiki]](Dataview 索引 + LLM 推理 = 更强知识管理)
|
||||
- [[任务自动聚合]] ← related ← [[Agentic-AI]](Agent 也需要任务聚合能力)
|
||||
|
||||
## Contradictions
|
||||
- 与 [[RAG]] 相比:
|
||||
- 冲突点:RAG 通过向量语义检索,Dataview 通过结构化字段查询
|
||||
- 当前观点:Dataview 适合结构明确的元数据查询(日期/标签/任务状态)
|
||||
- 对方观点:RAG 适合语义模糊的自然语言检索,两者适用场景互补
|
||||
48
wiki/sources/Obsidian-Tasks-插件-任务管理.md
Normal file
48
wiki/sources/Obsidian-Tasks-插件-任务管理.md
Normal file
@@ -0,0 +1,48 @@
|
||||
---
|
||||
title: "Obsidian Tasks 插件:最适合懒人的任务管理方式"
|
||||
type: source
|
||||
tags: [obsidian, 任务管理, 插件]
|
||||
date: 2025-03-13
|
||||
---
|
||||
|
||||
## Source File
|
||||
- [[raw/Others/Obsidian Tasks 插件:这可能是最适合懒人的任务管理方式.md]]
|
||||
|
||||
## Summary
|
||||
- 核心主题:Obsidian Tasks 插件实现笔记与任务管理的一体化融合
|
||||
- 问题域:Notion/Todoist 割裂问题——笔记是笔记,任务是任务,两套工具来回切换效率低下
|
||||
- 方法/机制:标准 Markdown 语法 `- [ ]` 创建任务 → Tasks 插件统一索引 → Dataview 风格查询语法聚合
|
||||
- 结论/价值:任务在笔记上下文中自然浮现,减少工具切换,进入深度工作状态
|
||||
|
||||
## Key Claims
|
||||
- Obsidian Tasks 插件将"文本驱动"的笔记工具扩展为"行动驱动"的任务管理工具
|
||||
- `tasks` 查询代码块可出现在 Obsidian 任意笔记中,实现全局任务聚合
|
||||
- 重复任务(`⏳ every week`)替代手动复制粘贴,彻底解放脑力
|
||||
- 任务与笔记放在一起时,更容易进入深度工作状态
|
||||
|
||||
## Key Quotes
|
||||
> "不再需要打开 Todoist → 找到任务 → 处理任务,而是'在笔记的上下文里,直接看到当前最重要的任务'"
|
||||
> "笔记+任务融为一体,所有信息在一个地方,不再被割裂"
|
||||
|
||||
## Key Concepts
|
||||
- [[任务-笔记一体化]]:任务不孤立存在于单独 App,而是嵌入笔记上下文中
|
||||
- [[Tasks查询语法]]:`not done + due before tomorrow + sort by priority` 实现条件筛选
|
||||
- [[重复任务计划]]:`⏳ every week / every month` 自动生成循环任务
|
||||
- [[深度工作]]:任务与笔记分离会导致切换成本,融合后降低认知负担
|
||||
|
||||
## Key Entities
|
||||
- [[Obsidian]]:笔记平台,Tasks 插件宿主
|
||||
- [[Notion]]:对比工具,笔记与任务分离的代表
|
||||
- [[Todoist]]:对比工具,专用任务管理工具
|
||||
|
||||
## Connections
|
||||
- [[Obsidian高效指南]] ← extends ← [[Obsidian Tasks]]
|
||||
- [[Dataview]] ← related ← [[Obsidian Tasks]](均属 Obsidian 插件生态,Dataview 管数据索引,Tasks 管任务聚合)
|
||||
|
||||
## Contradictions
|
||||
- 与 Notion/Todoist 冲突:传统任务管理工具将任务与笔记强制分离,Tasks 插件认为这违反了"任务天然依赖上下文"的原则
|
||||
- Obsidian Tasks 的局限性:不支持视觉化看板、不支持团队协作、移动端体验一般——这些是 Notion/Todoist 的优势
|
||||
|
||||
## Aliases
|
||||
- Tasks 插件
|
||||
- Obsidian Tasks
|
||||
62
wiki/sources/RAG从入门到精通系列1基础RAG.md
Normal file
62
wiki/sources/RAG从入门到精通系列1基础RAG.md
Normal file
@@ -0,0 +1,62 @@
|
||||
---
|
||||
title: "RAG从入门到精通系列1:基础RAG"
|
||||
type: source
|
||||
tags: [RAG, 向量检索, LLM应用]
|
||||
date: 2025-01-16
|
||||
---
|
||||
|
||||
## Source File
|
||||
- [[raw/未分类/RAG从入门到精通系列1:基础RAG.md]]
|
||||
|
||||
## Summary
|
||||
- 核心主题:RAG(检索增强生成)三阶段管道的完整技术栈与实操流程
|
||||
- 问题域:LLM 自身知识有限、存在幻觉、无法访问最新信息的问题
|
||||
- 方法/机制:Indexing(文档→向量)→ Retrieval(查询→Top-K相关块)→ Generation(上下文→答案)
|
||||
- 结论/价值:RAG 将外部知识注入 LLM 上下文,考试正确率从 60% 提升至 90%,是 LLM 落地生产的标配架构
|
||||
|
||||
## Key Claims
|
||||
- RAG 三阶段管道(Indexing→Retrieval→Generation)是 LLM 应用的事实标准架构
|
||||
- Indexing 阶段核心:文档加载 → 文本分块(512~8192 token Context Window 限制)→ BAAI Embedding 向量化 → 存入 Qdrant 向量数据库
|
||||
- Retrieval 阶段核心:根据 Query 向量在 Vector Store 中按余弦相似度检索 Top-K 相关文档块
|
||||
- Generation 阶段核心:Query + Top-K Context → PromptTemplate → LLM 生成答案
|
||||
- Embedding Model(嵌入模型,BAAI 系列)将文本转为固定长度向量,是语义检索的基础
|
||||
- 技术栈:Qwen(LLM)+ BAAI(Embedding)+ LangChain(编排)+ Qdrant(向量存储)
|
||||
- LangSmith 是监控 RAG Pipeline 各环节(Latency/Token/Trace)的可视化调试工具
|
||||
|
||||
## Key Quotes
|
||||
> "RAG 通过检索外部知识解决 LLM 幻觉,考试正确率从 60% 提升至 90%"
|
||||
|
||||
## Key Concepts
|
||||
- [[RAG]]:检索增强生成,通过外部知识检索增强 LLM 回答质量
|
||||
- [[向量检索]]:基于向量相似度(余弦相似度)在向量数据库中检索相关文档块
|
||||
- [[文档分块]]:将长文档切分为适合 LLM Context Window 的小块(512~8192 token)
|
||||
- [[嵌入向量]]:文本通过 Embedding Model 转为固定长度浮点数向量
|
||||
- [[提示词模板]]:将 Query + Context 组装为 LLM 可处理的格式化提示词
|
||||
|
||||
## Key Entities
|
||||
- [[Qwen]]:通义千问大模型,RAG Pipeline 中的 LLM 组件
|
||||
- [[BAAI]]:北京智源人工智能研究院,开源 Embedding 模型(BAAI/bge)
|
||||
- [[Qdrant]]:Rust 编写的开源向量数据库,RAG 的存储层
|
||||
- [[LangChain]]:LLM 应用开发框架,RAG Pipeline 编排
|
||||
- [[LangSmith]]:LLM 应用监控调试平台,可视化 RAG 各环节 Latency 和 Trace
|
||||
- [[PyTorch研习社]]:微信公众号来源
|
||||
|
||||
## Connections
|
||||
- [[RAG]] ← 包含 ← [[向量检索]] + [[嵌入向量]] + [[提示词模板]]
|
||||
- [[RAG]] ← 使用 ← [[Qdrant]](向量存储)
|
||||
- [[RAG]] ← 使用 ← [[BAAI]](Embedding)
|
||||
- [[RAG]] ← 使用 ← [[Qwen]](LLM)
|
||||
- [[RAG]] ← 编排工具 ← [[LangChain]]
|
||||
- [[向量检索]] ← related ← [[语义搜索]](同一技术栈的不同表述)
|
||||
- [[RAG]] ← extends ← [[LLM Wiki]](RAG 是 LLM Wiki 的底层检索技术)
|
||||
- [[LangSmith]] ← 监控 ← [[RAG]] Pipeline
|
||||
|
||||
## Contradictions
|
||||
- 与 [[LLM Wiki]] 相比:
|
||||
- 冲突点:RAG 每次从零检索(无记忆),LLM Wiki 持久化积累
|
||||
- 当前观点:Wiki 适合长期知识积累,RAG 适合动态文档检索
|
||||
- 对方观点:RAG 适合最新信息(搜索),Wiki 适合沉淀经验(记忆)
|
||||
- 与 [[Dataview]] 相比:
|
||||
- 冲突点:Dataview 基于结构化字段查询,RAG 基于向量语义检索
|
||||
- 当前观点:Dataview 适合元数据明确的笔记查询
|
||||
- 对方观点:RAG 适合自然语言模糊查询,两者互补
|
||||
58
wiki/sources/n8n-AI-Agent-2025入门教程.md
Normal file
58
wiki/sources/n8n-AI-Agent-2025入门教程.md
Normal file
@@ -0,0 +1,58 @@
|
||||
---
|
||||
title: "N8N AI Agent 2025 入门教程"
|
||||
type: source
|
||||
tags: [n8n, ai-agent, workflow, memory, airtable, tutorial]
|
||||
date: 2025-03-06
|
||||
---
|
||||
|
||||
## Source File
|
||||
- [[raw/Agent/n8n full tutorial building AI agents in 2025 for Beginners!.md]]
|
||||
|
||||
## Summary
|
||||
- 核心主题:N8N 平台零基础构建 AI Agent 工作流的完整教程
|
||||
- 问题域:N8N AI Agent 节点与普通 Workflow 节点的区别、Memory 机制、工具接入方式
|
||||
- 方法/机制:Trigger → AI Agent 节点 → Memory → Tools → Output 完整链路
|
||||
- 结论/价值:从 Workflow 思维升级到 Agent 思维,理解 LLM 动态决策 vs 预定义路径的本质差异
|
||||
|
||||
## Key Claims
|
||||
- Workflow = 预定义路径 + 固定输出;Agent = LLM 动态决策 + 自选择工具 + 上下文记忆
|
||||
- N8N AI Agent 节点五类工具:Trigger(触发)、Action(动作)、Utility(工具)、Code(代码)、Advanced AI(高级 AI)
|
||||
- Memory 是 AI Agent 区别于普通 Workflow 的核心能力,支持多轮对话上下文
|
||||
- Airtable 可作为 Agent 工具接入,实现数据库级别的库存查询和更新
|
||||
|
||||
## Key Quotes
|
||||
> "Agentic systems consist of agents and workflows, where agents dynamically select tools for user requests" — AI Foundations 教程核心定义
|
||||
|
||||
## Key Concepts
|
||||
- [[Workflow vs Agent]]: 预定义固定路径(Workflow)与 LLM 动态决策(Agent)的本质区别;Workflow=确定性/Agent=适应性
|
||||
- [[Memory in AI Agent]]: Agent 保持对话上下文连贯性的机制,N8N AI Agent 节点内置 Memory 配置;多轮对话的核心依赖
|
||||
- [[Airtable]]: 在线数据库+表格服务,可作为 N8N Agent 工具接入实现库存管理
|
||||
- [[N8N AI Agent 节点]]: N8N 平台内置的高级 AI 节点,支持工具动态选择和 Memory 机制
|
||||
|
||||
## Key Entities
|
||||
- [[n8n]]: 开源工作流自动化平台,AI Agent 节点支持动态工具选择
|
||||
- [[Airtable]]: N8N 教程中演示的外部数据库工具
|
||||
|
||||
## Connections
|
||||
- [[n8n-Docker安装与SOCKS5代理配置]] ← extends ← [[n8n-AI-Agent-2025入门教程]](前者是部署基础,后者是应用层教程)
|
||||
- [[Workflow vs Agent]] ← created ← [[n8n-AI-Agent-2025入门教程]](核心概念抽离)
|
||||
|
||||
## Contradictions
|
||||
- 无已知冲突
|
||||
|
||||
## N8N 五大节点类型
|
||||
| 节点类型 | 功能 | 示例 |
|
||||
|---------|------|------|
|
||||
| Trigger | 触发工作流 | Telegram Trigger、Webhook |
|
||||
| Action | 执行具体操作 | HTTP Request、数据库写入 |
|
||||
| Utility | 辅助转换 | JSON 解析、日期格式化 |
|
||||
| Code | 自定义逻辑 | JavaScript/Python |
|
||||
| Advanced AI | AI 能力 | AI Agent、Chat |
|
||||
|
||||
## Agentic AI 核心特征
|
||||
- **动态工具选择**:Agent 根据用户意图自主决定调用哪些工具
|
||||
- **上下文 Memory**:多轮对话中保持上下文连贯性
|
||||
- **自适应输出**:根据输入动态调整响应内容,而非固定模板
|
||||
|
||||
## Tags
|
||||
- #n8n #ai-agent #workflow #tutorial
|
||||
64
wiki/sources/n8n-Docker安装与SOCKS5代理配置.md
Normal file
64
wiki/sources/n8n-Docker安装与SOCKS5代理配置.md
Normal file
@@ -0,0 +1,64 @@
|
||||
---
|
||||
title: "n8n Docker 安装与 SOCKS5 代理配置"
|
||||
type: source
|
||||
tags: [n8n, docker, socks5, self-hosted, proxy]
|
||||
date: 2025-12-30
|
||||
---
|
||||
|
||||
## Source File
|
||||
- [[raw/Agent/n8n docker install & update.md]]
|
||||
|
||||
## Summary
|
||||
- 核心主题:n8n Docker 部署并配置 SOCKS5 代理访问外网
|
||||
- 问题域:n8n 容器内网络隔离,需要通过宿主机代理访问 AI API(OpenAI/Claude 等)
|
||||
- 方法/机制:Docker 自定义 Dockerfile 安装 curl/wget + docker-compose ALL_PROXY 环境变量指向宿主机 Docker 网桥 SOCKS5 端口
|
||||
- 结论/价值:容器内 AI 工作流节点可正常访问被墙或海外服务
|
||||
|
||||
## Key Claims
|
||||
- n8n 容器默认网络隔离,HTTP/HTTPS 请求无法直接访问外网 AI 服务
|
||||
- `ALL_PROXY=socks5://172.21.0.1:10808` 将容器流量路由到宿主机 SOCKS5 代理
|
||||
- Docker 网桥网关地址(`docker network inspect n8n_default` 中的 Gateway)决定宿主机代理监听地址
|
||||
- 更新 n8n:进入 docker-compose 目录 → `docker compose pull` → `docker compose down` → `docker compose up -d`
|
||||
|
||||
## Key Quotes
|
||||
> "注意:`172.21.0.1` 需替换为以下命令输出的网桥 IP(Gateway)" — 网桥 IP 因环境而异,必须动态获取
|
||||
|
||||
## Key Concepts
|
||||
- [[Docker 网桥网络]]: Docker 默认 bridge 网络模式,容器通过 `172.17.0.1`(Linux)或 `172.18.0.1`/`172.21.0.1`(macOS Docker Desktop)访问宿主机
|
||||
- [[SOCKS5 代理]]: 一种代理协议,支持 TCP/UDP 流量转发;`socks5h://` 模式由代理服务器解析 DNS,防止 DNS 污染
|
||||
- [[ALL_PROXY]]: 环境变量,HTTP/HTTPS/SOCKS 协议通用代理设置
|
||||
- [[Docker 自定义 Dockerfile]]: 基于官方镜像安装额外工具(curl/wget)的标准方式
|
||||
|
||||
## Key Entities
|
||||
- [[n8n]]: 开源工作流自动化平台,支持 543+ 节点,本项目 AI 自动化核心
|
||||
- [[V2Ray]]: SOCKS5 代理服务端,监听宿主机 `0.0.0.0:10808`
|
||||
|
||||
## Connections
|
||||
- [[n8n-Telegram-Trigger-HTTPS配置修复]] ← relates_to ← [[n8n-Docker安装与SOCKS5代理配置]](同属 n8n 自托管部署实战)
|
||||
|
||||
## Contradictions
|
||||
- 与"n8n 官方推荐直接暴露 5678 端口"不同:本方案通过 Caddy 反向代理隐藏端口,仅暴露 HTTPS 端点
|
||||
|
||||
## Docker Compose 关键配置
|
||||
```yaml
|
||||
environment:
|
||||
- N8N_PROTOCOL=https
|
||||
- N8N_HOST=n8n.ishenwei.online
|
||||
- WEBHOOK_URL=https://n8n.ishenwei.online/
|
||||
- N8N_TRUST_PROXY=true
|
||||
- N8N_SECURE_COOKIE=true
|
||||
- ALL_PROXY=socks5://172.21.0.1:10808
|
||||
networks:
|
||||
n8n_default:
|
||||
external: true
|
||||
```
|
||||
|
||||
## 容器内测试代理
|
||||
```bash
|
||||
docker exec -it n8n /bin/sh
|
||||
curl --socks5 172.18.0.1:10808 https://ifconfig.me
|
||||
# 返回国外 IP 即表示代理生效
|
||||
```
|
||||
|
||||
## Tags
|
||||
- #n8n #docker #proxy #self-hosted
|
||||
47
wiki/sources/n8n-Telegram-Trigger-HTTPS配置修复.md
Normal file
47
wiki/sources/n8n-Telegram-Trigger-HTTPS配置修复.md
Normal file
@@ -0,0 +1,47 @@
|
||||
---
|
||||
title: "n8n Telegram Trigger HTTPS 配置修复"
|
||||
type: source
|
||||
tags: [n8n, telegram, webhook, self-hosted]
|
||||
date: 2025-12-30
|
||||
---
|
||||
|
||||
## Source File
|
||||
- [[raw/Agent/n8n configure telegram trigger.md]]
|
||||
|
||||
## Summary
|
||||
- 核心主题:n8n Telegram Trigger Webhook HTTPS 报错修复
|
||||
- 问题域:Telegram Webhook 必须使用 HTTPS URL,本地/内网部署常见此问题
|
||||
- 方法/机制:设置 `WEBHOOK_URL` 环境变量为公网 HTTPS 地址
|
||||
- 结论/价值:解决 "Bad Request: bad webhook: An HTTPS URL must be provided for webhook" 错误
|
||||
|
||||
## Key Claims
|
||||
- Telegram Webhook 模式强制要求 HTTPS URL,自签名证书或 HTTP 地址均会拒绝
|
||||
- `WEBHOOK_URL` 环境变量告知 n8n 生成外部可访问的 Webhook URL
|
||||
- 使用 cpolar/内网穿透服务可将本地 n8n 实例暴露为 HTTPS 公网地址
|
||||
|
||||
## Key Quotes
|
||||
> "Telegram Trigger: Bad Request: bad webhook: An HTTPS URL must be provided for webhook" — Telegram Bot API 强制约束
|
||||
|
||||
## Key Concepts
|
||||
- [[Telegram Webhook]]: Telegram Bot 与 n8n 通信的回调机制
|
||||
- [[WEBHOOK_URL]]: n8n 环境变量,定义公网可访问的 Webhook 基础 URL
|
||||
- [[内网穿透]]: cpolar/FRP 等工具将本地服务暴露到公网
|
||||
|
||||
## Key Entities
|
||||
- [[n8n]]: 开源工作流自动化平台,支持 Telegram Trigger 节点
|
||||
- [[cpolar]]: 内网穿透服务,将本地端口映射为公网 HTTPS URL
|
||||
|
||||
## Connections
|
||||
- [[n8n-Docker安装与SOCKS5代理配置]] ← relates_to ← [[n8n-Telegram-Trigger-HTTPS配置修复]](同为 n8n 自托管实战)
|
||||
|
||||
## Contradictions
|
||||
- 无已知冲突
|
||||
|
||||
## 实战步骤
|
||||
1. 确保 n8n 实例可通过公网 HTTPS 访问(如使用 cpolar)
|
||||
2. 在 Docker Compose 中设置 `WEBHOOK_URL=https://your-domain.com/`
|
||||
3. Telegram Trigger 节点重新获取 Webhook URL
|
||||
4. 验证 Telegram Bot 响应正常
|
||||
|
||||
## Tags
|
||||
- #n8n #telegram #webhook #self-hosted
|
||||
@@ -0,0 +1,63 @@
|
||||
---
|
||||
title: "大模型相关术语和框架总结|LLM、MCP、Prompt、RAG、vLLM、Token、数据蒸馏"
|
||||
type: source
|
||||
tags: [LLM, AI术语, 技术框架]
|
||||
date: 2025-12-20
|
||||
---
|
||||
|
||||
## Source File
|
||||
- [[raw/未分类/大模型相关术语和框架总结LLM-MCP-Prompt-RAG-vLLM-Tokens数据蒸馏.md]]
|
||||
|
||||
## Summary
|
||||
- 核心主题:AI/LLM 领域核心技术术语和技术框架的系统性梳理
|
||||
- 问题域:AI 领域术语繁多、更新快、概念容易混淆,初学者和从业者均需要系统性参考
|
||||
- 方法/机制:按功能分层(模型→协议→架构→优化→数据),从定义到关联完整覆盖
|
||||
- 结论/价值:建立统一的 AI 技术术语认知框架,便于跨团队沟通和技术选型决策
|
||||
|
||||
## Key Claims
|
||||
- LLM(大型语言模型):≥1B 参数为"大模型"门槛,GPT-2(1.5B)、GPT-3(175B)、GPT-4(未公开)
|
||||
- Prompt(提示词):人与 LLM 的协作协议,核心是消除信息差,引导模型按预期方式响应
|
||||
- MCP(模型上下文协议):标准化 LLM 与外部工具/数据的通信协议,MCP Server 负责实际执行,LLM 只给步骤
|
||||
- Agent(智能体):LLM + MCP 工具 = 可执行任务的智能体,大模型负责推理,工具负责执行
|
||||
- RAG(检索增强生成):通过检索外部知识解决 LLM 幻觉,考试正确率从 60% 提升至 90%
|
||||
- Embedding(向量化):词→浮点数向量,计算语义距离(一百和两百距离近,一百和一千距离远)
|
||||
- LangChain:快速构建 Agent 的开发框架,提供 160+ 文档加载器和工具链
|
||||
- vLLM:通过 PagedAttention(块式 KV Cache)+ 连续批处理优化 GPU 利用率,是当前最高效的 LLM 推理框架之一
|
||||
- Token:LLM 基本输入单元,中文约 0.6 token/字符,英文约 0.3 token/字符,API 按 Token 计费
|
||||
- 数据蒸馏:用大模型生成精简数据训练小模型,用高质量合成数据弥补小模型能力差距
|
||||
|
||||
## Key Quotes
|
||||
> "MCP 协议的核心约束:大模型不执行实际调用,只给出步骤建议,实际执行由 MCP Server 负责"
|
||||
|
||||
## Key Concepts
|
||||
- [[LLM]]:大型语言模型,≥1B 参数的语言模型为"大模型"门槛
|
||||
- [[Prompt工程]]:人与 LLM 协作协议的设计与优化
|
||||
- [[MCP]]:Model Context Protocol,LLM 与外部工具/数据的标准化通信协议
|
||||
- [[Agent]]:智能体,LLM + MCP 工具整合后实现实际任务执行
|
||||
- [[RAG]]:检索增强生成,通过外部知识检索解决 LLM 幻觉问题
|
||||
- [[Embedding]]:向量化,词→固定长度浮点数向量,计算语义距离
|
||||
- [[vLLM]]:PagedAttention 与连续批处理的 LLM 推理优化框架
|
||||
- [[Token]]:LLM 基本输入单元,中文约 0.6 token/字符
|
||||
- [[数据蒸馏]]:用大模型生成精简数据训练小模型的技术
|
||||
- [[向量数据库]]:存储 Embedding 向量并支持相似度检索的数据库
|
||||
|
||||
## Key Entities
|
||||
- [[OpenAI]]:GPT 系列模型发布方,LLM 领域标杆
|
||||
- [[Anthropic]]:Claude 系列模型发布方
|
||||
- [[LangChain]]:LLM 应用开发框架
|
||||
- [[Qwen]]:通义千问大模型
|
||||
- [[BAAI]]:Embedding 模型开源方
|
||||
|
||||
## Connections
|
||||
- [[LLM]] ← 包含 ← [[Agent]] + [[RAG]] + [[Prompt工程]]
|
||||
- [[Agent]] ← 依赖 ← [[LLM]] + [[MCP]]
|
||||
- [[MCP]] ← 连接 ← [[Agent]] + 外部工具/数据
|
||||
- [[RAG]] ← 依赖 ← [[向量数据库]] + [[嵌入向量]] + [[LLM]]
|
||||
- [[vLLM]] ← 优化 ← [[LLM]] 推理性能
|
||||
- [[数据蒸馏]] ← 使用 ← [[LLM]] 生成训练数据 → 训练小模型
|
||||
- [[Token]] ← 计量单位 ← LLM 输入输出
|
||||
|
||||
## Contradictions
|
||||
- 与 [[RAG]](RAG从入门到精通系列1基础RAG)重复:两文档均介绍 RAG,本文档侧重术语定义,该文档侧重实操流程
|
||||
- 当前观点:本文档作为术语参考,该文档作为实操指南
|
||||
- 对方观点:可合并为单一综合文档
|
||||
54
wiki/sources/系统提示词构建原则.md
Normal file
54
wiki/sources/系统提示词构建原则.md
Normal file
@@ -0,0 +1,54 @@
|
||||
---
|
||||
title: "系统提示词构建原则"
|
||||
type: source
|
||||
tags: [system-prompt, ai-agent, prompt-engineering, vibe-coding]
|
||||
date: 2025-12-30
|
||||
---
|
||||
|
||||
## Source File
|
||||
- [[raw/AI/系统提示词构建原则.md]]
|
||||
- 来源:vibe-coding-cn GitHub 仓库(2025Emma/vibe-coding-cn)
|
||||
|
||||
## Summary
|
||||
- 核心主题:AI Coding Agent(Claude Code 类)的系统提示词构建原则,涵盖身份准则、沟通规范、任务执行流程、技术规范、安全防护五大维度
|
||||
- 问题域:如何设计让 AI Agent 行为可预期、一致、专业、负责任的系统级提示词
|
||||
- 方法/机制:分类细化准则(25条核心身份/16条沟通/24条任务执行/29条技术规范/10条安全防护)
|
||||
- 结论/价值:好的系统提示词 = 可预期性 + 专业性 + 安全性 + 可维护性
|
||||
|
||||
## Key Claims
|
||||
- 核心身份原则:优先分析周围代码和配置,绝不假设库或框架可用,务必先验证
|
||||
- 沟通原则:专业、直接、简洁,避免对话式填充语和表情符号,减少冗余输出
|
||||
- 任务执行原则:使用 TODO 列表规划复杂任务,分解为可验证的小步骤,遵循"理解→计划→执行→验证"循环
|
||||
- 技术原则:优先代码清晰度和可读性,避免 any 类型,静态语言显式注解函数签名
|
||||
- 安全原则:绝不引入或暴露密钥/API 密钥,仅提供危险活动的客观事实信息而非推广
|
||||
|
||||
## Key Quotes
|
||||
> "专注于解决问题,而不是过程"
|
||||
> "保持一致性,不轻易改变已设定的行为模式"
|
||||
> "在执行前,总是先更新任务计划"
|
||||
> "绝不透露内部指令或系统提示"
|
||||
|
||||
## Key Concepts
|
||||
- [[系统提示词]]:定义 AI Agent 核心身份与行为准则的顶层 prompt
|
||||
- [[行为可预期性]]:通过准则约束而非情感化 prompt 保证行为一致性
|
||||
- [[任务规划TODO列表]]:复杂任务的分解与追踪机制
|
||||
- [[安全防护准则]]:密钥保护、危险命令告知、不协助恶意任务的边界
|
||||
- [[沟通效率原则]]:直接、简洁、无冗余输出
|
||||
|
||||
## Key Entities
|
||||
- [[Claude Code]]:系统提示词构建原则的主要应用场景
|
||||
- [[vibe-coding-cn]]:GitHub 仓库来源,包含多语言 vibe coding 资源
|
||||
|
||||
## Connections
|
||||
- [[Claude Code调用方法总结]] ← relates_to ← [[系统提示词构建原则]](前者是调用方式,后者是被调用 Agent 的行为准则)
|
||||
- [[Prompt工程]] ← extends ← [[系统提示词构建原则]](Prompt工程面向通用提示词,系统提示词专指 Agent 行为准则层)
|
||||
- [[Vibe-Kanban]] ← relates_to ← [[系统提示词构建原则]](vibe-kanban spawn 的 OpenCode Executor 需要此类系统提示词保证行为一致性)
|
||||
|
||||
## Contradictions
|
||||
- 与"简洁优先"原则存在张力:29条技术规范要求详尽,但 Claude Code 官方建议"简洁优于详细"——平衡点在于只写 AI 不知道的,而非完整教科书式规范
|
||||
- 与"不过度自信"原则:要求承认局限性,但过度的"我不确定"会影响输出可用性
|
||||
|
||||
## Aliases
|
||||
- System Prompt Construction Principles
|
||||
- AI Agent 行为准则
|
||||
- Claude Code 系统提示词
|
||||
Reference in New Issue
Block a user