Auto-sync

2026-04-14 11:58:16 +08:00
parent abc9369d1f
commit be67293b60
20 changed files with 2246 additions and 9 deletions

BIN
.DS_Store vendored

Binary file not shown.

219
AGENTS.md Normal file

@@ -0,0 +1,219 @@
# LLM Wiki Agent — Schema & Workflow Instructions
This wiki is maintained entirely by your coding agent. No API key or Python scripts needed — just open this repo in Codex, OpenCode, or any agent that reads this file, and talk to it.
## How to Use
Describe what you want in plain English:
- *"Ingest this file: raw/papers/my-paper.md"*
- *"What does the wiki say about transformer models?"*
- *"Check the wiki for orphan pages and contradictions"*
- *"Build the knowledge graph"*
Or use shorthand triggers:
- `ingest <file>` → runs the Ingest Workflow
- `query: <question>` → runs the Query Workflow
- `lint` → runs the Lint Workflow
- `build graph` → runs the Graph Workflow
---
## Directory Layout
```
raw/          # Immutable source documents — never modify these
wiki/         # Agent owns this layer entirely
  index.md    # Catalog of all pages — update on every ingest
  log.md      # Append-only chronological record
  overview.md # Living synthesis across all sources
  sources/    # One summary page per source document
  entities/   # People, companies, projects, products
  concepts/   # Ideas, frameworks, methods, theories
  syntheses/  # Saved query answers
graph/        # Auto-generated graph data
tools/        # Optional standalone Python scripts (require ANTHROPIC_API_KEY)
```
---
## Page Format
Every wiki page uses this frontmatter:
```yaml
---
title: "Page Title"
type: source | entity | concept | synthesis
tags: []
sources: [] # list of source slugs that inform this page
last_updated: YYYY-MM-DD
---
```
Use `[[PageName]]` wikilinks to link to other wiki pages.
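
To make the page format concrete, here is a minimal sketch of reading the frontmatter and wikilinks with only the Python standard library. The helper names `parse_frontmatter` and `find_wikilinks` are hypothetical and not part of this repo, and the parser assumes the flat `key: value` fields shown above rather than general YAML.

```python
import re


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return the flat key/value pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            # Drop a trailing "# comment" and any surrounding double quotes.
            value = value.split(" #", 1)[0].strip().strip('"')
            fields[key.strip()] = value
    return fields


def find_wikilinks(text: str) -> list[str]:
    """Return the sorted, de-duplicated targets of [[PageName]] links."""
    return sorted(set(re.findall(r"\[\[([^\]]+)\]\]", text)))


page = '''---
title: "Page Title"
type: concept
tags: []
sources: []  # list of source slugs
last_updated: 2026-04-14
---
See [[OpenAI]] and [[RAG]]; [[OpenAI]] again.
'''
meta = parse_frontmatter(page)
links = find_wikilinks(page)
```

Values are kept as plain strings (so `tags` comes back as the text `[]`); a real YAML parser would be the safer choice once fields become nested.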
---
## Ingest Workflow
Triggered by: *"ingest <file>"*
Steps (in order):
1. Read the source document fully
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
3. Write `wiki/sources/<slug>.md` — use the source page format below
4. Update `wiki/index.md` — add entry under Sources section
5. Update `wiki/overview.md` — revise synthesis if warranted
6. Update/create entity pages for key people, companies, projects mentioned
7. Update/create concept pages for key ideas and frameworks discussed
8. Flag any contradictions with existing wiki content
9. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`
### Source Page Format
```markdown
---
title: "Source Title"
type: source
tags: []
date: YYYY-MM-DD
source_file: raw/...
---
## Summary
2-4 sentence summary.
## Key Claims
- Claim 1
- Claim 2
## Key Quotes
> "Quote here" — context
## Connections
- [[EntityName]] — how they relate
- [[ConceptName]] — how it connects
## Contradictions
- Contradicts [[OtherPage]] on: ...
```
### Domain-Specific Templates
If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:
#### Diary / Journal Template
```markdown
---
title: "YYYY-MM-DD Diary"
type: source
tags: [diary]
date: YYYY-MM-DD
---
## Event Summary
...
## Key Decisions
...
## Energy & Mood
...
## Connections
...
## Shifts & Contradictions
...
```
#### Meeting Notes Template
```markdown
---
title: "Meeting Title"
type: source
tags: [meeting]
date: YYYY-MM-DD
---
## Goal
...
## Key Discussions
...
## Decisions Made
...
## Action Items
...
```
---
## Query Workflow
Triggered by: *"query: <question>"*
Steps:
1. Read `wiki/index.md` to identify relevant pages
2. Read those pages
3. Synthesize an answer with inline citations as `[[PageName]]` wikilinks
4. Ask the user if they want the answer filed as `wiki/syntheses/<slug>.md`
---
## Lint Workflow
Triggered by: *"lint"*
Check for:
- **Orphan pages** — wiki pages with no inbound `[[links]]` from other pages
- **Broken links** — `[[WikiLinks]]` pointing to pages that don't exist
- **Contradictions** — claims that conflict across pages
- **Stale summaries** — pages not updated after newer sources
- **Missing entity pages** — entities mentioned in 3+ pages but lacking their own page
- **Data gaps** — questions the wiki can't answer; suggest new sources
Output a lint report and ask if the user wants it saved to `wiki/lint-report.md`.
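
The two mechanical checks in this list (orphan pages and broken links) can be sketched in a few lines. This is an illustration only, not the repo's lint tool: `lint_links` is a hypothetical helper, pages are passed in as an in-memory mapping, and the regex assumes Obsidian-style links where anything after `|` or `#` is an alias or heading.

```python
import re

# Capture the link target up to the closing bracket, an alias "|" or a heading "#".
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint_links(pages: dict[str, str]) -> tuple[list[str], list[tuple[str, str]]]:
    """pages maps a page name (e.g. 'OpenAI') to its markdown text.

    Returns (orphans, broken): pages with no inbound links, and
    (source_page, missing_target) pairs.
    """
    inbound = {name: 0 for name in pages}
    broken = []
    for name, text in pages.items():
        for target in {t.strip() for t in WIKILINK.findall(text)}:
            if target == name:
                continue  # self-links do not count as inbound
            if target in pages:
                inbound[target] += 1
            else:
                broken.append((name, target))
    orphans = sorted(n for n, count in inbound.items() if count == 0)
    return orphans, sorted(broken)


pages = {
    "OpenAI": "Founded by [[SamAltman]] and others.",
    "SamAltman": "CEO of [[OpenAI]]; see also [[Anthropic]].",
    "RAG": "Retrieval-augmented generation.",
}
orphans, broken = lint_links(pages)
```

The remaining checks (contradictions, stale summaries, data gaps) need judgment about content and are left to the agent.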
---
## Graph Workflow
Triggered by: *"build graph"*
First try: `python tools/build_graph.py --open`
If Python/deps unavailable, build manually:
1. Search for all `[[wikilinks]]` across wiki pages
2. Build nodes (one per page) and edges (one per link)
3. Infer implicit relationships not captured by wikilinks — tag `INFERRED` with confidence score; low confidence → `AMBIGUOUS`
4. Write `graph/graph.json` with `{nodes, edges, built: date}`
5. Write `graph/graph.html` as a self-contained vis.js visualization
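
As a sketch of steps 2 to 4, the snippet below tags edges and assembles the `{nodes, edges, built}` structure. The cutoff `AMBIGUOUS_BELOW`, the helper `make_edge`, and the `from`/`to` key names are assumptions made for illustration; the workflow above does not specify them.

```python
import json
from datetime import date
from typing import Optional

AMBIGUOUS_BELOW = 0.5  # hypothetical cutoff; the workflow does not fix a value


def make_edge(source: str, target: str, confidence: Optional[float] = None) -> dict:
    """Explicit wikilinks carry no confidence; inferred edges are tagged by score."""
    if confidence is None:
        edge_type = "EXTRACTED"
    elif confidence < AMBIGUOUS_BELOW:
        edge_type = "AMBIGUOUS"
    else:
        edge_type = "INFERRED"
    edge = {"from": source, "to": target, "type": edge_type}
    if confidence is not None:
        edge["confidence"] = confidence
    return edge


edges = [
    make_edge("entities/OpenAI", "entities/SamAltman"),
    make_edge("concepts/RAG", "concepts/ContextWindow", confidence=0.8),
    make_edge("concepts/RAG", "entities/OpenAI", confidence=0.2),
]
# One node per page that appears at either end of an edge.
nodes = sorted({e["from"] for e in edges} | {e["to"] for e in edges})
graph = {
    "nodes": [{"id": n} for n in nodes],
    "edges": edges,
    "built": date.today().isoformat(),
}
graph_json = json.dumps(graph, indent=2)
```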
---
## Naming Conventions
- Source slugs: `kebab-case` matching source filename
- Entity pages: `TitleCase.md` (e.g. `OpenAI.md`, `SamAltman.md`)
- Concept pages: `TitleCase.md` (e.g. `ReinforcementLearning.md`, `RAG.md`)
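
The conventions above could be applied with helpers along these lines. Both function names are hypothetical, and the slug helper assumes ASCII filenames (other characters collapse into hyphens).

```python
import re


def source_slug(filename: str) -> str:
    """Kebab-case slug from a source filename, with directory and extension dropped."""
    stem = filename.rsplit("/", 1)[-1]
    if "." in stem:
        stem = stem.rsplit(".", 1)[0]
    return re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-")


def title_case_page(name: str) -> str:
    """TitleCase page filename; existing capitals (OpenAI, RAG) are preserved."""
    words = re.findall(r"[^\W_]+", name)  # letters and digits, Unicode-aware
    return "".join(w[:1].upper() + w[1:] for w in words) + ".md"


slug = source_slug("raw/papers/Attention Is All You Need.md")
entity = title_case_page("sam altman")
concept = title_case_page("reinforcement learning")
```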
## Index Format
```markdown
# Wiki Index
## Overview
- [Overview](overview.md) — living synthesis
## Sources
- [Source Title](sources/slug.md) — one-line summary
## Entities
- [Entity Name](entities/EntityName.md) — one-line description
## Concepts
- [Concept Name](concepts/ConceptName.md) — one-line description
## Syntheses
- [Analysis Title](syntheses/slug.md) — what question it answers
```
## Log Format
`## [YYYY-MM-DD] <operation> | <title>`
Operations: `ingest`, `query`, `lint`, `graph`

230
CLAUDE.md Normal file

@@ -0,0 +1,230 @@
# LLM Wiki Agent — Schema & Workflow Instructions
This wiki is maintained entirely by Claude Code. No API key or Python scripts needed — just open this repo in Claude Code and talk to it.
## Slash Commands (Claude Code)
| Command | What to say |
|---|---|
| `/wiki-ingest` | `ingest raw/my-article.md` |
| `/wiki-query` | `query: what are the main themes?` |
| `/wiki-lint` | `lint the wiki` |
| `/wiki-graph` | `build the knowledge graph` |
Or just describe what you want in plain English:
- *"Ingest this file: raw/papers/attention-is-all-you-need.md"*
- *"What does the wiki say about transformer models?"*
- *"Check the wiki for orphan pages and contradictions"*
- *"Build the graph and show me what's connected to RAG"*
Claude Code reads this file automatically and follows the workflows below.
---
## Directory Layout
```
raw/          # Immutable source documents — never modify these
wiki/         # Claude owns this layer entirely
  index.md    # Catalog of all pages — update on every ingest
  log.md      # Append-only chronological record
  overview.md # Living synthesis across all sources
  sources/    # One summary page per source document
  entities/   # People, companies, projects, products
  concepts/   # Ideas, frameworks, methods, theories
  syntheses/  # Saved query answers
graph/        # Auto-generated graph data
tools/        # Optional standalone Python scripts (require ANTHROPIC_API_KEY)
```
---
## Page Format
Every wiki page uses this frontmatter:
```yaml
---
title: "Page Title"
type: source | entity | concept | synthesis
tags: []
sources: [] # list of source slugs that inform this page
last_updated: YYYY-MM-DD
---
```
Use `[[PageName]]` wikilinks to link to other wiki pages.
---
## Ingest Workflow
Triggered by: *"ingest <file>"* or `/wiki-ingest`
Steps (in order):
1. Read the source document fully using the Read tool
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
3. Write `wiki/sources/<slug>.md` — use the source page format below
4. Update `wiki/index.md` — add entry under Sources section
5. Update `wiki/overview.md` — revise synthesis if warranted
6. Update/create entity pages for key people, companies, projects mentioned
7. Update/create concept pages for key ideas and frameworks discussed
8. Flag any contradictions with existing wiki content
9. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`
### Source Page Format
```markdown
---
title: "Source Title"
type: source
tags: []
date: YYYY-MM-DD
source_file: raw/...
---
## Summary
2-4 sentence summary.
## Key Claims
- Claim 1
- Claim 2
## Key Quotes
> "Quote here" — context
## Connections
- [[EntityName]] — how they relate
- [[ConceptName]] — how it connects
## Contradictions
- Contradicts [[OtherPage]] on: ...
```
### Domain-Specific Templates
If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:
#### Diary / Journal Template
```markdown
---
title: "YYYY-MM-DD Diary"
type: source
tags: [diary]
date: YYYY-MM-DD
---
## Event Summary
...
## Key Decisions
...
## Energy & Mood
...
## Connections
...
## Shifts & Contradictions
...
```
#### Meeting Notes Template
```markdown
---
title: "Meeting Title"
type: source
tags: [meeting]
date: YYYY-MM-DD
---
## Goal
...
## Key Discussions
...
## Decisions Made
...
## Action Items
...
```
---
## Query Workflow
Triggered by: *"query: <question>"* or `/wiki-query`
Steps:
1. Read `wiki/index.md` to identify relevant pages
2. Read those pages with the Read tool
3. Synthesize an answer with inline citations as `[[PageName]]` wikilinks
4. Ask the user if they want the answer filed as `wiki/syntheses/<slug>.md`
---
## Lint Workflow
Triggered by: *"lint the wiki"* or `/wiki-lint`
Use Grep and Read tools to check for:
- **Orphan pages** — wiki pages with no inbound `[[links]]` from other pages
- **Broken links** — `[[WikiLinks]]` pointing to pages that don't exist
- **Contradictions** — claims that conflict across pages
- **Stale summaries** — pages not updated after newer sources
- **Missing entity pages** — entities mentioned in 3+ pages but lacking their own page
- **Data gaps** — questions the wiki can't answer; suggest new sources
Output a lint report and ask if the user wants it saved to `wiki/lint-report.md`.
---
## Graph Workflow
Triggered by: *"build the knowledge graph"* or `/wiki-graph`
When the user asks to build the graph, run `tools/build_graph.py` which:
- Pass 1: Parses all `[[wikilinks]]` → deterministic `EXTRACTED` edges
- Pass 2: Infers implicit relationships → `INFERRED` edges with confidence scores
- Runs Louvain community detection
- Outputs `graph/graph.json` + `graph/graph.html`
If the user doesn't have Python/dependencies set up, instead generate the graph data manually:
1. Use Grep to find all `[[wikilinks]]` across wiki pages
2. Build a node/edge list
3. Write `graph/graph.json` directly
4. Write `graph/graph.html` using the vis.js template
---
## Naming Conventions
- Source slugs: `kebab-case` matching source filename
- Entity pages: `TitleCase.md` (e.g. `OpenAI.md`, `SamAltman.md`)
- Concept pages: `TitleCase.md` (e.g. `ReinforcementLearning.md`, `RAG.md`)
- Source pages: `kebab-case.md`
## Index Format
```markdown
# Wiki Index
## Overview
- [Overview](overview.md) — living synthesis
## Sources
- [Source Title](sources/slug.md) — one-line summary
## Entities
- [Entity Name](entities/EntityName.md) — one-line description
## Concepts
- [Concept Name](concepts/ConceptName.md) — one-line description
## Syntheses
- [Analysis Title](syntheses/slug.md) — what question it answers
```
## Log Format
Each entry starts with `## [YYYY-MM-DD] <operation> | <title>` so it's grep-parseable:
```
grep "^## \[" wiki/log.md | tail -10
```
Operations: `ingest`, `query`, `lint`, `graph`
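
For anything beyond a quick `grep`, the same entry format can be parsed in Python. This is a minimal sketch; `parse_log` and the sample entries are made up for illustration.

```python
import re
from collections import Counter

# Matches headings of the form: ## [YYYY-MM-DD] <operation> | <title>
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$", re.MULTILINE)


def parse_log(text: str) -> list[tuple[str, str, str]]:
    """Return (date, operation, title) for every log heading in the text."""
    return [(day, op, title.strip()) for day, op, title in ENTRY.findall(text)]


sample = """## [2026-04-13] ingest | Attention Is All You Need
Created 3 entity pages.
## [2026-04-14] query | Main themes
## [2026-04-14] lint | Weekly check
"""
entries = parse_log(sample)
per_operation = Counter(op for _, op, _ in entries)
```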

175
GEMINI.md Normal file

@@ -0,0 +1,175 @@
# LLM Wiki Agent — Schema & Workflow Instructions
This wiki is maintained entirely by Gemini CLI. No API key or Python scripts needed — just open this repo with `gemini` and talk to it.
## How to Use
Describe what you want in plain English:
- *"Ingest this file: raw/papers/my-paper.md"*
- *"What does the wiki say about transformer models?"*
- *"Check the wiki for orphan pages and contradictions"*
- *"Build the knowledge graph"*
Or use shorthand triggers:
- `ingest <file>` → runs the Ingest Workflow
- `query: <question>` → runs the Query Workflow
- `lint` → runs the Lint Workflow
- `build graph` → runs the Graph Workflow
---
## Directory Layout
```
raw/          # Immutable source documents — never modify these
wiki/         # Agent owns this layer entirely
  index.md    # Catalog of all pages — update on every ingest
  log.md      # Append-only chronological record
  overview.md # Living synthesis across all sources
  sources/    # One summary page per source document
  entities/   # People, companies, projects, products
  concepts/   # Ideas, frameworks, methods, theories
  syntheses/  # Saved query answers
graph/        # Auto-generated graph data
tools/        # Optional standalone Python scripts
```
---
## Page Format
Every wiki page uses this frontmatter:
```yaml
---
title: "Page Title"
type: source | entity | concept | synthesis
tags: []
sources: []
last_updated: YYYY-MM-DD
---
```
Use `[[PageName]]` wikilinks to link to other wiki pages.
---
## Ingest Workflow
Triggered by: *"ingest <file>"*
1. Read the source document fully
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
3. Write `wiki/sources/<slug>.md` (source page format below)
4. Update `wiki/index.md` — add entry under Sources
5. Update `wiki/overview.md` — revise synthesis if warranted
6. Update/create entity and concept pages
7. Flag contradictions with existing wiki content
8. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`
### Source Page Format
```markdown
---
title: "Source Title"
type: source
tags: []
date: YYYY-MM-DD
source_file: raw/...
---
## Summary
2-4 sentence summary.
## Key Claims
- Claim 1
## Key Quotes
> "Quote here"
## Connections
- [[EntityName]] — how they relate
## Contradictions
- Contradicts [[OtherPage]] on: ...
```
### Domain-Specific Templates
If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:
#### Diary / Journal Template
```markdown
---
title: "YYYY-MM-DD Diary"
type: source
tags: [diary]
date: YYYY-MM-DD
---
## Event Summary
...
## Key Decisions
...
## Energy & Mood
...
## Connections
...
## Shifts & Contradictions
...
```
#### Meeting Notes Template
```markdown
---
title: "Meeting Title"
type: source
tags: [meeting]
date: YYYY-MM-DD
---
## Goal
...
## Key Discussions
...
## Decisions Made
...
## Action Items
...
```
---
## Query Workflow
Triggered by: *"query: <question>"*
1. Read `wiki/index.md` — identify relevant pages
2. Read those pages
3. Synthesize answer with `[[PageName]]` citations
4. Offer to save as `wiki/syntheses/<slug>.md`
---
## Lint Workflow
Triggered by: *"lint"*
Check for: orphan pages, broken links, contradictions, stale content, missing entity pages, data gaps.
---
## Graph Workflow
Triggered by: *"build graph"*
Try `python tools/build_graph.py --open` first. If unavailable, build graph.json and graph.html manually from wikilinks.
---
## Naming Conventions
- Source slugs: `kebab-case`
- Entity/Concept pages: `TitleCase.md`
## Log Format
`## [YYYY-MM-DD] <operation> | <title>`

21
LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2023 SamurAIGPT
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

251
README.md

@@ -1,12 +1,245 @@
# LLM Wiki Agent
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
**A coding agent skill.** Drop source documents into `raw/` and type `/wiki-ingest` — the agent reads them, extracts knowledge, and builds a persistent interlinked wiki. Every new source makes the wiki richer. You never write it.
> Most knowledge tools make you search your own notes. This one reads everything you've collected and writes a structured wiki that compounds over time — cross-references already built, contradictions already flagged, synthesis already done.
```
/wiki-ingest raw/papers/attention-is-all-you-need.md
```
```
wiki/
├── index.md catalog of all pages — updated on every ingest
├── log.md append-only record of every operation
├── overview.md living synthesis across all sources
├── sources/ one summary page per source document
├── entities/ people, companies, projects — auto-created
├── concepts/ ideas, frameworks, methods — auto-created
└── syntheses/ query answers filed back as wiki pages
graph/
├── graph.json persistent node/edge data (SHA256-cached)
└── graph.html interactive vis.js visualization — open in any browser
```
## Install
**Requires:** [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), or any agent that reads a config file.
```bash
git clone https://github.com/SamurAIGPT/llm-wiki-agent.git
cd llm-wiki-agent
```
Open in your agent — no API key or Python setup needed:
```bash
claude # reads CLAUDE.md + .claude/commands/
codex # reads AGENTS.md
opencode # reads AGENTS.md
gemini # reads GEMINI.md
```
## Usage
```
/wiki-ingest raw/papers/my-paper.md # ingest a source into the wiki
/wiki-ingest raw/articles/my-article.md # works on any markdown file
/wiki-query "what are the main themes?" # synthesize answer from wiki pages
/wiki-query "how does X relate to Y?" # with [[wikilink]] citations
/wiki-lint # find orphans, contradictions, gaps
/wiki-graph # build graph.html from all wikilinks
```
Plain English also works with any agent:
```
"Ingest this paper: raw/papers/llama2.md"
"What does the wiki say about attention mechanisms?"
"Check for contradictions across sources"
"Build the knowledge graph and tell me the most connected nodes"
```
Works with any markdown source — articles, papers, book chapters, meeting notes, journal entries, research summaries.
## What You Get
**Persistent wiki** — structured markdown pages that accumulate across sessions. Unlike chat, nothing is lost.
**Entity pages** — auto-created for every person, company, or project mentioned across sources. Updated each time a new source references them.
**Concept pages** — auto-created for every key idea or framework. Cross-referenced to every source that discusses them.
**Living overview** — `wiki/overview.md` is revised on every ingest to reflect the current synthesis across everything you've read.
**Contradiction flags** — when a new source contradicts an existing claim, it's flagged at ingest time, not buried until query time.
**Knowledge graph** — `graph.html` shows every wiki page as a node, every `[[wikilink]]` as an edge, and Claude-inferred implicit relationships as dotted edges. Community detection clusters related topics.
**Lint reports** — orphan pages, broken links, missing entity pages, data gaps with suggested sources to fill them.
## Use Cases
### Research
Going deep on a topic over weeks — reading papers, articles, reports.
```
/wiki-ingest raw/papers/attention-is-all-you-need.md
/wiki-ingest raw/papers/llama2.md
/wiki-ingest raw/papers/rag-survey.md
# Wiki builds entity pages (Meta AI, Google Brain) and
# concept pages (Attention, RLHF, Context Window) automatically.
/wiki-query "What are the main approaches to reducing hallucination?"
/wiki-query "How has context window size evolved across models?"
/wiki-lint
# → "No sources on mixture-of-experts — consider the Mixtral paper"
```
By the end you have a structured, interlinked reference — not a folder of PDFs you'll never reopen.
---
### Reading a Book
File each chapter as you go. Build out pages for characters, themes, arguments.
```
/wiki-ingest raw/book/chapter-01.md
/wiki-ingest raw/book/chapter-02.md
# Wiki creates entity and theme pages automatically.
/wiki-query "How has the protagonist's motivation evolved?"
/wiki-query "What contradictions exist in the author's argument so far?"
/wiki-graph # → graph.html shows every character/theme and how they connect
```
Think fan wikis like Tolkien Gateway — built as you read, with the agent doing all the cross-referencing.
---
### Personal Knowledge Base
Track goals, health, habits, self-improvement — file journal entries, articles, podcast notes.
```
/wiki-ingest raw/journal/2026-01-week1.md
/wiki-ingest raw/articles/huberman-sleep-protocol.md
/wiki-ingest raw/articles/atomic-habits-summary.md
/wiki-query "What patterns show up in my journal entries about energy?"
/wiki-query "What habits have I tried and what was the outcome?"
```
The wiki builds a structured picture over time. Concepts like "Sleep", "Exercise", "Deep Work" accumulate evidence from every source filed.
---
### Business / Team Intelligence
Feed in meeting transcripts, project docs, customer calls.
```
/wiki-ingest raw/meetings/q1-planning-transcript.md
/wiki-ingest raw/docs/product-roadmap-2026.md
/wiki-ingest raw/calls/customer-interview-acme.md
/wiki-query "What feature requests have come up most across customer calls?"
/wiki-query "What decisions were made in Q1 and what was the rationale?"
/wiki-lint
# → "Project X mentioned in 5 pages but no dedicated page"
# → "Roadmap contradicts customer interview on priority of feature Y"
```
The wiki stays current because the agent does the maintenance no one wants to do.
---
### Competitive Analysis
Track a company, market, or technology over time.
```
/wiki-ingest raw/competitors/openai-announcements.md
/wiki-ingest raw/market/ai-funding-report-q1.md
/wiki-query "How do OpenAI and Anthropic differ on safety approach?"
/wiki-query "Which companies announced multimodal models in the last 6 months?"
/wiki-query "Competitive landscape summary as of today" --save
```
## The Graph
Two-pass build:
1. **Deterministic** — parses all `[[wikilinks]]` across wiki pages → edges tagged `EXTRACTED`
2. **Semantic** — agent infers implicit relationships not captured by wikilinks → edges tagged `INFERRED` (with confidence score) or `AMBIGUOUS`
Louvain community detection clusters nodes by topic. SHA256 cache means only changed pages are reprocessed. Output is a self-contained `graph.html` — no server, opens in any browser.
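
The caching idea can be sketched as follows: hash each page's content and reprocess only the pages whose hash changed. The helper `changed_pages` and the cache layout (page id mapped to hex digest) are assumptions for illustration, not necessarily how `tools/build_graph.py` stores its cache.

```python
import hashlib


def changed_pages(pages: dict[str, str], cache: dict[str, str]) -> list[str]:
    """Return ids of pages whose SHA256 differs from the cached digest.

    The cache is updated in place so the next call sees the new digests.
    """
    changed = []
    for page_id, content in pages.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if cache.get(page_id) != digest:
            changed.append(page_id)
            cache[page_id] = digest
    return sorted(changed)


cache: dict[str, str] = {}
first = changed_pages({"concepts/RAG": "v1", "entities/OpenAI": "v1"}, cache)
second = changed_pages({"concepts/RAG": "v2", "entities/OpenAI": "v1"}, cache)
```

On the first run every page counts as changed; on the second, only the edited page does, so only that page would need the slower semantic pass.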
## CLAUDE.md / AGENTS.md
The schema file tells the agent how to maintain the wiki — page formats, ingest/query/lint/graph workflows, naming conventions. This is the key config file. Edit it to customize behavior for your domain.
| Agent | Schema file |
|---|---|
| Claude Code | `CLAUDE.md` |
| Codex / OpenCode | `AGENTS.md` |
| Gemini CLI | `GEMINI.md` |
## What Makes This Different from RAG
| RAG | LLM Wiki Agent |
|---|---|
| Re-derives knowledge every query | Compiles once, keeps current |
| Raw chunks as retrieval unit | Structured wiki pages |
| No cross-references | Cross-references pre-built |
| Contradictions surface at query time (maybe) | Flagged at ingest time |
| No accumulation | Every source makes the wiki richer |
## Obsidian Integration
The wiki is designed to be browsed seamlessly in [Obsidian](https://obsidian.md). Since the agent maintains consistent `[[wikilinks]]`, you get a naturally growing knowledge graph in your vault.
### Vault Symlink Pattern
If you want to keep the LLM Wiki Agent repository separate from your main personal vault, use symlinks:
1. Keep your working agent repository at e.g., `~/llm-wiki-agent`
2. Create a symlink from your main Obsidian vault:
```bash
ln -sfn ~/llm-wiki-agent/wiki ~/your-obsidian-vault/wiki
```
3. Use the [Obsidian Web Clipper](https://obsidian.md/clipper) or write directly to `raw/` in the agent repo to queue items for ingestion.
> **Note:** If you ever move your local repo directory, remember to update the symlink, otherwise the `wiki/` directory will appear missing in Obsidian.
### Recommended .obsidian Config
- **Graph View:** Filter out `index.md` and `log.md` (e.g. `-file:index.md -file:log.md`) to avoid them becoming gravity wells in your Obsidian graph.
- **Dataview:** Use the community plugin [Dataview](https://blacksmithgu.github.io/obsidian-dataview/) to query the YAML frontmatter the agent automatically injects (e.g., `type: source`, `tags: [diary]`).
## Tips
- File good query answers back with `--save` — your explorations compound just like ingested sources
- The wiki is a git repo — version history for free
- Standalone Python scripts in `tools/` work without a coding agent (require `ANTHROPIC_API_KEY`)
## Tech Stack
NetworkX + Louvain + Claude + vis.js. No server, no database, runs entirely locally. Everything is plain markdown files.
## Related
- [graphify](https://github.com/safishamsi/graphify) — graph-based knowledge extraction skill (inspiration for the graph layer)
- [Vannevar Bush's Memex (1945)](https://en.wikipedia.org/wiki/Memex) — the original vision this resembles
## License
MIT License — see [LICENSE](LICENSE) for details.

101
docs/automated-sync.md Normal file

@@ -0,0 +1,101 @@
# Automated Wiki Synchronization Guide
Managing an LLM Wiki works best when it constantly reflects your background note-taking system. Instead of manually ingesting files every time you write something new, you can orchestrate an end-to-end automation pipeline.
This guide outlines a production-grade cron/launchd strategy for local Mac/Linux environments.
## The Two-Step Architecture
LLM Wiki Agent ingestion is a two-step process:
1. **Syncing to `raw/`**: Getting files from your personal vault/tools into the agent's staging area.
2. **Batch Ingestion**: Triggering `tools/ingest.py` on the synchronized directories to synthesize and weave them into the graph.
### Step 1: The Master Orchestrator Script
Create a comprehensive shell script in your wiki root (`daily-automated-sync.sh`):
```bash
#!/usr/bin/env bash
set -uo pipefail
# Define variables
LAB_DIR="$HOME/projects/active/personal-wiki-lab"
LOG_FILE="$LAB_DIR/automation-cron.log"
DATE=$(date "+%Y-%m-%d %H:%M:%S")
echo "=====================================================" >> "$LOG_FILE"
echo "[$DATE] Starting automated wiki synchronization..." >> "$LOG_FILE"
cd "$LAB_DIR" || exit 1
# 1. Run your personal Vault-to-Raw symlink script here
# Example: ./sync-raw.sh >> "$LOG_FILE" 2>&1
# 2. Trigger Litellm Batch Ingestion using LLM of your choice
export LLM_MODEL="gemini/gemini-3-flash-preview"
export GEMINI_API_KEY="AIzaSy..." # or export OPENAI_API_KEY
echo "[$DATE] Batch ingesting markdown files..." >> "$LOG_FILE"
# NUL-delimited so paths containing spaces or backslashes are passed through intact
find raw/ \( -type l -o -type f \) -name "*.md" -print0 | \
  while IFS= read -r -d '' file; do
    python3 tools/ingest.py "$file" >> "$LOG_FILE" 2>&1
  done
# 3. Heal Graph Context (Auto-resolves broken semantic links)
echo "[$DATE] Healing broken nodes..." >> "$LOG_FILE"
python3 tools/heal.py >> "$LOG_FILE" 2>&1
echo "[$(date "+%Y-%m-%d %H:%M:%S")] Automated sync completed." >> "$LOG_FILE"
echo "=====================================================" >> "$LOG_FILE"
```
Don't forget to make it executable: `chmod +x daily-automated-sync.sh`.
### Step 2: System Scheduler (macOS launchd)
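For macOS, `launchd` is generally a better fit than `cron`: a `StartCalendarInterval` job that comes due while the machine is asleep runs on the next wake, whereas `cron` skips it.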
Create a `.plist` file at `~/Library/LaunchAgents/com.personal-wiki-sync.plist`:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.personal-wiki-sync</string>
<key>ProgramArguments</key>
<array>
<string>/bin/bash</string>
<string>/Users/your-username/projects/active/personal-wiki-lab/daily-automated-sync.sh</string>
</array>
<!-- Execute automatically at 2:00 AM daily -->
<key>StartCalendarInterval</key>
<dict>
<key>Hour</key>
<integer>2</integer>
<key>Minute</key>
<integer>0</integer>
</dict>
<!-- Also run once whenever the agent is loaded (at login or via launchctl load) -->
<key>RunAtLoad</key>
<true/>
<!-- Diagnostic Logs -->
<key>StandardOutPath</key>
<string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stdout.log</string>
<key>StandardErrorPath</key>
<string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stderr.log</string>
</dict>
</plist>
```
Load the daemon:
```bash
launchctl load ~/Library/LaunchAgents/com.personal-wiki-sync.plist
```
### Self-Healing & Health Monitoring
Because the automation runs unattended at night, check the logs for failures. The script appends its own output, including errors from `tools/ingest.py` and `tools/heal.py`, to `automation-cron.log`, while `daemon.stderr.log` captures anything the script does not redirect. Keeping the `tools/heal.py` step is strongly recommended: it builds pages for concepts that accumulated throughout your day but were never individually formalized.


@@ -0,0 +1,14 @@
# CJK Showcase (Chinese Language Example)
This directory demonstrates how LLM Wiki Agent performs with non-English (CJK) languages.
The agent supports Chinese content out of the box. With the CJK query bug fixed, you can ingest, query, and search across Chinese entries without any language-specific configuration.
## Files included in this showcase:
- `raw/2026-04-13-reflection.md`: A sample source document (a personal reflection on career transition).
- `wiki/sources/2026-04-13-reflection.md`: The parsed structured source page.
- `wiki/entities/杨帆.md`: Auto-extracted Chinese entity page.
- `wiki/concepts/AI转型.md`: Auto-extracted Chinese concept page.
Try running `python tools/query.py "关于AI转型的建议"` from the root directory after moving these to your main knowledge base to see how semantic extraction and keyword matching behave in non-English contexts!


@@ -0,0 +1,7 @@
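# 2026-04-13 Retrospective on the Transition to AI
Today I had an in-depth discussion with 杨帆 about the path from civil engineering to AI product management. He said the biggest trap is "Tool Tourism": many people from non-technical backgrounds get absorbed in trying out every AI tool while neglecting the business fundamentals and product delivery.
The real breakthrough lies in treating large models as a new computing paradigm rather than as magic. We need to pay attention to model stability, cost, concurrency, and recall over long contexts. I have also been thinking about my own technical stack: moving from playing with prompts to mastering Agentic Workflow frameworks (such as LangChain or a custom multi-agent system) is a qualitative leap.
Decision: as a next step, read fewer general pop-science articles and go straight into the open-source community, for example by contributing code or raising architecture issues, to build real influence.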

0
graph/.gitkeep Normal file

0
raw/.gitkeep Normal file

2
requirements.txt Normal file

@@ -0,0 +1,2 @@
litellm>=1.0.0
networkx>=3.2

454
tools/build_graph.py Normal file

@@ -0,0 +1,454 @@
#!/usr/bin/env python3
"""
Build the knowledge graph from the wiki.

Usage:
    python tools/build_graph.py               # full rebuild
    python tools/build_graph.py --no-infer    # skip semantic inference (faster)
    python tools/build_graph.py --open        # open graph.html in browser after build

Outputs:
    graph/graph.json  — node/edge data (cached by SHA256)
    graph/graph.html  — interactive vis.js visualization

Edge types:
    EXTRACTED  — explicit [[wikilink]] in a page
    INFERRED   — Claude-detected implicit relationship
    AMBIGUOUS  — low-confidence inferred relationship
"""
import re
import json
import hashlib
import argparse
import webbrowser
from pathlib import Path
from datetime import date
import os
try:
    import networkx as nx
    from networkx.algorithms import community as nx_community
    HAS_NETWORKX = True
except ImportError:
    HAS_NETWORKX = False
    print("Warning: networkx not installed. Community detection disabled. Run: pip install networkx")
REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
GRAPH_DIR = REPO_ROOT / "graph"
GRAPH_JSON = GRAPH_DIR / "graph.json"
GRAPH_HTML = GRAPH_DIR / "graph.html"
CACHE_FILE = GRAPH_DIR / ".cache.json"
LOG_FILE = WIKI_DIR / "log.md"
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"
# Node type → color mapping
TYPE_COLORS = {
    "source": "#4CAF50",
    "entity": "#2196F3",
    "concept": "#FF9800",
    "synthesis": "#9C27B0",
    "unknown": "#9E9E9E",
}
EDGE_COLORS = {
    "EXTRACTED": "#555555",
    "INFERRED": "#FF5722",
    "AMBIGUOUS": "#BDBDBD",
}


def read_file(path: Path) -> str:
    return path.read_text(encoding="utf-8") if path.exists() else ""


def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str:
    try:
        from litellm import completion
    except ImportError:
        print("Error: litellm not installed. Run: pip install litellm")
        import sys
        sys.exit(1)
    # The model name comes from the environment variable named by model_env
    model = os.getenv(model_env, default_model)
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def all_wiki_pages() -> list[Path]:
    return [p for p in WIKI_DIR.rglob("*.md")
            if p.name not in ("index.md", "log.md", "lint-report.md")]


def extract_wikilinks(content: str) -> list[str]:
    return list(set(re.findall(r'\[\[([^\]]+)\]\]', content)))


def extract_frontmatter_type(content: str) -> str:
    match = re.search(r'^type:\s*(\S+)', content, re.MULTILINE)
    return match.group(1).strip('"\'') if match else "unknown"


def page_id(path: Path) -> str:
    # Drop only the trailing ".md" suffix, not every ".md" substring in the path
    return path.relative_to(WIKI_DIR).with_suffix("").as_posix()


def load_cache() -> dict:
    if CACHE_FILE.exists():
        try:
            return json.loads(CACHE_FILE.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, IOError):
            return {}
    return {}


def save_cache(cache: dict):
    GRAPH_DIR.mkdir(parents=True, exist_ok=True)
    CACHE_FILE.write_text(json.dumps(cache, indent=2), encoding="utf-8")


def build_nodes(pages: list[Path]) -> list[dict]:
    nodes = []
    for p in pages:
        content = read_file(p)
        node_type = extract_frontmatter_type(content)
        title_match = re.search(r'^title:\s*"?([^"\n]+)"?', content, re.MULTILINE)
        label = title_match.group(1).strip() if title_match else p.stem
        nodes.append({
            "id": page_id(p),
            "label": label,
            "type": node_type,
            "color": TYPE_COLORS.get(node_type, TYPE_COLORS["unknown"]),
            "path": str(p.relative_to(REPO_ROOT)),
        })
    return nodes


def build_extracted_edges(pages: list[Path]) -> list[dict]:
    """Pass 1: deterministic wikilink edges."""
    # Build a map from stem (lower) -> page_id for resolution
    stem_map = {p.stem.lower(): page_id(p) for p in pages}
    edges = []
    seen = set()
    for p in pages:
        content = read_file(p)
        src = page_id(p)
        for link in extract_wikilinks(content):
            target = stem_map.get(link.lower())
            if target and target != src:
                key = (src, target)
                if key not in seen:
                    seen.add(key)
                    edges.append({
                        "from": src,
                        "to": target,
                        "type": "EXTRACTED",
                        "color": EDGE_COLORS["EXTRACTED"],
                        "confidence": 1.0,
                    })
    return edges


def build_inferred_edges(pages: list[Path], existing_edges: list[dict], cache: dict) -> list[dict]:
    """Pass 2: API-inferred semantic relationships."""
    new_edges = []

    # Only process pages that changed since last run
    changed_pages = []
    for p in pages:
        content = read_file(p)
        h = sha256(content)
        entry = cache.get(str(p))
        if not isinstance(entry, dict) or entry.get("hash") != h:
            changed_pages.append(p)
        else:
            # Page unchanged: reuse its inferred edges from the cache
            src = page_id(p)
            for rel in entry.get("edges", []):
                new_edges.append({
                    "from": src,
                    "to": rel["to"],
                    "type": rel.get("type", "INFERRED"),
                    "title": rel.get("relationship", ""),
                    "label": "",
                    "color": EDGE_COLORS.get(rel.get("type", "INFERRED"), EDGE_COLORS["INFERRED"]),
                    "confidence": float(rel.get("confidence", 0.7)),
                })

    if not changed_pages:
        print(" no changed pages, skipping semantic inference")
        # Return the cached edges; returning [] here would drop them from the graph
        return new_edges

    print(f" inferring relationships for {len(changed_pages)} changed pages...")

    # Build a summary of existing nodes for context
    node_list = "\n".join(f"- {page_id(p)} ({extract_frontmatter_type(read_file(p))})" for p in pages)

    for p in changed_pages:
        full_content = read_file(p)
        content = full_content[:2000]  # truncate for context efficiency
        src = page_id(p)
        # Summarize only the explicit edges that start at this page
        existing_edge_summary = "\n".join(
            f"- {e['from']} → {e['to']} (EXTRACTED)"
            for e in [e for e in existing_edges if e["from"] == src][:30]
        )
        prompt = f"""Analyze this wiki page and identify implicit semantic relationships to other pages in the wiki.

Source page: {src}
Content:
{content}

All available pages:
{node_list}

Already-extracted edges from this page:
{existing_edge_summary}

Return ONLY a JSON array of NEW relationships not already captured by explicit wikilinks:
[
  {{"to": "page-id", "relationship": "one-line description", "confidence": 0.0-1.0, "type": "INFERRED or AMBIGUOUS"}}
]

Rules:
- Only include pages from the available list above
- Confidence >= 0.7 → INFERRED, < 0.7 → AMBIGUOUS
- Do not repeat edges already in the extracted list
- Return empty array [] if no new relationships found
"""
        raw = call_llm(prompt, "LLM_MODEL_FAST", "claude-3-5-haiku-latest", max_tokens=1024)
        raw = raw.strip()
        raw = re.sub(r"^```(?:json)?\s*", "", raw)
        raw = re.sub(r"\s*```$", "", raw)
        try:
            inferred = json.loads(raw)
            valid_rels = []
            for rel in inferred:
                if isinstance(rel, dict) and "to" in rel:
                    new_edges.append({
                        "from": src,
                        "to": rel["to"],
                        "type": rel.get("type", "INFERRED"),
                        "title": rel.get("relationship", ""),
                        "label": "",
                        "color": EDGE_COLORS.get(rel.get("type", "INFERRED"), EDGE_COLORS["INFERRED"]),
                        "confidence": float(rel.get("confidence", 0.7)),
                    })
                    valid_rels.append(rel)
            # Cache under the hash of the full page so the change check above can match it
            cache[str(p)] = {
                "hash": sha256(full_content),
                "edges": valid_rels,
            }
        except (json.JSONDecodeError, TypeError, ValueError):
            # Unparseable response: the page stays uncached and is retried on the next run
            pass
    return new_edges


def detect_communities(nodes: list[dict], edges: list[dict]) -> dict[str, int]:
    """Assign community IDs to nodes using Louvain algorithm."""
    if not HAS_NETWORKX:
        return {}
    G = nx.Graph()
    for n in nodes:
        G.add_node(n["id"])
    for e in edges:
        G.add_edge(e["from"], e["to"])
    if G.number_of_edges() == 0:
        return {}
    try:
        communities = nx_community.louvain_communities(G, seed=42)
        node_to_community = {}
        for i, comm in enumerate(communities):
            for node in comm:
                node_to_community[node] = i
        return node_to_community
    except Exception:
        return {}


COMMUNITY_COLORS = [
    "#E91E63", "#00BCD4", "#8BC34A", "#FF5722", "#673AB7",
    "#FFC107", "#009688", "#F44336", "#3F51B5", "#CDDC39",
]


def render_html(nodes: list[dict], edges: list[dict]) -> str:
    """Generate self-contained vis.js HTML."""
    nodes_json = json.dumps(nodes, indent=2)
    edges_json = json.dumps(edges, indent=2)
    legend_items = "".join(
        f'<span style="background:{color};padding:3px 8px;margin:2px;border-radius:3px;font-size:12px">{t}</span>'
        for t, color in TYPE_COLORS.items() if t != "unknown"
    )
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>LLM Wiki — Knowledge Graph</title>
<script src="https://unpkg.com/vis-network/standalone/umd/vis-network.min.js"></script>
<style>
body {{ margin: 0; background: #1a1a2e; font-family: sans-serif; color: #eee; }}
#graph {{ width: 100vw; height: 100vh; }}
#controls {{
position: fixed; top: 10px; left: 10px; background: rgba(0,0,0,0.7);
padding: 12px; border-radius: 8px; z-index: 10; max-width: 260px;
}}
#controls h3 {{ margin: 0 0 8px; font-size: 14px; }}
#search {{ width: 100%; padding: 4px; margin-bottom: 8px; background: #333; color: #eee; border: 1px solid #555; border-radius: 4px; }}
#info {{
position: fixed; bottom: 10px; left: 10px; background: rgba(0,0,0,0.8);
padding: 12px; border-radius: 8px; z-index: 10; max-width: 320px;
display: none;
}}
#stats {{ position: fixed; top: 10px; right: 10px; background: rgba(0,0,0,0.7); padding: 10px; border-radius: 8px; font-size: 12px; }}
</style>
</head>
<body>
<div id="controls">
<h3>LLM Wiki Graph</h3>
<input id="search" type="text" placeholder="Search nodes..." oninput="searchNodes(this.value)">
<div>{legend_items}</div>
<div style="margin-top:8px;font-size:11px;color:#aaa">
<span style="background:#555;padding:2px 6px;border-radius:3px;margin-right:4px">──</span> Explicit link<br>
<span style="background:#FF5722;padding:2px 6px;border-radius:3px;margin-right:4px">──</span> Inferred
</div>
</div>
<div id="graph"></div>
<div id="info">
<b id="info-title"></b><br>
<span id="info-type" style="font-size:12px;color:#aaa"></span><br>
<span id="info-path" style="font-size:11px;color:#666"></span>
</div>
<div id="stats"></div>
<script>
const nodes = new vis.DataSet({nodes_json});
const edges = new vis.DataSet({edges_json});
const container = document.getElementById("graph");
const network = new vis.Network(container, {{ nodes, edges }}, {{
nodes: {{
shape: "dot",
size: 12,
font: {{ color: "#eee", size: 13 }},
borderWidth: 2,
}},
edges: {{
width: 1.2,
smooth: {{ type: "continuous" }},
arrows: {{ to: {{ enabled: true, scaleFactor: 0.5 }} }},
}},
physics: {{
stabilization: {{ iterations: 150 }},
barnesHut: {{ gravitationalConstant: -8000, springLength: 120 }},
}},
interaction: {{ hover: true, tooltipDelay: 200 }},
}});
network.on("click", params => {{
if (params.nodes.length > 0) {{
const node = nodes.get(params.nodes[0]);
document.getElementById("info").style.display = "block";
document.getElementById("info-title").textContent = node.label;
document.getElementById("info-type").textContent = node.type;
document.getElementById("info-path").textContent = node.path;
}} else {{
document.getElementById("info").style.display = "none";
}}
}});
document.getElementById("stats").textContent =
`${{nodes.length}} nodes · ${{edges.length}} edges`;
function searchNodes(q) {{
const lower = q.toLowerCase();
nodes.forEach(n => {{
nodes.update({{ id: n.id, opacity: (!q || n.label.toLowerCase().includes(lower)) ? 1 : 0.15 }});
}});
}}
</script>
</body>
</html>"""


def append_log(entry: str):
    # Note: the new entry is written above the existing log content
    log_path = WIKI_DIR / "log.md"
    existing = read_file(log_path)
    log_path.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8")


def build_graph(infer: bool = True, open_browser: bool = False):
    pages = all_wiki_pages()
    today = date.today().isoformat()

    if not pages:
        print("Wiki is empty. Ingest some sources first.")
        return

    print(f"Building graph from {len(pages)} wiki pages...")
    GRAPH_DIR.mkdir(parents=True, exist_ok=True)
    cache = load_cache()

    # Pass 1: extracted edges
    print(" Pass 1: extracting wikilinks...")
    nodes = build_nodes(pages)
    edges = build_extracted_edges(pages)
    print(f"{len(edges)} extracted edges")

    # Pass 2: inferred edges
    if infer:
        print(" Pass 2: inferring semantic relationships...")
        inferred = build_inferred_edges(pages, edges, cache)
        edges.extend(inferred)
        print(f"{len(inferred)} inferred edges")
        save_cache(cache)

    # Community detection
    print(" Running Louvain community detection...")
    communities = detect_communities(nodes, edges)
    for node in nodes:
        comm_id = communities.get(node["id"], -1)
        if comm_id >= 0:
            node["color"] = COMMUNITY_COLORS[comm_id % len(COMMUNITY_COLORS)]
        node["group"] = comm_id

    # Save graph.json (explicit UTF-8 so non-ASCII labels survive on every platform)
    graph_data = {"nodes": nodes, "edges": edges, "built": today}
    GRAPH_JSON.write_text(json.dumps(graph_data, indent=2), encoding="utf-8")
    print(f" saved: graph/graph.json ({len(nodes)} nodes, {len(edges)} edges)")

    # Save graph.html
    html = render_html(nodes, edges)
    GRAPH_HTML.write_text(html, encoding="utf-8")
    print(" saved: graph/graph.html")

    extracted_count = sum(1 for e in edges if e["type"] == "EXTRACTED")
    inferred_count = sum(1 for e in edges if e["type"] == "INFERRED")
    append_log(
        f"## [{today}] graph | Knowledge graph rebuilt\n\n"
        f"{len(nodes)} nodes, {len(edges)} edges "
        f"({extracted_count} extracted, {inferred_count} inferred)."
    )

    if open_browser:
        # as_uri() builds a valid file:// URL on both POSIX and Windows
        webbrowser.open(GRAPH_HTML.resolve().as_uri())


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Build LLM Wiki knowledge graph")
    parser.add_argument("--no-infer", action="store_true", help="Skip semantic inference (faster)")
    parser.add_argument("--open", action="store_true", help="Open graph.html in browser")
    args = parser.parse_args()
    build_graph(infer=not args.no_infer, open_browser=args.open)
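
The change detection used for semantic inference can be looked at in isolation. The following is a minimal sketch, assuming pages are held in memory as a mapping from page id to text rather than read from `wiki/`; `split_by_cache` and the sample data are hypothetical names for illustration and are not part of `tools/build_graph.py`.

```python
import hashlib


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def split_by_cache(pages: dict[str, str], cache: dict) -> tuple[list[str], list[dict]]:
    """Return (ids of changed pages, edges reused from the cache for unchanged pages)."""
    changed, reused = [], []
    for pid, text in pages.items():
        entry = cache.get(pid)
        if isinstance(entry, dict) and entry.get("hash") == sha256(text):
            # Unchanged page: reuse the stored edges instead of calling the LLM again
            reused.extend({"from": pid, **rel} for rel in entry.get("edges", []))
        else:
            # New or edited page: it needs a fresh inference call
            changed.append(pid)
    return changed, reused


# Hypothetical sample data: page A is cached and unchanged, page B is new
pages = {"concepts/A": "alpha text", "concepts/B": "beta text"}
cache = {
    "concepts/A": {
        "hash": sha256("alpha text"),
        "edges": [{"to": "concepts/B", "confidence": 0.8}],
    }
}
changed, reused = split_by_cache(pages, cache)
print(changed)
print(reused)
```

The key point of the sketch is that the stored hash and the comparison hash are computed over the same text. If one side hashes a truncated copy of the page, the two never match for long pages and the cache is never hit.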

100
tools/heal.py Executable file

@@ -0,0 +1,100 @@
#!/usr/bin/env python3
"""
Graph Self-Healing Tool

Automatically retrieves "Missing Entity Pages" from the wiki and generates
comprehensive definition pages for them using the LLM.
It resolves broken entity links by scanning existing contexts where the entity is referenced.

Usage:
    python tools/heal.py
"""
import os
import sys
from pathlib import Path

try:
    from litellm import completion
except ImportError:
    print("Error: litellm not installed. Run: pip install litellm")
    sys.exit(1)

# Ensure tools can be imported
sys.path.insert(0, str(Path(__file__).parent.parent))
from tools.lint import find_missing_entities, all_wiki_pages

REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
ENTITIES_DIR = WIKI_DIR / "entities"


def call_llm(prompt: str, max_tokens: int = 1500) -> str:
    # Use litellm standard environment variables
    # e.g., GEMINI_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY
    model = os.getenv("LLM_MODEL", "claude-3-5-haiku-latest")  # default to fast model
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


def search_sources(entity: str, pages: list[Path]) -> list[Path]:
    """Find up to 15 pages, outside entities/ and concepts/, that mention this entity."""
    sources = []
    for p in pages:
        # Check the top-level wiki folder rather than the whole absolute path,
        # so a repo checked out under a directory named "entities" still works
        top_level = p.relative_to(WIKI_DIR).parts[0]
        if top_level not in ("entities", "concepts"):
            content = p.read_text(encoding="utf-8")
            if entity.lower() in content.lower():
                sources.append(p)
    return sources[:15]


def heal_missing_entities():
    pages = all_wiki_pages()
    missing_entities = find_missing_entities(pages)

    if not missing_entities:
        print("Graph is fully connected. No missing entities found!")
        return

    ENTITIES_DIR.mkdir(exist_ok=True, parents=True)
    print(f"Found {len(missing_entities)} missing entity nodes. Commencing auto-heal...")

    for entity in missing_entities:
        print(f"Healing entity page for: {entity}")
        sources = search_sources(entity, pages)

        # Give the LLM the first 800 characters of each page that mentions the entity
        context = ""
        for s in sources:
            context += f"\n\n### {s.name}\n{s.read_text(encoding='utf-8')[:800]}"

        prompt = f"""You are filling a data gap in the Personal LLM Wiki.
Create an Entity definition page for "{entity}".

Here is how the entity appears in the current sources:
{context}

Format:
---
title: "{entity}"
type: entity
tags: []
sources: {[s.name for s in sources]}
---
# {entity}

Write a comprehensive paragraph defining what `{entity}` means in the context of this wiki, its main significance, and any actions or associations related to it.
"""
        try:
            result = call_llm(prompt)
            out_path = ENTITIES_DIR / f"{entity}.md"
            out_path.write_text(result, encoding="utf-8")
            print(f" -> Saved to {out_path.relative_to(REPO_ROOT)}")
        except Exception as e:
            print(f" [!] Failed to generate {entity}: {e}")


if __name__ == "__main__":
    heal_missing_entities()

239
tools/ingest.py Normal file

@@ -0,0 +1,239 @@
#!/usr/bin/env python3
"""
Ingest a source document into the LLM Wiki.

Usage:
    python tools/ingest.py <path-to-source>
    python tools/ingest.py raw/articles/my-article.md

The LLM reads the source, extracts knowledge, and updates the wiki:
    - Creates wiki/sources/<slug>.md
    - Updates wiki/index.md
    - Updates wiki/overview.md (if warranted)
    - Creates/updates entity and concept pages
    - Appends to wiki/log.md
    - Flags contradictions
"""
import os
import sys
import json
import hashlib
import re
from pathlib import Path
from datetime import date

REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
LOG_FILE = WIKI_DIR / "log.md"
INDEX_FILE = WIKI_DIR / "index.md"
OVERVIEW_FILE = WIKI_DIR / "overview.md"
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"


def sha256(text: str) -> str:
    # Short 16-character digest, used only for display
    return hashlib.sha256(text.encode()).hexdigest()[:16]


def read_file(path: Path) -> str:
    return path.read_text(encoding="utf-8") if path.exists() else ""


def call_llm(prompt: str, max_tokens: int = 8192) -> str:
    try:
        from litellm import completion
    except ImportError:
        print("Error: litellm not installed. Run: pip install litellm")
        sys.exit(1)
    model = os.getenv("LLM_MODEL", "claude-3-5-sonnet-latest")
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


def write_file(path: Path, content: str):
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    print(f" wrote: {path.relative_to(REPO_ROOT)}")


def build_wiki_context() -> str:
    parts = []
    if INDEX_FILE.exists():
        parts.append(f"## wiki/index.md\n{read_file(INDEX_FILE)}")
    if OVERVIEW_FILE.exists():
        parts.append(f"## wiki/overview.md\n{read_file(OVERVIEW_FILE)}")
    # Include a few recent source pages for contradiction checking
    sources_dir = WIKI_DIR / "sources"
    if sources_dir.exists():
        recent = sorted(sources_dir.glob("*.md"), key=lambda p: p.stat().st_mtime, reverse=True)[:5]
        for p in recent:
            # read_file() decodes as UTF-8 regardless of the platform default
            parts.append(f"## {p.relative_to(REPO_ROOT)}\n{read_file(p)}")
    return "\n\n---\n\n".join(parts)


def parse_json_from_response(text: str) -> dict:
    # Strip markdown code fences if present
    text = re.sub(r"^```(?:json)?\s*", "", text.strip())
    text = re.sub(r"\s*```$", "", text.strip())
    # Find the outermost JSON object
    match = re.search(r"\{[\s\S]*\}", text)
    if not match:
        raise ValueError("No JSON object found in response")
    return json.loads(match.group())


def update_index(new_entry: str, section: str = "Sources"):
    content = read_file(INDEX_FILE)
    if not content:
        content = "# Wiki Index\n\n## Overview\n- [Overview](overview.md) — living synthesis\n\n## Sources\n\n## Entities\n\n## Concepts\n\n## Syntheses\n"
    section_header = f"## {section}"
    if section_header in content:
        # Insert the entry directly under the first matching section header
        content = content.replace(section_header + "\n", section_header + "\n" + new_entry + "\n", 1)
    else:
        content += f"\n{section_header}\n{new_entry}\n"
    write_file(INDEX_FILE, content)


def append_log(entry: str):
    # Note: the new entry is written above the existing log content
    existing = read_file(LOG_FILE)
    write_file(LOG_FILE, entry.strip() + "\n\n" + existing)


def ingest(source_path: str):
    source = Path(source_path)
    if not source.exists():
        print(f"Error: file not found: {source_path}")
        sys.exit(1)

    source_content = source.read_text(encoding="utf-8")
    source_hash = sha256(source_content)
    today = date.today().isoformat()

    print(f"\nIngesting: {source.name} (hash: {source_hash})")

    wiki_context = build_wiki_context()
    schema = read_file(SCHEMA_FILE)

    prompt = f"""You are maintaining an LLM Wiki. Process this source document and integrate its knowledge into the wiki.

Schema and conventions:
{schema}

Current wiki state (index + recent pages):
{wiki_context if wiki_context else "(wiki is empty — this is the first source)"}

New source to ingest (file: {source.relative_to(REPO_ROOT) if source.is_relative_to(REPO_ROOT) else source.name}):
=== SOURCE START ===
{source_content}
=== SOURCE END ===

Today's date: {today}

Return ONLY a valid JSON object with these fields (no markdown fences, no prose outside the JSON):
{{
  "title": "Human-readable title for this source",
  "slug": "kebab-case-slug-for-filename",
  "source_page": "full markdown content for wiki/sources/<slug>.md — use the source page format from the schema",
  "index_entry": "- [Title](sources/slug.md) — one-line summary",
  "overview_update": "full updated content for wiki/overview.md, or null if no update needed",
  "entity_pages": [
    {{"path": "entities/EntityName.md", "content": "full markdown content"}}
  ],
  "concept_pages": [
    {{"path": "concepts/ConceptName.md", "content": "full markdown content"}}
  ],
  "contradictions": ["describe any contradiction with existing wiki content, or empty list"],
  "log_entry": "## [{today}] ingest | <title>\\n\\nAdded source. Key claims: ..."
}}
"""
    print(" calling API (model: ...)")
    raw = call_llm(prompt, max_tokens=8192)

    try:
        data = parse_json_from_response(raw)
    except (ValueError, json.JSONDecodeError) as e:
        print(f"Error parsing API response: {e}")
        print("Raw response saved to /tmp/ingest_debug.txt")
        Path("/tmp/ingest_debug.txt").write_text(raw, encoding="utf-8")
        sys.exit(1)

    # Write source page
    slug = data["slug"]
    write_file(WIKI_DIR / "sources" / f"{slug}.md", data["source_page"])

    # Write entity pages
    for page in data.get("entity_pages", []):
        write_file(WIKI_DIR / page["path"], page["content"])

    # Write concept pages
    for page in data.get("concept_pages", []):
        write_file(WIKI_DIR / page["path"], page["content"])

    # Update overview
    if data.get("overview_update"):
        write_file(OVERVIEW_FILE, data["overview_update"])

    # Update index
    update_index(data["index_entry"], section="Sources")

    # Append log
    append_log(data["log_entry"])

    # Report contradictions
    contradictions = data.get("contradictions", [])
    if contradictions:
        print("\n ⚠️ Contradictions detected:")
        for c in contradictions:
            print(f" - {c}")

    print(f"\nDone. Ingested: {data['title']}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python tools/ingest.py <path-to-source> [path2 ...] [dir1 ...]")
        sys.exit(1)

    paths_to_process = []
    for arg in sys.argv[1:]:
        p = Path(arg)
        if p.is_file() and p.suffix == ".md":
            paths_to_process.append(p)
        elif p.is_dir():
            for f in p.rglob("*.md"):
                if f.is_file():
                    paths_to_process.append(f)
        else:
            # Treat anything else as a glob pattern
            import glob
            for f in glob.glob(arg, recursive=True):
                g_p = Path(f)
                if g_p.is_file() and g_p.suffix == ".md":
                    paths_to_process.append(g_p)

    # Deduplicate while preserving order
    unique_paths = []
    seen = set()
    for p in paths_to_process:
        abs_p = p.resolve()
        if abs_p not in seen:
            seen.add(abs_p)
            unique_paths.append(p)

    if not unique_paths:
        print("Error: no markdown files found to ingest.")
        sys.exit(1)

    if len(unique_paths) > 1:
        print(f"Batch mode: found {len(unique_paths)} files to ingest.")

    for p in unique_paths:
        ingest(str(p))
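
How the JSON extraction step tolerates a chatty model reply can be shown with a standalone copy of the logic. This is a sketch, assuming the reply wraps the JSON object in a markdown code fence and adds a sentence of prose in front of it; the `reply` text is invented sample data, and the fence marker is built from a `FENCE` constant only so the example does not contain literal fence characters.

```python
import json
import re

FENCE = "`" * 3  # the three-backtick markdown fence marker


def parse_json_from_response(text: str) -> dict:
    # Strip a leading and a trailing code fence if present
    text = re.sub(r"^" + FENCE + r"(?:json)?\s*", "", text.strip())
    text = re.sub(r"\s*" + FENCE + r"$", "", text.strip())
    # Take everything from the first "{" to the last "}"
    match = re.search(r"\{[\s\S]*\}", text)
    if not match:
        raise ValueError("No JSON object found in response")
    return json.loads(match.group())


# Invented sample reply: prose, then a fenced JSON object
reply = (
    "Here is the result:\n"
    + FENCE + "json\n"
    + '{"title": "Demo", "slug": "demo", "entity_pages": []}\n'
    + FENCE
)
data = parse_json_from_response(reply)
print(data["slug"])
```

Because the search is greedy from the first `{` to the last `}`, a reply that contains two separate JSON objects would be passed to `json.loads` as one span and fail to parse, so the "return ONLY a valid JSON object" instruction in the prompt still matters.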

210
tools/lint.py Normal file

@@ -0,0 +1,210 @@
#!/usr/bin/env python3
"""
Lint the LLM Wiki for health issues.

Usage:
    python tools/lint.py
    python tools/lint.py --save    # save lint report to wiki/lint-report.md

Checks:
    - Orphan pages (no inbound wikilinks from other pages)
    - Broken wikilinks (pointing to pages that don't exist)
    - Missing entity pages (entities mentioned in 3+ pages but no page)
    - Contradictions between pages
    - Data gaps and suggested new sources
"""
import os
import re
import sys
import argparse
from pathlib import Path
from collections import defaultdict
from datetime import date

REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
LOG_FILE = WIKI_DIR / "log.md"
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"


def read_file(path: Path) -> str:
    return path.read_text(encoding="utf-8") if path.exists() else ""


def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str:
    try:
        from litellm import completion
    except ImportError:
        print("Error: litellm not installed. Run: pip install litellm")
        sys.exit(1)
    model = os.getenv(model_env, default_model)
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


def all_wiki_pages() -> list[Path]:
    return [p for p in WIKI_DIR.rglob("*.md")
            if p.name not in ("index.md", "log.md", "lint-report.md")]


def extract_wikilinks(content: str) -> list[str]:
    return re.findall(r'\[\[([^\]]+)\]\]', content)


def page_name_to_path(name: str) -> list[Path]:
    """Try to resolve a [[WikiLink]] to a file path (case-insensitive stem match)."""
    candidates = []
    for p in all_wiki_pages():
        if p.stem.lower() == name.lower():
            candidates.append(p)
    return candidates


def find_orphans(pages: list[Path]) -> list[Path]:
    inbound = defaultdict(int)
    for p in pages:
        content = read_file(p)
        for link in extract_wikilinks(content):
            resolved = page_name_to_path(link)
            for r in resolved:
                # A page linking to itself does not count as an inbound link
                if r != p:
                    inbound[r] += 1
    return [p for p in pages if inbound[p] == 0 and p != WIKI_DIR / "overview.md"]


def find_broken_links(pages: list[Path]) -> list[tuple[Path, str]]:
    broken = []
    for p in pages:
        content = read_file(p)
        for link in extract_wikilinks(content):
            if not page_name_to_path(link):
                broken.append((p, link))
    return broken


def find_missing_entities(pages: list[Path]) -> list[str]:
    """Find entity-like names mentioned in 3+ pages but lacking their own page."""
    mention_counts: dict[str, int] = defaultdict(int)
    existing_pages = {p.stem.lower() for p in pages}
    for p in pages:
        content = read_file(p)
        # set() counts each page once, even if it links to the same name repeatedly
        links = set(extract_wikilinks(content))
        for link in links:
            if link.lower() not in existing_pages:
                mention_counts[link] += 1
    return [name for name, count in mention_counts.items() if count >= 3]


def run_lint():
    pages = all_wiki_pages()
    today = date.today().isoformat()

    if not pages:
        print("Wiki is empty. Nothing to lint.")
        return ""

    print(f"Linting {len(pages)} wiki pages...")

    # Deterministic checks
    orphans = find_orphans(pages)
    broken = find_broken_links(pages)
    missing_entities = find_missing_entities(pages)
    print(f" orphans: {len(orphans)}")
    print(f" broken links: {len(broken)}")
    print(f" missing entity pages: {len(missing_entities)}")

    # Build context for semantic checks (contradictions, gaps)
    # Use a sample of pages to stay within context limits
    sample = pages[:20]
    pages_context = ""
    for p in sample:
        rel = p.relative_to(REPO_ROOT)
        pages_context += f"\n\n### {rel}\n{read_file(p)[:1500]}"  # truncate long pages

    print(" running semantic lint via API...")
    prompt = f"""You are linting an LLM Wiki. Review the pages below and identify:
1. Contradictions between pages (claims that conflict)
2. Stale content (summaries that newer sources have superseded)
3. Data gaps (important questions the wiki can't answer — suggest specific sources to find)
4. Concepts mentioned but lacking depth

Wiki pages (sample of {len(sample)} pages):
{pages_context}

Return a markdown lint report with these sections:
## Contradictions
## Stale Content
## Data Gaps & Suggested Sources
## Concepts Needing More Depth

Be specific — name the exact pages and claims involved.
"""
    semantic_report = call_llm(prompt, "LLM_MODEL", "claude-3-5-sonnet-latest", max_tokens=3000)

    # Compose full report
    report_lines = [
        f"# Wiki Lint Report — {today}",
        "",
        f"Scanned {len(pages)} pages.",
        "",
        "## Structural Issues",
        "",
    ]
    if orphans:
        report_lines.append("### Orphan Pages (no inbound links)")
        for p in orphans:
            report_lines.append(f"- `{p.relative_to(REPO_ROOT)}`")
        report_lines.append("")
    if broken:
        report_lines.append("### Broken Wikilinks")
        for page, link in broken:
            report_lines.append(f"- `{page.relative_to(REPO_ROOT)}` links to `[[{link}]]` — not found")
        report_lines.append("")
    if missing_entities:
        report_lines.append("### Missing Entity Pages (mentioned 3+ times but no page)")
        for name in missing_entities:
            report_lines.append(f"- `[[{name}]]`")
        report_lines.append("")
    if not orphans and not broken and not missing_entities:
        report_lines.append("No structural issues found.")
        report_lines.append("")
    report_lines.append("---")
    report_lines.append("")
    report_lines.append(semantic_report)

    report = "\n".join(report_lines)
    print("\n" + report)
    return report


def append_log(entry: str):
    existing = read_file(LOG_FILE)
    LOG_FILE.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Lint the LLM Wiki")
    parser.add_argument("--save", action="store_true", help="Save lint report to wiki/lint-report.md")
    args = parser.parse_args()

    report = run_lint()
    if args.save and report:
        report_path = WIKI_DIR / "lint-report.md"
        report_path.write_text(report, encoding="utf-8")
        print(f"\nSaved: {report_path.relative_to(REPO_ROOT)}")

    today = date.today().isoformat()
    append_log(f"## [{today}] lint | Wiki health check\n\nRan lint. See lint-report.md for details.")
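
The two deterministic link checks can be tried without touching the filesystem. The following is a rough sketch, assuming the wiki is represented as a dict from page name (the file stem) to page text; `check_links` and the three sample pages are hypothetical and only mirror the case-insensitive stem matching that the lint tool uses.

```python
import re
from collections import defaultdict


def extract_wikilinks(content: str) -> list[str]:
    return re.findall(r"\[\[([^\]]+)\]\]", content)


def check_links(pages: dict[str, str]) -> tuple[list[str], list[tuple[str, str]]]:
    """Return (orphan page names, broken (page, link) pairs) for an in-memory wiki."""
    by_lower = {name.lower(): name for name in pages}
    inbound = defaultdict(int)
    broken = []
    for name, content in pages.items():
        for link in extract_wikilinks(content):
            target = by_lower.get(link.lower())
            if target is None:
                # No page with that stem exists
                broken.append((name, link))
            elif target != name:
                # Count only links that come from a different page
                inbound[target] += 1
    orphans = [name for name in pages if inbound[name] == 0]
    return orphans, broken


# Hypothetical sample wiki
pages = {
    "Transformer": "See [[Attention]] and [[GPU]].",
    "Attention": "Used by [[transformer]].",
    "Notes": "Nothing links here.",
}
orphans, broken = check_links(pages)
print(orphans)
print(broken)
```

Building the lowercase lookup once keeps the check linear in the number of links, whereas resolving every link by rescanning the wiki directory grows quadratically with wiki size.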

192
tools/query.py Normal file

@@ -0,0 +1,192 @@
#!/usr/bin/env python3
"""
Query the LLM Wiki.

Usage:
    python tools/query.py "What are the main themes across all sources?"
    python tools/query.py "How does ConceptA relate to ConceptB?" --save
    python tools/query.py "Summarize everything about EntityName" --save syntheses/my-analysis.md

Flags:
    --save          Save the answer back into the wiki (prompts for filename)
    --save <path>   Save to a specific wiki path (relative to wiki/)
"""
import os
import sys
import re
import json
import argparse
from pathlib import Path
from datetime import date

REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
INDEX_FILE = WIKI_DIR / "index.md"
LOG_FILE = WIKI_DIR / "log.md"
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"


def read_file(path: Path) -> str:
    return path.read_text(encoding="utf-8") if path.exists() else ""


def write_file(path: Path, content: str):
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    print(f" saved: {path.relative_to(REPO_ROOT)}")


def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str:
    try:
        from litellm import completion
    except ImportError:
        print("Error: litellm not installed. Run: pip install litellm")
        sys.exit(1)
    model = os.getenv(model_env, default_model)
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


def find_relevant_pages(question: str, index_content: str) -> list[Path]:
    """Extract linked pages from index that seem relevant to the question."""
    # Pull all markdown links of the form [title](href) from the index
    md_links = re.findall(r'\[([^\]]+)\]\(([^)]+)\)', index_content)
    question_lower = question.lower()
    relevant = []
    for title, href in md_links:
        title_lower = title.lower()
        match = False
        # 1. English/Space-separated: check words > 3 chars
        if any(word in question_lower for word in title_lower.split() if len(word) > 3):
            match = True
        # 2. Exact substring match for the whole title (useful for short CJK titles, e.g. len=2)
        elif len(title_lower) >= 2 and title_lower in question_lower:
            match = True
        # 3. CJK chunks: find contiguous non-ASCII characters (len >= 2) and check if in question
        elif any(chunk in question_lower for chunk in re.findall(r'[^\x00-\x7F]{2,}', title_lower)):
            match = True
        if match:
            p = WIKI_DIR / href
            if p.exists() and p not in relevant:
                relevant.append(p)

    # Always include overview
    overview = WIKI_DIR / "overview.md"
    if overview.exists() and overview not in relevant:
        relevant.insert(0, overview)
    return relevant[:12]  # cap to avoid context overflow


def append_log(entry: str):
    existing = read_file(LOG_FILE)
    LOG_FILE.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8")


def query(question: str, save_path: str | None = None):
    today = date.today().isoformat()

    # Step 1: Read index
    index_content = read_file(INDEX_FILE)
    if not index_content:
        print("Wiki is empty. Ingest some sources first with: python tools/ingest.py <source>")
        sys.exit(1)

    # Step 2: Find relevant pages
    relevant_pages = find_relevant_pages(question, index_content)

    # If keyword matching found nothing beyond the overview, ask the LLM to pick pages from the index
    if not relevant_pages or len(relevant_pages) <= 1:
        print(" selecting relevant pages via API...")
        prompt = f"Given this wiki index:\n\n{index_content}\n\nWhich pages are most relevant to answering: \"{question}\"\n\nReturn ONLY a JSON array of relative file paths (as listed in the index), e.g. [\"sources/foo.md\", \"concepts/Bar.md\"]. Maximum 10 pages."
        raw = call_llm(prompt, "LLM_MODEL_FAST", "claude-3-5-haiku-latest", max_tokens=512)
        raw = raw.strip()
        raw = re.sub(r"^```(?:json)?\s*", "", raw)
        raw = re.sub(r"\s*```$", "", raw)
        try:
            paths = json.loads(raw)
            relevant_pages = [WIKI_DIR / p for p in paths if (WIKI_DIR / p).exists()]
        except (json.JSONDecodeError, TypeError):
            # Keep whatever the keyword pass found if the reply is not a JSON list of paths
            pass

    # Step 3: Read relevant pages
    pages_context = ""
    for p in relevant_pages:
        rel = p.relative_to(REPO_ROOT)
        pages_context += f"\n\n### {rel}\n{p.read_text(encoding='utf-8')}"
    if not pages_context:
        pages_context = f"\n\n### wiki/index.md\n{index_content}"

    schema = read_file(SCHEMA_FILE)

    # Step 4: Synthesize answer
    print(f" synthesizing answer from {len(relevant_pages)} pages...")
    prompt = f"""You are querying an LLM Wiki to answer a question. Use the wiki pages below to synthesize a thorough answer. Cite sources using [[PageName]] wikilink syntax.

Schema:
{schema}

Wiki pages:
{pages_context}

Question: {question}

Write a well-structured markdown answer with headers, bullets, and [[wikilink]] citations. At the end, add a ## Sources section listing the pages you drew from.
"""
    answer = call_llm(prompt, "LLM_MODEL", "claude-3-5-sonnet-latest", max_tokens=4096)
    print("\n" + "=" * 60)
    print(answer)
    print("=" * 60)

    # Step 5: Optionally save answer
    if save_path is not None:
        if save_path == "":
            # Prompt for filename
            slug = input("\nSave as (slug, e.g. 'my-analysis'): ").strip()
            if not slug:
                print("Skipping save.")
                return
            save_path = f"syntheses/{slug}.md"
        full_save_path = WIKI_DIR / save_path
        frontmatter = f"""---
title: "{question[:80]}"
type: synthesis
tags: []
sources: []
last_updated: {today}
---
"""
        write_file(full_save_path, frontmatter + answer)

        # Update index
        index_content = read_file(INDEX_FILE)
        entry = f"- [{question[:60]}]({save_path}) — synthesis"
        if "## Syntheses" in index_content:
            index_content = index_content.replace("## Syntheses\n", f"## Syntheses\n{entry}\n")
            INDEX_FILE.write_text(index_content, encoding="utf-8")
            print(f" indexed: {save_path}")

    # Append to log
    append_log(f"## [{today}] query | {question[:80]}\n\nSynthesized answer from {len(relevant_pages)} pages." +
               (f" Saved to {save_path}." if save_path else ""))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Query the LLM Wiki")
    parser.add_argument("question", help="Question to ask the wiki")
    parser.add_argument("--save", nargs="?", const="", default=None,
                        help="Save answer to wiki (optionally specify path)")
    args = parser.parse_args()
    query(args.question, args.save)
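
The three title-matching rules behave differently for space-separated and CJK titles, which is easiest to see on a few inputs. Below is a small sketch that restates the rules as a single predicate; `title_matches` and the sample titles and questions are hypothetical and are chosen only to exercise each rule once.

```python
import re


def title_matches(title: str, question: str) -> bool:
    t, q = title.lower(), question.lower()
    # Rule 1: any space-separated title word longer than 3 characters appears in the question
    if any(word in q for word in t.split() if len(word) > 3):
        return True
    # Rule 2: the whole title (at least 2 characters) appears in the question
    if len(t) >= 2 and t in q:
        return True
    # Rule 3: any run of 2+ non-ASCII characters from the title appears in the question
    return any(chunk in q for chunk in re.findall(r"[^\x00-\x7F]{2,}", t))


# Rule 1: the English word "transformer" is found in the question
print(title_matches("Transformer Models", "What about transformer scaling?"))
# Rule 1 also covers a mixed title with no spaces, because it counts as one long word
print(title_matches("AI转型", "关于AI转型的建议"))
# Rule 3: the CJK run "杨帆" is found even though the full title is not
print(title_matches("杨帆 notes", "杨帆说了什么"))
# No rule fires: "gpu" is too short for rule 1 and absent from the question
print(title_matches("GPU", "What is attention?"))
```

One limitation worth keeping in mind: rule 3 compares whole runs of non-ASCII characters, so a long CJK title only matches when the question contains that entire run, not just a shared two-character word inside it.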

14
wiki/index.md Normal file

@@ -0,0 +1,14 @@
# Wiki Index
This file is maintained by the LLM. Updated on every ingest.
## Overview
- [Overview](overview.md) — living synthesis across all sources
## Sources
## Entities
## Concepts
## Syntheses

9
wiki/log.md Normal file

@@ -0,0 +1,9 @@
# Wiki Log
Append-only chronological record of all operations.
Format: `## [YYYY-MM-DD] <operation> | <title>`
Parse recent entries: `grep "^## \[" wiki/log.md | tail -10`
---

17
wiki/overview.md Normal file

@@ -0,0 +1,17 @@
---
title: "Overview"
type: synthesis
tags: []
sources: []
last_updated: ""
---
# Overview
*This page is maintained by the LLM. It is updated on every ingest to reflect the current synthesis across all sources.*
No sources ingested yet. Add your first source with:
```bash
python tools/ingest.py raw/your-source.md
```