Rewrite README for clarity and impact

- Lead with one-sentence hook + output structure upfront
- Add "What You Get" section naming concrete deliverables
- Consolidate agent compatibility into schema file table
- Add tech stack one-liner
- Streamline use cases, quick start, and graph sections
Anil Matcha
2026-04-07 08:21:48 +05:30
parent e94fdbdafe
commit d8ac6107bf

370
README.md

````diff
@@ -2,125 +2,199 @@
 [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
-**A personal knowledge base that builds and maintains itself.** Drop in source documents — articles, papers, notes — and the LLM reads them, extracts the knowledge, and integrates everything into a persistent, interlinked wiki. You never write the wiki. Claude does.
-Unlike RAG systems that re-derive knowledge from scratch on every query, LLM Wiki Agent compiles knowledge once and keeps it current. Cross-references are pre-built. Contradictions are flagged at ingest time. Every new source makes the wiki richer.
-## How It Works
+**A coding agent skill.** Drop source documents into `raw/` and type `/wiki-ingest` — the agent reads them, extracts knowledge, and builds a persistent interlinked wiki. Every new source makes the wiki richer. You never write it.
+> Most knowledge tools make you search your own notes. This one reads everything you've collected and writes a structured wiki that compounds over time — cross-references already built, contradictions already flagged, synthesis already done.
 ```
-You drop a source → Claude reads it → wiki pages are created/updated → graph is rebuilt
-You ask a question → Claude reads relevant wiki pages → synthesizes answer with citations
+/wiki-ingest raw/papers/attention-is-all-you-need.md
 ```
-Three layers:
+```
+wiki/
+├── index.md        catalog of all pages — updated on every ingest
+├── log.md          append-only record of every operation
+├── overview.md     living synthesis across all sources
+├── sources/        one summary page per source document
+├── entities/       people, companies, projects — auto-created
+├── concepts/       ideas, frameworks, methods — auto-created
+└── syntheses/      query answers filed back as wiki pages
+graph/
+├── graph.json      persistent node/edge data (SHA256-cached)
+└── graph.html      interactive vis.js visualization — open in any browser
+```
````
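The `wiki/` tree in this diff describes `index.md` as a catalog refreshed on every ingest and `log.md` as an append-only record. The Python sketch below shows one way that bookkeeping could work, assuming pages are plain `.md` files in the four subdirectories listed and that a file's stem doubles as its wikilink target. `rebuild_index`, `append_log`, and `SECTIONS` are hypothetical names rather than code from the repository, and the entry formats are invented for illustration.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Hypothetical layout, mirroring the wiki/ subdirectories in the tree.
SECTIONS = ("sources", "entities", "concepts", "syntheses")


def rebuild_index(wiki: Path) -> str:
    """Regenerate index.md as a catalog of every page, grouped by subdirectory."""
    lines = ["# Index", ""]
    for section in SECTIONS:
        folder = wiki / section
        if not folder.is_dir():
            continue
        pages = sorted(folder.glob("*.md"))
        if not pages:
            continue
        lines.append(f"## {section}")
        lines.extend(f"- [[{page.stem}]]" for page in pages)
        lines.append("")
    text = "\n".join(lines)
    (wiki / "index.md").write_text(text, encoding="utf-8")
    return text


def append_log(wiki: Path, operation: str, detail: str,
               now: Optional[datetime] = None) -> None:
    """Add one line to log.md; append mode means earlier entries are never rewritten."""
    now = now or datetime.now(timezone.utc)
    with (wiki / "log.md").open("a", encoding="utf-8") as log:
        log.write(f"- {now:%Y-%m-%d %H:%M} {operation}: {detail}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        wiki = Path(tmp)
        (wiki / "sources").mkdir()
        (wiki / "sources" / "attention-is-all-you-need.md").write_text(
            "# Attention", encoding="utf-8")
        append_log(wiki, "ingest", "raw/papers/attention-is-all-you-need.md")
        print(rebuild_index(wiki))
```

Rebuilding the whole catalog on each ingest keeps it trivially consistent with the files on disk, while the log is only ever opened in append mode.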
````diff
-- **`raw/`** — your source documents (immutable, you own this)
-- **`wiki/`** — Claude-maintained markdown pages (Claude writes, you read)
-- **`graph/`** — auto-generated knowledge graph visualization
-## Quick Start
+## Install
+**Requires:** [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), or any agent that reads a config file.
 ```bash
 git clone https://github.com/SamurAIGPT/llm-wiki-agent.git
 cd llm-wiki-agent
 ```
-Open it in your coding agent — **no API key or Python setup needed**:
+Open in your agent — no API key or Python setup needed:
 ```bash
-claude      # Claude Code
-codex       # OpenAI Codex
-opencode    # OpenCode / Pear AI
-gemini      # Gemini CLI
+claude      # reads CLAUDE.md + .claude/commands/
+codex       # reads AGENTS.md
+opencode    # reads AGENTS.md
+gemini      # reads GEMINI.md
 ```
-Each agent reads its config file automatically (`CLAUDE.md`, `AGENTS.md`, or `GEMINI.md`) and follows the same workflows. Then just talk to it:
+## Usage
 ```
-# Claude Code — slash commands:
-/wiki-ingest raw/articles/my-article.md
-/wiki-query what are the main themes across all sources?
+/wiki-ingest raw/papers/my-paper.md        # ingest a source into the wiki
+/wiki-ingest raw/articles/my-article.md    # works on any markdown file
+/wiki-query "what are the main themes?"    # synthesize answer from wiki pages
+/wiki-query "how does X relate to Y?"      # with [[wikilink]] citations
+/wiki-lint                                 # find orphans, contradictions, gaps
+/wiki-graph                                # build graph.html from all wikilinks
+```
+Plain English also works with any agent:
+```
+"Ingest this paper: raw/papers/llama2.md"
+"What does the wiki say about attention mechanisms?"
+"Check for contradictions across sources"
+"Build the knowledge graph and tell me the most connected nodes"
+```
+Works with any markdown source — articles, papers, book chapters, meeting notes, journal entries, research summaries.
+## What You Get
+**Persistent wiki** — structured markdown pages that accumulate across sessions. Unlike chat, nothing is lost.
+**Entity pages** — auto-created for every person, company, or project mentioned across sources. Updated each time a new source references them.
+**Concept pages** — auto-created for every key idea or framework. Cross-referenced to every source that discusses them.
+**Living overview** — `wiki/overview.md` is revised on every ingest to reflect the current synthesis across everything you've read.
+**Contradiction flags** — when a new source contradicts an existing claim, it's flagged at ingest time, not buried until query time.
+**Knowledge graph** — `graph.html` shows every wiki page as a node, every `[[wikilink]]` as an edge, and Claude-inferred implicit relationships as dotted edges. Community detection clusters related topics.
+**Lint reports** — orphan pages, broken links, missing entity pages, data gaps with suggested sources to fill them.
````
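Two of the lint checks listed under What You Get, orphan pages and broken links, can be derived from wikilinks alone without calling a model. A minimal sketch, assuming pages arrive as a mapping from page name (the text inside `[[...]]`) to markdown body, and assuming `index` and `log` are catalog pages whose links should not count as inbound references. `lint_pages`, `WIKILINK`, and `CATALOG_PAGES` are hypothetical; contradiction and gap detection need the agent and are not modeled here.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#heading]], capturing "Target".
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")

# Assumption: catalog pages link to everything, so their links must not rescue orphans.
CATALOG_PAGES = {"index", "log"}


def lint_pages(pages: dict[str, str]) -> dict[str, list]:
    """Report pages nobody links to and links that point at missing pages."""
    links = {
        name: {target.strip() for target in WIKILINK.findall(body)}
        for name, body in pages.items()
    }
    broken = sorted(
        (src, dst) for src, targets in links.items() for dst in targets if dst not in pages
    )
    inbound = {
        dst
        for src, targets in links.items()
        if src not in CATALOG_PAGES
        for dst in targets
        if dst != src  # a self-link does not count as being referenced
    }
    orphans = sorted(
        name for name in pages if name not in inbound and name not in CATALOG_PAGES
    )
    return {"orphans": orphans, "broken_links": broken}


pages = {
    "index": "[[Attention]] [[RLHF]] [[Transformer]]",
    "Attention": "Core of the [[Transformer]]; compare [[Mixtral]].",
    "Transformer": "Built on [[Attention|self-attention]].",
    "RLHF": "Nothing links here yet.",
}
print(lint_pages(pages))
# → {'orphans': ['RLHF'], 'broken_links': [('Attention', 'Mixtral')]}
```

Skipping catalog pages is the key design choice in this sketch: if the index's links counted, every page would look referenced and no orphan would ever be reported.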
````diff
+## Use Cases
+### Research
+Going deep on a topic over weeks — reading papers, articles, reports.
+```
+/wiki-ingest raw/papers/attention-is-all-you-need.md
+/wiki-ingest raw/papers/llama2.md
+/wiki-ingest raw/papers/rag-survey.md
+# Wiki builds entity pages (Meta AI, Google Brain) and
+# concept pages (Attention, RLHF, Context Window) automatically.
+/wiki-query "What are the main approaches to reducing hallucination?"
+/wiki-query "How has context window size evolved across models?"
 /wiki-lint
-/wiki-graph
-# Any agent — plain English works too:
-"Ingest this paper: raw/papers/my-paper.md"
-"What does the wiki say about X?"
-"Check for contradictions"
-"Build the knowledge graph"
+# → "No sources on mixture-of-experts — consider the Mixtral paper"
 ```
-| Agent | Config file |
-|---|---|
-| [Claude Code](https://claude.ai/code) | `CLAUDE.md` + `.claude/commands/` |
-| [OpenAI Codex](https://openai.com/codex) | `AGENTS.md` |
-| OpenCode / Pear AI | `AGENTS.md` |
-| [Gemini CLI](https://github.com/google-gemini/gemini-cli) | `GEMINI.md` |
-> **Standalone use** (without a coding agent): `pip install -r requirements.txt`, set `ANTHROPIC_API_KEY`, then use `python tools/ingest.py`, `python tools/query.py`, etc.
-## Architecture
+By the end you have a structured, interlinked reference — not a folder of PDFs you'll never reopen.
+---
+### Reading a Book
+File each chapter as you go. Build out pages for characters, themes, arguments.
 ```
-raw/                    ← your sources (never modified by LLM)
-wiki/
-  index.md              ← catalog of all pages (updated on every ingest)
-  log.md                ← append-only operation log
-  overview.md           ← living synthesis across all sources
-  sources/              ← one page per source document
-  entities/             ← people, companies, projects
-  concepts/             ← ideas, frameworks, methods
-  syntheses/            ← answers to queries, filed back as pages
-graph/
-  graph.json            ← node/edge data (SHA256-cached)
-  graph.html            ← interactive vis.js visualization
-tools/
-  ingest.py             ← process a new source
-  query.py              ← ask a question
-  lint.py               ← health-check the wiki
-  build_graph.py        ← rebuild the knowledge graph
-CLAUDE.md               ← schema and workflow instructions for the LLM
+/wiki-ingest raw/book/chapter-01.md
+/wiki-ingest raw/book/chapter-02.md
+# Wiki creates entity and theme pages automatically.
+/wiki-query "How has the protagonist's motivation evolved?"
+/wiki-query "What contradictions exist in the author's argument so far?"
+/wiki-graph  # → graph.html shows every character/theme and how they connect
 ```
-## Commands
-### Claude Code (primary — no API key)
-| Slash command | What it does |
-|---|---|
-| `/wiki-ingest <file>` | Read a source, update wiki pages, append to log |
-| `/wiki-query <question>` | Search wiki, synthesize answer with citations |
-| `/wiki-lint` | Check for orphans, broken links, contradictions, gaps |
-| `/wiki-graph` | Build knowledge graph (`graph.json` + `graph.html`) |
-Or describe what you want in plain English — Claude Code follows `CLAUDE.md` and does the right thing.
-### Standalone Python (optional — requires `ANTHROPIC_API_KEY`)
-| Command | What it does |
-|---|---|
-| `python tools/ingest.py <file>` | Ingest a source |
-| `python tools/query.py "<question>"` | Query the wiki |
-| `python tools/query.py "<question>" --save` | Query and file answer back |
-| `python tools/lint.py` | Lint the wiki |
-| `python tools/build_graph.py` | Build graph |
-| `python tools/build_graph.py --no-infer` | Build graph (skip inference, faster) |
-| `python tools/build_graph.py --open` | Build and open in browser |
+Think fan wikis like Tolkien Gateway — built as you read, with the agent doing all the cross-referencing.
+---
+### Personal Knowledge Base
+Track goals, health, habits, self-improvement — file journal entries, articles, podcast notes.
+```
+/wiki-ingest raw/journal/2026-01-week1.md
+/wiki-ingest raw/articles/huberman-sleep-protocol.md
+/wiki-ingest raw/articles/atomic-habits-summary.md
+/wiki-query "What patterns show up in my journal entries about energy?"
+/wiki-query "What habits have I tried and what was the outcome?"
+```
+The wiki builds a structured picture over time. Concepts like "Sleep", "Exercise", "Deep Work" accumulate evidence from every source filed.
+---
+### Business / Team Intelligence
+Feed in meeting transcripts, project docs, customer calls.
+```
+/wiki-ingest raw/meetings/q1-planning-transcript.md
+/wiki-ingest raw/docs/product-roadmap-2026.md
+/wiki-ingest raw/calls/customer-interview-acme.md
+/wiki-query "What feature requests have come up most across customer calls?"
+/wiki-query "What decisions were made in Q1 and what was the rationale?"
+/wiki-lint
+# → "Project X mentioned in 5 pages but no dedicated page"
+# → "Roadmap contradicts customer interview on priority of feature Y"
+```
+The wiki stays current because the agent does the maintenance no one wants to do.
+---
+### Competitive Analysis
+Track a company, market, or technology over time.
+```
+/wiki-ingest raw/competitors/openai-announcements.md
+/wiki-ingest raw/market/ai-funding-report-q1.md
+/wiki-query "How do OpenAI and Anthropic differ on safety approach?"
+/wiki-query "Which companies announced multimodal models in the last 6 months?"
+/wiki-query "Competitive landscape summary as of today" --save
+```
 ## The Graph
-`build_graph.py` runs two passes:
-1. **Deterministic** — parse all `[[wikilinks]]` in every page → explicit edges tagged `EXTRACTED`
-2. **Semantic** — Claude infers implicit relationships not captured by wikilinks → edges tagged `INFERRED` (with confidence) or `AMBIGUOUS`
-Community detection (Louvain) clusters nodes by topic. The output is a self-contained `graph.html` — open it in any browser. SHA256 caching means only changed pages are reprocessed.
+Two-pass build:
+1. **Deterministic** — parses all `[[wikilinks]]` across wiki pages → edges tagged `EXTRACTED`
+2. **Semantic** — agent infers implicit relationships not captured by wikilinks → edges tagged `INFERRED` (with confidence score) or `AMBIGUOUS`
+Louvain community detection clusters nodes by topic. SHA256 cache means only changed pages are reprocessed. Output is a self-contained `graph.html` — no server, opens in any browser.
````
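The deterministic pass and the SHA256 cache described under The Graph involve no model, so they are easy to sketch. The code assumes each page is keyed by its wikilink target and that the cache maps page names to hex digests of their content. `sha256_of`, `changed_pages`, and `extract_edges` are hypothetical names, the edge dictionaries only loosely echo `graph.json` (its real schema is not shown in this diff), and the semantic `INFERRED`/`AMBIGUOUS` pass and Louvain clustering are omitted.

```python
import hashlib
import re

# Matches [[Target]], [[Target|label]] and [[Target#heading]], capturing "Target".
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")


def sha256_of(text: str) -> str:
    """Hex digest of a page body, used as its cache key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_pages(pages: dict[str, str], cache: dict[str, str]) -> list[str]:
    """Pages whose digest differs from the cached one; new pages count as changed."""
    return sorted(
        name for name, body in pages.items() if cache.get(name) != sha256_of(body)
    )


def extract_edges(pages: dict[str, str]) -> list[dict[str, str]]:
    """Pass 1: one EXTRACTED edge per distinct wikilink between two existing pages."""
    edges = []
    for src in sorted(pages):
        targets = {target.strip() for target in WIKILINK.findall(pages[src])}
        for dst in sorted(targets):
            # Links to missing pages are a lint issue, not a graph edge.
            if dst in pages and dst != src:
                edges.append({"source": src, "target": dst, "type": "EXTRACTED"})
    return edges


pages = {
    "Attention": "Part of the [[Transformer]].",
    "Transformer": "Built on [[Attention]].",
}
cache = {name: sha256_of(body) for name, body in pages.items()}
pages["Transformer"] += " See also [[RLHF]]."

print(changed_pages(pages, cache))  # → ['Transformer']
print(len(extract_edges(pages)))    # → 2
```

In this sketch the cache only decides which pages would be handed to the omitted semantic pass; re-parsing wikilinks on every page is cheap enough to repeat on each build.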
````diff
-## CLAUDE.md
-`CLAUDE.md` is the schema document — it tells the LLM how to maintain the wiki. It defines page formats, ingest/query/lint workflows, naming conventions, and log format. This is the key configuration file. Edit it to customize behavior for your domain.
+## CLAUDE.md / AGENTS.md
+The schema file tells the agent how to maintain the wiki — page formats, ingest/query/lint/graph workflows, naming conventions. This is the key config file. Edit it to customize behavior for your domain.
+| Agent | Schema file |
+|---|---|
+| Claude Code | `CLAUDE.md` |
+| Codex / OpenCode | `AGENTS.md` |
+| Gemini CLI | `GEMINI.md` |
 ## What Makes This Different from RAG
@@ -132,141 +206,23 @@ Community detection (Louvain) clusters nodes by topic. The output is a self-cont
 | Contradictions surface at query time (maybe) | Flagged at ingest time |
 | No accumulation | Every source makes the wiki richer |
-## Use Cases
-### Research
-Going deep on a topic over weeks or months — reading papers, articles, reports.
-```
-# Each paper you read gets ingested:
-/wiki-ingest raw/papers/attention-is-all-you-need.md
-/wiki-ingest raw/papers/llama2.md
-/wiki-ingest raw/papers/rag-survey.md
-# Wiki builds up entity pages (e.g. "Meta AI", "Google Brain") and
-# concept pages (e.g. "Attention Mechanism", "RLHF") automatically.
-# Ask synthesis questions across everything you've read:
-/wiki-query "What are the main approaches to reducing hallucination?"
-/wiki-query "How has context window size evolved across models?"
-# Check where your knowledge has gaps:
-/wiki-lint
-# → "No sources on mixture-of-experts — consider reading the Mixtral paper"
-```
-By the end of a research project you have a structured, interlinked reference that reflects everything you've read — not a folder of PDFs you'll never reopen.
----
-### Reading a Book
-File each chapter as you go. Build out pages for characters, themes, plot threads.
-```
-# After each chapter:
-/wiki-ingest raw/book/chapter-01-the-beginning.md
-/wiki-ingest raw/book/chapter-02-the-conflict.md
-# Wiki creates pages like:
-# entities/ElonMusk.md, entities/Tesla.md
-# concepts/FirstPrinciplesThinking.md
-# Mid-book:
-/wiki-query "How has the protagonist's motivation evolved?"
-/wiki-query "What contradictions exist in the author's argument so far?"
-# End of book — build the graph:
-/wiki-graph
-# Open graph.html → see every character/theme/event and how they connect
-```
-Think fan wikis like the Tolkien Gateway — thousands of interlinked pages. You can build something like that as you read, with the agent doing all the cross-referencing.
----
-### Personal Knowledge Base
-Track goals, health, psychology, self-improvement — file journal entries, articles, podcast notes.
-```
-# File your journal entries:
-/wiki-ingest raw/journal/2026-01-week1.md
-/wiki-ingest raw/journal/2026-01-week2.md
-# File articles and podcast notes that resonated:
-/wiki-ingest raw/articles/huberman-sleep-protocol.md
-/wiki-ingest raw/articles/atomic-habits-summary.md
-# Ask introspective questions:
-/wiki-query "What patterns show up in my journal entries about energy levels?"
-/wiki-query "What habits have I tried and what was the outcome?"
-# The wiki builds a structured picture of you over time —
-# entities like "Sleep", "Exercise", "Deep Work" accumulate evidence
-# from every source you've filed.
-```
----
-### Business / Team Intelligence
-Feed in meeting transcripts, Slack exports, project docs, customer calls.
-```
-# Onboard new context:
-/wiki-ingest raw/meetings/q1-planning-transcript.md
-/wiki-ingest raw/docs/product-roadmap-2026.md
-/wiki-ingest raw/calls/customer-interview-acme.md
-# Wiki creates pages for projects, people, decisions, recurring themes.
-# Ask strategic questions:
-/wiki-query "What feature requests have come up most across customer calls?"
-/wiki-query "What decisions were made in Q1 planning and what was the rationale?"
-# Lint catches things like:
-# → "Project X mentioned in 5 pages but no dedicated page"
-# → "Roadmap contradicts customer interview on priority of feature Y"
-```
-The wiki stays current because the agent does the maintenance no one on the team wants to do.
----
-### Competitive Analysis / Due Diligence
-Track a company, market, or technology area over time.
-```
-# Feed in everything you find:
-/wiki-ingest raw/competitors/openai-announcements.md
-/wiki-ingest raw/competitors/anthropic-blog-posts.md
-/wiki-ingest raw/market/ai-funding-report-q1.md
-# Wiki builds entity pages per company, concept pages per technology.
-# Ask comparison questions:
-/wiki-query "How do OpenAI and Anthropic differ in their approach to safety?"
-/wiki-query "Which companies have announced multimodal models in the last 6 months?"
-# Save the answer back as a reusable synthesis:
-/wiki-query "Competitive landscape summary as of today" --save
-```
 ## Tips
-- Use [Obsidian](https://obsidian.md) to read/browse the wiki — follow links, check graph view
+- Use [Obsidian](https://obsidian.md) to browse the wiki — follow links, check graph view, use Dataview for frontmatter queries
 - Use [Obsidian Web Clipper](https://obsidian.md/clipper) to clip web articles directly to `raw/`
-- The wiki is a git repo — you get version history for free
 - File good query answers back with `--save` — your explorations compound just like ingested sources
+- The wiki is a git repo — version history for free
+- Standalone Python scripts in `tools/` work without a coding agent (require `ANTHROPIC_API_KEY`)
-## License
-MIT License — see [LICENSE](LICENSE) for details.
+## Tech Stack
+NetworkX + Louvain + Claude + vis.js. No server, no database, runs entirely locally. Everything is plain markdown files.
 ## Related
 - [graphify](https://github.com/safishamsi/graphify) — graph-based knowledge extraction skill (inspiration for the graph layer)
-- [Vannevar Bush's Memex (1945)](https://en.wikipedia.org/wiki/Memex) — the original vision this is related to in spirit
+- [Vannevar Bush's Memex (1945)](https://en.wikipedia.org/wiki/Memex) — the original vision this resembles
+## License
+MIT License — see [LICENSE](LICENSE) for details.
````