Auto-sync
219
AGENTS.md
Normal file
@@ -0,0 +1,219 @@
# LLM Wiki Agent — Schema & Workflow Instructions

This wiki is maintained entirely by your coding agent. No API key or Python scripts needed — just open this repo in Codex, OpenCode, or any agent that reads this file, and talk to it.

## How to Use

Describe what you want in plain English:
- *"Ingest this file: raw/papers/my-paper.md"*
- *"What does the wiki say about transformer models?"*
- *"Check the wiki for orphan pages and contradictions"*
- *"Build the knowledge graph"*

Or use shorthand triggers:
- `ingest <file>` → runs the Ingest Workflow
- `query: <question>` → runs the Query Workflow
- `lint` → runs the Lint Workflow
- `build graph` → runs the Graph Workflow

---

## Directory Layout

```
raw/            # Immutable source documents — never modify these
wiki/           # Agent owns this layer entirely
  index.md      # Catalog of all pages — update on every ingest
  log.md        # Append-only chronological record
  overview.md   # Living synthesis across all sources
  sources/      # One summary page per source document
  entities/     # People, companies, projects, products
  concepts/     # Ideas, frameworks, methods, theories
  syntheses/    # Saved query answers
graph/          # Auto-generated graph data
tools/          # Optional standalone Python scripts (require ANTHROPIC_API_KEY)
```

---

## Page Format

Every wiki page uses this frontmatter:

```yaml
---
title: "Page Title"
type: source | entity | concept | synthesis
tags: []
sources: []       # list of source slugs that inform this page
last_updated: YYYY-MM-DD
---
```

Use `[[PageName]]` wikilinks to link to other wiki pages.
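
A minimal sketch of how a script might read this frontmatter and collect the wikilinks in a page. It assumes flat `key: value` fields and Obsidian-style `[[Target|alias]]` links; the function names are illustrative and are not taken from the repository's `tools/` scripts.

```python
import re

# Captures the link target, stopping at "]", "|" (alias) or "#" (heading anchor).
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")


def split_frontmatter(text):
    """Return (metadata dict, body). Only flat `key: value` lines are handled."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            # Drop a trailing " # comment" (assumes values contain no " #").
            value = value.split(" #", 1)[0].strip()
            meta[key.strip()] = value.strip('"')
    return {}, text  # no closing delimiter: treat the page as having no frontmatter


def wikilinks(body):
    """Return the sorted, de-duplicated wikilink targets found in the body."""
    return sorted({target.strip() for target in WIKILINK_RE.findall(body)})
```

Values are kept as raw strings (for example `tags` stays `"[]"`); a real tool would more likely hand the block to a YAML parser.
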

---

## Ingest Workflow

Triggered by: *"ingest <file>"*

Steps (in order):
1. Read the source document fully
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
3. Write `wiki/sources/<slug>.md` — use the source page format below
4. Update `wiki/index.md` — add entry under Sources section
5. Update `wiki/overview.md` — revise synthesis if warranted
6. Update/create entity pages for key people, companies, projects mentioned
7. Update/create concept pages for key ideas and frameworks discussed
8. Flag any contradictions with existing wiki content
9. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`
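
Step 4 above is mechanical enough to sketch in code. The helper below is a hypothetical illustration (the agent normally edits the file directly): it adds one entry under the `## Sources` heading of the index text, using the entry shape from the Index Format section, and does nothing if the entry is already present.

```python
def add_source_entry(index_text, title, slug, summary):
    """Return index_text with a Sources entry added (idempotent)."""
    entry = f"- [{title}](sources/{slug}.md) — {summary}"
    lines = index_text.splitlines()
    if entry in lines:
        return index_text
    try:
        start = lines.index("## Sources")
    except ValueError:
        # No Sources section yet: create one at the end of the index.
        lines += ["", "## Sources"]
        start = len(lines) - 1
    # The section ends at the next level-2 heading or at end of file.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("## "):
            end = i
            break
    # Insert before any blank lines that separate this section from the next.
    insert_at = end
    while insert_at > start + 1 and lines[insert_at - 1].strip() == "":
        insert_at -= 1
    lines.insert(insert_at, entry)
    return "\n".join(lines) + "\n"
```

Keeping the function pure (text in, text out) makes it easy to check before anything is written to `wiki/index.md`.
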

### Source Page Format

```markdown
---
title: "Source Title"
type: source
tags: []
date: YYYY-MM-DD
source_file: raw/...
---

## Summary
2–4 sentence summary.

## Key Claims
- Claim 1
- Claim 2

## Key Quotes
> "Quote here" — context

## Connections
- [[EntityName]] — how they relate
- [[ConceptName]] — how it connects

## Contradictions
- Contradicts [[OtherPage]] on: ...
```

### Domain-Specific Templates

If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:

#### Diary / Journal Template
```markdown
---
title: "YYYY-MM-DD Diary"
type: source
tags: [diary]
date: YYYY-MM-DD
---
## Event Summary
...
## Key Decisions
...
## Energy & Mood
...
## Connections
...
## Shifts & Contradictions
...
```

#### Meeting Notes Template
```markdown
---
title: "Meeting Title"
type: source
tags: [meeting]
date: YYYY-MM-DD
---
## Goal
...
## Key Discussions
...
## Decisions Made
...
## Action Items
...
```

---

## Query Workflow

Triggered by: *"query: <question>"*

Steps:
1. Read `wiki/index.md` to identify relevant pages
2. Read those pages
3. Synthesize an answer with inline citations as `[[PageName]]` wikilinks
4. Ask the user if they want the answer filed as `wiki/syntheses/<slug>.md`

---

## Lint Workflow

Triggered by: *"lint"*

Check for:
- **Orphan pages** — wiki pages with no inbound `[[links]]` from other pages
- **Broken links** — `[[WikiLinks]]` pointing to pages that don't exist
- **Contradictions** — claims that conflict across pages
- **Stale summaries** — pages not updated after newer sources
- **Missing entity pages** — entities mentioned in 3+ pages but lacking their own page
- **Data gaps** — questions the wiki can't answer; suggest new sources

Output a lint report and ask if the user wants it saved to `wiki/lint-report.md`.
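
The two purely structural checks, orphan pages and broken links, can be sketched as follows. This is an illustration only: it assumes a wikilink target equals the file stem of the page it points to, and the function names are hypothetical.

```python
import re
from pathlib import Path

WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")


def lint_links(pages):
    """pages maps page name (file stem) -> markdown text.

    Returns (orphans, broken): pages with no inbound wikilink from another
    page, and (source page, missing target) pairs.
    """
    inbound = {name: set() for name in pages}
    broken = []
    for name, text in pages.items():
        for target in {t.strip() for t in WIKILINK_RE.findall(text)}:
            if target not in pages:
                broken.append((name, target))
            elif target != name:  # a self-link does not count as inbound
                inbound[target].add(name)
    orphans = sorted(name for name, sources in inbound.items() if not sources)
    return orphans, sorted(broken)


def load_pages(wiki_dir="wiki"):
    """Read every markdown file under wiki_dir, keyed by file stem."""
    return {p.stem: p.read_text(encoding="utf-8") for p in Path(wiki_dir).rglob("*.md")}
```

Because `index.md` uses ordinary markdown links rather than wikilinks, pages such as `index`, `log`, and `overview` will show up as orphans under this rule; a caller would probably filter them out. Contradictions, stale summaries, and data gaps need the agent's judgment and are not covered here.
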

---

## Graph Workflow

Triggered by: *"build graph"*

First try: `python tools/build_graph.py --open`

If Python or its dependencies are unavailable, build the graph manually:
1. Search for all `[[wikilinks]]` across wiki pages
2. Build nodes (one per page) and edges (one per link)
3. Infer implicit relationships not captured by wikilinks — tag each `INFERRED` with a confidence score; tag low-confidence ones `AMBIGUOUS`
4. Write `graph/graph.json` with `{nodes, edges, built: date}`
5. Write `graph/graph.html` as a self-contained vis.js visualization
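
Steps 1, 2, and 4 of the manual build can be sketched as below. The field names (`id`, `from`, `to`, `type`) are assumptions chosen to resemble what vis.js expects, not the schema written by `tools/build_graph.py`, and the `EXTRACTED` tag is a label assumed here for edges that come directly from wikilinks. Inference (step 3) and the HTML page (step 5) are left out.

```python
import json
import re
from datetime import date

WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")


def build_graph(pages, built=None):
    """pages maps page name -> markdown text; returns a {nodes, edges, built} dict."""
    nodes = [{"id": name} for name in sorted(pages)]
    edges = []
    for source in sorted(pages):
        targets = {t.strip() for t in WIKILINK_RE.findall(pages[source])}
        for target in sorted(targets):
            # Skip broken links and self-links; repeated links collapse to one edge.
            if target in pages and target != source:
                edges.append({"from": source, "to": target, "type": "EXTRACTED"})
    return {"nodes": nodes, "edges": edges, "built": built or date.today().isoformat()}


def write_graph(graph, path="graph/graph.json"):
    """Serialize the graph; the target directory is assumed to exist."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(graph, fh, indent=2)
```

Sorting the pages and targets keeps the output stable between runs, so `graph.json` only changes when the wiki does.
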

---

## Naming Conventions

- Source slugs: `kebab-case` matching source filename
- Entity pages: `TitleCase.md` (e.g. `OpenAI.md`, `SamAltman.md`)
- Concept pages: `TitleCase.md` (e.g. `ReinforcementLearning.md`, `RAG.md`)
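
One possible reading of these conventions in code, as a sketch. Both helper names are hypothetical, and the handling of acronyms (kept as-is when already upper case) is an assumption based on the `RAG.md` example.

```python
import re
from pathlib import PurePosixPath


def kebab_slug(source_path):
    """Derive a kebab-case slug from a source file path such as raw/papers/x.md."""
    stem = PurePosixPath(source_path).stem
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-")


def title_case_name(name):
    """Join the words of an entity or concept name into a TitleCase page name."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    # Upper-case the first letter only, so "OpenAI" and "RAG" keep their casing.
    return "".join(w if w.isupper() else w[0].upper() + w[1:] for w in words)
```

For example, `kebab_slug("raw/papers/my-paper.md")` gives `my-paper`, and `title_case_name("Sam Altman") + ".md"` gives `SamAltman.md`.
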

## Index Format

```markdown
# Wiki Index

## Overview
- [Overview](overview.md) — living synthesis

## Sources
- [Source Title](sources/slug.md) — one-line summary

## Entities
- [Entity Name](entities/EntityName.md) — one-line description

## Concepts
- [Concept Name](concepts/ConceptName.md) — one-line description

## Syntheses
- [Analysis Title](syntheses/slug.md) — what question it answers
```

## Log Format

`## [YYYY-MM-DD] <operation> | <title>`

Operations: `ingest`, `query`, `lint`, `graph`
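
A small sketch of producing a heading in this format and appending it to the log. The helper names are illustrative; the rejection of unknown operations is an assumption about how strictly the list above should be enforced.

```python
from datetime import date

OPERATIONS = {"ingest", "query", "lint", "graph"}


def log_heading(operation, title, day=None):
    """Format one log heading; day is an ISO date string, defaulting to today."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    day = day or date.today().isoformat()
    return f"## [{day}] {operation} | {title}"


def append_log(log_path, operation, title, day=None):
    """Append a heading to the log file; existing content is never rewritten."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write("\n" + log_heading(operation, title, day) + "\n")
```

Opening the file in append mode matches the "append-only" rule for `wiki/log.md` in the directory layout.
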
230
CLAUDE.md
Normal file
@@ -0,0 +1,230 @@
# LLM Wiki Agent — Schema & Workflow Instructions

This wiki is maintained entirely by Claude Code. No API key or Python scripts needed — just open this repo in Claude Code and talk to it.

## Slash Commands (Claude Code)

| Command | What to say |
|---|---|
| `/wiki-ingest` | `ingest raw/my-article.md` |
| `/wiki-query` | `query: what are the main themes?` |
| `/wiki-lint` | `lint the wiki` |
| `/wiki-graph` | `build the knowledge graph` |

Or just describe what you want in plain English:
- *"Ingest this file: raw/papers/attention-is-all-you-need.md"*
- *"What does the wiki say about transformer models?"*
- *"Check the wiki for orphan pages and contradictions"*
- *"Build the graph and show me what's connected to RAG"*

Claude Code reads this file automatically and follows the workflows below.

---

## Directory Layout

```
raw/            # Immutable source documents — never modify these
wiki/           # Claude owns this layer entirely
  index.md      # Catalog of all pages — update on every ingest
  log.md        # Append-only chronological record
  overview.md   # Living synthesis across all sources
  sources/      # One summary page per source document
  entities/     # People, companies, projects, products
  concepts/     # Ideas, frameworks, methods, theories
  syntheses/    # Saved query answers
graph/          # Auto-generated graph data
tools/          # Optional standalone Python scripts (require ANTHROPIC_API_KEY)
```

---

## Page Format

Every wiki page uses this frontmatter:

```yaml
---
title: "Page Title"
type: source | entity | concept | synthesis
tags: []
sources: []       # list of source slugs that inform this page
last_updated: YYYY-MM-DD
---
```

Use `[[PageName]]` wikilinks to link to other wiki pages.

---

## Ingest Workflow

Triggered by: *"ingest <file>"* or `/wiki-ingest`

Steps (in order):
1. Read the source document fully using the Read tool
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
3. Write `wiki/sources/<slug>.md` — use the source page format below
4. Update `wiki/index.md` — add entry under Sources section
5. Update `wiki/overview.md` — revise synthesis if warranted
6. Update/create entity pages for key people, companies, projects mentioned
7. Update/create concept pages for key ideas and frameworks discussed
8. Flag any contradictions with existing wiki content
9. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`

### Source Page Format

```markdown
---
title: "Source Title"
type: source
tags: []
date: YYYY-MM-DD
source_file: raw/...
---

## Summary
2–4 sentence summary.

## Key Claims
- Claim 1
- Claim 2

## Key Quotes
> "Quote here" — context

## Connections
- [[EntityName]] — how they relate
- [[ConceptName]] — how it connects

## Contradictions
- Contradicts [[OtherPage]] on: ...
```

### Domain-Specific Templates

If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:

#### Diary / Journal Template
```markdown
---
title: "YYYY-MM-DD Diary"
type: source
tags: [diary]
date: YYYY-MM-DD
---
## Event Summary
...
## Key Decisions
...
## Energy & Mood
...
## Connections
...
## Shifts & Contradictions
...
```

#### Meeting Notes Template
```markdown
---
title: "Meeting Title"
type: source
tags: [meeting]
date: YYYY-MM-DD
---
## Goal
...
## Key Discussions
...
## Decisions Made
...
## Action Items
...
```

---

## Query Workflow

Triggered by: *"query: <question>"* or `/wiki-query`

Steps:
1. Read `wiki/index.md` to identify relevant pages
2. Read those pages with the Read tool
3. Synthesize an answer with inline citations as `[[PageName]]` wikilinks
4. Ask the user if they want the answer filed as `wiki/syntheses/<slug>.md`

---

## Lint Workflow

Triggered by: *"lint the wiki"* or `/wiki-lint`

Use Grep and Read tools to check for:
- **Orphan pages** — wiki pages with no inbound `[[links]]` from other pages
- **Broken links** — `[[WikiLinks]]` pointing to pages that don't exist
- **Contradictions** — claims that conflict across pages
- **Stale summaries** — pages not updated after newer sources
- **Missing entity pages** — entities mentioned in 3+ pages but lacking their own page
- **Data gaps** — questions the wiki can't answer; suggest new sources

Output a lint report and ask if the user wants it saved to `wiki/lint-report.md`.
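
The "missing entity pages" check can be approximated mechanically. The sketch below counts wikilink targets only, which is an assumption: an entity mentioned in plain text without a `[[link]]` would be missed, so the agent's own reading still matters. The function name and the `min_mentions` parameter are illustrative.

```python
import re
from collections import defaultdict

WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")


def missing_pages(pages, min_mentions=3):
    """pages maps page name -> text. Return link targets that have no page of
    their own but are referenced from at least min_mentions distinct pages."""
    mentioned_in = defaultdict(set)
    for name, text in pages.items():
        for target in WIKILINK_RE.findall(text):
            mentioned_in[target.strip()].add(name)
    return sorted(
        target
        for target, sources in mentioned_in.items()
        if target not in pages and len(sources) >= min_mentions
    )
```

Counting distinct pages rather than raw occurrences matches the "mentioned in 3+ pages" wording above.
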

---

## Graph Workflow

Triggered by: *"build the knowledge graph"* or `/wiki-graph`

When the user asks to build the graph, run `tools/build_graph.py`, which:
- Pass 1: Parses all `[[wikilinks]]` → deterministic `EXTRACTED` edges
- Pass 2: Infers implicit relationships → `INFERRED` edges with confidence scores
- Runs Louvain community detection
- Outputs `graph/graph.json` + `graph/graph.html`

If the user doesn't have Python or the dependencies set up, generate the graph data manually instead:
1. Use Grep to find all `[[wikilinks]]` across wiki pages
2. Build a node/edge list
3. Write `graph/graph.json` directly
4. Write `graph/graph.html` using the vis.js template

---

## Naming Conventions

- Source slugs: `kebab-case` matching source filename
- Entity pages: `TitleCase.md` (e.g. `OpenAI.md`, `SamAltman.md`)
- Concept pages: `TitleCase.md` (e.g. `ReinforcementLearning.md`, `RAG.md`)
- Source pages: `kebab-case.md`

## Index Format

```markdown
# Wiki Index

## Overview
- [Overview](overview.md) — living synthesis

## Sources
- [Source Title](sources/slug.md) — one-line summary

## Entities
- [Entity Name](entities/EntityName.md) — one-line description

## Concepts
- [Concept Name](concepts/ConceptName.md) — one-line description

## Syntheses
- [Analysis Title](syntheses/slug.md) — what question it answers
```

## Log Format

Each entry starts with `## [YYYY-MM-DD] <operation> | <title>` so it's grep-parseable:

```
grep "^## \[" wiki/log.md | tail -10
```

Operations: `ingest`, `query`, `lint`, `graph`
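
For scripts that prefer structured data over `grep` output, the same heading format can be parsed in Python. This is a sketch; the function name is hypothetical and the regular expression assumes operations are single words, as in the list above.

```python
import re

# Matches headings such as: ## [2026-01-05] ingest | Attention Is All You Need
LOG_RE = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def recent_entries(log_text, limit=10):
    """Return the last `limit` log headings as (date, operation, title) tuples."""
    entries = []
    for line in log_text.splitlines():
        match = LOG_RE.match(line)
        if match:
            entries.append(match.groups())
    return entries[-limit:] if limit > 0 else []
```

Lines that are not headings (notes written under an entry, blank lines) are ignored, which mirrors what the `grep` pattern does.
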
175
GEMINI.md
Normal file
@@ -0,0 +1,175 @@
# LLM Wiki Agent — Schema & Workflow Instructions

This wiki is maintained entirely by Gemini CLI. No API key or Python scripts needed — just open this repo with `gemini` and talk to it.

## How to Use

Describe what you want in plain English:
- *"Ingest this file: raw/papers/my-paper.md"*
- *"What does the wiki say about transformer models?"*
- *"Check the wiki for orphan pages and contradictions"*
- *"Build the knowledge graph"*

Or use shorthand triggers:
- `ingest <file>` → runs the Ingest Workflow
- `query: <question>` → runs the Query Workflow
- `lint` → runs the Lint Workflow
- `build graph` → runs the Graph Workflow

---

## Directory Layout

```
raw/            # Immutable source documents — never modify these
wiki/           # Agent owns this layer entirely
  index.md      # Catalog of all pages — update on every ingest
  log.md        # Append-only chronological record
  overview.md   # Living synthesis across all sources
  sources/      # One summary page per source document
  entities/     # People, companies, projects, products
  concepts/     # Ideas, frameworks, methods, theories
  syntheses/    # Saved query answers
graph/          # Auto-generated graph data
tools/          # Optional standalone Python scripts
```

---

## Page Format

Every wiki page uses this frontmatter:

```yaml
---
title: "Page Title"
type: source | entity | concept | synthesis
tags: []
sources: []
last_updated: YYYY-MM-DD
---
```

Use `[[PageName]]` wikilinks to link to other wiki pages.

---

## Ingest Workflow

Triggered by: *"ingest <file>"*

1. Read the source document fully
2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context
3. Write `wiki/sources/<slug>.md` (source page format below)
4. Update `wiki/index.md` — add entry under Sources
5. Update `wiki/overview.md` — revise synthesis if warranted
6. Update/create entity and concept pages
7. Flag contradictions with existing wiki content
8. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>`

### Source Page Format

```markdown
---
title: "Source Title"
type: source
tags: []
date: YYYY-MM-DD
source_file: raw/...
---

## Summary
2–4 sentence summary.

## Key Claims
- Claim 1

## Key Quotes
> "Quote here"

## Connections
- [[EntityName]] — how they relate

## Contradictions
- Contradicts [[OtherPage]] on: ...
```

### Domain-Specific Templates

If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above:

#### Diary / Journal Template
```markdown
---
title: "YYYY-MM-DD Diary"
type: source
tags: [diary]
date: YYYY-MM-DD
---
## Event Summary
...
## Key Decisions
...
## Energy & Mood
...
## Connections
...
## Shifts & Contradictions
...
```

#### Meeting Notes Template
```markdown
---
title: "Meeting Title"
type: source
tags: [meeting]
date: YYYY-MM-DD
---
## Goal
...
## Key Discussions
...
## Decisions Made
...
## Action Items
...
```

---

## Query Workflow

Triggered by: *"query: <question>"*

1. Read `wiki/index.md` — identify relevant pages
2. Read those pages
3. Synthesize answer with `[[PageName]]` citations
4. Offer to save as `wiki/syntheses/<slug>.md`

---

## Lint Workflow

Triggered by: *"lint"*

Check for: orphan pages, broken links, contradictions, stale content, missing entity pages, data gaps.
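
Of these checks, "stale content" is the one most open to interpretation. One possible reading, sketched below, flags a page when any source listed in its `sources` frontmatter was ingested after the page's `last_updated` date. The function name and the `ingested_on` mapping (source slug to ISO date, for example taken from ingest entries in `wiki/log.md`) are assumptions made for illustration.

```python
def stale_pages(page_meta, ingested_on):
    """page_meta maps page name -> {"last_updated": ISO date, "sources": [slugs]}.
    ingested_on maps source slug -> ISO date on which it was ingested.

    Returns the pages whose last update predates their newest known source.
    """
    stale = []
    for name, meta in page_meta.items():
        dates = [ingested_on[s] for s in meta.get("sources", []) if s in ingested_on]
        newest = max(dates, default=None)
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if newest is not None and meta["last_updated"] < newest:
            stale.append(name)
    return sorted(stale)
```

Pages with no listed sources are never flagged, since there is nothing to compare against.
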

---

## Graph Workflow

Triggered by: *"build graph"*

Try `python tools/build_graph.py --open` first. If it is unavailable, build `graph/graph.json` and `graph/graph.html` manually from the wikilinks.

---

## Naming Conventions

- Source slugs: `kebab-case`
- Entity/Concept pages: `TitleCase.md`

## Log Format

`## [YYYY-MM-DD] <operation> | <title>`
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 SamurAIGPT

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
251
README.md
@@ -1,12 +1,245 @@
----
-title: nexus
-source:
-author: shenwei
-published:
-created:
-description:
-tags: []
# LLM Wiki Agent

[License](LICENSE)

**A coding agent skill.** Drop source documents into `raw/` and type `/wiki-ingest` — the agent reads them, extracts knowledge, and builds a persistent interlinked wiki. Every new source makes the wiki richer. You never write it.

> Most knowledge tools make you search your own notes. This one reads everything you've collected and writes a structured wiki that compounds over time — cross-references already built, contradictions already flagged, synthesis already done.

```
/wiki-ingest raw/papers/attention-is-all-you-need.md
```

```
wiki/
├── index.md      catalog of all pages — updated on every ingest
├── log.md        append-only record of every operation
├── overview.md   living synthesis across all sources
├── sources/      one summary page per source document
├── entities/     people, companies, projects — auto-created
├── concepts/     ideas, frameworks, methods — auto-created
└── syntheses/    query answers filed back as wiki pages
graph/
├── graph.json    persistent node/edge data (SHA256-cached)
└── graph.html    interactive vis.js visualization — open in any browser
```

## Install

**Requires:** [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), or any agent that reads a config file.

```bash
git clone https://github.com/SamurAIGPT/llm-wiki-agent.git
cd llm-wiki-agent
```

Open in your agent — no API key or Python setup needed:

```bash
claude      # reads CLAUDE.md + .claude/commands/
codex       # reads AGENTS.md
opencode    # reads AGENTS.md
gemini      # reads GEMINI.md
```

## Usage

```
/wiki-ingest raw/papers/my-paper.md        # ingest a source into the wiki
/wiki-ingest raw/articles/my-article.md    # works on any markdown file

/wiki-query "what are the main themes?"    # synthesize answer from wiki pages
/wiki-query "how does X relate to Y?"      # with [[wikilink]] citations

/wiki-lint                                 # find orphans, contradictions, gaps
/wiki-graph                                # build graph.html from all wikilinks
```

Plain English also works with any agent:
```
"Ingest this paper: raw/papers/llama2.md"
"What does the wiki say about attention mechanisms?"
"Check for contradictions across sources"
"Build the knowledge graph and tell me the most connected nodes"
```

Works with any markdown source — articles, papers, book chapters, meeting notes, journal entries, research summaries.

## What You Get

**Persistent wiki** — structured markdown pages that accumulate across sessions. Unlike chat, nothing is lost.

**Entity pages** — auto-created for every person, company, or project mentioned across sources. Updated each time a new source references them.

**Concept pages** — auto-created for every key idea or framework. Cross-referenced to every source that discusses them.

**Living overview** — `wiki/overview.md` is revised on every ingest to reflect the current synthesis across everything you've read.

**Contradiction flags** — when a new source contradicts an existing claim, it's flagged at ingest time, not buried until query time.

**Knowledge graph** — `graph.html` shows every wiki page as a node, every `[[wikilink]]` as an edge, and Claude-inferred implicit relationships as dotted edges. Community detection clusters related topics.

**Lint reports** — orphan pages, broken links, missing entity pages, data gaps with suggested sources to fill them.

## Use Cases

### Research

Going deep on a topic over weeks — reading papers, articles, reports.

```
/wiki-ingest raw/papers/attention-is-all-you-need.md
/wiki-ingest raw/papers/llama2.md
/wiki-ingest raw/papers/rag-survey.md

# Wiki builds entity pages (Meta AI, Google Brain) and
# concept pages (Attention, RLHF, Context Window) automatically.

/wiki-query "What are the main approaches to reducing hallucination?"
/wiki-query "How has context window size evolved across models?"

/wiki-lint
# → "No sources on mixture-of-experts — consider the Mixtral paper"
```

By the end you have a structured, interlinked reference — not a folder of PDFs you'll never reopen.

---

-# nexus
### Reading a Book

File each chapter as you go. Build out pages for characters, themes, arguments.

```
/wiki-ingest raw/book/chapter-01.md
/wiki-ingest raw/book/chapter-02.md

# Wiki creates entity and theme pages automatically.

/wiki-query "How has the protagonist's motivation evolved?"
/wiki-query "What contradictions exist in the author's argument so far?"

/wiki-graph    # → graph.html shows every character/theme and how they connect
```

Think fan wikis like Tolkien Gateway — built as you read, with the agent doing all the cross-referencing.

---

### Personal Knowledge Base

Track goals, health, habits, self-improvement — file journal entries, articles, podcast notes.

```
/wiki-ingest raw/journal/2026-01-week1.md
/wiki-ingest raw/articles/huberman-sleep-protocol.md
/wiki-ingest raw/articles/atomic-habits-summary.md

/wiki-query "What patterns show up in my journal entries about energy?"
/wiki-query "What habits have I tried and what was the outcome?"
```

The wiki builds a structured picture over time. Concepts like "Sleep", "Exercise", "Deep Work" accumulate evidence from every source filed.

---

### Business / Team Intelligence

Feed in meeting transcripts, project docs, customer calls.

```
/wiki-ingest raw/meetings/q1-planning-transcript.md
/wiki-ingest raw/docs/product-roadmap-2026.md
/wiki-ingest raw/calls/customer-interview-acme.md

/wiki-query "What feature requests have come up most across customer calls?"
/wiki-query "What decisions were made in Q1 and what was the rationale?"

/wiki-lint
# → "Project X mentioned in 5 pages but no dedicated page"
# → "Roadmap contradicts customer interview on priority of feature Y"
```

The wiki stays current because the agent does the maintenance no one wants to do.

---

### Competitive Analysis

Track a company, market, or technology over time.

```
/wiki-ingest raw/competitors/openai-announcements.md
/wiki-ingest raw/market/ai-funding-report-q1.md

/wiki-query "How do OpenAI and Anthropic differ on safety approach?"
/wiki-query "Which companies announced multimodal models in the last 6 months?"
/wiki-query "Competitive landscape summary as of today" --save
```

## The Graph

Two-pass build:

1. **Deterministic** — parses all `[[wikilinks]]` across wiki pages → edges tagged `EXTRACTED`
2. **Semantic** — agent infers implicit relationships not captured by wikilinks → edges tagged `INFERRED` (with confidence score) or `AMBIGUOUS`

Louvain community detection clusters nodes by topic. SHA256 cache means only changed pages are reprocessed. Output is a self-contained `graph.html` — no server, opens in any browser.
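
The caching idea can be illustrated with a short sketch: hash each page's content and reprocess only the pages whose digest differs from the one stored on the previous run. The cache layout here (a plain name-to-digest mapping) and the function names are assumptions, not the format `tools/build_graph.py` actually uses.

```python
import hashlib


def content_digest(text):
    """Return the SHA-256 hex digest of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_pages(pages, cache):
    """pages maps page name -> current text; cache maps page name -> digest
    recorded by the previous build.

    Returns (changed, new_cache): the names that are new or modified, and the
    digests to store for the next run.
    """
    new_cache = {name: content_digest(text) for name, text in pages.items()}
    changed = sorted(name for name, digest in new_cache.items() if cache.get(name) != digest)
    return changed, new_cache
```

On a second run with unchanged pages the `changed` list is empty, so the expensive semantic pass could be skipped entirely.
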
|
||||||
|
|
||||||
|
## CLAUDE.md / AGENTS.md
|
||||||
|
|
||||||
|
The schema file tells the agent how to maintain the wiki — page formats, ingest/query/lint/graph workflows, naming conventions. This is the key config file. Edit it to customize behavior for your domain.
|
||||||
|
|
||||||
|
| Agent | Schema file |
|---|---|
| Claude Code | `CLAUDE.md` |
| Codex / OpenCode | `AGENTS.md` |
| Gemini CLI | `GEMINI.md` |
|
||||||
|
|
||||||
|
## What Makes This Different from RAG
|
||||||
|
|
||||||
|
| RAG | LLM Wiki Agent |
|---|---|
| Re-derives knowledge every query | Compiles once, keeps current |
| Raw chunks as retrieval unit | Structured wiki pages |
| No cross-references | Cross-references pre-built |
| Contradictions surface at query time (maybe) | Flagged at ingest time |
| No accumulation | Every source makes the wiki richer |
|
||||||
|
|
||||||
|
## Obsidian Integration
|
||||||
|
|
||||||
|
The wiki is designed to be browsed seamlessly in [Obsidian](https://obsidian.md). Since the agent maintains consistent `[[wikilinks]]`, you get a naturally growing knowledge graph in your vault.
|
||||||
|
|
||||||
|
### Vault Symlink Pattern
|
||||||
|
If you want to keep the LLM Wiki Agent repository separate from your main personal vault, use symlinks:
|
||||||
|
1. Keep your working agent repository at e.g., `~/llm-wiki-agent`
2. Create a symlink from your main Obsidian vault:
   ```bash
   ln -sfn ~/llm-wiki-agent/wiki ~/your-obsidian-vault/wiki
   ```
3. Use the [Obsidian Web Clipper](https://obsidian.md/clipper) or write directly to `raw/` in the agent repo to queue items for ingestion.
|
||||||
|
|
||||||
|
> **Note:** If you ever move your local repo directory, remember to update the symlink, otherwise the `wiki/` directory will appear missing in Obsidian.
|
||||||
|
|
||||||
|
### Recommended .obsidian Config
|
||||||
|
- **Graph View:** Filter out `index.md` and `log.md` (e.g. `-file:index.md -file:log.md`) to avoid them becoming gravity wells in your Obsidian graph.
|
||||||
|
- **Dataview:** Use the community plugin [Dataview](https://blacksmithgu.github.io/obsidian-dataview/) to query the YAML frontmatter the agent automatically injects (e.g., `type: source`, `tags: [diary]`).
|
||||||
|
|
||||||
|
## Tips
|
||||||
|
|
||||||
|
- File good query answers back with `--save` — your explorations compound just like ingested sources
|
||||||
|
- The wiki is a git repo — version history for free
|
||||||
|
- Standalone Python scripts in `tools/` work without a coding agent; they call the model through litellm, so set the API key for your provider (`ANTHROPIC_API_KEY` for the default Claude models)
|
||||||
|
|
||||||
|
## Tech Stack
|
||||||
|
|
||||||
|
NetworkX + Louvain + Claude + vis.js. No server, no database, runs entirely locally. Everything is plain markdown files.
|
||||||
|
|
||||||
|
## Related
|
||||||
|
|
||||||
|
- [graphify](https://github.com/safishamsi/graphify) — graph-based knowledge extraction skill (inspiration for the graph layer)
|
||||||
|
- [Vannevar Bush's Memex (1945)](https://en.wikipedia.org/wiki/Memex) — the original vision this resembles
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
MIT License — see [LICENSE](LICENSE) for details.
|
||||||
|
|||||||
101
docs/automated-sync.md
Normal file
@@ -0,0 +1,101 @@
|
|||||||
|
# Automated Wiki Synchronization Guide
|
||||||
|
|
||||||
|
An LLM wiki is most useful when it stays in step with the notes you are already taking. Instead of manually ingesting each new file, you can automate the whole pipeline end to end.
|
||||||
|
|
||||||
|
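This guide walks through a scheduled setup for a local machine: a shell script that does the work, plus a macOS `launchd` job that runs it nightly. On Linux, the same script can be scheduled from `cron`.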
|
||||||
|
|
||||||
|
## The Two-Step Architecture
|
||||||
|
|
||||||
|
LLM Wiki Agent ingestion is a two-step process:
|
||||||
|
1. **Syncing to `raw/`**: Getting files from your personal vault/tools into the agent's staging area.
|
||||||
|
2. **Batch Ingestion**: Triggering `tools/ingest.py` on the synchronized directories to synthesize and weave them into the graph.
|
||||||
|
|
||||||
|
### Step 1: The Master Orchestrator Script
|
||||||
|
|
||||||
|
Create a comprehensive shell script in your wiki root (`daily-automated-sync.sh`):
|
||||||
|
|
||||||
|
```bash
#!/usr/bin/env bash
set -uo pipefail

# Define variables
LAB_DIR="$HOME/projects/active/personal-wiki-lab"
LOG_FILE="$LAB_DIR/automation-cron.log"
DATE=$(date "+%Y-%m-%d %H:%M:%S")

echo "=====================================================" >> "$LOG_FILE"
echo "[$DATE] Starting automated wiki synchronization..." >> "$LOG_FILE"

cd "$LAB_DIR" || exit 1

# 1. Run your personal vault-to-raw sync script here
# Example: ./sync-raw.sh >> "$LOG_FILE" 2>&1

# 2. Trigger litellm batch ingestion using the LLM of your choice
# Avoid committing real keys; consider sourcing them from a file outside the repo
export LLM_MODEL="gemini/gemini-3-flash-preview"
export GEMINI_API_KEY="AIzaSy..." # or export OPENAI_API_KEY

echo "[$DATE] Batch ingesting markdown files..." >> "$LOG_FILE"
# Match regular files and symlinks; -print0 with read -d '' keeps paths with spaces intact
find raw/ \( -type f -o -type l \) -name "*.md" -print0 | \
while IFS= read -r -d '' file; do
    python3 tools/ingest.py "$file" >> "$LOG_FILE" 2>&1
done

# 3. Heal graph context (creates pages for entities that are referenced but missing)
echo "[$DATE] Healing broken nodes..." >> "$LOG_FILE"
python3 tools/heal.py >> "$LOG_FILE" 2>&1

echo "[$(date "+%Y-%m-%d %H:%M:%S")] Automated sync completed." >> "$LOG_FILE"
echo "=====================================================" >> "$LOG_FILE"
```
|
||||||
|
|
||||||
|
Don't forget to make it executable: `chmod +x daily-automated-sync.sh`.
|
||||||
|
|
||||||
|
### Step 2: System Scheduler (macOS launchd)
|
||||||
|
|
||||||
|
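For macOS, `launchd` is a better fit than `cron` on a laptop: if the machine is asleep at the scheduled time, `launchd` runs the job on the next wake, whereas `cron` skips it.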
|
||||||
|
|
||||||
|
Create a `.plist` file at `~/Library/LaunchAgents/com.personal-wiki-sync.plist`:
|
||||||
|
|
||||||
|
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.personal-wiki-sync</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>/Users/your-username/projects/active/personal-wiki-lab/daily-automated-sync.sh</string>
    </array>

    <!-- Execute automatically at 2:00 AM daily -->
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>2</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>

    <!-- Also run once whenever the agent is loaded (at login or on launchctl load) -->
    <key>RunAtLoad</key>
    <true/>

    <!-- Diagnostic Logs -->
    <key>StandardOutPath</key>
    <string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stdout.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stderr.log</string>
</dict>
</plist>
```
|
||||||
|
|
||||||
|
Load the agent:
|
||||||
|
```bash
|
||||||
|
launchctl load ~/Library/LaunchAgents/com.personal-wiki-sync.plist
|
||||||
|
```
|
||||||
|
|
||||||
|
### Self-Healing & Health Monitoring
|
||||||
|
Since the automation runs unattended at night, check the logs regularly. The script redirects its own output, including errors from the Python tools, to `automation-cron.log`; `daemon.stderr.log` only captures what escapes those redirects, such as a failure to start the script. The script also runs `tools/heal.py`, which is strongly recommended: it finds entities that are referenced across the wiki but have no page of their own and generates definition pages for them.
|
||||||
14
examples/cjk-showcase/README.md
Normal file
@@ -0,0 +1,14 @@
|
|||||||
|
# CJK Showcase (Chinese Language Example)
|
||||||
|
|
||||||
|
This directory demonstrates how LLM Wiki Agent performs with non-English (CJK) content.

The agent handles Chinese content without any language-specific configuration. With the CJK query bug fixed, you can ingest, query, and search across Chinese entries.
|
||||||
|
|
||||||
|
## Files included in this showcase:
|
||||||
|
|
||||||
|
- `raw/2026-04-13-reflection.md`: A sample source document (a personal reflection on career transition).
|
||||||
|
- `wiki/sources/2026-04-13-reflection.md`: The parsed structured source page.
|
||||||
|
- `wiki/entities/杨帆.md`: Auto-extracted Chinese entity page.
|
||||||
|
- `wiki/concepts/AI转型.md`: Auto-extracted Chinese concept page.
|
||||||
|
|
||||||
|
Try running `python tools/query.py "关于AI转型的建议"` from the root directory after moving these to your main knowledge base to see how semantic extraction and keyword matching behave in non-English contexts!
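
To see why keyword matching needs care with Chinese text, note that words are not separated by spaces, so splitting a query on whitespace yields one long token. One common workaround is to compare overlapping two-character windows. The sketch below illustrates that idea only; it is not taken from `tools/query.py`, the function names are hypothetical, and it covers just the basic CJK Unified Ideographs range.

```python
import re

# Runs of characters in the basic CJK Unified Ideographs block (an assumption
# for this sketch; kana, hangul, and extension blocks are ignored).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def cjk_bigrams(text: str) -> set[str]:
    """Overlapping two-character windows within each run of CJK characters."""
    grams = set()
    for run in CJK_RUN.findall(text):
        grams.update(run[i:i + 2] for i in range(len(run) - 1))
    return grams


def overlap_score(query: str, document: str) -> float:
    """Fraction of the query's bigrams that also occur in the document."""
    q = cjk_bigrams(query)
    return len(q & cjk_bigrams(document)) / len(q) if q else 0.0
```

With this scoring, a query such as `关于AI转型的建议` still partially matches a page that only contains `转型的路径`, because the bigrams `转型` and `型的` are shared.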
|
||||||
7
examples/cjk-showcase/raw/2026-04-13-reflection.md
Normal file
@@ -0,0 +1,7 @@
|
|||||||
|
# 2026-04-13 关于AI转型的复盘总结
|
||||||
|
|
||||||
|
今天和杨帆深入讨论了土木工程转向AI产品经理的路径。他提到最大的陷阱是“工具旅游(Tool Tourism)”——很多非技术背景的人沉迷于尝试各种AI工具,却忽略了业务本质和产品交付。
|
||||||
|
|
||||||
|
真正的破局点在于将大模型视为一种新的计算范式,而不是魔术。我们需要关注模型稳定性、成本、并发以及长上下文的召回率。同时,我也在思考目前个人的技术栈,从玩提示词到掌握Agentic Workflow框架(如LangChain或自定义多Agent系统),这是一个质的飞跃。
|
||||||
|
|
||||||
|
决定下一步:减少看泛科普文章,直接深入开源社区,比如通过贡献代码或者提出架构Issue来积累实际影响力。
|
||||||
0
graph/.gitkeep
Normal file
0
raw/.gitkeep
Normal file
2
requirements.txt
Normal file
@@ -0,0 +1,2 @@
|
|||||||
|
litellm>=1.0.0
|
||||||
|
networkx>=3.2
|
||||||
454
tools/build_graph.py
Normal file
@@ -0,0 +1,454 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Build the knowledge graph from the wiki.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python tools/build_graph.py # full rebuild
|
||||||
|
python tools/build_graph.py --no-infer # skip semantic inference (faster)
|
||||||
|
python tools/build_graph.py --open # open graph.html in browser after build
|
||||||
|
|
||||||
|
Outputs:
|
||||||
|
graph/graph.json — node/edge data (cached by SHA256)
|
||||||
|
graph/graph.html — interactive vis.js visualization
|
||||||
|
|
||||||
|
Edge types:
|
||||||
|
EXTRACTED — explicit [[wikilink]] in a page
|
||||||
|
INFERRED — Claude-detected implicit relationship
|
||||||
|
AMBIGUOUS — low-confidence inferred relationship
|
||||||
|
"""
|
||||||
|
|
||||||
|
import re
|
||||||
|
import json
|
||||||
|
import hashlib
|
||||||
|
import argparse
|
||||||
|
import webbrowser
|
||||||
|
from pathlib import Path
|
||||||
|
from datetime import date
|
||||||
|
|
||||||
|
import os
|
||||||
|
|
||||||
|
try:
|
||||||
|
import networkx as nx
|
||||||
|
from networkx.algorithms import community as nx_community
|
||||||
|
HAS_NETWORKX = True
|
||||||
|
except ImportError:
|
||||||
|
HAS_NETWORKX = False
|
||||||
|
print("Warning: networkx not installed. Community detection disabled. Run: pip install networkx")
|
||||||
|
|
||||||
|
REPO_ROOT = Path(__file__).parent.parent
|
||||||
|
WIKI_DIR = REPO_ROOT / "wiki"
|
||||||
|
GRAPH_DIR = REPO_ROOT / "graph"
|
||||||
|
GRAPH_JSON = GRAPH_DIR / "graph.json"
|
||||||
|
GRAPH_HTML = GRAPH_DIR / "graph.html"
|
||||||
|
CACHE_FILE = GRAPH_DIR / ".cache.json"
|
||||||
|
LOG_FILE = WIKI_DIR / "log.md"
|
||||||
|
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"
|
||||||
|
|
||||||
|
# Node type → color mapping
|
||||||
|
TYPE_COLORS = {
|
||||||
|
"source": "#4CAF50",
|
||||||
|
"entity": "#2196F3",
|
||||||
|
"concept": "#FF9800",
|
||||||
|
"synthesis": "#9C27B0",
|
||||||
|
"unknown": "#9E9E9E",
|
||||||
|
}
|
||||||
|
|
||||||
|
EDGE_COLORS = {
|
||||||
|
"EXTRACTED": "#555555",
|
||||||
|
"INFERRED": "#FF5722",
|
||||||
|
"AMBIGUOUS": "#BDBDBD",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def read_file(path: Path) -> str:
|
||||||
|
return path.read_text(encoding="utf-8") if path.exists() else ""
|
||||||
|
|
||||||
|
|
||||||
|
def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str:
|
||||||
|
try:
|
||||||
|
from litellm import completion
|
||||||
|
except ImportError:
|
||||||
|
print("Error: litellm not installed. Run: pip install litellm")
|
||||||
|
import sys
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
model = os.getenv(model_env, default_model)
|
||||||
|
response = completion(
|
||||||
|
model=model,
|
||||||
|
messages=[{"role": "user", "content": prompt}],
|
||||||
|
max_tokens=max_tokens
|
||||||
|
)
|
||||||
|
return response.choices[0].message.content
|
||||||
|
|
||||||
|
|
||||||
|
def sha256(text: str) -> str:
|
||||||
|
return hashlib.sha256(text.encode()).hexdigest()
|
||||||
|
|
||||||
|
|
||||||
|
def all_wiki_pages() -> list[Path]:
|
||||||
|
return [p for p in WIKI_DIR.rglob("*.md")
|
||||||
|
if p.name not in ("index.md", "log.md", "lint-report.md")]
|
||||||
|
|
||||||
|
|
||||||
|
def extract_wikilinks(content: str) -> list[str]:
|
||||||
|
return list(set(re.findall(r'\[\[([^\]]+)\]\]', content)))
|
||||||
|
|
||||||
|
|
||||||
|
def extract_frontmatter_type(content: str) -> str:
|
||||||
|
match = re.search(r'^type:\s*(\S+)', content, re.MULTILINE)
|
||||||
|
return match.group(1).strip('"\'') if match else "unknown"
|
||||||
|
|
||||||
|
|
||||||
|
def page_id(path: Path) -> str:
|
||||||
|
return path.relative_to(WIKI_DIR).as_posix().replace(".md", "")
|
||||||
|
|
||||||
|
|
||||||
|
def load_cache() -> dict:
|
||||||
|
if CACHE_FILE.exists():
|
||||||
|
try:
|
||||||
|
return json.loads(CACHE_FILE.read_text())
|
||||||
|
except (json.JSONDecodeError, IOError):
|
||||||
|
return {}
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def save_cache(cache: dict):
|
||||||
|
GRAPH_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
CACHE_FILE.write_text(json.dumps(cache, indent=2))
|
||||||
|
|
||||||
|
|
||||||
|
def build_nodes(pages: list[Path]) -> list[dict]:
|
||||||
|
nodes = []
|
||||||
|
for p in pages:
|
||||||
|
content = read_file(p)
|
||||||
|
node_type = extract_frontmatter_type(content)
|
||||||
|
title_match = re.search(r'^title:\s*"?([^"\n]+)"?', content, re.MULTILINE)
|
||||||
|
label = title_match.group(1).strip() if title_match else p.stem
|
||||||
|
nodes.append({
|
||||||
|
"id": page_id(p),
|
||||||
|
"label": label,
|
||||||
|
"type": node_type,
|
||||||
|
"color": TYPE_COLORS.get(node_type, TYPE_COLORS["unknown"]),
|
||||||
|
"path": str(p.relative_to(REPO_ROOT)),
|
||||||
|
})
|
||||||
|
return nodes
|
||||||
|
|
||||||
|
|
||||||
|
def build_extracted_edges(pages: list[Path]) -> list[dict]:
|
||||||
|
"""Pass 1: deterministic wikilink edges."""
|
||||||
|
# Build a map from stem (lower) -> page_id for resolution
|
||||||
|
stem_map = {p.stem.lower(): page_id(p) for p in pages}
|
||||||
|
edges = []
|
||||||
|
seen = set()
|
||||||
|
for p in pages:
|
||||||
|
content = read_file(p)
|
||||||
|
src = page_id(p)
|
||||||
|
for link in extract_wikilinks(content):
|
||||||
|
target = stem_map.get(link.lower())
|
||||||
|
if target and target != src:
|
||||||
|
key = (src, target)
|
||||||
|
if key not in seen:
|
||||||
|
seen.add(key)
|
||||||
|
edges.append({
|
||||||
|
"from": src,
|
||||||
|
"to": target,
|
||||||
|
"type": "EXTRACTED",
|
||||||
|
"color": EDGE_COLORS["EXTRACTED"],
|
||||||
|
"confidence": 1.0,
|
||||||
|
})
|
||||||
|
return edges
|
||||||
|
|
||||||
|
|
||||||
|
def build_inferred_edges(pages: list[Path], existing_edges: list[dict], cache: dict) -> list[dict]:
|
||||||
|
"""Pass 2: API-inferred semantic relationships."""
|
||||||
|
new_edges = []
|
||||||
|
|
||||||
|
# Only process pages that changed since last run
|
||||||
|
changed_pages = []
|
||||||
|
for p in pages:
|
||||||
|
content = read_file(p)
|
||||||
|
h = sha256(content)
|
||||||
|
entry = cache.get(str(p))
|
||||||
|
|
||||||
|
if not isinstance(entry, dict) or entry.get("hash") != h:
|
||||||
|
changed_pages.append(p)
|
||||||
|
else:
|
||||||
|
# Page unchanged: load its inferred edges from cache perfectly
|
||||||
|
src = page_id(p)
|
||||||
|
for rel in entry.get("edges", []):
|
||||||
|
new_edges.append({
|
||||||
|
"from": src,
|
||||||
|
"to": rel["to"],
|
||||||
|
"type": rel.get("type", "INFERRED"),
|
||||||
|
"title": rel.get("relationship", ""),
|
||||||
|
"label": "",
|
||||||
|
"color": EDGE_COLORS.get(rel.get("type", "INFERRED"), EDGE_COLORS["INFERRED"]),
|
||||||
|
"confidence": float(rel.get("confidence", 0.7)),
|
||||||
|
})
|
||||||
|
|
||||||
|
if not changed_pages:
|
||||||
|
print(" no changed pages — skipping semantic inference")
|
||||||
|
        return new_edges  # keep the cached edges loaded above for unchanged pages
|
||||||
|
|
||||||
|
print(f" inferring relationships for {len(changed_pages)} changed pages...")
|
||||||
|
|
||||||
|
# Build a summary of existing nodes for context
|
||||||
|
node_list = "\n".join(f"- {page_id(p)} ({extract_frontmatter_type(read_file(p))})" for p in pages)
|
||||||
|
existing_edge_summary = "\n".join(
|
||||||
|
f"- {e['from']} → {e['to']} (EXTRACTED)" for e in existing_edges[:30]
|
||||||
|
)
|
||||||
|
|
||||||
|
for p in changed_pages:
|
||||||
|
content = read_file(p)[:2000] # truncate for context efficiency
|
||||||
|
src = page_id(p)
|
||||||
|
|
||||||
|
prompt = f"""Analyze this wiki page and identify implicit semantic relationships to other pages in the wiki.
|
||||||
|
|
||||||
|
Source page: {src}
|
||||||
|
Content:
|
||||||
|
{content}
|
||||||
|
|
||||||
|
All available pages:
|
||||||
|
{node_list}
|
||||||
|
|
||||||
|
Already-extracted edges from this page:
|
||||||
|
{existing_edge_summary}
|
||||||
|
|
||||||
|
Return ONLY a JSON array of NEW relationships not already captured by explicit wikilinks:
|
||||||
|
[
|
||||||
|
{{"to": "page-id", "relationship": "one-line description", "confidence": 0.0-1.0, "type": "INFERRED or AMBIGUOUS"}}
|
||||||
|
]
|
||||||
|
|
||||||
|
Rules:
|
||||||
|
- Only include pages from the available list above
|
||||||
|
- Confidence >= 0.7 → INFERRED, < 0.7 → AMBIGUOUS
|
||||||
|
- Do not repeat edges already in the extracted list
|
||||||
|
- Return empty array [] if no new relationships found
|
||||||
|
"""
|
||||||
|
raw = call_llm(prompt, "LLM_MODEL_FAST", "claude-3-5-haiku-latest", max_tokens=1024)
|
||||||
|
raw = raw.strip()
|
||||||
|
raw = re.sub(r"^```(?:json)?\s*", "", raw)
|
||||||
|
raw = re.sub(r"\s*```$", "", raw)
|
||||||
|
|
||||||
|
try:
|
||||||
|
inferred = json.loads(raw)
|
||||||
|
valid_rels = []
|
||||||
|
for rel in inferred:
|
||||||
|
if isinstance(rel, dict) and "to" in rel:
|
||||||
|
new_edges.append({
|
||||||
|
"from": src,
|
||||||
|
"to": rel["to"],
|
||||||
|
"type": rel.get("type", "INFERRED"),
|
||||||
|
"title": rel.get("relationship", ""),
|
||||||
|
"label": "",
|
||||||
|
"color": EDGE_COLORS.get(rel.get("type", "INFERRED"), EDGE_COLORS["INFERRED"]),
|
||||||
|
"confidence": float(rel.get("confidence", 0.7)),
|
||||||
|
})
|
||||||
|
valid_rels.append(rel)
|
||||||
|
|
||||||
|
# Save properly to cache
|
||||||
|
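            cache[str(p)] = {
                # Hash the full page text: `content` was truncated to 2000 chars above,
                # and the change check compares against the hash of the full page
                "hash": sha256(read_file(p)),
                "edges": valid_rels
            }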
|
||||||
|
except (json.JSONDecodeError, TypeError, ValueError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
return new_edges
|
||||||
|
|
||||||
|
|
||||||
|
def detect_communities(nodes: list[dict], edges: list[dict]) -> dict[str, int]:
|
||||||
|
"""Assign community IDs to nodes using Louvain algorithm."""
|
||||||
|
if not HAS_NETWORKX:
|
||||||
|
return {}
|
||||||
|
|
||||||
|
G = nx.Graph()
|
||||||
|
for n in nodes:
|
||||||
|
G.add_node(n["id"])
|
||||||
|
for e in edges:
|
||||||
|
G.add_edge(e["from"], e["to"])
|
||||||
|
|
||||||
|
if G.number_of_edges() == 0:
|
||||||
|
return {}
|
||||||
|
|
||||||
|
try:
|
||||||
|
communities = nx_community.louvain_communities(G, seed=42)
|
||||||
|
node_to_community = {}
|
||||||
|
for i, comm in enumerate(communities):
|
||||||
|
for node in comm:
|
||||||
|
node_to_community[node] = i
|
||||||
|
return node_to_community
|
||||||
|
except Exception:
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
COMMUNITY_COLORS = [
|
||||||
|
"#E91E63", "#00BCD4", "#8BC34A", "#FF5722", "#673AB7",
|
||||||
|
"#FFC107", "#009688", "#F44336", "#3F51B5", "#CDDC39",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def render_html(nodes: list[dict], edges: list[dict]) -> str:
|
||||||
|
"""Generate self-contained vis.js HTML."""
|
||||||
|
nodes_json = json.dumps(nodes, indent=2)
|
||||||
|
edges_json = json.dumps(edges, indent=2)
|
||||||
|
|
||||||
|
legend_items = "".join(
|
||||||
|
f'<span style="background:{color};padding:3px 8px;margin:2px;border-radius:3px;font-size:12px">{t}</span>'
|
||||||
|
for t, color in TYPE_COLORS.items() if t != "unknown"
|
||||||
|
)
|
||||||
|
|
||||||
|
return f"""<!DOCTYPE html>
|
||||||
|
<html lang="en">
|
||||||
|
<head>
|
||||||
|
<meta charset="UTF-8">
|
||||||
|
<title>LLM Wiki — Knowledge Graph</title>
|
||||||
|
<script src="https://unpkg.com/vis-network/standalone/umd/vis-network.min.js"></script>
|
||||||
|
<style>
|
||||||
|
body {{ margin: 0; background: #1a1a2e; font-family: sans-serif; color: #eee; }}
|
||||||
|
#graph {{ width: 100vw; height: 100vh; }}
|
||||||
|
#controls {{
|
||||||
|
position: fixed; top: 10px; left: 10px; background: rgba(0,0,0,0.7);
|
||||||
|
padding: 12px; border-radius: 8px; z-index: 10; max-width: 260px;
|
||||||
|
}}
|
||||||
|
#controls h3 {{ margin: 0 0 8px; font-size: 14px; }}
|
||||||
|
#search {{ width: 100%; padding: 4px; margin-bottom: 8px; background: #333; color: #eee; border: 1px solid #555; border-radius: 4px; }}
|
||||||
|
#info {{
|
||||||
|
position: fixed; bottom: 10px; left: 10px; background: rgba(0,0,0,0.8);
|
||||||
|
padding: 12px; border-radius: 8px; z-index: 10; max-width: 320px;
|
||||||
|
display: none;
|
||||||
|
}}
|
||||||
|
#stats {{ position: fixed; top: 10px; right: 10px; background: rgba(0,0,0,0.7); padding: 10px; border-radius: 8px; font-size: 12px; }}
|
||||||
|
</style>
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<div id="controls">
|
||||||
|
<h3>LLM Wiki Graph</h3>
|
||||||
|
<input id="search" type="text" placeholder="Search nodes..." oninput="searchNodes(this.value)">
|
||||||
|
<div>{legend_items}</div>
|
||||||
|
<div style="margin-top:8px;font-size:11px;color:#aaa">
|
||||||
|
<span style="background:#555;padding:2px 6px;border-radius:3px;margin-right:4px">──</span> Explicit link<br>
|
||||||
|
<span style="background:#FF5722;padding:2px 6px;border-radius:3px;margin-right:4px">──</span> Inferred
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div id="graph"></div>
|
||||||
|
<div id="info">
|
||||||
|
<b id="info-title"></b><br>
|
||||||
|
<span id="info-type" style="font-size:12px;color:#aaa"></span><br>
|
||||||
|
<span id="info-path" style="font-size:11px;color:#666"></span>
|
||||||
|
</div>
|
||||||
|
<div id="stats"></div>
|
||||||
|
<script>
|
||||||
|
const nodes = new vis.DataSet({nodes_json});
|
||||||
|
const edges = new vis.DataSet({edges_json});
|
||||||
|
|
||||||
|
const container = document.getElementById("graph");
|
||||||
|
const network = new vis.Network(container, {{ nodes, edges }}, {{
|
||||||
|
nodes: {{
|
||||||
|
shape: "dot",
|
||||||
|
size: 12,
|
||||||
|
font: {{ color: "#eee", size: 13 }},
|
||||||
|
borderWidth: 2,
|
||||||
|
}},
|
||||||
|
edges: {{
|
||||||
|
width: 1.2,
|
||||||
|
smooth: {{ type: "continuous" }},
|
||||||
|
arrows: {{ to: {{ enabled: true, scaleFactor: 0.5 }} }},
|
||||||
|
}},
|
||||||
|
physics: {{
|
||||||
|
stabilization: {{ iterations: 150 }},
|
||||||
|
barnesHut: {{ gravitationalConstant: -8000, springLength: 120 }},
|
||||||
|
}},
|
||||||
|
interaction: {{ hover: true, tooltipDelay: 200 }},
|
||||||
|
}});
|
||||||
|
|
||||||
|
network.on("click", params => {{
|
||||||
|
if (params.nodes.length > 0) {{
|
||||||
|
const node = nodes.get(params.nodes[0]);
|
||||||
|
document.getElementById("info").style.display = "block";
|
||||||
|
document.getElementById("info-title").textContent = node.label;
|
||||||
|
document.getElementById("info-type").textContent = node.type;
|
||||||
|
document.getElementById("info-path").textContent = node.path;
|
||||||
|
}} else {{
|
||||||
|
document.getElementById("info").style.display = "none";
|
||||||
|
}}
|
||||||
|
}});
|
||||||
|
|
||||||
|
document.getElementById("stats").textContent =
|
||||||
|
`${{nodes.length}} nodes · ${{edges.length}} edges`;
|
||||||
|
|
||||||
|
function searchNodes(q) {{
|
||||||
|
const lower = q.toLowerCase();
|
||||||
|
nodes.forEach(n => {{
|
||||||
|
nodes.update({{ id: n.id, opacity: (!q || n.label.toLowerCase().includes(lower)) ? 1 : 0.15 }});
|
||||||
|
}});
|
||||||
|
}}
|
||||||
|
</script>
|
||||||
|
</body>
|
||||||
|
</html>"""
|
||||||
|
|
||||||
|
|
||||||
|
def append_log(entry: str):
|
||||||
|
log_path = WIKI_DIR / "log.md"
|
||||||
|
existing = read_file(log_path)
|
||||||
|
log_path.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8")
|
||||||
|
|
||||||
|
|
||||||
|
def build_graph(infer: bool = True, open_browser: bool = False):
|
||||||
|
pages = all_wiki_pages()
|
||||||
|
today = date.today().isoformat()
|
||||||
|
|
||||||
|
if not pages:
|
||||||
|
print("Wiki is empty. Ingest some sources first.")
|
||||||
|
return
|
||||||
|
|
||||||
|
print(f"Building graph from {len(pages)} wiki pages...")
|
||||||
|
GRAPH_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
cache = load_cache()
|
||||||
|
|
||||||
|
# Pass 1: extracted edges
|
||||||
|
print(" Pass 1: extracting wikilinks...")
|
||||||
|
nodes = build_nodes(pages)
|
||||||
|
edges = build_extracted_edges(pages)
|
||||||
|
print(f" → {len(edges)} extracted edges")
|
||||||
|
|
||||||
|
# Pass 2: inferred edges
|
||||||
|
if infer:
|
||||||
|
print(" Pass 2: inferring semantic relationships...")
|
||||||
|
inferred = build_inferred_edges(pages, edges, cache)
|
||||||
|
edges.extend(inferred)
|
||||||
|
print(f" → {len(inferred)} inferred edges")
|
||||||
|
save_cache(cache)
|
||||||
|
|
||||||
|
# Community detection
|
||||||
|
print(" Running Louvain community detection...")
|
||||||
|
communities = detect_communities(nodes, edges)
|
||||||
|
for node in nodes:
|
||||||
|
comm_id = communities.get(node["id"], -1)
|
||||||
|
if comm_id >= 0:
|
||||||
|
node["color"] = COMMUNITY_COLORS[comm_id % len(COMMUNITY_COLORS)]
|
||||||
|
node["group"] = comm_id
|
||||||
|
|
||||||
|
# Save graph.json
|
||||||
|
graph_data = {"nodes": nodes, "edges": edges, "built": today}
|
||||||
|
GRAPH_JSON.write_text(json.dumps(graph_data, indent=2))
|
||||||
|
print(f" saved: graph/graph.json ({len(nodes)} nodes, {len(edges)} edges)")
|
||||||
|
|
||||||
|
# Save graph.html
|
||||||
|
html = render_html(nodes, edges)
|
||||||
|
    GRAPH_HTML.write_text(html, encoding="utf-8")  # page declares charset UTF-8 and contains non-ASCII text
|
||||||
|
print(f" saved: graph/graph.html")
|
||||||
|
|
||||||
|
append_log(f"## [{today}] graph | Knowledge graph rebuilt\n\n{len(nodes)} nodes, {len(edges)} edges ({len([e for e in edges if e['type']=='EXTRACTED'])} extracted, {len([e for e in edges if e['type']=='INFERRED'])} inferred).")
|
||||||
|
|
||||||
|
if open_browser:
|
||||||
|
        webbrowser.open(GRAPH_HTML.resolve().as_uri())  # as_uri() escapes spaces and handles Windows paths
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
parser = argparse.ArgumentParser(description="Build LLM Wiki knowledge graph")
|
||||||
|
parser.add_argument("--no-infer", action="store_true", help="Skip semantic inference (faster)")
|
||||||
|
parser.add_argument("--open", action="store_true", help="Open graph.html in browser")
|
||||||
|
args = parser.parse_args()
|
||||||
|
build_graph(infer=not args.no_infer, open_browser=args.open)
|
||||||
100
tools/heal.py
Executable file
@@ -0,0 +1,100 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Graph Self-Healing Tool
|
||||||
|
|
||||||
|
Automatically retrieves "Missing Entity Pages" from the wiki and generates
|
||||||
|
comprehensive definition pages for them using the LLM.
|
||||||
|
It resolves broken entity links by scanning existing contexts where the entity is referenced.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python tools/heal.py
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
try:
|
||||||
|
from litellm import completion
|
||||||
|
except ImportError:
|
||||||
|
print("Error: litellm not installed. Run: pip install litellm")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
# Ensure tools can be imported
|
||||||
|
sys.path.insert(0, str(Path(__file__).parent.parent))
|
||||||
|
|
||||||
|
from tools.lint import find_missing_entities, all_wiki_pages
|
||||||
|
|
||||||
|
REPO_ROOT = Path(__file__).parent.parent
|
||||||
|
WIKI_DIR = REPO_ROOT / "wiki"
|
||||||
|
ENTITIES_DIR = WIKI_DIR / "entities"
|
||||||
|
|
||||||
|
def call_llm(prompt: str, max_tokens: int = 1500) -> str:
|
||||||
|
# Use litellm standard environment variables
|
||||||
|
# e.g., GEMINI_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY
|
||||||
|
model = os.getenv("LLM_MODEL", "claude-3-5-haiku-latest") # default to fast model
|
||||||
|
|
||||||
|
response = completion(
|
||||||
|
model=model,
|
||||||
|
messages=[{"role": "user", "content": prompt}],
|
||||||
|
max_tokens=max_tokens
|
||||||
|
)
|
||||||
|
return response.choices[0].message.content
|
||||||
|
|
||||||
|
def search_sources(entity: str, pages: list[Path]) -> list[Path]:
|
||||||
|
"""Find up to 15 pages where this entity is mentioned natively."""
|
||||||
|
sources = []
|
||||||
|
for p in pages:
|
||||||
|
if "entities" not in str(p.parent) and "concepts" not in str(p.parent):
|
||||||
|
content = p.read_text(encoding="utf-8")
|
||||||
|
if entity.lower() in content.lower():
|
||||||
|
sources.append(p)
|
||||||
|
return sources[:15]
|
||||||
|
|
||||||
|
def heal_missing_entities():
|
||||||
|
pages = all_wiki_pages()
|
||||||
|
missing_entities = find_missing_entities(pages)
|
||||||
|
|
||||||
|
if not missing_entities:
|
||||||
|
print("Graph is fully connected. No missing entities found!")
|
||||||
|
return
|
||||||
|
|
||||||
|
ENTITIES_DIR.mkdir(exist_ok=True, parents=True)
|
||||||
|
print(f"Found {len(missing_entities)} missing entity nodes. Commencing auto-heal...")
|
||||||
|
|
||||||
|
for entity in missing_entities:
|
||||||
|
print(f"Healing entity page for: {entity}")
|
||||||
|
sources = search_sources(entity, pages)
|
||||||
|
|
||||||
|
context = ""
|
||||||
|
for s in sources:
|
||||||
|
context += f"\n\n### {s.name}\n{s.read_text(encoding='utf-8')[:800]}"
|
||||||
|
|
||||||
|
prompt = f"""You are filling a data gap in the Personal LLM Wiki.
|
||||||
|
Create an Entity definition page for "{entity}".
|
||||||
|
|
||||||
|
Here is how the entity appears in the current sources:
|
||||||
|
{context}
|
||||||
|
|
||||||
|
Format:
|
||||||
|
---
|
||||||
|
title: "{entity}"
|
||||||
|
type: entity
|
||||||
|
tags: []
|
||||||
|
sources: {[s.name for s in sources]}
|
||||||
|
---
|
||||||
|
|
||||||
|
# {entity}
|
||||||
|
|
||||||
|
Write a comprehensive paragraph defining what `{entity}` means in the context of this wiki, its main significance, and any actions or associations related to it.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
result = call_llm(prompt)
|
||||||
|
out_path = ENTITIES_DIR / f"{entity}.md"
|
||||||
|
out_path.write_text(result, encoding="utf-8")
|
||||||
|
print(f" -> Saved to {out_path.relative_to(REPO_ROOT)}")
|
||||||
|
except Exception as e:
|
||||||
|
print(f" [!] Failed to generate {entity}: {e}")
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
heal_missing_entities()
|
||||||
239
tools/ingest.py
Normal file
@@ -0,0 +1,239 @@
#!/usr/bin/env python3
"""
Ingest a source document into the LLM Wiki.

Usage:
    python tools/ingest.py <path-to-source> [path2 ...] [dir1 ...]
    python tools/ingest.py raw/articles/my-article.md

The LLM reads the source, extracts knowledge, and updates the wiki:
  - Creates wiki/sources/<slug>.md
  - Updates wiki/index.md
  - Updates wiki/overview.md (if warranted)
  - Creates/updates entity and concept pages
  - Appends to wiki/log.md
  - Flags contradictions
"""

import os
import sys
import json
import hashlib
import re
from pathlib import Path
from datetime import date

REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
LOG_FILE = WIKI_DIR / "log.md"
INDEX_FILE = WIKI_DIR / "index.md"
OVERVIEW_FILE = WIKI_DIR / "overview.md"
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()[:16]


def read_file(path: Path) -> str:
    return path.read_text(encoding="utf-8") if path.exists() else ""


def call_llm(prompt: str, max_tokens: int = 8192) -> str:
    try:
        from litellm import completion
    except ImportError:
        print("Error: litellm not installed. Run: pip install litellm")
        sys.exit(1)

    model = os.getenv("LLM_MODEL", "claude-3-5-sonnet-latest")
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens
    )
    return response.choices[0].message.content


def write_file(path: Path, content: str):
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    print(f" wrote: {path.relative_to(REPO_ROOT)}")


def build_wiki_context() -> str:
    parts = []
    if INDEX_FILE.exists():
        parts.append(f"## wiki/index.md\n{read_file(INDEX_FILE)}")
    if OVERVIEW_FILE.exists():
        parts.append(f"## wiki/overview.md\n{read_file(OVERVIEW_FILE)}")
    # Include a few recent source pages for contradiction checking
    sources_dir = WIKI_DIR / "sources"
    if sources_dir.exists():
        recent = sorted(sources_dir.glob("*.md"), key=lambda p: p.stat().st_mtime, reverse=True)[:5]
        for p in recent:
            # Read as UTF-8, like every other file access in this script
            parts.append(f"## {p.relative_to(REPO_ROOT)}\n{read_file(p)}")
    return "\n\n---\n\n".join(parts)


def parse_json_from_response(text: str) -> dict:
    # Strip markdown code fences if present
    text = re.sub(r"^```(?:json)?\s*", "", text.strip())
    text = re.sub(r"\s*```$", "", text.strip())
    # Find the outermost JSON object
    match = re.search(r"\{[\s\S]*\}", text)
    if not match:
        raise ValueError("No JSON object found in response")
    return json.loads(match.group())


def update_index(new_entry: str, section: str = "Sources"):
    content = read_file(INDEX_FILE)
    if not content:
        content = "# Wiki Index\n\n## Overview\n- [Overview](overview.md) — living synthesis\n\n## Sources\n\n## Entities\n\n## Concepts\n\n## Syntheses\n"
    # Ensure a trailing newline so a header on the last line is still matched
    if not content.endswith("\n"):
        content += "\n"
    section_header = f"## {section}"
    if section_header + "\n" in content:
        # Insert directly under the first occurrence of the section header
        content = content.replace(section_header + "\n", section_header + "\n" + new_entry + "\n", 1)
    else:
        content += f"\n{section_header}\n{new_entry}\n"
    write_file(INDEX_FILE, content)


def append_log(entry: str):
    # Add the entry at the end of the file so the log stays append-only and
    # chronological, as wiki/log.md describes (recent entries are read with `tail`).
    existing = read_file(LOG_FILE).rstrip()
    separator = "\n\n" if existing else ""
    write_file(LOG_FILE, existing + separator + entry.strip() + "\n")


def ingest(source_path: str):
    source = Path(source_path)
    if not source.exists():
        print(f"Error: file not found: {source_path}")
        sys.exit(1)

    source_content = source.read_text(encoding="utf-8")
    source_hash = sha256(source_content)
    today = date.today().isoformat()

    print(f"\nIngesting: {source.name} (hash: {source_hash})")

    wiki_context = build_wiki_context()
    schema = read_file(SCHEMA_FILE)

    prompt = f"""You are maintaining an LLM Wiki. Process this source document and integrate its knowledge into the wiki.

Schema and conventions:
{schema}

Current wiki state (index + recent pages):
{wiki_context if wiki_context else "(wiki is empty — this is the first source)"}

New source to ingest (file: {source.relative_to(REPO_ROOT) if source.is_relative_to(REPO_ROOT) else source.name}):
=== SOURCE START ===
{source_content}
=== SOURCE END ===

Today's date: {today}

Return ONLY a valid JSON object with these fields (no markdown fences, no prose outside the JSON):
{{
  "title": "Human-readable title for this source",
  "slug": "kebab-case-slug-for-filename",
  "source_page": "full markdown content for wiki/sources/<slug>.md — use the source page format from the schema",
  "index_entry": "- [Title](sources/slug.md) — one-line summary",
  "overview_update": "full updated content for wiki/overview.md, or null if no update needed",
  "entity_pages": [
    {{"path": "entities/EntityName.md", "content": "full markdown content"}}
  ],
  "concept_pages": [
    {{"path": "concepts/ConceptName.md", "content": "full markdown content"}}
  ],
  "contradictions": ["describe any contradiction with existing wiki content, or empty list"],
  "log_entry": "## [{today}] ingest | <title>\\n\\nAdded source. Key claims: ..."
}}
"""

    # Report the model that call_llm() will resolve (same env var and default)
    print(f" calling API (model: {os.getenv('LLM_MODEL', 'claude-3-5-sonnet-latest')})")
    raw = call_llm(prompt, max_tokens=8192)
    try:
        data = parse_json_from_response(raw)
    except (ValueError, json.JSONDecodeError) as e:
        print(f"Error parsing API response: {e}")
        print("Raw response saved to /tmp/ingest_debug.txt")
        Path("/tmp/ingest_debug.txt").write_text(raw, encoding="utf-8")
        sys.exit(1)

    # Write source page
    slug = data["slug"]
    write_file(WIKI_DIR / "sources" / f"{slug}.md", data["source_page"])

    # Write entity pages
    for page in data.get("entity_pages", []):
        write_file(WIKI_DIR / page["path"], page["content"])

    # Write concept pages
    for page in data.get("concept_pages", []):
        write_file(WIKI_DIR / page["path"], page["content"])

    # Update overview
    if data.get("overview_update"):
        write_file(OVERVIEW_FILE, data["overview_update"])

    # Update index
    update_index(data["index_entry"], section="Sources")

    # Append log
    append_log(data["log_entry"])

    # Report contradictions
    contradictions = data.get("contradictions", [])
    if contradictions:
        print("\n ⚠️ Contradictions detected:")
        for c in contradictions:
            print(f" - {c}")

    print(f"\nDone. Ingested: {data['title']}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python tools/ingest.py <path-to-source> [path2 ...] [dir1 ...]")
        sys.exit(1)

    paths_to_process = []
    for arg in sys.argv[1:]:
        p = Path(arg)
        if p.is_file() and p.suffix == ".md":
            paths_to_process.append(p)
        elif p.is_dir():
            for f in p.rglob("*.md"):
                if f.is_file():
                    paths_to_process.append(f)
        else:
            import glob
            for f in glob.glob(arg, recursive=True):
                g_p = Path(f)
                if g_p.is_file() and g_p.suffix == ".md":
                    paths_to_process.append(g_p)

    # Deduplicate while preserving order
    unique_paths = []
    seen = set()
    for p in paths_to_process:
        abs_p = p.resolve()
        if abs_p not in seen:
            seen.add(abs_p)
            unique_paths.append(p)

    if not unique_paths:
        print("Error: no markdown files found to ingest.")
        sys.exit(1)

    if len(unique_paths) > 1:
        print(f"Batch mode: found {len(unique_paths)} files to ingest.")

    for p in unique_paths:
        ingest(str(p))
||||||
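The ingest script above writes whatever paths and contents the model returns (`WIKI_DIR / page["path"]`), so a malformed response could create files in unexpected places. A minimal sketch of a pre-write check follows. The function `validate_ingest_payload` and its rules are assumptions for illustration and not part of the commit: it requires the string fields the script reads unconditionally, and it expects entity and concept paths of the form `entities/<Name>.md` and `concepts/<Name>.md`, as in the prompt's example.

```python
from pathlib import PurePosixPath

# Fields ingest() reads unconditionally from the parsed JSON
REQUIRED_KEYS = ("title", "slug", "source_page", "index_entry", "log_entry")
# Expected top-level folder for each list of generated pages (assumed convention)
ALLOWED_DIRS = {"entity_pages": "entities", "concept_pages": "concepts"}


def validate_ingest_payload(data: dict) -> list[str]:
    """Return human-readable problems; an empty list means the payload looks usable."""
    problems = []
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
    for field, folder in ALLOWED_DIRS.items():
        for page in data.get(field) or []:
            path = PurePosixPath(str(page.get("path", "")))
            ok = (
                not path.is_absolute()
                and len(path.parts) == 2          # exactly "<folder>/<file>"
                and path.parts[0] == folder
                and path.suffix == ".md"
                and ".." not in path.parts
            )
            if not ok:
                problems.append(f"{field}: unexpected path {str(path)!r}")
            if not isinstance(page.get("content"), str):
                problems.append(f"{field}: missing content for {str(path)!r}")
    return problems


if __name__ == "__main__":
    sample = {
        "title": "T", "slug": "t", "source_page": "body",
        "index_entry": "- [T](sources/t.md)", "log_entry": "## [2024-01-01] ingest | T",
        "entity_pages": [{"path": "../../outside.md", "content": "x"}],
    }
    print(validate_ingest_payload(sample))
```

Such a check would run right after `parse_json_from_response` and before the first `write_file` call, so a rejected response leaves the wiki untouched.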
210
tools/lint.py
Normal file
@@ -0,0 +1,210 @@
#!/usr/bin/env python3
"""
Lint the LLM Wiki for health issues.

Usage:
    python tools/lint.py
    python tools/lint.py --save          # save lint report to wiki/lint-report.md

Checks:
  - Orphan pages (no inbound wikilinks from other pages)
  - Broken wikilinks (pointing to pages that don't exist)
  - Missing entity pages (entities mentioned in 3+ pages but no page)
  - Contradictions between pages
  - Data gaps and suggested new sources
"""

import os
import re
import sys
import argparse
from pathlib import Path
from collections import defaultdict
from datetime import date

REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
LOG_FILE = WIKI_DIR / "log.md"
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"


def read_file(path: Path) -> str:
    return path.read_text(encoding="utf-8") if path.exists() else ""


def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str:
    try:
        from litellm import completion
    except ImportError:
        print("Error: litellm not installed. Run: pip install litellm")
        sys.exit(1)

    model = os.getenv(model_env, default_model)
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens
    )
    return response.choices[0].message.content


def all_wiki_pages() -> list[Path]:
    return [p for p in WIKI_DIR.rglob("*.md")
            if p.name not in ("index.md", "log.md", "lint-report.md")]


def extract_wikilinks(content: str) -> list[str]:
    return re.findall(r'\[\[([^\]]+)\]\]', content)


def page_name_to_path(name: str) -> list[Path]:
    """Try to resolve a [[WikiLink]] to a file path."""
    candidates = []
    for p in all_wiki_pages():
        # Case-insensitive match on the file stem
        if p.stem.lower() == name.lower():
            candidates.append(p)
    return candidates


def find_orphans(pages: list[Path]) -> list[Path]:
    inbound = defaultdict(int)
    for p in pages:
        content = read_file(p)
        for link in extract_wikilinks(content):
            resolved = page_name_to_path(link)
            for r in resolved:
                # Only links from other pages count as inbound
                if r != p:
                    inbound[r] += 1
    return [p for p in pages if inbound[p] == 0 and p != WIKI_DIR / "overview.md"]


def find_broken_links(pages: list[Path]) -> list[tuple[Path, str]]:
    broken = []
    for p in pages:
        content = read_file(p)
        for link in extract_wikilinks(content):
            if not page_name_to_path(link):
                broken.append((p, link))
    return broken


def find_missing_entities(pages: list[Path]) -> list[str]:
    """Find entity-like names mentioned in 3+ pages but lacking their own page."""
    mention_counts: dict[str, int] = defaultdict(int)
    existing_pages = {p.stem.lower() for p in pages}
    for p in pages:
        content = read_file(p)
        # Count each name once per page so the threshold means "3+ pages"
        links = set(extract_wikilinks(content))
        for link in links:
            if link.lower() not in existing_pages:
                mention_counts[link] += 1
    return [name for name, count in mention_counts.items() if count >= 3]


def run_lint():
    pages = all_wiki_pages()
    today = date.today().isoformat()

    if not pages:
        print("Wiki is empty. Nothing to lint.")
        return ""

    print(f"Linting {len(pages)} wiki pages...")

    # Deterministic checks
    orphans = find_orphans(pages)
    broken = find_broken_links(pages)
    missing_entities = find_missing_entities(pages)

    print(f" orphans: {len(orphans)}")
    print(f" broken links: {len(broken)}")
    print(f" missing entity pages: {len(missing_entities)}")

    # Build context for semantic checks (contradictions, gaps)
    # Use a sample of pages to stay within context limits
    sample = pages[:20]
    pages_context = ""
    for p in sample:
        rel = p.relative_to(REPO_ROOT)
        pages_context += f"\n\n### {rel}\n{read_file(p)[:1500]}"  # truncate long pages

    print(" running semantic lint via API...")
    prompt = f"""You are linting an LLM Wiki. Review the pages below and identify:
1. Contradictions between pages (claims that conflict)
2. Stale content (summaries that newer sources have superseded)
3. Data gaps (important questions the wiki can't answer — suggest specific sources to find)
4. Concepts mentioned but lacking depth

Wiki pages (sample of {len(sample)} pages):
{pages_context}

Return a markdown lint report with these sections:
## Contradictions
## Stale Content
## Data Gaps & Suggested Sources
## Concepts Needing More Depth

Be specific — name the exact pages and claims involved.
"""
    semantic_report = call_llm(prompt, "LLM_MODEL", "claude-3-5-sonnet-latest", max_tokens=3000)

    # Compose full report
    report_lines = [
        f"# Wiki Lint Report — {today}",
        "",
        f"Scanned {len(pages)} pages.",
        "",
        "## Structural Issues",
        "",
    ]

    if orphans:
        report_lines.append("### Orphan Pages (no inbound links)")
        for p in orphans:
            report_lines.append(f"- `{p.relative_to(REPO_ROOT)}`")
        report_lines.append("")

    if broken:
        report_lines.append("### Broken Wikilinks")
        for page, link in broken:
            report_lines.append(f"- `{page.relative_to(REPO_ROOT)}` links to `[[{link}]]` — not found")
        report_lines.append("")

    if missing_entities:
        report_lines.append("### Missing Entity Pages (mentioned 3+ times but no page)")
        for name in missing_entities:
            report_lines.append(f"- `[[{name}]]`")
        report_lines.append("")

    if not orphans and not broken and not missing_entities:
        report_lines.append("No structural issues found.")
        report_lines.append("")

    report_lines.append("---")
    report_lines.append("")
    report_lines.append(semantic_report)

    report = "\n".join(report_lines)
    print("\n" + report)
    return report


def append_log(entry: str):
    # Add the entry at the end of the file so the log stays append-only and
    # chronological, as wiki/log.md describes (recent entries are read with `tail`).
    existing = read_file(LOG_FILE).rstrip()
    separator = "\n\n" if existing else ""
    LOG_FILE.write_text(existing + separator + entry.strip() + "\n", encoding="utf-8")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Lint the LLM Wiki")
    parser.add_argument("--save", action="store_true", help="Save lint report to wiki/lint-report.md")
    args = parser.parse_args()

    report = run_lint()

    if args.save and report:
        report_path = WIKI_DIR / "lint-report.md"
        report_path.write_text(report, encoding="utf-8")
        print(f"\nSaved: {report_path.relative_to(REPO_ROOT)}")

    today = date.today().isoformat()
    append_log(f"## [{today}] lint | Wiki health check\n\nRan lint. See lint-report.md for details.")
||||||
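The structural checks in tools/lint.py above treat the full text between `[[` and `]]` as the page name. If pages ever use alias or anchor forms such as `[[Page|label]]` or `[[Page#Section]]` (an assumption, since the commit does not show such links), those would be reported as broken or missing. The sketch below normalizes link targets and counts each target once per page. It works on an in-memory mapping of page stem to content instead of files, and the names `link_targets` and `missing_targets` are hypothetical.

```python
import re
from collections import Counter

WIKILINK_RE = re.compile(r"\[\[([^\]]+)\]\]")


def link_targets(content: str) -> set[str]:
    """Distinct wikilink targets in one page, with |alias and #anchor removed."""
    targets = set()
    for raw in WIKILINK_RE.findall(content):
        target = re.split(r"[|#]", raw, maxsplit=1)[0].strip()
        if target:
            targets.add(target)
    return targets


def missing_targets(pages: dict[str, str], min_pages: int = 3) -> list[str]:
    """Targets linked from at least `min_pages` pages that have no page of their own.

    `pages` maps a page stem to its markdown content (stand-in for the wiki directory).
    """
    existing = {stem.lower() for stem in pages}
    counts = Counter()
    for content in pages.values():
        # link_targets() returns a set, so each page contributes at most one count
        for target in link_targets(content):
            if target.lower() not in existing:
                counts[target] += 1
    return sorted(name for name, n in counts.items() if n >= min_pages)


if __name__ == "__main__":
    demo = {
        "a": "[[Ada]] and [[Ada|her]] and [[b]]",
        "b": "See [[Ada#Life]].",
        "c": "[[Ada]] met [[Zed]].",
    }
    print(missing_targets(demo))
```

Counting per page matches the wording of the lint docstring ("mentioned in 3+ pages"), where a single page repeating a link should not push a name over the threshold.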
192
tools/query.py
Normal file
@@ -0,0 +1,192 @@
#!/usr/bin/env python3
"""
Query the LLM Wiki.

Usage:
    python tools/query.py "What are the main themes across all sources?"
    python tools/query.py "How does ConceptA relate to ConceptB?" --save
    python tools/query.py "Summarize everything about EntityName" --save syntheses/my-analysis.md

Flags:
    --save              Save the answer back into the wiki (prompts for filename)
    --save <path>       Save to a specific wiki path
"""

import os
import sys
import re
import json
import argparse
from pathlib import Path
from datetime import date

REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
INDEX_FILE = WIKI_DIR / "index.md"
LOG_FILE = WIKI_DIR / "log.md"
SCHEMA_FILE = REPO_ROOT / "CLAUDE.md"


def read_file(path: Path) -> str:
    return path.read_text(encoding="utf-8") if path.exists() else ""


def write_file(path: Path, content: str):
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    print(f" saved: {path.relative_to(REPO_ROOT)}")


def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str:
    try:
        from litellm import completion
    except ImportError:
        print("Error: litellm not installed. Run: pip install litellm")
        sys.exit(1)

    model = os.getenv(model_env, default_model)
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens
    )
    return response.choices[0].message.content


def find_relevant_pages(question: str, index_content: str) -> list[Path]:
    """Extract linked pages from index that seem relevant to the question."""
    # Pull all markdown links ([title](href)) from the index
    md_links = re.findall(r'\[([^\]]+)\]\(([^)]+)\)', index_content)
    question_lower = question.lower()
    relevant = []

    for title, href in md_links:
        title_lower = title.lower()
        match = False

        # 1. English/Space-separated: check words > 3 chars
        if any(word in question_lower for word in title_lower.split() if len(word) > 3):
            match = True
        # 2. Exact substring match for the whole title (useful for short CJK titles, e.g. len=2)
        elif len(title_lower) >= 2 and title_lower in question_lower:
            match = True
        # 3. CJK chunks: find contiguous non-ASCII characters (len >= 2) and check if in question
        elif any(chunk in question_lower for chunk in re.findall(r'[^\x00-\x7F]{2,}', title_lower)):
            match = True

        if match:
            p = WIKI_DIR / href
            if p.exists() and p not in relevant:
                relevant.append(p)

    # Always include overview
    overview = WIKI_DIR / "overview.md"
    if overview.exists() and overview not in relevant:
        relevant.insert(0, overview)
    return relevant[:12]  # cap to avoid context overflow


def append_log(entry: str):
    # Add the entry at the end of the file so the log stays append-only and
    # chronological, as wiki/log.md describes (recent entries are read with `tail`).
    existing = read_file(LOG_FILE).rstrip()
    separator = "\n\n" if existing else ""
    LOG_FILE.write_text(existing + separator + entry.strip() + "\n", encoding="utf-8")


def query(question: str, save_path: str | None = None):
    today = date.today().isoformat()

    # Step 1: Read index
    index_content = read_file(INDEX_FILE)
    if not index_content:
        print("Wiki is empty. Ingest some sources first with: python tools/ingest.py <source>")
        sys.exit(1)

    # Step 2: Find relevant pages
    relevant_pages = find_relevant_pages(question, index_content)

    # If keyword matching found nothing beyond the overview page,
    # ask the LLM to identify relevant pages from the index
    if len(relevant_pages) <= 1:
        print(" selecting relevant pages via API...")
        prompt = f"Given this wiki index:\n\n{index_content}\n\nWhich pages are most relevant to answering: \"{question}\"\n\nReturn ONLY a JSON array of relative file paths (as listed in the index), e.g. [\"sources/foo.md\", \"concepts/Bar.md\"]. Maximum 10 pages."
        raw = call_llm(prompt, "LLM_MODEL_FAST", "claude-3-5-haiku-latest", max_tokens=512)
        raw = raw.strip()
        raw = re.sub(r"^```(?:json)?\s*", "", raw)
        raw = re.sub(r"\s*```$", "", raw)
        try:
            paths = json.loads(raw)
            relevant_pages = [WIKI_DIR / p for p in paths if (WIKI_DIR / p).exists()]
        except (json.JSONDecodeError, TypeError):
            # Keep the keyword-based selection if the response is not a usable JSON array
            pass

    # Step 3: Read relevant pages
    pages_context = ""
    for p in relevant_pages:
        rel = p.relative_to(REPO_ROOT)
        pages_context += f"\n\n### {rel}\n{p.read_text(encoding='utf-8')}"

    if not pages_context:
        pages_context = f"\n\n### wiki/index.md\n{index_content}"

    schema = read_file(SCHEMA_FILE)

    # Step 4: Synthesize answer
    print(f" synthesizing answer from {len(relevant_pages)} pages...")
    prompt = f"""You are querying an LLM Wiki to answer a question. Use the wiki pages below to synthesize a thorough answer. Cite sources using [[PageName]] wikilink syntax.

Schema:
{schema}

Wiki pages:
{pages_context}

Question: {question}

Write a well-structured markdown answer with headers, bullets, and [[wikilink]] citations. At the end, add a ## Sources section listing the pages you drew from.
"""
    answer = call_llm(prompt, "LLM_MODEL", "claude-3-5-sonnet-latest", max_tokens=4096)
    print("\n" + "=" * 60)
    print(answer)
    print("=" * 60)

    # Step 5: Optionally save answer
    if save_path is not None:
        if save_path == "":
            # Prompt for filename
            slug = input("\nSave as (slug, e.g. 'my-analysis'): ").strip()
            if not slug:
                print("Skipping save.")
                return
            save_path = f"syntheses/{slug}.md"

        full_save_path = WIKI_DIR / save_path
        frontmatter = f"""---
title: "{question[:80]}"
type: synthesis
tags: []
sources: []
last_updated: {today}
---

"""
        write_file(full_save_path, frontmatter + answer)

        # Update index
        index_content = read_file(INDEX_FILE)
        entry = f"- [{question[:60]}]({save_path}) — synthesis"
        if "## Syntheses" in index_content:
            index_content = index_content.replace("## Syntheses\n", f"## Syntheses\n{entry}\n")
            INDEX_FILE.write_text(index_content, encoding="utf-8")
        print(f" indexed: {save_path}")

    # Append to log
    append_log(f"## [{today}] query | {question[:80]}\n\nSynthesized answer from {len(relevant_pages)} pages." +
               (f" Saved to {save_path}." if save_path else ""))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Query the LLM Wiki")
    parser.add_argument("question", help="Question to ask the wiki")
    parser.add_argument("--save", nargs="?", const="", default=None,
                        help="Save answer to wiki (optionally specify path)")
    args = parser.parse_args()
    query(args.question, args.save)
||||||
14
wiki/index.md
Normal file
@@ -0,0 +1,14 @@
# Wiki Index

This file is maintained by the LLM. Updated on every ingest.

## Overview
- [Overview](overview.md) — living synthesis across all sources

## Sources

## Entities

## Concepts

## Syntheses
9
wiki/log.md
Normal file
@@ -0,0 +1,9 @@
# Wiki Log

Append-only chronological record of all operations.

Format: `## [YYYY-MM-DD] <operation> | <title>`

Parse recent entries: `grep "^## \[" wiki/log.md | tail -10`

---
||||||
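The heading format documented in wiki/log.md above can also be parsed programmatically, which is handy when a tool needs the date and operation as structured values instead of grepped text. This is a minimal sketch under the assumption that every entry heading follows the documented `## [YYYY-MM-DD] <operation> | <title>` shape exactly. The function name `parse_log_entries` and the sample entries are made up for illustration.

```python
import re
from datetime import date

# Matches headings such as "## [2024-05-01] ingest | My Paper"
LOG_HEADING_RE = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\S+) \| (.+)$", re.MULTILINE)


def parse_log_entries(log_text: str) -> list[tuple[date, str, str]]:
    """Return (date, operation, title) for each entry heading, in file order."""
    return [
        (date.fromisoformat(day), operation, title.strip())
        for day, operation, title in LOG_HEADING_RE.findall(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "# Wiki Log\n\n"
        "Format: `## [YYYY-MM-DD] <operation> | <title>`\n\n"
        "---\n\n"
        "## [2024-05-01] ingest | My Paper\n\nAdded source.\n\n"
        "## [2024-05-02] lint | Wiki health check\n\nRan lint.\n"
    )
    for entry in parse_log_entries(sample)[-10:]:  # same idea as `tail -10`
        print(entry)
```

The `Format:` line in the file header is not picked up because it does not start with `##`. A heading with an impossible date (for example month 13) would raise `ValueError` from `date.fromisoformat`, which may or may not be the desired behavior.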
17
wiki/overview.md
Normal file
@@ -0,0 +1,17 @@
---
title: "Overview"
type: synthesis
tags: []
sources: []
last_updated: ""
---

# Overview

*This page is maintained by the LLM. It is updated on every ingest to reflect the current synthesis across all sources.*

No sources ingested yet. Add your first source with:

```bash
python tools/ingest.py raw/your-source.md
```