feat(tools): add self-healing graph utility and automated orchestration docs

- Add tools/heal.py to automatically identify and build missing structural concepts and entities by tracing their contextual usages via litellm.
- Add docs/automated-sync.md with cron/launchd orchestration best-practices.
- Closes Issue #16 on Graph Integrity Constraints.
watsonk1998
2026-04-14 00:37:44 +08:00
parent d8ac6107bf
commit 6868034554
2 changed files with 201 additions and 0 deletions

docs/automated-sync.md Normal file

@@ -0,0 +1,101 @@
# Automated Wiki Synchronization Guide
An LLM Wiki is most useful when it stays in sync with your everyday note-taking system. Instead of manually ingesting files every time you write something new, you can automate the pipeline end to end.
This guide outlines a scheduling strategy for local macOS/Linux environments. The scheduler example uses macOS `launchd`; on Linux, the same script can be run from `cron`.
## The Two-Step Architecture
LLM Wiki Agent ingestion is a two-step process, and the orchestrator script in the next section runs both steps:
1. **Syncing to `raw/`**: Getting files from your personal vault/tools into the agent's staging area.
2. **Batch Ingestion**: Triggering `tools/ingest.py` on the synchronized directories to synthesize and weave them into the graph.
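As a rough illustration of the first step, the sketch below symlinks Markdown notes from a vault into `raw/`. The function name `sync_vault_to_raw` and both directory arguments are hypothetical, and the flattened link names are only one possible convention; your own sync script may work differently.

```python
from pathlib import Path


def sync_vault_to_raw(vault_dir: Path, raw_dir: Path) -> list[Path]:
    """Symlink every Markdown note under vault_dir into raw_dir (hypothetical helper).

    Returns the links created on this run; existing entries are left alone.
    """
    raw_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for note in sorted(vault_dir.rglob("*.md")):
        # Flatten the relative path so notes in different folders cannot collide.
        link_name = "__".join(note.relative_to(vault_dir).parts)
        link = raw_dir / link_name
        if link.exists() or link.is_symlink():
            continue  # already synced, or a dangling link we should not overwrite
        link.symlink_to(note.resolve())
        created.append(link)
    return created
```

Because existing links are skipped, the function can be run every night without duplicating entries in `raw/`.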
### Step 1: The Master Orchestrator Script
Create a comprehensive shell script in your wiki root (`daily-automated-sync.sh`):
```bash
#!/usr/bin/env bash
set -uo pipefail
# Define variables
LAB_DIR="$HOME/projects/active/personal-wiki-lab"
LOG_FILE="$LAB_DIR/automation-cron.log"
DATE=$(date "+%Y-%m-%d %H:%M:%S")
echo "=====================================================" >> "$LOG_FILE"
echo "[$DATE] Starting automated wiki synchronization..." >> "$LOG_FILE"
cd "$LAB_DIR" || exit 1
# 1. Run your personal Vault-to-Raw symlink script here
# Example: ./sync-raw.sh >> "$LOG_FILE" 2>&1
# 2. Trigger litellm batch ingestion using the LLM of your choice
export LLM_MODEL="gemini/gemini-3-flash-preview"
export GEMINI_API_KEY="AIzaSy..." # or export OPENAI_API_KEY
echo "[$DATE] Batch ingesting markdown files..." >> "$LOG_FILE"
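# Match symlinks and regular files; NUL-delimit names so spaces and backslashes survive
find raw/ \( -type l -o -type f \) -name "*.md" -print0 | \
while IFS= read -r -d '' file; do
python3 tools/ingest.py "$file" >> "$LOG_FILE" 2>&1 < /dev/null
done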
# 3. Heal Graph Context (Auto-resolves broken semantic links)
echo "[$DATE] Healing broken nodes..." >> "$LOG_FILE"
python3 tools/heal.py >> "$LOG_FILE" 2>&1
echo "[$(date "+%Y-%m-%d %H:%M:%S")] Automated sync completed." >> "$LOG_FILE"
echo "=====================================================" >> "$LOG_FILE"
```
Don't forget to make it executable: `chmod +x daily-automated-sync.sh`.
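If a Python driver is easier to maintain than the `find` loop in the script above, the batch step could be sketched as follows. `collect_markdown` and `ingest_all` are hypothetical names; only the `python3 tools/ingest.py <file>` invocation comes from the script. The failure check assumes `tools/ingest.py` exits non-zero on error, which this guide does not confirm.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable


def collect_markdown(raw_dir: Path) -> list[Path]:
    """Return regular files and symlinks ending in .md under raw_dir, sorted."""
    return sorted(
        p for p in raw_dir.rglob("*.md") if p.is_file() or p.is_symlink()
    )


def ingest_all(
    files: Iterable[Path],
    run: Callable[..., object] = subprocess.run,
) -> list[Path]:
    """Call tools/ingest.py once per file; return the files whose run failed."""
    failed = []
    for path in files:
        result = run(["python3", "tools/ingest.py", str(path)])
        # Assumption: a non-zero exit code signals a failed ingestion.
        if getattr(result, "returncode", 0) != 0:
            failed.append(path)
    return failed
```

Passing the runner as a parameter keeps the loop testable without spending API calls; in normal use the default `subprocess.run` is enough.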
### Step 2: System Scheduler (macOS launchd)
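For macOS, `launchd` is generally a better fit than `cron`: a `StartCalendarInterval` job that is missed because the machine was asleep runs when it next wakes, whereas `cron` skips it.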
Create a `.plist` file at `~/Library/LaunchAgents/com.personal-wiki-sync.plist`:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.personal-wiki-sync</string>
<key>ProgramArguments</key>
<array>
<string>/bin/bash</string>
<string>/Users/your-username/projects/active/personal-wiki-lab/daily-automated-sync.sh</string>
</array>
<!-- Execute automatically at 2:00 AM daily -->
<key>StartCalendarInterval</key>
<dict>
<key>Hour</key>
<integer>2</integer>
<key>Minute</key>
<integer>0</integer>
</dict>
<!-- Also run once whenever the agent is loaded (at login or via launchctl load), regardless of schedule -->
<key>RunAtLoad</key>
<true/>
<!-- Diagnostic Logs -->
<key>StandardOutPath</key>
<string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stdout.log</string>
<key>StandardErrorPath</key>
<string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stderr.log</string>
</dict>
</plist>
```
Load the daemon:
```bash
launchctl load ~/Library/LaunchAgents/com.personal-wiki-sync.plist
```
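Because the plist needs absolute paths, it can be convenient to generate it instead of editing `your-username` by hand. The sketch below uses Python's standard `plistlib`; the helper name `build_plist` is hypothetical, and the keys simply mirror the XML above.

```python
import plistlib
from pathlib import Path


def build_plist(lab_dir: Path, label: str = "com.personal-wiki-sync") -> bytes:
    """Render a launchd agent definition with the same keys as the XML above."""
    job = {
        "Label": label,
        "ProgramArguments": ["/bin/bash", str(lab_dir / "daily-automated-sync.sh")],
        "StartCalendarInterval": {"Hour": 2, "Minute": 0},
        "RunAtLoad": True,
        "StandardOutPath": str(lab_dir / "daemon.stdout.log"),
        "StandardErrorPath": str(lab_dir / "daemon.stderr.log"),
    }
    # plistlib.dumps emits the XML plist format by default.
    return plistlib.dumps(job)
```

Writing the returned bytes to `~/Library/LaunchAgents/com.personal-wiki-sync.plist` should give a file equivalent to the hand-written one, with paths derived from whatever `lab_dir` you pass in.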
### Self-Healing & Health Monitoring
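Because the automation runs unattended at night, check the logs regularly. The script redirects the output of `tools/ingest.py` and `tools/heal.py` (both stdout and stderr) to `automation-cron.log`, so API failures will normally appear there; `daemon.stdout.log` and `daemon.stderr.log` capture only what the script does not redirect itself, such as shell-level errors. Running `tools/heal.py` as the final step is strongly recommended: it generates entity pages for names that were referenced during the day but never given a page of their own.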
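Since the orchestrator script appends the output of both tools to `automation-cron.log`, a small scanner can surface failures without reading the whole file. This is a minimal sketch: the `[!] Failed to generate` marker is the message `tools/heal.py` prints, while the generic pattern (tracebacks and names ending in `Error`) is an assumption about what other failures look like. `find_failures` is a hypothetical helper name.

```python
import re

# Marker printed by tools/heal.py when an entity page cannot be generated.
HEAL_FAILURE = "[!] Failed to generate"
# Assumption: other failures show up as Python tracebacks or "...Error" names.
GENERIC_FAILURE = re.compile(r"Traceback \(most recent call last\)|Error\b")


def find_failures(log_text: str) -> list[str]:
    """Return log lines that look like failures, in their original order."""
    hits = []
    for line in log_text.splitlines():
        if HEAL_FAILURE in line or GENERIC_FAILURE.search(line):
            hits.append(line.strip())
    return hits
```

Feeding it the contents of `automation-cron.log` each morning gives a short list to review; an empty list only means no line matched these patterns, not that the run necessarily succeeded.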

tools/heal.py Executable file

@@ -0,0 +1,100 @@
#!/usr/bin/env python3
"""
Graph Self-Healing Tool
Automatically retrieves "Missing Entity Pages" from the wiki and generates
comprehensive definition pages for them using the LLM.
It resolves broken entity links by scanning existing contexts where the entity is referenced.
Usage:
python tools/heal.py
"""
import os
import sys
from pathlib import Path
try:
    from litellm import completion
except ImportError:
    print("Error: litellm not installed. Run: pip install litellm")
    sys.exit(1)
# Ensure tools can be imported
sys.path.insert(0, str(Path(__file__).parent.parent))
from tools.lint import find_missing_entities, all_wiki_pages
REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
ENTITIES_DIR = WIKI_DIR / "entities"
def call_llm(prompt: str, max_tokens: int = 1500) -> str:
    # Use litellm standard environment variables
    # e.g., GEMINI_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY
    model = os.getenv("LLM_MODEL", "claude-3-5-haiku-latest")  # default to fast model
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content
def search_sources(entity: str, pages: list[Path]) -> list[Path]:
    """Find up to 15 non-entity, non-concept pages that mention this entity."""
    sources = []
    for p in pages:
        # Skip pages whose parent path contains "entities" or "concepts"
        if "entities" not in str(p.parent) and "concepts" not in str(p.parent):
            content = p.read_text(encoding="utf-8")
            # Case-insensitive substring match on the entity name
            if entity.lower() in content.lower():
                sources.append(p)
    return sources[:15]
def heal_missing_entities():
    pages = all_wiki_pages()
    missing_entities = find_missing_entities(pages)
    if not missing_entities:
        print("Graph is fully connected. No missing entities found!")
        return

    ENTITIES_DIR.mkdir(exist_ok=True, parents=True)
    print(f"Found {len(missing_entities)} missing entity nodes. Commencing auto-heal...")

    for entity in missing_entities:
        print(f"Healing entity page for: {entity}")
        sources = search_sources(entity, pages)

        # Use the first 800 characters of each source page as context
        context = ""
        for s in sources:
            context += f"\n\n### {s.name}\n{s.read_text(encoding='utf-8')[:800]}"

        prompt = f"""You are filling a data gap in the Personal LLM Wiki.
Create an Entity definition page for "{entity}".
Here is how the entity appears in the current sources:
{context}
Format:
---
title: "{entity}"
type: entity
tags: []
sources: {[s.name for s in sources]}
---
# {entity}
Write a comprehensive paragraph defining what `{entity}` means in the context of this wiki, its main significance, and any actions or associations related to it.
"""
        try:
            result = call_llm(prompt)
            out_path = ENTITIES_DIR / f"{entity}.md"
            out_path.write_text(result, encoding="utf-8")
            print(f" -> Saved to {out_path.relative_to(REPO_ROOT)}")
        except Exception as e:
            print(f" [!] Failed to generate {entity}: {e}")


if __name__ == "__main__":
    heal_missing_entities()