Merge pull request #20 from watsonctl/feat/self-healing-and-automation
feat: Add intelligent graph self-healing tool (heal.py) & Daily cron orchestration guide
101
docs/automated-sync.md
Normal file
@@ -0,0 +1,101 @@
# Automated Wiki Synchronization Guide

An LLM Wiki is most useful when it stays in step with the notes you are already taking. Rather than ingesting files by hand every time you write something new, you can automate the pipeline end to end.

This guide describes a cron/launchd scheduling strategy for local macOS and Linux environments.

## The Two-Step Architecture

LLM Wiki Agent ingestion is a two-step process:
1. **Syncing to `raw/`**: Getting files from your personal vault or tools into the agent's staging area.
2. **Batch Ingestion**: Running `tools/ingest.py` on the synchronized files to synthesize them and weave them into the graph.

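As a rough illustration of the first step, the sketch below symlinks Markdown notes from a vault into `raw/`. The function name `sync_vault_to_raw`, the flattened `folder__note.md` naming, and the example paths are assumptions made for this example; the guide leaves the sync script up to you.

```python
from pathlib import Path


def sync_vault_to_raw(vault_dir: Path, raw_dir: Path) -> list[Path]:
    """Symlink every Markdown note under vault_dir into raw_dir (hypothetical helper)."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for note in sorted(vault_dir.rglob("*.md")):
        # Flatten nested folders into one file name so raw/ stays a single directory.
        link = raw_dir / "__".join(note.relative_to(vault_dir).parts)
        if link.is_symlink() or link.exists():
            continue  # already staged on an earlier run
        link.symlink_to(note.resolve())
        created.append(link)
    return created
```

Calling `sync_vault_to_raw(Path.home() / "vault", Path("raw"))` from the wiki root would stage any new notes and leave existing links untouched, so it is safe to run on every scheduled pass.
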
### Step 1: The Master Orchestrator Script

Create a shell script named `daily-automated-sync.sh` in your wiki root:

```bash
#!/usr/bin/env bash
# -e is deliberately omitted so one failed ingestion does not abort the whole run.
set -uo pipefail

# Define variables
LAB_DIR="$HOME/projects/active/personal-wiki-lab"
LOG_FILE="$LAB_DIR/automation-cron.log"

# Print a fresh timestamp for every log line.
timestamp() { date "+%Y-%m-%d %H:%M:%S"; }

echo "=====================================================" >> "$LOG_FILE"
echo "[$(timestamp)] Starting automated wiki synchronization..." >> "$LOG_FILE"

cd "$LAB_DIR" || exit 1

# 1. Run your personal Vault-to-Raw symlink script here
# Example: ./sync-raw.sh >> "$LOG_FILE" 2>&1

# 2. Trigger LiteLLM batch ingestion using the LLM of your choice
export LLM_MODEL="gemini/gemini-3-flash-preview"
export GEMINI_API_KEY="AIzaSy..." # or export OPENAI_API_KEY

echo "[$(timestamp)] Batch ingesting markdown files..." >> "$LOG_FILE"
# Match regular files and symlinks; -print0 with read -d '' keeps paths with spaces intact.
find raw/ \( -type l -o -type f \) -name "*.md" -print0 |
  while IFS= read -r -d '' file; do
    # Redirect stdin so ingest.py cannot consume the file list from the pipe.
    python3 tools/ingest.py "$file" >> "$LOG_FILE" 2>&1 < /dev/null
  done

# 3. Heal Graph Context (auto-resolves broken semantic links)
echo "[$(timestamp)] Healing broken nodes..." >> "$LOG_FILE"
python3 tools/heal.py >> "$LOG_FILE" 2>&1

echo "[$(timestamp)] Automated sync completed." >> "$LOG_FILE"
echo "=====================================================" >> "$LOG_FILE"
```

Don't forget to make it executable: `chmod +x daily-automated-sync.sh`.

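The `find` step in the script selects every `.md` entry under `raw/`, whether it is a regular file or a symlink. For readers who prefer to drive ingestion from Python, a minimal sketch of the same selection follows. The helper name `collect_markdown` is an assumption for this example, and unlike `find -type l` it skips symlinks whose target no longer exists.

```python
from pathlib import Path


def collect_markdown(raw_dir: Path) -> list[Path]:
    """Return every *.md regular file or symlink under raw_dir, sorted for stable ordering."""
    matches = []
    for path in raw_dir.rglob("*.md"):
        # is_file() follows symlinks, so a link to a note counts; a broken link does not.
        if path.is_file():
            matches.append(path)
    return sorted(matches)
```

Skipping broken links up front keeps one deleted vault note from producing an ingestion error on every nightly run.
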
### Step 2: System Scheduler (macOS launchd)

On macOS, `launchd` is the native scheduler and is usually a better fit than `cron`: if the Mac is asleep at the scheduled time, `launchd` runs the job on the next wake, whereas `cron` skips it.

Create a `.plist` file at `~/Library/LaunchAgents/com.personal-wiki-sync.plist`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.personal-wiki-sync</string>
    <!-- Use absolute paths: launchd does not expand ~ or $HOME -->
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>/Users/your-username/projects/active/personal-wiki-lab/daily-automated-sync.sh</string>
    </array>

    <!-- Execute automatically at 2:00 AM daily -->
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>2</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>

    <!-- Also run once whenever the agent is loaded (for example, at login) -->
    <key>RunAtLoad</key>
    <true/>

    <!-- Diagnostic Logs -->
    <key>StandardOutPath</key>
    <string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stdout.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stderr.log</string>
</dict>
</plist>
```

Load the agent:
```bash
launchctl load ~/Library/LaunchAgents/com.personal-wiki-sync.plist
```

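Because the plist needs absolute paths, hand-editing `your-username` in three places is easy to get wrong. One option, sketched here with Python's standard `plistlib`, is to generate the file from a single directory path. The function name `build_sync_plist` is an assumption for this example; the keys and values mirror the plist in this guide.

```python
import plistlib
from pathlib import Path


def build_sync_plist(lab_dir: Path, hour: int = 2, minute: int = 0) -> bytes:
    """Render the launchd job definition for daily-automated-sync.sh as XML bytes."""
    job = {
        "Label": "com.personal-wiki-sync",
        "ProgramArguments": ["/bin/bash", str(lab_dir / "daily-automated-sync.sh")],
        "StartCalendarInterval": {"Hour": hour, "Minute": minute},
        "RunAtLoad": True,
        "StandardOutPath": str(lab_dir / "daemon.stdout.log"),
        "StandardErrorPath": str(lab_dir / "daemon.stderr.log"),
    }
    return plistlib.dumps(job)  # plistlib writes the XML format by default


if __name__ == "__main__":
    lab = Path.home() / "projects/active/personal-wiki-lab"
    target = Path.home() / "Library/LaunchAgents/com.personal-wiki-sync.plist"
    # Print instead of writing, so the result can be reviewed before installing it.
    print(f"Would write {target}:")
    print(build_sync_plist(lab).decode("utf-8"))
```

Printing the result first makes it easy to compare against the hand-written version before saving it to `~/Library/LaunchAgents/`.
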
### Self-Healing & Health Monitoring
Because the automation runs unattended at night, check the logs regularly. The script redirects the output of `tools/ingest.py` and `tools/heal.py` to `automation-cron.log`, so API failures appear there; `daemon.stderr.log` catches errors raised outside those redirects, such as a failed `cd`. The script also runs `tools/heal.py`, which is strongly recommended: it finds entities that your pages reference but that have no page of their own, and generates definition pages for them.
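
A nightly job is only useful if failures get noticed. As a minimal sketch, the snippet below scans the log for the `[!] Failed to generate` lines that `tools/heal.py` prints when an entity page cannot be created. The helper name `find_heal_failures` and the log location are assumptions for this example.

```python
from pathlib import Path

FAILURE_MARKER = "[!] Failed to generate"  # prefix printed by tools/heal.py on errors


def find_heal_failures(log_text: str) -> list[str]:
    """Return the log lines on which heal.py reported a failed entity page."""
    return [line.strip() for line in log_text.splitlines() if FAILURE_MARKER in line]


if __name__ == "__main__":
    # Assumed location: the LOG_FILE path used by daily-automated-sync.sh.
    log_path = Path.home() / "projects/active/personal-wiki-lab/automation-cron.log"
    if log_path.exists():
        text = log_path.read_text(encoding="utf-8", errors="replace")
        for failure in find_heal_failures(text):
            print(failure)
```

Running it each morning, or at the end of the sync script, surfaces failed entities without reading the whole log.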
100
tools/heal.py
Executable file
@@ -0,0 +1,100 @@
#!/usr/bin/env python3
"""
Graph Self-Healing Tool

Automatically retrieves "Missing Entity Pages" from the wiki and generates
comprehensive definition pages for them using the LLM.
It resolves broken entity links by scanning existing contexts where the entity is referenced.

Usage:
    python tools/heal.py
"""

import os
import sys
from pathlib import Path

try:
    from litellm import completion
except ImportError:
    print("Error: litellm not installed. Run: pip install litellm")
    sys.exit(1)

# Ensure tools can be imported
sys.path.insert(0, str(Path(__file__).parent.parent))

from tools.lint import find_missing_entities, all_wiki_pages

REPO_ROOT = Path(__file__).parent.parent
WIKI_DIR = REPO_ROOT / "wiki"
ENTITIES_DIR = WIKI_DIR / "entities"


def call_llm(prompt: str, max_tokens: int = 1500) -> str:
    """Send a single-turn prompt to the configured model and return its reply."""
    # Use litellm standard environment variables
    # e.g., GEMINI_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY
    model = os.getenv("LLM_MODEL", "claude-3-5-haiku-latest")  # default to fast model

    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


def search_sources(entity: str, pages: list[Path]) -> list[Path]:
    """Find up to 15 pages outside entities/ and concepts/ that mention this entity."""
    needle = entity.lower()
    sources = []
    for p in pages:
        # Skip entity and concept pages; only the remaining pages count as sources.
        if "entities" not in str(p.parent) and "concepts" not in str(p.parent):
            content = p.read_text(encoding="utf-8")
            if needle in content.lower():
                sources.append(p)
                if len(sources) == 15:
                    break  # stop reading files once the cap is reached
    return sources


def heal_missing_entities():
    """Generate an entity page for every entity that lint reports as missing."""
    pages = all_wiki_pages()
    missing_entities = find_missing_entities(pages)

    if not missing_entities:
        print("Graph is fully connected. No missing entities found!")
        return

    ENTITIES_DIR.mkdir(exist_ok=True, parents=True)
    print(f"Found {len(missing_entities)} missing entity nodes. Commencing auto-heal...")

    for entity in missing_entities:
        print(f"Healing entity page for: {entity}")
        sources = search_sources(entity, pages)

        # Give the model the first 800 characters of each source as context.
        context = ""
        for s in sources:
            context += f"\n\n### {s.name}\n{s.read_text(encoding='utf-8')[:800]}"

        prompt = f"""You are filling a data gap in the Personal LLM Wiki.
Create an Entity definition page for "{entity}".

Here is how the entity appears in the current sources:
{context}

Format:
---
title: "{entity}"
type: entity
tags: []
sources: {[s.name for s in sources]}
---

# {entity}

Write a comprehensive paragraph defining what `{entity}` means in the context of this wiki, its main significance, and any actions or associations related to it.
"""
        try:
            result = call_llm(prompt)
            out_path = ENTITIES_DIR / f"{entity}.md"
            out_path.write_text(result, encoding="utf-8")
            print(f" -> Saved to {out_path.relative_to(REPO_ROOT)}")
        except Exception as e:
            # Log the failure and continue with the next entity.
            print(f" [!] Failed to generate {entity}: {e}")


if __name__ == "__main__":
    heal_missing_entities()