diff --git a/.claude/commands/wiki-graph.md b/.claude/commands/wiki-graph.md deleted file mode 100644 index 64c0338d..00000000 --- a/.claude/commands/wiki-graph.md +++ /dev/null @@ -1,18 +0,0 @@ -Build the LLM Wiki knowledge graph. - -Usage: /wiki-graph - -First try running: python tools/build_graph.py --open - -If that fails (missing dependencies), build the graph manually: - -1. Use Grep to find all [[wikilinks]] across every file in wiki/ -2. Build a nodes list: one node per wiki page, with id=relative-path, label=title, type from frontmatter -3. Build an edges list: one edge per [[wikilink]], tagged EXTRACTED -4. Infer additional implicit relationships between pages not captured by wikilinks — tag these INFERRED with a confidence score (0.0–1.0); tag low-confidence ones AMBIGUOUS -5. Write graph/graph.json with {nodes, edges, built: today} -6. Write graph/graph.html as a self-contained vis.js page (nodes colored by type, edges colored by type, interactive, searchable) - -After building, summarize: node count, edge count, breakdown by type, and the most connected nodes (hubs). - -Append to wiki/log.md: ## [today's date] graph | Knowledge graph rebuilt diff --git a/.claude/commands/wiki-ingest.md b/.claude/commands/wiki-ingest.md deleted file mode 100644 index f8e3a37c..00000000 --- a/.claude/commands/wiki-ingest.md +++ /dev/null @@ -1,18 +0,0 @@ -Ingest a source document into the LLM Wiki. - -Usage: /wiki-ingest $ARGUMENTS - -$ARGUMENTS should be the path to a file in raw/, e.g. `raw/articles/my-article.md` - -Follow the Ingest Workflow defined in CLAUDE.md exactly: -1. Read the source file at the given path -2. Read wiki/index.md and wiki/overview.md for current context -3. Write wiki/sources/.md (source page format per CLAUDE.md) -4. Update wiki/index.md — add the new entry under Sources -5. Update wiki/overview.md — revise synthesis if warranted -6. Create/update entity pages (wiki/entities/) for key people, companies, projects -7. 
Create/update concept pages (wiki/concepts/) for key ideas and frameworks -8. Flag any contradictions with existing wiki content -9. Append to wiki/log.md: ## [today's date] ingest | - -After completing all writes, summarize: what was added, which pages were created or updated, and any contradictions found. diff --git a/.claude/commands/wiki-lint.md b/.claude/commands/wiki-lint.md deleted file mode 100644 index fe45b691..00000000 --- a/.claude/commands/wiki-lint.md +++ /dev/null @@ -1,19 +0,0 @@ -Health-check the LLM Wiki for issues. - -Usage: /wiki-lint - -Follow the Lint Workflow defined in CLAUDE.md: - -Structural checks (use Grep and Glob tools): -1. Orphan pages — wiki pages with no inbound [[wikilinks]] from other pages -2. Broken links — [[WikiLinks]] pointing to pages that don't exist -3. Missing entity pages — names referenced in 3+ pages but lacking their own page - -Semantic checks (read and reason over page content): -4. Contradictions — claims that conflict between pages -5. Stale summaries — pages not updated after newer sources changed the picture -6. Data gaps — important questions the wiki can't answer; suggest specific sources to find - -Output a structured markdown lint report. At the end, ask if the user wants it saved to wiki/lint-report.md. - -Append to wiki/log.md: ## [today's date] lint | Wiki health check diff --git a/.claude/commands/wiki-query.md b/.claude/commands/wiki-query.md deleted file mode 100644 index 8a6a6df1..00000000 --- a/.claude/commands/wiki-query.md +++ /dev/null @@ -1,14 +0,0 @@ -Query the LLM Wiki and synthesize an answer. - -Usage: /wiki-query $ARGUMENTS - -$ARGUMENTS is the question to answer, e.g. `What are the main themes across all sources?` - -Follow the Query Workflow defined in CLAUDE.md: -1. Read wiki/index.md to identify the most relevant pages -2. Read those pages (up to ~10 most relevant) -3. Synthesize a thorough markdown answer with [[PageName]] wikilink citations -4. 
Include a ## Sources section at the end listing pages you drew from -5. Ask the user if they want the answer saved as wiki/syntheses/<slug>.md - -If the wiki is empty, say so and suggest running /wiki-ingest first. diff --git a/AGENTS.md b/AGENTS.md deleted file mode 100644 index 1fe5a046..00000000 --- a/AGENTS.md +++ /dev/null @@ -1,219 +0,0 @@ -# LLM Wiki Agent — Schema & Workflow Instructions - -This wiki is maintained entirely by your coding agent. No API key or Python scripts needed — just open this repo in Codex, OpenCode, or any agent that reads this file, and talk to it. - -## How to Use - -Describe what you want in plain English: -- *"Ingest this file: raw/papers/my-paper.md"* -- *"What does the wiki say about transformer models?"* -- *"Check the wiki for orphan pages and contradictions"* -- *"Build the knowledge graph"* - -Or use shorthand triggers: -- `ingest <file>` → runs the Ingest Workflow -- `query: <question>` → runs the Query Workflow -- `lint` → runs the Lint Workflow -- `build graph` → runs the Graph Workflow - ---- - -## Directory Layout - -``` -raw/ # Immutable source documents — never modify these -wiki/ # Agent owns this layer entirely - index.md # Catalog of all pages — update on every ingest - log.md # Append-only chronological record - overview.md # Living synthesis across all sources - sources/ # One summary page per source document - entities/ # People, companies, projects, products - concepts/ # Ideas, frameworks, methods, theories - syntheses/ # Saved query answers -graph/ # Auto-generated graph data -tools/ # Optional standalone Python scripts (require ANTHROPIC_API_KEY) -``` - ---- - -## Page Format - -Every wiki page uses this frontmatter: - -```yaml ---- -title: "Page Title" -type: source | entity | concept | synthesis -tags: [] -sources: [] # list of source slugs that inform this page -last_updated: YYYY-MM-DD ---- -``` - -Use `[[PageName]]` wikilinks to link to other wiki pages. 
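As a rough illustration of the page format above, the sketch below splits a page into its frontmatter and body and collects `[[PageName]]` targets. It is a minimal sketch, not the repository's parser: `split_frontmatter` and `extract_wikilinks` are hypothetical names, the frontmatter is read as flat `key: value` lines rather than full YAML, and the `[[Page|alias]]` form is an assumption borrowed from Obsidian-style links.

```python
import re

# Captures the target of [[Target]], stopping at "|" (alias) or "#" (heading).
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)")


def split_frontmatter(text):
    """Return (metadata, body) for a page that opens with a '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            # Drop trailing "# comment" text and surrounding quotes.
            meta[key.strip()] = value.split(" #", 1)[0].strip().strip('"')
    return {}, text  # no closing '---': treat the page as having no frontmatter


def extract_wikilinks(body):
    """Return the sorted, de-duplicated wikilink targets found in body."""
    return sorted({target.strip() for target in WIKILINK_RE.findall(body)})
```

Values such as `tags: []` are kept as raw strings here; a real tool would more likely hand the block to a YAML library.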
- ---- - -## Ingest Workflow - -Triggered by: *"ingest <file>"* - -Steps (in order): -1. Read the source document fully -2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context -3. Write `wiki/sources/<slug>.md` — use the source page format below -4. Update `wiki/index.md` — add entry under Sources section -5. Update `wiki/overview.md` — revise synthesis if warranted -6. Update/create entity pages for key people, companies, projects mentioned -7. Update/create concept pages for key ideas and frameworks discussed -8. Flag any contradictions with existing wiki content -9. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>` - -### Source Page Format - -```markdown ---- -title: "Source Title" -type: source -tags: [] -date: YYYY-MM-DD -source_file: raw/... ---- - -## Summary -2–4 sentence summary. - -## Key Claims -- Claim 1 -- Claim 2 - -## Key Quotes -> "Quote here" — context - -## Connections -- [[EntityName]] — how they relate -- [[ConceptName]] — how it connects - -## Contradictions -- Contradicts [[OtherPage]] on: ... -``` - -### Domain-Specific Templates - -If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above: - -#### Diary / Journal Template -```markdown ---- -title: "YYYY-MM-DD Diary" -type: source -tags: [diary] -date: YYYY-MM-DD ---- -## Event Summary -... -## Key Decisions -... -## Energy & Mood -... -## Connections -... -## Shifts & Contradictions -... -``` - -#### Meeting Notes Template -```markdown ---- -title: "Meeting Title" -type: source -tags: [meeting] -date: YYYY-MM-DD ---- -## Goal -... -## Key Discussions -... -## Decisions Made -... -## Action Items -... -``` - ---- - -## Query Workflow - -Triggered by: *"query: <question>"* - -Steps: -1. Read `wiki/index.md` to identify relevant pages -2. Read those pages -3. Synthesize an answer with inline citations as `[[PageName]]` wikilinks -4. 
Ask the user if they want the answer filed as `wiki/syntheses/<slug>.md` - ---- - -## Lint Workflow - -Triggered by: *"lint"* - -Check for: -- **Orphan pages** — wiki pages with no inbound `[[links]]` from other pages -- **Broken links** — `[[WikiLinks]]` pointing to pages that don't exist -- **Contradictions** — claims that conflict across pages -- **Stale summaries** — pages not updated after newer sources -- **Missing entity pages** — entities mentioned in 3+ pages but lacking their own page -- **Data gaps** — questions the wiki can't answer; suggest new sources - -Output a lint report and ask if the user wants it saved to `wiki/lint-report.md`. - ---- - -## Graph Workflow - -Triggered by: *"build graph"* - -First try: `python tools/build_graph.py --open` - -If Python/deps unavailable, build manually: -1. Search for all `[[wikilinks]]` across wiki pages -2. Build nodes (one per page) and edges (one per link) -3. Infer implicit relationships not captured by wikilinks — tag `INFERRED` with confidence score; low confidence → `AMBIGUOUS` -4. Write `graph/graph.json` with `{nodes, edges, built: date}` -5. Write `graph/graph.html` as a self-contained vis.js visualization - ---- - -## Naming Conventions - -- Source slugs: `kebab-case` matching source filename -- Entity pages: `TitleCase.md` (e.g. `OpenAI.md`, `SamAltman.md`) -- Concept pages: `TitleCase.md` (e.g. 
`ReinforcementLearning.md`, `RAG.md`) - -## Index Format - -```markdown -# Wiki Index - -## Overview -- [Overview](overview.md) — living synthesis - -## Sources -- [Source Title](sources/slug.md) — one-line summary - -## Entities -- [Entity Name](entities/EntityName.md) — one-line description - -## Concepts -- [Concept Name](concepts/ConceptName.md) — one-line description - -## Syntheses -- [Analysis Title](syntheses/slug.md) — what question it answers -``` - -## Log Format - -`## [YYYY-MM-DD] <operation> | <title>` - -Operations: `ingest`, `query`, `lint`, `graph` diff --git a/CLAUDE(ENGLISH).md b/CLAUDE(ENGLISH).md deleted file mode 100644 index 345219f7..00000000 --- a/CLAUDE(ENGLISH).md +++ /dev/null @@ -1,230 +0,0 @@ -# LLM Wiki Agent — Schema & Workflow Instructions - -This wiki is maintained entirely by Claude Code. No API key or Python scripts needed — just open this repo in Claude Code and talk to it. - -## Slash Commands (Claude Code) - -| Command | What to say | -|---|---| -| `/wiki-ingest` | `ingest raw/my-article.md` | -| `/wiki-query` | `query: what are the main themes?` | -| `/wiki-lint` | `lint the wiki` | -| `/wiki-graph` | `build the knowledge graph` | - -Or just describe what you want in plain English: -- *"Ingest this file: raw/papers/attention-is-all-you-need.md"* -- *"What does the wiki say about transformer models?"* -- *"Check the wiki for orphan pages and contradictions"* -- *"Build the graph and show me what's connected to RAG"* - -Claude Code reads this file automatically and follows the workflows below. 
- ---- - -## Directory Layout - -``` -raw/ # Immutable source documents — never modify these -wiki/ # Claude owns this layer entirely - index.md # Catalog of all pages — update on every ingest - log.md # Append-only chronological record - overview.md # Living synthesis across all sources - sources/ # One summary page per source document - entities/ # People, companies, projects, products - concepts/ # Ideas, frameworks, methods, theories - syntheses/ # Saved query answers -graph/ # Auto-generated graph data -tools/ # Optional standalone Python scripts (require ANTHROPIC_API_KEY) -``` - ---- - -## Page Format - -Every wiki page uses this frontmatter: - -```yaml ---- -title: "Page Title" -type: source | entity | concept | synthesis -tags: [] -sources: [] # list of source slugs that inform this page -last_updated: YYYY-MM-DD ---- -``` - -Use `[[PageName]]` wikilinks to link to other wiki pages. - ---- - -## Ingest Workflow - -Triggered by: *"ingest <file>"* or `/wiki-ingest` - -Steps (in order): -1. Read the source document fully using the Read tool -2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context -3. Write `wiki/sources/<slug>.md` — use the source page format below -4. Update `wiki/index.md` — add entry under Sources section -5. Update `wiki/overview.md` — revise synthesis if warranted -6. Update/create entity pages for key people, companies, projects mentioned -7. Update/create concept pages for key ideas and frameworks discussed -8. Flag any contradictions with existing wiki content -9. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>` - -### Source Page Format - -```markdown ---- -title: "Source Title" -type: source -tags: [] -date: YYYY-MM-DD -source_file: raw/... ---- - -## Summary -2–4 sentence summary. 
- -## Key Claims -- Claim 1 -- Claim 2 - -## Key Quotes -> "Quote here" — context - -## Connections -- [[EntityName]] — how they relate -- [[ConceptName]] — how it connects - -## Contradictions -- Contradicts [[OtherPage]] on: ... -``` - -### Domain-Specific Templates - -If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above: - -#### Diary / Journal Template -```markdown ---- -title: "YYYY-MM-DD Diary" -type: source -tags: [diary] -date: YYYY-MM-DD ---- -## Event Summary -... -## Key Decisions -... -## Energy & Mood -... -## Connections -... -## Shifts & Contradictions -... -``` - -#### Meeting Notes Template -```markdown ---- -title: "Meeting Title" -type: source -tags: [meeting] -date: YYYY-MM-DD ---- -## Goal -... -## Key Discussions -... -## Decisions Made -... -## Action Items -... -``` - ---- - -## Query Workflow - -Triggered by: *"query: <question>"* or `/wiki-query` - -Steps: -1. Read `wiki/index.md` to identify relevant pages -2. Read those pages with the Read tool -3. Synthesize an answer with inline citations as `[[PageName]]` wikilinks -4. Ask the user if they want the answer filed as `wiki/syntheses/<slug>.md` - ---- - -## Lint Workflow - -Triggered by: *"lint the wiki"* or `/wiki-lint` - -Use Grep and Read tools to check for: -- **Orphan pages** — wiki pages with no inbound `[[links]]` from other pages -- **Broken links** — `[[WikiLinks]]` pointing to pages that don't exist -- **Contradictions** — claims that conflict across pages -- **Stale summaries** — pages not updated after newer sources -- **Missing entity pages** — entities mentioned in 3+ pages but lacking their own page -- **Data gaps** — questions the wiki can't answer; suggest new sources - -Output a lint report and ask if the user wants it saved to `wiki/lint-report.md`. 
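The two structural lint checks above (orphan pages and broken links) can be approximated with a short script. This is a sketch under stated assumptions rather than the project's lint implementation: `lint_links` is a hypothetical helper, a link target is resolved by file stem (so `[[OpenAI]]` matches `entities/OpenAI.md`), and `index`, `log`, and `overview` are excluded from the orphan list by choice.

```python
import re
from pathlib import Path

WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)")


def lint_links(wiki_dir):
    """Return (orphans, broken) for the markdown pages under wiki_dir.

    orphans: page names with no inbound wikilink from another page.
    broken:  (source_page, target) pairs whose target page does not exist.
    """
    pages = {p.stem: p for p in Path(wiki_dir).rglob("*.md")}
    inbound = {name: set() for name in pages}
    broken = []
    for name, path in pages.items():
        for target in WIKILINK_RE.findall(path.read_text(encoding="utf-8")):
            target = target.strip()
            if target in pages:
                if target != name:  # self-links do not count as inbound
                    inbound[target].add(name)
            else:
                broken.append((name, target))
    skip = {"index", "log", "overview"}  # assumed to be structural pages
    orphans = sorted(n for n, src in inbound.items() if not src and n not in skip)
    return orphans, sorted(broken)
```

Because the index format uses ordinary markdown links rather than wikilinks, a page reachable only from `index.md` still counts as an orphan in this sketch.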
- ---- - -## Graph Workflow - -Triggered by: *"build the knowledge graph"* or `/wiki-graph` - -When the user asks to build the graph, run `tools/build_graph.py` which: -- Pass 1: Parses all `[[wikilinks]]` → deterministic `EXTRACTED` edges -- Pass 2: Infers implicit relationships → `INFERRED` edges with confidence scores -- Runs Louvain community detection -- Outputs `graph/graph.json` + `graph/graph.html` - -If the user doesn't have Python/dependencies set up, instead generate the graph data manually: -1. Use Grep to find all `[[wikilinks]]` across wiki pages -2. Build a node/edge list -3. Write `graph/graph.json` directly -4. Write `graph/graph.html` using the vis.js template - ---- - -## Naming Conventions - -- Source slugs: `kebab-case` matching source filename -- Entity pages: `TitleCase.md` (e.g. `OpenAI.md`, `SamAltman.md`) -- Concept pages: `TitleCase.md` (e.g. `ReinforcementLearning.md`, `RAG.md`) -- Source pages: `kebab-case.md` - -## Index Format - -```markdown -# Wiki Index - -## Overview -- [Overview](overview.md) — living synthesis - -## Sources -- [Source Title](sources/slug.md) — one-line summary - -## Entities -- [Entity Name](entities/EntityName.md) — one-line description - -## Concepts -- [Concept Name](concepts/ConceptName.md) — one-line description - -## Syntheses -- [Analysis Title](syntheses/slug.md) — what question it answers -``` - -## Log Format - -Each entry starts with `## [YYYY-MM-DD] <operation> | <title>` so it's grep-parseable: - -``` -grep "^## \[" wiki/log.md | tail -10 -``` - -Operations: `ingest`, `query`, `lint`, `graph` diff --git a/CLAUDE.md b/CLAUDE.md deleted file mode 100644 index 8fc63ef8..00000000 --- a/CLAUDE.md +++ /dev/null @@ -1,352 +0,0 @@ -# LLM Wiki Agent — Schema & Workflow Instructions (Enhanced Chinese-Language Specification) - -This wiki is maintained entirely and automatically by Claude Code. No API key or Python scripts are needed: just open this repository in Claude Code and talk to it. - ---- -# 🔴 Global Mandatory Rules (CRITICAL) - -## 1. 
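Output Language (Mandatory) - -- All output must be written in **Simplified Chinese** -- Proper nouns may stay in English, but the first occurrence must include a Chinese explanation -- If the original file name is Chinese, name the source page in Chinese where possible rather than in pinyin; special characters may be dropped -- Mixed Chinese-English sentences are forbidden (technical terms excepted) -- Purely English summaries or analyses are not allowed - -Example (an English term followed by its Chinese gloss, which reads "Transformer model, a neural network architecture based on the attention mechanism"): - -Transformer(变压器模型,一种基于注意力机制的神经网络架构) - ---- - -## 2. Output Style (Strictly Enforced) - -All output must: - -- Avoid rhetoric (narrative style is forbidden) -- Avoid vagueness (hedging words such as "可能" (maybe) and "大概" (roughly) are forbidden) -- Maximize information density -- Serve knowledge structuring rather than the reading experience - -Priority: - -Structure > Relations > Conclusions > Description - ---- - -## 3. Structured Semantics (Mandatory) - -Every page must follow the structured-semantics rules: - -- Summary must use the fixed fields -- Each Claim must follow the standard grammar -- Connections must use relation types -- Free-form improvisation is forbidden - ---- - -# Slash Commands (Claude Code) - -| Command | Usage | -| -------------- | --------------------------- | -| `/wiki-ingest` | `ingest raw/your-file.md` | -| `/wiki-query` | `query: your question` | -| `/wiki-lint` | `lint the wiki` | -| `/wiki-graph` | `build the knowledge graph` | - ---- - -## Natural-Language Examples - -- ingest raw/papers/attention-is-all-you-need.md -- query: What is the core mechanism of the Transformer? -- lint the wiki -- build the graph and analyze RAG - -Claude Code reads this file automatically and runs the workflows below. - - - ---- - -# Directory Layout - -``` -raw/ # Source documents (must not be modified) -wiki/ # Knowledge layer (maintained entirely by Claude) - index.md # Page index (must be updated on every ingest) - log.md # Append-only log - overview.md # Global knowledge summary - sources/ # One page per source document - entities/ # Entities (people / companies / products / projects) - concepts/ # Concepts (methods / theories / frameworks) - syntheses/ # Saved query results -graph/ # Auto-generated graph data -tools/ # Optional Python tools (require ANTHROPIC_API_KEY) -``` - - ---- - -# Page Format - -Every page must include: - -```yaml ---- -id: unique_id -title: "Page Title" -type: source | entity | concept | synthesis -tags: [] -sources: [] # sources -last_updated: YYYY-MM-DD ---- -``` - -Links must use `[[PageName]]`. - ---- - -# Ingest Workflow -**Important:** Follow the ingest workflow strictly. For every page analyzed, you must create or update the source page, entities, concepts, and so on. Nothing may be skipped! - -Triggered by: -- `/wiki-ingest` -- or: `ingest <file>` -## Steps (Strict Order) -1. Read the source document in full with the Read tool -2. Read `wiki/index.md` and `wiki/overview.md` -3. Generate `wiki/sources/<original-Chinese-name>.md` (use `slug.md` for non-Chinese names) -4. Update `wiki/index.md` -5. Update `wiki/overview.md` (if necessary) -6. Create or update Entity pages -7. Create or update Concept pages -8. Detect and record conflicts -9. 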
Append to `wiki/log.md` - ---- - -# Source Page Format (Enhanced Structure) - -```markdown ---- -title: "Source Title" -type: source -tags: [] -date: YYYY-MM-DD ---- - -## Source File -- [[raw/...]] - -## Summary -- Core topic: -- Problem domain: -- Method / mechanism: -- Conclusion / value: - -## Key Claims -- (must follow: subject + mechanism + result) - -## Key Quotes -> "Quoted text" — context note - -## Key Concepts -- [[ConceptName]]: definition - -## Key Entities -- [[EntityName]]: role description - -## Connections -- [[A]] ← depends_on ← [[B]] -- [[C]] ← extends ← [[D]] - -## Contradictions -- Conflicts with [[OtherPage]]: - - Point of conflict: - - Current view: - - Opposing view: -``` - ---- - -# Domain-Specific Templates - -## Diary / Journal - -```markdown ---- -title: "YYYY-MM-DD Diary" -type: source -tags: [diary] -date: YYYY-MM-DD ---- -## Event Summary -## Key Decisions -## Energy & Mood -## Connections -## Shifts & Contradictions -``` - ---- - -## Meeting Notes - -```markdown ---- -title: "Meeting Title" -type: source -tags: [meeting] -date: YYYY-MM-DD ---- -## Goal -## Key Discussions -## Decisions Made -## Action Items -``` - ---- - -# Entity & Concept Rules (Key Enhancements) - -## Entity - -Create when: -- It appears ≥ 2 times - or -- It has a key influence on the topic - -Types: -- Person / company / product / project - ---- - -## Concept -Create when: -- It can be abstracted -- It is reusable -- It is not a concrete instance ---- - -## Naming Rules (Mandatory) -- Use a single canonical name -- Record all aliases on the page: - -```markdown -## Aliases -- GPT4 -- GPT-4 -``` - ---- - -## Deduplication (Mandatory) - -Before creating a page, you must: -1. Search the index -2. Determine whether the page already exists -3. Update it if it exists - ---- - -# Query Workflow - -Triggered by: -- `/wiki-query` -- or: `query: question` - ---- - -## Steps - -1. Read the index -2. Find the relevant pages -3. Load them with the Read tool -4. Output a structured answer -5. Cite with `[[Page]]` -6. Ask whether to save the answer as a synthesis - ---- - -# Lint Workflow (Validation) - -Check for: - -- Orphan pages -- Broken links -- Conflicts -- Stale content -- Missing Entity pages -- Missing Concept pages -- Knowledge gaps - ---- - -# Graph Workflow (Knowledge Graph) - -Triggered by: -- `/wiki-graph` - ---- - -Execution: -- Prefer running `tools/build_graph.py` -- Otherwise build manually: - -Steps: -1. Extract all `[[links]]` -2. Build nodes and edges -3. 
Write `graph.json` - ---- - -# Naming Conventions -- Source: keep the original Chinese name (with special symbols removed); use kebab-case for non-Chinese names -- Entity: TitleCase -- Concept: TitleCase - ---- - -# Index Format - -```markdown -# Wiki Index - -## Overview -- [Overview](overview.md) - -## Sources -- [Title](sources/<original-Chinese-name>.md) - -## Entities -- [Entity](entities/Entity.md) - -## Concepts -- [Concept](concepts/Concept.md) - -## Syntheses -- [Title](syntheses/slug.md) -``` - ---- - -# Log Format - -``` -## [YYYY-MM-DD] ingest | Title -``` - ---- - -# ✅ Final Goal - -This system is used for: - -- Knowledge accumulation -- Structured understanding -- Automatic graph construction -- Agent reasoning support - ---- - -# END \ No newline at end of file diff --git a/GEMINI.md b/GEMINI.md deleted file mode 100644 index 3025c9d2..00000000 --- a/GEMINI.md +++ /dev/null @@ -1,175 +0,0 @@ -# LLM Wiki Agent — Schema & Workflow Instructions - -This wiki is maintained entirely by Gemini CLI. No API key or Python scripts needed — just open this repo with `gemini` and talk to it. - -## How to Use - -Describe what you want in plain English: -- *"Ingest this file: raw/papers/my-paper.md"* -- *"What does the wiki say about transformer models?"* -- *"Check the wiki for orphan pages and contradictions"* -- *"Build the knowledge graph"* - -Or use shorthand triggers: -- `ingest <file>` → runs the Ingest Workflow -- `query: <question>` → runs the Query Workflow -- `lint` → runs the Lint Workflow -- `build graph` → runs the Graph Workflow - ---- - -## Directory Layout - -``` -raw/ # Immutable source documents — never modify these -wiki/ # Agent owns this layer entirely - index.md # Catalog of all pages — update on every ingest - log.md # Append-only chronological record - overview.md # Living synthesis across all sources - sources/ # One summary page per source document - entities/ # People, companies, projects, products - concepts/ # Ideas, frameworks, methods, theories - syntheses/ # Saved query answers -graph/ # Auto-generated graph data -tools/ # Optional standalone Python scripts -``` - ---- - -## Page Format - -Every wiki 
page uses this frontmatter: - -```yaml ---- -title: "Page Title" -type: source | entity | concept | synthesis -tags: [] -sources: [] -last_updated: YYYY-MM-DD ---- -``` - -Use `[[PageName]]` wikilinks to link to other wiki pages. - ---- - -## Ingest Workflow - -Triggered by: *"ingest <file>"* - -1. Read the source document fully -2. Read `wiki/index.md` and `wiki/overview.md` for current wiki context -3. Write `wiki/sources/<slug>.md` (source page format below) -4. Update `wiki/index.md` — add entry under Sources -5. Update `wiki/overview.md` — revise synthesis if warranted -6. Update/create entity and concept pages -7. Flag contradictions with existing wiki content -8. Append to `wiki/log.md`: `## [YYYY-MM-DD] ingest | <Title>` - -### Source Page Format - -```markdown ---- -title: "Source Title" -type: source -tags: [] -date: YYYY-MM-DD -source_file: raw/... ---- - -## Summary -2–4 sentence summary. - -## Key Claims -- Claim 1 - -## Key Quotes -> "Quote here" - -## Connections -- [[EntityName]] — how they relate - -## Contradictions -- Contradicts [[OtherPage]] on: ... -``` - -### Domain-Specific Templates - -If the source falls into a specific domain (e.g., personal diary, meeting notes), the agent should use a specialized template instead of the default generic one above: - -#### Diary / Journal Template -```markdown ---- -title: "YYYY-MM-DD Diary" -type: source -tags: [diary] -date: YYYY-MM-DD ---- -## Event Summary -... -## Key Decisions -... -## Energy & Mood -... -## Connections -... -## Shifts & Contradictions -... -``` - -#### Meeting Notes Template -```markdown ---- -title: "Meeting Title" -type: source -tags: [meeting] -date: YYYY-MM-DD ---- -## Goal -... -## Key Discussions -... -## Decisions Made -... -## Action Items -... -``` - ---- - -## Query Workflow - -Triggered by: *"query: <question>"* - -1. Read `wiki/index.md` — identify relevant pages -2. Read those pages -3. Synthesize answer with `[[PageName]]` citations -4. 
Offer to save as `wiki/syntheses/<slug>.md` - ---- - -## Lint Workflow - -Triggered by: *"lint"* - -Check for: orphan pages, broken links, contradictions, stale content, missing entity pages, data gaps. - ---- - -## Graph Workflow - -Triggered by: *"build graph"* - -Try `python tools/build_graph.py --open` first. If unavailable, build graph.json and graph.html manually from wikilinks. - ---- - -## Naming Conventions - -- Source slugs: `kebab-case` -- Entity/Concept pages: `TitleCase.md` - -## Log Format - -`## [YYYY-MM-DD] <operation> | <title>` diff --git a/LICENSE b/LICENSE deleted file mode 100644 index 7caba6cd..00000000 --- a/LICENSE +++ /dev/null @@ -1,21 +0,0 @@ -MIT License - -Copyright (c) 2023 SamurAIGPT - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. 
diff --git a/README.md b/README.md deleted file mode 100644 index 4397152c..00000000 --- a/README.md +++ /dev/null @@ -1,245 +0,0 @@ -# LLM Wiki Agent - -[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE) - -**A coding agent skill.** Drop source documents into `raw/` and type `/wiki-ingest` — the agent reads them, extracts knowledge, and builds a persistent interlinked wiki. Every new source makes the wiki richer. You never write it. - -> Most knowledge tools make you search your own notes. This one reads everything you've collected and writes a structured wiki that compounds over time — cross-references already built, contradictions already flagged, synthesis already done. - -``` -/wiki-ingest raw/papers/attention-is-all-you-need.md -``` - -``` -wiki/ -├── index.md catalog of all pages — updated on every ingest -├── log.md append-only record of every operation -├── overview.md living synthesis across all sources -├── sources/ one summary page per source document -├── entities/ people, companies, projects — auto-created -├── concepts/ ideas, frameworks, methods — auto-created -└── syntheses/ query answers filed back as wiki pages -graph/ -├── graph.json persistent node/edge data (SHA256-cached) -└── graph.html interactive vis.js visualization — open in any browser -``` - -## Install - -**Requires:** [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), or any agent that reads a config file. 
- -```bash -git clone https://github.com/SamurAIGPT/llm-wiki-agent.git -cd llm-wiki-agent -``` - -Open in your agent — no API key or Python setup needed: - -```bash -claude # reads CLAUDE.md + .claude/commands/ -codex # reads AGENTS.md -opencode # reads AGENTS.md -gemini # reads GEMINI.md -``` - -## Usage - -``` -/wiki-ingest raw/papers/my-paper.md # ingest a source into the wiki -/wiki-ingest raw/articles/my-article.md # works on any markdown file - -/wiki-query "what are the main themes?" # synthesize answer from wiki pages -/wiki-query "how does X relate to Y?" # with [[wikilink]] citations - -/wiki-lint # find orphans, contradictions, gaps -/wiki-graph # build graph.html from all wikilinks -``` - -Plain English also works with any agent: -``` -"Ingest this paper: raw/papers/llama2.md" -"What does the wiki say about attention mechanisms?" -"Check for contradictions across sources" -"Build the knowledge graph and tell me the most connected nodes" -``` - -Works with any markdown source — articles, papers, book chapters, meeting notes, journal entries, research summaries. - -## What You Get - -**Persistent wiki** — structured markdown pages that accumulate across sessions. Unlike chat, nothing is lost. - -**Entity pages** — auto-created for every person, company, or project mentioned across sources. Updated each time a new source references them. - -**Concept pages** — auto-created for every key idea or framework. Cross-referenced to every source that discusses them. - -**Living overview** — `wiki/overview.md` is revised on every ingest to reflect the current synthesis across everything you've read. - -**Contradiction flags** — when a new source contradicts an existing claim, it's flagged at ingest time, not buried until query time. - -**Knowledge graph** — `graph.html` shows every wiki page as a node, every `[[wikilink]]` as an edge, and Claude-inferred implicit relationships as dotted edges. Community detection clusters related topics. 
- -**Lint reports** — orphan pages, broken links, missing entity pages, data gaps with suggested sources to fill them. - -## Use Cases - -### Research - -Going deep on a topic over weeks — reading papers, articles, reports. - -``` -/wiki-ingest raw/papers/attention-is-all-you-need.md -/wiki-ingest raw/papers/llama2.md -/wiki-ingest raw/papers/rag-survey.md - -# Wiki builds entity pages (Meta AI, Google Brain) and -# concept pages (Attention, RLHF, Context Window) automatically. - -/wiki-query "What are the main approaches to reducing hallucination?" -/wiki-query "How has context window size evolved across models?" - -/wiki-lint -# → "No sources on mixture-of-experts — consider the Mixtral paper" -``` - -By the end you have a structured, interlinked reference — not a folder of PDFs you'll never reopen. - ---- - -### Reading a Book - -File each chapter as you go. Build out pages for characters, themes, arguments. - -``` -/wiki-ingest raw/book/chapter-01.md -/wiki-ingest raw/book/chapter-02.md - -# Wiki creates entity and theme pages automatically. - -/wiki-query "How has the protagonist's motivation evolved?" -/wiki-query "What contradictions exist in the author's argument so far?" - -/wiki-graph # → graph.html shows every character/theme and how they connect -``` - -Think fan wikis like Tolkien Gateway — built as you read, with the agent doing all the cross-referencing. - ---- - -### Personal Knowledge Base - -Track goals, health, habits, self-improvement — file journal entries, articles, podcast notes. - -``` -/wiki-ingest raw/journal/2026-01-week1.md -/wiki-ingest raw/articles/huberman-sleep-protocol.md -/wiki-ingest raw/articles/atomic-habits-summary.md - -/wiki-query "What patterns show up in my journal entries about energy?" -/wiki-query "What habits have I tried and what was the outcome?" -``` - -The wiki builds a structured picture over time. Concepts like "Sleep", "Exercise", "Deep Work" accumulate evidence from every source filed. 
- ---- - -### Business / Team Intelligence - -Feed in meeting transcripts, project docs, customer calls. - -``` -/wiki-ingest raw/meetings/q1-planning-transcript.md -/wiki-ingest raw/docs/product-roadmap-2026.md -/wiki-ingest raw/calls/customer-interview-acme.md - -/wiki-query "What feature requests have come up most across customer calls?" -/wiki-query "What decisions were made in Q1 and what was the rationale?" - -/wiki-lint -# → "Project X mentioned in 5 pages but no dedicated page" -# → "Roadmap contradicts customer interview on priority of feature Y" -``` - -The wiki stays current because the agent does the maintenance no one wants to do. - ---- - -### Competitive Analysis - -Track a company, market, or technology over time. - -``` -/wiki-ingest raw/competitors/openai-announcements.md -/wiki-ingest raw/market/ai-funding-report-q1.md - -/wiki-query "How do OpenAI and Anthropic differ on safety approach?" -/wiki-query "Which companies announced multimodal models in the last 6 months?" -/wiki-query "Competitive landscape summary as of today" --save -``` - -## The Graph - -Two-pass build: - -1. **Deterministic** — parses all `[[wikilinks]]` across wiki pages → edges tagged `EXTRACTED` -2. **Semantic** — agent infers implicit relationships not captured by wikilinks → edges tagged `INFERRED` (with confidence score) or `AMBIGUOUS` - -Louvain community detection clusters nodes by topic. SHA256 cache means only changed pages are reprocessed. Output is a self-contained `graph.html` — no server, opens in any browser. - -## CLAUDE.md / AGENTS.md - -The schema file tells the agent how to maintain the wiki — page formats, ingest/query/lint/graph workflows, naming conventions. This is the key config file. Edit it to customize behavior for your domain. 
- -| Agent | Schema file | -|---|---| -| Claude Code | `CLAUDE.md` | -| Codex / OpenCode | `AGENTS.md` | -| Gemini CLI | `GEMINI.md` | - -## What Makes This Different from RAG - -| RAG | LLM Wiki Agent | -|---|---| -| Re-derives knowledge every query | Compiles once, keeps current | -| Raw chunks as retrieval unit | Structured wiki pages | -| No cross-references | Cross-references pre-built | -| Contradictions surface at query time (maybe) | Flagged at ingest time | -| No accumulation | Every source makes the wiki richer | - -## Obsidian Integration - -The wiki is designed to be browsed seamlessly in [Obsidian](https://obsidian.md). Since the agent maintains consistent `[[wikilinks]]`, you get a naturally growing knowledge graph in your vault. - -### Vault Symlink Pattern -If you want to keep the LLM Wiki Agent repository separate from your main personal vault, use symlinks: -1. Keep your working agent repository at e.g., `~/llm-wiki-agent` -2. Create a symlink from your main Obsidian vault: - ```bash - ln -sfn ~/llm-wiki-agent/wiki ~/your-obsidian-vault/wiki - ``` -3. Use the [Obsidian Web Clipper](https://obsidian.md/clipper) or write directly to `raw/` in the agent repo to queue items for ingestion. - -> **Note:** If you ever move your local repo directory, remember to update the symlink, otherwise the `wiki/` directory will appear missing in Obsidian. - -### Recommended .obsidian Config -- **Graph View:** Filter out `index.md` and `log.md` (e.g. `-file:index.md -file:log.md`) to avoid them becoming gravity wells in your Obsidian graph. -- **Dataview:** Use the community plugin [Dataview](https://blacksmithgu.github.io/obsidian-dataview/) to query the YAML frontmatter the agent automatically injects (e.g., `type: source`, `tags: [diary]`). 
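The same frontmatter can also be read outside Obsidian. The following is a rough sketch rather than part of the repository: `frontmatter_type` reuses the `type:` regex from `tools/build_graph.py`, while `pages_by_type` and its `wiki_dir` argument are illustrative names introduced here.

```python
import re
from collections import defaultdict
from pathlib import Path


def frontmatter_type(content: str) -> str:
    """Return the page's `type:` value, or "unknown" when it is absent.

    Like the regex in tools/build_graph.py, this matches the first line
    anywhere in the page that starts with `type:`.
    """
    match = re.search(r"^type:\s*(\S+)", content, re.MULTILINE)
    return match.group(1).strip("\"'") if match else "unknown"


def pages_by_type(wiki_dir: Path) -> dict[str, list[str]]:
    """Group page paths (relative to wiki_dir) by their frontmatter type.

    Hypothetical helper, assuming pages are UTF-8 markdown files.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for page in sorted(wiki_dir.rglob("*.md")):
        kind = frontmatter_type(page.read_text(encoding="utf-8"))
        groups[kind].append(page.relative_to(wiki_dir).as_posix())
    return dict(groups)
```

Calling `pages_by_type(Path("wiki"))` from the repository root would then give a quick count of sources, entities, and concepts without opening the vault.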
- -## Tips - -- File good query answers back with `--save` — your explorations compound just like ingested sources -- The wiki is a git repo — version history for free -- Standalone Python scripts in `tools/` work without a coding agent (require `ANTHROPIC_API_KEY`) - -## Tech Stack - -NetworkX + Louvain + Claude + vis.js. No server, no database, runs entirely locally. Everything is plain markdown files. - -## Related - -- [graphify](https://github.com/safishamsi/graphify) — graph-based knowledge extraction skill (inspiration for the graph layer) -- [Vannevar Bush's Memex (1945)](https://en.wikipedia.org/wiki/Memex) — the original vision this resembles - -## License - -MIT License — see [LICENSE](LICENSE) for details. diff --git a/concepts/Multi.md b/concepts/Multi.md deleted file mode 100644 index e69de29b..00000000 diff --git a/docs/automated-sync.md b/docs/automated-sync.md deleted file mode 100644 index fc7f06ed..00000000 --- a/docs/automated-sync.md +++ /dev/null @@ -1,101 +0,0 @@ -# Automated Wiki Synchronization Guide - -Managing an LLM Wiki works best when it constantly reflects your background note-taking system. Instead of manually ingesting files every time you write something new, you can orchestrate an end-to-end automation pipeline. - -This guide outlines a production-grade cron/launchd strategy for local Mac/Linux environments. - -## The Two-Step Architecture - -LLM Wiki Agent ingestion is a two-step process: -1. **Syncing to `raw/`**: Getting files from your personal vault/tools into the agent's staging area. -2. **Batch Ingestion**: Triggering `tools/ingest.py` on the synchronized directories to synthesize and weave them into the graph. 
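The first step is left to your own tooling. As a minimal sketch, assuming the vault folder can simply be mirrored into `raw/` with symlinks (the orchestrator's `find` call already accepts them), the sync might look like this. The function name `sync_vault_to_raw` and both directory arguments are hypothetical and do not exist in the repository.

```python
from pathlib import Path


def sync_vault_to_raw(vault_dir: Path, raw_dir: Path) -> list[Path]:
    """Symlink every markdown note under vault_dir into raw_dir.

    Hypothetical helper: mirrors the vault's folder layout and skips
    notes that are already linked, so it is safe to run repeatedly.
    """
    created = []
    for note in sorted(vault_dir.rglob("*.md")):
        target = raw_dir / note.relative_to(vault_dir)
        if target.exists() or target.is_symlink():
            continue  # already present; stale links are left for manual cleanup
        target.parent.mkdir(parents=True, exist_ok=True)
        target.symlink_to(note.resolve())
        created.append(target)
    return created
```

The returned list contains only the links created on this run, which makes it easy to log how many new notes were queued for ingestion.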
- -### Step 1: The Master Orchestrator Script - -Create a shell script in your wiki root (`daily-automated-sync.sh`): - -```bash -#!/usr/bin/env bash -set -uo pipefail - -# Define variables -LAB_DIR="$HOME/projects/active/personal-wiki-lab" -LOG_FILE="$LAB_DIR/automation-cron.log" -DATE=$(date "+%Y-%m-%d %H:%M:%S") - -echo "=====================================================" >> "$LOG_FILE" -echo "[$DATE] Starting automated wiki synchronization..." >> "$LOG_FILE" - -cd "$LAB_DIR" || exit 1 - -# 1. Run your personal Vault-to-Raw symlink script here -# Example: ./sync-raw.sh >> "$LOG_FILE" 2>&1 - -# 2. Batch ingestion through LiteLLM, using the model of your choice -export LLM_MODEL="gemini/gemini-3-flash-preview" -export GEMINI_API_KEY="AIzaSy..." # or export OPENAI_API_KEY - -echo "[$DATE] Batch ingesting markdown files..." >> "$LOG_FILE" -# Match regular files and symlinks; IFS= and -r keep spaces and backslashes in names intact -find raw/ \( -type l -o -type f \) -name "*.md" | \ -while IFS= read -r file; do - python3 tools/ingest.py "$file" >> "$LOG_FILE" 2>&1 -done - -# 3. Heal the graph (generates pages for missing entities) -echo "[$DATE] Healing broken nodes..." >> "$LOG_FILE" -python3 tools/heal.py >> "$LOG_FILE" 2>&1 - -echo "[$(date "+%Y-%m-%d %H:%M:%S")] Automated sync completed." >> "$LOG_FILE" -echo "=====================================================" >> "$LOG_FILE" -``` - -Don't forget to make it executable: `chmod +x daily-automated-sync.sh`. - -### Step 2: System Scheduler (macOS launchd) - -On macOS, `launchd` is usually a better fit than `cron`: a `StartCalendarInterval` job missed while the machine was asleep runs on the next wake, whereas `cron` skips it. 
- -Create a `.plist` file at `~/Library/LaunchAgents/com.personal-wiki-sync.plist`: - -```xml -<?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> -<plist version="1.0"> -<dict> - <key>Label</key> - <string>com.personal-wiki-sync</string> - <key>ProgramArguments</key> - <array> - <string>/bin/bash</string> - <string>/Users/your-username/projects/active/personal-wiki-lab/daily-automated-sync.sh</string> - </array> - - <!-- Execute automatically at 2:00 AM daily --> - <key>StartCalendarInterval</key> - <dict> - <key>Hour</key> - <integer>2</integer> - <key>Minute</key> - <integer>0</integer> - </dict> - - <!-- Also run once whenever the job is loaded (e.g. at login) --> - <key>RunAtLoad</key> - <true/> - - <!-- Diagnostic Logs --> - <key>StandardOutPath</key> - <string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stdout.log</string> - <key>StandardErrorPath</key> - <string>/Users/your-username/projects/active/personal-wiki-lab/daemon.stderr.log</string> -</dict> -</plist> -``` - -Load the agent: -```bash -launchctl load ~/Library/LaunchAgents/com.personal-wiki-sync.plist -``` - -### Self-Healing & Health Monitoring -Since the automation runs unattended at night, check the logs for failures. The script redirects the output of `tools/ingest.py` and `tools/heal.py` into `automation-cron.log`, so API errors show up there; `daemon.stderr.log` only captures what the shell script itself writes to stderr. Keeping `tools/heal.py` in the script is recommended: it generates entity pages for names that are wikilinked from three or more pages but still have no page of their own. 
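Because the orchestrator script redirects the Python tools' output into `automation-cron.log`, scanning that file is one way to surface failures after a nightly run. The sketch below rests on assumptions: the marker strings are taken from messages the bundled tools print (`Error`, `[!] Failed`) plus Python's `Traceback`, and the helper names `find_failures` and `check_log` are hypothetical.

```python
from pathlib import Path

# Substrings the bundled tools (or the Python interpreter) emit on failure.
FAILURE_MARKERS = ("Error", "[!] Failed", "Traceback")


def find_failures(log_text: str) -> list[str]:
    """Return the log lines that contain any known failure marker."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


def check_log(log_path: Path) -> list[str]:
    """Read a log file, if present, and return its failure lines."""
    if not log_path.exists():
        return []
    return find_failures(log_path.read_text(encoding="utf-8", errors="replace"))
```

A plain substring match is deliberately crude; it may flag harmless lines that merely mention the word "Error", which seems an acceptable trade-off for a morning sanity check.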
diff --git a/graph/.gitkeep b/graph/.gitkeep deleted file mode 100644 index e69de29b..00000000 diff --git a/openclaw/wiki-ingest/status.md b/openclaw/wiki-ingest/status.md index 9c0345e8..0a5747ac 100644 --- a/openclaw/wiki-ingest/status.md +++ b/openclaw/wiki-ingest/status.md @@ -1,36 +1,37 @@ # Wiki Ingest Status ## Last Updated -2026-04-16 03:45 CST +2026-04-16 08:05 CST ## Batch Progress -- Total batches completed: 5 -- This batch: 4 docs ingested +- Total batches completed: 6 +- This batch (Batch 12): 3 docs ingested -## Docs Ingested This Session (Batch 5) -1. AI/Multi-Agent System Reliability.md ✅ -2. AI/Never write another prompt.md ✅ -3. AI/RAG从入门到精通系列1:基础RAG.md ✅ -4. AI/大模型相关术语和框架总结|LLM、MCP、Prompt、RAG、vLLM、Token、数据蒸馏.md ✅ +## Docs Ingested This Session (Batch 12) +1. n8n Telegram Trigger HTTPS 配置修复 ✅ +2. n8n Docker 安装与 SOCKS5 代理配置 ✅ +3. N8N AI Agent 2025 入门教程 ✅ ## Overall Progress - Total raw files: 182 -- Done: 19 (10.4%) -- Remaining: 163 +- Done: 22 (12.1%) +- Remaining: 160 ## Wiki Stats -- Sources: 95 -- Entities: 158 -- Concepts: 203 +- Sources: 98 (+3) +- Entities: 159 (+1: Telegram) +- Concepts: 205 (+2: Telegram Webhook, WEBHOOK_URL) ## Git -- Last commit: 04b7e99 (wiki-ingest batch Apr 16) +- Last commit: 04b7e99 (Batch 11) ## Next Batch Suggestions -From raw/AI/ (remaining ~20 files): +From raw/Agent/ (remaining ~7 files): +- n8n+Claude 通过自然语言自动化工作流.md +- 使用Claude自动生成N8N工作流的实操教程.md +- 万字保姆级教程-90天跑通一人公司模式-2026-03-29.md + +From raw/AI/: - AI/一语点醒梦中人.md - AI/系统提示词构建原则.md -- AI/codecrafters-iobuild-your-own-x...md -- AI/全网最全Nano Banana 2 使用指南.md - AI/如何写出完美的Prompt.md -- AI/我用 Gemini 3 一口气做了 10 个应用.md diff --git a/openclaw/xinghui/Hermes-Agent系统提示词解析-岚叔-2026-04-15.md b/openclaw/xinghui/Hermes-Agent系统提示词解析-岚叔-2026-04-15.md new file mode 100644 index 00000000..cdf16366 --- /dev/null +++ b/openclaw/xinghui/Hermes-Agent系统提示词解析-岚叔-2026-04-15.md @@ -0,0 +1,44 @@ +--- +title: "抽丝剥茧:深度解析 Hermes Agent 万字系统提示词" +source: 
"https://x.com/lufzzliz/status/2044258384556556743" +author: "岚叔 (@lufzzliz)" +date: "2026-04-15" +type: social-media-highlight +tags: + - Hermes + - AI-Agent + - System-Prompt + - 教程 +--- + +# Peeling Back the Layers: A Deep Dive into the Structure of Hermes Agent's Ten-Thousand-Character System Prompt + +**Source**: Twitter/X @lufzzliz +**Time**: 2026-04-15 03:35:54 +**Link**: https://twitter.com/lufzzliz/status/2044258384556556743 + +**Engagement**: ❤️ 188 | 🔁 34 | 💬 6 + +--- + +## Summary + +Surprise: Hermes agent may also carry a system prompt of around ten thousand characters. Follow along as 岚叔 takes it apart in full. + +Along the way, you'll pick up a trick for cutting token usage by 50%. + +As usual, this is a hands-on article; your support is much appreciated. + +--- + +## Key Points + +- **Topic**: In-depth analysis of the Hermes Agent system prompt +- **Highlight**: Complete breakdown of a ten-thousand-character system prompt +- **Technique**: A method for reducing tokens by 50% + +--- + +## Tweet Link + +> See the Twitter post for the original link diff --git a/requirements.txt b/requirements.txt deleted file mode 100644 index a9c7b7ba..00000000 --- a/requirements.txt +++ /dev/null @@ -1,2 +0,0 @@ -litellm>=1.0.0 -networkx>=3.2 diff --git a/tools/build_graph.py b/tools/build_graph.py deleted file mode 100644 index 73be7b49..00000000 --- a/tools/build_graph.py +++ /dev/null @@ -1,454 +0,0 @@ -#!/usr/bin/env python3 -""" -Build the knowledge graph from the wiki. 
- -Usage: - python tools/build_graph.py # full rebuild - python tools/build_graph.py --no-infer # skip semantic inference (faster) - python tools/build_graph.py --open # open graph.html in browser after build - -Outputs: - graph/graph.json — node/edge data (cached by SHA256) - graph/graph.html — interactive vis.js visualization - -Edge types: - EXTRACTED — explicit [[wikilink]] in a page - INFERRED — Claude-detected implicit relationship - AMBIGUOUS — low-confidence inferred relationship -""" - -import re -import json -import hashlib -import argparse -import webbrowser -from pathlib import Path -from datetime import date - -import os - -try: - import networkx as nx - from networkx.algorithms import community as nx_community - HAS_NETWORKX = True -except ImportError: - HAS_NETWORKX = False - print("Warning: networkx not installed. Community detection disabled. 
Run: pip install networkx") - -REPO_ROOT = Path(__file__).parent.parent -WIKI_DIR = REPO_ROOT / "wiki" -GRAPH_DIR = REPO_ROOT / "graph" -GRAPH_JSON = GRAPH_DIR / "graph.json" -GRAPH_HTML = GRAPH_DIR / "graph.html" -CACHE_FILE = GRAPH_DIR / ".cache.json" -LOG_FILE = WIKI_DIR / "log.md" -SCHEMA_FILE = REPO_ROOT / "CLAUDE.md" - -# Node type → color mapping -TYPE_COLORS = { - "source": "#4CAF50", - "entity": "#2196F3", - "concept": "#FF9800", - "synthesis": "#9C27B0", - "unknown": "#9E9E9E", -} - -EDGE_COLORS = { - "EXTRACTED": "#555555", - "INFERRED": "#FF5722", - "AMBIGUOUS": "#BDBDBD", -} - - -def read_file(path: Path) -> str: - return path.read_text(encoding="utf-8") if path.exists() else "" - - -def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str: - try: - from litellm import completion - except ImportError: - print("Error: litellm not installed. Run: pip install litellm") - import sys - sys.exit(1) - - model = os.getenv(model_env, default_model) - response = completion( - model=model, - messages=[{"role": "user", "content": prompt}], - max_tokens=max_tokens - ) - return response.choices[0].message.content - - -def sha256(text: str) -> str: - return hashlib.sha256(text.encode()).hexdigest() - - -def all_wiki_pages() -> list[Path]: - return [p for p in WIKI_DIR.rglob("*.md") - if p.name not in ("index.md", "log.md", "lint-report.md")] - - -def extract_wikilinks(content: str) -> list[str]: - return list(set(re.findall(r'\[\[([^\]]+)\]\]', content))) - - -def extract_frontmatter_type(content: str) -> str: - match = re.search(r'^type:\s*(\S+)', content, re.MULTILINE) - return match.group(1).strip('"\'') if match else "unknown" - - -def page_id(path: Path) -> str: - return path.relative_to(WIKI_DIR).as_posix().replace(".md", "") - - -def load_cache() -> dict: - if CACHE_FILE.exists(): - try: - return json.loads(CACHE_FILE.read_text()) - except (json.JSONDecodeError, IOError): - return {} - return {} - - -def save_cache(cache: 
dict): - GRAPH_DIR.mkdir(parents=True, exist_ok=True) - CACHE_FILE.write_text(json.dumps(cache, indent=2)) - - -def build_nodes(pages: list[Path]) -> list[dict]: - nodes = [] - for p in pages: - content = read_file(p) - node_type = extract_frontmatter_type(content) - title_match = re.search(r'^title:\s*"?([^"\n]+)"?', content, re.MULTILINE) - label = title_match.group(1).strip() if title_match else p.stem - nodes.append({ - "id": page_id(p), - "label": label, - "type": node_type, - "color": TYPE_COLORS.get(node_type, TYPE_COLORS["unknown"]), - "path": str(p.relative_to(REPO_ROOT)), - }) - return nodes - - -def build_extracted_edges(pages: list[Path]) -> list[dict]: - """Pass 1: deterministic wikilink edges.""" - # Build a map from stem (lower) -> page_id for resolution - stem_map = {p.stem.lower(): page_id(p) for p in pages} - edges = [] - seen = set() - for p in pages: - content = read_file(p) - src = page_id(p) - for link in extract_wikilinks(content): - target = stem_map.get(link.lower()) - if target and target != src: - key = (src, target) - if key not in seen: - seen.add(key) - edges.append({ - "from": src, - "to": target, - "type": "EXTRACTED", - "color": EDGE_COLORS["EXTRACTED"], - "confidence": 1.0, - }) - return edges - - -def build_inferred_edges(pages: list[Path], existing_edges: list[dict], cache: dict) -> list[dict]: - """Pass 2: API-inferred semantic relationships.""" - new_edges = [] - - # Only process pages that changed since last run - changed_pages = [] - for p in pages: - content = read_file(p) - h = sha256(content) - entry = cache.get(str(p)) - - if not isinstance(entry, dict) or entry.get("hash") != h: - changed_pages.append(p) - else: - # Page unchanged: load its inferred edges from cache perfectly - src = page_id(p) - for rel in entry.get("edges", []): - new_edges.append({ - "from": src, - "to": rel["to"], - "type": rel.get("type", "INFERRED"), - "title": rel.get("relationship", ""), - "label": "", - "color": EDGE_COLORS.get(rel.get("type", 
"INFERRED"), EDGE_COLORS["INFERRED"]), - "confidence": float(rel.get("confidence", 0.7)), - }) - - if not changed_pages: - print(" no changed pages — skipping semantic inference") - # Nothing to re-infer: return the edges restored from the cache above - return new_edges - - print(f" inferring relationships for {len(changed_pages)} changed pages...") - - # Build a summary of existing nodes for context - node_list = "\n".join(f"- {page_id(p)} ({extract_frontmatter_type(read_file(p))})" for p in pages) - # Only the first 30 extracted edges (wiki-wide) are included to keep the prompt short - existing_edge_summary = "\n".join( - f"- {e['from']} → {e['to']} (EXTRACTED)" for e in existing_edges[:30] - ) - - for p in changed_pages: - content = read_file(p)[:2000] # truncate for context efficiency - src = page_id(p) - - prompt = f"""Analyze this wiki page and identify implicit semantic relationships to other pages in the wiki. - -Source page: {src} -Content: -{content} - -All available pages: -{node_list} - -Already-extracted edges (sample of up to 30 across the wiki): -{existing_edge_summary} - -Return ONLY a JSON array of NEW relationships not already captured by explicit wikilinks: -[ - {{"to": "page-id", "relationship": "one-line description", "confidence": 0.0-1.0, "type": "INFERRED or AMBIGUOUS"}} -] - -Rules: -- Only include pages from the available list above -- Confidence >= 0.7 → INFERRED, < 0.7 → AMBIGUOUS -- Do not repeat edges already in the extracted list -- Return empty array [] if no new relationships found -""" - raw = call_llm(prompt, "LLM_MODEL_FAST", "claude-3-5-haiku-latest", max_tokens=1024) - raw = raw.strip() - raw = re.sub(r"^```(?:json)?\s*", "", raw) - raw = re.sub(r"\s*```$", "", raw) - - try: - inferred = json.loads(raw) - valid_rels = [] - for rel in inferred: - if isinstance(rel, dict) and "to" in rel: - new_edges.append({ - "from": src, - "to": rel["to"], - "type": rel.get("type", "INFERRED"), - "title": rel.get("relationship", ""), - "label": 
"", - "color": EDGE_COLORS.get(rel.get("type", "INFERRED"), EDGE_COLORS["INFERRED"]), - "confidence": float(rel.get("confidence", 0.7)), - }) - valid_rels.append(rel) - - # Save properly 
to cache - cache[str(p)] = { - "hash": sha256(content), - "edges": valid_rels - } - except (json.JSONDecodeError, TypeError, ValueError): - pass - - return new_edges - - -def detect_communities(nodes: list[dict], edges: list[dict]) -> dict[str, int]: - """Assign community IDs to nodes using Louvain algorithm.""" - if not HAS_NETWORKX: - return {} - - G = nx.Graph() - for n in nodes: - G.add_node(n["id"]) - for e in edges: - G.add_edge(e["from"], e["to"]) - - if G.number_of_edges() == 0: - return {} - - try: - communities = nx_community.louvain_communities(G, seed=42) - node_to_community = {} - for i, comm in enumerate(communities): - for node in comm: - node_to_community[node] = i - return node_to_community - except Exception: - return {} - - -COMMUNITY_COLORS = [ - "#E91E63", "#00BCD4", "#8BC34A", "#FF5722", "#673AB7", - "#FFC107", "#009688", "#F44336", "#3F51B5", "#CDDC39", -] - - -def render_html(nodes: list[dict], edges: list[dict]) -> str: - """Generate self-contained vis.js HTML.""" - nodes_json = json.dumps(nodes, indent=2) - edges_json = json.dumps(edges, indent=2) - - legend_items = "".join( - f'<span style="background:{color};padding:3px 8px;margin:2px;border-radius:3px;font-size:12px">{t}</span>' - for t, color in TYPE_COLORS.items() if t != "unknown" - ) - - return f"""<!DOCTYPE html> -<html lang="en"> -<head> -<meta charset="UTF-8"> -<title>LLM Wiki — Knowledge Graph - - - - -
-

LLM Wiki Graph

- -
{legend_items}
-
- ── Explicit link
- ── Inferred -
-
-
-
-
-
- -
-
- - -""" - - -def append_log(entry: str): - log_path = WIKI_DIR / "log.md" - existing = read_file(log_path) - log_path.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8") - - -def build_graph(infer: bool = True, open_browser: bool = False): - pages = all_wiki_pages() - today = date.today().isoformat() - - if not pages: - print("Wiki is empty. Ingest some sources first.") - return - - print(f"Building graph from {len(pages)} wiki pages...") - GRAPH_DIR.mkdir(parents=True, exist_ok=True) - - cache = load_cache() - - # Pass 1: extracted edges - print(" Pass 1: extracting wikilinks...") - nodes = build_nodes(pages) - edges = build_extracted_edges(pages) - print(f" → {len(edges)} extracted edges") - - # Pass 2: inferred edges - if infer: - print(" Pass 2: inferring semantic relationships...") - inferred = build_inferred_edges(pages, edges, cache) - edges.extend(inferred) - print(f" → {len(inferred)} inferred edges") - save_cache(cache) - - # Community detection - print(" Running Louvain community detection...") - communities = detect_communities(nodes, edges) - for node in nodes: - comm_id = communities.get(node["id"], -1) - if comm_id >= 0: - node["color"] = COMMUNITY_COLORS[comm_id % len(COMMUNITY_COLORS)] - node["group"] = comm_id - - # Save graph.json - graph_data = {"nodes": nodes, "edges": edges, "built": today} - GRAPH_JSON.write_text(json.dumps(graph_data, indent=2)) - print(f" saved: graph/graph.json ({len(nodes)} nodes, {len(edges)} edges)") - - # Save graph.html - html = render_html(nodes, edges) - GRAPH_HTML.write_text(html) - print(f" saved: graph/graph.html") - - append_log(f"## [{today}] graph | Knowledge graph rebuilt\n\n{len(nodes)} nodes, {len(edges)} edges ({len([e for e in edges if e['type']=='EXTRACTED'])} extracted, {len([e for e in edges if e['type']=='INFERRED'])} inferred).") - - if open_browser: - webbrowser.open(f"file://{GRAPH_HTML.resolve()}") - - -if __name__ == "__main__": - parser = argparse.ArgumentParser(description="Build 
LLM Wiki knowledge graph") - parser.add_argument("--no-infer", action="store_true", help="Skip semantic inference (faster)") - parser.add_argument("--open", action="store_true", help="Open graph.html in browser") - args = parser.parse_args() - build_graph(infer=not args.no_infer, open_browser=args.open) diff --git a/tools/heal.py b/tools/heal.py deleted file mode 100755 index cf85a684..00000000 --- a/tools/heal.py +++ /dev/null @@ -1,100 +0,0 @@ -#!/usr/bin/env python3 -""" -Graph Self-Healing Tool - -Automatically retrieves "Missing Entity Pages" from the wiki and generates -comprehensive definition pages for them using the LLM. -It resolves broken entity links by scanning existing contexts where the entity is referenced. - -Usage: - python tools/heal.py -""" - -import os -import sys -from pathlib import Path - -try: - from litellm import completion -except ImportError: - print("Error: litellm not installed. Run: pip install litellm") - sys.exit(1) - -# Ensure tools can be imported -sys.path.insert(0, str(Path(__file__).parent.parent)) - -from tools.lint import find_missing_entities, all_wiki_pages - -REPO_ROOT = Path(__file__).parent.parent -WIKI_DIR = REPO_ROOT / "wiki" -ENTITIES_DIR = WIKI_DIR / "entities" - -def call_llm(prompt: str, max_tokens: int = 1500) -> str: - # Use litellm standard environment variables - # e.g., GEMINI_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY - model = os.getenv("LLM_MODEL", "claude-3-5-haiku-latest") # default to fast model - - response = completion( - model=model, - messages=[{"role": "user", "content": prompt}], - max_tokens=max_tokens - ) - return response.choices[0].message.content - -def search_sources(entity: str, pages: list[Path]) -> list[Path]: - """Find up to 15 pages where this entity is mentioned natively.""" - sources = [] - for p in pages: - if "entities" not in str(p.parent) and "concepts" not in str(p.parent): - content = p.read_text(encoding="utf-8") - if entity.lower() in content.lower(): - sources.append(p) - 
return sources[:15] - -def heal_missing_entities(): - pages = all_wiki_pages() - missing_entities = find_missing_entities(pages) - - if not missing_entities: - print("Graph is fully connected. No missing entities found!") - return - - ENTITIES_DIR.mkdir(exist_ok=True, parents=True) - print(f"Found {len(missing_entities)} missing entity nodes. Commencing auto-heal...") - - for entity in missing_entities: - print(f"Healing entity page for: {entity}") - sources = search_sources(entity, pages) - - context = "" - for s in sources: - context += f"\n\n### {s.name}\n{s.read_text(encoding='utf-8')[:800]}" - - prompt = f"""You are filling a data gap in the Personal LLM Wiki. -Create an Entity definition page for "{entity}". - -Here is how the entity appears in the current sources: -{context} - -Format: ---- -title: "{entity}" -type: entity -tags: [] -sources: {[s.name for s in sources]} ---- - -# {entity} - -Write a comprehensive paragraph defining what `{entity}` means in the context of this wiki, its main significance, and any actions or associations related to it. -""" - try: - result = call_llm(prompt) - out_path = ENTITIES_DIR / f"{entity}.md" - out_path.write_text(result, encoding="utf-8") - print(f" -> Saved to {out_path.relative_to(REPO_ROOT)}") - except Exception as e: - print(f" [!] Failed to generate {entity}: {e}") - -if __name__ == "__main__": - heal_missing_entities() diff --git a/tools/ingest.py b/tools/ingest.py deleted file mode 100644 index 7c0bb988..00000000 --- a/tools/ingest.py +++ /dev/null @@ -1,239 +0,0 @@ -#!/usr/bin/env python3 -""" -Ingest a source document into the LLM Wiki. 
- -Usage: - python tools/ingest.py - python tools/ingest.py raw/articles/my-article.md - -The LLM reads the source, extracts knowledge, and updates the wiki: - - Creates wiki/sources/.md - - Updates wiki/index.md - - Updates wiki/overview.md (if warranted) - - Creates/updates entity and concept pages - - Appends to wiki/log.md - - Flags contradictions -""" - -import os -import sys -import json -import hashlib -import re -from pathlib import Path -from datetime import date - -REPO_ROOT = Path(__file__).parent.parent -WIKI_DIR = REPO_ROOT / "wiki" -LOG_FILE = WIKI_DIR / "log.md" -INDEX_FILE = WIKI_DIR / "index.md" -OVERVIEW_FILE = WIKI_DIR / "overview.md" -SCHEMA_FILE = REPO_ROOT / "CLAUDE.md" - - -def sha256(text: str) -> str: - return hashlib.sha256(text.encode()).hexdigest()[:16] - - -def read_file(path: Path) -> str: - return path.read_text(encoding="utf-8") if path.exists() else "" - - -def call_llm(prompt: str, max_tokens: int = 8192) -> str: - try: - from litellm import completion - except ImportError: - print("Error: litellm not installed. 
Run: pip install litellm") - sys.exit(1) - - model = os.getenv("LLM_MODEL", "claude-3-5-sonnet-latest") - response = completion( - model=model, - messages=[{"role": "user", "content": prompt}], - max_tokens=max_tokens - ) - return response.choices[0].message.content - - -def write_file(path: Path, content: str): - path.parent.mkdir(parents=True, exist_ok=True) - path.write_text(content, encoding="utf-8") - print(f" wrote: {path.relative_to(REPO_ROOT)}") - - -def build_wiki_context() -> str: - parts = [] - if INDEX_FILE.exists(): - parts.append(f"## wiki/index.md\n{read_file(INDEX_FILE)}") - if OVERVIEW_FILE.exists(): - parts.append(f"## wiki/overview.md\n{read_file(OVERVIEW_FILE)}") - # Include a few recent source pages for contradiction checking - sources_dir = WIKI_DIR / "sources" - if sources_dir.exists(): - recent = sorted(sources_dir.glob("*.md"), key=lambda p: p.stat().st_mtime, reverse=True)[:5] - for p in recent: - parts.append(f"## {p.relative_to(REPO_ROOT)}\n{p.read_text()}") - return "\n\n---\n\n".join(parts) - - -def parse_json_from_response(text: str) -> dict: - # Strip markdown code fences if present - text = re.sub(r"^```(?:json)?\s*", "", text.strip()) - text = re.sub(r"\s*```$", "", text.strip()) - # Find the outermost JSON object - match = re.search(r"\{[\s\S]*\}", text) - if not match: - raise ValueError("No JSON object found in response") - return json.loads(match.group()) - - -def update_index(new_entry: str, section: str = "Sources"): - content = read_file(INDEX_FILE) - if not content: - content = "# Wiki Index\n\n## Overview\n- [Overview](overview.md) — living synthesis\n\n## Sources\n\n## Entities\n\n## Concepts\n\n## Syntheses\n" - section_header = f"## {section}" - if section_header in content: - content = content.replace(section_header + "\n", section_header + "\n" + new_entry + "\n") - else: - content += f"\n{section_header}\n{new_entry}\n" - write_file(INDEX_FILE, content) - - -def append_log(entry: str): - existing = 
read_file(LOG_FILE) - write_file(LOG_FILE, entry.strip() + "\n\n" + existing) - - -def ingest(source_path: str): - source = Path(source_path) - if not source.exists(): - print(f"Error: file not found: {source_path}") - sys.exit(1) - - source_content = source.read_text(encoding="utf-8") - source_hash = sha256(source_content) - today = date.today().isoformat() - - print(f"\nIngesting: {source.name} (hash: {source_hash})") - - wiki_context = build_wiki_context() - schema = read_file(SCHEMA_FILE) - - schema = read_file(SCHEMA_FILE) - - prompt = f"""You are maintaining an LLM Wiki. Process this source document and integrate its knowledge into the wiki. - -Schema and conventions: -{schema} - -Current wiki state (index + recent pages): -{wiki_context if wiki_context else "(wiki is empty — this is the first source)"} - -New source to ingest (file: {source.relative_to(REPO_ROOT) if source.is_relative_to(REPO_ROOT) else source.name}): -=== SOURCE START === -{source_content} -=== SOURCE END === - -Today's date: {today} - -Return ONLY a valid JSON object with these fields (no markdown fences, no prose outside the JSON): -{{ - "title": "Human-readable title for this source", - "slug": "kebab-case-slug-for-filename", - "source_page": "full markdown content for wiki/sources/.md — use the source page format from the schema", - "index_entry": "- [Title](sources/slug.md) — one-line summary", - "overview_update": "full updated content for wiki/overview.md, or null if no update needed", - "entity_pages": [ - {{"path": "entities/EntityName.md", "content": "full markdown content"}} - ], - "concept_pages": [ - {{"path": "concepts/ConceptName.md", "content": "full markdown content"}} - ], - "contradictions": ["describe any contradiction with existing wiki content, or empty list"], - "log_entry": "## [{today}] ingest | \\n\\nAdded source. Key claims: ..." 
-}} -""" - - print(f" calling API (model: ...)") - raw = call_llm(prompt, max_tokens=8192) - try: - data = parse_json_from_response(raw) - except (ValueError, json.JSONDecodeError) as e: - print(f"Error parsing API response: {e}") - # Write the debug file first so the message below is only printed once it exists - Path("/tmp/ingest_debug.txt").write_text(raw, encoding="utf-8") - print("Raw response saved to /tmp/ingest_debug.txt") - sys.exit(1) - - # Write source page - slug = data["slug"] - write_file(WIKI_DIR / "sources" / f"{slug}.md", data["source_page"]) - - # Write entity pages - for page in data.get("entity_pages", []): - write_file(WIKI_DIR / page["path"], page["content"]) - - # Write concept pages - for page in data.get("concept_pages", []): - write_file(WIKI_DIR / page["path"], page["content"]) - - # Update overview - if data.get("overview_update"): - write_file(OVERVIEW_FILE, data["overview_update"]) - - # Update index - update_index(data["index_entry"], section="Sources") - - # Append log - append_log(data["log_entry"]) - - # Report contradictions - contradictions = data.get("contradictions", []) - if contradictions: - print("\n ⚠️ Contradictions detected:") - for c in contradictions: - print(f" - {c}") - - print(f"\nDone. Ingested: {data['title']}") - - -if __name__ == "__main__": - if len(sys.argv) < 2: - print("Usage: python tools/ingest.py <path-to-source> [path2 ...] 
[dir1 ...]") - sys.exit(1) - - paths_to_process = [] - for arg in sys.argv[1:]: - p = Path(arg) - if p.is_file() and p.suffix == ".md": - paths_to_process.append(p) - elif p.is_dir(): - for f in p.rglob("*.md"): - if f.is_file(): - paths_to_process.append(f) - else: - import glob - for f in glob.glob(arg, recursive=True): - g_p = Path(f) - if g_p.is_file() and g_p.suffix == ".md": - paths_to_process.append(g_p) - - # Deduplicate while preserving order - unique_paths = [] - seen = set() - for p in paths_to_process: - abs_p = p.resolve() - if abs_p not in seen: - seen.add(abs_p) - unique_paths.append(p) - - if not unique_paths: - print("Error: no markdown files found to ingest.") - sys.exit(1) - - if len(unique_paths) > 1: - print(f"Batch mode: found {len(unique_paths)} files to ingest.") - - for p in unique_paths: - ingest(str(p)) diff --git a/tools/lint.py b/tools/lint.py deleted file mode 100644 index c7997ee5..00000000 --- a/tools/lint.py +++ /dev/null @@ -1,210 +0,0 @@ -#!/usr/bin/env python3 -""" -Lint the LLM Wiki for health issues. 
- -Usage: - python tools/lint.py - python tools/lint.py --save # save lint report to wiki/lint-report.md - -Checks: - - Orphan pages (no inbound wikilinks from other pages) - - Broken wikilinks (pointing to pages that don't exist) - - Missing entity pages (entities mentioned in 3+ pages but no page) - - Contradictions between pages - - Data gaps and suggested new sources -""" - -import re -import sys -import argparse -from pathlib import Path -from collections import defaultdict -from datetime import date - -import os - -REPO_ROOT = Path(__file__).parent.parent -WIKI_DIR = REPO_ROOT / "wiki" -LOG_FILE = WIKI_DIR / "log.md" -SCHEMA_FILE = REPO_ROOT / "CLAUDE.md" - - -def read_file(path: Path) -> str: - return path.read_text(encoding="utf-8") if path.exists() else "" - - -def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str: - try: - from litellm import completion - except ImportError: - print("Error: litellm not installed. Run: pip install litellm") - sys.exit(1) - - model = os.getenv(model_env, default_model) - response = completion( - model=model, - messages=[{"role": "user", "content": prompt}], - max_tokens=max_tokens - ) - return response.choices[0].message.content - - -def all_wiki_pages() -> list[Path]: - return [p for p in WIKI_DIR.rglob("*.md") - if p.name not in ("index.md", "log.md", "lint-report.md")] - - -def extract_wikilinks(content: str) -> list[str]: - return re.findall(r'\[\[([^\]]+)\]\]', content) - - -def page_name_to_path(name: str) -> list[Path]: - """Try to resolve a [[WikiLink]] to a file path.""" - candidates = [] - for p in all_wiki_pages(): - if p.stem.lower() == name.lower() or p.stem == name: - candidates.append(p) - return candidates - - -def find_orphans(pages: list[Path]) -> list[Path]: - inbound = defaultdict(int) - for p in pages: - content = read_file(p) - for link in extract_wikilinks(content): - resolved = page_name_to_path(link) - for r in resolved: - inbound[r] += 1 - return [p for p in 
pages if inbound[p] == 0 and p != WIKI_DIR / "overview.md"] - - -def find_broken_links(pages: list[Path]) -> list[tuple[Path, str]]: - broken = [] - for p in pages: - content = read_file(p) - for link in extract_wikilinks(content): - if not page_name_to_path(link): - broken.append((p, link)) - return broken - - -def find_missing_entities(pages: list[Path]) -> list[str]: - """Find entity-like names mentioned in 3+ pages but lacking their own page.""" - mention_counts: dict[str, int] = defaultdict(int) - existing_pages = {p.stem.lower() for p in pages} - for p in pages: - content = read_file(p) - links = extract_wikilinks(content) - for link in links: - if link.lower() not in existing_pages: - mention_counts[link] += 1 - return [name for name, count in mention_counts.items() if count >= 3] - - -def run_lint(): - pages = all_wiki_pages() - today = date.today().isoformat() - - if not pages: - print("Wiki is empty. Nothing to lint.") - return "" - - print(f"Linting {len(pages)} wiki pages...") - - # Deterministic checks - orphans = find_orphans(pages) - broken = find_broken_links(pages) - missing_entities = find_missing_entities(pages) - - print(f" orphans: {len(orphans)}") - print(f" broken links: {len(broken)}") - print(f" missing entity pages: {len(missing_entities)}") - - # Build context for semantic checks (contradictions, gaps) - # Use a sample of pages to stay within context limits - sample = pages[:20] - pages_context = "" - for p in sample: - rel = p.relative_to(REPO_ROOT) - pages_context += f"\n\n### {rel}\n{read_file(p)[:1500]}" # truncate long pages - - print(" running semantic lint via API...") - prompt = f"""You are linting an LLM Wiki. Review the pages below and identify: -1. Contradictions between pages (claims that conflict) -2. Stale content (summaries that newer sources have superseded) -3. Data gaps (important questions the wiki can't answer — suggest specific sources to find) -4. 
Concepts mentioned but lacking depth - -Wiki pages (sample of {len(sample)} pages): -{pages_context} - -Return a markdown lint report with these sections: -## Contradictions -## Stale Content -## Data Gaps & Suggested Sources -## Concepts Needing More Depth - -Be specific — name the exact pages and claims involved. -""" - semantic_report = call_llm(prompt, "LLM_MODEL", "claude-3-5-sonnet-latest", max_tokens=3000) - - # Compose full report - report_lines = [ - f"# Wiki Lint Report — {today}", - "", - f"Scanned {len(pages)} pages.", - "", - "## Structural Issues", - "", - ] - - if orphans: - report_lines.append("### Orphan Pages (no inbound links)") - for p in orphans: - report_lines.append(f"- `{p.relative_to(REPO_ROOT)}`") - report_lines.append("") - - if broken: - report_lines.append("### Broken Wikilinks") - for page, link in broken: - report_lines.append(f"- `{page.relative_to(REPO_ROOT)}` links to `[[{link}]]` — not found") - report_lines.append("") - - if missing_entities: - report_lines.append("### Missing Entity Pages (mentioned 3+ times but no page)") - for name in missing_entities: - report_lines.append(f"- `[[{name}]]`") - report_lines.append("") - - if not orphans and not broken and not missing_entities: - report_lines.append("No structural issues found.") - report_lines.append("") - - report_lines.append("---") - report_lines.append("") - report_lines.append(semantic_report) - - report = "\n".join(report_lines) - print("\n" + report) - return report - - -def append_log(entry: str): - existing = read_file(LOG_FILE) - LOG_FILE.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8") - - -if __name__ == "__main__": - parser = argparse.ArgumentParser(description="Lint the LLM Wiki") - parser.add_argument("--save", action="store_true", help="Save lint report to wiki/lint-report.md") - args = parser.parse_args() - - report = run_lint() - - if args.save and report: - report_path = WIKI_DIR / "lint-report.md" - report_path.write_text(report, 
encoding="utf-8") - print(f"\nSaved: {report_path.relative_to(REPO_ROOT)}") - - today = date.today().isoformat() - append_log(f"## [{today}] lint | Wiki health check\n\nRan lint. See lint-report.md for details.") diff --git a/tools/query.py b/tools/query.py deleted file mode 100644 index 7b5c2bb0..00000000 --- a/tools/query.py +++ /dev/null @@ -1,192 +0,0 @@ -#!/usr/bin/env python3 -""" -Query the LLM Wiki. - -Usage: - python tools/query.py "What are the main themes across all sources?" - python tools/query.py "How does ConceptA relate to ConceptB?" --save - python tools/query.py "Summarize everything about EntityName" --save synthesis/my-analysis.md - -Flags: - --save Save the answer back into the wiki (prompts for filename) - --save <path> Save to a specific wiki path -""" - -import sys -import re -import json -import argparse -from pathlib import Path -from datetime import date - -import os - -REPO_ROOT = Path(__file__).parent.parent -WIKI_DIR = REPO_ROOT / "wiki" -INDEX_FILE = WIKI_DIR / "index.md" -LOG_FILE = WIKI_DIR / "log.md" -SCHEMA_FILE = REPO_ROOT / "CLAUDE.md" - - -def read_file(path: Path) -> str: - return path.read_text(encoding="utf-8") if path.exists() else "" - - -def write_file(path: Path, content: str): - path.parent.mkdir(parents=True, exist_ok=True) - path.write_text(content, encoding="utf-8") - print(f" saved: {path.relative_to(REPO_ROOT)}") - - -def call_llm(prompt: str, model_env: str, default_model: str, max_tokens: int = 4096) -> str: - try: - from litellm import completion - except ImportError: - print("Error: litellm not installed. 
Run: pip install litellm") - sys.exit(1) - - model = os.getenv(model_env, default_model) - response = completion( - model=model, - messages=[{"role": "user", "content": prompt}], - max_tokens=max_tokens - ) - return response.choices[0].message.content - - -def find_relevant_pages(question: str, index_content: str) -> list[Path]: - """Extract linked pages from index that seem relevant to the question.""" - # Pull all [[links]] and markdown links from index - md_links = re.findall(r'\[([^\]]+)\]\(([^)]+)\)', index_content) - question_lower = question.lower() - relevant = [] - - for title, href in md_links: - title_lower = title.lower() - match = False - - # 1. English/Space-separated: check words > 3 chars - if any(word in question_lower for word in title_lower.split() if len(word) > 3): - match = True - # 2. Exact substring match for the whole title (useful for short CJK titles, e.g. len=2) - elif len(title_lower) >= 2 and title_lower in question_lower: - match = True - # 3. CJK chunks: find contiguous non-ASCII characters (len >= 2) and check if in question - elif any(chunk in question_lower for chunk in re.findall(r'[^\x00-\x7F]{2,}', title_lower)): - match = True - - if match: - p = WIKI_DIR / href - if p.exists() and p not in relevant: - relevant.append(p) - - # Always include overview - overview = WIKI_DIR / "overview.md" - if overview.exists() and overview not in relevant: - relevant.insert(0, overview) - return relevant[:12] # cap to avoid context overflow - - -def append_log(entry: str): - existing = read_file(LOG_FILE) - LOG_FILE.write_text(entry.strip() + "\n\n" + existing, encoding="utf-8") - - -def query(question: str, save_path: str | None = None): - today = date.today().isoformat() - - # Step 1: Read index - index_content = read_file(INDEX_FILE) - if not index_content: - print("Wiki is empty. 
Ingest some sources first with: python tools/ingest.py <source>") - sys.exit(1) - - # Step 2: Find relevant pages - relevant_pages = find_relevant_pages(question, index_content) - - # If no keyword match, ask Claude to identify relevant pages from the index - if not relevant_pages or len(relevant_pages) <= 1: - print(" selecting relevant pages via API...") - prompt = f"Given this wiki index:\n\n{index_content}\n\nWhich pages are most relevant to answering: \"{question}\"\n\nReturn ONLY a JSON array of relative file paths (as listed in the index), e.g. [\"sources/foo.md\", \"concepts/Bar.md\"]. Maximum 10 pages." - raw = call_llm(prompt, "LLM_MODEL_FAST", "claude-3-5-haiku-latest", max_tokens=512) - raw = raw.strip() - raw = re.sub(r"^```(?:json)?\s*", "", raw) - raw = re.sub(r"\s*```$", "", raw) - try: - paths = json.loads(raw) - relevant_pages = [WIKI_DIR / p for p in paths if (WIKI_DIR / p).exists()] - except (json.JSONDecodeError, TypeError): - pass - - # Step 3: Read relevant pages - pages_context = "" - for p in relevant_pages: - rel = p.relative_to(REPO_ROOT) - pages_context += f"\n\n### {rel}\n{p.read_text(encoding='utf-8')}" - - if not pages_context: - pages_context = f"\n\n### wiki/index.md\n{index_content}" - - schema = read_file(SCHEMA_FILE) - - # Step 4: Synthesize answer - print(f" synthesizing answer from {len(relevant_pages)} pages...") - prompt = f"""You are querying an LLM Wiki to answer a question. Use the wiki pages below to synthesize a thorough answer. Cite sources using [[PageName]] wikilink syntax. - -Schema: -{schema} - -Wiki pages: -{pages_context} - -Question: {question} - -Write a well-structured markdown answer with headers, bullets, and [[wikilink]] citations. At the end, add a ## Sources section listing the pages you drew from. 
-""" - answer = call_llm(prompt, "LLM_MODEL", "claude-3-5-sonnet-latest", max_tokens=4096) - print("\n" + "=" * 60) - print(answer) - print("=" * 60) - - # Step 5: Optionally save answer - if save_path is not None: - if save_path == "": - # Prompt for filename - slug = input("\nSave as (slug, e.g. 'my-analysis'): ").strip() - if not slug: - print("Skipping save.") - return - save_path = f"syntheses/{slug}.md" - - full_save_path = WIKI_DIR / save_path - frontmatter = f"""--- -title: "{question[:80]}" -type: synthesis -tags: [] -sources: [] -last_updated: {today} ---- - -""" - write_file(full_save_path, frontmatter + answer) - - # Update index - index_content = read_file(INDEX_FILE) - entry = f"- [{question[:60]}]({save_path}) — synthesis" - if "## Syntheses" in index_content: - index_content = index_content.replace("## Syntheses\n", f"## Syntheses\n{entry}\n") - INDEX_FILE.write_text(index_content, encoding="utf-8") - print(f" indexed: {save_path}") - - # Append to log - append_log(f"## [{today}] query | {question[:80]}\n\nSynthesized answer from {len(relevant_pages)} pages." + - (f" Saved to {save_path}." if save_path else "")) - - -if __name__ == "__main__": - parser = argparse.ArgumentParser(description="Query the LLM Wiki") - parser.add_argument("question", help="Question to ask the wiki") - parser.add_argument("--save", nargs="?", const="", default=None, - help="Save answer to wiki (optionally specify path)") - args = parser.parse_args() - query(args.question, args.save) diff --git a/wiki/concepts/Telegram-Webhook.md b/wiki/concepts/Telegram-Webhook.md new file mode 100644 index 00000000..5fc23852 --- /dev/null +++ b/wiki/concepts/Telegram-Webhook.md @@ -0,0 +1,37 @@ +--- +title: "Telegram Webhook" +type: concept +tags: [telegram, webhook, bot, integration] +--- + +## Definition +A Telegram webhook is a server-side callback mechanism: after a user sends a message, Telegram's servers push an HTTP POST request to the public HTTPS URL configured for the bot. + +## How It Works +1. Create a bot with Telegram's BotFather and obtain a Bot Token +2.
Set the webhook URL through the Telegram API: `https://api.telegram.org/bot<TOKEN>/setWebhook?url=https://your-domain.com/webhook` +3. User sends a message → Telegram → POST to the configured URL +4. The server handles the request and may return a reply message + +## Core Constraints +- **HTTPS is mandatory**: Telegram enforces this; plain HTTP and self-signed certificates are not supported +- **Publicly reachable**: Telegram's servers must be able to reach the URL +- **Response time limit**: Telegram expects a response within 5 seconds, otherwise the delivery is treated as failed + +## n8n Integration +- The [[n8n]] Telegram Trigger node handles webhook registration automatically +- Common error: `Bad Request: bad webhook: An HTTPS URL must be provided for webhook` +- Fix: set the [[WEBHOOK_URL]] environment variable to a public HTTPS address +- See [[n8n-Telegram-Trigger-HTTPS配置修复]] + +## Webhook vs. Polling +| Aspect | Webhook | Polling | +|------|---------|---------| +| Latency | Pushed immediately | Determined by the polling interval | +| Server load | Low | High (continuous requests) | +| Needs a public endpoint | Yes | No | +| Deployment complexity | High (requires HTTPS) | Low | + +## Related +- [[Telegram]]: instant messaging platform +- [[WEBHOOK_URL]]: n8n environment variable diff --git a/wiki/concepts/WEBHOOK_URL.md b/wiki/concepts/WEBHOOK_URL.md new file mode 100644 index 00000000..b399e226 --- /dev/null +++ b/wiki/concepts/WEBHOOK_URL.md @@ -0,0 +1,29 @@ +--- +title: "WEBHOOK_URL" +type: concept +tags: [n8n, environment-variable, webhook, self-hosted] +--- + +## Definition +`WEBHOOK_URL` is an [[n8n]] environment variable that specifies the publicly reachable HTTPS address of the n8n instance. + +## Purpose +- Tells n8n to generate webhook URLs from the specified HTTPS URL +- Platforms such as Telegram / Discord / Slack require webhooks to use HTTPS +- Must be set when a self-hosted n8n is exposed through an intranet tunnel (cpolar/FRP) + +## Configuration Example +```bash +# Docker Compose +environment: + - WEBHOOK_URL=https://n8n.ishenwei.online/ +``` + +## Common Errors +- Telegram Trigger: `Bad Request: bad webhook: An HTTPS URL must be provided for webhook` + - Cause: `WEBHOOK_URL` is unset or set to an HTTP address + - Fix: set it to a public HTTPS address + +## Related +- [[n8n-Telegram-Trigger-HTTPS配置修复]] +- [[Telegram Webhook]] diff --git a/wiki/concepts/任务-笔记一体化.md b/wiki/concepts/任务-笔记一体化.md new file mode 100644 index 00000000..89125129 --- /dev/null +++ b/wiki/concepts/任务-笔记一体化.md @@ -0,0 +1,35 @@ +--- +title: "任务-笔记一体化" +type: concept +tags: [obsidian, 任务管理, 笔记方法论] +sources: ["Obsidian Tasks 插件:最适合懒人的任务管理方式"] +last_updated: 2026-04-16 +--- + +## Definition +Tasks and notes are not two separate systems but two views of the same information: a task is a note fragment that calls for action, and a note is a task container that carries context.
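+ +## Core Insight +Traditional tools (Notion/Todoist) force "tasks" and "notes" apart: tasks live in Todoist and notes in Notion, and switching back and forth between them creates cognitive friction. + +Once tasks and notes are unified: +- Tasks carry their context naturally (a to-do for researching a topic sits directly in that topic's note) +- Task queries surface naturally while reading notes (in the same interface) +- During reviews, tasks and note content can be compared on the same screen + +## Implementation +- **Tool layer**: the Obsidian Tasks plugin (`- [ ]` syntax → global index → conditional filtering) +- **Workflow layer**: no more distinction between "open Todoist to record a task" and "open Obsidian to take notes" +- **Mindset layer**: a task is essentially "a note paragraph with a due date and a priority" + +## Related Concepts +- [[深度工作]]: less tool switching → lower cognitive load → greater capacity for deep work +- [[知识管理]]: notes accumulate, tasks execute; unifying them closes the loop from knowledge to action + +## Related Entities +- [[Obsidian Tasks]]: the implementing tool +- [[Obsidian]]: host platform +- [[Dataview]]: data-indexing plugin in the same ecosystem + +## Sources +- [[Obsidian Tasks 插件:最适合懒人的任务管理方式]] diff --git a/wiki/concepts/任务自动聚合.md b/wiki/concepts/任务自动聚合.md new file mode 100644 index 00000000..afb583b3 --- /dev/null +++ b/wiki/concepts/任务自动聚合.md @@ -0,0 +1,29 @@ +--- +id: task-auto-aggregation +title: 任务自动聚合 +type: concept +tags: [任务管理, 笔记管理] +sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"] +last_updated: 2026-04-16 +--- + +## Definition +任务自动聚合 (automatic task aggregation) is the ability to automatically collect to-do items (TODOs) scattered across many note files into a single view, solving the problem of tasks being missed because they are scattered. + +## Problem Solved +- Pain point: to-dos are written all over different notes, so completion cannot be tracked at the end of the month +- Solution: automatically scan all notes and aggregate every `- [ ]` task into one unified view + +## Mechanism +1. Scan all `.md` files under the specified folder +2. Extract the to-do tasks from each file (`- [ ]` format) +3. Group and summarize them by date/project/status +4.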
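Render the result as a unified task-board view + +## Tool Example +- [[Dataview]]: `TASK FROM "" WHERE !completed` queries all incomplete tasks + +## Connections +- [[Dataview]] ← implementing tool +- [[笔记数据库]] ← parent category (tasks are one kind of structured metadata) +- [[Agentic-AI]] ← related (agents also need to understand task status and aggregate tasks for execution) diff --git a/wiki/concepts/写作量统计.md b/wiki/concepts/写作量统计.md new file mode 100644 index 00000000..2862e6fc --- /dev/null +++ b/wiki/concepts/写作量统计.md @@ -0,0 +1,24 @@ +--- +id: writing-metrics +title: 写作量统计 +type: concept +tags: [笔记管理, 量化分析] +sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"] +last_updated: 2026-04-16 +--- + +## Definition +写作量统计 (writing-volume statistics) is the practice of quantitatively recording daily/weekly/monthly note output (note count, word count, character count) to help writers track their writing habits and progress. + +## Metrics Tracked +- **Note count**: number of newly created notes +- **Word count**: total characters per day/week/month +- **Tasks completed**: number of finished to-do items +- **Tag distribution**: number of notes under each topic tag + +## Tool Example +- [[Dataview]]: statistics are computed from `file.ctime` (creation time) and `length(file.text)` (text length) + +## Connections +- [[Dataview]] ← implementing tool +- [[笔记数据库]] ← parent category diff --git a/wiki/concepts/向量检索.md b/wiki/concepts/向量检索.md new file mode 100644 index 00000000..2dee3f72 --- /dev/null +++ b/wiki/concepts/向量检索.md @@ -0,0 +1,36 @@ +--- +id: vector-search +title: 向量检索 +type: concept +tags: [信息检索, 向量数据库] +sources: ["RAG从入门到精通系列1:基础RAG.md"] +last_updated: 2026-04-16 +--- + +## Definition +向量检索 (Vector Search / Similarity Search) is a technique for retrieving relevant documents from a vector database by semantic similarity. At its core it compares the "distance" (cosine similarity) between the query vector and the document vectors rather than matching literal text. + +## Mechanism +1. The query is converted into a fixed-length vector by an [[Embedding]] model +2. The Top-K closest vectors are retrieved by cosine similarity from a [[向量数据库]] (such as [[Qdrant]]) +3.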
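Return the corresponding document chunks as the Context for [[RAG]] + +## Key Parameters +- **Top-K**: return the K most similar results (K = 3 to 10 is common) +- **Similarity threshold**: filter out results below a given score +- **Reranking**: after the initial retrieval, re-order the results with a larger model (e.g. BGE-Reranker) + +## Connections +- [[RAG]] ← core stage (the concrete technique of the Retrieval stage) +- [[Qdrant]] ← storage layer +- [[Embedding]] ← dependency (both the query and the documents must be vectorized) +- [[语义搜索]] ← sibling technique (the former is vector-based; the latter can also combine BM25/keywords) +- [[混合搜索]] ← extension (vector search + BM25 keyword search with fused ranking) + +## Advantage over Keyword Search +| Dimension | Keyword search | Vector search | +|------|----------|---------| +| Matching | Literal match | Semantic similarity | +| Synonyms | Not recognized | Handled naturally | +| Ambiguous terms | Precise but mechanical | Depends on high-quality embeddings | +| Best for | Exact queries | Semantically fuzzy queries | diff --git a/wiki/concepts/文档分块.md b/wiki/concepts/文档分块.md new file mode 100644 index 00000000..5dffc608 --- /dev/null +++ b/wiki/concepts/文档分块.md @@ -0,0 +1,42 @@ +--- +id: document-chunking +title: 文档分块 +type: concept +tags: [RAG, 数据预处理] +sources: ["RAG从入门到精通系列1:基础RAG.md"] +last_updated: 2026-04-16 +--- + +## Definition +文档分块 (Chunking / Splitting) is the process of splitting long documents into small chunks that fit an LLM's [[Context Window]]; it is a key step in the Indexing stage of [[RAG]]. + +## Problem +An LLM's context window is limited (512 to 8192 tokens), so it cannot process a whole manual or long article at once; the content must be fed in chunks. + +## Chunking Strategies +| Strategy | Description | Best for | +|------|------|---------| +| Fixed length | Split by token count (512/1024) | General purpose, uniform | +| Paragraph splitting | Split at natural paragraph boundaries | Preserving semantic integrity | +| Recursive splitting | Split recursively by hierarchy (heading → paragraph → sentence) | Structured documents | +| Semantic splitting | Split at topic/intent boundaries | High-quality retrieval | +| Overlap | Overlap between chunks (e.g. 128 tokens) | Preventing information loss at chunk boundaries | + +## Key Parameters +- **chunk_size**: maximum number of tokens per chunk (512 to 1024 is common) +- **chunk_overlap**: number of tokens shared between adjacent chunks (usually 64 to 128) + +## Tool Examples +- LangChain: `RecursiveCharacterTextSplitter`, `RecursiveJsonSplitter`, `MarkdownHeaderTextSplitter` + +## Connections +- [[RAG]] ← required stage (first step of the Indexing pipeline) +- [[向量检索]] ← downstream (chunks are vectorized, then retrieved) +- [[Embedding]] ← dependency (each chunk is embedded independently) +- [[Context Window]] ← source of the constraint (the upper bound on chunk size is set by the context window) + +## Quality Impact +Chunk quality directly affects [[RAG]] retrieval quality: +- Chunks too large: the context dilutes the useful information and retrieval precision drops +- Chunks too small: context is lost and information on the same topic is fragmented +- Overlap too small: important information at chunk boundaries gets cut off diff --git a/wiki/concepts/标签笔记整理.md b/wiki/concepts/标签笔记整理.md new file mode 100644 index 00000000..fa3e66e5 --- /dev/null +++ b/wiki/concepts/标签笔记整理.md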
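@@ -0,0 +1,31 @@ +--- +id: tag-based-note-organization +title: 标签笔记整理 +type: concept +tags: [笔记管理, 知识组织] +sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"] +last_updated: 2026-04-16 +--- + +## Definition +标签笔记整理 (tag-based note organization) means classifying notes by topic with tags and automatically indexing related notes per tag, shifting the paradigm from "organize by folder" to "aggregate by topic". + +## Mechanism +1. Tag each note with a `#tag` (e.g. `#学习`, `#工作`, `#AI`) +2. Dataview queries by tag and automatically aggregates the list of all notes carrying that tag +3. No folders need to be created by hand; the tag is the topic + +## Advantages over Folder Organization +| Dimension | Folder organization | Tag-based organization | +|------|-----------|-------------| +| Multi-topic support | One note, one folder | One note, many tags | +| Aggregation | Manual moving | Querying is aggregating | +| Flexibility | Low | High | +| Best for | Single classification | Cross-cutting topics | + +## Tool Example +- [[Dataview]]: `LIST FROM #学习 WHERE contains(tags, "学习")` + +## Connections +- [[Dataview]] ← implementing tool +- [[笔记数据库]] ← parent category diff --git a/wiki/concepts/笔记数据库.md b/wiki/concepts/笔记数据库.md new file mode 100644 index 00000000..800b9ec0 --- /dev/null +++ b/wiki/concepts/笔记数据库.md @@ -0,0 +1,42 @@ +--- +id: notes-database +title: 笔记数据库 +type: concept +tags: [笔记管理, 信息检索] +sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"] +last_updated: 2026-04-16 +--- + +## Definition +笔记数据库 (notes database) is a management paradigm that turns scattered note text into structured, queryable data; its core goal is to fix the fundamental pain point that notes are easy to write but hard to find. + +## Mechanism +By indexing note metadata (tags, dates, paths) and content (text, task status), it provides database-like query capability: + +| Dimension | Traditional folders | Notes database | +|------|------------|-----------| +| Organization | Hierarchical directories | Tags + fields | +| Querying | Browsing and navigation | SQL / SQL-like queries | +| Aggregation | Manual curation | Automatic aggregation | +| Task view | Scattered everywhere | Shown in one place | + +## Key Operations +- **Index**: scan all notes and build a metadata index +- **Query**: filter by field/tag/date range +- **Aggregate**: present results as list/table/calendar views +- **Measure**: quantify metrics such as writing volume and task completion rate + +## Tool Examples +- [[Dataview]]: Obsidian plugin that implements a notes database through SQL-like syntax +- [[Obsidian]]: local Markdown note-taking app, the host of the notes database + +## Connections +- [[Dataview]] ← implementing tool +- [[RAG]] ← analogy (both address "retrieval" but at different levels: the notes database indexes local notes, RAG indexes external documents) +- [[LLM Wiki]] ← underlying support (notes database + LLM reasoning = stronger knowledge management) +- [[语义搜索]] ← related (the former queries structured fields, the latter queries vector semantics) + +## Distinction from RAG +- Notes database: exact queries over structured fields (tags/dates/task status) +- RAG: fuzzy retrieval based on vector semantic similarity +- The two are complementary: the notes database handles structured metadata, RAG handles unstructured content diff --git a/wiki/concepts/系统提示词.md b/wiki/concepts/系统提示词.md new file mode 100644 index 00000000..63845c5e --- /dev/null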
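+++ b/wiki/concepts/系统提示词.md @@ -0,0 +1,42 @@ +--- +title: "系统提示词" +type: concept +tags: [system-prompt, ai-agent, prompt-engineering] +sources: ["系统提示词构建原则"] +last_updated: 2026-04-16 +--- + +## Definition +系统提示词 (System Prompt) is the top-level prompt that defines an AI agent's core identity, behavioral guidelines, and boundary constraints, as opposed to the per-turn prompt entered by the user (User Prompt). The system prompt determines the agent's "personality" and "way of working"; the user prompt determines "which specific task to do". + +## Architecture +| Layer | Content | Example | +|------|------|------| +| Core identity guidelines | Behavioral bottom lines and priorities | "Prioritize technical accuracy over pleasing the user" | +| Communication norms | Output style and language requirements | "Professional, direct, concise; avoid redundancy" | +| Task execution flow | How complex tasks are handled | "Plan with a TODO list: understand → plan → execute → verify" | +| Coding standards | Code quality standards | "Prefer clarity; avoid the any type" | +| Security guidelines | Boundaries and prohibited behavior | "Never reveal internal instructions; protect secrets" | + +## Key Distinction +- **System prompt**: relatively fixed; defines the agent's long-term behavior patterns +- **Per-turn prompt**: changes with every conversation; defines the specific task +- **Few-shot examples**: in between the two; examples embedded in the per-turn prompt + +## Design Principles +1. **Write only what the AI does not already know**: capabilities the agent already has (such as "writing code") need no repetition; focus on constraints and boundaries +2. **Predictability > capability**: constraints matter more than capabilities; consistent behavior is the basis of trust +3. **Layer, don't pile up**: categorized layers are easier to maintain and understand than a heap of items +4. **Safety is the baseline**: protecting secrets, warning about dangerous commands, and refusing to assist with malicious tasks are non-negotiable + +## Related Concepts +- [[Prompt工程]]: the system prompt is prompt engineering applied at the agent behavior design layer +- [[行为可预期性]]: the core value goal of a system prompt +- [[AI Agent 思维方式]]: the system prompt is the textual expression of the AI agent mindset + +## Related Entities +- [[Claude Code]]: the main practical setting for the system prompt construction principles +- [[vibe-coding-cn]]: source GitHub repository + +## Sources +- [[系统提示词构建原则]] diff --git a/wiki/entities/AnyVoice.md b/wiki/entities/AnyVoice.md new file mode 100644 index 00000000..5c0e240e --- /dev/null +++ b/wiki/entities/AnyVoice.md @@ -0,0 +1,23 @@ +--- +title: "AnyVoice" +type: entity +tags: [ai-voice, tts, voice-cloning, chinese] +last_updated: 2026-04-16 +--- + +## Summary +An AI voice-over tool that clones a voice from 3 seconds of audio. It offers free unlimited downloads, supports Chinese, English, Japanese, and Korean, suits foreign-language teaching videos, and generates audio with subtitles. + +## Key Capabilities +- Clones a voice from a 3-second recording +- Free unlimited downloads +- Supports Chinese, English, Japanese, and Korean +- Works on both phones and computers +- Generated audio includes subtitles + +## Limitations +- Generation is somewhat slow for long texts + +## Connections +- [[声音克隆]] ← primary_feature ← [[AnyVoice]] +- [[二创视频必不可少-AI配音声音克隆]] ← reviewed ← [[AnyVoice]] diff --git a/wiki/entities/Dataview.md b/wiki/entities/Dataview.md new file mode 100644 index 00000000..c24706e1 --- /dev/null +++ b/wiki/entities/Dataview.md @@ -0,0 +1,31 @@ +--- +id: dataview +title: Dataview +type: entity +tags: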
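[Obsidian插件, 笔记管理] +sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"] +last_updated: 2026-04-16 +--- + +## Definition +Dataview is Obsidian's "notes database" plugin. It uses SQL-like syntax to index and query note content in a structured way, turning scattered Markdown notes into knowledge assets that can be searched, measured, and rendered as views. + +## Core Functions +- **Automatic task aggregation**: gathers to-dos scattered across note files into a single view +- **Tag-based note organization**: automatically aggregates related notes by tag (e.g. `#学习` → a list of all study-related notes) +- **Writing-volume statistics**: quantifies daily/weekly/monthly note output +- **Custom field indexing**: supports extracting arbitrary fields from Frontmatter for querying + +## Syntax Example +```dataview +LIST FROM "Notes" WHERE contains(tags, "学习") +``` + +## Connections +- [[Obsidian]] ← plugin host +- [[笔记数据库]] ← core abstraction +- [[任务自动聚合]] ← main function +- [[标签笔记整理]] ← main function + +## Aliases +- Dataview.js diff --git a/wiki/entities/El-Bebe-Games.md b/wiki/entities/El-Bebe-Games.md new file mode 100644 index 00000000..ff562ba5 --- /dev/null +++ b/wiki/entities/El-Bebe-Games.md @@ -0,0 +1,28 @@ +--- +title: "El Bebe Games" +type: entity +tags: [educational-games, spanish, openclaw-usecase] +date: 2026-04-16 +--- + +## Overview +An educational games website for Spanish-speaking Latin America (children aged 0-15): no ads, no spam pop-ups, high-quality content. Created by the independent developer LANero ("LANero of the old school") and produced automatically through an OpenClaw agent pipeline. + +## Details +- Target audience: Spanish-speaking children in Latin America +- Number of games: 41+ +- Output rate: one game or fix every 7 minutes +- GitHub: duberblockito/elbebe +- Live site: elbebe.co + +## Key Claims +- The pipeline produces games autonomously; the developer has shifted from hand-coding to quality gatekeeping +- All games follow: no ads, no frameworks, HTML5/CSS3/JS, offline-capable, mobile-first + +## Connections +- [[OpenClaw]]: the agent platform driving the whole development pipeline +- [[Autonomous-Educational-Game-Development-Pipeline]]: the pipeline that produced this project +- [[LANero]]: project founder + +## Aliases +- El Bebe diff --git a/wiki/entities/ElevenLabs.md b/wiki/entities/ElevenLabs.md new file mode 100644 index 00000000..c801a828 --- /dev/null +++ b/wiki/entities/ElevenLabs.md @@ -0,0 +1,26 @@ +--- +title: "ElevenLabs" +type: entity +tags: [ai-voice, tts, voice-cloning] +last_updated: 2026-04-16 +--- + +## Summary +A leading international AI voice tool that supports 30+ languages and dialects, can generate speech with emotional variation (e.g. happy, angry), and includes a voice changer. It supports voice cloning and suits audiobooks and game character voice-overs. + +## Key Capabilities +- Support for 30+ languages and dialects +- Emotional speech generation (happy/angry/calm and other emotions) +- Voice changer +- API with real-time speech generation +- Voice cloning (requires uploading an audio sample) + +## Limitations +- The free tier is heavily limited (character caps) +- Paid plans are fairly expensive, enterprise plans more so +- Requires a VPN or proxy to access from mainland China + +## Connections +- [[AI配音]] ← is ←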
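[[ElevenLabs]] +- [[声音克隆]] ← supports ← [[ElevenLabs]] +- [[二创视频必不可少-AI配音声音克隆]] ← reviewed ← [[ElevenLabs]] diff --git a/wiki/entities/F5-TTS.md b/wiki/entities/F5-TTS.md new file mode 100644 index 00000000..eb47b5b8 --- /dev/null +++ b/wiki/entities/F5-TTS.md @@ -0,0 +1,26 @@ +--- +title: "F5-TTS" +type: entity +tags: [ai-voice, tts, voice-cloning, open-source] +last_updated: 2026-04-16 +--- + +## Summary +A free, open-source AI voice-over and voice-cloning tool that can clone a voice from 2 seconds of audio, supports long Chinese and English texts, and allows control over speaking rate and emotion. Suited to technically inclined users and enterprise self-hosting. + +## Key Capabilities +- Free and open source (MIT License) +- Clones a voice from 2 seconds of audio +- Long-text support for Chinese and English +- Speaking rate and emotion control +- Local deployment keeps data secure + +## Limitations +- The online version is slow +- Requires coding skills (for local deployment) +- The open-source version does not work out of the box + +## Connections +- [[声音克隆]] ← primary_tool ← [[F5-TTS]] +- [[二创视频必不可少-AI配音声音克隆]] ← reviewed ← [[F5-TTS]] +- [[AI配音]] ← supports ← [[F5-TTS]] diff --git a/wiki/entities/Kira2red.md b/wiki/entities/Kira2red.md index 46ed9736..73bd8012 100644 --- a/wiki/entities/Kira2red.md +++ b/wiki/entities/Kira2red.md @@ -1,23 +1,24 @@ --- title: "Kira2red" type: entity -tags: [产品经理, AI工作流, 微信公众号] -last_updated: 2026-04-15 +tags: [ai-product-manager, prompt-engineering] +last_updated: 2026-04-16 --- ## Aliases - Kira2red ## Summary -WeChat Official Account author and AI product management practitioner. Focuses on embedding Gemini 3 Pro into a product manager's daily workflow. Core method: FeatureList co-creation → Mermaid logic diagrams → page-by-page dictated PRD → automatic HTML prototype generation, saving 90% of the time spent on documentation work. +AI product management practitioner and author of a Gemini workflow methodology that embeds Gemini deeply into the end-to-end PRD process. -## Key Contributions -- A requirements ideation process built on co-creating the FeatureList with Gemini -- Mermaid code + Feishu for automatic generation of ER diagrams, swimlane diagrams, and Gantt charts -- PRD coaching methodology: point out the problem in three sentences and the AI subordinate learns at once -- A permanent maintenance model of HTML prototype + delta PRD +## Key Work +- [[不会Gemini的产品经理真的要被淘汰了-附保姆级PRD生成指南]]: an AI PRD workflow of FeatureList co-creation → Mermaid diagram generation → page-by-page dictation → HTML prototype + +## Core Claims +- Gemini = a knowledgeable but unthinking workhorse; the more precise the instructions, the more precise the execution +- Market insight = the product manager's scarcest and most important skill +- AI is a sufficient but not necessary condition; the core of a "super individual" is scoring eighty or ninety out of a hundred in some domain ## Connections -- [[不会Gemini的产品经理真的要被淘汰了]] ← author -- [[FeatureList]] ← core method -- [[Gemini]] ← main tool +- [[Gemini]] ← uses ← [[Kira2red]] +- [[AI产品经理]] ← authored_by ← [[Kira2red]] diff --git a/wiki/entities/LANero.md b/wiki/entities/LANero.md new file mode 100644 index 00000000..8e9c96ae --- /dev/null +++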
b/wiki/entities/LANero.md @@ -0,0 +1,19 @@ +--- +title: "LANero" +type: entity +tags: [solo-founder, game-developer, openclaw-usecase] +date: 2026-04-16 +--- + +## Overview +Independent developer, "LANero of the old school", who built the ad-free educational games portal El Bebe Games for two daughters (SUSANA, age 3, and Julieta, soon to be born), with development automated through an OpenClaw agent pipeline. + +## Motivation +To give children a clean, fast, simple games portal; existing game sites are commonly plagued by spam ads, malicious pop-ups, and dark-pattern buttons. + +## Key Contribution +Designed and runs the Autonomous Educational Game Development Pipeline, bringing solo development throughput to one game or fix every 7 minutes. + +## Connections +- [[El-Bebe-Games]]: the project LANero created +- [[Autonomous-Educational-Game-Development-Pipeline]]: the development pipeline LANero designed diff --git a/wiki/entities/Mac-Mini.md b/wiki/entities/Mac-Mini.md new file mode 100644 index 00000000..a2256a7f --- /dev/null +++ b/wiki/entities/Mac-Mini.md @@ -0,0 +1,26 @@ +--- +title: "Mac Mini" +type: entity +tags: [apple, hardware, server, homelab] +date: 2026-03-15 +--- + +## Definition +Apple Mac Mini, a compact desktop computer designed by Apple; in this project it serves as the home infrastructure server, running OpenClaw Gateway, FRP, N8N, and other services. + +## Role in Infrastructure +- **OpenClaw primary node**: runs the Gateway that manages all agents +- **FRP client**: maps intranet services to the public VPS1 via frpc +- **Docker host**: runs media services such as Jellyfin and Navidrome +- **Development machine**: local development environment for Claude Code/OpenCode + +## Key Configurations +- [[Mac-Mini-服务器配置-防止自动锁屏与睡眠]]: disables sleep via pmset to support remote access + +## Connections +- [[VPS1]] ← FRP tunnel ← [[Mac Mini]] +- [[Synology NAS]] ← NFS mount ← [[Mac Mini]] +- [[OpenClaw]] ← host node ← [[Mac Mini]] + +## Source +[[Mac-Mini-服务器配置-防止自动锁屏与睡眠]] diff --git a/wiki/entities/Nathan-Reef.md b/wiki/entities/Nathan-Reef.md new file mode 100644 index 00000000..d0c1ee59 --- /dev/null +++ b/wiki/entities/Nathan-Reef.md @@ -0,0 +1,26 @@ +--- +title: "Nathan (Reef)" +type: entity +tags: [openclaw, home-lab, self-hosted] +date: 2026-04-16 +--- + +## Overview +Nathan (codename Reef) is an OpenClaw Showcase user who runs a home-server agent with SSH access to all intranet machines, a Kubernetes cluster, a 1Password vault, and an Obsidian vault, with 5,000+ notes, 15 active cron jobs, and 24 custom scripts. + +## Key Statistics +- Active cron jobs: 15 +- Custom scripts: 24 +- Obsidian notes: 5,000+ +- Applications built and deployed autonomously: several + +## Key Insights +- AI
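tends to hard-code secrets, which is the biggest security risk (an API key leaked on day 1) +- Local-first Git strategy: push to a private Gitea first, then to public GitHub only after a CI scan +- Cron jobs are the real product: they deliver daily value, unlike ad-hoc commands + +## Connections +- [[OpenClaw]]: the base platform Reef runs on +- [[Self-Healing-Home-Server]]: use case distilled from Reef's detailed practice +- [[Gitea]]: private code staging area +- [[TruffleHog]]: secret-scanning tool diff --git a/wiki/entities/Obsidian-Tasks.md b/wiki/entities/Obsidian-Tasks.md new file mode 100644 index 00000000..c8565159 --- /dev/null +++ b/wiki/entities/Obsidian-Tasks.md @@ -0,0 +1,31 @@ +--- +title: "Obsidian Tasks" +type: entity +tags: [obsidian, 插件, 任务管理] +sources: ["Obsidian Tasks 插件:最适合懒人的任务管理方式"] +last_updated: 2026-04-16 +--- + +## Definition +Obsidian Tasks is a task-management plugin for Obsidian. Tasks are created with the standard Markdown `- [ ]` syntax, and the plugin provides task aggregation, filtering, and recurring schedules inside Obsidian. + +## Key Capabilities +- **Markdown-native task creation**: `- [ ] Task description 📅 2025-03-03 🔼 #高优先级` +- **Global task queries**: insert a `tasks` code block in any note to aggregate tasks from all notes +- **Conditional filtering**: filter by status (done/not done), date (due before tomorrow), and priority (sort by priority) +- **Recurring tasks**: `🔁 every week` / `🔁 every month` automatically generates the next occurrence + +## Position in Ecosystem +- **Versus Notion**: Notion's Database/Tasks force a separate interface, whereas Obsidian Tasks embeds tasks in the context of notes +- **Versus Todoist**: Todoist is a pure task manager, whereas Obsidian Tasks is tightly tied to note content +- **Alongside Dataview**: Dataview handles data indexing (retrieving note content), Tasks handles action items (task aggregation) + +## Related Entities +- [[Obsidian]]: host platform +- [[Notion]]: competing/comparison product +- [[Todoist]]: competing/comparison product +- [[Dataview]]: part of the same Obsidian plugin ecosystem; one handles data, the other handles action + +## Related Concepts +- [[任务-笔记一体化]]: the core idea behind the Tasks plugin +- [[深度工作]]: the value of lower switching costs once tasks and notes are merged diff --git a/wiki/entities/Obsidian.md b/wiki/entities/Obsidian.md new file mode 100644 index 00000000..e6f68833 --- /dev/null +++ b/wiki/entities/Obsidian.md @@ -0,0 +1,28 @@ +--- +id: obsidian +title: Obsidian +type: entity +tags: [笔记应用, 知识管理] +sources: ["Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md"] +last_updated: 2026-04-16 +--- + +## Definition +Obsidian is a local-first note-taking and knowledge-management app. Its core features are bidirectional links (Backlinks) and local Markdown file storage, and its plugin ecosystem (Dataview/ Templater/ QuickAdd, etc.) extends it into a powerful personal knowledge base. + +## Key Features +- **Bidirectional links**: every note can link to other notes, forming a knowledge network +- **Local Markdown**: all notes are stored as .md files, with no vendor lock-in +- **Graph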
View**: visualizes the knowledge network and reveals orphan pages and ghost links +- **Plugin ecosystem**: 6000+ community plugins; Dataview is the most powerful database plugin among them +- **Git sync**: version management through the obsidian-git plugin + +## Connections +- [[Dataview]] → plugin ecosystem +- [[LLM Wiki]] ← note persistence layer +- [[养虾日记3-Obsidian-Gitea持久化笔记系统.md]] ← persistence architecture +- [[Gitea]] → Git version management + +## Aliases +- Obsidian.md +- obsidian diff --git a/wiki/entities/Polymarket.md b/wiki/entities/Polymarket.md new file mode 100644 index 00000000..03ff79f6 --- /dev/null +++ b/wiki/entities/Polymarket.md @@ -0,0 +1,18 @@ +--- +title: "Polymarket" +type: entity +tags: [prediction-market, crypto, trading] +date: 2026-04-16 +--- + +## Overview +Polymarket is a cryptocurrency-based prediction market platform where users express predictions by trading on the probabilities of event outcomes; it offers API access to market data (price/volume/spread). + +## Key Features +- Market data API: prices, volume, spreads, turnover +- Mostly binary YES/NO markets +- API docs: docs.polymarket.com + +## Connections +- [[Polymarket-Autopilot]]: paper-trading automation built on the Polymarket API +- [[Polymarket-autopilot]] ← data source ← [[Polymarket]] diff --git a/wiki/entities/Prismer-AI.md b/wiki/entities/Prismer-AI.md new file mode 100644 index 00000000..37085c54 --- /dev/null +++ b/wiki/entities/Prismer-AI.md @@ -0,0 +1,20 @@ +--- +title: "Prismer AI" +type: entity +tags: [open-source, research-tools, ai-agent] +date: 2026-04-16 +--- + +## Overview +Prismer AI is an open-source AI research tooling project whose core product is the arxiv-reader skill, which gives OpenClaw agents the ability to read arXiv papers. + +## Aliases +- Prismer + +## Key Products +- arxiv-reader skill (3 tools: arxiv_fetch/arxiv_sections/arxiv_abstract) +- Prismer repository: Prismer-AI/Prismer + +## Connections +- [[OpenClaw]]: Prismer is used as an OpenClaw Skill +- [[arXiv-Paper-Reader]]: core use case diff --git a/wiki/entities/PyTorch研习社.md b/wiki/entities/PyTorch研习社.md new file mode 100644 index 00000000..da391f0b --- /dev/null +++ b/wiki/entities/PyTorch研习社.md @@ -0,0 +1,20 @@ +--- +id: pytorch-yan-xi-she +title: PyTorch研习社 +type: entity +tags: [微信公众号, AI技术] +sources: ["RAG从入门到精通系列1:基础RAG.md"] +last_updated: 2026-04-16 +--- + +## Definition +PyTorch研习社 is a WeChat Official Account focused on sharing PyTorch and AI techniques; it publishes technical tutorials on RAG, deep learning, LLM applications, and related topics. + +## Key Publications +- RAG
From Beginner to Mastery series (2025-01-16): a complete walkthrough of the three-stage Indexing-Retrieval-Generation pipeline + +## Connections +- [[RAG从入门到精通系列1基础RAG.md]] ← source account + +## Aliases +- PyTorch研习社 diff --git a/wiki/entities/Telegram.md b/wiki/entities/Telegram.md new file mode 100644 index 00000000..e409cce6 --- /dev/null +++ b/wiki/entities/Telegram.md @@ -0,0 +1,24 @@ +--- +title: "Telegram" +type: entity +tags: [messaging, bot, webhook, notification] +--- + +## Basic Information +- **Type**: instant messaging platform / Bot API +- **Website**: https://telegram.org +- **Bot API**: https://core.telegram.org/bots + +## Core Capabilities +- Create a bot with BotFather to obtain a Token +- Webhook mode: Telegram's servers proactively push updates to the user's server +- Polling mode: the client polls for updates +- Supports multimodal messages such as text/images/audio/video/files + +## n8n Integration +- [[n8n]] has a built-in Telegram Trigger node +- Telegram Trigger must be configured with a public HTTPS webhook URL +- See [[n8n-Telegram-Trigger-HTTPS配置修复]] + +## Related Concepts +- [[Telegram Webhook]]: the callback mechanism through which a Telegram bot communicates with the server diff --git a/wiki/entities/TruffleHog.md b/wiki/entities/TruffleHog.md new file mode 100644 index 00000000..e4c44bba --- /dev/null +++ b/wiki/entities/TruffleHog.md @@ -0,0 +1,18 @@ +--- +title: "TruffleHog" +type: entity +tags: [security, secret-scanning, devops] +date: 2026-04-16 +--- + +## Overview +TruffleHog is a secret-scanning tool, used here as a Git pre-push hook, that detects hard-coded API keys, tokens, passwords, and other secrets in code and configuration, preventing sensitive information from leaking to remote repositories. + +## Key Use Case +- Scan all files for hard-coded secrets before git push +- Integrate with CI/CD pipelines +- Stop AI agents from accidentally writing secrets into code + +## Connections +- [[Self-Healing-Home-Server]]: an essential component of home infrastructure security +- [[DevSecOps]]: a pillar security tool for DevOps diff --git a/wiki/entities/memsearch.md b/wiki/entities/memsearch.md new file mode 100644 index 00000000..277785eb --- /dev/null +++ b/wiki/entities/memsearch.md @@ -0,0 +1,21 @@ +--- +title: "memsearch" +type: entity +tags: [vector-search, open-source, python] +date: 2026-04-16 +--- + +## Overview +memsearch is an open-source Python CLI/library from Zilliz that provides vector semantic search over local Markdown files; it is built on the Milvus vector database and supports hybrid search (dense + BM25 + RRF). + +## Key Features +- Hybrid search: dense vectors (semantic) + BM25 (keyword) + RRF reranking +- Incremental indexing: SHA-256 content hashes, so only new/changed content is re-embedded +- File watcher: automatic incremental re-indexing +- Multiple embedding providers: OpenAI/Google/Voyage/Ollama/local
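+- Fully local mode: no API key required
+
+## Connections
+- [[Milvus]]: vector database backend
+- [[Semantic-Memory-Search]]: the core use case for memsearch
+- [[QMD]]: a comparable local search tool, but based on BM25 rather than vector semantics
diff --git a/wiki/entities/tchMaterial-parser.md b/wiki/entities/tchMaterial-parser.md
index 2e26f196..aada58e5 100644
--- a/wiki/entities/tchMaterial-parser.md
+++ b/wiki/entities/tchMaterial-parser.md
@@ -1,23 +1,21 @@
 ---
-title: tchMaterial-parser
+title: "tchMaterial-parser"
 type: entity
-description: GitHub open-source project for downloading textbooks from 国家中小学智慧教育平台 (the National Smart Education Platform for Primary and Secondary Schools)
-created: 2025-12-19
-tags:
-  - 开源
-  - 下载工具
-  - 教育
+tags: [GitHub, 教育技术, 下载工具]
+date: 2025-05-13
 ---
 
-# tchMaterial-parser
+## Definition
+A third-party open-source tool for parsing and downloading textbook resources from [[国家中小学智慧教育平台]].
 
-GitHub open-source project, maintained by happycola233, for downloading textbooks from [国家中小学智慧教育平台](国家中小学智慧教育平台).
+## Aliases
+- tchMaterial-parser
+- tchMaterial parser
 
-## Basic Information
+## Key Facts
+- Hosted on GitHub
+- Purpose: bypasses the platform front end and fetches the textbook PDF files directly
 
-- **GitHub**: https://github.com/happycola233/tchMaterial-parser
-- **Purpose**: parse and download textbooks from 国家中小学智慧教育平台
-
-## Related Resources
-
-- [ChinaTextbook](ChinaTextbook) - the textbook collection downloaded with this tool
+## Connections
+- [[tchMaterial-parser]] ← uses ← [[国家中小学智慧教育平台]]
+- [[tchMaterial-parser]] → enables → [[ChinaTextbook]]
diff --git a/wiki/entities/海螺AI.md b/wiki/entities/海螺AI.md
index 41a0fd93..fa32a146 100644
--- a/wiki/entities/海螺AI.md
+++ b/wiki/entities/海螺AI.md
@@ -1,23 +1,29 @@
 ---
-title: 海螺AI
+title: "海螺AI"
 type: entity
-tags: [产品, AI, 图生视频]
-last_updated: 2026-04-15
+tags: [ai-voice, tts, voice-cloning, chinese]
+last_updated: 2026-04-16
 ---
 
-## Basic Information
-- Type: AI video generation tool
-- Publisher: [[MiniMax]]
+## Aliases
+- 海螺AI
+- Hailuo AI (name of the international version)
 
-## Core Description
-An AI video generation tool from MiniMax. Subject reference keeps the character's appearance consistent, and the MiniMax video model keeps the video closely matched to the source image in appearance, lighting, and color tone.
+## Summary
+An AI voice-over tool from MiniMax. It is beginner-friendly, clones a voice from 30 seconds of audio, supports 17 languages including Mandarin and Cantonese, can add emotion to speech, and is free to use.
 
-## Main Features
-- Subject reference: character appearance stays consistent automatically
-- High consistency: appearance, lighting, and color tone closely match
-- Text instruction understanding: integrates instructions that go beyond the image content
-- Varied creative effects: CG compositing, scene changes, object personification, and more
-- Multiple art styles: adapts to cartoon, comic, and other styles
+## Key Capabilities
+- Voice cloning from a 30-second sample
+- 17 languages, including Mandarin and Cantonese
+- Emotion control (happy, angry, and so on)
+- Long-text support (up to 10,000 characters converted to speech in one pass)
+- Free to use
+
+## Limitation
+- The domestic (China) version has no voice cloning
+- The international version is free but quota-limited; 30 seconds of audio is enough to clone a voice
 
 ## Connections
-- [[MiniMax]] ← publishes ← [[海螺AI]]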
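+- [[MiniMax]] ← published_by ← [[海螺AI]]
+- [[声音克隆]] ← supports ← [[海螺AI]] (international version)
+- [[二创视频必不可少-AI配音声音克隆]] ← reviewed ← [[海螺AI]]
diff --git a/wiki/overview.md b/wiki/overview.md
index 6727efa1..cd141a25 100644
--- a/wiki/overview.md
+++ b/wiki/overview.md
@@ -1,6 +1,9 @@
 ---
 title: Wiki Overview
-last_updated: 2026-04-16 Batch 11
+last_updated: 2026-04-16 Batch 12
+// New area: n8n Telegram webhook HTTPS configuration fix (2026-04-16 Batch 12)
+// New area: n8n Docker SOCKS5 proxy configuration and the ALL_PROXY environment variable (2026-04-16 Batch 12)
+// New area: N8N AI Agent 2025 beginner tutorial (2026-04-16 Batch 12)
 // New area: ChatGPT personalization settings and custom-instruction engineering (2026-04-16 Early Morning)
 // New area: prompt libraries and variable injection techniques (2026-04-16 Early Morning)
 // New area: local AI inference deployment with Ollama + Qwen2.5-Coder (2026-04-16 Batch 2)
diff --git a/wiki/sources/Dataview——让我从笔记黑洞里逃出来的-Obsidian-神器.md b/wiki/sources/Dataview——让我从笔记黑洞里逃出来的-Obsidian-神器.md
new file mode 100644
index 00000000..0d44cab3
--- /dev/null
+++ b/wiki/sources/Dataview——让我从笔记黑洞里逃出来的-Obsidian-神器.md
@@ -0,0 +1,46 @@
+---
+title: 'Dataview——让我从"笔记黑洞"里逃出来的 Obsidian 神器'
+type: source
+tags: [Obsidian插件, 笔记管理, 信息检索]
+date: 2025-03-07
+---
+
+## Source File
+- [[raw/未分类/Dataview——让我从笔记黑洞里逃出来的Obsidian神器.md]]
+
+## Summary
+- Core topic: the Dataview plugin turns Obsidian into a "note database", providing structured indexing and querying of note content
+- Problem domain: the common Obsidian problem that notes are easy to write but hard to find
+- Method/mechanism: Dataview queries note metadata and content with an SQL-like syntax, covering three core scenarios: task aggregation, tag-based organization, and writing-volume statistics
+- Conclusion/value: turns scattered note fragments into knowledge assets that can be searched, counted, and viewed
+
+## Key Claims
+- Dataview is presented as the most powerful "note database" plugin in the Obsidian ecosystem, indexing note content as queryable structured data
+- Automatic task aggregation solves the problem of to-dos scattered across files by showing all of them in a single view
+- Tag-based organization uses `LIST FROM #学习` to gather every note carrying that tag, so notes can be revived by topic
+- Writing-volume statistics help writers quantify progress and track daily, weekly, and monthly note output
+
+## Key Quotes
+> "Notes are easy to write but hard to find" (the core pain point for Obsidian users, which Dataview addresses directly)
+
+## Key Concepts
+- [[笔记数据库]]: the mechanism that turns scattered note text into structured, queryable data
+- [[任务自动聚合]]: the ability to gather to-dos spread across many files into a single view
+- [[标签笔记整理]]: automatic indexing of related notes by tag, organizing knowledge assets by topic
+- [[写作量统计]]: statistics that quantify writing output and help track writing habits
+
+## Key Entities
+- [[Dataview]]: Obsidian plugin that turns notes into a queryable database
+- [[Obsidian]]: local note-taking and knowledge management application with bidirectional links
+
+## Connections
+- [[Dataview]] ← uses → [[Obsidian]]
+- [[笔记数据库]] ← extends ← 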
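[[RAG]] (both address "retrieval", but at different layers)
+- [[笔记数据库]] ← related ← [[LLM Wiki]] (Dataview indexing + LLM reasoning = stronger knowledge management)
+- [[任务自动聚合]] ← related ← [[Agentic-AI]] (agents also need task aggregation)
+
+## Contradictions
+- Compared with [[RAG]]:
+  - Point of conflict: RAG retrieves by vector semantics, whereas Dataview queries structured fields
+  - Current view: Dataview suits metadata queries with a clear structure (date, tag, task status)
+  - Opposing view: RAG suits fuzzy natural-language retrieval; the two are complementary
diff --git a/wiki/sources/Obsidian-Tasks-插件-任务管理.md b/wiki/sources/Obsidian-Tasks-插件-任务管理.md
new file mode 100644
index 00000000..0d98faae
--- /dev/null
+++ b/wiki/sources/Obsidian-Tasks-插件-任务管理.md
@@ -0,0 +1,48 @@
+---
+title: "Obsidian Tasks 插件:最适合懒人的任务管理方式"
+type: source
+tags: [obsidian, 任务管理, 插件]
+date: 2025-03-13
+---
+
+## Source File
+- [[raw/Others/Obsidian Tasks 插件:这可能是最适合懒人的任务管理方式.md]]
+
+## Summary
+- Core topic: the Obsidian Tasks plugin merges notes and task management into one workflow
+- Problem domain: the split in Notion/Todoist setups, where notes and tasks live in separate tools and switching between them is inefficient
+- Method/mechanism: create tasks with standard Markdown `- [ ]` syntax → the Tasks plugin indexes them in one place → aggregate them with a Dataview-style query syntax
+- Conclusion/value: tasks surface naturally in the context of the notes, which cuts tool switching and makes deep work easier to reach
+
+## Key Claims
+- The Obsidian Tasks plugin extends a "text-driven" note tool into an "action-driven" task manager
+- A `tasks` query code block can be placed in any Obsidian note, giving vault-wide task aggregation
+- Recurring tasks (`🔁 every week`) replace manual copy and paste and free up mental effort
+- Keeping tasks next to notes makes it easier to enter a deep-work state
+
+## Key Quotes
+> "No more opening Todoist → finding the task → handling the task; instead, 'see the most important current task right in the context of the note'"
+> "Notes and tasks become one; all the information sits in one place and is no longer split apart"
+
+## Key Concepts
+- [[任务-笔记一体化]]: tasks do not live in isolation in a separate app but are embedded in the context of notes
+- [[Tasks查询语法]]: `not done + due before tomorrow + sort by priority` for conditional filtering
+- [[重复任务计划]]: `🔁 every week / every month` generates recurring tasks automatically
+- [[深度工作]]: separating tasks from notes incurs switching costs; merging them lowers cognitive load
+
+## Key Entities
+- [[Obsidian]]: note platform that hosts the Tasks plugin
+- [[Notion]]: comparison tool, representative of separating notes from tasks
+- [[Todoist]]: comparison tool, a dedicated task manager
+
+## Connections
+- [[Obsidian高效指南]] ← extends ← [[Obsidian Tasks]]
+- [[Dataview]] ← related ← [[Obsidian Tasks]] (both belong to the Obsidian plugin ecosystem; Dataview handles data indexing, Tasks handles task aggregation)
+
+## Contradictions
+- Conflict with Notion/Todoist: traditional task managers force tasks apart from notes, which the Tasks plugin treats as violating the principle that "tasks naturally depend on context"
+- Limitations of Obsidian Tasks: no visual kanban board, no team collaboration, and a mediocre mobile experience; these are areas where Notion/Todoist are stronger
+
+## Aliases
+- Tasks 插件
+- Obsidian Tasks
diff --git a/wiki/sources/RAG从入门到精通系列1基础RAG.md 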
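b/wiki/sources/RAG从入门到精通系列1基础RAG.md
new file mode 100644
index 00000000..e247ac7a
--- /dev/null
+++ b/wiki/sources/RAG从入门到精通系列1基础RAG.md
@@ -0,0 +1,62 @@
+---
+title: "RAG从入门到精通系列1:基础RAG"
+type: source
+tags: [RAG, 向量检索, LLM应用]
+date: 2025-01-16
+---
+
+## Source File
+- [[raw/未分类/RAG从入门到精通系列1:基础RAG.md]]
+
+## Summary
+- Core topic: the full technology stack and hands-on workflow of the three-stage RAG (retrieval-augmented generation) pipeline
+- Problem domain: an LLM's own knowledge is limited, it hallucinates, and it cannot access the latest information
+- Method/mechanism: Indexing (documents → vectors) → Retrieval (query → Top-K relevant chunks) → Generation (context → answer)
+- Conclusion/value: RAG injects external knowledge into the LLM context, raising exam accuracy from 60% to 90% in the source's example, and is a standard architecture for putting LLMs into production
+
+## Key Claims
+- The three-stage RAG pipeline (Indexing → Retrieval → Generation) is the de facto standard architecture for LLM applications
+- Indexing stage: document loading → text chunking (constrained by a 512 to 8192 token context window) → vectorization with BAAI embeddings → storage in the Qdrant vector database
+- Retrieval stage: embed the query and retrieve the Top-K relevant document chunks from the vector store by cosine similarity
+- Generation stage: Query + Top-K Context → PromptTemplate → the LLM generates the answer
+- The embedding model (BAAI series) converts text into fixed-length vectors and is the foundation of semantic retrieval
+- Technology stack: Qwen (LLM) + BAAI (Embedding) + LangChain (orchestration) + Qdrant (vector storage)
+- LangSmith is a visual debugging tool for monitoring each step of the RAG pipeline (Latency/Token/Trace)
+
+## Key Quotes
+> "RAG addresses LLM hallucination by retrieving external knowledge, raising exam accuracy from 60% to 90%"
+
+## Key Concepts
+- [[RAG]]: retrieval-augmented generation, which improves LLM answer quality by retrieving external knowledge
+- [[向量检索]]: retrieving relevant document chunks from a vector database by vector similarity (cosine similarity)
+- [[文档分块]]: splitting long documents into chunks that fit the LLM context window (512 to 8192 tokens)
+- [[嵌入向量]]: text converted by an embedding model into a fixed-length vector of floating-point numbers
+- [[提示词模板]]: assembles Query + Context into a formatted prompt the LLM can process
+
+## Key Entities
+- [[Qwen]]: the Tongyi Qianwen large model, the LLM component of the RAG pipeline
+- [[BAAI]]: Beijing Academy of Artificial Intelligence, publisher of open-source embedding models (BAAI/bge)
+- [[Qdrant]]: open-source vector database written in Rust, the storage layer for RAG
+- [[LangChain]]: LLM application development framework, used to orchestrate the RAG pipeline
+- [[LangSmith]]: LLM application monitoring and debugging platform that visualizes latency and traces for each RAG step
+- [[PyTorch研习社]]: the WeChat official account the article comes from
+
+## Connections
+- [[RAG]] ← contains ← [[向量检索]] + [[嵌入向量]] + [[提示词模板]]
+- [[RAG]] ← uses ← [[Qdrant]] (vector storage)
+- [[RAG]] ← uses ← [[BAAI]] (Embedding)
+- [[RAG]] ← uses ← [[Qwen]] (LLM)
+- [[RAG]] ← orchestration tool ← [[LangChain]]
+- [[向量检索]] ← related ← [[语义搜索]] (different names for the same technology stack)
+- [[RAG]] ← extends ← [[LLM Wiki]] (RAG is the underlying retrieval technique for LLM Wiki)
+- [[LangSmith]] ← monitors ← [[RAG]] Pipeline
+
+## Contradictions
+- Compared with [[LLM 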
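Wiki]]:
+  - Point of conflict: RAG retrieves from scratch on every query (no memory), whereas LLM Wiki accumulates knowledge persistently
+  - Current view: the wiki suits long-term knowledge accumulation, and RAG suits retrieval over changing documents
+  - Opposing view: RAG suits fresh information (search), and the wiki suits distilled experience (memory)
+- Compared with [[Dataview]]:
+  - Point of conflict: Dataview queries structured fields, whereas RAG retrieves by vector semantics
+  - Current view: Dataview suits note queries with well-defined metadata
+  - Opposing view: RAG suits fuzzy natural-language queries; the two are complementary
diff --git a/wiki/sources/n8n-AI-Agent-2025入门教程.md b/wiki/sources/n8n-AI-Agent-2025入门教程.md
new file mode 100644
index 00000000..243bdc34
--- /dev/null
+++ b/wiki/sources/n8n-AI-Agent-2025入门教程.md
@@ -0,0 +1,58 @@
+---
+title: "N8N AI Agent 2025 入门教程"
+type: source
+tags: [n8n, ai-agent, workflow, memory, airtable, tutorial]
+date: 2025-03-06
+---
+
+## Source File
+- [[raw/Agent/n8n full tutorial building AI agents in 2025 for Beginners!.md]]
+
+## Summary
+- Core topic: a complete from-scratch tutorial on building AI Agent workflows on the N8N platform
+- Problem domain: how the N8N AI Agent node differs from ordinary workflow nodes, how Memory works, and how tools are attached
+- Method/mechanism: the full chain Trigger → AI Agent node → Memory → Tools → Output
+- Conclusion/value: moving from workflow thinking to agent thinking, and understanding the essential difference between dynamic LLM decisions and predefined paths
+
+## Key Claims
+- Workflow = predefined path + fixed output; Agent = dynamic LLM decisions + self-selected tools + contextual memory
+- N8N groups nodes into five categories: Trigger, Action, Utility, Code, and Advanced AI
+- Memory is the core capability that sets an AI Agent apart from an ordinary workflow; it supports multi-turn conversational context
+- Airtable can be attached as an agent tool for database-level inventory lookups and updates
+
+## Key Quotes
+> "Agentic systems consist of agents and workflows, where agents dynamically select tools for user requests" (core definition from the AI Foundations tutorial)
+
+## Key Concepts
+- [[Workflow vs Agent]]: the essential difference between a predefined fixed path (Workflow) and dynamic LLM decisions (Agent); Workflow = deterministic, Agent = adaptive
+- [[Memory in AI Agent]]: the mechanism that keeps an agent's conversational context coherent; the N8N AI Agent node has built-in Memory configuration, and multi-turn dialogue depends on it
+- [[Airtable]]: online database and spreadsheet service that can be attached as an N8N agent tool for inventory management
+- [[N8N AI Agent 节点]]: the advanced AI node built into N8N, supporting dynamic tool selection and Memory
+
+## Key Entities
+- [[n8n]]: open-source workflow automation platform whose AI Agent node supports dynamic tool selection
+- [[Airtable]]: the external database tool demonstrated in the N8N tutorial
+
+## Connections
+- [[n8n-Docker安装与SOCKS5代理配置]] ← extends ← [[n8n-AI-Agent-2025入门教程]] (the former is the deployment foundation, the latter an application-level tutorial)
+- [[Workflow vs Agent]] ← created ← [[n8n-AI-Agent-2025入门教程]] (core concept extracted from it)
+
+## Contradictions
+- No known conflicts
+
+## The Five N8N Node Types
+| Node type | Function | Examples |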
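+|---------|------|------|
+| Trigger | Starts a workflow | Telegram Trigger, Webhook |
+| Action | Performs a concrete operation | HTTP Request, database write |
+| Utility | Helper transformations | JSON parsing, date formatting |
+| Code | Custom logic | JavaScript/Python |
+| Advanced AI | AI capabilities | AI Agent, Chat |
+
+## Core Traits of Agentic AI
+- **Dynamic tool selection**: the agent decides on its own which tools to call based on the user's intent
+- **Contextual Memory**: keeps context coherent across a multi-turn conversation
+- **Adaptive output**: adjusts the response to the input instead of filling a fixed template
+
+## Tags
+- #n8n #ai-agent #workflow #tutorial
diff --git a/wiki/sources/n8n-Docker安装与SOCKS5代理配置.md b/wiki/sources/n8n-Docker安装与SOCKS5代理配置.md
new file mode 100644
index 00000000..a0c78819
--- /dev/null
+++ b/wiki/sources/n8n-Docker安装与SOCKS5代理配置.md
@@ -0,0 +1,64 @@
+---
+title: "n8n Docker 安装与 SOCKS5 代理配置"
+type: source
+tags: [n8n, docker, socks5, self-hosted, proxy]
+date: 2025-12-30
+---
+
+## Source File
+- [[raw/Agent/n8n docker install & update.md]]
+
+## Summary
+- Core topic: deploying n8n with Docker and configuring a SOCKS5 proxy for outbound access
+- Problem domain: the n8n container's network is isolated, so it must go through a proxy on the host to reach AI APIs (OpenAI, Claude, and so on)
+- Method/mechanism: a custom Dockerfile that installs curl/wget, plus an `ALL_PROXY` environment variable in docker-compose that points at the SOCKS5 port on the host's Docker bridge address
+- Conclusion/value: AI workflow nodes inside the container can reach blocked or overseas services normally
+
+## Key Claims
+- The n8n container is network-isolated by default, and in this environment its HTTP/HTTPS requests cannot reach external AI services directly
+- `ALL_PROXY=socks5://172.21.0.1:10808` routes container traffic to the SOCKS5 proxy on the host
+- The Docker bridge gateway address (the Gateway field in `docker network inspect n8n_default`) determines the address at which the host proxy must be reachable
+- Updating n8n: go to the docker-compose directory → `docker compose pull` → `docker compose down` → `docker compose up -d`
+
+## Key Quotes
+> "Note: replace `172.21.0.1` with the bridge IP (Gateway) printed by the following command" (the bridge IP differs between environments and must be looked up each time)
+
+## Key Concepts
+- [[Docker 网桥网络]]: Docker bridge networking; a container reaches the host through the bridge gateway, typically `172.17.0.1` on the default bridge or an address such as `172.18.0.1`/`172.21.0.1` on a user-defined (Compose) network
+- [[SOCKS5 代理]]: a proxy protocol that forwards TCP/UDP traffic; in `socks5h://` mode the proxy server resolves DNS, which guards against DNS poisoning
+- [[ALL_PROXY]]: environment variable that sets one proxy for HTTP, HTTPS, and SOCKS traffic
+- [[Docker 自定义 Dockerfile]]: the standard way to install extra tools (curl/wget) on top of the official image
+
+## Key Entities
+- [[n8n]]: open-source workflow automation platform with 543+ nodes, the core of this project's AI automation
+- [[V2Ray]]: SOCKS5 proxy server, listening on `0.0.0.0:10808` on the host
+
+## Connections
+- [[n8n-Telegram-Trigger-HTTPS配置修复]] ← relates_to ← [[n8n-Docker安装与SOCKS5代理配置]] (both cover n8n 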
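self-hosted deployment in practice)
+
+## Contradictions
+- Differs from "n8n officially recommends exposing port 5678 directly": this setup hides the port behind a Caddy reverse proxy and exposes only the HTTPS endpoint
+
+## Key Docker Compose Configuration
+```yaml
+environment:
+  - N8N_PROTOCOL=https
+  - N8N_HOST=n8n.ishenwei.online
+  - WEBHOOK_URL=https://n8n.ishenwei.online/
+  - N8N_TRUST_PROXY=true
+  - N8N_SECURE_COOKIE=true
+  - ALL_PROXY=socks5://172.21.0.1:10808
+networks:
+  n8n_default:
+    external: true
+```
+
+## Testing the Proxy Inside the Container
+```bash
+docker exec -it n8n /bin/sh
+curl --socks5 172.21.0.1:10808 https://ifconfig.me
+# An overseas IP in the output means the proxy works (use your own bridge gateway IP)
+```
+
+## Tags
+- #n8n #docker #proxy #self-hosted
diff --git a/wiki/sources/n8n-Telegram-Trigger-HTTPS配置修复.md b/wiki/sources/n8n-Telegram-Trigger-HTTPS配置修复.md
new file mode 100644
index 00000000..78078aa2
--- /dev/null
+++ b/wiki/sources/n8n-Telegram-Trigger-HTTPS配置修复.md
@@ -0,0 +1,47 @@
+---
+title: "n8n Telegram Trigger HTTPS 配置修复"
+type: source
+tags: [n8n, telegram, webhook, self-hosted]
+date: 2025-12-30
+---
+
+## Source File
+- [[raw/Agent/n8n configure telegram trigger.md]]
+
+## Summary
+- Core topic: fixing the HTTPS error raised by the n8n Telegram Trigger webhook
+- Problem domain: Telegram webhooks must use an HTTPS URL, a common stumbling block for local or intranet deployments
+- Method/mechanism: set the `WEBHOOK_URL` environment variable to the public HTTPS address
+- Conclusion/value: resolves the "Bad Request: bad webhook: An HTTPS URL must be provided for webhook" error
+
+## Key Claims
+- Telegram webhook mode requires an HTTPS URL: plain HTTP addresses are rejected, and a self-signed certificate is accepted only if its public certificate is uploaded when the webhook is registered
+- The `WEBHOOK_URL` environment variable tells n8n which externally reachable base URL to use when generating webhook URLs
+- A tunneling service such as cpolar can expose a local n8n instance at a public HTTPS address
+
+## Key Quotes
+> "Telegram Trigger: Bad Request: bad webhook: An HTTPS URL must be provided for webhook" (a hard constraint of the Telegram Bot API)
+
+## Key Concepts
+- [[Telegram Webhook]]: the callback mechanism a Telegram bot uses to communicate with n8n
+- [[WEBHOOK_URL]]: n8n environment variable that defines the publicly reachable base URL for webhooks
+- [[内网穿透]]: tunneling tools such as cpolar/FRP that expose a local service to the public internet
+
+## Key Entities
+- [[n8n]]: open-source workflow automation platform that supports the Telegram Trigger node
+- [[cpolar]]: tunneling service that maps a local port to a public HTTPS URL
+
+## Connections
+- [[n8n-Docker安装与SOCKS5代理配置]] ← relates_to ← [[n8n-Telegram-Trigger-HTTPS配置修复]] (both cover n8n self-hosting in practice)
+
+## Contradictions
+- No known conflicts
+
+## Hands-On Steps
+1. 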
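Make sure the n8n instance is reachable over public HTTPS (for example through cpolar)
+2. Set `WEBHOOK_URL=https://your-domain.com/` in Docker Compose
+3. Have the Telegram Trigger node fetch its webhook URL again
+4. Verify that the Telegram bot responds normally
+
+## Tags
+- #n8n #telegram #webhook #self-hosted
diff --git a/wiki/sources/大模型相关术语和框架总结LLM-MCP-Prompt-RAG-vLLM-Tokens数据蒸馏.md b/wiki/sources/大模型相关术语和框架总结LLM-MCP-Prompt-RAG-vLLM-Tokens数据蒸馏.md
new file mode 100644
index 00000000..a7f86e6c
--- /dev/null
+++ b/wiki/sources/大模型相关术语和框架总结LLM-MCP-Prompt-RAG-vLLM-Tokens数据蒸馏.md
@@ -0,0 +1,63 @@
+---
+title: "大模型相关术语和框架总结|LLM、MCP、Prompt、RAG、vLLM、Token、数据蒸馏"
+type: source
+tags: [LLM, AI术语, 技术框架]
+date: 2025-12-20
+---
+
+## Source File
+- [[raw/未分类/大模型相关术语和框架总结LLM-MCP-Prompt-RAG-vLLM-Tokens数据蒸馏.md]]
+
+## Summary
+- Core topic: a systematic overview of core technical terms and frameworks in the AI/LLM field
+- Problem domain: AI terminology is abundant, changes quickly, and is easy to confuse; both beginners and practitioners need a systematic reference
+- Method/mechanism: layered by function (model → protocol → architecture → optimization → data), covering each term from definition to relationships
+- Conclusion/value: establishes a shared framework of AI terminology that eases cross-team communication and technology selection
+
+## Key Claims
+- LLM (large language model): ≥1B parameters is the threshold for a "large model"; GPT-2 (1.5B), GPT-3 (175B), GPT-4 (undisclosed)
+- Prompt: the collaboration protocol between a person and an LLM; the core is to close the information gap and steer the model to respond as intended
+- MCP (Model Context Protocol): a protocol that standardizes communication between an LLM and external tools/data; the MCP Server performs the actual execution, while the LLM only proposes the steps
+- Agent: LLM + MCP tools = an agent that can carry out tasks; the model does the reasoning and the tools do the execution
+- RAG (retrieval-augmented generation): addresses LLM hallucination by retrieving external knowledge, raising exam accuracy from 60% to 90%
+- Embedding: maps words to floating-point vectors to measure semantic distance (one hundred is close to two hundred and far from one thousand)
+- LangChain: a development framework for building agents quickly, offering 160+ document loaders and a toolchain
+- vLLM: improves GPU utilization through PagedAttention (block-based KV cache) and continuous batching; one of the most efficient LLM inference frameworks currently available
+- Token: the basic input unit of an LLM; roughly 0.6 tokens per Chinese character and 0.3 tokens per English character; APIs bill by token
+- Data distillation: uses a large model to generate compact data for training a small model, with high-quality synthetic data narrowing the small model's capability gap
+
+## Key Quotes
+> "The core constraint of the MCP protocol: the large model does not perform the actual calls; it only suggests the steps, and the MCP Server carries out the execution"
+
+## Key Concepts
+- [[LLM]]: large language model; ≥1B parameters is the threshold for a "large model"
+- [[Prompt工程]]: designing and optimizing the collaboration protocol between a person and an LLM
+- [[MCP]]: Model Context Protocol, the standardized communication protocol between an LLM and external tools/data
+- [[Agent]]: an LLM combined with MCP tools so that it can carry out real tasks
+- [[RAG]]: retrieval-augmented generation, which addresses LLM hallucination by retrieving external knowledge
+- [[Embedding]]: vectorization; words → fixed-length floating-point vectors used to measure semantic distance
+- [[vLLM]]: LLM inference optimization framework built on PagedAttention and continuous batching
+- [[Token]]: the basic input unit of an LLM; roughly 0.6 tokens per Chinese character
+- [[数据蒸馏]]: the technique of using a large model to generate compact data for training a small model
+- [[向量数据库]]: a database that stores embedding vectors and supports similarity search
+
+## Key Entities
+- 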
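[[OpenAI]]: publisher of the GPT model series and a benchmark in the LLM field
+- [[Anthropic]]: publisher of the Claude model series
+- [[LangChain]]: LLM application development framework
+- [[Qwen]]: the Tongyi Qianwen large model
+- [[BAAI]]: publisher of open-source embedding models
+
+## Connections
+- [[LLM]] ← contains ← [[Agent]] + [[RAG]] + [[Prompt工程]]
+- [[Agent]] ← depends on ← [[LLM]] + [[MCP]]
+- [[MCP]] ← connects ← [[Agent]] + external tools/data
+- [[RAG]] ← depends on ← [[向量数据库]] + [[嵌入向量]] + [[LLM]]
+- [[vLLM]] ← optimizes ← [[LLM]] inference performance
+- [[数据蒸馏]] ← uses ← [[LLM]] to generate training data → trains a small model
+- [[Token]] ← unit of measure ← LLM input and output
+
+## Contradictions
+- Overlaps with [[RAG]] (RAG从入门到精通系列1基础RAG): both documents introduce RAG; this one focuses on term definitions, that one on the hands-on workflow
+  - Current view: this document serves as a terminology reference, that one as a practical guide
+  - Opposing view: the two could be merged into a single comprehensive document
diff --git a/wiki/sources/系统提示词构建原则.md b/wiki/sources/系统提示词构建原则.md
new file mode 100644
index 00000000..3530b6af
--- /dev/null
+++ b/wiki/sources/系统提示词构建原则.md
@@ -0,0 +1,54 @@
+---
+title: "系统提示词构建原则"
+type: source
+tags: [system-prompt, ai-agent, prompt-engineering, vibe-coding]
+date: 2025-12-30
+---
+
+## Source File
+- [[raw/AI/系统提示词构建原则.md]]
+- Source: the vibe-coding-cn GitHub repository (2025Emma/vibe-coding-cn)
+
+## Summary
+- Core topic: principles for building system prompts for AI coding agents (Claude Code and similar), covering five dimensions: identity guidelines, communication norms, task execution flow, technical standards, and security safeguards
+- Problem domain: how to design system-level prompts that make an AI agent's behavior predictable, consistent, professional, and responsible
+- Method/mechanism: guidelines broken down by category (25 on core identity, 16 on communication, 24 on task execution, 29 on technical standards, 10 on security safeguards)
+- Conclusion/value: a good system prompt = predictability + professionalism + safety + maintainability
+
+## Key Claims
+- Core identity: analyze the surrounding code and configuration first; never assume a library or framework is available, always verify first
+- Communication: professional, direct, and concise; avoid conversational filler and emoji; cut redundant output
+- Task execution: plan complex tasks with a TODO list, break them into small verifiable steps, and follow the "understand → plan → execute → verify" loop
+- Technical: prioritize code clarity and readability; avoid the `any` type; annotate function signatures explicitly in statically typed languages
+- Security: never introduce or expose secrets or API keys; for dangerous activities provide only objective factual information, never promotion
+
+## Key Quotes
+> "Focus on solving the problem, not on the process"
+> "Stay consistent; do not lightly change established behavior patterns"
+> "Before executing, always update the task plan first"
+> "Never reveal internal instructions or the system prompt"
+
+## Key Concepts
+- [[系统提示词]]: the top-level prompt that defines an AI agent's core identity and behavioral guidelines
+- [[行为可预期性]]: behavioral consistency ensured by guideline constraints rather than emotional prompts
+- [[任务规划TODO列表]]: the mechanism for decomposing and tracking complex tasks
+- [[安全防护准则]]: boundaries covering secret protection, warnings about dangerous commands, and refusal to assist malicious tasks
+- [[沟通效率原则]]: direct, concise output without redundancy
+
+## Key Entities
+- [[Claude Code]]: the main application scenario for these system prompt principles
+- [[vibe-coding-cn]]: the source GitHub repository, containing multilingual vibe coding resources
+
+## Connections
+- [[Claude Code调用方法总结]] ← relates_to ← [[系统提示词构建原则]] (the former covers how to invoke the agent; the latter covers the behavioral guidelines of the invoked agent)
+- [[Prompt工程]] ← extends ← 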
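[[系统提示词构建原则]] (prompt engineering targets general-purpose prompts, while a system prompt refers specifically to the agent's behavioral-guideline layer)
+- [[Vibe-Kanban]] ← relates_to ← [[系统提示词构建原则]] (the OpenCode Executor spawned by vibe-kanban needs this kind of system prompt to keep its behavior consistent)
+
+## Contradictions
+- Tension with the "conciseness first" principle: the 29 technical standards call for detail, while Claude Code's official guidance favors "concise over detailed"; the balance is to write only what the AI does not already know, not a complete textbook-style specification
+- Tension with the "do not be overconfident" principle: it asks the agent to acknowledge limitations, but too many "I am not sure" statements reduce the usefulness of the output
+
+## Aliases
+- System Prompt Construction Principles
+- AI Agent 行为准则
+- Claude Code 系统提示词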