
---
title: "Semantic Memory Search"
type: source
tags: [memory, semantic-search, vector-db, openclaw]
date: 2026-04-22
---
## Source File
- [[Agent/usecases/semantic-memory-search]]
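## Summary
- Core topic: adding vector-based semantic search to OpenClaw's Markdown memory files
- Problem domain: OpenClaw stores memory as plain Markdown; as the files accumulate over time they become hard to search, because grep only matches keywords and cannot understand meaning
- Method/mechanism: uses the memsearch library (backed by the Milvus vector database) to build hybrid search (dense vectors + BM25) with RRF reranking; SHA-256 content hashes enable incremental indexing; a file watcher rebuilds the index automatically
- Conclusion/value: asking in natural language (e.g. "Which caching solution did we pick?") finds the relevant content without remembering the exact wording; a local mode works without an API key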
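## Key Claims
- Once an OpenClaw memory store grows, plain Markdown cannot be searched semantically; users need to find past decisions by meaning rather than by keyword
- Hybrid search (dense vectors + BM25) combined with RRF reranking captures both semantic similarity and exact keyword matches, and outperforms pure vector search
- SHA-256 content hashing ensures that only new or changed content is embedded, which avoids repeated API calls and saves cost
- The Markdown files are the single source of truth; the vector index is only a derived cache that can be rebuilt at any time with `memsearch index`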
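
The rank-fusion step in the hybrid search claim can be illustrated with a minimal sketch of Reciprocal Rank Fusion. This is not memsearch's implementation: the function name `rrf_fuse`, the chunk ids, and the smoothing constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical chunk ids returned by a dense-vector retriever and a BM25 retriever.
dense_hits = ["cache.md#2", "db.md#1", "auth.md#4"]
bm25_hits = ["redis.md#1", "cache.md#2", "db.md#1"]

fused = rrf_fuse([dense_hits, bm25_hits])
```

A chunk that ranks well in both lists (here `cache.md#2`) rises above chunks that appear in only one, which is how fusion rewards agreement between semantic and keyword retrieval.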
## Key Quotes
> "Markdown stays the source of truth. The vector index is just a derived cache — you can rebuild it anytime with `memsearch index`. Your memory files are never modified." — Core idea: the original documents are immutable
> "Hybrid search beats pure vector search. Combining semantic similarity (dense vectors) with keyword matching (BM25) via Reciprocal Rank Fusion catches both meaning-based and exact-match queries." — Why hybrid search is superior
> "Smart dedup saves money. Each chunk is identified by a SHA-256 content hash. Re-running `index` only embeds new or changed content, so you can run it as often as you like without wasting embedding API calls." — Incremental indexing saves cost
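
The content-hash dedup described in the last quote can be sketched roughly as follows. The helper names `chunk_hash` and `plan_index_update`, and the idea of keeping indexed hashes in a plain set, are assumptions for illustration; the source does not describe how memsearch stores or compares hashes internally.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Identify a chunk by the SHA-256 digest of its UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_index_update(chunks, indexed_hashes):
    """Decide what an incremental index run has to do.

    Returns (to_embed, stale):
      to_embed: chunks whose hash is not in the index yet (new or changed)
      stale:    indexed hashes that no longer match any current chunk
    """
    current = {chunk_hash(c): c for c in chunks}
    to_embed = [c for h, c in current.items() if h not in indexed_hashes]
    stale = set(indexed_hashes) - set(current)
    return to_embed, stale


# Hypothetical state: one chunk is already indexed, one is new.
already_indexed = {chunk_hash("We chose Redis for caching.")}
chunks_now = ["We chose Redis for caching.", "Auth uses short-lived tokens."]

to_embed, stale = plan_index_update(chunks_now, already_indexed)
```

Because unchanged chunks hash to the same value, re-running the plan over the same files yields an empty `to_embed` list, so no embedding calls would be needed.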
## Key Concepts
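- [[Semantic Memory Search]]: semantic search over memory files via vector embeddings, rather than keyword matching alone
- [[Hybrid Search]]: a retrieval strategy that combines dense vectors (semantic similarity) with BM25 (exact keyword matching)
- [[Reciprocal Rank Fusion (RRF)]]: merges and reranks the results of multiple retrievers by fusing their ranks, improving search quality
- [[Content Hashing]]: identifies content chunks by SHA-256 hash so that only new or changed content is re-embedded
- [[File Watcher]]: monitors the memory files for changes and automatically triggers an incremental reindex, keeping the index up to date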
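
As a rough illustration of the file-watcher concept, the sketch below detects changed Markdown files by comparing two hash snapshots of a directory. It uses simple polling rather than OS file events, and the names `snapshot` and `diff_snapshots` are hypothetical; how memsearch's watcher actually works is not described in the source.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each Markdown file under root (relative path) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*.md")
    }


def diff_snapshots(old, new):
    """Return (changed, removed): files to reindex and files to drop from the index."""
    changed = sorted(p for p, h in new.items() if old.get(p) != h)
    removed = sorted(p for p in old if p not in new)
    return changed, removed


# Demo on a temporary directory standing in for a memory folder.
with tempfile.TemporaryDirectory() as tmp:
    memory_dir = Path(tmp)
    (memory_dir / "decisions.md").write_text("We chose Redis.\n", encoding="utf-8")
    before = snapshot(memory_dir)

    # Simulate an edit to one file and the arrival of a new one.
    (memory_dir / "decisions.md").write_text("We chose Redis with a TTL.\n", encoding="utf-8")
    (memory_dir / "auth.md").write_text("Tokens expire hourly.\n", encoding="utf-8")
    after = snapshot(memory_dir)

changed, removed = diff_snapshots(before, after)
```

A real watcher would run this comparison on a timer or react to filesystem events, then pass only the `changed` files to the incremental indexer.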
## Key Entities
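- [[memsearch]]: an open-source vector semantic search CLI/library from ZillizTech that gives OpenClaw memory semantic search, built on the Milvus vector database
- [[Milvus]]: an open-source vector database backend; the vector storage and retrieval engine behind memsearch
- [[OpenClaw]]: a multi-agent framework with a built-in Markdown memory system; the host application framework for this use case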
## Connections
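- [[OpenClaw]] ← extends ← [[Semantic Memory Search]]: this use case layers vector semantic search on top of OpenClaw's plain Markdown memory
- [[Knowledge-Base-RAG]] ← related_to ← [[Semantic Memory Search]]: both involve embedding-based vector retrieval and are different scenarios within the RAG technology stack
- [[Second Brain]] ← related_to ← [[Semantic Memory Search]]: a second brain's memory persistence and semantic retrieval complement each other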
## Contradictions
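- No conflict with [[Knowledge-Base-RAG]]: the two are different implementations within the same technology stack. Knowledge Base RAG focuses on ingesting URLs dropped into Telegram/Slack, while this use case focuses on semantic indexing of existing Markdown files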