Update nexus: fix conflicts and sync local changes

2026-04-26 12:06:50 +08:00
parent 191797c01b
commit f09834b5a5
2443 changed files with 254323 additions and 255154 deletions
--- a/wiki/concepts/Hybrid-Search.md
+++ b/wiki/concepts/Hybrid-Search.md
@@ -1,51 +1,51 @@
----
-title: "Hybrid Search"
-type: concept
-tags: [search, vector, bm25, retrieval]
-sources: [semantic-memory-search, knowledge-base-rag]
-last_updated: 2026-04-22
----
-
-## Definition
-
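-Hybrid search combines two or more retrieval strategies, typically dense vector retrieval (semantic similarity) and sparse keyword retrieval (BM25), and merges their results with a rank fusion algorithm, balancing semantic understanding with exact matching. It is currently a mainstream way to improve recall in RAG systems.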
-
-## How It Works
-
-```
-Query → [Vector retrieval (ANN)] ──┐
-      → [BM25 keyword retrieval] ──┼─→ Reciprocal Rank Fusion (RRF) → fused ranking
-      → [Other retrievers] ────────┘
-```
-
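-1. **Vector retrieval**: an embedding model encodes the query as a vector, and an ANN index (such as HNSW) finds semantically similar document chunks
-2. **BM25 retrieval**: classic keyword retrieval that uses term frequency and document frequency statistics to return chunks that match the query literally
-3. **RRF fusion**: scores each document by summing `1/(k+rank)` over the retrievers' ranked lists, where k is a smoothing parameter (typically k=60)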
-
-## Why Not Pure Vector Search?
-
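-Limitations of pure vector search:
-- **Incomplete synonym coverage**: the embedding space cannot cover every synonym pair (e.g. "缓存" vs. "cache")
-- **Low precision on proper nouns**: rare words, new terms, and numeric entities are not represented accurately by vectors
-- **High compute cost**: the cost of vector retrieval grows with the vector dimension
-
-Hybrid search adds exact keyword matching through BM25 while keeping the semantic understanding of vector search.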
-
-## Key Insight
-
-> "Hybrid search beats pure vector search. Combining semantic similarity (dense vectors) with keyword matching (BM25) via Reciprocal Rank Fusion catches both meaning-based and exact-match queries." (memsearch documentation)
-
-## Implementation
-
-| Component | Description |
-|------|------|
-| Vector retriever | Milvus / Pinecone / FAISS / Qdrant |
-| BM25 | Elasticsearch / OpenSearch / rank_bm25 |
-| RRF fusion | Custom implementation or built into the vector database |
-| Embedding | OpenAI text-embedding-3 / BGE / Sentence-BERT |
-
-## Connections
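-- [[semantic-memory-search]]: memsearch uses a hybrid search strategy
-- [[Knowledge-Base-RAG]]: hybrid search is key to improving recall in knowledge-base RAG
-- [[Semantic-Search]]: hybrid search is an enhanced form of pure semantic search
-- [[Reciprocal Rank Fusion]]: the fusion algorithm used by hybrid search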
+---
+title: "Hybrid Search"
+type: concept
+tags: [search, vector, bm25, retrieval]
+sources: [semantic-memory-search, knowledge-base-rag]
+last_updated: 2026-04-22
+---
+
+## Definition
+
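+Hybrid search combines two or more retrieval strategies, typically dense vector retrieval (semantic similarity) and sparse keyword retrieval (BM25), and merges their results with a rank fusion algorithm, balancing semantic understanding with exact matching. It is currently a mainstream way to improve recall in RAG systems.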
+
+## How It Works
+
+```
+Query → [Vector retrieval (ANN)] ──┐
+      → [BM25 keyword retrieval] ──┼─→ Reciprocal Rank Fusion (RRF) → fused ranking
+      → [Other retrievers] ────────┘
+```
+
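+1. **Vector retrieval**: an embedding model encodes the query as a vector, and an ANN index (such as HNSW) finds semantically similar document chunks
+2. **BM25 retrieval**: classic keyword retrieval that uses term frequency and document frequency statistics to return chunks that match the query literally
+3. **RRF fusion**: scores each document by summing `1/(k+rank)` over the retrievers' ranked lists, where k is a smoothing parameter (typically k=60)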
+
+## Why Not Pure Vector Search?
+
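+Limitations of pure vector search:
+- **Incomplete synonym coverage**: the embedding space cannot cover every synonym pair (e.g. "缓存" vs. "cache")
+- **Low precision on proper nouns**: rare words, new terms, and numeric entities are not represented accurately by vectors
+- **High compute cost**: the cost of vector retrieval grows with the vector dimension
+
+Hybrid search adds exact keyword matching through BM25 while keeping the semantic understanding of vector search.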
+
+## Key Insight
+
+> "Hybrid search beats pure vector search. Combining semantic similarity (dense vectors) with keyword matching (BM25) via Reciprocal Rank Fusion catches both meaning-based and exact-match queries." (memsearch documentation)
+
+## Implementation
+
+| Component | Description |
+|------|------|
+| Vector retriever | Milvus / Pinecone / FAISS / Qdrant |
+| BM25 | Elasticsearch / OpenSearch / rank_bm25 |
+| RRF fusion | Custom implementation or built into the vector database |
+| Embedding | OpenAI text-embedding-3 / BGE / Sentence-BERT |
+
+## Connections
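+- [[semantic-memory-search]]: memsearch uses a hybrid search strategy
+- [[Knowledge-Base-RAG]]: hybrid search is key to improving recall in knowledge-base RAG
+- [[Semantic-Search]]: hybrid search is an enhanced form of pure semantic search
+- [[Reciprocal Rank Fusion]]: the fusion algorithm used by hybrid search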
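
Note on the RRF step in Hybrid-Search.md: the page gives the `1/(k+rank)` formula with k=60 but no code. The following is a minimal sketch of a custom implementation, not part of this commit. It assumes each retriever returns a list of document IDs ordered best-first; the function name `rrf_fuse` and the sample IDs are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first (assumed input shape).
    k: smoothing parameter; 60 is the value mentioned in the page.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector retriever and a BM25 retriever.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([vector_hits, bm25_hits])
print(fused)
```

Because RRF uses only ranks, the raw cosine-similarity and BM25 scores never need to be normalized against each other, which is one reason it is a common default for combining heterogeneous retrievers.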