nexus/wiki/concepts/Indexing.md at b40abbcd473a7093d8261e212e3d6de97c1e516a

weishen 3224ec4787 Auto-sync: update nexus workspace

2026-04-28 07:26:52 +08:00

Definition

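Indexing (the indexing stage) is the first stage of a RAG (retrieval-augmented generation) pipeline. It converts external documents into retrievable vector representations and stores them in a vector database.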

Raw documents → Document loader → Text splitting (Split) → Embedding → Store in Vector Store

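Document loading (Loading): use a Document Loader from a framework such as LangChain to load raw documents from a variety of sources (web pages, local files, databases, and so on).
Text splitting (Splitting): split long documents into small chunks (splits) that fit within the embedding model's context window, typically 512 to 4096 tokens depending on the model.
Embedding: use an embedding model (for example, the BAAI/bge series) to convert each text chunk into a fixed-length vector representation.
Storing in the vector database: write the embedding vectors to a Vector Store (for example Qdrant, Chroma, or Milvus).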
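
The four steps above can be sketched end to end using only the Python standard library. This is a minimal illustration under stated assumptions, not how LangChain, a BAAI/bge model, or Qdrant implement them. Every name here (`split_text`, `embed`, `InMemoryVectorStore`, `index_documents`) is hypothetical. The "embedding" is a toy hashed bag-of-words vector standing in for a real embedding model, "tokens" are whitespace-separated words rather than model tokens, and loading is reduced to passing in strings.

```python
import hashlib
import math
from dataclasses import dataclass, field


def split_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of at most chunk_size whitespace tokens,
    with `overlap` tokens shared between consecutive chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy stand-in for an embedding model: hash each word into one of
    `dim` buckets, then L2-normalize. Always returns a length-`dim` vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


@dataclass
class InMemoryVectorStore:
    """Toy stand-in for a vector database: parallel lists of vectors and text."""
    vectors: list[list[float]] = field(default_factory=list)
    payloads: list[str] = field(default_factory=list)

    def add(self, vector: list[float], payload: str) -> None:
        self.vectors.append(vector)
        self.payloads.append(payload)

    def search(self, query_vector: list[float], top_k: int = 3) -> list[tuple[float, str]]:
        # Vectors are unit length, so the dot product is the cosine similarity.
        scores = [sum(a * b for a, b in zip(query_vector, v)) for v in self.vectors]
        ranked = sorted(zip(scores, self.payloads), key=lambda pair: pair[0], reverse=True)
        return ranked[:top_k]


def index_documents(docs: list[str], store: InMemoryVectorStore,
                    chunk_size: int = 8, overlap: int = 2) -> int:
    """Run split -> embed -> store for already-loaded documents.
    Returns the number of chunks written to the store."""
    count = 0
    for doc in docs:
        for chunk in split_text(doc, chunk_size, overlap):
            store.add(embed(chunk), chunk)
            count += 1
    return count
```

In a real pipeline each piece would be swapped for its production counterpart: a document loader in place of the raw strings, a tokenizer-aware splitter, an actual embedding model, and a client for the chosen vector store. The overlap between chunks is a common design choice that keeps sentences straddling a chunk boundary retrievable from either side.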