nexus/wiki/concepts/Indexing.md
2026-04-14 16:02:50 +08:00

---
title: "Indexing"
type: concept
tags: [RAG, indexing, data-processing]
---
## Definition
Indexing is the process of splitting external documents into chunks and building an index over them. It is the first stage of RAG.
## Core Mechanism
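1. Load: read in the source documents.
2. Split: cut each document into chunks by paragraph, sentence, or token.
3. Embed: convert each chunk into a vector with an Embedding Model.
4. Store: write the vectors to a Vector Store.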
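
The four steps above can be sketched end to end in a few lines. This is a minimal illustration, not a reference implementation: `load`, `split`, `embed`, `InMemoryVectorStore`, and `build_index` are hypothetical names, the documents are in-memory strings, and the hashed bag-of-words `embed` is only a stand-in for a real Embedding Model so that the example runs without dependencies.

```python
import math
import zlib
from dataclasses import dataclass, field


def load(sources: dict[str, str]) -> list[tuple[str, str]]:
    """Load: the 'documents' here are already in-memory strings (assumption)."""
    return list(sources.items())


def split(text: str) -> list[str]:
    """Split: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def embed(chunk: str, dim: int = 64) -> list[float]:
    """Embed: toy hashed bag-of-words vector, L2-normalised.

    Stand-in for a real Embedding Model; it captures no semantics.
    """
    vec = [0.0] * dim
    for word in chunk.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


@dataclass
class InMemoryVectorStore:
    """Store: keeps (vector, chunk, source) rows in a plain list."""
    rows: list[tuple[list[float], str, str]] = field(default_factory=list)

    def add(self, vector: list[float], chunk: str, source: str) -> None:
        self.rows.append((vector, chunk, source))


def build_index(sources: dict[str, str]) -> InMemoryVectorStore:
    """Run Load -> Split -> Embed -> Store over all sources."""
    store = InMemoryVectorStore()
    for name, text in load(sources):
        for chunk in split(text):
            store.add(embed(chunk), chunk, name)
    return store


index = build_index({
    "doc1": "RAG has several stages.\n\nIndexing comes first.",
    "doc2": "Vectors are written to a store.",
})
```

Keeping the source name next to each vector is a design choice of this sketch; it lets a later retrieval step report where a chunk came from.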
## Key Properties
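- The splitting strategy directly affects retrieval quality.
- Chunk size is constrained by the Context Window.
- Chunk granularity needs balancing: small chunks are precise but lose surrounding context, while large chunks keep context but dilute relevance.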
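
One way to make the size and granularity trade-off concrete is a splitter with an explicit token budget. The sketch below is illustrative only: `split_by_tokens` is a hypothetical helper, whitespace-separated words stand in for real tokenizer tokens, and the `overlap` parameter (repeating a few tokens between neighbouring chunks) is an assumption of this example rather than something the note prescribes.

```python
def split_by_tokens(text: str, max_tokens: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace tokens.

    Consecutive chunks share `overlap` tokens, so content cut at a
    boundary still appears whole in at least one neighbouring chunk.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    tokens = text.split()  # stand-in for a real tokenizer
    step = max_tokens - overlap
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


fine = split_by_tokens("a b c d e f g", max_tokens=3, overlap=1)
coarse = split_by_tokens("a b c d e f g", max_tokens=7)
```

Here `fine` holds three small overlapping chunks and `coarse` holds the whole text as a single chunk; choosing `max_tokens` is where the granularity balance is set.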
## Connections
- [[RAG]] ← stage 1 ← [[Indexing]]
- [[Retrieval]] ← downstream ← [[Indexing]]
- [[Embedding Vector]] ← output ← [[Indexing]]
- [[Vector Store]] ← target ← [[Indexing]]