Auto-sync

2026-04-15 15:02:52 +08:00
parent bb2f9b2b3a
commit d3e7fcf81f
71 changed files with 2549 additions and 0 deletions
--- a/wiki/concepts/Embedding.md
+++ b/wiki/concepts/Embedding.md
@@ -0,0 +1,34 @@
+---
+title: "Embedding"
+type: concept
+tags: [embedding, vector, rag, nlp]
+sources: ["RAG从入门到精通系列1：基础RAG"]
+last_updated: 2026-04-15
+---
+
+## Definition
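+A technique that converts text (a word, sentence, or document) into a fixed-length numeric vector (an embedding vector) that captures the text's semantics, so that semantically similar content lies close together in vector space.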
+
+## Technical Details
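+- Output is a fixed-length vector (for example, 768, 1024, or 1536 dimensions)
+- Semantically similar texts lie closer together in the vector space
+- Similarity can be measured with cosine similarity, dot product, and other metrics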
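+
+As a minimal worked form of the cosine similarity and dot product measures, assume two embedding vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^d$ of the same dimension $d$ (the symbols are illustrative and do not come from the source):
+
+$$
+\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{d} a_i b_i, \qquad \cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
+$$
+
+When both vectors are normalized to unit length, the denominator equals 1 and the two measures give the same value.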
+
+## Embedding Model
+- **BAAI BGE series**: open-source embedding models optimized for Chinese
+- **OpenAI text-embedding-3**: OpenAI's official embedding API
+- Context window is typically 512 to 8192 tokens
+
+## Applications
+- [[RAG]]: vectorizing documents and queries to support semantic retrieval
+- Text similarity computation
+- Cluster analysis
+- Recommender systems
+
+## Related Concepts
+- [[向量数据库]] (vector database): a database that stores embedding vectors
+- [[RAG]]: the main application scenario for embeddings
+- [[Token]]: the basic unit of text after tokenization
+
+## Sources
+- [[RAG从入门到精通系列1：基础RAG]]