wiki-ingest: RAG从入门到精通系列1

2026-04-16 03:47:33 +08:00
parent 821be5e431
commit 997ad92e81
7 changed files with 213 additions and 1 deletions
--- a/wiki/sources/RAG从入门到精通系列1-基础RAG.md
+++ b/wiki/sources/RAG从入门到精通系列1-基础RAG.md
@@ -0,0 +1,57 @@
+# RAG from Beginner to Mastery, Part 1: Basic RAG (RAG从入门到精通系列1：基础RAG)
+
+## Metadata
+
+- **Date**: 2025-12-18
+- **Source**: https://mp.weixin.qq.com/s/TlFNOw7_3Q8qywKLpVUmfg
+- **Category**: AI / RAG
+
+## Key Insights
+
+- RAG (Retrieval Augmented Generation) connects an LLM to external data sources so that its responses can draw on relevant, up-to-date information
+- Basic RAG consists of three stages: Indexing (document processing), Retrieval (finding relevant docs), and Generation (LLM answer synthesis)
+- Documents must be split into chunks (Splits) to fit within embedding models' limited Context Window (512-8192 tokens)
+- Embedding models convert text into numerical Embedding Vectors for similarity comparison using methods like cosine similarity
+- Vector databases like Qdrant store embedding vectors and enable efficient similarity search
+- LangChain and LlamaIndex are frameworks that simplify RAG pipeline construction
+- LangSmith helps monitor and debug RAG pipelines in production
+
+## Summary
+
+RAG (Retrieval Augmented Generation) is a method for connecting Large Language Models with external data sources, allowing them to generate responses based on private or up-to-date data. The basic RAG workflow consists of three stages: Indexing, Retrieval, and Generation.
+
+In the Indexing stage, external documents are loaded using document loaders (like those in LangChain), split into smaller chunks that fit within embedding models' context windows, and converted into embedding vectors stored in a vector database like Qdrant.
+
+During Retrieval, the user's question is converted into an embedding vector, and the vector store is searched with a similarity measure such as cosine similarity to find the k most relevant document chunks.
+
+In the Generation stage, the original question and the retrieved context chunks are inserted into a prompt template and passed to an LLM (such as Qwen), which generates a response grounded in the retrieved source material.
+
+## Key Entities
+
+- **LLM (Large Language Model)**: Powerful AI model that generates text; doesn't always have access to task-relevant or latest data
+- **RAG (Retrieval Augmented Generation)**: Framework connecting LLM with external data sources for grounded generation
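+- **Qwen**: LLM referenced in the tutorial for the generation step of the RAG implementation
+- **BAAI (Beijing Academy of Artificial Intelligence)**: Organization that publishes the bge series of embedding models (e.g., BAAI/bge) used to create embedding vectors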
+- **Qdrant**: Open-source vector database written in Rust for storing and searching embedding vectors
+- **LangChain**: Framework providing 160+ document loaders and components for building LLM applications
+- **LlamaIndex**: Framework for building LLM applications with data connectors (mentioned alongside LangChain)
+- **LangSmith**: Platform for monitoring, debugging, and evaluating production LLM applications
+- **Vector Store**: Database system for storing embedding vectors with similarity search capabilities
+- **Retriever**: Component that takes a question and returns the chunks from the indexed external documents that are most relevant to it
+
+## Key Concepts
+
+- **Indexing**: Process of loading external documents, splitting them into chunks, and storing their embedding vectors in a vector database
+- **Retrieval**: Process of converting a question into an embedding vector and finding the k most similar document chunks in the vector store
+- **Generation**: Process of combining question and retrieved context into a prompt and generating answer via LLM
+- **Embedding Vector**: Fixed-length numerical representation of text that captures semantic meaning, generated by embedding models
+- **Context Window**: Maximum number of tokens a model can process in a single input; for embedding models, typically 512-8192 tokens
+- **Token**: Basic unit for representing text in models; roughly one Chinese character or 3-4 English letters per token, depending on the tokenizer
+- **Cosine Similarity**: Similarity measure between two vectors, defined as the cosine of the angle between them; it ranges from -1 to 1, and higher values mean more similar vectors
+- **Chunking/Splitting**: Breaking documents into smaller pieces to fit within embedding model context windows
+- **Chain**: Linking retrieval and generation components into a unified pipeline (e.g., LangChain's Chain abstraction)
+
+## Related Sources
+
+- [Qdrant: an open-source vector database and vector search engine written in Rust](https://mp.weixin.qq.com/s?__biz=MzI2ODUyMTQyNA==&mid=2247493427&idx=1&sn=75181307c395cd1d51ccfaafac340866&scene=21#wechat_redirect)
+- [GitHub: RAG Tutorial](https://github.com/realyinchen/RAG/blob/main/01_Indexing_Retrieval_Generation.ipynb)
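
As a note on the ingested page above: the three stages it summarizes (Indexing, Retrieval, Generation) can be sketched in dependency-free Python. This is only an illustrative toy, not the tutorial's code. The function names are hypothetical, the "embedding" is a bag-of-words term count standing in for a real embedding model such as the bge series (so it is sparse rather than fixed-length), an in-memory list stands in for a vector database such as Qdrant, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Indexing, step 1: split text into overlapping word-based chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    # Stop early enough that the last chunk is not just a repeated tail.
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): term counts instead of a learned dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Indexing, step 2: store each chunk next to its embedding."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(question: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Retrieval: return the k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Generation input: combine the question and retrieved context into one prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )
```

A typical flow would call `build_index(split_into_chunks(document))` once, then `build_prompt(question, retrieve(question, index))` per question, and send the resulting prompt to an LLM. A real pipeline would swap `embed` for an embedding model and the list for a vector store; the control flow stays the same.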