---
title: "Incremental Indexing"
type: concept
tags: [indexing, efficiency, vector-search]
date: 2026-04-16
---

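## Definition

Identify unchanged files by their content hash (SHA-256) and rebuild the index only for files that are new or whose content has changed, so that unchanged content is never recomputed.
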
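As a minimal sketch of the hashing step, a file's SHA-256 digest can be computed with Python's standard `hashlib`. The helper name `file_digest` and the chunk size are illustrative assumptions, not taken from any specific tool.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes produce the same digest regardless of their names or modification times, which is what makes the digest usable as a "has this content changed?" key.
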
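## Why It Matters

- Embedding API calls are expensive; incremental indexing can cut API costs by more than 90% when most files are unchanged between runs
- A file watcher triggers incremental indexing in real time, keeping the index up to date
- Zero waste: every token is spent on content that has actually changed
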
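To make the savings concrete, the following sketch compares two `{path: content_hash}` snapshots and reports what share of files needs no embedding call. The function names `plan_reindex` and `skipped_fraction` are hypothetical, and counting files is only a rough proxy for cost, since real API charges depend on token counts.

```python
def plan_reindex(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {path: content_hash} snapshots and group paths by status."""
    plan = {"added": [], "changed": [], "unchanged": [], "removed": []}
    for path, digest in current.items():
        if path not in previous:
            plan["added"].append(path)
        elif previous[path] != digest:
            plan["changed"].append(path)
        else:
            plan["unchanged"].append(path)
    # Paths present last time but missing now should be dropped from the index.
    plan["removed"] = [path for path in previous if path not in current]
    return plan


def skipped_fraction(plan: dict[str, list[str]]) -> float:
    """Fraction of current files that need no embedding call."""
    total = len(plan["added"]) + len(plan["changed"]) + len(plan["unchanged"])
    return len(plan["unchanged"]) / total if total else 0.0
```

For example, if ten files were indexed previously and only one of them changed, nine of the ten embedding calls are skipped.
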
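## Implementation

```python
import hashlib

# Content hash -> compare against the record from the previous indexing run
content_hash = hashlib.sha256(file_content).hexdigest()  # file_content is bytes
if content_hash not in last_index:
    embed_and_index(file_content)  # then record content_hash in last_index
```
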
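A fuller, hedged sketch of the same idea persists the record between runs in a JSON manifest. Everything here is an assumption made for illustration: the function name `incremental_index`, the manifest layout (relative path mapped to digest, rather than a bare set of hashes), and the `embed_and_index` callback, which stands in for whatever embedding pipeline is in use. The manifest file is assumed to live outside `root` so that it is not indexed itself.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def incremental_index(
    root: Path,
    manifest_path: Path,
    embed_and_index: Callable[[Path, bytes], None],
) -> list[Path]:
    """Index only new or changed files under root; return the files processed."""
    # Load the {relative path: sha256} record from the previous run, if any.
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    seen: dict[str, str] = {}
    processed: list[Path] = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        content = path.read_bytes()
        digest = hashlib.sha256(content).hexdigest()
        key = path.relative_to(root).as_posix()
        seen[key] = digest
        if manifest.get(key) != digest:
            embed_and_index(path, content)  # hypothetical callback
            processed.append(path)
    # Writing only the paths seen this run drops records of deleted files.
    manifest_path.write_text(json.dumps(seen, indent=2))
    return processed
```

Keying the manifest by path makes it possible to tell a changed file from a new one and to notice deletions. This sketch does not remove the vectors of deleted files from the storage backend; that step depends on the backend's own API.
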
## Connections

- [[memsearch]]: a concrete implementation of incremental indexing
- [[向量数据库]] (vector database): the storage backend for the incremental index