Update nexus wiki content

2026-05-03 05:42:06 +08:00
parent 90f3811b83
commit 111bc65b7b
707 changed files with 32306 additions and 7289 deletions

@@ -0,0 +1,43 @@
---
title: "Hybrid Fingerprinting"
type: concept
tags: []
last_updated: 2026-05-01
---
## Definition
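A hybrid fingerprinting mechanism that combines two signals, exact matching (a SHA-256 hash of the primary key) and fuzzy matching (vector semantic similarity), so that distinct records are not wrongly merged because they look alike on the surface.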
## The Problem
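Pure semantic similarity is fuzzy:
- `"John Doe ID:101"` and `"Jon Doe ID:102"` are highly similar semantically
- But their primary keys differ (ID:101 ≠ ID:102), so they are actually two distinct records
- Relying on semantic similarity alone, they could be wrongly clustered and merged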
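The mismatch between the two signals can be illustrated with a minimal sketch. It uses the character-level ratio from Python's `difflib` purely as a stand-in for embedding similarity (an assumption; the note does not name an embedding model), and the helper names `pk_hash` and `surface_similarity` are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher


def pk_hash(pk: str) -> str:
    """SHA-256 hex digest of a primary-key string."""
    return hashlib.sha256(pk.encode("utf-8")).hexdigest()


def surface_similarity(a: str, b: str) -> float:
    """Character-level ratio in [0, 1]; a stand-in for embedding similarity."""
    return SequenceMatcher(None, a, b).ratio()


# The two records from the example above, with the PK split out.
record_a = {"pk": "101", "text": "John Doe ID:101"}
record_b = {"pk": "102", "text": "Jon Doe ID:102"}

similarity = surface_similarity(record_a["text"], record_b["text"])
same_pk = pk_hash(record_a["pk"]) == pk_hash(record_b["pk"])

# The fuzzy signal says "nearly the same"; the exact signal says "different".
print(f"similarity={similarity:.2f}, same_pk={same_pk}")
```

The similarity score is high while the PK hashes disagree, which is exactly the case a similarity-only clustering step would get wrong.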
## Solution
```
Hybrid Score = SHA-256(PK_hash) + Vector_Similarity(embedding)
```
- **PK Hash differs** → force the records into separate clusters; merging is not allowed
- **PK Hash matches** → only then is vector similarity considered for clustering
## Implementation
```python
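# Sketch: `vector_similarity` and `threshold` are supplied by the caller.
import hashlib

def sha256(pk: str) -> str:
    return hashlib.sha256(pk.encode("utf-8")).hexdigest()

def should_merge(pk1, pk2, embedding1, embedding2, vector_similarity, threshold):
    """Return True only when the PK hashes match and the embeddings are similar."""
    if sha256(pk1) != sha256(pk2):
        return False  # PKs differ: force separate clusters
    # PKs match: merge only if the embeddings are also semantically similar
    return vector_similarity(embedding1, embedding2) > threshold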
```
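One way to apply this gate across a whole record set is to bucket records by PK hash first, so that pairs with different PKs are never compared at all. The sketch below is an illustration under stated assumptions, not the wiki's actual implementation: cosine similarity, the `0.9` default threshold, single-link merging via union-find, and the names `cluster_records` and `cosine_similarity` are all hypothetical choices.

```python
import hashlib
import math
from collections import defaultdict


def pk_hash(pk: str) -> str:
    return hashlib.sha256(pk.encode("utf-8")).hexdigest()


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_records(records, threshold=0.9):
    """Cluster (pk, embedding) pairs; returns sorted lists of record indices."""
    # Bucket by PK hash so records with different PKs are never compared.
    buckets = defaultdict(list)
    for idx, (pk, _) in enumerate(records):
        buckets[pk_hash(pk)].append(idx)

    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Within each bucket, merge pairs whose embeddings exceed the threshold.
    for members in buckets.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                if cosine_similarity(records[i][1], records[j][1]) > threshold:
                    parent[find(j)] = find(i)

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


records = [
    ("101", [1.0, 0.0]),    # index 0
    ("102", [0.99, 0.05]),  # index 1: near-identical embedding, different PK
    ("101", [0.98, 0.1]),   # index 2: same PK as 0, similar embedding
    ("101", [0.0, 1.0]),    # index 3: same PK as 0, dissimilar embedding
]
print(cluster_records(records))
```

In this toy data, records 0 and 2 end up together, record 1 stays alone despite its near-identical embedding, and record 3 stays alone despite sharing a PK. Bucketing also cuts the number of pairwise comparisons, since similarity is only computed inside each PK-hash group.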
## Related
- [[Semantic Anomaly Compression]]
- [[Air-Gapped SLM Fix Generation]]