Update nexus: fix conflicts and sync local changes

2026-04-26 12:06:50 +08:00
parent 191797c01b
commit f09834b5a5
2443 changed files with 254323 additions and 255154 deletions
--- a/wiki/concepts/Fuzzy-Matching.md
+++ b/wiki/concepts/Fuzzy-Matching.md
@@ -1,57 +1,57 @@
---
-title: "Fuzzy Matching"
-type: concept
-tags: ["identity-resolution", "string-similarity", "normalization", "entity-matching"]
-sources: ["identity-graph-operator"]
-last_updated: 2026-04-25
---
-
-# Fuzzy Matching
-
-## Definition
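-The ability to handle records that refer to the same entity but are written differently: normalization and similarity algorithms identify superficially different records as the same entity. It is one of the core challenges of identity resolution.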
-
-## Core Techniques
-
-### 1. Nickname Normalization
-```python
-nicknames = {
-    "bill": "william", "bob": "robert", "jim": "james",
-    "mike": "michael", "dave": "david", "joe": "joseph",
-    "tom": "thomas", "dick": "richard", "jack": "john",
-}
-# "Bill Smith" → "william smith"
-```
-
-### 2. String Similarity
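-| Algorithm | Use case |
-|------|----------|
-| Levenshtein Distance | Character-level edit distance |
-| Jaro-Winkler | Personal names; gives extra weight to matching prefixes |
-| Soundex / Metaphone | Phonetic similarity ("Jon" = "John") |
-| Token-based (TF-IDF) | Multi-word phrases |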
-
-### 3. Field-specific Normalization
-| Field type | Normalization rule |
-|----------|------------|
-| Email | `lower().strip()` |
-| Phone | `re.sub(r"[^\d+]", "", value)` → E.164 format |
-| Name | Nickname expansion + lowercase |
-| Address | Street abbreviations (St→Street), directionals (NE→Northeast) |
-
-## Example
-```
-Record A: "Bill Smith", wsmith@acme.com, +1-555-0142
-Record B: "William Smith", wsmith@acme.com, +15550142
-        ↓ Normalize + Score
-Email:     1.0 (exact match)
-Name:      0.82 (Bill→William nickname expansion)
-Phone:     1.0 (E.164 normalized)
-────────────────────────────────
-Total:     0.94 confidence → approaches, but does not exceed, the auto-merge threshold (> 0.95)
-```
-
-## Relationship to Related Concepts
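- [[Fuzzy-Matching]] is the core technique of the [[Identity-Resolution]] scoring layer
- After [[Blocking]] narrows down the candidate pairs, [[Fuzzy-Matching]] performs fine-grained field comparison
- [[Confidence-Score]] combines the fuzzy match scores of all fields to reach the final decision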
+---
+title: "Fuzzy Matching"
+type: concept
+tags: ["identity-resolution", "string-similarity", "normalization", "entity-matching"]
+sources: ["identity-graph-operator"]
+last_updated: 2026-04-25
+---
+
+# Fuzzy Matching
+
+## Definition
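+The ability to handle records that refer to the same entity but are written differently: normalization and similarity algorithms identify superficially different records as the same entity. It is one of the core challenges of identity resolution.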
+
+## Core Techniques
+
+### 1. Nickname Normalization
+```python
+nicknames = {
+    "bill": "william", "bob": "robert", "jim": "james",
+    "mike": "michael", "dave": "david", "joe": "joseph",
+    "tom": "thomas", "dick": "richard", "jack": "john",
+}
+# "Bill Smith" → "william smith"
+```
+
+### 2. String Similarity
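+| Algorithm | Use case |
+|------|----------|
+| Levenshtein Distance | Character-level edit distance |
+| Jaro-Winkler | Personal names; gives extra weight to matching prefixes |
+| Soundex / Metaphone | Phonetic similarity ("Jon" = "John") |
+| Token-based (TF-IDF) | Multi-word phrases |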
+
+### 3. Field-specific Normalization
+| Field type | Normalization rule |
+|----------|------------|
+| Email | `lower().strip()` |
+| Phone | `re.sub(r"[^\d+]", "", value)` → E.164 format |
+| Name | Nickname expansion + lowercase |
+| Address | Street abbreviations (St→Street), directionals (NE→Northeast) |
+
+## Example
+```
+Record A: "Bill Smith", wsmith@acme.com, +1-555-0142
+Record B: "William Smith", wsmith@acme.com, +15550142
+        ↓ Normalize + Score
+Email:     1.0 (exact match)
+Name:      0.82 (Bill→William nickname expansion)
+Phone:     1.0 (E.164 normalized)
+────────────────────────────────
+Total:     0.94 confidence → approaches, but does not exceed, the auto-merge threshold (> 0.95)
+```
+
+## Relationship to Related Concepts
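+- [[Fuzzy-Matching]] is the core technique of the [[Identity-Resolution]] scoring layer
+- After [[Blocking]] narrows down the candidate pairs, [[Fuzzy-Matching]] performs fine-grained field comparison
+- [[Confidence-Score]] combines the fuzzy match scores of all fields to reach the final decision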
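
The normalization rules and the worked example in the page above can be sketched in a few lines of Python. This is a minimal illustration, not the wiki's actual implementation: the function names (`normalize_name`, `normalize_email`, `normalize_phone`, `combine_scores`) are hypothetical, and the equal-weight average is an assumption inferred from the example's arithmetic, since (1.0 + 0.82 + 1.0) / 3 rounds to 0.94. The per-field scores are taken as given from the example rather than computed.

```python
import re

# Nickname table copied from the wiki page.
NICKNAMES = {
    "bill": "william", "bob": "robert", "jim": "james",
    "mike": "michael", "dave": "david", "joe": "joseph",
    "tom": "thomas", "dick": "richard", "jack": "john",
}


def normalize_name(name):
    """Lowercase the name and expand any token found in the nickname table."""
    tokens = name.lower().split()
    return " ".join(NICKNAMES.get(token, token) for token in tokens)


def normalize_email(email):
    """Apply the page's email rule: lowercase and strip whitespace."""
    return email.lower().strip()


def normalize_phone(phone):
    """Apply the page's phone rule: keep only digits and '+'."""
    return re.sub(r"[^\d+]", "", phone)


def combine_scores(field_scores):
    """Assumption: equal-weight mean of the per-field scores, rounded to 2 places."""
    return round(sum(field_scores.values()) / len(field_scores), 2)


record_a = {"name": "Bill Smith", "email": "wsmith@acme.com", "phone": "+1-555-0142"}
record_b = {"name": "William Smith", "email": "wsmith@acme.com", "phone": "+15550142"}

# After normalization the email and phone fields of the two records are identical.
same_email = normalize_email(record_a["email"]) == normalize_email(record_b["email"])
same_phone = normalize_phone(record_a["phone"]) == normalize_phone(record_b["phone"])

# Field scores as listed in the example; the name score is not recomputed here.
example_scores = {"email": 1.0, "name": 0.82, "phone": 1.0}
total = combine_scores(example_scores)

print(normalize_name(record_a["name"]), same_email, same_phone, total)
```

A production scorer would likely weight fields differently (for example, trusting an exact email match more than a name match); the equal weights here only reproduce the numbers shown in the example.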