| title | type | tags | sources | last_updated |
| --- | --- | --- | --- | --- |
| Fuzzy Matching | concept | identity-resolution, string-similarity, normalization, entity-matching | | 2026-04-25 |
Fuzzy Matching
Definition
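The ability to handle records that refer to the same entity but express it with different text. Normalization and similarity algorithms are used to recognize superficially different records as one entity. This is one of the core challenges of identity resolution.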
Core Techniques
1. Nickname Normalization
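A minimal sketch of nickname normalization, assuming a small hand-written lookup table. The `NICKNAMES` dictionary and both function names are hypothetical and only illustrate the idea; a production table would be much larger and curated per locale.

```python
# Hypothetical nickname table for illustration only.
NICKNAMES = {
    "bob": "robert",
    "bobby": "robert",
    "rob": "robert",
    "bill": "william",
    "will": "william",
    "liz": "elizabeth",
    "beth": "elizabeth",
    "kate": "katherine",
}


def canonical_first_name(name: str) -> str:
    """Return the canonical form of a first name, or the lowercased input."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def same_first_name(a: str, b: str) -> bool:
    """True when both names resolve to the same canonical form."""
    return canonical_first_name(a) == canonical_first_name(b)


print(same_first_name("Bill", "William"))  # True
print(same_first_name("Bob", "Bill"))      # False
```

Names missing from the table fall through unchanged, so the lookup never makes two different unknown names equal.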
2. String Similarity
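| Algorithm | Use case |
| --- | --- |
| Levenshtein Distance | Character-level edit distance |
| Jaro-Winkler | Personal names; matching prefixes are weighted more heavily |
| Soundex / Metaphone | Phonetic similarity ("Jon" = "John") |
| Token-based (TF-IDF) | Multi-word phrases |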
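Two of the algorithms above can be sketched in plain Python. The block below is an illustrative implementation of Levenshtein distance (two-row dynamic programming) and American Soundex. The function names are assumptions of this sketch, and real systems would more likely rely on a tested library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Consonant groups used by American Soundex.
_SOUNDEX_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Four-character phonetic code: first letter plus three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (encoded + "000")[:4]


print(levenshtein("jon", "john"))        # 1
print(soundex("Jon"), soundex("John"))   # J500 J500
```

Edit distance treats "Jon" and "John" as one edit apart, while Soundex maps them to the same code, which is why the two families are often combined.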
3. Field-specific Normalization
| Field type | Normalization rule |
| --- | --- |
| Email | `lower().strip()` |
| Phone | `re.sub(r"[^\d+]", "", value)` → E.164 format |
| Name | Nickname expansion + lowercase |
| Address | Street abbreviations (St→Street), directionals (NE→Northeast) |
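The rules in the table can be sketched as small helper functions. Everything below is illustrative: the lookup tables are deliberately tiny, `default_country_code` is an assumed parameter, and the phone helper only approximates E.164 (it does not validate length or strip national trunk prefixes). Name normalization is left out because it depends on a nickname table.

```python
import re

# Hypothetical lookup tables, kept small for illustration.
STREET_ABBREVIATIONS = {
    "st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard",
}
DIRECTIONALS = {
    "n": "north", "s": "south", "e": "east", "w": "west",
    "ne": "northeast", "nw": "northwest",
    "se": "southeast", "sw": "southwest",
}


def normalize_email(value: str) -> str:
    """Lowercase and trim surrounding whitespace."""
    return value.lower().strip()


def normalize_phone(value: str, default_country_code: str = "1") -> str:
    """Keep digits only and prefix a country code when none is given."""
    cleaned = re.sub(r"[^\d+]", "", value)
    has_plus = cleaned.startswith("+")
    digits = cleaned.replace("+", "")
    if has_plus:
        return "+" + digits
    # Assumption: numbers without a leading + belong to the default country.
    return "+" + default_country_code + digits


def normalize_address(value: str) -> str:
    """Lowercase, drop punctuation, expand abbreviations and directionals."""
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    # Caveat: "st" can also mean "saint"; this sketch always picks "street".
    expanded = [STREET_ABBREVIATIONS.get(t, DIRECTIONALS.get(t, t))
                for t in tokens]
    return " ".join(expanded)


print(normalize_email("  Alice@Example.COM "))  # alice@example.com
print(normalize_phone("(415) 555-0132"))        # +14155550132
print(normalize_address("123 NE Main St."))     # 123 northeast main street
```

Normalizing each field before comparison lets exact matching catch many variants, leaving the similarity algorithms for the cases that remain.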
Example
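As an illustration only, the sketch below combines normalization with a similarity score to decide whether two records describe the same entity. The record layout, the `is_same_entity` rule, and the 0.85 threshold are assumptions made for this sketch. Similarity comes from the standard library's `difflib.SequenceMatcher`, which is a Ratcliff/Obershelp-style matcher rather than one of the algorithms listed under Core Techniques.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse all whitespace runs to single spaces."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_same_entity(r1: dict, r2: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: equal emails, or sufficiently similar names."""
    if normalize(r1["email"]) == normalize(r2["email"]):
        return True
    return similarity(r1["name"], r2["name"]) >= threshold


a = {"name": "Jon Smith", "email": "Jon.Smith@Example.com "}
b = {"name": "John Smith", "email": "jsmith@example.org"}
c = {"name": "Mary Jones", "email": "mary@example.com"}

print(is_same_entity(a, b))  # True: the names differ by a single letter
print(is_same_entity(a, c))  # False
```

The threshold trades false merges against missed matches, so it would normally be tuned on labeled pairs rather than fixed in advance.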
Relationship to Related Concepts