---
title: "Confidence Score"
type: concept
tags: ["identity-resolution", "decision-making", "threshold", "multi-agent"]
sources: ["identity-graph-operator"]
last_updated: 2026-04-25
---

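# Confidence Score

## Definition

The core metric for identity-resolution decisions: a merge confidence obtained by combining all field-level match evidence through a weighted sum. It is the metric that separates the three possible decisions: auto-merge, propose for review, or create a new entity.
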
## Calculation

```
confidence = Σ(score_i × weight_i) / Σ(weight_i)
```

Here `score_i` is the field-level fuzzy/exact match score (0–1) and `weight_i` is the reliability weight of that field.

### Example (from the Identity Graph Operator source code)

| Field | Record A value | Record B value | Normalizer | Comparator | Score | Weight |
|-------|----------------|----------------|------------|------------|-------|--------|
| email | wsmith@acme.com | wsmith@acme.com | email | exact | 1.0 | High |
| last_name | Smith | Smith | name | exact | 1.0 | High |
| first_name | William | Bill | name | nickname | 0.82 | Medium |
| phone | +155****0142 | +155****0142 | phone | exact | 1.0 | High |

Combined confidence = `1.0×0.3 + 1.0×0.3 + 0.82×0.2 + 1.0×0.2` = 0.964 ≈ **0.96** (the four weights sum to 1.0, so the denominator is 1).

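The formula and the worked example above can be sketched in Python as follows. This is a minimal illustration, not the Identity Graph Operator's actual code: the `FieldMatch` type and the `weighted_confidence` function are hypothetical names, and the numeric weights (0.3, 0.3, 0.2, 0.2) are taken from the expression above, since the table only lists qualitative weight levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMatch:
    """One field-level comparison result (hypothetical structure)."""
    field: str
    score: float   # fuzzy/exact match score in [0, 1]
    weight: float  # field reliability weight, expected to be positive


def weighted_confidence(matches: list[FieldMatch]) -> float:
    """Return sum(score_i * weight_i) / sum(weight_i)."""
    total_weight = sum(m.weight for m in matches)
    if total_weight <= 0:
        raise ValueError("at least one field with a positive weight is required")
    return sum(m.score * m.weight for m in matches) / total_weight


# Values from the example table; numeric weights are assumed from the expression.
example = [
    FieldMatch("email", 1.0, 0.3),
    FieldMatch("last_name", 1.0, 0.3),
    FieldMatch("first_name", 0.82, 0.2),
    FieldMatch("phone", 1.0, 0.2),
]
confidence = weighted_confidence(example)
print(f"{confidence:.3f}")  # 0.964
```

Because these weights already sum to 1.0, the division changes nothing here; it only matters when the weights are not pre-normalized.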
## Decision Thresholds

```
confidence > 0.95          → auto-merge (single agent, high confidence)
0.60 ≤ confidence ≤ 0.95   → propose for review (multi-agent collaboration)
confidence < 0.60          → create a new entity
```

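The threshold bands above map directly onto a small decision function. The sketch below is hypothetical: the `Decision` enum, the `decide` function, and the constant names are not from the source. Only the cut-offs 0.95 and 0.60 and their boundary handling come from the rules above.

```python
from enum import Enum


class Decision(Enum):
    AUTO_MERGE = "auto_merge"          # single agent, high confidence
    PROPOSE_REVIEW = "propose_review"  # multi-agent collaboration
    NEW_ENTITY = "new_entity"


AUTO_MERGE_THRESHOLD = 0.95  # strictly above this: merge automatically
REVIEW_THRESHOLD = 0.60      # from this value up to 0.95 inclusive: review


def decide(confidence: float) -> Decision:
    """Map a confidence in [0, 1] to one of the three decisions."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be within [0, 1], got {confidence}")
    if confidence > AUTO_MERGE_THRESHOLD:
        return Decision.AUTO_MERGE
    if confidence >= REVIEW_THRESHOLD:
        return Decision.PROPOSE_REVIEW
    return Decision.NEW_ENTITY


for value in (0.964, 0.95, 0.60, 0.59):
    print(value, decide(value).value)
```

Note the boundaries: exactly 0.95 lands in the review band rather than auto-merge, and exactly 0.60 is still sent for review rather than becoming a new entity.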
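## Field Reliability Weights

| Field | Weight | Reason |
|-------|--------|--------|
| Email | High | Nearly unique; changing it requires deliberate action |
| Phone | High | Requires verification; costly to change |
| Name | Medium | Different people often share a name, so it must be combined with other fields |
| Address | Low | Addresses change frequently (e.g. moving house) |
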
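## Why Thresholds Matter
- **Preventing false positives** (False Merge): wrongly merging two different people (e.g. two people both named "John Smith"). A high threshold combined with field-level evidence guards against this.
- **Preventing false negatives** (Missed Match): splitting one person (e.g. "Bill Smith" / "William Smith") into separate entities. The middle threshold band triggers a review proposal instead of an outright rejection.
- **Explainability**: per-field evidence makes every decision auditable by other agents and by humans.
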
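To make the explainability point concrete, here is one way per-field evidence could be rendered as a human-readable audit trail. Everything in it is an assumption for illustration: the `explain` function, the tuple layout, and the output format are hypothetical, and the sample values reuse the worked example with the numeric weights from its expression.

```python
def explain(evidence: list[tuple[str, str, float, float]]) -> list[str]:
    """Render one audit line per field, plus the combined confidence.

    Each evidence item is (field, comparator, score, weight).
    """
    total_weight = sum(weight for _, _, _, weight in evidence)
    if total_weight <= 0:
        raise ValueError("at least one field with a positive weight is required")
    lines = []
    for field, comparator, score, weight in evidence:
        share = score * weight / total_weight  # this field's contribution
        lines.append(
            f"{field}: {comparator} match, score={score:.2f}, "
            f"weight={weight:.2f}, contributes {share:.3f}"
        )
    confidence = sum(s * w for _, _, s, w in evidence) / total_weight
    lines.append(f"confidence={confidence:.3f}")
    return lines


audit = explain([
    ("email", "exact", 1.0, 0.3),
    ("last_name", "exact", 1.0, 0.3),
    ("first_name", "nickname", 0.82, 0.2),
    ("phone", "exact", 1.0, 0.2),
])
print("\n".join(audit))
```

Listing each field's contribution next to the final score lets a reviewer see, for instance, that the nickname match on `first_name` is the only field pulling the result below 1.0.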