nexus/wiki/concepts/Discrimination-Metrics.md
2026-05-03 05:42:12 +08:00

---
title: "Discrimination Metrics"
type: concept
tags: [model-evaluation, classification-metrics, model-performance]
sources:
- specialized-model-qa
last_updated: 2026-05-29
---
## Definition
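Discrimination metrics measure a model's ability to separate positive examples from negative ones: given one random positive and one random negative, how likely is the model to give the positive the higher score? Unlike calibration, which measures how accurate the predicted probabilities are, discrimination measures how correct the ranking is.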
## Core Metrics
### AUC (Area Under the ROC Curve)
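- Area under the ROC curve. The range is [0, 1]; a model that ranks better than chance falls in (0.5, 1.0], and a value below 0.5 means the ranking is inverted
- 0.5 = random guessing; 1.0 = perfect separation
- Interpretation: given a random positive and a random negative, AUC is the probability that the model scores the positive higher
- **Advantage**: threshold-independent and relatively robust to class imbalance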
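The pairwise interpretation above can be checked directly. The following is a minimal sketch in pure Python, assuming binary labels coded as 0/1 and counting tied scores as half a win; the function name `pairwise_auc` and the toy data are illustrative only. In practice `sklearn.metrics.roc_auc_score` is the usual choice.

```python
from itertools import product


def pairwise_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A pair counts 1 if the positive scores higher, 0.5 on a tie, 0 otherwise.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy data (illustrative): 3 of the 4 positive/negative pairs are ordered correctly.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = pairwise_auc(y_true, y_score)
gini = 2 * auc - 1
print(auc, gini)
```

The quadratic pair enumeration is only meant to make the definition visible; rank-based implementations give the same value far more cheaply on large samples.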
### Gini Coefficient
- $Gini = 2 \times AUC - 1$
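- Range [0, 1.0] when AUC is in [0.5, 1.0]; a linear rescaling of AUC, so the two rank models identically
- Widely used in finance (credit card scoring, credit risk control)
- A standard metric in regulatory reporting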
### KS Statistic (Kolmogorov-Smirnov)
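- The maximum vertical distance between the two cumulative distribution functions of the score (positives vs. negatives)
- $KS = \max_t |F_{pos}(t) - F_{neg}(t)|$
- Range [0, 1.0]; KS > 0.2 is usually taken to indicate discriminating power
- **Advantage**: no threshold has to be chosen in advance, and the maximizing $t$ indicates where the best split point lies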
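To make the formula and the split-point remark concrete, here is a small sketch that evaluates both empirical CDFs at every observed score and keeps the largest gap. It assumes 0/1 labels and defines $F(t)$ as the share of scores less than or equal to $t$; `ks_statistic` and the sample data are hypothetical names and values chosen for illustration.

```python
from bisect import bisect_right


def ks_statistic(y_true, y_score):
    """Return (KS, t) where t is the score at which the CDF gap is largest."""
    pos = sorted(s for y, s in zip(y_true, y_score) if y == 1)
    neg = sorted(s for y, s in zip(y_true, y_score) if y == 0)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    best_gap, best_t = 0.0, None
    for t in sorted(set(y_score)):
        # Empirical CDFs: fraction of each class with score <= t.
        f_pos = bisect_right(pos, t) / len(pos)
        f_neg = bisect_right(neg, t) / len(neg)
        gap = abs(f_pos - f_neg)
        if gap > best_gap:
            best_gap, best_t = gap, t
    return best_gap, best_t


# Toy data (illustrative): negatives mostly score low, positives mostly high.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.6, 0.5, 0.55, 0.8, 0.9]
ks, threshold = ks_statistic(y_true, y_score)
print(ks, threshold)
```

The returned statistic is the same quantity that `scipy.stats.ks_2samp` reports for the two score samples; the sketch additionally exposes the score at which the gap peaks.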
### Additional Metrics
| Metric | Formula | Use case |
|--------|---------|----------|
| F1 Score | $2 \times \frac{precision \times recall}{precision + recall}$ | Class imbalance |
| RMSE | $\sqrt{\frac{1}{N}\sum_i (y_i - \hat{y}_i)^2}$ | Regression models |
| Log Loss | $-\frac{1}{N}\sum_i [y_i \log p_i + (1-y_i)\log(1-p_i)]$ | Quality of predicted probabilities |
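The three formulas in the table can be written out in a few lines. This is a sketch in pure Python under stated assumptions: labels are 0/1, F1 is defined as 0 when there are no true positives, and probabilities are clipped by a small `eps` (an illustrative choice) to avoid `log(0)`. The function names are hypothetical.

```python
import math


def f1_binary(y_true, y_pred):
    """F1 from hard 0/1 predictions; returns 0.0 when there are no true positives."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(y_true, y_hat):
    """Root mean squared error for real-valued predictions."""
    return math.sqrt(sum((y - h) ** 2 for y, h in zip(y_true, y_hat)) / len(y_true))


def log_loss_binary(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() stays finite
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


print(f1_binary([1, 0, 1, 1], [1, 1, 0, 1]))
print(rmse([3.0, 5.0], [1.0, 5.0]))
print(log_loss_binary([1, 0], [0.5, 0.5]))
```

Note that F1 needs thresholded predictions, while log loss consumes the raw probabilities, which is why the table lists it under probability quality.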
## Usage
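```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import roc_auc_score


def discrimination_report(y_true, y_score):
    """Return AUC, Gini, and KS for binary labels (0/1) and model scores."""
    # Convert to arrays so boolean masking also works for plain lists.
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    auc = roc_auc_score(y_true, y_score)
    gini = 2 * auc - 1
    # Two-sample KS: largest gap between the positive and negative score CDFs.
    ks_stat, ks_pval = ks_2samp(y_score[y_true == 1], y_score[y_true == 0])
    return {
        "AUC": round(auc, 4),
        "Gini": round(gini, 4),
        "KS": round(ks_stat, 4),
        "KS_pvalue": round(ks_pval, 6),
    }
```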
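## Application in Model QA
The Model QA Specialist runs the following discrimination audits:
1. **Full data-slice analysis**: compute AUC/Gini/KS separately on the four data slices Train/Validation/Test/OOT
2. **Subgroup performance**: test separately on protected attributes such as gender, age, and region to surface fairness risks
3. **Temporal stability**: track AUC/Gini trends across OOT windows to identify performance decay
4. **Champion-challenger comparison**: proposed model vs. incumbent production model, quantifying the relative lift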
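The slice-level audit can be sketched as a simple group-by. The example below assumes each record is a `(slice_name, label, score)` tuple, a layout invented here for illustration, and uses a pure-Python pairwise AUC so the block is self-contained. The same grouping works for subgroups or OOT windows by changing the key.

```python
from collections import defaultdict
from itertools import product


def pairwise_auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("slice needs both classes to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def auc_by_slice(records):
    """Group (slice_name, label, score) records and report AUC and Gini per slice."""
    grouped = defaultdict(lambda: ([], []))
    for name, label, score in records:
        grouped[name][0].append(label)
        grouped[name][1].append(score)
    report = {}
    for name, (labels, scores) in grouped.items():
        auc = pairwise_auc(labels, scores)
        report[name] = {"AUC": round(auc, 4), "Gini": round(2 * auc - 1, 4)}
    return report


# Toy records (illustrative): perfect ranking on Train, weaker on OOT.
records = [
    ("Train", 0, 0.1), ("Train", 0, 0.2), ("Train", 1, 0.8), ("Train", 1, 0.9),
    ("OOT", 0, 0.1), ("OOT", 0, 0.4), ("OOT", 1, 0.35), ("OOT", 1, 0.8),
]
report = auc_by_slice(records)
auc_drop = report["Train"]["AUC"] - report["OOT"]["AUC"]
print(report, auc_drop)
```

A large gap between the Train and OOT rows is the kind of signal the temporal stability check looks for; what counts as "large" is a policy decision that this sketch does not set.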
## Relationship
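- **Depended on by** [[Calibration-Testing]]: confirm discrimination first (KS > 0.2, AUC > 0.7), then test calibration
- **Depends on** [[Population-Stability-Index]]: PSI monitors input stability; discrimination metrics monitor output health
- **Depends on** [[SHAP]]: discrimination metrics answer "is it good?"; SHAP explains "why"
- **Supports** [[specialized-model-qa]] (Source): the core performance-evaluation step of the Model QA Specialist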
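The ordering between discrimination and calibration testing can be expressed as a small gate. This is a sketch only: the function name `ready_for_calibration` is hypothetical, and the default thresholds simply restate the KS > 0.2 and AUC > 0.7 rule of thumb from the list above.

```python
def ready_for_calibration(auc, ks, min_auc=0.7, min_ks=0.2):
    """Return (ok, reasons); ok is True only if both thresholds are exceeded."""
    reasons = []
    if auc <= min_auc:
        reasons.append(f"AUC {auc:.3f} does not exceed {min_auc}")
    if ks <= min_ks:
        reasons.append(f"KS {ks:.3f} does not exceed {min_ks}")
    return (not reasons, reasons)


ok, reasons = ready_for_calibration(auc=0.65, ks=0.35)
print(ok, reasons)
```

Returning the reasons alongside the boolean keeps the audit trail readable when the gate fails.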
## Key Insights
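- **Discrimination vs. calibration**: a high-AUC model can still be badly miscalibrated in specific probability ranges; the two must be evaluated together
- **KS vs AUC**: KS is more sensitive to separation in the tails (catching bad accounts); AUC weighs ranking quality more evenly across the whole score range
- **Regulatory threshold**: financial risk control usually requires Gini > 0.4 (equivalent to AUC > 0.7) before a model can go live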
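The first insight can be demonstrated with a monotone transform: raising every score to the fourth power keeps the ranking, and therefore AUC, unchanged, while the log loss changes. The sketch below is pure Python with illustrative toy data and assumes all probabilities lie strictly between 0 and 1.

```python
import math
from itertools import product


def pairwise_auc(y_true, y_score):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def log_loss_binary(y_true, p_pred):
    """Mean negative log-likelihood; assumes 0 < p < 1 for every prediction."""
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, p_pred)
    )
    return -total / len(y_true)


y_true = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
q = [x ** 4 for x in p]  # strictly increasing on (0, 1), so the ranking is preserved

auc_p, auc_q = pairwise_auc(y_true, p), pairwise_auc(y_true, q)
ll_p, ll_q = log_loss_binary(y_true, p), log_loss_binary(y_true, q)
print(auc_p, auc_q, ll_p, ll_q)
```

Because AUC, Gini, and KS depend only on the ordering of scores, none of them can detect this kind of distortion, which is why a separate calibration test is needed.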