---
title: "Model QA Specialist"
type: source
tags: [agent, the-agency, ml-ops, model-audit]
date: 2026-04-20
---

## Source File
- [[raw/Agent/agency-agents/specialized/specialized-model-qa.md]]

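## Summary
- **Core topic**: an independent model QA specialist agent that performs end-to-end audits of machine learning and statistical models
- **Problem domain**: model lifecycle auditing, covering documentation, data, features, model construction, calibration, interpretability, fairness, and business impact
- **Methods/mechanisms**: a 10-stage audit process that includes PSI calculation, SHAP analysis, the Hosmer-Lemeshow calibration test, discrimination metrics, and Gini/KS statistics
- **Conclusion/value**: gives organizations an evidence-driven assessment of model quality, quantifying the severity of each issue and recommending fixes
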
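## Key Claims
- Model QA Specialist ← performs end-to-end audits ← covering documentation governance, data reconstruction, feature analysis, model replication, calibration testing, and interpretability analysis
- PSI (Population Stability Index) ← quantifies feature distribution shift ← used to check the stability of input variables across time windows
- SHAP (SHapley Additive exPlanations) ← provides global and local interpretability ← analyzes feature contributions and prediction drivers
- Hosmer-Lemeshow test ← assesses probability calibration quality ← a p-value < 0.05 indicates statistically significant miscalibration
- Independence principle ← never audits a model it built itself ← stays objective and challenges every assumption with data
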
## Key Quotes
> "You treat every model as guilty until proven sound." (core principle of the Model QA Specialist)
> "Every finding must include: observation, evidence, impact assessment, and recommendation." (evidence-driven finding requirement)
> "Never state 'the model is wrong' without quantifying the impact." (quantification principle)

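The second quote fixes the shape of an audit finding, which can be modeled as a small record type. A minimal sketch follows; the `Finding` and `Severity` names, the three-level severity scale, and the validation rule are assumptions made for illustration and are not part of the source agent definition.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Hypothetical scale; the source does not define severity levels.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Finding:
    observation: str
    evidence: str
    impact: str
    recommendation: str
    severity: Severity = Severity.MEDIUM

    def __post_init__(self) -> None:
        # Reject findings that leave any of the four required parts blank.
        for name in ("observation", "evidence", "impact", "recommendation"):
            if not getattr(self, name).strip():
                raise ValueError(f"finding is missing its {name}")
```

Validating at construction time means an incomplete finding fails early instead of reaching a report.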
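## Key Concepts
- [[Population Stability Index (PSI)]]: a measure of the difference between two distributions; < 0.10 means no significant shift, 0.10–0.25 moderate shift, ≥ 0.25 significant shift
- [[SHAP Analysis]]: a game-theoretic method for attributing predictions to features, with global (beeswarm/bar) and local (waterfall/force) explanations
- [[Calibration Testing]]: checks how accurate predicted probabilities are, using the Hosmer-Lemeshow test, the Brier score, and reliability diagrams
- [[Discrimination Metrics]]: measures of a model's ability to separate classes (discriminatory power, not bias), including the Gini coefficient, the KS statistic, and AUC
- [[Partial Dependence Plots]]: show the marginal relationship between a feature and the prediction; used to verify monotonicity and detect nonlinear thresholds
- [[Fairness Audit]]: tests for discriminatory bias across protected attributes, using criteria such as demographic parity and equalized odds
- [[Model Audit]]: a 10-stage methodology for systematically evaluating a model across its full lifecycle
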
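The two fairness criteria named under [[Fairness Audit]] can be expressed as gaps between groups. The sketch below is a hypothetical helper (`fairness_gaps` is not from the source). It assumes binary decisions (0/1) rather than probabilities, and that every group contains at least one positive and one negative label.

```python
import numpy as np


def fairness_gaps(y_true, y_pred, group) -> dict:
    """Largest between-group gaps in selection rate, TPR and FPR."""
    y_true, y_pred, group = (np.asarray(a) for a in (y_true, y_pred, group))
    selection, tpr, fpr = [], [], []
    for g in np.unique(group):
        in_group = group == g
        # Selection rate: P(pred = 1 | group).
        selection.append(y_pred[in_group].mean())
        # True-positive rate: P(pred = 1 | y = 1, group).
        tpr.append(y_pred[in_group & (y_true == 1)].mean())
        # False-positive rate: P(pred = 1 | y = 0, group).
        fpr.append(y_pred[in_group & (y_true == 0)].mean())
    return {
        # Demographic parity holds when selection rates match across groups.
        "demographic_parity_diff": float(max(selection) - min(selection)),
        # Equalized odds holds when both TPR and FPR match across groups.
        "equalized_odds_diff": float(max(max(tpr) - min(tpr), max(fpr) - min(fpr))),
    }
```

A gap of 0 means the criterion is met exactly. Any tolerance applied to a non-zero gap is a policy choice that the source does not specify.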
## Key Entities
- [[Model QA Specialist]]: **primary subject**; the independent model-audit specialist agent in The Agency project, with a skeptical but collaborative persona

## Connections
- [[Model QA Specialist]] ← belongs to ← [[The Agency]]
- [[Model QA Specialist]] ← uses ← [[SHAP Analysis]]
- [[Model QA Specialist]] ← uses ← [[Population Stability Index (PSI)]]
- [[Model QA Specialist]] ← uses ← [[Calibration Testing]]
- [[Model QA Specialist]] ← produces ← [[Fairness Audit]]
- [[Model QA Specialist]] ← applies to ← [[ML Ops]]

## Contradictions
- Against other agent roles: none found. **Corporate Training Designer** also belongs to The Agency, but the two domains do not conflict

## Technical Deliverables

### Population Stability Index (PSI) Calculation
```python
import numpy as np
import pandas as pd


def compute_psi(expected: pd.Series, actual: pd.Series, bins: int = 10) -> float:
    expected, actual = expected.dropna(), actual.dropna()
    # Bin edges come from the baseline quantiles; tied edges are collapsed.
    edges = np.unique(np.percentile(expected, np.linspace(0, 100, bins + 1)))
    # Open the outer bins so out-of-range values in `actual` are still counted.
    edges[0], edges[-1] = -np.inf, np.inf
    n_bins = len(edges) - 1
    expected_counts = np.histogram(expected, bins=edges)[0]
    actual_counts = np.histogram(actual, bins=edges)[0]
    # Add-one smoothing keeps empty bins from producing log(0) or division by zero.
    exp_pct = (expected_counts + 1) / (expected_counts.sum() + n_bins)
    act_pct = (actual_counts + 1) / (actual_counts.sum() + n_bins)
    psi = np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct))
    return round(float(psi), 6)
```

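A PSI value is usually read against the three bands listed under Key Concepts. The sketch below encodes those bands and filters a set of per-feature PSI values. The helper names `psi_band` and `flag_shifted_features` and the feature names are hypothetical, and the numbers are illustrative inputs rather than measured results.

```python
def psi_band(psi: float) -> str:
    """Map a PSI value to the three bands listed under Key Concepts."""
    if psi < 0.10:
        return "no significant shift"
    if psi < 0.25:
        return "moderate shift"
    return "significant shift"


def flag_shifted_features(psi_by_feature: dict[str, float]) -> dict[str, str]:
    """Keep only features whose PSI reaches the moderate band or higher."""
    return {
        name: psi_band(value)
        for name, value in psi_by_feature.items()
        if value >= 0.10
    }


# Illustrative values only; in practice each one would come from compute_psi.
print(flag_shifted_features({"feature_a": 0.03, "feature_b": 0.18, "feature_c": 0.40}))
# → {'feature_b': 'moderate shift', 'feature_c': 'significant shift'}
```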
### Discrimination Metrics (Gini & KS)
```python
import pandas as pd
from scipy.stats import ks_2samp
from sklearn.metrics import roc_auc_score


def discrimination_report(y_true: pd.Series, y_score: pd.Series) -> dict:
    auc = roc_auc_score(y_true, y_score)
    gini = 2 * auc - 1  # Gini coefficient derived from AUC
    ks_stat, _ = ks_2samp(y_score[y_true == 1], y_score[y_true == 0])  # max gap between class score CDFs
    return {"AUC": round(auc, 4), "Gini": round(gini, 4), "KS": round(ks_stat, 4)}
```

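Since the audit includes replicating results independently, it can help to recompute KS and Gini from their definitions, without the library calls used above. The following NumPy-only sketch is one way to do that. The function name `ks_gini_check` is hypothetical, and the pairwise AUC computation uses memory proportional to positives × negatives, so it only suits small samples.

```python
import numpy as np


def ks_gini_check(y_true, y_score) -> dict:
    """Recompute KS and Gini directly from their definitions."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = np.sort(y_score[y_true == 1])
    neg = np.sort(y_score[y_true == 0])

    # KS: largest gap between the two empirical CDFs, evaluated at every score.
    grid = np.unique(y_score)
    cdf_pos = np.searchsorted(pos, grid, side="right") / len(pos)
    cdf_neg = np.searchsorted(neg, grid, side="right") / len(neg)
    ks = float(np.max(np.abs(cdf_neg - cdf_pos)))

    # AUC: probability that a positive outscores a negative, ties counted as half.
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    auc = float(wins + 0.5 * ties)

    return {"AUC": auc, "Gini": 2 * auc - 1, "KS": ks}
```

If these values disagree with `discrimination_report` beyond rounding, the inputs (label coding, missing scores) are the first thing to check.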
### Hosmer-Lemeshow Calibration Test
```python
import pandas as pd
from scipy.stats import chi2


def hosmer_lemeshow_test(y_true: pd.Series, y_pred: pd.Series, groups: int = 10) -> dict:
    data = pd.DataFrame({"y": y_true, "p": y_pred})
    data["bucket"] = pd.qcut(data["p"], groups, duplicates="drop")
    agg = data.groupby("bucket", observed=True).agg(
        n=("y", "count"), observed=("y", "sum"), expected=("p", "sum")
    )
    # HL statistic: sum over buckets of (O - E)^2 / (E * (1 - E / n)).
    variance = agg["expected"] * (1 - agg["expected"] / agg["n"])
    hl_stat = float((((agg["observed"] - agg["expected"]) ** 2) / variance).sum())
    dof = len(agg) - 2
    p_value = float(chi2.sf(hl_stat, dof))  # upper tail, same as 1 - cdf
    return {"HL_statistic": round(hl_stat, 4), "p_value": round(p_value, 6), "calibrated": bool(p_value >= 0.05)}
```
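Calibration testing in this note also covers the Brier score and reliability diagrams, which the source snippet above does not implement. The sketch below is a minimal NumPy-only version of both. The function names and the choice of equal-width probability bins are assumptions, and the table holds the numbers a reliability diagram would plot rather than drawing the plot itself.

```python
import numpy as np


def brier_score(y_true, y_prob) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))


def reliability_table(y_true, y_prob, n_bins: int = 10) -> list:
    """Rows of (mean predicted probability, observed event rate, count) per bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges map each probability to a bin index in 0 .. n_bins - 1.
    bin_index = np.digitize(y_prob, edges[1:-1])
    rows = []
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.any():  # skip empty bins
            rows.append((
                float(y_prob[in_bin].mean()),
                float(y_true[in_bin].mean()),
                int(in_bin.sum()),
            ))
    return rows
```

For a well-calibrated model the first two columns of each row stay close to each other; plotting one against the other gives the reliability diagram.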