---
title: "Model QA Specialist"
type: source
tags: [model-qa, ml-audit, interpretability, calibration, shap, psi, the-agency, specialized]
date: 2026-05-29
---
## Source File
- [[Agent/agency-agents/specialized/specialized-model-qa.md]]
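## Summary
- Core topic: the Model QA Specialist, an independent expert who audits ML models end to end across their full lifecycle; part of The Agency's Specialized department
- Problem domain: model quality management, model risk assessment, model interpretability, and model fairness auditing
- Methods/mechanisms: a ten-domain QA methodology (documentation and governance → data reconstruction → label analysis → segmentation assessment → feature analysis → model replication → calibration testing → performance monitoring → interpretability and fairness → business impact); a technical toolkit of PSI, SHAP, PDP, Hosmer-Lemeshow, and Gini/KS; a four-phase workflow; severity grading (High/Medium/Low/Info); and a QA report delivery template
- Conclusion/value: delivers evidence-driven model audits with zero subjective opinion, where every finding must quantify its impact. Success criteria: a finding confirmation rate of 95% or higher, 100% coverage of the QA domains, replicated output deviating from the original by less than 1%, and zero post-release failures
## Key Claims
- The Model QA Specialist must be independent of the models it audits: it never audits a model it helped build, stays objective, and challenges every assumption with data
- Every analysis must be fully reproducible: each step from raw data to final output needs a versioned script, with no manual intervention
- Every finding must include an observation, evidence, an impact assessment, and a recommendation, and its severity must be graded on four levels: High, Medium, Low, or Info
- Model QA covers ten domains: documentation and governance review → data reconstruction and quality → target/label analysis → segmentation and cohort assessment → feature analysis and engineering → model replication and construction → calibration testing → performance and monitoring → interpretability and fairness → business impact and communication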
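
The finding structure described above could be captured as a small record type. This is only a minimal sketch: the class and field names (`Finding`, `Severity`, `impact_assessment`) are hypothetical and do not come from the source, which specifies only the four required parts and the four severity levels.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # The four severity levels named in the source.
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"
    INFO = "Info"


@dataclass(frozen=True)
class Finding:
    # Hypothetical record type: one field per required part of a finding.
    observation: str
    evidence: str
    impact_assessment: str
    recommendation: str
    severity: Severity

    def __post_init__(self):
        # Reject findings that leave any of the four required parts blank.
        for name in ("observation", "evidence", "impact_assessment", "recommendation"):
            if not getattr(self, name).strip():
                raise ValueError(f"finding is missing its {name}")
```

Validating at construction time means an incomplete finding fails before it can reach a report, which matches the rule that no finding may be stated without quantified impact.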
## Key Quotes
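> "You treat every model as guilty until proven sound." (Core audit philosophy: presumption of guilt; a model passes only when the evidence supports it)
> "PSI >= 0.25 indicates significant population shift, action required." (The PSI red-line threshold; at or above it, intervention is required)
> "Every finding must include: observation, evidence, impact assessment, and recommendation. Never state 'the model is wrong' without quantifying the impact." (Evidence-driven principle: quality assessment allows no subjective assertions)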
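
To make the PSI threshold concrete, here is a minimal sketch of how the index might be computed and banded. The source gives only the thresholds, so the equal-width binning, the bin count, and the `eps` floor for empty bins are assumptions, and the function names `psi` and `psi_band` are hypothetical. Many practitioners bin by baseline deciles instead.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant baseline

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp values outside the baseline range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def psi_band(value):
    # Bands as stated in this note: <0.10 green, 0.10 to 0.25 amber, >=0.25 red.
    if value < 0.10:
        return "green"
    if value < 0.25:
        return "amber"
    return "red"
```

Two identical samples give a PSI of zero, and the value grows as the recent distribution moves away from the baseline.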
## Key Concepts
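- [[Population Stability Index (PSI)]]: measures distribution drift of a feature or prediction score between time windows. Thresholds: below 0.10 green, 0.10 to 0.25 amber, 0.25 or above red
- [[SHAP-Value-Analysis]]: quantifies feature contributions through global SHAP analysis (beeswarm and bar importance plots) and local analysis (waterfall plots); the core interpretability technique
- [[Partial Dependence Plots (PDP)]]: show each feature's marginal effect on the prediction; used to validate the nonlinear relationships and feature interactions the model has learned
- [[Hosmer-Lemeshow-Test]]: a statistical test of probability calibration; a p-value below 0.05 indicates significant miscalibration
- [[Discrimination Metrics (Gini & KS)]]: the AUC, Gini, and KS statistics measure a classifier's ability to separate positive from negative samples
- [[Calibration Testing]]: verifies the reliability of predicted probabilities using reliability diagrams, the Brier score, and similar tools
- [[Champion-Challenger Framework]]: a benchmarking framework that scores the model under audit (new) and the production model (old) in parallel and compares the results
- [[Fairness Audit]]: tests demographic parity and equalized odds across protected attributes (race, gender, age, and so on)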
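
The discrimination and calibration metrics listed above can be illustrated with a short standard-library sketch. It assumes binary labels coded as 0 and 1, and it uses a quadratic pairwise AUC for readability rather than speed. The function names are hypothetical and are not taken from the source.

```python
def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    # Standard identity: Gini = 2 * AUC - 1.
    return 2 * auc(y_true, scores) - 1


def ks_statistic(y_true, scores):
    """Largest gap between the cumulative score distributions of the two classes."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def brier_score(y_true, probs):
    # Mean squared error between predicted probability and outcome; lower is better.
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)
```

Gini and KS describe how well the scores rank the two classes, while the Brier score also penalizes probabilities that are well ordered but poorly calibrated, which is why an audit would typically report both kinds of metric.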
## Key Entities
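- [[The Agency]]: the parent organization, which provides the Specialized department's network of multi-domain expert agents
- [[Agentic-Identity-Trust-Architect]]: identity and trust verification infrastructure; collaborates with the Model QA Specialist on model access permissions and identity authentication
- [[Compliance-Auditor]]: compliance audit expert; collaborates with the Model QA Specialist on regulatory compliance, since QA findings may trigger a compliance review
- [[Identity-Graph-Operator]]: identity graph operator; collaborates with the Model QA Specialist on aligning data identities
- [[Document-Generator-Agent]]: document generation agent; collaborates with the Model QA Specialist on the output format of QA reports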
## Connections
- [[The Agency]] ← provides the agent network for ← [[Model QA Specialist]] (a Specialized department agent)
- [[Population Stability Index (PSI)]] ← is complemented by ← [[SHAP-Value-Analysis]]
- [[SHAP-Value-Analysis]] ← informs ← [[Fairness Audit]]
- [[Hosmer-Lemeshow-Test]] ← used in ← [[Calibration Testing]]
- [[Champion-Challenger Framework]] ← benchmarked by ← [[Discrimination Metrics (Gini & KS)]]
- [[Partial Dependence Plots (PDP)]] ← used for ← [[Feature Analysis]]
- [[Model QA Specialist]] ← produces QA reports consumed by ← [[Compliance-Auditor]]
- [[Model QA Specialist]] ← uses templates from ← [[Document-Generator-Agent]]
## Contradictions
- No substantive conflicts: at the technical level, the Model QA Specialist's QA methodology complements rather than competes with the other sources in the wiki