---
title: "LLMasJudge"
type: concept
tags: ["evaluation", "llm-evaluation", "quality-assurance"]
sources: ["engineering-autonomous-optimization-architect"]
last_updated: 2026-04-26
---
## Aliases
- LLM as a Judge
- LLM-as-Judge
- LLM-as-a-Judge Grading
## Definition
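LLM-as-a-Judge is the scoring mechanism of [[AutonomousOptimizationArchitect]]: an independent LLM (for example, Claude Opus) acts as a "judge" that objectively scores the outputs of the experimental model and the production model, avoiding the subjective bias of human review. Scoring dimensions include JSON format correctness (5 points), latency (3 points), and hallucination detection (-10 points), among others.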
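
As a rough illustration of how such a rubric could be turned into a number, here is a minimal sketch. The three point values come from the definition above. Everything else is an assumption made for the example: the `Sample` type, the `score` function, the all-or-nothing award of each dimension, and the `LATENCY_BUDGET_MS` threshold (the source gives no latency cutoff).

```python
import json
from dataclasses import dataclass

# Point values taken from the definition above.
JSON_POINTS = 5
LATENCY_POINTS = 3
HALLUCINATION_PENALTY = -10

# Hypothetical latency budget; the source does not state a threshold.
LATENCY_BUDGET_MS = 1000.0


@dataclass
class Sample:
    output: str          # raw model output
    latency_ms: float    # measured response time
    hallucinated: bool   # verdict returned by the judge LLM (assumed boolean)


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def score(sample: Sample) -> int:
    """Sum the rubric dimensions for one model output."""
    total = 0
    if is_valid_json(sample.output):
        total += JSON_POINTS
    if sample.latency_ms <= LATENCY_BUDGET_MS:
        total += LATENCY_POINTS
    if sample.hallucinated:
        total += HALLUCINATION_PENALTY
    return total
```

Under these assumptions, a fast, well-formed, non-hallucinated answer scores 8, and a single hallucination outweighs both positive dimensions combined.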
## Mechanism
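1. **Rubric defined up front**: Before [[ShadowTraffic]] testing begins, [[AutonomousOptimizationArchitect]] explicitly establishes a mathematical scoring rubric.
2. **Asynchronous evaluation**: The experimental model and the production model process the same tasks in parallel, and the judge LLM scores both outputs blind.
3. **Statistical analysis**: Once enough samples have accumulated, a statistical significance test is run.
4. **Autonomous decision**: When the experimental model is significantly better than the baseline, the routing weights are updated.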
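
The blind scoring, significance test, and weight update described in this section could be sketched as follows. This is only an illustration under stated assumptions: the source does not name a specific test, so a one-sided sign test on paired score differences is used as a stand-in, and `judge`, `alpha`, and `step` are hypothetical names and values.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def blind_scores(
    exp_out: str,
    prod_out: str,
    judge: Callable[[str], float],  # hypothetical judge-LLM wrapper
    rng: random.Random,
) -> Tuple[float, float]:
    """Score both outputs in random order so the judge sees no model labels."""
    pair = [("experimental", exp_out), ("production", prod_out)]
    rng.shuffle(pair)
    scores = {label: judge(text) for label, text in pair}
    return scores["experimental"], scores["production"]


def sign_test_p_value(diffs: Sequence[float]) -> float:
    """One-sided sign test: probability of at least this many wins by chance."""
    wins = sum(1 for d in diffs if d > 0)
    losses = sum(1 for d in diffs if d < 0)
    n = wins + losses  # ties carry no information and are dropped
    if n == 0:
        return 1.0
    # P(X >= wins) for X ~ Binomial(n, 0.5)
    return sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def updated_weight(
    current: float,
    diffs: Sequence[float],
    alpha: float = 0.05,  # assumed significance level
    step: float = 0.1,    # assumed weight increment
) -> float:
    """Shift routing weight toward the experimental model only if it wins significantly."""
    if sign_test_p_value(diffs) < alpha:
        return min(1.0, current + step)
    return current
```

Here each element of `diffs` is an experimental score minus the production score for one shadowed request. Ten wins out of ten gives a p-value of 1/1024, which clears the assumed 0.05 threshold, while two wins and two losses gives 11/16 and leaves the weight unchanged.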
## Key Properties
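- **Objectivity**: Removes the subjective bias of human grading
- **Scalability**: Can evaluate the outputs of multiple Providers at the same time
- **Data-driven**: Scores directly drive [[SemanticRouting]] decisions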
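
To make the scalability property concrete, the sketch below scores the outputs of several Providers concurrently with `asyncio.gather`. The `judge` coroutine and the provider names are hypothetical placeholders; a real judge would call the judge LLM rather than compute anything locally.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def judge_all(
    outputs: Dict[str, str],                  # provider name -> model output
    judge: Callable[[str], Awaitable[float]], # hypothetical async judge call
) -> Dict[str, float]:
    """Score every provider's output concurrently and return name -> score."""
    names = list(outputs)
    # gather preserves argument order, so scores line up with names
    scores = await asyncio.gather(*(judge(outputs[name]) for name in names))
    return dict(zip(names, scores))


async def fake_judge(text: str) -> float:
    """Stand-in for a judge-LLM request; scores by length for demonstration only."""
    await asyncio.sleep(0)
    return float(len(text))


if __name__ == "__main__":
    result = asyncio.run(
        judge_all({"provider_a": "xx", "provider_b": "yyy"}, fake_judge)
    )
    print(result)
```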
## Connections
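- [[AutonomousOptimizationArchitect]]: LLM-as-Judge is its core evaluation tool
- [[ShadowTraffic]]: Provides the traffic environment in which the experimental and baseline models run in parallel
- [[SemanticRouting]]: Scores update its routing weights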