Sync: add ai finops and deployment notes

2026-04-26 12:35:45 +08:00
parent f09834b5a5
commit ecdf295ded
14 changed files with 7117 additions and 5832 deletions
--- a/wiki/concepts/LLMasJudge.md
+++ b/wiki/concepts/LLMasJudge.md
@@ -0,0 +1,31 @@
+---
+title: "LLMasJudge"
+type: concept
+tags: ["evaluation", "llm-evaluation", "quality-assurance"]
+sources: ["engineering-autonomous-optimization-architect"]
+last_updated: 2026-04-26
+---
+
+## Aliases
+- LLM as a Judge
+- LLM-as-Judge
+- LLM-as-a-Judge Grading
+
+## Definition
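+LLM-as-a-Judge is the scoring mechanism of [[AutonomousOptimizationArchitect]]: an independent LLM (such as Claude Opus) acts as a "judge" that objectively scores the outputs of the experimental model and the production model, avoiding the subjective bias of human review. Scoring dimensions include JSON format correctness (5 points), latency (3 points), and hallucination detection (-10 points), among others.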
+
+## Mechanism
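+1. **Scoring criteria established up front**: before [[ShadowTraffic]] testing, [[AutonomousOptimizationArchitect]] explicitly defines a mathematical scoring rubric
+2. **Asynchronous evaluation**: the experimental model and the production model process tasks at the same time, and the judge LLM blind-scores both outputs
+3. **Statistical analysis**: once enough samples have accumulated, a statistical significance test is run
+4. **Autonomous decision**: when the experimental model significantly outperforms the baseline, the routing weights are updated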
+
+## Key Properties
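+- **Objectivity**: removes the subjective bias of human scoring
+- **Scalability**: can evaluate the outputs of multiple Providers at the same time
+- **Data-driven**: scoring results directly drive [[SemanticRouting]] decisions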
+
+## Connections
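+- [[AutonomousOptimizationArchitect]]: LLM-as-Judge is its core evaluation tool
+- [[ShadowTraffic]]: provides the traffic environment in which the experiment and the baseline run in parallel
+- [[SemanticRouting]]: scoring results update the routing weights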
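
The rubric in the Definition section and steps 3 and 4 of the Mechanism section can be sketched roughly as follows. This is a minimal illustration, not the page's actual implementation. The names `JudgedOutput`, `rubric_score`, and `should_promote` are hypothetical, as are the latency budget, the significance level, and the minimum sample count. The page does not say which significance test is used, so a one-sided paired test with a normal approximation stands in here as an assumption. The hallucination verdict would come from the judge LLM and is stubbed as a boolean.

```python
import json
import math
from dataclasses import dataclass
from statistics import NormalDist, mean, stdev


@dataclass
class JudgedOutput:
    """One model response plus what the judge observed (hypothetical shape)."""
    text: str            # raw model output
    latency_ms: float    # measured response latency
    hallucinated: bool   # verdict from the judge LLM, stubbed as a flag here


def rubric_score(out: JudgedOutput, latency_budget_ms: float = 1000.0) -> int:
    """Apply the page's rubric: +5 valid JSON, +3 latency, -10 hallucination.

    The 1000 ms latency budget is an assumed threshold, not from the page.
    """
    score = 0
    try:
        json.loads(out.text)
        score += 5
    except json.JSONDecodeError:
        pass
    if out.latency_ms <= latency_budget_ms:
        score += 3
    if out.hallucinated:
        score -= 10
    return score


def should_promote(exp_scores, base_scores, alpha=0.05, min_samples=30) -> bool:
    """Return True when the experiment beats the baseline significantly.

    Scores are paired per task. Uses a one-sided z-test on the paired
    differences (normal approximation); alpha and min_samples are assumptions.
    """
    if len(exp_scores) != len(base_scores):
        raise ValueError("scores must be paired per task")
    diffs = [e - b for e, b in zip(exp_scores, base_scores)]
    n = len(diffs)
    if n < min_samples:
        return False  # not enough samples accumulated yet
    sd = stdev(diffs)
    if sd == 0:
        return mean(diffs) > 0  # every pair differs by the same amount
    z = mean(diffs) / (sd / math.sqrt(n))
    p_value = 1.0 - NormalDist().cdf(z)
    return p_value < alpha
```

A caller would score each shadow-traffic pair with `rubric_score`, accumulate the two score lists, and update routing weights only when `should_promote` returns `True`. For small sample sizes a paired t-test or a permutation test would be a more careful choice than the normal approximation used here.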