---
title: "Sprint Contract"
type: concept
tags:
- "harness-engineering"
- "agentic-ai"
- "evaluation"
sources:
- "Your-AI-Isn-t-Stupid---It-Just-Needs-a-Better-Harness--Lychee-Technology-Engineering-Blog"
last_updated: 2026-04-20
---

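## Overview
A Sprint Contract is a testable definition of "done" that a Generator Agent and an independent Evaluator Agent agree on before work begins. By separating the two roles, it breaks the structural flaw behind the [[Self-Grading-Illusion]].
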
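## Problem It Solves
[[Self-Grading-Illusion]]: an LLM cannot reliably evaluate its own output. When the same weights both generate and judge, the resulting structural blind spots cannot correct themselves.
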
## Mechanism

### Before Work Begins
The Generator Agent and the independent Evaluator Agent negotiate a concrete, testable definition of "done" and write it into the Sprint Contract:
```
Sprint Contract for: Market Research Report
- Output: JSON with fields {competitor_name, pricing: {amount: float, currency: string}, source_url}
- Required: all fields present, amount > 0, currency in [USD, EUR, GBP]
- Evaluator action: fetch source_url, verify pricing matches
```

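The "Required" line of the contract above can be turned into a mechanical check. The following is a minimal sketch, not the source's implementation: `check_contract`, `REQUIRED_FIELDS`, and `ALLOWED_CURRENCIES` are hypothetical names, and the sample records are invented for illustration. It covers only the structural rules; the "Evaluator action" step (fetching `source_url` and comparing prices) needs network access and is left out.

```python
from typing import Any

# Hypothetical constants derived from the contract text above.
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}
REQUIRED_FIELDS = ("competitor_name", "pricing", "source_url")


def check_contract(record: dict[str, Any]) -> list[str]:
    """Return a list of contract violations; an empty list means the record passes."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]

    pricing = record.get("pricing")
    if not isinstance(pricing, dict):
        if "pricing" in record:
            errors.append("pricing must be an object")
        return errors

    amount = pricing.get("amount")
    # Assumption: integers are accepted too, since JSON does not distinguish
    # 49 from 49.0. bool is a subclass of int, so it is rejected explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        errors.append("pricing.amount must be a number")
    elif amount <= 0:
        errors.append("pricing.amount must be > 0")

    if pricing.get("currency") not in ALLOWED_CURRENCIES:
        errors.append("pricing.currency must be one of USD, EUR, GBP")

    return errors


# Invented sample records, for illustration only.
good = {
    "competitor_name": "Acme",
    "pricing": {"amount": 49.0, "currency": "USD"},
    "source_url": "https://example.com/pricing",
}
bad = {"competitor_name": "Acme", "pricing": {"amount": -1.0, "currency": "JPY"}}
```

Returning a list of violations rather than a single boolean gives the Generator concrete feedback on what to fix in the next iteration.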
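### Two Non-Negotiable Rules

**Rule 1: The Evaluator must execute**
The Evaluator must **run** the code, verify the interface in a headless browser, or compare the output against a schema. Reading the raw text and passing judgment is not enough.
> Verification that cannot be faked is the only verification that counts.
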
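One way to picture Rule 1 is an Evaluator that judges generated code by its observed behavior instead of by reading it. This is a minimal sketch under stated assumptions: the Generator's output is a Python snippet, the contract specifies its expected standard output, and `evaluate_by_execution` is a hypothetical helper that does not appear in the source.

```python
import subprocess
import sys


def evaluate_by_execution(code: str, expected_stdout: str, timeout_s: float = 10.0) -> bool:
    """Run the generated code in a fresh interpreter and compare its real output."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],  # separate process, not eval() in-process
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # code that never finishes does not meet the contract
    # Pass only if the process exited cleanly and printed what the contract expects.
    return proc.returncode == 0 and proc.stdout.strip() == expected_stdout.strip()
```

A plain subprocess is not a security sandbox, so a real harness would likely run untrusted output in an isolated environment. The point of the sketch is only that the verdict comes from what the code does when run.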
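**Rule 2: The Evaluator operates in a clean context**
The Evaluator must operate in a clean context and must not read the Generator's reasoning chain.
If the Evaluator reads the Generator's full chain of thought, it inherits the Generator's assumptions and blind spots, which defeats the entire purpose of independent review.
> What the Evaluator receives: the output + the success criteria, **nothing else**.
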
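Rule 2 amounts to a whitelist on what crosses from the Generator to the Evaluator. The sketch below assumes the Generator's result is held in a simple record with separate `output` and `reasoning` fields; `GeneratorResult` and `build_evaluator_input` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorResult:
    output: str      # the artifact to be judged
    reasoning: str   # the Generator's chain of thought; never forwarded


def build_evaluator_input(
    result: GeneratorResult, success_criteria: list[str]
) -> dict[str, object]:
    """Build the Evaluator's context from the output and the criteria only."""
    # Fields are copied explicitly instead of serializing the whole record,
    # so anything added to GeneratorResult later stays hidden by default.
    return {"output": result.output, "success_criteria": list(success_criteria)}
```

Copying named fields, rather than deleting `reasoning` from a full dump, keeps the default safe: new Generator-side fields do not leak into the Evaluator's context unless someone adds them on purpose.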
## Relationship to [[Harness-Engineering]]
The Sprint Contract is the core mechanism of layer 6 (Evaluation & Observation) of the [[7-Layer-Harness-Stack]]: it is the concrete implementation of separated responsibilities at the Evaluator layer.

## Source
- [[Your-AI-Isn-t-Stupid---It-Just-Needs-a-Better-Harness--Lychee-Technology-Engineering-Blog]]

## See Also
- [[Self-Grading-Illusion]]: the problem it solves
- [[7-Layer-Harness-Stack]]: where it sits in the stack