nexus/wiki/concepts/Sprint-Contract.md
2026-05-03 05:42:12 +08:00

---
title: "Sprint Contract"
type: concept
tags:
- "harness-engineering"
- "agentic-ai"
- "evaluation"
sources:
- "Your-AI-Isn-t-Stupid---It-Just-Needs-a-Better-Harness--Lychee-Technology-Engineering-Blog"
last_updated: 2026-04-20
---
## Overview
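A Sprint Contract is a testable definition of "done" that a Generator Agent and an independent Evaluator Agent agree on before work begins. By separating the two roles, it breaks the structural flaw behind the [[Self-Grading-Illusion]].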
## Problem It Solves
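[[Self-Grading-Illusion]]: an LLM cannot reliably evaluate its own output. When the same weights both generate and judge, the resulting structural blind spots cannot correct themselves.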
## Mechanism
### Before Work Begins
The Generator Agent and an independent Evaluator Agent negotiate a concrete, testable definition of "done" and write it into the Sprint Contract:
```
Sprint Contract for: Market Research Report
- Output: JSON with fields {competitor_name, pricing: {amount: float, currency: string}, source_url}
- Required: all fields present, amount > 0, currency in [USD, EUR, GBP]
- Evaluator action: fetch source_url, verify pricing matches
```
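The structural half of the contract above can be checked mechanically. The following is a minimal sketch, assuming the Generator's output has already been parsed into a Python dict; the names `check_contract`, `REQUIRED_FIELDS`, and `ALLOWED_CURRENCIES` are hypothetical and not from the source. It does not cover the "fetch source_url, verify pricing matches" step, which needs network access.

```python
# Hypothetical names; the field list and currency set mirror the example contract.
REQUIRED_FIELDS = ("competitor_name", "pricing", "source_url")
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}


def check_contract(record: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the checks pass."""
    violations = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]

    pricing = record.get("pricing")
    if not isinstance(pricing, dict):
        if "pricing" in record:
            violations.append("pricing must be an object")
        return violations

    amount = pricing.get("amount")
    # bool is a subclass of int in Python, so reject it explicitly.
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    if not is_number or amount <= 0:
        violations.append("pricing.amount must be a number > 0")

    currency = pricing.get("currency")
    if not isinstance(currency, str) or currency not in ALLOWED_CURRENCIES:
        violations.append("pricing.currency must be one of USD, EUR, GBP")

    return violations
```

Returning a list of violations instead of a single boolean gives the Generator something concrete to fix on the next attempt.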
### Two Non-Negotiable Rules
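**Rule 1: The Evaluator must execute**
The Evaluator must **run** the code, verify the UI in a headless browser, or compare the output against a schema. Reading the raw text and passing judgment on it is not enough.
> Verification that cannot be faked is the only verification that counts.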
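
For the "run the code" case, Rule 1 could look roughly like the sketch below. It assumes the Generator's output is Python source and that the contract's success criteria are expressed as assertion code; `run_check` and its parameters are hypothetical names. A fresh interpreter process is not a security sandbox, so a real harness would need stronger isolation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_check(generated_code: str, check_code: str, timeout_s: float = 10.0) -> bool:
    """Execute the generated code plus the contract's assertions in a fresh interpreter.

    Passes only if the process exits with status 0 within the timeout.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(generated_code + "\n\n" + check_code + "\n", encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            # A hang counts as a failure, not as "probably fine".
            return False
        return result.returncode == 0
```

The verdict comes from the exit status of an actual run, so plausible-looking code that does not work cannot pass.
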
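**Rule 2: The Evaluator operates in a clean context**
The Evaluator must operate in a clean context and must not read the Generator's reasoning chain.
If the Evaluator reads the Generator's full chain of thought, it inherits the Generator's assumptions and blind spots, which defeats the entire purpose of an independent review.
> What the Evaluator receives: the output + the success criteria, and **nothing else**.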
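
One way to enforce Rule 2 structurally is to give the Evaluator an input type that has no place for the reasoning chain. This is a minimal sketch with hypothetical names (`GeneratorResult`, `EvaluatorInput`, `build_evaluator_input`); how a real harness stores the Generator's reasoning is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorResult:
    output: str
    reasoning: str  # chain of thought; must never reach the Evaluator


@dataclass(frozen=True)
class EvaluatorInput:
    # Deliberately has no field for reasoning: output + success criteria, nothing else.
    output: str
    success_criteria: tuple[str, ...]


def build_evaluator_input(result: GeneratorResult, success_criteria: list[str]) -> EvaluatorInput:
    """Copy over only what the Evaluator is allowed to see."""
    return EvaluatorInput(output=result.output, success_criteria=tuple(success_criteria))
```

Because the boundary is a type and not a convention, passing the reasoning along would require changing `EvaluatorInput` itself, which is easy to spot in review.
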
## Relationship to [[Harness-Engineering]]
Sprint Contract is the core mechanism of layer 6 (Evaluation & Observation) of the [[7-Layer-Harness-Stack]]: it is the concrete implementation of the separation of duties at the Evaluator layer.
## Source
- [[Your-AI-Isn-t-Stupid---It-Just-Needs-a-Better-Harness--Lychee-Technology-Engineering-Blog]]
## See Also
- [[Self-Grading-Illusion]]: the problem it solves
- [[7-Layer-Harness-Stack]]: where it sits in the stack