nexus/wiki/concepts/Schema-Drift.md
2026-05-03 05:42:12 +08:00

---
title: "Schema Drift"
type: concept
tags:
- "harness-engineering"
- "data-quality"
- "contracts"
sources:
- "Your-AI-Isn-t-Stupid---It-Just-Needs-a-Better-Harness--Lychee-Technology-Engineering-Blog"
last_updated: 2026-04-20
---
## Overview
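Schema Drift is a silent error in which the same LLM emits different data types for the same field across calls. For example, a `price` field comes back as a string (`"19.99"`) on one call and as a float (`19.99`) on the next, and the downstream pipeline either silently produces garbage data or crashes.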
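To make the drift concrete, here is a minimal sketch that parses two hypothetical model responses for the same record. The payloads and the `field_type` helper are illustrative assumptions, not taken from the source.

```python
import json

# Two hypothetical responses from the same model for the same prompt.
response_a = '{"sku": "A-1", "price": "19.99"}'  # price emitted as a string
response_b = '{"sku": "A-1", "price": 19.99}'    # price emitted as a float


def field_type(raw: str, field: str) -> str:
    """Return the Python type name that a JSON field decodes to."""
    return type(json.loads(raw)[field]).__name__


# Collect every type observed for the same field across calls.
types_seen = {field_type(r, "price") for r in (response_a, response_b)}
drifted = len(types_seen) > 1  # more than one type for one field means drift
print(sorted(types_seen), drifted)
```

Both payloads are valid JSON, so a parser alone will not flag the difference; the drift only shows up when types are compared across calls.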
## Why It Happens
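LLM output is probabilistic: when the model appears to emit a "type", what it actually emits is "the most likely next token sequence". Without an explicit contract, type boundaries are fuzzy.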
## Silent vs. Loud Failure
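Schema Drift is a textbook case of **silent failure**:
- Output that does not conform to the schema → accepted by the pipeline → wrong conclusions at the data-analysis stage
- Example: a pricing field changes from float to string → price comparisons fail → business analysis goes silently wrong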
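The silent path is easy to reproduce. The sketch below uses made-up price values: once prices arrive as strings, comparisons still run but use lexicographic order, so the analysis returns a wrong answer without raising.

```python
# Hypothetical price values; the floats are what the pipeline expects.
prices_ok = [100.0, 19.99, 5.5]
prices_drifted = ["100.0", "19.99", "5.5"]  # same values, drifted to strings

# Floats compare numerically.
cheapest_ok = min(prices_ok)

# Strings compare character by character, so "100.0" sorts before "19.99".
# No exception is raised: this is the silent failure.
cheapest_drifted = min(prices_drifted)

# A mixed batch fails loudly instead, which is at least detectable.
try:
    min([100.0, "19.99"])
    mixed_error = None
except TypeError as exc:
    mixed_error = type(exc).__name__

print(cheapest_ok, cheapest_drifted, mixed_error)
```

The fully drifted batch is the dangerous case: it reports the most expensive item as the cheapest and nothing in the pipeline complains.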
## Solution: Contracts & Interfaces Layer
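Enforce explicit contracts at every system boundary (LLM ↔ tool, Agent ↔ Agent, Harness ↔ outside world):
- Strict JSON Schema
- Typed function signatures
- Versioned API specs
The contract layer validates inputs and outputs at each boundary crossing and **rejects non-conforming content before it propagates**.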
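A minimal sketch of such a boundary check, using only the Python standard library. The names `PRICE_CONTRACT`, `ContractViolation`, and `enforce_contract` are hypothetical and chosen for illustration; a real harness would more likely validate against a full JSON Schema.

```python
import json
from typing import Any


class ContractViolation(ValueError):
    """Raised when a payload does not match the declared contract."""


# Hypothetical contract: field name -> exact required Python type.
PRICE_CONTRACT: dict[str, type] = {"sku": str, "price": float}


def enforce_contract(raw: str, contract: dict[str, type]) -> dict[str, Any]:
    """Parse model output and reject it unless every field matches."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ContractViolation("expected a JSON object")
    if set(payload) != set(contract):
        raise ContractViolation(f"field set mismatch: {sorted(payload)}")
    for field, expected in contract.items():
        # Exact type match: no coercion, so "19.99" is not accepted as float.
        if type(payload[field]) is not expected:
            raise ContractViolation(
                f"{field}: expected {expected.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    return payload


# A conforming payload passes through unchanged.
accepted = enforce_contract('{"sku": "A-1", "price": 19.99}', PRICE_CONTRACT)

# A drifted payload is rejected at the boundary instead of propagating.
try:
    enforce_contract('{"sku": "A-1", "price": "19.99"}', PRICE_CONTRACT)
    rejected = None
except ContractViolation as exc:
    rejected = str(exc)
```

Rejecting rather than coercing is a deliberate choice in this sketch: converting `"19.99"` to a float would hide the drift instead of surfacing it. The exact-type check is also strict enough to reject an integer such as `19` for a float field, which may or may not be desirable in a given pipeline.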
## Core Principle
> "The model speaks in probabilities. The harness must speak in types."
## Relationship to [[Harness-Engineering]]
Protection against Schema Drift is the core goal of layer 3 (Contracts & Interfaces) of the [[7-Layer-Harness-Stack]].
## Source
- [[Your-AI-Isn-t-Stupid---It-Just-Needs-a-Better-Harness--Lychee-Technology-Engineering-Blog]]