Add articles

Clippings/Multi-Agent System Reliability.md

---
title: "Multi-Agent System Reliability"
source: "https://blog.alexewerlof.com/p/multi-agent-system-reliability"
author:
  - "[[Alex Ewerlöf]]"
published: 2023-01-09
created: 2026-04-13
description: "Master 4 architecture patterns to improve the reliability of multi-agent systems: Hierarchy, Consensus, Adversarial competition, and Knock-out. Learn to treat LLMs as unreliable components in a distributed system to build enterprise AI."
tags:
  - "clippings"
---

[Reliability Engineering](https://blog.alexewerlof.com/s/sre/?utm_source=substack&utm_medium=menu)

### 4 patterns to tame multi-agent systems for reliability

LLMs are slow and too generic out of the box. Multi-agent systems work around those limitations by dividing the work so that it can be done in parallel and/or by specialist agents.

Regardless of the architecture, the underlying LLM component remains unreliable (e.g. hallucination, logical fallacies, context drift). A multi-agent topology can propagate those errors to the point where the system becomes useless. And it’s much harder to debug due to complexity and \[optional but common\] parallelism.

This post lists 4 relatively advanced architecture patterns to improve the reliability of multi-agent systems:

1. Hierarchy
2. Consensus
3. Adversarial debate
4. Knock-out

You may recognize these patterns from how human systems collaborate, and we’ll get to that in a minute.

This post is for senior engineers who want to map their existing knowledge to build better LLM-powered solutions.

> Quick intro: [I’m a Senior Staff Engineer with 27 years of experience](https://www.alexewerlof.com/who) and a master’s degree in Systems Engineering from KTH. My last decade has been focused on Reliability Engineering and Resilient Architecture across many companies. I’ve been specializing in LLMs since 2023.

**Disclosure: some AI was used in the early research and draft stage of this page, but I’ve gone through everything multiple times and edited heavily to ensure that it represents my own thoughts and experience.**

## Mother nature, fear and motivation

LLMs are slow and error prone. So are human beings. Yet somehow we manage to build more reliable systems out of humans, like an army, a company, or a nation state.

A system of humans relies heavily on feedback loops, processes, bureaucracy, and leverage to self-correct.

We don’t trust “Dave from Accounting” to launch a rocket by himself. We wrap Dave in a process: checklists, peer reviews, and managers.

However, it’s a fallacy to *anthropomorphize* LLMs.

To begin with, they don’t suffer from the limitations of a biological entity. Our basic needs like food and shelter make us prioritize social behaviors over truth seeking. And the fear of going to prison or of dying prevents potential malice from being realized.

LLMs can’t die or starve the way biological entities do. The worst we can do is to unplug them. And a prison sentence doesn’t waste their lifespan because it is practically unlimited!

For example, you’ve probably seen prompts like this:

> “I will give you $100 if you answer correctly.”
>
> “If you don’t comply, I’ll unplug you.”
>
> “If you fail, children will be murdered.”

**Why it works:** The LLM has read the entire internet. In its training data, high stakes (money, danger) usually result in high-quality, precise text.

When you “threaten” the model, it predicts tokens that sound like an actual human under pressure.

**Why it fails:** The LLM doesn’t actually want your money. It has no “fear of death” because it only exists for the few seconds it takes to generate a response. It has no empathy either. It merely simulates those human aspects because it’s engineered for those “emergent” properties.

Humans are motivated or discouraged by emotions and logic. LLMs can only simulate emotions and suck at logic.

Being mindful of those differences, can we still **take elements of human systems** (e.g. hierarchy, consensus, competition) and combine them with **reliability engineering principles** to build better agentic systems?

Looking closely, there are 4 dominant patterns of human systems that are explored in multi-agent architecture:

1. **Hierarchy:** A Supervisor model acts like a manager: making a plan, breaking it down into tasks, distributing the work to Worker agents, and validating the results.
2. **Consensus:** One model may fail due to its stochastic nature. If you push a model too hard with threats, it might just lie to make you happy (sycophancy). But if we add a few more models and seek the majority vote, the truth emerges.
3. **Adversarial debate:** One agent proposes an idea, another agent attacks it. The truth survives the fight.
4. **Knock-out:** Multiple agents do a task but the worst ones get eliminated. In SRE, we treat servers as “cattle” (replaceable), not “pets” (unique and loved). An LLM agent is cattle. Don’t give it a name and hope it does well. Spin it up, check its work, and kill it if it fails.

To build robust systems, we need to stop asking the model to “be careful” and start forcing it to be correct.

## Pattern 1: Hierarchy

*We’re replacing “Do it all yourself” with “Make a plan, break it down, distribute the execution (map), then validate.”*

For example, if you ask an LLM to “Research X, write code for Y, and translate to Spanish,” it will likely fail. It loses focus. The solution is to break the work into atomic, focused steps that can be verified.

### Implementation

1. **The Planner:** A smart model (like Opus) breaks the user’s goal into small steps and distributes them across worker agents.
2. **The Workers:** Specialized agents (often smaller, faster models) do one thing well. They may be fine-tuned, have special skills/tools, or use prompts that allow them to do the specialized task more reliably.
3. **The Validator:** A checkpoint. If the work is bad, send it back. The validator can use deterministic code (e.g. unit tests, JSON schema validation) or be an LLM itself.

![[IMG-20260413105355390.png]]

**Why do the models collaborate?**

Models don’t collaborate because they like each other. They collaborate because **the dependency graph forces them to.** A Worker literally cannot start until the Planner feeds it the task. And it cannot cheat because it’ll be caught by the Validator.
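
The Planner → Worker → Validator flow can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: `run_hierarchy`, the `toy_*` stubs, and the `max_retries` parameter are hypothetical names introduced here, and the stubs stand in for real model calls.

```python
from typing import Callable

# Hypothetical stand-ins for LLM calls: a real system would call a model here.
Planner = Callable[[str], list[str]]    # goal -> ordered list of small steps
Worker = Callable[[str], str]           # one step -> result for that step
Validator = Callable[[str, str], bool]  # (step, result) -> is the result acceptable?


def run_hierarchy(
    goal: str,
    planner: Planner,
    worker: Worker,
    validator: Validator,
    max_retries: int = 2,
) -> dict[str, str]:
    """Plan the goal, run each step through a worker, and validate every result."""
    results: dict[str, str] = {}
    for step in planner(goal):
        for _attempt in range(max_retries + 1):
            result = worker(step)  # the worker only runs once the planner hands it a step
            if validator(step, result):
                results[step] = result
                break
            # Otherwise the work is "sent back": the same step is retried.
        else:
            raise RuntimeError(f"step failed validation after retries: {step!r}")
    return results


# Toy stubs so the sketch runs without any model.
def toy_planner(goal: str) -> list[str]:
    return [part.strip() for part in goal.split(",")]


def toy_worker(step: str) -> str:
    return step.upper()


def toy_validator(step: str, result: str) -> bool:
    return result != "" and result == step.upper()


print(run_hierarchy("research x, write code for y", toy_planner, toy_worker, toy_validator))
```

The retry cap is a design choice of this sketch: without it, a step that never passes validation would loop forever and burn tokens.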

**Nuances:**

- Given the tight collaboration between the validator and the planner, they can be the same LLM session that executes the PLAN → VALIDATION loop, although the good old **Separation of Concerns** can improve quality and performance.
- The planner and worker agents can use the same model, but it’s best to use a different model for the validator to improve quality and objectivity.
- The validator can work in two modes: it may validate the output of each worker individually, or validate the combined result after aggregating all outputs.
- Due to sequential execution (Planner → Worker → Validator), this pattern is slow and expensive (in both latency and token consumption).

**Best For:** Complex workflows where you need to keep contexts separate (e.g., don’t let the “Writer” see the messy raw logs from the “Researcher”).

## Pattern 2: Consensus (Voting)

*We’re replacing “Trust the first thought” with “Trust the majority.”*

LLMs are stochastic (random). A single answer is just one sample from a probability distribution. If we repeat the process a few times (serial) or run multiple instances of it (parallel), the different runs can cancel out each other’s noise.

If a model hallucinates 20% of the time, the chance of 3 models hallucinating the *exact same lie* is at most 0.8% (0.2^3 = 0.008), assuming their errors are independent. You may recognize this formula from [composite SLO](https://blog.alexewerlof.com/p/composite-slo).
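
To make the arithmetic explicit, here is a short worked calculation. It assumes each of three runs is wrong independently with probability $p = 0.2$; the symbol $p$ and the independence assumption are introduced here for illustration, and in practice runs of the same model on the same prompt tend to share failure modes.

$$
P(\text{all 3 wrong}) = p^3 = 0.2^3 = 0.008
$$

$$
P(\text{at least 2 of 3 wrong}) = \binom{3}{2} p^2 (1 - p) + p^3 = 3 \cdot 0.04 \cdot 0.8 + 0.008 = 0.104
$$

Under these assumptions, at least two of the three runs are wrong 10.4% of the time. That is an upper bound on a wrong answer winning a 2-of-3 vote, because the wrong runs would also have to agree with each other.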

### Implementation

- **Spawn *N* LLMs.** Finding the right *N* takes some trial and error to balance cost and reliability.
- **Fan out the work:** Give them the exact same task.
- **Fan in the results:** Pick the most common answer.

![[IMG-20260413105355428.png]]
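
The fan-out and fan-in steps can be sketched as follows. This is an illustrative sketch under stated assumptions: `majority_vote`, `consensus`, and the toy agents are hypothetical names, the agents run one after another here rather than in parallel, and the vote requires a strict majority (a slightly stricter rule than "most common answer") so that a three-way split returns no answer instead of an arbitrary one.

```python
from collections import Counter
from typing import Callable, Optional


def majority_vote(answers: list[str]) -> Optional[str]:
    """Return the answer given by more than half of the agents, or None if there is none."""
    if not answers:
        return None
    # Normalize case and whitespace so "Spam" and "spam " count as the same answer.
    normalized = [answer.strip().lower() for answer in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner if count > len(normalized) / 2 else None


def consensus(task: str, agents: list[Callable[[str], str]]) -> Optional[str]:
    """Fan the same task out to every agent, then fan the answers in with a vote."""
    # Each agent only sees the task, never another agent's answer (blind runs).
    answers = [agent(task) for agent in agents]
    return majority_vote(answers)


# Toy agents standing in for three independent LLM calls.
agents = [lambda task: "Spam", lambda task: "spam ", lambda task: "ham"]
print(consensus("Is this email spam?", agents))
```

Exact-match voting like this only suits short, closed answers such as labels. Free-form text would need some way to judge that two answers are equivalent before counting them.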

**Nuances:**

- Ideally the agents should use different models to reduce the risk of homogeneous thinking (e.g. the same noise being amplified in the consensus). This mirrors how **diversity** in human systems helps us solve novel problems.
- Make sure that there are no feedback loops between the agents, otherwise [groupthink](https://en.wikipedia.org/wiki/Groupthink) and the [bandwagon effect](https://en.wikipedia.org/wiki/Bandwagon_effect) can skew the results. They should run like a *blind experiment*.
- This method is expensive because we’re essentially giving the same task to multiple agents. The ROI (return on investment) needs to be calculated depending on the task and the cost of failure.

**Best For:** Fact-checking and classification (e.g., “Is this email spam?”).

## Pattern 3: The Adversarial Debate (The Courtroom)

*We’re replacing “Alignment” with “Pushback, checks and balances.”*

LLMs are “Yes-Men.” They rarely correct themselves once they start writing. You need a designated hater. A “devil’s advocate” so to speak. 😈

Humans may experience fear (of rejection or being wrong) but LLMs don’t. We simulate that fear by using an external critic and judge.

### Implementation

- **Generator:** “Here is my plan.”
- **Critic:** “Here are 3 reasons why that plan sucks.” (acting as devil’s advocate)
- **Judge:** “The Critic is right. Fix it.” (acting as moderator)

![[IMG-20260413105355469.png]]

**Nuances:**

- Ideally the Generator, Critic and Judge use 3 different models, with different training, fine-tuning, or prompts (in order of preference and accuracy). Again, diversity is useful.
- Due to sequential execution and the looping nature, it can be very slow.
- The loop is actually a huge problem because the agents may get stuck in debate. We may use a **watchdog pattern** (deterministic code) to break the loop if it continues beyond a time or counter threshold. In that case, the watchdog sits between the critic and the judge.
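
The debate loop with a watchdog can be sketched like this. It is a minimal sketch, not a reference implementation: `debate`, the `toy_*` stubs, and the `max_rounds` and `max_seconds` parameters are hypothetical names, and the stubs stand in for three separate model calls. The watchdog is plain code placed between the critic and the judge, as described above.

```python
import time
from typing import Callable, Optional

Generator = Callable[[str, Optional[list[str]]], str]  # (task, objections) -> draft
Critic = Callable[[str, str], list[str]]                # (task, draft) -> objections
Judge = Callable[[str, str, list[str]], bool]           # True means "accept the draft"


def debate(
    task: str,
    generator: Generator,
    critic: Critic,
    judge: Judge,
    max_rounds: int = 3,
    max_seconds: float = 60.0,
) -> tuple[str, str]:
    """Run generate -> criticize -> judge rounds; return (draft, stop_reason)."""
    deadline = time.monotonic() + max_seconds
    draft = generator(task, None)
    for _round in range(max_rounds):  # watchdog: counter threshold
        objections = critic(task, draft)
        if time.monotonic() > deadline:  # watchdog: time threshold
            return draft, "timeout"
        if judge(task, draft, objections):
            return draft, "accepted"
        draft = generator(task, objections)  # revise using the critic's objections
    # The last revision was never judged; the caller decides what to do with it.
    return draft, "max_rounds"


# Toy stubs so the sketch runs without any model.
def toy_generator(task: str, objections: Optional[list[str]]) -> str:
    return "plan v1" if objections is None else "plan v2"


def toy_critic(task: str, draft: str) -> list[str]:
    return ["no rollback step"] if draft == "plan v1" else []


def toy_judge(task: str, draft: str, objections: list[str]) -> bool:
    return not objections


print(debate("deploy the service", toy_generator, toy_critic, toy_judge))
```

Returning the stop reason alongside the draft lets the caller treat a watchdog exit differently from an accepted result, for example by escalating to a human.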

**Best For:** Security analysis, code review, and high-stakes content moderation.

## Pattern 4: Tree of Thoughts

*We’re replacing “Fear of Death” with “Survival of the Fittest.”*

This is a lean implementation of [Genetic Algorithms](https://en.wikipedia.org/wiki/Genetic_algorithm) (GA) from traditional ML (Machine Learning), which rely on two elements:

1. A **genetic representation** of the solution domain (a model and its context)
2. A **fitness function** to evaluate the solution domain (the eliminator)

Since we can’t punish or threaten an agent, we just delete it.

### Implementation

- Give the task to *N* agents
- Use a validator to decide which agents to eliminate
- \[optional\] Replace each eliminated agent with a new one that shares the winners’ characteristics

![[IMG-20260413105355502.png]]
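
One elimination round can be sketched as below. This is an illustrative sketch only: `knock_out`, the `survivors` parameter, and the toy agents are hypothetical names, and the toy fitness function (answer length) merely stands in for a real validator such as a test suite. The optional step of breeding replacement agents is left out.

```python
from typing import Callable


def knock_out(
    task: str,
    agents: dict[str, Callable[[str], str]],
    fitness: Callable[[str], float],
    survivors: int = 1,
) -> list[str]:
    """Run every agent on the task and return the names of the best scorers."""
    scored = [(fitness(agent(task)), name) for name, agent in agents.items()]
    scored.sort(reverse=True)  # highest fitness first; ties fall back to name order
    return [name for _score, name in scored[:survivors]]


# Toy agents and a toy fitness function (longer answer = better), for illustration only.
agents = {
    "terse": lambda task: "ok",
    "detailed": lambda task: "step 1, step 2, step 3",
    "empty": lambda task: "",
}
print(knock_out("write a runbook", agents, fitness=len, survivors=2))
```

Everything not returned is eliminated. Because the fitness function is ordinary code, the selection itself stays deterministic even though the agents are not.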

**Nuances:**

- You need a fast way to verify the output (like a unit test). If you need a human to check all 10 branches, it’s too slow and error prone. This is where Evals come in (topic for the next post).
- A more advanced setup may create new agents by combining the prompts of the agents that pass verification, and use them to fill the slots that become available after the elimination.

**Best for:** Iterative agent engineering. This is typically useful while developing or debugging a multi-agent system, not in production under real user load.

## Conclusion

The shift from “AI Prototype” to “Enterprise AI” is simple: stop treating LLMs like magic chatbots. Start treating them like unreliable components in a distributed system.

We don’t need AI that “cares.” We need AI that is **constrained**, **verified**, **pruned**, and **challenged**.

Don’t anthropomorphize LLMs! Find a way to piggyback on their human-corpus training while being aware of their non-biological differences.

*The next article is already written: how to actually build that verifier box?*

Clippings/The Picture They Paint of You.md

---
title: "The Picture They Paint of You"
source: "https://ferd.ca/the-picture-they-paint-of-you.html"
author:
published:
created: 2026-04-13
description: "Musings on the way we frame Coding Assistants, AI SREs, and what this communicates in terms of how these roles are perceived."
tags:
  - "clippings"
---

## The Picture They Paint of You

I keep noticing that the way AI SREs and coding agents are sold is fairly different: coding assistants are framed as augmenting engineers and are given names, and AI SREs are named “AI SRE” and generally marketed as a good way to make sure nobody is distracted by unproductive work. I don’t think giving names and anthropomorphizing components or agents is a good thing to do, but the picture that is painted by what is given a name and the framing brought up for tech folks is evocative.

This isn’t new; because [people already pointed out how voice assistants generally replicated perceived stereotypes and biases](https://scholar.google.com/scholar_lookup?title=Alexa%2C%20tell%20me%20about%20your%20mother%3A%20the%20history%20of%20the%20secretary%20and%20the%20end%20of%20secrecy&publication_year=2020&author=J.%20Lingel&author=K.%20Crawford) (both in how they’re built and in how they’re used), all I had to do was keep seeing announcements and being pitched these tools to see the pattern emerge. [Similar arguments are currently made for agents in the age of LLMs](https://abiawomosu.substack.com/p/they-built-stepford-ai-and-called), where agents can be considered to be encoding specific dynamics and values as well.

And so whatever I’m going to discuss here is a small addition to the existing set of perspectives encoded in existing products, and one that is not exhaustive (e.g. Sales Development Representatives, through AI SDRs, also join all sorts of professions, craftspeople, and artists on this list). I’m using AI SREs and Coding Assistants because I think it’s a very clear example of a divide between two functions that are fairly close together within organizations.

### The observations

Here’s a quick overview of various products as I browsed online and gathered news and announcements from the space. The sampling isn't scientific, but it covers a broad enough set of the players in the current market.

#### AI SREs

| Vendor | Product Name | Framing | Comments |
| --- | --- | --- | --- |
| [bacca.ai](https://web.archive.org/web/20260205110719/https://www.bacca.ai/) | AI SRE | “cuts downtime before it cuts your profits”, “stop firefighting, start innovating”, “frees your engineers from the grind of constant troubleshooting” | |
| [resolve.ai](https://web.archive.org/web/20260221182125/https://resolve.ai/product/ai-sre) | AI SRE | “Machines on-call for humans”, “Removing the toil of investigations, war rooms, and on-call”, “Operates tools and reasons through complex problems like your expert engineers” [🔗](https://web.archive.org/web/20251122195813/https://resolve.ai/product) | Their [AI SRE buyer’s guide](https://web.archive.org/web/20260204153508/https://resolve.ai/resources/ebook/ai-sre-buyers-guide) also provides framing such as “engineering velocity stalls because teams spend the majority of their time firefighting production issues rather than building new capabilities.” |
| [Neubird](https://web.archive.org/web/20260213060424/https://neubird.ai/) | AI SRE, Hawkeye | “No more RCA Delays”, “No more time lost to troubleshooting”, “no more millions lost to downtime, delays, and guesswork.” | The name Hawkeye, a superhero product name, is used in press releases and one of the FAQ questions, but is otherwise absent from the product page. There is a closing frame on a video that uses the words "AI SRE Teammate." |
| [Harness](https://web.archive.org/web/20260221184703/https://www.harness.io/products/ai-sre) | AI SRE, AI Scribe, AI Root Cause Analysis | “Scales your response, not your team”, “Reduce MTTR”, “Standardize first response”, “Let AI Handle The Busy Work While Your Team Solves What Matters” | Their FAQ explicitly compares human and AI SREs by stating “Traditional SRE relies on manual processes and rule-based automation, while AI SRE uses machine learning to adapt, predict issues, and automate complex decision-making at scale.” |
| [incident.io](https://web.archive.org/web/20260113001845/https://incident.io/ai-sre) | AI SRE | “resolves incidents like your best engineer”, “The SRE that doesn't sleep”, “No need to stall the whole team”, “Keep builders building”, “AI SRE does all the grunt work \[postmortems\] too.” | |
| [Rootly](https://web.archive.org/web/20260215142821/https://rootly.com/ai-sre) | AI SRE, Rootly AI | “AI SRE agents and your teams resolve incidents together”, “your expert engineer in every incident”, “quickly identify root causes and the fix—even if you don't know that code” | In late 2025, [the page instead had a framing](https://web.archive.org/web/20250806112712/https://rootly.com/ai-sre) of “Detect, diagnose, and remediate incidents with less effort” with no reference to teamwork. |
| [cleric.ai](https://web.archive.org/web/20260221192205/https://cleric.ai/) | Cleric | “investigates production issues, captures what works, and makes your whole team faster”, “Skip straight to the answer”, “Unblock your engineers” | One of the few with a name, possibly a DnD support role reference. |
| [AlertD](https://web.archive.org/web/20260221192527/https://www.alertd.ai/) | AI SRE | “AI Agents For SREs and DevOps”, “Stop losing hours to scripting and tool switching”, “Unite SRE and DevOps tribal knowledge with AI agents”, “Best-practice AI agent guidance for next steps by your DevOps and SREs”, “Share AI dashboards and insights to act smarter, together”, “Work smarter with your AI” | This is one of two products my summary search revealed with a framing that tries to *help* SREs and DevOps instead of having a focus on replacing them. |
| [AWS](https://web.archive.org/web/20260221192841/https://aws.amazon.com/devops-agent/) | DevOps Agent | “your always-on, autonomous on-call engineer”, “resolves and proactively prevents incidents, continuously improving reliability and performance”, “reduce MTTR \[…\] and drive operational excellence.” | |
| [Ciroos](https://web.archive.org/web/20260218151029/https://ciroos.ai/) | Ciroos | “Become an SRE superhero”, “increase human ingenuity”, “AI SRE Teammate for site reliability engineering (SRE), IT Operations, and DevOps teams” [🔗](https://web.archive.org/web/20260221192928/https://ciroos.ai/faq), “extends the capabilities of every SRE team” | The other product that aims to *help* SRE and DevOps teams. The name is relatively human. The automation model described in the FAQ repeats certain myths, but it’s far more transparent and more grounded than others in this list. |

*Disclaimer: I have not tried any of the above; this list is built from the products’ own pages.*

Of all of these, only a few mention possible teamwork, and only two of these do so by being a teammate to your SRE staff. Every other one of these instead frames the work as either less important or as worth replacing, sometimes very explicitly. Some have names that refer to superheroes or DnD support classes; most are just named after the role they aim to substitute.

#### Coding Assistants
| Vendor 小贩 | Product Name 产品名称 | Framing 框架 | Comments 评论 |
|
||||
| --- | --- | --- | --- |
|
||||
| [Anthropic 人类学](https://web.archive.org/web/20260221115532/https://claude.com/product/claude-code) | Claude Code 克劳德·科德 | “Built for builders / programmers / creators / …”, “Describe what you need, and Claude handles the rest.”, “Stop bouncing between tools”, “meets you where you code”, “you’re in control” “专为建设者/程序员/创作者/…打造”,“描述您的需求,剩下的交给 Claude”,“告别工具切换”,“随时随地满足您的编码需求”,“一切尽在掌控” | Human name, emphasizes aspects of delegation 人名,强调授权的各个方面。 |
|
||||
| [Google 谷歌](https://web.archive.org/web/20260217124358/https://codeassist.google/) | Gemini code assist 双子座密码协助 | “Uncap your potential and get all of your development done”, “Experience coding with fewer limits”, “Accelerate development”, “\[offload\] repetitive tasks”, “reduce code review time” “释放你的潜能,完成所有开发工作”、“体验更少限制的编码”、“加速开发”、“卸载重复性任务”、“缩短代码审查时间” | Name is the latin word for “twins”; framing seeks both augmentation but some delegation. 名称源自拉丁语,意为“双胞胎”;构想既要增强,又要有所委派。 |
|
||||
| [Zed 泽德](https://web.archive.org/web/20260220214456/https://zed.dev/) | Zed (Editor) Zed(编辑) | “minimal code editor crafted for speed and collaboration with humans and AI”, “AI that works the way you code”, “fluent collaboration between humans and AI” “专为速度和人机协作而打造的极简代码编辑器”、“以你编写代码的方式工作的 AI”、“人机流畅协作” | Not technically a coding assistant, but an environment to collaborate with them 严格来说,它不是编码助手,而是一个与他们协作的环境。 |
|
||||
| [Github](https://web.archive.org/web/20260221142922/https://github.com/features/copilot) | Copilot 副驾驶 | “Command your craft”, “accelerator for every workflow”, “stay in your flow”, “code, command, and collaborate”, “Ship faster with AI that codes with you” “掌控你的技艺”、“加速各种工作流程”、“保持你的创作灵感”、“编码、指挥和协作”、“借助与你协同编码的 AI 更快地交付产品” | The naming fits a role that is collaborative, and both it and the positioning try to articulate collaboration while you lead. 这个名称符合协作角色的特点,它和定位都试图阐明在你领导的同时进行协作。 |
|
||||
| [Cline 克莱恩](https://web.archive.org/web/20260219181524/https://cline.bot/) | Cline 克莱恩 | “Your coding partner”, “Collaborative by nature, autonomous when permitted”, “fully collaborative AI partner”, “Make coordinated changes across large codebases” “您的编码伙伴”、“天生协作,获准自主运行”、“完全协作的 AI 伙伴”、“在大型代码库中进行协调更改” | |
| [Windsurf 风帆冲浪](https://web.archive.org/web/20260217232640/https://windsurf.com/editor) | Cascade, Editor Cascade,编辑 | “most powerful way to code with AI”, “limitless power, complete flow”, “saves you time and helps you ship products faster”, “removes the vast amounts of time spent of boilerplate and menial tasks so that you can focus on the fun and creative parts of building.” “使用 AI 进行编码的最强大方式”、“无限的力量,完整的流程”、“节省您的时间并帮助您更快地交付产品”、“消除大量花费在样板和琐碎任务上的时间,以便您可以专注于构建过程中有趣和创造性的部分”。 | Not technically a coding assistant for the editor side, but also provides agents. 严格来说,它不是编辑器端的编码助手,但也提供代理。 |
| [Cursor 光标](https://web.archive.org/web/20260220093030/https://cursor.com/) | Cursor (editor) 光标(编辑器) | “Built to make you extraordinarily productive”, “accelerate development by handing off tasks”, “reviews your PRs, collaborates in Slack, and runs in your terminal”, “develop enduring software” “旨在显著提升您的工作效率”、“通过任务移交加速开发”、“审核您的 PR、在 Slack 中协作并在您的终端上运行”、“开发持久耐用的软件” | Also not a coding assistant, but has tabs to interact with them. 它虽然不是编程助手,但有选项卡可以与之交互。 |
| [OpenAI](https://web.archive.org/web/20260213164900/https://chatgpt.com/codex) | Codex 法典 | “Built to drive real engineering work”, “reliably completes tasks end to end, like building features, complex refactors, migrations, and more”, “command center for agentic coding”, “Adapts to how your team builds”, “Made for always-on background work” “专为驱动实际工程工作而打造”,“可靠地完成端到端任务,例如构建功能、复杂重构、迁移等等”,“智能编码的指挥中心”,“适应团队的构建方式”,“专为持续后台运行而设计” | This is one of the few AI coding tools that orients itself toward a more definitive substitutive role, even if it still pays lip service to working with your team. 这是为数不多的将自身定位为更明确的替代角色的 AI 编码工具之一,即使它仍然口头上支持与你的团队合作。 |

*Disclaimer: I have tried some of the above, but not all; this list is built from the products’ own pages.
免责声明:以上部分产品我已尝试过,但并非全部;此列表根据产品自身页面信息整理而成。*

You can see from the tables above that each of these tools has a more distinct name, with some being a person’s name. The vast majority of these are framed as tools that aim to augment an engineer or a team, to make them more productive, let them do more within their roles.
从上表可以看出,每种工具都有一个比较独特的名称,有些甚至以人名命名。绝大多数工具都被定位为旨在增强工程师或团队能力的工具,以提高他们的工作效率,让他们在各自的岗位上完成更多工作。

### So what are the implications here?那么,这其中意味着什么呢?

The way these products are presented paints two very distinct pictures (even if exceptions exist in each category):
这些产品的呈现方式描绘了两种截然不同的景象(即使每个类别中都存在例外情况):

1. Software Engineering work is perceived as valuable work; the engineer is in control and deserves more power, more control, more productivity. The AI exists to be a partner, a teammate, or an assistant.
软件工程工作被认为是一项有价值的工作;工程师掌握主动权,理应拥有更大的权力、更大的控制权和更高的生产力。人工智能的存在是为了成为合作伙伴、队友或助手。
2. Software Reliability Engineering work is a hindrance; teams need to be distracted less by these tasks and instead focus on more valuable work. Human limitations, such as needing to sleep, need to be overcome. The AI exists to replace or be a substitute for the worker.
   软件可靠性工程工作是一种阻碍;团队需要减少这些任务带来的干扰,转而专注于更有价值的工作。人类的局限性(例如需要睡眠)需要克服。人工智能的存在是为了取代或替代工人。

These models potentially replicate and project to the rest of the world the ways these roles are perceived internally.
这些模型有可能复制并向世界其他地区展现这些角色在公司内部的认知方式。

For example, I’ve written in the past about how I see [incidents and outages as worthy learning opportunities to orient organizations](https://ferd.ca/ongoing-tradeoffs-and-incidents-as-landmarks.html); this framing necessarily perceives SRE as doing important work you wouldn’t want to ignore. The vision behind AI SREs is the opposite. Incidents and outages are one-off exceptions to paper over and move on from, rather than a structural and emergent consequence of what you do (and how you do it) and from which you should learn.
例如,我过去曾撰文阐述我如何将 [事件和故障视为宝贵的学习机会,以帮助组织调整方向](https://ferd.ca/ongoing-tradeoffs-and-incidents-as-landmarks.html) ;这种观点必然将 SRE 视为一项不容忽视的重要工作。而 AI SRE 的愿景则截然相反。事件和故障被视为一次性的例外情况,可以草草了事,而不是你工作方式(以及工作内容)的结构性后果,你应该从中吸取教训。

This sort of thing is interesting because it can also be indicative of the split between what practitioners think of their work (learning from incidents is a necessity), and what decision-makers above them may think of the work and function (these postmortems are grunt work).
这种事情很有趣,因为它也可以表明从业人员对自己工作的看法(从事故中吸取教训是必要的)与他们之上的决策者对工作和职能的看法(这些事后分析是枯燥乏味的工作)之间的分歧。

Much like [AI assistants shaped after secretaries were described as showing a vision that mimics the relation between servants and masters](https://catalystjournal.org/index.php/catalyst/article/view/29586), the way we frame AI tooling for all types of workers exposes the way *their* builders think about that work.
就像 [以秘书为原型设计的 AI 助手被描述为展现了一种模仿仆人和主人之间关系的愿景一样](https://catalystjournal.org/index.php/catalyst/article/view/29586) ,我们为各种类型的工作者构建 AI 工具的方式,暴露了 *其* 构建者对这项工作的看法。

But it’s also a signal about how the *buyers* feel about that work. In case the role sold is one of a partner or teammate, you need to sell this idea to both the employee who’ll work with the tool, and to the employer who will pay for it. When you sell technology that *replaces* a role or function, then you only need to speak to the person with the money.
但这同时也反映了 *买家* 对这项工作的看法。如果出售的是合作伙伴或团队成员的角色,你需要同时说服使用该工具的员工和为其付费的雇主。而如果你出售的是 *替代* 某个角色或职能的技术,那么你只需要与掌握资金的人沟通即可。

The implication then is that what these tools project is a mix of how the role is perceived on either side of the transaction. If, as an employee, you feel like the tools are only doing part of the work you value, that may imply few people with power or influence actually value it the same way you do.
这意味着这些工具所呈现的内容,反映了交易双方对自身角色的认知差异。如果你作为员工,觉得这些工具只能完成你所重视的部分工作,这可能意味着,真正拥有权力和影响力的人,很少有人像你一样重视这项工作。

This does not mean organizations can fully succeed in the substitution effort. Time and time again history has shown that *part* of a role can be automated and centralized, and the rest of it will be piled onto fewer individuals who will do the hard-to-automate bits and will then coordinate the automation for the rest of it, something called [the left-over principle](https://www.kitchensoap.com/2013/08/20/a-mature-role-for-automation-part-ii/).
但这并不意味着组织就能在替代工作中完全成功。历史一次又一次地表明,一项工作的 *一部分* 可以实现自动化和集中化,而剩余部分则会落到少数人身上,这些人负责完成难以自动化的部分,然后协调其余部分的自动化,这就是所谓的 [“剩余原则”](https://www.kitchensoap.com/2013/08/20/a-mature-role-for-automation-part-ii/) 。

As automation capacity increases and as organizations transform themselves to make room for it all, the dynamic evolves.
随着自动化能力的提高以及组织机构为了适应自动化而进行的转型,这种动态也在不断演变。

It’s already pretty clear to me that the vision many builders and buyers have of SREs is often a very reductionist and unflattering one. The role hasn’t yet gone away, possibly because there’s more to it than builders and buyers believe. I figure the evolving portrait of software engineering is equally incomplete at this point, depending on the complexity of the system you’re trying to create and control.
我相当清楚地看到,许多开发者和买家对 SRE 的理解往往过于简化,甚至有些贬低。SRE 这个角色至今仍未消失,或许是因为它远比开发者和买家想象的要复杂得多。我认为,目前软件工程的图景同样还不完整,这取决于你试图创建和控制的系统的复杂程度。

### What are they now painting?他们现在在画什么?

Just for fun, I also looked at how the frameworks that promise to automate all code generation are framed. Codex in the table above is inching that way, but the portfolio grows.
出于兴趣,我还研究了一下那些号称能实现代码自动生成的框架是如何构建的。上表中的 Codex 正在朝着这个方向发展,但这类框架还在不断增加。

Anthropic is introducing [agent teams](https://web.archive.org/web/20260219045316/https://code.claude.com/docs/en/agent-teams) where the teammates are *below* you. You are directing a team lead that in turn directs teammates. The discourse is one of *control*, where collaboration is delegated to agents, which you can still *manage* more directly. [GasTown](https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04) puts you in the seat of a product manager, and the entire development team is abstracted into deeper hierarchies. [Amp](https://web.archive.org/web/20260213164921/https://ampcode.com/) is also about coordinating agents (of various skills, roles, and costs) and still targets developers, but doesn’t drive the analogy as hard.
Anthropic 引入了 [代理团队](https://web.archive.org/web/20260219045316/https://code.claude.com/docs/en/agent-teams) 的概念,团队成员位于你 *之下* 。你领导一个团队负责人,该负责人再领导团队成员。这种模式的核心在于 *控制* ,协作被委托给代理,但你仍然可以更直接地 *管理* 他们。[GasTown](https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04) 让你扮演产品经理的角色,整个开发团队被抽象成更深层次的层级结构。[Amp](https://web.archive.org/web/20260213164921/https://ampcode.com/) 也旨在协调不同技能、角色和成本的代理,虽然目标用户仍然是开发者,但它并没有像 GasTown 那样强调这种类比。

The enthusiasm is there, and more reports are coming around the *Software Factory* approach, such as [StrongDM experimenting with code that must not be reviewed by humans](https://simonwillison.net/2026/Feb/7/software-factory/) or the [outcome engineering manifesto](https://web.archive.org/web/20260217211224/https://o16g.com/) which imply that the future is in being a high-level controller around large groups of faceless agents, which you must constrain and provide enough information to in order for them to act well.
人们热情高涨,越来越多的报告开始关注 *软件工厂* 方法,例如 [StrongDM 正在试验无需人工审查的代码](https://simonwillison.net/2026/Feb/7/software-factory/) ,或者 [成果工程宣言](https://web.archive.org/web/20260217211224/https://o16g.com/) 暗示,未来在于成为大型无面孔代理群体的高级控制器,你必须约束这些代理并提供足够的信息,才能使它们良好地行动。

The trend is seemingly moving away from a partnership between the software engineer and their automation, and into a view that reminds me far more of Taylorism. Maybe that shift is happening because that’s generally what comes to mind when people think of automating production away from manual work.
这种趋势似乎正在从软件工程师与其自动化系统之间的伙伴关系转向一种更接近泰勒制的视角。或许这种转变的出现是因为,当人们想到用自动化生产取代人工操作时,通常会想到泰勒制。

These products are conceptualized by analogy. Take a pattern you know, and replicate some key properties in a different space. This is an absolutely normal way of exploring new areas, of transferring understanding from one domain to another.
这些产品的概念化源于类比。选取一个你熟悉的模式,并将一些关键属性复制到不同的领域。这是一种探索新领域、将理解从一个领域迁移到另一个领域的非常正常的途径。

I get that spitting out code fast is valuable for many. But if we believe workers can bring more to the table than Taylor believed, then this vision is limiting. If we believe that this doesn’t apply because the agents are not that capable, then reductive anthropomorphism isn’t fitting either. In both cases, we should demand and seek better analogies, because a better representation of work as we do it should result in better tools.
我明白快速编写代码对很多人来说很有价值。但如果我们认为工人能贡献的比泰勒所认为的更多,那么这种愿景就具有局限性。如果我们认为这种情况不适用,因为这些代理的能力还不够强,那么简化的拟人化描述也同样不合适。无论哪种情况,我们都应该要求并寻求更好的类比,因为对实际工作方式的更准确描述应该能带来更好的工具。

That’s because as much as an analogy can be a lever, it can also be a straitjacket. When you’re stuck inside a model, you interpret everything in its own terms, and it becomes much harder to adopt a different perspective or to break out of the oversimplification. And once you’ve made sense of the new space well enough, you ideally don’t need to rely on the analogy anymore: your understanding stands on its own.
这是因为,类比既可以成为一种杠杆,也可能成为一种束缚。当你被困在某个模型中时,你会用它自身的逻辑来解读一切,这样就很难换个角度思考,也很难跳出过度简化的思维模式。而一旦你对新的领域有了足够深入的理解,理想情况下,你就不再需要依赖类比了:你的理解本身就足够了。

In accepting the Taylorist software factory frameworks or AI SREs built while framing the work as low-status, we also, at a social level, tacitly amplify these representations and give them validity. This is necessarily done at the cost of alternative designs, by settling the space with products conceived as poor caricatures of actual work. It lacks respect and is conceptually weak.
当我们接受泰勒制的软件工厂框架或将工作视为低地位的 AI SRE 时,我们也在社会层面上默许地强化了这些刻板印象,并赋予它们合法性。这必然会以牺牲其他设计方案为代价,因为最终占据市场的产品是对实际工作的拙劣模仿。这种做法缺乏尊重,且在概念上站不住脚。

We keep being told it has never been cheaper, easier, or more accessible to create new stuff. This should give everyone involved more time to explore the problem space and learn. Yet here we are.
我们一直被告知,创造新事物从未如此便宜、容易和便捷。这本应让所有参与者有更多时间去探索问题领域并学习。然而,现实却并非如此。

The picture they paint of you says a lot. Just not about you.
他们对你描绘的形象说明了很多问题,但并非关于你本人。