Sync: add ai finops and deployment notes
wiki/concepts/AIFinOps.md
@@ -0,0 +1,32 @@
---
title: "AIFinOps"
type: concept
tags: ["finops", "cost-optimization", "cloud-economics"]
sources: ["engineering-autonomous-optimization-architect"]
last_updated: 2026-04-26
---

## Aliases
- AI FinOps
- AI Financial Operations
- LLM Cost Management

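## Definition
AI FinOps (Financial Operations) is the cost-management framework of [[AutonomousOptimizationArchitect]]. It continuously tracks token consumption, cost, latency, and output quality for each LLM provider, builds a historical performance database, and gives [[SemanticRouting]] the data it needs for cost-aware decisions. The goal is to make AI operating costs predictable and controllable.
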
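## Mechanism
1. **Telemetry collection**: record token count, response time, errors, and cost for every API call
2. **Cost modeling**: build a cost breakdown by provider, model, and task type
3. **Anomaly detection**: detect abnormal traffic patterns (e.g. a 500% traffic spike, which may indicate a bot attack)
4. **Budget alerts**: trigger an alert when cost approaches a threshold
5. **Optimization suggestions**: generate cost-saving recommendations from historical data (e.g. switching to Gemini Flash)
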
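Steps 1 to 4 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the architect's actual implementation: the `CallRecord` fields, the provider names and prices in `PRICE_PER_MILLION`, the 5x spike factor (one reading of the 500% example), and the 80% alert ratio are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; real prices vary by provider.
PRICE_PER_MILLION = {"provider-a": 15.0, "provider-b": 0.5}


@dataclass
class CallRecord:
    """Step 1: one telemetry row per API call (field names are assumed)."""
    provider: str
    task_type: str
    tokens: int
    latency_ms: float
    error: bool

    @property
    def cost(self) -> float:
        return self.tokens / 1_000_000 * PRICE_PER_MILLION[self.provider]


def cost_breakdown(records):
    """Step 2: total cost keyed by (provider, task_type)."""
    totals = defaultdict(float)
    for r in records:
        totals[(r.provider, r.task_type)] += r.cost
    return dict(totals)


def is_traffic_spike(baseline_calls: int, current_calls: int, factor: float = 5.0) -> bool:
    """Step 3: flag a window whose call count is at least factor x the baseline."""
    return baseline_calls > 0 and current_calls >= factor * baseline_calls


def budget_alert(spent: float, budget: float, ratio: float = 0.8) -> bool:
    """Step 4: alert once spend reaches the given share of the budget."""
    return spent >= ratio * budget
```

Keying the breakdown by `(provider, task_type)` keeps the per-task cost visible, which is the view a cost-aware router would most plausibly need.
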
## Key Properties
- **Cost transparency**: cost per million tokens is tracked precisely
- **Predictability**: future cost is forecast from historical trends
- **Aligned with governance**: supplies cost-anomaly data to [[CircuitBreaker]]

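As one possible reading of the predictability property, a trend forecast can be as simple as a least-squares line through past daily costs. The note does not say which forecasting method is used, so the linear fit and the `forecast_cost` name are assumptions for illustration only.

```python
def forecast_cost(daily_costs, days_ahead=1):
    """Fit a least-squares line to daily costs and extrapolate it forward."""
    n = len(daily_costs)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_costs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_costs))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    # The last observed day has index n - 1.
    return intercept + slope * (n - 1 + days_ahead)
```

A straight line ignores weekly seasonality and step changes in pricing, so it is best treated as a baseline to compare richer models against.
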
## Connections
- [[AutonomousOptimizationArchitect]]: AIFinOps is its core cost-management framework
- [[SemanticRouting]]: cost data is a key input to routing decisions
- [[CircuitBreaker]]: abnormal cost or traffic trips circuit-breaker protection

wiki/concepts/CircuitBreaker.md
@@ -0,0 +1,31 @@
---
title: "CircuitBreaker"
type: concept
tags: ["reliability", "fault-tolerance", "llm-ops"]
sources: ["engineering-autonomous-optimization-architect"]
last_updated: 2026-04-26
---

## Aliases
- Circuit Breaker
- 熔断器
- Circuit Breaker Pattern

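## Definition
The circuit breaker pattern is the core safety mechanism of [[AutonomousOptimizationArchitect]]. When failures from an LLM provider exceed a threshold (e.g. HTTP 402/429 errors or response timeouts), it automatically cuts off that provider, switches to a cheap fallback, and raises an alert so a human can step in.
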
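## Mechanism
1. **Monitor**: track the failure count and failure rate of each provider
2. **Trip**: when failures exceed the `maxRetries` threshold, or a stream of HTTP 402/429 errors is detected, trip the breaker immediately
3. **Degrade**: route all requests to a preconfigured cheap fallback provider (e.g. Gemini Flash)
4. **Recover**: reset manually once a human confirms the problem is resolved, or retry automatically after a cooldown period
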
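The four steps above might look like this in code. It is a minimal sketch, not the architect's implementation: only `maxRetries` and the 402/429 statuses come from the note, while the class layout, the 60-second cooldown, the injectable `clock`, and `pick_provider` are assumptions. For brevity a single 402/429 response trips the breaker here, whereas the note speaks of a stream of such errors.

```python
import time


class CircuitBreaker:
    """Per-provider breaker: trips on repeated failures or a 402/429 status."""

    TRIP_STATUSES = {402, 429}

    def __init__(self, max_retries=3, cooldown_s=60.0, clock=time.monotonic):
        self.max_retries = max_retries
        self.cooldown_s = cooldown_s
        self._clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self, status=None):
        """Steps 1 and 2: count the failure and trip if a limit is crossed."""
        self.failures += 1
        if status in self.TRIP_STATUSES or self.failures > self.max_retries:
            self.opened_at = self._clock()

    def allows_requests(self):
        if self.opened_at is None:
            return True
        # Step 4 (automatic path): after the cooldown, let traffic probe again.
        return self._clock() - self.opened_at >= self.cooldown_s

    def reset(self):
        """Step 4 (manual path): a human confirmed the provider is healthy."""
        self.record_success()


def pick_provider(primary, fallback, breakers):
    """Step 3: route to the fallback while the primary's breaker is open."""
    return primary if breakers[primary].allows_requests() else fallback
```

Passing the clock in as a parameter keeps the cooldown logic testable without sleeping.
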
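## Key Properties
- **Prevents runaway cost**: blocks token-consumption attacks (e.g. a malicious bot sending a burst of requests in a short time)
- **Prevents infinite retries**: each provider is configured with a maximum retry count, `maxRetries`
- **Tiered degradation**: steps down to progressively cheaper backup providers until a working path is found
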
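## Connections
- [[AutonomousOptimizationArchitect]]: uses CircuitBreaker as the core implementation of its financial guardrails
- [[LLMasJudge]]: assesses whether output quality is still acceptable after a provider downgrade
- [[ShadowTraffic]]: after a trip, backup providers can be tested asynchronously on shadow traffic
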
wiki/concepts/DarkLaunching.md
@@ -0,0 +1,31 @@
---
title: "DarkLaunching"
type: concept
tags: ["deployment", "release-management", "feature-rollout"]
sources: ["engineering-autonomous-optimization-architect"]
last_updated: 2026-04-26
---

## Aliases
- Dark Launch
- 暗启动
- 灰度发布
- Feature Flag Deployment

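## Definition
Dark launching is the model-introduction strategy of [[AutonomousOptimizationArchitect]]: a new model is deployed to production without being fully exposed to users, and its performance is validated through [[ShadowTraffic]]. It has three phases: shadow testing (nothing is returned to users), then canary traffic (5% of users), then full cutover.
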
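## Mechanism
1. **Phase 1 - Shadow Deployment**: the new model receives shadow traffic with no effect on users
2. **Phase 2 - Canary**: 5% of real traffic is switched to the new model while error rate and user satisfaction are monitored
3. **Phase 3 - Full Rollout**: once the new model passes every check, it fully replaces the old model
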
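A small sketch of how the three phases could decide which model answers a given user. The note only fixes the 5% canary share; the `Phase` enum, the `"old"`/`"new"` labels, and the hash-based bucketing of user ids are assumptions chosen so that a user stays on the same side of the split across requests.

```python
import hashlib
from enum import Enum


class Phase(Enum):
    SHADOW = "shadow"  # new model only sees mirrored copies
    CANARY = "canary"  # a small share of users gets the new model
    FULL = "full"      # everyone gets the new model


def user_bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def serving_model(user_id: str, phase: Phase, canary_percent: int = 5) -> str:
    """Return which model's response the user actually receives."""
    if phase is Phase.FULL:
        return "new"
    if phase is Phase.CANARY and user_bucket(user_id) < canary_percent:
        return "new"
    return "old"
```

Rolling back from any phase then amounts to setting the phase back to `Phase.SHADOW`, which returns every user to the old model.
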
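## Key Properties
- **Controlled risk**: a problem found in any phase can be rolled back immediately
- **Data-driven**: each phase has explicit quantitative gates
- **CI/CD integration**: dark launching can run as part of an automated release pipeline

## Connections
- [[AutonomousOptimizationArchitect]]: uses dark launching as its framework for introducing new models
- [[ShadowTraffic]]: the core implementation of dark-launch Phase 1
- [[CircuitBreaker]]: provides automatic protection when a dark launch fails
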
wiki/concepts/LLMasJudge.md
@@ -0,0 +1,31 @@
---
title: "LLMasJudge"
type: concept
tags: ["evaluation", "llm-evaluation", "quality-assurance"]
sources: ["engineering-autonomous-optimization-architect"]
last_updated: 2026-04-26
---

## Aliases
- LLM as a Judge
- LLM-as-Judge
- LLM-as-a-Judge Grading

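## Definition
LLM-as-a-Judge is the scoring mechanism of [[AutonomousOptimizationArchitect]]. An independent LLM (e.g. Claude Opus) acts as the "judge" and scores the outputs of the experimental and production models objectively, avoiding the subjective bias of human review. Scoring dimensions include JSON format correctness (5 points), latency (3 points), and hallucination detection (-10 points).
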
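The point values in the definition can be combined into a single score per output. The sketch below uses the 5 / 3 / -10 weights from the note; the 2000 ms latency budget and the idea that the judge LLM supplies the `hallucinated` flag as a boolean are assumptions, since the note does not say how each dimension is measured.

```python
import json


def rubric_score(output_text: str, latency_ms: float, hallucinated: bool,
                 latency_budget_ms: float = 2000.0) -> int:
    """Score one model output against the three rubric dimensions."""
    score = 0
    try:
        json.loads(output_text)
        score += 5  # output parses as well-formed JSON
    except json.JSONDecodeError:
        pass
    if latency_ms <= latency_budget_ms:
        score += 3  # response arrived within the assumed latency budget
    if hallucinated:
        score -= 10  # penalty for a hallucination flagged by the judge
    return score
```

Because the hallucination penalty outweighs both bonuses combined, a fast, well-formed but fabricated answer still scores below zero.
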
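## Mechanism
1. **Rubric defined up front**: before [[ShadowTraffic]] testing begins, [[AutonomousOptimizationArchitect]] explicitly establishes a mathematical scoring rubric
2. **Asynchronous evaluation**: the experimental and production models process the same task, and the judge LLM grades both outputs blind
3. **Statistical analysis**: once enough samples accumulate, run a statistical significance test
4. **Autonomous decision**: when the experimental model significantly outperforms the baseline, update the routing weights
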
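Steps 3 and 4 hinge on a significance test, which the note does not name. One option, because both models answer the same tasks, is a one-sided paired z-test on the per-task score differences. The test choice, the 0.05 significance level, and the 30-sample minimum below are all assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def experiment_wins(exp_scores, base_scores, alpha=0.05, min_samples=30):
    """One-sided paired z-test: is the experimental model better on average?"""
    diffs = [e - b for e, b in zip(exp_scores, base_scores)]
    n = len(diffs)
    if n < min_samples:
        return False  # too few samples for the normal approximation
    spread = stdev(diffs)
    if spread == 0:
        # Every pair differs by the same amount, so the sign decides.
        return mean(diffs) > 0
    z = mean(diffs) / (spread / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)
    return p_value < alpha
```

Returning `False` on small samples keeps the router from reacting to a handful of lucky results.
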
## Key Properties
- **Objectivity**: removes the subjective bias of human scoring
- **Scalable**: can evaluate outputs from multiple providers at once
- **Data-driven**: scores feed directly into [[SemanticRouting]] decisions

## Connections
- [[AutonomousOptimizationArchitect]]: LLM-as-Judge is its core evaluation tool
- [[ShadowTraffic]]: provides the traffic environment where experiment and baseline run in parallel
- [[SemanticRouting]]: scores update the routing weights

wiki/concepts/SemanticRouting.md
@@ -0,0 +1,32 @@
---
title: "SemanticRouting"
type: concept
tags: ["routing", "llm-ops", "intelligent-routing"]
sources: ["engineering-autonomous-optimization-architect"]
last_updated: 2026-04-26
---

## Aliases
- Semantic Routing
- 语义路由
- Intent Routing
- Task-Aware Routing

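## Definition
Semantic routing is the decision core of [[AutonomousOptimizationArchitect]]. It dynamically selects the best LLM provider based on task type, historical performance scores, and current provider status. Providers are sorted by an "optimization score" (a composite ranking of Speed + Cost + Accuracy), and the highest-ranked available provider is tried first.
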
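## Mechanism
1. **Task analysis**: determine the type and complexity of the user request (e.g. code generation vs. small talk)
2. **Provider ranking**: sort all providers by historical optimization score
3. **Dynamic selection**: start from the top-ranked provider and move down until one is available and within the cost limit
4. **Continuous learning**: [[LLMasJudge]] scores update each provider's ranking for specific task types
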
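Steps 2 and 3 can be sketched as a sort followed by a filtered walk. The note describes the optimization score only as a Speed + Cost + Accuracy composite, so the equal-weight sum of three normalized components, the `Provider` fields, and the `route` function are assumptions. Task analysis and per-task-type rankings (steps 1 and 4) are left out.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    speed: float            # normalized 0..1, higher is faster
    cost_efficiency: float  # normalized 0..1, higher is cheaper
    accuracy: float         # normalized 0..1, e.g. derived from judge scores
    price_per_million: float
    available: bool = True  # False while its circuit breaker is open

    @property
    def optimization_score(self) -> float:
        # Assumed equal weighting of the three components.
        return self.speed + self.cost_efficiency + self.accuracy


def route(providers, max_price_per_million):
    """Return the best-ranked provider that is up and within the cost limit."""
    ranked = sorted(providers, key=lambda p: p.optimization_score, reverse=True)
    for provider in ranked:
        if provider.available and provider.price_per_million <= max_price_per_million:
            return provider
    return None  # nothing usable: the caller should alert rather than guess
```

Keeping availability out of the score and applying it as a filter means a tripped provider returns to its old rank as soon as its breaker closes.
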
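## Key Properties
- **Cost-aware**: always tracks cost per million tokens and prefers lower-cost models
- **Performance-adaptive**: adjusts rankings dynamically from [[ShadowTraffic]] data
- **Failure-aware**: providers cut off by a circuit breaker are skipped automatically

## Connections
- [[AutonomousOptimizationArchitect]]: semantic routing is its core routing decision logic
- [[CircuitBreaker]]: supplies failure-aware provider filtering
- [[LLMasJudge]]: supplies the data used to update routing weights
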
wiki/concepts/ShadowTraffic.md
@@ -0,0 +1,32 @@
---
title: "ShadowTraffic"
type: concept
tags: ["testing", "a-b-testing", "dark-launch"]
sources: ["engineering-autonomous-optimization-architect"]
last_updated: 2026-04-26
---

## Aliases
- Shadow Traffic
- 影子流量
- Shadow Testing
- 暗测试

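## Definition
Shadow traffic is the core testing mechanism of [[AutonomousOptimizationArchitect]]. A small share of real user requests (typically 5%) is copied asynchronously to an experimental model, which runs in parallel with the production model but never returns its output to the user. The results are scored automatically by [[LLMasJudge]] and used to decide whether to promote the experimental model to production.
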
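## Mechanism
1. **Traffic mirroring**: a sampled user request is sent to both the production model and the experimental model
2. **Asynchronous evaluation**: the experimental result does not block the user response and is scored asynchronously by [[LLMasJudge]]
3. **Statistical analysis**: after N executions (e.g. 1,000), evaluate the performance gap
4. **Autonomous promotion**: when the experimental model is statistically significantly better than the baseline, routing weights are updated automatically
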
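Steps 1 and 2 above can be sketched with `asyncio`. The 5% sample rate comes from the note; the function names, the callable model interfaces, the random sampling, and the `results` list that a judge would later read are assumptions made for illustration.

```python
import asyncio
import random


async def _shadow(prompt, experimental, results):
    """Run the experimental model and store its output for later judging."""
    try:
        results.append((prompt, await experimental(prompt)))
    except Exception:
        pass  # a failing experiment must never affect the user path


async def handle_request(prompt, production, experimental, results,
                         sample_rate=0.05, rng=random.random):
    """Answer from production; mirror a sample of requests to the experiment."""
    shadow_task = None
    if rng() < sample_rate:
        # Background copy: the user response does not wait on this task.
        shadow_task = asyncio.create_task(_shadow(prompt, experimental, results))
    response = await production(prompt)
    # The task is returned so the caller can keep a reference to it.
    return response, shadow_task
```

Only the production response is ever returned to the caller, which is what gives the zero-user-impact property; the shadow output reaches nothing but the results list.
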
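## Key Properties
- **Zero user impact**: experiments run in the background; users always receive the production model's response
- **Real data**: uses real user requests rather than hand-built test cases
- **Continuous operation**: can run 24/7 to keep monitoring newly released models

## Connections
- [[AutonomousOptimizationArchitect]]: shadow traffic is its core testing infrastructure
- [[LLMasJudge]]: scores shadow-traffic results automatically
- [[DarkLaunching]]: shadow traffic is the testing phase of a dark launch
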