---
title: "Public Cloud Learning Sessions - Observability with OpenTelemetry - 20240402"
type: source
tags:
- OpenTelemetry
- Observability
- AWS
- EKS
date: 2024-04-02
---

## Source File
- [[raw/Cloud & DevOps/Public-Cloud-Learning-Sessions/04_EKS/public-cloud-learning-sessions-observability-with-opentelemetry-20240402-160113-.md]]

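## Summary
- Core topic: an end-to-end overview of the AWS OpenTelemetry observability solution, covering core OpenTelemetry concepts, the features of the AWS distribution, and a complete demo on EKS
- Problem domain: observability challenges in microservice architectures (growing system complexity, internal state that is hard to infer from external outputs, a Gartner estimate of 87 hours of downtime per year on average, at a cost of $42,000 per hour)
- Methods/mechanisms: the three-signal observability model (Metrics/Logs/Traces), the unified OpenTelemetry SDK (11 languages supported), the standardized OTLP protocol, auto-injection by AWS Distribution for OpenTelemetry, the OpenTelemetry Collector components (Receivers/Processors/Exporters/Extensions), and an end-to-end pipeline of Fluent Bit log collection → OpenTelemetry → Amazon OpenSearch
- Conclusions/value: OpenTelemetry offers a vendor-agnostic, unified observability solution; the AWS distribution simplifies deployment on EKS; the latest release strengthens security and compliance, scalability, user experience, and log support
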
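The two downtime figures quoted in the problem domain above can be combined into a rough yearly cost. The sketch below only multiplies the numbers attributed to Gartner in the session; the function and constant names are illustrative and do not come from the talk.

```python
# Figures quoted in the session (attributed to Gartner).
DOWNTIME_HOURS_PER_YEAR = 87
COST_PER_HOUR_USD = 42_000


def annual_downtime_cost(hours: int, cost_per_hour: int) -> int:
    """Return the yearly downtime cost implied by the two estimates."""
    return hours * cost_per_hour


annual_cost = annual_downtime_cost(DOWNTIME_HOURS_PER_YEAR, COST_PER_HOUR_USD)
print(f"${annual_cost:,} per year")  # 87 * 42,000 = 3,654,000
```

Under these estimates the implied cost is about $3.65 million per year, which is the scale of loss the session uses to motivate better observability.
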
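## Key Claims
- Microservice architectures make observability challenges more acute, because system complexity grows exponentially as the number of services increases
- The three signals (Metrics/Logs/Traces) together form a complete observability view: Metrics provide aggregate statistics, Logs pinpoint root causes, and Traces show the full path of a request
- OpenTelemetry uses a unified data format and cross-language SDKs to address the fragmentation caused by different components relying on different SDKs and tools
- AWS Distribution for OpenTelemetry provides a unified agent that automatically detects the application language and creates a preconfigured Collector, enabling non-intrusive auto-injection
- Fluent Bit sends logs to the OpenTelemetry container (port 55681); the OpenTelemetry Collector processes them centrally and exports them to OpenSearch
- OpenSearch Dashboard can display latency by trace group and locate performance bottlenecks through the application composition map
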
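To make the Fluent Bit → Collector hop above more concrete, here is a minimal sketch of a single log record addressed to an OTLP/HTTP endpoint on port 55681. Only the port number comes from the demo. The host name, service name, and message are hypothetical, and the payload uses the current OTLP JSON field names as an assumption; Fluent Bit's own wire encoding may differ. Port 55681 appears to be the legacy OTLP/HTTP port (recent Collector releases default to 4318). The request is built but not sent.

```python
import json
import time
import urllib.request

# Assumption: hypothetical in-cluster address; only the port is from the demo.
COLLECTOR_URL = "http://otel-collector.example.internal:55681/v1/logs"


def build_log_request(service: str, message: str,
                      severity: str = "INFO") -> urllib.request.Request:
    """Build (but do not send) an OTLP/HTTP JSON request with one log record."""
    payload = {
        "resourceLogs": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service}}
                ]
            },
            "scopeLogs": [{
                "logRecords": [{
                    "timeUnixNano": str(time.time_ns()),
                    "severityText": severity,
                    "body": {"stringValue": message},
                }]
            }],
        }]
    }
    return urllib.request.Request(
        COLLECTOR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_log_request("checkout", "payment authorized")
print(request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=5)
```

Keeping the send step commented out lets the sketch run without a Collector; in a real cluster the URL would point at the Collector's Service.
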
## Key Quotes
> "Observability is defined as a measure of how well internal states of a system can be inferred from knowledge of its external outputs." (Jay Comer, opening definition in the AWS talk)
> "OpenTelemetry aims to solve the problem of disparate SDKs and tooling for different components within the observability landscape by providing an instrumentation language with different SDKs per language." (core value proposition of OpenTelemetry)
> "The output that Fluent Bit is sending the individual logs to is the Open Telemetry endpoint on the port 55681." (key configuration detail from the demo)

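## Key Concepts
- [[OpenTelemetry]]: a Cloud Native Computing Foundation (CNCF) project that provides a unified, cross-language standard for collecting telemetry data; it comprises SDKs (11 languages), the OTLP protocol, and the Collector
- [[Observability(可观测性)]]: the ability to infer a system's internal state from its external outputs (logs/metrics/traces); a core challenge in microservice architectures
- [[Three Signals(三信号)]]: Metrics (aggregate statistics), Logs (root-cause location), and Traces (end-to-end request tracing); together they form a complete observability view
- [[OTLP(OpenTelemetry Protocol)]]: OpenTelemetry's standardized data transport protocol; the Collector exports data to different backends
- [[OpenTelemetry Collector]]: the component that standardizes and transforms telemetry data; it consists of Receivers, Processors, Exporters, and Extensions
- [[AWS Distribution for OpenTelemetry]]: the unified OpenTelemetry agent provided by AWS; supports automatic collection of Traces/Metrics/Logs and auto-injection through the EKS Operator
- [[Fluent Bit]]: an open-source log processor and forwarder; on EKS it collects container logs and forwards them to the OpenTelemetry endpoint
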
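The Collector roles listed above can be pictured as a small data path: a receiver accepts a record, processors transform it in order, and exporters hand it to a backend. The toy model below is only a sketch of that flow in plain Python, not the Collector's actual API. The class, function, and attribute values are hypothetical, and Extensions are left out because they sit outside the telemetry data path.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Pipeline:
    """Toy model of one Collector pipeline: receive -> process -> export."""
    processors: List[Callable[[Record], Record]] = field(default_factory=list)
    exporters: List[Callable[[Record], None]] = field(default_factory=list)

    def receive(self, record: Record) -> None:
        # Receiver role: accept a record, then run it through each stage.
        for process in self.processors:
            record = process(record)
        for export in self.exporters:
            export(record)


def add_cluster_attribute(record: Record) -> Record:
    # Processor role: enrich the record (the cluster name is hypothetical).
    return {**record, "k8s.cluster.name": "demo-eks"}


backend: List[Record] = []  # stand-in for a backend such as OpenSearch

pipeline = Pipeline(processors=[add_cluster_attribute],
                    exporters=[backend.append])
pipeline.receive({"body": "payment authorized", "service.name": "checkout"})
print(backend)
```

Because exporters are just callables here, swapping the backend means passing a different function, which mirrors the vendor-agnostic idea described in the session.
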
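## Key Entities
- [[Jay Comer]]: AWS Solutions Architect and presenter of this OpenTelemetry observability session
- [[Amazon EKS]]: AWS managed Kubernetes service; the environment in which the demo's sample application runs
- [[Amazon OpenSearch Service]]: AWS managed search and analytics service; used in the demo as the backend store for telemetry data
- [[Amazon CloudWatch]]: AWS-native monitoring service; part of the AWS observability ecosystem but not a focus of this demo
- [[AWS X-Ray]]: AWS-native distributed tracing service; part of the AWS observability ecosystem but not a focus of this demo
- [[Grafana]]: open-source observability platform; an important part of the AWS observability ecosystem
- [[Prometheus]]: open-source metrics collection system; the AWS Managed Service Collector for Prometheus provides serverless, automatic scraping
- [[Fluent Bit]]: CNCF graduated project (under the Fluentd umbrella); a lightweight log processor used to collect container logs on EKS
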
## Connections
- [[ctp-topic-54-esm-saas-log-analytics]] ← related_to ← [[ctp-topic-67-cloud-native-observability-using-opentelemetry]]
- [[Amazon EKS]] ← instrumented_by ← [[OpenTelemetry]]
- [[Fluent Bit]] ← sends_logs_to ← OpenTelemetry Collector (port 55681)
- [[OpenTelemetry Collector]] ← exports_to ← [[Amazon OpenSearch Service]]
- [[OpenTelemetry]] ← is_standardized_by ← [[OTLP(OpenTelemetry Protocol)]]

## Contradictions
- (No conflicts with other wiki pages have been detected so far)