title: Public Cloud Learning Sessions- Observability with OpenTelemetry - 20240402 160113-Meeting Recording
type: cloud-learning
source-type: video
category: DevOps & SRE/04_EKS
tags: OpenTelemetry, Observability
date-added: 2026-04-14
video-source: nas:///volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions- Observability with OpenTelemetry - 20240402_160113-Meeting Recording.mp4
audio-source:
status: summarized (Gemini summary)

Public Cloud Learning Sessions- Observability with OpenTelemetry - 20240402 160113-Meeting Recording

Source: NAS /volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions- Observability with OpenTelemetry - 20240402_160113-Meeting Recording.mp4

Type: VIDEO | Category: 04_EKS

Status: 🟡 Awaiting Whisper transcription → Summary


Observability with OpenTelemetry

Jay Comer, Solutions Architect with AWS, presented an overview of observability with OpenTelemetry, including changes and updates within the AWS observability ecosystem since the last session a year ago. The session included a demo showing how to piece together the components and how to instrument an application with OpenTelemetry.

Observability is defined as a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. These outputs include logs, metrics, and traces, which are correlated with the application's health. As systems move to microservice-based architectures, the observability challenge becomes more prominent because of the added complexity. Downtime costs significant money and effort: Gartner estimates an average of 87 hours of downtime per year, at a cost of $42,000 per hour.

The three signals used for observability are metrics, logs, and traces. Metrics are aggregated source statistics, logs help determine the root cause of problems, and traces provide a holistic view of a specific request within the system. A trace span includes a start time, a duration, and metadata such as a log.
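To make the span description concrete, here is a minimal sketch of a span as a plain record with a start time, a duration, and free-form metadata. The class, its field names, and the sample values are illustrative assumptions, not the OpenTelemetry data model.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Illustrative span record; field names are assumptions, not the OTLP schema."""
    trace_id: str
    name: str
    start_ms: int
    duration_ms: int
    metadata: dict = field(default_factory=dict)

    @property
    def end_ms(self) -> int:
        return self.start_ms + self.duration_ms


def trace_duration_ms(spans):
    """Wall-clock time from the earliest span start to the latest span end."""
    return max(s.end_ms for s in spans) - min(s.start_ms for s in spans)


# Three spans that share one trace ID and together describe a single request.
spans = [
    Span("t1", "GET /checkout", start_ms=0, duration_ms=120),
    Span("t1", "auth", start_ms=5, duration_ms=20),
    Span("t1", "charge-card", start_ms=30, duration_ms=85,
         metadata={"log": "payment gateway responded 200"}),
]
print(trace_duration_ms(spans))  # 120
```

Grouping spans by a shared trace ID is what lets a trace give the holistic view of one request, while each span still carries its own timing and metadata.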

The AWS observability landscape includes AWS native services such as CloudWatch and X-Ray, as well as managed offerings built on open-source projects such as Grafana, OpenSearch, Prometheus, and OpenTelemetry. OpenTelemetry aims to solve the problem of disparate SDKs and tooling across the observability landscape by providing a common instrumentation standard with an SDK for each programming language. It offers a vendor-agnostic, end-to-end implementation for making telemetry data accessible and usable.

OpenTelemetry defines a common data format, provides SDKs for 11 languages, and supports automatic instrumentation. The OpenTelemetry Collector standardizes and transforms data into the OpenTelemetry Protocol (OTLP) format and exports it to different destinations. The collector consists of receivers (AWS-specific or open source), processors (filtering, transformations), exporters (AWS native, open source, or third-party), and extensions (such as SigV4 for authentication and health checks).
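The receiver, processor, and exporter stages can be pictured as a simple data flow. The sketch below is a toy model in plain Python, not the collector's implementation or configuration format; every function name and record field in it is hypothetical.

```python
def drop_health_checks(records):
    # Filtering processor: discard noisy health-check telemetry.
    return (r for r in records if r.get("route") != "/healthz")


def add_environment(records):
    # Transformation processor: stamp every record with a shared attribute.
    return ({**r, "env": "demo"} for r in records)


def run_pipeline(received, processors, exporters):
    """Pass received records through each processor in order, then fan out to every exporter."""
    records = received
    for process in processors:
        records = process(records)
    batch = list(records)
    for export in exporters:
        export(batch)
    return batch


# Two in-memory lists stand in for two export destinations.
exported_a, exported_b = [], []
batch = run_pipeline(
    received=[{"route": "/healthz"}, {"route": "/orders", "latency_ms": 42}],
    processors=[drop_health_checks, add_environment],
    exporters=[exported_a.extend, exported_b.extend],
)
print(batch)  # [{'route': '/orders', 'latency_ms': 42, 'env': 'demo'}]
```

The point of the model is the ordering: processors run in sequence on everything a receiver accepts, and the same processed batch is then delivered to each configured exporter.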

The AWS Distro for OpenTelemetry (ADOT) is a unified agent for collecting traces, metrics, and logs. It includes an operator that automatically instruments applications by detecting the language used and creating pre-configured OpenTelemetry collectors. Custom attributes, such as tenant IDs, can be added to the telemetry that OpenTelemetry emits.

Recent announcements focused on security and compliance, scale and region expansion, and a single pane of glass with an improved user experience. The managed collector for Amazon Managed Service for Prometheus provides a serverless, agentless scraper that automatically discovers and pulls Prometheus-compatible metrics. Log support was added to the AWS Distro for OpenTelemetry, and Amazon Managed Grafana now supports community plugins.

The demo showcased a sample application running on EKS, using Fluent Bit to collect logs and forward them to the OpenTelemetry container. The OpenTelemetry container collects traces and metrics from the application and sends logs, traces, and metrics to Amazon OpenSearch Service via an ingestion pipeline. The source code included Fluent Bit and OpenTelemetry YAML configuration files. Fluent Bit's output sends individual log records to the OpenTelemetry endpoint on port 55681. At the code level, the implementation involves importing the OpenTelemetry SDK, configuring a tracer provider, and starting a span with the tracer at each point where instrumentation and request duration measurement are needed.

OpenSearch Dashboards can display latency by trace group and an application composition map, both of which show where bottlenecks appear.
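As a rough illustration of what a latency-by-trace-group view summarizes, the sketch below averages request latency per group and lists the slowest group first. It is an assumption about the general idea, not how OpenSearch computes its panels, and the group names and latency values are made up.

```python
from collections import defaultdict
from statistics import mean


def latency_by_trace_group(traces):
    """Average latency in ms per trace group, slowest group first."""
    grouped = defaultdict(list)
    for group, latency_ms in traces:
        grouped[group].append(latency_ms)
    averages = {group: mean(values) for group, values in grouped.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Each tuple is (trace group, latency of one request in ms); values are hypothetical.
sample = [
    ("GET /orders", 120),
    ("GET /orders", 80),
    ("POST /checkout", 300),
    ("POST /checkout", 500),
]
ranking = latency_by_trace_group(sample)
for group, avg_ms in ranking:
    print(group, avg_ms)
```

Sorting by average latency puts the most likely bottleneck at the top, which is the same question the dashboard view is meant to answer at a glance.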