Auto-sync: wiki-ingest 3 sources (2026-04-16)

2026-04-16 00:08:35 +08:00
parent 9688f3f54b
commit 5ae9550d8c
267 changed files with 9537 additions and 1163 deletions
--- a/SRE/04_EKS/ctp-topic-29-cloud-monitoring-saas-lz-accounts.md
+++ b/SRE/04_EKS/ctp-topic-29-cloud-monitoring-saas-lz-accounts.md
@@ -1,8 +1,8 @@
 ---
-title: "CTP Topic 29 Cloud Monitoring – SaaS LZ accounts"
+title: CTP Topic 29 Cloud Monitoring – SaaS LZ accounts
 type: cloud-learning
 source-type: video
-category: "DevOps & SRE/04_EKS"
+category: DevOps & SRE/04_EKS
 tags:
  - AWS
  - Monitoring
@@ -10,9 +10,9 @@ tags:
  - Landing-Zone
  - CTP
 date-added: 2026-04-14
-video-source: "nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 29_ Cloud Monitoring – SaaS LZ accounts.mp4"
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 29_ Cloud Monitoring – SaaS LZ accounts.mp4
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # CTP Topic 29 Cloud Monitoring – SaaS LZ accounts
@@ -27,7 +27,14 @@ status: raw

 ## Summary

-> To be generated by an LLM after transcription
+> ## AWS Cloud Monitoring with OpsBridge
+
+The session covers AWS cloud monitoring with Micro Focus OpsBridge, focusing on its new Cloud Monitoring feature. This containerized solution can be deployed on-premises or on AWS EKS and supports monitoring of more than 20 AWS data services, with data stored in the OPTIC Data Lake, which uses Vertica, for performance dashboarding and reporting. The architecture collects CloudWatch metrics using read-only access to the monitored accounts, correlates the data, and updates the configuration management database.
+
+Key points include deployment, monitoring setup, and operations. Cloud Monitoring is enabled within OpsBridge, requiring a one-time IAM role setup in customer accounts for read-only access. *Tag-based monitoring is emphasized as a best practice, with automation to identify missing tags.* The solution uses a single instance to monitor multiple accounts and regions.
+
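The tag-audit automation mentioned above could look roughly like the following. This is a minimal sketch, not the OpsBridge implementation: the required tag keys, the resource IDs, and the in-memory `inventory` are all hypothetical stand-ins for data a real check would read from the monitored accounts.

```python
# Hypothetical tag policy; the session does not list the actual required tags.
REQUIRED_TAGS = {"Owner", "Environment", "Product"}


def missing_tags(resources, required=REQUIRED_TAGS):
    """Return {resource_id: sorted missing tag keys} for under-tagged resources."""
    report = {}
    for res in resources:
        # A tag with an empty value is treated as missing.
        present = {key for key, value in res.get("tags", {}).items() if value}
        absent = sorted(required - present)
        if absent:
            report[res["id"]] = absent
    return report


# Hypothetical inventory standing in for resources discovered in an account.
inventory = [
    {"id": "i-0001", "tags": {"Owner": "team-a", "Environment": "prod", "Product": "octane"}},
    {"id": "i-0002", "tags": {"Owner": "team-b"}},
    {"id": "db-0003", "tags": {}},
]

print(missing_tags(inventory))
```

Fully tagged resources are omitted from the report, so an empty result means the inventory meets the assumed policy.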
+Data consumption occurs via event dashboards, topology views, and performance dashboards. The solution is being developed in collaboration with the product R&D team, with new reporting features expected in the next release. The demo showcased event perspectives, performance dashboards, and topology views, highlighting event details, historical usage, and hierarchical resource presentation. The operational model's impact on application teams was discussed, including data feedback, OpsBridge expertise, and outage detection capabilities.
+

 ---

--- a/SRE/04_EKS/ctp-topic-29-cloud-monitoring-saas-lz-accounts.md.bak
+++ b/SRE/04_EKS/ctp-topic-29-cloud-monitoring-saas-lz-accounts.md.bak
@@ -0,0 +1,52 @@
+---
+title: CTP Topic 29 Cloud Monitoring – SaaS LZ accounts
+type: cloud-learning
+source-type: video
+category: DevOps & SRE/04_EKS
+tags:
+  - AWS
+  - Monitoring
+  - SaaS
+  - Landing-Zone
+  - CTP
+date-added: 2026-04-14
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 29_ Cloud Monitoring – SaaS LZ accounts.mp4
+audio-source: ""
+status: raw
+---
+
+# CTP Topic 29 Cloud Monitoring – SaaS LZ accounts
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/CTP _ Topic 29_ Cloud Monitoring – SaaS LZ accounts.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
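+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Link to paired video notes (fill in once generated)
+
+---
+
+*Last updated: 2026-04-14*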
--- a/SRE/04_EKS/ctp-topic-39-implementing-eks-in-the-aws-lab-landing-zone.md
+++ b/SRE/04_EKS/ctp-topic-39-implementing-eks-in-the-aws-lab-landing-zone.md
@@ -1,8 +1,8 @@
 ---
-title: "CTP Topic 39 Implementing EKS in the AWS Lab Landing Zone"
+title: CTP Topic 39 Implementing EKS in the AWS Lab Landing Zone
 type: cloud-learning
 source-type: video
-category: "DevOps & SRE/04_EKS"
+category: DevOps & SRE/04_EKS
 tags:
  - AWS
  - EKS
@@ -10,9 +10,9 @@ tags:
  - Landing-Zone
  - CTP
 date-added: 2026-04-14
-video-source: "nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 39_ Implementing EKS in the AWS Lab Landing Zone.mp4"
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 39_ Implementing EKS in the AWS Lab Landing Zone.mp4
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # CTP Topic 39 Implementing EKS in the AWS Lab Landing Zone
@@ -27,7 +27,21 @@ status: raw

 ## Summary

-> To be generated by an LLM after transcription
+> Spencer and Guy discuss implementing Elastic Kubernetes Service (EKS) in the AWS landing zone, focusing on a use case with Octane, a Micro Focus SaaS application that is IP-hungry. They faced challenges with the limited range of IP addresses available to AWS labs that run on the Micro Focus network.
+
+The solution involved creating a private subnet within their own address space, not connected to the main subnet, to give EKS a large pool of IPs. *The problem was that this wasn't supported in the EKS solution that was given to us.* They used Terraform and Terragrunt modules to create the lab, working with SRE to enable EKS to create its own subnet and assign its own IPs to each pod.
+
+Key points:
+*   The EKS module has a flag for custom networking configuration to control IP allocation.
+*   They demonstrated how to call the EKS module within Terraform code, specifying the subnet and mappings between federated accounts/roles.
+*   They showed how to access the EKS cluster, get pods, and access both internal Micro Focus network resources and external resources from within a pod.
+*   *Within the spec configuration, we basically have to put host network equals true* (that is, `hostNetwork: true` in the pod spec).
+*   They addressed a question about container hardening guidelines, explaining that they had discussions with security teams and implemented strong security measures.
+*   They mentioned that AWS may have contributed to the idea of this solution.
+*   Atlantis cannot currently deploy EKS clusters; a Terragrunt module on Jenkins is used instead.
+*   Mapping roles allows connection to the cluster and visibility of EKS components in the AWS console.
+*   The number of node groups is currently hardcoded but will be made configurable in future versions.
+
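The host-network setting from the key points can be illustrated with a small manifest builder. This is only a sketch: the pod name and image are hypothetical, and the session does not show the team's actual manifests. `hostNetwork` is the standard Kubernetes Pod spec field, and the output is JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def pod_manifest(name, image, host_network=True):
    """Build a minimal Pod manifest; host_network maps to spec.hostNetwork."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # With hostNetwork enabled the pod shares the node's network namespace.
            "hostNetwork": host_network,
            "containers": [{"name": name, "image": image}],
        },
    }


# Hypothetical pod name and image, chosen for illustration only.
manifest = pod_manifest("net-check", "busybox:1.36")
print(json.dumps(manifest, indent=2))
```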

 ---

--- a/SRE/04_EKS/ctp-topic-39-implementing-eks-in-the-aws-lab-landing-zone.md.bak
+++ b/SRE/04_EKS/ctp-topic-39-implementing-eks-in-the-aws-lab-landing-zone.md.bak
@@ -0,0 +1,52 @@
+---
+title: CTP Topic 39 Implementing EKS in the AWS Lab Landing Zone
+type: cloud-learning
+source-type: video
+category: DevOps & SRE/04_EKS
+tags:
+  - AWS
+  - EKS
+  - Kubernetes
+  - Landing-Zone
+  - CTP
+date-added: 2026-04-14
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 39_ Implementing EKS in the AWS Lab Landing Zone.mp4
+audio-source: ""
+status: raw
+---
+
+# CTP Topic 39 Implementing EKS in the AWS Lab Landing Zone
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/CTP _ Topic 39_ Implementing EKS in the AWS Lab Landing Zone.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
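+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Link to paired video notes (fill in once generated)
+
+---
+
+*Last updated: 2026-04-14*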
--- a/SRE/04_EKS/ctp-topic-42-grafana-observability-dashboard.md
+++ b/SRE/04_EKS/ctp-topic-42-grafana-observability-dashboard.md
@@ -1,17 +1,17 @@
 ---
-title: "CTP Topic 42 Grafana Observability dashboard"
+title: CTP Topic 42 Grafana Observability dashboard
 type: cloud-learning
 source-type: video
-category: "DevOps & SRE/04_EKS"
+category: DevOps & SRE/04_EKS
 tags:
  - Grafana
  - Observability
  - Dashboard
  - CTP
 date-added: 2026-04-14
-video-source: "nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 42_ Grafana_Observability dashboard.mp4"
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 42_ Grafana_Observability dashboard.mp4
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # CTP Topic 42 Grafana Observability dashboard
@@ -26,7 +26,28 @@ status: raw

 ## Summary

-> To be generated by an LLM after transcription
+> ## Grafana Observability and Dashboards
+
+Grafana is an open-source web application used for data visualization through charts and dashboards. It supports various data sources, including metrics (CPU load, memory usage) and logs (timestamps, debug levels). Data producers like Jenkins, CA servers, and AWS CloudWatch write data into these sources, which Grafana then visualizes. *Grafana does not hold any data by itself; it has to be fed from all kinds of data sources.*
+
+The infrastructure architecture involves users accessing Grafana through a load balancer and auto-scaling groups. Grafana is installed in a monitoring account and configured to access other product team AWS accounts via IAM role policies. A Grafana monitoring role is assumed from a Terraform service catalog repo, granting access to various landing zone source accounts.
+
+Grafana offers user-level and team-level access controls, with roles like editor, viewer, and admin. Data sources are created with specific ARNs to access AWS accounts. Dashboards are dynamic, fetching data based on product team access. A sample dashboard includes CPU, I/O, network, EBS, and estimated charges monitoring. Alerting systems can be configured to notify channels like Microsoft Teams of high CPU usage or service downtime.
+
+### Terraform and Automation
+
+Terraform is used to automate Grafana resource provisioning. Modules exist for data sources and Grafana organizations. A demo scenario simulates onboarding Grafana for a new product group account using LZSAP. The process involves creating folders, calling modules, and using JSON input variables to define organization names and user access.
+
+Dashboards are provisioned with data sources and regions as inputs. Grafana offers flexibility in dashboard layout and data visualization. Product teams can leverage these modules and customize dashboards with application-specific logs or custom CloudWatch metrics.
+
+### Network Monitoring and Roadmap
+
+Network monitoring is achieved using Prometheus as a data source for Check Point and firewall instances. A tool called norm is referenced to fetch metrics via the SNMP protocol. Key dashboards display inbound and outbound packet transfers, interface metrics, and CPU/disk usage.
+
+The roadmap includes implementing alerting and notification rules, refining network monitoring dashboards, building application-specific dashboards, and enabling product groups to consume Grafana Terraform modules. The goal is to replace Micro Focus tools with Grafana for end-to-end monitoring. *We would like to build application specific dashboards which can basically give us key insight with respect to our applications that are running over there.*
+
+Grafana offers open-source and paid versions (Grafana Enterprise and Grafana Cloud). User management is currently within the Grafana database but will move to LDAP or SSO.
+

 ---

--- a/SRE/04_EKS/ctp-topic-42-grafana-observability-dashboard.md.bak
+++ b/SRE/04_EKS/ctp-topic-42-grafana-observability-dashboard.md.bak
@@ -0,0 +1,51 @@
+---
+title: CTP Topic 42 Grafana Observability dashboard
+type: cloud-learning
+source-type: video
+category: DevOps & SRE/04_EKS
+tags:
+  - Grafana
+  - Observability
+  - Dashboard
+  - CTP
+date-added: 2026-04-14
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 42_ Grafana_Observability dashboard.mp4
+audio-source: ""
+status: raw
+---
+
+# CTP Topic 42 Grafana Observability dashboard
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/CTP _ Topic 42_ Grafana_Observability dashboard.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
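+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Link to paired video notes (fill in once generated)
+
+---
+
+*Last updated: 2026-04-14*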
--- a/SRE/04_EKS/ctp-topic-54-esm-saas-log-analytics.md
+++ b/SRE/04_EKS/ctp-topic-54-esm-saas-log-analytics.md
@@ -1,17 +1,17 @@
 ---
-title: "CTP Topic 54 ESM SaaS Log Analytics"
+title: CTP Topic 54 ESM SaaS Log Analytics
 type: cloud-learning
 source-type: video
-category: "DevOps & SRE/04_EKS"
+category: DevOps & SRE/04_EKS
 tags:
  - Log-Analytics
  - SaaS
  - ESM
  - CTP
 date-added: 2026-04-14
-video-source: "nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 54_ ESM SaaS Log Analytics.mp4"
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 54_ ESM SaaS Log Analytics.mp4
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # CTP Topic 54 ESM SaaS Log Analytics
@@ -26,7 +26,22 @@ status: raw

 ## Summary

-> To be generated by an LLM after transcription
+> ## ESM SaaS Log Analytics
+
+Jackie, an ITOM ESM SaaS architect, discusses Log Analytics, covering concepts, architecture, regional setup, provisioning, security, and a demo of a counter solution. He also briefly compares different solutions.
+
+The presentation begins with an overview of the ELK stack (Elasticsearch, Logstash, Kibana) and its open-source alternative, OpenSearch. Beats agents collect logs from the applications; Logstash then aggregates and parses them to give meaning to each column before they are stored in Elasticsearch or OpenSearch. Kibana is the front end for log visualization and analysis.
+
+*The component that collects your logs is called Beats.* The architecture involves two VPCs: one for the application and another for logging. Filebeat, running as a container, continuously ships logs from the application VPC to the logging VPC. Logstash processes these logs, and OpenSearch stores them. End users view logs in Kibana, connecting from a specified network. Redis serves as an optional buffer to keep Logstash from being overloaded.
+
+Due to legal reasons like GDPR, farms are split regionally, with farms in Oregon, the US, and Europe. Provisioning is done via CloudFormation or Terraform, but security hardening and continuous optimization pose challenges. Security measures include encryption at rest (using encrypted nodes and hardware-level encryption on NVMe devices) and in transit (using TLS 1.2). Traffic between VPCs is private, not over the internet. Index-based access control and RBAC are implemented for different user roles.
+
+A demo shows how to search for specific IDs or services within the logs. The session then compares Logz.io, AWS OpenSearch, self-hosted ELK, and Micro Focus OBA. Logz.io is a managed ELK solution. ELK is easy to configure but complex to manage, while OBA is more mature, with commercial options and automated clustering. ELK supports fine-grained access control, while OBA supports column-level access control.
+
+Cost estimates are provided based on a single farm usage with 14 days retention and 100GB processed daily. Logz.io costs around $4,000, while AWS OpenSearch costs around $1,500 or less. Self-hosted options can be very low cost but require more maintenance. Availability SLAs vary, with Logz.io offering 99.8% and AWS OpenSearch offering 99.9%. Disaster recovery is covered by the vendor for Logz.io, while AWS OpenSearch automatically captures snapshots.
+
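To put the quoted SLAs and sizing assumptions in concrete terms, the arithmetic can be sketched as below. The 30-day month is an assumption made here, and the retained-volume figure ignores replicas, index overhead, and compression, none of which the summary specifies.

```python
def monthly_downtime_minutes(sla_percent, days=30):
    """Maximum downtime per month that still meets the given availability SLA."""
    return (1 - sla_percent / 100) * days * 24 * 60


def retained_volume_gb(daily_gb, retention_days):
    """Raw log volume held at steady state (no replicas or index overhead)."""
    return daily_gb * retention_days


for name, sla in [("Logz.io", 99.8), ("AWS OpenSearch", 99.9)]:
    print(f"{name}: {monthly_downtime_minutes(sla):.1f} min/month allowed downtime")

print(f"Retained volume: {retained_volume_gb(100, 14)} GB")
```

Under these assumptions the 0.1-point difference in SLA halves the allowed downtime, from about 86 to about 43 minutes a month, and the farm holds 1,400 GB of raw logs at steady state.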
+Recommendations for starting with Log Analytics include beginning with Logz.io for its trial period, then transitioning to AWS OpenSearch or self-hosted options for more control. The presentation concludes with a Q&A session covering GDPR requirements, log acquisition, cost details, scaling, and comparisons to other solutions. *We have already built up all the farms.*
+

 ---

--- a/SRE/04_EKS/ctp-topic-54-esm-saas-log-analytics.md.bak
+++ b/SRE/04_EKS/ctp-topic-54-esm-saas-log-analytics.md.bak
@@ -0,0 +1,51 @@
+---
+title: CTP Topic 54 ESM SaaS Log Analytics
+type: cloud-learning
+source-type: video
+category: DevOps & SRE/04_EKS
+tags:
+  - Log-Analytics
+  - SaaS
+  - ESM
+  - CTP
+date-added: 2026-04-14
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 54_ ESM SaaS Log Analytics.mp4
+audio-source: ""
+status: raw
+---
+
+# CTP Topic 54 ESM SaaS Log Analytics
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/CTP _ Topic 54_ ESM SaaS Log Analytics.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
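+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Link to paired video notes (fill in once generated)
+
+---
+
+*Last updated: 2026-04-14*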
--- a/SRE/04_EKS/ctp-topic-59-achieving-reliability-with-amazon-eks.md
+++ b/SRE/04_EKS/ctp-topic-59-achieving-reliability-with-amazon-eks.md
@@ -1,8 +1,8 @@
 ---
-title: "CTP Topic 59 Achieving reliability with Amazon EKS"
+title: CTP Topic 59 Achieving reliability with Amazon EKS
 type: cloud-learning
 source-type: video
-category: "DevOps & SRE/04_EKS"
+category: DevOps & SRE/04_EKS
 tags:
  - AWS
  - EKS
@@ -10,9 +10,9 @@ tags:
  - Reliability
  - CTP
 date-added: 2026-04-14
-video-source: "nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 59_ Achieving reliability with Amazon EKS.mp4"
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 59_ Achieving reliability with Amazon EKS.mp4
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # CTP Topic 59 Achieving reliability with Amazon EKS
@@ -27,7 +27,20 @@ status: raw

 ## Summary

-> To be generated by an LLM after transcription
+> ## EKS Reliability with AWS
+
+Surav Paul, a Senior Solutions Architect from AWS, presented on EKS (Elastic Kubernetes Service), covering container offerings and reliability practices. The session aimed to be interactive, encouraging questions about shared responsibility models, reliability-based practices, application reliability, and data plane reliability.
+
+When considering container offerings on AWS, users can choose between Amazon Elastic Container Service (ECS) and Elastic Kubernetes Service (EKS). ECS is recommended for those starting their container adoption journey, offering a simple interface with native AWS service integrations. EKS is suitable for those familiar with the Kubernetes ecosystem, providing flexibility with open community initiatives. *ECS is a more AWS opinionated way of running containers.* Both ECS and EKS offer multiple compute options, including VM images, serverless deployments (AWS Fargate), and on-prem deployments.
+
+Reliability in a system means it offers predictable behavior even when failures occur. Key concerns include failure detection, graceful service degradation, deterministic failure modes, self-healing capabilities, and on-demand scaling. Reliability concerns are grouped under application, control plane, and data plane categories. The shared responsibility model dictates that AWS manages control plane components (state store, scheduler, controller manager, API servers), while customers manage aspects like worker nodes, operating systems, and application configurations. *With Fargate, you don't have to worry about managing the nodes or worrying about patching or upgrading the nodes.*
+
+Application reliability involves avoiding singleton pods and spreading application pods across availability zones using pod anti-affinity or topology spread constraints. Topology spread constraints offer finer-grained control over workload distribution. Collecting metrics via the metrics server is crucial for scaling, with HPA (Horizontal Pod Autoscaler) using CPU utilization and memory consumption by default, and custom/external metrics available. VPA (Vertical Pod Autoscaler) can right-size pods, but runtime adjustments cause restarts. Deployment strategies include rolling upgrades, blue-green deployments, and canary deployments, each with different levels of control and complexity. Liveness, readiness, and startup probes are essential for monitoring pod health, and pod disruption budgets ensure minimum service levels during maintenance.
+
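The skew rule behind topology spread constraints can be sketched in a few lines. This is a simplified model of the `whenUnsatisfiable: DoNotSchedule` behavior only; it ignores node affinity, taints, and `minDomains`, and the zone names and pod counts are hypothetical.

```python
def can_schedule(zone_counts, zone, max_skew=1):
    """Check whether one more matching pod may land in `zone`.

    Skew is the pod count in the candidate zone after placement minus the
    smallest current count across all zones; it must not exceed max_skew.
    """
    global_min = min(zone_counts.values())
    skew = zone_counts[zone] + 1 - global_min
    return skew <= max_skew


# Hypothetical distribution of matching pods across three availability zones.
counts = {"us-west-2a": 2, "us-west-2b": 1, "us-west-2c": 1}

for zone in counts:
    print(zone, can_schedule(counts, zone))
```

With `max_skew=1`, the zone that already holds two pods is rejected while the two lighter zones are accepted, which is how the constraint keeps replicas spread evenly.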
+Control plane reliability involves monitoring control plane metrics (API server requests, etcd state store size) to prevent issues. Securing cluster authentication by creating a secure user with the super admin role is crucial. Admission webhooks should be carefully configured and tested to avoid obstructing the control plane. Cluster upgrades have control plane and data plane phases, with EKS platform versions handling patch releases transparently. Minor version upgrades have a 14-month support cycle before automatic upgrades occur.
+
+Data plane reliability involves using tools like node problem detector, reserving system resources, implementing quality of service, and configuring resource quotas and limit ranges. Pod priority and preemption are also important.
+

 ---

--- a/SRE/04_EKS/ctp-topic-59-achieving-reliability-with-amazon-eks.md.bak
+++ b/SRE/04_EKS/ctp-topic-59-achieving-reliability-with-amazon-eks.md.bak
@@ -0,0 +1,52 @@
+---
+title: CTP Topic 59 Achieving reliability with Amazon EKS
+type: cloud-learning
+source-type: video
+category: DevOps & SRE/04_EKS
+tags:
+  - AWS
+  - EKS
+  - Kubernetes
+  - Reliability
+  - CTP
+date-added: 2026-04-14
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 59_ Achieving reliability with Amazon EKS.mp4
+audio-source: ""
+status: raw
+---
+
+# CTP Topic 59 Achieving reliability with Amazon EKS
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/CTP _ Topic 59_ Achieving reliability with Amazon EKS.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
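+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Link to paired video notes (fill in once generated)
+
+---
+
+*Last updated: 2026-04-14*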
--- a/SRE/04_EKS/ctp-topic-60-monitor-aws-using-hyperscale-observability-with-grafana.md
+++ b/SRE/04_EKS/ctp-topic-60-monitor-aws-using-hyperscale-observability-with-grafana.md
@@ -1,8 +1,8 @@
 ---
-title: "CTP Topic 60 Monitor AWS using Hyperscale Observability with Grafana"
+title: CTP Topic 60 Monitor AWS using Hyperscale Observability with Grafana
 type: cloud-learning
 source-type: video
-category: "DevOps & SRE/04_EKS"
+category: DevOps & SRE/04_EKS
 tags:
  - AWS
  - Grafana
@@ -10,9 +10,9 @@ tags:
  - Hyperscale
  - CTP
 date-added: 2026-04-14
-video-source: "nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 60_ Monitor AWS using Hyperscale Observability with Grafana.mp4"
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 60_ Monitor AWS using Hyperscale Observability with Grafana.mp4
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # CTP Topic 60 Monitor AWS using Hyperscale Observability with Grafana
@@ -27,7 +27,20 @@ status: raw

 ## Summary

-> To be generated by an LLM after transcription
+> ## Monitoring AWS Using Hyperscale Observability with Grafana
+
+This session is a continuation of a previous session about Grafana. It focuses on recent capabilities and features now available. Vinay covers the session, in place of Sashi, who is on leave.
+
+The session recaps previous discussions, including the effective use of Grafana with different data sources, creating queries, and customizing visualizations. Grafana's ability to provision infrastructure and applications using Terraform modules (dashboard as code) is highlighted, along with its use for SNMP-based network infrastructure monitoring. The move from the open-source version of Grafana to the enterprise license version is emphasized to leverage the full potential of Grafana.
+
+Key highlights explored through demonstrations include data source integration, event tracking, alert integrations, instance monitoring, and resource tracking. OPTIC DL, an internal monitoring solution and plugin of Vertica DB, is crucial for pulling data into Grafana dashboards. *OpsBridge monitoring solutions use a dashboard to display events triggered by the monitoring systems.* Grafana's alert system is flexible and can be configured to use different notification channels, with the ability to forward alerts to OpsBridge to create incidents. Instance monitoring helps identify resource utilization, and resource tagging categorizes resources for effective management.
+
+The session covers the use of a Terraform module for product teams, which creates Grafana organizations, users, folders, IAM roles, and dashboards for AWS services. *The product team can consume the modules by using a sample Terragrunt HCL file.* Default dashboards are provided for accounts onboarded to code, with prerequisites outlined in a readme file. Several default dashboards are offered to product teams, such as billing information dashboards that display resource utilization and EC2 dashboards that can be customized. Customized dashboards can consolidate all services into a single view, though this is typically limited to one account and one region.
+
+EC2 inventory dashboards, using data from OPTIC DL, provide a view of running and non-running EC2 instances and identify whether resources are tagged. Event dashboards display daily active events triggered by OpsBridge AWS monitoring solutions, with ongoing integration of alerts generated by Grafana. Future roadmap items include SSO authentication, reporting capabilities, URL monitoring, process monitoring, log monitoring, and integration with other products like PagerDuty and Slack Manager.
+
+The session concludes with a discussion of next steps and collaboration, encouraging users to leverage available dashboards and provide feedback or enhancement requests. The team also addresses questions about the cost impact of joining the service, clarifying that default metrics do not incur additional costs, but custom metrics may.
+

 ---

--- a/SRE/04_EKS/ctp-topic-60-monitor-aws-using-hyperscale-observability-with-grafana.md.bak
+++ b/SRE/04_EKS/ctp-topic-60-monitor-aws-using-hyperscale-observability-with-grafana.md.bak
@@ -0,0 +1,52 @@
+---
+title: CTP Topic 60 Monitor AWS using Hyperscale Observability with Grafana
+type: cloud-learning
+source-type: video
+category: DevOps & SRE/04_EKS
+tags:
+  - AWS
+  - Grafana
+  - Observability
+  - Hyperscale
+  - CTP
+date-added: 2026-04-14
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 60_ Monitor AWS using Hyperscale Observability with Grafana.mp4
+audio-source: ""
+status: raw
+---
+
+# CTP Topic 60 Monitor AWS using Hyperscale Observability with Grafana
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/CTP _ Topic 60_ Monitor AWS using Hyperscale Observability with Grafana.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
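+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Link to paired video notes (fill in once generated)
+
+---
+
+*Last updated: 2026-04-14*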
--- a/SRE/04_EKS/ctp-topic-64-scaling-out-with-amazon-eks.md
+++ b/SRE/04_EKS/ctp-topic-64-scaling-out-with-amazon-eks.md
@@ -1,8 +1,8 @@
 ---
-title: "CTP Topic 64 Scaling out with Amazon EKS"
+title: CTP Topic 64 Scaling out with Amazon EKS
 type: cloud-learning
 source-type: video
-category: "DevOps & SRE/04_EKS"
+category: DevOps & SRE/04_EKS
 tags:
  - AWS
  - EKS
@@ -10,9 +10,9 @@ tags:
  - Scaling
  - CTP
 date-added: 2026-04-14
-video-source: "nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 64_ Scaling out with Amazon EKS.mp4"
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 64_ Scaling out with Amazon EKS.mp4
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # CTP Topic 64 Scaling out with Amazon EKS
@@ -27,7 +27,26 @@ status: raw

 ## Summary

-> To be generated by an LLM after transcription
+> ## Scaling Out with Amazon EKS
+
+The 64th Cloud Transformation Program session covers scaling out with Amazon EKS, with a special guest presenter from AWS. The session is interactive and encourages questions, with a survey link to be shared for feedback.
+
+Surav Paul, a senior solutions architect from AWS, discusses scaling workloads using the horizontal pod autoscaler (HPA), event-driven autoscaling with KEDA, capacity autoscaling (cluster autoscaler and Karpenter), addressing IP exhaustion, and scaling cluster components like DNS.
+
+The horizontal pod autoscaler (HPA) is the standard Kubernetes mechanism for scaling application workloads, using metrics to determine replica requirements. It supports CPU and memory utilization out of the box via a metrics server. Custom and external metrics, such as those from load balancers or messaging middleware, can also be used. *The horizontal pod autoscaler is going to pull the metrics and it is going to calculate how many replicas are required for your application workload.* The speaker notes that the headroom between the target threshold and 100% utilization is important, and addresses flapping via the `periodSeconds` and `stabilizationWindowSeconds` settings. HPA currently considers resource consumption only at the pod level, not at the container level.
+
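The replica calculation the speaker describes follows the formula in the Kubernetes HPA documentation: the desired count is the current count scaled by the ratio of observed to target metric, rounded up, with no change when the ratio is within a tolerance (0.1 by default). The sketch below applies that formula to hypothetical utilization numbers; it leaves out stabilization windows, min/max bounds, and handling of unready pods.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Apply the documented HPA formula to one metric.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical average CPU utilization (%) against a 60% target.
print(desired_replicas(4, 90, 60))  # scale out
print(desired_replicas(4, 63, 60))  # within tolerance, unchanged
print(desired_replicas(4, 30, 60))  # scale in
```

A lower target leaves more headroom before pods saturate, at the cost of running more replicas for the same load.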
+KEDA allows scaling application workloads based on external events, using a custom resource called a `ScaledObject`. It can scale applications from zero replicas, or publish metrics for the horizontal pod autoscaler to use.
+
+Capacity autoscaling can be achieved using Fargate or EC2 instances. For EC2 instances, cluster autoscaler or Karpenter can be used. Cluster autoscaler is tied to auto scaling groups and node groups, updating the desired capacity of the auto scaling group based on the number of pending pods. It considers CPU and memory requests, and supports mixed instances policies. *The scaling decision made by the cluster autoscaler is based on the number of pending pods in the cluster.* Auto-discovery is recommended, and changes to min/max configuration should be made at the managed node group or auto scaling group level.
+
+Karpenter is an open-source, Kubernetes-native capacity autoscaler that interacts directly with the EC2 API, offering dynamic on-demand provisioning and improved speed. It does not depend on pre-configured node groups or auto scaling groups. Karpenter uses the concept of a provisioner to define requirements for EC2 instances, matched with workload requirements using node selectors and affinity terms. Reclamation is disabled by default, so TTL or cluster consolidation must be enabled. Karpenter is recommended for clusters with varying capacity and workload requirements.
+
+To address IP exhaustion, switching to IPv6 addressing is recommended. If that is not possible, custom networking can be used with the carrier-grade NAT address range. For IPv6, a dual-stack VPC is recommended, with nodes supporting dual-stack IP addresses but pods having only IPv6 addresses. Interaction between IPv6 pods and IPv4 destinations is handled by NAT at two different layers.
+
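The address math behind the custom-networking option can be sketched with the standard `ipaddress` module. The carrier-grade NAT range is 100.64.0.0/10 (RFC 6598), and AWS reserves five addresses in every subnet; the specific secondary CIDR and the /19 pod subnet below are hypothetical choices, not values from the session.

```python
import ipaddress

# RFC 6598 shared address space, commonly used for EKS custom networking.
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def pod_ip_capacity(cidr, reserved_per_subnet=5):
    """Usable addresses in one subnet after the five AWS reserves."""
    net = ipaddress.ip_network(cidr)
    return net.num_addresses - reserved_per_subnet


# Hypothetical secondary VPC CIDR and one pod subnet carved from it.
secondary = ipaddress.ip_network("100.64.0.0/16")
print(secondary.subnet_of(CGNAT_RANGE))
print(pod_ip_capacity("100.64.0.0/19"))
```

Because these addresses are not routed on the corporate network, pods can draw from a large pool without consuming the limited routable range.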
+Additional considerations for scaling include enabling API server priority and fairness metrics, enabling caching and disabling compression, removing underutilized nodes, and limiting scaling spikes. Scaling the DNS component (CoreDNS) and installing node local DNS cache are also important.
+
+The presentation concludes by recommending the EKS best practices guides, specifically the scalability section.
+

 ---

--- a/SRE/04_EKS/ctp-topic-64-scaling-out-with-amazon-eks.md.bak
+++ b/SRE/04_EKS/ctp-topic-64-scaling-out-with-amazon-eks.md.bak
@@ -0,0 +1,52 @@
+---
+title: CTP Topic 64 Scaling out with Amazon EKS
+type: cloud-learning
+source-type: video
+category: DevOps & SRE/04_EKS
+tags:
+  - AWS
+  - EKS
+  - Kubernetes
+  - Scaling
+  - CTP
+date-added: 2026-04-14
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 64_ Scaling out with Amazon EKS.mp4
+audio-source: ""
+status: raw
+---
+
+# CTP Topic 64 Scaling out with Amazon EKS
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/CTP _ Topic 64_ Scaling out with Amazon EKS.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
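+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Links to paired video notes (to be filled in once generated)
+
+---
+
+*Last updated: 2026-04-14*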
--- a/SRE/04_EKS/ctp-topic-67-cloud-native-observability-using-opentelemetry.md
+++ b/SRE/04_EKS/ctp-topic-67-cloud-native-observability-using-opentelemetry.md
@@ -1,17 +1,17 @@
 ---
-title: "CTP Topic 67 Cloud native observability using OpenTelemetry"
+title: CTP Topic 67 Cloud native observability using OpenTelemetry
 type: cloud-learning
 source-type: video
-category: "DevOps & SRE/04_EKS"
+category: DevOps & SRE/04_EKS
 tags:
  - OpenTelemetry
  - Observability
  - Cloud-Native
  - CTP
 date-added: 2026-04-14
-video-source: "nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 67_ Cloud native observability using  OpenTelemetry.mp4"
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 67_ Cloud native observability using  OpenTelemetry.mp4
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # CTP Topic 67 Cloud native observability using OpenTelemetry
@@ -26,7 +26,14 @@ status: raw

 ## 摘要

-> 待转录后由 LLM 生成
+> Surav from AWS presented a session on observability for Amazon EKS, covering the need for observability, code instrumentation using OpenTelemetry, pipeline definition, AWS Distro for OpenTelemetry (ADOT) collector deployment patterns, and observability deployment options on EKS and ECS.
+
+Observability is essential for managing complexity as systems evolve. *Building observable applications is a developer responsibility.* Key signals to collect include traces, metrics, and logs, enabling reactive and proactive troubleshooting. AWS offers native options like CloudWatch and X-Ray, alongside open-source solutions such as Jaeger, Zipkin, Prometheus, and Grafana, either self-hosted or managed. The AWS Distro for OpenTelemetry (ADOT) is a secure, production-ready distribution with AWS-developed components, and AWS offers support for operational issues.
+
+OpenTelemetry provides a vendor-agnostic instrumentation library, simplifying code instrumentation. The OpenTelemetry Collector uses receivers, processors, and exporters to manage signals. Receivers collect signals, processors transform them, and exporters send them to destinations. *A trace captures the processing time taken at individual layers in your application call stack.* ADOT includes the AWS SigV4 extension for seamless integration with AWS services. Collecting metrics from both the application and infrastructure layers allows comprehensive application views, including business-level metrics, service maps from X-Ray traces, and application logs. Correlation IDs, like the X-Ray trace ID, enable deep links to trace views from log events.
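The receiver, processor, exporter flow described above can be pictured with a toy pipeline. This is a plain-Python stand-in, not the OpenTelemetry Collector or its configuration format; the function names, sample spans, and the `eu-central-1` attribute are all hypothetical.

```python
def receive():
    """Receiver: stand-in for collecting signals from an application."""
    return [
        {"name": "GET /orders", "duration_ms": 120},
        {"name": "GET /health", "duration_ms": 2},
    ]


def drop_health_checks(signals):
    """Processor: filter out noisy health-check spans."""
    return [s for s in signals if s["name"] != "GET /health"]


def add_region(signals):
    """Processor: enrich every signal with an extra attribute."""
    return [{**s, "region": "eu-central-1"} for s in signals]


def run_pipeline(receiver, processors, exporters):
    """Pass signals from the receiver through each processor, then to each exporter."""
    signals = receiver()
    for process in processors:
        signals = process(signals)
    for export in exporters:
        export(signals)
    return signals


exported = []  # stand-in destination; a real exporter would call a backend
run_pipeline(receive, [drop_health_checks, add_region], [exported.extend])
print(exported)
```

Processor order matters in this model: filtering before enrichment avoids doing work on signals that are dropped anyway.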
+
+ADOT is a repackaged OpenTelemetry Collector with AWS-developed components. It offers receivers like Prometheus and X-Ray, processors like batch and filter, and exporters like X-Ray, CloudWatch, Prometheus, and EMF. In ECS deployments, the AWS ECS container metrics receiver collects infrastructure metrics, while the Prometheus remote write exporter sends metrics to Prometheus. The SigV4 auth extension is used for AWS API calls. ADOT can be deployed as a sidecar container or a separate task, with configurations for scraping targets and defining pipelines. Deployment patterns include sidecar, separate task, DaemonSet, and high-availability replicas. The ADOT add-on for EKS simplifies deployment with an operator and a Terraform module, including prebuilt Grafana dashboards. Costs depend on the destination service, such as metric storage for Prometheus or trace ingestion for X-Ray. An observability workshop and a best practices site offer further guidance.
+

 ---

--- a/SRE/04_EKS/ctp-topic-67-cloud-native-observability-using-opentelemetry.md.bak
+++ b/SRE/04_EKS/ctp-topic-67-cloud-native-observability-using-opentelemetry.md.bak
@@ -0,0 +1,51 @@
+---
+title: CTP Topic 67 Cloud native observability using OpenTelemetry
+type: cloud-learning
+source-type: video
+category: DevOps & SRE/04_EKS
+tags:
+  - OpenTelemetry
+  - Observability
+  - Cloud-Native
+  - CTP
+date-added: 2026-04-14
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 67_ Cloud native observability using  OpenTelemetry.mp4
+audio-source: ""
+status: raw
+---
+
+# CTP Topic 67 Cloud native observability using OpenTelemetry
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/CTP _ Topic 67_ Cloud native observability using  OpenTelemetry.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
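+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Links to paired video notes (to be filled in once generated)
+
+---
+
+*Last updated: 2026-04-14*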
--- a/SRE/04_EKS/ctp-topic-70-eks-deployment-using-iac.md
+++ b/SRE/04_EKS/ctp-topic-70-eks-deployment-using-iac.md
@@ -1,8 +1,8 @@
 ---
-title: "CTP Topic 70 EKS deployment using IAC"
+title: CTP Topic 70 EKS deployment using IAC
 type: cloud-learning
 source-type: video
-category: "DevOps & SRE/04_EKS"
+category: DevOps & SRE/04_EKS
 tags:
  - AWS
  - EKS
@@ -10,9 +10,9 @@ tags:
  - Kubernetes
  - CTP
 date-added: 2026-04-14
-video-source: "nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 70_ EKS deployment using IAC.mp4"
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 70_ EKS deployment using IAC.mp4
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # CTP Topic 70 EKS deployment using IAC
@@ -27,7 +27,31 @@ status: raw

 ## 摘要

-> 待转录后由 LLM 生成
+> ## EKS Deployment Using Infrastructure As Code
+
+This session covers EKS cluster deployment via Infrastructure as Code (IAC), focusing on managing containers and worker nodes using the SRE EKS module. Key capabilities include cluster autoscaling, ingress controller, and custom networking. The agenda includes comparing containers and VMs, discussing EKS features, and demonstrating EKS deployment via Terraform and Service Catalog. Monitoring the EKS stack and containers for proactive alerting is also covered.
+
+The discussion begins with the differences between VMs and containers, highlighting the benefits of containers such as reduced boot time, memory efficiency, and portability. Kubernetes is presented as a framework for running distributed systems resiliently, automating rollouts/rollbacks, load balancing, and horizontal pod scaling.
+
+EKS, a managed Kubernetes service by Amazon, offers features like fully managed control planes and autoscaling worker nodes. *Zero downtime rolling deployments for worker node updates* and IAM RBAC mapping for least privilege access are implemented. The SRE EKS module integrates an ALB ingress controller for traffic management and ENI custom networking for pods to handle CIDR limitations.
+
+### Deployment Methods
+
+Two deployment methods are detailed:
+
+1.  **Terraform:** Using a `terragrunt.hcl` file, users can define environment variables, the EKS cluster version, and worker node types (CPU, GPU, or default). Integration with AWS Secrets Manager is included for engineering contact notifications.
+2.  **Service Catalog:** This method allows users to create EKS clusters via a module with version selection and worker node type configuration. It provides more control over security and permissions.
+
+*Service Catalog allows creating, organizing, and governing AWS resources with permission control.*
+
+### Custom Networking and Autoscaling
+
+Custom networking for pods addresses CIDR limitations by adding an additional elastic network interface (ENI) that assigns IP addresses to pods. The Kubernetes Cluster Autoscaler automatically scales worker nodes based on resource needs. Karpenter is being considered for future implementation, for more efficient instance type selection based on pod requirements.
+
+### Monitoring
+
+Monitoring is achieved using the CloudWatch agent and Fluent Bit deployed as DaemonSets. Container Insights needs to be enabled to publish metrics to CloudWatch. The process involves applying manifest files within the cluster to set up CloudWatch logs and metrics. AWS Distro for OpenTelemetry can also be used for monitoring. Centralized Grafana instances are available for visualizing metrics via templated dashboards, including an EKS-specific dashboard.
+

 ---

--- a/SRE/04_EKS/ctp-topic-70-eks-deployment-using-iac.md.bak
+++ b/SRE/04_EKS/ctp-topic-70-eks-deployment-using-iac.md.bak
@@ -0,0 +1,52 @@
+---
+title: CTP Topic 70 EKS deployment using IAC
+type: cloud-learning
+source-type: video
+category: DevOps & SRE/04_EKS
+tags:
+  - AWS
+  - EKS
+  - IaC
+  - Kubernetes
+  - CTP
+date-added: 2026-04-14
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 70_ EKS deployment using IAC.mp4
+audio-source: ""
+status: raw
+---
+
+# CTP Topic 70 EKS deployment using IAC
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/CTP _ Topic 70_ EKS deployment using IAC.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
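+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Links to paired video notes (to be filled in once generated)
+
+---
+
+*Last updated: 2026-04-14*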
--- a/SRE/04_EKS/ctp-topic-8-implementation-of-cloud-monitoring-using-micro-focus-operations-brid.md
+++ b/SRE/04_EKS/ctp-topic-8-implementation-of-cloud-monitoring-using-micro-focus-operations-brid.md
@@ -1,17 +1,17 @@
 ---
-title: "CTP Topic 8 Implementation of Cloud monitoring using Micro Focus Operations Bridge Monitoring Sol"
+title: CTP Topic 8 Implementation of Cloud monitoring using Micro Focus Operations Bridge Monitoring Sol
 type: cloud-learning
 source-type: video
-category: "DevOps & SRE/04_EKS"
+category: DevOps & SRE/04_EKS
 tags:
  - AWS
  - Monitoring
  - Observability
  - CTP
 date-added: 2026-04-14
-video-source: "nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 8_ Implementation of Cloud monitoring using Micro Focus Operations Bridge Monitoring Sol.mp4"
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 8_ Implementation of Cloud monitoring using Micro Focus Operations Bridge Monitoring Sol.mp4
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # CTP Topic 8 Implementation of Cloud monitoring using Micro Focus Operations Bridge Monitoring Sol
@@ -26,7 +26,16 @@ status: raw

 ## 摘要

-> 待转录后由 LLM 生成
+> ## Cloud Monitoring Using OBM Implementation
+
+The session covers the implementation of cloud monitoring using Micro Focus Operations Bridge Manager (OBM), a solution designed to address gaps in existing monitoring systems like SiteScope, especially with the increasing shift towards public cloud environments. OBM offers dynamic monitoring for AWS core services and enhanced security compared to SiteScope.
+
+The current architecture involves data collection from various sources (infrastructure, servers, applications, hardware, and networks) using data collectors like SiteScope, HPCM, and norm, feeding into regional OBMs. These regional OBMs then send data to a global OBM, which acts as a manager of managers. The global OBM integrates with SMAX, enabling the OSE team to escalate and create tickets for events. A new regional OBM setup is planned for AWS cloud monitoring in a lab landing zone environment in Frankfurt. The OBM account will be part of the digital factory landing zone, interacting with core accounts like the shared, logs, and security accounts. The regional OBM collects data from different AWS accounts through an Operations Agent and the CloudWatch API, forwarding it to the on-premises global OBM.
+
+The architecture includes an OBM AWS account with an OBM application, a PostgreSQL RDS database, and a separate instance with an Operations Agent. The agent collects data using OBM management packs, specifically the AWS management pack, which instructs the agent to gather data from different accounts. *The agent uses role-based access to collect data from CloudWatch API, eliminating the need to install servers in customer accounts and share sensitive access keys.* The management pack solution uses policies to define monitoring intervals, specific metrics, and data collection from specific accounts, matching data against thresholds to trigger events. *Whenever new instances are added, policies are automatically deployed, and monitoring begins, offering dynamic monitoring capabilities.*
+
+To onboard a new customer, an IAM role with CloudWatch read-only access is created in the customer account, and the AWS account where the OBM and Operations Agent reside is added to that role's trust relationship. The role ARN is then referenced in a policy on the OBM account's IAM role, which is attached to the agent node. The process involves specifying the role ARN, account ID, namespaces/services to be monitored, metrics, thresholds, monitoring frequency, and title format. The title format is enriched to provide useful information for the service center team, facilitating escalation and runbook execution. CloudWatch custom metrics can be used for metrics not exposed by default. The OBM management pack solution can monitor any public cloud vendor (Amazon, Azure, Google Cloud) and any AWS service with data exposed to CloudWatch metrics, using both metrics and logs. The solution is dynamic and customizable, with all data collected from the OBM account without requiring any installations in customer accounts.
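The two policy documents involved in onboarding can be sketched as follows. The account IDs and the role name `obm-cloudwatch-readonly` are placeholders invented for this example; the session does not give the actual names. The JSON layout follows the standard IAM policy grammar.

```python
import json

MONITORING_ACCOUNT_ID = "111122223333"  # hypothetical account hosting OBM and the agent
CUSTOMER_ACCOUNT_ID = "444455556666"    # hypothetical monitored customer account
ROLE_NAME = "obm-cloudwatch-readonly"   # hypothetical role name


def build_trust_policy(monitoring_account_id):
    """Trust relationship on the customer role: lets the monitoring account assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{monitoring_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_agent_policy(role_arn):
    """Policy on the agent node's role: permits assuming the customer role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": role_arn}
        ],
    }


role_arn = f"arn:aws:iam::{CUSTOMER_ACCOUNT_ID}:role/{ROLE_NAME}"
trust_policy = build_trust_policy(MONITORING_ACCOUNT_ID)
agent_policy = build_agent_policy(role_arn)
print(json.dumps(trust_policy, indent=2))
print(json.dumps(agent_policy, indent=2))
```

The customer role would additionally carry a CloudWatch read-only permissions policy, which is what limits the agent to reading metrics.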
+

 ---

--- a/SRE/04_EKS/ctp-topic-8-implementation-of-cloud-monitoring-using-micro-focus-operations-brid.md.bak
+++ b/SRE/04_EKS/ctp-topic-8-implementation-of-cloud-monitoring-using-micro-focus-operations-brid.md.bak
@@ -0,0 +1,51 @@
+---
+title: CTP Topic 8 Implementation of Cloud monitoring using Micro Focus Operations Bridge Monitoring Sol
+type: cloud-learning
+source-type: video
+category: DevOps & SRE/04_EKS
+tags:
+  - AWS
+  - Monitoring
+  - Observability
+  - CTP
+date-added: 2026-04-14
+video-source: nas:///volume2/work/Public Cloud Learning Sessions/CTP _ Topic 8_ Implementation of Cloud monitoring using Micro Focus Operations Bridge Monitoring Sol.mp4
+audio-source: ""
+status: raw
+---
+
+# CTP Topic 8 Implementation of Cloud monitoring using Micro Focus Operations Bridge Monitoring Sol
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/CTP _ Topic 8_ Implementation of Cloud monitoring using Micro Focus Operations Bridge Monitoring Sol.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
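+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Links to paired video notes (to be filled in once generated)
+
+---
+
+*Last updated: 2026-04-14*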
--- a/SRE/04_EKS/public-cloud-learning-sessions-eks-optimization-part-1-of-3-compute-optimization.md
+++ b/SRE/04_EKS/public-cloud-learning-sessions-eks-optimization-part-1-of-3-compute-optimization.md
@@ -11,7 +11,7 @@ tags:
 date-added: 2026-04-14
 video-source: "nas:///volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions - EKS Optimization part 1 of 3 - Compute Optimization with Karpenter - 20250204_170113-Meeting Recording.mp4"
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # Public Cloud Learning Sessions - EKS Optimization part 1 of 3 - Compute Optimization with Karpenter - 20250204 170113-Meeting Recording
@@ -24,28 +24,37 @@ status: raw

 ---

-## 摘要
+## EKS Optimization with Karpenter

-> 待转录后由 LLM 生成
+This session introduces Karpenter, an open-source compute infrastructure management tool for Kubernetes clusters that addresses challenges associated with the traditional Cluster Autoscaler. Karpenter offers native integration with Kubernetes, direct communication with the EC2 Fleet API, and intelligent workload placement and consolidation based on cost and utilization.

---
+Key differences between Karpenter and Cluster Autoscaler:
+*   Karpenter integrates with Kubernetes workload scheduling constructs.
+*   It communicates directly with the EC2 Fleet API, reducing latency.
+*   It provides native experiences for workload placement and node consolidation.

-## 关键概念
+Karpenter has two core components: node pools and node classes. Node pools define scheduling constraints and capacity limits, while node classes define instance provisioning details like subnets, node roles, and AMIs.

-
+Karpenter supports Kubernetes scheduling constraints like node selectors, affinity, taints, tolerations, and topology spread, along with AWS placement requirements such as purchasing options, processor architectures, and availability zones. It can identify zonal requirements based on volume claims and storage classes, simplifying workload definitions compared to Cluster Autoscaler.
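As a rough illustration of how node pool requirements relate to a pod's node selector, the sketch below checks a selector against a list of `In` requirements. The label keys are well-known Kubernetes and Karpenter labels; the `satisfies` helper, the sample values, and the rule that unconstrained keys are accepted are assumptions made for this example, not the tool's actual scheduling logic.

```python
# Hypothetical node pool requirements, written as plain dictionaries.
node_pool_requirements = [
    {"key": "karpenter.sh/capacity-type", "operator": "In", "values": ["spot", "on-demand"]},
    {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
]


def satisfies(requirements, node_selector):
    """Return True if every selected label value is allowed by the requirements.

    Assumption of this sketch: a key the node pool does not constrain is accepted.
    """
    allowed = {r["key"]: set(r["values"]) for r in requirements if r["operator"] == "In"}
    return all(value in allowed.get(key, {value}) for key, value in node_selector.items())


spot_pod = {"karpenter.sh/capacity-type": "spot", "kubernetes.io/arch": "amd64"}
arm_pod = {"kubernetes.io/arch": "arm64"}

print(satisfies(node_pool_requirements, spot_pod))
print(satisfies(node_pool_requirements, arm_pod))
```

A pod that asks for `arm64` would need a separate node pool that allows that architecture.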

---
+_*Karpenter has native integration with Kubernetes and it complements the native Kubernetes spot pod scheduling constraints that is available for your workloads.*_

-## 行动项
+Karpenter natively supports spot interruptions without requiring additional components like the node termination handler. It uses EventBridge and SQS to handle spot interruption notifications, instance rebalance notifications, health events, and instance state change events.

-
+Node pools can be designed for various scenarios, including single node pools, mixed compute/accelerated nodes, or isolated node pools based on cost, security, or multi-tenancy. Weighted node pools can prioritize instances based on existing commitments or reservations.

---
+Karpenter simplifies data plane management by removing pain points associated with node groups, integrating node termination handling, and providing native integration with Kubernetes scheduling constraints. It also helps consolidate compute instances for greater cost efficiency.

-## 相关视频
+_*Karpenter not only does the auto-scaling bit, but it also removes the pain points of working with node groups.*_

-> 配对视频笔记链接（生成后填入）
+Karpenter can automatically upgrade AMIs or use defined AMIs, referring to the Parameter Store for the latest EKS-optimized AMIs for the corresponding control plane version. It identifies drift between the desired state and running machines, applying changes as a rolling upgrade.

---
+AMI selection can be pinned to specific versions or use custom AMIs. The AMI family setting tells Karpenter what user data to inject when spinning up instances.

-*最后更新: 2026-04-14*
+Consolidation policies can be configured with fine-grained budgets, such as preventing consolidation during peak business hours or limiting the percentage of instances disrupted at a time.
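The budget idea above can be sketched as a small calculation. Everything here is illustrative: the `allowed_disruptions` helper, the 09:00 to 17:00 peak window, and rounding percentages up are assumptions of this sketch, not configuration syntax or documented behavior of the tool.

```python
def allowed_disruptions(total_nodes, budget, hour, peak_hours=range(9, 17)):
    """Return how many nodes may be disrupted at once under a simple budget.

    budget is either a percentage string such as "20%" or a count such as "3".
    During the assumed peak window no disruption is allowed.
    """
    if hour in peak_hours:
        return 0
    if budget.endswith("%"):
        percent = int(budget[:-1])
        # Integer ceiling division; rounding up is an assumption of this sketch.
        return (total_nodes * percent + 99) // 100
    return min(int(budget), total_nodes)


print(allowed_disruptions(19, "20%", hour=22))  # off-peak, percentage budget
print(allowed_disruptions(19, "20%", hour=10))  # inside the peak window
print(allowed_disruptions(50, "3", hour=2))     # off-peak, fixed count
```

Combining a time window with a percentage keeps consolidation away from business hours while still bounding how much capacity churns overnight.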
+
+Karpenter publishes logs and emits Prometheus metrics for observability, with community-maintained dashboards available for visualization.
+
+Onboarding is simple: Karpenter must be deployed on compute that it does not manage itself, such as a small node group or Fargate. Guides are available for migrating from Cluster Autoscaler.
+
+The session is the first in a series of three, with subsequent sessions covering the Bottlerocket operating system and EKS Auto Mode.
--- a/SRE/04_EKS/public-cloud-learning-sessions-eks-optimization-part-1-of-3-compute-optimization.md.bak
+++ b/SRE/04_EKS/public-cloud-learning-sessions-eks-optimization-part-1-of-3-compute-optimization.md.bak
@@ -0,0 +1,51 @@
+---
+title: "Public Cloud Learning Sessions - EKS Optimization part 1 of 3 - Compute Optimization with Karpenter - 20250204 170113-Meeting Recording"
+type: cloud-learning
+source-type: video
+category: "DevOps & SRE/04_EKS"
+tags:
+  - AWS
+  - EKS
+  - Karpenter
+  - Cost-Optimization
+date-added: 2026-04-14
+video-source: "nas:///volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions - EKS Optimization part 1 of 3 - Compute Optimization with Karpenter - 20250204_170113-Meeting Recording.mp4"
+audio-source: ""
+status: raw
+---
+
+# Public Cloud Learning Sessions - EKS Optimization part 1 of 3 - Compute Optimization with Karpenter - 20250204 170113-Meeting Recording
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions - EKS Optimization part 1 of 3 - Compute Optimization with Karpenter - 20250204_170113-Meeting Recording.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
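+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Links to paired video notes (to be filled in once generated)
+
+---
+
+*Last updated: 2026-04-14*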
--- a/SRE/04_EKS/public-cloud-learning-sessions-eks-optimization-part-2-of-3-running-containers-w.md
+++ b/SRE/04_EKS/public-cloud-learning-sessions-eks-optimization-part-2-of-3-running-containers-w.md
@@ -11,7 +11,7 @@ tags:
 date-added: 2026-04-14
 video-source: "nas:///volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions - EKS Optimization part 2 of 3 - Running Containers with Bottlerocket OS - 20250218_170127-Meeting Recording.mp4"
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # Public Cloud Learning Sessions - EKS Optimization part 2 of 3 - Running Containers with Bottlerocket OS - 20250218 170127-Meeting Recording
@@ -24,28 +24,12 @@ status: raw

 ---

-## 摘要
+## EKS Optimization: Running Containers with Bottlerocket OS

-> 待转录后由 LLM 生成
+This session focuses on Bottlerocket OS and its benefits for running containerized workloads in EKS. Bottlerocket is a Linux-based operating system designed specifically for hosting containers, differing from general-purpose operating systems by including only essential components. It is free, open-source, and maintained on GitHub, with AWS as a core maintainer and sponsor. Bottlerocket can be run on laptops, workstations, or in data centers, and is designed to be minimal, enforce safe updates, and be security-focused.

---
+Bottlerocket is minimal because it lacks unnecessary software, drivers, and tools. It does not include a package manager, default shell interpreter, or default SSH access. Only essential kernel components are packaged into the OS image at build time. To accommodate specific workload needs like GPU resources, Bottlerocket uses variants, which are combinations of platform, processor architecture, and necessary binary components. These variants are built with specific packages, drivers, and tools included. *A variant is basically a combination of platform, supported platform, the processor architecture and the necessary binary components that are supported by the processor architecture and any additional packages and drivers that are required for your specific workloads.* Configuration is managed through an API interface or TOML-formatted user data.

-## 关键概念
+Safe updates are enforced through in-place updates and node replacement. In-place updates involve downloading a new image version to an inactive partition and switching the active partition upon reboot, ensuring system consistency. The data volume caches container images and can be pre-populated with images via snapshots. Security is enhanced through secure boot, cryptographic verification of the root file system using dm-verity, and an immutable root file system. The `/etc` directory is a temporary file system, and SELinux is enabled by default in enforcing mode. *The root file system is by default immutable, you cannot change anything there.* Bottlerocket has a dedicated CIS benchmark for hardening, and comprehensive security guidance is available.
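The in-place update flow described above (write the new image to the inactive partition, then switch on reboot) can be modelled in a few lines. This is a toy model only: the `Host` class, the partition names `A` and `B`, and the version strings are hypothetical and do not reflect the operating system's real tooling.

```python
from dataclasses import dataclass


@dataclass
class Host:
    partitions: dict  # partition name -> image version stored there
    active: str       # name of the partition the host booted from

    @property
    def inactive(self):
        return "B" if self.active == "A" else "A"

    def stage_update(self, version):
        """Download step: write the new image to the inactive partition only."""
        self.partitions[self.inactive] = version

    def reboot(self):
        """Switch the active partition so the staged image boots."""
        self.active = self.inactive

    def running_version(self):
        return self.partitions[self.active]


host = Host(partitions={"A": "1.0.0", "B": None}, active="A")
host.stage_update("1.1.0")
before_reboot = host.running_version()  # running image is untouched while staging
host.reboot()
after_reboot = host.running_version()
print(before_reboot, after_reboot)
```

Because staging never touches the running partition, the previous image is still present on the other partition after the switch.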

-
-
---
-
-## 行动项
-
-
-
---
-
-## 相关视频
-
-> 配对视频笔记链接（生成后填入）
-
---
-
-*最后更新: 2026-04-14*
+Bottlerocket integrates with EKS through optimized variants and is supported across self-managed node groups, managed node groups, and Karpenter node pools. It can be configured using tools like eksctl and Karpenter, with best practices including pinning the AMI to a specific version.
--- a/SRE/04_EKS/public-cloud-learning-sessions-eks-optimization-part-2-of-3-running-containers-w.md.bak
+++ b/SRE/04_EKS/public-cloud-learning-sessions-eks-optimization-part-2-of-3-running-containers-w.md.bak
@@ -0,0 +1,51 @@
+---
+title: "Public Cloud Learning Sessions - EKS Optimization part 2 of 3 - Running Containers with Bottlerocket OS - 20250218 170127-Meeting Recording"
+type: cloud-learning
+source-type: video
+category: "DevOps & SRE/04_EKS"
+tags:
+  - AWS
+  - EKS
+  - Bottlerocket
+  - OS
+date-added: 2026-04-14
+video-source: "nas:///volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions - EKS Optimization part 2 of 3 - Running Containers with Bottlerocket OS - 20250218_170127-Meeting Recording.mp4"
+audio-source: ""
+status: raw
+---
+
+# Public Cloud Learning Sessions - EKS Optimization part 2 of 3 - Running Containers with Bottlerocket OS - 20250218 170127-Meeting Recording
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions - EKS Optimization part 2 of 3 - Running Containers with Bottlerocket OS - 20250218_170127-Meeting Recording.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
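+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Links to paired video notes (to be filled in once generated)
+
+---
+
+*Last updated: 2026-04-14*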
--- a/SRE/04_EKS/public-cloud-learning-sessions-eks-optimization-part-3-of-3-introduction-to-eks-.md
+++ b/SRE/04_EKS/public-cloud-learning-sessions-eks-optimization-part-3-of-3-introduction-to-eks-.md
@@ -10,7 +10,7 @@ tags:
 date-added: 2026-04-14
 video-source: "nas:///volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions - EKS Optimization part 3 of 3 - Introduction to EKS Auto Mode - 20250304_170115-Meeting Recording.mp4"
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # Public Cloud Learning Sessions - EKS Optimization part 3 of 3 - Introduction to EKS Auto Mode - 20250304 170115-Meeting Recording
@@ -23,28 +23,20 @@ status: raw

 ---

-## 摘要
+## EKS Optimization: Introduction to EKS Auto Mode

-> 待转录后由 LLM 生成
+This session, the third part of a series on EKS optimization, focuses on EKS Auto Mode. EKS Auto Mode extends the management responsibilities of the EKS service to the data plane, managing instances, operating systems, patches, and security updates. It builds on core capabilities like Karpenter for infrastructure management, a managed EBS CSI driver for stateful workloads, and the AWS Load Balancer Controller.

---
+Key benefits of EKS Auto Mode include increased agility, automatic consolidation, dynamic instance determination, and optimized compute costs. *With Auto Mode, a majority of the operational concerns are being managed by the EKS service.* Core capabilities are managed within instances provisioned inside the EKS account, while customers retain control over VPC infrastructure, cluster configuration, add-ons, and workload configurations.

-## 关键概念
+EKS Auto Mode offers an easier interface for working with EKS, providing data plane management in addition to control plane management. It supports a wide range of EC2 instances (excluding bare metal) and is fully compatible with Kubernetes-compliant workloads. Security is enhanced through the use of the Bottlerocket operating system and automated patch management. The core cluster capabilities are grouped under compute (Karpenter controller), networking (AWS Load Balancer Controller), storage (EBS CSI controller), and security (pod identity associations).

-
+By default, Auto Mode includes two node pools (general purpose and system) and one node class. The default node pools are immutable and configured with zero weight, allowing custom node pools to be prioritized. The general purpose node pool is locked to AMD64 architecture, while custom node pools can be defined for Graviton instances. Instances in the system node pool have a taint applied, requiring corresponding tolerations for system add-ons.

---
+Networking in Auto Mode includes CoreDNS packaged with every node as a system service, the VPC CNI as a system service, and kube-proxy set up in iptables mode. Prefix delegation is enabled by default. The AWS Load Balancer Controller is available as a core capability, using an EKS Auto Mode-specific load balancer class. The packaged CSI controller requires a storage class referring to the EBS CSI EKS provisioner.

-## 行动项
+Version upgrades in Auto Mode are initiated by an operator for the control plane. *Once the control plane version gets upgraded, then the compute controller, which is running as a core capability, will identify that the control plane version has changed and it will try to pull the current AMI version for that new control plane version.* The compute controller then rolls out the new AMI across the cluster through a rolling upgrade.

-
+While the controllers are managed by the EKS service, users can investigate custom resources and deploy node diagnostic CRDs. Observability can be achieved through the CloudWatch agent, AWS Distro for OpenTelemetry, or other collectors.

---
-
-## 相关视频
-
-> 配对视频笔记链接（生成后填入）
-
---
-
-*最后更新: 2026-04-14*
+For every instance spun up in an Auto Mode cluster, there is a 12% premium charged for the automatic management of those instances.
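A quick worked example of the premium, assuming a hypothetical instance price of 0.10 per hour and ten instances. The 12% figure is the one quoted in the session; which price the premium is calculated on is an assumption here, so check current pricing before relying on the numbers.

```python
AUTO_MODE_PREMIUM = 0.12  # figure quoted in the session


def hourly_cost(instance_price, instance_count, premium=AUTO_MODE_PREMIUM):
    """Return (base compute cost, management fee, total) per hour."""
    base = instance_price * instance_count
    fee = base * premium
    return base, fee, base + fee


# Hypothetical fleet: ten instances at 0.10 per hour each.
base, fee, total = hourly_cost(0.10, 10)
print(f"base={base:.2f} fee={fee:.2f} total={total:.2f}")
```

The fee scales linearly with the fleet, so consolidation that removes idle instances reduces the premium as well as the base cost.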
--- a/SRE/04_EKS/public-cloud-learning-sessions-eks-optimization-part-3-of-3-introduction-to-eks-.md.bak
+++ b/SRE/04_EKS/public-cloud-learning-sessions-eks-optimization-part-3-of-3-introduction-to-eks-.md.bak
@@ -0,0 +1,50 @@
+---
+title: "Public Cloud Learning Sessions - EKS Optimization part 3 of 3 - Introduction to EKS Auto Mode - 20250304 170115-Meeting Recording"
+type: cloud-learning
+source-type: video
+category: "DevOps & SRE/04_EKS"
+tags:
+  - AWS
+  - EKS
+  - Auto-Mode
+date-added: 2026-04-14
+video-source: "nas:///volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions - EKS Optimization part 3 of 3 - Introduction to EKS Auto Mode - 20250304_170115-Meeting Recording.mp4"
+audio-source: ""
+status: raw
+---
+
+# Public Cloud Learning Sessions - EKS Optimization part 3 of 3 - Introduction to EKS Auto Mode - 20250304 170115-Meeting Recording
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions - EKS Optimization part 3 of 3 - Introduction to EKS Auto Mode - 20250304_170115-Meeting Recording.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
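+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Links to paired video notes (to be filled in once generated)
+
+---
+
+*Last updated: 2026-04-14*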
--- a/SRE/04_EKS/public-cloud-learning-sessions-observability-with-opentelemetry-20240402-160113-.md
+++ b/SRE/04_EKS/public-cloud-learning-sessions-observability-with-opentelemetry-20240402-160113-.md
@@ -9,7 +9,7 @@ tags:
 date-added: 2026-04-14
 video-source: "nas:///volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions- Observability with OpenTelemetry - 20240402_160113-Meeting Recording.mp4"
 audio-source: ""
-status: raw
+status: summarized (Gemini summary)
 ---

 # Public Cloud Learning Sessions- Observability with OpenTelemetry - 20240402 160113-Meeting Recording
@@ -22,28 +22,22 @@ status: raw

 ---

-## 摘要
+## Observability with OpenTelemetry

-> 待转录后由 LLM 生成
+Jay Comer, Solutions Architect with AWS, presented an overview of observability with OpenTelemetry, including changes and updates within the AWS observability ecosystem since the last session a year ago. The session included a demo showing how to piece together the components and how to instrument an application with OpenTelemetry.

---
+Observability is defined as *a measure of how well internal states of a system can be inferred from knowledge of its external outputs.* These outputs include logs, metrics, and traces, which are correlated with the application's health. As systems transition to microservice-based architectures, the observability challenge becomes more prominent due to increasing complexity. Downtime is costly in both money and effort, with Gartner estimating an average of 87 hours of downtime per year at a cost of $42,000 per hour.

-## 关键概念
+The three signals used for observability are metrics, logs, and traces. Metrics are aggregated source statistics, logs help determine the root cause of problems, and traces provide a holistic view of a specific request within the system. A trace span includes a start time, a duration, and metadata such as a log.
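To make the span fields concrete, here is a toy span recorder with a start time, a duration, and metadata. It is a plain-Python stand-in written for this note, not the OpenTelemetry SDK; the `Span` class, `start_span` helper, and attribute names are hypothetical.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start_time: float            # wall-clock start, seconds since the epoch
    duration: float = 0.0        # filled in when the span ends
    attributes: dict = field(default_factory=dict)  # metadata such as a log line


finished_spans = []  # stand-in for an exporter


@contextmanager
def start_span(name, **attributes):
    """Time the enclosed block and record it as a finished span."""
    span = Span(name=name, start_time=time.time(), attributes=dict(attributes))
    started = time.perf_counter()
    try:
        yield span
    finally:
        span.duration = time.perf_counter() - started
        finished_spans.append(span)


with start_span("checkout", tenant_id="tenant-a") as span:
    span.attributes["log"] = "order accepted"
    time.sleep(0.01)  # simulated work

print(finished_spans[0])
```

Wrapping each unit of work this way is what lets a trace show where time was spent across a request.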

--
+The AWS observability landscape includes AWS-native services such as CloudWatch and X-Ray, as well as managed services built on open-source projects such as Grafana, OpenSearch, Prometheus, and OpenTelemetry. OpenTelemetry addresses the problem of disparate SDKs and tooling across observability components by providing a common instrumentation standard with an SDK for each programming language. It is vendor-agnostic and offers an end-to-end implementation for making telemetry data accessible and usable.

----
+OpenTelemetry defines a common data format, provides SDKs for 11 languages, and supports automatic instrumentation. The OpenTelemetry Collector standardizes incoming data, transforms it into the OpenTelemetry Protocol (OTLP) format, and exports it to different destinations. The collector is built from receivers (AWS-specific or open source), processors (filtering, transformations), exporters (AWS-native, open source, or third-party), and extensions (SigV4 for authorization, health check).

-## Action Items
+The AWS Distro for OpenTelemetry (ADOT) is a unified agent for collecting traces, metrics, and logs. It includes an operator that automatically instruments applications by detecting the language used and creating pre-configured OpenTelemetry collectors. Custom attributes, such as tenant IDs, can be added to telemetry items.

--
+Recent announcements focused on security and compliance, scale and region expansion, and a single pane of glass with an improved user experience. The managed collector for Amazon Managed Service for Prometheus provides a serverless, agentless scraper that automatically discovers and pulls Prometheus-compatible metrics. Log support was added to the AWS Distro for OpenTelemetry, and Amazon Managed Grafana now supports community plugins.

----
+The demo showcased a sample application running on EKS, with Fluent Bit collecting logs and forwarding them to the OpenTelemetry container. The OpenTelemetry container also collects traces and metrics from the application, then sends logs, traces, and metrics to Amazon OpenSearch Service through an ingestion pipeline. The source code included the Fluent Bit and OpenTelemetry YAML configuration files. *The output to which Fluent Bit sends the individual logs is the OpenTelemetry endpoint on port 55681.* At the code level, the implementation involves importing the OpenTelemetry SDK, configuring a tracer provider, and starting a span with the tracer at each point where instrumentation and request duration measurement are needed.

-## Related Videos
-
-> Paired video note links (to be filled in once generated)
-
----
-
-*Last updated: 2026-04-14*
+OpenSearch Dashboards can display latency by trace group, along with an application composition map that shows where bottlenecks appear.
--- a/SRE/04_EKS/public-cloud-learning-sessions-observability-with-opentelemetry-20240402-160113-.md.bak
+++ b/SRE/04_EKS/public-cloud-learning-sessions-observability-with-opentelemetry-20240402-160113-.md.bak
@@ -0,0 +1,49 @@
+---
+title: "Public Cloud Learning Sessions- Observability with OpenTelemetry - 20240402 160113-Meeting Recording"
+type: cloud-learning
+source-type: video
+category: "DevOps & SRE/04_EKS"
+tags:
+  - OpenTelemetry
+  - Observability
+date-added: 2026-04-14
+video-source: "nas:///volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions- Observability with OpenTelemetry - 20240402_160113-Meeting Recording.mp4"
+audio-source: ""
+status: raw
+---
+
+# Public Cloud Learning Sessions- Observability with OpenTelemetry - 20240402 160113-Meeting Recording
+
+**Source:** NAS `/volume2/work/Public Cloud Learning Sessions/Public Cloud Learning Sessions- Observability with OpenTelemetry - 20240402_160113-Meeting Recording.mp4`
+
+**Type:** VIDEO | **Category:** 04_EKS
+
+**Status:** 🟡 Awaiting Whisper transcription → Summary
+
+---
+
+## Summary
+
+> To be generated by an LLM after transcription
+
+---
+
+## Key Concepts
+
+-
+
+---
+
+## Action Items
+
+-
+
+---
+
+## Related Videos
+
+> Paired video note links (to be filled in once generated)
+
+---
+
+*Last updated: 2026-04-14*