nexus/#recycle/Work/CSD/ESM Cloud Application Troubleshooting as a Service.md
2026-03-23 20:57:45 +08:00

Author: Shen Wei

Introduction

The main purpose of this document is to help team members outside Cloud Ops better understand the services and tools currently available for Cloud Application troubleshooting, so that they can use them flexibly in different scenarios and depend less on Cloud Ops engineers. Our goal is clear: a more efficient DevOps ecosystem that lets us serve our customers better.

Please note that the services and tools described below require approval and authorization, and are currently limited to members of the Cloud Ops and R&D CPE teams.

Troubleshooting as a Service

Access Environment as a Service

Access to Customer Tenant

We provide a method to enter the customer's tenant so that, when troubleshooting, you can access the customer's environment directly, observe the symptoms first-hand, and make the right judgment.

Access to ESM Farm BO, IDM, UCMDB JMX console

We provide a method to apply for temporary user access to each farm management console:

  • BO Suite Admin
  • ESM IDM Admin
  • UCMDB Super Admin to UCMDB JMX Console
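As an illustration only, temporary and approval-gated access of this kind can be modeled as a grant that carries an expiry time. The sketch below is not the actual request workflow: `AccessGrant`, `grant_access`, and the 4-hour default are hypothetical names and values, and only the console names come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Console names come from the list above; everything else is hypothetical.
CONSOLES = {"BO Suite Admin", "ESM IDM Admin", "UCMDB Super Admin"}


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical record of one approved, time-limited console access."""
    user: str
    console: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        # The grant is valid strictly before its expiry time.
        return now < self.expires_at


def grant_access(user: str, console: str, approved: bool,
                 now: datetime, hours: int = 4) -> AccessGrant:
    """Create a grant only for a known console and an approved request."""
    if console not in CONSOLES:
        raise ValueError(f"unknown console: {console}")
    if not approved:
        raise PermissionError("request has not been approved")
    return AccessGrant(user, console, now + timedelta(hours=hours))
```

Checking `is_active` at every use, rather than once at login, is what makes the access temporary in practice: the grant simply stops working after `expires_at` without anyone having to revoke it.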

Log Collection as a Service

We provide a comprehensive log collection automation tool that collects the logs of a specific module within a specific time period. Users can select filter conditions appropriate to each scenario, which helps locate problems more accurately and avoids the extra effort caused by excessive log size.
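The filtering idea (one module, one time window, optional extra conditions) can be sketched as follows. This is a minimal illustration, not the tool itself: the function name `filter_logs` and the line layout `<ISO timestamp> <module> <level> <message>` are assumptions made for the example.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional


def filter_logs(lines: Iterable[str], module: str,
                start: datetime, end: datetime,
                level: Optional[str] = None) -> Iterator[str]:
    """Yield lines of `module` with start <= timestamp < end.

    Assumed (hypothetical) layout: "<ISO timestamp> <module> <level> <message>".
    Lines that do not match the layout are skipped.
    """
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) < 4:
            continue  # not in the assumed layout
        ts_text, mod, lvl, _message = parts
        try:
            ts = datetime.fromisoformat(ts_text)
        except ValueError:
            continue  # unparseable timestamp
        if mod != module or not (start <= ts < end):
            continue
        if level is not None and lvl != level:
            continue
        yield line
```

Narrowing by module and time window first, and by level only when asked, keeps the collected bundle small without hiding context that might be needed to read an error.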

Check Configuration

Monitoring as a Service

Unified Monitoring via pre-defined Grafana Dashboard

We provide rich monitoring data to support troubleshooting. Currently we use Grafana as the monitoring UI to present the monitoring data of each farm implementation:

  • AWS CloudWatch Data Source - Real-time infrastructure monitoring (AWS EKS/EFS/RDS)
  • Prometheus Data Source - Real-time application-level metrics exposed through Prometheus
  • Database query Data Source - Key application indicators obtained through database queries
  • Containerized/K8s - Key monitoring data of the containerized product (container, node, pod, etc.)
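For readers who want to see what sits behind a Prometheus-backed panel, the sketch below builds an instant-query URL for the Prometheus HTTP API (`/api/v1/query`) and parses the JSON it returns. The host name `prom.example` and the `job="esm"` label are placeholders, not values from our farms, and the helper names are hypothetical. No network call is made here.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> list:
    """Turn an instant-vector response into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's value is [unix_timestamp, "value as a string"].
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


# Placeholder server and label, for illustration only.
url = build_query_url("http://prom.example:9090/", 'up{job="esm"}')
```

In day-to-day use the Grafana dashboards issue these queries for you; the sketch is only meant to make the data path easier to picture when a panel looks wrong.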

Service Availability Health Page

Log Analysis as a Service

BI Reporting as a Service

Unplanned Change Request as a Service

Other Services