diff --git a/knowledgebase/csd-wiki/ICSD/APM-Monitoring_686073667.md b/knowledgebase/csd-wiki/ICSD/APM-Monitoring_686073667.md deleted file mode 100644 index f31c837c..00000000 --- a/knowledgebase/csd-wiki/ICSD/APM-Monitoring_686073667.md +++ /dev/null @@ -1,15 +0,0 @@ -# APM-Monitoring_686073667 -1. [ITOM Cloud Service Delivery](index.html) -2. [ITOM Cloud Service Delivery](ITOM-Cloud-Service-Delivery_681555087.html) -3. [💠3 - Operation & Maintenance](682933064.html) - -Created by, last modified by Wei Shen on Feb 08, 2025 EST - -- [HCMX APM Monitoring Business Flow](HCMX-APM-Monitoring-Business-Flow_686073715.html) -- [OO APM Monitoring Business Flow](OO-APM-Monitoring-Business-Flow_686073823.html) -- [SMAX APM Monitoring Business Flow](SMAX-APM-Monitoring-Business-Flow_686087711.html) -- [UCMDB APM Monitoring Business Flow](UCMDB-APM-Monitoring-Business-Flow_686073690.html) - -Document generated by Confluence on Sep 15, 2025 22:25 EDT - -[Atlassian](https://www.atlassian.com/) diff --git a/knowledgebase/csd-wiki/ICSD/AWS-Cognito-User-Creation_708224408.md b/knowledgebase/csd-wiki/ICSD/AWS-Cognito-User-Creation_708224408.md deleted file mode 100644 index 4358f24e..00000000 --- a/knowledgebase/csd-wiki/ICSD/AWS-Cognito-User-Creation_708224408.md +++ /dev/null @@ -1,70 +0,0 @@ -# AWS-Cognito-User-Creation_708224408 -## AWS Cognito users are used for authentication to login to the following Ops tools: - -- **SaaS Ops Jenkins Tool** - [https://saas-ops.itsma-ng.net/](https://saas-ops.itsma-ng.net/) -- **ESM SaaS System Health Page Ops Console** - [https://smax-health.saas.microfocus.com/ops](https://smax-health.saas.microfocus.com/ops) (Use this permission to support SaaS 911 case to publish the incident report for customer communication) -- **ESM Saas ELK Log Analysis (OpenSearch)** - Contains 14 days of logs. 
Currently available only for the following farms: US2, US6/EU8, EU30 (aviator) -- **Grafana Monitors for ESM SaaS Farms** - -CSD Ops team have the permission needed to create users in AWS Cognito. Currently, there are 3 user persona's: - -- CSD Ops team - admins -- CSD Ops team - team member -- Core CPE Team limited access - -To streamline the user creation process, follow the process below to create new users based on their persona. - -This process eliminates the need for back and forth and simplifies the new user onboarding. Basically, the Ops team will pre-create the user, login the first time, set the roles and also configure the account so the enduser performs a single step of reset password to gain access. - -## Create and Configure User - jenkins admin access needed - -1. Login to AWS console using your personal Ops team account. Access account: 361684190412 and set region to United States (Oregon). -2. Access AWS Cognito / User Pools - you should see the existing user pool: "notes-user-pool" Click on notes-user-pool, then click on Users on left menu. -3. Click "Create user" button: use any value for the password but write it down since you will need it in the next step. -4.  -5. Note down the new user id. You may need to do a search using the email address to get this. -6. Access Jenkins using the new user - 1. [https://saas-ops.itsma-ng.net/](https://saas-ops.itsma-ng.net/) - 2. Make sure you are logged out of your own account. - 3. Login with the new user account using the password you pre-set. - 4. You will be forced to set a new password. This one is not important, because we will tell the new user to reset their password on first access. - 5. Will get Access Denied message in the screen - at this point, the user has been created in Jenkins and will allow us to setup their profile in the following steps. - 6. Logout of new user account. -7. Login to jenkins with your admin account - 1. 
From Jenkins main Dashboard, use the global search at the top to find the new user id like: 333a6473-6b8a-4b16-bbcb-4bd8512e158e - 2. Click Configure menu item on left - **NOTE**: you must have jenkins Administrator role. If not, contact one of the team who has the admin role. - 3. Set the user Full Name - change it from the id to the first/last name of the user - 4. Set the appropriate roles depending if this is a Ops or CPE team member (see section below). -8. Tell the user to access Jenkins URL and have them use the Forgot Password option - -## Role Assignment in Jenkins - -Ops team should set the role based on the user persona - Ops Admin OR CPE Team member. - -1. Login to Jenkins with your Admin user account -2. Click on Manage Jenkins in left menu -3. Scroll down to Security section and click on Manage and Assign Roles -4. Click on Assign Roles in left menu. -5. There are 2 sections and you need to add the user in both: Global roles + Item roles -6. At the bottom of each list, click the Add User button -7. Use the cognito user id like: 333a6473-6b8a-4b16-bbcb-4bd8512e158e -8.  -9.  -10. 
After you add to both lists, make sure to press the Save button - -**Related pages** - -- Page: - [ESM Cloud Farm Version Tracking](/display/ICSD/ESM+Cloud+Farm+Version+Tracking) -- Page: - [How to get an Opentext Confluence account](/display/ICSD/How+to+get+an+Opentext+Confluence+account) -- Page: - [ITOM APM AppPluse Cloud Farm Information](/display/ICSD/ITOM+APM+AppPluse+Cloud+Farm+Information) -- Page: - [ITOM Cloud Service Ops Doc Management Process](/display/ICSD/ITOM+Cloud+Service+Ops+Doc+Management+Process) -- Page: - [ITOM ESM Cloud Service Catalog](/display/ICSD/ITOM+ESM+Cloud+Service+Catalog) -- Page: - [ITOM OpsB NOM Cloud Service Catalog](/display/ICSD/ITOM+OpsB+NOM+Cloud+Service+Catalog) -- Page: - [OpsB and NOM Cloud Deployments Version Tracking](/display/ICSD/OpsB+and+NOM+Cloud+Deployments+Version+Tracking) diff --git a/knowledgebase/csd-wiki/ICSD/AWS-Infrastructure-Naming-Rules_688988195.md b/knowledgebase/csd-wiki/ICSD/AWS-Infrastructure-Naming-Rules_688988195.md deleted file mode 100644 index 5244be1c..00000000 --- a/knowledgebase/csd-wiki/ICSD/AWS-Infrastructure-Naming-Rules_688988195.md +++ /dev/null @@ -1,68 +0,0 @@ -# AWS-Infrastructure-Naming-Rules_688988195 -**EC2 Instance** - -- eu8-prod-smax-worker -- eu8-prod-cms-worker -- eu8-prod-cms-probe-windows -- eu8-prod-oo-worker -- eu8-prod-monitor-worker -- eu8-prod-logging-worker -- eu8-prod-logging-logstash-linux -- eu8-prod-bastion-server-linux -- eu8-prod-bastion-server-windows -- eu8-prod-vertica-node-linux -- eu8-prod-vertica-mc-linux -- eu8-prod-opb-agent-server-windows -- eu8-prod-sm-server-windows -- eu8-prod-idol-server-windows -- eu8-prod-jenkins-server-linux - -**RDS** - -**EFS** - -- us1-prod-smax-efs -- us1-prod-cms-efs -- us1-prod-oo-efs -- us2-dev-smax-efs -- us2-dev-oo-efs - -**Subnets** - -- us24-prod-public-subnet-1 -- us24-prod-public-subnet-2 -- us24-prod-public-subnet-3 -- us24-prod-private-subnet-1 -- us24-prod-private-subnet-2 -- us24-prod-private-subnet-3 -- 
us24-prod-database-subnet-1 -- us24-prod-database-subnet-2 - -**SecurityGroup**: - -- us24-prod-bastion-securitygroup - -**Backup Plan** - -- - us1-prod-aws-backup-plan - - us2-prod-aws-backup-plan - - jp12-stg-aws-backup-plan - -Backup Rules - -- - us1-prod-6h-backup-rule - - us2-prod-6h-backup-rule - -**Resource Assignment** - -**S3 bucket for Vertica** - -- us2-prod-vertica-data - -**S3 bucket for Velero** - -**AWS CloudWatch Naming Rules** - -Monitoring SMAX Tenant - -Carnaries diff --git a/knowledgebase/csd-wiki/ICSD/AWS-RDS-certificate-update--Helm-Fedramp-simulation-ENV_688983269.md b/knowledgebase/csd-wiki/ICSD/AWS-RDS-certificate-update--Helm-Fedramp-simulation-ENV_688983269.md deleted file mode 100644 index fe6e23b9..00000000 --- a/knowledgebase/csd-wiki/ICSD/AWS-RDS-certificate-update--Helm-Fedramp-simulation-ENV_688983269.md +++ /dev/null @@ -1,2 +0,0 @@ -# AWS-RDS-certificate-update--Helm-Fedramp-simulation-ENV_688983269 -
Tasks | Products | Steps | Duration | Downtime | |
Prepare: Certificate File Preparation | Download the new AWS RDS certificate bundle PEM file | Download the new AWS RDS certificate bundle for your AWS region from "Certificate bundles for specific AWS Regions". For the us-gov-west-1 region, download the certificate below:
| |||
Prepare: Update certificate configuration in application side | OMT | 1. Acquire database info before running the script: You may get the db user, db name and PASSWORD_KEY values from database configmap with below commands: kubectl get cm default-database-configmap -n The result is like: DEFAULT_DB_CDFIDM_PASSWORD_KEY: defaultdb_cdfidm_user_password DEFAULT_DB_CDFIDM_USERNAME: cdfidm DEFAULT_DB_HOST: xxxxxyyyyy.us-west-2.rds.amazonaws.com DEFAULT_DB_NAME: itom-cdf-idm 2. Get the cdfidm db password: kubectl get pod -n $CDF_NAMESPACE | grep "itom-idm" | head -1 | awk '{print $1}' kubectl exec -it For example: kubectl exec -it $(kubectl get pod -n $CDF_NAMESPACE | grep "itom-idm" | head -1 | awk '{print $1}') -n $CDF_NAMESPACE -c idm -- get_secret defaultdb_cdfidm_user_password Note: Record the database info and password, they will be used in execute command | https://docs.microfocus.com/doc/SMAX/24.2/ModifyExternalDBConfig | ||
SMAX & HCMX |
NOTE: The yaml file with new pem content replaced will be used in RDS certificate replacement.Reference: https://staging.docs.microfocus.com/doc/SMAX/Main/ChangeCertForPostgreSQL | https://docs.microfocus.com/doc/SMAX/24.2/ModifyExternalDBConfig | |||
CMS |
helm get values 2. Replace the content of caCertificates.postgresql.crt in values.yaml with the content of the AWS RDS certificate bundle obtained in the step above. Note: every line of certificate content must be indented by 4 spaces in values.yaml, for example:
| ||||
Audit |
helm get values 2. Replace the content of caCertificates.RE_ca_dbcrt in values.yaml with the content of the AWS RDS certificate bundle obtained in the step above. Note: every line of certificate content must be indented by 4 spaces in values.yaml, for example:
| ||||
Execute the certificate update on the application side. Note: The applications do not depend on each other. | OMT | Navigate to the $CDF_HOME/bin directory and run the updateExternalDbInfo.sh script with the parameters below: ./updateExternalDbInfo.sh -H For example: ./updateExternalDbInfo.sh -H xxxxyyyy.us-west-2.rds.amazonaws.com -p 5432 -d cdfidmdb -u cdfidm --dbpassword Reference: https://docs.microfocus.com/doc/OMT/24.2/ModifyExternalDatabaseConfiguration | 1min | 0 | |
SMAX & HCMX |
| 4mins | 0 | ||
CMS |
NOTE: You may do this in parallel with SMAX restart | 1min | 0 | ||
Audit |
$CDF_HOME/bin/cdfctl runlevel set -l DOWN -n NOTE: You may do this in parallel with SMAX restart | 1min | 0 | ||
Restart pods (Alternative) | You may also do the helm upgrade for all products in parallel without restarting. Then do the restart against all products whose RDS certificates were changed For example: $CDF_HOME/bin/cdfctl runlevel set -l DOWN -n | 14mins | 14mins | ||
| Monitor the restart till all pods are started | kubectl get pod -n < ESM_NAMESPACE > |grep -v 1/1|grep -v 2/2|grep -v 3/3|grep -v 4/4|grep -v Completed | ||||
Update the certificates of AWS RDS DB instances. | Update the certificate on AWS RDS DB instances. | 1. Log in to the AWS console and go to the RDS instances whose certificates you want to update. 2. Select the RDS instance and click the Modify button. 3. Change the Certificate authority. If your primary certificate CA is rds-ca-2019, it is recommended to select the rds-ca-rsa4096-g1 CA as the new value.
4. Save the change and choose to apply it immediately. 5. Repeat the steps for all your RDS instances. | 2mins | 0 |
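The pod-monitoring step in the table above filters `kubectl get pod` output with a chain of `grep -v` calls (1/1 through 4/4, plus Completed). The sketch below shows the same idea in Python, generalized to any container count: it compares the two halves of the READY column instead of listing each n/n pair. It is only an illustration. The function name and the sample pod names are hypothetical, and it assumes the default column layout of `kubectl get pod` (NAME, READY, STATUS first).

```python
from typing import List


def pods_not_ready(kubectl_output: str) -> List[str]:
    """Return pods that are neither fully ready nor Completed.

    Expects the plain-text table printed by `kubectl get pod`.
    """
    pending = []
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a pod row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        if status == "Completed":
            continue
        # READY looks like "1/2": ready containers / total containers.
        ready_count, _, total = ready.partition("/")
        if ready_count != total:
            pending.append(name)
    return pending


# Hypothetical sample output; the pod names are made up for illustration.
SAMPLE = """\
NAME            READY   STATUS      RESTARTS   AGE
example-pod-a   2/2     Running     0          5m
example-pod-b   1/2     Running     0          5m
example-job-c   0/1     Completed   0          5m
"""

print(pods_not_ready(SAMPLE))
```

An empty list means every remaining pod is fully ready, which corresponds to the `grep -v` chain printing nothing but the header row.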
| Tasks | Products | Duration | Steps |
|---|---|---|---|
Preparation | Download the new AWS RDS certificate bundle PEM file | 5 mins | Download the new AWS RDS certificate bundle for your AWS region from "Certificate bundles for specific AWS Regions". For example, for the us-west-2 region, download the certificate below:
Upload the certificate bundle to the bastion. Note
|
| OMT | 5 mins | 1. Acquire database info before running the script:
Note: The values above are the out-of-the-box (OOB) values. If you are not using OOB values, you can get the values with the commands below: kubectl get cm default-database-configmap -n The result is like: DEFAULT_DB_CDFIDM_PASSWORD_KEY: defaultdb_cdfidm_user_password You can find the db user, db name and PASSWORD_KEY value in the database configmap. 2. Get the cdfidm db password. For example: Take the note of your Reference: https://docs.microfocus.com/doc/OMT/24.2/ModifyExternalDatabaseConfiguration | |
| SMAX & HCMX | 10 mins |
NOTE: The yaml file with new pem content replaced will be used in RDS certificate replacement. Reference: https://staging.docs.microfocus.com/doc/SMAX/Main/ChangeCertForPostgreSQL | |
| CMS | 5 mins | 1. Get the CMS values.yaml from the currently running deployment by running the command below: helm get values 2. Replace the content of caCertificates.postgresql.crt in values.yaml with the content of the AWS RDS certificate bundle obtained in the step above. Note: every line of certificate content must be indented by 4 spaces in values.yaml | |
| OO | 5 mins | 1. Get the OO values.yaml: helm get values 2. Replace the content of caCertificates.postgresql.crt in values.yaml with the content of the AWS RDS certificate bundle obtained in the step above. Note: every line of certificate content must be indented by 4 spaces in values.yaml | |
| Audit | 5 mins |
| |
Maintenance Window: Update the certificate on the application side | OMT | 5 mins | Navigate to the $CDF_HOME/bin directory and run the updateExternalDbInfo.sh script with the parameters below: NOTE: you can find the DB parameters in the preparation steps. ./updateExternalDbInfo.sh -H |
SMAX/HCMX | 30 mins |
| |
CMS | 20 mins | 1. Update the deployment by running helm upgrade command. The yaml file is the one with new pem content replaced in preparation steps. helm upgrade 2. Restart CMS $CDF_HOME/bin/cdfctl runlevel set -l DOWN -n wait till all pods are shut down $CDF_HOME/bin/cdfctl runlevel set -l UP -n 3. Monitor pod status: kubectl get pod -n NOTE: You may do this in parallel with SMAX restart | |
| OO | 20 mins | 1.Update the deployment by running helm upgrade command. The yaml file is the one with new pem content replaced in preparation steps. helm upgrade 2. Restart OO $CDF_HOME/bin/cdfctl runlevel set -l DOWN -n 3. Monitor pod status: kubectl get pod -n NOTE: You may do this in parallel with SMAX restart | |
| Audit | 5 mins |
$CDF_HOME/bin/cdfctl runlevel set -l DOWN -n 3. Monitor pod status: kubectl get pod -n NOTE: You may do this in parallel with SMAX restart | |
Update the certificates of AWS RDS DB instances. | Update the certificate on AWS RDS DB instances. | 10 mins | 1. Log in to the AWS console and go to the RDS instances whose certificates you want to update. 2. Select the RDS instance and click the Modify button. 3. Change the Certificate authority. If your primary certificate CA is rds-ca-2019, it is recommended to select the rds-ca-rsa2048-g1 CA as the new value.
4. Save the change and choose to apply it immediately. Repeat the steps for all your RDS instances. |
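The CMS, OO and Audit preparation steps above all require each certificate line pasted into values.yaml to carry the same leading indentation. A minimal sketch of that transformation follows, assuming "4 indentations" means four spaces and that the certificate sits in a YAML block scalar. The function name and the placeholder certificate body are hypothetical.

```python
def indent_pem(pem_text: str, spaces: int = 4) -> str:
    """Prefix every non-empty line of a PEM bundle with `spaces` spaces."""
    prefix = " " * spaces
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    # Blank separator lines between certificates are dropped; a YAML block
    # scalar does not need them.
    return "\n".join(prefix + line for line in lines if line)


# Placeholder content, not a real certificate.
EXAMPLE_PEM = """-----BEGIN CERTIFICATE-----
PLACEHOLDER-BASE64-LINE
-----END CERTIFICATE-----
"""

print(indent_pem(EXAMPLE_PEM))
```

The result can be pasted under the relevant `caCertificates` key; check the indentation against the existing entry in your own values.yaml before running the helm upgrade.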
| Field | Required | Description |
|---|---|---|
| Display Name | Yes | The display name of this configuration. |
| Shared in same family | No | Share the authentication settings within the same family. The supported values are "false" and "true". See OMT doc. |
| Client ID | Yes | The value of Client ID that you get from the OpenID identity provider. |
| Client Secret | Yes | The value of Client Secret that you get from the OpenID identity provider. |
| HTTP Method | Yes | The HTTP method of getting a user's information from the endpoint. The supported values are "GET" and "POST". Caution: By selecting the GET option, you are disabling or bypassing security features, thereby exposing the system to increased security risks. By using this option, you understand and agree to assume all associated risks and hold OpenText harmless for the same. |
| IDP URL | Yes | The endpoint or URL path provided by the OpenID Identity Provider. The URL set for "Redirect URI" will be directed to the IDP URL. |
| Redirect URI | Yes | The value of redirect URI of the IDM URL for login. See OMT doc. |
| Scope | Yes | The value of scope. For example, "openid email". See OMT doc. |
| State Supported | No | Whether the State feature is supported. The supported values are "false" and "true". See OMT doc. |
| Username Attribute | Yes | The attribute to define a username. |
| User Info Endpoint | No | An OAuth 2.0 Protected Resource that returns Claims about the authenticated end user. For example, /userinfo. |
| Token Endpoint | Yes | The token endpoint of the OpenID identity provider. The Token Endpoint is used to obtain a Token Response. For example, /token. |
| Authentication Endpoint | Yes | The Authorization Endpoint performs authentication of an end user. This is done by sending the user agent to the authorization server's endpoint for authentication and authorization, using request parameters defined by OAuth 2.0 and additional parameters and parameter values defined by OpenID Connect. For example, /authorize. |
| Logout Endpoint | No | The endpoint where you can end a session. |
| Additional Parameter | No | The additional parameter for authentication. See OMT doc. |
| Field | Description |
|---|---|
| Display Name | The display name of this configuration. |
| Client ID | The value of Client ID that you get from step 5 above. |
| Client Secret | The value of Client Secret that you get from step 5 above. |
| IDP URL | https://accounts.google.com |
| Scope | openid profile email |
| User Info Endpoint | https://openidconnect.googleapis.com/v1/userinfo |
| Token Endpoint | https://oauth2.googleapis.com/token |
| Authorization Endpoint | https://accounts.google.com/o/oauth2/v2/auth |
| Logout Endpoint | https://accounts.google.com/Logout |
| Additional Parameter | The additional parameter for authentication. |
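To show how the Google values above fit together, here is a small sketch that builds an authorization request URL from the Authorization Endpoint, Client ID, Redirect URI and Scope. It assumes the standard OpenID Connect authorization code flow (`response_type=code`); how the IdM service actually builds its request is not described in this page. The client ID, redirect URI and state value are hypothetical placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Value taken from the table above.
AUTHORIZATION_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_authorization_url(client_id: str, redirect_uri: str,
                            scope: str = "openid profile email",
                            state: str = "example-state") -> str:
    """Build an OIDC authorization-code request URL (illustrative only)."""
    params = {
        "response_type": "code",  # assumption: authorization code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return AUTHORIZATION_ENDPOINT + "?" + urlencode(params)


# Hypothetical client ID and redirect URI.
url = build_authorization_url(
    client_id="example-client-id.apps.googleusercontent.com",
    redirect_uri="https://idm.example.com/callback",
)
print(url)
print(parse_qs(urlparse(url).query)["scope"])
```

Opening such a URL in a browser is a quick way to check that the Client ID and Redirect URI are registered with the identity provider before entering them in the configuration form.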
| Section | Item | Value |
|---|---|---|
| Basic configuration | Target type | IP addresses |
| | Target group name | NLB-for-Vertica-TG |
| | Protocol: Port | TCP: 5433 |
| | IP address type | IPv4 |
| | VPC | VPC of the Vertica DB server |
| Others | / | Leave default |
| Section | Item | Value |
|---|---|---|
| Basic configuration | Load balancer name | NLB-for-Vertica |
| | Scheme | Internal |
| | IP address type | IPv4 |
| Network mapping | VPC | VPC of the Vertica DB server |
| | Mappings | us-west-2a: private subnet1; us-west-2b: private subnet2; us-west-2c: private subnet3 |
| Security groups | Security groups | The security group of the Vertica DB server |
| Listeners and routing | Protocol | TCP |
| | Port | 5433 |
| | Forward to | NLB-for-Vertica-TG |
| Section | Item | Value |
|---|---|---|
| Endpoint service settings | Name | Vertica-endpoint-service |
| | Load balancer type | Network |
| Available load balancers | Select the load balancers | NLB-for-Vertica |
| Additional settings | Acceptance required | Checked |
| | Supported IP address types | IPv4 |
| Section | Item | Value |
|---|---|---|
| Endpoint settings | Name tag | Vertica-Pulsar-endpoint |
| | Service category | Other endpoint services |
| Service settings | Service name | The pulsar service name shared from OpsB |
| VPC | VPC | The VPC of Vertica |
| Additional settings | Leave as default | |
| Section | Item | Value |
|---|---|---|
| Endpoint settings | Name tag | Vertica-DI-Admin-endpoint |
| | Service category | Other endpoint services |
| Service settings | Service name | The DI Admin service name shared from OpsB |
| VPC | VPC | The VPC of Vertica |
| Additional settings | Leave as default | |
| Section | Item | Value |
|---|---|---|
| Endpoint settings | Name tag | Vertica-DI-Receiver-endpoint |
| | Service category | Other endpoint services |
| Service settings | Service name | The DI receiver service name shared from OpsB |
| VPC | VPC | The VPC of Vertica |
| Additional settings | Leave as default | |
| Index | Status | Issues | Comments |
|---|---|---|---|
| Micro Focus Reference Architecture | Video Slides | ||
| Monitoring Level | Category | Severity | Code | Alert Description AWS | Alert Description GCP | Sample Chart | Alert Message | Runbook AWS | Runbook GCP |
|---|---|---|---|---|---|---|---|---|---|
| Infrastructure | Compute | ALB HTTP 5XX Count (More than 34 in a 3 mins time frame) | N/A | Link | [ S0 - Urgent ] [ farm-name ] ALB HTTP 5XX Count alert | Runbook | |||
S2 | ALB Target 5xx Count | N/A | Link | ||||||
| Storage | S3 | EBS Disk Queue Depth (EBS disk queue depth more than 5 for more than 10 mins) | Disk queue length avg (disk queue length is more than 5 for more than 10 mins) | Link | [ S3 - Warning ] [ farm-name ] EBS Disk Queue Depth alert | Runbook | |||
S2 | EBS Burst Balance Average (EBS burst balance below 40% for more than 30 mins ) | N/A | Link | [ S2 - Error ] [ farm-name ] EBS Burst Balance Average alert | Runbook | ||||
| EBS Burst Balance Average (EBS burst balance is below 0) | N/A | Link | [ S0 - Urgent ] [ farm-name ] EBS Burst Balance Average alert | Runbook | |||||
S2 | EFS Burst Credit Balance (Burst credit below 40% for more than 15 mins ) | N/A | Link | [ S2 - Error ] [ farm-name ] EFS Burst Credit Balance alert | Runbook | ||||
| EFS Burst Credit Balance (Burst credit is 0) | N/A | Link | [ S0 - Urgent ] [ farm-name ] EFS Burst Credit Balance alert | Runbook | |||||
| Disk average latency (?) | |||||||||
| Filestore: Average read latency (?) | |||||||||
| Filestore: Average write latency (?) | |||||||||
| Filestore: Used space percent (?) | |||||||||
Virtualization | |||||||||
Database | S2 | RDS CPU Utilization (CPU more than 97% for more than 30 mins) | CPU utilization (CPU more than 97% for more than 30 mins) | Link | [ S2 - Error ] [ farm-name ] RDS CPU Utilization alert | Runbook | |||
S2 | CPU (sy: system >70% for more than 60 mins ) | N/A | Link | [ S2 - Error ] [ farm-name ] RDS cpuUtilization System alert | Runbook | ||||
S2 | CPU (si: soft interrupts > 15% for more than 60 mins ) | N/A | Link | [ S2 - Error ] [ farm-name ] RDS CPU Soft Interrupts alert | Runbook | ||||
S3 | Disk queue depth (EBS disk queue depth more than 5 for more than 10 mins) | IO wait (Total of IO_time,?) | Link | [ S3 - Warning ] [ farm-name ] RDS Disk queue depth alert | Runbook | ||||
S2 | Disk (Free Storage Space is below 500 MB) | Disk (Free Storage Space= (1-Disk Utilization)* Disk allocation / Disk Utilization is below 500 MB) | Link | [ S2 - Error ] [ farm-name ] RDS Disk Free Storage Space alert | Runbook | ||||
S2 | Disk (Storage has enough space to auto-scale, (Free Space + Max Autoscaling Storage - Allocated Storage) / Allocated Storage < 0.2 ) | Disk (Storage has enough space to auto-scale, (Free Space + Max Autoscaling Storage - Allocated Storage) / Allocated Storage < 0.2 ) | Runbook | ||||||
S2 | Memory (Free memory less than 5% for more than 5 mins) | Memory components(sum of all components) (Free memory less than 5% for more than 5 mins) | Link | [ S2 - Error ] [ farm-name ] RDS Free Memory Percentage alert | Runbook | ||||
| Memory (Free memory less than 2% for more than 5 mins) | Memory components(sum of all components) (Free memory less than 2% for more than 5 mins) | Link | [ S0 - Urgent ] [ farm-name ] RDS Free Memory Percentage alert | Runbook | |||||
S2 | Storage (Burst Balance below 40% for more than 30 mins ) | N/A | Link | [ S2 - Error ] [ farm-name ] RDS Burst Balance alert | Runbook | ||||
| RDS Burst Balance (Burst Balance is 0) | N/A | Link | [ S0 - Urgent ] [ farm-name ] RDS Burst Balance alert | Runbook | |||||
S2 | RDS DBLoad (AWS Specific, via performance insight, more than 2 times of CPU number for more than one hour) | Database load (via query insight, execution_time, more than 2 times of CPU capacity) | [ S2 - Error ] [ farm-name ] SMA RDS DBLoad alert [ S2 - Error ] [ farm-name ] CMS RDS DBLoad alert | Runbook | |||||
| RDS DBLoad (AWS Specific, via performance insight, more than 4 times of CPU number for more than one hour) | Database load (via query insight, execution_time, more than 4 times of CPU capacity) | [ S1 - Critical ] [ farm-name ] SMA RDS DBLoad alert [ S1 - Critical ] [ farm-name ] CMS RDS DBLoad alert | Runbook | ||||||
S3 | RDS DBLoadNonCPU (AWS Specific, via performance insight, more than 1 times of CPU number more than one hour) | IO wait time+Lock wait time (via query insight,, more than 1 times of CPU capacity) | [ S3 - Warning ] [ farm-name ] SMA RDS DBLoadNonCPU alert [ S3 - Warning ] [ farm-name ] CMS RDS DBLoadNonCPU alert | Runbook | |||||
Wait events (Total of all events,?) | |||||||||
Query latency (Total of all the latencies,?) | |||||||||
| Link | Block Session Count | ||||||||
| Link | long active query duration | ||||||||
Capture RDS top 10 query (TBD)
| RDS top 10 query | ||||||||
dead tuple ems dead tuple rms dead tuple idm | |||||||||
| OS (Node level) | CPU | S2 | CPU more than 97% for more than 60 mins | Same as AWS | Link | [ S2 - Error ] [ farm-name ] Node CPU Usage alert | Runbook | ||
S2 | CPU (sy: system >70% for more than 60 mins )(mark for review) | Same as AWS | Link | [ S2 - Error ] [ farm-name ] Node CPU System alert | Runbook | ||||
S2 | CPU (si: soft interrupts > 15% for more than 60 mins )(mark for review) | Same as AWS | Link | [ S2 - Error ] [ farm-name ] Node CPU Soft Interrupts alert | Runbook | ||||
Memory | S3 | Memory more than 95% for more than 10 mins | Same as AWS | Link | [ S3 - Warning ] [ farm-name ] Node Mem Usage alert | Runbook | |||
Disk | S3 | Disk usage more than 95% | Same as AWS | Link | [ S3 - Warning ] [ farm-name ] Node Disk Usage alert | Runbook | |||
Disk read/write latency (TBD) | Same as AWS | Disk Read Latency Disk Write Latency | |||||||
S3 | Inode usage > 97% | Same as AWS | Link | [ S3 - Warning ] [ farm-name ] Disk Inode Usage alert | Runbook | ||||
Node disk IO load (TBD) | Same as AWS | Link | Disk IOPS | ||||||
Network | network operation latency(TBD) | Same as AWS | |||||||
network transit error rate(TBD) | Same as AWS | Link | Network Transit Error Rate | ||||||
network transit drop rate(TBD) | Same as AWS | Link | Network Transit Drop Rate | ||||||
network transit queue length(TBD) | Same as AWS | ||||||||
Throughput / bandwidth (TBD) | Same as AWS | ||||||||
S3 | Load (Load Avg 15m/core number > 200% for 35 mins ) | Same as AWS | Link | [ S3 - Warning ] [ farm-name ] Node Load Avg 15m/core | Runbook | ||||
| Container | CPU | S2 | CPU (CPU more than 97% for more than 60 mins) | Same as AWS | Link | [ S2 - Error ] [ farm-name ] Pod CPU usage alert | Runbook | ||
Memory | swap usage | Same as AWS | Link | Pod Swap Usage | |||||
Disk | Disk read/write latency (TBD) | Same as AWS | Pod Disk Read Latency Pod Disk Write Latency | ||||||
S3 | Inode usage(free/total) > 97% | Same as AWS | Link | [ S3 - Warning ] [ farm-name ] Pod Inode Usage alert | Runbook | ||||
Network | network transit error rate(TBD) | Same as AWS | Link | Pod Network Transit Error Rate | |||||
network transit drop rate(TBD) | Same as AWS | Link | Pod Network Transit Drop Rate | ||||||
Unavailable service | SMAX critical path unavailable: svc portal / runtime ui/ gateway/ platform / redis / rabbitmq / bo-login / idm / bo-ats / ingress-nginx / sma-ui / bo-farcade | Same as AWS | Link | [ S0 - Urgent ] [ farm-name ] SMA Unavailable k8s resource alert | Runbook | ||||
S2 | SMAX partial business impact: others not in S0, search related (content, DIH, DAH, search, proxy) / auto pass / bo-ui / bo-user | Same as AWS | Link | [ S2 - Error ] [ farm-name ] SMA Unavailable k8s resource alert | Runbook | ||||
S3 | SMAX no obvious impact on business: XMPP / XIE / Smart Ticket / stx / virtual agent / ppo / web socket gateway / smart-ui / ocr / smarta-installer | Same as AWS | Link | [ S3 - Warning ] [ farm-name ] SMA Unavailable k8s resource alert | Runbook | ||||
S4 | SMAX services outside of ESM / toolkit | Same as AWS | Link | [ S4 - Info ] [ farm-name ] SMA Unavailable k8s resource alert | Runbook | ||||
CMS critical path unavailable: itom-cms-gateway, itom-idm, itom-ingress-controller, itom-ucmdb-browser, tom-ucmdb-solr, itom-ucmdb (both are down) | Same as AWS | Link | [ S0 - Urgent ] [ farm-name ] CMS Unavailable k8s resource alert | Runbook | |||||
S2 | CMS partial business impact: itom-autopass-lms, itom-vault, itom-ucmdb (either is down) | Same as AWS | Link | [ S2 - Error ] [ farm-name ] CMS Unavailable k8s resource alert | Runbook | ||||
S3 | CMS no obvious impact on business: | Same as AWS | |||||||
S4 | CMS services outside of ESM / toolkit: itom-ucmdb-probe, itom-ucmdb-dfp-lunux-installer, itom-ucmdb-dfp-windows-installer, itom-ucmdb-localclient-installers | Same as AWS | Link | [ S4 - Info ] [ farm-name ] CMS Unavailable k8s resource alert | Runbook | ||||
Load | S3 | Load Avg 15m/core number > 200% for 35 mins (TBD, because it's not observable via current metrics) | Same as AWS | Link | Pod Load Avg 10s | Runbook | |||
Threads | container_threads on process (TBD) | Same as AWS | Link | Threads | |||||
Pod balancing (TBD) | |||||||||
| App metrics | Thread | ||||||||
Connections | |||||||||
Limits | |||||||||
Smart Analytics | S3 | SMAX Content data ratio (total doc/committed doc) > 1.20 | Same as AWS | Link | [ S3 - Warning ] [ farm-name ] SmartA Data Compact Ration alert | Runbook | |||
Rabbitmq (each node) | S3 | SMAX queue > 200 / 250 for more than 30 mins (200 for medium profile or lower, 250 for large profile) | Same as AWS | Link | [ S3 - Warning ] [ farm-name ] Rabbitmq Queue alert | Runbook | |||
S3 | SMAX Pending Messages/Minute > 500 for more than 30 mins (Mark for review) | Same as AWS | Link | [ S3 - Warning ] [ farm-name ] Rabbitmq Messages/Minute alert | Runbook | ||||
SMAX Message queue not equally distributed to different cluster nodes (TBD) | Same as AWS | Runbook | |||||||
IDM | S4 | SMAX Active user (per profile, medium profile > 1100 for more than 30 mins, large profile > 3000 for more than 30 mins) | Same as AWS | Link | [ S4 - Info ] [ farm-name ] IDM active users alert | Runbook | |||
Gateway | S2 | SMAX Tomcat https connector currentThreadsBusy > 30 for 30 mins (EU8-Prod) Tomcat https connector currentThreadsBusy > 30 for 30 mins or Tomcat https connector currentThreadsBusy > 60 for 15 mins or Tomcat https connector currentThreadsBusy > 90 for 5 mins | Same as AWS | Link | [ S2 - Error ] [ farm-name ] Gateway Tomcat https connector currentThreadsBusy alert | Runbook | |||
S2 | SMAX Httpclient InUse > 20 for 30 mins (EU8-Prod) Httpclient InUse > 20 for 30 mins or Httpclient InUse > 30 for 15 mins or Httpclient InUse > 80 for 5 mins | Same as AWS | Link | [ S2 - Error ] [ farm-name ] Gateway Httpclient InUse alert | Runbook | ||||
Platform | S2 | SMAX Tomcat https connector currentThreadsBusy > 30 for 30 mins (EU8-Prod) Tomcat https connector currentThreadsBusy > 30 for 30 mins or Tomcat https connector currentThreadsBusy > 60 for 15 mins or Tomcat https connector currentThreadsBusy > 90 for 5 mins | Same as AWS | Link | [ S2 - Error ] [ farm-name ] Platform Tomcat https connector currentThreadsBusy alert | Runbook | |||
S2 | SMAX Httpclient InUse > 20 for 30 mins (EU8-Prod) Httpclient InUse > 20 for 30 mins or Httpclient InUse > 30 for 15 mins or Httpclient InUse > 80 for 5 mins | Same as AWS | Link | [ S2 - Error ] [ farm-name ] Platform Httpclient InUse alert | Runbook | ||||
Serviceportal | S2 | SMAX Tomcat https connector currentThreadsBusy > 30 for 30 mins (EU8-Prod) Tomcat https connector currentThreadsBusy > 30 for 30 mins or Tomcat https connector currentThreadsBusy > 60 for 15 mins or Tomcat https connector currentThreadsBusy > 90 for 5 mins | Same as AWS | Link | [ S2 - Error ] [ farm-name ] Serviceportal Tomcat https connector currentThreadsBusy alert | Runbook | |||
S2 | SMAX Httpclient InUse > 20 for 30 mins (EU8-Prod) Httpclient InUse > 20 for 30 mins or Httpclient InUse > 30 for 15 mins or Httpclient InUse > 80 for 5 mins | Same as AWS | Link | [ S2 - Error ] [ farm-name ] Serviceportal Httpclient InUse alert | Runbook | ||||
OpenSearch based Monitoring (TBD) | Access 5xx | ||||||||
Access Response time | |||||||||
Database level customer metrics | Same as AWS | Link | NativeSACM Transaction Context Queue | ||||||
Same as AWS | Link | NativeSACM Transaction Context Queue retries | |||||||
Same as AWS | |||||||||
Same as AWS | Link | ||||||||
TextDetection Job queue | Same as AWS | Link | |||||||
IndexEntities Job queue | Same as AWS | Link | |||||||
EntitiesHandler Job queue | Same as AWS | Link | |||||||
SLT Job Delay time[mins] | Same as AWS | Link | |||||||
TextDetection Job Delay time[mins] | Same as AWS | Link | |||||||
IndexEntities Job Delay time[mins] | Same as AWS | Link | |||||||
EntitiesHandler Job Delay time[mins] | Same as AWS | Link | |||||||
| Instrumental | Method | ||||||||
Query | |||||||||
| Others | When to scale out (overloaded) | ||||||||
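The Tomcat and Httpclient rows in the table above use tiered rules: a lower threshold must hold for a longer window, a higher one for a shorter window. The following is a minimal Python sketch of how such a rule could be evaluated. It assumes one metric sample per minute and that "for N mins" means every sample in the last N minutes exceeds the threshold; the names `Tier` and `should_alert` are hypothetical and not part of the actual alerting setup.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Tier:
    threshold: float  # metric must stay strictly above this value
    minutes: int      # for at least this many consecutive minutes


# Tiers copied from the table rows (assumed to apply per pod/connector).
TOMCAT_THREADS_BUSY = (Tier(30, 30), Tier(60, 15), Tier(90, 5))
HTTPCLIENT_IN_USE = (Tier(20, 30), Tier(30, 15), Tier(80, 5))


def should_alert(samples: Sequence[float], tiers: Sequence[Tier]) -> bool:
    """Return True if any tier is breached.

    samples: one value per minute, oldest first (an assumption of this sketch).
    """
    for tier in tiers:
        window = samples[-tier.minutes:]
        # Require a full window so a short history cannot trigger a long tier.
        if len(window) == tier.minutes and all(v > tier.threshold for v in window):
            return True
    return False
```

For example, five consecutive minutes above 90 busy threads breaches the shortest Tomcat tier, while 30 minutes at 25 busy threads breaches no Tomcat tier but does breach the first Httpclient tier.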
| Monitoring Level | Category | Severity | Code | Alert Description | Sample Chart | Alert Message | Runbook |
|---|---|---|---|---|---|---|---|
| Infrastructure | Compute | ALB HTTP 5XX Count (More than 34 in a 3 mins time frame) | Link | [ S0 - Urgent ] [ farm-name ] ALB HTTP 5XX Count alert | Runbook | ||
S2 | ALB Target 5xx Count | Link | |||||
| Storage | S3 | EBS Disk Queue Depth (EBS disk queue depth more than 5 for more than 10 mins) | Link | [ S3 - Warning ] [ farm-name ] EBS Disk Queue Depth alert | Runbook | ||
S2 | EBS Burst Balance Average (EBS burst balance below 40% for more than 30 mins ) | Link | [ S2 - Error ] [ farm-name ] EBS Burst Balance Average alert | Runbook | |||
| EBS Burst Balance Average (EBS burst balance is below 0) | Link | [ S0 - Urgent ] [ farm-name ] EBS Burst Balance Average alert | Runbook | ||||
S2 | EFS Burst Credit Balance (Burst credit below 40% for more than 15 mins ) | Link | [ S2 - Error ] [ farm-name ] EFS Burst Credit Balance alert | Runbook | |||
| EFS Burst Credit Balance (Burst credit is 0) | Link | [ S0 - Urgent ] [ farm-name ] EFS Burst Credit Balance alert | Runbook | ||||
Virtualization | |||||||
Database | S2 | RDS CPU Utilization (CPU more than 97% for more than 30 mins) | Link | [ S2 - Error ] [ farm-name ] RDS CPU Utilization alert | Runbook | ||
S2 | CPU (sy: system >70% for more than 60 mins ) | Link | [ S2 - Error ] [ farm-name ] RDS cpuUtilization System alert | Runbook | |||
S2 | CPU (si: soft interrupts > 15% for more than 60 mins ) | Link | [ S2 - Error ] [ farm-name ] RDS CPU Soft Interrupts alert | Runbook | |||
S3 | Disk queue depth (EBS disk queue depth more than 5 for more than 10 mins) | Link | [ S3 - Warning ] [ farm-name ] RDS Disk queue depth alert | Runbook | |||
S2 | Disk (Free Storage Space is below 500 MB) | Link | [ S2 - Error ] [ farm-name ] RDS Disk Free Storage Space alert | Runbook | |||
S2 | Disk (Storage has enough space to auto-scale, (Free Space + Max Autoscaling Storage - Allocated Storage) / Allocated Storage < 0.2 ) | Runbook | |||||
S2 | Memory (Free memory less than 5% for more than 5 mins) | Link | [ S2 - Error ] [ farm-name ] RDS Free Memory Percentage alert | Runbook | |||
| Memory (Free memory less than 2% for more than 5 mins) | Link | [ S0 - Urgent ] [ farm-name ] RDS Free Memory Percentage alert | Runbook | ||||
S2 | Storage (Burst Balance below 40% for more than 30 mins ) | Link | [ S2 - Error ] [ farm-name ] RDS Burst Balance alert | Runbook | |||
| RDS Burst Balance (Burst Balance is 0) | Link | [ S0 - Urgent ] [ farm-name ] RDS Burst Balance alert | Runbook | ||||
S2 | RDS DBLoad (AWS-specific, via Performance Insights: more than 2 times the CPU count for more than one hour) | [ S2 - Error ] [ farm-name ] SMA RDS DBLoad alert [ S2 - Error ] [ farm-name ] CMS RDS DBLoad alert | Runbook | ||||
| RDS DBLoad (AWS-specific, via Performance Insights: more than 4 times the CPU count for more than one hour) | [ S1 - Critical ] [ farm-name ] SMA RDS DBLoad alert [ S1 - Critical ] [ farm-name ] CMS RDS DBLoad alert | Runbook | |||||
S3 | RDS DBLoadNonCPU (AWS-specific, via Performance Insights: more than 1 times the CPU count for more than one hour) | [ S3 - Warning ] [ farm-name ] SMA RDS DBLoadNonCPU alert [ S3 - Warning ] [ farm-name ] CMS RDS DBLoadNonCPU alert | Runbook | ||||
| Link | Block Session Count | ||||||
| Link | long active query duration | ||||||
Capture RDS top 10 query (TBD)
| RDS top 10 query | ||||||
dead tuple ems dead tuple rms dead tuple idm | |||||||
| OS (Node level) | CPU | S2 | CPU more than 97% for more than 60 mins | Link | [ S2 - Error ] [ farm-name ] Node CPU Usage alert | Runbook | |
S2 | CPU (sy: system >70% for more than 60 mins )(mark for review) | Link | [ S2 - Error ] [ farm-name ] Node CPU System alert | Runbook | |||
S2 | CPU (si: soft interrupts > 15% for more than 60 mins )(mark for review) | Link | [ S2 - Error ] [ farm-name ] Node CPU Soft Interrupts alert | Runbook | |||
Memory | S3 | Memory more than 95% for more than 10 mins | Link | [ S3 - Warning ] [ farm-name ] Node Mem Usage alert | Runbook | ||
Disk | S3 | Disk usage more than 95% | Link | [ S3 - Warning ] [ farm-name ] Node Disk Usage alert | Runbook | ||
Disk read/write latency (TBD) | Disk Read Latency Disk Write Latency | ||||||
S3 | Inode usage > 97% | Link | [ S3 - Warning ] [ farm-name ] Disk Inode Usage alert | Runbook | |||
Node disk IO load (TBD) | Link | Disk IOPS | |||||
Network | network operation latency(TBD) | ||||||
network transit error rate(TBD) | Link | Network Transit Error Rate | |||||
network transit drop rate(TBD) | Link | Network Transit Drop Rate | |||||
network transit queue length(TBD) | |||||||
Throughput / bandwidth (TBD) | |||||||
S3 | Load (Load Avg 15m/core number > 200% for 35 mins ) | Link | [ S3 - Warning ] [ farm-name ] Node Load Avg 15m/core | Runbook | |||
| Container | CPU | S2 | CPU (CPU more than 97% for more than 60 mins) | Link | [ S2 - Error ] [ farm-name ] Pod CPU usage alert | Runbook | |
Memory | swap usage | Link | Pod Swap Usage | ||||
Disk | Disk read/write latency (TBD) | Pod Disk Read Latency Pod Disk Write Latency | |||||
S3 | Inode usage(free/total) > 97% | Link | [ S3 - Warning ] [ farm-name ] Pod Inode Usage alert | Runbook | |||
Network | network transit error rate(TBD) | Link | Pod Network Transit Error Rate | ||||
network transit drop rate(TBD) | Link | Pod Network Transit Drop Rate | |||||
Unavailable service | SMAX critical path unavailable: svc portal / runtime ui / gateway / platform / redis / rabbitmq / bo-login / idm / bo-ats / ingress-nginx / sma-ui / bo-farcade | Link | [ S0 - Urgent ] [ farm-name ] SMA Unavailable k8s resource alert | Runbook | |||
S2 | SMAX partial business impact: others not in S0, search related (content, DIH, DAH, search, proxy) / auto pass / bo-ui / bo-user | Link | [ S2 - Error ] [ farm-name ] SMA Unavailable k8s resource alert | Runbook | |||
S3 | SMAX no obvious impact on business: XMPP / XIE / Smart Ticket / stx / virtual agent / ppo / web socket gateway / smart-ui / ocr / smarta-installer | Link | [ S3 - Warning ] [ farm-name ] SMA Unavailable k8s resource alert | Runbook | |||
S4 | SMAX services outside of ESM / toolkit | Link | [ S4 - Info ] [ farm-name ] SMA Unavailable k8s resource alert | Runbook | |||
CMS critical path unavailable: itom-cms-gateway, itom-idm, itom-ingress-controller, itom-ucmdb-browser, tom-ucmdb-solr, itom-ucmdb (both are down) | Link | [ S0 - Urgent ] [ farm-name ] CMS Unavailable k8s resource alert | Runbook | ||||
S2 | CMS partial business impact: itom-autopass-lms, itom-vault, itom-ucmdb (either is down) | Link | [ S2 - Error ] [ farm-name ] CMS Unavailable k8s resource alert | Runbook | |||
S3 | CMS no obvious impact on business: | ||||||
S4 | CMS services outside of ESM / toolkit: itom-ucmdb-probe, itom-ucmdb-dfp-linux-installer, itom-ucmdb-dfp-windows-installer, itom-ucmdb-localclient-installers | Link | [ S4 - Info ] [ farm-name ] CMS Unavailable k8s resource alert | Runbook | |||
Load | S3 | Load Avg 15m/core number > 200% for 35 mins (TBD, because it's not observable via current metrics) | Link | Pod Load Avg 10s | Runbook | ||
Threads | container_threads on process (TBD) | Link | Threads | ||||
Pod balancing (TBD) | |||||||
| App metrics | Thread | ||||||
Connections | |||||||
Limits | |||||||
Smart Analytics | S3 | SMAX Content data ratio (total doc / committed doc) > 1.20 | Link | [ S3 - Warning ] [ farm-name ] SmartA Data Compact Ration alert | Runbook | ||
Rabbitmq (each node) | S3 | SMAX queue > 200 / 250 for more than 30 mins (200 for medium profile or lower, 250 for large profile) | Link | [ S3 - Warning ] [ farm-name ] Rabbitmq Queue alert | Runbook | ||
S3 | SMAX Pending Messages/Minute > 500 for more than 30 mins (Mark for review) | Link | [ S3 - Warning ] [ farm-name ] Rabbitmq Messages/Minute alert | Runbook | |||
SMAX Message queue not equally distributed to different cluster nodes (TBD) | Runbook | ||||||
IDM | S4 | SMAX Active user (per profile: medium profile > 1100 for more than 30 mins, large profile > 3000 for more than 30 mins) | Link | [ S4 - Info ] [ farm-name ] IDM active users alert | Runbook | ||
Gateway | S2 | SMAX Tomcat https connector currentThreadsBusy > 30 for 30 mins (EU8-Prod) Tomcat https connector currentThreadsBusy > 30 for 30 mins or Tomcat https connector currentThreadsBusy > 60 for 15 mins or Tomcat https connector currentThreadsBusy > 90 for 5 mins | Link | [ S2 - Error ] [ farm-name ] Gateway Tomcat https connector currentThreadsBusy alert | Runbook | ||
S2 | SMAX Httpclient InUse > 20 for 30 mins (EU8-Prod) Httpclient InUse > 20 for 30 mins or Httpclient InUse > 30 for 15 mins or Httpclient InUse > 80 for 5 mins | Link | [ S2 - Error ] [ farm-name ] Gateway Httpclient InUse alert | Runbook | |||
Platform | S2 | SMAX Tomcat https connector currentThreadsBusy > 30 for 30 mins (EU8-Prod) Tomcat https connector currentThreadsBusy > 30 for 30 mins or Tomcat https connector currentThreadsBusy > 60 for 15 mins or Tomcat https connector currentThreadsBusy > 90 for 5 mins | Link | [ S2 - Error ] [ farm-name ] Platform Tomcat https connector currentThreadsBusy alert | Runbook | ||
S2 | SMAX Httpclient InUse > 20 for 30 mins (EU8-Prod) Httpclient InUse > 20 for 30 mins or Httpclient InUse > 30 for 15 mins or Httpclient InUse > 80 for 5 mins | Link | [ S2 - Error ] [ farm-name ] Platform Httpclient InUse alert | Runbook | |||
Serviceportal | S2 | SMAX Tomcat https connector currentThreadsBusy > 30 for 30 mins (EU8-Prod) Tomcat https connector currentThreadsBusy > 30 for 30 mins or Tomcat https connector currentThreadsBusy > 60 for 15 mins or Tomcat https connector currentThreadsBusy > 90 for 5 mins | Link | [ S2 - Error ] [ farm-name ] Serviceportal Tomcat https connector currentThreadsBusy alert | Runbook | ||
S2 | SMAX Httpclient InUse > 20 for 30 mins (EU8-Prod) Httpclient InUse > 20 for 30 mins or Httpclient InUse > 30 for 15 mins or Httpclient InUse > 80 for 5 mins | Link | [ S2 - Error ] [ farm-name ] Serviceportal Httpclient InUse alert | Runbook | |||
OpenSearch based Monitoring (TBD) | Access 5xx | ||||||
Access Response time | |||||||
Database level customer metrics | Link | NativeSACM Transaction Context Queue | |||||
| Link | NativeSACM Transaction Context Queue retries | ||||||
| Link | |||||||
TextDetection Job queue | Link | ||||||
IndexEntities Job queue | Link | ||||||
EntitiesHandler Job queue | Link | ||||||
SLT Job Delay time[mins] | Link | ||||||
TextDetection Job Delay time[mins] | Link | ||||||
IndexEntities Job Delay time[mins] | Link | ||||||
EntitiesHandler Job Delay time[mins] | Link | ||||||
| Instrumental | Method | ||||||
Query | |||||||
| Others | When to scale out (overloaded) | ||||||
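Two rows in the table above are defined by small formulas: the RDS storage auto-scaling headroom check and the node load average per core. The sketch below works through both in Python. The function names are hypothetical, the inputs are assumed to be in one consistent unit (for example GB), and the "for 35 mins" duration condition of the load rule is not modeled here.

```python
def autoscale_headroom_ratio(free: float, max_autoscale: float, allocated: float) -> float:
    """(Free Space + Max Autoscaling Storage - Allocated Storage) / Allocated Storage."""
    if allocated <= 0:
        raise ValueError("allocated storage must be positive")
    return (free + max_autoscale - allocated) / allocated


def storage_headroom_alert(free: float, max_autoscale: float, allocated: float,
                           limit: float = 0.2) -> bool:
    """True when the remaining auto-scaling headroom falls below the 0.2 limit."""
    return autoscale_headroom_ratio(free, max_autoscale, allocated) < limit


def load_per_core_percent(load_avg_15m: float, cores: int) -> float:
    """15-minute load average divided by core count, as a percentage."""
    if cores <= 0:
        raise ValueError("core count must be positive")
    return 100.0 * load_avg_15m / cores


# Worked example with made-up numbers: 50 GB free, 1000 GB autoscaling cap,
# 900 GB allocated gives (50 + 1000 - 900) / 900 = 0.1667, which is below 0.2.
print(storage_headroom_alert(50, 1000, 900))
# A load average of 17.0 on 8 cores is 212.5%, above the 200% threshold.
print(load_per_core_percent(17.0, 8))
```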
No | Phase | Upgrade Tasks | Upgrade Duration | Downtime (min) | Doc Link | |
|---|---|---|---|---|---|---|
0 | Get release package | Check ESM release package |
| |||
1 | Upgrade Preparation | SMAX/HCMX:
|
| |||
UCMDB: | / | |||||
Audit Service: | Update values yaml file: 1min | Update Values.yaml file | Screenshot for reference
Add the parameter in the audit yaml file: cluster: k8sProvider: aws | |||
OO: | / | |||||
Aviator: | / | |||||
OP: | / | |||||
Backup SaaS Farm including Aviator | ||||||
2 | Upgrade Aviator |
|
| 0 | 25.1.1 Aviator Upgrade | |
3 | Upgrade Maintenance Window | Upgrade OMT | 13mins (including Prometheus) | SMAX: 0 CMS: 0 | Upgrade OMT to 25.1 | |
Upgrade OP (OP can be upgraded after the OMT upgrade) | 8 mins | 0 | ||||
Upgrade SMAX/HCMX, UCMDB, Audit in parallel
|
| SMAX: 0 UCMDB: 0 Audit Services: 0 | Upgrade SMAX/HCMX to 25.1.1 | |||
Upgrade OO: start after the SMAX/HCMX upgrade finishes (watch the dnd-upgrade-job-xxx and cgro-deploy-controller pods until they reach Completed, then start the OO upgrade) | 22mins | 0 | ||||
4 | Post-Upgrade | OP | Vertica DB stop & start: 5mins; Upgrade UDX plugin: 10mins; Verify upgrade: 2mins | If this plugin was already upgraded to 24.4.1, you can skip it in this upgrade | ||
SMAX/HCMX post upgrade task:
|
| |||||
SMAX OPB agent status check | 25.1.1 includes an OPB agent upgrade. Compare the OPB agent status with the check taken before the upgrade and ensure all live OPB agents are upgraded successfully to the new version and their connections stay alive | |||||
Upgrade external OO RAS | 20 minutes per external RAS | Upgrade External OO RAS - Service Management | Need to upgrade OO RAS or internal owned tenants:
| |||
5 | Rollback |
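The OO step in the table above depends on the dnd-upgrade-job-xxx and cgro-deploy-controller pods reaching Completed. The following is a minimal Python sketch of that gate. It only parses text in the default `kubectl get pods -n <namespace>` column layout (NAME, READY, STATUS, RESTARTS, AGE); how the output is collected, the namespace, and the function name `pods_completed` are assumptions, not part of the documented procedure.

```python
from typing import Dict, List, Sequence

# Pod name prefixes taken from the upgrade table; the random suffix varies.
DEFAULT_PREFIXES = ("dnd-upgrade-job-", "cgro-deploy-controller")


def pods_completed(kubectl_output: str,
                   prefixes: Sequence[str] = DEFAULT_PREFIXES) -> bool:
    """Return True only if every prefix matches at least one pod and all
    matching pods show STATUS == Completed."""
    seen: Dict[str, List[str]] = {p: [] for p in prefixes}
    for line in kubectl_output.splitlines():
        parts = line.split()
        # Skip blank lines and the header row.
        if len(parts) < 3 or parts[0] == "NAME":
            continue
        name, status = parts[0], parts[2]
        for prefix in prefixes:
            if name.startswith(prefix):
                seen[prefix].append(status)
    return all(
        statuses and all(s == "Completed" for s in statuses)
        for statuses in seen.values()
    )
```

A caller would typically poll this check at an interval and start the OO upgrade only once it returns True; a missing pod is treated as "not ready" rather than as success.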
No | Phase | Upgrade Tasks | Upgrade Duration | Downtime (min) | Doc Link | |
|---|---|---|---|---|---|---|
0 | Get release package | Check ESM release package | ||||
1 | Upgrade Preparation | SMAX/HCMX: |
| |||
UCMDB:
| / | |||||
Audit Service: | / | |||||
OO:
| 2mins | |||||
Aviator: Existing integration in BO portal status & results from agent & service portal | 2mins | |||||
OP: | N/A | N/A | ||||
Backup SaaS Farm including Aviator | ||||||
2 | Upgrade Aviator |
|
| 0 | 25.1.2 Aviator Upgrade | |
3 | Upgrade Maintenance Window | Upgrade OMT | 15mins (including Prometheus) | SMAX: 0 CMS: 0 | Upgrade OMT to 25.1.1 | |
Upgrade OP | N/A | 0 | N/A | |||
Upgrade SMAX/HCMX, UCMDB, Audit in parallel
|
| SMAX: 6mins UCMDB: 1mins Audit Services: 0 | Install ESM 25.1.2 Patch | |||
| Install UD/UCMDB 25.1.2 Patch | ||||||
| Install Audit 25.1.2 patch | ||||||
Upgrade OO: start after the SMAX/HCMX upgrade finishes (watch the dnd-upgrade-job-xxx and cgro-deploy-controller pods until they reach Completed, then start the OO upgrade) | 17mins | 0 | ||||
4 | Post-Upgrade | OP | N/A | N/A | N/A | N/A |
SMAX/HCMX post upgrade task:
|
| |||||
SMAX OPB agent status check | 25.1.2 includes an OPB agent upgrade. Compare the OPB agent status with the check taken before the upgrade and ensure all live OPB agents are upgraded successfully to the new version and their connections stay alive | |||||
Upgrade external OO RAS | 20 minutes per external RAS (download & upgrade) | Upgrade External OO RAS - Service Management | Need to upgrade OO RAS or internal owned tenants:
| |||
5 | Rollback |
No | Phase | Upgrade Tasks | Upgrade Duration | Downtime (min) | Doc Link | |
|---|---|---|---|---|---|---|
0 | Get release package | Check ESM release package |
| |||
1 | Upgrade Preparation | SMAX/HCMX:
|
| |||
UCMDB: | / | |||||
Audit Service: | Update values yaml file: 1min | Update Values.yaml file | Screenshot for reference
Add the parameter in the audit yaml file: cluster: k8sProvider: aws | |||
OO: | / | |||||
Aviator: | / | |||||
OP: | / | |||||
Backup SaaS Farm including Aviator | ||||||
2 | Upgrade Aviator |
|
| 0 | 25.1 Aviator Upgrade | |
3 | Upgrade Maintenance Window | Upgrade OMT | 10mins (including Prometheus) | SMAX: 0 CMS: 0 | Upgrade OMT to 25.1 | |
Upgrade SMAX/HCMX, UCMDB, Audit in parallel
|
| SMAX: 0 UCMDB: 0 Audit Services: 0 | Upgrade SMAX/HCMX to 25.1 | |||
Upgrade OO: start after the SMAX/HCMX upgrade finishes (watch the dnd-upgrade-job-xxx and cgro-deploy-controller pods until they reach Completed, then start the OO upgrade) | 22mins | 0 | ||||
Upgrade OP (OP can be upgraded after the OMT upgrade) | 10 mins | 10 mins | ||||
4 | Post-Upgrade | SMAX/HCMX post upgrade task:
|
| |||
SMAX OPB agent status check | 25.1 includes an OPB agent upgrade. Compare the OPB agent status with the check taken before the upgrade and ensure all live OPB agents are upgraded successfully to the new version and their connections stay alive | |||||
Upgrade external OO RAS | 20 minutes per external RAS | Upgrade External OO RAS - Service Management | Need to upgrade OO RAS or internal owned tenants:
| |||
OP | Vertica DB stop & start: 5mins; Upgrade UDX plugin: 10mins; Verify upgrade: 2mins | If this plugin was already upgraded to 24.4.1, you can skip it in this upgrade | ||||
5 | Rollback |
No | Phase | Upgrade Tasks | Upgrade Duration | Downtime (min) | Doc Link | |
|---|---|---|---|---|---|---|
0 | Get release package | Check ESM release package |
| |||
1 | Upgrade Preparation | SMAX/HCMX:
|
|
Before following the link above, apply the settings described in the link below so that the SMAX content check goes smoothly. | ||
UCMDB: | Back up UD/UCMDB (managed Kubernetes) | |||||
Audit Service: | / | Back up Audit service | ||||
OO: | / | Back up the OO deployment before upgrade | ||||
Aviator: | / | |||||
OP: | / | |||||
Backup SaaS Farm including Aviator | ||||||
2 | Upgrade Aviator |
|
| 4 mins | 25.2 Aviator Upgrade | |
3 | Upgrade Maintenance Window | Upgrade OMT | 12mins (including Prometheus) | SMAX: 0 CMS: 0 | Upgrade OMT to 25.2 | |
Upgrade OP (OP can be upgraded after the OMT upgrade): upgrade the Optic Data Lake plugin; upgrade Vertica to 25.x | 9 mins | 0 | ||||
Upgrade SMAX/HCMX, UCMDB, Audit in parallel
|
| SMAX: 9 mins UCMDB: 0 Audit Services: 0 | Upgrade SMAX/HCMX to 25.2 | |||
Upgrade OO: start after the SMAX/HCMX upgrade finishes (watch the dnd-upgrade-job-xxx and cgro-deploy-controller pods until they reach Completed, then start the OO upgrade) | 22mins | 0 | ||||
4 | Post-Upgrade | SMAX/HCMX post upgrade task:
Crawling upgrade | 5mins | Perform post-upgrade tasks as the tenant admin - Service Management Perform post-upgrade tasks as the suite admin - Service Management |
| |
SMAX OPB agent status check | 2mins | 25.2 includes an OPB agent upgrade. Compare the OPB agent status with the check taken before the upgrade and ensure all live OPB agents are upgraded successfully to the new version and their connections stay alive | ||||
OP | Vertica DB stop & start: 5mins; UDX plugin: 5mins; bvd-quexserv pod restart: 2mins; Verify upgrade: 2mins | |||||
Aviator | --Remove unused SageMaker endpoint of the Embedding model --Perform a full reindex - 5mins | |||||
5 | Rollback |
No | Phase | Upgrade Tasks | Upgrade Duration | Downtime (min) | Doc Link | |
|---|---|---|---|---|---|---|
0 | Get release package | Check ESM release package | ||||
1 | Upgrade Preparation | SMAX/HCMX: |
| |||
UCMDB:
| / | |||||
Audit Service: | / | |||||
OO:
| 2mins | |||||
Aviator: Existing integration in BO portal status & results from agent & service portal | 2mins | |||||
OP: | N/A | N/A | ||||
Backup SaaS Farm including Aviator | ||||||
2 | Upgrade Aviator |
|
| 0 | 25.2.2 Aviator Upgrade | Aviator was upgraded through the pipeline this time because the maple-owned setup had a problem. |
3 | Upgrade Maintenance Window | Upgrade OMT | 13mins (including Prometheus) | SMAX: 0 CMS: 0 | Apply OMT Patch | |
Upgrade OP | N/A | 0 | N/A | |||
Upgrade SMAX/HCMX, UCMDB, Audit in parallel
|
| SMAX: 2mins UCMDB: 1mins Audit Services: 0 | Install ESM 25.2.2 Patch | |||
| Install UD/UCMDB 25.2.2 Patch | ||||||
| Install Audit 25.2.2 patch | ||||||
Upgrade OO: start after the SMAX/HCMX upgrade finishes (watch the dnd-upgrade-job-xxx and cgro-deploy-controller pods until they reach Completed, then start the OO upgrade) | 22mins | 0 | ||||
4 | Post-Upgrade | Aviator | / | N/A | / | / |
SMAX/HCMX post upgrade task:
| Upgrade yet to do in US7 simulation time. |
| ||||
SMAX OPB agent status check | 25.2.2 includes an OPB agent upgrade. Compare the OPB agent status with the check taken before the upgrade and ensure all live OPB agents are upgraded successfully to the new version and their connections stay alive | |||||
Upgrade external OO RAS | 5 minutes per external RAS (download & upgrade) | Upgrade External OO RAS - Service Management | Need to upgrade OO RAS or internal owned tenants:
| |||
5 | Rollback |
No | Phase | Upgrade Tasks | Upgrade Duration | Downtime (min) | Doc Link | |
|---|---|---|---|---|---|---|
0 | Get release package | Check ESM release package |
| |||
1 | Upgrade Preparation | SMAX/HCMX:
|
| |||
UCMDB:
| / | |||||
Audit Service: | / | |||||
OO:
| 2mins | |||||
Aviator: Existing integration in BO portal status & results from agent & service portal | 2mins | |||||
OP: | N/A | N/A | ||||
Backup SaaS Farm including Aviator | ||||||
2 | Upgrade Aviator |
|
| 0 | 25.3.1 Aviator Upgrade | Aviator was upgraded through the pipeline this time because the maple-owned setup had a problem. |
3 | Upgrade Maintenance Window | Upgrade OMT | 13mins (including Prometheus) | SMAX: 0 CMS: 0 | Apply OMT Patch | |
Upgrade OP | N/A | 0 | N/A | |||
Upgrade SMAX/HCMX, UCMDB, Audit in parallel
|
| SMAX: 2mins UCMDB: 1mins Audit Services: 0 | Install ESM 25.3.1 Patch | |||
| Install UD/UCMDB 25.3.1 Patch | ||||||
| Install Audit 25.3.1 patch | ||||||
Upgrade OO: start after the SMAX/HCMX upgrade finishes (watch the dnd-upgrade-job-xxx and cgro-deploy-controller pods until they reach Completed, then start the OO upgrade) | 22mins | 0 | ||||
4 | Post-Upgrade | Aviator | / | N/A | / | / |
SMAX/HCMX post upgrade task:
| Upgrade yet to do in US7 simulation time. |
| ||||
SMAX OPB agent status check | 25.3.1 includes an OPB agent upgrade. Compare the OPB agent status with the check taken before the upgrade and ensure all live OPB agents are upgraded successfully to the new version and their connections stay alive | |||||
Upgrade external OO RAS | 5 minutes per external RAS (download & upgrade) | Upgrade External OO RAS - Service Management | Need to upgrade OO RAS or internal owned tenants:
| |||
5 | Rollback |
No | Phase | Upgrade Tasks | Upgrade Duration | Downtime (min) | Doc Link | |
|---|---|---|---|---|---|---|
0 | Get release package | Check ESM release package |
| |||
1 | Upgrade Preparation | SMAX/HCMX:
|
| |||
UCMDB:
| / | |||||
Audit Service: | / | |||||
OO:
| 2mins | |||||
Aviator: Existing integration in BO portal status & results from agent & service portal | 2mins | |||||
OP: | N/A | N/A | ||||
Backup SaaS Farm including Aviator | ||||||
2 | Upgrade Aviator |
|
| 0 | 25.3.2 Aviator Upgrade | |
3 | Upgrade Maintenance Window | Upgrade OMT | 14mins (including Prometheus) | SMAX: 0 CMS: 0 | Apply OMT Patch | |
Upgrade OP | 11 mins | 0 | Apply OP Patch | |||
Upgrade SMAX/HCMX, UCMDB, Audit in parallel
|
| SMAX: 2mins UCMDB: 1mins Audit Services: 0 | Install ESM 25.3.2 Patch | |||
| Install UD/UCMDB 25.3.2 Patch | ||||||
| Install Audit 25.3.2 patch | ||||||
Upgrade OO: start after the SMAX/HCMX upgrade finishes (watch the dnd-upgrade-job-xxx and cgro-deploy-controller pods until they reach Completed, then start the OO upgrade) | 26 mins | 0 | ||||
4 | Post-Upgrade | Aviator | 5mins | 0 | After the Aviator patch upgrade, verify that search works as expected for existing tenants. | / |
UCMDB post upgrade task: | ||||||
SMAX OPB agent status check | 25.3.2 includes an OPB agent upgrade. Compare the OPB agent status with the check taken before the upgrade and ensure all live OPB agents are upgraded successfully to the new version and their connections stay alive | |||||
Upgrade external OO RAS | 5 minutes per external RAS (download & upgrade) | Upgrade External OO RAS - Service Management | Need to upgrade OO RAS or internal owned tenants:
| |||
5 | Rollback |
No | Phase | Upgrade Tasks | Upgrade Duration | Downtime (min) | Doc Link | |
|---|---|---|---|---|---|---|
0 | Get release package | Check ESM release package | ||||
1 | Upgrade Preparation | SMAX/HCMX: |
| |||
UCMDB:
| / | |||||
Audit Service: | / | |||||
OO:
| mins | |||||
Aviator: Existing integration in BO portal status & results from agent & service portal | mins | |||||
OP: | N/A | N/A | ||||
Backup SaaS Farm including Aviator | ||||||
2 | Upgrade Aviator |
| mins | 0 | 25.2.2 Aviator Upgrade | Aviator was upgraded through the pipeline this time because the maple-owned setup had a problem. |
3 | Upgrade Maintenance Window | Upgrade OMT | mins (including Prometheus) | SMAX: 0 CMS: 0 | Upgrade OMT to 25.2.2 | |
Upgrade OP | N/A | 0 | N/A | |||
Upgrade SMAX/HCMX, UCMDB, Audit in parallel
|
| SMAX: 2mins UCMDB: 1mins Audit Services: 0 | Install ESM 25.2.2 Patch | |||
| Install UD/UCMDB 25.2.2 Patch | ||||||
| Install Audit 25.2.2 patch | ||||||
Upgrade OO: start after the SMAX/HCMX upgrade finishes (watch the dnd-upgrade-job-xxx and cgro-deploy-controller pods until they reach Completed, then start the OO upgrade) | mins | 0 | ||||
4 | Post-Upgrade | Reindex for Aviator tenants | mins | N/A | DocsMicrofocus | This duration is based on US7 simulation setup for one tenant |
SMAX/HCMX post upgrade task:
| Upgrade yet to do in US7 simulation time. |
| ||||
SMAX OPB agent status check | 25.2.2 includes an OPB agent upgrade. Compare the OPB agent status with the check taken before the upgrade and ensure all live OPB agents are upgraded successfully to the new version and their connections stay alive | |||||
Upgrade external OO RAS | minutes for one external RAS (download & upgrade) | Upgrade External OO RAS - Service Management | Need to upgrade OO RAS or internal owned tenants:
| |||
5 | Rollback |
| Deploy AC | 1. Make sure SMAX/HCMX runs in a Helm environment 2. Deploy AC | Deploy AC first, following the document on the right, before you enable it. | https://staging.docs.microfocus.com/itom/Automation_Center:Main/DeployAC
|
| Enable AC |
|
| |
2. Generate tenant key chain |
| ||
| 3. Enable Vulnerability & Remediation tenant settings |
| ||
| 4. Create tenant schemas for AC backend service |
|