# Auto-healing-1.0_686083903

## Introduction

This page specifies the auto-healing actions: which actions run, what triggers them, and how the farm is protected when they are triggered by false alarms.
## Types of healing

#### Scheduled healing

- Weekly: rolling restart of key deployments
- Weekly: Smart Analytics Content Compact
#### Event-triggered healing

- ALB 5xx alert: rolling restart of key deployments
- Database free memory alert: rolling restart of key deployments
- Smart Analytics Content data ratio (total doc/committed doc) alert: Smart Analytics Content Compact
- Tomcat HTTPS connector threads/max threads alert: rolling restart of specific deployments
- Httpclient InUse/Max alert: rolling restart of specific deployments
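
The alert-to-action pairs listed above can be read as a lookup table. The sketch below is only an illustration of that reading: the alert keys, the `Action` enum, and `action_for` are hypothetical names, and the real alert identifiers depend on the monitoring setup.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Healing actions named on this page."""
    ROLLING_RESTART_KEY = "rolling restart of key deployments"
    ROLLING_RESTART_SPECIFIC = "rolling restart of specific deployments"
    CONTENT_COMPACT = "Smart Analytics Content Compact"


# Hypothetical alert keys; only the alert-to-action pairing comes from the list above.
ALERT_ACTIONS = {
    "alb_5xx": Action.ROLLING_RESTART_KEY,
    "database_free_memory": Action.ROLLING_RESTART_KEY,
    "smart_analytics_content_data_ratio": Action.CONTENT_COMPACT,
    "tomcat_https_connector_threads": Action.ROLLING_RESTART_SPECIFIC,
    "httpclient_in_use": Action.ROLLING_RESTART_SPECIFIC,
}


def action_for(alert_key: str) -> Optional[Action]:
    """Return the healing action for an alert, or None if no healing is defined."""
    return ALERT_ACTIONS.get(alert_key)
```

Returning `None` for an unknown alert keeps the dispatcher from healing on alerts that this page does not cover.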
## Mechanism to survive false alarms

Auto-healing steps may be triggered by false alarms. To protect the farm in that case, always use actions that have no impact on availability or performance.

In other words, even if an auto-healing step is triggered by accident, it should not affect the availability or performance of the farm. The safeguards include, but are not limited to, the following:

- A job can be triggered at most once an hour
- When a restart is required, a rolling restart must be used
- If a job does not complete successfully, a notification is sent to the administrators
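
The three safeguards above could be combined roughly as follows. This is a minimal sketch under several assumptions, not the actual implementation: the once-an-hour limit is assumed to apply per job, a failed run is assumed to count toward the limit, and `HealingGuard`, `rolling_restart`, `notify`, `restart`, and `is_healthy` are all hypothetical names.

```python
import time
from typing import Callable, Dict, Iterable

MIN_INTERVAL_SECONDS = 3600  # "at most once an hour"


class HealingGuard:
    """Rate-limits healing jobs and reports failures to administrators."""

    def __init__(self, notify: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic):
        self._notify = notify          # hypothetical notification hook
        self._clock = clock            # injectable so the limit can be tested
        self._last_run: Dict[str, float] = {}

    def run(self, job_name: str, job: Callable[[], None]) -> bool:
        """Start the job unless it already ran within the last hour.

        Returns True if the job was started, False if it was skipped.
        """
        now = self._clock()
        last = self._last_run.get(job_name)
        if last is not None and now - last < MIN_INTERVAL_SECONDS:
            return False
        # Assumption: a failed run also counts toward the hourly limit.
        self._last_run[job_name] = now
        try:
            job()
        except Exception as exc:
            self._notify(f"Auto-healing job {job_name!r} failed: {exc}")
        return True


def rolling_restart(instances: Iterable[str],
                    restart: Callable[[str], None],
                    is_healthy: Callable[[str], bool]) -> None:
    """Restart instances one at a time, stopping at the first unhealthy one."""
    for instance in instances:
        restart(instance)
        # A real job would poll with a timeout; one check keeps the sketch short.
        if not is_healthy(instance):
            raise RuntimeError(f"{instance} is not healthy after restart")
```

Because `rolling_restart` raises on the first unhealthy instance, wrapping it in `HealingGuard.run` both stops the restart early and triggers the administrator notification.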
## Threshold

For the threshold values, refer to the numbers in the [monitoring](https://rndwiki.houston.softwaregrp.net/confluence/display/SMA/Monitoring) guide.
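
Several of the alerts on this page are expressed as ratios (for example threads/max threads or InUse/Max). As a rough illustration, such a check could look like the sketch below. The function name `ratio_exceeds` and the strict "greater than" comparison are assumptions, and the threshold is deliberately left as a parameter because the actual values live in the monitoring guide.

```python
def ratio_exceeds(numerator: float, denominator: float, threshold: float) -> bool:
    """Return True when numerator/denominator is strictly above the threshold.

    The threshold must be taken from the monitoring guide; no default is
    provided here on purpose.
    """
    if denominator <= 0:
        # Without a valid maximum, the ratio is undefined; do not raise an alert.
        return False
    return numerator / denominator > threshold
```

For example, `ratio_exceeds(in_use, max_connections, threshold)` would be called with the current pool usage and the threshold taken from the guide.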