nexus/knowledgebase/csd-wiki/ICSD/ITOM-SaaS-Pain-Points_686083998.md
2026-04-18 17:09:43 +08:00
# ITOM-SaaS-Pain-Points_686083998
## Introduction
This page presents all the pain points for SaaS delivery on the ITOM BU level.
## Legends
- SMAX
- HCMX
- FINOPS
- OO
- OPS
## Pain points
- Process
- OPS Don't wait until the retro; we need a system that can record such inputs.
- Quality
- Recent system crashes caused by the new features. (Incident ID, Feature ID?)
- SMAX SMAX platform pod loads are not well balanced. (24.2 may fix it)
- It's not...
- SMAX nativeSACM has high resource usage, especially network.
- SMAX SMAX still has OOTB index issues.
- SMAX Redis is a single point of failure and is not easy to debug.
- SMAX HCMX FINOPS OO Need more post-upgrade tests. / CMS post-upgrade issues are not acceptable.
- SMAX SLT task is weekly (unplanned changes are not submitted)
- After 2024.1, CMS quality is not good; there are many issues.
- SLA
- SMAX Missing monitoring metrics for CMS/Native SACM
- CMS not fully ready for auto-healing (rolling restart takes more than 1 minute)
- SMAX Missing correlations on several S1 / S2 alerts, for example, 5xx errors, and soft interrupts.
- 5xx errors: put errors into categories
- Soft interrupt: more metrics / diagnostics to get the detailed breakdown of the interrupt
- SMAX An overall throttling mechanism/rate limit is missing, which causes unexpected outages on the farm.
- 24.4?
- OO OO upgrade takes hours to finish; the OORAS pods can only be upgraded one after another.
- Solved in 24.3.
- Security
- OPS WAF rules have not been rolled out; the farm receives malicious requests every day.
- WIP
- OPS Major security KPI missing, including Qualys score, SIEM integration, etc.
- Compliance
- Missing the standard/certification for EU-managed
- Maintenance & Operation
- OPS Operation effort increases as more farms are added, including upgrades, patches, etc. (Automation rate is low.)
- OPS Monitoring needs to be improved: more meaningful alerts, fewer false alerts.
- Aligning the CPU alert threshold to 99.9%.
- OPS Troubleshooting takes lots of time.
- OPS Cannot always leverage Ops from other regions / how to develop Ops in other regions
- OPS Too many threads; all members need to be self-driven.
- OPS Need an option to better utilize Shen Wei's time
- SMAX Logging issue
- Accumulated logs cost more
- Too many logs slow down troubleshooting
- Too much log writing uses up the network throughput
- OO The tenant import feature cannot handle integrations like nativeSACM and OO.
- SMAX OO Too many special settings to keep the system stable, and many of them can be lost during upgrade.
- Cost
- SMAX HCMX FINOPS OO When customer usage increases, resource consumption doesn't increase linearly.
- SMAX HCMX FINOPS OO FinOps, SMAX, and OO consume lots of resources; CMS resource usage is OK.
- The sizing of HCMX and OO is based on the number of tenants instead of usage. For almost all ESM farms, OO needs the medium profile ($65K/y) or larger, which doesn't contribute any license revenue.
- FinOps cost is usually more than SMAX large profile ($113K/y)
- SMAX sizing is not helpful for medium or large customers. Usually the farm needs double or triple the resources required by the sizing guide.
- There is no sizing guide for integration, including API integration, nativeSACM, etc.
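
For the SLA item on the missing overall throttling mechanism/rate limit, one common approach is a token bucket. The sketch below only illustrates that idea in Python and is not how SMAX implements or plans to implement throttling. The class name `TokenBucket`, its parameters (`rate`, `capacity`, `cost`), and the example values are all hypothetical.

```python
import time


class TokenBucket:
    """Hypothetical limiter: sustained `rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.clock = clock               # injectable so the limiter can be tested
        self.tokens = float(capacity)    # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True if the request may proceed, False if it should be throttled."""
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    # Illustrative values only: 2 requests/second with a burst of 3.
    bucket = TokenBucket(rate=2, capacity=3)
    print([bucket.allow() for _ in range(5)])
```

A farm-wide limit would need shared state across pods (for example in a central store), which this single-process sketch does not cover.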