# ITOM-SaaS-Pain-Points_686083998

## Introduction

This page presents all the pain points for SaaS delivery at the ITOM BU level.

## Legends

- SMAX
- HCMX
- FINOPS
- OO
- OPS

## Pain points

- Process
  - OPS: Don’t wait until the retro; we need a system that can record such inputs.
- Quality
  - Recent system crashes were caused by new features. (Incident ID, Feature ID?)
  - SMAX: SMAX platform pod loads are not well balanced. (24.2 may fix it)
    - It's not...
  - SMAX: nativeSACM has high resource usage, especially network.
  - SMAX: SMAX still has OOTB index issues.
  - SMAX: Redis is a single point of failure and is not easy to debug.
  - SMAX, HCMX, FINOPS, OO: Need more post-upgrade tests. / The CMS post-upgrade issue is not acceptable.
  - SMAX: SLT task is weekly (unplanned change is not submitted)
  - After 2024.1, CMS’s quality is not good. So many issues.
- SLA
  - SMAX: Missing monitoring metrics for CMS/Native SACM
  - CMS is not fully ready for auto-healing (rolling restart takes more than 1 minute)
  - SMAX: Missing correlations on several S1 / S2 alerts, for example 5xx errors and soft interrupts.
    - 5xx errors: put errors into categories
    - Soft interrupt: more metrics / diagnostics to get a detailed breakdown of the interrupt
  - SMAX: There is no overall throttling mechanism/rate limit, which causes unexpected outages on the farm.
    - 24.4?
  - OO: The OO upgrade takes hours to finish; the OORAS pods can only be upgraded one after another.
    - Solved in 24.3.
- Security
  - OPS: WAF rules have not been rolled out; the farm receives malicious requests every day.
    - WIP
  - OPS: Major security KPIs are missing, including Qualys score, SIEM integration, etc.
- Compliance
  - Missing the standard/certification for EU-managed
- Maintenance & Operation
  - OPS: Operation effort increases with the number of farms, including upgrades, patches, etc. (Automation rate is low.)
  - OPS: Monitoring needs to be improved. More meaningful alerts, fewer false alerts.
    - Aligning the CPU alert threshold to 99.9%.
  - OPS: Troubleshooting takes a lot of time.
  - OPS: Cannot always leverage Ops from other regions / how to develop Ops from other regions
  - OPS: Too many threads; all members need to be self-driven.
  - OPS: Need an option to better utilize Shen Wei’s time
  - SMAX: Logging issue
    - Accumulated logs cost more
    - Too many logs slow down troubleshooting
    - Too much log writing uses up the network throughput
  - OO: The tenant import feature cannot handle integrations like nativeSACM and OO.
  - SMAX, OO: Too many special settings are needed to keep the system stable, and many of them can be lost during an upgrade.
- Cost
  - SMAX, HCMX, FINOPS, OO: When customer usage increases, resource usage does not increase linearly.
  - SMAX, HCMX, FINOPS, OO: FinOps, SMAX, and OO consume lots of resources; CMS resource usage is OK.
    - The sizing of HCMX and OO is based on tenant number instead of usage. For almost all ESM farms, OO needs to be medium profile ($65K/y) or even larger, which doesn't contribute any license revenue.
    - FinOps cost is usually more than SMAX large profile ($113K/y)
    - SMAX sizing is not helpful for medium or large customers. Usually the farm needs double or triple the resources required by the sizing guide.
    - There is no sizing guide for integration, including API integration, nativeSACM, etc.