Scheduled scaling

Introduction

This page describes scheduled scaling: why it is needed, how it works, and the procedure for setting it up.

Background

Outages usually occur when the farm reaches its capacity during peak hours.

Until auto-scaling is available, we can use scheduled scaling instead.

Concept

Start a new worker node group before the busy period begins, and shut it down after the busy period ends. The busy period usually covers 1/3 to 1/2 of the day, so even if we add 3-9 workers for the peak, the cost is equivalent to running only about 1-4 workers full time, or even less, which is quite affordable.
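
The cost estimate above is simple proportional arithmetic. The following is a minimal sketch, assuming cost scales linearly with worker-hours (node start-up time and any fixed charges are ignored); the function name `full_time_equivalent` is hypothetical.

```python
def full_time_equivalent(added_workers: int, peak_hours: float) -> float:
    """Average number of extra workers paid for over a 24-hour day.

    Assumes cost is proportional to worker-hours (an assumption, not a
    billing rule taken from this page).
    """
    if not 0 <= peak_hours <= 24:
        raise ValueError("peak_hours must be between 0 and 24")
    return added_workers * peak_hours / 24


low = full_time_equivalent(3, 8)    # 3 workers for 1/3 of the day
high = full_time_equivalent(9, 12)  # 9 workers for 1/2 of the day
print(low, high)  # prints: 1.0 4.5
```

The two bounds, 1.0 and 4.5, are in line with the rough "1-4 workers" figure quoted above.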

In our experience, this approach is more stable than auto-scaling, for two reasons:

  1. It performs only one scale-up and one scale-down per day, so there are fewer interruptions.
  2. It scales down during off-peak hours, when the impact on the farm is smaller.

Details

Procedure

  1. Plan the schedule: scale up before the peak hours begin and scale down after they end.
    1. Peak hours: for example, 7-18 for EU8, 8-19 for BR14, etc.
  2. Setup
    1. Add an additional worker node group for the new pods.
    2. Add a new k8s CronJob for the scheduled scaling.
  3. Detailed actions during scale-up
    1. Increase workers.
    2. Add replicas to the scalable deployments.
    3. Rolling-restart the deployments.
  4. Detailed actions during scale-down
    1. Scale in replicas.
    2. Drain and decrease workers.
    3. Rolling-restart the deployments.
  5. Node counts for the different sizing profiles (instance types: r5.xlarge, r6i.xlarge, etc.):

    | Number of concurrent users | Max nodes of group sma-autoscaling-nodes | Max nodes of non-autoscaling node group |
    |---|---|---|
    | 3000 | 10 | 6 |
    | 1000 | 6 | 4 |
    | Others | Adjust based on node resource usage monitoring | Adjust based on node resource usage monitoring |
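
The planning step and the node limits above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `cron_schedule` and `max_nodes` are hypothetical, the one-hour margin before and after the peak is an assumed buffer rather than a value from this page, and the hours are interpreted in whatever time zone the CronJob uses.

```python
from typing import Optional

# Max nodes per sizing profile, taken from the table above:
# concurrent users -> (sma-autoscaling-nodes, non-autoscaling node group)
MAX_NODES = {3000: (10, 6), 1000: (6, 4)}


def cron_schedule(peak_start: int, peak_end: int, margin_hours: int = 1) -> dict:
    """Build daily cron expressions for scale-up and scale-down.

    Scale-up fires margin_hours before the peak starts; scale-down fires
    margin_hours after it ends. margin_hours is an assumed buffer.
    """
    if not 0 <= peak_start < peak_end <= 23:
        raise ValueError("expected 0 <= peak_start < peak_end <= 23")
    up_hour = (peak_start - margin_hours) % 24
    down_hour = (peak_end + margin_hours) % 24
    # Cron field order: minute hour day-of-month month day-of-week
    return {
        "scale_up": f"0 {up_hour} * * *",
        "scale_down": f"0 {down_hour} * * *",
    }


def max_nodes(concurrent_users: int) -> Optional[tuple]:
    """Return the node limits for a known profile, or None for 'Others',
    where the numbers are tuned from node resource usage monitoring."""
    return MAX_NODES.get(concurrent_users)


print(cron_schedule(7, 18))  # EU8 example peak hours
print(max_nodes(3000))
```

For the EU8 example (peak hours 7-18), this yields a scale-up at 06:00 and a scale-down at 19:00 each day; the resulting expressions can be used as the `schedule` of the k8s CronJob created during setup.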

Reference

  1. SMA Autoscaling.