Scheduled scaling

Introduction

This page describes scheduled scaling: the background, the concept, the procedure, and the node settings for each sizing profile.

Background

Outages often occur because the farm reaches its capacity during peak time.

Until auto scaling is available, we can use scheduled scaling instead:

Concept

Start up a new worker node group before the busy time begins, and shut it down after the busy time ends. The busy time is usually 1/3 to 1/2 of the day, so even if we add 3-9 workers for the peak time, the cost is equivalent to running only about 1-4 workers all day, or even less, which is quite affordable.
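The cost estimate above can be checked with a short calculation. This is a minimal sketch, assuming the cost of a worker is proportional to the hours it runs (for example on-demand pricing); the function name is hypothetical and not part of any tooling on this page.

```python
from fractions import Fraction


def average_extra_workers(extra_workers: int, busy_fraction: Fraction) -> Fraction:
    """Number of always-on workers that would cost the same as running
    `extra_workers` for only `busy_fraction` of each day.

    Assumption: cost is linear in running hours.
    """
    if not 0 <= busy_fraction <= 1:
        raise ValueError("busy_fraction must be between 0 and 1")
    return extra_workers * busy_fraction


# The two ends of the range quoted above.
low = average_extra_workers(3, Fraction(1, 3))   # 3 workers for 1/3 of the day
high = average_extra_workers(9, Fraction(1, 2))  # 9 workers for 1/2 of the day
print(float(low), float(high))  # prints: 1.0 4.5
```

Under this assumption the upper end comes to 4.5, slightly above the 1-4 quoted, so the estimate holds as a rough figure.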

Based on previous experience, this approach is more stable than auto scaling for two reasons:

It scales up and down only once per day, so there are fewer interruptions.
It scales down during non-peak hours, which has less impact on the farm.

Details

Procedure

Plan for the schedule: scale up before the peak hours begin / scale down after the peak hours end.
1. Peak hours: for example, 7-18 for EU8, 8-19 for BR14, etc.
Setup
1. Add an additional worker node group for the new pods
2. Add a new k8s cronjob for the scheduled scaling
Detailed actions during scale up
1. Increase workers
2. Add replicas to the scalable deployments
3. Rolling restart deployments
Detailed actions during scale down
1. Scale in replicas
2. Drain and decrease workers
3. Rolling restart deployments
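
The schedule and the order of actions above can be sketched as follows. This is an illustration only, not the actual cronjob script. Assumptions: the peak hours are given in the time zone the cronjob is evaluated in, the scaling runs one hour before and after the peak window (the lead time is a made-up default), and the worker node group is resized with eksctl (the page does not say which tool is used). The function names, and the cluster, node group, deployment and node names passed in, are hypothetical. Namespace flags are omitted.

```python
from typing import Dict, List


def cron_schedules(peak_start: int, peak_end: int, lead_hours: int = 1) -> Dict[str, str]:
    """Cron expressions that scale up `lead_hours` before the peak begins
    and scale down `lead_hours` after it ends (hours 0-23, same day)."""
    if not 0 <= peak_start < peak_end <= 23:
        raise ValueError("expected 0 <= peak_start < peak_end <= 23")
    up, down = peak_start - lead_hours, peak_end + lead_hours
    if up < 0 or down > 23:
        raise ValueError("lead time pushes the schedule across midnight")
    return {"scale_up": f"0 {up} * * *", "scale_down": f"0 {down} * * *"}


def resize_nodegroup(cluster: str, nodegroup: str, nodes: int) -> str:
    # Assumption: the node group is managed with eksctl.
    return f"eksctl scale nodegroup --cluster={cluster} --name={nodegroup} --nodes={nodes}"


def scale_up_commands(cluster: str, nodegroup: str, nodes: int,
                      deployments: Dict[str, int]) -> List[str]:
    """Increase workers, then add replicas, then rolling restart."""
    cmds = [resize_nodegroup(cluster, nodegroup, nodes)]
    cmds += [f"kubectl scale deployment {name} --replicas={n}"
             for name, n in deployments.items()]
    cmds += [f"kubectl rollout restart deployment {name}" for name in deployments]
    return cmds


def scale_down_commands(cluster: str, nodegroup: str, nodes: int,
                        deployments: Dict[str, int],
                        drain_nodes: List[str]) -> List[str]:
    """Scale in replicas, then drain and decrease workers, then rolling restart."""
    cmds = [f"kubectl scale deployment {name} --replicas={n}"
            for name, n in deployments.items()]
    cmds += [f"kubectl drain {node} --ignore-daemonsets --delete-emptydir-data"
             for node in drain_nodes]
    cmds.append(resize_nodegroup(cluster, nodegroup, nodes))
    cmds += [f"kubectl rollout restart deployment {name}" for name in deployments]
    return cmds


# The EU8 example window (7-18) with the assumed one-hour lead time.
print(cron_schedules(7, 18))  # {'scale_up': '0 6 * * *', 'scale_down': '0 19 * * *'}
```

The functions only build the command lines, so the sequence can be reviewed before anything is run against a cluster.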

Node number settings for the different sizing profiles (instance types: r5.xlarge, r6i.xlarge, etc.):

Number of concurrent users	Max nodes of group sma-autoscaling-nodes	Max nodes of non-autoscaling node group
3000	10	6
1000	6	4
Others	Adjust the number based on the node resource usage monitoring	Adjust the number based on the node resource usage monitoring
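
For scripting, the table can be kept as a small lookup. This is a minimal sketch: the constant and function names are hypothetical, and profiles not listed in the table return None because their numbers have to come from node resource usage monitoring.

```python
from typing import Optional, Tuple

# concurrent users -> (max nodes of sma-autoscaling-nodes,
#                      max nodes of the non-autoscaling node group)
SIZING = {
    3000: (10, 6),
    1000: (6, 4),
}


def max_nodes(concurrent_users: int) -> Optional[Tuple[int, int]]:
    """Return the node limits for a listed sizing profile, or None for
    "Others", where the numbers are adjusted based on monitoring."""
    return SIZING.get(concurrent_users)
```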

Reference

SMA Autoscaling.
