EKS-upgrade-from-version-1.30-to-1.31_706832607
Introduction
This page describes the steps for upgrading the EKS cluster of ESM in the SaaS environment from version 1.30 to 1.31.
Reference resources: https://rndwiki.houston.softwaregrp.net/confluence/pages/viewpage.action?spaceKey=SMA&title=How%20to%20upgrade%20EKS%20in%20SaaS
The process has 3 main parts: 1. Upgrading the add-ons; 2. Upgrading the EKS cluster; 3. Upgrading the EKS worker node groups.
1. Upgrading the add-ons
The add-ons coredns, vpc-cni and kube-proxy need to be upgraded before driving the EKS upgrade. Here are the referenced instructions:
https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html
https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html
https://docs.aws.amazon.com/eks/latest/userguide/managing-kube-proxy.html
1.1. Upgrading the coredns add-on
Open the following AWS page: https://docs.aws.amazon.com/eks/latest/userguide/coredns-add-on-self-managed-update.html.
1.1.1. On the bastion's CLI, confirm that the self-managed type of the add-on is installed on your cluster. Replace my-cluster with the name of your cluster.
aws eks describe-addon --cluster-name my-cluster --addon-name coredns --query addon.addonVersion --output text
e.g. aws eks describe-addon --cluster-name us2-dev-eks-cluster --addon-name coredns --query addon.addonVersion --output text
If an error message is returned, you have the self-managed type of the add-on installed on your cluster.
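The same check is repeated for all three add-ons, so it can be wrapped in a small helper. This is a minimal sketch: `addon_type` is a hypothetical name, not part of the AWS CLI, and it treats any failure of `describe-addon` as "self-managed". An expired session or a misspelled cluster name would therefore be misreported, so verify credentials and the cluster name first.

```shell
# addon_type CLUSTER ADDON  (hypothetical helper)
# Prints "eks-managed" when `aws eks describe-addon` succeeds and
# "self-managed" when it fails.
# Assumption: the call fails only because the add-on is not EKS-managed.
addon_type() {
  if aws eks describe-addon --cluster-name "$1" --addon-name "$2" \
       --query addon.addonVersion --output text >/dev/null 2>&1; then
    echo "eks-managed"
  else
    echo "self-managed"
  fi
}
```

Example call: `addon_type us2-dev-eks-cluster coredns`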
1.1.2. Check the version of the container image that is currently installed on the cluster.
kubectl describe deployment coredns -n kube-system | grep Image | cut -d ":" -f 3
1.1.3. Check the full CoreDNS image reference (registry, repository and tag) currently in use:
kubectl describe deployment coredns -n kube-system | grep Image
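The `grep | cut` pipeline depends on the layout of the `describe` output. As an alternative sketch, the tag can be taken from the image reference itself. `image_tag` is a hypothetical helper, and it assumes the reference ends in a tag rather than a digest.

```shell
# image_tag IMAGE_REF  (hypothetical helper)
# Prints everything after the last ":" of an image reference,
# i.e. the tag when the reference has the form registry/repo:tag.
image_tag() {
  printf '%s\n' "${1##*:}"
}
```

It could be fed from `kubectl get deployment coredns -n kube-system -o jsonpath='{.spec.template.spec.containers[0].image}'`, assuming coredns is the first container of the deployment.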
1.1.4. Since the upgrade is made to CoreDNS v1.11.4-eksbuild.14, add the endpointslices permission to the system:coredns Kubernetes clusterrole.
kubectl edit clusterrole system:coredns -n kube-system
Add the following lines under the existing permissions lines in the rules section of the file.
[...]
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - list
  - watch
[...]
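After saving the edit, it may be worth confirming that the rule is present. The sketch below is only a plain text match on the manifest: `has_endpointslices` is a hypothetical helper and does not check which verbs or API groups the rule grants.

```shell
# has_endpointslices  (hypothetical helper)
# Reads a ClusterRole manifest on stdin and succeeds if it mentions
# the endpointslices resource anywhere in the text.
has_endpointslices() {
  grep -q 'endpointslices'
}
```

Example: `kubectl get clusterrole system:coredns -o yaml | has_endpointslices && echo "rule present"`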
1.1.5. Update the CoreDNS - replace just the region and the image version:
kubectl set image deployment.apps/coredns -n kube-system coredns=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/coredns:v1.11.4-eksbuild.14
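Because only the region and the version change between farms, the image address can be assembled from its parts to avoid copy-paste slips. This is a sketch: `eks_image_uri` is a hypothetical helper, and the registry account 602401143452 is taken from the command above; some AWS regions use a different account ID, so check the AWS documentation for the target region.

```shell
# eks_image_uri REGION COMPONENT TAG  (hypothetical helper)
# Assumption: registry account 602401143452, as used on this page.
eks_image_uri() {
  printf '%s.dkr.ecr.%s.amazonaws.com/eks/%s:%s\n' 602401143452 "$1" "$2" "$3"
}
```

Example: `kubectl set image deployment.apps/coredns -n kube-system coredns="$(eks_image_uri us-west-2 coredns v1.11.4-eksbuild.14)"`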
1.1.6. Check the pods in the kube-system namespace and the add-on version now installed:
kubectl get pods -n kube-system
kubectl describe deployment coredns -n kube-system | grep Image | cut -d ":" -f 3
1.2. Upgrading the vpc-cni add-on
Open the following AWS page: https://docs.aws.amazon.com/eks/latest/userguide/vpc-add-on-self-managed-update.html
1.2.1. Confirm that the Amazon EKS type of the add-on is not installed on the cluster. Replace my-cluster with the name of your cluster.
aws eks describe-addon --cluster-name my-cluster --addon-name vpc-cni --query addon.addonVersion --output text
e.g. aws eks describe-addon --cluster-name us2-dev-eks-cluster --addon-name vpc-cni --query addon.addonVersion --output text
If an error message is returned, the Amazon EKS type of the add-on is not installed on the cluster.
1.2.2. Check the version of the container image that is currently installed on the cluster.
kubectl describe daemonset aws-node --namespace kube-system | grep amazon-k8s-cni: | cut -d: -f 3
1.2.3. Navigate to /opt/25.2 and back up the current settings so that the same settings can be configured once the version is updated:
cd /opt/25.2/
kubectl get daemonset aws-node -n kube-system -o yaml > aws-k8s-cni-old.yaml
cat aws-k8s-cni-old.yaml
1.2.4. Check the latest available version table on the page: https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html#vpc-cni-latest-available-version. For this upgrade the version is v1.19.5-eksbuild.3.
1.2.5. Create a folder for the EKS upgrade and download the vpc-cni manifest file in it:
mkdir eks_upgrade_1.31
cd eks_upgrade_1.31/
curl -O https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.19.5/config/master/aws-k8s-cni.yaml
1.2.6. If needed, edit the manifest to carry over the custom settings saved in step 1.2.3, then apply it to the cluster:
kubectl apply -f aws-k8s-cni.yaml
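Before applying, the pending changes can be previewed with `kubectl diff`, which exits with 0 when nothing would change, 1 when differences exist, and a higher code on error. The wrapper below is a sketch; `preview_manifest` is a hypothetical name.

```shell
# preview_manifest FILE  (hypothetical helper)
# Runs `kubectl diff` on FILE and summarises its exit code.
preview_manifest() {
  rc=0
  kubectl diff -f "$1" || rc=$?
  case "$rc" in
    0) echo "no changes" ;;
    1) echo "changes pending" ;;
    *) echo "diff failed"; return "$rc" ;;
  esac
}
```

Example: `preview_manifest aws-k8s-cni.yaml`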
1.2.7. Check the pods in the kube-system namespace and the add-on version now installed:
watch 'kubectl get pods -n kube-system '
kubectl describe daemonset aws-node --namespace kube-system | grep amazon-k8s-cni: | cut -d: -f 3
1.2.8. Since custom networking (non-routable CIDR) is enabled on this farm, re-enable it after updating the VPC CNI plugin:
kubectl set env daemonset aws-node -n kube-system AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true
Then check the pods again:
watch 'kubectl get pods -n kube-system '
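To confirm that the setting took effect, the daemonset environment can be listed with `kubectl set env ... --list` and filtered for the variable. A minimal sketch, where `env_value` is a hypothetical helper that expects lines of the form NAME=value on stdin; if several containers define the variable, it prints one line per match.

```shell
# env_value NAME  (hypothetical helper)
# Reads NAME=value lines on stdin and prints the value of NAME.
env_value() {
  sed -n "s/^$1=//p"
}
```

Example: `kubectl set env daemonset/aws-node -n kube-system --list | env_value AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG`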
1.3. Upgrading the kube-proxy add-on
Open the following AWS page: https://docs.aws.amazon.com/eks/latest/userguide/kube-proxy-add-on-self-managed-update.html
1.3.1. Check that the self-managed type of the add-on is installed on the cluster. Replace my-cluster with the name of your cluster.
aws eks describe-addon --cluster-name my-cluster --addon-name kube-proxy --query addon.addonVersion --output text
e.g. aws eks describe-addon --cluster-name us2-dev-eks-cluster --addon-name kube-proxy --query addon.addonVersion --output text
If an error message is returned, then the self-managed type of the add-on is installed on your cluster.
1.3.2. Check the version of the container image that is currently installed on the cluster.
kubectl describe daemonset kube-proxy -n kube-system | grep Image
1.3.3. Update the kube-proxy add-on using the minimal version:
kubectl set image daemonset.apps/kube-proxy -n kube-system kube-proxy=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.31.9-minimal-eksbuild.2
1.3.4. Check that the new version is now installed on the cluster.
watch 'kubectl get pods -n kube-system'
kubectl get pods -n kube-system | grep kube-proxy
kubectl describe daemonset kube-proxy -n kube-system | grep Image | cut -d ":" -f 3
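As a non-interactive alternative to `watch`, `kubectl rollout status` blocks until the rollout finishes or the timeout expires. The wrapper is a sketch: `wait_for_rollout` is a hypothetical name and the 5 minute default timeout is an assumption, not a value from this procedure.

```shell
# wait_for_rollout RESOURCE [TIMEOUT]  (hypothetical helper)
# Waits for a kube-system workload such as daemonset/kube-proxy.
# Assumption: 5m is a reasonable default timeout for this cluster.
wait_for_rollout() {
  kubectl rollout status "$1" -n kube-system --timeout="${2:-5m}"
}
```

Examples: `wait_for_rollout daemonset/kube-proxy`, `wait_for_rollout daemonset/aws-node 10m`, `wait_for_rollout deployment/coredns`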
2. Upgrading the EKS cluster
Log in to the AWS console, go to the EKS service, click "Update now" and choose the target version (1.31 in this case). Click "Update" and wait until the upgrade completes, which takes roughly 15 to 45 minutes.
Once the EKS cluster is upgraded to the new version, upgrade the worker nodes to the new version accordingly.
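Before moving on to the node groups, the control plane version and status can also be confirmed from the bastion. A minimal sketch using `aws eks describe-cluster`; `cluster_at_version` is a hypothetical helper name.

```shell
# cluster_at_version CLUSTER VERSION  (hypothetical helper)
# Prints the reported version and status, and succeeds only if the
# control plane is at VERSION and its status is ACTIVE.
cluster_at_version() {
  version=$(aws eks describe-cluster --name "$1" --query cluster.version --output text)
  status=$(aws eks describe-cluster --name "$1" --query cluster.status --output text)
  echo "version=$version status=$status"
  [ "$version" = "$2" ] && [ "$status" = "ACTIVE" ]
}
```

Example: `cluster_at_version us2-dev-eks-cluster 1.31 && echo "ready for node group upgrade"`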
3. Upgrading the EKS worker node groups
Open the following AWS page: https://docs.aws.amazon.com/eks/latest/userguide/update-workers.html
3.1. Create a dedicated location on the Linux bastion for the EKS node groups upgrade
3.2. Download the scripts from this location: https://rndwiki.houston.softwaregrp.net/confluence/pages/viewpageattachments.action?pageId=1309586390&metadataLink=true
3.3. If the new node groups are prepared on a different day from the actual node group upgrade, make sure they are created with a desired size of 0 by commenting out the last line of the script:
# aws eks update-nodegroup-config --cluster-name $eks_name --nodegroup-name $old_nodegroup_name-workernodes-1-$eks_version --scaling-config minSize=$min_size,maxSize=$max_size,desiredSize=$desired_size 2>&1 >/dev/null
3.4. Run the node group creation script create-eks-worker.sh:
sh ./create-eks-worker.sh
If the script fails because of Windows-style (CRLF) line endings, convert it with the command below and re-run it:
dos2unix create-eks-worker.sh
3.5. If not all the labels are created on each node group, use the script tag_ASG.sh to tag them:
sh ./tag_ASG.sh
3.6. If one node is overloaded with pods, evict the pods from that node by tainting it:
kubectl taint nodes ${currentNodeName} podReScheduler=value:NoExecute
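A NoExecute taint stays on the node until it is removed, so pods without a matching toleration cannot run there in the meantime. If the node should accept pods again afterwards, the taint can be removed by repeating the taint expression with a trailing "-". This is a sketch; `untaint_node` is a hypothetical helper and this step is not part of the original procedure.

```shell
# untaint_node NODE  (hypothetical helper)
# Removes the podReScheduler NoExecute taint from NODE.
untaint_node() {
  kubectl taint nodes "$1" podReScheduler=value:NoExecute-
}
```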
3.7. Scale up the new node group to the desired size
AWS UI > EKS > > Compute > > Edit >
3.8. Taint the old worker nodes by running the following lines:
nodes=$(kubectl get nodes | grep -i v1.30 | awk '{print $1}')
for node in $nodes
do
  kubectl taint nodes ${node} podReScheduler=value:NoSchedule
done
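The `grep -i v1.30` selection matches the pattern anywhere in the line, and the dot matches any character. A stricter selection can compare only the kubelet version column against a literal prefix. This is a sketch; `nodes_on_version` is a hypothetical helper.

```shell
# nodes_on_version PREFIX  (hypothetical helper)
# Reads "NAME KUBELET_VERSION" lines on stdin and prints the names
# whose version starts with the literal string PREFIX.
#
# Possible input:
#   kubectl get nodes --no-headers \
#     -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion
nodes_on_version() {
  awk -v p="$1" 'index($2, p) == 1 { print $1 }'
}
```

Passing the prefix with a trailing dot (`nodes_on_version v1.30.`) keeps a version such as v1.300 from matching.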
3.9. Check whether any pods are still running on worker nodes of the previous version (e.g. 1.30) by running the following lines:
nodes=$(kubectl get nodes | grep -i v1.30 | awk '{print $1}')
for node in $nodes
do
  kubectl get po -o wide -A | grep -i $node | grep -v 'aws-node-\|kube-proxy-\|ebs-csi-node\|twistlock-defender\|itom-prometheus-node-exporter-\|itom-throttling-controller\|Completed' | awk '{print $1,$2}'
done
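Instead of grepping the full pod list for each node name, the pods of one node can be selected with `--field-selector spec.nodeName=...` and then passed through the same exclusion list. A minimal sketch; `movable_pods` is a hypothetical helper and the exclusion list is copied from the command above.

```shell
# movable_pods  (hypothetical helper)
# Reads `kubectl get pods -A -o wide --no-headers` output on stdin,
# drops per-node daemonset pods and Completed pods, and prints
# "NAMESPACE NAME" for the remaining pods.
movable_pods() {
  grep -Ev 'aws-node-|kube-proxy-|ebs-csi-node|twistlock-defender|itom-prometheus-node-exporter-|itom-throttling-controller|Completed' |
    awk '{ print $1, $2 }'
}
```

Example: `kubectl get pods -A -o wide --no-headers --field-selector spec.nodeName="$node" | movable_pods`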
3.10. If pods are still running on 1.30 nodes, restart them with the script rollingMigratePodsByNamespace.sh, but only in small namespaces such as audit, core, kube-system, cert-manager and velero:
./rollingMigratePodsByNamespace.sh ..
nohup sh rollingMigratePodsByNamespace.sh audit core kube-system &
e.g.
./rollingMigratePodsByNamespace.sh cert-manager kube-system monitoring velero
Note: It is not safe to run the script on big namespaces like itsma, core or monitoring.
3.11. Manually restart the pods in the itsma, core and monitoring namespaces by deleting them, for example:
kubectl delete pod itom-toolkit-6c5f5745b-cfzqx -n itsma-ohs8f
kubectl delete pod filebeat-drxl5 -n logging
kubectl delete pod suite-conf-pod-itsma-6854dd8f74-5c9dm -n core
3.12. Repeat the check from step 3.9.
3.13. Terminate and delete the worker nodes of the old version (e.g. 1.30).
AWS UI > EKS > > Compute > > Delete.
3.14. Once all the old worker nodes are terminated, install the Qualys agents on the new worker nodes, except for US24-PROD, by using the install_qualys_agent.sh script:
sh install_qualys_agent.sh
e.g. sh install_qualys_agent.sh us6-prod
3.15. SSH to one of the new worker nodes and check that Qualys is installed:
ssh -i worknodes.pem ec2-user@ip-10-210-96-76.us-west-2.compute.internal
service qualys-cloud-agent status
exit

