nexus/knowledgebase/csd-wiki/ICSD/EKS-upgrade-from-version-1.30-to-1.31_706832607.md
2026-04-18 17:09:43 +08:00


# EKS-upgrade-from-version-1.30-to-1.31_706832607
## Introduction
This page describes the steps for upgrading the EKS cluster of ESM in the SaaS environment from version 1.30 to 1.31.
Reference resources: [https://rndwiki.houston.softwaregrp.net/confluence/pages/viewpage.action?spaceKey=SMA&title=How%20to%20upgrade%20EKS%20in%20SaaS](https://rndwiki.houston.softwaregrp.net/confluence/pages/viewpage.action?spaceKey=SMA&title=How%20to%20upgrade%20EKS%20in%20SaaS)
The process has 3 main parts: 1. Upgrading the add-ons; 2. Upgrading the EKS cluster; 3. Upgrading the EKS worker node groups.
## 1\. Upgrading the add-ons
The **coredns**, **vpc-cni** and **kube-proxy** add-ons need to be upgraded before the EKS cluster upgrade. Reference instructions:
[https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html](https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html "https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html")
[https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html](https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html "https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html")
[https://docs.aws.amazon.com/eks/latest/userguide/managing-kube-proxy.html](https://docs.aws.amazon.com/eks/latest/userguide/managing-kube-proxy.html "https://docs.aws.amazon.com/eks/latest/userguide/managing-kube-proxy.html")
**1.1. Upgrading the *coredns* add-on**
Open the following Amazon reference page: [https://docs.aws.amazon.com/eks/latest/userguide/coredns-add-on-self-managed-update.html](https://docs.aws.amazon.com/eks/latest/userguide/coredns-add-on-self-managed-update.html).
1.1.1. **Confirm**, from the bastion's CLI, that the self-managed type of the add-on is installed on your cluster. Replace my-cluster with the name of your cluster.
aws eks describe-addon --cluster-name my-cluster --addon-name coredns --query addon.addonVersion --output text
e.g. aws eks describe-addon --cluster-name us2-dev-eks-cluster --addon-name coredns --query addon.addonVersion --output text
If an error message is returned, you have the self-managed type of the add-on installed on your cluster.
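
The same describe-addon check is repeated for each add-on in steps 1.1.1, 1.2.1 and 1.3.1, so it can be run in one pass. This is a minimal sketch that assumes the AWS CLI on the bastion is already configured for the right account and region. It follows the interpretation used in those steps: a failed describe-addon call means there is no EKS-managed add-on of that name. The function names addon_type and check_all_addons are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original procedure).
# Assumption (as in steps 1.1.1/1.2.1/1.3.1): "aws eks describe-addon" fails
# when no EKS-managed add-on with that name exists on the cluster.
addon_type() {
  local cluster="$1" addon="$2" version
  if version=$(aws eks describe-addon --cluster-name "$cluster" --addon-name "$addon" \
      --query addon.addonVersion --output text 2>/dev/null); then
    echo "eks-managed ($version)"
  else
    echo "self-managed"
  fi
}

# Run the check for the three add-ons handled in section 1.
check_all_addons() {
  local a
  for a in coredns vpc-cni kube-proxy; do
    printf '%s: %s\n' "$a" "$(addon_type "$1" "$a")"
  done
}
```

Usage, e.g.: check_all_addons us2-dev-eks-cluster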
1.1.2. **Check** the version of the container image that is currently installed on the cluster.
kubectl describe deployment coredns -n kube-system | grep Image | cut -d ":" -f 3
1.1.3. **Check** the full CoreDNS image reference, including the registry and region that step 1.1.5 reuses:
kubectl describe deployment coredns -n kube-system | grep Image
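
The describe | grep | cut pipeline depends on the layout of the describe output. As a sketch, the image can be read directly from the pod template with kubectl's JSONPath output and then reduced to its tag with shell parameter expansion. The helper names are hypothetical. The sketch assumes single-container workloads such as coredns and kube-proxy, and tag-based (not digest-based) image references.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original procedure).
# Print the image of the first container in a workload's pod template.
current_image() {
  # $1 = kind/name (e.g. deployment/coredns), $2 = namespace
  kubectl get "$1" -n "$2" -o jsonpath='{.spec.template.spec.containers[0].image}'
}

# Keep only the tag: strip everything up to and including the last ":".
# Assumes a tag-based reference, not an @sha256 digest.
image_tag() {
  printf '%s\n' "${1##*:}"
}
```

Usage, e.g.: image_tag "$(current_image deployment/coredns kube-system)"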
1.1.4. Because the target version is CoreDNS v1.11.4-eksbuild.14, **add** the endpointslices permission to the system:coredns Kubernetes ClusterRole:
kubectl edit clusterrole system:coredns -n kube-system
Add the following rule after the existing entries in the rules section of the file:
\[...\]
\- apiGroups:
  \- discovery.k8s.io
  resources:
  \- endpointslices
  verbs:
  \- list
  \- watch
\[...\]
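
As a non-interactive alternative to kubectl edit, the same rule can be appended with a JSON patch ("/rules/-" appends to the rules array). This is a sketch. The guard that skips the patch when endpointslices already appears in the role is an added design choice, and the function names are hypothetical. The verification step assumes CoreDNS runs as the kube-system/coredns service account.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original procedure).
# ClusterRoles are cluster-scoped, so no namespace flag is needed.
add_coredns_endpointslice_rule() {
  # Skip if the role already mentions endpointslices, so re-runs do not
  # append a duplicate rule.
  if kubectl get clusterrole system:coredns -o jsonpath='{.rules[*].resources}' | grep -q endpointslices; then
    echo "endpointslices rule already present"
    return 0
  fi
  kubectl patch clusterrole system:coredns --type=json -p \
    '[{"op":"add","path":"/rules/-","value":{"apiGroups":["discovery.k8s.io"],"resources":["endpointslices"],"verbs":["list","watch"]}}]'
}

# Check the permission by impersonating the (assumed) CoreDNS service account.
verify_coredns_endpointslice_access() {
  kubectl auth can-i list endpointslices.discovery.k8s.io \
    --as=system:serviceaccount:kube-system:coredns
}
```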
1.1.5. **Update** CoreDNS. In the image URI, change only the region and the image version:
kubectl set image deployment.apps/coredns -n kube-system coredns=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/coredns:v1.11.4-eksbuild.14
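
A small sketch that builds the image URI from a region and a tag, so only those two values change between farms. It then waits for the rollout with kubectl rollout status. The sketch assumes the farm's region is served by the 602401143452 registry account used above. Some AWS regions use a different account ID, so the hypothetical ACCOUNT_ID override should be checked against AWS's list of Amazon EKS container image registries. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original procedure).
# Assumption: the region uses registry account 602401143452 unless ACCOUNT_ID is set.
eks_image_uri() {
  # $1 = region, $2 = image name (coredns, kube-proxy), $3 = tag
  printf '%s.dkr.ecr.%s.amazonaws.com/eks/%s:%s\n' "${ACCOUNT_ID:-602401143452}" "$1" "$2" "$3"
}

# Set the new CoreDNS image, then block until the rollout finishes (or 5 minutes pass).
update_coredns() {
  # $1 = region, $2 = tag
  kubectl set image deployment.apps/coredns -n kube-system \
    "coredns=$(eks_image_uri "$1" coredns "$2")" &&
  kubectl rollout status deployment/coredns -n kube-system --timeout=300s
}
```

Usage, e.g.: update_coredns us-west-2 v1.11.4-eksbuild.14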
1.1.6. **Check** the pods in the kube-system namespace and the add-on version now installed:
kubectl get pods -n kube-system
kubectl describe deployment coredns -n kube-system | grep Image | cut -d ":" -f 3
**1.2. Upgrading the *vpc-cni* add-on**
Open the following Amazon reference page: [https://docs.aws.amazon.com/eks/latest/userguide/vpc-add-on-self-managed-update.html](https://docs.aws.amazon.com/eks/latest/userguide/vpc-add-on-self-managed-update.html)
1.2.1. **Confirm** that the Amazon EKS type of the add-on is not installed on the cluster. Replace my-cluster with the name of your cluster.
aws eks describe-addon --cluster-name my-cluster --addon-name vpc-cni --query addon.addonVersion --output text
e.g. aws eks describe-addon --cluster-name us2-dev-eks-cluster --addon-name vpc-cni --query addon.addonVersion --output text
If an error message is returned, the Amazon EKS type of the add-on is not installed on the cluster.
1.2.2. **Check** the version of the container image that is currently installed on the cluster.
kubectl describe daemonset aws-node --namespace kube-system | grep amazon-k8s-cni: | cut -d: -f 3
1.2.3. Navigate to /opt/25.2 and **back up** the current settings, so that the same settings can be reapplied once the version is updated:
cd /opt/25.2/
kubectl get daemonset aws-node -n kube-system -o yaml > aws-k8s-cni-old.yaml
cat aws-k8s-cni-old.yaml
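
A variant of this backup as a sketch: it timestamps the file so that repeated runs do not overwrite an earlier backup. It also saves the aws-node container environment as sorted NAME=VALUE lines, which are easier to compare after the update than the full YAML. The function name, file names and timestamp format are assumptions, not part of the original procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original procedure).
# Writes aws-k8s-cni-old-<stamp>.yaml and aws-node-env-old-<stamp>.txt into $1 (default: .)
backup_aws_node() {
  local dir="${1:-.}" stamp
  stamp=$(date +%Y%m%d-%H%M%S)
  kubectl get daemonset aws-node -n kube-system -o yaml > "$dir/aws-k8s-cni-old-$stamp.yaml" || return 1
  # One NAME=VALUE line per env var of the aws-node container, sorted for later diffing.
  kubectl get daemonset aws-node -n kube-system -o \
    jsonpath='{range .spec.template.spec.containers[?(@.name=="aws-node")].env[*]}{.name}={.value}{"\n"}{end}' \
    | sort > "$dir/aws-node-env-old-$stamp.txt"
  echo "$dir/aws-k8s-cni-old-$stamp.yaml"
}
```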
1.2.4. **Check** the latest available version in the table on the page [https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html#vpc-cni-latest-available-version](https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html#vpc-cni-latest-available-version). At the time of writing it was v1.19.5-eksbuild.3.
1.2.5. Create a folder for the EKS upgrade and **download** the vpc-cni manifest file into it:
mkdir eks\_upgrade\_1.31
cd eks\_upgrade\_1.31/
curl -O [https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.19.5/config/master/aws-k8s-cni.yaml](https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.19.5/config/master/aws-k8s-cni.yaml)
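
Before step 1.2.6, it can help to see which images the downloaded manifest references and what it would change on the live daemonset. The following is a sketch using kubectl diff, which exits 0 when there are no differences, 1 when differences exist and above 1 on error. The function name is hypothetical, and the tag argument is assumed to be the GitHub release tag (e.g. v1.19.5, without the -eksbuild suffix), as in the URL above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original procedure).
fetch_and_preview_cni() {
  local tag="$1" status
  curl -fsSLO "https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/${tag}/config/master/aws-k8s-cni.yaml" || return 1
  echo "Images referenced by the manifest:"
  grep 'image:' aws-k8s-cni.yaml
  # Show the server-side differences; exit code 1 only means "differences found".
  kubectl diff -f aws-k8s-cni.yaml
  status=$?
  [ "$status" -le 1 ]
}
```

In this diff, watch in particular for environment settings that the stock manifest would change (such as the custom networking setting handled in step 1.2.8).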
1.2.6. If needed, first update the manifest with the custom settings saved in step 1.2.3, then **apply** it to the cluster:
kubectl apply -f aws-k8s-cni.yaml
1.2.7. **Check** the pods in the kube-system namespace and the add-on version now installed:
watch 'kubectl get pods -n kube-system '
kubectl describe daemonset aws-node --namespace kube-system | grep amazon-k8s-cni: | cut -d: -f 3
1.2.8. Custom networking (non-routable CIDR) is enabled on this farm, and applying the stock manifest resets it, so **re-enable** it after updating the VPC CNI plugin:
kubectl set env daemonset aws-node -n kube-system AWS\_VPC\_K8S\_CNI\_CUSTOM\_NETWORK\_CFG=true
Then **check** the pods again:
watch 'kubectl get pods -n kube-system '
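
Custom networking may depend on more than this one variable, so it can be worth listing every aws-node setting that the new manifest dropped or changed compared with a pre-upgrade snapshot. This is a sketch with hypothetical function names. It assumes the snapshot was taken with cni_env_now before step 1.2.6 (e.g. cni_env_now > aws-node-env-old.txt), so both sides are sorted the same way, as comm requires.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original procedure).
# Current aws-node container environment as sorted NAME=VALUE lines.
cni_env_now() {
  kubectl get daemonset aws-node -n kube-system -o \
    jsonpath='{range .spec.template.spec.containers[?(@.name=="aws-node")].env[*]}{.name}={.value}{"\n"}{end}' | sort
}

# Lines present in the old snapshot but not in the current environment,
# i.e. settings that were removed or whose value changed.
cni_env_lost() {
  local old_snapshot="$1"
  cni_env_now | comm -23 "$old_snapshot" -
}
```

Usage, e.g.: cni_env_lost aws-node-env-old.txt (empty output means nothing from the snapshot was lost)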
**1.3. Upgrading the *kube-proxy* add-on**
Open the following Amazon reference page: [https://docs.aws.amazon.com/eks/latest/userguide/kube-proxy-add-on-self-managed-update.html](https://docs.aws.amazon.com/eks/latest/userguide/kube-proxy-add-on-self-managed-update.html)
1.3.1. **Check** that the self-managed type of the add-on is installed on the cluster. Replace my-cluster with the name of your cluster.
aws eks describe-addon --cluster-name my-cluster --addon-name kube-proxy --query addon.addonVersion --output text
e.g. aws eks describe-addon --cluster-name us2-dev-eks-cluster --addon-name kube-proxy --query addon.addonVersion --output text
If an error message is returned, then the self-managed type of the add-on is installed on your cluster.
1.3.2. **Check** the version of the container image that is currently installed on the cluster.
kubectl describe daemonset kube-proxy -n kube-system | grep Image
1.3.3. **Update** the kube-proxy add-on using the minimal image variant (adjust the region and version as in step 1.1.5):
kubectl set image daemonset.apps/kube-proxy -n kube-system kube-proxy=602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.31.9-minimal-eksbuild.2
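
kube-proxy images are built per Kubernetes minor version, so a guard against a mistyped tag can be useful. The sketch below refuses to set an image whose minor version differs from the target and then waits for the daemonset rollout. The function names are hypothetical. Note also that the upstream Kubernetes version skew policy states kube-proxy must not be newer than kube-apiserver. Because this step runs before the control plane is on 1.31, it may be worth confirming the ordering against AWS's self-managed kube-proxy page.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original procedure).
# "v1.31.9-minimal-eksbuild.2" -> "1.31"
tag_minor() {
  printf '%s\n' "$1" | sed -E 's/^v([0-9]+\.[0-9]+)\..*/\1/'
}

update_kube_proxy() {
  local image="$1" target="$2" minor   # full image URI; target minor, e.g. 1.31
  minor=$(tag_minor "${image##*:}")
  if [ "$minor" != "$target" ]; then
    echo "kube-proxy tag $minor does not match target $target" >&2
    return 1
  fi
  kubectl set image daemonset.apps/kube-proxy -n kube-system "kube-proxy=$image" &&
  kubectl rollout status daemonset/kube-proxy -n kube-system --timeout=300s
}
```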
1.3.4. **Check** that the new version is now installed on the cluster.
watch 'kubectl get pods -n kube-system'
kubectl get pods -n kube-system | grep kube-proxy
kubectl describe daemonset kube-proxy -n kube-system | grep Image | cut -d ":" -f 3
## 2\. Upgrading the EKS cluster
Log in to the AWS console, open the EKS service, select the cluster, click "Update now" and choose the target version (1.31 in this case). Click "Update" and wait until the upgrade completes, which takes about 15 to 45 minutes.
![](attachments/706832607/706832864.png)
![](attachments/706832607/706832865.png)
Once the EKS cluster has been upgraded, upgrade the worker node groups to the same version.
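
The console steps above can also be scripted with the AWS CLI, which makes the wait explicit. This is a sketch. It starts the control-plane upgrade with aws eks update-cluster-version and polls aws eks describe-update until the update reports Successful, Failed or Cancelled. The function name and the default 60-second poll interval are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original procedure).
upgrade_control_plane() {
  local cluster="$1" version="$2" interval="${3:-60}" update_id status
  update_id=$(aws eks update-cluster-version --name "$cluster" \
    --kubernetes-version "$version" --query update.id --output text) || return 1
  while :; do
    status=$(aws eks describe-update --name "$cluster" --update-id "$update_id" \
      --query update.status --output text) || return 1
    echo "$(date +%H:%M:%S) update $update_id: $status"
    case "$status" in
      Successful) return 0 ;;
      Failed|Cancelled) return 1 ;;
    esac
    sleep "$interval"
  done
}
```

Usage, e.g.: upgrade_control_plane us2-dev-eks-cluster 1.31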
## 3\. Upgrading the EKS worker node groups
Open the following Amazon reference page: [https://docs.aws.amazon.com/eks/latest/userguide/update-workers.html](https://docs.aws.amazon.com/eks/latest/userguide/update-workers.html)
3.1. **Create** a dedicated directory on the Linux bastion for the EKS node group upgrade.
3.2. **Download** the scripts from this location: [https://rndwiki.houston.softwaregrp.net/confluence/pages/viewpageattachments.action?pageId=1309586390&metadataLink=true](https://rndwiki.houston.softwaregrp.net/confluence/pages/viewpageattachments.action?pageId=1309586390&metadataLink=true)
3.3. If the new node groups are prepared on a different day from the actual upgrade, make sure they are created with a desired size of 0 by **commenting out** the last line of the script:
\# aws eks update-nodegroup-config --cluster-name $eks\_name --nodegroup-name $old\_nodegroup\_name-workernodes-1-$eks\_version --scaling-config minSize=$min\_size,maxSize=$max\_size,desiredSize=$desired\_size 2>&1 >/dev/null
3.4. **Run** the node group creation script [create-eks-worker.sh](attachments/706832607/709421232.sh):
sh ./create-eks-worker.sh
If the script fails because it has Windows (CRLF) line endings, **convert** it with the command below and re-run it:
dos2unix create-eks-worker.sh
3.5. If not all the labels were created on each node group, **tag** them with the [tag\_ASG.sh](attachments/706832607/709421233.sh) script:
sh ./tag\_ASG.sh
3.6. If one node is overloaded with pods, **evict** the pods from that node:
kubectl taint nodes ${currentNodeName} podReScheduler=value:NoExecute
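
A NoExecute taint keeps evicting pods for as long as it stays on the node. The step above does not say when it is removed. If the tainted node is one of the new nodes, it presumably has to be untainted once its pods have been rescheduled. As a sketch with hypothetical function names, the same taint is removed by repeating it with a trailing "-".

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original procedure).
evict_from_node() {
  kubectl taint nodes "$1" podReScheduler=value:NoExecute
}

# Remove the same taint (trailing "-") so the node can receive pods again.
untaint_node() {
  kubectl taint nodes "$1" podReScheduler=value:NoExecute-
}

# Count the pods currently bound to the node, to see when eviction has finished.
pods_on_node() {
  kubectl get pods -A --no-headers --field-selector "spec.nodeName=$1" | wc -l | tr -d ' '
}
```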
3.7. **Scale** up the new node groups to the desired size:
AWS UI > EKS > \<the cluster name\> > Compute > \<each worker node group\> > Edit, then set the desired size.
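
The same change can be made from the bastion with aws eks update-nodegroup-config, the command that the creation script's last line uses (step 3.3). This is a sketch with hypothetical function names. The node listing relies on the eks.amazonaws.com/nodegroup label that EKS puts on nodes of managed node groups.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original procedure).
# Prints the update ID returned by EKS.
scale_nodegroup() {
  # $1 = cluster, $2 = node group, $3 = min, $4 = max, $5 = desired
  aws eks update-nodegroup-config --cluster-name "$1" --nodegroup-name "$2" \
    --scaling-config "minSize=$3,maxSize=$4,desiredSize=$5" \
    --query update.id --output text
}

# List the nodes that belong to a managed node group.
nodegroup_nodes() {
  kubectl get nodes -l "eks.amazonaws.com/nodegroup=$1" -o wide
}
```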
3.8. **Taint** the old (e.g. 1.30) worker nodes with NoSchedule, so that no new pods are scheduled on them:
nodes=$(kubectl get nodes | grep -i v1.30 | awk '{print $1}')
for node in $nodes
do
kubectl taint nodes ${node} podReScheduler=value:NoSchedule
done
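
grep -i v1.30 matches anywhere in the kubectl get nodes line. A sketch that selects nodes by the kubelet version reported in the node status is more precise. It prefix-matches "v1.30.", so the selection is limited to that minor version. --overwrite makes re-runs harmless. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original procedure).
# Print names of nodes whose kubelet version starts with "$1." (e.g. v1.30.).
nodes_on_version() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' |
    awk -v v="$1." 'index($2, v) == 1 { print $1 }'
}

taint_old_nodes() {
  local node
  for node in $(nodes_on_version "$1"); do
    kubectl taint nodes "$node" podReScheduler=value:NoSchedule --overwrite
  done
}
```

Usage, e.g.: taint_old_nodes v1.30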
3.9. **Check** whether any pods are still running on the old (e.g. 1.30) worker nodes:
nodes=$(kubectl get nodes | grep -i v1.30 | awk '{print $1}')
for node in $nodes
do
kubectl get po -o wide -A | grep -i $node | grep -v 'aws-node-\\|kube-proxy-\\|ebs-csi-node\\|twistlock-defender\\|itom-prometheus-node-exporter-\\|itom-throttling-controller\\|Completed' | awk '{print $1,$2}'
done
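
Instead of listing every pod in the cluster and grepping for the node name, the API server can filter by node with the spec.nodeName field selector. The sketch below keeps the same exclusions as the loop above and prints "namespace pod" pairs. The function name is hypothetical, and old nodes are selected by kubelet version prefix (e.g. v1.30).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original procedure).
pods_on_old_nodes() {
  local node
  for node in $(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' |
      awk -v v="$1." 'index($2, v) == 1 { print $1 }'); do
    # Only pods bound to this node; skip the same daemonset/completed pods as above.
    kubectl get pods -A --no-headers --field-selector "spec.nodeName=$node" |
      grep -Ev 'aws-node-|kube-proxy-|ebs-csi-node|twistlock-defender|itom-prometheus-node-exporter-|itom-throttling-controller|Completed' |
      awk '{print $1, $2}'
  done
}
```

Usage, e.g.: pods_on_old_nodes v1.30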
3.10. If pods are still running on the 1.30 nodes only in small namespaces, such as audit, core, kube-system, cert-manager or velero, **restart** them with the [rollingMigratePodsByNamespace.sh](attachments/706832607/709421199.sh) script:
./rollingMigratePodsByNamespace.sh \<namespace1\> \<namespace2\> ...
e.g.
./rollingMigratePodsByNamespace.sh cert-manager kube-system monitoring velero
or, to run it in the background (output goes to nohup.out):
nohup sh rollingMigratePodsByNamespace.sh audit core kube-system &
**Note:** It is not safe to run the script on big namespaces like itsma, core or monitoring.
3.11. In the big namespaces (itsma, core, monitoring), manually **restart** the remaining pods by deleting them individually, e.g.:
kubectl delete pod itom-toolkit-6c5f5745b-cfzqx -n itsma-ohs8f
kubectl delete pod filebeat-drxl5 -n logging
kubectl delete pod suite-conf-pod-itsma-6854dd8f74-5c9dm -n core
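
When many pods remain, the manual deletes can be driven from the "namespace pod" pairs printed by the step 3.9 check. As a sketch, the pods are deleted one at a time with a pause between them. The 30-second default pause is an assumption meant to give each replacement pod time to start on a new node, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original procedure).
# Reads "namespace pod" pairs from stdin and deletes the pods one by one.
restart_pods_sequentially() {
  local pause="${1:-30}" ns pod
  while read -r ns pod; do
    [ -n "$pod" ] || continue        # skip blank or malformed lines
    echo "Deleting $pod in $ns"
    kubectl delete pod "$pod" -n "$ns" --wait=true || return 1
    sleep "$pause"
  done
}
```

Usage, e.g.: printf 'itsma-ohs8f itom-toolkit-6c5f5745b-cfzqx\n' | restart_pods_sequentially 60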
3.12. **Check** again as in step 3.9 above.
3.13. **Delete** the old (e.g. 1.30) worker node groups, which also terminates their nodes:
AWS UI > EKS > \<the cluster name\> > Compute > \<old node groups\> > Delete.
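
From the bastion, the equivalent is aws eks delete-nodegroup followed by the nodegroup-deleted waiter, which blocks until EKS no longer reports the node group. This is a sketch with a hypothetical function name. aws eks list-nodegroups --cluster-name \<cluster\> shows the node group names.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original procedure).
delete_nodegroup() {
  # $1 = cluster, $2 = old node group name; prints the new status (e.g. DELETING)
  aws eks delete-nodegroup --cluster-name "$1" --nodegroup-name "$2" \
    --query nodegroup.status --output text &&
  aws eks wait nodegroup-deleted --cluster-name "$1" --nodegroup-name "$2"
}
```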
3.14. Once all the old worker nodes are terminated, **install** the Qualys agents on the new worker nodes (except on US24-PROD) with the install\_qualys\_agent.sh script:
sh install\_qualys\_agent.sh \<farmName\>
e.g. sh install\_qualys\_agent.sh us6-prod
3.15. **SSH** to one of the new worker nodes and check that Qualys is installed with: service qualys-cloud-agent status
ssh -i worknodes.pem ec2-user@ip-10-210-96-76.us-west-2.compute.internal
service qualys-cloud-agent status
exit
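
To check every new node rather than one, the same status command can be run over SSH in a loop. This is a sketch under stated assumptions: the Kubernetes node names are the private DNS names reachable from the bastion (as in the ssh example above), worknodes.pem is in the current directory, and OpenSSH supports StrictHostKeyChecking=accept-new (7.6 or later). ssh -n stops ssh from consuming the loop's input. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original procedure).
check_qualys_all_nodes() {
  local node
  for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
    if ssh -n -o BatchMode=yes -o StrictHostKeyChecking=accept-new -i worknodes.pem \
        "ec2-user@$node" 'service qualys-cloud-agent status' >/dev/null 2>&1; then
      echo "$node: qualys-cloud-agent OK"
    else
      echo "$node: qualys-cloud-agent NOT running or unreachable"
    fi
  done
}
```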