Amazon EKS Upgrade Journey from 1.27 to 1.28- welcoming Planternetes
We are now welcoming the “Planternetes” release. Process and considerations for upgrading the EKS control plane to version 1.28.
Overview
I shall start slightly differently this time. We are welcoming “Planternetes” to the world. It is a release consisting of 45 enhancements: 19 entering Alpha, 14 graduating to Beta, and 12 graduating to Stable. For EKS purposes, we only really care about the stable features, since the AWS team no longer supports alpha and beta features.
Note from the official Kubernetes team: “Each Kubernetes release is the culmination of the hard work of thousands of individuals from our community. Much like a garden, our release has ever-changing growth, challenges and opportunities. This theme celebrates the meticulous care, intention and efforts to get the release to where we are today. Harmoniously together, we grow better.”
Kubernetes is only possible with the support, commitment, and hard work of its community. Each release team is made up of dedicated community volunteers who work together to build the many pieces that make up the Kubernetes releases you rely on. I would also like to thank the community and the release team for their dedication and the specialised skills that make the code run. In many companies, specialists are not valued and are simply ignored, which isn’t right. Hence, I would like to point this out in public.
I would also like to thank the entire release team again for the hours spent hard at work to deliver a solid Kubernetes v1.28 release for us engineers.
Previous Stories and Upgrades
If you are looking at
- upgrading EKS from 1.26 to 1.27 check out this story
- upgrading EKS from 1.25 to 1.26 check out this story
- upgrading EKS from 1.24 to 1.25 check out this story
- upgrading EKS from 1.23 to 1.24 check out this story
- upgrading EKS from 1.22 to 1.23 check out this story
Kubernetes 1.28- changes in this release
AWS EKS specific- optimized AMI update
Starting with Kubernetes version 1.28, you will no longer be able to use Amazon EC2 P2 instances with the Amazon EKS optimized accelerated Amazon Linux AMIs out of the box. These AMIs for Kubernetes versions 1.28 or later will support NVIDIA 525 series or later drivers, which are incompatible with the P2 instances. However, NVIDIA 525 series or later drivers are compatible with the P3, P4, and P5 instances, so you can use those instances with the AMIs for Kubernetes version 1.28 or later. Before your Amazon EKS clusters are upgraded to version 1.28, migrate any P2 instances to P3, P4, or P5 instances. You should also proactively upgrade your applications to work with the NVIDIA 525 series or later drivers.
Recovery from non-graceful node shutdown- GA
This is a great feature which I particularly like. If a node shuts down unexpectedly or ends up in a non-recoverable state (perhaps due to hardware failure or an unresponsive OS), Kubernetes now allows you to clean up afterwards so that stateful workloads can fail over and restart successfully on a different node.
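For illustration, the recovery flow documented upstream is deliberately manual: once you have confirmed the node is genuinely down, you add the out-of-service taint, which lets Kubernetes force-delete the pods and detach their volumes so the workloads can come back elsewhere (the node name below is a placeholder):
# only apply this once you are certain the node is down for good
kubectl taint nodes ip-10-0-1-23.eu-west-1.compute.internal node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
# remember to remove the taint once the node is recovered or replaced
kubectl taint nodes ip-10-0-1-23.eu-west-1.compute.internal node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-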
Improved CRDs- CustomResourceDefinition validation rules
In 1.28, two optional fields, reason and fieldPath, were added to validation rules, allowing users to specify the failure reason and the field path reported when validation fails.
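As a minimal sketch, a validation rule in a CRD schema can now classify the failure and point at the offending field. This fragment is assumed to sit on the spec object inside a CRD’s openAPIV3Schema:
x-kubernetes-validations:
  - rule: "self.replicas >= 1"
    message: "replicas must be at least 1"
    # new in 1.28: a machine-readable reason and the precise field to blame
    reason: FieldValueInvalid
    fieldPath: ".replicas"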
Default StorageClass graduates to stable
Kubernetes automatically sets a storageClassName for a PersistentVolumeClaim (PVC) if you don’t provide a value. This proactive approach isn’t limited to new PVCs; even existing PVCs benefit from this automation. Such user-centric enhancements make Kubernetes v1.28 a noteworthy upgrade.
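For example, a PVC created without any storageClassName now gets the cluster’s default class filled in, and that also happens retroactively for PVCs that existed before a default was set (the manifest below is purely illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # storageClassName intentionally omitted: the default StorageClass is assigned automatically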
Advanced topology management and fine-tuned pod placement reached beta
Kubernetes v1.28 introduces a sophisticated array of topology management features. These features, detailed in the KEP (#3545), are already enabled by default and available in beta. Together, they form a robust powerhouse that addresses the challenges of orchestrating pod placement in a way that maximizes resource efficiency, enhances performance, and fortifies fault tolerance.
TopologyManagerPolicyBetaOptions empowers you with advanced settings for fine-tuning pod placement based on factors such as node topology and resource availability.
TopologyManagerPolicyOptions offers an extra layer of granularity in tailoring pod placement according to unique cluster topologies.
If you want to read more about this, see -> Control Topology Management Policies on a node.
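As a rough sketch of what enabling these options looks like at the kubelet level (field names come from the upstream KubeletConfiguration type; how you feed kubelet configuration to EKS nodes depends on your bootstrap method, so treat this as an assumption-laden example):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
topologyManagerPolicy: best-effort
topologyManagerScope: pod
topologyManagerPolicyOptions:
  # beta option from KEP-3545: prefer sets of NUMA nodes with the shortest distance between them
  prefer-closest-numa-nodes: "true"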
Other features that are now stable
- kubectl events (see the example after this list)
- Retroactive default StorageClass assignment
- Non-graceful node shutdown
- Support 3rd party device monitoring plugins
- Auth API to get self-user attributes
- Proxy Terminating Endpoints
- Expanded DNS Configuration
- Cleaning up IPTables Chain Ownership
- Minimizing iptables-restore input size
- Graduate the kubelet pod resources endpoint to GA
- Extend podresources API to report allocatable resources
- Move EndpointSlice Reconciler into Staging
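Of these, kubectl events deserves a quick illustration. It is a first-class replacement for kubectl get events, with filtering the old command lacked (object names below are placeholders):
# show only warnings in a namespace
kubectl events -n kube-system --types=Warning
# stream events for a single object as they happen
kubectl events -n kube-system --for=deployment/coredns --watch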
Upgrade your EKS with Terraform
I used the following providers for the upgrade:
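A representative required_providers block is shown below; the exact version constraints are my assumption here, so pin whichever versions you have tested:
terraform {
  required_version = "~> 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}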
This time the upgrade of the control plane took around ~8 minutes. I would say this is super fast, and I experienced zero issues afterwards. I don’t think I even noticed any unavailability of the API server itself, which did happen in previous upgrades. AWS is doing a great job at reducing the time it takes to upgrade the EKS control plane.
I immediately upgraded the worker nodes, which took around ~15 minutes to join the upgraded EKS cluster. This time depends on how many worker nodes you have and how many pods need to be drained from the old nodes.
In general, the full upgrade process (control plane + worker nodes) took around ~23 minutes. A really good time, I would say.
I personally use Terraform to deploy and upgrade my EKS clusters. Here is an example of the EKS cluster resource.
resource "aws_eks_cluster" "cluster" {
enabled_cluster_log_types = ["audit"]
name = local.name_prefix
role_arn = aws_iam_role.cluster.arn
version = "1.28"
vpc_config {
subnet_ids = flatten([module.vpc.public_subnets, module.vpc.private_subnets])
security_group_ids = []
endpoint_private_access = "true"
endpoint_public_access = "true"
}
encryption_config {
resources = ["secrets"]
provider {
key_arn = module.kms-eks.key_arn
}
}
tags = var.tags
}
For the worker nodes I used the official AMI with ID ami-070d35b39981a6e3d. I didn’t notice any issues after rotating all nodes. Nodes are running the following version: v1.28.1-eks-43840fb
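For completeness, here is a minimal managed node group resource pinned to the same version; the role, subnets and sizes below are placeholders rather than my exact setup:
resource "aws_eks_node_group" "workers" {
  cluster_name    = aws_eks_cluster.cluster.name
  node_group_name = "${local.name_prefix}-workers"
  node_role_arn   = aws_iam_role.worker.arn # hypothetical IAM role for nodes
  subnet_ids      = module.vpc.private_subnets
  version         = "1.28"

  scaling_config {
    desired_size = 3
    max_size     = 6
    min_size     = 3
  }

  update_config {
    max_unavailable = 1
  }
}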
Templates I use for creating EKS clusters with Terraform can be found in my GitHub repository: https://github.com/marcincuber/eks
Upgrading Managed EKS Add-ons
In this case the change is trivial and works fine: simply update the version of the add-on. In my case, as of this release I utilise the kube-proxy, CoreDNS and EBS CSI driver add-ons.
Terraform resources for add-ons
resource "aws_eks_addon" "kube_proxy" {
cluster_name = aws_eks_cluster.cluster[0].name
addon_name = "kube-proxy"
addon_version = "v1.28.1-eksbuild.1"
resolve_conflicts = "OVERWRITE"
}
resource "aws_eks_addon" "core_dns" {
cluster_name = aws_eks_cluster.cluster[0].name
addon_name = "coredns"
addon_version = "v1.10.1-eksbuild.4"
resolve_conflicts = "OVERWRITE"
}
resource "aws_eks_addon" "aws_ebs_csi_driver" {
cluster_name = aws_eks_cluster.cluster[0].name
addon_name = "aws-ebs-csi-driver"
addon_version = "v1.23.0-eksbuild.1"
resolve_conflicts = "OVERWRITE"
}
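If you are unsure which add-on builds are available for a given cluster version, the AWS CLI can list them; for example, for CoreDNS:
aws eks describe-addon-versions \
  --addon-name coredns \
  --kubernetes-version 1.28 \
  --query 'addons[].addonVersions[].addonVersion'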
After upgrading EKS control-plane
Remember to upgrade the core deployments and daemon sets recommended for EKS 1.28 (a quick verification snippet follows the list):
- CoreDNS — v1.10.1-eksbuild.4
- Kube-proxy — 1.28.1-eksbuild.1
- VPC CNI — 1.15.0-eksbuild.1
- aws-ebs-csi-driver — v1.23.0-eksbuild.1
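A quick way to confirm what is actually running after the rotation is to check the images of the core system workloads:
# daemon sets (kube-proxy, VPC CNI)
kubectl get daemonset kube-proxy aws-node -n kube-system -o wide
# deployments (CoreDNS)
kubectl get deployment coredns -n kube-system -o wide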
These versions are just a recommendation from AWS. You should look at upgrading all your components to match the 1.28 Kubernetes version. They could include:
- load balancer controller
- calico-node
- cluster-autoscaler or Karpenter
- external secrets operator
- kube-state-metrics
- metrics-server
- csi-secrets-store
- calico-typha and calico-typha-horizontal-autoscaler
- reloader
- keda (event driven autoscaler)
Summary and Conclusions
An even quicker upgrade of the EKS cluster than ever before. The control-plane upgrade task completed in around 9 minutes. I use Terraform to run my cluster and node upgrades, so the pipeline made my life super easy.
Yet again, no significant issues. I hope you will have the same easy job to perform. All workloads worked just fine; I didn’t really have to modify anything.
If you are interested in the entire terraform setup for EKS, you can find it on my GitHub -> https://github.com/marcincuber/eks
I hope this article nicely aggregates all the important information around upgrading EKS to version 1.28 and helps people speed up their own upgrades.
Long story short: you hate and/or love Kubernetes, but you still use it ;).
Please note that my notes rely on official AWS and Kubernetes sources.
Enjoy Kubernetes!!!
Sponsor Me
As with any other story I have written on Medium, I performed the tasks documented here. This is my own research, including the issues I encountered.
Thanks for reading everybody. Marcin Cuber