Amazon EKS Upgrade Journey From 1.27 to 1.28: welcoming Planternetes

Marcin Cuber
7 min read · Sep 27, 2023

We are now welcoming the “Planternetes” release. Process and considerations while upgrading the EKS control plane to version 1.28.

Overview

I shall start slightly differently this time. We are welcoming “Planternetes” to this world. It is a release consisting of 45 enhancements: 19 entering Alpha, 14 graduating to Beta, and 12 graduating to Stable. For EKS purposes, we really only care about the stable features, since the AWS team no longer supports alpha and beta features.

Note from the official Kubernetes team: “Each Kubernetes release is the culmination of the hard work of thousands of individuals from our community. Much like a garden, our release has ever-changing growth, challenges and opportunities. This theme celebrates the meticulous care, intention and efforts to get the release to where we are today. Harmoniously together, we grow better.”

Kubernetes is only possible with the support, commitment, and hard work of its community. Each release team comprises dedicated community volunteers who work together to build the many pieces that make up the Kubernetes releases you rely on. I would also like to thank the community and the release team for their dedication and the specialised skills that make the code run. In many companies, specialists are not valued and are simply ignored, which isn’t right; hence I would like to point this out in public.

I would also like to thank the entire release team again for the hours spent hard at work to deliver a solid Kubernetes v1.28 release for us engineers.

Previous Stories and Upgrades

If you are looking at

  • upgrading EKS from 1.26 to 1.27 check out this story
  • upgrading EKS from 1.25 to 1.26 check out this story
  • upgrading EKS from 1.24 to 1.25 check out this story
  • upgrading EKS from 1.23 to 1.24 check out this story
  • upgrading EKS from 1.22 to 1.23 check out this story

Kubernetes 1.28: changes in this release

AWS EKS specific: optimized AMI update

Starting with Kubernetes version 1.28, you will no longer be able to use Amazon EC2 P2 instances with the Amazon EKS optimized accelerated Amazon Linux AMIs out of the box. These AMIs for Kubernetes versions 1.28 or later will support NVIDIA 525 series or later drivers, which are incompatible with the P2 instances. However, NVIDIA 525 series or later drivers are compatible with the P3, P4, and P5 instances, so you can use those instances with the AMIs for Kubernetes version 1.28 or later. Before your Amazon EKS clusters are upgraded to version 1.28, migrate any P2 instances to P3, P4, and P5 instances. You should also proactively upgrade your applications to work with the NVIDIA 525 series or later.
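A quick way to check whether any worker nodes are still P2 instances before the upgrade (a sketch, assuming kubectl access to the cluster and the standard well-known instance-type label on nodes):

```shell
# List node names with their EC2 instance type, then keep only P2 instances.
# An empty result means no P2 migration work is needed before 1.28.
kubectl get nodes -L node.kubernetes.io/instance-type --no-headers | awk '$NF ~ /^p2\./'
```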

Recovery from non-graceful node shutdown: GA

This is a great feature which I particularly like. If a node shuts down unexpectedly or ends up in a non-recoverable state (perhaps due to hardware failure or an unresponsive OS), Kubernetes now allows you to clean up afterwards, so that stateful workloads can fail over and restart successfully on a different node.
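The mechanism behind this feature is the node.kubernetes.io/out-of-service taint. Once an operator has confirmed the node is genuinely down, applying the taint lets Kubernetes force-delete the pods stuck on it and detach their volumes. A sketch (the node name is illustrative):

```shell
# Confirm the node is actually shut down or unrecoverable first, then mark it
# out of service. Pods without a matching toleration are force-deleted and
# their volumes detached, so stateful workloads can reschedule elsewhere.
kubectl taint nodes ip-10-0-1-23.eu-west-1.compute.internal \
  node.kubernetes.io/out-of-service=nodeshutdown:NoExecute

# Remember to remove the taint once the node is recovered or deleted:
kubectl taint nodes ip-10-0-1-23.eu-west-1.compute.internal \
  node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
```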

Improved CRDs: CustomResourceDefinition validation rules

In 1.28, two optional fields, “reason” and “fieldPath”, were added to CRD validation rules, allowing authors to specify a machine-readable failure reason and the path of the offending field when a validation rule fails.
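As an illustration, a CEL validation rule inside a CRD schema can now point at the offending field and carry a machine-readable reason (this schema excerpt is hypothetical):

```yaml
# Excerpt from a CustomResourceDefinition's openAPIV3Schema (illustrative)
x-kubernetes-validations:
  - rule: "self.minReplicas <= self.replicas"
    message: "replicas must be greater than or equal to minReplicas"
    # New optional fields in 1.28, surfaced in the failure response:
    reason: FieldValueInvalid
    fieldPath: ".replicas"
```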

Default StorageClass graduates to stable

Kubernetes automatically sets a storageClassName for a PersistentVolumeClaim (PVC) if you don't provide a value. This proactive approach isn’t limited to new PVCs; even existing PVCs benefit from this automation. Such user-centric enhancements make Kubernetes v1.28 a noteworthy upgrade.
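In practice this means a PVC like the one below, with no storageClassName set, gets bound to the cluster’s default StorageClass — on EKS often gp2, unless you have created your own default:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  # No storageClassName here: Kubernetes fills in the default StorageClass,
  # and since 1.28 this also applies retroactively to existing unbound PVCs.
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```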

Advanced topology management and fine-tuned pod placement reached beta

Kubernetes v1.28 introduces a sophisticated array of topology management features. These features, detailed in the KEP (#3545), are already enabled by default and available in beta. Together, they form a robust powerhouse that addresses the challenges of orchestrating pod placement in a way that maximizes resource efficiency, enhances performance, and fortifies fault tolerance.

TopologyManagerPolicyBetaOptions empowers you with advanced settings for fine-tuning pod placement based on factors such as node topology and resource availability.

TopologyManagerPolicyOptions offers an extra layer of granularity in tailoring pod placement according to unique cluster topologies.

If you want to read more about this, see “Control Topology Management Policies on a node” in the Kubernetes documentation.
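These are kubelet settings, so on self-managed or custom-launch-template nodes you would enable them through the kubelet configuration. A sketch, with illustrative values:

```yaml
# KubeletConfiguration excerpt (kubelet config file)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
topologyManagerPolicy: best-effort
topologyManagerScope: pod
topologyManagerPolicyOptions:
  # Beta option gated by the TopologyManagerPolicyBetaOptions feature gate
  prefer-closest-numa-nodes: "true"
```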

Other features that are now stable

Upgrade your EKS with terraform

I used the following providers for the upgrade:
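The exact provider pins aren’t reproduced here; the shape of the block looks like this (the version constraints are illustrative — pin whatever recent versions you have tested):

```hcl
terraform {
  required_version = ">= 1.5.0" # illustrative pin

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0" # illustrative pin
    }
  }
}
```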

This time the upgrade of the control plane took around 8 minutes. I would say this is super fast, and I experienced zero issues afterwards. I don’t think I even noticed any unavailability from the API server itself, which did happen in previous upgrades. AWS is doing a great job at reducing the time it takes to upgrade the EKS control plane.

I immediately upgraded the worker nodes, which took around 15 minutes to join the upgraded EKS cluster. This time depends on how many worker nodes you have and how many pods need to be drained from the old nodes.

In general, the full upgrade process (control plane + worker nodes) took around 23 minutes. A really good time, I would say.

I personally use Terraform to deploy and upgrade my EKS clusters. Here is an example of the EKS cluster resource.

resource "aws_eks_cluster" "cluster" {
  enabled_cluster_log_types = ["audit"]
  name                      = local.name_prefix
  role_arn                  = aws_iam_role.cluster.arn
  version                   = "1.28"

  vpc_config {
    subnet_ids              = flatten([module.vpc.public_subnets, module.vpc.private_subnets])
    security_group_ids      = []
    endpoint_private_access = true
    endpoint_public_access  = true
  }

  encryption_config {
    resources = ["secrets"]
    provider {
      key_arn = module.kms-eks.key_arn
    }
  }

  tags = var.tags
}

For worker nodes I used the official AMI with id ami-070d35b39981a6e3d. I didn’t notice any issues after rotating all nodes. Nodes are running the following version: v1.28.1-eks-43840fb.

Templates I use for creating EKS clusters with Terraform can be found in my GitHub repository at https://github.com/marcincuber/eks

Upgrading Managed EKS Add-ons

In this case the change is trivial and works fine: simply update the version of each add-on. In my case, from this release I utilise kube-proxy, CoreDNS and the EBS CSI driver.

Terraform resources for add-ons

resource "aws_eks_addon" "kube_proxy" {
  cluster_name      = aws_eks_cluster.cluster.name
  addon_name        = "kube-proxy"
  addon_version     = "v1.28.1-eksbuild.1"
  resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "core_dns" {
  cluster_name      = aws_eks_cluster.cluster.name
  addon_name        = "coredns"
  addon_version     = "v1.10.1-eksbuild.4"
  resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "aws_ebs_csi_driver" {
  cluster_name      = aws_eks_cluster.cluster.name
  addon_name        = "aws-ebs-csi-driver"
  addon_version     = "v1.23.0-eksbuild.1"
  resolve_conflicts = "OVERWRITE"
}

After upgrading EKS control-plane

Remember to upgrade core deployments and daemon sets that are recommended for EKS 1.28.

  1. CoreDNS — v1.10.1-eksbuild.4
  2. Kube-proxy — 1.28.1-eksbuild.1
  3. VPC CNI — 1.15.0-eksbuild.1
  4. aws-ebs-csi-driver- v1.23.0-eksbuild.1
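You can query the recommended (default) add-on versions for a given cluster version straight from the EKS API — a sketch, assuming AWS CLI credentials are configured:

```shell
# Show the default add-on version that EKS recommends for Kubernetes 1.28;
# swap the add-on name for kube-proxy, vpc-cni or aws-ebs-csi-driver as needed.
aws eks describe-addon-versions \
  --kubernetes-version 1.28 \
  --addon-name coredns \
  --query 'addons[].addonVersions[?compatibilities[0].defaultVersion].addonVersion' \
  --output text
```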

The above is just a recommendation from AWS. You should look at upgrading all your components to match the 1.28 Kubernetes version. They could include:

  1. load balancer controller
  2. calico-node
  3. cluster-autoscaler or Karpenter
  4. external secrets operator
  5. kube-state-metrics
  6. metrics-server
  7. csi-secrets-store
  8. calico-typha and calico-typha-horizontal-autoscaler
  9. reloader
  10. keda (event driven autoscaler)

Summary and Conclusions

An even quicker upgrade of the EKS cluster than ever before: the control plane upgrade completed in about 9 minutes. I use Terraform to run my cluster and node upgrades, so the pipeline made my life super easy.

Yet again, no significant issues. I hope your upgrade goes just as smoothly. All workloads worked just fine; I didn’t really have to modify anything.

If you are interested in the entire terraform setup for EKS, you can find it on my GitHub -> https://github.com/marcincuber/eks

I hope this article nicely aggregates all the important information around upgrading EKS to version 1.28 and helps people speed up the task.

Long story short: whether you hate or love Kubernetes, you still use it ;).

Please note that my notes rely on official AWS and Kubernetes sources.

Enjoy Kubernetes!!!

Sponsor Me

As with any other story I have written on Medium, I performed the tasks documented here. This is my own research, including the issues I encountered.

Thanks for reading everybody. Marcin Cuber


Marcin Cuber

Principal Cloud Engineer, AWS Community Builder and Solutions Architect