Amazon EKS Upgrade Journey From 1.24 to 1.25

Marcin Cuber
8 min read · Feb 23, 2023

We are now welcoming “Combiner”. Process and considerations while upgrading EKS control-plane to version 1.25.

Overview

The name of this Kubernetes 1.25 release was chosen to recognize both the diverse components that the project comprises and the individuals who contributed to it. Hence, the fitting release name, “Combiner”. Combiner highlights the collaborative nature of open source and its impact, both in general and with specific regard to Kubernetes. The Kubernetes release team stated the following, “With this release, we wish to honor the collaborative, open spirit that takes us from isolated developers, writers, and users spread around the globe to a combined force capable of changing the world. Kubernetes v1.25 includes a staggering 40 enhancements, none of which would exist without the incredible power we have when we work together.”

It is a release packed with great features. Read more about them below. I am also going to discuss upgrade details and any issues I encountered during the upgrade.

Previous Stories and Upgrades

If you are looking at

  • upgrading EKS from 1.23 to 1.24 check out this story
  • upgrading EKS from 1.22 to 1.23 check out this story
  • upgrading EKS from 1.21 to 1.22 check out this story
  • upgrading EKS from 1.20 to 1.21 check out this story
  • upgrading EKS from 1.19 to 1.20 check out this story
  • upgrading EKS from 1.18 to 1.19 check out this story
  • upgrading EKS from 1.17 to 1.18 check out this story

Kubernetes 1.25 Major updates

PodSecurityPolicy is removed; Pod Security Admission graduates to Stable

PodSecurityPolicy (PSP) was deprecated in the 1.21 release of Kubernetes and is now completely removed. The updates required to improve its usability would have introduced breaking changes, so it became necessary to remove it in favour of a friendlier replacement. That replacement is Pod Security Admission (PSA), which was introduced as a beta feature to succeed PSP in implementing pod security measures. These measures are defined by the Pod Security Standards (PSS) and allow you to restrict the behavior of your pods at the namespace level. In Kubernetes 1.25, PSA is stable and enabled by default in Amazon EKS. The official documentation provides assistance with this migration, and anything related to the PSP removal can be found in the FAQ.
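Unlike cluster-wide PSP objects, PSA is driven by namespace labels. As a minimal sketch (the namespace name my-app is a placeholder), enforcing the restricted Pod Security Standard looks like this:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    # Reject pods that violate the "restricted" Pod Security Standard
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.25
    # Additionally surface warnings on kubectl apply for audit purposes
    pod-security.kubernetes.io/warn: restricted
```

The enforce, warn and audit modes can be combined per namespace, which makes it practical to migrate gradually: start with warn/audit, then switch on enforce.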

Ephemeral containers in GA

Ephemeral containers were introduced in the 1.16 version of Kubernetes. They were designed to enhance the process of debugging running pods. An ephemeral container is injected into the running pod that you intend to debug and has access to its containers’ file systems and process namespace. This type of container is useful for troubleshooting exercises but isn’t meant to be used for normal application deployments. As such, ephemeral containers can’t be configured as you typically would other containers, with fields like ports, readinessProbe, and livenessProbe. This feature is now generally available, allowing you to debug and run inspections on your workloads in Amazon EKS 1.25.
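The usual entry point is kubectl debug, e.g. `kubectl debug -it my-pod --image=busybox --target=app` (the pod name, image and target container here are placeholders). Conceptually, this injects an entry like the following into the pod via its ephemeralcontainers subresource; you don’t write this YAML by hand:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  ephemeralContainers:
    - name: debugger
      image: busybox
      command: ["sh"]
      stdin: true
      tty: true
      # Join the process namespace of the "app" container so its processes
      # and filesystem (via /proc/<pid>/root) are visible to the debugger
      targetContainerName: app
```

Note that ephemeral containers cannot be removed or restarted once added; they disappear when the pod is deleted.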

Network Policy endPort is Stable

The endPort field in Network Policy has been promoted to GA. Network Policy providers that support the endPort field can now use it to specify a range of ports to which a Network Policy applies. Previously, each Network Policy rule could only target a single port.

Network Policies are very important; without them, all pods on a cluster can communicate with each other. This default setup simplifies initial adoption and remains the default configuration. However, it’s highly encouraged to adopt a Network Policy for production workloads to secure east-west network connections between pods. Network Policies implement allow-based firewall rules. Amazon EKS installs the Amazon VPC CNI for pod networking as the default Container Network Interface (CNI). You can read more about implementing Network Policies in Amazon EKS in the EKS Best Practices guide on networking. I personally use Calico.
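As an illustrative sketch (pod labels and the port range are placeholders), a policy using endPort to allow egress to a whole range of ports looks like this:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-port-range
spec:
  podSelector:
    matchLabels:
      app: client
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: backend
      ports:
        - protocol: TCP
          port: 30000
          endPort: 32767   # applies to the whole 30000-32767 range
```

Remember that your CNI plugin must support endPort; if it doesn’t, the field is ignored by that provider.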

Cgroups v2 is stable

Control groups (Cgroups) are a Linux kernel feature that allows you to manage resources for running processes. With Cgroups, you can allocate and limit the usage of CPU, memory, network, disk I/O, etc. When working with Cgroups v2 in Amazon EKS 1.25, make sure to review the new configuration values, as the ranges of some resource values have changed (e.g., the v1 cpu.shares range of [2–262144] maps to the v2 cpu.weight range of [1–10000]).

DaemonSet maxSurge is stable

When working with DaemonSets, we can now control the maximum number of pods that can be created in place of old ones during a rolling update. To make use of this, you can add the desired value for the number of pods to the optional spec.updateStrategy.rollingUpdate.maxSurge field. This value can also be expressed as a percentage of the existing desired pods.
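A minimal sketch of a DaemonSet using maxSurge (the names and image are placeholders; note that maxUnavailable must be 0 when maxSurge is set, as the two are mutually exclusive):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent
spec:
  selector:
    matchLabels:
      app: node-agent
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      # Start the replacement pod on a node before the old one is removed
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      containers:
        - name: agent
          image: busybox   # placeholder image
          command: ["sleep", "infinity"]
```

This is useful for node-level agents (log shippers, monitoring daemons) where you want the new pod ready before the old one terminates.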

Local ephemeral storage capacity isolation is stable

Kubernetes supports both persistent and ephemeral storage. Ephemeral storage is useful for things like caching, sharing transient data between multi-container pods, and logging. This feature has been available for a long time, and it provides a way for you to isolate and limit the local ephemeral storage consumed by a pod. In the same fashion as CPU and memory resource management, you can set limits as well as reserve ephemeral storage with resource requests. A pod’s local ephemeral storage shares its lifecycle and doesn’t extend beyond it. Through the use of hard limits, pods can be evicted if they exceed their individually configured capacity.
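Requests and limits for local ephemeral storage are set alongside CPU and memory; a minimal sketch (the names and sizes are my own choices, not recommendations):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: log-processor   # placeholder name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "infinity"]
      resources:
        requests:
          ephemeral-storage: "1Gi"
        limits:
          # The kubelet evicts the pod if its local ephemeral storage
          # usage (emptyDir, logs, writable container layers) exceeds this
          ephemeral-storage: "2Gi"
```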

Core CSI migration is stable

Container Storage Interface (CSI) migration was introduced to solve the complexity surrounding in-tree storage plugins that were encoded into Kubernetes. CSI migration is now generally available (GA), enabling the use of drivers that interact with workloads on your cluster through the CSI. In EKS 1.25, Amazon EBS CSI migration is enabled by default. However, to use Amazon EBS volumes for resources like StorageClass, PersistentVolume, and PersistentVolumeClaim, you need to install the corresponding Amazon EBS CSI driver as an Amazon EKS add-on. I use the EKS add-on for the installation of the CSI driver; you can find the Terraform code for it below.
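For reference, a StorageClass backed by the EBS CSI driver could look like the following; the gp3 volume type and encryption setting are my own choices, not requirements:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com   # the CSI driver, not the legacy kubernetes.io/aws-ebs
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  encrypted: "true"
```

With WaitForFirstConsumer, volume provisioning is delayed until a pod is scheduled, so the EBS volume lands in the same availability zone as the node.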

New feature not yet available in EKS 1.25: KMS v2

Kubernetes 1.25 introduces the KMS v2alpha1 API to add performance, rotation, and observability improvements. Data at rest (i.e., Kubernetes Secrets) is encrypted with a DEK using AES-GCM instead of AES-CBC for KMS data encryption. No user action is required. Reads with both AES-GCM and AES-CBC continue to be allowed. Unfortunately, we will need to wait some time to see this feature available in EKS.

Upgrade your EKS with terraform

This time the upgrade of the control plane took around 10 minutes and didn’t cause any issues. AWS are doing a great job at reducing the time it takes to upgrade the EKS control plane.

I immediately upgraded the worker nodes, which took around 10–20 minutes to join the upgraded EKS cluster. This time depends on how many worker nodes you have and how many pods need to be drained from the old nodes.

In general, the full upgrade process (control plane + worker nodes) took around 22 minutes. A really good time, I would say.

I personally use Terraform to deploy and upgrade my EKS clusters. Here is an example of the EKS cluster resource.

resource "aws_eks_cluster" "cluster" {
  enabled_cluster_log_types = ["audit"]
  name                      = local.name_prefix
  role_arn                  = aws_iam_role.cluster.arn
  version                   = "1.25"

  vpc_config {
    subnet_ids              = flatten([module.vpc.public_subnets, module.vpc.private_subnets])
    security_group_ids      = []
    endpoint_private_access = true
    endpoint_public_access  = true
  }

  encryption_config {
    resources = ["secrets"]

    provider {
      key_arn = module.kms-eks.key_arn
    }
  }

  tags = var.tags
}

For worker nodes, I used the official AMI with ID ami-0607584022f1c8ae9. I didn’t notice any issues after rotating all nodes. Nodes are running the following version: v1.25.6-eks-48e63af.

Templates I use for creating EKS clusters using Terraform can be found in my Github repository reachable under https://github.com/marcincuber/eks/tree/master/terraform-aws

Please note that after the EKS upgrade, the API server was not reachable for about 45 seconds. Requests were eventually handled after that.

Upgrading Managed EKS Add-ons

In this case the change is trivial and works fine: simply update the version of the add-on. In my case, from this release I utilise kube-proxy, CoreDNS and the EBS CSI driver.

Terraform resources for add-ons

resource "aws_eks_addon" "kube_proxy" {
  cluster_name      = aws_eks_cluster.cluster[0].name
  addon_name        = "kube-proxy"
  addon_version     = "v1.25.6-eksbuild.1"
  resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "core_dns" {
  cluster_name      = aws_eks_cluster.cluster[0].name
  addon_name        = "coredns"
  addon_version     = "v1.9.3-eksbuild.2"
  resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "aws_ebs_csi_driver" {
  cluster_name      = aws_eks_cluster.cluster[0].name
  addon_name        = "aws-ebs-csi-driver"
  addon_version     = "v1.16.0-eksbuild.1"
  resolve_conflicts = "OVERWRITE"
}

After upgrading EKS control-plane

Remember to upgrade core deployments and daemon sets that are recommended for EKS 1.25.

  1. CoreDNS — v1.9.3-eksbuild.2
  2. Kube-proxy — v1.25.6-eksbuild.1
  3. VPC CNI — v1.12.2-eksbuild.1
  4. aws-ebs-csi-driver — v1.16.0-eksbuild.1

The above is just a recommendation from AWS. You should look at upgrading all your components to match the 1.25 Kubernetes version. They could include:

  1. calico-node
  2. cluster-autoscaler or Karpenter
  3. kube-state-metrics
  4. metrics-server
  5. csi-secrets-store
  6. calico-typha and calico-typha-horizontal-autoscaler
  7. reloader

Summary and Conclusions

Mega surprise with this release: the upgrade of the cluster only took around 9 minutes. This is a huge reduction in time over previous upgrades. The upgrade time was reduced by at least 30 minutes compared to the EKS 1.21 upgrade!

I have to say that this was a nice, pleasant and relatively fast upgrade. Yet again, no significant issues. Hope you will have the same easy job to perform. All workloads worked just fine. I didn’t have to modify anything really.

If you are interested in the entire terraform setup for EKS, you can find it on my GitHub -> https://github.com/marcincuber/eks/tree/master/terraform-aws

Hope this article nicely aggregates all the important information around upgrading EKS to version 1.25 and that it will help people speed up their task.

Long story short, you hate and/or you love Kubernetes but you still use it ;).

Enjoy Kubernetes!!!

Sponsor Me

As with any other story I have written on Medium, I performed the tasks documented here. This is my own research, along with the issues I encountered.

Thanks for reading everybody. Marcin Cuber
