Amazon EKS Upgrade Journey From 1.26 to 1.27 (Chill Vibes)

Marcin Cuber
7 min read · May 26, 2023

We are now welcoming the “Chill Vibes” release. This story covers the process and considerations when upgrading the EKS control plane to version 1.27.

Overview

True to its code name, this release is all about chill vibes. There are not many major changes, and I haven’t noticed any problems or issues. This is a nice and peaceful upgrade to perform.

As noted by the community, the reason we were able to enjoy a calmer release this time around is all the work that folks put in behind the scenes to improve how the release is managed. That’s what this theme celebrates: people putting in the work to make things better for the community.

In their official release announcement, the Kubernetes release team said this was “the first release that anyone can remember where we didn’t receive a single exception request after the enhancements freeze.”

Previous Stories and Upgrades

If you are looking at

  • upgrading EKS from 1.25 to 1.26, check out this story
  • upgrading EKS from 1.24 to 1.25, check out this story
  • upgrading EKS from 1.23 to 1.24, check out this story
  • upgrading EKS from 1.22 to 1.23, check out this story

Kubernetes 1.27: changes in this release

Freeze k8s.gcr.io image registry

The old image registry, k8s.gcr.io, has been replaced by registry.k8s.io, which has been generally available for several months. Kubernetes 1.27 is no longer published to the k8s.gcr.io registry.

SeccompDefault graduates to stable

To use seccomp profile defaulting, you must run the kubelet with the --seccomp-default command line flag enabled for each node where you want to use it. If enabled, the kubelet will use the RuntimeDefault seccomp profile by default, which is defined by the container runtime, instead of using the Unconfined (seccomp disabled) mode. The default profiles aim to provide a strong set of security defaults while preserving the functionality of the workload. It is possible that the default profiles differ between container runtimes and their release versions.

You can find detailed information about a possible upgrade and downgrade strategy in the related Kubernetes Enhancement Proposal (KEP): Enable seccomp by default.
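
As a rough sketch of how the flag could be wired in with Terraform, the snippet below builds node user data that passes --seccomp-default to the kubelet. It assumes self-managed or launch-template based nodes running the Amazon Linux 2 EKS-optimized AMI, where /etc/eks/bootstrap.sh accepts --kubelet-extra-args. The variable and local names are hypothetical and not taken from my repository.

```hcl
# Hypothetical input: the name of the EKS cluster the node should join.
variable "cluster_name" {
  type = string
}

locals {
  # Kubelet flag that makes RuntimeDefault the default seccomp profile.
  kubelet_extra_args = "--seccomp-default"

  # User data for an Amazon Linux 2 EKS-optimized AMI (assumption).
  node_user_data = <<-EOT
    #!/bin/bash
    set -o errexit
    /etc/eks/bootstrap.sh ${var.cluster_name} --kubelet-extra-args '${local.kubelet_extra_args}'
  EOT
}

# Launch templates expect base64-encoded user data.
output "node_user_data_base64" {
  value = base64encode(local.node_user_data)
}
```

Workloads that rely on syscalls blocked by the runtime’s default profile may need an explicit seccomp profile in their security context, so it is worth testing this on a single node group first.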

Mutable scheduling directives for Jobs graduates to GA

This feature was introduced in v1.22, started at beta level, and is now stable. In most cases a parallel job will want its pods to run with constraints, such as all in the same zone, or all on either GPU model x or y but not a mix of both.

This feature allows updating a Job’s scheduling directives before it starts, which gives custom queue controllers the ability to influence pod placement while offloading the actual pod-to-node assignment to kube-scheduler. This is allowed only for suspended Jobs that have never been unsuspended before. The fields in a Job’s pod template that can be updated are node affinity, node selector, tolerations, labels, annotations, and scheduling gates. Find more details in the KEP: Allow updating scheduling directives of jobs.
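
To illustrate the starting point for this feature, here is a minimal sketch of a Job created in the suspended state using the kubernetes_manifest resource from the Terraform Kubernetes provider. The Job name, image, and the job_zone variable are hypothetical. While spec.suspend is true and the Job has never been unsuspended, a queue controller (or a later apply) can still change the node selector before the Job is released.

```hcl
# Hypothetical input: the zone the Job's pods should be pinned to.
variable "job_zone" {
  type    = string
  default = "eu-west-1a"
}

resource "kubernetes_manifest" "example_batch_job" {
  manifest = {
    apiVersion = "batch/v1"
    kind       = "Job"
    metadata = {
      name      = "example-batch"
      namespace = "default"
    }
    spec = {
      # Created suspended: no pods are started until this is set to false.
      suspend     = true
      parallelism = 4
      completions = 4
      template = {
        spec = {
          restartPolicy = "Never"
          # Scheduling directive that remains mutable while the Job is suspended.
          nodeSelector = {
            "topology.kubernetes.io/zone" = var.job_zone
          }
          containers = [{
            name    = "worker"
            image   = "busybox:1.36"
            command = ["sh", "-c", "echo working"]
          }]
        }
      }
    }
  }
}
```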

DownwardAPIHugePages graduates to stable

In Kubernetes v1.20, support for requests.hugepages-<pagesize> and limits.hugepages-<pagesize> was added to the downward API to be consistent with other resources like cpu, memory, and ephemeral storage. This feature graduates to stable in this release. You can find more details in the KEP: Downward API HugePages.

More fine-grained pod topology spread policies reached beta

Kubernetes v1.27 brings a more advanced set of pod topology spread policies. These features, described in KEPs #3022, #3094 and #3243, are available immediately because they are enabled by default. Together they make it easier to distribute workloads evenly, improve resilience, and simplify rolling upgrades.

  • #3022 introduces the minDomains parameter, which lets you set the minimum number of domains your pods should occupy, helping to keep workloads spread evenly across the cluster.
  • #3094 introduces the nodeAffinityPolicy and nodeTaintsPolicy parameters, which give finer control over pod distribution according to node affinities and taints. This feature is tied to the NodeInclusionPolicyInPodTopologySpread feature gate, now enabled by default.
  • #3243 adds the matchLabelKeys field to the topologySpreadConstraints of your pod’s specification, which lets spreading calculations select the right pods after a rolling upgrade.

You can find out more in the Kubernetes blog post “Kubernetes 1.27: More fine-grained pod topology spread policies reached beta”.
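
The sketch below shows how the four fields could look together on a Deployment, again expressed through the kubernetes_manifest resource. The Deployment name, labels, replica count and the value of minDomains are illustrative assumptions. Note that minDomains only takes effect when whenUnsatisfiable is DoNotSchedule.

```hcl
resource "kubernetes_manifest" "web" {
  manifest = {
    apiVersion = "apps/v1"
    kind       = "Deployment"
    metadata = {
      name      = "web"
      namespace = "default"
    }
    spec = {
      replicas = 6
      selector = { matchLabels = { app = "web" } }
      template = {
        metadata = { labels = { app = "web" } }
        spec = {
          topologySpreadConstraints = [{
            maxSkew           = 1
            topologyKey       = "topology.kubernetes.io/zone"
            whenUnsatisfiable = "DoNotSchedule"
            labelSelector     = { matchLabels = { app = "web" } }
            # KEP #3022: spread over at least three zones.
            minDomains = 3
            # KEP #3094: respect node affinity and taints when counting domains.
            nodeAffinityPolicy = "Honor"
            nodeTaintsPolicy   = "Honor"
            # KEP #3243: only compare pods from the same ReplicaSet revision.
            matchLabelKeys = ["pod-template-hash"]
          }]
          containers = [{
            name  = "web"
            image = "registry.k8s.io/pause:3.9"
          }]
        }
      }
    }
  }
}
```

Using pod-template-hash in matchLabelKeys means that, during a rolling upgrade, pods from the old ReplicaSet are not counted when the new pods are spread.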

Raised default API query-per-second limits for kubelet

In versions prior to v1.27, the Amazon EKS kubelet had default values of 10 requests per second for kubeAPIQPS and a burst limit of 20 requests for kubeAPIBurst, which limited the rate at which the kubelet could send requests to the Kubernetes API server. The new default values are 50 requests per second for kubeAPIQPS and a burst limit of 100 requests for kubeAPIBurst. These changes improve how quickly pods start running on new nodes when there is a sudden demand for additional resources.

Removal of --container-runtime command line argument

The --container-runtime command line argument for the kubelet has been removed. The default container runtime for Amazon EKS has been containerd since v1.24, which eliminates the need to specify the container runtime. It is important that you do not pass this argument to --kubelet-extra-args in order to prevent errors during the node bootstrap process. You must remove the --container-runtime argument from all your node creation workflows and build scripts.
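
One way to catch a leftover flag before it reaches a node is a Terraform variable validation. This is only a sketch: the variable name kubelet_extra_args is hypothetical, and the pattern deliberately does not match the still-valid --container-runtime-endpoint flag.

```hcl
# Hypothetical variable holding the flags passed to bootstrap.sh
# through --kubelet-extra-args.
variable "kubelet_extra_args" {
  description = "Extra kubelet flags for worker nodes."
  type        = string
  default     = "--node-labels=role=worker"

  validation {
    # Reject "--container-runtime" followed by "=", whitespace or end of string.
    condition     = length(regexall("--container-runtime(=|\\s|$)", var.kubelet_extra_args)) == 0
    error_message = "Remove --container-runtime from kubelet_extra_args; the 1.27 kubelet no longer accepts it."
  }
}
```

With this in place, terraform plan fails early instead of the node failing during bootstrap.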

Upgrade your EKS with Terraform

This time the upgrade of the control plane took around 9 minutes and didn’t cause any issues. AWS is doing a great job at reducing the time it takes to upgrade the EKS control plane.

I immediately upgraded the worker nodes, which took around 10–20 minutes to join the upgraded EKS cluster. This time depends on how many worker nodes you have and how many pods need to be drained from the old nodes.

Overall, the full upgrade process (control plane plus worker nodes) took around 22 minutes. A really good time, I would say.

I personally use Terraform to deploy and upgrade my EKS clusters. Here is an example of the EKS cluster resource.

resource "aws_eks_cluster" "cluster" {
  enabled_cluster_log_types = ["audit"]
  name                      = local.name_prefix
  role_arn                  = aws_iam_role.cluster.arn
  version                   = "1.27" # bump this value to upgrade the control plane

  vpc_config {
    subnet_ids              = flatten([module.vpc.public_subnets, module.vpc.private_subnets])
    security_group_ids      = []
    endpoint_private_access = true
    endpoint_public_access  = true
  }

  # Envelope encryption of Kubernetes secrets with a KMS key
  encryption_config {
    resources = ["secrets"]
    provider {
      key_arn = module.kms-eks.key_arn
    }
  }

  tags = var.tags
}

For worker nodes I used the official AMI with id ami-017dc85cb46681399. I didn’t notice any issues after rotating all nodes. Nodes are running the following version: v1.27.1-eks-2f008fe

The templates I use for creating EKS clusters with Terraform can be found in my GitHub repository at https://github.com/marcincuber/eks/tree/master/terraform-aws

Please note that after the EKS upgrade, the API server was not reachable for about 45 seconds. Requests were handled again after that.
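
If you prefer not to hard-code the AMI id, which is specific to a region, a possible alternative is to read the recommended EKS-optimized AMI from the public SSM parameter that AWS publishes. This sketch assumes Amazon Linux 2 nodes; the data source and output names are hypothetical.

```hcl
# Recommended EKS-optimized Amazon Linux 2 AMI for Kubernetes 1.27
# in the region configured on the AWS provider.
data "aws_ssm_parameter" "eks_ami" {
  name = "/aws/service/eks/optimized-ami/1.27/amazon-linux-2/recommended/image_id"
}

output "eks_worker_ami_id" {
  # The provider marks SSM values as sensitive; an AMI id is not a secret.
  value = nonsensitive(data.aws_ssm_parameter.eks_ami.value)
}
```

The trade-off is that the AMI can change between applies, so nodes may be rotated whenever AWS publishes a new image.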

Upgrading Managed EKS Add-ons

In this case the change is trivial and works fine: simply update the version of the add-on. In my case, as of this release I use kube-proxy, CoreDNS and the EBS CSI driver.

Terraform resources for add-ons

resource "aws_eks_addon" "kube_proxy" {
  cluster_name      = aws_eks_cluster.cluster.name
  addon_name        = "kube-proxy"
  addon_version     = "v1.27.1-eksbuild.1"
  resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "core_dns" {
  cluster_name      = aws_eks_cluster.cluster.name
  addon_name        = "coredns"
  addon_version     = "v1.10.1-eksbuild.1"
  resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "aws_ebs_csi_driver" {
  cluster_name      = aws_eks_cluster.cluster.name
  addon_name        = "aws-ebs-csi-driver"
  addon_version     = "v1.19.0-eksbuild.1"
  resolve_conflicts = "OVERWRITE"
}
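
Instead of pinning add-on versions by hand, one option is to look them up with the aws_eks_addon_version data source. The sketch below is an assumption about how you might structure it, with a hypothetical cluster_version variable. With most_recent set to false it returns the default version for the given Kubernetes version.

```hcl
# Hypothetical input: the Kubernetes version of the cluster.
variable "cluster_version" {
  type    = string
  default = "1.27"
}

# Default kube-proxy add-on version for the given Kubernetes version.
data "aws_eks_addon_version" "kube_proxy" {
  addon_name         = "kube-proxy"
  kubernetes_version = var.cluster_version
  most_recent        = false
}

output "kube_proxy_addon_version" {
  value = data.aws_eks_addon_version.kube_proxy.version
}
```

The resulting value can be passed to addon_version on the aws_eks_addon resource. Pinning explicit versions, as I do above, keeps upgrades deliberate, while the lookup reduces manual edits.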

After upgrading EKS control-plane

Remember to upgrade the core deployments and daemon sets to the versions recommended for EKS 1.27.

  1. CoreDNS: v1.10.1-eksbuild.1
  2. Kube-proxy: v1.27.1-eksbuild.1
  3. VPC CNI: v1.12.2-eksbuild.1
  4. aws-ebs-csi-driver: v1.19.0-eksbuild.1

The above is just a recommendation from AWS. You should look at upgrading all your components to match the 1.27 Kubernetes version. They could include:

  1. load balancer controller
  2. calico-node
  3. cluster-autoscaler or Karpenter
  4. external secrets operator
  5. kube-state-metrics
  6. metrics-server
  7. csi-secrets-store
  8. calico-typha and calico-typha-horizontal-autoscaler
  9. reloader

Summary and Conclusions

An even quicker upgrade of the EKS cluster than ever before. The control plane upgrade was completed in about 10 minutes. I use Terraform to run my cluster and node upgrades, so the pipeline made my life super easy.

Yet again, no significant issues. I hope your upgrade goes just as smoothly. All workloads worked just fine, and I didn’t really have to modify anything.

If you are interested in the entire terraform setup for EKS, you can find it on my GitHub -> https://github.com/marcincuber/eks/tree/master/terraform-aws

I hope this article nicely aggregates all the important information around upgrading EKS to version 1.27 and helps people speed up their task.

Long story short, you hate and/or you love Kubernetes but you still use it ;).

Enjoy Kubernetes!!!

Sponsor Me

As with any other story I have written on Medium, I performed the tasks documented here myself. This is my own research, and these are the issues I encountered.

Thanks for reading, everybody. Marcin Cuber

Marcin Cuber

Principal Cloud Engineer, AWS Community Builder and Solutions Architect