Amazon EKS Upgrade Journey From 1.26 to 1.27 (Chill Vibes)
We are now welcoming the “Chill Vibes” release. Process and considerations while upgrading the EKS control plane to version 1.27.
Overview
True to its code name, this release is all about chill vibes. There are not many major changes, and I haven’t noticed any problems or issues. This is a nice and peaceful upgrade to perform.
As noted by the community, the reason we were able to enjoy a calmer release this time around is all the work that folks put in behind the scenes to improve how the release is managed. That’s what this theme celebrates: people putting in the work to make things better for the community.
Chill Vibes. In their official release announcement, the Kubernetes release team said the release was, “the first release that anyone can remember where we didn’t receive a single exception request after the enhancements freeze.”
Previous Stories and Upgrades
If you are looking at
- upgrading EKS from 1.25 to 1.26 check out this story
- upgrading EKS from 1.24 to 1.25 check out this story
- upgrading EKS from 1.23 to 1.24 check out this story
- upgrading EKS from 1.22 to 1.23 check out this story
Kubernetes 1.27 - changes in this release
Freeze k8s.gcr.io image registry
The old image registry, k8s.gcr.io, is replaced by registry.k8s.io, which has been generally available for several months. This release of Kubernetes will no longer be published to the k8s.gcr.io registry.
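If image references are kept in Terraform (for example as Helm values or add-on overrides), one way to catch leftovers is to rewrite the registry prefix before it reaches a manifest. This is only a minimal sketch: the variable name `images` and its sample values are made up for illustration.

```hcl
# Hypothetical input: image references that may still point at the frozen registry.
variable "images" {
  type = map(string)
  default = {
    pause          = "k8s.gcr.io/pause:3.9"
    metrics_server = "registry.k8s.io/metrics-server/metrics-server:v0.6.3"
  }
}

locals {
  # replace() swaps the old registry host for the new one and leaves
  # references that already use registry.k8s.io untouched.
  migrated_images = {
    for name, image in var.images :
    name => replace(image, "k8s.gcr.io/", "registry.k8s.io/")
  }
}

output "migrated_images" {
  value = local.migrated_images
}
```

Images referenced outside Terraform (raw manifests, Helm charts with hard-coded registries) would still need to be checked separately.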
SeccompDefault graduates to stable
To use seccomp profile defaulting, you must run the kubelet with the --seccomp-default command line flag enabled on each node where you want to use it. If enabled, the kubelet will use the RuntimeDefault seccomp profile by default, which is defined by the container runtime, instead of the Unconfined (seccomp disabled) mode. The default profiles aim to provide a strong set of security defaults while preserving the functionality of the workload. It is possible that the default profiles differ between container runtimes and their release versions.
You can find detailed information about a possible upgrade and downgrade strategy in the related Kubernetes Enhancement Proposal (KEP): Enable seccomp by default.
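On EKS worker nodes, kubelet flags are commonly passed through the bootstrap script in the launch template user data. The sketch below assumes self-managed nodes (or a launch template with a custom AMI) based on the Amazon Linux 2 EKS optimized AMI; the variable and resource names are hypothetical, and the AMI, instance type and other launch template settings are omitted.

```hcl
variable "cluster_name" {
  type = string
}

locals {
  # Assumption: the node uses the Amazon Linux 2 EKS optimized AMI, whose
  # bootstrap script accepts extra kubelet flags via --kubelet-extra-args.
  node_user_data = <<-EOT
    #!/bin/bash
    set -o errexit
    /etc/eks/bootstrap.sh ${var.cluster_name} \
      --kubelet-extra-args '--seccomp-default'
  EOT
}

# Hypothetical launch template; only the parts relevant to the flag are shown.
resource "aws_launch_template" "workers" {
  name_prefix = "${var.cluster_name}-workers-"
  user_data   = base64encode(local.node_user_data)
}
```

It is worth rolling this out to a small node group first, since workloads that rely on syscalls blocked by the runtime's default profile may need an explicit seccomp profile.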
Mutable scheduling directives for Jobs graduates to GA
This feature was introduced in v1.22, started at beta level, and is now stable. In most cases a parallel job will want its pods to run with constraints, such as all in the same zone, or all on either GPU model x or y but not a mix of both.
The feature allows updating a Job’s scheduling directives before it starts, which gives custom queue controllers the ability to influence pod placement while offloading the actual pod-to-node assignment to kube-scheduler. This is allowed only for suspended Jobs that have never been unsuspended before. The fields in a Job’s pod template that can be updated are node affinity, node selector, tolerations, labels, annotations, and scheduling gates. Find more details in the KEP: Allow updating scheduling directives of jobs.
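To make this concrete, here is a sketch of a Job that is created suspended, so that its scheduling directives can still be changed before it starts. The manifest is built as a Terraform object and rendered to YAML; the job name, image, zone and counts are illustrative assumptions only.

```hcl
locals {
  suspended_job = {
    apiVersion = "batch/v1"
    kind       = "Job"
    metadata   = { name = "example-parallel-job" } # hypothetical name
    spec = {
      suspend     = true # created suspended, so it has never started
      parallelism = 3
      completions = 3
      template = {
        spec = {
          restartPolicy = "Never"
          # While the Job is still suspended, a queue controller could
          # change this selector (or affinity, tolerations, and so on).
          nodeSelector = {
            "topology.kubernetes.io/zone" = "eu-west-1a"
          }
          containers = [{
            name    = "worker"
            image   = "busybox:1.36"
            command = ["sh", "-c", "echo processing && sleep 30"]
          }]
        }
      }
    }
  }
}

output "suspended_job_yaml" {
  value = yamlencode(local.suspended_job)
}
```

The rendered YAML can be applied with, for example, `terraform output -raw suspended_job_yaml | kubectl apply -f -`. Once the pod template has been adjusted, setting `suspend` to `false` starts the Job, and from then on the scheduling directives can no longer be changed.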
DownwardAPIHugePages graduates to stable
In Kubernetes v1.20, support for requests.hugepages-<pagesize> and limits.hugepages-<pagesize> was added to the downward API to be consistent with other resources like cpu, memory, and ephemeral storage. This feature graduates to stable in this release. You can find more details in the KEP: Downward API HugePages.
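As an illustration, the sketch below exposes a container's huge page limit to the application through an environment variable. It assumes nodes with pre-allocated 2Mi huge pages; the pod name, image, sizes and the variable name `HUGEPAGES_LIMIT_BYTES` are hypothetical.

```hcl
locals {
  hugepages_pod = {
    apiVersion = "v1"
    kind       = "Pod"
    metadata   = { name = "hugepages-demo" } # hypothetical name
    spec = {
      restartPolicy = "Never"
      containers = [{
        name    = "app"
        image   = "busybox:1.36"
        command = ["sh", "-c", "echo hugepages limit: $HUGEPAGES_LIMIT_BYTES && sleep 3600"]
        resources = {
          # Huge page requests must equal their limits.
          requests = { "hugepages-2Mi" = "64Mi", memory = "128Mi", cpu = "100m" }
          limits   = { "hugepages-2Mi" = "64Mi", memory = "128Mi", cpu = "100m" }
        }
        env = [{
          name = "HUGEPAGES_LIMIT_BYTES"
          valueFrom = {
            # Downward API reference to the container's own huge page limit.
            resourceFieldRef = {
              containerName = "app"
              resource      = "limits.hugepages-2Mi"
            }
          }
        }]
      }]
    }
  }
}

output "hugepages_pod_yaml" {
  value = yamlencode(local.hugepages_pod)
}
```

The pod will stay pending on nodes that have no 2Mi huge pages available, so this is only useful on node groups configured for them.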
More fine-grained pod topology spread policies reached beta
Kubernetes v1.27 brings more fine-grained pod topology spread policies. These features, described in KEPs #3022, #3094 and #3243, are enabled by default and therefore available immediately. Together they help distribute workloads evenly, improve resilience, and simplify rolling upgrades.
- #3022 introduces the minDomains parameter, which lets you set the minimum number of domains your pods should occupy, guaranteeing a balanced spread of workloads across the cluster.
- #3094 introduces the nodeAffinityPolicy and nodeTaintPolicy parameters, which give finer control over pod distribution according to node affinities and taints. This feature is controlled by the NodeInclusionPolicyInPodTopologySpread feature gate, now enabled by default.
- #3243 adds the matchLabelKeys field to the topologySpreadConstraints of your pod’s specification, which permits the selection of pods for spreading calculations following a rolling upgrade.
You can find out more on Kubernetes 1.27: More fine-grained pod topology spread policies reached beta.
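The three additions can be combined in a single constraint. The following is a sketch rather than a recommended configuration: the Deployment name, labels, image, replica count and the chosen values are assumptions for illustration.

```hcl
locals {
  app_labels = { app = "web" } # hypothetical label

  spread_deployment = {
    apiVersion = "apps/v1"
    kind       = "Deployment"
    metadata   = { name = "web" }
    spec = {
      replicas = 6
      selector = { matchLabels = local.app_labels }
      template = {
        metadata = { labels = local.app_labels }
        spec = {
          topologySpreadConstraints = [{
            maxSkew            = 1
            topologyKey        = "topology.kubernetes.io/zone"
            whenUnsatisfiable  = "DoNotSchedule" # minDomains requires DoNotSchedule
            labelSelector      = { matchLabels = local.app_labels }
            minDomains         = 3                     # KEP #3022
            nodeAffinityPolicy = "Honor"               # KEP #3094
            nodeTaintPolicy    = "Honor"               # KEP #3094
            matchLabelKeys     = ["pod-template-hash"] # KEP #3243
          }]
          containers = [{
            name  = "web"
            image = "nginx:1.25"
          }]
        }
      }
    }
  }
}

output "spread_deployment_yaml" {
  value = yamlencode(local.spread_deployment)
}
```

Using `pod-template-hash` in `matchLabelKeys` means that only pods from the same Deployment revision are counted when the skew is calculated, which is what keeps a rolling upgrade from being skewed by pods of the old revision.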
Raised default API query-per-second limits for kubelet
In versions prior to v1.27, the Amazon EKS kubelet had default values of 10 requests per second for kubeAPIQPS and a burst limit of 20 requests for kubeAPIBurst, which limit the rate at which the kubelet sends requests to the Kubernetes API server. The new default values are 50 requests per second for kubeAPIQPS and a burst limit of 100 requests for kubeAPIBurst. These changes improve how quickly pods start running on new nodes when there is a sudden demand for additional resources.
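If you prefer to pin these values explicitly, or want the higher limits on node groups that still run an older version, they can be passed as kubelet flags. This is a sketch under the assumption that your nodes are bootstrapped with `--kubelet-extra-args`; the local and output names are hypothetical.

```hcl
locals {
  # Values mirror the new EKS 1.27 defaults described above.
  kubelet_flags = [
    "--kube-api-qps=50",
    "--kube-api-burst=100",
  ]

  kubelet_extra_args = join(" ", local.kubelet_flags)
}

# String that can be appended to the bootstrap.sh call in the node user data.
output "bootstrap_arguments" {
  value = "--kubelet-extra-args '${local.kubelet_extra_args}'"
}
```

On 1.27 nodes this is not needed unless you want values different from the new defaults.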
Removal of --container-runtime command line argument
The --container-runtime command line argument for the kubelet has been removed. The default container runtime for Amazon EKS has been containerd since v1.24, which eliminates the need to specify the container runtime. It is important that you do not pass this argument in --kubelet-extra-args, in order to prevent errors during the node bootstrap process. You must remove the --container-runtime argument from all your node creation workflows and build scripts.
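One way to stop the flag from sneaking back in is to validate the input at plan time. The sketch below assumes the extra kubelet flags are passed around in a Terraform variable; the variable name and its default are hypothetical.

```hcl
variable "kubelet_extra_args" {
  description = "Extra kubelet flags passed to bootstrap.sh (hypothetical variable)."
  type        = string
  default     = "--node-labels=role=worker"

  validation {
    # regexall returns every match; an empty list means the removed flag is absent.
    # The (=|\s|$) suffix avoids flagging --container-runtime-endpoint.
    condition     = length(regexall("--container-runtime(=|\\s|$)", var.kubelet_extra_args)) == 0
    error_message = "Remove --container-runtime from kubelet_extra_args. The kubelet in 1.27 no longer accepts it."
  }
}
```

With this in place, `terraform plan` fails early instead of the node failing during bootstrap.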
Upgrade your EKS with Terraform
This time the control plane upgrade took around 9 minutes and didn’t cause any issues. AWS is doing a great job at reducing the time it takes to upgrade the EKS control plane.
I immediately upgraded the worker nodes, which took around 10–20 minutes to join the upgraded EKS cluster. This time depends on how many worker nodes you have and how many pods need to be drained from the old nodes.
Overall, the full upgrade process (control plane plus worker nodes) took around 22 minutes. A really good time, I would say.
I personally use Terraform to deploy and upgrade my EKS clusters. Here is an example of the EKS cluster resource.
resource "aws_eks_cluster" "cluster" {
  enabled_cluster_log_types = ["audit"]
  name                      = local.name_prefix
  role_arn                  = aws_iam_role.cluster.arn
  version                   = "1.27" # bump this value to upgrade the control plane

  vpc_config {
    subnet_ids              = flatten([module.vpc.public_subnets, module.vpc.private_subnets])
    security_group_ids      = []
    endpoint_private_access = true
    endpoint_public_access  = true
  }

  # Envelope encryption of Kubernetes secrets with a KMS key
  encryption_config {
    resources = ["secrets"]
    provider {
      key_arn = module.kms-eks.key_arn
    }
  }

  tags = var.tags
}
For worker nodes I used the official AMI with ID ami-017dc85cb46681399. I didn’t notice any issues after rotating all nodes. Nodes are running the following version: v1.27.1-eks-2f008fe
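AMI IDs differ per region and change with every AMI release, so instead of hard-coding one you can look up the recommended ID from the public SSM parameter that AWS publishes. This sketch assumes the Amazon Linux 2 EKS optimized AMI; the variable and output names are hypothetical.

```hcl
variable "kubernetes_version" {
  type    = string
  default = "1.27"
}

# Public SSM parameter published by AWS for the Amazon Linux 2 EKS optimized AMI.
data "aws_ssm_parameter" "eks_ami" {
  name = "/aws/service/eks/optimized-ami/${var.kubernetes_version}/amazon-linux-2/recommended/image_id"
}

output "eks_worker_ami_id" {
  # The provider marks SSM values as sensitive; an AMI ID is not a secret.
  value = nonsensitive(data.aws_ssm_parameter.eks_ami.value)
}
```

Keep in mind that the recommended ID moves over time, so a later plan may propose a node rotation once AWS publishes a new AMI.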
Templates I use for creating EKS clusters using Terraform can be found in my Github repository reachable under https://github.com/marcincuber/eks/tree/master/terraform-aws
Please note that after the EKS upgrade, the API server was not reachable for about 45 seconds. Requests were handled again after that.
Upgrading Managed EKS Add-ons
In this case the change is trivial and works fine: simply update the version of the add-on. In my case, as of this release I use kube-proxy, CoreDNS and ebs-csi-driver.
Terraform resources for add-ons
# The cluster resource above has no count, so it is referenced without an index.
resource "aws_eks_addon" "kube_proxy" {
  cluster_name  = aws_eks_cluster.cluster.name
  addon_name    = "kube-proxy"
  addon_version = "v1.27.1-eksbuild.1"
  # AWS provider v5+ deprecates resolve_conflicts in favour of
  # resolve_conflicts_on_create and resolve_conflicts_on_update.
  resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "core_dns" {
  cluster_name      = aws_eks_cluster.cluster.name
  addon_name        = "coredns"
  addon_version     = "v1.10.1-eksbuild.1"
  resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "aws_ebs_csi_driver" {
  cluster_name      = aws_eks_cluster.cluster.name
  addon_name        = "aws-ebs-csi-driver"
  addon_version     = "v1.19.0-eksbuild.1"
  resolve_conflicts = "OVERWRITE"
}
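If you would rather not hard-code add-on versions, the AWS provider offers a data source that resolves a version compatible with a given Kubernetes version. Below is a sketch for the VPC CNI add-on; the variable names are hypothetical and this is not how the resources above are wired.

```hcl
variable "cluster_name" {
  type = string
}

variable "cluster_version" {
  type    = string
  default = "1.27"
}

# Resolves an add-on version that is compatible with the cluster version.
data "aws_eks_addon_version" "vpc_cni" {
  addon_name         = "vpc-cni"
  kubernetes_version = var.cluster_version
  most_recent        = true # set to false to get the default version instead
}

resource "aws_eks_addon" "vpc_cni" {
  cluster_name  = var.cluster_name
  addon_name    = "vpc-cni"
  addon_version = data.aws_eks_addon_version.vpc_cni.version
}
```

The trade-off is reproducibility: with `most_recent = true` a later plan may pick up a newer add-on build than the one you tested, whereas pinned versions only change when you change them.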
After upgrading EKS control-plane
Remember to upgrade core deployments and daemon sets that are recommended for EKS 1.27.
- CoreDNS: v1.10.1-eksbuild.1
- Kube-proxy: v1.27.1-eksbuild.1
- VPC CNI: v1.12.2-eksbuild.1
- aws-ebs-csi-driver: v1.19.0-eksbuild.1
The above is just a recommendation from AWS. You should look at upgrading all your components to match the 1.27 Kubernetes version. They could include:
- load balancer controller
- calico-node
- cluster-autoscaler or Karpenter
- external secrets operator
- kube-state-metrics
- metrics-server
- csi-secrets-store
- calico-typha and calico-typha-horizontal-autoscaler
- reloader
Summary and Conclusions
An even quicker upgrade of the EKS cluster than ever before. The control plane upgrade was completed in about 10 minutes. I use Terraform to run my cluster and node upgrades, so the pipeline made my life super easy.
Yet again, no significant issues. I hope you will have the same easy job to perform. All workloads worked just fine, and I didn’t really have to modify anything.
If you are interested in the entire terraform setup for EKS, you can find it on my GitHub -> https://github.com/marcincuber/eks/tree/master/terraform-aws
I hope this article nicely aggregates all the important information around upgrading EKS to version 1.27 and helps people speed up their task.
Long story short, you hate and/or you love Kubernetes but you still use it ;).
Enjoy Kubernetes!!!
Sponsor Me
Like with any other story on Medium written by me, I performed the tasks documented. This is my own research and issues I have encountered.
Thanks for reading everybody. Marcin Cuber