Amazon EKS - migrating Karpenter resources from Beta to v1 API version

Marcin Cuber
7 min read · Aug 16, 2024


A detailed migration journey from the v1beta1 to the v1 API version of karpenter.sh, using a GitOps strategy (Flux v2).

Introduction

Karpenter is a Kubernetes node lifecycle manager created by AWS. It is responsible for provisioning, running, scaling and consolidating cluster nodes. It is a major project that was started to provide a better alternative to cluster-autoscaler, a goal Karpenter achieved in a remarkably short time.

With the release of v1.0.0, Karpenter has graduated out of beta and ships with stable APIs. NodePool and EC2NodeClass will remain available throughout the 1.0 minor releases and will not be modified in ways that introduce breaking changes from one minor release to another.

In my previous story I covered the migration from the alpha to the beta API version. As of yesterday, the Karpenter v1 API version is available, and that is what I want to focus on here.

I have based my own work on the official documentation, which you can find at:

  1. v1 migration procedure
  2. AWS announcing Karpenter v1

If you have read any of my previous stories, you probably know that I am utilising Flux v2 (a GitOps strategy) to deploy all my YAML manifests and Helm charts. Before I started my upgrade I read the migration procedure and thought it would be even easier with Flux, which would deploy everything nicely without much input from me. How wrong was I?! It turns out I was very wrong and ended up debugging Karpenter v1 resources for another 5 to 6 hours.

Based on that experience, I would like to share how to do this migration in a less stressful way, and hopefully in a way where GitOps mechanisms like Flux and Argo CD can be put to good use.

What is changing

It is worth bringing this up front as there are a lot of changes, and even though I read them all before upgrading, I still missed a couple that I needed to fix afterwards. Note that all of the information below is copied straight from the official migration guide.

Features:

  • AMI Selector Terms has a new Alias field which can only be set by itself in EC2NodeClass.Spec.AMISelectorTerms
  • Disruption Budgets by Reason was added to NodePool.Spec.Disruption.Budgets
  • TerminationGracePeriod was added to NodePool.Spec.Template.Spec.
  • LOG_OUTPUT_PATHS and LOG_ERROR_OUTPUT_PATHS environment variables added
  • API Rename: NodePool’s ConsolidationPolicy WhenUnderutilized is now renamed to WhenEmptyOrUnderutilized
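To make these new fields concrete, here is a minimal sketch of a hypothetical v1 NodePool fragment using the renamed consolidation policy, per-reason disruption budgets and terminationGracePeriod. The names and values are illustrative, not taken from my cluster:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: example                         # hypothetical name
spec:
  template:
    spec:
      # New in v1: hard limit on how long Karpenter waits for a node to drain
      terminationGracePeriod: 30m
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: example
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
  disruption:
    # Renamed from WhenUnderutilized in v1beta1
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
      # New in v1: budgets can be scoped to specific disruption reasons
      - nodes: "20%"
        reasons: ["Empty", "Drifted"]

The new Alias field of AMISelectorTerms lives on the EC2NodeClass and is demonstrated in my own manifests later in this post.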

Behavior Changes:

  • Expiration is now forceful and begins draining as soon as it’s expired. Karpenter does not wait for replacement capacity to be available before draining, but will start provisioning a replacement as soon as the node is expired and begins draining.
  • Karpenter’s generated NodeConfig now takes precedence when generating UserData with the AL2023 amiFamily. If you’re setting any values managed by Karpenter in your AL2023 UserData, configure these through Karpenter natively (e.g. kubelet configuration fields).
  • Karpenter now adds a karpenter.sh/unregistered:NoExecute taint to nodes in injected UserData when using alias in AMISelectorTerms or a non-Custom AMIFamily. When using amiFamily: Custom, users will need to add this taint into their own UserData, and Karpenter will automatically remove it when provisioning nodes (see the sketch after this list).
  • Discovered standard AL2023 AMIs will no longer be considered compatible with GPU / accelerator workloads. If you’re using an AL2023 EC2NodeClass (without AMISelectorTerms) for these workloads, you will need to select your AMI via AMISelectorTerms (non-alias).
  • Karpenter now waits for underlying instances to be completely terminated before removing the associated nodes. This means it may take longer for nodes to be deleted and for nodeclaims to get cleaned up.
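To illustrate the unregistered-taint requirement for amiFamily: Custom, below is a rough, hypothetical EC2NodeClass sketch that registers the taint through kubelet's --register-with-taints flag. It assumes an AL2-style custom AMI that still ships /etc/eks/bootstrap.sh; the AMI id, cluster name, role and selector tags are placeholders, and your own bootstrap invocation will differ:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: custom-example                   # hypothetical name
spec:
  amiFamily: Custom
  amiSelectorTerms:
    - id: ami-0123456789abcdef0          # hypothetical custom AMI
  role: "my-cluster-node"                # hypothetical node IAM role
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  userData: |
    #!/bin/bash
    # Register the node with the taint Karpenter expects to find and
    # will remove itself once the node is recognised.
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=karpenter.sh/unregistered:NoExecute'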

API Moves:

  • ExpireAfter has moved from the NodePool.Spec.Disruption block to NodePool.Spec.Template.Spec, and is now a drift-able field.
  • Kubelet was moved to the EC2NodeClass from the NodePool.
  • RBAC changes: added delete pods | added get, patch crds | added update nodes | removed create nodes

Breaking API (Manual Migration Needed):

  • Ubuntu is dropped as a first class supported AMI Family
  • karpenter.sh/do-not-consolidate (annotation), karpenter.sh/do-not-evict (annotation), and karpenter.sh/managed-by (tag) are all removed. karpenter.sh/managed-by, which currently stores the cluster name in its value, will be replaced by eks:eks-cluster-name
  • The taint used to mark nodes for disruption and termination changed from karpenter.sh/disruption=disrupting:NoSchedule to karpenter.sh/disrupted:NoSchedule. It is not recommended to tolerate this taint; however, if you were tolerating it in your applications, you'll need to adjust your tolerations to reflect this.
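As an example of that last point, a workload that previously tolerated the beta disruption taint would, under v1, need a toleration roughly like the following pod spec fragment (again, tolerating this taint is generally not recommended):

# Pod spec fragment
tolerations:
  # v1beta1 equivalent was key karpenter.sh/disruption, value disrupting, effect NoSchedule
  - key: karpenter.sh/disrupted
    operator: Exists
    effect: NoSchedule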

Environment Variable Changes:

  • LOGGING_CONFIG, ASSUME_ROLE_ARN, ASSUME_ROLE_DURATION Dropped
  • LEADER_ELECT renamed to DISABLE_LEADER_ELECTION
  • FEATURE_GATES.DRIFT=true was dropped and promoted to Stable, and cannot be disabled.
  • Users currently opting out of drift by disabling the drift feature flag will no longer be able to do so.

Defaults changed:

  • API: Karpenter will drop support for IMDS access from containers by default on new EC2NodeClasses by updating the default of httpPutResponseHopLimit from 2 to 1.
  • API: ConsolidateAfter is required. Users previously could not set this with ConsolidationPolicy: WhenUnderutilized; it is now required. Users can set it to 0 to get the same behavior as in v1beta1.
  • API: All NodeClassRef fields are now required, and apiVersion has been renamed to group
  • API: AMISelectorTerms are required. Setting an Alias cannot be done with any other type of term, and must match the AMI Family that’s set or be Custom.
  • Helm: the Deployment spec's TopologySpreadConstraint changed to require zonal spread instead of preferring it. Users who had only one node running their Karpenter deployment need to do one of the following (see the values sketch after this list):
      • have two nodes in different zones to ensure both Karpenter replicas schedule,
      • scale their Karpenter replicas down from 2 to 1 in the Helm chart, or
      • edit and relax the topology spread constraint in their Helm chart from DoNotSchedule to ScheduleAnyway.
  • Helm/Binary: controller.METRICS_PORT default changed back to 8080
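For the topology spread change, the simplest option in a Flux-managed setup is to adjust the chart values. A minimal sketch of the second option (scaling down to a single replica) looks like this; it assumes the same HelmRelease values layout I show later in this post:

# Fragment of the Karpenter controller HelmRelease
values:
  # A single-node cluster cannot satisfy the new required zonal spread
  # with two replicas, so run one replica instead.
  replicas: 1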

Deployments and Upgrades

First and foremost, it is super important to disable the Karpenter webhook, which is enabled by default in v1. I didn't disable it, and it caused many issues because the webhook was modifying the same resources as Flux. This resulted in hours of debugging, removing redundant annotations and wasting time reviewing CRD code. So if you want to save time, simply disable the webhook and your life will be simpler.

IAM Role policy

The IAM role used by the Karpenter controller needs its policy updated with new conditions around the ec2:CreateTags action. You can see the specific changes in this git commit.

  • The full IAM role policy written in Terraform can be found in my eks repo.
  • A CloudFormation template can be found in the official docs.

Helm Releases

It is worth noting that the CRD HelmRelease is always deployed before the actual Karpenter controller HelmRelease. I set this dependency explicitly in the Flux Kustomization.
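For completeness, this is roughly how such a dependency can be expressed in Flux. The Kustomization names and paths below are hypothetical and will differ in your repository:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: karpenter-crds
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/eu-dev/karpenter-crds
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: karpenter
  namespace: flux-system
spec:
  # Reconcile the CRD HelmRelease before the controller HelmRelease
  dependsOn:
    - name: karpenter-crds
  interval: 10m
  path: ./clusters/eu-dev/karpenter
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system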

Karpenter CRDs - before upgrade

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: karpenter-crds
  namespace: karpenter
spec:
  releaseName: karpenter-crds
  interval: 10m
  chart:
    spec:
      chart: karpenter-crd
      version: "0.37.1"
      sourceRef:
        kind: HelmRepository
        name: karpenter-crds
        namespace: flux-system
Karpenter CRDs - after upgrade

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: karpenter-crds
  namespace: karpenter
spec:
  releaseName: karpenter-crds
  interval: 10m
  chart:
    spec:
      chart: karpenter-crd
      version: "1.0.0"
      sourceRef:
        kind: HelmRepository
        name: karpenter-crds
        namespace: flux-system
  values:
    webhook:
      enabled: false

Note that the webhook is disabled in this release since we don't want to use it. As mentioned above, it caused more issues for me than I expected.

Karpenter Controller - before upgrade

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: karpenter
  namespace: karpenter
spec:
  releaseName: karpenter
  chart:
    spec:
      chart: karpenter
      version: 0.37.1
      sourceRef:
        kind: HelmRepository
        name: karpenter
        namespace: flux-system
  interval: 1h0m0s
  install:
    remediation:
      retries: 3
  values:
    replicas: 3
    serviceAccount:
      name: karpenter

Karpenter Controller - after upgrade

---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: karpenter
  namespace: karpenter
spec:
  releaseName: karpenter
  chart:
    spec:
      chart: karpenter
      version: 1.0.0
      sourceRef:
        kind: HelmRepository
        name: karpenter
        namespace: flux-system
  interval: 1h0m0s
  install:
    remediation:
      retries: 3
  values:
    replicas: 3
    serviceAccount:
      name: karpenter
    webhook:
      enabled: false

Again, note the webhook being disabled so that our YAML templates are not auto-updated inside the cluster.

NodePools and EC2NodeClasses

Before Upgrade - beta resources

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "eu-dev"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "eu-dev"
  role: "eu-dev-node"
  tags:
    Name: "eu-dev-node-default"
    Intent: "default"
    Environment: "dev"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true
  detailedMonitoring: true
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        intent: default
        ami-family: AL2023
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["m7i.xlarge", "m7i.2xlarge"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot"]
      kubelet:
        clusterDNS: ["10.0.0.10"]
        maxPods: 234
  limits:
    cpu: 8
    memory: 32Gi
  disruption:
    expireAfter: 1440h

After Upgrade - v1 resources

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  kubelet:
    clusterDNS: ["10.10.0.10"]
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "eu-dev"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "eu-dev"
  role: "eu-dev-node"
  metadataOptions:
    httpEndpoint: enabled
    httpPutResponseHopLimit: 2
  tags:
    Name: "eu-dev-node-default"
    Intent: "default"
    Environment: "dev"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true
  detailedMonitoring: true
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        intent: default
        ami-family: AL2023
    spec:
      expireAfter: 1440h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["m7i.xlarge", "m7i.2xlarge"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot"]
  limits:
    cpu: 8
    memory: 32Gi

As you can see, there are a couple of fields that have been relocated from one resource to the other. You can read about them all in the changes section above.

Notably, keep an eye on the following:

  • kubelet section - moved from the NodePool to the EC2NodeClass
  • nodeClassRef uses group instead of apiVersion
  • amiSelectorTerms are now required in the EC2NodeClass; I use "alias: al2023@latest"
  • expireAfter has moved out of the disruption block; you now define it in the spec.template.spec block
  • super important: note the addition of metadataOptions.httpPutResponseHopLimit: 2. If you don't set it, the new default of 1 blocks IMDS access from containers, which will break your LB controller and Flux v2 image automation apps.

Conclusion

Looking at this upgrade/migration in perspective, I think it is relatively straightforward. However, what went wrong in my case was the use of GitOps in combination with an active Karpenter webhook.

So my honest advice is: disable the webhook, gradually rewrite your NodePool and EC2NodeClass resources, and simply apply them. Updating those resources will not cause recreation of nodes.

Sponsor Me

As with any other story I have written on Medium, I performed the tasks documented here myself. This is my own research and these are the issues I encountered.

Thanks for reading everybody. Marcin Cuber
