Amazon EKS - migrating Karpenter resources from Beta to v1 API version

Marcin Cuber
7 min read · Aug 16, 2024


A detailed migration journey from the v1beta1 to the v1 API version of karpenter.sh, using a GitOps strategy (Flux v2).

Introduction

Karpenter is a Kubernetes node lifecycle manager created by AWS. It is responsible for provisioning, running, scaling and consolidating cluster nodes. It is a major project that was started to provide a better alternative to cluster-autoscaler, a goal Karpenter achieved in a remarkably short time.

With the release of v1.0.0, Karpenter has graduated out of beta and ships with stable APIs. NodePool and EC2NodeClass will remain available throughout the 1.0 minor releases and will not be modified in ways that introduce breaking changes from one minor release to another.

In my previous story I covered the migration from the alpha to the beta API version. As of yesterday, the Karpenter v1 API version is available, and that is what I want to focus on here.

I have based my own work on the official documentation, which you can find at:

  1. v1 migration procedure
  2. AWS announcing Karpenter v1

If you have read any of my previous stories, you probably know that I am utilising Flux v2 (a GitOps strategy) to deploy all my YAML manifests and Helm charts. Before I started my upgrade I read the migration procedure and thought it would be even easier with Flux, which would deploy everything nicely without much input from me. How wrong was I?! It turns out I was very wrong and ended up debugging Karpenter v1 resources for another 5 to 6 hours.

Based on that experience, I would like to share how to do this migration in a less stressful way, and hopefully in a way where GitOps mechanisms like Flux and Argo CD can be put to good use.

What is changing

It is worth bringing this up front as there are a lot of changes, and even though I read them all before upgrading, I still missed a couple that I needed to fix afterwards. Note that all of the information below is copied straight from the official migration guide.

Features:

  • AMI Selector Terms has a new Alias field which can only be set by itself in EC2NodeClass.Spec.AMISelectorTerms
  • Disruption Budgets by Reason was added to NodePool.Spec.Disruption.Budgets
  • TerminationGracePeriod was added to NodePool.Spec.Template.Spec.
  • LOG_OUTPUT_PATHS and LOG_ERROR_OUTPUT_PATHS environment variables added
  • API Rename: NodePool’s ConsolidationPolicy WhenUnderutilized is now renamed to WhenEmptyOrUnderutilized
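To make these new fields concrete, here is a minimal sketch of a hypothetical v1 NodePool fragment using the renamed consolidation policy, per-reason disruption budgets and terminationGracePeriod. The names and values are illustrative, not taken from my cluster:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: example                         # hypothetical name
spec:
  template:
    spec:
      # New in v1: hard limit on how long Karpenter waits for a node to drain
      terminationGracePeriod: 30m
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: example
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
  disruption:
    # Renamed from WhenUnderutilized in v1beta1
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
      # New in v1: budgets can be scoped to specific disruption reasons
      - nodes: "20%"
        reasons: ["Empty", "Drifted"]

The new Alias field of AMISelectorTerms lives on the EC2NodeClass and is demonstrated in my own manifests later in this post.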

Behavior Changes:

  • Expiration is now forceful and begins draining as soon as it’s expired. Karpenter does not wait for replacement capacity to be available before draining, but will start provisioning a replacement as soon as the node is expired and begins draining.
  • Karpenter’s generated NodeConfig now takes precedence when generating UserData with the AL2023 amiFamily. If you’re setting any values managed by Karpenter in your AL2023 UserData, configure these through Karpenter natively (e.g. kubelet configuration fields).
  • Karpenter now adds a karpenter.sh/unregistered:NoExecute taint to nodes in injected UserData when using alias in AMISelectorTerms or a non-Custom AMIFamily. When using amiFamily: Custom, users will need to add this taint into their own UserData, and Karpenter will automatically remove it when provisioning nodes (see the sketch after this list).
  • Discovered standard AL2023 AMIs will no longer be considered compatible with GPU / accelerator workloads. If you’re using an AL2023 EC2NodeClass (without AMISelectorTerms) for these workloads, you will need to select your AMI via AMISelectorTerms (non-alias).
  • Karpenter now waits for underlying instances to be completely terminated before removing the associated nodes. This means it may take longer for nodes to be deleted and for nodeclaims to get cleaned up.
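To illustrate the unregistered-taint requirement for amiFamily: Custom, below is a rough, hypothetical EC2NodeClass sketch that registers the taint through kubelet's --register-with-taints flag. It assumes an AL2-style custom AMI that still ships /etc/eks/bootstrap.sh; the AMI id, cluster name, role and selector tags are placeholders, and your own bootstrap invocation will differ:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: custom-example                   # hypothetical name
spec:
  amiFamily: Custom
  amiSelectorTerms:
    - id: ami-0123456789abcdef0          # hypothetical custom AMI
  role: "my-cluster-node"                # hypothetical node IAM role
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  userData: |
    #!/bin/bash
    # Register the node with the taint Karpenter expects to find and
    # will remove itself once the node is recognised.
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--register-with-taints=karpenter.sh/unregistered:NoExecute'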

API Moves:

  • ExpireAfter has moved from the NodePool.Spec.Disruption block to NodePool.Spec.Template.Spec, and is now a drift-able field.
  • Kubelet was moved to the EC2NodeClass from the NodePool.
  • RBAC changes: added delete pods | added get, patch crds | added update nodes | removed create nodes

Breaking API (Manual Migration Needed):

  • Ubuntu is dropped as a first class supported AMI Family
  • karpenter.sh/do-not-consolidate (annotation), karpenter.sh/do-not-evict (annotation), and karpenter.sh/managed-by (tag) are all removed. karpenter.sh/managed-by, which currently stores the cluster name in its value, will be replaced by eks:eks-cluster-name
  • The taint used to mark nodes for disruption and termination changed from karpenter.sh/disruption=disrupting:NoSchedule to karpenter.sh/disrupted:NoSchedule. It is not recommended to tolerate this taint; however, if you were tolerating it in your applications, you'll need to adjust your tolerations to reflect this.
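As an example of that last point, a workload that previously tolerated the beta disruption taint would, under v1, need a toleration roughly like the following pod spec fragment (again, tolerating this taint is generally not recommended):

# Pod spec fragment
tolerations:
  # v1beta1 equivalent was key karpenter.sh/disruption, value disrupting, effect NoSchedule
  - key: karpenter.sh/disrupted
    operator: Exists
    effect: NoSchedule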

Environment Variable Changes:

  • LOGGING_CONFIG, ASSUME_ROLE_ARN, ASSUME_ROLE_DURATION Dropped
  • LEADER_ELECT renamed to DISABLE_LEADER_ELECTION
  • FEATURE_GATES.DRIFT=true was dropped and promoted to Stable, and cannot be disabled.
  • Users currently opting out of drift by disabling the drift feature flag will no longer be able to do so.

Defaults changed:

  • API: Karpenter will drop support for IMDS access from containers by default on new EC2NodeClasses by updating the default of httpPutResponseHopLimit from 2 to 1.
  • API: ConsolidateAfter is required. Users previously could not set this with ConsolidationPolicy: WhenUnderutilized; it is now required. Users can set it to 0 to get the same behavior as in v1beta1.
  • API: All NodeClassRef fields are now required, and apiVersion has been renamed to group
  • API: AMISelectorTerms are required. Setting an Alias cannot be done with any other type of term, and must match the AMI Family that’s set or be Custom.
  • Helm: the Deployment spec's TopologySpreadConstraint changed to require zonal spread instead of preferring it. Users who had only one node running their Karpenter deployment need to do one of the following (see the values sketch after this list):
      • have two nodes in different zones to ensure both Karpenter replicas schedule,
      • scale their Karpenter replicas down from 2 to 1 in the Helm chart, or
      • edit and relax the topology spread constraint in their Helm chart from DoNotSchedule to ScheduleAnyway.
  • Helm/Binary: controller.METRICS_PORT default changed back to 8080
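For the topology spread change, the simplest option in a Flux-managed setup is to adjust the chart values. A minimal sketch of the second option (scaling down to a single replica) looks like this; it assumes the same HelmRelease values layout I show later in this post:

# Fragment of the Karpenter controller HelmRelease
values:
  # A single-node cluster cannot satisfy the new required zonal spread
  # with two replicas, so run one replica instead.
  replicas: 1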

Deployments and Upgrades

First and foremost, it is super important to disable the Karpenter webhook, which is enabled by default in v1. I didn't disable it, and it caused many issues because the webhook was modifying the same resources as Flux. This resulted in hours of debugging, removing redundant annotations and wasting time reviewing CRD code. So if you want to save time, simply disable the webhook and your life will be simpler.

IAM Role policy

The IAM role used by the Karpenter controller needs its policy updated with new conditions around the ec2:CreateTags action. You can see the specific changes in this git commit.

  • The full IAM role policy written in Terraform can be found in my eks repo.
  • A CloudFormation template can be found in the official docs.

Helm Releases

It is worth noting that the CRD HelmRelease is always deployed before the actual Karpenter controller HelmRelease. I set this dependency explicitly in the Flux Kustomization.
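For completeness, this is roughly how such a dependency can be expressed in Flux. The Kustomization names and paths below are hypothetical and will differ in your repository:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: karpenter-crds
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/eu-dev/karpenter-crds
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: karpenter
  namespace: flux-system
spec:
  # Reconcile the CRD HelmRelease before the controller HelmRelease
  dependsOn:
    - name: karpenter-crds
  interval: 10m
  path: ./clusters/eu-dev/karpenter
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system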

Karpenter CRDs - before upgrade

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: karpenter-crds
  namespace: karpenter
spec:
  releaseName: karpenter-crds
  interval: 10m
  chart:
    spec:
      chart: karpenter-crd
      version: "0.37.1"
      sourceRef:
        kind: HelmRepository
        name: karpenter-crds
        namespace: flux-system
Karpenter CRDs - after upgrade

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: karpenter-crds
  namespace: karpenter
spec:
  releaseName: karpenter-crds
  interval: 10m
  chart:
    spec:
      chart: karpenter-crd
      version: "1.0.0"
      sourceRef:
        kind: HelmRepository
        name: karpenter-crds
        namespace: flux-system
  values:
    webhook:
      enabled: false

Note that the webhook is disabled in this release since we don't want to use it. As mentioned above, it caused more issues for me than I expected.

Karpenter Controller - before upgrade

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: karpenter
  namespace: karpenter
spec:
  releaseName: karpenter
  chart:
    spec:
      chart: karpenter
      version: 0.37.1
      sourceRef:
        kind: HelmRepository
        name: karpenter
        namespace: flux-system
  interval: 1h0m0s
  install:
    remediation:
      retries: 3
  values:
    replicas: 3
    serviceAccount:
      name: karpenter

Karpenter Controller - after upgrade

---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: karpenter
  namespace: karpenter
spec:
  releaseName: karpenter
  chart:
    spec:
      chart: karpenter
      version: 1.0.0
      sourceRef:
        kind: HelmRepository
        name: karpenter
        namespace: flux-system
  interval: 1h0m0s
  install:
    remediation:
      retries: 3
  values:
    replicas: 3
    serviceAccount:
      name: karpenter
    webhook:
      enabled: false

Again, note the webhook being disabled so that our YAML templates are not auto-updated inside the cluster.

NodePools and EC2NodeClasses

Before Upgrade - beta resources

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "eu-dev"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "eu-dev"
  role: "eu-dev-node"
  tags:
    Name: "eu-dev-node-default"
    Intent: "default"
    Environment: "dev"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true
  detailedMonitoring: true
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        intent: default
        ami-family: AL2023
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["m7i.xlarge", "m7i.2xlarge"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot"]
      kubelet:
        clusterDNS: ["10.0.0.10"]
        maxPods: 234
  limits:
    cpu: 8
    memory: 32Gi
  disruption:
    expireAfter: 1440h

After Upgrade - v1 resources

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  kubelet:
    clusterDNS: ["10.10.0.10"]
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "eu-dev"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "eu-dev"
  role: "eu-dev-node"
  metadataOptions:
    httpEndpoint: enabled
    httpPutResponseHopLimit: 2
  tags:
    Name: "eu-dev-node-default"
    Intent: "default"
    Environment: "dev"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true
  detailedMonitoring: true
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        intent: default
        ami-family: AL2023
    spec:
      expireAfter: 1440h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["m7i.xlarge", "m7i.2xlarge"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot"]
  limits:
    cpu: 8
    memory: 32Gi

As you can see, there are a couple of fields that have been relocated from one resource to the other. You can read about them all in the changes section above.

Notably, keep an eye on the following:

  • kubelet section - moved from the NodePool to the EC2NodeClass
  • nodeClassRef uses group instead of apiVersion
  • amiSelectorTerms are now required in the EC2NodeClass; I use "alias: al2023@latest"
  • expireAfter has moved out of the disruption block; you now define it in the spec.template.spec block
  • super important: note the addition of metadataOptions.httpPutResponseHopLimit: 2. If you don't set it, the new default of 1 blocks IMDS access from containers, which will break your LB controller and Flux v2 image automation apps.

Conclusion

Looking at this upgrade/migration in perspective, I think it is relatively straightforward. However, what went wrong in my case was the use of GitOps in combination with an active Karpenter webhook.

So my honest advice is: disable the webhook, gradually rewrite your NodePool and EC2NodeClass resources, and simply apply them. Updating those resources will not cause recreation of nodes.

Sponsor Me

As with any other story I have written on Medium, I performed the tasks documented here myself. This is my own research and these are the issues I encountered.

Thanks for reading everybody. Marcin Cuber
