Amazon EKS + managed node groups
EKS cluster configured with managed node groups using Terraform
Overview
In this story I am going to concentrate on the managed worker nodes, or managed node groups, feature for EKS. It is a recently released feature for Amazon’s managed Kubernetes service. I am also going to highlight the pros and cons of using managed node groups.
Implementation details and Terraform snippets can be found in this story in case you decide to make use of them. I am using the latest Terraform (0.12.19) and Terraform AWS provider (2.45.0).
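For reference, a minimal version-pinning block matching the versions above could look like the snippet below; the region is an assumption based on the examples later in this story.

terraform {
  required_version = "~> 0.12.19"

  required_providers {
    aws = "~> 2.45"
  }
}

provider "aws" {
  # Region assumed from the examples later in this story.
  region = "eu-west-1"
}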
Amazon EKS managed node groups automate the provisioning and lifecycle management of nodes (Amazon EC2 instances) for Amazon EKS Kubernetes clusters.
EKS Managed Node Groups details
Managed Node Groups are supported on Amazon EKS clusters beginning with Kubernetes version 1.14 and platform version eks.3. Existing clusters can update to version 1.14 to take advantage of this feature.
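In Terraform terms that simply means pinning the control-plane version to at least 1.14. A small sketch of the eks_version variable referenced later in this story (the default shown here is my assumption):

variable "eks_version" {
  description = "Kubernetes version for the EKS control plane; 1.14 or newer is required for managed node groups."
  default     = "1.14"
}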
Advantages
With Amazon EKS managed node groups, you don’t need to separately provision or register the Amazon EC2 instances that provide compute capacity to run your Kubernetes applications. You can create, update, or terminate nodes for your cluster with a single operation. Nodes run using the latest Amazon EKS-optimized AMIs in your AWS account while node updates and terminations gracefully drain nodes to ensure that your applications stay available.
Nodes launched as part of a managed node group are automatically tagged for auto-discovery by the Kubernetes cluster autoscaler and you can use the node group to apply Kubernetes labels to nodes and update them at any time.
Disadvantages
AMI versions
AMI versions aren’t up to date for managed node groups. At the time of writing I am fetching the latest AMI ID using the following:
data "aws_ssm_parameter" "eks_optimized_ami_id" {
name = "/aws/service/eks/optimized-ami/1.14/amazon-linux-2/recommended/image_id"
with_decryption = true
}
The above is used to fetch the AMI ID for my self-managed worker nodes and gives me a node with the following version:
ip-10-60-16-223.eu-west-1.compute.internal Ready v1.14.8-eks-b8860f
Managed node groups, on the other hand, use an older AMI and give me:
ip-10-60-1-166.eu-west-1.compute.internal Ready v1.14.7-eks-1861c5
Clearly, managed worker nodes are not patched to the latest version.
Rollback AMI
You cannot roll back a node group to an earlier Kubernetes version or AMI version. Knowing AWS, where even a component as crucial as aws-vpc-cni was recently released in a state that broke everything, rollback is in my opinion a must-have feature for managed node groups.
Security Groups
Managed node groups spin up nodes with two security groups. You must be very careful not to tag both security groups with the following tag:
kubernetes.io/cluster/eks-test-eu = "owned"
This is going to break Kubernetes and you won’t be able to spin up a load balancer in your AWS account using EKS; there is a strict condition that only a single security group may carry that tag.
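As an illustration, if you also run self-managed workers next to the managed groups, make sure only one security group carries the ownership tag. The resource below is a hypothetical sketch, not part of my actual setup:

resource "aws_security_group" "workers" {
  name   = "eks-test-eu-workers"
  vpc_id = module.vpc.vpc_id

  tags = {
    # Exactly one security group in the cluster may carry this tag,
    # otherwise EKS cannot create load balancers.
    "kubernetes.io/cluster/eks-test-eu" = "owned"
  }
}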
Terraform and managed node group upgrades
With Terraform there is no way to perform rolling updates of worker nodes. In fact, the only way to update a managed node group with a new AMI is to create a new group and destroy the old one…
No support for spot instances!
Managed node groups don’t support spot instances at the moment. I would have expected that to be the number one feature implemented from the start.
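Until that changes, the only option for spot capacity is to keep those workers self-managed. A minimal, hypothetical sketch of a spot launch template, reusing the SSM-fetched AMI from earlier (bootstrap user data omitted):

resource "aws_launch_template" "spot_workers" {
  name_prefix   = "eks-test-spot-"
  image_id      = data.aws_ssm_parameter.eks_optimized_ami_id.value
  instance_type = "m5.large"

  # Request spot capacity instead of on-demand.
  instance_market_options {
    market_type = "spot"
  }

  # User data that bootstraps the node into the cluster is intentionally left out here.
}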
Implementation details in Terraform
VPC
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "2.21.0" name = "eks-vpc" azs = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
cidr = "10.60.0.0/18"
private_subnets = ["10.60.0.0/20", "10.60.16.0/20", "10.60.32.0/20"]
public_subnets = ["10.60.48.0/22", "10.60.52.0/22", "10.60.56.0/22"] enable_dns_hostnames = true
enable_dns_support = true enable_nat_gateway = true
enable_vpn_gateway = true single_nat_gateway = true
one_nat_gateway_per_az = false private_subnet_tags = {
"kubernetes.io/role/internal-elb" = "1"
} public_subnet_tags = {
"kubernetes.io/role/elb" = "1"
} tags = merge(
var.tags,
{
"kubernetes.io/cluster/eks-test" = "shared"
},
) enable_ecr_dkr_endpoint = true
ecr_dkr_endpoint_private_dns_enabled = true
ecr_dkr_endpoint_security_group_ids = [aws_security_group.vpc_endpoint.id] enable_s3_endpoint = true
}# Security Group configuration for VPC endpoints
resource "random_id" "vpc_endpoint_sg_suffix" {
byte_length = 4
}resource "aws_security_group" "vpc_endpoint" {
name = "eks-vpc-endpoint-sg-${random_id.vpc_endpoint_sg_suffix.hex}" description = "Security Group used by VPC Endpoints."
vpc_id = module.vpc.vpc_id tags = merge(
var.tags,
{
"Name" = "eks-vpc-endpoint-sg-${random_id.vpc_endpoint_sg_suffix.hex}"
}
) lifecycle {
create_before_destroy = true
}
}resource "aws_security_group_rule" "vpc_endpoint_egress" {
security_group_id = aws_security_group.vpc_endpoint.id
type = "egress"
protocol = "-1"
from_port = 0
to_port = 0
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}resource "aws_security_group_rule" "vpc_endpoint_self_ingress" {
security_group_id = aws_security_group.vpc_endpoint.id
type = "ingress"
protocol = "-1"
from_port = 0
to_port = 0
source_security_group_id = aws_security_group.vpc_endpoint.id
}
The above is a relatively simple configuration of the VPC. It consists of three public and three private subnets; the private subnets are used for our Kubernetes worker nodes. Additionally, we enable ECR DKR and S3 private endpoints to keep all connections to ECR and S3 within our VPC.
EKS Control Plane
# EKS Control Plane security group
resource "aws_security_group_rule" "vpc_endpoint_eks_cluster_sg" {
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  type                     = "ingress"
  security_group_id        = aws_security_group.vpc_endpoint.id
  source_security_group_id = aws_eks_cluster.cluster.vpc_config.0.cluster_security_group_id

  depends_on = [aws_eks_cluster.cluster]
}

# EKS Cluster
resource "aws_eks_cluster" "cluster" {
  enabled_cluster_log_types = []
  name                      = "eks-test"
  role_arn                  = aws_iam_role.cluster.arn
  version                   = var.eks_version

  vpc_config {
    subnet_ids              = flatten([module.vpc.public_subnets, module.vpc.private_subnets])
    security_group_ids      = []
    endpoint_private_access = "true"
    endpoint_public_access  = "true"
  }

  tags = var.tags

  depends_on = [
    aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
    aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy,
    aws_cloudwatch_log_group.cluster,
  ]
}

resource "aws_cloudwatch_log_group" "cluster" {
  name              = "/aws/eks/eks-test/cluster"
  retention_in_days = 7
}

resource "aws_iam_role" "cluster" {
  name = "eks-test-cluster-role"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY

  tags = var.tags
}

resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.cluster.name
}

resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSServicePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
  role       = aws_iam_role.cluster.name
}
A standard setup of the EKS control plane, together with a security group rule that allows the VPC endpoints to work correctly.
EKS managed node groups
resource "random_id" "managed_workers_sg_suffix_a" { byte_length = 4
keepers = {
ssh_key_name = var.ssh_key_name
node_role_arn = aws_iam_role.managed_workers.arn
subnet_ids = module.vpc.private_subnets[0]
managed_node_group_instance_types = var.managed_node_group_instance_types
managed_node_group_release_version = var.managed_node_group_release_version
}
}resource "random_id" "managed_workers_sg_suffix_b" { byte_length = 4
keepers = {
ssh_key_name = var.ssh_key_name
node_role_arn = aws_iam_role.managed_workers.arn
subnet_ids = module.vpc.private_subnets[1]
managed_node_group_instance_types = var.managed_node_group_instance_types
managed_node_group_release_version = var.managed_node_group_release_version
}
}resource "random_id" "managed_workers_sg_suffix_c" { byte_length = 4
keepers = {
ssh_key_name = var.ssh_key_name
node_role_arn = aws_iam_role.managed_workers.arn
subnet_ids = module.vpc.private_subnets[2]
managed_node_group_instance_types = var.managed_node_group_instance_types
managed_node_group_release_version = var.managed_node_group_release_version
}
}resource "aws_eks_node_group" "managed_workers_a" { cluster_name = aws_eks_cluster.cluster.name
node_group_name = "eks-test-managed-workers-${random_id.managed_workers_sg_suffix_a.id}"
node_role_arn = aws_iam_role.managed_workers.arn
subnet_ids = [module.vpc.private_subnets[0]] scaling_config {
desired_size = 1
max_size = 1
min_size = 1
} instance_types = split(",", var.managed_node_group_instance_types) labels = {
lifecycle = "OnDemand"
az = "eu-west-1a"
} remote_access {
ec2_ssh_key = var.ssh_key_name
source_security_group_ids = [module.bastion.security_group_id]
} release_version = var.managed_node_group_release_version tags = var.tags depends_on = [
aws_iam_role_policy_attachment.eks-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.eks-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.eks-AmazonEC2ContainerRegistryReadOnly,
] lifecycle {
create_before_destroy = true
}
}resource "aws_eks_node_group" "managed_workers_b" { cluster_name = aws_eks_cluster.cluster.name
node_group_name = "eks-test-managed-workers-${random_id.managed_workers_sg_suffix_b.id}"
node_role_arn = aws_iam_role.managed_workers.arn
subnet_ids = [module.vpc.private_subnets[1]] scaling_config {
desired_size = 1
max_size = 1
min_size = 1
} labels = {
lifecycle = "OnDemand"
az = "eu-west-1b"
} remote_access {
ec2_ssh_key = var.ssh_key_name
source_security_group_ids = [module.bastion.security_group_id]
} release_version = "1.14.7-20190927" tags = var.tags depends_on = [
aws_iam_role_policy_attachment.eks-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.eks-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.eks-AmazonEC2ContainerRegistryReadOnly,
] lifecycle {
create_before_destroy = true
}
}resource "aws_eks_node_group" "managed_workers_c" { cluster_name = aws_eks_cluster.cluster.name
node_group_name = "eks-test-managed-workers-${random_id.managed_workers_sg_suffix_c.id}"
node_role_arn = aws_iam_role.managed_workers.arn
subnet_ids = [module.vpc.private_subnets[2]] scaling_config {
desired_size = 1
max_size = 1
min_size = 1
} instance_types = split(",", var.managed_node_group_instance_types) labels = {
lifecycle = "OnDemand"
az = "eu-west-1c"
} remote_access {
ec2_ssh_key = var.ssh_key_name
source_security_group_ids = [module.bastion.security_group_id]
} release_version = var.managed_node_group_release_version tags = var.tags depends_on = [
aws_iam_role_policy_attachment.eks-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.eks-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.eks-AmazonEC2ContainerRegistryReadOnly,
] lifecycle {
create_before_destroy = true
}
}resource "aws_iam_role" "managed_workers" { name = "eks-test-managed-worker-node" assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Effect": "Allow"
}
]
}
EOF
}resource "aws_iam_role_policy_attachment" "eks-AmazonEKSWorkerNodePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.managed_workers.name
}resource "aws_iam_role_policy_attachment" "eks-AmazonEKS_CNI_Policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.managed_workers.name
}resource "aws_iam_role_policy_attachment" "eks-AmazonEC2ContainerRegistryReadOnly" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.managed_workers.name
}
The above defines a full configuration for three EKS node groups, one node group per Availability Zone. It is a good idea to separate nodes into multiple groups, since each update to a group destroys the old nodes and creates new ones. In my opinion that is poor design from AWS, and it hardly deserves the name managed node groups.
Moving on, if we are running a stateful application across multiple Availability Zones that is backed by Amazon EBS volumes and using the Kubernetes Cluster Autoscaler, it is essential to configure multiple node groups, each scoped to a single Availability Zone. In addition, you must enable the --balance-similar-node-groups feature.
In order to ensure node capacity during replacements, each node group uses a common Terraform hack:
lifecycle {
  create_before_destroy = true
}
This way we make sure that the new node group and its nodes are created before the old ones are removed. Personally, I am not a big fan of this solution, but it appears to be the only way to get it working properly in Terraform 0.12.
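The node group resources above also reference a few input variables that I have not shown. A minimal sketch of how they could be declared (the defaults are assumptions, adjust them to your environment):

variable "ssh_key_name" {
  description = "EC2 key pair used for SSH access to the worker nodes."
}

variable "managed_node_group_instance_types" {
  description = "Comma-separated list of instance types for the managed node groups."
  default     = "m5.large"
}

variable "managed_node_group_release_version" {
  description = "AMI release version for the managed node groups, e.g. 1.14.7-20190927."
  default     = "1.14.7-20190927"
}

variable "tags" {
  description = "Common tags applied to all resources."
  type        = map(string)
  default     = {}
}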
Cluster Autoscaler
For completeness, below is a tested deployment configuration for the Kubernetes Cluster Autoscaler that I used in my setup.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - image: gcr.io/google-containers/cluster-autoscaler:v1.14.7
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --skip-nodes-with-system-pods=false
            - --ignore-daemonsets-utilization=true
            - --balance-similar-node-groups=true
            - --expander=random
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eks-test
          env:
            - name: AWS_REGION
              value: eu-west-1
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
          imagePullPolicy: "Always"
      volumes:
        - name: ssl-certs
          hostPath:
            path: "/etc/ssl/certs/ca-bundle.crt"
Conclusion
I have demonstrated how to configure your VPC, EKS cluster, managed node groups and cluster autoscaler. I hope this is going to help someone in delivering a production Kubernetes environment.
Personally, I don’t use managed node groups, since they are very inflexible and essential features such as spot instance support are simply missing. Additionally, the AMIs used for node groups lag slightly behind the latest AMI available for self-managed worker nodes.
Overall, I think node groups are a nice free feature that ships with EKS. However, AWS has followed its standard approach of many years and once again released something that is not ready for serious use; in my opinion it was rushed through just to ship alongside Fargate profiles for re:Invent 2019.
As with any other story I have written on Medium, I performed the tasks documented here myself; this is my own research and these are the issues I encountered.
Thanks for reading everybody. Marcin Cuber