Amazon EKS + managed node groups
EKS cluster configured with managed node groups using Terraform
Overview
In this story I am going to concentrate on the managed worker nodes, or managed node groups, feature for EKS. It is a recently released feature for Amazon’s managed Kubernetes service. I am also going to highlight the pros and cons of using managed node groups.
Implementation details and Terraform snippets can be found in this story in case you decide to make use of them. I am using the latest Terraform (0.12.19) and Terraform AWS provider (2.45.0).
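For reference, a minimal version-pinning block matching the versions above could look like the snippet below; the region is an assumption based on the examples later in this story.

terraform {
  required_version = "~> 0.12.19"

  required_providers {
    aws = "~> 2.45"
  }
}

provider "aws" {
  # Region assumed from the examples later in this story.
  region = "eu-west-1"
}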
Amazon EKS managed node groups automate the provisioning and lifecycle management of nodes (Amazon EC2 instances) for Amazon EKS Kubernetes clusters.
EKS Managed Node Groups details
Managed Node Groups are supported on Amazon EKS clusters beginning with Kubernetes version 1.14 and platform version eks.3. Existing clusters can update to version 1.14 to take advantage of this feature.
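In Terraform terms that simply means pinning the control-plane version to at least 1.14. A small sketch of the eks_version variable referenced later in this story (the default shown here is my assumption):

variable "eks_version" {
  description = "Kubernetes version for the EKS control plane; 1.14 or newer is required for managed node groups."
  default     = "1.14"
}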
Advantages
With Amazon EKS managed node groups, you don’t need to separately provision or register the Amazon EC2 instances that provide compute capacity to run your Kubernetes applications. You can create, update, or terminate nodes for your cluster with a single operation. Nodes run using the latest Amazon EKS-optimized AMIs in your AWS account while node updates and terminations gracefully drain nodes to ensure that your applications stay available.
Nodes launched as part of a managed node group are automatically tagged for auto-discovery by the Kubernetes cluster autoscaler and you can use the node group to apply Kubernetes labels to nodes and update them at any time.
Disadvantages
AMI versions
AMI versions aren’t up to date for managed node groups. At the time of writing I am fetching the latest AMI ID using the following:
data "aws_ssm_parameter" "eks_optimized_ami_id" {
name = "/aws/service/eks/optimized-ami/1.14/amazon-linux-2/recommended/image_id"
with_decryption = true
}
The above is used to fetch the AMI ID for my self-managed worker nodes and gives me a node with the following version:
ip-10-60-16-223.eu-west-1.compute.internal Ready v1.14.8-eks-b8860f
Managed node groups, on the other hand, use an older AMI and give me:
ip-10-60-1-166.eu-west-1.compute.internal Ready v1.14.7-eks-1861c5
Clearly, managed worker nodes are not patched to the latest version.
Rollback AMI
You cannot roll back a node group to an earlier Kubernetes version or AMI version. Knowing AWS, where even a component as crucial as aws-vpc-cni was recently released in a state that broke everything, rollback is in my opinion a must-have feature for managed node groups.
Security Groups
Managed node groups spin up nodes with two security groups. You must be very careful not to tag both security groups with the following tag:
kubernetes.io/cluster/eks-test-eu = "owned"
This is going to break Kubernetes and you won’t be able to spin up a load balancer in your AWS account using EKS; there is a strict condition that only a single security group may carry that tag.
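As an illustration, if you also run self-managed workers next to the managed groups, make sure only one security group carries the ownership tag. The resource below is a hypothetical sketch, not part of my actual setup:

resource "aws_security_group" "workers" {
  name   = "eks-test-eu-workers"
  vpc_id = module.vpc.vpc_id

  tags = {
    # Exactly one security group in the cluster may carry this tag,
    # otherwise EKS cannot create load balancers.
    "kubernetes.io/cluster/eks-test-eu" = "owned"
  }
}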
Terraform and managed node group upgrades
With Terraform there is no way to perform rolling updates of worker nodes. In fact, the only way to update a managed node group with a new AMI is to create a new group and destroy the old one…
No support for spot instances!
Managed node groups don’t support spot instances at the moment. I would have expected that to be the number one feature implemented from the start.
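Until that changes, the only option for spot capacity is to keep those workers self-managed. A minimal, hypothetical sketch of a spot launch template, reusing the SSM-fetched AMI from earlier (bootstrap user data omitted):

resource "aws_launch_template" "spot_workers" {
  name_prefix   = "eks-test-spot-"
  image_id      = data.aws_ssm_parameter.eks_optimized_ami_id.value
  instance_type = "m5.large"

  # Request spot capacity instead of on-demand.
  instance_market_options {
    market_type = "spot"
  }

  # User data that bootstraps the node into the cluster is intentionally left out here.
}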
Implementation details in Terraform
VPC
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "2.21.0" name = "eks-vpc" azs = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
cidr = "10.60.0.0/18"
private_subnets = ["10.60.0.0/20", "10.60.16.0/20", "10.60.32.0/20"]
public_subnets = ["10.60.48.0/22", "10.60.52.0/22", "10.60.56.0/22"] enable_dns_hostnames = true
enable_dns_support = true enable_nat_gateway = true
enable_vpn_gateway = true single_nat_gateway = true
one_nat_gateway_per_az = false private_subnet_tags = {
"kubernetes.io/role/internal-elb" = "1"
} public_subnet_tags = {
"kubernetes.io/role/elb" = "1"
} tags = merge(
var.tags,
{
"kubernetes.io/cluster/eks-test" = "shared"
},
) enable_ecr_dkr_endpoint = true
ecr_dkr_endpoint_private_dns_enabled = true
ecr_dkr_endpoint_security_group_ids = [aws_security_group.vpc_endpoint.id] enable_s3_endpoint = true
}# Security Group configuration for VPC endpoints
resource "random_id" "vpc_endpoint_sg_suffix" {
byte_length = 4
}resource "aws_security_group" "vpc_endpoint" {
name = "eks-vpc-endpoint-sg-${random_id.vpc_endpoint_sg_suffix.hex}" description = "Security Group used by VPC Endpoints."
vpc_id = module.vpc.vpc_id tags = merge(
var.tags,
{
"Name" = "eks-vpc-endpoint-sg-${random_id.vpc_endpoint_sg_suffix.hex}"
}
) lifecycle {
create_before_destroy = true
}
}resource "aws_security_group_rule" "vpc_endpoint_egress" {
security_group_id = aws_security_group.vpc_endpoint.id
type = "egress"
protocol = "-1"
from_port = 0
to_port = 0
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}resource "aws_security_group_rule" "vpc_endpoint_self_ingress" {
security_group_id = aws_security_group.vpc_endpoint.id
type = "ingress"
protocol = "-1"
from_port = 0
to_port = 0
source_security_group_id = aws_security_group.vpc_endpoint.id
}
The above is a relatively simple configuration of the VPC. It consists of three public and three private subnets; the private subnets are used for our Kubernetes worker nodes. Additionally, we enable ECR DKR and S3 private endpoints to keep all connections to ECR and S3 within our VPC.
EKS Control Plane
# EKS Control Plane security group
resource "aws_security_group_rule" "vpc_endpoint_eks_cluster_sg" {
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  type                     = "ingress"
  security_group_id        = aws_security_group.vpc_endpoint.id
  source_security_group_id = aws_eks_cluster.cluster.vpc_config.0.cluster_security_group_id

  depends_on = [aws_eks_cluster.cluster]
}

# EKS Cluster
resource "aws_eks_cluster" "cluster" {
  enabled_cluster_log_types = []
  name                      = "eks-test"
  role_arn                  = aws_iam_role.cluster.arn
  version                   = var.eks_version

  vpc_config {
    subnet_ids              = flatten([module.vpc.public_subnets, module.vpc.private_subnets])
    security_group_ids      = []
    endpoint_private_access = "true"
    endpoint_public_access  = "true"
  }

  tags = var.tags

  depends_on = [
    aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
    aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy,
    aws_cloudwatch_log_group.cluster,
  ]
}

resource "aws_cloudwatch_log_group" "cluster" {
  name              = "/aws/eks/eks-test/cluster"
  retention_in_days = 7
}

resource "aws_iam_role" "cluster" {
  name = "eks-test-cluster-role"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY

  tags = var.tags
}

resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.cluster.name
}

resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSServicePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
  role       = aws_iam_role.cluster.name
}
A standard setup of the EKS control plane, together with a security group rule that allows the VPC endpoints to work correctly.
EKS managed node groups
resource "random_id" "managed_workers_sg_suffix_a" { byte_length = 4
keepers = {
ssh_key_name = var.ssh_key_name
node_role_arn = aws_iam_role.managed_workers.arn
subnet_ids = module.vpc.private_subnets[0]
managed_node_group_instance_types = var.managed_node_group_instance_types
managed_node_group_release_version = var.managed_node_group_release_version
}
}resource "random_id" "managed_workers_sg_suffix_b" { byte_length = 4
keepers = {
ssh_key_name = var.ssh_key_name
node_role_arn = aws_iam_role.managed_workers.arn
subnet_ids = module.vpc.private_subnets[1]
managed_node_group_instance_types = var.managed_node_group_instance_types
managed_node_group_release_version = var.managed_node_group_release_version
}
}resource "random_id" "managed_workers_sg_suffix_c" { byte_length = 4
keepers = {
ssh_key_name = var.ssh_key_name
node_role_arn = aws_iam_role.managed_workers.arn
subnet_ids = module.vpc.private_subnets[2]
managed_node_group_instance_types = var.managed_node_group_instance_types
managed_node_group_release_version = var.managed_node_group_release_version
}
}resource "aws_eks_node_group" "managed_workers_a" { cluster_name = aws_eks_cluster.cluster.name
node_group_name = "eks-test-managed-workers-${random_id.managed_workers_sg_suffix_a.id}"
node_role_arn = aws_iam_role.managed_workers.arn
subnet_ids = [module.vpc.private_subnets[0]] scaling_config {
desired_size = 1
max_size = 1
min_size = 1
} instance_types = split(",", var.managed_node_group_instance_types) labels = {
lifecycle = "OnDemand"
az = "eu-west-1a"
} remote_access {
ec2_ssh_key = var.ssh_key_name
source_security_group_ids = [module.bastion.security_group_id]
} release_version = var.managed_node_group_release_version tags = var.tags depends_on = [
aws_iam_role_policy_attachment.eks-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.eks-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.eks-AmazonEC2ContainerRegistryReadOnly,
] lifecycle {
create_before_destroy = true
}
}resource "aws_eks_node_group" "managed_workers_b" { cluster_name = aws_eks_cluster.cluster.name
node_group_name = "eks-test-managed-workers-${random_id.managed_workers_sg_suffix_b.id}"
node_role_arn = aws_iam_role.managed_workers.arn
subnet_ids = [module.vpc.private_subnets[1]] scaling_config {
desired_size = 1
max_size = 1
min_size = 1
} labels = {
lifecycle = "OnDemand"
az = "eu-west-1b"
} remote_access {
ec2_ssh_key = var.ssh_key_name
source_security_group_ids = [module.bastion.security_group_id]
} release_version = "1.14.7-20190927" tags = var.tags depends_on = [
aws_iam_role_policy_attachment.eks-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.eks-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.eks-AmazonEC2ContainerRegistryReadOnly,
] lifecycle {
create_before_destroy = true
}
}resource "aws_eks_node_group" "managed_workers_c" { cluster_name = aws_eks_cluster.cluster.name
node_group_name = "eks-test-managed-workers-${random_id.managed_workers_sg_suffix_c.id}"
node_role_arn = aws_iam_role.managed_workers.arn
subnet_ids = [module.vpc.private_subnets[2]] scaling_config {
desired_size = 1
max_size = 1
min_size = 1
} instance_types = split(",", var.managed_node_group_instance_types) labels = {
lifecycle = "OnDemand"
az = "eu-west-1c"
} remote_access {
ec2_ssh_key = var.ssh_key_name
source_security_group_ids = [module.bastion.security_group_id]
} release_version = var.managed_node_group_release_version tags = var.tags depends_on = [
aws_iam_role_policy_attachment.eks-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.eks-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.eks-AmazonEC2ContainerRegistryReadOnly,
] lifecycle {
create_before_destroy = true
}
}resource "aws_iam_role" "managed_workers" { name = "eks-test-managed-worker-node" assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Effect": "Allow"
}
]
}
EOF
}resource "aws_iam_role_policy_attachment" "eks-AmazonEKSWorkerNodePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.managed_workers.name
}resource "aws_iam_role_policy_attachment" "eks-AmazonEKS_CNI_Policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.managed_workers.name
}resource "aws_iam_role_policy_attachment" "eks-AmazonEC2ContainerRegistryReadOnly" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.managed_workers.name
}
The above defines a full configuration for three EKS node groups, one node group per Availability Zone. It is a good idea to separate nodes into multiple groups, since each update to a group destroys the old nodes and creates new ones. In my opinion that is poor design from AWS, and it hardly deserves the name managed node groups.
Moving on, if we are running a stateful application across multiple Availability Zones that is backed by Amazon EBS volumes and using the Kubernetes Cluster Autoscaler, it is essential to configure multiple node groups, each scoped to a single Availability Zone. In addition, you must enable the --balance-similar-node-groups feature.
In order to ensure node capacity during replacements, each node group uses a common Terraform hack:
lifecycle {
  create_before_destroy = true
}
This way we make sure that the new node group and its nodes are created before the old ones are removed. Personally, I am not a big fan of this solution, but it appears to be the only way to get it working properly in Terraform 0.12.
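The node group resources above also reference a few input variables that I have not shown. A minimal sketch of how they could be declared (the defaults are assumptions, adjust them to your environment):

variable "ssh_key_name" {
  description = "EC2 key pair used for SSH access to the worker nodes."
}

variable "managed_node_group_instance_types" {
  description = "Comma-separated list of instance types for the managed node groups."
  default     = "m5.large"
}

variable "managed_node_group_release_version" {
  description = "AMI release version for the managed node groups, e.g. 1.14.7-20190927."
  default     = "1.14.7-20190927"
}

variable "tags" {
  description = "Common tags applied to all resources."
  type        = map(string)
  default     = {}
}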
Cluster Autoscaler
For completeness, below is a tested deployment configuration for the Kubernetes Cluster Autoscaler that I used in my setup.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - image: gcr.io/google-containers/cluster-autoscaler:v1.14.7
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --skip-nodes-with-system-pods=false
            - --ignore-daemonsets-utilization=true
            - --balance-similar-node-groups=true
            - --expander=random
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eks-test
          env:
            - name: AWS_REGION
              value: eu-west-1
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
          imagePullPolicy: "Always"
      volumes:
        - name: ssl-certs
          hostPath:
            path: "/etc/ssl/certs/ca-bundle.crt"
Conclusion
I have demonstrated how to configure your VPC, EKS cluster, managed node groups and cluster autoscaler. I hope this is going to help someone in delivering a production Kubernetes environment.
Personally, I don’t use managed node groups, since they are very inflexible and essential features such as spot instance support are simply missing. Additionally, the AMIs used for node groups lag slightly behind the latest AMI available for self-managed worker nodes.
Overall, I think node groups are a nice free feature that ships with EKS. However, AWS has followed its standard approach of many years and once again released something that is not ready for serious use; in my opinion it was rushed through just to ship alongside Fargate profiles for re:Invent 2019.
As with any other story I have written on Medium, I performed the tasks documented here myself; this is my own research and these are the issues I encountered.
Thanks for reading everybody. Marcin Cuber