Amazon EKS with custom service IPv4 CIDR
Solving the DNS resolution issue
I am writing this story to tell you more about a feature offered by Amazon EKS. It can be very time consuming to debug and figure out why it may not be working as expected from the start. I caught myself in exactly that situation and spent hours debugging and trying to figure out why DNS was not resolving from within the cluster.
So, the service IPv4 CIDR
feature for EKS. It allows you to specify the CIDR block from which Kubernetes assigns service IP addresses. If you don't specify a block, Kubernetes assigns addresses from either the 10.100.0.0/16 or 172.20.0.0/16 CIDR blocks. It is recommended to specify a block that does not overlap with resources in other networks that are peered or connected to your VPC (a quick way to check the existing VPC CIDRs is shown after the list below). The block must meet the following requirements:
- Within one of the following private IP address blocks: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
- Doesn’t overlap with any CIDR block assigned to the VPC that you selected for your cluster.
- Between /24 and /12.
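To make the overlap requirement concrete, here is a quick check you can run before picking a block. This is a hedged sketch: it assumes the AWS CLI is configured, and the VPC ID below is a placeholder for your own.
# List every CIDR block associated with the target VPC (placeholder ID);
# the service CIDR you choose must not overlap with any of these.
aws ec2 describe-vpcs \
  --vpc-ids vpc-0123456789abcdef0 \
  --query 'Vpcs[].CidrBlockAssociationSet[].CidrBlock' \
  --output text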
My aim was to configure the service IPv4 CIDR to 10.160.0.0/16. However, making the change on the cluster side resulted in DNS resolution not working. In essence, CoreDNS running in EKS was not resolving any requests. See below for my configuration and how I resolved the problem.
Configuration and Deployment
I configure all my EKS components using Terraform, so you will find the relevant configuration snippets below.
Terraform EKS Cluster
resource "aws_eks_cluster" "cluster" {
name = local.eks_cluster_name
role_arn = aws_iam_role.cluster.arn
version = 1.23 vpc_config {
subnet_ids = data.aws_subnets.private.ids
security_group_ids = []
endpoint_private_access = "true"
endpoint_public_access = "true"
} encryption_config {
resources = ["secrets"]
provider {
key_arn = module.kms_eks_cluster.key_arn
}
} kubernetes_network_config {
service_ipv4_cidr = "10.160.0.0/16"
}
}
Terraform Apply Result
I selected the 10.160.0.0/16 CIDR and deployed the cluster. Boom, the cluster was up and ready, and all of the configuration appeared to work as expected. But then I tried running an app which needs DNS resolution, and it was not working.
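Before blaming anything else, it is worth confirming what the control plane actually recorded. The following check is a sketch; my-cluster is a placeholder for your cluster name.
# Show the service CIDR the EKS control plane is using
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.kubernetesNetworkConfig.serviceIpv4Cidr' \
  --output text
# With the configuration above this should print 10.160.0.0/16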
Issue
After hours of debugging, it turned out that CoreDNS was not resolving any DNS queries. Inspecting /etc/resolv.conf inside a pod led me to this:
/ # cat /etc/resolv.conf
nameserver 172.20.0.10
search default.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5
Clearly, the wrong nameserver is being used. Pods are still pointed at the default 172.20.0.10, whereas it should be the cluster DNS address from the service CIDR I specified in my EKS configuration (10.160.0.10).
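You can see the mismatch directly by comparing the kube-dns Service address with what pods receive in resolv.conf. A minimal check, assuming the default kube-dns Service name in kube-system:
# ClusterIP of the kube-dns Service, allocated from the custom service CIDR
kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
# Prints 10.160.0.10 here, while pods are still pointed at 172.20.0.10 by the kubelet default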
Solution
Your EKS worker nodes also need to be told about this custom service CIDR; specifically, the cluster DNS IP derived from it has to be passed to the kubelet. This is critically important, otherwise DNS won’t work at all and you will simply end up with a broken cluster.
To find the necessary flag, I had to go to the source code of the EKS AMI. And here it is, the flag that you need to pass to the EKS bootstrap script (which in turn configures the kubelet): --dns-cluster-ip
As already mentioned, I am using Terraform, and with that I make use of an EKS node group with a custom launch template. That way I am in control of what flags are being passed to the kubelet. Note that the flag takes the cluster DNS IP (10.160.0.10, i.e. the .10 address within the service CIDR), not the CIDR block itself. Here is a working user data configuration for the launch template so that the custom CIDR range is utilised and DNS resolution works as expected.
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
set -ex
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
/etc/eks/bootstrap.sh ${CLUSTER_NAME} --b64-cluster-ca ${B64_CLUSTER_CA} --apiserver-endpoint ${API_SERVER_URL} --container-runtime ${CONTAINER_RUNTIME} --dns-cluster-ip "10.160.0.10"

--==MYBOUNDARY==--
After deploying new nodes with the above specified flag, DNS resolution started working as expected.
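A quick way to confirm the fix is to resolve an in-cluster name from a throwaway pod. This is a sketch; the busybox image and the pod name are assumptions:
# Run a one-off pod and resolve the Kubernetes API Service name
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 \
  -- nslookup kubernetes.default.svc.cluster.local
# A successful answer should come from nameserver 10.160.0.10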
Karpenter solution
In case you are using Karpenter for cluster autoscaling, I have also fixed the Karpenter Provisioner to work with the new cluster DNS configuration (verification commands follow the manifest below).
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  kubeletConfiguration:
    clusterDNS: ["10.160.0.10"]
    containerRuntime: containerd
  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["m6i.xlarge", "m5.xlarge"]
  limits:
    resources:
      cpu: "40"
      memory: 160Gi
  ttlSecondsAfterEmpty: 60
  ttlSecondsUntilExpired: 2592000
  providerRef:
    name: default
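To roll this out, apply the Provisioner and confirm the DNS setting Karpenter will hand to new nodes. A sketch, assuming the manifest is saved as provisioner.yaml and the Provisioner is named default:
# Apply the updated Provisioner
kubectl apply -f provisioner.yaml
# Verify the clusterDNS value that will be passed to the kubelet on new nodes
kubectl get provisioner default -o jsonpath='{.spec.kubeletConfiguration.clusterDNS}'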
Conclusion
These additional steps on the worker node side are not documented by AWS, so I hope the solution provided in this story will save you a lot of time.
Enjoy Kubernetes!!!
Like with any other story on Medium written by me, I performed the tasks documented. This is my own research and issues I have encountered.
Thanks for reading everybody. Marcin Cuber