Kubernetes GitLab Runners on Amazon EKS

Marcin Cuber
9 min read · Jun 24, 2020


Find out how to configure GitLab Runners efficiently and trouble-free on Amazon EKS, following a GitOps strategy.

Overview

Custom GitLab Runners in AWS are probably the best feature of GitLab, especially when you use the managed cloud GitLab server (gitlab.com). I don’t believe such functionality is offered by any other well-known provider such as CircleCI, TravisCI, TeamCity or the disgusting Jenkins. GitLab Runner is the agent that runs your jobs and sends the results back to GitLab. It is used in conjunction with GitLab CI/CD, the open-source continuous integration service included with GitLab that coordinates the jobs.

I am using the Kubernetes platform to spin up all my GitLab Runners, which is proving to be a very efficient and fast approach. Previously, the runners were running on AWS EC2 instances, which was very challenging to configure and meant it took a long time to get new runners when needed.

In this article, I will provide details and templates showing how my GitLab Runners are implemented. I will also outline the setup of the Amazon EKS cluster and the GitOps strategy, using Flux, that helps hugely with upgrades of the runners.

Note that all configuration details for the GitLab Runners and EKS can be found in the repositories linked throughout this article.

Cluster configuration details

I am currently running the latest available version of Amazon EKS, 1.16. The full configuration of the cluster can be found in my GitHub EKS repository.

My cluster is configured with the following add-ons to make the GitLab Runners work nicely:

  • External Secrets — used to pull the GitLab token from the AWS SSM Parameter Store and populate it as a Kubernetes Secret
  • Reloader — extremely useful, as it watches for changes in ConfigMaps and Secrets and performs rolling upgrades on the Pods that use them

All my Kubernetes add-ons are deployed in a GitOps way using Flux. Flux is a tool that automatically ensures that the state of a cluster matches the config in Git. It uses an operator in the cluster to trigger deployments inside Kubernetes, which means you don’t need a separate CD tool. It monitors all relevant image repositories, detects new images, triggers deployments and updates the desired running configuration based on that (and a configurable policy). If you want to learn more about it, head to https://github.com/fluxcd/flux.
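The runner templates later in this article are plain manifests that Flux simply applies from Git. If you also want Flux v1 to roll out new image tags on its own, that is done with workload annotations. As a hedged sketch (the glob pattern below is only an example policy, not necessarily what I run), the runner Deployment metadata could opt in like this:

# Optional sketch (Flux v1): annotations added to the runner Deployment metadata
# so Flux bumps the image automatically when a new matching tag is published.
metadata:
  annotations:
    fluxcd.io/automated: "true"
    # <container name> -> tag filter; this glob is illustrative only
    fluxcd.io/tag.gitlab-runner-gitlab-runner: glob:alpine-v13.*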

Flux is a fantastic tool and, in my opinion, much better than Helm, which I avoid at all costs. Helm is simply another wrapper around a wrapper, which to me doesn’t make any sense. In fact, I am hugely surprised that people waste their time fighting Helm issues instead of writing clean, fully working YAML templates for their applications. If you follow the GitOps approach you can also get nice release versions of your code in GitLab or GitHub. In case you want to find out more about the Helm madness, please head to https://helm.sh/

Current limitations of GitLab Runners in Kubernetes

GitLab Runners don’t work very well with the HPA (Horizontal Pod Autoscaler). The metrics server and/or Prometheus custom metrics don’t provide good enough metrics to make it work either. I confirmed this with GitLab support. So, even though the GitLab docs and code suggest that it is possible (https://gitlab.com/gitlab-org/charts/gitlab-runner/blob/master/values.yaml#L382), in practice it doesn’t work, so simply don’t waste your time.

My GitOps configuration allows you to easily roll out version upgrades of the runners, easily change the number of runner replicas required per GitLab group, and make many other simple changes that need only a minimal modification to a YAML template; Flux handles the deployment.
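As a concrete (and purely hypothetical) illustration, scaling the runners for a given GitLab group or pointing them at different CI tags means editing a handful of fields in the Deployment shown in the next section and committing the change; Flux applies the commit and rolls the pods:

# Sketch only: the fields you would typically edit in Git, not a complete manifest.
# Names and values are illustrative.
spec:
  replicas: 5                      # scale the number of runner pods for this group
  template:
    spec:
      containers:
      - name: gitlab-runner-gitlab-runner
        env:
        - name: RUNNER_TAG_LIST
          value: "group-b"         # jobs select this runner via CI tags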

Gitlab Runner Configs

We have reached the essence of this article. Here I am going to share the templates that I use to configure the Deployment, ConfigMap, Namespace, gitlab-token ExternalSecret and RBAC.

Note that all templates are available in my GitHub repository: https://github.com/marcincuber/kubernetes-gitlab-runner

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: gitlab-runner-gitlab-runner
  name: gitlab-runner-gitlab-runner
  namespace: gitlab
  annotations:
    reloader.stakater.com/auto: "true"
spec:
  progressDeadlineSeconds: 600
  replicas: 10
  selector:
    matchLabels:
      app: gitlab-runner-gitlab-runner
  strategy:
    rollingUpdate:
      maxSurge: 50%
      maxUnavailable: 50%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: gitlab-runner-gitlab-runner
    spec:
      containers:
      - command:
        - /bin/bash
        - /scripts/entrypoint
        env:
        - name: CI_SERVER_URL
          value: https://gitlab.com/
        - name: CLONE_URL
          value: https://gitlab.com/
        - name: RUNNER_REQUEST_CONCURRENCY
          value: "1"
        - name: RUNNER_EXECUTOR
          value: kubernetes
        - name: REGISTER_LOCKED
          value: "true"
        - name: RUNNER_TAG_LIST
          value: "test"
        - name: RUNNER_OUTPUT_LIMIT
          value: "4096"
        - name: KUBERNETES_IMAGE
          value: ubuntu:20.04
        - name: KUBERNETES_PRIVILEGED
          value: "true"
        - name: KUBERNETES_NAMESPACE
          value: "gitlab"
        - name: KUBERNETES_POLL_TIMEOUT
          value: "180"
        - name: KUBERNETES_POLL_INTERVAL
          value: "3"
        - name: KUBERNETES_CPU_LIMIT
          value: "2"
        - name: KUBERNETES_MEMORY_LIMIT
          value: "3.5Gi"
        - name: KUBERNETES_CPU_REQUEST
          value: "1.5"
        - name: KUBERNETES_MEMORY_REQUEST
          value: "3.5Gi"
        - name: KUBERNETES_SERVICE_ACCOUNT
        - name: KUBERNETES_SERVICE_CPU_LIMIT
        - name: KUBERNETES_SERVICE_MEMORY_LIMIT
        - name: KUBERNETES_SERVICE_CPU_REQUEST
        - name: KUBERNETES_SERVICE_MEMORY_REQUEST
        - name: KUBERNETES_HELPER_CPU_LIMIT
        - name: KUBERNETES_HELPER_MEMORY_LIMIT
        - name: KUBERNETES_HELPER_CPU_REQUEST
          value: "100m"
        - name: KUBERNETES_HELPER_MEMORY_REQUEST
          value: "128Mi"
        - name: KUBERNETES_HELPER_IMAGE
        - name: KUBERNETES_PULL_POLICY
          value: "if-not-present"
        - name: CACHE_TYPE
          value: s3
        - name: CACHE_PATH
          value: gitlab-caches
        - name: CACHE_SHARED
          value: "true"
        - name: CACHE_S3_SERVER_ADDRESS
          value: s3.amazonaws.com
        - name: CACHE_S3_BUCKET_NAME
          value: kubernetes-gitlab-cache
        - name: CACHE_S3_BUCKET_LOCATION
          value: eu-west-1
        - name: RUNNER_CACHE_DIR
          value: /opt/gitlab-runner/cache/
        - name: RUNNER_BUILDS_DIR
          value: /opt/gitlab-runner/builds/
        image: gitlab/gitlab-runner:alpine-v13.1.0
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /entrypoint
              - unregister
              - --all-runners
        livenessProbe:
          exec:
            command:
            - /bin/bash
            - /scripts/check-live
        name: gitlab-runner-gitlab-runner
        ports:
        - containerPort: 9252
          name: metrics
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /usr/bin/pgrep
            - gitlab.*runner
        volumeMounts:
        - mountPath: /secrets
          name: runner-secrets
        - mountPath: /home/gitlab-runner/.gitlab-runner
          name: etc-gitlab-runner
        - mountPath: /scripts
          name: scripts
        resources:
          requests:
            cpu: "50m"
            memory: "200Mi"
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - /config/configure
        env:
        - name: CI_SERVER_URL
          value: https://gitlab.com/
        - name: CLONE_URL
          value: https://gitlab.com/
        - name: RUNNER_REQUEST_CONCURRENCY
          value: "1"
        - name: RUNNER_EXECUTOR
          value: kubernetes
        - name: REGISTER_LOCKED
          value: "true"
        - name: RUNNER_TAG_LIST
          value: "test"
        - name: RUNNER_OUTPUT_LIMIT
          value: "4096"
        - name: KUBERNETES_IMAGE
          value: ubuntu:20.04
        - name: KUBERNETES_PRIVILEGED
          value: "true"
        - name: KUBERNETES_NAMESPACE
          value: "gitlab"
        - name: KUBERNETES_POLL_TIMEOUT
          value: "180"
        - name: KUBERNETES_POLL_INTERVAL
          value: "3"
        - name: KUBERNETES_CPU_LIMIT
          value: "2"
        - name: KUBERNETES_MEMORY_LIMIT
          value: "3.5Gi"
        - name: KUBERNETES_CPU_REQUEST
          value: "1.5"
        - name: KUBERNETES_MEMORY_REQUEST
          value: "3.5Gi"
        - name: KUBERNETES_SERVICE_ACCOUNT
        - name: KUBERNETES_SERVICE_CPU_LIMIT
        - name: KUBERNETES_SERVICE_MEMORY_LIMIT
        - name: KUBERNETES_SERVICE_CPU_REQUEST
        - name: KUBERNETES_SERVICE_MEMORY_REQUEST
        - name: KUBERNETES_HELPER_CPU_LIMIT
        - name: KUBERNETES_HELPER_MEMORY_LIMIT
        - name: KUBERNETES_HELPER_CPU_REQUEST
          value: "100m"
        - name: KUBERNETES_HELPER_MEMORY_REQUEST
          value: "128Mi"
        - name: KUBERNETES_HELPER_IMAGE
        - name: KUBERNETES_PULL_POLICY
          value: "if-not-present"
        - name: CACHE_TYPE
          value: s3
        - name: CACHE_PATH
          value: gitlab-caches
        - name: CACHE_SHARED
          value: "true"
        - name: CACHE_S3_SERVER_ADDRESS
          value: s3.amazonaws.com
        - name: CACHE_S3_BUCKET_NAME
          value: kubernetes-gitlab-cache
        - name: CACHE_S3_BUCKET_LOCATION
          value: eu-west-1
        - name: RUNNER_CACHE_DIR
          value: /opt/gitlab-runner/cache/
        - name: RUNNER_BUILDS_DIR
          value: /opt/gitlab-runner/builds/
        image: gitlab/gitlab-runner:alpine-v13.1.0
        imagePullPolicy: IfNotPresent
        name: configure
        volumeMounts:
        - mountPath: /secrets
          name: runner-secrets
        - mountPath: /config
          name: scripts
          readOnly: true
        - mountPath: /init-secrets
          name: init-runner-secrets
          readOnly: true
        resources:
          requests:
            cpu: "50m"
            memory: "200Mi"
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65533
        runAsUser: 100
      serviceAccount: gitlab-runner-gitlab-runner
      serviceAccountName: gitlab-runner-gitlab-runner
      terminationGracePeriodSeconds: 300
      volumes:
      - emptyDir:
          medium: Memory
        name: runner-secrets
      - emptyDir:
          medium: Memory
        name: etc-gitlab-runner
      - name: init-runner-secrets
        projected:
          defaultMode: 420
          sources:
          - secret:
              items:
              - key: runner-registration-token
                path: runner-registration-token
              name: gitlab-runner-token
      - configMap:
          defaultMode: 420
          name: gitlab-runner-gitlab-runner
        name: scripts

There are a few values that you will want to reconfigure yourself, most notably the S3 cache bucket (CACHE_S3_BUCKET_NAME), which will have a different name in your account and potentially a different region (CACHE_S3_BUCKET_LOCATION).

There are various other environment variables that you should review as well. Settings such as KUBERNETES_PRIVILEGED are particularly important. I strongly recommend having a dedicated cluster to run all your GitLab Runners, since in most cases they will need to be privileged. Additionally, all Kubernetes clusters should be treated like cattle.
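To illustrate why privileged mode usually matters: Docker-in-Docker builds are a common workload for these runners. A minimal example job (the image versions and names below are just examples) that targets the test tag from the Deployment above could look like this:

# Example .gitlab-ci.yml job that needs a privileged runner (Docker-in-Docker).
# The "test" tag matches RUNNER_TAG_LIST in the Deployment above.
build-image:
  stage: build
  tags:
    - test
  image: docker:19.03
  services:
    - docker:19.03-dind
  variables:
    # Depending on your runner version you may need tcp://localhost:2375 instead.
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""
  script:
    - docker build -t my-app:latest .   # hypothetical image name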

Lastly, I want to point out that if you are following a GitOps model of deployments, upgrades to the runners are very simple: you only need to update the image field with the new tag of the Docker image. Upgrades usually take a few seconds, depending on the rollingUpdate strategy.
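For example, a version bump is a one-line change in Git (the newer tag below is hypothetical at the time of writing):

# Before
image: gitlab/gitlab-runner:alpine-v13.1.0
# After - Flux notices the commit and performs the rolling update
image: gitlab/gitlab-runner:alpine-v13.2.0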

ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: gitlab-runner-gitlab-runner
  name: gitlab-runner-gitlab-runner
  namespace: gitlab
data:
  check-live: |
    #!/bin/bash
    if /usr/bin/pgrep -f .*register-the-runner; then
      exit 0
    elif /usr/bin/pgrep gitlab.*runner; then
      exit 0
    else
      exit 1
    fi
  config.toml: |
    concurrent = 1
    check_interval = 10
    log_level = "info"
  configure: |
    set -e
    cp /init-secrets/* /secrets
  entrypoint: |
    #!/bin/bash
    set -e
    mkdir -p /home/gitlab-runner/.gitlab-runner/
    cp /scripts/config.toml /home/gitlab-runner/.gitlab-runner/
    # Register the runner
    if [[ -f /secrets/accesskey && -f /secrets/secretkey ]]; then
      export CACHE_S3_ACCESS_KEY=$(cat /secrets/accesskey)
      export CACHE_S3_SECRET_KEY=$(cat /secrets/secretkey)
    fi
    if [[ -f /secrets/gcs-applicaton-credentials-file ]]; then
      export GOOGLE_APPLICATION_CREDENTIALS="/secrets/gcs-applicaton-credentials-file"
    else
      if [[ -f /secrets/gcs-access-id && -f /secrets/gcs-private-key ]]; then
        export CACHE_GCS_ACCESS_ID=$(cat /secrets/gcs-access-id)
        # echo -e used to make private key multiline (in google json auth key private key is oneline with \n)
        export CACHE_GCS_PRIVATE_KEY=$(echo -e $(cat /secrets/gcs-private-key))
      fi
    fi
    if [[ -f /secrets/runner-registration-token ]]; then
      export REGISTRATION_TOKEN=$(cat /secrets/runner-registration-token)
    fi
    if [[ -f /secrets/runner-token ]]; then
      export CI_SERVER_TOKEN=$(cat /secrets/runner-token)
    fi
    if ! sh /scripts/register-the-runner; then
      exit 1
    fi
    # Start the runner
    exec /entrypoint run --user=gitlab-runner \
      --working-directory=/home/gitlab-runner
  register-the-runner: |
    #!/bin/bash
    MAX_REGISTER_ATTEMPTS=30

    for i in $(seq 1 "${MAX_REGISTER_ATTEMPTS}"); do
      echo "Registration attempt ${i} of ${MAX_REGISTER_ATTEMPTS}"
      /entrypoint register \
        --non-interactive

      retval=$?

      if [ ${retval} = 0 ]; then
        break
      elif [ ${i} = ${MAX_REGISTER_ATTEMPTS} ]; then
        exit 1
      fi

      sleep 5
    done

    exit 0

In the ConfigMap template you don’t really need to customise anything. The only things you may want to look at are the namespace (if you decide to change it) and perhaps the concurrency. A single GitLab Runner can run more than one concurrent worker/executor; I set it to exactly one, which allows a single runner to manage exactly one job at a time.
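If you prefer each runner pod to pick up several jobs at once, a hedged sketch of the change (the value 4 is just an example) is to raise concurrent in the embedded config.toml:

  # Sketch: let each runner pod run up to 4 jobs concurrently.
  # Each CI job still gets its own job pod with the KUBERNETES_* requests/limits,
  # so higher concurrency multiplies the load a single runner can create.
  config.toml: |
    concurrent = 4
    check_interval = 10
    log_level = "info"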

External Secret- Gitlab token

apiVersion: kubernetes-client.io/v1
kind: ExternalSecret
metadata:
  name: gitlab-runner-token
  namespace: gitlab
spec:
  backendType: systemManager
  data:
  - key: /k8s-gitlab-runner/token/test
    name: runner-registration-token

To make use of this, you will need to have the external-secrets add-on installed. I highly recommend it, especially when you are running things in AWS. The key field is the important part: it points at the name of the AWS SSM parameter that stores the GitLab runner registration token. Such a token can be found in your GitLab group or project settings under CI/CD -> Runners. More about SSM parameters can be read in the AWS docs.
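For clarity, once the external-secrets controller reconciles this resource, it produces an ordinary Kubernetes Secret of the same name in the gitlab namespace, which the init-runner-secrets projected volume in the Deployment then mounts. A sketch of the result (the value is just a placeholder):

# What the controller produces (sketch; the data value is a base64-encoded placeholder).
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-runner-token
  namespace: gitlab
type: Opaque
data:
  runner-registration-token: UExBQ0VIT0xERVI=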

Namespace and RBAC

apiVersion: v1
kind: Namespace
metadata:
  name: gitlab
  labels:
    app: gitlab-runner-gitlab-runner
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    app: gitlab-runner-gitlab-runner
  name: gitlab-runner-gitlab-runner
  namespace: gitlab
rules:
- apiGroups:
  - ""
  resources:
  - '*'
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    app: gitlab-runner-gitlab-runner
  name: gitlab-runner-gitlab-runner
  namespace: gitlab
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: gitlab-runner-gitlab-runner
subjects:
- kind: ServiceAccount
  name: gitlab-runner-gitlab-runner
  namespace: gitlab
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: gitlab-runner-gitlab-runner
  name: gitlab-runner-gitlab-runner
  namespace: gitlab
automountServiceAccountToken: true

Just basic configs for the namespace and role. The Role has all permissions assigned; however, it is bound to its own namespace, so it doesn’t cause any issues.

Conclusion

I have presented you with the latest configuration I am using for GitLab Runners. They use an up-to-date Docker image, version v13.1.0. I have been running this setup for at least 4 months; rolling updates are extremely easy and I have had zero issues.

Something I have not mentioned is that my EKS cluster fully utilises spot instances, which makes this setup very cheap. As this is all GitLab related, you can always opt to use the cloud-managed GitLab Runners provided by GitLab. For security reasons and easy integration with AWS IAM, it was a must for me to run my own runners in AWS.
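The actual cluster definition lives in the EKS repository mentioned earlier; purely as an illustration of the spot-instance idea (not my real configuration), an eksctl-style node group running entirely on spot capacity could look like this:

# Illustration only: an eksctl node group backed 100% by spot instances.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: gitlab-runners          # hypothetical cluster name
  region: eu-west-1
nodeGroups:
- name: runners-spot
  minSize: 2
  maxSize: 20
  instancesDistribution:
    instanceTypes: ["m5.xlarge", "m5a.xlarge", "m5d.xlarge"]
    onDemandBaseCapacity: 0
    onDemandPercentageAboveBaseCapacity: 0   # no on-demand above base, i.e. all spot
    spotInstancePools: 3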

One important feature that GitLab Runners, or in fact GitLab itself, doesn’t provide is managed macOS runners. Hopefully this feature is in the works and at some point we will get a fully equipped platform with everything we need :).

Thanks for reading, hope this story helped. Stay tuned.

If you have any questions please leave comments or private notes. Otherwise you can find me on https://www.linkedin.com/in/marcincuber/
