Drupal and Kubernetes (K8s) getting started

by ian on Fri, 08/16/2019 - 12:28

This blog post describes the learnings and the challenges I faced when implementing Drupal on Kubernetes. I will describe how I went about creating a test environment and then a cloud-hosted environment on Hetzner Cloud. I started out with zero experience in container orchestration, so if you have none either, this post will give you a shallow dive into Kubernetes with Drupal.

Starting out

[Illustration: a female captain at the wheel of a boat]

Starting out with implementing Drupal on Kubernetes, I had no idea where to begin or how the components would work together. My first step was to read up on the topic, starting with the online documentation; Kubernetes has done a very good job of explaining the basic components and how they work together. However, I needed more practical explanations, so I purchased the book Kubernetes: Up and Running by Joe Beda, Brendan Burns and Kelsey Hightower. This is a fantastic book and really helped me get to know the fundamentals of Kubernetes and how the pieces fit together. The current edition is a bit out of date; it does not yet explain StatefulSets in detail, as they were still in development, but the next edition will cover them, as well as Operators, the emerging pattern for handling stateful services.

I found that the most useful tool for Linux users is Canonical's microk8s. It is a great application that is very easy to install and use: it sets up a single-node cluster on the local machine, allowing you to test your deployments, and it uses the host system to dynamically provision persistent volumes. This was a fast and easy way to test deployments without incurring extra costs with Google, AWS or one of the other cloud providers. Installation is as easy as

$ snap install microk8s --classic

on a Linux machine. Adding functionality for DNS, storage or the Kubernetes dashboard requires a single command:

$ microk8s.enable <add-on name>

Adding the storage add-on enables persistent volumes to be dynamically provisioned on the host path without further setup.
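
For example, enabling the three add-ons mentioned above in one go (add-on names as of the microk8s version I used):

$ microk8s.enable dns dashboard storage

The bundled client can then be used to check the node with microk8s.kubectl get nodes.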

Once I had a single-node cluster running and provisioning persistent volumes, I was able to move towards a real-world setup.

1. Create master and slave nodes

In this case I chose Hetzner, as it is an extremely cheap cloud provider offering compute instances and storage. For the test cluster I used one master with 2 vCPUs and 4 GB RAM, which costs EUR 5.88 per month; this is the minimum specification on which Kubernetes allows you to run the master when using kubeadm. The slave nodes consisted of 3 compute instances with 1 vCPU and 2 GB RAM at EUR 2.99 per instance. The whole test cluster costs EUR 14.85 per month, as Hetzner charges for stopped instances as well.
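
If you prefer the command line over the Hetzner console, the instances can also be created with Hetzner's hcloud CLI. A minimal sketch, assuming the cx21 and cx11 server types that match the specifications above and an Ubuntu 18.04 image:

$ hcloud server create --name k8s-master --type cx21 --image ubuntu-18.04
$ for i in 1 2 3; do hcloud server create --name k8s-slave-$i --type cx11 --image ubuntu-18.04; done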

2. Install Docker

Docker is required on all nodes. Once installed, the cgroup driver needs to be changed: Docker uses the cgroupfs driver by default, but on Linux distributions that use systemd as their init system, systemd is the recommended driver. Changing the driver ensures there is only one control group manager constraining the resources being used. Running two control group managers is not recommended, as it creates a more complicated view of resource allocation, and nodes have been shown to become unstable with two control group managers.

This is achieved by creating the following file:

$ touch /etc/docker/daemon.json

and including the following in the file:

{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" },
  "storage-driver": "overlay2"
}
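
After saving the file, Docker needs to be restarted for the new driver to take effect, and the result can be verified:

$ systemctl daemon-reload
$ systemctl restart docker
$ docker info | grep -i cgroup

The last command should report systemd as the cgroup driver.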

3. Install Kubeadm

Now my question was: how do you start a cluster? Enter kubeadm. Kubeadm is a tool for getting a cluster up and running: it bootstraps the master node and joins any number of slave nodes to it.

The Kubernetes signing key needs to be added on all nodes.

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

The Xenial Kubernetes repository is added:

$ apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"

The package lists are updated and kubeadm is then installed (the kubelet and kubectl packages are pulled in as dependencies):

$ apt update && apt install kubeadm
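
It is also worth pinning these packages so that an unattended upgrade does not unexpectedly move the cluster to a new Kubernetes version:

$ apt-mark hold kubelet kubeadm kubectl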

Swap is then turned off, as the kubelet does not operate correctly with swap enabled:

$ swapoff -a
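
Note that swapoff -a only lasts until the next reboot. To make the change permanent, the swap entry in /etc/fstab can be commented out, for example:

$ sed -i '/ swap / s/^/#/' /etc/fstab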

On the master node Kubernetes is then initialized with:

$ kubeadm init --pod-network-cidr=10.244.0.0/16

Specifying the pod network CIDR is required by Flannel, the pod network provider being used.

To start using the cluster, kubeadm tells you to run the following commands as a regular user:

$ mkdir -p $HOME/.kube

$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
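
At this point kubectl can talk to the cluster. The master will report a NotReady status until the pod network from the next step is installed:

$ kubectl get nodes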

4. Install Flannel

Flannel, the pod network provider, is then deployed by running:

$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Flannel allocates subnets to each host out of a larger preconfigured address space and provides a layer 3 IPv4 network between multiple nodes in a cluster. It does not control how containers are networked but only how the traffic is transported between hosts. The network configuration is stored in the Kubernetes API or etcd.
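
Whether Flannel started correctly can be checked by listing the pods in the kube-system namespace, where one kube-flannel pod should be running per node:

$ kubectl get pods -n kube-system -o wide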

In order to add the slave nodes to the cluster, the kubeadm join command is run on each slave node. kubeadm init prints this command for the admin; it will be similar to the following:

$ kubeadm join 195.201.96.157:6443 --token brtmr3.1i8brj4p70ktclql \
    --discovery-token-ca-cert-hash sha256:f1c5d3047dd8b919400871c4f41c3cca1d498880ae6691a4e674f6840c7bcf83

The slave nodes then join the cluster and are available for pods to be scheduled on.
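
Back on the master, the three slaves should now appear alongside the master and move to the Ready status:

$ kubectl get nodes -o wide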

5. Install Hetzner storage interface

The Hetzner CSI driver is then installed so that the cluster can dynamically provision storage. In current Kubernetes versions the required feature gates are enabled by default and are in beta stage, so the feature-gate flags listed in the README on the Hetzner GitHub repository no longer need to be passed when initializing the cluster. A feature gate is a way of enabling alpha and experimental features in the cluster so that new features can be used.

First the two custom resources CSINodeInfo and CSIDriver are created. On the master node the following manifests are applied.

# kubectl apply -f https://raw.githubusercontent.com/kubernetes/csi-api/release-1.13/pkg/crd/manifests/csidriver.yaml

# kubectl apply -f https://raw.githubusercontent.com/kubernetes/csi-api/release-1.13/pkg/crd/manifests/csinodeinfo.yaml

An API token then needs to be created in the Hetzner console. The API token is required for the dynamic provisioning of volumes within the Hetzner account.

A secret is then created containing the token and saved as secret.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: hcloud-csi
  namespace: kube-system
stringData:
  token: YOURTOKEN

The secret is applied using the command:

# kubectl apply -f secret.yaml

The CSI driver is then deployed:

# kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/master/deploy/kubernetes/hcloud-csi.yml

The admin now has a storage class and storage driver installed, allowing dynamic provisioning of persistent volumes, which keeps data available should a pod become unhealthy.
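
The result can be checked by listing the storage classes; the Hetzner driver registers one (named hcloud-volumes at the time of writing):

# kubectl get storageclass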

6. Drupal Deployment

The official Drupal image drupal:8.6.15-apache will be used for the prototype to demonstrate the setup, with the MySQL image mysql:5.6 as the database.

Two YAML files are created: drupal-persistentVolume-deployment.yaml and mysql-persistentVolume-deployment.yaml.

---
apiVersion: v1
kind: Service
metadata:
  name: drupal
  labels:
    app: drupal
spec:
  ports:
  - port: 80
    name: web
    targetPort: 80
  selector:
    app: drupal
    tier: frontend
  type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: drupal-pvc
  labels:
    app: drupal
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: drupal
  labels:
    app: drupal
    tier: frontend
spec:
  selector:
    matchLabels:
      app: drupal
      tier: frontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: drupal
        tier: frontend
    spec:
      initContainers:
      - name: init-sites-volume
        image: drupal:8.6.15-apache
        command: ['/bin/bash', '-c']
        args: ['cp -r /var/www/html/sites /data; chown www-data:www-data /data/ -R']
        volumeMounts:
        - mountPath: /data
          name: drupal-pvc
      containers:
      - image: drupal:8.6.15-apache
        imagePullPolicy: IfNotPresent
        name: drupal
        env:
        - name: DRUPAL_DATABASE_HOST
          value: drupal-mysql
        - name: DRUPAL_DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass
              key: password
        ports:
        - containerPort: 80
          name: drupal
        volumeMounts:
        - name: drupal-pvc
          mountPath: /var/www/html/modules
          subPath: modules
        - name: drupal-pvc
          mountPath: /var/www/html/profiles
          subPath: profiles
        - name: drupal-pvc
          mountPath: /var/www/html/sites
          subPath: sites
        - name: drupal-pvc
          mountPath: /var/www/html/themes
          subPath: themes
      volumes:
      - name: drupal-pvc
        persistentVolumeClaim:
          claimName: drupal-pvc

For the Drupal deployment a service is created to give the Drupal site a cluster IP address, from which the site can later be exposed to the internet via the Ingress controller. The persistent volume claim creates a 10 GB volume (the minimum volume size Hetzner allows) when a pod claims storage. In the deployment manifest an init container copies the sites folder from the image to the persistent storage and changes the owner and permissions so the folders are accessible. The main container then binds the sites, modules, profiles and themes folders from the persistent volume into the pod, allowing changes to be made to the website. The database host is set, as is the database password, which comes from a secret passed into the Kubernetes API by a secret generator (see step 8). The container claims its volume through the persistent volume claim; the Hetzner CSI driver then creates the volume and binds it to the node for usage.
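
Until an Ingress controller is in place, the site can be reached for testing with a port forward once the manifests have been applied (step 8 below):

$ kubectl port-forward service/drupal 8080:80

The Drupal installer is then available at http://localhost:8080.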

7. MySQL deployment

---
apiVersion: v1
kind: Service
metadata:
  name: drupal-mysql
  labels:
    app: drupal
spec:
  ports:
  - port: 3306
  selector:
    app: drupal
    tier: backend
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: drupal-pvc-db
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: drupal-mysql
  labels:
    app: drupal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: drupal
      tier: backend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: drupal
        tier: backend
    spec:
      containers:
      - image: mysql:5.6
        imagePullPolicy: IfNotPresent
        name: mysql
        env:
        - name: MYSQL_DATABASE
          value: drupal-db
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass
              key: password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-stateful-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-stateful-storage
        persistentVolumeClaim:
          claimName: drupal-pvc-db

In the MySQL deployment a service is also created, exposing the MySQL pod via a cluster IP so the Drupal instance can find the database; it is not exposed to the Ingress controller. KubeDNS, the Kubernetes DNS service, creates a DNS entry mapping the drupal-mysql hostname to this cluster IP. A persistent volume claim is then created as in the Drupal deployment. The container is run with the database name and the root password taken from the secret created in the Kubernetes API. The MySQL data directory is mounted on the persistent storage, and the volume is bound to the pod using the persistent volume claim, as in the Drupal deployment.
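
The DNS entry can be verified from inside the cluster with a throwaway pod, for example:

$ kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup drupal-mysql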

8. Kustomization file

A kustomization.yaml file is then created, including a secret generator for the MySQL root password and the files needed for the deployments:

secretGenerator:
- name: mysql-pass
  literals:
  - password=YOURPASSWORD
resources:
- mysql-persistentVolume-deployment.yaml
- drupal-persistentVolume-deployment.yaml

The three files are then stored in the same folder and the deployment is created by running:

# kubectl apply -k ./

This creates the deployments, services and persistent volume claims, and dynamically provisions the storage volumes, with each component abstracted into its own microservice, independent of the others.
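
A quick way to confirm that everything came up:

# kubectl get deployments,services,pvc

The persistent volume claims should show a Bound status once the Hetzner volumes have been provisioned.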

There are, however, a number of limitations to this solution, and I will continue working on it until it is capable of running a production site.

I would want to move the MySQL deployment to a StatefulSet, as it is currently unable to scale: creating a replica breaks the site, and this needs to be looked into. The Drupal container can scale with replicas. Comparing a Deployment and a StatefulSet will also be useful to see which is more suitable for Drupal. Next steps would also be to integrate GitLab with the cluster and to have a running Ingress controller, using the review apps provided by GitLab's Kubernetes integration.

This solution shows how to run a basic Drupal site using Kubernetes with Hetzner as the storage provider. Hetzner is a cheap provider and its storage interface is stable. Drupal and Kubernetes can be run together, but more work needs to be done to get a production-ready site and to work out the challenges unique to Drupal and saving state.