Drupal and Kubernetes (K8s) getting started

by ian on Fri, 08/16/2019 - 12:28

This blog post describes the learnings and the challenges I faced when implementing Drupal on Kubernetes. I will describe how I went about creating a test environment and then a cloud-hosted environment on Hetzner Cloud. I started out with zero experience in container orchestration, so if you have none either, this post will give you a shallow dive into Kubernetes with Drupal.

Starting out

[Illustration: a female captain at the wheel of a boat]

Starting out with implementing Drupal on Kubernetes, I had no idea where to begin or how the components would work together. My first step was to read up on the topic, starting with the online documentation; Kubernetes has done a very good job of explaining the basic components and how they work together. However, I needed more practical explanations, so I purchased the book Kubernetes: Up and Running by Joe Beda, Brendan Burns and Kelsey Hightower. This is a fantastic book and really helped me get to know the fundamentals of Kubernetes and how the pieces fit together. The current edition is a bit out of date; it does not yet explain StatefulSets in detail, as they were still in development, but the next edition will cover them, as well as Operators, the emerging pattern for handling stateful services.

I found that the most useful tool for Linux users is Canonical's microk8s. It is a great application that is very easy to install and use: it sets up a single-node cluster on the local machine, allowing you to test your deployments, and it uses the host system to dynamically provision persistent volumes. This was a fast and easy way to test deployments without incurring extra costs with Google, AWS or one of the other cloud providers. Installation is as easy as

$ snap install microk8s --classic

on a Linux machine. Adding functionality for DNS, storage or the Kubernetes dashboard requires a single command:

$ microk8s.enable <add-on name>

Adding the storage add-on enables persistent volumes to be dynamically provisioned on the host path without further setup.
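
For example, enabling the three add-ons mentioned above in one go (add-on names as of the microk8s version I used):

$ microk8s.enable dns dashboard storage

The bundled client can then be used to check the node with microk8s.kubectl get nodes.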

Once I had a single-node cluster running and provisioning persistent volumes, I was able to move towards a real-world setup.

1. Create master and slave nodes

In this case I chose Hetzner, as it is an extremely cheap cloud provider offering compute instances and storage. For the test cluster I used one master with 2 vCPUs and 4 GB RAM, which costs EUR 5.88 per month; this is the minimum specification on which Kubernetes allows you to run the master when using kubeadm. The slave nodes consisted of 3 compute instances with 1 vCPU and 2 GB RAM at EUR 2.99 per instance. The whole test cluster costs EUR 14.85 per month, as Hetzner charges for stopped instances as well.
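
If you prefer the command line over the Hetzner console, the instances can also be created with Hetzner's hcloud CLI. A minimal sketch, assuming the cx21 and cx11 server types that match the specifications above and an Ubuntu 18.04 image:

$ hcloud server create --name k8s-master --type cx21 --image ubuntu-18.04
$ for i in 1 2 3; do hcloud server create --name k8s-slave-$i --type cx11 --image ubuntu-18.04; done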

2. Install Docker

Docker is required on all nodes. Once installed, the cgroup driver needs to be changed: Docker uses the cgroupfs driver by default, but on Linux distributions that use systemd as their init system, systemd is the recommended driver. Changing the driver ensures there is only one control group manager constraining the resources being used. Running two control group managers is not recommended, as it creates a more complicated view of resource allocation, and nodes have been shown to become unstable with two control group managers.

This is achieved by creating the following file:

$ touch /etc/docker/daemon.json

and including the following in the file:

{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" },
  "storage-driver": "overlay2"
}
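
After saving the file, Docker needs to be restarted for the new driver to take effect, and the result can be verified:

$ systemctl daemon-reload
$ systemctl restart docker
$ docker info | grep -i cgroup

The last command should report systemd as the cgroup driver.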

3. Install Kubeadm

Now my question was: how do you start a cluster? Enter kubeadm. Kubeadm is a tool for getting a cluster up and running: it bootstraps the master node and joins any number of slave nodes to it.

The Kubernetes signing key needs to be added on all nodes.

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

The Xenial Kubernetes repository is added:

$ apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"

The package lists are updated and kubeadm is then installed (the kubelet and kubectl packages are pulled in as dependencies):

$ apt update && apt install kubeadm
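
It is also worth pinning these packages so that an unattended upgrade does not unexpectedly move the cluster to a new Kubernetes version:

$ apt-mark hold kubelet kubeadm kubectl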

Swap is then turned off, as the kubelet does not operate correctly with swap enabled:

$ swapoff -a
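
Note that swapoff -a only lasts until the next reboot. To make the change permanent, the swap entry in /etc/fstab can be commented out, for example:

$ sed -i '/ swap / s/^/#/' /etc/fstab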

On the master node Kubernetes is then initialized with:

$ kubeadm init --pod-network-cidr=10.244.0.0/16

Specifying the pod network CIDR is required by Flannel, the pod network provider being used.

To start using the cluster, kubeadm tells you to run the following commands as a regular user:

$ mkdir -p $HOME/.kube

$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
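
At this point kubectl can talk to the cluster. The master will report a NotReady status until the pod network from the next step is installed:

$ kubectl get nodes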

4. Install Flannel

Flannel, the pod network provider, is then deployed by running:

$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Flannel allocates subnets to each host out of a larger preconfigured address space and provides a layer 3 IPv4 network between multiple nodes in a cluster. It does not control how containers are networked but only how the traffic is transported between hosts. The network configuration is stored in the Kubernetes API or etcd.
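
Whether Flannel started correctly can be checked by listing the pods in the kube-system namespace, where one kube-flannel pod should be running per node:

$ kubectl get pods -n kube-system -o wide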

In order to add the slave nodes to the cluster, the kubeadm join command is run on each slave node. kubeadm init prints this command for the admin; it will be similar to the following:

$ kubeadm join 195.201.96.157:6443 --token brtmr3.1i8brj4p70ktclql \
    --discovery-token-ca-cert-hash sha256:f1c5d3047dd8b919400871c4f41c3cca1d498880ae6691a4e674f6840c7bcf83

The slave nodes then join the cluster and are available for pods to be scheduled on.
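
Back on the master, the three slaves should now appear alongside the master and move to the Ready status:

$ kubectl get nodes -o wide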

5. Install Hetzner storage interface

The Hetzner CSI driver is then installed so that the cluster can dynamically provision storage. In current Kubernetes versions the required feature gates are enabled by default and are in beta stage, so the feature-gate flags listed in the README on the Hetzner GitHub repository no longer need to be passed when initializing the cluster. A feature gate is a way of enabling alpha and experimental features in the cluster so that new features can be used.

First the two custom resources CSINodeInfo and CSIDriver are created. On the master node the following manifests are applied.

# kubectl apply -f https://raw.githubusercontent.com/kubernetes/csi-api/release-1.13/pkg/crd/manifests/csidriver.yaml

# kubectl apply -f https://raw.githubusercontent.com/kubernetes/csi-api/release-1.13/pkg/crd/manifests/csinodeinfo.yaml

An API token then needs to be created in the Hetzner console. The API token is required for the dynamic provisioning of volumes within the Hetzner account.

A secret is then created containing the token and saved as secret.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: hcloud-csi
  namespace: kube-system
stringData:
  token: YOURTOKEN

The secret is applied using the command:

# kubectl apply -f secret.yaml

The CSI driver is then deployed:

# kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/master/deploy/kubernetes/hcloud-csi.yml

The admin now has a storage class and storage driver installed, allowing dynamic provisioning of persistent volumes, which keeps data available should a pod become unhealthy.
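
The result can be checked by listing the storage classes; the Hetzner driver registers one (named hcloud-volumes at the time of writing):

# kubectl get storageclass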

6. Drupal Deployment

The official Drupal image drupal:8.6.15-apache will be used for the prototype to demonstrate the setup, with the MySQL image mysql:5.6 as the database.

Two YAML files are created: drupal-persistentVolume-deployment.yaml and mysql-persistentVolume-deployment.yaml.

---
apiVersion: v1
kind: Service
metadata:
  name: drupal
  labels:
    app: drupal
spec:
  ports:
  - port: 80
    name: web
    targetPort: 80
  selector:
    app: drupal
    tier: frontend
  type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: drupal-pvc
  labels:
    app: drupal
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: drupal
  labels:
    app: drupal
    tier: frontend
spec:
  selector:
    matchLabels:
      app: drupal
      tier: frontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: drupal
        tier: frontend
    spec:
      initContainers:
      - name: init-sites-volume
        image: drupal:8.6.15-apache
        command: ['/bin/bash', '-c']
        args: ['cp -r /var/www/html/sites /data; chown www-data:www-data /data/ -R']
        volumeMounts:
        - mountPath: /data
          name: drupal-pvc
      containers:
      - image: drupal:8.6.15-apache
        imagePullPolicy: IfNotPresent
        name: drupal
        env:
        - name: DRUPAL_DATABASE_HOST
          value: drupal-mysql
        - name: DRUPAL_DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass
              key: password
        ports:
        - containerPort: 80
          name: drupal
        volumeMounts:
        - name: drupal-pvc
          mountPath: /var/www/html/modules
          subPath: modules
        - name: drupal-pvc
          mountPath: /var/www/html/profiles
          subPath: profiles
        - name: drupal-pvc
          mountPath: /var/www/html/sites
          subPath: sites
        - name: drupal-pvc
          mountPath: /var/www/html/themes
          subPath: themes
      volumes:
      - name: drupal-pvc
        persistentVolumeClaim:
          claimName: drupal-pvc

For the Drupal deployment a service is created to give the Drupal site a cluster IP address, from which the site can later be exposed to the internet via the Ingress controller. The persistent volume claim creates a 10 GB volume (the minimum volume size Hetzner allows) when a pod claims storage. In the deployment manifest an init container copies the sites folder from the image to the persistent storage and changes the owner and permissions so the folders are accessible. The main container then binds the sites, modules, profiles and themes folders from the persistent volume into the pod, allowing changes to be made to the website. The database host is set, as is the database password, which comes from a secret passed into the Kubernetes API by a secret generator (see step 8). The container claims its volume through the persistent volume claim; the Hetzner CSI driver then creates the volume and binds it to the node for usage.
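
Until an Ingress controller is in place, the site can be reached for testing with a port forward once the manifests have been applied (step 8 below):

$ kubectl port-forward service/drupal 8080:80

The Drupal installer is then available at http://localhost:8080.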

7. MySQL deployment

---
apiVersion: v1
kind: Service
metadata:
  name: drupal-mysql
  labels:
    app: drupal
spec:
  ports:
  - port: 3306
  selector:
    app: drupal
    tier: backend
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: drupal-pvc-db
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: drupal-mysql
  labels:
    app: drupal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: drupal
      tier: backend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: drupal
        tier: backend
    spec:
      containers:
      - image: mysql:5.6
        imagePullPolicy: IfNotPresent
        name: mysql
        env:
        - name: MYSQL_DATABASE
          value: drupal-db
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass
              key: password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-stateful-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-stateful-storage
        persistentVolumeClaim:
          claimName: drupal-pvc-db

In the MySQL deployment a service is also created, exposing the MySQL pod via a cluster IP so the Drupal instance can find the database; it is not exposed to the Ingress controller. KubeDNS, the Kubernetes DNS service, creates a DNS entry mapping the drupal-mysql hostname to this cluster IP. A persistent volume claim is then created as in the Drupal deployment. The container is run with the database name and the root password taken from the secret created in the Kubernetes API. The MySQL data directory is mounted on the persistent storage, and the volume is bound to the pod using the persistent volume claim, as in the Drupal deployment.
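
The DNS entry can be verified from inside the cluster with a throwaway pod, for example:

$ kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup drupal-mysql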

8. Kustomization file

A kustomization.yaml file is then created, including a secret generator for the MySQL root password and the files needed for the deployments:

secretGenerator:
- name: mysql-pass
  literals:
  - password=YOURPASSWORD
resources:
- mysql-persistentVolume-deployment.yaml
- drupal-persistentVolume-deployment.yaml

The three files are then stored in the same folder and the deployment is created by running:

# kubectl apply -k ./

This creates the deployments, services and persistent volume claims, and dynamically provisions the storage volumes, with each component abstracted into its own microservice, independent of the others.
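
A quick way to confirm that everything came up:

# kubectl get deployments,services,pvc

The persistent volume claims should show a Bound status once the Hetzner volumes have been provisioned.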

There are, however, a number of limitations to this solution, and I will continue working on it until it is capable of running a production site.

I would want to move the MySQL deployment to a StatefulSet, as it is currently unable to scale: creating a replica breaks the site, and this needs to be looked into. The Drupal container can scale with replicas. Comparing a Deployment and a StatefulSet will also be useful to see which is more suitable for Drupal. Next steps would also be to integrate GitLab with the cluster and to have a running Ingress controller, using the review apps provided by GitLab's Kubernetes integration.

This solution shows how to run a basic Drupal site using Kubernetes with Hetzner as the storage provider. Hetzner is a cheap provider and its storage interface is stable. Drupal and Kubernetes can be run together, but more work needs to be done to get a production-ready site and to work out the challenges unique to Drupal and saving state.