A few months ago I wrote a blog post about how you can manually increase the size of the persistent volumes you use for Kafka or Zookeeper storage. I promised that one day this would be supported directly in Strimzi, and that's what happened in the Strimzi 0.12 release. In this post I'm going to tell you more about its improvements to persistent storage.
Resizing persistent volumes
Support for resizing volumes depends on the version of your Kubernetes or OpenShift cluster, and on the infrastructure it runs on. We tested this feature with Kubernetes and OpenShift on Amazon AWS. It should be compatible with Kubernetes 1.11+ and OpenShift 3.11+. Resizing persistent volumes should work on most major public clouds (such as Amazon AWS, Microsoft Azure and Google Cloud) and many other storage types (such as Cinder or Ceph).
To tell your Kubernetes or OpenShift cluster that your storage supports volume resizing, you have to set the allowVolumeExpansion option in your StorageClass to true. For example, the following StorageClass creates Amazon AWS GP2 volumes with the xfs filesystem, encryption and volume expansion enabled:
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  encrypted: "true"
  fsType: "xfs"
reclaimPolicy: Delete
allowVolumeExpansion: true
The option is called allowVolumeExpansion because right now you can only increase the size of the volumes to expand them - you cannot decrease it. Strimzi checks that the storage class has this option set to true before it tries to resize any volumes.
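You can quickly verify whether a given storage class allows expansion before requesting a bigger volume. A minimal check, assuming the ssd storage class from the example above:

kubectl get storageclass ssd -o jsonpath='{.allowVolumeExpansion}'
# prints "true" when expansion is allowed; if it prints nothing,
# edit the StorageClass and set allowVolumeExpansion: true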
The original blog post explained in detail how this resizing works:
- The size increase is requested by changing the spec.resources.requests.storage of the Persistent Volume Claim (PVC) (a manual example of this step is sketched after this list)
- Kubernetes requests resizing of the volumes from your infrastructure
- Once the resizing of the volume is finished, the pod using this volume needs to be restarted to allow the expansion of the file system
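If you ever need to do the first step by hand, it comes down to patching the PVC. A minimal sketch, assuming a PVC named data-my-cluster-kafka-0 (the name is illustrative and depends on your cluster):

# request a bigger volume by bumping the storage request on the PVC
kubectl patch pvc data-my-cluster-kafka-0 \
  --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"1000Gi"}}}}'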
What Strimzi does is simplify this process by taking care of it for you! All you need to do is edit your Kafka custom resource and increase the requested storage size. You can increase the size of both Zookeeper and Kafka volumes. For example, you can change the resource from:
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  zookeeper:
    # ...
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
to:
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
    storage:
      type: persistent-claim
      size: 1000Gi
      deleteClaim: false
  zookeeper:
    # ...
    storage:
      type: persistent-claim
      size: 200Gi
      deleteClaim: false
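You can make this change in whatever way you normally manage your custom resources. A couple of illustrative commands (the file and resource names are assumptions):

# re-apply the edited YAML file
kubectl apply -f kafka.yaml
# or edit the Kafka resource in place
kubectl edit kafka my-cluster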
And the rest is taken care of by Strimzi.
The Cluster Operator automatically changes the requested volume size in the PVCs and waits until a restart of the pod is required.
Once the condition of the PVC is set to FileSystemResizePending (read the original blog post for more information about the different states the PVC can be in during the resizing), Strimzi automatically restarts the pod using this PVC:
# ...
status:
  phase: Bound
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1000Gi
  conditions:
    - type: FileSystemResizePending
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2019-02-27T12:25:30Z'
      message: >-
        Waiting for user to (re-)start a pod to finish file system resize of
        volume on node.
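If you want to watch the process yourself, you can inspect the PVC conditions directly. For example, using the same illustrative PVC name as before:

# show only the resize-related conditions
kubectl get pvc data-my-cluster-kafka-0 -o jsonpath='{.status.conditions}'
# or dump the whole PVC and look at the status section
kubectl get pvc data-my-cluster-kafka-0 -o yaml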
The restart might not happen immediately - it will happen as part of the next periodic reconciliation, so you might need to wait a couple of minutes. The Cluster Operator is, of course, aware of your whole Kafka cluster and will not restart all your pods at the same time, but one by one, to make sure your cluster remains in a usable state.
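How long you wait depends on the reconciliation interval configured for the Cluster Operator. As a rough illustration, it is set through an environment variable on the Cluster Operator Deployment (the value shown here is just an example):

# snippet from the Cluster Operator Deployment
env:
  - name: STRIMZI_FULL_RECONCILIATION_INTERVAL_MS
    value: "120000"  # reconcile every 2 minutes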
Adding and removing volumes from JBOD storage
JBOD (Just a Bunch Of Disks) storage allows you to use multiple disks in each Kafka broker for storing commit logs. Strimzi added support for JBOD storage in Kafka brokers in version 0.11 (JBOD storage is not supported in Zookeeper), but it didn't allow you to add or remove volumes: you had to specify all your volumes when creating the cluster.
In 0.12 we significantly improved the JBOD storage support. As well as resizing the volumes as described in the previous section, you can now add or remove volumes in an existing cluster.
To be able to do that, your Kafka cluster needs to already be using JBOD storage. For example:
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 1000Gi
          deleteClaim: false
        - id: 1
          type: persistent-claim
          size: 1000Gi
          deleteClaim: false
    # ...
If you have something like this, you can easily add more volumes just by editing the YAML. For example:
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 1000Gi
          deleteClaim: false
        - id: 1
          type: persistent-claim
          size: 1000Gi
          deleteClaim: false
        - id: 10
          type: persistent-claim
          size: 1000Gi
          deleteClaim: false
        - id: 11
          type: persistent-claim
          size: 1000Gi
          deleteClaim: false
    # ...
Similarly you can also remove volumes:
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
    storage:
      type: jbod
      volumes:
        - id: 1
          type: persistent-claim
          size: 1000Gi
          deleteClaim: false
    # ...
When removing volumes, Strimzi will not do anything to move the data from the volume you are removing. You have to do that manually before removing the volume(s) from the JBOD storage. Strimzi will simply remove the volume from the pods.
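Kafka itself has the tooling to move the data: since Kafka 1.1 the partition reassignment tool can move replicas between log directories on the same broker. The sketch below is only an illustration - the topic name, broker id, log dir path and exact tool flags are assumptions and depend on your Kafka version and on how the volumes are mounted in the broker pods:

# reassignment.json: keep partition 0 of my-topic on broker 0,
# but move its data onto the log dir backed by the JBOD volume with id 1
cat > reassignment.json <<'EOF'
{
  "version": 1,
  "partitions": [
    {
      "topic": "my-topic",
      "partition": 0,
      "replicas": [0],
      "log_dirs": ["/var/lib/kafka/data-1/kafka-log0"]
    }
  ]
}
EOF

# run the tool from inside a broker pod; older Kafka versions also
# require the --zookeeper option for the reassignment itself
bin/kafka-reassign-partitions.sh \
  --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json \
  --execute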
There are some things you should keep in mind and take care with:
- The id numbers have to be unique.
- The id numbers do not have to be in sequence.
- Changing the id number is the same as removing the volume with the original id and adding a new volume with the new id. Strimzi will stop using the old PVC with the old volume and create a new PVC for the new volume.
- By default the PVCs are not deleted. So if you reuse an id which you already used in the past, you should check first whether the old PVC still exists or not. If it exists, it will be reused instead of creating a new PVC with a new volume (see the sketch after this list for a quick way to check).
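A quick way to check which PVCs already exist for your cluster is to list them by the strimzi.io/cluster label. The naming scheme shown in the comments reflects how Strimzi names JBOD PVCs, but treat it as an illustration for the my-cluster example used in this post:

# list all PVCs belonging to the cluster
kubectl get pvc -l strimzi.io/cluster=my-cluster
# for JBOD storage you should see names such as
#   data-0-my-cluster-kafka-0
#   data-1-my-cluster-kafka-0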
Using a different storage class for each broker
One of the other storage features we added in the 0.12 release is the capability to use a different storage class for each broker.
You can specify a different storage class for one or more Kafka brokers, instead of using the same storage class for all of them.
This is useful if, for example, storage classes are restricted to different availability zones or data centers. You can use the overrides field for this purpose.
For example:
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
    storage:
      deleteClaim: true
      size: 100Gi
      type: persistent-claim
      class: my-storage-class
      overrides:
        - broker: 0
          class: my-storage-class-zone-1a
        - broker: 1
          class: my-storage-class-zone-1b
        - broker: 2
          class: my-storage-class-zone-1c
  # ...
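Once the cluster is deployed, you can verify which storage class each broker's PVC actually uses - for example with the same illustrative cluster name and label as above:

kubectl get pvc -l strimzi.io/cluster=my-cluster \
  -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName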
Conclusion
These improvements should make Strimzi easier to use and make it much simpler to grow your Kafka cluster together with your project, which is important when you are looking to save on infrastructure costs. Things such as adding or resizing volumes are an important part of Day 2 operations, which is the area where the operator pattern can provide the most help. In future Strimzi releases we plan to add more storage-related features, such as backup and recovery.
If you liked this blog post, star us on GitHub and follow us on Twitter to make sure you don’t miss any of our future blog posts!