Apache Kafka is a platform that provides durability and fault tolerance by storing messages on persistent volumes.
In most cases, each Kafka broker will use one persistent volume.
However, it is also possible to use multiple volumes for each broker.
This configuration is called JBOD storage.
Using JBOD storage, you can increase the data storage capacity of Kafka nodes, which can also improve performance.
Over time, you might need to add or remove volumes to grow or shrink the overall capacity and performance of the Kafka cluster.
When adding a new volume, you first attach it to the broker and then move some data onto it, which can be done with an intra-broker rebalance.
When removing a volume, you have to safely move its data to the other volumes first.
Failing to do so could result in data loss.
Data movement between JBOD disks can be managed using the kafka-reassign-partitions.sh
tool, though it is not particularly user-friendly.
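To give a sense of what that involves, here is a minimal sketch of a reassignment JSON file that pins one replica of a partition to a specific log directory, followed by the command that executes it. The file name, topic, broker IDs, and log directory paths shown here are purely illustrative:

{
  "version": 1,
  "partitions": [
    {
      "topic": "my-topic",
      "partition": 0,
      "replicas": [3, 4, 5],
      "log_dirs": ["/var/lib/kafka/data-0/kafka-log3", "any", "any"]
    }
  ]
}

bin/kafka-reassign-partitions.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --reassignment-json-file reassignment.json --execute

Writing such files by hand for every partition and disk quickly becomes tedious.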
To simplify this process, Strimzi 0.45.0 introduces support for moving data between JBOD disks using Cruise Control.
New remove-disks mode in KafkaRebalance
The new remove-disks
mode allows you to move the data from one JBOD disk to another JBOD disk using Strimzi’s KafkaRebalance
custom resource.
This feature makes use of the remove-disks
endpoint of Cruise Control that triggers a rebalancing operation which moves all replicas, starting with the largest and proceeding to the smallest, to the remaining disks.
Setting up the environment
Let’s set up a cluster to work through an example demonstrating this feature.
In the example, we will see how to safely remove JBOD disks by first moving their data to other disks, and we will use Kafka
and KafkaNodePool
resources to create a KRaft cluster.
Then, we create a KafkaRebalance
resource in remove-disks
mode, specifying the brokers and volume IDs for partition reassignment.
After generating the optimization proposal, we approve it to move the data.
To get the KRaft cluster up and running, we will first have to install the Strimzi Cluster Operator and then deploy the Kafka
and KafkaNodePool
resources.
You can install the Cluster Operator using any of the installation methods described in the Strimzi documentation.
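One quick option, assuming the myproject namespace used throughout this post, is to apply the release manifests directly:

kubectl create namespace myproject
kubectl create -f 'https://strimzi.io/install/latest?namespace=myproject' -n myproject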
Let’s deploy a KRaft cluster with JBOD storage and Cruise Control enabled.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controller
  labels:
    strimzi.io/cluster: my-cluster
  annotations:
    strimzi.io/next-node-ids: "[0-2]"
spec:
  replicas: 3
  roles:
    - controller
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        kraftMetadata: shared
        deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker
  labels:
    strimzi.io/cluster: my-cluster
  annotations:
    strimzi.io/next-node-ids: "[3-5]"
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        # Indicates that this directory will be used to store the KRaft metadata log
        kraftMetadata: shared
        deleteClaim: false
      - id: 1
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
      - id: 2
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 3.9.0
    metadataVersion: 3.9-IV0
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
  entityOperator:
    topicOperator: {}
    userOperator: {}
  cruiseControl: {}
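Assuming you save the manifests above to a file such as kafka-cluster.yaml, you can deploy them and wait for the cluster to become ready:

kubectl apply -f kafka-cluster.yaml -n myproject
kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n myproject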
Once the Kafka cluster is ready, you can create some topics so that the volumes on brokers contain partition replicas.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 3
  replicas: 3
  config:
    retention.ms: 7200000
    segment.bytes: 1073741824
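Optionally, you can produce a few messages so that the partitions hold some data, running the console producer from a throwaway client pod (the pod name producer is arbitrary):

kubectl -n myproject run producer -ti --image=quay.io/strimzi/kafka:0.45.0-kafka-3.9.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic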
Let’s look at the partition replicas assigned to the volumes on the brokers using the kafka-log-dirs.sh
tool and a helper pod named my-pod.
kubectl -n myproject run my-pod -ti --image=quay.io/strimzi/kafka:0.45.0-kafka-3.9.0 --rm=true --restart=Never -- bin/kafka-log-dirs.sh --describe --bootstrap-server my-cluster-kafka-bootstrap:9092 --broker-list 3,4,5 --topic-list my-topic
Output:
{
  "brokers": [
    {
      "broker": 4,
      "logDirs": [
        {
          "partitions": [
            {
              "partition": "my-topic-0",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data-2/kafka-log4"
        },
        {
          "partitions": [
            {
              "partition": "my-topic-1",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data-0/kafka-log4"
        },
        {
          "partitions": [
            {
              "partition": "my-topic-2",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data-1/kafka-log4"
        }
      ]
    },
    {
      "broker": 5,
      "logDirs": [
        {
          "partitions": [
            {
              "partition": "my-topic-2",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data-1/kafka-log5"
        },
        {
          "partitions": [
            {
              "partition": "my-topic-0",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data-0/kafka-log5"
        },
        {
          "partitions": [
            {
              "partition": "my-topic-1",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data-2/kafka-log5"
        }
      ]
    },
    {
      "broker": 3,
      "logDirs": [
        {
          "partitions": [
            {
              "partition": "my-topic-0",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data-0/kafka-log3"
        },
        {
          "partitions": [
            {
              "partition": "my-topic-2",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data-2/kafka-log3"
        },
        {
          "partitions": [
            {
              "partition": "my-topic-1",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data-1/kafka-log3"
        }
      ]
    }
  ],
  "version": 1
}
Next, let’s move the data from volumes 1 and 2 to volume 0 for all the brokers.
To do that, we create a KafkaRebalance
resource and specify remove-disks
mode.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: my-rebalance
  labels:
    strimzi.io/cluster: my-cluster
# no goals specified, using the default goals from the Cruise Control configuration
spec:
  mode: remove-disks
  moveReplicasOffVolumes:
    - brokerId: 3
      volumeIds:
        - 1
        - 2
    - brokerId: 4
      volumeIds:
        - 1
        - 2
    - brokerId: 5
      volumeIds:
        - 1
        - 2
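Assuming the resource is saved as my-rebalance.yaml, apply it with:

kubectl apply -f my-rebalance.yaml -n myproject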
Now let’s wait for the KafkaRebalance
resource to move to the ProposalReady
state. You can check the rebalance summary by running the following command once the proposal is ready:
kubectl get kafkarebalance my-rebalance -n myproject -o yaml
The output should look something like this:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  creationTimestamp: "2025-01-15T10:20:37Z"
  generation: 1
  labels:
    strimzi.io/cluster: my-cluster
  name: my-rebalance
  namespace: myproject
  resourceVersion: "1755"
  uid: 335ee6b2-59d2-4326-a690-00278d125abd
spec:
  mode: remove-disks
  moveReplicasOffVolumes:
  - brokerId: 3
    volumeIds:
    - 1
    - 2
  - brokerId: 4
    volumeIds:
    - 1
    - 2
  - brokerId: 5
    volumeIds:
    - 1
    - 2
status:
  conditions:
  - lastTransitionTime: "2025-01-15T10:20:52.882813259Z"
    status: "True"
    type: ProposalReady
  observedGeneration: 1
  optimizationResult:
    afterBeforeLoadConfigMap: my-rebalance
    dataToMoveMB: 0
    excludedBrokersForLeadership: []
    excludedBrokersForReplicaMove: []
    excludedTopics: []
    intraBrokerDataToMoveMB: 0
    monitoredPartitionsPercentage: 100
    numIntraBrokerReplicaMovements: 135
    numLeaderMovements: 0
    numReplicaMovements: 0
    onDemandBalancednessScoreAfter: 100
    onDemandBalancednessScoreBefore: 0
    provisionRecommendation: ""
    provisionStatus: UNDECIDED
    recentWindows: 1
    sessionId: 028a7dc8-8f6d-485e-8580-93225528b587
Now you can approve the generated proposal by applying the approve annotation (strimzi.io/rebalance=approve) to the KafkaRebalance resource.
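For example:

kubectl annotate kafkarebalance my-rebalance strimzi.io/rebalance=approve -n myproject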
If the strimzi.io/rebalance-auto-approval
annotation is set to true
in the KafkaRebalance
resource, the Cluster Operator will approve the proposal automatically.
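For example, a metadata snippet enabling auto-approval on the KafkaRebalance resource looks like this:

metadata:
  name: my-rebalance
  labels:
    strimzi.io/cluster: my-cluster
  annotations:
    strimzi.io/rebalance-auto-approval: "true"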
For more details, you can refer to our Strimzi documentation.
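Either way, you can watch the resource until the rebalance finishes and it reaches the Ready state:

kubectl get kafkarebalance my-rebalance -n myproject -w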
After the rebalance is complete, we use the kafka-log-dirs.sh
tool again to verify that the data has been moved.
kubectl -n myproject run my-pod -ti --image=quay.io/strimzi/kafka:0.45.0-kafka-3.9.0 --rm=true --restart=Never -- bin/kafka-log-dirs.sh --describe --bootstrap-server my-cluster-kafka-bootstrap:9092 --broker-list 3,4,5 --topic-list my-topic
Output:
{
  "brokers": [
    {
      "broker": 4,
      "logDirs": [
        {
          "partitions": [],
          "error": null,
          "logDir": "/var/lib/kafka/data-2/kafka-log4"
        },
        {
          "partitions": [
            {
              "partition": "my-topic-0",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            },
            {
              "partition": "my-topic-1",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            },
            {
              "partition": "my-topic-2",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data-0/kafka-log4"
        },
        {
          "partitions": [],
          "error": null,
          "logDir": "/var/lib/kafka/data-1/kafka-log4"
        }
      ]
    },
    {
      "broker": 5,
      "logDirs": [
        {
          "partitions": [],
          "error": null,
          "logDir": "/var/lib/kafka/data-1/kafka-log5"
        },
        {
          "partitions": [
            {
              "partition": "my-topic-0",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            },
            {
              "partition": "my-topic-1",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            },
            {
              "partition": "my-topic-2",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data-0/kafka-log5"
        },
        {
          "partitions": [],
          "error": null,
          "logDir": "/var/lib/kafka/data-2/kafka-log5"
        }
      ]
    },
    {
      "broker": 3,
      "logDirs": [
        {
          "partitions": [
            {
              "partition": "my-topic-0",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            },
            {
              "partition": "my-topic-1",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            },
            {
              "partition": "my-topic-2",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data-0/kafka-log3"
        },
        {
          "partitions": [],
          "error": null,
          "logDir": "/var/lib/kafka/data-2/kafka-log3"
        },
        {
          "partitions": [],
          "error": null,
          "logDir": "/var/lib/kafka/data-1/kafka-log3"
        }
      ]
    }
  ],
  "version": 1
}
As shown in the output, the data has been moved from volumes 1 and 2 of all the brokers, which no longer contain any partition replicas. To remove volume 1 and volume 2 from all the brokers, we need to update the configuration for the node pools to specify only volume 0 and then apply the changes.
# ...
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker
  labels:
    strimzi.io/cluster: my-cluster
  annotations:
    strimzi.io/next-node-ids: "[3-5]"
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        kraftMetadata: shared
        deleteClaim: false
# ...
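Assuming the updated node pool is saved as broker-node-pool.yaml, apply the change with:

kubectl apply -f broker-node-pool.yaml -n myproject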
Checking the PVCs, we see that they are not deleted.
kubectl get pvc -n myproject
NAME                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-0-my-cluster-broker-3       Bound    pvc-d3027150-36ef-48f5-9ee1-ee4bdad93c0c   100Gi      RWO            standard       63m
data-0-my-cluster-broker-4       Bound    pvc-12792aeb-83c5-47a9-a7e3-cfe9a3472fee   100Gi      RWO            standard       63m
data-0-my-cluster-broker-5       Bound    pvc-35c1b8ad-d4cc-4785-848a-670dc62035c3   100Gi      RWO            standard       63m
data-0-my-cluster-controller-0   Bound    pvc-7719eb5d-59d1-4e47-9fed-7ffeff566d1f   100Gi      RWO            standard       63m
data-0-my-cluster-controller-1   Bound    pvc-b1bccbe2-cd0b-4ae6-a6d7-54f2dd9728bf   100Gi      RWO            standard       63m
data-0-my-cluster-controller-2   Bound    pvc-a7b8306a-f9c3-4e62-9201-f9cb189fc3f8   100Gi      RWO            standard       63m
data-1-my-cluster-broker-3       Bound    pvc-0aa2f480-6cb8-4247-b0f7-322bb95b0b2a   100Gi      RWO            standard       63m
data-1-my-cluster-broker-4       Bound    pvc-870755ad-3ee2-4c7a-8c03-c20502014548   100Gi      RWO            standard       63m
data-1-my-cluster-broker-5       Bound    pvc-0a291b63-955b-40db-98e7-eef25224c78a   100Gi      RWO            standard       63m
data-2-my-cluster-broker-4       Bound    pvc-3d79414f-e227-4555-8589-41893d433d8d   100Gi      RWO            standard       63m
data-2-my-cluster-broker-5       Bound    pvc-0c126dc5-863f-4e2a-96ae-1a4fef9d8839   100Gi      RWO            standard       63m
This is because the PVCs are not deleted by default (deleteClaim: false), so you need to remove them yourself. You can delete a PVC using the following command.
kubectl delete pvc data-1-my-cluster-broker-3 -n myproject
You can remove the other PVCs in the same way.
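For example, the remaining unused PVCs from the listing above can be removed in a single command:

kubectl delete pvc data-1-my-cluster-broker-4 data-1-my-cluster-broker-5 data-2-my-cluster-broker-4 data-2-my-cluster-broker-5 -n myproject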
What’s missing?
After all replicas are moved off the specified disks, new partition replicas can still be assigned to them, for example when a new topic is created, and Cruise Control may also still use those disks during a later rebalance. Make sure you delete the unneeded volumes once they have been emptied, before performing new operations such as rebalancing or creating topics.
Additional notes
- This feature only works if JBOD storage with multiple disks is used; otherwise you will get an error about not having enough volumes to move the data to.
- The optimization proposal does not show the load before optimization; it only shows the load after optimization.
What’s next
We hope this blog post has given you a clear understanding of how to use the KafkaRebalance
custom resource in remove-disks
mode to easily move data between JBOD disks.
If you encounter any issues or want to know more, refer to our documentation on Using Cruise Control to reassign partitions on JBOD disks.