Scaling an Apache Kafka cluster up or down can help meet various goals and deliver multiple benefits. Scaling up by adding new brokers enables the cluster to handle a higher data load across clients. Scaling down by removing brokers can reduce costs and energy consumption when demand is lower. Scaling makes even more sense when it is done automatically, for example by leveraging the HPA (Horizontal Pod Autoscaler) within Kubernetes. After scaling up, however, the cluster is left unbalanced, with the new brokers empty. This requires rebalancing topic partitions to spread the load evenly across all brokers, old and new. Scaling down, on the other hand, may not be possible if the brokers to be removed host topic partitions, unless we are willing to accept offline replicas and sacrifice High Availability (HA). In both scenarios, rebalancing the cluster by using the Cruise Control integration within Strimzi is the solution. Doing it automatically makes the process even easier, and that is what this blog post is about.
How scaling and rebalancing are done today
When scaling up a Kafka cluster, newly added brokers don’t get any partitions for the topics hosted on the other brokers.
The new brokers remain empty and will receive partitions for newly created topics based on Apache Kafka’s distribution mechanism.
Quite often, this is not what you want because you are scaling up in order to spread the load across more brokers and to get better performance.
Currently, the only way to achieve this with Strimzi is by manually rebalancing the cluster using the KafkaRebalance custom resource in add-brokers mode, where you list the added brokers and specify the goals to be met by Cruise Control.
This way, some of the already existing topic partitions are moved from the old brokers to the new ones, making the cluster more balanced.
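As a rough sketch of what such a manual request might look like, the following KafkaRebalance targets two newly added brokers. The resource name and the broker IDs 3 and 4 are placeholders chosen for illustration; the cluster name and the goal list simply mirror the examples used later in this post.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: my-add-brokers-rebalance     # hypothetical name
  labels:
    strimzi.io/cluster: my-cluster   # the Kafka cluster this request applies to
spec:
  mode: add-brokers
  brokers: [3, 4]                    # assumed IDs of the newly added brokers
  goals:                             # example goals, adjust to your needs
    - CpuCapacityGoal
    - NetworkInboundCapacityGoal
    - DiskCapacityGoal
    - RackAwareGoal
    - MinTopicLeadersPerBrokerGoal
    - NetworkOutboundCapacityGoal
    - ReplicaCapacityGoal
```

Unless auto-approval is configured, the generated proposal still has to be approved by hand, typically by annotating the resource with strimzi.io/rebalance=approve, which is one more manual step in the workflow.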
When scaling down, the operation to remove brokers could be blocked by the Cluster Operator because those brokers are hosting topic partitions and an immediate shutdown would cause offline replicas and loss of HA.
To move forward, you can run a manual rebalance before scaling down by using a KafkaRebalance custom resource in remove-brokers mode, listing the brokers to remove and the goals to be met.
This way, Cruise Control moves partitions off the brokers to be removed so that they become empty and can be scaled down.
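A minimal sketch of such a request could look like the one below. Again, the resource name and the broker IDs are assumptions made for the example, and the reduced goal list is only there to keep the snippet short.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: my-remove-brokers-rebalance  # hypothetical name
  labels:
    strimzi.io/cluster: my-cluster   # the Kafka cluster this request applies to
spec:
  mode: remove-brokers
  brokers: [3, 4]                    # assumed IDs of the brokers to be removed
  goals:                             # shortened example list
    - DiskCapacityGoal
    - ReplicaCapacityGoal
  skipHardGoalCheck: true            # allows a goal list that omits some configured hard goals
```

Only once this rebalance has completed and the listed brokers no longer host any partitions can the cluster be scaled down.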
For more details about how to do this, please refer to the official documentation related to the Cruise Control integration within Strimzi.
While both of the above approaches work, the process is still manual and done in two separate steps:
- scale the cluster up, then run a rebalancing by using the add-brokers mode.
- run a rebalancing by using the remove-brokers mode, then scale the cluster down.
What if this process could be automated? What about having the Cluster Operator run a rebalance right after a scale-up operation, or right before scaling down the cluster? The new Strimzi 0.44.0 release brings you the new auto-rebalancing on cluster scaling feature!
The auto-rebalancing on scaling feature to the rescue!
Auto-rebalancing is enabled and configured through the new autoRebalance property within the cruiseControl section of the Kafka custom resource.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
  cruiseControl:
    # ...
    autoRebalance:
      - mode: add-brokers
        template:
          name: my-add-brokers-rebalancing-template
      - mode: remove-brokers
        template:
          name: my-remove-brokers-rebalancing-template
As the example above shows, within the autoRebalance section the user can specify “when” the auto-rebalancing has to run: on scaling up (by using the add-brokers mode) and/or on scaling down (by using the remove-brokers mode).
It is not mandatory to have them both; you can decide to have auto-rebalancing on scaling up but not on scaling down and vice versa.
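For instance, a configuration that rebalances only after scaling up would list just the add-brokers mode. This is a sketch based on the example above, reusing its template name:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
  cruiseControl:
    # ...
    autoRebalance:
      # only scale-up triggers an auto-rebalancing;
      # no remove-brokers entry, so scaling down is not auto-rebalanced
      - mode: add-brokers
        template:
          name: my-add-brokers-rebalancing-template
```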
For each mode, it is possible to specify the name of a rebalancing configuration “template”.
This is just a KafkaRebalance custom resource with the new strimzi.io/rebalance-template: true annotation applied, which makes it a configuration template to be used during auto-rebalancing operations and not an actual rebalance request to run.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: my-rebalance-template
  annotations:
    strimzi.io/rebalance-template: "true" # specifies that this KafkaRebalance is a rebalance configuration template
spec:
  # NOTE: mode and brokers fields, if set, will be just ignored because they are
  #       automatically set on the corresponding KafkaRebalance by the operator
  goals:
    - CpuCapacityGoal
    - NetworkInboundCapacityGoal
    - DiskCapacityGoal
    - RackAwareGoal
    - MinTopicLeadersPerBrokerGoal
    - NetworkOutboundCapacityGoal
    - ReplicaCapacityGoal
  skipHardGoalCheck: true
  # ... other rebalancing related configuration
The configuration template can be the same for both scaling operations or the user can decide to use different ones.
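As an illustration of the shared-template case, both modes could point at the my-rebalance-template resource defined above. This is only a sketch of one possible setup:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
  cruiseControl:
    # ...
    autoRebalance:
      # both scaling directions reuse the same rebalancing configuration
      - mode: add-brokers
        template:
          name: my-rebalance-template
      - mode: remove-brokers
        template:
          name: my-rebalance-template
```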
Using the KafkaRebalance custom resource as a template makes the lives of Strimzi users easier because they already know how it works with regard to the Cruise Control integration.
There isn’t anything new to learn beyond applying the template annotation and not specifying fields like mode and brokers, because they will be set automatically by the Strimzi operator when running the auto-rebalancing.
Indeed, when an auto-rebalancing has to run, the Cluster Operator starts from the template and creates an actual KafkaRebalance custom resource by using that configuration and adding the right mode and brokers properties.
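To make this more concrete, the resource generated for a scale-up might look roughly like the sketch below. The goals and skipHardGoalCheck come from the template shown earlier and the name follows the pattern visible in the listing further down, but the broker IDs are made up and the exact metadata the operator sets is an assumption here, not something taken from the operator's output.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: my-cluster-auto-rebalancing-add-brokers
  labels:
    strimzi.io/cluster: my-cluster   # assumed: same label as a manual request
  # no strimzi.io/rebalance-template annotation: this is an actual rebalance request
spec:
  mode: add-brokers                  # set by the operator, not by the template
  brokers: [3, 4]                    # set by the operator; IDs here are hypothetical
  goals:                             # copied from the template
    - CpuCapacityGoal
    - NetworkInboundCapacityGoal
    - DiskCapacityGoal
    - RackAwareGoal
    - MinTopicLeadersPerBrokerGoal
    - NetworkOutboundCapacityGoal
    - ReplicaCapacityGoal
  skipHardGoalCheck: true            # copied from the template
```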
In the following shell snippet, you can see the KafkaRebalance resources involved in two auto-rebalancing operations.
The first one is about scaling up and the next one is about scaling down.
It shows how the KafkaRebalance resources go through the usual states when the operator interacts with Cruise Control for running the rebalancing.
NAME                                         CLUSTER      TEMPLATE   STATUS
my-add-brokers-rebalancing-template          my-cluster   true
my-remove-brokers-rebalancing-template       my-cluster   true
...
my-cluster-auto-rebalancing-add-brokers      my-cluster              PendingProposal
my-cluster-auto-rebalancing-add-brokers      my-cluster              ProposalReady
my-cluster-auto-rebalancing-add-brokers      my-cluster              Rebalancing
my-cluster-auto-rebalancing-add-brokers      my-cluster              Ready
...
...
my-cluster-auto-rebalancing-remove-brokers   my-cluster              ProposalReady
my-cluster-auto-rebalancing-remove-brokers   my-cluster              Rebalancing
my-cluster-auto-rebalancing-remove-brokers   my-cluster              Ready
It is also possible to get the auto-rebalancing progress from the status section within the Kafka custom resource, as well as from the specific KafkaRebalance instance created by the operator.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ...
  cruiseControl:
    # ...
    autoRebalance:
      # ...
status:
  autoRebalance:
    lastTransitionTime: "2024-10-24T10:46:10.759494479Z"
    modes:
    - brokers:
      - 6
      - 7
      mode: remove-brokers
    state: RebalanceOnScaleDown
If you want to see the auto-rebalancing in action together with cluster auto-scaling, you can watch the KubeCon NA 2024 session, Elastic Data Streaming: Autoscaling Apache Kafka, delivered by Jakub Scholz.
Conclusion
Scaling an Apache Kafka cluster plays an important role in getting better performance on the one hand and saving costs and energy on the other. But scaling alone can’t bring all the advantages: it needs to be supported by a rebalancing process that spreads the load across all the brokers. Having an automatic mechanism to do so is the key to making your cluster scale more elastically, as also demonstrated by Jakub Scholz in this video. We hope this new feature proves useful for Strimzi users, and we encourage you to try it out. A better integration between Kafka and Cruise Control is one of the long-term goals of the Strimzi project, one that would allow a sort of Kafka auto pilot mode, and this is a significant step in that direction. Please let us know how it works, any issues you encounter, or any suggestions you may have. Looking forward to hearing from our beloved community!