Uneven worker load distribution when using Kafka

Franky Laarits · December 2023

Why do I see some worker processes having a higher usage than others for Kafka sources or destinations?

Brian Yearwood · January 2024

The relationship between a consumer and a partition in a Kafka topic is one to one. Currently, when consumers receive their partition assignments from the coordinator, the assignments are based on the consumer ID.

The "coordinator" takes the entire list of consumers and then matches them to partitions. It does not assign partitions to new consumers as they join the consumer group.

The coordinator takes the list of consumers, sorts them by name, and then passes the sorted list to our partition assignment code.

For example:
- A Kafka topic with 10 partitions
- 2 Cribl workers, each with 10 worker processes
- A total of 20 consumers will join the consumer group

The 10 consumers from the first worker come first in the list, so they are assigned the partitions to consume from. Although all 20 consumers join the consumer group, only 10 will be consuming; the other 10 remain part of the consumer group but in an idle state.
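The example above can be sketched in a few lines of Python. This is only an illustration, not the actual assignment code: the consumer ID format (`worker-1-proc-00`) and the `assign_partitions` helper are hypothetical, and the sketch simply assumes that consumers are sorted by name and each of the first ones receives a single partition.

```python
from collections import Counter


def assign_partitions(consumer_ids, num_partitions):
    """Hypothetical model: sort consumers by name, one partition each."""
    ordered = sorted(consumer_ids)
    # Only the first `num_partitions` consumers in sorted order get a partition.
    return {cid: partition for partition, cid in enumerate(ordered[:num_partitions])}


# Assumed ID format: 2 workers x 10 worker processes = 20 consumers.
consumers = [f"worker-{w}-proc-{p:02d}" for w in (1, 2) for p in range(10)]
assignment = assign_partitions(consumers, num_partitions=10)

# Count consuming processes per worker, and list the idle consumers.
active_per_worker = Counter(cid.split("-proc-")[0] for cid in assignment)
idle = [cid for cid in consumers if cid not in assignment]

print(dict(active_per_worker))  # all 10 active consumers belong to worker-1
print(len(idle))                # the 10 consumers of worker-2 are idle
```

Under these assumptions every partition lands on the first worker, which matches the uneven load described here.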

We have seen that in such cases, when the worker that is consuming is stopped or restarted, the idle consumers go through the rebalancing phase, are assigned partitions, and commence consuming.

However, as soon as the original worker processes have rejoined the consumer group and rebalancing has completed, they take over consuming from the partitions, leaving the other worker idle again.

This has been reported to our development team as a feature enhancement, CRIBL-20627, to investigate whether we can randomize which consumers are assigned which partitions.
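The restart behaviour can be modelled the same way. The sketch below is a simplification under stated assumptions: the ID format and helper names are hypothetical, and it assumes the restarted worker rejoins with IDs that still sort ahead of the other worker's, which may not hold for real consumer IDs.

```python
def assign_partitions(consumer_ids, num_partitions):
    """Hypothetical model: sort consumers by name, one partition each."""
    ordered = sorted(consumer_ids)
    return {cid: partition for partition, cid in enumerate(ordered[:num_partitions])}


def active_workers(assignment):
    """Return the workers that own at least one assigned partition."""
    return sorted({cid.split("-proc-")[0] for cid in assignment})


worker_1 = [f"worker-1-proc-{p:02d}" for p in range(10)]
worker_2 = [f"worker-2-proc-{p:02d}" for p in range(10)]

# Both workers in the group: only the first worker consumes.
before = active_workers(assign_partitions(worker_1 + worker_2, 10))
# First worker stopped: the rebalance hands the partitions to the second worker.
during = active_workers(assign_partitions(worker_2, 10))
# First worker rejoins: the next rebalance gives the partitions back to it.
after = active_workers(assign_partitions(worker_2 + worker_1, 10))

print(before, during, after)
```

Because the result depends only on the sorted order, the order in which consumers join makes no difference in this model.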
