Description
I'm seeing consumers commit offsets that are lower than offsets they have committed before. This seems to happen during a rebalance.
I have a group of 4 consumers, reading from a single topic with 20 partitions, ack mode is RECORD (and enable.auto.commit=false), and no other non-default settings.
The 4 consumers are all happily consuming messages from their assigned partitions. Debug logging is enabled, which tells me exactly which offsets are committed for each partition. The consumer that is assigned partition 1 logs
Group g committed offset 2080124 for partition p-1
Group g committed offset 2080125 for partition p-1
When one of the consumers is stopped, a rebalance is triggered on the remaining 3 consumers. The consumer that is assigned partition 1 shows the following sequence of log messages:
Revoking previously assigned partitions [p-0, p-3, p-4, p-1, p-2] for group g
Group g committed offset 2077901 for partition p-1
(Re-)joining group g
Group g committed offset 2077901 for partition p-1
Group g committed offset 2077902 for partition p-1
Performing assignment for group g using strategy range with subscriptions {...}
Finished assignment for group g: {...}
Successfully joined group g with generation 605
Setting newly assigned partitions [p-0, p-5, p-6, p-3, p-4, p-1, p-2] for group g
g fetching committed offsets for partitions: [p-0, p-5, p-6, p-3, p-4, p-1, p-2]
Group g committed offset 2028542 for partition p-0
Group g committed offset 2046396 for partition p-5
Group g committed offset 2065056 for partition p-6
Group g committed offset 2047834 for partition p-3
Group g committed offset 2055878 for partition p-4
Group g committed offset 2077901 for partition p-1
Group g committed offset 2058169 for partition p-2
partitions assigned:[p-0, p-5, p-6, p-3, p-4, p-1, p-2]
The committed offset of 2077901 on line 2 is lower than the last "regularly" committed offset of 2080125. It seems that the consumer somehow commits this incorrect offset as soon as the rebalance starts. It even commits two more incorrect offsets. After the consumer has joined the group, the offsets are fetched, including the incorrect one for p-1. After that, the consumer continues committing offsets starting from these fetched values, as expected.
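To make the regression easier to spot in a large debug log, here is a minimal sketch of a scanner that flags any committed offset lower than the highest one seen so far for the same group and partition. It assumes the log lines contain the exact text "Group <group> committed offset <n> for partition <p>" as in the excerpts above; the function name `find_offset_regressions` and the `SAMPLE` list are only illustrative and are not part of spring-kafka or kafka-clients.

```python
import re

# Matches the commit messages shown in the log excerpts above (assumed format).
COMMIT_RE = re.compile(r"Group (\S+) committed offset (\d+) for partition (\S+)")


def find_offset_regressions(lines):
    """Return (partition, highest_seen, committed) for every commit that goes backwards."""
    highest = {}      # (group, partition) -> highest committed offset seen so far
    regressions = []
    for line in lines:
        match = COMMIT_RE.search(line)
        if match is None:
            continue  # not a commit message
        group, offset, partition = match.group(1), int(match.group(2)), match.group(3)
        key = (group, partition)
        previous = highest.get(key)
        if previous is not None and offset < previous:
            regressions.append((partition, previous, offset))
        else:
            highest[key] = offset
    return regressions


# Lines copied from the log excerpts in this report.
SAMPLE = [
    "Group g committed offset 2080124 for partition p-1",
    "Group g committed offset 2080125 for partition p-1",
    "Revoking previously assigned partitions [p-0, p-3, p-4, p-1, p-2] for group g",
    "Group g committed offset 2077901 for partition p-1",
    "(Re-)joining group g",
    "Group g committed offset 2077902 for partition p-1",
]

if __name__ == "__main__":
    for partition, previous, committed in find_offset_regressions(SAMPLE):
        print(f"{partition}: committed {committed} after {previous}")
```

The highest offset is deliberately not lowered when a regression is found, so every later commit below the pre-rebalance value is reported as well.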
This happens on a few partitions almost every time a consumer is stopped. Unfortunately, I have not been able to reproduce it in a lab setting. Does anyone have an idea what is happening here?
I'm using spring-kafka 1.3.0 and kafka-clients 0.11.0.2.
Thanks,
Tom