edenhill commented May 8, 2019

The logical broker connections (such as the one to the group coordinator)
are reserved for specific use and must not be reused for things like
Metadata requests. The code that looks up a usable broker for sending
Metadata requests understood this, but the code that selects a broker
to open a new connection to when no usable connections are
found (sparse connections) did not: it counted the logical broker
connection as an active connection and refused to set up a new one.

This could stall a consumer that was fetching from a single broker:
if that broker went down, no new connection to the cluster would be
made to refresh metadata.

This fixes #2266
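
To illustrate the difference described above, here is a minimal sketch in C. It is not the actual librdkafka code: `broker_t`, `usable_conn_cnt()`, `need_new_conn_buggy()` and `need_new_conn_fixed()` are hypothetical names invented for this example, and the real broker selection logic is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified broker handle (not librdkafka's real struct). */
typedef struct {
    const char *name;
    bool is_logical; /* reserved connection, e.g. to the group coordinator */
    bool is_up;      /* connection is currently established */
} broker_t;

/* Count connections that may carry general requests such as Metadata.
 * Logical broker connections are reserved and therefore excluded. */
static size_t usable_conn_cnt(const broker_t *brokers, size_t n) {
    size_t cnt = 0;
    assert(brokers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (brokers[i].is_up && !brokers[i].is_logical)
            cnt++;
    }
    return cnt;
}

/* Behaviour before the fix, as described: any established connection,
 * including a logical one, suppresses creation of a new connection. */
static bool need_new_conn_buggy(const broker_t *brokers, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (brokers[i].is_up)
            return false;
    }
    return true;
}

/* Behaviour after the fix, as described: only non-logical connections
 * count, so a new connection is requested when none of those is up. */
static bool need_new_conn_fixed(const broker_t *brokers, size_t n) {
    return usable_conn_cnt(brokers, n) == 0;
}
```

With only the group coordinator connection up and the single fetch broker down, `need_new_conn_buggy()` returns `false` (no new connection, so metadata is never refreshed), while `need_new_conn_fixed()` returns `true`.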

dubee commented May 22, 2019

@edenhill, this seems like a fairly important fix. Are you going to release 1.0.1 with this fix in it soon?

Curious if this bug would cause poll() to block indefinitely?

edenhill commented

Yes, it is part of the v1.0.1 release. v1.0.1-RC1 was tagged yesterday, and we're looking to do the final release later this week.

This bug could cause a consumer to not consume any messages, effectively making consumer_poll() block indefinitely.


Development

Successfully merging this pull request may close these issues.

Long consumer recovery time when non-group coordinator is downed.
