Stream connections dropping on 0.5.35 release #2638

@eliteprox

Describe the bug

A few orchestrators have reported issues with the latest release (0.5.35). I upgraded on Monday and noticed a significant drop in stream stability: large spikes and crashes were common, along with many “connection reset by peer” errors. After extensive troubleshooting, I rolled back to 0.5.34 today, which restored stability; we performed better than on any other day this week.

I think PR #2628 might be related to these connections dropping prematurely.

To Reproduce
Steps to reproduce the behavior:

While operating an orchestrator in production, observe the logs for connection resets and monitor per-stream metrics. Many of the dropped streams are associated with “connection reset by peer” errors.

Expected behavior
After some research, it seems an idle timeout may not be the best way to handle latent connections: https://stackoverflow.com/questions/29334407/creating-an-idle-timeout-in-go

At first I thought 8 seconds was simply too low, but it seems that separate read and write timeouts may perform better than a single idle timeout. There is also an event handler option; see the Stack Overflow link above.

Screenshots
None

Desktop (please complete the following information):

  • OS: Ubuntu Linux 22.04

Additional context

Labels: type: bug
