Creating connections can hang during server downtime #595

@pvlugter

During Postgres instance changes (maintenance, resizing, or failover), on Google Cloud SQL in our case, we've been seeing that when applications are active during the downtime, the connection pool can reach its max size with allocated connections that are not actually usable (neither acquired nor idle). Similar issues have been reported before, with the connection pool hanging after database restarts.

In what we're seeing, connections are closed when the server goes down, and these are invalidated and released to the pool as expected. During the downtime, new connections are accepted and then closed immediately on the Cloud SQL side. For the client, creating these connections never returns, because the SSL handshake future is never completed. Permits have been allocated from the pool, but with the connection creation Mono not returning, they also aren't invalidated.
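The permit leak described above can be reproduced in miniature. The following is only a sketch using the JDK's `Semaphore` and `CompletableFuture`; `TinyPool` and its methods are hypothetical stand-ins, not the driver's or the pool's actual classes. It assumes a permit is returned only when connection creation fails, so a handshake that never completes holds its permit forever.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

public class PermitLeakSketch {

    // Hypothetical stand-in for a pool: one permit per allocated connection.
    static final class TinyPool {
        final Semaphore permits;

        TinyPool(int maxSize) {
            this.permits = new Semaphore(maxSize);
        }

        // Takes a permit, then waits on the handshake future.
        // The permit is given back only if the handshake fails.
        boolean tryCreate(CompletableFuture<Void> handshake) {
            if (!permits.tryAcquire()) {
                return false; // pool is already at max size
            }
            handshake.whenComplete((ok, err) -> {
                if (err != null) {
                    permits.release(); // creation failed: invalidate and free the permit
                }
            });
            return true;
        }
    }

    // Handshake futures that never complete: permits are taken and never returned.
    static int permitsLeftAfterHungHandshakes(int maxSize, int attempts) {
        TinyPool pool = new TinyPool(maxSize);
        for (int i = 0; i < attempts; i++) {
            pool.tryCreate(new CompletableFuture<>());
        }
        return pool.permits.availablePermits();
    }

    // Handshake futures that fail promptly: every permit comes back.
    static int permitsLeftAfterFailedHandshakes(int maxSize, int attempts) {
        TinyPool pool = new TinyPool(maxSize);
        for (int i = 0; i < attempts; i++) {
            CompletableFuture<Void> handshake = new CompletableFuture<>();
            pool.tryCreate(handshake);
            handshake.completeExceptionally(new IOException("connection reset"));
        }
        return pool.permits.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println("hung:   " + permitsLeftAfterHungHandshakes(3, 5));
        System.out.println("failed: " + permitsLeftAfterFailedHandshakes(3, 5));
    }
}
```

With hung handshakes the pool ends up at max size with nothing usable, which matches the state we see during downtime; with handshakes that fail promptly the permits are recovered.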

We've tested changes that handle the early connection resets in the SSL adapter, and this looks good under our testing. I'll create a pull request. There could also be a timeout on the SSL handshake Mono, to safeguard against it never completing.
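To illustrate the two safeguards, here is a rough sketch in plain Java rather than the actual change. `SslAdapterSketch`, `onChannelClosed`, and `onHandshakeSuccess` are hypothetical names, and `CompletableFuture.orTimeout` (Java 9+) stands in for what would be `Mono.timeout(Duration)` on the Reactor side. The idea is that a connection closed before the handshake finishes fails the future, and a timeout covers any remaining case where nothing signals at all.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandshakeGuardSketch {

    // Hypothetical stand-in for an SSL adapter that exposes a handshake future.
    static final class SslAdapterSketch {
        private final CompletableFuture<Void> handshake = new CompletableFuture<>();

        void onHandshakeSuccess() {
            handshake.complete(null);
        }

        // Early reset: fail the handshake instead of leaving it pending.
        // This is a no-op if the handshake has already completed.
        void onChannelClosed() {
            handshake.completeExceptionally(
                    new IOException("Connection closed during SSL handshake"));
        }

        // Safeguard: the future fails with TimeoutException if nothing completes it in time.
        CompletableFuture<Void> handshake(Duration timeout) {
            return handshake.orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS);
        }
    }

    // Reports how the future ended: "ok", the failure's class name, or "hung".
    static String outcome(CompletableFuture<Void> future) {
        try {
            future.get(2, TimeUnit.SECONDS);
            return "ok";
        } catch (ExecutionException e) {
            return e.getCause().getClass().getSimpleName();
        } catch (InterruptedException | TimeoutException e) {
            return "hung";
        }
    }

    static String closedEarly() {
        SslAdapterSketch adapter = new SslAdapterSketch();
        CompletableFuture<Void> f = adapter.handshake(Duration.ofSeconds(1));
        adapter.onChannelClosed(); // server accepted, then closed immediately
        return outcome(f);
    }

    static String neverSignalled() {
        SslAdapterSketch adapter = new SslAdapterSketch();
        return outcome(adapter.handshake(Duration.ofMillis(50)));
    }

    static String succeeded() {
        SslAdapterSketch adapter = new SslAdapterSketch();
        CompletableFuture<Void> f = adapter.handshake(Duration.ofSeconds(1));
        adapter.onHandshakeSuccess();
        return outcome(f);
    }

    public static void main(String[] args) {
        System.out.println(closedEarly());
        System.out.println(neverSignalled());
        System.out.println(succeeded());
    }
}
```

In both failure cases the caller gets an error it can use to invalidate the allocation, rather than waiting indefinitely.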
