smartconnpool: reject SetCapacity on a closed pool + shutdown stress tests#20158
smartconnpool: reject SetCapacity on a closed pool + shutdown stress tests#20158arthurschreiber wants to merge 3 commits into
Conversation
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
There was a problem hiding this comment.
Pull request overview
This PR expands the smartconnpool test suite with additional shutdown/stress regression coverage intended to exercise close/drain behavior under concurrency, refresh-driven reopens, and capacity churn (as part of the shutdown audit work referenced by #20122 and #20157).
Changes:
- Added a comprehensive pool stress test that mixes
Get/Recycle/Taint, refresh checks, and capacity updates while asserting no leaks after shutdown (TestStressFull). - Added a multi-cycle stress regression that targets close behavior under aggressive refresh-triggered reopen plus concurrent
SetCapacitychurn (TestStressRefreshReopenSetCapacityClose). - Added a focused regression test ensuring
CloseWithContextdoes not hand an “idle-cleanup popped” connection to a waiter once capacity reaches 0 (TestCloseWithContextAfterIdleCleanupPopDoesNotLeak).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| go/pools/smartconnpool/stress_test.go | Adds new stress/regression tests exercising shutdown semantics under concurrent traffic, refresh reopens, and capacity changes. |
| go/pools/smartconnpool/pool_test.go | Adds a targeted regression test around close + waiters when an idle-cleanup connection is returned after capacity hits 0. |
Signed-off-by: Arthur Schreiber <arthur@planetscale.com> # Conflicts: # go/pools/smartconnpool/pool_test.go # go/pools/smartconnpool/stress_test.go
A SetCapacity call queued on capacityMu while CloseWithContext holds it would run right after Close released the mutex and silently write the requested capacity into pool.capacity. The pool ended up closed (IsOpen() == false) with a non-zero Capacity(), as the TestStressRefreshReopenSetCapacityClose regression test demonstrates. Guard SetCapacity with the same close-pointer check that reopen() already uses, returning ErrConnPoolClosed when the pool is not open. A pre-Open SetCapacity call in primeTxPoolWithConnection was already a no-op today (pool.open() unconditionally stores config.maxCapacity into pool.capacity), so it's moved after Open to actually pin the intended capacity of 1. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #20158 +/- ##
===========================================
+ Coverage 69.67% 77.44% +7.77%
===========================================
Files 1614 110 -1504
Lines 216793 16654 -200139
===========================================
- Hits 151044 12898 -138146
+ Misses 65749 3756 -61993
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
What's this?
Adds the remaining smartconnpool stress/regression tests from the connection pool stall / correctness audit, plus the small fix one of them surfaced.
Fix:
SetCapacityon a closed poolA
SetCapacitycall queued oncapacityMuwhileCloseWithContextholds it would acquire the mutex right afterClosereleased it and silently store the requested capacity intopool.capacity. The pool then sat in aIsOpen() == false/Capacity() > 0state — closed, but with capacity still pointing at the racing caller's last value.Guarded
SetCapacitywith the sameclose.Load() == nilcheck thatreopen()already uses, returningErrConnPoolClosedwhen the pool is not open.The same guard also rejects pre-
Opencalls, which were already a no-op in practice —open()unconditionally storesconfig.maxCapacityintopool.capacity— but quietly accepted the value. One test (primeTxPoolWithConnection) was relying on this no-op; it now sets capacity afterOpen, which is what it always intended.Tests
TestStressFull— broad shutdown stressTestStressRefreshReopenSetCapacityClose— surfaces theSetCapacity/Closerace fixed hereTestCloseWithContextAfterIdleCleanupPopDoesNotLeak— covered by smartconnpool: unblock Close and preserve Setting on reopen #20122's fixTestSetCapacityRejectedOnClosedPool— pins the new contract directly (pre-Open, open, and closed)Related Issue(s)
Builds on #20122 and #20157, both of which have merged.
Checklist
Deployment Notes
SetCapacitynow returnsErrConnPoolClosedinstead of silently writing intopool.capacitywhen the pool is closed (or has never been opened). The only production callers — the/debug/envHTTP setters intabletserver.go— are guarded bytsv.isOpenand only run when the QueryEngine pools are open, so this is not a user-visible change there.AI Disclosure
Tests written by GPT-5 with human direction; the
SetCapacityguard and the test updates were written by Claude Code with direction from Arthur.