Skip to content

Translate Quip's Loadshed-Lock to Go#840

Merged
bgwines merged 13 commits into
bwines/snake-22from
bwines/loadshed-lock
May 20, 2026
Merged

Translate Quip's Loadshed-Lock to Go#840
bgwines merged 13 commits into
bwines/snake-22from
bwines/loadshed-lock

Conversation

@bgwines

@bgwines bgwines commented Apr 10, 2026

Copy link
Copy Markdown

Background / Why?

VTTablets under high load need a mechanism to shed excess requests rather than letting queue depth grow unbounded. The Python loadshed-lock system implements this using the CoDel (Controlled Delay) algorithm — the same algorithm used in Linux's fq_codel queueing discipline — to detect persistent queue buildup and drop low-priority requests before they accumulate latency that compounds downstream.

This PR ports that system to Go. The core challenge is that Python's asyncio provides implicit single-threaded synchronization, while Go requires explicit synchronization under true parallelism. The design uses a single shared sync.Mutex across all layers to mirror Python's single-threaded guarantee, with chan error replacing asyncio.Future for per-request signaling.

Architecture

Lock (public API — owns sync.Mutex)
│
├── Acquire(ctx, valveID) → *SafeUnlock
│   ├── uncontended: grant immediately, start max-age timer
│   └── contended: enqueue → select { req.result | ctx.Done() }
│
├── SafeUnlock.Release(errs...)
│   ├── verify nonce (sync.Once for idempotency)
│   ├── run release callbacks (mutex NOT held)
│   └── dequeue current holder, signal next waiter
│
├── SelfContentionAwareCoDelQueue (valve layer)
│   │
│   │  N valves (one per valve ID, typically one per HTTP request
│   │  on the app server). Each valve permits at most 1 request
│   │  into the CoDel queue; the rest wait in a per-ID FIFO slice.
│   │
│   │  Owns drop orchestration: lockedRunScheduledDrop encapsulates
│   │  finding the lowest-priority droppable request and triggering
│   │  valve promotion.
│   │
│   │         valve "A"           valve "B"         valve "C"
│   │        ┌─────────┐        ┌─────────┐       ┌─────────┐
│   │        │ A₂  A₃  │        │ B₂      │       │ (empty) │
│   │        └────┬────┘        └────┬────┘       └────┬────┘
│   │             │ promote          │ promote         │
│   │             ▼                  ▼                 ▼
│   └───► CoDelQueue ◄───────────────────────────────────────
│         (single shared instance)
│
├── CoDelQueue (CoDel algorithm)
│   ├── *list.List: [A₁] [B₁] [C₁]   (O(1) head pop + O(1) removal)
│   ├── healthy ↔ dropping state machine
│   ├── control law: interval / count^exponent
│   └── schedules drops via injected callback — no timer ownership
│
└── Timers (owned by Lock, scheduled via callback injection)
    ├── dropTimer: time.AfterFunc → sq.lockedRunScheduledDrop
    └── maxAgeTimer: time.AfterFunc → force-release stale holder

Request lifecycle events

Event Mechanism Result channel touched? Request leaves CoDel queue?
Grant lockedMarkNotDroppable + signal(nil) Yes — nil No — holder stays until release
Release lockedDequeue No (already signaled) Yes — removed on release
Drop lockedDropActivesignal(err) Yes — error Yes — removed immediately
Cancel lockedCancel / lockedRemove Yes — error Yes — removed immediately

Summary

  • request.goRequest with priority (*float64), result channel, signal() for atomic write to inspectable outcome field + blocking channel, container/list back-pointer for O(1) removal, unexported priorityUndroppable sentinel, droppability derived from priority (no separate boolean)
  • codelq.go — CoDel queue + DroppedRequestError: healthy/dropping state machine, control law (interval / count^exponent), drop timer scheduling via injected callback, max 100 drops per timer fire, sojourn-time-based state transitions
  • selfcontentionaware_codelq.go — Per-valve-ID serialization: at most 1 request per valve ID in CoDel queue at a time, preventing fan-out patterns from inflating queue depth. Owns lockedRunScheduledDrop which encapsulates drop-finding + valve promotion
  • lock.goLock (public API) + SafeUnlock with nonce verification, Acquire(ctx, valveID) with explicit valve ID parameter, max-age forced release, cancel-vs-grant race handling, release callbacks with panic recovery

Key design decisions

  1. Drop timer via callback injection — CoDelQueue calls an injected scheduleDropTimer(delayNs) internally whenever it needs a timer (matching Python's call_later). Lock provides the callback. This eliminates the error-prone out-parameter pattern that caused missed reschedules.
  2. Valve ID as explicit parameterAcquire(ctx, valveID string) takes the valve ID directly rather than via a config callback. Clearer call sites, no ambient state.
  3. Droppability derived from priority — No separate droppable boolean. isDroppable() checks priority != priorityUndroppable. lockedMarkNotDroppable sets priority to the sentinel.
  4. Cancel-vs-grant race — Go's non-deterministic select when both ctx.Done() and req.result are ready requires explicit detection: the cancel path checks if l.holder == req after acquiring the mutex, and if so calls releaseInternal to prevent lock orphaning.

Differences from the Python implementation

  1. Synchronization: Python relies on asyncio's single-threaded event loop (no locking). Go uses a single sync.Mutex on Lock guarding all state
  2. Request signaling: Python uses asyncio.Future. Go uses buffered chan error (cap 1) paired with signal() that writes both an inspectable outcome field and the channel
  3. Timer ownership: Both Python and Go inject scheduling into the queue. Python uses loop.call_later(), Go uses an injected func(delayNs int64) callback. Lock owns the actual time.AfterFunc and provides idempotency
  4. Cancel-vs-grant race: Python has no equivalent — asyncio's cooperative scheduling prevents it. Go's Acquire uses a double-select with holder-check fallback
  5. Context cancellation: Python cancels via asyncio.CancelledError. Go takes context.Context in Acquire with explicit lockedCancel/lockedRemove methods
  6. Release idempotency: Python uses __del__ fallback. Go wraps Release in sync.Once
  7. Max-age timeout: Python cancels the holder's asyncio.Task. Go uses time.AfterFunc with staleness guard via maxAgeHolder pointer comparison
  8. activePerValve map: Added for O(1) lookup of which request is in CoDel per valve ID (Python scans callbacks)
  9. Release callbacks: Python awaits async coroutines. Go runs func(error) callbacks without mutex held, with recover() for panic safety

Testing

98 tests across 4 test files:

  • codelq_test.go (39 tests) — CoDel algorithm unit tests: FIFO ordering, drop priority selection, state machine transitions, control law math, scheduled drop behavior via test recorder
  • selfcontentionaware_codelq_test.go (14 tests) — Valve logic: direct entry, valving, promotion on dequeue/drop/cancel, outstanding count lifecycle, FIFO within contention
  • lock_test.go (28 tests) — Lock behavior: mutual exclusion, FIFO ordering, context cancellation, cancel-vs-grant race, self-contention serialization, CoDel drops, max-age timeout, SafeUnlock idempotency, release callbacks with panic recovery
  • lock_stress_test.go (16 tests) — High-concurrency stress tests with -race: 200 goroutines, rapid acquire/release, cancel-vs-grant races, drop timer + cancel races, max-age under load, goroutine leak detection, starvation prevention, self-contention stress (mutual exclusion, FIFO valve ordering, drop+promotion chains, cancel-in-valve, mixed cancel/drop/grant, sustained high concurrency)
  • lock_race_test.go (1 test) — 50k-iteration targeted reproduction of the cancel-vs-grant race

All tests pass with -race -count=5.


AI disclosure: Claude Code assisted with development. Every line of code was either written by or carefully reviewed by me :)

@github-actions github-actions Bot added this to the v24.0.0 milestone Apr 10, 2026
@bgwines bgwines force-pushed the bwines/loadshed-lock branch from 314f3e5 to 06d5590 Compare April 10, 2026 17:59
@bgwines bgwines changed the title Implement CoDel-based loadshed lock for VTTablet Translate Quip's Loadshed-Lock to Go Apr 10, 2026
@bgwines bgwines changed the title Translate Quip's Loadshed-Lock to Go Translate Quip's Loadshed-Lock to Go [WIP] Apr 10, 2026
Comment on lines +59 to +66
// SafeUnlock is a handle for releasing a lock. Only the goroutine that
// acquired the lock should call Release. Release is idempotent.
SafeUnlock struct {
l *Lock
nonce uint64
once sync.Once
err error
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pattern appears to not be idiomatic in go. if the usage of that the lock is taken and released in the same function, maybe we don't need the additional proptection and can get away with a simpler abstraction of an "erroring mutex"

err := emu.Lock()
if err != nil {
	return ...
}
defer emu.Unlock()

@bgwines bgwines May 1, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we discussed this verbally today. Assuming we keep the lock(semaphore) holder inside the codelq, the use-case of transactions means we can't do this since it's multiple requests from the client, which implies multipl estackframes.

Fortunately, it's not completely without precedent. Claude claims idiomatic Go for this is store the release handle on the struct that owns the resource lifetime, which is StatefulConnection. It already does exactly this with the DB connection itself:

  // stateful_connection.go
  type StatefulConnection struct {
      dbConn  *connpool.PooledConn   // acquired in Begin, released in Commit/Rollback
      ConnID  tx.ConnID
      // ...
  }

  func (sc *StatefulConnection) Release(reason tx.ReleaseReason) {
      sc.pool.unregister(sc.ConnID, ...)
      sc.dbConn.Recycle()    // ← returns conn to pool
      sc.dbConn = nil
  }


if contentionID != "" {
s.outstandingCounts[contentionID]++
if s.outstandingCounts[contentionID] > 1 {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to put some thought into whether 1 is the right number here. it made sense when doing single dispatch, but when we do multiple and possibly variable dispatch i don't know offhand what the right answer is.

IntervalNs func() int64
TargetNs func() int64
Exponent func() float64
MinDropDelayNs func() int64

@bgwines bgwines May 19, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for Quip we never overrode the default we set, 100ns -- maybe just don't even expose this?

@bgwines

bgwines commented May 19, 2026

Copy link
Copy Markdown
Author

Code Review: Interactive Walkthrough Results

Brett and I did a function-by-function review comparing this Go translation against the Python original. 23 items identified — 1 bug, 3 architectural, 5 refactoring, 14 naming/clarity.

Bug Found & Fixed

Cancel-vs-grant race in Acquire's double-select (commit 125651dcea)

Race window: releaser sets l.holder = next and unlocks, but BEFORE calling next.signal(nil), the cancelling goroutine's inner non-blocking select takes default (done channel empty), acquires the mutex, and calls lockedCancel on itself — the current holder. Reproduction rate: 17% (17,202 / 100,000 iterations). Fixed by checking l.holder == req after acquiring mutex in the default branch, draining the pending signal, then calling releaseInternal(). Stress test: 50,000 iterations, 0 failures.

Architecture (3 items)

# Summary
8 Queue should manage its own drop timer via callback injection (faithful to Python's battle-tested design)
20 Eliminate maxAgeHolder — restructure release so all state manipulation is one atomic mutex hold, run callbacks after granting next holder
21 Make valve ID a mandatory string parameter to Acquire, not a config callable

Refactoring (5 items)

# Summary
15 Extract lockedClearActiveAndPromote shared helper from promote-on-grant and promote-on-evict
18 Extract release callback execution into runReleaseCBs helper
19 Extract lockedGrantNext helper for duplicated grant logic
13 Inline decrementOutstanding in clearDone loop
23 Collapse NewLock / NewLockWithClock into single constructor

Naming & Clarity (14 items)

# Summary
1 Use explicit constant for undroppable priority, not nil
2 Remove droppable instance variable — derive from priority
3 Remove enqueuedAt parameter from newRequest
4 Rename done channel to result
5 Remove redundant q.droppableLen > 0 guard
6 Rename clockFunc to now
7 Rename enqueuedAt to enqueuedAtNs
9 Rename lockedCancel to lockedRemove
10 Rename contentionID to valveID throughout
11 Add comment explaining empty valve ID bypass semantics
12 Rename activeRequests to clarify it only tracks valve-ID requests
14 Rename SelfContentionAwareCoDelQueue receiver sq
17 Fix "run release callbacks without mutex held" comment to explain WHY
22 Rename l.sq to l.q

All items will be addressed in a follow-up commit on this branch.

@bgwines bgwines changed the base branch from main to bwines/project-snake May 19, 2026 20:31
@salesforce-cla

Copy link
Copy Markdown

Thanks for the contribution! Before we can merge this, we need @mattlord @vitess-bot @nickvanw @maxenglander @mhamza15 @arthurschreiber to sign the Salesforce Inc. Contributor License Agreement.

bgwines and others added 10 commits May 19, 2026 21:35
Port the Python loadshed-lock system to Go for production use on
VTTablets. The lock uses a CoDel (Controlled Delay) algorithm to
detect persistent queue buildup and shed load by dropping low-priority
requests, while a self-contention-aware valve system prevents fan-out
patterns from artificially triggering drops.

Key design decisions for Go's concurrency model:
- Single shared mutex across all layers (Lock, SelfAwareCoDelQueue,
  CoDelQueue) mirrors Python's single-threaded asyncio guarantee
- chan error (capacity 1) per request replaces asyncio.Future
- Direct method calls within critical sections replace Python's
  future-callback inter-layer communication
- Cancel-vs-grant race handling for Go's non-deterministic select

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>
- Inline `DroppedRequestError` into `codelq.go`, remove `errors.go`
- Rename `SelfAwareCoDelQueue` → `SelfContentionAwareCoDelQueue`
- Rename `cq` field → `codelq` for explicitness
- Rename timer functions to `locked*` prefix for consistency
  (`scheduleDropTimerLocked` → `lockedScheduleDropTimer`, etc.)
- Add `PriorityUndroppable` sentinel constant in `request.go`
- Replace `context.Background()` with `t.Context()` in all tests
- Run `gofumpt` and `goimports` on all files

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>
Signed-off-by: Brett Wines <bwines@slack-corp.com>
Resolves golangci-lint failures from the Static Code Checks CI job:
- Replace math/rand with math/rand/v2 (depguard rule)
- Convert all wg.Add(1)+go func patterns to wg.Go (waitgroup modernize)
- Add //nolint:modernize to NewPriority and math.Inf call sites

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>
Signed-off-by: Brett Wines <bwines@slack-corp.com>
The CoDel queue insertion logic (set enqueuedAt, PushBack, droppableLen,
timer schedule signaling) was duplicated between CoDelQueue.lockedEnqueue
and SelfContentionAwareCoDelQueue.codelqEnqueue. Extract it into
CoDelQueue.lockedEnqueueRequest so the valve layer delegates instead of
reimplementing. Mirrors the Python split (enqueue vs enqueue_request).

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>

AI disclosure: Claude Code assisted with development. Every line of code was either written by or carefully reviewed by me :)

Signed-off-by: Brett Wines <bwines@slack-corp.com>
…ield

lockedPeek needed to check whether a signaled request was granted (nil)
or dropped (error) without consuming the channel value. It did this by
reading from the channel and writing the value back — correct under the
mutex but fragile and hard to reason about.

Add a signal() method on Request that writes both an inspectable result
field and the blocking channel atomically. lockedPeek now reads result
directly. The channel is still used for blocking in Lock.Acquire.

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>

AI disclosure: Claude Code assisted with development. Every line of code was either written by or carefully reviewed by me :)

Signed-off-by: Brett Wines <bwines@slack-corp.com>
The old TestLock_Stress_SelfContention used defaultLockConfig() which
hardcodes ContentionID to return "" — the valve was never exercised.
TestLock_Stress_SelfContention_Proper had weak assertions.

Replace both with 6 focused stress tests that use a sync.Map +
goroutineID() pattern to set per-goroutine contention IDs, ensuring
the valve mechanism is actually tested under concurrent load:

1. MutualExclusion: 10 IDs x 10 goroutines, per-ID + global atomics
2. ValveSerializationOrder: FIFO assertion for single contention ID
3. DropPromotionChain: aggressive CoDel, granted+dropped=total per ID
4. CancelInValve: cancel every other waiter, verify promotion chain
5. MixedCancelDropGrant: CoDel+max-age+timeouts, no lost requests
6. HighConcurrency_Sustained: 500ms sustained cycling, mutual exclusion

All pass with -race -count=5.

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>

AI disclosure: Claude Code assisted with development. Every line of code was either written by or carefully reviewed by me :)

Signed-off-by: Brett Wines <bwines@slack-corp.com>
Lock.runDropTimer reached through two layers (l.sq.codelq) to find
droppable requests and run the CoDel state machine, then called back
up to l.sq.lockedDropActive for valve promotion. This coupled Lock to
CoDelQueue internals (list.Element, Request casting, contentionID).

Give SelfContentionAwareCoDelQueue its own lockedRunScheduledDrop that
encapsulates the find-and-drop-with-promotion logic. Lock now calls
l.sq.lockedRunScheduledDrop() without knowing about CoDelQueue, list
elements, or how drops trigger valve promotion — matching the Python
original where CoDelQueue owned all drop logic and promotion happened
via Future callbacks.

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>

AI disclosure: Claude Code assisted with development. Every line of code was either written by or carefully reviewed by me :)

Signed-off-by: Brett Wines <bwines@slack-corp.com>
`scheduleDropIfNeeded` was a no-op that existed as a placeholder from
the Python translation. Timer scheduling is signaled via return values
from `lockedEnqueue` and `lockedRunScheduledDrop` — the no-op was never
called for effect.

`stopDropTimer` → `lockedClearTimerFlag` makes the method honest about
what it does (clears a bookkeeping flag) and consistent with the
`locked*` prefix convention.

AI disclosure: Claude Code assisted with development. Every line of code was either written by or carefully reviewed by me :)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Brett Wines <bwines@slack-corp.com>
When lockedDequeue exits the dropping state (sojourn < target), the
drop timer was cleared but never re-armed. If droppable items remain
in the queue and no new enqueue arrives, CoDel never fires — a
liveness gap. Port the Python behavior: re-arm at the healthy interval.

The fix changes lockedDequeue to return (req, needSchedule, delayNs)
so callers can propagate the reschedule signal up through
SelfContentionAwareCoDelQueue and Lock.release/releaseInternal.

Also fixes lockedPromote which was discarding the needSchedule signal
from codelqEnqueue — valve promotions now correctly propagate the
timer schedule signal to the Lock layer.

AI disclosure: Claude Code assisted with development. Every line of code was either written by or carefully reviewed by me :)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Brett Wines <bwines@slack-corp.com>
The race window: after the releaser sets holder=req and unlocks but
BEFORE signaling, the cancelling goroutine's inner non-blocking select
takes default (done channel empty), acquires the mutex, and previously
would call lockedCancel on itself — the current holder.

This leaked the grant (17% reproduction rate in stress test): release
callbacks skipped, max-age timer fires on stale holder, l.holder
points to a cancelled request.

Fix: after acquiring the mutex in the default branch, check if we were
granted during the race window (l.holder == req). If so, drain the
pending signal from the done channel (the releaser is about to send it)
and call releaseInternal to hand the lock to the next waiter.

AI disclosure: Claude Code assisted with development. Every line of code was either written by or carefully reviewed by me :)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Brett Wines <bwines@slack-corp.com>
@bgwines bgwines force-pushed the bwines/loadshed-lock branch from 125651d to 3f8af27 Compare May 19, 2026 20:35
@bgwines bgwines changed the title Translate Quip's Loadshed-Lock to Go [WIP] Implement CoDel-based loadshed lock for VTTablet May 19, 2026
- Drop timer via callback injection (matches Python's call_later design)
- valveID as explicit Acquire parameter instead of config callback
- Droppability derived from priority (remove separate boolean)
- Rename done→result, enqueuedAt→enqueuedAtNs, contentionID→valveID
- Move drop orchestration into SelfContentionAwareCoDelQueue
- Fix missing decrementOutstanding on dequeue path
- Remove dead code (goroutineID, scheduleDropIfNeeded out-params)
- Simplify test helpers to match new callback-injection APIs

AI disclosure: Claude Code assisted with development. Every line of code was either written by or carefully reviewed by me :)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bgwines bgwines changed the title Implement CoDel-based loadshed lock for VTTablet Translate Quip's Loadshed-Lock to Go May 19, 2026
lockedPeek defensively removes done-with-error requests from the CoDel
queue head, but was not decrementing droppableLen or notifying the
self-contention layer to decrement outstandingCounts. This could cause
both counters to drift if the defensive path ever fires.

Fix: add onPeekCleanup callback to CoDelQueue (mirroring the
scheduleDropTimer injection pattern). CoDelQueue now decrements
droppableLen inline and calls the callback for each cleaned-up request.
SelfContentionAwareCoDelQueue provides the callback to decrement
outstandingCounts.

Python doesn't have this issue because asyncio Future callbacks fire
automatically on completion, so the equivalent bookkeeping is always
triggered regardless of which code path observes the done state.

AI disclosure: Claude Code assisted with development. Every line of code was either written by or carefully reviewed by me :)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bgwines bgwines changed the base branch from bwines/project-snake to bwines/snake-22 May 19, 2026 21:44
@bgwines bgwines marked this pull request as ready for review May 19, 2026 23:29
@bgwines bgwines requested a review from a team as a code owner May 19, 2026 23:29
CI uses Go 1.24.13 (per go.mod) which doesn't have WaitGroup.Go
(added in Go 1.25). Replace all 23 occurrences in lock_stress_test.go
and lock_test.go with the traditional wg.Add(1); go func() { defer
wg.Done(); ... }() pattern.

Also applies gofmt to fix minor alignment drift in request.go and
codelq.go.

AI disclosure: Claude Code assisted with development. Every line of code was either written by or carefully reviewed by me :)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bgwines bgwines merged commit ee133fe into bwines/snake-22 May 20, 2026
90 of 91 checks passed
@bgwines bgwines deleted the bwines/loadshed-lock branch May 20, 2026 01:10
if len(q.pendingRequests[valveID]) == 0 {
delete(q.pendingRequests, valveID)
}
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valves are FIFO, I'll touch this up in a subsequent PR. I don't think this is O(N) based on the caller pattern, but we can make that a hard guarantee by adjusting the implementation a bit.

@bgwines

bgwines commented May 20, 2026

Copy link
Copy Markdown
Author

squash-merged in 4191f37

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants