Skip to content

Fix WebSocket close frame discarding buffered data frames#2032

Merged
piotrpio merged 1 commit intomainfrom
ws-close
Mar 25, 2026
Merged

Fix WebSocket close frame discarding buffered data frames#2032
piotrpio merged 1 commit intomainfrom
ws-close

Conversation

@wallyqs
Copy link
Copy Markdown
Member

@wallyqs wallyqs commented Feb 27, 2026

When a NATS server rejects a WebSocket connection (e.g., due to an expired JWT), it sends a -ERR protocol message followed by a WebSocket close frame. If both frames arrive in the same TCP read, the websocketReader.Read() method would process the close frame and immediately return io.EOF, discarding the -ERR data that had already been buffered in the pending list.

This caused nats.Connect to return a generic EOF error instead of the proper authentication error (ErrAuthorization, ErrAuthExpired, etc.), making it difficult for users to diagnose connection failures.

The fix defers the close frame's io.EOF when there is already pending data, storing it in a new closeErr field. The pending data is returned first, and the deferred error is returned on the subsequent Read() call.

@wallyqs wallyqs marked this pull request as ready for review March 2, 2026 16:57
@piotrpio
Copy link
Copy Markdown
Collaborator

@claude review this PR

@synadia-claude-reviewer
Copy link
Copy Markdown

synadia-claude-reviewer Bot commented Mar 25, 2026

Claude finished @piotrpio's task in 1m 31s —— View job


PR Review

  • Gather context
  • Read changed files
  • Analyze the fix
  • Post review feedback

Summary: This is a clean, minimal fix for a real UX problem. The approach is correct and well-targeted. A few minor observations below.


Fix Analysis (ws.go)

The fix is correct. The core change is:

  1. In websocketReader.Read(), when a control frame handler (specifically a close frame) returns an error and there is already data in r.pending, store the error in a new closeErr field and break rather than returning immediately.
  2. After r.pending is fully drained (which can happen over multiple Read() calls for small buffers), the deferred error is returned at the closeErr check block (lines 209–213).

The ordering in the else branch is correct:

if len(r.pending) > 0 {
    return r.drainPending(p), nil   // drain buffered data first
}
if r.closeErr != nil {
    ...                              // return deferred error after data is drained
}

One subtle correctness note: After breaking out of the loop on a deferred close, the function reaches:

if len(r.pending) > 0 {
    return r.drainPending(p), nil
}
return 0, nil   // ws.go:346

If somehow r.pending is empty when the break occurs (e.g., the close frame arrived before any data frame added to pending), the function returns (0, nil). The deferred closeErr is still returned correctly on the next Read() call via the check at line 209. This is an edge case that doesn't apply to the bug being fixed (the data frame would always have been added to pending first), but the behavior remains correct.

The closeErr check is inside the else branch (i.e., only when r.ib == nil). If there's a buffered r.ib, the deferred error is skipped in that pass — but that's fine, since r.ib represents leftover data from the same TCP read that needs to be parsed first.


Test Analysis (ws_test.go)

The test correctly exercises the primary scenario: binary data frame + close frame in the same buffer, verifying that the data is returned first and io.EOF follows on the subsequent Read().

Minor nit (test comment vs. frame type): The test comment at line 237 says the server sends "-ERR... as a text frame", but the frame is constructed with wsBinaryMessage (line 244). The fix works for both frame types at the WS layer, so this doesn't affect test correctness, but the comment is slightly misleading.

Minor nit (close frame status code): Status 0x03, 0xE8 = 1000 ("normal closure") is used alongside the reason string "Authentication Failure" (line 247–252). Status 1000 with an auth failure reason is a slight mismatch — the server would more typically use 1008 (policy violation) or an application-specific 4xxx code. Again, this doesn't affect the validity of the fix or test — it's purely a test data realism nit. Fix this →


Overall

The fix is correct, minimal, and handles the partial-drain case properly. The approach of deferring io.EOF until pending data is consumed is idiomatic and consistent with io.Reader semantics. No blocking issues.

When a NATS server rejects a WebSocket connection (e.g., due to an
expired JWT), it sends a -ERR protocol message followed by a WebSocket
close frame. If both frames arrive in the same TCP read, the
websocketReader.Read() method would process the close frame and
immediately return io.EOF, discarding the -ERR data that had already
been buffered in the pending list.

This caused nats.Connect to return a generic EOF error instead of the
proper authentication error (ErrAuthorization, ErrAuthExpired, etc.),
making it difficult for users to diagnose connection failures.

The fix defers the close frame's io.EOF when there is already pending
data, storing it in a new closeErr field. The pending data is returned
first, and the deferred error is returned on the subsequent Read() call.

Signed-off-by: Waldemar Quevedo <wally@nats.io>
Copy link
Copy Markdown
Collaborator

@piotrpio piotrpio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@coveralls
Copy link
Copy Markdown

Coverage Status

coverage: 85.353% (-0.04%) from 85.392%
when pulling 4574245 on ws-close
into 01214e2 on main.

@piotrpio piotrpio merged commit 03c61d7 into main Mar 25, 2026
7 checks passed
@piotrpio piotrpio deleted the ws-close branch March 25, 2026 14:05
@piotrpio piotrpio mentioned this pull request Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants