poc: original request is not aborted on bodytimeout (only retry handler?) by Uzlopak · Pull Request #4470 · nodejs/undici

Uzlopak · 2025-08-26T13:43:50Z

This PR is far from perfect! It actually started as an attempt to deflake test/issue-3356.js.

Please have a look at the code and the tests. It is hard to explain:

It seems that the flakiness actually shows that we have some underlying issue. If we get a bodyTimeout, it doesn't mean that the connection is closed. It just means that the body did not finish in the expected time. Ok, no problem if the connection closes after some time. But if we have a body timeout and the connection is still open and potentially still sending data, then some undefined behavior happens.

So I assume we have to abort the request before we retry with a new request. Maybe even check how much data we buffered already, and set the corresponding content-range headers.

But tbh, I am kind of lost in this part of the code. So I am showing you this; maybe you have better ideas.
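
To make the "check how much data we buffered already" idea concrete, here is a minimal sketch of how resume headers could be derived from a byte counter. It is not undici's code: `buildResumeHeaders` and its parameters are hypothetical names. Note that in HTTP the request carries `Range` (optionally with `If-Range`), while `Content-Range` is what the server sends back in a 206 response.

```javascript
'use strict'

// Hypothetical helper, not part of undici: build the request headers for
// resuming a body after `bytesReceived` bytes have already arrived.
function buildResumeHeaders (bytesReceived, etag) {
  if (!Number.isInteger(bytesReceived) || bytesReceived < 0) {
    throw new RangeError('bytesReceived must be a non-negative integer')
  }
  const headers = {}
  if (bytesReceived > 0) {
    // Open-ended byte range: everything from this offset to the end.
    headers.range = `bytes=${bytesReceived}-`
    // With If-Range the server falls back to a full 200 response when the
    // representation changed since the first attempt.
    if (etag) headers['if-range'] = etag
  }
  return headers
}

// 7 bytes ('ahahaha') arrived before the body timeout fired.
console.log(buildResumeHeaders(7, '"abc"'))
```

Whether the retry may actually continue still depends on the server answering with 206 and a matching Content-Range; this sketch only covers the request side.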

This relates to...

Rationale

Changes

Features

Bug Fixes

Breaking Changes and Deprecations

Status

Uzlopak · 2025-08-26T13:56:50Z

+    // If the error is a body timeout we want to abort the request
+    // as the server could be still sending data and we want to avoid
+    // to have multiple ongoing requests.
+    if (code === 'UND_ERR_BODY_TIMEOUT') {
+      if (controller && !controller.aborted) {
+        controller.abort()
+      }
+      shouldRetryCb(err)


this feels so hacky.

Uzlopak · 2025-08-26T13:57:48Z

+    setTimeout(() => {
+      shouldRetryCb(null)
+    }, retryTimeout)?.unref()


Why did I unref it? Tbh... where do we actually clear the timer?

It is not cleared, as it is not common for a request to get aborted on retry, tho having it as a safety net when controller.abort is called seems like a good approach.

Uzlopak · 2025-08-26T13:58:44Z

  }

-  static [kRetryHandlerDefaultRetry] (err, { state, opts }, cb) {
+  static [kRetryHandlerDefaultRetry] (err, { controller, state, opts }, shouldRetryCb) {


Renamed it to shouldRetryCb so that it is easier to grok what it does.
controller is passed so that the request can potentially be aborted.

Uzlopak · 2025-08-26T13:59:37Z

        opts: { retryOptions: this.retryOpts, ...this.opts }
      },
-      shouldRetry.bind(this)
+      shouldRetryCb


Because shouldRetryCb is now an arrow function, "this" points to the retry handler anyway.
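
A small standalone illustration of that point, using a hypothetical `Handler` class as a stand-in for the retry handler (only the `this` handling is relevant here):

```javascript
'use strict'

// Hypothetical stand-in for the retry handler; only `this` handling matters.
class Handler {
  constructor () {
    this.retryCount = 0
  }

  withBind () {
    function shouldRetry () {
      return ++this.retryCount
    }
    // A plain function has its own `this`, so it needs an explicit bind.
    return shouldRetry.bind(this)
  }

  withArrow () {
    // An arrow function captures `this` lexically from the enclosing method.
    const shouldRetryCb = () => ++this.retryCount
    return shouldRetryCb
  }
}

const handler = new Handler()
handler.withBind()() // increments via the bound function
handler.withArrow()() // increments via the arrow function, no bind needed
console.log(handler.retryCount) // 2
```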

Uzlopak · 2025-08-26T14:03:00Z

-      setTimeout(() => { res.end('ello world!') }, 100)
+    if (callCount++ === 0) {
+      res.write('ahahaha')
+      // never end the response


At first I thought about increasing the timeouts, but then I thought: what happens if we never time out?!

The original issue 3356 was that we should ensure that we don't concat the responses. The solution was that non-206 responses should throw.

Uzlopak · 2025-08-26T14:04:16Z

+      // never end the response
    } else {
-      res.end('hello world!')
+      t.fail('should not be called twice')


If you run the code on main, you will see that the route handler is called twice! This means that the retry handler makes the request twice. That doesn't seem right if we say that responses with status 200 cannot be handled like responses with content-range.

A 200 response can mean that no more data is available (the whole request was consumed) or that the server just doesn't support range requests and will send the whole body instead.

The handler already covers that, but I will need to check what was possibly wrong with it.

According to the issue, we had the problem that if we passed the response stream to the target stream, there is no way to "revert" that downstreamed data. E.g. we stream to a file stream, and partial data is written, response stream has issues, now retry, so we begin from the beginning to stream. Bam, double data.

The consensus of that issue was to handle it as an error and define the state of the request/response as non recoverable

Maybe my understanding is wrong. But this means that with status 200 we don't retry. Of course we could consider retrying even after a status 200 response errors, and then see if the new response is a partial response with the corresponding range headers set.
But I don't see it in my tests?!

We could of course have tried other approaches too. For example: make a request and track the transferred content in bytes; on error, retry with range headers sent in the hope that the server accepts them; and if the response has no content-range headers, dump bytes until we reach new bytes and only then push them to the real stream. Such a behaviour should be configurable.
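
The "dump bytes till we get new bytes" approach could look roughly like the sketch below. `createSkipFilter` is a hypothetical helper, not something undici provides, and it assumes the replayed body is byte-identical to the first attempt, which nothing guarantees without a validator such as an ETag.

```javascript
'use strict'

// Hypothetical helper: returns a function that drops the first `bytesToSkip`
// bytes of a replayed body and passes through everything after them.
function createSkipFilter (bytesToSkip) {
  let remaining = bytesToSkip
  return function filter (chunk) {
    if (remaining === 0) return chunk
    if (chunk.length <= remaining) {
      // The whole chunk was already delivered by the first attempt.
      remaining -= chunk.length
      return Buffer.alloc(0)
    }
    // The chunk straddles the boundary: keep only the unseen tail.
    const fresh = chunk.subarray(remaining)
    remaining = 0
    return fresh
  }
}

// The first attempt delivered 'hello ' (6 bytes) before timing out; the
// retry receives the full body again, split into arbitrary chunks.
const filter = createSkipFilter(6)
const replayed = [Buffer.from('hel'), Buffer.from('lo wor'), Buffer.from('ld!')]
const forwarded = Buffer.concat(replayed.map((chunk) => filter(chunk))).toString()
console.log(forwarded) // world!
```

The cost of this design is that the already delivered bytes are transferred a second time, which is one reason it would make sense only as an opt-in behaviour.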

Uzlopak · 2025-08-26T14:05:40Z

+  after(() => once(server.close(), 'close'))

  const agent = new RetryAgent(new Agent({ bodyTimeout: 50 }), {
    errorCodes: ['UND_ERR_BODY_TIMEOUT']


This means that UND_ERR_BODY_TIMEOUT should retry. But actually we decided that it should not retry?

Uzlopak · 2025-08-26T14:06:45Z

  await t.completed
 })
+
+test('https://github.com/nodejs/undici/issues/3356', { skip: true }, async (t) => {


I skipped this test, because it is not working. Maybe the logic for 206 with content-range is wrong.

Since when has it not been working, or is it only not working with the new changes?

IIRC it also fails on main. But maybe the test setup is bad.

fatal10110 · 2025-08-28T06:37:33Z

IMHO I do not think the solution u are looking for is entirely in the retry handler. You described two issues in the description of the PR:

> But if we have a body timeout and the connection is still open and potentially still sending data, then some undefined behavior happens.

How can it happen, if you destroy the socket on bodyTimeout?

util.destroy(socket, new BodyTimeoutError())

> Maybe even check how much data we buffered already, and set the corresponding content-range headers.

That's the only implementation that should be in the retry handler.
There is no reason to stop the retry process on body timeout AFAIK.

metcoder95

The approach lgtm. If the range-request logic is broken with these changes, we might need to verify the changes, or verify that the retry handler properly processes range requests as per spec.

I can try to do that later this week.

About the timer unref, I'd recommend not applying unref, as it possibly imposes a breaking change (a terminating process would no longer account for the request about to be retried). Tho, I'm +1 on clearing the timer upon the request getting aborted.
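
As a rough sketch of "clear the timer when the request gets aborted" without resorting to unref: the helper below is hypothetical and uses a standard AbortSignal as a stand-in for whatever abort mechanism the retry handler's controller actually exposes.

```javascript
'use strict'

// Hypothetical helper, not undici's implementation: call `cb(null)` after
// `delay` ms unless `signal` aborts first, in which case the timer is
// cleared and `cb` receives an error instead.
function scheduleRetry (cb, delay, signal) {
  if (signal.aborted) {
    cb(new Error('aborted before retry'))
    return
  }
  const onAbort = () => {
    clearTimeout(timer) // nothing is left pending, so no unref is needed
    cb(new Error('aborted before retry'))
  }
  const timer = setTimeout(() => {
    signal.removeEventListener('abort', onAbort)
    cb(null)
  }, delay)
  signal.addEventListener('abort', onAbort, { once: true })
}

const ac = new AbortController()
scheduleRetry((err) => {
  console.log(err ? `no retry: ${err.message}` : 'retrying')
}, 10_000, ac.signal)
ac.abort() // clears the pending timer, so the process can exit right away
```

Because the timer stays ref'ed until it fires or is cleared, a process that is still waiting for a retry keeps running, which avoids the behaviour change described above.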

metcoder95 · 2025-08-28T06:42:04Z

  await t.completed
 })
+
+test('https://github.com/nodejs/undici/issues/3356', { skip: true }, async (t) => {


Since when has it not been working, or is it only not working with the new changes?

metcoder95 · 2025-08-28T06:44:00Z

+      // never end the response
    } else {
-      res.end('hello world!')
+      t.fail('should not be called twice')


A 200 response can mean that no more data is available (the whole request was consumed) or that the server just doesn't support range requests and will send the whole body instead.

The handler already covers that, but I will need to check what was possibly wrong with it.

metcoder95 · 2025-08-28T06:47:14Z

+    setTimeout(() => {
+      shouldRetryCb(null)
+    }, retryTimeout)?.unref()


It is not cleared, as it is not common for a request to get aborted on retry, tho having it as a safety net when controller.abort is called seems like a good approach.

Uzlopak · 2025-08-30T09:30:23Z

@metcoder95

I personally lack the insights in these parts and I think it would be great if you would investigate it further. Should I close this PR?

Uzlopak · 2025-08-30T09:37:08Z

@fatal10110

I don't think the socket gets destroyed. Would need to investigate though. But I guess it is because we are not directly working on the h1 client?
Idk.

Anyhow imho undici is acting strange. This PR was just a poc. Maybe everything is fine and I am wrong...

mcollina · 2025-08-30T10:11:19Z

+    // If the error is a body timeout we want to abort the request
+    // as the server could be still sending data and we want to avoid
+    // to have multiple ongoing requests.
+    if (code === 'UND_ERR_BODY_TIMEOUT') {


Use the modified test on main. The process hangs unrecoverably.

artur-ma · 2025-08-31T08:51:27Z

Is there a way / test to reproduce the issue u are describing? Running the test on the main branch, it always passes.

Uzlopak · 2025-08-31T09:15:36Z

@artur-ma

This is what i see, when i run my modified test on main:

aras@aras-HP-ZBook-15-G3:~/workspace/undici$ node test/issue-3356.js 
✖ https://github.com/nodejs/undici/issues/3356 (1525.138235ms)
  AssertionError [ERR_ASSERTION]: should not be called twice
      at res.<computed> [as fail] (/home/aras/workspace/undici/node_modules/@matteo.collina/tspl/tspl.js:58:35)
      at Server.<anonymous> (/home/aras/workspace/undici/test/issue-3356.js:23:9)
      at Server.emit (node:events:524:28)
      at parserOnIncoming (node:_http_server:1141:12)
      at HTTPParser.parserOnHeadersComplete (node:_http_common:118:17) {
    generatedMessage: false,
    code: 'ERR_ASSERTION',
    actual: undefined,
    expected: undefined,
    operator: 'fail'
  }

﹣ https://github.com/nodejs/undici/issues/3356 (0.122966ms) # SKIP

The process hangs...

metcoder95 · 2025-09-01T06:50:20Z

> I personally lack the insights in these parts and I think it would be great if you would investigate it further. Should I close this PR?

Sure, I can do that

artur-ma · 2025-09-04T08:12:05Z

@Uzlopak

> This is what i see, when i run my modified test on main:
> [...]
> AssertionError [ERR_ASSERTION]: should not be called twice
> [...]
> The process hangs...

That sounds like an incorrect test. It's expected to be called twice, since this is the purpose of retry on timeout.
From what I understand, the case you are trying to fix is another one: both sockets are active (data is written into them) at the same time because the first socket wasn't destroyed.

Uzlopak · 2025-09-06T07:05:31Z

@artur-ma

Did you read what I wrote? Did you read the corresponding issue?

Exactly the opposite of what you wrote is the expected behavior.

artur-ma · 2025-09-07T05:54:42Z

@Uzlopak

> Did you read what I wrote? Did you read the corresponding issue?
>
> Exactly the opposite of what you wrote is the expected behavior.

I read what u wrote, and this is exactly what I'm saying:

> But if we have a body timeout and the connection is still open and potentially still sending data, then some undefined behavior happens.

Retry is expected on timeout; the second call to the API is expected, as this is the purpose of the retry mechanism. What is not expected is that the old socket is still active after the timeout.

So how is it the opposite?

Uzlopak · 2025-09-07T08:15:43Z

@artur-ma

If the server does not support ranges, no range headers are sent, no 206 status code and maybe no etag to verify, then we should not retry if a body was already sent. The test simulates a case which does not meet the conditions for a retry, but the retry handler does the retry anyway.
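
Spelled out as code, the conditions described here might look like the following sketch. `classifyRetryResponse` and the `bytesForwarded` counter are hypothetical and do not reflect how undici's retry handler is structured; ETag validation is left out for brevity.

```javascript
'use strict'

// Hypothetical decision helper (not undici's code). `bytesForwarded` is how
// much of the body earlier attempts already passed downstream.
function classifyRetryResponse (statusCode, headers, bytesForwarded) {
  // Nothing was delivered yet, so any response can simply start fresh.
  if (bytesForwarded === 0) return 'accept'
  // A full body on top of partial data would duplicate downstream bytes.
  if (statusCode !== 206) return 'fail'
  // Content-Range looks like "bytes 7-11/12" or "bytes 7-11/*".
  const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(headers['content-range'] ?? '')
  if (match === null) return 'fail'
  // Only resume when the server continues at exactly the next missing byte.
  return Number(match[1]) === bytesForwarded ? 'resume' : 'fail'
}

// The modified test: 7 bytes were written, then the retry gets a plain 200.
console.log(classifyRetryResponse(200, {}, 7)) // fail
```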

artur-ma · 2025-09-09T08:41:57Z

@Uzlopak

> If the server does not support ranges, no range headers are sent, no 206 status code and maybe no etag to verify, then we should not retry if a body was already sent. The test simulates a case which does not meet the conditions for a retry, but the retry handler does the retry anyway.

Thank you for the clarification, but it still sounds like the test is wrong. The logic u described is not handled in main right now, so after adding errorCodes: ['UND_ERR_BODY_TIMEOUT'] as a RetryAgent option, the request is expected to be retried, as u are setting it explicitly as a retryable error code.

If it's not set, the error will be thrown and bubble up to this line:

undici/lib/core/request.js, line 314 in f182ff1:

this.aborted = true

which AFAIK basically does the same thing as the controller.abort() that u used in this PR.
The only difference here is that someone can catch the error down the line.

The logic you described is handled in the onResponseStart method. This means that if the request was retried, then while consuming the data the second time we indicate that this error shouldn't have been retried in the first place (this flow should not happen, as a body timeout throws by default and no retry happens).

Please correct me if I got you wrong again.

mcollina · 2026-01-03T12:08:51Z

@Uzlopak @artur-ma any updates on this?

test: deflake test/iisue-3356.js

6ce580c

Uzlopak requested review from mcollina and metcoder95 August 26, 2025 13:43


Merge branch 'main' into deflake-3356

4fc4ee2
