`TcpListener::accept` not retrying transient errors is a footgun

Currently, `TcpListener::accept` forwards to a single instance of the underlying libc call, propagating any encountered errors. This encourages using the function in one of two ways:

- Propagating the error to the `main` function, causing the application to exit.
- Sleeping when an error is encountered, and retrying `accept` later.

Unfortunately, both of these are incorrect, any may lead to denial-of-service attacks; in particular, I believe erroneous returns from `accept` can be triggered by a client, thus either causing the server to crash or no more connections to be accepted for some period of time. There are four kinds of error that can be returned from `accept()`:

1. Client / transient errors. These should be handled by immediately calling `accept` again.
2. Resource exhaustion, or otherwise device failure. These are typically handled by logging the error, sleeping for some time, and then retrying.
3. Programmer errors. These should be handled by exiting the application.
4. Asynchronous errors (i.e. `WouldBlock`). These should be handled by waiting for further events via the chosen I/O driver, and are generally handled correctly today.

For prior art and further reading, see:

- [accept(2)'s problem of trying to return two different sorts of errors](https://utcc.utoronto.ca/~cks/space/blog/unix/AcceptErrnoProblem), which explains the issue
- [Which socket accept() errors are fatal?](https://stackoverflow.com/questions/76955978/which-socket-accept-errors-are-fatal), a fairly comprehensive summary of the various kinds of error that can occur in `accept()`
- [man accept(2)](https://man7.org/linux/man-pages/man2/accept.2.html), which has two relevant excerpts:
    - > `ECONNABORTED`: A connection has been aborted.
    - > Linux accept() (and accept4()) passes already-pending network errors on the new socket as an error code from accept().  This behavior differs from other BSD socket implementations.  For reliable operation the application should detect the network errors defined for the protocol after accept() and treat them like EAGAIN by retrying.  In the case of TCP/IP, these are ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, and ENETUNREACH.
- [WSAAccept](https://learn.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaaccept), which  has two relevant excerpts:
    - > `WSAECONNREFUSED`: No connection could be made because the target machine actively refused it. This error is returned if the connection request was forcefully rejected as indicated in the return value of the condition function (CF_REJECT).
    - > `WSAECONNRESET`: An existing connection was forcibly closed by the remote host. This error is returned of an incoming connection was indicated, but was subsequently terminated by the remote peer prior to accepting the call.
- [axum](https://docs.rs/axum/0.8.4/src/axum/serve/listener.rs.html#160-167), who follows Hyper 0.14 in treating `ConnectionRefused`, `ConnectionAborted` and `ConnectionReset` as especially non-fatal (and sleeps on the rest)

## A proposal: modifying `accept()`

A possible solution is to internally retry in `accept` in the following cases:
- On all platforms, `io::ErrorKind::Interrupted`
- On all platforms, `io::ErrorKind::Conection{Aborted, Refused, Reset}`
- On Linux only, `ENETDOWN`, `EPROTO`, `ENOPROTOOPT`, `EHOSTDOWN`, `ENONET`, `EHOSTUNREACH`, `EOPNOTSUPP`, and `ENETUNREACH`

This approach differentiates between client errors and resource exhaustion, but not between resource exhaustion and programmer errors. This is, however, far less of a footgun: it is generally not dangerous to treat a programmer error the same way as resource exhaustion. In the method documentation, we should encourage logging and sleeping on errors returned from `.accept()`, with an optional clause that certain errors known to be programmer errors may be handled specially if the user so desires.

### Why not a separate method or external crate?

The functionality of retrying on transient errors could be provided by another method on `TcpListener`, e.g. `.accept_retries()` (in a similar vein to `write_all`), or by a third-party crate. Whether we choose this option depends on two factors:
1. How widespread we believe this incorrect handling of `accept()` is, and how damaging it is. I have not collected data on this, but given that the method docs do not mention the possibility of transient errors at all, I suspect that it is rather frequent.
2. How likely we believe it is that making this change will break user code. I can’t immediately think of a case in which code would be broken, but it is always possible they exist.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

`TcpListener::accept` not retrying transient errors is a footgun #142557

A proposal: modifying `accept()`

Why not a separate method or external crate?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

TcpListener::accept not retrying transient errors is a footgun #142557

Description

A proposal: modifying accept()

Why not a separate method or external crate?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`TcpListener::accept` not retrying transient errors is a footgun #142557

A proposal: modifying `accept()`