gh-121940: Reduce checking isatty on Windows write() #121941

Closed
wants to merge 7 commits into from
5 changes: 5 additions & 0 deletions Include/internal/pycore_fileutils.h
@@ -329,6 +329,11 @@ extern int _Py_GetTicksPerSecond(long *ticks_per_second);
 // Export for '_testcapi' shared extension
 PyAPI_FUNC(int) _Py_IsValidFD(int fd);
 
+#ifdef MS_WINDOWS
+size_t _Py_LimitConsoleWriteSize(const void *buf, size_t requested_size,
+                                 size_t cap_size);
+#endif
+
 #ifdef __cplusplus
 }
 #endif
@@ -0,0 +1,3 @@
+A default-buffer-size :func:`os.write` on Windows no longer calls ``isatty`` or splits the write.
+Writing to the Windows console is still split to maintain responsiveness of
+interrupts, but at a much larger size.
48 changes: 16 additions & 32 deletions Modules/_io/winconsoleio.c
@@ -48,6 +48,10 @@
    of less than one character */
 #define SMALLBUF 4
 
+/* Limit write size to consoles so that interrupts feel
+   responsive. */
+#define WRITE_LIMIT_CONSOLE (1024 * 1024)
+
 char _get_console_type(HANDLE handle) {
     DWORD mode, peek_count;
 
@@ -134,24 +138,6 @@ char _PyIO_get_console_type(PyObject *path_or_fd) {
     return m;
 }
 
-static DWORD
-_find_last_utf8_boundary(const char *buf, DWORD len)
-{
-    /* This function never returns 0, returns the original len instead */
-    DWORD count = 1;
-    if (len == 0 || (buf[len - 1] & 0x80) == 0) {
-        return len;
-    }
-    for (;; count++) {
-        if (count > 3 || count >= len) {
-            return len;
-        }
-        if ((buf[len - count] & 0xc0) != 0x80) {
-            return len - count;
-        }
-    }
-}
-
 /*[clinic input]
 module _io
 class _io._WindowsConsoleIO "winconsoleio *" "clinic_state()->PyWindowsConsoleIO_Type"
@@ -1016,25 +1002,23 @@ _io__WindowsConsoleIO_write_impl(winconsoleio *self, PyTypeObject *cls,
     if (!b->len) {
         return PyLong_FromLong(0);
     }
-    if (b->len > BUFMAX)
-        len = BUFMAX;
+    /* Ensure len fits in a DWORD. This cap is larger than the write
+       limit because it doesn't respect utf-8 characters boundaries.
+       Rely on _Py_LimitConsoleWriteSize to do a character split. */
+    if (b->len > WRITE_LIMIT_CONSOLE * 2)
+        len = WRITE_LIMIT_CONSOLE * 2;
     else
         len = (DWORD)b->len;
 
+
+    /* Limit console write size to keep interactivity.
+
+       This is a soft cap / wlen may be higher, but that is
+       okay because it isn't a hard OS limit in Windows 8+. */
+    len = (DWORD)_Py_LimitConsoleWriteSize(b->buf, len, WRITE_LIMIT_CONSOLE);
+
     Py_BEGIN_ALLOW_THREADS
     wlen = MultiByteToWideChar(CP_UTF8, 0, b->buf, len, NULL, 0);
-
-    /* issue11395 there is an unspecified upper bound on how many bytes
-       can be written at once. We cap at 32k - the caller will have to
-       handle partial writes.
-       Since we don't know how many input bytes are being ignored, we
-       have to reduce and recalculate. */
-    while (wlen > 32766 / sizeof(wchar_t)) {
-        len /= 2;
-        /* Fix for github issues gh-110913 and gh-82052. */
-        len = _find_last_utf8_boundary(b->buf, len);
-        wlen = MultiByteToWideChar(CP_UTF8, 0, b->buf, len, NULL, 0);
-    }
     Py_END_ALLOW_THREADS
 
     if (!wlen)
72 changes: 65 additions & 7 deletions Python/fileutils.c
@@ -52,6 +52,17 @@ int _Py_open_cloexec_works = -1;
 // The value must be the same in unicodeobject.c.
 #define MAX_UNICODE 0x10ffff
 
+/* Limit write size on terminals in Windows to keep the interpreter
+   feeling responsive.
+
+   This is higher than WRITE_LIMIT_CONSOLE because `.write()`
+   is targeted at non-console I/O (but may happen to touch a tty). Use
+   WinConsoleIO for best console interactivity.
+
+   This should ideally be bigger than DEFAULT_BUFFER_SIZE so common
+   case write to file on disk is quick. */
+#define WRITE_LIMIT_INTERACTIVE (5 * 1024 * 1024)
+
 // mbstowcs() and mbrtowc() errors
 static const size_t DECODE_ERROR = ((size_t)-1);
 static const size_t INCOMPLETE_CHARACTER = (size_t)-2;
@@ -1923,20 +1934,18 @@ _Py_write_impl(int fd, const void *buf, size_t count, int gil_held)
 
     _Py_BEGIN_SUPPRESS_IPH
 #ifdef MS_WINDOWS
-    if (count > 32767) {
-        /* Issue #11395: the Windows console returns an error (12: not
-           enough space error) on writing into stdout if stdout mode is
-           binary and the length is greater than 66,000 bytes (or less,
-           depending on heap usage). */
+    /* isatty is guarded because don't want it in common case of
+       writing DEFAULT_BUFFER_SIZE to regular files (gh-121940). */
+    if (count > WRITE_LIMIT_INTERACTIVE) {
         if (gil_held) {
             Py_BEGIN_ALLOW_THREADS
             if (isatty(fd)) {
Contributor

@eryksun eryksun Jul 26, 2024

C isatty() is true for a file opened for any character device. It excludes pipes and disk files, but it's actually true for most devices, including "NUL". I'd rather only apply this limit to console files.

_PyIO_get_console_type() could be moved from "Modules/_io/winconsoleio.c" to "Python/fileutils.c" and renamed _Py_get_console_type().

The helper function _get_console_type() could also get a minor improvement. It depends on WinAPI GetConsoleMode(), which requires read access. Unfortunately, "CON" can only be opened for writing if read access isn't requested. Fortunately, however, the device driver has to first dereference the handle to access the kernel File object, and it fails immediately if the file is opened for some other device. In this case, the error code is ERROR_INVALID_HANDLE. Otherwise, if it's a console file that simply lacks read access, the error code is ERROR_ACCESS_DENIED. We should check for the latter if GetConsoleMode() fails. For example:

char _get_console_type(HANDLE handle) {
    DWORD mode, peek_count;

    if (handle == INVALID_HANDLE_VALUE) {
        return '\0';
    }
    if (!GetConsoleMode(handle, &mode) &&
        GetLastError() != ERROR_ACCESS_DENIED) {
        return '\0';
    }
    /* Peek at the handle to see whether it is an input or output handle */
    if (GetNumberOfConsoleInputEvents(handle, &peek_count)) {
        return 'r';
    }
    return 'w';
}

I'd also prefer to first check isatty() in _Py_get_console_type(), before getting the handle and calling _get_console_type(). The isatty() call is a quick check for a valid file descriptor and the presence of the FDEV flag. If it's true, then it's worth doing the extra work to actually check whether it's a console file.

Contributor Author

I'm concerned that making the check a lot more complicated will increase the runtime cost more than the savings of not having a write split. This PR currently doesn't change the check that is made, just makes it less commonly called.

In general I would really like to eliminate the size + isatty check from write altogether, but I think to do that need to figure out why people are using PYTHONLEGACYWINDOWSSTDIO to avoid the newer WinConsoleIO and fix the issues underlying that. I'm potentially open to that line of work in the future, but it is a much larger project in scope than what this PR is focused on.

Contributor

> I think to do that need to figure out why people are using PYTHONLEGACYWINDOWSSTDIO

os.write() also calls _Py_write(). It still needs Ctrl+C interrupt support, regardless of "PYTHONLEGACYWINDOWSSTDIO".

As to why someone would use multibyte (legacy) console I/O, one reason would be that it actually works correctly in some cases nowadays [1], which is overall simpler and more consistent than using io._WindowsConsoleIO.

> I'm concerned that making the check a lot more complicated will increase the runtime cost more than the savings of not having a write split

All we really need to check here is isatty() and whether GetConsoleMode() succeeds or fails with ERROR_ACCESS_DENIED. The IOCTL for GetConsoleMode() poses no perceptible cost for interactive console I/O. The appreciable relative cost would be for other character devices, such as "NUL". But it's a fixed cost of a single system call, which fails immediately before doing any real work if it's not a console file [2].

Note that the CRT's read() and write() functions themselves sometimes call isatty() and GetConsoleMode(), if the file is opened in text mode [3]. So it's not like calling GetConsoleMode() is unprecedented in terms of determining how a read or write is handled.

Footnotes

  1. The console host finally has functional UTF-8 (codepage 65001) support for both reads and writes -- or at least "openconsole.exe" does in recent releases of Windows Terminal. The system console host, "conhost.exe", still doesn't support UTF-8 reads correctly. But since "conhost.exe" is based on the same code as "openconsole.exe", I expect it will be fixed in the next release of Windows 11 later this year.

  2. Console API functions that aren't direct I/O requests are implemented by sending an IOCTL to a console connection file, opened on "\Device\ConDrv\Connect". A handle argument, if any, gets packed in the IOCTL input data. Thus, even for a file opened on another device such as "NUL", the device-type check for GetConsoleMode(hfile, &mode) is always handled by the "condrv.sys" driver, and not the driver for the actual device of hfile, such as "null.sys".

  3. The C runtime's text mode isn't used by builtin open(), but it's the default for os.open(), unless the O_BINARY flag is used. In some cases, such as an open in UTF-16 text mode, the C runtime switches to using ReadConsoleW() and WriteConsoleW() for a console file, instead of ReadFile() and WriteFile(). Note that this is just an in-principle example, since the CRT's UTF-8 and UTF-16 text modes aren't compatible with Python's I/O stack. They require reading and writing an even number of UTF-16 encoded bytes, which Python's raw and buffered I/O layers don't support.

Contributor Author

@eryksun I am not okay making the changes you're requesting here.

I agree that Ctrl-C working well for all Python Windows I/O is a valuable feature. I believe this PR makes some cases better, and doesn't hurt more general I/O cases. I believe there is no change in behavior with this PR on I/O where isatty returns False other than it doesn't call isatty anymore if the write is sufficiently small (< 1,000,000 bytes). Large individual write calls to fds where isatty returns False were not split before this PR, and are not split after it. I agree those may not be really responsive to Ctrl-C. This PR keeps existing behavior when isatty would return False.

My motivation for this PR was that every write() on Windows over a relatively small threshold called isatty, and that seemed like unnecessary work. Increasing the number of system calls / instructions being run by the processor to do both isatty and GetConsoleMode doesn't align with that initial motivation. I understand that it may be more precise, and that both those calls are relatively quick, but my motivation for this PR was removing an individual isatty call most of the time. My personal perspective is that the fastest thing to do is less. Code still needs to meet API guarantees, and in this case splitting unnecessarily is okay and something the previous code did significantly more often. In CPython main today, isatty seems to be sufficiently precise.

When isatty returns True, there are two behavior changes with this PR:

  1. If the write is small (under 1,000,000 bytes), isatty is never called and the write is never split, whereas in main it is always split. Testing under Windows Terminal, cmd.exe, PowerShell.exe, and the Visual Studio integrated shell on my Windows 11 box, Ctrl-C feels responsive to me with the 1,000,000 cutoff during a single print() of an extremely long string from Python. The behavior change here is that the cap is raised from 32,766 to 1,000,000 bytes.
  2. If the write is large (over 1,000,000 bytes), isatty is called for general I/O but not for _WinConsoleIO (matching existing behavior, just with the higher cap from change 1). The write() will always be capped to at most 1,000,000 bytes. The code will now try to find the end of a utf-8 character by searching backwards up to 4 bytes, which is a behavior change (previously only _WinConsoleIO did the back search).
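
The two cases above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `sketch_plan_write`, `SKETCH_WRITE_LIMIT`, and the `sketch_tty_check` callback are hypothetical names, the callback stands in for isatty() so the number of checks can be counted, the cutoff uses the 1,000,000 figure from this comment, and the utf-8 boundary search is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff, taken from the 1,000,000-byte figure in the comment. */
#define SKETCH_WRITE_LIMIT ((size_t)1000000)

/* Hypothetical stand-in for isatty(), so calls can be counted. */
typedef int (*sketch_tty_check)(int fd);

static int sketch_tty_calls = 0;

static int sketch_always_tty(int fd) { (void)fd; sketch_tty_calls++; return 1; }
static int sketch_never_tty(int fd)  { (void)fd; sketch_tty_calls++; return 0; }

/* Return how many bytes a single write call should attempt. */
static size_t
sketch_plan_write(int fd, size_t count, sketch_tty_check is_tty)
{
    if (count <= SKETCH_WRITE_LIMIT) {
        /* Small write: the tty check is skipped and nothing is split. */
        return count;
    }
    if (is_tty(fd)) {
        /* Large write to a tty: cap it so Ctrl-C stays responsive. */
        return SKETCH_WRITE_LIMIT;
    }
    /* Large write to anything else: left unsplit. */
    return count;
}
```

With these stand-ins, an 8192-byte write never triggers the check, while a 2,000,000-byte write triggers it once and is capped only when the callback reports a tty.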

I'm open to a path forward but I don't currently see one I am comfortable implementing. If the changes you're requesting in this review comment are required, I'll close this PR as I'm not going to make those changes. In that case, if someone else would like to pick up these changes, adopt them, and make the requested changes, they're welcome to; I'm just not interested in that work.

-                count = 32767;
+                count = _Py_LimitConsoleWriteSize(buf, count, WRITE_LIMIT_INTERACTIVE);
             }
             Py_END_ALLOW_THREADS
         } else {
             if (isatty(fd)) {
-                count = 32767;
+                count = _Py_LimitConsoleWriteSize(buf, count, WRITE_LIMIT_INTERACTIVE);
             }
         }
     }
@@ -3101,3 +3110,52 @@ _Py_IsValidFD(int fd)
     return (fstat(fd, &st) == 0);
 #endif
 }
+
+#ifdef MS_WINDOWS
+static size_t
+_find_last_utf8_boundary(const char *buf, size_t len)
+{
+    /* This function never returns 0, returns the original len instead */
+    DWORD count = 1;
+    if (len == 0 || (buf[len - 1] & 0x80) == 0) {
+        return len;
+    }
+    for (;; count++) {
+        if (count > 3 || count >= len) {
+            return len;
+        }
+        if ((buf[len - count] & 0xc0) != 0x80) {
+            return len - count;
+        }
+    }
+}
+
+/* Put a soft limit on the number of bytes to be written.
+
+   In older versions of Windows a hard limit was necessary because
+   there was a hard limit to the number of bytes (bpo-11395), but that
+   is not the case in Windows 8+.
+
+   For Windows 8+ the console host synchronizes I/O operations which
+   means a Ctrl-C doesn't generate an interrupt until after the write
+   is completed. That means large writes which take multiple seconds
+   will reduce responsiveness to interrupts.
+
+   This does a "soft cap" (not exact number of utf-16 bytes, but close
+   enough) to maintain responsiveness of consoles on
+   Windows (gh-121940). */
+size_t _Py_LimitConsoleWriteSize(const void *buf, size_t requested_size,
+                                 size_t cap_size) {
+    if (requested_size <= cap_size) {
+        return requested_size;
+    }
+
+    /* Fix for github issues gh-110913 and gh-82052.
+
+       Splitting utf-8 can't be done at arbitrary byte boundaries
+       because that results in broken utf-8 byte sequences being
+       presented to the user. */
+    return _find_last_utf8_boundary(buf, cap_size);
+}
+
+#endif
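
To see what the soft cap does on a concrete buffer, here is a portable sketch that mirrors the two functions added in this hunk. The `sketch_` names are hypothetical, and `size_t` replaces `DWORD` so the code builds outside Windows; the logic is otherwise intended to match the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Portable copy of the boundary search; names are hypothetical. */
static size_t
sketch_find_last_utf8_boundary(const char *buf, size_t len)
{
    size_t count = 1;
    /* An ASCII byte at the cut point is always a safe place to split. */
    if (len == 0 || (buf[len - 1] & 0x80) == 0) {
        return len;
    }
    for (;; count++) {
        /* Give up after looking back 3 bytes, and never return 0. */
        if (count > 3 || count >= len) {
            return len;
        }
        /* First non-continuation byte: cut just before it. */
        if ((buf[len - count] & 0xc0) != 0x80) {
            return len - count;
        }
    }
}

/* Portable copy of the soft cap: writes at or under the cap pass through. */
static size_t
sketch_limit_write_size(const void *buf, size_t requested, size_t cap)
{
    if (requested <= cap) {
        return requested;
    }
    return sketch_find_last_utf8_boundary((const char *)buf, cap);
}
```

For the 7-byte buffer `ab€cd` (bytes 61 62 E2 82 AC 63 64), caps of 3, 4, and 5 all shrink the write to 2 bytes. The search backs up to just before the lead byte even when the cap lands exactly at the end of the three-byte character, so the cap is conservative rather than exact. A cap of 6 ends on the ASCII `c` and is returned unchanged.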