Skip to content

Conversation

@lgritz
Copy link
Collaborator

@lgritz lgritz commented Oct 1, 2025

Fixes #4641

Has something to do with mixing Windows compiler versions, there's a subtle ABI compatibility issue that this sidesteps.

Fixes AcademySoftwareFoundation#4641

Has something to do with mixing Windows compiler versions, there's a
subtle ABI compatibility issue that this sidesteps.

Signed-off-by: Larry Gritz <[email protected]>
@lgritz lgritz requested a review from zachlewis October 1, 2025 03:43
@lgritz lgritz added build / testing / port / CI Affecting the build system, tests, platform support, porting, or continuous integration. windows Problems specific to MS Windows OS labels Oct 1, 2025
Copy link
Collaborator

@zachlewis zachlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really equipped to tell if this actually fixes the reported issues, but it can't hurt to try...

@lgritz
Copy link
Collaborator Author

lgritz commented Oct 1, 2025

I'm not really equipped to tell if this actually fixes the reported issues, but it can't hurt to try...

I mean, I did copy it verbatim from YOUR FORK.

@zachlewis
Copy link
Collaborator

Who can really say why I do the things I do? 🤷

I think I was pursuing a handful of different possible solutions, based on suggestions I was receiving in the thread and on Slack; but it was never clear to me what would or wouldn't work, since I couldn't reproduce the issue, and didn't have the means to further investigate.

@lgritz
Copy link
Collaborator Author

lgritz commented Oct 1, 2025

I don't see how it can harm things, and I've seen three places (you, the Issue referenced, and a thread on vfx-platform discussion email) that all suggest this change.

@zachlewis
Copy link
Collaborator

zachlewis commented Oct 1, 2025

I don't think it can harm things either, but I suspect it's a red herring.

Apparently, manually updating the VC runtime solves the issue for those who are having it; so I suspect what we really want to do is link the static MSVC runtime instead of the dynamic:

set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:DebugDLL>" CACHE STRING "")
cmake_policy(SET CMP0091 NEW)  # ensures that CMake obeys the above property

@nrusch
Copy link
Contributor

nrusch commented Oct 1, 2025

I'll chime in to say that, in the absence of OIIO enforcing a strict MSVC/VC runtime build matrix in some way (which would be a huge and I suspect largely thankless exercise), I think this is the "most correct" solution to the problem.

@lgritz
Copy link
Collaborator Author

lgritz commented Oct 1, 2025

@nrusch "this is the most correct solution" -- do you mean this PR?

@nrusch
Copy link
Contributor

nrusch commented Oct 1, 2025

Sorry, yes, I mean this PR.

Apparently, manually updating the VC runtime solves the issue for those who are having it; so I suspect what we really want to do is link the static MSVC runtime instead of the dynamic:

This would be another good option if it's acceptable from a licensing standpoint and reasonably straightforward, as long as the OIIO wheel doesn't end up shipping any VC runtime DLLs.

I'm less familiar with Windows link semantics/quirks than with Linux, but I know there is a flavor of "static linking" on Windows that effectively uses a static shim in front of a DLL, so the end result is that what you end up embedding in your library are sort of proxy symbols to load the "real" symbols from the DLL (which you must also ship).

Shipping a runtime DLL with the wheel is something I would try and avoid at all costs, because it basically creates a potential future version of this same sort of binary compatibility issue for any users of the wheel. This is something we've run into internally with e.g. PySide2.

@lgritz lgritz marked this pull request as ready for review October 1, 2025 20:37
@LazyDodo
Copy link
Contributor

LazyDodo commented Oct 1, 2025

The fix is correct as far as it can be, we ran into this with blender+otio a while ago and this is the best i was able to disect the issue.

At the core of the problem is this nugget of wisdom as outlined on the C++ binary compatibility between Visual Studio versions page

The Redistributable your app uses has a similar binary-compatibility restriction. When you mix binaries built by different supported versions of the toolset, the Redistributable version must be at least as new as the latest toolset used by any app component.

we (As in developers in general, not blender specifically) all seemingly have been playing fast and lose with this rule, and have gotten away with it for the most part, this is the first time it actually came around to bite us in the rear, and Microsoft rightfully went "told you so" and pointed to their documentation.

using the _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR escape hatch will allow it to survive when it encounters an older runtime than what oiio was build against, but it's not as much a fix as a workaround. Other problems may be lurking when having code rely on a newer crt and not finding getting it.

Linking the crt statically may sound like a appealing option, but you'll have to be a 100000% sure no crt objects will escape (ie no transferring FILE* handles in/out of any public api) as while it is absolutely fine to have two different statically linked crt versions loaded into a single process on windows, when those 2 crt's encounter each others objects it panics and and shoots the process in the face in the most brutal way possible that not even your debugger will have a chance to attach.

@zachlewis
Copy link
Collaborator

I really appreciate the insight, guys!

YOUR FORK.

Ah, now I remember what's going on... I created zachlewis:wheels_static_windows to generate wheels build artifacts for @MrLixm to test, to see if the /D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR business did the trick... apparently, it did not, which is why I never submitted a PR.

Here's Liam's test command:

uv run --with PySide2 --with "G:\temp\cibw-wheels-cp310-win_amd64\openimageio-3.1.0.0-cp310-cp310-win_amd64.whl" --with numpy<2 --python 3.10 python -c "from PySide2 import QtWidgets;import OpenImageIO as oiio;oiio.ImageBuf(oiio.ImageSpec(640, 480, 3, 'float'));print('success')" ; echo $LastExitCode

So... if we don't want to do anything too contrived with linking the CRT, I think we need to dig deeper for another escape hatch.

@nrusch
Copy link
Contributor

nrusch commented Oct 2, 2025

Ah, thanks for mentioning that. That shook another latent memory loose in my brain...

Quoting myself from my previous message:

Shipping a runtime DLL with the wheel is something I would try and avoid at all costs, because it basically creates a potential future version of this same sort of binary compatibility issue for any users of the wheel. This is something we've run into internally with e.g. PySide2.

The PySide2 wheel that's available from PyPI does include a copy of the VC runtime and, as you can probably imagine, it's not up to date. That's actually how I originally discovered this mutex compatibility issue... but with OpenColorIO, not OIIO.

Because of this, in the test command you posted, the fact that it's importing PySide2 first means that PySide2 will end up loading its copy of the runtime first. Then OIIO (and its dependencies!) come along, see that those dependencies have already been satisfied, and try to use the existing symbols. And we know how that goes.

Now, you tested applying this compiler define to OIIO, but assuming the OIIO wheel includes OCIO, you would need to include the _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR in the compile defs for the OCIO build as well (I haven't looked at the OCIO deps build process in depth, so I don't know which, if any, compiler flags are shared between the core lib and the deps).

A simple test to confirm this theory is:

  1. Make sure the machine in question has the latest version of the VC runtime redistributable installed.
  2. Swap the import order in that test command so that OIIO is imported first. This should trigger the loading of the system-level runtime (since OIIO does not bundle its own) and circumvent the crash.

@lgritz
Copy link
Collaborator Author

lgritz commented Oct 10, 2025

@zachlewis, @LazyDodo, @nrusch: Where are we on this? Is it helpful at least a little and I should merge? Or is it unrelated to the real problem and I should abandon?

@zachlewis
Copy link
Collaborator

I'm so sorry, I must've missed Nathan's reply.

Let's definitely give this a shot! It sounds like we may also need to do this for OpenColorIO as well (perhaps in our OCIO recipe for now, and/or in the actual OCIO codebase later). Does that sound reasonable?

@LazyDodo
Copy link
Contributor

LazyDodo commented Oct 10, 2025

I have no wisdom here, the real solution is to upgrade the msvc runtime on the systems #4641 rears its ugly head to whatever oiio was build against, but depending on the runtime environment, that may not always be possible.

Given there is no easy solution to the "what msvc runtime version will i be running with"-problem, i'd say lean towards the option the end user loses the least amount of work with and build oiio when targeting pipy with the hatch and hope an incompatibility like this doesn't happen again (but it likely will....)

That being said i wouldn't hard-code it like it currently is, there's clearly a performance benefit to be had, otherwise ms would not have broken this for funsies, i'd enable it for the builds that feed into pipy and leave this choice to make for other downstream users of oiio them selves. I mean if you're in control of the runtime environment there's no reason not to ship with this optimization enabled.

@lgritz
Copy link
Collaborator Author

lgritz commented Oct 11, 2025

i'd say lean towards the option the end user loses the least amount of work with and build oiio when targeting pipy with the hatch and hope an incompatibility like this doesn't happen again (but it likely will....)

I literally don't know which option you're mean, whether you are telling me to merge or not. Assume that I haven't had substantial Windows experience since the mid 90s, and am basically a child who doesn't understand any of these tradeoffs. I need to be told exactly what to do in the simplest possible language:

  • Yes, Larry, merge this PR, it can't hurt and maybe it helps somebody.
  • No, close this PR without merging, it's more likely to make things worse, or at best be totally irrelevant.
  • Modify the PR in this precise way: ...

@LazyDodo
Copy link
Contributor

I'm suggesting to build the pipy distribution with /D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR, but don't force the option other oiio users that may be building from source and could very well have a firmer grasp on the runtime environment. Probably easiest to put it behind a cmake option.

@lgritz
Copy link
Collaborator Author

lgritz commented Oct 11, 2025

From what I have read, the issue is that fairly recently, MSVS added a new constexpr version of the constructor for std::mutex. Which I guess is important if you must use a mutex in a constexpr context, and maybe it could lead to some minor performance improvement in some cases? But the downside is that if you do it and end up in an environment with mixed runtime library choices, you can get crashes.

I feel like making a CMake variable to control it and expecting downstream users to notice that and set it properly is a pretty big ask, unless there is a really compelling advantage to not disabling it when you don't need to. Because if they make the choice wrong, of course they open themselves to the crash, depending on what else it's linked with, right?

The thing is, I'm not aware that OIIO ever depends on mutex ctr being constexpr. (Strong supporting evidence: everything was just fine before the constexpr declaration was added to MSVS.) And even if there is a possible small performance improvement, it's a newly added improvement, so the worst case is we run at the same speed as we always did. And maybe the improvement is not even measurable or doesn't apply at all in any of the places we use it.

So I'm inclined to just disable it always, to guarantee not getting the crash and save downstream users from guessing wrong.

@LazyDodo
Copy link
Contributor

Fair enough, you can merge it as is then.

@lgritz lgritz merged commit f6be156 into AcademySoftwareFoundation:main Oct 11, 2025
62 checks passed
lgritz added a commit to lgritz/OpenImageIO that referenced this pull request Oct 11, 2025
…ndation#4914)

Fixes AcademySoftwareFoundation#4641

Has something to do with mixing Windows compiler versions, there's a
subtle ABI compatibility issue that this sidesteps.

Signed-off-by: Larry Gritz <[email protected]>
lgritz added a commit to lgritz/OpenImageIO that referenced this pull request Oct 11, 2025
…ndation#4914)

Fixes AcademySoftwareFoundation#4641

Has something to do with mixing Windows compiler versions, there's a
subtle ABI compatibility issue that this sidesteps.

Signed-off-by: Larry Gritz <[email protected]>
@lgritz lgritz deleted the lg-winmutex branch October 11, 2025 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build / testing / port / CI Affecting the build system, tests, platform support, porting, or continuous integration. windows Problems specific to MS Windows OS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] OpenImageIO crashes without error on some Windows systems

4 participants