@lgritz lgritz commented Jul 12, 2025

As reported on Slack
(https://academysoftwarefdn.slack.com/archives/C05782U3806/p1751642730171029), between OIIO 2.5 and 3.0 we had a big slowdown when doing a multithreaded maketx --envlatl. It was traced almost entirely to the new mutex lock in ImageBuf::IteratorBase::init_ib(), with the contended calls all happening inside the call stack of resize_block_<float>().

The reason was that resize_block_ was calling ImageBuf::interppixel_NDC for every pixel, which in turn works by internally creating an ImageBuf::ConstIterator to sample the 4 nearby pixels. But every one of those ConstIterator constructors called init_ib, which briefly grabbed the mutex for the IB. That was wasteful on its own, and it also caused the threads to block on each other.

The solution is straightforward: there is no need to construct a new Iterator for every pixel. We can create the iterator once (a single call to the init_ib for each thread region of the image) and then for each pixel, just call its rerange() method to reset the set of pixels to loop over for the samples needed for that pixel.
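
To illustrate the pattern, here is a minimal self-contained sketch. It does not use the OpenImageIO API: `ToyImage`, `ToyIterator`, and `filter` are hypothetical stand-ins, with the `ToyIterator` constructor playing the role of `init_ib` (it takes the image's mutex) and a lock-free `rerange()` that only re-aims the iterator at a new window. The lock counter is there only so the difference between the two strategies can be observed.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an image buffer (not the OpenImageIO API).
struct ToyImage {
    int width, height;
    std::vector<float> pixels;
    mutable std::mutex mutex;                 // shared by every thread
    mutable std::atomic<long> lock_count{0};  // how often the mutex was taken

    ToyImage(int w, int h) : width(w), height(h), pixels(std::size_t(w) * h)
    {
        for (std::size_t i = 0; i < pixels.size(); ++i)
            pixels[i] = float(i % 7);
    }
};

// Hypothetical iterator: constructing it locks the image mutex (the role
// init_ib plays in the description above); rerange() takes no lock.
class ToyIterator {
public:
    explicit ToyIterator(const ToyImage& img) : m_img(img)
    {
        std::lock_guard<std::mutex> guard(img.mutex);
        ++img.lock_count;
    }

    // Aim the iterator at the window [x0,x1) x [y0,y1), clamped to the image.
    void rerange(int x0, int x1, int y0, int y1)
    {
        m_x0 = std::max(x0, 0);
        m_x1 = std::min(x1, m_img.width);
        m_y0 = std::max(y0, 0);
        m_y1 = std::min(y1, m_img.height);
    }

    // Sum the pixels in the current window.
    float sum() const
    {
        float s = 0.0f;
        for (int y = m_y0; y < m_y1; ++y)
            for (int x = m_x0; x < m_x1; ++x)
                s += m_img.pixels[std::size_t(y) * m_img.width + x];
        return s;
    }

private:
    const ToyImage& m_img;
    int m_x0 = 0, m_x1 = 0, m_y0 = 0, m_y1 = 0;
};

// Sum a 2x2 window at every pixel, splitting rows across nthreads threads.
// reuse == false: a new iterator (and one mutex lock) per pixel.
// reuse == true:  one iterator per thread region, re-aimed with rerange().
std::vector<float> filter(const ToyImage& img, int nthreads, bool reuse)
{
    assert(nthreads > 0);
    std::vector<float> out(img.pixels.size());
    auto work = [&](int ybegin, int yend) {
        if (reuse) {
            ToyIterator it(img);  // single lock for this thread's region
            for (int y = ybegin; y < yend; ++y)
                for (int x = 0; x < img.width; ++x) {
                    it.rerange(x, x + 2, y, y + 2);
                    out[std::size_t(y) * img.width + x] = it.sum();
                }
        } else {
            for (int y = ybegin; y < yend; ++y)
                for (int x = 0; x < img.width; ++x) {
                    ToyIterator it(img);  // locks the mutex every pixel
                    it.rerange(x, x + 2, y, y + 2);
                    out[std::size_t(y) * img.width + x] = it.sum();
                }
        }
    };
    std::vector<std::thread> threads;
    const int rows = (img.height + nthreads - 1) / nthreads;
    for (int t = 0; t < nthreads; ++t) {
        const int y0 = std::min(t * rows, img.height);
        const int y1 = std::min(y0 + rows, img.height);
        threads.emplace_back(work, y0, y1);
    }
    for (auto& th : threads)
        th.join();
    return out;
}
```

In this toy, both strategies produce identical output, but the per-pixel variant takes the mutex once per pixel while the reuse variant takes it once per thread. The actual change in this PR is in imagebuf.cpp and differs in detail.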

I also opportunistically eliminated a few redundant spec() calls in various routines in imagebuf.cpp.

On my Mac laptop, when doing a maketx --envlatl on a 16k image with 16 threads, it was previously taking 170.5s (including 117.8s for the initial resize and 42.8s for the MIP computations). With this change, the same operation takes 12.4s (including 3.7s for the initial resize and 1.6s for the MIP computations). That's almost a 14x speedup. YMMV, depending on platform, compiler, image size, and number of threads.

Signed-off-by: Larry Gritz <[email protected]>
@jessey-git jessey-git left a comment

Looks good and can confirm a very large speedup locally as well.

Before

use_tbb = 1 | threads = 12
Details:
maketx run time (seconds): 18.25
  file read:        0.40
  file write:       3.01
  initial resize:   0.48
  hash:             0.73
  pixelstats:       0.51
  mip computation: 13.07
  color convert:    0.00
  unaccounted:      0.57  ( 0.04  0.00  0.00  0.00  0.00)
maketx peak memory used: 2.3 GB

After

use_tbb = 1 | threads = 12
Details:
maketx run time (seconds):  6.52
  file read:        0.40
  file write:       3.02
  initial resize:   0.55
  hash:             0.96
  pixelstats:       0.51
  mip computation:  1.03
  color convert:    0.00
  unaccounted:      0.56  ( 0.02  0.00  0.00  0.00  0.00)
maketx peak memory used: 2.3 GB

@lgritz lgritz merged commit 15aa81b into AcademySoftwareFoundation:main Jul 13, 2025
30 checks passed
@lgritz lgritz deleted the lg-iblock branch July 13, 2025 21:54
@lgritz lgritz added labels Jul 13, 2025: internals (Internal changes, not public APIs); texture / image cache (ImageCache, TextureSystem, maketx); image processing (Related to ImageBufAlgo or other image processing topic); performance
lgritz added a commit to lgritz/OpenImageIO that referenced this pull request Jul 14, 2025
…ademySoftwareFoundation#4825)

zachlewis pushed a commit to zachlewis/OpenImageIO that referenced this pull request Aug 1, 2025
…ademySoftwareFoundation#4825)
