perf: Speed up maketx --envlatl when multithreaded by over 10x. #4825

lgritz · 2025-07-12T06:19:10Z

[As reported on Slack](https://academysoftwarefdn.slack.com/archives/C05782U3806/p1751642730171029), between OIIO 2.5 and 3.0 we had a big slowdown when doing a multithreaded `maketx --envlatl`. It was traced almost entirely to the new mutex lock in `ImageBuf::IteratorBase::init_ib()`, with all of the lock acquisitions happening inside the call stack of `resize_block_<float>()`.

The reason was that this code path was calling `ImageBuf::interppixel_NDC` for every pixel, which in turn works by (internally) creating an `ImageBuf::ConstIterator` to sample the 4 nearby pixels. But every one of those `ConstIterator` constructors called `init_ib`, which briefly grabbed the mutex for the IB. In addition to being wasteful on its own, this caused the threads to block on each other.

The solution is straightforward: there is no need to construct a new iterator for every pixel. We can create the iterator once (a single call to `init_ib` for each thread's region of the image) and then, for each pixel, just call its `rerange()` method to reset the set of pixels to loop over for the samples needed for that pixel.
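To make the pattern concrete, here is a minimal standalone sketch. It does not use the OIIO API: `ToyImage`, `ToyIterator`, `filter_rows_per_pixel`, `filter_rows_reused`, and `run_filter` are hypothetical names invented for illustration, and the 2x2 box average merely stands in for the real interpolation. The only point is to count how often a shared mutex is taken when an iterator is constructed per pixel versus once per thread region and then re-ranged.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an image buffer (NOT the OIIO ImageBuf API).
struct ToyImage {
    int width, height;
    std::vector<float> pixels;
    mutable std::mutex mutex;                 // guards shared setup work
    mutable std::atomic<long> lock_count{0};  // how often the mutex was taken

    ToyImage(int w, int h, float value)
        : width(w), height(h), pixels(std::size_t(w) * h, value) {}
};

// Hypothetical iterator: construction takes the image mutex once;
// rerange() only moves the window and never locks.
class ToyIterator {
public:
    ToyIterator(const ToyImage& img, int x0, int x1, int y0, int y1)
        : img_(img)
    {
        {
            std::lock_guard<std::mutex> guard(img_.mutex);
            ++img_.lock_count;
        }
        rerange(x0, x1, y0, y1);
    }

    // Reset the half-open window [x0,x1) x [y0,y1), clamped to the image.
    void rerange(int x0, int x1, int y0, int y1)
    {
        x0_ = std::max(x0, 0);
        x1_ = std::min(x1, img_.width);
        y0_ = std::max(y0, 0);
        y1_ = std::min(y1, img_.height);
    }

    // Average of the pixels in the current window.
    float average() const
    {
        float sum = 0.0f;
        int n = 0;
        for (int y = y0_; y < y1_; ++y)
            for (int x = x0_; x < x1_; ++x, ++n)
                sum += img_.pixels[std::size_t(y) * img_.width + x];
        return n ? sum / n : 0.0f;
    }

private:
    const ToyImage& img_;
    int x0_ = 0, x1_ = 0, y0_ = 0, y1_ = 0;
};

// Old pattern: a fresh iterator (and therefore a lock) for every pixel.
void filter_rows_per_pixel(const ToyImage& src, std::vector<float>& out,
                           int ybegin, int yend)
{
    for (int y = ybegin; y < yend; ++y)
        for (int x = 0; x < src.width; ++x) {
            ToyIterator it(src, x, x + 2, y, y + 2);
            out[std::size_t(y) * src.width + x] = it.average();
        }
}

// New pattern: one iterator (one lock) per region, re-ranged per pixel.
void filter_rows_reused(const ToyImage& src, std::vector<float>& out,
                        int ybegin, int yend)
{
    ToyIterator it(src, 0, 0, 0, 0);
    for (int y = ybegin; y < yend; ++y)
        for (int x = 0; x < src.width; ++x) {
            it.rerange(x, x + 2, y, y + 2);
            out[std::size_t(y) * src.width + x] = it.average();
        }
}

// Split the rows across nthreads workers (each writes a disjoint part of
// out) and return how many times the mutex was acquired during the run.
long run_filter(const ToyImage& src, std::vector<float>& out,
                int nthreads, bool reuse)
{
    const long before = src.lock_count.load();
    const int rows_per = (src.height + nthreads - 1) / nthreads;
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        const int y0 = t * rows_per;
        const int y1 = std::min(src.height, y0 + rows_per);
        if (y0 >= y1)
            break;
        workers.emplace_back([&src, &out, y0, y1, reuse] {
            if (reuse)
                filter_rows_reused(src, out, y0, y1);
            else
                filter_rows_per_pixel(src, out, y0, y1);
        });
    }
    for (auto& w : workers)
        w.join();
    return src.lock_count.load() - before;
}
```

For a 64x32 `ToyImage` and 4 threads, `run_filter(img, out, 4, false)` takes the mutex once per pixel, while `run_filter(img, out, 4, true)` takes it once per thread region. The filtered output is the same either way.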

I also opportunistically eliminated a few redundant spec() calls in various routines in imagebuf.cpp.

On my Mac laptop, when doing a maketx --latlong on a 16k image with 16 threads, it was previously taking 170.5s (including 117.8s for the initial resize and 42.8s for the MIP computations). With this change, the same operation takes 12.4s (including 3.7s for the initial resize and 1.6s for the MIP computations). That's almost a 14x speedup. YMMV, depending on platform, compiler, image size, and number of threads.
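As a quick check of those figures, using only the timings quoted above (the symbol $S$ for speedup is introduced here purely for illustration):

$$
S_{\mathrm{total}} = \frac{170.5}{12.4} = 13.75, \qquad
S_{\mathrm{resize}} = \frac{117.8}{3.7} \approx 31.8, \qquad
S_{\mathrm{MIP}} = \frac{42.8}{1.6} = 26.75
$$

The two affected stages went from 160.6s to 5.3s combined. The rest of the run (9.9s before, 7.1s after) changed much less, which is why the overall ratio is lower than the per-stage ratios.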

Signed-off-by: Larry Gritz <[email protected]>

jessey-git

Looks good and can confirm a very large speedup locally as well.

Before

use_tbb = 1 | threads = 12
Details:
maketx run time (seconds): 18.25
  file read:        0.40
  file write:       3.01
  initial resize:   0.48
  hash:             0.73
  pixelstats:       0.51
  mip computation: 13.07
  color convert:    0.00
  unaccounted:      0.57  ( 0.04  0.00  0.00  0.00  0.00)
maketx peak memory used: 2.3 GB

After

use_tbb = 1 | threads = 12
Details:
maketx run time (seconds):  6.52
  file read:        0.40
  file write:       3.02
  initial resize:   0.55
  hash:             0.96
  pixelstats:       0.51
  mip computation:  1.03
  color convert:    0.00
  unaccounted:      0.56  ( 0.02  0.00  0.00  0.00  0.00)
maketx peak memory used: 2.3 GB

jessey-git approved these changes Jul 13, 2025

lgritz merged commit 15aa81b into AcademySoftwareFoundation:main Jul 13, 2025
30 checks passed

lgritz deleted the lg-iblock branch July 13, 2025 21:54

lgritz added the internals, texture / image cache, image processing, and performance labels Jul 13, 2025

BrewTestBot mentioned this pull request Oct 2, 2025

openimageio 3.1.6.1 Homebrew/homebrew-core#246692

Merged
