Skip to content

Conversation

juj
Copy link
Collaborator

@juj juj commented Aug 31, 2025

Found problems:

  1. The convention for sbrk() changed at some point to require 16b alignment. (__alignof__(max_align_t)) This can create gaps in the alignment of emmalloc heap end sentinel region and the previous free region. This must require changing the assumption in emmalloc that heap end sentinel region would always be fixed 16b -> allow it to be arbitrary sized (in practice 16-31b).
  2. sbrk() implementation had round-up to nearest max_align_t, but didn't handle negative sbrk() values.
  3. in 4GB mode, a 32-bit sbrk() is not enough to accommodate support for both sbrk() increases and decreases (unless we take a convention that sbrk() can only be increased by < 2GB amount.. which probably won't hold in use cases that want to claim the whole heap immediately). Add a function sbrk64() that can be used in 4GB builds to be able to express both memory increases and decresases.
  4. test case expectations have changed at some point. test_emmalloc_trim was written with an assumption that emmalloc would acquire the whole heap with sbrk() at startup.. but no longer does so. Adjusted test to not assume this.
  5. printing bare memory values in test doesn't work well in e.g. ubsan test mode, so remove the direct value prints in the test, and just assert() the expectations.

Fixes: #23343

@juj juj force-pushed the fix_emmalloc_trim branch from c2453b5 to 87dcf39 Compare August 31, 2025 23:11
#if defined(__wasm64__) // TODO || !defined(wasm2gb)
// In the correct https://linux.die.net/man/2/sbrk spec, sbrk() parameter is
// intended to be treated as signed, meaning that it is not possible in a
// 32-bit program to sbrk alloc (or dealloc) more than 2GB of memory at once.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is indeed the case (I have no reason to believe its not) wouldn't it be better to fix the below bug rather than do this dance?

Maybe at least open a bug so that we can remove this code since it seems like its only need to workaround the bug?

Copy link
Collaborator Author

@juj juj Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be better to fix the below bug rather than do this dance?

Yes, totally agree. A later PR. (possibly.. I don't know how hard it will be to do)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include the full bug URL in the code here so we can clean this up in the future?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check

juj added 5 commits September 2, 2025 23:01
This is an automatic change generated by tools/maint/rebaseline_tests.py.

The following (26) test expectation files were updated by
running the tests with `--rebaseline`:

```
code_size/test_codesize_cxx_ctors1.json: 149258 => 149263 [+5 bytes / +0.00%]
code_size/test_codesize_cxx_ctors2.json: 148663 => 148668 [+5 bytes / +0.00%]
code_size/test_codesize_cxx_except.json: 194681 => 194686 [+5 bytes / +0.00%]
code_size/test_codesize_cxx_except_wasm.json: 164268 => 164273 [+5 bytes / +0.00%]
code_size/test_codesize_cxx_except_wasm_legacy.json: 161857 => 161862 [+5 bytes / +0.00%]
code_size/test_codesize_cxx_lto.json: 125540 => 125545 [+5 bytes / +0.00%]
code_size/test_codesize_cxx_mangle.json: 258772 => 258777 [+5 bytes / +0.00%]
code_size/test_codesize_cxx_noexcept.json: 151675 => 151680 [+5 bytes / +0.00%]
code_size/test_codesize_cxx_wasmfs.json: 176935 => 176940 [+5 bytes / +0.00%]
code_size/test_codesize_files_wasmfs.json: 55775 => 55780 [+5 bytes / +0.01%]
code_size/test_codesize_hello_dylink.json: 45534 => 45543 [+9 bytes / +0.02%]
code_size/test_codesize_hello_dylink_all.json: 843921 => 844093 [+172 bytes / +0.02%]
code_size/test_codesize_mem_O3.json: 9654 => 9659 [+5 bytes / +0.05%]
code_size/test_codesize_mem_O3_grow.json: 9940 => 9945 [+5 bytes / +0.05%]
code_size/test_codesize_mem_O3_grow_standalone.json: 9667 => 9672 [+5 bytes / +0.05%]
code_size/test_codesize_mem_O3_standalone.json: 9525 => 9530 [+5 bytes / +0.05%]
code_size/test_codesize_mem_O3_standalone_lib.json: 8836 => 8841 [+5 bytes / +0.06%]
code_size/test_codesize_mem_O3_standalone_narg.json: 8865 => 8870 [+5 bytes / +0.06%]
code_size/test_codesize_mem_O3_standalone_narg_flto.json: 7677 => 7682 [+5 bytes / +0.07%]
code_size/test_codesize_minimal_pthreads.json: 27242 => 27251 [+9 bytes / +0.03%]
code_size/test_codesize_minimal_pthreads_memgrowth.json: 27670 => 27679 [+9 bytes / +0.03%]
code_size/test_minimal_runtime_code_size_audio_worklet.json: 5882 => 5892 [+10 bytes / +0.17%]
code_size/test_minimal_runtime_code_size_hello_embind.json: 15112 => 15122 [+10 bytes / +0.07%]
code_size/test_minimal_runtime_code_size_hello_embind_val.json: 11750 => 11760 [+10 bytes / +0.09%]
code_size/test_minimal_runtime_code_size_hello_wasm_worker.json: 3203 => 3215 [+12 bytes / +0.37%]
test/parallel_testsuite.py updated

Average change: +0.05% (+0.00% - +0.37%)
```
@juj
Copy link
Collaborator Author

juj commented Sep 3, 2025

Would this look good now?

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

@@ -1165,7 +1172,7 @@ struct mallinfo emmalloc_mallinfo() {
struct mallinfo info;
// Non-mmapped space allocated (bytes): For emmalloc,
// let's define this as the difference between heap size and dynamic top end.
info.arena = emscripten_get_heap_size() - (size_t)sbrk(0);
info.arena = emscripten_get_heap_size() - (size_t)sbrk64(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this line can be reverted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is for emmalloc to only refer to this new sbrk64(). This way if user code only uses emmalloc and nothing else (doesn't sbrk() manually), then emmalloc will not pull in the flawed middle-man sbrk() function at all, but just the sbrk64() variant.

So that is why I changed emmalloc to only call to the "fixed" sbrk64().

return 0; // emmalloc is not controlling any dynamic memory at all - cannot release memory.
}
uint8_t *previousSbrkEndAddress = listOfAllRegions->endPtr;
assert(sbrk(0) == previousSbrkEndAddress);
void *sbrk_addr = sbrk64(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this can just be sbrk(0). Hopefully in the longer run we can remove sbrk64 so the fewer references to it the better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the opposite direction: sbrk() is flawed, so we must migrate to using sbrk64() even in 4GB builds.

(2GB builds don't care, since there the problem doesn't exist)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we agree that even if sbrk() is limited in how much it can allocate we need to fix our current sbrk() so that it can shrink memory, right? That is what #25138 is for I guess?

So is the idea that rather than updating all our malloc implementations to chop up sbrk() calls into in 2gb chunks, we would instead move them all over to using sbrk64?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should land this change using _sbrk64() and then we can bikeshed the final solution we want in #25138.

Once we make the new function public we can declare it in a header and remove the underscore prefix.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we agree that even if sbrk() is limited in how much it can allocate we need to fix our current sbrk() so that it can shrink memory, right? That is what #25138 is for I guess?

Yeah, that's right. 25138 means that interpretation of existing 32-bit sbrk() must be turned back to signed again.

So is the idea that rather than updating all our malloc implementations to chop up sbrk() calls into in 2gb chunks, we would instead move them all over to using sbrk64?

That's true. At first I was thinking that callers can just chop their allocs to 2gb pieces, but the issue there is that then there won't be a guarantee that such sbrk() allocs are contiguous.

For example in a multithreaded program with bdwgc, different threads can race to sbrk() for native malloc(), and bdwgc races to sbrk() for itself. So if it was patched to alloc in 2gb slices, another thread could come in between, and the result for the chopped alloc would result in noncontinuous >2GB memory area being received, which would be catastrophic.

So best to just provide a int64_t utilizing sbrk64() api instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should land this change using _sbrk64() and then we can bikeshed the final solution we want in #25138.

Once we make the new function public we can declare it in a header and remove the underscore prefix.

Ok, updated.

@@ -61,6 +61,8 @@
#include <emscripten/heap.h>
#include <emscripten/threading.h>

void *sbrk64(int64_t numBytes);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets call this _sbrk64 to avoid polluting the global namespace.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be important for end users to be able to call this function as well. I am already thinking that at Unity we'll patch bdwgc to call to this fixed sbrk64() function.

juj added 5 commits September 4, 2025 01:17
This is an automatic change generated by tools/maint/rebaseline_tests.py.

The following (16) test expectation files were updated by
running the tests with `--rebaseline`:

```
code_size/test_codesize_cxx_ctors1.json: 149263 => 149258 [-5 bytes / -0.00%]
code_size/test_codesize_cxx_ctors2.json: 148668 => 148663 [-5 bytes / -0.00%]
code_size/test_codesize_cxx_except.json: 194686 => 194681 [-5 bytes / -0.00%]
code_size/test_codesize_cxx_except_wasm.json: 164273 => 164268 [-5 bytes / -0.00%]
code_size/test_codesize_cxx_except_wasm_legacy.json: 161862 => 161857 [-5 bytes / -0.00%]
code_size/test_codesize_cxx_mangle.json: 258777 => 258772 [-5 bytes / -0.00%]
code_size/test_codesize_cxx_noexcept.json: 151680 => 151675 [-5 bytes / -0.00%]
code_size/test_codesize_cxx_wasmfs.json: 176940 => 176935 [-5 bytes / -0.00%]
code_size/test_codesize_files_wasmfs.json: 55780 => 55775 [-5 bytes / -0.01%]
code_size/test_codesize_hello_dylink_all.json: 844094 => 844094 [+0 bytes / +0.00%]
code_size/test_codesize_mem_O3.json: 9659 => 9654 [-5 bytes / -0.05%]
code_size/test_codesize_mem_O3_grow.json: 9945 => 9940 [-5 bytes / -0.05%]
code_size/test_codesize_mem_O3_grow_standalone.json: 9672 => 9667 [-5 bytes / -0.05%]
code_size/test_codesize_mem_O3_standalone.json: 9530 => 9525 [-5 bytes / -0.05%]
code_size/test_codesize_mem_O3_standalone_lib.json: 8841 => 8836 [-5 bytes / -0.06%]
code_size/test_codesize_mem_O3_standalone_narg.json: 8870 => 8865 [-5 bytes / -0.06%]

Average change: -0.02% (-0.06% - +0.00%)
```
@sbc100 sbc100 changed the title Fix emmalloc_trim() to work again. #23343. Fix emmalloc_trim() to work again Sep 3, 2025
@juj
Copy link
Collaborator Author

juj commented Sep 3, 2025

Hmm I'm having trouble landing this: I can't get the rebaseline_expectations.py script to create expectations that will match what's desired. (tried running the script on Windows and on Linux)

Would it make sense to land, and then run the GitHub Action?

@sbc100
Copy link
Collaborator

sbc100 commented Sep 3, 2025

Hmm I'm having trouble landing this: I can't get the rebaseline_expectations.py script to create expectations that will match what's desired. (tried running the script on Windows and on Linux)

Would it make sense to land, and then run the GitHub Action?

Thats one option.

But using ./emsdk install tot followed by ./emcc --clear-cache followed by tools/maint/rebaseline_tests.py should always works (modulo real bugs that we should track down).

This is an automatic change generated by tools/maint/rebaseline_tests.py.

The following (31) test expectation files were updated by
running the tests with `--rebaseline`:

```
code_size/test_codesize_cxx_ctors1.json: 149258 => 149271 [+13 bytes / +0.01%]
code_size/test_codesize_cxx_ctors2.json: 148663 => 148676 [+13 bytes / +0.01%]
code_size/test_codesize_cxx_except.json: 194681 => 194694 [+13 bytes / +0.01%]
code_size/test_codesize_cxx_except_wasm.json: 164268 => 164281 [+13 bytes / +0.01%]
code_size/test_codesize_cxx_except_wasm_legacy.json: 161857 => 161870 [+13 bytes / +0.01%]
code_size/test_codesize_cxx_lto.json: 125545 => 125540 [-5 bytes / -0.00%]
code_size/test_codesize_cxx_mangle.json: 258772 => 258785 [+13 bytes / +0.01%]
code_size/test_codesize_cxx_noexcept.json: 151675 => 151688 [+13 bytes / +0.01%]
code_size/test_codesize_cxx_wasmfs.json: 176935 => 176948 [+13 bytes / +0.01%]
code_size/test_codesize_files_wasmfs.json: 55775 => 55786 [+11 bytes / +0.02%]
code_size/test_codesize_hello_dylink.json: 45543 => 45546 [+3 bytes / +0.01%]
code_size/test_codesize_hello_dylink_all.json: 844094 => 844086 [-8 bytes / -0.00%]
code_size/test_codesize_mem_O3.json: 9654 => 9666 [+12 bytes / +0.12%]
code_size/test_codesize_mem_O3_grow.json: 9940 => 9952 [+12 bytes / +0.12%]
code_size/test_codesize_mem_O3_grow_standalone.json: 9667 => 9679 [+12 bytes / +0.12%]
code_size/test_codesize_mem_O3_standalone.json: 9525 => 9537 [+12 bytes / +0.13%]
code_size/test_codesize_mem_O3_standalone_lib.json: 8836 => 8848 [+12 bytes / +0.14%]
code_size/test_codesize_mem_O3_standalone_narg.json: 8865 => 8877 [+12 bytes / +0.14%]
code_size/test_codesize_mem_O3_standalone_narg_flto.json: 7682 => 7677 [-5 bytes / -0.07%]
code_size/test_codesize_minimal_pthreads.json: 27251 => 27242 [-9 bytes / -0.03%]
code_size/test_codesize_minimal_pthreads_memgrowth.json: 27679 => 27670 [-9 bytes / -0.03%]
code_size/test_minimal_runtime_code_size_audio_worklet.json: 5892 => 5882 [-10 bytes / -0.17%]
code_size/test_minimal_runtime_code_size_hello_embind.json: 15122 => 15112 [-10 bytes / -0.07%]
code_size/test_minimal_runtime_code_size_hello_embind_val.json: 11760 => 11750 [-10 bytes / -0.09%]
code_size/test_minimal_runtime_code_size_hello_wasm_worker.json: 3215 => 3203 [-12 bytes / -0.37%]
code_size/test_minimal_runtime_code_size_hello_webgl2_wasm.json: 13208 => 13190 [-18 bytes / -0.14%]
code_size/test_minimal_runtime_code_size_hello_webgl2_wasm2js.json: 18553 => 18536 [-17 bytes / -0.09%]
code_size/test_minimal_runtime_code_size_hello_webgl_wasm.json: 12746 => 12728 [-18 bytes / -0.14%]
code_size/test_minimal_runtime_code_size_hello_webgl_wasm2js.json: 18079 => 18063 [-16 bytes / -0.09%]
code_size/test_minimal_runtime_code_size_random_printf_wasm.json: 12507 => 12507 [+0 bytes / +0.00%]
code_size/test_unoptimized_code_size.json: 174633 => 174633 [+0 bytes / +0.00%]

Average change: -0.01% (-0.37% - +0.14%)
```
@juj
Copy link
Collaborator Author

juj commented Sep 4, 2025

But using ./emsdk install tot followed by ./emcc --clear-cache followed by tools/maint/rebaseline_tests.py should always works (modulo real bugs that we should track down).

Ok, rebaselined with emsdk install tot, let's see if that does the trick

@juj juj merged commit 0a0cd05 into emscripten-core:main Sep 4, 2025
25 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

emmalloc_trim broken since #13442?
2 participants