-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Fix emmalloc_trim() to work again #25126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
#if defined(__wasm64__) // TODO || !defined(wasm2gb) | ||
// In the correct https://linux.die.net/man/2/sbrk spec, sbrk() parameter is | ||
// intended to be treated as signed, meaning that it is not possible in a | ||
// 32-bit program to sbrk alloc (or dealloc) more than 2GB of memory at once. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is indeed the case (I have no reason to believe its not) wouldn't it be better to fix the below bug rather than do this dance?
Maybe at least open a bug so that we can remove this code since it seems like its only need to workaround the bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be better to fix the below bug rather than do this dance?
Yes, totally agree. A later PR. (possibly.. I don't know how hard it will be to do)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you include the full bug URL in the code here so we can clean this up in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (26) test expectation files were updated by running the tests with `--rebaseline`: ``` code_size/test_codesize_cxx_ctors1.json: 149258 => 149263 [+5 bytes / +0.00%] code_size/test_codesize_cxx_ctors2.json: 148663 => 148668 [+5 bytes / +0.00%] code_size/test_codesize_cxx_except.json: 194681 => 194686 [+5 bytes / +0.00%] code_size/test_codesize_cxx_except_wasm.json: 164268 => 164273 [+5 bytes / +0.00%] code_size/test_codesize_cxx_except_wasm_legacy.json: 161857 => 161862 [+5 bytes / +0.00%] code_size/test_codesize_cxx_lto.json: 125540 => 125545 [+5 bytes / +0.00%] code_size/test_codesize_cxx_mangle.json: 258772 => 258777 [+5 bytes / +0.00%] code_size/test_codesize_cxx_noexcept.json: 151675 => 151680 [+5 bytes / +0.00%] code_size/test_codesize_cxx_wasmfs.json: 176935 => 176940 [+5 bytes / +0.00%] code_size/test_codesize_files_wasmfs.json: 55775 => 55780 [+5 bytes / +0.01%] code_size/test_codesize_hello_dylink.json: 45534 => 45543 [+9 bytes / +0.02%] code_size/test_codesize_hello_dylink_all.json: 843921 => 844093 [+172 bytes / +0.02%] code_size/test_codesize_mem_O3.json: 9654 => 9659 [+5 bytes / +0.05%] code_size/test_codesize_mem_O3_grow.json: 9940 => 9945 [+5 bytes / +0.05%] code_size/test_codesize_mem_O3_grow_standalone.json: 9667 => 9672 [+5 bytes / +0.05%] code_size/test_codesize_mem_O3_standalone.json: 9525 => 9530 [+5 bytes / +0.05%] code_size/test_codesize_mem_O3_standalone_lib.json: 8836 => 8841 [+5 bytes / +0.06%] code_size/test_codesize_mem_O3_standalone_narg.json: 8865 => 8870 [+5 bytes / +0.06%] code_size/test_codesize_mem_O3_standalone_narg_flto.json: 7677 => 7682 [+5 bytes / +0.07%] code_size/test_codesize_minimal_pthreads.json: 27242 => 27251 [+9 bytes / +0.03%] code_size/test_codesize_minimal_pthreads_memgrowth.json: 27670 => 27679 [+9 bytes / +0.03%] code_size/test_minimal_runtime_code_size_audio_worklet.json: 5882 => 5892 [+10 bytes / +0.17%] code_size/test_minimal_runtime_code_size_hello_embind.json: 15112 => 15122 [+10 bytes / +0.07%] code_size/test_minimal_runtime_code_size_hello_embind_val.json: 11750 => 11760 [+10 bytes / +0.09%] code_size/test_minimal_runtime_code_size_hello_wasm_worker.json: 3203 => 3215 [+12 bytes / +0.37%] test/parallel_testsuite.py updated Average change: +0.05% (+0.00% - +0.37%) ```
Would this look good now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm % comments
system/lib/emmalloc.c
Outdated
@@ -1165,7 +1172,7 @@ struct mallinfo emmalloc_mallinfo() { | |||
struct mallinfo info; | |||
// Non-mmapped space allocated (bytes): For emmalloc, | |||
// let's define this as the difference between heap size and dynamic top end. | |||
info.arena = emscripten_get_heap_size() - (size_t)sbrk(0); | |||
info.arena = emscripten_get_heap_size() - (size_t)sbrk64(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this line can be reverted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is for emmalloc to only refer to this new sbrk64(). This way if user code only uses emmalloc and nothing else (doesn't sbrk() manually), then emmalloc will not pull in the flawed middle-man sbrk() function at all, but just the sbrk64() variant.
So that is why I changed emmalloc to only call to the "fixed" sbrk64().
system/lib/emmalloc.c
Outdated
return 0; // emmalloc is not controlling any dynamic memory at all - cannot release memory. | ||
} | ||
uint8_t *previousSbrkEndAddress = listOfAllRegions->endPtr; | ||
assert(sbrk(0) == previousSbrkEndAddress); | ||
void *sbrk_addr = sbrk64(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this can just be sbrk(0)
. Hopefully in the longer run we can remove sbrk64
so the fewer references to it the better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the opposite direction: sbrk()
is flawed, so we must migrate to using sbrk64()
even in 4GB builds.
(2GB builds don't care, since there the problem doesn't exist)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we agree that even if sbrk()
is limited in how much it can allocate we need to fix our current sbrk()
so that it can shrink memory, right? That is what #25138 is for I guess?
So is the idea that rather than updating all our malloc implementations to chop up sbrk()
calls into in 2gb chunks, we would instead move them all over to using sbrk64
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should land this change using _sbrk64()
and then we can bikeshed the final solution we want in #25138.
Once we make the new function public we can declare it in a header and remove the underscore prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we agree that even if
sbrk()
is limited in how much it can allocate we need to fix our currentsbrk()
so that it can shrink memory, right? That is what #25138 is for I guess?
Yeah, that's right. 25138 means that interpretation of existing 32-bit sbrk() must be turned back to signed again.
So is the idea that rather than updating all our malloc implementations to chop up
sbrk()
calls into in 2gb chunks, we would instead move them all over to usingsbrk64
?
That's true. At first I was thinking that callers can just chop their allocs to 2gb pieces, but the issue there is that then there won't be a guarantee that such sbrk() allocs are contiguous.
For example in a multithreaded program with bdwgc, different threads can race to sbrk() for native malloc(), and bdwgc races to sbrk() for itself. So if it was patched to alloc in 2gb slices, another thread could come in between, and the result for the chopped alloc would result in noncontinuous >2GB memory area being received, which would be catastrophic.
So best to just provide a int64_t utilizing sbrk64() api instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should land this change using
_sbrk64()
and then we can bikeshed the final solution we want in #25138.Once we make the new function public we can declare it in a header and remove the underscore prefix.
Ok, updated.
system/lib/emmalloc.c
Outdated
@@ -61,6 +61,8 @@ | |||
#include <emscripten/heap.h> | |||
#include <emscripten/threading.h> | |||
|
|||
void *sbrk64(int64_t numBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets call this _sbrk64
to avoid polluting the global namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be important for end users to be able to call this function as well. I am already thinking that at Unity we'll patch bdwgc to call to this fixed sbrk64() function.
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (16) test expectation files were updated by running the tests with `--rebaseline`: ``` code_size/test_codesize_cxx_ctors1.json: 149263 => 149258 [-5 bytes / -0.00%] code_size/test_codesize_cxx_ctors2.json: 148668 => 148663 [-5 bytes / -0.00%] code_size/test_codesize_cxx_except.json: 194686 => 194681 [-5 bytes / -0.00%] code_size/test_codesize_cxx_except_wasm.json: 164273 => 164268 [-5 bytes / -0.00%] code_size/test_codesize_cxx_except_wasm_legacy.json: 161862 => 161857 [-5 bytes / -0.00%] code_size/test_codesize_cxx_mangle.json: 258777 => 258772 [-5 bytes / -0.00%] code_size/test_codesize_cxx_noexcept.json: 151680 => 151675 [-5 bytes / -0.00%] code_size/test_codesize_cxx_wasmfs.json: 176940 => 176935 [-5 bytes / -0.00%] code_size/test_codesize_files_wasmfs.json: 55780 => 55775 [-5 bytes / -0.01%] code_size/test_codesize_hello_dylink_all.json: 844094 => 844094 [+0 bytes / +0.00%] code_size/test_codesize_mem_O3.json: 9659 => 9654 [-5 bytes / -0.05%] code_size/test_codesize_mem_O3_grow.json: 9945 => 9940 [-5 bytes / -0.05%] code_size/test_codesize_mem_O3_grow_standalone.json: 9672 => 9667 [-5 bytes / -0.05%] code_size/test_codesize_mem_O3_standalone.json: 9530 => 9525 [-5 bytes / -0.05%] code_size/test_codesize_mem_O3_standalone_lib.json: 8841 => 8836 [-5 bytes / -0.06%] code_size/test_codesize_mem_O3_standalone_narg.json: 8870 => 8865 [-5 bytes / -0.06%] Average change: -0.02% (-0.06% - +0.00%) ```
Hmm I'm having trouble landing this: I can't get the rebaseline_expectations.py script to create expectations that will match what's desired. (tried running the script on Windows and on Linux) Would it make sense to land, and then run the GitHub Action? |
Thats one option. But using |
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (31) test expectation files were updated by running the tests with `--rebaseline`: ``` code_size/test_codesize_cxx_ctors1.json: 149258 => 149271 [+13 bytes / +0.01%] code_size/test_codesize_cxx_ctors2.json: 148663 => 148676 [+13 bytes / +0.01%] code_size/test_codesize_cxx_except.json: 194681 => 194694 [+13 bytes / +0.01%] code_size/test_codesize_cxx_except_wasm.json: 164268 => 164281 [+13 bytes / +0.01%] code_size/test_codesize_cxx_except_wasm_legacy.json: 161857 => 161870 [+13 bytes / +0.01%] code_size/test_codesize_cxx_lto.json: 125545 => 125540 [-5 bytes / -0.00%] code_size/test_codesize_cxx_mangle.json: 258772 => 258785 [+13 bytes / +0.01%] code_size/test_codesize_cxx_noexcept.json: 151675 => 151688 [+13 bytes / +0.01%] code_size/test_codesize_cxx_wasmfs.json: 176935 => 176948 [+13 bytes / +0.01%] code_size/test_codesize_files_wasmfs.json: 55775 => 55786 [+11 bytes / +0.02%] code_size/test_codesize_hello_dylink.json: 45543 => 45546 [+3 bytes / +0.01%] code_size/test_codesize_hello_dylink_all.json: 844094 => 844086 [-8 bytes / -0.00%] code_size/test_codesize_mem_O3.json: 9654 => 9666 [+12 bytes / +0.12%] code_size/test_codesize_mem_O3_grow.json: 9940 => 9952 [+12 bytes / +0.12%] code_size/test_codesize_mem_O3_grow_standalone.json: 9667 => 9679 [+12 bytes / +0.12%] code_size/test_codesize_mem_O3_standalone.json: 9525 => 9537 [+12 bytes / +0.13%] code_size/test_codesize_mem_O3_standalone_lib.json: 8836 => 8848 [+12 bytes / +0.14%] code_size/test_codesize_mem_O3_standalone_narg.json: 8865 => 8877 [+12 bytes / +0.14%] code_size/test_codesize_mem_O3_standalone_narg_flto.json: 7682 => 7677 [-5 bytes / -0.07%] code_size/test_codesize_minimal_pthreads.json: 27251 => 27242 [-9 bytes / -0.03%] code_size/test_codesize_minimal_pthreads_memgrowth.json: 27679 => 27670 [-9 bytes / -0.03%] code_size/test_minimal_runtime_code_size_audio_worklet.json: 5892 => 5882 [-10 bytes / -0.17%] code_size/test_minimal_runtime_code_size_hello_embind.json: 15122 => 15112 [-10 bytes / -0.07%] code_size/test_minimal_runtime_code_size_hello_embind_val.json: 11760 => 11750 [-10 bytes / -0.09%] code_size/test_minimal_runtime_code_size_hello_wasm_worker.json: 3215 => 3203 [-12 bytes / -0.37%] code_size/test_minimal_runtime_code_size_hello_webgl2_wasm.json: 13208 => 13190 [-18 bytes / -0.14%] code_size/test_minimal_runtime_code_size_hello_webgl2_wasm2js.json: 18553 => 18536 [-17 bytes / -0.09%] code_size/test_minimal_runtime_code_size_hello_webgl_wasm.json: 12746 => 12728 [-18 bytes / -0.14%] code_size/test_minimal_runtime_code_size_hello_webgl_wasm2js.json: 18079 => 18063 [-16 bytes / -0.09%] code_size/test_minimal_runtime_code_size_random_printf_wasm.json: 12507 => 12507 [+0 bytes / +0.00%] code_size/test_unoptimized_code_size.json: 174633 => 174633 [+0 bytes / +0.00%] Average change: -0.01% (-0.37% - +0.14%) ```
Ok, rebaselined with |
Found problems:
__alignof__(max_align_t)
) This can create gaps in the alignment of emmalloc heap end sentinel region and the previous free region. This must require changing the assumption in emmalloc that heap end sentinel region would always be fixed 16b -> allow it to be arbitrary sized (in practice 16-31b).Fixes: #23343