Skip to content

Fix binary emitting of br_if with a refined value by emitting a cast #6510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 105 commits into from
May 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
105 commits
Select commit Hold shift + click to select a range
f835f74
uilds
kripken Apr 11, 2024
0ae495f
builds
kripken Apr 11, 2024
7c7ff86
todo
kripken Apr 11, 2024
a1075d2
tests
kripken Apr 11, 2024
a787489
test
kripken Apr 11, 2024
497d0b8
fix
kripken Apr 11, 2024
097970b
same
kripken Apr 11, 2024
23f006c
fix
kripken Apr 11, 2024
1233da1
test
kripken Apr 11, 2024
a5e6037
fix
kripken Apr 11, 2024
47bdc44
test notes
kripken Apr 11, 2024
d152156
fix
kripken Apr 11, 2024
e285a11
fix
kripken Apr 11, 2024
205c4e9
fix
kripken Apr 11, 2024
cfdd78f
fix
kripken Apr 11, 2024
585f9c8
test, sad
kripken Apr 11, 2024
57879cf
typo
kripken Apr 15, 2024
235af4f
thinking out loud
kripken Apr 15, 2024
90dd4b8
work
kripken Apr 15, 2024
532c214
yolo
kripken Apr 16, 2024
4342d57
work
kripken Apr 16, 2024
e8d5f6b
work
kripken Apr 16, 2024
96c2bce
work
kripken Apr 16, 2024
631adb9
work?
kripken Apr 16, 2024
f829de3
work?
kripken Apr 16, 2024
3f0ba33
work?
kripken Apr 16, 2024
bb762f9
work
kripken Apr 16, 2024
bd82ccd
fix
kripken Apr 16, 2024
f6c5c19
fix
kripken Apr 16, 2024
d73efd9
work
kripken Apr 16, 2024
a390d3c
fix
kripken Apr 16, 2024
17bf39e
simplr
kripken Apr 16, 2024
ea37c27
switch
kripken Apr 17, 2024
a047fbf
undo
kripken Apr 17, 2024
799c03a
restart
kripken Apr 17, 2024
8a0fea1
part.build
kripken Apr 17, 2024
5650369
work
kripken Apr 17, 2024
fa6c3a2
work
kripken Apr 17, 2024
ae6a400
work
kripken Apr 17, 2024
5dc87a5
fix
kripken Apr 17, 2024
ac1de14
work
kripken Apr 17, 2024
e74ad5e
work
kripken Apr 17, 2024
4efb784
work
kripken Apr 17, 2024
d91a9cf
work
kripken Apr 17, 2024
92e3e63
work
kripken Apr 17, 2024
9c84a04
fix
kripken Apr 17, 2024
726eed5
work
kripken Apr 17, 2024
a9b1642
test
kripken Apr 17, 2024
9d4c16f
Merge remote-tracking branch 'origin/main' into br_if.alternative
kripken Apr 17, 2024
148479b
comments
kripken Apr 17, 2024
6d005ca
format
kripken Apr 17, 2024
8a69cee
fix
kripken Apr 17, 2024
a009110
fix
kripken Apr 17, 2024
85e9d4d
nfc
kripken Apr 17, 2024
7eed7fe
fix
kripken Apr 17, 2024
3a18cae
fix
kripken Apr 17, 2024
34db25a
typo
kripken Apr 17, 2024
a96a4c9
comment
kripken Apr 17, 2024
769d792
comment
kripken Apr 17, 2024
41ba975
remove
kripken Apr 18, 2024
16dba3e
remove
kripken Apr 18, 2024
fe530c1
remove
kripken Apr 18, 2024
a4b346e
remove
kripken Apr 18, 2024
9176802
test
kripken Apr 18, 2024
ae0d5fb
comment
kripken Apr 18, 2024
0b4dbde
undo
kripken Apr 18, 2024
a83f7be
undo
kripken Apr 18, 2024
f90858a
test
kripken Apr 18, 2024
4cd8430
moar
kripken Apr 25, 2024
76c74d2
work
kripken Apr 25, 2024
9d37132
workaround
kripken Apr 25, 2024
57f1536
format
kripken Apr 25, 2024
00cb780
work
kripken Apr 25, 2024
30a18c8
test
kripken Apr 25, 2024
8515ad9
Merge remote-tracking branch 'origin/main' into br_if.alternative
kripken Apr 29, 2024
f5449d8
Merge remote-tracking branch 'myself/fuzz.string.measure' into br_if.…
kripken Apr 29, 2024
1f07285
fix bug due to interaction with #6549
kripken Apr 29, 2024
62b20ea
Merge remote-tracking branch 'origin/main' into br_if.alternative
kripken Apr 29, 2024
4e6ff90
Merge remote-tracking branch 'origin/main' into br_if.alternative
kripken Apr 29, 2024
bab64f5
Merge remote-tracking branch 'origin/main' into br_if.alternative
kripken May 6, 2024
37a3aa6
undo core chnages in prep for reworking
kripken May 14, 2024
b0fbd98
halfway..
kripken May 14, 2024
136f191
done..?
kripken May 14, 2024
7c0568b
format
kripken May 14, 2024
7bf4103
undo
kripken May 14, 2024
adebbdd
format.again
kripken May 14, 2024
ea0a050
fixen
kripken May 14, 2024
3170ac9
builds but errors
kripken May 14, 2024
e2eafea
fix. now works
kripken May 14, 2024
55a13fd
fix
kripken May 14, 2024
f663842
text
kripken May 14, 2024
70f4812
fix
kripken May 14, 2024
ad87777
fix
kripken May 14, 2024
f8e13e0
test
kripken May 14, 2024
4d40f31
fix
kripken May 14, 2024
20df949
fix
kripken May 14, 2024
cd94ea1
test
kripken May 14, 2024
d4294df
text
kripken May 14, 2024
bc4f68e
text
kripken May 14, 2024
f572f54
text
kripken May 14, 2024
5301467
Merge remote-tracking branch 'origin/main' into br_if.alternative
kripken May 15, 2024
5a5a2af
optimize
kripken May 15, 2024
2e79f34
Remove some complexity now that visitRefCast is simpler
kripken May 15, 2024
1d6497c
merge oops
kripken May 15, 2024
03eb588
Merge remote-tracking branch 'origin/main' into br_if.alternative
kripken May 15, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,16 @@ There are a few differences between Binaryen IR and the WebAssembly language:
much about this when writing Binaryen passes. For more details see the
`requiresNonNullableLocalFixups()` hook in `pass.h` and the
`LocalStructuralDominance` class.
* `br_if` output types are more refined in Binaryen IR: they have the type of
the value, when a value flows in. In the wasm spec the type is that of the
branch target, which may be less refined. Using the more refined type here
ensures that we optimize in the best way possible, using all the type
information, but it does mean that some roundtripping operations may look a
little different. In particular, when we emit a `br_if` whose type is more
refined in Binaryen IR then we emit a cast right after it, so that the
output has the right type in the wasm spec. That may cause a few bytes of
extra size in rare cases (we avoid this overhead in the common case where
the `br_if` value is unused).
* Strings
* Binaryen allows string views (`stringview_wtf16` etc.) to be cast using
`ref.cast`. This simplifies the IR, as it allows `ref.cast` to always be
Expand Down
19 changes: 18 additions & 1 deletion scripts/fuzz_opt.py
Original file line number Diff line number Diff line change
Expand Up @@ -308,7 +308,24 @@ def is_git_repo():
'fannkuch3_manyopts_dwarf.wasm',
'fib2_emptylocspan_dwarf.wasm',
'fannkuch3_dwarf.wasm',
'dwarf-local-order.wasm',
'strip-producers.wasm',
'multi_unit_abbrev_noprint.wasm',
'reverse_dwarf_abbrevs.wasm',
'print_g.wasm',
'print_g_strip-dwarf.wasm',
'fannkuch0_dwarf.wasm',
'dwarfdump_roundtrip_dwarfdump.wasm',
'dwarfdump.wasm',
'fannkuch3_dwarf.wasm',
'dwarf-local-order.wasm',
'dwarf_unit_with_no_abbrevs_noprint.wasm',
'strip-debug.wasm',
'multi_line_table_dwarf.wasm',
'dwarf_with_exceptions.wasm',
'strip-dwarf.wasm',
'ignore_missing_func_dwarf.wasm',
'print.wasm',
# TODO fuzzer support for multimemory
'multi-memories-atomics64.wast',
'multi-memories-basics.wast',
Expand Down Expand Up @@ -1183,7 +1200,7 @@ def filter_exports(wasm, output, keep):
f.write(json.dumps(graph))

# prune the exports
run([in_bin('wasm-metadce'), wasm, '-o', output, '--graph-file', 'graph.json', '-all'])
run([in_bin('wasm-metadce'), wasm, '-o', output, '--graph-file', 'graph.json'] + FEATURE_OPTS)


# Fuzz the interpreter with --fuzz-exec -tnh. The tricky thing with traps-never-
Expand Down
15 changes: 15 additions & 0 deletions src/wasm-stack.h
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,21 @@ class BinaryInstWriter : public OverriddenVisitor<BinaryInstWriter> {
// and StringSliceWTF if their non-string operands are already LocalGets.
// Record those LocalGets here.
std::unordered_set<LocalGet*> deferredGets;

// Track which br_ifs need handling of their output values, which is the case
// when they have a value that is more refined than the wasm type system
// allows atm (and they are not dropped, in which case the type would not
// matter). See https://github.com/WebAssembly/binaryen/pull/6390 for more on
// the difference. As a result of the difference, we will insert extra casts
// to ensure validation in the wasm spec. The wasm spec will hopefully improve
// to use the more refined type as well, which would remove the need for this
// hack.
//
// Each br_if present as a key here is mapped to the unrefined type for it.
// That is, the br_if has a type in Binaryen IR that is too refined, and the
// map contains the unrefined one (which we need to know the local types, as
// we'll stash the unrefined values and then cast them).
std::unordered_map<Break*, Type> brIfsNeedingHandling;
};

// Takes binaryen IR and converts it to something else (binary or stack IR)
Expand Down
159 changes: 157 additions & 2 deletions src/wasm/wasm-stack.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,56 @@ void BinaryInstWriter::visitLoop(Loop* curr) {
void BinaryInstWriter::visitBreak(Break* curr) {
o << int8_t(curr->condition ? BinaryConsts::BrIf : BinaryConsts::Br)
<< U32LEB(getBreakIndex(curr->name));

// See comment on |brIfsNeedingHandling| for the extra casts we need to emit
// here for certain br_ifs.
auto iter = brIfsNeedingHandling.find(curr);
if (iter != brIfsNeedingHandling.end()) {
auto unrefinedType = iter->second;
auto type = curr->type;
assert(type.size() == unrefinedType.size());

assert(curr->type.hasRef());

auto emitCast = [&](Type to) {
// Shim a tiny bit of IR, just enough to get visitRefCast to see what we
// are casting, and to emit the proper thing.
RefCast cast;
cast.type = to;
cast.ref = nullptr;
visitRefCast(&cast);
};

if (!type.isTuple()) {
// Simple: Just emit a cast, and then the type matches Binaryen IR's.
emitCast(type);
} else {
// Tuples are trickier to handle, and we need to use scratch locals. Stash
// all the values on the stack to those locals, then reload them, casting
// as we go.
//
// We must track how many scratch locals we've used from each type as we
// go, as a type might appear multiple times in the tuple. We allocated
// enough for each, in a contiguous range, so we just increment as we go.
std::unordered_map<Type, Index> scratchTypeUses;
for (Index i = 0; i < unrefinedType.size(); i++) {
auto t = unrefinedType[unrefinedType.size() - i - 1];
assert(scratchLocals.find(t) != scratchLocals.end());
auto localIndex = scratchLocals[t] + scratchTypeUses[t]++;
o << int8_t(BinaryConsts::LocalSet) << U32LEB(localIndex);
}
for (Index i = 0; i < unrefinedType.size(); i++) {
auto t = unrefinedType[i];
auto localIndex = scratchLocals[t] + --scratchTypeUses[t];
o << int8_t(BinaryConsts::LocalGet) << U32LEB(localIndex);
if (t.isRef()) {
// Note that we cast all types here, when perhaps only some of the
// tuple's lanes need that. This is simpler.
emitCast(type[i]);
}
}
}
}
}

void BinaryInstWriter::visitSwitch(Switch* curr) {
Expand Down Expand Up @@ -2664,11 +2714,116 @@ InsertOrderedMap<Type, Index> BinaryInstWriter::countScratchLocals() {
auto& count = scratches[Type::i32];
count = std::max(count, numScratches);
}
};

ScratchLocalFinder finder(*this);
// As mentioned in BinaryInstWriter::visitBreak, the type of br_if with a
// value may be more refined in Binaryen IR compared to the wasm spec, as we
// give it the type of the value, while the spec gives it the type of the
// block it targets. To avoid problems we must handle the case where a br_if
// has a value, the value is more refined then the target, and the value is
// not dropped (the last condition is very rare in real-world wasm, making
// all of this a quite unusual situation). First, detect such situations by
// seeing if we have br_ifs that return reference types at all. We do so by
// counting them, and as we go we ignore ones that are dropped, since a
// dropped value is not a problem for us.
//
// Note that we do not check all the conditions here, such as if the type
// matches the break target, or if the parent is a cast, which we leave for
// a more expensive analysis later, which we only run if we see something
// suspicious here.
Index numDangerousBrIfs = 0;

void visitBreak(Break* curr) {
if (curr->type.hasRef()) {
numDangerousBrIfs++;
}
}

void visitDrop(Drop* curr) {
if (curr->value->is<Break>() && curr->value->type.hasRef()) {
// The value is exactly a br_if of a ref, that we just visited before
// us. Undo the ++ from there as it can be ignored.
assert(numDangerousBrIfs > 0);
numDangerousBrIfs--;
}
}
} finder(*this);
finder.walk(func->body);

if (!finder.numDangerousBrIfs || !parent.getModule()->features.hasGC()) {
// Nothing more to do: either no such br_ifs, or GC is not enabled.
//
// The explicit check for GC is here because if only reference types are
// enabled then we still may seem to need a fixup here, e.g. if a ref.func
// is br_if'd to a block of type funcref. But that only appears that way
// because in Binaryen IR we allow non-nullable types even without GC (and
// if GC is not enabled then we always emit nullable types in the binary).
// That is, even if we see a type difference without GC, it will vanish in
// the binary format; there is never a need to add any ref.casts without GC
// being enabled.
return std::move(finder.scratches);
}

// There are dangerous-looking br_ifs, so we must do the harder work to
// actually investigate them in detail, including tracking block types. By
// being fully precise here, we'll only emit casts when absolutely necessary,
// which avoids repeated roundtrips adding more and more code.
struct RefinementScanner : public ExpressionStackWalker<RefinementScanner> {
BinaryInstWriter& writer;
ScratchLocalFinder& finder;

RefinementScanner(BinaryInstWriter& writer, ScratchLocalFinder& finder)
: writer(writer), finder(finder) {}

void visitBreak(Break* curr) {
// See if this is one of the dangerous br_ifs we must handle.
if (!curr->type.hasRef()) {
// Not even a reference.
return;
}
auto* parent = getParent();
if (parent) {
if (parent->is<Drop>()) {
// It is dropped anyhow.
return;
}
if (auto* cast = parent->dynCast<RefCast>()) {
if (Type::isSubType(cast->type, curr->type)) {
// It is cast to the same type or a better one. In particular this
// handles the case of repeated roundtripping: After the first
// roundtrip we emit a cast that we'll identify here, and not emit
// an additional one.
return;
}
}
}
auto* breakTarget = findBreakTarget(curr->name);
auto unrefinedType = breakTarget->type;
if (unrefinedType == curr->type) {
// It has the proper type anyhow.
return;
}
Comment on lines +2801 to +2804
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After #6579 there will be no need to know the source type when emitting a cast, so this check will be the only remaining use for unrefinedType. This is also the only check that cannot be done as-is in ScratchLocalFinder. WDYT about making ScratchLocalFinder an ExpressionStackWalker to avoid needing this separate walker at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overhead is annoying, I'm afraid, and I'd really like to not regress the by-far common case of not needing the extra work.

Some data:

$ perf stat -r 5 bin/wasm-opt UE4Game-HTML5-Shipping.wasm -all -o a.wasm
warning: no passes specified, not doing any work
warning: no passes specified, not doing any work
warning: no passes specified, not doing any work
warning: no passes specified, not doing any work
warning: no passes specified, not doing any work

 Performance counter stats for 'bin/wasm-opt UE4Game-HTML5-Shipping.wasm -all -o a.wasm' (5 runs):

         10,947.27 msec task-clock                       #    1.767 CPUs utilized               ( +-  0.38% )
               855      context-switches                 #   78.102 /sec                        ( +-  9.24% )
                40      cpu-migrations                   #    3.654 /sec                        ( +- 11.87% )
           239,470      page-faults                      #   21.875 K/sec                       ( +-  0.07% )
    34,826,556,461      cycles                           #    3.181 GHz                         ( +-  0.27% )
    48,135,304,544      instructions                     #    1.38  insn per cycle              ( +-  0.01% )
    10,101,559,953      branches                         #  922.747 M/sec                       ( +-  0.02% )
       150,681,797      branch-misses                    #    1.49% of all branches             ( +-  1.17% )

            6.1962 +- 0.0404 seconds time elapsed  ( +-  0.65% )

And, with this patch:

$ git diff
diff --git a/src/wasm/wasm-stack.cpp b/src/wasm/wasm-stack.cpp
index 53a25d52b..1a83f56cb 100644
--- a/src/wasm/wasm-stack.cpp
+++ b/src/wasm/wasm-stack.cpp
@@ -2681,7 +2681,7 @@ void BinaryInstWriter::noteLocalType(Type type, Index count) {
 }
 
 InsertOrderedMap<Type, Index> BinaryInstWriter::countScratchLocals() {
-  struct ScratchLocalFinder : PostWalker<ScratchLocalFinder> {
+  struct ScratchLocalFinder : ExpressionStackWalker<ScratchLocalFinder> {
     BinaryInstWriter& parent;
     InsertOrderedMap<Type, Index> scratches;
$ perf stat -r 5 bin/wasm-opt UE4Game-HTML5-Shipping.wasm -all -o b.wasm

 Performance counter stats for 'bin/wasm-opt UE4Game-HTML5-Shipping.wasm -all -o b.wasm' (5 runs):

         11,311.23 msec task-clock                       #    1.727 CPUs utilized               ( +-  0.26% )
               720      context-switches                 #   63.654 /sec                        ( +-  5.01% )
                29      cpu-migrations                   #    2.564 /sec                        ( +-  8.30% )
           238,817      page-faults                      #   21.113 K/sec                       ( +-  0.04% )
    36,257,517,592      cycles                           #    3.205 GHz                         ( +-  0.14% )
    50,906,632,590      instructions                     #    1.40  insn per cycle              ( +-  0.66% )
    10,525,197,435      branches                         #  930.509 M/sec                       ( +-  0.01% )
       155,907,135      branch-misses                    #    1.48% of all branches             ( +-  0.70% )

            6.5514 +- 0.0335 seconds time elapsed  ( +-  0.51% )

That's about 5% more instructions and walltime. Note that that measures load+save, so the impact on save is around double that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, that's way too much slowdown :(


// Mark the br_if as needing handling, and add the type to the set of
// types we need scratch tuple locals for (if relevant).
writer.brIfsNeedingHandling[curr] = unrefinedType;

if (unrefinedType.isTuple()) {
// We must allocate enough scratch locals for this tuple. Note that we
// may need more than one per type in the tuple, if a type appears more
// than once, so we count their appearances.
InsertOrderedMap<Type, Index> scratchTypeUses;
for (auto t : unrefinedType) {
scratchTypeUses[t]++;
}
for (auto& [type, uses] : scratchTypeUses) {
auto& count = finder.scratches[type];
count = std::max(count, uses);
}
}
}
} refinementScanner(*this, finder);
refinementScanner.walk(func->body);

return std::move(finder.scratches);
}

Expand Down
Loading