Skip to content

Commit ee7cc00

Browse files
TimonPostcfallin
andauthored
Backport: "Cranelift: avoid quadratic behavior in label-fixup processing" to 12.0.1 (bytecodealliance#6897)
* Cranelift: avoid quadratic behavior in label-fixup processing (bytecodealliance#6804) * WIP: two-tier processing for MachBuffer veneer processing in island emission * Unconditionally emit veneers for label forward-refs that cross islands. * Clean up and fix tests * Review feedback * Add some more detailed comments. * Update the releases file to reflect the patch --------- Co-authored-by: Chris Fallin <[email protected]>
1 parent 54cbe5f commit ee7cc00

File tree

5 files changed

+160
-47
lines changed

5 files changed

+160
-47
lines changed

RELEASES.md

Lines changed: 9 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,14 @@
11
--------------------------------------------------------------------------------
22

3+
## 12.0.1
4+
5+
Unreleased
6+
7+
### Fixed
8+
9+
* Optimized the cranelift compilation on aarch64 for large wasm modules.
10+
[#6804](https://github.com/bytecodealliance/wasmtime/pull/6804)
11+
312
## 12.0.0
413

514
Released 2023-08-21.
@@ -56,9 +65,6 @@ Released 2023-08-21.
5665
instead of returning an error.
5766
[#6776](https://github.com/bytecodealliance/wasmtime/pull/6776)
5867

59-
* Optimized the cranelift compilation on aarch64 for large wasm modules.
60-
[#6804](https://github.com/bytecodealliance/wasmtime/pull/6804)
61-
6268
### Changed
6369

6470
* Empty types are no longer allowed in the component model.

cranelift/codegen/src/isa/aarch64/inst/emit.rs

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3590,7 +3590,7 @@ impl MachInstEmit for Inst {
35903590
dest: BranchTarget::Label(jump_around_label),
35913591
};
35923592
jmp.emit(&[], sink, emit_info, state);
3593-
sink.emit_island(needed_space + 4, &mut state.ctrl_plane);
3593+
sink.emit_island(&mut state.ctrl_plane);
35943594
sink.bind_label(jump_around_label, &mut state.ctrl_plane);
35953595
}
35963596
}
@@ -3789,7 +3789,7 @@ fn emit_return_call_common_sequence(
37893789
dest: BranchTarget::Label(jump_around_label),
37903790
};
37913791
jmp.emit(&[], sink, emit_info, state);
3792-
sink.emit_island(space_needed + 4, &mut state.ctrl_plane);
3792+
sink.emit_island(&mut state.ctrl_plane);
37933793
sink.bind_label(jump_around_label, &mut state.ctrl_plane);
37943794
}
37953795

cranelift/codegen/src/isa/riscv64/inst/emit.rs

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1189,7 +1189,7 @@ impl MachInstEmit for Inst {
11891189
// we need to emit a jump table here to support that jump.
11901190
let distance = (targets.len() * 2 * Inst::INSTRUCTION_SIZE as usize) as u32;
11911191
if sink.island_needed(distance) {
1192-
sink.emit_island(distance, &mut state.ctrl_plane);
1192+
sink.emit_island(&mut state.ctrl_plane);
11931193
}
11941194

11951195
// Emit the jumps back to back
@@ -3132,7 +3132,7 @@ fn emit_return_call_common_sequence(
31323132
dest: BranchTarget::Label(jump_around_label),
31333133
}
31343134
.emit(&[], sink, emit_info, state);
3135-
sink.emit_island(space_needed + 4, &mut state.ctrl_plane);
3135+
sink.emit_island(&mut state.ctrl_plane);
31363136
sink.bind_label(jump_around_label, &mut state.ctrl_plane);
31373137
}
31383138

cranelift/codegen/src/machinst/buffer.rs

Lines changed: 146 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -139,6 +139,52 @@
139139
//!
140140
//! Given these invariants, we argue why each optimization preserves execution
141141
//! semantics below (grep for "Preserves execution semantics").
142+
//!
143+
//! # Avoiding Quadratic Behavior
144+
//!
145+
//! There are two cases where we've had to take some care to avoid
146+
//! quadratic worst-case behavior:
147+
//!
148+
//! - The "labels at this branch" list can grow unboundedly if the
149+
//! code generator binds many labels at one location. If the count
150+
//! gets too high (defined by the `LABEL_LIST_THRESHOLD` constant), we
151+
//! simply abort an optimization early in a way that is always correct
152+
//! but is conservative.
153+
//!
154+
//! - The fixup list can interact with island emission to create
155+
//! "quadratic island behvior". In a little more detail, one can hit
156+
//! this behavior by having some pending fixups (forward label
157+
//! references) with long-range label-use kinds, and some others
158+
//! with shorter-range references that nonetheless still are pending
159+
//! long enough to trigger island generation. In such a case, we
160+
//! process the fixup list, generate veneers to extend some forward
161+
//! references' ranges, but leave the other (longer-range) ones
162+
//! alone. The way this was implemented put them back on a list and
163+
//! resulted in quadratic behavior.
164+
//!
165+
//! To avoid this, we could use a better data structure that allows
166+
//! us to query for fixups with deadlines "coming soon" and generate
167+
//! veneers for only those fixups. However, there is some
168+
//! interaction with the branch peephole optimizations: the
169+
//! invariant there is that branches in the "most recent branches
170+
//! contiguous with end of buffer" list have corresponding fixups in
171+
//! order (so that when we chomp the branch, we can chomp its fixup
172+
//! too).
173+
//!
174+
//! So instead, when we generate an island, for now we create
175+
//! veneers for *all* pending fixups, then if upgraded to a kind
176+
//! that no longer supports veneers (is at "max range"), kick the
177+
//! fixups off to a list that is *not* processed at islands except
178+
//! for one last pass after emission. This allows us to skip the
179+
//! work and avoids the quadratic behvior. We expect that this is
180+
//! fine-ish for now: islands are relatively rare, and if they do
181+
//! happen and generate unnecessary veneers (as will now happen for
182+
//! the case above) we'll only get one unnecessary veneer per
183+
//! branch (then they are at max range already).
184+
//!
185+
//! Longer-term, we could use a data structure that allows querying
186+
//! by deadline, as long as we can properly chomp just-added fixups
187+
//! when chomping branches.
142188
143189
use crate::binemit::{Addend, CodeOffset, Reloc, StackMap};
144190
use crate::ir::{ExternalName, Opcode, RelSourceLoc, SourceLoc, TrapCode};
@@ -150,7 +196,7 @@ use crate::timing;
150196
use crate::trace;
151197
use cranelift_control::ControlPlane;
152198
use cranelift_entity::{entity_impl, PrimaryMap};
153-
use smallvec::SmallVec;
199+
use smallvec::{smallvec, SmallVec};
154200
use std::convert::TryFrom;
155201
use std::mem;
156202
use std::string::String;
@@ -190,6 +236,18 @@ impl CompilePhase for Final {
190236
type SourceLocType = SourceLoc;
191237
}
192238

239+
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
240+
enum ForceVeneers {
241+
Yes,
242+
No,
243+
}
244+
245+
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
246+
enum IsLastIsland {
247+
Yes,
248+
No,
249+
}
250+
193251
/// A buffer of output to be produced, fixed up, and then emitted to a CodeSink
194252
/// in bulk.
195253
///
@@ -234,6 +292,10 @@ pub struct MachBuffer<I: VCodeInst> {
234292
pending_traps: SmallVec<[MachLabelTrap; 16]>,
235293
/// Fixups that must be performed after all code is emitted.
236294
fixup_records: SmallVec<[MachLabelFixup<I>; 16]>,
295+
/// Fixups whose labels are at maximum range already: these need
296+
/// not be considered in island emission until we're done
297+
/// emitting.
298+
fixup_records_max_range: SmallVec<[MachLabelFixup<I>; 16]>,
237299
/// Current deadline at which all constants are flushed and all code labels
238300
/// are extended by emitting long-range jumps in an island. This flush
239301
/// should be rare (e.g., on AArch64, the shortest-range PC-rel references
@@ -389,6 +451,7 @@ impl<I: VCodeInst> MachBuffer<I> {
389451
pending_constants: SmallVec::new(),
390452
pending_traps: SmallVec::new(),
391453
fixup_records: SmallVec::new(),
454+
fixup_records_max_range: SmallVec::new(),
392455
island_deadline: UNKNOWN_LABEL_OFFSET,
393456
island_worst_case_size: 0,
394457
latest_branches: SmallVec::new(),
@@ -1157,27 +1220,24 @@ impl<I: VCodeInst> MachBuffer<I> {
11571220
/// Should only be called if `island_needed()` returns true, i.e., if we
11581221
/// actually reach a deadline. It's not necessarily a problem to do so
11591222
/// otherwise but it may result in unnecessary work during emission.
1160-
pub fn emit_island(&mut self, distance: CodeOffset, ctrl_plane: &mut ControlPlane) {
1161-
self.emit_island_maybe_forced(false, distance, ctrl_plane);
1223+
pub fn emit_island(&mut self, ctrl_plane: &mut ControlPlane) {
1224+
self.emit_island_maybe_forced(ForceVeneers::No, IsLastIsland::No, ctrl_plane);
11621225
}
11631226

11641227
/// Same as `emit_island`, but an internal API with a `force_veneers`
11651228
/// argument to force all veneers to always get emitted for debugging.
11661229
fn emit_island_maybe_forced(
11671230
&mut self,
1168-
force_veneers: bool,
1169-
distance: CodeOffset,
1231+
force_veneers: ForceVeneers,
1232+
last_island: IsLastIsland,
11701233
ctrl_plane: &mut ControlPlane,
11711234
) {
11721235
// We're going to purge fixups, so no latest-branch editing can happen
11731236
// anymore.
11741237
self.latest_branches.clear();
11751238

11761239
// Reset internal calculations about islands since we're going to
1177-
// change the calculus as we apply fixups. The `forced_threshold` is
1178-
// used here to determine whether jumps to unknown labels will require
1179-
// a veneer or not.
1180-
let forced_threshold = self.worst_case_end_of_island(distance);
1240+
// change the calculus as we apply fixups.
11811241
self.island_deadline = UNKNOWN_LABEL_OFFSET;
11821242
self.island_worst_case_size = 0;
11831243

@@ -1232,7 +1292,14 @@ impl<I: VCodeInst> MachBuffer<I> {
12321292
self.get_appended_space(size);
12331293
}
12341294

1235-
for fixup in mem::take(&mut self.fixup_records) {
1295+
let last_island_fixups = match last_island {
1296+
IsLastIsland::Yes => mem::take(&mut self.fixup_records_max_range),
1297+
IsLastIsland::No => smallvec![],
1298+
};
1299+
for fixup in mem::take(&mut self.fixup_records)
1300+
.into_iter()
1301+
.chain(last_island_fixups.into_iter())
1302+
{
12361303
trace!("emit_island: fixup {:?}", fixup);
12371304
let MachLabelFixup {
12381305
label,
@@ -1275,7 +1342,8 @@ impl<I: VCodeInst> MachBuffer<I> {
12751342
kind.max_neg_range()
12761343
);
12771344

1278-
if (force_veneers && kind.supports_veneer()) || veneer_required {
1345+
if (force_veneers == ForceVeneers::Yes && kind.supports_veneer()) || veneer_required
1346+
{
12791347
self.emit_veneer(label, offset, kind);
12801348
} else {
12811349
let slice = &mut self.data[start..end];
@@ -1284,21 +1352,43 @@ impl<I: VCodeInst> MachBuffer<I> {
12841352
}
12851353
} else {
12861354
// If the offset of this label is not known at this time then
1287-
// there's one of two possibilities:
1355+
// there are three possibilities:
12881356
//
1289-
// * First we may be about to exceed the maximum jump range of
1290-
// this fixup. In that case a veneer is inserted to buy some
1291-
// more budget for the forward-jump. It's guaranteed that the
1292-
// label will eventually come after where we're at, so we know
1293-
// that the forward jump is necessary.
1357+
// 1. It's possible that the label is already a "max
1358+
// range" label: a veneer would not help us any,
1359+
// and so we need not consider the label during
1360+
// island emission any more until the very end (the
1361+
// last "island" pass). In this case we kick the
1362+
// label into a separate list to process once at
1363+
// the end, to avoid quadratic behavior (see
1364+
// "quadratic island behavior" above, and issue
1365+
// #6798).
12941366
//
1295-
// * Otherwise we're still within range of the forward jump but
1296-
// the precise target isn't known yet. In that case we
1297-
// enqueue the fixup to get processed later.
1298-
if forced_threshold - offset > kind.max_pos_range() {
1299-
self.emit_veneer(label, offset, kind);
1367+
// 2. Or, we may be about to exceed the maximum jump range of
1368+
// this fixup. In that case a veneer is inserted to buy some
1369+
// more budget for the forward-jump. It's guaranteed that the
1370+
// label will eventually come after where we're at, so we know
1371+
// that the forward jump is necessary.
1372+
//
1373+
// 3. Otherwise, we're still within range of the
1374+
// forward jump but the precise target isn't known
1375+
// yet. In that case, to avoid quadratic behavior
1376+
// (again, see above), we emit a veneer and if the
1377+
// resulting label-use fixup is then max-range, we
1378+
// put it in the max-range list. We could enqueue
1379+
// the fixup for processing later, and this would
1380+
// enable slightly fewer veneers, but islands are
1381+
// relatively rare and the cost of "upgrading" all
1382+
// forward label refs that cross an island should
1383+
// be relatively low.
1384+
if !kind.supports_veneer() {
1385+
self.fixup_records_max_range.push(MachLabelFixup {
1386+
label,
1387+
offset,
1388+
kind,
1389+
});
13001390
} else {
1301-
self.use_label_at_offset(offset, label, kind);
1391+
self.emit_veneer(label, offset, kind);
13021392
}
13031393
}
13041394
}
@@ -1346,25 +1436,36 @@ impl<I: VCodeInst> MachBuffer<I> {
13461436
veneer_fixup_off,
13471437
veneer_label_use
13481438
);
1349-
// Register a new use of `label` with our new veneer fixup and offset.
1350-
// This'll recalculate deadlines accordingly and enqueue this fixup to
1351-
// get processed at some later time.
1352-
self.use_label_at_offset(veneer_fixup_off, label, veneer_label_use);
1439+
// Register a new use of `label` with our new veneer fixup and
1440+
// offset. This'll recalculate deadlines accordingly and
1441+
// enqueue this fixup to get processed at some later
1442+
// time. Note that if we now have a max-range, we instead skip
1443+
// the usual fixup list to avoid quadratic behavior.
1444+
if veneer_label_use.supports_veneer() {
1445+
self.use_label_at_offset(veneer_fixup_off, label, veneer_label_use);
1446+
} else {
1447+
self.fixup_records_max_range.push(MachLabelFixup {
1448+
label,
1449+
offset: veneer_fixup_off,
1450+
kind: veneer_label_use,
1451+
});
1452+
}
13531453
}
13541454

13551455
fn finish_emission_maybe_forcing_veneers(
13561456
&mut self,
1357-
force_veneers: bool,
1457+
force_veneers: ForceVeneers,
13581458
ctrl_plane: &mut ControlPlane,
13591459
) {
13601460
while !self.pending_constants.is_empty()
13611461
|| !self.pending_traps.is_empty()
13621462
|| !self.fixup_records.is_empty()
1463+
|| !self.fixup_records_max_range.is_empty()
13631464
{
13641465
// `emit_island()` will emit any pending veneers and constants, and
13651466
// as a side-effect, will also take care of any fixups with resolved
13661467
// labels eagerly.
1367-
self.emit_island_maybe_forced(force_veneers, u32::MAX, ctrl_plane);
1468+
self.emit_island_maybe_forced(force_veneers, IsLastIsland::Yes, ctrl_plane);
13681469
}
13691470

13701471
// Ensure that all labels have been fixed up after the last island is emitted. This is a
@@ -1385,7 +1486,7 @@ impl<I: VCodeInst> MachBuffer<I> {
13851486
// had bound one last label.
13861487
self.optimize_branches(ctrl_plane);
13871488

1388-
self.finish_emission_maybe_forcing_veneers(false, ctrl_plane);
1489+
self.finish_emission_maybe_forcing_veneers(ForceVeneers::No, ctrl_plane);
13891490

13901491
let alignment = self.finish_constants(constants);
13911492

@@ -1734,7 +1835,7 @@ impl MachBranch {
17341835
pub struct MachTextSectionBuilder<I: VCodeInst> {
17351836
buf: MachBuffer<I>,
17361837
next_func: usize,
1737-
force_veneers: bool,
1838+
force_veneers: ForceVeneers,
17381839
}
17391840

17401841
impl<I: VCodeInst> MachTextSectionBuilder<I> {
@@ -1746,7 +1847,7 @@ impl<I: VCodeInst> MachTextSectionBuilder<I> {
17461847
MachTextSectionBuilder {
17471848
buf,
17481849
next_func: 0,
1749-
force_veneers: false,
1850+
force_veneers: ForceVeneers::No,
17501851
}
17511852
}
17521853
}
@@ -1762,9 +1863,9 @@ impl<I: VCodeInst> TextSectionBuilder for MachTextSectionBuilder<I> {
17621863
// Conditionally emit an island if it's necessary to resolve jumps
17631864
// between functions which are too far away.
17641865
let size = func.len() as u32;
1765-
if self.force_veneers || self.buf.island_needed(size) {
1866+
if self.force_veneers == ForceVeneers::Yes || self.buf.island_needed(size) {
17661867
self.buf
1767-
.emit_island_maybe_forced(self.force_veneers, size, ctrl_plane);
1868+
.emit_island_maybe_forced(self.force_veneers, IsLastIsland::No, ctrl_plane);
17681869
}
17691870

17701871
self.buf.align_to(align);
@@ -1796,7 +1897,7 @@ impl<I: VCodeInst> TextSectionBuilder for MachTextSectionBuilder<I> {
17961897
}
17971898

17981899
fn force_veneers(&mut self) {
1799-
self.force_veneers = true;
1900+
self.force_veneers = ForceVeneers::Yes;
18001901
}
18011902

18021903
fn finish(&mut self, ctrl_plane: &mut ControlPlane) -> Vec<u8> {
@@ -1946,7 +2047,7 @@ mod test {
19462047
buf.bind_label(label(1), state.ctrl_plane_mut());
19472048
while buf.cur_offset() < 2000000 {
19482049
if buf.island_needed(0) {
1949-
buf.emit_island(0, state.ctrl_plane_mut());
2050+
buf.emit_island(state.ctrl_plane_mut());
19502051
}
19512052
let inst = Inst::Nop4;
19522053
inst.emit(&[], &mut buf, &info, &mut state);
@@ -1983,9 +2084,15 @@ mod test {
19832084
// before the deadline.
19842085
taken: BranchTarget::ResolvedOffset((1 << 20) - 4 - 20),
19852086

1986-
// This branch is in-range so no veneers should be needed, it should
1987-
// go directly to the target.
1988-
not_taken: BranchTarget::ResolvedOffset(2000000 + 4 - 4),
2087+
// This branch is in-range so no veneers are technically
2088+
// be needed; however because we resolve *all* pending
2089+
// fixups that cross an island when that island occurs, it
2090+
// will have a veneer as well. This veneer comes just
2091+
// after the one above. (Note that because the CondBr has
2092+
// two instructions, the conditinoal and unconditional,
2093+
// this offset is the same, though the veneer is four
2094+
// bytes later.)
2095+
not_taken: BranchTarget::ResolvedOffset((1 << 20) - 4 - 20),
19892096
};
19902097
inst.emit(&[], &mut buf2, &info, &mut state);
19912098

0 commit comments

Comments
 (0)