Status quo
Currently, MMTk defines several addresses of an object.
Name | Definition | Must be inside object |
---|---|---|
starting address | the return value of `memory_manager::alloc` | Yes |
`ObjectReference` | an address that refers to an object | No |
in-object address | at a constant offset from `ObjectReference`, used to access SFT, side metadata, etc. | Yes |
header address | address used to access in-object metadata | Yes |
The definition of `ObjectReference` is VM-specific. We currently allow `ObjectReference` to be outside an object because some VMs do so. For example, in JikesRVM, an `ObjectReference` is defined as the address of the array payload of an object if the object is an array. That saves one offset computation for array element access, but when accessing scalar object fields or object headers, the VM has to use a negative offset from the `ObjectReference`. When we ported MMTk from JikesRVM to Rust, we inherited this type. `ObjectReference` is now the standard way for mmtk-core to refer to an object. We still allow `ObjectReference` to be outside an object so that when loading from a field in JikesRVM, we can directly use the word stored in the field as the `ObjectReference`.
However, because we only map side metadata memory for pages within spaces, addresses outside any space (or in unmapped pages) may not have mapped metadata. The same is true for SFT entries, which are allocated per chunk. If we attempt to access metadata or the SFT using an address outside the object, it may cause a segmentation fault. To solve this problem, we require the VM binding to implement `ObjectModel::ref_to_address`, which computes the "in-object address" of an object. That address must be inside the object. (#699)
Meanwhile, VMs that use conservative stack scanning need to read a word from the stack, compute the "in-object address" from it, and check whether the VO bit is set at that address. Because we don't know whether a word on the stack is an actual `ObjectReference`, the offset from the `ObjectReference` to the "in-object address" must be a constant (i.e. it can be computed without reading any data from the object body). (Also in #699)
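The check described above can be sketched as follows. This is a toy model, not mmtk-core code: `VoBits` is a hypothetical stand-in for the VO-bit side metadata, and the offset value `-8` is an arbitrary assumption. The point it illustrates is that the "in-object address" is derived from the stack word alone, without dereferencing it.

```rust
use std::collections::HashSet;

/// Assumed constant offset (in bytes) from a reference to its
/// "in-object address". The value -8 is arbitrary, for illustration only.
const IN_OBJECT_ADDRESS_OFFSET: isize = -8;

/// Hypothetical stand-in for the VO-bit side metadata: the set of
/// "in-object addresses" whose VO bit is currently set.
struct VoBits(HashSet<usize>);

impl VoBits {
    fn is_set(&self, addr: usize) -> bool {
        self.0.contains(&addr)
    }
}

/// Compute the "in-object address" from a raw word using arithmetic only.
/// The word is never dereferenced, so this is safe for arbitrary stack words.
fn in_object_address(word: usize) -> usize {
    (word as isize).wrapping_add(IN_OBJECT_ADDRESS_OFFSET) as usize
}

/// Conservative check: does this stack word look like an object reference?
fn is_object_candidate(vo_bits: &VoBits, word: usize) -> bool {
    vo_bits.is_set(in_object_address(word))
}
```

If the offset depended on data in the object (for example, a header size read from memory), `in_object_address` would have to read through a word that may not be a pointer at all.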
Meanwhile, not all VMs can use "the word stored in the field" as the `ObjectReference`. In some VMs, the thing in a field may be a compressed pointer (OpenJDK), a tagged pointer (V8), an offset pointer (Julia), or an indirect handle (Guile or some old versions of the HotSpot JVM). We solve this problem by letting the VM binding implement the `Slot` trait and customize its `load` and `store` methods so that we always present a word-sized, pointer-based `ObjectReference` to mmtk-core. (#606)
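As a rough illustration of the idea, here is a minimal sketch of a slot that hides a low-bit tag from the collector. `ToySlot`, `ObjRef`, `TaggedSlot` and `TAG_MASK` are hypothetical names invented for this example; the real `Slot` trait in mmtk-core has a different and larger interface.

```rust
/// Toy stand-in for an object reference: a non-zero, word-sized address.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ObjRef(usize);

/// Hypothetical, simplified slot interface (not the real mmtk-core trait).
trait ToySlot {
    fn load(&self) -> Option<ObjRef>;
    fn store(&mut self, object: ObjRef);
}

/// Assumed tag layout: the two lowest bits of a field hold a tag.
const TAG_MASK: usize = 0b11;

/// A field that stores a tagged pointer.
struct TaggedSlot {
    raw: usize,
}

impl ToySlot for TaggedSlot {
    /// Strip the tag so the collector only ever sees a plain address.
    fn load(&self) -> Option<ObjRef> {
        let addr = self.raw & !TAG_MASK;
        if addr == 0 {
            None
        } else {
            Some(ObjRef(addr))
        }
    }

    /// Write the (possibly moved) object back, preserving the existing tag.
    fn store(&mut self, object: ObjRef) {
        let tag = self.raw & TAG_MASK;
        self.raw = object.0 | tag;
    }
}
```

With this shape, the collector never needs to know how the VM encodes references in fields.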
Then we implemented an algorithm for finding the last VO bit from an interior pointer. If neither the `ObjectReference` nor the "in-object address" is required to be word-aligned, the algorithm cannot return an exact `ObjectReference`, only an address range in which one of the addresses is a valid `ObjectReference`. That's confusing and inefficient. Now we require that `ObjectReference` be word-aligned, while the "in-object address" has no alignment requirement. This makes `ObjectReference` even less likely to be what's held in an object field, because the VM may use the low bits as tags (V8), making the value misaligned. But this is not a problem because the VM binding can fix the alignment in `Slot::load` and `Slot::store`. (#1159)
In conclusion, an `ObjectReference` as required by the current mmtk-core
- is an address, and
- must be at a constant offset from the "in-object address" because of conservative stack scanning, and
- must be word-aligned to support searching for the `ObjectReference` from an interior pointer, and
- may or may not be the thing held in the field.
p.s. See #1044 for the discussion about VMs that store handles instead of object addresses in fields.
The problem
mmtk-core doesn't use the raw address of `ObjectReference` except for debugging purposes. Almost all operations are done w.r.t. the "in-object address", including `trace_object`, `is_reachable` (via SFT), marking, checking the VO bit (via side metadata), checking if an object is within a chunk/block, etc.
Meanwhile, `ObjectReference` is not always what's in a field, either. It is something defined by the VM binding and passed around in mmtk-core, but it has no useful properties except being at a constant offset from an "in-object address". The only reason for a VM binding to use an address outside an object as `ObjectReference` is "it is what's in a field, and we don't want to waste one subtraction for every field load". But that reason may not hold either, because if we don't do the subtraction when loading, we need one subtraction at every subsequent `ObjectReference::to_address()`.
Proposal: Require ObjectReference to be inside an object
We can add one more requirement in addition to the alignment requirement: `ObjectReference` must be an address inside an object.
That merges the "in-object address" and `ObjectReference`.
The benefits are obvious:
- We directly use the raw address of `ObjectReference` to access SFT and side metadata since it's guaranteed to be inside an object.
- If a VO bit is set for an address, it will be the exact address for the `ObjectReference`. There is no confusion about the offset or alignment.
- We remove a few constants and methods from `ObjectModel` and `ObjectReference`. The API will be much simpler.
- We remove the cost of address computation at every `ObjectReference::to_address`.
Concretely, we remove `ObjectReference::to_address`, keeping the `to_raw_address`, `to_header` and `to_object_start` methods. When accessing SFT or side metadata, we simply use `ObjectReference::to_raw_address` because it is guaranteed to be inside the object.
We remove the constant `IN_OBJECT_ADDRESS_OFFSET` and the methods `ObjectReference::to_address` and `ObjectReference::from_address`. Note that `IN_OBJECT_ADDRESS_OFFSET` is not required to be a multiple of the word size. Currently, when we set a VO bit from an `ObjectReference`, we may be setting the VO bit at an unaligned address, and we need to use the alignment requirement of `ObjectReference` to infer the only possible raw address of the `ObjectReference` given a VO bit. After removing `IN_OBJECT_ADDRESS_OFFSET`, we set the VO bit exactly at `ObjectReference::to_raw_address`. It will be both inside the object and aligned. There will be no need to mess with the alignment requirements. If the VO bit is set at address `X`, then `ObjectReference::from_raw_address_unchecked(X)` is guaranteed to be a valid `ObjectReference`.
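To make the last point concrete, here is a minimal sketch of an interior-pointer search under the proposed rule. It is not the mmtk-core algorithm: the VO bits are modeled as a `BTreeSet` of addresses, and `find_last_object_ref` and `max_search_bytes` are hypothetical names. It also omits the check that the interior pointer actually falls within the found object's size.

```rust
use std::collections::BTreeSet;

/// Word size in bytes on the current platform.
const WORD: usize = std::mem::size_of::<usize>();

/// Toy model: `vo_bits` holds every address whose VO bit is set. Under the
/// proposed rule each such address is itself a valid, word-aligned object
/// reference, so the search result needs no offset or alignment fix-up.
fn find_last_object_ref(
    vo_bits: &BTreeSet<usize>,
    interior: usize,
    max_search_bytes: usize,
) -> Option<usize> {
    // Round the interior pointer down to a word boundary.
    let start = interior & !(WORD - 1);
    // Do not search further back than the caller allows.
    let lowest = start.saturating_sub(max_search_bytes);
    // The highest set bit at or below `start` is the candidate reference.
    vo_bits.range(lowest..=start).next_back().copied()
}
```

The returned address can be used as the reference directly, which is the simplification the proposal is after.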
Potential risks
Performance
By unifying `ObjectReference` and the "in-object address", mmtk-core will no longer need to call `ObjectReference::to_address` when there is an offset between the raw address and the "in-object address". This could improve performance. However, we then require one subtraction at every `Slot::load` and one addition at every `Slot::store`. In this sense, we merely move the overhead from `to_address` to `load` and `store`. We need a performance evaluation to see whether the cost increases or decreases after this change. Currently the only VM binding that has different `ObjectReference` and "in-object address" is JikesRVM. We'll need some test results from JikesRVM.
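The shifted cost can be pictured with a small sketch. `OffsetSlot` and `PAYLOAD_OFFSET` are hypothetical, and the value 16 is an arbitrary assumption rather than anything taken from JikesRVM. The field keeps holding the VM-level pointer, while the collector only sees the in-object reference.

```rust
/// Assumed distance (in bytes) between the in-object reference and the
/// VM-level pointer stored in fields. The value 16 is for illustration only.
const PAYLOAD_OFFSET: usize = 16;

/// A field that holds the VM-level pointer, which may lie outside the object.
struct OffsetSlot {
    raw: usize,
}

impl OffsetSlot {
    /// One subtraction per load: convert the field value to the reference.
    fn load(&self) -> Option<usize> {
        if self.raw == 0 {
            return None; // null field
        }
        Some(self.raw.wrapping_sub(PAYLOAD_OFFSET))
    }

    /// One addition per store: convert the reference back to the field value.
    fn store(&mut self, object: usize) {
        self.raw = object + PAYLOAD_OFFSET;
    }
}
```

Whether these two operations cost more or less than the current per-use `to_address` computation is exactly what the evaluation would need to measure.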
Engineering
By unifying `ObjectReference` and the "in-object address", mmtk-core will have an easier time mapping a VO bit to its corresponding `ObjectReference`. But if the VM-level reference value is a pointer outside the object, and such a value can be held on the stack, the conservative stack scanner implemented by the VM will have to compute the "candidate `ObjectReference`" by subtracting a constant from the value on the stack before passing the candidate to `memory_manager::is_mmtk_object`. That means, if the VM binding doesn't implement the subtraction in `ObjectModel::ref_to_address`, it must implement it in the conservative stack scanner. That also shifts the complexity from one place to another. Fortunately, JikesRVM doesn't use conservative stack scanning. If V8 uses conservative stack scanning, it will always have to mask the stack word for alignment due to #1159, regardless of this change.
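A binding-side scanner of that kind might look roughly like the sketch below. Everything here is an assumption made for illustration: `VM_REF_OFFSET` is an invented constant, and the `HashSet` lookup merely stands in for a call such as `memory_manager::is_mmtk_object`, whose real signature is not reproduced here.

```rust
use std::collections::HashSet;

/// Assumed distance (in bytes) between the VM-level pointer and the
/// in-object reference. The value 16 is for illustration only.
const VM_REF_OFFSET: usize = 16;

/// Mask selecting the low bits that must be zero in a word-aligned address.
const LOW_BITS: usize = std::mem::size_of::<usize>() - 1;

/// Scan raw stack words and return those that map to a live object.
/// `vo_bits` is a toy stand-in for asking mmtk-core whether an address
/// is a valid object reference.
fn scan_stack(stack: &[usize], vo_bits: &HashSet<usize>) -> Vec<usize> {
    let mut found = Vec::new();
    for &word in stack {
        // The binding, not mmtk-core, undoes the VM-level offset.
        let candidate = match word.checked_sub(VM_REF_OFFSET) {
            Some(c) => c,
            None => continue, // too small to be a pointer
        };
        // Unaligned values cannot be object references.
        if candidate & LOW_BITS != 0 {
            continue;
        }
        if vo_bits.contains(&candidate) {
            found.push(candidate);
        }
    }
    found
}
```

The subtraction is the same constant-offset computation as before; it has only moved from mmtk-core into the binding's scanner.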