Require ObjectReference to be inside an object #1170

# Status quo

Currently, MMTk defines several kinds of addresses for an object.

| Name              | Definition                                    | Must be inside object |
|-------------------|-----------------------------------------------|-----------------------|
| starting address  | the return value of `memory_manager::alloc`   | Yes                   |
| ObjectReference   | an address that refers to an object           | No                    |
| in-object address | at a constant offset from ObjectReference, used to access SFT, side metadata, etc. | Yes |
| header address    | address used to access in-object metadata     | Yes                   |

The definition of `ObjectReference` is VM-specific.  We currently allow `ObjectReference` to be outside an object because some VMs do so.  For example, in JikesRVM, an `ObjectReference` is defined as the address of the array payload if the object is an array.  That saves one offset computation for array element access, but when accessing scalar object fields or object headers, the VM has to use negative offsets from the `ObjectReference`.  When we ported MMTk from JikesRVM to Rust, we inherited this type.  `ObjectReference` is now the standard way for mmtk-core to refer to an object.  We still allow `ObjectReference` to be outside an object so that, when loading from a field in JikesRVM, we can directly use the word stored in the field as the `ObjectReference`.
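To illustrate how such a reference can fall outside the object, here is a minimal sketch of a JikesRVM-style array layout. The header size (`HEADER_BYTES`), the position of the length field (`LENGTH_OFFSET`) and all helper names are hypothetical, chosen only for illustration; they are not JikesRVM's actual constants.

```rust
/// Hypothetical header size: the array payload starts this many bytes
/// after the object's starting address.
const HEADER_BYTES: usize = 16;
/// Hypothetical position of the array length, relative to the reference.
const LENGTH_OFFSET: isize = -8;

/// The reference a field would hold: the address of element 0.
fn ref_from_object_start(object_start: usize) -> usize {
    object_start + HEADER_BYTES
}

/// Element access needs nothing beyond the index arithmetic.
fn element_address(reference: usize, index: usize, elem_size: usize) -> usize {
    reference + index * elem_size
}

/// Header data is reached through a negative offset from the reference.
fn length_address(reference: usize) -> usize {
    reference.wrapping_add(LENGTH_OFFSET as usize)
}

/// Whether `addr` lies within [object_start, object_start + object_size).
fn is_inside(addr: usize, object_start: usize, object_size: usize) -> bool {
    addr >= object_start && addr < object_start + object_size
}
```

For a zero-length array the object occupies only its header, so the reference equals the end address of the object and lies outside the object's memory.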

However, because we only map side metadata memory for pages within spaces, addresses outside any space (or in unmapped pages) may not have mapped metadata.  The same is true for SFT entries, which are allocated per chunk.  If we attempt to access metadata or the SFT using an address outside the object, we may get a segmentation fault.  To solve this problem, we require the VM binding to implement `ObjectModel::ref_to_address`, which computes the "in-object address" of an object, an address that must be inside the object.  (https://github.com/mmtk/mmtk-core/pull/699)
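The following sketch shows the failure mode described above, assuming 4 MiB chunks and a hypothetical `IN_OBJECT_ADDRESS_OFFSET` of -16 bytes. `ChunkTable` is a simplified stand-in for the SFT, not mmtk-core's data structure. An object occupying the last 16 bytes of a mapped chunk has a reference that lands in the next (unmapped) chunk, while its in-object address stays inside the mapped chunk.

```rust
use std::collections::HashSet;

/// Assumed chunk size: 4 MiB (2^22 bytes).
const LOG_BYTES_IN_CHUNK: usize = 22;
/// Hypothetical constant offset from a reference to its in-object address.
const IN_OBJECT_ADDRESS_OFFSET: isize = -16;

fn chunk_of(addr: usize) -> usize {
    addr >> LOG_BYTES_IN_CHUNK
}

/// Pure arithmetic; never reads the object.
fn in_object_address(reference: usize) -> usize {
    reference.wrapping_add(IN_OBJECT_ADDRESS_OFFSET as usize)
}

/// Simplified stand-in for the SFT: only chunks that belong to a space have entries.
struct ChunkTable {
    mapped: HashSet<usize>,
}

impl ChunkTable {
    fn with_chunks(chunks: &[usize]) -> Self {
        ChunkTable { mapped: chunks.iter().copied().collect() }
    }

    /// Returns the chunk index if it has an entry; `None` models the
    /// unmapped memory that would otherwise cause a segmentation fault.
    fn lookup(&self, addr: usize) -> Option<usize> {
        let chunk = chunk_of(addr);
        if self.mapped.contains(&chunk) { Some(chunk) } else { None }
    }
}
```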

In addition, VMs that use conservative stack scanning need to read a word from the stack, compute the "in-object address" from it, and check whether the VO bit is set at the "in-object address".  Because we don't know whether a word on the stack is an actual `ObjectReference`, the offset from the `ObjectReference` to the "in-object address" must be a constant (i.e. it must be computable without reading any data from the object body).  (Also in https://github.com/mmtk/mmtk-core/pull/699)
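A minimal sketch of the check a conservative stack scanner performs, assuming a hypothetical constant offset of -16 bytes. `VoBits` is a stand-in for the VO-bit side metadata (modeled as a set of addresses), not mmtk-core's implementation. The point is that the candidate's in-object address is derived by arithmetic alone; the stack word is never dereferenced.

```rust
use std::collections::HashSet;

/// Hypothetical constant offset from a reference to its in-object address.
const IN_OBJECT_ADDRESS_OFFSET: isize = -16;

/// Stand-in for the VO-bit side metadata, modeled as the set of
/// in-object addresses whose VO bit is set.
struct VoBits {
    set: HashSet<usize>,
}

impl VoBits {
    fn from_addrs(addrs: &[usize]) -> Self {
        VoBits { set: addrs.iter().copied().collect() }
    }

    fn is_set(&self, addr: usize) -> bool {
        self.set.contains(&addr)
    }
}

/// Returns `Some(word)` if `word` could be a valid reference.
/// The in-object address is computed by arithmetic only, because `word`
/// may be an arbitrary integer that must not be dereferenced.
fn scan_stack_word(vo: &VoBits, word: usize) -> Option<usize> {
    let in_object = word.wrapping_add(IN_OBJECT_ADDRESS_OFFSET as usize);
    if vo.is_set(in_object) { Some(word) } else { None }
}
```

If the offset depended on data inside the object (for example, a type-specific header size), the scanner would have to read memory through a value that might not be a pointer at all.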

Meanwhile, not all VMs can use "the word stored in the field" as the `ObjectReference`.  In some VMs, the value in a field may be a compressed pointer (OpenJDK), a tagged pointer (V8), an offsetted pointer (Julia), or an indirect handle (Guile, or some old versions of the HotSpot JVM).  We solve this problem by letting the VM binding implement the `Slot` trait and customize its `load` and `store` methods, so that mmtk-core always sees a word-sized, pointer-based `ObjectReference`.  (https://github.com/mmtk/mmtk-core/pull/606)
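As an illustration, a binding for a VM with compressed pointers might decode and encode the field value roughly as follows. The `Slot` trait here is a simplified stand-in (it uses `usize` instead of `ObjectReference` and is not mmtk-core's actual signature), and the heap base and shift are assumed values, not OpenJDK's actual configuration.

```rust
use std::cell::Cell;

/// Simplified stand-in for mmtk-core's `Slot` trait.
trait Slot {
    fn load(&self) -> Option<usize>;
    fn store(&self, object: usize);
}

/// Assumed base of the compressed heap (64-bit target).
const HEAP_BASE: usize = 0x8_0000_0000;
/// Assumed shift: objects are 8-byte aligned, so 3 low bits are implicit.
const SHIFT: u32 = 3;

/// A slot whose field holds a 32-bit compressed pointer; 0 means null.
struct CompressedSlot<'a> {
    field: &'a Cell<u32>,
}

impl Slot for CompressedSlot<'_> {
    /// Decode: widen, shift, and add the heap base.
    fn load(&self) -> Option<usize> {
        let narrow = self.field.get();
        if narrow == 0 {
            None
        } else {
            Some(HEAP_BASE + ((narrow as usize) << SHIFT))
        }
    }

    /// Encode: subtract the heap base and shift back down.
    fn store(&self, object: usize) {
        debug_assert!(object > HEAP_BASE);
        self.field.set(((object - HEAP_BASE) >> SHIFT) as u32);
    }
}
```

mmtk-core only ever sees the decoded full-width address; the compression scheme stays entirely inside the binding.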

Then we implemented an algorithm for finding the last VO bit at or before an interior pointer.  If neither the `ObjectReference` nor the "in-object address" is required to be word-aligned, the algorithm cannot return an exact `ObjectReference`, only an address range in which one of the addresses is a valid `ObjectReference`.  That is confusing and inefficient.  Now we require that `ObjectReference` be word-aligned, while the "in-object address" has no alignment requirement.  This makes it more likely that `ObjectReference` is *not* what is held in an object field, because the VM may use the low bits as tags (V8), making the value misaligned.  But this is not a problem because the VM binding can fix the alignment in `Slot::load` and `Slot::store`.  (https://github.com/mmtk/mmtk-core/pull/1159)
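The following sketch shows why word alignment makes the interior-pointer search exact. It assumes 8-byte words and one VO bit per word (modeled as a `Vec<bool>`); `VoBitmap` and `find_last_object` are hypothetical names, not mmtk-core's API. Because every `ObjectReference` is word-aligned, a set bit maps back to exactly one address. A real implementation would also check that the object found actually covers the interior pointer, for example by consulting its size; that step is omitted here.

```rust
/// Assumed word size (64-bit target).
const BYTES_IN_WORD: usize = 8;

/// One VO bit per word, covering [base, base + bits.len() * BYTES_IN_WORD).
struct VoBitmap {
    base: usize,
    bits: Vec<bool>,
}

impl VoBitmap {
    fn new(base: usize, bytes: usize) -> Self {
        VoBitmap { base, bits: vec![false; bytes / BYTES_IN_WORD] }
    }

    /// Set the VO bit for a word-aligned reference.
    fn set(&mut self, addr: usize) {
        assert_eq!(addr % BYTES_IN_WORD, 0, "ObjectReference must be word-aligned");
        self.bits[(addr - self.base) / BYTES_IN_WORD] = true;
    }

    /// Scan backwards from the word containing `interior` for at most
    /// `max_search_bytes` and return the exact address of the last set bit.
    fn find_last_object(&self, interior: usize, max_search_bytes: usize) -> Option<usize> {
        if interior < self.base {
            return None;
        }
        let start = (interior - self.base) / BYTES_IN_WORD;
        if start >= self.bits.len() {
            return None;
        }
        let limit = max_search_bytes / BYTES_IN_WORD;
        for k in 0..=limit.min(start) {
            let i = start - k;
            if self.bits[i] {
                return Some(self.base + i * BYTES_IN_WORD);
            }
        }
        None
    }
}
```

If the bit instead marked an unaligned in-object address, the same set bit would only tell us that the reference lies somewhere within that word, which is the address range problem described above.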

In conclusion, an `ObjectReference` as required by the current mmtk-core

-   is an address, and
-   must be at a constant offset from the "in-object address" because of conservative stack scanning, and
-   must be word-aligned to support searching for `ObjectReference` from an interior pointer, and
-   may or may not be the thing held in the field.

*p.s. See https://github.com/mmtk/mmtk-core/issues/1044 for the discussion about VMs that store handles instead of object addresses in fields.*


# The problem

mmtk-core doesn't use the raw address of `ObjectReference` except for debugging purposes.  Almost all operations are done with respect to the "in-object address", including `trace_object`, `is_reachable` (via SFT), marking, checking the VO bit (via side metadata), checking whether an object is within a chunk/block, etc.

Meanwhile, `ObjectReference` is not always what is held in a field, either.  It is defined by the VM binding and passed around in mmtk-core, but it has no useful property other than being at a constant offset from the "in-object address".  The only reason for a VM binding to use an address outside an object as `ObjectReference` is "it is what's in a field, and we don't want to waste one subtraction on every field load".  But that reason may not hold either, because if we don't do the subtraction when loading, we need one subtraction at every subsequent call to `ObjectReference::to_address()`.


# Proposal: Require ObjectReference to be inside an object

We can add one more requirement in addition to the alignment requirement:  **`ObjectReference` must be an address inside an object.**

This merges the "in-object address" and `ObjectReference` into a single address.

The benefits are obvious:

-   We can directly use the raw address of `ObjectReference` to access the SFT and side metadata, since it is guaranteed to be inside an object.
-   If a VO bit is set for an address, that address is exactly the `ObjectReference`.  There is no confusion about offsets or alignment.
-   A few constants and methods in `ObjectModel` and `ObjectReference` can be removed, making the API much simpler.
-   The cost of address computation at every `ObjectReference::to_address` is eliminated.

Concretely, we remove `ObjectReference::to_address`, keeping the `to_raw_address`, `to_header` and `to_object_start` methods.  When accessing the SFT or side metadata, we simply use `ObjectReference::to_raw_address`, because that address is guaranteed to be inside the object.

We remove the constant `IN_OBJECT_ADDRESS_OFFSET` and the methods `ObjectReference::to_address` and `ObjectReference::from_address`.  Note that `IN_OBJECT_ADDRESS_OFFSET` is not required to be a multiple of the word size.  Currently, when we set the VO bit for an `ObjectReference`, we may be setting it at an unaligned address, and we need to use the alignment requirement of `ObjectReference` to infer the only possible raw address of the `ObjectReference` given a VO bit.  After removing `IN_OBJECT_ADDRESS_OFFSET`, we set the VO bit exactly at `ObjectReference::to_raw_address`, which is both inside the object and aligned.  There will be no need to deal with alignment requirements.  If the VO bit is set at address `X`, then `ObjectReference::from_raw_address_unchecked(X)` is guaranteed to be a valid `ObjectReference`.
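Under the proposal, setting and querying VO bits becomes a direct round trip through the raw address. The sketch below uses a simplified `ObjectReference` newtype and a hypothetical layout in which the reference points 8 bytes past the object start (and the header sits at the object start); neither mirrors mmtk-core's actual definitions.

```rust
use std::collections::HashSet;

/// Simplified stand-in for `ObjectReference` under the proposal:
/// the raw address is inside the object and word-aligned.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ObjectReference(usize);

/// Hypothetical binding layout: reference = object start + 8.
const REF_OFFSET_FROM_START: usize = 8;

impl ObjectReference {
    fn from_raw_address_unchecked(addr: usize) -> Self {
        ObjectReference(addr)
    }

    fn to_raw_address(self) -> usize {
        self.0
    }

    fn to_object_start(self) -> usize {
        self.0 - REF_OFFSET_FROM_START
    }

    /// Hypothetical: the header begins at the object start.
    fn to_header(self) -> usize {
        self.to_object_start()
    }
}

/// Stand-in for VO-bit side metadata, keyed directly by raw address.
struct VoBits(HashSet<usize>);

impl VoBits {
    fn new() -> Self {
        VoBits(HashSet::new())
    }

    /// No offset and no alignment fix-up: the bit goes at the raw address.
    fn set_for(&mut self, object: ObjectReference) {
        self.0.insert(object.to_raw_address());
    }

    /// A set bit at `addr` means `addr` itself is the reference.
    fn object_at(&self, addr: usize) -> Option<ObjectReference> {
        if self.0.contains(&addr) {
            Some(ObjectReference::from_raw_address_unchecked(addr))
        } else {
            None
        }
    }
}
```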


# Potential risks

## Performance

By unifying `ObjectReference` and the "in-object address", mmtk-core will no longer need to call `ObjectReference::to_address`, which applies an offset whenever the raw address and the "in-object address" differ.  This may improve performance.  However, we then require one subtraction at every `Slot::load` and one addition at every `Slot::store`.  In this sense, we merely move the overhead from `to_address` to `load` and `store`.  We need a performance evaluation to see whether the cost increases or decreases after this change.  Currently, the only VM binding whose `ObjectReference` differs from its "in-object address" is JikesRVM, so we will need test results from JikesRVM.
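The overhead moved into `Slot` can be sketched for a hypothetical binding whose fields hold a VM-level reference 16 bytes past the proposed `ObjectReference`. `OffsetSlot` and `VM_REF_DELTA` are illustrative names, and a zero field is assumed to mean null.

```rust
use std::cell::Cell;

/// Hypothetical distance between the VM-level reference stored in a field
/// and the in-object address that becomes the `ObjectReference`.
const VM_REF_DELTA: usize = 16;

/// A field holding a VM-level reference; 0 is treated as null.
struct OffsetSlot<'a> {
    field: &'a Cell<usize>,
}

impl OffsetSlot<'_> {
    /// One subtraction per load turns the field value into an in-object address.
    fn load(&self) -> Option<usize> {
        let value = self.field.get();
        if value == 0 { None } else { Some(value - VM_REF_DELTA) }
    }

    /// One addition per store restores the VM-level representation.
    fn store(&self, object: usize) {
        self.field.set(object + VM_REF_DELTA);
    }
}
```

Whether this is cheaper overall depends on how often mmtk-core would otherwise have called `to_address` per loaded reference, which is what the proposed evaluation would measure.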

## Engineering

By unifying `ObjectReference` and the "in-object address", mmtk-core will have an easier time mapping a VO bit to its corresponding `ObjectReference`.  But if the VM-level reference value is a pointer outside the object, and such a value can be held on the stack, the conservative stack scanner implemented by the VM will have to compute the "candidate `ObjectReference`" by subtracting an offset from the value on the stack before passing the candidate to `memory_manager::is_mmtk_object`.  That is, if the VM binding doesn't implement the subtraction in `ObjectModel::ref_to_address`, it must implement it in the conservative stack scanner.  This, too, merely shifts complexity from one place to another.  Fortunately, JikesRVM doesn't use conservative stack scanning.  If V8 uses conservative stack scanning, it will always have to mask the stack word for alignment due to https://github.com/mmtk/mmtk-core/pull/1159, regardless of this change.
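A sketch of the extra work a conservative stack scanner would do under the proposal, for a hypothetical VM whose stack values are both offset (by an assumed 16 bytes) and possibly tagged in their low bits. `is_mmtk_object_stub` stands in for a query such as `memory_manager::is_mmtk_object` and does not reproduce that function's signature. A real binding would apply only the adjustments its VM actually needs.

```rust
use std::collections::HashSet;

/// Assumed word size (64-bit target).
const BYTES_IN_WORD: usize = 8;
/// Hypothetical offset between a VM-level reference and its ObjectReference.
const VM_REF_DELTA: usize = 16;

/// Stand-in for an "is this an MMTk object?" query, backed by a set of
/// addresses whose VO bit is set.
fn is_mmtk_object_stub(vo: &HashSet<usize>, addr: usize) -> bool {
    vo.contains(&addr)
}

/// Undo the VM-level offset (arithmetic only, no memory read), then clear
/// the low tag bits so the candidate satisfies word alignment.
fn candidate_from_stack_word(word: usize) -> usize {
    word.wrapping_sub(VM_REF_DELTA) & !(BYTES_IN_WORD - 1)
}

/// Return the candidates that turn out to be real objects.
fn scan_stack(words: &[usize], vo: &HashSet<usize>) -> Vec<usize> {
    words
        .iter()
        .map(|&w| candidate_from_stack_word(w))
        .filter(|&c| is_mmtk_object_stub(vo, c))
        .collect()
}
```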



