Description
By specializing BINARY_OP by reference count as well as by type, we can potentially speed up tier 2 jitted binary operations quite a lot.
It is quite common that one of the operands of BINARY_OP has a reference count of 1, so it is profitable to avoid object allocation and reuse that object. We already do that for the float specializations of BINARY_OP.
The catch is that the resulting code is quite branchy, and thus bulky, which is a problem for the jit.
If we specialize by refcount, the code size shrinks dramatically.
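To make the trade-off concrete, here is a toy Python model of the idea. It is only a sketch: `FloatObj`, `multiply_generic` and `multiply_l1` are hypothetical names, the explicit `refcount` field stands in for the real object header, and freeing an object whose count reaches zero is not modelled. The generic version has to decide at run time which operand, if any, can be reused; the refcount-specialized version assumes a guard has already established that the left operand has a refcount of 1 and the right operand does not.

```python
from dataclasses import dataclass


@dataclass
class FloatObj:
    """Toy stand-in for a float object with an explicit reference count."""
    value: float
    refcount: int = 1


def multiply_generic(left: FloatObj, right: FloatObj) -> FloatObj:
    """Branchy version: pick the operand to reuse at run time."""
    product = left.value * right.value
    if left.refcount == 1:
        # Nobody else can see `left`, so overwrite it in place.
        left.value = product
        right.refcount -= 1
        return left
    if right.refcount == 1:
        right.value = product
        left.refcount -= 1
        return right
    # Both operands are shared: allocate a fresh result.
    left.refcount -= 1
    right.refcount -= 1
    return FloatObj(product)


def multiply_l1(left: FloatObj, right: FloatObj) -> FloatObj:
    """Specialized version: assumes a guard has already checked that
    left.refcount == 1 and right.refcount > 1, so no branches remain."""
    left.value *= right.value
    right.refcount -= 1
    return left


if __name__ == "__main__":
    a, b = FloatObj(2.0, refcount=1), FloatObj(3.0, refcount=2)
    result = multiply_l1(a, b)
    print(result is a, result.value, b.refcount)
```

The specialized function is a straight-line sequence, which is the property that lets the machine code shrink once the guard is hoisted or removed.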
For example, the x86-64 code size for _BINARY_OP_MULTIPLY_FLOAT is 232 bytes.
If we specialize for the left hand side having a refcount of 1, the equivalent instruction is 143 bytes*, including a guard for the refcount.
If we assume that the optimizer can remove the guard, then the size drops to 74 bytes.
If we also specialize for the right hand side having a refcount > 1, then the code size (without guards) drops to 41 bytes, less than 20% of the original code size.
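The relative sizes quoted above can be checked with a few lines of arithmetic. The byte counts are the ones given in this issue; the dictionary keys and the `relative_size` helper are just illustrative labels.

```python
BASELINE_BYTES = 232  # _BINARY_OP_MULTIPLY_FLOAT as measured above

# Code sizes in bytes for each variant mentioned in the text.
variants = {
    "left refcount 1, with guard": 143,
    "left refcount 1, guard removed": 74,
    "left refcount 1 and right refcount > 1, no guards": 41,
}


def relative_size(size: int, baseline: int = BASELINE_BYTES) -> float:
    """Return `size` as a fraction of the baseline code size."""
    return size / baseline


if __name__ == "__main__":
    for name, size in variants.items():
        print(f"{name}: {size} bytes, {relative_size(size):.1%} of baseline")
```

The last variant comes out at roughly 17.7% of the baseline, consistent with the "less than 20%" figure.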
Pyston performs this optimization. It is, I believe, the only specialization that Pyston does that we don't.
There isn't much room for more tier 1 instructions, but by combining this approach with #660 we can implement this without excessive tier 1 specialization.
Currently, BINARY_OP specializes to one of:
- BINARY_OP_MULTIPLY_INT
- BINARY_OP_ADD_INT
- BINARY_OP_SUBTRACT_INT
- BINARY_OP_MULTIPLY_FLOAT
- BINARY_OP_ADD_FLOAT
- BINARY_OP_SUBTRACT_FLOAT
- BINARY_OP_ADD_UNICODE
We can replace these with:
- BINARY_OP_TABLE
- BINARY_OP_INT
- BINARY_OP_TABLE_L1
- BINARY_OP_INT_L1
- BINARY_OP_TABLE_R1
- BINARY_OP_INT_R1
The L1 and R1 suffixes mean "left has a refcount of one" and "right has a refcount of one", respectively.
int is still worth treating separately because it is both common and more awkward due to the caching of small values.
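As a rough illustration of how the six proposed instructions partition the cases, the sketch below picks a name from the operand types and refcounts. The selection policy is an assumption made for illustration, not something specified here: it prefers L1 over R1 when both operands have a refcount of one, it sends every non-int pair to the BINARY_OP_TABLE family, and it ignores whether the table actually has an entry for the types involved. `choose_specialization` is a hypothetical helper.

```python
def choose_specialization(
    both_int: bool, left_refcount: int, right_refcount: int
) -> str:
    """Pick one of the six proposed instruction names (assumed policy)."""
    base = "BINARY_OP_INT" if both_int else "BINARY_OP_TABLE"
    if left_refcount == 1:
        return base + "_L1"  # left operand can be reused
    if right_refcount == 1:
        return base + "_R1"  # right operand can be reused
    return base  # neither operand is uniquely referenced


if __name__ == "__main__":
    print(choose_specialization(both_int=False, left_refcount=1, right_refcount=3))
    print(choose_specialization(both_int=True, left_refcount=2, right_refcount=1))
```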
(ignoring BINARY_OP_INPLACE_ADD_UNICODE as a special case)
*This figure will go down once we have hot/cold splitting.