Skip to content
This repository was archived by the owner on Dec 22, 2021. It is now read-only.
This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Common Subexpression Elimination #403

Open
@omnisip

Description

@omnisip

Hi everyone,

@abrown asked me to look into optimizations for the extended multiplication in #376 that would reduce the number of shuffles before executing pmuldq (The proposed optimization is here). In the midst of doing research and analyzing the code, it became clear that for every wasmsimd instruction that required a sequence of instructions to implement, there would be redundant operations if there was more than one operation using the same parameters. In the xnnpack case, the multiplier is reused in each of the subsequent calculations. As a side effect, pshufd can be called repeatedly on the multiplier even if the required intermediate was already generated as part of a previous operation.

What if we implemented something like an LRU on register outputs for every instruction that comes through? Something like this has to exist on some level since the compiler has to keep track of register allocations before it gets to instruction generation. But what if we extended this mechanism such that for every instruction that is assembled, it first checks to see if a previous temporary register with the output of that instruction is set and is valid?

The benefit of such optimization could mean tremendously better code generation in cases like shuffles, swizzles, widen_high/low, multiplications and any other scenario that we thought might be appropriate for multi-return.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions