Consider SonicMOE kernel comparison for potentially better IO/activation memory management #2709

@Skylion007

Description

Is your feature request related to a problem? Please describe.
SonicMOE provides open-source kernels for fine-grained MOE that claim to significantly reduce memory use and improve throughput on Hopper. We should see which of these kernels can be adapted for use in TransformerEngine to improve speed and memory usage: https://arxiv.org/abs/2512.14080. The paper claims up to a 45% reduction in activation memory and up to a 1.86X improvement in compute throughput on Hopper.

Describe the solution you'd like
Benchmark the kernels and see if there are any easy gains to be had in TransformerEngine, particularly around components such as the FusedRouter.
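To make the comparison concrete, here is a rough sketch of the kind of harness I have in mind. It is stdlib-only so it runs anywhere: it times two zero-argument callables and records peak host memory with `tracemalloc`. `baseline_step` and `candidate_step` are hypothetical stand-ins, not TransformerEngine or SonicMOE APIs. For real GPU kernels the timing would need CUDA events plus synchronization, and the memory figure would come from something like `torch.cuda.max_memory_allocated()` instead.

```python
import statistics
import time
import tracemalloc
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 3, iters: int = 10) -> Dict[str, float]:
    """Return the median wall-clock time and peak traced host memory of fn."""
    for _ in range(warmup):
        fn()

    # Timing pass, run without tracemalloc so tracing overhead does not skew it.
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)

    # Separate memory pass: a single traced call to capture peak allocation.
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {"median_s": statistics.median(times), "peak_bytes": float(peak)}


def compare(baseline: Dict[str, float], candidate: Dict[str, float]) -> Dict[str, float]:
    """Express the candidate relative to the baseline."""
    return {
        "speedup": baseline["median_s"] / candidate["median_s"],
        "memory_reduction": 1.0 - candidate["peak_bytes"] / baseline["peak_bytes"],
    }


# Hypothetical stand-in workloads (assumptions for illustration only).
def baseline_step() -> float:
    # Materializes a large intermediate list, like an unfused op would.
    activations = [float(i) * 0.5 for i in range(200_000)]
    return sum(activations)


def candidate_step() -> float:
    # Streams the same computation without storing the intermediate.
    return sum(float(i) * 0.5 for i in range(200_000))


if __name__ == "__main__":
    result = compare(benchmark(baseline_step), benchmark(candidate_step))
    print(f"speedup: {result['speedup']:.2f}x")
    print(f"memory reduction: {result['memory_reduction']:.1%}")
```

Reporting both ratios side by side mirrors how the paper states its claims (activation memory reduction and throughput improvement), so results would be directly comparable to the 45% and 1.86X figures.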

