perf(dipu): faster aten::mul in cuda & muxi #855

Wrench-Git · 2024-06-28T09:05:20Z

This change removes a redundant transformation (mul_tensor -> mul_scalar -> mul_tensor) and adds a faster BinaryOpInferrer.
With it, aten::mul is almost as fast as torch in average CPU time.
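
As a rough illustration of the round trip mentioned in the description, here is a toy sketch in plain Python. Everything in it (`ToyTensor`, `trace`, `mul_tensor_old`, `mul_scalar_old`, `mul_tensor_new`) is a hypothetical stand-in, not DIPU or DIOPI code, and the real dispatch in OpUtils.hpp and the generated wrappers may look quite different. It only shows why calling the tensor kernel directly avoids two extra hops when the second operand is a zero-dimensional tensor.

```python
from dataclasses import dataclass


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a tensor: flat values plus a shape tuple."""
    data: list
    shape: tuple

    def is_scalar(self):
        # A zero-dimensional tensor has an empty shape.
        return self.shape == ()


# Records which (toy) operator entry points were visited, in order.
trace = []


def _kernel(a, b):
    """Toy elementwise multiply; assumes b is 0-dim or has a's shape."""
    if b.is_scalar():
        return ToyTensor([x * b.data[0] for x in a.data], a.shape)
    return ToyTensor([x * y for x, y in zip(a.data, b.data)], a.shape)


def mul_scalar_old(a, s):
    """Assumed old scalar path: re-wraps the scalar and re-enters mul_tensor."""
    trace.append("mul_scalar")
    trace.append("mul_tensor")
    return _kernel(a, ToyTensor([s], ()))


def mul_tensor_old(a, b):
    """Assumed old tensor path: detours through mul_scalar for 0-dim operands."""
    trace.append("mul_tensor")
    if b.is_scalar():
        return mul_scalar_old(a, b.data[0])
    return _kernel(a, b)


def mul_tensor_new(a, b):
    """Sketch of the shortened path: go straight to the tensor kernel."""
    trace.append("mul_tensor")
    return _kernel(a, b)
```

In this sketch both paths return the same result; the old one just visits three entry points where the new one visits a single one.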

dipu/torch_dipu/csrc_dipu/aten/ops/OpUtils.hpp

dipu/scripts/autogen_diopi_wrapper/autogen_diopi_wrapper.py

dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml

* faster aten::mul in cuda
* improve the code format
* loose the check for scalar tensor
* improve the code
* let the logic of mul diff on different devices.
* Update autogen_diopi_wrapper.py
* Update diopi_functions.yaml
* Update OpUtils.hpp

update ../../impl/ascend/device_configs.py

Wrench-Git requested review from mrdanielw, lljbash and fandaoyi as code owners June 28, 2024 09:05

Wrench-Git requested a review from zhaoguochun1995 as a code owner July 10, 2024 11:55

Wrench-Git added the enhancement (New feature or request) and DIPU (DIPU related) labels Jul 11, 2024

Wrench-Git force-pushed the faster_mul branch 2 times, most recently from a51cfbb to ad2e7e0 Compare July 11, 2024 05:40

fandaoyi reviewed Jul 12, 2024

dipu/torch_dipu/csrc_dipu/aten/ops/OpUtils.hpp (outdated, resolved)

dipu/scripts/autogen_diopi_wrapper/autogen_diopi_wrapper.py (outdated, resolved)

dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml (resolved)

fandaoyi requested a review from yangbofun July 12, 2024 10:45

Wrench-Git added 5 commits July 15, 2024 15:17

faster aten::mul in cuda

1d43225

improve the code format

e4700be

loose the check for scalar tensor

169ab77

improve the code

5140aed

let the logic of mul diff on different devices.

ae77ca8

Wrench-Git force-pushed the faster_mul branch from 926e63d to ae77ca8 Compare July 15, 2024 07:18

fandaoyi reviewed Jul 15, 2024

dipu/scripts/autogen_diopi_wrapper/autogen_diopi_wrapper.py (outdated, resolved)

Wrench-Git force-pushed the faster_mul branch from 0471ed7 to ae77ca8 Compare July 15, 2024 12:10

Wrench-Git added 3 commits July 15, 2024 20:30

Update autogen_diopi_wrapper.py

0963fa8

Update diopi_functions.yaml

915ac27

Update OpUtils.hpp

2995ee2

fandaoyi approved these changes Jul 15, 2024

lljbash approved these changes Jul 16, 2024

lljbash changed the title ~~faster aten::mul in cuda~~ perf(dipu): faster aten::mul in cuda & muxi Jul 16, 2024

mrdanielw merged commit 8738b0c into DeepLink-org:main Jul 16, 2024
26 of 29 checks passed

zhangzefeng92 pushed a commit to DeepLink-org/deeplink.framework.dev that referenced this pull request Jul 18, 2024

zq/update device_configs.py (DeepLink-org#855)

20ce541

update ../../impl/ascend/device_configs.py
