Fix bf16/fp16 accuracy issue in sparsecsr addmm #3273
jenniew wants to merge 21 commits into intel:main from
Conversation
Pull request overview
This PR targets the reported bf16/fp16 accuracy gap in XPU SparseCSR addmm by changing how the beta * input term is applied during dense fallback computation.
Changes:
- Updates `addmm_calculation` to scale `input` by `beta` via an explicit multiply before accumulating into the `mm` result.
- Removes use of the fused `add_(tensor, alpha=beta)` form in favor of `add_(tensor * beta)`.
```diff
  if (beta.toComplexDouble() != 0.) {
    Tensor input_dense = input.layout() != kStrided ? input.to_dense() : input;
-   result_dense.add_(input_dense, beta);
+   result_dense.add_(input_dense * beta);
```
result_dense.add_(input_dense * beta) introduces an extra elementwise multiply + a temporary tensor allocation. This is typically both slower and less numerically accurate for bf16/fp16 than the fused add_(input_dense, /*alpha=*/beta) (single kernel, often FMA). If the intent is to improve bf16/fp16 accuracy, consider keeping the fused add_ form and fixing precision/casting in the underlying XPU add kernel (or doing the accumulation in fp32) rather than splitting into mul+add here.
```diff
-   result_dense.add_(input_dense * beta);
+   result_dense.add_(input_dense, beta);
```
```diff
-   result_dense.add_(input_dense, beta);
+   result_dense.add_(input_dense * beta);
```
Could you explain the difference between these two call forms?
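The difference being asked about is one rounding step versus two. A minimal numpy sketch of the rounding behavior (illustrative only, not the actual XPU kernel; the values `beta`, `acc`, and `x` are contrived so the divergence is visible at fp16, and it assumes the fused form accumulates in higher "opmath" precision, as the review comment above suggests):

```python
import numpy as np

def r16(x: float) -> float:
    """Round a Python float to the nearest fp16 value (round-half-to-even)."""
    return float(np.float16(x))

# Contrived values: beta = 9811/32768 is exactly representable in float64.
beta = 0.299407958984375
acc = r16(1.0)   # stands in for an element of result_dense
x = r16(1.0)     # stands in for an element of input_dense

# add_(input_dense, alpha=beta): the fused kernel can evaluate
# acc + beta*x in higher precision and round to fp16 once.
fused = r16(acc + beta * x)

# add_(input_dense * beta): the multiply materializes an fp16 temporary,
# so the value is rounded twice before reaching the accumulator.
split = r16(acc + r16(beta * x))

print(fused)  # 1.2998046875  (one rounding; closer to exact 1.299407...)
print(split)  # 1.298828125   (two roundings)
```

With these inputs the fused, single-rounding result is strictly closer to the exact value, which is why splitting into mul+add does not by itself improve bf16/fp16 accuracy.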
Fix bf16/fp16 accuracy issue in sparsecsr addmm.
Related issue: #3177