Skip to content

Fix bf16/fp16 accuracy issue in sparsecsr addmm#3273

Open
jenniew wants to merge 21 commits intointel:mainfrom
jenniew:sparsecsr_mm
Open

Fix bf16/fp16 accuracy issue in sparsecsr addmm#3273
jenniew wants to merge 21 commits intointel:mainfrom
jenniew:sparsecsr_mm

Conversation

@jenniew
Copy link
Copy Markdown
Contributor

@jenniew jenniew commented Apr 6, 2026

Fix bf16/fp16 accuracy issue in sparsecsr addmm.
Related issue: #3177

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets the reported bf16/fp16 accuracy gap in XPU SparseCSR addmm by changing how the beta * input term is applied during dense fallback computation.

Changes:

  • Updates addmm_calculation to scale input by beta via an explicit multiply before accumulating into the mm result.
  • Removes use of the fused add_(tensor, alpha=beta) form in favor of add_(tensor * beta).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if (beta.toComplexDouble() != 0.) {
Tensor input_dense = input.layout() != kStrided ? input.to_dense() : input;
result_dense.add_(input_dense, beta);
result_dense.add_(input_dense * beta);
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result_dense.add_(input_dense * beta) introduces an extra elementwise multiply + a temporary tensor allocation. This is typically both slower and less numerically accurate for bf16/fp16 than the fused add_(input_dense, /*alpha=*/beta) (single kernel, often FMA). If the intent is to improve bf16/fp16 accuracy, consider keeping the fused add_ form and fixing precision/casting in the underlying XPU add kernel (or doing the accumulation in fp32) rather than splitting into mul+add here.

Suggested change
result_dense.add_(input_dense * beta);
result_dense.add_(input_dense, beta);

Copilot uses AI. Check for mistakes.
Comment on lines -86 to +85
result_dense.add_(input_dense, beta);
result_dense.add_(input_dense * beta);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the difference between these two calling methods?

@CuiYifeng CuiYifeng requested a review from chunhuanMeng April 7, 2026 07:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants