Add native XPU SVD implementation using oneMKL gesvd#3264

Open
PatrykWilczewski wants to merge 8 commits intointel:mainfrom
PatrykWilczewski:dev/pmerchex/fix_xpu_svd

Conversation

@PatrykWilczewski PatrykWilczewski commented Apr 3, 2026

Implement native _linalg_svd for XPU using oneMKL's gesvd USM API, removing the CPU fallback that caused an extra warning.

Previously, _linalg_svd.U was in the XPU fallback list, so every SVD call on XPU transferred data to the CPU and back and emitted a fallback warning. This caused test_cond_errors_and_warnings to fail: torch.linalg.cond() internally calls linalg_svdvals() -> _linalg_svd, which produced an unexpected second warning alongside the expected resize warning.
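For reference, with the default norm linalg.cond computes the 2-norm condition number, the ratio of the largest to the smallest singular value, which is why it reaches _linalg_svd through linalg_svdvals(). The sketch below shows that relationship on the CPU; NumPy stands in for the XPU kernel, and cond_via_svdvals is a hypothetical helper, not a PyTorch API.

```python
import numpy as np


def cond_via_svdvals(a: np.ndarray) -> np.ndarray:
    """2-norm condition number of each matrix in a (..., m, n) batch."""
    # Singular values only, analogous to linalg_svdvals (no U or Vh).
    # NumPy returns them sorted in descending order along the last axis.
    s = np.linalg.svd(a, compute_uv=False)
    # Largest singular value divided by the smallest.
    return s[..., 0] / s[..., -1]


rng = np.random.default_rng(0)
a = rng.standard_normal((3, 4, 4))
print(np.allclose(cond_via_svdvals(a), np.linalg.cond(a)))  # True
```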

  • Add apply_svd_mkl<>() template calling oneapi::mkl::lapack::gesvd with a batch loop, in mkl/BatchLinearAlgebra.cpp
  • Add the svd_mkl() declaration in mkl/BatchLinearAlgebra.h
  • Register svd_stub for XPU via svd_kernel_xpu() in BatchLinearAlgebra.cpp (with a CPU fallback when USE_ONEMKL_XPU is not defined)
  • Remove "_linalg_svd.U" from XPUFallback.template
  • Add _linalg_svd dispatch entries for XPU in yaml/native/native_functions.yaml
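The body of apply_svd_mkl<>() is not shown on this page. The sketch below only illustrates the general "batch loop" pattern: each matrix of a (..., m, n) input is decomposed independently and written into preallocated U, S, and Vh outputs. NumPy's single-matrix np.linalg.svd stands in for one gesvd call, and batched_svd_loop is a hypothetical name.

```python
import numpy as np


def batched_svd_loop(a: np.ndarray, full_matrices: bool = False):
    """Decompose each matrix of a (..., m, n) batch one at a time.

    Hypothetical sketch: np.linalg.svd on a single matrix stands in for
    one oneMKL gesvd call inside the batch loop.
    """
    *batch, m, n = a.shape
    k = min(m, n)
    u_cols = m if full_matrices else k   # columns of U
    vh_rows = n if full_matrices else k  # rows of Vh

    flat = a.reshape(-1, m, n)  # collapse all batch dimensions into one
    count = flat.shape[0]

    # Preallocate outputs; singular values are always real.
    u = np.empty((count, m, u_cols), dtype=a.dtype)
    s = np.empty((count, k), dtype=a.real.dtype)
    vh = np.empty((count, vh_rows, n), dtype=a.dtype)

    for i in range(count):
        # One independent decomposition per matrix in the batch.
        u[i], s[i], vh[i] = np.linalg.svd(flat[i], full_matrices=full_matrices)

    # Restore the original batch dimensions.
    return (
        u.reshape(*batch, m, u_cols),
        s.reshape(*batch, k),
        vh.reshape(*batch, vh_rows, n),
    )


rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3, 5, 4)).astype(np.float32)
u, s, vh = batched_svd_loop(a)
print(u.shape, s.shape, vh.shape)  # (2, 3, 5, 4) (2, 3, 4) (2, 3, 4, 4)
```

Because singular vectors are only determined up to sign (or a unit phase for complex inputs), checks on such a kernel are usually written against the reconstruction U·diag(S)·Vh or against the singular values, not against U and Vh directly.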

This PR also includes a targeted test-input stabilization for SVD-dependent OpInfo coverage on XPU:

  • For XPU float32 only, OpInfo sample generation for linalg.cond is patched to use well-conditioned matrices.
  • A matching patch is applied for linalg.det in the XPU functorch tests.

linalg.cond relies internally on SVD. With unconstrained random float32 matrices, samples are often near-singular, which can introduce high numerical variance and flaky failures unrelated to the kernel changes in this PR.
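One way to obtain well-conditioned samples is to fix the singular values directly. The PR's actual sample-generation patch is not shown on this page, so the sketch below is only an illustration under that assumption; make_well_conditioned, low, and high are hypothetical names. Since |det(A)| equals the product of the singular values, the same construction also keeps det bounded away from zero.

```python
import numpy as np


def make_well_conditioned(n: int, low: float = 1.0, high: float = 2.0,
                          seed: int = 0, dtype=np.float32) -> np.ndarray:
    """Build an n x n matrix whose singular values lie in [low, high].

    Hypothetical helper: A = Q1 @ diag(s) @ Q2.T with orthogonal Q1, Q2,
    so cond_2(A) = max(s) / min(s) <= high / low.
    """
    rng = np.random.default_rng(seed)
    # Orthogonal factors from the QR decomposition of Gaussian matrices.
    q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
    q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
    # Singular values drawn from a narrow positive range.
    s = rng.uniform(low, high, size=n)
    # Scale the columns of q1 by s, then multiply by q2 transposed.
    a = (q1 * s) @ q2.T
    # Build in float64, then cast to the target dtype (float32 by default).
    return a.astype(dtype)


a = make_well_conditioned(8)
print(a.dtype, a.shape)  # float32 (8, 8)
```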

Fixes: #2389

Copilot AI review requested due to automatic review settings April 3, 2026 07:01

Copilot AI left a comment

Pull request overview

This PR adds a native XPU implementation of torch._linalg_svd backed by oneMKL gesvd (USM), removing the previous CPU fallback path that caused extra device transfers and fallback warnings.

Changes:

  • Added XPU dispatch entries for _linalg_svd / _linalg_svd.U in native_functions.yaml.
  • Implemented a oneMKL-based SVD path (svd_mkl, apply_svd_mkl) and registered the XPU svd_stub.
  • Removed "_linalg_svd.U" from the XPU fallback allowlist.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Summary per file:

  • yaml/native/native_functions.yaml: Registers XPU dispatch for _linalg_svd so calls no longer fall back to CPU.
  • src/ATen/native/xpu/mkl/BatchLinearAlgebra.h: Adds the svd_mkl declaration and required includes.
  • src/ATen/native/xpu/mkl/BatchLinearAlgebra.cpp: Implements oneMKL gesvd-based SVD and batching loop.
  • src/ATen/native/xpu/XPUFallback.template: Removes _linalg_svd.U from the fallback list.
  • src/ATen/native/xpu/BatchLinearAlgebra.cpp: Registers XPU svd_stub and provides CPU fallback when oneMKL XPU is unavailable.

Copilot AI review requested due to automatic review settings April 3, 2026 09:02
@PatrykWilczewski PatrykWilczewski force-pushed the dev/pmerchex/fix_xpu_svd branch from 53ea463 to 9c04a68 April 3, 2026 09:06

Copilot AI left a comment

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


Copilot AI left a comment

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.


@PatrykWilczewski PatrykWilczewski force-pushed the dev/pmerchex/fix_xpu_svd branch from 4cd7a54 to e1ccb1d April 8, 2026 08:00
Copilot AI review requested due to automatic review settings April 8, 2026 13:52

Copilot AI left a comment

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.



Development

Successfully merging this pull request may close these issues.

[Bug Skip]: RuntimeError: Data corruption detected

2 participants