@tbensonatl
Collaborator

Change the behavior of the inv() transform as follows:

  • No longer unconditionally overwrite the input with factorized data. Previously, (Ainv = inv(A)).run() would write the inverse to Ainv and the LU factorization to A.
  • Support in-place transforms like (A = inv(A)).run(). Previously, this would run, but the results would be incorrect because the underlying cuBLAS calls only support out-of-place solves.
  • Both of the above are achieved by always creating a temporary work buffer and copying the input into it, so the factorization only ever overwrites the copy.
  • Add support for input operators (i.e., not just tensors). The operator runs when populating the temporary input work buffer.
  • Use host-pinned memory and async memcpys to test the success of the factorization/inversion. This still synchronizes the provided stream, but no longer synchronizes the default stream.
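
To illustrate why the work-buffer copy makes both the non-destructive and the in-place cases safe, here is a minimal host-side sketch in plain C++. It is not the MatX implementation and does not call cuBLAS: `Matrix` and `invert_via_work_buffer` are hypothetical names, and Gauss-Jordan elimination stands in for the LU-based batched solve. The point is only the buffering pattern: the input is copied first, all destructive work happens on the copy, and the output is written last, so the input and output may alias.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Row-major n x n matrix stored in a flat vector (hypothetical type for this sketch).
using Matrix = std::vector<double>;

// Hypothetical helper, not part of MatX: inverts `in` into `out` using
// Gauss-Jordan elimination with partial pivoting. Returns false if the
// matrix is (numerically) singular, in which case `out` is left untouched.
// `in` and `out` may refer to the same object.
bool invert_via_work_buffer(const Matrix& in, Matrix& out, std::size_t n) {
  assert(in.size() == n * n);

  // Copy the input into a temporary work buffer. The elimination below
  // destroys `work`, never `in`.
  Matrix work(in);

  // Separate result buffer, initialized to the identity.
  Matrix result(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i) result[i * n + i] = 1.0;

  for (std::size_t col = 0; col < n; ++col) {
    // Partial pivoting: pick the row with the largest magnitude in this column.
    std::size_t pivot = col;
    for (std::size_t r = col + 1; r < n; ++r) {
      if (std::fabs(work[r * n + col]) > std::fabs(work[pivot * n + col])) pivot = r;
    }
    if (std::fabs(work[pivot * n + col]) < 1e-12) return false;  // singular

    if (pivot != col) {
      for (std::size_t k = 0; k < n; ++k) {
        std::swap(work[pivot * n + k], work[col * n + k]);
        std::swap(result[pivot * n + k], result[col * n + k]);
      }
    }

    // Scale the pivot row so the pivot element becomes 1.
    const double d = work[col * n + col];
    for (std::size_t k = 0; k < n; ++k) {
      work[col * n + k] /= d;
      result[col * n + k] /= d;
    }

    // Eliminate this column from every other row.
    for (std::size_t r = 0; r < n; ++r) {
      if (r == col) continue;
      const double f = work[r * n + col];
      for (std::size_t k = 0; k < n; ++k) {
        work[r * n + k] -= f * work[col * n + k];
        result[r * n + k] -= f * result[col * n + k];
      }
    }
  }

  // Write the output only at the very end, so `out` may alias `in`.
  out = std::move(result);
  return true;
}
```

With this pattern, `invert_via_work_buffer(A, Ainv, n)` leaves `A` unchanged, and `invert_via_work_buffer(A, A, n)` behaves like the in-place `(A = inv(A)).run()` case. The cost is one extra copy of the input per call.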

@tbensonatl
Collaborator Author

/build

@cliffburdick
Collaborator

/build

@coveralls

Coverage Status

coverage: 93.39% (+0.07%) from 93.323%
when pulling 3df9230 on add-in-place-inv-transform-support
into 459cffb on main.

@cliffburdick cliffburdick merged commit 4468820 into main Aug 26, 2024
@cliffburdick cliffburdick deleted the add-in-place-inv-transform-support branch August 26, 2024 18:34
@cliffburdick
Collaborator

/build
