-
Notifications
You must be signed in to change notification settings - Fork 814
[GPU][Codegen] Expand iteration space based on new expand_dims attribute
#22342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
expand_dims lowering_config expand_dims lowering_config
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Max191
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good now, just some nits and one comment about the configuration logic. Nice work, and thanks for addressing all my comments so far!
In case I'm OOO by the time you address these changes, I'll remove my request for changes now. It LGTM after these final comments, so anyone else can give the final approval after this round of comments.
| #include "iree/compiler/Codegen/Common/Transforms.h" | ||
| #include "iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUAttrs.h" | ||
| #include "iree/compiler/Dialect/LinalgExt/Transforms/Transforms.h" | ||
| #include "iree/compiler/Dialect/Util/IR/UtilDialect.h" | ||
| #include "llvm/ADT/STLExtras.h" | ||
| #include "llvm/ADT/SmallVectorExtras.h" | ||
| #include "llvm/Support/DebugLog.h" | ||
| #include "llvm/Support/LogicalResult.h" | ||
| #include "mlir/Dialect/Affine/IR/AffineOps.h" | ||
| #include "mlir/Dialect/Arith/Utils/Utils.h" | ||
| #include "mlir/Dialect/Linalg/IR/Linalg.h" | ||
| #include "mlir/Dialect/MemRef/Transforms/Transforms.h" | ||
| #include "mlir/Dialect/Tensor/IR/Tensor.h" | ||
| #include "mlir/IR/AffineExpr.h" | ||
| #include "mlir/IR/AffineMap.h" | ||
| #include "mlir/IR/BuiltinTypeInterfaces.h" | ||
| #include "mlir/IR/OpDefinition.h" | ||
| #include "mlir/Transforms/GreedyPatternRewriteDriver.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Please clean up the includes. I think a lot of them are probably not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need these remaining ones
compiler/src/iree/compiler/Codegen/Common/GPU/GPUExpandDimensions.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/GPU/GPUExpandDimensions.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/GPU/GPUExpandDimensions.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/GPU/GPUExpandDimensions.cpp
Outdated
Show resolved
Hide resolved
|
|
||
| // ----- | ||
|
|
||
| func.func @expand_dynamic_dim(%a: tensor<4x?xf16>, %b: tensor<1x?xf16>) -> tensor<4x1xf32> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: For this test and the test below, can you use something simple like linalg.add? The tests are quite long, and I don't think it needs to be a matvec to test what is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to linalg.add for all except one (I think it is a useful reference since it's the only use real use of this atm)
compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/ReductionConfigUtils.cpp
Outdated
Show resolved
Hide resolved
I will be OOO, and it looks good to me after the final comments are addressed. Someone else can give the final approval.
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
Signed-off-by: Eric Feng <[email protected]>
|
@efric I added a ci-extra trailer to run test_torch can you check? |
Signed-off-by: Eric Feng <[email protected]>
edb0ed2 to
8850fc8
Compare
The current binary size for this test is nearing its threshold (459794/460000). Bump to 470000 to allow future changes such as #22342 some room (goes up to 460306) Signed-off-by: Eric Feng <[email protected]>
The current binary size for this test is nearing its threshold (459794/460000). Bump to 470000 to allow future changes such as iree-org#22342 some room (goes up to 460306) Signed-off-by: Eric Feng <[email protected]>
…ibute (iree-org#22342) This patch introduces iteration space expansion for reductions in the VectorDistribute path. Specifically, we: 1. Add a new attribute, `expand_dims`, for reductions. 2. Introduce a new pass, `GPUExpandDimensions`, which uses `expand_dims` to expand the iteration space of relevant dimensions. 3. Refactor common functionality shared between `GPUExpandDimensions` and `BlockDynamicDimensions` into reusable utilities. 4. Refactor encoding helpers from `EncodingAttrs.cpp` into reusable utilities. This change also enables [chain FMA](iree-org#21855) in matvec codegen as we iterate along the K reduction dimension. --- **Performance Summary** **IREE benchmark module** * Only expansion: ~4% improvement * Expansion + chain FMA: ~11% improvement **rocprof** * Only expansion: ~13% worse * Expansion + chain FMA: ~9% better **Register usage** * 10% reduction (60 → 54 registers for matvec dispatches) **Instruction latency (post-reduction loop epilogue)** * 3.5% improvement (340 → 328 total mean latency) --- **Notes** * As a follow-up, we can explore applying iteration space expansion to the reduction in attention * Right now, we only expand one dimension into two although the implementation supports expansion to N dimensions. * Please note this PR changes the reduction order, so expect some minor changes to the numerics * This is does not improve performance by itself/can cause regression without chain FMA iree-org#21855 Traces for matvec dispatches are attached for all variations (original, only expansion, and expansion + chain FMA). [115_expansion_and_chain.tar.gz](https://github.com/user-attachments/files/23268046/115_expansion_and_chain.tar.gz) [115_nothing.tar.gz](https://github.com/user-attachments/files/23268047/115_nothing.tar.gz) [115_only_expansion.tar.gz](https://github.com/user-attachments/files/23268048/115_only_expansion.tar.gz) Fixes: iree-org#22153 ci-extra: test_torch --------- Signed-off-by: Eric Feng <[email protected]> Signed-off-by: Eric Feng <[email protected]>
…ms` attribute (iree-org#22342)" This reverts commit b879ee1.
…ms` attribute (iree-org#22342)" This reverts commit b879ee1. Signed-off-by: Eric Feng <[email protected]>
…ms` attribute (#22342) (#22982) This reverts commit b879ee1. To temporarily fix ONNX models test suite from issues suspected from [[DispatchCreation] Unit dims not properly folded away during GlobalOptimization](#22978) and reported regression. Signed-off-by: Eric Feng <[email protected]>
The current binary size for this test is nearing its threshold (459794/460000). Bump to 470000 to allow future changes such as iree-org#22342 some room (goes up to 460306) Signed-off-by: Eric Feng <[email protected]>
…ibute (iree-org#22342) This patch introduces iteration space expansion for reductions in the VectorDistribute path. Specifically, we: 1. Add a new attribute, `expand_dims`, for reductions. 2. Introduce a new pass, `GPUExpandDimensions`, which uses `expand_dims` to expand the iteration space of relevant dimensions. 3. Refactor common functionality shared between `GPUExpandDimensions` and `BlockDynamicDimensions` into reusable utilities. 4. Refactor encoding helpers from `EncodingAttrs.cpp` into reusable utilities. This change also enables [chain FMA](iree-org#21855) in matvec codegen as we iterate along the K reduction dimension. --- **Performance Summary** **IREE benchmark module** * Only expansion: ~4% improvement * Expansion + chain FMA: ~11% improvement **rocprof** * Only expansion: ~13% worse * Expansion + chain FMA: ~9% better **Register usage** * 10% reduction (60 → 54 registers for matvec dispatches) **Instruction latency (post-reduction loop epilogue)** * 3.5% improvement (340 → 328 total mean latency) --- **Notes** * As a follow-up, we can explore applying iteration space expansion to the reduction in attention * Right now, we only expand one dimension into two although the implementation supports expansion to N dimensions. * Please note this PR changes the reduction order, so expect some minor changes to the numerics * This is does not improve performance by itself/can cause regression without chain FMA iree-org#21855 Traces for matvec dispatches are attached for all variations (original, only expansion, and expansion + chain FMA). [115_expansion_and_chain.tar.gz](https://github.com/user-attachments/files/23268046/115_expansion_and_chain.tar.gz) [115_nothing.tar.gz](https://github.com/user-attachments/files/23268047/115_nothing.tar.gz) [115_only_expansion.tar.gz](https://github.com/user-attachments/files/23268048/115_only_expansion.tar.gz) Fixes: iree-org#22153 ci-extra: test_torch --------- Signed-off-by: Eric Feng <[email protected]> Signed-off-by: Eric Feng <[email protected]>
The current binary size for this test is nearing its threshold (459794/460000). Bump to 470000 to allow future changes such as iree-org#22342 some room (goes up to 460306) Signed-off-by: Eric Feng <[email protected]> Signed-off-by: Bangtian Liu <[email protected]>
…ibute (iree-org#22342) This patch introduces iteration space expansion for reductions in the VectorDistribute path. Specifically, we: 1. Add a new attribute, `expand_dims`, for reductions. 2. Introduce a new pass, `GPUExpandDimensions`, which uses `expand_dims` to expand the iteration space of relevant dimensions. 3. Refactor common functionality shared between `GPUExpandDimensions` and `BlockDynamicDimensions` into reusable utilities. 4. Refactor encoding helpers from `EncodingAttrs.cpp` into reusable utilities. This change also enables [chain FMA](iree-org#21855) in matvec codegen as we iterate along the K reduction dimension. --- **Performance Summary** **IREE benchmark module** * Only expansion: ~4% improvement * Expansion + chain FMA: ~11% improvement **rocprof** * Only expansion: ~13% worse * Expansion + chain FMA: ~9% better **Register usage** * 10% reduction (60 → 54 registers for matvec dispatches) **Instruction latency (post-reduction loop epilogue)** * 3.5% improvement (340 → 328 total mean latency) --- **Notes** * As a follow-up, we can explore applying iteration space expansion to the reduction in attention * Right now, we only expand one dimension into two although the implementation supports expansion to N dimensions. * Please note this PR changes the reduction order, so expect some minor changes to the numerics * This is does not improve performance by itself/can cause regression without chain FMA iree-org#21855 Traces for matvec dispatches are attached for all variations (original, only expansion, and expansion + chain FMA). [115_expansion_and_chain.tar.gz](https://github.com/user-attachments/files/23268046/115_expansion_and_chain.tar.gz) [115_nothing.tar.gz](https://github.com/user-attachments/files/23268047/115_nothing.tar.gz) [115_only_expansion.tar.gz](https://github.com/user-attachments/files/23268048/115_only_expansion.tar.gz) Fixes: iree-org#22153 ci-extra: test_torch --------- Signed-off-by: Eric Feng <[email protected]> Signed-off-by: Eric Feng <[email protected]> Signed-off-by: Bangtian Liu <[email protected]>
This patch introduces iteration space expansion for reductions in the VectorDistribute path.
Specifically, we:
expand_dims, for reductions.GPUExpandDimensions, which usesexpand_dimsto expand the iteration space of relevant dimensions.GPUExpandDimensionsandBlockDynamicDimensionsinto reusable utilities.EncodingAttrs.cppinto reusable utilities.This change also enables chain FMA in matvec codegen as we iterate along the K reduction dimension.
Performance Summary
IREE benchmark module
rocprof
Register usage
Instruction latency (post-reduction loop epilogue)
Notes
Traces for matvec dispatches are attached for all variations (original, only expansion, and expansion + chain FMA).
115_expansion_and_chain.tar.gz
115_nothing.tar.gz
115_only_expansion.tar.gz
Fixes: #22153
ci-extra: test_torch