Pull Request Overview
This PR ensures that the compute shader local sizes (x, y, z) are rounded up to multiples of the subgroup size. It adds helper functions that count trailing zeros, round values up to power-of-two multiples, and select the lowest-cost adjustment across the three dimensions.
- Added `count_trailing_zeros`, `round_up_pow2_mul`, and `adjust_xyz` helper functions
- Modified `set_local_size_xyz` to call `adjust_xyz` before assigning the local sizes
- Brought in `<limits.h>` for `LONG_MAX`
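For readers unfamiliar with the two bit-level helpers named in the list, the following is a minimal sketch of what such functions commonly look like. The `_sketch` names, the `unsigned int` signatures, and the zero-input behaviour are assumptions for illustration; the PR's actual implementations are not shown here and may differ.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-ins for the helpers named in the PR overview.
// Signatures and edge-case behaviour are assumptions, not the PR's code.

// Number of trailing zero bits in v.
// For v == 0 this sketch returns the bit width of unsigned int.
static int count_trailing_zeros_sketch(unsigned int v)
{
    if (v == 0)
        return (int)(sizeof(v) * CHAR_BIT);

    int n = 0;
    while ((v & 1u) == 0)
    {
        v >>= 1;
        ++n;
    }
    return n;
}

// Round v up to the nearest multiple of m, where m must be a power of two.
// Values of v close to UINT_MAX wrap around; callers are assumed to avoid that.
static unsigned int round_up_pow2_mul_sketch(unsigned int v, unsigned int m)
{
    assert(m != 0 && (m & (m - 1)) == 0); // m is a power of two
    return (v + m - 1) & ~(m - 1);
}
```

With a subgroup size of 32, for example, `round_up_pow2_mul_sketch(33, 32)` yields 64, while a value that is already a multiple, such as 32, is left unchanged.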
Comments suppressed due to low confidence (1)
src/pipeline.cpp:197
- No unit tests were added for the new rounding logic in `adjust_xyz`. Consider adding tests for various `subgroup_size` and `(w,h,c)` combinations, including edge cases like zero or non-power-of-two sizes.
adjust_xyz(&w, &h, &c, d->subgroup_size);
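The test coverage suggested above could start from a small table-driven check along these lines. This is only a sketch: the `adjust_fn` signature is inferred from the call site, and the two invariants it checks (no dimension shrinks, and the product of the three sizes is a multiple of the subgroup size) are assumptions about the intended behaviour rather than something the PR states. `count_adjust_failures` and `adjust_case` are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical signature modelled on the call adjust_xyz(&w, &h, &c, subgroup_size).
typedef void (*adjust_fn)(int* x, int* y, int* z, int subgroup_size);

struct adjust_case
{
    int w, h, c;
    int subgroup_size;
};

// Runs fn over a fixed table of inputs and returns how many cases violate
// the assumed invariants:
//   1. no dimension becomes smaller than its input
//   2. x * y * z is a multiple of subgroup_size
static int count_adjust_failures(adjust_fn fn)
{
    static const adjust_case cases[] = {
        {1, 1, 1, 32},  // smallest workgroup
        {8, 8, 1, 64},  // already a multiple
        {3, 5, 7, 16},  // non-power-of-two dimensions
        {4, 4, 4, 32},  // product is a multiple, single dimensions are not
        {64, 1, 1, 64}, // one dimension equals the subgroup size
        {7, 1, 1, 1},   // subgroup size of one, nothing to adjust
        {0, 4, 4, 32},  // zero-sized dimension
    };

    int failures = 0;
    for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
    {
        const adjust_case& t = cases[i];
        int x = t.w, y = t.h, z = t.c;
        fn(&x, &y, &z, t.subgroup_size);

        const bool not_smaller = x >= t.w && y >= t.h && z >= t.c;
        const long long product = (long long)x * y * z;
        const bool aligned = product % t.subgroup_size == 0;

        if (!not_smaller || !aligned)
            failures++;
    }
    return failures;
}
```

Passing the real `adjust_xyz` to `count_adjust_failures` and asserting a result of zero would cover the listed combinations, provided its signature matches the assumed one and the invariants reflect what the PR intends.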
Originally posted by @copilot-pull-request-reviewer in Tencent/ncnn#2483 (review)