## Pull Request Overview #2197

Open
@Stevenanthony21b

This PR ensures that the compute shader local sizes (x, y, z) are rounded up to multiples of the subgroup size. It adds helper functions that count trailing zeros, round values up to power-of-two multiples, and select the lowest-cost adjustment across the three dimensions.

  • Added count_trailing_zeros, round_up_pow2_mul, and adjust_xyz helper functions
  • Modified set_local_size_xyz to call adjust_xyz before assigning the local sizes
  • Brought in <limits.h> for LONG_MAX
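
The PR's actual implementation is not shown here, so the following is only a minimal sketch of how helpers with these names could fit together. It assumes that the subgroup size is a power of two, that the goal is for the product x * y * z to become a multiple of the subgroup size, and that "cost" means the total invocation count after rounding. The signatures are hypothetical and are guessed from the call `adjust_xyz(&w, &h, &c, d->subgroup_size)`.

```cpp
#include <cassert>
#include <limits.h> // LONG_MAX, as mentioned in the PR summary

// Hypothetical sketch: number of trailing zero bits of v (v must be non-zero).
static int count_trailing_zeros(unsigned int v)
{
    assert(v != 0);
    int n = 0;
    while ((v & 1u) == 0)
    {
        v >>= 1;
        n++;
    }
    return n;
}

// Hypothetical sketch: round v up to the next multiple of m,
// where m must be a power of two.
static int round_up_pow2_mul(int v, int m)
{
    assert(m > 0 && (m & (m - 1)) == 0);
    return (v + m - 1) & ~(m - 1);
}

// Hypothetical sketch: split subgroup_size = 2^k into 2^a * 2^b * 2^c,
// round each dimension up to its share, and keep the split with the
// smallest total invocation count (assumed cost metric).
static void adjust_xyz(int* x, int* y, int* z, int subgroup_size)
{
    // Leave the sizes untouched for inputs this sketch does not handle.
    if (subgroup_size <= 0 || (subgroup_size & (subgroup_size - 1)) != 0)
        return;
    if (*x <= 0 || *y <= 0 || *z <= 0)
        return;

    const int k = count_trailing_zeros((unsigned int)subgroup_size); // log2
    long best_cost = LONG_MAX;
    int bx = *x, by = *y, bz = *z;

    for (int a = 0; a <= k; a++)
    {
        for (int b = 0; a + b <= k; b++)
        {
            const int c = k - a - b;
            const int nx = round_up_pow2_mul(*x, 1 << a);
            const int ny = round_up_pow2_mul(*y, 1 << b);
            const int nz = round_up_pow2_mul(*z, 1 << c);
            const long cost = (long)nx * ny * nz;
            if (cost < best_cost)
            {
                best_cost = cost;
                bx = nx;
                by = ny;
                bz = nz;
            }
        }
    }

    *x = bx;
    *y = by;
    *z = bz;
}
```

Under these assumptions, a local size of (3, 3, 1) with a subgroup size of 4 grows to 12 invocations in total, while (4, 4, 1) with a subgroup size of 8 is left unchanged because its product is already a multiple of 8. Zero sizes and non-power-of-two subgroup sizes are simply passed through, which is one possible policy for the edge cases the review comment below asks about.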

Comments suppressed due to low confidence (1)

src/pipeline.cpp:197

  • No unit tests were added for the new rounding logic in adjust_xyz. Consider adding tests for various subgroup_size and (w, h, c) combinations, including edge cases like zero or non-power-of-two sizes.

adjust_xyz(&w, &h, &c, d->subgroup_size);

Originally posted by @copilot-pull-request-reviewer in Tencent/ncnn#2483 (review)
