[PERF] Explore overflow pre-check optimization for multiply with ANSI mode enabled (Spark 4.0+) #3982

Description

Context

PR #3974 optimized the multiply operation by avoiding validity vector allocation when both inputs have no nulls and overflow checking is disabled.

However, in Spark 4.0+, ANSI mode will be enabled by default, which means check_overflow will be true in most cases. This significantly reduces the benefit of the optimization in #3974, as we always need to allocate the validity vector to track overflow.

Proposed Optimization

Explore a two-pass approach for integer multiply operations when ANSI mode is enabled:

Pass 1: Fast overflow detection kernel

  • Run a lightweight kernel that only checks if any overflow would occur
  • No result computation, no validity vector allocation
  • Use shared memory reduction to quickly aggregate overflow status
  • Early exit if overflow is detected (a minimal kernel sketch follows this list)
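
A minimal sketch of what such a pre-check kernel could look like, assuming int64 inputs and simplifying the shared-memory aggregation to a single block-wide flag; the kernel and helper names (`any_mul_overflow`, `mul_overflows_i64`) are hypothetical, not existing APIs:

```cuda
#include <cstdint>

__device__ __forceinline__ bool mul_overflows_i64(int64_t a, int64_t b) {
  // The product fits in int64 iff the high 64 bits of the full product are
  // the sign extension of the low 64 bits.
  int64_t hi = __mul64hi(a, b);
  int64_t lo = static_cast<int64_t>(static_cast<uint64_t>(a) *
                                    static_cast<uint64_t>(b));
  return hi != (lo >> 63);
}

// `overflow_flag` is a single int in device memory, zero-initialized by the host.
__global__ void any_mul_overflow(const int64_t* lhs, const int64_t* rhs,
                                 int n, int* overflow_flag) {
  __shared__ int block_flag;

  // Cheap early exit: if a previous block already reported overflow,
  // this block can stop without scanning its rows.
  if (threadIdx.x == 0) block_flag = *overflow_flag;
  __syncthreads();
  if (block_flag) return;

  // Grid-stride scan; the block-wide OR is aggregated through a shared flag
  // (every writer stores the same value, so the race is benign).
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x) {
    if (mul_overflows_i64(lhs[i], rhs[i])) {
      block_flag = 1;
      break;
    }
  }
  __syncthreads();

  // One thread per block publishes the result; no result buffer and no
  // validity vector are touched in this pass.
  if (threadIdx.x == 0 && block_flag) atomicExch(overflow_flag, 1);
}
```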

Pass 2: Conditional execution

  • If Pass 1 finds no overflow, run the fast multiply kernel from #3974 with no validity vector allocation or per-row overflow bookkeeping
  • If Pass 1 finds an overflow, surface the ANSI arithmetic overflow error (or fall back to the existing per-row checked path); a host-side sketch follows this list

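To make the conditional concrete, here is a hypothetical host-side driver for the two passes; the function name, launch parameters, and error handling are illustrative assumptions, not the project's actual API:

```cuda
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <cuda_runtime.h>

// Assumed kernels: `any_mul_overflow` is the Pass 1 sketch above, and
// `fast_multiply` stands in for the existing non-checking multiply kernel
// from #3974. Neither name is an existing API.
__global__ void any_mul_overflow(const int64_t* lhs, const int64_t* rhs,
                                 int n, int* overflow_flag);
__global__ void fast_multiply(const int64_t* lhs, const int64_t* rhs,
                              int64_t* out, int n);

void multiply_ansi_i64(const int64_t* lhs, const int64_t* rhs, int64_t* out,
                       int n, cudaStream_t stream) {
  int* d_flag = nullptr;
  cudaMallocAsync(reinterpret_cast<void**>(&d_flag), sizeof(int), stream);
  cudaMemsetAsync(d_flag, 0, sizeof(int), stream);

  int block = 256;
  int grid  = std::max(1, std::min((n + block - 1) / block, 1024));

  // Pass 1: overflow pre-check only (no result, no validity vector).
  any_mul_overflow<<<grid, block, 0, stream>>>(lhs, rhs, n, d_flag);

  int h_flag = 0;
  cudaMemcpyAsync(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);  // the branch below needs the flag on the host
  cudaFreeAsync(d_flag, stream);

  if (h_flag) {
    // Pass 2a: overflow exists somewhere; surface the ANSI error (or fall
    // back to the per-row checked kernel to report the offending row).
    throw std::runtime_error("overflow in integer multiply (ANSI mode)");
  }

  // Pass 2b: no overflow anywhere, so the #3974 fast path applies:
  // plain multiply with no validity vector allocation.
  fast_multiply<<<grid, block, 0, stream>>>(lhs, rhs, out, n);
}
```
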
Expected Benefits

  • For workloads where overflow rarely happens (common case), we can still benefit from the fast path
  • Trade-off: Additional kernel launch overhead vs. memory allocation and computation savings
  • Most beneficial for large datasets where overflow is rare

Considerations

  • Only applicable to integer types (int8, int16, int32, int64); see the predicate sketch after this list
  • Floating point multiply doesn't need overflow checking
  • Need to benchmark to ensure the two-pass approach is actually faster than always allocating the validity vector
  • Consider different thresholds for small vs. large datasets
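
For illustration, the device-side overflow predicates such a pre-check would need per integer type might look like the following (hypothetical helpers, not existing APIs): the narrow types widen the product, while int64 has no wider built-in type and uses the high half of the product instead.

```cuda
#include <cstdint>

// int8/int16/int32: compute the product in the next wider type and compare
// against the narrow type's range.
__device__ __forceinline__ bool mul_overflows(int8_t a, int8_t b) {
  int32_t p = static_cast<int32_t>(a) * static_cast<int32_t>(b);
  return p < INT8_MIN || p > INT8_MAX;
}

__device__ __forceinline__ bool mul_overflows(int16_t a, int16_t b) {
  int32_t p = static_cast<int32_t>(a) * static_cast<int32_t>(b);
  return p < INT16_MIN || p > INT16_MAX;
}

__device__ __forceinline__ bool mul_overflows(int32_t a, int32_t b) {
  int64_t p = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  return p < INT32_MIN || p > INT32_MAX;
}

// int64: compare the high 64 bits of the product (via __mul64hi) against the
// sign of the low 64 bits, as in the Pass 1 sketch above.
__device__ __forceinline__ bool mul_overflows(int64_t a, int64_t b) {
  int64_t hi = __mul64hi(a, b);
  int64_t lo = static_cast<int64_t>(static_cast<uint64_t>(a) *
                                    static_cast<uint64_t>(b));
  return hi != (lo >> 63);
}
```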

Related

  • #3974
