Context
PR #3974 optimized the multiply operation by avoiding validity vector allocation when both inputs have no nulls and overflow checking is disabled.
However, in Spark 4.0+, ANSI mode will be enabled by default, which means check_overflow will be true in most cases. This significantly reduces the benefit of the optimization in #3974, as we always need to allocate the validity vector to track overflow.
Proposed Optimization
Explore a two-pass approach for integer multiply operations when ANSI mode is enabled:
Pass 1: Fast overflow detection kernel
- Run a lightweight kernel that only checks if any overflow would occur
- No result computation, no validity vector allocation
- Use shared memory reduction to quickly aggregate overflow status
- Early exit if overflow is detected
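The real pass-1 implementation would be a GPU kernel with a shared-memory OR-reduction over per-thread overflow flags; as a rough CPU-side sketch of just the logic (the function name and signature are hypothetical, not from #3974):

```cpp
#include <cstddef>
#include <cstdint>

// Pass 1 sketch: scan both inputs and report whether ANY element pair
// would overflow on multiply. No result buffer, no validity vector.
// On the GPU this loop becomes a kernel whose per-thread flags are
// combined with a shared-memory reduction; this serial version only
// illustrates the check itself.
bool any_multiply_overflow(const int32_t* lhs, const int32_t* rhs, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        int32_t discard;
        // __builtin_mul_overflow returns true if lhs[i] * rhs[i] overflows.
        if (__builtin_mul_overflow(lhs[i], rhs[i], &discard)) {
            return true;  // early exit as soon as overflow is found
        }
    }
    return false;
}
```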
Pass 2: Conditional execution
- If no overflow detected: Use the fast path (no validity vector, similar to Scenario A in #3974)
- If overflow detected: Fall back to current implementation with validity vector
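Putting the two passes together, the dispatch might look like the following sketch (all names are hypothetical; the fallback loop stands in for the current validity-vector implementation, which identifies the offending rows):

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Pass 1 (hypothetical): overflow-only scan, no result buffer.
static bool pass1_any_overflow(const std::vector<int32_t>& a,
                               const std::vector<int32_t>& b) {
    for (size_t i = 0; i < a.size(); ++i) {
        int32_t discard;
        if (__builtin_mul_overflow(a[i], b[i], &discard)) return true;
    }
    return false;
}

// Pass 2 (hypothetical): fast path when pass 1 is clean, otherwise fall
// back to the per-element checked path that locates the overflowing row.
std::vector<int32_t> multiply_two_pass(const std::vector<int32_t>& a,
                                       const std::vector<int32_t>& b) {
    std::vector<int32_t> out(a.size());
    if (!pass1_any_overflow(a, b)) {
        // Fast path: plain multiply, no validity vector needed.
        for (size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
        return out;
    }
    // Fallback path: per-element check; in ANSI mode the first
    // overflowing row produces an error.
    for (size_t i = 0; i < a.size(); ++i) {
        if (__builtin_mul_overflow(a[i], b[i], &out[i])) {
            throw std::overflow_error("overflow at row " + std::to_string(i));
        }
    }
    return out;
}
```

Note that on the common no-overflow path the inputs are read twice (once per pass), which is the trade-off discussed below.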
Expected Benefits
- For workloads where overflow rarely happens (common case), we can still benefit from the fast path
- Trade-off: Additional kernel launch overhead vs. memory allocation and computation savings
- Most beneficial for large datasets where overflow is rare
Considerations
- Only applicable to integer types (int8, int16, int32, int64)
- Floating-point multiply doesn't need overflow checking (IEEE 754 overflow produces ±Inf rather than an error)
- Need to benchmark to confirm the two-pass approach is actually faster than always allocating the validity vector
- Consider different thresholds for small vs. large datasets