Context
PR #3974 optimized the multiply operation by avoiding validity vector allocation when both inputs have no nulls and overflow checking is disabled.
However, in Spark 4.0+, ANSI mode will be enabled by default, which means check_overflow will be true in most cases. This significantly reduces the benefit of the optimization in #3974, as we always need to allocate the validity vector to track overflow.
Proposed Optimization
Explore a two-pass approach for integer multiply operations when ANSI mode is enabled:
Pass 1: Fast overflow detection kernel
- Run a lightweight kernel that only checks if any overflow would occur
- No result computation, no validity vector allocation
- Use shared memory reduction to quickly aggregate overflow status
- Early exit if overflow is detected
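The real pass-1 implementation would be a GPU kernel with a shared-memory OR-reduction over per-thread overflow flags; as a rough CPU-side sketch of just the logic (the function name and signature are hypothetical, not from #3974):

```cpp
#include <cstddef>
#include <cstdint>

// Pass 1 sketch: scan both inputs and report whether ANY element pair
// would overflow on multiply. No result buffer, no validity vector.
// On the GPU this loop becomes a kernel whose per-thread flags are
// combined with a shared-memory reduction; this serial version only
// illustrates the check itself.
bool any_multiply_overflow(const int32_t* lhs, const int32_t* rhs, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        int32_t discard;
        // __builtin_mul_overflow returns true if lhs[i] * rhs[i] overflows.
        if (__builtin_mul_overflow(lhs[i], rhs[i], &discard)) {
            return true;  // early exit as soon as overflow is found
        }
    }
    return false;
}
```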
Pass 2: Conditional execution
- If no overflow detected: Use the fast path (no validity vector, similar to Scenario A in #3974)
- If overflow detected: Fall back to current implementation with validity vector
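Putting the two passes together, the dispatch might look like the following sketch (all names are hypothetical; the fallback loop stands in for the current validity-vector implementation, which identifies the offending rows):

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Pass 1 (hypothetical): overflow-only scan, no result buffer.
static bool pass1_any_overflow(const std::vector<int32_t>& a,
                               const std::vector<int32_t>& b) {
    for (size_t i = 0; i < a.size(); ++i) {
        int32_t discard;
        if (__builtin_mul_overflow(a[i], b[i], &discard)) return true;
    }
    return false;
}

// Pass 2 (hypothetical): fast path when pass 1 is clean, otherwise fall
// back to the per-element checked path that locates the overflowing row.
std::vector<int32_t> multiply_two_pass(const std::vector<int32_t>& a,
                                       const std::vector<int32_t>& b) {
    std::vector<int32_t> out(a.size());
    if (!pass1_any_overflow(a, b)) {
        // Fast path: plain multiply, no validity vector needed.
        for (size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
        return out;
    }
    // Fallback path: per-element check; in ANSI mode the first
    // overflowing row produces an error.
    for (size_t i = 0; i < a.size(); ++i) {
        if (__builtin_mul_overflow(a[i], b[i], &out[i])) {
            throw std::overflow_error("overflow at row " + std::to_string(i));
        }
    }
    return out;
}
```

Note that on the common no-overflow path the inputs are read twice (once per pass), which is the trade-off discussed below.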
Expected Benefits
- For workloads where overflow rarely happens (common case), we can still benefit from the fast path
- Trade-off: Additional kernel launch overhead vs. memory allocation and computation savings
- Most beneficial for large datasets where overflow is rare
Considerations
- Only applicable to integer types (int8, int16, int32, int64)
- Floating-point multiply doesn't need overflow checking (IEEE 754 overflow produces ±Inf rather than an error)
- Need to benchmark to confirm the two-pass approach is actually faster than always allocating the validity vector
- Consider different thresholds for small vs. large datasets