raulcd
Member

@raulcd raulcd commented Jun 11, 2025

Rationale for this change

When slicing arrays with non-trivial steps we were using numpy.arange to generate the indices for take. As numpy is an optional dependency, generating those indices in pure Python instead caused a performance penalty. Creating a pyarrow function to build our own ranges that mimics Python range or numpy arange is useful for that use case and might also be useful for other use cases. Currently we only generate Array[Int64], but we could potentially generate more types.
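To illustrate the idea of turning a stepped slice into indices for take, here is a minimal pure-Python sketch. It is not the pyarrow implementation: `step_slice_indices` and `take` are hypothetical helpers that operate on plain lists, and the only assumption is that a stepped slice can be expressed as "build the index sequence, then gather".

```python
def step_slice_indices(length, key):
    """Return the positions a slice selects in a sequence of `length` items.

    Hypothetical helper: slice.indices() normalizes None and negative
    bounds, and range() then enumerates the selected positions.
    """
    start, stop, step = key.indices(length)
    return list(range(start, stop, step))


def take(values, indices):
    """Gather values at the given positions (list-based stand-in for take)."""
    return [values[i] for i in indices]


values = list(range(20))
key = slice(1, 10, 2)
indices = step_slice_indices(len(values), key)

# Gathering by generated indices matches ordinary stepped slicing.
assert take(values, indices) == values[key]
```

The index sequence is the part that an arange-style function produces; the gather step is what take does with it.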

What changes are included in this PR?

Provide a pa.arange function that allows us to generate indices when slicing arrays.

Are these changes tested?

Yes, new tests were added.

Are there any user-facing changes?

No, but a new pyarrow.arange function has been added.


Member Author

@raulcd raulcd left a comment


I am unsure whether this is what you had in mind @pitrou, but I would appreciate some tips on possible improvements. I've been testing and it is still slightly slower than the original np.arange implementation. With the original np.arange implementation:

In [2]: a = pa.array(np.arange(0, 2_000_000))

In [3]: %timeit a[slice(1, 1_000_000, 2)]
763 μs ± 4.28 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

With the current PR:

In [2]: a = pa.array(np.arange(0, 2_000_000))

In [3]: %timeit a[slice(1, 1_000_000, 2)]
1.08 ms ± 43.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

@raulcd raulcd changed the title [Python][C++] Implement pa.arange function to generate array sequences GH-46771: [Python][C++] Implement pa.arange function to generate array sequences Jun 11, 2025
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jun 11, 2025

⚠️ GitHub issue #46771 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 11, 2025
Member Author

raulcd commented Jun 11, 2025

It seems to be on par now:

In [2]: %timeit np.arange(1, 1_000_000, 2)
219 μs ± 10.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [3]: %timeit pa.arange(1, 1_000_000, 2)
212 μs ± 1.97 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
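For reference, the semantics being compared (an integer sequence from start up to but excluding stop, in increments of step) can be sketched in pure Python. This is only an illustration of range-like behaviour, not the code in this PR; `arange_length` and `arange` below are hypothetical names, and the sketch assumes integer arguments only.

```python
def arange_length(start, stop, step):
    """Number of elements in a range-like sequence (integers only)."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0:
        # Ceiling division of the distance by the step, clamped at zero.
        return max(0, (stop - start + step - 1) // step)
    # Negative step: same ceiling division on the mirrored distance.
    return max(0, (start - stop - step - 1) // (-step))


def arange(start, stop, step=1):
    """Build the sequence eagerly once the length is known."""
    n = arange_length(start, stop, step)
    return [start + i * step for i in range(n)]


# The sketch agrees with Python's built-in range for the benchmarked case.
assert arange_length(1, 1_000_000, 2) == len(range(1, 1_000_000, 2))
```

Knowing the length up front is what lets an implementation allocate the output buffer once and fill it in a single pass.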

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 12, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Jun 12, 2025
@raulcd raulcd marked this pull request as ready for review June 13, 2025 07:30
@raulcd raulcd requested review from AlenkaF and rok as code owners June 13, 2025 07:30
Member

@AlenkaF AlenkaF left a comment


Thank you @raulcd for working on this!
It looks good on my end, but I'll let Antoine finish the review.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 19, 2025
@raulcd raulcd requested a review from pitrou June 19, 2025 08:24
Member

@pitrou pitrou left a comment


+1, thanks for fixing this @raulcd

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting merge Awaiting merge and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels Jun 20, 2025
@raulcd raulcd merged commit b3e261d into apache:main Jun 20, 2025
17 checks passed
@raulcd raulcd removed the awaiting merge Awaiting merge label Jun 20, 2025
@raulcd raulcd deleted the GH-46771 branch June 20, 2025 10:33
@github-actions github-actions bot added the awaiting changes Awaiting changes label Jun 20, 2025

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit b3e261d.

There were 122 benchmark results with an error:

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

4 participants