-
Notifications
You must be signed in to change notification settings - Fork 41
Parallelize Dual Grid Construction #1274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ASV BenchmarkingBenchmark Comparison ResultsBenchmarks that have improved:
Benchmarks that have stayed the same:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds parallelization to dual mesh construction and bilinear weight calculation functions to improve performance. The changes implement Numba's prange
for parallel execution in computationally intensive loops.
- Enables parallel processing in dual mesh face construction using
prange
- Adds parallelization to bilinear weight calculation functions
- Improves numerical stability with bounds checking and error handling
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
uxarray/remap/bilinear.py | Adds parallel execution to weight calculation loop and improves array indexing |
uxarray/grid/dual.py | Implements parallel dual mesh construction with enhanced numerical stability checks |
test/test_grid.py | Adds comprehensive tests for parallel dual mesh construction and validation |
Comments suppressed due to low confidence (1)
test/test_grid.py:877
- [nitpick] The 30-second timeout threshold for performance testing may be too lenient and could vary significantly across different hardware configurations. Consider using a more relative performance metric or making this configurable.
assert avg_time < 30.0, f"Dual construction too slow: {avg_time:.2f}s"
if d_dot_norm > 1.0: | ||
d_dot_norm = 1.0 | ||
# Clamp to valid range for arccos to avoid numerical errors | ||
d_dot_norm = max(-1.0, min(1.0, d_dot_norm)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider using numpy.clip() instead of nested max/min calls for better readability and potential performance: 'd_dot_norm = np.clip(d_dot_norm, -1.0, 1.0)'
d_dot_norm = max(-1.0, min(1.0, d_dot_norm)) | |
d_dot_norm = np.clip(d_dot_norm, -1.0, 1.0) |
Copilot uses AI. Check for mistakes.
uxarray/remap/bilinear.py
Outdated
for idx in prange(len(valid_idxs)): | ||
fidx = int(face_indices[valid_idxs[idx], 0]) | ||
# bounds check | ||
if fidx < 0 or fidx >= len(n_nodes_per_face): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bounds check using len(n_nodes_per_face) is called repeatedly inside the parallel loop. Consider storing the array length in a variable before the loop to avoid repeated function calls.
for idx in prange(len(valid_idxs)): | |
fidx = int(face_indices[valid_idxs[idx], 0]) | |
# bounds check | |
if fidx < 0 or fidx >= len(n_nodes_per_face): | |
n_nodes_per_face_len = len(n_nodes_per_face) | |
for idx in prange(len(valid_idxs)): | |
fidx = int(face_indices[valid_idxs[idx], 0]) | |
# bounds check | |
if fidx < 0 or fidx >= n_nodes_per_face_len: |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't len(n_nodes_per_face)
equivalent to n_face
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uxarray/remap/bilinear.py
Outdated
for idx in prange(len(valid_idxs)): | ||
fidx = int(face_indices[valid_idxs[idx], 0]) | ||
# bounds check | ||
if fidx < 0 or fidx >= len(n_nodes_per_face): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't len(n_nodes_per_face)
equivalent to n_face
?
…ion call entirely in the Numba-compiled loop
Closes #1277
Overview
This adds parallelization to numba functions within the dual function and within the bilinear weights calculation.
PR Checklist
General
Testing
Documentation