Skip to content

Parallelize Dual Grid Construction #1274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

aaronzedwick
Copy link
Member

@aaronzedwick aaronzedwick commented May 26, 2025

Closes #1277

Overview

This adds parallelization to numba functions within the dual function and within the bilinear weights calculation.

PR Checklist

General

  • An issue is linked created and linked
  • Add appropriate labels
  • Filled out Overview and Expected Usage (if applicable) sections

Testing

  • Adequate tests are created if there is new functionality
  • Tests cover all possible logical paths in your function
  • Tests are not too basic (such as simply calling a function and nothing else)

Documentation

  • Docstrings have updated with any function changes

@aaronzedwick aaronzedwick added the run-benchmark Run ASV benchmark workflow label May 26, 2025
Copy link

github-actions bot commented May 26, 2025

ASV Benchmarking

Benchmark Comparison Results

Benchmarks that have improved:

Change Before [d27bbfc] After [d37a581] Ratio Benchmark (Parameter)
- 776M 345M 0.45 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
failed 754M n/a face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
- 50.6±1ms 25.4±0.1ms 0.50 mpas_ocean.DualMesh.time_dual_mesh_construction('120km')
- 5.02±0.2ms 3.89±0.1ms 0.78 mpas_ocean.DualMesh.time_dual_mesh_construction('480km')
- 377M 297M 0.79 mpas_ocean.Integrate.peakmem_integrate('480km')

Benchmarks that have stayed the same:

Change Before [d27bbfc] After [d37a581] Ratio Benchmark (Parameter)
343M 344M 1.00 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
375M 374M 1.00 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
19.0±0.02ms 19.1±0.2ms 1.00 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
3.33±0.03ms 3.31±0.03ms 1.00 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
17.1±0.06ms 17.2±0.05ms 1.01 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
2.10±0ms 2.07±0.02ms 0.99 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
2.11±0.05s 2.08±0.03s 0.99 import.Imports.timeraw_import_uxarray
855±7μs 872±10μs 1.02 mpas_ocean.CheckNorm.time_check_norm('120km')
694±20μs 672±10μs 0.97 mpas_ocean.CheckNorm.time_check_norm('480km')
662±6ms 669±5ms 1.01 mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km')
43.2±0.9ms 43.2±0.3ms 1.00 mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km')
637±10μs 622±6μs 0.98 mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km')
516±7μs 507±5μs 0.98 mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km')
6.44±0.09ms 6.39±0.07ms 0.99 mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('120km')
4.73±0.02ms 4.77±0.05ms 1.01 mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('480km')
3.58±0.04s 3.53±0.02s 0.99 mpas_ocean.ConstructFaceLatLon.time_welzl('120km')
224±0.8ms 225±3ms 1.00 mpas_ocean.ConstructFaceLatLon.time_welzl('480km')
18.1±0.02ms 18.1±0.03ms 1.00 mpas_ocean.ConstructTreeStructures.time_ball_tree('120km')
959±10μs 964±5μs 1.01 mpas_ocean.ConstructTreeStructures.time_ball_tree('480km')
11.2±0.07ms 11.3±0.07ms 1.01 mpas_ocean.ConstructTreeStructures.time_kd_tree('120km')
1.16±0.01ms 1.16±0.01ms 1.00 mpas_ocean.ConstructTreeStructures.time_kd_tree('480km')
547±6ms 549±4ms 1.00 mpas_ocean.CrossSections.time_const_lat('120km', 1)
274±1ms 274±1ms 1.00 mpas_ocean.CrossSections.time_const_lat('120km', 2)
142±0.5ms 143±1ms 1.01 mpas_ocean.CrossSections.time_const_lat('120km', 4)
426±5ms 420±2ms 0.99 mpas_ocean.CrossSections.time_const_lat('480km', 1)
216±4ms 213±1ms 0.99 mpas_ocean.CrossSections.time_const_lat('480km', 2)
111±2ms 112±0.8ms 1.01 mpas_ocean.CrossSections.time_const_lat('480km', 4)
915±7ms 919±6ms 1.00 mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False)
53.5±0.8ms 52.7±0.3ms 0.99 mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True)
74.6±1ms 73.9±0.2ms 0.99 mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False)
5.44±0.07ms 5.30±0.1ms 0.97 mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True)
233M 233M 1.00 mpas_ocean.Gradient.peakmem_gradient('120km')
217M 217M 1.00 mpas_ocean.Gradient.peakmem_gradient('480km')
2.73±0.02ms 2.78±0.03ms 1.02 mpas_ocean.Gradient.time_gradient('120km')
310±1μs 311±2μs 1.00 mpas_ocean.Gradient.time_gradient('480km')
207±3μs 206±2μs 0.99 mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('120km')
118±1μs 118±1μs 1.00 mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('480km')
309M 310M 1.00 mpas_ocean.Integrate.peakmem_integrate('120km')
145±0.3ms 151±1ms 1.04 mpas_ocean.Integrate.time_integrate('120km')
18.7±0.3ms 18.1±0.1ms 0.97 mpas_ocean.Integrate.time_integrate('480km')
345±2ms 354±4ms 1.03 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude')
349±4ms 352±9ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include')
348±2ms 351±5ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split')
23.0±0.2ms 23.3±0.2ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude')
22.9±0.2ms 23.3±0.3ms 1.02 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include')
22.9±0.2ms 23.3±0.1ms 1.02 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split')
failed failed n/a mpas_ocean.PointInPolygon.time_face_search('120km')
failed failed n/a mpas_ocean.PointInPolygon.time_face_search('480km')
267±3ms 237±2ms ~0.89 mpas_ocean.RemapDownsample.time_bilinear_remapping
288±2ms 291±0.8ms 1.01 mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping
29.3±0.2ms 28.9±0.2ms 0.99 mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping
1.19±0.01s 1.18±0.01s 0.99 mpas_ocean.RemapUpsample.time_bilinear_remapping
51.3±0.2ms 51.2±0.4ms 1.00 mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping
26.2±0.2ms 26.4±0.2ms 1.01 mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping
28.1±0.2ms 28.3±0.4ms 1.01 mpas_ocean.ZonalAverage.time_zonal_average('120km')
5.65±0.06ms 5.51±0.03ms 0.98 mpas_ocean.ZonalAverage.time_zonal_average('480km')
215M 213M 0.99 quad_hexagon.QuadHexagon.peakmem_open_dataset
213M 213M 1.00 quad_hexagon.QuadHexagon.peakmem_open_grid
7.56±0.1ms 7.77±0.4ms 1.03 quad_hexagon.QuadHexagon.time_open_dataset
6.46±0.01ms 6.60±0.1ms 1.02 quad_hexagon.QuadHexagon.time_open_grid

@philipc2 philipc2 removed the run-benchmark Run ASV benchmark workflow label May 28, 2025
@aaronzedwick aaronzedwick marked this pull request as ready for review June 3, 2025 02:23
@aaronzedwick aaronzedwick requested review from rajeeja and philipc2 June 3, 2025 14:34
@rajeeja rajeeja added the scalability Related to scalability & performance efforts label Jun 10, 2025
@philipc2 philipc2 requested a review from Copilot July 15, 2025 20:32
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds parallelization to dual mesh construction and bilinear weight calculation functions to improve performance. The changes implement Numba's prange for parallel execution in computationally intensive loops.

  • Enables parallel processing in dual mesh face construction using prange
  • Adds parallelization to bilinear weight calculation functions
  • Improves numerical stability with bounds checking and error handling

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
uxarray/remap/bilinear.py Adds parallel execution to weight calculation loop and improves array indexing
uxarray/grid/dual.py Implements parallel dual mesh construction with enhanced numerical stability checks
test/test_grid.py Adds comprehensive tests for parallel dual mesh construction and validation
Comments suppressed due to low confidence (1)

test/test_grid.py:877

  • [nitpick] The 30-second timeout threshold for performance testing may be too lenient and could vary significantly across different hardware configurations. Consider using a more relative performance metric or making this configurable.
    assert avg_time < 30.0, f"Dual construction too slow: {avg_time:.2f}s"

if d_dot_norm > 1.0:
d_dot_norm = 1.0
# Clamp to valid range for arccos to avoid numerical errors
d_dot_norm = max(-1.0, min(1.0, d_dot_norm))
Copy link
Preview

Copilot AI Jul 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider using numpy.clip() instead of nested max/min calls for better readability and potential performance: 'd_dot_norm = np.clip(d_dot_norm, -1.0, 1.0)'

Suggested change
d_dot_norm = max(-1.0, min(1.0, d_dot_norm))
d_dot_norm = np.clip(d_dot_norm, -1.0, 1.0)

Copilot uses AI. Check for mistakes.

Comment on lines 163 to 166
for idx in prange(len(valid_idxs)):
fidx = int(face_indices[valid_idxs[idx], 0])
# bounds check
if fidx < 0 or fidx >= len(n_nodes_per_face):
Copy link
Preview

Copilot AI Jul 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bounds check using len(n_nodes_per_face) is called repeatedly inside the parallel loop. Consider storing the array length in a variable before the loop to avoid repeated function calls.

Suggested change
for idx in prange(len(valid_idxs)):
fidx = int(face_indices[valid_idxs[idx], 0])
# bounds check
if fidx < 0 or fidx >= len(n_nodes_per_face):
n_nodes_per_face_len = len(n_nodes_per_face)
for idx in prange(len(valid_idxs)):
fidx = int(face_indices[valid_idxs[idx], 0])
# bounds check
if fidx < 0 or fidx >= n_nodes_per_face_len:

Copilot uses AI. Check for mistakes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't len(n_nodes_per_face) equivalent to n_face?

@philipc2 philipc2 changed the title Update dual creation to use parallelization Parallelize Dual Grid Construction Jul 16, 2025
Copy link
Member

@philipc2 philipc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my i7 local machine for a 15km Grid

image

There's a noticeable speedup, but scaling is not great. I'm happy to move forward with this though.

Comment on lines 163 to 166
for idx in prange(len(valid_idxs)):
fidx = int(face_indices[valid_idxs[idx], 0])
# bounds check
if fidx < 0 or fidx >= len(n_nodes_per_face):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't len(n_nodes_per_face) equivalent to n_face?

@rajeeja rajeeja requested a review from philipc2 July 18, 2025 00:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scalability Related to scalability & performance efforts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallelize Dual Creation
4 participants