Parallelize Dual Grid Construction #1274

aaronzedwick · 2025-05-26T14:51:34Z

Closes #1277

Overview

This adds parallelization to numba functions within the dual function and within the bilinear weights calculation.

PR Checklist

General

An issue is linked created and linked
Add appropriate labels
Filled out Overview and Expected Usage (if applicable) sections

Testing

Adequate tests are created if there is new functionality
Tests cover all possible logical paths in your function
Tests are not too basic (such as simply calling a function and nothing else)

Documentation

Docstrings have updated with any function changes

github-actions · 2025-05-26T15:14:22Z

ASV Benchmarking

Benchmark Comparison Results

Benchmarks that have improved:

Change	Before [`d27bbfc`]	After [`d37a581`]	Ratio	Benchmark (Parameter)
-	776M	345M	0.45	face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
	failed	754M	n/a	face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
-	50.6±1ms	25.4±0.1ms	0.50	mpas_ocean.DualMesh.time_dual_mesh_construction('120km')
-	5.02±0.2ms	3.89±0.1ms	0.78	mpas_ocean.DualMesh.time_dual_mesh_construction('480km')
-	377M	297M	0.79	mpas_ocean.Integrate.peakmem_integrate('480km')

Benchmarks that have stayed the same:

Before [`d27bbfc`]	After [`d37a581`]	Ratio	Benchmark (Parameter)
343M	344M	1.00	face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
375M	374M	1.00	face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
19.0±0.02ms	19.1±0.2ms	1.00	face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
3.33±0.03ms	3.31±0.03ms	1.00	face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
17.1±0.06ms	17.2±0.05ms	1.01	face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
2.10±0ms	2.07±0.02ms	0.99	face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
2.11±0.05s	2.08±0.03s	0.99	import.Imports.timeraw_import_uxarray
855±7μs	872±10μs	1.02	mpas_ocean.CheckNorm.time_check_norm('120km')
694±20μs	672±10μs	0.97	mpas_ocean.CheckNorm.time_check_norm('480km')
662±6ms	669±5ms	1.01	mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km')
43.2±0.9ms	43.2±0.3ms	1.00	mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km')
637±10μs	622±6μs	0.98	mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km')
516±7μs	507±5μs	0.98	mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km')
6.44±0.09ms	6.39±0.07ms	0.99	mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('120km')
4.73±0.02ms	4.77±0.05ms	1.01	mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('480km')
3.58±0.04s	3.53±0.02s	0.99	mpas_ocean.ConstructFaceLatLon.time_welzl('120km')
224±0.8ms	225±3ms	1.00	mpas_ocean.ConstructFaceLatLon.time_welzl('480km')
18.1±0.02ms	18.1±0.03ms	1.00	mpas_ocean.ConstructTreeStructures.time_ball_tree('120km')
959±10μs	964±5μs	1.01	mpas_ocean.ConstructTreeStructures.time_ball_tree('480km')
11.2±0.07ms	11.3±0.07ms	1.01	mpas_ocean.ConstructTreeStructures.time_kd_tree('120km')
1.16±0.01ms	1.16±0.01ms	1.00	mpas_ocean.ConstructTreeStructures.time_kd_tree('480km')
547±6ms	549±4ms	1.00	mpas_ocean.CrossSections.time_const_lat('120km', 1)
274±1ms	274±1ms	1.00	mpas_ocean.CrossSections.time_const_lat('120km', 2)
142±0.5ms	143±1ms	1.01	mpas_ocean.CrossSections.time_const_lat('120km', 4)
426±5ms	420±2ms	0.99	mpas_ocean.CrossSections.time_const_lat('480km', 1)
216±4ms	213±1ms	0.99	mpas_ocean.CrossSections.time_const_lat('480km', 2)
111±2ms	112±0.8ms	1.01	mpas_ocean.CrossSections.time_const_lat('480km', 4)
915±7ms	919±6ms	1.00	mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False)
53.5±0.8ms	52.7±0.3ms	0.99	mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True)
74.6±1ms	73.9±0.2ms	0.99	mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False)
5.44±0.07ms	5.30±0.1ms	0.97	mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True)
233M	233M	1.00	mpas_ocean.Gradient.peakmem_gradient('120km')
217M	217M	1.00	mpas_ocean.Gradient.peakmem_gradient('480km')
2.73±0.02ms	2.78±0.03ms	1.02	mpas_ocean.Gradient.time_gradient('120km')
310±1μs	311±2μs	1.00	mpas_ocean.Gradient.time_gradient('480km')
207±3μs	206±2μs	0.99	mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('120km')
118±1μs	118±1μs	1.00	mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('480km')
309M	310M	1.00	mpas_ocean.Integrate.peakmem_integrate('120km')
145±0.3ms	151±1ms	1.04	mpas_ocean.Integrate.time_integrate('120km')
18.7±0.3ms	18.1±0.1ms	0.97	mpas_ocean.Integrate.time_integrate('480km')
345±2ms	354±4ms	1.03	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude')
349±4ms	352±9ms	1.01	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include')
348±2ms	351±5ms	1.01	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split')
23.0±0.2ms	23.3±0.2ms	1.01	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude')
22.9±0.2ms	23.3±0.3ms	1.02	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include')
22.9±0.2ms	23.3±0.1ms	1.02	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split')
failed	failed	n/a	mpas_ocean.PointInPolygon.time_face_search('120km')
failed	failed	n/a	mpas_ocean.PointInPolygon.time_face_search('480km')
267±3ms	237±2ms	~0.89	mpas_ocean.RemapDownsample.time_bilinear_remapping
288±2ms	291±0.8ms	1.01	mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping
29.3±0.2ms	28.9±0.2ms	0.99	mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping
1.19±0.01s	1.18±0.01s	0.99	mpas_ocean.RemapUpsample.time_bilinear_remapping
51.3±0.2ms	51.2±0.4ms	1.00	mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping
26.2±0.2ms	26.4±0.2ms	1.01	mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping
28.1±0.2ms	28.3±0.4ms	1.01	mpas_ocean.ZonalAverage.time_zonal_average('120km')
5.65±0.06ms	5.51±0.03ms	0.98	mpas_ocean.ZonalAverage.time_zonal_average('480km')
215M	213M	0.99	quad_hexagon.QuadHexagon.peakmem_open_dataset
213M	213M	1.00	quad_hexagon.QuadHexagon.peakmem_open_grid
7.56±0.1ms	7.77±0.4ms	1.03	quad_hexagon.QuadHexagon.time_open_dataset
6.46±0.01ms	6.60±0.1ms	1.02	quad_hexagon.QuadHexagon.time_open_grid

…lity improvements

Copilot

Pull Request Overview

This PR adds parallelization to dual mesh construction and bilinear weight calculation functions to improve performance. The changes implement Numba's prange for parallel execution in computationally intensive loops.

Enables parallel processing in dual mesh face construction using prange
Adds parallelization to bilinear weight calculation functions
Improves numerical stability with bounds checking and error handling

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
uxarray/remap/bilinear.py	Adds parallel execution to weight calculation loop and improves array indexing
uxarray/grid/dual.py	Implements parallel dual mesh construction with enhanced numerical stability checks
test/test_grid.py	Adds comprehensive tests for parallel dual mesh construction and validation

Comments suppressed due to low confidence (1)

test/test_grid.py:877

[nitpick] The 30-second timeout threshold for performance testing may be too lenient and could vary significantly across different hardware configurations. Consider using a more relative performance metric or making this configurable.

    assert avg_time < 30.0, f"Dual construction too slow: {avg_time:.2f}s"

uxarray/grid/dual.py

Copilot · 2025-07-15T20:33:19Z

uxarray/grid/dual.py

-            if d_dot_norm > 1.0:
-                d_dot_norm = 1.0
+            # Clamp to valid range for arccos to avoid numerical errors
+            d_dot_norm = max(-1.0, min(1.0, d_dot_norm))


[nitpick] Consider using numpy.clip() instead of nested max/min calls for better readability and potential performance: 'd_dot_norm = np.clip(d_dot_norm, -1.0, 1.0)'

Suggested change

d_dot_norm = max(-1.0, min(1.0, d_dot_norm))

d_dot_norm = np.clip(d_dot_norm, -1.0, 1.0)

Copilot · 2025-07-15T20:33:19Z

uxarray/remap/bilinear.py

+    for idx in prange(len(valid_idxs)):
+        fidx = int(face_indices[valid_idxs[idx], 0])
+        # bounds check
+        if fidx < 0 or fidx >= len(n_nodes_per_face):


The bounds check using len(n_nodes_per_face) is called repeatedly inside the parallel loop. Consider storing the array length in a variable before the loop to avoid repeated function calls.

Suggested change

for idx in prange(len(valid_idxs)):

fidx = int(face_indices[valid_idxs[idx], 0])

# bounds check

if fidx < 0 or fidx >= len(n_nodes_per_face):

n_nodes_per_face_len = len(n_nodes_per_face)

for idx in prange(len(valid_idxs)):

fidx = int(face_indices[valid_idxs[idx], 0])

# bounds check

if fidx < 0 or fidx >= n_nodes_per_face_len:

Isn't len(n_nodes_per_face) equivalent to n_face?

philipc2

On my i7 local machine for a 15km Grid

There's a noticeable speedup, but scaling is not great. I'm happy to move forward with this though.

philipc2 · 2025-07-16T15:43:51Z

uxarray/remap/bilinear.py

+    for idx in prange(len(valid_idxs)):
+        fidx = int(face_indices[valid_idxs[idx], 0])
+        # bounds check
+        if fidx < 0 or fidx >= len(n_nodes_per_face):


Isn't len(n_nodes_per_face) equivalent to n_face?

test/test_grid.py

…ion call entirely in the Numba-compiled loop

Update dual to use parallelization

4152eff

aaronzedwick added the run-benchmark Run ASV benchmark workflow label May 26, 2025

updated doc strings, precommit

109fee0

Added parallelization to bilinear _calculate_weights

9e9c2e4

rajeeja mentioned this pull request May 28, 2025

Parallelize Dual Creation #1277

Open

Merge branch 'main' into zedwick/parallelize_dual

e69bd5e

philipc2 removed the run-benchmark Run ASV benchmark workflow label May 28, 2025

Merge branch 'main' into zedwick/parallelize_dual

e012c05

aaronzedwick marked this pull request as ready for review June 3, 2025 02:23

aaronzedwick requested review from rajeeja and philipc2 June 3, 2025 14:34

Merge branch 'main' into zedwick/parallelize_dual

432baf4

rajeeja added the scalability Related to scalability & performance efforts label Jun 10, 2025

rajeeja and others added 4 commits June 13, 2025 16:10

Merge branch 'main' into zedwick/parallelize_dual

aa76ac3

o Enhance dual mesh construction with comprehensive testing and stabi…

9dec036

…lity improvements

o Add minor bounds check for calc wts fn

98a1efb

Merge branch 'main' into zedwick/parallelize_dual

562b8d1

philipc2 requested a review from Copilot July 15, 2025 20:32

Copilot AI reviewed Jul 15, 2025

View reviewed changes

philipc2 changed the title ~~Update dual creation to use parallelization~~ Parallelize Dual Grid Construction Jul 16, 2025

philipc2 reviewed Jul 16, 2025

View reviewed changes

philipc2 requested changes Jul 16, 2025

View reviewed changes

philipc2 assigned rajeeja Jul 16, 2025

philipc2 and others added 4 commits July 16, 2025 15:03

Merge branch 'main' into zedwick/parallelize_dual

6889048

o Remove parallel tests

4620d5f

o don't change test_grid.py

58f7e5e

o Pass n_face to _calculate_weights function to avoid the len() funct…

ce7b6ad

…ion call entirely in the Numba-compiled loop

rajeeja requested a review from philipc2 July 18, 2025 00:13

Merge branch 'main' into zedwick/parallelize_dual

fbdb373

rajeeja approved these changes Jul 22, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Parallelize Dual Grid Construction #1274

Parallelize Dual Grid Construction #1274

Uh oh!

aaronzedwick commented May 26, 2025 •

edited

Loading

Uh oh!

github-actions bot commented May 26, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI Jul 15, 2025

Uh oh!

Copilot AI Jul 15, 2025

Uh oh!

philipc2 Jul 16, 2025

Uh oh!

philipc2 left a comment

Uh oh!

philipc2 Jul 16, 2025

Uh oh!

Uh oh!

Uh oh!

	d_dot_norm = max(-1.0, min(1.0, d_dot_norm))
	d_dot_norm = np.clip(d_dot_norm, -1.0, 1.0)

Parallelize Dual Grid Construction #1274

Are you sure you want to change the base?

Parallelize Dual Grid Construction #1274

Uh oh!

Conversation

aaronzedwick commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

PR Checklist

Uh oh!

github-actions bot commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ASV Benchmarking

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Copilot AI Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

philipc2 Jul 16, 2025

Choose a reason for hiding this comment

Uh oh!

philipc2 left a comment

Choose a reason for hiding this comment

Uh oh!

philipc2 Jul 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

aaronzedwick commented May 26, 2025 •

edited

Loading

github-actions bot commented May 26, 2025 •

edited

Loading