✨ feat: Support parallel computation for NDT#6376
✨ feat: Support parallel computation for NDT#6376mvieth merged 7 commits intoPointCloudLibrary:masterfrom
Conversation
Support parallel computation for NDT by OpenMP with different neighbor search method
Use signed std::ptrdiff_t instead of std::size_t
There was a problem hiding this comment.
Pull request overview
This PR adds OpenMP-based parallel computation support to the Normal Distributions Transform (NDT) registration algorithm, along with multiple neighbor search methods to optimize performance. The changes introduce thread-level parallelization to speed up the derivative computation phase of NDT.
Key changes:
- Adds a
NeighborSearchMethodenum with four options: KDTREE (original), DIRECT26, DIRECT7, and DIRECT1 - Implements
setNumberOfThreads()method with OpenMP support for parallel execution - Parallelizes the main computation loop in
computeDerivatives()using OpenMP directives - Updates test to verify functionality with the new API
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| registration/include/pcl/registration/ndt.h | Adds enum for neighbor search methods, declares thread control API and private member variables for threading configuration |
| registration/include/pcl/registration/impl/ndt.hpp | Implements setNumberOfThreads() with OpenMP detection, parallelizes derivative computation loop, adds switch statement for different search methods |
| test/registration/test_ndt.cpp | Updates test to explicitly configure search method and thread count for validation |
Comments suppressed due to low confidence (1)
registration/include/pcl/registration/impl/ndt.hpp:269
- The OpenMP parallel region has a critical thread safety issue. The variables
score,score_gradient, andhessianare marked as shared, but multiple threads will update them concurrently (viascore +=on line 266 and calls toupdateDerivativeswhich modifiesscore_gradientandhessian). These concurrent updates to shared variables will cause race conditions, leading to incorrect results. You need to use reduction clauses forscoreand critical sections or atomic operations forscore_gradientandhessian, or alternatively use thread-local accumulators that are combined after the parallel region.
#pragma omp parallel for num_threads(threads_) schedule(dynamic, 32) \
shared(score, score_gradient, hessian) firstprivate(neighborhood, distances)
// Update gradient and hessian for each point, line 17 in Algorithm 2 [Magnusson 2009]
for (std::ptrdiff_t idx = 0; idx < static_cast<std::ptrdiff_t>(input_->size());
idx++) {
// Transformed Point
const auto& x_trans_pt = trans_cloud[idx];
// Find neighbors with different search method
switch (search_method_) {
case KDTREE:
// Radius search has been experimentally faster than direct neighbor checking.
target_cells_.radiusSearch(x_trans_pt, resolution_, neighborhood, distances);
break;
case DIRECT26:
target_cells_.getNeighborhoodAtPoint(x_trans_pt, neighborhood);
break;
case DIRECT7:
target_cells_.getFaceNeighborsAtPoint(x_trans_pt, neighborhood);
break;
case DIRECT1:
target_cells_.getVoxelAtPoint(x_trans_pt, neighborhood);
break;
}
// Original Point
const Eigen::Vector3d x = (*input_)[idx].getVector3fMap().template cast<double>();
const Eigen::Vector3d x_trans_position =
x_trans_pt.getVector3fMap().template cast<double>();
for (const auto& cell : neighborhood) {
// Denorm point, x_k' in Equations 6.12 and 6.13 [Magnusson 2009]
const Eigen::Vector3d x_trans = x_trans_position - cell->getMean();
// Inverse Covariance of Occupied Voxel
// Uses precomputed covariance for speed.
const Eigen::Matrix3d c_inv = cell->getInverseCov();
// Compute derivative of transform function w.r.t. transform vector, J_E and H_E
// in Equations 6.18 and 6.20 [Magnusson 2009]
computePointDerivatives(x);
// Update score, gradient and hessian, lines 19-21 in Algorithm 2, according to
// Equations 6.10, 6.12 and 6.13, respectively [Magnusson 2009]
score +=
updateDerivatives(score_gradient, hessian, x_trans, c_inv, compute_hessian);
}
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Apply copilot suggestion to fix some error
mvieth
left a comment
There was a problem hiding this comment.
You have to at least put a omp critical region around computePointDerivatives(x) and updateDerivatives(...) because they both use global variables.
Also maybe put NeighborSearchMethod into the NormalDistributionTransform class, to make it clear that they belong together?
- Added the switch to select different search methods to computeHessian() as well (would not make sense to use different search methods in computeDerivatives() and computeHessian()) - Added overloads to functions computePointDerivatives(), updateDerivatives(), and updateHessian() to be able to use local variables instead of the global variables point_jacobian_ and point_hessian_ - Change the parallelized region in computeDerivatives() by using the new function overloads, and by introducing a custom reduction (goal: compatibility with OpenMP 2.0). Synchronization overhead between threads is minimal, with only two critical regions at the end to accumulate, score, gradient, and hessian - Tested with two larger point clouds, and there is indeed a nice speed-up when increasing the number of threads (however not a perfect 1-to-1 speedup because there are still other functions in NDT that are not parallelized)
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
I looked a bit more into this, and I realized that my earlier suggestion to put an omp critical region around I also added the switch to select different search methods to computeHessian() as well (would not make sense to use different search methods in computeDerivatives() and computeHessian()). |
Support parallel computation for NDT by OpenMP with different neighbor search method