I am no expert in matrix computation.
When the output matrix is symmetric, skipping the entries in the lower-triangular part could speed up the computation by up to roughly a factor of two. This comes up a lot when you are looking for the best matches within a single set, i.e. a self dot-product.
For example, at line 100 of https://github.com/ing-bank/sparse_dot_topn/blob/master/sparse_dot_topn/sparse_dot_topn_source.cpp, could we add a check that skips entries where head < i, enabled by an additional boolean argument, symmetric?
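
To make the idea concrete, here is a minimal pure-Python sketch of a row-by-row CSR product that skips the lower triangle when a symmetric flag is set. It is not the library's code or API: the names sparse_dot and mirror, the dict-based output, and the placement of the check in the inner loop are all my assumptions, and top-n selection is left out entirely.

```python
def sparse_dot(a_indptr, a_indices, a_data,
               b_indptr, b_indices, b_data, symmetric=False):
    """Multiply two CSR matrices A and B, returning {(i, j): value}.

    Hypothetical sketch: when symmetric is True, entries with j < i are
    never accumulated, so only the upper triangle (and diagonal) is built.
    This is only valid when the caller knows A * B is symmetric,
    e.g. B is the transpose of A.
    """
    result = {}
    n_rows = len(a_indptr) - 1
    for i in range(n_rows):
        sums = {}  # partial dot products for row i, keyed by column j
        for jj in range(a_indptr[i], a_indptr[i + 1]):
            k = a_indices[jj]
            v = a_data[jj]
            for kk in range(b_indptr[k], b_indptr[k + 1]):
                j = b_indices[kk]
                if symmetric and j < i:
                    continue  # lower triangle: same value as entry (j, i)
                sums[j] = sums.get(j, 0.0) + v * b_data[kk]
        for j, s in sums.items():
            result[(i, j)] = s
    return result


def mirror(upper):
    """Rebuild the full symmetric result from its upper triangle."""
    full = dict(upper)
    for (i, j), s in upper.items():
        full[(j, i)] = s
    return full


# A = [[1, 0], [1, 1]] in CSR form, and its transpose A^T = [[1, 1], [0, 1]].
A = ([0, 1, 3], [0, 0, 1], [1.0, 1.0, 1.0])
AT = ([0, 2, 3], [0, 1, 1], [1.0, 1.0, 1.0])

full = sparse_dot(*A, *AT)
upper = sparse_dot(*A, *AT, symmetric=True)
```

Here mirror(upper) gives back the same entries as full. Two caveats I am not sure how to handle: if the check is applied only where results are collected rather than in the inner loop, the multiply-adds are still done and the saving is smaller; and with top-n per row, each row would then pick its best matches only among columns j >= i, so the result is not simply half of the full top-n output.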