Description
Currently, `stumpy.stump` and `stumpy.stumped` can account for constant regions. When two subsequences are compared and one of them is constant (while the other is not), the Pearson correlation is set to `0.5`. This means that the z-normalized Euclidean distance is `np.sqrt(np.abs(2 * m * (1 - 0.5)))`, or `np.sqrt(m)` (not sure if this is the best/correct choice; see proof below). When both subsequences are constant, the Pearson correlation is set to `1.0`, which means that the distance is `0.0`.
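The conventions above rest on the identity `d = sqrt(2 * m * (1 - pearson))`, which links the Pearson correlation of two non-constant subsequences to their z-normalized Euclidean distance when the standard deviation is taken with `ddof=0`. The sketch below checks that identity numerically on random data and then evaluates it at the two constant-region values. `pearson_to_distance` and `znorm` are hypothetical helpers written for illustration, not STUMPY functions.

```python
import numpy as np


def pearson_to_distance(pearson, m):
    # Hypothetical helper: z-normalized Euclidean distance implied by a
    # Pearson correlation for subsequences of length m
    return np.sqrt(np.abs(2 * m * (1 - pearson)))


def znorm(x):
    # Hypothetical helper: z-normalize using the population std (ddof=0)
    return (x - np.mean(x)) / np.std(x)


# For non-constant subsequences the correlation-to-distance mapping is exact
rng = np.random.default_rng(0)
m = 8
Q, S = rng.random(m), rng.random(m)
pearson = np.corrcoef(Q, S)[0, 1]
direct = np.linalg.norm(znorm(Q) - znorm(S))
assert np.isclose(direct, pearson_to_distance(pearson, m))

# Constant-region conventions: exactly one constant vs. both constant
one_constant = pearson_to_distance(0.5, m)   # equals sqrt(m)
both_constant = pearson_to_distance(1.0, m)  # equals 0.0
```

Note that `znorm` cannot be applied to a constant subsequence (its standard deviation is zero), which is exactly why a convention for the correlation is needed in that case.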
However, this is not accounted for in `stumpy.mstump`, `stumpy.mstumped`, `stumpy.gpu_stump`, `stumpy.mass`, or some of the `tests/naive.py` implementations. This may be something we should handle consistently everywhere.
changes to `_mass`
Replace the return statement with:
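```python
# Inside `_mass`, replacing the final `return` (requires `import math`)
distance_profile = calculate_distance_profile(m, QT, μ_Q, σ_Q, M_T, Σ_T)
if σ_Q == 0.0:
    # The query is constant: it is sqrt(m) away from every
    # non-constant subsequence...
    distance_profile[:] = math.sqrt(m)
    # ...and 0.0 away from every constant subsequence
    distance_profile[Σ_T == 0.0] = 0.0
else:
    # Only the subsequence from T is constant
    distance_profile[Σ_T == 0.0] = math.sqrt(m)
return distance_profile
```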
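Since some `tests/naive.py` implementations would also need to follow the same rules, a brute-force reference could compare the query against each window directly. The following is a minimal sketch under that assumption; `naive_distance_profile` is a hypothetical name and is not taken from STUMPY's test suite. It tests for constancy with an exact `== 0.0` on the standard deviation, mirroring the `_mass` change above; for floating-point inputs a tolerance-based check may be more robust.

```python
import math

import numpy as np


def naive_distance_profile(Q, T):
    """Hypothetical brute-force reference: z-normalized Euclidean distance
    from Q to every length-m subsequence of T, with constant-region rules."""
    Q = np.asarray(Q, dtype=np.float64)
    T = np.asarray(T, dtype=np.float64)
    m = Q.shape[0]
    n_subseqs = T.shape[0] - m + 1
    σ_Q = np.std(Q)
    D = np.empty(n_subseqs, dtype=np.float64)
    for i in range(n_subseqs):
        S = T[i : i + m]
        σ_S = np.std(S)
        if σ_Q == 0.0 and σ_S == 0.0:
            D[i] = 0.0  # both constant: treated as a perfect match
        elif σ_Q == 0.0 or σ_S == 0.0:
            D[i] = math.sqrt(m)  # exactly one constant: pearson = 0.5
        else:
            # Neither constant: ordinary z-normalized Euclidean distance
            Q_z = (Q - np.mean(Q)) / σ_Q
            S_z = (S - np.mean(S)) / σ_S
            D[i] = np.linalg.norm(Q_z - S_z)
    return D
```

For example, `naive_distance_profile([5, 5, 5], [0, 0, 0, 1, 2, 3])` gives `0.0` for the first (constant) window and `sqrt(3)` for the remaining three.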
This paper may offer some direction/insight as to how best to handle this.