Handle Constant Regions Consistently #591

@seanlaw

Currently, `stumpy.stump` and `stumpy.stumped` account for constant regions. When two subsequences are compared and exactly one of them is constant, the Pearson correlation is set to 0.5, so the z-normalized Euclidean distance is `np.sqrt(np.abs(2 * m * (1 - 0.5)))`, i.e. `np.sqrt(m)` (not sure if this is the best/correct choice; see proof below). When both subsequences are constant, the Pearson correlation is set to 1.0, so the distance is 0.0.
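To make the arithmetic concrete, here is a minimal sketch of the correlation-to-distance conversion quoted above. The helper name `pearson_to_znorm_distance` is hypothetical and is not part of the STUMPY API; it only wraps the expression from the description.

```python
import numpy as np


def pearson_to_znorm_distance(rho, m):
    """Convert a Pearson correlation into a z-normalized Euclidean distance.

    Hypothetical helper: it wraps the expression quoted in the issue,
    sqrt(|2 * m * (1 - rho)|), for a subsequence (window) length m.
    """
    return np.sqrt(np.abs(2 * m * (1 - rho)))


m = 16
# Exactly one subsequence is constant: rho is set to 0.5, distance is sqrt(m)
one_constant = pearson_to_znorm_distance(0.5, m)
# Both subsequences are constant: rho is set to 1.0, distance is 0.0
both_constant = pearson_to_znorm_distance(1.0, m)
print(one_constant, both_constant)
```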

However, this is not accounted for in `stumpy.mstump`, `stumpy.mstumped`, `stumpy.gpu_stump`, `stumpy.mass`, or some of the `tests/naive.py` implementations. We should consider handling this consistently everywhere.

Changes to `_mass`

Replace the return statement with:

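# Requires `import math` at module level
distance_profile = calculate_distance_profile(m, QT, μ_Q, σ_Q, M_T, Σ_T)
if σ_Q == 0.0:
    # Q is constant: distance is sqrt(m) to every non-constant subsequence of T
    distance_profile[:] = math.sqrt(m)
    # Both subsequences are constant
    distance_profile[Σ_T == 0.0] = 0.0
else:
    # Only the subsequence of T is constant
    distance_profile[Σ_T == 0.0] = math.sqrt(m)

return distance_profile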
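For checking any such change, a brute-force reference can help. The following is only a sketch and is not the actual `tests/naive.py` code: the function name `naive_distance_profile` is hypothetical, and the exact `== 0.0` comparison on the standard deviation is an assumption carried over from the snippet above (a small threshold may be more robust for noisy data).

```python
import math

import numpy as np


def naive_distance_profile(Q, T):
    """Brute-force z-normalized distance profile with constant-region rules.

    Hypothetical reference implementation:
      * both subsequences constant -> distance 0.0
      * exactly one constant       -> distance sqrt(m)
      * otherwise                  -> ordinary z-normalized Euclidean distance
    """
    Q = np.asarray(Q, dtype=np.float64)
    T = np.asarray(T, dtype=np.float64)
    m = Q.shape[0]
    n_subseq = T.shape[0] - m + 1
    D = np.empty(n_subseq, dtype=np.float64)
    σ_Q = Q.std()
    for i in range(n_subseq):
        S = T[i : i + m]
        σ_S = S.std()
        if σ_Q == 0.0 and σ_S == 0.0:
            D[i] = 0.0
        elif σ_Q == 0.0 or σ_S == 0.0:
            D[i] = math.sqrt(m)
        else:
            Q_norm = (Q - Q.mean()) / σ_Q
            S_norm = (S - S.mean()) / σ_S
            D[i] = np.linalg.norm(Q_norm - S_norm)
    return D


T = [1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 5.0, 4.0]
D_const_query = naive_distance_profile([2.0, 2.0, 2.0], T)  # constant query
D_ramp_query = naive_distance_profile([1.0, 2.0, 3.0], T)   # non-constant query
```

With the constant query, the two constant windows at the start of `T` get distance 0.0 and every other window gets `sqrt(3)`; with the non-constant query, those same two windows get `sqrt(3)` instead.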

This paper may offer some direction/insight as to how best to handle this.

Labels: enhancement (New feature or request)
