Develop single-dimensional matrix profile for multi-dimensional time series

**Motivation**

I have noticed that there are some interest in using matrix profile in multi-dimensional time series. We can think of it as:

1. Multi-Dimensional Matrix Profile of Multi-Dimensional time series
2. Single-Dimensional Matrix Profile of Multi-Dimensional time series

I believe the first one is tackled in [Matrix Profile VI](https://www.cs.ucr.edu/~eamonn/Motif_Discovery_ICDM.pdf), and its tutorial is under progress in PR #557 . 

In the paper [Matrix Profile VI](https://www.cs.ucr.edu/~eamonn/Motif_Discovery_ICDM.pdf), we can see the following note:
![image](https://user-images.githubusercontent.com/38519522/176251134-68479b9f-c6a9-469b-b045-c97a5f38c42f.png)

There are a few things to notice:
* So, it seems generalizing single-dimensional matrix profile works good for multi-dimensional data, particularly if the dimension is low (like 2 or 3) and if there is no irrelevant data there.
* The paper claims that if we generalize the concept to single-dimensional matrix profile, we can find the same motifs that were discovered before when matrix profile used for each time series. Well, in my opinion, it  is hard to believe that the statement is true in all data sets. I mean, maybe having single-dimensional matrix profile gives us new information. (note that we should NOT simply add their matrix profiles to get one-dimensional matrix profile)

**So, I think it is worth it to have support for  single-dimensional matrix profile for multi-dimensional time series data.**

----

**Challenge-(I)**
(this is based on what I read about generalizing single-dimensional matrix profile to multi-dimensional time series data. I do not remember the source though. I think it was from Eamonn Keogh.)

One of the main challenges is how to combine m distances across m-dimensional time series data.

Example:
Let's say we have two-dimensional data: T1 (first dimension) and T2 (second dimension). And, let's focus on subsequences at index i and j. Therefore: 
```
Si_1 = T1[i: i+m]; Si_2=T2[i: i+m]
Sj_1 = T1[j: j+m]; Sj_2=T2[j: j+m]

d1 = dist(Si_1, Sj_1)
d2 = dist(Si_2, Sj_2)

D = f(d1, d2) # D: total distance between i and j
```

But, what is that function `f` for combining the two distances `d1` and `d2` (and gives one single value)? Two common options are:
* f(d, d') = d + d'
* f(d, d') = $(d^{p} + d '^{p})^{1/p}$ (p-norm) 

This two are equal only when `p=1`. Otherwise, they are different. Both seems reasonable approach. However, we can go with the second approach and provide a module for that in `stumpy` (see section "implementation" below)


**Challenge-(II)**
Another challenge that I remember I read from the same source was how to avoid the domination of one dimension? well, this is probably an issue in non-normalized version. However, like many machine learning problems, it is usually up to the user on how to normalize time series. Maybe they normalize `T1` and `T2` by their maximum. Or, they may standardize the WHOLE data `T1` and the WHOLE  data `T2`. So, I believe this shouldn't be our concern.

-----

**Implementation**
I think we can easily do $(d^{p} + d '^{p})^{1/p}$ for p-norm non-normalized matrix profile. The challenge might be in using Pearson correlation in normalized version. However, there is a nice solution for that!

$D^2 = d ^ 2 + d' ^ 2 = 2m (1-\rho) + 2m(1-\rho^{\prime}) = 2m(2 - (\rho +  \rho^{\prime})) = 2[2m(1 - avg(\rho, \rho^{\prime})]$
Note that the factor 2 can be eliminated because if we scale all pairwise distances by the same number, it does not change the result.

Therefore:
$D = \sqrt{2m ( 1 - P)}$, where P is average of pearsons. So, we can use all those rolling/running variance stuff and simply just take average of pearsons! 

Cool!


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Develop single-dimensional matrix profile for multi-dimensional time series #632

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Develop single-dimensional matrix profile for multi-dimensional time series #632

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions