
Consensus Motifs: Ostinato algorithm; most central motif; reproduce Fig 1, 2, 9 from paper Matrix Profile XV #279


Merged (26 commits, Dec 1, 2020)

Conversation

refactoriel
Contributor

@refactoriel refactoriel commented Nov 4, 2020

This uses the mtDNA example. Ostinato was transcribed from Table 2 of the paper.
Performance is really slow.

resolves #277, resolves #278

Pull Request Checklist

Below is a simple checklist but please do not hesitate to ask for assistance!

  • Fork, clone, and checkout the newest version of the code
  • Create a new branch
  • Make necessary code changes
  • Install black (i.e., python -m pip install black or conda install black)
  • Install flake8 (i.e., python -m pip install flake8 or conda install flake8)
  • Install pytest-cov (i.e., python -m pip install pytest-cov or conda install pytest-cov)
  • Run black . in the root stumpy directory
  • Run flake8 . in the root stumpy directory
  • Run ./setup.sh && ./test.sh in the root stumpy directory
  • Reference a Github issue (and create one if one doesn't already exist)


@refactoriel
Contributor Author

Hi @seanlaw,

here's a first draft that reproduces figure 9 of the paper. It's of course still very rough around the edges, not deserving the name 'tutorial' yet. I simply transcribed the pseudocode in table 2. Somehow it's really slow. I think this is because of the repeated calls to the MASS algorithm.
I'll look at their Matlab implementation to see if there's an obvious way to speed up the function.

@refactoriel
Contributor Author

OK, in the Matlab implementation they actually use something very close to the MASS2 algorithm, which is an optimised version of MASS that uses frequency-domain multiplication (i.e., FFTs). Instead of repeatedly calculating the FFT of each time series T^i, they precompute it once.
I just tried computing the distance profiles repeatedly, borrowing some code from mass-ts, and got a substantial speedup. That repo's code is under the Apache licence, so it should be OK to use.
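For reference, a rough sketch of the idea (the function names here are just illustrative, not the mass-ts code itself):

import numpy as np

def precompute_T_fft(T):
    # Compute the FFT of T once, zero-padded to 2 * len(T), so it can be reused
    n = len(T)
    return np.fft.rfft(T, 2 * n)

def sliding_dot_product_with_fft(Q, T_fft, n):
    # MASS2-style sliding dot product: only the reversed, zero-padded query is
    # transformed per call, while the precomputed FFT of T is reused
    m = len(Q)
    Qr = np.zeros(2 * n)
    Qr[:m] = Q[::-1]
    QT = np.fft.irfft(np.fft.rfft(Qr) * T_fft)
    return QT[m - 1 : n]  # dot products of Q with every length-m subsequence of T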

@seanlaw
Contributor

seanlaw commented Nov 4, 2020

@DanBenHa Firstly, thank you for taking the time to put this together! You've done a wonderful job! I have a tendency to be rather direct in my feedback, so just know that my comments are meant to be constructive and open for discussion/push back. I really appreciate everything that you've done!

Regarding the slowness, I don't see anything particularly obvious. In fact, when I look at the length of each time series:

python 17618
hippo 16408
red_flying_fox 16741
alpaca 16653

It looks like it is actually faster to concatenate all four time series (length of 67420 ) and find the consensus sequence the brute force way. It took me ~15 seconds on my laptop. I'm not suggesting that we do this but just sharing an observation. You're probably right that the slowness is repeatedly calling mass. To be explicitly clear, it isn't that mass is slow, it's that we are calling it so many times and inside of a triply-nested for-loop.

Okay, I just noticed that you are calling stumpy.core.mass_absolute() (this returns non-normalized Euclidean distances). Shouldn't this be calling stumpy.core.mass() instead (this returns z-normalized Euclidean distances computed via FFT, this is the distance that is being computed by stumpy.stump)? The mass_absolute function may not be parallelized (this can be fixed). I think you'll see the speed-up with stumpy.core.mass() and this will produce the correct radii (unlike stumpy.core.mass_absolute).

I find it really strange that, in their paper, bsf_rad is initially set to np.inf. This means that it will take a long time before bsf_rad is first updated (basically, you must run through and complete the inner-most for-loop at least one time before bsf_rad is updated). I wonder what would happen if you had initialized bsf_rad with the known best radius. Then, it should be much faster?

So, I remember reading in one of the papers that when a subsequence of length m is z-normalized (i.e., standardized), then there is a clear upper bound of what the maximum distance can be (see the paragraph directly above Table 3 of this paper). Essentially:

We do have an upper bound as to the largest possible distance for time series of length L, it is simply the largest possible distance between any pair of subsequences of length L, which is 2 * sqrt(L)

Or, in our case, I think we can/should initialize bsf_rad = 2 * np.sqrt(m) instead of np.inf. I don't know if it will change anything but it is certainly a better boundary condition. What do you think?
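As a quick sanity check of that bound (assuming z-normalization with the population standard deviation), something like:

import numpy as np

m = 1000  # example subsequence length
rng = np.random.default_rng(0)
x = rng.random(m)
y = rng.random(m)
x = (x - x.mean()) / x.std()  # z-normalize
y = (y - y.mean()) / y.std()
# ||x - y||**2 = 2*m - 2*np.dot(x, y) <= 4*m, so the distance is at most 2*sqrt(m)
assert np.linalg.norm(x - y) <= 2 * np.sqrt(m)

bsf_rad = 2 * np.sqrt(m)  # tighter starting radius than np.inf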

@seanlaw
Contributor

seanlaw commented Nov 4, 2020

Below are some additional comments for you to consider:

  1. I understand that it is nice to have a progress bar but I think it is distracting in a tutorial and it makes the for-loop inside of the ostinato function a bit hard to digest/read. I'd prefer not to use tqdm as it adds another unnecessary dependency to the notebook. Would you mind removing this?
  2. Considering that STUMPY only supports Python 3.6+ (where dictionaries are always ordered), I don't think it makes sense to explicitly wrap the data into an OrderedDict as order already comes natively to all dictionaries.
  3. I see that you are calling a function to retrieve the data but I think this is only valuable if the function was called multiple times. I think we should just break it out of the function and simply write:
import io
import ssl
import urllib.request
import zipfile

from scipy.io import loadmat

context = ssl.SSLContext()  # Ignore SSL certificate verification for simplicity
T_url = 'https://sites.google.com/site/consensusmotifs/dna.zip?attredirects=0&d=1'
T_raw_bytes = urllib.request.urlopen(T_url, context=context).read()
T_data = io.BytesIO(T_raw_bytes)
T_zipfile = zipfile.ZipFile(T_data)
animals = ['python', 'hippo', 'red_flying_fox', 'alpaca']
data = {}
for animal in animals:
    with T_zipfile.open(f'dna/data/{animal}.mat') as f:
        data[animal] = loadmat(f)['ts'].flatten().astype(float)

colors = {'python': 'tab:blue', 'hippo': 'tab:green', 'red_flying_fox': 'tab:purple', 'alpaca': 'tab:red'}

What do you think?

@refactoriel
Contributor Author

@DanBenHa Firstly, thank you for taking the time to put this together! You've done a wonderful job! I have a tendency to be rather direct in my feedback, so just know that my comments are meant to be constructive and open for discussion/push back. I really appreciate everything that you've done!

No offense taken. I'm here to learn.

It looks like it is actually faster to concatenate all four time series (length of 67420 ) and find the consensus sequence the brute force way. It took me ~15 seconds on my laptop. I'm not suggesting that we do this but just sharing an observation. You're probably right that the slowness is repeatedly calling mass. To be explicitly clear, it isn't that mass is slow, it's that we are calling it so many times and inside of a triply-nested for-loop.

How did you do it? Compute the matrix profile for the concatenated sequence and then run mass four times on the individual time series?
I just uploaded a much faster version which takes around 16 seconds.

Okay, I just noticed that you are calling stumpy.core.mass_absolute() (this returns non-normalized Euclidean distances). Shouldn't this be calling stumpy.core.mass() instead (this returns z-normalized Euclidean distances computed via FFT, this is the distance that is being computed by stumpy.stump)? The mass_absolute function may not be parallelized (this can be fixed). I think you'll see the speed-up with stumpy.core.mass() and this will produce the correct radii (unlike stumpy.core.mass_absolute).

For me mass_absolute was actually faster than mass, so that's why I used it. But you're right, we should probably use the z-normalised distances.

Or, in our case, I think we can/should initialize bsf_rad = 2 * np.sqrt(m) instead of np.inf. I don't know if it will change anything but it is certainly a better boundary condition. What do you think?

Yes, that's what they did in the Matlab implementation, too. Doesn't seem to speed up the calculation, though.

1. I understand that it is nice to have a progress bar but I think it is distracting in a tutorial and it makes the `for-loop` inside of the `ostinato` function a bit hard to digest/read. I'd prefer not to use `tqdm` as it adds another unnecessary dependency to the notebook. Would you mind removing this?

Sure. It was just for me because the naive implementation took so long to compute ;)

2. Considering that STUMPY only supports Python 3.6+ (where dictionaries are always ordered), I don't think it makes sense to explicitly wrap the data into an `OrderedDict` as order already comes natively to all dictionaries.

I wasn't aware of that. Thanks.

3. I see that you are calling a function to retrieve the data but I think this is only valuable if the function was called multiple times. I think we should just break it out of the function and simply write:

Done :)

So, I'm not sure if this is actually worth pursuing any further if your brute-force approach is so fast. I'll test it with more/longer time-series tomorrow. Maybe it scales better?
Other than that, the k of P variant (so not having to match all time-series) might be valuable. Or would your brute-force approach work here as well?

@seanlaw
Contributor

seanlaw commented Nov 4, 2020

How did you do it? Compute the matrix profile for the concatenated sequence and then run mass four times on the individual time series?

Sorry, I should've been a bit more explicit. My "15 seconds" was simply calling stumpy.stump() on the concatenated time series and nothing more. It served as a rough approximation.

I just uploaded a much faster version which takes around 16 seconds.

Honestly, from what I can tell, you aren't doing anything that isn't already implemented in mass. Of course, mass also does a few really nice convenience things before computing the distance profile like:

  1. Makes a copy of your inputs so that you don't accidentally overwrite the original inputs (this is important when you have np.nan/np.inf values present)
  2. It automatically computes the means/stddevs (or you can provide pre-computed values) - means/stddevs are computed differently if np.nan/np.inf values are present

Unfortunately, this means that all of the means/stddevs and array copies are performed over and over again for each call to mass. We never meant for mass to be used this way but we can certainly improve it to accommodate this. Give me a bit of time to play with this and compare it with your fast implementation. I think we can get it pretty close with a little bit of extra code in the ostinato function!

I think the secret may be to use the lower-level stumpy.core._mass(Q, T, QT, μ_Q, σ_Q, M_T, Σ_T) instead, which avoids a lot of the copies, and you can pre-compute the means and stddevs once on the fly. You may also be interested in looking at the stumpy.core.preprocess() and stumpy.core.rolling_window() functions as these are helper functions that are already available to you :)
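For example (assuming the usual (array, window_length) signature for rolling_window):

import numpy as np
import stumpy

T = np.random.rand(20)
m = 5
# rolling_window returns a strided (len(T) - m + 1, m) view of all subsequences
subseqs = stumpy.core.rolling_window(T, m)
print(subseqs.shape)  # (16, 5)

# preprocess copies T, handles non-finite values, and returns the rolling means/stddevs
T_clean, M_T, Σ_T = stumpy.core.preprocess(T, m)
print(M_T.shape, Σ_T.shape)  # (16,), (16,)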

So, I'm not sure if this is actually worth pursuing any further if your brute-force approach is so fast. I'll test it with more/longer time-series tomorrow. Maybe it scales better? Other than that, the k of P variant (so not having to match all time-series) might be valuable. Or would your brute-force approach work here as well?

I think that as the time series lengths get longer (100K+), the brute-force approach will perform significantly worse and, again, I apologize for making the false claim about the speed of the approximate "brute-force" approach above. That was a distraction.

@seanlaw
Contributor

seanlaw commented Nov 4, 2020

No offense taken. I'm here to learn.

I love your attitude! I am learning a lot from this conversation.

@codecov-io

codecov-io commented Nov 4, 2020

Codecov Report

Merging #279 (31dbae7) into master (07ba1ee) will decrease coverage by 0.17%.
The diff coverage is 95.65%.


@@             Coverage Diff             @@
##            master     #279      +/-   ##
===========================================
- Coverage   100.00%   99.82%   -0.18%     
===========================================
  Files           18       19       +1     
  Lines         1644     1688      +44     
===========================================
+ Hits          1644     1685      +41     
- Misses           0        3       +3     
Impacted Files Coverage Δ
stumpy/ostinato.py 95.45% <95.45%> (ø)
stumpy/__init__.py 100.00% <100.00%> (ø)
stumpy/core.py 100.00% <100.00%> (ø)
stumpy/aamp.py 100.00% <0.00%> (ø)
stumpy/aampi.py 100.00% <0.00%> (ø)
stumpy/floss.py 100.00% <0.00%> (ø)
stumpy/stump.py 100.00% <0.00%> (ø)
stumpy/mstump.py 100.00% <0.00%> (ø)
stumpy/scrump.py 100.00% <0.00%> (ø)
... and 4 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@refactoriel
Contributor Author

@seanlaw Thanks, I appreciate your feedback very much.

Honestly, from what I can tell, you aren't doing anything that isn't already implemented in mass. Of course, mass also does a few really nice convenience things before computing the distance profile like:

  1. Makes a copy of your inputs so that you don't accidentally overwrite the original inputs (this is important when you have np.nan/np.inf values present)

  2. It automatically computes the means/stddevs (or you can provide pre-computed values) - means/stddevs are computed differently if np.nan/np.inf values are present

Unfortunately, this means that all of the means/stddevs and array copies are performed over and over again for each call to mass. We never meant for mass to be used this way but we can certainly improve it to accommodate this. Give me a bit of time to play with this and compare it with your fast implementation. I think we can get it pretty close with a little bit of extra code in the ostinato function!

I think the secret may be to use the lower-level stumpy.core._mass(Q, T, QT, μ_Q, σ_Q, M_T, Σ_T) instead, which avoids a lot of the copies, and you can pre-compute the means and stddevs once on the fly. You may also be interested in looking at the stumpy.core.preprocess() and stumpy.core.rolling_window() functions as these are helper functions that are already available to you :)

The only difference is that I'm precomputing T's FFT which would be repeatedly computed by sliding_dot_product. Is numpy.convolve actually doing an FFT convolution? Reading the doc and some of the code I'm not so sure about it.
So I guess I could make my own implementation of a sliding dot product using the precomputed FFT of T and feed the resulting QT to _mass.

@refactoriel
Contributor Author

Is numpy.convolve actually doing an FFT convolution? Reading the doc and some of the code I'm not so sure about it.

A quick test of convolving two vectors of length 2**18 at least shows that numpy.convolve is much slower than scipy.signal.fftconvolve for big vectors:

In [5]: %timeit np.convolve(a, b)
3.51 s ± 26.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: %timeit fftconvolve(a, b)
31.3 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

@seanlaw
Contributor

seanlaw commented Nov 5, 2020

Can you tell me if you are using the latest version of NumPy? I recently (10 days ago) replaced scipy.signal.fftconvolve with numpy.convolve as NumPy seemed to be faster (this didn't use to be the case) but perhaps my tests were on arrays that were too small. I made the assumption that NumPy convolve was also using FFTs and that may have been a bad assumption.

Now I regret doing it as we may need to revert back to scipy.signal.fftconvolve. Would you mind taking a look?

@seanlaw
Contributor

seanlaw commented Nov 5, 2020

The only difference is that I'm precomputing T's FFT which would be repeatedly computed by sliding_dot_product. Is numpy.convolve actually doing an FFT convolution? Reading the doc and some of the code I'm not so sure about it.
So I guess I could make my own implementation of a sliding dot product using the precomputed FFT of T and feed the resulting QT to _mass.

Exactly! First, you'd take each time series and do something like:

Ts = [None] * len(time_series)
M_Ts = [None] * len(time_series)
Σ_Ts = [None] * len(time_series)
for i, T in enumerate(list(time_series)):
    Ts[i], M_Ts[i], Σ_Ts[i] = stumpy.core.preprocess(T, m)

# followed by the rest of the ostinato algo

The reason we use stumpy.core.preprocess is that it handles non-finite values for you in a way that is compatible with _mass. We must support handling of non-finite values (i.e., nan/inf) in our code.

I am not so well versed with FFT and haven't looked at that part of your new code closely but, since the sliding dot product is always between two different time series (i.e., Q from one time series and T from another time series), is it actually possible to precompute FFTs for each of the time series independently and reuse them once you know which Q it should be paired with?

If so, assuming we replace it with scipy.signal.fftconvolve again, is there a nice way to add additional optional arguments (i.e., Q_fft=None or T_fft=None or whatever makes sense) to stumpy.core.sliding_dot_product? This way, the FFTs can be used when they are available, but the function defaults to what currently exists when they are not.
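Something along these lines, perhaps (just a sketch; T_fft here is a hypothetical parameter, not an existing STUMPY argument):

import numpy as np
from scipy.signal import fftconvolve

def sliding_dot_product(Q, T, T_fft=None):
    # T_fft, if given, would be a precomputed np.fft.rfft(T, 2 * n)
    n = T.shape[0]
    m = Q.shape[0]
    if T_fft is None:
        QT = fftconvolve(np.flipud(Q), T)  # current behaviour
    else:
        Qr = np.zeros(2 * n)
        Qr[:m] = Q[::-1]
        QT = np.fft.irfft(np.fft.rfft(Qr) * T_fft)  # reuse the cached FFT of T
    return QT.real[m - 1 : n]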

@seanlaw
Contributor

seanlaw commented Nov 5, 2020

My previous np.convolve vs scipy.signal.fftconvolve timings can be found here and I found NumPy to be about twice as fast even for much longer time series. I wonder if this is due to recent changes in NumPy.

@refactoriel
Contributor Author

My previous np.convolve vs scipy.signal.fftconvolve timings can be found here and I found NumPy to be about twice as fast even for much longer time series. I wonder if this is due to recent changes in NumPy.

I can reproduce your findings:

%run convolvetest.py
Numpy version: 1.18.5
Scipy version: 1.4.1
64 0.0 0.0 3.79
128 0.0 0.0 7.12
256 0.0 0.0 6.6
512 0.0 0.0 4.02
1024 0.0 0.0 3.78
2048 0.0 0.0 2.85
4096 0.0 0.0 2.09
8192 0.0 0.0 1.63
16384 0.0 0.0 1.65
32768 0.0 0.0 1.57
65536 0.0 0.0 1.57
131072 0.0 0.01 1.41
262144 0.01 0.02 1.86
524288 0.02 0.04 2.12
1048576 0.04 0.09 2.23
2097152 0.08 0.22 2.7
4194304 0.16 0.62 3.89
8388608 0.27 1.18 4.39
16777216 0.51 2.63 5.15
33554432 1.28 5.42 4.25
67108864 2.56 11.2 4.38
134217728 5.11 24.58 4.81

The discrepancy is because I was using vectors of the same order of magnitude:

for i in range(6, 28):
     Q = np.random.rand(2**i)
     T = np.random.rand(2**i + 11)
     n = T.shape[0]
     m = Q.shape[0]
     Qr = np.flipud(Q)  # Reverse/flip Q
     start = time.time()
     QT_np = np_QT(Qr, T)
     np_time = time.time()-start
     start = time.time()
     QT_scipy = scipy_QT(Qr, T)
     scipy_time = time.time()-start
     print(2**i, np.round(np_time, 2), np.round(scipy_time,2), np.round(scipy_time/np_time,2))
     npt.assert_almost_equal(QT_np, QT_scipy)

64 0.0 0.0 5.22
128 0.0 0.0 6.03
256 0.0 0.0 3.68
512 0.0 0.0 2.07
1024 0.0 0.0 0.88
2048 0.0 0.0 0.45
4096 0.01 0.0 0.16
8192 0.03 0.0 0.05
16384 0.09 0.0 0.04
32768 0.27 0.01 0.04
65536 0.72 0.02 0.03
131072 1.8 0.03 0.01
262144 4.17 0.06 0.01
524288 13.9 0.15 0.01
1048576 48.02 0.33 0.01
(skipped the rest ...)

I was about to say that it's alright to use numpy, since the query will typically be shorter than T. But have a look at different query sizes:

T = np.random.rand(2**20)
for i in range(6, 20):
     Q = np.random.rand(2**i)
     n = T.shape[0]
     m = Q.shape[0]
     Qr = np.flipud(Q)  # Reverse/flip Q
     start = time.time()
     QT_np = np_QT(Qr, T)
     np_time = time.time()-start
     start = time.time()
     QT_scipy = scipy_QT(Qr, T)
     scipy_time = time.time()-start
     print(2**i, np.round(np_time, 2), np.round(scipy_time,2), np.round(scipy_time/np_time,2))
     npt.assert_almost_equal(QT_np, QT_scipy)
 
64 0.03 0.08 2.21
128 0.04 0.07 1.57
256 0.06 0.09 1.52
512 0.1 0.06 0.58
1024 0.14 0.06 0.42
2048 0.41 0.07 0.17
4096 1.69 0.07 0.04
8192 2.89 0.09 0.03
16384 3.59 0.06 0.02
32768 4.6 0.07 0.02
65536 8.23 0.1 0.01
131072 9.88 0.14 0.01
262144 14.49 0.16 0.01
524288 28.21 0.24 0.01

@seanlaw
Contributor

seanlaw commented Nov 5, 2020

I was about to say that it's alright to use numpy, since the query will typically be shorter than T. But have a look at different query sizes:

Wow, that's wild! Thanks for digging into this. If I understand your numbers correctly, it seems as though scipy.signal.fftconvolve may be, on average, faster in the majority of cases (except when the Q and T are similar in length). I wonder if there is a rational way to decide (behind the scenes) when to use np.convolve and when to use scipy.signal.fftconvolve? I remember reading in the scipy.signal.fftconvolve docstring that:

This is generally much faster than convolve for large arrays (n > ~500), but can be slower when only a few output values are needed,

It's interesting that the length described is very much aligned with your findings where the break even happens around n=500:

256 0.06 0.09 1.52
<--- Break even here
512 0.1 0.06 0.58

Perhaps we just update stumpy.core.sliding_dot_product to check the lengths of Q and T like:

if Q.shape[0] >= 500 or T.shape[0] >= 500:
    convolve = scipy.signal.fftconvolve
else:
    convolve = np.convolve

...
QT = convolve(Qr, T)

return QT.real[m - 1 : n] 

@refactoriel
Contributor Author

Perhaps we just update stumpy.core.sliding_dot_product to check the lengths of Q and T like:

if Q.shape[0] <= 500 or T.shape[0] <= 500:
    convolve = scipy.signal.fftconvolve
else:
    convolve = np.convolve

...
QT = convolve(Qr, T)

return QT.real[m - 1 : n] 

That first conditional should be >= 500 for scipy, shouldn't it? Or, more elegantly, you could let SciPy do the work in scipy.signal.convolve:

Automatically chooses direct or Fourier method based on an estimate of which is faster (default). See Notes for more detail.

I don't know if SciPy's direct method is the same as NumPy's convolve, though.
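For instance, sliding_dot_product could then be as simple as (just a sketch):

import numpy as np
from scipy.signal import convolve

def sliding_dot_product(Q, T):
    # scipy.signal.convolve chooses the direct or FFT method automatically
    # (method='auto' is the default), so no manual length threshold is needed
    n = T.shape[0]
    m = Q.shape[0]
    Qr = np.flipud(Q)  # Reverse/flip Q
    QT = convolve(Qr, T)
    return QT.real[m - 1 : n]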

@seanlaw
Contributor

seanlaw commented Nov 5, 2020

That first conditional should be >= 500 for scipy, shouldn't it?

Oops, you're right! Thanks for catching that

I don't know if SciPy's direct method is the same as NumPy's convolve, though.

Honestly, I've gone back and forth between using np.convolve and scipy.signal.fftconvolve at least twice in STUMPY's history and so I think it is probably okay to go with the explicit if/else statement which, in my opinion, clearly defines when each method is used and it's likely "good enough". But I am open to using scipy.signal.convolve. I want us to be pragmatic about it and focus our attention on the ostinato component :)

How does that sound? If we don't want to handle the convolve part here then I can re-open the issue #272 and deal with it there.

@seanlaw
Contributor

seanlaw commented Nov 5, 2020

Or, more elegantly, you could let SciPy do the work in scipy.signal.convolve

Okay, I just ran a quick test with np.convolve, scipy.signal.convolve, and scipy.signal.fftconvolve. We keep T constant and increase the length of Q:

import numpy as np
from scipy.signal import fftconvolve, convolve
import numpy.testing as npt
import time

def np_QT(Qr, T):
    QT = np.convolve(Qr, T)

    return QT.real[m - 1 : n]

def scipy_fft_QT(Qr, T):
    QT = fftconvolve(Qr, T)

    return QT.real[m - 1 : n]

def scipy_QT(Qr, T):
    QT = convolve(Qr, T)

    return QT.real[m - 1 : n]

T = np.random.rand(2**20)
for i in range(6, 20):
    Q = np.random.rand(2**i)
    n = T.shape[0]
    m = Q.shape[0]
    Qr = np.flipud(Q)  # Reverse/flip Q
    
    start = time.time()
    QT_np = np_QT(Qr, T)
    np_time = time.time()-start
    
    start = time.time()
    QT_fft_scipy = scipy_fft_QT(Qr, T)
    scipy_fft_time = time.time()-start
    
    start = time.time()
    QT_scipy = scipy_QT(Qr, T)
    scipy_time = time.time()-start
    
    print(2**i, np.round(np_time, 2), np.round(scipy_time,2), np.round(scipy_fft_time,2), np.round(scipy_time/np_time,2), np.round(scipy_fft_time/np_time,2))
    npt.assert_almost_equal(QT_np, QT_fft_scipy)
    npt.assert_almost_equal(QT_np, QT_scipy)

and the results were:

64 0.05 0.07 0.12 1.34 2.47
128 0.03 0.08 0.08 2.26 2.34
256 0.05 0.06 0.07 1.34 1.43
512 0.07 0.09 0.1 1.19 1.3
1024 0.12 0.08 0.07 0.71 0.58
2048 0.25 0.1 0.1 0.41 0.42
4096 0.71 0.08 0.08 0.11 0.11
8192 1.4 0.07 0.07 0.05 0.05
16384 3.08 0.08 0.1 0.03 0.03
32768 5.56 0.08 0.11 0.02 0.02
65536 11.23 0.09 0.11 0.01 0.01
131072 33.38 0.09 0.13 0.0 0.0
262144 74.39 0.09 0.13 0.0 0.0

For short Q, np.convolve is slightly faster than scipy.signal.convolve but it's imperceptible IMHO (i.e., scipy.signal.convolve is "good enough") and then it flips once the length is greater than 500 and scipy.signal.convolve is faster as the length of Q increases.

And then, when Q is kept small:

Q = np.random.rand(50)
for i in range(6, 28):
    T = np.random.rand(2**i + 11)
    n = T.shape[0]
    m = Q.shape[0]
    Qr = np.flipud(Q)  # Reverse/flip Q
    start = time.time()
    QT_np = np_QT(Qr, T)
    np_time = time.time()-start
    start = time.time()
    QT_scipy = scipy_QT(Qr, T)
    scipy_time = time.time()-start
    print(2**i, np.round(np_time, 2), np.round(scipy_time,2), np.round(scipy_time/np_time,2))
    npt.assert_almost_equal(QT_np, QT_scipy)

we get

64 0.0 0.01 5.78
128 0.0 0.0 2.27
256 0.0 0.0 1.78
512 0.0 0.0 2.45
1024 0.0 0.0 1.15
2048 0.0 0.0 0.26
4096 0.0 0.0 0.71
8192 0.0 0.0 2.43
16384 0.0 0.0 0.9
32768 0.0 0.0 0.57
65536 0.0 0.0 0.93
131072 0.0 0.0 1.04
262144 0.01 0.01 1.19
524288 0.02 0.01 0.82
1048576 0.07 0.05 0.63
2097152 0.1 0.07 0.67
4194304 0.16 0.11 0.69
8388608 0.25 0.21 0.86
16777216 0.43 0.43 1.01
33554432 0.87 0.85 0.97
67108864 1.77 2.45 1.39
134217728 3.8 3.61 0.95

Based on this, I think I'm pretty convinced that scipy.signal.convolve is probably the most elegant solution that gets us there the majority of the time! So, let's go with it 👍

I don't know if SciPy's direct method is the same as NumPy's convolve, though.

Given that the outputs are the same (based on the assert statements above), I don't think this matters. Unless I'm missing something obvious?

@seanlaw
Contributor

seanlaw commented Nov 6, 2020

So, after switching np.convolve to scipy.signal.convolve in core.py, I updated your original ostinato function to:

def ostinato(tss, m):
    # Preprocess means and stddevs and handle np.nan/np.inf
    Ts = [None] * len(tss)
    M_Ts = [None] * len(tss)
    Σ_Ts = [None] * len(tss)
    for i, T in enumerate(tss):
        Ts[i], M_Ts[i], Σ_Ts[i] = stumpy.core.preprocess(T, m)
        
    bsf_rad, ts_ind, ss_ind = np.inf, 0, 0
    k = len(Ts)
    for j in range(k):
        # Pair time series j with the next one, wrapping around at the end
        if j < (k - 1):
            h = j + 1
        else:
            h = 0
        
        mp = stumpy.stump(Ts[j], m, Ts[h], ignore_trivial=False)
        si = np.argsort(mp[:, 0])  # candidate indices, smallest AB-join distance first
        for q in si:
            rad = mp[q, 0]
            if rad >= bsf_rad:
                # Early abandon: no remaining candidate in Ts[j] can beat the best-so-far radius
                break
            for i in range(k):
                if ~np.isin(i, [j, h]):
                    QT = stumpy.core.sliding_dot_product(Ts[j][q:q+m], Ts[i])
                    rad = np.max((
                            rad, 
                            np.min(stumpy.core._mass(Ts[j][q:q+m], Ts[i], QT, M_Ts[j][q], Σ_Ts[j][q], M_Ts[i], Σ_Ts[i]))
                                ))
                    if rad >= bsf_rad:
                        break
            if rad < bsf_rad:
                bsf_rad, ts_ind, ss_ind = rad, j, q
    
    return bsf_rad, ts_ind, ss_ind

m = 1000
start = time.time()
rad, ts_ind, ss_ind = ostinato(list(data.values()), m)
print(time.time() - start)

This code executed in around 6-7 seconds for the DNA inputs using _mass (with scipy.signal.convolve) versus around 35 seconds using mass (i.e., np.min(stumpy.core.mass(Ts[j][q:q+m], Ts[i])) with scipy.signal.convolve).

I think 6-7 seconds is "good enough". @DanBenHa Do you mind testing this out?

@refactoriel
Contributor Author

Yes, switching to scipy.signal.convolve and using the updated ostinato function I get around ~4 seconds for the DNA data and ~2 seconds for the electrical demand data. Seems good enough for data of this size.
I'm now testing it with my data (around 400 time-series of 1500 to 10000 values). This is going to take much longer ;)

@seanlaw
Contributor

seanlaw commented Nov 9, 2020

Yes, switching to scipy.signal.convolve and using the updated ostinato function I get around ~4 seconds for the DNA data and ~2 seconds for the electrical demand data. Seems good enough for data of this size.
I'm now testing it with my data (around 400 time-series of 1500 to 10000 values). This is going to take much longer ;)

Awesome! I'm super interested in hearing about how this goes with your larger data set and how much time it ends up taking. I'm guessing that, since the ostinato algorithm doesn't require all pairwise time series comparisons, it should be relatively linear in time. I'd estimate that it would take 30 minutes to one hour to complete? Please kindly report back once the job is complete! 👍

If you have the time, would you mind:

  1. Switching np.convolve with scipy.signal.convolve in stumpy.core
  2. Update the notebook reproducer to use the final algorithm that we described above

Additionally:

Other than that, the k of P variant (so not having to match all time-series) might be valuable. Or would your brute-force approach work here as well?

Would it be possible to add an additional argument to ostinato like k=None and then, if k is None, we set k to the number of time series? Is there a lot of work that needs to be done to make this compatible with the k of P variant?

Also, I think the DNA example is good as a more complex example (for later in the notebook) but I think it would be really helpful to start by reproducing Figure 1 and Figure 2 from the paper. I feel like these two basic figures really keep the point simple: you are trying to find the subsequence that exists in ALL of the 9 time series, and then you show how to do it by simply calling the ostinato function to produce Figure 2.

I understand that you are probably super busy so please let me know and I can probably handle it afterward. Absolutely no pressure! You've already contributed a lot of your time and I really appreciate it.

@refactoriel
Contributor Author

Awesome! I'm super interested in hearing about how this goes with your larger data set and how much time it ends up taking. I'm guessing that, since the ostinato algorithm doesn't require all pairwise time series comparisons, it should be relatively linear in time. I'd estimate that it would take 30 minutes to one hour to complete? Please kindly report back once the job is complete! 👍

I was busy with other stuff, so only had time to start the job around 5.5 hours ago. If I reuse FFTs like in my second version it takes around 80 minutes. The other version (convolution changed) is still computing and is taking more than three times the time. Seems like the repeated FFT computation is taking a big toll if the number of time-series is increased that much.

If you have the time, would you mind:

1. Switching `np.convolve` with `scipy.signal.convolve` in `stumpy.core`

Do you mean just for this notebook or should I make a PR for the change in stumpy.core?

2. Update the notebook reproducer to use the final algorithm that we described above

After seeing the performance drop for bigger datasets we should consider recycling the time-series' FFTs. What do you think? Doesn't need to be in that object-oriented style that I went with in version 2.

Would it be possible to add an additional argument to ostinato like k=None and then, if k is None, we set k to the number of time series? Is there a lot of work that needs to be done to make this compatible with the k of P variant?

In the paper's companion code it's a separate function. There's probably a way to combine them into one function but it doesn't look so simple. I think it's faster if we first implement the special case and later add the generalisation. BTW, in their readme they state "It's best to reuse a single object if subsequence length does not change, as a lot of things are cached at the object level. This is useful for the k of P variant if you wish to test different values of k against P time series.", arguing for recycling FFTs and also making it a class.

Also, I think the DNA example is good as a more complex example (for later in the notebook) but I think it would be really helpful to start by reproducing Figure 1 and Figure 2 from the paper. I feel like these two basic figures really keep the point simple: you are trying to find the subsequence that exists in ALL of the 9 time series, and then you show how to do it by simply calling the ostinato function to produce Figure 2.

Agreed. After all, it's supposed to be a tutorial ;)

I understand that you are probably super busy so please let me know and I can probably handle it afterward. Absolutely no pressure! You've already contributed a lot of your time and I really appreciate it.

I appreciate your empathy! Yes, I'm busy actually applying stumpy to my data :) But there are going to be downtimes during which I can work on the notebook. It's just gonna take a bit more time.

@refactoriel
Contributor Author

The other version (convolution changed) is still computing and is taking more than three times the time.

It just completed, taking 212 minutes. So 2.65 times longer than with the FFT recycling.

@seanlaw
Contributor

seanlaw commented Nov 10, 2020

Do you mean just for this notebook or should I make a PR for the change in stumpy.core?

I don't mind if you change it in this PR.

After seeing the performance drop for bigger datasets we should consider recycling the time-series' FFTs. What do you think? Doesn't need to be in that object-oriented style that I went with in version 2.

It sounds like recycling FFTs may be worth it. Is it possible (or enough) to modify core.sliding_dot_product() and add additional parameters like Q_fft=None and T_fft=None so that, when either of them is provided, you reuse it accordingly, but if they are both None then you compute QT as we have it now? Would this be enough to recycle FFTs? Perhaps you can provide a list of things that need to be done and, in the meantime, I can take a closer look at your second implementation. My goal is for us to make as few changes as possible since we'll need to write unit tests for any new functionality.

Having said this, I think we should be pragmatic and get this implementation merged as it is definitely "good enough" and we don't have to strive for the fastest implementation right out of the gate. I'd rather have smaller focused PRs than one giant PR with many little related things. Let's celebrate this win! And then, we can start a new separate issue that is focused on recycling FFTs and work on it when we are able to find more time. How does that sound to you?

@seanlaw
Contributor

seanlaw commented Nov 10, 2020

It just completed, taking 212 minutes. So 2.65 times longer than with the FFT recycling.

Thank you for following up on this. It's not horrible. Let's create a fresh issue that focuses on improving the speed/performance.

@refactoriel
Contributor Author

refactoriel commented Nov 10, 2020

Having said this, I think we should be pragmatic and get this implementation merged as it is definitely "good enough" and we don't have to strive for the fastest implementation right out of the gate. I'd rather have smaller focused PRs than one giant PR with many little related things. Let's celebrate this win! And then, we can start a new separate issue that is focused on recycling FFTs and work on it when we are able to find more time. How does that sound to you?

Yes, thanks for bringing in the pragmatic perspective. Being in academia I tend to over-analyse/-optimise things 😆
So, things for this PR:

  • change convolve function in stumpy.core
  • add ostinato function in stumpy.ostinato
  • notebook: reproduce figures 1 and 2 of the paper
  • unittest(s) for ostinato

And the optimisation in a fresh issue/PR.

@seanlaw
Contributor

seanlaw commented Nov 10, 2020

So, things for this PR:

I've taken the liberty to add a little more detail

  • change convolve function in stumpy.core
  • add ostinato function in ostinato.py
  • add from .ostinato import ostinato # noqa: F401 to __init__.py (this allows you to do import stumpy; stumpy.ostinato())
  • notebook: reproduce figures 1 and 2 of the paper
  • create tests/test_ostinato.py file
  • in tests/test_ostinato.py add a naive_ostinato function, which is just a brute-force version of ostinato to compare against (a rough sketch follows this list). We prefer this approach over hardcoded inputs/outputs.
  • unittest(s) for ostinato - In this case, I would just write one (or two) tests that take three randomly generated time series as input (i.e., ts = [np.random.rand(64), np.random.rand(128), np.random.rand(256)] and m = 50) and compare the stumpy.ostinato(ts, m) output to the naive_ostinato output; they should be the same. I think this is satisfactory.
  • add py.test -x -W ignore::RuntimeWarning -W ignore::DeprecationWarning tests/test_ostinato.py and check_errs $? at line 124 of the test.sh testing script
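For the naive_ostinato item above, a rough sketch of what such a brute-force reference could look like (illustrative only, not the final test code):

import numpy as np
import stumpy

def naive_ostinato(tss, m):
    # Brute force: for every subsequence in every time series, its radius is the
    # max (over the other time series) of the min z-normalized distance; the
    # consensus motif is the subsequence with the smallest radius
    bsf_rad, ts_ind, ss_ind = np.inf, 0, 0
    k = len(tss)
    for j in range(k):
        for q in range(len(tss[j]) - m + 1):
            Q = tss[j][q : q + m]
            rad = 0.0
            for i in range(k):
                if i == j:
                    continue
                rad = max(rad, np.min(stumpy.core.mass(Q, tss[i])))
            if rad < bsf_rad:
                bsf_rad, ts_ind, ss_ind = rad, j, q
    return bsf_rad, ts_ind, ss_ind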

Please let me know if you'd like me to help with any/all of these tasks

Contributor

@seanlaw seanlaw left a comment

A couple of minor suggestions but looks good otherwise.

Contributor

@seanlaw seanlaw left a comment

I added some additional thoughts on one of the points

ts_ind = j
ss_ind = min_rad_index

return _get_central_motif(tss, rad, ts_ind, ss_ind, m)
Contributor

Instead of using _get_central_motif, is it possible to create a naive_get_central_motif instead? And then, we should test _get_central_motif against naive_get_central_motif. I think it is important to have this test since it is so core to ostinato.

Contributor Author

I don't fully understand this. Can you elaborate on this idea? How would a naive_get_central_motif work? I feel like my implementation of _get_central_motif is already quite naive :P

Contributor

Sorry for the initial vagueness and I agree that ostinato._get_central_motif is already quite simple but here is my concern. What happens when somebody incorrectly modifies or changes ostinato._get_central_motif? Since this test calls ostinato._get_central_motif then we have no way of verifying the "correctness" of ostinato._get_central_motif. So, if somebody decides to alter ostinato._get_central_motif to not compute anything and to always return the set of values 1, 2, 3 (for rad, ts_ind, ss_ind) then this test will always pass since the test is calling ostinato._get_central_motif. In other words, to be properly covered, we should have a separate test to verify the outputs of ostinato._get_central_motif. Does that make sense?

Contributor

@seanlaw seanlaw Nov 26, 2020

Sometimes, when the functions are already simple, I create a "naive" version by replacing all of the major NumPy calls with a more verbose/explicit for-loop (i.e., replace np.argmin, np.isclose, np.max, etc). I know that I am being rather stubborn/persistent here and so I can handle this afterward in #285
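For example, a toy stand-in for np.argmin in that spirit might be:

import numpy as np

def naive_argmin(a):
    # Explicit for-loop equivalent of np.argmin, used only as a test reference
    min_idx = 0
    for i in range(1, len(a)):
        if a[i] < a[min_idx]:
            min_idx = i
    return min_idx

a = np.array([3.0, 1.0, 2.0])
assert naive_argmin(a) == np.argmin(a)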

Contributor Author

Yes that makes sense. I'll give it a shot.

radii = np.zeros(len(tss[j]) - m + 1)
for i in range(k):
    if i != j:
        mp = stumpy.stump(tss[j], m, tss[i], ignore_trivial=False)
Contributor

There is a naive stump implementation in tests/naive.py; you can import it into this test with import naive and then replace stumpy.stump with naive.stump

Contributor Author

This comment refers to the idea of naive_get_central_motif, right?

Contributor

No. Instead of using:

mp = stumpy.stump(tss[j], m, tss[i], ignore_trivial=False)

Please replace this with:

import naive

mp = naive.stump(tss[j], m, tss[i], ignore_trivial=False)

This naive.stump implementation is slower but helps us ensure that all of our fast implementations produce the same results as all of slow (naive) implementations.

Contributor

@seanlaw seanlaw left a comment

I've added some more comments around the tests

@refactoriel
Contributor Author

Apologies for the delay - I've been quite busy the last few days. I've implemented your suggestions to the notebook. The data was uploaded to Zenodo but, as you saw, not in the proper way. I'll leave it like that for now and see if I have time to do it properly later.
Will look at your other comments soonish.

@seanlaw
Contributor

seanlaw commented Nov 24, 2020

Apologies for the delay - I've been quite busy the last few days. I've implemented your suggestions to the notebook. The data was uploaded to Zenodo but, as you saw, not in the proper way. I'll leave it like that for now and see if I have time to do it properly later.

No worries! What a coincidence, I found a little bit of time to upload the data to Zenodo in the "correct" format and I've provided some notebook updates that reflect those changes so, hopefully, it should be a quick fix.

Will look at your other comments soonish.

No rush! Thank you for following up on this

@refactoriel
Contributor Author

No worries! What a coincidence, I found a little bit of time to upload the data to Zenodo in the "correct" format and I've provided some notebook updates that reflect those changes so, hopefully, it should be a quick fix.

Great, cheers for that! Looks much cleaner now 😃

@seanlaw
Contributor

seanlaw commented Nov 25, 2020

@DanBenHa Do you think this is ready to be merged or are there any remaining tasks to be done?

@refactoriel
Contributor Author

@DanBenHa Do you think this is ready to be merged or are there any remaining tasks to be done?

Almost. I'll resolve your remaining comments around the tests first thing tomorrow morning :)

@refactoriel
Contributor Author

OK, barring the naive_get_central_motif idea this is ready from my side.

@refactoriel
Contributor Author

refactoriel commented Nov 26, 2020

In thinking more about the "central consensus motif", I was wondering if you had considered using the time series "snippets" definition (proposed in this paper by the same researchers as the matrix profile). I haven't explored it myself (but added a new issue #283 ) but I wonder if it may be useful in this case vs looking at the mean distance to its nearest neighbor.

Thanks for this idea! Seems like snippets are conceptually similar to a PCA (thinking very naively) in that you're trying to find components/snippets that best explain the variance in the data. I don't think it would help with the central motifs but it's definitely something that I will look into for my data.

ts_ind_alt = ts_ind_alt[alt_same]
ss_ind_alt = ss_ind_alt[alt_same]
i_alt_first = np.argmin(ts_ind_alt)
if ts_ind_alt[i_alt_first] < ts_ind:
Contributor

This amount of nested branching logic concerns me (from a testing standpoint). I can try to spend some time in the future to see if it can be simplified but just leaving a note for my future self here.

Contributor

See #285

Contributor

Oh, in case it matters, I would not put naive_get_central_motif in naive.py because naive_get_central_motif is only used by test_ostinato.py and no other test file. Functions in naive.py are strictly for naive implementations that are used across multiple different test files.

Contributor Author

Yeah, when I wrote this I also got that code smell. I'll see if I can resolve it ;)

Contributor

@seanlaw seanlaw left a comment

Minor comment

@refactoriel
Contributor Author

Cool, I'll try to implement/change what you suggested.

@refactoriel
Contributor Author

I have implemented the naive version of get_central_motif. I hope this is what you suggested. Feels a bit weird to essentially test numpy implementations :P

Also removed the nested conditional. get_central_motif is probably not the most beautiful function, but I'll leave it up to you to decide if it requires further refactoring ;)

@seanlaw
Contributor

seanlaw commented Dec 1, 2020

I have implemented the naive version of get_central_motif. I hope this is what you suggested. Feels a bit weird to essentially test numpy implementations :P

Ohh no, that wasn't my intention. :( I apologize as I was likely not clear enough on what I was asking. I think you've done a tonne so I can go over it after it gets merged.

Also removed the nested conditional. get_central_motif is probably not the most beautiful function, but I'll leave it up to you to decide if it requires further refactoring ;)

Sounds good! I will take some time to go through it. Thank you so much for working with me on all of this! I really appreciate your help and I certainly learned a lot. Once this gets released, I'll be sure to use your @DanielHaehnke Twitter handle to acknowledge your contribution!

@seanlaw seanlaw merged commit 060df73 into stumpy-dev:master Dec 1, 2020
@refactoriel
Contributor Author

Cool, thanks for accepting the pull request into master!

Don't worry about the testing confusion on my side. My knowledge of testing isn't that extensive, so I'm really curious about what you meant. Will subscribe to #285 to see your solution :)

Development

Successfully merging this pull request may close these issues.

  • Add Ostinato Time Series Consensus Motif Notebook Reproducer
  • Add Time Series Consensus Motifs
3 participants