Replies: 10 comments
-
In fact, I would consider
And when you do the same for the
You see that both of their z-normalized Euclidean distances are the same. So, it is important to keep in mind that z-normalization basically removes the vertical shifts in your data and only compares the shape of your data. In this case, they are indeed identical in shape! However, they are certainly different in magnitude. Of course, finding a matching shape is the harder part and, in your case, you can simply account for this magnitude difference in a post-processing step.
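As a minimal sketch of this point (the numbers below are made up, and I'm calling STUMPY's core.mass):

import numpy as np
from stumpy import core

Q = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
T = np.array([0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 0.0]) + 100.0  # same shape inside, shifted up by 100

# The subsequence T[1:6] matches Q in shape, so its z-normalized distance is ~0
# even though the raw values differ by 100.
print(core.mass(Q, T))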
Yes, having a pattern of length 3 is likely too small to be meaningful.

P.S. This is a judgement-free zone so please ask away! I'm learning as much as you are.
-
Thank you a lot! By binning, you mean my attempt to normalize the data by assigning it to values between -3 and 3, right?
-
Yes, basically, by assigning values between -3 and 3, you are applying a manual normalization step (which could be biased depending on how you choose the bin edges). Unlike other machine learning methods that typically require your data to be standardized or normalized, this is not necessary with matrix profiles, since STUMPY will already perform a local z-normalization of each subsequence before computing the Euclidean distance. By assigning the values to a "bin", you've essentially removed information that would help you differentiate between the matching patterns. I would generally recommend just running STUMPY on your raw data whenever possible (i.e., make little to no assumptions) and then only make changes to your process in a data-driven way (i.e., only "fix" something if you see that it is causing a problem, rather than fixing a non-existent problem and potentially creating more issues). Again, to be clear, a window size of 3 is not likely going to be meaningful, so I strongly recommend looking at larger window sizes (or you may explore hourly data??)
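A rough sketch of what running STUMPY on the raw data with a larger window might look like (the series and m=48 below are only placeholders, not recommendations):

import numpy as np
import stumpy

raw_prices = np.random.random(1_000)   # stand-in for your raw (un-binned) price series
m = 48                                 # illustrative window size, larger than 3

mp = stumpy.stump(raw_prices, m)       # matrix profile over the raw data
motif_idx = np.argsort(mp[:, 0])[0]    # subsequence with the smallest distance to its neighbor
print(motif_idx, mp[motif_idx, 1])     # its position and the position of its nearest neighbor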
-
Thank you a lot! In this case I will do it without that "normalization". I think another thing interfering here might be the fact that I'm using the variation between the prices instead of the actual prices:
-
Another thing to potentially experiment with is looking at the percentage change between neighboring values. Good luck!
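A rough sketch of computing that percentage change (made-up prices):

import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0])
pct_change = 100.0 * np.diff(prices) / prices[:-1]   # percentage change between neighboring values
print(pct_change)                                    # [ 2.         -0.98039216  3.96039604]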
-
This is absolutely true. What I'm trying to understand now is how much the scale of the numbers interferes with my MASS output; I've noticed that, for example, when the data in my target dataset is made of small values compared to the pattern data, the output tends to have higher values (all around 3). For example, the following code:
Gives the following output:
Even though from index
I'm trying to work on this, for now. Thank you a lot!
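As a small check (made-up numbers, again using STUMPY's core.mass): z-normalization also removes the scale, not just the vertical shift, so a small-valued target whose shape matches a large-valued pattern still gives a distance of ~0:

import numpy as np
from stumpy import core

pattern = np.array([10.0, 20.0, 15.0])              # "large" values
target = np.array([0.10, 0.20, 0.15, 0.05, 0.30])   # "small" values

# target[0:3] is just pattern / 100, i.e. the same shape at a different scale,
# so the first entry of the distance profile is ~0 despite the size difference.
print(core.mass(pattern, target))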
-
I'll have to look into that. I would've expected the identical match to produce a distance of zero. We may be bumping up against machine precision.
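In the meantime, one way to treat such a tiny distance as an effective zero (just a sketch; the tolerance here is arbitrary):

import numpy as np

d = 1.46000966e-07                      # the distance reported for the identical match
print(np.isclose(d, 0.0, atol=1e-6))    # True: indistinguishable from zero at this tolerance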
-
Honored I was able to help you find a bug!
-
@Jacks349 This is fixed in the master branch now. Are you able to clone the repository and re-install from source? Otherwise, you'll have to wait for the next release.
-
Yes, I can do that! Thank you for fixing the issue so quickly!
-
I am the same guy from this issue! I have another (noob) question:
In my dataset, I have a set of local minima where each element of the array is a percentage that stands for the variation from the previous element. This means that the price (not asking for financial advice) of that asset is going up when there is a series of positive percentages, and down when there is a series of negative percentages.
Here is a sample:
[1.8867924528301907, 2.5462962962962914, 0.45146726862302056, 0.6741573033707992, 2.2321428571428545, 0.8733624454148388, 7.575757575757577, 1.810865191146881, 2.173913043478257, 2.514506769825928, 5.094339622641508, -7.719928186714545, -2.3346303501945465, -0.5976095617529992, -3.40681362725451, 0.0, -1.2448132780082868, -1.6806722689075646, -2.7777777777777874, -0.4395604395604354, 6.401766004415006]
In order to avoid noise and get better results with my distance profile, I normalized this data by converting every percentage to a value between -3 and 3, where 0 is a variation of 0, 0 to 0.5 is a variation of less than 2, 0.5 to 1 is a variation of less than 4, and so on.
So the previous dataset becomes this:
Target = [0.5, 1, 0.5, 0.5, 1, 0.5, 2, 0.5, 1, 1, 1.5, -2, -1, -0.5, -1, 0.0, -0.5, -0.5, -1, -0.5, 1.5]
Now I specified a small pattern, which is the following:
Pattern = [-0.5, -1, -1]
This pattern is made of only negative values, which means that the price is going down (gradually, since the values are not high). So I computed the distance profile:
from stumpy.core import mass   # MASS distance profile from STUMPY
distance_profile = mass(Pattern, Target)
Which gave me the following output:
What I don't understand is the second element of the output, 1.46000966e-07, which is a very low value, so it suggests strong correlation for that part of the dataset. What I don't get is why the value is so low when the data on that part of the dataset are all positive values ([..0.5, 1, 0.5, 0.5, 1, 0.5, 2...]), whereas my Pattern array only has negative values. One of the hypotheses I made is that the Pattern array is too small. Any advice on this is appreciated! I apologize if anything should sound confusing.
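For reference, a quick way to check this by hand (just a sketch; I'm assuming the second output element corresponds to Target[1:4] and that MASS uses the population standard deviation):

import numpy as np

Pattern = np.array([-0.5, -1.0, -1.0])
window = np.array([1.0, 0.5, 0.5])   # Target[1:4], assumed to be behind the second output element

def znorm(x):
    return (x - x.mean()) / x.std()

# Both z-normalize to the same "high, low, low" shape relative to their own mean,
# so their z-normalized Euclidean distance is ~0 even though the raw signs differ.
print(znorm(Pattern))                                   # [ 1.4142 -0.7071 -0.7071]
print(znorm(window))                                    # [ 1.4142 -0.7071 -0.7071]
print(np.linalg.norm(znorm(Pattern) - znorm(window)))   # ~0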