Similarity and iSAX

Notice that the three instances in figure 1 are not exactly equal; rather, they are similar. Finding exactly equal patterns is trivial and of little practical interest.

Time series can be considered similar according to imaginary boundaries (figure 2). If two or more time series lie within the range defined by the boundaries, they are similar and can be counted as a repetition (motif). To demonstrate this notion, we create a random time series X of length 128 (a) and randomly generate several time series that are visually similar to X. We observe that, with tight similarity boundaries, the time series need to be more similar to be counted as repetitions (c). By relaxing these boundaries (d), we can include more time series. This is an important property, because in some application domains time series need to be seen as similar even if they present some differences (e.g. noisy applications). Being able to switch between tight and more relaxed boundaries on the fly is also a useful property.
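The idea of tight versus relaxed boundaries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the boundaries are modelled as a band of fixed half-width around X, X is built as a random walk, and the look-alike series are X plus uniform noise. The names within_band and count_repetitions, the noise amplitudes and the half-widths are all hypothetical and are not taken from the algorithm described here.

```python
import random


def within_band(candidate, reference, half_width):
    """Return True if every point of candidate is within half_width of reference."""
    return all(abs(c - r) <= half_width for c, r in zip(candidate, reference))


def count_repetitions(candidates, reference, half_width):
    """Count the candidates that stay inside the band around reference."""
    return sum(within_band(c, reference, half_width) for c in candidates)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Random time series X of length 128, built here as a random walk (assumption).
x = []
level = 0.0
for _ in range(128):
    level += rng.gauss(0.0, 1.0)
    x.append(level)

# Four look-alikes: X plus uniform noise of increasing amplitude (assumption).
noise_amplitudes = [0.1, 0.3, 0.6, 0.9]
candidates = [[v + rng.uniform(-amp, amp) for v in x] for amp in noise_amplitudes]

tight = count_repetitions(candidates, x, half_width=0.35)
relaxed = count_repetitions(candidates, x, half_width=1.0)
print("tight band:", tight, "repetitions; relaxed band:", relaxed, "repetitions")
```

By construction the relaxed band accepts all four candidates, whereas only the two least noisy candidates are certain to fit inside the tight band.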

Figure 2: Similar time series.

Our algorithm implements this notion of tightness of similarity boundaries by using the iSAX time series approximation technique [2]. 

The different similarity boundaries correspond to different iSAX resolutions.

iSAX divides a time series into w (word length) frames, calculates each frame's average, and then converts the list of averages into a sequence of symbols. The number of distinct symbols, a, is the alphabet size or resolution. This process is depicted in figure 3 with a word length of 8 and two different resolutions, 4 and 16.
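The conversion can be sketched as follows. This is a minimal illustration, not the code used for figure 3. It assumes, as is usual in the SAX literature but not stated above, that the series is z-normalised first and that symbols are assigned using breakpoints that split a standard normal distribution into a equally probable regions. Symbols are written as integers 0..a-1, and the function names are hypothetical.

```python
import random
from bisect import bisect_left
from statistics import NormalDist, mean, pstdev


def z_normalize(series):
    """Rescale the series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(v - mu) / sigma for v in series]


def frame_averages(series, w):
    """Split the series into w equal frames and average each frame."""
    if len(series) % w != 0:
        raise ValueError("series length must be a multiple of w")
    size = len(series) // w
    return [mean(series[i * size:(i + 1) * size]) for i in range(w)]


def breakpoints(a):
    """a - 1 cut points giving a equiprobable regions under N(0, 1) (assumption)."""
    return [NormalDist().inv_cdf(k / a) for k in range(1, a)]


def to_word(series, w, a):
    """Convert a series into a word of w integer symbols in the range 0..a-1."""
    cuts = breakpoints(a)
    averages = frame_averages(z_normalize(series), w)
    return [bisect_left(cuts, avg) for avg in averages]


# Example series of length 128 (a seeded random walk, for illustration only).
rng = random.Random(1)
x, level = [], 0.0
for _ in range(128):
    level += rng.gauss(0.0, 1.0)
    x.append(level)

word_4 = to_word(x, w=8, a=4)
word_16 = to_word(x, w=8, a=16)
print("a=4: ", word_4)
print("a=16:", word_16)
```

Because the breakpoints for a=4 are a subset of those for a=16 in this sketch, each symbol of the a=4 word can be recovered from the a=16 word by integer division by 4.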

Figure 3: iSAX conversion process for time series X using w=8 and a) a=4; b) a=16 (code provided by SAX authors).

The smoothing caused by the frame averaging, together with the resolution, creates the similarity boundaries. The higher the resolution, the tighter the similarity boundaries are, and the closer two time series need to be in order to be considered similar.
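The effect of the resolution can be illustrated with two sets of frame averages. The values below are hypothetical (w = 8, assumed to come from already z-normalised series), and the Gaussian breakpoints are the same assumption as in standard SAX. The helper name symbols is also hypothetical.

```python
from bisect import bisect_left
from statistics import NormalDist


def symbols(frame_averages, a):
    """Map frame averages to integer symbols 0..a-1 using Gaussian breakpoints."""
    cuts = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    return [bisect_left(cuts, v) for v in frame_averages]


# Hypothetical frame averages of two similar, z-normalised series.
avg_x = [-1.2, -0.5, -0.1, 0.3, 0.9, 1.4, 0.5, -0.3]
avg_y = [-1.0, -0.6, -0.2, 0.2, 0.8, 1.1, 0.4, -0.4]

same_at_4 = symbols(avg_x, 4) == symbols(avg_y, 4)
same_at_16 = symbols(avg_x, 16) == symbols(avg_y, 16)
print("same word at a=4: ", same_at_4)
print("same word at a=16:", same_at_16)
```

With these values the two series share the same word at resolution 4 but not at resolution 16, since the finer breakpoints separate frame averages that the coarser ones place in the same region.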