Time Series Motifs Statistical Significance

Supporting web page for our paper:

Nuno Castro and Paulo J. Azevedo, Time Series Motifs Statistical Significance
in Proceedings of the Eleventh SIAM International Conference on Data Mining (SDM 2011), Mesa, Phoenix, Arizona, USA. SIAM, 2011. [pdf] [slides] [Best Student Paper Award!]
[Free Java source code] [DBLP] [Scholar] [BibTeX]

Time series motif discovery is the task of extracting previously unknown recurrent patterns from time series data.
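
As a rough illustration of the task itself (not the discovery algorithm or the evaluation method from the paper), the sketch below finds the closest pair of non-overlapping subsequences in a short series by exhaustive comparison. The names `znorm` and `find_motif_pair`, the choice of z-normalized Euclidean distance, and the toy series are all assumptions made for this example; practical motif miners use much faster strategies.

```python
import math

def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]

def find_motif_pair(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping
    length-m subsequences, using exhaustive O(n^2 * m) comparison."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, -1, -1)
    for i in range(len(windows)):
        # Start at i + m so that overlapping (trivial) matches are skipped.
        for j in range(i + m, len(windows)):
            d = math.dist(windows[i], windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best

# Toy series (hypothetical): the shape 0, 1, 3, 1, 0 occurs at offsets 0 and 9.
series = [0, 1, 3, 1, 0, 5, 2, 7, 7, 0, 1, 3, 1, 0, 4]
distance, first, second = find_motif_pair(series, 5)
print(distance, first, second)
```

On this toy input the two planted copies at offsets 0 and 9 are returned with distance zero. Whether such a pair is meaningful, rather than a chance repetition, is the evaluation question this page is concerned with.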

Figure 1 - Example of motifs in their original context (top), and overlaid on a common reference frame so their similarity can be observed (bottom).
Left image: randomly generated time series; right: electroencephalogram (EEG) time series.

Many different approaches have been proposed for extracting motifs efficiently.

Surprisingly, very few works address how to evaluate the extracted motifs.

Motifs are typically evaluated by human experts, which is:
  • Subjective (a pattern that the expert labels as incorrect may simply be a correct but unexpected pattern)
  • Slow (the data may amount to terabytes)

In practice, this is infeasible.
Automatic evaluation measures are necessary.

We present an approach to evaluate time series motifs based on statistical tests.
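
To make the idea of a statistical test on a motif concrete, here is a minimal sketch of one generic possibility: a one-sided binomial test on how often a pattern occurs. This is an assumption chosen for illustration and is not claimed to be the procedure of the paper. The function name `binomial_tail_p_value`, the null probability, and the counts are all hypothetical.

```python
import math

def binomial_tail_p_value(observed, trials, p_null):
    """Return P(X >= observed) for X ~ Binomial(trials, p_null).

    A small value means the observed count would be unlikely if the
    pattern occurred only by chance with probability p_null per trial.
    """
    return sum(
        math.comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(observed, trials + 1)
    )

# Hypothetical numbers: 100 candidate positions, a 1% chance of the pattern
# at each position under the null model, and 5 observed occurrences.
p_value = binomial_tail_p_value(observed=5, trials=100, p_null=0.01)
print(f"p-value = {p_value:.4f}")
```

With an expected count of one occurrence under the null, observing five gives a small p-value, so the pattern would be flagged as unlikely to be due to chance. The hard part in practice is choosing a defensible null probability, which this sketch simply takes as given.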

We intend to calculate (to the best of our knowledge, for the first time in the time series literature) the p-value of each time series motif. To do so, we follow a very simple three-step approach:

Our approach:

We aim to highlight the importance of motif evaluation, since we believe it is crucial to making motif mining a useful task in practice.

Full experimental results:
Table I results.
Full results for Table II (p-values and ranks), for all datasets.