Visualizing and Discovering Non-Trivial Patterns in Large Time Series Databases

Visualizing and Discovering Non-Trivial Patterns in Large Time Series Databases Quan Le HCI Lab 23rd Mar, 2015 Jessica Lin, Eamonn Keogh, Stefano Lonardi, Jeffrey P. Lankford, Donna M. Nystrom Computer Science & Engineering Department University of California, Riverside, CA 92521 Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004

Upload: quan-le

Post on 17-Jul-2015


TRANSCRIPT

Page 1: Visualizing and discovering non trivial patterns in large time-series databases

Visualizing and Discovering Non-Trivial Patterns in Large Time Series Databases

Quan Le HCI Lab

23rd Mar, 2015

Jessica Lin, Eamonn Keogh, Stefano Lonardi, Jeffrey P. Lankford, Donna M. Nystrom

Computer Science & Engineering Department, University of California, Riverside, CA 92521

Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004

Page 2: Visualizing and discovering non trivial patterns in large time-series databases

Contents

Introduction

Background

Time series data mining tasks

Visualizing Time Series

VizTree

Evaluation

Conclusion


Fig 1. Time Series Visualization

Page 3: Visualizing and discovering non trivial patterns in large time-series databases

Introduction

- Data visualization techniques are very important for data analysis.

- Visualizing massive time series datasets.

- VizTree: a time series pattern discovery and visualization system based on augmented suffix trees. It supports:

  - Frequently occurring patterns (motif discovery)

  - Surprising patterns (anomaly discovery)

  - Query by content

  - Measuring the dissimilarity between any two time series

Page 4: Visualizing and discovering non trivial patterns in large time-series databases

Introduction

- Application context: the U.S. Department of Defense (DoD) and The Aerospace Corp (TAC).

- There are two major research directions:

  - Producing better techniques to mine the archival launch data from previous missions (mining stage).

  - Producing better techniques to visualize the streaming telemetry data in the hours before launch (monitoring stage).

- Dr. Ben Shneiderman of the University of Maryland: “Overview, zoom & filter, details-on-demand”.

Page 5: Visualizing and discovering non trivial patterns in large time-series databases

Background

Time Series data mining tasks

Visualizing Time Series

Page 6: Visualizing and discovering non trivial patterns in large time-series databases

Time Series data mining tasks

Subsequence matching

Motif discovery

Anomaly detection

Page 7: Visualizing and discovering non trivial patterns in large time-series databases

Subsequence Matching

- Sequence matching has long been divided into two categories: whole matching and subsequence matching.

- Subsequence matching: a short query subsequence is matched against a longer time series by sliding it along the longer sequence, looking for the best matching location.

- Chunking: the process where a time series is broken into individual time series by a specific period.
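The sliding search described above can be sketched in a few lines. This is only an illustration: the slide does not name a distance measure, so Euclidean distance is an assumption here, and the function names `euclidean` and `best_match` as well as the sample data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(series, query):
    """Slide `query` along `series`; return (offset, distance) of the closest window."""
    m = len(query)
    if m == 0 or m > len(series):
        raise ValueError("query must be non-empty and no longer than the series")
    best_offset, best_dist = 0, float("inf")
    for i in range(len(series) - m + 1):
        d = euclidean(series[i:i + m], query)
        if d < best_dist:  # keep the first window with the smallest distance
            best_offset, best_dist = i, d
    return best_offset, best_dist

# Hypothetical data: the query shape occurs exactly once in the longer series.
series = [0, 0, 1, 3, 1, 0, 0, 2, 5, 2, 0]
query = [2, 5, 2]
print(best_match(series, query))
```

The scan compares the query against every window, so its cost grows with the product of the series length and the query length.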


Fig 2. A weekly map of drought conditions in Texas

Page 8: Visualizing and discovering non trivial patterns in large time-series databases

Time Series Motif Discovery

- A substantial body of literature has been devoted to techniques to discover frequently recurring, overrepresented patterns in time series.


Fig 3. Time series Subsequence Motifs Discovery

Page 9: Visualizing and discovering non trivial patterns in large time-series databases

Anomaly Detection

- The problem of detecting anomalous/surprising patterns has attracted much attention.

- Keogh’s definition: a pattern “whose frequency of occurrences differs substantially from that expected, given previously seen data”.

Fig 4. Illustration of anomalous series detection (red represents anomalous time series)

Page 10: Visualizing and discovering non trivial patterns in large time-series databases

Visualizing Time Series

TimeSearcher

Cluster and Calendar-based visualization

Spirals

Page 11: Visualizing and discovering non trivial patterns in large time-series databases

TimeSearcher

- A time series exploration and visualization tool that allows users to retrieve time series by creating queries.

Fig 5. The TimeSearcher visual query interface. Users can filter away sequences that are not interesting by insisting that all sequences have at least one data point within the query box.

http://www.cs.umd.edu/hcil/timesearcher/videos/TimeSearcherDemo.mp4

Flexibility

Specify different regions to compare.

Page 12: Visualizing and discovering non trivial patterns in large time-series databases

Cluster and Calendar-based visualization

- The time series data are chunked into sequences of day patterns.

- This visualization system displays the patterns on a calendar, with each day color-coded by the cluster that it belongs to.

Fig 6. The cluster and calendar-based visualization on employee working hours data. It shows 6 clusters, representing different working day patterns.

Good overview

Limited to calendar-based data
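The chunking step this approach starts from can be sketched as below. The clustering and calendar display are left out, and the function name `chunk_by_period`, the sampling rate of four readings per day, and the sample values are all assumptions made for illustration.

```python
def chunk_by_period(series, period):
    """Split `series` into consecutive chunks of `period` samples.

    An incomplete chunk at the end is dropped so every chunk has the same length.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    n_full = len(series) // period
    return [series[i * period:(i + 1) * period] for i in range(n_full)]

# Hypothetical data: three days sampled four times per day, plus one stray sample.
readings = [1, 2, 3, 4, 1, 2, 3, 5, 9, 9, 9, 9, 7]
days = chunk_by_period(readings, period=4)
for day_index, day in enumerate(days):
    print(day_index, day)
```

Each resulting day pattern could then be handed to any clustering routine; which one the original system uses is not stated on the slide.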

Page 13: Visualizing and discovering non trivial patterns in large time-series databases

Spirals

- Weber developed this tool to visualize time series on spirals.

- It helps identify periodic structures in the data.

- Its utility is limited for time series data that do not exhibit periodic behavior.

- It requires pixel space linear in the length of the time series.

Fig 7. The Spiral visualization approach of Weber applied to the power usage dataset

Page 14: Visualizing and discovering non trivial patterns in large time-series databases

VizTree - Motivation


Here are two sets of bit strings. Which set was generated by a human and which one by a computer?

0101100101111001101001000010001010 0110110101110000101010111011111000 1101101101111110100110010010001101 0001111001101101000101111000101101 0011011001101000000100110001001110 000011101001100101100001010010

1000100010100100010101010000101010 0010101110111101011010010111010010 1010011101010101001010010101011101 0101001010101011010101001011001011 1011110100011100001010000100111010 100011100001010101100101110101

Page 15: Visualizing and discovering non trivial patterns in large time-series databases

VizTree - Motivation


Fig 8. (Left) Computer-generated random bits presented as an augmented suffix tree. (Right) Human-constructed bits presented as an augmented suffix tree.


1000100010100100010101010000101010 0010101110111101011010010111010010 1010011101010101001010010101011101 0101001010101011010101001011001011 1011110100011100001010000100111010 100011100001010101100101110101

0101100101111001101001000010001010 0110110101110000101010111011111000 1101101101111110100110010010001101 0001111001101101000101111000101101 0011011001101000000100110001001110 000011101001100101100001010010
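The frequencies that such a tree encodes can be reproduced by counting every overlapping three-bit substring in each set. The sketch below does only the counting, not the drawing. The function name `substring_counts` and the labels `first_set` and `second_set` are hypothetical, and the transcript does not record which set belongs to which tree, so no conclusion is hard-coded.

```python
from collections import Counter

def substring_counts(bits, length=3):
    """Count every overlapping substring of `length` symbols, ignoring spaces."""
    s = bits.replace(" ", "")
    return Counter(s[i:i + length] for i in range(len(s) - length + 1))

# The two sets of bit strings, copied from the slide in the order they first appear.
first_set = (
    "0101100101111001101001000010001010 0110110101110000101010111011111000 "
    "1101101101111110100110010010001101 0001111001101101000101111000101101 "
    "0011011001101000000100110001001110 000011101001100101100001010010"
)
second_set = (
    "1000100010100100010101010000101010 0010101110111101011010010111010010 "
    "1010011101010101001010010101011101 0101001010101011010101001011001011 "
    "1011110100011100001010000100111010 100011100001010101100101110101"
)

patterns = [format(i, "03b") for i in range(8)]  # "000" through "111"
for name, bits in (("first set", first_set), ("second set", second_set)):
    counts = substring_counts(bits)
    print(name, {p: counts[p] for p in patterns})
```

A set whose eight counts are roughly equal would give branches of similar thickness, while a set that avoids some patterns would give visibly uneven ones.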

Page 16: Visualizing and discovering non trivial patterns in large time-series databases

VizTree - Motivation

- The strings represented in the tree are in fact “subsequences” rather than “suffixes”.

- The time series is first converted to symbols using a discretization method.

- Given the same parameters, the tree has the same overall shape for any dataset.
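One way to see why the shape is fixed: with a given alphabet and word length, every possible branch exists in advance, and the data only changes how many subsequences pass through each branch. The sketch below is a simplified model under that assumption, storing the tree as a flat prefix-to-count mapping. The function name `build_tree` and the sample words are hypothetical, and this is not VizTree's actual data structure.

```python
from itertools import product

def build_tree(words, alphabet, depth):
    """Return {prefix: count} for every possible prefix of 1..depth symbols.

    The key set depends only on `alphabet` and `depth`; the data only
    changes the counts, which correspond to branch thickness.
    """
    counts = {
        "".join(p): 0
        for k in range(1, depth + 1)
        for p in product(alphabet, repeat=k)
    }
    for word in words:
        if len(word) != depth or any(ch not in alphabet for ch in word):
            raise ValueError(f"unexpected word: {word!r}")
        for k in range(1, depth + 1):
            counts[word[:k]] += 1  # every prefix of the word gains one subsequence
    return counts

# Two hypothetical datasets, already discretized into three-symbol words.
tree_a = build_tree(["aba", "abb", "aba", "cab"], alphabet="abc", depth=3)
tree_b = build_tree(["ccc", "ccc"], alphabet="abc", depth=3)

print(tree_a.keys() == tree_b.keys())  # same branches for both datasets
print(tree_a["a"], tree_a["ab"], tree_a["aba"])
```

Because both trees share the same keys, two datasets can be compared branch by branch.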

Fig 9. VizTree Tool

Page 17: Visualizing and discovering non trivial patterns in large time-series databases

Discretizing time series method

SAX (Symbolic Aggregate approXimation)

Example output word: baabccbc

Page 18: Visualizing and discovering non trivial patterns in large time-series databases


Discretizing time series method

- Convert the time series to PAA (Piecewise Aggregate Approximation).

- Convert the PAA to symbols.
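The two conversion steps can be sketched as follows. Normalizing the series first and choosing cut points that split a standard normal distribution into equally likely regions are part of the usual SAX recipe but are not spelled out on the slide, so treat them as assumptions. The function names and sample data are hypothetical, and the sketch requires the series length to be a multiple of the number of segments.

```python
from statistics import NormalDist, mean, pstdev

def znormalize(series):
    """Rescale to zero mean and unit standard deviation (all zeros if constant)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame."""
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    width = len(series) // n_segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(n_segments)]

def breakpoints(alphabet_size):
    """Cut points splitting a standard normal distribution into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]

def sax(series, n_segments, alphabet="abc"):
    """Convert a numeric series into a word of `n_segments` symbols."""
    cuts = breakpoints(len(alphabet))
    word = []
    for value in paa(znormalize(series), n_segments):
        index = sum(value >= c for c in cuts)  # number of cut points at or below value
        word.append(alphabet[index])
    return "".join(word)

# Hypothetical data: a steadily rising series of length 8, reduced to 4 symbols.
print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))
```

The two parameters here, the number of segments and the alphabet size, play the role of the word length and alphabet settings that determine the tree's depth and branching.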

Fig 10. A summarization of the notation used

Page 19: Visualizing and discovering non trivial patterns in large time-series databases


Discretizing time series method

Fig 11. A time series dataset of electrical consumption (of length 1024) is converted into an eight-symbol string “acdcbdba”. Note that the general shape of the time series is preserved, in spite of the massive amount of dimensionality reduction.

Page 20: Visualizing and discovering non trivial patterns in large time-series databases


VizTree – First Look

Fig 12. A screenshot of VizTree

The parameter setting area

The actual subsequence when the technician clicks on a branch

The input time series

The subsequence tree for the time series

Zoom-in window

Page 21: Visualizing and discovering non trivial patterns in large time-series databases

VizTree

Motif Discovery

Anomaly Detection

Diff-Tree (Surprising Patterns)

Page 22: Visualizing and discovering non trivial patterns in large time-series databases


Motif Discovery

Fig 13. Example of Motif discovery on the winding dataset. Two nearly identical subsequences are identified, among the other motifs.

Page 23: Visualizing and discovering non trivial patterns in large time-series databases


Anomaly Detection

Fig 14. Heartbeat data with an anomaly. While the subsequence tree can be used to identify motifs, it can be used for simple anomaly detection as well.
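Reading motifs and anomalies off the same tree amounts to looking at the most and least frequent words. The sketch below assumes the series has already been turned into one word per sliding-window position. The function name `rank_branches` and the sample words are hypothetical.

```python
from collections import defaultdict

def rank_branches(words):
    """Group window offsets by word and sort words from most to least frequent."""
    offsets = defaultdict(list)
    for i, word in enumerate(words):
        offsets[word].append(i)
    # Descending frequency; ties are broken alphabetically for stable output.
    return sorted(offsets.items(), key=lambda item: (-len(item[1]), item[0]))

# Hypothetical words, one per sliding-window position.
words = ["abc", "abc", "bca", "abc", "cab", "bca", "ccc", "abc"]
ranked = rank_branches(words)

motif_word, motif_offsets = ranked[0]   # thickest branch: motif candidate
rare_word, rare_offsets = ranked[-1]    # thinnest branch: anomaly candidate
print("thickest branch:", motif_word, motif_offsets)
print("thinnest branch:", rare_word, rare_offsets)
```

Neighbouring offsets that share a word usually cover the same stretch of the series, so a real tool would merge them before counting. That step is omitted here.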

Page 24: Visualizing and discovering non trivial patterns in large time-series databases


Diff-Tree (Surprising Patterns)

Fig 15. The blue ECG data is the reference data and the green ECG data is the testing data. The resulting tree shows the difference in pattern distributions between the two datasets. The surprising patterns are ranked with the red one.
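A difference in pattern distributions can be approximated by comparing how often each word occurs, relative to dataset size, in the reference and test data. The scoring below, a plain difference of relative frequencies, is an assumption chosen for simplicity and is not necessarily the score VizTree uses. The function name `surprise_ranking` and the sample words are hypothetical.

```python
from collections import Counter

def surprise_ranking(reference_words, test_words):
    """Rank words by the gap between their relative frequencies in the two datasets.

    Positive scores mean the word is more common in the test data than in
    the reference data; negative scores mean it is less common.
    """
    if not reference_words or not test_words:
        raise ValueError("both word lists must be non-empty")
    ref, test = Counter(reference_words), Counter(test_words)
    n_ref, n_test = len(reference_words), len(test_words)
    scores = {
        w: test[w] / n_test - ref[w] / n_ref  # a missing word counts as 0
        for w in set(ref) | set(test)
    }
    return sorted(scores.items(), key=lambda item: (-abs(item[1]), item[0]))

# Hypothetical discretized data: "bb" never occurs in the reference set.
reference = ["ab"] * 8 + ["ba"] * 2
test_data = ["ab"] * 5 + ["ba"] * 1 + ["bb"] * 4

for word, score in surprise_ranking(reference, test_data):
    print(word, round(score, 2))
```

Using relative rather than absolute counts keeps the comparison meaningful when the two datasets have different lengths.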

Page 25: Visualizing and discovering non trivial patterns in large time-series databases


Evaluation

Subsequence Matching & Motif Discovery: Human motion data of Yoga Postures

Anomaly Detection: Power consumption data

Page 26: Visualizing and discovering non trivial patterns in large time-series databases


Yoga Postures – Subsequence Matching

- A model performed yoga routines in front of a green screen.

- The motion capture is transformed into a time series.

- The length of the time series is approximately 26,000.

Fig 16. The sample yoga sequence that we are interested in finding

Page 27: Visualizing and discovering non trivial patterns in large time-series databases


Yoga Postures – Subsequence Matching


Fig 17. Matches for the yoga sequence. The bottom right corner shows how similar these two subsequences are

Page 28: Visualizing and discovering non trivial patterns in large time-series databases


Yoga Postures – Motif Discovery

- Identify approximate motifs by examining the subsequences represented by thick tree paths.

Fig 18. Example of Motifs discovery on the winding dataset. Two nearly identical subsequences are identified, among the other motifs.

Page 29: Visualizing and discovering non trivial patterns in large time-series databases


Power Consumption – Anomaly Detection

- Electricity consumption was recorded every 15 minutes during 1997.

Fig 19. Anomaly detection on power consumption data. The anomaly shown here is a short week during Christmas.

Page 30: Visualizing and discovering non trivial patterns in large time-series databases


Conclusion

- Proposed VizTree as a visualization framework for massive time series datasets.

- It serves both mining and monitoring purposes.

- It can process new data as they arrive.

Page 31: Visualizing and discovering non trivial patterns in large time-series databases

Thank you!