1 high dimensionality evgeny maksakov cs533c department of computer science ubc
Post on 19-Dec-2015
High dimensionality
Evgeny Maksakov
CS533C
Department of Computer Science
UBC
Today
• Problem Overview
• Direct Visualization Approaches
– Dimensional anchors
– Scagnostic SPLOMs
• Nonlinear Dimensionality Reduction
– Locally Linear Embedding and Isomaps
– Charting manifold
Problems with visualizing high dimensional data
• Visual cluttering
• Clarity of representation
• Visualization is time consuming
Classical methods
Multiple Line Graphs
Pictures from Patrick Hoffman et al. (2000)
Multiple Line Graphs
Advantages and disadvantages:
- Hard to distinguish dimensions if multiple line graphs are overlaid
- Each dimension may have a different scale that should be shown
- More than 3 dimensions can become confusing
Scatter Plot Matrices
Pictures from Patrick Hoffman et al. (2000)
Scatter Plot Matrices
Advantages and disadvantages:
+ Useful for looking at all possible two-way interactions between dimensions
- Becomes inadequate for medium to high dimensionality
Bar Charts, Histograms
Pictures from Patrick Hoffman et al. (2000)
Bar Charts, Histograms
Advantages and disadvantages:
+ Good for small comparisons
- Contain little data
Survey Plots
Pictures from Patrick Hoffman et al. (2000)
Survey Plots
Advantages and disadvantages:
+ Shows correlations between any two variables when the data is sorted by one particular dimension
- Can be confusing
Parallel Coordinates
Pictures from Patrick Hoffman et al. (2000)
Parallel Coordinates
Advantages and disadvantages:
+ Many connected dimensions are seen in limited space
+ Can see trends in data
- Become inadequate for very high dimensionality
- Cluttering
Circular Parallel Coordinates
Pictures from Patrick Hoffman et al. (2000)
Circular Parallel Coordinates
Advantages and disadvantages:
+ Combines properties of glyphs and parallel coordinates, making pattern recognition easier
+ Compact
- Cluttering near center
- Harder to interpret relations between each pair of dimensions than parallel coordinates
Andrews’ Curves
Advantages and disadvantages:
+ Allows drawing a virtually unlimited number of dimensions
- Hard to interpret
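
The slide does not give the formula behind Andrews' curves. A minimal sketch, assuming the standard definition f(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., with t in [-π, π]; the function names `andrews_curve` and `sample_curve` are hypothetical:

```python
import math

def andrews_curve(x, t):
    """Evaluate the Andrews curve of one record x at parameter t."""
    total = x[0] / math.sqrt(2.0)
    for i, value in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2  # frequencies 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            total += value * math.sin(harmonic * t)
        else:
            total += value * math.cos(harmonic * t)
    return total

def sample_curve(x, steps=100):
    """Sample the curve on [-pi, pi] as a list of (t, f(t)) pairs."""
    ts = [-math.pi + 2.0 * math.pi * k / (steps - 1) for k in range(steps)]
    return [(t, andrews_curve(x, t)) for t in ts]
```

Each record becomes one curve, so adding a dimension only adds one more term to the sum; this is why the number of dimensions is practically unlimited, and also why individual dimensions are hard to read back from the plot.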
Radviz
Radviz employs a spring model
Pictures from Patrick Hoffman et al. (2000)
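
The spring model can be sketched as follows. This is an illustration under assumptions, not the authors' code: each dimension gets an anchor on the unit circle (evenly spaced here, which is an assumed layout), each value is already scaled to [0, 1] and acts as the stiffness of a spring to its anchor, and the record is drawn at the equilibrium point, which is the value-weighted average of the anchors. The name `radviz_point` is hypothetical.

```python
import math

def radviz_point(values):
    """Place one record (values scaled to [0, 1]) inside the unit circle."""
    n = len(values)
    # Assumed layout: one anchor per dimension, evenly spaced on the circle.
    anchors = [(math.cos(2.0 * math.pi * j / n), math.sin(2.0 * math.pi * j / n))
               for j in range(n)]
    total = sum(values)
    if total == 0:
        return (0.0, 0.0)  # no spring pulls, so the point stays at the center
    # Spring equilibrium: anchors weighted by the record's values.
    x = sum(v * ax for v, (ax, _) in zip(values, anchors)) / total
    y = sum(v * ay for v, (_, ay) in zip(values, anchors)) / total
    return (x, y)
```

A record with equal values in all dimensions lands at the center, and a record with a single non-zero dimension lands on that dimension's anchor.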
Radviz
Advantages and disadvantages:
+ Good for data manipulation
+ Low cluttering
- Cannot show quantitative data
- High computational complexity
Parameters of Dimensional Anchors (DA)
Survey plot feature
4. Width of the rectangle in a survey plot
Parallel coordinates features
5. Length of the parallel coordinate lines
6. Blocking factor for the parallel coordinate lines
Parameters of DA
Radviz features
7. Size of the radviz plot point
8. Length of “spring” lines extending from individual anchor points of radviz plot
9. Zoom factor for the “spring” constant K
DA Visualization Vector
P = (p1, p2, p3, p4, p5, p6, p7, p8, p9)
DA describes visualization for any combination of:
• Parallel coordinates
• Scatterplot matrices
• Radviz
• Survey plots (histograms)
• Circle segments
Scatterplots
2 DAs, P = (0.8, 0.2, 0, 0, 0, 0, 0, 0, 0)
2 DAs, P = (0.1, 1.0, 0, 0, 0, 0, 0, 0, 0)
Picture from Patrick Hoffman et al. (1999)
Scatterplots with other layouts
3 DAs, P = (0.6, 0, 0, 0, 0, 0, 0, 0, 0)
5 DAs, P = (0.5, 0, 0, 0, 0, 0, 0, 0, 0)
Picture from Patrick Hoffman et al. (1999)
Survey Plots
P = (0, 0, 0, 0.4, 0, 0, 0, 0, 0)
P = (0, 0, 0, 1.0, 0, 0, 0, 0, 0)
Picture from Patrick Hoffman et al. (1999)
Circular Segments
P = (0, 0, 0, 1.0, 0, 0, 0, 0, 0)
Picture from Patrick Hoffman et al. (1999)
Parallel Coordinates
P = (0, 0, 0, 0, 1.0, 1.0, 0, 0, 0)
Picture from Patrick Hoffman et al. (1999)
Radviz-like visualization
P = (0, 0, 0, 0, 0, 0, 0.5, 1.0, 0.5)
Picture from Patrick Hoffman et al. (1999)
Playing with parameters
Crisscross layout with P = (0, 0, 0, 0, 0, 0, 0.4, 0, 0.5)
Parallel coordinates with P = (0, 0, 0, 0, 0, 0, 0.4, 0, 0.5)
Pictures from Patrick Hoffman et al. (1999)
More?
Pictures from Patrick Hoffman et al. (1999)
Scatterplot Diagnostics
or
Scagnostics
Tukey’s Idea of Scagnostics
• Take measures from scatterplot matrix
• Construct scatterplot matrix (SPLOM) of these measures
• Look for data trends in this SPLOM
Scagnostic SPLOM
Is like:
• Visualization of a set of pointers
Also:
• A set of pointers to pointers can also be constructed
Goal:
• To be able to locate unusual clusters of measures that characterize unusual clusters of raw scatterplots
Problems with constructing Scagnostic SPLOM
1) Some of Tukey’s measures presume an underlying continuous empirical or theoretical probability function, which can be a problem for other types of data.
2) The computational complexity of some of Tukey’s measures is O(n³).
Solution*
1. Use measures from graph theory.
– Do not presume a connected plane of support
– Can be metric over discrete spaces
2. Base the measures on subsets of the Delaunay triangulation
• Gives O(n log n) in the number of points
3. Use adaptive hexagon binning before computing to further reduce the dependence on n.
4. Remove outlying points from the spanning tree
* Leland Wilkinson et al. (2005)
Properties of geometric graph for measures
• Undirected (edges consist of unordered pairs)
• Simple (no edge pairs a vertex with itself)
• Planar (has an embedding in R² with no crossed edges)
• Straight (embedded edges are straight line segments)
• Finite (V and E are finite sets)
Graphs that fit these demands:
• Convex Hull
• Alpha Hull
• Minimal Spanning Tree
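
As an illustration of one of these graphs, the following is a minimal sketch of a minimal spanning tree over 2D points and of its total length. It uses plain Prim's algorithm in O(n²) on the complete graph, which is an assumption made for brevity and not the Delaunay-based O(n log n) construction of Wilkinson et al.; the names `minimum_spanning_tree` and `graph_length` are hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Return MST edges as (i, j, length) using Prim's algorithm, O(n^2)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_from = [-1] * n         # tree vertex that provides that connection
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Update the cheapest connection for the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges

def graph_length(edges):
    """Sum of the edge lengths of a geometric graph."""
    return sum(length for _, _, length in edges)
```

The resulting tree is undirected, simple, planar, straight, and finite, so it satisfies the properties listed for the measures.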
Measures:
• Length of an edge
• Length of a graph
• Look for a closed path (boundary of a polygon)
• Perimeter of a polygon
• Area of a polygon
• Diameter of a graph
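
The polygon measures can be sketched on the convex hull of a point set. This is an illustrative sketch only: the hull is built with Andrew's monotone chain (an assumed choice of algorithm), the area uses the shoelace formula, and the function names are hypothetical.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_perimeter(poly):
    """Length of the closed path around the polygon."""
    n = len(poly)
    return sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))

def polygon_area(poly):
    """Area by the shoelace formula."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0
```

Points must be given as tuples so that duplicates can be removed before the hull is built.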
Five interesting aspects of scattered points:
• Outliers – Outlying
• Shape
– Convex
– Skinny
– Stringy
– Straight
• Trend
– Monotonic
• Density
– Skewed
– Clumpy
• Coherence
– Striated
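
To make one of these measures concrete, here is a hedged sketch of the Monotonic measure, computed for every pair of dimensions so that each scatterplot yields one number. It assumes the definition in Wilkinson et al. (2005), where Monotonic is the squared Spearman rank correlation; the function names and the dictionary input format are hypothetical.

```python
from itertools import combinations

def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5

def monotonic(xs, ys):
    """Squared Spearman correlation of one scatterplot."""
    return pearson(average_ranks(xs), average_ranks(ys)) ** 2

def monotonic_for_all_pairs(columns):
    """columns: dict name -> list of values. One measure per scatterplot."""
    return {(a, b): monotonic(columns[a], columns[b])
            for a, b in combinations(sorted(columns), 2)}
```

Repeating this for the other measures gives one row of numbers per scatterplot, and the scagnostic SPLOM is the scatterplot matrix of those rows.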
Classifying scatterplots
Picture from L. Wilkinson et al. (2005)
Looking for anomalies
Picture from L. Wilkinson et al. (2005)
Picture from L. Wilkinson et al. (2005)
Nonlinear Dimensionality Reduction (NLDR)
Assumptions:
• Data of interest lies on an embedded nonlinear manifold within a higher-dimensional space
• The manifold is low dimensional, so it can be visualized in a low-dimensional space
Picture from: http://en.wikipedia.org/wiki/Image:KleinBottle-01.png
Manifold
Topological space that is “locally Euclidean”.
Picture from: http://en.wikipedia.org/wiki/Image:Triangle_on_globe.jpg
Methods
• Locally Linear Embedding
• ISOMAPS
Isomaps Algorithm
1. Construct neighborhood graph
2. Compute shortest paths
3. Construct d-dimensional embedding (like in MDS)
Picture from: Joshua B. Tenenbaum et al. (2000)
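
The first two steps can be sketched in a few lines. This is a simplified illustration, not the reference implementation: it assumes a symmetric k-nearest-neighbor graph and uses Floyd-Warshall for the shortest paths, and it stops before step 3 (the resulting distance matrix would be passed to classical MDS). The name `geodesic_distances` is hypothetical.

```python
import math

def geodesic_distances(points, k):
    """Steps 1-2 of Isomap: k-NN graph, then all-pairs shortest paths."""
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        # Step 1: connect i to its k nearest neighbors (kept symmetric).
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in neighbors:
            d = math.dist(points[i], points[j])
            dist[i][j] = min(dist[i][j], d)
            dist[j][i] = min(dist[j][i], d)
    # Step 2: Floyd-Warshall shortest paths, O(n^3).
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][m] + dist[m][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist
```

For points sampled along a U-shaped curve, the distance between the two ends comes out as the length of the path along the curve rather than the short straight line across the gap. Entries stay infinite if the neighborhood graph is disconnected.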
Pictures taken from http://www.cs.wustl.edu/~pless/isomapImages.html
Locally Linear Embedding (LLE) Algorithm
Picture from Lawrence K. Saul et al. (2002)
Application of LLE
Panels: original sample; mapping by LLE
Picture from Lawrence K. Saul et al. (2002)
Limitations of LLE
• The algorithm can only recover embeddings whose dimensionality, d, is strictly less than the number of neighbors, K. Some margin between d and K is recommended.
• The algorithm assumes that a data point and its nearest neighbors can be modeled as locally linear; for curved manifolds, too large a K violates this assumption.
• If the data is of low dimensionality to begin with, the algorithm degenerates.
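
The local-linearity assumption corresponds to reconstructing each point from its K neighbors with weights that sum to one. A minimal sketch of that weight computation for a single point, following the local Gram-matrix formulation described by Saul and Roweis; the regularization constant `reg`, the small linear solver, and the function names are assumptions made for illustration:

```python
def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def reconstruction_weights(point, neighbors, reg=1e-3):
    """Weights that best reconstruct `point` from `neighbors`, summing to 1."""
    k = len(neighbors)
    diffs = [[p - q for p, q in zip(point, nb)] for nb in neighbors]
    gram = [[sum(u * v for u, v in zip(diffs[i], diffs[j])) for j in range(k)]
            for i in range(k)]
    trace = sum(gram[i][i] for i in range(k))
    if trace == 0:
        return [1.0 / k] * k  # point coincides with all of its neighbors
    for i in range(k):
        # Regularize: the Gram matrix is singular when K exceeds the data dimension.
        gram[i][i] += reg * trace
    w = solve_linear_system(gram, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]
```

A point halfway between two neighbors gets weights close to 0.5 each. The same weights are then reused to place the low-dimensional coordinates, which is the step this sketch leaves out.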
Proposed improvements*
• Analyze pairwise distances between data points instead of assuming that the data are multidimensional vectors
• Reconstruct convex
• Estimate the intrinsic dimensionality
• Enforce the intrinsic dimensionality if it is known a priori or highly suspected
* Lawrence K. Saul et al. (2002)
Strengths and weaknesses:
• ISOMAP handles holes well
• ISOMAP can fail if data hull is non-convex
• Vice versa for LLE
• Both offer embeddings without mappings.
Charting manifold
Algorithm Idea
1) Find a set of data covering locally linear neighborhoods (“charts”) such that adjoining neighborhoods span maximally similar subspaces
2) Compute a minimal-distortion merger (“connection”) of all charts
Picture from Matthew Brand (2003)
Video test
Picture from Matthew Brand (2003)
Where ISOMAPs and LLE fail, Charting prevails
Picture from Matthew Brand (2003)
Questions?
Literature
Covered papers:
1. Graph-Theoretic Scagnostics. L. Wilkinson, R. Grossman, A. Anand. Proc. InfoVis 2005.
2. Dimensional Anchors: a Graphic Primitive for Multidimensional Multivariate Information Visualizations. Patrick Hoffman et al. Proc. Workshop on New Paradigms in Information Visualization and Manipulation, Nov. 1999, pp. 9-16.
3. Charting a manifold. Matthew Brand. NIPS 2003.
4. Think Globally, Fit Locally: Unsupervised Learning of Nonlinear Manifolds. Lawrence K. Saul & Sam T. Roweis. University of Pennsylvania Technical Report MS-CIS-02-18, 2002.
Other papers:
• A Global Geometric Framework for Nonlinear Dimensionality Reduction. Joshua B. Tenenbaum, Vin de Silva, John C. Langford. Science, vol. 290, pp. 2319-2323 (2000).