High-Dimensional Data (source: vda.univie.ac.at/Teaching/Vis/13s/LectureNotes/14_attribute.pdf)
© Munzner/Möller
High-Dimensional Data
Visualization
Torsten Möller
Readings
• Munzner, “Visualization Design and Analysis”:
  – Chapter 8 (Reducing Items and Attributes)
Data Reduction
• Cannot make sense of everything at once
• reduce amount shown
  – items (rows of a table / elements)
  – attributes (columns of a table / dimensions)
Last time - Item reduction
• simple filtering
• simple aggregation
• navigation (camera-oriented: zooming)
  – geometric vs. semantic zooming
  – pan/translate
  – constraints + combinations
• overviews: focus + context
  – selective filtering
  – geometric distortions
Today - Attribute reduction
• simple attribute filtering
• camera-oriented:
  – slice
  – cut
  – project
• aggregation: dimensionality reduction
  – linear
  – non-linear
Attribute filtering: SPLOM
• filtering, but for attributes rather than items
  – unfiltered vs. filtered SPLOM
[Yang et al. InfoVis 2003]
Attribute filtering: Parallel Coordinates
• unfiltered vs. filtered PC
[Yang et al. InfoVis 2003]
Attribute filtering: Star Glyphs
• unfiltered vs. filtered
[Yang et al. InfoVis 2003]
Today - Attribute reduction
• simple attribute filtering
• camera-oriented:
  – slice
  – cut
  – project
• aggregation: dimensionality reduction
  – linear
  – non-linear
Slicing / Cutting: Spatial Data
• easy-to-understand metaphor
• reduces data from 3D to 2D
Axis-aligned slices
http://rsbweb.nih.gov/ij/plugins/volume-viewer.html
HyperSlice: slicing in multi-d
• HyperSlice: matrix of orthogonal 2D slices
  – each panel is display and control: drag to change slice
  – simple 3D example
Example
• 4D function $f(x) = \sum_{i=0}^{3} \frac{w_i}{1 + |x - p_i|^2}$
• diagonals = standard graph
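A minimal sketch of how the off-diagonal HyperSlice panels for such a function could be sampled. The weights `weights`, the centers `centers`, the focal point `focus`, the sampling range, and the helper names are all hypothetical, and |·| is read here as the Euclidean norm. Each panel varies two dimensions and holds the other two fixed at the focal point.

```python
import numpy as np

def f(x, weights, centers):
    """Evaluate sum_i w_i / (1 + |x - p_i|^2) for points x of shape (..., d)."""
    diff = x[..., None, :] - centers        # shape (..., k, d)
    sq_dist = np.sum(diff ** 2, axis=-1)    # shape (..., k)
    return np.sum(weights / (1.0 + sq_dist), axis=-1)

def slice_2d(func, focus, i, j, lo=-1.0, hi=1.0, n=32):
    """Sample func on the axis-aligned 2D slice through `focus` spanned by dims i and j."""
    ticks = np.linspace(lo, hi, n)
    gi, gj = np.meshgrid(ticks, ticks, indexing="ij")
    pts = np.broadcast_to(focus, (n, n, focus.size)).copy()
    pts[..., i] = gi                        # vary dimension i along rows
    pts[..., j] = gj                        # vary dimension j along columns
    return func(pts)

# Hypothetical parameters: four weights and four centers in 4D.
rng = np.random.default_rng(0)
weights = rng.uniform(0.5, 1.5, size=4)
centers = rng.uniform(-1.0, 1.0, size=(4, 4))
focus = np.zeros(4)

# One panel per ordered pair of distinct dimensions.
panels = {
    (i, j): slice_2d(lambda x: f(x, weights, centers), focus, i, j)
    for i in range(4) for j in range(4) if i != j
}
```

Dragging inside a panel corresponds to changing `focus` and resampling every panel. The diagonal panels would be ordinary 1D graphs of `f` along a single dimension.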
(orthographic) Projections
• (axis-aligned) orthographic: remove all information about filtered dims
  – hypercube: 3D to 2D, 4D to 3D (video)
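As a small illustration, an axis-aligned orthographic projection of tabular data amounts to dropping columns. The sketch below projects the corners of a 4D hypercube to 3D and then to 2D; the function name `orthographic_projection` is an assumption made for this example.

```python
import numpy as np

def orthographic_projection(points, keep_dims):
    """Axis-aligned orthographic projection: keep the listed columns, drop the rest."""
    return np.asarray(points)[:, list(keep_dims)]

# The 16 corners of a 4D hypercube, one row per corner.
corners_4d = np.array(
    [[(k >> b) & 1 for b in range(4)] for k in range(16)], dtype=float
)
corners_3d = orthographic_projection(corners_4d, keep_dims=(0, 1, 2))
corners_2d = orthographic_projection(corners_3d, keep_dims=(0, 1))
```

Corners that differ only in a dropped dimension land on the same projected point, which is the sense in which all information about the filtered dimensions is removed.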
SPLOM: orthographic projections in multi-d
Oblique (parallel) projections
• Projectors are not normal to projection plane
• Most drawings in textbooks use oblique projection
Common oblique projections
• Cavalier projections
  – Angle α = 45 degrees
  – Preserves the length of a line segment perpendicular to the projection plane
  – Angle φ is typically 30 or 45 degrees
• Cabinet projections
  – Angle α = arctan(2) ≈ 63.4 degrees
  – Halves the length of a line segment perpendicular to the projection plane
  – more realistic than cavalier
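The two cases can be checked numerically. This is a sketch under one common convention, assumed here: the projection plane is z = 0 and a point (x, y, z) maps to (x + z·cot α·cos φ, y + z·cot α·sin φ). Sign and axis conventions vary between texts, and the function name `oblique_matrix` is hypothetical.

```python
import math
import numpy as np

def oblique_matrix(alpha_deg, phi_deg):
    """2x3 matrix for an oblique parallel projection onto the plane z = 0.

    alpha is the angle between the projectors and the projection plane,
    phi is the direction in which the depth axis is drawn in the plane.
    """
    length = 1.0 / math.tan(math.radians(alpha_deg))   # cot(alpha): depth scale factor
    phi = math.radians(phi_deg)
    return np.array([
        [1.0, 0.0, length * math.cos(phi)],
        [0.0, 1.0, length * math.sin(phi)],
    ])

cavalier = oblique_matrix(45.0, 30.0)
cabinet = oblique_matrix(math.degrees(math.atan(2.0)), 30.0)

# Projected length of a unit segment perpendicular to the projection plane.
unit_z = np.array([0.0, 0.0, 1.0])
cavalier_len = np.linalg.norm(cavalier @ unit_z)
cabinet_len = np.linalg.norm(cabinet @ unit_z)
```

With α = 45 degrees the depth scale cot α is 1, so the segment keeps its length; with α = arctan(2) it is 1/2, so the segment is halved.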
(perspective) Projections
• some info about filtered dims remains
• mimics human visual system in 3D
• not as effective in multi-D
Today - Attribute reduction
• simple attribute filtering
• camera-oriented:
  – slice
  – cut
  – project
• aggregation: dimensionality reduction
  – linear
  – non-linear
Dimensionality vs. attribute reduction
• vocab use in field not consistent
  – dimension/attribute
• attribute reduction: reduce set with filtering
  – includes orthographic projection
• dimensionality reduction: create smaller set of new dims
  – set size is smaller than original, new dims completely synthetic
  – clarification: includes dimensional aggregation
  – includes some projections (but not all)
• vocab: projection/mapping
Uncovering Hidden Structure
• measurements indirect, not direct
  – real-world sensor limitations
• measurements made in sprawling space
  – documents, images
• DR only suitable if (almost) all information could be conveyed with fewer dimensions
  – how do you know? need to estimate true dimensionality to check if different than original!
Dimensionality Reduction
• mapping multidimensional space into space of fewer dimensions
  – typically 2D for clarity
  – 1D/3D possible?
  – keep/explain as much variance as possible
  – show underlying dataset structure
• motivation: true/intrinsic dimension of dataset is much lower than measured dim
• linear vs. non-linear approaches
Estimating True Dimensionality
• error for low-dim projection vs high-dim original
• no single correct answer; many metrics proposed
  – cumulative variance that is not accounted for
  – strain: match variations in distance (vs actual distance values)
  – stress: difference between interpoint distances in high and low dimensions
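To make the stress metric concrete, here is a minimal sketch that compares all interpoint distances before and after a reduction. The normalization by the sum of squared high-dimensional distances is one common choice and an assumption here, not necessarily the variant intended on the slide; the function names are hypothetical.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix for points of shape (n, d)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def normalized_stress(high, low):
    """Root of summed squared distance differences, relative to the high-dim distances."""
    d_high = pairwise_distances(high)
    d_low = pairwise_distances(low)
    upper = np.triu_indices(len(high), k=1)      # count each pair once
    num = np.sum((d_high[upper] - d_low[upper]) ** 2)
    den = np.sum(d_high[upper] ** 2)
    return float(np.sqrt(num / den))

# Synthetic data that is really 2D but stored with 5 attributes.
rng = np.random.default_rng(0)
flat = rng.normal(size=(50, 2))
high = np.hstack([flat, np.zeros((50, 3))])

stress_exact = normalized_stress(high, flat)         # keeps both meaningful dims
stress_lossy = normalized_stress(high, flat[:, :1])  # drops one meaningful dim
```

A stress near zero for the 2D layout and a clearly positive stress for the 1D layout is the kind of evidence used to estimate that the true dimensionality of this synthetic dataset is 2.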
Showing Dimensionality Estimates
• scree plots as simple way: error against # dims
  – original dataset: 294 dims
  – estimate: almost all variance preserved with < 20 dims
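The numbers behind a scree plot can be sketched as follows, using the variance not accounted for by the first k principal components as the error. The synthetic dataset (3 latent dimensions mixed into 20 measured ones) and the function name are assumptions for illustration; this is not the 294-dimensional dataset from the slide.

```python
import numpy as np

def unexplained_variance(data):
    """Fraction of variance NOT accounted for by the first k principal components.

    Entry k-1 of the result is the error when keeping k dimensions.
    """
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2              # proportional to PCA eigenvalues
    explained = np.cumsum(variances) / np.sum(variances)
    return 1.0 - explained

# Hypothetical data: 3 latent dimensions observed through 20 noisy attributes.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
data = latent @ mixing + 0.01 * rng.normal(size=(200, 20))

residual = unexplained_variance(data)
```

Plotting `residual` against 1, 2, ..., 20 gives the scree plot; for this synthetic data the curve drops to nearly zero at 3 dimensions, which is the knee to look for.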
Linear DimRed
• based on linear projections!
• given dimensions have a strong meaning
• want to preserve linearity in layout
• examples: Principal component analysis (PCA), Independent component analysis (ICA), Linear discriminant analysis (LDA), etc.
[Figure panels: PCA, LDA]
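As an illustration of the linear case, the sketch below implements PCA only (not ICA or LDA) through a singular value decomposition. The function name `pca_project` and the synthetic data are assumptions.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project data onto its first principal components.

    Returns the low-dimensional coordinates and the component directions.
    """
    centered = data - data.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, components

# Hypothetical data whose attributes have very different spreads.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
embedding, components = pca_project(data, n_components=2)
```

Each new dimension is a weighted sum of the original attributes (a row of `components`), which is what makes the method linear.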
Problem for linear dimred
Nonlinear DimRed
• no inherent meaning to given dimensions
• minimize differences between interpoint distances in high and low dimensions
• examples: Multidimensional scaling (MDS), Isomap, locally linear embedding (LLE)
Dimred: Isomap
• 4096D to 2D [Tenenbaum 00]
• 2D: wrist rotation, finger extension
[A Global Geometric Framework for Nonlinear Dimensionality Reduction. Tenenbaum, de Silva and Langford. Science 290 (5500): 2319-2323, 22 December 2000, isomap.stanford.edu]
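A compact sketch of the three Isomap steps (neighborhood graph, graph shortest paths, classical MDS on the resulting geodesic distances). This is not the authors' code: it uses a dense Floyd-Warshall pass that only suits small inputs, the toy data is a 2D arc rather than 4096D images, and all names and parameter values are assumptions.

```python
import numpy as np

def isomap(points, n_neighbors=6, n_components=2):
    """Toy Isomap: kNN graph, all-pairs shortest paths, then classical MDS."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))

    # Step 1: symmetric k-nearest-neighbor graph; missing edges are infinite.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]   # skip self at index 0
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nearest.ravel()
    graph[rows, cols] = dist[rows, cols]
    graph[cols, rows] = dist[rows, cols]

    # Step 2: geodesic distances via Floyd-Warshall (assumes a connected graph).
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])

    # Step 3: classical MDS on the geodesic distance matrix.
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    gram = -0.5 * centering @ (graph ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# Toy data: points along three quarters of a circle, a 1D curve sitting in 2D.
t = np.linspace(0.0, 1.5 * np.pi, 60)
arc = np.column_stack([np.cos(t), np.sin(t)])
unrolled = isomap(arc, n_neighbors=3, n_components=1)
```

The single recovered coordinate should follow the position along the arc, which a straight-line projection of the same points cannot do.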
Goals / Tasks
• keep/explain as much variance as possible
• find clusters
  – or compare / evaluate with previous clustering
• understand structure
  – absolute position not reliable
  – fine-grained structure not reliable
Naive Spring Model
• repeat for all points
  – compute spring force to all other points
    • difference between high dim vs. low dim distance (stress)
  – move to better location using computed forces
• compute distances between all points
  – O(n^2) iteration, O(n^3) algorithm
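The loop above can be sketched in a few lines. This version moves all points at once per iteration rather than one at a time, and the step size, iteration count, random initial layout and function names are assumptions chosen for illustration.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix for points of shape (n, d)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def raw_stress(high, low):
    """Sum of squared differences between high-dim and low-dim distances."""
    upper = np.triu_indices(len(high), k=1)
    gap = pairwise_distances(high)[upper] - pairwise_distances(low)[upper]
    return float(np.sum(gap ** 2))

def naive_spring_layout(high, n_iters=200, step=0.1, seed=0):
    """2D layout where every point is connected to every other point by a spring."""
    rng = np.random.default_rng(seed)
    n = len(high)
    d_high = pairwise_distances(high)              # all pairs: O(n^2)
    low = rng.normal(size=(n, 2))                  # random initial layout
    for _ in range(n_iters):
        diff = low[:, None, :] - low[None, :, :]   # diff[i, j] = low[i] - low[j]
        d_low = np.sqrt(np.sum(diff ** 2, axis=-1))
        np.fill_diagonal(d_low, 1.0)               # avoid 0/0 on the diagonal
        # Positive: i and j are too close in the layout, push apart.
        # Negative: they are too far apart, pull together.
        magnitude = (d_high - d_low) / d_low
        np.fill_diagonal(magnitude, 0.0)
        forces = np.sum(magnitude[:, :, None] * diff, axis=1)
        low = low + step * forces / n              # move each point along its net force
    return low

rng = np.random.default_rng(42)
high = rng.normal(size=(40, 5))
initial = naive_spring_layout(high, n_iters=0)     # just the random start
final = naive_spring_layout(high, n_iters=200)
```

Every iteration touches all n^2 pairs, which is the O(n^2) per-iteration cost on the slide; the O(n^3) total follows if the number of iterations grows roughly with n.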
Faster Spring Model [Chalmers 96]
• compare distances only with a few points
  – maintain small local neighborhood set
  – each time pick some randoms, swap in if closer
• small constant: 6 locals, 3 randoms typical
  – O(n) iteration, O(n^2) algorithm
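A rough sketch of the neighbor-set idea, written from the bullets above rather than from the original paper, so details such as the update order, step size and initial neighbor sets are assumptions. Each point keeps `n_local` neighbors, samples `n_random` other points per iteration, swaps a sample into its neighbor set when it is closer in the high-dimensional space, and only feels springs from those few points.

```python
import numpy as np

def sampled_spring_layout(high, n_iters=150, n_local=6, n_random=3, step=0.2, seed=0):
    """Spring layout where each point interacts with a constant number of others."""
    rng = np.random.default_rng(seed)
    n = len(high)
    low = rng.normal(size=(n, 2))
    # Neighbor sets start out random and improve over time.
    local = np.array([
        rng.choice(np.delete(np.arange(n), i), n_local, replace=False)
        for i in range(n)
    ])
    for _ in range(n_iters):
        for i in range(n):
            candidates = rng.choice(n, size=n_random, replace=False)
            candidates = candidates[candidates != i]
            d_local = np.linalg.norm(high[local[i]] - high[i], axis=1)
            for c in candidates:
                if c in local[i]:
                    continue
                d_c = np.linalg.norm(high[c] - high[i])
                worst = np.argmax(d_local)
                if d_c < d_local[worst]:           # swap in if closer
                    local[i, worst] = c
                    d_local[worst] = d_c
            # Springs only to the local set plus this round's random samples.
            others = np.union1d(local[i], candidates)
            diff = low[i] - low[others]
            d_low = np.maximum(np.linalg.norm(diff, axis=1), 1e-9)
            d_high = np.linalg.norm(high[others] - high[i], axis=1)
            force = np.sum(((d_high - d_low) / d_low)[:, None] * diff, axis=0)
            low[i] += step * force / len(others)
    return low, local

rng = np.random.default_rng(7)
high = rng.normal(size=(50, 5))
layout, neighbors = sampled_spring_layout(high)
```

Because each point evaluates at most `n_local + n_random` springs, one pass over all points costs O(n) instead of O(n^2).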
True/Intrinsic Dimensionality
• how many dimensions are enough?
  – could be more than 2 or 3
  – knee in scree plot
• example
  – measured materials from graphics
  – linear PCA: 25
  – get physically impossible intermediate points
A Data-Driven Reflectance Model, SIGGRAPH 2003, W. Matusik, H. Pfister, M. Brand and L. McMillan, people.csail.mit.edu/wojciech/DDRM/ddrm.pdf
True/Intrinsic Dimensionality
• nonlinear MDS: 10-15
  – all intermediate points possible
• they have intuitive meaning
  – red, green, blue, specular, diffuse, glossy, metallic, plastic-y, roughness, rubbery, greasiness, dustiness...
A Data-Driven Reflectance Model, SIGGRAPH 2003, W. Matusik, H. Pfister, M. Brand and L. McMillan, people.csail.mit.edu/wojciech/DDRM/ddrm.pdf
Showing DR Data
• scatterplot showing points
  – only works if true dimensionality is 2 (... or 3)
  – need to drill down to see what points represent
• SPLOM
  – safe choice
• landscapes
  – avoid! studies show worse than just using points