Data Projections & Visualization, Rajmonda Caceres, MIT Lincoln Laboratory
![Page 1: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/1.jpg)
Data Projections & Visualization
Rajmonda Caceres, MIT Lincoln Laboratory
![Page 2: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/2.jpg)
Dimensionality Reduction
Reduce complexity: visual and computational
Identify the intrinsic dimensionality of data
Identify the most relevant aspects of data given a task
![Page 3: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/3.jpg)
The Curse of Dimensionality
Lower Dimension
Higher Dimension
![Page 4: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/4.jpg)
Data Projections
Not all projections are equal
![Page 5: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/5.jpg)
Data Projections
Desired properties:
Reduced, compressed representation
Preserves useful/intrinsic properties of the data
Amplifies patterns of interest (e.g. outliers)
Simple, interpretable
Trade-off between simplicity and preservation of structure
![Page 6: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/6.jpg)
Distance Function
Helps us organize the data
Helps us discriminate patterns
![Page 7: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/7.jpg)
Distance Functions
Manhattan distance (1 norm, taxicab distance)
Euclidean distance (2 norm)
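The two distances named on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk; the function names `manhattan` and `euclidean` and the sample points are assumptions chosen for the example.

```python
import math

def manhattan(x, y):
    """1-norm (taxicab) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """2-norm distance: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Illustrative points: the classic 3-4-5 right triangle.
p, q = (0.0, 0.0), (3.0, 4.0)
print(manhattan(p, q))  # 7.0
print(euclidean(p, q))  # 5.0
```

The Manhattan distance is never smaller than the Euclidean distance between the same two points, which is one reason the choice of distance changes which points look "close".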
![Page 8: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/8.jpg)
Distance Functions
L-p Distance
As p grows, the largest coordinate difference tends to dominate the overall distance
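A small sketch can make the effect of p concrete. It assumes the usual Minkowski form of the L-p distance (the p-th root of the sum of p-th powers of absolute coordinate differences); the names `lp_distance` and `chebyshev` and the sample vectors are hypothetical and only serve the illustration.

```python
def lp_distance(x, y, p):
    """Minkowski (L-p) distance for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def chebyshev(x, y):
    """Limit of the L-p distance as p grows: the largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

# Illustrative vectors with coordinate differences 1, 2 and 4.
u, v = (0.0, 0.0, 0.0), (1.0, 2.0, 4.0)
for p in (1, 2, 4, 16, 64):
    # The printed values shrink toward the largest difference as p increases.
    print(p, lp_distance(u, v, p))
print("max difference:", chebyshev(u, v))
```

With p = 1 every coordinate contributes equally to the total; for large p the result is essentially the single largest coordinate difference.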
![Page 9: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/9.jpg)
Distance Functions
![Page 10: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/10.jpg)
Data Projections
Projective methods: preserve a property of the data
Principal Component Analysis (PCA)
Many others: ICA, Factor Analysis
Manifold learning
Multidimensional Scaling (MDS), LLE, Isomap
![Page 11: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/11.jpg)
Principal Component Analysis
Goal: Find a linear projection that captures most of the variance in the data
Figure labels: 1st Principal Component, 2nd Principal Component
![Page 12: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/12.jpg)
Principal Component Analysis
PCA pseudocode:
1. Center the data by subtracting the mean of each feature
2. Calculate the covariance matrix of the centered data
3. Calculate the eigenvectors (principal components) of the covariance matrix
4. Select the top few (2-3) eigenvectors, those with the highest eigenvalues
5. Project the data using these eigenvectors as axes
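The steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the code used for the slides: the function name `pca`, the sample covariance with an n-1 denominator (NumPy's default), and the synthetic data are all choices made for the example.

```python
import numpy as np

def pca(X, k=2):
    """Project the rows of X onto the top-k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each feature
    C = np.cov(Xc, rowvar=False)            # covariance matrix (features x features)
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    components = eigvecs[:, order]          # columns are the principal components
    return Xc @ components, components, eigvals[order]

# Synthetic data (an assumption for the example): three features with very
# different spreads, so the first component should carry most of the variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

Z, W, lam = pca(X, k=2)
print(Z.shape)                                   # (200, 2)
print(lam / np.trace(np.cov(X, rowvar=False)))   # fraction of variance per component
```

The ratios printed on the last line are the quantities a scree plot displays: each selected eigenvalue divided by the total variance.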
![Page 13: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/13.jpg)
PCA on IRIS Dataset
Figure panels: scree plot, biplot
![Page 14: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/14.jpg)
Multidimensional Scaling
Goal: Find a lower-dimensional embedding of the data that preserves pairwise distances
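Formally: find an embedding that minimizes
$$\sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^2$$
$d_{ij}$: Input distance values
$\hat{d}_{ij}$: Output distance values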
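As one concrete instance, the sketch below implements classical (Torgerson) MDS with NumPy. Note the assumption: classical MDS solves a closely related eigenvalue problem rather than directly minimizing a sum of squared distance errors, so it stands in here as an illustration and is not necessarily the variant used in the talk. The names `classical_mds` and `pairwise_distances` and the sample points are hypothetical.

```python
import numpy as np

def pairwise_distances(Y):
    """Euclidean distance matrix between the rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, k=2):
    """Embed n points in k dimensions from an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # keep the k largest
    lam = np.clip(eigvals[order], 0.0, None) # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)

# Illustrative input: corners of a 3 x 4 rectangle, given only as distances.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = pairwise_distances(pts)

Y = classical_mds(D, k=2)
print(np.allclose(pairwise_distances(Y), D))  # True
```

The recovered coordinates generally differ from the original ones by a rotation, reflection, or shift, because only the pairwise distances are preserved.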
![Page 15: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/15.jpg)
MDS Projection of US Capitals
![Page 16: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/16.jpg)
Goodness of MDS Solution
Shepard Diagram
Figure axes: MDS Distances, Data Distances
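The data behind a Shepard diagram is a list of distance pairs, one per pair of points: the distance in the embedding and the distance in the original data. The sketch below builds those pairs and summarizes them with a normalized stress value in the style of Kruskal's stress-1, using raw data distances in place of disparities. The function names and the toy "embedding" (dropping the third coordinate) are assumptions made for the example.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def shepard_pairs(data, embedding):
    """Return (embedding distance, data distance) for every pair of points."""
    return [
        (euclidean(embedding[i], embedding[j]), euclidean(data[i], data[j]))
        for i, j in combinations(range(len(data)), 2)
    ]

def stress1(pairs):
    """Normalized stress: 0 means every distance is reproduced exactly."""
    num = sum((d_data - d_emb) ** 2 for d_emb, d_data in pairs)
    den = sum(d_emb ** 2 for d_emb, _ in pairs)
    return math.sqrt(num / den)

# Toy example: four 3-D points and a crude 2-D embedding that drops z.
data = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
embedding = [(x, y) for x, y, _ in data]

pairs = shepard_pairs(data, embedding)
print(len(pairs))      # 6
print(stress1(pairs))  # greater than 0: the last point collapses onto the first
```

Scatter-plotting `pairs` gives the Shepard diagram; points close to the diagonal indicate distances that the embedding preserves well.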
![Page 17: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/17.jpg)
Takeaways
More features are not necessarily better
Understand the assumptions of different modeling choices
When choosing distance functions and projection methods:
Consider the characteristics of the data
Consider the learning objective
Explore multiple choices simultaneously to gain better insight
![Page 18: Data Projections & Visualization Rajmonda Caceres MIT Lincoln Laboratory](https://reader036.vdocuments.site/reader036/viewer/2022062518/5697bf991a28abf838c91e21/html5/thumbnails/18.jpg)
References
http://statweb.stanford.edu/~jtaylo/courses/stats202/mds.html
https://planspacedotorg.wordpress.com/2013/02/03/pca-3d-visualization-and-clustering-in-r/
Multidimensional Scaling, Leland Wilkinson
Dimension Reduction: A Guided Tour, Christopher J.C. Burges
When is “nearest neighbor” meaningful?, Beyer, K.S., Goldstein, J., Ramakrishnan, R. & Shaft, U.