ISyE 6416: Computational Statistics, Spring 2017
Lecture 11: Principal Component Analysis (PCA)
Prof. Yao Xie
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology
Why PCA
- Real-world data sets usually exhibit structure among their variables
- Principal component analysis (PCA) rotates the original data to new coordinates
- Applications:
  - Dimension reduction
  - Classification
  - Denoising
Data compression
Face recognition
Eigenfaces
Denoising
Data visualization
"The system collects opinions on statements as scalar values on a continuous scale and applies dimensionality reduction to project the data onto a two-dimensional plane for visualization and navigation." Using Canonical Correlation Analysis (CCA) for Opinion Visualization, Faridani et al., UC Berkeley, 2010.
What is PCA
- Given a data matrix X ∈ R^{n×p}: n samples and p variables
- Transform the data set into one with a small number of principal components

  v_j^T x_i,  j = 1, 2, ..., K,  i = 1, 2, ..., n.

- It is a form of linear dimension reduction
- Specifically, "interesting directions" means directions of high variance
Projection onto unit vectors

For x ∈ R^p and a unit vector v ∈ R^p:
- x^T v ∈ R: score
- (x^T v) v ∈ R^p: projection
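The score/projection distinction above can be checked numerically; the vectors below are illustrative values, not taken from the slides:

```python
import numpy as np

# A sample x in R^p and a unit vector v (illustrative values)
x = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])  # already unit length

score = x @ v            # x^T v, a scalar
projection = score * v   # (x^T v) v, a vector in R^p

print(score)       # 3.0
print(projection)  # [3. 0.]
```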
Example: projection onto unit vectors
Projections onto orthonormal vectors
Source: R. Tibshirani.
First principal component
- The first principal component direction of X is the unit vector v_1 ∈ R^p that maximizes the sample variance of Xv_1 ∈ R^n among all unit-length vectors:

  v_1 = arg max_{‖v‖_2 = 1} (Xv)^T (Xv)

  Note that the entries of

  Xv = (x_1^T v, ..., x_n^T v)^T

  are the projections of each sample onto the unit-length vector v.
- Xv_1 is the first principal component score.
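A minimal sketch of this maximization, assuming centered toy data (the matrix below is synthetic, not from the lecture): the maximizer v_1 is the top eigenvector of X^T X, and the maximized value is its largest eigenvalue.

```python
import numpy as np

# Toy data matrix X (n=50 samples, p=2 variables); values are illustrative
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)   # center so X^T X is the (unnormalized) sample covariance

# v_1 is the eigenvector of X^T X with the largest eigenvalue
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigh returns ascending eigenvalues
v1 = eigvecs[:, -1]

scores = X @ v1                              # first principal component scores
# (Xv_1)^T (Xv_1) equals the largest eigenvalue of X^T X
assert np.isclose(scores @ scores, eigvals[-1])
```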
Eigenvector and eigenvalue

  v_1 = arg max_{‖v‖_2 = 1} (Xv)^T (Xv)

- v_1 is the eigenvector corresponding to the largest eigenvalue of X^T X, the (unnormalized) sample covariance matrix
- v_1^T X^T X v_1, the largest eigenvalue of X^T X, is the variance explained by v_1
- Rayleigh quotient: R(v) = (v^T A v) / (v^T v), which for symmetric A satisfies

  λ_min ≤ (v^T A v) / (v^T v) ≤ λ_max
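The Rayleigh-quotient bounds are easy to verify empirically; the symmetric matrix below is random and purely illustrative:

```python
import numpy as np

# Check lam_min <= R(v) <= lam_max on a random symmetric matrix
rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = B + B.T                      # symmetric, so eigenvalues are real

lam = np.linalg.eigvalsh(A)      # ascending eigenvalues
for _ in range(1000):
    v = rng.normal(size=4)
    R = (v @ A @ v) / (v @ v)    # Rayleigh quotient R(v)
    assert lam[0] - 1e-12 <= R <= lam[-1] + 1e-12
```

This is the fact behind PCA: maximizing R(v) over unit vectors attains λ_max, at the corresponding eigenvector.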
Example: X ∈ R^{50×2}
Beyond the first principal component
- What's next? The idea is to find orthogonal directions of the remaining highest variance
- Orthogonal: since we have already explained the variance along v_1, we need a new direction that has no "overlap" with v_1, to avoid redundancy:

  v_2 = arg max_{‖v‖_2 = 1, v^T v_1 = 0} (Xv)^T (Xv)

- We can repeat this process to find the kth principal component direction:

  v_k = arg max_{‖v‖_2 = 1, v^T v_j = 0, j = 1, ..., k−1} (Xv)^T (Xv)
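One way to realize this repeated, constrained maximization is deflation (a standard technique, not spelled out on the slide): project the v_1 component out of the data and maximize variance again. The result matches the second eigenvector of X^T X. The data below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
X = X - X.mean(axis=0)

S = X.T @ X
w, V = np.linalg.eigh(S)              # ascending eigenvalues
v1 = V[:, -1]                         # first principal direction

X_defl = X - np.outer(X @ v1, v1)     # project the v1 direction out of every sample
w2, V2 = np.linalg.eigh(X_defl.T @ X_defl)
v2 = V2[:, -1]                        # top direction of the deflated data

assert abs(v1 @ v2) < 1e-8                   # orthogonal to v1, as the constraint requires
assert np.isclose(abs(v2 @ V[:, -2]), 1.0)   # matches the second eigenvector of S (up to sign)
```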
Example: X ∈ R^{50×2}
Properties
- There are at most p principal components
- [v_1, v_2, ..., v_p] can be found from the eigendecomposition: let Σ = X^T X; then

  Σ = U Λ U^T

  where Λ = diag{λ_1, ..., λ_p} and U is an orthogonal matrix
- Can be computed efficiently if you only need the first principal component: the power method

  v^(k) := Σ v^(k−1) / ‖Σ v^(k−1)‖

- In general, can be computed using Jacobi's method
- Sparse Σ
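A sketch of the power method iteration from the slide, on synthetic data; it converges to the top eigenvector of Σ = X^T X (assuming the largest eigenvalue is strictly dominant and the starting vector is not orthogonal to it):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)
Sigma = X.T @ X

# v^(k) := Sigma v^(k-1) / ||Sigma v^(k-1)||
v = rng.normal(size=5)
v = v / np.linalg.norm(v)
for _ in range(1000):
    v = Sigma @ v
    v = v / np.linalg.norm(v)

# Compare against the top eigenvector from a full eigendecomposition
top = np.linalg.eigh(Sigma)[1][:, -1]
assert np.isclose(abs(v @ top), 1.0, atol=1e-4)   # agreement up to sign
```

Each iteration needs only a matrix–vector product, which is why the method is cheap when Σ is sparse.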
PCA example: financial data analysis
- Daily swap rates of eight maturities from 7/3/2000 to 7/15/2005
- Data from http://www.stanford.edu/~xing/statfinbook/data.html
We typically take differences of these time series, y_t = x_t − x_{t−1}, to make them more "stationary".
Use the MATLAB command to perform PCA

[Figure: eigenvalues; first 3 eigenvectors]
Resulting first 3 reduced dimensions
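The slide's MATLAB workflow can be sketched in Python; synthetic random-walk series stand in for the eight swap-rate series, which are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.cumsum(rng.normal(size=(1250, 8)), axis=0)   # 8 nonstationary series (stand-ins)

y = np.diff(x, axis=0)              # y_t = x_t - x_{t-1}, as on the earlier slide
y = y - y.mean(axis=0)              # center before PCA

eigvals, V = np.linalg.eigh(y.T @ y)
order = np.argsort(eigvals)[::-1]   # sort eigenvalues in decreasing order
eigvals, V = eigvals[order], V[:, order]

scores = y @ V[:, :3]               # first 3 reduced dimensions (PC scores)
print(scores.shape)                 # (1249, 3)
```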