Robust PCA

Robust PCA 3: Spherical PCA

Spherical PCA
Problem: Magnification of High Freq. Coeff's
Solution: Elliptical Analysis
Background (Univariate): MAD = Median Absolute Deviation

Robust PCA
[Figure panels: Rescale Coords, Unscale Coords, Spherical PCA]

Aside On Visualization
Another Multivariate Data Visualization Tool: Parallel Coordinates
Forgot to Give Citations: Inselberg (1985, 2009)

Big Picture View of PCA
Alternate Viewpoint: Gaussian Likelihood
When data are multivariate Gaussian, PCA finds major axes of elliptical contours of Probability Density
Maximum Likelihood Estimate
Mistaken idea: PCA only useful for Gaussian data

Big Picture View of PCA
Raw Cornea Data:
[Figure panels: Data - Median; (Data - Median) / MAD]
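
A minimal sketch of the rescaling named in the panel labels above, assuming they mean subtracting the median and dividing by the MAD (Median Absolute Deviation), coordinate by coordinate. The function name `mad_standardize` and the small example array are made up for illustration. No consistency constant (such as 1.4826) is applied, which is also an assumption.

```python
import numpy as np

def mad_standardize(x):
    """Return (x - median) / MAD for each column of x (rows are observations)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x, axis=0)
    # MAD = median of absolute deviations from the median, per column.
    # A column whose MAD is 0 would give a division by zero here.
    mad = np.median(np.abs(x - med), axis=0)
    return (x - med) / mad

# Made-up data: the first column contains one gross outlier (100).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0],
                 [4.0, 40.0],
                 [100.0, 50.0]])
z = mad_standardize(data)
print(z)
```

The outlier barely moves the median or the MAD, so the remaining points keep a sensible scale after rescaling.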

Big Picture View of PCA
Mistaken idea: PCA only useful for Gaussian data
Toy Example: Each Marginal Binary, Clearly NOT Gaussian
n = 100, d = 4000

Big Picture View of PCA
Mistaken idea: PCA only useful for Gaussian data
But PCA Reveals Trimodal Structure
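
A rough sketch of a toy example in this spirit, not the data from the slides. The three-group construction, the probabilities 0.3 / 0.5 / 0.7, and the seed are assumptions, chosen only so that every marginal is binary while the PC1 scores fall into three separated groups.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 4000

# Hypothetical construction: three groups that differ only in P(entry = 1).
group = np.repeat([0, 1, 2], [33, 34, 33])
p_one = np.array([0.3, 0.5, 0.7])[group]
x = (rng.random((n, d)) < p_one[:, None]).astype(float)  # every entry is 0 or 1

# PCA via SVD of the column-centered data matrix.
xc = x - x.mean(axis=0)
u, s, vt = np.linalg.svd(xc, full_matrices=False)
pc1_scores = u[:, 0] * s[0]

# Mean and spread of the PC1 scores within each (hypothetical) group.
group_means = np.array([pc1_scores[group == g].mean() for g in range(3)])
group_sds = np.array([pc1_scores[group == g].std() for g in range(3)])
print(np.round(group_means, 1), np.round(group_sds, 1))
```

A histogram or smooth of `pc1_scores` would then show three bumps, even though no single coordinate is anywhere near Gaussian.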

Correlation PCA
A related (& better known) variation of PCA:
Replace cov. matrix with correlation matrix
I.e. do eigen analysis of
Where
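
The formulas on this slide did not survive in the transcript. As a hedged sketch, the usual definition rescales the sample covariance matrix by the inverse standard deviations of each coordinate and then does the eigen analysis. The helper name `correlation_pca` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def correlation_pca(x):
    """Eigen analysis of the correlation matrix of x (rows are observations)."""
    x = np.asarray(x, dtype=float)
    corr = np.corrcoef(x, rowvar=False)       # d x d correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(1)
base = rng.normal(size=(200, 3))
x = base * np.array([1.0, 100.0, 0.01])       # same data in wildly different units

evals_scaled, _ = correlation_pca(x)
evals_base, _ = correlation_pca(base)
print(evals_scaled, evals_base)
```

Unlike covariance PCA, the result does not change when a coordinate is re-expressed in different units, which is the usual reason for choosing this variation.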

Transformations

Transformations
Much Nicer Distribution

Transformations
Useful Visualization: MargDistPlot
Change Summary To Skewness: Most / Least Symmetric

Clusters in data
Common Statistical Task: Find Clusters in Data
Interesting sub-populations?
Important structure in data?
How to do this?
PCA provides very simple approach
There is a large literature of other methods (will study more later)

PCA to find clusters
Recall Toy Example with more clusters:

PCA to find clusters
Best revealed by 2d scatterplots (4 clusters):

PCA to find clusters
A deeper example: Mass Flux Data
Data from Enrica Bellone, National Center for Atmospheric Research
Mass Flux for quantifying cloud types
How does mass change when moving into a cloud

PCA to find clusters
PCA of Mass Flux Data:

PCA to find clusters
Summary of PCA of Mass Flux Data:
Mean: Captures General Mountain Shape
PC1: Generally overall height of peak
  shows up nicely in mean +/- plot (2nd col)
  3 apparent clusters in scores plot
  Are those really there?
  If so, could lead to interesting discovery
  If not, could waste effort in investigation

PCA to find clusters
Summary of PCA of Mass Flux Data:
PC2: Location of peak
  again mean +/- plot very useful here
PC3: Width adjustment
  again see most clearly in mean +/- plot
Maybe non-linear modes of variation???

PCA to find clusters
Return to Investigation of PC1 Clusters:
Can see 3 bumps in smooth histogram
Main Question: Important structure or sampling variability?
Approach: SiZer (SIgnificance of ZERo crossings of deriv.)

Statistical Smoothing
In 1 Dimension (Numbers as Data Objects)

Statistical Smoothing
In 1 Dimension, 2 Major Settings:
Density Estimation: Histograms
Nonparametric Regression: Scatterplot Smoothing

Density Estimation
E.g. Hidalgo Stamp Data
Thicknesses of Postage Stamps Produced in Mexico
Over ~ 70 years
During 1800s
Paper produced in several factories? (Thicknesses vary by up to factor of 2)
How many factories? (Records lost)
Brought to statistical literature by: Izenman and Sommer (1988)

Density Estimation
E.g. Hidalgo Stamp Data
A histogram
Oversmoothed
Bin Width too large?
Miss important structure?

Density Estimation
E.g. Hidalgo Stamp Data
Another histogram
Smaller binwidth
Suggests 2 modes?
2 factories making the paper?

Density Estimation
E.g. Hidalgo Stamp Data
Another histogram
Smaller binwidth
Suggests 6 modes?
6 factories making the paper?

Density Estimation
E.g. Hidalgo Stamp Data
Another histogram
Even smaller binwidth
Suggests many modes?
Really believe modes are there?
Or just sampling variation?

Density Estimation
E.g. Hidalgo Stamp Data
Critical Issue for histograms: Choice of binwidth (well understood?)

Histograms
Less Well Understood issue: Choice of bin location
Major impact on number of modes (2-7)
All for same binwidth

Histograms
Choice of bin location: What is going on?
Compare with Average Histogram

Density Estimation
Compare shifts with Average Histogram
For 7 mode shift
Peaks line up with bin centers
So shifted histos find peaks

Density Estimation
Compare shifts with Average Histogram
For 2 (3?) mode shift
Peaks split between bins
So shifted histos miss peaks
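
The comparison of shifted histograms with their average can be sketched as follows. This is an illustration under assumptions: the helper names `histogram_density` and `average_shifted_histogram`, the synthetic two-group sample (not the stamp data), and the choice of 8 shifts are all made up here.

```python
import numpy as np

def histogram_density(data, grid, binwidth, origin):
    """Histogram density estimate evaluated at each grid point.

    Bins are [origin + j*binwidth, origin + (j+1)*binwidth) for integer j,
    so `origin` sets the bin location while `binwidth` stays fixed.
    """
    data_bin = np.floor((data - origin) / binwidth)
    grid_bin = np.floor((grid - origin) / binwidth)
    # Count the data points that fall in the same bin as each grid point.
    counts = (grid_bin[:, None] == data_bin[None, :]).sum(axis=1)
    return counts / (len(data) * binwidth)

def average_shifted_histogram(data, grid, binwidth, n_shifts):
    """Average of n_shifts histograms that differ only in bin location."""
    origins = np.arange(n_shifts) * binwidth / n_shifts
    curves = [histogram_density(data, grid, binwidth, o) for o in origins]
    return np.mean(curves, axis=0), curves

rng = np.random.default_rng(2)
# Synthetic two-group sample, NOT the Hidalgo stamp data.
data = np.concatenate([rng.normal(2.0, 0.5, 60), rng.normal(5.0, 0.7, 40)])
grid = np.linspace(-2.0, 10.0, 2401)
avg, curves = average_shifted_histogram(data, grid, binwidth=1.0, n_shifts=8)
```

Each entry of `curves` uses the same binwidth but a different bin location, so plotting them next to `avg` shows how much of the apparent structure depends on where the bin edges happen to fall.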

Density Estimation
Histogram Drawbacks:
Need to choose bin width
Need to choose bin location
But Average Histogram reveals structure
So should use that, instead of histo
Name: Kernel Density Estimate

Kernel Density Estimation
Chondrite Data:
Stony (metal) Meteorites (hit the earth)
So have a chunk of rock
Study % of silica
From how many sources?
Only 22 rocks
Histogram hopeless?
Brought to statistical literature by: Good and Gaskins (1980)

Kernel Density Estimation
Chondrite Data:
Represent points by red bars
Where are data more dense?

Kernel Density Estimation
Chondrite Data:
Put probability mass 1/n at each point
Smooth piece of density

Kernel Density Estimation
Chondrite Data:
Sum pieces to estimate density
Suggests 3 modes (rock sources)

Kernel Density Estimation
Mathematical Notation:
Where
Window shape given by kernel,
Window width given by bandwidth,
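
The formula itself is missing from the transcript. A hedged sketch of the construction described above (mass 1/n at each point, one smooth piece per point, then sum the pieces), assuming a Gaussian window and the usual symbols K for the kernel and h for the bandwidth. The function name `kde_gaussian` and the synthetic sample are assumptions.

```python
import numpy as np

def kde_gaussian(data, grid, h):
    """Kernel density estimate on `grid` with a Gaussian kernel and bandwidth h.

    Assumed form: f_hat(x) = (1/n) * sum_i K((x - X_i) / h) / h,
    with K the standard normal density.
    """
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    u = (grid[:, None] - data[None, :]) / h
    pieces = np.exp(-0.5 * u**2) / (h * np.sqrt(2.0 * np.pi))  # one bump per point
    return pieces.mean(axis=1)  # weight 1/n on each bump, then sum

rng = np.random.default_rng(3)
data = rng.normal(size=22)  # synthetic sample of size 22, NOT the chondrite data
grid = np.linspace(-6.0, 6.0, 1201)
f_hat = kde_gaussian(data, grid, h=0.4)
```

Changing `h` changes the width of every bump at once, which is why the bandwidth has so much influence on how many modes appear.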

Kernel Density Estimation
Mathematical Notation:
This Was Used In PCA Graphics

Kernel Density Estimation
Choice of kernel (window shape)?
Controversial issue
Want Computational Speed?
Want Statistical Efficiency?
Want Smooth Estimates?
There is more, but personal choice: Gaussian
Good Overall Reference: Wand and Jones (1994)

Kernel Density Estimation
Choice of bandwidth (window width)?
Very important to performance
Fundamental Issue: Which modes are really there?

Density Estimation
How to use histograms if you must:
Undersmooth (minimizes bin edge effect)
Human eye is OK at post-smoothing

Statistical Smoothing
2 Major Settings:
Density Estimation: Histograms
Nonparametric Regression: Scatterplot Smoothing

Scatterplot Smoothing
E.g. Bralower Fossil Data
Prof. of Geosciences, Penn. State Univ.

Scatterplot Smoothing
E.g. Bralower Fossil Data
Study Global Climate
Time scale of millions of years
Data points are fossil shells
Dated by surrounding material
Ratio of Isotopes of Strontium (differences in 4th decimal point!)
Surrogate for Sea Level (Ice Ages)
Data seem to have structure

Scatterplot Smoothing
E.g. Bralower Fossil Data

Scatterplot Smoothing
E.g. Bralower Fossil Data
Way to bring out structure: Smooth the data
Methods of smoothing?
Local Averages
Splines (several types)
Fourier: trim high frequencies
Other bases
Also controversial

Scatterplot Smoothing
E.g. Bralower Fossil Data: some smooths

Scatterplot Smoothing
A simple approach: local averages
Given data:
Model in regression form:
How to estimate the curve?

Scatterplot Smoothing
A simple approach: local averages
Given a kernel window function:
Estimate the curve by a weighted local average:
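
The slide's formula is not in the transcript. A minimal sketch of a weighted local average, assuming paired data (x_i, y_i), a Gaussian window, and a window width `h`. The name `local_average` and the synthetic sine data are made up for illustration.

```python
import numpy as np

def local_average(x, y, grid, h):
    """Weighted local average of y at each grid point (Gaussian window, width h)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Window weight of data point i as seen from each grid point.
    w = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
    # Weights can underflow to 0 far from all data, so keep the grid near the data.
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0.0, 1.0, 80))
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=80)  # synthetic data
grid = np.linspace(0.0, 1.0, 101)
smooth = local_average(x, y, grid, h=0.05)
```

Because the estimate is an average with nonnegative weights, it can never leave the range of the observed y values.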

Scatterplot Smoothing
Interesting representation of local average:
Given kernel window weights,
Local constant fit to data:

Scatterplot Smoothing
Local Constant Fits (visually):
Moving Average
Window width is critical (~ k.d.e.)

Scatterplot Smoothing
Interesting variation:
Local linear fit to data:
Given kernel window weights,
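
A hedged sketch of a local linear fit: at each location, fit a line by weighted least squares using the kernel window weights, then keep the intercept of that line. The Gaussian window, the name `local_linear`, and the exactly linear test data are assumptions for illustration.

```python
import numpy as np

def local_linear(x, y, grid, h):
    """Local linear smooth: intercept of a kernel-weighted line fit at each grid point."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.empty(len(grid))
    for k, x0 in enumerate(grid):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)               # kernel window weights
        design = np.column_stack([np.ones_like(x), x - x0])  # line centered at x0
        weighted = design * w[:, None]
        # Weighted least squares normal equations: (X'WX) beta = X'Wy.
        beta = np.linalg.solve(design.T @ weighted, weighted.T @ y)
        fitted[k] = beta[0]                                  # intercept = fit at x0
    return fitted

x = np.linspace(0.0, 1.0, 30)
y = 2.0 + 3.0 * x                  # exactly linear, made-up data
grid = np.linspace(0.0, 1.0, 11)
fit = local_linear(x, y, grid, h=0.2)
```

On exactly linear data the fit reproduces the line, including at the boundaries, which is one commonly cited reason for preferring degree 1 over a local constant fit.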

Scatterplot Smoothing
Local Linear Fits (visually):
Intercept of Moving Fit Line
Window width is critical (~ k.d.e.)

Scatterplot Smoothing
Another variation:
Intercept of Moving Polynomial Fit
Window width is critical (~ k.d.e.)

Scatterplot Smoothing
Local Polynomial Smoothing
What is best polynomial degree?
Once again controversial
Advocates for all of 0, 1, 2, 3.
Depends on personal weighting of factors involved
Good reference: Fan & Gijbels (1995)
Personal choice: degree 1, local linear

Scatterplot Smoothing
E.g. Bralower Fossils: local linear smooths

Scatterplot Smoothing
Smooths of Bralower Fossil Data:
Oversmoothed misses structure
Undersmoothed feels sampling noise?
About right shows 2 valleys:
One seems clear
Is other one really there?
Same question as above

Kernel Density Estimation
Choice of bandwidth (window width)?
Very important to performance
Fundamental Issue: Which modes are really there?

Kernel Density Estimation
Choice of bandwidth (window width)?
Very important to performance
Data Based Choice?
Controversial Issue
Many recommendations
Suggested Reference: Jones, Marron & Sheather (1996)
Never a consensus

Kernel Density Estimation
Choice of bandwidth (window width)?
Alternate Choice: Consider all of them!
I.e. look at whole spectrum of smooths
Can see different structure
At different smoothing levels
Connection to Scale Space
E.g. Stamps data
How many modes?
All answers are there.
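
Looking at the whole spectrum of smooths can be sketched by computing a kernel density estimate over a range of bandwidths and counting the bumps at each level. The Gaussian kernel, the helper names `kde_gaussian` and `count_modes`, the made-up two-cluster data (not the stamps data), and the bandwidth range are all assumptions.

```python
import numpy as np

def kde_gaussian(data, grid, h):
    """Gaussian kernel density estimate on `grid` with bandwidth h."""
    u = (grid[:, None] - data[None, :]) / h
    return (np.exp(-0.5 * u**2) / (h * np.sqrt(2.0 * np.pi))).mean(axis=1)

def count_modes(curve):
    """Count strict local maxima of a curve sampled on a grid."""
    higher_than_left = curve[1:-1] > curve[:-2]
    higher_than_right = curve[1:-1] > curve[2:]
    return int(np.sum(higher_than_left & higher_than_right))

# Made-up data: two well separated clusters of 10 points each.
data = np.concatenate([np.linspace(0.0, 1.0, 10), np.linspace(4.0, 5.0, 10)])
grid = np.linspace(-10.0, 15.0, 2001)

bandwidths = np.geomspace(0.05, 5.0, 9)  # from undersmoothed to oversmoothed
modes = [count_modes(kde_gaussian(data, grid, h)) for h in bandwidths]
for h, m in zip(bandwidths, modes):
    print(f"h = {h:.3f}: {m} mode(s)")
```

The list `modes` gives a different answer to "how many modes?" at each smoothing level, which is the sense in which all answers are there.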

Kernel Density Estimation

Statistical Smoothing
Fundamental Question
For both of
Density Estimation: Histograms
Regression: Scatterplot Smoothing
Which bumps are really there? vs. artifacts of sampling noise?