robust pca

Robust PCA 3: Spherical PCA

Upload: leila-oliver

Post on 03-Jan-2016


DESCRIPTION

Robust PCA 3: Spherical PCA. Problem: magnification of high-frequency coefficients. Solution: elliptical analysis. Background (univariate): MAD = Median Absolute Deviation.

TRANSCRIPT

Robust PCA
Robust PCA 3: Spherical PCA

Robust PCA

Robust PCA
Rescale Coords
Unscale Coords
Spherical PCA
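
A minimal sketch of the spherical PCA idea, in Python with NumPy: center the data at a robust center, project every point onto the unit sphere around that center, then do the eigen analysis on the projected points. The coordinate-wise median as center, the name `spherical_pca`, and the synthetic data are assumptions made for illustration only. The rescale and unscale steps of the elliptical version are not shown.

```python
import numpy as np

def spherical_pca(X):
    """Eigen directions of the data after projection onto a unit sphere."""
    center = np.median(X, axis=0)            # assumed robust center
    R = X - center
    norms = np.linalg.norm(R, axis=1)
    keep = norms > 0                         # a point at the center has no direction
    U = R[keep] / norms[keep, None]          # projection onto the unit sphere
    evals, evecs = np.linalg.eigh(U.T @ U / U.shape[0])
    order = np.argsort(evals)[::-1]          # largest eigenvalue first
    return center, evecs[:, order]

# Synthetic data: elongated along the first axis, plus 5 gross outliers.
rng = np.random.default_rng(0)
bulk = rng.normal(size=(200, 2)) * np.array([5.0, 1.0])
outliers = np.column_stack([np.zeros(5), np.full(5, 200.0)])
X = np.vstack([bulk, outliers])

# Classical PCA direction, for comparison.
Xc = X - X.mean(axis=0)
_, vecs = np.linalg.eigh(Xc.T @ Xc / (X.shape[0] - 1))
classical_pc1 = vecs[:, -1]                  # eigh sorts eigenvalues ascending

_, sph_vecs = spherical_pca(X)
spherical_pc1 = sph_vecs[:, 0]
print("classical PC1:", classical_pc1)
print("spherical PC1:", spherical_pc1)
```

On this synthetic example the classical first direction follows the five outliers, while the spherical one stays with the bulk of the data.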

Aside On Visualization
Another Multivariate Data Visualization Tool: Parallel Coordinates
Forgot to Give Citations: Inselberg (1985, 2009)

Big Picture View of PCA
Alternate Viewpoint: Gaussian Likelihood
When data are multivariate Gaussian
PCA finds major axes of elliptical contours of Probability Density
Maximum Likelihood Estimate
Mistaken idea: PCA only useful for Gaussian data

Big Picture View of PCA
Raw Cornea Data:
Data - Median
(Data - Median) / MAD
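
The median/MAD standardization on this slide can be sketched as follows. This is a minimal illustration, not the code behind the cornea plots: the function names and the toy numbers are hypothetical, and no normal-consistency constant is applied to the MAD.

```python
import numpy as np

def mad(x, axis=0):
    """Median Absolute Deviation from the median."""
    med = np.median(x, axis=axis, keepdims=True)
    return np.median(np.abs(x - med), axis=axis)

def robust_standardize(x, axis=0):
    """(Data - Median) / MAD, computed along the given axis."""
    med = np.median(x, axis=axis, keepdims=True)
    scale = np.expand_dims(mad(x, axis=axis), axis)
    return (x - med) / scale

# Toy values with one gross outlier (hypothetical numbers).
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
z = robust_standardize(x)
print(z)
```

The outlier barely moves the median and the MAD, so the other four values keep a sensible scale after standardization.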

Big Picture View of PCA
Mistaken idea: PCA only useful for Gaussian data
Toy Example:
Each Marginal Binary
Clearly NOT Gaussian
n = 100, d = 4000

Big Picture View of PCA
Mistaken idea: PCA only useful for Gaussian data
But PCA Reveals Trimodal Structure
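
A hedged sketch of a toy example of this kind: binary marginals with n = 100 and d = 4000, where the PC1 scores still separate three groups. The way the groups are generated here (three Bernoulli probabilities) is an assumption for illustration, not the construction used on the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 4000

# Hypothetical construction: three groups, each with its own chance of a 1.
labels = np.repeat([0, 1, 2], [34, 33, 33])
probs = np.array([0.2, 0.5, 0.8])[labels]
X = (rng.random((n, d)) < probs[:, None]).astype(float)   # every entry is 0 or 1

# PCA through the SVD of the column-centered data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1_scores = U[:, 0] * s[0]

group_means = np.array([pc1_scores[labels == g].mean() for g in range(3)])
print(np.sort(group_means))
```

A smooth histogram of `pc1_scores` would show three bumps, even though no single coordinate looks Gaussian.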

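Correlation PCA
A related (& better known) variation of PCA:
Replace cov. matrix with correlation matrix
I.e. do eigen analysis of
$R = D^{-1/2} \hat{\Sigma} D^{-1/2}$
Where
$\hat{\Sigma}$ is the sample covariance matrix and $D = \operatorname{diag}(\hat{\Sigma})$ holds the variances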
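
A minimal sketch of correlation PCA: scale the covariance matrix by the standard deviations to get the correlation matrix, then do the eigen analysis on that. The function name and the synthetic data are illustrative assumptions.

```python
import numpy as np

def correlation_pca(X):
    """Eigen analysis of the correlation matrix of the columns of X."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)            # divide entry (i, j) by sd_i * sd_j
    evals, evecs = np.linalg.eigh(corr)
    order = np.argsort(evals)[::-1]          # largest eigenvalue first
    return corr, evals[order], evecs[:, order]

# Synthetic columns on very different scales.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 100.0])
corr, evals, evecs = correlation_pca(X)
print(evals)
```

Because every variable is put on unit variance first, the column measured on the largest scale no longer dominates the first component.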

Transformations

Transformations
Much Nicer Distribution

Transformations
Useful Visualization: MargDistPlot
Change Summary To Skewness
Most/Least Symmetric

Clusters in data
Common Statistical Task: Find Clusters in Data
Interesting sub-populations?
Important structure in data?
How to do this?
PCA provides very simple approach
There is a large literature of other methods (will study more later)

PCA to find clusters
Recall Toy Example with more clusters:

PCA to find clusters
Best revealed by 2d scatterplots (4 clusters):

PCA to find clusters
A deeper example: Mass Flux Data
Data from Enrica Bellone, National Center for Atmospheric Research
Mass Flux for quantifying cloud types
How does mass change when moving into a cloud

PCA to find clusters
PCA of Mass Flux Data:

PCA to find clusters
Summary of PCA of Mass Flux Data:
Mean: Captures General Mountain Shape
PC1: Generally overall height of peak
shows up nicely in mean +- plot (2nd col)
3 apparent clusters in scores plot
Are those really there?
If so, could lead to interesting discovery
If not, could waste effort in investigation

PCA to find clusters
Summary of PCA of Mass Flux Data:
PC2: Location of peak
again mean +- plot very useful here
PC3: Width adjustment
again see most clearly in mean +- plot
Maybe non-linear modes of variation???

PCA to find clusters
Return to Investigation of PC1 Clusters:
Can see 3 bumps in smooth histogram
Main Question: Important structure or sampling variability?
Approach: SiZer (SIgnificance of ZERo crossings of deriv.)

Statistical Smoothing
In 1 Dimension (Numbers as Data Objects)

Statistical Smoothing
In 1 Dimension, 2 Major Settings:
Density Estimation: Histograms
Nonparametric Regression: Scatterplot Smoothing

Density Estimation
E.g. Hidalgo Stamp Data
Thicknesses of Postage Stamps Produced in Mexico
Over ~ 70 years
During 1800s
Paper produced in several factories?
How many factories? (Records lost)
Brought to statistical literature by: Izenman and Sommer (1988)

Density Estimation
E.g. Hidalgo Stamp Data
(Thicknesses vary by up to factor of 2)

Density Estimation
E.g. Hidalgo Stamp Data
A histogram
Oversmoothed
Bin Width too large?
Miss important structure?

Density Estimation
E.g. Hidalgo Stamp Data
Another histogram
Smaller binwidth
Suggests 2 modes?
2 factories making the paper?

Density Estimation
E.g. Hidalgo Stamp Data
Another histogram
Smaller binwidth
Suggests 6 modes?
6 factories making the paper?

Density Estimation
E.g. Hidalgo Stamp Data
Another histogram
Even smaller binwidth
Suggests many modes?
Really believe modes are there?
Or just sampling variation?

Density Estimation
E.g. Hidalgo Stamp Data
Critical Issue for histograms:
Choice of binwidth (well understood?)

Histograms
Less Well Understood issue: Choice of bin location
Major impact on number of modes (2-7)
All for same binwidth

Histograms
Choice of bin location:
What is going on?
Compare with Average Histogram

Density Estimation
Compare shifts with Average Histogram
For 7 mode shift
Peaks line up with bin centers
So shifted histos find peaks

Density Estimation
Compare shifts with Average Histogram
For 2 (3?) mode shift
Peaks split between bins
So shifted histos miss peaks
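
The comparison of shifted histograms with their average can be sketched as follows. Everything here is an illustrative assumption: the synthetic two-component sample stands in for real data (it is not the stamp data), and the helper names, binwidth and number of shifts are made up for the example.

```python
import numpy as np

def shifted_histograms(x, binwidth, n_shifts, lo, hi):
    """Histogram densities for several bin origins, evaluated on a fine grid."""
    grid = np.linspace(lo, hi, 400)
    curves = []
    for k in range(n_shifts):
        # Same binwidth every time; only the bin location moves.
        origin = lo - binwidth + k * binwidth / n_shifts
        edges = np.arange(origin, hi + 2 * binwidth, binwidth)
        counts, _ = np.histogram(x, bins=edges)
        dens = counts / (len(x) * binwidth)
        idx = np.searchsorted(edges, grid, side="right") - 1
        curves.append(dens[idx])
    curves = np.array(curves)
    return grid, curves, curves.mean(axis=0)   # last item: the average histogram

def count_modes(y):
    """Number of local maxima, treating each flat run as a single value."""
    y = y[np.concatenate(([True], np.diff(y) != 0))]
    padded = np.concatenate(([-np.inf], y, [-np.inf]))
    mid = padded[1:-1]
    return int(np.sum((mid > padded[:-2]) & (mid > padded[2:])))

# Synthetic sample with two components.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(3.0, 1.0, 185)])

grid, curves, average = shifted_histograms(x, binwidth=0.8, n_shifts=5,
                                           lo=x.min(), hi=x.max())
mode_counts = [count_modes(c) for c in curves]
print("modes per shift:", mode_counts)
print("modes in average histogram:", count_modes(average))
```

The per-shift mode counts can differ even though the binwidth never changes, which is the bin location effect described above.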

Density Estimation
Histogram Drawbacks:
Need to choose bin width
Need to choose bin location
But Average Histogram reveals structure
So should use that, instead of histo
Name: Kernel Density Estimate

Kernel Density Estimation
Chondrite Data:
Stony (metal) Meteorites (hit the earth)
So have a chunk of rock
Study % of silica
From how many sources?
Only 22 rocks
Histogram hopeless?
Brought to statistical literature by: Good and Gaskins (1980)

Kernel Density Estimation
Chondrite Data:
Represent points by red bars
Where are data more dense?

Kernel Density Estimation
Chondrite Data:
Put probability mass 1/n at each point
Smooth piece of density

Kernel Density Estimation
Chondrite Data:
Sum pieces to estimate density
Suggests 3 modes (rock sources)
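
The construction on these slides (mass 1/n at each point, smoothed into a bump, then summed) can be sketched with a Gaussian window. The 22 values below are synthetic stand-ins, not the chondrite measurements, and the bandwidth of 1.0 is an arbitrary choice for the example.

```python
import numpy as np

def gaussian_kde(x_grid, data, h):
    """Sum of Gaussian bumps, each carrying mass 1/n, with window width h."""
    u = (x_grid[:, None] - data[None, :]) / h
    bumps = np.exp(-0.5 * u**2) / (h * np.sqrt(2 * np.pi))   # one column per data point
    return bumps.mean(axis=1)                                # mean = sum of mass-1/n pieces

# Synthetic stand-in for 22 measurements.
rng = np.random.default_rng(4)
data = np.concatenate([rng.normal(22, 1, 8), rng.normal(28, 1, 10), rng.normal(33, 1, 4)])

grid = np.linspace(10, 45, 701)
dx = grid[1] - grid[0]
f = gaussian_kde(grid, data, h=1.0)
print("area under the estimate:", f.sum() * dx)
```

Each data point contributes one smooth piece of area 1/n, so the summed curve is itself a density.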

Kernel Density Estimation
Mathematical Notation:
$\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i)$
Where
Window shape given by kernel, $K$
Window width given by bandwidth, $h$, with $K_h(\cdot) = \frac{1}{h} K\left(\frac{\cdot}{h}\right)$

Kernel Density Estimation
Mathematical Notation:
This Was Used In PCA Graphics

Kernel Density Estimation
Choice of kernel (window shape)?
Controversial issue
Want Computational Speed?
Want Statistical Efficiency?
Want Smooth Estimates?
There is more, but personal choice: Gaussian
Good Overall Reference: Wand and Jones (1994)

Kernel Density Estimation
Choice of bandwidth (window width)?
Very important to performance
Fundamental Issue: Which modes are really there?

Density Estimation
How to use histograms if you must:
Undersmooth (minimizes bin edge effect)
Human eye is OK at post-smoothing

Statistical Smoothing
2 Major Settings:
Density Estimation: Histograms
Nonparametric Regression: Scatterplot Smoothing

Scatterplot Smoothing
E.g. Bralower Fossil Data
Prof. of Geosciences, Penn. State Univ.

Scatterplot Smoothing
E.g. Bralower Fossil Data
Study Global Climate
Time scale of millions of years
Data points are fossil shells
Dated by surrounding material
Ratio of Isotopes of Strontium (differences in 4th decimal point!)
Surrogate for Sea Level (Ice Ages)
Data seem to have structure

Scatterplot Smoothing
E.g. Bralower Fossil Data

Scatterplot Smoothing
E.g. Bralower Fossil Data
Way to bring out structure: Smooth the data
Methods of smoothing?
Local Averages
Splines (several types)
Fourier: trim high frequencies
Other bases
Also controversial

Scatterplot Smoothing
E.g. Bralower Fossil Data: some smooths

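Scatterplot Smoothing
A simple approach: local averages
Given data: $(X_1, Y_1), \ldots, (X_n, Y_n)$
Model in regression form: $Y_i = m(X_i) + \varepsilon_i$, $i = 1, \ldots, n$
How to estimate $m(x)$?

Scatterplot Smoothing
A simple approach: local averages
Given a kernel window function: $K_h(\cdot) = \frac{1}{h} K\left(\frac{\cdot}{h}\right)$
Estimate the curve by a weighted local average: $\hat{m}_h(x) = \frac{\sum_{i=1}^{n} K_h(x - X_i) Y_i}{\sum_{i=1}^{n} K_h(x - X_i)}$

Scatterplot Smoothing
Interesting representation of local average:
Given kernel window weights, $W_i(x) = K_h(x - X_i)$
Local constant fit to data: $\hat{m}_h(x) = \arg\min_{a} \sum_{i=1}^{n} W_i(x) (Y_i - a)^2$

Scatterplot Smoothing
Local Constant Fits (visually):
Moving Average
Window width is critical (~ k.d.e.)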
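
A minimal sketch of the local constant fit as a moving, kernel-weighted average. The Gaussian window, the function name and the synthetic data are assumptions for illustration; the fossil data are not reproduced here.

```python
import numpy as np

def local_constant(x0, x, y, h):
    """Kernel-weighted local average of y, evaluated at the points x0."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)   # Gaussian window weights
    return (w @ y) / w.sum(axis=1)                             # weighted mean per x0

# Synthetic scatterplot: smooth curve plus noise.
rng = np.random.default_rng(5)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

x0 = np.linspace(0.0, 1.0, 21)
fit = local_constant(x0, x, y, h=0.1)
flat = local_constant(x0, x, np.full_like(x, 3.0), h=0.1)      # constant data stay constant
print(fit[:5])
```

Changing `h` plays the same role as the window width in a kernel density estimate: small values follow the noise, large values flatten the curve.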

Scatterplot Smoothing
Interesting variation:
Local linear fit to data: $\hat{m}_h(x) = \hat{a}$, where $(\hat{a}, \hat{b}) = \arg\min_{a, b} \sum_{i=1}^{n} W_i(x) \left(Y_i - a - b (X_i - x)\right)^2$
Given kernel window weights, $W_i(x) = K_h(x - X_i)$

Scatterplot Smoothing
Local Linear Fits (visually):
Intercept of Moving Fit Line
Window width is critical (~ k.d.e.)

Scatterplot Smoothing
Another variation:
Intercept of Moving Polynomial Fit
Window width is critical (~ k.d.e.)
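
The "intercept of a moving fit" idea can be sketched as a kernel-weighted least squares fit at each evaluation point. This is a hedged illustration: the Gaussian window, the name `local_poly` and the test data are assumptions, and `degree=1` gives the local linear case.

```python
import numpy as np

def local_poly(x0, x, y, h, degree=1):
    """Intercept of a kernel-weighted polynomial fit, centered at each point of x0."""
    fits = np.empty(len(x0))
    for j, c in enumerate(x0):
        w = np.exp(-0.5 * ((x - c) / h) ** 2)               # Gaussian window weights
        B = np.vander(x - c, degree + 1, increasing=True)   # columns 1, (x-c), (x-c)^2, ...
        sw = np.sqrt(w)                                     # weighted LS via sqrt-weights
        coef, *_ = np.linalg.lstsq(B * sw[:, None], y * sw, rcond=None)
        fits[j] = coef[0]                                   # intercept = fitted value at c
    return fits

x = np.linspace(0.0, 1.0, 50)
x0 = np.linspace(0.0, 1.0, 11)

# A local linear fit reproduces an exactly linear relationship.
line_fit = local_poly(x0, x, 2.0 + 3.0 * x, h=0.1, degree=1)
print(line_fit)
```

Centering the polynomial at the evaluation point is what makes the intercept the fitted value there.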

Scatterplot Smoothing
Local Polynomial Smoothing
What is best polynomial degree?
Once again controversial
Advocates for all of 0, 1, 2, 3.
Depends on personal weighting of factors involved
Good reference: Fan & Gijbels (1995)
Personal choice: degree 1, local linear

Scatterplot Smoothing
E.g. Bralower Fossils: local linear smooths

Scatterplot Smoothing
Smooths of Bralower Fossil Data:
Oversmoothed: misses structure
Undersmoothed: feels sampling noise?
About right: shows 2 valleys
One seems clear
Is other one really there?
Same question as above

Kernel Density Estimation
Choice of bandwidth (window width)?
Very important to performance
Fundamental Issue: Which modes are really there?

Kernel Density Estimation
Choice of bandwidth (window width)?
Very important to performance
Data Based Choice?
Controversial Issue
Many recommendations
Suggested Reference: Jones, Marron & Sheather (1996)
Never a consensus

Kernel Density Estimation
Choice of bandwidth (window width)?
Alternate Choice: Consider all of them!
I.e. look at whole spectrum of smooths
Can see different structure
At different smoothing levels
Connection to Scale Space
E.g. Stamps data
How many modes?
All answers are there.
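
Looking at the whole spectrum of smooths can be sketched by computing a kernel density estimate over a grid of bandwidths and counting the modes of each. The synthetic two-component sample, the bandwidth range and the helper names are assumptions for illustration; this is not the stamps analysis.

```python
import numpy as np

def kde(grid, data, h):
    """Gaussian kernel density estimate with bandwidth h."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def count_modes(f):
    """Number of local maxima, treating each flat run as a single value."""
    f = f[np.concatenate(([True], np.diff(f) != 0))]
    padded = np.concatenate(([-np.inf], f, [-np.inf]))
    mid = padded[1:-1]
    return int(np.sum((mid > padded[:-2]) & (mid > padded[2:])))

# Synthetic sample with two components.
rng = np.random.default_rng(6)
data = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(4.0, 1.0, 100)])

grid = np.linspace(data.min(), data.max(), 512)
bandwidths = np.geomspace(0.05, 5.0, 12)          # from undersmoothed to oversmoothed
modes = [count_modes(kde(grid, data, h)) for h in bandwidths]
for h, m in zip(bandwidths, modes):
    print(f"h = {h:6.3f}   modes = {m}")
```

Small bandwidths show many bumps and the largest shows one, so the mode count alone cannot say which bumps are really there.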

Kernel Density Estimation

Statistical Smoothing
Fundamental Question
For both of
Density Estimation: Histograms
Regression: Scatterplot Smoothing
Which bumps are really there?
vs. artifacts of sampling noise?