lecture 17 factor analysis. syllabus lecture 01describing inverse problems lecture 02probability and...

Post on 15-Jan-2016

230 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Lecture 17

Factor Analysis

SyllabusLecture 01 Describing Inverse ProblemsLecture 02 Probability and Measurement Error, Part 1Lecture 03 Probability and Measurement Error, Part 2 Lecture 04 The L2 Norm and Simple Least SquaresLecture 05 A Priori Information and Weighted Least SquaredLecture 06 Resolution and Generalized InversesLecture 07 Backus-Gilbert Inverse and the Trade Off of Resolution and VarianceLecture 08 The Principle of Maximum LikelihoodLecture 09 Inexact TheoriesLecture 10 Nonuniqueness and Localized AveragesLecture 11 Vector Spaces and Singular Value DecompositionLecture 12 Equality and Inequality ConstraintsLecture 13 L1 , L∞ Norm Problems and Linear ProgrammingLecture 14 Nonlinear Problems: Grid and Monte Carlo Searches Lecture 15 Nonlinear Problems: Newton’s Method Lecture 16 Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals Lecture 17 Factor AnalysisLecture 18 Varimax Factors, Empircal Orthogonal FunctionsLecture 19 Backus-Gilbert Theory for Continuous Problems; Radon’s ProblemLecture 20 Linear Operators and Their AdjointsLecture 21 Fréchet DerivativesLecture 22 Exemplary Inverse Problems, incl. Filter DesignLecture 23 Exemplary Inverse Problems, incl. Earthquake LocationLecture 24 Exemplary Inverse Problems, incl. Vibrational Problems

Purpose of the Lecture

Introduce Factor Analysis

Work through an example

Part 1

Factor Analysis

source A

ocean

sediment

source B

s4s2 s3s1

sample matrix S

S arranged row-wisebut we’ll use a column vector s(i) for individual samples)

theory

samples are a linear mixture of sources

S = C F

theory

samples are a linear mixture of sources

S = C Fsamples contain

“elements”

theory

samples are a linear mixture of sources

S = C Fsources called “factors”

factors contain “elements”

factor matrix F

F arranged row-wisebut we’ll use a column vector f(i) for individual factors

theory

samples are a linear mixture of sources

S = C Fcoefficients

called “loadings”

loading matrix C

inverse problem

given Sfind C and F

so that S=CF

very non-unique

given T with inverse T-1

if S=CFthen S=[C T-1][TF] =C’F’

very non-unique

so a priori information needed to select a solution

simplicity

what is the minimum number of factors needed

call that number p

does S span the full space of M elements?

or just a p –dimensional subspace?

0 0.2 0.4 0.6 0.8 10

0.5

1

0

0.2

0.4

0.6

0.8

1

E2

E1

E3 E3

E2

s1s3

A

Bs1

s4

E1

we know how to answer this question

p is the number of non-zero singular values

-0.50

0.51

1.5

-1

-0.5

0

0.5

1

-1

-0.5

0

0.5

1

E1

E2

E3

E3

E2E1

s2 s3

A

Bs1

s4

v2

v1

v3

SVD identifies a subspace

but the SVD factors

f(i) = v(i) i=1, pnot unique

usually not the “best”

factor f(1)v with the largest singular value

usually near the mean sample

sample mean <s>minimize

eigenvector <v>minimize

factor f(1)v with the largest singular value

usually near the mean sample

sample mean <s>minimize

eigenvector <v>minimize

about the same if samples are clustered

0 0.2 0.4 0.6 0.8 10

0.5

10

0.2

0.4

0.6

0.8

1

E1

E2

E3

0 0.2 0.4 0.6 0.8 10

0.5

10

0.2

0.4

0.6

0.8

1

E1

E2

E3 s3

f(1) s1s4

f(1)

f(2) f(2)s4s3s2 s2s1E

3E2 E1

E3E2 E1

(A) (B)

[U, LAMBDA, V] = svd(S,0);lambda = diag(LAMBDA);F = V';C = U*LAMBDA;

in MatLab

[U, LAMBDA, V] = svd(S,0);lambda = diag(LAMBDA);F = V';C = U*LAMBDA;

“economy” calculationLAMBDA is M⨉Min MatLab

since samples have measurement noise

probably no exactly singular values

just very small ones

so pick pfor whichS≈CF

is an adequate approximation

Atlantic Rock Dataset

51.97 1.25 14.28 11.57 7.02 11.67 2.12 0.0750.21 1.46 16.41 10.39 7.46 11.27 2.94 0.0750.08 1.93 15.6 11.62 7.66 10.69 2.92 0.3451.04 1.35 16.4 9.69 7.29 10.82 2.65 0.1352.29 0.74 15.06 8.97 8.14 13.19 1.81 0.0449.18 1.69 13.95 12.11 7.26 12.33 2 0.1550.82 1.59 14.21 12.85 6.61 11.25 2.16 0.1649.85 1.54 14.07 12.24 6.95 11.31 2.17 0.1550.87 1.52 14.38 12.38 6.69 11.28 2.11 0.17(several thousand more rows)

SiO2 TiO2 Al2O3 FeOt MgO CaO Na2O K2O

Al203

Ti02Al203

Si02

K20

Fe0

Mg0

Al203

A) B)

C) D)

1 2 3 4 5 6 7 80

1000

2000

3000

4000

5000singular values, s(i)

index, i

s(i)

sing

ular

val

ues,

λ i

index, i

f2 f3 f4 f5

f2p f3p f4p f5pf(5)f(2) f(3) f(4)

SiO2

TiO2

Al2O3

FeOtotal

MgO

CaO

Na2O

K2O

C2C3

C4

top related