
Page 1: Independent Component Analysis (arthurg/papers/icaTutorial.pdf)

Independent Component Analysis

Arthur Gretton

Carnegie Mellon University

2009

Page 2:

ICA: setting

Independent component analysis:

• S a vector of l unknown, independent sources: PS = ∏_{i=1}^{l} PSi

• X = AS, the vector of observed mixtures

• A is l × l mixing matrix (full rank)


Page 4:

ICA: setting

Independent component analysis:

• B is our estimate of A−1; this is what we solve for

• Y vector of estimated sources

Neglect time dependence: m i.i.d. mixture observations

Page 5:

ICA: another example

• Mixtures X are original EEG [Jung et al., 2000]

• Estimated sources Y are ICA components

• Scalp map from B

Page 6:

ICA examples

• We’ve seen:

– Sounds mixed together (“cocktail party” problem) [Hyvarinen et al., 2001]

– EEG recordings (brain, fetal heartbeat) [Jung et al., 2000, Stogbauer et al., 2004]

• Some further examples:

– Extracting independent activity from fMRI [Calhoun et al., 2003]

– Financial data [Kiviluoto and Oja, 1998]

– Linear edge filters for image patch coding? (Possibly not: [Bethge, 2006])

Page 7:

A toy example

• Two distributions: PS1 is uniform, PS2 is bimodal

[Figure: histograms of Source 1 (uniform) and Source 2 (bimodal)]
Page 11:

First indeterminacy: ordering

• Initial unmixed RVs in red

[Figure: scatter of (source 1, source 2) for the input sources, and for mixtures after rotations by π/6, π/4, π/3, π/2]

• Independent at rotation π/2

Ignore source order

Page 13:

Second indeterminacy: sign

• Initial unmixed RVs in red

• Source 2 sign reversed in blue

[Figure: scatter of the input sources and of the mixture, with the source-2 sign flip shown in blue]

Ignore source sign

Page 14:

Second indeterminacy: sign


• More generally: S1 and S2 independent iff aS1 and S2 independent, for a ≠ 0

– Assume sources have unit variance

Page 16:

Third indeterminacy: Gaussians

Both sources Gaussian

[Figure: joint scatter of sources (S1, S2), and the Gaussian marginal densities P(S1), P(S2)]

Meaningless to “unmix” Gaussians

Page 18:

Things that are impossible for ICA

Using independence alone, we cannot . . .

• recover signal order,

• recover signal sign (or amplitude) ,

• separate multiple Gaussians.

We can recover

B∗ = PDA−1

• P is a permutation matrix

• D diagonal, dii ∈ {−1, 1}

(as long as there is no more than one Gaussian source)

Page 20:

First step in ICA: decorrelate

• Idea: remove all dependencies of order 2 between mixtures X

[Figure: scatter of an uncorrelated pair (X, Y) vs a correlated pair (AX, AY)]

Page 21:

First step in ICA: decorrelate

• Idea: remove all dependencies of order 2 between mixtures X

• New signals have unit covariance:

T = BwX,  CT = I

• We thus break up B as follows:

B = BrBw

– Bw is a whitening matrix

– Br is the remaining demixing operation

• Use the SVD of mixture covariance Cx = UΛU⊤:

Bw = Λ−1/2U⊤
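The whitening step can be sketched numerically. This is a minimal NumPy illustration of my own (the toy sources and the mixing matrix are made up, not from the slides): eigendecompose the mixture covariance, then rescale along the eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 2000

# Two independent sources: one uniform, one bimodal
s = np.vstack([
    rng.uniform(-1.7, 1.7, m),
    np.where(rng.random(m) < 0.5, -1.0, 1.0) + 0.2 * rng.standard_normal(m),
])

A = np.array([[2.0, 1.0], [1.0, 3.0]])    # full-rank mixing matrix
x = A @ s                                 # observed mixtures

# Whitening: eigendecompose the mixture covariance C_x = U Lambda U^T,
# then B_w = Lambda^{-1/2} U^T maps x to T = B_w x with unit covariance
C = np.cov(x)
lam, U = np.linalg.eigh(C)
B_w = np.diag(lam ** -0.5) @ U.T
t = B_w @ (x - x.mean(axis=1, keepdims=True))

print(np.round(np.cov(t), 6))  # ~ identity matrix
```

After this step only a rotation remains to be estimated, which is the point of the following slides.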

Page 22:

What does decorrelation achieve?

• Two distributions: PS1 is uniform, PS2 is bimodal

[Figure: input sources, observed mixtures, and data after decorrelation]

Page 23:

Problem remaining: rotation

• Assume correlation has already been removed

• To recover original signal, need to rotate

[Figure: input sources, and data after decorrelation]

• In the remainder: unmixing matrix B is a rotation,

B⊤B = I

Page 25:

ICA: maximum likelihood

• Model for mixtures parametrised by (B, PS)

[Figure: source distribution PS1 × PS2 (joint scatter), and log likelihood vs angle (× π/2)]

Unmixing angle for B: 0

Page 26:

ICA: maximum likelihood

• Model for mixtures parametrised by (B, PS)

[Figure: source distribution PS1 × PS2 (joint scatter), and log likelihood vs angle (× π/2)]

Unmixing angle for B: π/12

Page 27:

ICA: maximum likelihood

• Model for mixtures parametrised by (B, PS)

[Figure: source distribution PS1 × PS2 (joint scatter), and log likelihood vs angle (× π/2)]

Unmixing angle for B: π/4

Page 29:

ICA: maximum likelihood

• We have a model for the observations, parametrised by (B, PS)

– Model must have PS = ∏_{i=1}^{l} PSi

• With this model, our estimated density of observations is

PX = |det(B)| PS(BX)

Page 30:

ICA: maximum likelihood

• We have a model for the observations, parametrised by (B, PS)

– Model must have PS = ∏_{i=1}^{l} PSi

• With this model, our estimated density of observations is

PX = |det(B)| PS(BX), with |det(B)| struck out: B is a rotation, so |det(B)| = 1

Page 31:

ICA: maximum likelihood

• We have a model for the observations, parametrised by (B, PS)

– Model must have PS = ∏_{i=1}^{l} PSi

• With this model, our estimated density of observations is

PX = PS(BX)

• Maximise the expected log likelihood,

L := EX [log PX]
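The likelihood sweep on the preceding slides can be sketched as follows. This is my own toy code, not code from the slides: I fix the model source density to a unit-variance Laplace (the slides leave PS generic), mix white sources by a known rotation, and sweep the unmixing angle; for a rotation B, |det(B)| = 1, so the likelihood is just the model density of BX.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5000

# True sources: unit-variance Laplace (super-Gaussian)
s = rng.laplace(scale=1 / np.sqrt(2), size=(2, m))

R = lambda th: np.array([[np.cos(th), -np.sin(th)],
                         [np.sin(th),  np.cos(th)]])
x = R(np.pi / 6) @ s          # whitened mixture: a rotation of white sources

def log_lik(th):
    """Mean log likelihood under a product-Laplace source model.
    For a rotation B, |det B| = 1, so L = E[log P_S(B x)]."""
    y = R(-th) @ x
    b = 1 / np.sqrt(2)        # Laplace scale giving unit variance
    return np.mean(-np.abs(y) / b - np.log(2 * b))

angles = np.linspace(0, np.pi / 2, 181)
best = angles[np.argmax([log_lik(th) for th in angles])]
print(best)                   # close to the true mixing angle pi/6
```

The maximiser sits at the demixing angle, up to the ordering and sign indeterminacies discussed earlier.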

Page 32:

Maximum likelihood: where it fails

• Model as before, but true source densities are Laplace.

• Why is this so wrong?

[Figure: Laplace input sources, and the maximum likelihood solution]

Page 33:

Maximum likelihood: where it fails

• Model as before, but true source densities are Laplace.

• Why is this so wrong?

[Figure: assumed source distribution PS1 × PS2, and log likelihood vs angle (× π/2)]

Page 35:

Back to original setting: independence

• Ideally: contrast φ(Y) = 0 if and only if all components of Y are mutually independent:

PY = ∏_{i=1}^{l} PYi.

– Under our mixing assumptions: Y are the original sources S, up to permutations and sign swaps

• How it's really used: contrast should be "smallest" when random variables are "most independent"

Page 39:

Mutual information

• The mutual information:

I(Y) = DKL( PY ‖ ∏_{i=1}^{l} PYi )

• DKL ≥ 0 with equality iff PY = ∏_{i=1}^{l} PYi

• Simplification: when B is a rotation,

DKL( PY ‖ ∏_{i=1}^{l} PYi ) = ∑_{i=1}^{l} h(Yi) − h(X) − log |det B|,

where h(Y) = −EY log(PY(y)); the terms h(X) and log |det B| are constant over rotations.

Contrast: φKL(Y) := ∑_{i=1}^{l} h(Yi)

Page 41:

Maximum likelihood revisited

• Mutual information contrast: minimize

φKL(Y) := ∑_{i=1}^{l} −E log(PYi(yi))

• Maximum likelihood: maximize

L := EX [log PS(BX)] = ∑_{i=1}^{l} E log(PYi(yi))

• Same thing!

• The difference is in approach:

– For max. likelihood we assumed a model PS

– Now we assume no model for PY (though we still make assumptions)

Page 43:

Contrast functions with fixed nonlinearities

• Entropies hard to compute/optimize: replace with

φf(Y) = ∑_{j=1}^{l} E(f(Yj))

for some other nonlinear f(y)

[Figure: fixed nonlinearities: Jade: f(y) = y^4; Infomax: f(y) = a − exp(−y^2/2) sech^2(y); Fast ICA: f(y) = (1/a) log cosh(ay)]
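A fixed-nonlinearity contrast sweep can be sketched as follows. This is my own toy code (not from the slides), using f(y) = y^4 as in the Jade panel: for sub-Gaussian sources the summed fourth moment is smallest at the demixing angle, where the components are most strongly non-Gaussian (most negative kurtosis).

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5000

# Sub-Gaussian sources: uniform, scaled to unit variance
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, m))

R = lambda th: np.array([[np.cos(th), -np.sin(th)],
                         [np.sin(th),  np.cos(th)]])
x = R(np.pi / 6) @ s                      # whitened mixture

def contrast(th, f=lambda y: y ** 4):
    # phi_f(Y) = sum_j E(f(Y_j)) for the candidate unmixing angle th
    y = R(-th) @ x
    return np.mean(f(y[0])) + np.mean(f(y[1]))

angles = np.linspace(0, np.pi / 2, 181)
vals = [contrast(th) for th in angles]
# For sub-Gaussian sources the y^4 contrast is *minimized* at the
# demixing angle; for super-Gaussian sources it would be maximized.
best = angles[np.argmin(vals)]
print(best)                               # close to pi/6
```

The sign flip between sub- and super-Gaussian sources is exactly why the next slides warn that care is needed with fixed contrasts.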

Page 44:

Our example again

[Figure: input sources; data after decorrelation; Jade, Infomax (Imax), and Fast ICA (Fica) contrasts vs angle (× π/2)]

Page 45:

Kurtosis: an important concept

• Kurtosis definition: when the mean is zero,

κ4 = E(x^4) − 3 (E(x^2))^2.

• Source densities can be super-Gaussian (positive kurtosis) or sub-Gaussian (negative kurtosis)

• Zero kurtosis does not mean Gaussian!

[Figure: densities p(S) for a sub-Gaussian (uniform) and a super-Gaussian (Laplace) source]
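These kurtosis signs are easy to check numerically; a quick sketch of my own (sample sizes and distributions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 200000

def kurt(x):
    # kappa_4 = E[x^4] - 3 (E[x^2])^2, after removing the mean
    x = x - x.mean()
    return np.mean(x ** 4) - 3 * np.mean(x ** 2) ** 2

uniform = rng.uniform(-1, 1, m)           # sub-Gaussian: kappa_4 < 0
laplace = rng.laplace(size=m)             # super-Gaussian: kappa_4 > 0
gauss = rng.standard_normal(m)            # kappa_4 ~ 0

print(kurt(uniform), kurt(laplace), kurt(gauss))
```

For the uniform the population value is −2/15 per unit variance squared of scale, and for the unit-scale Laplace it is +12, matching the signs stated above.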

Page 46:

Demo: contrasts with fixed nonlinearities

• Super-Gaussian (Laplace) and sub-Gaussian (Uniform) sources

• Unmixed sources in red

• Mixture (angle π/6) in black

[Figure: scatter plots of the super-Gaussian and sub-Gaussian sources and their mixtures]

Page 47:

Demo: contrasts with fixed nonlinearities

• Results for Jade, Infomax, and Fast ICA contrasts

[Figure: Jade, Imax, and Fica contrasts vs angle (× π/2), for super-Gaussian (top row) and sub-Gaussian (bottom row) sources]

Care needed when using fixed contrasts!

Page 48:

Contrast functions using entropy estimates

• Simplest option: convolve with spline kernel, then compute discrete entropy via space partition [Pham, 2004]

[Figure: MICA contrast vs angle (× π/2); input sources; data after decorrelation]

Page 49:

Contrast functions using spacings entropy estimate

• More sophisticated option: spacings estimate of entropy [Learned-Miller and Fisher III, 2003]

[Figure: RADICAL contrast vs angle (× π/2); input sources; data after decorrelation]

Page 50:

Contrast functions using spacings entropy estimate

• More sophisticated option: spacings estimate of entropy [Learned-Miller and Fisher III, 2003]

• Sort sample Y1, . . . , Ym in increasing order: Y(i) ≤ Y(i+1)

• Prob. density estimate based on spacings

P(y; Y1, . . . , Ym) = 1 / ((m + 1)(Y(i+1) − Y(i))),  Y(i) ≤ y < Y(i+1)

• Entropy estimate based on spacings

h(Y) = (1/(m − 1)) ∑_{i=1}^{m−1} log((m + 1)(Y(i+1) − Y(i)))

• Smoothing: add "extra" mixture points (noisy copies of original mixtures)

• Hard to optimize
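The spacings estimator is short to implement; a sketch of my own below. Note one addition that is mine, not from the slides: the raw 1-spacing formula above has a well-known bias equal to the Euler-Mascheroni constant, which the sketch corrects, and the smoothing/optimization steps are omitted.

```python
import numpy as np

def spacings_entropy(y):
    """Entropy estimate from 1-spacings of the sorted sample:
    (1/(m-1)) * sum_i log((m+1) * (Y_(i+1) - Y_(i))).
    The raw estimator converges to h - gamma; add Euler's gamma back."""
    y = np.sort(np.asarray(y, dtype=float))
    m = len(y)
    gaps = np.maximum(np.diff(y), 1e-12)   # guard against tied values
    return np.mean(np.log((m + 1) * gaps)) + np.euler_gamma

rng = np.random.default_rng(4)
h_est = spacings_entropy(rng.standard_normal(100000))
h_true = 0.5 * np.log(2 * np.pi * np.e)   # differential entropy of N(0,1)
print(h_est, h_true)
```

Practical implementations such as RADICAL use wider m-spacings instead of this bias correction, which also reduces variance.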

Page 52:

Other independence measures as contrasts

• Why mutual information?

– Same as maximum likelihood (good if model is correct)

– Contrast function is sum of entropies: fast

• Other independence measures?

• Most common: kernel/characteristic function-based

– Characteristic function-based ICA [Eriksson and Koivunen, 2003, Chen and Bickel, 2005]

– Kernel ICA (covariance): COCO, KMI, HSIC [Gretton et al., 2005, Shen et al., 2007, 2009]

– Kernel ICA (correlation): KCCA, KGV [Bach and Jordan, 2002]

• HSIC same as characteristic function-based (for the purposes of ICA) [Shen et al., 2009]

Page 53:

Kernel contrast function: HSIC

• Dependence measure:

HSIC(PUV, F) := ( sup_{f∈F} [EUV f − EU EV f] )^2

[Figure: dependence witness and sample, plotted over (Y(1), Y(2))]

Page 54:

HSIC: empirical expression

• Empirical HSIC:

HSIC := (1/m^2) tr(KHLH)

– K Gram matrix for (u1, . . . , um)

– L Gram matrix for (v1, . . . , vm)

– Centering H = I − (1/m) 1m 1m⊤
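The empirical expression is a few lines of NumPy. A sketch of my own, with a Gaussian (RBF) kernel chosen for illustration and an arbitrary bandwidth:

```python
import numpy as np

def rbf_gram(z, sigma=1.0):
    # Gram matrix K_ij = exp(-(z_i - z_j)^2 / (2 sigma^2)) for scalar data
    d2 = (z[:, None] - z[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(u, v, sigma=1.0):
    # Empirical HSIC = (1/m^2) tr(K H L H), H = I - (1/m) 1 1^T
    m = len(u)
    K = rbf_gram(u, sigma)
    L = rbf_gram(v, sigma)
    H = np.eye(m) - np.ones((m, m)) / m
    return np.trace(K @ H @ L @ H) / m ** 2

rng = np.random.default_rng(5)
m = 400
u = rng.standard_normal(m)
v_indep = rng.standard_normal(m)
v_dep = u + 0.1 * rng.standard_normal(m)
print(hsic(u, v_indep), hsic(u, v_dep))   # dependent pair scores much higher
```

Since HKH and HLH are both positive semidefinite, the statistic is nonnegative, and it is large only when u and v are dependent.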

Page 56:

Contrast functions: a small selection

Contrast functions

• Sum of expectations of a fixed nonlinearity

– Fast ICA, Infomax, Jade

• Sum of entropies/mutual information. . .

– . . . using fast, smoothed entropy estimates

– . . . using spacings/k-nn entropy estimates

• Kernel/characteristic function dependence measures

How do we optimize?

Page 57:

Optimization (Jacobi)

• For two signals, the rotation is expressed

B = [ cos(θ)  −sin(θ) ; sin(θ)  cos(θ) ]

• Higher dimensions, e.g. for l = 3,

B := [ cos(θz) −sin(θz) 0 ; sin(θz) cos(θz) 0 ; 0 0 1 ]
   × [ cos(θy) 0 −sin(θy) ; 0 1 0 ; sin(θy) 0 cos(θy) ]
   × [ 1 0 0 ; 0 cos(θx) −sin(θx) ; 0 sin(θx) cos(θx) ]

• Coordinate descent, exhaustive search, etc...
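Building the parameterised rotation is straightforward; a minimal sketch of my own (angle values are arbitrary):

```python
import numpy as np

def rot2(theta):
    """2-D rotation, as on the slide."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rot3(theta_z, theta_y, theta_x):
    """3-D rotation as the product of the three axis rotations above."""
    Rz = np.eye(3); Rz[:2, :2] = rot2(theta_z)              # rotate about z
    Ry = np.array([[np.cos(theta_y), 0, -np.sin(theta_y)],  # rotate about y
                   [0, 1, 0],
                   [np.sin(theta_y), 0, np.cos(theta_y)]])
    Rx = np.eye(3); Rx[1:, 1:] = rot2(theta_x)              # rotate about x
    return Rz @ Ry @ Rx

B = rot3(0.3, -0.7, 1.1)
print(np.allclose(B.T @ B, np.eye(3)))   # prints True: B is orthogonal
```

Coordinate descent then optimizes the contrast one angle at a time, holding the others fixed.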

Page 59:

Optimization (Newton)

• Unmixing matrix B satisfies B⊤B = I

• Local parameterisation Ω about B: at iteration k,

Bk+1 = Bk exp(Ω) Ω = −Ω⊤

• How to choose direction and size of Ω?

• Newton-like method: solve the linear system for Ω ∈ R^{m(m−1)/2}

H_{Bk}(φ) Ω = −∇_{Bk}(φ)

• Approximate Hessian as diagonal: FastICA [Shen and Huper, 2006]

Page 60:

Gradient descent vs Newton

[Figure: 100 × Amari error vs iteration, for Newton and gradient descent]

Page 62:

What if we have time dependence?

• We can get extra information from sources not being i.i.d.

• Mixture x(t) now stationary random process, depends on x(t − τ)

• Define mixture covariances

C0 = E(x(t) x(t)⊤),  Cτ = E(x(t) x(t − τ)⊤)

– Cτ independent of t (stationarity)

• Decorrelate:

B C0 B⊤ = Λ,  B Cτ B⊤ = Λ̃

– Λ and Λ̃ diagonal

• Combining both requirements:

B C0 Cτ⁻¹ = (Λ Λ̃⁻¹) B

• Greater number of delays: joint diagonalisation
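The one-lag version of this idea (AMUSE-style) can be sketched as follows. This is my own toy implementation, not code from the slides: the AR(1) sources, their coefficients, and the mixing matrix are made up; the method works because the two sources have distinct lag-1 autocorrelations.

```python
import numpy as np

rng = np.random.default_rng(6)
m = 20000

def ar1(phi, m):
    # AR(1) source: s[t] = phi * s[t-1] + noise
    e = rng.standard_normal(m)
    s = np.zeros(m)
    for t in range(1, m):
        s[t] = phi * s[t - 1] + e[t]
    return s

s = np.vstack([ar1(0.9, m), ar1(-0.5, m)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s
x = x - x.mean(axis=1, keepdims=True)

# Whiten with C0, then diagonalize the symmetrized lag-tau covariance of
# the whitened data: its eigenvectors give the remaining rotation.
C0 = np.cov(x)
lam, U = np.linalg.eigh(C0)
W = np.diag(lam ** -0.5) @ U.T
z = W @ x
tau = 1
Ct = z[:, tau:] @ z[:, :-tau].T / (m - tau)
Ct = (Ct + Ct.T) / 2
_, V = np.linalg.eigh(Ct)
B = V.T @ W                   # estimated unmixing matrix
P = B @ A                     # ~ permutation x diagonal, up to noise
print(np.round(P, 2))
```

Each row of B @ A has a single dominant entry, i.e. the sources are recovered up to order, sign, and scale, using second-order statistics alone.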

Page 63:

What’s the best method?

Page 64:

A basic benchmark

• l = 8 sources

• m = 40,000 samples

• Benchmark data from [Bach and Jordan, 2002]

• Average over 24 repetitions

Page 66:

A basic benchmark: results

Adaptive contrasts outperform fixed nonlinearities

[Figure: 100 × Amari error (demixing quality) for Jade, KDICA, MICA, FKICA, QGD, RAD, MILCA; adaptive contrasts highlighted]

Page 68:

A basic benchmark: computational cost

Best runtime (adaptive): fast entropy estimates

[Figure: demixing quality (100 × Amari error) and run times (sec, log scale) for Jade, KDICA, MICA, FKICA, QGD, RAD, MILCA; fast entropy estimates highlighted]

Page 69:

A basic benchmark: computational cost

Kernel methods: Newton outperforms Gradient Descent

[Figure: demixing quality and run times; kernel methods highlighted]

Page 70: Independent Component Analysisarthurg/papers/icaTutorial.pdf · ICA: setting Independent component analysis: • S a vector of l unknown, independent sources: PS = Q l i=1 P S i •

A basic benchmark: computational cost

Spacings/k-nn entropy contrasts slowest

Jade KDICA MICA FKICA QGD RAD MILCA

0.2

0.4

0.6

0.8

1

1.2

1.4

100x A

mari e

rror

Demixing quality

Jade KDICA MICA FKICA QGD RAD MILCA

100

101

102

103

104

105

se

c

Run times

Spacings/k−nn

Page 71: Independent Component Analysisarthurg/papers/icaTutorial.pdf · ICA: setting Independent component analysis: • S a vector of l unknown, independent sources: PS = Q l i=1 P S i •

High frequency perturbations

• Two sources, sinusoidal perturbations to Gaussian

• Random mixing angle.

• Results averaged over 25 datasets, m = 1000

[Figure: source density p(s) versus s; source spectrum F[p](f) versus f]
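The setup above (perturbed-Gaussian sources mixed by a rotation through a random angle) can be sketched as follows. The slides do not give the exact perturbation, so the density form p(s) proportional to N(s; 0, 1)(1 + a sin(2*pi*f*s)), the amplitude, the frequency value, and the helper names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_perturbed_gaussian(m, freq, amp=0.5):
    """Rejection sampling from p(s) proportional to
    N(s; 0, 1) * (1 + amp * sin(2*pi*freq*s)), with the standard normal
    as proposal; accept with probability (1 + amp*sin(.)) / (1 + amp)."""
    samples = []
    while len(samples) < m:
        s = rng.standard_normal(m)
        u = rng.uniform(0.0, 1.0 + amp, size=m)
        samples.extend(s[u < 1.0 + amp * np.sin(2 * np.pi * freq * s)])
    return np.array(samples[:m])

# Two independent perturbed sources, m = 1000 samples each,
# mixed by a rotation through a random angle.
theta = rng.uniform(0.0, 2.0 * np.pi)
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.vstack([sample_perturbed_gaussian(1000, freq=0.6),
               sample_perturbed_gaussian(1000, freq=0.6)])
X = A @ S  # observed mixtures
```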

High frequency perturbations

• Spacings/k-nn methods perform best (but are slow)
• Fast entropy estimates: narrowest range
• Fast kernel ICA performs in between (good performance/runtime tradeoff)

[Figure: 100 x Amari error versus perturbation frequency, for FKICA, MICA, KDICA, MILCA and RAD]

Outlier resistance

• Two sources, outliers added to both mixtures
• Kernel ICA performs best
• Fast entropy estimates: less good (KDICA initialized with the kernel ICA solution!)

[Figure: 100 x Amari error versus number of outliers, for FKICA, MICA, KDICA, MILCA and RAD]
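The corruption step in this experiment can be sketched as below; the outlier amplitude, the sign convention, and the function name are assumptions (the slides only state that outliers are added to both mixtures):

```python
import numpy as np

rng = np.random.default_rng(1)

def add_outliers(X, n_outliers, scale=10.0):
    """Return a copy of the mixture matrix X (l x m) with large +/- spikes
    added to every mixture at n_outliers random time points."""
    Xc = X.copy()
    idx = rng.choice(Xc.shape[1], size=n_outliers, replace=False)
    Xc[:, idx] += scale * rng.choice([-1.0, 1.0], size=(Xc.shape[0], n_outliers))
    return Xc
```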

ICA algorithm choice

• Choosing an ICA approach:

– Fastest (by far): Fast ICA [Hyvarinen et al., 2001], Jade [Cardoso, 1998]

– Good tradeoff between speed and performance: MICA [Pham, 2004]

– Tricky cases (outliers, non-smooth sources): Fast KICA [Shen et al., 2007, 2009]

– Small sample size: KGV very good [Bach and Jordan, 2002]

• Some further hints:

– Use multiple restarts (the objective is non-convex)

– Run an independence test to check the answer

• Comparing (usually fixed-contrast) algorithms:

– Is one approach "better" than another?

– Example: number of sources l very large, sample size m small (relative to l), e.g. microarray data [Lee and Batzoglou, 2003]
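The second hint, checking the answer with an independence test, can be sketched as a simple HSIC permutation test between pairs of estimated sources. This is a minimal biased-HSIC implementation with a fixed Gaussian kernel width, for illustration only, not the exact test used in the talk:

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between two 1-d samples, with Gaussian kernels."""
    m = len(x)
    def gram(v):
        sq = (v[:, None] - v[None, :]) ** 2
        return np.exp(-sq / (2.0 * sigma ** 2))
    K, L = gram(x), gram(y)
    H = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return np.trace(K @ H @ L @ H) / m ** 2

def independence_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value: shuffling y destroys any dependence on x."""
    rng = np.random.default_rng(seed)
    stat = hsic(x, y)
    null = np.array([hsic(x, rng.permutation(y)) for _ in range(n_perm)])
    return float(np.mean(null >= stat))
```

A small p-value for some pair of estimated components suggests the optimizer has not found a good demixing, so another restart is warranted.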


Selected ICA references

• Start with Cardoso's excellent introduction [Cardoso, 1998], and the book by Hyvarinen et al. [Hyvarinen et al., 2001]

• Fast kernel ICA is described in [Shen et al., 2007, 2009]. Characteristic function-based ICA is described in [Eriksson and Koivunen, 2003, Chen and Bickel, 2005]. For earlier kernel ICA methods, see [Bach and Jordan, 2002, Gretton et al., 2005]

• Mutual information/entropy based: [Pham, 2004, Learned-Miller and Fisher III, 2003, Stogbauer et al., 2004, Chen, 2006]

• Classic algorithms for time series separation with second-order methods (not covered much in this talk): [Molgedey and Schuster, 1994, Belouchrani et al., 1997]

• An important paper for optimising over orthogonal matrices: [Edelman et al., 1998]. The Newton-like method: [Huper and Trumpf, 2004]

References

F.R. Bach and M.I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002.

A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines. A blind source separation technique using second order statistics. IEEE Transactions on Signal Processing, 45(2):434–444, 1997.

M. Bethge. Factorial coding of natural images: how effective are linear models in removing higher-order dependencies? Journal of the Optical Society of America A, 23(6):1253–1268, 2006.

V. Calhoun, T. Adali, L. Hansen, J. Larsen, and J. Pekar. ICA of functional MRI data: An overview. In Proc. 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA 2003), pages 281–288, 2003.

J.-F. Cardoso. Blind signal separation: Statistical principles. Proceedings of the IEEE, Special Issue on Blind Identification and Estimation, 86(10):2009–2025, 1998.

A. Chen. Fast kernel density independent component analysis. In Sixth International Conference on ICA and BSS, volume 3889, pages 24–31, Berlin/Heidelberg, 2006. Springer-Verlag.

A. Chen and P.J. Bickel. Consistent independent component analysis and prewhitening. IEEE Transactions on Signal Processing, 53(10):3625–3632, 2005.

A. Edelman, T.A. Arias, and S.T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.

J. Eriksson and K. Koivunen. Characteristic-function based independent component analysis. Signal Processing, 83(10):2195–2208, 2003.

A. Gretton, R. Herbrich, A.J. Smola, O. Bousquet, and B. Scholkopf. Kernel methods for measuring independence. Journal of Machine Learning Research, 6:2075–2129, 2005.

K. Huper and J. Trumpf. Newton-like methods for numerical optimisation on manifolds. In Proceedings of the Thirty-eighth Asilomar Conference on Signals, Systems and Computers, pages 136–139, 2004.

A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley, New York, 2001.

T.-P. Jung, S. Makeig, C. Humphries, T.-W. Lee, M. McKeown, V. Iragui, and T. Sejnowski. Removing electroencephalographic artifacts by blind source separation. Psychophysiology, 37:163–178, 2000.

K. Kiviluoto and E. Oja. Independent component analysis for parallel financial time series. In Proc. Int. Conf. on Neural Information Processing (ICONIP), volume 2, pages 895–898, 1998.

E.G. Learned-Miller and J.W. Fisher III. ICA using spacings estimates of entropy. Journal of Machine Learning Research, 4:1271–1295, 2003.

S.I. Lee and S. Batzoglou. Application of independent component analysis to microarrays. Genome Biology, 4(11):R76, 2003.

L. Molgedey and H. Schuster. Separation of a mixture of independent signals using time delayed correlation. Physical Review Letters, 72(23):3634–3637, 1994.

D.-T. Pham. Fast algorithms for mutual information based independent component analysis. IEEE Transactions on Signal Processing, 52(10):2690–2700, 2004.

H. Shen and K. Huper. Newton-like methods for parallel independent component analysis. In MLSP 16, pages 283–288, Maynooth, Ireland, 2006.

H. Shen, S. Jegelka, and A. Gretton. Fast kernel ICA using an approximate Newton method. In AISTATS 11, pages 476–483. Microtome, 2007.

H. Shen, S. Jegelka, and A. Gretton. Fast kernel-based independent component analysis. IEEE Transactions on Signal Processing, 2009. In press.

H. Stogbauer, A. Kraskov, S. Astakhov, and P. Grassberger. Least dependent component analysis based on mutual information. Phys. Rev. E, 70(6):066123, 2004.