
Post on 02-Jul-2015


Page 1: A Comparative Study between ICA (Independent Component Analysis) and PCA (Principal Component Analysis)

A Comparative Study between ICA and PCA

Md. Sahidul Islam
Roll No. 08054718

Department of Statistics
University of Rajshahi

[email protected]


Department of Statistics, University of Rajshahi-6205

Overview

o Motivation of the study
o Objectives
o Definition of ICA
o FastICA algorithm
o Results of the study
    o Latent structure detection
    o Cluster analysis
    o Outlier detection
o Conclusions


Motivation of the study

o In multivariate statistics, PCA is a long-established technique for latent structure detection, cluster analysis, and outlier detection.

o In many cases ICA performs better than PCA.

o Our motivation in this thesis is to perform latent structure detection, cluster analysis, and outlier detection using ICA and to compare the results with those of PCA.


Objectives

o Study the algorithms of ICA.

o Apply ICA to latent structure detection, cluster analysis, and outlier detection.

o Compare its performance with that of PCA.


Independent Component Analysis

The simple “Cocktail Party” Problem

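Sources: $s_1, s_2$. Observations: $x_1, x_2$. Mixing matrix $A$ with entries $a_{11}, a_{12}, a_{21}, a_{22}$.

$$x_1 = a_{11} s_1 + a_{12} s_2$$
$$x_2 = a_{21} s_1 + a_{22} s_2$$

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}$$

$$x = As$$

ICA / PCA: $y = W^T x$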
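The mixing model x = As can be tried out numerically. The sketch below is only an illustration: the uniform sources, the sample size, and the entries of A are assumptions chosen for the example, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical independent, non-Gaussian sources (uniform on [-1, 1]).
n_samples = 1000
s = rng.uniform(-1.0, 1.0, size=(2, n_samples))

# Illustrative mixing matrix A; the entries are arbitrary, not from the slides.
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])

# Observations: x = A s, i.e. x1 = a11*s1 + a12*s2 and x2 = a21*s1 + a22*s2.
x = A @ s

# If A were known, the sources would follow directly from s = A^{-1} x.
s_recovered = np.linalg.solve(A, x)
```

In practice A is unknown, which is why ICA and PCA both look for a matrix W such that y = W^T x is a useful estimate of the sources.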


Non-Gaussian is independent

Central limit theorem

The distribution of a sum of independent random variables tends toward a Gaussian distribution.

$$\text{Observed signal} = a_1 S_1 + a_2 S_2 + \dots + a_n S_n$$

The observed signal tends toward Gaussian, while each source $S_1, S_2, \dots, S_n$ is non-Gaussian.
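A quick numerical check of this effect, as a sketch: excess kurtosis (zero for a Gaussian) is used here as a simple measure of non-Gaussianity, and the choice of uniform sources, ten summands, and the sample size are assumptions made for illustration.

```python
import numpy as np

def excess_kurtosis(v):
    """Sample excess kurtosis; it is zero for a Gaussian variable."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

rng = np.random.default_rng(0)
n_samples = 200_000

# One non-Gaussian source: uniform on [-1, 1] (theoretical excess kurtosis -1.2).
single = rng.uniform(-1.0, 1.0, n_samples)

# A sum of 10 such independent sources (theoretical excess kurtosis -0.12).
summed = rng.uniform(-1.0, 1.0, size=(10, n_samples)).sum(axis=0)

k_single = excess_kurtosis(single)
k_sum = excess_kurtosis(summed)
```

The excess kurtosis of the sum lies much closer to zero than that of a single source, which is the sense in which a mixture is "more Gaussian" than its components.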


Non-Gaussianity estimates independence

Estimate $y = w^T x$.

Let $z = A^T w$, so $y = w^T x = w^T A s = z^T s$.

$y$ is a linear combination of the $s_i$, therefore $z^T s$ is more Gaussian than any of the $s_i$.

$z^T s$ becomes least Gaussian when it is equal to one of the $s_i$.

In that case $w^T x = z^T s$ equals an independent component.

Maximizing the non-Gaussianity of $w^T x$ gives us one of the independent components.
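The argument above can be checked on a toy case. This is a sketch under stated assumptions: two unit-variance uniform sources, a unit-length z parameterized by an angle theta, and absolute excess kurtosis as the (assumed) non-Gaussianity measure.

```python
import numpy as np

def excess_kurtosis(v):
    """Sample excess kurtosis; it is zero for a Gaussian variable."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

rng = np.random.default_rng(0)

# Two hypothetical independent sources with unit variance.
half_width = np.sqrt(3.0)
s = rng.uniform(-half_width, half_width, size=(2, 100_000))

def nongaussianity(theta):
    """|excess kurtosis| of y = z^T s for the unit vector z = (cos theta, sin theta)."""
    z = np.array([np.cos(theta), np.sin(theta)])
    return abs(excess_kurtosis(z @ s))

at_s1 = nongaussianity(0.0)          # y equals s1
at_s2 = nongaussianity(np.pi / 2)    # y equals s2
at_mix = nongaussianity(np.pi / 4)   # y is an equal mix of s1 and s2
```

The equal mix is noticeably closer to Gaussian than either pure source, so searching over directions for maximal non-Gaussianity leads back to one of the sources.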


FastICA algorithm

Iteration procedure for maximizing non-Gaussianity

Step 1: Choose an initial weight vector $w$.

Step 2: Let $w^+ = E[x\,g(w^T x)] - E[g'(w^T x)]\,w$ ($g$: the derivative of a non-quadratic function, e.g. $g(u) = \tanh(u)$).

Step 3: Let $w = w^+ / \|w^+\|$.

Step 4: If not converged, go back to Step 2.
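The four steps can be sketched in NumPy as follows. This is a minimal one-unit version, not the code used in the study. It assumes the data are centered and whitened first (a standard preprocessing step that the slide does not spell out), takes g = tanh, and uses hypothetical helper names `whiten` and `fastica_one_unit`.

```python
import numpy as np

def whiten(x):
    """Center x (n_signals x n_samples) and transform it to identity covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    return (eigvecs / np.sqrt(eigvals)).T @ x

def fastica_one_unit(z, max_iter=200, tol=1e-8, seed=0):
    """Estimate one weight vector w on whitened data z with g = tanh."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])          # Step 1: initial weight vector
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        g = np.tanh(w @ z)
        g_prime = 1.0 - g**2
        # Step 2: w+ = E[x g(w^T x)] - E[g'(w^T x)] w
        w_new = (z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)           # Step 3: normalize
        # Step 4: converged when w stops changing direction (sign is irrelevant)
        converged = abs(abs(w_new @ w) - 1.0) < tol
        w = w_new
        if converged:
            break
    return w

# Usage on an assumed toy mixture of two uniform sources.
rng = np.random.default_rng(1)
s = rng.uniform(-1.0, 1.0, size=(2, 5000))
A = np.array([[1.0, 0.5], [0.3, 1.0]])           # illustrative mixing matrix
z = whiten(A @ s)
w = fastica_one_unit(z)
y = w @ z
corr = [abs(np.corrcoef(y, s_i)[0, 1]) for s_i in s]
```

The estimated component y should correlate strongly with exactly one of the two sources; which one, and with which sign, depends on the starting vector.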


Results and Discussions

Latent structure detection


Simulated dataset-1

Figure: Matrix plot of the original sources, 10 uniformly distributed variables.

Figure: (a) Matrix plot of the 10 principal components. (b) Matrix plot of the source variables.

Figure: (a) Matrix plot of the 10 independent components. (b) Matrix plot of the source variables.
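A comparison of this kind can be reproduced in outline. The sketch below is not the study's code: the random mixing matrix, the sample size, the tanh nonlinearity, the symmetric FastICA variant, and the helper names (`whiten`, `decorrelate`, `fastica`, `best_match`) are all assumptions made for illustration.

```python
import numpy as np

def whiten(x):
    """Center x (n_signals x n_samples) and transform it to identity covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    return (eigvecs / np.sqrt(eigvals)).T @ x

def decorrelate(W):
    """Symmetric decorrelation: W <- (W W^T)^(-1/2) W, making the rows orthonormal."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return (vecs / np.sqrt(vals)) @ vecs.T @ W

def fastica(z, max_iter=500, tol=1e-6, seed=0):
    """Symmetric FastICA with g = tanh on whitened data z; returns the unmixing rows."""
    n, n_samples = z.shape
    W = decorrelate(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(max_iter):
        g = np.tanh(W @ z)
        W_new = g @ z.T / n_samples - (1.0 - g**2).mean(axis=1)[:, None] * W
        W_new = decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W

def best_match(components, sources):
    """For each source, the largest |correlation| with any estimated component."""
    m = components.shape[0]
    corr = np.corrcoef(np.vstack([components, sources]))[:m, m:]
    return np.abs(corr).max(axis=0)

rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(10, 10_000))   # 10 uniform sources
A = rng.standard_normal((10, 10))               # assumed random mixing matrix
x = A @ s

# PCA: project the centered data onto the eigenvectors of its covariance matrix.
xc = x - x.mean(axis=1, keepdims=True)
_, eigvecs = np.linalg.eigh(np.cov(xc))
pcs = eigvecs.T @ xc

# ICA: whiten, then rotate with FastICA.
z = whiten(x)
ics = fastica(z) @ z

pca_match = best_match(pcs, s)
ica_match = best_match(ics, s)
```

Under these assumptions each source should be matched closely by one independent component, while the principal components remain mixtures of several sources.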


Simulated dataset-2

Simulated dataset-2 consists of 5 variables drawn from the Laplace (super-Gaussian), uniform (sub-Gaussian), binomial, multinomial, and normal distributions, each with 10000 observations.

Figure: Matrix plot of the original sources: 5 variables, each from a different distribution.
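Sources of this kind could be generated as sketched below. The distribution parameters are assumptions, since the slides do not give them, and taking a single count from a multinomial draw is only one possible reading of "multinomial variable".

```python
import numpy as np

def excess_kurtosis(v):
    """Sample excess kurtosis: positive for super-Gaussian, negative for sub-Gaussian."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

rng = np.random.default_rng(0)
n = 10_000  # observations per variable, as stated on the slide

sources = np.vstack([
    rng.laplace(0.0, 1.0, n),                            # Laplace (super-Gaussian)
    rng.uniform(-1.0, 1.0, n),                           # uniform (sub-Gaussian)
    rng.binomial(10, 0.5, n),                            # binomial; 10 trials, p = 0.5 assumed
    rng.multinomial(10, [0.2, 0.3, 0.5], size=n)[:, 0],  # first multinomial count; assumed
    rng.standard_normal(n),                              # normal
]).astype(float)

kurt = np.array([excess_kurtosis(row) for row in sources])
```

The excess kurtosis values make the super-Gaussian and sub-Gaussian labels concrete: clearly positive for the Laplace variable, clearly negative for the uniform one, and near zero for the normal one.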

Figure: (Left) Matrix plot of the principal components. (Right) Original sources: 5 variables, each from a different distribution.

Figure: (Left) Matrix plot of the independent components. (Right) Original sources: 5 variables, each from a different distribution.


Cluster Analysis


Australian Crabs dataset

The first real data set used for clustering is the Australian crabs data set, which has 200 rows and 8 columns, including 5 morphological measurements (frontal lobe size, rear width, carapace length, carapace width, body depth). The data set covers two species of the genus Leptograpsus, each with both sexes (male, female). There are 50 specimens of each sex of each species, collected on site at Fremantle, Western Australia (N. A. Campbell et al., 1974).


Fisher Iris dataset

The second real data set is the world-famous Fisher's Iris data set, which reports four characteristics (sepal width, sepal length, petal width, and petal length) of three species (setosa, versicolor, virginica) of Iris flower.


Outlier detection


Scottish hill racing dataset

The data give the record winning times for 35 hill races in Scotland (Atkinson, 1986). The purpose of that study was to investigate the relationship of the record times for the 35 hill races.


Epilepsy dataset

Thall and Vail reported data from a clinical trial of 59 patients with epilepsy, 31 of whom were randomized to receive the anti-epilepsy drug progabide and 28 to receive a placebo.


Stackloss data

These data cover 21 days of operation of a plant for the oxidation of ammonia as a stage in the production of nitric acid. The response, called stack loss, is the percentage of unconverted ammonia that escapes from the plant. The dataset has three explanatory variables and one response variable.


Education expenditure dataset

These data are used by Chatterjee, Hadi, and Price as an example of heteroscedasticity. The data give the education expenditures of U.S. states as projected in 1975.


Conclusions

If the subject domain supports the assumption of independent non-Gaussian source variables, we recommend using ICA in place of PCA for latent structure detection, clustering, and outlier detection.


Future Research

The following are the areas we want to study:

o Use of kernel ICA techniques for shape study, clustering, and outlier detection.

o Separation of nonlinear mixtures.

o Data mining (sometimes called data or knowledge discovery) is a recent technique in multivariate analysis for extracting information from a data set and transforming it into an understandable structure for further use. Text data mining or medical data mining using ICA would be future research.


Thank you
