

Using SVM Weight-Based Methods to Identify Causally Relevant and Non-Causally Relevant Variables

Alexander Statnikov1, Douglas Hardin1,2, Constantin Aliferis1,3

1Department of Biomedical Informatics, 2Department of Mathematics, 3Department of Cancer Biology, Vanderbilt University, Nashville, TN, USA

NIPS 2006 Workshop on Causality and Feature Selection


Major Goals of Variable Selection

• Construct faster and more cost-effective classifiers.

• Improve the prediction performance of the classifiers.

• Gain insight into the underlying data-generating process.

Taxonomy of Variables

Variables are either relevant or irrelevant. Relevant variables are either causally relevant or non-causally relevant.

[Figure: example causal graph over the variables A, B, C, D, E, F, J, K, L, M and the response T.]


Support Vector Machine (SVM) Weight-Based Variable Selection Methods

• Scale up to datasets with many thousands of variables and as few as dozens of samples

• Often yield variables that are more predictive than the ones output by other variable selection techniques or the full (unreduced) variable set (Guyon et al, 2002; Rakotomamonjy 2003)

Currently unknown: Do we get insights on the causal structure?

(Hardin et al, 2004):

• Irrelevant variables will be given a 0 weight by a linear SVM in the sample limit;

• Linear SVM may assign 0 weight to strongly relevant variables and nonzero weight to weakly relevant variables.

Simulation Experiments: Network structure 1

[Figure: network structure 1. A variable Y, hidden from the learner, is the parent of the relevant variables X1, X2, …, XN; X1 is the causally relevant parent of the response T; Z1, Z2, …, ZM are irrelevant variables.]

• P(Y=0) = ½ and P(Y=1) = ½. Y is hidden from the learner.

• {Xi}, i = 1,…,N, are binary variables with P(Xi=0|Y=0) = q and P(Xi=1|Y=1) = q.

• {Zi}, i = 1,…,M, are independent binary variables with P(Zi=0) = ½ and P(Zi=1) = ½.

• T is a binary response variable with P(T=0|X1=0) = 0.95 and P(T=1|X1=1) = 0.95.

Network 1a: q = 0.95. Network 1b: q = 0.99.
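The generative process for network structure 1 can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `sample_network_1`, its parameters, and the seed are assumptions made here and are not taken from the slides.

```python
import random

def sample_network_1(n_samples, N, M, q, seed=0):
    """Draw samples from network structure 1.

    Returns (X, Z, T): rows of the relevant variables X1..XN, rows of the
    irrelevant variables Z1..ZM, and the response T. The hidden variable Y
    is generated internally and never returned.
    """
    rng = random.Random(seed)
    X, Z, T = [], [], []
    for _ in range(n_samples):
        y = rng.randint(0, 1)                       # P(Y=0) = P(Y=1) = 1/2
        # Each Xi equals Y with probability q and is flipped otherwise.
        x = [y if rng.random() < q else 1 - y for _ in range(N)]
        z = [rng.randint(0, 1) for _ in range(M)]   # independent fair coins
        # T equals X1 with probability 0.95 and is flipped otherwise.
        t = x[0] if rng.random() < 0.95 else 1 - x[0]
        X.append(x)
        Z.append(z)
        T.append(t)
    return X, Z, T

# q = 0.95 corresponds to network 1a, q = 0.99 to network 1b.
X, Z, T = sample_network_1(n_samples=1000, N=10, M=10, q=0.95)
agree = sum(t == x[0] for x, t in zip(X, T)) / len(T)
print(f"fraction of samples with T == X1: {agree:.3f}")
```

Because every Xi is a noisy copy of the same hidden Y, X2, …, XN are strongly predictive of T even though only X1 causes it.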

Simulation Experiments: Network structure 1 in real-world distributions

[Figure: adrenal gland cancer pathway produced by Ariadne Genomics PathwayStudio software version 4.0 (http://www.ariadnegenomics.com/). Highlighted groups: the disease and its putative causes (except for kras); kras and its regulators (except for SOS1); SOS1 and its targets.]

Simulation Experiments: Network structure 2

[Figure: network structure 2. The relevant variables X1, X2, …, XN are causally relevant parents of the response T and also parents of Y; Z1, Z2, …, ZM are irrelevant variables.]

• {Xi}, i = 1,…,N, are independent binary variables with P(Xi=0) = ½ and P(Xi=1) = ½.

• {Zi}, i = 1,…,M, are independent binary variables with P(Zi=0) = ½ and P(Zi=1) = ½.

• Y is a “synthesis variable” with the following function:

$Y = \frac{1}{N} \sum_{i=1}^{N} X_i$

• T is a binary response variable defined as

$T = \operatorname{sign}\left( \sum_{i=1}^{N} v_i X_i - \frac{N}{4} \right)$

where the vi’s are generated from the uniform random U(0,1) distribution and are fixed for all experiments.
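A sampler for network structure 2 might look like the sketch below. It assumes the readings Y = (1/N) Σ Xi and T = sign(Σ vi Xi − N/4), where N/4 is the expected value of the weighted sum; both forms, the ±1 coding of T, and the helper name `make_network_2` are assumptions of this illustration rather than details confirmed by the slides.

```python
import random

def make_network_2(N, M, seed=0):
    """Fix the coefficients v_i once, then return them with a sampling function."""
    rng = random.Random(seed)
    v = [rng.random() for _ in range(N)]     # v_i ~ U(0, 1), fixed for all samples

    def sample(n_samples):
        rows = []
        for _ in range(n_samples):
            x = [rng.randint(0, 1) for _ in range(N)]   # independent fair coins
            z = [rng.randint(0, 1) for _ in range(M)]   # irrelevant variables
            y = sum(x) / N                              # assumed synthesis variable
            s = sum(vi * xi for vi, xi in zip(v, x))
            t = 1 if s > N / 4 else -1                  # assumed sign(sum - N/4)
            rows.append((x, y, z, t))
        return rows

    return v, sample

v, sample = make_network_2(N=10, M=5, seed=1)
rows = sample(200)
positives = sum(1 for _, _, _, t in rows if t == 1)
print(f"{positives} of {len(rows)} samples have T = +1")
```

Here Y is an effect of the Xi's and not a cause of T, yet it summarizes all of them, which is what makes it a tempting variable for a max-margin classifier.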

Simulation Experiments: Network structure 2 in real-world distributions

[Figure: pathway diagram. Highlighted groups: putative causes of the disease; common targets of the putative causes of the disease.]

Data Generation

• Generated 30 training samples for each sample size in {100, 200, 500, 1000}, for different values of N (number of all relevant variables) in {10, 100} and M (number of irrelevant variables) in {10, 100, 1000}.

• Generated testing samples of size 5000 for different values of N and M.

• Added noise to simulate random measurement errors: replaced {0%, 1%, 10%} of each variable's values with values randomly sampled from the distribution of that variable in the simulated data.
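The noise step can be sketched as follows. This is one plausible reading of the description, in which a fixed fraction of the entries of each variable is overwritten with draws from that variable's empirical distribution in the sample; the function name `add_noise` and the per-column rounding are assumptions of this illustration.

```python
import random

def add_noise(rows, fraction, seed=0):
    """Return a copy of `rows` with `fraction` of each column's entries resampled."""
    rng = random.Random(seed)
    n = len(rows)
    noisy = [list(r) for r in rows]          # leave the input untouched
    k = round(fraction * n)                  # entries to overwrite per variable
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]        # empirical distribution of variable j
        for i in rng.sample(range(n), k):    # k distinct rows chosen at random
            noisy[i][j] = rng.choice(column)
    return noisy

rows = [[i % 2, (i // 2) % 2] for i in range(100)]
noisy = add_noise(rows, fraction=0.10)
# A resampled value can coincide with the original one, so at most k entries differ.
changed = [sum(a[j] != b[j] for a, b in zip(rows, noisy)) for j in range(2)]
print("entries changed per variable:", changed)
```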

Overview of Experiments with SVM Weight-Based Methods

Variable selection by SVM weights & classification:

- Used C = {0.001, 0.01, 0.1, 1, 10, 100, 1000}
- Classified 10%, 20%, …, 90%, 100% top-ranked variables

Also classified baselines (causally relevant, non-causally relevant, all relevant, and irrelevant).

Variable selection by SVM-RFE & classification:

- Removed one variable at a time
- Used C = {0.001, 0.01, 0.1, 1, 10, 100, 1000}
- 75% training/25% testing
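Both selection schemes can be illustrated on toy data. In the sketch below a plain batch subgradient solver stands in for a real SVM library so that the example needs only the standard library; the solver, the helper names, the toy data, and the choice C = 0.1 are all assumptions of this illustration and not the authors' implementation.

```python
import random

def train_linear_svm(X, t, C, epochs=100):
    """Batch subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - t_i*(w.x_i + b))."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for step in range(1, epochs + 1):
        gw, gb = list(w), 0.0                    # gradient of the 0.5*||w||^2 term
        for x, ti in zip(X, t):
            margin = ti * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:                       # hinge loss is active for this sample
                gw = [g - C * ti * xj for g, xj in zip(gw, x)]
                gb -= C * ti
        eta = 1.0 / step                         # decreasing step size
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b

def rank_by_weight(X, t, C):
    """Variable indices ordered from largest to smallest absolute weight."""
    w, _ = train_linear_svm(X, t, C)
    return sorted(range(len(w)), key=lambda j: -abs(w[j]))

def svm_rfe(X, t, C, n_keep):
    """Recursive feature elimination: retrain, drop the smallest-|weight| variable, repeat."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        Xk = [[x[j] for j in kept] for x in X]
        w, _ = train_linear_svm(Xk, t, C)
        worst = min(range(len(kept)), key=lambda j: abs(w[j]))
        del kept[worst]                          # remove one variable at a time
    return kept

# Toy data: variable 0 agrees with the response 95% of the time; the other four are coin flips.
rng = random.Random(0)
t = [rng.choice([-1, 1]) for _ in range(100)]
X = [[ti if rng.random() < 0.95 else -ti] + [rng.choice([-1, 1]) for _ in range(4)]
     for ti in t]

ranking = rank_by_weight(X, t, C=0.1)
survivor = svm_rfe(X, t, C=0.1, n_keep=1)
print("ranking by |w|:", ranking, "RFE survivor:", survivor)
```

Weight ranking trains once and sorts, while RFE retrains after every removal, so the two can disagree when variables are correlated.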


SVM Formulation Used


Results

I. SVMs Can Assign Higher Weights to the Irrelevant Variables than to the Non-Causally Relevant Ones

[Figure: average ranks of variables (by SVM weights) over 30 random training samples of size 100 (w/o noise) from network 1a with 100 relevant and irrelevant variables. Panels: C is small (≤0.01); C is large (≥0.1).]

I. SVMs Can Assign Higher Weights to the Irrelevant Variables than to the Non-Causally Relevant Ones

AUC analysis for discrimination between groups of all relevant and irrelevant variables based on SVM weights:

SVM penalty param C   Nsample = 100   Nsample = 200   Nsample = 500   Nsample = 1000
0.001                 1.000           1.000           1.000           1.000
0.01                  0.834           0.784           0.739           0.811
0.1                   0.342           0.406           0.581           0.716
1                     0.335           0.423           0.592           0.694
10                    0.335           0.423           0.593           0.715
100                   0.335           0.423           0.593           0.724
1000                  0.335           0.423           0.593           0.726

AUC classification performance obtained on the 5,000-sample independent testing set: results for variable ranking based on SVM weights.

Proportion of top-ranked variables by SVM weights:

SVM penalty param C   10%     20%     30%     40%     50%     60%     70%     80%     90%     All variables
0.001                 0.94    0.933   0.929   0.926   0.924   0.923   0.923   0.922   0.922   0.922
0.01                  0.941   0.936   0.934   0.932   0.931   0.93    0.929   0.928   0.928   0.928
0.1                   0.948   0.934   0.927   0.924   0.922   0.921   0.921   0.921   0.921   0.921
1                     0.93    0.92    0.918   0.916   0.916   0.917   0.917   0.917   0.917   0.917
10                    0.923   0.92    0.918   0.916   0.916   0.917   0.917   0.917   0.917   0.917
100                   0.923   0.92    0.918   0.916   0.916   0.917   0.917   0.917   0.917   0.917
1000                  0.923   0.92    0.918   0.916   0.916   0.917   0.917   0.917   0.917   0.917

Baselines:

SVM penalty param C   Causally relevant variable   Non-causally relevant   All relevant variables   Irrelevant variables
0.001                 0.955                        0.908                   0.924                    0.499
0.01                  0.955                        0.908                   0.938                    0.499
0.1                   0.955                        0.906                   0.949                    0.498
1                     0.955                        0.883                   0.94                     0.497
10                    0.955                        0.851                   0.92                     0.497
100                   0.955                        0.83                    0.908                    0.497
1000                  0.955                        0.83                    0.908                    0.497
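The AUC analysis of the weights themselves (discrimination between two groups of variables based on SVM weights) can be computed as a pairwise comparison. The sketch below assumes that absolute weights are used as scores; the function name and the example weights are hypothetical.

```python
def auc_between_groups(scores_pos, scores_neg):
    """Probability that a random 'positive' score exceeds a random 'negative' one.

    Ties count as one half, which makes this the Mann-Whitney form of the AUC.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical absolute weights: three relevant variables, three irrelevant ones.
relevant = [0.9, 0.4, 0.05]
irrelevant = [0.1, 0.0, 0.3]
auc = auc_between_groups(relevant, irrelevant)
print(f"AUC = {auc:.3f}")
```

Under this definition an AUC of 1.000 means every relevant variable outranks every irrelevant one, 0.5 means the weights carry no information about group membership, and a value below 0.5 means irrelevant variables tend to receive the larger weights.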

II. SVMs Can Select Irrelevant Variables More Frequently than Non-Causally Relevant Ones

[Figure: probability of selecting variables (by SVM-RFE) estimated over 30 random training samples of size 100 (w/o noise) from network 1a with 100 relevant and irrelevant variables. Panels: C is small (≤0.01); C is large (≥0.1).]

II. SVMs Can Select Irrelevant Variables More Frequently than Non-Causally Relevant Ones

AUC classification performance obtained on the 5,000-sample independent testing set: results for variable selection by SVM-RFE:

SVM penalty param C   Selected variables by SVM-RFE   Causally relevant variable   Non-causally relevant variables   All relevant variables   Irrelevant variables
0.001                 0.948                           0.955                        0.908                             0.924                    0.499
0.01                  0.948                           0.955                        0.908                             0.938                    0.499
0.1                   0.949                           0.955                        0.906                             0.949                    0.498
1                     0.942                           0.955                        0.883                             0.94                     0.497
10                    0.937                           0.955                        0.851                             0.92                     0.497
100                   0.94                            0.955                        0.83                              0.908                    0.497
1000                  0.933                           0.955                        0.83                              0.908                    0.497

III. SVMs Can Assign Higher Weights to the Non-Causally Relevant Variables Than to the Causally Relevant Ones

SVM penalty param C   Nsample = 100   Nsample = 200   Nsample = 500   Nsample = 1000
0.001                 0.604           0.535           0.502           0.49
0.01                  0.602           0.536           0.502           0.482
0.1                   0.61            0.531           0.489           0.484
1                     0.611           0.528           0.498           0.485
10                    0.611           0.528           0.498           0.484
100                   0.611           0.528           0.498           0.484
1000                  0.611           0.528           0.498           0.484

Average ranks of variables (by SVM weights) over 30 random training samples of size 500 (w/o noise) from network 2 with 100 relevant and irrelevant variables

AUC analysis for discrimination between groups of causally relevant and non-causally relevant variables based on SVM weights

IV. SVMs Can Select Non-Causally Relevant Variables More Frequently Than the Causally Relevant Ones

Probability of selecting variables (by SVM-RFE) estimated over 30 random training samples of size 500 (w/o noise) from network 2 with 100 relevant and irrelevant variables

V. SVMs Can Assign Higher Weights to the Irrelevant Variables Than to the Causally Relevant Ones

SVM penalty param C   Nsample = 100   Nsample = 200   Nsample = 500   Nsample = 1000
0.001                 0.565           0.65            0.737           0.799
0.01                  0.571           0.654           0.756           0.836
0.1                   0.58            0.664           0.803           0.878
1                     0.58            0.662           0.805           0.891
10                    0.58            0.662           0.805           0.893
100                   0.58            0.662           0.805           0.893
1000                  0.58            0.662           0.805           0.893

Average ranks of variables (by SVM weights) over 30 random training samples of size 100 (w/o noise) from network 2 with 100 relevant and irrelevant variables

AUC analysis for discrimination between groups of causally relevant and irrelevant variables based on SVM weights

VI. SVMs Can Select Irrelevant Variables More Frequently Than the Causally Relevant Ones

Probability of selecting variables (by SVM-RFE) estimated over 30 random training samples of size 100 (w/o noise) from network 2 with 100 relevant and irrelevant variables

Theoretical Example 1

[Figure: network structure 2 with X1 and X2 as parents of both T and Y.]

• P(X1=-1) = ½, P(X1=1) = ½, P(X2=-1) = ½, and P(X2=1) = ½.

• Y is a “synthesis variable” with the following function:

$Y = \frac{X_1 + X_2}{\sqrt{2}}$

• T is a binary response variable defined as:

$T = \operatorname{sign}(X_1 + X_2 + 1)$

Variables X1, X2, and Y have expected value 0 and variance 1.

The application of linear SVMs results in the following weights: 1/2 for X1, 1/2 for X2, and $1/\sqrt{2}$ for Y.

Therefore, the non-causally relevant variable Y receives higher SVM weight than the causally relevant ones.
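The stated weights can be checked by hand. The derivation below is a sketch under assumptions that the slide does not spell out: a hard-margin linear SVM with a bias term b, $Y = (X_1 + X_2)/\sqrt{2}$, and $T = \operatorname{sign}(X_1 + X_2 + 1)$. The point labels $p_i$ and the multipliers $\alpha_i$ are notation introduced here.

```latex
% Four equiprobable configurations (x_1, x_2, y) with label t:
%   p_1 = ( 1,  1,  \sqrt{2}),  t_1 = +1
%   p_2 = ( 1, -1,  0),         t_2 = +1
%   p_3 = (-1,  1,  0),         t_3 = +1
%   p_4 = (-1, -1, -\sqrt{2}),  t_4 = -1
\begin{align*}
&\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2
  \quad \text{s.t.} \quad t_i\,(w \cdot p_i + b) \ge 1, \qquad i = 1,\dots,4. \\[4pt]
&\text{KKT conditions: } w = \sum_i \alpha_i t_i p_i, \qquad
  \sum_i \alpha_i t_i = 0, \qquad \alpha_i \ge 0. \\[4pt]
&\text{Try support vectors } p_2, p_3, p_4 \text{ with }
  \alpha_1 = 0,\ \alpha_2 = \alpha_3 = \alpha,\ \alpha_4 = 2\alpha: \\
&\qquad w = \alpha\,p_2 + \alpha\,p_3 - 2\alpha\,p_4
  = \bigl(2\alpha,\ 2\alpha,\ 2\sqrt{2}\,\alpha\bigr). \\[4pt]
&\text{Active constraint at } p_2: \quad w \cdot p_2 + b = 1 \ \Rightarrow\ b = 1. \\
&\text{Active constraint at } p_4: \quad -(w \cdot p_4 + b) = 1
  \ \Rightarrow\ 8\alpha - 1 = 1 \ \Rightarrow\ \alpha = \tfrac{1}{4}. \\[4pt]
&\text{Hence } w = \bigl(\tfrac{1}{2},\ \tfrac{1}{2},\ \tfrac{1}{\sqrt{2}}\bigr),
  \quad b = 1, \qquad
  t_1\,(w \cdot p_1 + b) = 3 \ge 1 \ \text{(inactive, consistent with } \alpha_1 = 0\text{)}.
\end{align*}
```

All KKT conditions hold and the problem is convex, so this is the maximum-margin solution. Since $1/\sqrt{2} \approx 0.707 > 1/2$, Y receives the largest weight; flipping the sign of the constant inside T mirrors the configuration and gives the same weight magnitudes.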

Theoretical Example 2

[Figure: scatter plot of the classes T = + and T = − in the (X, Y) plane, two causal graphs G1 and G2 over X, Y, and T, and a list of pairwise and conditional (in)dependence statements among X, Y, and T.]

The maximum-gap inductive bias is inconsistent with local causal discovery.

Discussion

1. Using nonlinear SVM weight-based methods

• Preliminary experiment: when polynomial SVM-RFE is used, the non-causally relevant variable is never selected in network structure 2. However, the performance of polynomial SVM-RFE is similar to that of linear SVM-RFE.

2. The framework of formal causal discovery (Spirtes et al, 2000) provides algorithms that can solve these problems, e.g. HITON (Aliferis et al, 2003) or MMPC & MMMB (Tsamardinos et al, 2003; Tsamardinos et al, 2006).

3. Methods based on modified SVM formulations, e.g. 0-norm and 1-norm penalties (Weston et al, 2003; Zhu et al, 2004).

4. Extend empirical evaluation to different distributions


Conclusion

• Causal interpretation of the current SVM weight-based variable selection techniques must be conducted with great caution by practitioners

• The inductive bias employed by SVMs is locally causally inconsistent.

• New SVM methods may be needed to address this issue; this is an exciting and challenging area of research.