1 choosing independent variables the main idea of variables selection is to reduce the number of...

41
1 Choosing independent variables • The main idea of variables selection is to reduce the number of independent variables. • The goal is to identify the independent variables that are decently correlated with dependent variable and possibly not correlated among themselves. • Otherwise the independent variables might be of linear relationship which may seriously damage the model.

Upload: kimberly-bradley

Post on 01-Jan-2016

221 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

1

Choosing independent variables

• The main idea of variables selection is to reduce the number of independent variables.

• The goal is to identify the independent variables that are decently correlated with dependent variable and possibly not correlated among themselves.

• Otherwise the independent variables might be of linear relationship which may seriously damage the model.

Page 2: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

2

Choosing independent variablesThree popular methods of choosing independent variables are:

• Hellwig's method• Graphs analysis method• Correlation matrix method.

Page 3: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

3

Hellwig’s method

Three steps:1. Number of combinations: 2m-12. Individual capacity of every independent

variable in the combination:

3. Integral capacity of information for every combination:

kIiij

jkj

r

rh

20

kjk hH

Page 4: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

4

Hellwig’s method

1.    Number of combinations

In Hellwig’s method the number of

combinations is provided by the formula 2m –

1 where m is the number of independent

variables.

Page 5: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

5

Hellwig’s method2. Individual capacity of each independent variable in the combination is given by the formula:

kIiij

jkj

r

rh

20

where:hkj – individual capacity of information for j-th variable in k-th combination

Page 6: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

6

Hellwig’s method2. Individual capacity of each independent variable in the combination is given by the formula:

kIiij

jkj

r

rh

20

where:r0j – correlation coefficient between j-th variable (independent) and dependent variable

Page 7: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

7

Hellwig’s method2. Individual capacity of each independent variable in the combination is given by the formula:

kIiij

jkj

r

rh

20

where:rij – correlation coefficient between i-th and j-th variable (both independent)

Page 8: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

8

Hellwig’s method2. Individual capacity of each independent variable in the combination is given by the formula:

kIiij

jkj

r

rh

20

where:Ik – the set of numbers of variables in k-th combination

Page 9: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

9

Hellwig’s method3. Integral capacity of information for every combination The next step is to calculate Hk – integral capacity of information for each combination as the sum of individual capacities of information within each combination:

kjk hH

Page 10: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

10

Hellwig’s method

• Q: HOW TO CHOOSE INDEPENDENT VARIABLES?

• A: LOOK AT INTEGRAL CAPACITIES OF INFORMATION. THE GREATEST Hk MEANS THAT VARIABLES FROM THIS COMBINATION SHOULD BE INCLUDED IN THE MODEL.

Page 11: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

11

• Let’s choose independent variables, using Hellwig's method.

Example

See details for calculations

Page 12: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

12

Example

• First we need to have vector and matrix of correlation coefficients.        Correlation coefficients between every independent variable X1, X2 and X3 and dependent variable Y are provided in vector R0.

Page 13: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

13

Example

• First we need to have vector and matrix of correlation coefficients.

       Correlation matrix R includes correlation coefficients between independent variables.

Page 14: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

14

Example

1.    Number of combinationsWe have 3 independent variables X1, X2 and X3. Thus we may have 2m-1 = 23-1= 8-1= 7 combinations of independent variables.

{X1} {X2} {X3} {X1, X2} {X1, X3} {X2, X3} {X1, X2, X3}

Page 15: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

15

Example 2. Individual capacity of independent variable in the combination 1

Page 16: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

16

Example

2. Individual capacity of independent variable in the combination 2

Page 17: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

17

Example

2. Individual capacity of independent variable in the combination 3

Page 18: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

18

Example

2. Individual capacity of every independent variable in the combination 4

Page 19: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

19

Example

2. Individual capacity of independent variables in the combination 5

Page 20: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

20

Example

2. Individual capacity of every independent variables in the combination 6

Page 21: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

21

Example

Page 22: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

22

Example 3. Integral capacity of information for each combination

The greatest integral capacity is for combination C4. Independent variables - X1, X2 - will be included in model.

Page 23: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

23

Graph analysis method

Three steps1. Calculating r*2. Modification of correlation matrix 3. Drawing the graph

Page 24: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

24

Graph analysis method• Q: HOW TO CHOOSE INDEPENDENT VARIABLES?

• A: LOOK AT THE GRAPHS. THE NUMBER OF GROUPS MEANS THE NUMBER OF VARIABLES INCLUDED IN THE MODEL. IF THERE’S SEPARATED (ISOLATED) VARIABLE, YOU SHOULD INCLUDE IT IN THE MODEL. FROM EACH GROUP, THE VARIABLE WITH THE GREATEST NUMBER OF LINKS SHOULD BE INCLUDED IN MODEL. IF THERE’S TWO VARIABLES WITH THE GREATEST NUMBER OF LINKS, YOU SHOULD TAKE THE VARIABLE WHICH IS MORE STRONGLY CORRELATED WITH DEPENDENT VARIABLE.

Page 25: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

25

Graph analysis method

1. Calculating r* We start with calculating critical value of r* using

the formula:

where tα is provided in the table of t-Student distribution at the significance level α and the degrees of freedom n-2 (sometimes r* can be given, so there’s no need to calculate it).

2

2*

2

tn

tr

Page 26: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

26

Graph analysis method

2. Modification of correlation matrix The correlation coefficients for which

are statistically irrelevant and we replace them with nulls in correlation matrix.

3. Drawing the graphUsing modified correlation matrix we draw the

graphs with bulbs representing the variables and the links representing correlation coefficients of statistical significance.

*rrij

Page 27: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

27

Example

Let’s have an example (the same one as for Hellwig’s method, n=7)

Page 28: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

28

Example

1. Calculating r* (n=7, tα, n-

2=t0,05,5=2,571)

7545,0569337,061,11

61,6

571,25

571,2

2 2

2

2

2*

tn

tr

Page 29: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

29

Example

2. Modification of correlation matrix

7545,0* r

Page 30: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

30

Example

3. Drawing the graph

Conclusion: Model will consist of X1 (as isolated variable) and x2 (cause is more strongly correlated with dependent variable – you may check it in R0 vector).

Page 31: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

31

Correlation matrix method

1. Calculate r* We start with calculating critical value of r* using

the formula:

where tα is provided in the table of t-Student distribution at the significance level α and the degrees of freedom n-2 (sometimes r* can be given, so there’s no need to calculate it).

2

2*

2

tn

tr

Page 32: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

32

2. To eliminate Xi variables weakly correlated withY

3. To choose Xs where

[Xs is the best source of information]

4. To eliminate Xi variables strongly correlated with Xs

*rrij

is rr max

*rrsi

Page 33: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

33

Example

Let’s have an example (the same one as for Hellwig’s method and graph analysis metod, n=7)

Page 34: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

34

Example

1. Calculating r* (n=7, tα, n-

2=t0,05,5=2,571)

7545,0569337,061,11

61,6

571,25

571,2

2 2

2

2

2*

tn

tr

Page 35: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

35

2. To eliminate Xi variables weakly correlated withY

*rrij

None of the variables will be eliminated

7545,0* r

Page 36: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

36

3. To choose Xs where is rr max

Page 37: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

37

4. To eliminate Xi variables strongly correlated with Xs

*rrsi 7545,0* r

None of the variables will be eliminated.

Page 38: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

38

3 .(repeated). To choose another Xs where is rr max

Page 39: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

39

4. (repeated)To eliminate Xi variables strongly correlated with Xs *rrsi

7545,0* r

X3 will be eliminated.

Page 40: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

40

4. Eliminate X3 (all correlation coefficients for X3)

OUR FINAL CHOICE - X1 and X2

Page 41: 1 Choosing independent variables The main idea of variables selection is to reduce the number of independent variables. The goal is to identify the independent

41

In this example level of significane can be changed – this will give us different

results (you may check it if you want).

DON’T EXPECT TO GET THE SAME RESULTS FROM THESE THREE METHODS…