Spurious Regressions with Near-Multicollinearity Jean-Bernard Chatelain Kirsten Ralf 2013


Spurious Regressions with Near-Multicollinearity

Jean-Bernard Chatelain

Kirsten Ralf

2013


Plan

1. Parameter identification problem (for interaction and quadratic terms among other cases)

2. Instability of conditional independence

3. The identification of statistically significant spurious effects

4. Preventing spurious regressions

5. « Winner’s curse » papers


If one does not reject the null hypothesis that a simple correlation coefficient with the dependent variable is zero:

corr(aid/GDP, growth of GDP) = 0, …


Then there may still be no effect in multiple regressions.


1. A parameter identification problem


When there are shocks on x(2) and x(1) remains invariant, observational equivalence and parameter identification problem between:

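Negative feedback (non-spurious) case, with $r_{12} = 0$:

$$x_1 = \beta_{12}\, x_2 + \beta_{13}\, x_3 + \varepsilon_{1.23}, \qquad \beta_{12} = -r_{23}\,\beta_{13} \neq 0$$

versus the spurious case (Gram-Schmidt / Cholesky orthogonalisation of $x_3$ with respect to $x_2$):

$$x_1 = 0 \cdot x_2 + \beta_{13}\,(x_3 - r_{23}\, x_2) + \varepsilon_{1.23}$$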


When there are shocks on x(2) AND x(1) remains invariant, observational equivalence and parameter identification problem between:

Spurious case: x(2) has no effect on x(1).

Non-spurious case: a negative feedback control variable x(3) reacts to shocks on x(2) so that x(1) remains invariant.


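Negative feedback control variable

β12 + r23 β13 = r12 = 0

-7.0181 + 0.99 × 7.0889 ≈ 0

[Path diagram: x2 (shock ε2) and x3 (shock ε3.2) are correlated with r23 = 0.99; x3 → x1 with β13 = 7.0889; x2 → x1 with β12 = -7.0181; residual ε1.23.]

Spurious effect

r12 = 0

[Path diagram: the orthogonalised regressor ε3.2 = x3 - r23 x2, with r(x3 - r23 x2, x2) = 0, → x1 with β13 = 7.0889; x2 (shock ε2) has no effect on x1 since r12 = 0; residual ε1.23.]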


Partial or set identification is not feasible: maximal range

Cf. numerical example above:

The parameter could be 0

or the maximal standardized parameter value (for the given correlation matrix) equal to: -7.0181
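The numerical example can be checked with a few lines of Python. This is a minimal sketch, assuming standardized variables, r12 = 0 and r23 = 0.99. It also assumes r13 = sqrt(1 - r23²), the value at which the regression fits perfectly. That value is inferred from the coefficient 7.0889 and is not stated explicitly on the slide. The function name `standardized_betas` is illustrative.

```python
import math

def standardized_betas(r12, r13, r23):
    """Standardized OLS coefficients of x1 on (x2, x3), from simple correlations."""
    denom = 1.0 - r23 ** 2
    beta12 = (r12 - r13 * r23) / denom
    beta13 = (r13 - r12 * r23) / denom
    return beta12, beta13

r23 = 0.99
r12 = 0.0
# Assumption: r13 lies on the boundary of feasible correlations (R^2 = 1).
r13 = math.sqrt(1.0 - r23 ** 2)

beta12, beta13 = standardized_betas(r12, r13, r23)

# Reading 1 (negative feedback): x1 = beta12 * x2 + beta13 * x3.
# Reading 2 (spurious):          x1 = 0 * x2 + beta13 * (x3 - r23 * x2).
# Expanding reading 2 puts a coefficient of -r23 * beta13 on x2.
implied_beta12 = -r23 * beta13

print(f"beta12 = {beta12:.4f}, beta13 = {beta13:.4f}")
print(f"-r23 * beta13 = {implied_beta12:.4f}")
```

Both readings produce the same fitted values, so the data alone cannot say whether the coefficient on x2 is 0 or about -7.02.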


A parameter identification problem

It is not possible to distinguish between alternative values of the parameter of a « classical suppressor » using a single multiple regression.

Then it is not possible to learn the true value of the parameter, even after obtaining a large number of observations

(statistical inference does not help and may even be misleading).


2. The instability of conditional independence with dependent y

Regression includes x3 | Simple regression: do not reject r12=0 (classical suppressor) | Simple regression: reject r12=0
Multiple regression: do not reject β12=0 | No effect | Type I discordance
Multiple regression: reject β12=0 | Type II discordance (spurious) | Effect


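Trivariate regression of x1 on x2 and x3 (standardized variables):

$$\hat\beta_{12} = \frac{r_{12} - r_{13}\, r_{23}}{1 - r_{23}^2}, \qquad \hat\beta_{13} = \frac{r_{13} - r_{12}\, r_{23}}{1 - r_{23}^2}$$

$$R^2_{1.23} = r_{12}\,\hat\beta_{12} + r_{13}\,\hat\beta_{13} = \frac{r_{12}^2 + r_{13}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^2}$$

$$t_{\hat\beta_{12}} = \frac{\hat\beta_{12}}{\hat\sigma_{\hat\beta_{12}}} = \frac{(r_{12} - r_{13}\, r_{23})\,\sqrt{N-3}}{\sqrt{(1 - r_{23}^2)\,(1 - R^2_{1.23})}}$$

$$\text{FEASIBLE: } \det(R) = (1 - r_{23}^2)\,(1 - R^2_{1.23}) \ge 0$$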


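Two distinct acceptance regions of t-tests for simple vs trivariate regression

Simple regression (5% level, N = 100):

$$|t_{r_{12}}| = \frac{|r_{12}|\,\sqrt{N-2}}{\sqrt{1 - r_{12}^2}} < t_{5\%}(100 - 2) \iff |r_{12}| < 0.196$$

Trivariate regression (5% level, approximately the same threshold):

$$|t_{\hat\beta_{12}}| < t_{5\%} \iff \frac{|r_{12} - r_{13}\, r_{23}|}{\sqrt{(1 - r_{13}^2)\,(1 - r_{23}^2)}} < 0.196$$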
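To see how far apart the two acceptance regions can be, the t-statistics can be evaluated at given correlation values. The sketch below assumes standardized variables and plugs the correlations of the Monte Carlo cases (r12 = 0, r13 = -0.03, r23 = 0.5 or 0.99) directly into the formulas, as if the sample correlations were equal to these values. The function names are illustrative.

```python
import math

def t_simple(r12, n):
    """t-statistic of the simple correlation r12 with n observations."""
    return r12 * math.sqrt(n - 2) / math.sqrt(1.0 - r12 ** 2)

def t_trivariate(r12, r13, r23, n):
    """t-statistic of beta12 in the regression of x1 on x2 and x3."""
    det = 1.0 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2.0 * r12 * r13 * r23
    if det <= 0.0:
        raise ValueError("correlation matrix is not positive definite")
    return (r12 - r13 * r23) * math.sqrt(n - 3) / math.sqrt(det)

for r23 in (0.5, 0.99):
    for n in (102, 1002):
        ts = t_simple(0.0, n)
        tm = t_trivariate(0.0, -0.03, r23, n)
        print(f"r23={r23}, N={n}: t(r12)={ts:.2f}, t(beta12)={tm:.2f}")
```

With r23 = 0.99 the trivariate statistic exceeds the usual 5% threshold of about 2 even though r12 is exactly zero, and it grows with the square root of N.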


Typology of type II inference discordance

Two criteria: the correlation between regressors (r23) and the correlation of the control regressor with the dependent variable (r13).

Accept H0: r23 = 0 (|r23| < 0.2) and HIGH CORRELATION |r13| > 0.8: the second regressor is orthogonal to the first one. It explains nearly all the variance of the dependent variable.

MODERATE 0.2 < |r23| < 0.8 and MODERATE 0.2 < |r13| < 0.8: reject H0: r23 = 0 and H0: r13 = 0.

HIGH CORRELATION |r23| > 0.8 and accept H0: r13 = 0 (|r13| < 0.2): the second regressor is a second classical suppressor highly correlated with x(2).


Case 2: Monte Carlo simulations, r12=0, r13=-0.03, r23=0.5, N=102/1002

Regression includes x3 | Do not reject r12=0 | Reject r12=0
Do not reject β12=0 | No effect: 92.3%/90.6% | Type I discordance: 3.3%/2.4%
Reject β12=0 | Type II discordance (spurious): 2.1%/4.9% | Effect: 2.3%/2.1%


The feasibility condition on the correlation matrix, det(R) ≥ 0, restricts the possible values of the correlation coefficients r12 and r13 to lie inside a large ellipse for a given r(23):

$$\det(R) = (1 - R^2_{1.23})\,(1 - r_{23}^2) \ge 0$$

On the large ellipse, the coefficient of determination equals one, R2(1.23)=1: all the residuals are zero, RMSE=0.

The higher the positive correlation between the two regressors, the closer their correlations with the dependent variable must be to each other (the large ellipse shrinks):

$$r_{23} \to 1 \;\Rightarrow\; r_{12} \to r_{13}$$


Case 3: Monte Carlo simulations, r12=0, r13=-0.03, r23=0.99, N=102/1002

Regression includes x3 | Do not reject r12=0 | Reject r12=0
Do not reject β12=0 | No effect: 42.3%/0% | Type I discordance: 2.8%/0%
Reject β12=0 | Type II discordance (spurious): 52.1%/95.5% | Effect: 2.8%/4.5%
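A Monte Carlo experiment of this kind can be sketched in plain Python. The code below is not the authors' simulation code: it assumes jointly normal standardized variables with the stated population correlations, a modest 400 replications, and a hard-coded two-sided 5% critical value of 1.984 (roughly right for about 100 degrees of freedom, so only N = 102 is run). All names are illustrative.

```python
import math
import random

def corr(a, b):
    """Sample Pearson correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def discordance_shares(r12, r13, r23, n, reps=400, t_crit=1.984, seed=0):
    """Share of replications falling in each cell of the 2x2 inference table."""
    rng = random.Random(seed)
    # Cholesky factors so that (x2, x3, x1) have the requested correlations.
    l32 = math.sqrt(1.0 - r23 ** 2)
    b = (r13 - r12 * r23) / l32
    c = math.sqrt(1.0 - r12 ** 2 - b ** 2)
    counts = {"no effect": 0, "type I": 0, "type II": 0, "effect": 0}
    for _rep in range(reps):
        x1, x2, x3 = [], [], []
        for _obs in range(n):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            z3 = rng.gauss(0.0, 1.0)
            x2.append(z1)
            x3.append(r23 * z1 + l32 * z2)
            x1.append(r12 * z1 + b * z2 + c * z3)
        s12, s13, s23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
        t_r = s12 * math.sqrt(n - 2) / math.sqrt(1.0 - s12 ** 2)
        det = 1.0 - s12 ** 2 - s13 ** 2 - s23 ** 2 + 2.0 * s12 * s13 * s23
        t_b = (s12 - s13 * s23) * math.sqrt(n - 3) / math.sqrt(det)
        reject_r, reject_b = abs(t_r) > t_crit, abs(t_b) > t_crit
        if reject_b and reject_r:
            counts["effect"] += 1
        elif reject_b:
            counts["type II"] += 1   # spurious: significant only with x3 included
        elif reject_r:
            counts["type I"] += 1
        else:
            counts["no effect"] += 1
    return {k: v / reps for k, v in counts.items()}

case2 = discordance_shares(0.0, -0.03, 0.5, 102)
case3 = discordance_shares(0.0, -0.03, 0.99, 102)
print("r23 = 0.50:", case2)
print("r23 = 0.99:", case3)
```

The exact shares depend on the seed and the number of replications, so they will only approximate the percentages reported in the tables.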


Statistical significance

1. It cannot solve a parameter identification problem.

2. It is very easy to reach statistical significance for highly correlated regressors which are both very poorly correlated with the dependent variable.


Case 3: Statistically significant LARGE parameters for N=102 observations

r23  | r12   | r13    | β12     | β13     | PIF12  | PIF13
0.99 | 0.00  | -0.03  | 1.4925  | -1.5075 | +∞     | 50.2
0.99 | 0.015 | -0.015 | 1.5     | -1.5    | 100    | 100
0.99 | 0.01  | 0.04   | -1.4874 | 1.5126  | -148.7 | 37.8
0.99 | 0.1   | 0.13   | -1.4422 | 1.5578  | -14.4  | 11.9
0.95 | 0.05  | -0.05  | 1       | -1      | 20     | 20


3. The identification of statistically significant spurious effects


Case 1: Accept r23=0

Then one rejects the negative feedback relation.

The effect of x2 is spurious

Statistical significance in the trivariate regression is meaningless.


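Negative feedback control variable

β12 + r23 β13 = r12 = 0; with r23 = 0 this reduces to β12 = r12 = 0.

[Path diagram, same layout as the r23 = 0.99 example (labels β13 = 7.0889, β12 = -7.0181) but with r23 = 0: x3 (shock ε3.2) and x2 (shock ε2) are uncorrelated, so x3 cannot offset shocks on x2; residual ε1.23.]

Spurious effect

r12 = 0

[Path diagram: ε3.2 = x3, since r23 = r(x3, x2) = 0; x3 → x1 with β13; x2 (shock ε2) has no effect on x1 since r12 = 0; residual ε1.23.]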


Case 3: identification of spurious effects with outlier-robust regressions

IF the statistical significance does not hold when removing or decreasing the weight of outliers in the regression with a highly collinear pair of classical suppressors,

THEN the initially found statistically significant effect is identified as SPURIOUS,

even with a LARGE number of observations.


From a highly collinear regressor to an outlier driven orthogonal regressor

The non-standardized parameter is determined by the second axis of the principal component analysis, which carries, say, 2% of the overall variance of the explanatory variables.

The result is a very large estimated parameter that is highly sensitive to high-leverage observations (outliers).
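As a worked illustration, assuming just two standardized regressors x2 and x3 (an assumption made here for simplicity), the share of variance carried by the second principal axis follows directly from r23:

```latex
R_{xx} = \begin{pmatrix} 1 & r_{23} \\ r_{23} & 1 \end{pmatrix},
\qquad \lambda_1 = 1 + r_{23}, \quad \lambda_2 = 1 - r_{23},
\qquad \frac{\lambda_2}{\lambda_1 + \lambda_2} = \frac{1 - r_{23}}{2}
```

A share of 2% therefore corresponds to r23 = 0.96, and r23 = 0.99 leaves only 0.5% of the regressors' variance on the second axis.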


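$$\hat\beta_{13} = \frac{\operatorname{cov}(x_1,\; x_3 - r_{23}\, x_2)}{\operatorname{var}(x_3 - r_{23}\, x_2)} = \frac{r_{13} - r_{12}\, r_{23}}{1 - r_{23}^2} = \frac{r_{13}}{1 - r_{23}^2} \quad \text{if } r_{12} = 0$$

$$R^2_{1.23} = 1: \quad r_{13} = \sqrt{1 - r_{23}^2}, \qquad \hat\beta_{13} = \frac{1}{\sqrt{1 - 0.99^2}} = 7.0889 \quad \text{for } r_{23} = 0.99$$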


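Negative feedback control variable

β12 + r23 β13 = r12 = 0

-7.0181 + 0.99 × 7.0889 ≈ 0

[Path diagram: x2 (shock ε2) and x3 (shock ε3.2) are correlated with r23 = 0.99; x3 → x1 with β13 = 7.0889; x2 → x1 with β12 = -7.0181; residual ε1.23.]

Spurious effect

r12 = 0

[Path diagram: the orthogonalised regressor ε3.2 = x3 - r23 x2, with r(x3 - r23 x2, x2) = 0, → x1 with β13 = 7.0889; x2 (shock ε2) has no effect on x1 since r12 = 0; residual ε1.23.]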


Non-spurious effect: [diagram relating x2, x3 and x1]

Spurious effect: [diagram relating x2, x3 - r23 x2 and x1]


Small relative variance of regressor: sensitivity to high leverage observations


4. How to avoid spurious regressions: PRE-TESTS

Pre-test the hypotheses that the simple correlations of the regressors with the dependent variable, and the simple correlations between regressors, are zero:

H0: r1j = 0 and H0: rij = 0, i > 1

Be cautious when r1j < 0.1 in absolute value.
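Such a pre-test can be sketched in a few lines. The function below is a hypothetical helper, not part of the original slides. It uses the usual t-statistic of a simple correlation and, as a labeled simplification, the normal approximation to the critical value (reasonable for samples of about 100 observations or more).

```python
import math
from statistics import NormalDist

def pretest_correlation(r, n, alpha=0.05, caution=0.1):
    """Return (t statistic, reject H0: r = 0, caution flag) for a simple correlation."""
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)
    # Assumption: normal approximation instead of the exact Student t quantile.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    reject = abs(t_stat) > z_crit
    # Flag regressors whose simple correlation is below the caution threshold.
    return t_stat, reject, abs(r) < caution

# Correlations of the Case 3 example, N = 102.
for label, r in (("r12", 0.0), ("r13", -0.03), ("r23", 0.99)):
    t_stat, reject, flag = pretest_correlation(r, 102)
    print(f"{label} = {r}: t = {t_stat:.2f}, reject H0 = {reject}, cautious = {flag}")
```

In that example neither regressor passes the pre-test against the dependent variable, while the two regressors are strongly correlated with each other, which is the configuration the slides warn about.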


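Pifometrics: Parameter Inflation Factor > 2

With standardized coefficients:

$$\mathrm{PIF}_{12} = \frac{\hat\beta_{12}}{r_{12}} = \frac{1}{1 - r_{23}^2}\left(1 - \frac{r_{13}\, r_{23}}{r_{12}}\right) = \mathrm{VIF}\left(1 - \frac{r_{13}\, r_{23}}{r_{12}}\right)$$

$$\mathrm{PIF}_{13} = \frac{\hat\beta_{13}}{r_{13}} = \mathrm{VIF}\left(1 - \frac{r_{12}\, r_{23}}{r_{13}}\right), \qquad \mathrm{VIF} = \frac{1}{1 - r_{23}^2}$$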
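The Case 3 table of large parameters can be recomputed from the simple correlations. This sketch takes the parameter inflation factor to be the ratio of the standardized multiple-regression coefficient to the corresponding simple correlation, a reading that matches the tabulated PIF values. The function names are illustrative.

```python
import math

def standardized_betas(r12, r13, r23):
    """Standardized OLS coefficients of x1 on (x2, x3), from simple correlations."""
    denom = 1.0 - r23 ** 2
    return (r12 - r13 * r23) / denom, (r13 - r12 * r23) / denom

def pif(beta, r):
    """Parameter inflation factor beta / r, signed infinity when r is zero."""
    if r == 0.0:
        return math.copysign(math.inf, beta)
    return beta / r

# (r23, r12, r13) rows of the Case 3 table.
rows = [
    (0.99, 0.00, -0.03),
    (0.99, 0.015, -0.015),
    (0.99, 0.01, 0.04),
    (0.99, 0.1, 0.13),
    (0.95, 0.05, -0.05),
]

for r23, r12, r13 in rows:
    b12, b13 = standardized_betas(r12, r13, r23)
    print(f"r23={r23} r12={r12} r13={r13} "
          f"beta12={b12:.4f} beta13={b13:.4f} "
          f"PIF12={pif(b12, r12):.1f} PIF13={pif(b13, r13):.1f}")
```

Every row has an inflation factor far above the threshold of 2 in absolute value.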


5. An ideal tool for winner’s curse research papers

John P.A. Ioannidis.

Researchers who make the highest bid and win the auction have overpaid.

Finding large and unexpected effects: risk of publication in a top journal AND of being far from the truth.


Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2:e124. doi: 10.1371/journal.pmed.0020124.

Goodman S, Greenland S. Assessing the unreliability of the medical literature: A response to “Why most published research findings are false”. 2007. Johns Hopkins University, Department of Biostatistics. Available: http://www.bepress.com/jhubiostat/paper135. Accessed 21 March 2007.

Ioannidis JPA. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005;294(2):218–228. doi:10.1001/jama.294.2.218.

"49 of the most highly regarded research findings in medicine over the previous 13 years". In the paper Ioannidis compared the 45 studies that claimed to have uncovered effective interventions with data from subsequent studies with larger sample sizes. 7 (16%) of the studies were contradicted and for 7 (16%) the effects were smaller than in the initial study. 31 (68%) studies remained either unchallenged or the findings could be replicated. (32% > 5% significance threshold)


“An analogy can be applied to scientific publications. As with individual bidders in an auction, the average result from multiple studies yields a reasonable estimate of a “true” relationship. However, the more extreme, spectacular results (the largest treatment effects, the strongest associations, or the most unusually novel and exciting biological stories) may be preferentially published. Journals serve as intermediaries and may suffer minimal immediate consequences for errors of over- or mis-estimation, but it is the consumers of these laboratory and clinical results (other expert scientists; trainees choosing fields of endeavour; physicians and their patients; funding agencies; the media) who are “cursed” if these results are severely exaggerated - overvalued and unrepresentative of the true outcomes of many similar experiments.”

Young NS, Ioannidis JPA, Al-Ubaydli O (2008) Why Current Publication Practices May Distort Science. PLoS Med 5(10): e201. doi:10.1371/journal.pmed.0050201.

Science succeeds not because it is right all the time from the beginning but because it is self-correcting. The analogy for science is that if we took articles published in more prestigious journals less seriously, we would be less prone to error.


Problems with corr(x, y) = 0 (« classical suppressor » regressor x) and the matching publication bias

1. Problem: two interpretations of the parameters, spurious or not spurious.
   Publication bias: publish unexpected, odd effects (easy if spurious). Identification ambiguity.

2. Problem: non-linear models, interaction terms, dynamic models with lags: highly collinear pairs of statistically significant classical suppressors.
   Publication bias: interesting models. Data easily computed by the researcher (lags, squares, products of variables).

3. Problem: highly collinear pair: large parameters.
   Publication bias: large effects are more convincing.


Problems with corr(x, y) = 0 and the matching publication bias

4. Problem: highly collinear statistically significant pairs of classical suppressors ARE NOT ROBUST TO OUTLIERS.
   Publication bias: fosters controversies and citations with other samples, good for the impact factor of top-level journals.

5. Problem: statistical significance is easy to obtain with a highly collinear pair of classical suppressors.
   Publication bias: publish statistically significant results.

6. Problem: with a highly collinear pair of classical suppressors, more observations imply instability of conditional independence.
   Publication bias: more observations boost spurious inference while responding to « micro-numerosity », the opposite of conventional wisdom on inference.


Easily available pairs of classical suppressors chosen by researchers’ hand

X3 is highly correlated with X2 | Very interesting because
X4 is an unobserved common cause of X3 and X2 | X3 as a statistically significant « control variable »
X3 = X2(t-1) | Dynamic model
X3 = X2 squared, X3 = X2 cubed | Non-linear model
X3 = X2 * X4 | Interaction term, complementarity
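How easily such a pair arises can be seen with made-up data. The sketch below uses a hypothetical regressor X2 evenly spread over the interval [1, 2] (an arbitrary choice for illustration) and computes its correlation with its own square and cube.

```python
import math

def corr(a, b):
    """Sample Pearson correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

# Hypothetical regressor: 101 evenly spaced values between 1 and 2.
x2 = [1.0 + i / 100.0 for i in range(101)]
x3_square = [v ** 2 for v in x2]
x3_cube = [v ** 3 for v in x2]

r_square = corr(x2, x3_square)
r_cube = corr(x2, x3_cube)
print(f"corr(X2, X2 squared) = {r_square:.4f}")
print(f"corr(X2, X2 cubed)   = {r_cube:.4f}")
```

For a strictly positive regressor, adding its square or cube produces near-multicollinearity almost automatically.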


A current trend of economic research towards inductive papers

Since 1995-2000, the share of inductive papers (with empirical statistical inference) has increased relative to deductive papers (from mathematical hypotheses to theorems) (50%, 25% papers with both).

Data availability increased and its cost decreased (free internet access to macro data).

Personal computer storage, computation power.

Ease of learning statistical software code.

Gresham’s law affects the average level of the use of statistics.