endogeneity and entrepreneurship research

Post on 12-Apr-2017


Endogeneity & Entrepreneurship Research: The Case of Entrepreneurial Orientation

University of Nebraska—Lincoln April 2016

Brian S. Anderson, Ph.D., Assistant Professor, University of Colorado, Boulder, brian.s.anderson@colorado.edu

Endogeneity can seem pretty scary.

But, it’s also [reasonably] easy to address.

What we want to accomplish today…

• Dispel some myths about endogeneity

• Empower you with practical tips to deal with endogeneity in your research models/designs

• Help me make this paper better!

[Figure: path diagram X → Y, with disturbance term ζ on Y]

y = α + βx + ζ

Some statistical assumptions of linear models…

• Variables are reliable (no measurement error)

• ζ represents the variance in y not accounted for by x

• ζ is orthogonal to x (x is exogenous)

[Figure: path diagram X → Y with disturbance ζ; X and ζ now covary (ψ ≠ 0)]

Endogeneity is a violation of the orthogonality assumption between x and ζ.

So what is the practical impact of COR(ζ, x) ≠ 0?

[Figure: the same path diagram, with ψ ≠ 0 between X and ζ]

y = α + βx + ζ

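Because the estimator forces the residuals to be orthogonal to x, it will ‘adjust’ β to try to satisfy the orthogonality assumption, folding the covariance between x and ζ into β.

The result is that β becomes inconsistent: β will not converge to the population value no matter how large the sample.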
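For the single-predictor model y = α + βx + ζ, the usual probability-limit argument makes this concrete. The derivation below reads ψ as the population covariance between x and ζ drawn in the diagram (our interpretation of the slide notation, not something the slides state explicitly):

```latex
\hat{\beta}_{\mathrm{OLS}}
  = \frac{\widehat{\mathrm{Cov}}(x, y)}{\widehat{\mathrm{Var}}(x)}
  = \beta + \frac{\widehat{\mathrm{Cov}}(x, \zeta)}{\widehat{\mathrm{Var}}(x)}
  \;\xrightarrow{\;p\;}\; \beta + \frac{\psi}{\mathrm{Var}(x)}
```

The second term does not shrink as the sample grows, so unless ψ = 0 the estimate settles on the wrong value, and the sign and size of ψ relative to β determine whether the estimate is inflated or attenuated.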


Depending on the conditions, the estimator could attenuate β, amplify β, or β could be correct by random chance.
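A small simulation can illustrate both points. It is a minimal sketch under assumed values: a hypothetical data-generating process with a true β of .35 (chosen only for illustration), Var(x) = Var(ζ) = 1, and Cov(x, ζ) = psi. The helper names ols_slope and simulate_slope are ours, not from any library.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_slope(n, beta, psi, seed=1):
    """Hypothetical DGP: Var(x) = Var(zeta) = 1 and Cov(x, zeta) = psi."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        # zeta = psi * x + independent noise, so Cov(x, zeta) = psi, Var(zeta) = 1
        zeta = psi * xi + (1 - psi ** 2) ** 0.5 * rng.gauss(0, 1)
        x.append(xi)
        y.append(0.5 + beta * xi + zeta)
    return ols_slope(x, y)


for psi in (-0.3, 0.0, 0.3):
    for n in (200, 20_000):
        b = simulate_slope(n, beta=0.35, psi=psi)
        print(f"psi = {psi:+.1f}, n = {n:>6}: beta_hat = {b:.3f}")
```

With psi = -0.3 the estimate is attenuated toward roughly .05, with psi = +0.3 it is inflated toward roughly .65, and moving from n = 200 to n = 20,000 only makes the wrong answer more precise.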

Unfortunately, you can’t diagnose inconsistency without testing for its presence, so until you test, the safe a priori assumption is that β is wrong.

Endogeneity also affects zero-order correlations, so you can’t trust those much either, and that carries implications for meta-analyses.

Depending on whom you ask, the end result is that almost all of our published empirical research is wrong.

At best, the strength of most relationships is wrong; at worst, assumed ‘true’ relationships are spurious.

What are the main sources of endogeneity for entrepreneurship and management scholars?

• Measurement error

• Omitted variables/selection, including common methods bias

• Simultaneity (chicken and egg problem)

• Panel structures/omitting fixed effects/autoregression
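Measurement error is the easiest of these sources to simulate. Under the classical errors-in-variables assumptions (error independent of the true score and of y; an assumption of this sketch), the measurement error ends up in the disturbance and correlates with the observed x, which attenuates the slope by the reliability of x. The sketch uses a hypothetical true β of .5 and standard-normal true scores; observed_slope is an illustrative helper, not a library function.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def observed_slope(n, beta, reliability, seed=2):
    """Regress y on an error-laden measure of x whose true score has variance 1.

    Error variance (1 - r) / r gives reliability r = Var(x_true) / Var(x_observed).
    """
    rng = random.Random(seed)
    err_sd = ((1 - reliability) / reliability) ** 0.5
    x_obs, y = [], []
    for _ in range(n):
        x_true = rng.gauss(0, 1)
        y.append(beta * x_true + rng.gauss(0, 1))
        x_obs.append(x_true + rng.gauss(0, err_sd))
    return ols_slope(x_obs, y)


for r in (1.0, 0.9, 0.7):
    print(f"reliability = {r:.1f}: beta_hat = {observed_slope(20_000, 0.5, r):.3f}")
```

The estimates should land near .50, .45, and .35, that is, β multiplied by the reliability of the predictor.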

A few common myths—and associated (incorrect) remedies—for endogeneity often found in the literature…

Endogeneity is just reverse causality: if you flip the model (x -> y and y -> x) and the reciprocal relationship is non-significant, then there is no endogeneity (Cao et al., 2015).

If you just lag the potential endogenous variable (and/or the dependent variable), you can rule out endogeneity (D’Innocenzo et al., 2015).

You can rule out endogeneity with a post hoc test, such as an Arellano-Bond model (Martinez et al., 2015).
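The lagging myth is easy to check by simulation. This is a minimal sketch assuming a stable, unobserved firm characteristic c (hypothetical) that drives both the predictor measured at t − 1 and the outcome measured at t; the true effect of the lagged predictor is set to zero.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def lagged_slope(n, beta, seed=3):
    """Hypothetical DGP: a time-invariant trait c drives x at t-1 and y at t."""
    rng = random.Random(seed)
    x_prev, y_now = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)                         # unobserved, persistent
        x_lag = c + rng.gauss(0, 1)                 # predictor measured at t-1
        y_t = beta * x_lag + c + rng.gauss(0, 1)    # outcome measured at t
        x_prev.append(x_lag)
        y_now.append(y_t)
    return ols_slope(x_prev, y_now)


print(f"true beta = 0.0, lagged OLS estimate = {lagged_slope(20_000, 0.0):.3f}")
```

The lagged estimate comes out near .5 even though the true effect is zero: the lag changes the timing of measurement but leaves the covariance between the predictor and the disturbance intact.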

So how do you really (correctly) deal with endogeneity?

The gold standard is the randomized experiment, but…

This assumes perfect randomization of the participants on every observed and unobserved variable.

Again, depending on whom you ask, because randomization is never guaranteed, there is never an acceptable case for ruling out endogeneity on theoretical grounds alone.

Arguing for a minimal potential effect is acceptable—such as with a natural experiment—but endogeneity is a statistical problem that must be addressed statistically.

For us mere mortals, selection models (e.g., Heckman) and instrumental variable models (e.g., 2SLS) are our best options.

But how do we use these methods in the real-world?

The case of entrepreneurial orientation.

[Figure: Entrepreneurial Orientation and its dimensions: Innovativeness, Proactiveness, Risk Taking]

Shameless self-promotion plug here for the Anderson, Kreiser, Kuratko, Hornsby & Eshima (2015) EO reconceptualization.

Consider this model using the same data from Anderson et al. (2009), but corrected for measurement error and measurement model misspecification.

[Figure: Risk Taking → Strategic Learning, with disturbance ζ]

There is a positive relationship between Risk Taking and a firm’s Strategic Learning Capability (β = .35; p < .001).

We’re not fooled, though.

Couldn’t we argue that being an effective learner imbues confidence in my ability to take risk (simultaneity)?

Anderson et al. (2009) also suggest a number of possible mediators of the EO-strategic learning relationship that right now aren’t being modeled (omitted variables).

So what happens when we use instruments?

[Figure: Risk Taking → Strategic Learning, with Risk Taking instrumented by IV1 and IV2; each endogenous variable has a disturbance ζ]

The main effect of Risk Taking on Strategic Learning Capability disappears (β = .32; p > .1).

We’re faced with making either a Type I or Type II error.

Which model do we retain?

Ultimately, our goal is to recover the ‘correct’ parameter estimate, allowing us to draw a causal inference about the relationship between Risk and Strategic Learning.

We do this by removing the portion of variance in Risk Taking that is shared with the disturbance term (ζ).

This shared variance represents all of the unobserved effects—including measurement error—that correlate with Risk Taking and predict Strategic Learning.

In the Two Stage Least Squares (2SLS) method, the job of partialling out this shared variance falls to our instruments.
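The two stages can be written out by hand. The sketch below is a minimal, dependency-free illustration under a hypothetical data-generating process (instruments z1 and z2, an omitted cause c, true β = .2); it is not the SEM estimation used in these slides. It returns point estimates only: the standard errors from a naive second-stage OLS are not valid 2SLS standard errors, so a dedicated 2SLS routine should be used for inference.

```python
import random


def solve(A, b):
    """Solve the square system A @ coef = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients; each row of X already contains the intercept column."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def simulate(n, beta=0.2, seed=4):
    """Hypothetical DGP: c is an omitted cause of both x and y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2, c = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x = 0.4 * z1 + 0.4 * z2 + c + rng.gauss(0, 1)
        y = beta * x + c + rng.gauss(0, 1)
        rows.append((z1, z2, x, y))
    return rows


def two_stage_least_squares(rows):
    """Point estimate of beta from two explicit OLS stages."""
    Z = [[1.0, z1, z2] for z1, z2, _, _ in rows]
    x = [r[2] for r in rows]
    y = [r[3] for r in rows]
    g = ols(Z, x)                                            # stage 1: x on instruments
    x_hat = [g[0] + g[1] * z1 + g[2] * z2 for _, z1, z2 in Z]
    return ols([[1.0, v] for v in x_hat], y)[1]              # stage 2: y on fitted x


rows = simulate(20_000)
b_ols = ols([[1.0, r[2]] for r in rows], [r[3] for r in rows])[1]
b_2sls = two_stage_least_squares(rows)
print(f"OLS = {b_ols:.3f}, 2SLS = {b_2sls:.3f}, true beta = 0.2")
```

Under these settings OLS should come out well above .2 (near .63) because c loads on both x and y, while the 2SLS estimate should sit close to .2.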


Instruments…

• Must be individually and jointly significant

• Need at least one to identify the model; two or more per endogenous variable to conduct over-identification tests (this is really important)!

• Must be properly excluded (can’t correlate with ζ)
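The joint-significance requirement is usually checked with a first-stage F test of the instruments. This sketch compares the residual sum of squares of a hypothetical first stage with and without the two instruments; a widely used rule of thumb treats an F below about 10 as a sign of weak instruments. The individual t tests on each instrument would come from the same first-stage regression and are not computed here. The helper names are illustrative.

```python
import random


def solve(A, b):
    """Solve the square system A @ coef = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients; each row of X already contains the intercept column."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def rss(X, y, coef):
    """Residual sum of squares for y given design X and coefficients."""
    return sum((v - sum(c * xi for c, xi in zip(coef, row))) ** 2
               for row, v in zip(X, y))


def first_stage_f(z1, z2, x):
    """Joint F test that both instrument slopes are zero in the first stage."""
    n = len(x)
    full = [[1.0, a, b] for a, b in zip(z1, z2)]   # x on intercept + instruments
    restricted = [[1.0] for _ in x]                 # x on intercept only
    rss_u = rss(full, x, ols(full, x))
    rss_r = rss(restricted, x, ols(restricted, x))
    q, k = 2, 3   # restrictions tested; parameters in the unrestricted model
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))


rng = random.Random(5)
n = 500
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
strong = [0.3 * a + 0.3 * b + rng.gauss(0, 1) for a, b in zip(z1, z2)]
weak = [0.02 * a + 0.02 * b + rng.gauss(0, 1) for a, b in zip(z1, z2)]
print(f"strong instruments: F = {first_stage_f(z1, z2, strong):.1f}")
print(f"weak instruments:   F = {first_stage_f(z1, z2, weak):.1f}")
```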


Instruments must be individually and jointly significant predictors of the endogenous variable.

Instrument estimates: β = .36 (p < .001) and β = .39 (p < .001); joint F = 17.40 (p < .001)


Need at least one instrument to identify the model; two or more instruments per endogenous variable are necessary to conduct tests of the model’s assumptions.

A common—and partially true—myth about instruments…

You want your instrument to correlate strongly with the potential endogenous variable, but have no correlation with the dependent variable.

Just like any predictor, however, instruments must have a theoretical (non-spurious) connection with the endogenous construct.

• Heavy investments in R&D are characteristic of my industry

• Over the past three years, risk taking by executives of my business unit in seizing and exploring chancy initiatives has [decreased much — increased much]


Instruments must be properly excluded from the second stage of the model—the instruments should not correlate with the disturbance term (or the actual DV).

This is the Sargan-Hansen test of over-identifying restrictions; in SEM, the Chi-Square statistic tells us the same thing. Also in SEM, we can actually observe these paths using modification indices.
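Outside SEM, the Sargan statistic can be computed as n × R² from regressing the 2SLS residuals on all instruments, and compared with a χ2 distribution whose degrees of freedom equal the number of instruments minus the number of endogenous regressors (here 2 − 1 = 1). The sketch below assumes homoskedastic errors (Hansen's J is the heteroskedasticity-robust version) and a hypothetical process in which a direct effect of z2 on y can be switched on to break the exclusion restriction.

```python
import math
import random


def solve(A, b):
    """Solve the square system A @ coef = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients; each row of X already contains the intercept column."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def simulate(n, direct_effect, seed=6):
    """Hypothetical DGP; direct_effect != 0 gives z2 its own path to y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2, c = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x = 0.4 * z1 + 0.4 * z2 + c + rng.gauss(0, 1)
        y = 0.2 * x + direct_effect * z2 + c + rng.gauss(0, 1)
        rows.append((z1, z2, x, y))
    return rows


def sargan_test(rows):
    """Sargan n*R^2 statistic and chi-square(1) p-value (2 instruments, 1 endogenous)."""
    Z = [[1.0, z1, z2] for z1, z2, _, _ in rows]
    x = [r[2] for r in rows]
    y = [r[3] for r in rows]
    g = ols(Z, x)
    x_hat = [g[0] + g[1] * z[1] + g[2] * z[2] for z in Z]
    b0, b1 = ols([[1.0, v] for v in x_hat], y)
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]   # 2SLS residuals use observed x
    d = ols(Z, u)                                      # residuals on all instruments
    fitted = [d[0] + d[1] * z[1] + d[2] * z[2] for z in Z]
    mean_u = sum(u) / len(u)
    ss_res = sum((ui - fi) ** 2 for ui, fi in zip(u, fitted))
    ss_tot = sum((ui - mean_u) ** 2 for ui in u)
    stat = max(len(u) * (1 - ss_res / ss_tot), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2))       # upper tail of chi-square(1)


for effect in (0.0, 0.3):
    stat, p = sargan_test(simulate(5_000, effect))
    print(f"direct effect of z2 on y = {effect}: Sargan = {stat:.2f}, p = {p:.3f}")
```

With the exclusion restriction intact the statistic should be small; giving z2 a direct path to y drives it far past the 1 df critical value of 3.84.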

[Figure: paths from the instruments to the DV (β) and covariances with its disturbance (ψ) fixed at 0]

Our instruments look good, but how do we know if we need to use a 2SLS model?


Endogeneity violates the assumption that COR(ζ, x) = 0. In SEM, we can actually observe this assumption by freeing the ψ (psi) parameter between the Risk Taking and Strategic Learning disturbance terms.

If this parameter is significantly different from zero, endogeneity is present in the model and this indicates that we need to retain the 2SLS estimator.
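In a regression framework, the closest analog to freeing ψ is the control-function (Durbin-Wu-Hausman) test: add the first-stage residual to the outcome equation and test its coefficient, which picks up the covariance between the endogenous variable's disturbance and the outcome's disturbance. This is a minimal sketch under a hypothetical process, not the SEM specification in these slides, and its t statistic assumes homoskedastic errors. In this linear setup the coefficient on x in the augmented regression also reproduces the 2SLS point estimate.

```python
import math
import random


def solve(A, b):
    """Solve the square system A @ coef = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients; each row of X already contains the intercept column."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def simulate(n, confounded, seed=7):
    """Hypothetical DGP; when confounded, an omitted c drives both x and y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        c = rng.gauss(0, 1) if confounded else 0.0
        x = 0.4 * z1 + 0.4 * z2 + c + rng.gauss(0, 1)
        y = 0.2 * x + c + rng.gauss(0, 1)
        rows.append((z1, z2, x, y))
    return rows


def control_function_t(rows):
    """t statistic on the first-stage residual added to the outcome equation."""
    Z = [[1.0, z1, z2] for z1, z2, _, _ in rows]
    x = [r[2] for r in rows]
    y = [r[3] for r in rows]
    g = ols(Z, x)
    v = [xi - (g[0] + g[1] * z[1] + g[2] * z[2]) for xi, z in zip(x, Z)]
    X = [[1.0, xi, vi] for xi, vi in zip(x, v)]
    b = ols(X, y)
    resid = [yi - (b[0] + b[1] * xi + b[2] * vi) for yi, xi, vi in zip(y, x, v)]
    sigma2 = sum(e * e for e in resid) / (len(y) - 3)
    # The (X'X)^-1 entry for v: solve X'X w = e_3 and read off w[2]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    w = solve(XtX, [0.0, 0.0, 1.0])
    return b[2] / math.sqrt(sigma2 * w[2])


for confounded in (False, True):
    t = control_function_t(simulate(5_000, confounded))
    print(f"omitted confounder present: {confounded}, t on residual = {t:.2f}")
```

With the omitted confounder present the t statistic is large; without it the statistic behaves like a draw from a standard normal, mirroring a non-significant ψ.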

ψ = .03, p = .890

Our instruments look good and the ψ parameter is not significant, so it looks like no endogeneity. But there is one more way to make sure…


Fortunately, these models are ‘nested’—the model without the ψ parameter is simply a constrained version of the model with it, so we can just test for a significant difference between the two.


Our test is equivalent to a Hausman endogeneity test with 1 df. If there is a significant difference between the two models, then we retain the unconstrained (the ‘larger’) model because it fits the data better.

χ2constrained = 21.79; χ2unconstrained = 21.77

χ2diff = χ2constrained - χ2unconstrained = .02; p > .05
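The arithmetic of the difference test is simple enough to verify. The sketch below uses only the two χ2 values reported above and the identity that, for 1 df, the upper-tail probability of a χ2 statistic s is erfc(√(s/2)); the helper name chi2_sf_1df is ours.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


chi2_constrained = 21.79
chi2_unconstrained = 21.77
chi2_diff = chi2_constrained - chi2_unconstrained
p = chi2_sf_1df(chi2_diff)
print(f"chi2_diff = {chi2_diff:.2f}, p = {p:.3f}")  # prints: chi2_diff = 0.02, p = 0.888
```

A p of about .89 is far from any conventional threshold, consistent with the non-significant ψ estimate.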

But what is the harm in retaining the ‘more conservative’ model with instruments?

Isn’t a Type II error better than a Type I error?

Consistency comes at the expense of efficiency.

2SLS is a limited information estimator because the instruments have removed a portion of the variance in Risk Taking.

This means that we have less variance available to predict changes in the variance of Strategic Learning, and hence, lower efficiency.
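The efficiency cost can be seen by repeating a study many times when x is in fact exogenous, so that OLS and 2SLS both target the true β. This is a minimal Monte Carlo sketch under a hypothetical process (true β = .2, two instruments, n = 300, 400 replications). Spread is measured with the interquartile range because, with only one over-identifying restriction, the finite-sample variance of 2SLS is not guaranteed to exist.

```python
import random
import statistics


def solve(A, b):
    """Solve the square system A @ coef = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients; each row of X already contains the intercept column."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def one_sample(rng, n=300):
    """One hypothetical study in which x is exogenous (no omitted cause)."""
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.4 * a + 0.4 * b + rng.gauss(0, 1) for a, b in zip(z1, z2)]
    y = [0.2 * xi + rng.gauss(0, 1) for xi in x]
    b_ols = ols([[1.0, xi] for xi in x], y)[1]
    Z = [[1.0, a, b] for a, b in zip(z1, z2)]
    g = ols(Z, x)
    x_hat = [g[0] + g[1] * a + g[2] * b for a, b in zip(z1, z2)]
    b_2sls = ols([[1.0, v] for v in x_hat], y)[1]
    return b_ols, b_2sls


def iqr(values):
    """Interquartile range of a list of estimates."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


rng = random.Random(8)
draws = [one_sample(rng) for _ in range(400)]
ols_draws = [d[0] for d in draws]
iv_draws = [d[1] for d in draws]
print(f"IQR of OLS estimates:  {iqr(ols_draws):.3f}")
print(f"IQR of 2SLS estimates: {iqr(iv_draws):.3f}")
```

Under these settings the 2SLS spread should be roughly twice the OLS spread: both estimators center on .2, but the instrumented one is noticeably less precise with the same data.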

Ultimately, if you don’t have to use instruments, that’s great, because you have greater freedom in your modeling approach, and—assuming consistency—efficiency is always preferred to minimize Type II error.

But you don’t know if you have that freedom unless you first evaluate the model with instruments.

A related note: consistency of parameters is different from consistency of inference.

Can’t forget about those standard errors!

We rarely just publish main effect relationships, so how does the 2SLS approach extend to mediation models?

Consider the following model again based on Anderson et al. (2009)…

[Figure: mediation model linking Risk Taking, Structural Organicity, and Strategic Learning, with disturbances ζ]

We use the same logic as before, but now we have three potentially endogenous relationships.

In fact, in a mediation model, endogeneity is implied.

In a mediation model, we theoretically expect endogeneity to be present. Until we actually add it, the mediator(s) is an un-modeled omitted variable that ‘connects’ the predictor with the criterion.

This means that the Baron & Kenny (1986) ‘Step 1’ is, assuming mediation exists, [almost] always wrong.

[Figure: the mediation model with instruments IV1 to IV4 and freed disturbance covariances ψ1, ψ2, and ψ3]

Just as before, we evaluate the individual and joint significance of the instruments, along with the exclusion restriction of all possible endogenous paths.

[Figure: the instrumented mediation model with estimates]

First stage, IV1 and IV2: β = .36 (p < .001), β = .39 (p < .001); joint F = 17.40 (p < .001)

First stage, IV3 and IV4: β = -.23 (p < .01), β = .29 (p < .001); joint F = 9.26 (p < .001)

Disturbance covariances: ψ1 = .17 (p = .384); ψ2 = -.27 (p = .296); ψ3 = .01 (p = .962)

Overall model: χ2 = 36.14; p = .725

Wait a second… what do we want to be significant, and what don’t we want to be significant?

Significant

• β parameter from the instrument to the endogenous variable

• F test of the instruments

Non-Significant

• χ2 of the overall model

• Modification indices from the instruments to the dependent variable

If the ψ parameter of the disturbance term covariance is significant, you MUST retain the 2SLS model!

Our instruments are individually and jointly valid, the ψ parameters are not significant, and our model does not show evidence of misspecification.

Do we retain the model?

To help make this determination, we set up a series of nested models, constraining each ψ parameter in turn and comparing the unconstrained model with each constrained model.

Supporting our overall findings, none of these comparisons were significant.

So we retain the more efficient estimator, which has the (happy) result of being nomologically consistent with the findings reported in Anderson et al. (2009).

What about multiple mediation models?

The same logic as a single mediator applies, but the assumptions become increasingly difficult to satisfy.

A note about moderation and curvilinear relationships…

What about observed variables and panel data?

Wait a second, don’t control variables basically do the same thing as instruments?

Implications…

Stronger measurement models go a long way toward minimizing endogeneity’s impact…

• Maximize indicator reliability

• For reflective measurement models, be aggressive about trimming less reliable indicators, but avoid construct deficiency

• A non-significant χ2 does not guarantee against model misspecification, but a properly specified model should produce a non-significant χ2 (apart from the chance rejections implied by the chosen α level)

If the researcher is using latent constructs with psychometric indicators, there is no reason (excuse) to avoid accounting for measurement error.

The more complex the model, the more likely the model will be misspecified in some material way, including potentially endogenous paths.

Because our bar for a strong theoretical contribution rests on model complexity, it is therefore more likely to introduce erroneous findings into the literature.

As reviewers, we need to push authors to deal (correctly) with endogeneity in their research design, but that also may mean educating them on the best way to go about it given their research question.

Dealing with endogeneity should be an integral component of doctoral student training from the very beginning.
