week 8: regression in the social sciences · week 8: regression in the social sciences brandon...

610
Week 8: Regression in the Social Sciences Brandon Stewart 1 Princeton November 7 and 9, 2016 1 These slides are heavily influenced by Matt Blackwell, Justin Grimmer, Jens Hainmueller, Erin Hartman, Kosuke Imai and Gary King. Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 1 / 117

Upload: others

Post on 26-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Week 8: Regression in the Social Sciences

Brandon Stewart1

Princeton

November 7 and 9, 2016

1These slides are heavily influenced by Matt Blackwell, Justin Grimmer, JensHainmueller, Erin Hartman, Kosuke Imai and Gary King.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 1 / 117

Page 2: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 2 / 117

Page 3: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 2 / 117

Page 4: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This Week

I Monday:F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 2 / 117

Page 5: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 2 / 117

Page 6: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 2 / 117

Page 7: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next Week

I regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 2 / 117

Page 8: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 2 / 117

Page 9: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 2 / 117

Page 10: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 3 / 117

Page 11: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 3 / 117

Page 12: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Why Are We Doing All of This Again?

We are all here because we are trying to do some social science, thatis, we are in the business of knowledge production.

Quantitative methods are an increasingly big part of that so whetheryou are reading or actively doing quantitative analysis it is going to bethere.

So why all the math? We are taking a future-oriented approach. Wewant to prepare you for the next big thing

Methods that became popular in the social sciences since I took theequivalent of this class: machine learning, text-as-data, Bayesiannonparametrics, design-based inference, DAG-based causal inference

A technical foundation prepares you to learn new methods for the restof your career. Trust me now is the time to invest.

Knowing how methods work also makes you a better reader of work.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 4 / 117

Page 13: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Why Are We Doing All of This Again?

We are all here because we are trying to do some social science, thatis, we are in the business of knowledge production.

Quantitative methods are an increasingly big part of that so whetheryou are reading or actively doing quantitative analysis it is going to bethere.

So why all the math? We are taking a future-oriented approach. Wewant to prepare you for the next big thing

Methods that became popular in the social sciences since I took theequivalent of this class: machine learning, text-as-data, Bayesiannonparametrics, design-based inference, DAG-based causal inference

A technical foundation prepares you to learn new methods for the restof your career. Trust me now is the time to invest.

Knowing how methods work also makes you a better reader of work.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 4 / 117

Page 14: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Why Are We Doing All of This Again?

We are all here because we are trying to do some social science, thatis, we are in the business of knowledge production.

Quantitative methods are an increasingly big part of that so whetheryou are reading or actively doing quantitative analysis it is going to bethere.

So why all the math? We are taking a future-oriented approach. Wewant to prepare you for the next big thing

Methods that became popular in the social sciences since I took theequivalent of this class: machine learning, text-as-data, Bayesiannonparametrics, design-based inference, DAG-based causal inference

A technical foundation prepares you to learn new methods for the restof your career. Trust me now is the time to invest.

Knowing how methods work also makes you a better reader of work.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 4 / 117

Page 15: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Why Are We Doing All of This Again?

We are all here because we are trying to do some social science, thatis, we are in the business of knowledge production.

Quantitative methods are an increasingly big part of that so whetheryou are reading or actively doing quantitative analysis it is going to bethere.

So why all the math?

We are taking a future-oriented approach. Wewant to prepare you for the next big thing

Methods that became popular in the social sciences since I took theequivalent of this class: machine learning, text-as-data, Bayesiannonparametrics, design-based inference, DAG-based causal inference

A technical foundation prepares you to learn new methods for the restof your career. Trust me now is the time to invest.

Knowing how methods work also makes you a better reader of work.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 4 / 117

Page 16: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Why Are We Doing All of This Again?

We are all here because we are trying to do some social science, thatis, we are in the business of knowledge production.

Quantitative methods are an increasingly big part of that so whetheryou are reading or actively doing quantitative analysis it is going to bethere.

So why all the math? We are taking a future-oriented approach. Wewant to prepare you for the next big thing

Methods that became popular in the social sciences since I took theequivalent of this class: machine learning, text-as-data, Bayesiannonparametrics, design-based inference, DAG-based causal inference

A technical foundation prepares you to learn new methods for the restof your career. Trust me now is the time to invest.

Knowing how methods work also makes you a better reader of work.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 4 / 117

Page 17: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Why Are We Doing All of This Again?

We are all here because we are trying to do some social science, thatis, we are in the business of knowledge production.

Quantitative methods are an increasingly big part of that so whetheryou are reading or actively doing quantitative analysis it is going to bethere.

So why all the math? We are taking a future-oriented approach. Wewant to prepare you for the next big thing

Methods that became popular in the social sciences since I took theequivalent of this class: machine learning, text-as-data, Bayesiannonparametrics, design-based inference, DAG-based causal inference

A technical foundation prepares you to learn new methods for the restof your career. Trust me now is the time to invest.

Knowing how methods work also makes you a better reader of work.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 4 / 117

Page 18: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Why Are We Doing All of This Again?

We are all here because we are trying to do some social science, thatis, we are in the business of knowledge production.

Quantitative methods are an increasingly big part of that so whetheryou are reading or actively doing quantitative analysis it is going to bethere.

So why all the math? We are taking a future-oriented approach. Wewant to prepare you for the next big thing

Methods that became popular in the social sciences since I took theequivalent of this class: machine learning, text-as-data, Bayesiannonparametrics, design-based inference, DAG-based causal inference

A technical foundation prepares you to learn new methods for the restof your career. Trust me now is the time to invest.

Knowing how methods work also makes you a better reader of work.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 4 / 117

Page 19: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Why Are We Doing All of This Again?

We are all here because we are trying to do some social science, thatis, we are in the business of knowledge production.

Quantitative methods are an increasingly big part of that so whetheryou are reading or actively doing quantitative analysis it is going to bethere.

So why all the math? We are taking a future-oriented approach. Wewant to prepare you for the next big thing

Methods that became popular in the social sciences since I took theequivalent of this class: machine learning, text-as-data, Bayesiannonparametrics, design-based inference, DAG-based causal inference

A technical foundation prepares you to learn new methods for the restof your career. Trust me now is the time to invest.

Knowing how methods work also makes you a better reader of work.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 4 / 117

Page 20: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 5 / 117

Page 21: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Quantitative Social Science

Three components of quantitative social science:1 Argument2 Research Design3 Presentation

This week we will cover a few issues in these areas:I power (research design)I problems with p-values (argument, design, presentation)I visualization and quantities of interest (argument, presentation)I identification and causal inference (argument, design)

We will mostly talk about statistical methods here (it is a statistics class!)but the best work is a combination of substantive and statistical theory.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 6 / 117

Page 22: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Quantitative Social Science

Three components of quantitative social science:

1 Argument2 Research Design3 Presentation

This week we will cover a few issues in these areas:I power (research design)I problems with p-values (argument, design, presentation)I visualization and quantities of interest (argument, presentation)I identification and causal inference (argument, design)

We will mostly talk about statistical methods here (it is a statistics class!)but the best work is a combination of substantive and statistical theory.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 6 / 117

Page 23: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Quantitative Social Science

Three components of quantitative social science:1 Argument2 Research Design3 Presentation

This week we will cover a few issues in these areas:I power (research design)I problems with p-values (argument, design, presentation)I visualization and quantities of interest (argument, presentation)I identification and causal inference (argument, design)

We will mostly talk about statistical methods here (it is a statistics class!)but the best work is a combination of substantive and statistical theory.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 6 / 117

Page 24: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Quantitative Social Science

Three components of quantitative social science:1 Argument2 Research Design3 Presentation

This week we will cover a few issues in these areas:

I power (research design)I problems with p-values (argument, design, presentation)I visualization and quantities of interest (argument, presentation)I identification and causal inference (argument, design)

We will mostly talk about statistical methods here (it is a statistics class!)but the best work is a combination of substantive and statistical theory.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 6 / 117

Page 25: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Quantitative Social Science

Three components of quantitative social science:1 Argument2 Research Design3 Presentation

This week we will cover a few issues in these areas:I power (research design)

I problems with p-values (argument, design, presentation)I visualization and quantities of interest (argument, presentation)I identification and causal inference (argument, design)

We will mostly talk about statistical methods here (it is a statistics class!)but the best work is a combination of substantive and statistical theory.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 6 / 117

Page 26: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Quantitative Social Science

Three components of quantitative social science:1 Argument2 Research Design3 Presentation

This week we will cover a few issues in these areas:I power (research design)I problems with p-values (argument, design, presentation)

I visualization and quantities of interest (argument, presentation)I identification and causal inference (argument, design)

We will mostly talk about statistical methods here (it is a statistics class!)but the best work is a combination of substantive and statistical theory.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 6 / 117

Page 27: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Quantitative Social Science

Three components of quantitative social science:1 Argument2 Research Design3 Presentation

This week we will cover a few issues in these areas:I power (research design)I problems with p-values (argument, design, presentation)I visualization and quantities of interest (argument, presentation)

I identification and causal inference (argument, design)

We will mostly talk about statistical methods here (it is a statistics class!)but the best work is a combination of substantive and statistical theory.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 6 / 117

Page 28: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Quantitative Social Science

Three components of quantitative social science:1 Argument2 Research Design3 Presentation

This week we will cover a few issues in these areas:I power (research design)I problems with p-values (argument, design, presentation)I visualization and quantities of interest (argument, presentation)I identification and causal inference (argument, design)

We will mostly talk about statistical methods here (it is a statistics class!)but the best work is a combination of substantive and statistical theory.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 6 / 117

Page 29: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Quantitative Social Science

Three components of quantitative social science:1 Argument2 Research Design3 Presentation

This week we will cover a few issues in these areas:I power (research design)I problems with p-values (argument, design, presentation)I visualization and quantities of interest (argument, presentation)I identification and causal inference (argument, design)

We will mostly talk about statistical methods here (it is a statistics class!)but the best work is a combination of substantive and statistical theory.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 6 / 117

Page 30: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Are Most Published Findings False?

The backdrop for this is numerous studies with a dim view of the veracityof the average academic article

Ioannidis (2005) “Why most published research findings are false”PLOS Medicine

Begley and Ellis (2012) “Drug development: raise standard forpreclinical cancer research” Nature

Johnson (2013) “Revised standards for statistical evidence” PNAS

Franco, Malhotra, Simonovits (2014) “Publication Bias in the SocialSciences: Unlocking the File Drawer” Science

Nosek et al (2015) “Estimating the reproducibility of psychologicalscience” Science

Leek and Jager (2017) “Is Most Published Research Really False?”Annual Review of Statistics and Its Applications

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 7 / 117

Page 31: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Are Most Published Findings False?

The backdrop for this is numerous studies with a dim view of the veracityof the average academic article

Ioannidis (2005) “Why most published research findings are false”PLOS Medicine

Begley and Ellis (2012) “Drug development: raise standard forpreclinical cancer research” Nature

Johnson (2013) “Revised standards for statistical evidence” PNAS

Franco, Malhotra, Simonovits (2014) “Publication Bias in the SocialSciences: Unlocking the File Drawer” Science

Nosek et al (2015) “Estimating the reproducibility of psychologicalscience” Science

Leek and Jager (2017) “Is Most Published Research Really False?”Annual Review of Statistics and Its Applications

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 7 / 117

Page 32: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Are Most Published Findings False?

The backdrop for this is numerous studies with a dim view of the veracityof the average academic article

Ioannidis (2005) “Why most published research findings are false”PLOS Medicine

Begley and Ellis (2012) “Drug development: raise standard forpreclinical cancer research” Nature

Johnson (2013) “Revised standards for statistical evidence” PNAS

Franco, Malhotra, Simonovits (2014) “Publication Bias in the SocialSciences: Unlocking the File Drawer” Science

Nosek et al (2015) “Estimating the reproducibility of psychologicalscience” Science

Leek and Jager (2017) “Is Most Published Research Really False?”Annual Review of Statistics and Its Applications

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 7 / 117

Page 33: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 8 / 117

Page 34: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 8 / 117

Page 35: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

An Example Gerber, Green and Larimer (2008)

Matt Salganik in his book Bit by Bit argues that it is an ethicalimperative to only use as many subjects as we need in an experiment

Why did GGL use sample sizes of 38,000? Could they have usedfewer? How would we know?

We choose the sample size that can ensure that we can detect whatwe think is the true treatment effect (i.e. reject the null of no effect).

Small effects will require more observations than large effects. Buthow many more?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 9 / 117

Page 36: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

An Example Gerber, Green and Larimer (2008)

Matt Salganik in his book Bit by Bit argues that it is an ethicalimperative to only use as many subjects as we need in an experiment

Why did GGL use sample sizes of 38,000? Could they have usedfewer? How would we know?

We choose the sample size that can ensure that we can detect whatwe think is the true treatment effect (i.e. reject the null of no effect).

Small effects will require more observations than large effects. Buthow many more?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 9 / 117

Page 37: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

An Example Gerber, Green and Larimer (2008)

Matt Salganik in his book Bit by Bit argues that it is an ethicalimperative to only use as many subjects as we need in an experiment

Why did GGL use sample sizes of 38,000? Could they have usedfewer? How would we know?

We choose the sample size that can ensure that we can detect whatwe think is the true treatment effect (i.e. reject the null of no effect).

Small effects will require more observations than large effects. Buthow many more?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 9 / 117

Page 38: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

An Example Gerber, Green and Larimer (2008)

Matt Salganik in his book Bit by Bit argues that it is an ethicalimperative to only use as many subjects as we need in an experiment

Why did GGL use sample sizes of 38,000? Could they have usedfewer? How would we know?

We choose the sample size that can ensure that we can detect whatwe think is the true treatment effect (i.e. reject the null of no effect).

Small effects will require more observations than large effects. Buthow many more?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 9 / 117

Page 39: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Statistical Power

Type 1 errors–false discovery

α identifies tolerance for type 1 errors.Type 2 errors–failed discovery.

Experimental design: what effect can I detect?

What is the power of my study?

Power: 1 - Prob. type 2 error.

Power will (in general) depend on four factors (particularly for t/normallydistributed test statistics):

1) Type I error rate (α)

2) Effect size

3) Variance

4) Number of observations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 10 / 117

Page 40: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Statistical Power

Type 1 errors–false discoveryα identifies tolerance for type 1 errors.

Type 2 errors–failed discovery.

Experimental design: what effect can I detect?

What is the power of my study?

Power: 1 - Prob. type 2 error.

Power will (in general) depend on four factors (particularly for t/normallydistributed test statistics):

1) Type I error rate (α)

2) Effect size

3) Variance

4) Number of observations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 10 / 117

Page 41: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Statistical Power

Type 1 errors–false discoveryα identifies tolerance for type 1 errors.Type 2 errors–failed discovery.

Experimental design: what effect can I detect?

What is the power of my study?

Power: 1 - Prob. type 2 error.

Power will (in general) depend on four factors (particularly for t/normallydistributed test statistics):

1) Type I error rate (α)

2) Effect size

3) Variance

4) Number of observations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 10 / 117

Page 42: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Statistical Power

Type 1 errors–false discoveryα identifies tolerance for type 1 errors.Type 2 errors–failed discovery.

Experimental design: what effect can I detect?

What is the power of my study?

Power: 1 - Prob. type 2 error.

Power will (in general) depend on four factors (particularly for t/normallydistributed test statistics):

1) Type I error rate (α)

2) Effect size

3) Variance

4) Number of observations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 10 / 117

Page 43: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Statistical Power

Type 1 errors–false discoveryα identifies tolerance for type 1 errors.Type 2 errors–failed discovery.

Experimental design: what effect can I detect?

What is the power of my study?

Power: 1 - Prob. type 2 error.

Power will (in general) depend on four factors (particularly for t/normallydistributed test statistics):

1) Type I error rate (α)

2) Effect size

3) Variance

4) Number of observations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 10 / 117

Page 44: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Statistical Power

Type 1 errors–false discoveryα identifies tolerance for type 1 errors.Type 2 errors–failed discovery.

Experimental design: what effect can I detect?

What is the power of my study?

Power: 1 - Prob. type 2 error.

Power will (in general) depend on four factors (particularly for t/normallydistributed test statistics):

1) Type I error rate (α)

2) Effect size

3) Variance

4) Number of observations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 10 / 117

Page 45: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Statistical Power

Type 1 errors–false discoveryα identifies tolerance for type 1 errors.Type 2 errors–failed discovery.

Experimental design: what effect can I detect?

What is the power of my study?

Power: 1 - Prob. type 2 error.

Power will (in general) depend on four factors (particularly for t/normallydistributed test statistics):

1) Type I error rate (α)

2) Effect size

3) Variance

4) Number of observations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 10 / 117

Page 46: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Statistical Power

Type 1 errors–false discoveryα identifies tolerance for type 1 errors.Type 2 errors–failed discovery.

Experimental design: what effect can I detect?

What is the power of my study?

Power: 1 - Prob. type 2 error.

Power will (in general) depend on four factors (particularly for t/normallydistributed test statistics):

1) Type I error rate (α)

2) Effect size

3) Variance

4) Number of observations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 10 / 117

Page 47: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Statistical Power

Type 1 errors–false discoveryα identifies tolerance for type 1 errors.Type 2 errors–failed discovery.

Experimental design: what effect can I detect?

What is the power of my study?

Power: 1 - Prob. type 2 error.

Power will (in general) depend on four factors (particularly for t/normallydistributed test statistics):

1) Type I error rate (α)

2) Effect size

3) Variance

4) Number of observations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 10 / 117

Page 48: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Statistical Power

Type 1 errors–false discoveryα identifies tolerance for type 1 errors.Type 2 errors–failed discovery.

Experimental design: what effect can I detect?

What is the power of my study?

Power: 1 - Prob. type 2 error.

Power will (in general) depend on four factors (particularly for t/normallydistributed test statistics):

1) Type I error rate (α)

2) Effect size

3) Variance

4) Number of observations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 10 / 117

Page 49: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Statistical Power

Type 1 errors–false discoveryα identifies tolerance for type 1 errors.Type 2 errors–failed discovery.

Experimental design: what effect can I detect?

What is the power of my study?

Power: 1 - Prob. type 2 error.

Power will (in general) depend on four factors (particularly for t/normallydistributed test statistics):

1) Type I error rate (α)

2) Effect size

3) Variance

4) Number of observations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 10 / 117

Page 50: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis Tests

Suppose (again) we’re running an experiment, sampling from two normaldistributions (treatment and control).

Testing hypotheses:

H0: µt − µc ≡ µdiff = 0

H1: µdiff 6= 0

Test statistic:

t =T − C√σ2

tnt

+ σ2c

nc

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 11 / 117

Page 51: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis Tests

Suppose (again) we’re running an experiment, sampling from two normaldistributions (treatment and control).Testing hypotheses:

H0: µt − µc ≡ µdiff = 0

H1: µdiff 6= 0

Test statistic:

t =T − C√σ2

tnt

+ σ2c

nc

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 11 / 117

Page 52: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis Tests

Suppose (again) we’re running an experiment, sampling from two normaldistributions (treatment and control).Testing hypotheses:

H0: µt − µc ≡ µdiff = 0

H1: µdiff 6= 0

Test statistic:

t =T − C√σ2

tnt

+ σ2c

nc

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 11 / 117

Page 53: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis Tests

Suppose (again) we’re running an experiment, sampling from two normaldistributions (treatment and control).Testing hypotheses:

H0: µt − µc ≡ µdiff = 0

H1: µdiff 6= 0

Test statistic:

t =T − C√σ2

tnt

+ σ2c

nc

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 11 / 117

Page 54: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis Tests

Suppose (again) we’re running an experiment, sampling from two normaldistributions (treatment and control).Testing hypotheses:

H0: µt − µc ≡ µdiff = 0

H1: µdiff 6= 0

Test statistic:

t =T − C√σ2

tnt

+ σ2c

nc

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 11 / 117

Page 55: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis Tests

Suppose (again) we’re running an experiment, sampling from two normaldistributions (treatment and control).Testing hypotheses:

H0: µt − µc ≡ µdiff = 0

H1: µdiff 6= 0

Test statistic:

t =T − C√σ2

tnt

+ σ2c

nc

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 11 / 117

Page 56: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

t =T − C√σ2

tnt

+σ2

cnc

Increasing nt and nc → facilitatesdetection of smaller effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 57: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

t =T − C√σ2

tnt

+σ2

cnc

Increasing nt and nc → facilitatesdetection of smaller effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 58: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

t =T − C√σ2

tnt

+σ2

cnc

Increasing nt and nc → facilitatesdetection of smaller effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 59: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

t =T − C√σ2

tnt

+σ2

cnc

Increasing nt and nc → facilitatesdetection of smaller effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 60: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

t =T − C√σ2

tnt

+σ2

cnc

Increasing nt and nc → facilitatesdetection of smaller effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 61: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

t =T − C√σ2

tnt

+σ2

cnc

Increasing nt and nc → facilitatesdetection of smaller effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 62: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

t =T − C√σ2

tnt

+σ2

cnc

Increasing nt and nc → facilitatesdetection of smaller effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 63: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

t =T − C√σ2

tnt

+σ2

cnc

Increasing nt and nc → facilitatesdetection of smaller effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 64: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

t =T − C√σ2

tnt

+σ2

cnc

Increasing nt and nc → facilitatesdetection of smaller effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 65: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

- Power analysis:

- a priori How manyparticipants in study

- post-hoc Largest effect sizelikely detected, given studycharacteristics

- Precise zeros: under powered(small nt and nc relative tovariance) make failedrejection likely.

- not evidence of no effect!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 66: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

- Power analysis:

- a priori How manyparticipants in study

- post-hoc Largest effect sizelikely detected, given studycharacteristics

- Precise zeros: under powered(small nt and nc relative tovariance) make failedrejection likely.

- not evidence of no effect!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 67: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

- Power analysis:

- a priori How manyparticipants in study

- post-hoc Largest effect sizelikely detected, given studycharacteristics

- Precise zeros: under powered(small nt and nc relative tovariance) make failedrejection likely.

- not evidence of no effect!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 68: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

- Power analysis:

- a priori How manyparticipants in study

- post-hoc Largest effect sizelikely detected, given studycharacteristics

- Precise zeros: under powered(small nt and nc relative tovariance) make failedrejection likely.

- not evidence of no effect!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 69: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power and Hypothesis TestsKey question: given true value of µ∗diff 6= 0 what is the probability t falls in“fail to reject” region?

Pr(Type 2 error) = P(−1.96 < t < 1.96)

Power = 1− Pr(Type 2 error)

−0.4 −0.2 0.0 0.2 0.4

0.2

0.4

0.6

0.8

1.0

mu_diff

Pow

er =

1 −

Pro

b. T

ype

2 E

rror

- Power analysis:

- a priori How manyparticipants in study

- post-hoc Largest effect sizelikely detected, given studycharacteristics

- Precise zeros: under powered(small nt and nc relative tovariance) make failedrejection likely.

- not evidence of no effect!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 12 / 117

Page 70: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Back to Gerber, Green and Larimer

Before starting they did a power analysis

Steps to a power analysis1 Pick some hypothetical effect size µy − µx = .052 Calculate the distribution of test statistic T under that effect size3 Calculate the probability of rejecting the null under that distribution4 Repeat for different effect sizes

Let’s say we want to run another experiment where we believe thetrue effect is µy − µx = 0.05 and the variances are σ2

y = σ2x = .2

We can only afford to send out 500 mailers. Should we run theexperiment?

To the board!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 13 / 117

Page 71: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Back to Gerber, Green and Larimer

Before starting they did a power analysis

Steps to a power analysis

1 Pick some hypothetical effect size µy − µx = .052 Calculate the distribution of test statistic T under that effect size3 Calculate the probability of rejecting the null under that distribution4 Repeat for different effect sizes

Let’s say we want to run another experiment where we believe thetrue effect is µy − µx = 0.05 and the variances are σ2

y = σ2x = .2

We can only afford to send out 500 mailers. Should we run theexperiment?

To the board!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 13 / 117

Page 72: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Back to Gerber, Green and Larimer

Before starting they did a power analysis

Steps to a power analysis1 Pick some hypothetical effect size µy − µx = .052 Calculate the distribution of test statistic T under that effect size3 Calculate the probability of rejecting the null under that distribution4 Repeat for different effect sizes

Let’s say we want to run another experiment where we believe thetrue effect is µy − µx = 0.05 and the variances are σ2

y = σ2x = .2

We can only afford to send out 500 mailers. Should we run theexperiment?

To the board!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 13 / 117

Page 73: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Back to Gerber, Green and Larimer

Before starting they did a power analysis

Steps to a power analysis1 Pick some hypothetical effect size µy − µx = .052 Calculate the distribution of test statistic T under that effect size3 Calculate the probability of rejecting the null under that distribution4 Repeat for different effect sizes

Let’s say we want to run another experiment where we believe thetrue effect is µy − µx = 0.05 and the variances are σ2

y = σ2x = .2

We can only afford to send out 500 mailers. Should we run theexperiment?

To the board!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 13 / 117

Page 74: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Back to Gerber, Green and Larimer

Before starting they did a power analysis

Steps to a power analysis1 Pick some hypothetical effect size µy − µx = .052 Calculate the distribution of test statistic T under that effect size3 Calculate the probability of rejecting the null under that distribution4 Repeat for different effect sizes

Let’s say we want to run another experiment where we believe thetrue effect is µy − µx = 0.05 and the variances are σ2

y = σ2x = .2

We can only afford to send out 500 mailers. Should we run theexperiment?

To the board!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 13 / 117

Page 75: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Back to Gerber, Green and Larimer

Before starting they did a power analysis

Steps to a power analysis1 Pick some hypothetical effect size µy − µx = .052 Calculate the distribution of test statistic T under that effect size3 Calculate the probability of rejecting the null under that distribution4 Repeat for different effect sizes

Let’s say we want to run another experiment where we believe thetrue effect is µy − µx = 0.05 and the variances are σ2

y = σ2x = .2

We can only afford to send out 500 mailers. Should we run theexperiment?

To the board!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 13 / 117

Page 76: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Short Case Study of Retrospective Power Analysis

Durante, Arsena and Griskevicius (2013) publish a study inPsychological Science on menstrual cycles and political attitudes

They report a 17 point swing in voting preferences in the 2012election.

I for context polling showed Obama’s support varying by 7 points duringthe entire general campaign (Gallup 2012)

I the implied standard error was 8.1 percentage points with a p-value of0.035.

Gelman and Carlin (2014) show how the study design should cause usto question the finding.

I they assume a true effect size of 2 points (upper end of plausible)I perform a power analysis under given effect size, observed standard

error of measurement.I power comes out to 0.06

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 14 / 117

Page 77: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Short Case Study of Retrospective Power Analysis

Durante, Arsena and Griskevicius (2013) publish a study inPsychological Science on menstrual cycles and political attitudes

They report a 17 point swing in voting preferences in the 2012election.

I for context polling showed Obama’s support varying by 7 points duringthe entire general campaign (Gallup 2012)

I the implied standard error was 8.1 percentage points with a p-value of0.035.

Gelman and Carlin (2014) show how the study design should cause usto question the finding.

I they assume a true effect size of 2 points (upper end of plausible)I perform a power analysis under given effect size, observed standard

error of measurement.I power comes out to 0.06

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 14 / 117

Page 78: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Short Case Study of Retrospective Power Analysis

Durante, Arsena and Griskevicius (2013) publish a study inPsychological Science on menstrual cycles and political attitudes

They report a 17 point swing in voting preferences in the 2012election.

I for context polling showed Obama’s support varying by 7 points duringthe entire general campaign (Gallup 2012)

I the implied standard error was 8.1 percentage points with a p-value of0.035.

Gelman and Carlin (2014) show how the study design should cause usto question the finding.

I they assume a true effect size of 2 points (upper end of plausible)I perform a power analysis under given effect size, observed standard

error of measurement.I power comes out to 0.06

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 14 / 117

Page 79: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Short Case Study of Retrospective Power Analysis

Durante, Arsena and Griskevicius (2013) publish a study inPsychological Science on menstrual cycles and political attitudes

They report a 17 point swing in voting preferences in the 2012election.

I for context polling showed Obama’s support varying by 7 points duringthe entire general campaign (Gallup 2012)

I the implied standard error was 8.1 percentage points with a p-value of0.035.

Gelman and Carlin (2014) show how the study design should cause usto question the finding.

I they assume a true effect size of 2 points (upper end of plausible)I perform a power analysis under given effect size, observed standard

error of measurement.I power comes out to 0.06

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 14 / 117

Page 80: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Short Case Study of Retrospective Power Analysis

Durante, Arsena and Griskevicius (2013) publish a study inPsychological Science on menstrual cycles and political attitudes

They report a 17 point swing in voting preferences in the 2012election.

I for context polling showed Obama’s support varying by 7 points duringthe entire general campaign (Gallup 2012)

I the implied standard error was 8.1 percentage points with a p-value of0.035.

Gelman and Carlin (2014) show how the study design should cause usto question the finding.

I they assume a true effect size of 2 points (upper end of plausible)I perform a power analysis under given effect size, observed standard

error of measurement.I power comes out to 0.06

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 14 / 117

Page 81: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Short Case Study of Retrospective Power Analysis

Durante, Arsena and Griskevicius (2013) publish a study inPsychological Science on menstrual cycles and political attitudes

They report a 17 point swing in voting preferences in the 2012election.

I for context polling showed Obama’s support varying by 7 points duringthe entire general campaign (Gallup 2012)

I the implied standard error was 8.1 percentage points with a p-value of0.035.

Gelman and Carlin (2014) show how the study design should cause usto question the finding.

I they assume a true effect size of 2 points (upper end of plausible)I perform a power analysis under given effect size, observed standard

error of measurement.I power comes out to 0.06

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 14 / 117

Page 82: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Short Case Study of Retrospective Power Analysis

Durante, Arsena and Griskevicius (2013) publish a study inPsychological Science on menstrual cycles and political attitudes

They report a 17 point swing in voting preferences in the 2012election.

I for context polling showed Obama’s support varying by 7 points duringthe entire general campaign (Gallup 2012)

I the implied standard error was 8.1 percentage points with a p-value of0.035.

Gelman and Carlin (2014) show how the study design should cause usto question the finding.

I they assume a true effect size of 2 points (upper end of plausible)

I perform a power analysis under given effect size, observed standarderror of measurement.

I power comes out to 0.06

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 14 / 117

Page 83: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Short Case Study of Retrospective Power Analysis

Durante, Arsena and Griskevicius (2013) publish a study inPsychological Science on menstrual cycles and political attitudes

They report a 17 point swing in voting preferences in the 2012election.

I for context polling showed Obama’s support varying by 7 points duringthe entire general campaign (Gallup 2012)

I the implied standard error was 8.1 percentage points with a p-value of0.035.

Gelman and Carlin (2014) show how the study design should cause usto question the finding.

I they assume a true effect size of 2 points (upper end of plausible)I perform a power analysis under given effect size, observed standard

error of measurement.

I power comes out to 0.06

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 14 / 117

Page 84: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Short Case Study of Retrospective Power Analysis

Durante, Arsena and Griskevicius (2013) publish a study inPsychological Science on menstrual cycles and political attitudes

They report a 17 point swing in voting preferences in the 2012election.

I for context polling showed Obama’s support varying by 7 points duringthe entire general campaign (Gallup 2012)

I the implied standard error was 8.1 percentage points with a p-value of0.035.

Gelman and Carlin (2014) show how the study design should cause usto question the finding.

I they assume a true effect size of 2 points (upper end of plausible)I perform a power analysis under given effect size, observed standard

error of measurement.I power comes out to 0.06

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 14 / 117

Page 85: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Troubling Figure (via Gelman)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 15 / 117

Page 86: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 16 / 117

Page 87: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 16 / 117

Page 88: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Problems with p-values

p-values are extremely common in the social sciences and are oftenthe standard by which the value of the finding is judged.

p-values are not:I an indication of a large substantive effectI the probability that the null hypothesis is trueI the probability that the alternative hypothesis is false

a large p-value could mean either that we are in the null world ORthat we had insufficient power

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 17 / 117

Page 89: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Problems with p-values

p-values are extremely common in the social sciences and are oftenthe standard by which the value of the finding is judged.

p-values are not:I an indication of a large substantive effectI the probability that the null hypothesis is trueI the probability that the alternative hypothesis is false

a large p-value could mean either that we are in the null world ORthat we had insufficient power

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 17 / 117

Page 90: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Problems with p-values

p-values are extremely common in the social sciences and are oftenthe standard by which the value of the finding is judged.

p-values are not:

I an indication of a large substantive effectI the probability that the null hypothesis is trueI the probability that the alternative hypothesis is false

a large p-value could mean either that we are in the null world ORthat we had insufficient power

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 17 / 117

Page 91: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Problems with p-values

p-values are extremely common in the social sciences and are oftenthe standard by which the value of the finding is judged.

p-values are not:I an indication of a large substantive effectI the probability that the null hypothesis is trueI the probability that the alternative hypothesis is false

a large p-value could mean either that we are in the null world ORthat we had insufficient power

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 17 / 117

Page 92: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Problems with p-values

p-values are extremely common in the social sciences and are oftenthe standard by which the value of the finding is judged.

p-values are not:I an indication of a large substantive effectI the probability that the null hypothesis is trueI the probability that the alternative hypothesis is false

a large p-value could mean either that we are in the null world ORthat we had insufficient power

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 17 / 117

Page 93: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

So what is the basic idea?

The idea was to run an experiment, then see if the results wereconsistent with what random chance might produce. Researcherswould first set up a ’null hypothesis’ that they wanted todisprove, such as there being no correlation or no differencebetween groups. Next, they would play the devil’s advocate and,assuming that this null hypothesis was in fact true, calculate thechances of getting results at least as extreme as what wasactually observed. This probability was the P value. The smallerit was, suggested Fisher, the greater the likelihood that thestraw-man null hypothesis was false.(Nunzo 2014, emphasis mine)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 18 / 117

Page 94: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

I’ve got 99 problems. . .

p-values are hard to interpret, but even in the best scenarios they havesome key problems:

they remove focus from data, measurement, theory and thesubstantive quantity of interest

they are often applied outside the dichotomous/decision-makingframework where they make some sense

“significant covariates” aren’t even necessarily good predictors (Wardet al 2010, Lo et al 2015)

they lead to publication filtering on arbitrary cutoffs.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 19 / 117

Page 95: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

I’ve got 99 problems. . .

p-values are hard to interpret, but even in the best scenarios they havesome key problems:

they remove focus from data, measurement, theory and thesubstantive quantity of interest

they are often applied outside the dichotomous/decision-makingframework where they make some sense

“significant covariates” aren’t even necessarily good predictors (Wardet al 2010, Lo et al 2015)

they lead to publication filtering on arbitrary cutoffs.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 19 / 117

Page 96: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

I’ve got 99 problems. . .

p-values are hard to interpret, but even in the best scenarios they havesome key problems:

they remove focus from data, measurement, theory and thesubstantive quantity of interest

they are often applied outside the dichotomous/decision-makingframework where they make some sense

“significant covariates” aren’t even necessarily good predictors (Wardet al 2010, Lo et al 2015)

they lead to publication filtering on arbitrary cutoffs.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 19 / 117

Page 97: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

I’ve got 99 problems. . .

p-values are hard to interpret, but even in the best scenarios they havesome key problems:

they remove focus from data, measurement, theory and thesubstantive quantity of interest

they are often applied outside the dichotomous/decision-makingframework where they make some sense

“significant covariates” aren’t even necessarily good predictors (Wardet al 2010, Lo et al 2015)

they lead to publication filtering on arbitrary cutoffs.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 19 / 117

Page 98: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

I’ve got 99 problems. . .

p-values are hard to interpret, but even in the best scenarios they havesome key problems:

they remove focus from data, measurement, theory and thesubstantive quantity of interest

they are often applied outside the dichotomous/decision-makingframework where they make some sense

“significant covariates” aren’t even necessarily good predictors (Wardet al 2010, Lo et al 2015)

they lead to publication filtering on arbitrary cutoffs.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 19 / 117

Page 99: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

I’ve got 99 problems. . .

p-values are hard to interpret, but even in the best scenarios they havesome key problems:

they remove focus from data, measurement, theory and thesubstantive quantity of interest

they are often applied outside the dichotomous/decision-makingframework where they make some sense

“significant covariates” aren’t even necessarily good predictors (Wardet al 2010, Lo et al 2015)

they lead to publication filtering on arbitrary cutoffs.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 19 / 117

Page 100: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Arbitrary Cutoffs

Gerber and Malhotra (2006) Top Political Science JournalsStewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 20 / 117

Page 101: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Arbitrary Cutoffs

Gerber and Malhotra (2008) Top Sociology JournalsStewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 20 / 117

Page 102: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Arbitrary Cutoffs

Masicampo and Lalande (2012) Top Psychology Journals

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 20 / 117

Page 103: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Still Not Convinced?The Real Harm of Misinterpreted p-values

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 21 / 117

Page 104: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Example from Hauer: Right-Turn-On-Red

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 22 / 117

Page 105: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Point in Hauer

Two other interesting examples in Hauer (2004)

Core issue is that lack of significance is not an indication of a zeroeffect, it could also be a lack of power (i.e. a small sample sizerelative to the difficulty of detecting the effect)

On the opposite end, large tech companies essentially never usesignificance testing because they have huge samples which essentiallyalways find some non-zero effect. But that doesn’t make the effectsignificant in a colloquial sense of important.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 23 / 117

Page 106: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Point in Hauer

Two other interesting examples in Hauer (2004)

Core issue is that lack of significance is not an indication of a zeroeffect, it could also be a lack of power (i.e. a small sample sizerelative to the difficulty of detecting the effect)

On the opposite end, large tech companies essentially never usesignificance testing because they have huge samples which essentiallyalways find some non-zero effect. But that doesn’t make the effectsignificant in a colloquial sense of important.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 23 / 117

Page 107: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Point in Hauer

Two other interesting examples in Hauer (2004)

Core issue is that lack of significance is not an indication of a zeroeffect, it could also be a lack of power (i.e. a small sample sizerelative to the difficulty of detecting the effect)

On the opposite end, large tech companies essentially never usesignificance testing because they have huge samples which essentiallyalways find some non-zero effect. But that doesn’t make the effectsignificant in a colloquial sense of important.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 23 / 117

Page 108: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Point in Hauer

Two other interesting examples in Hauer (2004)

Core issue is that lack of significance is not an indication of a zeroeffect, it could also be a lack of power (i.e. a small sample sizerelative to the difficulty of detecting the effect)

On the opposite end, large tech companies essentially never usesignificance testing because they have huge samples which essentiallyalways find some non-zero effect. But that doesn’t make the effectsignificant in a colloquial sense of important.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 23 / 117

Page 109: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

P-values one of most used tests in the social sciences–and you’re tellingme not to rely on them?

- Basically, yes.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 110: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

P-values one of most used tests in the social sciences–and you’re tellingme not to rely on them?

- Basically, yes.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 111: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

What’s the matter with you?

- Two reasons not to worship p-values [of many]

1) Statistical: they represent a very specific quantity under a nulldistribution. If you don’t really care about rejecting just that null, thenyou should focus on providing more information

2) Substantive: p-values are divorced from your quantity of interest–whichalmost always should relate to how much an intervention changes aquantity of social scientific interest (Grandparent rule)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 112: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

What’s the matter with you?

- Two reasons not to worship p-values [of many]

1) Statistical: they represent a very specific quantity under a nulldistribution. If you don’t really care about rejecting just that null, thenyou should focus on providing more information

2) Substantive: p-values are divorced from your quantity of interest–whichalmost always should relate to how much an intervention changes aquantity of social scientific interest (Grandparent rule)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 113: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

What’s the matter with you?

- Two reasons not to worship p-values [of many]

1) Statistical: they represent a very specific quantity under a nulldistribution. If you don’t really care about rejecting just that null, thenyou should focus on providing more information

2) Substantive: p-values are divorced from your quantity of interest–whichalmost always should relate to how much an intervention changes aquantity of social scientific interest (Grandparent rule)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 114: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

What’s the matter with you?

- Two reasons not to worship p-values [of many]

1) Statistical: they represent a very specific quantity under a nulldistribution. If you don’t really care about rejecting just that null, thenyou should focus on providing more information

2) Substantive: p-values are divorced from your quantity of interest–whichalmost always should relate to how much an intervention changes aquantity of social scientific interest (Grandparent rule)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 115: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

But I want to assess the probability that my hypothesis is true–why can’t Iuse a p-value?

1) Me too, good luck.

2) That’s not what p-values measure

3) No one study is going to eliminate an entire hypothesis; even if thatstudy generates a really small p-value, you’d probably want an entirelydifferent infrastructure

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 116: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

But I want to assess the probability that my hypothesis is true–why can’t Iuse a p-value?

1) Me too, good luck.

2) That’s not what p-values measure

3) No one study is going to eliminate an entire hypothesis; even if thatstudy generates a really small p-value, you’d probably want an entirelydifferent infrastructure

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 117: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

But I want to assess the probability that my hypothesis is true–why can’t Iuse a p-value?

1) Me too, good luck.

2) That’s not what p-values measure

3) No one study is going to eliminate an entire hypothesis; even if thatstudy generates a really small p-value, you’d probably want an entirelydifferent infrastructure

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 118: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

But I want to assess the probability that my hypothesis is true–why can’t Iuse a p-value?

1) Me too, good luck.

2) That’s not what p-values measure

3) No one study is going to eliminate an entire hypothesis; even if thatstudy generates a really small p-value, you’d probably want an entirelydifferent infrastructure

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 119: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

I like to estimate the certainty of my results using a p-value, do you findthis problematic?

- Yes

- What could be wrong with this?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 120: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

I like to estimate the certainty of my results using a p-value, do you findthis problematic?

- Yes

- What could be wrong with this?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 121: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

I like to estimate the certainty of my results using a p-value, do you findthis problematic?

- Yes

- What could be wrong with this?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 122: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

I like to estimate the certainty of my results using a p-value, do you findthis problematic?

- Yes

- What could be wrong with this?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 24 / 117

Page 123: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

OK–I get it.

No p-values (but I’m still going to use them anyways. I’mgoing to put an extra star in my table and name it after you).

But what should I do? (Other than talk about methodologists and howthey are jerks?)Confidence Intervals, Graphical Presentation of a quantity of interestWhy?

1) Substantive significance and statistical significance simultaneously

2) Make comparisons across factors approximately and accurately(though exercise caution)

3) Harder to hide weird looking effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 25 / 117

Page 124: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

OK–I get it. No p-values

(but I’m still going to use them anyways. I’mgoing to put an extra star in my table and name it after you).

But what should I do? (Other than talk about methodologists and howthey are jerks?)Confidence Intervals, Graphical Presentation of a quantity of interestWhy?

1) Substantive significance and statistical significance simultaneously

2) Make comparisons across factors approximately and accurately(though exercise caution)

3) Harder to hide weird looking effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 25 / 117

Page 125: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

OK–I get it. No p-values (but I’m still going to use them anyways.

I’mgoing to put an extra star in my table and name it after you).

But what should I do? (Other than talk about methodologists and howthey are jerks?)Confidence Intervals, Graphical Presentation of a quantity of interestWhy?

1) Substantive significance and statistical significance simultaneously

2) Make comparisons across factors approximately and accurately(though exercise caution)

3) Harder to hide weird looking effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 25 / 117

Page 126: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

OK–I get it. No p-values (but I’m still going to use them anyways. I’mgoing to put an extra star in my table and name it after you).

But what should I do? (Other than talk about methodologists and howthey are jerks?)Confidence Intervals, Graphical Presentation of a quantity of interestWhy?

1) Substantive significance and statistical significance simultaneously

2) Make comparisons across factors approximately and accurately(though exercise caution)

3) Harder to hide weird looking effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 25 / 117

Page 127: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

OK–I get it. No p-values (but I’m still going to use them anyways. I’mgoing to put an extra star in my table and name it after you).

But what should I do? (Other than talk about methodologists and howthey are jerks?)

Confidence Intervals, Graphical Presentation of a quantity of interestWhy?

1) Substantive significance and statistical significance simultaneously

2) Make comparisons across factors approximately and accurately(though exercise caution)

3) Harder to hide weird looking effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 25 / 117

Page 128: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

OK–I get it. No p-values (but I’m still going to use them anyways. I’mgoing to put an extra star in my table and name it after you).

But what should I do? (Other than talk about methodologists and howthey are jerks?)Confidence Intervals, Graphical Presentation of a quantity of interest

Why?

1) Substantive significance and statistical significance simultaneously

2) Make comparisons across factors approximately and accurately(though exercise caution)

3) Harder to hide weird looking effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 25 / 117

Page 129: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

OK–I get it. No p-values (but I’m still going to use them anyways. I’mgoing to put an extra star in my table and name it after you).

But what should I do? (Other than talk about methodologists and howthey are jerks?)Confidence Intervals, Graphical Presentation of a quantity of interestWhy?

1) Substantive significance and statistical significance simultaneously

2) Make comparisons across factors approximately and accurately(though exercise caution)

3) Harder to hide weird looking effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 25 / 117

Page 130: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

OK–I get it. No p-values (but I’m still going to use them anyways. I’mgoing to put an extra star in my table and name it after you).

But what should I do? (Other than talk about methodologists and howthey are jerks?)Confidence Intervals, Graphical Presentation of a quantity of interestWhy?

1) Substantive significance and statistical significance simultaneously

2) Make comparisons across factors approximately and accurately(though exercise caution)

3) Harder to hide weird looking effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 25 / 117

Page 131: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

OK–I get it. No p-values (but I’m still going to use them anyways. I’mgoing to put an extra star in my table and name it after you).

But what should I do? (Other than talk about methodologists and howthey are jerks?)Confidence Intervals, Graphical Presentation of a quantity of interestWhy?

1) Substantive significance and statistical significance simultaneously

2) Make comparisons across factors approximately and accurately(though exercise caution)

3) Harder to hide weird looking effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 25 / 117

Page 132: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

P-values and Confidence Intervals

OK–I get it. No p-values (but I’m still going to use them anyways. I’mgoing to put an extra star in my table and name it after you).

But what should I do? (Other than talk about methodologists and howthey are jerks?)Confidence Intervals, Graphical Presentation of a quantity of interestWhy?

1) Substantive significance and statistical significance simultaneously

2) Make comparisons across factors approximately and accurately(though exercise caution)

3) Harder to hide weird looking effects

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 25 / 117

Page 133: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

But Let’s Not Obsess Too Much About p-values

From Leek and Peng (2015) “P values are just the tip of the iceberg” Nature.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 26 / 117

Page 134: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 27 / 117

Page 135: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 27 / 117

Page 136: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

An Intro Motivation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 28 / 117

Page 137: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

An Intro Motivation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 28 / 117

Page 138: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing itand five minutes slapping it into an unreadable table.

Visualization can be used for many purposesI drawing people into a topic/datasetI presenting evidenceI exploration/model checking

Three steps involved1 clearly define the goal2 estimate quantities of interest3 present those quantities in a compelling way

Good design involves thinking carefully about the audience

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 29 / 117

Page 139: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing itand five minutes slapping it into an unreadable table.

Visualization can be used for many purposesI drawing people into a topic/datasetI presenting evidenceI exploration/model checking

Three steps involved1 clearly define the goal2 estimate quantities of interest3 present those quantities in a compelling way

Good design involves thinking carefully about the audience

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 29 / 117

Page 140: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing itand five minutes slapping it into an unreadable table.

Visualization can be used for many purposesI drawing people into a topic/datasetI presenting evidenceI exploration/model checking

Three steps involved1 clearly define the goal2 estimate quantities of interest3 present those quantities in a compelling way

Good design involves thinking carefully about the audience

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 29 / 117

Page 141: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing itand five minutes slapping it into an unreadable table.

Visualization can be used for many purposes

I drawing people into a topic/datasetI presenting evidenceI exploration/model checking

Three steps involved1 clearly define the goal2 estimate quantities of interest3 present those quantities in a compelling way

Good design involves thinking carefully about the audience

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 29 / 117

Page 142: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing itand five minutes slapping it into an unreadable table.

Visualization can be used for many purposesI drawing people into a topic/dataset

I presenting evidenceI exploration/model checking

Three steps involved1 clearly define the goal2 estimate quantities of interest3 present those quantities in a compelling way

Good design involves thinking carefully about the audience

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 29 / 117

Page 143: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing itand five minutes slapping it into an unreadable table.

Visualization can be used for many purposesI drawing people into a topic/datasetI presenting evidence

I exploration/model checking

Three steps involved1 clearly define the goal2 estimate quantities of interest3 present those quantities in a compelling way

Good design involves thinking carefully about the audience

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 29 / 117

Page 144: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing itand five minutes slapping it into an unreadable table.

Visualization can be used for many purposesI drawing people into a topic/datasetI presenting evidenceI exploration/model checking

Three steps involved1 clearly define the goal2 estimate quantities of interest3 present those quantities in a compelling way

Good design involves thinking carefully about the audience

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 29 / 117

Page 145: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing itand five minutes slapping it into an unreadable table.

Visualization can be used for many purposesI drawing people into a topic/datasetI presenting evidenceI exploration/model checking

Three steps involved

1 clearly define the goal2 estimate quantities of interest3 present those quantities in a compelling way

Good design involves thinking carefully about the audience

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 29 / 117

Page 146: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing itand five minutes slapping it into an unreadable table.

Visualization can be used for many purposesI drawing people into a topic/datasetI presenting evidenceI exploration/model checking

Three steps involved1 clearly define the goal

2 estimate quantities of interest3 present those quantities in a compelling way

Good design involves thinking carefully about the audience

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 29 / 117

Page 147: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing itand five minutes slapping it into an unreadable table.

Visualization can be used for many purposesI drawing people into a topic/datasetI presenting evidenceI exploration/model checking

Three steps involved1 clearly define the goal2 estimate quantities of interest

3 present those quantities in a compelling way

Good design involves thinking carefully about the audience

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 29 / 117

Page 148: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing itand five minutes slapping it into an unreadable table.

Visualization can be used for many purposesI drawing people into a topic/datasetI presenting evidenceI exploration/model checking

Three steps involved1 clearly define the goal2 estimate quantities of interest3 present those quantities in a compelling way

Good design involves thinking carefully about the audience

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 29 / 117

Page 149: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing itand five minutes slapping it into an unreadable table.

Visualization can be used for many purposesI drawing people into a topic/datasetI presenting evidenceI exploration/model checking

Three steps involved1 clearly define the goal2 estimate quantities of interest3 present those quantities in a compelling way

Good design involves thinking carefully about the audience

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 29 / 117

Page 150: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Examples via Gelman

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 30 / 117

Page 151: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Examples via Gelman

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 30 / 117

Page 152: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Examples via Gelman

One may be better for drawing people in, the other for evidence.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 30 / 117

Page 153: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Examples via Gelman

A classic infographic by Nightingale. Dramatizes the problem.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 30 / 117

Page 154: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Examples via Gelman

A simpler version where it is easier to see the patterns.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 30 / 117

Page 155: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How Not to Present Statistical Results

This one is typical of current practice, not especially bad.

What do these numbers mean?

Why so much whitespace? Can you connect cols A and B?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 31 / 117

Page 156: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How Not to Present Statistical Results

This one is typical of current practice, not especially bad.

What do these numbers mean?

Why so much whitespace? Can you connect cols A and B?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 31 / 117

Page 157: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How Not to Present Statistical Results

This one is typical of current practice, not especially bad.

What do these numbers mean?

Why so much whitespace? Can you connect cols A and B?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 31 / 117

Page 158: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How Not to Present Statistical Results

This one is typical of current practice, not especially bad.

What do these numbers mean?

Why so much whitespace? Can you connect cols A and B?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 31 / 117

Page 159: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How Not to Present Statistical Results

This one is typical of current practice, not especially bad.

What do these numbers mean?

Why so much whitespace? Can you connect cols A and B?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 31 / 117

Page 160: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How Not to Present Statistical Results

This one is typical of current practice, not especially bad.

What do these numbers mean?

Why so much whitespace? Can you connect cols A and B?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 31 / 117

Page 161: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How Not to Present Statistical Results: Continued

What does the star-gazing add?

Can any be interpreted as causal estimates?

Can you compute a quantity of interest from these numbers?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 32 / 117

Page 162: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How Not to Present Statistical Results: Continued

What does the star-gazing add?

Can any be interpreted as causal estimates?

Can you compute a quantity of interest from these numbers?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 32 / 117

Page 163: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How Not to Present Statistical Results: Continued

What does the star-gazing add?

Can any be interpreted as causal estimates?

Can you compute a quantity of interest from these numbers?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 32 / 117

Page 164: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How Not to Present Statistical Results: Continued

What does the star-gazing add?

Can any be interpreted as causal estimates?

Can you compute a quantity of interest from these numbers?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 32 / 117

Page 165: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Interpretation and Presentation

1. Statistical presentations should

(a) Convey numerically precise estimates of the quantities of greatestsubstantive interest,

(b) Include reasonable measures of uncertainty about those estimates, and(c) Require little specialized knowledge to understand.(d) Include no superfluous information, no long lists of coefficients no one

understands, no star gazing, etc.

2. For example: Other things being equal, an additional year of educationwould increase your annual income by $1,500 on average, plus or minusabout $500.

3. Your work should satisfy the expert reviewer and the lay person.

4. King, Tomz, Wittenberg, “Making the Most of Statistical Analyses:Improving Interpretation and Presentation” American Journal ofPolitical Science, Vol. 44, No. 2 (March, 2000): 341-355.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 33 / 117

Page 166: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Interpretation and Presentation

1. Statistical presentations should

(a) Convey numerically precise estimates of the quantities of greatestsubstantive interest,

(b) Include reasonable measures of uncertainty about those estimates, and(c) Require little specialized knowledge to understand.(d) Include no superfluous information, no long lists of coefficients no one

understands, no star gazing, etc.

2. For example: Other things being equal, an additional year of educationwould increase your annual income by $1,500 on average, plus or minusabout $500.

3. Your work should satisfy the expert reviewer and the lay person.

4. King, Tomz, Wittenberg, “Making the Most of Statistical Analyses:Improving Interpretation and Presentation” American Journal ofPolitical Science, Vol. 44, No. 2 (March, 2000): 341-355.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 33 / 117

Page 167: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Interpretation and Presentation

1. Statistical presentations should

(a) Convey numerically precise estimates of the quantities of greatestsubstantive interest,

(b) Include reasonable measures of uncertainty about those estimates, and(c) Require little specialized knowledge to understand.(d) Include no superfluous information, no long lists of coefficients no one

understands, no star gazing, etc.

2. For example: Other things being equal, an additional year of educationwould increase your annual income by $1,500 on average, plus or minusabout $500.

3. Your work should satisfy the expert reviewer and the lay person.

4. King, Tomz, Wittenberg, “Making the Most of Statistical Analyses:Improving Interpretation and Presentation” American Journal ofPolitical Science, Vol. 44, No. 2 (March, 2000): 341-355.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 33 / 117

Page 168: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Interpretation and Presentation

1. Statistical presentations should

(a) Convey numerically precise estimates of the quantities of greatestsubstantive interest,

(b) Include reasonable measures of uncertainty about those estimates, and

(c) Require little specialized knowledge to understand.(d) Include no superfluous information, no long lists of coefficients no one

understands, no star gazing, etc.

2. For example: Other things being equal, an additional year of educationwould increase your annual income by $1,500 on average, plus or minusabout $500.

3. Your work should satisfy the expert reviewer and the lay person.

4. King, Tomz, Wittenberg, “Making the Most of Statistical Analyses:Improving Interpretation and Presentation” American Journal ofPolitical Science, Vol. 44, No. 2 (March, 2000): 341-355.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 33 / 117

Page 169: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Interpretation and Presentation

1. Statistical presentations should

(a) Convey numerically precise estimates of the quantities of greatestsubstantive interest,

(b) Include reasonable measures of uncertainty about those estimates, and(c) Require little specialized knowledge to understand.

(d) Include no superfluous information, no long lists of coefficients no oneunderstands, no star gazing, etc.

2. For example: Other things being equal, an additional year of educationwould increase your annual income by $1,500 on average, plus or minusabout $500.

3. Your work should satisfy the expert reviewer and the lay person.

4. King, Tomz, Wittenberg, “Making the Most of Statistical Analyses:Improving Interpretation and Presentation” American Journal ofPolitical Science, Vol. 44, No. 2 (March, 2000): 341-355.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 33 / 117

Page 170: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Interpretation and Presentation

1. Statistical presentations should

(a) Convey numerically precise estimates of the quantities of greatestsubstantive interest,

(b) Include reasonable measures of uncertainty about those estimates, and(c) Require little specialized knowledge to understand.(d) Include no superfluous information, no long lists of coefficients no one

understands, no star gazing, etc.

2. For example: Other things being equal, an additional year of educationwould increase your annual income by $1,500 on average, plus or minusabout $500.

3. Your work should satisfy the expert reviewer and the lay person.

4. King, Tomz, Wittenberg, “Making the Most of Statistical Analyses:Improving Interpretation and Presentation” American Journal ofPolitical Science, Vol. 44, No. 2 (March, 2000): 341-355.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 33 / 117

Page 171: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Interpretation and Presentation

1. Statistical presentations should

(a) Convey numerically precise estimates of the quantities of greatestsubstantive interest,

(b) Include reasonable measures of uncertainty about those estimates, and(c) Require little specialized knowledge to understand.(d) Include no superfluous information, no long lists of coefficients no one

understands, no star gazing, etc.

2. For example: Other things being equal, an additional year of educationwould increase your annual income by $1,500 on average, plus or minusabout $500.

3. Your work should satisfy the expert reviewer and the lay person.

4. King, Tomz, Wittenberg, “Making the Most of Statistical Analyses:Improving Interpretation and Presentation” American Journal ofPolitical Science, Vol. 44, No. 2 (March, 2000): 341-355.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 33 / 117

Page 172: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Interpretation and Presentation

1. Statistical presentations should

(a) Convey numerically precise estimates of the quantities of greatestsubstantive interest,

(b) Include reasonable measures of uncertainty about those estimates, and(c) Require little specialized knowledge to understand.(d) Include no superfluous information, no long lists of coefficients no one

understands, no star gazing, etc.

2. For example: Other things being equal, an additional year of educationwould increase your annual income by $1,500 on average, plus or minusabout $500.

3. Your work should satisfy the expert reviewer and the lay person.

4. King, Tomz, Wittenberg, “Making the Most of Statistical Analyses:Improving Interpretation and Presentation” American Journal ofPolitical Science, Vol. 44, No. 2 (March, 2000): 341-355.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 33 / 117

Page 173: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Interpretation and Presentation

1. Statistical presentations should

(a) Convey numerically precise estimates of the quantities of greatestsubstantive interest,

(b) Include reasonable measures of uncertainty about those estimates, and(c) Require little specialized knowledge to understand.(d) Include no superfluous information, no long lists of coefficients no one

understands, no star gazing, etc.

2. For example: Other things being equal, an additional year of educationwould increase your annual income by $1,500 on average, plus or minusabout $500.

3. Your work should satisfy the expert reviewer and the lay person.

4. King, Tomz, Wittenberg, “Making the Most of Statistical Analyses:Improving Interpretation and Presentation” American Journal ofPolitical Science, Vol. 44, No. 2 (March, 2000): 341-355.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 33 / 117

Page 174: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How to Calculate Quantities of Interest

We will discuss more general recipes in Soc504 but essentially threeapproaches: analytic, parametric simulation, non-parametricsimulation.

Analytic method involve finding the quantity and using the propertiesof variance and covariances to calculate a confidence interval. Can bechallenging for complicated quantities.

Parametric simulation uses the estimated coefficients andvariance-covariance matrix to simulate outcomes (we will cover this inSoc504)

Non-parametric simulation uses the bootstrap. Calculate the quantityof interest within each bootstrap sample and then aggregate to get anestimate of the variance.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 34 / 117

Page 175: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How to Calculate Quantities of Interest

We will discuss more general recipes in Soc504 but essentially threeapproaches: analytic, parametric simulation, non-parametricsimulation.

Analytic method involve finding the quantity and using the propertiesof variance and covariances to calculate a confidence interval. Can bechallenging for complicated quantities.

Parametric simulation uses the estimated coefficients andvariance-covariance matrix to simulate outcomes (we will cover this inSoc504)

Non-parametric simulation uses the bootstrap. Calculate the quantityof interest within each bootstrap sample and then aggregate to get anestimate of the variance.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 34 / 117

Page 176: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How to Calculate Quantities of Interest

We will discuss more general recipes in Soc504 but essentially threeapproaches: analytic, parametric simulation, non-parametricsimulation.

Analytic method involve finding the quantity and using the propertiesof variance and covariances to calculate a confidence interval. Can bechallenging for complicated quantities.

Parametric simulation uses the estimated coefficients andvariance-covariance matrix to simulate outcomes (we will cover this inSoc504)

Non-parametric simulation uses the bootstrap. Calculate the quantityof interest within each bootstrap sample and then aggregate to get anestimate of the variance.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 34 / 117

Page 177: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How to Calculate Quantities of Interest

We will discuss more general recipes in Soc504 but essentially threeapproaches: analytic, parametric simulation, non-parametricsimulation.

Analytic method involve finding the quantity and using the propertiesof variance and covariances to calculate a confidence interval. Can bechallenging for complicated quantities.

Parametric simulation uses the estimated coefficients andvariance-covariance matrix to simulate outcomes (we will cover this inSoc504)

Non-parametric simulation uses the bootstrap. Calculate the quantityof interest within each bootstrap sample and then aggregate to get anestimate of the variance.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 34 / 117

Page 178: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

How to Calculate Quantities of Interest

We will discuss more general recipes in Soc504 but essentially threeapproaches: analytic, parametric simulation, non-parametricsimulation.

Analytic method involve finding the quantity and using the propertiesof variance and covariances to calculate a confidence interval. Can bechallenging for complicated quantities.

Parametric simulation uses the estimated coefficients andvariance-covariance matrix to simulate outcomes (we will cover this inSoc504)

Non-parametric simulation uses the bootstrap. Calculate the quantityof interest within each bootstrap sample and then aggregate to get anestimate of the variance.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 34 / 117

Page 179: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Looking at data

- Most basic method of inference

- Hardest method of inference art

- Why visualize?

Four (related) reasons

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 35 / 117

Page 180: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Looking at data

- Most basic method of inference

- Hardest method of inference art

- Why visualize?

Four (related) reasons

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 35 / 117

Page 181: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Looking at data

- Most basic method of inference

- Hardest method of inference

art

- Why visualize?

Four (related) reasons

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 35 / 117

Page 182: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Looking at data

- Most basic method of inference

- Hardest method of inference art

- Why visualize?

Four (related) reasons

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 35 / 117

Page 183: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Looking at data

- Most basic method of inference

- Hardest method of inference art

- Why visualize?

Four (related) reasons

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 35 / 117

Page 184: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Visualization

Looking at data

- Most basic method of inference

- Hardest method of inference art

- Why visualize?

Four (related) reasons

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 35 / 117

Page 185: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Reason 1: Models LieRemember anscombe’s quartet?

4 6 8 10 12 14

45

67

89

1011

x

y

0.82

●●

4 6 8 10 12 14

34

56

78

9

x

y

0.82

4 6 8 10 12 14

68

1012

x

y

0.82

●●

8 10 12 14 16 18

68

1012

x

y

0.82

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 36 / 117

Page 186: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Reason 2: Delivery of Informationx y

1 -10.00000 -95.890032 -9.00000 -82.657203 -8.00000 -62.426554 -7.00000 -40.879585 -6.00000 -26.674746 -5.00000 -22.436077 -4.00000 -24.556638 -3.00000 -16.215679 -2.00000 -5.69815

10 -1.00000 5.8026611 0.00000 -6.3636612 1.00000 0.8360113 2.00000 4.1015014 3.00000 0.7934915 4.00000 -19.6315216 5.00000 -17.7679517 6.00000 -41.6158718 7.00000 -43.4915919 8.00000 -68.3698120 9.00000 -88.8633921 10.00000 -101.64692

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 37 / 117

Page 187: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Reason 2: Delivery of Information

●●

−10 −5 0 5 10

−10

0−

80−

60−

40−

200

x

y

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 37 / 117

Page 188: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Reason 3: Model Skepticism

Reason 3: Model Skepticism and Checking

All inferences rest on assumption–visualization is a particularly reliablemethod for identifying obvious violations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 38 / 117

Page 189: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Reason 3: Model Skepticism

Reason 3: Model Skepticism and CheckingAll inferences rest on assumption–visualization is a particularly reliablemethod for identifying obvious violations

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 38 / 117

Page 190: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Reason 4: Presentation and Persuasion

0.4 0.5 0.6 0.7

−0.

4−

0.2

0.0

0.2

0.4

Two Party Vote Share, Bush

Pro

p C

redi

t Cla

imin

g −

Pro

p. P

ositi

on T

akin

g

Example from Justin Grimmer’s work.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 39 / 117

Page 191: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Reason 4: Presentation and Persuasion

0.4 0.5 0.6 0.7

−0.

4−

0.2

0.0

0.2

0.4

Two Party Vote Share, Bush

Pro

p C

redi

t Cla

imin

g −

Pro

p. P

ositi

on T

akin

g

Lincoln Chafee, (R-RI)

33% Credit Claiming

26% Position Taking

Airport Grants

Example from Justin Grimmer’s work.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 39 / 117

Page 192: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Reason 4: Presentation and Persuasion

0.4 0.5 0.6 0.7

−0.

4−

0.2

0.0

0.2

0.4

Two Party Vote Share, Bush

Pro

p C

redi

t Cla

imin

g −

Pro

p. P

ositi

on T

akin

g

●●

Susan Collins, (R-RI)

29% Credit Claiming

20% Position Taking

Airport Grants

Example from Justin Grimmer’s work.Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 39 / 117

Page 193: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Reason 4: Presentation and Persuasion

0.4 0.5 0.6 0.7

−0.

4−

0.2

0.0

0.2

0.4

Two Party Vote Share, Bush

Pro

p C

redi

t Cla

imin

g −

Pro

p. P

ositi

on T

akin

g

●●

Mike Enzi, (R-WY)

18% Credit Claiming

35% Position Taking

Tax Reform

Example from Justin Grimmer’s work.Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 39 / 117

Page 194: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Reason 4: Presentation and Persuasion

0.4 0.5 0.6 0.7

−0.

4−

0.2

0.0

0.2

0.4

Two Party Vote Share, Bush

Pro

p C

redi

t Cla

imin

g −

Pro

p. P

ositi

on T

akin

g

●●

Example from Justin Grimmer’s work.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 39 / 117

Page 195: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Reason 4: Presentation and Persuasion

0.4 0.5 0.6 0.7

−0.

4−

0.2

0.0

0.2

0.4

Two Party Vote Share, Bush

Pro

p C

redi

t Cla

imin

g −

Pro

p. P

ositi

on T

akin

g

●●

John Tester, (D-MT)

43% Credit Claiming

20% Position Taking

Water Grants

Example from Justin Grimmer’s work.Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 39 / 117

Page 196: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Reason 4: Presentation and Persuasion

0.4 0.5 0.6 0.7

−0.

4−

0.2

0.0

0.2

0.4

Two Party Vote Share, Bush

Pro

p C

redi

t Cla

imin

g −

Pro

p. P

ositi

on T

akin

g

●●

●Sheldon Whitehouse, (D-RI)

8% Credit Claiming

54% Position Taking

Iraq War

Example from Justin Grimmer’s work.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 39 / 117

Page 197: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power, p-values and quantities of interest

There are a number of problems with p-values and significance testingas currently applied

At the core of these problem is that p-values don’t mean quite whatwe want them to mean

A solution? Generate quantities of interest connected to your theoryand present those.

Tools such as the bootstrap and simulation techniques describedtoday can help get the right quantities. We will expand on this nextsemester.

Visualizations can serve many purposes including a compelling way topresent these quantities.

Many of these concerns have motivated a turn towards causalinference

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 40 / 117

Page 198: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power, p-values and quantities of interest

There are a number of problems with p-values and significance testingas currently applied

At the core of these problem is that p-values don’t mean quite whatwe want them to mean

A solution? Generate quantities of interest connected to your theoryand present those.

Tools such as the bootstrap and simulation techniques describedtoday can help get the right quantities. We will expand on this nextsemester.

Visualizations can serve many purposes including a compelling way topresent these quantities.

Many of these concerns have motivated a turn towards causalinference

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 40 / 117

Page 199: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power, p-values and quantities of interest

There are a number of problems with p-values and significance testingas currently applied

At the core of these problem is that p-values don’t mean quite whatwe want them to mean

A solution? Generate quantities of interest connected to your theoryand present those.

Tools such as the bootstrap and simulation techniques describedtoday can help get the right quantities. We will expand on this nextsemester.

Visualizations can serve many purposes including a compelling way topresent these quantities.

Many of these concerns have motivated a turn towards causalinference

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 40 / 117

Page 200: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power, p-values and quantities of interest

There are a number of problems with p-values and significance testingas currently applied

At the core of these problem is that p-values don’t mean quite whatwe want them to mean

A solution? Generate quantities of interest connected to your theoryand present those.

Tools such as the bootstrap and simulation techniques describedtoday can help get the right quantities. We will expand on this nextsemester.

Visualizations can serve many purposes including a compelling way topresent these quantities.

Many of these concerns have motivated a turn towards causalinference

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 40 / 117

Page 201: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power, p-values and quantities of interest

There are a number of problems with p-values and significance testingas currently applied

At the core of these problem is that p-values don’t mean quite whatwe want them to mean

A solution? Generate quantities of interest connected to your theoryand present those.

Tools such as the bootstrap and simulation techniques describedtoday can help get the right quantities. We will expand on this nextsemester.

Visualizations can serve many purposes including a compelling way topresent these quantities.

Many of these concerns have motivated a turn towards causalinference

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 40 / 117

Page 202: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power, p-values and quantities of interest

There are a number of problems with p-values and significance testingas currently applied

At the core of these problem is that p-values don’t mean quite whatwe want them to mean

A solution? Generate quantities of interest connected to your theoryand present those.

Tools such as the bootstrap and simulation techniques describedtoday can help get the right quantities. We will expand on this nextsemester.

Visualizations can serve many purposes including a compelling way topresent these quantities.

Many of these concerns have motivated a turn towards causalinference

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 40 / 117

Page 203: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Power, p-values and quantities of interest

There are a number of problems with p-values and significance testingas currently applied

At the core of these problem is that p-values don’t mean quite whatwe want them to mean

A solution? Generate quantities of interest connected to your theoryand present those.

Tools such as the bootstrap and simulation techniques describedtoday can help get the right quantities. We will expand on this nextsemester.

Visualizations can serve many purposes including a compelling way topresent these quantities.

Many of these concerns have motivated a turn towards causalinference

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 40 / 117

Page 204: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 41 / 117

Page 205: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 41 / 117

Page 206: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What is Causal Inference?

A causal inference is a statement about counterfactuals

The difference between prediction and causal inference is theintervention on the system under study

Like it or not, social science theories are almost always expressed ascausal claims: e.g. “an increase in X causes an increase in Y ”

The study of causal inference helps us understand the assumptions weneed to make this kind of claim.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 42 / 117

Page 207: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What is Causal Inference?

A causal inference is a statement about counterfactuals

The difference between prediction and causal inference is theintervention on the system under study

Like it or not, social science theories are almost always expressed ascausal claims: e.g. “an increase in X causes an increase in Y ”

The study of causal inference helps us understand the assumptions weneed to make this kind of claim.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 42 / 117

Page 208: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What is Causal Inference?

A causal inference is a statement about counterfactuals

The difference between prediction and causal inference is theintervention on the system under study

Like it or not, social science theories are almost always expressed ascausal claims: e.g. “an increase in X causes an increase in Y ”

The study of causal inference helps us understand the assumptions weneed to make this kind of claim.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 42 / 117

Page 209: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What is Causal Inference?

A causal inference is a statement about counterfactuals

The difference between prediction and causal inference is theintervention on the system under study

Like it or not, social science theories are almost always expressed ascausal claims: e.g. “an increase in X causes an increase in Y ”

The study of causal inference helps us understand the assumptions weneed to make this kind of claim.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 42 / 117

Page 210: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What is Causal Inference?

A causal inference is a statement about counterfactuals

The difference between prediction and causal inference is theintervention on the system under study

Like it or not, social science theories are almost always expressed ascausal claims: e.g. “an increase in X causes an increase in Y ”

The study of causal inference helps us understand the assumptions weneed to make this kind of claim.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 42 / 117

Page 211: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Identification

A quantity of interest is identified when access to infinite data wouldresult in the estimate taking on only a single value

For example, having all dummy variables in a linear model is notstatistically identified because they cannot be distinguished from theintercept.

Causal identification is what we can learn about a causal effect fromavailable data.

If an effect is not identified, no estimation method will recover it.

This means the relevant question is “what’s your identificationstrategy?” or what are the set of assumptions that let you claimyou’ve estimated a causal effect?

As we will see this is not a conversation about estimation (in otherwords the answer cannot be “regression”)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 43 / 117

Page 212: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Identification

A quantity of interest is identified when access to infinite data wouldresult in the estimate taking on only a single value

For example, having all dummy variables in a linear model is notstatistically identified because they cannot be distinguished from theintercept.

Causal identification is what we can learn about a causal effect fromavailable data.

If an effect is not identified, no estimation method will recover it.

This means the relevant question is “what’s your identificationstrategy?” or what are the set of assumptions that let you claimyou’ve estimated a causal effect?

As we will see this is not a conversation about estimation (in otherwords the answer cannot be “regression”)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 43 / 117

Page 213: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Identification

A quantity of interest is identified when access to infinite data wouldresult in the estimate taking on only a single value

For example, having all dummy variables in a linear model is notstatistically identified because they cannot be distinguished from theintercept.

Causal identification is what we can learn about a causal effect fromavailable data.

If an effect is not identified, no estimation method will recover it.

This means the relevant question is “what’s your identificationstrategy?” or what are the set of assumptions that let you claimyou’ve estimated a causal effect?

As we will see this is not a conversation about estimation (in otherwords the answer cannot be “regression”)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 43 / 117

Page 214: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Identification

A quantity of interest is identified when access to infinite data wouldresult in the estimate taking on only a single value

For example, having all dummy variables in a linear model is notstatistically identified because they cannot be distinguished from theintercept.

Causal identification is what we can learn about a causal effect fromavailable data.

If an effect is not identified, no estimation method will recover it.

This means the relevant question is “what’s your identificationstrategy?” or what are the set of assumptions that let you claimyou’ve estimated a causal effect?

As we will see this is not a conversation about estimation (in otherwords the answer cannot be “regression”)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 43 / 117

Page 215: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Identification

A quantity of interest is identified when access to infinite data wouldresult in the estimate taking on only a single value

For example, having all dummy variables in a linear model is notstatistically identified because they cannot be distinguished from theintercept.

Causal identification is what we can learn about a causal effect fromavailable data.

If an effect is not identified, no estimation method will recover it.

This means the relevant question is “what’s your identificationstrategy?” or what are the set of assumptions that let you claimyou’ve estimated a causal effect?

As we will see this is not a conversation about estimation (in otherwords the answer cannot be “regression”)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 43 / 117

Page 216: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Identification

A quantity of interest is identified when access to infinite data wouldresult in the estimate taking on only a single value

For example, having all dummy variables in a linear model is notstatistically identified because they cannot be distinguished from theintercept.

Causal identification is what we can learn about a causal effect fromavailable data.

If an effect is not identified, no estimation method will recover it.

This means the relevant question is “what’s your identificationstrategy?” or what are the set of assumptions that let you claimyou’ve estimated a causal effect?

As we will see this is not a conversation about estimation (in otherwords the answer cannot be “regression”)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 43 / 117

Page 217: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Identification

A quantity of interest is identified when access to infinite data wouldresult in the estimate taking on only a single value

For example, having all dummy variables in a linear model is notstatistically identified because they cannot be distinguished from theintercept.

Causal identification is what we can learn about a causal effect fromavailable data.

If an effect is not identified, no estimation method will recover it.

This means the relevant question is “what’s your identificationstrategy?” or what are the set of assumptions that let you claimyou’ve estimated a causal effect?

As we will see this is not a conversation about estimation (in otherwords the answer cannot be “regression”)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 43 / 117

Page 218: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Identification vs. Estimation

Identification: How much can you learn about the estimand if youhave an infinite amount of data?

Estimation: How much can you learn about the estimand from afinite sample?

Identification precedes estimation

The role of assumptions:

Often identification requires (hopefully minimal) assumptions

Even when identification is possible, estimation may imposeadditional assumptions (i.e. regression)

Law of Decreasing Credibility (Manski): The credibility of inferencedecreases with the strength of the assumptions maintained

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 44 / 117

Page 219: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Identification vs. Estimation

Identification: How much can you learn about the estimand if youhave an infinite amount of data?

Estimation: How much can you learn about the estimand from afinite sample?

Identification precedes estimation

The role of assumptions:

Often identification requires (hopefully minimal) assumptions

Even when identification is possible, estimation may imposeadditional assumptions (i.e. regression)

Law of Decreasing Credibility (Manski): The credibility of inferencedecreases with the strength of the assumptions maintained

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 44 / 117

Page 220: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Identification vs. Estimation

Identification: How much can you learn about the estimand if youhave an infinite amount of data?

Estimation: How much can you learn about the estimand from afinite sample?

Identification precedes estimation

The role of assumptions:

Often identification requires (hopefully minimal) assumptions

Even when identification is possible, estimation may imposeadditional assumptions (i.e. regression)

Law of Decreasing Credibility (Manski): The credibility of inferencedecreases with the strength of the assumptions maintained

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 44 / 117

Page 221: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Identification vs. Estimation

Identification: How much can you learn about the estimand if youhave an infinite amount of data?

Estimation: How much can you learn about the estimand from afinite sample?

Identification precedes estimation

The role of assumptions:

Often identification requires (hopefully minimal) assumptions

Even when identification is possible, estimation may imposeadditional assumptions (i.e. regression)

Law of Decreasing Credibility (Manski): The credibility of inferencedecreases with the strength of the assumptions maintained

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 44 / 117

Page 222: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Identification vs. Estimation

Identification: How much can you learn about the estimand if youhave an infinite amount of data?

Estimation: How much can you learn about the estimand from afinite sample?

Identification precedes estimation

The role of assumptions:

Often identification requires (hopefully minimal) assumptions

Even when identification is possible, estimation may imposeadditional assumptions (i.e. regression)

Law of Decreasing Credibility (Manski): The credibility of inferencedecreases with the strength of the assumptions maintained

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 44 / 117

Page 223: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Next Time

Next class we will talk about what kind of identification assumptionswe need to estimate causal effects!

Causal inference is tricky and I highly recommend you take a look atMorgan and Winship Chapter 1 before class.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 45 / 117

Page 224: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Next Time

Next class we will talk about what kind of identification assumptionswe need to estimate causal effects!

Causal inference is tricky and I highly recommend you take a look atMorgan and Winship Chapter 1 before class.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 45 / 117

Page 225: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Next Time

Next class we will talk about what kind of identification assumptionswe need to estimate causal effects!

Causal inference is tricky and I highly recommend you take a look atMorgan and Winship Chapter 1 before class.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 45 / 117

Page 226: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fun with Censorship

Often you don’t need sophisticated methods to reveal interestingfindings

“Ansolabahere’s Law”: real relationship is visible in a bivariate plotand remains in a more sophisticated in a statistical model

In other words: all inferences require both visual and mathematicalevidence

Example: King, Pan and Roberts (2013) “How Censorship in ChinaAllows Government Criticism but Silences Collective Expression”American Political Science Review

They use very simple (statistical) methods to great effect.

This line of work is one of my favorites.

Sequence of slides that follow courtesy of King, Pan and Roberts

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 46 / 117

Page 227: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fun with Censorship

Often you don’t need sophisticated methods to reveal interestingfindings

“Ansolabahere’s Law”: real relationship is visible in a bivariate plotand remains in a more sophisticated in a statistical model

In other words: all inferences require both visual and mathematicalevidence

Example: King, Pan and Roberts (2013) “How Censorship in ChinaAllows Government Criticism but Silences Collective Expression”American Political Science Review

They use very simple (statistical) methods to great effect.

This line of work is one of my favorites.

Sequence of slides that follow courtesy of King, Pan and Roberts

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 46 / 117

Page 228: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fun with Censorship

Often you don’t need sophisticated methods to reveal interestingfindings

“Ansolabahere’s Law”: real relationship is visible in a bivariate plotand remains in a more sophisticated in a statistical model

In other words: all inferences require both visual and mathematicalevidence

Example: King, Pan and Roberts (2013) “How Censorship in ChinaAllows Government Criticism but Silences Collective Expression”American Political Science Review

They use very simple (statistical) methods to great effect.

This line of work is one of my favorites.

Sequence of slides that follow courtesy of King, Pan and Roberts

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 46 / 117

Page 229: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fun with Censorship

Often you don’t need sophisticated methods to reveal interestingfindings

“Ansolabahere’s Law”: real relationship is visible in a bivariate plotand remains in a more sophisticated in a statistical model

In other words: all inferences require both visual and mathematicalevidence

Example: King, Pan and Roberts (2013) “How Censorship in ChinaAllows Government Criticism but Silences Collective Expression”American Political Science Review

They use very simple (statistical) methods to great effect.

This line of work is one of my favorites.

Sequence of slides that follow courtesy of King, Pan and Roberts

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 46 / 117

Page 230: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fun with Censorship

Often you don’t need sophisticated methods to reveal interestingfindings

“Ansolabahere’s Law”: real relationship is visible in a bivariate plotand remains in a more sophisticated in a statistical model

In other words: all inferences require both visual and mathematicalevidence

Example: King, Pan and Roberts (2013) “How Censorship in ChinaAllows Government Criticism but Silences Collective Expression”American Political Science Review

They use very simple (statistical) methods to great effect.

This line of work is one of my favorites.

Sequence of slides that follow courtesy of King, Pan and Roberts

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 46 / 117

Page 231: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fun with Censorship

Often you don’t need sophisticated methods to reveal interestingfindings

“Ansolabahere’s Law”: real relationship is visible in a bivariate plotand remains in a more sophisticated in a statistical model

In other words: all inferences require both visual and mathematicalevidence

Example: King, Pan and Roberts (2013) “How Censorship in ChinaAllows Government Criticism but Silences Collective Expression”American Political Science Review

They use very simple (statistical) methods to great effect.

This line of work is one of my favorites.

Sequence of slides that follow courtesy of King, Pan and Roberts

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 46 / 117

Page 232: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fun with Censorship

Often you don’t need sophisticated methods to reveal interestingfindings

“Ansolabahere’s Law”: real relationship is visible in a bivariate plotand remains in a more sophisticated in a statistical model

In other words: all inferences require both visual and mathematicalevidence

Example: King, Pan and Roberts (2013) “How Censorship in ChinaAllows Government Criticism but Silences Collective Expression”American Political Science Review

They use very simple (statistical) methods to great effect.

This line of work is one of my favorites.

Sequence of slides that follow courtesy of King, Pan and Roberts

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 46 / 117

Page 233: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history

:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop collective action

Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 234: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop collective action

Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 235: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop collective action

Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 236: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop criticism of the state

2 Stop collective action

Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 237: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop criticism of the state

2 Stop collective action

Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 238: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop criticism of the state

2 Stop collective action

Right

Either or both could be right or wrong.

(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 239: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop criticism of the state Wrong

2 Stop collective action

Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 240: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop criticism of the state Wrong

2 Stop collective action Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 241: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop criticism of the state Wrong

2 Stop collective action Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 242: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop criticism of the state Wrong

2 Stop collective action Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 243: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop criticism of the state Wrong

2 Stop collective action Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 244: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop criticism of the state Wrong

2 Stop collective action Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 245: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop criticism of the state Wrong

2 Stop collective action Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 246: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop criticism of the state Wrong

2 Stop collective action Right

Either or both could be right or wrong.(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 247: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Chinese Censorship

The largest selective suppression of human expression in history:

implemented manually,

by ≈ 200, 000 workers,

located in government and inside social media firms

Theories of the Goal of Censorship

1 Stop criticism of the state Wrong

2 Stop collective action Right

Either or both could be right or wrong.

(They also censor 2 other smaller categories)

Benefit

?

Huge

Cost

Huge

Small

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 47 / 117

Page 248: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Study

Collect 3,674,698 social media posts in 85 topic areas over 6 months

Random sample: 127,283

(Repeat design; Total analyzed: 11,382,221)

For each post (on a timeline in one of 85 content areas):

I Download content the instant it appearsI (Carefully) revisit each later to determine if it was censoredI Use computer-assisted methods of text analysis (some existing, some

new, all adapted to Chinese)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 48 / 117

Page 249: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Study

Collect 3,674,698 social media posts in 85 topic areas over 6 months

Random sample: 127,283

(Repeat design; Total analyzed: 11,382,221)

For each post (on a timeline in one of 85 content areas):

I Download content the instant it appearsI (Carefully) revisit each later to determine if it was censoredI Use computer-assisted methods of text analysis (some existing, some

new, all adapted to Chinese)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 48 / 117

Page 250: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Study

Collect 3,674,698 social media posts in 85 topic areas over 6 months

Random sample: 127,283

(Repeat design; Total analyzed: 11,382,221)

For each post (on a timeline in one of 85 content areas):

I Download content the instant it appearsI (Carefully) revisit each later to determine if it was censoredI Use computer-assisted methods of text analysis (some existing, some

new, all adapted to Chinese)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 48 / 117

Page 251: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Study

Collect 3,674,698 social media posts in 85 topic areas over 6 months

Random sample: 127,283

(Repeat design; Total analyzed: 11,382,221)

For each post (on a timeline in one of 85 content areas):

I Download content the instant it appearsI (Carefully) revisit each later to determine if it was censoredI Use computer-assisted methods of text analysis (some existing, some

new, all adapted to Chinese)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 48 / 117

Page 252: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Study

Collect 3,674,698 social media posts in 85 topic areas over 6 months

Random sample: 127,283

(Repeat design; Total analyzed: 11,382,221)

For each post (on a timeline in one of 85 content areas):

I Download content the instant it appearsI (Carefully) revisit each later to determine if it was censoredI Use computer-assisted methods of text analysis (some existing, some

new, all adapted to Chinese)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 48 / 117

Page 253: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Study

Collect 3,674,698 social media posts in 85 topic areas over 6 months

Random sample: 127,283

(Repeat design; Total analyzed: 11,382,221)

For each post (on a timeline in one of 85 content areas):I Download content the instant it appears

I (Carefully) revisit each later to determine if it was censoredI Use computer-assisted methods of text analysis (some existing, some

new, all adapted to Chinese)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 48 / 117

Page 254: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Study

Collect 3,674,698 social media posts in 85 topic areas over 6 months

Random sample: 127,283

(Repeat design; Total analyzed: 11,382,221)

For each post (on a timeline in one of 85 content areas):I Download content the instant it appearsI (Carefully) revisit each later to determine if it was censored

I Use computer-assisted methods of text analysis (some existing, somenew, all adapted to Chinese)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 48 / 117

Page 255: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Study

Collect 3,674,698 social media posts in 85 topic areas over 6 months

Random sample: 127,283

(Repeat design; Total analyzed: 11,382,221)

For each post (on a timeline in one of 85 content areas):I Download content the instant it appearsI (Carefully) revisit each later to determine if it was censoredI Use computer-assisted methods of text analysis (some existing, some

new, all adapted to Chinese)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 48 / 117

Page 256: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Censorship is not Ambiguous: BBS Error Page

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 49 / 117

Page 257: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

For 2 Unusual Topics: Constant Censorship Effort

05

1015

Count

Jan Mar Apr May Jun

Count PublishedCount Censored

Count PublishedCount Censored

010

2030

4050

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored

Students Throw Shoes at

Fang BinXing

Pornography Criticism of the Censors

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 50 / 117

Page 258: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

For 2 Unusual Topics: Constant Censorship Effort0

510

15

Count

Jan Mar Apr May Jun

Count PublishedCount Censored

Count PublishedCount Censored

010

2030

4050

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored

Students Throw Shoes at

Fang BinXing

Pornography Criticism of the Censors

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 50 / 117

Page 259: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”

010

2030

4050

6070

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:

I volume burstI (≈ 3 SDs greater

than baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 260: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”0

1020

3040

5060

70

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:

I volume burstI (≈ 3 SDs greater

than baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 261: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”0

1020

3040

5060

70

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:

I volume burstI (≈ 3 SDs greater

than baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 262: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”0

1020

3040

5060

70

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:I volume burst

I (≈ 3 SDs greaterthan baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 263: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”0

1020

3040

5060

70

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:I volume burstI (≈ 3 SDs greater

than baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 264: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”0

1020

3040

5060

70

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:I volume burstI (≈ 3 SDs greater

than baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 265: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”0

1020

3040

5060

70

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:I volume burstI (≈ 3 SDs greater

than baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 266: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”0

1020

3040

5060

70

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:I volume burstI (≈ 3 SDs greater

than baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 267: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”0

1020

3040

5060

70

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:I volume burstI (≈ 3 SDs greater

than baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 268: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”0

1020

3040

5060

70

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:I volume burstI (≈ 3 SDs greater

than baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 269: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”0

1020

3040

5060

70

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:I volume burstI (≈ 3 SDs greater

than baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 270: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”0

1020

3040

5060

70

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:I volume burstI (≈ 3 SDs greater

than baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 271: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

All other topics: Censorship & Post Volume are “Bursty”0

1020

3040

5060

70

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored Riots in

Zengcheng

Unit of analysis:I volume burstI (≈ 3 SDs greater

than baseline volume)

They monitored 85 topicareas (Jan–July 2011)

Found 87 volume burstsin total

Identified real worldevents associated witheach burst

Their hypothesis: The government censors all posts in volume burstsassociated with events with collective action (regardless of how critical orsupportive of the state)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 51 / 117

Page 272: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 1: Post Volume

Begin with 87 volume bursts in 85 topics areas

For each burst, calculate change in % censorship inside to outsideeach volume burst within topic areas – censorship magnitude

If goal of censorship is to stop collective action, they expect:

1 On average, % censoredshould increase duringvolume bursts

2 Some bursts (associatedwith politically relevantevents) should havemuch higher censorship -0.2 0.0 0.2 0.4 0.6 0.8

01

23

45

Censorship Magnitude

Density

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 52 / 117

Page 273: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 1: Post Volume

Begin with 87 volume bursts in 85 topics areas

For each burst, calculate change in % censorship inside to outsideeach volume burst within topic areas – censorship magnitude

If goal of censorship is to stop collective action, they expect:

1 On average, % censoredshould increase duringvolume bursts

2 Some bursts (associatedwith politically relevantevents) should havemuch higher censorship -0.2 0.0 0.2 0.4 0.6 0.8

01

23

45

Censorship Magnitude

Density

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 52 / 117

Page 274: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 1: Post Volume

Begin with 87 volume bursts in 85 topics areas

For each burst, calculate change in % censorship inside to outsideeach volume burst within topic areas – censorship magnitude

If goal of censorship is to stop collective action, they expect:

1 On average, % censoredshould increase duringvolume bursts

2 Some bursts (associatedwith politically relevantevents) should havemuch higher censorship -0.2 0.0 0.2 0.4 0.6 0.8

01

23

45

Censorship Magnitude

Density

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 52 / 117

Page 275: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 1: Post Volume

Begin with 87 volume bursts in 85 topics areas

For each burst, calculate change in % censorship inside to outsideeach volume burst within topic areas – censorship magnitude

If goal of censorship is to stop collective action, they expect:

1 On average, % censoredshould increase duringvolume bursts

2 Some bursts (associatedwith politically relevantevents) should havemuch higher censorship -0.2 0.0 0.2 0.4 0.6 0.8

01

23

45

Censorship Magnitude

Density

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 52 / 117

Page 276: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 1: Post Volume

Begin with 87 volume bursts in 85 topics areas

For each burst, calculate change in % censorship inside to outsideeach volume burst within topic areas – censorship magnitude

If goal of censorship is to stop collective action, they expect:

1 On average, % censoredshould increase duringvolume bursts

2 Some bursts (associatedwith politically relevantevents) should havemuch higher censorship -0.2 0.0 0.2 0.4 0.6 0.8

01

23

45

Censorship Magnitude

Density

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 52 / 117

Page 277: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 1: Post Volume

Begin with 87 volume bursts in 85 topics areas

For each burst, calculate change in % censorship inside to outsideeach volume burst within topic areas – censorship magnitude

If goal of censorship is to stop collective action, they expect:

1 On average, % censoredshould increase duringvolume bursts

2 Some bursts (associatedwith politically relevantevents) should havemuch higher censorship

-0.2 0.0 0.2 0.4 0.6 0.8

01

23

45

Censorship Magnitude

Density

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 52 / 117

Page 278: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 1: Post Volume

Begin with 87 volume bursts in 85 topics areas

For each burst, calculate change in % censorship inside to outsideeach volume burst within topic areas – censorship magnitude

If goal of censorship is to stop collective action, they expect:

1 On average, % censoredshould increase duringvolume bursts

2 Some bursts (associatedwith politically relevantevents) should havemuch higher censorship -0.2 0.0 0.2 0.4 0.6 0.8

01

23

45

Censorship Magnitude

Density

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 52 / 117

Page 279: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1

I protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2

3

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 280: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1

I protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2

3

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 281: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action Potential

I protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2

3

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 282: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action PotentialI protest or organized crowd formation outside the Internet

I individuals who have organized or incited collective action on theground in the past;

I topics related to nationalism or nationalist sentiment that have incitedprotest or collective action in the past.

2

3

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 283: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action PotentialI protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;

I topics related to nationalism or nationalist sentiment that have incitedprotest or collective action in the past.

2

3

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 284: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action PotentialI protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2

3

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 285: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action PotentialI protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2 Criticism of censors

3

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 286: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action PotentialI protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2 Criticism of censors

3 Pornography

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 287: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action PotentialI protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2 Criticism of censors

3 Pornography

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 288: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action PotentialI protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2 Criticism of censors

3 Pornography

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 289: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action PotentialI protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2 Criticism of censors

3 Pornography

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 290: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action PotentialI protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2 Criticism of censors

3 Pornography

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 291: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action PotentialI protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2 Criticism of censors

3 Pornography

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 292: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action PotentialI protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2 Criticism of censors

3 Pornography

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 293: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Observational Test 2: The Event Generating VolumeBursts

Event classification (each category can be +, −, or neutral commentsabout the state)

1 Collective Action PotentialI protest or organized crowd formation outside the InternetI individuals who have organized or incited collective action on the

ground in the past;I topics related to nationalism or nationalist sentiment that have incited

protest or collective action in the past.

2 Criticism of censors

3 Pornography

4 (Other) News

5 Government Policies

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 53 / 117

Page 294: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Types of Events Are Censored?

Censorship Magnitude

Density

02

46

810

12

-0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

Collective ActionCriticism of CensorsPornography

PolicyNews

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 54 / 117

Page 295: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Types of Events Are Censored?

Censorship Magnitude

Popular Book Published in Audio FormatDisney Announced Theme ParkEPA Issues New Rules on LeadChinese Solar Company Announces EarningsChina Puts Nuclear Program on HoldGov't Increases Power PricesJon Hunstman Steps Down as Ambassador to ChinaNews About Iran Nuclear ProgramIndoor Smoking Ban Takes EffectPopular Video Game ReleasedEducation Reform for Migrant ChildrenFood Prices RiseU.S. Military Intervention in Libya

Censorship Magnitude

New Laws on Fifty Cent PartyRush to Buy Salt After Earthquake

Students Throw Shoes at Fang BinXingFuzhou Bombing

Localized Advocacy for Environment LotteryGoogle is Hacked

Collective Anger At Lead Poisoning in JiangsuAi Weiwei Arrested

Pornography Mentioning Popular BookZengcheng Protests

Baidu Copyright LawsuitPornography Disguised as News

Protests in Inner Mongolia

PoliciesNews

Collective ActionCriticism of CensorsPornography

-0.2 0 0.1 0.3 0.5 0.7

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 54 / 117

Page 296: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Censoring Collective Action: Ai Weiwei’s Arrest

010

2030

40

Count

Jan Mar Apr May Jun Jul

Count PublishedCount Censored

Ai Weiwei arrested

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 55 / 117

Page 297: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Censoring Collective Action: Protests in Inner Mongolia

05

1015

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored

Protests in Inner Mongolia

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 56 / 117

Page 298: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Low Censorship on One Child Policy

010

2030

40

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored

Speculation of

Policy Reversal at NPC

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 57 / 117

Page 299: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Low Censorship on News: Power Prices

010

2030

4050

6070

Count

Jan Feb Mar Apr May Jun Jul

Count PublishedCount Censored

Power shortages Gov't raises power prices

to curb demand

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 58 / 117

Page 300: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

References

This Lecture:

Gelman and Carlin (2014). “Beyond Power Calculations: AssessingType S (Sign) and Type M (Magnitude) Errors”

Kastellec and Leoni (2007). “Using Graphs Instead of Tables inPolitical Science.” Perspectives on Politics

King, Pan and Roberts (2013) “How Censorship in China AllowsGovernment Criticism but Silences Collective Expression”

King, G. Tomz, M., and Wittenberg, J. (2000). “Making the Most ofStatistical Analyses: Improving Interpretation and Presentation.”American Journal of Political Science

Nunzo, R. (2014) “Scientific method: Statistical errors” Nature.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 59 / 117

Page 301: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 60 / 117

Page 302: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 60 / 117

Page 303: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This Week

I Monday:F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 60 / 117

Page 304: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 60 / 117

Page 305: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 60 / 117

Page 306: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next Week

I regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 60 / 117

Page 307: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 60 / 117

Page 308: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Where We’ve Been and Where We’re Going...

Last WeekI matrix form of linear regressionI inference and F-tests

This WeekI Monday:

F making an argument in social sciences

I Wednesday:F causal inference

Next WeekI regression diagnostics

Long RunI regression → diagnostics → causal inference

Questions?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 60 / 117

Page 309: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 61 / 117

Page 310: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 61 / 117

Page 311: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causation

What’s a cause?

- Time precedence

- Constant Conjunction

- Correlation is not causation- We will see that correlation based definitions can lead to consideration

of nonsensical causes

- Method of difference

- Identify identical units, except for treatment. Attribute cause todifference

- Granger Cause

- Forecast based definition of cause- Problem of common causes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 62 / 117

Page 312: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causation

What’s a cause?

- Time precedence

- Constant Conjunction

- Correlation is not causation- We will see that correlation based definitions can lead to consideration

of nonsensical causes

- Method of difference

- Identify identical units, except for treatment. Attribute cause todifference

- Granger Cause

- Forecast based definition of cause- Problem of common causes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 62 / 117

Page 313: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causation

What’s a cause?

- Time precedence

- Constant Conjunction

- Correlation is not causation- We will see that correlation based definitions can lead to consideration

of nonsensical causes

- Method of difference

- Identify identical units, except for treatment. Attribute cause todifference

- Granger Cause

- Forecast based definition of cause- Problem of common causes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 62 / 117

Page 314: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causation

What’s a cause?

- Time precedence

- Constant Conjunction

- Correlation is not causation

- We will see that correlation based definitions can lead to considerationof nonsensical causes

- Method of difference

- Identify identical units, except for treatment. Attribute cause todifference

- Granger Cause

- Forecast based definition of cause- Problem of common causes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 62 / 117

Page 315: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causation

What’s a cause?

- Time precedence

- Constant Conjunction

- Correlation is not causation- We will see that correlation based definitions can lead to consideration

of nonsensical causes

- Method of difference

- Identify identical units, except for treatment. Attribute cause todifference

- Granger Cause

- Forecast based definition of cause- Problem of common causes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 62 / 117

Page 316: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causation

What’s a cause?

- Time precedence

- Constant Conjunction

- Correlation is not causation- We will see that correlation based definitions can lead to consideration

of nonsensical causes

- Method of difference

- Identify identical units, except for treatment. Attribute cause todifference

- Granger Cause

- Forecast based definition of cause- Problem of common causes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 62 / 117

Page 317: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causation

What’s a cause?

- Time precedence

- Constant Conjunction

- Correlation is not causation- We will see that correlation based definitions can lead to consideration

of nonsensical causes

- Method of difference

- Identify identical units, except for treatment. Attribute cause todifference

- Granger Cause

- Forecast based definition of cause- Problem of common causes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 62 / 117

Page 318: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causation

What’s a cause?

- Time precedence

- Constant Conjunction

- Correlation is not causation- We will see that correlation based definitions can lead to consideration

of nonsensical causes

- Method of difference

- Identify identical units, except for treatment. Attribute cause todifference

- Granger Cause

- Forecast based definition of cause- Problem of common causes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 62 / 117

Page 319: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causation

What’s a cause?

- Time precedence

- Constant Conjunction

- Correlation is not causation- We will see that correlation based definitions can lead to consideration

of nonsensical causes

- Method of difference

- Identify identical units, except for treatment. Attribute cause todifference

- Granger Cause

- Forecast based definition of cause

- Problem of common causes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 62 / 117

Page 320: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causation

What’s a cause?

- Time precedence

- Constant Conjunction

- Correlation is not causation- We will see that correlation based definitions can lead to consideration

of nonsensical causes

- Method of difference

- Identify identical units, except for treatment. Attribute cause todifference

- Granger Cause

- Forecast based definition of cause- Problem of common causes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 62 / 117

Page 321: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causation

What’s a cause?

- Time precedence

- Constant Conjunction

- Correlation is not causation- We will see that correlation based definitions can lead to consideration

of nonsensical causes

- Method of difference

- Identify identical units, except for treatment. Attribute cause todifference

- Granger Cause

- Forecast based definition of cause- Problem of common causes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 62 / 117

Page 322: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 62 / 117

Page 323: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman-Rubin Potential Outcomes Model

Figure: Neyman

Figure: Rubin

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 63 / 117

Page 324: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman-Rubin Model

Two possible conditions:

- Treatment condition T = 1

- Control condition T = 0

Suppose that we have an individual i .Key assumption: we can imagine a world where individual i is assigned totreatment and control conditionsPotential Outcomes: responses under each condition, Yi (T )

- Response under treatment Yi (1)

- Response under control Yi (0)

Definition: no differences between treatment and control worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 64 / 117

Page 325: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman-Rubin Model

Two possible conditions:

- Treatment condition T = 1

- Control condition T = 0

Suppose that we have an individual i .Key assumption: we can imagine a world where individual i is assigned totreatment and control conditionsPotential Outcomes: responses under each condition, Yi (T )

- Response under treatment Yi (1)

- Response under control Yi (0)

Definition: no differences between treatment and control worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 64 / 117

Page 326: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman-Rubin Model

Two possible conditions:

- Treatment condition T = 1

- Control condition T = 0

Suppose that we have an individual i .Key assumption: we can imagine a world where individual i is assigned totreatment and control conditionsPotential Outcomes: responses under each condition, Yi (T )

- Response under treatment Yi (1)

- Response under control Yi (0)

Definition: no differences between treatment and control worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 64 / 117

Page 327: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman-Rubin Model

Two possible conditions:

- Treatment condition T = 1

- Control condition T = 0

Suppose that we have an individual i .

Key assumption: we can imagine a world where individual i is assigned totreatment and control conditionsPotential Outcomes: responses under each condition, Yi (T )

- Response under treatment Yi (1)

- Response under control Yi (0)

Definition: no differences between treatment and control worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 64 / 117

Page 328: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman-Rubin Model

Two possible conditions:

- Treatment condition T = 1

- Control condition T = 0

Suppose that we have an individual i .Key assumption: we can imagine a world where individual i is assigned totreatment and control conditions

Potential Outcomes: responses under each condition, Yi (T )

- Response under treatment Yi (1)

- Response under control Yi (0)

Definition: no differences between treatment and control worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 64 / 117

Page 329: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman-Rubin Model

Two possible conditions:

- Treatment condition T = 1

- Control condition T = 0

Suppose that we have an individual i .Key assumption: we can imagine a world where individual i is assigned totreatment and control conditionsPotential Outcomes: responses under each condition, Yi (T )

- Response under treatment Yi (1)

- Response under control Yi (0)

Definition: no differences between treatment and control worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 64 / 117

Page 330: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman-Rubin Model

Two possible conditions:

- Treatment condition T = 1

- Control condition T = 0

Suppose that we have an individual i .Key assumption: we can imagine a world where individual i is assigned totreatment and control conditionsPotential Outcomes: responses under each condition, Yi (T )

- Response under treatment Yi (1)

- Response under control Yi (0)

Definition: no differences between treatment and control worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 64 / 117

Page 331: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman-Rubin Model

Two possible conditions:

- Treatment condition T = 1

- Control condition T = 0

Suppose that we have an individual i .Key assumption: we can imagine a world where individual i is assigned totreatment and control conditionsPotential Outcomes: responses under each condition, Yi (T )

- Response under treatment Yi (1)

- Response under control Yi (0)

Definition: no differences between treatment and control worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 64 / 117

Page 332: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman-Rubin Model

Two possible conditions:

- Treatment condition T = 1

- Control condition T = 0

Suppose that we have an individual i .Key assumption: we can imagine a world where individual i is assigned totreatment and control conditionsPotential Outcomes: responses under each condition, Yi (T )

- Response under treatment Yi (1)

- Response under control Yi (0)

Definition: no differences between treatment and control worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 64 / 117

Page 333: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Concrete Example

Job Training Programs:

- Treatment: Receive job training (T = 1)

- Control: No Job training (T = 0)

Response: Income (Dollars per year)

Treatment (Yi (1)) Control (Yi (0))

Person 1 45,000 32,000Person 2 54,000 45,000Person 3 34,000 34,000

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 65 / 117

Page 334: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Concrete Example

Job Training Programs:

- Treatment: Receive job training (T = 1)

- Control: No Job training (T = 0)

Response: Income (Dollars per year)

Treatment (Yi (1)) Control (Yi (0))

Person 1 45,000 32,000Person 2 54,000 45,000Person 3 34,000 34,000

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 65 / 117

Page 335: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Concrete Example

Job Training Programs:

- Treatment: Receive job training (T = 1)

- Control: No Job training (T = 0)

Response: Income (Dollars per year)

Treatment (Yi (1)) Control (Yi (0))

Person 1 45,000 32,000Person 2 54,000 45,000Person 3 34,000 34,000

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 65 / 117

Page 336: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Concrete Example

Job Training Programs:

- Treatment: Receive job training (T = 1)

- Control: No Job training (T = 0)

Response: Income (Dollars per year)

Treatment (Yi (1)) Control (Yi (0))

Person 1 45,000 32,000Person 2 54,000 45,000Person 3 34,000 34,000

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 65 / 117

Page 337: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Concrete Example

Job Training Programs:

- Treatment: Receive job training (T = 1)

- Control: No Job training (T = 0)

Response: Income (Dollars per year)

Treatment (Yi (1)) Control (Yi (0))

Person 1 45,000 32,000Person 2 54,000 45,000Person 3 34,000 34,000

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 65 / 117

Page 338: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Concrete Example

Job Training Programs:

- Treatment: Receive job training (T = 1)

- Control: No Job training (T = 0)

Response: Income (Dollars per year)

Treatment (Yi (1)) Control (Yi (0))

Person 1 45,000 32,000Person 2 54,000 45,000Person 3 34,000 34,000

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 65 / 117

Page 339: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Concrete Example

Job Training Programs:

- Treatment: Receive job training (T = 1)

- Control: No Job training (T = 0)

Response: Income (Dollars per year)

Treatment (Yi (1)) Control (Yi (0))

Person 1 45,000 32,000Person 2 54,000 45,000Person 3 34,000 34,000

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 65 / 117

Page 340: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Definition of Causation

The individual causal effect of treatment T for individual i is,

Individual Causal Effecti = Yi (1)− Yi (0)

Causal effect:Difference in individual i ’s potential outcomesCompare responses in hypothetical worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 66 / 117

Page 341: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Definition of Causation

The individual causal effect of treatment T for individual i is,

Individual Causal Effecti = Yi (1)− Yi (0)

Causal effect:Difference in individual i ’s potential outcomesCompare responses in hypothetical worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 66 / 117

Page 342: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Definition of Causation

The individual causal effect of treatment T for individual i is,

Individual Causal Effecti = Yi (1)− Yi (0)

Causal effect:Difference in individual i ’s potential outcomesCompare responses in hypothetical worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 66 / 117

Page 343: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Definition of Causation

The individual causal effect of treatment T for individual i is,

Individual Causal Effecti = Yi (1)− Yi (0)

Causal effect:Difference in individual i ’s potential outcomes

Compare responses in hypothetical worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 66 / 117

Page 344: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

A Definition of Causation

The individual causal effect of treatment T for individual i is,

Individual Causal Effecti = Yi (1)− Yi (0)

Causal effect:Difference in individual i ’s potential outcomesCompare responses in hypothetical worlds

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 66 / 117

Page 345: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fundamental Problem of Causal Inference

Causal Effect:Response Under Treatment - Response Under ControlJob Training Program:Fundamental Problem of Causal Inference (Holland (1986)):It is impossible to observe both Yi (1) and Yi (0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 67 / 117

Page 346: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fundamental Problem of Causal Inference

Causal Effect:

Response Under Treatment - Response Under ControlJob Training Program:Fundamental Problem of Causal Inference (Holland (1986)):It is impossible to observe both Yi (1) and Yi (0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 67 / 117

Page 347: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fundamental Problem of Causal Inference

Causal Effect:Response Under Treatment - Response Under Control

Job Training Program:Fundamental Problem of Causal Inference (Holland (1986)):It is impossible to observe both Yi (1) and Yi (0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 67 / 117

Page 348: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fundamental Problem of Causal Inference

Causal Effect:Response Under Treatment - Response Under ControlJob Training Program:

Fundamental Problem of Causal Inference (Holland (1986)):It is impossible to observe both Yi (1) and Yi (0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 67 / 117

Page 349: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fundamental Problem of Causal Inference

Causal Effect:Response Under Treatment - Response Under ControlJob Training Program:

Treatment (Yi (1)) Control (Yi (0))

Person 1 45,000 32,000Person 2 54,000 45,000Person 3 34,000 34,000

Fundamental Problem of Causal Inference (Holland (1986)):It is impossible to observe both Yi (1) and Yi (0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 67 / 117

Page 350: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fundamental Problem of Causal Inference

Causal Effect:Response Under Treatment - Response Under ControlJob Training Program:

Treatment (Yi (1)) Control (Yi (0))

Person 1 ? 32,000Person 2 54,000 ?Person 3 34,000 ?

Fundamental Problem of Causal Inference (Holland (1986)):It is impossible to observe both Yi (1) and Yi (0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 67 / 117

Page 351: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fundamental Problem of Causal Inference

Causal Effect:Response Under Treatment - Response Under ControlJob Training Program:

Treatment (Yi (1)) Control (Yi (0))

Person 1 ? 32,000Person 2 54,000 ?Person 3 34,000 ?

Fundamental Problem of Causal Inference (Holland (1986)):

It is impossible to observe both Yi (1) and Yi (0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 67 / 117

Page 352: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fundamental Problem of Causal Inference

Causal Effect:Response Under Treatment - Response Under ControlJob Training Program:

Treatment (Yi (1)) Control (Yi (0))

Person 1 ? 32,000Person 2 54,000 ?Person 3 34,000 ?

Fundamental Problem of Causal Inference (Holland (1986)):It is impossible to observe both Yi (1) and Yi (0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 67 / 117

Page 353: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman Urn Model

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0Y1 Y0

Y1 Y0

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 68 / 117

Page 354: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman Urn Model

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0Y1 Y0

Y1 Y0

1

2

3

4

5

6

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 68 / 117

Page 355: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman Urn Model

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0Y1 Y0

Y1 Y0

1

2

3

4

5

6

Treat = 1

Treat = 1

Treat = 1

Treat = 0

Treat = 0

Treat = 0

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 68 / 117

Page 356: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Neyman Urn Model

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0Y1 Y0

Y1 Y0

Y1 Y0 Y1 Y0 Y1 Y0Y1 Y0 Y1 Y0 Y1 Y0 Y1 Y0 Y1 Y0 Y1 Y0

1

1 4

4

6

6

2

2

3

3

5

5

Treat = 1

Treat = 1

Treat = 1

Treat = 1 Treat = 1 Treat = 1

Treat = 0

Treat = 0

Treat = 0

Treat = 0 Treat = 0 Treat = 0

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 68 / 117

Page 357: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Some Useful Terms

Definition (Treatment)

Di : Indicator of treatment intake for unit i

Di =

{1 if unit i received the treatment0 otherwise.

Definition (Outcome)

Yi : Observed outcome variable of interest for unit i . The treatment occurstemporally before the outcomes.

Definition (Potential Outcome)

Y0i and Y1i : Potential outcomes for unit i

Ydi =

{Y1i Potential outcome for unit i with treatmentY0i Potential outcome for unit i without treatment

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 69 / 117

Page 358: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Some Useful Terms

Definition (Treatment)

Di : Indicator of treatment intake for unit i

Di =

{1 if unit i received the treatment0 otherwise.

Definition (Outcome)

Yi : Observed outcome variable of interest for unit i . The treatment occurstemporally before the outcomes.

Definition (Potential Outcome)

Y0i and Y1i : Potential outcomes for unit i

Ydi =

{Y1i Potential outcome for unit i with treatmentY0i Potential outcome for unit i without treatment

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 69 / 117

Page 359: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Some Useful Terms

Definition (Treatment)

Di : Indicator of treatment intake for unit i

Di =

{1 if unit i received the treatment0 otherwise.

Definition (Outcome)

Yi : Observed outcome variable of interest for unit i . The treatment occurstemporally before the outcomes.

Definition (Potential Outcome)

Y0i and Y1i : Potential outcomes for unit i

Ydi =

{Y1i Potential outcome for unit i with treatmentY0i Potential outcome for unit i without treatment

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 69 / 117

Page 360: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Some Useful Terms

Definition (Treatment)

Di : Indicator of treatment intake for unit i

Di =

{1 if unit i received the treatment0 otherwise.

Definition (Outcome)

Yi : Observed outcome variable of interest for unit i . The treatment occurstemporally before the outcomes.

Definition (Potential Outcome)

Y0i and Y1i : Potential outcomes for unit i

Ydi =

{Y1i Potential outcome for unit i with treatmentY0i Potential outcome for unit i without treatment

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 69 / 117

Page 361: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Some Useful Terms

Definition (Causal Effect)

Causal effect of the treatment on the outcome for unit i is the differencebetween its two potential outcomes:

τi = Y1i − Y0i

Assumption

Observed outcomes are realized as

Yi = Di · Y1i + (1− Di ) · Y0i so Yi =

{Y1i if Di = 1Y0i if Di = 0

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 70 / 117

Page 362: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Some Useful Terms

Definition (Causal Effect)

Causal effect of the treatment on the outcome for unit i is the differencebetween its two potential outcomes:

τi = Y1i − Y0i

Assumption

Observed outcomes are realized as

Yi = Di · Y1i + (1− Di ) · Y0i so Yi =

{Y1i if Di = 1Y0i if Di = 0

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 70 / 117

Page 363: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Some Useful Terms

Definition (Causal Effect)

Causal effect of the treatment on the outcome for unit i is the differencebetween its two potential outcomes:

τi = Y1i − Y0i

Assumption

Observed outcomes are realized as

Yi = Di · Y1i + (1− Di ) · Y0i so Yi =

{Y1i if Di = 1Y0i if Di = 0

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 70 / 117

Page 364: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causal Inference as a Missing Data Problem

Realized Outcome:

Yi

Yi = DiY1i+(1-Di)Y0i

Di=1

Di=0

Fundamental Problem of Causal InferenceCannot observe both potential outcomes, so we how can we calculateτi = Y1i − Y0i ?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 71 / 117

Page 365: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causal Inference as a Missing Data Problem

Realized Outcome:

Yi

Yi = DiY1i+(1-Di)Y0i

Di=1

Di=0

Fundamental Problem of Causal InferenceCannot observe both potential outcomes, so we how can we calculateτi = Y1i − Y0i ?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 71 / 117

Page 366: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causal Inference as a Missing Data Problem

Causal inference is difficult because it involves missing data. How can wecalculate τi = Y1i − Y0i ?

Homogeneity is one solution:

I If {Y1i ,Y0i} is constant across individuals, then cross-sectionalcomparisons will recover τi

I If {Y1i ,Y0i} is constant across time, then before and after comparisonswill recover τi

In social phenomenon, unfortunately, homogeneity is very rare.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 72 / 117

Page 367: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causal Inference as a Missing Data Problem

Causal inference is difficult because it involves missing data. How can wecalculate τi = Y1i − Y0i ?

Homogeneity is one solution:

I If {Y1i ,Y0i} is constant across individuals, then cross-sectionalcomparisons will recover τi

I If {Y1i ,Y0i} is constant across time, then before and after comparisonswill recover τi

In social phenomenon, unfortunately, homogeneity is very rare.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 72 / 117

Page 368: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causal Inference as a Missing Data Problem

Causal inference is difficult because it involves missing data. How can wecalculate τi = Y1i − Y0i ?

Homogeneity is one solution:

I If {Y1i ,Y0i} is constant across individuals, then cross-sectionalcomparisons will recover τi

I If {Y1i ,Y0i} is constant across time, then before and after comparisonswill recover τi

In social phenomenon, unfortunately, homogeneity is very rare.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 72 / 117

Page 369: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causal Inference as a Missing Data Problem

Causal inference is difficult because it involves missing data. How can wecalculate τi = Y1i − Y0i ?

Homogeneity is one solution:

I If {Y1i ,Y0i} is constant across individuals, then cross-sectionalcomparisons will recover τi

I If {Y1i ,Y0i} is constant across time, then before and after comparisonswill recover τi

In social phenomenon, unfortunately, homogeneity is very rare.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 72 / 117

Page 370: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causal Inference as a Missing Data Problem

Causal inference is difficult because it involves missing data. How can wecalculate τi = Y1i − Y0i ?

Homogeneity is one solution:

I If {Y1i ,Y0i} is constant across individuals, then cross-sectionalcomparisons will recover τi

I If {Y1i ,Y0i} is constant across time, then before and after comparisonswill recover τi

In social phenomenon, unfortunately, homogeneity is very rare.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 72 / 117

Page 371: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causal Inference as a Missing Data Problem

Causal inference is difficult because it involves missing data. How can wecalculate τi = Y1i − Y0i ?

Homogeneity is one solution:

I If {Y1i ,Y0i} is constant across individuals, then cross-sectionalcomparisons will recover τi

I If {Y1i ,Y0i} is constant across time, then before and after comparisonswill recover τi

In social phenomenon, unfortunately, homogeneity is very rare.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 72 / 117

Page 372: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Selection Problem

Why is this difficult? selection bias

The core idea is that the people who get treatment might lookdifferent from those who get control and thus they are not goodcounterfactuals for each other.

Let’s look at what we get from a naive difference in means with abinary treatment:

E [Yi |Di = 1]− E [Yi |Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 1] + E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)− Yi (0)|Di = 1]︸ ︷︷ ︸Average Treatment Effect on Treated

+E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]︸ ︷︷ ︸selection bias

Naive estimator = Average Treatment Effect on Treated + SelectionBias

Selection bias: how different the treated and control groups are interms of their potential outcome under control.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 73 / 117

Page 373: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Selection Problem

Why is this difficult? selection bias

The core idea is that the people who get treatment might lookdifferent from those who get control and thus they are not goodcounterfactuals for each other.

Let’s look at what we get from a naive difference in means with abinary treatment:

E [Yi |Di = 1]− E [Yi |Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 1] + E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)− Yi (0)|Di = 1]︸ ︷︷ ︸Average Treatment Effect on Treated

+E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]︸ ︷︷ ︸selection bias

Naive estimator = Average Treatment Effect on Treated + SelectionBias

Selection bias: how different the treated and control groups are interms of their potential outcome under control.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 73 / 117

Page 374: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Selection Problem

Why is this difficult? selection bias

The core idea is that the people who get treatment might lookdifferent from those who get control and thus they are not goodcounterfactuals for each other.

Let’s look at what we get from a naive difference in means with abinary treatment:

E [Yi |Di = 1]− E [Yi |Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 1] + E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)− Yi (0)|Di = 1]︸ ︷︷ ︸Average Treatment Effect on Treated

+E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]︸ ︷︷ ︸selection bias

Naive estimator = Average Treatment Effect on Treated + SelectionBias

Selection bias: how different the treated and control groups are interms of their potential outcome under control.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 73 / 117

Page 375: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Selection Problem

Why is this difficult? selection bias

The core idea is that the people who get treatment might lookdifferent from those who get control and thus they are not goodcounterfactuals for each other.

Let’s look at what we get from a naive difference in means with abinary treatment:

E [Yi |Di = 1]− E [Yi |Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 1] + E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)− Yi (0)|Di = 1]︸ ︷︷ ︸Average Treatment Effect on Treated

+E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]︸ ︷︷ ︸selection bias

Naive estimator = Average Treatment Effect on Treated + SelectionBias

Selection bias: how different the treated and control groups are interms of their potential outcome under control.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 73 / 117

Page 376: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Selection Problem

Why is this difficult? selection bias

The core idea is that the people who get treatment might lookdifferent from those who get control and thus they are not goodcounterfactuals for each other.

Let’s look at what we get from a naive difference in means with abinary treatment:

E [Yi |Di = 1]− E [Yi |Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 1] + E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)− Yi (0)|Di = 1]︸ ︷︷ ︸Average Treatment Effect on Treated

+E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]︸ ︷︷ ︸selection bias

Naive estimator = Average Treatment Effect on Treated + SelectionBias

Selection bias: how different the treated and control groups are interms of their potential outcome under control.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 73 / 117

Page 377: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Selection Problem

Why is this difficult? selection bias

The core idea is that the people who get treatment might lookdifferent from those who get control and thus they are not goodcounterfactuals for each other.

Let’s look at what we get from a naive difference in means with abinary treatment:

E [Yi |Di = 1]− E [Yi |Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 1] + E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)− Yi (0)|Di = 1]︸ ︷︷ ︸Average Treatment Effect on Treated

+E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]︸ ︷︷ ︸selection bias

Naive estimator = Average Treatment Effect on Treated + SelectionBias

Selection bias: how different the treated and control groups are interms of their potential outcome under control.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 73 / 117

Page 378: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Selection Problem

Why is this difficult? selection bias

The core idea is that the people who get treatment might lookdifferent from those who get control and thus they are not goodcounterfactuals for each other.

Let’s look at what we get from a naive difference in means with abinary treatment:

E [Yi |Di = 1]− E [Yi |Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 1] + E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)− Yi (0)|Di = 1]︸ ︷︷ ︸Average Treatment Effect on Treated

+E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]︸ ︷︷ ︸selection bias

Naive estimator = Average Treatment Effect on Treated + SelectionBias

Selection bias: how different the treated and control groups are interms of their potential outcome under control.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 73 / 117

Page 379: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Selection Problem

Why is this difficult? selection bias

The core idea is that the people who get treatment might lookdifferent from those who get control and thus they are not goodcounterfactuals for each other.

Let’s look at what we get from a naive difference in means with abinary treatment:

E [Yi |Di = 1]− E [Yi |Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 1] + E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)− Yi (0)|Di = 1]︸ ︷︷ ︸Average Treatment Effect on Treated

+E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]︸ ︷︷ ︸selection bias

Naive estimator = Average Treatment Effect on Treated + SelectionBias

Selection bias: how different the treated and control groups are interms of their potential outcome under control.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 73 / 117

Page 380: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Selection Problem

Why is this difficult? selection bias

The core idea is that the people who get treatment might lookdifferent from those who get control and thus they are not goodcounterfactuals for each other.

Let’s look at what we get from a naive difference in means with abinary treatment:

E [Yi |Di = 1]− E [Yi |Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)|Di = 1]− E [Yi (0)|Di = 1] + E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]

= E [Yi (1)− Yi (0)|Di = 1]︸ ︷︷ ︸Average Treatment Effect on Treated

+E [Yi (0)|Di = 1]− E [Yi (0)|Di = 0]︸ ︷︷ ︸selection bias

Naive estimator = Average Treatment Effect on Treated + SelectionBias

Selection bias: how different the treated and control groups are interms of their potential outcome under control.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 73 / 117

Page 381: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Selection Makes Us Care About Assignment Mechanisms

Assignment Mechanism

“The process that determines which units receive which treatments, hence whichpotential outcomes are realized and thus can be observed, and, conversely, whichpotential outcomes are missing.”(Imbens and Rubin, 2015, p. 31)

Key Assumptions:

Individualistic assignment: Limits the dependence of a particularunit’s assignment probability on the values of the covariates andpotential outcomes for other units

Probabilistic assignment: Requires the assignment mechanism toimply a non-zero probability for each treatment value, for every unit

Unconfounded assignment: Disallows dependence of the assignmentmechanism on the potential outcomes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 74 / 117

Page 382: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Selection Makes Us Care About Assignment Mechanisms

Assignment Mechanism

“The process that determines which units receive which treatments, hence whichpotential outcomes are realized and thus can be observed, and, conversely, whichpotential outcomes are missing.”(Imbens and Rubin, 2015, p. 31)

Key Assumptions:

Individualistic assignment: Limits the dependence of a particularunit’s assignment probability on the values of the covariates andpotential outcomes for other units

Probabilistic assignment: Requires the assignment mechanism toimply a non-zero probability for each treatment value, for every unit

Unconfounded assignment: Disallows dependence of the assignmentmechanism on the potential outcomes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 74 / 117

Page 383: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Selection Makes Us Care About Assignment Mechanisms

Assignment Mechanism

“The process that determines which units receive which treatments, hence whichpotential outcomes are realized and thus can be observed, and, conversely, whichpotential outcomes are missing.”(Imbens and Rubin, 2015, p. 31)

Key Assumptions:

Individualistic assignment: Limits the dependence of a particularunit’s assignment probability on the values of the covariates andpotential outcomes for other units

Probabilistic assignment: Requires the assignment mechanism toimply a non-zero probability for each treatment value, for every unit

Unconfounded assignment: Disallows dependence of the assignmentmechanism on the potential outcomes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 74 / 117

Page 384: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Selection Makes Us Care About Assignment Mechanisms

Assignment Mechanism

“The process that determines which units receive which treatments, hence whichpotential outcomes are realized and thus can be observed, and, conversely, whichpotential outcomes are missing.”(Imbens and Rubin, 2015, p. 31)

Key Assumptions:

Individualistic assignment: Limits the dependence of a particularunit’s assignment probability on the values of the covariates andpotential outcomes for other units

Probabilistic assignment: Requires the assignment mechanism toimply a non-zero probability for each treatment value, for every unit

Unconfounded assignment: Disallows dependence of the assignmentmechanism on the potential outcomes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 74 / 117

Page 385: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Selection Makes Us Care About Assignment Mechanisms

Assignment Mechanism

“The process that determines which units receive which treatments, hence whichpotential outcomes are realized and thus can be observed, and, conversely, whichpotential outcomes are missing.”(Imbens and Rubin, 2015, p. 31)

Key Assumptions:

Individualistic assignment: Limits the dependence of a particularunit’s assignment probability on the values of the covariates andpotential outcomes for other units

Probabilistic assignment: Requires the assignment mechanism toimply a non-zero probability for each treatment value, for every unit

Unconfounded assignment: Disallows dependence of the assignmentmechanism on the potential outcomes

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 74 / 117

Page 386: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 75 / 117

Page 387: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 75 / 117

Page 388: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Assumptions: Be Careful When Defining Treatment

1) There is only one version of the treatment, not T1, T2, ...

- Drug trial

- Private Schooling

- Practical Advice: are there hidden versions of your treatment?(suggests different interpretations)

2) Potential outcomes depend only on my treatment status (Y (1), notY (1, 0, 0, 1, 0, . . . , 0, 1) or Y (T ))

- Survey experiment

- AIDS drug trials

- Practical Advice: design study to avoid spillovers and contamination(unless question of interest, see Nickerson (2008) and Gerber, Green,and Larimer (2008) )

Together: (1) and (2) constitute:SUTVA: Stable Unit Treatment Value AssumptionAlso sometimes referred to as the “No Interference” assumption.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 76 / 117

Page 389: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Assumptions: Be Careful When Defining Treatment

1) There is only one version of the treatment, not T1, T2, ...

- Drug trial

- Private Schooling

- Practical Advice: are there hidden versions of your treatment?(suggests different interpretations)

2) Potential outcomes depend only on my treatment status (Y (1), notY (1, 0, 0, 1, 0, . . . , 0, 1) or Y (T ))

- Survey experiment

- AIDS drug trials

- Practical Advice: design study to avoid spillovers and contamination(unless question of interest, see Nickerson (2008) and Gerber, Green,and Larimer (2008) )

Together: (1) and (2) constitute:SUTVA: Stable Unit Treatment Value AssumptionAlso sometimes referred to as the “No Interference” assumption.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 76 / 117

Page 390: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Assumptions: Be Careful When Defining Treatment

1) There is only one version of the treatment, not T1, T2, ...

- Drug trial

- Private Schooling

- Practical Advice: are there hidden versions of your treatment?(suggests different interpretations)

2) Potential outcomes depend only on my treatment status (Y (1), notY (1, 0, 0, 1, 0, . . . , 0, 1) or Y (T ))

- Survey experiment

- AIDS drug trials

- Practical Advice: design study to avoid spillovers and contamination(unless question of interest, see Nickerson (2008) and Gerber, Green,and Larimer (2008) )

Together: (1) and (2) constitute:SUTVA: Stable Unit Treatment Value AssumptionAlso sometimes referred to as the “No Interference” assumption.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 76 / 117

Page 391: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Assumptions: Be Careful When Defining Treatment

1) There is only one version of the treatment, not T1, T2, ...

- Drug trial

- Private Schooling

- Practical Advice: are there hidden versions of your treatment?(suggests different interpretations)

2) Potential outcomes depend only on my treatment status (Y (1), notY (1, 0, 0, 1, 0, . . . , 0, 1) or Y (T ))

- Survey experiment

- AIDS drug trials

- Practical Advice: design study to avoid spillovers and contamination(unless question of interest, see Nickerson (2008) and Gerber, Green,and Larimer (2008) )

Together: (1) and (2) constitute:SUTVA: Stable Unit Treatment Value AssumptionAlso sometimes referred to as the “No Interference” assumption.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 76 / 117

Page 392: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Assumptions: Be Careful When Defining Treatment

1) There is only one version of the treatment, not T1, T2, ...

- Drug trial

- Private Schooling

- Practical Advice: are there hidden versions of your treatment?(suggests different interpretations)

2) Potential outcomes depend only on my treatment status (Y (1), notY (1, 0, 0, 1, 0, . . . , 0, 1) or Y (T ))

- Survey experiment

- AIDS drug trials

- Practical Advice: design study to avoid spillovers and contamination(unless question of interest, see Nickerson (2008) and Gerber, Green,and Larimer (2008) )

Together: (1) and (2) constitute:SUTVA: Stable Unit Treatment Value AssumptionAlso sometimes referred to as the “No Interference” assumption.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 76 / 117

Page 393: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Assumptions: Be Careful When Defining Treatment

1) There is only one version of the treatment, not T1, T2, ...

- Drug trial

- Private Schooling

- Practical Advice: are there hidden versions of your treatment?(suggests different interpretations)

2) Potential outcomes depend only on my treatment status (Y (1), notY (1, 0, 0, 1, 0, . . . , 0, 1) or Y (T ))

- Survey experiment

- AIDS drug trials

- Practical Advice: design study to avoid spillovers and contamination(unless question of interest, see Nickerson (2008) and Gerber, Green,and Larimer (2008) )

Together: (1) and (2) constitute:SUTVA: Stable Unit Treatment Value AssumptionAlso sometimes referred to as the “No Interference” assumption.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 76 / 117

Page 394: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Assumptions: Be Careful When Defining Treatment

1) There is only one version of the treatment, not T1, T2, ...

- Drug trial

- Private Schooling

- Practical Advice: are there hidden versions of your treatment?(suggests different interpretations)

2) Potential outcomes depend only on my treatment status (Y (1), notY (1, 0, 0, 1, 0, . . . , 0, 1) or Y (T ))

- Survey experiment

- AIDS drug trials

- Practical Advice: design study to avoid spillovers and contamination(unless question of interest, see Nickerson (2008) and Gerber, Green,and Larimer (2008) )

Together: (1) and (2) constitute:SUTVA: Stable Unit Treatment Value AssumptionAlso sometimes referred to as the “No Interference” assumption.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 76 / 117

Page 395: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Assumptions: Be Careful When Defining Treatment

1) There is only one version of the treatment, not T1, T2, ...

- Drug trial

- Private Schooling

- Practical Advice: are there hidden versions of your treatment?(suggests different interpretations)

2) Potential outcomes depend only on my treatment status (Y (1), notY (1, 0, 0, 1, 0, . . . , 0, 1) or Y (T ))

- Survey experiment

- AIDS drug trials

- Practical Advice: design study to avoid spillovers and contamination(unless question of interest, see Nickerson (2008) and Gerber, Green,and Larimer (2008) )

Together: (1) and (2) constitute:SUTVA: Stable Unit Treatment Value AssumptionAlso sometimes referred to as the “No Interference” assumption.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 76 / 117

Page 396: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Assumptions: Be Careful When Defining Treatment

1) There is only one version of the treatment, not T1, T2, ...

- Drug trial

- Private Schooling

- Practical Advice: are there hidden versions of your treatment?(suggests different interpretations)

2) Potential outcomes depend only on my treatment status (Y (1), notY (1, 0, 0, 1, 0, . . . , 0, 1) or Y (T ))

- Survey experiment

- AIDS drug trials

- Practical Advice: design study to avoid spillovers and contamination(unless question of interest, see Nickerson (2008) and Gerber, Green,and Larimer (2008) )

Together: (1) and (2) constitute:SUTVA: Stable Unit Treatment Value AssumptionAlso sometimes referred to as the “No Interference” assumption.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 76 / 117

Page 397: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Assumptions: Be Careful When Defining Treatment

1) There is only one version of the treatment, not T1, T2, ...

- Drug trial

- Private Schooling

- Practical Advice: are there hidden versions of your treatment?(suggests different interpretations)

2) Potential outcomes depend only on my treatment status (Y (1), notY (1, 0, 0, 1, 0, . . . , 0, 1) or Y (T ))

- Survey experiment

- AIDS drug trials

- Practical Advice: design study to avoid spillovers and contamination(unless question of interest, see Nickerson (2008) and Gerber, Green,and Larimer (2008) )

Together: (1) and (2) constitute:SUTVA: Stable Unit Treatment Value AssumptionAlso sometimes referred to as the “No Interference” assumption.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 76 / 117

Page 398: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Assumptions: Be Careful When Defining Treatment

1) There is only one version of the treatment, not T1, T2, ...

- Drug trial

- Private Schooling

- Practical Advice: are there hidden versions of your treatment?(suggests different interpretations)

2) Potential outcomes depend only on my treatment status (Y (1), notY (1, 0, 0, 1, 0, . . . , 0, 1) or Y (T ))

- Survey experiment

- AIDS drug trials

- Practical Advice: design study to avoid spillovers and contamination(unless question of interest, see Nickerson (2008) and Gerber, Green,and Larimer (2008) )

Together: (1) and (2) constitute:SUTVA: Stable Unit Treatment Value AssumptionAlso sometimes referred to as the “No Interference” assumption.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 76 / 117

Page 399: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Interference

Let D = {Di ,Dj} be a vector of treatment assignments for two units i (me) and j(you).

How many elements in D?

D = {(Di = 0,Dj = 0), (Di = 1,Dj = 0), (Di = 0,Dj = 1), (Di = 1,Dj = 1)}

How many potential outcomes for unit i?

Y1i (D) =

{Y1i (1, 1)Y1i (1, 0)

Y0i (D) =

{Y0i (0, 1)Y0i (0, 0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 77 / 117

Page 400: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Interference

Let D = {Di ,Dj} be a vector of treatment assignments for two units i (me) and j(you).

How many elements in D?

D = {(Di = 0,Dj = 0), (Di = 1,Dj = 0), (Di = 0,Dj = 1), (Di = 1,Dj = 1)}

How many potential outcomes for unit i?

Y1i (D) =

{Y1i (1, 1)Y1i (1, 0)

Y0i (D) =

{Y0i (0, 1)Y0i (0, 0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 77 / 117

Page 401: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Interference

Let D = {Di ,Dj} be a vector of treatment assignments for two units i (me) and j(you).

How many elements in D?

D = {(Di = 0,Dj = 0), (Di = 1,Dj = 0), (Di = 0,Dj = 1), (Di = 1,Dj = 1)}

How many potential outcomes for unit i?

Y1i (D) =

{Y1i (1, 1)Y1i (1, 0)

Y0i (D) =

{Y0i (0, 1)Y0i (0, 0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 77 / 117

Page 402: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Interference

Let D = {Di ,Dj} be a vector of treatment assignments for two units i (me) and j(you).

How many elements in D?

D = {(Di = 0,Dj = 0), (Di = 1,Dj = 0), (Di = 0,Dj = 1), (Di = 1,Dj = 1)}

How many potential outcomes for unit i?

Y1i (D) =

{Y1i (1, 1)Y1i (1, 0)

Y0i (D) =

{Y0i (0, 1)Y0i (0, 0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 77 / 117

Page 403: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Interference

Let D = {Di ,Dj} be a vector of treatment assignments for two units i (me) and j(you).

How many elements in D?

D = {(Di = 0,Dj = 0), (Di = 1,Dj = 0), (Di = 0,Dj = 1), (Di = 1,Dj = 1)}

How many potential outcomes for unit i?

Y1i (D) =

{Y1i (1, 1)Y1i (1, 0)

Y0i (D) =

{Y0i (0, 1)Y0i (0, 0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 77 / 117

Page 404: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Interference

Let D = {Di ,Dj} be a vector of treatment assignments for two units i (me) and j(you).

How many elements in D?

D = {(Di = 0,Dj = 0), (Di = 1,Dj = 0), (Di = 0,Dj = 1), (Di = 1,Dj = 1)}

How many potential outcomes for unit i?

Y1i (D) =

{Y1i (1, 1)Y1i (1, 0)

Y0i (D) =

{Y0i (0, 1)Y0i (0, 0)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 77 / 117

Page 405: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Potential Outcomes with Interference

How many causal effects for unit i?

τi (D) =

Y1i (1, 1)− Y0i (0, 0)Y1i (1, 1)− Y0i (0, 1)Y1i (1, 0)− Y0i (0, 0)Y1i (1, 0)− Y0i (0, 1)Y1i (1, 1)− Y1i (1, 0)Y0i (0, 1)− Y0i (0, 0)

How many potential outcome are observed for unit i?

Since we only observe one of the four potential outcomes, the missing dataproblem for causal inference is even more severe.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 78 / 117

Page 406: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Potential Outcomes with Interference

How many causal effects for unit i?

τi (D) =

Y1i (1, 1)− Y0i (0, 0)Y1i (1, 1)− Y0i (0, 1)Y1i (1, 0)− Y0i (0, 0)Y1i (1, 0)− Y0i (0, 1)Y1i (1, 1)− Y1i (1, 0)Y0i (0, 1)− Y0i (0, 0)

How many potential outcome are observed for unit i?

Since we only observe one of the four potential outcomes, the missing dataproblem for causal inference is even more severe.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 78 / 117

Page 407: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Potential Outcomes with Interference

How many causal effects for unit i?

τi (D) =

Y1i (1, 1)− Y0i (0, 0)Y1i (1, 1)− Y0i (0, 1)Y1i (1, 0)− Y0i (0, 0)Y1i (1, 0)− Y0i (0, 1)Y1i (1, 1)− Y1i (1, 0)Y0i (0, 1)− Y0i (0, 0)

How many potential outcome are observed for unit i?

Since we only observe one of the four potential outcomes, the missing dataproblem for causal inference is even more severe.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 78 / 117

Page 408: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Potential Outcomes with Interference

How many causal effects for unit i?

τi (D) =

Y1i (1, 1)− Y0i (0, 0)Y1i (1, 1)− Y0i (0, 1)Y1i (1, 0)− Y0i (0, 0)Y1i (1, 0)− Y0i (0, 1)Y1i (1, 1)− Y1i (1, 0)Y0i (0, 1)− Y0i (0, 0)

How many potential outcome are observed for unit i?

Since we only observe one of the four potential outcomes, the missing dataproblem for causal inference is even more severe.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 78 / 117

Page 409: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Potential Outcomes with Interference

The No Interference assumption states that unit i ’s potential outcomes dependson Di , not D:

Y1i (1, 1) = Y1i (1, 0) and Y0i (0, 1) = Y0i (0, 0)

This assumption furthermore allows us to define the effect for unit i asτi = Y1i − Y0i .

No interference is an example of an exclusion restriction. We rely on outsideinformation to rule out the possibility of certain causal effects (eg. you taking thetreatment has no effect on my potential outcomes).

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 79 / 117

Page 410: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Potential Outcomes with Interference

The No Interference assumption states that unit i ’s potential outcomes dependson Di , not D:

Y1i (1, 1) = Y1i (1, 0) and Y0i (0, 1) = Y0i (0, 0)

This assumption furthermore allows us to define the effect for unit i asτi = Y1i − Y0i .

No interference is an example of an exclusion restriction. We rely on outsideinformation to rule out the possibility of certain causal effects (eg. you taking thetreatment has no effect on my potential outcomes).

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 79 / 117

Page 411: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Potential Outcomes with Interference

The No Interference assumption states that unit i ’s potential outcomes dependson Di , not D:

Y1i (1, 1) = Y1i (1, 0) and Y0i (0, 1) = Y0i (0, 0)

This assumption furthermore allows us to define the effect for unit i asτi = Y1i − Y0i .

No interference is an example of an exclusion restriction. We rely on outsideinformation to rule out the possibility of certain causal effects (eg. you taking thetreatment has no effect on my potential outcomes).

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 79 / 117

Page 412: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Potential Outcomes with Interference

The No Interference assumption states that unit i ’s potential outcomes dependson Di , not D:

Y1i (1, 1) = Y1i (1, 0) and Y0i (0, 1) = Y0i (0, 0)

This assumption furthermore allows us to define the effect for unit i asτi = Y1i − Y0i .

No interference is an example of an exclusion restriction. We rely on outsideinformation to rule out the possibility of certain causal effects (eg. you taking thetreatment has no effect on my potential outcomes).

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 79 / 117

Page 413: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Potential Outcomes with Interference

Some Examples of Interference:

Contagion

Displacement

Communication

Deterrence

Causal inference in the presence of interference between subjects is an area ofactive research. Specially tailored experimental designs have been developed tostudy these interactions, e.g. Miguel and Kremer (2004) and Sinclair, McConnell,and Green (2012).

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 80 / 117

Page 414: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Potential Outcomes with Interference

Some Examples of Interference:

Contagion

Displacement

Communication

Deterrence

Causal inference in the presence of interference between subjects is an area ofactive research. Specially tailored experimental designs have been developed tostudy these interactions, e.g. Miguel and Kremer (2004) and Sinclair, McConnell,and Green (2012).

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 80 / 117

Page 415: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Potential Outcomes with Interference

Some Examples of Interference:

Contagion

Displacement

Communication

Deterrence

Causal inference in the presence of interference between subjects is an area ofactive research. Specially tailored experimental designs have been developed tostudy these interactions, e.g. Miguel and Kremer (2004) and Sinclair, McConnell,and Green (2012).

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 80 / 117

Page 416: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Potential Outcomes with Interference

Some Examples of Interference:

Contagion

Displacement

Communication

Deterrence

Causal inference in the presence of interference between subjects is an area ofactive research. Specially tailored experimental designs have been developed tostudy these interactions, e.g. Miguel and Kremer (2004) and Sinclair, McConnell,and Green (2012).

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 80 / 117

Page 417: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 81 / 117

Page 418: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

1 Thousand Foot View

2 Power

3 Problems with p-Values

4 Visualization and Quantities of Interest

5 A Preview of Causal Inference

6 Fun With Censorship

7 Neyman-Rubin Model of Causal Inference

8 Complications

9 ATE and Other Estimands

10 Graphical Models

11 Fun With A Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 81 / 117

Page 419: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Gets to Be a Cause?

We can imagine a world where individual i is assigned to treatment andcontrol conditionsWhat is the Hypothetical Experiment?Problem: Immutable (or difficult to change) characteristics

- Effect of gender on promotion

- Effect of race on income

Consider causal effect of gender on promotion:

- Do we mean gender reassignment surgery?

- Do we mean randomly assigning at birth? (a lot of other stuffdifferent)

- one idea: manipulate perceptions–women evaluated differently onpaper

No Causation Without Manipulation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 82 / 117

Page 420: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Gets to Be a Cause?

We can imagine a world where individual i is assigned to treatment andcontrol conditions

What is the Hypothetical Experiment?Problem: Immutable (or difficult to change) characteristics

- Effect of gender on promotion

- Effect of race on income

Consider causal effect of gender on promotion:

- Do we mean gender reassignment surgery?

- Do we mean randomly assigning at birth? (a lot of other stuffdifferent)

- one idea: manipulate perceptions–women evaluated differently onpaper

No Causation Without Manipulation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 82 / 117

Page 421: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Gets to Be a Cause?

We can imagine a world where individual i is assigned to treatment andcontrol conditionsWhat is the Hypothetical Experiment?

Problem: Immutable (or difficult to change) characteristics

- Effect of gender on promotion

- Effect of race on income

Consider causal effect of gender on promotion:

- Do we mean gender reassignment surgery?

- Do we mean randomly assigning at birth? (a lot of other stuffdifferent)

- one idea: manipulate perceptions–women evaluated differently onpaper

No Causation Without Manipulation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 82 / 117

Page 422: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Gets to Be a Cause?

We can imagine a world where individual i is assigned to treatment andcontrol conditionsWhat is the Hypothetical Experiment?Problem: Immutable (or difficult to change) characteristics

- Effect of gender on promotion

- Effect of race on income

Consider causal effect of gender on promotion:

- Do we mean gender reassignment surgery?

- Do we mean randomly assigning at birth? (a lot of other stuffdifferent)

- one idea: manipulate perceptions–women evaluated differently onpaper

No Causation Without Manipulation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 82 / 117

Page 423: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Gets to Be a Cause?

We can imagine a world where individual i is assigned to treatment andcontrol conditionsWhat is the Hypothetical Experiment?Problem: Immutable (or difficult to change) characteristics

- Effect of gender on promotion

- Effect of race on income

Consider causal effect of gender on promotion:

- Do we mean gender reassignment surgery?

- Do we mean randomly assigning at birth? (a lot of other stuffdifferent)

- one idea: manipulate perceptions–women evaluated differently onpaper

No Causation Without Manipulation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 82 / 117

Page 424: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Gets to Be a Cause?

We can imagine a world where individual i is assigned to treatment andcontrol conditionsWhat is the Hypothetical Experiment?Problem: Immutable (or difficult to change) characteristics

- Effect of gender on promotion

- Effect of race on income

Consider causal effect of gender on promotion:

- Do we mean gender reassignment surgery?

- Do we mean randomly assigning at birth? (a lot of other stuffdifferent)

- one idea: manipulate perceptions–women evaluated differently onpaper

No Causation Without Manipulation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 82 / 117

Page 425: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Gets to Be a Cause?

We can imagine a world where individual i is assigned to treatment andcontrol conditionsWhat is the Hypothetical Experiment?Problem: Immutable (or difficult to change) characteristics

- Effect of gender on promotion

- Effect of race on income

Consider causal effect of gender on promotion:

- Do we mean gender reassignment surgery?

- Do we mean randomly assigning at birth? (a lot of other stuffdifferent)

- one idea: manipulate perceptions–women evaluated differently onpaper

No Causation Without Manipulation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 82 / 117

Page 426: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Gets to Be a Cause?

We can imagine a world where individual i is assigned to treatment andcontrol conditionsWhat is the Hypothetical Experiment?Problem: Immutable (or difficult to change) characteristics

- Effect of gender on promotion

- Effect of race on income

Consider causal effect of gender on promotion:

- Do we mean gender reassignment surgery?

- Do we mean randomly assigning at birth? (a lot of other stuffdifferent)

- one idea: manipulate perceptions–women evaluated differently onpaper

No Causation Without Manipulation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 82 / 117

Page 427: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Gets to Be a Cause?

We can imagine a world where individual i is assigned to treatment andcontrol conditionsWhat is the Hypothetical Experiment?Problem: Immutable (or difficult to change) characteristics

- Effect of gender on promotion

- Effect of race on income

Consider causal effect of gender on promotion:

- Do we mean gender reassignment surgery?

- Do we mean randomly assigning at birth? (a lot of other stuffdifferent)

- one idea: manipulate perceptions–women evaluated differently onpaper

No Causation Without Manipulation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 82 / 117

Page 428: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Gets to Be a Cause?

We can imagine a world where individual i is assigned to treatment andcontrol conditionsWhat is the Hypothetical Experiment?Problem: Immutable (or difficult to change) characteristics

- Effect of gender on promotion

- Effect of race on income

Consider causal effect of gender on promotion:

- Do we mean gender reassignment surgery?

- Do we mean randomly assigning at birth? (a lot of other stuffdifferent)

- one idea: manipulate perceptions–women evaluated differently onpaper

No Causation Without Manipulation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 82 / 117

Page 429: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

What Gets to Be a Cause?

We can imagine a world where individual i is assigned to treatment andcontrol conditionsWhat is the Hypothetical Experiment?Problem: Immutable (or difficult to change) characteristics

- Effect of gender on promotion

- Effect of race on income

Consider causal effect of gender on promotion:

- Do we mean gender reassignment surgery?

- Do we mean randomly assigning at birth? (a lot of other stuffdifferent)

- one idea: manipulate perceptions–women evaluated differently onpaper

No Causation Without Manipulation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 82 / 117

Page 430: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Caveats and Implications

- Does not dismiss claims of discrimination on immutablecharacteristics as legitimate

- Pervasive effects of racism/sexism in society- Suggests: we need a different empirical strategy to evaluate claims- What facet of institutionalized racism (or its consequences) causes

racial disparities?

- Correlation problem (1) :

- Regression models can estimate coefficients for immutablecharacteristics

- But are necessarily imprecise: what do scholars have in mind in models?

- Design Principle:

- Pretend you’re God designing experiment- If that experiment does not exist, be concerned

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 83 / 117

Page 431: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Caveats and Implications

- Does not dismiss claims of discrimination on immutablecharacteristics as legitimate

- Pervasive effects of racism/sexism in society- Suggests: we need a different empirical strategy to evaluate claims- What facet of institutionalized racism (or its consequences) causes

racial disparities?

- Correlation problem (1) :

- Regression models can estimate coefficients for immutablecharacteristics

- But are necessarily imprecise: what do scholars have in mind in models?

- Design Principle:

- Pretend you’re God designing experiment- If that experiment does not exist, be concerned

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 83 / 117

Page 432: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Caveats and Implications

- Does not dismiss claims of discrimination on immutablecharacteristics as legitimate

- Pervasive effects of racism/sexism in society

- Suggests: we need a different empirical strategy to evaluate claims- What facet of institutionalized racism (or its consequences) causes

racial disparities?

- Correlation problem (1) :

- Regression models can estimate coefficients for immutablecharacteristics

- But are necessarily imprecise: what do scholars have in mind in models?

- Design Principle:

- Pretend you’re God designing experiment- If that experiment does not exist, be concerned

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 83 / 117

Page 433: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Caveats and Implications

- Does not dismiss claims of discrimination on immutablecharacteristics as legitimate

- Pervasive effects of racism/sexism in society- Suggests: we need a different empirical strategy to evaluate claims

- What facet of institutionalized racism (or its consequences) causesracial disparities?

- Correlation problem (1) :

- Regression models can estimate coefficients for immutablecharacteristics

- But are necessarily imprecise: what do scholars have in mind in models?

- Design Principle:

- Pretend you’re God designing experiment- If that experiment does not exist, be concerned

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 83 / 117

Page 434: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Caveats and Implications

- Does not dismiss claims of discrimination on immutablecharacteristics as legitimate

- Pervasive effects of racism/sexism in society- Suggests: we need a different empirical strategy to evaluate claims- What facet of institutionalized racism (or its consequences) causes

racial disparities?

- Correlation problem (1) :

- Regression models can estimate coefficients for immutablecharacteristics

- But are necessarily imprecise: what do scholars have in mind in models?

- Design Principle:

- Pretend you’re God designing experiment- If that experiment does not exist, be concerned

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 83 / 117

Page 435: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Caveats and Implications

- Does not dismiss claims of discrimination on immutablecharacteristics as legitimate

- Pervasive effects of racism/sexism in society- Suggests: we need a different empirical strategy to evaluate claims- What facet of institutionalized racism (or its consequences) causes

racial disparities?

- Correlation problem (1) :

- Regression models can estimate coefficients for immutablecharacteristics

- But are necessarily imprecise: what do scholars have in mind in models?

- Design Principle:

- Pretend you’re God designing experiment- If that experiment does not exist, be concerned

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 83 / 117

Page 436: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Caveats and Implications

- Does not dismiss claims of discrimination on immutablecharacteristics as legitimate

- Pervasive effects of racism/sexism in society- Suggests: we need a different empirical strategy to evaluate claims- What facet of institutionalized racism (or its consequences) causes

racial disparities?

- Correlation problem (1) :

- Regression models can estimate coefficients for immutablecharacteristics

- But are necessarily imprecise: what do scholars have in mind in models?

- Design Principle:

- Pretend you’re God designing experiment- If that experiment does not exist, be concerned

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 83 / 117

Page 437: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Caveats and Implications

- Does not dismiss claims of discrimination on immutablecharacteristics as legitimate

- Pervasive effects of racism/sexism in society- Suggests: we need a different empirical strategy to evaluate claims- What facet of institutionalized racism (or its consequences) causes

racial disparities?

- Correlation problem (1) :

- Regression models can estimate coefficients for immutablecharacteristics

- But are necessarily imprecise: what do scholars have in mind in models?

- Design Principle:

- Pretend you’re God designing experiment- If that experiment does not exist, be concerned

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 83 / 117

Page 438: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Caveats and Implications

- Does not dismiss claims of discrimination on immutablecharacteristics as legitimate

- Pervasive effects of racism/sexism in society- Suggests: we need a different empirical strategy to evaluate claims- What facet of institutionalized racism (or its consequences) causes

racial disparities?

- Correlation problem (1) :

- Regression models can estimate coefficients for immutablecharacteristics

- But are necessarily imprecise: what do scholars have in mind in models?

- Design Principle:

- Pretend you’re God designing experiment- If that experiment does not exist, be concerned

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 83 / 117

Page 439: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Caveats and Implications

- Does not dismiss claims of discrimination on immutablecharacteristics as legitimate

- Pervasive effects of racism/sexism in society- Suggests: we need a different empirical strategy to evaluate claims- What facet of institutionalized racism (or its consequences) causes

racial disparities?

- Correlation problem (1) :

- Regression models can estimate coefficients for immutablecharacteristics

- But are necessarily imprecise: what do scholars have in mind in models?

- Design Principle:

- Pretend you’re God designing experiment

- If that experiment does not exist, be concerned

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 83 / 117

Page 440: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Caveats and Implications

- Does not dismiss claims of discrimination on immutablecharacteristics as legitimate

- Pervasive effects of racism/sexism in society- Suggests: we need a different empirical strategy to evaluate claims- What facet of institutionalized racism (or its consequences) causes

racial disparities?

- Correlation problem (1) :

- Regression models can estimate coefficients for immutablecharacteristics

- But are necessarily imprecise: what do scholars have in mind in models?

- Design Principle:

- Pretend you’re God designing experiment- If that experiment does not exist, be concerned

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 83 / 117

Page 441: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Back to the Neyman Urn Model

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0

Y1 Y0Y1 Y0

Y1 Y0

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 84 / 117

Page 442: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effects

Move the goal posts:Focus on estimating Average Treatment Effect (ATE)Suppose we have N observations in population (i = 1, . . . ,N)

ATE =1

N

N∑i=1

(Yi (1)− Yi (0))

= E [Y (1)− Y (0)] Average over population!!!

- Population parameter

- It is fixed and unchanging

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 85 / 117

Page 443: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effects

Move the goal posts:

Focus on estimating Average Treatment Effect (ATE)Suppose we have N observations in population (i = 1, . . . ,N)

ATE =1

N

N∑i=1

(Yi (1)− Yi (0))

= E [Y (1)− Y (0)] Average over population!!!

- Population parameter

- It is fixed and unchanging

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 85 / 117

Page 444: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effects

Move the goal posts:Focus on estimating Average Treatment Effect (ATE)

Suppose we have N observations in population (i = 1, . . . ,N)

ATE =1

N

N∑i=1

(Yi (1)− Yi (0))

= E [Y (1)− Y (0)] Average over population!!!

- Population parameter

- It is fixed and unchanging

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 85 / 117

Page 445: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effects

Move the goal posts:Focus on estimating Average Treatment Effect (ATE)Suppose we have N observations in population (i = 1, . . . ,N)

ATE =1

N

N∑i=1

(Yi (1)− Yi (0))

= E [Y (1)− Y (0)] Average over population!!!

- Population parameter

- It is fixed and unchanging

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 85 / 117

Page 446: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effects

Move the goal posts:Focus on estimating Average Treatment Effect (ATE)Suppose we have N observations in population (i = 1, . . . ,N)

ATE =1

N

N∑i=1

(Yi (1)− Yi (0))

= E [Y (1)− Y (0)] Average over population!!!

- Population parameter

- It is fixed and unchanging

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 85 / 117

Page 447: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effects

Move the goal posts:Focus on estimating Average Treatment Effect (ATE)Suppose we have N observations in population (i = 1, . . . ,N)

ATE =1

N

N∑i=1

(Yi (1)− Yi (0))

= E [Y (1)− Y (0)] Average over population!!!

- Population parameter

- It is fixed and unchanging

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 85 / 117

Page 448: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effects

Move the goal posts:Focus on estimating Average Treatment Effect (ATE)Suppose we have N observations in population (i = 1, . . . ,N)

ATE =1

N

N∑i=1

(Yi (1)− Yi (0))

= E [Y (1)− Y (0)] Average over population!!!

- Population parameter

- It is fixed and unchanging

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 85 / 117

Page 449: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effects

Move the goal posts:Focus on estimating Average Treatment Effect (ATE)Suppose we have N observations in population (i = 1, . . . ,N)

ATE =1

N

N∑i=1

(Yi (1)− Yi (0))

= E [Y (1)− Y (0)] Average over population!!!

- Population parameter

- It is fixed and unchanging

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 85 / 117

Page 450: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Estimating ATE under Random Assignment

Estimator for ATE:

ATE = Average (Treated Units)− Average (Control Units)

=

∑Ni=1 Yi (1)Ti∑N

i=1 Ti

−∑N

i=1 Yi (0)(1− Ti )∑Ni=1(1− Ti )

=N∑

i=1

[Yi (1)Ti

nt− Yi (0)(1− Ti )

nc]

= E [Y (1)|T = 1]− E [Y (0)|T = 0]

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 86 / 117

Page 451: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Estimating ATE under Random Assignment

Estimator for ATE:

ATE = Average (Treated Units)− Average (Control Units)

=

∑Ni=1 Yi (1)Ti∑N

i=1 Ti

−∑N

i=1 Yi (0)(1− Ti )∑Ni=1(1− Ti )

=N∑

i=1

[Yi (1)Ti

nt− Yi (0)(1− Ti )

nc]

= E [Y (1)|T = 1]− E [Y (0)|T = 0]

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 86 / 117

Page 452: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Estimating ATE under Random Assignment

Estimator for ATE:

ATE = Average (Treated Units)− Average (Control Units)

=

∑Ni=1 Yi (1)Ti∑N

i=1 Ti

−∑N

i=1 Yi (0)(1− Ti )∑Ni=1(1− Ti )

=N∑

i=1

[Yi (1)Ti

nt− Yi (0)(1− Ti )

nc]

= E [Y (1)|T = 1]− E [Y (0)|T = 0]

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 86 / 117

Page 453: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Estimating ATE under Random Assignment

Estimator for ATE:

ATE = Average (Treated Units)− Average (Control Units)

=

∑Ni=1 Yi (1)Ti∑N

i=1 Ti

−∑N

i=1 Yi (0)(1− Ti )∑Ni=1(1− Ti )

=N∑

i=1

[Yi (1)Ti

nt− Yi (0)(1− Ti )

nc]

= E [Y (1)|T = 1]− E [Y (0)|T = 0]

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 86 / 117

Page 454: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Estimating ATE under Random Assignment

Estimator for ATE:

ATE = Average (Treated Units)− Average (Control Units)

=

∑Ni=1 Yi (1)Ti∑N

i=1 Ti

−∑N

i=1 Yi (0)(1− Ti )∑Ni=1(1− Ti )

=N∑

i=1

[Yi (1)Ti

nt− Yi (0)(1− Ti )

nc]

= E [Y (1)|T = 1]− E [Y (0)|T = 0]

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 86 / 117

Page 455: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Estimating ATE under Random Assignment

Estimator for ATE:

ATE = Average (Treated Units)− Average (Control Units)

=

∑Ni=1 Yi (1)Ti∑N

i=1 Ti

−∑N

i=1 Yi (0)(1− Ti )∑Ni=1(1− Ti )

=N∑

i=1

[Yi (1)Ti

nt− Yi (0)(1− Ti )

nc]

= E [Y (1)|T = 1]− E [Y (0)|T = 0]

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 86 / 117

Page 456: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Estimands

Because τi are unobservable, we shift what we are interested to:

Definition (Average Treatment Effect (ATE))

τATE = Average of all treatment potential outcomes −Average of all control potential outcomes

or

τATE =

∑Ni Y1i

N−∑N

i Y0i

N

or

τATE = E [Y1i − Y0i ]

or

τATE = E [τi ]

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 87 / 117

Page 457: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Other Estimands

Definition (Average treatment effect on the treated (ATT))

τATT = E [Y1i − Y0i |Di = 1]

Definition (Average treatment effect on the controls (ATC))

τATC = E [Y1i − Y0i |Di = 0]

Definition (Average treatment effects for subgroups)

τATE(X ) = E [Y1i − Y0i |Xi = x ]

or

τATT (X ) = E [Y1i − Y0i |Di = 1,Xi = x ]

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 88 / 117

Page 458: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Other Estimands

Definition (Average treatment effect on the treated (ATT))

τATT = E [Y1i − Y0i |Di = 1]

Definition (Average treatment effect on the controls (ATC))

τATC = E [Y1i − Y0i |Di = 0]

Definition (Average treatment effects for subgroups)

τATE(X ) = E [Y1i − Y0i |Xi = x ]

or

τATT (X ) = E [Y1i − Y0i |Di = 1,Xi = x ]

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 88 / 117

Page 459: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Other Estimands

Definition (Average treatment effect on the treated (ATT))

τATT = E [Y1i − Y0i |Di = 1]

Definition (Average treatment effect on the controls (ATC))

τATC = E [Y1i − Y0i |Di = 0]

Definition (Average treatment effects for subgroups)

τATE(X ) = E [Y1i − Y0i |Xi = x ]

or

τATT (X ) = E [Y1i − Y0i |Di = 1,Xi = x ]

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 88 / 117

Page 460: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effect

Imagine a study population with 4 units:

i Di Y1i Y0i τi

1 1 10 4 62 1 1 2 -13 0 3 3 04 0 5 2 3

What is the ATE?

E [Y1i − Y0i ] = 1/4× (6 +−1 + 0 + 3) = 2

Note: Average effect is positive, but τi are negative for some units!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 89 / 117

Page 461: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effect

Imagine a study population with 4 units:

i Di Y1i Y0i τi

1 1 10 4 62 1 1 2 -13 0 3 3 04 0 5 2 3

What is the ATE?

E [Y1i − Y0i ] = 1/4× (6 +−1 + 0 + 3) = 2

Note: Average effect is positive, but τi are negative for some units!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 89 / 117

Page 462: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effect

Imagine a study population with 4 units:

i Di Y1i Y0i τi

1 1 10 4 62 1 1 2 -13 0 3 3 04 0 5 2 3

What is the ATE?

E [Y1i − Y0i ] = 1/4× (6 +−1 + 0 + 3) = 2

Note: Average effect is positive, but τi are negative for some units!

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 89 / 117

Page 463: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effect on the Treated

Imagine a study population with 4 units:

i Di Y1i Y0i τi

1 1 10 4 62 1 1 2 -13 0 3 3 04 0 5 2 3

What is the ATT and ATC?

E [Y1i − Y0i |Di = 1] = 1/2× (6 +−1) = 2.5

E [Y1i − Y0i |Di = 0] = 1/2× (0 + 3) = 1.5

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 90 / 117

Page 464: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effect on the Treated

Imagine a study population with 4 units:

i Di Y1i Y0i τi

1 1 10 4 62 1 1 2 -13 0 3 3 04 0 5 2 3

What is the ATT and ATC?

E [Y1i − Y0i |Di = 1] = 1/2× (6 +−1) = 2.5

E [Y1i − Y0i |Di = 0] = 1/2× (0 + 3) = 1.5

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 90 / 117

Page 465: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Average Treatment Effect on the Treated

Imagine a study population with 4 units:

i Di Y1i Y0i τi

1 1 10 4 62 1 1 2 -13 0 3 3 04 0 5 2 3

What is the ATT and ATC?

E [Y1i − Y0i |Di = 1] = 1/2× (6 +−1) = 2.5

E [Y1i − Y0i |Di = 0] = 1/2× (0 + 3) = 1.5

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 90 / 117

Page 466: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Naive Comparison: Difference in Means

Comparisons between observed outcomes of treated and control units canoften be misleading.

units which select treatment may not be like units which selectcontrol.

i.e. selection into treatment is often associated with the potentialoutcomes

this means we have violated the assumption of unconfoundness(Y (1),Y (0))⊥D

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 91 / 117

Page 467: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Naive Comparison: Difference in Means

Comparisons between observed outcomes of treated and control units canoften be misleading.

units which select treatment may not be like units which selectcontrol.

i.e. selection into treatment is often associated with the potentialoutcomes

this means we have violated the assumption of unconfoundness(Y (1),Y (0))⊥D

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 91 / 117

Page 468: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Naive Comparison: Difference in Means

Comparisons between observed outcomes of treated and control units canoften be misleading.

units which select treatment may not be like units which selectcontrol.

i.e. selection into treatment is often associated with the potentialoutcomes

this means we have violated the assumption of unconfoundness(Y (1),Y (0))⊥D

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 91 / 117

Page 469: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Naive Comparison: Difference in Means

Comparisons between observed outcomes of treated and control units canoften be misleading.

units which select treatment may not be like units which selectcontrol.

i.e. selection into treatment is often associated with the potentialoutcomes

this means we have violated the assumption of unconfoundness(Y (1),Y (0))⊥D

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 91 / 117

Page 470: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Naive Comparison: Difference in Means

Comparisons between observed outcomes of treated and control units canoften be misleading.

units which select treatment may not be like units which selectcontrol.

i.e. selection into treatment is often associated with the potentialoutcomes

this means we have violated the assumption of unconfoundness(Y (1),Y (0))⊥D

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 91 / 117

Page 471: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Selection Bias

Example: Church Attendance and Political Participation

Church goers likely to differ from non-Church goers on a range ofbackground characteristics (e.g. civic duty)

Given these differences, turnout for churchgoers could be higher thanfor non-churchgoers even if church had zero mobilizing effect

Example: Gender Quotas and Redistribution Towards Women

Countries with gender quotas are likely countries where women arepolitically mobilized.

Given this difference, policies targeted towards women would be morecommon in quota countries even if these countries had not adoptedquotas.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 92 / 117

Page 472: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Selection Bias

Example: Church Attendance and Political Participation

Church goers likely to differ from non-Church goers on a range ofbackground characteristics (e.g. civic duty)

Given these differences, turnout for churchgoers could be higher thanfor non-churchgoers even if church had zero mobilizing effect

Example: Gender Quotas and Redistribution Towards Women

Countries with gender quotas are likely countries where women arepolitically mobilized.

Given this difference, policies targeted towards women would be morecommon in quota countries even if these countries had not adoptedquotas.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 92 / 117

Page 473: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Selection Bias

Example: Church Attendance and Political Participation

Church goers likely to differ from non-Church goers on a range ofbackground characteristics (e.g. civic duty)

Given these differences, turnout for churchgoers could be higher thanfor non-churchgoers even if church had zero mobilizing effect

Example: Gender Quotas and Redistribution Towards Women

Countries with gender quotas are likely countries where women arepolitically mobilized.

Given this difference, policies targeted towards women would be morecommon in quota countries even if these countries had not adoptedquotas.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 92 / 117

Page 474: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Selection Bias

Example: Church Attendance and Political Participation

Church goers likely to differ from non-Church goers on a range ofbackground characteristics (e.g. civic duty)

Given these differences, turnout for churchgoers could be higher thanfor non-churchgoers even if church had zero mobilizing effect

Example: Gender Quotas and Redistribution Towards Women

Countries with gender quotas are likely countries where women arepolitically mobilized.

Given this difference, policies targeted towards women would be morecommon in quota countries even if these countries had not adoptedquotas.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 92 / 117

Page 475: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Selection Bias

Example: Church Attendance and Political Participation

Church goers likely to differ from non-Church goers on a range ofbackground characteristics (e.g. civic duty)

Given these differences, turnout for churchgoers could be higher thanfor non-churchgoers even if church had zero mobilizing effect

Example: Gender Quotas and Redistribution Towards Women

Countries with gender quotas are likely countries where women arepolitically mobilized.

Given this difference, policies targeted towards women would be morecommon in quota countries even if these countries had not adoptedquotas.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 92 / 117

Page 476: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Selection Bias

Example: Church Attendance and Political Participation

Church goers likely to differ from non-Church goers on a range ofbackground characteristics (e.g. civic duty)

Given these differences, turnout for churchgoers could be higher thanfor non-churchgoers even if church had zero mobilizing effect

Example: Gender Quotas and Redistribution Towards Women

Countries with gender quotas are likely countries where women arepolitically mobilized.

Given this difference, policies targeted towards women would be morecommon in quota countries even if these countries had not adoptedquotas.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 92 / 117

Page 477: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Selection Bias

Example: Church Attendance and Political Participation

Church goers likely to differ from non-Church goers on a range ofbackground characteristics (e.g. civic duty)

Given these differences, turnout for churchgoers could be higher thanfor non-churchgoers even if church had zero mobilizing effect

Example: Gender Quotas and Redistribution Towards Women

Countries with gender quotas are likely countries where women arepolitically mobilized.

Given this difference, policies targeted towards women would be morecommon in quota countries even if these countries had not adoptedquotas.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 92 / 117

Page 478: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Assignment Mechanism

Since missing potential outcomes are unobservable we must makeassumptions to fill in, i.e. estimate missing potential outcomes.

In the causal inference literature, we typically make assumptions about theassignment mechanism to do so.

Types of Assignment Mechanisms

random assignment

selection on observables

selection on unobservables

Most statistical models of causal inference attain identification of treatmenteffects by restricting the assignment mechanism in some way.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 93 / 117

Page 479: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Assignment Mechanism

Since missing potential outcomes are unobservable we must makeassumptions to fill in, i.e. estimate missing potential outcomes.

In the causal inference literature, we typically make assumptions about theassignment mechanism to do so.

Types of Assignment Mechanisms

random assignment

selection on observables

selection on unobservables

Most statistical models of causal inference attain identification of treatmenteffects by restricting the assignment mechanism in some way.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 93 / 117

Page 480: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Assignment Mechanism

Since missing potential outcomes are unobservable we must makeassumptions to fill in, i.e. estimate missing potential outcomes.

In the causal inference literature, we typically make assumptions about theassignment mechanism to do so.

Types of Assignment Mechanisms

random assignment

selection on observables

selection on unobservables

Most statistical models of causal inference attain identification of treatmenteffects by restricting the assignment mechanism in some way.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 93 / 117

Page 481: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Assignment Mechanism

Since missing potential outcomes are unobservable we must makeassumptions to fill in, i.e. estimate missing potential outcomes.

In the causal inference literature, we typically make assumptions about theassignment mechanism to do so.

Types of Assignment Mechanisms

random assignment

selection on observables

selection on unobservables

Most statistical models of causal inference attain identification of treatmenteffects by restricting the assignment mechanism in some way.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 93 / 117

Page 482: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Assignment Mechanism

Since missing potential outcomes are unobservable we must makeassumptions to fill in, i.e. estimate missing potential outcomes.

In the causal inference literature, we typically make assumptions about theassignment mechanism to do so.

Types of Assignment Mechanisms

random assignment

selection on observables

selection on unobservables

Most statistical models of causal inference attain identification of treatmenteffects by restricting the assignment mechanism in some way.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 93 / 117

Page 483: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Assignment Mechanism

Since missing potential outcomes are unobservable we must makeassumptions to fill in, i.e. estimate missing potential outcomes.

In the causal inference literature, we typically make assumptions about theassignment mechanism to do so.

Types of Assignment Mechanisms

random assignment

selection on observables

selection on unobservables

Most statistical models of causal inference attain identification of treatmenteffects by restricting the assignment mechanism in some way.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 93 / 117

Page 484: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

No causation without manipulation?

Always ask:what is the experiment I would run if I had infinite resources and power?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 94 / 117

Page 485: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

No causation without manipulation?

Always ask:what is the experiment I would run if I had infinite resources and power?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 94 / 117

Page 486: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Causal Inference WorkflowCausal Inference Workflow

Data Estimator Estimate

Identification Strategy

Quantity of Interest

Ideal Experiment

Causal Relationship

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 95 / 117

Page 487: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summing Up: Neyman-Rubin causal model

Useful for studying the “effects of causes”, less so for the “causes of effects”.

No assumption of homogeneity, allows for causal effects to vary unit by unit

I No single “causal effect”, thus the need to be precise about the targetestimand.

Distinguishes between observed outcomes and potential outcomes.

Causal inference is a missing data problem: we typically make assumptionsabout the assignment mechanism to go from descriptive inference to causalinference.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 96 / 117

Page 488: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summing Up: Neyman-Rubin causal model

Useful for studying the “effects of causes”, less so for the “causes of effects”.

No assumption of homogeneity, allows for causal effects to vary unit by unit

I No single “causal effect”, thus the need to be precise about the targetestimand.

Distinguishes between observed outcomes and potential outcomes.

Causal inference is a missing data problem: we typically make assumptionsabout the assignment mechanism to go from descriptive inference to causalinference.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 96 / 117

Page 489: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summing Up: Neyman-Rubin causal model

Useful for studying the “effects of causes”, less so for the “causes of effects”.

No assumption of homogeneity, allows for causal effects to vary unit by unit

I No single “causal effect”, thus the need to be precise about the targetestimand.

Distinguishes between observed outcomes and potential outcomes.

Causal inference is a missing data problem: we typically make assumptionsabout the assignment mechanism to go from descriptive inference to causalinference.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 96 / 117

Page 490: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summing Up: Neyman-Rubin causal model

Useful for studying the “effects of causes”, less so for the “causes of effects”.

No assumption of homogeneity, allows for causal effects to vary unit by unit

I No single “causal effect”, thus the need to be precise about the targetestimand.

Distinguishes between observed outcomes and potential outcomes.

Causal inference is a missing data problem: we typically make assumptionsabout the assignment mechanism to go from descriptive inference to causalinference.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 96 / 117

Page 491: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summing Up: Neyman-Rubin causal model

Useful for studying the “effects of causes”, less so for the “causes of effects”.

No assumption of homogeneity, allows for causal effects to vary unit by unit

I No single “causal effect”, thus the need to be precise about the targetestimand.

Distinguishes between observed outcomes and potential outcomes.

Causal inference is a missing data problem: we typically make assumptionsabout the assignment mechanism to go from descriptive inference to causalinference.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 96 / 117

Page 492: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summing Up: Neyman-Rubin causal model

Useful for studying the “effects of causes”, less so for the “causes of effects”.

No assumption of homogeneity, allows for causal effects to vary unit by unit

I No single “causal effect”, thus the need to be precise about the targetestimand.

Distinguishes between observed outcomes and potential outcomes.

Causal inference is a missing data problem: we typically make assumptionsabout the assignment mechanism to go from descriptive inference to causalinference.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 96 / 117

Page 493: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Observational Studies and Causal Inference

Experimental studies:

- Treatment under control of analyst

Observational

- Units (people, countries) control their treatment status

- Selection: treatment and control groups differ systematically

- E [Y (1)|T = 1] 6= E [Y (1)|T = 0], E [Y (0)|T = 0] 6= E [Y (0)|T = 1]- Observables: things we can see, measure, and use in our study- Unobservables: not observables (big problem)

- Naive difference in means will be biased

- Many, many, potential strategies for limiting bias

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 97 / 117

Page 494: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Observational Studies and Causal Inference

Experimental studies:

- Treatment under control of analyst

Observational

- Units (people, countries) control their treatment status

- Selection: treatment and control groups differ systematically

- E [Y (1)|T = 1] 6= E [Y (1)|T = 0], E [Y (0)|T = 0] 6= E [Y (0)|T = 1]- Observables: things we can see, measure, and use in our study- Unobservables: not observables (big problem)

- Naive difference in means will be biased

- Many, many, potential strategies for limiting bias

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 97 / 117

Page 495: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Observational Studies and Causal Inference

Experimental studies:

- Treatment under control of analyst

Observational

- Units (people, countries) control their treatment status

- Selection: treatment and control groups differ systematically

- E [Y (1)|T = 1] 6= E [Y (1)|T = 0], E [Y (0)|T = 0] 6= E [Y (0)|T = 1]- Observables: things we can see, measure, and use in our study- Unobservables: not observables (big problem)

- Naive difference in means will be biased

- Many, many, potential strategies for limiting bias

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 97 / 117

Page 496: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Observational Studies and Causal Inference

Experimental studies:

- Treatment under control of analyst

Observational

- Units (people, countries) control their treatment status

- Selection: treatment and control groups differ systematically

- E [Y (1)|T = 1] 6= E [Y (1)|T = 0], E [Y (0)|T = 0] 6= E [Y (0)|T = 1]- Observables: things we can see, measure, and use in our study- Unobservables: not observables (big problem)

- Naive difference in means will be biased

- Many, many, potential strategies for limiting bias

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 97 / 117

Page 497: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Observational Studies and Causal Inference

Experimental studies:

- Treatment under control of analyst

Observational

- Units (people, countries) control their treatment status

- Selection: treatment and control groups differ systematically

- E [Y (1)|T = 1] 6= E [Y (1)|T = 0], E [Y (0)|T = 0] 6= E [Y (0)|T = 1]- Observables: things we can see, measure, and use in our study- Unobservables: not observables (big problem)

- Naive difference in means will be biased

- Many, many, potential strategies for limiting bias

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 97 / 117

Page 498: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Observational Studies and Causal Inference

Experimental studies:

- Treatment under control of analyst

Observational

- Units (people, countries) control their treatment status

- Selection: treatment and control groups differ systematically

- E [Y (1)|T = 1] 6= E [Y (1)|T = 0], E [Y (0)|T = 0] 6= E [Y (0)|T = 1]

- Observables: things we can see, measure, and use in our study- Unobservables: not observables (big problem)

- Naive difference in means will be biased

- Many, many, potential strategies for limiting bias

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 97 / 117

Page 499: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Observational Studies and Causal Inference

Experimental studies:

- Treatment under control of analyst

Observational

- Units (people, countries) control their treatment status

- Selection: treatment and control groups differ systematically

- E [Y (1)|T = 1] 6= E [Y (1)|T = 0], E [Y (0)|T = 0] 6= E [Y (0)|T = 1]- Observables: things we can see, measure, and use in our study

- Unobservables: not observables (big problem)

- Naive difference in means will be biased

- Many, many, potential strategies for limiting bias

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 97 / 117

Page 500: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Observational Studies and Causal Inference

Experimental studies:

- Treatment under control of analyst

Observational

- Units (people, countries) control their treatment status

- Selection: treatment and control groups differ systematically

- E [Y (1)|T = 1] 6= E [Y (1)|T = 0], E [Y (0)|T = 0] 6= E [Y (0)|T = 1]- Observables: things we can see, measure, and use in our study- Unobservables: not observables (big problem)

- Naive difference in means will be biased

- Many, many, potential strategies for limiting bias

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 97 / 117

Page 501: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Observational Studies and Causal Inference

Experimental studies:

- Treatment under control of analyst

Observational

- Units (people, countries) control their treatment status

- Selection: treatment and control groups differ systematically

- E [Y (1)|T = 1] 6= E [Y (1)|T = 0], E [Y (0)|T = 0] 6= E [Y (0)|T = 1]- Observables: things we can see, measure, and use in our study- Unobservables: not observables (big problem)

- Naive difference in means will be biased

- Many, many, potential strategies for limiting bias

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 97 / 117

Page 502: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Observational Studies and Causal Inference

Experimental studies:

- Treatment under control of analyst

Observational

- Units (people, countries) control their treatment status

- Selection: treatment and control groups differ systematically

- E [Y (1)|T = 1] 6= E [Y (1)|T = 0], E [Y (0)|T = 0] 6= E [Y (0)|T = 1]- Observables: things we can see, measure, and use in our study- Unobservables: not observables (big problem)

- Naive difference in means will be biased

- Many, many, potential strategies for limiting bias

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 97 / 117

Page 503: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 504: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 505: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 506: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 507: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)

(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 508: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 509: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:

F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 510: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric method

F “Preprocessing” data to make analysis robust to misspecification(2) can be made plausible by:

F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 511: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 512: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:

F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 513: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the model

F Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 514: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 515: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 516: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantity

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 517: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Summary: Regression as a Causal Model

Can regression be also used for causal inference?

Answer: A very qualified yes

For example, can we say that a one unit increase in inequality causes a 5.2point increase in intensity?

To interpret β as a causal effect of X on Y , we need very specific and oftenunrealistic assumptions:

(1) E [Y |X ] is correctly specified as a linear function (linearity)(2) There are no other variables that affect both X and Y (exogeneity)

(1) can be relaxed by:F Using a flexible nonlinear or nonparametric methodF “Preprocessing” data to make analysis robust to misspecification

(2) can be made plausible by:F Including carefully-selected control variables in the modelF Choosing a clever research design to rule out confounding

We will discuss more in a few weeks.

For now while doing diagnostics, it is safest to treat β as a purelydescriptive/predictive quantityStewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 98 / 117

Page 518: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Graphical Models

A general framework for representing causal relationships based ondirected acyclic graphs (DAG)

The work we discuss here comes out of developments by Judea Pearland others

Particularly useful for thinking through issues of identification.

Provides a graphical representation of the models and a set of rules(do-calculus) for identifying the causal effect.

Nice software that takes the graph and returns an identificationstrategy: DAGitty at http://dagitty.net

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 99 / 117

Page 519: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Graphical Models

A general framework for representing causal relationships based ondirected acyclic graphs (DAG)

The work we discuss here comes out of developments by Judea Pearland others

Particularly useful for thinking through issues of identification.

Provides a graphical representation of the models and a set of rules(do-calculus) for identifying the causal effect.

Nice software that takes the graph and returns an identificationstrategy: DAGitty at http://dagitty.net

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 99 / 117

Page 520: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Graphical Models

A general framework for representing causal relationships based ondirected acyclic graphs (DAG)

The work we discuss here comes out of developments by Judea Pearland others

Particularly useful for thinking through issues of identification.

Provides a graphical representation of the models and a set of rules(do-calculus) for identifying the causal effect.

Nice software that takes the graph and returns an identificationstrategy: DAGitty at http://dagitty.net

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 99 / 117

Page 521: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Graphical Models

A general framework for representing causal relationships based ondirected acyclic graphs (DAG)

The work we discuss here comes out of developments by Judea Pearland others

Particularly useful for thinking through issues of identification.

Provides a graphical representation of the models and a set of rules(do-calculus) for identifying the causal effect.

Nice software that takes the graph and returns an identificationstrategy: DAGitty at http://dagitty.net

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 99 / 117

Page 522: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Graphical Models

A general framework for representing causal relationships based ondirected acyclic graphs (DAG)

The work we discuss here comes out of developments by Judea Pearland others

Particularly useful for thinking through issues of identification.

Provides a graphical representation of the models and a set of rules(do-calculus) for identifying the causal effect.

Nice software that takes the graph and returns an identificationstrategy: DAGitty at http://dagitty.net

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 99 / 117

Page 523: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Graphical Models

A general framework for representing causal relationships based ondirected acyclic graphs (DAG)

The work we discuss here comes out of developments by Judea Pearland others

Particularly useful for thinking through issues of identification.

Provides a graphical representation of the models and a set of rules(do-calculus) for identifying the causal effect.

Nice software that takes the graph and returns an identificationstrategy: DAGitty at http://dagitty.net

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 99 / 117

Page 524: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Components of a DAG

T

X

Z

U

Y

M

nodes → variables(unobserved typically called U or V)

(directed) arrows → causal effects

absence of nodes → no common causesof any pair of variables

absence of arrows → no causal effect

positioning conveys no mathematicalmeaning but often is orientedleft-to-right with causal ordering forreadability.

dashed lines are used in contextdependent ways

all relationships are non-parametric

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 100 / 117

Page 525: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Components of a DAG

T

X

Z

U

Y

M

nodes → variables(unobserved typically called U or V)

(directed) arrows → causal effects

absence of nodes → no common causesof any pair of variables

absence of arrows → no causal effect

positioning conveys no mathematicalmeaning but often is orientedleft-to-right with causal ordering forreadability.

dashed lines are used in contextdependent ways

all relationships are non-parametric

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 100 / 117

Page 526: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Components of a DAG

T

X

Z

U

Y

M

nodes → variables(unobserved typically called U or V)

(directed) arrows → causal effects

absence of nodes → no common causesof any pair of variables

absence of arrows → no causal effect

positioning conveys no mathematicalmeaning but often is orientedleft-to-right with causal ordering forreadability.

dashed lines are used in contextdependent ways

all relationships are non-parametric

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 100 / 117

Page 527: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Components of a DAG

T

X

Z

U

Y

M

nodes → variables(unobserved typically called U or V)

(directed) arrows → causal effects

absence of nodes → no common causesof any pair of variables

absence of arrows → no causal effect

positioning conveys no mathematicalmeaning but often is orientedleft-to-right with causal ordering forreadability.

dashed lines are used in contextdependent ways

all relationships are non-parametric

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 100 / 117

Page 528: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Components of a DAG

T

X

Z

U

Y

M

nodes → variables(unobserved typically called U or V)

(directed) arrows → causal effects

absence of nodes → no common causesof any pair of variables

absence of arrows → no causal effect

positioning conveys no mathematicalmeaning but often is orientedleft-to-right with causal ordering forreadability.

dashed lines are used in contextdependent ways

all relationships are non-parametric

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 100 / 117

Page 529: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Components of a DAG

T

X

Z

U

Y

M

nodes → variables(unobserved typically called U or V)

(directed) arrows → causal effects

absence of nodes → no common causesof any pair of variables

absence of arrows → no causal effect

positioning conveys no mathematicalmeaning but often is orientedleft-to-right with causal ordering forreadability.

dashed lines are used in contextdependent ways

all relationships are non-parametric

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 100 / 117

Page 530: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Components of a DAG

T

X

Z

U

Y

M

nodes → variables(unobserved typically called U or V)

(directed) arrows → causal effects

absence of nodes → no common causesof any pair of variables

absence of arrows → no causal effect

positioning conveys no mathematicalmeaning but often is orientedleft-to-right with causal ordering forreadability.

dashed lines are used in contextdependent ways

all relationships are non-parametric

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 100 / 117

Page 531: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Relationships in a DAG

T

X

Z

U

Y

M

Parents (Children): directly causing(caused by) a node

Ancestors (Descendants): directly orindirectly causing (caused by) a node

Path: a route that connects thevariables (path is causal when all arrowspoint the same way)

Acyclic implies that there are no cyclesand a variable can’t cause itself

Causal Markov assumption: conditionon its direct causes, a variable isindependent of its non-descendents.

We will talk in depth about two types ofrelationships: confounders and colliders

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 101 / 117

Page 532: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Relationships in a DAG

T

X

Z

U

Y

M

Parents (Children): directly causing(caused by) a node

Ancestors (Descendants): directly orindirectly causing (caused by) a node

Path: a route that connects thevariables (path is causal when all arrowspoint the same way)

Acyclic implies that there are no cyclesand a variable can’t cause itself

Causal Markov assumption: conditionon its direct causes, a variable isindependent of its non-descendents.

We will talk in depth about two types ofrelationships: confounders and colliders

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 101 / 117

Page 533: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Relationships in a DAG

T

X

Z

U

Y

M

Parents (Children): directly causing(caused by) a node

Ancestors (Descendants): directly orindirectly causing (caused by) a node

Path: a route that connects thevariables (path is causal when all arrowspoint the same way)

Acyclic implies that there are no cyclesand a variable can’t cause itself

Causal Markov assumption: conditionon its direct causes, a variable isindependent of its non-descendents.

We will talk in depth about two types ofrelationships: confounders and colliders

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 101 / 117

Page 534: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Relationships in a DAG

T

X

Z

U

Y

M

Parents (Children): directly causing(caused by) a node

Ancestors (Descendants): directly orindirectly causing (caused by) a node

Path: a route that connects thevariables (path is causal when all arrowspoint the same way)

Acyclic implies that there are no cyclesand a variable can’t cause itself

Causal Markov assumption: conditionon its direct causes, a variable isindependent of its non-descendents.

We will talk in depth about two types ofrelationships: confounders and colliders

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 101 / 117

Page 535: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Relationships in a DAG

T

X

Z

U

Y

M

Parents (Children): directly causing(caused by) a node

Ancestors (Descendants): directly orindirectly causing (caused by) a node

Path: a route that connects thevariables (path is causal when all arrowspoint the same way)

Acyclic implies that there are no cyclesand a variable can’t cause itself

Causal Markov assumption: conditionon its direct causes, a variable isindependent of its non-descendents.

We will talk in depth about two types ofrelationships: confounders and colliders

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 101 / 117

Page 536: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Relationships in a DAG

T

X

Z

U

Y

M

Parents (Children): directly causing(caused by) a node

Ancestors (Descendants): directly orindirectly causing (caused by) a node

Path: a route that connects thevariables (path is causal when all arrowspoint the same way)

Acyclic implies that there are no cyclesand a variable can’t cause itself

Causal Markov assumption: conditionon its direct causes, a variable isindependent of its non-descendents.

We will talk in depth about two types ofrelationships: confounders and colliders

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 101 / 117

Page 537: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Confounders

X

T Y

X is a confounder (or common cause)

Even without a causal effect or directed edge between T and Y theywill have a marginal associational relationship

Conditional on X , T and Y are unrelated in this graph.

We can think of conditioning on a confounder as blocking the flow ofassociation.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 102 / 117

Page 538: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Confounders

X

T Y

X is a confounder (or common cause)

Even without a causal effect or directed edge between T and Y theywill have a marginal associational relationship

Conditional on X , T and Y are unrelated in this graph.

We can think of conditioning on a confounder as blocking the flow ofassociation.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 102 / 117

Page 539: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Confounders

X

T Y

X is a confounder (or common cause)

Even without a causal effect or directed edge between T and Y theywill have a marginal associational relationship

Conditional on X , T and Y are unrelated in this graph.

We can think of conditioning on a confounder as blocking the flow ofassociation.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 102 / 117

Page 540: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Confounders

X

T Y

X is a confounder (or common cause)

Even without a causal effect or directed edge between T and Y theywill have a marginal associational relationship

Conditional on X , T and Y are unrelated in this graph.

We can think of conditioning on a confounder as blocking the flow ofassociation.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 102 / 117

Page 541: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Colliders

X

T Y

X is now a collider because two arrows point into it

In this scenario T and Y are not marginally associated

If we control for X they become associated and create a connectionbetween T and Y

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 103 / 117

Page 542: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Colliders

X

T Y

X is now a collider because two arrows point into it

In this scenario T and Y are not marginally associated

If we control for X they become associated and create a connectionbetween T and Y

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 103 / 117

Page 543: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Colliders

X

T Y

X is now a collider because two arrows point into it

In this scenario T and Y are not marginally associated

If we control for X they become associated and create a connectionbetween T and Y

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 103 / 117

Page 544: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Colliders are scary because you can induce dependence

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 104 / 117

Page 545: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

From Confounders to Back-Door Paths

X

T

Z

Y

Identify causal effect of T on Y by conditioning on X , Z or X and ZWe can formalize this logic with the idea of a back-door pathA back-door path is “a path between any causally ordered sequenceof two variables that begins with a directed edge that points to thefirst variable.” (Morgan and Winship 2013)Two paths from T to Y here:

1 T → Y (directed or causal path)2 T ← X → Z → Y (back-door path)

Observed marginal association between T and Y is a composite ofthese two paths and thus does not identify the causal effect of T on YWe want to block the back-door path to leave only the causal effect

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 105 / 117

Page 546: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

From Confounders to Back-Door Paths

X

T

Z

Y

Identify causal effect of T on Y by conditioning on X , Z or X and Z

We can formalize this logic with the idea of a back-door pathA back-door path is “a path between any causally ordered sequenceof two variables that begins with a directed edge that points to thefirst variable.” (Morgan and Winship 2013)Two paths from T to Y here:

1 T → Y (directed or causal path)2 T ← X → Z → Y (back-door path)

Observed marginal association between T and Y is a composite ofthese two paths and thus does not identify the causal effect of T on YWe want to block the back-door path to leave only the causal effect

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 105 / 117

Page 547: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

From Confounders to Back-Door Paths

X

T

Z

Y

Identify causal effect of T on Y by conditioning on X , Z or X and ZWe can formalize this logic with the idea of a back-door path

A back-door path is “a path between any causally ordered sequenceof two variables that begins with a directed edge that points to thefirst variable.” (Morgan and Winship 2013)Two paths from T to Y here:

1 T → Y (directed or causal path)2 T ← X → Z → Y (back-door path)

Observed marginal association between T and Y is a composite ofthese two paths and thus does not identify the causal effect of T on YWe want to block the back-door path to leave only the causal effect

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 105 / 117

Page 548: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

From Confounders to Back-Door Paths

X

T

Z

Y

Identify causal effect of T on Y by conditioning on X , Z or X and ZWe can formalize this logic with the idea of a back-door pathA back-door path is “a path between any causally ordered sequenceof two variables that begins with a directed edge that points to thefirst variable.” (Morgan and Winship 2013)

Two paths from T to Y here:1 T → Y (directed or causal path)2 T ← X → Z → Y (back-door path)

Observed marginal association between T and Y is a composite ofthese two paths and thus does not identify the causal effect of T on YWe want to block the back-door path to leave only the causal effect

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 105 / 117

Page 549: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

From Confounders to Back-Door Paths

X

T

Z

Y

Identify causal effect of T on Y by conditioning on X , Z or X and ZWe can formalize this logic with the idea of a back-door pathA back-door path is “a path between any causally ordered sequenceof two variables that begins with a directed edge that points to thefirst variable.” (Morgan and Winship 2013)Two paths from T to Y here:

1 T → Y (directed or causal path)2 T ← X → Z → Y (back-door path)

Observed marginal association between T and Y is a composite ofthese two paths and thus does not identify the causal effect of T on YWe want to block the back-door path to leave only the causal effect

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 105 / 117

Page 550: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

From Confounders to Back-Door Paths

X

T

Z

Y

Identify causal effect of T on Y by conditioning on X , Z or X and ZWe can formalize this logic with the idea of a back-door pathA back-door path is “a path between any causally ordered sequenceof two variables that begins with a directed edge that points to thefirst variable.” (Morgan and Winship 2013)Two paths from T to Y here:

1 T → Y (directed or causal path)2 T ← X → Z → Y (back-door path)

Observed marginal association between T and Y is a composite ofthese two paths and thus does not identify the causal effect of T on Y

We want to block the back-door path to leave only the causal effect

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 105 / 117

Page 551: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

From Confounders to Back-Door Paths

X

T

Z

Y

Identify causal effect of T on Y by conditioning on X , Z or X and ZWe can formalize this logic with the idea of a back-door pathA back-door path is “a path between any causally ordered sequenceof two variables that begins with a directed edge that points to thefirst variable.” (Morgan and Winship 2013)Two paths from T to Y here:

1 T → Y (directed or causal path)2 T ← X → Z → Y (back-door path)

Observed marginal association between T and Y is a composite ofthese two paths and thus does not identify the causal effect of T on YWe want to block the back-door path to leave only the causal effect

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 105 / 117

Page 552: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Colliders and Back-Door Paths

Z

YTV

U

Z is a collider and it lies along a back-doorpath from T to Y

Conditioning on a collider on a back-doorpath does not help and in fact causes newassociations

Here we are fine unless we condition on Zwhich opens a path T ← V ↔ U → Y(this particular case is called M-bias)

So how do we know which back-door pathsto block?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 106 / 117

Page 553: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Colliders and Back-Door Paths

Z

YTV

U

Z is a collider and it lies along a back-doorpath from T to Y

Conditioning on a collider on a back-doorpath does not help and in fact causes newassociations

Here we are fine unless we condition on Zwhich opens a path T ← V ↔ U → Y(this particular case is called M-bias)

So how do we know which back-door pathsto block?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 106 / 117

Page 554: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Colliders and Back-Door Paths

Z

YTV

U

Z is a collider and it lies along a back-doorpath from T to Y

Conditioning on a collider on a back-doorpath does not help and in fact causes newassociations

Here we are fine unless we condition on Zwhich opens a path T ← V ↔ U → Y(this particular case is called M-bias)

So how do we know which back-door pathsto block?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 106 / 117

Page 555: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Colliders and Back-Door Paths

Z

YTV

U

Z is a collider and it lies along a back-doorpath from T to Y

Conditioning on a collider on a back-doorpath does not help and in fact causes newassociations

Here we are fine unless we condition on Zwhich opens a path T ← V ↔ U → Y(this particular case is called M-bias)

So how do we know which back-door pathsto block?

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 106 / 117

Page 556: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

D-Separation

Graphs provide us a way to think about conditional independencestatements. Consider disjoint subsets of the vertices A, B and C

A is D-separated from B by C if and only if C blocks every path froma vertex in A to a vertex in B

A path p is said to be blocked by a set of vertices C if and only if atleast one of the following conditions holds:

1 p contains a chain structure a→ c → b or a fork structure a← c → bwhere the node c is in the set C

2 p contains a collider structure a→ y ← b where neither y nor itsdescendents are in C

If A is not D-separated from B by C we say that A is D-connected toB by C

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 107 / 117

Page 557: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

D-Separation

Graphs provide us a way to think about conditional independencestatements. Consider disjoint subsets of the vertices A, B and C

A is D-separated from B by C if and only if C blocks every path froma vertex in A to a vertex in B

A path p is said to be blocked by a set of vertices C if and only if atleast one of the following conditions holds:

1 p contains a chain structure a→ c → b or a fork structure a← c → bwhere the node c is in the set C

2 p contains a collider structure a→ y ← b where neither y nor itsdescendents are in C

If A is not D-separated from B by C we say that A is D-connected toB by C

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 107 / 117

Page 558: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

D-Separation

Graphs provide us a way to think about conditional independencestatements. Consider disjoint subsets of the vertices A, B and C

A is D-separated from B by C if and only if C blocks every path froma vertex in A to a vertex in B

A path p is said to be blocked by a set of vertices C if and only if atleast one of the following conditions holds:

1 p contains a chain structure a→ c → b or a fork structure a← c → bwhere the node c is in the set C

2 p contains a collider structure a→ y ← b where neither y nor itsdescendents are in C

If A is not D-separated from B by C we say that A is D-connected toB by C

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 107 / 117

Page 559: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

D-Separation

Graphs provide us a way to think about conditional independencestatements. Consider disjoint subsets of the vertices A, B and C

A is D-separated from B by C if and only if C blocks every path froma vertex in A to a vertex in B

A path p is said to be blocked by a set of vertices C if and only if atleast one of the following conditions holds:

1 p contains a chain structure a→ c → b or a fork structure a← c → bwhere the node c is in the set C

2 p contains a collider structure a→ y ← b where neither y nor itsdescendents are in C

If A is not D-separated from B by C we say that A is D-connected toB by C

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 107 / 117

Page 560: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

D-Separation

Graphs provide us a way to think about conditional independencestatements. Consider disjoint subsets of the vertices A, B and C

A is D-separated from B by C if and only if C blocks every path froma vertex in A to a vertex in B

A path p is said to be blocked by a set of vertices C if and only if atleast one of the following conditions holds:

1 p contains a chain structure a→ c → b or a fork structure a← c → bwhere the node c is in the set C

2 p contains a collider structure a→ y ← b where neither y nor itsdescendents are in C

If A is not D-separated from B by C we say that A is D-connected toB by C

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 107 / 117

Page 561: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

D-Separation

Graphs provide us a way to think about conditional independencestatements. Consider disjoint subsets of the vertices A, B and C

A is D-separated from B by C if and only if C blocks every path froma vertex in A to a vertex in B

A path p is said to be blocked by a set of vertices C if and only if atleast one of the following conditions holds:

1 p contains a chain structure a→ c → b or a fork structure a← c → bwhere the node c is in the set C

2 p contains a collider structure a→ y ← b where neither y nor itsdescendents are in C

If A is not D-separated from B by C we say that A is D-connected toB by C

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 107 / 117

Page 562: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Backdoor Criterion

Generally we want to know if we can nonparametrically identify theaverage effect of T on Y given a set of possible conditioning variablesX

Backdoor Criterion for X1 No node in X is a descendent of T

(i.e. don’t condition on post-treatment variables!)2 X D-separates every path between T and Y that has an incoming

arrow into T (backdoor path)

In essence, we are trying to block all non-causal paths, so we canestimate the causal path.

Backdoor criterion is just one way to identify the effect: but its themost popular approach in the social sciences and what we are tryingto do 99% of the time.

See also Frontdoor Criterion in the social sciences in work by Glynnand Kashin

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 108 / 117

Page 563: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Backdoor Criterion

Generally we want to know if we can nonparametrically identify theaverage effect of T on Y given a set of possible conditioning variablesX

Backdoor Criterion for X1 No node in X is a descendent of T

(i.e. don’t condition on post-treatment variables!)

2 X D-separates every path between T and Y that has an incomingarrow into T (backdoor path)

In essence, we are trying to block all non-causal paths, so we canestimate the causal path.

Backdoor criterion is just one way to identify the effect: but its themost popular approach in the social sciences and what we are tryingto do 99% of the time.

See also Frontdoor Criterion in the social sciences in work by Glynnand Kashin

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 108 / 117

Page 564: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Backdoor Criterion

Generally we want to know if we can nonparametrically identify theaverage effect of T on Y given a set of possible conditioning variablesX

Backdoor Criterion for X1 No node in X is a descendent of T

(i.e. don’t condition on post-treatment variables!)2 X D-separates every path between T and Y that has an incoming

arrow into T (backdoor path)

In essence, we are trying to block all non-causal paths, so we canestimate the causal path.

Backdoor criterion is just one way to identify the effect: but its themost popular approach in the social sciences and what we are tryingto do 99% of the time.

See also Frontdoor Criterion in the social sciences in work by Glynnand Kashin

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 108 / 117

Page 565: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Backdoor Criterion

Generally we want to know if we can nonparametrically identify theaverage effect of T on Y given a set of possible conditioning variablesX

Backdoor Criterion for X1 No node in X is a descendent of T

(i.e. don’t condition on post-treatment variables!)2 X D-separates every path between T and Y that has an incoming

arrow into T (backdoor path)

In essence, we are trying to block all non-causal paths, so we canestimate the causal path.

Backdoor criterion is just one way to identify the effect: but its themost popular approach in the social sciences and what we are tryingto do 99% of the time.

See also Frontdoor Criterion in the social sciences in work by Glynnand Kashin

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 108 / 117

Page 566: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Backdoor Criterion

Generally we want to know if we can nonparametrically identify theaverage effect of T on Y given a set of possible conditioning variablesX

Backdoor Criterion for X1 No node in X is a descendent of T

(i.e. don’t condition on post-treatment variables!)2 X D-separates every path between T and Y that has an incoming

arrow into T (backdoor path)

In essence, we are trying to block all non-causal paths, so we canestimate the causal path.

Backdoor criterion is just one way to identify the effect: but its themost popular approach in the social sciences and what we are tryingto do 99% of the time.

See also Frontdoor Criterion in the social sciences in work by Glynnand Kashin

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 108 / 117

Page 567: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Backdoor Criterion

Generally we want to know if we can nonparametrically identify theaverage effect of T on Y given a set of possible conditioning variablesX

Backdoor Criterion for X1 No node in X is a descendent of T

(i.e. don’t condition on post-treatment variables!)2 X D-separates every path between T and Y that has an incoming

arrow into T (backdoor path)

In essence, we are trying to block all non-causal paths, so we canestimate the causal path.

Backdoor criterion is just one way to identify the effect: but its themost popular approach in the social sciences and what we are tryingto do 99% of the time.

See also Frontdoor Criterion in the social sciences in work by Glynnand Kashin

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 108 / 117

Page 568: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Thoughts on DAGs and Potential Outcomes

Two very different languages for talking about and thinking aboutcausal inferences

Potential outcomes is very focused on thinking about the treatmentassignment mechanism.

Potential outcomes is also less of a “foreign language” for moststatisticians, but in my experience lumps together a lot ofidentification assumptions in opaque ignorability conditions.

Graphical Models with DAGs are very visually appealing but theoperations on the graph can be challenging

DAGs very helpful for thinking through identification and the entirecausal processNote that both are about non-parametric identification and notestimation. This is good and bad.

I Good: provides a very general framework that applies in non-linearscenarios and interactions

I Bad: identification results for identification only holds when variable iscompletely controlled for (which may be difficult!)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 109 / 117

Page 569: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Thoughts on DAGs and Potential Outcomes

Two very different languages for talking about and thinking aboutcausal inferences

Potential outcomes is very focused on thinking about the treatmentassignment mechanism.

Potential outcomes is also less of a “foreign language” for moststatisticians, but in my experience lumps together a lot ofidentification assumptions in opaque ignorability conditions.

Graphical Models with DAGs are very visually appealing but theoperations on the graph can be challenging

DAGs very helpful for thinking through identification and the entirecausal processNote that both are about non-parametric identification and notestimation. This is good and bad.

I Good: provides a very general framework that applies in non-linearscenarios and interactions

I Bad: identification results for identification only holds when variable iscompletely controlled for (which may be difficult!)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 109 / 117

Page 570: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Thoughts on DAGs and Potential Outcomes

Two very different languages for talking about and thinking aboutcausal inferences

Potential outcomes is very focused on thinking about the treatmentassignment mechanism.

Potential outcomes is also less of a “foreign language” for moststatisticians, but in my experience lumps together a lot ofidentification assumptions in opaque ignorability conditions.

Graphical Models with DAGs are very visually appealing but theoperations on the graph can be challenging

DAGs very helpful for thinking through identification and the entirecausal processNote that both are about non-parametric identification and notestimation. This is good and bad.

I Good: provides a very general framework that applies in non-linearscenarios and interactions

I Bad: identification results for identification only holds when variable iscompletely controlled for (which may be difficult!)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 109 / 117

Page 571: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Thoughts on DAGs and Potential Outcomes

Two very different languages for talking about and thinking aboutcausal inferences

Potential outcomes is very focused on thinking about the treatmentassignment mechanism.

Potential outcomes is also less of a “foreign language” for moststatisticians, but in my experience lumps together a lot ofidentification assumptions in opaque ignorability conditions.

Graphical Models with DAGs are very visually appealing but theoperations on the graph can be challenging

DAGs very helpful for thinking through identification and the entirecausal processNote that both are about non-parametric identification and notestimation. This is good and bad.

I Good: provides a very general framework that applies in non-linearscenarios and interactions

I Bad: identification results for identification only holds when variable iscompletely controlled for (which may be difficult!)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 109 / 117

Page 572: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Thoughts on DAGs and Potential Outcomes

Two very different languages for talking about and thinking aboutcausal inferences

Potential outcomes is very focused on thinking about the treatmentassignment mechanism.

Potential outcomes is also less of a “foreign language” for moststatisticians, but in my experience lumps together a lot ofidentification assumptions in opaque ignorability conditions.

Graphical Models with DAGs are very visually appealing but theoperations on the graph can be challenging

DAGs very helpful for thinking through identification and the entirecausal processNote that both are about non-parametric identification and notestimation. This is good and bad.

I Good: provides a very general framework that applies in non-linearscenarios and interactions

I Bad: identification results for identification only holds when variable iscompletely controlled for (which may be difficult!)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 109 / 117

Page 573: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Thoughts on DAGs and Potential Outcomes

Two very different languages for talking about and thinking aboutcausal inferences

Potential outcomes is very focused on thinking about the treatmentassignment mechanism.

Potential outcomes is also less of a “foreign language” for moststatisticians, but in my experience lumps together a lot ofidentification assumptions in opaque ignorability conditions.

Graphical Models with DAGs are very visually appealing but theoperations on the graph can be challenging

DAGs very helpful for thinking through identification and the entirecausal process

Note that both are about non-parametric identification and notestimation. This is good and bad.

I Good: provides a very general framework that applies in non-linearscenarios and interactions

I Bad: identification results for identification only holds when variable iscompletely controlled for (which may be difficult!)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 109 / 117

Page 574: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Thoughts on DAGs and Potential Outcomes

Two very different languages for talking about and thinking aboutcausal inferences

Potential outcomes is very focused on thinking about the treatmentassignment mechanism.

Potential outcomes is also less of a “foreign language” for moststatisticians, but in my experience lumps together a lot ofidentification assumptions in opaque ignorability conditions.

Graphical Models with DAGs are very visually appealing but theoperations on the graph can be challenging

DAGs very helpful for thinking through identification and the entirecausal processNote that both are about non-parametric identification and notestimation. This is good and bad.

I Good: provides a very general framework that applies in non-linearscenarios and interactions

I Bad: identification results for identification only holds when variable iscompletely controlled for (which may be difficult!)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 109 / 117

Page 575: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Thoughts on DAGs and Potential Outcomes

Two very different languages for talking about and thinking aboutcausal inferences

Potential outcomes is very focused on thinking about the treatmentassignment mechanism.

Potential outcomes is also less of a “foreign language” for moststatisticians, but in my experience lumps together a lot ofidentification assumptions in opaque ignorability conditions.

Graphical Models with DAGs are very visually appealing but theoperations on the graph can be challenging

DAGs very helpful for thinking through identification and the entirecausal processNote that both are about non-parametric identification and notestimation. This is good and bad.

I Good: provides a very general framework that applies in non-linearscenarios and interactions

I Bad: identification results for identification only holds when variable iscompletely controlled for (which may be difficult!)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 109 / 117

Page 576: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Thoughts on DAGs and Potential Outcomes

Two very different languages for talking about and thinking aboutcausal inferences

Potential outcomes is very focused on thinking about the treatmentassignment mechanism.

Potential outcomes is also less of a “foreign language” for moststatisticians, but in my experience lumps together a lot ofidentification assumptions in opaque ignorability conditions.

Graphical Models with DAGs are very visually appealing but theoperations on the graph can be challenging

DAGs very helpful for thinking through identification and the entirecausal processNote that both are about non-parametric identification and notestimation. This is good and bad.

I Good: provides a very general framework that applies in non-linearscenarios and interactions

I Bad: identification results for identification only holds when variable iscompletely controlled for (which may be difficult!)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 109 / 117

Page 577: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Next “Week” (Three Classes)

Diagnostics

Unusual and Influential Data → Robust Estimation (Day 1)

Nonlinearity → Generalized Additive Models (Day 2)

Unusual Errors→ Sandwich Standard Errors/Block Bootstrap (Day 3)

Reading:I Fox Chapters 11-13I Optional: Fox Chapter 19 Robust RegressionI Optional: King and Roberts “How Robust Standard Errors Expose

Methodological Problems They Do Not Fix, and What to Do AboutIt.” Political Analysis, 2, 23: 159179.

I Optional: Aronow and Miller Chapters 4.2-4.4 (Inference, Clustering,Nonlinearity)

I Optional: Angrist and Pishke Chapter 8 (Nonstandard Standard ErrorIssues)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 110 / 117

Page 578: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Next “Week” (Three Classes)

Diagnostics

Unusual and Influential Data → Robust Estimation (Day 1)

Nonlinearity → Generalized Additive Models (Day 2)

Unusual Errors→ Sandwich Standard Errors/Block Bootstrap (Day 3)

Reading:I Fox Chapters 11-13I Optional: Fox Chapter 19 Robust RegressionI Optional: King and Roberts “How Robust Standard Errors Expose

Methodological Problems They Do Not Fix, and What to Do AboutIt.” Political Analysis, 2, 23: 159179.

I Optional: Aronow and Miller Chapters 4.2-4.4 (Inference, Clustering,Nonlinearity)

I Optional: Angrist and Pishke Chapter 8 (Nonstandard Standard ErrorIssues)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 110 / 117

Page 579: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Next “Week” (Three Classes)

Diagnostics

Unusual and Influential Data → Robust Estimation (Day 1)

Nonlinearity → Generalized Additive Models (Day 2)

Unusual Errors→ Sandwich Standard Errors/Block Bootstrap (Day 3)

Reading:I Fox Chapters 11-13I Optional: Fox Chapter 19 Robust RegressionI Optional: King and Roberts “How Robust Standard Errors Expose

Methodological Problems They Do Not Fix, and What to Do AboutIt.” Political Analysis, 2, 23: 159179.

I Optional: Aronow and Miller Chapters 4.2-4.4 (Inference, Clustering,Nonlinearity)

I Optional: Angrist and Pishke Chapter 8 (Nonstandard Standard ErrorIssues)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 110 / 117

Page 580: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Next “Week” (Three Classes)

Diagnostics

Unusual and Influential Data → Robust Estimation (Day 1)

Nonlinearity → Generalized Additive Models (Day 2)

Unusual Errors→ Sandwich Standard Errors/Block Bootstrap (Day 3)

Reading:I Fox Chapters 11-13I Optional: Fox Chapter 19 Robust RegressionI Optional: King and Roberts “How Robust Standard Errors Expose

Methodological Problems They Do Not Fix, and What to Do AboutIt.” Political Analysis, 2, 23: 159179.

I Optional: Aronow and Miller Chapters 4.2-4.4 (Inference, Clustering,Nonlinearity)

I Optional: Angrist and Pishke Chapter 8 (Nonstandard Standard ErrorIssues)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 110 / 117

Page 581: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Next “Week” (Three Classes)

Diagnostics

Unusual and Influential Data → Robust Estimation (Day 1)

Nonlinearity → Generalized Additive Models (Day 2)

Unusual Errors→ Sandwich Standard Errors/Block Bootstrap (Day 3)

Reading:I Fox Chapters 11-13I Optional: Fox Chapter 19 Robust RegressionI Optional: King and Roberts “How Robust Standard Errors Expose

Methodological Problems They Do Not Fix, and What to Do AboutIt.” Political Analysis, 2, 23: 159179.

I Optional: Aronow and Miller Chapters 4.2-4.4 (Inference, Clustering,Nonlinearity)

I Optional: Angrist and Pishke Chapter 8 (Nonstandard Standard ErrorIssues)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 110 / 117

Page 582: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Next “Week” (Three Classes)

Diagnostics

Unusual and Influential Data → Robust Estimation (Day 1)

Nonlinearity → Generalized Additive Models (Day 2)

Unusual Errors→ Sandwich Standard Errors/Block Bootstrap (Day 3)

Reading:I Fox Chapters 11-13I Optional: Fox Chapter 19 Robust RegressionI Optional: King and Roberts “How Robust Standard Errors Expose

Methodological Problems They Do Not Fix, and What to Do AboutIt.” Political Analysis, 2, 23: 159179.

I Optional: Aronow and Miller Chapters 4.2-4.4 (Inference, Clustering,Nonlinearity)

I Optional: Angrist and Pishke Chapter 8 (Nonstandard Standard ErrorIssues)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 110 / 117

Page 583: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Fun with a Bundle of Sticks

Sen and Wasow (2016) “Race as a Bundle of Sticks:Designs that Estimate Effects of Seemingly ImmutableCharacteristics” Annual Review of Political Science.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 111 / 117

Page 584: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

No Causation Without Manipulation

One of the difficulties that students have with causal inference is theneed for manipulation or an ideal experiment.

In many areas the key variables are immutable such as race or gender

Sen and Wasow argue that we can improve our empirical work on thisby seeing race/ethnicity as a composite variable or ‘a bundle of sticks’which can be manipulated separately

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 112 / 117

Page 585: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

No Causation Without Manipulation

One of the difficulties that students have with causal inference is theneed for manipulation or an ideal experiment.

In many areas the key variables are immutable such as race or gender

Sen and Wasow argue that we can improve our empirical work on thisby seeing race/ethnicity as a composite variable or ‘a bundle of sticks’which can be manipulated separately

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 112 / 117

Page 586: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

No Causation Without Manipulation

One of the difficulties that students have with causal inference is theneed for manipulation or an ideal experiment.

In many areas the key variables are immutable such as race or gender

Sen and Wasow argue that we can improve our empirical work on thisby seeing race/ethnicity as a composite variable or ‘a bundle of sticks’which can be manipulated separately

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 112 / 117

Page 587: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

No Causation Without Manipulation

One of the difficulties that students have with causal inference is theneed for manipulation or an ideal experiment.

In many areas the key variables are immutable such as race or gender

Sen and Wasow argue that we can improve our empirical work on thisby seeing race/ethnicity as a composite variable or ‘a bundle of sticks’which can be manipulated separately

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 112 / 117

Page 588: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Race As Treatment

There are three problems with race as a treatment in the causal inferencesense

1 Race cannot be manipulatedI without the capacity to manipulate the question is arguably ill-posed

and the estimand is unidentified

2 Everything else is post-treatmentI everything else comes after race which is perhaps unsatisfyingI this also presumes we are only interested in the total effect

3 Race is unstableI there is substantial variance across treatments which is a SUTVA

violation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 113 / 117

Page 589: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Race As Treatment

There are three problems with race as a treatment in the causal inferencesense

1 Race cannot be manipulatedI without the capacity to manipulate the question is arguably ill-posed

and the estimand is unidentified

2 Everything else is post-treatmentI everything else comes after race which is perhaps unsatisfyingI this also presumes we are only interested in the total effect

3 Race is unstableI there is substantial variance across treatments which is a SUTVA

violation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 113 / 117

Page 590: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Race As Treatment

There are three problems with race as a treatment in the causal inferencesense

1 Race cannot be manipulated

I without the capacity to manipulate the question is arguably ill-posedand the estimand is unidentified

2 Everything else is post-treatmentI everything else comes after race which is perhaps unsatisfyingI this also presumes we are only interested in the total effect

3 Race is unstableI there is substantial variance across treatments which is a SUTVA

violation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 113 / 117

Page 591: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Race As Treatment

There are three problems with race as a treatment in the causal inferencesense

1 Race cannot be manipulatedI without the capacity to manipulate the question is arguably ill-posed

and the estimand is unidentified

2 Everything else is post-treatmentI everything else comes after race which is perhaps unsatisfyingI this also presumes we are only interested in the total effect

3 Race is unstableI there is substantial variance across treatments which is a SUTVA

violation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 113 / 117

Page 592: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Race As Treatment

There are three problems with race as a treatment in the causal inferencesense

1 Race cannot be manipulatedI without the capacity to manipulate the question is arguably ill-posed

and the estimand is unidentified

2 Everything else is post-treatment

I everything else comes after race which is perhaps unsatisfyingI this also presumes we are only interested in the total effect

3 Race is unstableI there is substantial variance across treatments which is a SUTVA

violation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 113 / 117

Page 593: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Race As Treatment

There are three problems with race as a treatment in the causal inferencesense

1 Race cannot be manipulatedI without the capacity to manipulate the question is arguably ill-posed

and the estimand is unidentified

2 Everything else is post-treatmentI everything else comes after race which is perhaps unsatisfying

I this also presumes we are only interested in the total effect

3 Race is unstableI there is substantial variance across treatments which is a SUTVA

violation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 113 / 117

Page 594: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Race As Treatment

There are three problems with race as a treatment in the causal inferencesense

1 Race cannot be manipulatedI without the capacity to manipulate the question is arguably ill-posed

and the estimand is unidentified

2 Everything else is post-treatmentI everything else comes after race which is perhaps unsatisfyingI this also presumes we are only interested in the total effect

3 Race is unstableI there is substantial variance across treatments which is a SUTVA

violation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 113 / 117

Page 595: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Race As Treatment

There are three problems with race as a treatment in the causal inferencesense

1 Race cannot be manipulatedI without the capacity to manipulate the question is arguably ill-posed

and the estimand is unidentified

2 Everything else is post-treatmentI everything else comes after race which is perhaps unsatisfyingI this also presumes we are only interested in the total effect

3 Race is unstable

I there is substantial variance across treatments which is a SUTVAviolation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 113 / 117

Page 596: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Trouble with Race As Treatment

There are three problems with race as a treatment in the causal inferencesense

1 Race cannot be manipulatedI without the capacity to manipulate the question is arguably ill-posed

and the estimand is unidentified

2 Everything else is post-treatmentI everything else comes after race which is perhaps unsatisfyingI this also presumes we are only interested in the total effect

3 Race is unstableI there is substantial variance across treatments which is a SUTVA

violation

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 113 / 117

Page 597: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 114 / 117

Page 598: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

The Bundle of Sticks

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 114 / 117

Page 599: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Design 1: Exposure Studies

Approach

a) “one or more elements of race is identified as a relevant cue”b) “subjects are treated by exposure to the racial cue”c) “unit of analysis is the individual or institution being exposed”

ExamplesI Psychology (Steele 1997 on stereotype threat)I Audit/Correspondence Studies (Pager 2003, Bertrand and

Mullainathan 2004)I Survey Experiments with Racial Cues (Mendelberg 2001)I Field Experiments with Racial Cues (Green 2004, Enos 2011)I Observational Studies (Greiner and Rubin 2010, Wasow 2012)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 115 / 117

Page 600: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Design 1: Exposure Studies

Approach

a) “one or more elements of race is identified as a relevant cue”

b) “subjects are treated by exposure to the racial cue”c) “unit of analysis is the individual or institution being exposed”

ExamplesI Psychology (Steele 1997 on stereotype threat)I Audit/Correspondence Studies (Pager 2003, Bertrand and

Mullainathan 2004)I Survey Experiments with Racial Cues (Mendelberg 2001)I Field Experiments with Racial Cues (Green 2004, Enos 2011)I Observational Studies (Greiner and Rubin 2010, Wasow 2012)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 115 / 117

Page 601: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Design 1: Exposure Studies

Approach

a) “one or more elements of race is identified as a relevant cue”b) “subjects are treated by exposure to the racial cue”

c) “unit of analysis is the individual or institution being exposed”

ExamplesI Psychology (Steele 1997 on stereotype threat)I Audit/Correspondence Studies (Pager 2003, Bertrand and

Mullainathan 2004)I Survey Experiments with Racial Cues (Mendelberg 2001)I Field Experiments with Racial Cues (Green 2004, Enos 2011)I Observational Studies (Greiner and Rubin 2010, Wasow 2012)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 115 / 117

Page 602: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Design 1: Exposure Studies

Approach

a) “one or more elements of race is identified as a relevant cue”b) “subjects are treated by exposure to the racial cue”c) “unit of analysis is the individual or institution being exposed”

ExamplesI Psychology (Steele 1997 on stereotype threat)I Audit/Correspondence Studies (Pager 2003, Bertrand and

Mullainathan 2004)I Survey Experiments with Racial Cues (Mendelberg 2001)I Field Experiments with Racial Cues (Green 2004, Enos 2011)I Observational Studies (Greiner and Rubin 2010, Wasow 2012)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 115 / 117

Page 603: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Design 1: Exposure Studies

Approach

a) “one or more elements of race is identified as a relevant cue”b) “subjects are treated by exposure to the racial cue”c) “unit of analysis is the individual or institution being exposed”

ExamplesI Psychology (Steele 1997 on stereotype threat)

I Audit/Correspondence Studies (Pager 2003, Bertrand andMullainathan 2004)

I Survey Experiments with Racial Cues (Mendelberg 2001)I Field Experiments with Racial Cues (Green 2004, Enos 2011)I Observational Studies (Greiner and Rubin 2010, Wasow 2012)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 115 / 117

Page 604: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Design 1: Exposure Studies

Approach

a) “one or more elements of race is identified as a relevant cue”b) “subjects are treated by exposure to the racial cue”c) “unit of analysis is the individual or institution being exposed”

ExamplesI Psychology (Steele 1997 on stereotype threat)I Audit/Correspondence Studies (Pager 2003, Bertrand and

Mullainathan 2004)

I Survey Experiments with Racial Cues (Mendelberg 2001)I Field Experiments with Racial Cues (Green 2004, Enos 2011)I Observational Studies (Greiner and Rubin 2010, Wasow 2012)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 115 / 117

Page 605: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Design 1: Exposure Studies

Approach

a) “one or more elements of race is identified as a relevant cue”b) “subjects are treated by exposure to the racial cue”c) “unit of analysis is the individual or institution being exposed”

ExamplesI Psychology (Steele 1997 on stereotype threat)I Audit/Correspondence Studies (Pager 2003, Bertrand and

Mullainathan 2004)I Survey Experiments with Racial Cues (Mendelberg 2001)

I Field Experiments with Racial Cues (Green 2004, Enos 2011)I Observational Studies (Greiner and Rubin 2010, Wasow 2012)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 115 / 117

Page 606: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Design 1: Exposure Studies

Approach

a) “one or more elements of race is identified as a relevant cue”b) “subjects are treated by exposure to the racial cue”c) “unit of analysis is the individual or institution being exposed”

ExamplesI Psychology (Steele 1997 on stereotype threat)I Audit/Correspondence Studies (Pager 2003, Bertrand and

Mullainathan 2004)I Survey Experiments with Racial Cues (Mendelberg 2001)I Field Experiments with Racial Cues (Green 2004, Enos 2011)

I Observational Studies (Greiner and Rubin 2010, Wasow 2012)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 115 / 117

Page 607: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Design 1: Exposure Studies

Approach

a) “one or more elements of race is identified as a relevant cue”b) “subjects are treated by exposure to the racial cue”c) “unit of analysis is the individual or institution being exposed”

ExamplesI Psychology (Steele 1997 on stereotype threat)I Audit/Correspondence Studies (Pager 2003, Bertrand and

Mullainathan 2004)I Survey Experiments with Racial Cues (Mendelberg 2001)I Field Experiments with Racial Cues (Green 2004, Enos 2011)I Observational Studies (Greiner and Rubin 2010, Wasow 2012)

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 115 / 117

Page 608: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Design 2: Within-Group Studies

Approach: identify variation within the racial group along constitutiveelement.

Example: Sharkey (2010) exploiting temporal variation in localhomicides in Chicago to identify a significant neighborhood effect ofproximity to violence on cognitive performance of African-Americanchildren

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 116 / 117

Page 609: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Design 2: Within-Group Studies

Approach: identify variation within the racial group along constitutiveelement.

Example: Sharkey (2010) exploiting temporal variation in localhomicides in Chicago to identify a significant neighborhood effect ofproximity to violence on cognitive performance of African-Americanchildren

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 116 / 117

Page 610: Week 8: Regression in the Social Sciences · Week 8: Regression in the Social Sciences Brandon Stewart1 Princeton November 7 and 9, 2016 1These slides are heavily in uenced by Matt

Concluding Thoughts

We can study race with causal inference, it just takes very careful design.

Stewart (Princeton) Week 8: Regression in the Social Sciences November 7 and 9, 2016 117 / 117