stata statistics

35
Regression in Stata Ista Zahn Harvard MIT Data Center January 24 2013 e Institute for Quantitative Social Science at Harvard University Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 1 / 35

Upload: izahn

Post on 27-Jan-2015

198 views

Category:

Technology


6 download

DESCRIPTION

Topics for the class include multiple regression, dummy variables, interaction effects, hypothesis tests, and model diagnostics. Prerequisites include a general familiarity with Stata, including importing and managing datasets and data exploration, the linear regression model, and the ordinary least squares estimation. Workshop materials including do files and example data sets are available from http://projects.iq.harvard.edu/rtc/event/regression-stata

TRANSCRIPT

Page 1: Stata statistics

Regression in Stata

Ista Zahn

Harvard MIT Data Center

January 24 2013

�e Institutefor Quantitative Social Scienceat Harvard University

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 1 / 35

Page 2: Stata statistics

Outline

1 Introduction

2 Univariate regression

3 Multiple regression

4 Interactions

5 Exporting and saving results

6 Wrap-up

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 2 / 35

Page 3: Stata statistics

Introduction

Topic

1 Introduction

2 Univariate regression

3 Multiple regression

4 Interactions

5 Exporting and saving results

6 Wrap-up

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 3 / 35

Page 4: Stata statistics

Introduction

Organization

Please feel free to ask questions at any point if they are relevant tothe current topic (or if you are lost!)There will be a Q&A after class for more specific, personalizedquestionsCollaboration with your neighbors is encouragedIf you are using a laptop, you will need to adjust paths accordinglyMake comments in your Do-file rather than on hand-outsSave on flash drive or email to yourself

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 4 / 35

Page 5: Stata statistics

Introduction

Copy the workshop materials to your home directory

Log in to an Athena workstation using your Athena user name andpasswordClick on the “Ubuntu” button on the upper-left and type “term” asshown below

Click on the “Terminal” icon as shown aboveIn the terminal, type this line exactly as shown:

cd; wget http://tinyurl.com/stata-stats-zip; unzip stata-stats-zip

If you see “ERROR 404: Not Found”, then you mistyped the command– try again, making sure to type the command exactly as shown

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 5 / 35

Page 6: Stata statistics

Introduction

Launch Stata on Athena

To start Stata type these commands in the terminal:

add stataxstata

Open up today’s Stata scriptIn Stata, go to Window => New do file => OpenLocate and open the StatStatistics.do script in the StataStatisticsfolder in your home directory

I encourage you to add your own notes to this file!

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 6 / 35

Page 7: Stata statistics

Introduction

Today’s Dataset

We have data on a variety of variables for all 50 statesPopulation, density, energy use, voting tendencies, graduation rates,income, etc.We’re going to be predicting SAT scoresUnivariate Regression: SAT scores and Education ExpendituresDoes the amount of money spent on education affect the mean SATscore in a state?Dependent variable: csatIndependent variable: expense

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 7 / 35

Page 8: Stata statistics

Introduction

Opening Files in Stata

Look at bottom left hand corner of Stata screen – This is the directoryStata is currently reading fromFiles are located in the StataDatMan folderStart by changing directory and loading the data

// change directorycd "C:/Users/dataclass/Desktop/StataStatistics"// use dir to see what is in the directory:dir// tell Stata to use the states data setuse states.dta

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 8 / 35

Page 9: Stata statistics

Introduction

Steps for Running Regression

1 Examine descriptive statistics2 Look at relationship graphically and test correlation(s)3 Run and interpret regression4 Test regression assumptions

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 9 / 35

Page 10: Stata statistics

Univariate regression

Topic

1 Introduction

2 Univariate regression

3 Multiple regression

4 Interactions

5 Exporting and saving results

6 Wrap-up

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 10 / 35

Page 11: Stata statistics

Univariate regression

Univariate Regression: Preliminaries

We want to predict csat scores from expenseFirst, let’s look at some descriptives

// generate summary statistics for csat and expensesum csat expense// look at codebokcodebook csat expense

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 11 / 35

Page 12: Stata statistics

Univariate regression

Univariate Regression

Look at scatterplots, compute correlation matrix, and regress SATscores on expenditures

// graph expense by csattwoway scatter expense csat

// correlate csat and expensepwcorr csat expense, star(.05)

// run the regressionregress csat expense

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 12 / 35

Page 13: Stata statistics

Univariate regression

Linear Regression Assumptions

Assumption 1: Normal DistributionThe errors of regression equation are normally distributedAssumption 2: Homoscedasticity (The variance around the regressionline is the same for all values of the predictor variable)Assumption 3: Errors are independentAssumption 4: Relationships are linear

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 13 / 35

Page 14: Stata statistics

Univariate regression

Homoscedasticity

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 14 / 35

Page 15: Stata statistics

Univariate regression

Testing Assumptions: Normality

A simple histogram of the residuals can be informative

// graph the residual values of csatpredict resid, residualhistogram resid, normal

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 15 / 35

Page 16: Stata statistics

Univariate regression

Testing Assumptions: Homoscedasticity

rvfplot

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 16 / 35

Page 17: Stata statistics

Multiple regression

Topic

1 Introduction

2 Univariate regression

3 Multiple regression

4 Interactions

5 Exporting and saving results

6 Wrap-up

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 17 / 35

Page 18: Stata statistics

Multiple regression

Multiple Regression

Just keep adding predictorsLet’s try adding some predictors to the model of SAT scoresincome :: % students taking SATspercent :: % adults with HS diploma (high)

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 18 / 35

Page 19: Stata statistics

Multiple regression

Multiple Regression Preliminaries

As before, start with descriptive statistics and correlations

// descriptive statisticssum income percent high

// generate correlation matrixpwcorr csat expense income percent high

// regress csat on exense, income, percent, and high\regress csat expense income percent high

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 19 / 35

Page 20: Stata statistics

Multiple regression

Exercise 1: Multiple Regression

Open the datafile, states.dta.1 Select a few variables to use in a multiple regression of your own.

Before running the regression, examine descriptive of the variables andgenerate a few scatterplots.

2 Run your regression3 Examine the plausibility of the assumptions of normality and

homogeneity

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 20 / 35

Page 21: Stata statistics

Interactions

Topic

1 Introduction

2 Univariate regression

3 Multiple regression

4 Interactions

5 Exporting and saving results

6 Wrap-up

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 21 / 35

Page 22: Stata statistics

Interactions

Interactions

What if we wanted to test an interaction between percent & high?Option 1: generate product terms by hand

// generate product of percent and highgen percenthigh = percent*highregress csat expense income percent high percenthigh

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 22 / 35

Page 23: Stata statistics

Interactions

Interactions

What if we wanted to test an interaction between percent & high?Option 2: Let Stata do your dirty work

// use the # sign to represent interactionsregress csat percent high c.percent#c.high// same as . regress csat c.percent##high

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 23 / 35

Page 24: Stata statistics

Interactions

Categorical Predictors

For categorical variables, we first need to dummy codeUse region as example

Option 1: create dummy codes before fitting regression model

// create region dummy codes using tabtab region, gen(region) // could also use gen / replace

//regress csat on regionregress csat region1 region2 region3

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 24 / 35

Page 25: Stata statistics

Interactions

Categorical Predictors

For categorical variables, we first need to dummy codeUse region as example

Option 2: Let Stata do it for you

// regress csat on region using fvvarlist syntax// see help fvvarlist for detailsregress csat i.region

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 25 / 35

Page 26: Stata statistics

Interactions

Exercise 2: Regression, Categorical Predictors, &Interactions

Open the datafile, states.dta.1 Add on to the regression equation that you created in exercise 1 by

generating an interaction term and testing the interaction.2 Try adding a categorical variable to your regression (remember, it will

need to be dummy coded). You could use region or high25, orgenerate a new categorical variable from one of the continuousvariables in the dataset.

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 26 / 35

Page 27: Stata statistics

Exporting and saving results

Topic

1 Introduction

2 Univariate regression

3 Multiple regression

4 Interactions

5 Exporting and saving results

6 Wrap-up

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 27 / 35

Page 28: Stata statistics

Exporting and saving results

Saving and exporting regression tables

Usually when we’re running regression, we’ll be testing multiplemodels at a timeCan be difficult to compare resultsStata offers several user-friendly options for storing and viewingregression output from multiple modelsFirst, download the necessary packages:

* install outreg2 packagefindit outreg2

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 28 / 35

Page 29: Stata statistics

Exporting and saving results

Saving and replaying

You can store regression model results in Stata

// fit two regression models and store the resultsregress csat expense income percent highestimates store Model1regress csat expense income percent high i.regionestimates store Model2

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 29 / 35

Page 30: Stata statistics

Exporting and saving results

Saving and replaying

Stored models can be recalled

// Display Model1estimates replay Model1

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 30 / 35

Page 31: Stata statistics

Exporting and saving results

Saving and replaying

Stored models can be compared

// Compare Model1 and Model2 coefficientsestimates table Model1 Model2

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 31 / 35

Page 32: Stata statistics

Exporting and saving results

Exporting into Excel

Avoid human error when transferring coefficients into tablesExcel can be used to format publication-ready tables

outreg2 [Model1 Model2] using csatprediction.xls, replace

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 32 / 35

Page 33: Stata statistics

Wrap-up

Topic

1 Introduction

2 Univariate regression

3 Multiple regression

4 Interactions

5 Exporting and saving results

6 Wrap-up

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 33 / 35

Page 34: Stata statistics

Wrap-up

Help Us Make This Workshop Better

Please take a moment to fill out a very short feedback formThese workshops exist for you–tell us what you need!ttp://tinyurl.com/StataRegressionFeedback

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 34 / 35

Page 35: Stata statistics

Wrap-up

Additional resources

training and consultingIQSS workshops:http://projects.iq.harvard.edu/rtc/filter_by/workshopsIQSS statistical consulting: http://rtc.iq.harvard.edu

Stata resourcesUCLA website: http://www.ats.ucla.edu/stat/Stata/Great for self-studyLinks to resources

Stata website: http://www.stata.com/help.cgi?contentsEmail list: http://www.stata.com/statalist/

Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 35 / 35