![Page 1: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/1.jpg)
Statistics for Social and Behavioral Sciences
Part IV: Causality
Multivariate Regression
Chapter 11
Prof. Amine Ouazad
![Page 2: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/2.jpg)
Movie Buzz
• Can we predict the success of a movie?
1. Avatar (2009) $760,505,847
2. Titanic (1997) $658,672,302
3. The Avengers (2012) $623,279,547
4. The Dark Knight (2008) $533,316,061
5. Star Wars: Episode I – The Phantom Menace (1999) $474,544,677
![Page 3: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/3.jpg)
Data
• Box_mil = First run U.S. box office (Millions of $)
• MPRating = 1 if movie is PG13 or R, 0 if the movie is G or PG.
• Budget = Production budget (Millions of $)
• Starpowr = Index of star power
• Sequel = 1 if movie is a sequel, 0 if not
• Action = 1 if action film, 0 if not
• Comedy = 1 if comedy film, 0 if not
• Animated = 1 if animated film, 0 if not
• Horror = 1 if horror film, 0 if not
• Addict = Trailer views at traileraddict.com
• Cmngsoon = Message board comments at comingsoon.net
• Fandango = Attention at fandango.com
• Cntwait3 = Percentage of Fandango votes that can't wait to see.
![Page 4: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/4.jpg)
Statistics Course Outline
PART I. INTRODUCTION AND RESEARCH DESIGN (Week 1)
Four Steps of “Thinking Like a Statistician”. Study Design: Simple Random Sampling, Cluster Sampling, Stratified Sampling. Biases: Nonresponse bias, Response bias, Sampling bias.
PART II. DESCRIBING DATA (Weeks 2-4)
Sample statistics: Mean, Median, SD, Variance, Percentiles, IQR, Empirical Rule. Bivariate sample statistics: Correlation, Slope.
PART III. DRAWING CONCLUSIONS FROM DATA: INFERENTIAL STATISTICS (Weeks 5-9)
Estimating a parameter using sample statistics. Confidence Interval at 90%, 95%, 99%. Testing a hypothesis using the CI method and the t method.
PART IV. CORRELATION AND CAUSATION: TWO GROUPS, REGRESSION ANALYSIS (Weeks 10-14)
Multivariate regression now!
![Page 5: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/5.jpg)
Coming up
• “Comparison of Two Groups”. Last week.
• “Univariate Regression Analysis”. Last Saturday, Section 9.5.
• “Association and Causality: Multivariate Regression”. Last Saturday, Chapter 10. Today, Tomorrow, Chapter 11.
• “Randomized Experiments and ANOVA”. Wednesday. Chapter 12.
• “Robustness Checks and Wrap Up”. Last Thursday.
![Page 6: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/6.jpg)
Outline
1. Multivariate regression
2. Interpreting coefficients: Ceteris Paribus
3. Standardized Coefficient
4. Multiple Correlation and R Squared
Next time: Multivariate regression: the F test (Continued)
![Page 7: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/7.jpg)
Data: Variables
• y Box = First run U.S. box office ($)
• x1 MPRating = 1 if movie is PG13 or R, 0 if the movie is G or PG.
• x2 Budget = Production budget ($Mil)
• x3 Starpowr = Index of star power
• x4 Sequel = 1 if movie is a sequel, 0 if not
• x5 Action = 1 if action film, 0 if not
• x6 Comedy = 1 if comedy film, 0 if not
• x7 Animated = 1 if animated film, 0 if not
• x8 Horror = 1 if horror film, 0 if not
• x9 Addict = Trailer views at traileraddict.com
• x10 Cmngsoon = Message board comments at comingsoon.net
• x11 Fandango = Attention at fandango.com
• x12 Cntwait3 = Percentage of Fandango votes that can't wait to see.
![Page 8: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/8.jpg)
Multivariate Regression
• With variables x1, x2, …, x12.
• We are trying to get the true impact:
b1 of variable x1 on y. b2 of variable x2 on y. … b12 of variable x12 on y.
• True model: y = a + b1 x1 + b2 x2 + b3 x3 + … + b12 x12 + e
We would get the true values of a, b1, …, b12 if we had the population of all possible movies.
![Page 9: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/9.jpg)
Multivariate Regression
• Instead we estimate b1, b2, …, bK on the sample (here K = 12):
– Minimizing the sum of the squared prediction errors!
• With these we can predict the success of a movie:
predicted y = a + b1 x1 + b2 x2 + … + bK xK, using the estimated a, b1, …, bK.
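The least-squares idea on this slide can be illustrated with a minimal sketch. It assumes NumPy is available and uses synthetic data, not the movie data set; the helper names `fit_ols` and `predict` and the chosen "true" coefficients are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Return [a, b1, ..., bK] minimizing the sum of squared prediction errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Add a column of ones so that the intercept a is estimated too.
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

def predict(coefs, X):
    """Predicted y = a + b1 x1 + ... + bK xK."""
    X = np.asarray(X, dtype=float)
    return coefs[0] + X @ coefs[1:]

# Synthetic sample (an assumption for illustration, NOT the movie data).
rng = np.random.default_rng(0)
n = 200
budget = rng.uniform(5, 150, n)      # stands in for x2
sequel = rng.integers(0, 2, n)       # stands in for x4 (0/1 dummy)
box = 10 + 0.15 * budget + 8 * sequel + rng.normal(0, 5, n)

X = np.column_stack([budget, sequel])
coefs = fit_ols(X, box)
sse = float(np.sum((box - predict(coefs, X)) ** 2))
print("estimated a, b_budget, b_sequel:", np.round(coefs, 3))
print("sum of squared prediction errors:", round(sse, 1))
```

Any other choice of coefficients gives a sum of squared prediction errors at least as large on the same sample, which is what "least squares" means.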
![Page 10: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/10.jpg)
Sampling Distribution of b3
• We only observe one coefficient estimate b3, because we have only one sample.
• But across all possible samples, the sampling distribution of b3 is bell-shaped.
• Hence we can design a test:
• H0: “ b3 = 0 ”
Under H0, the t statistic, t = b3 / (standard error of b3), follows a t distribution with N – (K + 1) degrees of freedom.
![Page 11: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/11.jpg)
Hypothesis testing for H0 : “b3=0”
• Reject the null hypothesis at the 95% confidence level (5% significance level) if:
– The absolute value of the t statistic is greater than the t score with N – (K+1) degrees of freedom at 95%.
– Equivalently, if the p value is lower than 0.05.
There are as many null hypotheses as there are coefficients to estimate:
Here, there are
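A minimal sketch of the t test for each coefficient, assuming NumPy and synthetic data. The function name `t_statistics` is hypothetical, and the critical value 1.97 is an approximate two-sided 5% t score for about 200 degrees of freedom, taken as an assumption; in practice it is read from a t table or from software.

```python
import numpy as np

def t_statistics(X, y):
    """Return coefficients, standard errors, t statistics and degrees of freedom."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])       # intercept + K variables
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    dof = n - (k + 1)                               # N - (K + 1)
    sigma2 = residuals @ residuals / dof            # estimated error variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov))
    return coefs, se, coefs / se, dof

# Synthetic sample: x1 has a true effect, x2 has none.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + 0.0 * x2 + rng.normal(size=n)

coefs, se, t, dof = t_statistics(np.column_stack([x1, x2]), y)
T_CRITICAL = 1.97  # assumed approximate 5% two-sided t score for ~200 df
reject = np.abs(t) > T_CRITICAL
for name, t_value, r in zip(["intercept", "x1", "x2"], t, reject):
    print(f"{name}: t = {t_value:.2f}, reject H0 at 5%: {bool(r)}")
```

One test is run per coefficient, each with its own null hypothesis that the coefficient equals 0.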
![Page 12: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/12.jpg)
Outline
1. Multivariate regression
2. Interpreting coefficients: Ceteris Paribus
3. Standardized Coefficient
4. Multiple Correlation and R Squared
Next time: Multivariate regression (Continued)
![Page 13: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/13.jpg)
Ceteris Paribus=“All other things equal”
• “All other things equal”, what is the impact of variable x3 on box office outcome in millions of $?
Increase in star power (variable x3), all other things equal. Keep x1, x2, x4, x5, x6, x7, x8, x9, x10, x11, x12 constant! And change x3.
Increase in x3 (Star power)
![Page 14: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/14.jpg)
Ceteris Paribus=“All other things equal”
• “All other things equal”, what is the impact of variable x2 on box office outcome in millions of $?
Increase in budget (variable x2), all other things equal. Keep x1, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12 constant! And change x2.
Increase in x2 (Budget) by 1 million $
![Page 15: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/15.jpg)
![Page 16: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/16.jpg)
Reading the coefficients
• An increase in budget by 1 million $ leads to a rise in box office of 0.144 million $, all other things equal.
• An action movie has, on average and all other things equal, a box office outcome that is lower by 12 million $.
• An increase in the ‘Percentage of Fandango votes that can't wait to see’ (cntwait3) by 1 percentage point leads to a 0.01 * 32.15 = 0.3215 million $ increase in box office outcome, all other things equal.
We multiply by 0.01 (1%) because cntwait3 ranges from 0 to 1.
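The arithmetic behind these readings is the coefficient times the change in the variable, with everything else held constant. A minimal sketch, using the two coefficients quoted on the slide; the function name `predicted_change` is hypothetical.

```python
# Coefficients quoted on the slide (box office measured in millions of $).
COEF_BUDGET = 0.144     # per 1 million $ of budget
COEF_CNTWAIT3 = 32.15   # per unit of cntwait3, which ranges from 0 to 1

def predicted_change(coefficient, delta_x):
    """Change in predicted y when one x changes by delta_x, all else equal."""
    return coefficient * delta_x

budget_effect = predicted_change(COEF_BUDGET, 1.0)        # +1 million $ budget
cntwait3_effect = predicted_change(COEF_CNTWAIT3, 0.01)   # +1 percentage point

print(f"Budget +1 M$: {budget_effect:.3f} M$")
print(f"cntwait3 +1 point: {cntwait3_effect:.4f} M$")
```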
![Page 17: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/17.jpg)
Which coefficients are statistically significant?
• x1 MPRating = 1 if movie is PG13 or R, 0 if the movie is G or PG. ❏❏❏
• x2 Budget = Production budget ($Mil) ❏❏❏
• x3 Starpowr = Index of star power ❏❏❏
• x4 Sequel = 1 if movie is a sequel, 0 if not ❏❏❏
• x5 Action = 1 if action film, 0 if not ❏❏❏
• x6 Comedy = 1 if comedy film, 0 if not ❏❏❏
• x7 Animated = 1 if animated film, 0 if not ❏❏❏
• x8 Horror = 1 if horror film, 0 if not ❏❏❏
• x9 Addict = Trailer views at traileraddict.com ❏❏❏
• x10 Cmngsoon = Message board comments at comingsoon.net ❏❏❏
• x11 Fandango = Attention at fandango.com ❏❏❏
• x12 Cntwait3 = Percentage of Fandango votes that can't wait to see. ❏❏❏
The three checkboxes for each variable stand for: significant at 10%, at 5%, at 1%.
Read the p value! Or compare the t statistic to the t score with N – 13 degrees of freedom.
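Filling in the three checkboxes amounts to comparing each p value to 0.10, 0.05 and 0.01. A minimal sketch; the p values below are hypothetical placeholders, not the ones from the regression output, and `significance_flags` is an illustrative name.

```python
def significance_flags(p_value, levels=(0.10, 0.05, 0.01)):
    """Map each significance level to True if the coefficient is significant at it."""
    return {level: p_value < level for level in levels}

# Hypothetical p values for illustration only (NOT read from the slide's table).
hypothetical_p_values = {"var_a": 0.003, "var_b": 0.04, "var_c": 0.08, "var_d": 0.35}

for name, p in hypothetical_p_values.items():
    flags = significance_flags(p)
    boxes = " ".join("[x]" if flags[level] else "[ ]" for level in (0.10, 0.05, 0.01))
    print(f"{name}: p = {p:<5} 10% / 5% / 1%: {boxes}")
```

A coefficient that is significant at 1% is automatically significant at 5% and 10%, so the checked boxes always fill from left to right.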
![Page 18: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/18.jpg)
With Budget
![Page 19: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/19.jpg)
Without Budget
![Page 20: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/20.jpg)
Budget and Can’t Wait to See the movie!
• Without budget among the variables, the popularity measure cntwait3 has a bigger impact than with budget included.
[Diagram with four boxes: Budget, Cntwait3, Other variables, Box office (box_mil)]
We know that Budget and Cntwait3 are correlated (an arrow either in one direction or in the other, or both) because including Budget affects the coefficient of Cntwait3.
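The effect of leaving out a correlated variable can be reproduced in a small simulation. This is a sketch under stated assumptions: NumPy is available, the data are synthetic, and the "true" coefficients (0.15 for budget, 30 for cntwait3) are chosen only for illustration.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients [a, b1, ..., bK] with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

# Synthetic data in which cntwait3 is positively correlated with budget.
rng = np.random.default_rng(2)
n = 5000
budget = rng.normal(50, 20, n)
cntwait3 = np.clip(0.3 + 0.004 * budget + rng.normal(0, 0.1, n), 0, 1)
box = 5 + 0.15 * budget + 30 * cntwait3 + rng.normal(0, 10, n)

with_budget = ols(np.column_stack([budget, cntwait3]), box)
without_budget = ols(cntwait3.reshape(-1, 1), box)

print("cntwait3 coefficient with budget:   ", round(float(with_budget[2]), 2))
print("cntwait3 coefficient without budget:", round(float(without_budget[1]), 2))
```

When budget is dropped, the cntwait3 coefficient picks up part of the budget effect, so it comes out larger than when budget is included.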
![Page 21: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/21.jpg)
Outline
1. Multivariate regression
2. Interpreting coefficients: Ceteris Paribus
3. Standardized Coefficient
4. Multiple Correlation and R Squared
Next time: Multivariate regression (Continued)
![Page 22: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/22.jpg)
Standardized Coefficient
We just saw:
• An increase in budget by 1 million $ leads to a rise in box office of 0.144 million $, all other things equal.
But is 1 million $ big? Is 0.144 million $ big?
![Page 23: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/23.jpg)
Standardized Coefficient
• “A 1 standard deviation increase in x2 leads to a …. % standard deviation increase in y.”
• Standard deviation of x2 (budget): 42.9.
• Standard deviation of y (box office outcome): 17.5.
• Coefficient of budget: 0.144.
• Fill in the blank.
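One way to fill in the blank: the standardized coefficient is the raw coefficient multiplied by the standard deviation of x and divided by the standard deviation of y. A minimal sketch using the three numbers given on the slide; the function name is hypothetical.

```python
def standardized_coefficient(b, sd_x, sd_y):
    """Effect of a 1 SD increase in x, measured in SDs of y."""
    return b * sd_x / sd_y

beta_budget = standardized_coefficient(b=0.144, sd_x=42.9, sd_y=17.5)
print(f"{beta_budget:.3f}")                      # 0.353
print(f"{100 * beta_budget:.1f}% of a standard deviation of y")
```

With these inputs, a 1 standard deviation increase in budget goes with an increase of about 35% of a standard deviation in box office, all other things equal.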
![Page 24: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/24.jpg)
Standardized Coefficient
• An increase in budget by 1 million $ leads to a rise in box office of 0.144 million $, all other things equal.
• An action movie has, on average and all other things equal, a box office outcome that is lower by 12 million $.
• An increase in the ‘Percentage of Fandango votes that can't wait to see’ (cntwait3) by 1 percentage point leads to a 0.01 * 32.15 = 0.3215 million $ increase in box office outcome, all other things equal.
We multiply by 0.01 (1%) because cntwait3 ranges from 0 to 1.
![Page 25: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/25.jpg)
Outline
1. Multivariate regression
2. Interpreting coefficients: Ceteris Paribus
3. Standardized Coefficient
4. Multiple Correlation and R Squared
Next time: Multivariate regression (Continued)
![Page 26: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/26.jpg)
R Squared
• How good are we at predicting the success of a movie?
• The multiple correlation is 1 if we are absolutely correct in our predictions: ei = 0 for every movie.
• The multiple correlation is 0 if we do no better than taking the average: ei = yi – (average of y).
![Page 27: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/27.jpg)
ESS/TSS = 13356/18665 = 0.7156
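The ratio above is the R squared: the explained sum of squares (ESS) divided by the total sum of squares (TSS). A minimal sketch that reproduces the number and checks the two extreme cases described on the previous slide; the function names and the four-point toy sample are hypothetical.

```python
import math

def r_squared_from_sums(ess, tss):
    """R squared as explained sum of squares over total sum of squares."""
    return ess / tss

def r_squared_from_predictions(y, y_hat):
    """Equivalent form 1 - RSS/TSS (same as ESS/TSS for least squares with an intercept)."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - rss / tss

r2 = r_squared_from_sums(13356, 18665)
multiple_r = math.sqrt(r2)          # multiple correlation
print(f"R squared = {r2:.4f}")      # 0.7156

# Toy sample to illustrate the two extremes.
y = [1.0, 2.0, 3.0, 6.0]
perfect = r_squared_from_predictions(y, y)                  # every residual is 0
mean_only = r_squared_from_predictions(y, [3.0] * len(y))   # always predict the average
print(perfect, mean_only)
```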
![Page 28: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/28.jpg)
Wrap up
• We can use a number of variables to explain a dependent variable.
• Multiple regression accounts for multiple causes.
• The coefficients minimize the sum of the squared residuals.
• Understand the t test and the p value.
• The coefficients should be understood “all other things equal” or “ceteris paribus”.
• The standardized coefficients express effects in terms of standard deviations.
• The R squared, between 0 and 100%, measures how accurate our predictions are.
![Page 29: Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad](https://reader030.vdocuments.site/reader030/viewer/2022032705/56649dda5503460f94ad0110/html5/thumbnails/29.jpg)
Coming up:
• Schedule for next week:
• Chapter on “Association and Causality”, and “Multivariate Regression”.
• Make sure you come to sessions and recitations.
Sunday
Monday: Multivariate Regression
Tuesday: Multivariate Regression, The F test
Wednesday: Randomized Experiments and ANOVA
Thursday: Wrap up
Recitation: Evening session, 7.30pm, West Administration 002
Usual class, 12.45pm, usual room
Evening session, 7.30pm, West Administration 001
Usual class, 12.45pm, usual room