Skillshare - Regression Analysis for Data Journalism
(AN INTRODUCTION)
REGRESSION ANALYSIS FOR DATA JOURNALISM
Camila Salazar, School of Data Fellow
@milamila07
Outline
1. Target audience
2. A step beyond descriptive statistics
3. What is regression analysis?
4. Example: the effect of education on wages
5. Other types of regression analysis useful in data journalism
6. Using regression models in data journalism
TARGET AUDIENCE
Target audience
• Data journalists
• School of Data Fellows
• People with basic knowledge of statistics
• Journalism students
A STEP BEYOND DESCRIPTIVE STATISTICS
So you are in the newsroom...
There’s a big debate in your country about the importance of education.
Your editor asks you to write a story about the importance of education.
First step: descriptive statistics
You find data about education in your country and start calculating the descriptive statistics.
Descriptive statistics
With descriptive statistics you find:
- How many people have a college degree.
- Unemployment according to the level of education.
And...
You interview young people who are still in high school and don’t want to go to college. With your story, you want to convince them that going to college could improve their future earnings.
You can’t answer this question using descriptive statistics :(
But...
You can estimate how much an extra year of schooling increases wages using regression analysis!
WHAT IS REGRESSION ANALYSIS?
What is regression analysis?
Regression analysis is a statistical tool for the
investigation of relationships between variables.
What is regression analysis?
It helps you explain how the value of a dependent variable (Y) changes when an independent variable (X) is varied, holding all other variables fixed.
What is regression analysis?
For example:
Dependent variable: health (Y)
Independent variables: vegetable consumption (X), exercise (X), sleep (X)
The linear regression
It’s a method for modeling the linear relationship between a dependent variable Y and one or more explanatory variables. With one explanatory variable:
$$Y = \beta_0 + \beta_1 X + u$$
where Y is the dependent variable, X is the independent variable, $\beta_1$ is the coefficient, $\beta_0$ is the intercept and u is the error term.
We are interested in estimating $\beta_1$ (the coefficient). It captures the effect X has on Y, holding all other factors fixed.
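As a sketch of how the coefficient is actually estimated, assuming the usual ordinary least squares (OLS) method for the one-variable model Y = β0 + β1·X + u: β1 is the covariance of X and Y divided by the variance of X, and β0 makes the line pass through the point of means. The function name `fit_simple_ols` and the toy wage numbers are hypothetical, made up for illustration; in a real story you would use a statistics package (for example statsmodels in Python or `lm` in R).

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (beta0, beta1) minimising the sum of squared errors in
    Y = beta0 + beta1 * X + u (ordinary least squares)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    x_bar, y_bar = mean(x), mean(y)
    # beta1 = co-variation of X and Y divided by the variation of X
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X does not vary, so its effect cannot be estimated")
    beta1 = sxy / sxx
    # The fitted line always passes through the point of means (x_bar, y_bar)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical toy data (not from the survey used in this deck).
years_of_education = [6, 9, 11, 12, 14, 16, 17]
monthly_wage = [300, 420, 510, 540, 640, 730, 790]

b0, b1 = fit_simple_ols(years_of_education, monthly_wage)
print(f"beta0 = {b0:.2f}, beta1 = {b1:.2f} (wage change per extra year)")
```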
The linear regression
For example, you want to explain the effect of education on wages.
[Figure: overlapping circles for Wage, Education and Experience, marking the variation in wage that has to do with education and the variation in wage that has to do with experience]
What is a linear regression?
• You have to formulate a hypothesis about the relationships of interest.
• Have some theory behind your assumptions.
• There are some essential assumptions and statistical properties of the regression that you have to consider.
EXAMPLE: THE EFFECT OF EDUCATION ON WAGES
Example
• Database with 994 observations.
• 3 variables: wage (in dollars), experience and years of education.
• The equation to estimate:
$$\text{Wage} = \beta_0 + \beta_1\,\text{Education} + \beta_2\,\text{Experience} + u$$
Example
Example: coefficients
An additional year of education increases wage by $161.68, holding all other factors fixed.
An additional year of experience increases wage by $16.54, holding all other factors fixed.
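To see what “holding all other factors fixed” means in numbers, here is a minimal sketch that fits Wage = β0 + β1·Education + β2·Experience + u by OLS, solving the normal equations (X'X)b = X'y in plain Python. The data are simulated from made-up coefficients (deliberately not the estimates above), and the function names are hypothetical. Comparing two predictions that differ only by one year of education, with experience held at the same value, returns exactly the education coefficient.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """OLS coefficients [beta0, beta1, ...] from the normal equations,
    with a leading column of 1s for the intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Simulated (hypothetical) data: 994 people, wages built from made-up coefficients.
rng = random.Random(42)
rows, wages = [], []
for _ in range(994):
    education = rng.randint(0, 20)
    experience = rng.randint(0, 40)
    rows.append((education, experience))
    wages.append(200 + 150 * education + 20 * experience + rng.gauss(0, 300))

b0, b_educ, b_exp = fit_ols(rows, wages)
print(f"education: {b_educ:.2f}, experience: {b_exp:.2f}")


def predict(education, experience):
    return b0 + b_educ * education + b_exp * experience


# One more year of education, experience held fixed at 10 years:
print(predict(13, 10) - predict(12, 10))  # equals b_educ up to rounding
```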
Example: p-value
But what is the p-value?
Example: p-value
With statistics you can’t be 100% certain.
A relatively simple way to interpret p-values is to think of them as how likely it would be to get a result at least this extreme by chance alone, if there were no real effect.
Example: p-value
Null hypothesis: a hypothesis that the researcher tries to disprove, reject or nullify. For example:
“Education has NO explanatory power over wages.”
“Men are NOT taller than women on average.”
To test the null hypothesis we use the p-value.
Example: p-value
The p-value is the probability of getting a result at least as extreme as the one observed if the null hypothesis were true. It is not the probability that the null hypothesis is true.
If your p-value is small (conventionally below 0.05), you have strong evidence to reject the null hypothesis.
“Men are significantly taller than women, p=0.01.” That means that if men were NOT actually taller than women on average, a difference at least this large would show up only 1% of the time through random chance alone.
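One way to see what a p-value measures is a permutation test. It is a different technique from the t-test that regression software reports, but it answers the same question directly: if the null hypothesis were true, how often would chance alone produce a difference at least this large? The sketch below shuffles the “man”/“woman” labels many times and counts how often the shuffled difference matches or beats the real one. The heights and the function name are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test.

    H0: the labels are irrelevant (both groups come from the same distribution).
    Returns the share of random relabellings in which mean(a) - mean(b) is at
    least as large as the difference actually observed.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign labels at random, as if H0 were true
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            at_least_as_extreme += 1
    # The +1 counts the observed labelling itself, so the estimate is never 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical heights in centimetres.
men = [178, 182, 175, 180, 185, 177, 181, 179]
women = [165, 170, 168, 162, 172, 166, 169, 171]
print(f"p = {permutation_p_value(men, women):.4f}")
```

A small p here says that random labelling almost never reproduces a gap this large; it does not say how likely it is that men are "really" taller.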
Example
The p-value tells you whether the coefficient is statistically significant. With a low p-value (below the chosen threshold, commonly 10%, 5% or 1%) you can reject the null hypothesis that the coefficient is equal to zero (that it has no explanatory power). In this case the coefficients are significant, which means that education and experience have explanatory power over wage.
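Here is a minimal sketch of where a coefficient’s p-value comes from in the one-variable case: divide the estimated slope by its standard error to get a t-statistic, then ask how unusual that value would be if the true coefficient were zero. As a simplifying assumption, the p-value uses the normal approximation to the t distribution, which is close for samples of several hundred observations like the 994 here; regression software uses the exact t distribution. The function name and the simulated data are hypothetical.

```python
import math
import random
from statistics import mean


def slope_t_test(x, y):
    """Slope, standard error, t-statistic and two-sided p-value for
    H0: beta1 = 0 in Y = beta0 + beta1 * X + u.

    The p-value uses the normal approximation to the t distribution
    (an assumption that is reasonable for large samples)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need the same length, at least 3")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X does not vary")
    beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    beta0 = y_bar - beta1 * x_bar
    residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # estimated error variance
    se = math.sqrt(sigma2 / sxx)
    if se == 0:
        raise ValueError("perfect fit: the standard error is zero")
    t = beta1 / se
    p = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| >= |t|) for a standard normal Z
    return beta1, se, t, p


# Simulated (hypothetical) data.
rng = random.Random(1)
educ = [rng.randint(0, 20) for _ in range(994)]
related = [300 + 40 * e + rng.gauss(0, 300) for e in educ]  # wage depends on education
unrelated = [800 + rng.gauss(0, 300) for _ in educ]         # wage ignores education
print(slope_t_test(educ, related))    # a very small p-value is expected
print(slope_t_test(educ, unrelated))  # p-value usually well above 0.05
```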
Example
R-squared: This indicates how well the explanatory variables explain the variability of the dependent variable.
In this case: 33.8% of the variability of wage is explained by the years of education and years of
experience.
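R-squared compares the errors the model leaves over with the total variation of the dependent variable around its mean: R² = 1 − SSR/SST. A short sketch with a hypothetical function name and toy numbers; an R² of 0.338 is what gets reported as 33.8%.

```python
from statistics import mean


def r_squared(y, fitted):
    """R^2 = 1 - SSR / SST."""
    if len(y) != len(fitted):
        raise ValueError("y and fitted must have the same length")
    y_bar = mean(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    if sst == 0:
        raise ValueError("y is constant, so R^2 is undefined")
    return 1 - ssr / sst


observed = [1, 2, 3, 4]
fitted = [1.5, 1.5, 3.5, 3.5]
print(r_squared(observed, fitted))  # 0.8: the fit accounts for 80% of the variation
```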
OTHER TYPES OF REGRESSION ANALYSIS
The logistic regression
Imagine you want to estimate the probability that a person with a college degree is employed.
The linear regression wouldn’t be very useful: it can predict “probabilities” below 0 or above 1.
The logistic regression
It is a regression model in which the dependent variable (Y) is categorical, for example binary:
1 = unemployed, 0 = employed.
It is used to estimate the probability of a binary response based on one or more independent variables.
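The problem with running a linear regression on a 0/1 outcome, and how the logistic function avoids it, can be sketched with a small hypothetical dataset: years of education and whether each person is employed (coded 1 = employed here, the reverse of the coding above, to match the “probability of being employed” question). The logit model P(Y=1) = 1/(1 + e^−(β0 + β1X)) is fitted by maximum likelihood using Newton-Raphson; this is an illustrative choice, and packages such as statsmodels or R’s `glm` also fit logit models by maximum likelihood, with more safeguards. All names and numbers are hypothetical.

```python
import math
from statistics import mean


def logistic(z):
    """Numerically stable 1 / (1 + e^-z); always between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_linear(x, y):
    """OLS intercept and slope (a 'linear probability model' when y is 0/1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logit P(y=1) = logistic(b0 + b1*x) via Newton-Raphson.
    Assumes the 0s and 1s overlap in x (no perfect separation)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [logistic(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (minus the Hessian), with weights p(1 - p)
        w = [pi * (1 - pi) for pi in p]
        a00 = sum(w)
        a01 = sum(wi * xi for wi, xi in zip(w, x))
        a11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a00 * a11 - a01 * a01
        step0 = (a11 * g0 - a01 * g1) / det
        step1 = (a00 * g1 - a01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: years of education and employment status (1 = employed).
years = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
employed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

lin_b0, lin_b1 = fit_linear(years, employed)
logit_b0, logit_b1 = fit_logit(years, employed)
for x in (0, 20):
    print(f"{x} years: linear = {lin_b0 + lin_b1 * x:.3f}, "
          f"logit = {logistic(logit_b0 + logit_b1 * x):.3f}")
```

With these numbers the linear fit predicts a value slightly below 0 at 0 years and above 1 at 20 years, while the logit predictions stay between 0 and 1.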
The logistic regression
Explanatory variables:
- Age
- Education
- Family income
- Occupation
[Figure: the explanatory variables feed into a logistic regression that predicts Employed / Unemployed]
The model would tell you, for example, that a person with a college degree has three times the odds of being employed as a person who only went to high school.
The logistic regression
• The coefficients cannot be interpreted as the rate of change in the dependent variable.
• You can check the sign of the coefficients to see the direction of the effect.
• You can calculate marginal effects or odds ratios (for the logit model).
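A small worked example of the two interpretations listed above, using hypothetical employment counts: the odds ratio (what exp(coefficient) gives in a logit) and the change in probability. It also shows why “three times the odds” is not the same as “three times as likely”. The counts, the 0.3 coefficient and the function names are illustrative only.

```python
import math


def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)


def logit_marginal_effect(beta, p):
    """Change in P(y=1) per unit change in x for a logit, evaluated where P(y=1) = p."""
    return beta * p * (1 - p)


def employment_rate(group):
    return group["employed"] / (group["employed"] + group["unemployed"])


# Hypothetical counts (not survey data).
college = {"employed": 90, "unemployed": 10}
high_school = {"employed": 75, "unemployed": 25}

p_college = employment_rate(college)          # 0.90
p_high_school = employment_rate(high_school)  # 0.75

odds_ratio = odds(p_college) / odds(p_high_school)
# In a logit whose only regressor is a college dummy, the fitted coefficient
# equals log(odds_ratio), so exp(coefficient) recovers the odds ratio.
beta_college = math.log(odds_ratio)

print(f"odds ratio: {odds_ratio:.2f}")
print(f"logit coefficient: {beta_college:.3f}")
print(f"ratio of probabilities: {p_college / p_high_school:.2f}")
print(f"difference in probabilities: {p_college - p_high_school:.2f}")
# Hypothetical coefficient of 0.3 on a continuous variable, evaluated at p = 0.5:
print(f"marginal effect: {logit_marginal_effect(0.3, 0.5):.3f}")
```

Here the odds ratio is 3, yet the employment rate is only 1.2 times higher (0.90 vs 0.75), a difference of 15 percentage points.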
USING REGRESSION MODELS IN DATA JOURNALISM
Some examples
“Does School Pay Off? How Much?” - El Financiero (Costa Rica), winner of the Data Journalism Awards 2014.
http://www.elfinancierocr.com/gnfactory/especiales/2015/calculadorasalarial/
Some examples
“Presidential Pardons Heavily Favor Whites” - ProPublica
http://www.propublica.org/article/shades-of-mercy-presidential-forgiveness-heavily-favors-whites
Methodology: http://www.propublica.org/article/how-propublica-analyzed-pardon-data
Some advice
• Statistical analysis can be complex. If you’re not sure, ask an expert for advice!
• Be transparent about your methodology.
• Study a lot!
• https://www.coursera.org/ Free courses!
References
- Wooldridge (2010). Introductory Econometrics.
- Long (1997). Regression Models for Categorical and Limited Dependent Variables.
- Costa Rica National Survey of Income and Spending (2004).
THANKS :) @milamila07
schoolofdata.org