Statistics for Social and Behavioral Sciences Session #5: The Regression Line (Agresti and Finlay, Chapter 9)

Prof. Amine Ouazad

Uploaded by: clinton-henderson

Posted on 24-Dec-2015

TRANSCRIPT

  • Slide 1
  • Statistics for Social and Behavioral Sciences Session #5: The Regression Line (Agresti and Finlay, Chapter 9) Prof. Amine Ouazad
  • Slide 2
  • Statistics Course Outline: Part I. Introduction and Research Design (Week 1). Part II. Describing Data (Weeks 2-4). Part III. Drawing Conclusions from Data: Inferential Statistics (Weeks 5-9). Part IV. Correlation and Causation: Regression Analysis (Weeks 10-14). This is where we talk about Zmapp and Ebola! Firenze or Lebanese Express? Where we are right now! Describing associations between two variables.
  • Slide 3
  • Last Session: Descriptive statistics summarize data to make the information easier to assimilate. Measuring the distribution of a variable: mean and median; range and standard deviation. These measures apply to both bell-shaped and non-bell-shaped distributions (e.g., the superstar distribution). For bell-shaped distributions, the Empirical Rule applies! Measuring associations: contingency table; scatter plot.
  • Slide 4
  • Outline: 1. Scatter plot, linear relationship: unemployment and crime. 2. The regression line: what is the relationship between height and weight? 3. Warning: correlation is not causation; spurious relationships. Next session: bivariate analysis, Chapter 9 of A&F, continued.
  • Slide 5
  • Unemployment and crime: is there really a link? ATLANTIC CITY - "With the layoffs the city is going to have, we'll have to expect that increase in crime." With an increase in unemployment and crime typically going hand-in-hand, Atlantic City PBA President Paul Barbere believes a challenging time lies ahead for the Atlantic City Police Department. "That's going to require us to respond to more calls for service, more calls for service requires more time out of service for our patrol units, with fewer patrol units, it's going to be difficult," said Barbere. This potential spike in crime comes at a time when Barbere says the department is already short-handed. "With the police department, we're running about 30 men and women short of what the ordinance calls for."
  • Slide 6
  • Unemployment and crime? On the boardwalk, the potential for more crime has the valuable tourists the city relies on questioning what lies ahead. "They should have a plan designed for that, because they certainly don't want to dissuade people from coming here," said Yvette Dilworth of Queens, New York. "I don't know what Atlantic City is going to do to prepare for that, but obviously when you're losing jobs the crime rate could come up," said Chris Mascioli of Camden County. "So yeah, I'm concerned about it." In addition to the potential increase in calls stemming from unemployment, police will also have to keep an eye on the newly vacant casinos. "We'll have to maintain a certain staff to keep mechanicals going and to ensure the integrity and safety of the buildings themselves. That's not to say people won't try to break in," said Barbere. And even with fewer officers and more unemployment in the city, Barbere is confident the department is capable of rising to the challenge. "The men and women of the Atlantic City Police Department are well trained and have been dealing with this staffing for some time now," said Barbere. "So it's nothing they can't handle."
  • Slide 7
  • United States data. Data set: County Characteristics 2000-2007. Observation: county. Number of observations? Variables: unemployed persons, 2005; number of murders reported to police, 2004. Comments? Self-check: Observational data or experimental data? Unemployed persons: categorical or quantitative variable? Discrete or continuous variable? Number of murders: categorical or quantitative variable? Discrete or continuous variable? Survey data, online data, or administrative data?
  • Slide 8
  • Scatter plot. Number of murders reported to police: number of observations 2,957; mean 5.07; median 0; std. dev. 28.30; min 0; max 1,038; P25 0; P75 2. Unemployed persons: number of observations 3,133; mean 2,414.56; median 665; std. dev. 7,985; min 4; max 256,236; P25 285; P75 1,683. Which is the response variable and which is the explanatory variable?
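The summary statistics on this slide (mean, median, standard deviation, quartiles) can be reproduced with Python's standard `statistics` module. This is a minimal sketch, assuming a small hypothetical list of county murder counts (invented for illustration, not the County Characteristics data); it shows how one extreme county pulls the mean far above the median, as with the murder counts here.

```python
import statistics

def summarize(values):
    """Return the summary statistics reported on the slide for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points [P25, P50, P75]
    p25, _, p75 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "p25": p25,
        "p75": p75,
    }

# Hypothetical murder counts for ten counties (illustrative only)
murders = [0, 0, 0, 0, 1, 1, 2, 3, 5, 150]
stats = summarize(murders)
print(stats["mean"], stats["median"])
```

Note that `statistics.quantiles` uses the "exclusive" method by default, so quartiles from other software may differ slightly.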
  • Slide 9
  • Distribution of Murders. Kind of distribution: bell-shaped, or superstar distribution (Spotify)? The Empirical Rule applies: true or false? Murders in 2004 by county: Los Angeles County 1,038; Wayne County 415; Harris County 346; Philadelphia County 330; Maricopa County 281; Dallas County 278; Baltimore city 276.
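One quick way to see why the Empirical Rule is a poor fit here uses the murder statistics from Slide 8 (mean 5.07, standard deviation 28.30, minimum 0). For a bell-shaped distribution, about 68% of observations fall within one standard deviation of the mean and about 95% within two. The sketch below, illustrative arithmetic rather than a formal test, checks where those intervals start.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals that would hold ~68% and ~95% of observations for a bell-shaped distribution."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
    }

# Summary statistics for murders reported to police (from the slide)
MEAN, SD, MINIMUM = 5.07, 28.30, 0

for label, (low, high) in empirical_rule_intervals(MEAN, SD).items():
    # Murder counts cannot be negative, so a lower bound far below the minimum
    # signals that the symmetric bell-shaped picture does not fit these data.
    print(label, round(low, 2), round(high, 2), "lower bound below min:", low < MINIMUM)
```

Both intervals start well below zero, which is impossible for a count, consistent with the right-skewed "superstar" shape suggested by the county list.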
  • Slide 10
  • Scatter plot. Number of murders reported to police: number of observations 2,957; mean 5.07; median 0; std. dev. 28.30; min 0; max 1,038; P25 0; P75 2. Unemployed persons: number of observations 3,133; mean 2,414.56; median 665; std. dev. 7,985; min 4; max 256,236; P25 285; P75 1,683.
  • Slide 11
  • Linear Relationship? $y = \alpha + \beta x$, i.e., $\text{Murders} = \alpha + \beta \times \text{Unemployed}$. [Figure: two successive steps of +20,000 unemployed along the line.] An increasing relationship: $\beta > 0$.
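Under the linear model, the change in murders depends only on the change in unemployment, not on the starting point. Writing the model at two unemployment levels $x$ and $x + \Delta x$ and subtracting:

$$
\begin{aligned}
y_1 &= \alpha + \beta x \\
y_2 &= \alpha + \beta (x + \Delta x) \\
y_2 - y_1 &= \beta \, \Delta x
\end{aligned}
$$

For a step of $\Delta x = 20{,}000$ unemployed, predicted murders rise by $20{,}000\,\beta$ wherever the step starts on the line, so both +20,000 steps on this slide produce the same increase; with $\beta > 0$ that increase is positive.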
  • Slide 12
  • What a Linear Relationship Implies: An increase in the number of unemployed raises the number of murders by $\beta \times$ the increase. A decline in the number of unemployed lowers the number of murders by $\beta \times$ the decline. An increase in the number of unemployed by, say, 10,000 raises the number of murders by the same amount regardless of whether there were initially 0 murders or 300 murders. No gang formation? A decline in the number of unemployed by, say, 10,000 lowers the number of murders by the same amount regardless of whether there were initially 0 murders or 300 murders. Shouldn't it be tougher to lower the number of murders than to raise it? This is a model, a simplification of the world.
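The slide's point, that a linear model gives the same effect at any starting level and treats increases and declines symmetrically, can be checked numerically. This is a minimal sketch, assuming hypothetical coefficients alpha = 0 and beta = 0.002 murders per unemployed person (illustrative values, not estimates from the county data). With these values, 0 unemployed predicts 0 murders and 150,000 unemployed predicts 300 murders, matching the two starting points on the slide.

```python
def predicted_murders(unemployed, alpha=0.0, beta=0.002):
    """Linear model: murders = alpha + beta * unemployed (hypothetical coefficients)."""
    return alpha + beta * unemployed

def effect_of_change(start, change, alpha=0.0, beta=0.002):
    """Change in predicted murders when unemployment moves from start to start + change."""
    return predicted_murders(start + change, alpha, beta) - predicted_murders(start, alpha, beta)

# Same +10,000 change starting from 0 predicted murders and from 300 predicted murders
from_zero = effect_of_change(start=0, change=10_000)
from_300 = effect_of_change(start=150_000, change=10_000)
# A 10,000 decline starting from 300 predicted murders
decline = effect_of_change(start=150_000, change=-10_000)

print(round(from_zero, 6), round(from_300, 6), round(decline, 6))
```

Any model in which a decline is "tougher" than an increase would need a non-linear or asymmetric specification; the straight line rules that out by construction.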
  • Slide 13
  • What we can do with a linear relationship: Extrapolate, i.e., predict: with more local data (census block, census tract, or ZIP code level), or with individual data (Minority Report style, possible with Danish or Swedish data). Interpolate, i.e., fill in the gaps when data are missing.
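In the usual statistical sense, a prediction inside the observed range of x interpolates (for example, filling in a missing value), while a prediction outside that range extrapolates, extending the line where there are no data to check it. This is a minimal sketch, assuming a fitted line ŷ = a + bx with hypothetical coefficients and a small hypothetical data set; the helper names are illustrative only.

```python
def predict(x, a, b):
    """Prediction from a fitted line y_hat = a + b * x."""
    return a + b * x

def prediction_kind(x, observed_x):
    """Label a prediction as interpolation (inside the observed x range) or extrapolation."""
    return "interpolation" if min(observed_x) <= x <= max(observed_x) else "extrapolation"

def fill_missing(xs, ys, a, b):
    """Replace missing y values (None) with predictions from the line."""
    return [predict(x, a, b) if y is None else y for x, y in zip(xs, ys)]

# Hypothetical data: unemployed persons (x) and murders (y), with one missing y
xs = [1_000, 5_000, 20_000, 50_000]
ys = [2, None, 40, 100]
a, b = 0.0, 0.002  # hypothetical coefficients

print(fill_missing(xs, ys, a, b))
print(prediction_kind(5_000, xs), prediction_kind(250_000, xs))
```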
  • Slide 14
  • The Los Angeles Police Department, like many urban police forces today, is both heavily armed and thoroughly computerised. The Real-Time Analysis and Critical Response Division in downtown LA is its central processor. Rows of crime analysts and technologists sit before a wall covered in video screens stretching more than 10 metres wide. Multiple news broadcasts are playing simultaneously, and a real-time earthquake map is tracking the region's seismic activity. Half-a-dozen security cameras are focused on the Hollywood sign, the city's icon. In the centre of this video menagerie is an oversized satellite map showing some of the most recent arrests made across the city: a couple of burglaries, a few assaults, a shooting. On a slightly smaller screen the division's top official, Captain John Romero, mans the keyboard and zooms in on a comparably micro-scale section of LA. It represents just 500 feet by 500 feet. Over the past six months, this sub-block section of the city has seen three vehicle burglaries and two property burglaries, an atypical concentration. And, according to a new algorithm crunching crime numbers in LA and dozens of other cities worldwide, it's a sign that yet more crime is likely to occur right here in this tiny pocket of the city. The algorithm at play is performing what's commonly referred to as predictive policing. Using years and sometimes decades' worth of crime reports, the algorithm analyses the data to identify areas with high probabilities for certain types of crime, placing little red boxes on maps of the city that are streamed into patrol cars. "Burglars tend to be territorial, so once they find a neighbourhood where they get good stuff, they come back again and again," Romero says. "And that assists the algorithm in placing the boxes." [Image: The dashboard for New York Police Department's 'Domain Awareness System'. Photograph: Shannon Stapleton/Reuters]
  • Slide 15
  • Outline: 1. Scatter plot, linear relationship: back to height and weight. 2. The regression line: what is the relationship between height and weight? 3. Warning: correlation is not causation; spurious relationships. Next session: bivariate analysis, Chapter 9 of A&F, continued.
  • Slide 16
  • Finding the regression line: any line is imperfect.
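Because no line passes through every point, each candidate line leaves residuals $y - \hat{y}$. A common way to compare lines, and the criterion behind the least squares regression line in A&F Chapter 9, is the sum of squared residuals. This is a minimal sketch comparing two candidate lines on a small hypothetical height/weight data set (values invented for illustration).

```python
def residuals(xs, ys, a, b):
    """Residuals y - y_hat for the line y_hat = a + b * x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared residuals: smaller means the line fits these points better."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))

# Hypothetical heights (cm) and weights (kg)
heights = [160, 165, 170, 175, 180]
weights = [55, 61, 64, 70, 75]

# Two candidate lines, given as (intercept a, slope b)
for a, b in [(-100, 1.0), (-105, 1.0)]:
    print(a, b, sum_squared_residuals(heights, weights, a, b))
```

Both lines are imperfect, but the second leaves much smaller residuals; the least squares line is the one that makes this sum as small as possible.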
  • Slide 17
  • Finding the regression line: Which line is the right one? A line is entirely determined by the choice of $\alpha$ and $\beta$. An essential formula. Notice the difference between $b$ and $\beta$, and between $a$ and $\alpha$. $x$ is the explanatory variable; $y$ is the response variable. If $y$ increases when $x$ increases, then $b > 0$. If $y$ decreases when $x$ increases, then $b < 0$.
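The slide's formula is not legible in this transcript, so the following is not necessarily what was shown. For reference, the standard least squares estimates for the sample regression line $\hat{y} = a + bx$ (as used in A&F Chapter 9) are $b = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$, and the sign of $b$ gives the direction of the association. This is a minimal sketch of those formulas, reusing small hypothetical height/weight values (invented for illustration).

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx
    # Intercept: the line passes through the point of means (x_bar, y_bar)
    a = y_bar - b * x_bar
    return a, b

def direction(b):
    """b > 0: y tends to increase with x; b < 0: y tends to decrease with x."""
    if b > 0:
        return "increasing"
    if b < 0:
        return "decreasing"
    return "no linear trend"

# Hypothetical heights (cm) and weights (kg)
heights = [160, 165, 170, 175, 180]
weights = [55, 61, 64, 70, 75]

a, b = least_squares_line(heights, weights)
print(round(a, 2), round(b, 2), direction(b))
```

Here $a$ and $b$ are sample estimates computed from data, while $\alpha$ and $\beta$ are the unknown population parameters they estimate.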