Introduction to Inferential Statistics

Upload: colleen-byrd

Post on 03-Jan-2016

Taking out the “loosey-goosey”

• So far we’ve assessed relationships between variables two ways:
– Categorical variables: tables and proportions (percentages)
– Continuous variables: scattergrams and simple correlation (r)

Examples: Higher rank → more stress. Higher income → less crime (r = -.6, r² = .36).

• Alas, results are usually less extreme than these examples. What if 55 percent of officers are high stress and 45 percent low stress? What if the correlation coefficient (r) between income and crime is -.2 (r² of .04)? Would we really want to stick our necks out and confirm the hypotheses? What would be the chance that we were wrong?
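The link between the r and r² values quoted on this slide can be checked with a short sketch. The function name is an assumption made for illustration and does not come from the slides.

```python
def r_squared(r: float) -> float:
    """Square a correlation coefficient r to get r squared."""
    return r * r


# The two correlation coefficients mentioned on the slide.
for r in (-0.6, -0.2):
    print(f"r = {r:+.1f}  ->  r squared = {r_squared(r):.2f}")
```

Squaring removes the sign, so a strong negative correlation and a strong positive one of the same size yield the same r².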

Inferential statistics

• Inferential statistics are an extension of procedures that we’ve already used
– Provide far more precise assessments of relationships
– Allow us to properly “infer” (project) our results to populations
– Called “test” statistics because they are used to test hypotheses

• Examples of inferential statistics
– Categorical variables: Chi-Square (X²)
– Combination of categorical independent and continuous dependent variable
    • Difference between the means test (t statistic)
– Continuous variables
    • Regression (r² and R²)
    • b statistic, generated through regression analysis
– Combination of nominal and continuous variables
    • Logistic regression, generates b and exp(b) (b exponentiated, a.k.a. odds ratio)

• Requirements
– Must use probability sampling techniques (e.g., random sampling)
– “Parametric” inferential statistics, including r², b and t
    • Variables must be continuous and normally distributed in the population
– Non-parametric statistics
    • Variables need not be normally distributed. We’ll cover one: Chi-Square (X²).
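For the logistic regression statistics listed on this slide, the step from b to exp(b) is a single exponentiation. The sketch below uses made-up coefficients; the function names and the b values are assumptions for illustration only.

```python
import math


def odds_ratio(b: float) -> float:
    """Exponentiate a logistic regression coefficient b to get exp(b)."""
    return math.exp(b)


def percent_change_in_odds(b: float) -> float:
    """Express exp(b) as a percentage change in the odds per one-unit change in the IV."""
    return (odds_ratio(b) - 1.0) * 100.0


# Hypothetical coefficients, not taken from any study cited in these slides.
for b in (0.405, 0.0, -0.693):
    print(
        f"b = {b:+.3f}  exp(b) = {odds_ratio(b):.2f}  "
        f"change in odds = {percent_change_in_odds(b):+.0f}%"
    )
```

A b of zero gives an odds ratio of exactly 1 (no relationship); positive b values give odds ratios above 1 and negative values give odds ratios below 1.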

Some statistics used to test relationships

Procedure: Regression
Level of measurement: All variables continuous
Statistic: r², R²
Interpretation: Proportion of change in the dependent variable accounted for by change in the independent variable. R² denotes cumulative effect of multiple independent variables.
Statistic: b
Interpretation: Unit change in the dependent variable caused by a one-unit change in the independent variable

Procedure: Logistic regression
Level of measurement: DV nominal & dichotomous, IV’s nominal or continuous
Statistic: b
Interpretation: Don’t try
Statistic: exp(B)
Interpretation: Odds that DV will change if IV changes one unit, or, if IV is dichotomous, if it changes its state. Range 0 to infinity; 1 denotes even odds, or no relationship. Higher than 1 means positive relationship, lower negative relationship. Use percentage to describe likelihood of effect.

Procedure: Chi-Square
Level of measurement: All variables categorical (nominal or ordinal)
Statistic: X²
Interpretation: Reflects difference between Observed and Expected frequencies. Use table to determine if coefficient is sufficiently large to reject null hypothesis.

Procedure: Difference between means
Level of measurement: IV dichotomous, DV continuous
Statistic: t
Interpretation: Reflects magnitude of difference. Use table to determine if coefficient is sufficiently large to reject null hypothesis.
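To make the Chi-Square entry concrete, here is a minimal sketch of how X² compares Observed and Expected frequencies. The counts are invented for illustration, and the function name is an assumption. The 3.841 cutoff is the standard table value for p < .05 with one degree of freedom.

```python
def chi_square(observed):
    """Sum (Observed - Expected)^2 / Expected over every cell of a table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Hypothetical 2x2 table: rows = rank (higher, lower), columns = stress (high, low).
table = [[30, 20],
         [20, 30]]
CRITICAL_05_DF1 = 3.841  # table value for p < .05 with 1 degree of freedom

x2 = chi_square(table)
print(f"X2 = {x2:.2f}, exceeds table value: {x2 > CRITICAL_05_DF1}")
```

Degrees of freedom for a table are (rows - 1) times (columns - 1), so a larger table needs a different cutoff from the chi-square table.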

General procedure

• Types of hypotheses
– Working hypothesis: what a regular hypothesis is called
– Null hypothesis: its opposite, the presumption that any apparent relationship between variables is caused by chance

• Draw one or more samples and code the independent and dependent variables

• Use a test statistic to assess the working hypothesis
– The computer calculates a coefficient for the test statistic (e.g., r² = .20)
– These coefficients are the sum of two components
    • “Systematic” variance: The actual, “systematic” relationship between variables
    • “Error” variance: An apparent relationship, caused by sampling error. It shrinks as sample size increases.

[Diagram labels: error variance; systematic variance (the “real” relationship)]

The big question: Once we remove the error component, is enough “real” relationship left to reject the null hypothesis?
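The claim that error variance shrinks as sample size increases can be illustrated with a small simulation. This is a sketch under stated assumptions: the two variables are generated independently, so any correlation that appears is pure sampling error. The function names, the seed, and the sample sizes are all illustrative choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_abs_r_under_null(sample_size, trials, rng):
    """Average size of r when the two variables are truly unrelated."""
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]
        total += abs(pearson_r(xs, ys))
    return total / trials


rng = random.Random(42)  # fixed seed so the run is repeatable
small = mean_abs_r_under_null(sample_size=10, trials=200, rng=rng)
large = mean_abs_r_under_null(sample_size=1000, trials=200, rng=rng)
print(f"average |r| with n = 10:   {small:.3f}")
print(f"average |r| with n = 1000: {large:.3f}")
```

With small samples, chance alone produces noticeably larger coefficients, which is why a given coefficient needs more “room” before the null hypothesis can be rejected.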

Test statistics and the null hypothesis

• To reject the null hypothesis, the test statistic coefficient (e.g., r² = .20) must be sufficiently large after subtracting sampling error.

• How much “room” is required? Enough to yield a probability of less than five in one hundred (< .05) that the relationship between variables was produced by chance.
– If the computer decides that the coefficient is sufficiently large, it will award at least one asterisk (*). The relationship between variables is “statistically significant” and the null hypothesis (no relationship) is rejected as false.
– If the coefficient is too small, no asterisk is awarded. The association between variables is deemed “non-significant” and the null hypothesis is retained as true. Working hypotheses that depend on this relationship must be rejected.

• For significant relationships, one to three asterisks usually appear next to the test statistic’s coefficient (e.g., .25*, .36**, .41***). More asterisks = greater confidence that a relationship is systematic, not the product of chance.
* Good: probability less than 5 in 100 that a coefficient was produced by chance (p < .05)
** Better: probability less than 1 in 100 that a coefficient was produced by chance (p < .01)
*** Best: probability less than 1 in 1,000 that a coefficient was produced by chance (p < .001)

• Instead of asterisks, sometimes the actual probability that a coefficient was produced by chance is given, usually in a column labeled “p”.
– Again, significant relationships are denoted by p’s less than .05
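The asterisk convention described on this slide amounts to a simple lookup on the p-value. A minimal sketch, with a hypothetical function name and made-up p-values:

```python
def significance_stars(p: float) -> str:
    """Map a p-value to the asterisks conventionally printed beside a coefficient."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # non-significant: no asterisk


# Hypothetical p-values, for illustration only.
for p in (0.20, 0.03, 0.004, 0.0002):
    print(f"p = {p:<7} {significance_stars(p) or '(non-significant)'}")
```

Statistical packages differ in which thresholds they mark, so the note under a published table is the authority on what its asterisks mean.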

A caution on hypothesis testing…

• Probabilities (loosely, that the null hypothesis is true) are the most common way to evaluate relationships.
– The smaller the probability, the more likely it is that the null hypothesis (meaning, no relationship) is false, and the greater the likelihood that the working hypothesis is true.
– But this process has been criticized for suggesting misleading results.

• We normally use p-values to accept or reject null hypotheses. The real meaning of a p-value is subtle:
– Formally, p < .05 means that, if an association between variables were tested an infinite number of times, a coefficient as large as the one actually obtained (say, an r² of .30) would come up fewer than five times in a hundred if the null hypothesis of no relationship were actually true.

• For our purposes, as long as we keep in mind the inherent sloppiness of social science, and the difficulties of accurately quantifying social science phenomena, it’s sufficient to use p-values to accept or reject null hypotheses.

• We should always be skeptical of findings of “significance,” particularly when very large samples are involved.
– When sample size is large (say, a thousand), even weak relationships can show up as statistically significant. (More on this later.)
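The warning about large samples can be seen by converting the same weak correlation into a t statistic at two sample sizes. This sketch uses the standard conversion t = r·√((n − 2) / (1 − r²)). The r value, the sample sizes, and the cutoff of 1.96 (the large-sample two-tailed value for p < .05) are illustrative assumptions; small samples actually need a slightly larger cutoff from the t table.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """Convert a correlation r from a sample of size n into a t statistic."""
    return r * math.sqrt((n - 2) / (1 - r * r))


APPROX_CUTOFF = 1.96  # large-sample two-tailed cutoff for p < .05

weak_r = 0.10  # hypothetical weak correlation (r squared = .01)
for n in (30, 1000):
    t = t_from_r(weak_r, n)
    verdict = "significant" if abs(t) > APPROX_CUTOFF else "non-significant"
    print(f"r = {weak_r}, n = {n}: t = {t:.2f} ({verdict})")
```

The relationship accounts for the same 1 percent of variance in both cases; only the sample size changes whether it earns an asterisk.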

Examples of tables from articles, panels 1-12

1 Hypothesis: Alcohol consumption → Victimization
Method: Logistic regression
Statistics: b and Odds Ratio (Exp b)

Richard B. Felson and Keri B. Burchfield, “Alcohol and the Risk of Physical and Sexual Assault Victimization,” Criminology (42:4, 2004)

2 Hypothesis: Veteran status → less punitive police response to domestic violence
Method: Logistic regression
Statistics: b and Odds Ratio (Exp b)

Fred Markowitz and Amy C. Watson, “Police Response to Domestic Violence Situations Involving Veterans Exhibiting Signs of Mental Illness,” Criminology (53:2, 2015)

3 Hypothesis: Race and class → Satisfaction with police
Method: Logistic regression
Statistics: b and Exp b (odds ratio)

Yuning Wu, Ivan Y. Sun and Ruth A. Triplett, “Race, Class or Neighborhood Context: Which Matters More in Measuring Satisfaction With Police?,” Justice Quarterly (26:1, 2009)

4 Hypothesis: Low self control → More contact with police
Method: Logistic regression
Statistics: b and Exp b (odds ratio)

Kevin M. Beaver, Matt DeLisi, Daniel P. Mears and Eric Stewart, “Low Self-Control and Contact with the Criminal Justice System in a Nationally Representative Sample of Males,” Justice Quarterly (26:4, 2009)

5 Hypothesis: Gender and race of victim → Imposition of death sentence
Method: Logistic regression
Statistics: b (“coefficient”) and odds-ratio (exp b)

Marian R. Williams, Stephen Demuth and Jefferson E. Holcomb, “Understanding the Influence of Victim Gender in Death Penalty Cases: The Importance of Victim Race, Sex-Related Victimization, and Jury Decision Making,” Criminology (45:4, 2007)

6 Hypothesis: Academic performance → Delinquency
Method: “Tobit” regression*
Statistic: b

* Best when the DV has a zero value for a large proportion of cases

Richard B. Felson and Jeremy Staff, “Explaining the Academic Performance-Delinquency Relationship,” Criminology (44:2, 2006)

7 Hypothesis: Strains of imprisonment → Recidivism
Method: Logistic regression
Statistics: B and exp B (odds-ratio)

Shelley Johnson Listwan, Christopher J. Sullivan, Robert Agnew, Francis T. Cullen and Mark Colvin, “The Pains of Imprisonment Revisited: The Impact of Strain on Inmate Recidivism,” Justice Quarterly (30:1, 2013)

8 Hypothesis: Father’s incarceration → Son’s delinquency
Method: Tobit regression
Statistic: Random effect coefficient (S.E. in parentheses)

Michael E. Roettger and Raymond R. Swisher, “Associations of Fathers’ History of Incarceration With Sons’ Delinquency and Arrest Among Black, White and Hispanic Males in the United States,” Criminology (49:4, 2011)

8 Hypothesis: Father’s incarceration → Son’s delinquency
Method: Logistic regression
Statistic: Odds ratio (Standard Error in parentheses)

Michael E. Roettger and Raymond R. Swisher, “Associations of Fathers’ History of Incarceration With Sons’ Delinquency and Arrest Among Black, White and Hispanic Males in the United States,” Criminology (49:4, 2011)

9 Hypothesis: Officer and driver race → Vehicle search
Method: Logistic regression
Statistics: Odds ratio (Standard Error in parentheses)

Jeff Rojek, Richard Rosenfeld and Scott Decker, “Policing Race: The Racial Stratification of Searches in Police Traffic Stops,” Criminology (50:4, 2012)

10 Hypothesis: Offender race & gender → Use of intermediate sanctions
Method: Logistic regression
Statistics: b and Exp b (odds ratio)

Brian D. Johnson and Stephanie M. Dipietro, “The Power of Diversion: Intermediate Sanctions and Sentencing Disparity Under Presumptive Guidelines,” Criminology (50:3, 2012)

11 Hypothesis: Race & ethnicity → Prosecution and sentencing outcomes
Method: Logistic regression
Statistic: Odds ratio (Exp b)

Besiki L. Kutateladze, Nancy R. Andiloro, Brian D. Johnson and Cassia C. Spohn, “Cumulative Disadvantage: Examining Racial and Ethnic Disparity in Prosecution and Sentencing,” Criminology (52:3, 2014)

12 Hypothesis: Marriage → Desistance from crime
Method: HLM (like logistic regression)
Statistics: b (Coeff.) [can compute log odds]

Bianca E. Bersani and Elaine Eggleston Doherty, “When the Ties That Bind Unwind: Examining the Enduring and Situational Processes of Change Behind the Marriage Effect,” Criminology (51:2, 2013)