Political Science Statistics Notes


Scatterplots and Graphs - Visualizing Relationships
Plots and graphs are good for displaying the DIRECTION of relationships.
o POSITIVE RELATIONSHIP (upward-sloping): an increase in the IV relates to an increase in the DV.
o NEGATIVE RELATIONSHIP (downward-sloping): an increase in the IV relates to a decrease in the DV.
Plots and graphs are also good for displaying the LINEARITY of relationships.
o Linear relationships
o Curvilinear or nonlinear relationships

Scatterplots
o Use when both variables are interval.
o Shows the individual observations as points on the graph.

Using Graphs
You can also use bar and line graphs to make comparisons.
o Bar graphs = nominal or ordinal IV
o Line graphs = interval IV
Be consistent with axes.
o X (horizontal) axis = IV
o Y (vertical) axis = DV
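A minimal sketch of these plotting rules in Python (matplotlib with made-up, purely illustrative data): the IV goes on the x-axis, the DV on the y-axis, and a scatterplot is used because both variables are interval.

```python
import matplotlib.pyplot as plt

# Hypothetical interval-level data: IV = years of education, DV = knowledge score
education = [8, 10, 12, 12, 14, 16, 16, 18, 20, 21]
knowledge = [3, 4, 5, 6, 6, 7, 8, 8, 9, 10]

plt.scatter(education, knowledge)       # each point is one observation
plt.xlabel("Years of education (IV)")   # IV on the horizontal axis
plt.ylabel("Knowledge score (DV)")      # DV on the vertical axis
plt.title("Positive, roughly linear relationship")
plt.show()
```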

Basics of Controlled Comparisons
Asks the question: "How else?"
Terminology:
o X = independent variable
o Y = dependent variable
o Z = additional independent variable or control variable
o "X explains Y, controlling for Z"

Controlled Comparison Terms
o Zero-order effect: the relationship between a causal variable and a dependent variable.
o Controlled comparison table (control table): a cross-tabulation between an IV and DV for each value of a third (control) variable.
o Controlled effect: the relationship between a causal variable and a dependent variable within one value of a third (control) variable.
o Partial effect: the relationship between a dependent variable and a control variable within one value of a causal (X) variable.

What happens when we ask, "How else?"
o Spurious relationship: the relationship between X and Y is caused entirely by a third (control) variable.
o Additive relationship: a third (control) variable adds to the relationship between X and Y.
o Interaction relationship: the relationship between X and Y depends on the value of the third (control) variable.

How to set up a control table
1. Place the DV into the rows.
2. Decide the base category of the control variable.
3. Make columns for the control variable.
4. Make columns for the IV within the columns of the control variable.
5. Add the data.
6. Calculate down; interpret across.
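One way to build such a control table is with a pandas cross-tabulation. This is only a sketch with hypothetical column names (gun_control, party, gender); the notes do not specify any software.

```python
import pandas as pd

# Hypothetical survey data with a DV, an IV, and a control variable
df = pd.DataFrame({
    "gun_control": ["favor", "oppose", "favor", "favor", "oppose", "oppose", "favor", "oppose"],
    "party":       ["Dem",   "Rep",    "Dem",   "Rep",   "Rep",    "Dem",    "Dem",   "Rep"],
    "gender":      ["F",     "F",      "M",     "M",     "F",      "M",      "F",     "M"],
})

# DV in the rows; columns for the control variable, then the IV within each control column.
# normalize="columns" gives column percentages, so you "calculate down, interpret across".
control_table = pd.crosstab(df["gun_control"], [df["gender"], df["party"]], normalize="columns")
print((control_table * 100).round(1))
```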

Interpretation
Controlled effect: the relationship between party ID and gun control, separately for men and women.
o Democratic women are 30.5% more likely to support more gun control than GOP women.
o Democratic men are 34.7% more likely to support more gun control than GOP men.
o Overall, Democrats are roughly 30% more likely to favor gun control than the GOP, controlling for gender.

Partial effect: the change in the DV due to the control variable at each value of the IV.
o Democratic women are 10.2% more likely to support more gun control than Democratic men.
o GOP women are 14.4% more likely to support more gun control than GOP men.
o Overall, women are about 10% more likely than men to favor gun control, regardless of party.

WHICH IS IT?
o Spurious
o Additive
o Interaction

IDENTIFYING THE PATTERN
1. After holding the control variable constant, does a relationship exist between the IV and DV within at least one value of the control variable? (i.e., is at least one controlled effect consistent with the bivariate cross-tabulation finding?)
If NO, it is a SPURIOUS relationship. If YES, go to (2).
2. Is the tendency (direction) of the relationship between the IV and DV the same at all values of the control variable? (i.e., are both controlled effects consistent with the bivariate cross-tabulation finding?)
If NO, it is an INTERACTION. If YES, proceed to (3).
3. Is the strength of the relationship between the IV and the DV the same, or very similar, at all values of the control variable? (i.e., are both controlled effects equally strong?)
If YES, it is ADDITIVE. If NO, it is an INTERACTION.
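The three-question checklist can be written out as a small decision helper. This is only an illustrative sketch: the controlled effects are passed in as percentage-point differences between IV categories, and the two numeric cutoffs are arbitrary assumptions, not rules from the notes.

```python
def classify_pattern(controlled_effects, negligible=2.0, similar_within=5.0):
    """Classify a controlled comparison as spurious, interaction, or additive.

    controlled_effects: percentage-point differences between IV categories,
    one per value of the control variable. The cutoffs are illustrative."""
    # 1. No meaningful IV-DV relationship within any value of the control -> spurious
    if all(abs(e) <= negligible for e in controlled_effects):
        return "spurious"
    # 2. Direction differs across values of the control -> interaction
    if not (all(e > 0 for e in controlled_effects) or all(e < 0 for e in controlled_effects)):
        return "interaction"
    # 3. Same direction and similar strength -> additive; otherwise interaction
    if max(controlled_effects) - min(controlled_effects) <= similar_within:
        return "additive"
    return "interaction"

# Gun-control example: Democrats more supportive by 30.5 points among women, 34.7 among men
print(classify_pattern([30.5, 34.7]))   # -> additive
```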

Another example
o DV: vote choice (D or R)
o IV: abortion opinion (permit or not)
o Hypothesis:
o Control: salience of the issue to the voter

Means Comparison
o Use when the DV is interval and the IV and control variable are categorical.
o DV: feeling thermometer toward homosexuals
o IV: egalitarianism scale (belief in social equality)
o Hypothesis?
o Control: age group

How to Construct a Mean Comparison Control Table
o The IV makes the rows.
o The CV makes the columns.
o The DV is reported as means inside the table.
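A mean-comparison control table like this can be produced with a pandas pivot table. The variable names below (therm_gay, egal, age_group) are hypothetical stand-ins for the feeling-thermometer DV, the egalitarianism IV, and the age-group control.

```python
import pandas as pd

df = pd.DataFrame({
    "egal":      ["low", "high", "low", "high", "low", "high", "low", "high"],
    "age_group": ["18-35", "18-35", "18-35", "18-35", "36+", "36+", "36+", "36+"],
    "therm_gay": [40, 65, 35, 70, 30, 55, 45, 60],
})

# IV in the rows, control variable in the columns, mean of the DV in the cells
table = pd.pivot_table(df, values="therm_gay", index="egal",
                       columns="age_group", aggfunc="mean")
print(table.round(1))
```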

Findings
What does the chart tell us?
o Egalitarianism on homosexual attitudes?
o Age on homosexual attitudes?
o Egalitarianism on homosexual attitudes, controlling for age?

A Final Example
o DV: % of women in legislatures
o IV: electoral system (PR, non-PR)
o CV: cultural acceptance (low, high)
o Hypothesis:
o Control: cultural acceptance of women

Findings
What does the chart tell us?
o Electoral system (non-PR, PR) on women in legislatures?
o Cultural acceptance (low, high) on women in legislatures?

Conditional Pattern?
o A conditional change of 5% (from non-PR to PR) for countries with low cultural acceptance of women.
o A conditional change of 11.4% (from non-PR to PR) for countries with high cultural acceptance of women.

Foundations of Statistical Inference CHAPTER 6

Key Terms
o Significance (informally): a statistical probability indicating whether an observed relationship could have occurred by chance.
o Inferential statistics: a set of statistical procedures for assessing the relationship between an observed sample value and an unobserved population parameter.
o Population: the entire set of cases the researcher wants to draw conclusions about.
o Population parameter: an unknown population characteristic.
o Sample: the subset of the population that is actually observed.
o Sample statistic: an estimate of the population parameter.

Ole Miss Student Survey
STEP ONE: DEFINE THE POPULATION.
o What is the population? All students, right? But what about . . .
o Graduate vs. undergraduate students?
o Part-time vs. full-time students?
o Faculty and staff?
o Oxford campus vs. DeSoto/Tupelo vs. Jackson medical campus?
STEP TWO: DECIDE ON THE SAMPLING FRAME.
o We draw the sample and sample statistic from our population list (a.k.a. a sampling frame). The sampling frame is the method for defining all population members.
o What is our sampling frame for our Ole Miss survey?
STEP THREE: DRAW A SAMPLE.
o How will you select your cases from the population to reduce selection bias?
o Selection bias: the difference between the population parameter and the parameter estimate resulting from some population members being more likely than others to be included in the sample.
STEP FOUR: DESIGN A SURVEY INSTRUMENT.
o How will you write questions to avoid measurement error?
STEP FIVE: IMPLEMENT THE SURVEY WITH A HIGH RESPONSE RATE.
o How will you avoid response bias, the bias occurring when some subjects in the sample participate at a higher rate than other subjects?
How could we get a random sample from our sampling frame?
o Generate a random process in which all students have an equal chance of being selected.
o Setting up a booth in front of the Union? Selection bias? Response bias?
o Email? Selection bias? Response bias?
Example: we want to know the student body's average binge drinking score (BDS, the number of drinks over 2 hours).
o Plug in our equation: population parameter = sample statistic + random sampling error
o Ole Miss (population) BDS average = sample BDS average + standard error
o Assume a random sample of n = 100 drawn from the population has a mean of 3 and a standard deviation of 1.5. What's the standard error of the sample mean? (Hint: s / √n)
o x-bar = 3, n = 100, s = 1.5
o Standard error = 1.5 / √100 = 1.5 / 10 = .15
o What is our 95% confidence interval for the population mean? x-bar +/- Z(SE)
o 3 - 2(.15) = 2.7
o 3 + 2(.15) = 3.3
o Interpretation: 95% of all possible random samples of n = 100 will produce sample means between 2.7 and 3.3.
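A quick check of the binge-drinking arithmetic in Python, using only the numbers given in the notes:

```python
import math

xbar, s, n = 3.0, 1.5, 100        # sample mean, sample SD, sample size
se = s / math.sqrt(n)             # standard error = s / sqrt(n) = 0.15
lower = xbar - 2 * se             # rule-of-thumb 95% CI lower bound = 2.7
upper = xbar + 2 * se             # rule-of-thumb 95% CI upper bound = 3.3
print(se, lower, upper)           # 0.15 2.7 3.3
```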

Types of Samples
Nonrandom samples: samples that do NOT use an equal probability of selection method (EPSEM).
o Convenience sample: a nonrandom sample in which subjects are selected because they are easy (convenient) for the researcher.
o Snowball sample: respondents are asked to identify other likely participants.
o Quota sample: a nonrandom sample in which subjects are selected in proportion to their representation in the population.
Random sample: a family of samples that use an equal probability of selection method (EPSEM).
o Simple random sample: random selection from a list (e.g., random digit dialing, random number tables).
o Stratified sample: a probability sample drawn by strata.
o Cluster sample: a probability sample drawn by clusters.
Sampling review terms: types of samples, EPSEM, sampling frame, selection bias, response bias.

Random Sampling Error
o Random sampling error (standard error): the extent to which a sample statistic differs, by chance, from a population parameter.
o Eliminating bias does not eliminate error.
o It is minimized by increasing our n.
o It increases with variation in the population parameter.

Calculating Random Sampling Error
Population parameter = sample statistic + random sampling error

Random Sampling Error = Variation component / Sample size component

Variation = direct (positive) relationship

Sample size = inverse (negative) relationship

Sample Size
o As sample size increases, error decreases.
o This is an inverse and nonlinear relationship.

The Nonlinear Effect of Sample Size
o Sample size = 100, so the sample size component is √100 = 10; the error becomes variation / 10.
o Increase the sample to 400: the sample size component is √400 = 20; the error becomes variation / 20.
o Increase the sample to 1,600: the sample size component is √1600 = 40; the error becomes variation / 40.
o Increase the sample to 2,500: the sample size component is √2500 = 50; the error becomes variation / 50.
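The diminishing returns of larger samples can be seen by dividing a fixed variation component by the square root of each sample size; this is a quick illustrative calculation, with 1.5 chosen only as an example standard deviation.

```python
import math

variation = 1.5                             # e.g., a sample standard deviation
for n in (100, 400, 1600, 2500):
    print(n, round(variation / math.sqrt(n), 4))
# 100 -> 0.15, 400 -> 0.075, 1600 -> 0.0375, 2500 -> 0.03:
# quadrupling n only halves the error.
```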

Standard Deviation
o Most common measure of dispersion.
o The average distance of observations from the mean.
Sample standard deviation: s = √[ Σ(x - x-bar)² / (n - 1) ]. First calculate the mean.

Example data: 5, 8, 5, 4, 6, 7, 8, 8, 3, 6
o Find the mean: (5+8+5+4+6+7+8+8+3+6) / 10 = 6
o Add up the squared deviations for each observation: (5-6)² + (8-6)² + (5-6)² + (4-6)² + (6-6)² + (7-6)² + (8-6)² + (8-6)² + (3-6)² + (6-6)² = 28
o Divide the sum by n - 1, where n = sample size (number of observations): 28 / (10 - 1) = 3.11
o Take the square root: standard deviation = √3.11 = 1.76
Random Sampling Error (Standard Error): what's the difference between the SD and the SE?
o Standard deviation: a measure of dispersion around a single mean from a known population. (Ex.: POL 251 exam scores)
o Standard error: a measure of how closely sample means estimate an unknown population mean in repeated sampling. (Ex.: binge drinking score at Ole Miss)
Notation
o σ = standard deviation of the population
o s = standard deviation of the sample
o See Table 6-5.
o Random sampling error = standard error of the mean
o Standard error = s / √n (you may also see σ / √n)
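The same worked example can be verified in Python; statistics.stdev uses the n - 1 divisor, matching the hand calculation above.

```python
import statistics

data = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]
mean = sum(data) / len(data)                        # 6.0
sq_devs = sum((x - mean) ** 2 for x in data)        # 28.0
variance = sq_devs / (len(data) - 1)                # 3.11
print(mean, sq_devs, round(variance, 2),
      round(statistics.stdev(data), 2))             # 6.0 28.0 3.11 1.76
```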

Central Limit Theorem
o An established statistical rule that tells us that, if we were to take an infinite number of samples of size n from a population of N members, the means of these samples would be normally distributed.
o The distribution of sample means would have a mean equal to the true population mean and a random sampling error (standard error) equal to σ, the population standard deviation, divided by the square root of n.
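A small simulation (hypothetical numbers, using numpy) illustrates the theorem: repeated sample means cluster around the population mean with a spread close to σ / √n.

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=3.0, scale=1.5, size=100_000)   # made-up population
n = 100
sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]

print(round(np.mean(sample_means), 3))   # close to the population mean (3.0)
print(round(np.std(sample_means), 3))    # close to sigma / sqrt(n) = 1.5 / 10 = 0.15
```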

Normal Distribution
o The mean, median, and mode are all the same value. If this weren't the case, the distribution would not be normal.
o More importantly, a fixed percentage of observations lies between the mean and any number of standard deviations from the mean.
o From our standard deviation example: assuming the data are normally distributed, 68.2% of the cases lie within +/- 1 standard deviation of the mean, and 95% fall within about +/- 2 (1.96) standard deviations.

Inference from the Normal Distribution
o 95% confidence interval: the interval within which 95% of all possible sample estimates will fall by chance.
o Calculate by: x-bar +/- 1.96(standard error)
o Rule of thumb: x-bar +/- 2(standard error)
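The 1.96 in the formula is just the point that leaves 2.5% in each tail of the standard normal curve; scipy can confirm it (an optional check, not part of the notes).

```python
from scipy.stats import norm

print(round(norm.ppf(0.975), 3))                     # 1.96: z-value with 97.5% of the curve below it
print(round(norm.cdf(1.96) - norm.cdf(-1.96), 3))    # ~0.95 of the area lies within +/- 1.96 SDs
```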

Standard(ized) Normal Distribution: Used for Probability
o Bell-shaped curve

Z-Scores
o Z-score: converts a raw deviation from the mean into a standardized deviation from the mean.
o z = (x - x-bar) / s
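Using the earlier example data (mean 6, s = 1.76), a z-score can be computed for any raw value; this is a small illustrative sketch, not a formula from the notes beyond the definition above.

```python
def z_score(x, mean, sd):
    """Standardized deviation: how many standard deviations x sits from the mean."""
    return (x - mean) / sd

print(round(z_score(8, 6, 1.76), 2))   # 8 is about 1.14 SDs above the mean
print(round(z_score(3, 6, 1.76), 2))   # 3 is about 1.7 SDs below the mean
```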

Extending Statistical Inference: Sample Proportions
o Use when the variable is nominal or ordinal.
o Allows us to compare a sample category outcome to the population.
o Sample proportion: the number of cases falling into one category divided by the number of all cases in the sample.
Sample Proportions
o To apply statistical inference, we need to calculate our standard error (so we can compute a confidence interval or margin of error).
o Problem? We can't use the standard deviation with nominal or ordinal data.
o Solution:

Standard error of a sample proportion: SE = √(p × q / n), where q = 1 - p.

Sample Proportions: An Example
o Survey question: "Should all alcohol be banned in the Grove on football Saturdays?" (yes or no)
o Level of measurement?
o Outcome: yes = 15, no = 85, n = 100
o Sample proportion of abolitionists (p): .15
o Sample proportion of non-abolitionists (q): .85
o What's the standard error of the sample proportion? p = .15, q = .85, n = 100
A Confidence Interval for a Sample Proportion
o Standard error in our question: √(.15 × .85) / √100 = .357 / 10 = .0357
o Confidence interval for our finding: .15, give or take .0357
o Use the rule of thumb for a 95% CI:
o p - 2(.0357) = .0786
o p + 2(.0357) = .2214
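The Grove example, checked in Python with the standard-error formula for a proportion, √(pq/n):

```python
import math

p, n = 0.15, 100
q = 1 - p
se = math.sqrt(p * q / n)          # sqrt(.15 * .85 / 100) = .0357
lower = p - 2 * se                 # .0786
upper = p + 2 * se                 # .2214
print(round(se, 4), round(lower, 4), round(upper, 4))
```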