Sociology 5811: Lecture 6: Samples, Populations Copyright © 2005 by Evan Schofer Do not copy or distribute without permission


Sociology 5811: Lecture 6: Samples, Populations

Copyright © 2005 by Evan Schofer

Do not copy or distribute without permission


Announcements

• Problem set #2 due next Tuesday, Sept 27


Problem Set: Z-table

• Several problems require looking up area under the normal curve associated with certain Z-scores

• Requires use of “Z-table”

• Found on Knoke, p. 459

• Issue: We know that 95% of area under a normal curve falls within +/- 2 standard deviations

• Thus: Area under normal curve from Z = -2 to Z = 2 is equal to .95

• Area left of Z = -2 and right of Z = 2 is .05

• But, what if we want area for a value like 1.4?

• Z-table lists areas for all values!


Problem Set: Z-table

Let’s look at Z=.40

Area from 0 to Z = .15

Area beyond +Z = .35

Question: What is the area from -Z to +Z?
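The table lookup above can also be checked numerically. The following is a minimal sketch, not part of the original lecture: it assumes Python's standard `math.erf` as a stand-in for the printed Z-table, and the function names are made up for illustration.

```python
import math

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2))

def area_beyond_z(z):
    """Area under the standard normal curve to the right of z (for z >= 0)."""
    return 0.5 - area_zero_to_z(z)

# Print the two columns a Z-table typically lists, for a few Z-scores.
for z in (0.40, 1.40, 2.00):
    print(f"Z = {z:.2f}: 0 to Z = {area_zero_to_z(z):.4f}, "
          f"beyond Z = {area_beyond_z(z):.4f}")
```

Because the curve is symmetric, areas for negative Z-scores mirror those for positive ones, which is why a table only needs to list positive values.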


Review: Probability

• Probability of event A defined as p(A):

$$p(A) = \frac{\text{number of outcomes in which } A \text{ occurs}}{\text{total number of outcomes}}$$

• “The probability of a particular outcome is the proportion of times that outcome would occur in a long run of repeated observations” (Agresti & Finlay 1997, p. 81)

p(red) = 2 divided by 10

p(red) = .20


Probability

• Question: What is the probability of picking red twice in a row (assuming you replaced the red one after you picked)?

• Answer: Probabilities of independent events multiply

• Each probability is .20

• .20 x .20 = .04

• Under 5% chance!

• Conclusion: If you pick many times, you are unlikely to continually get atypical colors

• It can happen, but it is very improbable.

• Ex: Picking red 5 times: Probability is .00032.
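The multiplication above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, and it assumes each pick is independent (the item is replaced after every draw), as the slide states.

```python
def prob_all_red(p_red, n_picks):
    """Probability of picking red on every one of n independent picks."""
    return p_red ** n_picks

p_red = 2 / 10  # 2 red outcomes out of 10, as in the review slide

# Probability of an unbroken run of red for 1, 2, and 5 picks.
for n in (1, 2, 5):
    print(f"{n} pick(s): {prob_all_red(p_red, n):.5f}")
```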


Review: Probability Distributions

• Both nominal/ordinal and continuous measures can be conceived of as probability distributions

– Nominal/Ordinal: Height of bars indicates probability of picking someone with that value

– Continuous: Can’t be graphed in separate bars

• Instead, a continuous curve approximates probability

• Area under curve = probability of picking a case within a given range of values.


Review: Probability Distributions

• Notation:

– Greek alpha (α) is used to refer to probabilities in a range for a continuous distribution


Review: Probability Distributions

• P(Y<a) = α


Review: Probability Distributions

• P(Y<a, Y>b) = α


Review: Normal Distributions

• Normal curves have well-known properties:

• 68% of area under the curve (and thus cases) fall within 1 standard deviation of the mean

• 95% of cases fall within 2 standard deviations

• Over 99% of cases (about 99.7%) fall within 3 standard deviations

• Percentages translate directly into probabilities

• Thus, it is easy to determine the probability associated with any range around the mean

• e.g., there is a .95 probability that a person randomly chosen will fall within 2 SD of mean

• This property makes normal curves very useful!
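These percentages can be reproduced directly from the normal curve. The sketch below is illustrative rather than part of the lecture: it assumes Python's standard `math.erf`, and `area_within` is a hypothetical helper name.

```python
import math

def area_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# The slide's rounded figures are 68%, 95%, and a bit over 99%.
for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

The exact figure for 2 standard deviations is slightly above .95; the rounded value is the one used in the rest of these notes.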


Samples and Populations

• Issue: As social scientists, we wish to describe and understand large sets of people (or organizations or countries)

• School achievement of American teenagers

• Fertility of individuals in Indonesia

• Behavior of organizations in the auto industry

• Problem: It is seldom possible to collect data on all relevant people (or organizations or countries) that we hope to study.


Samples and Populations

• How can we calculate the mean or standard deviation for a population, without data on most individuals?

• Without even knowing the total N of the population?

• Are we stuck?

• IDEA: Maybe we can gain some understanding of large groups, even if we have information about only some of the cases within the group

• We can examine part of the group and try to make intelligent guesses about what the entire group is like.


Populations Defined

• Population: The entire set of persons, objects, or events that have at least one common characteristic of interest to a researcher (Knoke, p. 15)

• Populations (and things we’d like to study)

• Voting age Americans (their political views)

• 6th grade students attending a particular school (their performance on a math test)

• People (their response to a new AIDS drug)

• Small companies (their business strategies).


Population: Defined

• People in those populations have one common characteristic, even if they are different in many other ways

• Example: Voting-age Americans may differ wildly, but they share the fact that they are voting-age Americans

• Beyond literal definition, a population is the general group that we wish to study and gain insight into.


Sample: Defined

• Sample: A subset of a population

• Any subset, chosen in any way

• But, manner of choosing makes some samples more useful than others

• Datasets are usually samples of a larger population

• Beyond literal definition, sample often means “the group that we have data on”.


Statistical Inference: Defined

• Our Goal: to describe populations

– However, we only have data on a sample (a subset) of the population

– We hope that studying a sample will give us some insight into the overall population

• Statistical Inference: making statistical generalizations about a population from evidence contained in a sample (Knoke, p. 77).


Statistical Inference

• When is statistical inference likely to work?

• 1. When a sample is large

• If a sample approaches the size of the population, it is likely to be a good reflection of that population

• 2. When a sample is representative of the entire population

• As opposed to a sample that is atypical in some way, and thus not reflective of the larger group.


Random Samples

• One way to get a representative sample is by choosing one randomly

• Definition: A sample chosen from a population such that each observation has an equal chance of being selected (Knoke, p. 77)

– Probability of selection:

$$p(\text{selection}) = \frac{1}{N}$$

• Randomness is one strategy to avoid “bias”, the circumstance when a sample is not representative of the larger population.
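Drawing a simple random sample can be sketched as follows. This is an illustration under stated assumptions, not the lecture's own procedure: the population size `N`, sample size `n`, ID numbers, and seed are all invented, and Python's standard `random.sample` stands in for whatever selection method a real study would use.

```python
import random

N = 1000  # hypothetical population size
n = 25    # hypothetical sample size

# ID numbers standing in for the members of the population.
population = list(range(1, N + 1))

# Chance that any particular case is the one chosen on a single draw.
p_selection = 1 / N

rng = random.Random(5811)           # fixed seed so the run is repeatable
sample = rng.sample(population, n)  # n distinct cases, drawn without replacement

print(p_selection)
print(sorted(sample))
```

With n draws made without replacement, each case's chance of ending up somewhere in the sample is n/N, which is still equal across cases.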


Biased Samples: Examples

• Biased samples can lead to false conclusions about characteristics of populations

• What are the problems with these samples?

– Internet survey asking people the number of CDs they own (population = all Americans)

– Telephone survey conducted during the day of political opinions (pop = voting age Americans)

– Survey of an Intro Psych class on causes of stress and anxiety (pop = All humans)

– Survey of Fortune 500 firms on reasons that firms succeed (pop = all companies).


Statistical Inference

• Statistical inference involves two tasks:

• 1. Using information from a sample to estimate properties of the population

• 2. Using laws of statistics and information from the sample to determine how close our estimate is likely to be

– We can determine whether or not we are confident in our assessment of a population


Statistical Inference Example

• Population: Students in the United States

• Sample: Individuals in this classroom

• Question: What is the mean number of CDs owned by students in the US?

• Goal #1: Use information on students in this class to guess the mean number of CDs owned by students in the US

• Goal #2: Try to determine how close (or far off) our estimate of the population mean might be. Estimate the quality of the guess.

• Goal #2 helps prevent us from drawing inappropriate conclusions from Goal #1


Population and Sample Notation

• Characteristics of populations are called parameters

• Characteristics of a sample are called statistics

• To keep things straight, mathematicians use Greek letters to refer to populations and Roman letters to refer to samples

– Mean of sample is: Y-bar

– Mean of population is Greek mu: μ

– Standard deviation of sample is: s

– Standard deviation of a population is lower case Greek sigma: σ


Population and Sample Notation

• An estimate of a population parameter based on information from a sample is called a “point estimate”

– Example of a point estimate:

• Based on this sample, I estimate that the mean # of CDs owned by students in the U.S. is 47.

• Formulas to estimate a population parameter from a sample are “estimators”.
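A point estimate and its estimator can be sketched in Python. The five CD counts below are invented purely so that the sample mean comes out to the 47 used in the example. The choice of estimators is also an assumption made for this sketch: the sample mean for μ and the n - 1 sample standard deviation for σ are common choices, not necessarily the ones this course goes on to use.

```python
import statistics

# Hypothetical sample: number of CDs owned by five students (invented data).
cds_owned = [12, 30, 47, 55, 91]

# Estimator for the population mean: the sample mean (Y-bar).
mu_hat = statistics.mean(cds_owned)

# Estimator for the population standard deviation: the sample S.D. (s),
# which divides by n - 1 rather than n.
sigma_hat = statistics.stdev(cds_owned)

print(f"point estimate of the mean: {mu_hat}")
print(f"point estimate of the S.D.: {sigma_hat:.2f}")
```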


Estimation: Notation

• We often wish to estimate population parameters, using information from a sample we have

• We may use a variety of formulas to do this

• Mathematicians identify estimates of population parameters in formulas by placing a caret (“^”) over the parameter

– The caret is called a “hat”

– An estimate of σ is called “sigma-hat”

– Symbol: σ̂


Populations and Samples

• Population parameters (μ, σ) are constants

• There is one true value, but it is usually unknown

• Sample statistics (Y-bar, s) are variables

• Up until now we’ve treated them as constants

• But, there are many possible samples

• Different samples yield different values of the mean & S.D.

– Like any variable, the mean and S.D. have a distribution!

• Called the “sampling distribution”

• Made up of the values the statistic takes across all possible samples from a given population

• We’ll discuss it later…
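The contrast between constant parameters and variable statistics can be seen in a small simulation. Everything below is a hypothetical sketch: the population of CD counts is invented, and the population size, sample size, number of samples, and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(5811)  # fixed seed so the run is repeatable

# Hypothetical population: CD counts for 10,000 students (invented numbers).
population = [max(0.0, rng.gauss(47, 15)) for _ in range(10_000)]

# Parameters: one fixed value each for this population.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Statistics: recomputed for every sample, so they vary from sample to sample.
sample_means = []
sample_sds = []
for _ in range(5):
    sample = rng.sample(population, 30)
    sample_means.append(statistics.mean(sample))
    sample_sds.append(statistics.stdev(sample))

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print("sample means:", [round(m, 2) for m in sample_means])
print("sample S.D.s:", [round(s, 2) for s in sample_sds])
```

Each run of the loop yields a different Y-bar and s while μ and σ stay put; the collection of values a statistic can take across samples is what the sampling distribution describes.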


Population and Sample Distributions

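[Figure: population and sample distributions, with the sample mean (Y-bar) and sample standard deviation (s) marked]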


Population Distributions

• Population distributions are typically conceived of as probability distributions

• Because we don’t usually see the whole thing… We just pull individuals out based on relative probability

• Some populations are finite and could be graphed as a raw frequency plot or histogram (examples?)

• Many populations are infinite, can’t ever be graphed as a frequency plot/histogram (examples?)

• The main thing that matters about a population is how likely you are to pick a person with a given value (or in a range of values).


Populations and Samples: Overview

| | Population | Sample |
|---|---|---|
| Characteristics | “parameters” | “statistics” |
| Characteristics are: | constant (one for population) | variables (varies for each sample) |
| Notation | Greek (μ, σ) | Roman (Y-bar, s) |
| Estimate | “hat”: σ̂ | “point estimate” based on sample: Y-bar |


Review: Normal Distribution

• Example: Blood Cholesterol

• normally distributed

• mean = 200

• S.D. = 40

• What is the range of cholesterol that encompasses 95% of the population?

• Answer: 200 +/- (2)(40) = 200 +/- 80

– Range = 120 to 280
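The same arithmetic can be written as a short Python sketch. The function name is hypothetical; the mean, S.D., and the "2 standard deviations covers about 95%" rule are taken from the slide.

```python
def range_within(mean, sd, k):
    """Return the (low, high) values lying k standard deviations either side of the mean."""
    return mean - k * sd, mean + k * sd

# Cholesterol example: mean = 200, S.D. = 40, and 2 S.D.s for roughly 95%.
low, high = range_within(200, 40, 2)
print(f"About 95% of the population falls between {low} and {high}")
```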


Normal Distributions and Inference

• The link between normal distributions and probabilities allows us to draw conclusions

• Example: Suppose you are a detective

• You suspect that a person is taking an illegal drug

• One side-effect of the drug is that it raises cholesterol to extremely high levels

• Strategy: Take a sample of blood from person

• Compare with known distribution for normal people

• Observation: Blood cholesterol is 5 standard deviations above the mean…


Normal Distributions and Inference

• What can you tell by knowing cholesterol is 5 standard deviations above the mean?

• Over 99% (about 99.7%) are within 3 standard deviations; the rest are not

• A much lower percentage fall 5 S.D.’s from the mean

• Based on properties of a normal curve:

• Only .000000287 of cases fall 5 or more S.D.’s above the mean

• Conclusion: It is improbable that the person is not taking drugs

• But, in a world of 6 billion people, we would expect about 1,722 such people, so you can’t be absolutely certain…
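The tail area and the head count above can be checked with a short Python sketch. It assumes the standard library's `math.erfc` as a substitute for a table lookup, and the function name is made up for illustration. The .000000287 figure is the area in the upper tail only, that is, 5 or more S.D.'s above the mean.

```python
import math

def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

tail = upper_tail(5)
world_population = 6_000_000_000  # figure used on the slide

print(f"share of cases 5+ S.D.s above the mean: {tail:.3e}")
print(f"expected number of such people: {tail * world_population:.0f}")
```

The unrounded tail area gives a count a little below the slide's 1,722, which comes from multiplying the rounded .000000287 by 6 billion.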