Page 1: OVERVIEW OF SAMPLE SURVEYS

OVERVIEW OF SAMPLE SURVEYS

Mehdi Nassirpour, Ph.D., Illinois Department of Transportation

This presentation was part of the Applied Sampling Workshop at the Annual TRB Conference in Washington DC in January 2004.

Page 2: OVERVIEW OF SAMPLE SURVEYS

HOW GOOD MUST THE SAMPLE BE?

• There is no uniform standard of quality that must be reached by every sample.

• The quality of the sample depends entirely on the stage of the research and how the information will be used.

Division of Traffic Safety at IDOT

Page 3: OVERVIEW OF SAMPLE SURVEYS

CURRENT POPULATION SURVEY

• CPS is a monthly survey of households.

• It provides data on the labor force, employment, unemployment, and persons not in the labor force.

• This is a precise and controlled sample since it is the only source of monthly estimates of total employment and unemployment.

• The sampling error for this kind of sample is about 0.1 percent.

Page 4: OVERVIEW OF SAMPLE SURVEYS

PUBLIC PERCEPTION OF ILLINOIS SAFETY BELT USE

• A sample of 500 Illinois residents over 18 years of age was selected.

• To achieve equal sample reliability, the sample for a state or local geographic area would need to be virtually as large as a national sample of the US. In practice, local samples are generally smaller: public attitudes toward safety belt issues are just as important at the state level, but less research funding is available for a state study than for a national one.

Page 5: OVERVIEW OF SAMPLE SURVEYS

INAPPROPRIATE SAMPLE DESIGN

• Whether or not a sample design is appropriate depends on how it is used and the resources available. In some cases it may be fair to say that the generalizations made from the sample go too far.

Page 6: OVERVIEW OF SAMPLE SURVEYS

WHAT IS THE APPROPRIATE SAMPLE DESIGN?

• DEGREE OF ACCURACY

• RESOURCES

• TIME

• ADVANCE KNOWLEDGE OF THE POPULATION

• NATIONAL VERSUS LOCAL

• NEED FOR STATISTICAL ANALYSIS


Page 7: OVERVIEW OF SAMPLE SURVEYS

SMALL-SCALE SAMPLE WITH LIMITED RESOURCES

• Generalizability

• Sample size
– Too small for a meaningful analysis
– Adequate for some but not all major analyses
– Adequate for the purpose of study

• Sample execution
– Poor response rate
– Careless field work

• Use of resources

Page 8: OVERVIEW OF SAMPLE SURVEYS

Stages in the Selection of a Sample

1. Define the target population

2. Select a sampling frame

3. Determine if probability or non-probability sampling will be chosen

4. Plan procedures for selecting sampling units

5. Determine sample size

6. Select actual sampling units

7. Conduct field work

Page 9: OVERVIEW OF SAMPLE SURVEYS

TARGET POPULATION

• RELEVANT POPULATION

• OPERATIONALLY DEFINE

Page 10: OVERVIEW OF SAMPLE SURVEYS

DEFINING POPULATION

1. DEFINITION OF TARGET POPULATION
– Complete set of individuals from which information is collected

• TARGET AREA
– Entire region or set of locations from which information is collected

• Example: define the population for a study of the elderly in Springfield, IL

• How will you distinguish the elderly from the non-elderly?

• Will the elderly be defined by occupational categories? Do you want retired people? Or do you want persons over 65 and retired?

Page 11: OVERVIEW OF SAMPLE SURVEYS

SAMPLING FRAME

• A LIST OF ELEMENTS FROM WHICH SAMPLE MAY BE DRAWN

• WORKING POPULATION

• MAILING LIST--DATABASE

• SAMPLING FRAME ERROR


Page 12: OVERVIEW OF SAMPLE SURVEYS

SAMPLING FRAME (Examples)

CONSTRUCTION OF OPERATIONAL SAMPLING FRAME

• List of all subjects in the population

• Specific definition of population

• Wish to have a sampling frame that is almost or exactly identical to the entire population

• Example: use of telephone surveys of voter preferences for political parties

• Population of interest: all voters
– Sampling frame: all voters who have a telephone and answer it

• SAMPLED POPULATION – set of all individuals contained in the sampling frame, from which the sample is actually taken.

• SAMPLED AREA – set of all locations within the study area boundary line that delimits the spatial sampling frame, from which the sample is drawn

Page 13: OVERVIEW OF SAMPLE SURVEYS

SAMPLING UNITS

• GROUP SELECTED FOR THE SAMPLE

• PRIMARY SAMPLING UNIT (PSU)

• SECONDARY SAMPLING UNIT

• TERTIARY SAMPLING UNIT


Page 14: OVERVIEW OF SAMPLE SURVEYS

SAMPLING ERRORS

• SAMPLING FRAME ERROR (STUDY DESIGN)

• RANDOM SAMPLING ERROR (SAMPLING VARIABILITY)

• NONRESPONSE ERROR (MEASUREMENT BIASES)


Page 15: OVERVIEW OF SAMPLE SURVEYS

RANDOM SAMPLING ERROR

• DIFFERENCE BETWEEN THE SAMPLE RESULT AND THE RESULT OF A CENSUS CONDUCTED USING IDENTICAL PROCEDURES

• STATISTICAL FLUCTUATION DUE TO CHANCE VARIATIONS


Page 16: OVERVIEW OF SAMPLE SURVEYS

SYSTEMATIC ERRORS

• NONSAMPLING ERRORS

• UNREPRESENTATIVE SAMPLE RESULTS

• NOT DUE TO CHANCE

• DUE TO STUDY DESIGN OR IMPERFECTIONS IN EXECUTION


Page 17: OVERVIEW OF SAMPLE SURVEYS

SOURCES OF NON-SAMPLING ERRORS

• Under-representation
– poor, homeless, prison inmates
– opinion polls over telephones will miss the 6% of the population that do not have phones

• Non-response
– when selected individuals are not contacted or do not respond
– usually 30%
– results in bias

• Interviewing skills - important not to introduce bias
– types of questions asked
– attitude during interviewing
– wording of questions - confusing, misleading, intimidating

Page 18: OVERVIEW OF SAMPLE SURVEYS

SOURCES OF SAMPLING ERROR

• Inadequate sample size

• The smaller the sample, the more difficult it will be for that sample to truly capture the characteristics of a population
– Imprecise sample/results

• The larger the sample, the better

• But collecting large samples costs money and resources

• In reality, a balance needs to be struck between collecting extensive samples at a high cost in money and resources, and saving money but not having enough data to draw conclusions from

Page 19: OVERVIEW OF SAMPLE SURVEYS

Relationship Between Total Error and Sampling and Non-Sampling Errors

[Figure: diagram relating Total Error to Sampling Error and Non-sampling Error]

Page 20: OVERVIEW OF SAMPLE SURVEYS

TWO TYPES OF SAMPLING

• PROBABILITY SAMPLING

• NONPROBABILITY SAMPLING


Page 21: OVERVIEW OF SAMPLE SURVEYS

NONPROBABILITY SAMPLING

• CONVENIENCE

• JUDGMENT

• QUOTA

• SNOWBALL


Page 22: OVERVIEW OF SAMPLE SURVEYS

PROBABILITY SAMPLING

• SIMPLE RANDOM SAMPLE

• SYSTEMATIC RANDOM SAMPLE

• STRATIFIED SAMPLE

• CLUSTER SAMPLE

• MULTISTAGE RANDOM SAMPLE


Page 23: OVERVIEW OF SAMPLE SURVEYS

CONVENIENCE SAMPLING

• Obtaining a sample of the people or units that are most convenient.

Page 24: OVERVIEW OF SAMPLE SURVEYS

JUDGMENT SAMPLING

• Selecting a sample based on judgment of an individual about some appropriate characteristics required from the sample member.


Page 25: OVERVIEW OF SAMPLE SURVEYS

QUOTA SAMPLING

• Requires that the various subgroups in a population are represented.

• It should not be confused with stratified sampling.

Page 26: OVERVIEW OF SAMPLE SURVEYS

SNOWBALL SAMPLING

• Requires that additional respondents be obtained from information provided by the initial sample of respondents.

Page 27: OVERVIEW OF SAMPLE SURVEYS

JUDGMENT SAMPLING

• Selecting a sample based on the judgment of an individual about what is appropriate.

Page 28: OVERVIEW OF SAMPLE SURVEYS

SIMPLE RANDOM SAMPLE

• A sampling procedure that ensures that each element in the population will have an equal chance of being included in the sample.


Page 29: OVERVIEW OF SAMPLE SURVEYS

HOW TO CHOOSE RANDOM SAMPLE

• Assign each element within the sampling frame a unique number (1-99).

• Identify a random start from the random number table.

• Determine how the digits in the random number table will be assigned to the sampling frame.

• Select the sample elements from the sampling frame.
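The selection steps above can be sketched in Python. This is a minimal illustration, not the presenter's procedure: a pseudorandom generator stands in for the printed random number table, and the function name `simple_random_sample`, the frame of numbers 1-99, and the seed are assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement so that every element
    in the frame has the same chance of being included."""
    rng = random.Random(seed)          # stands in for the random number table
    return rng.sample(list(frame), n)

# Hypothetical sampling frame: elements numbered 1-99, as on the slide.
frame = list(range(1, 100))
sample = simple_random_sample(frame, 10, seed=2004)
print(sample)
```

Fixing the seed only makes the draw repeatable; in a real survey the start would come from a documented random source.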


Page 30: OVERVIEW OF SAMPLE SURVEYS
Page 31: OVERVIEW OF SAMPLE SURVEYS

SYSTEMATIC RANDOM SAMPLE

• Identify the total number of elements in the population (N)

• Identify the sampling interval k = N/n (N = total population size, n = size of desired sample)

• Identify the random start.

• Draw a sample by choosing every kth entry
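A short Python sketch of the systematic selection steps may help. It assumes the frame is an ordered list and that the interval is rounded down to a whole number; the function name `systematic_sample` and the example frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then every k-th entry."""
    k = len(frame) // n                 # sampling interval, rounded down (assumption)
    if k < 1:
        raise ValueError("sample size exceeds frame size")
    start = random.Random(seed).randrange(k)   # random start in 0..k-1
    return [frame[start + j * k] for j in range(n)]

# Hypothetical frame of 100 numbered elements; interval k = 100 / 10 = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```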

Page 32: OVERVIEW OF SAMPLE SURVEYS

EXAMPLE OF SYSTEMATIC RANDOM SAMPLE


Page 33: OVERVIEW OF SAMPLE SURVEYS

STRATIFIED RANDOM SAMPLE

• Sub-samples are drawn within different strata.

• Each stratum is more or less equal on some characteristic.

Page 34: OVERVIEW OF SAMPLE SURVEYS

REASONS FOR STRATIFIED RANDOM SAMPLE

• Make a sample more efficient since variance differs between the strata.

• Reduce sampling error between strata.

• Reduce number of cases required in order to achieve a given degree of accuracy.


Page 35: OVERVIEW OF SAMPLE SURVEYS

TYPES OF STRATIFIED RANDOM SAMPLE

• Proportionate Stratified Random Sample

• Disproportionate Stratified Random Sample

Page 36: OVERVIEW OF SAMPLE SURVEYS

PROPORTIONATE STRATIFIED RANDOM SAMPLE

• It is used to get a more representative sample than might be expected under SRS.

• Reduces sampling errors between strata with respect to the relative numbers selected. This is true when we have homogeneous groups.

• Population strata must be known in order to draw a proportionate stratified sample.
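As a rough illustration of proportionate allocation, the sketch below splits a total sample across strata in proportion to their known population sizes. The stratum names, counts, and the function `proportionate_allocation` are hypothetical and not taken from the presentation.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate a total sample of n across strata in proportion to stratum size.
    Simple rounding is used, so the parts may not always sum exactly to n."""
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

# Hypothetical strata with known population counts.
strata = {"urban": 6_000, "suburban": 3_000, "rural": 1_000}
allocation = proportionate_allocation(strata, 200)
print(allocation)
```

A simple random sample would then be drawn separately inside each stratum using the allocated size.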

Page 37: OVERVIEW OF SAMPLE SURVEYS

DISPROPORTIONATE STRATIFIED RANDOM SAMPLE

• It is used to manipulate the number of cases selected in order to improve the efficiency of the design.

• The main interest is to study the separate sub-populations represented by the strata rather than the entire population.

Page 38: OVERVIEW OF SAMPLE SURVEYS

TYPICAL EXAMPLES OF STRATIFIED RANDOM SAMPLE

• The more popular stratification variables are demographics: age, gender, race, region, road type, urban/rural.

Page 39: OVERVIEW OF SAMPLE SURVEYS

WEIGHTING THE SAMPLE

• The reason for weighting is to correct problems associated with sample bias (sampling and non-sampling).

• Known sampling biases can be corrected: for example, a household with more than one phone number has a higher chance of being selected by random digit dialing.

Page 40: OVERVIEW OF SAMPLE SURVEYS

WEIGHTING PROCESS

• Assign each sample element a weight equal to the inverse of its probability of selection. Where all sample elements have had the same chance of selection, each is given the same weight: 1. (This is called a self-weighting sample.)

Page 41: OVERVIEW OF SAMPLE SURVEYS

WEIGHTING EXAMPLE

                Unweighted Sample              Expected Sample (Based on Population)
          Non-white   White   Total        Non-white   White   Total
Female      12.3%     56.7%    69%            7.2%     57.7%    65%
Male         9.8%     21.2%    31%            3.8%     31.2%    35%
Total         22%       78%   100%             11%       89%   100%

Nonwhite Female weight = 7.2/12.3 = 0.59

Nonwhite Male weight = 3.8/9.8 = 0.39

White Female weight = 57.7/56.7 = 1.02

White Male weight = 31.2/21.2 = 1.47
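The weights in this example are each cell's expected (population) share divided by its unweighted sample share. A small Python sketch reproduces them from the table values; the dictionary layout and variable names are assumptions made for illustration.

```python
# Cell percentages copied from the weighting example table.
unweighted = {
    ("nonwhite", "female"): 12.3, ("white", "female"): 56.7,
    ("nonwhite", "male"): 9.8,    ("white", "male"): 21.2,
}
expected = {
    ("nonwhite", "female"): 7.2,  ("white", "female"): 57.7,
    ("nonwhite", "male"): 3.8,    ("white", "male"): 31.2,
}

# Weight = expected share / observed share, rounded to two decimals as on the slide.
weights = {cell: round(expected[cell] / unweighted[cell], 2) for cell in unweighted}
for cell, w in weights.items():
    print(cell, w)
```

Cells that are over-represented in the sample receive weights below 1, and under-represented cells receive weights above 1.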

Page 42: OVERVIEW OF SAMPLE SURVEYS

Computation (Estimates of Means and Standard Errors) for Stratified Sample

• Compute values for each stratum and then weight them based on the relative size of the stratum in the population.

Page 43: OVERVIEW OF SAMPLE SURVEYS

WEIGHTING FORMULA

$$\bar{X}_W = \sum_{i=1}^{k} W_i \bar{X}_i$$

Page 44: OVERVIEW OF SAMPLE SURVEYS

Data for Computing Parameter Estimates from Stratified Samples

                                  County 1   County 2   County 3   Total
Size of county (M_i)                10,000     15,000     25,000   50,000 (= M)
Weight (W_i)                            .2         .3         .5   1.00
Size of sample (N_i)                    50         50         50   150
Sample mean (X̄_i)                    3,100      4,300      3,800
Sample standard deviation (S_i)        500        400        300

Page 45: OVERVIEW OF SAMPLE SURVEYS

Estimated Standard Errors

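County 1: $\hat{\sigma}_{\bar{X}_1} = \frac{S_1}{\sqrt{N_1 - 1}} = \frac{500}{\sqrt{49}} = 71.4$

County 2: $\hat{\sigma}_{\bar{X}_2} = \frac{S_2}{\sqrt{N_2 - 1}} = \frac{400}{\sqrt{49}} = 57.1$

County 3: $\hat{\sigma}_{\bar{X}_3} = \frac{S_3}{\sqrt{N_3 - 1}} = \frac{300}{\sqrt{49}} = 42.9$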

Page 46: OVERVIEW OF SAMPLE SURVEYS

Estimated Mean and Variance

$$\bar{X}_W = .20(3{,}100) + .30(4{,}300) + .50(3{,}800) = 3{,}810$$

$$\hat{\sigma}^2_{\bar{X}_W} = (.20)^2(71.4)^2 + (.30)^2(57.1)^2 + (.50)^2(42.9)^2 = 957.5$$
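The stratified computation for the three counties can be checked with a few lines of Python. This is a sketch using the county data from the slides; the variable names are assumptions. Because the code keeps the standard errors unrounded, the variance comes out near 957.1 rather than the 957.5 obtained from the rounded values 71.4, 57.1, and 42.9.

```python
from math import sqrt

county_sizes = [10_000, 15_000, 25_000]   # M_i
sample_sizes = [50, 50, 50]               # N_i
sample_means = [3_100, 4_300, 3_800]      # X-bar_i
sample_sds   = [500, 400, 300]            # S_i

M = sum(county_sizes)
weights = [m / M for m in county_sizes]                              # W_i = M_i / M
std_errors = [s / sqrt(n - 1) for s, n in zip(sample_sds, sample_sizes)]

# Weighted mean: sum of W_i * X-bar_i
weighted_mean = sum(w * x for w, x in zip(weights, sample_means))
# Variance of the weighted mean: sum of W_i^2 * (standard error_i)^2
weighted_var = sum(w**2 * se**2 for w, se in zip(weights, std_errors))

print(round(weighted_mean), round(weighted_var, 1))
```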

Page 47: OVERVIEW OF SAMPLE SURVEYS

CLUSTER SAMPLING

• Divide population into a large number of groups, called clusters and then sample among clusters. Finally select all individuals within those clusters.

• The main reason for cluster sampling is to sample economically while retaining the characteristics of a probability sample.


Page 48: OVERVIEW OF SAMPLE SURVEYS

TYPES OF CLUSTER SAMPLING

• Single-Stage Cluster Sampling--Divide the population into several hundred census tracts and then select 40 tracts as a sample. Then select every individual within the selected census tracts.

• Multistage Cluster Sampling--Take a random sample of census tracts within a city. Then within each selected census tract take a simple random sample of blocks (smaller clusters). Finally we might select every third house and interview every second adult within each of these households.

Page 49: OVERVIEW OF SAMPLE SURVEYS

CLUSTER SAMPLING: Probability Proportionate to Size (PPS)

• Arrange clusters in a desired order (not necessarily by size)

• Obtain the size data

• Sum up the size measures over clusters

• Determine sampling interval

• Select a random start
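One common way to carry out the PPS steps is systematic selection along the cumulated size measures. The sketch below assumes that approach; the function name `pps_systematic` and the cluster sizes are hypothetical, and a cluster larger than the sampling interval can be selected more than once.

```python
import random
from itertools import accumulate

def pps_systematic(sizes, n, seed=None):
    """Select n clusters with probability proportionate to size.
    Returns the indices of the selected clusters."""
    total = sum(sizes)                         # sum of size measures
    interval = total / n                       # sampling interval
    start = random.Random(seed).random() * interval   # random start in [0, interval)
    cumulative = list(accumulate(sizes))
    selected, idx = [], 0
    for j in range(n):
        point = start + j * interval
        # Advance to the cluster whose cumulated size range contains the point.
        while idx < len(cumulative) - 1 and cumulative[idx] <= point:
            idx += 1
        selected.append(idx)
    return selected

# Hypothetical cluster sizes (for example, households per census tract).
sizes = [120, 300, 80, 500, 200, 400, 150, 250]
picks = pps_systematic(sizes, 4, seed=7)
print(picks)
```

Here the total is 2,000 and the interval is 500, so the cluster of size 500 is certain to be selected.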

Page 50: OVERVIEW OF SAMPLE SURVEYS

Difference Between Cluster

Sampling and Stratified Sampling

• Although both types of sample involve dividing the population into groups, they involve opposite sampling operations.

• In a stratified sample, we sample individuals within every stratum. The sampling errors involve variability within strata. Strata are supposed to be as homogeneous as possible and as different as possible from each other.

• In (single-stage) cluster sampling, we have no source of sampling error within the clusters because every case is being used. The variability is between the clusters.

Page 51: OVERVIEW OF SAMPLE SURVEYS

Difference Between Cluster Sampling and Simple Random Sample

• A cluster sample is less efficient than a simple random sample of the same size, but it may cost considerably less.

• Efficiency can be measured in terms of the size of the standard error of the estimate: a small standard error indicates high efficiency.

Page 52: OVERVIEW OF SAMPLE SURVEYS

Comparing Cluster Sampling and Simple Random Sample

$$\frac{\sigma^2_{\bar{X}_c}}{\sigma^2_{\bar{X}_r}} = 1 + P_i(\bar{N} - 1)$$

$\sigma^2_{\bar{X}_c}$ and $\sigma^2_{\bar{X}_r}$ are the variances of the means for cluster and simple random samples, $P_i$ represents the population intra-class correlation, and $\bar{N}$ the mean number of cases selected from each of the clusters.
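To see how quickly clustering inflates the variance, the ratio can be evaluated for a few values. The sketch below applies the ratio 1 + P_i(N̄ − 1); the function name and the example values of the intra-class correlation and cluster size are hypothetical.

```python
def cluster_to_srs_variance_ratio(intraclass_corr, mean_cases_per_cluster):
    """Ratio of the variance of a cluster-sample mean to that of a
    simple random sample mean of the same total size."""
    return 1 + intraclass_corr * (mean_cases_per_cluster - 1)

# Hypothetical values: intra-class correlation 0.05, 20 cases per cluster.
ratio = cluster_to_srs_variance_ratio(0.05, 20)
print(round(ratio, 2))
```

With no intra-class correlation the ratio is 1, meaning the cluster sample is as efficient as a simple random sample of the same size.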

Page 53: OVERVIEW OF SAMPLE SURVEYS

MULTI-STAGE CLUSTER SAMPLING

• Stratification techniques within the clusters will be used to refine and improve the sample. Examples of this kind of sampling: Census, National Safety Belt Survey.

Page 54: OVERVIEW OF SAMPLE SURVEYS

PRINCIPAL STEPS TO CONSIDER IN CHOOSING A SAMPLE SIZE

Mehdi Nassirpour, Ph.D., Illinois Department of Transportation

Page 55: OVERVIEW OF SAMPLE SURVEYS

AFTER SAMPLE DESIGN IS SELECTED

• DETERMINE SAMPLE SIZE

• SELECT ACTUAL SAMPLE UNIT

• CONDUCT FIELD WORK


Page 56: OVERVIEW OF SAMPLE SURVEYS

STEPS IN DETERMINING SAMPLE SIZE

• Importance of the research or the gains and losses associated with alternative decisions

• Previous examples of sample sizes used in social sciences

• Confidence level to be used

• Degree of accuracy within which we wish to estimate the parameter.

• Some reasonable estimate of the values of any parameters that may appear in the formula.

Page 57: OVERVIEW OF SAMPLE SURVEYS

DATA ELEMENTS NEEDED TO DETERMINE SAMPLE SIZE

• Mean Value

• Standard Error

• Accuracy level

• Confidence Level

Page 58: OVERVIEW OF SAMPLE SURVEYS

Formula for Determining Sample Size with an Example

Example: Determining a sample size to estimate the mean number of years of schooling completed by persons with foreign-born parents.

Accuracy level = .1

$\sigma$ = 2.5 (standard deviation of population)

Confidence level = 95%

$$1.96\,\sigma_{\bar{X}} = .1$$

$$1.96\frac{\sigma}{\sqrt{N}} = .1$$

$$1.96\frac{2.5}{\sqrt{N}} = .1$$

$$\sqrt{N} = \frac{1.96(2.5)}{.1} = 49$$

$$N = 2{,}401$$
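The same calculation can be written as a small Python function. It is a sketch of the formula as reconstructed from this slide; the function and parameter names are assumptions.

```python
def sample_size_for_mean(sigma, accuracy, z=1.96):
    """Sample size N such that z * sigma / sqrt(N) equals the desired accuracy."""
    return (z * sigma / accuracy) ** 2

# Slide example: sigma = 2.5, accuracy = .1, 95 percent confidence (z = 1.96).
n_required = sample_size_for_mean(sigma=2.5, accuracy=0.1)
print(round(n_required))
```

Halving the accuracy level to .05 would quadruple the required sample size, since N grows with the square of 1/accuracy.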

Page 59: OVERVIEW OF SAMPLE SURVEYS

Formula for Determining Sample Size for a Categorical Variable with an Example

Accuracy level: plus or minus 5 percent (95% confidence level)

$$\text{Std. Error} = \sqrt{\frac{(P_y)(P_n)}{N}}$$

$$(\text{Std. Error})^2 = \frac{(P_y)(P_n)}{N}$$

$$N = \frac{(P_y)(P_n)}{(\text{Std. Error})^2} = \frac{.25}{.0006507} = 384$$

Steps:

A. .05/1.96 = .0255102

B. (.0255102)^2 = .0006507
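A Python sketch of steps A and B follows. It assumes P_y = P_n = .5, which is what the numerator .25 implies; the function and parameter names are hypothetical.

```python
def sample_size_for_proportion(p_yes, margin, z=1.96):
    """Sample size N for estimating a proportion within +/- margin."""
    std_error = margin / z                           # step A
    return p_yes * (1 - p_yes) / std_error ** 2      # P_y * P_n / (step B)

# Slide example: P_y = P_n = .5, margin of 5 percentage points, z = 1.96.
n_required = sample_size_for_proportion(p_yes=0.5, margin=0.05)
print(round(n_required))
```

The unrounded result is 384.16, which the slide reports as 384.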

Page 60: OVERVIEW OF SAMPLE SURVEYS

Point and Interval Estimations

• Point Estimation: Estimating the population mean using the sample mean
– Bias: An estimate is unbiased if the mean of its sampling distribution is equal to the value of the parameter being estimated
– Efficiency of an Estimate: It refers to the degree to which the sampling distribution is clustered about the true value of the parameter. The smaller the standard error, the greater the efficiency of the estimate.

• Interval Estimation: It refers to estimating a population parameter by an interval.
– The actual procedure used to obtain an interval estimate is the Confidence Interval.

Page 61: OVERVIEW OF SAMPLE SURVEYS

Confidence Interval Formula

$$\bar{X} \pm 1.96\,\sigma_{\bar{X}} = \bar{X} \pm 1.96\frac{\sigma}{\sqrt{N}} = 15 \pm 1.96\frac{5}{\sqrt{100}} = 15 \pm 0.98$$

Interval would run from 14.02 to 15.98 using the 95 percent confidence level.
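The interval above can be reproduced with a short Python function. This is a sketch for the case where the population standard deviation is known; the function and parameter names are assumptions.

```python
from math import sqrt

def mean_confidence_interval(mean, sigma, n, z=1.96):
    """Confidence interval for a mean when the population SD is known."""
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width

# Slide example: mean 15, sigma 5, N = 100, 95 percent level.
low, high = mean_confidence_interval(mean=15, sigma=5, n=100)
print(round(low, 2), round(high, 2))
```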

Page 62: OVERVIEW OF SAMPLE SURVEYS

Confidence Interval Formula For Sample

$$\bar{X} \pm 2.797\,\hat{\sigma}_{\bar{X}} = \bar{X} \pm 2.797\frac{S}{\sqrt{N - 1}} = 52 \pm 2.797\frac{12}{\sqrt{24}} = 52 \pm 6.85$$

Interval would run from 45.15 to 58.85 using the 99 percent confidence level.
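When the standard deviation is estimated from the sample, the same idea applies with a t multiplier and N − 1 in the denominator. In the sketch below the multiplier 2.797 is passed in directly (it is the value used on the slide, for 24 degrees of freedom) rather than looked up from a library; the function and parameter names are assumptions.

```python
from math import sqrt

def interval_from_sample_sd(mean, sample_sd, n, multiplier):
    """Confidence interval using the sample SD: mean +/- t * S / sqrt(N - 1)."""
    half_width = multiplier * sample_sd / sqrt(n - 1)
    return mean - half_width, mean + half_width

# Slide example: mean 52, S = 12, N = 25, multiplier 2.797 (99 percent level).
low, high = interval_from_sample_sd(mean=52, sample_sd=12, n=25, multiplier=2.797)
print(round(low, 2), round(high, 2))
```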

Page 63: OVERVIEW OF SAMPLE SURVEYS

Confidence Interval Formula For Proportions

$$P_s \pm 2.33\sqrt{\frac{P_s Q_s}{N}} = .55 \pm 2.33\sqrt{\frac{(.55)(.45)}{125}} = .55 \pm .1037$$

Interval would run from .4463 to .6537 using the 98 percent confidence level (z = 2.33).
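The proportion interval can be checked the same way. The sketch below uses the slide's values (P_s = .55, N = 125, multiplier 2.33); the function and parameter names are assumptions.

```python
from math import sqrt

def proportion_confidence_interval(p_sample, n, z):
    """Confidence interval for a proportion: P_s +/- z * sqrt(P_s * Q_s / N)."""
    half_width = z * sqrt(p_sample * (1 - p_sample) / n)
    return p_sample - half_width, p_sample + half_width

# Slide example: P_s = .55, Q_s = .45, N = 125, multiplier 2.33.
low, high = proportion_confidence_interval(p_sample=0.55, n=125, z=2.33)
print(round(low, 4), round(high, 4))
```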