LIS 470 Data & Sampling The Information School of the University of Washington LIS 570 Session 4.1 [Many of the slides and graphics adapted from Harry Bruce’s Spring 2005 Class]

Post on 21-Dec-2015


LIS 470 Data & Sampling

The Information School of the University of Washington

LIS 570

Session 4.1
[Many of the slides and graphics adapted from Harry Bruce’s Spring 2005 Class]

LIS 570_Data & Sampling Mason; p. 2

Objectives
• Understand the options in, and goals of, sampling techniques
• Reinforce knowledge of vocabulary and basic principles

Agenda
• Warm-up exercise: review of principles
• Discussion of sampling goals and methods
• Hypothetical research exercise

What Happened in 1997?

FSU MIS Graduates

Graduating Class Year    Est. Average 1st Year Earnings    Projected Est. Average Total 5 Year Earnings
1994                     $28,100                           $154,550
1995                     $29,200                           $160,600
1996                     $30,400                           $167,200
1997                     $50,500                           $339,800

Possible Explanations
• Beginning of dot com boom
• Beginning of Y2K fears and staffing frenzy
• Other…?
• Peter Boulware first round NFL pick
– Overall no. 4 pick by Baltimore Ravens
– $800,000 1st year salary, $1M signing bonus
– $17M total 5 year package
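
A minimal sketch of how a single extreme value can pull an average upward while leaving the median unchanged. Only the $800,000 salary and $1M signing bonus come from the slide. The class size of 94, the typical first-year earnings of $31,600, and the choice to count the signing bonus as first-year earnings are hypothetical assumptions made purely for illustration.

```python
from statistics import mean, median

# Hypothetical assumptions (not from the slides): class size and typical earnings.
class_size = 94
typical_earnings = 31_600

# From the slide: $800,000 first-year salary plus $1M signing bonus.
# Counting the bonus as first-year earnings is an assumption.
outlier_earnings = 800_000 + 1_000_000

# Everyone earns the typical amount except one graduate.
earnings = [typical_earnings] * (class_size - 1) + [outlier_earnings]

print("mean:  ", round(mean(earnings)))
print("median:", median(earnings))
```

Under these assumptions the mean lands a little above $50,000 while the median stays at the typical value, which is one way a single graduate could account for a jump like the one in the 1997 row.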

Summary
• Sampling - the process of selecting observations
– random; non-random
– probability; non-probability

You don’t have to eat the whole ox to know that the meat is tough

Aim
• A representative sample: a sample which accurately reflects its population
• Avoid (unconscious) bias

Basic terminology
• Population [universe] - the entire group of objects about which information is wanted
• Unit [element] - any individual member of the population
• Sample - a part or subset of the population used to gain information about the whole
• Sampling frame - the list of units [subset of the universe] from which the sample is chosen
• Variable - a characteristic of a unit, to be measured for those units in the sample

Step 1: Identify the Population

The units of the population about whom or which you want to know

• Define the population concretely; no ambiguity
Example: “Adult Residents of Seattle”
– How is “adult” defined?
– What is the exact boundary of Seattle?
– As of what date?
– Can the population be identified completely? (e.g., are the homeless included as “residents”?)

2. Decide on a Census or a Sample

• Census
– Observe each unit
– An “attempt” to sample the entire population
– Not foolproof (example: issues of US census)
• Sample: observe a sub-group of the population

3. Decide on Sampling Approach

Random sampling

Random (Probability) Sampling
• Each unit (element) has the same chance (probability) of being in the sample

Chance or luck of the draw determines who is in the sample (random)

Random samples
• Each unit has a known probability or chance of being included in the sample
• An objective way of selecting units
• Random sampling is not haphazard or unplanned sampling

Types of random sampling

• Simple random sample
• Systematic sampling
• Stratified sampling
• Cluster sampling

How to choose

• The nature of the research problem
• Availability of a sampling frame
• Money
• Desired level of accuracy
• Data collection method

Simple random samples

• Obtain a complete sampling frame
• Give each case a unique number starting with one
• Decide on the required sample size
• Select that many numbers from a table of random numbers
• Select the cases which correspond to the randomly chosen numbers
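
The steps above can be sketched in Python. This is an illustrative sketch only: the frame of 200 households is hypothetical, the function name `simple_random_sample` is made up for this example, and the standard library's pseudo-random generator stands in for the printed table of random numbers.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Draw a simple random sample (without replacement) from a complete frame."""
    rng = random.Random(seed)
    # Give each case a unique number starting with one.
    numbered = {number: case for number, case in enumerate(frame, start=1)}
    # Select as many distinct numbers as the required sample size
    # (a pseudo-random generator replaces the table of random numbers).
    chosen_numbers = rng.sample(range(1, len(frame) + 1), sample_size)
    # Select the cases which correspond to the chosen numbers.
    return [numbered[number] for number in sorted(chosen_numbers)]

# Hypothetical sampling frame of 200 households.
frame = [f"household_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, sample_size=20, seed=570)
print(len(sample), "cases selected")
```

Passing a seed makes the draw repeatable, which is convenient for a classroom exercise; omit it for a fresh draw each time.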

Systematic sampling
• Sampling fraction: divide the desired sample size by the population size
• Select from the sampling frame according to the sampling fraction, e.g., a sampling fraction of 1/5 means that we select one element for every five in the population
• Must decide where to start
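
A minimal sketch of systematic selection, assuming a frame of 100 numbered units and a desired sample of 20 (both hypothetical). The starting point is chosen at random within the first interval, which is one common way to decide where to start; the slide does not prescribe a method. The function name `systematic_sample` is made up for this example.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Select every k-th unit from the frame after a random start."""
    rng = random.Random(seed)
    # Interval k: population size divided by desired sample size (rounded down).
    interval = len(frame) // sample_size
    if interval < 1:
        raise ValueError("sample_size cannot exceed the size of the frame")
    # Decide where to start: a random position within the first interval.
    start = rng.randrange(interval)
    return [frame[start + i * interval] for i in range(sample_size)]

# Hypothetical frame: units numbered 1..100; a sample of 20 gives a fraction of 1/5.
frame = list(range(1, 101))
sample = systematic_sample(frame, sample_size=20, seed=570)
print(sample)
```

With a sampling fraction of 1/5 the selected units are always five positions apart; only the starting point varies between draws.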

Stratified sampling
• Premise: if a sample is to be representative, then proportions for various groups in the sample should be the same as in the population
• Stratifying variable: characteristic on which we want to ensure correct representation in the sample
– Order sampling frame into groups
– Use systematic sampling to select appropriate proportion of people from each stratum
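
A sketch of proportionate stratified sampling under stated assumptions: the frame of 60 undergraduates and 40 graduates is hypothetical, the function name `stratified_sample` is made up, and simple random draws within each stratum are used here in place of the systematic selection mentioned on the slide.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Apply the same sampling fraction within every stratum."""
    rng = random.Random(seed)
    # Order the sampling frame into groups by the stratifying variable.
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        # Number to draw so the stratum keeps its population proportion.
        quota = round(len(members) * fraction)
        # Assumption: simple random draw within the stratum.
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical frame: (stratum, id) pairs.
frame = [("undergraduate", i) for i in range(60)] + [("graduate", i) for i in range(40)]
sample = stratified_sample(frame, stratum_of=lambda unit: unit[0], fraction=0.2, seed=570)
counts = Counter(stratum for stratum, _ in sample)
print(counts)
```

The 60/40 split in the population carries over to the sample as 12 undergraduates and 8 graduates, which is the representation the premise asks for.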

Cluster sampling
Involves drawing several different samples
– draw a sample of areas
– start with large areas, then progressively sample smaller areas within the larger, e.g., a city population
• Divide city into districts - select a simple random sample (SRS) of districts
• Divide sample of districts into blocks - select an SRS of blocks
• Draw list of households in each block - select an SRS of households
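
The city example can be sketched as a three-stage draw. Everything about the city below (8 districts, 5 blocks per district, 10 households per block) and the stage sizes are hypothetical, as is the function name `multistage_sample`.

```python
import random

def multistage_sample(city, n_districts, n_blocks, n_households, seed=None):
    """Draw an SRS of districts, then of blocks, then of households."""
    rng = random.Random(seed)
    selected = []
    # Stage 1: SRS of districts.
    for district in rng.sample(sorted(city), n_districts):
        blocks = city[district]
        # Stage 2: SRS of blocks within each sampled district.
        for block in rng.sample(sorted(blocks), n_blocks):
            # Stage 3: SRS of households from the list for that block.
            selected.extend(rng.sample(blocks[block], n_households))
    return selected

# Hypothetical city: district -> block -> list of households.
city = {
    f"district_{d}": {
        f"block_{d}_{b}": [f"household_{d}_{b}_{h}" for h in range(10)]
        for b in range(5)
    }
    for d in range(8)
}
sample = multistage_sample(city, n_districts=3, n_blocks=2, n_households=4, seed=570)
print(len(sample), "households selected")
```

Household lists are needed only for the blocks that were actually sampled, which is the practical appeal of this design when no complete frame of households exists.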

Random Samples

Advantages
– Ability to generalise from sample to population using statistical techniques (inferential statistics)
– High probability that the sample is generally representative of the population on variables of interest

Non-random Samples

• Purposive
• Quota
• Accidental
• Generalizability based on “argument”
– Replication
– Sample “like” the population

Selecting a sampling method

• Depends on the population
• Problem and aims of the research
• Existence of sampling frame

Conclusion
• The purpose of sampling is to select a set of elements from the population in such a way that what we learn about the sample can be generalised to the population from which it was selected
• The sampling method used determines the generalizability of findings

Random samples Non-random sample

Research Exercise

10-12 minutes: work alone
15 minutes: in teams, compare solutions
15 minutes: discussion