Research methodology

8/14/2019


Surveys



What is it?



A survey is a measurement process used to collect information during a highly structured interview, sometimes with a human interviewer and other times without. The questions are carefully chosen or crafted, sequenced, and precisely asked of each participant.



Sources of error in the communication approach

Selecting or crafting inappropriate questions

Asking questions in an inappropriate order

Using inappropriate transitions and instructions to elicit information



Interviewer error

Failure to secure participant cooperation: interviewers may not do a good job of enlisting participants to cooperate.

Failure to record answers accurately and completely

Failure to consistently execute interview procedures

Failure to establish an appropriate interview environment

Falsification of individual answers or whole interviews

Inappropriate influencing behaviour

Physical presence bias



Participant error

For a successful survey, participants must meet three broad conditions:

1. They must possess the information being targeted by the investigative questions.

2. They must understand their role in the interview as the provider of accurate information.

3. They must have adequate motivation to cooperate.



Participation-based error

Three factors influence participation:

1. The participant must believe that the experience will be pleasant and satisfying.

2. The participant must believe that answering the survey is an important and worthwhile use of his or her time.

3. The participant must dismiss any mental reservations that he or she might have about participation.



Choice of the process

Refer to the Word document on surveys



Experiments and test markets

What are experiments?

1. An experiment is a study involving intervention by the researcher beyond that required for measurement.

2. The usual intervention is to manipulate some variable in the setting and observe how it affects the subjects being studied, i.e., the researcher manipulates the independent variable and then observes whether the hypothesized dependent variable is affected by the intervention.



Conducting an experiment

A researcher must accomplish certain activities to conduct a successful experiment:

1. Select relevant variables

2. Specify the treatment levels

3. Control the experimental environment

4. Choose the experimental design

5. Select and assign the subjects

6. Pilot test, revise, and test

7. Analyze the data



HYPOTHESIS: A relational statement that describes the relationship between two or more variables.

Treatment levels:

In an experiment, participants experience a manipulation of the independent variable, called the experimental treatment.

The treatment levels of the independent variable are the arbitrary or natural groups the researcher makes within the independent variable of an experiment.

E.g.: if salary is hypothesized to have an effect on employees' exercising of stock purchase options, it might be divided into high, middle, and low ranges to represent three levels of the independent variable.
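The salary example can be sketched in a few lines of Python. This is only an illustration: the function name `salary_level`, the cut-off values (40,000 and 80,000), and the listed salaries are hypothetical and do not come from the slides.

```python
def salary_level(salary, low_cutoff=40_000, high_cutoff=80_000):
    """Map a salary to one of three treatment levels (cut-offs are hypothetical)."""
    if salary < low_cutoff:
        return "low"
    if salary < high_cutoff:
        return "middle"
    return "high"

# Hypothetical salaries, used only to show the grouping.
salaries = [25_000, 52_000, 95_000, 40_000, 79_999]
levels = [salary_level(s) for s in salaries]
print(levels)
```

Where the cut-offs fall is the researcher's decision, which is why the slide calls the groups "arbitrary or natural".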



Miscellaneous terms

The control group is composed of subjects who are not exposed to the independent variable, in contrast to those who receive the experimental treatment.

When the subjects do not know whether they are receiving the treatment, they are said to be blind.

When the experimenters also do not know whether they are giving the treatment to the experimental group or to the control group, the experiment is said to be double blind.
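One common way to form the two groups is random assignment. The sketch below assumes that approach; the function name `assign_groups`, the fixed seed, and the 20 numbered subjects are hypothetical choices for illustration.

```python
import random

def assign_groups(subjects, seed=0):
    """Randomly split subjects into an experimental and a control group."""
    rng = random.Random(seed)      # fixed seed only so the example is repeatable
    shuffled = list(subjects)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subject_ids = list(range(1, 21))   # hypothetical subject identifiers
experimental, control = assign_groups(subject_ids)
print(sorted(experimental))
print(sorted(control))
```

In a blind design the subjects would not be told which list they are on; in a double-blind design the experimenters administering the treatment would not be told either.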



Sampling Design



Lucky ones get to work with these

The rest of us mere mortals have to make do with:

[Figure: bar chart with series East, West, and North by quarter, vertical axis from 0 to 100]



Sampling is choosing which subjects to measure in a research project.

Sampling determines how much and how well the researcher may generalize his or her findings. A bad sample may well render findings meaningless.



Key concepts and terms

Population: The population is the set of people or entities to which findings are to be generalized.

The population must be defined explicitly before a sample is taken.

Enumerations or censuses are collections of data from every person or entity in the population.



Random sampling is data collection in which every person in the population has a chance of being selected that is known in advance. Normally this is an equal chance of being selected.

If data are a random sample, the researcher must report not only the magnitude of relationships uncovered but also their significance level (the chance that the findings are due to the chance of sampling).



The sampling frame is the list of ultimate sampling entities, which may be people, households, organizations, or other units of analysis. The list of registered students may be the sampling frame for a survey of the student body at a university.

Telephone directories are often used as sampling frames, for instance, but tend to under-represent the poor (who have fewer or no phones) and the wealthy (who have unlisted numbers).

Random digit dialing (RDD) reaches unlisted numbers but not those with no phones, while over-representing households owning multiple phones.

In multi-stage sampling, there will be one sampling frame per stage.



Significance is the percent chance that a relationship found in the data is just due to an unlucky sample, so that if we took another sample we might find nothing.

That is, significance is the chance of a Type I error: the chance of concluding we have a relationship when we do not. Social scientists often use the .05 level as a cutoff: if there is a 5% or smaller chance that a relationship is just due to chance, we conclude the relationship is real (technically, we reject the null hypothesis that the strength of the relationship is not different from zero).




Ruling out Chance as an Explanation

When an independent variable appears to have an effect, it is very important to be able to state with confidence that the effect was really due to the variable and not just due to chance.

Consider a hypothetical experiment on a new antidepressant drug.

Ten people suffering from depression were sampled and treated with the new drug (the experimental group);

an additional 10 people were sampled from the same population and were treated only with a placebo (the control group).

After 12 weeks, the level of depression in all subjects was measured, and it was found that the mean level of depression (on a 10-point scale with higher numbers indicating more depression) was 4 for the experimental group and 6 for the control group.

http://davidmlane.com/hyperstat/A29697.html



The most basic question that can be asked here is: "How can one be sure that the drug treatment, rather than chance occurrences, was responsible for the difference between the groups?"

It could be that, by chance, the people who were randomly assigned to the treatment group were initially somewhat less depressed than those randomly assigned to the control group.
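One way to put a number on "just due to chance" is a permutation test, sketched below. This is not the method the slides prescribe, only an illustration. The individual scores are hypothetical, invented so that the group means match the 4 and 6 quoted in the example; the function names and the number of shuffles are also assumptions.

```python
import random

# Hypothetical individual scores; only the group means (4 and 6) come from the example.
drug = [3, 4, 5, 4, 3, 5, 4, 4, 5, 3]
placebo = [6, 5, 7, 6, 6, 5, 7, 6, 6, 6]

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=1):
    """Share of random relabelings whose mean difference is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = group_a + group_b          # new list, so the inputs are not modified
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)             # pretend group labels were handed out by chance
        first, second = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(mean(first) - mean(second)) >= observed:
            extreme += 1
    return observed, extreme / n_shuffles

observed_diff, p_value = permutation_p_value(drug, placebo)
print(observed_diff, p_value)
```

If chance alone rarely produces a gap as large as the observed one, chance becomes an unconvincing explanation for the difference between the groups.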



Confidence intervals are directly related to coefficients of significance. For a given variable in a given sample, one could compute the standard error; assuming a normal distribution, the 95% confidence interval is then the sample estimate plus or minus 1.96 times the standard error.

If a very large number of samples were taken, and a (possibly different) estimated mean and corresponding 95% confidence interval were constructed from each sample, then 95% of these confidence intervals would contain the true population value, assuming random sampling.

The formulas for calculating confidence intervals, significance levels, standard errors, etc. will be discussed later.

http://www2.chass.ncsu.edu/garson/pa765/normal.htm
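The repeated-sampling reading of a 95% confidence interval can be checked with a small simulation. The sketch below assumes a normally distributed population with a mean of 5 and a standard deviation of 2; those values, the sample size of 50, and the function name `coverage` are hypothetical.

```python
import random
import statistics

def coverage(n_samples=2_000, sample_size=50, true_mean=5.0, true_sd=2.0, seed=42):
    """Fraction of 95% intervals (mean +/- 1.96 * SE) that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        m = statistics.mean(sample)
        se = statistics.stdev(sample) / sample_size ** 0.5
        if m - 1.96 * se <= true_mean <= m + 1.96 * se:
            hits += 1
    return hits / n_samples

share = coverage()
print(share)
```

The printed share should land close to 0.95, though slightly below it is expected because the standard deviation is itself estimated from each sample.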



Standard error. If we took several samples of the same thing we would, of course, be able to compute several means, one for each sample.

If we computed the standard deviation of these sample means as an estimate of their variation around the true but unknown population mean, that standard deviation of means is called the standard error.

Standard error measures the variability of sample means. However, since we normally have only one sample but still wish to assess its variability, we can compute the estimated standard error by this formula:

SE = sd/SQRT(n)

where sd is the standard deviation for a variable and n is the sample size. We are estimating that SE diminishes in inverse proportion to the square root of n: the larger the n, the smaller the SE. Often the estimated standard error is just called the 'standard error.'
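The formula translates directly into code. In this minimal sketch the scores are hypothetical, and the helper names `se_from_sd` and `standard_error` are illustrative rather than taken from any library.

```python
import math
import statistics

def se_from_sd(sd, n):
    """Estimated standard error from a standard deviation and a sample size."""
    return sd / math.sqrt(n)

def standard_error(values):
    """Estimated standard error of the mean for one sample."""
    return se_from_sd(statistics.stdev(values), len(values))

scores = [4, 6, 5, 7, 3, 5, 6, 4]   # hypothetical sample
mean = statistics.mean(scores)
se = standard_error(scores)
ci_95 = (mean - 1.96 * se, mean + 1.96 * se)
print(mean, se, ci_95)

# Quadrupling n halves the standard error for the same sd.
print(se_from_sd(2.0, 100), se_from_sd(2.0, 400))
```

The last line shows the square-root relationship: going from 100 to 400 cases only halves the standard error.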



Census & sample survey

A census is basically a complete enumeration of all items in the population. Such an inquiry should cover all items, and nothing should be left to chance.

But remember that in practice this may not always be true.

A census would require a great deal of time, money, and resources.

For these reasons most studies undertake sample surveys instead.



What happens in a sample design process

The researcher selects respondents who are representative of the total population.

The selected respondents constitute what is called a sample, and the selection process is called the sampling technique.

The survey that is conducted is called a sample survey.

Arithmetically: if we let the population size be N and a part of it be n, where (n



Two Sorts of Statistics

Descriptive statistics

To describe and summarize the characteristics of the sample

Applied in the context of exploratory techniques

Inferential statistics

To infer something about the population from the sample

Applied in the context of confirmatory methods



From Descriptive to Inferential

We have to look at some aspects of the data we use first

The most important aspect of inferential statistics is the selection of the sample

A statistic is meaningless if the sample is not representative

We must consider:

Data Acquisition, Quality, & Collection Procedures

Sampling Design & Methods



Data Acquisition

Any descriptive summaries that we form from a data set, or any inferences that we draw from the data set, fundamentally rely upon the notion that the observations that the data record are an accurate reflection of the phenomenon of interest at the time they were taken

To have any confidence in the usefulness of a dataset, we need to be aware of how the data was collected, and by whom, and make use of that information to inform our judgment about how sound that source of data is for a given purpose



Data Acquisition

The fundamental distinction we can draw between sources of data is data that you have collected yourself, versus data that has been collected by others and archived

Collected - In many ways, this is the best sort of data because you can be absolutely certain of the methods used, although this can be expensive

Archived - Has the competing merit of already being available, possibly having been collected over a period of time, and others have undertaken the expense of doing so



Collected Data

Collected Data - a.k.a. primary data, is collected directly by the researcher through experiments, measurements, field surveys etc.

Benefits: Total certainty as to the methods used and the error associated with them; can be customized to the research question; the methods can be precisely repeated on another occasion or in another location

Drawbacks: Collecting data is expensive; there may not be a comparable historical record of similar measurements; gives critics an opportunity to criticize your data collection as well!



Collected Data

Collected Data Cont. - We can further sub-divide collected data into categories that denote the sort of collection procedure used to produce the data:

Experimental (controlled experiment) data is produced under repeatable conditions and is presumably an objective description of some phenomenon (often used in physical geography)

Non-experimental data, such as interviews or questionnaires, is used to assess more qualitative or subjective ideas or concepts (often applied in the human geography context)



Archived Data

Archived Data - Data that is already available because it has been collected by someone else

Benefits: The expense of collecting the data has been absorbed already; the methods used are often a standard approach that allows for inter-comparison with historical records or records for other places

Drawbacks: One cannot be as sure of the data quality, methods, and associated errors here (sometimes metadata is not available); the variable of interest may not be available, or your definition may vary slightly from that used by others



Archived Data

Archived Data cont. - We can characterize archived data as being internal (meaning it was collected by another member of your organization) or external (meaning it was collected by someone you do not know as well). We can call these:

Secondary data, which is obtained directly from those that did collect the data

Tertiary data, which we obtain from a third party (sometimes via publication, sometimes not); often this is data which has already been analyzed or transformed somehow



Data Quality

The further removed we are from those that actually collect and create a data set, the worse off we are when using that data

The results of any statistical study are only as good as the data that was used; thus the quality of the data is very important, because it in turn determines the quality and reliability of descriptions and inferences based upon it

Data obtained externally should be used only after a serious investigation and consideration of its quality and reliability



Sampling Populations

Typically, when we collect data, we are somewhat limited in the scope of what information we can reasonably collect

Ideally, we would enumerate each and every member of a population so we could know its parameters perfectly

In most cases this is not possible, because of the size of the population (infinite populations?) and associated costs (time, money, etc.)

Usually it is not necessary, because by collecting data on an appropriate subset of the population we can create statistics that are adequate estimates of population parameters

Instead, we sample a population, trying to get information about a representative subset of the population



Sampling Concepts

We must define the sampling unit - the smallest sub-division of the population that becomes part of our sample

We want to minimize sampling error when we design how we will collect data: Typically the sampling error decreases as the sample size increases, because larger samples make up a larger proportion of the population (and a complete census, for example, theoretically has no sampling error)

We want to try and avoid sampling bias when we design how we will collect data: Bias here refers to a systematic tendency in the selection of members of a population to be included in a sample, i.e. any given member of a population should have an equal chance of being included in the sample for random sampling
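The link between sample size and sampling error can be illustrated with a short simulation. The population below is hypothetical (10,000 integer values between 0 and 100), and the function name `average_sampling_error` and the repeat count are assumptions made for the sketch.

```python
import random

rng = random.Random(7)
# Hypothetical population; integers keep the census arithmetic exact.
population = [rng.randint(0, 100) for _ in range(10_000)]
true_mean = sum(population) / len(population)

def average_sampling_error(sample_size, repeats=200):
    """Mean absolute gap between the sample mean and the population mean."""
    total = 0.0
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)   # drawn without replacement
        total += abs(sum(sample) / sample_size - true_mean)
    return total / repeats

errors = {n: average_sampling_error(n) for n in (10, 100, 1_000)}
census_error = average_sampling_error(len(population), repeats=1)
print(errors, census_error)
```

The error shrinks as the sample grows and vanishes when every unit is included, which is the census case mentioned above.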



Steps in Sampling

1. Definition of the population - We first need to identify the population we wish to sample, and do so somewhat formally because any inferences we draw are really only applicable to that population

2. Construction of a sampling frame - This involves identifying all the individual sampling units within a population in order that the sample can be drawn from them. In a survey-type study, this could involve procuring a list of all the potential individuals who could be included in a sample.



Steps in Sampling Cont.

3. Selection of a sampling design - This is a critical decision about how to collect the sample. We will look at some different sampling designs in the following slides

4. Specification of information to be collected - The formal definition of what data we will collect and how. Often, a pilot sample is conducted to refine the sampling design and specifications, to help minimize biases that only become apparent once the sampling design and specifics are tested

5. Collection of the data - When we have steps 1-4 straight, we go about collecting the sample



Types of Samples (Designs)

We can distinguish between two families of sampling designs:

Non-probability designs are not concerned with being representative by virtue of minimizing bias, are typically used for non-scientific purposes, and are not appropriate for statistical inference studies, although they can be useful in an investigative sense

Probability designs aim to be representative of the population they sample, follow rules of randomness in selection to minimize bias, and are those that are used in scientific studies where inferential statistics will be used



Non-probability Sampling Designs

Some types of non-probability designs:

Volunteer sampling - A self-selecting sample, which is convenient, but rarely representative

Quota sampling - Researchers select individuals to include based on fulfilling counts of sub-groups

Convenience sampling - Individuals are included in the sample because they are available/accessible

Judgmental or purposive sampling - Those that are chosen to be included in the sample are chosen based upon some preconceived notions of what sorts of individuals would be most appropriate for this investigative purpose (e.g. product testing based on ideas about the market for a product)



Probability Sampling Designs - Random

Random sampling - In general, we need some degree of randomness in the selection of a sample to be able to draw any meaningful inferences about a population, but in some cases this may conflict with representativeness

These are drawn in such a way that every unit of a population has an equal chance of being chosen and the selection of one unit has no impact on whether or not another individual will be selected (independence)

This can be done with or without replacement (which determines whether the same unit can be drawn twice)

We can generate random numbers using a table or using a computer, and can scale the 0 to 1 values to any required range of values
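Drawing with and without replacement, and scaling 0 to 1 random values to a required range, might look like the sketch below. The frame of 200 labelled units, the sample size of 10, and the helper name `scale_to_index` are hypothetical.

```python
import random

rng = random.Random(3)
frame = [f"unit-{i}" for i in range(1, 201)]   # hypothetical sampling frame

def scale_to_index(u, size):
    """Scale a random value u in [0, 1) to an index between 0 and size - 1."""
    return int(u * size)

# With replacement: the same unit may be drawn more than once.
with_replacement = [frame[scale_to_index(rng.random(), len(frame))] for _ in range(10)]

# Without replacement: each unit can appear at most once.
without_replacement = rng.sample(frame, 10)

print(with_replacement)
print(without_replacement)
```

Python's `random.sample` handles the without-replacement case directly; the manual scaling is shown only because it mirrors what one would do with a table of random numbers.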



Probability Sampling Designs - Systematic

Representative approaches place restrictions on selection:

Systematic sampling - This approach uses every kth element of the sampling frame, beginning at a randomly chosen point in the frame, e.g. given a sampling frame of size = 200, to create a sample of size n = 10 from such a frame, select a random point to begin within the frame and then include every 20th value in the systematic sample

This approach assumes that the assignment of the individuals in the sampling frame is random (i.e. they have not been placed in the frame in some order or grouping), and this should be checked before systematically sampling from a frame
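The 200-unit example can be sketched as follows. One assumption goes beyond the slide: the random starting point is restricted to the first k positions so that exactly n units are selected. The function name `systematic_sample` and the numbered frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th element after a random start within the first k positions."""
    k = len(frame) // n                       # sampling interval, 20 in the example
    start = random.Random(seed).randrange(k)  # assumed: start falls in the first interval
    return frame[start::k][:n]

frame = list(range(1, 201))                   # hypothetical frame of 200 numbered units
sample = systematic_sample(frame, 10, seed=5)
print(sample)
```

Every selected unit sits exactly 20 positions after the previous one, so the only random element is the starting point.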



Probability Sampling Designs - Systematic

Some problems with systematic sampling:

The possible values of sample size n are somewhat restricted by the size of the sampling frame, since the interval should divide evenly into the size of the sampling frame

If the population itself exhibits some periodicity, then a systematic sample is likely to not be representative
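The periodicity problem is easy to demonstrate. In this hypothetical frame every 10th unit has the value 100 and all others have the value 0, while the sampling interval is 20; the frame and the variable names are invented for illustration.

```python
# Hypothetical frame with a period of 10: positions 0, 10, 20, ... hold 100.
frame = [100 if i % 10 == 0 else 0 for i in range(200)]
true_mean = sum(frame) / len(frame)

k = 20  # sampling interval, a multiple of the period
# Mean of the systematic sample for every possible starting point.
sample_means = [sum(frame[start::k]) / len(frame[start::k]) for start in range(k)]

print(true_mean)
print(sorted(set(sample_means)))
```

Because the interval is a multiple of the period, each possible sample contains only peaks or only zeros, so no starting point gives a sample mean near the true mean of 10.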




Probability Sampling Designs - Stratified

We may need to place restrictions on how we select units for inclusion in a sample to ensure a representative sample.

Stratified sampling - Divide the population into categories and select a random sample from each of these

This approach can be used to decrease the likelihood of an unrepresentative sample if the classes/categories/strata are selected carefully (the individuals within a stratum must be very much alike, which means that the population must be able to be divided into relatively homogeneous groups)

We need to know something about the population in order to make good decisions about stratification





We can take a stratified sample that is

Proportional - Where the random sample drawn from each class/category/stratum is the same proportion of that class/category/stratum, OR

Disproportional - Where different sampling fractions are used in each class/category/stratum, with the sample size usually being chosen on the basis of the size of that sub-population. This approach is best used when the sizes of the categories are significantly different, although it can also be applied to mitigate cost issues (i.e. it may be more costly to sample in a swamp than in a grassy field, so we might choose to take fewer samples in the swamp, although this clearly would do nothing to enhance representativeness in our sample)
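A stratified draw in which every stratum contributes the same fraction of its units can be sketched as below. The strata names and sizes, the total sample size of 50, and the function name `stratified_sample` are hypothetical; rounding means the per-stratum counts may not always add up exactly to the requested total.

```python
import random

def stratified_sample(strata, total_n, seed=0):
    """Draw from each stratum a share of total_n equal to its share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_stratum = round(total_n * len(units) / population_size)
        sample[name] = rng.sample(units, n_stratum)   # random draw within the stratum
    return sample

# Hypothetical strata of very different sizes.
strata = {
    "grassland": [f"g{i}" for i in range(600)],
    "forest": [f"f{i}" for i in range(300)],
    "swamp": [f"s{i}" for i in range(100)],
}
sample = stratified_sample(strata, total_n=50)
print({name: len(units) for name, units in sample.items()})
```

Using a different fraction per stratum, for example to oversample a small stratum or to save cost in the swamp, would only require replacing the line that computes `n_stratum`.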





WARNING:

A class/category/stratum that is homogeneous with respect to one variable may have high variation with respect to another variable! Thus, stratification must be performed with some foreknowledge of how the sample will be analyzed, and if the sampling is being performed in a preliminary fashion (still seeking the relationships), there is a danger that the stratification will be found to be inappropriate after the fact




Probability Sampling Designs - Cluster

Another sampling approach that subdivides the population into categories is cluster sampling

Cluster sampling - Divides the population into categories based on convenience rather than some structure designed to promote unbiased representation of a particular variable across all clusters, and sampling is performed within individual clusters

Certain clusters are selected for intensive study, usually by a random procedure, and the content of each cluster should be heterogeneous (a cross-section of the range of values seen in the whole population), and thus representative

This is usually applied for reasons of cost and convenience
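Selecting whole clusters by a random procedure can be sketched as below. The eight neighborhoods of five houses each and the function name `cluster_sample` are hypothetical, and the sketch assumes every unit in a chosen cluster is studied.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly choose whole clusters; every unit in a chosen cluster is studied."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: 8 neighborhoods with 5 houses each.
clusters = {
    f"neighborhood-{i}": [f"house-{i}-{j}" for j in range(5)]
    for i in range(8)
}
selected = cluster_sample(clusters, 3)
print(sorted(selected))
```

Only the selected neighborhoods need to be visited, which is where the savings in cost and convenience come from.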



Choosing a Sampling Design

In a geographic context:

Stratified sampling works best if the regions are reasonably homogeneous

Cluster sampling works best if the regions are heterogeneous

From an efficiency point of view (the number of samples required), stratified sampling is best since it can be representative using a smaller number of samples, but if there is no clear means of rational stratification, then clustering might be the way to go

Many sampling designs are hybrids of approaches (e.g. stratify by ethnic group, cluster to pick neighborhoods, select houses randomly)
