Research methodology
TRANSCRIPT
-
8/14/2019 Research methodology
1/51
Surveys
-
What is it?
-
A survey is a measurement process used to collect
information during a highly structured interview
sometimes with a human interviewer and other times
without. The questions are carefully chosen or crafted,
sequenced, and precisely asked for each participant.
-
The sources of error in the communication approach
Selecting or crafting inappropriate questions
Asking questions in an inappropriate order
Using inappropriate transitions and
instructions to elicit information
-
Interviewer error
Failure to secure participant cooperation:
interviewers may fail to do a good job of
enlisting participants to cooperate.
Failure to record answers accurately
and completely
Failure to consistently execute
interview procedures
Failure to establish appropriate
interview environment
Falsification of individual answers or
whole interviews
Inappropriate influencing behaviour
Physical presence bias
-
Participant error
For a successful survey three broad conditions must be met by
the participants:
1. They must possess the information being targeted by the
investigative questions.
2. They must understand their role in the interview as the provider of
accurate information.
3. They must have adequate motivation to cooperate.
-
Participation-based error
Three factors influence participation. The participant:
1. Must believe that the experience will be pleasant and satisfying.
2. Must believe that answering the survey is an important and
worthwhile use of his or her time.
3. Must dismiss any mental reservations that he or she might
have about participation.
-
Choice of the process
Refer to the Word document on surveys
-
Experiments and test markets
What are experiments?
1. A study involving intervention by the researcher beyond that
required for measurement.
2. The usual intervention is to manipulate some variable in the setting and observe how it affects the subjects being studied,
i.e. the researcher manipulates the independent variable and
then observes whether the hypothesized dependent variable is
affected by the intervention.
-
Conducting an experiment
A researcher must accomplish certain activities to conduct a
successful experiment:
1. Select relevant variables
2. Specify the treatment levels
3. Control the experimental environment
4. Choose the experimental design
5. Select and assign the subjects
6. Pilot test, revise and test
7. Analyze the data
-
HYPOTHESIS: It is a relational statement, as it describes the relationship between two or more variables.
Treatment levels:
In an experiment, participants experience a manipulation of the independent variable, called the experimental treatment.
The treatment levels of the independent variable are the arbitrary or natural groups the researcher makes within the independent variable of an experiment.
e.g.: if salary is hypothesized to have an effect on employees' exercising of stock purchase options, it might be divided into high, middle and low ranges to represent three levels of the independent variable.
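As a rough illustration of treatment levels, the sketch below bins salaries into low, middle and high ranges. The cut-off values, the function name and the sample salaries are hypothetical, chosen only to show the idea.

```python
def salary_level(salary, low_cutoff=40_000, high_cutoff=80_000):
    """Map a salary to one of three treatment levels (cut-offs are assumed)."""
    if salary < low_cutoff:
        return "low"
    if salary < high_cutoff:
        return "middle"
    return "high"

# Made-up salaries, one per employee in the study.
salaries = [25_000, 52_000, 95_000, 40_000, 80_000]
levels = [salary_level(s) for s in salaries]
print(levels)
```

Where the cut-offs fall is a design decision for the researcher; natural groupings (for example pay grades) could be used instead of arbitrary ones.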
-
Miscellaneous terms
The control group is composed of subjects who are not exposed
to the manipulated independent variable, in contrast to those who receive the
experimental treatment.
When the subjects do not know whether they are receiving the treatment, they
are said to be blind.
When the experimenters also do not know whether they are giving the
treatment to the experimental group or to the control group, the
experiment is said to be double blind.
-
Sampling Design
-
Lucky ones get to work with these.
The rest of us mere mortals have to make do with.
[Figure: bar chart with series East, West and North by quarter; vertical axis from 0 to 100]
-
Sampling is choosing which subjects to measure in a
research project.
Sampling determines how much and
how well the researcher may generalize his or her findings. A bad sample may well render findings
meaningless.
-
Key concepts and terms
Population: The population is the set of people or
entities to which findings are to be generalized.
The population must be defined explicitly before a
sample is taken. Enumerations or censuses are collections of data
from every person or entity in the population.
-
Random sampling is data collection in which every person in the population has a chance of being selected which is known in advance. Normally this is an equal chance of being selected.
If data are a random sample, the researcher must report not only the magnitude of relationships uncovered but also their significance level (the chance that the findings are due to the chance of sampling).
-
The sampling frame is the list of ultimate sampling entities,
which may be people, households, organizations, or other units
of analysis. The list of registered students may be the sampling
frame for a survey of the student body at a university.
Telephone directories are often used as sampling frames, for
instance, but tend to under-represent the poor (who have fewer
or no phones) and the wealthy (who have unlisted numbers).
Random digit dialing (RDD) reaches unlisted numbers but not
those with no phones, while over-representing households owning multiple phones. In multi-stage sampling, there will be
one sampling frame per stage.
-
Significance is the percent chance that a relationship as strong as the one found in the data would arise just from an unlucky sample when no relationship exists in the population, so that if we took another sample we might find nothing.
That is, significance is the chance of a Type I error: the chance
of concluding we have a relationship when we do not. Social scientists often use the .05 level as a cutoff: if there is a 5% or smaller chance that a relationship this strong would appear by chance alone, we conclude the relationship is real (technically, we reject the null hypothesis that the strength of the relationship is not
different from zero).
-
Ruling out Chance as an Explanation
When an independent variable appears to have an effect, it is very important to be able to state with confidence that the effect was really due to the variable and not just due to chance.
Consider a hypothetical experiment on a new antidepressant drug.
Ten people suffering from depression were sampled and treated with the new drug (the experimental group);
an additional 10 people were sampled from the same population and were treated only with a placebo (the control group).
After 12 weeks, the level of depression in all subjects was measured and it was found that the mean level of depression (on a 10-point scale with higher numbers indicating more depression) was 4 for the experimental group and 6 for the control group.
http://davidmlane.com/hyperstat/A29697.html
-
The most basic question that can be asked here is: "How can
one be sure that the drug treatment rather than chance occurrences was responsible for the difference between the
groups?"
It could be that by chance, the people who were randomly
assigned to the treatment group were initially somewhat less
depressed than those randomly assigned to the control group.
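One way to put a number on "could this be chance?" is a permutation test, sketched below. It is not necessarily the method the slides go on to use. The individual scores are invented so that the group means match the example (4 and 6); the function names, number of shuffles and seed are likewise assumptions.

```python
import random

# Hypothetical depression scores (10-point scale), invented so that the
# group means match the example: 4 for the drug group, 6 for the placebo group.
drug = [2, 3, 3, 4, 4, 4, 4, 5, 5, 6]
placebo = [4, 5, 5, 6, 6, 6, 6, 7, 7, 8]

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often random re-assignment of the pooled scores gives a
    mean difference at least as large (in absolute value) as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_shuffles + 1)

p_value = permutation_p_value(drug, placebo)
print(f"observed difference in means: {mean(placebo) - mean(drug):.1f}")
print(f"approximate p-value: {p_value:.4f}")
```

A small p-value says that random assignment alone would rarely produce a gap this large; it does not by itself say how large the drug's effect is.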
-
Confidence intervals are directly related to coefficients of significance. For a given variable in a given sample, one could compute the standard error; assuming a normal distribution, the 95% confidence interval is the sample estimate plus or minus
1.96 times the standard error.
If a very large number of samples were taken, and a (possibly
different) estimated mean and corresponding 95% confidence interval were constructed from each sample, then 95% of these confidence intervals would contain the true population value,
assuming random sampling.
The formulas for calculating confidence intervals,
significance levels, standard errors, etc. will be discussed later.
http://www2.chass.ncsu.edu/garson/pa765/normal.htm
-
Standard error. If we took several samples of the same thing we would, of course, be able to compute several means, one for each sample.
If we computed the standard deviation of these sample means as an estimate of their variation around the true but unknown population mean,
that standard deviation of means is called the standard error. Standard error measures the variability of sample means. However, since
we normally have only one sample but still wish to assess its variability, we can compute the estimated standard error by this formula:
SE = sd/SQRT(n)
where sd is the standard deviation for a variable and n is the sample size. We are estimating that SE diminishes in proportion to the inverse of the square root of n: the larger the n, the smaller the SE. Often the estimated standard error is just called 'standard error.'
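The formula can be tried out on a small sample, as in the sketch below, which also builds the normal-approximation 95% interval of plus or minus 1.96 standard errors. The scores are made up, and using the sample standard deviation (n - 1 in the denominator) for sd is an assumption.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: sd / sqrt(n).
    statistics.stdev is the sample standard deviation (n - 1 denominator)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def confidence_interval_95(sample):
    """Normal-approximation 95% interval: mean +/- 1.96 * SE."""
    centre = statistics.mean(sample)
    margin = 1.96 * standard_error(sample)
    return centre - margin, centre + margin

scores = [4, 5, 5, 6, 6, 6, 6, 7, 7, 8]  # made-up sample of 10 scores
se = standard_error(scores)
low, high = confidence_interval_95(scores)
print(f"mean = {statistics.mean(scores):.2f}, SE = {se:.3f}")
print(f"95% CI = ({low:.2f}, {high:.2f})")
```

With a sample this small, a t-based multiplier would normally replace 1.96; the value 1.96 is kept here only to mirror the text.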
-
Census & sample survey
A census is basically a complete enumeration of all items in
the population. Such an inquiry should cover all items, and
nothing should be left to chance.
But remember that in practice this may not always be true.
A census would require a great deal of time, money and
resources.
For these reasons most studies undertake sample surveys
instead.
-
What happens in a sample design process
The researcher selects a representative part of the total population.
The selected items constitute what is called a sample, and the process is called the sampling technique.
The survey that is conducted is called a sample survey.
Arithmetically: if we let the population size be N and a part of it be n where (n
-
Two Sorts of Statistics
Descriptive statistics
To describe and summarize the characteristics of the
sample
Applied in the context of exploratory techniques
Inferential statistics
To infer something about the population from the sample
Applied in the context of confirmatory methods
-
From Descriptive to Inferential
We have to look at some aspects of the data
we use first
The most important aspect of inferential
statistics is the selection of the sample
A statistic is meaningless if the sample is not
representative
We must consider:
Data Acquisition, Quality, & Collection Procedures
Sampling Design & Methods
-
Data Acquisition
Any descriptive summaries that we form from a
data set, or any inferences that we draw from the
data set, fundamentally rely upon the notion that
the observations that the data record are an accurate reflection of the phenomenon of interest
at the time they were taken
To have any confidence in the usefulness of a
dataset, we need to be aware of how the data was collected, and by whom, and make use of that
information to inform our judgment about how sound that
source of data is for a given purpose
-
Data Acquisition
The fundamental distinction we can draw between sources
of data is data that you have collected yourself, versus data
that has been collected by others and archived
Collected - In many ways, this is the best sort of data
because you can be absolutely certain of the methods used,
although this can be expensive
Archived - Has the competing merit of already being available, possibly having been collected over a period of
time, and others have undertaken the expense of doing so
-
Collected Data
Collected Data - a.k.a. primary data, is collected directly by
the researcher through experiments, measurements, field
surveys etc.
Benefits: Total certainty as to methods used and error
associated with them, can be customized to the research
question, the methods can be precisely repeated on another
occasion or in another location
Drawbacks: Collecting data is expensive, there may not be
a comparable historical record of similar measurements, gives
critics an opportunity to criticize your data collection as well!
-
Collected Data
Collected Data Cont. - We can further sub-divide collected
data into categories that denote the sort of collection
procedure used to produce the data:
Experimental (controlled experiment) data is produced
under repeatable conditions and is presumably an objective
description of some phenomenon (often used in physical
geography)
Non-experimental data, such as interviews or
questionnaires, is used to assess more qualitative or
subjective ideas or concepts (often applied in the human
geography context)
-
Archived Data
Archived Data - Data that is already available because it
has been collected by someone else
Benefits: The expense of collecting the data has been absorbed already, the methods used are often a standard
approach that allows for inter-comparison with historical
records or records for other places
Drawbacks: One cannot be as sure of the data quality, methods and associated errors here (sometimes metadata is
not available), the variable of interest may not be available, or
your definition may vary slightly from that used by others
-
Archived Data
Archived Data cont. - We can characterize archived data
as being internal (meaning it was collected by another
member of your organization), or external (meaning it was
collected by someone you do not know as well). We can
call these:
Secondary data, which is obtained directly from those that
did collect the data
Tertiary data, which we can obtain from a third party (sometimes via publication, sometimes not); often this is
data which has already been analyzed or transformed
somehow
-
Data Quality
The further removed we are from those that actually
collect and create a data set, the worse off we are when
using that data
The results of any statistical study are only as good as
the data that was used, thus the quality of the data is very
important because it in turn determines the quality and
reliability of descriptions and inferences based upon it
Data obtained externally should be used only after a
serious investigation and consideration of its quality and
reliability
-
Sampling Populations
Typically, when we collect data, we are somewhat limited in the scope
of what information we can reasonably collect
Ideally, we would enumerate each and every member of a population
so we could know its parameters perfectly
In most cases this is not possible, because of the size of the
population (infinite populations?) and associated costs (time, money,
etc.)
Usually it is not necessary, because by collecting data on an appropriate subset of the population we can create statistics that are
adequate estimates of population parameters
Instead, we sample a population, trying to get information about a
representative subset of the population
-
Sampling Concepts
We must define the sampling unit - the smallest sub-division of the
population that becomes part of our sample
We want to minimize sampling error when we design how we will
collect data: Typically the sampling error decreases as the sample size increases, because larger samples make up a larger proportion of the population
(and a complete census, for example, theoretically has no sampling
error)
We want to try and avoid sampling bias when we design how we
will collect data: Bias here refers to a systematic tendency in
the selection of members of a population to be included in a sample,
i.e. any given member of a population should have an equal chance of
being included in the sample for random sampling
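The link between sample size and sampling error can be seen in a small simulation such as the one below. The population is invented (10,000 integer scores), and the sample sizes, repetition count and seed are arbitrary assumptions.

```python
import random

rng = random.Random(42)

# Hypothetical population: 10,000 integer scores between 0 and 100.
population = [rng.randint(0, 100) for _ in range(10_000)]
population_mean = sum(population) / len(population)

def average_sampling_error(sample_size, repetitions=200):
    """Average absolute gap between a sample mean and the population mean."""
    total = 0.0
    for _ in range(repetitions):
        sample = rng.sample(population, sample_size)  # drawn without replacement
        total += abs(sum(sample) / sample_size - population_mean)
    return total / repetitions

errors = {n: average_sampling_error(n) for n in (10, 100, 1_000)}
# A "sample" containing every unit is a census and has no sampling error.
census_error = average_sampling_error(len(population), repetitions=1)

for n, err in errors.items():
    print(f"n = {n:>5}: average error = {err:.2f}")
print(f"census: error = {census_error:.2f}")
```

Note that this only measures sampling error; a biased selection procedure would not be repaired by drawing a larger sample.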
-
Steps in Sampling
1. Definition of the population - We first need to identify
the population we wish to sample, and do so somewhat
formally because any inferences we draw are really
only applicable to that population
2. Construction of a sampling frame - This involves identifying all the individual sampling units within a
population in order that the sample can be drawn from
them. In a survey-type study, this could involve
procuring a list of all the potential individuals who could be included in a sample.
-
Steps in Sampling Cont.
3. Selection of a sampling design - This is a critical decision about
how to collect the sample. We will look at some different
sampling designs in the following slides
4. Specification of information to be collected - The formal definition of what data we will collect and how. Often, a pilot
sample is conducted to refine the sampling design and
specifications to help minimize biases that only become apparent
once the sampling design and specifics are tested
5. Collection of the data - When we have steps 1-4 straight, we go
about collecting the sample
-
Types of Samples (Designs)
We can distinguish between two families of sampling designs:
Non-probability designs are not concerned with being
representative by virtue of minimizing bias, are typically used
for non-scientific purposes, and are not appropriate for statistical inference studies, although they can be useful in an
investigative sense
Probability designs aim to be representative of the population
they sample, follow rules of randomness in selection to minimize bias, and are those that are used in scientific studies
where inferential statistics will be used
-
Non-probability Sampling Designs
Some types of non-probability designs:
Volunteer sampling - A self-selecting sample, which is
convenient, but rarely representative
Quota sampling - Researchers select individuals to include based on fulfilling counts of sub-groups
Convenience sampling - Individuals are included in the
sample because they are available/accessible
Judgmental or purposive sampling - Those that are chosen to be included in the sample are chosen based upon some
preconceived notions of what sorts of individuals would be
most appropriate for this investigative purpose (e.g. product
testing based on ideas about the market for a product)
-
Probability Sampling Designs - Random
Random sampling - In general, we need some degree of
randomness in the selection of a sample to be able to draw any
meaningful inferences about a population, but in some cases this
may conflict with representativeness
These are drawn in such a way that every unit of a population has
an equal chance of being chosen and the selection of one unit has
no impact on whether or not another individual will be selected
(independence)
This can be done with or without replacement (which determines whether the same unit can be drawn twice)
We can generate random numbers using a table or using a
computer, and can scale the 0 to 1 values to any required range of
values
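A minimal sketch of these ideas using Python's standard random module follows. The frame of 200 unit labels, the seed and the helper name scale_to_index are hypothetical.

```python
import random

rng = random.Random(1)

# Hypothetical sampling frame of 200 labelled units.
frame = [f"unit_{i:03d}" for i in range(1, 201)]

# Without replacement: no unit can be drawn twice.
without_replacement = rng.sample(frame, 10)

# With replacement: the same unit may appear more than once.
with_replacement = rng.choices(frame, k=10)

def scale_to_index(u, size):
    """Scale a random number u in [0, 1) to an integer index 0..size-1."""
    return int(u * size)

# Pick one unit by scaling a 0-to-1 random value to the frame's range.
picked = frame[scale_to_index(rng.random(), len(frame))]

print(without_replacement)
print(with_replacement)
print(picked)
```

Each unit has the same chance of selection in every draw, and one draw does not influence another beyond the removal of already chosen units in the without-replacement case.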
-
Probability Sampling Designs - Systematic
Representative approaches place restrictions on selection:
Systematic sampling - This approach uses every kth element of
the sampling frame, beginning at a randomly chosen point in
the frame, e.g. given a sampling frame of size = 200, to create a sample of size n=10 from such a frame, select a random point to
begin within the frame and then include every 20th value in the
systematic sample
This approach assumes that the ordering of the individuals in the sampling frame is random (i.e. they have not been placed in
the frame in some order or grouping), and this should be checked
before systematically sampling from a frame
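The frame-of-200 example can be sketched as below. One assumption differs slightly from the wording above: the random starting point is drawn from within the first interval, which guarantees exactly n selected units without wrapping around the frame. The function name and seed are illustrative.

```python
import random

def systematic_sample(frame, sample_size, rng):
    """Take every k-th element, starting at a random point in the first interval."""
    k = len(frame) // sample_size  # sampling interval, 20 in the example
    start = rng.randrange(k)       # random start, assumed to lie in 0..k-1
    return [frame[start + i * k] for i in range(sample_size)]

frame = list(range(1, 201))        # hypothetical frame: units numbered 1..200
sample = systematic_sample(frame, 10, random.Random(7))
print(sample)
```

If the frame were sorted or grouped in some meaningful way, the same code would run but the resulting sample could be badly unrepresentative.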
-
Probability Sampling Designs - Systematic
Some problems with systematic sampling:
The possible values of sample size n are somewhat restricted by
the size of the sampling frame, since the interval should divide
evenly into the size of the sampling frame
If the population itself exhibits some periodicity, then a systematic
sample is likely to not be representative
-
Probability Sampling Designs - Stratified
We may need to place restrictions on how we select units for
inclusion in a sample to ensure a representative sample.
Stratified sampling - Divide the population into categories and
select a random sample from each of these
This approach can be used to decrease the likelihood of an
unrepresentative sample if the classes/categories/strata are selected
carefully (the individuals within a stratum must be very much alike,
which means that the population must be able to be divided into relatively homogeneous groups)
We need to know something about the population in order to make
good decisions about stratification
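A minimal sketch of stratified sampling follows. It assumes one common allocation rule, drawing the same fraction from every stratum; the land-cover strata, their sizes, the 10% fraction and the function name are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, fraction, rng):
    """Group units by stratum, then draw a simple random sample of the
    same fraction from each stratum (at least one unit per stratum)."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        size = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population of 100 sites labelled by land cover.
units = (
    [(i, "forest") for i in range(60)]
    + [(i, "grass") for i in range(60, 90)]
    + [(i, "swamp") for i in range(90, 100)]
)
sample = stratified_sample(units, lambda unit: unit[1], 0.1, random.Random(3))
counts = Counter(label for _, label in sample)
print(counts)
```

Every stratum is guaranteed to appear in the sample, which a simple random sample of ten sites could not promise for the small swamp stratum.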
-
Probability Sampling Designs - Stratified
We can take a stratified sample that is
Proportional - Where the random sample drawn from each
class/category/stratum is the same fraction of that stratum, so each sample size reflects the size of that sub-population, OR
Disproportional - Where the sampling fraction differs between classes/categories/strata. This approach is best used when the sizes of the
categories are significantly different, although it can also be applied to mitigate cost issues (i.e. it may be more costly to
sample in a swamp than in a grassy field, so we might choose to
take fewer samples in the swamp, although this clearly would do
nothing to enhance representativeness in our sample)
-
Probability Sampling Designs - Stratified
WARNING:
A class/category/stratum that is homogeneous with
respect to one variable may have high variation with
respect to another variable! Thus, stratification
must be performed with some foreknowledge of how
the sample will be analyzed, and if the sampling is
being performed in a preliminary fashion (still seeking the relationships), there is a danger that the
stratification will be found to be inappropriate after
the fact
-
Probability Sampling Designs - Cluster
Another sampling approach that subdivides the population into
categories is cluster sampling
Cluster sampling - Divides the population into categories based on
convenience rather than some structure designed to promote unbiased
representation of a particular variable across all clusters, and
sampling is performed within individual clusters
Certain clusters are selected for intensive study, usually by a
random procedure, and the contents of each cluster should
individually be heterogeneous (a cross-section of the range of values seen in the whole population), and thus representative
This is usually applied for reasons of cost and convenience
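A rough sketch of cluster sampling is given below. It assumes the simplest one-stage variant, in which whole clusters are picked at random and every unit inside a chosen cluster is studied; the neighbourhood and household labels, the counts and the seed are invented.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then include every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical population: 8 neighbourhoods with 5 households each.
clusters = {
    f"neighbourhood_{c}": [f"house_{c}_{h}" for h in range(5)]
    for c in range(8)
}
chosen, households = cluster_sample(clusters, 3, random.Random(11))
print(chosen)
print(households)
```

Fieldwork is confined to three neighbourhoods, which is where the cost saving comes from; a two-stage variant would additionally sample households within each chosen cluster.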
-
Choosing a Sampling Design
In a geographic context:
Stratified sampling works best if the regions are reasonably
homogeneous
Cluster sampling works best if the regions are heterogeneous
From an efficiency point of view (the number of samples required),
stratified sampling is best since it can be representative using a
smaller number of samples, but if there is no clear means of rational
stratification, then clustering might be the way to go
Many sampling designs are hybrids of approaches (e.g. stratify by
ethnic group, cluster to pick neighborhoods, select houses randomly)