Research methodology

Upload: rashmeet-kohli

Post on 30-May-2018


  • 8/14/2019 reaearch methodoly

    1/51

    Surveys


    What is it?

    A survey is a measurement process used to collect information during a
    highly structured interview, sometimes with a human interviewer and other
    times without. The questions are carefully chosen or crafted, sequenced,
    and precisely asked of each participant.

    The sources of error in the communication approach

    Selecting or crafting inappropriate questions

    Asking questions in an inappropriate order

    Using inappropriate transitions and instructions to elicit information

    Interviewer error

    Failure to secure participant co-operation: interviewers may not do a
    good job of enlisting participants to co-operate

    Failure to record answers accurately and completely

    Failure to consistently execute interview procedures

    Failure to establish an appropriate interview environment

    Falsification of individual answers or whole interviews

    Inappropriate influencing behaviour

    Physical presence bias

    Participant error

    For a successful survey, the participants must meet three broad
    conditions:

    1. They must possess the information being targeted by the investigative
       questions.
    2. They must understand their role in the interview as the provider of
       accurate information.
    3. They must have adequate motivation to cooperate.

    Participation-based error

    Three factors influence participation. The participant:

    1. Must believe that the experience will be pleasant and satisfying.
    2. Must believe that answering the survey is an important and worthwhile
       use of his or her time.
    3. Must dismiss any mental reservations that he or she might have about
       participation.

    Choice of the process

    Refer to the Word document on surveys.

    Experiments and test markets

    What are experiments?

    1. An experiment is a study involving intervention by the researcher
       beyond that required for measurement.
    2. The usual intervention is to manipulate some variable in the setting
       and observe how it affects the subjects being studied, i.e. the
       researcher manipulates the independent variable and then observes
       whether the hypothesized dependent variable is affected by the
       intervention.

    Conducting an experiment

    A researcher must accomplish certain activities to conduct a successful
    experiment:

    1. Select relevant variables.
    2. Specify the treatment levels.
    3. Control the experimental environment.
    4. Choose the experimental design.
    5. Select and assign the subjects.
    6. Pilot test, revise and test.
    7. Analyze the data.

    HYPOTHESIS: A hypothesis is a relational statement, as it describes the
    relationship between two or more variables.

    Treatment levels:

    In an experiment, participants experience a manipulation of the
    independent variable, called the experimental treatment.

    The treatment levels of the independent variable are the arbitrary or
    natural groups the researcher makes within the independent variable of
    an experiment.

    E.g. if salary is hypothesized to have an effect on employees' exercising
    of stock purchase options, it might be divided into high, middle and low
    ranges to represent three levels of the independent variable.
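    As a rough illustration of the salary example, the sketch below sorts
    salaries into three treatment levels. The cut-off values and the names
    CUTOFFS, treatment_level and group_by_level are assumptions made for
    this sketch; the slides do not say where the ranges should be split.

```python
from bisect import bisect_right

# Hypothetical cut-offs (not from the slides):
# below 40,000 is "low", 40,000 up to just under 80,000 is "middle",
# 80,000 and above is "high".
CUTOFFS = (40_000, 80_000)
LEVELS = ("low", "middle", "high")


def treatment_level(salary):
    """Map a salary to one of the three treatment levels."""
    # bisect_right counts how many cut-offs are <= salary (0, 1 or 2).
    return LEVELS[bisect_right(CUTOFFS, salary)]


def group_by_level(salaries):
    """Group salaries by treatment level, keeping all three keys."""
    groups = {level: [] for level in LEVELS}
    for salary in salaries:
        groups[treatment_level(salary)].append(salary)
    return groups
```

    Calling group_by_level on a list of salaries returns one list per level,
    which is the grouping an analysis of the three treatment levels would
    start from.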

    Miscellaneous terms

    The control group is composed of subjects who are not exposed to the
    independent variable, in contrast to those who receive the experimental
    treatment.

    When the subjects do not know whether they are receiving the treatment,
    they are said to be blind.

    When the experimenters also do not know whether they are giving the
    treatment to the experimental group or to the control group, the
    experiment is said to be double blind.


    Sampling Design

    Lucky ones get to work with these

    The rest of us mere mortals have to make do with.

    [Chart: East, West and North series by quarter, scale 0-100]

    Sampling is choosing which subjects to measure in a research project.

    Sampling will determine how much and how well the researcher may
    generalize his or her findings. A bad sample may well render findings
    meaningless.

    Key concepts and terms

    Population: The population is the set of people or entities to which
    findings are to be generalized.

    The population must be defined explicitly before a sample is taken.

    Enumerations or censuses are collections of data from every person or
    entity in the population.

    Random sampling is data collection in which every person in the
    population has a chance of being selected which is known in advance.
    Normally this is an equal chance of being selected.

    If data are a random sample, the researcher must report not only the
    magnitude of relationships uncovered but also their significance level
    (the chance that the findings are due to the chance of sampling).

    The sampling frame is the list of ultimate sampling entities, which may
    be people, households, organizations, or other units of analysis. The
    list of registered students may be the sampling frame for a survey of
    the student body at a university.

    Telephone directories are often used as sampling frames, for instance,
    but tend to under-represent the poor (who have fewer or no phones) and
    the wealthy (who have unlisted numbers).

    Random digit dialing (RDD) reaches unlisted numbers but not those with
    no phones, while over-representing households owning multiple phones.

    In multi-stage sampling, there will be one sampling frame per stage.

    Significance is the percent chance that a relationship as strong as the
    one found in the data would arise just from an unlucky sample when no
    relationship exists, so that if we took another sample we might find
    nothing.

    That is, significance is the chance of a Type I error: the chance of
    concluding we have a relationship when we do not. Social scientists often
    use the .05 level as a cutoff: if there is a 5% or smaller chance that a
    relationship is just due to chance, we conclude the relationship is real
    (technically, we reject the null hypothesis that the strength of the
    relationship is not different from zero).
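    The meaning of the .05 cutoff can be checked with a small simulation.
    The sketch below draws both groups from the same population, so any
    "relationship" it finds is a Type I error. The test used (a large-sample
    z-style comparison of two means against 1.96) and all names and
    parameter values are assumptions chosen for illustration, not a method
    taken from the slides.

```python
import random
import statistics


def looks_significant(a, b, z_cutoff=1.96):
    """Large-sample check: is the difference in means beyond the .05 cutoff?"""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return abs(z) > z_cutoff


def type_i_error_rate(trials=2000, n=50, seed=1):
    """Share of trials that 'find' a relationship when none exists."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(trials):
        # Both groups come from the same population, so the null hypothesis is true.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        false_alarms += looks_significant(a, b)
    return false_alarms / trials
```

    With these settings the returned rate should land close to 5%, which is
    what the cutoff promises: about one false alarm in twenty when there is
    nothing to find.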


    Ruling out Chance as an Explanation

    When an independent variable appears to have an effect, it is very
    important to be able to state with confidence that the effect was really
    due to the variable and not just due to chance.

    Consider a hypothetical experiment on a new antidepressant drug.

    Ten people suffering from depression were sampled and treated with the
    new drug (the experimental group); an additional 10 people were sampled
    from the same population and were treated only with a placebo (the
    control group).

    After 12 weeks, the level of depression in all subjects was measured, and
    it was found that the mean level of depression (on a 10-point scale with
    higher numbers indicating more depression) was 4 for the experimental
    group and 6 for the control group.

    http://davidmlane.com/hyperstat/A29697.html

    The most basic question that can be asked here is: "How can one be sure
    that the drug treatment rather than chance occurrences were responsible
    for the difference between the groups?"

    It could be that by chance, the people who were randomly assigned to the
    treatment group were initially somewhat less depressed than those
    randomly assigned to the control group.
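    One way to put a number on "just due to chance" is a permutation test:
    re-deal the 20 scores to the two groups at random many times and see how
    often the gap in means is at least as large as the one observed. This is
    a sketch of that idea, not the method used in the cited source. The
    individual scores below are invented so that the group means come out at
    4 and 6; the slides report only the means.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(treated, control, resamples=5000, seed=0):
    """Estimate how often chance alone gives a gap at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(mean(control) - mean(treated))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    hits = 0
    for _ in range(resamples):
        rng.shuffle(pooled)  # re-deal the same scores to the two groups at random
        gap = abs(mean(pooled[n_treated:]) - mean(pooled[:n_treated]))
        if gap >= observed - 1e-9:  # small tolerance for floating-point rounding
            hits += 1
    return hits / resamples


# Hypothetical individual scores, invented so the group means are 4 and 6.
drug_scores = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5]
placebo_scores = [4, 5, 5, 6, 6, 6, 7, 7, 7, 7]
p_value = permutation_p_value(drug_scores, placebo_scores)
```

    A small p_value means random re-dealing rarely reproduces a gap of 2
    points, so chance is an unlikely explanation for these made-up scores.
    With more spread-out scores the same means could easily give a large
    p_value.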

    Confidence intervals are directly related to coefficients of
    significance. For a given variable in a given sample, one could compute
    the standard error; assuming a normal distribution, the 95% confidence
    interval is the sample estimate plus or minus 1.96 times the standard
    error.

    If a very large number of samples were taken, and a (possibly different)
    estimated mean and corresponding 95% confidence interval were
    constructed from each sample, then 95% of these confidence intervals
    would contain the true population value, assuming random sampling.

    The formulas for calculating confidence intervals, significance levels,
    standard errors, etc. will be discussed later.

    http://www2.chass.ncsu.edu/garson/pa765/normal.htm

    Standard error. If we took several samples of the same thing we would,
    of course, be able to compute several means, one for each sample.

    If we computed the standard deviation of these sample means as an
    estimate of their variation around the true but unknown population mean,
    that standard deviation of means is called the standard error.

    Standard error measures the variability of sample means. However, since
    we normally have only one sample but still wish to assess its
    variability, we can compute the estimated standard error by this formula:

    SE = sd/SQRT(n)

    where sd is the standard deviation for a variable and n is the sample
    size. We are estimating that SE is inversely proportional to the square
    root of n: the larger the n, the smaller the SE. Often the estimated
    standard error is just called the 'standard error.'
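    The formula SE = sd/SQRT(n) and the plus-or-minus 1.96 rule can be tried
    out directly. The sketch below computes both for one sample and then
    repeats the sampling many times to see how often the interval contains
    the true mean. The population values (mean 50, standard deviation 10),
    the sample size and the function names are assumptions picked for the
    illustration.

```python
import math
import random
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: SE = sd / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def confidence_interval_95(sample):
    """Sample mean plus or minus 1.96 standard errors (normal approximation)."""
    centre = statistics.mean(sample)
    margin = 1.96 * standard_error(sample)
    return centre - margin, centre + margin


def coverage(true_mean=50.0, true_sd=10.0, n=100, samples=2000, seed=7):
    """Share of repeated-sample intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = confidence_interval_95(sample)
        hits += low <= true_mean <= high
    return hits / samples
```

    The value returned by coverage() should sit near 0.95. For small samples
    the 1.96 multiplier is slightly too narrow, which is why the sketch uses
    n = 100.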

    Census & sample survey

    A census is a complete enumeration of all items in the population. Such
    an inquiry should cover all items, and nothing should be left to chance.

    But remember that in practice this may not always be true.

    A census would require a great deal of time, money and resources.

    For these reasons, most studies undertake sample surveys instead.

    What happens in a sample design process

    The respondents selected should be representative of the total
    population.

    The selected respondents constitute what is called a sample, and the
    process of selecting them is called the sampling technique.

    The survey that is conducted is called a sample survey.

    Arithmetically: if we let the population size be N and a part of it be n
    where (n

    Two Sorts of Statistics

    Descriptive statistics

    To describe and summarize the characteristics of the sample

    Applied in the context of exploratory techniques

    Inferential statistics

    To infer something about the population from the sample

    Applied in the context of confirmatory methods

    From Descriptive to Inferential

    We have to look at some aspects of the data we use first

    The most important aspect of inferential statistics is the selection of
    the sample

    A statistic is meaningless if the sample is not representative

    We must consider:

    Data Acquisition, Quality, & Collection Procedures

    Sampling Design & Methods

    Data Acquisition

    Any descriptive summaries that we form from a data set, or any
    inferences that we draw from it, fundamentally rely upon the notion that
    the observations the data record are an accurate reflection of the
    phenomenon of interest at the time they were taken.

    To have any confidence in the usefulness of a dataset, we need to be
    aware of how the data was collected, and by whom, and make use of that
    information to inform our judgment about how sound that source of data
    is for a given purpose.

    Data Acquisition

    The fundamental distinction we can draw between sources of data is
    between data that you have collected yourself and data that has been
    collected by others and archived.

    Collected - In many ways, this is the best sort of data because you can
    be absolutely certain of the methods used, although this can be
    expensive.

    Archived - Has the competing merit of already being available, possibly
    having been collected over a period of time, with others having
    undertaken the expense of doing so.

    Collected Data

    Collected Data - a.k.a. primary data, is collected directly by the
    researcher through experiments, measurements, field surveys etc.

    Benefits: Total certainty as to the methods used and the error associated
    with them; the data can be customized to the research question; the
    methods can be precisely repeated on another occasion or in another
    location.

    Drawbacks: Collecting data is expensive; there may not be a comparable
    historical record of similar measurements; it gives critics an
    opportunity to criticize your data collection as well!

    Collected Data

    Collected Data Cont. - We can further sub-divide collected data into
    categories that denote the sort of collection procedure used to produce
    the data:

    Experimental (controlled experiment) data is produced under repeatable
    conditions and is presumably an objective description of some phenomenon
    (often used in physical geography).

    Non-experimental data, such as interviews or questionnaires, is used to
    assess more qualitative or subjective ideas or concepts (often applied
    in the human geography context).

    Archived Data

    Archived Data - Data that is already available because it has been
    collected by someone else.

    Benefits: The expense of collecting the data has been absorbed already;
    the methods used are often a standard approach that allows for
    inter-comparison with historical records or records for other places.

    Drawbacks: One cannot be as sure of the data quality, methods and
    associated errors here (sometimes metadata is not available); the
    variable of interest may not be available; or your definition may vary
    slightly from that used by others.

    Archived Data

    Archived Data cont. - We can characterize archived data as being
    internal (meaning it was collected by another member of your
    organization) or external (meaning it was collected by someone you do
    not know as well). We can call these:

    Secondary data, which is obtained directly from those that did collect
    the data

    Tertiary data, which we obtain from a third party (sometimes via
    publication, sometimes not); often this is data which has already been
    analyzed or transformed somehow

    Data Quality

    The further removed we are from those that actually collect and create
    a data set, the worse off we are when using that data.

    The results of any statistical study are only as good as the data that
    was used; the quality of the data is therefore very important because it
    in turn determines the quality and reliability of descriptions and
    inferences based upon it.

    Data obtained externally should be used only after a serious
    investigation and consideration of its quality and reliability.

    Sampling Populations

    Typically, when we collect data, we are somewhat limited in the scope of
    what information we can reasonably collect.

    Ideally, we would enumerate each and every member of a population so we
    could know its parameters perfectly.

    In most cases this is not possible, because of the size of the
    population (infinite populations?) and associated costs (time, money,
    etc.).

    Usually it is not necessary, because by collecting data on an
    appropriate subset of the population we can create statistics that are
    adequate estimates of population parameters.

    Instead, we sample a population, trying to get information about a
    representative subset of the population.

    Sampling Concepts

    We must define the sampling unit - the smallest sub-division of the
    population that becomes part of our sample.

    We want to minimize sampling error when we design how we will collect
    data: typically the sampling error decreases as the sample size
    increases, because larger samples make up a larger proportion of the
    population (and a complete census, for example, theoretically has no
    sampling error).

    We want to try and avoid sampling bias when we design how we will
    collect data: bias here refers to a systematic tendency in the selection
    of members of a population to be included in a sample, i.e. any given
    member of a population should have an equal chance of being included in
    the sample for random sampling.
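    The link between sample size and sampling error can be seen in a small
    simulation. The sketch below draws repeated simple random samples from a
    made-up population of 1,000 values and records how far the sample mean
    lands from the population mean on average. The population, the sample
    sizes and the function names are assumptions for illustration only.

```python
import math
import random


def mean(values):
    # fsum gives a correctly rounded total regardless of the order of the values
    return math.fsum(values) / len(values)


def mean_abs_sampling_error(population, n, draws=300, seed=3):
    """Average absolute gap between the sample mean and the population mean."""
    rng = random.Random(seed)
    true_mean = mean(population)
    gaps = []
    for _ in range(draws):
        sample = rng.sample(population, n)  # simple random sample, no replacement
        gaps.append(abs(mean(sample) - true_mean))
    return mean(gaps)


# Hypothetical population of 1,000 measurements, generated only for illustration.
_rng = random.Random(0)
population = [_rng.uniform(0, 100) for _ in range(1000)]

errors = {n: mean_abs_sampling_error(population, n) for n in (10, 100, 1000)}
```

    The entry for n = 1000 is the census case: every unit is measured, so
    the sampling error vanishes, while the entries for 10 and 100 show the
    error shrinking as the sample grows.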

    Steps in Sampling

    1. Definition of the population - We first need to identify the
       population we wish to sample, and do so somewhat formally because any
       inferences we draw are really only applicable to that population.
    2. Construction of a sampling frame - This involves identifying all the
       individual sampling units within a population in order that the
       sample can be drawn from them. In a survey-type study, this could
       involve procuring a list of all the potential individuals who could
       be included in a sample.

    Steps in Sampling Cont.

    3. Selection of a sampling design - This is a critical decision about
       how to collect the sample. We will look at some different sampling
       designs in the following slides.
    4. Specification of information to be collected - The formal definition
       of what data we will collect and how. Often, a pilot sample is
       conducted to refine the sampling design and specifications, to help
       minimize biases that only become apparent once the sampling design
       and specifics are tested.
    5. Collection of the data - When we have steps 1-4 straight, we go about
       collecting the sample.

    Types of Samples (Designs)

    We can distinguish between two families of sampling designs:

    Non-probability designs are not concerned with being representative by
    virtue of minimizing bias, are typically used for non-scientific
    purposes, and are not appropriate for statistical inference studies,
    although they can be useful in an investigative sense.

    Probability designs aim to be representative of the population they
    sample, follow rules of randomness in selection to minimize bias, and
    are those that are used in scientific studies where inferential
    statistics will be used.

    Non-probability Sampling Designs

    Some types of non-probability designs:

    Volunteer sampling - A self-selecting sample, which is convenient, but
    rarely representative

    Quota sampling - Researchers select individuals to include based on
    fulfilling counts of sub-groups

    Convenience sampling - Individuals are included in the sample because
    they are available/accessible

    Judgmental or purposive sampling - Those included in the sample are
    chosen based upon some preconceived notions of what sorts of individuals
    would be most appropriate for the investigative purpose (e.g. product
    testing based on ideas about the market for a product)

    Probability Sampling Designs - Random

    Random sampling - In general, we need some degree of randomness in the
    selection of a sample to be able to draw any meaningful inferences about
    a population, but in some cases this may conflict with
    representativeness.

    Random samples are drawn in such a way that every unit of a population
    has an equal chance of being chosen and the selection of one unit has no
    impact on whether or not another unit will be selected (independence).

    This can be done with or without replacement (which determines whether
    the same unit can be drawn twice).

    We can generate random numbers using a table or using a computer, and
    can scale the 0 to 1 values to any required range of values.
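    A minimal sketch of these ideas using Python's standard random module
    follows. The frame of 200 numbered units, the seed and the helper name
    scale are assumptions made for the example.

```python
import random

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical sampling frame of 200 numbered units.
frame = list(range(1, 201))

# Without replacement: no unit can be drawn twice.
without_replacement = rng.sample(frame, 10)

# With replacement: the same unit may appear more than once.
with_replacement = rng.choices(frame, k=10)


def scale(u, low, high):
    """Scale a random value u between 0 and 1 to the range low..high."""
    return low + u * (high - low)


# rng.random() returns a value from 0 up to (but not including) 1.
scaled_value = scale(rng.random(), 50, 150)
```

    Both draws give every unit in the frame the same chance of selection;
    the only difference is whether a unit that has been picked goes back
    into the pool.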

    Probability Sampling Designs - Systematic

    Representative approaches place restrictions on selection:

    Systematic sampling - This approach uses every kth element of the
    sampling frame, beginning at a randomly chosen point in the frame, e.g.
    given a sampling frame of size 200, to create a sample of size n = 10
    from such a frame, select a random point to begin within the frame and
    then include every 20th value in the systematic sample.

    This approach assumes that the assignment of the individuals in the
    sampling frame is random (i.e. they have not been placed in the frame in
    some order or grouping), and this should be checked before
    systematically sampling from a frame.
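    The 200-unit example can be sketched as follows. One assumption is made
    here that the slide leaves open: the random starting point is chosen
    within the first interval of k units, so that exactly n units are
    reached without wrapping around the frame. The function name and the
    seed are also illustrative.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Take every k-th element of the frame after a random start.

    Assumes 1 <= n <= len(frame).
    """
    k = len(frame) // n                       # sampling interval, e.g. 200 // 10 = 20
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return frame[start::k][:n]


frame = list(range(1, 201))  # hypothetical frame of 200 numbered units
sample = systematic_sample(frame, 10, seed=5)
```

    Every unit still has a 1-in-20 chance of selection, but units are not
    selected independently: once the start is fixed, the rest of the sample
    is determined.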

    Probability Sampling Designs - Systematic

    Some problems with systematic sampling:

    The possible values of sample size n are somewhat restricted by the size
    of the sampling frame, since the interval should divide evenly into the
    size of the sampling frame.

    If the population itself exhibits some periodicity, then a systematic
    sample is likely not to be representative.

    Probability Sampling Designs - Stratified

    We may need to place restrictions on how we select units for inclusion
    in a sample to ensure a representative sample.

    Stratified sampling - Divide the population into categories and select a
    random sample from each of these.

    This approach can be used to decrease the likelihood of an
    unrepresentative sample if the classes/categories/strata are selected
    carefully (the individuals within a stratum must be very much alike,
    which means that the population must be able to be divided into
    relatively homogeneous groups).

    We need to know something about the population in order to make good
    decisions about stratification.

    Probability Sampling Designs - Stratified

    We can take a stratified sample that is

    Proportional - where the random sample drawn from each
    class/category/stratum is the same fraction of that stratum, so each
    sample size is chosen on the basis of the size of that sub-population.
    This approach is best used when the sizes of the categories are
    significantly different. OR

    Disproportional - where the fraction sampled differs between
    classes/categories/strata. This can be applied to mitigate cost issues
    (i.e. it may be more costly to sample in a swamp than in a grassy field,
    so we might choose to take fewer samples in the swamp, although this
    clearly would do nothing to enhance representativeness in our sample).
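    As an illustration, the sketch below draws a simple random sample from
    every stratum and sizes each stratum's sample on the basis of the size
    of that sub-population. The strata, their sizes, the total sample size
    and the function name are all hypothetical.

```python
import random


def stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample from every stratum.

    Each stratum's sample size follows its share of the population,
    rounded to a whole unit, so the sizes may not sum exactly to total_n.
    """
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        size = max(1, round(total_n * len(units) / population_size))  # at least one unit
        sample[name] = rng.sample(units, size)
    return sample


# Hypothetical strata: plot identifiers grouped by land cover.
strata = {
    "grassland": [f"g{i}" for i in range(600)],
    "forest": [f"f{i}" for i in range(300)],
    "swamp": [f"s{i}" for i in range(100)],
}
picked = stratified_sample(strata, total_n=50, seed=11)
```

    To reflect a cost constraint such as the swamp example, the size
    calculation could be replaced by sizes chosen by hand for each stratum.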

    Probability Sampling Designs - Stratified

    WARNING:

    A class/category/stratum that is homogeneous with respect to one
    variable may have high variation with respect to another variable! Thus,
    stratification must be performed with some foreknowledge of how the
    sample will be analyzed, and if the sampling is being performed in a
    preliminary fashion (still seeking the relationships), there is a danger
    that the stratification will be found to be inappropriate after the
    fact.

    Probability Sampling Designs - Cluster

    Another sampling approach that subdivides the population into categories
    is cluster sampling.

    Cluster sampling - Divides the population into categories based on
    convenience rather than some structure designed to promote unbiased
    representation of a particular variable across all clusters, and
    sampling is performed within individual clusters.

    Certain clusters are selected for intensive study, usually by a random
    procedure, and the content of each cluster should be heterogeneous (a
    cross-section of the range of values seen in the whole population), and
    thus representative.

    This is usually applied for reasons of cost and convenience.
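    A minimal sketch of cluster sampling follows. It assumes the simplest
    variant, in which whole clusters are picked at random and every unit in
    a chosen cluster is studied; sampling again within each chosen cluster
    would be a further step. The neighbourhood data and the function name
    are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then study every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted for a stable order
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical clusters: households grouped by neighbourhood.
clusters = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1", "w2"],
}
selected = cluster_sample(clusters, n_clusters=2, seed=8)
```

    Fieldwork is then confined to the selected neighbourhoods, which is
    where the savings in cost and convenience come from.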

    Choosing a Sampling Design

    In a geographic context:

    Stratified sampling works best if the regions are reasonably
    homogeneous.

    Cluster sampling works best if the regions are heterogeneous.

    From an efficiency point of view (the number of samples required),
    stratified sampling is best since it can be representative using a
    smaller number of samples, but if there is no clear means of rational
    stratification, then clustering might be the way to go.

    Many sampling designs are hybrids of approaches (e.g. stratify by ethnic
    group, cluster to pick neighborhoods, select houses randomly).
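    The hybrid example in parentheses can be sketched as three nested
    stages. Everything concrete below (the nested dictionary layout, the
    group and neighbourhood names, the counts per stage) is an assumption
    made for the illustration.

```python
import random


def hybrid_sample(households, n_neighbourhoods, n_houses, seed=None):
    """Stratify by group, pick neighbourhoods as clusters, then pick houses at random.

    `households` maps group -> neighbourhood -> list of house identifiers.
    """
    rng = random.Random(seed)
    picked = {}
    for group, neighbourhoods in households.items():  # stage 1: every stratum is covered
        names = rng.sample(
            sorted(neighbourhoods), min(n_neighbourhoods, len(neighbourhoods))
        )  # stage 2: random clusters within the stratum
        for name in names:
            houses = neighbourhoods[name]
            # stage 3: random houses within each chosen cluster
            picked[(group, name)] = rng.sample(houses, min(n_houses, len(houses)))
    return picked


# Hypothetical data: two groups, each with a few neighbourhoods of houses.
households = {
    "group_a": {
        "a_north": ["a1", "a2", "a3"],
        "a_south": ["a4", "a5", "a6"],
        "a_east": ["a7", "a8", "a9"],
    },
    "group_b": {
        "b_north": ["b1", "b2", "b3"],
        "b_south": ["b4", "b5", "b6"],
    },
}
picked = hybrid_sample(households, n_neighbourhoods=2, n_houses=2, seed=4)
```

    Each stage needs its own sampling frame: the list of groups, the list
    of neighbourhoods within a group, and the list of houses within a
    neighbourhood.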