Administrivia
Lab 2 distributed this Thurs
Introduction to Experiments
Research Design
Correlational
• Variation comes from nature
• Researcher draws conclusions from hands-off observation
Experimental
• Researcher manipulates variables directly
• In a pure experiment, conditions are randomly assigned
Experiments vs Non-Experiments
An experiment is a study in which variation is introduced by the researcher
The type of experiment (such as a true, randomized experiment) depends on several factors: the elements of design dictate the type of experiment you have
Correlational studies include responses from natural groups (e.g., most surveys are non-experiments)
Manipulating something does not, by itself, make a study an experiment.
http://imgs.xkcd.com/comics/the_data_so_far.png
Why Experiments?
Because we want to test specific, directional hypotheses.
Because we want to isolate cause from effect.
Because we want tight control of conditions to rule out spurious relationships.
Experiments can be useful for exploration, but only when we want to isolate and explore specific mechanisms.
Because we need carefully constructed designs that can be replicated to verify findings across environments or situations.
Why Experiments? Consider Pros and Cons of Smoking Research
We have decades of observational research on the link between smoking and health outcomes such as cancer. But, despite overwhelming correlational evidence, it is actually not possible to establish a causal relationship through observation alone.
Smoking Research Example
What would a “true experiment” of smoking and health look like?
Randomly assign people from the population to be smokers or non-smokers (principle of random assignment)
Long-term study over a lifetime (since we would need enough time for smoking to have negative effects).
We might have to withhold treatments for negative health effects in order to get good data on negative health outcomes.
In sum, this would be a terribly unethical thing to do, which is why most researchers accept the overwhelming observational research even though no true experiment on the effects of smoking has ever been conducted by scientists.
But… that doesn’t mean that researchers have not pushed the boundaries of ethics over the years in an attempt to isolate certain causes and effects…
First, a few famous Experiments…for good and bad reasons
Tuskegee Syphilis Study: 1932-1972 clinical study of natural progression of untreated syphilis in rural African American men who thought they were getting free health care from US Government. Enrolled about 400 men who had syphilis but didn’t know it, and 200 without the disease. No one was told that they had syphilis, and those with the disease were not treated in order to observe the progression.
Important because: researchers withheld known treatment to observe the disease running its course. Led to the establishment of the Office for Human Research Protections (OHRP) and IRBs.
Milgram Obedience Study
Famous study about the role of obedience to authority figures. Conducted in the early 1960s, at a time when Nazi war criminals were on trial.
The larger question that Milgram was interested in was whether people like the Nazis were just “following orders”. Could any average person be pushed to harm others simply by the power of an authority asking him or her to behave a certain way?
Milgram Obedience Study
Basic design: Experimenter asked the teacher to give questions to a student (also called a learner). The learner was a confederate (someone in on the study). The teacher would give electric shocks when the student did not give correct answers to questions. The experimenter would simply tell the teacher that “the study must continue”, no matter how high the electric shocks went.
Conditions included putting the “student” and “teacher” in same room or not, as well as having the experimenter in the same room as the teacher or not.
Important because: raised serious issues of ethics in lab studies, but also showed how tough concepts (such as obedience) could be studied in a controlled setting.
Stanford Prison Experiment
As in Milgram’s study, the setting was a controlled one; here the goal was to learn about the psychological effects of being a prison guard or a prisoner.
University students were randomly assigned to be “prisoners” or “guards”. Researchers watched to see how they behaved over time.
Important because: it changed how studies are reviewed by Institutional Review Boards (IRBs). The psychological trauma to some of the participants raised an important issue about detrimental effects after a study is over.
Video Links to 3 Examples Above
Tuskegee Syphilis Study http://www.c-spanvideo.org/program/Syp
Milgram Obedience Study http://www.youtube.com/watch?v=fCVlI-_4GZQ
Stanford Prison Experiment http://www.youtube.com/watch?v=sZwfNs1pqG0
In Sum…
Many, many types of experiments, some with small samples (such as lab experiments), others with thousands or even millions of participants (such as online field experiments).
Data scientists working with very large data often conduct experiments with users, and unlike academic research, these experiments are not necessarily disclosed to participants (e.g., the Wikimedia Foundation (Wikipedia), Google search data, Facebook, insurance companies, and rewards and purchasing programs at companies like Wal-Mart, Target, and grocery stores).
In Sum (continued)…
Data scientists are especially interested in experiments because they let us test specific causal mechanisms in a way that may not be possible in other forms of data collection (such as scraping data from the web, or compiling data from multiple sources).
But, even if we could construct a good “true” experiment, we also have to balance issues of ethics. We have seen some classic examples of ethical problems, but that doesn’t mean the issue has gone away. Today, data scientists routinely come up against issues of privacy and data collection, security of sensitive information, and trust between those who collect and analyze information and those who provide it.
True Randomized Experimental Design
(1) Independent Variables must be manipulated (usually by experimenter, sometimes by context)
(2) Participants must be assigned randomly to various conditions or groups
Courtesy http://psychology.ucdavis.edu/SommerB/
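Requirement (2) above can be made concrete with a small sketch. This is only one simple way to do random assignment (shuffle, then deal participants into conditions in turn); the participant IDs, the function name `randomly_assign`, and the seed are hypothetical and chosen purely for illustration.

```python
import random


def randomly_assign(participants, conditions=("control", "treatment"), seed=None):
    """Shuffle participants, then deal them into the conditions in turn."""
    rng = random.Random(seed)          # a seed makes the assignment repeatable
    shuffled = list(participants)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups


# Hypothetical participant IDs, for illustration only.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=42)
for condition, members in groups.items():
    print(condition, len(members), members)
```

Because chance alone decides who lands in which group, any pre-existing difference between participants is equally likely to end up in either condition.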
Example: Asch Conformity Experiments: “True Experiment” Design
Study: Would individuals conform to peer pressure even if they knew their peers were wrong? Participants sat in room with confederates and had to say which line (A, B, C) matched the line in the other box.
Independent variable manipulated by experimenter: Whether there was any peer pressure to give an incorrect answer or not.
Random Assignment to Control group or Treatment Group:
In treatment group, confederates answered incorrectly and the subject answered last
In control condition, no pressure to give incorrect answer (no confederates were present)
Meanings of the term “Controls” in Research
At least three different meanings:
▪ Control in an Experiment
▪ Controlled Studies
▪ Controls used in a study or analysis (e.g., controlling for gender while examining relationship between education and income)
Active versus Attribute Independent Variables in Experiments
Active independent variable(s): The I.V. is “given” to the participants, usually for some specified time period. It is often manipulated and controlled by the investigator. (e.g., presence or lack of peer pressure in the Asch conformity experiment)
Attribute independent variable(s): A predictive, defining characteristic of individuals. Cannot be manipulated. (e.g., Gender of participants)
Two Different Types of Randomization: Randomization in Sample and Assignment
Random Sample
System for choosing participants from a population
Generally, larger sampling population leads to better generalizability.
Random Assignment
Method for assigning participants randomly to experiment conditions
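The two kinds of randomization answer different questions, which a short sketch may make clearer. The population of 1,000 numbered people, the sample size of 40, and the seed are all assumptions made up for this illustration.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 people, identified by number.
population = list(range(1000))

# Random sample: decides WHO takes part in the study
# (this is what supports generalizing back to the population).
sample = rng.sample(population, k=40)

# Random assignment: decides WHICH condition each sampled person receives
# (this is what supports a causal comparison between the groups).
rng.shuffle(sample)
treatment_group = sample[:20]
control_group = sample[20:]

print(len(treatment_group), len(control_group))
```

A study can have one kind of randomization without the other: many lab experiments assign randomly but draw from a convenience sample, and many surveys sample randomly but assign nothing.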
Pitfalls in Assignment to Conditions
▪ Salk Vaccine Trial
▪ Huge vaccine study for polio. Wealthy families were more likely to consent to the study, but incidence of polio was correlated with quality of living environment. Allowing self-selection into a study rather than true random assignment can affect the interpretation of results.
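The self-selection problem can be illustrated with a toy simulation. Every number here is invented for illustration and is not data from the Salk trial: the consent rates, the baseline risks, and even the direction of the wealth and risk relationship are assumptions. The "vaccine" in this sketch does nothing at all, yet the volunteers and non-volunteers still differ.

```python
import random

rng = random.Random(3)


def simulate(n=100_000):
    """Toy model: a useless 'vaccine' is given only to families who consent."""
    cases = {"volunteers": 0, "non_volunteers": 0}
    counts = {"volunteers": 0, "non_volunteers": 0}
    for _ in range(n):
        wealthy = rng.random() < 0.5
        # Assumed: wealthy families consent more often ...
        consents = rng.random() < (0.8 if wealthy else 0.3)
        # ... and baseline risk also differs with living environment (assumed values).
        risk = 0.02 if wealthy else 0.01
        sick = rng.random() < risk  # the vaccine has NO effect in this model
        group = "volunteers" if consents else "non_volunteers"
        counts[group] += 1
        cases[group] += int(sick)
    return {group: cases[group] / counts[group] for group in counts}


rates = simulate()
print(rates)
```

The gap between the two rates comes entirely from who chose to volunteer, which is the kind of difference random assignment is meant to remove.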
Pre-test, experimental manipulation and post-testing
Pre-test: allows us to check group equivalence before the intervention X is introduced.
Experimental manipulation/Treatment: An independent variable (X) that the experimenter manipulates.
Post-test: allows us to check for differences between groups after intervention X has been introduced.
Hypothesis → Random Assignment → Measure D.V. (pre-test) → Treatment → Measure D.V. (post-test)
Common types of true experiments
(R = random assignment, O = observation or measurement, X = treatment)
Pretest-Posttest Control
R O X O
R O   O
Post-only Control
R X O
R   O
Solomon 4-group
R O X O (1)
R O   O (2)
R   X O (3)
R     O (4)
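As a rough sketch of how a pretest-posttest control design might be analyzed, the simulation below reads R as random assignment, O as a measurement, and X as the treatment. The effect size, the score distributions, and the group sizes are assumptions chosen for illustration, and comparing mean gain scores is only one simple analysis among several.

```python
import random
from statistics import mean

rng = random.Random(0)
TRUE_EFFECT = 5.0  # assumed effect of treatment X, for illustration only


def run_participant(treated):
    pre = rng.gauss(50, 10)    # O: pre-test score
    drift = rng.gauss(2, 3)    # change everyone experiences (e.g. testing, maturation)
    post = pre + drift + (TRUE_EFFECT if treated else 0.0)  # O: post-test score
    return pre, post


ids = list(range(200))
rng.shuffle(ids)               # R: random assignment
treated_ids = set(ids[:100])   # first half of the shuffled list receives X

gains = {True: [], False: []}
for pid in range(200):
    treated = pid in treated_ids
    pre, post = run_participant(treated)
    gains[treated].append(post - pre)

# The control group's gain estimates the drift; subtracting it isolates X.
effect_estimate = mean(gains[True]) - mean(gains[False])
print(round(effect_estimate, 2))
```

Both groups drift upward between the two measurements, so the treated group's raw gain overstates the effect; the control group is what lets that drift be subtracted out.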
Three Major Types of Experiments
True experiments protect against both time and group threats to internal validity by randomly assigning subjects to treatment and control groups. The treatment (independent variable) is active.
If we cannot randomly assign subjects to treatment and control groups, then it is a quasi-experiment. The independent variable is active.
If we cannot randomly assign subjects to groups because the groups contain the attribute of interest, and if we give all groups the same treatment, then it is an associational non-experiment. The independent variable is not active.
We have already talked a lot about true experiments by example, so let’s focus on the other two major types of experiments…
Example: Pen Study
Question: Do individuals in the US and Japan make different choices about ‘unique’ versus ‘less unique’ items when given a choice?
Pen Study
Independent Variables
Cultural difference: Japanese students compared to US students
Pen layout (4 pens of one color, 2 pens of another color)
▪ Everyone received the same treatment
Assignment
Subjects were not randomly assigned because they already fell into one of the two societies.
Dependent Variable:
Would they choose the ‘common’ pen or the ‘unique’ one?
Example: Website Credibility Study
Question: Do people attribute different amounts of credibility to websites with different URLs?
Web Credibility Study Example
Independent Variable
Experiment Condition (2 conditions):
Website has professional URL.
Website has dubious URL. (www.gooogle.com/webuygoldandsellprescriptions)
Assignment
We cannot control who goes to which website; we can only wait for enough individuals to go to each site and respond to the questionnaire about website credibility.
Dependent Variable
Credibility rating (a series of questionnaire items)
Internal Validity
Design Concerns: Internal Validity, External Validity, Ecological Validity
Photo credit: http://www.indiana.edu/~p1013447/dictionary/int_val.htm
Internal Validity
Internal Validity deals with questions about whether changes in the dependent variable were caused by the treatment.
[Figure: Cause → Effect diagram with question marks]
Threats to Internal Validity
History
▪ additional I.V. that occurs between pre-test and post-test
Maturation
▪ Subjects simply get older and change during experiment
Testing
▪ Subjects “get used” to being tested
Regression to the Mean
▪ Issue with studies of “high and low extremes” on some key variable
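Regression to the mean is easy to see in a simulation. The sketch below assumes each person has a stable true level plus independent measurement noise on each test; the distributions, sample size, and 10% cutoff are all invented for illustration. No treatment is applied at any point.

```python
import random
from statistics import mean

rng = random.Random(1)

# Each person has a stable "true" level; each test adds independent noise.
true_levels = [rng.gauss(100, 10) for _ in range(5000)]
test1 = [level + rng.gauss(0, 10) for level in true_levels]
test2 = [level + rng.gauss(0, 10) for level in true_levels]

# Select roughly the top 10% on the first test only. No treatment is given.
cutoff = sorted(test1)[int(0.9 * len(test1))]
selected = [i for i, score in enumerate(test1) if score >= cutoff]

first = mean(test1[i] for i in selected)
second = mean(test2[i] for i in selected)
print(round(first, 1), round(second, 1))
```

The selected group scores lower on the second test even though nothing happened to them, so a study that recruits only extreme scorers can mistake this drop (or rise, for low scorers) for a treatment effect.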
Contamination and Internal Validity
Demand Characteristics
▪ Anything in the experiment that could guide subjects to expected outcome
General Demand Characteristics
Three Key Demand Characteristics:
Evaluation Apprehension: Subjects know that they are being evaluated and this changes their behavior; creates anxiety and can interfere with task performance.
Social Desirability: Subjects act how they think the experimenter wants them to act, or in the way they think would be proper behavior
Placebo effect: just being treated or being given *anything* can lead to measurable improvement over nothing at all (e.g., sugar pill cures headache, etc).
Solutions to Demand Characteristics
Solutions:
Double-blind experiments
Experiments in natural setting (i.e., subjects do not know they are in an experiment)
Hidden measurements
Cover stories
▪ Provide a reasonably believable story that explains what the researcher is doing, while concealing the actual intention of the study.
Expectancy Effects
Experimenter Expectancy
▪ Researcher behavior that guides subjects to expected outcome (self-fulfilling prophecy)
Reducing expectancy effects
Blind
Researcher is unaware of the experiment condition that he/she is administering.
Naïve experimenter
Those conducting study are not aware of theory or hypotheses in the experiment
Standardization
Experimenter follows a script, and only the script
“Canned” Experimenter
Audio/Video/Print material gives instructions
And More Potential Problems with Experiments!
Selection Bias
▪ Issue with non-random selection of subjects (e.g., using a convenience sample and mistaking it for a random one, such as picking a few people “at random” from the same street corner)
Mortality/Attrition
▪ Departure of subjects in the experiment
Diffusion, Sharing of Treatments
▪ Control group unexpectedly obtains treatment
Other ‘social’ threats?
▪ “Compensatory rivalry” and “resentful demoralization”: the control group may realize their status and be upset that they do not get the real treatment.
In Sum…
The validity of an experiment is subject to a large variety of threats.
You should be aware that experiments absolutely depend on stamping out these sources of bias.
This is the fun and the challenge of experimental designs
Three key threats to external validity (generalizability) in experiments
Setting: Physical and social context of the experiment
Population: Is there something specific about the sample that interacts with the treatment?
History: Is there something about the time that interacts with the treatment?
External Validity: How far does the given study/experiment generalize to similar groups, individuals, etc?
Ecological Validity
Ecological Validity: Approximation of ‘real-life’ situations
This is such a tough one: for any lab experiment it is really very hard to get high ecological validity, even with computer-based studies.
The Validity Tradeoff: Truth and Myth
Internal Validity
External Validity
Balance is important between the types of validity, but internal validity is usually (if not always) the more important factor.
Experiments in the Field
Some experiments can be conducted in a real-world setting while maintaining random assignment and manipulation of treatments
Field Experiment Example
Milliman (1986) Study of music tempo and restaurant customer behavior
Natural Experiments
Variation is provided by natural or unplanned event
Treatment is exogenous (no self-selection)
Examples:
▪ Natural disasters
▪ Vietnam war draft
▪ School admissions cutoffs
▪ Olympics reduces air pollution
▪ Distance from school to ISP central office
Provides an identification strategy: a way to identify a causal relationship in the data
Pros and Cons of Experiments
Pros
Gives researcher tight control over independent factors
Allows researcher to test key relationships with as few confounding factors as possible
Allows for direct causal testing
Cons
Often very small N in lab studies; enough for statistical purposes but not ideal for generalizability.
Sometimes give up large amounts of external validity in favor of internal validity (and vice versa)
Require a large amount of planning, training, and time, sometimes to test the relationship between only 2 factors!