382C Empirical Studies in Software Engineering Lecture 13
© 2000-present, Dewayne E Perry
Artifacts/Confounding Variables
Dewayne E Perry
ENS 623
Limitations on Knowledge
  All experiments subject to error
    Understand and measure it
    Does not destroy our opportunities
    Makes us aware of errors and limits
  Extraneous variables that vary systematically
    Importance of keeping other variables equal
    Rule out alternative explanations
  Two prime sources
    Irrelevant effects of procedures
    Artifacts: biasing effects of investigators and participants
Nature of Problem
  Artifact – finding resulting from factors other than the one intended
    Usually quite extraneous to the intent of the experimenter
    Factors that can jeopardize the validity of the conclusions
  Interested in subject-experimenter artifacts
    Must have dependable knowledge about the E-S equation
    Astronomers need to know the effects of their telescopes
    In behavioral experiments, experimenter is the instrument of observation and manipulation
  Subject side of the equation
    Human complexity
    No two research subjects behave identically
    The same careful experiment will have different results in different places/times
    Subjects know they are research participants
    Research subject role well understood
Nature of Problem
  Experimenter side of the equation
    Systematic errors usually unintentional
    2 classes
      Interactional
        Biases that affect the response of the subject
      Non-interactional
        In the mind, eye or hand of the experimenter
  Control
    Comparison condition to isolate some effect
    Procedure to serve as a check on validity
History of Problem
  Clever Hans
    Horse known for remarkable intellectual feats
    Tapped out, ostensibly with the help of a code table in front of him:
      Answers to mathematical problems
      Date of any day mentioned
    Psychologist Oskar Pfungst noticed that he responded to unintended cues from his questioners
      Eg, body position
    If animals can do this, why not humans?
History of Problem
  Hawthorne Works study – began in 1924
    How workers’ productivity was affected by workplace conditions such as light, temperature, rest periods
    Both treatment and control groups increased their performance
    Suggested reasons:
      Flattered to participate
      Keenly aware of and responsive to task clues
    Hawthorne effect now synonymous with placebo effect, the power of suggestion
History of Problem
  Rosenzweig 1933 – landmark paper
    Argued that the experimental situation is a psychological problem in its own right
    Developed methodological analysis and taxonomy of certain types of interactions
    Contended that subjects try to guess the purpose of the experiment and give the results expected
      Called the good subject effect
    Further, the experimenter might unintentionally influence the results
Resistance to the Problem
  Why did it take so long for systematic research to begin in earnest?
  3 suggested reasons
    Phenomenon of artifacts that stem from playing a subject role presupposes the active influence of conscious cognition
    Concerns about pervasive biases were possibly viewed as impeding emergence and growing influence of behavioral research
    Logical positivist view placed great faith in impartiality of research
  In the late 50s, positivist and logical empiricist tenets began to lose their hold and cognitive science rose
Demand Characteristics
  Orne’s work on hypnotism and subject expectations
    Coined the term demand characteristics of the experimental situation
    Could expectations not also apply to other research?
    Treatment group: novel characteristic – catalepsy of the dominant hand; Control group: no such mention
    Almost all the treatment group exhibited the catalepsy; none in the control group
  Typical subject:
    Attentive to demand characteristics
    Attempted to behave altruistically in a way that confirms the experimenter’s hypothesis
Motivation
  Altruism, evaluation apprehension, obedience as motivators
    As early as high school, associate subject role with such characteristics as being cooperative, alert and observant
    Do not always enact altruistic role – may be other motivations
  Aiken and Rosnow 73
    11 statements representing three motivations were compared to each other as pleasant or unpleasant
    Key situation: being a subject in a psychology experiment
    RR, Figure 6-1: arrows show mean psychological distance
    What do we learn from the map
      Being subject closely associated with good subject role
      Obedience and evaluation also entered into subjects’ thinking
      Participation is mildly pleasant work-oriented activity
  Looking good (evaluation apprehension) is more likely to be dominant than doing good
Task Oriented Cues
  Detection:
    Use quasi-control subjects as a possible way to get the subjects to figure out what is going on just by thinking about it
    Serve as co-investigators rather than subjects of the study
  Orne suggests 3 techniques
    Experimental subjects function as their own controls
      Post-experimental interview
      Pilot study on their perceptions/beliefs
    Pre-inquiry:
      Quasi-control subjects to imagine they are the real subjects
      Not subjects, but given full treatment information
      Quasi-subjects predict how they might behave
      Similarity between data from quasi and real implies results could be affected by subject guesses
    Blind controls: unaware of their status
      Compare blind controls to quasi controls
      Blind groups sometimes used as a sacrifice group
Task Oriented Cues
  Alternative: observe dependent variable in different contexts
    Eg, both inside and outside the lab setting
    Eg, observed by someone other than the experimenters
  Orne considered these to be supplementary techniques
    Do not automatically enable us to avoid problems
    Not always aware of effects
    Challenge: their subtlety and teasing them out
  Interesting model proposed, a preliminary statement, not a theory
    Assumption: subjects are sensitive to coercive demands of whatever propriety norms may be operating in the experiment
    Focuses on a few intervening variables instead of categorizing artifact producing variables
Theoretical View
  Assumption: artifact producing variables generalize to a few mediatory factors:
    -> compliance, non-compliance, counter-compliance
    -> receptivity, motivation, capability
  Either receptive or not
    If not receptive, then non-compliant
  Either motivated, not motivated or uncooperative
    If not motivated, then non-compliant
    If not cooperative and capable, then counter-compliant
    If not cooperative and incapable, then non-compliant
  Either capable or incapable
    If incapable then non-compliant
    Otherwise compliant to demand characteristics
      The only path to worry about
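The three mediating factors on this slide read like a small decision procedure, so a minimal sketch may help make the paths concrete. The names `Motivation` and `predict_response` are hypothetical, chosen only for illustration; the branching follows the slide's rules as stated and is not taken from the original model's authors.

```python
from enum import Enum


class Motivation(Enum):
    """The three motivational states listed on the slide."""
    MOTIVATED = "motivated"
    NOT_MOTIVATED = "not motivated"
    UNCOOPERATIVE = "uncooperative"


def predict_response(receptive: bool, motivation: Motivation, capable: bool) -> str:
    """Walk receptivity, motivation and capability in order (hypothetical helper)."""
    if not receptive:
        # Demand characteristics were never received.
        return "non-compliant"
    if motivation is Motivation.NOT_MOTIVATED:
        return "non-compliant"
    if motivation is Motivation.UNCOOPERATIVE:
        # An uncooperative subject can only act against the demand if capable.
        return "counter-compliant" if capable else "non-compliant"
    # Receptive and motivated: capability decides the outcome.
    return "compliant" if capable else "non-compliant"


# Enumerate every combination to see which paths end in compliance.
for receptive in (True, False):
    for motivation in Motivation:
        for capable in (True, False):
            outcome = predict_response(receptive, motivation, capable)
            print(receptive, motivation.value, capable, "->", outcome)
```

Of the twelve combinations, only receptive, motivated and capable yields compliance with the demand characteristics, which appears to be the path the slide singles out.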
Prediction and Control
  Two objectives in the model
    Visualize systematically how demand characteristics operate
    Blueprint for strategies for reducing or eliminating subject artifacts
  Allows us to generate theoretical predictions about how artifact producing events might operate in a given situation
    Eg, clarity of demand and subject’s behavior as a result
    Consider only receptivity and motivation; hardly ever incapable
    Motivation and receptivity cancel out when demand very high or very low
Strategies
  Receptivity manipulations to minimize demand clarity
    Measure dependent variable in a setting not obviously connected to the treatment or employ unobtrusive measurements
      Ideally, no demand characteristics received
      Approximated in field studies with unobtrusive measures
      Unaware -> receptivity is nil
    Measure the dependent variable removed in time from the treatment
      Ideal usually not met, reception of demand unavoidable
      Some demand transmitted by means of relationship between treatment and test
      Break the relationship: separate in time and space
    Employ Solomon design or else avoid pre-testing, especially in attitude change experiments, and instead employ after-only design
      Pretest sensitization is a problem
      Measure effect or rule out pre-test
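To illustrate "measure effect or rule out pre-test": the Solomon design crosses pretesting with treatment, giving four groups, so the treatment effect can be estimated with and without a pretest and the two estimates compared. The sketch below is a rough illustration on post-test means only; the function name `solomon_effects` and all scores are made up, and a real analysis would add a significance test.

```python
from statistics import mean


def solomon_effects(pre_treat, pre_ctrl, nopre_treat, nopre_ctrl):
    """Each argument is a list of post-test scores for one of the four groups.

    Hypothetical helper: compares treatment effects in pretested and
    unpretested groups using post-test means only.
    """
    effect_with_pretest = mean(pre_treat) - mean(pre_ctrl)
    effect_without_pretest = mean(nopre_treat) - mean(nopre_ctrl)
    return {
        "effect_with_pretest": effect_with_pretest,
        "effect_without_pretest": effect_without_pretest,
        # A nonzero difference suggests the pretest changed how subjects
        # responded to the treatment (pretest sensitization).
        "sensitization": effect_with_pretest - effect_without_pretest,
    }


# Made-up post-test scores, for illustration only.
result = solomon_effects(
    pre_treat=[8, 9, 10],
    pre_ctrl=[5, 6, 7],
    nopre_treat=[6, 7, 8],
    nopre_ctrl=[5, 6, 7],
)
print(result)
```

In this invented data the treatment effect is larger in the pretested groups, which is the pattern one would expect if the pretest sensitized subjects.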
Strategies
  Standardize and restrict the experimenter’s communication with subjects
    Experimenters are the main channel of demand characteristics
    The more standardized and restricted the better
    Eg, computerized instructions
  Use blind procedures in testing and experimental manipulations
    The less known the less transmitted
  Receptivity manipulation to generate alternative demands
    Elicit false hypotheses about the purpose of the research, ie be deceptive
      Contrived demands
Strategies
  Motivation manipulations to encourage honest responding
    Give feedback of compliant behavior in a set of pre-experimental tasks to bring the subject to a state of non-acquiescence
      More difficult and less confident in manipulation outcome
      Aim for cooperation and favorable evaluation for true experiment
    Make experimental setting and procedures low-keyed and non-threatening (eg, anonymous or confidential)
      Non-threatening to avoid evaluation apprehension
      Protection of privacy is important
    Encourage honest responding through self-monitoring bogus pipeline: subject is told device detects lying
      Subject will give truthful answers
      Not without risk
  No method really infallible, but do need to think deeply about the problem
Non-interactional Effects
  Systematic errors on the experimenter side
    Observer effects
    Interpreter effects
    Intentional effects
  Observer effects
    Not so easy to be sure that one has made accurate observations
    Observer effects well known in astronomy
    Accounted for in interpreting the data
  Interpreter effects
    Experimenters rarely debate accuracy of observations, will debate interpretation
    Difficult to state rules of accurate interpretation
    Wrongness of interpretation often due to theory monogamy
      Though theory monogamy often advantageous
  Intentional effects
    Implies dishonesty
    May cook the data too much – ie, too good to be true
    Need strong sense of ethics and honesty
Interactional Effects
  Biosocial effects
    Gender, age, race of experimenter
    Subjects may respond differently to those aspects of the experimenter
    Can get different results merely by varying these factors
    Males and females may unconsciously conduct different experiments
      Males might be more friendly towards female subjects
    Before declaring gender differences in studies, must make sure they were treated the same
  Psycho-social effects
    Personality, temperament, etc
    Differences in hostility, authoritarianism, status and warmth will get different responses
    Warmer examiners tend to get better responses than cooler, more challenging ones
Interactional Effects
  Situational effects
    Context, situation
    More experienced experimenters tend to get different results than less experienced
    Acquaintance may yield different results as well
    What happens during experiment can cascade throughout the rest of the experiment
  Modeling effects
    Often experimenter will trial experiment
    Sometimes the experimenter’s performance becomes a factor in the subject’s performance
    When situation is ambiguous, subjects may agree with experimenter too often
    Experimenter’s behavior may have been influencing results
  Self-fulfilling prophecy
    Researcher’s expectations – expectancy effect
    Eg, teacher thinks X is bright, treats differently from Y
Experimenter Expectancy Effects
  Consider a standard type of experiment
    Differing only in hypotheses, expectations
    Eg, study of bright rats, bright did better
  Meta-analysis of 345 studies
    Mean effect size of expectancy bias of .33
    Vary according to category of study
    Do occur to a considerable degree in all
  How are expectancy effects communicated
    Psychological climate
    Physical distance from interactants
    Frequency and duration of interactions
    Eye contact and smiling
    Verbal rewards and punishments
Expectancy Controls
  Increase the number of experimenters
    Decreases learning of influence techniques
    Helps maintain blindness
    Randomizes expectancies
    Increases generality of results
  Observing the behavior of experimenters
    Sometimes reduces expectancy effects
    Permits correction of unprogrammed behavior
    Facilitates greater standardization
  Analyze experiments for their order effects
    Permits inference about changes in experimenter behavior
    Compare earlier with later
      End of experiment changes (whew!, etc)
      Learning effect on experimenter
Expectancy Controls
  Maintaining blind contact
    Minimize expectancy effects
    Lack of knowledge of treatment being given makes expectancy unlikely
  Minimizing experimenter-subject contact
    Minimizes expectancy effects
    Important: does it reduce the realism of the manipulations
      This affects generalization
  Employing expectancy control groups
    Permits assessment of expectancy effects
    Experimenter expectancy becomes a second variable
    Get the magnitude of expectancy effect
Participant Variables
  Demographic and personal characteristics
    Critical issues: groups need to be comparable
    Methods of control:
      Random assignment
        Easiest and surest way of scrambling all possible variables across all groups
        Promotes but does not guarantee equivalence
          Particularly on small samples
      Homogeneous sample
        Restrict variance by narrowing sample
        Have to control potential confounds
        Price: generalizability can be challenged
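Random assignment as described above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the helper name `randomly_assign`, the participant labels and the seed are all assumptions made for the example.

```python
import random


def randomly_assign(participants, group_names, seed=None):
    """Shuffle participants, then deal them round-robin into the groups.

    Hypothetical helper; a fixed seed makes the assignment reproducible
    so it can be reported and audited.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    for index, person in enumerate(shuffled):
        groups[group_names[index % len(group_names)]].append(person)
    return groups


# Ten made-up participant identifiers split into two equal-sized groups.
participants = [f"P{i:02d}" for i in range(1, 11)]
groups = randomly_assign(participants, ["treatment", "control"], seed=13)
print(groups)
```

Because random assignment promotes but does not guarantee equivalence, especially on small samples, it is still worth comparing the resulting groups on key participant variables after assignment.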
Participant Variables
  Matched participants
    Virtual twins in each group
    Desired size and diversity
    Rule out group differences
    Difficult to find enough people who match on more than a few variables
      Often narrow match – but have to be careful
    Referred to as matched group design
  Equated groups
    Means, medians and percentages are important participant variables of the groups
    Groups should not be significantly different - should be significantly alike
    Assess by nonparametric tests: chi-square, Z test
    Possible strategy: drop, add or exchange members
      Could change mean of other variables
      Dropping after measurement should raise skepticism
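As one way to assess whether two groups are alike on a categorical participant variable, the sketch below computes a Pearson chi-square statistic for a 2x2 table using only the standard library. The function name `chi_square_2x2` and the counts are hypothetical; the p-value uses the fact that, for one degree of freedom, the chi-square tail probability equals erfc(sqrt(x/2)). No continuity correction is applied, and the function assumes no row or column total is zero.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows are the two groups; columns are the two categories of the
    participant variable. Assumes every row and column total is nonzero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = erfc(sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value


# Made-up counts: participants with / without prior experience in each group.
stat, p = chi_square_2x2(12, 8, 10, 10)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

A large p-value here only says the test failed to detect a difference; on small samples that is weak evidence that the groups are truly equivalent.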
Participant Variables
  Statistical control
    Balance secondary variables
    Treat as covariate in covariance analysis
      Adjust scores for secondary effects
    Creation of blocking variables
      Study effect and see whether it interacts with treatment variable
      Must increase the number of cells
  Own control
    Sampling error is the largest error built into a design that has different people in each group
    Especially useful in SWE for accounting for differences in abilities/productivity
    Some studies do not lend themselves to this kind of control
      Long term studies, eg
    Possible problems:
      First treatment affects response on second
      Learning effects from first test
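When participants serve as their own controls, the analysis works on each person's difference between conditions, which removes stable between-person differences in ability from the comparison. The sketch below computes a paired t statistic under that assumption; the function name `paired_t` and the scores are invented for illustration, and the function assumes the differences are not all identical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (mean difference, t statistic, degrees of freedom).

    Hypothetical helper: first[i] and second[i] are the same participant's
    scores under the two conditions. Assumes the differences vary
    (a zero standard deviation would divide by zero).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return mean(diffs), t_stat, n - 1


# Made-up data: defects found by the same five developers using two techniques.
technique_a = [10, 12, 9, 14, 11]
technique_b = [12, 15, 10, 17, 12]
mean_diff, t_stat, df = paired_t(technique_a, technique_b)
print(f"mean difference = {mean_diff}, t = {t_stat:.3f}, df = {df}")
```

The order problems listed on the slide still apply; alternating which condition each participant receives first is one common way to keep carry-over and learning effects from favoring a single condition.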
Participant Variables
  Extra-experimental changes in participants
    Critical issues:
      Especially in cases where considerable time elapses
      Maturity and history
    Methods of control
      Cannot be prevented over the long course
      But if truly random, odds are greater against systematic problems
Participant Variables
  Motivation and role perception
    Critical issues
      Are some more motivated than others?
      Are egos more involved in some than others?
      Is it important to be a part of a study?
      Unequal benefits may result in unequal performance
      Perception of the role might differ systematically
      Second guessing, scoping out, expectations
    Methods of control
      Judge whether same benefits and rewards
      Constant motivation over time and between groups
      Unobtrusive and non-reactive measures
Participant Variables
  Communication among participants
    Critical issues:
      Communicating experiences with those waiting for treatment
      Possible where participants drawn from a co-located population
      Not a problem in some cases: eg, auditory acuity
      Where there are right/wrong answers, judgments
    Methods of control
      Physical separation or simultaneous treatment
      With adults, explain problem and ask for cooperation
      Pretreatment screening for possible contamination
      Participants from different places
      Work quickly and finish before communication can take place
      Monitor for communication
Participant Variables
  Placebo effects
    Critical issues
      Can be quite powerful effects
      Important where there are change expectancies
      Especially where benefit expected
    Methods of control
      Placebo to random half of sample
      Not always appropriate – eg, psychotherapy
      What about SWE?
Experimental Variables
  Critical issues
    Interactional effects
      Biosocial effects:
        Demographic: men reacting to women
      Psychosocial effects
        Personal characteristics: don’t like pushy people
      Situational effects
      Modeling effects
        Self-fulfilling prophecies
      Demand characteristics
    Noninteractional effects:
      Observer effects
      Interpreter effects
        Personal equation
        Eg, astronomer’s observations differed
          Selective – effects that are different in one group
          Secondary variance - affect both groups
      Experimenter bias
Experimental Variables
  Methods of control
    Institutionalized critical review process
    Experience and self-discipline
    Control bio-social effects
      Anticipate them
      Rule them out, minimize by design
      Analyze them
      Report them
    Psychosocial effects
      Same ways as biosocial
      Trial experiment and monitor experimenter