
382C Empirical Studies in Software Engineering Lecture 13

© 2000-present, Dewayne E Perry

Artifacts/Confounding Variables

Dewayne E Perry
ENS 623




Limitations on Knowledge
All experiments subject to error
  Understand and measure it
  Does not destroy our opportunities
  Makes us aware of errors and limits
Extraneous variables that vary systematically
  Importance of keeping other variables equal
  Rule out alternative explanations
Two prime sources
  Irrelevant effects of procedures
  Artifacts: biasing effects of investigators and participants



Nature of Problem
Artifact: a finding resulting from factors other than the one intended
  Usually quite extraneous to the intent of the experimenter
  Factors that can jeopardize the validity of the conclusions
Interested in subject-experimenter artifacts
  Must have dependable knowledge about the E-S (experimenter-subject) equation
  Astronomers need to know the effects of their telescopes
  In behavioral experiments, the experimenter is the instrument of observation and manipulation
Subject side of the equation
  Human complexity
  No two research subjects behave identically
  The same careful experiment will have different results in different places/times
  Subjects know they are research participants
  The research subject role is well understood



Nature of Problem
Experimenter side of the equation
  Systematic errors, usually unintentional
  2 classes
  Interactional: biases that affect the response of the subject
  Non-interactional: in the mind, eye or hand of the experimenter
Control
  Comparison condition to isolate some effect
  Procedure to serve as a check on validity



History of Problem
Clever Hans
  Horse known for remarkable intellectual feats
  Tapped out, ostensibly with the help of a code table in front of him:
  answers to mathematical problems; the date of any day mentioned
  Psychologist Oskar Pfungst noticed that he responded to unintended cues from his questioners
  E.g., body position
If animals can do this, why not humans?



History of Problem
Hawthorne Works study, begun in 1924
  How workers’ productivity was affected by workplace conditions such as light, temperature, rest periods
  Both treatment and control groups increased their performance
  Suggested reasons: flattered to participate; keenly aware of and responsive to task clues
Hawthorne effect now synonymous with placebo effect, the power of suggestion



History of Problem
Rosenzweig 1933: landmark paper
  Argued that the experimental situation is a psychological problem in its own right
  Developed methodological analysis and taxonomy of certain types of interactions
  Contended that subjects try to guess the purpose of the experiment and give the results expected
  Called the good subject effect
  Further, the experimenter might unintentionally influence the results



Resistance to the Problem
Why did it take so long for systematic research to begin in earnest? 3 suggested reasons:
  The phenomenon of artifacts that stem from playing a subject role presupposes the active influence of conscious cognition
  Concerns about pervasive biases were possibly viewed as impeding the emergence and growing influence of behavioral research
  The logical positivist view placed great faith in the impartiality of research
In the late 50s, the tenets of the positivists and logical empiricists began to lose their hold, and cognitive science rose



Demand Characteristics
Orne’s work on hypnotism and subject expectations
  Coined the term demand characteristics of the experimental situation
  Could expectations not also apply to other research?
  Treatment group: told of a novel characteristic, catalepsy of the dominant hand; control group: no such mention
  Almost all of the treatment group exhibited the catalepsy; none in the control group did
Typical subject:
  Attentive to demand characteristics
  Attempted to behave altruistically in a way that confirms the experimenter’s hypothesis



Motivation
Altruism, evaluation apprehension, obedience as motivators
  As early as high school, people associate the subject role with such characteristics as being cooperative, alert and observant
  Subjects do not always enact the altruistic role; there may be other motivations
Aiken and Rosnow ’73
  11 statements representing three motivations were compared to each other as pleasant or unpleasant
  Key situation: being a subject in a psychology experiment
  RR, Figure 6-1: arrows show mean psychological distance
What do we learn from the map?
  Being a subject is closely associated with the good subject role
  Obedience and evaluation also entered into subjects’ thinking
  Participation is a mildly pleasant, work-oriented activity
  Looking good (evaluation apprehension) is more likely to be dominant than doing good



Task Oriented Cues
Detection:
  Use quasi-control subjects as a possible way to get the subjects to figure out what is going on just by thinking about it
  They serve as co-investigators rather than subjects of the study
Orne suggests 3 techniques
  Experimental subjects function as their own controls: post-experimental interview
  Pilot study on their perceptions/beliefs
Pre-inquiry:
  Quasi-control subjects imagine they are the real subjects
  Not subjects, but given full treatment information
  Quasi-subjects predict how they might behave
  Similarity between data from quasi and real subjects implies results could be affected by subject guesses
Blind controls: unaware of their status
  Compare blind controls to quasi controls
  Blind groups sometimes used as a sacrifice group



Task Oriented Cues
Alternative: observe the dependent variable in different contexts
  E.g., both inside and outside the lab setting
  E.g., observed by someone other than the experimenters
Orne considered these to be supplementary techniques
  They do not automatically enable us to avoid problems
  We are not always aware of the effects
  Challenge: their subtlety, and teasing them out
Interesting model proposed: a preliminary statement, not a theory
  Assumption: subjects are sensitive to the coercive demands of whatever propriety norms may be operating in the experiment
  Focuses on a few intervening variables instead of categorizing artifact-producing variables



Theoretical View
Assumption: artifact-producing variables generalize to a few mediatory factors:
  -> compliance, non-compliance, counter-compliance
  -> receptivity, motivation, capability
Either receptive or not
  If not receptive, then non-compliant
Either motivated, not motivated or uncooperative
  If not motivated, then non-compliant
  If uncooperative and capable, then counter-compliant
  If uncooperative and incapable, then non-compliant
Either capable or incapable
  If incapable, then non-compliant
  Otherwise compliant to demand characteristics
The only path to worry about
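The receptivity, motivation and capability paths above amount to a small decision procedure. The sketch below encodes them so that every combination can be enumerated; it is a teaching aid that assumes each factor is all-or-nothing, and the names `Motivation` and `predict_response` are hypothetical, not part of the model's original statement.

```python
from enum import Enum


class Motivation(Enum):
    MOTIVATED = "motivated"          # cooperative, willing to follow the demand
    NOT_MOTIVATED = "not motivated"  # indifferent to the demand
    UNCOOPERATIVE = "uncooperative"  # actively resists the demand


def predict_response(receptive: bool, motivation: Motivation, capable: bool) -> str:
    """Trace one subject through the receptivity -> motivation -> capability paths."""
    if not receptive:
        return "non-compliant"  # the demand characteristics were never received
    if motivation is Motivation.NOT_MOTIVATED:
        return "non-compliant"
    if motivation is Motivation.UNCOOPERATIVE:
        return "counter-compliant" if capable else "non-compliant"
    # Receptive and motivated: only capability remains
    return "compliant" if capable else "non-compliant"


# Enumerate all 12 combinations of the three factors
for receptive in (True, False):
    for motivation in Motivation:
        for capable in (True, False):
            outcome = predict_response(receptive, motivation, capable)
            print(f"receptive={receptive}, {motivation.value}, capable={capable} -> {outcome}")
```

Enumerating the combinations shows that only one of the 12 (receptive, motivated, capable) ends in compliance, and only one (receptive, uncooperative, capable) ends in counter-compliance.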



Prediction and Control
Two objectives in the model
  Visualize systematically how demand characteristics operate
  Blueprint for strategies for reducing or eliminating subject artifacts
Allows us to generate theoretical predictions about how artifact-producing events might operate in a given situation
  E.g., clarity of demand and subjects’ behavior as a result
  Consider only receptivity and motivation; subjects are hardly ever incapable
  Motivation and receptivity cancel out when demand is very high or very low



Strategies
Receptivity manipulations to minimize demand clarity
Measure the dependent variable in a setting not obviously connected to the treatment, or employ unobtrusive measurements
  Ideally, no demand characteristics are received
  Approximated in field studies with unobtrusive measures
  Unaware -> receptivity is nil
Measure the dependent variable removed in time from the treatment
  Ideal usually not met; reception of demand is unavoidable
  Some demand is transmitted by means of the relationship between treatment and test
  Break the relationship: separate them in time and space
Employ a Solomon design or else avoid pre-testing, especially in attitude change experiments, and instead employ an after-only design
  Pretest sensitization is a problem
  Measure the effect or rule out the pre-test
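A Solomon four-group design crosses pretesting (yes/no) with treatment (yes/no), so the posttest means form a 2x2 layout in which pretest sensitization can be measured rather than merely feared. The sketch below assumes hypothetical posttest means and compares the treatment effect with and without a pretest; a real analysis would run a 2x2 ANOVA on the individual posttest scores. The names `SolomonMeans` and `solomon_effects` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SolomonMeans:
    """Posttest means for the four groups of a Solomon four-group design."""
    pre_treat: float      # group 1: pretest, treatment, posttest
    pre_control: float    # group 2: pretest, no treatment, posttest
    nopre_treat: float    # group 3: no pretest, treatment, posttest
    nopre_control: float  # group 4: no pretest, no treatment, posttest


def solomon_effects(m: SolomonMeans) -> dict:
    """Treatment effect with and without a pretest, and related contrasts."""
    with_pretest = m.pre_treat - m.pre_control
    without_pretest = m.nopre_treat - m.nopre_control
    return {
        "with_pretest": with_pretest,
        "without_pretest": without_pretest,
        # A nonzero value suggests the pretest changed how subjects responded to the treatment
        "sensitization": with_pretest - without_pretest,
        # Effect of merely being pretested, averaged over treatment and control
        "pretest_main": (m.pre_treat + m.pre_control) / 2
        - (m.nopre_treat + m.nopre_control) / 2,
    }


# Hypothetical posttest means, for illustration only
effects = solomon_effects(SolomonMeans(pre_treat=78.0, pre_control=60.0,
                                       nopre_treat=70.0, nopre_control=60.0))
print(effects)
```

With these made-up means the treatment appears 18 points strong among pretested subjects but only 10 points strong among subjects who were never pretested, the pattern pretest sensitization would produce.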



Strategies
Standardize and restrict the experimenter’s communication with subjects
  Experimenters are the main channel of demand characteristics
  The more standardized and restricted, the better
  E.g., computerized instructions
Use blind procedures in testing and experimental manipulations
  The less known, the less transmitted
Receptivity manipulation to generate alternative demands
  Elicit false hypotheses about the purpose of the research, i.e., be deceptive
  Contrived demands
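One common way to keep the people who run sessions blind is to have a coordinator who never meets subjects randomize the conditions and pass along only opaque codes. The sketch below assumes that two-person arrangement; the function name `make_blind_schedule` and the 8-character hex codes are illustrative choices, not a prescribed procedure.

```python
import random
import secrets


def make_blind_schedule(participants, conditions, seed=None):
    """Randomly assign conditions, then hide them behind opaque codes.

    Returns (key, sheet). The key maps code -> condition and stays with the
    coordinator; the sheet maps participant -> code and is all the
    experimenter running the sessions sees.
    """
    rng = random.Random(seed)
    people = list(participants)
    # Balanced assignment: cycle through the conditions, then shuffle the slots
    slots = [conditions[i % len(conditions)] for i in range(len(people))]
    rng.shuffle(slots)
    key, sheet = {}, {}
    for person, condition in zip(people, slots):
        code = secrets.token_hex(4)  # 8 hex characters, unrelated to the condition
        while code in key:           # regenerate on the rare collision
            code = secrets.token_hex(4)
        key[code] = condition
        sheet[person] = code
    return key, sheet


key, sheet = make_blind_schedule([f"P{i}" for i in range(1, 7)], ["treatment", "control"], seed=1)
print(sheet)  # codes only; the experimenter cannot tell who is in which condition
```

The seed makes the condition order reproducible for an audit, while the codes themselves come from `secrets` and differ on every run, so they carry no information about the assignment.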



Strategies
Motivation manipulations to encourage honest responding
Give feedback on compliant behavior in a set of pre-experimental tasks to bring the subject to a state of non-acquiescence
  More difficult, and less confidence in the manipulation outcome
  Aim for cooperation and favorable evaluation for the true experiment
Make the experimental setting and procedures low-keyed and non-threatening (e.g., anonymous or confidential)
  Non-threatening to avoid evaluation apprehension
  Protection of privacy is important
Encourage honest responding through self-monitoring
  Bogus pipeline: the subject is told a device detects lying
  The subject will then give truthful answers
  Not without risk
No method is really infallible, but we do need to think deeply about the problem



Non-interactional Effects
Systematic errors on the experimenter side
  Observer effects
  Interpreter effects
  Intentional effects
Observer effects
  Not so easy to be sure that one has made accurate observations
  Observer effects are well known in astronomy
  Accounted for in interpreting the data
Interpreter effects
  Experimenters rarely debate the accuracy of observations, but will debate interpretation
  Difficult to state rules of accurate interpretation
  Wrongness of interpretation is often due to theory monogamy
  Though theory monogamy is often advantageous
Intentional effects
  Imply dishonesty
  May cook the data too much, i.e., too good to be true
  Need a strong sense of ethics and honesty



Interactional Effects
Biosocial effects
  Gender, age, race of the experimenter
  Subjects may respond differently to those aspects of the experimenter
  Can get different results merely by varying these factors
  Male and female experimenters may unconsciously conduct different experiments
  Male experimenters might be more friendly towards female subjects
  Before declaring gender differences in studies, must make sure subjects were treated the same
Psychosocial effects
  Personality, temperament, etc.
  Differences in hostility, authoritarianism, status and warmth will get different responses
  Warmer examiners tend to get better responses than cooler, more challenging ones



Interactional Effects
Situational effects
  Context, situation
  More experienced experimenters tend to get different results than less experienced ones
  Acquaintance may yield different results as well
  What happens during the experiment can cascade throughout the rest of the experiment
Modeling effects
  Often the experimenter will do a trial run of the experiment
  Sometimes the experimenter’s performance becomes a factor in the subject’s performance
  When the situation is ambiguous, subjects may agree with the experimenter too often
  The experimenter’s behavior may have been influencing results
Self-fulfilling prophecy
  Researchers’ expectations: expectancy effect
  E.g., a teacher thinks X is bright and treats X differently from Y



Experimenter Expectancy Effects
Consider a standard type of experiment
  Differing only in hypotheses, expectations
  E.g., study of “bright” rats: the rats labeled bright did better
Meta-analysis of 345 studies
  Mean effect size of expectancy bias of .33
  Varies according to category of study
  Effects do occur to a considerable degree in all categories
How are expectancy effects communicated?
  Psychological climate
  Physical distance from interactants
  Frequency and duration of interactions
  Eye contact and smiling
  Verbal rewards and punishments



Expectancy Controls
Increase the number of experimenters
  Decreases learning of influence techniques
  Helps maintain blindness
  Randomizes expectancies
  Increases generality of results
Observe the behavior of experimenters
  Sometimes reduces expectancy effects
  Permits correction of unprogrammed behavior
  Facilitates greater standardization
Analyze experiments for their order effects
  Permits inference about changes in experimenter behavior
  Compare earlier with later sessions
  End-of-experiment changes (whew!, etc.)
  Learning effect on the experimenter
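Comparing earlier with later sessions can be done with two simple summaries of one experimenter's results in run order: the late-minus-early difference in means and the least-squares trend across sessions. The sketch below is a hypothetical helper, `order_effects`, with invented session scores; a drift only shows that something changed over the run (experimenter learning, fatigue, a shifting subject pool), not which of these caused it.

```python
from statistics import fmean


def order_effects(scores):
    """Compare early and late sessions run by one experimenter.

    `scores` holds one outcome per session, in the order the sessions were run.
    Returns (late-minus-early difference in means, least-squares slope of the
    outcome on session number, i.e. average change per session).
    """
    n = len(scores)
    if n < 4:
        raise ValueError("need at least four sessions to compare halves")
    half = n // 2
    early, late = scores[:half], scores[n - half:]  # middle session dropped when n is odd
    x_mean = (n - 1) / 2  # mean of session indices 0..n-1
    y_mean = fmean(scores)
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(scores))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return fmean(late) - fmean(early), sxy / sxx


# Hypothetical outcomes for one experimenter's sessions, in run order
diff, slope = order_effects([12, 14, 13, 15, 16, 18, 17, 19])
print(f"late minus early: {diff:.2f}, change per session: {slope:.2f}")
```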



Expectancy Controls
Maintaining blind contact
  Minimizes expectancy effects
  Lack of knowledge of the treatment being given makes expectancy unlikely
Minimizing experimenter-subject contact
  Minimizes expectancy effects
  Important: does it reduce the realism of the manipulations?
  This affects generalization
Employing expectancy control groups
  Permits assessment of expectancy effects
  Experimenter expectancy becomes a second variable
  Gives the magnitude of the expectancy effect
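With expectancy control groups, actual treatment and experimenter expectancy are crossed in a 2x2 design, so each can be estimated separately. The sketch below assumes invented cell means and computes the two main effects and their interaction; the function name `expectancy_control_effects` is hypothetical.

```python
def expectancy_control_effects(cells):
    """Main effects and interaction in a 2x2 expectancy control design.

    `cells` maps (treated, expects_effect) -> mean outcome, where `treated` says
    whether subjects actually received the treatment and `expects_effect` says
    whether the experimenter was led to expect that the treatment works.
    """
    def mean_where(keep):
        values = [v for k, v in cells.items() if keep(k)]
        return sum(values) / len(values)

    treatment = mean_where(lambda k: k[0]) - mean_where(lambda k: not k[0])
    expectancy = mean_where(lambda k: k[1]) - mean_where(lambda k: not k[1])
    # Does the treatment effect change depending on what the experimenter expects?
    interaction = ((cells[(True, True)] - cells[(False, True)])
                   - (cells[(True, False)] - cells[(False, False)]))
    return {"treatment": treatment, "expectancy": expectancy, "interaction": interaction}


# Hypothetical cell means, for illustration only
cells = {
    (True, True): 20.0,    # treated, experimenter expects an effect
    (True, False): 16.0,   # treated, experimenter expects no effect
    (False, True): 14.0,   # untreated, experimenter expects an effect
    (False, False): 10.0,  # untreated, experimenter expects no effect
}
print(expectancy_control_effects(cells))
```

With these means, a conventional comparison of treated subjects run by expectant experimenters against untreated subjects run by non-expectant ones would show a 10-point difference, of which 4 points come from expectancy alone.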



Participant Variables
Demographic and personal characteristics
Critical issue: groups need to be comparable
Methods of control:
Random assignment
  Easiest and surest way of scrambling all possible variables across all groups
  Promotes but does not guarantee equivalence, particularly with small samples
Homogeneous sample
  Restrict variance by narrowing the sample
  Have to control potential confounds
  Price: generalizability can be challenged
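Random assignment can be as simple as shuffling the participant list and dealing it out in turn. The sketch below uses Python's `random.Random` with a seed so the assignment can be reproduced; the helper name `random_assign` and the experience figures are hypothetical, chosen to show a participant variable that random assignment scrambles but, with only eight people, may not balance.

```python
import random


def random_assign(participants, groups, seed=None):
    """Shuffle participants, then deal them round-robin into the named groups."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible and auditable
    pool = list(participants)
    rng.shuffle(pool)
    assignment = {g: [] for g in groups}
    for i, person in enumerate(pool):
        assignment[groups[i % len(groups)]].append(person)
    return assignment


# Hypothetical participants with years of professional experience
experience = {"P1": 1, "P2": 2, "P3": 3, "P4": 4, "P5": 8, "P6": 10, "P7": 12, "P8": 15}
groups = random_assign(experience, ["treatment", "control"], seed=7)
for name, members in groups.items():
    mean_exp = sum(experience[p] for p in members) / len(members)
    print(f"{name}: {members}, mean experience {mean_exp:.1f} years")
```

Rerunning with different seeds shows how far the group means of experience can drift apart with a sample this small.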



Participant Variables
Matched participants
  Virtual twins in each group
  Desired size and diversity
  Rule out group differences
  Difficult to find enough people who match on more than a few variables
  Often a narrow match, but have to be careful
  Referred to as matched group design
Equated groups
  Means, medians and percentages of important participant variables are compared across groups
  Groups should not be significantly different; they should be significantly alike
  Assess by nonparametric tests: chi-square, Z test
  Possible strategy: drop, add or exchange members
  Could change the mean of other variables
  Dropping after measurement should raise skepticism
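For a categorical participant variable, the chi-square check on equated groups compares the observed counts in each group with the counts expected if the groups were alike. The sketch below computes the Pearson statistic by hand on hypothetical counts; for a 2x2 table df = 1, and the conventional .05 critical value is about 3.84. A small statistic is consistent with comparable groups but does not prove them equivalent. SciPy's `scipy.stats.chi2_contingency` computes the same statistic, though it applies Yates' continuity correction to 2x2 tables by default.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    Rows are groups; columns are categories of a participant variable; cells are counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if groups were alike
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts: [novices, experienced] in each group
table = [[6, 4],   # treatment group
         [5, 5]]   # control group
stat, df = chi_square(table)
print(f"chi-square = {stat:.3f}, df = {df}")  # chi-square = 0.202, df = 1
```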



Participant Variables
Statistical control
  Balance secondary variables
  Treat as a covariate in analysis of covariance
  Adjust scores for secondary effects
  Creation of blocking variables
  Study the effect and see whether it interacts with the treatment variable
  Must increase the number of cells
Own control
  Sampling error is the largest error built into a design that has different people in each group
  Especially useful in SWE for accounting for differences in abilities/productivity
  Some studies do not lend themselves to this kind of control, e.g., long-term studies
  Possible problems: first treatment affects response to the second; learning effects from the first test
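Treating a secondary variable as a covariate adjusts each group's outcome mean to what it would be if every group had the same covariate mean. The sketch below computes covariate-adjusted means as in a one-way analysis of covariance, using the pooled within-group slope; it assumes the slope is the same in every group (if the covariate interacts with treatment, that assumption fails). The function name `ancova_adjusted_means` and the data are hypothetical.

```python
from statistics import fmean


def ancova_adjusted_means(groups):
    """Covariate-adjusted group means, as in a one-way analysis of covariance.

    `groups` maps group name -> list of (covariate, outcome) pairs. Estimates the
    pooled within-group slope of outcome on covariate, then moves each group's
    outcome mean to where it would be at the grand mean of the covariate.
    Assumes the slope is the same in every group.
    """
    grand_x = fmean(x for pairs in groups.values() for x, _ in pairs)
    sxy = sxx = 0.0
    for pairs in groups.values():
        mx = fmean(x for x, _ in pairs)
        my = fmean(y for _, y in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx  # pooled within-group regression slope
    adjusted = {}
    for name, pairs in groups.items():
        mx = fmean(x for x, _ in pairs)
        my = fmean(y for _, y in pairs)
        adjusted[name] = my - slope * (mx - grand_x)
    return slope, adjusted


# Hypothetical (years of experience, task score) pairs
data = {
    "treatment": [(1, 12), (3, 16), (5, 20)],
    "control": [(3, 14), (5, 18), (7, 22)],
}
slope, adjusted = ancova_adjusted_means(data)
print(slope, adjusted)
```

Here the raw means favor the control group (18 vs 16), but the control group also has more experience; after adjustment the order reverses (treatment 18, control 16).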



Participant Variables
Extra-experimental changes in participants
Critical issues:
  Especially in cases where considerable time elapses
  Maturity and history
Methods of control
  Cannot be prevented over the long course
  But if assignment is truly random, the odds are greater against systematic problems



Participant Variables
Motivation and role perception
Critical issues
  Are some more motivated than others?
  Are egos more involved in some than others?
  Is it important to be part of a study?
  Unequal benefits may result in unequal performance
  Perception of the role might differ systematically: second guessing, scoping out, expectations
Methods of control
  Judge whether there are the same benefits and rewards
  Constant motivation over time and between groups
  Unobtrusive and non-reactive measures



Participant Variables
Communication among participants
Critical issues:
  Communicating experiences to those waiting for treatment
  Possible where participants are drawn from a co-located population
  Not a problem in some cases, e.g., auditory acuity
  A problem where there are right/wrong answers or judgments
Methods of control
  Physical separation or simultaneous treatment
  With adults, explain the problem and ask for cooperation
  Pretreatment screening for possible contamination
  Participants from different places
  Work quickly and finish before communication can take place
  Monitor for communication



Participant Variables
Placebo effects
Critical issues
  Can be quite powerful effects
  Important where there are change expectancies
  Especially where benefit is expected
Methods of control
  Placebo to a random half of the sample
  Not always appropriate, e.g., psychotherapy
  What about SWE?



Experimental Variables
Critical issues
Interactional effects
  Biosocial effects: demographic, e.g., men reacting to women
  Psychosocial effects: personal characteristics, e.g., don’t like pushy people
  Situational effects
  Modeling effects
  Self-fulfilling prophecies
  Demand characteristics
Noninteractional effects
  Observer effects
  Interpreter effects
  Personal equation, e.g., astronomers’ observations differed
  Selective: effects that are different in one group
  Secondary variance: affects both groups
  Experimenter bias



Experimental Variables
Methods of control
  Institutionalized critical review process
  Experience and self-discipline
Control biosocial effects
  Anticipate them
  Rule them out, minimize by design
  Analyze them
  Report them
Psychosocial effects
  Same ways as biosocial
  Trial experiment and monitor the experimenter
