
Prevention Science, Vol. 7, No. 1, March 2006 (© 2006). DOI: 10.1007/s11121-005-0020-3

The Role of Behavior Observation in Measurement Systems for Randomized Prevention Trials

James Snyder,1,7 John Reid,2 Mike Stoolmiller,3 George Howe,4 Hendricks Brown,5 Getachew Dagne,5 and Wendi Cross6

Published online: 30 March 2006

The role of behavior observation in theory-driven prevention intervention trials is examined. A model is presented to guide choice of strategies for the measurement of five core elements in theoretically informed, randomized prevention trials: (1) training intervention agents, (2) delivery of key intervention conditions by intervention agents, (3) responses of clients to intervention conditions, (4) short-term risk reduction in targeted client behaviors, and (5) long-term change in client adjustment. It is argued that the social processes typically thought to mediate interventionist training (Element 1) and the efficacy of psychosocial interventions (Elements 2 and 3) may be powerfully captured by behavior observation. It is also argued that behavior observation has advantages in the measurement of short-term change (Element 4) engendered by intervention, including sensitivity to behavior change and blinding to intervention status.

KEY WORDS: prevention trials; behavior observation; mediators; short-term outcomes.

It is as important to know how intervention works as it is to document that it works (Follette, 1995). The most informative randomized field trials of preventive or clinical interventions are theory-driven. That is, they are guided by two integrated sets of theoretical concepts. The first set reflects a theory of risk and protective processes, and provides guidance about cognitive, emotional, and behavioral targets for change and the conditions that influence their expression. Theory-driven trials are also guided by an explicit theory of change. Both the content and process of intervention are shaped by hypotheses about the psychosocial conditions needed to systematically alter targeted risk and protective mechanisms. Intervention theory specifies models of learning and skills attainment, and mechanisms of social influence. In summary, theory-driven prevention trials have two interlocking goals: to test the efficacy of intervention in terms of reducing risk for future disorder and to ascertain whether the theoretical mechanisms hypothesized to mediate change as a result of intervention account for the observed reduction in risk (Eddy et al., 1998). Achievement of both goals requires careful attention to measurement issues (Reid, 2003).

1 Department of Psychology, Box 34, Wichita State University, Wichita, Kansas.

2 Oregon Social Learning Center, 160 East 4th Avenue, Eugene, Oregon.

3 Research and Statistical Consulting, 3084 Lakeshore Blvd, Marquette, Michigan.

4 Psychiatry and Human Behavior, George Washington University, 2300 K Street NW, Warwick Bldg Room 303, Washington, District of Columbia.

5 Department of Epidemiology and Biostatistics, College of Public Health MDC-56, University of South Florida, 13201 Bruce B. Downs Blvd, Tampa, Florida.

6 Department of Psychiatry and Pediatrics, University of Rochester Medical Center, 300 Crittenden Blvd, Rochester, New York.

7 Correspondence should be directed to James Snyder, Department of Psychology, Box 34, Wichita State University, Wichita, Kansas 67260-0034; e-mail: [email protected].

This paper examines the role and potential utility of behavior observation as a measurement tactic in theory-driven intervention trials. First, a model of the core elements comprising theory-driven trials is presented. Second, observational methods are described and contrasted with other measurement methods such as ratings, self-reports, and clinical judgments. Third, the potential advantages and disadvantages of behavior observation in measuring each of the core elements are examined in terms of construct validity, measurement bias, and affordability.

1389-4986/06/0300-0043/1 © 2006 Society for Prevention Research

CORE ELEMENTS OF THEORY-DRIVEN INTERVENTION TRIALS

We propose that five core elements shown in Fig. 1 need to be defined and measured to answer questions about the efficacy of interventions and the mechanisms by which efficacy is attained. The specific constructs defining each element are carefully articulated by the theories of etiology and behavior change that inform the intervention effort. Each element reflects one component of a complex causal process, and arrows linking elements designate causal connections. The design of intervention trials entails making choices about measurement strategies to adequately capture each construct or element as defined by the theories informing the intervention.

The first core element entails the transfer of skills from the training specialist to the agent who delivers the intervention (Forgatch et al., 2005). The intervention agent could be a parent in parent training (Forgatch, 1994), a home visitor for young mothers (Olds, 2002), a teacher in school-based intervention (Ialongo et al., 2001), or a group trainer for at-risk adolescents (Botvin, 2000). Skills transfer typically occurs during formal training sessions, supervision, coaching, and staff meetings with intervention agents. Measurement of this element has two objectives: to assess the degree to which trainers of intervention agents carry out the training activities as planned and to assess the degree to which these activities in fact lead to the intervention agent's acquisition of critical behavior change skills.

The second core element refers to the application by the intervention agent of the concrete actions designated as critical elements of the intervention. Measurement objectives for this element involve testing the degree to which the intervention agent delivers these critical elements when interacting with client participants (Hogue et al., 1996). These elements can include structured activities, such as the presentation of new information, modeling, and role playing; they also include behavioral tactics to induce client motivation, respond to cooperative and resistant client behavior, and adjust the manner in which critical elements are delivered to fit the characteristics of client participants (Miller & Rollnick, 2002).

The third core element reflects the immediate responses of clients to the actions of the interventionist during interaction in the intervention session. Measurement objectives for this element involve testing whether the set of client behaviors specified as critical targets by intervention theory change in meaningful ways during and across intervention sessions (Stoolmiller et al., 1995). They also involve testing whether these changes can be attributed to the actions of the intervention agent. The second and third elements comprise the active ingredients or behavior-change mechanisms specified by intervention theory: If the program is delivered with fidelity, then the client is predicted to respond in a favorable manner. In this formulation (see Fig. 1), the "causal" paths between Elements 2 and 3 are bidirectional. When intervention is delivered during face-to-face social interaction, influence is likely to be reciprocal. In this sense, intervention delivery is co-constructed, and active treatment mechanisms entail social exchange rather than a "top-down" interventionist-to-client model of social influence.

The fourth core element is the impact of intervention on short-term or proximal outcomes. The primary measurement objective is to assess the degree to which intervention systematically alters risk or protective processes as they operate in the client's day-to-day environment. The definition of "short-term" is relative to the risk or protective factors targeted for change by the intervention, but for many psychosocial interventions short-term entails periods ranging from a few weeks to a few months. The time frame for measuring short-term outcomes may begin during intervention delivery (including baseline) and typically extends into a period after intervention has ended, as engendered change persists and generalizes outside of the intervention setting (DeGarmo et al., 2004). Client behaviors defining short-term outcomes are typically composed of larger behavioral aggregates than the moment-by-moment interactions between the interventionist and client described by the second and third elements.8

8 Some theories may not distinguish between Core Elements 3 and 4. For example, intervention with young adolescents may entail using instruction and role playing to shape skills to resist peer influence or to refuse offers to use drugs. This same set of skills may be the desired short-term outcome. Even if this is the case, the distinction between Elements 3 and 4 may be useful. Core Element 4, in contrast to Element 3, refers to the display of the target response outside of the intervention session—its generalization to natural environmental settings and its persistence over time after its acquisition and practice. Heuristically, measurement of the same response in two different contexts may be optimized by different methods.


Fig. 1. Generic model of measurement elements for intervention trials.

The fifth and last core element concerns the long-term or distal outcomes that motivate the design and implementation of intervention. The long-term outcomes to be engendered by intervention are typically more enduring, apparent in multiple settings, and entail a persisting reduction in morbidity and/or mortality, personal suffering, and social costs. Or, in a complementary fashion, they may entail a sustained increase in skill, knowledge, and resilience representing attainment of socially approved and valued characteristics, roles, and competencies.

Some randomized trials measure only a subset of these core elements. Designs that remove Elements 2 and 3 result in a traditional baseline, post-intervention, and follow-up assessment scheme that ignores measurement of active intervention ingredients. Designs that remove Elements 1 and 2 ignore measurement of training, implementation, and treatment fidelity. Even intervention trials that assess Elements 1–3 rarely use behavior observation as a measurement strategy. Two other constructs are shown in Fig. 1: characteristics of the intervention agents and clients. These constructs can be used to examine how characteristics of intervention agents and clients moderate change processes and the effectiveness of intervention (MacKinnon & Lockwood, 2003).

The five core elements in the heuristic model will not be measured in the same way across all intervention targets or across all theories of risk and behavior change. Before describing measurement options and offering a set of guiding principles for formulating measurement decisions applicable to a variety of intervention targets and change theories, we describe how the heuristic model could be applied to an existing intervention theory, Parent Management Training.

AN EXAMPLE OF THEORY-DRIVEN INTERVENTION TRIALS: PARENT MANAGEMENT TRAINING

Parent Management Training (PMT) is selected as an example of the heuristic model because it is based on a detailed theory of familial contributions to child conduct problems (Reid et al., 2002), clearly delineated intervention strategies (Forgatch, 1994), and specification of the mechanisms by which intervention has its effects (Schrepferman & Snyder, 2002). The model can be applied to preventive interventions delivered in other settings (e.g., schools), derived from other theories (e.g., attention regulation and reading instruction), and applied by other agents (e.g., teachers). An explicit application of the model to a universal school-based program is provided later in this report.

The intervention agents in PMT are the parents, who are trained (Element 1) by PMT therapists to provide a home environment that reduces child risk for conduct problems and increases child social competence. Critical PMT parenting skills (Element 2) include tracking child behavior, clear and concrete commands, contingent use of time out for child aggressive behavior, and reinforcement of child cooperation (see Fig. 2). These parenting conditions are explicitly connected to expected child reactions—reductions in aggressive behavior and increased cooperation (Element 3; Patterson, 1982; Snyder & Stoolmiller, 2002). Child behavior also influences parental actions. Therefore, PMT trainers teach parents to respond constructively when children are disruptive or resist parents' behavior-management efforts (e.g., reduced threats in response to child aversive behavior; use of response cost for refusal of time out). The PMT model hypothesizes that as these changes in parent behavior are sustained over time, there will be a reduction in child behavior problems at home and school (Element 4). These proximal changes contribute to long-term reductions in risk for delinquency, drug use, and school failure (Element 5).9

MEASUREMENT ALTERNATIVES IN THEORY-DRIVEN INTERVENTION TRIALS

A range of methods can be used to measure each of the core elements in the model. This report examines how behavior observation might be applied to the core elements and the potential advantages of doing so. We first describe what we mean by behavior observation, and distinguish between observational and other measurement methods (ratings by natural agents, self-report, interviews, and clinical judgments) in terms of the sources of variance optimally captured by each.

9 There is another step in the training process—training the PMT therapists who provide training for parents. For the sake of simplicity of exposition, this additional element is not described here, although observational methods and sequential analyses are clearly relevant to measurement of the training of PMT therapists as well (Forgatch et al., 2005).

WHAT ARE BEHAVIORAL OBSERVATIONS?

A broad range of measurement techniques ask people to describe the actions of others. We reserve the term behavioral observation for methods that meet four central requirements. First, systematic observation incorporates a set of mutually exclusive categories for describing behavior. This does not mean that behavior can be described in only one way. Different categorization schemes can be applied to the same set of observations, but each of these schemes must have categories that are mutually exclusive within that scheme.

Second, explicit criteria are used to define each behavior category in concrete terms so that behavior can be coded with minimal inference. Behavior categories can range from micro-temporal expressions to longer trains of action. These criteria are typically described in terms of the topography or function of the behavior, and capture aspects of behavior deemed critical by theory. The onset and offset (or sequence) of the behaviors may also need to be recorded.

Third, clear criteria must be provided regarding the "location" or setting in which the episodes of behavior are observed and coded. This location may include a physical place (e.g., classroom, playground, home), the social players involved (e.g., teachers, peers, siblings, parents), and a time frame with a specified beginning and end. Independent observers who record behaviors in this specified time, period, and place meet this criterion. Teacher ratings of child behavior "over the past week" are unlikely to meet this criterion because such ratings are based on poorly specified contexts that are not accessible to a second observer.

Fourth, the observational system must include methods to verify that obtained data are replicable across observers. This entails the availability of the relevant raw behavioral data to two or more observers, and calculation of agreement among observers on the coding of those data. Low rates of agreement reflect a failed attempt at behavior observation. Without careful training, regular recalibration, clearly specified target behaviors, and clear demonstration of observer agreement, ratings by trainers, interventionists, teachers, or parents would not be considered behavior observation. These ratings are a global index of adaptation from the perspective of a natural rater that includes variance due to idiosyncratic rater characteristics (Kellam, 1990).
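Observer agreement of this kind is commonly quantified as percent agreement and as Cohen's kappa, which corrects agreement for chance. The sketch below is an illustrative aside, not part of the original article; the behavior codes and data are hypothetical.

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Proportion of events on which two observers assign the same code."""
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Observer agreement corrected for chance agreement."""
    n = len(codes_a)
    p_obs = percent_agreement(codes_a, codes_b)
    # Chance agreement: summed products of the observers' marginal proportions.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_chance = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_obs - p_chance) / (1 - p_chance)

# Two observers independently code the same six events using
# mutually exclusive categories (hypothetical codes).
obs1 = ["command", "comply", "praise", "command", "refuse", "timeout"]
obs2 = ["command", "comply", "praise", "command", "comply", "timeout"]
print(round(percent_agreement(obs1, obs2), 3))  # 0.833
print(round(cohens_kappa(obs1, obs2), 3))       # 0.786
```

Kappa discounts the agreement two observers would reach by guessing from their marginal code rates, which is why it is generally preferred to raw percent agreement for mutually exclusive category systems.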


Fig. 2. Application of the model of measurement elements to parent management training intervention for conduct problems.

SOURCES OF VARIANCE IN DIFFERENT MEASUREMENT METHODS

No one measurement method is superior to others in some a priori manner. Rather, the validity and utility of each method depends on the nature of the construct being defined—in our case, the five core elements of theory-driven intervention trials as specified by particular risk and intervention theories. However, it would be useful to have a set of principles to guide the choice of measurement methods to define these elements. Cairns and Green (1979) offer an analysis of measurement methods, contrasting behavioral observation and rating scales, from which such principles might be derived. A brief overview of the relative sources of variance captured by behavioral observations and ratings is shown in Table 1 (modified from Cairns & Green, 1979).

As shown in the left column of the table, variation in behavior can be apportioned to a number of sources, including characteristics of the Assessor (S²_A), stable characteristics of the Target person whose behavior is being measured (S²_ST), temporary characteristics of the Target person (S²_TT), changing characteristics of the Setting in which behavior is observed (S²_S), characteristics of the social Interchange in which the behavior is embedded (S²_I), and nonsystematic Error (S²_e). Ratings and behavior observation differ in their sensitivity to these various sources of variance, and thus represent the behavioral phenomena being measured in different ways.

In terms of guiding selection of measurement methods for the core elements, behavior observation may be more applicable and powerful in measurement of elements that reflect behavioral variation across time and over situations (S²_TT; S²_S) and behavioral variation due to social influence (S²_I), or when systematic variation due to assessor characteristics (S²_A) is particularly worrisome (i.e., systematic bias). Global ratings may be more applicable and powerful measures of elements in the model that reflect stable characteristics of the person whose behavior is being measured (S²_ST), and when variance due to assessor characteristics (S²_A) can be reduced or is unlikely to be a source of systematic measurement error.

CRITERIA FOR SELECTING MEASUREMENT METHODS IN INTERVENTION TRIALS

Measurement methods for each element in intervention research need to be chosen in a manner that "maps onto" the theories of risk and behavior change from which the intervention is derived. Although these theories posit a wide variety of constructs and mechanisms, five general questions are relevant to the choice of measurement methods for each of the core elements, and to determining when behavior observation may be particularly relevant and powerful. (1) Does the core element involve behaviors that are public and observable? (2) What time scale for behavioral variation is inferred by the element? (3) Does the element reflect planned social influence? (4) To what degree does the element reflect behavior as it is changing over time? (5) Is measurement of the element potentially subject to systematic error or bias? Although other psychometric attributes are relevant to the choice of measurement methods, these questions are considered from the perspectives of construct validity (Questions 1–4) and measurement error (Question 5). Parent Management Training is used as an example to further elucidate these questions.

Table 1. Sources of Variation (S²) in Behavior Captured by Ratings and Observations

| Source of variation | Examples | Variance captured by ratings | Variance captured by observation |
|---|---|---|---|
| Characteristics of the assessor (S²_A) | Selective memory and attention; limits of perception; biases toward participants (halo); biases toward groups (stereotype); disposition of the assessor; knowledge of the reference population (bias toward treatment or other condition); idiosyncratic definition of the construct and relevant behaviors; scaling of the individual on the group distribution | More | Less |
| Characteristics of the target person—stable (S²_ST) | Enduring dispositions | More | Less |
| Characteristics of the target person—temporary (S²_TT) (internal environment) | Fatigue, mood, hunger, etc.; diurnal rhythms | Less | More |
| Characteristics of the setting/environment (S²_S) (external environment) | Behavior of the partner; social context, task; physical setting; reactivity to assessment | Less | More |
| Characteristics of the interchange (S²_I) | Behavior of the partner in relation to behavior of the target | Less | More |
| Nonsystematic error (S²_e) | In behavior ascertainment, recording, analysis | ?? | ?? |

Note. Adapted from Cairns and Green (1979).
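The apportionment in Table 1 can be made concrete with a random-effects variance decomposition in the spirit of generalizability theory. The sketch below is a hedged illustration rather than anything from the original article: the ratings are invented, and it assumes a fully crossed design (every assessor scores every target once).

```python
import numpy as np

def variance_components(scores):
    """Estimate target (S2_ST), assessor (S2_A), and residual (S2_e)
    variance components from a targets-by-assessors score matrix,
    via the expected-mean-square equations for a two-way random design."""
    scores = np.asarray(scores, dtype=float)
    n_t, n_a = scores.shape
    grand = scores.mean()
    ms_target = n_a * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_t - 1)
    ms_assessor = n_t * ((scores.mean(axis=0) - grand) ** 2).sum() / (n_a - 1)
    resid = (scores - scores.mean(axis=1, keepdims=True)
                    - scores.mean(axis=0, keepdims=True) + grand)
    ms_resid = (resid ** 2).sum() / ((n_t - 1) * (n_a - 1))
    return {
        "target (S2_ST)": max((ms_target - ms_resid) / n_a, 0.0),
        "assessor (S2_A)": max((ms_assessor - ms_resid) / n_t, 0.0),
        "residual (S2_e)": ms_resid,
    }

# Four target children rated 1-10 by three assessors (invented data):
# differences between children dominate, but assessor 2 is a
# systematically lenient rater, a small but real assessor effect.
ratings = [[5, 6, 5],
           [2, 3, 2],
           [8, 9, 8],
           [4, 5, 4]]
components = variance_components(ratings)
```

When the assessor component (S²_A) is large relative to the target component, scores say as much about who did the rating as about the person being rated, which is exactly the bias the table attributes to global ratings.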

Question #1: Does the Core Element Involve Behaviors that are Public and Observable?

It is important to determine the degree to which each element (Fig. 1) can be optimally described by observable behavior and environmental events, as defined by the intervention theory from which the constructs are derived. In most psychosocial interventions, Elements 1–3 are readily apparent as observable, public behaviors and environmental events. These interventions entail face-to-face contact and social interchange between trainers and intervention agents (Element 1) and between intervention agents and client participants (Elements 2 and 3) that are overt and observable. The observable intervention conditions and participant reactions hypothesized to serve as the active mechanisms in PMT are shown in Fig. 2.

Other theories may postulate that changes in participants' covert responses to intervention (Element 3) are critical to efficacy, such as increased neuro-physiological inhibition or the formation of explicit intentions. In this case, behavior observation may be less useful in assessing Element 3. Inhibition may be directly and validly measured by neural imaging or electrical brain activity, and intentions by self-report. However, inhibition, intentions, and other covert processes are likely to have observable behavioral referents that could be measured.

In summary, the first three elements of many psychosocial theories of change entail overt, observable behavior by intervention trainers, intervention agents, and client participants. As such, behavioral observation methods are potentially relevant measurement tactics. Short-term (Element 4) and long-term (Element 5) outcomes also may entail public, observable behavior, but other aspects of construct validity as well as systematic measurement error and affordability influence the utility of behavior observation as a method to measure the various elements, especially Elements 4 and 5.

Question #2: What Time Scale for Behavioral Variation is Inferred by the Element?

The phenomena that comprise each element in the model vary in time frame, molecularity (response disaggregation), and function, as specified by the theories informing the intervention. Some phenomena defined by the elements may occur very rapidly—in seconds or minutes, whereas others may occur over longer time periods—days, months, or years. Some phenomena entail discrete, molecular events—a definable set of temporally brief, concrete actions or reactions, whereas others are better described as more generalized aggregations of behavior—defined by a larger set of interrelated responses over longer time periods, or by behavioral products. Thinking in terms of the variance sources described in Table 1, discrete, molecular behaviors and events that change frequently over short time periods (S²_TT; S²_S) may be more usefully measured by behavior observation, whereas global ratings and self-reports may be more useful and powerful in capturing elements involving more generalized, molar, and temporally cumulating behavior (S²_ST).

In many theories, interventionist training, delivery of intervention, and clients' responses are construed as exchanges of relatively molecular behaviors varying over relatively short time periods. Insofar as this is the case, Elements 1–3 may be usefully and perhaps optimally measured by behavior observation. This is apparent in Parent Management Training (PMT—see Fig. 2). Training of the parent by the PMT therapist (Element 1) entails a carefully sequenced set of specific therapist behaviors (queries, supportive statements, instructions, modeling, role playing, feedback, etc.). The target behavioral skills to be acquired by the parent are described in some detail in PMT intervention manuals (Forgatch, 1994). Element 1 as defined by PMT theory is observable, and perhaps is even best captured by behavior observation. Parents' application of acquired parenting strategies (Element 2) and children's reactions to those strategies (Element 3) in the home are also specified by the family theory that informs PMT. These actions and reactions involve molecular and observable parent and child behaviors occurring rapidly in time (e.g., parent commands, contingent attention and time out, child coercive behavior and cooperation).

Short-term (Element 4) and long-term (Element 5) outcomes engendered by most psychosocial interventions are potentially observable, but involve characteristics of the client participant (S²_ST) that are hopefully becoming more stable over time and across settings, and thus more aptly captured by rating scales or methods other than behavior observation. The short-term outcomes of Parent Management Training are often measured by parent or teacher ratings and child self-report. Long-term outcomes of PMT are measured by even broader indices of adjustment (see Fig. 2), such as arrests, school dropout, and placement in out-of-home care (Schrepferman & Snyder, 2002). However, there are other reasons why behavioral observation may be important in measuring short-term outcomes—an issue considered further in relation to Question #4.

Question #3: Does the Element Reflect Planned Social Influence?

Most psychosocial interventions are derived from theories incorporating the notion of systematic social influence as a central process in behavior change. Training intervention agents (Element 1), delivery of intervention (Element 2), and its impact on the client (Element 3) implicitly or explicitly involve social influence. Focusing on the influence of the intervention agent on the client, most theories would hypothesize that the more (in terms of frequency, duration, and consistency) and the more sensitively the intervention agent can expose the client to critical intervention conditions (Element 2), the greater the change in the behavior (Element 3) of the client.

We have described behavior observation asa useful and powerful way to measure Elements1–3. It might be alternately argued that global ratingsby a trainer of intervention agents (Element 1), or byintervention agents, clients, or independent assessorswould be sufficient to accurately assess Elements 2and 3—and this is a common method of measuringtraining (Element 1), implementation and fidelity(Element 2), and intervention mechanisms (Ele-ments 2 and 3). However, rating scales summarizing


behavior over multiple social exchanges and long periods represent social influence in a relatively gross and limited manner. In fact, observation methods representing the behavior of trainers, interventionists, and clients as independent counts of molecular actions may not be sufficient to adequately represent social influence.

Theories of social influence require a more complex and detailed measurement approach than ratings or simple behavioral counts. The timing and sequence of reciprocal behavior exchange are critical aspects of change theory. To be effective, the intervention agent must provide critical conditions (e.g., instruction, modeling, role playing, and feedback) in a specific sequence, and do so with a timing and tactic selection that is sensitive to and accommodates the reactions of the client. Optimal and accurate measurement of Elements 2 and 3 (as well as Element 1) may require not only the use of behavioral observation, but also coding of the sequence or order in which behaviors of the interventionist and client occur in real time. Sequential coding of observed social exchanges can then be represented in Markov, hazard, or other mathematical models (Dagne et al., 2002; Gardner & Griffin, 1989). In terms of the "sources of variance" framework in Table 1, measures of social influence need to explicitly capture variance due to reciprocal social Interchange between interventionist and client (S²I). Measurement methods reflecting social influence are less relevant to short- and long-term outcomes.
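To make the sequential-coding idea concrete, the following is a minimal sketch (not taken from the cited studies) of how a coded stream of interventionist–client behavior can be summarized as first-order Markov transition probabilities; the two-letter codes are hypothetical stand-ins for a real coding system.

```python
from collections import Counter, defaultdict

# Hypothetical codes for a coded exchange stream:
# "TI" = therapist instruction, "CC" = client cooperation, "CR" = client resistance.
sequence = ["TI", "CC", "TI", "CR", "TI", "CC", "CC", "TI", "CR", "CR", "TI", "CC"]

def transition_matrix(seq):
    """Estimate first-order Markov transition probabilities P(next | current)
    from a single coded behavior stream."""
    counts = defaultdict(Counter)
    for current, nxt in zip(seq, seq[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(nxts.values()) for nxt, n in nxts.items()}
        for state, nxts in counts.items()
    }

probs = transition_matrix(sequence)
# probs["TI"]["CC"] estimates how often client cooperation
# immediately follows a therapist instruction.
```

Transition probabilities of this kind are the raw material for the Markov and hierarchical sequential models cited above (e.g., Dagne et al., 2002).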

PMT theory, for example, identifies specific molecular sequences of parent–child behavior exchanges (Elements 2 and 3) as critical to achieving positive short-term child outcomes (Element 4). These include parental commands followed by child compliance, child compliance followed by parental positive reinforcement, child noncompliance followed by parental use of time out, and reduced parental irritable reactions to child aversive behavior (Schrepferman & Snyder, 2002).10

10 This model of social exchange and influence can also be applied to therapists' training of parents in PMT skills. Parents react to actions of the PMT therapist, and these reactions may be cooperative or resistant. As such, reciprocal influence is embedded in this element. Efforts to assess this process have been made (Stoolmiller et al., 1993), and strategies by which PMT therapists can effectively deal with resistance are explicitly addressed in intervention manuals and in training of PMT therapists (Forgatch et al., 2005). These therapist–parent actions and reactions reflect sources of variation due to social Interchange (S²I). This suggests that behavior observation methods are also relevant to the measurement of Core Element 1.

Question #4: To What Degree Does the Element Reflect Behavior as it is Changing Over Time?

Theoretically, short-term changes (Element 4) engendered by intervention should be explicitly linked to the processes mediating intervention effects (Elements 2 and 3), and to the long-term changes (Element 5) that are the ultimate goal of intervention. Intervention theory should specify the time frame during which short-term changes occur, accumulate, and stabilize. There is very little empirical guidance concerning the dynamics and timing of behavior change resulting from psychosocial interventions (DeGarmo et al., 2004). In the relative absence of such guidance, a useful tactic is to assess changes in short-term behavioral targets at multiple times, beginning prior to intervention (the classic baseline assessment point) and then intermittently during implementation of intervention through its termination (including the classic post-intervention measurement point). Given that behavior change may be sudden and nonlinear rather than cumulative and orderly (Eddy et al., 1998), measures of targeted short-term change derived at multiple time points are needed to estimate linear and other forms of growth. Measurement at multiple time points has the added advantage of providing substantially more reliable estimates of change parameters (Willett, 1989) and boosts both effect size and power (Kraemer, 1991). Given the expense of randomized intervention trials, these issues are critical in reducing Type II error.

The strategy of collecting data on target behaviors at multiple time points implies that measurement methods to detect short-term outcomes should be sensitive to change over relatively short time frames, on the order of weeks or months. Rating scales derived from cross-sectional and diagnostic research are typically used to detect short-term change. These scales often have strong psychometric characteristics, such as cross-time stability and concurrent validity, that emphasize static pictures of individuals' functioning (S²ST). Rating scales asking informants to assess clients' behavior in relatively global terms and over relatively long time periods may be insensitive to change in specific behaviors over short temporal frames. Reliable and valid measures sensitive to changes in attributable risk over brief intervals are needed (Collins, 1991).
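The multi-wave measurement strategy described above can be sketched in its simplest form: a per-participant linear change parameter estimated by ordinary least squares from scores at several assessment waves. The wave schedule and scores below are hypothetical, and a real analysis would use a multilevel growth model rather than participant-by-participant regressions.

```python
# Illustrative only: hypothetical assessment waves (weeks from baseline)
# and observed rates of a targeted problem behavior for one participant.
weeks = [0, 2, 4, 6, 8]               # baseline, during intervention, post
scores = [12.0, 10.5, 9.0, 8.2, 7.1]

def ols_slope(x, y):
    """Least-squares slope: estimated change in the target behavior per week."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

slope = ols_slope(weeks, scores)      # negative: problem behavior declining
```

With five waves instead of the classic pre/post pair, the slope is estimated from all assessments, which is the source of the reliability and power gains noted above (Willett, 1989; Kraemer, 1991).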


To what degree might observational methods provide more time-dynamic indicators of change? An important characteristic of observational data is that the temporal variability in behavior coded at a molecular level is typically large (Jones et al., 1975; Snyder & Stoolmiller, 2002). Such variability reflects short-term changes in the person being assessed (S²TT), in the ambient setting conditions (S²S), and in the social environment (S²I). Although this short-term, situational variability is often viewed as a measurement "problem," it can be seen as advantageous in assessment of presumably malleable short-term targets of intervention. In contrast to "trait" measurement strategies appropriate to long-term outcomes (S²ST), short-term outcomes entail measuring the degree to which the target behavior of clients is changed by intervention over relatively short time periods; this is the exact strength of behavioral observations. This argument was made powerfully by Cairns and Green (1979): ". . . the unreliability [of repeated] behavioral observations occurs primarily because behavioral measures are indeed precise and sensitive to powerful effects of interactional and contextual controls. In fact, the abilities of the child [client] to adapt to the changing demands of social and nonsocial environment [intervention] constitute the central foci for a developmental [short-term change] analysis." "The techniques that are most effective in describing the [long-term] outcomes of development [intervention] may not be most effective in analyzing the processes [short-term changes] by which social patterns arise and are maintained or eliminated" (pp. 223–224; italics in the original, bracketed material added).

Occasion-to-occasion variability in observed rates of behavior is often quite high. Rates of child physical aggression on school playgrounds, for example, have temporal stability coefficients in the range of .20–.40 (Stoolmiller et al., 2000). The size of these temporal reliabilities increases as the duration of the observation at each occasion is longer (Radke-Yarrow & Zahn-Waxler, 1979) or as behavioral data are averaged over multiple observation occasions (Stoolmiller et al., 2000). Given multi-occasion assessment, data from behavior observations can be used to estimate "trait" variance (and, in fact, changes in trait variance over time) comparable to rating scales (Stoolmiller et al., 2000), but maintain the desirable characteristics of providing real counts of behavior and sensitivity to variation in behavior over time.
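The gain in reliability from averaging over occasions can be illustrated with the classic Spearman-Brown prophecy formula, which projects the reliability of a k-occasion average from the single-occasion stability (assuming, as a simplification, that occasions are parallel). The .30 figure below is simply the midpoint of the .20–.40 range cited in the text.

```python
def spearman_brown(r_single, k):
    """Projected reliability of a score averaged over k parallel occasions,
    given single-occasion reliability r_single (Spearman-Brown prophecy)."""
    return k * r_single / (1 + (k - 1) * r_single)

# With occasion-to-occasion stability of .30 (mid-range of the
# playground-aggression coefficients cited in the text):
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.30, k), 2))
```

Under this assumption, averaging over four brief observation occasions roughly doubles reliability, consistent with the aggregation results reported by Stoolmiller et al. (2000).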

Question #5: Is Measurement of the Element Potentially Subject to Systematic Error or Bias?

We have argued that observation, because of its sensitivity to variation in behavior over time, may optimally measure short-term intervention outcomes. A reasonable counter-argument is that intervention should generate short-term outcomes that are becoming increasingly stable over time and across situations. This emphasis on the stability or increasing dispositional quality (S²ST) of the short-term targets of intervention maps well onto global ratings or reports by client-participants, intervention agents, or other natural informants. In terms of face and construct validity, the use of rating scales and self-report to measure short-term outcomes in randomized intervention trials seems eminently sensible.

However, an additional issue bears consideration before ratings and self-reports are selected as the sole method to measure short-term outcomes: systematic error or measurement bias. Systematic error occurs when a method of measurement systematically assigns values to variables different from their true values. One critical source of bias in measuring short-term outcomes entails assessors' awareness of intervention condition (one part of S²A in Table 1).

Systematic bias due to knowledge of clients' assignment to intervention condition can occur in any measurement method. A prime example in behavior observation entails observer drift, in which an observer applies coding categories to behavior in consistently idiosyncratic ways. One potential source of drift is knowledge of intervention condition. Systematic bias in observation has received considerable empirical attention. Methods of training and recalibrating observers to minimize bias due to observer drift have been developed and are quite effective (Bakeman & Gottman, 1997). The use of two observers provides the opportunity to assess systematic error (in contrast to random error due to low observer agreement; Jones et al., 1975).
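The two-observer check described above is conventionally quantified with a chance-corrected agreement statistic such as Cohen's kappa. The sketch below is illustrative only; the three codes and the two observers' records are hypothetical.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two observers' category codes."""
    assert len(codes_a) == len(codes_b)
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Expected chance agreement from each observer's marginal code frequencies.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical parallel codes from two observers of the same session
# ("on" = on-task, "off" = off-task, "dis" = disruptive):
a = ["on", "on", "off", "dis", "on", "off", "on", "dis", "on", "on"]
b = ["on", "on", "off", "on",  "on", "off", "dis", "dis", "on", "on"]
kappa = cohens_kappa(a, b)
```

Low kappa flags disagreement; a consistent pattern of disagreement between a calibrating observer and a field observer is one signal of the observer drift discussed above.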

Systematic bias is also a problem in using ratings, clinical judgments, and self-report methods to measure short-term outcomes. Individuals who are the sources of these measures are often aware of the intervention status of clients and have investment in the positive effects of the intervention (White et al., 1985). Ratings by teachers, parents, peers, therapists, and clients are known to change in systematic ways given knowledge of intervention status. It is the inherent lack of raters' blindness to the


assignment of participants to intervention conditions that distinguishes "open label" trial designs from classic double-blind randomized controlled trial designs (Brown, 1993). In pharmaceutical research, "open label" designs are considered less rigorous than double-blind designs. Is awareness of the intervention condition to which a client is assigned a serious limitation in estimating short-term treatment efficacy? It would be if sole reliance were placed on reports of natural agents who are aware of participants' intervention assignment and who often have investment in demonstrating the efficacy of the intervention.

The potential and sometimes severe bias resulting from estimates of short-term outcomes derived from natural raters is apparent in research on parent management training. Patterson and Reid (1973) reported that 100% of parents who received parent management training rated their children's behavior as improved, whereas only 50–75% showed improvement according to behavioral observations. Walter and Gilmore (1973) reported that 100% of parents in a placebo condition as well as in parent training reported substantial improvement in their children's behavior using a global impressions measure; in contrast, behavior observations indicated that problematic child behavior was reduced by over 60% in the active treatment condition but increased by 25% in the placebo condition. Schelle (1974) examined the validity of parent global reports of children's treatment-induced improvement in school attendance. In all, 37% of parents indicated that their children's attendance was "very improved" and 42% indicated it was "improved," whereas school records indicated children's attendance had actually decreased by 25 and 15%, respectively. The correlation between parent reports and school records was −.20.

Intervention trials can overcome this deficit in interveners', natural raters', and participants' blindness provided critical data are collected in ways that minimize systematic bias. Behavior observations have the potential to provide these critical data for two reasons. First, observers can be methodologically blinded to both hypotheses and client treatment status, although absolute blinding is often difficult to achieve (Carroll et al., 1994).

Behavior observation has a second characteristic that reduces systematic bias. Trained observers are less easily biased by knowledge of treatment status or other systematic expectations than are natural, untrained raters such as teachers, interventionists, and clients. Skindrud (1973) compared the impact on observational data of coders' knowledge of family treatment status (intervention vs. control and baseline vs. post-intervention). Rates of child deviant behavior from informed versus uninformed coders showed no significant effect for intervention assignment or assessment timing. Kent et al. (1974) assigned well-trained observers to two groups. One group was told that a series of videotapes would show improvements in child behavior, and a second group was told that the child's behavior was getting worse. There were no significant group effects on rates of observed deviant child behavior.

The diminished susceptibility of behavior observation to systematic bias is the result of careful definition of behaviors to be coded, observer training to criterion reliability, explicit calculations of agreement, and ongoing training to reduce observer drift (Patterson, 1982). In contrast, systematic bias is greater as the behaviors to be measured are less concretely defined, informants are untrained, the reference time period is long, and the contexts for behavior are diverse; these are exactly the conditions characterizing many sources of ratings.

Long-term outcomes reflect more global and enduring reductions in dysfunction, or increases in capacities and resilience. As such, they are appropriately assessed using multiple global measures (e.g., public records; reports by self and natural raters; service utilization; objective tests of ability and health status; indicators of social support, occupational status, and educational attainment) reflecting stable, trait-like behavioral outcomes and products (S²ST) congruent with long-term intervention goals. Moreover, many informants or sources for behavioral products used to measure long-term outcomes are relatively blind to clients' previous involvement in intervention. Observation methods are less useful for these tasks.

A SECOND EXAMPLE OF THEORY-DRIVEN INTERVENTION TRIALS: THE BALTIMORE SCHOOL PREVENTION TRIAL

To this point, Parent Management Training has been used as a concrete example to describe selection of measurement methods for theory-driven intervention trials. The use of PMT as the sole example leaves open the applicability of the model and the principles guiding measurement choices to prevention trials situated in other ecologies or derived from alternate theories. To demonstrate their generality, the concepts described in this report are now applied to a school-based, universal intervention, the Baltimore Prevention Program (Reid, 2003), currently being implemented as a randomized trial in first-grade classrooms. Two key components of the intervention are an enhanced reading curriculum and a classroom behavior management strategy. These components target known, co-occurring risk factors for drug use, delinquency, and psychiatric symptoms in adolescence. The first-grade teacher is the primary intervention agent.

Focusing solely on the behavior management component (see Fig. 3), teachers are systematically trained (Element 1) to use contingent attention (Element 2) to increase children's academic engaged time and to decrease child disruptive behavior (Element 3). The intervention manipulation or active ingredient (teacher contingent attention) and child responses (being on task, disruptive behavior) are detailed in concrete terms, can be observed and counted, and their expected (reciprocal) interrelationship is specified. As child academic engaged time increases and disruptive behavior decreases over days and weeks, accompanied by effective instructional practices, measurable short-term increases in child reading comprehension and decreases in child behavior problems are expected (Element 4). Child classroom behavior is directly observable, and reading comprehension is measured by brief standardized behavior samples (focused observation of letter naming, initial sound, phoneme fluency, etc.). As these short-term changes accumulate, the child is more likely to meet academic standards for reading and other academic subjects, and ultimately to graduate. The child is also expected to display fewer serious behavioral and emotional problems, to be arrested less often, and to evidence delayed initiation into drug use in later development (Element 5). These long-term outcomes will be measured using official records (school achievement tests, arrests, grade retention, graduation, utilization of health and social services) and reports of key natural informants (youth, teacher, parent).

In the Baltimore trials, active behavior change mechanisms are clearly specified in terms of the timing and sequencing of intervention conditions in relation to children's behavior (the reciprocal relation between Elements 2 and 3). Teacher attention is thought to be effective only insofar as it is delivered in a contingent fashion, regularly following a child's "on-task" behavior or attention and not following child misbehavior. Intervention mechanisms (Element 3) are explicitly tied to short-term outcomes (Element 4). Increased on-task behavior is hypothesized to result in a child's immediate though perhaps short-lived benefit from instruction over the next few minutes. Indiscriminate or noncontingent increases in teacher attention would not be effective in increasing on-task behavior nor the resulting instructional benefit. Short-term reductions in behavior problems and increased on-task behavior and reading skills are construed to be the cumulative result of teacher contingencies and skilled instructional practices. These short-term changes are expected to become increasingly stable and independent of intervention as the Program is implemented across one or more academic years.

The heuristic model and principles for choice of measurement methods could be extended to other prevention programs, such as training life skills to reduce early substance use and enhance health practices (Botvin, 2000), school-based parenting interventions to reduce risk for antisocial behavior and early initiation into drug use (Dishion & Kavanagh, 2004), and community-based interventions targeting HIV and STD risk in runaway youth (Rotheram-Borus et al., 2003). In each application, the selection of measurement methods will vary depending on the theories informing the intervention, but also on the potential contribution of behavioral observation in terms of increased construct validity and reduced measurement bias.

THE AFFORDABILITY OF BEHAVIOR OBSERVATIONS

The scientific merits of various measurement strategies must also be weighed against practical issues. Good measurement is expensive and time consuming. Measurement methods also vary in social validity and acceptability to the organizations providing the context for intervention, intervention agents, and clients. Behavior observation is often avoided in intervention research because of its perceived high cost. These costs include development of a coding system, coder training and recalibration, actual coding of behavior, and data management and analysis. This list appears to make observation a daunting task relative to standardized, "off-the-shelf" rating scales, checklists, self-report instruments, and diagnostic interviews.


Fig. 3. Application of model of measurement elements to an intervention involving teacher classroom management of child attention and behavior.

The cost and complexity of using behavior observations are not as great as they initially appear. Relatively inexpensive approaches to observation that maintain its advantages can be implemented by following a few guidelines. The first guideline entails creating relatively simple, easy-to-use observation coding systems. Many existing coding systems are very complex and consist of a large number of separate categories (e.g., Family and Peer Process Code: Crosby et al., 1998). Although these coding systems capture ongoing behavior and social influence in great detail and along a real time line, coder training and fidelity are expensive. Many codes in complex observational systems occur at such low base rates that they are collapsed into higher-order coding categories (e.g., aversive behavior or negative emotion) in data analyses.

As an alternative, investigators might identify two or three observable indicators of training (Element 1), intervention processes (Elements 2 and 3), and short-term outcomes (Element 4) specified as critical by intervention theory. Coding interval occurrence rather than real-time onset and offset of behavior provides another simplification. Simpler behavior coding systems are less expensive in terms of coder training, maintenance of reliability, and data analysis; they are also more portable and can be implemented with minimal equipment. Simple systems also facilitate collection of observational data to assess a core element on more than one occasion at each assessment point. Brief-duration (5–10 min) coding of behavior on multiple occasions provides considerable traction in addressing issues of sampling and situational variability in observational data (Snyder & Stoolmiller, 2002; Stoolmiller et al., 2000).

Simpler observational coding systems have been developed to capture family (Dishion et al., 2002) and peer (Schrepferman et al., 2004) interaction. A reliable multi-occasion observation system with powerful (incremental) predictive validity can be implemented for as little as $15 per participant (Schrepferman et al., 2004). Complex coding systems provide a number of advantages in describing behavior and social interchange in more detail, including exquisite measurement of behavior duration and sequence. However, using simpler coding systems is a better alternative than not using behavior observation at all to measure intervention processes and the short-term outcomes generated by intervention.

Another strategy to increase affordability entails the application of observation methods in intervention designs other than randomized controlled trials. Single-participant experimental designs may be used as an affordable alternative to randomized trial control group designs to promote the repeated and intensive application of observation methods needed to more clearly delineate social influence and behavior change processes. Repeated single-participant or small sample designs may also be embedded within a larger randomized control group design in an epidemiologically informed way (Kellam, 1990) to promote generalization of the findings to larger


samples. In addition to powerful graphical analyses, increasingly sophisticated hierarchical analytic tools (such as Mplus; Muthen & Muthen, 1998–2004) are being developed to analyze the long temporal strings of data that result from intensive and repeated behavioral observations in relatively small samples.

The potential yield of repeated, intensive behavior observation of training and intervention processes in small sample or single-participant designs is considerable. The manner in which systematic social influence engendered by intervention is reciprocally related to client response and change has been articulated (Miller & Rollnick, 2002) and applied in preventive interventions (Dishion & Kavanagh, 2004), but has not been clearly tested, largely because relevant processes have not been measured in the time frame and at the behavioral molecularity at which they occur. The time dynamics of intervention-induced behavior change are poorly understood, partly as a result of the standard protocol of applying static measures of outcomes at baseline, post-intervention, and one or two follow-up points. Repeated use of behavior observation, beginning before and continuing throughout intervention as well as at repeated points after intervention, would promote a better understanding of the temporal dynamics and shape of change trajectories resulting from intervention.

Finally, the costs of behavior observation need to be considered in the larger context of programmatic prevention research, which involves not only testing the efficacy of preventive interventions, but also testing the effectiveness of these interventions when they are integrated into real-world service systems. Flawed conclusions concerning efficacy due to compromised measurement can have far-reaching costs when programs are scaled up for general use. Investment in observational assessment during efficacy trials can produce long-term payoffs by increasing confidence in the results of the randomized control trials used to inform more general service delivery.

SUMMARY AND CONCLUSION

Behavior observation methods have critical advantages in measuring training and intervention processes and short-term outcomes, but these methods have been under-utilized for a number of reasons, some of which are legitimate and others of which are not. This report selectively reviewed what is known about behavior observation methods and provided a set of guidelines about when and how observational methods may be used in intervention research as an alternative or complement to other more commonly used measurement methods. Behavior observation methods have a unique value and potentially critical role in theory-driven intervention research, especially in the assessment of training processes, treatment mediators, and the dynamics of behavior change.

ACKNOWLEDGMENTS

This report is a result of a collaborative effort by members of the Workgroup for the Analysis of Observational Data (WODA), supported in part by grants 3P30 MH46690-13S1, R01 MH 57342, R01 MH40859, MH59855, R01 DA015409, and T32 MH18911.

REFERENCES

Bakeman, R., & Gottman, J. M. (1997). Observing interaction (2nd ed.). New York: Cambridge University Press.

Botvin, G. J. (2000). Life skills training: Promoting health and personal development. Princeton, NJ: Princeton Health Press.

Brown, C. H. (1993). Statistical methods for prevention trials in mental health. Statistics in Medicine, 12, 289–300.

Cairns, R. B., & Green, J. A. (1979). How to assess personality and social patterns: Observations or ratings? In R. B. Cairns (Ed.), The analysis of social interactions: Methods, issues, and illustrations (pp. 209–226). Hillsdale, NJ: Erlbaum.

Carroll, K. M., Rounsaville, B. J., & Nich, C. (1994). Blind man's bluff: Effectiveness and significance of psychotherapy and pharmacotherapy blinding procedures in a clinical trial. Journal of Consulting and Clinical Psychology, 62, 276–280.

Collins, L. M. (1991). Measurement in longitudinal research. In L. M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change (pp. 137–148). Washington, DC: American Psychological Association.

Crosby, L., Stubbs, J., Forgatch, M., & Capaldi, D. (1998). Family and peer process code training manual. Eugene: Oregon Social Learning Center.

Dagne, G. A., Howe, G. W., Brown, C. H., & Muthen, B. O. (2002). Hierarchical modeling of sequential behavioral data: An empirical Bayesian approach. Psychological Methods, 7, 262–280.

DeGarmo, D. S., Patterson, G. R., & Forgatch, M. S. (2004). How do outcomes in a specified parent training intervention maintain or wane over time? Prevention Science, 5, 73–90.

Dishion, T. J., & Kavanagh, K. (2004). Adolescent problem behavior: An intervention and assessment sourcebook for working with families in schools. New York: Guilford Press.

Dishion, T. J., Rivera, E. K., Jones, L., Verberkmoes, S., & Patras, J. (2002). Relationship process code. Unpublished coding manual, Child and Family Research Center, University of Oregon, Eugene.

Eddy, J. M., Dishion, T. J., & Stoolmiller, M. (1998). The analysis of intervention change in children and families: Methodological and conceptual issues embedded in intervention studies. Journal of Abnormal Child Psychology, 26, 53–69.


Follette, W. C. (1995). Correcting methodological weaknesses inthe knowledge base used to derive practice standards. InS. C. Hayes, W. C. Follette, R. M. Dawes, & K. E. Grady(Eds.), Scientific standards of psychological practice: Issuesand recommendations (pp. 229–247). Reno, NV: ContextPress.

Forgatch, M. S. (1994). Parenting through change: A training man-ual. Eugene: Oregon Social Learning Center.

Forgatch, M. S., Patterson, G. R., & DeGarmo, D. S. (2005). Eval-uating fidelity: Predictive validity for a measure of compe-tent adherence to the Oregon Model of Parent ManagementTraining (PMTO). Behavior Therapy, 36, 3–14.

Gardner, W., & Griffin, W. A. (1989). Methods for the analysis ofparallel streams of continuously recorded behavior. Psycho-logical Bulletin, 105, 446–455.

Hogue, A., Liddle, H. A., & Rowe, C. (1996). Treatment ad-herence process research in family therapy: A rationale andsome practical guidelines. Psychotherapy, 33, 332–345.

Ialongo, N., Poduska, J., Werthamer, L., & Kellam, S. (2001).The distal impact of two first grade preventive interven-tions on conduct problems and disorder in early adolescence.Journal of Emotional and Behavioral Disorders, 9, 146–160.

Jones, R. R., Reid, J. B., & Patterson, G. R. (1975). Naturalisticobservation in clinical assessment. In P. McReynolds (Ed.),Advances in psychological assessment (Vol. 3, pp. 42–95). SanFrancisco, CA: Jossey-Bass.

Kellam, S. G. (1990). Developmental epidemiologic frameworkfor family research on depression and aggression. In G. R.Patterson (Ed.), Depression and aggression in family interac-tion (pp. 1–48). Hillsdale, NJ: Erlbaum.

Kent, R. N., O’Leary, K. D., Diament, C., & Dietz, A. (1974). Ex-pectation biases in observational evaluation of therapeuticchange. Journal of Consulting and Clinical Psychology, 42,774–780.

Kraemer, H. C. (1991). To increase power in randomized clini-cal trials without increasing sample size. Psychopharmacol-ogy Bulletin, 27, 217–224.

MacKinon, D. P., & Lockwood, C. M. (2003). Advances in statis-tical methods for substance use prevention research. Preven-tion Science, 4, 155–171.

Miller, W. R., & Rollnick, S. (2002). Motivational interviewing:Preparing people to change addictive behavior. New York:Guilford Press.

Muthen, L. K., & Muthen, B. O. (1998–2004). MPLUS user’s guide(3rd ed.). Los Angeles, CA: Muthen &Muthen.

Olds, D. L. (2002). Prenatal and infancy home-visiting by nurses:From randomized trials to community replication. PreventionScience, 3, 153–172.

Patterson, G. R. (1982). Coercive family process. Eugene, OR:Castalia.

Patterson, G. R., & Reid, J. B. (1973). Interventions for aggressiveboys: A replication study. Behavior Research and Therapy,11, 383–394.

Radke-Yarrow, M., & Zahn-Waxler, C. (1979). Observing interaction: A confrontation with methodology. In R. B. Cairns (Ed.), The analysis of social interactions: Methods, issues, and illustrations (pp. 37–66). Hillsdale, NJ: Erlbaum.

Reid, J. (2003). Development of measurement systems for randomized trials. Paper presented to the American Institutes for Research, Center for Integrating Education and Prevention Research in Schools, Washington, DC.

Reid, J. B., Patterson, G. R., & Snyder, J. (2002). Antisocial behavior in children and adolescents: A developmental analysis and model for intervention. Washington, DC: American Psychological Association.

Rotheram-Borus, M. J., Song, J., Gwadz, M., Lee, M., Van Rossem, R., & Koopman, C. (2003). Reductions in HIV risk among runaway youth. Prevention Science, 4, 173–188.

Schnelle, J. F. (1974). A brief report on the invalidity of parent evaluations of behavior change. Journal of Applied Behavior Analysis, 7, 341–343.

Schrepferman, L., & Snyder, J. (2002). Coercion: The link between treatment mechanisms in behavioral parent training and risk reduction in child antisocial behavior. Behavior Therapy, 33, 339–359.

Schrepferman, L., Snyder, J., Prichard, J., & Suarez, M. (2004). An observational system for children’s social interaction with peers: Reliability and validity. Manuscript submitted for publication, Wichita State University, Wichita, KS.

Skindrud, K. D. (1973). Field observation of observer bias under overt and covert monitoring. In L. Handy & E. Mash (Eds.), Behavior change: Methodology, concepts, and practice (pp. 97–118). Champaign, IL: Research Press.

Snyder, J., & Stoolmiller, M. (2002). Reinforcement and coercion mechanisms in the development of antisocial behavior: Family processes. In J. B. Reid, J. Snyder, & G. R. Patterson (Eds.), Antisocial behavior: Prevention, intervention and basic research (pp. 65–100). Washington, DC: American Psychological Association.

Stoolmiller, M., Duncan, T. E., & Patterson, G. R. (1995). Some problems and solutions in the study of change: Significant patterns of client resistance. Journal of Consulting and Clinical Psychology, 61, 920–928.

Stoolmiller, M., Eddy, J. M., & Reid, J. B. (2000). Detecting and describing preventive intervention effects in a universal school-based randomized trial targeting delinquent and violent behavior. Journal of Consulting and Clinical Psychology, 68, 296–306.

Walter, H. I., & Gilmore, S. K. (1973). Placebo versus social learning effects in parent training procedures designed to alter the behaviors of aggressive boys. Behavior Therapy, 4, 361–377.

White, L., Tursky, B., & Schwartz, G. E. (1985). Placebo: Theory, research and mechanisms. New York: Guilford Press.

Willett, J. B. (1989). Some results on the reliability for longitudinal measurement of change: Implications for the design of studies of individual growth. Educational and Psychological Measurement, 49, 587–602.