Understanding Prevention Effectiveness in Real-World Settings: The National

Cross-Site Evaluation of High Risk Youth Programs

Soledad Sambrano,¹ J. Fred Springer,² Elizabeth Sale,²

Rafa Kasim,² and Jack Hermann³

¹Center for Substance Abuse Prevention, Rockville, Maryland, USA

²EMT Associates, Inc., Folsom, California, USA

³ORC Macro, Inc., Calverton, Maryland, USA

Abstract: The National Cross-Site Evaluation is a large multisite evaluation (MSE)

of 48 substance abuse prevention programs, 5,934 youth participating in programs,

and 4,539 comparison youth. Data included a self-report questionnaire

administered at 4 points in time, detailed dosage data on over 217,000 program

contacts, and detailed site visit information. In a pooled analysis, the programs did not

demonstrate significant positive effects on a composite outcome measure of tobacco,

alcohol, and marijuana use in the previous 30 days. However, disaggregated analyses

indicated that 1) sites in which comparison groups had strong opportunity to

participate in prevention programs suppressed observed effects; 2) youth who had

already started using before they entered programs reduced use significantly more

than comparison youth who had started using; and 3) both males and females who

participated in programs significantly reduced use relative to comparisons, but in very

different patterns. Combining these patterns produced an apparent null effect. Finally,

programs that incorporated at least 4 out of 5 effective intervention characteristics

identified in the study significantly reduced use for both males and females relative to

comparison youth. The lessons produced by this study attest to the value of MSE

designs as a source of applicable knowledge about prevention interventions.

Keywords: Substance use, prevention, multisite evaluation, underage drinking

This research was conducted under the Center for Substance Abuse Prevention

(CSAP) contract #277-95-5002. The views expressed herein represent the opinions

and analyses of the individual authors and may not necessarily reflect the opinions,

official policy, or position of the U.S. Department of Health and Human Services, the

Substance Abuse and Mental Health Services Administration, or the Center for

Substance Abuse Prevention.

Address correspondence to J. Fred Springer, Ph.D., EMT Associates, Inc., 771 Oak

Ave., Parkway, Suite 2, Folsom, CA 95630, USA; E-mail: [email protected]

The American Journal of Drug and Alcohol Abuse, 31:491–513, 2005

Copyright © Taylor & Francis Inc.

ISSN: 0095-2990 print / 1097-9891 online

DOI: 10.1081/ADA-200068089

INTRODUCTION

Research on programs for substance abuse prevention has been important in

strengthening knowledge and practice for community-based

programs that support youth—particularly youth in high-risk circumstances

(1). One approach to these research contributions emphasizes a “phases of

research” (2) perspective in which highly controlled clinical trials (3) are

used to test and provide science-based verification of “model programs” (4)

for dissemination to the field. A second approach emphasizes learning from

the experience of existing programs by studying various programs implemented

in community settings. Meta-analyses can identify intervention design

or implementation features that characterize more effective programs,

that is, those that produce larger effect sizes on intended outcomes (5–9).

Recently, multisite evaluations (MSEs) have become more visible. MSEs

are designed to collect common primary data across sites and to compare

program characteristics and their relation to program effectiveness (10). This

overcomes some of the inherent limitations introduced by nonstandard data in

meta-analytic studies, particularly with respect to process measures (9:40). Because

they allow natural variation in method and place, MSEs are suited to identify

the relative effectiveness of different intervention and implementation strategies

in real-world settings. With the ultimate objective to develop knowledge

that can be applied by practitioners in real-world settings, MSEs of locally

driven interventions may represent a more efficient approach to knowledge

generation than multisite trials of carefully controlled program models (11).

The National Cross-Site Evaluation of High Risk Youth (HRY) Programs

was designed as a large MSE that would document evidence-based lessons

from the Center for Substance Abuse Prevention’s (CSAP’s) large High Risk

Youth Prevention demonstration (1). This article provides a summary of major

study findings concerning the effectiveness of the prevention interventions

included in this 5-year study (48). First, we summarize the degree to which

participation in community-based prevention programs makes a difference in

the longitudinal substance use patterns of program participants compared to

similar youth who did not participate in the programs. To improve and specify

estimates of overall sustained effectiveness, trends are adjusted for select

method-induced bias and for differences in the substance abuse history of

participants. Second, the range of differences in program effectiveness as

measured by entry-to-exit effect sizes is summarized, and meta-analyses are

conducted to identify design and implementation characteristics that correlate

significantly with program effect sizes. Third, the longitudinal effectiveness of

programs that manifest multiple positive design and implementation practices

is tested. A final section presents a discussion of findings

and their implications for prevention practice.

METHODS

As an MSE assessing the effectiveness of prevention programs in achieving

outcome objectives, the national HRY study includes several important

features: 1) a common instrument, the CSAP National Youth Survey,¹ used to

collect individual outcome data across all study sites; 2) a viable comparison

group constituted in each study site; 3) data collected from over 10,000 youth

(5,934 participant and 4,539 comparison youth) at 4 points in time, including 2

follow-up points after program exit; 4) a common instrument used to collect

data on exposure to prevention services for each program participant, totaling

more than 217,000 coded intervention exposures; and 5) a common site-visit

protocol used to collect data on program-level information, including

information on design and implementation.

Program Sample

The sample was drawn to represent the range of strategies, capabilities, and

participation in prevention programs funded through the High Risk Youth

Initiatives in 1994 and 1995.² Thus, the sample represents school- and

community-based programs using differing intervention strategies implemented

by a variety of organizations with different resources, staffing, and

experience. Since no criteria of design or implementation strength were used

to select programs, the study “provides the opportunity to learn about a range

of program experience in actual community conditions, and to learn what

design and implementation features contribute to effectiveness in reaching

prevention objectives” (12:2).

The High Risk Youth programs were diverse in organizational context,

population served, and program setting. Over two-thirds of the grantee

organizations in the study were in urban communities. The remaining sites

were in rural or suburban/mixed environments. The majority (62%) were

community-based non-profit grantee organizations. Research organizations,

typically in universities or health institutions, received more than one-fourth

of the grants. The remaining grants were to school districts and other

government agencies (e.g., probation departments). Nearly one-third (29.2%)

delivered a substantial portion of their services as part of school classroom

activities, with the remainder offering services primarily after school.

¹The instrument includes measures of risk and protection factors, perceived

normative environments, attitudes toward drug use, and self-reported substance use

(i.e., lifetime, 30-day, amount), including the use of alcohol, tobacco, inhalants,

marijuana, and other drugs.

²Two sites were dropped from the outcome analyses because the programs were

delivered in secure institutions in which access to substances or self-reports of use

would not be comparable to other sites in which youth are in the community.

Programs varied in duration and amount of contact with participating

youth. One-fourth of the programs delivered services to participants for 4

months or less, another one-fourth of the programs were longer than

a year, and the remaining 50 percent delivered services for between 5 and 11

months. Half of the programs had an average of 40 or fewer hours of direct

contact with participating youth; about one-third had between 41 and 80

hours of contact; and the remaining 12 percent delivered an average of more

than 80 hours to participants.³

Youth Sample

The sample of youth from all 48 sites totaled 10,473, with 5,934 participant

and 4,539 comparison youth. The programs served youth throughout the

adolescent age range, of both genders, and from a variety of ethnic and racial

identities. The majority of youth in the programs (52%) were of middle school

age (12–14). Forty percent of the programs served females only,⁴ and two-thirds

of the total youth sample was female. More than one-third (35%) of the

study youth identified themselves as African American; 26 percent identified

themselves as Hispanic; and the remaining youth were relatively evenly

distributed among Native American (13%), non-Hispanic White (12%), and

Asian or Pacific Islander (11%) identification.

Figure 1 compares use rates of 12- to 17-year-olds in the study sample

with those of youth who participated in the 1998 National Household Survey

on Drug Abuse (NHSDA), a randomly sampled general population survey of

persons 12 years and older.⁵ The youth in the cross-site sample reported

higher use for all substances within all age groups than did the general

population. Although the circumstances of the NHSDA and the National

Cross-Site Evaluation data collection were different, this comparison

suggests that the cross-site programs served youth who were at higher risk

for initiating substance use when young.⁶

³Service hours are actual. Detailed dosage data was an important feature of data

collection.

⁴The programs serving girls only were funded through a special “Female

Adolescent” initiative within the HRY funding program.

⁵Eighteen-year-old youth were not included in these analyses due to small sample

sizes within this age group.

⁶The NHSDA survey is administered to a national, random sample through face-to-face,

in-home interviews; the cross-site instrument was administered primarily in

proctored, group settings.


Figure 1. Comparison of NHSDA and HRY substance use rates: percentage of

youth using cigarettes, alcohol, and marijuana in the past 30 days.

Figure 2. Percentage of youth reporting substance use in past 30 days at program

entry by age (N=10,473).


Even though youth in the HRY sample reported higher use than is

reported in the general population, their young age meant that the majority

were not substance users at the baseline measurement (entry into the

prevention program). Figure 2 displays, by age and sex, any self-reported use

of cigarettes, alcohol, or marijuana in the 30 days before baseline data

collection of youth in the study. Substance use is very low until about age 12

and then rises rapidly through the early teen years. Rates are somewhat lower

for girls, particularly after age 14.

Dependent Measures

Programs within the National Cross-Site Evaluation shared a common

outcome goal—to reduce the rates at which participating youth initiated or

increased their substance use. Two approaches to evaluating change in

substance use rates were taken: changes in individual substance use rates, and

aggregated changes at the program level. At the individual level, the criterion

variable for assessing this outcome was a measure of the frequency with which

the youth used cigarettes, alcohol, or marijuana over the last 30 days.⁷ In

several analyses, items for cigarettes, alcohol, and marijuana use in the last 30

days are specified as separate outcomes. At the program level, the measure of

program effectiveness was a statistical effect size that expresses the difference

between participant group change on “30 day substance use” between

baseline and program exit and comparison group change over the same time

period, standardized for comparability across the different programs (13).⁸
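The two dependent measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's procedure: the function names are hypothetical, the composite is assumed to be the mean of the three 0 to 5 items, the pooled standard deviation is assumed to be taken at baseline, and the covariate and sample-size adjustments the study applied are omitted.

```python
from math import sqrt
from statistics import mean, stdev


def composite_score(cigarettes, alcohol, marijuana):
    """Average three 0-5 frequency codes so the composite stays in the 0-5 range (assumed rescaling)."""
    items = (cigarettes, alcohol, marijuana)
    if any(not 0 <= item <= 5 for item in items):
        raise ValueError("each item must be coded 0-5")
    return sum(items) / len(items)


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))


def site_effect_size(part_pre, part_post, comp_pre, comp_post):
    """(participant change - comparison change) / pooled baseline SD.

    With this sign convention (an assumption), a negative value means
    participants' use grew less than comparison youths' use.
    """
    part_change = mean(part_post) - mean(part_pre)
    comp_change = mean(comp_post) - mean(comp_pre)
    return (part_change - comp_change) / pooled_sd(part_pre, comp_pre)


if __name__ == "__main__":
    # Invented scores for one hypothetical site, four youth per group.
    print(site_effect_size([0, 0, 1, 2], [0, 1, 1, 2], [0, 0, 1, 2], [1, 1, 2, 2]))
```

Standardizing by a pooled standard deviation is what makes the resulting values comparable across sites with different baseline levels of use.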

⁷Youth reported their frequency of substance use by indicating how many days they

had used cigarettes, alcohol, or marijuana in the previous 30 days. Response

categories were (0) no use; (1) 1–2 days; (2) 3–5 days; (3) 6–9 days; (4) 10–19 days;

and (5) 20–31 days. Scores on each variable were combined in a simple, 3-item

additive scale with scores adjusted within a range of 0 to 5. (Cronbach’s alpha=0.78

overall and 0.71 or above across all sub-groupings by age, race/ethnicity, and gender.)

⁸For each site, effect sizes were calculated as the pre-post mean change for program

youth minus pre-post change for comparison youth over the pooled standard deviation.

Effect sizes were calculated on means adjusted for age, sex, and ethnic background;

and further adjusted to account for differential sample size using the method outlined

by Becker (1988).

Program Level Measures

The National Cross-Site Evaluation produced detailed data on the design

and implementation of the interventions actually delivered in each of the

programs in the study (14, 15). Program level data collection included

1) structured site visits where evaluators interviewed program directors,


managers, service and evaluation staff, observed program activities, and

completed common protocols;⁹ 2) individual contact data for all participants

(dosage), which included the type of services being offered, method of

delivery, and amount of service contact within each category of intervention;

and 3) case study narratives and standardized protocol summaries. The

process of reviewing and summarizing these data resulted in the development

of a comprehensive set of design and implementation measures (16).

Building on prior research (5, 6, 8, 17), measures in the areas of program

design and implementation were constructed to explain potential differences

in program effectiveness across study sites.

Program Design Measures

To assess differences in effectiveness attributable to prevention strategy,

programs were categorized as emphasizing one of four types of intervention:

behavioral skills programs (n=13) that focus on social skills and basic life

skills development (e.g., refusal skills, anger management, conflict resolution,

decision making, and academic enrichment); informational focused programs

(n=17) that emphasize educational content concerning tobacco, alcohol, other

drugs, and related issues; recreational focused programs (n=5) that

devote substantial time to substance-free leisure and enrichment activities; and

affective programs (n=12) that focus on youth’s personal and social

concerns, often including self-image and self-esteem.

With respect to intervention design, the HRY process data supported

development of measures of the method in which programs are delivered.

Previous research has shown that programs that engage youth in interactive

activities are more effective than noninteractive programs (5, 6, 8).

The HRY data focused on identifying specific approaches to interactive

delivery. Specifically, we identified programs with delivery methods that

focused on building connections with others and programs that focused on

introspective, or self-reflective, learning (12:12). Of the 46 programs analyzed

for substance use change, 17 had a strong emphasis on introspective

learning methods and 13 placed an emphasis on building connections

with others.

⁹Site evaluators reviewed program records and documentation, interviewed program

staff and local evaluators, and observed program activities. Information from these

sources was used by each evaluator to complete standard protocols that paired closed-

ended indicators of program design and implementation process characteristics with

brief narrative descriptions and rationale for coding decisions. This careful field

procedure produced comparable information on program design and implementation

process features across sites, while preserving information on program context that

can be important to interpreting findings on complex organizational processes.


Program Implementation Measures

Several areas of program implementation were hypothesized to have a

relationship to program effectiveness, including staff training, program

coherence, program duration, program intensity, and program management.

Findings on the relative contribution of each of these variables are reported

elsewhere (16, 18). In the analyses reported here, 2 measures of the implementation

of program services in the National Cross-Site programs are

particularly important. The cross-site measure of program coherence refers to

the extent to which program theory is explicit, articulated, and used to focus

multiple activities on achieving program objectives. Achieving coherence is

closely related to adequate and relevant training of staff so that program

objectives, procedures, and rationale are understood and put into practice day

to day. Analysis of the site visit data allowed us to categorize the programs as

exhibiting higher (n=19) or lower coherence (n=27). Program intensity is

measured as the number of hours per week that youth were involved in the

program.¹⁰ Intensity ranged from programs that averaged less than one hour

of service per week to programs offering 15 hours of service per week. To accommodate

the skewed shape of the distribution of intensity for the 46 programs,

and to make the intensity measure more compatible with the dichotomous

categorizations identified above, programs were divided into 2 equal groups

of 23 sites, those with higher intensity (3.3 hours per week or more) and

lower intensity (less than 3.3 hours per week).
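The dichotomization described above amounts to a median split. A minimal Python sketch follows; the site names and hours are invented for illustration, and the function name is hypothetical. In the study the cut point fell at 3.3 hours per week, which divided the 46 programs into two groups of 23.

```python
from statistics import median


def split_by_intensity(hours_per_week):
    """Label each site 'higher' or 'lower' relative to the median weekly service hours."""
    cut = median(hours_per_week.values())
    return {
        site: ("higher" if hours >= cut else "lower")
        for site, hours in hours_per_week.items()
    }


if __name__ == "__main__":
    # Hypothetical sites; the skewed values mimic a few very intensive programs.
    example = {"site_a": 0.8, "site_b": 2.1, "site_c": 4.0, "site_d": 15.0}
    print(split_by_intensity(example))
```

A median split discards information about how far a program sits from the cut point, but it keeps a few very intensive programs from dominating the comparison when the distribution is skewed.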

In summary, the multimethod information gathered in the cross-site

evaluation supported development of measures of program implementation

through direct indicators (e.g., dosage measures), multiple-item indicators, or

coded variables. The specific measures described here have been identified in

previous analyses as those most strongly associated with program effect sizes

and are used in subsequent analyses to identify strong programs.

Procedures

To produce optimal estimates, multilevel statistical models that correct for

potential biases attributable to “nesting” of individuals in separate settings

are appropriate (19, 20). In the analyses presented below, we utilize

Hierarchical Linear Modeling (HLM) to conduct pooled, longitudinal

analyses (21). In addition, the analysis uses a meta-analytic approach to

identify, examine, and explain variation in impact on outcomes across sites.

Effect sizes are calculated for the period of time youth were actively involved

in program services (baseline to exit). As noted above, these effect sizes are

the dependent variable in exploratory analyses of the relationship between

program characteristics and program effectiveness.

¹⁰The program level variables used in this article are those that previous analyses

have identified as significant correlates of program effectiveness in achieving

outcomes (16). These analyses demonstrated that intensity is a significant predictor of

effect size in these programs, while total amount of contact and length of program

were not.

In summary, the overall analytic strategy used for the findings presented

here combines “top-down” or pooled analyses using multilevel statistical

modeling, and “bottom-up” or meta-analytic approaches based on program

level explanations of effect sizes (22). Each of these approaches answered

different research questions. The multilevel statistical modeling was

important for identifying pooled longitudinal effects of interventions, and

for describing differences in these effects between subgroups of programs or

subgroups of participants (e.g., youth who were already using substances at

baseline). The meta-analytic approach was useful for clearly describing the

differences in effectiveness between sites and for exploratory analyses

identifying those characteristics of programs that were related to larger effect

sizes. Using both approaches allowed for exploration of the data,

identification of variables contributing to effective prevention programming,

and testing of the effects of the longitudinal impacts of effective practice.
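The bottom-up step can be illustrated with a short Python sketch that compares average site effect sizes between programs that do and do not have a given characteristic. The inverse-variance weighting used here is a common meta-analytic convention and is an assumption, not a description of the study's exact method; the field names (`effect`, `variance`, `high_coherence`) and the numbers are hypothetical.

```python
def weighted_mean_effect(effects, variances):
    """Inverse-variance weighted mean of site-level effect sizes."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def compare_by_characteristic(programs, key):
    """Return the weighted mean effect size for programs with and without a characteristic."""
    groups = {True: [], False: []}
    for program in programs:
        groups[bool(program[key])].append(program)
    return {
        flag: weighted_mean_effect(
            [p["effect"] for p in members], [p["variance"] for p in members]
        )
        for flag, members in groups.items()
        if members
    }


if __name__ == "__main__":
    # Invented site-level results for illustration only.
    programs = [
        {"effect": -0.30, "variance": 0.01, "high_coherence": True},
        {"effect": -0.10, "variance": 0.02, "high_coherence": True},
        {"effect": 0.05, "variance": 0.01, "high_coherence": False},
    ]
    print(compare_by_characteristic(programs, "high_coherence"))
```

Because each site contributes one effect size, this kind of comparison is exploratory: with 46 programs, differences between groups of sites point to candidate characteristics rather than confirm them.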

FINDINGS

A full pooled analysis of trends for the composite 30-day substance use by

program participants (n=5,605) and comparison group (n=4,341) youth

across 4 points in time produced no statistically significant differences.

However, as demonstrated in the growth curves displayed in Figure 3, HLM

analyses demonstrated that participant youth did report less increase in the

use of alcohol (linear trend, treatment interaction=−0.056, p<0.05) and

marijuana (linear trend, treatment interaction=−0.06, p<0.05).¹¹ Even though

statistically significant, these substance-specific differences in trend are very

small, and they do not hold in the three-item 30-day use measure in which

cigarette use is included.

¹¹Analyses were 3-level HLM models with repeated measures (Level 1) nested

within individual respondents at Level 2, and respondents nested within sites at

Level 3. The Level 1 covariate was time between administrations; Level 2 covariates

included 1) group status (participant or comparison), 2) selection propensity score (a

multivariate correction for nonequivalence between participant and comparison

groups at baseline within each site), 3) age, 4) gender, and 5) 2 dichotomous measures

of race/ethnicity (African American and Hispanic). At the program exit administration,

retention rates were 17% for participants and 83% for comparison; at 6 months

after exit, 74% for participant and 75% for comparison; and at 18 months after exit,

68% for participants and 67% for comparison. HLM analyses reported here are on the

full sample with missing data replacement using an expectation-maximization (EM) algorithm.
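The structure of a 3-level growth model of this kind can be written out as follows. This is a sketch in assumed notation: the article does not report its equations or its random-effects structure, so the symbols and the choice of which coefficients vary across sites are illustrative only.

```latex
\begin{align*}
\text{Level 1 (occasions):}\quad
  Y_{tij} &= \pi_{0ij} + \pi_{1ij}\,T_{tij} + e_{tij} \\
\text{Level 2 (youth):}\quad
  \pi_{0ij} &= \beta_{00j} + \beta_{01j}\,G_{ij}
             + \textstyle\sum_{q} \beta_{0qj}\,X_{qij} + r_{0ij} \\
  \pi_{1ij} &= \beta_{10j} + \beta_{11j}\,G_{ij} + r_{1ij} \\
\text{Level 3 (sites):}\quad
  \beta_{00j} &= \gamma_{000} + u_{00j}, \qquad
  \beta_{10j} = \gamma_{100} + u_{10j}, \qquad
  \beta_{11j} = \gamma_{110} + u_{11j}
\end{align*}
```

Here $Y_{tij}$ is 30-day use at occasion $t$ for youth $i$ in site $j$, $T_{tij}$ is time since baseline, $G_{ij}$ is 1 for participants and 0 for comparison youth, and $X_{qij}$ are the remaining Level 2 covariates (propensity score, age, gender, race/ethnicity). Under this notation, the "linear trend, treatment interaction" reported in the text corresponds to $\gamma_{110}$: a negative value means that participants' use rose more slowly over time than comparison youths' use. A nonlinear pattern would add a $T_{tij}^{2}$ term at Level 1 with its own treatment interaction.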


In the real-world settings of programs in this MSE sample, there are 3

plausible explanations for why the pooled result may underestimate the

effectiveness of substance abuse programs in the study. First, some or all of

the evaluation designs may allow sufficient design “noise” (23) to make it

difficult to detect true effects (e.g., contamination of comparison groups).

Second, the effectiveness of programming may be differential, or

differentially detectable given the outcome measure, according to the

characteristics of participants (e.g., history of use, gender). Third, given the

diversity of design and implementation circumstances of individual

programs, differences between effective and ineffective programs may be

masked in pooled findings. Each of these plausible explanations for

suppressed pooled findings is explored in the following sections.

Adjustments for Comparison Group Prevention Exposure

In field experiments comparison group youth may be exposed to services

that have the same or similar objectives as the intervention under study.

This potential source of comparison group contamination is even more

problematic in longitudinal studies that include follow-up periods in which

either participant or comparison group youth may participate in additional

community services. When the intent of the study is to test the effectiveness

of particular interventions, the participation of comparison youth in similar

services may diminish the ability to detect effects.

Figure 3. Trends in substance use for participant and comparison youth over time

(N=9,946).

The HRY study included data that identified the degree to which youth

in the comparison groups in each site had the opportunity to participate in

substance abuse prevention programming services.¹² This measure allowed

comparisons of outcomes for programs in which comparison youth had

“lower” or “higher” opportunity for participation in substance use

prevention programs. Comparison youth in the 23 sites with higher

opportunity for prevention participation increased their substance use less

than comparison youth in the 23 sites with lower opportunity for

participation. Specifically, change scores for comparison youth in high

exposure sites were 0.029 for cigarettes, 0.032 for alcohol, and 0.03 for

marijuana compared to 0.09, 0.075, and 0.076, respectively, for youth from

communities with less opportunity for exposure to prevention services. This

result is consistent with the expectation that full prevention effects in sites

with higher availability of prevention services will be underestimated

because comparison youth in these field settings also benefit from prevention

services. Studies in which comparison youth have opportunities to participate

in prevention activities within the community, and this exposure remains

unmeasured, will underestimate the overall value of prevention services in

attaining intended outcomes.

¹²Data was coded from site visit protocols, and confirmed or elaborated through

telephone interviews with program directors and local evaluators. Comparison group

exposure to prevention services was measured through a multiple-item index that

summed scores on 1) whether the comparison group was “reduced service,” 2) whether

the comparison group was drawn from another organized program (e.g., an after-school

service program), 3) whether the comparison group was in a “high service”

community, 4) whether the comparison group was in an urban area, and 5) whether

key informants identified explicit program participation by comparison group youth.

Figure 4. Trends in 30-day substance use measures for participant and comparison

youth in sites with low opportunity for comparison group participation in prevention

(N=5,195; 23 sites).
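A summed exposure index of the kind used to separate higher- and lower-opportunity sites can be sketched in Python as below. The item names, the 0/1 coding of each item, and the rank-based split into halves are all assumptions made for illustration; the article does not report how items were scored or how ties were handled.

```python
# Hypothetical item names; each is assumed to be coded 1 (yes) or 0 (no)
# for a site's comparison group.
EXPOSURE_ITEMS = (
    "reduced_service",
    "drawn_from_organized_program",
    "high_service_community",
    "urban_area",
    "reported_program_participation",
)


def exposure_index(site):
    """Sum the five yes/no indicators into a 0-5 exposure score."""
    return sum(int(bool(site.get(item, 0))) for item in EXPOSURE_ITEMS)


def split_by_exposure(sites):
    """Rank sites by index and label the top half 'higher', the rest 'lower'.

    Ties keep their input order (an arbitrary choice made for this sketch).
    """
    ranked = sorted(sites, key=lambda name: exposure_index(sites[name]), reverse=True)
    half = len(ranked) // 2
    return {name: ("higher" if pos < half else "lower") for pos, name in enumerate(ranked)}


if __name__ == "__main__":
    # Invented sites for illustration only.
    sites = {
        "site_a": {"urban_area": 1, "high_service_community": 1},
        "site_b": {},
        "site_c": {"urban_area": 1},
        "site_d": {"reduced_service": 1, "urban_area": 1, "reported_program_participation": 1},
    }
    print(split_by_exposure(sites))
```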

Figure 4 summarizes trends for specific substances and the combined

30-day use measure for participant and comparison youth in the 23 sites in

which comparison youth have low opportunity for exposure to prevention

services. As compared to the full sample results in Figure 3, the growth

curves indicate greater separation between participant and comparison youth

with participant youth consistently indicating lower substance use rates. As

in the full sample, statistically significant benefits for participant youth

across programs are found only for the use of alcohol (linear trend, treatment

interaction=−0.075, p<0.05) and marijuana (linear trend, treatment

interaction=−0.070, p<0.05). Adjusting the sample to account for comparison

group exposure to prevention services strengthens the evidence for the

effectiveness of prevention programming in reducing substance use, but

the pooled effect across diverse programs in diverse community settings

remains small.

Adjustments for Participant Characteristics

A pattern of very small or null effects may be misleading if programs have

strong measurable effects for youth in one subgroup, but no measurable

effects for another subgroup. The combined result may mask the positive

effects experienced by the benefiting subgroup, particularly if the benefiting

group is smaller. Similarly, programs may benefit different groups of

participants in different ways. The combination of these beneficial patterns

does not accurately reflect the specific effects for either group, and may

produce an apparent null effect. In the HRY study, analyses of differential

program impacts across participant subgroups found important differences in

impacts for youth who were self-reported substance users in the 30 days

before baseline (as compared to those reporting no use in the last 30 days),

and between males and females.


Self-Reported Use at Baseline

The National Cross-Site Evaluation programs served an adolescent and pre-adolescent

population centered on the middle school years. Approximately

three-fourths (76%) of the youth in the study reported that they had not used

any of the 3 index substances for the study in the last 30 days at their baseline

measurement point. While the age and abstinent or experimental use patterns

of these youth are appropriate to prevention programs, the large percentage of

zero use in the major dependent variable reduces variance and creates a

“floor effect” for measuring reductions in use. Furthermore, the periodicity

of the measure (i.e., use in the last 30 days) makes instability and significant

regression to the mean likely for experimenters or occasional users. In sum,

the large percentage of youth who did not use substances, or who were

occasional users, made it impossible to produce dramatic reductions in

substance use among the full pooled study sample.
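The instability of a 30-day measure among occasional users can be illustrated with a small simulation. This is a sketch under invented assumptions: each simulated youth has a fixed chance of any use in a given 30-day window, the three propensity levels are hypothetical, and no intervention is modeled at all.

```python
import random


def follow_up_rate_among_baseline_users(n_youth=10_000, seed=0):
    """Share of baseline 30-day users who also report use at a later wave,
    when every youth's underlying propensity to use stays constant."""
    rng = random.Random(seed)
    baseline_users = 0
    still_using = 0
    for _ in range(n_youth):
        # Assumed mix of rare, occasional, and habitual users.
        propensity = rng.choice((0.05, 0.30, 0.90))
        used_at_baseline = rng.random() < propensity
        used_at_follow_up = rng.random() < propensity
        if used_at_baseline:
            baseline_users += 1
            still_using += used_at_follow_up
    return still_using / baseline_users


if __name__ == "__main__":
    print(follow_up_rate_among_baseline_users())
```

Under these assumptions the expected follow-up rate among baseline users is about 0.72, even though nobody's behavior changed. Selecting youth because they reported use in one window guarantees that some apparent "reduction" appears at the next measurement, which is why change among baseline users has to be judged against a comparison group rather than against their own baseline.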

In high-risk communities, prevention programming must change

behavior in youth who have already begun to use substances, not just deter

those who have not yet started. Approximately 25 percent of youth reported

some substance use at baseline.

Figure 5. Trends in 30-day substance use measures for participant and comparison

youth who initiated substance use prior to baseline measurement in sites with low

opportunity for prevention participation (n=1,235, 23 sites).

Figure 5 displays the trends in 30-day substance use measures for

participant and comparison youth who had already begun to use substances in

the 23 sites in which comparison youth have less opportunity to participate in

prevention services. Self-reported 30-day substance use among participant youth was

slightly higher than that of comparison youth at baseline. This reported use rate drops below

that of comparison youth by program exit, and the magnitude of this relative

decrease in participant use grows at the 6- and 18-month postexit

measurement points. The difference between participant and comparison

group use is statistically significant for youth who reported some use at baseline

(coefficient=−0.27, p=0.02). These findings indicate that the CSAP-funded

prevention programs provided effective interventions for youth who have

already initiated use. As in previous analyses, the differences in linear trend

for alcohol and marijuana use were significant (linear trend, treatment

interaction=−0.368, p=0.008 for alcohol; linear trend, treatment

interaction=−0.307, p=0.038 for marijuana) with similar patterns over time as

shown in the figure related to 30-day substance use.

Figure 5 also shows that, overall, participant youth in this subgroup

reduced their use of all substances through the full period of the study. Given

that widespread substance use typically starts and escalates among youth in

the age groups included in this study, programs are typically considered

effective when they slow the rate of increased use among program participants.

However, this analysis of youth who were already using at baseline

demonstrates that program participation can actually reduce use rates.¹³

Gender Differences

Prevention practitioners and researchers have become increasingly aware of

the differences between boys and girls in the way substance use develops

and how prevention affects this development (24). Gender differences in the

National Cross-Site sample were evident in our partitioned analyses of males

and females who participated in programs in which comparison youth had

low opportunity for prevention program services. Figure 6 displays separate

growth curves for 30-day substance use by males and females. In contrast to

13 The slight decrease in comparison group use in these analyses may be attributable

to regression toward the mean. Some of the youth who reported use in the 30 days

prior to the baseline measurement are not habitual users, for example, and may not use

in the 30 days prior to a second measurement. Controlling for this tendency is one

reason for a comparison group, and the important point here is that reductions in use

scores at repeated measurement points for participant youth were larger than those for

comparison youth.


the pooled analysis, the patterns of effects were nonlinear for males

(quadratic trend, treatment interaction=0.049, p=0.025) but linear for

females (linear trend, treatment interaction=−0.044, p=0.016). In both

cases, participant group youth experience a decrease in use relative to

comparison youth of the same gender. However, the shapes of the growth

curves for participant and comparison youth of each gender are dramatically

different. The curves for participant and comparison females grow pro-

gressively further apart in the shape of a fan. The effects of programming

are small, but are sustained and increase over time. By contrast, use rates

reported by participant boys diverge more dramatically from comparison

boys at exit and 6 months after exit, but the participant and comparison

trends converge at 18 months after exit. In the study overall across all the

programs in the low exposure sites, boys reported reductions in use while

involved in the prevention program. However, on average these gains eroded

by the end of the study.

These differences are dramatic, and the average trends when these

patterns are combined do not clearly reflect either. The slow divergence in

the female pattern dampens the more dramatic divergence between participating and

comparison males at the middle time points. The convergence in the males at

the final time point mutes the growing difference for females. The result is a

nonsignificant trend, and a masking of two very different, but significant,

longitudinal patterns. When partitioned, both females and males show

Figure 6. Trends in 30-day substance use by gender in sites with low opportunity for

prevention participation.


stronger, albeit different, program effects than are evident when they are

combined in the pooled sample.
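
The masking argument above can be made concrete with a small numerical sketch. The gaps below (participant use minus comparison use at each wave) are invented to mimic the shapes described in the text, not taken from Figure 6, and the four waves are treated as equally spaced so that standard orthogonal polynomial weights apply. The sketch does not compute standard errors or significance; it only shows that averaging a linear pattern with a quadratic one shrinks both trend contrasts.

```python
# Hypothetical participant-minus-comparison gaps in 30-day use at
# baseline, exit, 6 months, and 18 months. Values are assumptions chosen
# to mimic the shapes described in the text.
females = [0.00, -0.02, -0.04, -0.06]   # slow, steady divergence (fan shape)
males = [0.00, -0.10, -0.10, 0.00]      # early divergence, later convergence

# Orthogonal polynomial weights for four equally spaced time points
LINEAR = [-3, -1, 1, 3]
QUADRATIC = [1, -1, -1, 1]

def contrast(gaps, weights):
    """Weighted sum of the gaps; a larger magnitude means a stronger
    trend of the corresponding shape."""
    return sum(g * w for g, w in zip(gaps, weights))

# Pooled pattern, assuming equal numbers of males and females
pooled = [(f + m) / 2 for f, m in zip(females, males)]

for label, gaps in (("females", females), ("males", males), ("pooled", pooled)):
    print(f"{label:8s} linear={contrast(gaps, LINEAR):+.3f} "
          f"quadratic={contrast(gaps, QUADRATIC):+.3f}")
```

In this sketch the pooled linear contrast is smaller in magnitude than the female linear contrast, and the pooled quadratic contrast is smaller than the male quadratic contrast, so a test on the pooled curve has less signal of either shape to detect.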

Differences in Program Effectiveness

Another plausible explanation of the limited prevention effect found in the

pooled analyses is that there may be significant differences in the effectiveness

of programs across sites. Null, or even counterhypothetical, effects of some

programs may cancel out the positive outcomes produced in other sites.14 A

meta-analysis of program effect sizes was used to explore this possibility.

Figure 7 displays the distribution of effect sizes for 30-day substance use

across the programs in the National Cross-Site Evaluation sample.
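
The likelihood ratio test described in footnote 14 compares the deviance of a model with between-site variance constrained to zero against one with that variance unconstrained. As a minimal sketch, and assuming the reported chi-square value of 52.592 is the difference in deviance between the two models, the tail probability for 2 degrees of freedom can be computed in closed form. The function name is hypothetical, and the study's own software is not specified in the text.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Values reported in footnote 14
deviance_difference = 52.592
degrees_of_freedom = 2
p_value = chi2_sf_even_df(deviance_difference, degrees_of_freedom)
print(f"p = {p_value:.2e}")
```

The resulting probability is far below conventional thresholds, consistent with the footnote's description of the between-site variability as highly significant.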

Sites with high opportunity for prevention participation are included in

this distribution, and the magnitudes of effect sizes in Figure 7 are biased

14 To test whether there was significant variance in longitudinal program effectiveness

between programs in the HRY sample, we tested the “deviance” between two

models, one with the between-site variance constrained to “0” and the other with

variance unconstrained. A likelihood ratio chi-square test was used to test the signi-

ficance of the variability in the group linear trend interaction across sites. For a model

including group, gender, Hispanic identity, African American identity, and a propensity

score, the chi-square value was 52.592 with 2 degrees of freedom, which is highly significant.

Figure 7. Distribution of effect sizes for 30-day substance use across substance

abuse prevention programs (n=46).


somewhat to the left.15 Values range from −0.71 to 1.54, with positive values

indicating that participant youth reduced their use rates relative to com-

parison youth during the program period. Most of the effect sizes are small

with 19 programs clustered between −0.09 and 0.09. Just 8 programs had

positive effect sizes larger than 0.20, a conventional standard for a meaning-

ful program effect. Nevertheless, a primary objective of the cross-site eval-

uation was to generate information about the characteristics of effective

programs in natural community settings. Not all of the programs reduced

substance use, though they often had other positive outcomes for the youth.

Among those that did show positive effects on use, some were more effective

than others. This diversity in outcome, and our ability to measure differences

between programs through our primary data on program design and

implementation, provided an opportunity to identify those program character-

istics that correlate with more positive effect sizes. This exploratory analysis

tested hypotheses developed in past prevention research against this large

sample of programs implemented under real community conditions.

Analyses of a large number of program level measures of organizational

capacity, intervention design, and implementation design (11, 12, 16) iden-

tified 5 program characteristics that produced statistically significant16

contrasts in average effect size between programs that exhibited the char-

acteristic and those that did not (see Program measures section above). Three

are features of intervention design (behavioral skills emphasis, use of intro-

spective learning, and a focus on connection-building delivery methods) and

2 are features of implementation (coherent program implementation practices

and high service intensity).
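
The dichotomous contrasts described in footnote 16 compare effect sizes for programs with and without a given characteristic using a rank-based test. The sketch below shows one common form of that idea, a one-tailed Wilcoxon rank-sum test with a normal approximation and no tie correction. The effect sizes are invented for illustration, the function names are hypothetical, and the text does not state which exact variant or software the study used.

```python
import math

def average_ranks(values):
    """1-based ranks of values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_one_tailed(group_a, group_b):
    """Rank sum for group_a, its z score, and the one-tailed p-value for
    the hypothesis that group_a tends to have larger values than group_b
    (normal approximation, no tie correction)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return w, z, p

# Hypothetical program-level effect sizes (assumptions, not study data)
with_characteristic = [0.25, 0.18, 0.30, 0.12, 0.22]
without_characteristic = [0.02, -0.05, 0.10, 0.00, -0.12, 0.05]

w, z, p = rank_sum_one_tailed(with_characteristic, without_characteristic)
print(f"rank sum={w}, z={z:.2f}, one-tailed p={p:.4f}")
```

Because the test works on ranks rather than raw values, a single extreme effect size such as the 1.54 maximum reported for Figure 7 cannot dominate the contrast, which is the conservative property the footnote refers to.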

Figure 8 displays average effect sizes for programs manifesting

each of the characteristics of effective programming. While these effect sizes

are substantially larger than the overall effect size for all 46 programs

(0.022), none of these average values exceed 0.20, a common standard for a

small but meaningful effect. This is partly due to the downward biases of

comparison groups with high exposure to prevention services, the limited

ability to detect effects because the great majority of youth are nonusers at

15 In a separate analysis (11), it was estimated that high opportunity for comparison

group exposure to prevention services reduced the grand mean (average) adjusted

effect size for sample programs from .081 to .022. Since the effects of comparison

group exposure are not collinear with important explanatory variables in the effect

size analysis, and excluding the sites would seriously reduce the power of the

program-level analyses of effect sizes, the effect size analyses include all 46 study

sites for which outcome measures were valid.

16 The exploratory analyses to identify characteristics associated with effect size

were conducted as dichotomous contrasts using the nonparametric Wilcoxon rank

order test as a conservative alternative for reducing the influence of outliers. All of the

contrasts for these five variables were statistically significant with an n=46, and

probability <0.05, one-tailed test. Results were consistent when tested with parametric

tests and a winsorized effect size measure.


baseline, and the suppression of overall effects attributable to the combina-

tion of genders. Furthermore, it is not surprising that no one characteristic of

the complex social processes represented in prevention programs would be

sufficient to attain strong effects across diverse programs and contexts.

Nonetheless, each of these program characteristics made a statistically signif-

icant difference in the magnitude of measured effects.

Effects of Multiple Positive Program Characteristics

Further analyses tested the degree to which programs with multiple positive

characteristics produced larger effects. The fact that the 5 characteristics were

not highly correlated strengthened the potential for gain in effectiveness

through combinations of strong design and implementation characteristics.17

The analyses identified a clear increment in effectiveness when programs

implemented at least 4 of the positive program characteristics.18 The exact

Figure 8. Summary of statistically significant effects of program characteristics on

effect sizes for 30-day substance use.

17 Correlations between most program level measures were nonsignificant with

coefficients (phi) ranging from 0.04 to 0.266, indicating that for the most part positive

characteristics do not cluster in the same programs. The use of introspective learning,

which was significantly correlated with connection-building methods (0.690) and

program coherence (0.531) is the major exception. This co-occurrence would make it

difficult to establish the independent contribution of these variables to program

effectiveness. However, introspective learning was retained in the following analysis

because of its theoretical interest, and because the analysis technique does not include

the estimation of unique contributions by individual program characteristics.

18 Importantly, the 8 comprehensively strong programs were not identified simply on

the basis of the magnitude of their effect sizes (which, as previous analyses have

shown, may be biased or inaccurate). Rather the programs are identified according to

whether they exhibit program characteristics that are correlated with effect sizes

across a large number of diverse programs.


combination did not make a significant difference. Eight of the 46 programs

were identified as comprehensively strong programs implementing at least 4

of the 5 positive characteristics. The median effect size for these eight

programs is 0.22 compared to a median of −0.02 for all other programs. This

difference is large and highly statistically significant (p<.01).
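
The classification rule used here, at least 4 of the 5 positive characteristics, followed by a comparison of median effect sizes, can be sketched as follows. The program records and the characteristic labels are hypothetical stand-ins written for this illustration; they are not the study's data or variable names.

```python
from statistics import median

# Labels are assumptions standing in for the five characteristics
CHARACTERISTICS = (
    "behavioral_skills",
    "introspective_learning",
    "connection_building",
    "coherent_implementation",
    "high_intensity",
)

def is_comprehensively_strong(traits, threshold=4):
    """True when a program exhibits at least `threshold` of the five
    characteristics, regardless of which ones."""
    return sum(1 for c in CHARACTERISTICS if c in traits) >= threshold

# Hypothetical programs: (name, effect size, characteristics present)
programs = [
    ("A", 0.31, set(CHARACTERISTICS)),
    ("B", 0.19, set(CHARACTERISTICS[:4])),
    ("C", 0.24, set(CHARACTERISTICS[1:])),
    ("D", 0.05, {"behavioral_skills", "high_intensity"}),
    ("E", -0.08, {"connection_building"}),
    ("F", 0.01, set(CHARACTERISTICS[:3])),
]

strong = [es for _, es, traits in programs if is_comprehensively_strong(traits)]
other = [es for _, es, traits in programs if not is_comprehensively_strong(traits)]

print("median, comprehensively strong:", median(strong))
print("median, all other programs:", median(other))
```

Counting characteristics rather than requiring a specific set mirrors the finding that the exact combination did not make a significant difference.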

Longitudinal Outcomes of Effective Programming

These explanations of program effectiveness have been based on program

effect sizes calculated for the period between program entry and exit. An

advantage of the National Cross-Site Evaluation of High-Risk Youth

Programs data set is that it allows the determination of whether the effec-

tiveness of the 8 comprehensively strong programs is maintained after

program exit. The study includes 6- and 18-month follow-up data points.

To test the hypothesis that these comprehensively strong programs will

produce lasting effects on the substance use of participants, the HLM analysis

was replicated with only participants and comparison youth in the 8

comprehensively strong program sites. Figure 9 presents growth curves con-

trasting trends in substance use for participants and comparison youth in the

8 comprehensively strong programs. These longitudinal findings support and

extend the effect size analysis reported above. There are dramatic differences

between the findings for the total sample and findings for the 8 programs with

at least 4 positive program characteristics. While over time differences in

Figure 9. Trends in 30-day substance use for participant and comparison youth in

8 sites with positive program characteristics (n=1759).


substance use change between participant and comparison group youth were

very small (not statistically significant) in the full sample, differences be-

tween the participant and comparison youth in the 8-program cluster were

pronounced and statistically significant [interaction coefficient=0.0322,

p=0.039 (one-tailed test)]. Even 18 months after program exit, use rates

for the participant youth were significantly lower than use rates of comparison

youth, demonstrating enduring prevention effectiveness. Furthermore, the

growth curves for these programs are nearly identical for males and females,

suggesting that comprehensively strong programs produce similar and lasting

benefits for both genders (24). These 8 comprehensive programs are more

effective over time than other programs, and their effects are more equitable

for males and females.

DISCUSSION

The National Cross-Site Evaluation of High Risk Youth Programs is an

example of a large-scale MSE designed to improve knowledge concerning

the design and implementation of effective prevention programs in

community settings. The sample represents the diversity of HRY programs

funded by CSAP during the funding period under study, including the full

range of design and implementation strengths and weaknesses that

characterize community-based programs. Neither the individual site research

designs nor the findings were filtered through the scrutiny of report

preparation or journal review found in studies incorporated into meta-

analyses. It follows that not all of the programs would achieve their outcome

objectives, and of those that did, some would achieve stronger outcomes than

others. Furthermore, research designs in community settings will differ in

accuracy and sensitivity, introducing differential capacity to detect outcome

effects that may exist.

This article has presented an overview of a complex series of analyses

carried out for the cross-site evaluation. This research has utilized a mix of

multilevel statistical modeling of longitudinal data, and meta-analytic

analyses of program effect sizes. In combination, these approaches to

analysis of a large MSE data set support important findings relevant to

research and practice in prevention programs for youth at high risk. With

respect to research method, the analysis clarifies several important methods

issues that may obfuscate evaluation results, and that are not typically

detectable through either meta-analytic or single-site evaluations.

The overall analysis of the full pooled sample at the individual level

produced no significant differentiation in 30-day substance use growth curves

for program participants and comparison youth, indicating that as a group,

these programs did not demonstrate strong effects in reducing substance use.

However, subsequent analyses indicated that the availability of prevention

services to comparison youth in some sites suppressed findings of program


effects in the full sample. Further analyses of subgroups of youth provided

further explanations of the small aggregate effects found in the pooled

analysis. In particular, these analyses demonstrated significant and mean-

ingful program reductions in substance use by youth who were already using

when they entered programs. The fact that the large majority of nonusing (or

experimenting) youth in the programs did not experience significant

measurable effects does not have a clear interpretation. The issue may be

programmatic, supporting a conclusion that these programs are more

effective as early interventions than they are as primary prevention.

Alternatively, the result may reflect a combination of low variance and the

instability of a 30-day use measure within this low variance environment. In

this case, longer-term studies may be necessary to specify prevention

effectiveness for young adolescents.

Analysis by gender is also revealing. Analyses of the large cross-site

sample partitioned by gender indicated that both males and females reduced

substance use through participation in a prevention program, but in very

different patterns. Participating females reported gradual but lasting

reductions in their substance use relative to comparison females. Participating

males reported large but diminishing reductions in use relative to comparison

males. From an analytic perspective, the combination of male and female

patterns helps explain the null pattern of differences between participant and

comparison youth in the full sample.

The research plan for the cross-site data included a meta-analysis

of program effect sizes for 30-day substance use. This analysis

explored the explanation of differences in prevention effectiveness that are

attributable to intervention design and implementation rather than to

differences in the individual characteristics of participating youth. This

analysis produced findings of great significance for the design and

implementation of effective prevention programs serving youth at high risk.

Exploratory analyses of contrasts in effect size identified five characteristics

of intervention design and implementation that distinguish program

effectiveness at a statistically significant level. These characteristics included

1) behavioral skills emphasis, 2) use of introspective learning, 3) connection-

building focus, 4) coherent program implementation practices, and 5) high

service intensity. Analyses of the average effect sizes of programs

characterized by multiple positive intervention and implementation charac-

teristics indicated that programs with 4 or more of these positive character-

istics had average effect sizes that were dramatically higher than other

programs, and that produced significant lasting reductions in substance use

relative to comparison youth for both boys and girls. In programs that are

well designed and implemented, the differential gender results that are found

across a more diverse set of programs do not appear. This is a strong

recommendation for the implementation of science-based programs that

incorporate design and implementation practices that have demonstrated

applicability in a variety of program models and settings.


Finally, this overview has demonstrated the importance of MSEs as a

contributor to knowledge about the effectiveness of substance abuse

prevention or other social interventions. MSEs provide the opportunity to

disentangle the complex interaction of methods error (including as shown

here, exposure of comparison group youth to prevention services), differ-

ences in outcomes based on the characteristics of the youth (males/females,

users/nonusers), and differences in the effectiveness of intervention and

implementation practices. They provide a context of understanding for the

selection, application, and adaptation of program models that the evidence

base for the models individually cannot provide. MSEs have an important and

unique place in research to improve prevention interventions and other

services designed to improve social conditions.

REFERENCES

1. Sambrano S, Springer JF, Hermann J. Informing the next generation of

prevention programs: CSAP’s cross-site evaluation of the 1994–95

high-risk youth grantees. J Commun Psychol 1997; 25:375–395.

2. Brounstein PJ, Zweig JM. Understanding Substance Abuse Prevention,

Toward the 21st Century: A Primer on Effective Programs. Department

of Health and Human Services Publication No. (SMA)99-3301,

Washington, DC: Government Printing Office, 1999.

3. Holder H, Flay B, Howard J, Boyd G, Voas R, Grossman M. Phases of

alcohol prevention research. Alcohol Clin Exp Res 1999; 23:183–194.

4. Friedman L, DeMets D. Multicenter trials. In: Fundamentals of Clinical

Trials. New York: Springer-Verlag, 1998:345–357.

5. Tobler NS, Roona MR, Ochshorn P, Marshall DG, Streke AV,

Stackpole KM. School-based adolescent drug prevention programs:

1998 meta-analysis. J Prim Prev 2000; 20:275–336.

6. Tobler NS, Stratton HH. Effectiveness of school-based drug prevention

programs: a meta-analysis of the research. J Prim Prev 1997; 18:71–

128.

7. Tobler NS. Drug prevention programs can work: research findings. J

Addict Dis 1992; 11(3):1–28.

8. Tobler NS. Meta-analysis of 143 adolescent drug prevention programs:

quantitative outcome results of program participants compared to a

control or comparison group. J Drug Issues 1986; 16:537–567.

9. Schaps E, DiBartolo R, Moskowitz J, Palley C, Churgin S. A review of

127 drug abuse prevention evaluations. J Drug Issues 1981; 11(1): 17–

43.

10. Straw RB, Herrel JM, eds. Conducting Multiple Site Evaluations in

Real-World Settings. San Francisco: Jossey-Bass, 2002.

11. Derzon J, Springer JF, Sale E. Person-Centered Approaches for


Assessing the Impact of the CSAP National Evaluation of High-Risk

Youth Programs. 2002. Manuscript in preparation.

12. SAMHSA. Designing and implementing effective prevention programs

for youth at high risk. In: Points of Prevention. CSAP Monograph

Series #3, Washington, DC: U.S. Government Printing Office, 2002.

13. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, 1988.

14. Scheirer MA. Program theory and implementation theory: implications

for evaluators. In: New Directions for Program Evaluation. Vol. 33. 1987:59–76.

15. EMT Associates, Inc. CSAP national cross-site evaluation of high risk

youth programs: research design. 1996.

16. Springer JF, Sale E, Hermann J, Sambrano S, Kasim R, Nistler M.

Characteristics of effective substance abuse prevention programs for

high-risk youth. J Prim Prev, (forthcoming).

17. Hansen W. School based alcohol prevention. Alcohol Health Res World

1993; 17(3).

18. EMT Associates, ORC Macro. CSAP National Cross-Site Evaluation of

High-Risk Youth Programs Report. Rockville, MD: Department of

Health and Human Services, 2000.

19. Kreft IG, deLeeuw J. Introducing Multilevel Modeling. London: Sage,

1998.

20. EMT Associates, ORC Macro. CSAP National Cross-Site Evaluation of

High-Risk Youth Programs Year Four Technical Report. Rockville,

MD: Department of Health and Human Services, 2000.

21. Murray D. Design and Analysis of Group-Randomized Trials. New

York: Oxford University Press, 1998.

22. Bryk AS, Raudenbush SW. Hierarchical Linear Models: Applications

and Data Analysis Methods. Newbury Park: Sage Publications, 1992.

23. Lipsey MW. Design Sensitivity: Statistical Power for Experimental

Research. Thousand Oaks, CA: Sage Publications,

1990.

24. SAMHSA. Making prevention effective for adolescent boys and girls:

gender differences in substance use and prevention. In: Points of

Prevention. SAMHSA/CSAP Monograph Series #4, Washington, DC:

U.S. Government Printing Office, 2002.

