Understanding Prevention Effectiveness in Real-World Settings: The National
Cross-Site Evaluation of High Risk Youth Programs
Soledad Sambrano,1 J. Fred Springer,2 Elizabeth Sale,2
Rafa Kasim,2 and Jack Hermann3
1Center for Substance Abuse Prevention, Rockville, Maryland, USA
2EMT Associates, Inc., Folsom, California, USA
3ORC Macro, Inc., Calverton, Maryland, USA
Abstract: The National Cross-Site Evaluation is a large multisite evaluation (MSE)
of 48 substance abuse prevention programs, 5,934 youth participating in programs,
and 4,539 comparison youth. Data included a self-report questionnaire
administered at 4 points in time, detailed dosage data on over 217,000 program
contacts, and detailed site visit information. In a pooled analysis, the programs did not
demonstrate significant positive effects on a composite outcome measure of tobacco,
alcohol, and marijuana use in the previous 30 days. However, disaggregated analyses
indicated that 1) sites in which comparison groups had strong opportunity to
participate in prevention programs suppressed observed effects; 2) youth who had
already started using before they entered programs reduced use significantly more
than comparison youth who had started using; and 3) both males and females who
participated in programs significantly reduced use relative to comparisons, but in very
different patterns. Combining these patterns produced an apparent null effect. Finally,
programs that incorporated at least 4 out of 5 effective intervention characteristics
identified in the study significantly reduced use for both males and females relative to
comparison youth. The lessons produced by this study attest to the value of MSE
designs as a source of applicable knowledge about prevention interventions.
Keywords: Substance use, prevention, multisite evaluation, underage drinking
This research was conducted under the Center for Substance Abuse Prevention
(CSAP) contract #277-95-5002. The views expressed herein represent the opinions
and analyses of the individual authors and may not necessarily reflect the opinions,
official policy, or position of the U.S. Department of Health and Human Services, the
Substance Abuse and Mental Health Services Administration, or the Center for
Substance Abuse Prevention.
Address correspondence to J. Fred Springer, Ph.D., EMT Associates, Inc., 771 Oak
Ave., Parkway, Suite 2, Folsom, CA 95630, USA; E-mail: [email protected]
The American Journal of Drug and Alcohol Abuse, 31:491–513, 2005
Copyright © Taylor & Francis Inc.
ISSN: 0095-2990 print / 1097-9891 online
DOI: 10.1081/ADA-200068089
INTRODUCTION
Research on programs for substance abuse prevention has been important for
strengthening knowledge and practice in community-based
programs that support youth, particularly youth in high-risk circumstances
(1). One approach to these research contributions emphasizes a ‘‘phases of
research’’ (2) perspective in which highly controlled clinical trials (3) are
used to test and provide science-based verification of ‘‘model programs’’ (4)
for dissemination to the field. A second approach emphasizes learning from
the experience of existing programs by studying various programs implemented
in community settings. Meta-analyses can identify intervention design
or implementation features that characterize more effective programs,
that is, those that produce larger effect sizes on intended outcomes (5–9).
Recently, multisite evaluations (MSEs) have become more visible. MSEs
are designed to compare program characteristics, their relation to program
effectiveness, and common primary data across sites (10). This overcomes
some of the inherent limitations introduced by nonstandard data in meta-
analytic studies, particularly with respect to process measures (9:40). Because
they allow natural variation in method and place, MSEs are suited to identify
the relative effectiveness of different intervention and implementation strategies
in real-world settings. With the ultimate objective to develop knowledge
that can be applied by practitioners in real-world settings, MSEs of locally
driven interventions may represent a more efficient approach to knowledge
generation than multisite trials of carefully controlled program models (11).
The National Cross-Site Evaluation of High Risk Youth (HRY) Programs
was designed as a large MSE that would document evidence-based lessons
from the Center for Substance Abuse Prevention’s (CSAP’s) large High Risk
Youth Prevention demonstration (1). This article provides a summary of major
study findings concerning the effectiveness of the prevention interventions
included in this 5-year study (48). First, we summarize the degree to which
participation in community-based prevention programs makes a difference in
the longitudinal substance use patterns of program participants compared to
similar youth who did not participate in the programs. To improve and specify
estimates of overall sustained effectiveness, trends are adjusted for select
method-induced bias and for differences in the substance abuse history of
participants. Second, the range of differences in program effectiveness, as
measured by entry-to-exit effect sizes, is summarized, and meta-analyses are
conducted to identify design and implementation characteristics that correlate
significantly with program effect sizes. Third, the longitudinal effectiveness of
those programs that manifest multiple positive design and implementation
practices is tested. A final section presents a discussion of findings
and their implications for prevention practice.
METHODS
As an MSE assessing the effectiveness of prevention programs in achieving
outcome objectives, the national HRY study includes several important
features: 1) a common instrument, the CSAP National Youth Survey,1 used to
collect individual outcome data across all study sites; 2) a viable comparison
group constituted in each study site; 3) data collected from over 10,000 youth
(5,934 participant and 4,539 comparison youth) at 4 points in time, including 2
follow-up points after program exit; 4) a common instrument used to collect
data on exposure to prevention services for each program participant, totaling
more than 217,000 coded intervention exposures; and 5) a common site-visit
protocol used to collect data on program-level information, including
information on design and implementation.
Program Sample
The sample was drawn to represent the range of strategies, capabilities, and
participation in prevention programs funded through the High Risk Youth
Initiatives in 1994 and 1995.2 Thus, the sample represents school- and
community-based programs using differing intervention strategies implemented
by a variety of organizations with different resources, staffing, and
experience. Since no criteria of design or implementation strength were used
to select programs, the study ‘‘provides the opportunity to learn about a range
of program experience in actual community conditions, and to learn what
design and implementation features contribute to effectiveness in reaching
prevention objectives’’ (12:2).
The High Risk Youth programs were diverse in organizational context,
population served, and program setting. Over two-thirds of the grantee
organizations in the study were in urban communities. The remaining sites
were in rural or suburban/mixed environments. The majority (62%) were
community-based non-profit grantee organizations. Research organizations,
typically in universities or health institutions, received more than one-fourth
of the grants. The remaining grants were to school districts and other
government agencies (e.g., probation departments). Nearly one-third (29.2%)
delivered a substantial portion of their services as part of school classroom
activities, with the remainder offering services primarily after school.
1. The instrument includes measures of risk and protection factors, perceived
normative environments, attitudes toward drug use, and self-reported substance use
(i.e., lifetime, 30-day, amount), including the use of alcohol, tobacco, inhalants,
marijuana, and other drugs.
2. Two sites were dropped from the outcome analyses because the programs were
delivered in secure institutions in which access to substances or self-reports of use
would not be comparable to other sites in which youth are in the community.
Programs varied in duration and amount of contact with participating
youth. One-fourth of the programs delivered services to participants for 4
months or less in length, another one-fourth of the programs were longer than
a year, and the remaining 50 percent delivered services for between 5 and 11
months. Half of the programs had an average of 40 or fewer hours of direct
contact with participating youth; about one-third had between 41 and 80
hours of contact; and the remaining 12 percent delivered an average of more
than 80 hours to participants.3
Youth Sample
The sample of youth from all 48 sites totaled 10,473, with 5,934 participant
and 4,539 comparison youth. The programs served youth throughout the
adolescent age range, of both genders, and from a variety of ethnic and racial
identities. The majority of youth in the programs (52%) were of middle school
age (12–14). Forty percent of the programs served females only,4 and two-
thirds of the total youth sample was female. More than one-third (35%) of the
study youth identified themselves as African American; 26 percent identified
themselves as Hispanic; and the remaining youth were relatively evenly
distributed among Native American (13%), non-Hispanic White (12%), and
Asian or Pacific Islander (11%) identification.
Figure 1 compares use rates of 12- to 17-year-olds in the study sample
with those of youth who participated in the 1998 National Household Survey
on Drug Abuse (NHSDA), a randomly sampled general population survey of
persons 12 years and older.5 The youth in the cross-site sample reported
higher use for all substances within all age groups than did the general
population. Although the circumstances of the NHSDA and the National
Cross-Site Evaluation data collection were different, this comparison
suggests that the cross-site programs served youth who were at higher risk
for initiating substance use when young.6
3. Service hours are actual. Detailed dosage data was an important feature of data
collection.
4. The programs serving girls only were funded through a special ‘‘Female
Adolescent’’ initiative within the HRY funding program.
5. Eighteen-year-old youth were not included in these analyses due to small sample
sizes within this age group.
6. The NHSDA survey is administered to a national, random sample through
face-to-face, in-home interviews; the cross-site instrument was administered
primarily in proctored, group settings.
Figure 1. Comparison of NHSDA and HRY substance use rates: percentage of
youth using cigarettes, alcohol, and marijuana in the past 30 days.
Figure 2. Percentage of youth reporting substance use in past 30 days at program
entry by age (N=10,473).
Even though youth in the HRY sample reported higher use than is
reported in the general population, their young age meant that the majority
were not substance users at the baseline measurement (entry into the
prevention program). Figure 2 displays, by age and sex, any self-reported use
of cigarettes, alcohol, or marijuana in the 30 days before baseline data
collection of youth in the study. Substance use is very low until about age 12
and then rises rapidly through the early teen years. Rates are somewhat lower
for girls, particularly after age 14.
Dependent Measures
Programs within the National Cross-Site Evaluation shared a common
outcome goal—to reduce the rates at which participating youth initiated or
increased their substance use. Two approaches to evaluating change in
substance use rates were taken: changes in individual substance use rates and
aggregated changes at the program level. At the individual level, the criterion
variable for assessing this outcome was a measure of the frequency with which
the youth used cigarettes, alcohol, or marijuana over the last 30 days.7 In
several analyses, items for cigarettes, alcohol, and marijuana use in the last 30
days are specified as separate outcomes. At the program level, the measure of
program effectiveness was a statistical effect size that expresses the difference
between participant group change on ‘‘30 day substance use’’ between
baseline and program exit and comparison group change over the same time
period, standardized for comparability across the different programs (13).8
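The program-level effect size described above can be sketched in a few lines. This is a minimal illustration, not the study's procedure: it omits the covariate adjustment and the sample-size correction the article mentions, and it assumes the pooled standard deviation is taken over baseline scores. The function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_sd(a, b):
    """Pooled standard deviation of two samples (n-1 variances)."""
    na, nb = len(a), len(b)
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))


def site_effect_size(prog_pre, prog_post, comp_pre, comp_post):
    """Participant change minus comparison change, in pooled SD units.

    Assumption: the pooled SD is computed on baseline scores.
    A negative value means participant use grew less than comparison use.
    """
    prog_change = mean(prog_post) - mean(prog_pre)
    comp_change = mean(comp_post) - mean(comp_pre)
    return (prog_change - comp_change) / pooled_sd(prog_pre, comp_pre)


# Hypothetical site: participants hold steady, comparison youth increase use.
es = site_effect_size([0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3], [1, 2, 3, 4])
print(round(es, 3))
```

Standardizing by a pooled standard deviation is what makes the resulting values comparable across sites that differ in baseline use.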
Program Level Measures
The National Cross-Site Evaluation produced detailed data on the design
and implementation of the interventions actually delivered in each of the
programs in the study (14, 15). Program level data collection included
1) structured site visits where evaluators interviewed program directors,
managers, service and evaluation staff, observed program activities, and
completed common protocols;9 2) individual contact data for all participants
(dosage), which included the type of services being offered, method of
delivery, and amount of service contact within each category of intervention;
and 3) case study narratives and standardized protocol summaries. The
process of reviewing and summarizing these data resulted in the development
of a comprehensive set of design and implementation measures (16).
Building on prior research (5, 6, 8, 17), measures in the areas of program
design and implementation were constructed to explain potential differences
in program effectiveness across study sites.
7. Youth reported their frequency of substance use by indicating how many days they
had used cigarettes, alcohol, or marijuana in the previous 30 days. Response
categories were (0) no use; (1) 1–2 days; (2) 3–5 days; (3) 6–9 days; (4) 10–19 days;
and (5) 20–31 days. Scores on each variable were combined in a simple, 3-item
additive scale with scores adjusted within a range of 0 to 5. (Cronbach’s alpha=0.78
overall and 0.71 or above across all sub-groupings by age, race/ethnicity, and gender.)
8. For each site, effect sizes were calculated as the pre-post mean change for program
youth minus pre-post change for comparison youth over the pooled standard deviation.
Effect sizes were calculated on means adjusted for age, sex, and ethnic background;
and further adjusted to account for differential sample size using the method outlined
by Becker (1988).
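The 30-day use scale described in note 7 can be sketched as follows. The response categories are taken from the note. Reading "adjusted within a range of 0 to 5" as the mean of the three item scores is an assumption, and the function names are hypothetical.

```python
def days_to_category(days):
    """Map days of use in the previous 30 days to the 0-5 response category."""
    if not 0 <= days <= 31:
        raise ValueError("days must be between 0 and 31")
    if days == 0:
        return 0
    if days <= 2:
        return 1
    if days <= 5:
        return 2
    if days <= 9:
        return 3
    if days <= 19:
        return 4
    return 5


def composite_30_day_use(cigarettes, alcohol, marijuana):
    """Three-item additive scale rescaled to 0-5.

    Assumption: the rescaling is a simple mean of the item categories.
    """
    return (cigarettes + alcohol + marijuana) / 3


# A youth reporting 1 day of cigarette use, 4 days of alcohol use, no marijuana.
score = composite_30_day_use(days_to_category(1), days_to_category(4), days_to_category(0))
print(score)
```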
Program Design Measures
To assess differences in effectiveness attributable to prevention strategy,
programs were categorized as emphasizing one of four types of intervention:
behavioral skills programs (n=13) that focus on social skills and basic life
skills development (e.g., refusal skills, anger management, conflict resolution,
decision making, and academic enrichment); informational focused programs
(n=17) that emphasize educational content concerning tobacco, alcohol, other
drugs, and related issues; recreational focused programs (n=5) that
devote substantial time to substance-free leisure and enrichment activities; and
affective programs (n=12) that focus on youth’s personal and social
concerns, often including self-image and self-esteem.
With respect to intervention design, the HRY process data supported
development of measures of the method in which programs are delivered.
Previous research has shown that programs that engage youth in interactive
activities are more effective than noninteractive programs (5, 6, 8).
The HRY data focused on identifying specific approaches to interactive
delivery. Specifically, we identified programs with delivery methods that
focused on building connections with others and programs that focused on
introspective, or self-reflective learning (12:12). Of the 46 programs analyzed
for substance use change, 17 had a strong emphasis on introspective
learning methods and 13 placed an emphasis on building connections
with others.
9. Site evaluators reviewed program records and documentation, interviewed program
staff and local evaluators, and observed program activities. Information from these
sources was used by each evaluator to complete standard protocols that paired closed-
ended indicators of program design and implementation process characteristics with
brief narrative descriptions and rationale for coding decisions. This careful field
procedure produced comparable information on program design and implementation
process features across sites, while preserving information on program context that
can be important to interpreting findings on complex organizational processes.
Program Implementation Measures
Several areas of program implementation were hypothesized to have a
relationship to program effectiveness, including staff training, program
coherence, program duration, program intensity, and program management.
Findings on the relative contribution of each of these variables are reported
elsewhere (16, 18). In the analyses reported here, 2 measures of the
implementation of program services in the National Cross-Site programs are
particularly important. The cross-site measure of program coherence refers to
the extent to which program theory is explicit, articulated, and used to focus
multiple activities on achieving program objectives. Achieving coherence is
closely related to adequate and relevant training of staff so that program
objectives, procedures, and rationale are understood and put into practice day
to day. Analysis of the site visit data allowed us to categorize the programs as
exhibiting higher (n=19) or lower coherence (n=27). Program intensity is
measured as the number of hours per week that youth were involved in the
program.10 Intensity ranged from programs that averaged less than one hour
of service per week, to programs offering 15 hours of service. To accommodate
the skewed shape of the distribution of intensity for the 46 programs,
and to make the intensity measure more compatible with the dichotomous
categorizations identified above, programs were divided into 2 equal groups
of 23 sites, those with higher intensity (3.3 hours per week or more) and
lower intensity (less than 3.3 hours per week).
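The dichotomization of a skewed measure into two equal groups can be sketched as a median split. The hours below are hypothetical values chosen only so that the cutoff falls at 3.3; with an even number of sites and no ties, using the upper of the two middle values as the cutoff puts exactly half the sites in each group.

```python
from statistics import median_high


def split_at_median(hours_per_week):
    """Return the cutoff and a 'higher'/'lower' label for each site.

    The cutoff is the smallest value in the upper half, so sites at or
    above it are labeled 'higher'.
    """
    cutoff = median_high(hours_per_week)
    labels = ["higher" if h >= cutoff else "lower" for h in hours_per_week]
    return cutoff, labels


# Hypothetical average weekly hours for six sites (skewed toward low values).
cutoff, labels = split_at_median([0.5, 1.0, 2.0, 3.3, 6.0, 15.0])
print(cutoff, labels)
```

A split like this trades information for robustness: the single site at 15 hours no longer dominates the comparison, but differences within each half are ignored.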
In summary, the multimethod information gathered in the cross-site
evaluation supported development of measures of program implementation
through direct indicators (e.g., dosage measures), multiple-item indicators, or
coded variables. The specific measures described here have been identified in
previous analyses as those most strongly associated with program effect sizes
and are used in subsequent analyses to identify strong programs.
Procedures
To produce optimal estimates, multilevel statistical models that correct for
potential biases attributable to ‘‘nesting’’ of individuals in separate settings
are appropriate (19, 20). In the analyses presented below, we utilize
Hierarchical Linear Modeling (HLM) to conduct pooled, longitudinal
analyses (21). In addition, the analysis uses a meta-analytic approach to
identify, examine, and explain variation in impact on outcomes across sites.
Effect sizes are calculated for the period of time youth were actively involved
in program services (baseline to exit). As noted above, these effect sizes are
the dependent variable in exploratory analyses of the relationship between
program characteristics and program effectiveness.
10. The program level variables used in this article are those that previous analyses
have identified as significant correlates of program effectiveness in achieving
outcomes (16). These analyses demonstrated that intensity is a significant predictor of
effect size in these programs, while total amount of contact and length of program
were not.
In summary, the overall analytic strategy used for the findings presented
here combines ‘‘top-down’’ or pooled analyses using multilevel statistical
modeling, and ‘‘bottom-up’’ or meta-analytic approaches based on program
level explanations of effect sizes (22). Each of these approaches answered
different research questions. The multilevel statistical modeling was
important for identifying pooled longitudinal effects of interventions, and
for describing differences in these effects between subgroups of programs or
subgroups of participants (e.g., youth who were already using substances at
baseline). The meta-analytic approach was useful for clearly describing the
differences in effectiveness between sites and for exploratory analyses
identifying those characteristics of programs that were related to larger effect
sizes. Using both approaches allowed for exploration of the data,
identification of variables contributing to effective prevention programming,
and testing of the longitudinal impacts of effective practice.
FINDINGS
A full pooled analysis of trends for the composite 30-day substance use by
program participants (n=5,605) and comparison group (n=4,341) youth
across 4 points in time produced no statistically significant differences.
However, as demonstrated in the growth curves displayed in Figure 3, HLM
analyses demonstrated that participant youth did report less increase in the
use of alcohol (linear trend, treatment interaction=−0.056, p<0.05) or
marijuana (linear trend, treatment interaction=−0.06, p<0.05).11 Even though
statistically significant, these substance-specific differences in trend are very
small, and they do not hold in the three-item 30-day use measure in which
cigarette use is included.
11. Analyses were 3-level HLM models with repeated measures (Level 1) nested
within individual respondents at Level 2, and respondents nested within sites at
Level 3. The Level 1 covariate was time between administrations; Level 2 covariates
included 1) group status (participant or comparison), 2) selection propensity score (a
multivariate correction for nonequivalence between participant and comparison
groups at baseline within each site), 3) age, 4) gender, and 5) 2 dichotomous measures
of race/ethnicity (African American and Hispanic). At the program exit administration,
retention rates were 17% for participants and 83% for comparison; at 6 months
after exit, 74% for participants and 75% for comparison; and at 18 months after exit,
68% for participants and 67% for comparison. HLM analyses reported here are on the
full sample with missing data replacement using an expectation-maximization (EM)
algorithm technique.
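The article does not print the model equations. As a hedged sketch, a three-level growth model of the kind described in note 11 might be written as below. All notation is assumed here rather than taken from the source: $Y_{tij}$ is 30-day use at occasion $t$ for youth $i$ in site $j$, $T_{tij}$ is time since baseline, $G_{ij}$ is group status (1 = participant), and $X_{kij}$ are the remaining Level 2 covariates. Which terms carry random effects is also an assumption.

```latex
\begin{align*}
\text{Level 1 (occasions):}\quad
  & Y_{tij} = \pi_{0ij} + \pi_{1ij}\,T_{tij} + \pi_{2ij}\,T_{tij}^{2} + e_{tij} \\
\text{Level 2 (youth):}\quad
  & \pi_{0ij} = \beta_{00j} + \beta_{01j}\,G_{ij} + \sum_{k}\delta_{kj}\,X_{kij} + r_{0ij} \\
  & \pi_{1ij} = \beta_{10j} + \beta_{11j}\,G_{ij} + r_{1ij} \\
  & \pi_{2ij} = \beta_{20j} + \beta_{21j}\,G_{ij} \\
\text{Level 3 (sites):}\quad
  & \beta_{00j} = \gamma_{000} + u_{00j}, \qquad
    \beta_{11j} = \gamma_{110}, \qquad
    \beta_{21j} = \gamma_{210}
\end{align*}
```

In this notation, a reported "linear trend, treatment interaction" would correspond to $\gamma_{110}$ and a "quadratic trend, treatment interaction" to $\gamma_{210}$. A negative $\gamma_{110}$ means that use among participants grew more slowly over time than use among comparison youth.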
In the real-world settings of programs in this MSE sample, there are 3
plausible explanations for why the pooled result may underestimate the
effectiveness of substance abuse programs in the study. First, some or all of
the evaluation designs may allow sufficient design ‘‘noise’’ (23) to make it
difficult to detect true effects (e.g., contamination of comparison groups).
Second, the effectiveness of programming may be differential, or
differentially detectable given the outcome measure, according to the
characteristics of participants (e.g., history of use, gender). Third, given the
diversity of design and implementation circumstances of individual
programs, differences between effective and ineffective programs may be
masked in pooled findings. Each of these plausible explanations for
suppressed pooled findings is explored in the following sections.
Figure 3. Trends in substance use for participant and comparison youth over time
(N=9,946).
Adjustments for Comparison Group Prevention Exposure
In field experiments, comparison group youth may be exposed to services
that have the same or similar objectives as the intervention under study.
This potential source of comparison group contamination is even more
problematic in longitudinal studies that include follow-up periods in which
either participant or comparison group youth may participate in additional
community services. When the intent of the study is to test the effectiveness
of particular interventions, the participation of comparison youth in similar
services may diminish the ability to detect effects.
The HRY study included data that identified the degree to which youth
in the comparison groups in each site had the opportunity to participate in
substance abuse prevention programming services.12 This measure allowed
comparisons of outcomes for programs in which comparison youth had
‘‘lower’’ or ‘‘higher’’ opportunity for participation in substance use
prevention programs. Comparison youth in the 23 sites with higher
opportunity for prevention participation increased their substance use less
than comparison youth in the 23 sites with lower opportunity for
participation. Specifically, change scores for comparison youth in high
exposure sites were 0.029 for cigarettes, 0.032 for alcohol, and 0.03 for
marijuana compared to 0.09, 0.075, and 0.076, respectively, for youth from
communities with less opportunity for exposure to prevention services. This
result is consistent with the expectation that full prevention effects in sites
with higher availability of prevention services will be underestimated
because comparison youth in these field settings also benefit from prevention
services. Studies in which comparison youth have opportunities to participate
in prevention activities within the community, and this exposure remains
unmeasured, will underestimate the overall value of prevention services in
attaining intended outcomes.
Figure 4. Trends in 30-day substance use measures for participant and comparison
youth in sites with low opportunity for comparison group participation in prevention
(N=5,195; 23 sites).
12. Data was coded from site visit protocols, and confirmed or elaborated through
telephone interviews with program directors and local evaluators. Comparison group
exposure to prevention services was measured through a multiple-item index that
summed scores on 1) whether the comparison group was ‘‘reduced service,’’ 2) whether
the comparison group was drawn from another organized program (e.g., an after-school
service program), 3) whether the comparison group was in a ‘‘high service’’
community, 4) whether the comparison group was in an urban area, and 5) whether
key informants identified explicit program participation by comparison group youth.
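The comparison group exposure index described in note 12 can be sketched as a sum over five site-level items. The source says only that scores were summed, so scoring each item as 0 or 1 is an assumption, and the class and field names are hypothetical.

```python
from dataclasses import astuple, dataclass


@dataclass
class ComparisonGroupContext:
    """Site-level indicators (assumed binary) for one comparison group."""
    reduced_service: bool           # comparison group received reduced services
    from_organized_program: bool    # drawn from another organized program
    high_service_community: bool    # located in a "high service" community
    urban_area: bool                # located in an urban area
    reported_participation: bool    # informants identified explicit participation


def exposure_index(ctx):
    """Sum of the five indicators; higher means more opportunity for exposure."""
    return sum(astuple(ctx))


# Hypothetical urban site in a high-service community with a reduced-service group.
site = ComparisonGroupContext(True, False, True, True, False)
print(exposure_index(site))
```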
Figure 4 summarizes trends for specific substances and the combined
30-day use measure for participant and comparison youth in the 23 sites in
which comparison youth have low opportunity for exposure to prevention
services. As compared to the full sample results in Figure 3, the growth
curves indicate greater separation between participant and comparison youth
with participant youth consistently indicating lower substance use rates. As
in the full sample, statistically significant benefits for participant youth
across programs are found only for the use of alcohol (linear trend, treatment
interaction=−0.075, p<.05) and marijuana (linear trend, treatment
interaction=−0.070, p<.05). Adjusting the sample to account for comparison
group exposure to prevention services strengthens the evidence for the
effectiveness of prevention programming in reducing substance use, but
the pooled effect across diverse programs in diverse community settings
remains small.
Adjustments for Participant Characteristics
A pattern of very small or null effects may be misleading if programs have
strong measurable effects for youth in one subgroup, but no measurable
effects for another subgroup. The combined result may mask the positive
effects experienced by the benefiting subgroup, particularly if the benefiting
group is smaller. Similarly, programs may benefit different groups of
participants in different ways. The combination of these beneficial patterns
does not accurately reflect the specific effects for either group, and may
produce an apparent null effect. In the HRY study, analyses of differential
program impacts across participant subgroups found important differences in
impacts for youth who were self-reported substance users in the 30 days
before baseline (as compared to those reporting no use in the last 30 days),
and between males and females.
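The masking argument above is a weighted-average argument, and a small worked example makes the arithmetic concrete. The shares and effects below are hypothetical illustrations, not study estimates.

```python
def pooled_effect(groups):
    """Sample-weighted average effect.

    groups: list of (share_of_sample, subgroup_effect) pairs.
    """
    total_share = sum(share for share, _ in groups)
    return sum(share * effect for share, effect in groups) / total_share


# Hypothetical: a quarter of the sample benefits (-0.40), the rest shows no change.
diluted = pooled_effect([(0.25, -0.40), (0.75, 0.0)])
print(round(diluted, 3))
```

In this illustration the benefiting subgroup's effect is diluted to one-fourth of its size in the pooled estimate, which is why a small pooled coefficient does not rule out a substantial effect in a minority subgroup.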
Self-Reported Use at Baseline
The National Cross-Site Evaluation programs served an adolescent and pre-
adolescent population centered on the middle school years. Approximately
three-fourths (76%) of the youth in the study reported that they had not used
any of the 3 index substances for the study in the last 30 days at their baseline
measurement point. While the age and abstinent or experimental use patterns
of these youth are appropriate to prevention programs, the large percentage of
zero use in the major dependent variable reduces variance and creates a
‘‘floor effect’’ for measuring reductions in use. Furthermore, the periodicity
of the measure (i.e., use in the last 30 days) makes instability and significant
regression to the mean likely for experimenters or occasional users. In sum,
the large percentage of youth who did not use substances, or who were
occasional users, made it impossible to produce dramatic reductions in
substance use among the full pooled study sample.
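The regression to the mean expected with a 30-day window can be illustrated with a short simulation. It is a sketch under strong simplifying assumptions that are not from the study: every youth is an occasional user with the same fixed chance of any use in a given 30-day window, waves are independent, and there is no intervention at all.

```python
import random


def followup_rate_among_baseline_users(n_youth=10000, p_use=0.3, seed=0):
    """Share of baseline users who also report use at follow-up.

    Assumption: use in each 30-day window is an independent draw with
    probability p_use, so nothing changes between waves.
    """
    rng = random.Random(seed)
    baseline = [rng.random() < p_use for _ in range(n_youth)]
    followup = [rng.random() < p_use for _ in range(n_youth)]
    selected = [i for i, used in enumerate(baseline) if used]
    return sum(followup[i] for i in selected) / len(selected)


# Everyone selected used at baseline (rate 1.0); the follow-up rate falls toward p_use.
print(followup_rate_among_baseline_users())
```

Because the selected group's use appears to fall even though nothing changed, any reduction among baseline users has to be judged against the same reduction in comparison youth.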
In high-risk communities, prevention programming must change
behavior in youth who have already begun to use substances, not just deter
those who have not yet started. Approximately 25 percent of youth reported
some substance use at baseline.
Figure 5. Trends in 30-day substance use measures for participant and comparison
youth who initiated substance use prior to baseline measurement in sites with low
opportunity for prevention participation (n=1,235, 23 sites).
Figure 5 displays the trends in 30-day substance use measures for
participant and comparison youth who had already begun to use substances in
the 23 sites in which comparison youth have less opportunity to participate in
prevention services. Self-reported 30-day substance use among participant youth
was slightly higher than that of comparison youth at baseline. This reported use
rate drops below that of comparison youth by program exit, and the magnitude of
this relative decrease in participant use grows at the 6- and 18-month postexit
measurement points. The difference between participant and comparison
group use is statistically significant for youth who reported some use at baseline
(coefficient=−0.27, p=0.02). These findings indicate that the CSAP-funded
prevention programs provided effective interventions for youth who have
already initiated use. As in previous analyses, the differences in linear trend
for alcohol and marijuana use were significant (linear trend, treatment
interaction=−0.368, p=0.008 for alcohol; linear trend, treatment
interaction=−0.307, p=0.038 for marijuana) with similar patterns over time as
shown in the figure related to 30-day substance use.
Figure 5 also shows that, overall, participant youth in this subgroup
reduced their use of all substances through the full period of the study. Given
that widespread substance use typically starts and escalates among youth in
the age groups included in this study, programs are typically considered
effective when they slow the rate of increased use among program participants.
However, this analysis of youth who were already using at baseline
demonstrates that program participation can actually reduce use rates.13
13. The slight decrease in comparison group use in these analyses may be attributable
to regression toward the mean. Some of the youth who reported use in the 30 days
prior to the baseline measurement are not habitual users, for example, and may not use
in the 30 days prior to a second measurement. Controlling for this tendency is one
reason for a comparison group, and the important point here is that reductions in use
scores at repeated measurement points for participant youth were larger than those for
comparison youth.
Gender Differences
Prevention practitioners and researchers have become increasingly aware of
the differences between boys and girls in the way substance use develops
and how prevention affects this development (24). Gender differences in the
National Cross-Site sample were evident in our partitioned analyses of males
and females who participated in programs in which comparison youth had
low opportunity for prevention program services. Figure 6 displays separate
growth curves for 30-day substance use by males and females. In contrast to
the pooled analysis, the patterns of effects were nonlinear for males
(quadratic trend, treatment interaction=0.049, p=0.025) but linear for
females (linear trend, treatment interaction=−0.044, p=0.016). In both
cases, participant group youth experience a decrease in use relative to
comparison youth of the same gender. However, the shapes of the growth
curves for participant and comparison youth of each gender are dramatically
different. The curves for participant and comparison females grow pro-
gressively further apart in the shape of a fan. The effects of programming
are small, but are sustained and increase over time. By contrast, use rates
reported by participant boys diverge more dramatically from comparison
boys at exit and 6 months after exit, but the participant and comparison
trends converge at 18 months after exit. In the study overall across all the
programs in the low exposure sites, boys reported reductions in use while
involved in the prevention program. However, on average these gains eroded
by the end of the study.
These differences are dramatic, and the average trends when these
patterns are combined do not clearly reflect either. The slow divergence in
the female pattern pulls in the more dramatic divergence in participating and
comparison males at the middle time points. The convergence in the males at
the final time point mutes the growing difference for females. The result is a
nonsignificant trend, and a masking of two very different, but significant,
longitudinal patterns. When partitioned, both females and males show
stronger, albeit different, program effects than are evident when they are
combined in the pooled sample.
Figure 6. Trends in 30-day substance use by gender in sites with low opportunity for
prevention participation.
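As a rough illustration of how pooling can mute two differently shaped patterns, the sketch below applies linear and quadratic contrasts to invented participant-minus-comparison differences. The numbers, the equal weighting of males and females, and the assumption of equally spaced measurement points are all hypothetical simplifications; this is not the HLM analysis used in the study.

```python
# Hypothetical participant-minus-comparison differences in 30-day use at the
# four measurement points (baseline, exit, 6 months, 18 months). These values
# are invented for illustration; they are not the study's estimates.
female_diff = [0.00, -0.02, -0.04, -0.06]  # gap widens steadily (linear "fan")
male_diff = [0.00, -0.10, -0.10, 0.00]     # gap opens, then closes (quadratic)

# Orthogonal polynomial contrasts for four equally spaced points. Treating the
# waves as equally spaced is a simplification made only for this sketch.
LINEAR = [-3, -1, 1, 3]
QUADRATIC = [1, -1, -1, 1]


def contrast(values, weights):
    """Weighted sum of the wave-by-wave differences."""
    return sum(v * w for v, w in zip(values, weights))


# Pooling equal numbers of males and females averages the two patterns.
pooled_diff = [(f + m) / 2 for f, m in zip(female_diff, male_diff)]

for label, diffs in [("female", female_diff),
                     ("male", male_diff),
                     ("pooled", pooled_diff)]:
    print(label,
          "linear:", round(contrast(diffs, LINEAR), 3),
          "quadratic:", round(contrast(diffs, QUADRATIC), 3))
```

In this toy example each pooled contrast is half the size of the corresponding single-gender contrast, which is the kind of dilution the paragraph above describes.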
Differences in Program Effectiveness
Another plausible explanation of the limited prevention effect found in the
pooled analyses is that there may be significant differences in the effectiveness
of programs across sites. Null, or even counterhypothetical, effects of some
programs may cancel out the positive outcomes produced in other sites.14 A
meta-analysis of program effect sizes was used to explore this possibility.
Figure 7 displays the distribution of effect sizes for 30-day substance use
across the programs in the National Cross-Site Evaluation sample.
Sites with high opportunity for prevention participation are included in
this distribution, and the magnitudes of effect sizes in Figure 7 are biased
somewhat to the left.15 Values range from -0.71 to 1.54, with positive values
indicating that participant youth reduced their use rates relative to comparison
youth during the program period. Most of the effect sizes are small,
with 19 programs clustered between -0.09 and 0.09. Just 8 programs had
positive effect sizes larger than 0.20, a conventional standard for a meaningful
program effect. Nevertheless, a primary objective of the cross-site evaluation
was to generate information about the characteristics of effective
programs in natural community settings. Not all of the programs reduced
substance use, though they often had other positive outcomes for the youth.
Among those that did show positive effects on use, some were more effective
than others. This diversity in outcome, and our ability to measure differences
between programs through our primary data on program design and
implementation, provided an opportunity to identify those program characteristics
that correlate with more positive effect sizes. This exploratory analysis
tested hypotheses developed in past prevention research against this large
sample of programs implemented under real community conditions.
Figure 7. Distribution of effect sizes for 30-day substance use across substance
abuse prevention programs (n=46).
14 To test whether there was significant variance in longitudinal program effectiveness
between programs in the HRY sample, we tested the "deviance" between two
models, one with the between-site variance constrained to 0 and the other with
variance unconstrained. A likelihood ratio chi-square test was used to test the
significance of the variability in the group linear trend interaction across sites. For a model
including group, gender, Hispanic identity, African American identity, and a propensity
score, the chi-square value was 52.592 with 2 degrees of freedom, which is highly significant.
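To make the likelihood ratio test in footnote 14 concrete, the following minimal sketch converts the reported deviance difference (52.592 on 2 degrees of freedom) into a p-value. The function name is hypothetical, and the closed form used here applies only to even degrees of freedom; the original analysis presumably relied on standard statistical software.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate for even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Deviance difference and degrees of freedom as reported in footnote 14.
p_value = chi2_sf_even_df(52.592, 2)
print(f"p = {p_value:.2e}")
```

The resulting p-value is far below 0.001, consistent with the footnote's description of significant between-site variability.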
Analyses of a large number of program level measures of organizational
capacity, intervention design, and implementation design (11, 12, 16) iden-
tified 5 program characteristics that produced statistically significant16
contrasts in average effect size between programs that exhibited the char-
acteristic and those that did not (see Program measures section above). Three
are features of intervention design (behavioral skills emphasis, use of intro-
spective learning, and a focus on connection-building delivery methods) and
2 are features of implementation (coherent program implementation practices
and high service intensity).
Figure 8 displays average effect sizes for programs manifesting
each of the characteristics of effective programming. While these effect sizes
are substantially larger than the overall effect size for all 46 programs
(0.022), none of these average values exceeds 0.20, a common standard for a
small but meaningful effect. This is partly due to the downward biases of
comparison groups with high exposure to prevention services, the limited
ability to detect effects because the great majority of youth are nonusers at
baseline, and the suppression of overall effects attributable to the combination
of genders. Furthermore, it is not surprising that no one characteristic of
the complex social processes represented in prevention programs would be
sufficient to attain strong effects across diverse programs and contexts.
Nonetheless, each of these program characteristics made a statistically significant
difference in the magnitude of measured effects.
15 In a separate analysis (11), it was estimated that high opportunity for comparison
group exposure to prevention services reduced the grand mean (average) adjusted
effect size for sample programs from 0.081 to 0.022. Since the effects of comparison
group exposure are not collinear with important explanatory variables in the effect
size analysis, and excluding the sites would seriously reduce the power of the
program-level analyses of effect sizes, the effect size analyses include all 46 study
sites for which outcome measures were valid.
16 The exploratory analyses to identify characteristics associated with effect size
were conducted as dichotomous contrasts using the nonparametric Wilcoxon rank
order test as a conservative alternative for reducing the influence of outliers. All of the
contrasts for these five variables were statistically significant with an n=46, and
probability <0.05, one-tailed test. Results were consistent when tested with parametric
tests and a winsorized effect size measure.
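The dichotomous contrasts described in footnote 16 can be sketched as follows, assuming that the "rank order test" refers to the Wilcoxon rank-sum (Mann-Whitney) test. The effect sizes below are invented for illustration, the function names are hypothetical, and the normal approximation is used without tie or continuity corrections, so this is only an approximation of what a statistical package would report.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(with_trait, without_trait):
    """One-tailed rank-sum test that with_trait tends to be larger.

    Returns (U statistic for with_trait, one-tailed p-value) using the
    normal approximation without tie or continuity corrections.
    """
    n1, n2 = len(with_trait), len(without_trait)
    ranks = average_ranks(list(with_trait) + list(without_trait))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2.0))
    return u, p_one_tailed


# Hypothetical program effect sizes, invented for illustration only.
has_characteristic = [0.31, 0.12, 0.25, 0.08, 0.19, 0.40]
lacks_characteristic = [0.02, -0.05, 0.10, -0.12, 0.05, 0.00, 0.15]
u_stat, p = rank_sum_test(has_characteristic, lacks_characteristic)
print("U =", u_stat, "one-tailed p =", round(p, 4))
```

Because the test works on ranks rather than raw values, a single extreme effect size cannot dominate the contrast, which is the conservative property the footnote refers to.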
Effects of Multiple Positive Program Characteristics
Further analyses tested the degree to which programs with multiple positive
characteristics produced larger effects. The fact that the 5 characteristics were
not highly correlated strengthened the potential for gain in effectiveness
through combinations of strong design and implementation characteristics.17
The analyses identified a clear increment in effectiveness when programs
implemented at least 4 of the positive program characteristics.18 The exact
combination did not make a significant difference. Eight of the 46 programs
were identified as comprehensively strong programs implementing at least 4
of the 5 positive characteristics. The median effect size for these eight
programs is 0.22 compared to a median of -0.02 for all other programs. This
difference is large and highly statistically significant (p < 0.01).
Figure 8. Summary of statistically significant effects of program characteristics on
effect sizes for 30-day substance use.
17 Correlations between most program level measures were nonsignificant with
coefficients (phi) ranging from 0.04 to 0.266, indicating that for the most part positive
characteristics do not cluster in the same programs. The use of introspective learning,
which was significantly correlated with connection-building methods (0.690) and
program coherence (0.531), is the major exception. This co-occurrence would make it
difficult to establish the independent contribution of these variables to program
effectiveness. However, introspective learning was retained in the following analysis
because of its theoretical interest, and because the analysis technique does not include
the estimation of unique contributions by individual program characteristics.
18 Importantly, the 8 comprehensively strong programs were not identified simply on
the basis of the magnitude of their effect sizes (which, as previous analyses have
shown, may be biased or inaccurate). Rather, the programs are identified according to
whether they exhibit program characteristics that are correlated with effect sizes
across a large number of diverse programs.
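The classification rule described above (a program counts as comprehensively strong when it exhibits at least 4 of the 5 characteristics) and the comparison of median effect sizes can be sketched as follows. The program records, site labels, and field names are hypothetical and invented for illustration; they are not the study's data.

```python
from statistics import median

# The five characteristics named in the text; the identifiers are assumptions.
CHARACTERISTICS = {
    "behavioral_skills",
    "introspective_learning",
    "connection_building",
    "coherent_implementation",
    "high_intensity",
}

# Hypothetical program records: the characteristics each program exhibits and
# its entry-to-exit effect size. Invented values, not study results.
programs = [
    {"site": "A", "effect_size": 0.24,
     "traits": {"behavioral_skills", "introspective_learning",
                "connection_building", "coherent_implementation"}},
    {"site": "B", "effect_size": 0.30, "traits": set(CHARACTERISTICS)},
    {"site": "C", "effect_size": -0.02,
     "traits": {"behavioral_skills", "high_intensity"}},
    {"site": "D", "effect_size": 0.05, "traits": {"connection_building"}},
    {"site": "E", "effect_size": -0.10,
     "traits": {"behavioral_skills", "introspective_learning",
                "high_intensity"}},
    {"site": "F", "effect_size": 0.18,
     "traits": {"behavioral_skills", "connection_building",
                "coherent_implementation", "high_intensity"}},
]


def is_comprehensively_strong(program, threshold=4):
    """True when the program shows at least `threshold` of the characteristics."""
    return len(program["traits"] & CHARACTERISTICS) >= threshold


strong = [p for p in programs if is_comprehensively_strong(p)]
others = [p for p in programs if not is_comprehensively_strong(p)]

strong_median = median(p["effect_size"] for p in strong)
other_median = median(p["effect_size"] for p in others)
print("strong sites:", [p["site"] for p in strong], "median:", strong_median)
print("other sites:", [p["site"] for p in others], "median:", other_median)
```

Note that the grouping depends only on the characteristics, not on the effect sizes themselves, which mirrors the point made in footnote 18.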
Longitudinal Outcomes of Effective Programming
These explanations of program effectiveness have been based on program
effect sizes calculated for the period between program entry and exit. An
advantage of the National Cross-Site Evaluation of High-Risk Youth
Programs data set is that it allows the determination of whether the effec-
tiveness of the 8 comprehensively strong programs is maintained after
program exit. The study includes 6- and 18-month follow-up data points.
To test the hypothesis that these comprehensively strong programs will
produce lasting effects on the substance use of participants, the HLM analysis
was replicated with only participants and comparison youth in the 8
comprehensively strong program sites. Figure 9 presents growth curves con-
trasting trends in substance use for participants and comparison youth in the
8 comprehensively strong programs. These longitudinal findings support and
extend the effect size analysis reported above. There are dramatic differences
between the findings for the total sample and findings for the 8 programs with
at least 4 positive program characteristics. While differences in substance
use change over time between participant and comparison group youth were
very small (not statistically significant) in the full sample, differences between
the participant and comparison youth in the 8-program cluster were
pronounced and statistically significant [interaction coefficient = 0.0322,
p = 0.039 (one-tailed test)]. Even 18 months after program exit, use rates
for the participant youth were significantly lower than use rates of comparison
youth, demonstrating enduring prevention effectiveness. Furthermore, the
growth curves for these programs are nearly identical for males and females,
suggesting that comprehensively strong programs produce similar and lasting
benefits for both genders (24). These 8 comprehensive programs are more
effective over time than other programs, and their effects are more equitable
for males and females.
Figure 9. Trends in 30-day substance use for participant and comparison youth in
8 sites with positive program characteristics (n=1759).
DISCUSSION
The National Cross-Site Evaluation of High Risk Youth Programs is an
example of a large-scale MSE designed to improve knowledge concerning
the design and implementation of effective prevention programs in
community settings. The sample represents the diversity of HRY programs
funded by CSAP during the funding period under study, including the full
range of design and implementation strengths and weaknesses that
characterize community-based programs. Neither the individual site research
designs nor the findings were filtered through the scrutiny of report
preparation or journal review found in studies incorporated into meta-
analyses. It follows that not all of the programs would achieve their outcome
objectives, and of those that did, some would achieve stronger outcomes than
others. Furthermore, research designs in community settings will differ in
accuracy and sensitivity, introducing differential capacity to detect outcome
effects that may exist.
This article has presented an overview of a complex series of analyses
carried out for the cross-site evaluation. This research has utilized a mix of
multilevel statistical modeling of longitudinal data, and meta-analytic
analyses of program effect sizes. In combination, these approaches to
analysis of a large MSE data set support important findings relevant to
research and practice in prevention programs for youth at high risk. With
respect to research method, the analysis clarifies several important methods
issues that may obfuscate evaluation results, and that are not typically
detectable through either meta-analytic or single-site evaluations.
The overall analysis of the full-pooled sample at the individual level
produced no significant differentiation in 30-day substance use growth curves
for program participants and comparison youth, indicating that as a group,
these programs did not demonstrate strong effects in reducing substance use.
However, subsequent analyses indicated that the availability of prevention
services to comparison youth in some sites suppressed findings of program
effects in the full sample. Additional analyses of subgroups of youth helped
explain the small aggregate effects found in the pooled
analysis. In particular, these analyses demonstrated significant and mean-
ingful program reductions in substance use by youth who were already using
when they entered programs. The fact that the large majority of nonusing (or
experimenting) youth in the programs did not experience significant
measurable effects does not have a clear interpretation. The issue may be
programmatic, supporting a conclusion that these programs are more
effective as early interventions than they are as primary prevention.
Alternatively, the result may reflect a combination of low variance and the
instability of a 30-day use measure within this low variance environment. In
this case, longer-term studies may be necessary to specify prevention
effectiveness for young adolescents.
Analysis by gender is also revealing. Analyses of the large cross-site
sample partitioned by gender indicated that both males and females reduced
substance use through participation in a prevention program, but in very
different patterns. Participating females reported gradual but lasting
reductions in their substance use relative to comparison females. Participating
males reported large but diminishing reductions in use relative to comparison
males. From an analytic perspective, the combination of male and female
patterns helps explain the null pattern of differences between participant and
comparison youth in the full sample.
The research plan for the cross-site data included a meta-analysis
of program effect sizes for 30-day substance use. This analysis
explored the explanation of differences in prevention effectiveness that are
attributable to intervention design and implementation rather than to
differences in the individual characteristics of participating youth. This
analysis produced findings of great significance for the design and
implementation of effective prevention programs serving youth at high risk.
Exploratory analyses of contrasts in effect size identified five characteristics
of intervention design and implementation that distinguish program
effectiveness at a statistically significant level. These characteristics included
1) behavioral skills emphasis, 2) use of introspective learning, 3) connection-
building focus, 4) coherent program implementation practices, and 5) high
service intensity. Analyses of the average effect sizes of programs
characterized by multiple positive intervention and implementation charac-
teristics indicated that programs with 4 or more of these positive character-
istics had average effect sizes that were dramatically higher than other
programs, and that produced significant lasting reductions in substance use
relative to comparison youth for both boys and girls. In programs that are
well designed and implemented, the differential gender results that are found
across a more diverse set of programs do not appear. This is a strong
recommendation for the implementation of science-based programs that
incorporate design and implementation practices that have demonstrated
applicability in a variety of program models and settings.
Finally, this overview has demonstrated the importance of MSEs as a
contributor to knowledge about the effectiveness of substance abuse
prevention or other social interventions. MSEs provide the opportunity to
disentangle the complex interaction of methods error (including as shown
here, exposure of comparison group youth to prevention services), differ-
ences in outcomes based on the characteristics of the youth (males/females,
users/nonusers), and differences in the effectiveness of intervention and
implementation practices. They provide a context of understanding for the
selection, application, and adaptation of program models that the evidence
base for the models individually cannot provide. MSEs have an important and
unique place in research to improve prevention interventions and other
services designed to improve social conditions.
REFERENCES
1. Sambrano S, Springer JF, Hermann J. Informing the next generation of
prevention programs: CSAP’s cross-site evaluation of the 1994–95
high-risk youth grantees. J Commun Psychol 1997; 25:375–395.
2. Brounstein PJ, Zweig JM. Understanding Substance Abuse Prevention,
Toward the 21st Century: A Primer on Effective Programs. Department
of Health and Human Services Publication No. (SMA)99-3301,
Washington, DC: Government Printing Office, 1999.
3. Holder H, Flay B, Howard J, Boyd G, Voas R, Grossman M. Phases of
alcohol prevention research. Alcohol, Clin Exp Res 1999; 23:183–194.
4. Friedman L, DeMets D. Multicenter trials. In: Fundamentals of Clinical
Trials. New York: Springer-Verlag, 1998:345–357.
5. Tobler NS, Roona MR, Ochshorn P, Marshall DG, Streke AV,
Stackpole KM. School-based adolescent drug prevention programs:
1998 meta-analysis. J Prim Prev 2000; 20:275–336.
6. Tobler NS, Stratton HH. Effectiveness of school-based drug prevention
programs: a meta-analysis of the research. J Prim Prev 1997; 18:71–
128.
7. Tobler NS. Drug prevention programs can work: research findings. J
Addict Dis 1992; 11(3):1–28.
8. Tobler NS. Meta-analysis of 143 adolescent drug prevention programs:
quantitative outcome results of program participants compared to a
control or comparison group. J Drug Issues 1986; 16:537–567.
9. Schaps E, DiBartolo R, Moskowitz J, Palley C, Churgin S. A review of
127 drug abuse prevention evaluations. J Drug Issues 1981; 11(1): 17–
43.
10. Straw RB, Herrel JM, eds. Conducting Multiple Site Evaluations in
Real-World Settings. San Francisco: Jossey-Bass, 2002.
11. Derzon J, Springer JF, Sale E. Person-Centered Approaches for
Assessing the Impact of the CSAP National Evaluation of High-Risk
Youth Programs. 2002. Manuscript in preparation.
12. SAMHSA. Designing and implementing effective prevention programs
for youth at high risk. In: Points of Prevention. CSAP Monograph
Series #3, Washington, DC: U.S. Government Printing Office, 2002.
13. Cohen J. Statistical Power Analysis for the Behavioral Sciences.
Hillsdale, NJ: Lawrence Erlbaum Associates, 1988.
14. Scheirer MA. Program theory and implementation theory: implications
for evaluators. In: New Directions for Program Evaluation. Vol. 33.
1987:59–76.
15. EMT Associates, Inc. CSAP national cross-site evaluation of high risk
youth programs: research design. 1996.
16. Springer JF, Sale E, Hermann J, Sambrano S, Kasim R, Nistler M.
Characteristics of effective substance abuse prevention programs for
high-risk youth. J Prim Prev, (forthcoming).
17. Hansen W. School based alcohol prevention. Alcohol Health Res World
1993; 17(3).
18. EMT Associates, ORC Macro. CSAP National Cross-Site Evaluation of
High-Risk Youth Programs Report. Rockville, MD: Department of
Health and Human Services, 2000.
19. Kreft IG, deLeeuw J. Introducing Multilevel Modeling. London: Sage,
1998.
20. EMT Associates, ORC Macro. CSAP National Cross-Site Evaluation of
High-Risk Youth Programs Year Four Technical Report. Rockville,
MD: Department of Health and Human Services, 2000.
21. Murray D. Design and Analysis of Group-Randomized Trials. New
York: Oxford University Press, 1998.
22. Bryk AS, Raudenbush SW. Hierarchical Linear Models: Applications
and Data Analysis Methods. Newbury Park: Sage Publications, 1992.
23. Lipsey MW. Design Sensitivity: Statistical Power for Experimental
Research [Authored Book]. Thousand Oaks, CA: Sage Publications,
1990.
24. SAMHSA. Making prevention effective for adolescent boys and girls:
gender differences in substance use and prevention. In: Points of
Prevention. SAMHSA/CSAP Monograph Series #4, Washington, DC:
U.S. Government Printing Office, 2002.