Seeking the Magic Metric: Using Evidence to Identify and Track School System Progress
The Wing Institute's Sixth Annual Summit on Evidence-based Education, April 21, 2011
Mary Beth Celio, Northwest Decision Resources
Seattle, Washington
Based on work funded by the Wallace Foundation and completed in cooperation with the Center on Reinventing Public Education, Daniel J. Evans School of Public Affairs, University of Washington. Expanded through research done in urban school districts for the design of early warning systems.
What is a metric?
In the business world, a metric is any type of measurement used to gauge some quantifiable component of a company's performance, such as return on investment, employee and customer churn rates, revenues, and so on; in software development, a metric is the measurement of a particular characteristic of a program's performance or efficiency.
There is a natural desire to reduce complexity, to summarize vast quantities of information into a single number that characterizes everything from a person to a nation. Examples: a student's GPA or SAT score; a borrower's credit score; a country's GNP.
The general public is familiar with metrics in common use: poverty level, unemployment rate, cost of living index, even AYP. All are summaries of complex data.
And why would we want to seek one that is “magic?”
An effective metric captures the key elements of an institution in a compact and compelling way, pointing toward a (perhaps unarticulated) goal.
A "magic metric" would be a metric that is:
- compact, understandable, and measurable;
- able to be used both with units in a system of similar units (e.g., retail stores in a chain, high schools in a district, hospitals in a network) and with the system as a whole; and
- universally accepted.
The uses of a metric or indicator system
Beyond its power to summarize complex data, a metric or indicator system should give those with responsibility for a system the ability to make decisions: to decide whether and where action is needed.
Medical researchers and biostatisticians have often been ahead of the game in searching for improved triage tools; education research is following their lead.
The current rush to develop metrics, dashboards, report cards, indicator systems, and indices (like California's API) responds to the need for accountability, and offers a substitute for the much-maligned AYP.
Basic principles of metric development:
"Top-down" vs. grassroots? Neither. There is a long history of conflict between top-down and grassroots approaches; being evidence-based matters more than either.
Less may be more, but one is not enough. Schools and school systems are awash in data, and the human mind has a limited capacity to absorb it. A single metric, though attractive, is difficult to unpack or to translate into action.
Parsimony and power must be respected. "Thin slicing" is key (cf. Malcolm Gladwell).
Basic principles of metric development, cont.:
Current-status data are necessary but not sufficient. Data out of context are ungrounded; year-to-year fluctuations confuse.
Proxies for key elements (e.g., adequacy of funding, teacher effectiveness) are inevitable. Areas for which there are no universally accepted indicators cannot be excused from assessment and reporting for that reason.
Presentation cannot be an afterthought. "Getting information from a table is like extracting sunbeams from a cucumber" (Farquhar and Farquhar, via Wainer).
Selected indicators should include:
1. A measure of the status of a school (or school system) relative to a specific, if implied, goal (e.g., goal is high school completion; status indicator is percent of Class of 2005 who graduated on time)
2. A description of the trend in this measure over five years relative to the first year (e.g., change in completion rates from 2001 through 2005)
3. A way to diagnose underlying problems and/or predict the future performance on this indicator (e.g., percent of 9th graders failing 2+ courses and earning < 1/4th required credits predicts percent non-completions—in progress based on Chicago Consortium research)
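The three indicator types above can be sketched in a few lines. This is a minimal illustration, not district code: the completion rates are taken from the Truman High example in this deck, while the 9th-grade records and all variable names are hypothetical.

```python
# Sketch of the three indicator types for a high-school-completion goal.
# Completion rates come from the Truman High example in this deck; the
# 9th-grade records and all field names are hypothetical.

completion_rate = {  # percent of each class graduating on time
    2001: 71.2, 2002: 72.6, 2003: 66.7, 2004: 73.9, 2005: 78.5,
}

# 1. Status: the most recent on-time completion rate.
years = sorted(completion_rate)
status = completion_rate[years[-1]]

# 2. Trend: change over the five years relative to the first year.
trend = completion_rate[years[-1]] - completion_rate[years[0]]

# 3. Leading indicator (after the Chicago Consortium research): percent of
#    9th graders failing 2+ courses, which predicts later non-completion.
ninth_graders = [{"fails": 3}, {"fails": 0}, {"fails": 2}, {"fails": 1}]
off_track = 100 * sum(s["fails"] >= 2 for s in ninth_graders) / len(ninth_graders)

print(f"status={status}%, trend={trend:+.1f} pts, off_track={off_track}%")
```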
[Chart: Harry S. Truman High School, 2004-05 school enrollment by grade (9th through 12th), broken out by race/ethnicity (White, African American, Hispanic); vertical axis runs from 0 to 200 students.]
[Chart: Harry S Truman High School, 2001-2005 cohort completion. Each line tracks a class's enrollment from 9th grade through graduation; vertical axis runs from 550 to 900 students. On-time completion rates by class: 2001, 71.2%; 2002, 72.6%; 2003, 66.7%; 2004, 73.9%; 2005, 78.5%.]
The birth of the particular metric/indicator system presented today
Wallace-supported research clarified the current challenges faced by school leaders at all levels. These findings were bolstered by an extensive review of the “indicator” literature and further research in urban school districts. Key conclusions:
There is extreme pressure on school leaders to know and show where they are and where they’re going.
All levels of school personnel are drowning in data and need help figuring out what the data say.
The public (everyone from the Federal government to the local parent) needs to be able to understand what the data are saying.
There is frenzied activity among consultants/researchers to provide districts and schools with tools to gather, analyze and present data—some of which confuse the matter further.
Steps in a process
1. Select goals to be measured: use the Deming (TQC) approach; survey existing goal/vision statements; consult superintendents.
2. Include different types of indicators: output (achievement, achievement gap, student completion/retention); input (student attraction, student engagement, teacher attraction/retention, funding equity); lagging (status and trend for each); leading (projections of future performance).
3. Select quantitative measures linked by research to outcomes: they must be readily available in districts/schools and intuitively appropriate.
4. Specify comparison group(s): other schools in the district/state (all, or those with a similar profile).
5. Decide on the number of indicators: 7 +/- 2.
6. Select a display mechanism: it should permit status and change in the same format, be designed to encourage rapid understanding (a familiar format, if possible), and be "do-able" by school districts.
Seven indicators suggested:
1. Student achievement: scores on standards-based math and reading tests.
2. Elimination of the achievement gap: status of, and change in, reading and math achievement for subgroups of students by race, economic status, English-language facility, etc. (where there are adequate numbers within a subgroup for comparison).
3. Student attraction: the ability of the school/district to attract students where parents/students have opportunities for choice.
4. Student engagement with school: proxy measures of school engagement, including attendance, tardiness, and involvement in school.
Seven indicators suggested (cont.):
5. Student retention/completion: retention of students during the school year(s) and completion of the requirements appropriate at each level (elementary, middle, high).
6. Teacher attraction and retention: proxy measures of teacher attraction using applications per opening and non-retirement turnover.
7. Funding equity/efficiency: a proxy measure comparing the amount of funding per student expected by policy with the amount actually received; return on investment using calculated per-student funding.
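The funding-equity proxy in indicator 7 reduces to a simple comparison. The sketch below uses hypothetical dollar figures; the variable names and the choice to report the gap as a percentage shortfall are illustrative, not part of the Wallace design.

```python
# Funding-equity proxy (indicator 7): compare the per-student funding a
# school is expected to receive under policy with what it actually receives.
# All dollar figures below are hypothetical.

expected_per_student = 9_500.0   # policy-implied allocation per student
actual_per_student = 8_740.0     # actual budget divided by enrollment

equity_ratio = actual_per_student / expected_per_student
shortfall_pct = 100 * (1 - equity_ratio)   # share of expected funding not received

print(f"equity ratio: {equity_ratio:.2f} (shortfall {shortfall_pct:.1f}%)")
```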
A Graphical Overview of the 2012 Republican Field, by Nate Silver, February 4, 2011
The siren call of the single magic metric remains. (Entrancing, but listen at your own risk.)
Return on Educational Investment: A District-by-District Evaluation of U.S. Educational Productivity, by Ulrich Boser, Center for American Progress, January 19, 2011
A final, critical step: selection of a display mechanism for a system of indicators
How a critical research finding is conveyed makes all the difference: Wainer's account of the Challenger disaster; Tufte's critique of PowerPoint ("PowerPoint makes you stupid.").
An important insight, inadequately portrayed, won't spark understanding or action. AYP. Enough said.
State and school district websites are replete with examples of inadequate, boring, or gimmicky displays that fail to make an impact, spark understanding and/or action, or extract sunshine.
Some important do’s and don’ts in indicator display
Basic principle: Overabundance of data + overabundance of display mechanisms = confusion.
If both status and trends are important for decision makers and stakeholders, then both should be displayed on the same grid.
Where an entire system is being considered, all individual elements (schools) must be able to be viewed on a single scale.
Washington State Accountability Index, 2010

Each indicator (reading, writing, math, science, and extended graduation rate) is rated on a 1-7 scale. Achievement is rated separately for non-low-income and low-income students (percent meeting standard), and achievement vs. peers uses the Learning Index.

Rating scales:
% met standard: 90-100% = 7; 80-89.9% = 6; 70-79.9% = 5; 60-69.9% = 4; 50-59.9% = 3; 40-49.9% = 2; <40% = 1
Extended graduation rate: >95% = 7; 90-95% = 6; 85-89.9% = 5; 80-84.9% = 4; 75-79.9% = 3; 70-75% = 2; <70% = 1
Difference in Learning Index: >.20 = 7; .151 to .20 = 6; .051 to .15 = 5; -.05 to .05 = 4; -.051 to -.15 = 3; -.151 to -.20 = 2; <-.20 = 1
Difference in rate: >12 = 7; 6.1 to 12 = 6; 3.1 to 6 = 5; -3 to 3 = 4; -3.1 to -6 = 3; -6.1 to -12 = 2; <-12 = 1

Example ratings:
Indicator           Reading  Writing  Math  Science  Average
Non-low inc. ach.      6        6       6      4      5.50
Low-inc. ach.          5        4       3      1      3.25
Ach. vs. peers         7        7       7      7      7.00
Improvement            7        7       6      4      6.00
Average              6.25     6.00    5.50   4.00     5.44
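The percent-met-standard scale above is just a binning function. A minimal sketch (the function name and example scores are illustrative, not an official API):

```python
# Map percent-met-standard to the 1-7 Washington-style rating using the
# bins shown on this slide. Function name is illustrative.

def met_standard_rating(pct: float) -> int:
    """Return a 1-7 rating for the percent of students meeting standard."""
    bins = [(90, 7), (80, 6), (70, 5), (60, 4), (50, 3), (40, 2)]
    for lower, rating in bins:
        if pct >= lower:
            return rating
    return 1  # below 40% met standard

# Hypothetical row of subject scores -> ratings, then the row average.
ratings = [met_standard_rating(p) for p in (92.0, 81.5, 68.0, 39.9)]
row_average = sum(ratings) / len(ratings)
print(ratings, row_average)
```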
The final display: not a magic metric, but an indicator system with two display mechanisms
Seven indicators are suggested here, each with a status and a growth measurement [Other goals/benchmarks could be substituted where necessary, but the total number shouldn’t increase without good reason.]
A summary metric can be computed for each unit (school) in the system and displayed on the S/C (status/change) Grid.
Complete indicator data for each school in the district are displayed on the Wallace Indicator Grid, with ratings for each school on each indicator relative to the selected standard.
A leading indicator ("Early Warning System") is also suggested and possible, but not shown here.
A possible summary display: S/C (status/change) Grid
Based on recent (January 2011) research and an interactive report by the Center for American Progress: Return on Educational Investment.
Combines all indicators into two metrics per school: one for status, one for change
Like the Wallace Indicators, can display whole system (or parts thereof) and individual schools on the same grid
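Collapsing the indicators into the two S/C coordinates can be sketched as simple averaging. The school names echo those in the Wallace grid example, but the ratings are hypothetical (on the 1-7 scale used elsewhere in the deck), and the quadrant cutoff of 4 is an assumption, not part of the published design.

```python
# Sketch of the S/C (status/change) summary: collapse each school's
# per-indicator ratings into one status average and one change average,
# then place the school in a grid quadrant. Ratings are hypothetical.

schools = {
    "Monmouth": {"status": [6, 5, 4, 7, 5, 6, 4], "change": [5, 6, 6, 4, 5, 5, 6]},
    "Troy":     {"status": [3, 2, 4, 3, 2, 3, 4], "change": [6, 5, 6, 5, 6, 6, 5]},
}

def sc_point(ratings):
    """Average the seven status ratings and the seven change ratings."""
    status = sum(ratings["status"]) / len(ratings["status"])
    change = sum(ratings["change"]) / len(ratings["change"])
    return status, change

for name, ratings in schools.items():
    s, c = sc_point(ratings)
    quadrant = ("high" if s >= 4 else "low") + " status, " + \
               ("improving" if c >= 4 else "declining")
    print(f"{name}: status={s:.2f}, change={c:.2f} ({quadrant})")
```

A school like "Troy" here lands in the low-status/improving quadrant, exactly the kind of unit the grid is meant to surface for attention.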
The Wallace Indicators: full picture
A familiar and compact display that tracks both status and trends for all indicators
Can display data for the entire system and for individual schools in the same grid
Allows comparisons across schools and with other schools in the district/state or with standards
[Figure: Wallace Indicator Grid for middle schools (Guy Fawkes, D.B. Cooper, Monmouth, Troy, Memorial, Edsel United, Crispus Atticus). Rows cover each indicator's status (achievement in math and reading, elimination of the achievement gap, student attraction, engagement with school, student retention/completion, teacher attraction and retention, funding equity) and its change from 2005, with the change axis running from Worse to Better. Each cell carries a symbol rating the school against its comparison group: in top 10%; in top third but below top 10%; within 15% (+/-) of the comparison group; in bottom third but above bottom 10%; in bottom 10%; or not available.]
Some final observations
Collecting and analyzing school-by-school data isn't enough. Reams of reports can paralyze rather than propel.
A compelling, well-displayed, research-based measurement system can provide accountability and motivate change.
Dashboards, report cards or other display mechanisms, whether commercial or home-grown, are inadequate unless grounded in research on goals and appropriate indicators.
Dashboards, etc., like PowerPoint, can make you stupid.