
WHAT IS PREDICTIVE ASSESSMENT?

DISCOVERY EDUCATION ASSESSMENT

New Mexico Benchmark Assessments

Reading and Mathematics, Grades 3-10


EXECUTIVE SUMMARY

1. Are Discovery Education Predictive Assessments reliable?

The median Reading reliability for the 0809 NM benchmark assessments grades 3-8 was .78 with

a median sample size of 574. The median Mathematics reliability for the 0809 NM benchmark assessments was .83 with a median sample size of 574.

2. Do Discovery Education Predictive Assessments have content validity?

Discovery Education Assessments ensure content validity by using each state's curriculum standards for reading and mathematics. The DEA New Mexico reporting categories for Reading and Mathematics, Grades 3-8, are based on the US Reporting Categories. The DEA New Mexico reporting categories for Reading and Mathematics, Grades 9 & 10, are based on the New Mexico Short-Cycle Assessments (SCA) requirements.

3. Do Discovery Education Predictive Assessments match state standardized tests?

4. Can Discovery Education Predictive Assessments predict proficiency levels?

Yes, there is a greater than 90% accuracy rate for predicting combined state proficiency percentages. The median state Proficiency Prediction Score for the Reading tests was 94%, and the median state Proficiency Prediction Score for the Mathematics tests was 94%.

5. Can the use of Discovery Education Predictive Assessments improve student learning?

6. Can Discovery Education Predictive Assessments be used to measure growth over time?

Yes. These benchmark assessments are scored on a vertical scale using state-of-the-art Rasch psychometric modeling. Thus, reliable estimates of student growth can be made over time.

7. Are Discovery Education Predictive Assessments based on scientifically-based research advocated by the U.S. Department of Education?

Two matched control group studies—one in Birmingham, Alabama, and the other in Nashville, Tennessee—support the claim that Discovery Education Predictive Assessments help schools demonstrate significant improvement in student proficiency.

What is Predictive Assessment?

NEW MEXICO

An Overview of Standards and Scientifically-Based Evidence Supporting the Discovery Education Assessment Test Series

Since its inception in 2000 at Vanderbilt University, ThinkLink Learning, now a part of Discovery Education, has focused on the use of formative assessments to improve K-12 student learning and performance. Bridging the gap between university research and classroom practice, Discovery Education Assessment offers effective and user-friendly assessment products that provide classroom teachers and students with the feedback needed to strategically adapt their teaching and learning activities throughout the school year.

Discovery Education Assessment has pioneered a unique approach to formative assessment, using a scientifically research-based continuous improvement model that maps diagnostic assessments to each state's high-stakes test. Discovery Education Assessment's Predictive State-Specific Benchmark tests are aligned to the content assessed by each state test, allowing teachers to track student progress toward the standards and objectives used for accountability purposes.

Furthermore, Discovery Education Assessment subscribes to the Standards for Educational and Psychological Testing articulated by the consortium of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education. This document, "What is Predictive Assessment?", outlines how Discovery Education Assessment addresses the following quality testing standards:

1. Are Discovery Education Predictive Assessments reliable?

Test reliability provides evidence that test questions are consistently measuring a given construct, such as mathematics ability or reading comprehension. Furthermore, high test reliability indicates that the measurement error for a test is low.

2. Do Discovery Education Predictive Assessments have content validity?

Content validity evidence shows that test content is appropriate for the particular constructs that are being measured. Content validity is measured by agreement among subject matter experts about test material and alignment to state standards, by highly reliable training procedures for item writers, by thorough reviews of test material for accuracy and lack of bias, and by examination of the depth of knowledge of test questions.

3. Do Discovery Education Predictive Assessments match state standardized tests?

Criterion validity evidence demonstrates that test scores predict scores on an important criterion variable, such as a state's standardized test.

4. Can Discovery Education Predictive Assessments predict proficiency levels?


Proficiency predictive validity evidence supports the claim that a test can predict a state's proficiency levels. High accuracy levels show that a high degree of confidence can be placed in the vendor's prediction of student proficiency.

5. Can the use of Discovery Education Predictive Assessments improve student learning?

Consequential validity outlines how the use of these predictive assessments facilitates important consequences, such as the improvement of student learning and student performance on state standardized tests.

6. Can Discovery Education Predictive Assessments be used to measure growth over time?

Growth models depend on a highly rigorous and valid vertical scale to measure student performance over time. A vendor's vertical scales should be constructed using advanced statistical methodologies such as Rasch measurement models and other state-of-the-art psychometric techniques.

7. Are Discovery Education Predictive Assessments based on scientifically-based research advocated by the U.S. Department of Education?

In the No Child Left Behind Act of 2001, the U.S. Department of Education outlined six major criteria for "scientifically-based research" to be used by consumers of educational measurements and interventions. Accordingly, a vendor's test

(i) employs systematic, empirical methods that draw on observation and experiment;

(ii) involves rigorous data analyses that are adequate to test the stated hypotheses and justify the general conclusions drawn;

(iii) relies on measurements or observational methods that provide reliable and valid data across evaluators and observers, across multiple measurements and observations, and across studies by the same or different investigators;

(iv) is evaluated using experimental or quasi-experimental designs in which individuals, entities, programs or activities are assigned to different conditions and with appropriate controls to evaluate the effects of the condition of interest, with a preference for random-assignment experiments, or other designs to the extent that those designs contain within-condition or across-condition control;

(v) ensures experimental studies are presented in sufficient detail and clarity to allow for replication or, at a minimum, offer the opportunity to build systematically on their findings;

(vi) has been accepted by a peer-reviewed journal or approved by a panel of independent experts through a comparably rigorous, objective and scientific review.


TEST RELIABILITY

1. Are Discovery Education Predictive Assessments reliable?

Test reliability provides evidence that test questions are consistently measuring a given construct, such as mathematics ability or reading comprehension. Furthermore, high test reliability indicates that the measurement error for a test is low. Reliabilities are calculated using Cronbach's alpha.
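For reference, a minimal Python sketch of Cronbach's alpha is shown below; the 0/1 response matrix is invented for illustration and is not DEA data.

    import numpy as np

    def cronbach_alpha(item_scores: np.ndarray) -> float:
        """item_scores: students x items matrix of scored (0/1) responses."""
        k = item_scores.shape[1]                          # number of items
        item_vars = item_scores.var(axis=0, ddof=1)       # per-item variances
        total_var = item_scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Invented 6-student x 4-item response matrix, for illustration only
    responses = np.array([
        [1, 1, 0, 1],
        [1, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 1, 0],
        [0, 0, 1, 0],
    ])
    print(f"alpha = {cronbach_alpha(responses):.2f}")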

Table 1 and Table 2 present test reliabilities and sample sizes for Discovery Education Predictive Assessments in the subject areas of Reading and Mathematics for New Mexico grades 3-8. Reliabilities for grades 3-8 are based on Discovery Education Assessments given in the fall, winter and spring of the 2008-2009 school year.

The median Reading reliability was .78 with a median sample size of 574. The median Mathematics reliability was .83 with a median sample size of 574.

Table 1: Test reliabilities for the 0809 DEA Reading Benchmark assessments, grades 3-8.

Reading Reliabilities
           Fall 2008           Winter 2009         Spring 2009
           N     Reliability   N     Reliability   N     Reliability
Grade 3    459   0.77          563   0.82          546   0.83
Grade 4    535   0.78          596   0.78          601   0.80
Grade 5    589   0.78          643   0.75          645   0.76
Grade 6    575   0.79          627   0.83          621   0.79
Grade 7    223   0.78          228   0.76          227   0.79
Grade 8    250   0.77          254   0.75          243   0.78
Median     497   0.78          580   0.77          574   0.79

Table 2: Test reliabilities for the 0809 DEA Mathematics Benchmark assessments, grades 3-8.

Mathematics Reliabilities
           Fall 2008           Winter 2009         Spring 2009
           N     Reliability   N     Reliability   N     Reliability
Grade 3    460   0.82          564   0.81          549   0.83
Grade 4    532   0.84          597   0.83          598   0.84
Grade 5    585   0.84          648   0.76          646   0.80
Grade 6    572   0.79          615   0.74          634   0.82
Grade 7    215   0.75          237   0.80          227   0.78
Grade 8    253   0.84          245   0.80          237   0.83
Median     496   0.83          581   0.80          574   0.83

Table 3 and Table 4 present test reliabilities and sample sizes for Discovery Education Predictive Assessments in the subject areas of Reading and Mathematics for New Mexico grades 9 and 10. Reliabilities for grades 9 and 10 are based on Discovery Education Assessments given in the fall, winter and spring of the 2009-2010 school year.


Table 3: Test reliabilities for the 0910 DEA Reading Benchmark assessments, grades 9 & 10.

Reading Reliabilities
           Fall 2009           Winter 2010         Spring 2010
           N     Reliability   N     Reliability   N     Reliability
Grade 9    283   0.79          284   0.80          263   0.78
Grade 10   279   0.85          273   0.84          228   0.88

Table 4: Test reliabilities for the 0910 DEA Mathematics Benchmark assessments, grades 9 & 10.

Mathematics Reliabilities
           Fall 2009           Winter 2010         Spring 2010
           N     Reliability   N     Reliability   N     Reliability
Grade 9    290   0.49          285   0.68          252   0.71
Grade 10   300   0.66          284   0.57          231   0.79


CONTENT VALIDITY

2. Do Discovery Education Predictive Assessments have content validity?

Content validity evidence shows that test content is appropriate for the particular constructs that are being measured. Content validity is measured by agreement among subject matter experts about test material and alignment to state standards, by highly reliable training procedures for item writers, by thorough reviews of test material for accuracy and lack of bias, and by examination of the depth of knowledge of test questions.

To ensure content validity of all tests, Discovery Education Assessment carefully aligns the content of its assessments to a given state's content standards and the content sampled by the respective high-stakes test. Discovery Education Assessment employs one of the leading alignment research methodologies, the Webb Alignment Tool (WAT), which has continually supported the alignment of our tests to state-specific content standards both in breadth (i.e., amount of standards and objectives sampled) and depth (i.e., cognitive complexity of standards and objectives). All Discovery Education Assessment tests are thus state specific and feature matching reporting categories of a given state's large-scale assessment used for accountability purposes.

The following reporting categories are used on Discovery Education Predictive Assessments. The DEA New Mexico reporting categories for Reading and Mathematics are based on the United States Reporting Categories for grades 3-8. The DEA New Mexico 9th and 10th grade reporting categories for Reading and Mathematics are based on the New Mexico Short-Cycle Assessments. We continually update our assessments to reflect the most current version of a state's standards.

Reading US Reporting Categories: Grades 3-8

Text & Vocabulary Understanding
Interpretation
Extend Understanding
Comprehension Strategies
Sentence Construction
Composition
Proofreading/Editing

Reading New Mexico Reporting Categories: Grades 9 & 10

Reading and Listening for Comprehension
Writing and Speaking for Expression
Literature and Media

Math US Reporting Categories: Grades 3-8

Number and Number Relations
Computation and Numerical Estimation
Operation Concepts
Measurement
Geometry and Spatial Sense
Data Analysis, Statistics and Probability
Patterns, Functions, Algebra
Problem Solving and Reasoning


Math New Mexico Reporting Categories: Grades 9 & 10

Data Analysis and Probability
Algebra
Geometry


CRITERION VALIDITY

3. Do Discovery Education Predictive Assessments match state standardized tests?


PROFICIENCY PREDICTIVE VALIDITY

4. Can Discovery Education Predictive Assessments predict state proficiency levels?

Proficiency predictive validity supports the claim that a test can predict a state's proficiency levels. High accuracy levels show that a high degree of confidence can be placed in our test predictions of student proficiency. Two measures of predictive validity are calculated. If only summary data for a school or district are available, the Proficiency Prediction Score is tabulated. When individual student-level data are available, an additional index, the Proficiency Success Rate, is also calculated. Both measures are explained in the following sections with examples drawn from actual data from the states.

Proficiency Prediction Score

The Proficiency Prediction Score is used to determine the accuracy of predicted proficiency status. Under the NCLB legislation, it is important that states and school districts help students progress from a "Not Proficient" status to one of "Proficient". The Proficiency Prediction Score is based on the percentage of correct proficiency classifications (Not Proficient/Proficient). If a state uses two or more classifications for "Proficient" (such as "Proficient" and "Advanced"), the percentages of students in these categories are added together. Likewise, if a state uses two or more categories for "Not Proficient" (such as "Below Basic" and "Basic"), the percentages of students in these categories are added together. To see how this score is used, assume a school district had the following data based on its annual state test and a Discovery Education Assessment Spring benchmark assessment. Let's use data from a Grade 4 Mathematics test as an example:

Predicted Percent Proficient or higher = 70%
Actual Percent Proficient or higher on the State Test = 80%

The error rate for these predictions is as follows:

Error Rate = |Actual Percent Proficient - Predicted Percent Proficient|
Error Rate = |80% - 70%| = 10%

In this example, Discovery Education Assessment underpredicted the percent of students proficient by 10%. The absolute value (the symbols | |) of the error rate is used to account for cases where Discovery Education Assessment overpredicts the percent of students proficient and the calculation is negative (e.g., Actual - Predicted = 70% - 80% = -10%; absolute value is 10%).

The Proficiency Prediction Score is calculated as follows:

Proficiency Prediction Score = 100% - Error Rate

In this example, the score is as follows:

Proficiency Prediction Score = 100% - 10% = 90%.

A higher Proficiency Prediction Score indicates a larger percentage of correct proficiency predictions. In this example, Discovery Education Assessment had a score of 90%, which indicates 9 correct classifications for every 1 misclassification. Discovery Education Assessment uses information from these scores to improve its benchmark assessments every year.
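The calculation is simple enough to express directly in code. A minimal Python sketch (illustrative only, not DEA's implementation) reproduces the worked Grade 4 Mathematics example:

    def proficiency_prediction_score(predicted_pct: float, actual_pct: float) -> float:
        """Both arguments are percent Proficient-or-above, 0-100."""
        error_rate = abs(actual_pct - predicted_pct)  # absolute value covers over- and under-prediction
        return 100.0 - error_rate

    # Worked example from the text: predicted 70%, actual 80%
    print(proficiency_prediction_score(70, 80))  # 90.0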

The state of New Mexico uses a four-level proficiency scale: Beginning, Near Proficient, Proficient and Advanced. The following table shows the proficiency prediction scores for the Reading and Mathematics assessments. On average, Discovery Education Assessments had a 94% proficiency prediction score on the reading tests and a 94% proficiency prediction score on the math tests. The figures below compare the percentages of student proficiency levels predicted by the Discovery Education Assessment 0809B test with those on the 2009 NM State assessments.

Proficiency Prediction Score for New Mexico

            Reading   Math
Grade 3     94.10     94.30
Grade 4     98.00     98.60
Grade 5     98.50     97.00
Grade 6     94.70     91.30
Grade 7     93.00     95.20
Grade 8     84.40     81.80
Grade 11    94.90     97.20

[Figures: DEA 0809B vs. 2009 NM State percentages of students at each proficiency level (Beginning, Near Proficient, Proficient, Advanced), one chart per grade and subject: Grade 3 Reading; Grade 3 Math; Grade 4 Reading; Grade 4 Math; Grade 5 Reading; Grade 5 Math; Grade 6 Reading; Grade 6 Math; Grade 7 Reading; Grade 7 (second chart, subject not labeled in source); Grade 8 (subject not labeled in source); Grade 8 Math; Grade 10/11 Reading (DEA Grade 10 vs. NM State Grade 11); Grade 10/11 Math (DEA Grade 10 vs. NM State Grade 11)]


CONSEQUENTIAL VALIDITY

5. Can the use of Discovery Education Predictive Assessments improve student learning?

Consequential validity outlines how the use of benchmark assessments facilitates important consequences, such as the improvement of student learning and student performance on state standardized tests.


GROWTH MODELS

6. Can Discovery Education Predictive Assessments be used to measure growth over time?

Growth models depend on a highly rigorous and valid vertical scale to measure student performance over time. Discovery Education Assessment vertical scales are constructed using Rasch measurement models with state-of-the-art psychometric techniques.

The accurate measurement of student achievement over time is becoming increasingly important to parents, teachers, and school administrators. Student "growth" within a grade and across grades has also been sanctioned by the U.S. Department of Education as a reliable way to measure student proficiency in Reading and Mathematics and to satisfy the requirements of Adequate Yearly Progress (AYP) under the No Child Left Behind Act. Accurate measurement and recording of individual student achievement can also help with issues of student mobility: as students move within a district or state, records of individual student achievement can help new schools attend to the needs of this mobile population.

The assessment of student achievement over time is even more important with the use of benchmark tests. Discovery Education Assessment Benchmark tests provide a snapshot of student progress toward state standards at up to four points during the school year. These benchmark tests are scientifically linked, so that the reporting of student proficiency levels is both reliable and valid.

How is the growth score created?

Discovery Education Assessment added a scientifically based, vertically scaled growth score to its family of benchmark tests in 2007-08. These growth scores are based on the Rasch measurement model, a state-of-the-art psychometric technique for scaling ability (e.g., Wright & Stone, 1979; Wright & Masters, 1982; Linacre, 1999; Smith & Smith, 2004; Wilson, 2005). To accomplish vertical scaling, common items are embedded across assessments to enable the psychometric linking of tests at different points in time. For example, a Grade 3 mathematics benchmark test administered mid-year might contain below-grade-level and above-grade-level items. Performance on these off-grade-level items provides an accurate measurement of how much growth occurs across grades. Furthermore, benchmark tests within a grade are also linked with common items, once again to assess change at different points in time within a grade. Discovery Education Assessment is using established psychometric procedures to build calibrated item banks and linked tests (i.e., Ingebo, 1997; Kolen & Brennan, 2004).
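To make the common-item linking idea concrete, here is a minimal sketch using hypothetical anchor-item difficulties and simple mean/mean linking. The values and the method shown are illustrative assumptions, not DEA's operational calibration, which relies on established software and procedures:

    import numpy as np

    # Hypothetical Rasch difficulties (logits) for the same four anchor items,
    # calibrated separately on a Grade 3 form and a Grade 4 form.
    anchors_grade3 = np.array([-0.40, 0.10, 0.55, 1.20])
    anchors_grade4 = np.array([-1.10, -0.60, -0.15, 0.50])

    # Mean/mean linking: the average difficulty shift of the common items
    # estimates the constant that places Grade 4 results on the Grade 3 scale.
    link_constant = (anchors_grade3 - anchors_grade4).mean()

    def to_grade3_scale(grade4_logits):
        """Re-express Grade 4 logit estimates on the Grade 3 scale."""
        return np.asarray(grade4_logits) + link_constant

    print(f"linking constant = {link_constant:.2f} logits")  # 0.70
    print(to_grade3_scale([-0.30, 0.80]))                    # [0.4 1.5]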

Why use such a rigorous vertical scale? Isn't student growth similar across grades? Don't students change as much from Grade 3 to Grade 4 as they do from Grade 7 to Grade 8? Previous research on the use of vertical scales has demonstrated that student growth is not linear; that is, growth in student achievement differs from grade to grade (see Young, 2006). For instance, Figure 11 below shows preliminary Discovery Education Assessment vertically scaled growth results. This graph shows growth from Grades 3 to 10 in Mathematics as measured by Discovery Education Assessment's Spring benchmark tests. Typically, students have larger gains in mathematics achievement in the elementary grades, with growth somewhat slowing in middle and high school, as published by other major testing companies.

[Figure 11: Vertically Scaled Growth Results for Discovery Education Assessment Mathematics Tests. "Discovery/ThinkLink Growth Score: Mathematics"; median scale score (1300-1700) by grade, Grade 3 through Grade 10]

What is unique about the Discovery Education Assessment vertical growth scores?

Student growth can now be accurately measured at four points in time in each grade level. Discovery Education Assessment benchmark tests are administered up to four times yearly: Early Fall, Late Fall, Winter, and Spring. For each time period, we report scale scores and accompanying statistics. Most testing companies only allow the measurement of student growth at two points in time: Fall and Spring. Discovery Education Assessment benchmark tests provide normative information to assess student growth multiple times each year. Figure 12 illustrates this growth for Grade 4 Mathematics using our benchmark assessments.

[Figure 12: Within-Year Growth Results for Discovery Education Assessment Mathematics Tests. "Discovery/ThinkLink Growth in Grade 4 Mathematics"; median scale score (1380-1520) across the Early Fall, Late Fall, Winter, and Spring ThinkLink benchmark tests]

New Mexico Growth Scale

The following tables and figures show average student scale scores on the Discovery Education Assessment vertical growth scale for the 0809 Reading and Mathematics tests across three periods: Fall, Winter and Spring.

Reading Average Scale Scores

            Fall 2008   Winter 2009   Spring 2009
Grade 3     1395        1397          1430
Grade 4     1412        1429          1472
Grade 5     1485        1504          1502
Grade 6     1532        1529          1578
Grade 7     1557        1562          1601
Grade 8     1572        1585          1605
Grade 9     1604        1636          1626
Grade 10    1672        1657          1581

Math Average Scale Scores

            Fall 2008   Winter 2009   Spring 2009
Grade 3     1398        1352          1404
Grade 4     1440        1449          1456
Grade 5     1478        1487          1516
Grade 6     1516        1545          1584
Grade 7     1557        1560          1597
Grade 8     1595        1616          1630
Grade 9     1617        1611          1682
Grade 10    1638        1621          1638

[Figure: 0809 NM Reading Average Student Scale Scores. Average student scale score (1300-1700) for the Fall, Winter, and Spring administrations, Grades 3-10]

[Figure: 0809 NM Math Average Student Scale Scores. Average student scale score (1300-1700) for the Fall, Winter, and Spring administrations, Grades 3-10]


NCLB SCIENTIFICALLY-BASED RESEARCH

7. Are Discovery Education Predictive Assessments based on scientifically-based research advocated by the U.S. Department of Education?

Discovery Education Assessment has also adhered to the criteria for "scientifically-based research" put forth in the No Child Left Behind Act of 2001. "What is Predictive Assessment?" has outlined how Discovery Education Predictive Assessments' test reliability and validity meet the following criteria for scientifically-based research set forth by NCLB:

(i) employs systematic, empirical methods that draw on observation and experiment;

(ii) involves rigorous data analyses that are adequate to test the stated hypotheses and justify the general conclusions drawn;

(iii) relies on measurements or observational methods that provide reliable and valid data across evaluators and observers, across multiple measurements and observations, and across studies by the same or different investigators;

Discovery Education Assessment also provides evidence of meeting the following scientifically-based research criterion:

(iv) is evaluated using experimental or quasi-experimental designs in which individuals, entities, programs or activities are assigned to different conditions and with appropriate controls to evaluate the effects of the condition of interest, with a preference for random-assignment experiments, or other designs to the extent that those designs contain within-condition or across-condition control.

Case Study One: Birmingham, Alabama City Schools

Larger schools and school districts typically do not participate in experimental or quasi-experimental studies due to logistical and ethical concerns. However, a unique situation in Birmingham, Alabama afforded Discovery Education Assessment the opportunity to investigate the efficacy of its benchmark assessments with respect to a quasi-control group. In 2003-2004, approximately one-half of the schools in Birmingham City used Discovery Education Predictive Assessments, whereas the other half did not. At the end of the school year, achievement results for both groups were compared, revealing a significant improvement on the SAT10 for those schools that used the Discovery Education Predictive Assessments as opposed to those that did not. Discovery Education Assessment subsequently compiled a brief report titled the "Birmingham Case Study". Excerpts from the case study are included below:

This study is based on data from elementary and middle schools in the City of Birmingham, Alabama. In 2002-03, no Birmingham Schools used Discovery Education's Predictive Assessment Series. Starting in 2003-04, 20 elementary and 9 middle schools used the Discovery Education Assessment program. All Birmingham schools took the Stanford Achievement Test, Tenth Edition (SAT10) at the end of both school years. The SAT10 is administered yearly as part of the State of Alabama's School Accountability Program. The State of Alabama uses improvement in SAT10 percentiles to gauge school progress and as part of its NCLB reporting.

National percentiles on the SAT10 are reported by subject and grade level. A single national percentile is reported for all students within a subject and grade level (this analysis is subsequently referred to as ALL STUDENTS). Furthermore, national percentiles are disaggregated by various subgroups within a school. For the comparisons that follow, the national percentiles for students classified as receiving free and reduced lunch (referred to below as POVERTY) were used. All percentiles have been converted to Normal Curve Equivalents (NCE) to allow for averaging of results.
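The percentile-to-NCE conversion is a standard one: NCEs are normalized scores with a mean of 50 and a standard deviation of 21.06, constructed so that NCEs and percentiles coincide at 1, 50, and 99. A minimal Python sketch:

    from statistics import NormalDist

    def percentile_to_nce(percentile: float) -> float:
        """NCE = 50 + 21.06 * z, where z is the normal deviate of the percentile."""
        z = NormalDist().inv_cdf(percentile / 100.0)
        return 50.0 + 21.06 * z

    # NCEs match percentiles at 1, 50, and 99 by construction
    for p in (1, 25, 50, 75, 99):
        print(p, round(percentile_to_nce(p), 1))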

The Discovery Education Assessment schools comprise the experimental group in this study. The Birmingham schools that did not use Discovery Education Assessment comprise the matched comparison group. The following charts show SAT10 National Percentile changes for DEA Schools vs. Non-DEA Schools in two grade levels (Grades 5 and 6), for three subjects (Language, Mathematics, and Reading), and for two groups of students (ALL STUDENTS and POVERTY students). In general, there was a significant decline or no improvement in SAT10 scores from 2002-03 to 2003-04 for most non-DEA schools. This trend, however, did not occur in the schools using Discovery Education Assessment: instead, there was a marked improvement, with most grades scoring increases in language, math and reading. In grade levels where there was a decline in Discovery Education Assessment schools, it was a much smaller decline in scores when compared to those schools that did not use Discovery Education Assessment.

As a result of the improvement that many of these schools made in school year 2003-04, the Birmingham City Schools selected Discovery Education Assessment to be used with all of the schools in school year 2004-05. The Birmingham City Schools also chose to provide professional development in each school to help all teachers become more familiar with the concepts of standardized assessment and better utilize data to focus instruction.

[Figure: "SAT 10 Math Comparisons: DEA vs. Non DEA Schools in Birmingham, AL". NCE (approx. 38-45) on the SAT 10 in 02-03 vs. 03-04 for DEA Schools and Non-DEA Schools, All Schools and Poverty Schools]

[Figure: "SAT 10 Grade 5 Reading Comparison: DEA vs. Non DEA Schools in Birmingham, AL". NCE (approx. 40-46) on the SAT 10 in 02-03 vs. 03-04 for DEA Schools and Non-DEA Schools, All Schools and Poverty Schools]

[Figure: "SAT 10 Grade 6 Language Comparison: DEA vs. Non DEA Schools". NCE (approx. 36-48) on the SAT 10 in 02-03 vs. 03-04 for DEA Schools and Non-DEA Schools, All Schools and Poverty Schools]

The following pie graph shows the lunch status percentages provided by the Birmingham, AL school system for grades 5 and 6.

[Pie chart: Grades 5 & 6 Free Lunch Status Percentages. Free: 83%, Reduced: 6%, Not Free/Reduced: 11%]

Case Study Two: Metro Nashville, Tennessee City Schools

Metro Nashville schools that used Discovery Education Assessment made greater improvements in AYP than Metro Nashville schools that did not. During the 2004-2005 school year, sixty-five elementary and middle schools in Metro Nashville, representing over 20,000 students, used Discovery Education assessments. Fifty-two elementary and middle schools, representing over 10,000 students, did not use Discovery Education assessments. The improvement in the percent of students at the Proficient/Advanced level from 2004 to 2005 is presented in the graph below. The results compare DEA schools versus non-DEA schools in Metro Nashville. Discovery Education Assessment schools showed more improvement in AYP status from 2004 to 2005, both when all schools are combined and when they are analyzed separately at the elementary and middle school level.

[Figure: "Improvement in Reading TCAP Scores 2005 over 2004". Percent improvement for DEA Schools vs. Non-DEA Schools: Combined (n=20190), Elementary (n=5217), Middle School (n=14948)]

[Figure: "Improvement in Math TCAP Scores 2005 over 2004". Percent improvement for DEA Schools vs. Non-DEA Schools: Combined (n=21549), Elementary (n=5765), Middle School (n=15759)]

The following pie charts display the frequency percentages of the NCLB data provided to DEA by the Metro Nashville Public School System for the elementary schools.

[Pie chart: Gender Frequency Percentages, Elementary. Male: 49%, Female: 46%, No Gender Provided: 5%]

[Pie chart: Ethnicity Frequency Percentages, Elementary. Black: 43.45%, White: 31.54%, Hispanic: 15.99%, Asian: 3.46%, American Indian: 0.13%, Other: 0.08%, No Ethnicity Provided: 5.34%]

[Pie chart: Free Lunch Status Frequency Percentages, Elementary. Yes: Free/Reduced: 67%, Not Free/Reduced: 27%, No Status Provided: 6%]

[Pie chart: English Language Learners Frequency Percentages, Elementary. Non-ELL Students: 81.43%, ELL Students: 12.96%, No Status Provided: 5.61%]

[Pie chart: Special Education Frequency Percentages, Elementary. Not in Special Ed.: 83%, Yes: Special Education: 11%, No Status Provided: 6%]

The following pie charts display the frequency percentages of the NCLB data provided to DEA by the Metro Nashville Public School System for the middle schools.

[Pie chart: Gender Frequency Percentages, Middle School. Male: 48%, Female: 47%, No Gender Provided: 5%]

[Pie chart: Ethnicity Frequency Percentages, Middle School. Black: 46.43%, White: 30.71%, Hispanic: 14.40%, Asian: 3.33%, American Indian: 0.18%, Other: 0.05%, No Ethnicity Provided: 4.91%]

[Pie chart: Free Lunch Status Frequency Percentages, Middle School. Yes: Free/Reduced: 66%, Not Free/Reduced: 29%, No Status Provided: 5%]

[Pie chart: English Language Learners Frequency Percentages, Middle School. Non-ELL Students: 86.39%, ELL Students: 8.52%, No Status Provided: 5.09%]

[Pie chart: Special Education Frequency Percentages, Middle School. Not in Special Ed.: 85%, Yes: Special Education: 10%, No Status Provided: 5%]

(v) ensures experimental studies are presented in sufficient detail and clarity to allow for replication or, at a minimum, offer the opportunity to build systematically on their findings;

Consumers are encouraged to request additional data or further details for the examples listed in this overview. Discovery Education Assessment also compiles Technical Manuals specific to each school district and/or state. Accumulated data are of sufficient detail to permit adequate psychometric analyses, and their results have been consistently replicated across school districts and states. Past documents of interest include, among others: "A Multi-State Comparison of Proficiency Predictions for Fall 2006" and "A Multi-State Look at 'What is Predictive Assessment?'." Furthermore, the "What is Predictive Assessment?" series of documents is available for multiple states. Please check the Discovery Education website www.discoveryeducation.com for document updates.

(vi) has been accepted by a peer-reviewed journal or approved by a panel of independent experts through a comparably rigorous, objective and scientific review;

Discovery Education Assessment tests and results have been incorporated and analyzed in the following peer-reviewed manuscripts and publications:

Jacqueline Shrago and Dr. Michael Smith of ThinkLink Learning contributed a chapter on formative assessment to "Online Assessment and Measurement: Case Studies from Higher Education, K-12, and Corporate" by Scott Howell and Mary Hricko in 2006.

Dr. Elizabeth Vaughn-Neely, Associate Professor, Dept. of Leadership & Counselor Education, University of Mississippi (Ole Miss) ([email protected]) and Dr. Marjorie Reed, Associate Professor, Dept. of Psychology, Oregon State University presented their peer-reviewed findings, based on their joint research and work with schools using Discovery Education Assessment, at:

Society for Research in Child Development, Atlanta, GA, April 2005
Kappa Delta Pi Conference, Orlando, FL, November 2005
Society for the Scientific Study of Reading, July 2006

Two Ed.S. dissertations have also been published:

Dr. Juanita Johnson, Union University ([email protected])
Dr. Monica Eversole, Richmond, KY ([email protected])

Please contact us for other specific information requests. We welcome your interest in the evidence supporting the efficacy of our Discovery Education Assessment tests.


Procedures for Item and Test Review

Discovery Education Assessment has established policies to review items in each benchmark assessment for appropriate item statistics and for evidence of bias. Furthermore, the collection of items that form a specific test is reviewed for alignment to state or national standards and for appropriate test difficulty. This document outlines in more detail how these processes are implemented.

Item Statistics

P-values and item discrimination indices are calculated for each item based on the number of students that completed a benchmark assessment.

P-values represent the percentage of students tested who answered the item correctly. Based on p-values alone, the following item revision procedures are followed:

Items with p-values of .90 or above are considered too "easy" and are revised or replaced.
Items with p-values of .20 or below are considered too "hard" and are revised or replaced.

Item discrimination indices (biserial correlations) are calculated for each item. Items with low biserial correlations (less than .20) are revised or replaced.
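A minimal Python sketch of these screening rules follows. The response matrix is invented, and the discrimination index here is an item-total (point-biserial) correlation standing in for the biserial correlation reported operationally:

    import numpy as np

    def review_items(item_scores: np.ndarray) -> None:
        """item_scores: students x items matrix of 0/1 scored responses."""
        totals = item_scores.sum(axis=1)
        for i in range(item_scores.shape[1]):
            p = item_scores[:, i].mean()                      # p-value: proportion correct
            r = np.corrcoef(item_scores[:, i], totals)[0, 1]  # item-total correlation
            flags = []
            if p >= 0.90:
                flags.append("too easy")
            if p <= 0.20:
                flags.append("too hard")
            if r < 0.20:
                flags.append("low discrimination")
            verdict = "revise or replace (" + ", ".join(flags) + ")" if flags else "retain"
            print(f"item {i + 1}: p={p:.2f}, r={r:.2f} -> {verdict}")

    # Invented 8-student x 3-item matrix, for illustration only
    review_items(np.array([
        [1, 1, 0],
        [1, 0, 1],
        [1, 1, 1],
        [0, 0, 0],
        [1, 1, 1],
        [1, 0, 0],
        [1, 1, 1],
        [1, 0, 1],
    ]))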

Test Content Validity and Statistics

The blueprints of state tests are monitored for changes in state standards. When state blueprints have changed, Discovery Education Assessment revises the blueprints for benchmark tests used in that state. Items may be revised or replaced where necessary to match any revised blueprint and to maintain appropriate test content validity.

P, A, B, and C tests within the same year are compared to verify that all tests are of comparable difficulty. When necessary, items are revised to maintain the necessary level of difficulty within each test.

The predictability of Discovery Education benchmark assessments is examined closely. When necessary, items are revised or replaced to improve the predictability of benchmark assessments. In order to maintain high levels of predictability, Discovery Education Assessment strives to revise or replace fewer than 20% of the items in any benchmark assessment each year.


Differential Item Functioning

Differential Item Functioning (DIF) analyses are performed on items from tests where the overall sample size is large (n = 1000 or more per test) and the sample size for each subgroup meets minimum standards (usually 300 or more students per subgroup). DIF analyses for Gender (males and females) and Ethnicity (Caucasian vs. African-American, or Caucasian vs. Hispanic) are routinely performed when sample size minimums are met.

DIF analyses are conducted using Rasch modeling through the computer program WINSTEPS. This program calculates item difficulty parameters for each DIF subgroup and compares these logit estimates. DIF Size is calculated using industry-standard criteria (see p. 1070, "An Adjustment for Sample Size in DIF Analysis", Rasch Measurement Transactions, 20:3, Winter 2006).

The following criteria are used to determine DIF Size:

Negligible: 0 logits to .42 logits (absolute value)
Moderate: .43 logits to .63 logits (absolute value)
Large: .64 logits and up (absolute value)

Items with Large DIF Size are marked for the content team to review. Subject matter content experts analyze an item to determine the cause of Gender or Ethnic DIF. If the content experts can determine this cause, the item is revised to remove the gender or ethnicity bias. If the cause cannot be determined, the item is replaced with a different item that measures the same sub-skill.
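The DIF Size classification itself is a simple threshold rule on the absolute difference between the two subgroups' logit estimates. A minimal sketch (the example values are invented):

    def dif_size(logits_group_a: float, logits_group_b: float) -> str:
        """Classify DIF Size from two subgroup Rasch item difficulties (logits)."""
        diff = abs(logits_group_a - logits_group_b)
        if diff <= 0.42:
            return "Negligible"
        if diff <= 0.63:
            return "Moderate"
        return "Large"  # Large items are marked for content-team review

    # Invented example: male vs. female calibrations for one item
    print(dif_size(0.35, 1.05))  # Large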


TEST AND QUESTION STATISTICS, RELIABILITY, AND PERCENTILES

The following section reports test and question statistics, reliability, and percentiles for the New Mexico benchmark tests, grades 3-10, Reading and Mathematics. The grades 3-8 benchmark tests were administered in New Mexico in spring of 2009. The grades 9-10 benchmarks were administered in winter of 2010. These benchmark tests are representative samples of over 1000 benchmark tests developed by Discovery Education Assessment. Benchmark tests are revised each year based on test and question statistics, particularly low item discrimination indices and significant DIF.

The following statistics are reported:

Number of Students: Number of students used for calculation of test statistics.
Number of Items: Number of items in each benchmark test (including common items used for scaling purposes).
Number of Proficiency Items: Number of items used to calculate the proficiency level.
Mean: Test mean in terms of number correct.
Standard Deviation: Test standard deviation.
Reliability: Cronbach's alpha.
SEM: Standard Error of Measurement (SEM) for the test.
Scale Score: Discovery Education Assessment Scale Score for each number correct. (Scale scores are vertically scaled using Rasch measurement; scale scores for grades K-12 range from 1000 to 2000.)
Percentiles: Percentage of students below each number correct score.
Stanines: Standardized scores that range from 1 to 9.
Question P-values: The proportion correct for each item.
Biserial: Item discrimination using the biserial correlation.
Rasch Item Difficulty: Rasch item difficulty parameter calculated using WINSTEPS.
DIF Gender: Rasch item difficulty difference (Male vs. Female).
DIF Ethnicity: Rasch item difficulty difference (White vs. Black).
DIF Size:
Negligible: 0 logits to .42 logits (absolute value).
Moderate: .43 logits to .63 logits (absolute value).
Large: .64 logits and up (absolute value).
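These statistics are internally consistent: under classical test theory, SEM = SD * sqrt(1 - reliability). A minimal sketch, checked against the Grade 3 Reading figures reported below:

    import math

    def standard_error_of_measurement(std_dev: float, reliability: float) -> float:
        """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
        return std_dev * math.sqrt(1.0 - reliability)

    # Grade 3 Reading, 0809 Spring New Mexico: SD = 5.58, reliability = 0.83
    print(round(standard_error_of_measurement(5.58, 0.83), 2))  # 2.3, matching the reported SEM of 2.30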


Reading Grade 3

0809 Spring New Mexico

Test Statistics

Number of Students 546

Number of Items 28

Mean 17.40

Std. Deviation 5.58

Reliability 0.83

Std. Error of Measurement 2.30

Question Statistics                                      Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty      No. Correct  Scale Score   No. Correct  Scale Score

1 0.70 0.45 -0.38 0 1010 15 1389

2 0.88 0.37 -1.77 1 1104 16 1402

3 0.58 0.29 0.23 2 1162 17 1415

4 0.79 0.41 -0.99 3 1197 18 1428

5 0.72 0.47 -0.50 4 1224 19 1442

6 0.55 0.46 0.43 5 1246 20 1456

7 0.30 0.31 1.71 6 1265 21 1472

8 0.48 0.34 0.74 7 1282 22 1489

9 0.74 0.50 -0.66 8 1298 23 1508

10 0.61 0.38 0.11 9 1312 24 1530

11 0.73 0.54 -0.55 10 1326 25 1557

12 0.81 0.36 -1.13 11 1339 26 1593

13 0.75 0.51 -0.73 12 1352 27 1650

14 0.61 0.48 0.12 13 1364 28 1744

15 0.61 0.45 0.10 14 1377

16 0.55 0.38 0.42

17 0.58 0.44 0.23

18 0.64 0.46 -0.07

19 0.61 0.43 0.11

20 0.69 0.47 -0.32

21 0.58 0.40 0.24

22 0.34 0.24 1.52

23 0.59 0.40 0.21

24 0.70 0.48 -0.41

25 0.55 0.42 0.39

26 0.64 0.36 -0.04

27 0.40 0.41 1.17

28 0.66 0.41 -0.18


Math Grade 3

0809 Spring New Mexico

Test Statistics

Number of Students 549

Number of Items 32

Mean 18.76

Std. Deviation 5.98

Reliability 0.83

Std. Error of Measurement 2.47

Question Statistics                                      Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty      No. Correct  Scale Score   No. Correct  Scale Score

41 0.41 0.38 0.92 0 1000 17 1378

42 0.53 0.29 0.29 1 1080 18 1389

43 0.66 0.30 -0.33 2 1138 19 1400

44 0.55 0.36 0.21 3 1173 20 1411

45 0.57 0.46 0.11 4 1200 21 1423

46 0.62 0.50 -0.16 5 1221 22 1435

47 0.71 0.41 -0.64 6 1240 23 1448

48 0.69 0.42 -0.52 7 1256 24 1462

49 0.79 0.42 -1.08 8 1271 25 1476

50 0.62 0.49 -0.13 9 1285 26 1492

51 0.62 0.35 -0.14 10 1298 27 1510

52 0.53 0.45 0.32 11 1310 28 1532

53 0.75 0.35 -0.83 12 1322 29 1557

54 0.50 0.49 0.46 13 1334 30 1592

55 0.52 0.36 0.35 14 1345 31 1648

56 0.65 0.32 -0.28 15 1356 32 1741

57 0.85 0.29 -1.59 16 1367

58 0.50 0.28 0.43

59 0.68 0.38 -0.45

60 0.61 0.41 -0.09

61 0.50 0.29 0.44

62 0.32 0.31 1.39

63 0.66 0.49 -0.32

64 0.43 0.45 0.8

65 0.59 0.34 0.01

66 0.33 0.20 1.32

67 0.88 0.37 -1.87

68 0.52 0.41 0.35

69 0.72 0.50 -0.65

70 0.62 0.50 -0.12

71 0.48 0.37 0.53

72 0.34 0.36 1.26


Reading Grade 4

0809 Spring New Mexico

Test Statistics

Number of Students 601

Number of Items 28

Mean 16.39

Std. Deviation 5.05

Reliability 0.80

Std. Error of Measurement 2.26

Question Statistics                                      Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty      No. Correct  Scale Score   No. Correct  Scale Score

1 0.51 0.42 0.41 0 1060 15 1449

2 0.63 0.31 -0.16 1 1154 16 1463

3 0.82 0.45 -1.36 2 1212 17 1476

4 0.25 0.3 1.77 3 1249 18 1490

5 0.66 0.43 -0.33 4 1277 19 1504

6 0.87 0.31 -1.76 5 1300 20 1519

7 0.75 0.41 -0.84 6 1319 21 1535

8 0.50 0.5 0.44 7 1337 22 1553

9 0.75 0.51 -0.85 8 1353 23 1572

10 0.70 0.51 -0.56 9 1368 24 1595

11 0.46 0.31 0.65 10 1383 25 1622

12 0.26 0.19 1.73 11 1397 26 1659

13 0.75 0.54 -0.87 12 1410 27 1716

14 0.66 0.38 -0.32 13 1423 28 1811

15 0.36 0.39 1.13 14 1436

16 0.42 0.34 0.83

17 0.63 0.55 -0.17

18 0.61 0.36 -0.09

19 0.45 0.43 0.69

20 0.61 0.3 -0.11

21 0.67 0.49 -0.38

22 0.64 0.45 -0.24

23 0.59 0.31 0

24 0.19 0.13 2.17

25 0.37 0.17 1.11

26 0.73 0.44 -0.74

27 0.77 0.46 -0.97

28 0.80 0.49 -1.17


Math Grade 4

0809 Spring New Mexico

Test Statistics

Number of Students 598

Number of Items 32

Mean 17.57

Std. Deviation 6.33

Reliability 0.84

Std. Error of Measurement 2.53

Question Statistics                                      Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty      No. Correct  Scale Score   No. Correct  Scale Score

41 0.67 0.39 -0.60 0 1056 17 1445

42 0.87 0.41 -2.00 1 1150 18 1457

43 0.65 0.43 -0.50 2 1207 19 1468

44 0.72 0.30 -0.91 3 1242 20 1479

45 0.57 0.48 -0.09 4 1268 21 1491

46 0.65 0.26 -0.49 5 1289 22 1504

47 0.58 0.31 -0.14 6 1308 23 1517

48 0.39 0.34 0.80 7 1324 24 1531

49 0.74 0.42 -0.98 8 1339 25 1546

50 0.49 0.51 0.29 9 1353 26 1562

51 0.58 0.45 -0.13 10 1366 27 1581

52 0.45 0.27 0.50 11 1378 28 1603

53 0.67 0.47 -0.60 12 1390 29 1630

54 0.23 0.23 1.76 13 1401 30 1665

55 0.49 0.52 0.31 14 1412 31 1722

56 0.52 0.42 0.17 15 1423 32 1817

57 0.69 0.37 -0.71 16 1434

58 0.61 0.49 -0.28

59 0.72 0.49 -0.87

60 0.53 0.42 0.14

61 0.59 0.46 -0.17

62 0.50 0.48 0.28

63 0.34 0.47 1.09

64 0.44 0.44 0.55

65 0.56 0.52 -0.04

66 0.41 0.45 0.71

67 0.48 0.43 0.34

68 0.31 0.10 1.24

69 0.40 0.25 0.74

70 0.74 0.51 -1.03

71 0.59 0.52 -0.16

72 0.4 0.47 0.74


Reading Grade 5

0809 Spring New Mexico

Test Statistics

Number of Students 645

Number of Items 28

Mean 16.10

Std. Deviation 4.81

Reliability 0.76

Std. Error of Measurement 2.36

Question Statistics                                      Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty      No. Correct  Scale Score   No. Correct  Scale Score

1 0.94 0.36 -2.67 0 1093 15 1490

2 0.82 0.41 -1.36 1 1188 16 1504

3 0.71 0.41 -0.65 2 1247 17 1517

4 0.41 0.37 0.83 3 1284 18 1531

5 0.38 0.27 0.99 4 1312 19 1546

6 0.55 0.45 0.18 5 1336 20 1561

7 0.55 0.40 0.18 6 1356 21 1577

8 0.65 0.56 -0.31 7 1374 22 1594

9 0.55 0.36 0.17 8 1391 23 1613

10 0.74 0.23 -0.82 9 1407 24 1636

11 0.50 0.40 0.43 10 1422 25 1663

12 0.51 0.39 0.39 11 1436 26 1698

13 0.39 0.27 0.96 12 1450 27 1755

14 0.41 0.24 0.83 13 1464 28 1849

15 0.50 0.35 0.42 14 1477

16 0.67 0.43 -0.40

17 0.30 0.33 1.39

18 0.81 0.48 -1.26

19 0.61 0.30 -0.09

20 0.33 0.26 1.23

21 0.66 0.32 -0.38

22 0.65 0.38 -0.33

23 0.76 0.49 -0.92

24 0.42 0.40 0.79

25 0.73 0.43 -0.76

26 0.60 0.43 -0.07

27 0.63 0.46 -0.20

28 0.29 0.16 1.44


Math Grade 5

0809 Spring New Mexico

Test Statistics

Number of Students 646

Number of Items 32

Mean 17.79

Std. Deviation 5.64

Reliability 0.80

Std. Error of Measurement 2.52

Question Statistics                                      Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty      No. Correct  Scale Score   No. Correct  Scale Score

41 0.76 0.44 -1.10 0 1112 17 1504

42 0.74 0.51 -0.94 1 1205 18 1515

43 0.57 0.38 -0.06 2 1262 19 1526

44 0.68 0.32 -0.61 3 1298 20 1538

45 0.17 0.13 2.12 4 1324 21 1550

46 0.52 0.33 0.19 5 1346 22 1563

47 0.53 0.28 0.14 6 1364 23 1576

48 0.37 0.26 0.93 7 1381 24 1590

49 0.70 0.43 -0.72 8 1396 25 1606

50 0.43 0.31 0.59 9 1409 26 1622

51 0.64 0.50 -0.42 10 1422 27 1641

52 0.57 0.44 -0.06 11 1435 28 1663

53 0.58 0.41 -0.09 12 1447 29 1690

54 0.62 0.44 -0.31 13 1459 30 1725

55 0.68 0.39 -0.62 14 1470 31 1782

56 0.77 0.38 -1.14 15 1481 32 1876

57 0.67 0.37 -0.54 16 1492

58 0.76 0.46 -1.04

59 0.54 0.36 0.09

60 0.46 0.34 0.46

61 0.57 0.49 -0.03

62 0.70 0.31 -0.71

63 0.84 0.27 -1.67

64 0.46 0.40 0.47

65 0.37 0.41 0.91

66 0.17 0.17 2.17

67 0.56 0.44 -0.01

68 0.51 0.33 0.22

69 0.54 0.47 0.08

70 0.31 0.36 1.22

71 0.60 0.40 -0.20

72 0.41 0.26 0.69


Reading Grade 6

0809 Spring New Mexico

Test Statistics

Number of Students 621

Number of Items 28

Mean 17.33

Std. Deviation 4.94

Reliability 0.79

Std. Error of Measurement 2.26

Question Statistics                                      Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty      No. Correct  Scale Score   No. Correct  Scale Score

1 0.57 0.29 0.34 0 1155 15 1541

2 0.60 0.32 0.19 1 1250 16 1555

3 0.74 0.38 -0.58 2 1307 17 1568

4 0.62 0.37 0.06 3 1343 18 1582

5 0.41 0.31 1.10 4 1370 19 1596

6 0.78 0.53 -0.85 5 1393 20 1611

7 0.34 0.48 1.46 6 1412 21 1628

8 0.75 0.34 -0.69 7 1430 22 1645

9 0.54 0.30 0.45 8 1446 23 1665

10 0.41 0.44 1.10 9 1461 24 1688

11 0.77 0.43 -0.79 10 1475 25 1716

12 0.80 0.39 -0.97 11 1489 26 1752

13 0.74 0.39 -0.58 12 1502 27 1811

14 0.78 0.54 -0.86 13 1515 28 1906

15 0.64 0.47 -0.04 14 1528

16 0.85 0.45 -1.39

17 0.87 0.44 -1.60

18 0.54 0.42 0.48

19 0.69 0.46 -0.31

20 0.36 0.30 1.35

21 0.63 0.44 0.01

22 0.25 0.26 1.95

23 0.68 0.49 -0.28

24 0.83 0.45 -1.25

25 0.35 0.21 1.42

26 0.82 0.42 -1.12

27 0.39 0.23 1.20

28 0.60 0.36 0.19


Math Grade 6

0809 Spring New Mexico

Test Statistics

Number of Students 634

Number of Items 32

Mean 19.12

Std. Deviation 5.78

Reliability 0.82

Std. Error of Measurement 2.45

Question Statistics                                      Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty      No. Correct  Scale Score   No. Correct  Scale Score

41 0.61 0.46 -0.04 0 1172 17 1555

42 0.69 0.22 -0.44 1 1265 18 1566

43 0.82 0.21 -1.30 2 1321 19 1577

44 0.69 0.43 -0.43 3 1355 20 1588

45 0.84 0.38 -1.46 4 1381 21 1600

46 0.57 0.46 0.18 5 1402 22 1612

47 0.28 0.29 1.65 6 1420 23 1624

48 0.81 0.28 -1.17 7 1436 24 1638

49 0.26 0.25 1.75 8 1450 25 1652

50 0.67 0.39 -0.35 9 1464 26 1668

51 0.56 0.45 0.22 10 1476 27 1686

52 0.44 0.34 0.82 11 1488 28 1707

53 0.66 0.46 -0.28 12 1500 29 1733

54 0.77 0.31 -0.89 13 1511 30 1767

55 0.48 0.48 0.60 14 1522 31 1823

56 0.49 0.35 0.55 15 1533 32 1916

57 0.45 0.40 0.74 16 1544

58 0.61 0.50 -0.02

59 0.76 0.43 -0.84

60 0.69 0.45 -0.44

61 0.65 0.40 -0.22

62 0.67 0.35 -0.31

63 0.76 0.39 -0.83

64 0.50 0.38 0.50

65 0.55 0.40 0.29

66 0.50 0.43 0.50

67 0.35 0.53 1.25

68 0.80 0.46 -1.11

69 0.52 0.29 0.41

70 0.75 0.39 -0.78

71 0.35 0.21 1.28

72 0.58 0.39 0.14


Reading Grade 7

0809 Spring New Mexico

Test Statistics

Number of Students 227

Number of Items 28

Mean 17.57

Std. Deviation 5.05

Reliability 0.79

Std. Error of Measurement 2.32

Question Statistics                                      Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty      No. Correct  Scale Score   No. Correct  Scale Score

1 0.67 0.26 -0.17 0 1189 15 1563

2 0.75 0.30 -0.63 1 1283 16 1575

3 0.61 0.39 0.12 2 1340 17 1587

4 0.72 0.37 -0.47 3 1375 18 1600

5 0.83 0.49 -1.22 4 1401 19 1613

6 0.63 0.38 0.01 5 1423 20 1627

7 0.60 0.32 0.19 6 1442 21 1642

8 0.22 0.22 2.17 7 1458 22 1658

9 0.78 0.47 -0.80 8 1474 23 1677

10 0.88 0.47 -1.68 9 1488 24 1698

11 0.65 0.42 -0.06 10 1501 25 1724

12 0.60 0.49 0.19 11 1514 26 1758

13 0.70 0.43 -0.37 12 1526 27 1815

14 0.78 0.41 -0.83 13 1539 28 1908

15 0.53 0.45 0.51 14 1551

16 0.41 0.24 1.11

17 0.54 0.31 0.47

18 0.45 0.29 0.92

19 0.68 0.48 -0.25

20 0.61 0.43 0.12

21 0.72 0.40 -0.44

22 0.61 0.31 0.15

23 0.80 0.40 -0.99

24 0.68 0.39 -0.22

25 0.40 0.40 1.18

26 0.49 0.42 0.70

27 0.63 0.39 0.06

28 0.59 0.46 0.23


Math Grade 7

0809 Spring New Mexico

Test Statistics

Number of Students 227

Number of Items 32

Mean 18.44

Std. Deviation 5.04

Reliability 0.78

Std. Error of Measurement 2.36

Question Statistics                                      Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty      No. Correct  Scale Score   No. Correct  Scale Score

41 0.82 0.30 -1.35 0 1190 17 1578

42 0.92 0.22 -2.31 1 1284 18 1589

43 0.79 0.42 -1.16 2 1341 19 1600

44 0.32 0.40 1.32 3 1376 20 1612

45 0.62 0.48 -0.18 4 1402 21 1623

46 0.36 0.55 1.09 5 1424 22 1635

47 0.66 0.26 -0.36 6 1442 23 1648

48 0.67 0.53 -0.43 7 1458 24 1661

49 0.52 0.47 0.31 8 1473 25 1676

50 0.63 0.40 -0.25 9 1487 26 1692

51 0.18 0.37 2.24 10 1499 27 1710

52 0.92 0.26 -2.31 11 1512 28 1731

53 0.62 0.33 -0.18 12 1523 29 1757

54 0.80 0.36 -1.19 13 1535 30 1792

55 0.86 0.39 -1.72 14 1546 31 1848

56 0.57 0.36 0.05 15 1557 32 1941

57 0.63 0.20 -0.23 16 1567

58 0.30 0.33 1.42

59 0.63 0.49 -0.21

60 0.09 -0.05 3.08

61 0.78 0.29 -1.07

62 0.80 0.35 -1.22

63 0.26 0.33 1.63

64 0.12 0.41 2.72

65 0.55 0.50 0.16

66 0.35 0.16 1.16

67 0.61 0.37 -0.12

68 0.53 0.50 0.27

69 0.69 0.14 -0.55

70 0.77 0.46 -1.02

71 0.37 0.32 1.02

72 0.70 0.34 -0.60


Reading Grade 8

0809 Spring New Mexico

Test Statistics

Number of Students 243

Number of Items 28

Mean 17.07

Std. Deviation 4.82

Reliability 0.78

Std. Error of Measurement 2.26

Question Statistics                                      Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty      No. Correct  Scale Score   No. Correct  Scale Score

1 0.86 0.30 -1.52 0 1170 15 1571

2 0.48 0.22 0.72 1 1265 16 1585

3 0.93 0.25 -2.39 2 1324 17 1598

4 0.58 0.50 0.23 3 1361 18 1612

5 0.85 0.38 -1.41 4 1390 19 1627

6 0.74 0.37 -0.58 5 1414 20 1642

7 0.39 0.31 1.17 6 1434 21 1658

8 0.36 0.30 1.34 7 1453 22 1675

9 0.84 0.35 -1.34 8 1470 23 1695

10 0.49 0.26 0.68 9 1486 24 1717

11 0.67 0.48 -0.19 10 1501 25 1744

12 0.50 0.42 0.62 11 1516 26 1780

13 0.49 0.35 0.68 12 1530 27 1837

14 0.74 0.36 -0.58 13 1544 28 1931

15 0.56 0.40 0.35 14 1558

16 0.73 0.53 -0.56

17 0.66 0.35 -0.17

18 0.41 0.40 1.09

19 0.36 0.40 1.34

20 0.58 0.41 0.25

21 0.46 0.39 0.82

22 0.76 0.50 -0.71

23 0.44 0.38 0.92

24 0.69 0.39 -0.33

25 0.36 0.49 1.34

26 0.49 0.37 0.70

27 0.89 0.20 -1.77

28 0.75 0.36 -0.69
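The Reliability figures in these blocks (0.78 here) are internal-consistency estimates. The report does not name the formula; a standard choice for dichotomously scored items is KR-20, sketched below under that assumption.

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """KR-20 reliability for a 0/1 response matrix (students x items)."""
    k = responses.shape[1]
    p = responses.mean(axis=0)                      # per-item proportion correct
    total_variance = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_variance)

# Demo on independent random items; real forms measure a common trait, which is
# what pushes KR-20 up toward values like the 0.78 reported above.
rng = np.random.default_rng(1)
print(kr20((rng.random((243, 28)) < 0.61).astype(int)))  # near 0 for independent items
```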


Math Grade 8

0809 Spring New Mexico

Test Statistics

Number of Students 237

Number of Items 32

Mean 18.14

Std. Deviation 5.90

Reliability 0.83

Std. Error of Measurement 2.43

Question Statistics                                         Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty  |  No. Correct  Scale Score  |  No. Correct  Scale Score

41 0.65 0.38 -0.40 0 1203 17 1612

42 0.85 0.33 -1.65 1 1299 18 1624

43 0.44 0.24 0.67 2 1359 19 1635

44 0.71 0.37 -0.72 3 1397 20 1647

45 0.79 0.41 -1.23 4 1425 21 1660

46 0.55 0.48 0.13 5 1448 22 1673

47 0.32 0.36 1.30 6 1467 23 1686

48 0.77 0.34 -1.06 7 1485 24 1701

49 0.28 0.15 1.50 8 1500 25 1717

50 0.76 0.43 -1.00 9 1515 26 1734

51 0.55 0.44 0.11 10 1529 27 1754

52 0.35 0.34 1.12 11 1542 28 1777

53 0.56 0.32 0.09 12 1554 29 1806

54 0.65 0.31 -0.40 13 1566 30 1844

55 0.58 0.31 -0.04 14 1578 31 1906

56 0.62 0.33 -0.20 15 1589 32 2000

57 0.38 0.47 0.94 16 1601

58 0.51 0.55 0.31

59 0.47 0.37 0.52

60 0.69 0.53 -0.61

61 0.86 0.51 -1.73

62 0.84 0.41 -1.55

63 0.65 0.47 -0.38

64 0.37 0.49 1.01

65 0.37 0.24 1.01

66 0.43 0.52 0.69

67 0.37 0.28 1.03

68 0.64 0.42 -0.33

69 0.66 0.55 -0.45

70 0.66 0.52 -0.42

71 0.52 0.40 0.27

72 0.29 0.36 1.47


Reading Grade 9

0910 Spring New Mexico

Test Statistics

Number of Students 284

Number of Items 31

Mean 16.18

Std. Deviation 5.61

Reliability 0.80

Std. Error of Measurement 2.51

Question Statistics                                         Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty  |  No. Correct  Scale Score  |  No. Correct  Scale Score

1 0.49 0.18 0.2 0 1251 16 1643

2 0.61 0.30 -0.42 1 1345 17 1655

3 0.75 0.43 -1.17 2 1402 18 1666

4 0.42 0.38 0.52 3 1438 19 1678

5 0.55 0.46 -0.1 4 1464 20 1690

6 0.73 0.52 -1.02 5 1486 21 1702

7 0.47 0.41 0.28 6 1505 22 1716

8 0.23 0.11 1.54 7 1521 23 1730

9 0.45 0.37 0.37 8 1536 24 1746

10 0.51 0.25 0.07 9 1550 25 1763

11 0.62 0.43 -0.45 10 1563 26 1784

12 0.68 0.43 -0.79 11 1576 27 1810

13 0.44 0.3 0.43 12 1587 28 1844

14 0.88 0.29 -2.08 13 1599 29 1899

15 0.58 0.54 -0.26 14 1610 30 1992

16 0.51 0.44 0.05 15 1621 31 1643

17 0.39 0.23 0.64

18 0.62 0.45 -0.43

19 0.40 0.54 0.59

20 0.61 0.47 -0.42

21 0.61 0.53 -0.43

22 0.45 0.38 0.35

23 0.34 0.03 0.93

24 0.44 0.42 0.4

25 0.37 0.42 0.74

26 0.46 0.31 0.32

27 0.39 0.41 0.64

28 0.33 0.28 0.93

29 0.81 0.36 -1.56

30 0.50 0.44 0.1

31 0.51 0.50 0.07


0910A NM Reading 9th Grade Assessment

NM Item No.  Also Found In  Question #  Biserial  DIF Gender  DIF Ethnicity

1 KY / 9th 10 0.3 0.56 0.27

2 FL / 9th 11 0.38 0.09 0.22

3 - - - - -

4 IL / 9th 16 0.32 0.06 0.04

5 MS / 9th 17 0.33 NA NA

6 - - - - -

7 - - - - -

8 - - - - -

9 FL / 9th 30 0.5 0.09 0.09

10 - - - - -

11 MS / 9th 8 0.18 NA NA

12 FL / 9th 19 0.45 0.08 0.44

13 - - - - -

14 IL / 9th 5 0.16 0.39 0.46

15 IL / 9th 7 0.43 0.08 0.36

16 IL / 9th 18 0.47 0.02 0.03

17 - - - - -

18 - - - - -

19 - - - - -

20 - - - - -

21 IL / 9th 14 0.39 0.02 0.35

22 - - - - -

23 - - - - -

24 - - - - -

25 - - - - -

26 FL / 9th 28 0.39 0.21 0.11

27 - - - - -

28 KY / 9th 23 0.45 0.29 0.04

29 - - - - -

30 - - - - -

31 - - - - -
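The DIF Gender and DIF Eth columns in these cross-form tables flag differential item functioning, but the report does not say which DIF statistic or metric is reported. A common procedure is the Mantel-Haenszel analysis, which stratifies examinees by total score and compares the item's odds of success across groups; the sketch below is that generic procedure, not a reconstruction of Discovery Education's method, and the function name is ours.

```python
import numpy as np

def mantel_haenszel_odds_ratio(item: np.ndarray, group: np.ndarray, total: np.ndarray) -> float:
    """Common odds ratio for one item across total-score strata.

    item  -- 0/1 correctness on the studied item
    group -- 0 = reference group, 1 = focal group
    total -- total raw score used for stratification
    """
    num = den = 0.0
    for s in np.unique(total):
        stratum = total == s
        correct = item[stratum].astype(bool)
        focal = group[stratum].astype(bool)
        a = np.sum(correct & ~focal)   # reference group, correct
        b = np.sum(~correct & ~focal)  # reference group, incorrect
        c = np.sum(correct & focal)    # focal group, correct
        d = np.sum(~correct & focal)   # focal group, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n           # Mantel-Haenszel numerator term
            den += b * c / n           # Mantel-Haenszel denominator term
    return num / den if den > 0 else float("nan")
```

A common odds ratio near 1 (log-odds near 0) indicates little evidence of DIF for that item.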


0910B NM Reading 9th Grade Assessment

NM Item No.  Also Found In  Item #  Biserial  DIF Ethnicity  DIF Gender

1 KY / 9th 3 0.31 0.35 0.04

2 WI / 9th 2 0.24 NA NA

3 - - - - -

4 KY / 9th 6 0.45 0.51 0.08

5 - - - - -

6 - - - - -

7 IL / 9th 10 0.43 0.08 0.24

8 - - - - -

9 IL / 9th 11 0.49 0.1 0.17

10 FL / 9th 19 0.4 0.25 0.04

11 - - - - -

12 IL / 9th 22 0.44 0.07 0.08

13 MS / 9th 23 0.38 0.01 0.15

14 - - - - -

15 IL / 9th 14 0.52 0.09 0.08

16 FL / 9th 13 0.49 0.16 0.08

17 - - - - -

18 FL / 9th 12 0.48 0.0 0.03

19 US / HS 20 0.49 NA NA

20 FL / 9th 9 0.44 0.49 0.05

21 - - - - -

22 IL / 9th 24 0.38 0.23 0.12

23 - - - - -

24 - - - - -

25 - - - - -

26 IL / 9th 26 0.46 0.02 0.16

27 - - - - -

28 - - - - -

29 - - - - -

30 KY / 9th 32 0.41 0.03 0.04

31 KY / 9th 33 0.52 0.12 0.32


Reading Grade 10

0910 Spring New Mexico

Test Statistics

Number of Students 273

Number of Items 31

Mean 17.81

Std. Deviation 6.14

Reliability 0.84

Std. Error of Measurement 2.43

Question Statistics                                         Scale Scores
Item No.  P-Value  Biserial  Rasch Item Difficulty  |  No. Correct  Scale Score  |  No. Correct  Scale Score

1 0.56 0.38 0.13 0 1260 16 1633

2 0.65 0.41 -0.34 1 1353 17 1644

3 0.61 0.47 -0.15 2 1409 18 1655

4 0.81 0.50 -1.37 3 1444 19 1667

5 0.63 0.61 -0.24 4 1469 20 1678

6 0.67 0.43 -0.46 5 1490 21 1690

7 0.61 0.52 -0.13 6 1508 22 1703

8 0.91 0.44 -2.3 7 1524 23 1717

9 0.71 0.5 -0.67 8 1539 24 1731

10 0.77 0.42 -1.03 9 1552 25 1747

11 0.54 0.33 0.2 10 1565 26 1765

12 0.47 0.28 0.55 11 1577 27 1786

13 0.40 0.42 0.94 12 1589 28 1812

14 0.75 0.38 -0.94 13 1600 29 1847

15 0.50 0.34 0.42 14 1611 30 1903

16 0.58 0.55 0.04 15 1622 31 1997

17 0.70 0.41 -0.63

18 0.70 0.57 -0.65

19 0.61 0.62 -0.13

20 0.34 0.32 1.25

21 0.40 0.17 0.94

22 0.53 0.52 0.26

23 0.53 0.33 0.26

24 0.46 0.41 0.62

25 0.75 0.53 -0.96

26 0.46 0.38 0.62

27 0.59 0.47 -0.05

28 0.23 0.28 1.9

29 0.50 0.41 0.42

30 0.48 0.40 0.53

31 0.39 0.23 0.97


0910A NM Reading 10th Grade Assessment

NM Item No.  Also Found In  Question #  Biserial  DIF Gender  DIF Ethnicity

1 IL / 10th 7 0.51 0.2 0.08

2 - - - - -

3 - - - - -

4 IL / 10th 10 0.2 0.27 0.38

5 DC / 10th 2 0.46 0.19 0.26

6 - - - - -

7 KY / 10th 6 0.49 0.35 0.02

8 - - - - -

9 - - - - -

10 - - - - -

11 IL / 10th 2 0.41 0.1 0.0

12 US / HS 22 0.44 0.24 0.16

13 - - - - -

14 TN / HS 13 0.44 0.05 0.05

15 - - - - -

16 DC / 10th 34 0.42 0.03 1.2

17 - - - - -

18 - - - - -

19 - - - - -

20 FL / 10th 17 0.6 0.48 0.38

21 FL / 10th 18 0.54 0.56 0.25

22 - - - - -

23 TN / HS 17 0.41 0.15 0.07

24 FL / 10th 19 0.53 0.29 0.09

25 FL / 10th 20 0.39 0.0 0.03

26 - - - - -

27 IL / 10th 21 0.35 0.47 0.46

28 - - - - -

29 - - - - -

30 - - - - -

31 - - - - -


0910B NM Reading 10th Grade Assessment

NM Item No.  Also Found In  Item #  Biserial  DIF Ethnicity  DIF Gender

1 - - - - -

2 IL / 10th 2 0.45 0.31 0.3

3 IL / 10th 3 0.39 0.08 0.01

4 IL / 10th 4 0.47 0.27 0.29

5 - - - - -

6 TN / HS 13 0.46 0.24 0.12

7 - - - - -

8 KY / 10th 4 0.46 0.08 0.19

9 KY / 10th 6 0.5 0.12 0.02

10 - - - - -

11 KY / 10th 8 0.28 0.27 0.13

12 IL / 10th 10 0.29 0.0 0.05

13 KY / 10th 18 0.45 0.02 0.13

14 - - - - -

15 KY / 10th 20 0.39 0.19 0.26

16 DC / 10th 11 0.52 0.18 0.3

17 KY / 10th 21 0.52 0.08 0.23

18 - - - - -

19 DC / 10th 6 0.46 0.42 0.06

20 DC / 10th 4 0.2 0.81 0.14

21 DC / 10th 5 0.3 0.91 0.21

22 KY / 10th 24 0.48 0.48 0.39

23 - - - - -

24 DC / 10th 25 0.44 0.11 0.23

25 DC / 10th 26 0.43 0.46 0.44

26 TN / HS 18 0.33 0.7 0.45

27 TN / HS 19 0.29 0.4 0.22

28 - - - - -

29 - - - - -

30 IL / 10th 23 0.47 0.04 0.01

31 KY / 10th 29 0.36 0.28 0.46


Math Grade 9

0910 Winter New Mexico

Test Statistics

Number of Students 290

Number of Items 32

Mean 10.79

Std. Deviation 3.54

Reliability 0.49

Std. Error of Measurement 2.53
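The 0.49 reliability here is well below the 0.78-0.84 range of the grade 3-8 forms. As context (our illustration, not a projection the report makes), the Spearman-Brown prophecy formula shows how strongly reliability depends on test length:

```python
# Spearman-Brown prophecy: reliability as a function of test length.
def spearman_brown(r: float, k: float) -> float:
    """Projected reliability when a test of reliability r is lengthened k-fold."""
    return k * r / (1 + (k - 1) * r)

r = 0.49                                   # Grade 9 Winter Math, from the block above
k = (0.80 * (1 - r)) / (r * (1 - 0.80))    # length multiplier needed to reach r = 0.80
print(round(k * 32))                       # about 133 items
print(round(spearman_brown(r, 2), 2))      # doubling to 64 items projects r of about 0.66
```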

0910A New Mexico Test & Item Stats for 9th Grade Math

NM Item No.  NM Item Average  NM Biserial  Rasch Item Difficulty  |  Also Found In  Item #  Biserial  DIF Gender  DIF Ethnicity

41 0.43 0.34 -0.5 IL / 9th 76 0.35 0.22 0.07

42 0.65 0.26 -1.45 IL / 9th 42 0.35 0.21 0.00

43 0.65 0.36 -1.45 AL / HS 4 0.39 0.05 0.39

44 0.35 0.24 -0.11 KY / 10th 56 0.31 0.03 0.03

45 0.14 -0.06 1.16 IL / 10th 51 0.02 0.39 0.86

46 0.23 0.32 0.50 IL / 9th 73 0.23 0.12 0.28

47 0.19 0.05 0.77 US / HS 2 0.1 0.42 0.01

48 0.26 0.12 0.33 IL / HS 28 0.51 0.4 0.22

49 0.46 0.12 -0.60 - - - - -

50 0.45 0.42 -0.57 - - - - -

51 0.24 0.21 0.46 IL / 10th 70 0.29 0.36 0.2

52 0.37 0.24 -0.19 IL / 10th 72 0.29 0.01 0.18

53 0.37 -0.02 -0.22 - - - - -

54 0.23 0.07 0.48 AL / HS 37 0.28 0.31 0.81

55 0.28 0.06 0.22 - - - - -

56 0.37 0.56 -0.2 DC / 10th 5 0.51 0.17 1.57

57 0.18 0.34 0.84 FL / 9th 58 0.42 0.02 0

58 0.22 -0.06 0.59 - - - - -

59 0.14 0.16 1.16 - - - - -

60 0.40 0.24 -0.36 IL / 9th 74 0.32 0.17 0.05

61 0.32 0.37 0.01 - - - - -

62 0.18 0.01 0.84 - - - - -

63 0.43 0.32 -0.48 IL / 9th 63 0.4 0.21 0.04

64 0.34 0.37 -0.07 IL / 9th 56 0.37 0.12 0.04

65 0.51 0.21 -0.84 FL / 9th 46 0.39 0.29 0.06

66 0.35 0.37 -0.11 IL / 9th 65 0.51 0.21 0.2

67 0.29 0.29 0.20 FL / 9th 45 0.4 0.14 0.24

68 0.38 0.21 -0.27 FL / 9th 65 0.26 0.27 0.11

69 0.31 0.36 0.08 - - - - -

70 0.50 0.41 -0.79 FL / 10th 49 0.47 0.02 0.06

71 0.36 0.42 -0.16 KY / 10th 72 0.42 0.05 0.62

72 0.20 0.20 0.72 FL / 10th 67 0.36 0.2 0.11


Math Grade 10

0910 Winter New Mexico

Test Statistics

Number of Students 300

Number of Items 32

Mean 12.01

Std. Deviation 4.16

Reliability 0.66

Std. Error of Measurement 2.43

0910A New Mexico Test & Item Stats for 10th Grade Math

NM Item No.  NM Item Average  NM Biserial  Rasch Item Difficulty  |  Also Found In  Item #  Biserial  DIF Gender  DIF Ethnicity

41 0.04 0.09 2.66 - - - - -

42 0.57 0.48 -0.98 FL / 10th 50 0.55 0.29 0.44

43 0.35 0.21 0.02 - - - - -

44 0.4 0.32 -0.25 IL / 10th 55 0.33 0.05 0.35

45 0.09 -0.15 1.84 AL / HS 18 0.18 0.04 0.26

46 0.77 0.45 -1.99 AL / HS 22 0.39 0.12 0.45

47 0.23 0.31 0.67 - - - - -

48 0.57 0.39 -1.00 IL / 10th 54 0.40 0.35 0.15

49 0.54 0.48 -0.85 IL / HS 2 0.37 0.34 0.06

50 0.14 0.2 1.34 US / HS 22 0.23 0.14 0.31

51 0.48 0.11 -0.61 IL / 10th 42 0.19 0.22 0.26

52 0.87 0.2 -2.78 - - - - -

53 0.36 0.38 -0.05 - - - - -

54 0.29 0.27 0.29 US / HS 6 0.26 0.23 0.05

55 0.53 0.37 -0.81 - - - - -

56 0.58 0.43 -1.05 - - - - -

57 0.31 0.29 0.20 - - - - -

58 0.09 -0.01 1.80 - - - - -

59 0.28 0.41 0.36 FL / 10th 61 0.52 0.00 0.24

60 0.28 0.1 0.38 - - - - -

61 0.38 0.38 -0.16 IL / 10th 61 0.33 0.14 0.08

62 0.25 0.16 0.51 KY / 10th 62 0.31 0.2 0.04

63 0.51 0.42 -0.75 AL / HS 15 0.33 0.00 0.1

64 0.14 -0.01 1.34 - - - - -

65 0.26 0.42 0.46 KY / 10th 70 0.47 0.05 0.23

66 0.57 0.32 -0.98 KY / 10th 41 0.37 0.1 0.47

67 0.27 0.38 0.42 KY / 10th 42 0.4 0.09 0.01

68 0.46 0.28 -0.49 IL / 10th 75 0.30 0.12 0.23

69 0.48 0.5 -0.58 DC / 10th 11 0.42 0.15 0.46

70 0.2 0.14 0.87 DC / 10th 19 0.29 0.11 0.23

71 0.15 0.22 1.20 - - - - -

72 0.58 0.36 -1.05 - - - - -