

Editorial

I am writing this editorial having just returned from the National Conference on Large-Scale Assessment in San Antonio. For a number of years, the Council of Chief State School Officers (CCSSO) has been organizing this annual meeting, in which many NCME members and EMIP readers take part. According to CCSSO, the first large-scale meeting in 1970 included approximately 50 participants. When I first attended in 1985—on my first day of work after graduate school—I was one of perhaps 100 participants. This year, 1,300 people took part. Clearly, our field has grown significantly in a number of different ways in the last 35 years. And clearly, our field has expanded to include people with a wide range of interests, professional training and experiences, and professional responsibilities.

In this issue of EMIP we hear from colleagues working in two content areas—reading and social studies. One article reports a study relevant to assessing an achievement construct, reading. Another article addresses No Child Left Behind (NCLB) testing requirements and social studies. Earlier this year I wrote to the National Science Teachers Association (NSTA) and the National Council for the Social Studies (NCSS) to invite their leaders to provide testing policy commentaries on NCLB. I was pleased to receive a response from NCSS. In addition, this issue includes the text of Dave Frisbie's Presidential Address at this year's NCME breakfast in Montreal.

Please take a look at and respond to calls in this issue for nominations for two NCME editorships and for six NCME awards (which will be announced at next year's NCME breakfast). Responding to these calls is a great way to get involved in the activities of your professional organization and to help keep NCME robust, relevant, and useful to your professional work.

In This Issue

Paul Yovanoff, Luke Duesbery, Julie Alonzo, and Gerald Tindal report their analyses of changes in the covariance structure of a reading comprehension test across grades 4 through 8. Their analysis demonstrates that although both vocabulary knowledge and oral reading fluency explain reading comprehension in early grades, the influence of oral reading fluency becomes less prominent in later grades. They discuss oral reading fluency as an indicator of automaticity of decoding and suggest implications for the design of large-scale reading assessments and for classroom assessment practice. Yovanoff and his colleagues advocate for regularly assessing oral reading fluency and vocabulary knowledge in the classroom as a means of supplementing large-scale assessments and guiding instruction.

Social studies specialists Susie Burroughs, Eric Groce, and Mary Lee Webeck provide interesting perspectives on the potential impacts of NCLB on social studies curriculum, teaching, and learning. They conducted small-scale studies of elementary, middle, and high school social studies teachers to capture points of view on the potential benefits and drawbacks of NCLB's silence on the social studies. I suppose that it is not surprising that the results of this study indicate some ambivalence—or is it relief?—that NCLB does not require testing in the social studies. It would be interesting to know how much ambivalence exists in the reading and mathematics communities, where testing requirements must be fully implemented in this next school year. It also would be interesting and instructive to hear from the science community, in anticipation of implementing science testing requirements by 2007–2008.

I always appreciate fresh takes on the fundamentals of educational measurement. Dave Frisbie provides just that in the text of his Presidential Address. As you may recall from his talk at the breakfast meeting in Montreal, Dave illustrates ways in which educational measurement specialists and others miscommunicate fundamental measurement concepts. He uses evidence from a range of printed materials—from measurement textbooks to news articles. He also suggests ideas for improving the clarity and consistency of conception and terminology within and outside the educational measurement community.

One of his suggestions is that NCME might consider becoming proactive in improving clarity and consistency by offering a certification program for people who work in or write about educational assessment but do not have formal training. The National Center for Education Statistics (NCES) is proactive in this area. NCES sponsors a Graduate Certificate Program in Large-Scale Assessment through the University of Maryland. The program is designed to train state and local department of education staff, NAEP State Coordinators, and others in technical aspects of assessment design, development, and implementation. (See http://www.education.umd.edu/EDMS/Certificate/Booklet_2004b.pdf.) Similarly, the Hechinger Institute on Education and the Media at Teachers College, Columbia University provides seminars and training sessions on a range of topics, including sessions to help education reporters and editors understand issues and principles in educational testing and in reporting testing results accurately. (See http://www.tc.columbia.edu/hechinger/.) For example, this past April the institute, in partnership with the American Educational Research Association, cosponsored a symposium on educational research that is expected to shape the coverage of education. It is noteworthy that the president of NCME has publicly recommended that NCME should consider becoming proactive in improving the accuracy of communication about educational measurement concepts and topics.

Visual Displays on the Front and Inside Back Covers: Opportunity to Learn Box Plots

The visual displays on the front cover and inside back cover of this issue are plots of measures of opportunity to learn (OTL). The idea of developing standards for opportunity to learn arose in Congress in 1992. The intention for requiring OTL standards was to ensure explicitly the delivery of instructional inputs and processes in addition to explicit content and performance standards. (See http://www.edweek.org/ew/articles/1994/02/23/22otl.h13.html?querystring=opportunity%20to%20learn; retrieved June 26, 2005.) Requiring OTL standards was among the most hotly debated issues of the Clinton administration education reform agenda. Skip Kifer of the University of Kentucky has provided the displays. He explains them below.

These displays of OTL ratings come from the Second International Mathematics Study (SIMS) conducted under the auspices of the International Association for the Evaluation of Educational Achievement (IEA; see Schmidt, Wolfe, & Kifer, 1993). The box plots were constructed by Richard Wolfe of the Ontario Institute for Studies in Education at the University of Toronto. Richard is a master of displays that meet Tufte's (2001) criterion of high data density. His displays are featured in SIMS reports, U.S. reports of the Trends in International Mathematics and Science Study (TIMSS), and elsewhere.

Opportunity to Learn. Opportunity to learn has a long history in IEA. The concept was devised by David A. Walker of the Scottish Council for Research in Education for the First International Mathematics Study (FIMS), in 1967. SIMS built on Walker's work by including an OTL measure that asked eighth-grade teachers in the survey to respond to two questions about each of the 139 achievement items:

1. During this school year, did you teach or review the mathematics needed to answer this item correctly?

2. If, in this school year, you did not teach or review the mathematics needed to answer this item correctly, was it because:
   a. It had been taught prior to this school year
   b. It will be taught later (this year or later)
   c. It is not in the school curriculum at all
   d. For other reasons.

The OTL values depicted in the box plots combine responses indicating content that is taught in the current year or a previous year for each achievement item (i.e., positive responses to 1 and 2a). Those responses are averaged over all items and teachers. The result is the percentage of items taught in the current or a previous year, averaged over all teachers in each international school system. An example of one achievement item and its OTL results follows the sketch below.
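To make the averaging concrete, here is a minimal computational sketch. It is my own illustration with hypothetical responses, not SIMS code or data; a response is recorded as True when a teacher reported the item's content as taught this year or in a previous year (a positive answer to question 1 or response 2a).

# Minimal sketch of the OTL averaging described above (hypothetical data).

def otl_percentage(responses: list[list[bool]]) -> float:
    """Percentage of (teacher, item) ratings marked 'taught this year
    or in a previous year', averaged over all items and teachers."""
    total = sum(len(items) for items in responses)
    taught = sum(sum(items) for items in responses)
    return 100.0 * taught / total

# Three hypothetical teachers rating the same four achievement items.
teachers = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, True],
]

print(f"System OTL: {otl_percentage(teachers):.1f}%")  # prints 75.0%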

A painter is to mix green and yellow paint in the ratio of 4 to 7 to obtain the color he wants. If he has 28 L of green paint, how many liters of yellow paint should be added?
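(For reference, the correct response is 49 L: 28 ÷ 4 = 7 L per ratio part, and 7 parts of yellow × 7 L = 49 L.)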

The table below shows selected OTL results for this item. (The text below provides a key to the abbreviations.) Values in rows 1–3 indicate percentages of teachers who reported teaching the needed content; the other values are item p values (i.e., percentages of students responding correctly).

                    BFL  CBC  ONT  FRA  JPN  NZE  THA  USA
Previous content     30    1    3   36   93    5    2    6
New content          30   78   91   30    5   36   93   83
Not taught           40   21    6   40    2   59    5   11
Pretest p value      60   44   40   44   63   37   51   33
Posttest p value     61   55   58   38   62   45   64   43

In Japan (JPN) nearly all teachers reported that the mathematics content needed for the item had been taught in a previous year (i.e., old content); there is no change in the item p value from the pretest (i.e., within a month of the beginning of the school year) to the posttest (i.e., within a month of the end of the school year). In Ontario, Canada (ONT) and Thailand (THA) the item was reported as new content (i.e., taught during the year of the study); growth in the p value is considerable. The remaining international systems portray varied results, both in terms of OTL and achievement.

Box Plots for the Eight International Systems. The box plot on the front cover depicts the eight international systems that participated in the longitudinal portion of SIMS. Those systems as ordered in the graph are Belgium Flemish (BFL), Canada British Columbia (CBC), Canada Ontario (ONT), France (FRA), Japan (JPN), New Zealand (NZE), Thailand (THA), and the United States (USA). In parentheses under the table headers are the numbers of classrooms in the sample (and, therefore, the numbers of responding teachers). The four panels contain box plots for the content areas—arithmetic, algebra, geometry, and measurement. The values in parentheses indicate the number of cognitive items about which teachers responded. So, this set of box plots contains information on more than 1,400 eighth-grade classrooms and 139 items.
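Readers who want to experiment with displays of this kind can start from the following minimal sketch, which draws per-system box plots with Python's matplotlib. The ratings it plots are fabricated for illustration; the actual SIMS data are not reproduced here.

import numpy as np
import matplotlib.pyplot as plt

# Fabricated classroom-level OTL ratings (percentage of items taught),
# 50 hypothetical classrooms per system; illustrative only.
rng = np.random.default_rng(0)
systems = ["BFL", "CBC", "ONT", "FRA", "JPN", "NZE", "THA", "USA"]
otl_by_system = [rng.uniform(0, 100, size=50) for _ in systems]

fig, ax = plt.subplots(figsize=(8, 4))
ax.boxplot(otl_by_system, labels=systems)  # one box per system
ax.set_ylabel("OTL rating (%)")
ax.set_title("Opportunity to learn by system (illustrative data)")
plt.tight_layout()
plt.show()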

The algebra content provides an example of the power of the box plots to depict the differences among these eight educational systems. Median OTL ratings for France and Japan (about 85%) are at about the median for the eight educational systems; opportunity to learn is rather homogeneous. OTL ratings for Canada Ontario, New Zealand, and the United States range from near 0% to 100% and have the lowest medians among these systems. There is substantial heterogeneity among systems on the OTL indicator.

Geometry stands out as a content area where teachers' OTL ratings are extraordinarily varied both within and across systems. In each of the systems (except Belgium Flemish, where geometry was not widely taught) OTL variation is large and central values are relatively modest. There is rather more agreement about OTL in the arithmetic arena.

Box Plots for Class Types in the United States. The box plots on the inside back cover display OTL by class type in the United States. Again there are 139 mathematics items across four content areas. The new variable, class type, was determined according to the textbook used in a classroom. This display includes 28 Remedial (11%), 165 Regular (64%), 31 Enriched (12%), and 35 Algebra classrooms (14%).

These box plots portray the differentiated mathematics curriculum in the United States in the 1980s. This differentiation reflects tracking in the United States, where students are tracked more and earlier than in the other systems. Different courses with different amounts and types of content are offered to eighth-grade students enrolled in the same school. This finding was identified in SIMS and replicated in TIMSS results. The most startling comparison is in the algebra content area, where virtually all of the Algebra classes are exposed to the content of the majority of the SIMS algebra items, while the Remedial classes see very little algebra content, and Regular classes range from about 0% to 100% coverage. In general, the Enriched and Algebra class types are exposed to more mathematics content than are the other class types and may be exposed to material to which students in Remedial and Regular class types are never exposed.

It is interesting to imagine the display for algebra for the United States in the first set of box plots (i.e., the systems comparisons) as a composite of the algebra content box plots for the four class types in the second set (i.e., the U.S. class types comparisons). The heterogeneity of algebra content coverage in the United States portrayed in the first set may be explained, in part, by the differentiation of content according to class type in the second set.

These comparisons of opportunity to learn in the United States and other educational systems, along with other SIMS findings, foreshadowed one of the most popular findings from TIMSS: that the U.S. mathematics curriculum is "a mile wide and an inch deep." SIMS suggests, according to some pundits, that the U.S. curriculum in the 1980s was "a series of one-night stands."

In Closing

Best wishes for a healthy, productive, and enjoyable academic year!

Steve Ferrara
Editor

References

Schmidt, W. H., Wolfe, R. G., & Kifer, E. (1993). The identification and description of student growth in mathematics achievement. In L. Burstein (Ed.), Second International Mathematics Study: Student growth and classroom processes in lower secondary school. London: Pergamon Press.

Tufte, E. R. (2001). The visual display of quantitative information (2nd ed.). Cheshire, CT: Graphics Press.
