
State of Israel
Ministry of Education

RAMA
The National Authority for Measurement & Evaluation in Education

Assessment in the Service of Learning:

Theory and Practice

Professor Michal Beller
Director-General, RAMA

March 2013


Bet Avgad, 5 Jabotinsky Road, Ramat Gan, 2nd Floor, 5252006, ISRAEL Tel: +972-3-5205555 ▪ Fax: +972-3-5205509 ▪ e-mail: [email protected]

http://rama.education.gov.il

Contents

Introduction
Large-Scale Tests in Educational Systems and their Frequency
Updating the Measurement and Evaluation Format: Integrating External and Internal Assessment
School-Based External Assessment
    The Meitzav
    What Can be Learned from the Meitzav?
    Trends over Time and Comparison of Assessment Scores
Sample-Based National Assessment
    International Studies
    National, Sample-Based Assessment – Mashov Artzi
    National Sample-Based Monitoring of School Violence Level
School-Based Internal Assessment
    Internal Meitzav
    Off-the-Shelf Tests, Formative Tests and Banks of Performance Tasks
The School-Based Assessment Coordinator
Evaluation of Teaching Staff and Investigation of Teaching Practice
    Evaluation of teaching staff
    International Teacher Survey - TALIS
The Use of Test and Survey Data for Research and for Program and Project Evaluation
Assessment in the Service of Learning – Summary
Bibliography


Assessment in the Service of Learning
Professor Michal Beller, Director-General, RAMA

Introduction

A review of the goals of education ministers in Israel over the years reveals three meta-goals of the education system: realization of the potential of every student (scholastic, creative and in the realm of values), narrowing education gaps, and maintaining a safe learning environment.

Assuming widespread agreement with regard to these goals, the question arises as to how we know whether they have been achieved. How are parents to know whether the education system has provided their children with the tools necessary to function successfully as active citizens in society? How is each of the partners in the educational process (teachers, principals, and position holders at various levels of the education system) to know whether they have satisfactorily fulfilled their role and whether the needs of students from different backgrounds have been appropriately addressed? How are we to detect educational gaps, and how are we to know whether they have been narrowed? How is the public to know that the future generation of Israeli children has been adequately prepared to face the challenges of the 21st Century? How can the public be sure that the extensive resources made available to the education system are being used judiciously and have the planned effect? How can the benefit of increasing the State's investment in education, even at the expense of other important needs, be proven to policymakers? To examine the extent to which these goals have been achieved, professionally designed, valid measurement and evaluation tools are needed.

Measurement and assessment are complex issues in all sectors – in the business sector, and even more so in the public sector. In the education sphere, the implementation of measurement and assessment is that much more difficult: learning and teaching processes are inordinately complex, diversity among students is immense, there are different pedagogical approaches to achieving educational goals, many programs are implemented in the education system, and more often than not the outcomes and results are realized only after years of investment.


In education there is no single, "one size fits all" answer suited to all needs, nor is there a single formula for implementing measurement and assessment processes. Different pedagogical approaches and programs require, accordingly, different measurement and assessment models. Therefore, the optimal education process must be accompanied by measurement and assessment processes whose results guide educators and assist them in deciding what is more and what is less suited to their students, what they should change and improve, and what is best maintained as is.

The National Authority for Measurement and Evaluation in Education (known by its Hebrew acronym, RAMA) was founded in 2005 to address the need for professional measurement, evaluation and assessment in the education system. The ideology underlying RAMA's activities rests on two principles: (a) assessment in the service of learning, and (b) provision of professional solutions that effectively integrate different measurement and evaluation components (for additional information about RAMA, see http://rama.education.gov.il).

Large-Scale Tests in Educational Systems and their Frequency

Over the last decades, national tests have been administered to large numbers of students in educational institutions in various countries throughout the world, including Israel. The significance and importance of these tests are growing, leaving their mark on all parties in the learning and educational process.

Large-scale assessments and professional surveys are vital instruments for monitoring and tracking student achievements, as well as the extent to which the education system has been successful in imparting knowledge and values to all learners within the system. Through objective and professional analysis of assessment tests it is possible to identify gaps that need to be rectified and to highlight areas that may have been overlooked and in which even greater resources should be invested. Assessment tests may also spur learning, foster responsibility and accountability on the part of those in charge of teaching, and enhance the congruence between teaching and Education Ministry policy as reflected in its curricula and in its frameworks for teacher training and professional development.


Alongside the many benefits inherent in the use of these test systems, it has been acknowledged that over time they may be accompanied by negative effects on the education system and on the quality of pedagogical processes in schools. These negative effects intensify as the tests become more central and important in the eyes of those at all levels of the system, and particularly as they are perceived as "high-stakes"1 by principals, teachers and students. Professor Donald T. Campbell (Campbell, 1979), one of the greatest scholars in the social sciences, wrote about this tendency and its ramifications:

"The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor… achievement tests may well be valuable indicators of general school achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways."

These negative effects have been documented for quite some time in the research literature (for example, Campbell, 1979; Hamilton, 2008; Koretz, 2005; Koretz & Hamilton, 2006; Nichols & Berliner, 2007) and reported by RAMA in other publications.

Among the adverse impacts of improper implementation of large-scale tests are:

• Diverting teaching resources from subjects that are not included in the national assessments in favor of subjects that are included.

• Focusing on test preparation through intensive, test-oriented study. This type of study is often based on memorization and repetition and involves fewer of the higher-order thinking skills critical for comprehension and long-term mastery of the study material and for generalization to additional fields of knowledge. Furthermore, this type of study may bore students and erode their joy of learning, curiosity and motivation.

• In extreme cases, as a result of the pressure felt by some schools and their desire to raise their achievements at any cost, some may resort to illegitimate actions that harm test integrity (for example, keeping weak students away from school on the test day, attempting to obtain information in advance about test topics and questions in order to teach them in class, helping students during the test, etc.). Even worse, these actions send an undesirable educational message to students.

1 A test system is "high stakes" when any of the entities in the system feels threatened by the results and perceives himself or herself as someone who may be hurt if results are low or may benefit if results are high. The stakes increase when school supervision is perceived by schools as threatening prior to the tests, and when it tends to use test results to reprimand or to impose sanctions of some kind.

Besides damaging the quality of pedagogical processes, tests perceived as "high stakes" may also impair the validity of test results and the ability to draw conclusions that will serve the system and promote its improvement. Thus, improved test results achieved through intensive preparation in the sample of schools tested in a given year do not necessarily indicate improvement in the education system as a whole, as they do not represent an increase in the knowledge level of all students. This improved achievement, even if it appears to enhance the public image of the education system or parts of it, is in many ways only cosmetic and worthless to policymakers and decision makers who strive to create real and sustainable change in the system as a whole over time. Only test results collected under "true conditions", without special preparation, can testify to the state of the system. This is the only way decision makers at different levels of the hierarchy can become cognizant of strengths and weaknesses and take action to improve the system.

The education system, in cooperation with RAMA, must act to minimize these negative phenomena, primarily through a cultural change which upholds "measurement in the service of learning", whereby measurement and assessment are intended to serve learning and not vice versa.

For the system to improve, it must ensure that its measurement and assessment tools provide data that are as valid as possible, i.e., results that correctly and accurately reflect the condition of the system and do not stem from unique, targeted efforts designed solely to raise test grades. Steps should be taken to eliminate negative phenomena by sending the correct message to the field and by reducing the pressure and the threat created when external test results serve as the sole evidence of the quality of school pedagogical processes.


Updating the Measurement and Evaluation Format: Integrating External and Internal Assessment

The perpetual dilemma facing the education system is the choice between independent, internal measurement carried out by the school (partially free of the pressures described above, and better suited to the students and the material studied in each educational institution) and external measurement, which is standardized, professional and centralized. In other words, there is a constant tension between decentralized and centralized measurement and assessment. Some maintain that independent internal assessment is less intrusive and empowers school principals and the teaching staff more than external assessment does. However, considerations of responsibility, accountability, transparency, professionalism and viability, and mainly the ability to make valid comparisons between schools (or between sectors, countries or other groups), including multi-year comparisons, require that part of the assessment be centralized, external and carried out by a professional entity responsible for educational measurement and assessment.

In order to integrate the two approaches, neither of which can address all needs on its own, and to enjoy the benefits of both, the format of Israel's national assessment was updated in 2007. The new format was designed by RAMA in collaboration with various entities in the Ministry of Education and in consultation with school principals and many teachers. The format is intended to provide a professional solution for educational measurement and assessment to all stakeholders in the education system: schools and various external entities. This new format was designed to improve the then-existing assessment system from which it was derived and to address its shortcomings, and was based on the following principles:

• Implementation of a culture of "assessment in the service of learning", in which measurement is intended to support continued improvement of learning through congruence with learning goals and the school vision, and which is based on the understanding that tests are not a goal in and of themselves, but rather a tool in the service of learning.

• Informed integration of internal and external assessment, and of formative assessment (assessment in the process of, and for the purpose of, learning) with summative assessment (assessment of learning products).

• Maximum decentralization of assessment, while ensuring the use of professional tools provided to schools by RAMA.

• Empowerment of school principals and teachers.

• Reduced pressure and reduced frequency of external tests.

• Preference for external tests administered to a representative sample of students over external tests encompassing all students.

The new format established by RAMA combines three elements: sample-based external assessment (international and national), independent school-based assessment using standardized, external tools, and internal school-based assessment. In order to maintain a balance between the various elements, significant reinforcement of internal school-based assessment is required, and to this end more extensive teacher training in this field is needed.

The internal Meitzav (detailed below), established concurrently with the reduced frequency of external Meitzav tests, is an important element of the new format. It serves teachers and management, and its results are not reported.

School-Based External Assessment

The Meitzav

The Meitzav (the Hebrew acronym for "School Growth and Efficiency Measures") is a system that includes student achievement tests as well as questionnaires designed to glean information about the school climate and pedagogical environment (administered to principals, teachers and students). The purpose of the Meitzav at the school level is to provide school principals and teaching staff with a tool for planning and utilizing resources to realize student potential, improve the pedagogical climate and enhance instruction in the school. At the system level, the Meitzav is intended to provide a snapshot of the mastery level of Israel's students in the curricula of four core subjects, and to serve professional entities and decision makers in the Education Ministry in setting policy on various educational issues, including climate and the pedagogical environment.

The Meitzav achievement tests focus on four core subjects: Native Language (Hebrew/Arabic), Mathematics, English, and Science and Technology. Tests are administered to students at two grade levels, five and eight, and the Native Language test (Hebrew/Arabic) is administered in grade two as well. The achievement tests are designed in full congruence with the curricula in each of the subjects and are intended to examine the extent to which elementary and junior-high school students meet the level expected of them according to these curricula. Examples of these tests can be found on the RAMA website in the "School Assessment" tab, on the topic of "Off the Shelf Assessments".

Each school belongs to one of four "Meitzav Clusters" – four equal and representative groups of elementary and junior-high schools in Israel (Clusters A, B, C, and D). Each cluster of schools is selected such that it will represent all schools nationwide.

Meitzav tests, in the format set by RAMA when it was established, are administered in each of the four core subjects in a four-year cycle. Schools take external tests (the external Meitzav) once every two years in only two subjects – Mathematics and Native Language (Hebrew/Arabic), or English and Science and Technology – and two years later are tested in the other two subjects. In a year in which a school is not tested externally in a given subject, it is tested in that subject through an internal test (the internal Meitzav), which is the same test administered that year as the external Meitzav (for details see the chapter on the Internal Meitzav). Thus each school takes the external Meitzav test in each subject (in the relevant grades) once every four years, and the internal Meitzav test in the same subject in the following three years. For further information see the RAMA website, "Meitzav" tab, on the subject "General Background – External and Internal Meitzav".


Table Number 1 – Cycles of Meitzav clusters by subject and year
(Ext = external Meitzav, Int = internal Meitzav. The years 2006/7-2009/10 constitute Cycle 1; the years 2010/11-2013/14 constitute Cycle 2.)

| Cluster | Knowledge Field | 2006/7 | 2007/8 | 2008/9 | 2009/10 | 2010/11 | 2011/12 | 2012/13 | 2013/14 |
|---------|-----------------|--------|--------|--------|---------|---------|---------|---------|---------|
| A | Science & Tech  | Ext | Int | Int | Int | Ext | Int | Int | Int |
| A | English         | Ext | Int | Int | Int | Ext | Int | Int | Int |
| A | Math            | Int | Int | Ext | Int | Int | Int | Ext | Int |
| A | Native Language | Int | Int | Ext | Int | Int | Int | Ext | Int |
| B | Science & Tech  | Int | Ext | Int | Int | Int | Ext | Int | Int |
| B | English         | Int | Ext | Int | Int | Int | Ext | Int | Int |
| B | Math            | Int | Int | Int | Ext | Int | Int | Int | Ext |
| B | Native Language | Int | Int | Int | Ext | Int | Int | Int | Ext |
| C | Science & Tech  | Int | Int | Ext | Int | Int | Int | Ext | Int |
| C | English         | Int | Int | Ext | Int | Int | Int | Ext | Int |
| C | Math            | Ext | Int | Int | Int | Ext | Int | Int | Int |
| C | Native Language | Ext | Int | Int | Int | Ext | Int | Int | Int |
| D | Science & Tech  | Int | Int | Int | Ext | Int | Int | Int | Ext |
| D | English         | Int | Int | Int | Ext | Int | Int | Int | Ext |
| D | Math            | Int | Ext | Int | Int | Int | Ext | Int | Int |
| D | Native Language | Int | Ext | Int | Int | Int | Ext | Int | Int |
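The rotation in Table 1 follows a simple four-year pattern: in any given year each cluster takes the external Meitzav in one subject pair (Science & Technology together with English, or Mathematics together with Native Language) and takes the internal version of a subject in the other three years of its cycle. The following Python sketch is offered only as an illustration of that pattern; it is not a RAMA tool. The cluster labels, subject names and base year are taken from the table, while the lookup function itself is an assumption about how one might encode it.

```python
# Illustrative sketch of the external/internal rotation implied by Table 1.
# Not an official RAMA tool; it simply encodes the table as a lookup.

BASE_YEAR = 2006  # the school year 2006/7 is treated as offset 0 in the cycle

# Offset within the four-year cycle at which each cluster takes the EXTERNAL
# Meitzav in each subject (subjects rotate in pairs).
EXTERNAL_OFFSET = {
    ("A", "Science & Tech"): 0, ("A", "English"): 0,
    ("A", "Math"): 2,           ("A", "Native Language"): 2,
    ("B", "Science & Tech"): 1, ("B", "English"): 1,
    ("B", "Math"): 3,           ("B", "Native Language"): 3,
    ("C", "Science & Tech"): 2, ("C", "English"): 2,
    ("C", "Math"): 0,           ("C", "Native Language"): 0,
    ("D", "Science & Tech"): 3, ("D", "English"): 3,
    ("D", "Math"): 1,           ("D", "Native Language"): 1,
}

def meitzav_mode(cluster: str, subject: str, school_year_start: int) -> str:
    """Return 'External' or 'Internal' for a cluster and subject in a given year."""
    offset = (school_year_start - BASE_YEAR) % 4
    return "External" if offset == EXTERNAL_OFFSET[(cluster, subject)] else "Internal"

# Example: Cluster C, Math, school year 2010/11 -> 'External', matching Table 1.
print(meitzav_mode("C", "Math", 2010))
```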

Internal tests are accompanied by pedagogical material for teachers. The school can use internal Meitzav grades as it sees fit for improvement and internal learning, and as part of the annual assessment provided to students. The process of grading internal Meitzav tests enhances teachers' professional development, as it exposes them to professional indicators/rubrics that define what is expected of students and enables them to learn from students' answers about their knowledge and comprehension levels. Internal Meitzav grades serve, as noted, only the school staff, and the school is not required to report them to any external entity (see the expanded discussion of the internal Meitzav below).

National norms, derived from the results of the external Meitzav tests, are reported to the schools that administer internal Meitzav tests. These norms help the school principal and the teaching staff interpret data obtained from the same tests administered internally that year.
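As a purely hypothetical illustration of this kind of norm-referenced reading of internal results (all figures below are invented; the real norms come from RAMA's external Meitzav reports), a school might position its own distribution against the published national figures:

```python
# Hypothetical sketch: positioning a school's internal Meitzav results against
# national norms published from the external administration of the same test.
# All numbers are invented for illustration only.

from statistics import mean

national_mean = 68.0  # e.g., a published national average score on this test
school_scores = [55, 71, 63, 80, 58, 74, 66, 69]  # invented scores for one class

school_mean = mean(school_scores)
at_or_above = sum(1 for s in school_scores if s >= national_mean)

print(f"Class mean: {school_mean:.1f} (national mean: {national_mean:.1f})")
print(f"{at_or_above} of {len(school_scores)} students scored at or above the national mean")
```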


The school climate and pedagogical environment surveys in the Meitzav are designed to provide a detailed picture of the school climate and of pedagogical processes as revealed through student questionnaires and interviews with teachers. The questionnaires provide comprehensive and relevant information about important dimensions in this area, including: students' level of motivation; the relationship between teachers and students; violent events and students' sense of safety and protection; teamwork among teachers; and more. These dimensions are based on insights gathered from several sources: focus groups of teachers and principals, discussions with Education Ministry officials, consultation with academic scholars and reviews of the current literature. The questionnaires are administered to fifth through ninth graders and to elementary and junior-high school teachers. In the 2008/9 school year a pedagogical and school climate questionnaire was administered for the first time to high school students. A school survey for high school teachers is in advanced stages of development. For more information see the RAMA website, "Climate and Pedagogical Setting Surveys" tab.

At the end of the external Meitzav administration process, RAMA produces a comprehensive school Meitzav report (covering school achievements and climate), as well as a detailed national Meitzav report.

What Can be Learned from the Meitzav?

The importance of using the Meitzav as a working tool stems from the need to obtain an updated, diagnostic picture of the level of implementation and fulfillment of various system goals (at the student, class, school, and overall education system level) in order to realize the potential for continued improvement of schools and the education system.

At the school level – the process in which insights are gleaned from detailed school reports enables the school staff to examine itself and to view the school as a holistic system from various aspects: achievements, climate and pedagogical setting.

The findings presented in the reports enable the school staff to identify strengths and weaknesses in the subjects tested, to identify topics or abilities that were not stressed, to propose hypotheses with respect to the findings, to learn what additional data should be collected to confirm or reject hypotheses, to examine the reasons for difficulties found (for example, why students have difficulty in writing) and to design long-term programs that will address these difficulties. Effective use of Meitzav findings can help schools design mechanisms to improve school-based processes and plan long-term steps to sustain improvement over time.

At the system level – the external Meitzav tests provide data about the education system and schools across different cross-sections that, similarly to the Meitzav at the school level, help identify difficulties and gaps that need to be addressed in order to improve the system. Thus, for example, achievement data are compared by socio-economic level and across different sub-groups (for example, by sector or gender). Comparisons of this kind provide decision makers with information about gaps in the education system that require intervention. Since 2008, valid comparisons of Meitzav grades across different years (see below) have been possible, improving the ability to monitor system quality over time. Moreover, Meitzav data can serve to evaluate the effectiveness of national educational programs implemented in the education system. The annual reports are published on the RAMA website, "RAMA Publications" tab, on the topic of "System and Local Government Meitzav Data".

Trends over Time and Comparison of Assessment Scores

Two of the more common, though not necessarily informed, uses of national external test grades are the comparison of achievements over time and the ranking of schools ("league tables"). However, not every comparison is valid or valuable; for example, does a grade of 85 on an easy test in a particular year indicate an improvement compared to a grade of 75 on a more difficult test the previous year? Of course not! Unprofessional and irresponsible rankings of this nature between schools may thus cause serious damage.

RAMA is working with all partners in the educational process and with the general public to instill the understanding that rankings have significance, if at all, only among relevant populations, and that comparisons across years have value only when grade scales are calibrated. Calibration is designed to neutralize the effect of the differing difficulty levels of tests administered in different years in the same subjects, allowing for valid comparisons of test achievements. The need to calibrate scores from year to year stems from the fact that test forms differ from year to year in the difficulty level of their questions. The grades on each year's raw grade scale are simply the total points accumulated by examinees for their answers on a given test form. It is therefore impossible to determine, for example, whether a rise in the raw score average is the result of an easier test form in a particular year or of a rise in achievement among students (or both).

In order to solve this problem and to examine Meitzav grade trends over time, in 2008 RAMA established a statistical calibration scheme that translates each year's test grades to a new comparative scale – the multi-year Meitzav scale. The calibrated scale is designed to allow valid comparisons of Meitzav test scores over time. The calibration scheme takes into account differences in the difficulty levels of tests in different years, "positioning" grades on a new measurement scale and allowing for multi-year comparisons. The multi-year Meitzav scale was designed such that in the base year, 2008, the grade average was 500 and the standard deviation was 100.
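The sketch below illustrates only the reporting convention implied by that design, namely expressing scores on a scale anchored to a mean of 500 and a standard deviation of 100 in the 2008 base year. It is not RAMA's calibration procedure, which must first equate test forms of differing difficulty, and all numbers in it are invented.

```python
# Simplified sketch of the 500/100 reporting convention of a multi-year scale.
# NOT RAMA's actual equating procedure; real calibration must first adjust for
# differences in test-form difficulty before scores are placed on this scale.

from statistics import mean, pstdev

def to_multiyear_scale(scores, anchor_mean, anchor_sd,
                       target_mean=500.0, target_sd=100.0):
    """Linearly map scores to a scale with the given target mean and SD,
    using base-year (anchor) parameters."""
    return [target_mean + target_sd * (s - anchor_mean) / anchor_sd for s in scores]

# Invented base-year (2008) scores used as the anchor.
base_2008 = [52, 61, 70, 47, 66, 58, 73, 55]
m, sd = mean(base_2008), pstdev(base_2008)

scaled = to_multiyear_scale(base_2008, m, sd)
print([round(x) for x in scaled])
# By construction, the base-year scores now average 500 with an SD of 100.
```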

The calibration procedure and the implementation of a multi-year scale are standard practice in various national testing systems (for example, the Israeli psychometric entrance exam and the NAEP in the U.S.A.) and in the tests administered as part of international studies. However, it should be stressed that grades on the new Meitzav scale cannot be translated into grades on other test systems; they only allow for multi-year comparison of Meitzav achievements within each subject and grade level.

A comparison of Meitzav results for the years 2008 and 2012 points to a general trend of improvement in the core subjects: in Native Language, Science and Technology, and English in grades five and eight, and in Mathematics in grade five, in both the Hebrew-speaking and Arabic-speaking sectors. For grade five, a comparison with the test results of 2007 was also possible.

• In grade five – over the six years 2007-2012, a moderate to large cumulative increase was recorded in all four subjects. The size of the increase varies by knowledge area and language sector, and ranges between 30 and 60 points on the multi-year scale.

• In grade eight – over the five years 2008-2012, a cumulative increase was recorded in three subjects: a slight increase in English (about 10 points) and a large increase in Native Language and in Science and Technology (about 50 points). Student achievements on the 2012 Mathematics test are similar to those recorded in 2008.

• The upward trend in achievements over the years is also reflected separately in each of the three socio-economic groups (low, medium and high socio-economic background) in each of the subjects.

• For most subjects, no change over time (narrowing or widening) was recorded in the achievement gaps in favor of students from a higher socio-economic background. The exceptions are the Hebrew and English tests for grade five: in 2012 the gap in achievements on these tests narrowed among students in Hebrew-speaking schools.

The Meitzav also includes an examination of the school climate and pedagogical environment. In this area the findings indicate stability over time, with several positive trends (including an increase in students' reported sense of safety and protection, an increase in reported appropriate behavior among students in class, a slight decline in reported violence in elementary schools and an increase in the reported use of computer-mediated communication for learning purposes).

Sample-Based National Assessment

International Studies

Among its roles, RAMA is responsible for conducting the international studies in Israel. These studies make it possible to compare student achievements across many countries in several subjects, as well as to examine other educational issues. Furthermore, the results of these studies enable comparisons between different sectors and different population groups within each participating country. The tests are administered in a fixed cycle, once every few years, and allow for the study of trends over time (calibration is built into these tests as well). The international organizations that develop the tests are among the leaders in the field of evaluation and measurement. The tests, translated into different languages, are meticulously designed and have high levels of reliability and validity. Each of the tests and questionnaires is designed according to a detailed and rigorous theoretical framework, drafted by experts in the tested subject and in pedagogy from around the world.


In each country the tests and questionnaires are administered to a representative sample of the population, and scores are reported only at the country level, not at the class or school level. Israel has participated in a series of international studies in recent years, including PISA (Reading, Mathematics and Science literacy for 15-year-olds), TIMSS (Mathematics and Science for grade eight) and PIRLS (Reading Literacy for grade four).

These studies provide reliable information about the Israeli education system from an international perspective. This information may be of great importance to policymakers who strive to obtain an accurate picture of the weaknesses and strengths of the education system in Israel through objective comparison with other education systems around the world. Furthermore, participation in international studies enables Israel to learn about new and contemporary approaches in the subjects tested and to examine its curricula in relation to curricula in other countries. Thus, for example, the need to strengthen literacy in Science, Mathematics and Language was identified through the PISA study. The studies also enable participants to learn from the models of successful education systems in different countries, through comparisons of high- and low-achieving countries and of countries that differ in the size of education gaps between and within schools, and through examination of the relationship between student achievement and different background variables (such as parents' education, socio-economic status, student attitudes towards the subject, etc.). For example, this is how the world learned about the successful education system of Finland, which has very high achievements in the PISA study. The success of the Finnish education system has been attributed mainly to its social and cultural norms, accompanied by a reform based on the empowerment of school teachers and principals.

The media and policymakers often tend to emphasize the rank achieved by each country in the country rankings, comparing it to the position achieved on previous tests and presenting the ranking as the main finding of the international studies. However, the main value of these tests lies in the opportunity they offer to conduct comparisons and examinations within each country separately, addressing the objectives reached in each subject and the relationship between achievement and background variables and attitudes related to the various study topics. In Israel, for example, study results allow for comparisons between student achievements in the Arabic-speaking sector and the Hebrew-speaking sector, and also between other sub-groups (boys and girls, for example). Above all, through these studies the achievements of the State of Israel can be examined over time, owing to the cyclical pattern of these tests, which are conducted in a similar format once every three to five years while maintaining calibration of the grade scale from one cycle to the next.

The PISA 2009 results, published in December 2010, showed a significant improvement in the achievements of students in Israel in Reading Literacy (and stability in their achievements in Mathematics and Science Literacy). The results of PISA 2012 are expected to be published in December 2013. The recently published results of PIRLS 2011 and TIMSS 2011 indicated a significant improvement in the achievements of students in Israel. We are currently preparing for participation in the PISA 2015 study, which will be computerized in its entirety. Further details about each of these tests can be found on the specific website of each of the international studies and on the RAMA website, under the "International Tests" tab.

National, Sample-Based Assessment – Mashov Artzi

RAMA is currently launching a sample-based national assessment, known as the Mashov Artzi, which will focus on a different subject each time and will provide system-level information about educational achievements in the Israeli education system, as well as information about the specific pedagogical context. The Mashov Artzi will enable policymakers to examine various subjects over and above those tested in the Meitzav. Similar national systems are implemented in several countries; the most well-known and advanced among them is the National Assessment of Educational Progress (NAEP), which has been administered for decades in the USA.

For each subject to be tested in the Mashov Artzi, a representative sample of schools will be selected to participate. The Mashov Artzi in each subject will include achievement tests and questionnaires intended for students, teachers and school principals. It will provide information on learning outcomes covering a wide range of content, skills and thinking strategies relevant to each subject, based on a relatively small sample of students. The multiplicity of test forms in each subject will allow for wide and in-depth coverage of the subject and will provide reliable information on the sub-topics included in it. The questionnaires will allow for the collection of information on various variables describing and characterizing the context in which the subject is taught (e.g., prevalent teaching practices, the professional development of subject teachers, school policy with regard to the subject, etc.). This information will then contribute to interpreting and explaining the learning outcomes.

The Mashov Artzi will enable policymakers to learn about the success of different instructional and educational methods, to examine gaps found between sectors, and to examine trends over time. In light of the study goals, the findings of the Mashov Artzi will be analyzed and reported at the national system level only. Findings at the student, class or school level will not be analyzed or reported.

The Mashov Artzi will be conducted cyclically, once every few years, and each knowledge field will be tested on its own cycle, in order to track indicative trends over time. The national Mashov will be administered for the first time in the 2014 school year, and the first knowledge field to be examined will be Geography. It will be administered in ninth grade, towards the end of the mandatory studies in this subject. The study tools (achievement tests and questionnaires) will be administered in a computerized setting. The computerized tests in Geography will assess, among other things, geographical skills in the use of technological tools (such as interactive maps). In this way an attempt will be made to collect information about certain skills among those required of a citizen of the 21st Century. Further details about the national Mashov in Geography, including the framework document, can be found on the RAMA website, "National Tests" tab, on the subject "Sample-Based National Assessment/Mashov". Additional knowledge fields in which the national Mashov will be conducted have yet to be determined.

National Sample-Based Monitoring of School Violence Level

School violence is one of the central issues on the public agenda with respect to the education system in Israel. In light of the extensive discourse on this topic, and in order to identify trends related to violence in the education system, a need has arisen to monitor the level of school violence.

In 2009 and 2011 RAMA conducted a large-scale survey for monitoring violence among a national, representative sample of students in grades four through eleven, in both the Hebrew-speaking sector and the Arabic-speaking sector. The aim is to continue to monitor levels of violence using similar questionnaires, administered once every two years to a representative sample of students. The third monitoring survey is being conducted in 2013.

The questionnaires administered to students examined a series of violent and dangerous behaviors among students, which have been summarized in the following indices: severe violence, moderate violence, social violence, violence using digital media, verbal violence, violent gangs and bullying, sexual violence, alcohol and drug abuse, violence by and towards the school staff, cold weapons in school, violence on buses to and from school, absenteeism due to fear of injury, students' sense of safety and protection in school, and school efforts to prevent violence. The questionnaires were developed by a steering committee comprising personnel from RAMA, entities in the Education Ministry that deal with school violence (the Psychological and Counseling Services Division – SHEFI, the Youth and Society Department, and the two age divisions – elementary and secondary) and experts from academia.

For the majority of indices, a trend of improvement was evident between 2009 and 2011, at the different age levels and in the two language sectors. Stability was recorded in the remaining indices for these years. Improvement was especially evident (in other words, a decline in the rate of student reports) in the following indices: severe violence, social violence, violence by and towards school staff, sexual violence, violence on school buses and alcohol abuse.

The improvement characterizes both language sectors, and is especially evident in Arabic-speaking schools, particularly in grades 4-6 and 7-8. In the reports of 10th-11th grade students in Hebrew-speaking schools, stability was recorded for the most part during these years, while in the reports of their counterparts in these grades in Arabic-speaking schools a trend of improvement was recorded for the most part.

In each of the years 2009 and 2011, the older the students, the fewer the reports of most types of violence – with the exception of violence towards school staff, bringing cold weapons to school, and alcohol and drug abuse.

In each of the years 2009 and 2011, the rate of reporting of most of the negative behaviors examined by the questionnaire was higher among students in Arabic-speaking schools than among their peers in Hebrew-speaking schools. This holds true except for verbal violence and alcohol consumption, for which the gap is in the opposite direction.

Further information can be found on the RAMA website in the "Research/Studies and Project Assessments" tab, under the topic "Violence Monitoring".

School-Based Internal Assessment

Internal assessment is carried out continuously and, by definition, is performed by the school staff and on its initiative. The main goal of school-based internal assessment is to promote student learning. The data and information collected through internal assessment help teachers and the school management identify students' strengths and weaknesses with respect to expected achievements, and guide them regarding students' needs and the adaptation of instruction to those needs. School-based internal assessment is based on an approach of assessment for learning (known by its Hebrew acronym, Halel), which is intended to answer two key questions: Where does the student stand on the way to achieving the learning objectives? What steps are required to promote learning and realize its objectives?

The assessment process is based on gathering information and evidence from a variety of sources and through a range of tools (tests, performance tasks, assessment assignments), together with an integrative interpretation of the evidence. The interpretation and conclusions then constitute a foundation on which to design appropriate interventions aimed at achieving the learning objectives.

RAMA contributes to the reinforcement of school-based internal assessment by providing professional assessment tools accompanied by pedagogical materials (for example, detailed indicators/rubrics, explanations of mapping questions and definitions of the tested abilities, examples of common mistakes and suggestions for further instruction activities, etc.).

Various tools available to teachers for school-based internal assessment are as follows:

Internal Meitzav

RAMA provides schools with the professionally developed external Meitzav tests for use in the context of school-based internal assessment. The internal Meitzav tests are intended to be an integral part of the school-based internal assessment routine and to complement the other internal assessment tools. The Bulletin of the Director-General emphasizes that internal Meitzav grades are intended to serve the school staff exclusively, and that there is therefore no requirement to report the results to any external entity. For further information see the RAMA website, "School Assessment" tab, on the subject of "Internal Meitzav".

The internal Meitzav test is based on the following principles:

• It is an objective, external, national test, with the psychometric qualities of reliability and validity, developed by RAMA in collaboration with professional committees. The test reflects the curriculum and the knowledge and skills expected of students in each core subject at the given grade levels. Because the test is scored internally by the school staff (with the help of indicators/rubrics and scoring tools), individual and group assessments of students' proficiency in each subject can be produced quickly.

• It enables comparison of student achievements to external norms (national, district, sector) gleaned from the external Meitzav test.

The benefits for schools from the internal Meitzav test include the following:

• Enhanced school-based internal assessment processes. Schools can gain a snapshot of their condition that combines information from external assessment sources and can be adapted to the school context.

• Data-based decision making. The school administration and teaching staff can gain insights from the grading process and from the results that will assist them in focusing on appropriate educational and learning goals, in alignment with the school vision.

• Adaptation to school needs. The school can use internal Meitzav grades as it sees fit, as part of annual student assessment.

• Reduction of the negative phenomena that often accompany the external Meitzav, for example the diversion of resources (study time and teaching staff) at the expense of other subjects, the distancing of weak students from school, and the reduced motivation of some students to take a test that "does not count" toward their individual grade.


Off-the-Shelf Tests, Formative Tests and Banks of Performance Tasks

RAMA provides a wide variety of class-based internal assessment tools: previous versions of achievement tests, formative exams and tests, test specifications and rubrics, and banks of performance tasks in different subjects. Information gathered with these tools can serve as the basis for developing intervention programs adapted to students' needs. The performance tasks designed and provided to schools by RAMA are intended to assess learning processes and products as well as complex learning abilities, including: a systems perspective, problem solving, taking a position, critical thinking, drawing conclusions, planning and identifying connections.

Following are additional tools available to schools for internal assessment:

Hebrew reading and writing test for grade one:
The purpose is to track students' proficiency in Hebrew reading and writing skills. Test results serve teachers in designing appropriate interventions.
The test comprises a series of tasks administered individually (teacher-student) during the school year.

Arabic reading and writing test for grade one:
The purpose is to track students' proficiency in Arabic reading and writing skills. Test results serve teachers in designing appropriate interventions.
The test comprises a series of tasks administered individually (teacher-student) during the school year.

Kit for assessing beginning reading in English for grade five:
The purpose is to identify students with difficulty in beginning reading in English and to identify particular difficulties that may be obstacles to reading acquisition. Test results serve teachers in designing appropriate interventions.
The test comprises two components: a screening test administered by the teacher to all students, and a diagnostic test administered individually.


Amit test in native language for grade seven (Hebrew and Arabic):
The purpose is to assess students' proficiency in reading comprehension and writing (Hebrew/Arabic as a first language). Test results serve teachers in designing appropriate interventions.
The test comprises three parts. Each part includes one reading text and test items covering reading comprehension, grammar and writing. The texts cover different genres, and the questions relate to skills such as locating information at low, medium and high levels of accessibility, linguistic structures, and vocabulary. All students take the first part and then proceed to a second part suited to their level of performance, as indicated by their results on the first part (a simple routing sketch appears after this list of tools).

Kit for assessing spoken language in Hebrew in grade eight:
The purpose is to assess students' proficiency in spoken Hebrew; the kit was originally developed as part of the internal Meitzav for native speakers of Hebrew in grade eight.
The kit covers three aspects of spoken language: reading aloud, reporting and group discussion. It includes assessment tasks, rubrics for assessing student performance and a teacher's guide.

Kit for assessing spoken language in English for junior high school:
The purpose is to assess students' proficiency in spoken English.
The kit includes five units relating to various aspects of oral social interaction and presentation. Each unit comprises assessment tasks, rubrics for teachers and students, suggestions for teaching activities that promote the development of spoken language, and a teacher's guide.

Kit for assessing Hebrew among immigrant students entering grades three through nine:
The purpose is to examine when and how immigrant students can be successfully integrated into homeroom classes – among their native Hebrew-speaking contemporaries – and participate in classes in the various subjects while receiving assistance.
The kit includes five sections: discourse, reading aloud, listening comprehension, reading comprehension and writing. The kit is accompanied by a teacher's guide with detailed guidelines regarding its structure, goals, administration dates, administration mode, duration and scoring.

Bank of performance tasks in "Culture and Heritage of Israel" for grades six through eight:
The purpose is to assess achievement in this school subject. Student performance on these tasks serves as the basis for planning and improving teaching processes and as a basis on which the teacher can provide effective feedback that promotes learning.
The kit includes 12 tasks reflecting four main themes: Jewish literature; the Jewish calendar and life cycle; the affinity of the Jewish people to the Land of Israel; and the image of the State of Israel as the state of the Jewish people. There are four tasks for each grade level, each deriving from one of the four themes. The kit is accompanied by a teacher's guide with detailed guidelines for administering and scoring the tasks, as well as recommendations for instruction.
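As a small illustration of the two-stage structure described above for the Amit test, the sketch below routes each student to a second-part form on the basis of a first-part score. The cut-off value and the form labels are hypothetical; the actual routing rules are defined in the test's accompanying materials.

```python
# Hypothetical sketch of two-stage routing of the kind used in the Amit test:
# every student takes Part 1, and the Part 1 score determines which version of
# Part 2 they receive. The cut-off and form labels below are invented.

def route_to_part_two(part_one_score: int, cutoff: int = 60) -> str:
    """Choose the Part 2 form matching the student's Part 1 performance."""
    return "Part 2 (higher level)" if part_one_score >= cutoff else "Part 2 (basic level)"

# Example: three students with different (invented) Part 1 scores.
for name, score in [("Student A", 45), ("Student B", 72), ("Student C", 60)]:
    print(name, "->", route_to_part_two(score))
```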

The use of off-the-shelf tests and tasks can contribute to the integration of assessment into learning processes, to improved planning of classroom instruction and to more effective decision-making processes in class and in school. Analysis of the results from the various tools enables teachers to plan intervention activities in line with the varied needs of their students, such as: creating learning groups according to the difficulties detected, selecting appropriate learning material, reinforcing topics that were not learned properly, and addressing mistakes and common misconceptions. Schools can also set priorities regarding resource allocation and the professional development of the teaching staff in subjects in which many students encountered difficulties.

The use of formative measurement and assessment tools and strategies requires a cultural change at the school and class level, in favour of a culture that focuses not only on grades but also on the learning process itself. Within such a culture, students receive feedback, support and assistance based on the assessment process. Most importantly, such a culture ensures that student assessment is used effectively.


The School-Based Assessment Coordinator

RAMA attaches great importance to the development of school-based assessment processes, to assist schools in defining the information they need in order to function optimally and to reap the utmost benefit from internal and external assessment. To assist schools in collecting valid data and making informed decisions based on these data, RAMA has acted to define a new position in the education system – the school-based assessment coordinator. Within the framework of two agreements between the Ministry of Education and the teachers' unions (Histadrut Ha'Morim and Irgun Ha'Morim), "Ofek Hadash" and "Oz Le'Tmura", the role of assessment coordinator is included in the list of role holders entitled to remuneration. According to the agreements, every school can appoint a school-based assessment coordinator, provided that he or she has teaching experience and a Master's degree in measurement and assessment or, alternatively, a Master's degree in another field together with a completed academic specialization in measurement and assessment.

The role of the school-based assessment coordinator is to take the lead in establishing a school-based assessment culture, in collaboration with other position holders on the school staff and under the supervision of the school principal. A school-based assessment culture assumes that the school community is a learning community that views assessment as a central component of the teaching-learning-assessment process and uses different assessment tools, through mechanisms that foster school norms and values, as an integral part of its work and learning. The position entails dealing with assessment topics common to all schools, but mainly with unique topics that address the needs of the specific school and are congruent with its educational objectives, characteristics, world view and culture.

The implementation of a school-based assessment culture begins with the design of an organizational system and school-based mechanisms that allow for cooperation, alongside the development and management of a repository of internal assessment tools and the building of a school-based database. It is believed that the assessment coordinator will set systematic and continuous change in motion within the school, so that the school staff will make the connection between the curricula and the goals measured in each of the subjects tested. This information will help interpret test grades in a meaningful way that allows for the identification of student strengths and weaknesses and of achievement gaps between learners.

Utilizing this information will help monitor progress, examine changes in student performance, and identify topics and fields in the curricula that require strengthening, reinforcement or improvement.

The assessment coordinator advises school staff in all matters regarding the assessment of student achievements and progress using varied and innovative assessment methods. The coordinator acts to foster the perception of “assessment for learning”, which stresses the improvement and streamlining of school teaching and learning methods based on information gleaned from measurement data. It is also the coordinator’s responsibility to help interpret school-based assessment reports, articles and other data, and to draw out their implications at the level of the specific school. As part of the assessment culture fostered by the assessment coordinator, systematic information can be collected about the myriad educational projects in which the school is involved, which in turn will lead to system-level discussion and conclusions.

Evaluation of Teaching Staff and Investigation of Teaching Practice

Evaluation of teaching staff

Teacher2 and principal evaluation is an important component in the process of promoting the quality of teaching and learning (Isoré, 2009). For years principals have evaluated teachers and supervisors have evaluated principals, each in their own way: at different times, with different tools and with respect to different aspects.

As part of the “Ofek Hadash” reform, a new promotion scale was introduced for teachers and other educationalists3, vice-principals and principals, including a number of junctures at which summative evaluation is required. The need for summative evaluation as part of the career path of teachers and principals in effect creates a continuous, organized and uniform evaluation procedure, across the whole system, for promoting teachers, other educationalists, vice-principals and principals at different stages.

2 This article does not expand on the teacher evaluation model at the high school level, which was addressed in the “Oz LeTemura” reform agreement, as it is still in its infancy.

3 Other educationalists: kindergarten teachers, counselors, health professionals, and interns.

The teacher evaluation tool was developed by RAMA in 2010, in collaboration with representatives of the Ministry of Education and its districts. Tool development was accompanied by many focus groups comprising supervisors, principals and teachers, and drew on existing tools from Israel and other countries. The tool was designed to reflect the complexity of the teacher’s work and to create a common language among all Ministry of Education entities (supervisors, principals, teachers and Ministry personnel) in relation to all aspects of teacher performance.

The teacher evaluation tool is based on the following four meta-indicators:

Meta-indicator 1: Role perception and professional ethics refers to aspects related to identification with the teaching and educational role and commitment to the organization and the system.

Meta-indicator 2: Subject knowledge refers to knowledge of the subject and its teaching.

Meta-indicator 3: Educational and learning processes refers to aspects related to lesson design and organization, teaching methods, learning and assessment, and a supportive learning environment.

Meta-indicator 4: Partnership in a professional community refers to aspects relating to the teacher’s participation in the professional community of the school and that of the subject.

An important achievement of the tool development process is the system-wide agreement that these four meta-indicators provide a uniform and structured answer to the question: “Who is a good teacher?”

Teacher evaluation is carried out by principals based on systematic data collection, including documented observations of teachers and the gathering of other relevant material. Additionally, for evaluation purposes teachers are requested to complete a self-evaluation questionnaire based on the meta-indicators. The principal and teacher then meet for a feedback meeting aimed at discussing the gaps between their evaluations. The entire evaluation process is carried out on an online system developed specifically for this purpose. For further information see the RAMA website under “Evaluation of Educationalists and Administrators”, on the topics of “The Teacher Evaluation Tool” and "Evaluation of Other Educationalists".

The school principal evaluation tool was designed by RAMA based on the perception of the principal’s role outlined by the Israeli Institute for School-Based Leadership (“Avnei Rosha”), and includes the following four meta-indicators:

Meta-indicator 1: Formulating a vision and leading school-based policy refers to the following aspects of leadership: formulating an education-based vision, teaching and learning in a socio-environmental context, and designing a school-based work plan and tracking its implementation.

Meta-indicator 2: Improving teaching, learning and education refers to the following aspects of leadership: planning learning in the school, institutionalization of school-based assessment and learning, a school culture and climate that support learning and learners, and accountability.

Meta-indicator 3: Leading the school staff and its professional development refers to the following aspects of leadership: management of the professional development of the school staff, promoting a school-based professional community, cultivating school-based leadership, and professional ethics.

Meta-indicator 4: Mutual relations with the community refers to maintaining mutual relations with the community of parents and the community at large.

For further information see the RAMA website under “Teacher and Management Personnel Evaluation”, on the topic of “The Tool for Evaluating Principals and Vice-Principals”.

These two tools, the teacher evaluation tool and the school principal evaluation tool, include descriptions of teacher and principal behaviour in each of the four meta-indicators at different performance levels, which represent a professional development scale. The teacher or principal evaluation is determined based on the performance level on each of the detailed components of the tool in each of the meta-indicators. As such, these tools provide a framework for the professional development of all educators, and allow for the identification of needs at the individual, school and system levels. The other evaluation tools (for evaluating vice-principals, kindergarten teachers, interns, counsellors and health professionals) were developed based on the same principles described here, but their dimensions are suited to each particular population.
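To make the general structure of such rubric-based evaluation concrete, here is a minimal illustrative sketch. The component names, the four-level scale and the averaging rule are hypothetical assumptions for the example; they are not the actual components or scoring rules of the RAMA tools.

# Illustrative sketch of a rubric-based evaluation: each meta-indicator has
# several components rated on a performance-level scale, and the evaluation
# summarizes the levels per meta-indicator. Component names, the 1-4 scale
# and the averaging rule are hypothetical, not RAMA's actual tool.
from statistics import mean

# Hypothetical ratings (1 = beginning, 4 = exemplary) for one teacher.
ratings = {
    "Role perception and professional ethics": {
        "identification with the role": 3,
        "commitment to the organization": 4,
    },
    "Subject knowledge": {
        "knowledge of the subject": 4,
        "knowledge of its teaching": 3,
    },
    "Educational and learning processes": {
        "lesson design and organization": 2,
        "teaching methods": 3,
        "supportive learning environment": 3,
    },
    "Partnership in a professional community": {
        "participation in the school community": 3,
        "participation in the subject community": 2,
    },
}

def summarize(ratings):
    """Return the average performance level for each meta-indicator."""
    return {indicator: round(mean(components.values()), 2)
            for indicator, components in ratings.items()}

if __name__ == "__main__":
    for indicator, level in summarize(ratings).items():
        print(f"{indicator}: average level {level}")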

International Teacher Survey - TALIS

Israel participates in the Teaching and Learning International Survey (TALIS), conducted under the auspices of the OECD. The survey is designed to help decision makers formulate policy that defines suitable conditions for promoting effective teaching and effective schools. Participation in the study also allows for learning about differences in educational policy between countries, and about the influence of this policy on the school environment.

The study is based on a set of indicators that provide a solid and coherent description of a “healthy” school system, built on measurable variables of the functioning of education systems and their performance. The set of indicators is founded on a conceptual framework of “teaching at its best” and on the school-based conditions that make it possible. Thus, for example, the study provides information about teachers’ teaching practices, their beliefs and attitudes, the functioning and actions of school leadership, the evaluation and feedback that teachers receive regarding their work, classroom and school climate, and teachers’ sense of self-efficacy and job satisfaction.

The Use of Test and Survey Data for Research and for Program and Project Evaluation

The desire to integrate measurement and assessment as an integral part of educational programs reflects a growing trend in recent years, one that attaches importance to research-based educational interventions and to findings about their efficacy, which have become a prerequisite for program implementation. Under policy formulated by the United States government at the beginning of the millennium (as it appeared in the No Child Left Behind Act, 2001), educational interventions were required to prove that they were evidence-based and predicated on meticulous and systematic research that produced valid findings before receiving funding for implementation. However, the demand for a research base does not in itself guarantee the validity of the findings. The quality of the findings depends on the research method. The most valid findings are produced by controlled experimental studies based on randomized controlled trials (RCTs). Such studies are not common in education and are complicated to perform, due to the variance between schools, the complexity of schools as organizations, the difficulty of implementing programs uniformly and of controlling experimental conditions, and the ethical problems involved in conducting such experiments (providing certain programs to certain students while withholding them from others). In light of these difficulties, some researchers claim that the education field differs from clinical fields to such an extent that it is impossible to conduct randomized and controlled experimental studies. Others claim that the importance of the education field demands that a special effort be made to produce research-based findings, since otherwise the choice of policy, interventions or educational practices will be a matter of taste or fashion, forced on schools without examining costs and benefits.

Though the debate between these approaches continues, developments in the field cannot be stopped. The desire to produce research-based data that allow for informed decision making can also be realized through other research methods that are more accessible and simpler to perform, and that provide findings at a reasonable level of validity. Among these methods, the strongest is quasi-experimental research based on comparison groups matched in terms of their characteristics. At lower levels of validity are studies conducted with pre- and post-tests; after them come studies based on correlations and those based on case studies. At the bottom of the list are anecdotal studies, from which generalization is not possible and whose findings are therefore not considered valid by accepted standards.
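The gap between these levels of validity can be illustrated with a small simulation. In the hypothetical data below, all schools improve somewhat between two test administrations regardless of any program, so a naive pre-post estimate overstates the program's effect, while a comparison-group (difference-in-differences) estimate recovers it more closely. The numbers, effect sizes and variable names are invented for illustration only.

# Illustrative simulation only: comparing a naive pre-post estimate with a
# comparison-group (difference-in-differences) estimate. All numbers are
# hypothetical; a system-wide improvement of 3 points occurs in every school,
# and the program adds a true effect of 5 points in participating schools.
import random

random.seed(1)
TRUE_EFFECT, SYSTEM_TREND = 5.0, 3.0

def simulate(n_schools, in_program):
    """Return (pre, post) mean scores for n_schools schools."""
    pre = [random.gauss(60, 8) for _ in range(n_schools)]
    post = [p + SYSTEM_TREND + (TRUE_EFFECT if in_program else 0) + random.gauss(0, 2)
            for p in pre]
    return pre, post

prog_pre, prog_post = simulate(60, in_program=True)
comp_pre, comp_post = simulate(60, in_program=False)

mean = lambda xs: sum(xs) / len(xs)

# Naive pre-post estimate: attributes the system-wide trend to the program.
pre_post_estimate = mean(prog_post) - mean(prog_pre)

# Difference-in-differences: subtracts the change seen in the comparison group.
did_estimate = (mean(prog_post) - mean(prog_pre)) - (mean(comp_post) - mean(comp_pre))

print(f"Pre-post estimate:               {pre_post_estimate:.1f}")
print(f"Difference-in-differences:       {did_estimate:.1f}")
print(f"True program effect (simulated): {TRUE_EFFECT:.1f}")

In this contrived setting the pre-post estimate absorbs the system-wide trend, while the comparison-group design removes it; real quasi-experimental studies face the additional, harder problem of finding genuinely comparable schools.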

RAMA strives to produce the most valid data possible, and accordingly its studies are based on randomized controlled trial (RCT) research methods, quasi-experiments, pre-post studies, correlational studies and sometimes also case studies. In this sense, RAMA’s operating theory is closer to the view that education policy must be based on findings that are as valid as possible, despite the uniqueness of the field and its differences from clinical fields. The qualitative ranking of the various research methods is reflected in practice in comparative databases of education studies – databases that evaluate the quality of existing knowledge in different areas of the educational endeavour and summarize it from a critical perspective. One of the important databases is the What Works Clearinghouse of the U.S. Department of Education's Institute of Education Sciences. From this and similar databases one may learn about programs for which there are valid findings attesting to their level of efficacy. These databases evaluate the quality of programs based on a variety of studies.

RAMA’s activities are slightly different, since it deals mainly with applied rather than theoretical educational research. The purpose of assessment is to provide information about the results of programs or policy as a means for their improvement – in other words, information that will be meaningful to decision makers in real time (Weiss, 1998). Assessment focuses on findings that can be implemented in the field, while research aims to produce data that can be generalized and used to advance science. Research focuses on the possible theoretical contribution and derives the research question and method from it, whereas assessment derives its questions and goals from the needs of the field and of policy makers, and therefore gives priority to immediate, practical and specific uses. Despite the differing goals, the research methods are similar, and RAMA bases its work on controlled assessment schemes that will produce valid findings.

The program or project evaluation process comprises several milestones. At the first stage, expectations are coordinated with the evaluation requestor, including the definition of evaluation goals as distinct from project goals. At the second stage, a research team suited to the nature of the required study is established (the study usually combining qualitative and quantitative methods). In the case of large-scale projects, a steering committee is also appointed to operate in conjunction with the study team. The study team formulates an evaluation proposal that includes a schedule, a budget and a literature review of the field. The third and final stage includes performing the evaluation and preparing a summary report. The report is submitted for review and, following corrections, is published. At the end of the process a discussion is held with the participation of relevant entities in the Education Ministry to study the findings, in order to learn and draw conclusions.

The programs and projects that RAMA evaluates are varied – in content (educational, social, values-related, institutional and other) as well as in scope and complexity. Requests for evaluation come first and foremost from the Education Ministry administration and its various divisions. Nonetheless, when planning a study scheme, an attempt is made to collect knowledge and information that will be relevant to a wider circle than the original requestor, and findings are also published on the RAMA website.

Each program is evaluated according to a set framework that addresses the program's context, inputs, processes and outputs (Stufflebeam, 1983), while adjusting the research method to the program and the requestor’s needs. Examining the context of a program begins by identifying needs and mapping the target audience and the environment in which the program will be conducted. This is followed by learning about the goals of the program and how well it is suited to its context. Sometimes the evaluation even helps program developers articulate for themselves the program's underlying principles, goals and perceptions, as it requires contending with structured questions about the nature of the project. Inputs are reviewed by examining the program rationale (with the help of experts in the specific field and a literature review) and the suitability of the inputs to the needs. Process review includes analysis of program implementation in terms of implementation methods, target audience, operating entities, resource utilization, etc. (through surveys, observations and interviews). Outputs are examined by measuring the effect on the target audience over different time periods.
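One way to visualize this framework is as a structured evaluation plan organized along the four dimensions described above. The following sketch is purely illustrative; the program name, questions and structure are invented placeholders, not an actual RAMA evaluation.

# Illustrative sketch of an evaluation plan organized along the four CIPP
# dimensions (context, input, process, product/output). The program name
# and questions below are invented placeholders.
from dataclasses import dataclass, field

@dataclass
class EvaluationPlan:
    program: str
    context: list = field(default_factory=list)   # needs, target audience, environment
    inputs: list = field(default_factory=list)    # rationale, resources, suitability to needs
    process: list = field(default_factory=list)   # implementation methods, operating entities
    outputs: list = field(default_factory=list)   # effects on the target audience over time

plan = EvaluationPlan(
    program="Hypothetical reading-encouragement program",
    context=["Which students need the program?", "In what environment will it operate?"],
    inputs=["Is the program rationale supported by the literature?",
            "Do the allocated resources match the identified needs?"],
    process=["Is the program implemented as intended?",
             "Which entities operate it, and how are resources used?"],
    outputs=["Did reading achievement change after one and two years?"],
)

for dimension in ("context", "inputs", "process", "outputs"):
    print(dimension.upper())
    for question in getattr(plan, dimension):
        print("  -", question)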

Educational program evaluation activities provide both formative evaluation, which collects information in real time in order to improve implementation, and summative evaluation, which examines project products. In summative evaluation the main goal is to evaluate the effectiveness of a program or policy; effectiveness evaluation investigates the program's implementation methods and its results. Sometimes evaluation requests deviate from the accepted distinction between formative and summative evaluation – for example, requests to monitor and control various programs operated by Education Ministry divisions. It is important to stress that collecting monitoring and control data is not RAMA’s responsibility and should be performed by program leaders. Nonetheless, monitoring and control data serve RAMA as part of the information required for project assessment.

Many tools and data sources are used in evaluations: structured interviews, observations, focus groups, achievement data (based on the Meitzav, matriculation and other tests), data about school climate and pedagogical environment, data from the evaluation of teaching staff, and international studies. RAMA adapts its varied research tools to the changing needs of different types of projects and to the type of evaluation required.

As stated, RAMA evaluates a wide range of programs for varied reasons – some due to their broad scope and their importance to the system (the evaluation of the “Ofek Hadash” reform, for example). Others are evaluated despite their limited scope: for example, programs that have the potential to influence the entire system if they are successful (as with the program based on the personal educational model operating in the city of Bat Yam, or the evaluation of the centers for adults completing their secondary education). Other examples are programs that are the focus of attention of the requesting entity (for example, the Immigrant Absorption Division’s program for student dropout prevention or the Amirim excellence program). Another type of program is evaluated as part of an effort to pool resources from a system perspective – several programs are often operated by different entities although their common aspects exceed the differences between them, since they all have the same goal and often even the same pedagogical method (as with programs for increasing matriculation eligibility, programs for encouraging reading at the kindergarten level, etc.). In these instances the purpose of evaluation is to map the field, describe the variety of programs, and finally examine the effect of the programs taken together.

In the evaluation of programs RAMA gives priority to a research scheme that will produce the most valid results. Accordingly, when possible, a controlled experimental study is used. Nevertheless, implementing such a scheme involves many difficulties, as described above. A good example is the program for integrating computer-mediated communication technologies in elementary schools operated by the Education Ministry. In light of resource limitations, the Ministry sought to implement the program in peripheral areas of the country. The pedagogical logic underlying this decision is clear, yet in research terms it does not allow for a controlled experimental study. Nevertheless, an attempt was made to create an experimental scheme that would answer the question of whether schools that received advanced computer-mediated communication resources improved their teaching and learning processes and their students' achievements. For the experimental group, 60 schools that took part in the program were randomly sampled; for the control group, 60 schools that did not participate in the program and whose level of computer-mediated communication was low were sampled. The experimental group was invited to a conference in which it was explained that the attending schools had been selected for a study, and they received tools for implementing the plan. Schools selected for the control group were not told anything. The problem in the study arose, of course, when schools in the control group began to obtain technological equipment on their own or with the help of entities outside the program (local government, for example), since it is not possible to halt their progress. In this sense the difference between a clinical medical study and a study in education is obvious: in the latter, the effect of a placebo pill cannot be simulated, and it is impossible to control or oversee the control group in the way required by the experiment.
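A small simulation can illustrate why such contamination matters: if some control-group schools obtain equivalent equipment on their own, a simple comparison of mean score changes between the two groups of 60 schools understates the program's true effect. The numbers, effect sizes and contamination rate below are invented for illustration and do not describe the actual study's data.

# Illustrative simulation only: contamination of the control group (some
# control schools obtain the technology on their own) attenuates a simple
# two-group comparison of mean score changes. All numbers are hypothetical.
import random

random.seed(7)
N_PER_GROUP = 60
TRUE_EFFECT = 4.0          # assumed gain (in score points) from the technology
CONTAMINATION_RATE = 0.3   # assumed share of control schools acquiring it anyway

def score_change(received_technology):
    """Simulated change in a school's mean score between two test cycles."""
    base_change = random.gauss(1.0, 2.0)  # background change unrelated to the program
    return base_change + (TRUE_EFFECT if received_technology else 0.0)

experimental = [score_change(True) for _ in range(N_PER_GROUP)]
control = [score_change(random.random() < CONTAMINATION_RATE) for _ in range(N_PER_GROUP)]

mean = lambda xs: sum(xs) / len(xs)
estimated_effect = mean(experimental) - mean(control)

print(f"True effect (simulated): {TRUE_EFFECT:.1f}")
print(f"Estimated effect with a {CONTAMINATION_RATE:.0%} contaminated control group: "
      f"{estimated_effect:.1f}")

Under these invented assumptions the estimated effect shrinks roughly in proportion to the contamination rate, which is one reason the text treats such quasi-experimental conditions as weaker than a fully controlled trial.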

Other difficulties in performing evaluations include: the lack of organized databases in many programs; requests to RAMA to incorporate an evaluation scheme only when a project is already at an advanced implementation stage; the large number of studies, tests and questionnaires with which schools must contend; the tension between complex processing of data and preserving the relevance of the knowledge produced from them; the difficulty of obtaining meaningful data (compared with the relative ease of gathering data of limited significance, such as satisfaction surveys); lack of clarity regarding program goals (even among program developers); and political changes that cause frequent shifts in the Ministry's interest in certain programs.

Despite these challenges, RAMA makes an effort to integrate advanced evaluation models and data-processing methods, with the intention of expanding the knowledge gathered through project evaluation and making it even more meaningful for decision-making processes.

The various research and evaluation reports prepared by RAMA are presented on the

RAMA website, under the “Studies/Research and Project Assessment” tab.

Assessment in the Service of Learning – Summary

Effective use of measurement and evaluation, at the school level and at the system level, depends on the continued existence of a learning culture that views test and survey findings and internal assessment findings as having the potential to create genuine improvement rather than ostensible improvement – this is the essence of “measurement in the service of learning”.

The natural tendency of any person being assessed is to try to improve his or her ranking on the assessment index in any way possible, especially when the assessment is high-stakes, and this occurs in many and varied fields. For example, in the field of medicine the demand to publish hospital mortality data seems to have brought about a tendency among some hospitals not to admit patients whose condition is serious. Publishing the number of operations per physician in the United States led hospital physicians to increase, unjustifiably and disproportionately, the number of operations performed, even when they were deemed unnecessary.

In August 2012 the High Court of Justice in Israel published its judgement (1245/12) accepting the demand of the Movement for Freedom of Information in Israel to publish Meitzav grades at the school level, and rejecting the position of the Ministry of Education and of RAMA, according to which the damage expected from such publication exceeds its benefit. This judgement may have a significant effect on the Meitzav tests, and perhaps even on their administration in their current format. For details see the RAMA website under “Meitzav” – “General Background – External and Internal Meitzav”, in the link “Policy for Publishing Meitzav Results”.

RAMA sees its main purpose as assisting the education system to improve through informed use of measurement and assessment. The concept of “measurement in the service of learning” is not merely a slogan – RAMA views it as the justification for its existence. RAMA believes in combining external and internal assessment. The two types of assessment are related to differing approaches to generating change processes in organizations in general and in educational organizations in particular. External assessment deals with overall change that takes place in education systems and is executed at the national level by the government; this approach reflects top-down change. Internal assessment deals with bottom-up change and is led by the initiative of teachers or of the school as an organization – they are the ones who determine the patterns of activity. The education literature recommends an effective combination of “bottom-up” and “top-down” change. RAMA likewise believes that only dialogue, and a process that acts simultaneously in both modes, can create the desired change. Creating a dialogue that will lead to consensus (at the school level, the Education Ministry level and the social level) is the desired path (Levin, Glaze & Fullan, 2008).

This assessment model, which includes large-scale external assessment alongside professional internal assessment, strives to limit the role of large-scale assessment as the sole determining assessment, and alongside it sets standards for beneficial internal assessment. Classroom assessment is still far from providing quality information and from gaining recognition and trust; however, it must be recognized that large-scale assessment cannot fill these needs, and efforts must be directed toward creating a proper balance between the two. To this end there is a need for professional training and development in the field of measurement and assessment. The more professional the school's assessment processes, the smaller the influence of external large-scale tests.

In summary, effective measurement and assessment require the following conditions: cooperation among all stakeholders, and between them and the measuring entity; agreement on educational goals and objectives; agreement on measurement goals and their refinement; a good understanding of the series of tests and of the interpretation of their findings; access to findings and transparency of results; continuity and consistency between measurement cycles; fairness towards those assessed and moderation of threatening aspects; and professionalism and integrity. Only then will we have “assessment in the service of learning”.

Bibliography

Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2, 67-90.

Hamilton, L. S. (2008). High-stakes testing. In N. J. Salkind (Ed.), Encyclopedia of Educational Psychology, Vol. 1 (pp. 465-470). Thousand Oaks, CA: Sage.

Isoré, M. (2009). Teacher Evaluation: Current Practices in OECD Countries and a Literature Review. OECD Working Papers No. 23. OECD Publishing.

Koretz, D. (2005). Alignment, high stakes, and the inflation of test scores. In J. Herman & E. Haertel (Eds.), Uses and misuses of data in accountability testing. Yearbook of the National Society for the Study of Education, Vol. 104, Part 2 (pp. 99-118). Malden, MA: Blackwell Publishing.

Koretz, D., & Hamilton, L. S. (2006). Testing for accountability in K-12. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 531-578). Westport, CT: Praeger.

Levin, B., Glaze, A., & Fullan, M. (2008). Results without rancor or ranking: Ontario's success story. Phi Delta Kappan, 90(4), 273-280.

Nichols, S. N., & Berliner, D. C. (2007). Collateral Damage: The Effects of High-Stakes Testing on America's Schools. Cambridge, MA: Harvard Education Press.

Stufflebeam, D. L. (1983). The CIPP model for program evaluation. In G. F. Madaus, M. Scriven, & D. L. Stufflebeam (Eds.), Evaluation Models: Viewpoints on Educational and Human Services Evaluation. Boston: Kluwer-Nijhoff.

Weiss, C. (1998). Evaluation: Methods for Studying Programs and Policies. New Jersey: Prentice Hall.
