MESSICK’S FRAMEWORK
What Do Evaluators Need to Know?
Outline
What this report will cover
1. Concepts of Validity
2. Messick’s Contributions
3. Messick’s Framework
Validity Concept
The concept of validity has historically seen a variety of iterations that involved “packing” different aspects into the concept and subsequently “unpacking” some of them.
Points of broad consensus
Validity is the most fundamental consideration in the evaluation of the appropriateness of claims about, and uses and interpretations of, assessment results.
Validity is a matter of degree rather than all or none.
SICI Conference 2010, North Rhine-Westphalia: Quality Assurance in the Work of “Inspectors”
Main controversial aspect
…empirical evidence and theoretical rationales…
Validity is “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment.”
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational Measurement (3rd ed., pp. 13-103). Washington, DC: American Council on Education/Macmillan.
Broad, but not universal agreement
(for a dissenting viewpoint, Lissitz & Samuelsen, 2007)
Karen Samuelsen, Assistant Professor in the Department of Educational Psychology and Instructional Technology.
Robert W. Lissitz, Professor of Education in the College of Education at the University of Maryland and Director of the Maryland Assessment Research Center for Education Success (MARCES).
It is the uses and interpretations of an assessment result, i.e. the inferences, rather than the assessment result itself that is validated.
Validity may be relatively high for one use of assessment results but quite low for another use or interpretation.
Messick’s contributions
According to Angoff (1988), theoretical conceptions of validity and validation practices have changed appreciably over the last 60 years, largely because of Messick’s many contributions to our contemporary conception of validity.
Ruhe, V., & Zumbo, B. Evaluation in Distance Education and E-Learning, pp. 73-91.
1951: for Cureton, the essential feature of validity was “how well a test does the job it was employed to do” (p. 621).
1954: the American Psychological Association (APA) listed four distinct types of validity.
Types of Validity
1. Construct Validity refers to how well a particular test can be shown to assess the construct that it is said to measure.
2. Content Validity refers to how well test scores adequately represent the content domain that these scores are said to measure.
3. Predictive Validity is the degree to which the predictions made by a test are confirmed by the later behavior of the tested individuals.
4. Concurrent Validity is the extent to which individuals’ scores on a new test correspond to their scores on an established test of the same construct, determined shortly before or after the new test.
1966: in the APA Standards for Educational and Psychological Tests and Manuals, concurrent validity and predictive validity were collapsed into criterion-related validity.
1980: Guion referred to the three aspects of validity as the “Holy Trinity.”
1996: Hubley & Zumbo observed that Guion’s “Holy Trinity” implied that at least one type of validity was needed, but that one had three chances to get it.
1957: Loevinger argued that construct validity was the whole of validity, anticipating a shift away from multiple types to a single type of validity.
1988: according to Angoff, validity had been viewed as a property of tests, but the focus later shifted to the validity of a test in a specific context or application, such as the workplace.
1974: the Standards for Educational and Psychological Tests (APA, American Educational Research Association, and National Council on Measurement in Education) shifted the focus of content validity from a representative sample of content knowledge to a representative sample of behaviors in a specific context.
1989: professional standards were established for a number of applied testing areas such as counseling, licensure, certification, and program evaluation (Messick, 1989).
1985: in the Standards (APA, American Educational Research Association, and National Council on Measurement in Education), validity was redefined as the “appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores.”
1985: the unintended social consequences of the use of tests (for example, bias and adverse impact) were also included in the Standards (Messick, 1989).
Validation practice is “disciplined inquiry” (Hubley & Zumbo, 1996) that started out historically with the calculation of measures of a single aspect of validity (content validity or predictive validity) and has since moved to building an argument based on multiple sources of evidence (e.g., statistical calculations, qualitative data, reflections on one’s own values and those of others, and an analysis of unintended consequences).
These calculations are based on logical or mathematical models that date from the early 20th century (Crocker & Algina, 1986)
Messick (1989) describes these single-measure procedures as fragmented approaches to validation
Hubley and Zumbo (1996) describe them as “scanty, disconnected bits of evidence…to make a two-point decision about the validity of a test”
Cronbach (1982) recommended a more comprehensive, argument-based approach to validation that considered multiple and diverse sources of evidence
Validation practice has also evolved from a fragmented approach to a comprehensive, unified approach in which multiple sources of data are used to support an argument
Messick’s framework
What is Validity? Validity is “an integrated evaluative judgment
of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (Messick, 1989)
Validity is a unified concept, and validation is a scientific activity based on the collection of multiple and diverse types of evidence (Messick, 1989; Zumbo, 1998, 2007)
Messick’s Conception of Validity
Justification \ Outcomes    Test Interpretation              Test Use
Evidential basis            Construct Validity (CV)          CV + Relevance/Utility (RU)
Consequential basis         Value Implications (CV+RU+VI)    Social Consequences (CV+RU+VI+UC)
The columns distinguish the functions of testing (interpretation vs. use); the rows distinguish the basis for justifying validity (evidential basis vs. consequential basis).
Construct validity refers to traditional scientific evidence (traditional psychometrics); relevance/utility refers to relevance to learners and to society, and to cost-benefit.
The consequential basis is not about poor test practice; rather, the consequences of testing refer to the unanticipated or unintended consequences of legitimate test interpretation and use.
Value implications refer to underlying values, including language or rhetoric, theory, and ideology.
Social consequences are defined as the unintended social effects of testing, including the actual and potential effects of test use, especially issues such as bias, adverse impact, and distributive justice, and any other indirect effects, both actual/potential and positive/negative, of using the test on the overall educational system.
The four facets
The evidential basis of Messick’s framework contains two facets
1. Traditional psychometric evidence
2. The evidence for relevance in applied settings such as the workplace as well as utility or cost-benefit.
Evidential Basis for Test Inferences and Use
The evidential basis for test interpretation is an appraisal of the scientific evidence for construct validity.
A construct is a “definition of skills and knowledge included in the domain to be measured by a tool such as a test” (Reckase, 1998b)
The four traditional types of validity are included in this first facet.
Evidential Basis for Test Inferences and Use
The evidential basis for test use includes measures of predictive validity (e.g., correlations with other tests or behaviors) as well as utility (i.e., a cost-benefit analysis)
Predictive validity coefficients are measures of the behavior to be predicted from the test (e.g., a correlation between scores on a road test and a written driver qualification test)
Cost-benefit refers to an analysis of costs compared with benefits, which in education are often difficult to quantify.
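A predictive validity coefficient of the kind described above is simply the correlation between test scores and scores on the criterion behavior the test is meant to predict. The sketch below, using invented illustrative data (the score lists are hypothetical, not from any cited study), computes a Pearson correlation between written driver qualification test scores and later road test scores:

```python
# Minimal sketch of a predictive validity coefficient: the Pearson
# correlation between test scores and later criterion behavior.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores: written qualification test vs. later road test
written = [62, 75, 80, 55, 90, 70]
road = [60, 72, 85, 50, 88, 65]

validity_coefficient = pearson_r(written, road)
print(round(validity_coefficient, 3))
```

A coefficient near 1.0 would suggest the written test predicts the criterion behavior well; a value near 0 would undercut the evidential basis for that use of the scores.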
The consequential basis of Messick’s framework contains two facets
1. Value Implications (VI): CV + RU + VI
2. Social Consequences: CV + RU + VI + UC
Value Implications
Rhetoric Theories Ideologies
Value Implications: The Dimensions
Value implications require an investigation of three components:
1. Rhetoric, or value-laden language and terminology: language that conveys both a concept and an opinion of that concept.
2. Underlying theories: the underlying assumptions or logic of how a program is supposed to work (Chen, 1990).
3. Underlying ideologies: a complex mix of shared values and beliefs that provide a framework for interpreting the world (Messick, 1989).
Rhetoric
Includes language that is discriminatory, exaggerated, or overblown, such as derogatory language used to refer to the homeless.
In validation practice, the rhetoric surrounding standardized tests should be critically evaluated to determine whether these terms are accurate descriptions of the knowledge and skills said to be assessed by a test (Messick, 1989)
Theory
The second component of the value implications category is an appraisal of the theory underlying the test. A theory connotes a body of knowledge that organizes, categorizes, describes, predicts, explains, and otherwise aids in understanding phenomena and in organizing and directing thoughts, observations, and actions (Sidani & Sechrest, 1999)
Ideology
The third component of value implications is an appraisal of the “broader ideologies that give theories their perspective and purpose” (Messick, 1989)
An ideology is a “complex configuration of shared values, affects and beliefs that provides, among other things, an existential framework for interpreting the world” (Messick, 1989).
Values implications challenge us to reflect upon:
a. The personal or social values suggested by our interest in the construct and the name/label selected to represent that construct
b. The personal or social values reflected by the theory underlying the construct and its measurement
c. The values reflected by the broader social ideologies that impacted the development of the identified theory
(Messick, 1980, 1989)
Social Consequences
Social consequences refer to consequences for society stemming from the use of a measure
Remember that construct validity, relevance and utility, value implications and social consequences all work together and impact one another in test interpretation and use.