13-09-2017
1
Malaysia, Rigour, Benchmarking and ICAS
Context, purpose, validity, rigour and psychometrics
Professor Kelvin Gregory, UNSW Global
It has been almost 20 years since I was here in Kuala Lumpur.
Outline
– Malaysian educational context
– Assessment defined and principles
– ICAS: validity, uses and interpretations
– Rigour and higher-order thinking
– Bloom's cognitive taxonomy: detail and exercise
– Webb's depth of knowledge: detail and exercise
– Hess's cognitive rigour matrix: detail and ICAS examples
– Psychometrics
A school year (204 student days)
– Term 1 (Jan-Jun): 99 days
– Term 2 (Jun-Nov): 105 days
– Mid-term breaks: 9 days in each term. Mid-year break: 16 days
– Primary education: 6 years; secondary education: 5 years
What progress should a student make? How will you know a student is making progress? ICAS is in Term 1. Why?
Education Minister Datuk Seri Mahdzir Khalid
As the current Minister of Education, he oversees policies at pre-school, school and pre-university levels. He is leading the large-scale transformation effort outlined in the Malaysia Education Blueprint (MEB) 2013-2025, which aims to place Malaysia among the world's best in education provision, and has instituted key transformational initiatives to incorporate best practices across the education ecosystem.
https://asia.bettshow.com/speakers/yb-dato%E2%80%99-seri-mahdzir-bin-khalid
External assessments and government/school policy
– Policies, procedures and guidelines for external assessment use
– How should the external assessment integrate with the school?
– Most teachers already know who the most proficient learners are. What is really needed? What is a sound purpose for the external assessment? And how will the assessment fulfil that purpose?
Malaysia Education Blueprint 2013-2025
"Revamp national examinations and school-based assessments to gradually increase percentage of questions that test higher-order thinking. By 2016, higher-order thinking questions will comprise at least 40% of questions in UPSR and 50% in SPM. This change in examination design means that teachers will focus less on predicting what topics and questions will come out and drilling for content recall. Instead, students will be trained to think critically and to apply their knowledge in different settings. Similarly, school-based assessments will also shift their focus to testing for higher-order thinking skills." (E-11)
– Malaysian Certificate of Education or Sijil Pelajaran Malaysia (SPM): national examination
– Year 6 Primary School Evaluation Test or Ujian Pencapaian Sekolah Rendah (UPSR): national examination
Malaysia Education Blueprint 2013-2025
"As the TIMSS and PISA international assessments have demonstrated, our students struggle with higher-order thinking skills." (E-11)
"The aspiration is for Malaysia to be in the top third of countries in terms of performance in international assessments, as measured by outcomes in TIMSS and PISA, within 15 years." (E-9)
Teachers make a difference
“Our research indicates that there is a 15% variability difference in student achievement between teachers within the same schools.”
Deborah Loewenberg Ball, Dean of Education, University of Michigan
"What Matters Very Much is Which Classroom?"
"If a student is in one of the most effective classrooms he or she will learn in 6 months what those in an average classroom will take a year to learn. And if a student is in one of the least effective classrooms in that school, the same amount of learning takes 2 years."
“Excellent examples exist” (E-6)
Education Ministry moves to decentralise some matters
"Previously when it comes to assessment, everything is given to the examination board, so like school-based assessment...schools can do the assessment."
Education Minister Datuk Seri Mahdzir Khalid, 10 December 2016
http://www.themalaymailonline.com/malaysia/article/education-ministry-moves-to-decentralise-some-matters#OoElB22H61i8SEwh.99
Assessment Defined
"Assessment is the systematic collection, interpretation and use of information to give a deeper appreciation of what learners know and understand, their skills and personal capabilities, and what their learning experiences enable them to do."
– Note the parts of the definition
– How does this fit with you? With your school experience?
Each school and school system should have an operationalised definition of assessment
Northern Ireland Curriculum (2013) Guidance on Assessment in the Primary School
Educators know that more assessment is not the key to learning
– "Children don't grow by weighing them"
Research has shown that educators experience difficulties in designing appropriate assessments
There is a reasonable argument to integrate ICAS into the school's assessment system
– Not more assessment, rather better assessment
– But you need a system to use the assessment and the assessment data
Enable educators to learn from the ICAS assessments
– Use it as an external benchmark and an external validation of their work
Principles of Assessment
The following five principles underpin assessments:
– complementary to and supportive of learning;
– valid and reliable;
– fit for purpose and manageable;
– supports educators' professional judgement; and
– supports accountability.
These principles all require careful thought and support before any assessment usage. Some of these principles are contentious. Can you identify which ones?
An example assessment design
Make a unit test in three parts:
– Part 1: Definitions (tell learners this)
– Part 2: Problems practised in class, ranging from simple to more challenging
– Part 3: Unseen problems
  Routine ones like those seen in class
  Non-routine, novel, unseen
– Construct the test so that learners can pass if they do well on Parts 1 and 2, but they only achieve the highest achievement standard by answering all problems correctly
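The grading rule behind this design can be sketched in a few lines. This is an illustrative sketch only, not an ICAS artefact; the function name, the band labels and the 60% pass mark are assumptions for demonstration:

```python
# Illustrative sketch: a three-part unit test where Parts 1-2 determine a pass
# and the top band requires every part, including unseen problems, correct.
# The pass_mark threshold is a hypothetical value, not from the source.

def grade(part1_pct, part2_pct, part3_pct, pass_mark=0.6):
    """Return an achievement band from per-part proportions correct (0-1)."""
    if part1_pct == part2_pct == part3_pct == 1.0:
        return "highest standard"   # all problems, including non-routine ones
    if part1_pct >= pass_mark and part2_pct >= pass_mark:
        return "pass"               # definitions plus practised problems
    return "not yet passing"

print(grade(1.0, 1.0, 1.0))   # highest standard
print(grade(0.8, 0.7, 0.2))   # pass
print(grade(0.5, 0.4, 0.0))   # not yet passing
```

The point of the design is visible in the code: Part 3 never blocks a pass, but it is the only gate to the top band.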
Vygotsky: Zone of Proximal Development
The Zone of Proximal Development is what a child can do with assistance today
– It's the learning zone, sitting between the child's current achievement and the potential development area
Some core uses of assessment
– Use assessments for learning to locate students and then teach
– Use assessments of learning for certification
– Use external low-stakes assessments to guide teachers, as an external reference to develop teaching and learning
[Diagram: concentric zones showing the child's current achievement, the ZPD (the learning zone), and the potential development area]
A word or two about validity
Validity is a central concept in assessment. Validity refers to the interpretations and uses of the assessment scores
– Desired/intended interpretations and uses must be specified ahead of usage
– And then evidence must be gathered that the assessment scores/findings are well supported
  Evidence from the commissioning of the assessment through to item writing, reports and report usage
  Support from theory and research
– Each ICAS subject assumes that one dominant cognitive ability or proficiency underpins the learners' responses
  Except for ICAS English, English reading proficiency should be a minor skill requirement
Some Draft Statements about ICAS
– ICAS affirms educational policies and practices through assessments with demonstrable links to curricula, and best teaching and assessment practices.
– ICAS is an external high-quality suite of engaging and challenging assessments designed to assist and recognise learning.
– ICAS provides an independent benchmark of learner achievement and progress.
Reflection on the Draft Statements about ICAS
How do you interpret these statements within the Malaysian context?
– "demonstrable links to curricula, standards and best teaching and assessment practices"
– "engaging and challenging assessments designed to assist and recognise learning"
– "an independent benchmark of learner achievement and progress"
What would you change? Why? What is missing?
Some uses of ICAS
As a benchmarking test
– ICAS should be demonstrably aligned to the curriculum and assess what should be learnt
  This is being done for Australia, New Zealand and England
  It needs to be done with the revised Malaysian syllabi
– It can be used as a high-quality external assessment, an external reference point
  This quality would be reinforced if a relationship between ICAS and the Malaysian internal and external tests could be established
ICAS has design features which have implications for school assessment
Interpreting ICAS scores
Atomistic level
– Learner responses to individual items
Response patterns
– Make sense of learner attributes by looking for patterns
Summary scores at either domain or sub-domain levels, gathered over time
– Interpreted in comparison with other learners or groups of learners
– Interpreted in comparison to an achievement scale or curriculum
– Monitor learning progress at individual, class and school levels
What are your intended interpretations of learner responses and scores?
ICAS Processes
ICAS is developed to a plan
– Systematic guidance for all test development activities: construct; desired test interpretations; major sources of validity evidence; clear purpose; desired inferences; psychometric model; timelines; security; quality control
– Content definition and test specifications
– Item and test development (more about this soon)
– Test production and administration
– Scaling and reporting
  Scaling uses the Rasch model, as in the TIMSS 1995 and PISA studies
All done to increasingly meet the AERA/APA/NCME Standards for Educational and Psychological Testing
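Under the Rasch model used for scaling, the probability of a correct response depends only on the gap between a learner's ability (theta) and an item's difficulty (b). A minimal sketch of the standard dichotomous Rasch formula, illustrative only and not ICAS's actual scaling implementation:

```python
import math

def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A learner whose ability equals the item's difficulty has a 50% chance:
print(round(rasch_probability(0.0, 0.0), 2))   # 0.5
# Harder items (larger b) lower the probability for the same learner:
print(rasch_probability(0.0, 1.0) < rasch_probability(0.0, -1.0))  # True
```

Because ability and difficulty sit on the same scale, learners and items can be placed on one common ruler, which is what makes cross-year and cross-cohort comparison possible.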
ICAS Assessment Frameworks
Internal documents developed and used by subject-assessment experts
– Guide and shape item and test development so that the assessment satisfies the purpose
– Enable/facilitate the establishment of a validity framework: a network of evidence, theory and argument that supports the intended interpretations and uses of the scores
Frameworks contain:
– Construct definitions, main and sub-domains
– Content/curriculum maps
  Australian links have been done; England and New Zealand curriculum links are being identified; we need to explore the links with the revised Malaysian syllabi documents
– Test blueprint: Hess's cognitive rigour matrix item allocation, combining Bloom's taxonomy and Webb's depth of knowledge
The need for more higher-order thinking and rigour
Malaysian Education Blueprint 2013-2025:
"Thinking skills: Every child will learn how to continue acquiring knowledge throughout their lives (instilling a love for inquiry and lifelong learning), to be able to connect different pieces of knowledge, and to create new knowledge. Every child will master a range of important cognitive skills, including critical thinking, reasoning, creative thinking, and innovation. This is an area where the system has historically fallen short, with students being less able than they should be in applying knowledge and thinking critically outside familiar academic contexts." (E-10)
Malaysian Education Blueprint 2013-2025
"Revamp national examinations and school-based assessments to gradually increase percentage of questions that test higher-order thinking. By 2016, higher-order thinking questions will comprise at least 40% of questions in UPSR and 50% in SPM. This change in examination design means that teachers will focus less on predicting what topics and questions will come out and drilling for content recall. Instead, students will be trained to think critically and to apply their knowledge in different settings. Similarly, school-based assessments will also shift their focus to testing for higher-order thinking skills." (E-11)
– This is a call for increased rigour
Higher-order thinking and Cognitive Rigour
Activity: take a minute and write your definition of each term as it relates to Malaysian teaching, learning and assessment
– Higher-order thinking
– Cognitive rigour
Cognitive Rigour
• The kind and level of thinking required of learners to successfully engage with and solve a task
• Cognitive rigour is marked and measured by the depth and extent to which students are challenged and engaged to demonstrate and communicate their knowledge and thinking.
• It also marks and measures the depth and complexity of learner learning experiences.
• The ways in which learners interact with content
• And this is where ICAS excels
Cognitive Rigour
Imagine a primary class has just read some version of a short story.
– What is a basic comprehension question you might ask?
– What is a more rigorous question you might ask?
What system might you use to guide your questioning?
– Questions seek to elicit evidence of specific cognitive (latent) functioning
– How will you know that your questions are suitable for eliciting this information?
The Ant and the Grasshopper
In a field one summer's day a Grasshopper was hopping about, chirping and singing to its heart's content. An Ant passed by, bearing along with great toil an ear of corn he was taking to the nest.
"Why not come and chat with me," said the Grasshopper, "instead of toiling and moiling in that way?"
"I am helping to lay up food for the winter," said the Ant, "and recommend you to do the same."
"Why bother about winter?" said the Grasshopper; "we have got plenty of food at present." But the Ant went on its way and continued its toil.
When the winter came the Grasshopper had no food and found itself dying of hunger, while it saw the ants distributing every day corn and grain from the stores they had collected in the summer. Then the Grasshopper knew: it is best to prepare for days of need.
What is a basic comprehension question you might ask? What is a more rigorous question you might ask?
Cognitive Rigour
Refers to the kind and level of thinking required of learners to successfully engage with and solve a task
ICAS uses Karin Hess's Cognitive Rigor Matrix
– Kind of thinking (the verbs): for this we use Bloom's cognitive taxonomy
– Level of thinking (the depth): how deeply do you have to understand the content to successfully interact with it? How complex is the content? We leverage Webb's Depth of Knowledge
Revised Bloom's Taxonomy
Revised by his doctoral students, including Professor Peter Airasian
Defines the kind of knowledge and type of thinking students are expected to demonstrate in order to answer questions, address problems, accomplish tasks, and analyse texts and topics
Revised Bloom's Taxonomy has Two Dimensions
The Knowledge Dimension (Content and Concepts)
– The subject matter content (knowledge): factual, conceptual, procedural, metacognitive
The Cognitive Process Dimension (Cognition)
– What students must do (thinking) with what they are learning
  Lower-order thinking: remember, understand and apply
  Higher-order thinking: analyse, evaluate and create
Revised Bloom's knowledge dimension
– Factual knowledge: terminology, elements and components
– Conceptual knowledge: categories, principles, and theories
– Procedural knowledge: specific skills and techniques
– Metacognitive knowledge: general knowledge and self-knowledge
The cognitive process dimension:
– Remembering: exhibit memory of previously learned material by recalling facts, terms, basic concepts, and answers.
– Understanding: demonstrate understanding of facts and ideas by organising, comparing, translating, interpreting, giving descriptions, and stating main ideas.
– Applying: solve problems in new situations by applying acquired knowledge, facts, techniques and rules in a different way.
– Analysing (HOT): examine and break information into parts by identifying motives or causes; make inferences and find evidence to support generalisations.
– Evaluating (HOT): present and defend opinions by making judgements.
– Creating (HOT): compile information together in a different way by combining elements in a new pattern or proposing alternative solutions.
Knowledge dimension by cognitive process dimension (Remember, Understand, Apply, Analyse, Evaluate, Create):

Factual knowledge (terminology; elements and components)
– Remember: label a map; list names
– Understand: interpret a paragraph; summarise a book
– Apply: use mathematics algorithms
– Analyse: categorise words
– Evaluate: critique an article
– Create: create a short story

Conceptual knowledge (categories, principles, theories)
– Remember/Understand: describe the taxonomy in own words
– Apply: write objectives using the taxonomy
– Analyse: differentiate levels of the cognitive taxonomy
– Evaluate: critique written objectives
– Create: create a new classification system

Procedural knowledge (specific skills and techniques; criteria for use)
– Remember/Understand: paraphrase the problem-solving process in own words
– Apply: use the problem-solving process for an assigned task
– Analyse: compare convergent and divergent techniques
– Evaluate: critique the appropriateness of techniques used in a case analysis
– Create: develop an original approach to problem solving

Metacognitive knowledge (general knowledge; self-knowledge)
– Remember/Understand: describe implications of learning styles
– Apply: develop study skills appropriate to learning style
– Analyse: compare elements of dimensions in learning styles
– Evaluate: critique the appropriateness of a particular learning-style theory to own learning
– Create: create an original learning-style theory
Remember (I Know)
APPROPRIATE VERBS: Recognize, Observe, List, Acquire, Remember, Tell, Underline, State, Label, Record, Write, Relate, Match, Memorize, Show, Describe, Repeat, Identify, Name, Know
PRODUCTS: Chart, Model, Worksheet, Draw a map, Picture, Demonstrate

Understand (I Comprehend)
APPROPRIATE VERBS: Report, Communicate, Discuss, Review, Debate, Generalize, Interpret, Draw, Relate, Change, Prepare, Express, Describe, Explain, Paraphrase, Give Main Idea, Translate, Infer, Restate, Transform, Locate, Report, Summarize
PRODUCTS: Diagram, Time line, Teach a lesson, Diorama, Make a Filmstrip, Make a recording, Game, Report
Apply (I Can Use It)
APPROPRIATE VERBS: Apply, Show, Role play, Practice, Solve, Experiment, Manipulate, Restructure, Construct Models, Illustrate, Employ, Investigate, Operate, Sketch, Use, Interpret, Demonstrate, Dramatize, Transfer, Report, Conduct, Schedule, Classify, Solve
PRODUCTS: Survey, Diary, Scrapbook, Photographs, Cartoon, Learning Center, Construction, Illustration, Stitchery, Sculpture, Model, Mobile

Analyze (I Can Be Logical)
APPROPRIATE VERBS: Analyze, Inventory, Experiment, Investigate, Diagram, Deduce, Inspect, Differentiate, Contrast, Categorize, Question, Criticize, Separate, Examine, Discriminate, Dissect, Calculate, Survey, Detect, Relate, Distinguish, Compare, Develop, Debate
PRODUCTS: Graph, Survey, Family Tree, Time line, Questionnaire, Commercial, Diagram, Chart, Report, Fact file
Evaluate (I Can Judge)
APPROPRIATE VERBS: Judge, Measure, Rate, Verify, Decide, Standardize, Estimate, Justify, Select, Validate, Revise, Argue, Evaluate, Critique, Appraise, Debate, Choose, Consider, Score, Recommend, Assess
PRODUCTS: Survey, Self evaluation, Editorial, Experiment, Panel evaluation, Recommendation, Conclusion, Court trial, Essay, Letter

Create (I Plan)
APPROPRIATE VERBS: Create, Assemble, Improve, Modify, Predict, Derive, Plan, What if…, Construct, Invent, Manage, Produce, Suppose, Organize, Set Up, Imagine, Design, Compose, Prepare, Propose, Arrange, Formulate
PRODUCTS: Story, Poem, Play, Radio Show, Puppet Show, News Article, Invention, Dance, Mural, Comic Strip, Recipe, Pantomime, Travelogue
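As a rough illustration of how such verb lists can support first-pass classification, the verbs above can be turned into a lookup. This is a hypothetical sketch with a small subset of the verbs, and, as the depth-of-knowledge slides later stress, the verb alone never settles the level of a task:

```python
# Illustrative only: a rough verb-to-Bloom-level lookup built from a subset of
# the verb lists above. Real classification needs the context of the task, not
# just the verb (what follows the verb matters).
BLOOM_VERBS = {
    "remember":   {"recognize", "list", "label", "match", "memorize", "name"},
    "understand": {"discuss", "interpret", "explain", "paraphrase", "summarize"},
    "apply":      {"solve", "illustrate", "operate", "use", "dramatize"},
    "analyze":    {"differentiate", "contrast", "categorize", "examine", "compare"},
    "evaluate":   {"judge", "rate", "justify", "critique", "appraise", "assess"},
    "create":     {"create", "invent", "design", "compose", "formulate"},
}

def bloom_level(verb):
    """Return the candidate Bloom level(s) for a task verb (case-insensitive)."""
    v = verb.lower()
    return [level for level, verbs in BLOOM_VERBS.items() if v in verbs]

print(bloom_level("Critique"))  # ['evaluate']
print(bloom_level("Invent"))    # ['create']
```

Note that several verbs in the full lists appear under more than one level (e.g. "Interpret" under both Understand and Apply), which is exactly why a verb-only lookup can only be a starting point.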
Exercise
Look at the handout
– Note that the lower-order/higher-order thinking divide is different
  Lower order: knowledge, understanding
  Higher order:
  – Application, analysis, synthesis, evaluation
  – Application, analysis, evaluation, creation
Now look at the first 10 ICAS Mathematics items for Standard 3
– What Bloom's cognitive levels are applicable?
2013 ICAS Mathematics (Standard 3), classifications from two expert mathematics assessors (assessor 1 / assessor 2):
Q1: Remember / Understanding
Q2: Understanding / Understanding
Q3: Analysis / Application
Q4: Application / Analysis
Q5: Understanding / Application
Q6: Remember / Understand
Q7: Application / Understand
Q8: Understanding / Application
Q9: Application / Application
Q10: Understanding / Application
80 percent agreement at the LOT/HOT level
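The LOT/HOT agreement figure can be reproduced from the classifications above. A small illustrative sketch, assuming the revised-taxonomy divide given on the earlier slide (remember, understand and apply as lower order; analyse, evaluate and create as higher order):

```python
# Bloom classifications for Q1-Q10 from the two assessors, as transcribed above.
assessor_1 = ["Remember", "Understanding", "Analysis", "Application", "Understanding",
              "Remember", "Application", "Understanding", "Application", "Understanding"]
assessor_2 = ["Understanding", "Understanding", "Application", "Analysis", "Application",
              "Understand", "Understand", "Application", "Application", "Application"]

LOWER_ORDER = {"Remember", "Understand", "Understanding", "Apply", "Application"}

def lot_hot_agreement(a, b):
    """Percentage of items where both assessors land on the same side of the LOT/HOT divide."""
    same = sum((x in LOWER_ORDER) == (y in LOWER_ORDER) for x, y in zip(a, b))
    return 100.0 * same / len(a)

print(lot_hot_agreement(assessor_1, assessor_2))  # 80.0
```

Only Q3 and Q4, where one assessor saw Analysis and the other Application, cross the divide, giving 8 out of 10.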
2013 ICAS Mathematics (Standard 3), classifications from two expert mathematics assessors (assessor 1 / assessor 2):
Q11: Analysis / Analysis
Q12: Application / Application
Q13: Remember / Analysis
Q14: Application / Application
Q15: Application / Application
Q16: Application / Understanding
Q17: Remember / Understanding
Q18: Understanding / Application
Q19: Remember / Understanding
Q20: Application / Application
90 percent agreement at the LOT/HOT level
2013 ICAS Mathematics (Standard 3), classifications from two expert mathematics assessors (assessor 1 / assessor 2):
Q21: Analysis / Application
Q22: Application / Application
Q23: Application / Application
Q24: Understanding / Understanding
Q25: Understanding / Application
Q26: Understanding / Understanding
Q27: Analysis / Analysis
Q28: Application / Application
Q29: Application / Application
Q30: Analysis / Understanding
80 percent agreement at the LOT/HOT level
2013 ICAS Mathematics (Standard 3), classifications from two expert mathematics assessors (assessor 1 / assessor 2):
Q31: Understanding / Application
Q32: Application / Analysis
Q33: Application / Application
Q34: Analysis / Analysis
Q35: Application / Understanding
Q36: Application / Analysis
Q37: Application / Application
Q38: Application / Analysis
Q39: Application / Analysis
Q40: Application / Analysis
50 percent agreement at the LOT/HOT level
Depth of Knowledge
Depth of knowledge can vary on a number of dimensions:
– the level of cognitive complexity of information students should be expected to know;
– how well they should be able to transfer this knowledge to different contexts;
– how well they should be able to form generalizations; and
– how much prerequisite knowledge they must have in order to grasp ideas.
The depth of knowledge required by a learning activity or within an assessment is related to:
– the number of connections among concepts and ideas a learner needs to make in order to produce a response;
– the level of reasoning; and
– the use of other self-monitoring processes
  Are they aware of their own learning? Can they self-reflect, self-assess, self-direct?
How does Depth of Knowledge work?
DOK is broken into 4 levels. As the levels increase, students must demonstrate increasingly complex mental strategies.
Level One is the most basic level, essentially the "definition" stage.
Higher levels of DOK require that students solve problems in new and creative ways, and allow for multiple solutions to those problems.
Norman Webb's Depth of Knowledge (DOK)
– Level 1, Recall and Reproduction: requires recall of information, such as a fact, definition, term, or performance of a simple process or procedure
– Level 2, Skills and Concepts: requires more than one cognitive process or step beyond recall
– Level 3, Strategic Thinking: requires deep understanding exhibited through planning, using evidence, and more demanding cognitive reasoning
– Level 4, Extended Thinking: requires high cognitive demand; consists of complex tasks done over an extended period of time
The depth of knowledge levels in the model developed by Webb establish how deeply or extensively students are expected to transfer and use what they are learning.
Depth of knowledge focuses on complexity
DOK refers to the complexity of mental processing that must occur to answer a question, perform a task, or generate a product.
– Adding is a mental process.
– Knowing the rule for adding is the intended outcome that influences the DOK.
– Once someone learns the "rule" of how to add, 4 + 4 is DOK 1 and is also easy. Adding 4,678,895 + 9,578,885 is still DOK 1 but may be more difficult.
Depth of knowledge, not difficulty of question
• Difficulty is a reference to how many students answer a question correctly.
• "How many of you know the definition of acclimatize?"
  – DOK 1: recall. If all of you know the definition, this is an easy question.
• "How many of you know the definition of quark?"
  – DOK 1: recall. If most of you do not know the definition, this is a difficult question.
Depth of Knowledge and Bloom's Taxonomy
Bloom's cognitive taxonomy focuses upon the type of thinking: the verb used to describe the cognitive processes expected to be used in solving the task. The Depth of Knowledge is NOT determined by the verb, but by the context in which the verb is used and the depth of thinking required.
– DOK 3: Describe a model that you might use to represent the relationships that exist within the rock cycle. (Requires deep understanding of the rock cycle and a determination of how best to represent it.)
– DOK 2: Describe the difference between metamorphic and igneous rocks. (Requires cognitive processing to determine the differences in the two rock types.)
– DOK 1: Describe three characteristics of metamorphic rocks. (Simple recall.)
DOK is about what follows the verb...
What comes after the verb is more important than the verb itself. "Analyse this sentence to decide if the commas have been used correctly" does not meet the criteria for high cognitive processing.
  Suddenly, there came a clap of thunder.
The learner who has been taught the rule for using commas is merely using the rule.
Depth of Knowledge Level 1: Recall and Reproduction
DOK 1 requires recall of information, such as a fact, definition, term, or performance of a simple process or procedure, as well as performing a simple algorithm or applying a formula.
Answering a Level 1 item can involve following a simple, well-known procedure or formula. Simple skills and abilities or recall characterise DOK 1.
Depth of Knowledge Level 1: Recall and Reproduction examples
1. Identify a diagonal in a geometric figure.
2. Multiply two numbers.
3. Find the area of a rectangle.
4. Convert scientific notation to decimal form.
5. Measure an angle.
Source: Kentucky Department of Education (2007). Support Materials for Core Content for Assessment.
Depth of Knowledge Level 2: Skills and Concepts
DOK 2 includes the engagement of some mental processing beyond recalling or reproducing a response. Items require students to make some decisions as to how to approach the question or problem.
Keywords distinguishing Level 2 may include classify, organize, estimate, make observations, collect and display data, and compare data.
These actions imply more than one mental or cognitive process/step.
Depth of Knowledge Level 2: Skills and Concepts examples
1. Classify quadrilaterals.
2. Compare two sets of data using the mean, median, and mode of each set.
3. Determine a strategy to estimate the number of jelly beans in a jar.
4. Extend a geometric pattern.
5. Organize a set of data and construct an appropriate display.
Source: Kentucky Department of Education (2007). Support Materials for Core Content for Assessment.
Depth of Knowledge Level 3: Strategic Thinking
DOK 3 requires reasoning, planning, using evidence, and more demanding cognitive reasoning. The cognitive demands at Level 3 are complex and abstract.
An assessment item that has more than one possible answer and requires students to justify the response they give would most likely be a Level 3.
Depth of Knowledge Level 3: Strategic Thinking examples
1. Solve a multiple-step problem and provide support with a mathematical explanation that justifies the answer.
2. Write a mathematical rule for a non-routine pattern.
3. Explain how changes in the dimensions affect the area and perimeter/circumference of geometric figures.
4. Provide a mathematical justification when a situation has more than one outcome.
5. Interpret information from a series of data displays.
Source: Kentucky Department of Education (2007). Support Materials for Core Content for Assessment.
Depth of Knowledge Level 4: Extended Thinking
DOK 4 requires high cognitive demand and is very complex. It requires complex reasoning, planning, developing, and thinking.
Students are expected to make connections (relate ideas within the content or among content areas) and select or devise one approach among many alternatives on how the situation can be solved.
Due to the complexity of cognitive demand, DOK 4 often requires an extended period of time.
Depth of Knowledge Level 4: Extended Reasoning, Extended Thinking examples
Specify a problem, identify solution paths, solve the problem, and report the results.
1. Collect data over time, taking into consideration a number of variables, and analyze the results.
2. Model a social studies situation with many alternatives and select one approach to solve with a mathematical model.
3. Develop a rule for a complex pattern and find a phenomenon that exhibits that behavior.
4. Complete a unit on formal geometric constructions, such as nine-point circles or the Euler line.
5. Construct a non-Euclidean geometry.
Source: Kentucky Department of Education (2007). Support Materials for Core Content for Assessment.
Examples of DOK 1 in Music
1. Name the notes of the C Major scale. (Simple recall of pre-learned knowledge.)
2. Name 4 periods of classical music. (Simple recall, but must be taught.)
3. Know that a sharp raises a note ½ step. (Identify a #, recognise that it raises a pitch.)
Examples of DOK 2 in Music
1. Read and perform a simple rhythm. (If the student interprets the rhythm, as opposed to repeating it, it is DOK 2.)
2. Play a simple melody or accompaniment. (The student must make sense of written notation and perform.)
Examples of DOK 3 in Music
1. Improvise a simple melody. (New application of complex processes.)
2. Perform as a member of a conducted ensemble. (Students make individual choices about performance.)
3. Compose a single-line melody. (New application of complex processes.)
Examples of DOK 4 in Music
1. Compose using 2 or more parts. (Requires application of harmony, voice leading, cadence.)
2. Improvise over a given chord progression. (Requires the student to apply all previous learning in a new and novel situation.)
3. Perform in a student-led ensemble or solo with accompaniment. (The student makes all choices.)
Level 1Includes the recall of information such as a fact, definition, term simple procedure, as well as performing a simple algorithm or applying a formula. In mathematics a one-step, well-defined, and straight algorithmic procedure is included at this lowest level. Other key words that signify a Level 1 include “identify,” “recall,” “recognize,” “use,” and “measure.” Verbs such as “describe” and “explain” could be classified at different levels depending on what is to be described and explained.
Level 2Keywords that generally distinguish a Level 2 item include “classify,” “organize,” ”estimate,” “make observations,” “collect and display data,” and “compare data.” These actions imply more than one step. For example, to compare data requires first identifying characteristics of the objects or phenomenon and then grouping or ordering the objects. Some action verbs, such as “explain,” “describe,” or “interpret” could be classified at different levels depending on the object of the action.
Level 3Requires reasoning, planning, using evidence, and a higher level of thinking than the previous two levels. In most instances, requiring students to explain their thinking is a Level 3. Activities that require students to make conjectures are also at this level. The cognitive demands at Level 3 are complex and abstract. An activity, however, that has more than one possible answer and requires students to justify the response they give would most likely be a Level 3. Other Level 3 activities include drawing conclusions from observations; citing evidence and developing a logical argument for concepts; explaining phenomena in terms of concepts; and using concepts to solve problems.
Level 4At Level 4, the cognitive demands of the task should be high and the work should be very complex. Students should be required to make several connections—relate ideas within the content area or among content areas—and have to select one approach among many alternatives on how the situation should be solved, in order to be at this highest level. Level 4 activities include designing and conducting experiments; making connections between a finding and related concepts and phenomena; combining and synthesizing ideas into new concepts; and critiquing experimental designs.
DOK Levels for Mathematics (the level descriptions above)

DOK Levels for Social Studies

Level 1 "Recall of Information"
– This level generally requires students to identify, list, or define.
– Recall who, what, when and where.
– Identify specific information contained in maps, charts, tables, and drawings.

Level 2 "Basic Reasoning"
– Convert information from one form to another
– Contrast and compare
– Cause and effect
– Categorize into groups
– Distinguish between fact and opinion

Level 3 "Complex Reasoning"
– Apply a concept in other contexts.
– Draw conclusions or form alternative conclusions.
– Analyze how changes have affected people or places.
– Analyze similarities and differences in issues or problems.

Level 4 "Extended Reasoning"
– Analyze and explain multiple perspectives or issues.
– Make predictions with evidence as support.
– Plan and develop solutions to problems.
– Describe, define, and illustrate common social, historical, economic, or geographical themes and how they relate.
Exercise
Now look at the first 10 ICAS Mathematics items for Standard 3.
– What Depth of Knowledge levels are applicable?
2013 ICAS Mathematics (Standard 3)
Classifications from two expert mathematics assessors (shown as assessor 1 / assessor 2):

Q1: 1/1   Q2: 1/1   Q3: 1/1   Q4: 1/1   Q5: 1/1
Q6: 1/2   Q7: 1/2   Q8: 2/1   Q9: 1/2   Q10: 1/1

70 percent DOK agreement; assessor 1 is more conservative.
2013 ICAS Mathematics (Standard 3)
Classifications from two expert mathematics assessors (assessor 1 / assessor 2):

Q11: 2/1   Q12: 2/2   Q13: 1/1   Q14: 1/2   Q15: 1/1
Q16: 1/2   Q17: 1/2   Q18: 1/2   Q19: 1/2   Q20: 2/2

40 percent agreement; the first assessor is more conservative (stricter).
2013 ICAS Mathematics (Standard 3)
Classifications from two expert mathematics assessors (assessor 1 / assessor 2):

Q21: 1/2   Q22: 2/2   Q23: 2/2   Q24: 2/2   Q25: 1/1
Q26: 2/2   Q27: 1/2   Q28: 1/2   Q29: 2/2   Q30: 2/2

70 percent DOK agreement.
2013 ICAS Mathematics (Standard 3)
Classifications from two expert mathematics assessors (assessor 1 / assessor 2):

Q31: 1/2   Q32: 2/2   Q33: 2/3   Q34: 2/2   Q35: 2/2
Q36: 2/3   Q37: 2/3   Q38: 2/3   Q39: 2/2   Q40: 2/2

50 percent DOK agreement; it is very hard to write a multiple-choice item so that it is DOK 3.
Karin Hess's Cognitive Rigor Matrix
Bloom's taxonomy levels (Remember, Understand, Apply, Analyse, Evaluate, Create) crossed with Webb's Depth of Knowledge (DOK 1 Recall & Reproduction; DOK 2 Skills & Concepts; DOK 3 Strategic Thinking/Reasoning; DOK 4 Extended Thinking). Sample cell descriptors:

Remember
– DOK 1: Recall, locate basic facts, definitions, details and events
Understand
– DOK 1: Select appropriate word when intended meaning is clear
– DOK 2: Explain relationships; summarize; identify central ideas
– DOK 3: Explain, generalize or connect ideas using supporting evidence (quote, text evidence, data, etc.)
– DOK 4: Explain how concepts relate to other content domains
Apply
– DOK 1: Use language structure or word relationships (synonyms/antonyms)
– DOK 2: Use context to find meaning; obtain and use information in text features
– DOK 3: Use concepts to solve non-routine problems and justify solutions with evidence
– DOK 4: Devise an approach among alternatives to research a novel problem
Analyse
– DOK 1: Identify information in a graphic, table, visual, etc.
– DOK 2: Compare literary elements, facts, terms and events; analyze format, organization & text structures
– DOK 3: Analyze or interpret author's craft (e.g., literary devices, viewpoint, or potential bias) to critique a text
– DOK 4: Analyze multiple sources or texts; analyze complex abstract themes
Evaluate
– DOK 1 and DOK 2: not in matrix
– DOK 3: Cite evidence and develop a logical argument for conjectures based on one text or problem
– DOK 4: Evaluate relevancy, accuracy and completeness of information
Create
– DOK 1: Brainstorm ideas, concepts, problems, or perspectives related to a topic
– DOK 2: Generate conjectures or hypotheses based on observations or prior knowledge
– DOK 3: Develop a complex model or approach for a given situation; develop an alternative solution
– DOK 4: Synthesize information across multiple sources; articulate a new voice, theme, perspective

Bloom's Taxonomy + Webb's DOK = the Hess CRM
Little Red Riding Hood
Imagine your class has just read a version of Little Red Riding Hood (or another short story, in the language of the class).
– What is a basic comprehension question you might ask?
– What is a more rigorous question you might ask?
What must you consider when developing each type of question?
Depth + Thinking: sample questions for each cell
(DOK columns: 1 Recall & Reproduction; 2 Skills & Concepts; 3 Strategic Thinking/Reasoning; 4 Extended Thinking)

Remember
– DOK 1: What color was Red's cape? Who is this story about?
Understand
– DOK 1: Who are the main characters? What was the story's setting?
– DOK 2: Retell or summarize the story in your own words.
– DOK 3: What is the author's message or theme? Justify your interpretation using text evidence.
Apply
– DOK 2: Identify words/phrases that helped you to know the sequence of events in the story.
Analyze
– DOK 1: Is this a realistic or fantasy story?
– DOK 2: Compare the wolf character to the character of Red. How are they alike/different?
– DOK 3: Is this a realistic or fantasy story? Justify your interpretation using text evidence.
– DOK 4: Are all wolves (in literature) like the wolf in this story? Support your response using evidence from this and other texts.
Evaluate
– DOK 3: What is your opinion about the cleverness of the wolf? Justify your opinion using text evidence.
– DOK 4: Which version has the most satisfying ending? (establish criteria first, then locate evidence)
Create
– Write text messages between Red & her mother explaining the wolf incident.
No longer higher order thinking focused; deep thinking focused
 What we have thought of as "higher order" (analysis, evaluation, creative thinking) might only be engaging or fun and not always deeper
 Many critical thinking examples do not go deep or get to DOK 3 or 4 (e.g., interpret/solve and justify)
 Shift our thinking from "higher order" to deeper learning, and that can mean:
– deeper understanding
– deeper application
– deeper analysis, etc.
USA study of 8428 Year 3 assessments
(figures in parentheses show how items were distributed across cells; DOK columns: 1 Recall & Reproduction; 2 Skills & Concepts; 3 Strategic Thinking, support with data, equations, models, etc.; 4 Extended Thinking, cross domains)

Remember
– DOK 1: Know math facts, terms (34)
Understand
– DOK 1: Attend to precision; evaluate expressions, plot points (18)
– DOK 2: Model with mathematics; estimate, predict, observe, explain relationships (2)
– DOK 3: Construct viable arguments; geometry proof
– DOK 4: Integrate concepts across domains
Apply
– DOK 1: Calculate, measure, make conversions (28)
– DOK 2: Make sense of routine problems (8)
– DOK 3: Make sense of non-routine problems
– DOK 4: Design & conduct a project
Analyze
– DOK 1: Identify a pattern; locate information in a table (2)
– DOK 2: Use tools strategically; classify, organize data, extend a pattern (6)
– DOK 3: Reason abstractly; generalize a pattern
– DOK 4: Analyze multiple sources of evidence
Evaluate
– DOK 3: Critique the reasoning of others
Create
– DOK 4: Design a complex model (1)
ICAS and Cognitive Rigor Matrix
(DOK columns: 1 Recall & Reproduction; 2 Skills & Concepts; 3 Strategic Thinking, support with data, equations, models, etc.; 4 Extended Thinking, cross domains)

Remember
– DOK 1: Know math facts, terms
Understand
– DOK 1: Attend to precision; evaluate expressions, plot points
– DOK 2: Model with mathematics; estimate, predict, observe, explain relationships
– DOK 3: Construct viable arguments; geometry proof
– DOK 4: Integrate concepts across domains
Apply
– DOK 1: Calculate, measure, make conversions
– DOK 2: Make sense of routine problems
– DOK 3: Make sense of non-routine problems
– DOK 4: Design & conduct a project
Analyze
– DOK 1: Identify a pattern; locate information in a table
– DOK 2: Use tools strategically; classify, organize data, extend a pattern
– DOK 3: Reason abstractly; generalize a pattern
– DOK 4: Analyze multiple sources of evidence
Evaluate
– DOK 3: Critique the reasoning of others
Create
– DOK 4: Design a complex model
2013 Year 3 ICAS Mathematics
[Matrix figure: the classified items placed in the cognitive rigor matrix. Items fall in the Remember to Analyze rows, concentrated at DOK Levels 1 and 2, with a small number at DOK Level 3; the Evaluate and Create rows are almost empty.]
ICAS, Depth of Knowledge 1 and Bloom's Recall
DOK 1: Recall and Reproduction
Recognizes, responds, remembers, memorizes, restates, absorbs, describes, demonstrates, follows directions, applies routine processes, definitions, and procedures
How many candles are there on Anna's cake?
A. 6   B. 7   C. 10   D. 14
ICAS, Depth of Knowledge 1 and Bloom’s Understanding
DOK 1: Recall and Reproduction
Ann has some pictures.
Which picture is in the third row from the top and the second column from the left?
Recognizes, responds, remembers, memorizes, restates, absorbs, describes, demonstrates, follows directions, applies routine processes, definitions, and procedures
ICAS, Depth of Knowledge 1 and Bloom's Application
DOK 1: Recall and Reproduction
Recognizes, responds, remembers, memorizes, restates, absorbs, describes, demonstrates, follows directions, applies routine processes, definitions, and procedures

Sam has tiles like this:
He wants to cover the hexagon with tiles without gaps or overlapping.
How many tiles does Sam need?
A. 14   B. 12   C. 10   D. 8
Malaysian mathematics for primary grades
By the end of Grade 4, students should be able to do the following:
 Numbers …
 Measurement—Understand time, including the 12-hour system; perform mathematical operations and solve problems involving units of time and the calendar; measure length, mass, and volume of liquid in metric units; calculate unit conversions; and solve problems involving length, mass, and volume of liquid
 Shapes and Space—Identify two- and three-dimensional shapes; calculate perimeter, area, and volume; and solve problems involving perimeter, area, and volume of squares, rectangles, cubes, and cuboids
 Statistics—Extract and interpret information from pictographs and bar graphs
Interpret and construct simple pictograms, tally charts, block diagrams and simple tables
How many more people play tennis than cricket?
(A) 1 (compares cricket and basketball)
(B) 3 (key)
(C) 4 (compares tennis and basketball)
(D) 9 (number playing tennis)
Bloom's Taxonomy – Analyse
Webb's DOK Level 2 – Skills and Concepts
How many more people play tennis than cricket?
(A) 1 (compares cricket and basketball)(B) 3 (key)(C) 4 (compares tennis and basketball)(D) 9 (number playing tennis)
CRM: DOK 2, Bloom Analyse
Categorize, classify materials, data, figures based on characteristics; organize or order data; compare/contrast figures or data; select appropriate graph and display data; interpret data from a simple graph; extend a pattern.
Interpret and construct simple pictograms, tally charts, block diagrams and simple tables
There are 14 people in this group. Each person plays at least one sport. Only one person plays all three sports. How many people play exactly two sports?
(A) 4(B) 5(C) 6(D) 7
ICAS 2013 YEAR 3 Q38
Bloom's Taxonomy – Analyse
Webb's DOK Level 3 – Strategic Thinking/Reasoning
ITEM
(A) 4 (B) 5 (C) 6 (D) 7
CRM: DOK 3, Bloom Analyse
Compare information within or across data sets or texts; analyse and draw conclusions from data, citing evidence; generalize a pattern; interpret data from a complex graph; analyse similarities/differences between procedures or solutions.
Bloom's Taxonomy – Analyse
Webb's DOK Level 1 – Recall and Reproduction
ITEM
Which of these is a tetromino?
(A) (3 squares)   (B) (no common sides)   (C) (5 squares)   (D) (key)
CRM: DOK 1, Bloom Analyse
Retrieve information from a table or graph to answer a question; identify whether specific information is contained in graphic representations (e.g., table, graph, T-chart, diagram)
Bloom's Taxonomy – Analyse
Webb's DOK Level 2 – Skills and Concepts
ITEM
Sam made this tetromino.
Which of these is a tetromino different to Sam's?
(A) (a rotation)   (B) (missing a common side)   (C) (5 squares)   (D) (key)
CRM: DOK 2, Bloom Analyse
Categorize, classify materials, data, figures based on characteristics; organize or order data; compare/contrast figures or data; select appropriate graph and display data; interpret data from a simple graph; extend a pattern.
ICAS 2012 YEAR 6 Q39
Bloom's Taxonomy – Analyse
Webb's DOK Level 3 – Strategic Thinking/Reasoning
ITEM
Solution: There are 5 different tetrominoes.
CRM: DOK 3, Bloom Analyse
Compare information within or across data sets or texts; analyze and draw conclusions from data, citing evidence; generalize a pattern; interpret data from a complex graph; analyze similarities/differences between procedures or solutions.
Malaysian mathematics
By the end of Grades 7 to 9, students should be able to do the following:
1. Numbers …
2. Shapes and Space …
3. Relationships—Understand and solve problems involving algebraic expressions; write, formulate, and solve problems involving linear equations, including simultaneous equations; solve linear inequalities, including simultaneous linear inequalities with one unknown; draw graphs of functions; understand and solve problems involving ratio and proportion; collect and organize data systematically; understand measures of central tendency (mean, mode, and median); and represent and interpret data in pictograms, bar graphs, line graphs, and pie charts, and solve related problems
http://timssandpirls.bc.edu/timss2015/encyclopedia/countries/malaysia/the-mathematics-curriculum-in-primary-and-lower-secondary-grades/
Bloom's Taxonomy – Analyse
Webb's DOK Level 1 – Recall and Reproduction
ITEM
Jim is making a pattern. Each shape in the pattern uses orange and white tiles.
How many white tiles does Jim add to make each new Shape?(A) 2 (number of orange tiles added)(B) 4 (key)(C) 6 (number of tiles added for each shape)(D) 8 (number of white tiles in Shape 2)
CRM: DOK 1, Bloom Analyse
Retrieve information from a table or graph to answer a question; identify whether specific information is contained in graphic representations (e.g., table, graph, T-chart, diagram)
Bloom's Taxonomy – Analyze
Webb's DOK Level 2 – Skills and Concepts
ITEM
Jim is making a pattern. Each shape in the pattern uses orange and white tiles.
How many white tiles should Jim use in Shape 6?(A) 13 (number of orange tiles in shape 6)(B) 20 (number of white tiles in shape 5)(C) 24 (key)(D) 37 (number of tiles in shape 6)
CRM: DOK 2, Bloom Analyze
Categorize, classify materials, data, figures based on characteristics; organize or order data; compare/contrast figures or data; select appropriate graph and display data; interpret data from a simple graph; extend a pattern.
ICAS 2011 YEAR 7 Q27
Bloom's Taxonomy – Apply
Webb's DOK Level 3 – Strategic Thinking/Reasoning
Key (A)
CRM: DOK 3, Bloom Apply
Design an investigation for a specific purpose or research question; conduct a designed investigation; use concepts to solve non-routine problems; use reasoning, planning, and evidence; translate between problem & symbolic notation when not a direct translation.
Bloom's Taxonomy – Understand
Webb's DOK Level 1 – Recall and Reproduction
Anish used this number line to show the distance between 1 and 5 is 4 units.
Which of these numbers is also 4 units from 1?
(A) -5 (uses opposite sign to 5)(B) -4 (uses opposite sign to 4)(C) -3 (key)(D) -2 (moves to left counting markers including 1)
CRM: DOK 1, Bloom Understand
Evaluate an expression; locate points on a grid or a number on a number line; solve a one-step problem; represent math relationships in words, pictures, or symbols; read, write, compare decimals in scientific notation.
Bloom's Taxonomy – Understand
Webb's DOK Level 2 – Skills and Concepts
ITEM
Anish used this number line to show that the distance between 1 and 5 is 4 units.
Which of these numbers is greater than 4 units from 1?(A) -4 (key)(B) -3 (-3 is 4 units from 1)(C) -2 (-2 is less than 4 units from 1)(D) 3 (3 is less than 4 units from 1; 3 is greater than 1)
CRM: DOK 2, Bloom Understand
Specify and explain relationships (e.g., non-examples/examples); make and record observations; explain reasoning; summarize results or concepts; make basic inferences or logical predictions from data/observations; use models to represent or explain a mathematical concept; make and explain estimates; provide justification for steps taken.
ICAS 2011 YEAR 7 Q20
Bloom's Taxonomy – Understand
Webb's DOK Level 3 – Strategic Thinking/Reasoning
Key is C
CRM: DOK 3, Bloom Understand
Explain, generalize, or connect ideas using supporting evidence; make and justify conjectures; explain thinking when more than one response is possible; explain phenomena (observed, in data) in terms of concepts; provide a mathematical or scientific justification.
Psychometrics
 Literally the application of mathematical (measurement) models to psychological traits
 Can be applied naively
– Most people can construct a test, add up scores on a test, summarise achievement using letter grades, and make judgements
– Most of these assessments would fail to meet assessment-industry standards
– Most people have experienced assessments, so most people believe they are "experts" in assessment, and this belief can be challenging to shake
ICAS leverages the best psychometric practices
Psychometrics is often focused on:
 Constructing tools (tests, assessments, surveys, scales) to collect data
 Developing, using and evaluating procedures to convert that data into measurements
 Often what is being measured is latent or hidden, and so much attention is paid to describing a construct.
UNSW Global Psychometrics
 We use many of the same processes used in international large-scale assessments
– TIMSS, PIRLS, PISA
ICAS test design
 Mostly multiple-choice questions
– Some constructed-response questions
– Writing is assessed using a prompt and marked with a rubric
 Curriculum focused
– Criterion assessment
– Items designed with the primary purpose of assessing learning
– Easy-to-hard difficulty
 Medal focused
– Normative assessment
– Items designed with the primary purpose of identifying a very small group of medal winners
Progression
 Can you identify the medal questions? These items can be answered correctly by a very small proportion of any country.
 We place all items within the same subject onto a common scale: ICAS monitors growth.
Main analysis software
 We have our own Java-coded Rasch software for analysis and reporting purposes
 We also use ConQuest, RUMM2030, SPSS, SAS
– Used to develop prototypes for reporting, e.g., senate-weighted international percentiles using "super-populations"
 Key Stage SATs seem to be analysed using Winsteps
– We have this software but choose to use the above software
 And we use R
– This open-source software is being used to parallel process all aspects of our work
– Used to develop novel internal and external reporting systems
We use two theories for ICAS
 Classical test theory
– The theory behind most classroom tests
– All score points are equal
 Some items don't contribute as they should, though
– Uses total scores, averages and correlations
– Most development occurred from the late 1800s to the 1950s
 Item response theory (also called modern test theory)
– The theory behind most large-scale tests (which are often comprised of many small items); developed from the 1950s onwards, with peak development from the 1970s to 2000
– Focuses upon what each item is telling us about the person
ICAS scores
 Item level
– Correct or incorrect (and choice) provided to student and school
– Informative, especially if the item is related to the curriculum
– Not summarised; the amount of data may be daunting
 Sub-domain and domain level
– Either as number correct or ICAS scale scores
 Number correct is the commonly used method
 ICAS scale scores are derived from item response theory and are used for trend purposes
– We could generate sub-scale scaled scores later
Classical test theory and ICAS
 Classical test theory is primarily used to check the quality of the assessment
– An internal quality-assurance step
– Item difficulty, item discrimination, reliability
 But components of this theory are used in reports
– The number correct
Item difficulty
 From classical test theory
– The average score on an item
 For polytomous items, the average score on the item divided by the maximum possible score
 Valid range is from zero to one, inclusive
– This statistic is sample dependent
 The item difficulty depends in part on the item but also on the group of learners answering the item
 An ICAS test is typically arranged so items appear in order of difficulty
– Easiest items first, hardest items last
– But order is affected by item layout and paper constraints
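As a sketch of the calculation, classical item difficulty can be computed directly from a scored response matrix (the responses below are hypothetical):

```python
# Hypothetical 0/1 scored responses: rows = learners, columns = items.
responses = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
]

def item_difficulty(matrix):
    # Classical item difficulty: the mean score on each item
    # (the proportion correct for dichotomous items).
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]

difficulties = item_difficulty(responses)  # [1.0, 0.75, 0.5, 0.25]
```

Note that a different group of learners would produce different values; the statistic is sample dependent, as the slide says.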
Item discrimination
 From classical test theory
– A statistical measure of how well an item discriminates between those who have mastered the subject and those who have not
 One approach: divide the class into three groups based upon total (subscale) score and compute the difference in item difficulty between the upper and lower groups
 There are many other definitions, which evolved as technology evolved
– Another approach: a correlation between scored responses on the item and the total test score (or the total test score with the item removed)
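A minimal sketch of the correlational (point-biserial) definition, here using the corrected total (the total with the item removed) and hypothetical response data:

```python
from statistics import mean, pstdev

# Hypothetical 0/1 scored responses: rows = learners, columns = items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]

def point_biserial(item_scores, total_scores):
    # Pearson correlation between 0/1 item scores and total scores.
    mx, my = mean(item_scores), mean(total_scores)
    cov = mean([(x - mx) * (y - my) for x, y in zip(item_scores, total_scores)])
    return cov / (pstdev(item_scores) * pstdev(total_scores))

item = 1  # look at the second item
item_scores = [row[item] for row in responses]
rest_scores = [sum(row) - row[item] for row in responses]  # total with the item removed
r = point_biserial(item_scores, rest_scores)
```

For these data r is about 0.61, comfortably above the 0.2 threshold in the guide that follows.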
Item discrimination guide
 Most guides for item discrimination are arbitrary
– But all treat negative item-discrimination values as indicative of a highly problematic item
 There can be an interaction between item discrimination and item difficulty
– Very easy items don't discriminate well because people in the lower group are also getting the item correct
 The guide:
– 0.2 or higher is desirable
– Between 0 and 0.2 is problematic (maybe)
Point-biserial graph – Year 5 English
Reliability
 Reliability is a statistical measure of how well the items hold together on the same construct
– It is closely related to the average correlation across all possible item pairs
 There is a range of possible reliability statistics:
– Cronbach's alpha, Kuder-Richardson 20 (don't use KR21)
 Reliabilities range from 0 to 1
– Below 0.7 the test scores are pretty useless
– Above 0.95 there is a lot of redundancy in the items
– ICAS aims for between 0.80 and 0.85
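Cronbach's alpha can be sketched directly from its defining formula, alpha = k/(k−1) × (1 − Σ item variances / total-score variance); the response data here are hypothetical, and a real analysis would use far more learners and items:

```python
# Hypothetical 0/1 scored responses: rows = learners, columns = items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]

def cronbach_alpha(matrix):
    n, k = len(matrix), len(matrix[0])
    # Population variance of each item's scores
    item_vars = []
    for j in range(k):
        col = [row[j] for row in matrix]
        m = sum(col) / n
        item_vars.append(sum((x - m) ** 2 for x in col) / n)
    # Population variance of the total scores
    totals = [sum(row) for row in matrix]
    mt = sum(totals) / n
    total_var = sum((t - mt) ** 2 for t in totals) / n
    # alpha = k/(k-1) * (1 - sum of item variances / total variance)
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

alpha = cronbach_alpha(responses)  # about 0.55 for this tiny sample
```

With only four items and five learners the value falls in the "poor" band of the guide quoted below; alpha generally rises as more items targeting the same construct are added.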
Reliability and validity
 This is a common diagram (search the internet for images of reliability and validity and you will see what I mean)
– But it is wrong
Reliability and true score
 Under classical test theory we assume that the actual test score is a reflection of the person's hypothetical real or true score (X = T + ε)
 And if that person took the test many times, without learning, then the average of the observed test scores would equal their true score (T = Σ(X)/n)
 We can estimate how closely a particular observed test score captures the true score, where s is the test score standard deviation and α is the test reliability:

se = s √(1 − α)
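The standard error formula can be checked against the Year 3 figures in the table that follows (s = 7.76, α = 0.86); small differences from the tabled values are rounding:

```python
import math

def standard_error(sd, alpha):
    # Standard error of measurement: se = s * sqrt(1 - alpha)
    return sd * math.sqrt(1 - alpha)

se = standard_error(7.76, 0.86)  # Year 3 English figures, roughly 2.90
half_width_95 = 1.96 * se        # half-width of a 95% confidence band
```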
Reliability and true score, English NSW 2016

Year  Std Dev  Alpha  SE    95% Range
02     5.17    0.77   2.49  4.87
03     7.76    0.86   2.92  5.73
04     8.15    0.87   2.88  5.65
05     8.98    0.89   2.99  5.86
06     8.95    0.89   3.02  5.91
07     8.43    0.86   3.15  6.18
08     8.60    0.86   3.17  6.22
09     9.33    0.87   3.43  6.72
10     9.66    0.88   3.38  6.63
11    10.02    0.88   3.43  6.72
12     9.85    0.89   3.34  6.54

The 2009 and 2010 Key Stage 2 tests had reliabilities of 0.9 and higher. Double length, though.
George and Mallery (2003) provide the following rules of thumb: "> .9 Excellent, > .8 Good, > .7 Acceptable, > .6 Questionable, > .5 Poor, and < .5 Unacceptable" (p. 231).
Item response theory
 Essentially assumes that each response tells us something about the person responding and the item itself
– So the analysis is at the item level
 And the person level
– Individual estimates of ability (and uncertainty)
– Analysis requires advanced mathematics and specialist software
 IRT models are "strong models"
– They make strong assumptions
 If these are met, they work well
 If they are not met, then ….
 IRT models are all probabilistic models
– They are mathematical models saying how likely something is to be the case
– Their predecessors were deterministic models (like Guttman's)
Rasch model
 There are many IRT models
 ICAS uses the Rasch model
– The same model that has been used in PISA, TIMSS, Key Stage SATs, and other studies
 The Rasch model is a measurement model
– It has mathematical features that allow claims of measurement (in a philosophical way, like measurement of temperature and weight)
– Some other IRT models are not measurement models
 They summarise the data

P(X = 1 | θ, b) = exp(θ − b) / (1 + exp(θ − b))
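The Rasch probability is a one-line function of the difference between person ability θ and item difficulty b; a minimal sketch:

```python
import math

def rasch_p(theta, b):
    # Probability of a correct response under the Rasch model:
    # P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))
    return math.exp(theta - b) / (1 + math.exp(theta - b))

p_mid = rasch_p(0.0, 0.0)  # learner and item at the same location: p = 0.5
```

When ability equals difficulty the probability is exactly 0.5, which is the interpretation used on the Wright map below.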
Wright Map

P(X = 1 | θ, b) = exp(θ − b) / (1 + exp(θ − b))
[Wright map: items and learners placed side by side on a common logit scale running from −∞ to +∞, marked at −2, −1, 0, +1, +2.]
– Very easy items sit opposite the weakest learners; the hardest items sit opposite the strongest learners
– Where an item and a learner are at the same location, P = 0.5
– An area with many items has a lot of information: the ability estimates of people located there will be most accurate
– An area with few items has little information: the ability estimate of a person located there will have a large error
– People at the same location have the same ability but may have different ability errors (or uncertainty)
ICAS scale
[Figure: the ICAS scale marked at 500, 750, 1000, 1250, 1500, 1750 and 2000, with Years 2 to 7 positioned along it.]
 The ICAS scale extends from −150 to 3000
 We convert IRT logit scores to scale scores
 The ICAS scale score is a transformation of logit scores chosen so that negative numbers are highly unlikely
 And the numbers are such that people are unlikely to confuse them with raw scores or percentages
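Such a transformation is just a linear rescaling of the logit metric; the sketch below is purely illustrative, and the slope and intercept are hypothetical values, not the actual ICAS scaling constants:

```python
def to_scale(logit, slope=100.0, intercept=1000.0):
    # Linear transformation from the logit metric to a reporting scale.
    # The slope and intercept here are HYPOTHETICAL illustration values,
    # not the actual ICAS scaling constants.
    return intercept + slope * logit

scores = [to_scale(t) for t in (-2.0, 0.0, 2.0)]  # [800.0, 1000.0, 1200.0]
```

A large intercept keeps typical logits (roughly −4 to +4) well away from zero, which is how a scale avoids negative numbers and any resemblance to raw scores or percentages.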
Common item equating using vertical links
All ICAS tests within a domain and calendar year:
Paper Intro (2), Paper A (3), Paper B (4), Paper C (5), Paper D (6), Paper E (7), Paper F (8), Paper G (9), Paper H (10), Paper I (11), Paper J (12)
 Each year level's paper has common items with the adjacent year level's paper
 All items and tests within a calendar year are placed onto the same scale
 Vertical equating items are checked
[Plot: checking the measurement qualities of common items from Years 2 and 3]
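One simple way common-item equating can work is mean/mean equating: shift the new form's item difficulties by the difference between the common items' mean difficulties on each form. This sketch is an illustration of the general idea with hypothetical logit values, not necessarily the ICAS procedure:

```python
def mean_mean_shift(common_ref, common_new):
    # Mean/mean equating: the constant added to the new form's item
    # difficulties so the common (link) items have the same average
    # difficulty on both forms.
    return sum(common_ref) / len(common_ref) - sum(common_new) / len(common_new)

# Hypothetical logit difficulties of three common (link) items
ref_difficulties = [0.2, 1.0, -0.4]   # as calibrated on the reference form
new_difficulties = [0.5, 1.3, -0.1]   # as calibrated on the new form

shift = mean_mean_shift(ref_difficulties, new_difficulties)  # -0.3
equated = [b + shift for b in new_difficulties]
```

After the shift the link items have the same mean difficulty on both forms, so all items on the new form can be reported on the reference scale.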
Common item equating using horizontal links
[Diagram: the current year's papers (Intro (2), A (3), B (4), C (5), D (6), E (7), F (8), G (9), H (10), I (11), J (12)) linked to the corresponding papers from past years at each level.]
[Plot: checking horizontal equating – Year 6 to base check]
Model fit – are we measuring achievement?
 Our Rasch analyses produce item and ability parameter estimates
– Each item parameter has one or more fit statistics
 And an estimate of how well the Rasch model is capturing the data
– Or, more strictly, how well the data fit the measurement model (Rasch theoreticians follow this line of argument)
 We generally focus on specific item fit statistics
– Infit – does the item appropriately measure its target?
 This is the most important item fit statistic
– Outfit – does the item appropriately measure learners who are not targeted by the item?
– We want infit and outfit statistics to be between 0.8 and 1.2
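Infit and outfit are mean-square residual statistics; a minimal sketch for one dichotomous item, assuming the model probabilities are already known (the observed responses and probabilities below are hypothetical). Infit is information-weighted, so it is dominated by learners targeted by the item; outfit is the unweighted mean of squared standardized residuals, so off-target learners can inflate it:

```python
def fit_statistics(observed, expected):
    # observed: 0/1 responses to one item; expected: Rasch model
    # probabilities of a correct response for the same learners.
    sq_resid_sum = 0.0   # sum of squared residuals (x - p)^2
    info_sum = 0.0       # sum of item information p(1 - p)
    z2 = []              # squared standardized residuals
    for x, p in zip(observed, expected):
        w = p * (1 - p)
        r2 = (x - p) ** 2
        sq_resid_sum += r2
        info_sum += w
        z2.append(r2 / w)
    infit = sq_resid_sum / info_sum   # information-weighted mean square
    outfit = sum(z2) / len(z2)        # unweighted mean square
    return infit, outfit

infit, outfit = fit_statistics([1, 1, 0, 0], [0.9, 0.7, 0.5, 0.2])
```

Values near 1.0 indicate the residual variation matches the model's expectation; the 0.8 to 1.2 band on the slide brackets that ideal.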
An easy, good-fitting item
We want infit and outfit statistics to be between 0.8 and 1.2
A poorly fitting, difficult item
We want infit and outfit statistics to be between 0.8 and 1.2
Reporting
 At student and school levels
– Schools can slice and dice the data as they want
 The reporting system is largely normative
 But schools can use patterns to inform learning
Looking at response patterns
 When the ICAS questions are arranged from easy to hard, and students are arranged from highest to lowest performance, we expect to see a triangle pattern
 Unless the student is guessing, or another ability (e.g., language proficiency) is very important
Looking at specific response patterns
 The following patterns are typically found:
– Guttman
– Rasch
– Rasch with careless response
– Rasch plus guessing
– Guessing (no pattern)
– Special knowledge (pattern based upon specific curriculum knowledge)
 The ideal Rasch pattern has three zones, arranged in order of item difficulty:
– A zone of items all answered correctly
– A zone of items with some answered correctly and some incorrectly
– A zone of items all answered incorrectly
 The Guttman pattern has two zones only
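A crude sketch of the zone idea: with items ordered easy to hard, count the runs of identical responses. A pure Guttman string has exactly two zones, while a Rasch-like string has a mixed middle zone and therefore more runs (the response strings below are hypothetical):

```python
def zones(response_string):
    # Count runs of identical responses when items are ordered easy -> hard.
    # "1111100000" (pure Guttman) has 2 runs; a Rasch-like string has a
    # mixed middle zone and therefore more runs.
    runs = 1
    for a, b in zip(response_string, response_string[1:]):
        if a != b:
            runs += 1
    return runs

guttman = "1111100000"     # hypothetical: all easy items right, all hard items wrong
rasch_like = "1111010100"  # hypothetical: mixed middle zone
```

Run-counting is only a rough proxy; real pattern diagnosis compares each response against its Rasch probability, but the two-zone versus three-zone contrast shows up clearly even in this simple form.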
ICAS is focused on skills, concepts and strategic thinking
 As part of our validity framework we investigate ICAS's relationship with other measures.
 We need to know how ICAS relates to the Malaysian assessments.
You will need a plan for ICAS
 Wayman (2005): "few would argue that creating more information for educators to use is a negative" (p. 236).
– But it is too easy to be swamped with data
– Or be too busy to use the data
– Or not have a constructive, detailed plan for the use of ICAS
Optimum conditions for data usage
Raths, Kotch, & Gorowara (2009):
– School climate
– Sensitive measures
 Must relate ICAS to the syllabi: evidence should be "curriculum sensitive" and should align with the teacher's educational objectives
– Timely access to evidence
– Buy-in by teachers
– Teacher skills
– Conceptual interpretation for the audience
– Time for teachers in the school day
– Team work