Design Challenges and Solutions for Next Generation Science Assessments:
One State’s Design, Rich Simulations, and One State’s Reflections
Tim O’Neil, Ed.D.
Manager, Psychometric Services
June 2016
NGSS Design Challenges: One State’s Design
Next Generation Science Standards
Next Generation Science Standards
NGSS are K–12 science standards designed to be rich in content and practice. They are arranged in a coherent manner across disciplines and grades to provide all students an internationally-benchmarked science education.
Next Generation Science Standards
Four Domains of NGSS:
• PS- Physical Sciences
• LS- Life Sciences
• ESS- Earth and Space Sciences
• ETS- Engineering, Technology, and Applications of Science
Next Generation Science Standards
Structure of the Domains:
• Domains contain Performance Expectations (PEs) and supporting concepts
• Supporting Concepts
–Science and Engineering Practices (SEP)
–Disciplinary Core Ideas (DCI)
–Crosscutting Concepts (CC)
Next Generation Science Standards
Performance Expectations:
• Articulate what students should know and be able to do at each grade level
• Contain clarifying statements – provide examples and/or additional clarification to the PEs
• Contain assessment boundaries – specify the limits to large scale assessment or “no-no’s”
Next Generation Science Standards
Evidence Statements
MS-PS4-1 Waves and Their Applications in Technologies for Information Transfer – Evidence Statements
Maryland NGSS Science Summative Assessment Design & Challenges
Initial Drivers
• Assessing student performance on NGSS at grades 5 and 8
• Online administration
• More complex, performance-based measurement
• Scores at student level
• Administered over 4 sessions, each to fit within one classroom period
Within context of NGSS the design needs to:
• cover PEs as well as the supporting concepts
• capture the demonstration of supporting claims with evidence
• allow testers to demonstrate an understanding of cross-disciplinary concepts
Principled Design for Efficacy
PDE Process Model
Basic Design
• Stimulus based task sets
• Mixed format with more complex, performance-based item types
• Balanced representation at domain level
• Reasonably able to be administered within allotted testing time
• Embedded field testing assumed in operational design
Content Coverage
• Roughly 40 to 60 PEs identified as assessable within the summative assessment context at grades 5 and 8
• Need to ensure supporting concepts (SEP, DCI, CC) adequately assessed
• Generally, 3 items are needed to adequately assess a given PE plus the supporting concepts
• Given overall administration time allotment and this basic design, coverage of PEs was roughly 25% to 30%
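The coverage arithmetic in the bullets above can be sketched in a few lines of Python. The 40 to 60 assessable PEs, the 3 items per PE, and the 25% to 30% coverage figures come from the slide; the function name and the use of rounding are illustrative assumptions, not part of the Maryland design.

```python
ITEMS_PER_PE = 3  # from the slide: roughly 3 items per PE plus supporting concepts


def sampled_pes(assessable_pes: int, coverage: float) -> int:
    """Approximate number of PEs one administration can sample (assumed rounding)."""
    return round(assessable_pes * coverage)


# Walk the low and high ends of the ranges quoted on the slide.
for total in (40, 60):
    for coverage in (0.25, 0.30):
        n_pes = sampled_pes(total, coverage)
        n_items = n_pes * ITEMS_PER_PE
        print(f"{total} PEs at {coverage:.0%} coverage: about {n_pes} PEs, {n_items} items")
```

Under these assumptions a single administration samples somewhere between about 10 and 18 PEs, which is the gap the later risk discussion is concerned with.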
Content Coverage – Risks
• Three primary risks – based on the assumption that PEs within domain are distinctly different from one another and that basic design samples only 1/4 to 1/3 of PEs in a given administration:
– Instructional impacts
– Score interpretation concerns
– Equating risk
• Need to consider the risks and decide how to address them
• Challenge: Balance psychometric needs and testing time goals
Content Coverage – Risks
Potential negative impact on score interpretation and instruction:
• “Students…perform better on items testing standards that compose a larger fraction of last year’s state test” (Jennings & Bearak, 2014, p. 386)
– Undermines test performance inferences
– Reduced exposure to the state content standards for students
• Koretz (2013, p. 6) discusses
– Unbalanced allocation of instructional time to tested content standards, to the detriment of other valued standards (i.e., “reallocation,” p. 6) and
– Focusing instruction on narrow, incidental attributes of a test, such as item formats (i.e., “coaching,” p. 6)
– These can wreak havoc in interpreting student performance
Additional Risks
Risk to equating:
• Form-to-Form Content Representation – to the extent that PEs are distinctly different, there is a risk that test forms may differ in terms of content representation where a different sampling of PEs is assessed on different forms. This could potentially undermine equating.
• By extension, these risks could manifest in differences in overall test and performance characteristics:
– Reliability
– Underlying structure
– Score distributions
Design Strategies to Reduce Risks
• PE Bundling – a strategy for both NGSS instruction and assessment that groups PEs together based on common elements.
• Shifting content coverage to representation across PE bundles, as opposed to stand-alone PEs, allows for more reasonable sampling of the full domain.
• Given the basic operational test design, coverage of PE bundles is roughly 70%.
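A minimal sketch of why bundling raises effective coverage, assuming each PE is tagged with a single shared element. Only MS-LS1-7 and MS-PS1-5 are named in this deck; the other PE labels, the element tags, and the sampled set are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict

# Hypothetical tagging: PE code -> the common element used to bundle it.
PE_ELEMENTS = {
    "MS-LS1-7": "chemical reactions",
    "MS-PS1-5": "chemical reactions",
    "PE-A": "element X",
    "PE-B": "element X",
    "PE-C": "element Y",
    "PE-D": "element Y",
    "PE-E": "element Z",
    "PE-F": "element Z",
}


def bundle_pes(pe_elements):
    """Group PEs that share a common element into bundles."""
    bundles = defaultdict(list)
    for pe, element in pe_elements.items():
        bundles[element].append(pe)
    return dict(bundles)


def pe_coverage(pe_elements, sampled):
    """Fraction of individual PEs that appear on the form."""
    return len(set(sampled) & set(pe_elements)) / len(pe_elements)


def bundle_coverage(bundles, sampled):
    """Fraction of bundles with at least one sampled PE."""
    sampled = set(sampled)
    hit = sum(1 for pes in bundles.values() if sampled & set(pes))
    return hit / len(bundles)


bundles = bundle_pes(PE_ELEMENTS)
sampled = ["MS-PS1-5", "PE-A", "PE-C"]  # hypothetical form content
print(f"PE coverage: {pe_coverage(PE_ELEMENTS, sampled):.1%}")
print(f"Bundle coverage: {bundle_coverage(bundles, sampled):.1%}")
```

With this toy data, the same three sampled PEs represent well under half of the stand-alone PEs but three of the four bundles, which is the shift in the unit of coverage that the bullets describe.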
PE Bundling
• MS-LS1-7. Develop a model to describe how food is rearranged through chemical reactions forming new molecules that support growth and/or release energy as this matter moves through an organism.
• MS-PS1-5. Develop and use a model to describe how the total number of atoms does not change in a chemical reaction and thus mass is conserved.
PE Bundling
MS-LS1-7 and MS-PS1-5
PE Bundling
MS-LS1-7 and MS-PS1-5
Design Strategies to Reduce Risks
• Multiple Core Forms – using domain level equivalence while varying specific PE bundles across multiple core forms (same number of points and item formats within each)
• Matrix Sampling – allows for increased sampling of PE bundles at school level
• Allows for more PE bundle sampling within a given administration, reducing negative impact on instruction tied to awareness of tested content
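The combination of a common section and matrix sections can be sketched as follows. The bundle labels, form names, and roster are hypothetical, and spiraling forms in roster order is one simple assignment rule assumed here for illustration, not a description of the operational procedure.

```python
from itertools import cycle

# Hypothetical bundle labels. Every student takes the common section;
# the matrix section differs by form.
COMMON = ["B1", "B2", "B3"]
MATRIX = {"Form 1": ["B4", "B5"], "Form 2": ["B6", "B7"]}


def assign_forms(students, form_names):
    """Spiral the forms across students in roster order."""
    return dict(zip(students, cycle(form_names)))


def bundles_seen(form):
    """Bundles a single student encounters on the given form."""
    return set(COMMON) | set(MATRIX[form])


def school_coverage(assignment):
    """Union of bundles seen by all students in a school."""
    seen = set()
    for form in assignment.values():
        seen |= bundles_seen(form)
    return seen


assignment = assign_forms(["s1", "s2", "s3", "s4"], list(MATRIX))
print(len(bundles_seen("Form 1")), "bundles per student")
print(len(school_coverage(assignment)), "bundles across the school")
```

Each student in this toy setup sees five bundles, while the school as a whole sees all seven, so school-level reporting can draw on broader content than any one student's form.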
Design Strategies to Reduce Risks
• Number of Tasks - research on science performance tasks suggests that 10–12 tasks may be necessary for supporting dependable scores and defensible inferences about learners (Gao, Shavelson, and Baxter, 1994)
• Mixed Format Tasks – combining performance tasks with selected-response items can also greatly facilitate use of test equating methods as a means of improving score comparability (Davey, Ferrara, Holland, Shavelson, Webb, and Wise, 2015)
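One common way to reason about how many tasks are needed is the generalizability coefficient for a simple persons-by-tasks design, which divides person variance by person variance plus the single-task relative error variance over the number of tasks. The sketch below assumes that simplified design; the variance components and the target value are hypothetical and are not taken from the cited studies.

```python
def g_coefficient(person_var, error_var, n_tasks):
    """Generalizability coefficient for a persons-by-tasks design.

    error_var is the relative error variance for a single task.
    """
    return person_var / (person_var + error_var / n_tasks)


def min_tasks(person_var, error_var, target, max_tasks=50):
    """Smallest number of tasks whose coefficient reaches the target, if any."""
    for n in range(1, max_tasks + 1):
        if g_coefficient(person_var, error_var, n) >= target:
            return n
    return None


# Hypothetical variance components, for illustration only.
PERSON_VAR = 0.3
ERROR_VAR = 0.9

for n in (2, 6, 10, 12):
    print(n, "tasks:", round(g_coefficient(PERSON_VAR, ERROR_VAR, n), 3))
print("tasks needed for 0.78:", min_tasks(PERSON_VAR, ERROR_VAR, 0.78))
```

The point of the sketch is the shape of the curve: when task-to-task variation is large relative to differences between students, dependability climbs slowly and many tasks are required.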
General Design
• Stimulus based tasks – 6 items per task
• Mixed format with more complex, performance-based item types balanced to meet the rigor of NGSS performance assessments while providing a more tenable foundation to support year-to-year scale maintenance via equating with more basic formats
• Enough tasks to help guarantee dependable student level scores
• Balanced representation at domain level
• Two core operational forms per administration
• Common section produces student level score
• Matrix section produces school level score and helps increase content coverage
• Embedded FT in both common and matrix sections anticipated
Maryland NGSS Pilot Test
Pilot test purpose, method, and design
• Examine how students interact with
– New stimulus types (e.g., simulation, lab data set)
– Familiar item functionalities in unfamiliar performance task format
• Not concerned with whether students know the correct answer; focus on whether students can determine how to interact with the stimulus and assessment activities
• Cognitive labs via WebEx
• Sampling
– Four LEAs, one elementary and one middle school per LEA, two days
– Each school: four students per grade
– Higher achieving LEAs, schools, and students
Cognitive Lab Procedures
• Estimated 60 minutes total per student
• Three roles:
– Facilitators: MSDE staff who made arrangements with schools before, during, and after cog labs; they also logged students onto the WebEx and ABBI and helped them navigate from one assessment activity to the next
– Managers: Pearson science content staff and program managers who solved any problems with administration of the tasks in the ABBI sandbox and managed the WebEx; they were responsible for ensuring that the WebEx recording function was operating
– Researchers: Research scientists conducted the cog labs following the prescribed protocol
• Training
• Several researchers
Cognitive Lab Tasks
• Two stimulus-based tasks developed per grade, each designed to be completed within a single class period (each student was given one task)
• Each task involved 10 items with selected-response, technology enabled, and constructed response formats
• Grade 5 tasks involved an earth science and a wave simulation scenario
• Grade 8 tasks involved a chemistry lab and a reproductive types scenario
• Due to time constraints, administration occurred within the test environment of ABBI (Pearson item authoring tool)
Cognitive Lab Protocols
• Part 1. Introductions and Demographic Interview (Est. 5 minutes)
• Part 2. Think Aloud Modeling and Practice (Est. 5 minutes)
• Part 3. Thinking Out Loud and Responding to the Performance Task (40 minutes)
• Part 4. Cog Lab Debrief (10 minutes)
Cognitive Lab Protocols
Item 1: Observation
• Did student view the stimuli? Yes/No
• How many times?
• Did student read question without error? Yes/No/Did not read out loud
• Did student seem confident about their answer choice or other type of response? Yes/Somewhere in between/Guess/Other
• Did student read through all parts of the item before answering? Yes/No/Other
• Did student go back to the stimuli to respond to question or check answer? Yes/No/Other
• Did anything in particular confuse the student? Specify:
• Did anything in particular interest/engage the student? Specify:
• Did the student think aloud as he/she figured out the stimulus and responded to the test question?
• Other notes, especially about problems with the stimulus or item:
Results
• This test was very engaging to almost all students
• Liked the graphics and animation
• Liked the dynamic item formats
• When students understood the science knowledge, they seemed able to respond easily using the various item functionalities
• Rich/informative stimuli engaging and well received
• In some instances, students spent too much time reviewing stimuli as opposed to using them to respond to the items
• Ideal to minimize scrolling with longer stimuli
Results
• Need to provide opportunity to practice working with the variety of new item formats and test design
• Most students successful at interacting with all item formats
• Several students across grades experienced some issues interacting with new formats when instructions were unclear
Thank you
[email protected]