1
Expanding Adaptive Algorithms in New Ways: RSCAT and Echo-Adapt
Bingnan Jiang and Michelle Barrett
Research Technology, Data Science, and Analytics
2
You may not know us (yet)...
Bingnan Jiang, Ph.D.
Senior Research Scientist
Research Technology, Data Science & Analytics, ACT
• PhD in Electrical & Computer Engineering, Northeastern University, Boston, MA
• Interests: computerized adaptive testing, optimization, data science, Bayesian statistics, software development
• Former:
• Operations Research Scientist, Pacific Metrics, Lakewood, CO
• Operations Research Analyst, Norfolk Southern, Atlanta, GA
Michelle Barrett, Ph.D.
Vice President
Research Technology, Data Science & Analytics, ACT
Chair, AI and Adaptive Technologies, IEEE Industry Consortium on Learning Engineering
• PhD, Research Methodology, Measurement & Data Analysis, University of Twente
• Graduate Certificate in Large-Scale Assessment, University of Maryland College Park
• Certifications in Agile methods, Lean/Six Sigma, Data Science
• Former:
• Middle & high school mathematics teacher
• Sr. Consultant, Colorado Department of Education
• Ed Technologist, specializing in Assessment Technology
Outline
• Adaptive algorithms for learning and assessment
• Computerized adaptive testing with the shadow-test approach
• Ways to use this approach
• RSCAT overview
• RSCAT demo
• Echo-Adapt overview (demo upon request post-session)
3
4
Learning & Assessment
5
Learning...
• Longitudinal
• Focus: process
• Statistical models: Bayesian Knowledge Tracing
Assessment...
• Cross-sectional
• Focus: outcome
• Statistical models: Item Response Theory
Both can be adaptive...
6
Statistical Model(s) for CAT
Item Response Theory (IRT)
• Acknowledges differences in item difficulty, discrimination, and other attributes, depending on the specific model
• Places items and examinees on the same difficulty/ability scale
• Allows for comparison of ability even when different items are administered (e.g., pre/post, different students)
• In adaptive testing, we often MAXIMIZE INFORMATION, selecting the item most informative at the current ability estimate
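To make the selection rule concrete, here is a minimal R sketch of maximum-information item selection under a 2PL model; the item parameters and ability estimate are illustrative, not from an operational pool.

```r
# Minimal sketch: pick the most informative item at the current ability
# estimate under a 2PL model (illustrative parameters).
item_info <- function(theta, a, b) {
  p <- 1 / (1 + exp(-a * (theta - b)))  # probability of a correct response
  a^2 * p * (1 - p)                     # Fisher information at theta
}

a <- c(1.2, 0.8, 1.5, 1.0)   # discrimination parameters
b <- c(-1.0, 0.0, 0.5, 1.5)  # difficulty parameters
theta_hat <- 0.3             # current ability estimate

which.max(item_info(theta_hat, a, b))  # index of the next item to administer
```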
7
Shadow Test Approach
A Fundamental Dilemma in CAT
8
CAT must simultaneously:
• Administer items sequentially to maximize information
• Meet content and other specifications
9
The Shadow Test Approach Solves this Dilemma and Supports Other Operational Needs
Shadow-Test CAT (van der Linden, 2006) supports:
• Optimal item administration
• Efficient item bank use (exposure control)
• Content specification compliance
• Administering new items (field testing)
A Shadow-Test Selects Optimal Items While Conforming to the Test Blueprint
[Diagram: from an item pool (Q1 ... Q1000), a shadow test is assembled, consisting of the items already administered plus the rest of the optimal test (unseen by test takers). The next optimal item i is selected from the shadow test based on:
• Test blueprint constraints
• Real-time constraints
• The test taker's updated ability
while maximizing accuracy and respecting item order and item-passage associations.]
10
11
Shadow Tests Are Assembled Dynamically Throughout Adaptive Testing
van der Linden (2006)
Shadow-Test Assembly Modeled as Mixed Integer Programming (MIP)
12
Objective Function
• Maximize test information: \(\text{maximize} \sum_i I_i(\hat{\theta})\, x_i\)
Constraints
• Test length: \(\sum_i x_i = 50\)
• Items of specific content: \(10 \le \sum_{i \in V_c} x_i \le 15\)
Decision Variables
• Item selection: binary variables \(x_i \in \{0, 1\}\) for every item \(i\)
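A minimal sketch of this MIP in R with the open-source lpSolve package; the pool size, information values, and content labels are simulated stand-ins. In operation, the model is re-solved after each response, with already-administered items fixed in the solution.

```r
# Shadow-test MIP sketch with lpSolve; all data are simulated stand-ins.
library(lpSolve)

set.seed(1)
n_items <- 200
info    <- runif(n_items, 0.1, 2.0)       # I_i at the current theta estimate
content <- sample(1:4, n_items, TRUE)     # content category per item
test_length <- 50

const_mat <- rbind(
  rep(1, n_items),                # test length: sum of x_i
  as.numeric(content == 1),       # content-1 count, lower bound
  as.numeric(content == 1)        # content-1 count, upper bound
)
sol <- lp(direction = "max", objective.in = info,
          const.mat = const_mat, const.dir = c("=", ">=", "<="),
          const.rhs = c(test_length, 10, 15), all.bin = TRUE)

shadow_test <- which(sol$solution == 1)   # the 50 selected items
```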
13
Seems Complicated?
We have options…
14
15
Ways to Implement Shadow-Test CAT
• Open-source package: easy access, flexibility, standalone usability
• At scale: scalability, interoperability, optimized performance, security & reliability
RSCAT Use Cases
1. Academic Research
• Test configuration and simulation
• Investigate item pool and test configuration interactions
• Evaluate measurement error, bias, etc.
2. Ed Tech R&D
• Unpack the algorithm "black box", understand methods
• Evaluate ways to improve outcome measures in products
16
17
Demo
18
Configure CAT and Run Simulations in RSCAT
Software Architecture
• Shiny GUI: no coding work, easy to use, visualization
• R APIs: advanced use, integrated with R programs
• Java APIs: advanced use, integrated with Java programs
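Getting into the Shiny GUI from R is a one-liner; the sketch below assumes the package's entry point is launchApp(), which should be confirmed against the README on GitHub before relying on it.

```r
# Hypothetical sketch of starting RSCAT's Shiny GUI; launchApp() is an
# assumption here: confirm the entry point in the package documentation.
# install.packages("RSCAT")  # once on CRAN; until then, install from GitHub
library(RSCAT)
launchApp()  # opens the GUI for CAT configuration and simulation
```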
19
Available to the Public
License: CC BY-NC 4.0
Email: [email protected]
20
GitHub: https://github.com/act-org/RSCAT
CRAN: coming soon!
21
From Open Source to Scale: Highlights
Feature | RSCAT | Echo-Adapt
MIP Solver | Licensed by user; open-source, community, and commercial solvers available | Highly performant commercial solver
Exposure Control | At item level; overall only | At item and/or passage level; overall and conditional on ability level
User Interface | Shiny app | Web app
Test Configuration Management | Store locally | Lock and release configurations for live delivery
Scalability | Runs locally | Runs in cloud; 40,000+ concurrent examinees, <500 ms response
Field Testing | Not supported | Embedded in operational testing; online IRT calibration for new items
22
References
• Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4), 431-444.
• Durlach, P. J. & Spain, R. D. (2014). Framework for Instructional Technology: Methods of Implementing Adaptive Training and Education. Technical Report 1335. U.S. Army Research Institute for the Behavioral and Social Sciences. www.dtic.mil/docs/citations/ADA597411
• Meindl, B. & Templ, M. (2012). Analysis of commercial and free and open source solvers for linear optimization problems. Retrieved April 2017 from http://www.statistik.tuwien.ac.at/forschung/CS/CS-2012-1complete.pdf.
• Mittelmann, H. (2011). Benchmarks for Optimization Software. Retrieved April 2017 from http://plato.asu.edu/bench.html.
• Molnar, M. (2017). Market is booming for digital formative assessments. Retrieved from https://www.edweek.org/ew/articles/2017/05/24/market-is-booming-for-digital-formative-assessments.html
• van der Linden, W. J. (2006). Linear models for optimal test design. Springer Science & Business Media.
• van der Linden, W. J., & Veldkamp, B. P. (2007). Conditional item-exposure control in adaptive testing using item-ineligibility probabilities. Journal of Educational and Behavioral Statistics, 32(4), 398-418.
23
Thank you! Together, we're helping people achieve success.
24
Technical Appendix
25
Assessment Helps to Close the Learning Gap
[Diagram: a learning cycle in which assessment establishes learning status and identifies gaps, and a learning plan with learning guidance closes the gaps.]
Technical Appendix
26
From Open Source to Scale
Feature | RSCAT | Echo-Adapt®
User Interface | Shiny app, installation guide | Web app, user manual, integration guide, technical appendix
Constraint Types | Basic | Expanded
Objective Functions Available | Maximum Fisher information | Maximum Fisher information
Psychometric Model | IRT | IRT
Scoring Method | EAP | EAP, MCMC
Exposure Control | Item, overall | Item/passage, overall/conditional
Simulate Test Administration | Sequential | Load-balanced and auto-scaled
Field Testing | Not available | Embedded in operational testing with online calibration
Test Configuration Management | Store locally; copy, save, edit | Lock & release for test administration
Performance | Limited optimization | Fully optimized
Scalability | Local installation | 40,000+ concurrent examinees with <500 ms response time
Shadow Test MIP Solver | Licensed by user; open-source, community, and commercial solvers available with different performance profiles | Highly performant commercial solver
Interoperability | N/A | With test delivery engines*: IMS Global aQTI CAT standard
*Some test delivery engines are LTI and Caliper compliant.
A MIP Is Built and Solved in Five Steps
1. Choice of decision variables: integer variables, e.g., item selection binary variables
2. Modeling of the objective function: maximize assessment accuracy (test Fisher information at the ability estimate)
3. Modeling of constraints, e.g.:
• The number of items of specific content should be in a range
• The average word count of the test should be in a range
• Enemy items
4. Input the MIP model into a MIP solver, e.g., FICO Xpress
5. Evaluation of solutions
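As an illustration of step 3, the self-contained lpSolve sketch below encodes a test-length constraint, an average word-count range, and an enemy-item pair; every number in it is made up.

```r
# Self-contained sketch of common constraint types; all values are made up.
library(lpSolve)

set.seed(3)
n     <- 60
info  <- runif(n, 0.2, 2.0)           # objective: information at theta-hat
words <- sample(80:140, n, TRUE)      # word count per item
L     <- 10                           # test length

const_mat <- rbind(
  rep(1, n),                            # sum x_i = L
  words / L,                            # average word count, lower bound
  words / L,                            # average word count, upper bound
  as.numeric(seq_len(n) %in% c(3, 17))  # enemy items 3 and 17: at most one
)
sol <- lp("max", info, const_mat, c("=", ">=", "<=", "<="),
          c(L, 100, 130, 1), all.bin = TRUE)
which(sol$solution == 1)              # assembled test
```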
27
Technical Appendix
28
Scoring Method
• Expected a posteriori (EAP)
− The expected ability under the posterior distribution, computed with quadrature points
− Bock & Mislevy (1982)
− Easy to implement and effective
• Markov chain Monte Carlo (MCMC)
− A fully Bayesian approach based on Markov chains
− Gibbs sampling and Metropolis-Hastings
− Works well for items with unknown parameters
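A minimal base-R sketch of EAP scoring with quadrature under a 2PL model; the prior, quadrature grid, and item parameters are illustrative.

```r
# EAP ability estimate via quadrature (Bock & Mislevy, 1982); 2PL model,
# illustrative parameters.
eap_estimate <- function(resp, a, b, n_quad = 41) {
  theta <- seq(-4, 4, length.out = n_quad)   # quadrature points
  prior <- dnorm(theta)                      # standard normal prior
  lik <- sapply(theta, function(t) {         # likelihood of the response pattern
    p <- 1 / (1 + exp(-a * (t - b)))
    prod(p^resp * (1 - p)^(1 - resp))
  })
  post <- lik * prior
  sum(theta * post) / sum(post)              # posterior mean
}

eap_estimate(resp = c(1, 0, 1, 1, 0),
             a = c(1.2, 0.8, 1.5, 1.0, 0.9),
             b = c(-0.5, 0.0, 0.3, 0.8, 1.2))
```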
Technical Appendix
29
MCMC for Scoring and Field Testing
[Diagram: across testing stages 1 to K, the responses feed a Markov chain; a Gibbs sampler with a Metropolis-Hastings step alternately updates the examinee's ability \(\theta_j\) and the field-test item parameters \(\eta_f\) after each administered item.]
Technical Appendix
30
MCMC Scoring
• After examinee \(j\) submits response \(u_{i_k}\) to operational item \(i_k\) in stage \(k\), the posterior of \(\theta_j\) is updated with a Metropolis-Hastings step:
1. Draw a candidate \(\theta^{*}\) from the normal proposal density \(q(\theta \mid \theta^{(t-1)})\).
2. Accept or reject \(\theta^{*}\) with probability
\[
A = \min\left\{ \frac{\pi(\theta^{*})\, P(\theta^{*}; \eta)^{u_{i_k}} \bigl(1 - P(\theta^{*}; \eta)\bigr)^{1-u_{i_k}}}{\pi(\theta^{(t-1)})\, P(\theta^{(t-1)}; \eta)^{u_{i_k}} \bigl(1 - P(\theta^{(t-1)}; \eta)\bigr)^{1-u_{i_k}}},\; 1 \right\},
\]
setting \(\theta^{(t)} = \theta^{*}\) on acceptance and \(\theta^{(t)} = \theta^{(t-1)}\) otherwise.
3. Resample \(\eta^{(t)}\) from its posterior distribution and move to the next iteration, \(t \leftarrow t + 1\).
4. Upon stationarity, the retained draws \(\Theta_j = (\theta^{(1)}, \dots, \theta^{(S)})\) form the posterior sample used for scoring.
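The same step in code: a minimal base-R Metropolis-Hastings sampler for \(\theta\) under a 2PL model with a standard normal prior; the proposal SD, iteration count, burn-in, and parameters are illustrative choices.

```r
# Metropolis-Hastings update of theta given responses (2PL; illustrative
# values). The normal proposal is symmetric, so the acceptance ratio
# reduces to the posterior ratio.
mh_theta <- function(resp, a, b, n_iter = 2000, prop_sd = 0.5) {
  log_post <- function(t) {
    p <- 1 / (1 + exp(-a * (t - b)))
    sum(resp * log(p) + (1 - resp) * log(1 - p)) + dnorm(t, log = TRUE)
  }
  draws <- numeric(n_iter)
  theta <- 0
  for (s in seq_len(n_iter)) {
    cand <- rnorm(1, theta, prop_sd)                    # normal proposal
    if (log(runif(1)) < log_post(cand) - log_post(theta)) theta <- cand
    draws[s] <- theta
  }
  draws
}

post <- mh_theta(c(1, 0, 1, 1, 0),
                 a = c(1.2, 0.8, 1.5, 1.0, 0.9),
                 b = c(-0.5, 0.0, 0.3, 0.8, 1.2))
mean(post[-(1:500)])   # posterior mean after burn-in
```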
Technical Appendix
31
A Shadow Test Selects Operational and Field-Test Items Simultaneously
Shadow-Test MIP Objective Function
Maximize the sum of posterior expected item information (operational items) plus the sum of D-criteria (field-test items):
\[
\text{maximize} \; \sum_{i} \bar{I}_i\, x_i + \sum_{f} \bar{D}_f\, x_f,
\]
where, averaging over the posterior ability draws \(\theta^{(1)}, \dots, \theta^{(S)}\),
\[
\bar{I}_i \equiv S^{-1} \sum_{s=1}^{S} I\bigl(\theta^{(s)}; \eta_i\bigr), \qquad
\bar{D}_f \equiv S^{-1} \sum_{s=1}^{S} \Bigl[ \det\bigl( \mathbf{I}^{-1}(\eta_f) + \mathbf{I}\bigl(\eta_f; \theta^{(s)}\bigr) \bigr) - \det\bigl( \mathbf{I}^{-1}(\eta_f) \bigr) \Bigr],
\]
with \(\mathbf{I}^{-1}(\eta_f)\) the current precision (inverse covariance) matrix of the field-test item parameters.
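A sketch of the field-test D-criterion in R under a 2PL parameterization \(\eta_f = (a, b)\); the posterior draws, current precision matrix, and item parameters are illustrative assumptions.

```r
# D-criterion sketch: expected determinant gain of the (a, b) information
# matrix, averaged over posterior ability draws (illustrative values).
d_criterion <- function(theta_draws, a, b, info_current) {
  gains <- sapply(theta_draws, function(t) {
    p <- 1 / (1 + exp(-a * (t - b)))
    g <- c(t - b, -a)                    # gradient of the 2PL logit w.r.t. (a, b)
    I_ab <- p * (1 - p) * tcrossprod(g)  # item information matrix for (a, b)
    det(info_current + I_ab) - det(info_current)
  })
  mean(gains)
}

theta_draws <- rnorm(500, 0.3, 0.4)   # posterior ability sample
d_criterion(theta_draws, a = 1.1, b = 0.2, info_current = diag(2))
```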
Technical Appendix
32
MCMC to Update Field-Test Item Parameters
• After examinee \(j\) responds to field-test item \(f\) with response \(u_{fj}\):
1. Draw a parameter candidate \(\eta_f^{*}\) from the normal proposal density \(q(\eta_f \mid \eta_f^{(t-1)})\).
2. Accept or reject \(\eta_f^{*}\) with probability
\[
A' = \min\left\{ \frac{\pi(\eta_f^{*})\, P(\theta^{(t)}; \eta_f^{*})^{u_{fj}} \bigl(1 - P(\theta^{(t)}; \eta_f^{*})\bigr)^{1-u_{fj}}}{\pi(\eta_f^{(t-1)})\, P(\theta^{(t)}; \eta_f^{(t-1)})^{u_{fj}} \bigl(1 - P(\theta^{(t)}; \eta_f^{(t-1)})\bigr)^{1-u_{fj}}},\; 1 \right\},
\]
setting \(\eta_f^{(t)} = \eta_f^{*}\) on acceptance and \(\eta_f^{(t)} = \eta_f^{(t-1)}\) otherwise.
3. Resample \(\theta^{(t)}\) from the samples saved through the MCMC ability update process and move to the next iteration, \(t \leftarrow t + 1\).
4. Upon stationarity, the retained draws \(\mathrm{H}_f = (\eta_f^{(1)}, \dots, \eta_f^{(S)})\) update the posterior \(\pi(\eta_f)\).
Technical Appendix
33
Exposure Control: Item-Ineligibility Constraints
• Eligibility probability of item/passage \(i\) in a theta range \(k\):
\[
\hat{P}^{(j+1)}(E_i \mid \theta_k) = \min\left( \frac{r^{\max}\, \varepsilon_{ijk}}{\alpha_{ijk}},\; 1 \right), \quad \text{for } \alpha_{ijk} > 0,
\]
where
\(\alpha_{ijk}\): the number of examinees through \(j\) who visited theta range \(k\) and took item/passage \(i\)
\(\varepsilon_{ijk}\): the number of examinees through \(j\) who visited theta range \(k\) when item/passage \(i\) was eligible
\(r^{\max}\): the exposure goal rate
van der Linden and Veldkamp (2007)
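In code, the update and the eligibility experiment are a few lines; the counts and goal rate below are illustrative.

```r
# Item-ineligibility probability update and eligibility experiment
# (van der Linden & Veldkamp, 2007); counts are illustrative.
r_max   <- 0.25    # exposure goal rate
alpha   <- 40      # examinees in theta range k who took item i
epsilon <- 120     # examinees in theta range k while item i was eligible

p_elig <- if (alpha > 0) min(r_max * epsilon / alpha, 1) else 1

X <- rbinom(1, size = 1, prob = p_elig)  # Bernoulli eligibility experiment
ineligible <- (X == 0)                   # item i ineligible at theta range k
```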
Technical Appendix
34
Exposure Control: Item-Ineligibility Constraints (Cont’d)
• Eligibility experiment
\(X_{ik} \sim B(1, p)\), where \(p = \hat{P}(E_i \mid \theta_k)\).
If \(X_{ik} = 0\), then item/passage \(i\) is ineligible at theta interval \(k\).
• Ineligibility soft constraint: penalize the selection of ineligible items in the objective,
\[
\text{maximize} \sum_{i=1}^{I} I_i(\hat{\theta})\, x_i - M \sum_{i \in S_j} x_i,
\]
where
\(I_i(\hat{\theta})\): the Fisher information of item/passage \(i\) at the current ability estimate
\(S_j\): the set of items/passages ineligible for examinee \(j\)
\(M\): the penalty for selecting ineligible items/passages
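The soft constraint amounts to shrinking the objective coefficients of currently ineligible items, as in this small sketch (all values illustrative):

```r
# Ineligibility soft constraint: subtract a large penalty M from the
# objective coefficient of each ineligible item (illustrative values).
set.seed(4)
n      <- 50
info   <- runif(n, 0.2, 2.0)        # I_i(theta-hat)
p_elig <- runif(n, 0.6, 1.0)        # eligibility probabilities from the update
X      <- rbinom(n, 1, p_elig)      # per-item eligibility experiment
M      <- 10                        # penalty dominating any information value

objective <- info - M * (X == 0)    # feed into the shadow-test MIP objective
```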
35
About Us
36
Mission-Driven
Helping people achieve education and workplace success.
37
We create products that genuinely help our customers succeed.
Our Value Proposition
Authority: We leverage our research and credibility to validate and certify.
Personalization: We provide portable, personalized experiences.
Integration: We embed our assessments into existing processes and cycles.
38
About Us
Our Guiding Principles
Inclusive: We do everything we can to level the playing field for everyone, regardless of needs, backgrounds, or resources.
Holistic: We assess and appreciate each person's unique traits and skills, to help navigate toward college and career success.
Transformational: We lead the industry through our research and technology, constantly evolving as an integral part of the learning process.
We strive to always be…
39
ACT Today
• Serving 3rd Grade - Career
• Customers in 50 states and 130+ countries
• 15M assessments in FY16
• 60% of 2017 graduating class took the ACT
• Fee waivers for 700,000 underserved students: $36M +
• More than 4 million National Career Readiness Certificates awarded
• 1000+ employees, in 37 states
40
Founded in Iowa City, 1959
Products and Services
ACT is widely known for the ACT test, but that is just one aspect of the work we do...
41
42
Students &
Parents
ACT can help you plan your future, prepare for college and career, and achieve success.
K-12 Educators &
Administrators
ACT helps you track student progress and prepare them for success through high school and beyond.
Postsecondary
Professionals
ACT solutions can help you find, attract, place and retain students at your school.
Job Seekers &
Employers
ACT workforce development solutions help job seekers, employers, and business leaders achieve career and business goals.
43
ACT Product Portfolio
[Product portfolio slide; product names appear as logos. Solutions and annual volume (est.):]
• High school: 525K
• 3rd grade to early high school: 5.3M summative, 3.8M interim, 861K classroom
• High school: 4M
• High school: access to enrollment and scholarship opportunities
• Measures foundational, soft skills: 2M
• Improve the skills essential to workplace success: 276K individual learners
44
[Product portfolio slide, continued; product names appear as logos. Solutions and annual volume (est.):]
• Measures social and emotional learning (SEL) skills: high school
• Measures/certifies essential work skills: 11th grade through higher education
• Measures/certifies work skills in a community: 380+ participating counties
• 4M+ awarded; 15,000+ employers recognized
45
Innovative Services & Solutions
46
Growing a portfolio of solutions to advance our mission.
Research-BasedWe use data and research to drive policy, product and business decisions.
47
Research lies at the heart of everything we do.
ACT research guides thought leadership and drives solutions:
• ACT College and Career Readiness Standards
• ACT College Readiness Benchmarks
• Broad Definitions of Readiness
• ACT Policy Platforms
48
We are industry thought-leaders who continually reinvest in our research.
Among our top findings:
49
Readiness Matters
Early Monitoring Matters
Multiple Dimensions Matter
50
Policy Platforms
ACT has articulated policy recommendations in the form of policy platforms in three areas: K–12 education, postsecondary education, and workforce development.
NonprofitWe re-invest in research, programs and services to support our mission.
51
Through purposeful investments, employee engagement, and thoughtful advocacy, the Center for Equity in Learning supports innovative partnerships, initiatives, campaigns, and programs that help young people succeed in education and the workplace.
Follow us @ACTEquity
52
Equity in LearningWe do everything we can to level the playing field for everyone, regardless of needs, backgrounds, or resources.
We believe success is different for everyone.
No matter where you’ve been, where you are, or where you want to go, ACT can help.
53
54
ACTNext supports ACT by pursuing and developing a research agenda that integrates the most recent findings in psychometrics, statistics, assessment design, analytics, measurement, educational data mining, and technological innovation.
ACTNext
Research and Development + Business Innovation Center
Artificial Intelligence (AI) in assessment
• AI scoring – essays, short answer, open-ended math
• Diagnostics, error pattern recognition
• Create optimal learning pathways
• Automated item generation
• Test security: online proctoring to proctorless (anytime, anywhere)
• Real-time authoritative, personalized, integrated instruction, tutoring and advising at scale
55
We’ve disrupted the industry once… …and we’re ready to do it again.
Why? Because at ACT, we’re passionate about what we do and determined to help people achieve education and workplace success.
56
Champion the ACT Mission
Be a voice, make a difference.
57
58
• Represented in all 50 states and the District of Columbia
• More than 10,000 members—and growing
• 640 actively engaged state council members
• 20 elected Steering Committee members
ACTState Organizations
A unique network of teachers, counselors, administrators, enrollment advisors and business professionals.
Work Ready Communities
ACT Work Ready Communities empowers states, regions and counties with data, processes and tools that drive economic growth.
National Career Readiness Certificate (NCRC®) measures and closes the skills gap, building common frameworks that link, align, and match workforce development efforts.
Total Certified Counties: 380+
Employers Supporting: 15,000+
Jobs Profiled: 21,000+
NCRC Total: 4 Million+
59
60
ACT College & Career Readiness Champions
Established in 2013 to create awareness and celebrate achievement in college and career readiness for all.
Help advance our mission
Be a voice, make a difference.
Connect on social.
61
Visit ACT.org