TRANSCRIPT
2013 NCES MIS CONFERENCE
TITLE: Data Linking for Analytics—K–12 to Community College to University
10:15am, February 14, 2013
Watson (IEBC), Osumi (UH), Ikenaga (UH)
Introductions
Jean Osumi, Senior Associate for Academic Policy and Evaluation, Hawaii P-20 Partnerships for Education, University of Hawaii
Todd Ikenaga, SLDS Program Manager, Hawaii P-20 Partnerships for Education, University of Hawaii
John Watson, Director of Analytics, Institute for Evidence Based Change (California)
Agenda
Background, Systems, Approach
HI-PASS History, Current Effort, Data Sources
Cross-Segment Data Linking
Reporting
Progress, Next Steps
Our first multi-segment project: Cal-PASS
Started in San Diego in 1998
Became a state-funded project in 2003
Goals:
◦ Collect actionable data
◦ Link primary, secondary and post-secondary institutions on a regional basis
◦ Track students from one segment to the next
Using Data From A Systems Perspective
Educational System
Technology and Research Expertise
Organizational Habits
Human Judgments and Behavior
Keys to Data Use
Focus on a few key metrics
Key Performance Indicators, a good way to start
Focus on the goal – student learning and completion
Track cohorts, not just snapshots
◦ Look at leakage points
Use data as a way to improve, not to punish
Most important – tell a story
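One way to make "track cohorts, not just snapshots" concrete is to follow a single entering cohort through an ordered set of milestones and count how many students are lost at each step. The sketch below is illustrative only: the milestone names, the student records, and the function names are hypothetical and are not taken from Cal-PASS or HI-PASS.

```python
from collections import OrderedDict

# Hypothetical milestones, in the order a cohort is expected to reach them.
MILESTONES = ["hs_graduate", "college_enroll", "college_level_math", "degree"]

def cohort_funnel(cohort):
    """Count how many students in one cohort reached each milestone.

    `cohort` maps a student id to the set of milestones that student reached.
    A student counts toward a milestone only if all earlier ones were reached.
    """
    counts = OrderedDict((m, 0) for m in MILESTONES)
    for reached in cohort.values():
        for m in MILESTONES:
            if m not in reached:
                break
            counts[m] += 1
    return counts

def leakage_points(counts):
    """Return (from_milestone, to_milestone, students_lost) for each step."""
    steps = list(counts.items())
    return [(a, b, na - nb) for (a, na), (b, nb) in zip(steps, steps[1:])]

# Made-up cohort of four students, for illustration only.
cohort_2008 = {
    "s1": {"hs_graduate", "college_enroll", "college_level_math", "degree"},
    "s2": {"hs_graduate", "college_enroll"},
    "s3": {"hs_graduate"},
    "s4": {"hs_graduate", "college_enroll", "college_level_math"},
}
funnel = cohort_funnel(cohort_2008)
losses = leakage_points(funnel)
```

A snapshot would report only how many students are enrolled in a given term; the funnel shows where the same group of students drops out between milestones.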
Systems
Main Warehouse
Research DB
Validation > Loading
Web Site/Portal
Reporting
In-SITES Tools (Development | Published)
ETL > Data mart > Data Store
Data vs. Use
We got it wrong – need to focus on consumer; less on the data
Ron Thomas rules for Data Analysis…
◦ We are in the knowledge business – not the data business
◦ Data is about improvement, in particular improvement in instruction
◦ A protocol for using data is important
◦ Must build capacity of practitioners to acquire and use data
HI-PASS Initial Groundwork
Started with two groups with different purposes: Maui (PLC) & Hawaii P-20 (assess statewide impact)
2009 Statewide Forum on Longitudinal Data
◦ Top priorities that emerged were data governance and access to data, which drove the overarching MOU
One by one, data sources and funding became clear – all focusing on P-20 as an end goal
College and Career Readiness Indicators
Completed for every public high school
Classes of 2008, 2009, 2010, 2011
Measures by school
◦ College access nationwide
◦ SAT scores
◦ Percentage of completion of the BOE Recognition Diploma
◦ College level work: Advanced Placement and Running Start
◦ College level and remedial/developmental enrollment for Math and English (UH only)
Challenges
MOUs
Lawyers
Changing culture
Data Quality
Sustainability
Federal and State regulations
Multi-segment linking
CA: Multiple IDs across segments
K-12 > CC
◦ Encrypted provided derived key
◦ Derived Key
K-12 > University
◦ Encrypted provided derived key
◦ Derived Key
CC > University
◦ Encrypted provided derived key
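The slides do not say how the derived key is built or how it is encrypted. As a minimal sketch of the general idea, the example below assumes the key is formed from normalized name fragments, date of birth, and gender, and that it is protected with a keyed one-way hash (HMAC-SHA256) rather than reversible encryption, so that two segments can link records without exchanging raw identifiers. The field choices, the secret, and the function names are all hypothetical.

```python
import hashlib
import hmac
import unicodedata

def normalize(text):
    """Uppercase, strip accents and punctuation, and keep letters A-Z only."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed.upper() if "A" <= ch <= "Z")

def derived_key(first, last, dob, gender):
    """Assumed recipe: first 3 letters of each name + DOB (YYYYMMDD) + gender."""
    return "|".join([normalize(last)[:3], normalize(first)[:3],
                     dob.replace("-", ""), gender.upper()[:1]])

def encrypted_key(key, secret):
    """Keyed one-way hash, so the shared value does not expose the inputs."""
    return hmac.new(secret, key.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical secret shared by the participating segments.
SECRET = b"hypothetical-shared-secret"

# The same (made-up) student as recorded by a K-12 district and a community college.
k12 = encrypted_key(derived_key("José", "O'Brien", "1995-03-07", "m"), SECRET)
cc = encrypted_key(derived_key("Jose", "OBRIEN", "1995-03-07", "M"), SECRET)
```

Because both records normalize to the same derived key, the two hashed values are equal and the records can be linked across segments even though the spelling differs.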
Multi-segment linking
Texas
◦ Texas Pathways (Higher Ed Coordinating Board)
  Encrypted identifier
◦ GC-PASS (Gulf Coast-PASS)
  Provided encrypted identifier > derived key
Deduplication Techniques
There are times when we suspect duplicate records, or specific keys aren’t available.
Especially found in cross-segment situations
Some instances:
1) name change
2) typos in name or birthdate
3) detecting false matches
4) detecting CONFLICTING IDs
5) finding transposed names across institutions
Deduplication Techniques
Remedy: multiple-pass deduplication, including:
◦ Creation of metakey based on various techniques
◦ These agree with primary derived keys 95% of the time
◦ Current Method (23 stages):
  Rule-based cleaning
  Comparison vectors
  Cosine similarity
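The slide names the techniques but not their details. As a rough illustration of how the three listed pieces can fit together, the sketch below cleans each field with simple rules, scores each field of a candidate record pair with cosine similarity over character-bigram counts, and collects those scores into a comparison vector. The cleaning rules, the use of bigrams, the field names, and the sample records are assumptions, not the presenters' 23-stage method.

```python
import math
from collections import Counter

def clean(value):
    """Rule-based cleaning (assumed rules): uppercase, keep letters and digits."""
    return "".join(ch for ch in value.upper() if ch.isalnum())

def bigrams(value):
    """Character-bigram counts of a cleaned string."""
    s = clean(value)
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def cosine_similarity(a, b):
    """Cosine of the angle between two bigram count vectors (0.0 to 1.0)."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm = math.sqrt(sum(c * c for c in va.values()) *
                     sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def comparison_vector(rec_a, rec_b, fields=("last", "first", "dob")):
    """One similarity score per field for a candidate pair of records."""
    return [cosine_similarity(rec_a[f], rec_b[f]) for f in fields]

# Made-up records: the last name has two transposed letters.
a = {"last": "Kealoha", "first": "Leilani", "dob": "1994-11-02"}
b = {"last": "Kealoah", "first": "Leilani", "dob": "1994-11-02"}
vector = comparison_vector(a, b)
```

A later stage could compare each score in the vector against a threshold, or combine the scores, to decide whether the pair is a likely duplicate; that decision rule is not specified in the slides.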
Labor Data: Promises, Pitfalls
For an increasing number of projects, there is the hope of understanding what happens to student cohorts and populations as they leave school
Labor data can be the answer
◦ Additional MOUs
◦ Data formats can vary
◦ Data security concerns lead to carefully planned processing
◦ Results promising. Example: Coachella Valley, CA
HI Data Linking
Demographic matching across K-12 to postsecondary, mainly using name, date of birth and gender in different combinations, with the strictest criteria used first.
Match 1 - last name, first name, dob, gender
Match 2 - last name, first name, dob
Match 3 - last name, first name (embedded), dob, gender
Match 4 - last name (embedded), first name, dob, gender
Match 5 - last name (embedded), first name (embedded), dob, gender
Match 6 - last name, first name (first 3), dob, gender
Match 7 - last name, first name (first 3), dob (month/year), gender
Match 8 - last name (first 3), first name (first 3), dob, gender
Match 9 - last name (first 3), first name (first 3), dob (month/year), gender
Match 10 - last name, first name (first 3), dob (month/day), gender
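The cascade above, with the strictest criteria tried first, can be sketched as a list of passes applied in order, where a record linked in an earlier pass is not reconsidered later. The example implements only Matches 1, 2, 3, 6 and 7. It assumes that "embedded" means one name is contained in the other and that dates are ISO strings (YYYY-MM-DD); the record layout, function names, and sample data are hypothetical.

```python
def same(a, b):
    return a == b

def embedded(a, b):
    """Assumed meaning of 'embedded': one name is contained in the other."""
    return bool(a) and bool(b) and (a in b or b in a)

def first3(a, b):
    return a[:3] == b[:3]

def month_year(a, b):
    # Dates are ISO strings (YYYY-MM-DD); compare year and month only.
    return a[:7] == b[:7]

def ignore(a, b):
    # Field is not used in this pass.
    return True

# (label, last-name rule, first-name rule, dob rule, gender rule), strictest first.
PASSES = [
    ("Match 1", same, same, same, same),
    ("Match 2", same, same, same, ignore),
    ("Match 3", same, embedded, same, same),
    ("Match 6", same, first3, same, same),
    ("Match 7", same, first3, month_year, same),
]

def link(k12_records, ps_records):
    """Return {K-12 id: (postsecondary id, pass label)}, strict passes first."""
    links, used = {}, set()
    for label, last_ok, first_ok, dob_ok, gender_ok in PASSES:
        for k in k12_records:
            if k["id"] in links:
                continue  # already linked by a stricter pass
            for p in ps_records:
                if p["id"] in used:
                    continue
                if (last_ok(k["last"], p["last"])
                        and first_ok(k["first"], p["first"])
                        and dob_ok(k["dob"], p["dob"])
                        and gender_ok(k["gender"], p["gender"])):
                    links[k["id"]] = (p["id"], label)
                    used.add(p["id"])
                    break
    return links

# Made-up records for illustration.
k12 = [
    {"id": "K1", "last": "AKANA", "first": "MARY", "dob": "1995-04-12", "gender": "F"},
    {"id": "K2", "last": "LEE", "first": "JONATHAN", "dob": "1994-09-30", "gender": "M"},
    {"id": "K3", "last": "SILVA", "first": "ANA", "dob": "1995-01-05", "gender": "F"},
]
ps = [
    {"id": "P1", "last": "LEE", "first": "JON", "dob": "1994-09-03", "gender": "M"},
    {"id": "P2", "last": "AKANA", "first": "MARY ANN", "dob": "1995-04-12", "gender": "F"},
    {"id": "P3", "last": "SILVA", "first": "ANA", "dob": "1995-01-05", "gender": "F"},
]
links = link(k12, ps)
```

Recording the pass label alongside each link keeps the match quality visible downstream. This sketch takes the first candidate that satisfies a pass; a production process would also need a rule for K-12 records that have several candidates in the same pass.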
HI-PASS: Focus of output
OLAP Cubes
◦ K12 to postsecondary transitions - focus on remedial enrollments and placement
◦ Postsecondary enrollments linked to workforce data - shows students, enrollments, GPA, campus, time period, linked to employment data
◦ CTE tool - similar to K12 but focusing on CTE pathways and subsequent majors and awards at postsecondary
Project Status
Continue with HI-PASS to identify issues
◦ Data quality
◦ Integration problems
◦ Missing elements
◦ Definitions and standards
Complete infrastructure build for permanent system
Continue working on our data governance framework
◦ Research
◦ Data quality
◦ Security and Access to data
Establish a framework for data literacy and use
Focus on identifying, training diverse user levels