Mykola Pechenizkiy
http://www.win.tue.nl/~mpechen/
Ethics-aware Learning Analytics
IRB and Big Data NSF Workshop, George Mason University, Arlington, VA, USA
Who I am
Applied data mining researcher, data scientist
– Predictive analytics, evolving data, big data
– Adaptive learning, concept drift, context
– Web analytics, customer/student/user analytics
Educational Data Mining/Learning Analytics-related:
– EDM 2011, EDM 2015, LASI 2014, JEDM
– Handbook of EDM
– President-Elect IEDMS
IRB_BD@GMU, 9 Nov 2014
Ethics-aware Predictive Learning Analytics, Mykola Pechenizkiy, TU Eindhoven
Outline
• Big Data opportunities with education platforms
• Fears of Big Data coming to schools
• Reconsidering priorities in developing/adopting the Data-Driven Education paradigm
– Ethics-awareness and trustworthiness
• Take-aways: where advice from IRB panels is welcome
More ICT – More Data Sources
Four Major Types of Learning & Kinds of Questions EDM\LA Can Assist with
How to (re)organize the classes, or assessment, or placement of materials based on usage and performance data
How to identify those who would benefit from provided feedback, study advice or other help; How to decide which kind of help would be most effective?
How to help learners in (re-)finding useful material, whether done individually or collaboratively with peers
Kinds of Data Being Collected
• Administrative data
– Who follows which program, who takes which course, who registers for an (interim) exam, re-exams
– Demographics, school grades, etc.
• MOOC and LMS
– Resource usage data
– Assessment/assignment data (online tests, source code)
– Forums, collaboration, feedback/help requests
– Students’ evaluation of learning resources
• ITS, educational games, professional learning, e-Health, simulators, ...
• Gaming, browsing, Gmail, Facebook, Twitter
EDM\LA: Data, Approach, Knowledge
Data
• Interaction data: usage logs & contexts
• “Feedback” data: opinions, preferences, needs
• Administrative data: enrolments, results, payments, graduation, employment
• Descriptive data: demographics, characteristics
Approach
• Classification: categorizing students
• Clustering: grouping similar students
• Association analysis, sequence mining: find courses taken together or popular (parts of) study programs
• Process mining: understanding study curricula
• Visual analytics: facilitate reasoning about the process or results via interactive data/model visualization
Goals
- Identify high risk students
- Predict new student application rates
- Predict student retention/dropout
- Course planning & scheduling
- Faculty teaching load estimation
- Predict demand for resources (library, cafeteria, housing)
- Predict alumni donation
Learning@Scale Potential
Two central questions in DDE (data-driven education)
• “Does it work?” and “Which way is better?”
Ongoing research:
• Gaining insights via (massive) A/B testing
• Predictive modeling with actionable attributes
– Prediction vs. persuasion vs. manipulation
• Predictive modeling with sensitive attributes
– Ethics-aware personalization without discrimination
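The two central questions are typically answered by comparing outcome rates between randomized variants. The following is a minimal sketch, assuming a binary outcome (for example, passing a quiz) and a pooled two-proportion z-test. The function name and the counts are hypothetical, and the talk does not prescribe a particular test.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare success rates of variants A and B.

    Returns (rate_b - rate_a, z statistic, two-sided p-value).
    """
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # All outcomes identical: no evidence of a difference.
        return rate_b - rate_a, 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


# Hypothetical counts: 500 of 1000 students pass with variant A, 600 of 1000 with B.
diff, z, p = two_proportion_z_test(500, 1000, 600, 1000)
print(f"difference={diff:.3f}, z={z:.2f}, p={p:.2g}")
```

A small p-value only says that the variants differ on average; it says nothing about which students benefit, which is where the per-individual questions later in the talk come in.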
Data Trumps Experts’ Intuition
• LAK, AIED & EDM: help in understanding what works and what does not, student modeling, etc.
• MOOC, ITS & L@S: A/B testing is becoming popular
• MOOC platforms provide support for A/B testing
Example by Ken Koedinger (CMU) at Data-driven education @NIPS2013
Intuitive design can be replaced by data-driven design
Learning@Scale Potential
Two central questions in DDE
• “Does it work?” and “Which way is better?”
Some emerging research lines:
• Gaining insights via (massive) A/B testing
• Predictive modeling with actionable attributes
– Prediction vs. persuasion vs. manipulation
• Predictive modeling with sensitive attributes
– Ethics-aware personalization without discrimination
If We Were Able to Look Deeper
How these averages could possibly differ per
• Student learning style
• Student background
• Country where they studied
• Ethnicity
• Gender
• Parents
• ...
Uplift Predictors
Suppose we do have data from A/B testing
• The control dataset
– individuals on which no action was taken
• The treatment dataset
– individuals on which an action was taken
Build a model which predicts the causal influence of the action on a given individual
• Some students prefer a story, others prefer a formula, e.g. girls => story, boys => formula
• Challenging to learn such predictors, but feasible!
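One simple way to build such a predictor is the two-model approach: estimate the success rate per student profile separately on the treatment and the control data, then take the difference. The sketch below is an illustration under stated assumptions, not the method used in the talk. Profiles are a single categorical value, rates are smoothed frequencies, and all names and data are hypothetical.

```python
from collections import defaultdict


def fit_rates(rows):
    """Estimate P(success | profile) from (profile, outcome) pairs.

    Uses add-one smoothing so that small groups do not yield rates of 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # profile -> [successes, trials]
    for profile, outcome in rows:
        counts[profile][0] += outcome
        counts[profile][1] += 1
    return {p: (s + 1) / (n + 2) for p, (s, n) in counts.items()}


def uplift(profile, treatment_rates, control_rates, default=0.5):
    """Predicted change in success probability if the action is taken."""
    return treatment_rates.get(profile, default) - control_rates.get(profile, default)


# Hypothetical A/B data. Control: explanation by formula. Treatment: explanation by story.
# Each row is (student profile, 1 if the exercise was solved else 0).
control = [("A", 1)] * 3 + [("A", 0)] * 7 + [("B", 1)] * 7 + [("B", 0)] * 3
treatment = [("A", 1)] * 8 + [("A", 0)] * 2 + [("B", 1)] * 4 + [("B", 0)] * 6

control_rates = fit_rates(control)
treatment_rates = fit_rates(treatment)

for profile in ("A", "B"):
    u = uplift(profile, treatment_rates, control_rates)
    choice = "story" if u > 0 else "formula"
    print(f"profile {profile}: uplift={u:+.3f}, suggested explanation: {choice}")
```

In this toy data neither action is better for everyone, which is the situation the slide describes. Real uplift models would use many features and a learned model rather than a frequency table.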
Fear of Privacy Violation & Data Misuse
• “Many companies are looking to profit from student and teacher data that can be easily collected, stored, processed, customized, analyzed, and then ultimately resold”.
Philip McRae (Alberta Teachers’ Association)
corpwatch.org/img/original/google.jpg
If We Were Able to Look Deeper
How these averages could possibly differ per
• Student learning style
• Student background
• Country where they studied
• Ethnicity
• Gender
• Parents
• ...
Sensitive attributes
Fear of Predictive Analytics
Are the decisions based on predictive models always ethical?
• (Personalized) decisions may be unfair to a certain group (race, ethnicity, gender)
cf. discrimination in hiring, granting credit or loans, etc.
Are the models/decisions trustworthy?
• Do predictive models give guarantees?
• Is the accuracy high enough?
• Do models provide meaningful insights?
• Are they interpretable and transparent?
• “Correlation is not causation”
Fears of Personalization
• “When Personalization Goes Bad”
http://www.portical.org/blog/when-personalization-goes-bad
• “Rebirth of the Teaching Machine through the Seduction of Data Analytics: This Time It's Personal”
http://www.philmcrae.com/2/post/2013/04/rebirth-of-the-teaching-maching-through-the-seduction-of-data-analytics-this-time-its-personal1.html
• “This time it is Personal and Dangerous”
http://barbarabray.net/2013/12/30/this-time-its-personal-and-dangerous/
[Images: artwork © Pawel Kuczynski; postcard (World’s Fair, Paris 1899) predicting what learning will be like in France in the year 2000]
Predicting with Sensitive Attributes
Paradox: we need to use personal data to control for unethical predictive analytics
• “Fairness through awareness” Dwork et al.
• “It’s Not Privacy, and it’s Not Fair” Dwork & Mulligan
“Discrimination and Privacy in the Information Society” Custers et al. (Eds)
• Data mining for discrimination discovery
• Explainable vs. unethical discrimination
• Accuracy-discrimination tradeoff
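One common way to quantify the discrimination discussed here is the difference in positive-decision rates between a favored and a deprived group. The sketch below assumes binary decisions and a single sensitive attribute; function names and data are hypothetical. Note that it needs the sensitive attribute as input, which illustrates the paradox: the check cannot be run without collecting that personal data.

```python
def positive_rate(decisions, groups, group):
    """Share of positive decisions (1) among members of the given group."""
    selected = [d for d, g in zip(decisions, groups) if g == group]
    return sum(selected) / len(selected)


def discrimination(decisions, groups, favored, deprived):
    """Positive-decision rate of the favored group minus that of the deprived group.

    0 means both groups receive positive decisions at the same rate.
    """
    return positive_rate(decisions, groups, favored) - positive_rate(decisions, groups, deprived)


# Hypothetical model decisions (1 = e.g. admitted to an honors track) and group labels.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(discrimination(decisions, groups, favored="a", deprived="b"))
```

A non-zero value is not automatically unethical: part of the gap may be explainable by legitimate attributes, which is the distinction the slide draws between explainable and unethical discrimination.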
Take-aways
ITS, MOOC, OLI – massive scale, cheap and scalable experimentation online
• What should be the policies on student data collection, sharing and use?
Potential for data-driven education, finding out what works best for students via randomized trials
• What is and is not ethical? (cf. the Facebook study)
Effects of persuasion are not uniform
• Potential and need for personalization
• DM can learn causal models from A/B testing data
• How to prevent malignant forms of DDE?
General guidelines for ethics-aware personalization and persuasion?
Thank you!
• Feedback, questions, collaboration ideas: [email protected]
• Staying connected: nl.linkedin.com/in/mpechen/
Fears of Big Data Coming to Schools
personal/educational data misuse, poor predictions, bad personalization
SIAT@SFU, 11 Aug 2014
Predictive Analytics for Data-Driven Education, Mykola Pechenizkiy, TU Eindhoven
Connections to Privacy & Ethics
• What is the education data scientist's philosophy?
• Is EDM always ethical?
• Is EDM a threat to privacy?
• Dangers of misuse of information
• Unethical decision making or personalization
Will these discussions slow down or kill the development and adoption of predictive learning analytics?
Predicting with Actionable Attributes
Prediction vs. manipulation; uplift predictors
Data Trumps Intuition
• LAK, AIED & EDM: help in understanding what works and what does not, student modeling, etc.
• MOOC, ITS & L@S: A/B testing is becoming popular
• MOOC platforms provide support for A/B testing
Example by Ken Koedinger (CMU) at Data-driven education @NIPS2013
Intuitive design can be replaced by data-driven design
If We Were Able to Look Deeper
How these averages could possibly differ per
• Student learning style
• Student background
• Country where they studied
• Ethnicity
• Gender
• ...
Towards Personalized Medicine
• A typical medical trial:
– treatment group: gets the treatment
– control group: gets placebo (or another treatment)
– do a statistical test to show that the treatment is better than placebo
• With uplift predictors we can find out
– for whom the treatment works and works best, or
– in case of alternative treatments, which treatment works best for whom
Uplift Predictors
Suppose we do have data from A/B testing
• C: the control dataset
– individuals on which no action was taken
• T: the treatment dataset
– individuals on which an action was taken
Build a model which predicts the causal influence of the action on a given individual
• Challenging, if we assume that there is no globally better action
– Some students prefer a story, others prefer a formula
• But it is feasible
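In symbols, and assuming a binary outcome $y$ and a feature vector $x$ describing the individual (notation introduced here for illustration, not taken from the slides), the quantity an uplift predictor estimates can be written as:

```latex
\[
  u(x) \;=\; P(y = 1 \mid x,\ \text{action taken}) \;-\; P(y = 1 \mid x,\ \text{no action})
\]
```

The first term can be estimated from T and the second from C. Under this reading, the action is worth taking for an individual when $u(x) > 0$, and "no globally better action" means that the sign of $u(x)$ differs across individuals.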
Uplift Predictors: Conclusions
• Learn how to choose an action when there is no globally better action
• Clear evidence that this is feasible
• Demonstrated that the effect of an action is not uniform across individuals
– focusing on individuals sensitive to the choice of action helps to build better uplift predictors
Predicting with Sensitive Attributes
Discrimination-aware mining; bias-aware mining
“Fairness through awareness” by Cynthia Dwork et al.
In order to treat similar individuals similarly we must collect more data about individuals.
Connections between privacy-preserving and fair predictive modeling.
“It’s Not Privacy, and it’s Not Fair” by Cynthia Dwork & Deirdre K. Mulligan
“Discrimination and Privacy in the Information Society”, Custers et al. (Eds), Springer, 2013
Sensitive Attributes
• Demographics (gender, race, income, education of parents)
• Proxies to demographics (home address or school location)
• Some (un)known artifacts of data collection
– Different instances of a course
– Different instructors
– Different groups (locations)
Predicting with Sensitive Attributes
[Diagram: a model L is trained on historical data drawn from the population (source), with features X, sensitive attributes S and labels y, and is then applied to testing data X' to predict labels and choose an action.]
1. Training: y = L(X, S)
2. Application: use L for unseen data, y' = L(X', S'), enforcing P(Y|X,S) = P(Y|X)
Action selection: a' = argmax(p(y' = 1))
Predicting with Sensitive Attributes
• Accuracy-discrimination tradeoff:
– Data massaging for discrimination-free predictions (ICDM);
– discrimination-aware decision trees, Bayesian classifiers, regression (DAMI, KAIS, ICDM)
• Explainable (ethical/legal) vs. unethical discrimination (ICDM)
• Data mining for discrimination discovery (TKDD)
• Paradox: we need to use personal data to control for unethical predictive analytics
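The idea behind data massaging can be illustrated with a small sketch: before training, relabel a few borderline training instances so that the positive rates of the two groups move closer together. The code below is a simplified illustration loosely in the spirit of that technique, not a reproduction of the published algorithm. The scores stand in for the output of some ranker, both groups are assumed non-empty, and all names and data are hypothetical.

```python
def massage_labels(labels, scores, groups, favored, deprived):
    """Return a relabeled copy of `labels` with a smaller positive-rate gap.

    Swaps are made in pairs: the highest-scored negative instance of the
    deprived group is promoted to 1 and the lowest-scored positive instance
    of the favored group is demoted to 0. A swap is applied only while it
    shrinks the absolute gap between the two groups' positive rates.
    """
    labels = list(labels)
    fav = [i for i, g in enumerate(groups) if g == favored]
    dep = [i for i, g in enumerate(groups) if g == deprived]
    gap = sum(labels[i] for i in fav) / len(fav) - sum(labels[i] for i in dep) / len(dep)
    step = 1 / len(fav) + 1 / len(dep)  # gap reduction achieved by one swap

    # Candidates closest to the decision boundary come first.
    promote = sorted((i for i in dep if labels[i] == 0), key=lambda i: scores[i], reverse=True)
    demote = sorted((i for i in fav if labels[i] == 1), key=lambda i: scores[i])

    for up, down in zip(promote, demote):
        if abs(gap - step) >= abs(gap):
            break  # a further swap would not bring the rates closer
        labels[up], labels[down] = 1, 0
        gap -= step
    return labels


# Hypothetical training labels, ranker scores and group membership.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.55, 0.3, 0.2]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(massage_labels(labels, scores, groups, favored="a", deprived="b"))
```

Every swap changes a training label that may have been correct, which is one concrete form of the accuracy-discrimination tradeoff named on the slide.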
Predictive analytics should
• provide better tooling for DDE,
• help to eliminate Big Data fears in the changing face of modern education, and
• not boost these fears of the general public, educators, students and other stakeholders
Conclusions
ITS, MOOC, OLI – massive scale, cheap and scalable experimentation online
• Potential for data-driven education, finding out what works best for students
– DM can help to generate promising hypotheses to test
• Effects of interventions/persuasion are not uniform
– Potential and need for personalization
– DM can help to learn causal models from A/B testing data: uplift predictors
Fears of (malignant forms of) DDE and DDP
• Ethics-aware and context-aware personalization