
Mykola Pechenizkiy
http://www.win.tue.nl/~mpechen/

Ethics-aware Learning Analytics

IRB and Big Data NSF Workshop, George Mason University, Arlington, VA, USA


Who I am
Applied Data Mining researcher, data scientist
– Predictive analytics, evolving data, big data
– Adaptive learning, concept drift, context
– Web analytics, customer/student/user analytics

Educational Data Mining/Learning Analytics-related:
– EDM 2011, EDM 2015, LASI 2014, JEDM
– Handbook of EDM
– President-Elect IEDMS

IRB_BD@GMU, 9 Nov 2014

Ethics-aware Predictive Learning Analytics, Mykola Pechenizkiy, TU Eindhoven


Outline
• Big Data opportunities with education platforms
• Fears of Big Data coming to schools
• Reconsidering priorities in developing/adopting the Data-Driven Education paradigm
  – Ethics-awareness and trustworthiness
• Take-aways: where advice from IRB panels is welcome


More ICT – More Data Sources


Four Major Types of Learning & Kinds of Questions EDM/LA Can Assist with

• How to (re)organize the classes, assessment, or placement of materials based on usage and performance data?
• How to identify those who would benefit from feedback, study advice or other help, and how to decide which kind of help would be most effective?
• How to help learners in (re-)finding useful material, whether individually or collaboratively with peers?



Kinds of Data Being Collected
• Administrative data
  – Who follows which program, who takes which course, who registers for an (interim) exam, re-exams
  – Demographics, school grades, etc.
• MOOC and LMS
  – Resource usage data
  – Assessment/assignment data (online tests, source code)
  – Forums, collaboration, feedback/help requests
  – Students’ evaluation of learning resources
• ITS, educational games, professional learning, e-Health, simulators, ...
• Gaming, browsing, Gmail, Facebook, Twitter


EDM/LA: Data, Approach, Knowledge

Data
• Interactions data: usage logs & contexts
• “Feedback” data: opinions, preferences, needs
• Administrative data: enrolments, results, payments, graduation, employment
• Descriptive data: demographics, characteristics

Approach
• Classification: categorizing students
• Clustering: grouping similar students
• Association analysis, sequence mining: find courses taken together or popular (parts of) study programs
• Process mining: understanding study curricula
• Visual analytics: facilitate reasoning about the process or results via interactive data/model visualization

Goals
• Identify high-risk students
• Predict new student application rates
• Predict student retention/dropout
• Course planning & scheduling
• Faculty teaching load estimation
• Predict demand for resources (library, cafeteria, housing)
• Predict alumni donations


Learning@Scale Potential

Two central questions in Data-Driven Education (DDE)
• “Does it work?” and “Which way is better?”

Ongoing research:
• Gaining insights via (massive) A/B testing
• Predictive modeling with actionable attributes
  – Prediction vs. persuasion vs. manipulation
• Predictive modeling with sensitive attributes
  – Ethics-aware personalization without discrimination


Data Trumps Experts’ Intuition
• LAK, AIED & EDM: help in understanding what works and what does not, student modeling, etc.
• MOOC, ITS & L@S: A/B testing is becoming popular
• MOOC platforms provide support for A/B testing

Example by Ken Koedinger (CMU) at Data-driven education @NIPS2013

Intuitive design can be replaced by data-driven design


If We Were Able to Look Deeper
How these averages could possibly differ per
• Student learning style
• Student background
• Country where they studied
• Ethnicity
• Gender
• Parents
• ….


Uplift Predictors
Suppose we do have data from A/B testing
• The control dataset
  – individuals on which no action was taken
• The treatment dataset
  – individuals on which an action was taken

Build a model which predicts the causal influence of the action on a given individual
• Some students prefer a story, others a formula, e.g. girls => story, boys => formula
• Challenging to learn such predictors, but feasible!
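One simple way to make the idea concrete is the two-model (difference of rates) baseline: estimate the success rate per student segment separately on the control and treatment datasets, then subtract. The sketch below is only an illustration of that baseline, not the method used in the talk; the `segment` attribute, the function names and the toy data are all hypothetical.

```python
from collections import defaultdict

def success_rates(records):
    """Return the observed success rate for each segment value."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [successes, total]
    for segment, outcome in records:
        counts[segment][0] += outcome
        counts[segment][1] += 1
    return {segment: s / n for segment, (s, n) in counts.items()}

def estimate_uplift(control, treatment):
    """Per-segment uplift: treatment success rate minus control success rate."""
    rate_c = success_rates(control)
    rate_t = success_rates(treatment)
    shared = rate_c.keys() & rate_t.keys()  # segments seen in both datasets
    return {segment: rate_t[segment] - rate_c[segment] for segment in shared}

# Hypothetical toy data: (segment, passed_quiz) pairs.
# Control students saw a formula, treatment students saw a story.
control = [("A", 1), ("A", 0), ("B", 1), ("B", 1)]
treatment = [("A", 1), ("A", 1), ("B", 0), ("B", 1)]

uplift = estimate_uplift(control, treatment)
# Positive uplift suggests the story helps that segment; negative suggests the formula does.
```

With real data the per-segment rates would be replaced by learned models, and segments with few observations would need smoothing or confidence intervals before acting on the estimate.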


Fear of Privacy Violation & Data Misuse

• “Many companies are looking to profit from student and teacher data that can be easily collected, stored, processed, customized, analyzed, and then ultimately resold”.

Philip McRae (Alberta Teachers’ Association)

corpwatch.org/img/original/google.jpg


If We Were Able to Look Deeper
How these averages could possibly differ per
• Student learning style
• Student background
• Country where they studied
• Ethnicity
• Gender
• Parents
• ….

These are sensitive attributes (cf. discrimination in hiring, granting credit or loans, etc.)


Fear of Predictive Analytics
Are the decisions based on predictive models always ethical?
• (Personalized) decisions may be unfair to a certain group (race, ethnicity, gender)

Are the models/decisions trustworthy?
• Do predictive models give guarantees?
• Is the accuracy high enough?
• Do models provide meaningful insights?
• Are they interpretable and transparent?
• “Correlation is not causation”


Fears of Personalization
• “When Personalization Goes Bad”
  http://www.portical.org/blog/when-personalization-goes-bad
• “Rebirth of the Teaching Machine through the Seduction of Data Analytics: This Time It's Personal”
  http://www.philmcrae.com/2/post/2013/04/rebirth-of-the-teaching-maching-through-the-seduction-of-data-analytics-this-time-its-personal1.html
• “This time it is Personal and Dangerous”
  http://barbarabray.net/2013/12/30/this-time-its-personal-and-dangerous/

Images: © Pawel Kuczynski; postcard (World’s Fair, Paris 1899) predicting what learning will be like in France in the year 2000


Predicting with Sensitive Attributes
Paradox: we need to use personal data to control for unethical predictive analytics
• “Fairness through awareness”, Dwork et al.
• “It’s Not Privacy, and it’s Not Fair”, Dwork & Mulligan

“Discrimination and Privacy in the Information Society”, Custers et al. (Eds)
• Data mining for discrimination discovery
• Explainable vs. unethical discrimination
• Accuracy-discrimination tradeoff
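To reason about an accuracy-discrimination tradeoff, discrimination first has to be measured. A commonly used measure is the difference in positive-decision rates between a favored and a deprived group. The sketch below assumes binary decisions and a single group attribute; the function names, group labels and data are hypothetical and are not taken from the cited works.

```python
def positive_rate(decisions, groups, group):
    """Fraction of positive decisions among members of one group."""
    selected = [d for d, g in zip(decisions, groups) if g == group]
    return sum(selected) / len(selected)

def discrimination_score(decisions, groups, favored, deprived):
    """Positive rate of the favored group minus that of the deprived group."""
    return (positive_rate(decisions, groups, favored)
            - positive_rate(decisions, groups, deprived))

# Hypothetical decisions (1 = admitted to an honours track) for two groups.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

score = discrimination_score(decisions, groups, favored="a", deprived="b")
# A score of 0 means both groups receive positive decisions at the same rate.
```

Note that computing this score requires access to the group attribute, which is exactly the paradox named on the slide.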


Take-aways
ITS, MOOC, OLI – massive scale, cheap and scalable experimentation online
• What should be the policies on student data collection, sharing and use?
Potential for data-driven education, finding out what works best for students via randomized trials
• What is and is not ethical? (cf. the Facebook study)
Effects of persuasion are not uniform
• Potential and need for personalization
• DM can learn causal models from A/B testing data
• How to prevent malignant forms of DDE?
General guidelines for ethics-aware personalization and persuasion?


Thank you!

• Feedback, questions, collaboration ideas: [email protected]

• Staying connected: nl.linkedin.com/in/mpechen/


Fears of Big Data Coming to Schools

personal/educational data misuse, poor predictions, bad personalization

SIAT@SFU, 11 Aug 2014

Predictive Analytics for Data-Driven Education, Mykola Pechenizkiy, TU Eindhoven


Connections to Privacy & Ethics
• What is the education data scientist's philosophy?
• Is EDM always ethical?
• Is EDM a threat to privacy?
• Dangers of misuse of information
• Unethical decision making or personalization

Will these discussions slow down or kill the development and adoption of predictive learning analytics?


Predicting with Actionable Attributes

Prediction vs. manipulation; uplift predictors


Towards Personalized Medicine
• A typical medical trial:
  – treatment group: gets the treatment
  – control group: gets placebo (or another treatment)
  – do a statistical test to show that the treatment is better than placebo
• With uplift predictors we can find out
  – for whom the treatment works and works best, or
  – in case of alternative treatments, which treatment works best for whom
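The "statistical test" step of such a trial (or of an A/B test on a learning platform) can be illustrated with a pooled two-proportion z-test. This is one standard choice among several and is used here only as an assumption for illustration; the slide does not say which test is meant, and the counts below are made up.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 40 of 100 control students passed, 55 of 100 treated students passed.
z, p_value = two_proportion_z_test(40, 100, 55, 100)
```

A test like this answers only whether the treatment is better on average; it says nothing about for whom it works, which is the gap uplift predictors aim to fill.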


Uplift Predictors
Suppose we do have data from A/B testing
• C: the control dataset
  – individuals on which no action was taken
• T: the treatment dataset
  – individuals on which an action was taken

Build a model which predicts the causal influence of the action on a given individual
• Challenging, if we assume that there is no globally better action
  – Some students prefer a story, others a formula
• But it is feasible
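Using the datasets C and T named above, the quantity an uplift predictor tries to estimate can be written as a difference of two conditional probabilities. The symbols u(x), x and y are notation assumed here for illustration; the slide itself only names C and T.

```latex
u(x) = P(y = 1 \mid x,\ T) - P(y = 1 \mid x,\ C)
```

Here x describes an individual and y = 1 is the desired outcome. Under this reading, the action is worth taking for an individual when u(x) > 0, and "no globally better action" means that the sign of u(x) differs between individuals.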


Uplift Predictors: Conclusions
• Learn how to choose an action when there is no globally better action
• Clear evidence that this is feasible
• Demonstrated that the effect of an action is not uniform across individuals
  – focusing on individuals sensitive to the choice of action helps to build better uplift predictors


Predicting with Sensitive Attributes

Discrimination-aware mining; bias-aware mining


“Fairness through awareness” by Cynthia Dwork et al.: in order to treat similar individuals similarly we must collect more data about individuals. Connections between privacy-preserving and fair predictive modeling.

“It’s Not Privacy, and it’s Not Fair” by Cynthia Dwork & Deirdre K. Mulligan

“Discrimination and Privacy in the Information Society”, Custers et al. (Eds), Springer, 2013


Sensitive Attributes

• Demographics (gender, race, income, education of parents)

• Proxies to demographics (home address or school location)

• Some (un)known artifacts of data collection
  – Different instances of a course
  – Different instructors
  – Different groups (locations)


Predicting with Sensitive Attributes

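[Diagram: (1) training: a model L is learned from historical data drawn from the source population, with attributes X, sensitive attributes S and labels y; (2) application: L is applied to testing data (X', S') to predict labels y' and to decide on an action.]

Training: y = L(X, S)

Application: use L for unseen data, y' = L(X', S'), enforcing P(Y|X,S) = P(Y|X)

Action: a' = argmax(p(y' = 1))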


Predicting with Sensitive Attributes
• Accuracy-discrimination tradeoff:
  – Data massaging for discrimination-free predictions (ICDM);
  – discrimination-aware decision trees, Bayesian classifiers, regression (DAMI, KAIS, ICDM)
• Explainable (ethical/legal) vs. unethical discrimination (ICDM)
• Data mining for discrimination discovery (TKDD)
• Paradox: we need to use personal data to control for unethical predictive analytics
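The data massaging idea can be sketched as relabeling a small number of borderline training examples before learning, so that both groups end up with the same share of positive labels. The code below is a simplified reading of that idea under stated assumptions, not the published algorithm: it takes ranker scores as a given input (hypothetical values here), handles one binary sensitive attribute, and swaps labels in pairs.

```python
def massage_labels(labels, groups, scores, favored, deprived):
    """Return a relabeled copy with (nearly) equal positive rates in both groups."""
    labels = list(labels)
    fav = [i for i, g in enumerate(groups) if g == favored]
    dep = [i for i, g in enumerate(groups) if g == deprived]
    pos_fav = sum(labels[i] for i in fav)
    pos_dep = sum(labels[i] for i in dep)
    # Number of promote/demote pairs that equalizes the two positive rates.
    swaps = max(0, round((pos_fav * len(dep) - pos_dep * len(fav))
                         / (len(fav) + len(dep))))
    # Deprived negatives closest to being positive (highest score first).
    promote = sorted((i for i in dep if labels[i] == 0),
                     key=lambda i: scores[i], reverse=True)
    # Favored positives closest to being negative (lowest score first).
    demote = sorted((i for i in fav if labels[i] == 1),
                    key=lambda i: scores[i])
    for up, down in list(zip(promote, demote))[:swaps]:
        labels[up] = 1
        labels[down] = 0
    return labels

# Hypothetical training data: labels, group membership, and ranker scores.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.65, 0.2, 0.1]

massaged = massage_labels(labels, groups, scores, favored="a", deprived="b")
```

Choosing the examples nearest the decision boundary keeps the relabeling as unobtrusive as possible, which is where the accuracy-discrimination tradeoff shows up: each swap removes some discrimination at the cost of deviating from the historical labels.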


Predictive analytics should
• provide better tooling for DDE,
• help to eliminate Big Data fears in the changing face of modern education, and
• not boost these fears of the general public, educators, students and other stakeholders


Conclusions
ITS, MOOC, OLI – massive scale, cheap and scalable experimentation online
• Potential for data-driven education, finding out what works best for students
  – DM can help to generate promising hypotheses to test
• Effects of interventions/persuasion are not uniform
  – Potential and need for personalization
  – DM can help to learn causal models from A/B testing data: uplift predictors

Fears of (malignant forms of) DDE and DDP
• Ethics-aware and context-aware personalization
