Responsible Data Science Ensuring Fairness, Accuracy, Confidentiality, and Transparency (FACT) Wil van der Aalst www.responsibledatascience.org @wvdaalst

Posted on 28 May 2020


Page 1

Responsible Data Science Ensuring Fairness, Accuracy, Confidentiality, and Transparency (FACT)

Wil van der Aalst

www.responsibledatascience.org @wvdaalst

Page 2

©Wil van der Aalst & TU/e (use only with permission & acknowledgements)

[Diagram: data, models, insights, decisions]

Page 3

1) creating value from data

2) responsible data science: the next big challenge

Page 4

Example: It’s a kind of magic …

• Behavioral models
• Bottlenecks
• Deviations
• Predictions
• Recommendations
• …

Page 5

…, but with great power comes great responsibility!


XXXX will make things better, faster, more efficient, more effective, cheaper, …

Page 6

If data is the new oil on which our society runs, …

Page 7

… then we should take care of data-related forms of pollution!

• unfair use of data
• privacy violations
• bogus conclusions
• spurious correlations
• non-transparent

Page 8

Green data science: separate the “pollution” from the actual purpose

Page 9

Responsible Data Science

Page 10

The RDS Team

Page 11

Unique:
• Collaboration between principal scientists from Eindhoven University of Technology, Leiden University, University of Amsterdam, Radboud University Nijmegen, Tilburg University, VU University, Amsterdam Medical Center, VU Medical Center, Leiden University Medical Center, Delft University of Technology, and CWI.
• Involves disciplines like data/process mining, digital humanities, ethics, information retrieval, knowledge representation, law, machine learning, natural language processing, security, statistics, and visualization.

Page 12

FACT

Page 13

Fairness: Data Science without prejudice: How to avoid unfair conclusions even if they are true?

Page 14

Standard classification problem: scholarship application → decision

Page 15

Name: Peter | Age: 28 | Gender: Male | Country: German | Hobbies: Soccer | Fav. food: Sauerkraut | …

Learn classifier using training data

Graduated: Yes | Duration: 8 years | Average grade: 6.4 | …
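The “learn classifier using training data” step can be sketched in a few lines of Python. This is a minimal illustration, not the classifier from the talk. The records are hypothetical. OneR is swapped in as the learner because it is small enough to read: it picks the single attribute whose value-to-majority-decision table makes the fewest training errors. Trained on skewed historical decisions, it ends up keying on `gender`.

```python
from collections import Counter, defaultdict

# Hypothetical training data: applicant attributes plus the historical
# decision (1 = scholarship granted, 0 = rejected). Not taken from the slides.
TRAINING = [
    ({"gender": "male", "country": "German", "age_group": "older"}, 0),
    ({"gender": "male", "country": "German", "age_group": "younger"}, 1),
    ({"gender": "female", "country": "Dutch", "age_group": "older"}, 1),
    ({"gender": "female", "country": "German", "age_group": "younger"}, 1),
    ({"gender": "male", "country": "Dutch", "age_group": "older"}, 0),
    ({"gender": "female", "country": "Dutch", "age_group": "younger"}, 1),
]


def learn_one_rule(records):
    """OneR: for each attribute, map every value to its majority decision and
    keep the attribute whose mapping makes the fewest training errors."""
    best = None
    for attr in records[0][0]:
        votes = defaultdict(Counter)
        for features, label in records:
            votes[features[attr]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in votes.items()}
        errors = sum(rule[features[attr]] != label for features, label in records)
        if best is None or errors < best[2]:  # ties keep the earlier attribute
            best = (attr, rule, errors)
    return best


def predict(model, features):
    attr, rule, _ = model
    return rule.get(features.get(attr))  # None for values never seen in training


model = learn_one_rule(TRAINING)
print(model)  # ('gender', {'male': 0, 'female': 1}, 1)
```

For an applicant like Peter, `predict(model, {"gender": "male", "country": "German", "age_group": "older"})` returns 0, which is a rejection.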

Page 16

Name: Peter | Age: 28 | Gender: Male | Country: German | Hobbies: Soccer | Fav. food: Sauerkraut | …

Classifier tends to reject older male German students

Graduated: Yes | Duration: 8 years | Average grade: 6.4 | …
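One way to detect that a classifier “tends to reject” a group is to compare acceptance rates per group. The sketch below computes the ratio of the protected group’s acceptance rate to that of a reference group, often called disparate impact. The decisions and group labels are hypothetical. A common heuristic, the “four-fifths rule” from US employment practice, treats a ratio below 0.8 as a warning sign.

```python
def acceptance_rate(decisions, groups, group):
    """Fraction of applicants in `group` that received a positive decision."""
    picked = [d for d, g in zip(decisions, groups) if g == group]
    if not picked:
        raise ValueError(f"no applicants in group {group!r}")
    return sum(picked) / len(picked)


def disparate_impact(decisions, groups, protected, reference):
    """Ratio of acceptance rates; 1.0 means both groups are accepted equally often."""
    return acceptance_rate(decisions, groups, protected) / acceptance_rate(
        decisions, groups, reference
    )


# Hypothetical classifier output (1 = accept) and group labels.
decisions = [0, 0, 1, 0, 1, 1, 1, 0]
groups = ["older male German"] * 4 + ["other"] * 4

ratio = disparate_impact(decisions, groups, "older male German", "other")
print(round(ratio, 3))  # 0.333
```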

Page 17

Name: Peter | Hobbies: Soccer | Fav. food: Sauerkraut | … (age, gender, and country: confidential)

Classifier tends to reject “sauerkraut-eating soccer fans”

Graduated: Yes | Duration: 8 years | Average grade: 6.4 | …

Older male German students still do not stand a chance of getting a scholarship
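Hiding age, gender, and country does not help because other attributes can act as proxies. In this hypothetical sketch, the learned rule only ever sees the favourite food (“stamppot” is an invented second value). Food correlates with nationality in the training data, so Germans still end up with a much lower predicted acceptance rate.

```python
from collections import Counter, defaultdict

# Hypothetical records: (nationality, favourite food, historical decision).
RECORDS = [
    ("German", "sauerkraut", 0),
    ("German", "sauerkraut", 0),
    ("German", "sauerkraut", 1),
    ("German", "stamppot", 1),
    ("Dutch", "stamppot", 1),
    ("Dutch", "stamppot", 1),
    ("Dutch", "sauerkraut", 0),
]


def learn_food_rule(records):
    """Majority decision per favourite food; nationality is never looked at."""
    votes = defaultdict(Counter)
    for _nationality, food, decision in records:
        votes[food][decision] += 1
    return {food: c.most_common(1)[0][0] for food, c in votes.items()}


def acceptance_by_group(records, rule):
    """Acceptance rate of the rule's predictions, per nationality."""
    totals, accepted = Counter(), Counter()
    for nationality, food, _decision in records:
        totals[nationality] += 1
        accepted[nationality] += rule[food]
    return {group: accepted[group] / totals[group] for group in totals}


rule = learn_food_rule(RECORDS)
print(rule)                                # {'sauerkraut': 0, 'stamppot': 1}
print(acceptance_by_group(RECORDS, rule))  # {'German': 0.25, 'Dutch': 0.6666666666666666}
```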

Page 18

Name: Peter | Age: 28 | Gender: Male | Country: German | Hobbies: Soccer | Fav. food: Sauerkraut | …

Discrimination-aware classification

Graduated: Yes | Duration: 8 years | Average grade: 6.4 | …

add fairness constraint(s) to the problem

paradox: need to use the sensitive attributes
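The following is a minimal sketch of one way to add a fairness constraint. It assumes the constraint is demographic parity (equal acceptance rates per group) and uses post-processing with group-specific score thresholds. This is one option among several and not necessarily the one used by the RDS team. The function cannot work without the `groups` argument, which illustrates the paradox on the slide: enforcing fairness requires access to the sensitive attribute.

```python
def parity_thresholds(scores, groups, target_rate):
    """Per-group score cut-off so that roughly `target_rate` of each group is
    accepted (demographic parity). Requires the sensitive attribute `groups`."""
    thresholds = {}
    for group in sorted(set(groups)):
        ranked = sorted((s for s, g in zip(scores, groups) if g == group), reverse=True)
        k = max(1, round(target_rate * len(ranked)))  # accept at least one
        thresholds[group] = ranked[min(k, len(ranked)) - 1]
    return thresholds


def decide(scores, groups, thresholds):
    # Ties at the cut-off are all accepted, so a group may slightly exceed the target.
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


# Hypothetical model scores; a single global cut-off of 0.7 would accept
# two applicants from group A and none from group B.
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.5, 0.2, 0.1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

thresholds = parity_thresholds(scores, groups, target_rate=0.5)
print(thresholds)                          # {'A': 0.8, 'B': 0.5}
print(decide(scores, groups, thresholds))  # [1, 1, 0, 0, 1, 1, 0, 0]
```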

Page 19

Accuracy: Data Science without guesswork: How to answer questions with a guaranteed level of accuracy?

Page 20

Spurious Correlations

Page 21

Spurious Correlations

Page 22

Curse of dimensionality

Test enough hypotheses and one will appear to be true by accident (Carlo Emilio Bonferroni)
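This principle can be quantified. Suppose m independent tests are run at significance level α and every null hypothesis is true. The probability of at least one false positive is then 1 − (1 − α)^m. The Bonferroni correction tests each hypothesis at α/m instead, which keeps that probability at or below α. The sketch below evaluates both rates and adds a small seeded simulation on pure noise; the parameter values are illustrative.

```python
import random


def familywise_error_rate(m, alpha):
    """P(at least one false positive) for m independent tests whose null
    hypotheses are all true, each tested at level alpha."""
    return 1 - (1 - alpha) ** m


def simulated_false_positives(m, alpha, seed=42):
    """Count 'significant' results among m tests of pure noise."""
    rng = random.Random(seed)
    # Under a true null hypothesis, a continuous test's p-value is uniform on [0, 1].
    return sum(rng.random() < alpha for _ in range(m))


m, alpha = 100, 0.05
print(f"uncorrected FWER: {familywise_error_rate(m, alpha):.3f}")      # 0.994
print(f"Bonferroni FWER:  {familywise_error_rate(m, alpha / m):.3f}")  # 0.049
print(simulated_false_positives(1000, alpha), simulated_false_positives(1000, alpha / 1000))
```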

Page 23

Example: Bonferroni and the AIVD

Assumptions:
• 18 million people in NL
• 1800 hotels
• 100 guests per hotel per night
• (each person visits a hotel every 100 days)

find the terrorists

Page 24

Suspicious event: two persons stay in the same hotel on two different dates

How many suspicious events in a 1000-day period?

Page 25

A bit of reasoning …

Very suspicious!

Page 26

Some more reasoning …

There are hundreds of thousands of terrorists!
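The reasoning behind the hotel example can be reproduced with exact arithmetic. The sketch makes three assumptions added for the calculation: each person is in some hotel on a given day with probability 1/100; this happens independently of everyone else and of other days; and the hotel is one of the 1800, chosen uniformly at random. The slide’s figures are mutually consistent: 18,000,000 × 1/100 = 180,000 = 1800 × 100 guests per night.

```python
from fractions import Fraction
from math import comb

# Figures from the slide; independence and uniform hotel choice are added assumptions.
PEOPLE = 18_000_000
HOTELS = 1_800
DAYS = 1_000
P_IN_HOTEL = Fraction(1, 100)  # "visit a hotel every 100 days"


def expected_suspicious_events(people=PEOPLE, hotels=HOTELS, days=DAYS, p=P_IN_HOTEL):
    """Expected number of (pair of people, pair of days) combinations in which
    the two people stay in the same hotel on both days, purely by chance."""
    p_same_hotel_one_day = p * p / hotels  # both in a hotel, and the same one
    p_same_hotel_two_days = p_same_hotel_one_day ** 2
    return comb(people, 2) * comb(days, 2) * p_same_hotel_two_days


print(PEOPLE * P_IN_HOTEL == HOTELS * 100)  # True: 180,000 guests per night
print(round(expected_suspicious_events()))  # 249750
```

Under these assumptions, about 250,000 suspicious events are expected in 1000 days even if nobody is a terrorist. That is where the “hundreds of thousands of terrorists” come from.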

Page 27

Confidentiality: Data Science that ensures confidentiality: How to answer questions without revealing secrets?

Page 28

How to share data in a safe manner?

How to compute results with a predefined “privacy budget”?

How to distribute analysis such that nobody has the data?
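A “privacy budget” is most commonly understood as the ε of differential privacy. The sketch below assumes that reading. A counting query (sensitivity 1) is answered with the Laplace mechanism, and every answer spends part of a fixed total ε, so further queries are refused once the budget is used up. Names such as `PrivacyBudget` and `private_count` are hypothetical. For real deployments, a vetted differential-privacy library is preferable, since naive floating-point noise sampling is known to leak information.

```python
import random


class PrivacyBudget:
    """Tracks a total epsilon; each query spends part of it (sequential composition)."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def spend(self, epsilon):
        if epsilon <= 0 or epsilon > self.remaining:
            raise ValueError("query would exceed the privacy budget")
        self.remaining -= epsilon


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon, budget, rng):
    """epsilon-differentially private count via the Laplace mechanism."""
    budget.spend(epsilon)
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one person changes a count by at most 1 (sensitivity 1),
    # so Laplace noise with scale 1/epsilon suffices.
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(7)
ages = [23, 28, 31, 45, 52, 28, 39]  # hypothetical data; true count of ages > 30 is 4
budget = PrivacyBudget(total_epsilon=1.0)
print(private_count(ages, lambda a: a > 30, 0.5, budget, rng))  # noisy answer near 4
print(private_count(ages, lambda a: a > 30, 0.5, budget, rng))  # budget now used up
try:
    private_count(ages, lambda a: a > 30, 0.5, budget, rng)
except ValueError as err:
    print(err)  # query would exceed the privacy budget
```

A smaller ε gives stronger privacy but noisier answers. The budget bounds the total privacy loss across all questions asked.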

Page 29

Transparency: Data Science that provides transparency: How to clarify answers such that they become indisputable?

Page 30

How to present results such that people understand?

How to make the “data science pipeline” transparent?

How to reveal analysis choices and risks related to the input data?

Do analysis results indeed influence people as intended?

Page 31

Conclusion

Page 32

1) data science: creating value from data

2) responsible data science: the next big challenge

Page 33

[Diagram: data, models, insights, decisions]

responsible by design

Page 34

www.responsibledatascience.org

Join us!