Big Data
Workshop
Bettina Berendt Department of Computer Science KU Leuven, Belgium http://people.cs.kuleuven.be/~bettina.berendt/ St. John's International School April 23rd, 2018, Waterloo, Belgium
Who am I?
Goals and non-goals
• Goals
▫ Talk about Big Data as a critical data scientist
▫ Against a background of what science is and what “critical” means in this context
▫ Involve you in being critical and constructive
• Non-goals (selection)
▫ Go into depth about privacy and data protection, although these topics are unavoidable in the Big Data context
Big Data is ...
(from Alexandra Roche and Josefine Droste’s
presentation)
Big Data is …
• “the growth in the volume of structured and
unstructured data, the speed at which it is
created and collected, and the scope of how
many data points are collected”
• Potential for personalizing learning
• Inherits bias
• Surveillance
• Ethical dilemmas
• Transparency (pro and con), privacy
(Alexandra Roche & Josefine Droste)
Science and being critical are ...
What is science? (1)
• A systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe.
• The word "science" became increasingly associated with what is today known as the scientific method, a structured way to study the natural world.
• Contemporary science is typically subdivided into the natural sciences which study nature in the broadest sense, the social sciences which study people and societies, and the formal sciences like mathematics which study abstract concepts. […] Disciplines which use science like engineering and medicine may also be considered to be applied sciences.
• Science is related to research, and is normally organized by a university, a college, or a research institute.
(Wikipedia: “Science”)
(1st part of pic)
What is science? (2)
“Wissenschaft ist, wenn man genauer nachfragt.”
≈ Science happens when you ask again, and ask more precisely.
(author unknown to me)
(1st part of pic)
Big Data is ...
… something we usually encounter via
statements
Typical Big Data statements (fictitious, but true to style)
① The average Belgian pupil now spends 3 hours
a day chatting.
② Pupils who spend more than 3 hours a day
chatting “like” Converse sneakers and Dunkin
Donuts.
③ People who “like” Converse and Dunkin
Donuts are less intelligent.
Typical Big Data statements (4):
From Psychometrics Centre 2013
to Cambridge Analytica 2016
Typical Big Data statements (5) (from the CEM Brochure)
• Maximise learning potential
• The CEM IBE computer-adaptive assessment provides an excellent research-informed baseline to help you predict future performance (in IB Diploma examinations for each subject)
• The CEM IBE computer-adaptive assessment measures students on three key cognitive areas which research shows are linked to later academic outcomes: maths, vocabulary, non-verbal
• Once you have students’ final IB Diploma results, you can return this data to us
• The full CEM IBE product includes additional … questionnaires aiming to understand your students’ motivations, interests and aspirations. (questions about views on cultural background, way of life, social status, …)
So how …
• … can we understand such statements
scientifically?
• … can we criticise them scientifically?
Big Data is ...
… data
“Data speak for themselves.”
• “With enough data, the numbers speak for themselves.” Anderson, C. (2008).
• “Quantitative data [...] are independent of interpretation; [...] they often demand an interpretation that transcends the quantitative realm.” Moretti, F. (2007), p. 30
Data?
• datum = given
• “data refer to those elements that are taken
[abstracted from phenomena]: extracted
through observations, computations,
experiments, and record keeping”, “selected
from nature by the scientist in accordance with
his [sic] purpose” (Kitchin, 2014)
Capta!
Impact of measurement methods
Who or what “speaks”?
Who or what “decides”?
Summary:
Data cannot speak for themselves
• No data are given (by nature); all data are taken (by a researcher or other data collector)
▫ With conscious or unconscious purposes/agendas
▫ In some context
• Data and analyses of them require interpretation
• Big Data are samples too
• All data have quality issues; in Big Data, we often do not know what these are
• Combining datasets can introduce biases and errors
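The point that Big Data are samples too can be illustrated with a small simulation. The sketch below uses invented numbers only: a hypothetical population of pupils and a hypothetical chat app whose logs cover only pupils who chat at least two hours a day. The names `population` and `app_logs` are assumptions made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: daily chat hours for 10,000 pupils (invented numbers).
population = [random.uniform(0.0, 4.0) for _ in range(10_000)]

# Assumption: the app only "sees" pupils who chat at least 2 hours a day,
# so its logs are a biased sample, however many records they contain.
app_logs = [h for h in population if h >= 2.0]

population_mean = statistics.mean(population)
app_mean = statistics.mean(app_logs)

print(f"all pupils: {population_mean:.2f} h, pupils in app logs: {app_mean:.2f} h")
```

The logs contain thousands of records, yet their average overstates the population average because of how the records were "taken".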
Parking lot science
Some more examples of data biases and parking lot science
• Facebook likes, real-world likes
• Facebook self-presentation: only the good things ...
• Restrictions on search in Twitter
▫ Research focus on current and recent events?!
• “Trending topics” algorithm in Twitter based on burstiness
▫ Suppression of persistent topics?!
Big Data is ...
… statistics
(on steroids)
What should you ask a statistic?
What should you ask this statement?
The average Belgian pupil now spends 3 hours a day
chatting.
How to talk back to a statistic (1)
(building on Huff’s final chapter)
1. Who says so?
2. How do they know?
▫ How were data collected and analysed?
▫ In which contexts?
3. Did somebody change the subject?
▫ What are the actual data?
4. Does it make sense?
So …?
1. Who says so?
2. How do they know?
▫ How were data collected and analysed?
3. Did somebody change the subject?
▫ What are the actual data?
4. Does it make sense?
The average Belgian pupil now spends 3 hours a day chatting.
Huff’s questions in more detail
1. Who says so?
▫ What could be their conscious or unconscious biases?
▫ Do they use unqualified words (“average”: mean, median, …?)
▫ Do they use OK names? (“The survey results from scientists from the University of … show …”)
2. How do they know?
▫ Sample size, selection bias?
▫ Correlation size, significance?
▫ Baseline values?
▫ Did external factors change? E.g. frequency of reporting?
3. Did somebody change the subject? / What are the actual data?
▫ Observation or self-report?
▫ Change over time or across data sets in how basic measures are defined
▫ Correlation or causation?
4. Does it make sense?
▫ Be wary of “exact-sounding numbers” (40.13 Euros to eat per week, average family with 3.5 children)
▫ Extrapolation
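The question about unqualified words such as “average” can be made concrete with a few numbers. The sketch below uses invented chat times for seven hypothetical pupils; it only shows that the mean and the median of the same data can tell quite different stories.

```python
import statistics

# Hypothetical daily chat times in hours for seven pupils (invented numbers).
chat_hours = [0.5, 0.5, 1.0, 1.0, 1.0, 2.0, 15.0]

mean_hours = statistics.mean(chat_hours)      # pulled up by the one heavy user
median_hours = statistics.median(chat_hours)  # the pupil in the middle

print(f"mean = {mean_hours:.1f} h, median = {median_hours:.1f} h")
```

With these made-up values the mean is 3 hours while the median is 1 hour, so "the average pupil chats 3 hours a day" would describe almost nobody in the group.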
Empiricism and apophenia
Empiricism and apophenia: correlation, causation, and instrumentality
Correlation vs. causation
• The current scientific consensus is that the only
way to properly demonstrate causation is to do
an experiment.
• Many Big Data sets – especially those
concerning people – are not experimental data,
because they have been collected as
observations in the field, in all the diverse
contexts in which people operate.
• This means they can only show correlation.
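A small simulation can show how observational data produce a correlation without any causal link between the two measured variables. In this sketch, everything is hypothetical: `z` stands for a hidden third factor, and `x` and `y` each depend on `z` plus their own noise, while neither influences the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(1)  # fixed seed so the sketch is reproducible

# Hidden third factor (hypothetical), e.g. how much free time a pupil has.
z = [random.gauss(0, 1) for _ in range(5_000)]
# x and y each depend on z plus independent noise; neither causes the other.
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)
print(f"correlation between x and y: {r_xy:.2f}")
```

The correlation comes out at around 0.5 even though changing `x` would do nothing to `y`; only an intervention (an experiment) could reveal that.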
How to talk back to a statistic (2)
1. Who says so?
2. How do they know?
▫ How were data collected and analysed?
3. Did somebody change the subject?
▫ What are the actual data?
▫ Correlation or causation?
4. Does it make sense?
“Correlation replaces causation”?!
(1) Good enough for business logic
Correlation replaces causation?!
(2) But deficient for explanation (can we really explain
German history like this?)
Correlation replaces causation?!
(3) What about predictions that affect someone’s self-image?
Questions you should ask any inferential
statistic (e.g., prediction models)
• How good is the model?
• There are many relevant measures of
“goodness”.
• In the following, only a small selection.
What is the measure,
and is it statistically significant?
[figure caption, from paper]
• Prediction accuracy of regression for numeric attributes and traits expressed by the Pearson correlation coefficient between predicted and actual attribute values;
• all correlations are significant at the P < 0.001 level.
• The transparent bars indicate the questionnaire’s baseline accuracy, expressed in terms of test–retest reliability.
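"Statistically significant" and "strong" are different properties of a correlation, and with very large samples even weak correlations become significant. The sketch below uses the standard t statistic for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)); the values of `r` and `n` are hypothetical and not taken from the paper.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation r from n pairs differs from 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical values: the same weak correlation at two sample sizes.
r = 0.1
for n in (50, 50_000):
    t = t_statistic(r, n)
    print(f"n = {n}: t = {t:.1f}, share of variance explained r^2 = {r * r:.2f}")
```

For large samples, a two-sided P < 0.001 corresponds to |t| above roughly 3.3. The weak correlation clears that bar easily at n = 50,000 and fails it at n = 50, while in both cases it accounts for only about 1% of the variance.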
But what does the correlation value
itself say?
(Wikipedia: “Correlation”)
How is a classification model built?
How good is the model? (= How is a classification model evaluated?) confusion matrix
How good?
Overall accuracy = (4+900)/1010 = 89.5%
Precision for “criminals” = 4/104 = 3.8%
Recall for “criminals” = 4/10 = 40%
Accuracy of model “always innocent” = 1000/1010 = 99%
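The four figures can be recomputed from the confusion-matrix counts they imply. In this minimal sketch the variable names (`tp`, `fn`, `fp`, `tn`) are the usual shorthand and do not appear on the slide; the counts are derived from the fractions given there.

```python
# Counts implied by the figures on the slide: 10 actual "criminals", 4 of them
# flagged; 104 people flagged in total; 1010 people overall.
tp = 4                     # criminals correctly flagged
fn = 10 - tp               # criminals missed
fp = 104 - tp              # innocents wrongly flagged
tn = 1010 - tp - fn - fp   # innocents correctly cleared

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
baseline_accuracy = (tn + fp) / total  # a model that always says "innocent"

print(f"accuracy = {accuracy:.1%}, precision = {precision:.1%}, "
      f"recall = {recall:.1%}, always-innocent accuracy = {baseline_accuracy:.1%}")
```

The trivial "always innocent" model beats the classifier on overall accuracy, which is why accuracy alone says little when one class is rare.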
How to talk back to a statistic (3)
1. Who says so?
2. How do they know?
▫ How were data collected and analysed?
▫ How good is the model?
3. Did somebody change the subject?
▫ What are the actual data? Correlation or
causation?
4. Does it make sense?
Recap (from the CEM Brochure)
• Maximise learning potential
• The CEM IBE computer-adaptive assessment provides an excellent research-informed baseline to help you predict future performance (in IB Diploma examinations for each subject)
• The CEM IBE computer-adaptive assessment measures students on three key cognitive areas which research shows are linked to later academic outcomes: maths, vocabulary, non-verbal
• Once you have students’ final IB Diploma results, you can return this data to us
• The full CEM IBE product includes additional … questionnaires aiming to understand your students’ motivations, interests and aspirations. (questions about views on cultural background, way of life, social status, …)
How to talk back to a statistic (4)
1. Who says so?
2. How do they know?
▫ How were data collected and analysed?
▫ How good is the model?
3. Did somebody change the subject?
▫ What are the actual data? Correlation or
causation?
4. Does it make sense?
5. What is actually being claimed?
Accumulation of errors
… and if they see this ad, they will vote for Trump
Statistical model 1
Statistical model 1
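How errors accumulate when one model's output feeds the next can be sketched with simple arithmetic. The accuracies below are invented, and the calculation assumes that the steps err independently and that the chain is only right when every step is right; both are simplifying assumptions, not claims about any real system.

```python
def chained_accuracy(*accuracies):
    """Chance that every step in a chain is right, assuming independent errors."""
    result = 1.0
    for a in accuracies:
        result *= a
    return result

# Hypothetical accuracies for two models applied one after the other,
# e.g. "likes -> personality" followed by "personality + ad -> vote".
combined = chained_accuracy(0.8, 0.7)
print(f"combined accuracy: {combined:.2f}")
```

Under these assumptions, two models that each look reasonable on their own are jointly right only a little more than half the time.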
Big Data is ...
… business models
Recap (from the CEM Brochure)
• Maximise learning potential
• The CEM IBE computer-adaptive assessment provides an excellent research-informed baseline to help you predict future performance (in IB Diploma examinations for each subject)
• The CEM IBE computer-adaptive assessment measures students on three key cognitive areas which research shows are linked to later academic outcomes: maths, vocabulary, non-verbal
• Once you have students’ final IB Diploma results, you can return this data to us
• The full CEM IBE product includes additional … questionnaires aiming to understand your students’ motivations, interests and aspirations. (questions about views on cultural background, way of life, social status, …)
How to talk back to a statistic (5)
1. Who says so?
▫ What (else) are they interested in?
2. How do they know?
▫ How were data collected and analysed?
▫ How good is the model?
3. Did somebody change the subject?
▫ What are the actual data? Correlation or causation?
4. Does it make sense?
5. What is actually being claimed?
NB: Can I see my data?
What if it’s wrong?
• You have data access rights (and other rights)
under European data protection legislation.
• But that’s another workshop …
Big Data is ...
… an understanding of the past used to justify what some decision maker wants to do in the future.
(Geoffrey Rockwell,
personal communication, cited from memory)
Which brings us to …
• … the 2nd meaning of “critical” in science
• “Critical theory” (Habermas, Adorno, …)
▫ (social) science as a practical philosophy aiming at societal change with the goal of increasing the autonomy / self-determination of people
▫ (A view of “critical” not as widely shared as the first one)
Here:
• Is data the only answer?
• What is the question?
Let’s be practical philosophers
and scientists
… and we’ll use a different example now
Belgium: top?
http://ec.europa.eu/eurostat/tgm/refreshTableAction.do?tab=table&plugin=1&pcode=ten00063&language=en
Belgium: flop?
One reason: Belgians don’t excel at sorting waste
Group work!
• Group 1: You are a fitbit-wristband / smart-home company and want to develop predictive analytics for identifying who will have problems separating their trash properly, in order to give them helpful alerts. You may use any data you want. Prepare a pitch for your business model.
• Group 2: You are a company that wants to use Big Data, but avoid processing personal data. Develop an idea for how to best use these data. Prepare a pitch for your business model.
• Group 3: You are a civil society organisation that wants to improve the trash situation without recourse to Big Data. Prepare a pitch for your idea.
Note 1: Definition of “recycling rate”
Recycling rates for packaging waste (in %)
'Recycling rate' for the purposes of Article 6(1) of
Directive 94/62/EC means the total quantity of
recycled packaging waste, divided by the total
quantity of generated packaging waste.
http://ec.europa.eu/eurostat/web/products-datasets/product?code=ten00063
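The definition amounts to a simple ratio, which the sketch below writes out. The function name `recycling_rate` and the tonnage figures are invented for illustration and are not Eurostat data.

```python
def recycling_rate(recycled_tonnes: float, generated_tonnes: float) -> float:
    """Recycling rate in percent: recycled packaging waste / generated packaging waste."""
    if generated_tonnes <= 0:
        raise ValueError("generated_tonnes must be positive")
    return 100.0 * recycled_tonnes / generated_tonnes

# Invented figures, for illustration only.
print(f"{recycling_rate(800.0, 1000.0):.1f} %")
print(f"{recycling_rate(1600.0, 2000.0):.1f} %")
```

Both calls give the same rate although the second hypothetical country generates twice as much packaging waste, so a ranking by rate says nothing about the absolute amount of waste.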
Note 2: Recycling science
Some more ideas
64
Shops
65
Re-use
66
Activists
67
“Science activists”
Thank you!
Questions? Email me!
http://people.cs.kuleuven.be/~bettina.berendt/
70
References
• Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete. Wired 16.07. Available at http://edge.org/3rd_culture/anderson08/anderson08_index.html
• pp. 42ff: Degeling, M. & Berendt, B. (2017). What is wrong about Robocops as consultants? A technology-centric critique of predictive policing. AI & Society. May 2017 Online First.
• pp. 8, 10: Huber, O. (). Das psychologische Experiment: Eine Einführung.
• Huff, D. (1954). How to Lie with Statistics. New York: W.W. Norton & Company, Inc.
• Kitchin, R. (2014). The Data Revolution. Big Data, Open Data, Data Infrastructures & Their Consequences. London: Sage.
• pp. 13, 37, 39: Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110 (15), 5802–5805.
• Moretti, F. (2005). Graphs, Maps, Trees. Abstract Models for Literary History. London: Verso (cited from the paperback published in 2007), p. 30.
• pp. 13f, 49: www.theguardian.com/commentisfree/2018/mar/20/brenda-the-civil-disobedience-penguin-on-cambridge-analytica-the-real-was-getting-caught
• pp. 31f.: From http://www.tylervigen.com/spurious-correlations
• Further sources on the slides themselves.
• My apologies for having mislaid some photo/picture URLs, and thanks to those who provide(d) them online!
Not cited, but also potentially interesting:
• Berendt, B. (2015). Big Capta, Bad Science? On two recent books on “Big Data” and its revolutionary potential. http://people.cs.kuleuven.be/~bettina.berendt/Reviews/BigData.pdf
• boyd, d. & Crawford, K. (2012). Critical questions for Big Data. Information, Communication & Society, 15:5, 662-679, DOI: 10.1080/1369118X.2012.678878.