big data analytics - leiden universityliacs.leidenuniv.nl/~wolstencroftkj/bigdataanalytics.pdf ·...


Page 1: Big data analytics - Leiden Universityliacs.leidenuniv.nl/~wolstencroftkj/BigDataAnalytics.pdf · 2016-12-06 · 29-11-2016 6 Making Sense of Big Data requires dealing with.. Heterogeneity:

29-11-2016


Discover the world at Leiden University

Big data analytics + cases in the health domain

Wessel Kraaij

Privacy?

• Big five personality traits:

• Openness

• Conscientiousness

• Extraversion

• Agreeableness

• Neuroticism

• Who wants to display his big five profile online?


Example predictive analytics: Facebook personality study

[Figure: Facebook; personality profile (PIP) over the Big Five traits: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism]


Evaluation of the model


Dutch National Research Agenda (Nationale Wetenschapsagenda)

• 25 of the 140 cluster questions in the agenda are related to big data

• Using big data responsibly: searching for patterns in large data sets (Big Data Verantwoord Gebruiken: Zoeken naar Patronen in Grote Gegevensbestanden)

• https://vragen.wetenschapsagenda.nl/


Big data & analytics


Examples of big data

• The ‘traditional’ internet (45 billion indexed pages): Google contributed the Google File System and MapReduce, the designs that inspired Hadoop; Amazon popularized cheap cloud services

• Imagery data: digital video (YouTube adds 6000 hrs/hr = 13 TB/hr); satellite data; astronomical data (LOFAR, 140 PB/day)

• Measurement data: genome sequences, weather, air quality

• Human sensors: 10 million smartphones in the Netherlands, always on


Capturing and storing data is a commodity. Now the real challenge is combining datasets and SENSE MAKING
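As a quick sanity check on the YouTube figure in the examples above, the two numbers (6000 hours of video added per hour, 13 TB per hour) imply a storage cost per hour of video and an average bitrate. The sketch below assumes decimal units (1 TB = 10^12 bytes), which the slide does not specify; the variable names are illustrative only.

```python
# Sanity check of the figure "6000 hrs/hr = 13 TB/hr".
# Assumption (not stated on the slide): decimal units, 1 TB = 10**12 bytes.
HOURS_OF_VIDEO_PER_HOUR = 6000
TERABYTES_PER_HOUR = 13
BYTES_PER_TB = 10**12

# Storage needed for one hour of uploaded video.
bytes_per_video_hour = TERABYTES_PER_HOUR * BYTES_PER_TB / HOURS_OF_VIDEO_PER_HOUR
gigabytes_per_video_hour = bytes_per_video_hour / 10**9

# Average bitrate: bits per video hour divided by 3600 seconds.
megabits_per_second = bytes_per_video_hour * 8 / 3600 / 10**6

print(f"{gigabytes_per_video_hour:.2f} GB per hour of video")
print(f"{megabits_per_second:.1f} Mbit/s average bitrate")
```

Under these assumptions the two numbers are mutually consistent with ordinary compressed video, at roughly 2 GB per hour of footage.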


Making sense of big data requires dealing with:

• Heterogeneity (variety): combining different information sources is often mentioned as a game changer

• Information quality (noisy data, uncertain data, missing data, provenance, veracity, traceability, auditing)

• The large volume of data

• The speed at which data becomes available (velocity)

• The complexity of the system

• Going from raw data to information: analyze, understand, reason, decide, act

Extracting meaning

Images: Rose Business Technologies, mskcc.org, searchengineland.com, rtvdrenthe.nl, onweer-online.nl


The different kinds of data analytics (Gartner, 2013)

• Descriptive analytics: summarize and organize data into actionable information. Ex: transcribing speech, semantic annotation of images, video, measurements

• Diagnostic analytics: causal inference. What caused a certain observation?

• Predictive analytics: (1) predicting preferences by looking at similar people (using correlations); (2) predicting what will happen (states, gradients) using historical data

• Prescriptive analytics: suggest quantified decision options, making use of the predictions and associated cost models. E.g. health care, smart factories etc.


Limitations: societal impact

• Transparency: in the future we can trace the origin of our food, but a car insurer can also detect a risk-prone driving style

• Personalization: personalized learning, personalized health care etc., but also a risk that a privileged information position is abused

• Society should define limits. Ex: Google Glass was not successful due to perceived privacy breaches

• Challenge: design an architecture with personalized control of the privacy-utility trade-off


Limitations: data protection law

• Europe: Data Protection Directive, GDPR

• Netherlands: Wet bescherming persoonsgegevens (Wbp, the Dutch Personal Data Protection Act)

• Key principle: data is collected for a specific purpose and cannot be used for other purposes without explicit consent

• This limits the combination of datasets


Limitations: spurious correlations


Correlation does not imply causation [ex. thesis T. Claassen]

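Three causal structures can produce the same positive (+) correlation between cannabis use and depression:

1. Cannabis → Depression

2. DNA/Brain → Cannabis and DNA/Brain → Depression (common cause)

3. Depression → Cannabis

Only in the first situation will banning cannabis help to reduce depression.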
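The difference between the three situations can be made concrete with a small simulation: all three produce a positive correlation in observational data, but an intervention (forcing cannabis use to zero) lowers the depression rate only when cannabis is the cause. This is an illustrative sketch; the probabilities, structure names and the `simulate` function are assumptions, not values from the cited thesis.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

STRUCTURES = ("cannabis -> depression", "DNA/brain -> both", "depression -> cannabis")

def simulate(structure, ban=False, n=20000, seed=1):
    """Sample binary (cannabis, depression) pairs; ban=True forces cannabis to 0."""
    rng = random.Random(seed)
    cannabis, depression = [], []
    for _ in range(n):
        # Always draw the same three random numbers so runs with and
        # without the ban are directly comparable.
        u_gene, u_can, u_dep = rng.random(), rng.random(), rng.random()
        if structure == "cannabis -> depression":
            c = 0 if ban else int(u_can < 0.3)
            d = int(u_dep < 0.1 + 0.3 * c)
        elif structure == "DNA/brain -> both":
            g = int(u_gene < 0.5)  # hidden common cause
            c = 0 if ban else int(u_can < 0.1 + 0.4 * g)
            d = int(u_dep < 0.1 + 0.3 * g)
        else:  # "depression -> cannabis"
            d = int(u_dep < 0.2)
            c = 0 if ban else int(u_can < 0.1 + 0.4 * d)
        cannabis.append(c)
        depression.append(d)
    return cannabis, depression

results = {}
for structure in STRUCTURES:
    c, d = simulate(structure)
    _, d_ban = simulate(structure, ban=True)
    correlation = pearson(c, d)
    reduction = sum(d) / len(d) - sum(d_ban) / len(d_ban)
    results[structure] = (correlation, reduction)
    print(f"{structure:25s} correlation={correlation:.2f} "
          f"reduction in depression rate after ban={reduction:.3f}")
```

Because the observational data look alike in all three cases, the correlation alone cannot tell which policy will work; that requires causal assumptions or an experiment.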

STUDY EXAMPLE: SWELL PROJECT


Project focus: help alter bad habits of knowledge workers and improve creativity, effectiveness

SWELL, one of the COMMIT/ projects 2011-2017

SWELL ambition

Support people to:

• Manage their work

• Manage their health

• Improve their well-being

How: use sensors and ICT to:

1. Observe

2. Interpret

3. Act


This is not a new idea

1924-1932: experiments at the Hawthorne Works

Can workers become more productive at lower or higher levels of light?

Outcome: the effect on productivity was positive, but short-lived

>> Hawthorne effect: people change their behaviour because they know they are being observed <<

Cf. Chris Anderson’s TEDx talk “Living by Numbers” http://vimeo.com/26182608

Existing applications: Microsoft Office Clippy

Result of an MS project on using Bayesian techniques to infer user goals and provide feedback (PI Eric Horvitz)

Some reasons for failing:

• No persistent user profiles

• No adaptation to user competence level

• Reasoning based on very little data

• Distinct, unconnected activity and content models

• The development team replaced the utility-function-based interaction algorithm with a simplistic rule-based engine

http://robotzeitgeist.com/tag/clippy

http://en.wikipedia.org/wiki/Office_Assistant


Existing applications: RSI prevention

Workrave

Google

Using Google services enables Google to get a very precise idea about parts of the user's life.

This is used for serving relevant ads.

The better the user model, the higher the revenue through AdSense.


Discussion

• Workrave has a preconceived idea of what is good for a PC user, and is too intrusive

• Clippy makes an initial attempt at guessing what a user wants to do, combined with a proactive interaction style, but lacks personalization, adaptation and a choice of interaction style

• Google’s goal is to create a complete user model, but respecting privacy conflicts with their business model

Conclusions: the user wants to be in control; early apps are not adaptive, since they do not learn over time and do not adapt to individual preferences

Maybe we need to learn more about humans …

The need for a more refined model of human behaviour and HCI

Goal: support for self management of workload, health etc.

Requirements / success factors at application level

Does not restrict autonomy, limited impact on privacy

Must be easy to use

What is difficult for humans

Balance short term and long term optimization

Self-perception is coloured

Memory is imperfect, selective and coloured

Easy to under or over-estimate required effort


Reflective Practice (Donald Schön)

Reflection on experience as the basis for learning

• Reflection in action: experience guides action

• Reflection on action: reflecting on one's reaction to a situation after the fact, exploring reasons and consequences

Recommendations for reflective practice:

• Keeping a journal

• Seeking feedback

• Viewing experiences objectively

• Taking time at the end of each day, meeting, experience etc. to reflect on actions

Schön, D. (1983) The Reflective Practitioner, How Professionals Think In Action, Basic Books

Learning theories

Transtheoretical model for behavioural change

Dominant model for change of health related behaviour

Prochaska, J.O., & DiClemente, C.C. (2005). The transtheoretical approach. In: Norcross, J.C., & Goldfried, M.R. (eds.), Handbook of Psychotherapy Integration, 2nd ed. New York: Oxford University Press, pp. 147–171.


Summary

Hypothesis 1: Human effectiveness can be significantly improved with reflection and introspection that questions assumptions and frames.

“Thinking out of the box”, “(Avoid) tunnel vision”, “Take the blinkers off”

Hypothesis 2: It is useful to recognize different stages of behavioural change and provide appropriate feedback.

How can we support reflective practice?

Provide tools for objective logging and activity analysis

Stimulate people to take time for reflection by making it easy.

Provide non-judgemental and stimulating feedback

Involve peers / colleagues / friends

SWELL: Main hypothesis

Self-lifelogging (recording activities and inferring mental/physical state) can be used to improve well-being of knowledge workers, by:

Supporting behavioural changes

Improving self efficacy and self-knowledge

Main data analytics challenges:

• SENSE: interpreting human activity and mental and physical condition on the basis of a combination of various types of unobtrusive low-level sensors

• REASON: reasoning about activities and the consequences of different action alternatives

• ACT: personalizing the motivational feedback for health and well-being applications


Affective computing

Privacy-respecting data processing

Causal inference

Real-time classification

Combination of unobtrusive sensors

Personalized context-sensitive recommendation

Explainable AI

Learning to coach from user feedback

Persuasive technology


SWELL workflow

[Diagram: a loop from the worker / patient, via sensing, to interpreting sensor data in context (state & context), to creating a customised recommendation, and back to the worker / patient by acting (nudge)]

Study example: Affective computing to measure stress and mental effort


SWELL Results (Koldijk)

Promising results with unobtrusive stress monitoring

Methodology for designing stress interventions, grounded in theory and operationalized by sensor technology and taking into account privacy concerns

Several prototype m-health apps

Koldijk, S., Bernard, J., Ruppert, T., Kohlhammer, J., Neerincx, M.A., & Kraaij, W. (2015). Visual Analytics of Work Behavior Data - Insights on Individual Differences. In: Proceedings of EuroVis 2015.

Koldijk, S., Neerincx, M.A., & Kraaij, W. (2016). Detecting work stress in offices by combining unobtrusive sensors. IEEE Transactions on Affective Computing.

Koldijk, S., Kraaij, W. & Neerincx, M.A. (2016). Deriving Requirements for Pervasive Well-Being Technology From Work Stress and Intervention Theory: Framework and Case Study. JMIR Mhealth Uhealth 2016;4(3):e79.

Research on: Sensing & Interpretation

Controlled variable: induced stress

Outcomes:

• Perceived stress (VAS)

• Emotion (SAM)

• Mental effort (RSME)

• Task load (NASA-TLX)


How could stress or related aspects be measured with (unobtrusive) sensors?

Unobtrusive measurements

The SWELL Knowledge work dataset for stress and user modeling research [koldijk,2014]


Controlled experiment with 25 subjects

Three blocks: neutral, stressor 1, stressor 2

Sensor measurements and questionnaires

Koldijk, S., Sappelli, M., Verberne, S., Neerincx, M.A., & Kraaij, W. (2014). The SWELL Knowledge Work Dataset for Stress and User Modeling Research. In: Proceedings of the 16th ACM International Conference on Multimodal Interaction (ICMI 2014).

Kraaij, W. (Radboud University & TNO), Koldijk, S. (TNO & Radboud University), & Sappelli, M. (TNO & Radboud University) (2014). The SWELL Knowledge Work Dataset for Stress and User Modeling Research. DANS. http://dx.doi.org/10.17026/dans-x55-69zp

Sappelli, M., Verberne, S., Koldijk, S., & Kraaij, W. (2014). Collecting a dataset of information behaviour in context. In: Proceedings of the 4th Workshop on Context-awareness in Retrieval and Recommendation (CARR @ ECIR 2014), Amsterdam, The Netherlands, 13-16 April 2014.

Feature values are averaged per minute
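Per-minute averaging of raw sensor samples can be sketched as follows. This is a minimal illustration, assuming samples arrive as (timestamp in seconds, value) pairs; the function name `average_per_minute` and the sample values are hypothetical and not taken from the SWELL dataset.

```python
from collections import defaultdict

def average_per_minute(samples):
    """Map (timestamp_seconds, value) samples to {minute_index: mean value}."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Integer division by 60 gives the index of the one-minute window.
        buckets[int(timestamp // 60)].append(value)
    return {minute: sum(values) / len(values)
            for minute, values in sorted(buckets.items())}

# Hypothetical samples of a single feature, recorded at irregular intervals.
samples = [(0, 70), (20, 72), (59, 74), (60, 80), (95, 82), (130, 90)]
per_minute = average_per_minute(samples)
print(per_minute)
```

Aggregating to a fixed window puts features from sensors with different sampling rates on one common time axis, at the cost of losing variation within the minute.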


Identifying the working condition

Conclusion: Sensors do record different behaviour in stressor conditions

10-fold cross-validation
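The evaluation protocol named above can be sketched in a few lines: the data is split into ten folds, and each fold is used once for testing while the other nine are used for training. The nearest-mean classifier and the toy data below are stand-ins chosen for brevity; they are assumptions, not the classifiers or features used in the study.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def fit_nearest_mean(xs, ys):
    """Return the mean feature value per label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_nearest_mean(means, x):
    """Predict the label whose mean is closest to x."""
    return min(means, key=lambda y: abs(x - means[y]))

def cross_validate(xs, ys, k=10):
    """Mean accuracy over k folds (assumes at least k samples)."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        means = fit_nearest_mean([xs[i] for i in train_idx],
                                 [ys[i] for i in train_idx])
        hits = sum(predict_nearest_mean(means, xs[i]) == ys[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k

# Hypothetical one-dimensional feature with two well-separated conditions.
xs = [i / 100 for i in range(50)] + [1 + i / 100 for i in range(50)]
ys = ["neutral"] * 50 + ["stress"] * 50
accuracy = cross_validate(xs, ys)
print(accuracy)
```

Note that random folds mix samples from the same person across training and test sets, which matters when individual differences are large.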

Modality comparison

Conclusion: Posture is the strongest feature modality; facial expression features help


Predicting subjective mental state/effort

Sensor data seems to be most powerful for predicting ‘mental effort’ (RSME)

Model tree regression best performer (correlation 0.83)

Facial expression features are best predictors, followed by posture.

How important are individual differences?

Identifying the working condition:

• Full-population SVM: 90% (majority baseline 62%)

• Adding participant ID: no change!

• Test on unseen user (leave one out): average 59% (min 37.5, max 88.34)

=> performance for new users might be very low

Predicting mental state:

• Full-population regression model: 0.83

• Adding participant ID: 0.94

• Test on unseen user (leave one out): 0.03

=> performance for new users might be very low

Especially the second task is sensitive to individual differences
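The "test on unseen user" rows correspond to leave-one-subject-out evaluation: all samples of one participant are held out, the model is trained on everyone else, and the score is reported per held-out participant. The sketch below uses a nearest-mean classifier as a stand-in for the SVM and hypothetical toy data in which one participant has an atypical personal baseline; none of the numbers come from the study.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for every distinct subject."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx

def fit_nearest_mean(xs, ys):
    """Return the mean feature value per label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_nearest_mean(means, x):
    """Predict the label whose mean is closest to x."""
    return min(means, key=lambda y: abs(x - means[y]))

# Hypothetical data: each subject has a personal baseline; stress adds 2.0.
baselines = {"s1": 0.0, "s2": 0.1, "s3": 0.2, "s4": 0.3, "s5": 0.4, "s6": 3.0}
subjects, xs, ys = [], [], []
for subject, baseline in baselines.items():
    for label, shift in (("neutral", 0.0), ("stress", 2.0)):
        subjects.append(subject)
        xs.append(baseline + shift)
        ys.append(label)

accuracy = {}
for held_out, train_idx, test_idx in leave_one_subject_out(subjects):
    means = fit_nearest_mean([xs[i] for i in train_idx],
                             [ys[i] for i in train_idx])
    hits = sum(predict_nearest_mean(means, xs[i]) == ys[i] for i in test_idx)
    accuracy[held_out] = hits / len(test_idx)

average = sum(accuracy.values()) / len(accuracy)
print(accuracy)
print(f"average={average:.2f} min={min(accuracy.values())} max={max(accuracy.values())}")
```

In this toy setting the atypical subject is classified at chance level when held out, which mirrors the wide min-max range reported above: a population model can work for typical users and fail for a new user whose baseline differs.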


Towards a subtype based analysis

Idea: cluster subjects, train and evaluate classifier for subtypes

Hierarchical clustering (for determining k) and k-means

Numbers in parentheses: subjects per subtype (each row sums to the 25 subjects).

Modality  Subtype 1                               Subtype 2                             Subtype 3                           General
Computer  Writers (16): 0.17                      Copy-pasters (9): 0.34                -                                   0.15
Facial    Low expression (16): 0.79               Eyes wide & mouth tight (3): 0.81     Tight eyes & loose mouth (6): 0.87  0.81
Posture   Sits still & moves right arm (5): 0.76  Restless body & calm wrist (6): 0.85  Average movement (14): 0.69         0.59
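The clustering step behind the subtypes can be sketched with plain k-means (Lloyd's algorithm). The sketch assumes k is already known (the slides determine it with hierarchical clustering, which is not shown here), initialises the centroids with the first k points as a simplification, and uses hypothetical two-dimensional per-subject features.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equally long tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, iterations=20):
    """Lloyd's algorithm; returns (cluster index per point, centroids)."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic initialisation
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids

# Hypothetical per-subject features, e.g. (typing rate, copy-paste rate).
subjects = [(0.1, 0.2), (5.0, 5.1), (0.0, 0.0), (0.2, 0.1), (5.2, 4.9), (4.9, 5.0)]
assignment, centroids = k_means(subjects, k=2)
print(assignment)
```

Once every subject has a subtype, a separate classifier or regression model can be trained and evaluated per subtype, which is the comparison the table reports against the general model.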

Conclusions & Next Steps

The SWELL-KW dataset is a multimodal dataset for research in user modeling and affective computing

We can distinguish stressful working conditions in a controlled setting using unobtrusive sensors (posture is the best feature modality)

Mental effort can best be estimated using facial expression data

Individual differences play a significant role

A hierarchical approach using subtypes is promising

Plan: run field study

Field study at BZK

Validate stress markers


Take aways

Big data offers great potential for new business and scientific research

Ethical considerations should be taken into account when designing big data processing systems

Affective computing and behavioural analytics are potentially powerful techniques for developing digital personal assistants
