
Exploiting Sensor Data to Increase Compliance with Ecological Momentary Assessments

Author: Pietro CROVARI, ID No. 874562

Supervisor: Prof.ssa Franca GARZOTTO

Co-Supervisors: Prof. Thomas PLÖTZ, Prof. Gregory ABOWD, Prof. Andrea BOTTINO

Master Thesis

Computer Science and Engineering Master Program
Scuola di Ingegneria Industriale e dell'Informazione

Dipartimento di Elettronica, Informazione e Bioingegneria

Academic Year 2017-2018


- Are you an idiot?
- No, Sir, I am a dreamer

John M. Dorian


POLITECNICO DI MILANO

Sommario

Scuola di Ingegneria Industriale e dell'Informazione

Dipartimento di Elettronica, Informazione e Bioingegneria

Computer Science and Engineering

Exploiting Sensor Data to Increase Compliance with Ecological Momentary Assessments

by Pietro CROVARI

Mental health at universities is a serious problem. Ecological Momentary Assessments (EMAs) are a family of instruments that have proven very effective for assessing it. Unfortunately, they are usually perceived as a burden by users, so the compliance rate is very low. In this work we propose a Machine Learning algorithm that exploits data collected by smartphones to select the best EMA to show to the user in order to increase compliance. A mathematical model is formulated to evaluate the Quantity of Information contained in the different EMAs. We then use a variant of the "Multi-Armed Bandit" algorithm to select the best EMA to display depending on the sensed context. Through an experiment with 8 participants we find that different contexts correspond to different optimal EMAs. Finally, in an individual interview with the participants at the end of the study, we observe how important it is for users to be given an EMA format they appreciate, and the close dependence between their willingness to answer and the context in which they find themselves.


POLITECNICO DI MILANO

Abstract

Scuola di Ingegneria Industriale e dell'Informazione

Dipartimento di Elettronica, Informazione e Bioingegneria

Computer Science and Engineering

Exploiting Sensor Data to Increase Compliance with Ecological Momentary Assessments

by Pietro CROVARI

Monitoring mental health in the student population is a very important task. Ecological Momentary Assessment (EMA) is a method based on questions prompted on a smartphone to capture the subjective state of an individual in their natural context. Unfortunately, EMAs are perceived as very burdensome by users, and therefore the compliance rate is often low. We propose a machine learning algorithm that exploits smartphone sensor data to select the optimal EMA format to prompt, such that it maximizes users' compliance and, consequently, the information obtained by the researcher. A framework to evaluate the "Quantity of Information" is formulated and used to determine the informativeness of a set of EMAs. Then, a variant of the "Multi-Armed Bandit" algorithm is employed to select the best EMA according to the sensed context. A small-scale field study (8 participants) is carried out, which shows that different contexts often correspond to different optimal EMAs. An individual interview with each participant reveals the importance of employing an EMA format the users appreciate, and the dependence between the context in which they are and their availability to answer different EMA formats.


Acknowledgements

While I am writing these words I am sitting at my desk in the Ubicomp Laboratory, more than 7000 km from Milan. I am writing the last pages of this document and thinking back to what this thesis has been. This internship at the Georgia Institute of Technology has been more than a research period. I arrived in the US as a student, uncertain about his future and doubtful about his abilities. In these four months, I had the chance to challenge myself, to understand my real capabilities, and to know myself better. I will leave Atlanta in a couple of weeks as a grown person, a real Researcher, determined about his future and enthusiastic to begin his career. But none of this would have been possible without the help and the support of many people.

First of all, a huge thank you to my Supervisor, Professor Garzotto, the first person to see in me a potential I could not even imagine, and the person who made this experience possible.

Thanks to Prof. Abowd for having welcomed me into his research group, for all his suggestions, for having treated me with so much attention, and for having made me feel at home.

Thanks to Prof. Plötz. Your help has been essential for the whole project. Beyond the mere technicalities, you really taught me what doing research means. I will miss all the "Buongiorno" and the "Arrivederci".

Thanks to Vedant, you have been my guide both in my work and in my American life. Thanks for all the time you spent helping me.

Thanks to all my lab mates, Dan, Mehrab, Shruthi, Hong, Hyeok, and Philips. You made me really understand what working in an international group means. I really learned a lot from all the discussions we had. You taught me that even if we speak different languages, have different traditions and different beliefs, and eat different things, at the end of the day we are much more similar than it may seem.

Thanks to my International Family: Chiara, Irene, Jordi, Joan, and all the others. Without you this experience wouldn't have been the same. A famous quote says that friends are the family we choose for ourselves, and I strongly believe you have been my family in these four amazing months.

I truly think that this incredible adventure is only the tip of the iceberg of the journey that brought me here. The two years in Milan have been one of the best experiences of my life.

Thanks to my Family, my first supporters. You really made all this possible, supporting me from the very first moment I arrived in Milan. These have not been two easy years, but you helped me get through all of them.

Thanks to my acquired brothers, Luca and Dodo; without you I wouldn't be the same. An entire page wouldn't be enough to say how grateful I am to you.

Thanks to my friends in Genova, the "figli", Zini, Zava, Gian and all the others, for showing me that distance doesn't matter.

Thanks to my Milan friends, in particular Francesco, Sebastian, Chiara, and Bea. You are my happy island, keeping me from going crazy in the everyday routine. When I arrived in Milan, I believed I could never have created strong bonds, but you proved me totally wrong. I am so grateful I met amazing people like you.

Thank you to all the ASPers and the CLONE team for having taught me how amazing it is to work together toward a common goal.

Finally, thank you Hoclab, for having shown me the beauty of facing life with a smile.


Contents

Sommario
Abstract
Acknowledgements

1 Introduction
  1.1 Research Question

2 Literature Review
  2.1 Assessing Mood Instability
    2.1.1 Self-Esteem Scale
    2.1.2 Patient Health Questionnaire
    2.1.3 Positive And Negative Affect Schedule
      Photographic Affect Meter: a visual alternative to PANAS
  2.2 Ecological Momentary Assessment
  2.3 Increasing the compliance rate
    2.3.1 Reducing the burden
    2.3.2 Increasing user's engagement
    2.3.3 Finding the best moment to interrupt
  2.4 Predicting Interruptibility

3 Formulation Modelling
  3.1 Mathematical Framework
    3.1.1 Problem formulation
    3.1.2 Quantity of Information
  3.2 Multi-Armed Bandit
    3.2.1 Multi-Armed Bandit Setting
    3.2.2 Solving MABs: Upper Confidence Bound - UCB1
  3.3 Contextual Multi-Armed Bandit
    3.3.1 Query-Ad-Clustering Algorithm

4 Study Design
  4.1 Experimentation Procedure
    Wrap-up Meeting
    4.1.1 EMAs
  4.2 Instrument Overview
  4.3 AWARE Framework
  4.4 AWARE Dashboard
    4.4.1 AWARE Customization
  4.5 Backend

5 Data Analysis
  5.1 The Dataset
    5.1.1 Sensors Data
      AWARE Device
      Application in use
      Calls Data
      Ambient Light
      Screen Status
      Network Status
    5.1.2 Plug-ins Data
      Activity Recognition
      Ambient Noise
      Google Fused Location
  5.2 Data Preparation and Features Transformation
    5.2.1 battery_level
    5.2.2 device_on
    5.2.3 hour
    5.2.4 last_activity
    5.2.5 light_value
    5.2.6 network_type
    5.2.7 notification_number
    5.2.8 place
    5.2.9 screen_app
  5.3 EMA data

6 Discussion and Conclusions
  6.1 Quantitative analysis
  6.2 Qualitative Analysis
  6.3 Limitations and Further Works
  6.4 Conclusions

A Certificates of Attendance Online courses
B Consent Form

Bibliography


List of Figures

2.1 Rosenberg's Self Esteem Scale
2.2 PHQ-9 Questionnaire. Questions (1) and (2) compose PHQ-2
2.3 The PANAS questionnaire
2.4 (a) example of PAM questionnaire on a smartphone (b) valence and arousal scales of PAM pictures
2.5 Screenshots from Zhang's Unlock EMA
2.6 µEMA
2.7 Compliance rate in Hsieh's study, without any feedback to the respondent [Control] and with two different kinds of feedback [A+I, A+M]
2.8 Turner's model for decomposing notifications

3.1 Different EMAs have different resolution in their answers
3.2 Example of questions cited in Table 3.1. (a) Radio, (b) Quick Answer, (c) Checkbox, (d) Likert, (e) Free Text and (f) Scale

4.1 Leaflet used for the recruitment
4.2 Timeline of the study
4.3 EMAs designed for the study. In order top to bottom, left to right: Quick Answer, Radio Buttons Positive, Radio Buttons Negative, and Checkbox
4.4 EMAs designed for the study. In order top to bottom, left to right: Likert Smile, Likert Multiple Adjectives, Hot Spot, and Rank
4.5 High level view on the architecture of the instrument used in the experimentation
4.6 AWARE Framework
4.7 Screenshots from the AWARE mobile application
4.8 Comparison between the default ESM on AWARE (left) and our modified version (right)

5.1 Graphical representation of the screen_app_time and screen_app_commitment features. The labels of the points refer to Table 5.11

6.1 Distribution of time spent to answer the various types of EMAs
6.2 Distribution of the different types of EMA with respect to the time spent to answer (seconds)
6.3 P-values from t-tests performed on the pairs of EMAs. The green values are below the threshold, while the red values failed the test
6.4 Compliance rate of Participants P01, P02, and P03 with different EMAs with respect to the different contexts. The values are listed in Tab. 6.2
6.5 Compliance rate of Participants P05, P06, and P07 with different EMAs with respect to the different contexts. The values are listed in Tab. 6.2
6.6 Expected rewards for the various EMAs, computed with Equation 3.4
6.7 Expected rewards predicted by the Contextual MAB algorithm


List of Tables

2.1 Median Varimax-Rotated Factor Loadings of the Positive and Negative Affect Schedule
2.2 Impact of EMA and Notification type on the Intrusiveness perceived (score from 1 to 5) in Zhang's study
2.3 Impact of EMA and Notification type on the Frequency of Answer in Zhang's study
2.4 11 factors that influence a person's interruptibility at a certain time instant (Ho and Intille, 2005)

3.1 Examples of EMAs and their Quantity of Information. Figure 3.2 presents an example for each ESM listed

5.1 Compliance Rate of every EMA for every Participant
5.2 AWARE Device Data Scheme
5.3 Applications Foreground Data Scheme
5.4 Calls Data Scheme
5.5 Light Data Scheme
5.6 Screen Data Scheme
5.7 Network Data Scheme
5.8 Activity Recognition Data Scheme
5.9 Ambient Noise Data Scheme
5.10 Fused Location Data Scheme
5.11 Conversion among the app category and Screen App Time and Commitment Measurements

6.1 Compliance Rate of every EMA for every Participant
6.2 EMAs answered and missed for every type, participant and context


List of Abbreviations

EMA     Ecological Momentary Assessment [2.2]
ESM     Experience Sampling Method [2.2]
IRB     Institutional Review Board [4.1]
MAB     Multi-Armed Bandit [3.2]
MQTT    Message Queue Telemetry Transport [4.3]
PAM     Photographic Affect Meter [2.1.3]
PANAS   Positive Affect and Negative Affect Schedule [2.1.3]
PHQ     Patient Health Questionnaire [2.1.2]
RNOC    Georgia Tech Research Network Operation Center [4.2]
UCB     Upper Confidence Bound [3.2.2]


Chapter 1

Introduction

Mental health in US colleges is a serious concern. For example, studies report that more than 75% of students suffer from moderate stress disorders, while another 10% suffer from serious anxiety disorders (Abouserie, 1994). Between exams, projects, and group work, college life is extremely demanding. Students' mental health status has both a direct and an indirect impact on their lives. For example, it directly impacts the GPA, the number of credits accumulated at the end of the academic year, and the retention rate from one year to the next (Zajacova, Lynch, and Espenshade, 2005). In addition, factors like stress level play a fundamental role in students' mental health, their mood stability, their social relationships, and their self-esteem (Murff, 2005). Mental health does not only impact college life, but also has severe repercussions on life after graduation (Hunt and Eisenberg, 2010). For example, the World Health Organization estimated that almost half of the diseases that young adults suffer from in the United States are related to mental health issues (Organization, 2007). Among these issues, the college years are particularly critical, since it has been estimated that most lifetime mental disorders appear for the first time before the age of 24 (Kessler et al., 2005).

Many American universities want to help their students, but to do that they need to monitor the mental and physical health status of the campus population. In fact, only a detailed overview of the actual status of the students can lead to effective decisions to increase students' quality of life. Unfortunately, assessing this information is not easy, for many reasons. First, the population to monitor is not small, but in the order of tens of thousands of people, so costs that depend on the population size must be minimized. Second, the data collection must be as unobtrusive as possible, to avoid making students feel continuously observed and thereby increasing their discomfort. Last, the measurements must be repeated during the entire semester, and the sampling frequency must be sufficient to extract useful information from the gathered data. As a consequence, the sampling procedure must be designed to minimize the user effort, otherwise after the first few weeks the compliance will drop. One of the tools best able to satisfy all these requirements is Ecological Momentary Assessment (EMA).

EMA is a family of techniques that assess the status of a person through quick questionnaires administered multiple times a day, instead of more sporadic and longer sessions, such as a questionnaire or an interview on a weekly basis. These questionnaires vary greatly in content, form, and the frequency with which the respondent is asked to answer, but they are traditionally administered through one of the most ubiquitous devices: the smartphone. The EMA approach is very popular for two main advantages: first, a higher sampling rate allows researchers to reconstruct more precisely the fluctuations of the phenomenon they are observing, and second, the bias that the human mind introduces during the recall process is minimized (Coughlin, 1990).


Although extremely powerful, EMAs are burdening for the respondents. In fact, being interrupted up to several times per day to respond to the questionnaire is usually perceived as extremely annoying, making the respondents less willing to cooperate, and therefore their compliance typically drops over time (Christensen et al., 2003). To avoid this trend, researchers often introduce a remuneration mechanism. This approach usually makes EMA surveys unsuitable for college studies, since remunerating such a large population for the duration of a whole semester is not economically sustainable.

As a consequence, researchers are trying to find alternative ways to make EMA-based studies more effective. As will be explained in detail in Section 2.3, one of the most successful approaches has been the study of the best moment to prompt EMAs. Finding these moments increased the compliance of the users by a factor of up to 1.75. Inspired by the success of these studies, we formulated our research question, investigating a complementary hypothesis.

1.1 Research Question

Instead of choosing the best moment, we want to understand whether we can further increase users' compliance by choosing the form of the EMA. In particular, given a fixed time instant, we aim to choose the best type of EMA to ask in order to maximize the information collected. In fact, different EMAs can require different effort to be answered but, at the same time, can provide researchers with a different amount of information. A yes/no question is very easy to answer, but the answer contains little information, whereas an open question can provide a lot of information, but requires more effort to be completed. On the other hand, if users are asked for an effort that they perceive as too large, they will dismiss the question instead of answering.
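As a purely illustrative way to make this trade-off concrete (this is not necessarily the Quantity of Information framework defined in Chapter 3), answer formats can be compared by the entropy of a uniformly distributed answer over the set A of possible responses:

    I = \log_2 |A|, \qquad I_{\text{yes/no}} = \log_2 2 = 1 \text{ bit}, \qquad I_{\text{5-point Likert}} = \log_2 5 \approx 2.32 \text{ bits}

Richer formats therefore carry more bits per answer, but only if the respondent actually completes them.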

We believe that this effort threshold is not constant, but depends on the context. For this reason, if we find a way to estimate the user's availability, every time we must ask them to answer an EMA we can choose the one that is ideal in that particular context. Since the EMAs are prompted on a smartphone, we want to exploit the sensors on the same device to unobtrusively sense the context. As a consequence, our research question can be defined as:

Is it possible to increase EMA compliance by exploiting sensor data collectedthrough a smartphone?

Given the complexity of the question, the problem is faced gradually, in consecutive steps. First, we need to understand whether, for the user, there is any difference in answering different kinds of questions, i.e., whether the effort they require differs. Second, we need to investigate whether this effort has any repercussion on the compliance rate of the different questions. In other words, we aim to understand whether there is any relation between the effort required to answer a type of question and its compliance rate. Third, we investigate whether the context also influences the compliance rate. To do that, we need to identify some typical contexts in which the user can be, and see whether the same question has different compliance rates in the different contexts. Last, we gather all the results to see whether we can dynamically choose a type of question according to the context in which the user is. As a result, the four sub-hypotheses can be formulated as follows:

1. The design of the EMA has an impact on the user effort required to answer.


2. The design of the EMA has an impact on the compliance rate.

3. Different contexts imply different compliance rate for the same EMA.

4. The most convenient EMA depends not only on the quantity of information it provides,but also on the context in which the respondent is.

To answer these questions, we approached the problem on several levels. First, we built a framework to define the Quantity of Information contained in a generic EMA. Then, we formulated our problem as an instance of the Multi-Armed Bandit, using the Query-Ad-Clustering variant with a reward function that depends on the Quantity of Information. We adapted an open source mobile application (AWARE) to collect data during a field study. We ran our experimentation with 8 participants for 2 weeks, asking them to run the mobile application on their smartphones to collect context data from the sensors on the device while it prompted several types of EMAs. After a qualitative interview with the participants, we applied the algorithms we developed to extract relevant features from the dataset, performed statistical analyses, and trained the Query-Ad-Clustering algorithm.

To answer the questions, we started with a statistical analysis, looking at the distributions of the response times to understand whether they were significantly different (Q1) and comparing the compliance rates of the different EMAs (Q2). Then, we clustered the data into clusters that represent different contexts, and we analyzed how the compliance rate varies according to the context (Q3). Finally, we ran the Query-Ad-Clustering algorithm and compared the results with the theoretical Quantity of Information computed previously (Q4). The results support all four hypotheses, even if, given the small amount of data collected, we are not able to generalize them beyond this study.
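The selection step can be summarized with the following minimal sketch, which assumes a simplified setting: contexts are already clustered, and the reward observed for a prompted EMA is its normalized Quantity of Information when it is answered and 0 when it is dismissed. It runs plain UCB1 independently inside each context cluster; the actual Query-Ad-Clustering algorithm and reward function are defined in Chapter 3, and all names below are illustrative.

```python
import math
import random

class ContextualUCB1:
    """UCB1 run independently inside each context cluster (illustrative sketch)."""

    def __init__(self, ema_types, contexts):
        self.ema_types = ema_types
        # Per (context, EMA type): number of prompts and mean observed reward.
        self.counts = {(c, e): 0 for c in contexts for e in ema_types}
        self.means = {(c, e): 0.0 for c in contexts for e in ema_types}

    def select(self, context):
        """Pick the EMA type to prompt in the given context."""
        untried = [e for e in self.ema_types if self.counts[(context, e)] == 0]
        if untried:
            # Prompt each EMA type at least once before applying the UCB rule.
            return random.choice(untried)
        total = sum(self.counts[(context, e)] for e in self.ema_types)

        def ucb(e):
            n = self.counts[(context, e)]
            return self.means[(context, e)] + math.sqrt(2 * math.log(total) / n)

        return max(self.ema_types, key=ucb)

    def update(self, context, ema_type, reward):
        """Reward in [0, 1]: e.g. normalized Quantity of Information if answered, 0 if dismissed."""
        key = (context, ema_type)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]

# Hypothetical usage, every time the sensed context allows a prompt:
# bandit = ContextualUCB1(["quick_answer", "radio", "likert", "free_text"],
#                         contexts=["home_idle", "commuting", "on_campus"])
# ema = bandit.select("commuting"); ...prompt it...; bandit.update("commuting", ema, reward)
```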


Chapter 2

Literature Review

2.1 Assessing Mood Instability

Researchers have created several questionnaires to assess mood instability. These questionnaires vary greatly in the aspect they aim to measure (e.g., depression, self-esteem, positive or negative attitude), their length, and how the questions are formulated. They have been validated in psychological studies to demonstrate the reliability of their results and to be considered scientifically relevant. This section describes the ones that have been considered as a source of inspiration for designing the questionnaires adopted in this study.

2.1.1 Self-Esteem Scale

The Self-Esteem Scale was developed by the sociologist Dr. Morris Rosenberg in the nineteen-sixties. It identifies potentially problematic low self-esteem in the respondent. The Self-Esteem Scale consists of 10 statements to be rated with a value from 0 to 3, according to the respondent's agreement with the statement. Half of the questions directly assess the self-esteem of the respondent, while the others are formulated with reversed valence (e.g., "All in all, I am inclined to feel that I am a failure"), to reduce response-set effects.

The global score is computed by giving from 0 to 3 points to each response and then summing them. Validation studies have shown that scores between 15 and 25 are considered average, whereas scores below 15 suggest low self-esteem. Figure 2.1 presents the complete test.
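A minimal sketch of this scoring rule, assuming the ten answers are already coded 0-3 in questionnaire order; the positions of the reverse-worded items are an illustrative placeholder, not the official keying of the instrument.

```python
def rosenberg_score(answers, reversed_items=(1, 4, 5, 7, 8)):
    """Score the Rosenberg Self-Esteem Scale.

    answers: ten agreement values coded 0..3, in questionnaire order.
    reversed_items: zero-based positions of the reverse-worded statements
    (illustrative default; check the actual instrument in Figure 2.1).
    """
    if len(answers) != 10:
        raise ValueError("The scale has exactly 10 items")
    total = 0
    for i, a in enumerate(answers):
        # Reverse-worded items contribute 3 - a instead of a.
        total += (3 - a) if i in reversed_items else a
    return total  # 15-25 is an average score; below 15 suggests low self-esteem

# Example: rosenberg_score([3, 2, 1, 3, 0, 2, 3, 1, 0, 2])
```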

This test presents 4 main characteristics (Rosenberg, 2015):

• Ease of Administration: it can be administered to many people at the same time

• Economy of Time: it requires only a few minutes to be completed.

• Unidimensionality: the results span only one dimension. In this way, a ranking can be established among different participants

• Face Validity: respondents can understand what the test is going to measurewhile completing it

Several studies have shown the test to be a valid and effective measurement of self-esteem, both when used with a general population (Robinson, Shaver, and Wrightsman, 2013) and specifically with university students (Martín-Albo et al., 2007).


FIGURE 2.1: Rosenberg’s Self Esteem Scale


2.1.2 Patient Health Questionnaire

The Patient Health Questionnaire (PHQ) is a questionnaire that can be self-administered by the patient. It is composed of 3 pages of multiple-choice questions. The evaluation of the answers can address 8 different diagnoses related to depression, divided into threshold disorders, that is depression, panic disorder, bulimia nervosa and anxiety, and subthreshold disorders, such as binge eating disorder and alcohol abuse or dependence (Spitzer et al., 1999). The PHQ was created in the second half of the nineteen-nineties by Robert Spitzer's research group, as a self-administered version of the PRIME-MD questionnaire (Spitzer et al., 1999).

The possibility of self-administration made the PHQ very popular. In fact, several alternatives have been developed. Some of the most famous are PHQ-9, PHQ-4, and PHQ-2. They are shorter versions of the questionnaire, made of 9, 4, and 2 questions respectively. The shorter questionnaires reduce the completion time, at the cost of lower precision in the diagnosis. In particular, they focus only on the detection of depression symptoms and not on the other diseases monitored by the original PHQ. PHQ-9 uses the same set of questions present in the depression module of the original PHQ, as represented in Figure 2.2 (Kroenke, Spitzer, and Williams, 2001). PHQ-2 contains only the first two questions of PHQ-9, focusing on the presence of a depressed mood and the lack of interest in routine activities. It has been validated as a good tool for diagnosis, but with low specificity (Löwe, Kroenke, and Gräfe, 2005). In other words, these are good tools to detect a potential problem, but further studies must then be carried out to obtain a precise diagnosis of the disease. PHQ-4 merges PHQ-2 with GAD-2, a questionnaire aimed at detecting anxiety. Also in this case the validation showed high reliability at the cost of lower specificity (Khubchandani et al., 2016).

2.1.3 Positive And Negative Affect Schedule

The Positive and Negative Affect Schedule (PANAS) is a self-report questionnaire created to measure positive and negative affect. As shown in Figure 2.3, it consists of 20 different adjectives, 10 referring to positive affect and 10 to negative affect. The respondent must give a score from 0 to 5 to every adjective according to how much he/she feels that way. The question can be structured to ask the respondent to concentrate either on the moment he/she is answering or on the previous week. Positive affect describes how much a person feels active and enthusiastic, while negative affect measures a set of aversive feelings such as anger, fear, guilt, and nervousness. It is important to notice that the two factors cannot be considered opposite ends of the same scale, but must be considered separately (Watson, Clark, and Tellegen, 1988).

The final score, represented by two numbers, is computed by summing the values of the adjectives that refer respectively to positive and negative affect. Table 2.1 shows the Median Varimax-Rotated Factor Loadings of the PANAS schedule. Intuitively, the table shows how important the various adjectives, taken singly, are for measuring positive and negative affect.

This tool has been demonstrated valid both in clinical and non-clinical studies. Interestingly, Crawford and Henry showed that the scores obtained are independent of the demographic variables of the population (Crawford and Henry, 2004).


FIGURE 2.2: PHQ-9 Questionnaire. Questions (1) and (2) compose PHQ-2


TABLE 2.1: Median Varimax-Rotated Factor Loadings of the Positive and Negative Affect Schedule

                         Loading On
PANAS descriptor    Positive Affect    Negative Affect
Enthusiastic             .75               -.12
Interested               .73               -.07
Determined               .70               -.01
Excited                  .68                .00
Inspired                 .67               -.02
Alert                    .63               -.10
Active                   .61               -.07
Strong                   .60               -.15
Proud                    .57               -.10
Attentive                .52               -.05
Scared                   .01                .74
Afraid                   .01                .70
Upset                   -.12                .67
Distressed              -.16                .67
Jittery                  .00                .60
Nervous                 -.04                .60
Ashamed                 -.12                .59
Guilty                  -.06                .55
Irritable               -.14                .55
Hostile                 -.07                .52


FIGURE 2.3: The PANAS questionnaire


FIGURE 2.4: (a) example of PAM questionnaire on a smartphone (b) valence and arousal scales of PAM pictures

Photographic Affect Meter: a visual alternative to PANAS

Simplicity and ease of administration made PANAS one of the most popular tools to measure affect. Even though at publication it was much shorter than equivalent tools, the PANAS questionnaire was not short enough to be used for frequent samplings, such as on a daily basis.

To overcome this limitation, Pollak et al. came up with a new questionnaire, called the Photographic Affect Meter (PAM), such that (Pollak, Adams, and Gay, 2011):

• it could reliably measure (positive) affect,

• it could be as unobtrusive as possible, so that it could be administered frequently, and

• it could be answered in situ.

PAM consists of a single question: the respondent is asked to choose, among 16 pictures, the one that best describes his or her current mood. These pictures are displayed in a 4-by-4 grid, as shown in Figure 2.4 (a). For each cell of the grid, the picture is selected from a pool of three possible images.

PAM evaluates respondents' positive attitude in terms of Valence and Arousal. In fact, the pictures' position is not random: pictures located to the left indicate a lower value of valence, whereas pictures located on the right represent a higher level of valence. In the same way, pictures on the bottom of the grid represent low arousal and pictures on the top represent high arousal (Fig. 2.4 (b)). Therefore, the picture selected by the respondent provides both a valence and an arousal value for the respondent's current mood, ranged in [−2, 2].
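A small sketch of the mapping just described, assuming a zero-based 4-by-4 grid with row 0 at the top and column 0 on the left; the numeric coding {-2, -1, 1, 2} is an assumption consistent with the [−2, 2] range, not necessarily the exact coding used by Pollak et al.

```python
def pam_to_affect(row, col):
    """Convert the selected PAM grid cell into a (valence, arousal) pair.

    row, col: zero-based indices in the 4x4 grid, with row 0 at the top
    and col 0 at the left.  The coding {-2, -1, 1, 2} is an illustrative
    assumption consistent with the [-2, 2] range reported above.
    """
    scale = [-2, -1, 1, 2]        # left -> right, and bottom -> top
    valence = scale[col]          # right side of the grid = higher valence
    arousal = scale[3 - row]      # top of the grid = higher arousal
    return valence, arousal

# Example: the top-right picture maps to the most positive, most aroused state.
# pam_to_affect(0, 3) -> (2, 2)
```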


2.2 Ecological Momentary Assessment

When psychologists want to investigate subjective topics such as mood, anxiety, habits, perception, or any other not directly quantifiable psychological trait, they have to face the so-called "recall bias". In fact, the human brain stores this information in experiential memory, which has a very short term. The longer the time between when the experience occurs and the moment the subject is asked to recall it, the more emotions are flattened and details forgotten (Coughlin, 1990). On top of that, psychological studies demonstrated that the process of memory retrieval is unavoidably influenced, and thus biased, by the context and the mental state at the time of the recall process. In other words, people who are in a bad mood tend to recall more easily, and focus on, bad events. In addition, what the subject experiences in the time elapsed between the recalled moment and the questionnaire completion inevitably influences the answers. Thus, recall bias makes traditional surveying techniques unsuitable to reconstruct short and frequent events such as mood variations (Kihlstrom et al., 2000). Furthermore, the human mind tends to reorganize memories so that they fit in a coherent succession of events that logically leads to the expected conclusion of the recalled event. In this process, facts and emotions are distorted to create these stories (Ross, 1989).

Ecological Momentary Assessment tools are a family of sampling techniques created to overcome these problems. EMA includes very different technologies, from pencil-and-paper diaries to smartphone applications, all aimed at collecting data many times, pursuing real-time collection, in the respondents' natural environment. EMAs are a very heterogeneous set; they differ greatly in the sampling period, the questions asked, and the aim they are adopted for, but some features are common to every EMA approach.

First of all, data are collected in the environments where subjects live, and not in a psychologist's office or a laboratory. This aspect gives the name "Ecological" to these tools. EMAs always focus on the respondent's current state or the very recent past, and that is why "Momentary" appears in their name. The sampling is done over an extended period with a high frequency, so that the phenomenon observed can be reconstructed in its variations (Stone and Shiffman, 1994) (Shiffman, Stone, and Hufford, 2008).

These characteristics make EMAs a very versatile tool that can be used to examine many phenomena, which can be summarized in 4 main categories (Shiffman, Stone, and Hufford, 2008):

• Characterizing individual differences – Information retrieved is aggregated to obtain a description of the phenomena which is summarized across time, in particular across multiple EMAs. For example, doctors can measure the variation of pain in patients, or psychologists can assess mood variations in people. The aggregation of the information makes the results more reliable, while the repeated sampling makes them more valid, because of the lack of recall bias.

• Describing natural history – EMAs are analyzed to discover trends over time. In this case, the focus is on the within-subject variation, to understand whether typical patterns of the phenomena exist.

• Assessing contextual associations – Data collected are examined to highlight the correlation among different phenomena that occur in time. For instance, Barrett and Russell adopted EMAs to demonstrate that a person cannot feel both happy and distressed at the same time (Feldman Barrett and Russell, 1998), whereas Larson and Richards proved that the mood of each family member affects the others (Larson and Richards, 1994).

• Documenting temporal sequences – Researchers can exploit the intrinsically longitudinal nature of EMAs to analyze causal relationships among events.

When designing an EMA, the most crucial decision is determining when the respondent has to report experiences. Three different protocols can be followed (Christensen et al., 2003):

• Interval-contingent protocol – Respondents are required to report experiences at fixed times, about what they are experiencing at that moment or what they have experienced since the last EMA they completed. This is not the best protocol; in fact, since respondents already know when they will have to answer, they can unconsciously introduce some bias in the activities they perform, preparing themselves for that moment;

• Signal-contingent protocol – Respondents are required to report experiences when a signal is received. It can be a notification on the telephone, an alarm, or a ringing pager. With this approach, the bias introduced by the interval-contingent protocol is minimized, since the subject does not know when the EMA has to be filled in. On the other hand, the signal can arrive when the subject is not available to respond, leaving several EMAs unanswered;

• Event-contingent protocol – Respondents are required to report experiences when something significant happens. This kind of protocol minimizes the recall bias since the EMA is answered immediately after the event. This approach is not as trivial as it may seem at first glance. In fact, the event needs to be sensed, which can be a very expensive operation, both in terms of resources, since it requires continuously monitoring what is going on, and in terms of the burden perceived by the respondents, since they may feel continuously observed.

Once the protocol has been chosen, the sampling period is the second variable to be set. The shorter the period, the more complete the information collected, but the higher the effort required from respondents. In addition, a growing required effort means a decreasing compliance rate from the respondents. On the contrary, long sampling periods imply lower effort, but more fragmented information, and the risk of incurring the recall bias.

For these reasons the optimal period is often a trade-off between the amount of information necessary for the study and the effort required from the respondents, and thus it is strictly domain dependent. Looking at existing studies, the number of EMAs collected is, on average, between 8 and 12 questionnaires per day, for studies that span 1 or 2 weeks, for a total number of samples between 56 and 168 per respondent (Reis and Gable, 2000). On top of that, Delespaul suggests sampling no more than 6 times per day if the questionnaire requires more than 2 minutes to be completed (Larson and Delespaul, 1992). Note that in the event-contingent protocol these considerations do not apply, since respondents' actions directly trigger the questionnaire.
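As an illustration of how such a sampling budget can be turned into a signal-contingent schedule, the sketch below draws a fixed number of prompt times per day inside a waking-hours window with a minimum gap between prompts; all parameter values are illustrative and not taken from the cited studies.

```python
import random
from datetime import datetime, timedelta

def daily_prompt_times(day, n_prompts=8, start_hour=9, end_hour=21, min_gap_min=45):
    """Draw random prompt times for one day of a signal-contingent protocol.

    Prompts fall between start_hour and end_hour and are at least
    min_gap_min minutes apart.  All default values are illustrative.
    """
    window_min = (end_hour - start_hour) * 60
    # Sample offsets in the "free" time left once the mandatory gaps are reserved,
    # then re-insert the gaps; this guarantees the minimum spacing in one pass.
    free = window_min - (n_prompts - 1) * min_gap_min
    offsets = sorted(random.sample(range(free), n_prompts))
    day_start = datetime(day.year, day.month, day.day, start_hour)
    return [day_start + timedelta(minutes=off + i * min_gap_min)
            for i, off in enumerate(offsets)]

# Example: 8 prompts between 09:00 and 21:00, at least 45 minutes apart.
# for t in daily_prompt_times(datetime.now()): print(t.strftime("%H:%M"))
```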

EMA was introduced in research in the 1970s and immediately became a widely used tool thanks to its several advantages. However, before deciding to adopt this tool for a study, the pitfalls must be considered too. First of all, there is a set of issues strictly related to participants. In fact, answering 8 to 12 questionnaires every day for 1 or 2 weeks can be perceived as a significant burden to deal with. As a consequence, after a few days the compliance rate can drop drastically. Second, the quality of the data must be considered. Even if EMAs have been filled in, the information collected is not necessarily meaningful. For example, people tend to always answer a question repeated over time in the same way, transforming the completion of the questionnaire into a passive habit. In addition, even if the researcher chooses when to prompt the EMA, the respondent decides when to answer. This means not only that several minutes can elapse between the signal and the response, introducing recall bias, but also that respondents will not answer while performing some activities such as driving, playing a sport, or talking with people, leaving these activities uncovered by the research. Last, there are some important ethical issues, since the data gathered are strictly related to people's personal lives (Scollon, Prieto, and Diener, 2009).

2.3 Increasing the compliance rate

As said, the burden perceived by respondents when asked to fill in EMAs naturally leads them either to ignore the EMAs or to answer meaninglessly. These phenomena are the main causes of an insufficiently detailed or misleading dataset. Therefore researchers try to reach the highest compliance rate possible. Several experiments have been conducted with this aim, investigating three main paths: reducing the burden of answering, increasing the respondents' engagement, and finding the best moments to interrupt them.

2.3.1 Reducing the burden

The first and most natural idea to increase the compliance rate was finding a way to reduce the burden perceived by the respondents. If people find it less tedious to fill in EMAs, they will be more likely to respond when requested. As a consequence, the efforts focused on modifying the form of the EMA to make it more immediate.

Several attempts have been made to gather information as a "side effect" of actions that are natural for the respondents. In particular, various studies tried to exploit the gestures for unlocking the smartphone screen to collect data. Indeed, when the respondent pushed the button on the device to turn on the screen, if the EMA had to be triggered, a custom unlock screen was shown instead of the default one, so that the respondent could answer the question while unlocking the smartphone. The interaction required was designed to imitate as closely as possible the experience people are used to with a traditional unlock screen.

One of the most relevant attempts is Unlock EMA (Zhang, Pina, and Fogarty, 2016). Fig. 2.5 presents some of the EMAs designed for the study. The study compared a "traditional" EMA system and Unlock EMA in terms of the intrusiveness perceived by respondents and their compliance rate. The study identified two control variables, the kind of EMA (traditional or Unlock EMA) and the amount of notifications prompted to the user, and consequently divided subjects into six groups to study which variables influenced more the number of EMAs answered daily [frequency] and the burden perceived by respondents [intrusiveness]. The results, described in Tables 2.2 and 2.3, show that the adoption of an Unlock EMA doubles the frequency (16.1 against 7.9 EMAs per day, F(1, 258) = 188, p < .001), whereas it only marginally influences the intrusiveness (2.12 against 2.38, F(1, 258) = 11.7, p < .001). Notifications, on the contrary, are perceived as much more intrusive, but they influence the frequency less.

FIGURE 2.5: Screenshots from Zhang's Unlock EMA

The CampusLife project adopted a similar approach: the EMA consisted of the PAM questionnaire, periodically prompted instead of the "traditional" unlock screen (Sec. 2.1.3). This study involved 51 students as participants, who had to answer 4 EMAs every day for 5 consecutive weeks. Out of a total of 3220 questionnaires prompted, 1606 were correctly completed; the compliance rate was therefore 1606/3220 ≈ 0.499 (Saha et al., 2017).

Intille et al., instead, experimented with another interaction paradigm: they tried to minimize the time required to answer. Pursuing this aim, they developed µEMA, a single-question EMA designed to be displayed on a smartwatch [Fig. 2.6]. The experimentation showed an increase of the compliance rate by a factor of 1.35 compared to EMAs prompted on a smartphone (Intille et al., 2016).

2.3.2 Increasing user’s engagement

Engaging respondents more is another path that has been broadly investigated. One of the most used techniques to increase engagement is remuneration. Respondents are typically paid around $20 per week of study. To prevent the compliance rate from dropping while the research is going on, additional prizes, both monetary and gadgets, are promised if the respondent's compliance rate stays above a threshold. Typically, a "big prize" is often assigned at the end of the study, to further motivate the respondents (Christensen et al., 2003).

An economic reward is effective in obtaining a higher compliance rate, but it does not improve the quality of the data. In fact, these forms of compensation do not make the participants feel involved, and thus motivated, in collaborating with the research. As a consequence, the risk that they fill in EMAs with low-quality data is high.


TABLE 2.2: Impact of EMA and Notification type on the Intrusiveness perceived (score from 1 to 5) in Zhang's study

                       Notification Type
EMA Type       None      Traditional     Aggressive
Unlock         1.77      2.31            3.08
Traditional    1.33      2.22            2.81

TABLE 2.3: Impact of EMA and Notification type on the Frequency of Answer in Zhang's study

                       Notification Type
EMA            None      Traditional     Aggressive
Unlock         15.0      16.2            17.1
Traditional    2.5       9.8             7.9

FIGURE 2.6: µEMA


FIGURE 2.7: Compliance rate in Hsieh's study, without any feedback to the respondent [Control] and with two different kinds of feedback [A+I, A+M]

To prevent this, Hsieh et al. tried to exploit data visualization to increase compliance. In other words, the research group developed a portal where respondents could see some statistics about the results obtained by analyzing the EMAs they had already answered. The experimentation was carried out by dividing respondents into three groups: the control group, without any visual feedback, and two groups with feedback about different aspects of the information gathered. The compliance rate was monitored for 25 consecutive days, divided into five periods, each 5 days long. The results, illustrated in Fig. 2.7, show that any visual feedback [A+I and A+M groups] improves the compliance rate (Hsieh et al., 2008).

2.3.3 Finding the best moment to interrupt

Finding the best moment to interrupt respondents is the last, and probably the most investigated, possible solution to increase the compliance rate. Studying interruptibility is not a trivial task. As Janssen highlights, a complete investigation would involve several complementary disciplines, such as Human-Computer Interaction, Cognitive Science, Social Science, Computer Science and Experimental Psychology (Janssen et al., 2015).

Even if broadly explored, a unified definition of interruptibility does not exist in the literature. Turner recently surveyed a large number of studies on this topic, and found that all the definitions provided fall into three main categories, defining interruptibility as (Turner, Allen, and Whitaker, 2015a):

• The physiological ability of a person to switch focus across different tasks;

• The cognitive effect on performance due to switching from one task to another and then back to the first task;

• The change in the subject's sentiment due to the interruption.

However, the three definitions are strictly related. Focusing on a human point of view, Ho and Intille identified 11 factors that influence a person's interruptibility at a certain time instant, listed in Table 2.4 (Ho and Intille, 2005).

What is stated above is valid both for interruptions caused by other humans and for those caused by devices.


TABLE 2.4: 11 factors that influence a person's interruptibility at a certain time instant (Ho and Intille, 2005)

Factor – Description of the Factor
Activity of the user – The activity the user was engaged in during the interruption
Utility of message – The importance of the message to the user
Emotional state of the user – The mindset of the user, the time of disruption, and the relationship the user has with the interrupting interface or device
Modality of interruption – The medium of delivery, or choice of interface
Frequency of interruption – The rate at which interruptions are occurring
Task efficiency rate – The time it takes to comprehend the interruption task and the expected length of the task
Authority level – The perceived control a user has over the interface or device
Previous and future activities – The tasks the user was previously involved in and might engage in during the future
Social engagement of the user – The user's role in the current activity
Social expectation of group behavior – Activities and expected reaction to interruption of nearby people
History and likelihood of response – The type of pattern the user follows when an interruption occurs


sources present some differences. During his experiment, he asked some students to perform a data-entry task. While they were working, they were interrupted either by a researcher or by a notification system, both asking for some information that the subjects could find in a table given to them at the beginning of the experimentation session. The results show several interesting aspects. First, there is no difference in the effort the subject makes to resolve the different interruptions: the interruption times are comparable, and the error rate of the answers is similar too. Second, as expected, the delay between when the attention of the subject is triggered and when the subject actually interrupts himself is smaller when there is a human waiting. This is probably due to the perceived social expectation created by the researcher waiting for an answer. As a consequence, the disruption perceived for the human interruption is greater. These results, thus, can also be seen as evidence that people are less willing to be interrupted by electronic devices.

In addition, for interruptions caused by EMAs, if we consider the framework described above [Table 2.4], the low level of utility perceived by the respondent, the high frequency of interruption, the modality of interruption (the smartphone) and its low authority level make the perceived degree of interruptibility even lower.

All things considered, a question naturally arises: is it possible to increase the EMA compliance rate by estimating when the respondent is more interruptible?

2.4 Predicting Interruptibility

Interruptibility has been widely explored using Machine Learning. Supervised Learning classification is the most used family of techniques to predict when a subject is willing to be interrupted. In particular, according to Turner et al., Naive Bayes and Support Vector Machines (SVMs) are the most investigated algorithms. Decision Trees, Adaboost, Nearest Neighbour and Random Forest are popular classifiers too. Neural Networks, Association Rule Learning, and Genetic Programming, instead, are algorithms still not widely used in this field (Turner, Allen, and Whitaker, 2015a). In the same research, the authors highlight how the trend is moving from offline to online learning, thanks to the increasing computational and networking capabilities of new devices such as smartphones. On the other hand, the researchers show how the current panorama is split between personalized and composite models. In fact, some studies use data to build personalized models, that is, models trained with the specific person's data, at the cost of a smaller dataset, whereas other researchers prefer to use data from different people to have more information to mine, at the cost of less personalization. This research question is still open; neither option has yet been accepted as better, since the performance depends too much on the particular study performed.

Fisher and Simmons, instead, adopted a Reinforcement Learning approach. In particular, they exploited a variant of the Nearest Neighbour algorithm to predict the subject's preferences for the ringtone settings of the smartphone, setting it to loud when the user was willing to be interrupted, and to snooze mode when he/she was not. During the experimentation, the algorithm achieved an accuracy of around 82%, denoting a good level of interruptibility detection (Fisher and Simmons, 2011).

Dealing with smartphone interruptions, Turner et al. further analyzed the interruptibility problem focusing on the action of responding to a notification. As shown in Figure 2.8, they decomposed the action performed by the user into four sub-actions (Turner, Allen, and Whitaker, 2015b):


FIGURE 2.8: Turner’s model for decomposing notifications

1. React: when a notification is prompted, either through a symbol on the screen, a vibration or a sound, the device captures the user's attention. Thus, the user can either interrupt himself to look at the smartphone or go back to what he was doing.

2. Focus: having chosen to react, the user unlocks the screen and sees the notification on the device. At this point, he/she can decide whether to read the text associated with the notification itself or to close the telephone, ignoring the notification.

3. Read: at this point, the user has to decide whether or not to open the application that generated the notification to read the full content.

4. Act: (if applicable) the user must decide whether to act on the content, for example by replying to the message he received or answering the incoming call.

With a detailed study, the authors showed that prediction algorithms obtained different performances in forecasting the decision taken at every stage of the pipeline, and that, even if they did not reach satisfactory predictions, there is room for further analysis in this direction.

Another important question that researchers have investigated broadly is which kinds of sensors are necessary to predict interruptibility. Hudson et al. were the first to investigate this field. They decided to adopt a Wizard-of-Oz approach, since their only focus was to study the kinds of sensors necessary to implement an interruptibility predictor. They recorded some employees working in their offices, capturing their daily activities through a camera on the desk. Every time they heard an audio signal, participants had to show the camera a number from one to five with one of their hands, describing how interruptible they were at that moment. Then, the researchers divided the recordings into 15-second chunks, and manually encoded every chunk into a 23-feature vector. The features were related to the subject (his presence, the activity he was performing and his interaction with the surrounding environment), to guests in the room (their number and the activity performed by each one of them) and to the surrounding environment (door open/closed, day of the week and hour of the day). For each feature, they extracted multiple variables, as if they were different sensors in the room. More precisely, the fictitious sensors measured:

• if the event occurred in the last 15 seconds, i.e., in the chunk analyzed;

• if the event occurred in at least one chunk of the last minute analyzed;

• if the event occurred in at least one chunk of the last five minutes analyzed;

• if the event occurred in all the chunks of the last minute analyzed;


• if the event occurred in all the chunks of the last five minutes analyzed;

• in how many chunks of the last five minutes analyzed the event happened.

The team focused on the predictive power of every single variable. Interestingly, they discovered that the variables derived from binary sensors that monitored a single chunk or a single-minute interval contain more information than the variables derived from the corresponding sensors working on a five-minute interval. This difference in performance did not hold for the counters over five minutes, which had performance comparable to the one-minute interval sensors. In addition, they noticed that the most significant sensors for the prediction were microphones, to detect conversations, and accelerometers, to detect the activity the subject was performing; thus, a device such as a smartphone was sufficient to collect all the most relevant data (Hudson et al., 2003).

The study just described was the precursor of many studies aimed at understanding how the context affects interruptibility. Some of the most important results have been obtained by Okoshi's research group, which discovered that people are more interruptible at breakpoints, the moments in which we stop an activity to start doing something else. In their study, they exploited the smartphone accelerometer to detect physical-activity breakpoints and prompt an ESM at those moments. The experimentation showed that the compliance rate increased by 8 percentage points (from 50% to 58%), and the response time decreased. If an ESM was prompted when the subject jumped on a vehicle, the response time dropped by 99% with respect to the average response time of a questionnaire prompted at a random time (Obuchi et al., 2016).

In another study, Okoshi et al. tried instead to detect breakpoints by exploiting internal events of the smartphone. Subjects of the experimentation turned out to be 28% less frustrated at being interrupted when the notification was triggered by a change of activity on the device (Okoshi et al., 2016).

Finally, a group of scientists focused on how the emotional state of the subject influences his interruptibility, too. Pejovic, for example, asked the study participants to answer some ESMs about how they judged the activity they were doing before being interrupted. From the data gathered, they figured out that the more demanding the activity being performed, the less the subjects are willing to be interrupted (Pejovic, Musolesi, and Mehrotra, 2015). Mehrotra, on the other hand, focused on how different senders and contents of the notification influence interruptibility. He discovered that the acceptance rate of a notification, that is, interrupting oneself to look at it and reply to it, varies a lot according to whether the notification is generated by a human (e.g., message, email) or by the smartphone itself; moreover, among the notifications generated by humans, the social relationship between the sender and the receiver influences the acceptance rate, too (Mehrotra et al., 2015).


Chapter 3

Formulation Modelling

As stated in the introduction, we aim to understand whether we can increase the EMA compliance rate through the data sensed by the smartphone. More specifically, we imagine building a system in which the smartphone gathers information and consequently decides, among the possible EMA questions, which is the best one to prompt in order to maximize the information obtained from its completion. To achieve this, we want to exploit machine learning techniques to understand from the context how much a respondent is willing to respond, and consequently ask him as much as possible. In this way, we should always be able to find the best trade-off between the respondent's willingness to cooperate and the researcher's desire for information.

To achieve this result, we first need a mathematical formulation of the problem. In fact, we need to formally define our research question. Then, we need to mathematically define the space of the EMAs, providing a metric able to determine whether a particular EMA is "better" than another. Finally, the last part of the chapter will focus on the theoretical description of the mathematical tool used to reach the result.

3.1 Mathematical Framework

We saw that we need to decompose the research question into four different sub-hypotheses to verify (Chap. 1), which are:

1. The design of the EMA has an impact on the user effort required to answer.

2. The design of the EMA has an impact on the compliance rate.

3. Different contexts imply different compliance rates for the same EMA.

4. The most convenient EMA depends not only on the quantity of information it provides, but also on the context in which the respondent is.

To analyze Hypotheses 1 and 2, we need to formalize the concept of compliance rate:

Definition 3.1 (Compliance Rate). The Compliance Rate is the number of EMAs answered divided by the number of EMAs prompted:

$$\rho = \frac{n_{\mathrm{answered}}}{n_{\mathrm{prompted}}} \qquad (3.1)$$

Intuitively, ρ is the percentage of EMAs answered among all those the respondent has been prompted with. To verify Hypotheses 3 and 4, though, we need to go much more in depth, providing a complete mathematical formulation of the problem and of our research question.
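For instance, with purely hypothetical numbers, a respondent who answers 25 of the 40 EMAs prompted to him has a compliance rate of

$$\rho = \frac{25}{40} = 0.625,$$

i.e., 62.5%.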


3.1.1 Problem formulation

We can formulate the problem as follows. Given:

• a set U of users

• N different EMAs

• $i_1, i_2, \ldots, i_N$, the amounts of information obtained by answering questionnaires $1, \ldots, N$, with $i_n \geq 0\ \forall n \in \{1, \ldots, N\}$

• a set T of fixed time instants

• $S_{u,t} = \langle act_{u,t}, loc_{u,t}, \ldots \rangle$, the status of user $u$ at time $t$, that is, a tuple containing all the information collected by the smartphone in an interval of length $t_w$: $[t - t_w, t[$. In this particular domain, the status represents the context in which the user is

• a reward function

$$r_n = \begin{cases} i_n & \text{if the } n\text{th EMA is ``correctly'' answered} \\ 0 & \text{otherwise} \end{cases} \qquad \forall n \in \{1, \ldots, N\} \qquad (3.2)$$

We aim to find an algorithm that works as follows: at every instant $t$, for every user $u$, we prompt an EMA among the possible ones, and we receive a reward $r_{t,u}$. How can we choose the optimal EMA? At every time instant, we want to find the best EMA to prompt such that the global reward is maximized:

$$\max \sum_{t \in T,\, u \in U} r_{t,u} \qquad (3.3)$$

This formulation opens many questions. First, we have introduced the amount of information $i_n$: how can we quantify it? Then, how can we establish whether an EMA has been answered correctly? Which is a suitable tool to solve this problem? All these questions will be answered in the continuation of the dissertation.
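As a minimal illustration of the reward of Eq. 3.2 and of the objective of Eq. 3.3, the Python sketch below is a hypothetical example: the class, field names and numbers are illustrative only and are not part of the system described in this thesis.

from dataclasses import dataclass

@dataclass
class PromptedEMA:
    info: float      # quantity of information i_n carried by the prompted EMA (in bits)
    answered: bool   # whether the respondent "correctly" answered it

def reward(ema):
    """Eq. 3.2: the EMA's information content if it was answered, 0 otherwise."""
    return ema.info if ema.answered else 0.0

# A hypothetical day for one user: three prompts, two of which were answered.
prompts = [PromptedEMA(2.0, True), PromptedEMA(1.0, False), PromptedEMA(2.32, True)]
print(sum(reward(p) for p in prompts))  # this user's contribution to the sum of Eq. 3.3: 4.32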

3.1.2 Quantity of Information

Determining the right metric for measuring the amount of information contained in an EMA is a fundamental design step to solve the problem properly. A strong mathematical formulation of the quantity of information is the key to obtaining a definition that is:

• general, in the sense that it can easily be applied to any kind of EMA designed,

• valid, in the sense that it properly captures the differences among the various EMAs. In other words, if an EMA a can collect more detailed information than an EMA b, then the quantity of information of a must be higher than that of b.

Intuitively, the more possible answers a user has to choose among, the higher the amount of information the researcher gets when a particular answer is submitted. For example, as shown in Fig. 3.1, a question like "How happy do you feel right now, on a scale from 1 to 5?" gives much more insight into the mood of the respondent than "Do you feel happy right now? Yes or no?". Even if the difference seems very subtle, much more analysis can be carried out having the complete series of self-reports on a 5-point scale.


FIGURE 3.1: Different EMAs have different resolutions in their answers

If we think of the succession of EMAs as a temporal series, the resolution of the signal would be much higher, giving researchers the possibility of much more detailed analyses. As a consequence, the metric chosen must score the former question higher than the latter. The solution comes from the Self-Information defined by Shannon (Cover and Thomas, 2012).

Definition 3.2 (Quantity of Information). Given a single-question EMA and a set of $M$ possible answers $A_m$, the Quantity of Information obtained by the user's answer $A_i$ is defined as follows:

$$I(A_i) = \log_2 \frac{1}{p(A_i)} = -\log_2 p(A_i) \qquad \forall i \in M \qquad (3.4)$$

If the EMA contains $Q$ questions, the total quantity of information is obtained by summing the information of the single questions:

$$I(A_{1i} \vee A_{2j} \vee \ldots \vee A_{Qk}) = -\sum_{q=1}^{Q} \log_2 p(A_{qi}) = -\log_2\left(\prod_{q=1}^{Q} p(A_{qi})\right) \qquad (3.5)$$

where $i, j, \ldots, k$ are the answers to the single questions of the EMA.

By introducing this definition we are making some implicit assumptions that are worth considering.

Assumption 3.3. Given an EMA, the probability distribution of its possible answers is known. In particular, we assume that all the answers are equally likely. Hence, given an EMA with $N$ possible answers:

$$p(A_1) = p(A_2) = \ldots = p(A_N) = \frac{1}{N} \qquad (3.6)$$
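As an immediate consequence of this assumption (a simple restatement of Eq. 3.4 under the uniform hypothesis), every answer of an EMA with $N$ possible answers carries the same amount of information:

$$I(A_i) = -\log_2 \frac{1}{N} = \log_2 N \qquad \forall i$$

This is the value reported for the single-question formats in Table 3.1.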

To analyze this assumption, and to understand its implications, we first need to define what an answer is.

Definition 3.4 (Answer). An answer to an EMA is any valid outcome of the EMA.

For example, given a yes/no question, the set of answers is {yes, no}. Given a multiple-choice question in which the user can choose any options he wants among options a, b, and c, the set of answers contains every possible combination of the three options: {∅, a, b, c, ab, ac, bc, abc}. Note that the answers a and ab are treated as different and independent.

Given the definition of answer, is Assumption 3.3 reasonable? Considering the content of the answers, it is not. For example, if an EMA asks to select, from a list of adjectives, the ones that describe the respondent's mood, the user will hardly ever select both the "Happy" and "Sad" options, and will much more likely select only one of the two.


Name          Answers domain                                 I(A_i)

Radio         One of the N possible answers                  $\log_2 N$
Checkbox      Any combination of the N possible answers      $\log_2 2^N = N$
Likert        From 1 to N stars                              $\log_2 N$
Quick Answer  Yes or No                                      $\log_2 2 = 1$
Scale         A value in [a, b]                              $\log_2 (b - a)$
Free Text     Any textual input of length N                  $\log_2 26^N$

TABLE 3.1: Examples of EMAs and their Quantity of Information. Figure 3.2 presents an example for each ESM listed

However, if we consider that the aim of the study is centered around the action of responding, and not around the content of the response, then the assumption is reasonable. Besides, if we assigned different probabilities to different answers, the algorithm would be biased: its aim would no longer be to prompt an EMA such that the respondent submits a response, but to prompt an EMA such that the respondent selects an unlikely response, because that response carries a higher reward.

Assumption 3.3 brings two important contributions. First, we are now able to quantify the Quantity of Information of every answer of an EMA question. Second, since the Quantity of Information is the same for every possible answer of a question, we can define the Quantity of Information of a question as the quantity of information of any of its answers.

With these assumptions, we are now able to define the Quantity of Information of any single-question EMA. Table 3.1 and Figure 3.2 show some examples of single-question EMAs and the corresponding Quantity of Information I(A_i). However, to deal with multi-question EMAs, another assumption is necessary:

Assumption 3.5. If multiple questions appear on the same EMA, then their answers are independent variables.

This assumption is even stronger. The independence does not actually hold: since all the questions are on the same topic, their answers are somehow related. For example, if a user responds "yes" to the question "Do you feel happy?", then he would probably give a low score to the question "How frustrated do you feel at this moment, on a scale from 1 to 5?", if it appeared on the same questionnaire. Again, the aim of the study makes this assumption acceptable: since we are interested in the response, and not in its content, we can treat the event [User answers question no. 1] as independent from the event [User answers question no. 2], even if their contents are strictly related.
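The following is a minimal Python sketch of how the Quantity of Information of Table 3.1 and Eq. 3.5 can be computed under Assumptions 3.3 and 3.5; the function names and the example EMAs are illustrative assumptions, not taken from the study software.

import math

def info_single_question(n_answers):
    """Eq. 3.4 under Assumption 3.3: each of the N answers has probability 1/N,
    so I(A_i) = -log2(1/N) = log2(N) bits."""
    return math.log2(n_answers)

def info_ema(answers_per_question):
    """Eq. 3.5 under Assumption 3.5: the questions of an EMA are treated as
    independent, so their information contents simply add up."""
    return sum(info_single_question(n) for n in answers_per_question)

# Examples mirroring Table 3.1 (purely illustrative):
print(info_single_question(2))        # Quick Answer (Yes/No): 1 bit
print(info_single_question(4))        # Radio button with 4 options: 2 bits
print(info_single_question(2 ** 6))   # Checkbox over 6 options: 2^6 combinations, 6 bits
print(info_ema([5, 5, 5, 5]))         # Four Likert questions with 5 stars each: ~9.29 bits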

In this section we first formulated the problem mathematically, then we defined the quantity of information, a metric that plays a fundamental role in the definition of the reward function of the problem. We now have a complete overview of the problem. Therefore, we can now look at the mathematical tool we will use for its resolution.


FIGURE 3.2: Example of questions cited in Table 3.1. (a) Radio, (b) Quick Answer, (c) Checkbox, (d) Likert, (e) Free Text and (f) Scale.

3.2 Multi-Armed Bandit

The Multi-Armed Bandit (MAB) problem is a subfamily of Reinforcement Learning problems. Intuitively, at each time step the algorithm is asked to choose one of the available options, called arms, and a payoff is returned accordingly. The goal of the algorithm is to maximize the global payoff over time. The name Multi-Armed Bandit comes from the most illustrative example of the problem: a gambler is in a casino and has many slot machines in front of him. He has a limited amount of resources (i.e., limited money) and he wants to choose the sequence of slot machines to play to maximize the payoff (i.e., he wants to win as much as possible). Even in this simple example the solution is not trivial. In fact, the man needs to decide which slots are the most profitable. However, how can he be sure that the chosen machine is the most promising one? What if he decides to invest everything he has in a slot machine that is not the best one? In other words, how can he be sure to pick the optimal arm and not a sub-optimal one? Fortunately, many algorithms provide an answer to all these questions. First, we will provide a formal definition of the MAB setting, along with some practical examples of application. Then, the dissertation will continue with the solution of the problem. Finally, the specific sub-class of MABs used for the resolution of our problem will be presented. The following sections take inspiration from (Bubeck and Cesa-Bianchi, 2012) and the Machine Learning course material of Prof. Restelli and Dott. Trovò.

3.2.1 Multi-Armed Bandit Setting

Definition 3.6 (Multi-Armed Bandit Setting). A Multi-Armed Bandit setting is a tuple $\langle A, R \rangle$ where:

• A is a set of N possible arms, and

• R is a set of reward functions, i.e., a set of (unknown) probability distributions

At every time step t = 1, 2, . . . :

Page 46: Exploiting Sensor Data to Increase Compliance with ... · Exploiting Sensor Data to Increase Compliance with Ecological Momentary Assessments by Pietro CROVARI Monitoring Mental health

28 Chapter 3. Formulation Modelling

1. the agent chooses an arm $a_{i_t}$ to pull;

2. the environment returns an observable reward $r_{i_t,t}$;

3. the agent updates its knowledge with $r_{i_t,t}$.

The goal is to maximize the cumulative reward:

$$\max \sum_{t} r_{i_t,t} \qquad (3.7)$$

We can notice several interesting aspects in the definition. First, the reward function of every arm is unknown to the agent. If these were known, the problem would already be solved, since pulling the arm with the highest expected reward would be sufficient to maximize the reward. Second, the definition does not impose any limit on the time horizon: it can be either finite or infinite. In the case of a finite time horizon, it can be known or unknown. In other words, the agent can know when the game is going to end, or can know that it will end without knowing when this will happen. Third, there are no constraints on the number of arms. In most cases $|A| \ll T$, with $T$ the total number of time steps, but some particular versions of the algorithm allow the resolution for a number of arms greater than the number of time steps; this subclass is called Infinitely Many-Armed Bandit problems. Any variation of the algorithm setting (time horizon and/or number of arms) profoundly impacts the performance of the solution. Last, even if Definition 3.6 is the most adopted, it is not universal. In fact, it assumes that the reward distributions are stochastic. In some cases, though, the agent is playing against another agent; in this case, we call the problem an Adversarial MAB. This subset of Multi-Armed Bandit problems will not be treated, since it is not relevant to our application.

Multi-Armed Bandits find many real-world applications. One of the many examples is web page advertisement. The owner of the web page wants to choose the best advertisement banner to display on each page. The possible arms are the various banners; the reward function is the so-called Click-Through Rate (CTR), that is, the percentage of people who click on the advertisement when it is shown (Lu, Pál, and Pál, 2010). The agent must try the different banners to identify the most profitable ones. Another relevant example is the choice of the best treatment in clinical trials. In this case, A contains all the possible treatments, whereas the reward function measures the effectiveness of the treatment on the specific patient (Villar, Bowden, and Wason, 2015).

Now we understand what a MAB is, but we need to define what solving a MAB means. We solve a MAB when we establish a policy to adopt:

Definition 3.7 (Policy). A policy is a sequence of arms to pull.

How can we distinguish a good policy from a bad one? We need a performance measure to be able to compare them, and possibly choose the best one. For this reason, we introduce the notion of regret.

Definition 3.8 (Regret). In general, given $N \geq 2$ arms and the unknown rewards $r_{i,t}$ associated with each arm $a_i$, $i = 1, \ldots, N$, at every time step $t = 1, 2, \ldots$, the regret after $T$ time steps is defined as:

$$L_T = \max_{i \in A} \sum_{t=1}^{T} r_{i,t} - \sum_{t=1}^{T} r_{i_t,t} \qquad (3.8)$$


From an intuitive point of view, $L_T$ is a measure of how much the agent is losing by playing at every time step a sub-optimal arm $a_{i_t}$ instead of the optimal one. This performance measure is coherent with the cumulative reward function defined in Eq. 3.7: minimizing the regret means maximizing the global reward.

This definition of regret, though, is valid only in a deterministic setting, since to apply it we must know all the payoffs in order to choose the highest one. Unfortunately, that is not the usual setting. Generally, both the rewards and the agent's decisions can be stochastic, and we know only the reward of the arm we pulled. For this reason, we must introduce the Expected Regret:

Definition 3.9 (Expected Regret). The Expected Regret is:

$$\mathbb{E}[L_T] = \mathbb{E}\left[\max_{i \in A} \sum_{t=1}^{T} r_{i,t} - \sum_{t=1}^{T} r_{i_t,t}\right] \qquad (3.9)$$

We can further generalize Eq. 3.9 by comparing the expected outcome not with the actual optimal action, but with the expected one. This generalization is called the pseudo-regret.

Definition 3.10 (Pseudo-regret). The Pseudo-regret is:

$$\bar{L}_T = \max_{i \in A} \mathbb{E}\left[\sum_{t=1}^{T} r_{i,t} - \sum_{t=1}^{T} r_{i_t,t}\right] \qquad (3.10)$$

This generalization has a cost: the more general definition makes the pseudo-regret a weaker estimator than the expected regret. More formally:

$$\bar{L}_T \leq \mathbb{E}[L_T] \qquad (3.11)$$

As a consequence, by using the pseudo-regret we obtain a weaker estimate of the real regret, underestimating the real loss. Before understanding how to solve Multi-Armed Bandits, we need to take one last step forward. We must define the optimal arm and the optimal reward:

Definition 3.11 (Optimal Arm and Optimal Reward). The optimal arm $a^*$ is the one whose expected reward is the Optimal Reward $r^*$, that is, the highest reward among those of all arms in A:

$$r^* = \max_{i \in A} r_i \qquad \text{and} \qquad a^* \in \operatorname*{argmax}_{i \in A} r_i \qquad (3.12)$$

Last, we define the difference between the reward obtained from a generic arm $a_i$ and that of the optimal arm $a^*$ as:

$$\Delta_i := r^* - r_i \qquad (3.13)$$
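As a purely illustrative example with hypothetical figures: consider $N = 2$ arms with expected rewards $r_1 = 0.8$ and $r_2 = 0.5$, so that $a^* = a_1$, $r^* = 0.8$ and $\Delta_2 = 0.3$. If over $T = 20$ time steps the agent pulls the sub-optimal arm $a_2$ ten times, its pseudo-regret is

$$\bar{L}_{20} = 10 \cdot \Delta_2 = 10 \cdot 0.3 = 3.$$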

3.2.2 Solving MABs: Upper Confidence Bound - UCB1

The Literature presents many algorithms to solve MABs (Kuleshov and Precup, 2014). Even if they are very different in their premises, in the assumptions they make about the distributions of the rewards, in the time horizon they consider, and in many other aspects, they all have to deal with a crucial point: the exploration-exploitation dilemma.

In fact, every time an agent wants to pull an arm, it faces a major question: it has to decide whether to explore the problem setting to identify with more precision


which are the best arms, or to exploit the knowledge acquired in the previous steps of the execution to pull the arm that seems to be the optimal one. Too much exploration means pulling non-optimal arms too many times, and therefore a significantly increased loss of cumulative reward, whereas too much exploitation does not give the agent the possibility to understand whether it is actually following a good policy or whether better ones exist. An agent must be able to understand when it has explored enough and switch to exploiting the knowledge gathered. This problem is known as the exploration-exploitation tradeoff.

One of the simplest, and probably most used, heuristics to address this dilemma is optimism in the face of uncertainty. The basic idea is as simple as it is effective: when an agent must take a decision in an uncertain setting, it first makes some estimates of the various possibilities, and then it chooses the most promising one. The difference with a pure exploitation algorithm is subtle but crucial: in the exploitation case the agent decides based only on what it knows. In this case, instead, it tries to make some hypotheses about the unknown, based on its knowledge. The agent does not pull the arm that currently looks best on average, but the arm it believes could turn out to be the most promising one.

How does this heuristic face the exploration-exploitation tradeoff? The optimism in the face of uncertainty heuristic performs exploration and exploitation at the same time. In fact, by pulling the most promising arm, the agent exploits the previous knowledge to maximize the cumulative reward. At the same time, as soon as another arm seems to be more promising, the agent immediately switches to pulling the new one, exploring the other possibilities. As said, many algorithms follow this heuristic. However, they differ in the way they construct the estimates of the rewards of the arms and, consequently, in the arm that is pulled.

Upper Confidence Bound (UCB 1) is probably one of the most famous and widely adopted algorithms that follow this principle. UCB 1 builds its estimates of the arms' payoffs by computing an upper bound on the rewards $r_i$. In other words, it tries to identify a value $U(a_i)$, called the upper bound, such that $r_i \leq U(a_i)$ with high probability. At every time instant, the algorithm thus pulls the "most promising" arm, that is, the arm with the highest value of $U(a_i)$. How is $U(a_i)$ computed? Mathematically, we can see the upper bound as the sum of two terms:

$$U(a_i) = \hat{r}_{i,t} + B_t(a_i) \qquad (3.14)$$

where $\hat{r}_{i,t}$ is the empirical mean, computed on the basis of the previous pulls, and $B_t(a_i)$ is the term that encloses the uncertainty. If we define $N_t(a_i)$ as the number of times that arm $a_i$ has been pulled up to time instant $t$, we can intuitively say that the bound $B_t(a_i)$, and consequently $U(a_i)$, strictly depends on $N_t(a_i)$. In fact, the smaller $N_t(a_i)$, the less information we have gathered about arm $a_i$, and consequently the higher $B_t(a_i)$ must be to allow us to state that, with high probability, $r_i \leq U(a_i)$. In the same way, the higher $N_t(a_i)$, the lower $B_t(a_i)$. The variability of $B_t(a_i)$ reflects the fact that the higher $N_t(a_i)$, the more accurate an estimate of $r_i$ the empirical mean $\hat{r}_{i,t}$ will be.

Having intuitively understood how UCB 1 works, we can now mathematically formulate the algorithm. Having introduced $N_t(a_i)$, we can formally define $\hat{r}_{i,t}$ as:

$$\hat{r}_{i,t} = \frac{1}{N_t(a_i)} \sum_{j=1}^{t} r_{i,j}\, \mathbb{1}\{a_i = a_{i_j}\} \qquad (3.15)$$


We must now understand how to compute $B_t(a_i)$. To do that, we apply the Hoeffding Inequality (Hoeffding, 1963):

$$p\left(r_i > \hat{r}_{i,t} + B_t(a_i)\right) \leq e^{-2 N_t(a_i) B_t(a_i)^2} \qquad (3.16)$$

Setting the right-hand side equal to a probability $p$, from 3.16 we can compute the bound $B_t(a_i)$:

$$B_t(a_i) = \sqrt{\frac{-\log p}{2 N_t(a_i)}} \qquad (3.17)$$

Finally, since we want to converge to the real expected value $r_i$, we choose a value for $p$ which decreases with time, such as $p = t^{-4}$. Substituting it into Eq. 3.17, $-\log p = 4 \log t$, which yields:

$$B_t(a_i) = \sqrt{\frac{2 \log t}{N_t(a_i)}} \qquad (3.18)$$

We are now able to fully describe the UCB 1 algorithm. The algorithm is divided into two parts: first, an initialization process is necessary to obtain an initial estimate $\hat{r}_i$ of the reward of every arm $i$; then, the algorithm continues its execution pulling the most promising arm at each time step. UCB 1 is illustrated in Algorithm 1:

Algorithm 1: UCB 1
Data: N arms, number of rounds T > N
begin
    // Initialization: pull each arm once to obtain a first estimate of its reward
    while t ≤ N do
        pull arm a_t
        t ← t + 1
    end
    // Main loop: at each round pull the arm with the highest upper bound
    while t ≤ T do
        for i = 1, . . . , N do
            r̂_{i,t} = (1 / N_t(a_i)) · Σ_{j=1}^{t} r_{i,j} · 1{a_i = a_{i_j}}
            B_t(a_i) = sqrt( 2 log t / N_t(a_i) )
            U(a_i) = r̂_{i,t} + B_t(a_i)
        end
        pull arm a_{i_t} = argmax_{i ∈ A} U(a_i)
        t ← t + 1
    end
end
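To make the procedure concrete, the following is a minimal, self-contained Python sketch of Algorithm 1. The Bernoulli arms, the answer probabilities and the function names are illustrative assumptions for this example only; they are not part of the study software, where the reward is the Quantity of Information of an answered EMA.

import math
import random

def ucb1(pull, n_arms, horizon):
    """UCB 1 (Algorithm 1): pull every arm once, then always pull the arm with
    the highest upper bound U(a_i) = r_hat_i + sqrt(2 * log(t) / N_t(a_i))."""
    counts = [0] * n_arms     # N_t(a_i): number of pulls of each arm so far
    sums = [0.0] * n_arms     # cumulative reward collected by each arm
    choices = []

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1       # initialization phase: each arm is pulled once
        else:
            upper = [sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
                     for i in range(n_arms)]
            arm = max(range(n_arms), key=lambda i: upper[i])
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        choices.append(arm)
    return choices

# Hypothetical setting: three EMA types whose probability of being answered differs.
answer_prob = [0.2, 0.5, 0.8]
pull_arm = lambda arm: 1.0 if random.random() < answer_prob[arm] else 0.0

history = ucb1(pull_arm, n_arms=3, horizon=1000)
print(history.count(2) / len(history))  # fraction of rounds in which the best arm was chosen

Running the sketch typically shows the algorithm concentrating its pulls on the arm with the highest expected payoff after an initial exploratory phase, which is exactly the behavior the regret bound of Eq. 3.19 quantifies.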

UCB 1 is quite efficient. In fact, it can be shown that the Expected Cumulative Regret at time instant $T$ is bounded as

$$L_T \leq 8 \log T \sum_{i:\, \Delta_i > 0} \frac{1}{\Delta_i} + \left(1 + \frac{\pi^2}{3}\right) \sum_{i:\, \Delta_i > 0} \Delta_i \qquad (3.19)$$

with $\Delta_i$ defined in Eq. 3.13 (Auer, Cesa-Bianchi, and Fischer, 2002). Equation 3.19 shows that the cumulative regret is logarithmic in the number of time steps.


3.3 Contextual Multi-Armed Bandit

So far, we have introduced the concept of the Multi-Armed Bandit setting and UCB 1, a quite efficient algorithm to find a suitable policy. Unfortunately, there is still a big limitation. In fact, the definition of MAB (Sec. 3.2) considers a single-state problem. This is not the case in our dissertation, since we aim to prompt an EMA taking into consideration the data sensed through the user's smartphone. These data define a context, and thus a state in which the user is when the algorithm must choose the most appropriate EMA to prompt. As a consequence, adopting UCB 1 as described above to select the EMA would mean totally ignoring the context, and thus not exploiting all the information collected by the smartphone about it.

Fortunately, MAB problems have been generalized to Contextual MABs, a class of problems where at each time step the agent is in a context. Before taking its decision, the agent observes the context, or a part of it, and exploits this knowledge to decide which arm to pull. Usually, the policy in a Contextual MAB is a set of sub-policies, each one mapping a specific context to an arm. In the Literature, we find many algorithms to find the best policies; they differ mainly in the knowledge about the time horizon they require and in the possibility of dealing with a finite or an infinite number of policies. The price for the introduction of the context is a degradation of the performance, in particular a higher cumulative regret. Different algorithms have different regret bounds, but generally the loss is in the order of the square root of the time horizon, $O(\sqrt{T})$ (Zhou, 2015).

Among all the existing algorithms, we decided to adopt the Query-Ad-Clustering algorithm (Lu, Pál, and Pál, 2010), given that its properties match our problem very well, especially its on-line nature.

3.3.1 Query-Ad-Clustering Algorithm

As the name states, Query-Ad-Clustering is an algorithm that was born to choose the best ad to display on a web page. The authors wanted to exploit the information about the visitor of the page to tune the choice of the ad, and thereby increase the Click-Through Rate. Query-Ad-Clustering is composed of two steps. First, the entry points are clustered into subsets. Then, every cluster is treated as a traditional Multi-Armed Bandit problem, and therefore a resolution algorithm similar to UCB 1 is applied independently to each cluster. What makes this algorithm particular is that the number of clusters and of arms considered is not fixed, but is dynamically increased as the number of collected data points grows. In this way, in the first stages of the execution, when the number of points is low, we avoid having several clusters with very few points inside and many possible arms, a setting that would cause very poor performance of the MAB algorithm. When the number of data points becomes considerable, they are split into more clusters to better exploit the context knowledge, and more arms are considered. To make Query-Ad-Clustering work properly, we need to satisfy two hypotheses:

1. The space of data points is endowed with a metric, and the reward function satisfies the Lipschitz condition with respect to each coordinate. Intuitively, the reward function must be a "smooth" function, since its derivative must be bounded.

2. The sequence of points is generated by an oblivious adversary and revealed one point per time instant. Informally, all the data points are generated in advance by


the adversary, thus without knowing the choices made by the algorithm. The data points, though, will be disclosed only one at a time.

Before proceeding with the formulation of the algorithm, though, we need to introduce the notions of metric space, covering number and covering dimension, quantities necessary for the execution of the algorithm.

Definition 3.12 (Metric Space). A Metric Space is a pair $(X, d)$ where $X$ is a set and $d$ is a function $d : X \times X \to \mathbb{R}$ which satisfies the following conditions, $\forall x, y, z \in X$ (Choudhary, 1993):

1. d(x, y) ≥ 0

2. d(x, y) = 0 ↔ x = y

3. d(x, y) = d(y, x)

4. d(x, z) ≤ d(x, y) + d(y, z)

Definition 3.13 (Covering Number). Given a metric space $\langle X, d \rangle$, the Covering Number $\mathcal{N}(X, d, r)$ is the smallest number of sets needed to cover $X$ such that in each set of the covering any two points have distance less than $r$.

Definition 3.14 (Covering Dimension). Given a Metric Space $\langle X, d \rangle$ and its Covering Number $\mathcal{N}(X, d, r)$, the Covering Dimension of $\langle X, d \rangle$ is:

$$\mathrm{COV}(X, d) = \inf\left\{\delta : \exists c > 0\ \ \forall r \in (0, 1]\ \ \mathcal{N}(X, d, r) \leq c\, r^{-\delta}\right\} \qquad (3.20)$$

We now have all the notions necessary to describe the algorithm. Given $\langle X, d_x \rangle$ and $\langle Y, d_y \rangle$, respectively the metric space of the data points and that of the EMAs (the arms), let $a, b$ be their covering dimensions. Define $a', b'$ such that $a' > a$ and $b' > b$. Last, we define $c, d$ such that the covering numbers of $X, Y$ are bounded by $\mathcal{N}(X, d_x, r) < c\, r^{-a'}$ and $\mathcal{N}(Y, d_y, r) < d\, r^{-b'}$.

The algorithm works in epochs $i = 0, 1, 2, 3, \ldots$, each one made of $2^i$ time steps. At the beginning of each epoch, the Query-Ad-Clustering algorithm creates a partition $\{X_1, \ldots, X_N\}$ of the data point space $X$, where $N$, the number of partitions (clusters), is computed as follows:

$$N = c \cdot 2^{\frac{a' i}{a' + b' + 2}} \qquad (3.21)$$

At the same time, the algorithm selects a subset of the arms $Y_0 \subseteq Y$ such that $|Y_0| = K$ and each $y \in Y$ is within distance $r$ of a point in $Y_0$, where

$$K = d \cdot 2^{\frac{b' i}{a' + b' + 2}} \qquad \text{and} \qquad r = 2^{\frac{-i}{a' + b' + 2}} \qquad (3.22)$$

Then, at every time step $t$ of that epoch, when a new data point $x_t$ arrives, the algorithm decides to which cluster $X_j$ it belongs. The arm to pull is chosen in $Y_0$ by training, with the points belonging to $X_j$, a slight variation of UCB 1, where the upper bound considered is the Upper Confidence Index:

$$I_t(y_i) = \hat{r}_{i,t} + C_t(y_i) \qquad (3.23)$$

where

$$C_t(y_i) = \sqrt{\frac{4 i}{1 + N_t(y_i)}} \qquad (3.24)$$

This algorithm has regret $O\!\left(T^{\frac{a+b+1}{a+b+2} + \varepsilon}\right)$, for any $\varepsilon > 0$.
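As a purely numerical illustration of Eqs. 3.21-3.24, the Python sketch below computes the quantities that drive the algorithm; all constants (such as a' = b' = c = d = 1) are hypothetical placeholders and not the values used in the study.

import math

def epoch_parameters(i, a_p, b_p, c, d):
    """Eqs. 3.21 and 3.22 for epoch i (a_p and b_p stand for a' and b'):
    number of clusters N, number of candidate arms K, covering radius r."""
    denom = a_p + b_p + 2
    n_clusters = c * 2 ** (a_p * i / denom)
    n_arms = d * 2 ** (b_p * i / denom)
    radius = 2 ** (-i / denom)
    return n_clusters, n_arms, radius

def upper_confidence_index(mean_reward, pulls, i):
    """Eqs. 3.23 and 3.24: I_t(y) = r_hat + C_t(y), with C_t(y) = sqrt(4i / (1 + N_t(y))),
    computed independently inside each cluster of the context space."""
    return mean_reward + math.sqrt(4 * i / (1 + pulls))

# Epochs double in length (2^i steps); clusters and candidate arms grow with them.
for i in range(0, 9, 2):
    print(i, epoch_parameters(i, a_p=1.0, b_p=1.0, c=1.0, d=1.0))

# Index of an arm pulled 10 times with empirical mean reward 0.6, in epoch 6.
print(upper_confidence_index(0.6, 10, 6))

The sketch shows how the partition of the context space and the candidate arm set are progressively refined as the epochs, and hence the amount of collected data, grow.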


Chapter 4

Study Design

In the previous chapters we have first reviewed the current State of the Art in the Literature (Chap. 2), and then analyzed our problem from a theoretical perspective (Chap. 3). We now have all the instruments necessary to go into the depth of the problem and its resolution. This chapter opens with the description of the experimentation procedure (Sec. 4.1). Then we will move on to the technological infrastructure underlying the experimentation and the related design choices, first in general (Sec. 4.2) and then in detail for every component of the system (Sec. 4.3, 4.4 and 4.5).

4.1 Experimentation Procedure

The experimentation is the most important part of the entire project. Thanks to a detailed experimentation we are able to give a suitable response to our research question, which, as we said, is "Is it possible to increase the EMA compliance rate by exploiting data about the user's context?". Thus, a detailed design of the experimentation procedure is necessary to eliminate any potential confounding variable. In other words, we must pay extreme attention to eliminating any possible factor that could influence the outcome of the experimentation. On top of that, the procedure must satisfy several constraints to make the study feasible and scientifically valid. First, since machine learning plays a fundamental role in the solution of the problem, we need as much data as possible. Second, the whole duration of the study must stay within the period spent at Georgia Tech. Third, since humans are involved in the experimentation, the study must be approved by the Institutional Review Board (IRB) of the Georgia Institute of Technology.

The IRB is a panel in charge of guaranteeing that studies involving people respect the rights and the welfare of human subjects. The IRB requires researchers to submit a protocol, which must be approved before the study can effectively start. The IRB evaluation takes many aspects into consideration, the most important being:

• The researchers must complete the courses "Social/Behavioral Research Investigators and Key Personnel" and "Responsible Conduct of Research" and submit the certificates (available in Appendix A);

• Before the experimentation starts, subjects must be aware of what is going to happen and must sign a consent form to demonstrate it;

• The risk of harm for the people involved in the experimentation is minimized;

• Data about the people involved in the experimentation are treated so as to minimize the risk of leaks.


IRB approval is not a straightforward procedure. It requires many documents, to be drafted very carefully, and often many iterations of submission and correction of aspects not judged suitable. As a consequence, it is an extremely time-demanding process, and, unfortunately, many aspects of the experimentation had to be modified with respect to the original idea to get the study approved. These aspects will be highlighted while the procedure is explained in detail.

Participants

We will recruit a maximum of 20 participants among the students enrolled at Georgia Tech. The low maximum number is necessary to guarantee an expedited review of the protocol, shortening the approval process by one month. Participants will be requested to install a mobile application in charge of collecting data from the smartphone and its sensors, and to periodically interact with it by answering some EMAs. Participants will be volunteers, therefore no compensation will be given for their participation in the study. Volunteers must meet all the following criteria:

• They must be students at Georgia Tech

• Participants must have an Android smartphone. The Android version must be at least 4.4. The device must be able to run the application for the whole study.

• Participants must have access to an internet connection from their smartphone, either mobile data or WiFi, for most of the time.

Participants will not be considered eligible if they meet at least one of the following criteria:

• The individual's device belongs to the OnePlus brand, because the operating system on these phones is not compatible with the application.

• The subject is located in a European Union country, because of the General Data Protection Regulation (GDPR).

Recruitment

The recruitment will be carried out using flyers (Fig. 4.1), web announcements such as emails, and posts on social media channels (Slack, Facebook and Reddit). Volunteers will be asked to fill in a form providing the following information, so that they can be contacted back:

• Name

• Surname

• Georgia Tech id

• Email address

• How they knew about the study

• Free comments

The email we will send to recruit will contain the following text:

If you are a Georgia Tech student, you are requested to participate in our study about

Ecological Momentary Assessment (EMA), a tool widely used in psychological studies. It


consists of some very short questionnaires prompted several times per day. It allows researchers to precisely reconstruct the variables they want to observe, without introducing bias due to the recalling process. We want to try to minimize the burden of answering these EMAs, exploiting data collected from smartphones. The study we are asking you to participate in will last 2 weeks. During this period, we will collect anonymous data from your smartphone while we will ask you to answer some EMAs. At the beginning and at the end of the data collection we will meet in one-on-one sessions to help you to set up your smartphone and receive your feedback on the experience you had. We will do our best to minimize your effort. You will not be compensated for your time.

IF YOU ARE LOCATED IN A EUROPEAN UNION (EU) COUNTRY, YOU ARE NOT PERMITTED TO PARTICIPATE IN THIS STUDY DUE TO THE GENERAL DATA PROTECTION REGULATION (GDPR).

If you want to participate, visit: [recruitment questionnaire link]

Study Protocol

As shown in Fig. 4.2, the procedure will consist of three main steps:

1. One-on-one meeting

2. Data Collection

3. Wrap-up Meeting

One-on-one Meeting

Participants will be met individually for a one-on-one meeting, divided into 3 phases. Firstly, any potential doubt the subjects have will be clarified, and a signed consent form will be collected from the participants [Appendix B]. On the consent form, a dedicated section will contain the email of the participant and a unique code assigned to the participant, which from now on we will call the participant code. Then, demographic information will be collected through a survey on the Qualtrics platform. In this survey, the participant code will be the only personally identifiable datum collected. In the last part of the meeting, the AWARE application will be installed on the volunteers' smartphones, and participants will be trained to use AWARE correctly. At the end of the meeting, on a separate piece of paper, we will write the correspondence between the AWARE device id and the participant code.

The procedure of using the participant code as a link among the various data may seem unnecessarily complex, but it is a required step to protect the sensitive data. In fact, a potential attacker would need access to both the consent forms and the participant code - device id association table to be able to reconstruct the whole information.

Data Collection

Data collection constitutes the most important part of the experimentation. It will last around two weeks, from the One-on-one Meeting until the Wrap-up Meeting. In this period, volunteers will be asked to conduct their normal life; when a notification is prompted, they have to implicitly decide whether to answer or ignore it. During the initial meeting, volunteers will be explicitly asked to look at what type of EMA they are requested to answer before deciding whether or not to reply. Eight to fifteen times a day, the AWARE application will randomly draw an EMA type among all the EMAs that can be prompted on the device.


FIGURE 4.1: Leaflet used for the recruitment


FIGURE 4.2: Timeline of the study

The full list of the possible EMAs is described in Section 4.1.1.

In the meantime, AWARE will record the following data:

• Applications in use

• Battery Status

• Incoming/outgoing telephone call in progress (NOTE: only whether a phone call is happening, NOT the receiver nor the content of the conversation)

• Ambient Light

• Geolocation

• Display Status (turned on/off, screen locked)

• Ambient Noise and Engagement in a Conversation (the raw audio recording is NOT stored; instead, only short periodic samples of sound are collected, which cannot be reconstructed/interpreted into words or actual speech. We will NOT know the content of the conversation. We will only be able to determine whether conversations took place, along with features like pitch, volume, . . . )

• Device usage (High/Average/Low)

• Weather condition

• Activities performed (e.g., Walking, Sitting, . . . )

• Network Status

Periodically, volunteers will be reminded, through a text message or an email, to connect to the Georgia Tech network through a VPN to allow AWARE to synchronize data with the server.

Wrap-up Meeting

At the end of the 2-week period, during an individual wrap-up meeting, the data present on the device will be synchronized with the server. Then, each volunteer will have the possibility to share their considerations about their experience with the application. A semi-structured interview will focus on the differences the users perceived among the different typologies of EMAs prompted and on the burden perceived while answering the questionnaires. The main questions of the interview will be:


1. How was your experience with the application?

2. Were the EMAs annoying?

3. Please, for every kind of EMA, tell me how annoying you found responding to it.

4. Which are the main elements that made you decide whether or not to answer a question?

5. Do you feel you answered certain types of EMA more than others?

6. Do you think the kind of EMA influenced your decision to fill it in?

7. Do you think there were some moments in which you were more willing to respond to more "burdensome" EMAs?

8. Do you believe that if I had tuned the EMA type to what you were doing at the moment it was prompted, you would have answered more?

9. Do you have any other questions or feedback?

4.1.1 EMAs

Users will receive 14 EMAs every day, one per hour from 9 AM to 10 PM. Every EMA is composed of a single question and will investigate the respondent's mental health status. To formulate the EMA questions, we take inspiration from the PANAS questionnaire (Sec. 2.1.3). In particular, we formulate different kinds of questions based on the most relevant adjectives for measuring the Positive and Negative Affect schedules according to Table 2.1, which are Enthusiastic, Determined, Interested, Excited and Inspired for the Positive Affect, and Scared, Afraid, Upset, Distressed, Jittery and Nervous for the Negative Affect. The questions will be designed exploiting different possibilities that the Qualtrics platform provides, as shown in Figures 4.3 and 4.4:

• Quick Answer: The respondent must answer Yes or No: Do you feel Enthusiastic?

• Radio Button: The respondent is required to select one answer among the possible choices. The question is chosen between two possibilities, one to investigate the Positive Affect and one to investigate the Negative one:

– Positive: Which is the adjective that describes better your mood at this moment? Attentive-Active-Enthusiastic-None of the Above

– Negative: Which is the adjective that describes better your mood at this moment? Irritable-Nervous-Scared-None of the Above

• Checkbox: The respondent must select all the adjectives he/she feels represented by: Select all the adjectives that describe you in this moment: Irritable-Enthusiastic-Scared-Nervous-Inspired-Excited

• Likert - Smile: The respondent must choose, through a slider, the smiley face that best represents his mood at that moment

• Likert - Multiple Adjectives: The user must give a score from 1 to 5 stars to four different adjectives prompted, according to how much he feels they describe his/her mood: Please select for each adjective how much it represents you at this moment: Irritable-Enthusiastic-Determined-Nervous


• Hot Spot: A picture with 15 different smiley faces is prompted. The respondent must select all the smiles he/she feels represented by.

• Rank: The respondent must rank a set of adjectives according to his/her current mood: Please, rank the adjectives according to how much they represent you at this moment (the most representative ones on top): Irritable-Enthusiastic-Determined-Nervous-Inspired-Jittery

4.2 Instrument Overview

Figure 4.5 presents a high-level view of the architecture of the system. As we can see, the architecture is divided into three main components. In the leftmost part of the picture, the smartphones are the interface to the users. In particular, a customized version of the AWARE framework is in charge of collecting data from the users and displaying EMAs at fixed times. AWARE periodically synchronizes the data with its back-end application, hosted at the Georgia Tech Research Network Operations Center (RNOC). To allow the data transfer, users must turn on a Virtual Private Network (VPN) on their smartphones. This operation is required by Georgia Tech's policy on data storage and protection. Users were reminded periodically, through a text message or an email, to turn on the VPN to make the transfer possible. Since the questionnaires were generated through the Qualtrics platform, the platform itself was responsible for the storage of the information related to the ESMs. Finally, data coming from both the RNOC servers and the Qualtrics platform are joined, and the simulations are conducted locally on our machines. In the next sections, we will analyze each component of the architecture in more detail, looking at how they are implemented and how they interact with each other.

4.3 AWARE Framework

As explained in Sec. 4.2, the first endpoint in the system architecture is the AWARE framework (Ferreira, Kostakos, and Dey, 2015). AWARE is an open-source mobile framework developed to capture the context on mobile devices. As Figure 4.6 shows, the AWARE framework contains two main modules: a mobile application for capturing the context, and a dashboard for managing the studies and storing data, which will be analyzed in Section 4.4. The AWARE client collects data from sensors and stores them locally on the smartphone. Sensors divide into three main categories, according to their functionality: hardware sensors, which listen to the actual hardware of the telephone; software sensors, which monitor the software running on the device; and human-based sensors, which gather data from a direct interaction with users, in particular through ESMs. More specifically, the AWARE sensors are:

• Accelerometer

• Applications

• Barometer

• Battery Status

• Bluetooth Status

• Communication events (calls and messages)

• ESMs

• Gravity

• Gyroscope

• Installations on the device


FIGURE 4.3: EMAs designed for the study. In order top to bottom, left to right: Quick Answer, Radio Buttons Positive, Radio Buttons Negative, and Checkbox


FIGURE 4.4: EMAs designed for the study. In order top to bottom, left to right: Likert Smile, Likert Multiple Adjectives, Hot Spot, and Rank


FIGURE 4.5: High-level view of the architecture of the instrument used in the experimentation

FIGURE 4.6: AWARE Framework


• Light

• Geolocation

• Magnetometer

• Network status

• Keyboard usage

• Processor load

• Proximity

• Screen status

• Telephony

• Temperature of the environment

• Timezone

• WiFi status

AWARE is not limited to the collection of raw data from the sensors listed above. These data can be analyzed and abstracted through custom plugins to add further information to the context. Plugins can either exploit the information collected by the sensors listed above and perform some analysis on the data, such as machine learning algorithms or data mining techniques, or implement new sensors, consequently collecting and analyzing new data. Researchers can implement their own plugins, though some are already available. In particular:

• Google Activity Recognition: it exploits accelerometer and gyroscope data to detect the activity performed by the user (e.g., Walking, Cycling, Standing, . . . )

• Ambient Noise: it exploits the data coming from the microphone to determine whether the user is in a noisy environment

• Contacts: collects information on the user’s contacts list

• Conversations: detects whether the user is engaged in a conversation with another person

• Device Usage: detects how much the device is used by the user

• Fitbit: it enables collecting data from a Fitbit wristband

• Google Fused Location: it exploits different sensors and the Google Fused Location APIs to detect the geolocation of the user with low battery consumption

• OpenWeather: it exploits the OpenWeather APIs to provide weather information for the location of the user

AWARE's User Interface is quite simple (Figure 4.7): when the application opens, the user can see the information about the device and the version of the software in the upper part of the screen, while below it the list of all the sensors is displayed. From this list, the user can decide for each available sensor whether to activate it and, optionally, modify its settings, such as the sampling frequency. Through the menu in the lower part of the screen, the user can also access the list of the installed plugins and activate them. With the rightmost button of the same menu, the user can access a real-time stream of the sensed data.

The context information can be retrieved in three different ways: Broadcasts, Providers, and Observers.

• Broadcast messages allow receiving quick updates about a particular event. They only provide general information about the context, without any detail on the data gathered. For example, when the user starts charging the smartphone, the corresponding message is broadcast.


FIGURE 4.7: Screenshots from the AWARE mobile application


• Providers are SQLite databases in which all the data gathered are stored, locally on the smartphone.

• Observers monitor the context, and when a specific condition is met they share a message with remote devices using Message Queue Telemetry Transport (MQTT) message callbacks.

AWARE is a very flexible and powerful tool: the possibility of creating custom plugins makes it easy to extend with other components and third-party services. For this reason, it has been used for many studies in very different domains. For example, several researchers exploited AWARE to understand the relationship between the user and the smartphone (Ickin et al., 2012; Banovic et al., 2014), while others focused on the physical and mental health of the user (Kan, 2018; Huttunen et al., 2017).

Unfortunately, AWARE presents some limitations, too. First, if it is used to gather data from several sensors, it has a heavy impact on the battery and performance of the smartphone. Especially when the accelerometer sensor is enabled, the battery life of the smartphone is noticeably shorter. Thus, older devices could be unsuitable to run AWARE. For the same reason, it is impossible to perform heavy computations locally on the device, since they would impact the usability of the smartphone itself too much. Second, since most of AWARE's work consists of data gathering as a background service, its correct functioning is strictly dependent on the battery optimization policies implemented in the device's operating system. Since these pieces of software are usually implemented by the device manufacturers, the possibility of successfully adopting this application is limited to some device brands; for example, OnePlus's battery optimization policy prevents AWARE from working properly. In addition, sometimes the application suffers from sudden crashes, and a complete re-installation is necessary.

4.4 AWARE Dashboard

As mentioned in the previous section, the second module of AWARE consists of a dashboard for the researchers to manage their studies and store a copy of the data. In particular, in this study we use an instance of the dashboard hosted on the RNOC servers at Georgia Tech. The dashboard is a PHP application that provides a User Interface for the creation and management of the studies. Researchers can log into the dashboard and create their studies. Once the researcher inserts a name and a description for the study, the dashboard creates a database instance where the data coming from all the users that enroll in that study are stored. When the study is created, the dashboard associates a QR code with it.

To join a study, a user only has to capture the QR code with the AWARE application on his/her smartphone. In addition, the researcher can set through the dashboard which sensors and plugins to enable, and modify their parameters, such that when a user joins a study the device is automatically configured correctly, without the need for manual setup. Last, the researcher can create views on the gathered data to explore them and gain some insights. If the researchers want to access the raw data, they can directly query the SQL database in which all the data are stored. By default, when the AWARE mobile application synchronizes data with the dashboard, it deletes the local copy from the cache of the device. The researcher, though, can decide to keep the data on the devices by modifying the corresponding setting on the dashboard control panel.
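For illustration, a minimal sketch of what such a raw-data query could look like from the analysis side is shown below. The connection string, credentials, and exact table names are placeholders: they depend on the specific RNOC deployment and on the AWARE schema version, so this is only an assumed setup, not the actual study configuration.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string: host, credentials, and database name are
# placeholders for the study database hosted behind the Georgia Tech VPN.
engine = create_engine("mysql+pymysql://researcher:secret@rnoc-host/aware_study")

# List the devices enrolled in the study (cf. the AWARE_device schema, Table 5.2).
devices = pd.read_sql("SELECT device_id, manufacturer, model FROM aware_device", engine)

# Pull the raw screen events, to be joined later with the EMA logs.
screen = pd.read_sql("SELECT timestamp, device_id, screen_status FROM screen", engine)

print(devices.head())
print(screen.head())
```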


Given the sensitivity of the collected data, AWARE pays particular attention to the transfer and storage of sensitive data. In fact, all the data collected are anonymized locally on the device before being sent to the database. In particular, the ID that identifies the user in the study is neither the telephone number nor any other code related to the smartphone: it is a random string generated by the application when a study is joined. In addition, all the data coming from the keyboard are hashed before being stored, and the data collected from the microphone are only very short and sparse chunks from which it is impossible to reconstruct what was happening at the moment they were recorded.

On top of that, the RNOC servers add another layer of security. In fact, both the web application and the database are behind a firewall. To access them it is necessary to use the Georgia Tech VPN, logging in through the two-factor authentication system provided by the Institute.

4.4.1 AWARE Customization

As mentioned, we customized the AWARE application to tailor it to our study. In particular, we wanted to minimize the effort of the user in the setup process, make the user experience as straightforward as possible, and collect additional data useful for the analysis of the EMAs.

To minimize the user effort, we decided to automate the setup process. In other words, since we wanted all the users to enroll in the same study, we embedded the process of joining the study within the application, without having to capture the QR code with the device camera. As a consequence, the second time the application is launched, AWARE automatically joins the study and schedules all the EMAs for the whole experimentation. We chose the second start-up, instead of the first one, because during the first start-up AWARE has to instantiate several internal processes and ask the user for all the permissions to access the resources. For this reason, if we had also run our setup in the same process, the application would have crashed on the majority of the devices. We also included all the necessary plugins in the same application, such that the installation process is as straightforward as possible.

The second problem we wanted to solve was related to the experience of the user in answering EMAs. Even if AWARE gives the possibility of using some embedded EMAs, these are quite limited: they only allow the creation of a small set of different types of questions, shown in Figure 3.2. Fortunately, among the available types there is the possibility of prompting a web page. We decided to exploit this kind of EMA to prompt questions generated on the Qualtrics platform. Qualtrics, in fact, gives much more freedom in the available types of question, allowing us to design a different User Experience for most of the EMAs. In addition, Qualtrics provides very useful tools to gather meta-data on the answers, such as the time the user spent on the page of the question.

At the same time, we modified the notification prompted by AWARE when a new ESM was available. The original system showed the same notification regardless of the question prompted; thus, to discover the content of the question, the user had to click on the notification to open the prompt with the question. With this mechanism, there was a high risk that the user would decide whether or not to answer the question simply by choosing whether to click on the notification or delete it, without knowing the actual question, thereby nullifying the aim of the research. To avoid this, we decided to write the text of the question in the notification itself, so that the participants could understand what they were asked to do just by reading the notification.

FIGURE 4.8: Comparison between the default ESM on AWARE (left) and our modified version (right)

The existing web EMAs presented some problems, though. As the left picture in Figure 4.8 shows, two buttons, "ok" and "cancel", occupied the lower part of the pop-up. These buttons, however, only closed the pop-up: to submit the EMA, the user needed to click the dark right arrow under the question. For this reason, if a user completed the EMA and pushed the "ok" button instead of the arrow, the pop-up would close but the EMA would not be submitted, and it would be recorded as not answered. To avoid this problem, we decided to remove both buttons and introduce a "close" button in the top right corner, so that the interface is more intuitive and the user is less prone to errors.

Finally, we modified the code to add some information to the providers implemented by default by AWARE, in particular to capture some additional information related to the EMAs prompted that we believed could be useful in the data analysis.

4.5 Backend

The algorithms were run locally on our machines. We decided to use Python for several reasons. First, it is easily understandable, and its data-centered approach makes working on data a fast and efficient process. Second, Python has very powerful and robust libraries available, such as Scikit-learn and NumPy. Third, being one of the most adopted languages for machine learning, it gave us the possibility of finding a lot of support in online communities.

Among the possible distributions we chose IPython, with both Jupyter Notebook and JupyterLab as environments for the development of the algorithms. The choice was driven by the fact that they allow rapid experimentation with immediate feedback on the result of the code execution, and thus an efficient trial-and-error approach, making the development much faster. In addition, they allow alternating executable blocks written in Python with documentation blocks written in Markdown, making the notebook self-explanatory and immediate to read and understand. Both Jupyter Notebook and JupyterLab work with IPython notebooks. JupyterLab, though, provides some additional functionalities, such as a Python console running on the same Python kernel as the notebook, and the possibility of displaying different parts of the same notebook in independent windows of the user interface, giving the developer the possibility of looking at different points of the notebook without having to continuously scroll the view.


Chapter 5

Data Analysis

Before going into the depth of the data analysis, we have to look at the Machine Learning pipeline as a whole, so as to have clearly in mind the journey of the data from their collection to the selection of the most suitable EMA.

First, data are collected through the AWARE mobile application. AWARE stores them locally on the device and periodically synchronizes them with the RNOC server. Data stored on the server are raw: no preprocessing has been done and their format is not suitable for training any algorithm. These data are not complete either; in fact, EMA answers are collected on the Qualtrics Survey platform. Section 5.1 will describe in detail how the dataset is composed, providing some insights on the collected data.

The second step is therefore the integration of the data coming from the two different platforms, AWARE and Qualtrics, and the preparation of the dataset for the learning algorithm. We need to verify the quality of the data and possibly try to improve it through operations such as missing-value imputation, normalization of all the measurements, and transformation of some features to increase their expressiveness and capture some hidden aspects of the information. Section 5.2 will describe this process, giving the rationale behind every main design choice.
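As an illustration of this preparation step, the snippet below sketches missing-value imputation and normalization with scikit-learn on a few toy rows; the column names are only examples, and the real feature set is the one described in Section 5.2.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Toy rows standing in for the joined AWARE + Qualtrics dataset (illustrative values only).
df = pd.DataFrame({
    "battery_level":       [0.9, np.nan, 0.4],
    "light_value":         [120.0, 3.0, np.nan],
    "notification_number": [2, 5, 0],
})

# Fill missing sensor readings with the column mean, then rescale every
# measurement to the [0, 1] interval.
X = SimpleImputer(strategy="mean").fit_transform(df)
X = MinMaxScaler().fit_transform(X)
print(X)
```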

Then the actual learning process can begin. We will use the Query-Ad-Clustering algorithm, as described in Section 3.3.1. Query-Ad-Clustering is divided into two main phases: first, the k-means clustering algorithm divides the points into different clusters, representing the contexts in which the user is; then, we train a separate MAB on every cluster generated.
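The sketch below gives a schematic view of this two-phase pipeline. It uses a plain ε-greedy update as a stand-in for the MAB variant of Section 3.3.1, random data in place of the real context features, and an arbitrary number of clusters, so it should be read as an outline of the structure rather than the actual implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

N_EMA_TYPES = 7  # Checkbox, Hotspot, Mult. Adj., Smile, Quick A., Rank, Radio
rng = np.random.default_rng(0)

# Phase 1: cluster the context vectors; each cluster is treated as a context.
X = rng.random((200, 10))                 # stand-in for the normalized context features
kmeans = KMeans(n_clusters=4, random_state=0).fit(X)

# Phase 2: one bandit per context (epsilon-greedy used here only for illustration).
counts = np.zeros((kmeans.n_clusters, N_EMA_TYPES))
rewards = np.zeros((kmeans.n_clusters, N_EMA_TYPES))

def select_ema(context_features, eps=0.1):
    """Return (EMA type index to prompt, context index) for the given features."""
    c = int(kmeans.predict(context_features.reshape(1, -1))[0])
    if rng.random() < eps or counts[c].sum() == 0:
        return int(rng.integers(N_EMA_TYPES)), c
    return int(np.argmax(rewards[c] / np.maximum(counts[c], 1))), c

def update(context, ema_type, reward):
    """Reward: e.g. the Quantity of Information if the EMA was answered, 0 if missed."""
    counts[context, ema_type] += 1
    rewards[context, ema_type] += reward
```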

5.1 The Dataset

In this section we will analyze the dataset obtained with the experimentation, understanding the available data and their nature.

We ran the experimentation with 8 participants from Georgia Tech programs. We will refer to all of them with the code assigned during the study [P01 - P08]. One participant (P08) quit because his phone was not able to support the application; another one (P04) quit because he found the study too burdening. The remaining 6 participants ran the application from 10 to 14 days. While installed, AWARE gathered data from all the sensors and prompted one EMA per hour, from 9 AM to 10 PM. Unfortunately, the theoretical schedule of the EMAs was not always followed, due to compatibility issues between the volunteers' devices and the mobile application. As a consequence, the number of EMAs reported varies a lot across participants, from only 22 EMAs prompted during the entire study (P06) to 491 for a single subject (P01). We attribute this discrepancy to compatibility problems between the application and the devices; in fact, every distribution of the Android operating system has a different policy for the execution of background processes, which AWARE needs in order to work properly.


TABLE 5.1: Number of EMAs collected for each EMA type and participant

Checkbox Hotspot Mult. Adj. Smile Quick A. Rank Radio TOT

P01 55 68 68 66 52 56 67 432

P02 1 3 2 2 3 2 2 17

P03 14 17 15 15 11 16 12 100

P05 8 21 16 18 24 28 23 138

P06 4 3 2 1 1 1 3 15

P07 4 8 4 7 5 5 4 37

TOT 86 120 107 109 96 108 113 739


In total, we collected 739 EMAs, distributed as shown in Table 5.1. The section continues as follows: first we will describe all the data coming from the built-in sensors, then we will move to the data collected through the plugins.

5.1.1 Sensors Data

AWARE Sensors Data consist of all the data collected from the sensors without performing any processing.

AWARE Device

When a device enrolls in a study, AWARE creates a tuple containing all the information related to the device. This information is stored in the AWARE_device table, whose schema is described in Table 5.2. As we can see from the schema, this sensor gives us a lot of information on the physical device. For our study, though, we are not interested in these details: the table simply provides a reliable list of all the devices enrolled in the study, and thus a list of all the device_ids we have to look for in the other tables. The other features are useful only for statistics on the devices that enrolled in the study.

Application in use

The Applications sensor gathers information about the application running in the foreground. A new log entry is added to the table every time the user switches from one application to another. Table 5.3 describes the data logged by AWARE. In particular, package_name contains the name of the package of the application. This name is a unique string that identifies the application within the device and the Play Store. Examples of package names are com.facebook.katana (Facebook mobile application), com.whatsapp (WhatsApp Messenger), and com.spotify.music (Spotify Music).


TABLE 5.2: AWARE device Data Scheme

Table Field Field type Description

_id INTEGER Primary key, auto incremented

timestamp REAL Unixtime (milliseconds since 1970)

device_id TEXT AWARE device UUID

board TEXT Manufacturer’s board name

brand TEXT Manufacturer’s brand name

device TEXT Manufacturer’s device name

build_id TEXT Android OS build ID

hardware TEXT Hardware codename

manufacturer TEXT Device’s manufacturer

model TEXT Device’s model

product TEXT Device’s product name

serial TEXT Manufacturer’s device serial, not unique

release TEXT Android’s release

release_type TEXT Android's type of release (e.g., user, userdebug, eng)

sdk INTEGER Android’s SDK level


TABLE 5.3: Applications Foreground Data Scheme

Table Field Field type Description

_id INTEGER Primary key, auto incremented

timestamp REAL Unixtime (milliseconds since 1970)

device_id TEXT AWARE device UUID

package_name TEXT Application’s package name

application_name TEXT Application’s localized name

is_system_app BOOLEAN Device’s pre-installed application

Calls Data

The Calls sensor logs all the call events, such as incoming, outgoing, and missed calls. Sensitive data are protected by encrypting the source (or the target) of the call through the SHA-1 algorithm. In this way, we are able to understand which calls have been made with the same person, but we do not know the identity of the person. Table 5.4 describes the Data Scheme in depth.

Ambient Light

The Light sensor reports the ambient light around the device. It exploits the physical sensor located on the front of the phone, near the call speaker. Table 5.5 describes the Data Scheme in detail. Even if we are not directly interested in the value of the reported luminance, it may prove useful to determine whether the user is outside, inside, or in particular light conditions such as a dark room (and therefore probably resting). Only the analysis of the data will reveal whether this sensor is a good proxy to estimate the user's interruptibility.

Screen Status

The Screen sensor reports every time the screen of the device is turned off, turned on, locked, or unlocked. The full description of the information schema is in Table 5.6.

Network Status

The Network sensor logs the changes in the network status. More precisely, it tracks when the various networks are turned on and off. Table 5.7 describes the Data Scheme.


TABLE 5.4: Calls Data Scheme

Table Field Field type Description

_id INTEGER Primary key, auto incremented

timestamp REAL Unixtime (milliseconds since 1970)

device_id TEXT AWARE device UUID

call_type INTEGER one of Android's call types (1 – incoming, 2 – outgoing, 3 – missed)

call_duration INTEGER length of the call session

trace TEXT SHA-1 one-way source/target of the call

TABLE 5.5: Light Data Scheme

Table Field Field type Description

_id INTEGER Primary key, auto incremented

timestamp REAL Unixtime (milliseconds since 1970)

device_id TEXT AWARE device UUID

double_light_lux REAL the ambient luminance in lux units

accuracy INTEGER the sensor's accuracy level – constant from the actual sensor

label TEXT researcher/user provided label. Useful for data calibration or labelling


TABLE 5.6: Screen Data Scheme

Table Field Field type Description

_id INTEGER Primary key, auto incremented

timestamp REAL Unixtime (milliseconds since 1970)

device_id TEXT AWARE device UUID

screen_status INTEGER screen status, one of the following: 0=off, 1=on, 2=locked, 3=unlocked

TABLE 5.7: Network Data Scheme

Table Field Field type Description

_id INTEGER Primary key, auto incremented

timestamp REAL Unixtime (milliseconds since 1970)

device_id TEXT AWARE device UUID

network_type INTEGER the network type, one of the following: -1=AIRPLANE, 1=WIFI, 2=BLUETOOTH, 3=GPS, 4=MOBILE, 5=WIMAX

network_subtype TEXT the text label of the TYPE, one of the following: AIRPLANE, WIFI, BLUETOOTH, GPS, MOBILE, WIMAX

network_state INTEGER the network status (1=ON, 0=OFF)


TABLE 5.8: Activity Recognition Data Scheme

Table Field Field type Description

_id INTEGER Primary key, auto incremented

timestamp REAL Unixtime (milliseconds since 1970)

device_id TEXT AWARE device UUID

activity_name TEXT human-readable activity name: unknown, tilting, on_foot, in_vehicle, on_bicycle, running, walking

activity_type INTEGER a code to identify the detected activity

confidence INTEGER prediction accuracy (0-100)

5.1.2 Plug-ins Data

In contrast to the sensors, AWARE plug-ins process the data collected by the sensors before storing them. The processing can be executed either locally on the device or by exploiting some cloud service through API calls. Its nature can be of any kind: simple API calls to external services, mathematical computation on the sensor signals to extract new features, or the execution of some previously trained machine learning algorithms.

Activity Recognition

The Activity Recognition plugin exploits the Google Location APIs to detect the activity performed by the user. In particular, it exploits mainly data from the accelerometer and gyroscope to determine whether the user is running, walking, standing, or on a vehicle. Table 5.8 describes the Data Scheme in detail.

Ambient Noise

The Ambient Noise plugin exploits the microphone of the device to detect the level of noise in the surrounding environment. Table 5.9 shows the Data Scheme in detail. Note that even if an audio snippet is collected and stored, it is not enough to reconstruct what is being said, or where the user is when the snippet is recorded.

Google Fused Location

The Google Fused Location plugin exploits the Google Location APIs to determine the geographical location of the user. This service gathers data from various sensors, such as Wi-Fi networks and GPS, to provide the position in an energy-efficient way. Table 5.10 describes the Data Scheme in detail.


TABLE 5.9: Ambient Noise Data Scheme

Table Field Field type Description

_id INTEGER Primary key, auto incremented

timestamp REAL Unixtime (milliseconds since 1970)

device_id TEXT AWARE device UUID

double_frequency REAL sound frequency in Hz

double_decibels REAL sound decibels in dB

double_RMS REAL sound RMS

is_silent INTEGER 0 = not silent 1 = is silent

double_silence_threshold REAL the threshold used when classifying between silent vs not silent

blob_raw BLOB the audio snippet raw data collected


TABLE 5.10: Fused Location Data Scheme

Table Field Field type Description

_id INTEGER Primary key, auto incremented

timestamp REAL Unixtime (milliseconds since 1970)

device_id TEXT AWARE device UUID

double_latitude REAL the location’s latitude, in degrees

double_longitude REAL the location’s longitude, in degrees

double_bearing REAL the location’s bearing, in degrees

double_speed REAL the speed if available, in meters/second over ground

double_altitude REAL the altitude if available, in meters above sea level

provider TEXT gps, network, fused

accuracy INTEGER the estimated location accuracy

label TEXT Customizable label. Useful for data calibration and traceability


5.2 Data Preparation and Features Transformation

Once the data had been collected, some preprocessing operations proved necessary to adequately prepare them for the learning algorithms. We therefore transformed the collected data into the following features:

• battery_level

• device_on

• hour

• last_activity

• light_value

• network_type

• notification_number

• place

• rain

• screen_app

This section will describe how every feature is computed, exploring why it has been chosen and how it has been extracted. To guarantee flexibility to the learning algorithms, many of the data preprocessing functions have been designed as parametric. Therefore, we will refer to the interval of time considered for identifying the context without explicitly quantifying it; we will quantify this interval in the training phase of the algorithms.

5.2.1 battery_level

The battery level is interesting because we believe that low battery levels make users less willing to use their smartphone to answer long questions, because they want to preserve the battery as much as they can. For the same reason, when the battery is charging we consider it as fully charged. We retrieve its value from the Battery sensor and normalize it in the interval [0, 1].

5.2.2 device_on

device_on describes how much time the device has been turned on in the time interval considered. It exploits data coming from the Device Usage plugin, in particular the fields double_elapsed_device_on and double_elapsed_device_off. The function operates in different steps. First, it sums the total value of the two variables across all the tuples whose timestamp is within the time interval considered, storing the values in tot_on and tot_off. Then it returns the fraction of the time the screen has been turned on, that is:

on = tot_on / (tot_on + tot_off)    (5.1)
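A direct translation of this computation into a small helper is sketched below, assuming the Device Usage rows are loaded into a pandas DataFrame; the double_elapsed_device_off column name is inferred from the text above and is an assumption about the plugin schema.

```python
import pandas as pd

def device_on_fraction(usage: pd.DataFrame, start_ts: float, end_ts: float) -> float:
    """Fraction of time the device was on within [start_ts, end_ts] (Unix ms)."""
    window = usage[(usage["timestamp"] >= start_ts) & (usage["timestamp"] <= end_ts)]
    tot_on = window["double_elapsed_device_on"].sum()
    tot_off = window["double_elapsed_device_off"].sum()
    if tot_on + tot_off == 0:
        return 0.0          # no usage data in the window
    return tot_on / (tot_on + tot_off)
```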


5.2.3 hour

The hour of the day is potentially a great descriptor of interruptibility. In fact, at the same hour of the day users will likely perform similar actions and, thus, will be interruptible to a similar degree. For this reason, we extract the hour of the day from the timestamp of the EMA generation. Since it lies in the interval 0-23, we normalize it:

hour_[0,1] = (hour_[0,23] − 8) / 15    (5.2)

The normalization is performed by subtracting 8 and dividing by 15 because EMAs are prompted only from 8 AM to 11 PM.

5.2.4 last_activity

last_activity reports the most recent activity detected by the Google Activity Recognition APIs. These activities are described as a limited set of strings (e.g., walking, still, . . . ), therefore we convert them into a value in [0, 1]. The assignment is not done randomly, but by ordering the activities according to their physical involvement, following the idea that the more active we are, the less interruptible we are. The resulting conversion is the following:

Unknown = 0
Still = 1
In_vehicle = 2
On_foot = 3
Tilting = 4
Walking = 5
Running = 6
On_bicycle = 7

The value is divided by 7 before being returned.
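The mapping above translates directly into a small helper, sketched here; the fallback to 0 for unrecognized labels is an assumption.

```python
# Ordinal encoding of the detected activity, from least to most physically involving.
ACTIVITY_ORDER = {
    "unknown": 0, "still": 1, "in_vehicle": 2, "on_foot": 3,
    "tilting": 4, "walking": 5, "running": 6, "on_bicycle": 7,
}

def last_activity_feature(activity_name: str) -> float:
    """Map an activity label to [0, 1]; labels not in the table fall back to 0."""
    return ACTIVITY_ORDER.get(activity_name.lower(), 0) / 7.0
```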

5.2.5 light_value

light_value reports the light intensity measured by the light sensor on the front side of the device. This raw value, though, can vary a lot for two main reasons. First, it is strictly hardware dependent, since every physical sensor has a different range of values. Second, the registered values can vary a lot according to the light conditions, from values below 1 in dark environments to the maximum value of the sensor in direct sunlight, typically on the order of tens of thousands of units.

To compute the final value we take the registered value and normalize it by dividing by the maximum value of the sensor:

light_value = registered_value / max_value    (5.3)

5.2.6 network_type

The network_type variable describes the type of network the device is connected to. The Network sensor reports a value in [−1, 5] according to the following convention:


Airplane = -1
WiFi = 1
Bluetooth = 2
GPS = 3
Mobile = 4
WiMax = 5

network_type therefore contains the value reported by the Network sensor, normalized:

network_type = (network_value + 1) / 6    (5.4)

5.2.7 notification_number

notification_number reports the number of notifications prompted on the device in the few minutes before the ESM is generated. To achieve this result, we count the number of entries inserted in the Notifications table by the Notification sensor for the selected device. To normalize it, we divide the number of notifications by the highest number of notifications registered during the data collection.

5.2.8 place

The place variable contains the nature of the place the user is in when the EMA is prompted. Intuitively, we believe that this descriptor can be a good indicator of interruptibility: in many cases the place the user is in tells something about the activity he/she is doing, and therefore can influence the willingness to answer EMAs. For this reason, we want to add expressiveness to the coordinates registered by AWARE by describing the kind of place where the user is. In addition, the abstraction from raw coordinates to place labels makes similar locations (e.g., two different bars, or supermarkets) look the same, even if geographically distant.

Labelling places is not a trivial task, though. First, we need to clean the location data. To do that, we decided to adopt a clustering approach: we consider all points closer than 20 m as a single point, representing them with the centroid of their cluster. To measure the distance among the clusters, we use a measure similar to the Euclidean distance, but one that takes into account the ellipsoidal shape of the Earth (Karney, 2013). The resulting algorithm is a hierarchical clustering algorithm that stops merging clusters when the centroids are more distant than 20 meters.
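A minimal sketch of this clustering step is shown below. It uses geopy's geodesic distance (which implements Karney's algorithm) and SciPy's agglomerative clustering cut at 20 m; average linkage is used here as an approximation of the centroid-based merging criterion described above, and the coordinates are toy values, not study data.

```python
import numpy as np
from geopy.distance import geodesic
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy (latitude, longitude) pairs standing in for the Fused Location readings.
coords = np.array([
    [33.7756, -84.3963],
    [33.7757, -84.3964],
    [33.7490, -84.3880],
])

# Pairwise geodesic distances in meters, accounting for the Earth's ellipsoid.
dist = pdist(coords, metric=lambda a, b: geodesic(tuple(a), tuple(b)).meters)

# Agglomerative clustering cut at 20 m: points closer than that share a cluster.
labels = fcluster(linkage(dist, method="average"), t=20, criterion="distance")

# Centroid of each cluster, used as the representative point to be labelled.
centroids = {c: coords[labels == c].mean(axis=0) for c in np.unique(labels)}
```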

Once the coordinates of the centroids are determined, we need to label them and give the same label to all the points of the cluster. We tried different approaches, like the Google Places and Foursquare APIs. Unfortunately, we found that both of them performed poorly when employed with this aim. In the meantime, we noticed that the number of clusters is logarithmic in the number of data points, given the everyday routine of most students. As a consequence, we decided to manually label all the points. We developed a small User Interface to plot the points on a map and then select the most suitable label, based on our knowledge of Atlanta. Inspired by the Google Places labels, we decided to adopt the following classification:

Open_air = 0
Public_transportation = 1
Shop = 2
Restaurant = 3
Private_house = 4
Library = 5
University = 6
Event = 7

5.2.9 screen_app

We want to capture the context of the screen in detail, describing whether the screen is turned on or off and, in case it is turned on, the kind of application it is running. AWARE collects information about the status of the screen and the ID of the running application. Thanks to this ID, we can look up the category of the running application on the Play Store. In fact, Google organizes the applications into 32 different categories, the same ones we use to navigate the Play Store, which are:

• ART_AND_DESIGN

• AUTO_AND_VEHICLES

• BEAUTY

• BOOKS_AND_REFERENCES

• BUSINESS

• COMICS

• COMMUNICATION

• DATING

• EDUCATION

• ENTERTAINMENT

• EVENTS

• FINANCE

• FOOD_AND_DRINKS

• HEALTH_AND_FITNESS

• HOUSE_AND_HOME

• LIBRARIES_AND_DEMO

• LIFESTYLE

• MAPS_AND_NAVIGATION

• MEDICAL

• MUSIC_AND_AUDIO

• NEWS_AND_MAGAZINES

• PARENTING

• PERSONALIZATION

• PHOTOGRAPHY

• PRODUCTIVITY

• SHOPPING

• SOCIAL

• SPORT

• TOOLS

• TRAVEL_AND_LOCAL

• VIDEO_PLAYERS

• WEATHER

This categorization is a good starting point, but it is not enough. In fact, 32 different labels are too many given the small amount of data points. On top of that, it is difficult to assign numerical values to all these categories in a way that carries some logical meaning. For this reason, we decided to project the application category onto a 2-dimensional vector.

We identified two different measurements that characterize our interaction with an application:

• Time: the amount of time we usually spend on the application from its launch to its closure, that is, the time the application stays open on the smartphone. For example, maps are usually on the screen for a long time, since we need them to reach the destination, whereas we tend to spend relatively short intervals on social networks.

• Personal Commitment: measures how involved we are in the interaction with the application. For example, when we write a text message our commitment is low, in that we can do it while we are doing other things, whereas if we are playing a game we are totally focused on the application.


TABLE 5.11: Conversion between the app categories and the Screen App Time and Commitment measurements

Point Time Commitment Categories List

a zero zero Display off

b low zero Lock Screen

c low low COMMUNICATION, MUSIC

d medium low ART_AND_DESIGN, AUTO_AND_VEHICLES, BEAUTY, EVENTS, FOOD_AND_DRINKS, HOUSE_AND_HOME, LIBRARIES_AND_DEMO, LIFESTYLE, PARENTING, PERSONALIZATION, SPORT, TRAVEL_AND_LOCAL, WEATHER

e medium medium DATING, SOCIAL

f medium high BOOKS_AND_REFERENCES, BUSINESS, COMICS, EDUCATION, FINANCE, MEDICAL, NEWS_AND_MAGAZINES, SHOPPING

g short high TOOLS

h long high HEALTH_AND_FITNESS, MAPS_AND_NAVIGATION, PRODUCTIVITY

i short extreme PHOTOGRAPHY

j extreme extreme ENTERTAINMENT, VIDEO_PLAYERS

For each of the two variables, screen_app_time and screen_app_commitment, we adopted a 5-value scale [zero, low/short, medium, long/high, extreme] and for every app category we chose the most appropriate pair of values. Finally, we equally distributed the values of each variable along the [0, 1] interval, and we assigned the pair of values to each app category. Table 5.11 reports the conversion between the values and the categories, whereas Figure 5.1 gives a graphical representation of the conversion.
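A sketch of how this projection can be encoded is shown below. Only a few rows of Table 5.11 are reproduced, the numeric values come from distributing the five levels uniformly over [0, 1], and the default pair used for unlisted categories is an assumption.

```python
# Five-level scales mapped uniformly onto [0, 1] ("short" and "long" are the
# time-axis names of the "low" and "high" levels).
LEVELS = {"zero": 0.0, "low": 0.25, "short": 0.25, "medium": 0.5,
          "high": 0.75, "long": 0.75, "extreme": 1.0}

# A few rows of Table 5.11, expressed as (time, commitment) level pairs.
CATEGORY_SCALE = {
    "display_off":   ("zero", "zero"),
    "lock_screen":   ("low", "zero"),
    "COMMUNICATION": ("low", "low"),
    "SOCIAL":        ("medium", "medium"),
    "TOOLS":         ("short", "high"),
    "ENTERTAINMENT": ("extreme", "extreme"),
}

def screen_app_features(category: str) -> tuple[float, float]:
    """Return (screen_app_time, screen_app_commitment) for a Play Store category."""
    t, c = CATEGORY_SCALE.get(category, ("medium", "low"))   # assumed default
    return LEVELS[t], LEVELS[c]
```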

FIGURE 5.1: Graphical representation of the screen_app_time and screen_app_commitment features. The labels of the points refer to Table 5.11

5.3 EMA data

EMA information comes from two main sources. AWARE stores a table in which it reports all the information about the EMAs prompted, such as the time at which each one has been spawned and its content. On the other hand, the Qualtrics platform stores information about every EMA answered, such as the content and the timestamp of the response, and other interesting side information, such as the amount of time the user spent on the EMA page before submitting the question. Since on the Qualtrics server there is an entry only for the answered EMAs, to understand whether an EMA generated by AWARE has been answered we simply look up the Qualtrics data to search for an entry about the same EMA. If there is one, the EMA is reported as answered; otherwise it is reported as missed.
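The matching step can be expressed as a left join on a shared EMA identifier, as sketched below on a few toy rows; the column names and the identifier are illustrative, not the actual export format of the two platforms.

```python
import pandas as pd

# Toy exports: AWARE logs every prompted EMA, Qualtrics only the answered ones.
prompted = pd.DataFrame({
    "ema_id":   ["e1", "e2", "e3", "e4"],
    "ema_type": ["Checkbox", "Rank", "Hotspot", "Checkbox"],
})
answered = pd.DataFrame({"ema_id": ["e1", "e3"], "duration_s": [12.4, 30.1]})

# Left join: an EMA is "answered" only if a matching Qualtrics entry exists.
merged = prompted.merge(answered[["ema_id"]], on="ema_id", how="left", indicator=True)
merged["answered"] = merged["_merge"] == "both"
merged = merged.drop(columns="_merge")
print(merged)
```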


Chapter 6

Discussion and Conclusions

6.1 Quantitative analysis

As introduced in Chapter 1, to understand whether data gathered from sensors can be exploited to increase user compliance with EMAs, we need to break down the research question into four different sub-questions and answer them individually. In this section, we will go through the results of the experimentation to answer all the sub-questions.

Q1 - The design of the EMA has an impact on the user effort required to answer

To answer this question we first needed to define a measurement to quantify the effort the user has to make to respond. Since we wanted an objective measurement, we decided to consider the time the user spent on the question as a measure of the effort: the longer the time spent on the question, the greater the effort required, and vice versa. Figure 6.1 presents the distributions of the response time for all the participants, separated by type of EMA. We considered the answering time of each EMA as an independent random variable; Figure 6.2 summarizes their distributions. Comparing it with the Quantity of Information computed for every EMA (Fig. 6.6), it is interesting to notice how the time spent answering an EMA grows with the Quantity of Information the EMA contains, except for the Hotspot EMA.

To understand whether the differences among the EMAs are significant, we used the t-test on all the pairs of EMAs, with α = 0.05. The distributions contain outliers, points whose time value is much higher than the mean of the distribution; we attribute these values to cases in which the EMA was opened without the user paying attention to it. Therefore, we decided to discard all the points above two standard deviations. Since we are performing many independent tests on the same distributions, we applied the Bonferroni correction so as not to overestimate the results. Given n = 7, the number of different distributions, the corrected α* is:

α* = α / (7 choose 2) = 0.05 / 21 ≈ 0.0024    (6.1)
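A sketch of this testing procedure is given below, assuming the response times are available as one NumPy array per EMA type; the outlier cut and the Bonferroni-corrected threshold follow the description above, but the function is only an illustration of the procedure, not the analysis code used in the study.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_ind

def pairwise_ema_tests(times: dict[str, np.ndarray], alpha: float = 0.05):
    """Pairwise t-tests on response times, with a Bonferroni-corrected threshold."""
    names = list(times)
    n_pairs = len(names) * (len(names) - 1) // 2
    alpha_star = alpha / n_pairs                     # e.g. 0.05 / 21 for 7 EMA types
    results = {}
    for a, b in combinations(names, 2):
        # Discard outliers above two standard deviations, as described in the text.
        xa = times[a][times[a] <= times[a].mean() + 2 * times[a].std()]
        xb = times[b][times[b] <= times[b].mean() + 2 * times[b].std()]
        p = ttest_ind(xa, xb).pvalue
        results[(a, b)] = {"p_value": p, "significant": p < alpha_star}
    return results
```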

The confidence levels of every pairwise test are reported in Figure 6.3. As wecan see, many pairs of distributions are significantly different. For example, rankingEMA is different from any other questionnaire, asking, on average, much more timeto be completed. Quick answer, instead, is the questionnaire that requires the mini-mum effort for the user. Likert Smile is the only one that is not significantly differentfrom most of the other EMAs.


FIGURE 6.1: Distribution of time spent to answer the various types of EMAs


FIGURE 6.2: Distribution of the different types of EMA with respect to the time spent to answer (seconds)

FIGURE 6.3: P-values from t-tests performed on the pairs of EMAs. The green values are below the threshold, while the red values failed the test


Q2 - The design of the EMA has an impact on the compliance rate

To answer this question we computed the compliance rate of every type of EMA for every participant. Before that, we looked into the answers to understand which responses were genuine and which ones were given only to close the questionnaire, without reporting the real mood (Chan et al., 2018). For every EMA type we expected to find all the data points grouped in two main clusters: a first cluster of very fast answers, registered when the user skipped the question by submitting a random answer to close it as fast as possible, and a second cluster of "slower" answers, representing all the answers for which the users spent time reading the question and thinking about the response. Besides, we expected this second type of duration to become slightly shorter over time, showing some sort of learning curve representing the users spending less time reading the questions since they remember them from the previous EMAs. When we analyzed the data (Fig. 6.1) we realized that the points did not follow any of the expected patterns: most of them were concentrated around a single value, with some outliers with long response times. For this reason, we decided that we could not discriminate which answers were submitted mindfully and which were not, and therefore that we would not discard any entry. In the samples we also saw some outliers, questions whose answering time is definitely higher, but we decided to keep them as well, since we considered that if the questionnaire had been open for a long time, then the user had time to read and answer it purposefully.

As shown in Table 6.1, we then compared the expected compliance of every EMA. The findings are quite interesting. Some compliance rates are substantially greater than others (e.g., Checkbox), whereas others are more or less equivalent (e.g., Multiple Adjectives and Likert Smile). In particular, not only do Checkbox and Hotspot have a compliance rate that is around 150% of that of Multiple Adjectives and Quick Answer, but they are also significantly greater than the compliance rate of Quick Answer and Radio Buttons.

This intuition is reinforced by looking at the compliance rate per single participant, since the values of the different EMAs differ a lot. Therefore, even if the number of participants is too small to consider any significance test, the differences between many pairs of EMAs are large enough that we can say this hypothesis is plausible.
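For reference, the compliance rates of Table 6.1 amount to a grouped mean over the answered flag built in Section 5.3; the sketch below shows that computation on a few toy rows (the values are not the study data).

```python
import pandas as pd

# Toy rows: one row per prompted EMA, with the "answered" flag from the join step.
merged = pd.DataFrame({
    "participant": ["P01", "P01", "P01", "P03", "P03", "P03"],
    "ema_type":    ["Checkbox", "Checkbox", "Rank", "Rank", "Checkbox", "Rank"],
    "answered":    [True, False, True, False, True, True],
})

# Rows: participants, columns: EMA types, cells: compliance rate.
compliance = (
    merged.groupby(["participant", "ema_type"])["answered"]
          .mean()
          .unstack("ema_type")
)
print(compliance.round(3))
```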

Q3 - Different contexts imply different compliance rate for the same EMA

Before answering questions 3 and 4, we clustered each participant's data into different contexts. To choose the best value for k, which represents the number of different contexts, we took the following factors into consideration:

• Contexts containing a high number of samples were preferred, such that we could better train the MABs

• Values of k that generated multiple similar contexts were discarded

This approach created 6 different contexts for participant P01, 2 contexts for participants P02, P03 and P07, 7 contexts for participant P05, and 3 contexts for participant P06.
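The choice of k was made by inspection against the two criteria above; a small helper like the following could support that inspection by reporting cluster sizes for several candidate values of k (the feature matrix X and the range of k are assumptions).

```python
import numpy as np
from sklearn.cluster import KMeans

def inspect_k_choices(X: np.ndarray, k_values=range(2, 8)) -> None:
    """Print cluster sizes and inertia for several k, to support a manual choice."""
    for k in k_values:
        km = KMeans(n_clusters=k, random_state=0).fit(X)
        sizes = np.bincount(km.labels_, minlength=k)
        print(f"k={k}  cluster sizes={sizes.tolist()}  inertia={km.inertia_:.1f}")
```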

Once the data points were labeled, we computed the compliance rate for every EMA in every context. Figures 6.4 and 6.5 show the results, while Table 6.2 reports the number of EMAs answered per type per context. A graph for every kind of EMA is reported for each participant: on the x-axis all the contexts are reported, while the height of the bar represents the compliance rate for the EMA in that particular context.


TABLE 6.1: Compliance Rate of every EMA for every Participant

Checkbox Hotspot Mult. Adj. Smile Quick A. Rank Radio

P01 0.182 0.132 0.103 0.212 0.154 0.143 0.254

P02 1 0.66 0 0.5 0 0.5 0.5

P03 0.358 0.353 0.2 0.267 0.455 0.189 0.583

P05 0.875 0.857 0.875 0.722 0.875 0.821 0.67

P06 1 0.667 0.5 0 0 1 0.667

P07 0.25 0.625 0.5 0.571 1 0.4 0.25

AVG 0.61 0.549 0.363 0.379 0.414 0.509 0.487

TABLE 6.2: EMAs answered and missed for every type, participantand context

From these graphs we can draw very interesting conclusions. First, there are contexts in which the compliance rate is higher for most of the EMAs. We can deduce that there are some contexts in which the user is more willing to respond than in others, showing coherence with the previous studies on interruptibility analyzed in Section 2.3.3. Second, in most of the graphs the compliance rate differs among the various contexts. For some participants the pattern is very similar for all the EMAs (e.g., P02, P03, and P07), whereas for others every EMA has a different pattern (e.g., P01 and P06). In either case, we can conclude that the context has an impact on the compliance rate of every EMA.

Q4 - The most convenient EMA depends not only on the quantity of information it provides, but also on the context in which the respondent is

To answer the last question we trained the Contextual MAB, training a different MAB for every context (Sec. 3.3.1). We ran the algorithm separately for every participant


FIGURE 6.4: Compliance rate of Participants P01, P02, and P03 with different EMAs with respect to the different contexts. The values are listed in Tab. 6.2


FIGURE 6.5: Compliance rate of Participants P05, P06, and P07 with different EMAs with respect to the different contexts. The values are listed in Tab. 6.2


FIGURE 6.6: Expected rewards for the various EMAs, computed with Equation 3.4

and we compared the theoretical quantity of information of every EMA [Fig. 6.6] with the estimations produced by the executions of the algorithm [Fig. 6.7]. Some very interesting facts emerge from this comparison. For some participants, such as P02, P03, and P07, the estimation of the Quantity of Information is almost coherent with the theoretical one, even if the difference between the various EMAs is reduced. For participant P05, instead, even if the optimal EMA is almost always the Hot Spot, we find very interesting differences in the other EMAs.

The most interesting results come from participant P01, though. We notice how the expected Quantity of Information changes in almost every context. For example, in P01's estimations, in some contexts, like C0 and C4, the Hot Spot is the optimal EMA, in C1 the best choice is the Multiple Adjectives EMA, in C2 and C3 the optimal EMA is the Ranking one, whereas in C5 the algorithm would prompt a Likert Smile, and in C6 a Quick Answer.

On top of that, it is interesting to notice how the optimal EMA differs among study participants, highlighting the importance of learning algorithms tailored to the specific user, as Turner had found in his work (Turner, Allen, and Whitaker, 2015a).

In addition, we can draw another interesting conclusion: if we pay attention not only to the highest estimation of the Quantity of Information but also to the other estimations, we note how the "ranking" changes with the contexts. As a consequence, given two different EMAs, we can foresee which one will be more appreciated in every context, obtaining a complete overview of the user's behavior in every context.

Looking at the graphs, the hypothesis seems verified. However, we must take into consideration the low number of data points per context, which may make the results not generalizable.

6.2 Qualitative Analysis

As said in the Study Design, during the wrap-up meeting every participant was interviewed to understand their experience with the application and get their insights


FIGURE 6.7: Expected rewards predicted by the Contextual MAB algorithm


on the perception of the EMAs.

All the participants who successfully completed the study were quite satisfied with the experience, even if they reported some issues related to the use of the VPN.

When the participants were asked which kind of EMA they preferred, surprisingly the answers were very different. Some participants preferred fast questions like Radio Buttons and Likert Smile, whereas others appreciated more the questions in which they had more freedom and more possibilities, finding the request to choose a single answer too limiting and, thus, frustrating. In general, the Hot Spot question was the most appreciated, because the pictures were immediate and at the same time it gave a lot of freedom, while the least appreciated EMAs were the Ranking and the Multiple Adjectives. In fact, the vast majority of the participants had problems in assigning the lower positions of the Ranking EMA.

The Multiple Adjectives EMA was perceived by most of the participants as boring to answer and too repetitive. On top of that, one participant declared that, always giving similar votes to the adjectives in that EMA, "the Multiple Adjective made me feel a warm person". For another participant, instead, having four different adjectives to rate individually gave the possibility of expressing his status in detail, while he found the pictures on the Hot Spot EMA too difficult to associate with his mood. These differences in participants' opinions make clear the need for EMAs that help the users express themselves the way they prefer.

From the cognitive point of view, instead, all the participants agreed that they preferred to make some additional effort to answer a question they appreciated rather than make less effort for a question they did not like. Often, in fact, the cognitive effort was measured by the user not as the amount of time spent on a single question, but as how frustrated they felt in answering the questions. This frequent opinion reinforces the idea that a personalized EMA could dramatically improve the User Experience and, as a consequence, compliance. On the other hand, it suggests that the time spent to answer may not be an appropriate measure of the cognitive effort.

All the participants agreed that in some contexts they were less willing to answer EMAs. For example, most of them reported that they skipped questions when they were in class, or when they were doing something they needed to concentrate on. One participant gave the example that, while visiting a friend in another city, he was really annoyed when an EMA popped up over the map he was looking at on his phone. Another participant reported that during a coding competition he was taking part in (a hackathon) he found EMAs too intrusive. The same participant said that his desire to respond was really different depending on whether he was doing his homework or playing with his phone.

Some participants believed that the kind of question they were asked had a strong impact on their experience: for example, if a question they did not like was prompted, they were much less willing to respond. For others, instead, it was the kind of EMA with respect to the context in which it was prompted that influenced the User Experience the most.

Finally, all of them agreed that choosing the best EMA according to the context would have definitely improved their experience.

6.3 Limitations and Further Works

Even if the results are promising, this study presents some limitations. First, the small number of participants and the short duration of the experimentation make the results hard to generalize. A study with a real-time execution of the algorithm is therefore necessary to draw more specific, generalizable conclusions.

As explained in Section 4.1.1, we created the EMAs based on the PANAS questionnaire. However, the questions employed in this study have not been validated from a psychological perspective. As a consequence, a validated set of questions is necessary to give meaning to the information collected.

If, with these improvements, the project proves successful, the team will try to combine the previous findings on the best moment to prompt an EMA with the results of this study, in order to prompt the best EMA at the best moment.

6.4 Conclusions

Starting from the literature review, we wanted to understand whether it is possible to exploit sensor data from the smartphone to increase EMA compliance. We discovered that users perceive different EMAs differently and that this has an impact on the effort required to answer. Then, dividing the data points into different contexts, we found that the context has an impact on the compliance rate. Finally, we found that most contexts have a different optimal EMA.
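
To make the selection mechanism concrete, the following Python sketch shows one minimal way a per-context bandit can pick the EMA format to prompt: an independent UCB1 learner (Auer et al., 2002) is kept for every discrete context label. The format names, the context tuple, and the binary compliance reward used here are illustrative assumptions, not the exact reward model or implementation employed in this work.

import math
import random
from collections import defaultdict

# Hypothetical EMA formats, named after the ones discussed in the interviews.
EMA_FORMATS = ["radio_buttons", "likert_smile", "hot_spot", "ranking", "multiple_adjectives"]

class PerContextUCB1:
    """One independent UCB1 bandit per discrete context label."""

    def __init__(self, arms):
        self.arms = list(arms)
        # counts[context][arm]: times the arm was prompted in that context
        self.counts = defaultdict(lambda: {a: 0 for a in self.arms})
        # totals[context][arm]: cumulative reward observed for the arm in that context
        self.totals = defaultdict(lambda: {a: 0.0 for a in self.arms})

    def select(self, context):
        counts, totals = self.counts[context], self.totals[context]
        untried = [a for a in self.arms if counts[a] == 0]
        if untried:
            # Prompt every format at least once in a newly seen context.
            return random.choice(untried)
        n = sum(counts.values())
        # UCB1 rule: empirical mean reward plus an exploration bonus.
        return max(self.arms,
                   key=lambda a: totals[a] / counts[a] + math.sqrt(2 * math.log(n) / counts[a]))

    def update(self, context, arm, reward):
        self.counts[context][arm] += 1
        self.totals[context][arm] += reward

# Example usage with a placeholder context and a binary compliance reward.
bandit = PerContextUCB1(EMA_FORMATS)
context = ("on_campus", "walking")   # e.g. derived from location and activity sensors
ema = bandit.select(context)         # format to prompt now
bandit.update(context, ema, 1.0)     # 1.0 if the user answered, 0.0 if the prompt was skipped

In a real deployment, the reward would be replaced by the quantity actually optimized in the study and the context tuple by the features extracted from the sensed data.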

These contributions open the way to a new kind of EMA, sensitive to user needs and, therefore, friendlier. If further analysis confirms the results, a whole new branch of EMAs could be created, possibly enhancing their effectiveness as a monitoring tool. Furthermore, the same results can be adopted in other contexts in which the user has to be interrupted for something compelling, such as the display of notifications tailored to the user.



Appendix A

Certificates of Attendance Online Courses

In this Appendix we report the certificates obtained through the successful completion of the online courses Social/Behavioral Research Investigators and Key Personnel and Responsible Conduct of Research. The authenticity of the certificates can be verified at the links printed on the certificates themselves.



Completion Date 16-Aug-2018
Expiration Date 15-Aug-2021

Record ID 28096041

This is to certify that:

Pietro Crovari

Has completed the following CITI Program course:

Human Research (Curriculum Group)

Group 2 Social / Behavioral Research Investigators and Key Personnel (Course Learner Group)

1 - Basic Course (Stage)

Under requirements set by:

Georgia Institute of Technology

Verify at www.citiprogram.org/verify/?we4ddce1d-c46e-4645-bcff-09c5eff378e0-28096041

Completion Date 15-Aug-2018
Expiration Date N/A

Record ID 28096042

This is to certify that:

Pietro Crovari

Has completed the following CITI Program course:

Responsible Conduct of Research (Curriculum Group)

RCR Basic Course (Course Learner Group)

1 - Basic Course (Stage)

Under requirements set by:

Georgia Institute of Technology

Verify at www.citiprogram.org/verify/?wd9834b37-4899-4991-8ac7-02241505f1fa-28096042



Appendix B

Consent Form

In the next pages the Consent Form of the experimentation is reported in its entirety. Study participants were required to read it and sign it on the last page of the document. Near the signature, the participant code was written as the unique link between the participant and the collected data.


Consent Form Approved by Georgia Tech IRB: October 11, 2018 - March 15, 2019

IF YOU ARE LOCATED IN A EUROPEAN UNION (EU) COUNTRY, YOU ARE NOT PERMITTED TO PARTICIPATE IN THIS STUDY DUE TO THE GENERAL DATA PROTECTION REGULATION (GDPR).

CONSENT FORM
Georgia Institute of Technology

A sub-study of CampusLife

INVESTIGATORS: DR. THOMAS PLOETZ, DR. GREGORY ABOWD, PIETRO CROVARI

You are being asked to be a volunteer in a research study.

Purpose
The aim of this study is to better understand the relationship among the data sensed through a smartphone and the answers received through some questionnaires, called Ecological Momentary Assessment (EMA), periodically prompted on the device itself.

To be relevant, these questionnaires need to be fulfilled several times per day (typically from 8 to 15) and therefore they are perceived as burdensome by the respondent. We want to understand if reducing this burden is possible, particularly exploiting the sensing capabilities of the smartphones.

Inclusion/Exclusion Criteria
You can enrol in this study only if you satisfy all the following requirements:

- You are a student at Georgia Tech
- You currently have an Android smartphone with at least Android version 4.4 installed
- You have access to an internet connection for most of your time, either through mobile data or Wi-Fi

You are not eligible if you meet at least one of the following criteria:

- You are a European Union citizen and/or located in a European Union country
- Your smartphone is a model from the OnePlus brand

Number of Participants
We will recruit a maximum of 50 participants.

Procedures
We will ask you to download an Android application, and use it for two weeks. This application will exploit your smartphone's sensors to collect some data about your habits and activities. In particular:

- Applications in use
- Battery Status
- Incoming/outgoing telephone call in course (NOTE: only if the phone call is happening, NOT the receiver nor the content of the conversation)
- Ambient Light
- Geolocation
- Display Status (turned on/off)
- Ambient Noise and Engagement in a Conversation (raw audio data is NOT stored; instead, only short periodic samples of sound are collected which cannot be reconstructed/interpreted to words or actual speech. We will NOT know the content of the conversation. We will only be able to determine if conversations took place, along with features like pitch, volume, etc.)
- Device usage (High/Average/Low)
- Weather condition
- Activities performed (e.g. walking, sitting, …)
- Network

On top of that, the application will periodically ask you to answer some questions about how you feel in that moment. These questions are very quick to be answered (only multiple-choice answer selection or similar is required) and they will be prompted 8 to 15 times per day.

During an initial meeting we will help you to install the application, we will show you how it works, and we will answer any doubts you may have. During a final interview we will collect your opinion about the application and all your feedback about the two weeks of experimentation.

We will collect your demographic information (i.e., age, gender, nationality, etc.). You have the option of not giving information regarding your demographic information. We will ask you to provide an email address to reach you in case of necessity.

Risks or Discomforts:

The risks associated with participation in the study are deemed to be low. The major risk in this proposal is a loss of confidentiality of data. To minimize the potential for a loss of data confidentiality, we have taken extensive measures to de-identify all the data we collect from you and ensure that the data is stored on secure servers.

To protect your privacy, we will not collect any Personally Identifiable Information (PII) from you during this study other than your email. At the beginning of the study, your email will be associated with a unique passphrase. The database storing the information collected during this study will contain this unique passphrase but will NOT contain your email. The table linking your email to your unique passphrase will be kept separately from the data we collect from you on an encrypted GT server. Only authorized researchers will have access to this server. The table linking your email to your unique passphrase will only be accessed should we need to contact you for follow up purposes. As mentioned in the procedures above, we may contact you regarding the collection of additional data, such as social media. We will provide collection details and obtain your explicit consent before any additional data collection occurs.

The data we collect from you that is not personally identifiable will be stored on a secure Georgia Tech server, indexed only by the passphrase assigned to you at the beginning of the study. This includes all of the following data: activity recognition, GPS, microphone data, device usage data (screen, Wi-Fi, battery, applications), your answers to momentary survey questions.

Study entry and survey completion pose a risk of time lost, and some questions asked may make you feel uncomfortable. However, you will be able to dismiss any questions you do not have time for, or do not feel like answering.

If at any point you decide you want to delete any of the data collected about you or if you want to end participation in the study entirely, please contact: [email protected]

Benefits
There will be some modest benefits to you if you decide to participate in this study. You will gain a better understanding of yourself – having the stimulus to reflect on your emotional well-being more often. Your participation will improve future in-the-wild studies on campus that could, in turn, improve student life.

Compensation
There is no monetary remuneration involved in this study.

In Case of Injury/Harm

If you are injured as a result of being in this study, please contact the Principal Investigator, Dr. Thomas Ploetz, at telephone (404) 226 5011. Neither the Principal Investigator nor Georgia Institute of Technology has made provision for payment of costs associated with any injury resulting from participation in this study.

Participant Rights

Your participation in this study is voluntary. You do not have to be in this study if you don't want to be.

You have the right to change your mind and leave the study at any time without giving any reason and without penalty.

Any new information that may make you change your mind about being in this study will be given to you.

You will be given a copy of this consent form to keep. You do not waive any of your legal rights by signing this consent form.

Questions about the Study

If you have any questions about the study, you may contact Pietro Crovari at [email protected].

Questions about Your Rights as a Research Participant

If you have any questions about your rights as a research participant, you may contact Ms. Melanie Clark, Georgia Institute of Technology Office of Research Integrity Assurance, at (404) 894-6942.

If you sign below, it means that you have read (or have had read to you) the information given in this consent form, and you would like to be a volunteer in this study

Participant Name (printed): _________________________________________

Participant email: _______________________________________

Participant Signature: ___________________________________________     Date: ______________

Signature of Person Obtaining Consent: ____________________________________________     Date: ______________


Participant Code:

PXX



Bibliography

Abouserie, Reda (1994). “Sources and levels of stress in relation to locus of control and self esteem in university students”. In: Educational psychology 14.3, pp. 323–330.

Auer, Peter, Nicolo Cesa-Bianchi, and Paul Fischer (2002). “Finite-time analysis of the multiarmed bandit problem”. In: Machine learning 47.2-3, pp. 235–256.

Banovic, Nikola et al. (2014). “ProactiveTasks: the short of mobile device use sessions”. In: Proceedings of the 16th international conference on Human-computer interaction with mobile devices & services. ACM, pp. 243–252.

Bubeck, Sébastien, Nicolo Cesa-Bianchi, et al. (2012). “Regret analysis of stochastic and nonstochastic multi-armed bandit problems”. In: Foundations and Trends® in Machine Learning 5.1, pp. 1–122.

Chan, Larry et al. (2018). “Students’ Experiences with Ecological Momentary Assessment Tools to Report on Emotional Well-being”. In: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2.1, p. 3.

Choudhary, B (1993). The elements of complex analysis. New Age International.

Christensen, Tamlin Conner et al. (2003). “A practical guide to experience-sampling procedures”. In: Journal of Happiness Studies 4.1, pp. 53–78.

Coughlin, Steven S. (1990). “Recall bias in epidemiologic studies”. In: Journal of Clinical Epidemiology 43.1, pp. 87–91. ISSN: 0895-4356. DOI: https://doi.org/10.1016/0895-4356(90)90060-3. URL: http://www.sciencedirect.com/science/article/pii/0895435690900603.

Cover, Thomas M and Joy A Thomas (2012). Elements of information theory. John Wiley & Sons.

Crawford, John R and Julie D Henry (2004). “The Positive and Negative Affect Schedule (PANAS): Construct validity, measurement properties and normative data in a large non-clinical sample”. In: British journal of clinical psychology 43.3, pp. 245–265.

Feldman Barrett, Lisa and James A Russell (1998). “Independence and bipolarity in the structure of current affect.” In: Journal of personality and social psychology 74.4, p. 967.

Ferreira, Denzil, Vassilis Kostakos, and Anind K Dey (2015). “AWARE: mobile context instrumentation framework”. In: Frontiers in ICT 2, p. 6.

Fisher, Robert and Reid Simmons (2011). “Smartphone interruptibility using density-weighted uncertainty sampling with reinforcement learning”. In: Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on. Vol. 1. IEEE, pp. 436–441.

Ho, Joyce and Stephen S Intille (2005). “Using context-aware computing to reduce the perceived burden of interruptions from mobile devices”. In: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, pp. 909–918.

Hoeffding, Wassily (1963). “Probability inequalities for sums of bounded random variables”. In: Journal of the American statistical association 58.301, pp. 13–30.


Hsieh, Gary et al. (2008). “Using visualizations to increase compliance in experience sampling”. In: Proceedings of the 10th international conference on Ubiquitous computing. ACM, pp. 164–167.

Hudson, Scott et al. (2003). “Predicting human interruptibility with sensors: a Wizard of Oz feasibility study”. In: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, pp. 257–264.

Hunt, Justin and Daniel Eisenberg (2010). “Mental health problems and help-seeking behavior among college students”. In: Journal of adolescent health 46.1, pp. 3–10.

Huttunen, Hanna-Leena et al. (2017). “Understanding elderly care: a field-study for designing future homes”. In: Proceedings of the 19th International Conference on Information Integration and Web-based Applications & Services. ACM, pp. 390–394.

Ickin, Selim et al. (2012). “Factors influencing quality of experience of commonly used mobile applications”. In: IEEE Communications Magazine 50.4.

Intille, Stephen et al. (2016). “µEMA: Microinteraction-based ecological momentary assessment (EMA) using a smartwatch”. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, pp. 1124–1128.

Janssen, Christian P et al. (2015). Integrating knowledge of multitasking and interruptions across different perspectives and research methods.

Kan, Valerii (2018). “STOP: A smartphone-based game for Parkinson’s disease medication adherence”. In:

Karney, Charles FF (2013). “Algorithms for geodesics”. In: Journal of Geodesy 87.1, pp. 43–55.

Kessler, Ronald C et al. (2005). “Lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the National Comorbidity Survey Replication”. In: Archives of general psychiatry 62.6, pp. 593–602.

Khubchandani, Jagdish et al. (2016). “The psychometric properties of PHQ-4 depression and anxiety screening scale among college students”. In: Archives of psychiatric nursing 30.4, pp. 457–462.

Kihlstrom, John F et al. (2000). “Emotion and memory: Implications for self-report”. In: The science of self-report: Implications for research and practice, pp. 81–99.

Kroenke, Kurt, Robert L Spitzer, and Janet BW Williams (2001). “The PHQ-9: validity of a brief depression severity measure”. In: Journal of general internal medicine 16.9, pp. 606–613.

Kuleshov, Volodymyr and Doina Precup (2014). “Algorithms for multi-armed bandit problems”. In: arXiv preprint arXiv:1402.6028.

Larson, Reed and PAEG Delespaul (1992). “Analyzing experience sampling data: A guidebook for the perplexed”. In: The experience of psychopathology: Investigating mental disorders in their natural settings, pp. 58–78.

Larson, Reed and Maryse H Richards (1994). Divergent realities: The emotional lives of mothers, fathers, and adolescents. ERIC.

Löwe, Bernd, Kurt Kroenke, and Kerstin Gräfe (2005). “Detecting and monitoring depression with a two-item questionnaire (PHQ-2)”. In: Journal of psychosomatic research 58.2, pp. 163–171.

Lu, Tyler, Dávid Pál, and Martin Pál (2010). “Contextual multi-armed bandits”. In: Proceedings of the Thirteenth international conference on Artificial Intelligence and Statistics, pp. 485–492.

Martín-Albo, José et al. (2007). “The Rosenberg Self-Esteem Scale: translation and validation in university students”. In: The Spanish journal of psychology 10.2, pp. 458–467.


Mehrotra, Abhinav et al. (2015). “Designing content-driven intelligent notification mechanisms for mobile applications”. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, pp. 813–824.

Murff, Sharon Hall (2005). “The impact of stress on academic success in college students”. In: ABNF Journal 16.5, p. 102.

Obuchi, Mikio et al. (2016). “Investigating interruptibility at activity breakpoints using smartphone activity recognition API”. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct. ACM, pp. 1602–1607.

Okoshi, Tadashi et al. (2016). “Towards attention-aware adaptive notification on smart phones”. In: Pervasive and Mobile Computing 26, pp. 17–34.

Organization, World Health et al. (2007). “World Health Organization global burden of disease”. In: Geneva: World Health Organization.

Pejovic, Veljko, Mirco Musolesi, and Abhinav Mehrotra (2015). “Investigating the role of task engagement in mobile interruptibility”. In: Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct. ACM, pp. 1100–1105.

Pollak, John P, Phil Adams, and Geri Gay (2011). “PAM: a photographic affect meter for frequent, in situ measurement of affect”. In: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, pp. 725–734.

Reis, Harry T and Shelly L Gable (2000). “Event-sampling and other methods for studying everyday experience”. In: Handbook of research methods in social and personality psychology, pp. 190–222.

Robinson, John P, Phillip R Shaver, and Lawrence S Wrightsman (2013). Measures of personality and social psychological attitudes: Measures of social psychological attitudes. Vol. 1. Academic Press.

Rosenberg, Morris (2015). Society and the adolescent self-image. Princeton University Press.

Ross, Michael (1989). “Relation of implicit theories to the construction of personal histories.” In: Psychological review 96.2, p. 341.

Saha, Koustuv et al. (2017). “Inferring mood instability on social media by leveraging ecological momentary assessments”. In: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1.3, p. 95.

Scollon, Christie Napa, Chu-Kim Prieto, and Ed Diener (2009). “Experience sampling: promises and pitfalls, strength and weaknesses”. In: Assessing well-being. Springer, pp. 157–180.

Shiffman, Saul, Arthur A Stone, and Michael R Hufford (2008). “Ecological momentary assessment”. In: Annu. Rev. Clin. Psychol. 4, pp. 1–32.

Spitzer, Robert L et al. (1999). “Validation and utility of a self-report version of PRIME-MD: the PHQ primary care study”. In: Jama 282.18, pp. 1737–1744.

Stone, Arthur A and Saul Shiffman (1994). “Ecological momentary assessment (EMA) in behavioral medicine.” In: Annals of Behavioral Medicine.

Turner, Liam D, Stuart M Allen, and Roger M Whitaker (2015a). “Interruptibility prediction for ubiquitous systems: conventions and new directions from a growing field”. In: Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing. ACM, pp. 801–812.

— (2015b). “Push or delay? decomposing smartphone notification response behaviour”. In: Human Behavior Understanding. Springer, pp. 69–83.

Villar, Sofía S, Jack Bowden, and James Wason (2015). “Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges”. In: Statistical science: a review journal of the Institute of Mathematical Statistics 30.2, p. 199.


Watson, David, Lee Anna Clark, and Auke Tellegen (1988). “Development and validation of brief measures of positive and negative affect: the PANAS scales.” In: Journal of personality and social psychology 54.6, p. 1063.

Zajacova, Anna, Scott M Lynch, and Thomas J Espenshade (2005). “Self-efficacy, stress, and academic success in college”. In: Research in higher education 46.6, pp. 677–706.

Zhang, Xiaoyi, Laura R Pina, and James Fogarty (2016). “Examining unlock journaling with diaries and reminders for in situ self-report in health and wellness”. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, pp. 5658–5664.

Zhou, Li (2015). “A survey on contextual multi-armed bandits”. In: arXiv preprint arXiv:1508.03326.