IIR 2017, Lugano, Switzerland

Empathic inclination from digital footprints*

Marco Polignano, Pierpaolo Basile, Gaetano Rossiello, Marco de Gemmis and Giovanni Semeraro

University of Bari “Aldo Moro”, Dept. of Computer Science, Italy

* These results are already published in “Inclination to Empathy from Social Media Footprints” in proceedings of User Modelling, Adaptation and Personalization, FIIT STU, Bratislava, Slovakia, July 2017 (UMAP 2017), DOI: http://dx.doi.org/10.1145/3079628.3079639

Upload: marco-polignano

Post on 23-Jan-2018



Hello! I am Marco Polignano

You can find me at [email protected]

Intelligent Information Access

• Affect detection and extraction

• Recommender Systems

• Information Filtering

• Hybrid Recommendation Strategies

• Machine Learning Techniques for Recommender Systems

http://www.di.uniba.it/~swap/index.php?n=Membri.MarcoPolignano

http://www.di.uniba.it/~swap/

Outline

• The role of affects in human reasoning

• Digital Footprints on the Internet and Social Media

• Prediction Model of Empathy Inclination

• Experimental Session

• Discussion of Results

• Recap and future work

Human Decisions

A person facing a choice has to consider different solutions and make a decision.

Traditional approaches to behavioural decision making consider choosing a rational process that estimates which of the various alternatives would yield the most positive consequences. A modern view also considers other influences on this process, such as emotions, feelings and sentiment.

These influences belong to the area of feelings and emotions.

Area of feelings and emotions

Each person is influenced differently by affects, without really knowing the reason. Some psychological studies exist, but they are restricted to specific contexts such as gambling or high-risk situations. However, many new models have been proposed in recent years.

• Less open to changes

• Less social

• More conservative

• Take less risks

• More connected with the past

… This doesn’t mean that standard preferences don’t matter.

T. E. Nygren, A. M. Isen, P. J. Taylor, J. Dulin, The influence of positive affect on the decision rule in risk situations: Focus on outcome (and especially avoidance of loss) rather than probability, Organizational Behavior and Human Decision Processes 66 (1) (1996) 59–72.

How can I balance them?

People are influenced by affects with different intensity, and empathy is a meaningful indicator of the impact of affective aspects on the user's everyday life.

A system should be able to:

(a) detect user affects,

(b) understand the influence of affects on users,

(c) use them in a core reasoning process,

(d) generate actions that coherently support users, considering their affective state.

A system able to show these functions is Emotionally Intelligent!*

* Mayer, J. D., Salovey, P., Caruso, D. R., & Sitarenios, G. (2003). Measuring emotional intelligence with the MSCEIT V2.0. Emotion, 3(1), 97.

Empathy?

“Empathy is the ability to understand and be influenced by self and other emotions. It can be correlated with social self-confidence, even-temperedness, sensitivity and nonconformity.* Empathy is not a self-trait of personality, it is considered as an affective-cognitive process over a specific situation.”**

* Hogan, Robert. "Development of an empathy scale." Journal of consulting and clinical psychology 33.3 (1969): 307.

** Zillmann, Dolf. "Empathy: Affect from bearing witness to the emotions of others." Responding to the screen: Reception and reaction processes (1991): 135-167.

Can I detect psychological aspects?

They can be detected using different strategies: questionnaires, face and voice analysis, text analysis, observation of biological parameters… Are they accurate? Not always, but non-intrusive strategies are becoming very accurate.*

A modern approach is based on the analysis of all the data that the user leaves on social networks. This strategy is called Social Footprints Analysis.

* Park, G., Schwartz, H.A., Eichstaedt, J.C., Kern, M.L., Kosinski, M., Stillwell, D.J., Ungar, L.H., Seligman, M.E.: Automatic personality assessment through social media language. J. Pers. Soc. Psychol. 108(6), 934 (2015)

We need approaches for analyzing text and all the information that digitally describes the user.

• Machine Learning Approaches (Multinomial Naive Bayes, SMO-SVM, Random Forest, …)

• Thesaurus-based Approaches (WordNet-Affect, SentiWordNet)

• …

A very large source for profiling users…

We defined a model for predicting the Empathy Inclination of users from Social Media Sites

“Privacy is dead and social media hold the smoking gun”

Pete Cashmore, Mashable CEO

The empathy prediction model

Each user Ui is represented as the concatenation of five feature vectors.

Each vector captures a particular aspect of the user profile that is important because of its influence on most aspects of the area of feelings and emotions.

Our starting point?

Dataset of myPersonality Project*

http://mypersonality.org/

More than 4,000,000 individual Facebook profiles

More than 6,000,000 test results

More than 36,000,000 user-like pairs

22 million status updates of 154k users

224 million records of friendship connections

* Kosinski, M., Matz, S., Gosling, S., Popov, V. & Stillwell, D. (2015) Facebook as a Social Science Research Tool: Opportunities, Challenges, Ethical Considerations and Practical Guidelines. American Psychologist.

Pre-processing operations

1. Construction of Word2Vec distributional space

The word2vec model is learned over the 22 million user status updates of the “mypersonality” dataset, in an attempt to discover the semantics behind social media user language.

For each user, a pseudo-document that contains all her posts is created. The pseudo-document is turned into a feature vector using the mean aggregation strategy over all the word embeddings encountered while scanning the document.

Moreover, we divide the whole vocabulary of word2vec vectors involved in the user’s posts into clusters (k-means), which should represent topics of discussion.

Status Updates → Pseudo Document → 200-dimensional feature vector
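The mean aggregation step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `user_vector`, the whitespace tokenizer, and the toy 3-dimensional embeddings are all assumptions (the slides use a 200-dimensional word2vec space).

```python
from typing import Dict, List


def user_vector(posts: List[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the embeddings of all known words in a user's pseudo-document."""
    # The pseudo-document is all of the user's posts joined together.
    tokens = " ".join(posts).lower().split()
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue  # words without an embedding are skipped
        for i in range(dim):
            total[i] += vec[i]
        count += 1
    if count == 0:
        return total  # no known word: fall back to the zero vector
    return [x / count for x in total]


# Toy embeddings (hypothetical values, 3 dimensions instead of 200).
toy_embeddings = {
    "happy": [1.0, 0.0, 2.0],
    "sad": [-1.0, 0.0, 0.0],
    "day": [0.0, 3.0, 1.0],
}
vec = user_vector(["Happy day", "sad day today"], toy_embeddings, dim=3)
```

Here "today" has no embedding and is ignored, so the user vector is the mean of the four remaining word vectors.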

Pre-processing operations

2. User filtering

We obtained 903 users from myPersonality for whom the following information is available:

• Demographic Data: general data about the users, including age, gender, …

• BIG5 Personality Scores: personality traits of the users, including Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism

• Facebook activity: statistics about the activity of the users, including the number of the user’s likes

• User-concept SVD reduced data (SVD): 100-dimensional vectors of weights for latent concepts associated with the users

• User-topic membership data (LDA): 600-dimensional vectors of weights of topics of interest associated with the users

• Facebook Status Updates: posts by the users on their personal profiles

• Empathy Quotient Scale (EQS): results of the empathy level questionnaire
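One way to read this filtering step is as an intersection of user IDs across the data sources listed above. The sketch below assumes each source can be reduced to a set of user IDs; the source names and IDs are hypothetical and do not reflect the actual myPersonality file layout.

```python
from typing import Dict, Set


def users_with_complete_data(sources: Dict[str, Set[str]]) -> Set[str]:
    """Return the user IDs that appear in every data source."""
    id_sets = list(sources.values())
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


# Hypothetical user IDs per source.
sources = {
    "demographics": {"u1", "u2", "u3", "u4"},
    "big5": {"u1", "u2", "u3"},
    "status_updates": {"u2", "u3", "u4"},
    "empathy_quotient": {"u2", "u3", "u5"},
}
complete = users_with_complete_data(sources)
```

Only users present in all sources survive, which is how a dataset of millions of profiles can shrink to a few hundred usable users.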

Pre-processing operations

3. Data Representation

We represent each user as a vector of 1088 numeric features. The nominal features have been binarized.

Feature group                    Features
Demographic Data                 120
Personality                      5
FB Activity                      12
SVD+LDA                          700
Word2Vec of Posts + Clusters     250
Empathy                          1
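The representation above can be illustrated with a small sketch of binarization (one 0/1 column per nominal category) followed by concatenation. The group sizes come from the slide and sum to 1088, counting the empathy score. The key names, the helper functions, and the toy categories and values are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Group sizes as listed on the slide; the key names are hypothetical labels.
GROUP_SIZES: Dict[str, int] = {
    "demographic": 120,
    "personality": 5,
    "fb_activity": 12,
    "svd_lda": 700,
    "w2v_clusters": 250,
    "empathy": 1,
}


def binarize(value: str, categories: Sequence[str]) -> List[float]:
    """One-hot encode a nominal value: one 0/1 column per category."""
    return [1.0 if value == category else 0.0 for category in categories]


def concatenate(groups: Dict[str, List[float]], order: Sequence[str]) -> List[float]:
    """Concatenate per-group vectors into one flat row, in a fixed order."""
    row: List[float] = []
    for name in order:
        row.extend(groups[name])
    return row


# Toy example with made-up categories and values.
status = binarize("single", ["single", "married", "separated"])
toy_row = concatenate(
    {"demographic": status + [27.0], "personality": [3.5, 4.0, 2.5, 4.5, 1.5]},
    order=["demographic", "personality"],
)
total_features = sum(GROUP_SIZES.values())
```

Binarization explains why a handful of demographic attributes can expand into 120 numeric columns: every distinct nominal value becomes its own feature.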

Experimental Setup

Configurations of the experiment

Research Questions and setup

RQ1: Is it possible to predict empathy from social media footprints?

RQ2: What are the most important features to consider for improving the prediction accuracy?

We exploit three different regression algorithms:

1. Linear Regression (Lr)

2. Simple Regression (Sr)

3. SVM Regression with the SMO algorithm (SMO), with different kernel configurations

For the SMO we used the polynomial kernel (SMOpoly) and the Radial Basis Function (RBF) kernel (SMOrbf), varying the c parameter from 1 to 8.
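As an illustration of the simplest of these learners, the sketch below assumes that "Simple Regression" means a least-squares line fitted on the single feature that gives the lowest squared error. The slides do not define it, so this reading, the function names, and the toy data are assumptions.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for one feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0.0:
        return 0.0, mean_y  # constant feature: predict the mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simple_regression(X: List[List[float]], y: List[float]) -> Tuple[int, float, float]:
    """Pick the single feature whose fitted line has the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        slope, intercept = fit_line(column, y)
        sse = sum((slope * a + intercept - b) ** 2 for a, b in zip(column, y))
        if best is None or sse < best[0]:
            best = (sse, j, slope, intercept)
    _, j, slope, intercept = best
    return j, slope, intercept


# Toy data: the target equals 2 * feature_0 + 1, while feature_1 is unrelated.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
y = [3.0, 5.0, 7.0, 9.0]
feature, slope, intercept = simple_regression(X, y)
```

Under this reading, the model depends on one feature only, which would explain why its error does not change when the other features are filtered out.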

Metrics and Baseline

We adopted the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE) as evaluation metrics over an interval of possible values between 0 and 80. The evaluation protocol was 10-fold cross-validation.
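A minimal sketch of the two metrics and of a 10-fold split follows. The fold assignment used here (every k-th index, no shuffling) is an assumption; the slides do not say how the folds were built.

```python
import math
from typing import List, Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the average squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs; each index is tested once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# 903 users, as in the filtered dataset, split into 10 folds.
splits = k_fold_indices(903, k=10)
```

RMSE penalizes large errors more heavily than MAE, so reporting both shows whether a model makes a few large mistakes or many small ones.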

We evaluated our approach against the following baselines:

Baseline    Value Predicted    MAE       RMSE
Majority    8                  7.4784    10.8258
Avg. EQS    13.9169            6.8457    9.0757

The former always predicts the most frequent value in the dataset (Majority), while the latter computes the empathy score as the simple average of the EQS observed in the dataset (Avg. EQS).
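Both baselines are constant predictors and can be sketched in a few lines. The scores below are made up for illustration and are not taken from the dataset; the function names are also assumptions.

```python
from collections import Counter
from statistics import mean
from typing import Sequence


def majority_baseline(scores: Sequence[float]) -> float:
    """Most frequent score in the data (the Majority baseline)."""
    return Counter(scores).most_common(1)[0][0]


def average_baseline(scores: Sequence[float]) -> float:
    """Simple average of the observed scores (the Avg. EQS baseline)."""
    return mean(scores)


def mae_of_constant(value: float, scores: Sequence[float]) -> float:
    """MAE obtained when the same value is predicted for every user."""
    return sum(abs(s - value) for s in scores) / len(scores)


# Hypothetical empathy scores.
scores = [8, 8, 12, 20, 8, 30, 15, 11]
majority = majority_baseline(scores)
average = average_baseline(scores)
```

Any learned model is only useful if it beats these constant predictions on both MAE and RMSE.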

Results of evaluation

                      All Features          Filtered Features – CfsSubsetEval
Approach    c         MAE       RMSE        MAE       RMSE
SMOpoly     1         12.7137   19.1565     5.714     7.8407
SMOpoly     2         15.5265   23.9027     5.7227    7.8445
SMOpoly     4         -         -           5.7167    7.8412
SMOpoly     8         -         -           5.725     7.8501
SMOrbf      1         5.9101    8.2341      5.6894    7.9163
SMOrbf      2         5.9543    8.2432      5.6673    7.8631
SMOrbf      4         6.1049    8.3623      5.6701    7.8275
SMOrbf      8         6.539     8.7748      5.686     7.8236
Lr          -         22.7929   34.4679     5.7854    7.7269
Sr          -         6.1045    8.233       6.1045    8.233
Majority    8         7.4784    10.8258     7.4784    10.8258
Avg. EQS    13.9169   6.8457    9.0757      6.8457    9.0757

For the Majority and Avg. EQS rows, the second column reports the predicted value rather than c.

Legend of results: best result considering both MAE and RMSE; best result considering individual values of MAE and RMSE
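The "Filtered Features" columns rely on CfsSubsetEval, a correlation-based feature subset evaluator. The sketch below shows the merit heuristic of correlation-based feature selection: a subset scores well when its features correlate with the target but not with each other. Using absolute Pearson correlation is an assumption made here for simplicity; the actual evaluator may compute correlations differently, and the toy data is made up.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(features: List[Sequence[float]], target: Sequence[float]) -> float:
    """Merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff) for a subset of k features."""
    k = len(features)
    # Mean feature-target correlation.
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    if k == 1:
        return r_cf
    # Mean feature-feature correlation over all pairs.
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Toy data: f1 predicts the target perfectly, f2 is uncorrelated with it.
target = [2.0, 4.0, 6.0, 8.0]
f1 = [1.0, 2.0, 3.0, 4.0]
f2 = [1.0, -1.0, -1.0, 1.0]
merit_f1 = cfs_merit([f1], target)
merit_both = cfs_merit([f1, f2], target)
```

Adding the uninformative feature lowers the merit, which is why such a filter can discard most of the original features.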

Outcome 1

RQ1: Is it possible to predict empathy from social media footprints?

The feature selection process left 37 relevant features. If we analyze some of them, we obtain interesting statistics:

• 14% Atheist

• 3% Separated

• 8% from country (AG, EG, KW, HN, AR, SR)

• 75% extrovert and agreeable

• 29% Atheist

• 11% Separated

• 11% from country (AG, EG, KW, HN, AR, SR)

• 38% extrovert and agreeable

Low Inclination to Empathy: Score < 30

High Inclination to Empathy: Score >= 30

Second Round: Ablation Test


Features          MAE       RMSE      dif. MAE %    dif. RMSE %
all – SMOrbf 1    5.9206    8.2949
- activity        5.9072    8.2811    0.2263        0.1664
- demographic     5.8717    8.311     0.8259        -0.1941
- personality     6.4908    9.0482    -9.6308       -9.0815
- LDA             5.9261    8.2921    -0.0929       0.0338
- SVD             5.9261    8.2921    -0.0929       0.0338
- LDA&SVD         5.8988    8.1903    0.3682        1.261
- W2V             5.9523    8.2768    -0.5354       0.2182
- W2V clusters    5.9096    8.2643    0.1858        0.3689
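The percentage differences in the ablation table are consistent with the relative change against the all-features run, (all - ablated) / all * 100, so a negative value means the error grew when the feature group was removed. The formula is inferred from the reported numbers; the function name is an assumption.

```python
def pct_difference(all_features_error: float, ablated_error: float) -> float:
    """Relative error change in percent; negative means removal made things worse."""
    return (all_features_error - ablated_error) / all_features_error * 100.0


# Values taken from the ablation table (all-features run: MAE 5.9206, RMSE 8.2949).
activity_mae_diff = pct_difference(5.9206, 5.9072)
personality_mae_diff = pct_difference(5.9206, 6.4908)
personality_rmse_diff = pct_difference(8.2949, 9.0482)
```

Removing the personality features raises the MAE by roughly 9.6%, far more than any other group.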

Outcome 2

RQ2: What are the most important features to consider for

improving the prediction accuracy?

The removal of personality features drastically reduces the accuracy of the approach, considering both MAE and RMSE.

Recap and future work

• We presented a study of the psychological area of feelings and emotions in the literature

• We investigated the phenomenon of digital footprints

• We described a model for predicting the user’s inclination to be empathic from data on social media

• We discussed the results of the experimental session, showing the importance of Personality Traits and a good accuracy in the prediction task

• We are working on including these findings as part of the user profile and as part of an Affective-Based recommendation strategy

Thanks!

Any questions?

You can find me at:

[email protected]

http://www.di.uniba.it/~swap/index.php?n=Membri.MarcoPolignano