Profiling User Interests on the Social Semantic Web Ph.D. Viva Fabrizio Orlandi

Upload: fabrizio-orlandi

Post on 23-Aug-2014


DESCRIPTION

Fabrizio Orlandi's PhD Viva @Insight NUI Galway (ex-DERI) - 31/03/2014. Supervisors: Alexandre Passant and John G. Breslin. Examiners: Fabien Gandon and Stefan Decker

TRANSCRIPT

Page 1: Profiling User Interests on the Social Semantic Web

Profiling User Interests on the Social Semantic Web

Ph.D. Viva

Fabrizio Orlandi

Page 2: Profiling User Interests on the Social Semantic Web

2

Context: Personalisation

Page 3: Profiling User Interests on the Social Semantic Web

3

Problem

Page 4: Profiling User Interests on the Social Semantic Web

4

Goal

Page 5: Profiling User Interests on the Social Semantic Web

Challenges

1 – Heterogeneous data sources

[Figure: example interests (Sport, CEV Volleyball Cup, Music, Heavy Metal, Mastodon, Atlanta) coming from different sources: Microblog? Social Networking Service?]

5 / 37

Page 6: Profiling User Interests on the Social Semantic Web

Challenges

2 – Lack of provenance

[Figure: example interests (Sport, CEV Volleyball Cup, Music, Heavy Metal, Mastodon, Atlanta) annotated with the open questions: Who? What? Where? How?]

6 / 37

Page 7: Profiling User Interests on the Social Semantic Web

Challenges

3 – Semantics of entities of interest

[Figure: example interests (Sport, CEV Volleyball Cup, Music, Heavy Metal, Mastodon, Atlanta) annotated with the open questions: Semantics? Pragmatics? Relevance?]

7 / 37

Page 8: Profiling User Interests on the Social Semantic Web

Research Questions

1. Aggregation of Social Web data: How can we aggregate and represent user data distributed across heterogeneous social media systems for profiling user interests?

2. Provenance of data for user profiling: What is the role of provenance on the Social Web and on the Web of Data and how to leverage its potential for user profiling?

3. Semantic enrichment of user profiles and personalisation: How to combine data from the Social and Semantic Web for enriching user profiles of interests and deploying them to different personalisation tasks?

8 / 37

Page 9: Profiling User Interests on the Social Semantic Web

Research Goal

How can we collect, represent, aggregate, mine, enrich and deploy user profiles of interests on the Social Web for multi-source personalisation?

9 / 37

Page 10: Profiling User Interests on the Social Semantic Web

Methodology

10 / 37

Page 11: Profiling User Interests on the Social Semantic Web

1. Aggregation of Social Web data: How can we aggregate and represent user data distributed across heterogeneous social media systems for profiling user interests?

11 / 37

Page 12: Profiling User Interests on the Social Semantic Web

Aggregation of Social Web Data

• Modelling solution for Social Web data and user profiles
• Based on SIOC, FOAF and extensions
• Experiments on wikis

[Orlandi, Passant. WikiSym. ACM. 2010.]

12 / 37

Page 13: Profiling User Interests on the Social Semantic Web

Example

“Mastodon is the best heavy metal band from Atlanta… Can’t wait to see them live again!”
Entities: Music, Heavy Metal, Mastodon, Atlanta

“Trentino vs Lugano about to start - Diatec youngster to impress again in CEV Champions League #volleyball”
Entities: CEV Champions League, Volleyball

User likes RDF and SemanticWeb on Facebook
Entities: Semantic Web, RDF

• Natural language processing tools for entity extraction (Zemanta & Spotlight)
• Frequency + time-decay weighting schemes

13 / 37

Page 14: Profiling User Interests on the Social Semantic Web

14

Aggregation and Mining of Interests

7 types of user profiling strategies:

• 2 types of DBpedia entities: Categories vs. Resources
• 2 types of weighting scheme for category-based methods
  - Cat1: Interests Weight Propagation
  - Cat2: Interests Weight Propagation w/ Cat. Discount
• 2 types of exponential Time Decay function
  - Short mean lifetime: 120 days
  - Long mean lifetime: 360 days
• 1 “bag-of-words” (Tag-based) state-of-the-art approach
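The combination of frequency and exponential time decay listed on this slide can be illustrated with a minimal Python sketch. It assumes each mention of an entity contributes a weight of exp(-age / mean lifetime), and that an entity's score is the sum over its mentions; the slides do not give the exact formula used in the thesis, and the function names and the mention log below are hypothetical.

```python
import math
from collections import defaultdict


def decay_weight(age_days, mean_lifetime_days):
    """Exponential decay: 1.0 for a mention made today, 1/e after one mean lifetime."""
    return math.exp(-age_days / mean_lifetime_days)


def score_interests(mentions, mean_lifetime_days):
    """Sum the decayed weights of all mentions per entity (frequency + time decay).

    mentions: iterable of (entity, age_in_days) pairs.
    """
    scores = defaultdict(float)
    for entity, age_days in mentions:
        scores[entity] += decay_weight(age_days, mean_lifetime_days)
    return dict(scores)


# Hypothetical mention log: (entity, days since the mention)
mentions = [
    ("Mastodon_(band)", 2),
    ("Mastodon_(band)", 300),
    ("RDF", 10),
    ("Volleyball", 400),
]

short_profile = score_interests(mentions, 120)  # short mean lifetime
long_profile = score_interests(mentions, 360)   # long mean lifetime

for entity in sorted(short_profile):
    print(entity, round(short_profile[entity], 3), round(long_profile[entity], 3))
```

With the longer mean lifetime, old mentions such as the 400-day-old one keep more of their weight, so the profile changes more slowly.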

Page 15: Profiling User Interests on the Social Semantic Web

Aggregation and Mining of Interests

Evaluation

• User study: 21 users rating their user profiles from Twitter & Facebook
• 210 ratings for each of the 7 different profiling methods

[Figure: bar chart comparing Resources, Categories and Tags on P@10 and AVG Score; axis from 0 to 1]

Key findings

• DBpedia resource-based profiles outperform DBpedia category-based and tag-based profiles.
• Best strategy: Resources + Frequency & Slow Time Decay weighting scheme

[Orlandi, Breslin, Passant. I-Semantics. ACM. 2012.]

15 / 37

Page 16: Profiling User Interests on the Social Semantic Web

1. Aggregation of Social Web data: How can we aggregate and represent user data distributed across heterogeneous social media systems for profiling user interests?

2. Provenance of data for user profiling: What is the role of provenance on the Social Web and on the Web of Data and how to leverage its potential for user profiling?

16 / 37

Page 17: Profiling User Interests on the Social Semantic Web

Provenance of Data

Motivation: use of provenance information as the core of the profiling heuristics, to improve the mining of user interests and semantic enrichment

Data Provenance as the history, the origins and the evolution of data

• Who created/modified it? When?
• What is the content? Where is it located?
• How and why was it created? Which tools and processes were used?

Provenance as the “bridge” between the Social Web and the Web of Data, e.g. Wikipedia/DBpedia

17 / 37

Page 18: Profiling User Interests on the Social Semantic Web

Use Case: Provenance on Wikis

Provenance on the Social Web for the Web of Data

• A semantic model to represent provenance information in wikis
• A software architecture to extract provenance from Wikipedia
• An application that uses and exposes provenance data to compute measures and statistics on Wikipedia articles

[Orlandi, Champin, Passant. SWPM at ISWC. 2010.]

18 / 37

Page 19: Profiling User Interests on the Social Semantic Web

Provenance on the Social Web

19 / 37

Page 20: Profiling User Interests on the Social Semantic Web

Use Case: Provenance on DBpedia

Provenance on the Web of Data for the Social Web

• Using detailed provenance information extracted from Wikipedia, we can also compute provenance for DBpedia resources.
• By analysing the “diffs” between the revisions of Wikipedia articles and the users' contributions, we identify the edits on Wikipedia that resulted in a change in the related DBpedia resource.
• We built a model and an application that show provenance information for each DBpedia triple that results from users' edits on Wikipedia.

[Orlandi, Passant. Journal of Web Semantics. 2011]

20 / 37

Page 21: Profiling User Interests on the Social Semantic Web

Semantic provenance in DBpedia

• Using detailed provenance information extracted from Wikipedia, we can also compute provenance for DBpedia resources.

• By analysing the “diffs” between the revisions of Wikipedia articles and the users' contributions, we identify the edits on Wikipedia that resulted in a change in the related DBpedia resource.

• We built an application that shows provenance information for each DBpedia triple that results from users' edits on Wikipedia.

21 / 37
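The idea of attributing DBpedia triples to Wikipedia edits through revision diffs can be sketched as follows. This is a simplified illustration, not the thesis implementation: it assumes the triples extracted from each article revision are already available as a set, and the `Revision` type, the editor names and the example resource are all hypothetical.

```python
from typing import NamedTuple


class Revision(NamedTuple):
    rev_id: int
    editor: str
    triples: frozenset  # (subject, predicate, object) tuples extracted from this revision


def triple_provenance(revisions):
    """Walk revisions in chronological order and record which edit added or removed each triple."""
    events = []
    previous = frozenset()
    for rev in revisions:
        for triple in sorted(rev.triples - previous):
            events.append(("added", triple, rev.editor, rev.rev_id))
        for triple in sorted(previous - rev.triples):
            events.append(("removed", triple, rev.editor, rev.rev_id))
        previous = rev.triples
    return events


# Hypothetical revision history of one article
country = ("db:Example_Town", "dbo:country", "db:Ireland")
population = ("db:Example_Town", "dbo:populationTotal", "12345")
history = [
    Revision(1, "alice", frozenset({country})),
    Revision(2, "bob", frozenset({country, population})),
    Revision(3, "carol", frozenset({country})),
]

events = triple_provenance(history)
for action, triple, editor, rev_id in events:
    print(rev_id, editor, action, triple)
```

Each event links one triple to the revision and editor responsible for it, which is the kind of per-triple provenance record the slide describes.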

Page 22: Profiling User Interests on the Social Semantic Web

Provenance for Profiling Interests

Different provenance features to support interest mining

• Not only: authorship and temporal features
• But also: social media source, object, type of action,

22 / 37

Page 23: Profiling User Interests on the Social Semantic Web

Provenance for Profiling Interests

User study: 27 users on Twitter and Facebook. They evaluated their aggregated and provenance-aware user profiles.

E/I  Source  Social Feature      Score
E    FB      education           4.62
E    FB      workplace           4.60
I    TW      followees’ posts    4.03
I    FB      checkins            3.95
E    FB      interests           3.95
E    FB      likes               3.92
I    TW      favourite posts     3.76
I    TW      retweets            3.76
I    TW      posts               3.61
I    TW      replies             3.52
I    FB      status updates      3.50
I    FB      media actions       3.24
I    FB      comments            2.56
I    FB      direct posts        2.37

AVG scores from 1 to 5

• Locations, explicit profile info and also followees’ posts provide better accuracy for mining user interests
• Interests stated explicitly by users produce user profiles 20% more accurate than interests expressed implicitly

[Orlandi, Kapanipathi, Sheth, Passant. IEEE/ACM WI. 2013]

23 / 37

Page 24: Profiling User Interests on the Social Semantic Web

2. Provenance of data for user profiling: What is the role of provenance on the Social Web and on the Web of Data and how to leverage its potential for user profiling?

3. Semantic enrichment of user profiles and personalisation: How to combine data from the Social and Semantic Web for enriching user profiles of interests and deploying them to different personalisation tasks?

24 / 37

Page 25: Profiling User Interests on the Social Semantic Web

Semantic Enrichment

[Figure: DBpedia graph linking db:Montreal, db:Quebec, db:Gilles_Villeneuve, db:Ferrari and db:Formula_1 through the properties dbo:wikiPageWikiLink, dbo:birthPlace and dbp:largestcity]

25 / 37
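A minimal sketch of this kind of enrichment is a one-hop expansion of a profile entity over Linked Data triples. The entity and property names are taken from the slide, but the diagram did not survive extraction, so which entity is linked to which is an assumption made purely for illustration; the helper names are hypothetical as well.

```python
# Assumed edges: the entity and property names come from the slide, but the
# pairing of subjects and objects is a guess made for this illustration.
TRIPLES = [
    ("db:Gilles_Villeneuve", "dbo:birthPlace", "db:Quebec"),
    ("db:Quebec", "dbp:largestcity", "db:Montreal"),
    ("db:Gilles_Villeneuve", "dbo:wikiPageWikiLink", "db:Ferrari"),
    ("db:Gilles_Villeneuve", "dbo:wikiPageWikiLink", "db:Formula_1"),
]


def neighbours(entity, triples):
    """Return (property, entity) pairs one hop away, following edges in either direction."""
    linked = set()
    for subject, prop, obj in triples:
        if subject == entity:
            linked.add((prop, obj))
        elif obj == entity:
            linked.add((prop, subject))
    return linked


def enrich(profile, triples):
    """Map each interest in the profile to the sorted list of entities it is linked to."""
    return {entity: sorted(neighbours(entity, triples)) for entity in profile}


enriched = enrich(["db:Quebec"], TRIPLES)
for prop, related in enriched["db:Quebec"]:
    print("db:Quebec", prop, related)
```

In a real setting the triples would come from a Linked Data source rather than a hard-coded list, and the expansion would need filtering, since not every linked entity is a useful interest.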

Page 26: Profiling User Interests on the Social Semantic Web

Example

Are all the extracted entities useful for personalisation?

How are concepts/entities being used on the Social Web? (Pragmatics)

[Figure: the extracted entities (Music, Heavy Metal, Mastodon (band), CEV Champions League, Volleyball, Semantic Web, RDF), each annotated with one of the labels: very abstract, very popular; abstract and popular; abstract and not popular; very popular; specific and not popular; specific and time-dependent on events, etc. (two entities)]

26 / 37

Page 27: Profiling User Interests on the Social Semantic Web

Characterising Concepts of Interest

Novel measures for the characterisation and semantic expansion of concepts of interest. Enrichment of entity-based user profiles for personalisation.

• Popularity of concepts on the Social Web (using Twitter): How popular is an entity on the Social Web? How frequently is it mentioned/used at that point in time?
• Trend and temporal dynamics (using Wikipedia page views): The trend and evolution of the frequency of mentions of an entity on the Social Web (i.e. popularity over time)
• Specificity and categorisation of entities of interest (using LOD): The level of abstraction that an entity has in a common conceptual schema shared by humans

27 / 37

Page 28: Profiling User Interests on the Social Semantic Web

Requirements

Use case: real-time personalisation of Social Web streams

1. Real-time computation of the dimensions

2. Results constantly up to date with the real world

3. Knowledge base and domain independent approach

28 / 37

Page 29: Profiling User Interests on the Social Semantic Web

Characterising Concepts of Interest

• Popularity?
• Trendy and Stable?
• Specificity?

[Orlandi, Kapanipathi, Sheth, Passant. IEEE/ACM WI. 2013]

29 / 37

Page 30: Profiling User Interests on the Social Semantic Web

Real-time Semantic Personalisation of Social Web Streams

“SPOTS”: A methodology for real-time personalisation of any large social stream

• Automatic dynamic generation of multi-source user profiles of interests.
• Semantic enrichment of concepts of interest with provenance and Linked Data info.
• Ranking and selection of the interests according to their relevance for the user and for the personalisation use case.
• Informativeness measures for posts to filter a large social stream.

Evaluation of the approach on the public Twitter stream

• Against Twitter #Discover: from 192% increase in accuracy

30 / 37

Page 31: Profiling User Interests on the Social Semantic Web

Real-time Semantic Personalisation of Social Web Streams

[Kapanipathi, Orlandi, Sheth, Passant. SPIM at ISWC 2011.]

31

Page 32: Profiling User Interests on the Social Semantic Web

Evaluation on SPOTS

User study to evaluate the impact of the enrichment on a personalisation use case

• 27 users, 800 user ratings collected
• Main outcome: Popularity and Temporal Dynamics are useful measures for real-time personalisation

SPOTS                      Improvement*
No Enrichment              ---
Trendy                     +29%
Not Stable                 +26%
At Least 2 Features        +9%
Specific + Not Popular     +5%

* In recommendation accuracy over non-enriched profiles

32 / 37

Page 33: Profiling User Interests on the Social Semantic Web

Evaluation on User Profiles

User study to evaluate the impact of the enrichment on user profiles according to users’ judgement

• 27 users, 800 user ratings collected
• Main outcome: Specificity is more useful than popularity measures according to user perception

User Profiles                  Improvement*
No Enrichment                  ---
Not Specific + Not Popular     +13%
Not Specific                   +8%
Not Popular                    +2%
Stable + Not Trendy            +1%

* In profile accuracy over non-enriched profiles

33 / 37

Page 34: Profiling User Interests on the Social Semantic Web

Summary

[Orlandi, UMAP 2012]

34 / 37

Page 35: Profiling User Interests on the Social Semantic Web

Summary

We provide and evaluate a complete methodology for profiling user interests across multiple sources on the Social Web: Collect, Represent, Aggregate, Mine, Enrich, Deploy

Aggregation of user data:
• Semantic representation of Social Web content and user activities

Provenance of data:
• Improves profiling accuracy and connects Social Web and WoD

Mining of user interests:
• Provenance + Linked Data/Entity-based strategies + time decay outperform traditional “bag-of-words” strategies and facilitate enrichment

Semantic enrichment:
• Improves profiling accuracy and is necessary for the deployment of the profiles in a personalisation use case
• Different types of personalisation need different entities of interest

35 / 37

Page 36: Profiling User Interests on the Social Semantic Web

Future Work

Federated Personal Data Manager
• Privacy-aware, interoperable, autonomous, user profiling infrastructure

Provenance at Web Scale
• Necessary to focus on techniques for an easier and less expensive tracking and management of provenance on the Social Semantic Web

Adaptive Profiling of User Interests
• Adaptation of the profiling algorithm and strategy according to the application and the context

36 / 37

Page 37: Profiling User Interests on the Social Semantic Web

Contributions & Dissemination

• Semantic Web modelling solutions for Social Web data, user profiles, provenance on the Social Web and Web of Data
• A provenance computation framework
• Novel measures for characterising entities of interest
• A real-time personalisation system for large Social Web streams
• User studies for different profiling strategies, provenance features and personalisation use-cases
• A privacy-aware user profile management system

Publications

• 2 journal, 4 conference, 2 workshop papers

37 / 37

Thanks!

Page 38: Profiling User Interests on the Social Semantic Web

38

Page 39: Profiling User Interests on the Social Semantic Web

39

Context

User Modelling
• The process of representing a user or some of his/her characteristics (e.g. interests, workplace, location, etc.)

User Profile
• A characterisation of a user at a particular point in time

Page 40: Profiling User Interests on the Social Semantic Web

Experiment

6 types of user profiles evaluated:

• 2 types of DBpedia entities: Categories vs. Resources
• 2 types of weighting scheme for category-based methods
  - Cat1: Interests Weight Propagation
  - Cat2: Interests Weight Propagation w/ Cat. Discount
• 2 types of exponential Time Decay function
  - Short mean lifetime: 120 days
  - Long mean lifetime: 360 days

Page 41: Profiling User Interests on the Social Semantic Web

Experiment

6 types of user profiles evaluated:

[Figure: tree of profiling strategies. Res: Res-120, Res-360. Cat: Cat1 (Cat1-120, Cat1-360) and Cat2 (Cat2-120, Cat2-360)]

Page 42: Profiling User Interests on the Social Semantic Web

42

User-based Evaluation

We asked users to rate the top 10 interests generated for each of the 6 profiling strategies

Question: “Please rate how relevant is each concept for representing your personal interests and context…”

Rating: 0 (not at all or don't know), 1 (low), 2, 3, 4, 5 (high)

• Rating converted to a (0…10) scale
• Performance evaluated with:
  - MRR (Mean Reciprocal Rank)
  - P@10 (Precision at K = 10)
• Comparison with a Baseline: a traditional approach based on “keyword frequency”
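The two measures named above follow their standard definitions, sketched below in Python. Both need a rule for deciding when a rated interest counts as relevant; the slides do not state one, so the threshold of 6 on the 0 to 10 scale is an assumption, as are the function names and the example ratings.

```python
def precision_at_k(ratings, k=10, threshold=6):
    """Fraction of the top-k ranked interests whose rating reaches the threshold."""
    return sum(rating >= threshold for rating in ratings[:k]) / k


def reciprocal_rank(ratings, threshold=6):
    """1 / rank of the first relevant interest, or 0.0 if none is relevant."""
    for rank, rating in enumerate(ratings, start=1):
        if rating >= threshold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(profiles, threshold=6):
    """Average the reciprocal rank over all rated profiles."""
    return sum(reciprocal_rank(p, threshold) for p in profiles) / len(profiles)


# Hypothetical ratings (0..10 scale) for the top 10 interests of two users,
# listed in the order the profiling strategy ranked them.
profile_a = [10, 8, 2, 6, 0, 4, 10, 2, 6, 8]
profile_b = [2, 4, 8, 0, 0, 6, 2, 10, 0, 4]

print("P@10:", precision_at_k(profile_a), precision_at_k(profile_b))
print("MRR:", mean_reciprocal_rank([profile_a, profile_b]))
```

P@10 rewards profiles with many relevant interests in the top 10, while MRR only looks at how early the first relevant interest appears, so the two can rank strategies differently.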

Page 43: Profiling User Interests on the Social Semantic Web
Page 44: Profiling User Interests on the Social Semantic Web

Evaluation

On average for: 200 Tweets & 200 Facebook posts, and items.

• ~106 interests – DBpedia Resources
• ~720 interests – DBpedia Categories (~7 times)

Statistical significance for:

• Resources vs. Categories (p<0.05)
• Any method vs. Baseline (p<0.05)
• Not for time decay (p~0.2) and Cat1 vs. Cat2