Semantic Modelling of User Interests Based on Cross-Folksonomy Analysis @ ISWC2008

Upload: martin-szomszor

Post on 28-Aug-2014

DESCRIPTION

Paper presented at the International Semantic Web Conference (ISWC) 2008.

TRANSCRIPT

Page 1: Semantic Modelling of  User Interests Based on Cross-Folksonomy Analysis @ ISWC2008

TAGora: Semiotic Dynamics of Online Social Communities (EU-IST-2006-034721)

Semantic Modelling of User Interests Based on Cross-Folksonomy Analysis

Martin Szomszor, Harith Alani, Kieron O’Hara, Nigel Shadbolt
University of Southampton

Iván Cantador
Universidad Autónoma de Madrid

Page 2

Outline

• Introduction and Motivation
  – Why is your folksonomy interaction useful?
  – How could it be exploited?
• Architecture
  – Matching user accounts
  – Collecting Data
  – Tag Filtering
  – Profile Building
• Experiment and Evaluation
• Conclusions and Future Work

Page 3

Introduction

delicious.com

http://slashdot.org/

http://news.bbc.co.uk/

Dream Theater

Metallica

Rush

Page 4

Increasing number of online identities

• A recent Ofcom study found that UK adults have on average 1.6 profiles; 39% of those who have a profile have at least two
• Many predict that in the near future, individuals will have in excess of 10 profiles
  – [Ofcom 2008] Social Networking: A quantitative and qualitative research report into attitudes, behaviours, and use.

Page 5

The Big Picture

delicious.com

Profile of Interests

Page 6

Personalisation

delicious.com

Profile of Interests

Profiles could be exported to other sites to improve recommendation quality

Profiles could be used to support personalised searching

Better user experience

Page 7

Consolidation and Integration

currency

travel

hotels

cuba

http://dbpedia.org/resource/Cuba

cuba

holiday

2008

http://dbpedia.org/resource/Travel

http://dbpedia.org/resource/Holiday

http://dbpedia.org/resource/Category:Tourism
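
The consolidation pictured on this slide, where the same tag used on different sites resolves to a single concept URI, can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the `candidate_dbpedia_uri` and `consolidate` helpers are hypothetical, and the capitalise-and-underscore naming rule is an assumption that a real system would have to verify against DBpedia.

```python
from collections import Counter

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"

def candidate_dbpedia_uri(tag: str) -> str:
    """Guess a DBpedia resource URI for a tag (assumed naming rule)."""
    label = tag.strip().replace(" ", "_")
    return DBPEDIA_PREFIX + label[:1].upper() + label[1:]

def consolidate(*tag_lists):
    """Count how often each candidate URI occurs across all tag lists."""
    counts = Counter()
    for tags in tag_lists:
        counts.update(candidate_dbpedia_uri(t) for t in tags)
    return counts

# Tag sets copied from the slide; which site each set came from is not stated.
first = ["currency", "travel", "hotels", "cuba"]
second = ["cuba", "holiday", "2008"]
merged = consolidate(first, second)
```

Here the tag cuba from both sets lands on the same URI, so evidence from two accounts accumulates on one concept.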

Page 8

User Tagging

delicious.com

Page 9

delicious.com

Tag Clouds

Page 10

Tagging Variation

[1] Szomszor, M., Cantador, I. and Alani, H. (2008). Correlating User Profiles from Multiple Folksonomies. In: ACM Conference on Hypertext and Hypermedia, 2008, Pittsburgh, Pennsylvania.

Raw Tags

Filtered Tags

Page 11

Architecture for Building Profiles of Interests

Page 12

Account Correlation

• Using Google’s Social Graph API

delicious.com

account homepage

http://users.ecs.soton.ac.uk/mns2
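
One way to picture account correlation is as grouping URLs that are joined by identity claims, such as a homepage linking to a person's accounts. The sketch below assumes those claims are already available as pairs of URLs. It does not call the Social Graph API, and the `group_identities` function and the example.org URLs are hypothetical.

```python
from collections import defaultdict

def group_identities(claims):
    """Group URLs into identities: connected components of the claim graph."""
    graph = defaultdict(set)
    for a, b in claims:
        graph[a].add(b)
        graph[b].add(a)  # simplification: treat every claim as undirected
    seen, groups = set(), []
    for start in list(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over linked URLs
            url = stack.pop()
            if url in component:
                continue
            component.add(url)
            stack.extend(graph[url] - component)
        seen |= component
        groups.append(component)
    return groups

# Hypothetical claims: a homepage linking to accounts on two sites.
claims = [
    ("http://example.org/~alice", "http://delicious.com/alice_d"),
    ("http://example.org/~alice", "http://www.flickr.com/photos/alice_f"),
    ("http://example.org/~bob", "http://delicious.com/bob_d"),
]
groups = group_identities(claims)
```

Treating claims as undirected is a deliberate simplification; a stricter variant might accept only links that are confirmed in both directions.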

Page 13

Data Collection

• Delicious
  – Custom Python scripts
• Flickr
  – Using public API
• Only public information is harvested

Page 14

Tag Filtering Process

Page 15

Creating User Profiles

• Three stage process:
  1. Identify Wikipedia page
     • London is matched with http://en.wikipedia.org/wiki/London
  2. Extract Category list
     • Host cities of the Summer Olympic Games | Host cities of the Commonwealth Games | London | 1st century establishments | British capitals | Capitals in Europe | Port cities and towns in the United Kingdom
  3. Select representative Categories
     • Only choose categories that match the tag string
     • Excludes spurious categories such as:
       – Host cities of the Summer Olympic Games
       – Needs more sources
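
Stage 3 of the process on this slide can be sketched as a simple string comparison. The slide says only that categories must "match the tag string", so the exact rule is an assumption here: the hypothetical `select_categories` helper keeps a category when it equals the tag after lowercasing and removing non-alphanumeric characters.

```python
import re

def normalise(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def select_categories(tag: str, categories):
    """Keep only categories whose normalised name equals the normalised tag."""
    key = normalise(tag)
    return [c for c in categories if normalise(c) == key]

# Category list for the London page, as given on the slide.
london_categories = [
    "Host cities of the Summer Olympic Games",
    "Host cities of the Commonwealth Games",
    "London",
    "1st century establishments",
    "British capitals",
    "Capitals in Europe",
    "Port cities and towns in the United Kingdom",
]
selected = select_categories("london", london_categories)
```

With this rule the tag london keeps only the London category, and maintenance categories such as "Needs more sources" never survive.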

Page 16

Profile of Interest

Page 17

Experiment Setup

• Bootstrapped using 667,141 delicious profiles obtained in previous work
• Only accounts with a matching Flickr profile and > 50 distinct tags were added
• Final list contains 1,392 users

                Delicious    Flickr
Total Posts     1,134,527    2,215,913
Distinct Tags   138,028      307,182
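
The selection rule above can be sketched as a filter over candidate accounts. The slide does not say on which site the 50-tag threshold was applied, so applying it to the Delicious tags is an assumption, and the `eligible_users` function and the toy data are hypothetical.

```python
def eligible_users(delicious_tags, flickr_accounts, min_distinct_tags=50):
    """Users with a matched Flickr account and more than min_distinct_tags distinct tags."""
    return sorted(
        user
        for user, tags in delicious_tags.items()
        if user in flickr_accounts and len(set(tags)) > min_distinct_tags
    )

# Toy data: user id -> list of tags used (one entry per tag assignment).
delicious_tags = {
    "u1": [f"tag{i}" for i in range(60)],  # 60 distinct tags
    "u2": [f"tag{i}" for i in range(60)],  # enough tags, but no Flickr match
    "u3": ["web"] * 80,                    # many assignments, one distinct tag
}
flickr_accounts = {"u1": "flickr-u1", "u3": "flickr-u3"}
selected = eligible_users(delicious_tags, flickr_accounts)
```

Counting distinct tags rather than posts excludes accounts that reuse one tag many times, which is the case u3 illustrates.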

Page 18

Evaluation

• Four evaluation procedures:
  – The performance of the tag filtering and matching to Wikipedia Entries
  – The difference between the most common categories found in delicious and Flickr
  – The amount learnt from merging profiles from the two folksonomies
  – The accuracy of matching tags to Wikipedia categories

Page 19

Tag Filtering and Matching

Page 20

Global Category View

• What are the differences in the interests that are learnt from each domain?

Delicious                         Flickr
Wikipedia Category   Total Freq   Wikipedia Category   Total Freq
Design               69,215       Travel               51,674
Blogs                68,319       Australia            51,617
Music                45,063       London               46,623
Photography          41,356       Festivals            42,504
Tools                35,795       Music                40,943
Video                34,318       Cats                 38,230
Arts                 29,966       Holidays             37,610
Software             28,746       Family               37,100
Maps                 26,912       Japan                36,513
Teaching             22,120       Concerts             35,374
Games                21,549       Surnames             34,947
How-to               19,533       Washington           33,924
Technology           18,032       Given Names          32,843
News                 17,737       Dogs                 32,206
Humor                15,816       Birthdays            22,290
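
A ranking like the one in this table could be produced by summing per-user category frequencies and taking the most frequent entries. The sketch below assumes each user profile is a mapping from category name to frequency; the `global_category_view` function and the toy profiles are hypothetical and do not reproduce the figures above.

```python
from collections import Counter

def global_category_view(user_profiles, top_n=15):
    """Sum category frequencies over all users and return the top_n categories."""
    total = Counter()
    for profile in user_profiles.values():
        total.update(profile)  # adds the per-user frequencies
    return total.most_common(top_n)

# Toy per-user profiles for one site: category -> frequency.
profiles = {
    "u1": {"Design": 3, "Music": 1},
    "u2": {"Design": 2, "Blogs": 4},
}
top = global_category_view(profiles, top_n=2)
```

Running the same aggregation separately on the Delicious and Flickr profiles would give the two columns that the slide compares.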

Page 21

Learning More About Users

• How much more can we learn by using multiple profiles?
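
One plausible way to quantify this question is to count the categories that a second profile contributes beyond the first. The slides do not spell out the measure used, so the set difference below is an assumption, and the function names and toy profiles are hypothetical.

```python
def new_concepts(base_profile, other_profile):
    """Categories present in other_profile but absent from base_profile."""
    return set(other_profile) - set(base_profile)

def average_new_concepts(pairs):
    """Mean number of new concepts over (base, other) profile pairs."""
    gains = [len(new_concepts(base, other)) for base, other in pairs]
    return sum(gains) / len(gains) if gains else 0.0

# Toy (Delicious, Flickr) category sets for two users.
pairs = [
    ({"Design", "Music"}, {"Music", "Travel", "Cats"}),  # gains Travel and Cats
    ({"Blogs"}, {"Blogs"}),                              # gains nothing
]
average_gain = average_new_concepts(pairs)
```

Under this measure, a category counts as learnt only if it does not already appear in the base profile, so shared interests such as Music add nothing.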

Page 22

Category Matching

• How good is the category matching?
• Take 100 random users and choose 1 Delicious tag and 1 Flickr tag
• Classify tag into one of 3 classes:
  – Correct
  – Unresolved (not matched to any category)
  – Ambiguous (Disambiguation required)

            Correct   Unresolved   Ambiguous
Delicious   66%       20%          14%
Flickr      63%       25%          12%

Page 23

Conclusions

• We have proposed a novel method for the creation of Profiles of Interest by exploiting an individual’s tagging activities across two popular folksonomy sites
• Frequently used tags often specify areas of interest but not always!
  – Common delicious tags are daily, toread, howto
  – Flickr tags often include names of people
• Expanding the analysis across folksonomies increases the amount learnt
  – On average 15 new concepts per user

Page 24

Future Work

• Improve page matching
  – 22.5% of sample tags unresolved
• Handle disambiguation
  – 13% of sample tags refer to ambiguous terms
    • Cooccurrence networks
    • Category hierarchy
• Increase network coverage
  – Already have the data to include Last.fm
• Understand which tags actually specify an interest of the individual
  – Filter out categories such as ‘Surname’
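
The 22.5% and 13% figures quoted on this slide appear to be the means of the Delicious and Flickr rows in the Category Matching table (Page 22). That reading is an inference from the numbers, not something the slides state; the short check below uses the table values.

```python
# Percentages from the Category Matching table (Page 22).
unresolved = {"Delicious": 20, "Flickr": 25}
ambiguous = {"Delicious": 14, "Flickr": 12}

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

mean_unresolved = mean(unresolved.values())
mean_ambiguous = mean(ambiguous.values())
```

The two means come out at 22.5 and 13.0, matching the figures on this slide.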