Towards Context-Aware Search and Analysis on Social Media Data

Leon Derczynski, Bin Yang 杨彬, Christian S. Jensen

Uploaded by leon-derczynski on 26-Jan-2015


DESCRIPTION

Social media has changed the way we communicate. Social media data capture our social interactions and utterances in machine-readable format. Searching and analysing massive and frequently updated social media data brings significant and diverse rewards across many different application domains, from politics and business to social science and epidemiology. A notable proportion of social media data comes with explicit or implicit spatial annotations, and almost all social media data has temporal metadata. We view social media data as a constant stream of data points, each containing text with spatial and temporal contexts. We identify challenges relevant to each context that we intend to subject to context-aware querying and analysis, specifically including longitudinal analyses on social media archives, spatial keyword search, local intent search, and spatio-temporal intent search. Finally, for each context, emerging applications and further avenues for investigation are discussed.

TRANSCRIPT

Page 1: Towards Context-Aware Search and Analysis on Social Media Data

Towards Context-Aware Search and Analysis on Social Media Data

Leon Derczynski, Bin Yang 杨彬, Christian S. Jensen

Page 2

Evolution of communication

Functional utterances

Vowels

Velar closure: consonants

Speech

New modality: writing

Digital text

E-mail

Social media

Increased machine-readable information?

Page 3

Social Media = Big Data

Gartner "3V" definition:

1. Volume

2. Velocity

3. Variety

High volume & velocity of messages:

Twitter has ~200 000 000 users per month

They write ~500 000 000 messages per day

Massive variety: stock markets; earthquakes; social arrangements; … Bieber

Page 4

What is machine-readable now?

Messages now contain

- not only linguistic content

- but also: links (e.g. URIs), topic markers (e.g. hashtags) and meta-information

What kind of meta-information?

● User profile (including home location)

● Images

● Messages replied to

● Message language

● Time of message

● Location of message
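To make the list above concrete, here is a minimal sketch of one message held as a record with its text and metadata. The schema, the field names and the regular expressions for hashtags and links are our own assumptions for illustration, not any platform's actual API or data format:

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Hypothetical, simplified message schema; not a real platform's API.
HASHTAG_RE = re.compile(r"#(\w+)")
URI_RE = re.compile(r"https?://\S+")


@dataclass
class Message:
    text: str                                         # linguistic content
    created_at: datetime                              # time of message
    location: Optional[Tuple[float, float]] = None    # (lat, lon) if geotagged
    language: Optional[str] = None                    # message language code
    reply_to: Optional[str] = None                    # id of message replied to
    profile_location: Optional[str] = None            # free-text home location
    images: List[str] = field(default_factory=list)   # attached image URIs

    @property
    def hashtags(self) -> List[str]:
        """Topic markers, lower-cased, without the leading '#'."""
        return [h.lower() for h in HASHTAG_RE.findall(self.text)]

    @property
    def links(self) -> List[str]:
        """URIs mentioned in the text."""
        return URI_RE.findall(self.text)


msg = Message(
    text="Out shoe shopping!!! #louboutintime http://example.com/shoes",
    created_at=datetime(2013, 3, 18, 14, 30, tzinfo=timezone.utc),
    location=(55.6761, 12.5683),
    language="en",
)
print(msg.hashtags)  # ['louboutintime']
print(msg.links)     # ['http://example.com/shoes']
```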

Page 5

What resources do we have now?

Large, content-rich, linked, digital streams of human communication

We transfer knowledge via communication

Sampling communication gives a sample of human knowledge

"You've only done that which you can communicate"

The metadata (time – place – imagery) gives a richer resource:

→ A sampling of human behaviour

Page 6

What can we do with this resource?

Context increases the data's richness

Increased richness enables novel applications

Time and place are particularly interesting parts of message context

1. What kinds of applications are there?

2. What are the practical challenges?

Page 7

Temporal Context

Messages have timestamps:

Two temporal retrieval scenarios:

1. Historical analyses

2. Emerging data


Page 8

Historical search

Ability to retrieve from archives: Longitudinal query mode 0

Retrieve information on:

● Lifecycle of socially connected groups

● Precursors to events, analysed post hoc


0. Weikum et al. 2011: "Longitudinal analytics on web archive data: It’s about time", Proc. CIDR

Page 9

Historical search

Retrospective analyses into cause and effect

Social media mentions of dead crows predict West Nile Virus (WNV) in humans 1

"There's a dead crow in my garden"

1. Sugumaran & Voss 2012: "Real-time spatio-temporal analysis of West Nile Virus using Twitter Data", Proc. Int'l Conference on Computing for Geospatial Research and Applications

Page 10

Emerging search

Data emerging at high velocity:

185 000 documents per minute

Gives a high temporal density

Search over this info enables:

● Live coverage of events

● Real-time identification of emerging events 2

2. Cohen et al. 2011: "Computational journalism: A call to arms to database researchers", Proc. CIDR
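Real-time identification of emerging events can be approached in many ways, and these slides do not commit to one. As a minimal sketch, assuming a simple per-term burst test of our own choosing (a term is flagged when its count in the newest time window is several standard deviations above its counts in recent windows; `BurstDetector` and its `history`, `threshold` and `min_count` parameters are hypothetical), it could look like:

```python
from collections import Counter, deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List


class BurstDetector:
    """Flags terms whose document count in the newest window is far above
    their counts in the preceding windows (simple z-score test; illustrative)."""

    def __init__(self, history: int = 6, threshold: float = 3.0, min_count: int = 5):
        self.windows: Deque[Counter] = deque(maxlen=history)  # recent past windows
        self.threshold = threshold  # z-score needed to flag a term
        self.min_count = min_count  # ignore terms that are rare in absolute terms

    def add_window(self, messages: Iterable[str]) -> List[str]:
        """Ingest one window of messages; return the terms bursting in it."""
        # Document frequency: each term counted once per message.
        current = Counter(t for m in messages for t in set(m.lower().split()))
        bursting = []
        if self.windows:
            for term, count in current.items():
                past = [w[term] for w in self.windows]  # Counter yields 0 if absent
                # Floor the deviation at 1 so a previously unseen term can burst.
                z = (count - mean(past)) / max(pstdev(past), 1.0)
                if count >= self.min_count and z >= self.threshold:
                    bursting.append(term)
        self.windows.append(current)
        return sorted(bursting)


detector = BurstDetector(history=3)
quiet = ["traffic again", "need coffee", "traffic is bad"]  # toy messages
for _ in range(3):
    detector.add_window(quiet)
print(detector.add_window(quiet + ["#earthquake"] * 8))  # ['#earthquake']
```

A practical system would also need tweet-aware tokenisation and, given the spatial context discussed later, per-region windows; the z-score test is only one simple option.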

Page 11

Temporal indexing

What are our requirements?

● High-frequency document creation

● Temporal cross-sections of varying size

● Time-sensitive TF/IDF: stopwords are fluid
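The point that stopwords are fluid can be illustrated by computing IDF within a time window instead of over the whole archive: a term that is rare on one day can be near-ubiquitous the next. A minimal sketch, assuming day-sized windows, whitespace tokenisation and unsmoothed IDF (all our own choices), over made-up toy messages; `windowed_idf` is a hypothetical helper:

```python
import math
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def windowed_idf(docs: Iterable[Tuple[datetime, str]]) -> Dict[str, Dict[str, float]]:
    """Compute IDF separately for each time window (here: one calendar day).

    Returns {window: {term: log(N_window / df_window(term))}}, unsmoothed."""
    df: Dict[str, Counter] = defaultdict(Counter)  # per-window document frequency
    n_docs: Counter = Counter()                    # number of documents per window
    for ts, text in docs:
        window = ts.date().isoformat()
        n_docs[window] += 1
        df[window].update(set(text.lower().split()))
    return {w: {t: math.log(n_docs[w] / c) for t, c in counts.items()}
            for w, counts in df.items()}


docs = [
    (datetime(2013, 3, 1, 9), "the bus is late"),
    (datetime(2013, 3, 1, 10), "earthquake drill at school"),
    (datetime(2013, 3, 1, 11), "the cafe is busy"),
    (datetime(2013, 3, 1, 12), "the bus again"),
    (datetime(2013, 3, 2, 9), "earthquake felt downtown"),
    (datetime(2013, 3, 2, 10), "earthquake was strong"),
    (datetime(2013, 3, 2, 11), "earthquake again"),
    (datetime(2013, 3, 2, 12), "big earthquake"),
]
idf = windowed_idf(docs)
print(round(idf["2013-03-01"]["earthquake"], 3))  # 1.386: rare, informative
print(idf["2013-03-02"]["earthquake"])            # 0.0: behaves like a stopword
```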

How can we do this? - Open challenge

● Tree indexes are hard to distribute

● Maybe with adaptive multi-resolution grids?
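One possible reading of the multi-resolution idea, sketched under our own assumptions and not a solution to the open challenge above, keeps document counts at several nested granularities (day, hour, minute) and answers a temporal cross-section with the widest buckets that fit inside it. `MultiResolutionTimeIndex` and its parameters are hypothetical, and the grid here is fixed rather than adaptive:

```python
from collections import Counter
from typing import Dict, Sequence


class MultiResolutionTimeIndex:
    """Document counts kept at several time granularities (hypothetical sketch)."""

    def __init__(self, resolutions: Sequence[int] = (86400, 3600, 60)):
        # Bucket widths in seconds, stored coarsest first (day, hour, minute).
        self.resolutions = sorted(resolutions, reverse=True)
        self.counts: Dict[int, Counter] = {r: Counter() for r in self.resolutions}

    def add(self, ts: int) -> None:
        """Record one document created at Unix time ts (seconds)."""
        for r in self.resolutions:
            self.counts[r][ts // r] += 1

    def count(self, start: int, end: int) -> int:
        """Number of documents with start <= ts < end.

        Both bounds must be multiples of the finest resolution."""
        finest = self.resolutions[-1]
        if start % finest or end % finest:
            raise ValueError("bounds must align to the finest resolution")
        total, t = 0, start
        while t < end:
            # Use the widest bucket that starts at t and ends by `end`.
            for r in self.resolutions:
                if t % r == 0 and t + r <= end:
                    total += self.counts[r][t // r]
                    t += r
                    break
        return total


index = MultiResolutionTimeIndex()
for ts in [0, 30, 3600, 3660, 86400 + 120]:
    index.add(ts)
print(index.count(0, 86400))  # 4: answered from a single day bucket
print(index.count(60, 7200))  # 2: minute buckets, then one hour bucket
```

Distributing such a structure (for example, sharding buckets by time range) is not addressed by this sketch.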

Page 12

Spatial Context

Demand for spatial information:

20% of all Google searches

53% of Bing mobile searches

Heterogeneous spatial context sources

GPS locations (most reliable)

Origin bounding boxes (e.g. city)

User profile text??? 3

Author's friends' locations 4

3. Hecht et al. 2011: "Tweets from Justin Bieber’s Heart: The Dynamics of the “Location” Field in User Profiles", Proc. ACM CHI

4. Rout et al. 2013: "Where's @wally? A Graph Based Method for Geolocating Users in Social Networks", Proc. ACM Hypertext
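A minimal sketch of combining these heterogeneous sources, assuming a fallback order that follows the ranking on this slide (GPS first, then the origin bounding box, then profile text, then friends' locations). The box centroid, the hypothetical `geocode` stub for profile text and the per-coordinate median of friends' locations are our own simplifications; in particular, the median is not the graph-based method of Rout et al.:

```python
from statistics import median
from typing import Callable, List, Optional, Tuple

LatLon = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def resolve_location(
    gps: Optional[LatLon] = None,
    bbox: Optional[BBox] = None,
    profile_text: Optional[str] = None,
    friend_locations: Optional[List[LatLon]] = None,
    geocode: Callable[[str], Optional[LatLon]] = lambda s: None,  # hypothetical stub
) -> Tuple[Optional[LatLon], str]:
    """Return (estimate, source name), trying sources from most to least reliable."""
    if gps is not None:
        return gps, "gps"
    if bbox is not None:
        # Naive centroid; ignores boxes that cross the antimeridian.
        min_lat, min_lon, max_lat, max_lon = bbox
        return ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2), "bbox"
    if profile_text:
        guess = geocode(profile_text)  # profile text may not name a real place
        if guess is not None:
            return guess, "profile"
    if friend_locations:
        lats, lons = zip(*friend_locations)
        return (median(lats), median(lons)), "friends"
    return None, "unknown"


print(resolve_location(gps=(55.68, 12.57), bbox=(55.6, 12.4, 55.8, 12.7)))
print(resolve_location(friend_locations=[(55.0, 12.0), (56.0, 13.0), (57.0, 14.0)]))
print(resolve_location(profile_text="Justin Bieber's heart"))  # (None, 'unknown')
```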

Page 13

Spatial Keyword Search

How can we query a set of social media messages?

Treat them as a set of objects, each having:

● Text

● Location

Query parameters:

● Query text

● Query location

Given a query and a set of messages, rank by similarity:

● Text similarity (cosine, Siamese learning net, oriented PCA)

● Separating distance (Haversine, Manhattan, eco-routed)

● Blend these with a balancing coefficient

(just like conventional spatial keyword search)
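The blend can be sketched as follows, assuming bag-of-words cosine similarity for text, Haversine distance turned into a proximity score that falls linearly to zero at a hypothetical cap `max_km`, and a balancing coefficient `alpha`; the function names and both parameters are ours, chosen only for illustration:

```python
import math
from collections import Counter
from typing import Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def cosine(q: str, d: str) -> float:
    """Cosine similarity of bag-of-words term-frequency vectors."""
    qv, dv = Counter(q.lower().split()), Counter(d.lower().split())
    dot = sum(n * dv[t] for t, n in qv.items())
    norm = (math.sqrt(sum(n * n for n in qv.values()))
            * math.sqrt(sum(n * n for n in dv.values())))
    return dot / norm if norm else 0.0


def spatial_keyword_score(query_text: str, query_loc: LatLon, msg_text: str,
                          msg_loc: LatLon, alpha: float = 0.5,
                          max_km: float = 10.0) -> float:
    """alpha * proximity + (1 - alpha) * text similarity, both in [0, 1].

    Proximity falls linearly from 1 at distance 0 to 0 at max_km and beyond."""
    proximity = max(0.0, 1.0 - haversine_km(query_loc, msg_loc) / max_km)
    return alpha * proximity + (1 - alpha) * cosine(query_text, msg_text)


loc = (55.70, 12.55)  # hypothetical query location
print(spatial_keyword_score("good bar", loc, "wow good cocktail bar", loc))
```

Manhattan or routed distance could replace `haversine_km` without changing the blend.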

Page 14

Spatial Keyword Search

Query: ''good bar in north copenhagen''

Issued from location

Five candidate messages

Query region established

Rank by blend of location and textual similarity

[Figure: map showing the query location, the query region and candidate messages A to E]

ID | Message | Location similarity | Text similarity
A | So drunk last night at @BarSyv | 0.7 | 0.6
B | Out shoe shopping!!! #louboutintime | 0.9 | 0.0
C | Who pays $9 for a beer?! | 0.6 | 0.5
D | wow found cph's greatest cocktail bar lol | 0.1 | 1.0
E | Traffic. Traffic everywhere. Need a drink. | 0.4 | 0.2
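Using the location and text similarity values from the table and a linear blend with a balancing coefficient (called `alpha` here, our own name, weighting location), the ranking can be computed directly. The linear form of the blend is an assumption for illustration:

```python
from typing import List

# (message id, location similarity, text similarity), from the table above
candidates = [
    ("A", 0.7, 0.6),
    ("B", 0.9, 0.0),
    ("C", 0.6, 0.5),
    ("D", 0.1, 1.0),
    ("E", 0.4, 0.2),
]


def rank(alpha: float) -> List[str]:
    """Message ids sorted by alpha * location + (1 - alpha) * text, best first."""
    def score(c):
        _, loc, txt = c
        return alpha * loc + (1 - alpha) * txt
    return [name for name, _, _ in sorted(candidates, key=score, reverse=True)]


print(rank(0.5)[0])  # A: balances proximity and topicality
print(rank(0.2)[0])  # D: text-heavy weighting favours the distant cocktail bar
```

With `alpha = 0.8`, B (shoe shopping nearby) comes out on top instead, which illustrates why neither signal alone is enough.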

Page 15

Continuous Spatial Queries

Social media scenario characterised by:

Streaming data

New spatial objects constantly appearing

Two new spatial keyword query types:

● Static continuous (SCSKQ): fixed query location; tracks newly appearing objects

● Moving continuous (MCSKQ): query location moves along a path; result updated with new objects

Novel part: fresh objects are continuously introduced
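For the static continuous case, a minimal sketch (our own simplification: objects never expire, and scores come from any blended scoring function, such as a text and location blend) keeps the current top-k in a min-heap and updates it as each new object arrives. `StaticContinuousTopK` is a hypothetical name:

```python
import heapq
from typing import Callable, List, Tuple


class StaticContinuousTopK:
    """Maintains the k best-scoring objects for a fixed query while new
    objects stream in. Objects never expire here (a simplification)."""

    def __init__(self, k: int, score: Callable[[object], float]):
        self.k = k
        self.score = score
        self._heap: List[Tuple[float, int, object]] = []  # min-heap of (score, seq, obj)
        self._seq = 0  # tie-breaker so objects themselves are never compared

    def push(self, obj: object) -> bool:
        """Offer a newly appearing object; return True if the result changed."""
        s = self.score(obj)
        self._seq += 1
        entry = (s, self._seq, obj)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
            return True
        if s > self._heap[0][0]:  # beats the current k-th best
            heapq.heapreplace(self._heap, entry)
            return True
        return False

    def results(self) -> List[object]:
        """Current top-k, best first."""
        return [obj for _, _, obj in sorted(self._heap, reverse=True)]


# Hypothetical stream of (message id, precomputed blended score).
query = StaticContinuousTopK(k=2, score=lambda m: m[1])
for msg in [("A", 0.65), ("B", 0.45), ("C", 0.55), ("E", 0.30)]:
    query.push(msg)
print(query.results())  # [('A', 0.65), ('C', 0.55)]
```

The moving continuous case is harder: when the query location changes, the scores of objects already in the result change too, so a sketch like this one would need to rescore or re-query rather than only compare each newcomer against the current k-th best.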

Page 16

Location Diversity

Location data unreliable

Reliability of location data... is also unreliable

"There are known knowns… we also know there are known unknowns… but there are also unknown unknowns" – Donald Rumsfeld

Text mentions require disambiguation:

● In profiles

● In messages

● In queries

The requirement is to rank vague points given a vague query
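One way to rank vague points given a vague query, sketched here under our own assumptions rather than as a method from the slides, models each location as an isotropic Gaussian in a local planar projection (kilometres), with a standard deviation expressing its uncertainty, and scores a point by how much its distribution overlaps the query's. `overlap` and `rank_vague` are hypothetical helpers:

```python
import math
from typing import List, Tuple

# A vague location: (x_km, y_km, sigma_km) in a local planar projection,
# where sigma_km expresses how uncertain the position is.
VaguePoint = Tuple[float, float, float]


def overlap(a: VaguePoint, b: VaguePoint) -> float:
    """Integral of the product of two isotropic 2-D Gaussian densities.

    Equals a 2-D Gaussian density of the offset with variance
    sigma_a^2 + sigma_b^2, so very spread-out points score lower."""
    (xa, ya, sa), (xb, yb, sb) = a, b
    var = sa ** 2 + sb ** 2
    d2 = (xa - xb) ** 2 + (ya - yb) ** 2
    return math.exp(-d2 / (2 * var)) / (2 * math.pi * var)


def rank_vague(query: VaguePoint, points: List[Tuple[str, VaguePoint]]) -> List[str]:
    """Names of points ordered by overlap with the (vague) query, best first."""
    ranked = sorted(points, key=lambda item: overlap(query, item[1]), reverse=True)
    return [name for name, _ in ranked]


query = (0.0, 0.0, 1.0)  # vague query location, sigma = 1 km
points = [
    ("far_gps", (30.0, 0.0, 0.1)),        # precise but 30 km away
    ("vague_profile", (0.0, 0.0, 20.0)),  # centred on the query, very uncertain
    ("near_gps", (1.0, 0.0, 0.1)),        # precise and 1 km away
]
print(rank_vague(query, points))  # ['near_gps', 'vague_profile', 'far_gps']
```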

Page 17

Willingness to travel

Determines useful search radius

Based on mode of transport:

Different for different classes of point of interest (POI)?

Spatio-temporal (ST) social media = huge dataset

Easy data collection

Useful for e.g. town planning

[Figure: example search radii by mode of transport: 14.9 km, 22.0 km, 40.6 km, 61.5 km, >100 km]
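Willingness to travel could, for instance, be estimated from geotagged posts as a high percentile of the distance between users' home locations and the places they post from, grouped per POI class (or per transport mode, if known). A minimal sketch, assuming hypothetical check-in records that already carry a home location and a POI class, a nearest-rank 90th percentile of our own choosing, and toy data:

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def search_radius_by_class(
    checkins: Iterable[Tuple[LatLon, LatLon, str]], pct: float = 0.9
) -> Dict[str, float]:
    """checkins: (home, poi_location, poi_class). Returns, per class, the
    distance (km) within which a fraction pct of observed trips fall."""
    dists: Dict[str, List[float]] = defaultdict(list)
    for home, poi, cls in checkins:
        dists[cls].append(haversine_km(home, poi))
    radii = {}
    for cls, ds in dists.items():
        ds.sort()
        idx = max(0, math.ceil(pct * len(ds)) - 1)  # nearest-rank percentile
        radii[cls] = ds[idx]
    return radii


# Toy data: ten trips to bars at increasing distances north of home.
checkins = [((0.0, 0.0), (0.01 * i, 0.0), "bar") for i in range(1, 11)]
print(round(search_radius_by_class(checkins)["bar"], 1))  # 10.0
```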

Page 18

Spatio-temporal Challenges

We've seen temporal and spatial challenges; let's combine!

Given all these spatio-temporal utterances, what can we do?

- Spatial gives relevance from physical or travel proximity

- Temporal gives relevance from recency and from historical analysis

Adding text to the spatio-temporal points gives explicit semantic context

Not only are there ST patterns in the data; we are also told what they mean!
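Combining the two kinds of relevance with the text can be sketched as text similarity multiplied by a spatial and a temporal decay. The exponential form and the half-life parameters `half_km` and `half_hours` are our own assumptions; neither comes from the slides:

```python
from datetime import timedelta


def st_relevance(text_sim: float, distance_km: float, age: timedelta,
                 half_km: float = 5.0, half_hours: float = 6.0) -> float:
    """Text similarity scaled by spatial and temporal decay.

    The spatial factor halves every half_km kilometres of distance and the
    temporal factor halves every half_hours hours of message age."""
    spatial = 0.5 ** (distance_km / half_km)
    temporal = 0.5 ** (age.total_seconds() / 3600 / half_hours)
    return text_sim * spatial * temporal


print(st_relevance(1.0, 5.0, timedelta(hours=6)))  # 0.25
print(st_relevance(1.0, 0.5, timedelta(days=7)))   # nearly 0: a week-old message
```

Recency decay suits emerging-data queries; for historical (longitudinal) queries, a filter on a time range would be the more natural temporal component.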

Page 19

Topic-based Retrieval

Retrieving results on a topic is useful: "Tell me about X"

Specific terms vary between places and over time

… Spatio-temporally sensitive indexing?

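[Figure, variation over time: "President of the United States" in 2007 vs. 2011 (en.wikipedia.org/wiki/President_of_the_United_States)]

[Figure, variation between places: "Jelly" in England English vs. US English]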

Page 20

Sentiment Monitoring

Measure how attitudes change over time and across locations

Business uses: where to send marketing

Political uses: data-driven democratic… campaigning

Governance uses: what are citizen priorities in a region

Temporal dimension enables tracking of trends and reactions

[Map legend: red = upbeat; blue = complaint]

- No normalisation for vocality (how much people in each area post)!
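The warning about vocality can be made concrete: raw counts of upbeat or complaining messages mostly track how many people post in an area. A minimal sketch of normalising per region and time period, assuming messages already carry a region, a period and a sentiment label from some hypothetical classifier (`upbeat_share` is our own helper, and the data are toy values):

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def upbeat_share(
    labelled: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], float]:
    """labelled: (region, period, label), label being 'upbeat' or 'complaint'.

    Returns the fraction of upbeat messages per (region, period), so that
    regions are compared by attitude rather than by how much they post."""
    upbeat: Counter = Counter()
    total: Counter = Counter()
    for region, period, label in labelled:
        total[(region, period)] += 1
        if label == "upbeat":
            upbeat[(region, period)] += 1
    return {key: upbeat[key] / n for key, n in total.items()}


# Toy data: the city is far more vocal than the village.
data = ([("city", "2013-03", "upbeat")] * 80 + [("city", "2013-03", "complaint")] * 120
        + [("village", "2013-03", "upbeat")] * 9 + [("village", "2013-03", "complaint")] * 1)
print(upbeat_share(data))  # {('city', '2013-03'): 0.4, ('village', '2013-03'): 0.9}
```

In raw counts the city shows almost nine times as many upbeat messages, yet its share of upbeat messages is much lower than the village's.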

Page 21

Local Computational Journalism

Social media is quick

Social media is uncurated

"Citizen Journalism"

News has a relevance scope: recency and proximity

Different events are relevant in different contexts: rain in London vs. rain in Addis Ababa

Automatic event detection 5, and also reporting!

5. Ritter et al. 2012: "Open domain event extraction from Twitter", Proc. ACM SIGKDD

Page 22

Summary

Social media is a rich source of "big data"

A small sampling of all human discourse

It comes with temporal and spatial context

Context-aware search and analysis is very demanding!

- Novel, powerful applications

- Wide variety of domains

- An open set of challenges

Page 23

Thank you!

Thank you for listening!

Do you have any questions?