Social Media Crawling and Mining Seminar (Motivation Part)

Lecture @ International Hellenic University Thessaloniki, 8 May 2014 Social Media Crawling and Mining Motivation – Use Cases Symeon (Akis) Papadopoulos, Manos Schinas, Katerina Iliakopoulou, Yiannis Kompatsiaris Information Technologies Institute (ITI) Centre for Research & Technologies Hellas (CERTH)

Upload: yiannis-kompatsiaris

Post on 25-May-2015

TRANSCRIPT

Page 1: Social Media Crawling and Mining Seminar (Motivation Part)

Lecture @ International Hellenic University
Thessaloniki, 8 May 2014

Social Media Crawling and Mining
Motivation – Use Cases
Symeon (Akis) Papadopoulos, Manos Schinas, Katerina Iliakopoulou, Yiannis Kompatsiaris
Information Technologies Institute (ITI)
Centre for Research & Technologies Hellas (CERTH)

Page 2: Social Media Crawling and Mining Seminar (Motivation Part)

MSDM 2014, Athens Social Data and Multimedia Analytics #2

• Introduction
• Motivation
• Example Applications
• Conceptual Architecture
• Challenges

Page 3: Social Media Crawling and Mining Seminar (Motivation Part)

http://www.puzzlemarketer.com/digital-social-brands-in-60-seconds/ (Apr, 2012)

Page 4: Social Media Crawling and Mining Seminar (Motivation Part)

Social Networks as Real-Life Sensors
• Social networks are a highly dynamic data source that reflects events and the evolution of community focus (users’ interests)
• The wide penetration of smartphones and mobile devices provides real-time, location-based user feedback
• Transform individually rare but collectively frequent media into meaningful topics, events, points of interest, emotional states and social connections
• Present the results efficiently for a variety of applications (news, marketing, entertainment)

Page 5: Social Media Crawling and Mining Seminar (Motivation Part)

Pope Francis

Pope Benedict

2007: iPhone release

2008: Android release

2010: iPad release

http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/

Page 6: Social Media Crawling and Mining Seminar (Motivation Part)

Social Networks as Graphs

Page 7: Social Media Crawling and Mining Seminar (Motivation Part)

Social Networks as Graphs

“Social networks have emergent properties. Emergent properties are new attributes of a whole that arise from the interaction and interconnection of the parts”

• Emotions, health and sexual relationships depend not just on our connections (e.g. how many we have) but on our position and structure in the social graph
– Central / hub
– Outlier
– Transitivity (connections between friends)
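The positional notions listed above (hub, outlier, transitivity) can be made concrete with two standard graph measures. The following is a minimal sketch, assuming an undirected friendship graph; the toy network and the names `build_graph`, `degree_centrality` and `local_clustering` are hypothetical and not taken from the lecture.

```python
from itertools import combinations


def build_graph(edges):
    """Return an undirected adjacency map {node: set(neighbours)}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def local_clustering(graph, node):
    """Transitivity around a node: share of its neighbour pairs that are linked."""
    nbrs = graph[node]
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(sorted(nbrs), 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)


# Hypothetical toy network: "ann" acts as a hub, "eve" is an outlier.
EDGES = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
         ("bob", "cat"), ("dan", "eve")]
GRAPH = build_graph(EDGES)
```

In this toy graph the hub has the highest degree centrality, while the outlier has a single connection and a clustering value of zero. For real networks a graph library would normally replace these hand-written helpers.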

Page 8: Social Media Crawling and Mining Seminar (Motivation Part)

Examples - Science

Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han. The wisdom of social multimedia: using flickr for prediction and forecast, International conference on Multimedia (MM '10). ACM.

“…if you're more than 100 km away from the epicenter [of an earthquake] you can read about the quake on twitter before it hits you…”

Page 9: Social Media Crawling and Mining Seminar (Motivation Part)

Example – News (Boston bombing)

“Following the Boston Marathon bombings, one quarter of Americans reportedly looked to Facebook, Twitter and other social networking sites for information, according to The Pew Research Center. When the Boston Police Department posted its final “CAPTURED!!!” tweet of the manhunt, more than 140,000 people retweeted it.”

“Authorities have recognized that one of the first places people go in events like this is to social media, to see what the crowd is saying about what to do next”

“I have been following my friend's Facebook [account] who is near the scene and she is updating everyone before it even gets to the news”

Page 10: Social Media Crawling and Mining Seminar (Motivation Part)

Events - Festivals

http://www.eventmanagerblog.com/uploads/2012/12/event-technology-infographic.jpg

Page 11: Social Media Crawling and Mining Seminar (Motivation Part)

Conceptual Architecture
[Diagram] Stages: CRAWLING, INDEXING, MINING, RANKING, VISUALIZATION
Component labels: API Wrapper, Website Wrapper, Scheduler, Visual Indexing, Near-duplicates, Text Indexing, Media Fetcher, SNA, Sentiment - Influence, Trends - Topics, Model Building, Concepts, Relevance, Diversity, Popularity, Veracity, Crawling Specs, Sources, Interaction, Responsiveness, Aggregation, Aesthetics
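To illustrate how the stages of this architecture might hand data to each other, here is a deliberately small sketch covering indexing, mining and ranking only. Everything in it is an assumption made for illustration: the `ITEMS` list stands in for crawler output, and the functions `index`, `mine` and `rank` are hypothetical names, not components of any system described in the lecture.

```python
from collections import defaultdict

# Hypothetical crawled items; a real crawler would fill these from
# API or website wrappers.
ITEMS = [
    {"id": 1, "text": "film festival opening night", "shares": 12},
    {"id": 2, "text": "festival traffic downtown", "shares": 3},
    {"id": 3, "text": "opening film sold out", "shares": 7},
]


def index(items):
    """INDEXING: inverted index from each term to the ids of items using it."""
    inverted = defaultdict(set)
    for item in items:
        for term in item["text"].split():
            inverted[term].add(item["id"])
    return inverted


def mine(inverted, top_n=2):
    """MINING: terms found in the most items, as a crude topic signal."""
    by_spread = sorted(inverted.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [term for term, _ in by_spread[:top_n]]


def rank(items, inverted, term):
    """RANKING: items containing the term, most shared first (popularity only)."""
    ids = inverted.get(term, set())
    hits = [item for item in items if item["id"] in ids]
    return sorted(hits, key=lambda item: -item["shares"])
```

A call such as `rank(ITEMS, index(ITEMS), "film")` returns the matching items ordered by share count. A fuller ranking stage would also weigh the relevance, diversity and veracity criteria named in the diagram.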

Page 12: Social Media Crawling and Mining Seminar (Motivation Part)

Challenges – Content (Mining)

• Multi-modality: e.g. image + tags

• Rich social context: spatio-temporal, social connections, relations and social graph

• Inconsistent quality: noise, spam, ambiguity, fake, propaganda

• Huge volume: Massively produced and disseminated

• Multi-source: may be generated by different applications and user communities

• Also connected to other sources (e.g. LOD, web)

• Dynamic: Fast updates, real-time

Page 13: Social Media Crawling and Mining Seminar (Motivation Part)

Policy – Licensing – Legal challenges

• Fragmented access to data
– Separate wrappers/APIs for each source (Twitter, Facebook, etc.)
– Different data collection/crawling policies
• Limitations imposed by API providers (“Walled Gardens”)
• Full access to data impossible or extremely expensive (e.g. see data licensing plans for GNIP and DataSift)
• Non-transparent data access practices (e.g. access is provided to an organization/person if they have a contact in Twitter)
• Constant change of model and ToS of social APIs
– No backwards compatibility, additional development costs
• Ephemeral nature of content
• Social search results often lead to removed content: inconsistent and unreliable referencing
• User Privacy & Purpose of use
• Fuzzy regulatory framework regarding mining user-contributed data
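One common way to cope with separate per-source wrappers and provider-imposed limits is to hide each source behind a shared interface that also tracks its request budget. The sketch below assumes a simple sliding-window limit; the class names, the limit values and the canned data are all hypothetical, and no real platform API is called.

```python
from abc import ABC, abstractmethod


class SourceWrapper(ABC):
    """Hypothetical common interface; one subclass per platform."""

    def __init__(self, max_calls, window_seconds, clock):
        self.max_calls = max_calls            # assumed limit, not a real quota
        self.window_seconds = window_seconds
        self.clock = clock                    # injected so the sketch is testable
        self._calls = []                      # timestamps of recent requests

    def allowed(self):
        """True if another request fits in the current rate-limit window."""
        now = self.clock()
        self._calls = [t for t in self._calls if now - t < self.window_seconds]
        return len(self._calls) < self.max_calls

    def search(self, query):
        """Return normalised items, or None if the request has to wait."""
        if not self.allowed():
            return None
        self._calls.append(self.clock())
        return [self.normalise(raw) for raw in self.fetch(query)]

    @abstractmethod
    def fetch(self, query):
        """Source-specific retrieval."""

    @abstractmethod
    def normalise(self, raw):
        """Map a source-specific record onto the shared item format."""


class FakeMicroblogWrapper(SourceWrapper):
    """Stand-in source returning canned data instead of calling a real API."""

    def fetch(self, query):
        return [{"txt": f"post about {query}", "usr": "u1"}]

    def normalise(self, raw):
        return {"source": "fake-microblog", "text": raw["txt"], "author": raw["usr"]}
```

Keeping normalisation inside each wrapper means that a change in one provider's data model only touches one subclass, which limits the development cost mentioned above.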

Page 14: Social Media Crawling and Mining Seminar (Motivation Part)

SocialSensor Project
Use Cases

Page 15: Social Media Crawling and Mining Seminar (Motivation Part)

SocialSensor Project Objective

SocialSensor quickly surfaces trusted and relevant material from social media – with context.

DySCO

behaviour, location, time, content, usage, social context

Massive social media and unstructured web

Social media mining

Aggregation & indexing

News - Infotainment

Personalised access

Ad-hoc P2P networks

Page 16: Social Media Crawling and Mining Seminar (Motivation Part)

The SocialSensor Vision

SocialSensor quickly surfaces trusted and relevant material from social media – with context.

• “quickly”: in real time
• “surfaces”: automatically discovers, clusters and searches
• “trusted”: automatic support in verification process
• “relevant”: to the users, personalized
• “material”: any material (text, image, audio, video = multimedia), aggregated with other sources (e.g. web)
• “social media”: across all relevant social media platforms
• “with context”: location, time, sentiment, influence

Page 17: Social Media Crawling and Mining Seminar (Motivation Part)

Conceptual Architecture and Main components

SEMANTIC MIDDLEWARE

Public Data

In-project Data

SEARCH & RECOMMENDATION

USER MODELLING & PRESENTATION

INDEXING

MINING

STORAGE

DATA COLLECTION / CRAWLING

• Real time dynamic topic and event clustering

• Trend, popularity and sentiment analysis

• Calculate trust/influence scores around people

• Personalized search, access & presentation based on social network interactions

• Semantic enrichment and discovery of services
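As a rough illustration of the trend analysis item above, the sketch below flags terms whose frequency jumps between two consecutive time windows. It is a simple frequency-ratio heuristic chosen for brevity, not the method used in the project; the function names and the thresholds `min_count` and `min_ratio` are assumptions.

```python
from collections import Counter


def term_counts(posts):
    """Count lowercase whitespace-separated tokens across a list of post texts."""
    counts = Counter()
    for text in posts:
        counts.update(text.lower().split())
    return counts


def trending_terms(previous_posts, current_posts, min_count=2, min_ratio=2.0):
    """Terms that are frequent now and much rarer in the previous window."""
    prev = term_counts(previous_posts)
    curr = term_counts(current_posts)
    scored = []
    for term, count in curr.items():
        ratio = count / (prev[term] + 1)  # +1 smoothing avoids division by zero
        if count >= min_count and ratio >= min_ratio:
            scored.append((term, ratio))
    # Highest ratio first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

In a streaming setting the two lists would be replaced by sliding windows over incoming posts, and stop words would need filtering before counting.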

Page 18: Social Media Crawling and Mining Seminar (Motivation Part)

Use Cases

Casual News application

Casual News Readers

Professional News application

Journalists, Editors, etc.

NEWS

EventLiveDashboard

Festival organizers

INFOTAINMENT

Social Media Walls

Festival attendants

Page 19: Social Media Crawling and Mining Seminar (Motivation Part)

“It has changed the way we do news” (MSN)

“Social media is the key place for emerging stories – internationally, nationally, locally” (BBC)

“Social media is transforming the way we do journalism” (New York Times)

Source: picture alliance / dpa

Page 20: Social Media Crawling and Mining Seminar (Motivation Part)

Source: Getty Images

“It’s really hard to find the nuggets of useful stuff in an ocean of content” (BBC)

“Things that aren’t relevant crowd out the content you are looking for” (MSN)

“The filters aren’t configurable enough” (CNN)

Page 21: Social Media Crawling and Mining Seminar (Motivation Part)

Verification was simpler in the past...

Source: Frank Grätz

Page 22: Social Media Crawling and Mining Seminar (Motivation Part)

Infotainment
• Events with large numbers of visitors
• Thessaloniki International Film Festival
– 80,000 viewers / 100,000 visitors in 10 days
– 150 films, 350 screenings
• Discovery and presentation of relevant aggregated social media
– Trending Topics
– Sentiment
– Tweet – film matching
– Visualization (Social Walls)
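The tweet-to-film matching step can be approximated, in its simplest form, by checking how much of each film title appears in a tweet. The sketch below is one such baseline under stated assumptions: the titles in the usage note are invented for illustration, and `tokens`, `match_film` and the `min_overlap` threshold are hypothetical names and values.

```python
def tokens(text):
    """Lowercase alphanumeric tokens; punctuation and '#' are dropped."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def match_film(tweet, film_titles, min_overlap=0.5):
    """Return the title whose tokens are best covered by the tweet, or None."""
    tweet_tokens = tokens(tweet)
    best_title, best_score = None, 0.0
    for title in film_titles:
        title_tokens = tokens(title)
        if not title_tokens:
            continue
        # Share of the title's words that also occur in the tweet.
        score = len(title_tokens & tweet_tokens) / len(title_tokens)
        if score > best_score:
            best_title, best_score = title, score
    return best_title if best_score >= min_overlap else None
```

For example, with the made-up titles `["The Silent Harbour", "Winter Roads"]`, the tweet "Loved Silent Harbour tonight!" matches the first title because two of its three words appear. Short or generic titles would need extra signals such as hashtags, screening times or venue mentions.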

Page 23: Social Media Crawling and Mining Seminar (Motivation Part)

Other Application Areas

• Science
– Sociology, machine learning (machine as a teacher), computer vision (annotation)
• Tourism – Leisure – Culture
– Off-the-beaten path POI extraction
• Marketing
– Brand monitoring, personalised ads
• Prediction
– Politics: election results
• News
– Topics, trends, event detection
• Others
– Environment, emergency response, energy saving, etc.

Page 24: Social Media Crawling and Mining Seminar (Motivation Part)

Conclusions – Further topics
• Social media data useful in many applications
• Not all data always available (e.g. user queries, Facebook)
– Infrastructure
– Policy - Privacy issues
• Real-time and scalable approaches
– Efficiency of semantics and analysis vs. performance vs. infrastructure
• Fusion of various modalities
– Content, social, temporal, location
• Verification & Linking other sources (web, Linked Open Data)
• Visualization - Interfaces
• Applications and commercialization
• User engagement

Page 25: Social Media Crawling and Mining Seminar (Motivation Part)

Reusable results

• Starting point: http://www.socialsensor.eu/results
– Deliverables
– Publications
– Datasets
– Software
– e-letter: http://stcsn.ieee.net/e-letter/vol-1-no-3
• Open-source projects (Apache License v2): https://github.com/socialsensor
– Data collection (stream-manager, storm-focused-crawler)
– Indexing (framework-client, multimedia-indexing)
– Mining (topic-detection, multimedia-analysis, community-evolution-analysis, social-event-detection)

Page 26: Social Media Crawling and Mining Seminar (Motivation Part)

European Centre for Social Media

• Topics
– Social media analytics
– Verification
– Visualisation
– Applications in different domains
• Activities
– Listings of projects, results, institutions, events
– Community building
– Support/organise events
– Common social media presence (e.g. LinkedIn)
– Funding from subscriptions, training, commercialisation
– Supporting projects: SocialSensor, Reveal, MULTISENSOR, PHEME, DecarboNet, MWCC, uComp
– Website: http://www.socialmediacentre.eu/
– Research-academic: STCSN http://stcsn.ieee.net/

Page 27: Social Media Crawling and Mining Seminar (Motivation Part)

Thank you for your [email protected]

http://mklab.iti.gr