Jordi Nin – Hermes: Distributed social network monitoring system - NoSQL matters Barcelona 2014
Hermes
Distributed social network monitoring system
Daniel Cea and Jordi Nin
Barcelona Supercomputing Center (BSC)
Universitat Politècnica de Catalunya (UPC)
{dcea, nin}@ac.upc.edu
Problem formulation
Platform to build social relations among people
who share interests, activities, backgrounds or
real-life connections.
New issues arise:
privacy, child safety,
addiction.
Problem formulation
§ Rise of social networks -> large amounts of
social data.
§ Two main problems: multiple sources +
hardware limitations.
§ Solution: implement a distributed, scalable
social media analyser that gathers data from
multiple sources and shows the aggregated results in real time.
Objectives
Input web interface:
§ Start a new query.
§ Control the data
collection.
§ Query history.
Objectives
Backend:
§ Render interfaces.
§ Gather data from external
APIs.
§ Enrich and store data into a
NoSQL database.
Objectives
Output web interface:
§ See aggregated
results.
§ Filter results.
§ Customize how the
results are displayed.
Data Process
JavaScript (client and server side)
§ Platform: Node.js
§ Web framework: Express
§ Sentiment analysis:
Dictionaries obtained from Amazon Mechanical Turk*
* Amy Beth Warriner, Victor Kuperman, Marc Brysbaert. "Norms of valence, arousal, and dominance for 13,915 English
lemmas". December 2013, Ghent University.
Description
Implementation structured in 3 layers, following a Model-View-Controller
pattern:
• Data access -> Storage and indexing of documents
(JSON) and queries.
• Business logic -> Start query, manage data stream,
process + enrich tweets, send them to storage.
• User Interface -> Allow user control of the system.
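The three layers above can be sketched roughly as follows. This is only an illustration of the separation of concerns, not the Hermes code: `InMemoryStore`, `QueryProcessor`, `renderSummary` and the `addLength` enricher are hypothetical names, and an in-memory map stands in for the NoSQL database.

```javascript
// Data access layer: an in-memory stand-in for the document store (assumption).
class InMemoryStore {
  constructor() {
    this.docs = new Map();
  }
  save(doc) {
    this.docs.set(doc.id, doc);
  }
  findByQuery(queryId) {
    return [...this.docs.values()].filter((d) => d.queryId === queryId);
  }
}

// Business logic layer: runs each incoming message through the enrichers
// in order and hands the result to the storage layer.
class QueryProcessor {
  constructor(store, enrichers) {
    this.store = store;
    this.enrichers = enrichers;
  }
  handle(queryId, message) {
    const enriched = this.enrichers.reduce(
      (doc, enrich) => enrich(doc),
      { ...message, queryId }
    );
    this.store.save(enriched);
    return enriched;
  }
}

// User interface layer: reduced here to a function that summarises results.
function renderSummary(store, queryId) {
  const docs = store.findByQuery(queryId);
  return `${docs.length} message(s) stored for query "${queryId}"`;
}

// Usage with one toy enricher (hypothetical) that records the text length.
const store = new InMemoryStore();
const addLength = (doc) => ({ ...doc, textLength: doc.text.length });
const processor = new QueryProcessor(store, [addLength]);
processor.handle('q1', { id: 't1', text: 'hello 9N' });
console.log(renderSummary(store, 'q1'));
```

Keeping the store behind `save` and `findByQuery` means the business logic does not need to know which database sits underneath.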
Enrichers
Stream slots implement the following data enrichers:
§ Device enricher: Determines the device used to
write the message.
§ Geo enricher: Filters messages by geo-location.
§ Spain enricher: For messages coming from Spain,
determines the autonomous community.
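As a minimal sketch of the first two enrichers, assume each message carries a `source` field holding an HTML anchor that names the client, and a `coordinates` field as a `[longitude, latitude]` pair (Twitter API tweets of that period exposed similar data, but the exact field shapes here are assumptions). `deviceEnricher`, `makeGeoFilter` and the bounding box values are hypothetical; the Spain enricher is left out because it would need region boundary data.

```javascript
// Device enricher: extracts the client name from an HTML anchor in `source`.
function deviceEnricher(msg) {
  const match = /<a[^>]*>([^<]*)<\/a>/.exec(msg.source || '');
  return { ...msg, device: match ? match[1] : 'unknown' };
}

// Geo filter: keeps messages whose [longitude, latitude] falls inside a box.
function makeGeoFilter({ west, south, east, north }) {
  return (msg) => {
    if (!msg.coordinates) return false;
    const [lon, lat] = msg.coordinates;
    return lon >= west && lon <= east && lat >= south && lat <= north;
  };
}

// Made-up sample messages.
const messages = [
  { id: 1, source: '<a href="https://example.com/app">Example for Android</a>', coordinates: [2.17, 41.38] },
  { id: 2, source: '<a href="https://example.com/web">Example Web Client</a>', coordinates: [-3.70, 40.42] },
  { id: 3, source: '', coordinates: null },
];

// Rough, illustrative bounding box around Catalonia (not an official boundary).
const inBox = makeGeoFilter({ west: 0.1, south: 40.5, east: 3.4, north: 42.9 });
const enriched = messages.filter(inBox).map(deviceEnricher);
console.log(enriched);
```

Only the first sample message falls inside the box, so it is the only one that reaches the device enricher.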
Enrichers
§ Stopwords enricher: Removes stop words from
the text.
§ Stemmer enricher: Stems the previously
filtered words.
§ Sentiment enricher: Determines the sentiment
and arousal of the stemmed message.
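The three text enrichers could be chained along these lines. This is a sketch under stated assumptions: the stop word list is a tiny sample, `stem` is a naive suffix stripper rather than a real stemming algorithm, and the valence/arousal numbers in `LEXICON` are invented for the example (on a 1 to 9 scale, as in the cited norms) rather than taken from the actual dictionaries.

```javascript
// Tiny illustrative stop word list (a real list would be much larger).
const STOP_WORDS = new Set(['the', 'a', 'is', 'and', 'of', 'to', 'in']);

// Hypothetical scores keyed by stem; the numbers are made up for this example.
const LEXICON = new Map([
  ['celebrat', { valence: 7.5, arousal: 6.5 }],
  ['protest', { valence: 3.5, arousal: 5.5 }],
]);

// Naive suffix stripping, standing in for a real stemmer.
function stem(word) {
  for (const suffix of ['ing', 'ed', 'es', 's']) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Lower-cases and tokenises the text, then drops stop words.
function stopwordsEnricher(msg) {
  const tokens = msg.text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return { ...msg, tokens: tokens.filter((t) => !STOP_WORDS.has(t)) };
}

// Stems the tokens kept by the stop word step.
function stemmerEnricher(msg) {
  return { ...msg, stems: msg.tokens.map(stem) };
}

// Averages valence and arousal over the stems found in the lexicon.
function sentimentEnricher(msg) {
  const hits = msg.stems.filter((s) => LEXICON.has(s)).map((s) => LEXICON.get(s));
  if (hits.length === 0) return { ...msg, sentiment: null };
  const mean = (key) => hits.reduce((sum, h) => sum + h[key], 0) / hits.length;
  return { ...msg, sentiment: { valence: mean('valence'), arousal: mean('arousal') } };
}

const result = [stopwordsEnricher, stemmerEnricher, sentimentEnricher].reduce(
  (msg, enrich) => enrich(msg),
  { text: 'The people celebrated and protested in the streets' }
);
console.log(result.stems, result.sentiment);
```

The order matters: the stemmer works on the output of the stop word step, and the sentiment lookup works on stems, so the lexicon has to be keyed by stems as well.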
Use case: 9N referendum
§ What? -> The 9N
unofficial Catalan
independence referendum
§ When? -> From 7th Nov.
2014 to 11th Nov. 2014.
§ Where? -> Catalonia
Use case: 9N referendum
§ How? -> Storing all tweets with filters:
§ Location: none.
§ Language: none.
§ Text: Contains “9N”.
§ Time: From Nov 7 at 00:00 to Nov 11 at 23:59.
§ Why? -> Analyse the reactions around the world before,
during and after the referendum.
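The text and time filters listed above amount to a simple predicate per message. In this sketch the `text` and `createdAt` field names are hypothetical, the window is assumed to be in UTC (the slides do not state a time zone), and the match on "9N" is assumed to be case-sensitive.

```javascript
// Collection window, assumed to be UTC for this example.
const WINDOW_START = Date.parse('2014-11-07T00:00:00Z');
const WINDOW_END = Date.parse('2014-11-11T23:59:59Z');

// True when the message was created inside the window and mentions "9N".
// No location or language filter is applied, matching the use case.
function matchesUseCase(msg) {
  const createdAt = Date.parse(msg.createdAt);
  const inWindow = createdAt >= WINDOW_START && createdAt <= WINDOW_END;
  return inWindow && msg.text.includes('9N');
}

// Made-up sample messages.
const samples = [
  { text: 'Queues at the polling stations #9N', createdAt: '2014-11-09T10:15:00Z' },
  { text: 'Looking back at 9N', createdAt: '2014-11-12T08:00:00Z' },
  { text: 'Nice weather in Barcelona', createdAt: '2014-11-09T12:00:00Z' },
];

const kept = samples.filter(matchesUseCase);
console.log(kept.map((m) => m.text));
```

A message with an unparsable date yields `NaN` from `Date.parse`, so both comparisons fail and the message is dropped.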
General conclusions
§ NoSQL technologies are crucial for the project. Couchbase + Elasticsearch + Kibana work
perfectly together.
§ Elasticsearch is flexible enough to allow fast
development and real-time queries.
§ Kibana lets us create rich plots with little
effort.
Future work
§ More data sources.
§ Better data enrichment.
§ Add user data context.
§ Percolation queries.