Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Implementing Link-Prediction for Social Networks in a Database System
Sara Cohen, Netanel Cohen-Tzemach
The Hebrew University of Jerusalem

Upload: nati-cohen-tzemach

Post on 18-Dec-2014


DESCRIPTION

Our project considers the problem of implementing link-prediction metrics for a social network on different types of database systems (MySQL, Redis and Neo4J). In particular, we study how the features of each database system affect the ease with which link prediction can be performed.

TRANSCRIPT

Page 1: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Implementing Link-Prediction for Social Networks in a Database System

Sara Cohen Netanel Cohen-Tzemach

The Hebrew University of Jerusalem

Page 2: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

About me

Page 3: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

What backend to choose?

● Premise: <1M Nodes

● DIY vs. existing

● Data model

● Limitations/Features

● TPC-C won't help...

Page 4: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Previous work

Compared databases, Limitations and Features, Data model

Implementation, Experiments, Measurements

N. Ruflin, H. Burkhart, and S. Rizzotti. Social-data storage-systems. In Databases and Social Networks, DBSocial '11, pages 7-12, New York, NY, USA, 2011. ACM.

Page 5: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Our work

● Implemented 7 Link-Prediction metrics

● Experimented on 10 social-networks

● Over 3 different backends
  ○ Relational (MySQL)
  ○ Key-Value (Redis)
  ○ Graph (Neo4J)

● What did we find?
  ○ Stay tuned :)

Page 6: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Link Prediction

Page 7: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Link Prediction

● Why Link Prediction?
  ○ Well researched
  ○ Useful
  ○ Multiple scoring functions

"Given a snapshot of a social network at time t, we seek to accurately predict the edges that will be added to a specific node during the interval from time t to a given future time t'."

D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM, 2003.

Page 8: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Link Prediction examples

● Common Neighbors
  ○ Only neighbors

● Katz measure
  ○ Paths

● Rooted PageRank
  ○ Random walk
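To make two of the example metrics concrete, here is a minimal Python sketch, not the project's implementation. The edge list is an assumption reconstructed from the example graph on the storage slides, the Katz score is truncated at a fixed walk length (`max_len`) rather than summed to infinity, and the function names and the `beta` default are hypothetical. Rooted PageRank is left out to keep the sketch short.

```python
from collections import defaultdict

# Assumed example graph: undirected edges reconstructed from the storage slides.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (5, 6)]


def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def common_neighbors(adj, x, y):
    """Common Neighbors: how many nodes are adjacent to both x and y."""
    return len(adj.get(x, set()) & adj.get(y, set()))


def katz(adj, x, y, beta=0.05, max_len=4):
    """Truncated Katz score: sum over l = 1..max_len of
    beta**l * (number of walks of length l from x to y)."""
    walks = {x: 1}  # walks[node] = number of walks of the current length from x
    score = 0.0
    for length in range(1, max_len + 1):
        next_walks = defaultdict(int)
        for node, count in walks.items():
            for neighbour in adj.get(node, ()):
                next_walks[neighbour] += count
        walks = next_walks
        score += (beta ** length) * walks.get(y, 0)
    return score


adj = build_adjacency(EDGES)
```

Common Neighbors only inspects the two neighbourhoods, while Katz has to expand walks step by step, which is why the path-based metrics stress a backend's join or traversal machinery much more.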

Page 9: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Storage systems: MySQL

● Relational database
  ○ Edges table
  ○ Stored procedures, Indices, "helper" tables

[Figure: example graph with nodes 1 to 6]

ID1  ID2
1    2
1    3
2    1
2    3
2    4
2    5
3    1

http://www.mysql.com/
InnoDB vs MyISAM: http://www.oracle.com/partners/en/knowledge-zone/mysql-5-5-innodb-myisam-522945.pdf
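The edges-table model can be sketched as follows. This is an illustration under assumptions, not the schema from the project: it uses Python's built-in `sqlite3` module as an in-memory stand-in for MySQL, the edge list is reconstructed from the example graph, and the table definition, key choice and helper names are hypothetical. Each undirected edge is stored in both directions, as the sample rows on the slide suggest.

```python
import sqlite3

# Assumed example graph: undirected edges reconstructed from the slides.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (5, 6)]


def load_edges(conn, edges):
    """Create the edges table and store every undirected edge in both directions."""
    conn.execute(
        "CREATE TABLE edges ("
        " id1 INTEGER NOT NULL,"
        " id2 INTEGER NOT NULL,"
        " PRIMARY KEY (id1, id2))"  # the key doubles as an index on id1
    )
    rows = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    conn.executemany("INSERT INTO edges (id1, id2) VALUES (?, ?)", rows)
    conn.commit()


def neighbors(conn, x):
    """Return the sorted neighbours of node x."""
    cur = conn.execute("SELECT id2 FROM edges WHERE id1 = ? ORDER BY id2", (x,))
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
load_edges(conn, EDGES)
```

Storing both directions doubles the row count but lets every neighbourhood lookup filter on `id1` alone.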

Page 10: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Storage systems: Redis

● Key-Value store
  ○ Adjacency sets
  ○ Lua functions, "helper" database

[Figure: example graph with nodes 1 to 6]

1: (2, 3)
2: (1, 3, 4, 5)
3: (1, 2, 5)
4: (2)
5: (2, 3, 6)
6: (5)

http://redis.io/
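The adjacency-set model can be imitated without a Redis server. The sketch below is a hypothetical in-memory stand-in: the `SetStore` class is invented for illustration and only mirrors the semantics of the Redis `SADD` and `SMEMBERS` commands, and the edge list is an assumption reconstructed from the example graph. Keys and members are strings, as they would be in Redis.

```python
# Assumed example graph: undirected edges reconstructed from the slides.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (5, 6)]


class SetStore:
    """Hypothetical in-memory stand-in for a key-value store holding sets."""

    def __init__(self):
        self._data = {}

    def sadd(self, key, *members):
        """Add members to the set at key; return how many were newly added."""
        target = self._data.setdefault(key, set())
        before = len(target)
        target.update(members)
        return len(target) - before

    def smembers(self, key):
        """Return a copy of the set at key (empty if the key is missing)."""
        return set(self._data.get(key, ()))


def load_graph(store, edges):
    """Store each node's neighbours as one set, keyed by the node id."""
    for u, v in edges:
        store.sadd(str(u), str(v))
        store.sadd(str(v), str(u))


store = SetStore()
load_graph(store, EDGES)
```

With this layout a node's whole neighbourhood comes back from a single key lookup, which is the access pattern the Lua functions on the later slide rely on.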

Page 11: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Storage systems: Neo4J

● Graph database
  ○ No modeling required
  ○ Cypher queries, Lucene "helper" index

[Figure: the example graph with nodes 1 to 6, shown twice]

http://www.neo4j.org/

Page 12: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Storage systems

● Why these systems?
  ○ Popular
  ○ Open Source

● Perfect implementation?
  ○ No. But,
    ■ Unbiased
    ■ Best practices
    ■ Same time-frame

Full implementation available on GitHub: github.com/natict/gdbb

Page 13: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Implementation of Common Neighbours

SQL

    select E2.id2 as y, count(E2.id1) as neighbor_count
    from edges as E1 join edges as E2
    where E1.id1 = x and E1.id2 = E2.id1
      and E1.id1 <> E2.id2
    group by y
    order by neighbor_count desc
    limit 100;

Cypher

    START a=node({n})
    MATCH (a)-[:COAUTH]->(b)<-[:COAUTH]-(c)
    WHERE a <> c
    RETURN a.nid,c.nid,count(b) as score
    ORDER BY score DESC
    LIMIT 100

Lua

    local tc = {};
    local x = KEYS[1];
    for k1,n in pairs(redis.call('smembers', x)) do
      for k2,y in pairs(redis.call('smembers', n)) do
        if x ~= y then tc[y] = (tc[y] or 0) + 1; end;
      end;
    end;
    local ttop = {}; -- Extract top 100 results
    local min = math.huge;
    local mini = '';
    for k,v in pairs(tc) do
      if (#ttop < 100) then
        table.insert(ttop, {k,v});
        if v<min then min=v; mini=table.maxn(ttop); end;
      else
        if v>min then
          ttop[mini] = {k,v};
          min = math.huge;
          for i = 1,#ttop,1 do
            if ttop[i][2]<min then min=ttop[i][2]; mini=i; end;
          end;
        end;
      end;
    end;
    -- Now we just need to sort, and format the output...
    ...
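As a cross-check of the slide's logic, the sketch below computes the same top-k Common Neighbours ranking in two ways and lets them be compared. It is an illustration under assumptions: `sqlite3` stands in for MySQL, a plain dictionary of sets stands in for Redis, the edge list is reconstructed from the example graph, the SQL uses an explicit `ON` clause and bound parameters, and ties are broken by node id so that both versions return the same order. Like the queries on the slide, it does not filter out nodes that are already neighbours of x.

```python
import heapq
import sqlite3
from collections import Counter

# Assumed example graph: undirected edges reconstructed from the slides.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (5, 6)]


def top_common_neighbors_sets(adj, x, limit=100):
    """Set-based version: count two-hop endpoints, then keep the top `limit`."""
    counts = Counter()
    for n in adj.get(x, ()):
        for y in adj.get(n, ()):
            if y != x:
                counts[y] += 1
    # Highest score first; ties broken by node id for a deterministic order.
    return heapq.nsmallest(limit, counts.items(), key=lambda kv: (-kv[1], kv[0]))


def top_common_neighbors_sql(conn, x, limit=100):
    """Relational version: self-join of the edges table, grouped by endpoint."""
    query = """
        SELECT E2.id2, COUNT(E2.id1) AS neighbor_count
        FROM edges AS E1 JOIN edges AS E2 ON E1.id2 = E2.id1
        WHERE E1.id1 = ? AND E1.id1 <> E2.id2
        GROUP BY E2.id2
        ORDER BY neighbor_count DESC, E2.id2 ASC
        LIMIT ?
    """
    return [tuple(row) for row in conn.execute(query, (x, limit))]


# Build both representations from the same edge list.
adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (id1 INTEGER, id2 INTEGER)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [(u, v) for u, v in EDGES] + [(v, u) for u, v in EDGES],
)
```

Calling both functions with the same node, for example `top_common_neighbors_sets(adj, 1)` and `top_common_neighbors_sql(conn, 1)`, should give identical lists, which is a cheap way to validate one backend's implementation against another.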

Page 14: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Cypher

    START a=node({n})
    MATCH (a)-[:COAUTH]->(b)<-[:COAUTH]-(c)
    WHERE a <> c
    RETURN a.nid,c.nid,count(b) as score
    ORDER BY score DESC
    LIMIT 100

[Figure: graph pattern with nodes a, b and c]

Page 15: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Datasets

● Undirected
● Medium sized
● Socially oriented
● Data sources
  ○ DBLP
  ○ SNAP

Name       # Nodes    # Edges
dblp-all   366,600    4,349,796
ca-HepPh   12,006     237,010
enron      36,692     367,662
facebook   4,039      170,174

DBLP in XML format: http://dblp.uni-trier.de/xml/
SNAP Datasets: http://snap.stanford.edu/data/index.html

Page 16: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Experiments

Detailed specifications and results: www.cs.huji.ac.il/~sara/link-prediction.html

Page 17: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Experiments (2)


Page 18: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Experiments (3)


Page 19: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Conclusions

● MySQL is highly optimised
  ○ mainly for simple queries (with few joins)

● Redis is very flexible and fast
  ○ mainly with complex metrics

● Neo4J has implementation simplicity
  ○ with some limitations
  ○ still evolving at a fast pace

● Future work
  ○ More databases
  ○ More algorithms

Page 20: Implementing Link-Prediction for Social Networks in a Database System (DBSocial2013)

Thank you

Nati (Netanel) Cohen-Tzemach
linkedin.com/in/natict

Acknowledgments:
● Israel Science Foundation (Grant 143/09)
● Ministry of Science and Technology (Grant 3-8710)
● DBSocial Travel Award