![Page 1: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/1.jpg)
SociaLite: A Datalog-based Language for Large-Scale Graph Analysis
Jiwon Seo
MobiSocial Research Group
![Page 2: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/2.jpg)
Overview
![Page 3: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/3.jpg)
Overview
- SociaLite: language for large-scale graph analysis
  - Extensions to Datalog
  - Compiler optimizations for SociaLite queries
  - Graph algorithms in SociaLite
- HW 6 (using SociaLite)
![Page 4: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/4.jpg)
Motivation & Challenges
- Analysis of large-scale graphs is important
  - IT industry: Twitter, LinkedIn, Facebook, Pinterest
  - Bioinformatics, etc.
- Challenges
  - Difficulty of distributed programming
  - Complexity of graph algorithms
![Page 5: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/5.jpg)
State of the Art Technology
- MapReduce
  - MapReduce-based graph systems (HaLoop)
- Pregel
  - Vertex-centric programming
→ Programming model is too low-level
![Page 6: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/6.jpg)
Distributed Graph Language
- SociaLite
  - Abstractions for graph algorithms
  - Datalog-based query language
→ Graph algorithms are written in a high-level language and compiled to parallel/distributed code
![Page 7: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/7.jpg)
Single-Source Shortest Paths (SSSP)
- Shortest distances from a single source node to the rest of the nodes in a graph
- Core graph algorithm and a running example in this talk
![Page 8: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/8.jpg)
SSSP in Datalog
- Tables to store the graph and path distances
  - Edge(src, target, length).
  - Path(target, distance).
  - MinPath(target, minimum-distance).

Path(t, d) :- Edge(1, t, d).                          (1)
Path(t, d) :- Path(s, d1), Edge(s, t, d2), d=d1+d2.   (2)
MinPath(t, $min(d)) :- Path(t, d).                    (3)
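To make rules (1) to (3) concrete, here is a minimal Python sketch of naive bottom-up evaluation. It is not from the slides: the toy graph `EDGES` and the function name `sssp_plain_datalog` are hypothetical, and the graph is deliberately acyclic, because on a cyclic graph rule (2) would keep deriving longer and longer paths.

```python
# Hypothetical toy graph as Edge(src, target, length) tuples; acyclic on purpose.
EDGES = {(1, 2, 4), (1, 3, 1), (3, 2, 2), (2, 4, 5)}


def sssp_plain_datalog(edges, source=1):
    """Naively evaluate rules (1)-(3); returns (Path, MinPath)."""
    # Rule (1): Path(t, d) :- Edge(source, t, d).
    path = {(t, d) for (s, t, d) in edges if s == source}
    while True:
        # Rule (2): Path(t, d) :- Path(s, d1), Edge(s, t, d2), d = d1 + d2.
        derived = {
            (t, d1 + d2)
            for (s, d1) in path
            for (s2, t, d2) in edges
            if s2 == s
        }
        if derived <= path:  # nothing new was derived: stop
            break
        path |= derived
    # Rule (3): MinPath(t, $min(d)) :- Path(t, d).
    min_path = {}
    for t, d in path:
        min_path[t] = min(d, min_path.get(t, d))
    return path, min_path


PATH, MIN_PATH = sssp_plain_datalog(EDGES)
```

Note that `PATH` keeps every derived distance for every node, including the ones that rule (3) later throws away.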
![Page 9: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/9.jpg)
Datalog Performance
- Execution times for SSSP*
- Datalog is 30 to 250 times slower than Java

| System | Exec time (seconds) |
| --- | --- |
| Overlog | 24.9 |
| IRIS | 12.0 |
| LogicBlox | 3.4 |
| Java** | 0.1 |

* Synthetic graph with 100K nodes, 1M edges
** Java program implementing Dijkstra's algorithm
![Page 10: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/10.jpg)
Why is Datalog Slow?
- Execution time is wasted on sub-optimal paths

Path(t, d) :- Edge(1, t, d).                          (1)
Path(t, d) :- Path(s, d1), Edge(s, t, d2), d=d1+d2.   (2)
MinPath(t, $min(d)) :- Path(t, d).                    (3)
![Page 11: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/11.jpg)
Why is Datalog Slow?
- Inefficient data structure for graphs

[Figure: the same graph stored as an adjacency list (Java) vs. a flat table of edge rows (Datalog)]
![Page 12: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/12.jpg)
Extensions in SociaLite
- Recursive aggregate functions
- Tail-nested tables (data layout extension)
![Page 13: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/13.jpg)
Recursive Aggregate Functions
- Aggregate functions inside recursion
  → Pruning suboptimal answers
- Syntax and semantics

Shortest Paths in SociaLite
Path(t, $min(d)) :- Edge(1, t, d);
                 :- Path(s, d1), Edge(s, t, d2), d=d1+d2.
![Page 14: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/14.jpg)
Fixed-Point Semantics
- Equation for recursive rules: R = h(R) = (g ∘ f)(R)

Path(t, $min(d)) :- Edge(1, t, d);
                 :- Path(s, d1), Edge(s, t, d2), d=d1+d2.

e.g., for the rule above:

$$f(R) = \{ \langle t, d \rangle \mid \mathrm{Edge}(1, t, d) \lor (\langle s, d_1 \rangle \in R \land \mathrm{Edge}(s, t, d_2) \land d = d_1 + d_2) \}$$

$$g(R) = \{ \langle t, \min_{\langle t, d \rangle \in R} d \rangle \}$$
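The equation R = h(R) = (g ∘ f)(R) can be tried out directly. The following Python sketch, which is an illustration and not SociaLite's implementation, applies f and then g repeatedly until R stops changing. The graph `EDGES` and all function names are hypothetical. Unlike plain Datalog, this terminates on a graph with a cycle, because g keeps only the minimum distance per node.

```python
# Hypothetical Edge(src, target, length) tuples; note the cycle 1 -> 3 -> 2 -> 1.
EDGES = {(1, 2, 4), (1, 3, 1), (3, 2, 2), (2, 1, 1), (2, 4, 5)}


def f(R, edges, source=1):
    """One application of the rule bodies: base case plus one-hop extensions of R."""
    base = {(t, d) for (s, t, d) in edges if s == source}
    step = {
        (t, d1 + d2)
        for (s, d1) in R.items()
        for (s2, t, d2) in edges
        if s2 == s
    }
    return base | step


def g(tuples):
    """The $min aggregate: keep only the smallest distance per target."""
    best = {}
    for t, d in tuples:
        best[t] = min(d, best.get(t, d))
    return best


def fixed_point(edges, source=1):
    """Naive evaluation: iterate R = g(f(R)) until nothing changes."""
    R = {}
    while True:
        nxt = g(f(R, edges, source))
        if nxt == R:
            return R
        R = nxt


SHORTEST = fixed_point(EDGES)
```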
![Page 15: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/15.jpg)
![Page 16: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/16.jpg)
Fixed-Point Semantics
R = h(R) = (g ∘ f)(R)
- If g is a meet operator, and f is monotone under g
  → naïve evaluation converges to a fixed point
  → the solution is the greatest fixed-point solution
- g induces a partial order ⊑ and a semi-lattice
- f is monotone: x ⊑ y ⇒ f(x) ⊑ f(y), where ⊑ is the order induced by g

Path(t, $min(d)) :- Edge(1, t, d);
                 :- Path(s, d1), Edge(s, t, d2), d=d1+d2.
![Page 17: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/17.jpg)
Optimizations
- Semi-naïve evaluation
  - Optimized evaluation for recursion
  - Uses the delta (newly derived tuples) as input for each round of evaluation
  → Transforms the shortest-paths program into the Bellman-Ford algorithm
- Prioritization by ⊑ (the partial order induced by g)
  - e.g., in shortest paths, prioritize the tuple with the minimum distance ($min)
  → Transforms the program into Dijkstra's algorithm

* SociaLite: Datalog Extensions for Efficient Social Network Analysis, ICDE'13
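The two optimizations can be compared side by side in ordinary Python. This is a hand-written sketch of the resulting algorithms, not compiler output: `seminaive_sssp` feeds only the delta of the previous round into the next one (Bellman-Ford style), while `prioritized_sssp` always expands the tuple with the minimum distance first (Dijkstra style). The adjacency map `ADJ` and both function names are hypothetical.

```python
import heapq

# Hypothetical graph: node -> list of (target, length).
ADJ = {1: [(2, 4), (3, 1)], 3: [(2, 2)], 2: [(1, 1), (4, 5)]}


def seminaive_sssp(adj, source):
    """Semi-naive evaluation: each round joins only the newly improved tuples."""
    dist = {}
    for t, d in adj.get(source, []):  # base case: Edge(source, t, d)
        if d < dist.get(t, float("inf")):
            dist[t] = d
    delta = dict(dist)
    while delta:
        new_delta = {}
        for s, d1 in delta.items():
            for t, d2 in adj.get(s, []):
                d = d1 + d2
                if d < dist.get(t, float("inf")):  # $min prunes worse paths
                    dist[t] = d
                    new_delta[t] = d
        delta = new_delta
    return dist


def prioritized_sssp(adj, source):
    """Prioritized evaluation: process the minimum-distance tuple first."""
    dist = {}
    heap = [(d, t) for t, d in adj.get(source, [])]
    heapq.heapify(heap)
    while heap:
        d, t = heapq.heappop(heap)
        if t in dist:  # already settled with a distance that is no larger
            continue
        dist[t] = d
        for u, d2 in adj.get(t, []):
            if u not in dist:
                heapq.heappush(heap, (d + d2, u))
    return dist
```

With non-negative edge lengths both functions return the same distances; the prioritized version settles each node once instead of revising it over several rounds.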
![Page 18: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/18.jpg)
Tail-nested Tables
- Tables with nesting at the last column
  - Annotation in table declarations, e.g.
    Edge(int s, (int t)).
- Data as index
  - Range annotation (N1..N2) → array index, e.g.
    Edge(int s:1..10, (int t)).
![Page 19: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/19.jpg)
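[Figure: the same Edge data laid out three ways]
- Edge(int s, int t). Flat table with one (s, t) row per edge
- Edge(int s, (int t)). Tail-nested table with the targets of each s grouped together
- Edge(int s:0..10, (int t)). Tail-nested table with s used as an array index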
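As a rough analogy in Python (the actual SociaLite storage format is not described on the slides), the three declarations correspond to a list of rows, a map from source to nested targets, and an array indexed by source. The rows in `ROWS` and the function names are hypothetical.

```python
# Hypothetical (src, target) rows; not copied from the slide figure.
ROWS = [(1, 2), (1, 9), (1, 10), (2, 5), (2, 7), (2, 9)]


def targets_flat(rows, s):
    """Edge(int s, int t): a flat table needs a scan to find the targets of s."""
    return [t for (src, t) in rows if src == s]


def build_nested(rows):
    """Edge(int s, (int t)): one entry per s, with its targets nested inside."""
    nested = {}
    for src, t in rows:
        nested.setdefault(src, []).append(t)
    return nested


def build_indexed(rows, lo, hi):
    """Edge(int s:lo..hi, (int t)): s is used directly as an array index."""
    table = [[] for _ in range(hi - lo + 1)]
    for src, t in rows:
        table[src - lo].append(t)
    return table


NESTED = build_nested(ROWS)
INDEXED = build_indexed(ROWS, 0, 10)
```

Looking up the neighbors of node 2 is a full scan in the flat layout, a hash lookup in `NESTED`, and a plain array access in `INDEXED`.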
![Page 20: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/20.jpg)
Distributed Tables
- Partitioning (sharding) by first column
- Hash-based partitioning

Edge(int src, (int target)).

[Figure: Edge rows distributed across Machine 1 and Machine 2 by a hash of src]
![Page 21: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/21.jpg)
Distributed Tables
- Range-based partitioning

Edge(int src:0..10, (int target)).

[Figure: Edge rows distributed across Machine 1 and Machine 2 by the range that src falls in]
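Both partitioning schemes for the Edge table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the modulo stands in for whatever hash function the system really uses, the range split divides `lo..hi` into equal contiguous blocks, and `EDGES` plus all function names are hypothetical.

```python
# Hypothetical (src, target) rows.
EDGES = [(1, 2), (1, 9), (1, 10), (2, 5), (2, 7), (9, 11)]


def hash_machine(src, num_machines):
    """Hash-based partitioning; modulo is a stand-in for a real hash function."""
    return src % num_machines


def range_machine(src, lo, hi, num_machines):
    """Range-based partitioning: lo..hi is cut into contiguous blocks."""
    span = hi - lo + 1
    return (src - lo) * num_machines // span


def partition(edges, machine_of, num_machines):
    """Place every row on the machine chosen by its first column."""
    shards = [[] for _ in range(num_machines)]
    for src, target in edges:
        shards[machine_of(src)].append((src, target))
    return shards


BY_HASH = partition(EDGES, lambda s: hash_machine(s, 2), 2)
BY_RANGE = partition(EDGES, lambda s: range_machine(s, 0, 10, 2), 2)
```

In both cases all rows that share a src land on the same machine, so the nested targets of a node never have to be split.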
![Page 22: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/22.jpg)
Distributed Execution
Foo(int a, int b).
Bar(int a, int b).
Qux(int a, int b).
Foo(a, c) :- Bar(a, b), Qux(b, c).

[Figure: Bar(1, 2) is joined with Qux(2, 9) to derive Foo(1, 9); tuples are transferred between Machine 1 and Machine 2 for the join and for storing the result]
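The join-and-transfer pattern can be simulated in a single Python process. The sketch below assumes that every table is sharded by its first column, so a Bar tuple is shipped to the machine that owns the matching Qux rows and the derived Foo tuple is shipped to the machine that owns its first column. The shard function, the machine numbering, and the function names are assumptions made for illustration, not details taken from the slides.

```python
def shard(key, num_machines):
    """Hypothetical placement rule: first column modulo the machine count."""
    return key % num_machines


def distributed_join(bar, qux, num_machines=2):
    """Evaluate Foo(a, c) :- Bar(a, b), Qux(b, c) over sharded tables.

    Returns the Foo shards and the number of tuples sent between machines.
    """
    bar_shards = [[] for _ in range(num_machines)]
    qux_shards = [{} for _ in range(num_machines)]
    for a, b in bar:
        bar_shards[shard(a, num_machines)].append((a, b))
    for b, c in qux:
        qux_shards[shard(b, num_machines)].setdefault(b, []).append(c)

    foo_shards = [set() for _ in range(num_machines)]
    transfers = 0
    for m in range(num_machines):
        for a, b in bar_shards[m]:
            j = shard(b, num_machines)  # machine holding Qux(b, _)
            if j != m:
                transfers += 1  # ship the Bar tuple to the join site
            for c in qux_shards[j].get(b, []):
                k = shard(a, num_machines)  # machine that stores Foo(a, _)
                if k != j:
                    transfers += 1  # ship the derived Foo tuple home
                foo_shards[k].add((a, c))
    return foo_shards, transfers


FOO, TRANSFERS = distributed_join([(1, 2)], [(2, 9)])
```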
![Page 23: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/23.jpg)
Parallel Prioritization of Recursive Aggregation
Path(t, $min(d)) :- t=0, d=0;
                 :- Path(s, d1), Edge(s, t, d2), d=d1+d2.

ΔPath, grouped into buckets by distance:
- (d < 2.0): Path(1, 1.1), Path(2, 1.5), Path(1, 1.2)
- (2.0 < d < 3.0): Path(3, 2.1), Path(4, 2.7), Path(0, 2.3)
- (3.0 < d < 4.0): Path(9, 3.1), Path(7, 3.9), Path(4, 3.3)
- …

→ Delta-stepping algorithm, generalized and applied to recursive aggregate functions
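The bucketing idea can be sketched sequentially in Python. This is a simplified illustration of delta-stepping and not the generalized version the slide refers to: tuples are grouped into buckets of a fixed `width`, the lowest bucket is drained first, and everything inside one bucket could in principle be processed in parallel. The graph `ADJ`, the width, and the function name are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical graph: node -> list of (target, length); node 0 is the source.
ADJ = {0: [(1, 1.5), (2, 2.5)], 1: [(2, 0.5), (3, 2.0)], 2: [(3, 0.5)]}


def bucketed_sssp(adj, source, width):
    """Process Path tuples bucket by bucket, lowest distance range first."""
    dist = {source: 0.0}  # Path(t, $min(d)) :- t = source, d = 0
    buckets = defaultdict(set)
    buckets[0].add(source)
    while buckets:
        i = min(buckets)  # lowest non-empty bucket
        frontier = buckets.pop(i)
        for s in frontier:  # these tuples are independent of each other
            if int(dist[s] // width) != i:
                continue  # stale entry: s has since moved to a lower bucket
            for t, d2 in adj.get(s, []):
                d = dist[s] + d2
                if d < dist.get(t, math.inf):
                    dist[t] = d
                    buckets[int(d // width)].add(t)
    return dist


DIST = bucketed_sssp(ADJ, 0, 1.0)
```

A small `width` behaves like Dijkstra's algorithm (strict priority, little parallelism), while a very large `width` behaves like Bellman-Ford (everything in one bucket).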
![Page 24: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/24.jpg)
Other Optimizations
- Condition folding
  - Binary search for sorted columns, e.g.
    Foo(a,b) :- Bar(a,b), b > 10.
- Pipelining
  - Evaluate multiple rules at once (locality), e.g.
    Foo(a,b) :- Bar(a,b), b > 10.
    Baz(a,c) :- Foo(a,b), c=b*b.
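Both optimizations have simple Python analogues. In the sketch below, Bar is stored column-wise and sorted by b (an assumption for illustration), so the condition b > 10 becomes a binary search instead of a scan. The pipelined version then derives the Foo tuple and the squared value b*b for Baz in the same pass over each matching row. The column data and function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical Bar(a, b) table stored as two columns, sorted by b.
COL_A = [7, 3, 5, 1]
COL_B = [2, 10, 11, 40]


def fold_condition(col_a, col_b, threshold):
    """Foo(a, b) :- Bar(a, b), b > threshold, via binary search on sorted b."""
    start = bisect_right(col_b, threshold)  # first position with b > threshold
    return list(zip(col_a[start:], col_b[start:]))


def pipelined(col_a, col_b, threshold):
    """Derive Foo and Baz together while each row is still at hand."""
    foo, baz = [], []
    start = bisect_right(col_b, threshold)
    for a, b in zip(col_a[start:], col_b[start:]):
        foo.append((a, b))       # Foo(a, b)
        baz.append((a, b * b))   # Baz derived from Foo(a, b) with b*b
    return foo, baz
```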
![Page 25: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/25.jpg)
Other Optimizations
- Approximate computation
  - Table columns as Bloom filters
  - Store large intermediate results approximately
![Page 26: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/26.jpg)
![Page 27: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/27.jpg)
![Page 28: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/28.jpg)
Approximation w/ Bloom Filter
Foaf(i, ff) :- Friend(i, f), Friend(f, ff).
LocalCount(i, $inc(1)) :- Foaf(i, ff), Attr(ff, "Some Attr").
(2nd column of Foaf table is represented with a Bloom filter)

| | Exact | Approximation | Comparison |
| --- | --- | --- | --- |
| Exec time (min) | 28.9 | 19.4 | 32.8% faster |
| Memory usage (GB) | 26.0 | 3.0 | 11.5% of exact usage |
| Accuracy (<10% error) | 100.0% | 92.5% | |

* LiveJournal (4.8M nodes, 68M edges)
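The following Python sketch shows one plausible reading of this approximation; the slides do not spell out the mechanism, so the details are assumptions. Instead of storing every friend-of-friend per user, a small Bloom filter remembers which ones were already seen. A false positive makes a new friend-of-friend look like a duplicate, so the count can come out too low but never too high. The `BloomFilter` class, its parameters, and `local_count` are hypothetical.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter: a bit array plus a few hash positions per item."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def local_count(friend, attr):
    """Count distinct friends-of-friends carrying the attribute, approximately."""
    counts = {}
    for i, fs in friend.items():
        seen = BloomFilter()  # stands in for the 2nd column of Foaf(i, ff)
        n = 0
        for f in fs:
            for ff in friend.get(f, ()):
                if ff in seen:  # probably derived before (may be a false positive)
                    continue
                seen.add(ff)
                if ff in attr:
                    n += 1
        counts[i] = n
    return counts


COUNTS = local_count({"a": ["b"], "b": ["c"], "c": []}, {"c"})
```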
![Page 29: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/29.jpg)
Python Integration
- SociaLite queries embedded in Python code
  - `Queries are quoted in backticks`
- Python ↔ SociaLite
  - Python functions and variables are accessible in SociaLite queries (with prefix $)
  - SociaLite tables are readable from Python
![Page 30: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/30.jpg)
Python Integration
PageRank in Python & SociaLite

`Rank(n, 0, r) :- Node(n), r=1.0/$N.`
for i in range(50):
    `Rank(pi, $i+1, $sum(r)) :- Node(pi), r=0.15*1.0/$N;
                             :- Rank(pj, $i, r1), Edge(pj, pi), EdgeCnt(pj, cnt), r=0.85*r1/cnt.`
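For readers without a SociaLite runtime, the same computation can be written in plain Python. This sketch mirrors the two rule bodies: every node starts each iteration with 0.15/N, and each edge (pj, pi) adds 0.85 times the rank of pj divided by its out-degree. The function name `pagerank` and the example graph are hypothetical, and nodes without outgoing edges simply pass on no rank, as in the query.

```python
def pagerank(nodes, edges, iterations=50):
    """Iterative PageRank following the structure of the SociaLite query."""
    n = len(nodes)
    # EdgeCnt(pj, cnt): number of outgoing edges per node.
    edge_cnt = {}
    for pj, _ in edges:
        edge_cnt[pj] = edge_cnt.get(pj, 0) + 1
    # Rank(n, 0, r) :- Node(n), r = 1.0 / N.
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # First body: r = 0.15 * 1.0 / N for every node.
        nxt = {v: 0.15 * 1.0 / n for v in nodes}
        # Second body: $sum over incoming edges of 0.85 * r1 / cnt.
        for pj, pi in edges:
            nxt[pi] += 0.85 * rank[pj] / edge_cnt[pj]
        rank = nxt
    return rank


# Hypothetical 3-node cycle: by symmetry every node keeps rank 1/3.
RANKS = pagerank([0, 1, 2], [(0, 1), (1, 2), (2, 0)])
```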
![Page 31: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/31.jpg)
Evaluation
Benchmark algorithms
- Shortest Paths
- PageRank
- Mutual Neighbors
- Connected Components
- Finding Triangles
- Clustering Coefficients
→ Evaluated on a single core, on multiple cores, and on a distributed cluster
![Page 32: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/32.jpg)
Input Graph for Single-Core

| Source | Type | Size | Used for | Machine |
| --- | --- | --- | --- | --- |
| LiveJournal | directed | 4.8M nodes, 68M edges | Shortest Paths, PageRank | Intel Xeon 2.80GHz, 32GB memory |
| Last.fm | undirected | 1.7M nodes, 6.4M edges | Mutual Neighbors, Connected Components, Triangle, Clustering Coefficients | Intel Core2 2.66GHz, 3GB memory |
![Page 33: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/33.jpg)
Speedup from Tail-nested Tables
![Page 34: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/34.jpg)
Speedup from Other Optimizations
![Page 35: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/35.jpg)
SociaLite (w/ optimizations)
[Figure: bar chart of SociaLite's speedup over Java (y-axis 0 to 3) for Shortest Paths, PageRank, Mutual Neighbors, Connected Components, Triangles, and Clustering Coefficients]
![Page 36: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/36.jpg)
Optimized Java vs SociaLite
[Figure: bar chart of speedup over the initial Java implementation (y-axis 0 to 3) for SociaLite and optimized Java, across Shortest Paths, PageRank, Mutual Neighbors, Connected Components, Triangles, and Clustering Coefficients]
![Page 37: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/37.jpg)
Input Graph for Multi-Core

| Source | Size | Machine |
| --- | --- | --- |
| Friendster | 120M nodes, 2.5B edges | Intel Xeon E5-2670, 16 cores (8+8), 2.60GHz, 20MB last-level cache, 256GB memory |
![Page 38: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/38.jpg)
Parallel Performance (Multi-Core)
[Figure: six panels (Shortest Paths, PageRank, Mutual Neighbors, Connected Components, Triangle, Clustering Coefficients), each plotting execution time and parallelization speedup against the number of cores (1 to 16), with the ideal speedup shown for reference]
![Page 39: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/39.jpg)
Input Graph for Distributed Evaluation

| Source | Size | Machine |
| --- | --- | --- |
| Synthetic graph* | up to 268M nodes, 4.3B edges (weak scaling) | 64 Amazon EC2 instances: Intel Xeon X5570, 8 cores, 23GB memory |

* RMAT algorithm, Graph 500 generator
![Page 40: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/40.jpg)
Distributed Performance
[Figure: six panels (Shortest Paths, PageRank, Mutual Neighbors, Clustering Coefficients, Triangle, Connected Components), each plotting execution time against the number of machines (2 to 64); series: SociaLite, Ideal(BF), Ideal(DS)]
![Page 41: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/41.jpg)
Giraph (Pregel) vs SociaLite
[Figure: six panels (Shortest Paths, PageRank, Mutual Neighbors, Clustering Coefficients, Triangle, Connected Components), each plotting execution time against the number of machines (2 to 64)]
![Page 42: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/42.jpg)
Pregel (Giraph)
- Programming model:
  - Vertex-centric model
  - (Manual) message passing
- Implement function F
  - Executed at each iteration
  - Processes messages received
  - Sends out messages (delivered next iteration)
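To show what the vertex-centric model asks of the programmer, here is a single-process Python simulation of shortest paths in that style. It is a sketch of the programming model only and does not use the Giraph or Pregel APIs. In each superstep a vertex reads its incoming messages, keeps the minimum, and, if its value improved, sends new distances to its neighbors for delivery in the next superstep. The graph `ADJ` and the function name are hypothetical.

```python
# Hypothetical graph: every vertex is a key; values are (neighbor, length) pairs.
ADJ = {1: [(2, 4), (3, 1)], 2: [(4, 5)], 3: [(2, 2)], 4: []}


def pregel_sssp(adj, source):
    """Simulate supersteps; returns (distance per vertex, superstep count)."""
    value = {v: float("inf") for v in adj}
    inbox = {source: [0]}  # initial message to the source vertex
    supersteps = 0
    while inbox:
        outbox = {}
        # The per-vertex function F: process messages, then send messages.
        for v, messages in inbox.items():
            best = min(messages)
            if best < value[v]:
                value[v] = best
                for u, w in adj[v]:
                    outbox.setdefault(u, []).append(best + w)
        inbox = outbox  # messages are delivered in the next superstep
        supersteps += 1
    return value, supersteps


VALUE, SUPERSTEPS = pregel_sssp(ADJ, 1)
```

The message handling, the halting condition, and the per-vertex state are all written by hand here, which is the bookkeeping that a declarative query leaves to the compiler.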
![Page 43: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/43.jpg)
Programmability (Distributed)
- Giraph (Pregel) vs SociaLite (lines of code)

| Algorithm | Giraph | SociaLite |
| --- | --- | --- |
| Shortest Paths | 232 | 4 |
| PageRank | 146 | 13 |
| Mutual Neighbors | 169 | 6 |
| Connected Components | 122 | 9 |
| Triangles | 181 | 6 |
| Clustering Coefficients | 218 | 12 |
| Total | 1,068 | 50 |
![Page 44: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/44.jpg)
Comparisons of Graph Frameworks
- In collaboration with Intel Parallel Research Lab
- Compared frameworks
  - SociaLite
  - Giraph
  - GraphLab
  - Combinatorial BLAS
- Native implementation in C and assembly (optimal)
![Page 45: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/45.jpg)
Comparisons of Graph Frameworks
- Benchmark algorithms
  - BFS (Breadth First Search)
  - PageRank
  - Collaborative Filtering
  - Triangle
- Evaluation on Intel cluster
  - Intel Xeon E5-2697, 24 cores, 2.7GHz, 64GB memory, InfiniBand network
  - Input graph: up to 500M vertices, 16B edges
![Page 46: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/46.jpg)
![Page 47: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/47.jpg)
Programmability
- BFS (Breadth First Search)

| Framework | Lines of Code | Development Time |
| --- | --- | --- |
| SociaLite | 4 | 10 min |
| Giraph | 200 | 1~2 hours |
| GraphLab | 180 | 1~2 hours |
| Combinatorial BLAS | 450 | a few hours |
| Native | > 1000 | > a few months |
![Page 48: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/48.jpg)
![Page 49: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/49.jpg)
Distributed Execution – Comparison
[Figure: four log-scale panels plotted against the number of machines (1 to 64): Breadth First Search (execution time, sec.), PageRank (time per iteration, sec.), Triangle, and Collaborative Filtering; series: Native, CombBLAS, GraphLab, SociaLite, Giraph]
![Page 50: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/50.jpg)
Summary
- Large-scale graph analysis made easy with SociaLite
  - Succinct code (1/10th)
  - High-level abstraction w/ compiler optimizations
- Performance competitive with other graph frameworks
![Page 51: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/51.jpg)
- Thank you!
- Questions?
![Page 52: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/52.jpg)
Homework
- Analysis of DBLP co-authorship graph
- DBLP
  - CS bibliography containing authors and papers
  - Co-authorship graph: 1.2M nodes, 9.5M edges
![Page 53: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/53.jpg)
Homework
- Example code
  - Shortest paths
  - PageRank
- Part 1
- Part 2
→ Will be announced later today
![Page 54: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/54.jpg)
![Page 55: SociaLite A Datalog-based Language for Large-Scale Graph ...courses/cs243/lectures/l12-socialite.pdf · SociaLite: language for large-scale graph analysis ! Extensions to Datalog](https://reader031.vdocuments.site/reader031/viewer/2022030417/5aa312de7f8b9a46238de45a/html5/thumbnails/55.jpg)