![Page 1: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/1.jpg)
GraphX: Unifying Table and Graph Analytics
Presented by Joseph Gonzalez
Joint work with Reynold Xin, Daniel Crankshaw, AnkurDave, Michael Franklin, and Ion Stoica
IPDPS 2014
*These slides are best viewed in PowerPoint with anima
![Page 2: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/2.jpg)
Graphs are Central to Analytics
Raw Wikipedia
< / >< / >< / >XML
Hyperlinks PageRank Top 20 PageTitle PR
TextTableTitle Body
Topic Model(LDA) Word Topics
Word Topic
Editor GraphCommunityDetection
User Community
User Com.
Term-DocGraph
DiscussionTable
User Disc.
CommunityTopic
Topic Com.
![Page 3: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/3.jpg)
Update ranks in parallel
Iterate until convergence
Rank of user i Weighted sum of
neighbors’ ranks
3
PageRank: Identifying Leaders
![Page 4: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/4.jpg)
Low-Rank Matrix Factorization:
5
r13
r14
r24
r25
f(1)
f(2)
f(3)
f(4)
f(5)Use
r Fac
tors
(U) M
ovie Factors (M)
Use
rs
MoviesNetflix
Use
rs≈x
Movies
f(i)
f(j)
Iterate:
Recommending Products
![Page 5: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/5.jpg)
The Graph-Parallel Pattern
6
Model / Alg. State
Computation depends only on the neighbors
![Page 6: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/6.jpg)
Many Graph-Parallel Algorithms• Collaborative Filtering
– Alternating Least Squares– Stochastic Gradient
Descent– Tensor Factorization
• Structured Prediction– Loopy Belief Propagation– Max-Product Linear
Programs– Gibbs Sampling
• Semi-supervised ML– Graph SSL
– CoEM• Community Detection
– Triangle-Counting– K-core Decomposition– K-Truss
• Graph Analytics– PageRank– Personalized PageRank– Shortest Path– Graph Coloring
• Classification– Neural Networks 7
MACHINE LEARNING
SOCIAL NETWORK ANALYSIS
GRAPH ALGORITHMS
![Page 7: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/7.jpg)
Graph-Parallel Systems
8
oogle
Expose specialized APIs to simplify graph programming.
![Page 8: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/8.jpg)
“Think like a Vertex.”
- Pregel [SIGMOD’10]
9
![Page 9: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/9.jpg)
The Pregel (Push) AbstractionVertex-Programs interact by sending messages.
iiPregel_PageRank(i, messages) : // Receive all the messagestotal = 0foreach( msg in messages) :
total = total + msg
// Update the rank of this vertexR[i] = 0.15 + total
// Send new messages to neighborsforeach(j in out_neighbors[i]) :
Send msg(R[i]) to vertex j
10Malewicz et al. [PODC’09, SIGMOD’10]
![Page 10: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/10.jpg)
The GraphLab (Pull) AbstractionVertex Programs directly access adjacent vertices and edgesGraphLab_PageRank(i) // Compute sum over neighborstotal = 0foreach( j in neighbors(i)):
total = total + R[j] * wji
// Update the PageRankR[i] = 0.15 + total
11
R[4] * w41
++
44 11
33 22
Data movement is managed by the system and not the user.
![Page 11: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/11.jpg)
BarrierIterative Bulk Synchronous
ExecutionCompute Communicate
![Page 12: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/12.jpg)
Graph-Parallel Systems
13
oogle
Expose specialized APIs to simplify graph programming.
Exploit graph structure to achieve orders-of-magnitude performance gains
over more general data-parallel systems
![Page 13: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/13.jpg)
PageRank on the Live-Journal Graph
22
354
1340
0 200 400 600 800 1000 1200 1400 1600
GraphLab
Naïve Spark
Mahout/Hadoop
Runtime (in seconds, PageRank for 10 iterations)
Spark is 4x faster than HadoopGraphLab is 16x faster than Spark
![Page 14: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/14.jpg)
Counted: 34.8 Billion Triangles
15
Triangle Counting on Twitter
64 Machines15 Seconds
1536 Machines423 Minutes
Hadoop[WWW’11]
S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11
1000 x Faster
40M Users, 1.4 Billion Links
![Page 15: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/15.jpg)
Graph Analytics Pipeline
Raw Wikipedia
< / >< / >< / >XML
Hyperlinks PageRank Top 20 PageTitle PR
TextTableTitle Body
Topic Model(LDA) Word Topics
Word Topic
Editor GraphCommunityDetection
User Community
User Com.
Term-DocGraph
DiscussionTable
User Disc.
CommunityTopic
Topic Com.
![Page 16: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/16.jpg)
Tables
Raw Wikipedia
< / >< / >< / >XML
Hyperlinks PageRank Top 20 PageTitle PR
TextTableTitle Body
Topic Model(LDA) Word Topics
Word Topic
Editor GraphCommunityDetection
User Community
User Com.
Term-DocGraph
DiscussionTable
User Disc.
CommunityTopic
Topic Com.
![Page 17: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/17.jpg)
Graphs
Raw Wikipedia
< / >< / >< / >XML
Hyperlinks PageRank Top 20 PageTitle PR
TextTableTitle Body
Topic Model(LDA) Word Topics
Word Topic
Editor GraphCommunityDetection
User Community
User Com.
Term-DocGraph
DiscussionTable
User Disc.
CommunityTopic
Topic Com.
![Page 18: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/18.jpg)
Separate Systems to Support Each View
Table View Graph View
Dependency Graph
Table
R lt
Result
RowRow
RowRow
RowRow
RowRow
![Page 19: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/19.jpg)
Having separate systems for each view is
difficult to use and inefficient
20
![Page 20: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/20.jpg)
Difficult to Program and Use
Users must Learn, Deploy, and Manage multiple systems
Leads to brittle and often complex interfaces
21
![Page 21: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/21.jpg)
Inefficient
22
Extensive data movement and duplication across the network and file system
< / >< / >< / >XML
HDFSHDFS HDFSHDFS HDFSHDFS HDFSHDFS
Limited reuse internal data-structures across stages
![Page 22: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/22.jpg)
GraphX Solution: Tables and Graphs are
views of the same physical data
GraphX UnifiedRepresentation
Graph ViewTable View
Each view has its own operators that exploit the semantics of the view
to achieve efficient execution
![Page 23: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/23.jpg)
Graphs Relational Algebra
1. Encode graphs as distributed tables
2. Express graph computation in relational algebra
3. Recast graph systems optimizations as:1. Distributed join optimization2. Incremental materialized maintenance
Integrate Graph and Table data
processing systems.
Achieve performance parity with
specialized systems.
![Page 24: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/24.jpg)
Part. 2
Part. 1
Vertex Table
B C
AA DD
F E
AA DD
Distributed Graphs as Distributed Tables
D
Property Graph
B C
D
E
AA
F
Edge Table
A B
A C
C D
B C
A E
A F
E F
E D
B
C
D
E
A
F
RoutingTable
B
C
D
E
A
F
1
2
1 2
1 2
1
2
2D Vertex Cut Heuristic
![Page 25: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/25.jpg)
Table OperatorsTable operators are inherited from Spark:
26
map
filter
groupBy
sort
union
join
leftOuterJoin
rightOuterJoin
reduce
count
fold
reduceByKey
groupByKey
cogroup
cross
zip
sample
take
first
partitionBy
mapWith
pipe
save
...
![Page 26: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/26.jpg)
class Graph [ V, E ] {def Graph(vertices: Table[ (Id, V) ],
edges: Table[ (Id, Id, E) ])// Table Views -----------------def vertices: Table[ (Id, V) ]def edges: Table[ (Id, Id, E) ]def triplets: Table [ ((Id, V), (Id, V), E) ]// Transformations ------------------------------def reverse: Graph[V, E]def subgraph(pV: (Id, V) => Boolean,
pE: Edge[V,E] => Boolean): Graph[V,E]def mapV(m: (Id, V) => T ): Graph[T,E] def mapE(m: Edge[V,E] => T ): Graph[V,T]// Joins ----------------------------------------def joinV(tbl: Table [(Id, T)]): Graph[(V, T), E ]def joinE(tbl: Table [(Id, Id, T)]): Graph[V, (E, T)]// Computation ----------------------------------def mrTriplets(mapF: (Edge[V,E]) => List[(Id, T)],
reduceF: (T, T) => T): Graph[T, E]}
Graph Operators
27
![Page 27: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/27.jpg)
Triplets Join Vertices and Edges
The triplets operator joins vertices and edges:
The mrTriplets operator sums adjacent triplets.SELECT t.dstId, reduce( map(t) ) AS sum FROM triplets AS t GROUPBY t.dstId
TripletsVertices
B
A
C
D
Edges
A BA CB CC D
A BA
B A CB CC D
SELECT s.Id, d.Id, s.P, e.P, d.PFROM edges AS eJOIN vertices AS s, vertices AS dON e.srcId = s.Id AND e.dstId = d.Id
![Page 28: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/28.jpg)
F
E
Example: Oldest Follower
D
B
A
CCalculate the number of older followers for each user?val olderFollowerAge = graph.mrTriplets(
e => // Mapif(e.src.age < e.dst.age) {(e.srcId, 1)
else { Empty },(a,b) => a + b // Reduce
).vertices
23 42
30
19 75
1629
![Page 29: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/29.jpg)
We express enhanced Pregel and GraphLab
abstractions using the GraphX operatorsin less than 50 lines of code!
30
![Page 30: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/30.jpg)
Enhanced Pregel in GraphX
31Malewicz et al. [PODC’09, SIGMOD’10]
pregelPR(i, messageList ):
// Receive all the messagestotal = 0foreach( msg in messageList) :total = total + msg
// Update the rank of this vertexR[i] = 0.15 + total
// Send new messages to neighborsforeach(j in out_neighbors[i]) :Send msg(R[i]/E[i,j]) to vertex
Require Message CombinersmessageSum
messageSum
Remove Message Computation
from theVertex Program
sendMsg(ij, R[i], R[j], E[i,j]):// Compute single messagereturn msg(R[i]/E[i,j])
combineMsg(a, b):// Compute sum of two messagesreturn a + b
![Page 31: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/31.jpg)
PageRank in GraphX
32
// Load and initialize the graph
val graph = GraphBuilder.text(“hdfs://web.txt”)
val prGraph = graph.joinVertices(graph.outDegrees)
// Implement and Run PageRank
val pageRank =
prGraph.pregel(initialMessage = 0.0, iter = 10)(
(oldV, msgSum) => 0.15 + 0.85 * msgSum,
triplet => triplet.src.pr / triplet.src.deg,
(msgA, msgB) => msgA + msgB)
![Page 32: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/32.jpg)
Join EliminationIdentify and bypass joins for unused triplet fields
33
02000400060008000
100001200014000
0 5 10 15 20
Com
mun
icat
ion
(MB
)
Iteration
PageRank on Twitter Three Way Join
Join Elimination
Factor of 2 reduction in communication
sendMsg(ij, R[i], R[j], E[i,j]):// Compute single messagereturn msg(R[i]/E[i,j])
![Page 33: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/33.jpg)
We express the Pregel and GraphLab likeabstractions using the GraphX operators
in less than 50 lines of code!
34
By composing these operators we canconstruct entire graph-analytics pipelines.
![Page 34: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/34.jpg)
Example Analytics Pipeline// Load raw data tables
val verts = sc.textFile(“hdfs://users.txt”).map(parserV)
val edges = sc.textFile(“hdfs://follow.txt”).map(parserE)
// Build the graph from tables and restrict to recent links
val graph = new Graph(verts, edges)
val recent = graph.subgraph(edge => edge.date > LAST_MONTH)
// Run PageRank Algorithm
val pr = graph.PageRank(tol = 1.0e-5)
// Extract and print the top 25 users
val topUsers = verts.join(pr).top(25).collect
topUsers.foreach(u => println(u.name + ‘\t’ + u.pr))
![Page 35: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/35.jpg)
The GraphX Stack(Lines of Code)
GraphX (3575)
Spark
Pregel (28) + GraphLab (50)
PageRank (5)
Connected Comp. (10)
Shortest Path (10)
ALS(40) LDA
(120)
K-core(51)
Triangle
Count(45)
SVD(40)
![Page 36: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/36.jpg)
Performance Comparisons
22
68207
3541340
0 200 400 600 800 1000 1200 1400 1600
GraphLabGraphXGiraph
Naïve SparkMahout/Hadoop
Runtime (in seconds, PageRank for 10 iterations)
GraphX is roughly 3x slower than GraphLab
Live-Journal: 69 Million Edges
![Page 37: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/37.jpg)
GraphX scales to larger graphs
203
451
749
0 200 400 600 800
GraphLab
GraphX
Giraph
Runtime (in seconds, PageRank for 10 iterations)
GraphX is roughly 2x slower than GraphLab»Scala + Java overhead: Lambdas, GC time, …»No shared memory parallelism: 2x increase in comm.
Twitter Graph: 1.5 Billion Edges
![Page 38: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/38.jpg)
PageRank is just one stage….
What about a pipeline?
![Page 39: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/39.jpg)
HDFSHDFSHDFSHDFS
ComputeComputeSpark PreprocessSpark Preprocess Spark Post.Spark Post.
A Small Pipeline in GraphX
Timed end-to-end GraphX is faster than GraphLab
Raw Wikipedia
< / >< / >< / >XML
Hyperlinks PageRank Top 20 Pages
342
1492
0 200 400 600 800 1000 1200 1400 1600
GraphLab + SparkGraphX
Giraph + SparkSpark
Total Runtime (in Seconds)
605
375
![Page 40: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/40.jpg)
StatusPart of Apache Spark
In production at several large technology companies
![Page 41: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/41.jpg)
GraphX: Unified Analytics
Enabling users to easily and efficientlyexpress the entire graph analytics
pipeline
New APIBlurs the distinction between Tables and
Graphs
New SystemCombines Data-Parallel Graph-Parallel Systems
![Page 42: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/42.jpg)
A Case for Algebra in Graphs
A standard algebra is essential for graph systems:
• e.g.: SQL proliferation of relational system
By embedding graphs in relational algebra:
• Integration with tables and preprocessing
• Leverage advances in relational systems
• Graph opt. recast to relational systems
![Page 43: GraphX: Unifying Table and Graph AnalyticsDistributed join optimization 2. Incremental materialized maintenance Integrate Graph and Table data processing systems. Achieve performance](https://reader034.vdocuments.site/reader034/viewer/2022052001/60141ee4250da70cae29e3d4/html5/thumbnails/43.jpg)
Thanks!
[email protected]@eecs.berkeley.edu
[email protected]@eecs.berkeley.edu
http://amplab.cs.berkeley.edu/projects/graphx/