the ieee international conference on big data 2013
DESCRIPTION
A Distributed Vertex-Centric Approach for Pattern Matching in Massive Graphs. Arash Fard M. Usman Nisar Lakshmish Ramaswamy John A. Miller Matthew Saltz. Computer Science Department University of Georgia, Athens, GA. October 7, 2013. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/1.jpg)
The IEEE International Conference on Big Data 2013
• Arash Fard• M. Usman Nisar• Lakshmish Ramaswamy• John A. Miller• Matthew Saltz
Computer Science DepartmentUniversity of Georgia, Athens, GA
October 7, 2013
A Distributed Vertex-Centric Approach for Pattern Matching in Massive Graphs
![Page 2: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/2.jpg)
2
Outline
Introduction Subgraph Pattern Matching
Types of Subgraph Pattern Matching Strict Simulation Distributed Algorithms Experiments
![Page 3: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/3.jpg)
3
Introduction Labeled Directed Graph G(V, E, l) Versatile and expressive
Social networks, web search engines, genome sequencing, etc. Data sizes are growing rapidly
Facebook: 1 billion+ users, average degree of 140 Twitter: 200 million+ users, 400 million tweets each day
Bigger datasets mean bigger graphs Pattern (Query Graph): the interesting small graph Data Graph: the massive graph contains the
information Pattern Matching: finding the instances of the pattern in
the data graph
![Page 4: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/4.jpg)
4
Subgraph Pattern Matching There are different paradigms for pattern matching:
Exact Topological Matching Sub-graph Isomorphism
Simulation Matching (meaning of the relations) Graph Simulation Dual Simulation Strong Simulation Strict Simulation SD
Bio PM
PMSD: Software DeveloperBio: BiologistPM: Product Manager
SD
Bio PM
PM
John
Ann
Sara
SDMary
BioBobAlic
e
Pattern Graph Data Graph
Endorsed byA B
![Page 5: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/5.jpg)
5
Example: Graph Simulation
PMPM
SA
DB
SA
DB
PM
SD
AIAI
DBDB AI
AI DB AI
SA
Data Graph
DB
AI
PM
PM
SA
PMSA
1
2
3
45 6
7
8
9
10
11
12
13
14
151617
18 19 20
21
22
23 24
Pattern Graph
PM
SA
DBAI
PM: Product ManagerSA: Software AnalystDB: Database DesignerAI: AI specialistSD: Software Developer
Label and Children conditions
Quadratic time complexity
![Page 6: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/6.jpg)
6
Example: Dual Simulation
PMPM
SA
DB
SA
DB
PM
SD
AIAI
DBDB AI
AI DB AI
SA
Pattern Graph
Data Graph
PM
SA
DBAI
DB
AI
PM
PM
SA
PMSA
1
2
3
45 6
7
8
9
10
11
12
13
14
151617
18 19 20
21
22
23 24
PM: Product ManagerSA: Software AnalystDB: Database DesignerAI: AI specialistSD: Software Developer
Adds Parents condition Cubic time complexity
![Page 7: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/7.jpg)
7
Strong Simulation Extends dual simulation with a locality condition A ball Gb[v, r] is a subgraph of a graph G
containing: all vertices Vb within an undirected distance r of the
vertex v all the edges between the vertices in Vb
S. Ma, Y. Cao, W. Fan, J. Huai, and T. Wo, “Capturing topology in graph pattern matching,” Proc. VLDB Endow., vol. 5, no. 4, pp. 310–321, Dec. 2011.
X86
1432
57
X86
1432
57
![Page 8: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/8.jpg)
8
Example: Strong Simulation
PMPM
SA
DB
SA
DB
PM
SD
AIAI
DBDB AI
AI DB AI
SA
Data Graph
DB
AI
PM
PM
SA
PMSA
1
2
3
45 6
7
8
9
10
11
12
13
14
151617
18 19 20
21
22
23 24
Pattern Graph
PM
SA
DBAI
PM: Product ManagerSA: Software AnalystDB: Database DesignerAI: AI specialistSD: Software Developer
Adds Locality condition Cubic time complexity
![Page 9: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/9.jpg)
9
Example: Subgraph Isomorphism
PMPM
SA
DB
SA
DB
PM
SD
AIAI
DBDB AI
AI DB AI
SA
Data Graph
DB
AI
PM
PM
SA
PMSA
1
2
3
45 6
7
8
9
10
11
12
13
14
151617
18 19 20
21
22
23 24
2 Subgraph results:• {4,6,7,8}• {5,6,7,8}
Pattern Graph
PM
SA
DBAI
PM: Product ManagerSA: Software AnalystDB: Database DesignerAI: AI specialistSD: Software Developer
Exact Topological Match NP-complete in general
case
![Page 10: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/10.jpg)
10
Strict Simulation
Strict Simulation is a smart modification of Strong Simulation Smaller balls Faster computation The same quality of results or better
Cubic time complexity
![Page 11: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/11.jpg)
11
Strict vs Strong Simulation
![Page 12: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/12.jpg)
12
Example: Strict vs Strong Simulation
dq = 2 Strong Simulation: Gb[2,2] contains all vertices Strict Simulation: Gb[2,2] contains {2,1,10,3,4} Blue and white vertics in the result of Strong
Simulation White vertices in the result of Strict Simulation
![Page 13: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/13.jpg)
13
Vertex Centric BSP Bulk Synchronous Parallel Vertex Centric
Each vertex of the graph is a computing unit Each vertex initially knows only its id, label, and out going edges Algorithm terminates when every vertex votes to halt.
GPS, a free implementation of Pregel, developed in infoLab, Stanford University.
![Page 14: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/14.jpg)
14
Distributed Algorithm for Graph Simulation
Initialization:• Pattern is received from the Master• Set the match flag to true
match flag: indicating if the vertex is a candidate member of the result match set
matchSet: set of vertices in Q that are candidate match
Vertex
match flag
matchSet
G1 True Empty
G2 True Empty
G3 True Empty
G4 True Empty
G5 True Empty
G6 True Empty
G7 True Empty
G8 True Empty
Pattern Graph
a
b c
b
c
cb
ea
f
dG1
G2
G3
G4
G5
G6
G7
G8
Q3Q2
Q1
![Page 15: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/15.jpg)
15
Distributed Algorithm for Graph Simulation
Vertex
match flag
matchSet
G1 False Empty
G2 True Q2
G3 True Q1
G4 True Q3
G5 True Q2
G6 True Q3
G7 False Empty
G8 False Empty
Pattern Graph
a
b c
b
c
cb
ea
f
dG1
G2
G3
G4
G5
G6
G7
G8
Data Graph
1st Superstep
Q3Q2
Q1
![Page 16: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/16.jpg)
16
Distributed Algorithm for Graph Simulation
Vertex
match flag
matchSet
G1 False Empty
G2 True Q2
G3 True Q1
G4 True Q3
G5 True Q2
G6 True Q3
G7 False Empty
G8 False Empty
Pattern Graph
a
b c
b
c
cb
ea
f
dG1
G2
G3
G4
G5
G6
G7
G8
Data Graph
2nd Superstep
Q3Q2
Q1
![Page 17: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/17.jpg)
17
Distributed Algorithm for Graph Simulation
Vertex
match flag
matchSet
G1 False Empty
G2 True Q2
G3 True Q1
G4 True Q3
G5 False Empty
G6 True Q3
G7 False Empty
G8 False Empty
Pattern Graph
a
b c
b
c
cb
ea
f
dG1
G2
G3
G4
G5
G6
G7
G8
Data Graph
3rd and 4th Supersteps
Q3Q2
Q1
![Page 18: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/18.jpg)
18
Other Distributed Graph Algorithms
Dual simulation In addition to storing the match sets of its children, a vertex
also stores the match sets of its parents During match set evaluation, a vertex takes both of these
into account Strong simulation
Run dual simulation to obtain match relation Rd
For each matching vertex v in Rd : Create a ball centered at v with radius dQ containing any vertex
in G Perform dual simulation on the ball
Strict simulation Run dual simulation to obtain match relation Rd
For each matching vertex v in Rd : Create a ball centered at v with radius dQ containing only
vertices in Rd
Perform dual simulation on the ball
![Page 19: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/19.jpg)
19
Experiments System
Cluster of 12 machines 128 GB RAM Two 2GHz Intel Xeon E-2620 GPS (developed in infoLab, Stanford University)
Datasets Synthesized graphs: |V|= 100e6, |E|=|V| α , α = 1.2 Real world graphs
uk-2002: |V|= 18e6, |E|=298e6 ljournal: |V|= 5e6, |E|=79e6 Amazon-2008: |V|= 7e5, |E|=5e6http://law.di.unimi.it/datasets.php
![Page 20: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/20.jpg)
20
Experiment 1: Comparison of Strong and Strict Simulation
Synthesized (|V |= 1e6, α = 1.2)
Amazon-2008 (|V |= 7e5, |V| = 5e6)
Amazon-2008 (|V |= 7e5, |Vq| = 20)
Amazon-2008 (|V |= 7e5, |Vq| = 20)
![Page 21: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/21.jpg)
21
Experiment 2: Running time
uk-2002-hc (|V |= 18e6, |E|=298e6)
(l = 200 & |Vq| = 20)
Synthesized (|V |= 1e8, α = 1.2,|E|=|V| α )
Quality of result:Graph Simulation < Dual Simulation < Strict Simulation
![Page 22: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/22.jpg)
22
Experiment 3: Impact of dataset
Synthesized (|V |= 100e6)
(l = 200 & |Vq| = 20)
Synthesized (α = 1.2, |E|=|V| α )
![Page 23: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/23.jpg)
23
Experiment 4: Impact of pattern
uk-2002-hc (|V |= 18e6, |E|=298e6)
Synthesized (|V |= 100e6, α = 1.2 , |E|=|V| α )
![Page 24: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/24.jpg)
24
Related work Pattern matching models
P-homomorphism Graph Simulation, Dual and Strong Bounded simulation
Distributed algorithms for pattern matching S. Ma, Y. Cao, J. Huai, and T. Wo, “Distributed
graph pattern matching,” WWW 2012. Z. Sun, H. Wang, H. Wang, B. Shao, and J. Li,
“Efficient subgraph matching on billion node graphs,”VLDB 2012.
![Page 25: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/25.jpg)
25
Introducing a new Pattern Matching model
Design, implementation, and test of 4 distributed algorithms for Pattern Matching Graph Simulation Dual Simulation Strong Simulation Strict Simulation
Conclusion
![Page 26: The IEEE International Conference on Big Data 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/56816502550346895dd76f84/html5/thumbnails/26.jpg)
26
Questions?
Thanks