hadoop architecture and ecosystem - polito.it...2020/05/04 · graphframe input: the textual file...
19/05/2020
Spark Streaming
GraphFrame Input:
The textual file vertexes.csv
▪ It contains the vertexes of a graph
Each vertex is characterized by
▪ id (string): user identifier
▪ name (string): user name
▪ age (integer): user age
The textual file edges.csv
▪ It contains the edges of a graph
Each edge is characterized by
▪ src (string): source vertex
▪ dst (string): destination vertex
▪ linktype (string): “follow” or “friend”
Output:
For each user with at least one follower, store in the output folder the number of followers
▪ One user per line
▪ Format: user id, number of followers
Use the CSV format to store the result
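The counting step can be sketched in plain Python, independently of Spark. This is only an illustration of the logic, not the course solution: it assumes that a “follow” edge from src to dst means that src follows dst, and the edge list and the function name count_followers are invented for the example. In GraphFrames the same idea would presumably be expressed by keeping only the “follow” edges and reading the in-degree of each vertex.

```python
from collections import Counter

# Hypothetical rows of edges.csv as (src, dst, linktype) tuples.
# The endpoints are invented for illustration only.
edges = [
    ("u1", "u3", "follow"),
    ("u2", "u3", "follow"),
    ("u4", "u6", "follow"),
    ("u1", "u2", "friend"),
]


def count_followers(edges):
    """Map each user id to its number of incoming "follow" edges.

    Assumption: a "follow" edge src -> dst means that src follows dst.
    Users without followers never appear as a key.
    """
    counts = Counter(dst for _, dst, linktype in edges if linktype == "follow")
    return dict(counts)


# One "user id,number of followers" line per user, as in the requested CSV.
for user_id, num_followers in sorted(count_followers(edges).items()):
    print(f"{user_id},{num_followers}")
```

Users with no incoming “follow” edge are simply absent from the result, which matches the “at least one follower” requirement.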
Input graph example
[Figure: graph with vertexes u1 (Alice, 34), u2 (Bob, 36), u3 (John, 30), u4 (David, 29), u5 (Paul, 32), u6 (Adel, 36), u7 (Eddy, 60), connected by edges labeled “friend” or “follow”]
Result
id numFollowers
u3 2
u6 2
u2 1
GraphFrame Input:
The textual file vertexes.csv
▪ It contains the vertexes of a graph
Each vertex is characterized by
▪ id (string): user identifier
▪ name (string): user name
▪ age (integer): user age
The textual file edges.csv
▪ It contains the edges of a graph
Each edge is characterized by
▪ src (string): source vertex
▪ dst (string): destination vertex
▪ linktype (string): “follow” or “friend”
Output:
Consider only the users with at least one follower
Store in the output folder the user(s) with the maximum number of followers
▪ One user per line
▪ Format: user id, number of followers
Use the CSV format to store the result
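A possible way to select the most followed user(s) is to count the followers first and then keep every user whose count equals the maximum, so that ties are preserved. The sketch below is plain Python on an invented edge list. The name top_followed is hypothetical, and the direction of “follow” (src follows dst) is an assumption.

```python
from collections import Counter

# Hypothetical rows of edges.csv as (src, dst, linktype) tuples.
# The endpoints are invented for illustration only.
edges = [
    ("u1", "u3", "follow"),
    ("u2", "u3", "follow"),
    ("u4", "u6", "follow"),
    ("u5", "u6", "follow"),
    ("u7", "u2", "follow"),
    ("u1", "u2", "friend"),
]


def top_followed(edges):
    """Return the (user id, followers) pairs with the maximum follower count.

    Assumption: a "follow" edge src -> dst means that src follows dst.
    """
    counts = Counter(dst for _, dst, linktype in edges if linktype == "follow")
    if not counts:
        return []  # no "follow" edge at all, hence no user to report
    best = max(counts.values())
    # Keep every user that reaches the maximum, not just the first one.
    return sorted((user, n) for user, n in counts.items() if n == best)


for user_id, num_followers in top_followed(edges):
    print(f"{user_id},{num_followers}")
```

Computing the maximum separately and filtering on it, rather than sorting and taking the first row, is what keeps all tied users in the output.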
Input graph example
[Figure: graph with vertexes u1 (Alice, 34), u2 (Bob, 36), u3 (John, 30), u4 (David, 29), u5 (Paul, 32), u6 (Adel, 36), u7 (Eddy, 60), connected by edges labeled “friend” or “follow”]
Result
id numFollowers
u3 2
u6 2
GraphFrame Input:
The textual file vertexes.csv
▪ It contains the vertexes of a graph
Each vertex is characterized by
▪ id (string): user identifier
▪ name (string): user name
▪ age (integer): user age
The textual file edges.csv
▪ It contains the edges of a graph
Each edge is characterized by
▪ src (string): source vertex
▪ dst (string): destination vertex
▪ linktype (string): “follow” or “friend”
Output:
The pairs of users Ux, Uy such that
▪ Ux is a friend of Uy (link “friend” from Ux to Uy)
▪ Uy is not a friend of Ux (no link “friend” from Uy to Ux)
One pair Ux,Uy per line
Format: idUx, idUy
Use the CSV format to store the result
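The two conditions amount to finding the “friend” edges that have no reverse “friend” edge. A minimal plain-Python sketch of that check is shown below. The edge list and the name one_way_friends are invented for illustration. With GraphFrames, a motif with a negated term could presumably express the same pattern.

```python
# Hypothetical rows of edges.csv as (src, dst, linktype) tuples.
# The endpoints are invented for illustration only.
edges = [
    ("u1", "u2", "friend"),
    ("u2", "u3", "friend"),
    ("u3", "u2", "friend"),
    ("u4", "u1", "friend"),
    ("u1", "u4", "follow"),  # a "follow" link does not count as friendship
]


def one_way_friends(edges):
    """Return the (Ux, Uy) pairs with a "friend" link Ux -> Uy but none Uy -> Ux."""
    friends = {(src, dst) for src, dst, linktype in edges if linktype == "friend"}
    return sorted((src, dst) for src, dst in friends if (dst, src) not in friends)


for id_ux, id_uy in one_way_friends(edges):
    print(f"{id_ux},{id_uy}")
```

Only “friend” links are placed in the lookup set, so a “follow” link in the opposite direction does not make the friendship mutual.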
Input graph example
[Figure: graph with vertexes u1 (Alice, 34), u2 (Bob, 36), u3 (John, 30), u4 (David, 29), u5 (Paul, 32), u6 (Adel, 36), u7 (Eddy, 60), connected by edges labeled “friend” or “follow”]
Result
IdFriend IdNotFriend
u4 u1
u1 u2
GraphFrame Input:
The textual file vertexes.csv
▪ It contains the vertexes of a graph
Each vertex is characterized by
▪ id (string): vertex identifier
▪ entityType (string): “user” or “topic”
▪ name (string): name of the entity
The textual file edges.csv
▪ It contains the edges of a graph
Each edge is characterized by
▪ src (string): source vertex
▪ dst (string): destination vertex
▪ linktype (string): “expertOf” or “follow” or “correlated”
Output:
The followed topics for each user
One pair (user name, followed topic) per line
Format: username, followed topic
Use the CSV format to store the result
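This output can be read as a join between the “follow” edges and the vertex table, restricted to edges that go from a “user” vertex to a “topic” vertex. The plain-Python sketch below illustrates that join. The vertex rows follow the example graph, while the edge endpoints and the name followed_topics are invented for illustration.

```python
# Rows of vertexes.csv as (id, entityType, name) tuples.
vertexes = [
    ("V1", "user", "Paolo"),
    ("V2", "topic", "SQL"),
    ("V3", "user", "David"),
    ("V4", "topic", "Big Data"),
    ("V5", "user", "John"),
]

# Hypothetical rows of edges.csv as (src, dst, linktype) tuples.
# The endpoints are invented for illustration only.
edges = [
    ("V1", "V4", "follow"),
    ("V3", "V2", "follow"),
    ("V3", "V4", "follow"),
    ("V5", "V2", "expertOf"),
    ("V2", "V4", "correlated"),
]


def followed_topics(vertexes, edges):
    """Return sorted (user name, topic name) pairs for user -> topic "follow" edges."""
    info = {vid: (entity_type, name) for vid, entity_type, name in vertexes}
    pairs = []
    for src, dst, linktype in edges:
        if linktype != "follow" or src not in info or dst not in info:
            continue
        src_type, src_name = info[src]
        dst_type, dst_name = info[dst]
        if src_type == "user" and dst_type == "topic":
            pairs.append((src_name, dst_name))
    return sorted(pairs)


for username, topic in followed_topics(vertexes, edges):
    print(f"{username},{topic}")
```

The entity-type check matters because “follow” edges are not guaranteed by the input description to link only users to topics.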
Input graph example
[Figure: graph with vertexes V1 (“user”, “Paolo”), V2 (“topic”, “SQL”), V3 (“user”, “David”), V4 (“topic”, “Big Data”), V5 (“user”, “John”), connected by edges labeled “like”, “follow”, “expertOf” or “correlated”]
Result
username topic
Paolo Big Data
David SQL
David Big Data
GraphFrame Input:
The textual file vertexes.csv
▪ It contains the vertexes of a graph
Each vertex is characterized by
▪ id (string): vertex identifier
▪ entityType (string): “user” or “topic”
▪ name (string): name of the entity
The textual file edges.csv
▪ It contains the edges of a graph
Each edge is characterized by
▪ src (string): source vertex
▪ dst (string): destination vertex
▪ linktype (string): “expertOf” or “follow” or “correlated”
Output:
The names of the users who follow a topic correlated with the “Big Data” topic
One user name per line
Format: username
Use the CSV format to store the result
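The request describes a two-step path: a user follows a topic, and that topic is correlated with “Big Data”. The plain-Python sketch below follows those two steps. It assumes that the “correlated” edge goes from the followed topic to the “Big Data” vertex, which the exercise text does not state. The edge endpoints and the name followers_of_correlated are invented for illustration.

```python
# Rows of vertexes.csv as (id, entityType, name) tuples.
vertexes = [
    ("V1", "user", "Paolo"),
    ("V2", "topic", "SQL"),
    ("V3", "user", "David"),
    ("V4", "topic", "Big Data"),
    ("V5", "user", "John"),
]

# Hypothetical rows of edges.csv as (src, dst, linktype) tuples.
# The endpoints are invented for illustration only.
edges = [
    ("V3", "V2", "follow"),
    ("V1", "V4", "follow"),
    ("V2", "V4", "correlated"),
    ("V5", "V2", "expertOf"),
]


def followers_of_correlated(vertexes, edges, target="Big Data"):
    """Return the sorted names of users following a topic correlated with `target`.

    Assumption: "correlated" edges point from the other topic to `target`.
    """
    info = {vid: (entity_type, name) for vid, entity_type, name in vertexes}
    target_ids = {
        vid for vid, (entity_type, name) in info.items()
        if entity_type == "topic" and name == target
    }
    # Step 1: topics with a "correlated" edge towards the target topic.
    correlated = {
        src for src, dst, linktype in edges
        if linktype == "correlated" and dst in target_ids
    }
    # Step 2: users with a "follow" edge towards one of those topics.
    names = {
        info[src][1] for src, dst, linktype in edges
        if linktype == "follow" and dst in correlated
        and src in info and info[src][0] == "user"
    }
    return sorted(names)


for username in followers_of_correlated(vertexes, edges):
    print(username)
```

A user who follows only “Big Data” itself is not selected, because the followed topic must be a different vertex that is correlated with it.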
Input graph example
[Figure: graph with vertexes V1 (“user”, “Paolo”), V2 (“topic”, “SQL”), V3 (“user”, “David”), V4 (“topic”, “Big Data”), V5 (“user”, “John”), connected by edges labeled “like”, “follow”, “expertOf” or “correlated”]
Result
username
David
GraphFrame Input:
The textual file vertexes.csv
▪ It contains the vertexes of a graph
Each vertex is characterized by
▪ id (string): user identifier
▪ name (string): user name
▪ age (integer): user age
The textual file edges.csv
▪ It contains the edges of a graph
Each edge is characterized by
▪ src (string): source vertex
▪ dst (string): destination vertex
▪ linktype (string): “follow” or “friend”
Output:
Select the users who can reach user u1 in less than 3 hops (i.e., at most two edges)
▪ Do not consider u1 itself
For each of the selected users, store in the output folder his/her name and the minimum number of hops to reach user u1
▪ One user per line
▪ Format: user name, #hops to user u1
Use the CSV format to store the result
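The minimum number of hops to u1 can be obtained with a breadth-first search that starts from u1 and walks the edges backwards, stopping after two levels. The plain-Python sketch below illustrates this. It assumes that both “friend” and “follow” edges count as hops, which the exercise text does not state explicitly. The edge endpoints and the name hops_to are invented for illustration. GraphFrames provides a shortest-paths algorithm with landmark vertexes that could presumably serve the same purpose.

```python
from collections import deque

# User names by id, as in the example graph.
names = {"u1": "Alice", "u2": "Bob", "u3": "John", "u4": "David", "u5": "Paul"}

# Hypothetical rows of edges.csv as (src, dst, linktype) tuples.
# The endpoints are invented for illustration only.
edges = [
    ("u2", "u1", "friend"),
    ("u3", "u2", "follow"),
    ("u4", "u3", "friend"),  # u4 needs three hops to reach u1
    ("u1", "u5", "friend"),  # u1 reaches u5, but u5 does not reach u1
]


def hops_to(target, edges, max_hops=2):
    """Return {user id: minimum hops to `target`} for users within `max_hops`.

    Assumption: every edge counts as one hop, whatever its linktype.
    """
    predecessors = {}
    for src, dst, _ in edges:
        predecessors.setdefault(dst, set()).add(src)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not explore beyond the hop limit
        for pred in predecessors.get(node, ()):
            if pred not in dist:  # first visit gives the minimum distance
                dist[pred] = dist[node] + 1
                queue.append(pred)
    del dist[target]  # the target itself is not part of the result
    return dist


for user_id, num_hops in sorted(hops_to("u1", edges).items()):
    print(f"{names[user_id]},{num_hops}")
```

Searching backwards from u1 visits only the users that can actually reach it, so no per-user search is needed.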
Input graph example
[Figure: graph with vertexes u1 (Alice, 34), u2 (Bob, 36), u3 (John, 30), u4 (David, 29), u5 (Paul, 32), u6 (Adel, 36), u7 (Eddy, 60), connected by edges labeled “friend” or “follow”]
Result
name numHops
Bob 1
John 2
David 1
Paul 1