Dynamic Community Detection for Large-scale e-Commerce data with Spark Streaming and GraphX
Ming Huang (Taobao)
Meng Zhang, Bin Wei, GuangYuan Huang, Jinkui Shi
Community Detection
Scenarios
• VIP Customer
• Reputation Escalator
• Fraud Seller
• …
Algorithms
• LPA (Label Propagation Algorithm)
• GN (Girvan-Newman)
• Fast Unfolding (the Louvain method)
• …
How to make it Dynamic?
Static Communities
Streaming Data
Make sophisticated, real-time decisions
Definition & Solution
Dynamic Community Detection
1. Decide the new node's community
2. Update the graph's physical topology
3. Update the affected communities and the modularity
Spark Streaming + GraphX → Streaming Graph
REAL-TIME
Streaming Graph
[Figure: each batch of the Edges DStream is turned into a graph in the Graph DStream, and each of those graphs is merged into the Stock Graph.]
Models and Algorithms
Quick Overview of Fast Unfolding
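Modularity:
$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
$$Q = \sum_{i}^{c} Q_i = \sum_{i}^{c} \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^2 \right]$$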
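As a sanity check on the two forms of modularity, the following is a minimal sketch on plain Scala collections (no Spark), not the authors' implementation. It assumes an undirected, unweighted graph with no self-loops or duplicate edges, and it assumes that Σ_in counts each internal edge twice (once per endpoint), which is the convention under which the per-community sum equals the pairwise definition. The names `ModularitySketch`, `pairwise`, and `perCommunity` are made up for this example.

```scala
object ModularitySketch {
  type Edge = (Int, Int) // undirected, unweighted, each pair listed once

  // Degree of every vertex that appears in the edge list.
  def degrees(edges: Seq[Edge]): Map[Int, Double] =
    edges.flatMap { case (a, b) => Seq(a, b) }
      .groupBy(identity)
      .map { case (v, hits) => v -> hits.size.toDouble }

  // Pairwise form: Q = 1/(2m) * sum_{i,j} [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)
  def pairwise(edges: Seq[Edge], community: Map[Int, Int]): Double = {
    val m = edges.size.toDouble
    val k = degrees(edges)
    val adjacent = edges.flatMap { case (a, b) => Seq((a, b), (b, a)) }.toSet
    val nodes = community.keys.toSeq
    val terms = for {
      i <- nodes
      j <- nodes
      if community(i) == community(j)
    } yield {
      val aij = if (adjacent((i, j))) 1.0 else 0.0
      aij - k.getOrElse(i, 0.0) * k.getOrElse(j, 0.0) / (2 * m)
    }
    terms.sum / (2 * m)
  }

  // Per-community form: Q = sum_c [ in_c/(2m) - (tot_c/(2m))^2 ]
  def perCommunity(edges: Seq[Edge], community: Map[Int, Int]): Double = {
    val m = edges.size.toDouble
    // in_c: 2 for every edge with both endpoints in c (assumed convention)
    val in = edges
      .collect { case (a, b) if community(a) == community(b) => community(a) }
      .groupBy(identity)
      .map { case (c, hits) => c -> 2.0 * hits.size }
    // tot_c: sum of the degrees of the members of c
    val tot = edges
      .flatMap { case (a, b) => Seq(community(a), community(b)) }
      .groupBy(identity)
      .map { case (c, hits) => c -> hits.size.toDouble }
    tot.map { case (c, t) => in.getOrElse(c, 0.0) / (2 * m) - math.pow(t / (2 * m), 2) }.sum
  }

  def main(args: Array[String]): Unit = {
    // Two triangles joined by the edge 3-4, one community per triangle
    val edges = Seq((1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4))
    val community = Map(1 -> 0, 2 -> 0, 3 -> 0, 4 -> 1, 5 -> 1, 6 -> 1)
    println(pairwise(edges, community))
    println(perCommunity(edges, community))
  }
}
```

For the two-triangle graph in `main`, both forms work out to 5/14. The per-community form is the one that suits incremental updates, since each community only needs its own running `in` and `tot` totals.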
Incremental Algorithms
• JV (Streaming with RDD): Join & Vote
• UMG (Streaming with Graph): Union & Modularity Greedy
JV
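[Figure: incEdgeRDD holds the new edges (A, D), (B, D), (C, D); stockCommunityRDD holds A → C1, B → C2, C → C2. The join gives D the candidate communities C1, C2, C2, and the vote assigns D to C2.]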
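The join-then-vote step can be sketched on plain Scala collections standing in for the two RDDs. This is an illustration under assumptions, not the authors' code: the name `joinAndVote` is hypothetical, each incoming edge is assumed to be written as (existing vertex, new vertex), and ties are broken by the smallest community id, a rule the slides do not specify.

```scala
object JoinVoteSketch {
  // incEdges: (existing vertex, new vertex); stockCommunity: existing vertex -> community id
  def joinAndVote(
      incEdges: Seq[(String, String)],
      stockCommunity: Map[String, String]): Map[String, String] = {
    // Join: replace the existing endpoint of each edge by its community.
    // Edges whose existing endpoint has no known community are dropped.
    val candidates: Seq[(String, String)] = incEdges.flatMap { case (known, fresh) =>
      stockCommunity.get(known).map(c => fresh -> c).toList
    }
    // Vote: each new vertex takes the community that occurs most often among
    // its neighbors (assumed tie-break: smallest community id).
    candidates.groupBy(_._1).map { case (fresh, votes) =>
      val counts = votes.groupBy(_._2).map { case (c, vs) => c -> vs.size }
      val best = counts.toSeq.sortBy { case (c, n) => (-n, c) }.head._1
      fresh -> best
    }
  }
}
```

Calling `JoinVoteSketch.joinAndVote(Seq("A" -> "D", "B" -> "D", "C" -> "D"), Map("A" -> "C1", "B" -> "C2", "C" -> "C2"))` reproduces the case in the figure: D receives the votes C1, C2, C2 and is assigned to C2.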
UMG 1 - Union
[Figure: vertices A, B, C, D and communities C1, C2, C3; after the union, which community should the new vertex join (C1 or C2)?]
newGraph = stockGraph.union(incGraph)
UMG 2 - findBestCommunity
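[Figure: vertices A, B, C and the new vertex D, with communities C1, C2, C3; a gain is computed for each candidate community of D.]
gain1 = G(node(d), community(1))
gain2 = G(node(d), community(2))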
// Collect neighbor data only along edges that touch an incoming vertex
val incVertexWithNeighbors = newGraph.mapReduceTriplets[Array[VertexData]](
  collectNeighborFunc,
  _ ++ _,
  Some((incGraph.vertices, EdgeDirection.Either)))
// Pick the best community for each incoming vertex
val idCommunity = incVertexWithNeighbors.map {
  case (vid, neighbors) => (vid, findBestCommunity(neighbors))
}.cache()
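$$C_i = \arg\max_{C_j} G(\mathrm{node}_i, C_j)$$
$$\Delta Q = \left[ \frac{\Sigma_{in} + k_{i,in}}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^2 \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^2 - \left( \frac{k_i}{2m} \right)^2 \right]$$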
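The greedy choice of community can be sketched as a difference of per-community modularity before and after the move. This is a hedged illustration, not the deck's `findBestCommunity`: the names `CommunityStats`, `gain`, and `bestCommunity` are hypothetical, and the sketch assumes Σ_in counts each internal edge twice (once per endpoint), so attaching a node with `kIn` links into a community raises Σ_in by `2 * kIn`. Under a different bookkeeping convention the increment changes accordingly.

```scala
object GainSketch {
  // Running totals kept per community (illustrative names).
  final case class CommunityStats(id: Int, in: Double, tot: Double)

  // Q_c = in/(2m) - (tot/(2m))^2
  def q(in: Double, tot: Double, m: Double): Double =
    in / (2 * m) - math.pow(tot / (2 * m), 2)

  // Gain from attaching a node of degree k, with kIn of its links going into c:
  // [Q_c after the move] - [Q_c before + Q of the node on its own]
  def gain(c: CommunityStats, k: Double, kIn: Double, m: Double): Double =
    q(c.in + 2 * kIn, c.tot + k, m) - (q(c.in, c.tot, m) + q(0.0, k, m))

  // Candidates are (community, number of links from the node into it).
  def bestCommunity(candidates: Seq[(CommunityStats, Double)], k: Double, m: Double): Int =
    candidates.maxBy { case (c, kIn) => gain(c, k, kIn, m) }._1.id
}
```

As a worked case, take two communities with `in = 6` each and `tot = 9` and `tot = 8`, a graph with `m = 10` edges, and a new node of degree 3 with two links into the first community and one into the second. The gains come out to 0.065 and -0.02, so the node joins the first community. Expanding the difference gives the shorter form kIn/m - tot·k/(2m²), which is what the two values above reduce to.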
UMG 3 - updateCommunities
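[Figure: vertices A, B, C, D with communities C1 and C2; after D is assigned, the per-community modularities (Q1, Q2) are recomputed.]
newCommunityRdd = idCommunity.updateCommunities(communityRdd)
newModularity = newCommunityRdd.map(community => community.modularity).reduce(_ + _)
$$Q = \sum_{i}^{c} Q_i = \sum_{i}^{c} \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^2 \right]$$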
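The update step only has to adjust the `in` and `tot` totals of the communities that the new edges touch, then sum the per-community modularities. The sketch below uses plain Scala collections and hypothetical names (`Community`, `Assignment`, `updateCommunities`); it assumes unweighted edges, that every new vertex connects only to vertices already in the stock graph, and that Σ_in counts each internal edge twice.

```scala
object UpdateSketch {
  final case class Community(in: Double, tot: Double) {
    def modularity(m: Double): Double = in / (2 * m) - math.pow(tot / (2 * m), 2)
  }

  // A new vertex, the community chosen for it, and the community of each of its neighbors.
  final case class Assignment(community: Int, neighborCommunities: Seq[Int])

  private def bump(t: Map[Int, Community], id: Int, dIn: Double, dTot: Double): Map[Int, Community] = {
    val c = t.getOrElse(id, Community(0.0, 0.0))
    t.updated(id, Community(c.in + dIn, c.tot + dTot))
  }

  def updateCommunities(stock: Map[Int, Community], assigned: Seq[Assignment]): Map[Int, Community] =
    assigned.foldLeft(stock) { (table, a) =>
      a.neighborCommunities.foldLeft(table) { (t, nc) =>
        // Each new edge adds one degree on the existing endpoint's community...
        val withNeighbor = bump(t, nc, dIn = 0.0, dTot = 1.0)
        // ...and one on the new vertex's community; it is internal if both coincide.
        bump(withNeighbor, a.community, dIn = if (nc == a.community) 2.0 else 0.0, dTot = 1.0)
      }
    }

  // m must be the edge count of the merged graph (old edges plus new edges).
  def totalModularity(communities: Map[Int, Community], m: Double): Double =
    communities.values.map(_.modularity(m)).reduce(_ + _)
}
```

Starting from two communities with `in = 6` and `tot = 7` each (7 edges), a new vertex assigned to community 0 with neighbors in communities 0, 0, and 1 yields `Community(10.0, 12.0)` and `Community(6.0, 8.0)`, and a total modularity of 0.28 with `m = 10`. Because each update is a sum of per-edge increments, the updates can be applied in any order, which is what makes the step easy to parallelize.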
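Flow Example Code
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Streaming context with 60-second batches
val conf = new SparkConf().setMaster(……).setAppName(……)
val ssc = new StreamingContext(conf, Seconds(60))

// Build the stock graph once and hand it to the streaming model
val totalGraph = initGraph(totalEdgesRdd)
val streamingFU = new StreamingFU().setTotalGraph(totalGraph)

// Incoming edges arrive as a queue of RDDs, one RDD per batch
val onlineDataFlow = getDataFlow(ssc.sparkContext)
val edgeStreamRDD = ssc.queueStream(onlineDataFlow, true)

// For each batch: build the incremental graph, update the model, write the results
edgeStreamRDD.foreachRDD { incEdgeRdd =>
  val incGraph = buildIncGraph(incEdgeRdd)
  val (communityInfoRDD, modularity) = streamingFU.trainOn(incGraph)
  outputToHBase(communityInfoRDD)
  outputToHBase(modularity)
}

ssc.start()
ssc.awaitTermination()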
Experiment Results
Autonomous Systems Graphs
Stanford Large Network Dataset Collection (as-733): https://snap.stanford.edu/data/
Modularity Trend – AS
Online Trading Graph
Buyer Seller
C-C
Modularity Trend – OT
Streaming Graph → Better Result
Key Points
• Operator
  • Merge small graph into large graph
• Model
  • Local changes
  • Index or summary
• Algorithm
  • Delicate formula
  • Commutative law & associative law
  • Parallelly & incrementally
Complex GraphX Operators
Graph Union Operator
[Figure: GRAPH(G) ∪ GRAPH(H) = GRAPH(G ∪ H). G has vertices A to F, H has vertices E to H, and the union has vertices A to H, with the shared vertices E and F merged.]
Graph Union Operator: https://issues.apache.org/jira/browse/SPARK-7894
[GraphX] Complex Operators between Graphs: Union: https://github.com/apache/spark/pull/6685
newGraph = stockGraph.union(incGraph)
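To make the semantics of a graph union concrete, here is a small sketch on plain Scala maps. It does not reproduce the operator proposed in SPARK-7894 or its signature; `SimpleGraph`, `union`, and the two merge functions are hypothetical names chosen for this example. Edges are treated as directed pairs, so (A, B) and (B, A) are distinct keys.

```scala
object GraphUnionSketch {
  // A toy graph: vertex id -> attribute, (src, dst) -> attribute.
  final case class SimpleGraph[VD, ED](
      vertices: Map[String, VD],
      edges: Map[(String, String), ED])

  // Merge two maps, combining the values of keys present in both.
  private def mergeMaps[K, V](a: Map[K, V], b: Map[K, V])(combine: (V, V) => V): Map[K, V] =
    b.foldLeft(a) { case (acc, (key, value)) =>
      acc.updated(key, acc.get(key).map(old => combine(old, value)).getOrElse(value))
    }

  // Union keeps every vertex and edge of both graphs; shared ones are merged
  // with the caller-supplied functions.
  def union[VD, ED](g: SimpleGraph[VD, ED], h: SimpleGraph[VD, ED])(
      mergeVertex: (VD, VD) => VD,
      mergeEdge: (ED, ED) => ED): SimpleGraph[VD, ED] =
    SimpleGraph(
      mergeMaps(g.vertices, h.vertices)(mergeVertex),
      mergeMaps(g.edges, h.edges)(mergeEdge))
}
```

With integer attributes used as counts, `union(g, h)(_ + _, _ + _)` adds the counts of shared vertices and edges and carries the rest over unchanged. Passing merge functions that are commutative and associative keeps the result independent of the order in which incremental graphs are folded into the stock graph.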
Complex GraphX Operators
• Union of Graphs (G ∪ H)
• Intersection of Graphs (G ∩ H)
• Graph Join
• Difference of Graphs (G – H)
• Graph Complement
• Line Graph (L(G))
Issues:
Complex Operators between Graphs: https://issues.apache.org/jira/browse/SPARK-7893
Streaming Optimization
Monitoring and Correction
[Figure: Data Loading feeds Streaming-FU [Streaming]; Modularity Threshold Checking [Hourly Monitoring] and a FastUnfolding run [Daily Running] correct the streaming result.]
modularityTable:
  mTime         mValue
  timestamp1    totalModularity1
  ……            ……
commRDDTable:
  communityID   communityInfo
  community1    (in1, tot1, degree1, modularity1)
  ……            ……
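The slide does not state the rule used for the threshold check, so the following is only one plausible shape for it. It assumes, purely for illustration, that a full FastUnfolding rebuild is triggered when the latest streamed modularity has fallen more than `maxDrop` below the value recorded by the last full run. `ModularityRow`, `needsFullRebuild`, `lastFullRunValue`, and `maxDrop` are hypothetical names; only the `mTime` and `mValue` fields come from the modularityTable layout.

```scala
object ThresholdCheckSketch {
  // One row of modularityTable: (mTime, mValue)
  final case class ModularityRow(mTime: Long, mValue: Double)

  // Assumed rule: rebuild when the most recent value has dropped by more than
  // maxDrop relative to the modularity of the last full run.
  def needsFullRebuild(
      rows: Seq[ModularityRow],
      lastFullRunValue: Double,
      maxDrop: Double): Boolean =
    if (rows.isEmpty) false
    else lastFullRunValue - rows.maxBy(_.mTime).mValue > maxDrop
}
```

With rows `(1, 0.40)` and `(2, 0.31)` and a last full-run value of 0.40, the check fires for `maxDrop = 0.05` and stays quiet for `maxDrop = 0.10`.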
Streaming Resource Allocation
• Driver-Memory: 20G
• Executors: 100
• Core: 2
• Executor-Memory: 20G
Not Enough for Peak Period!
Streaming Buffer
[Figure: components shown are TT Receiver, Split, Kafka Stream, Hdfs Stream, Join, StreamingFUModel, Streaming-FU, Streaming-Buffer, Kafka Buffer Writer, and HDFS; two buffers are labeled Modularity Correction Buffer and Resource Peak Buffer.]
Conclusion
• Streaming Graph
  • Complex Operators will help
  • Daily Rebuild & Threshold Check
  • Costs more memory and time
• Open Question
  • Checkpoint with Streaming or Graph?
Acknowledgements
1. Limits of community detection: http://www.slideshare.net/vtraag/comm-detect
2. Community Detection: http://www.traag.net/projects/community-detection/
3. Social Network Analysis: http://lorenzopaoliani.info/topics/
4. Community detection in complex networks using Extremal Optimization: http://arxiv.org/pdf/cond-mat/0501368.pdf
Q & A
Agenda
• Dynamic Community Detection
• Streaming Graph
• Models and Algorithms
• Complex GraphX Operators
• Streaming Optimization
• Conclusion
Static vs. Dynamic
Static Model
Dynamic Model