Parallel Algorithms for Trillion Edges Graph Problems
(Credit: F. Petrini research group, IBM Research)

Alexander Pozdneev, Research Software Engineer, IBM
October 16, 2015, DAMDID-2015



Big Data challenges

• Big Data, The New Natural Resource
• How to enrich?
• Super-abundance of information ⇒ Dark Data
• Need to be cognitive

© 2015 IBM Corporation


Graphs are everywhere

• Networks
  - social
  - computer
  - mobile
  - supply chain
  - road
  - ...
• Internet of Things (sensors)
• Tweets
• Purchase transactions
• Cell phone GPS signals
• ...


Graph processing on computer systems

• Huge datasets
• Current architectures assume locality of access
• Irregular access patterns limit the benefits of caching and prefetching
• Very little computation per byte loaded ⇒ latencies are harder to hide
• Small network packets, random destinations, very high rate


Breadth-first search (BFS)

Performance metric: traversed edges per second (TEPS)
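A minimal single-process sketch of the metric, assuming an undirected edge list and the Graph500-style convention of counting the input edges that fall inside the traversed component. The function names `bfs_parents` and `measure_teps` are illustrative and do not come from the slides; the distributed implementation discussed later is far more involved.

```python
import time
from collections import defaultdict


def bfs_parents(adj, source):
    """Level-synchronous BFS; returns a {vertex: parent} map."""
    parent = {source: source}
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent


def measure_teps(edges, source):
    """Run one BFS and report (parents, traversed edges, TEPS)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    start = time.perf_counter()
    parent = bfs_parents(adj, source)
    elapsed = time.perf_counter() - start
    # Assumption: count input edges with both endpoints in the reached component.
    traversed = sum(1 for u, v in edges if u in parent and v in parent)
    teps = traversed / elapsed if elapsed > 0 else float("inf")
    return parent, traversed, teps


demo_edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
demo_parent, demo_traversed, demo_teps = measure_teps(demo_edges, source=0)
print(f"reached {len(demo_parent)} vertices, {demo_traversed} edges, {demo_teps:.3g} TEPS")
```

Only the search itself is timed; building the adjacency lists is excluded, so the figure reflects traversal cost alone.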


Scale-free graphs

Power law: $P(k) \sim k^{-\gamma}$, with $2 < \gamma < 3$

“R-MAT: A Recursive Model for Graph Mining” by D. Chakrabarti et al.
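A rough sketch of R-MAT edge generation, to show where the skewed degree distribution comes from. The quadrant probabilities (0.57, 0.19, 0.19, 0.05) and the edge factor of 16 are the Graph500 defaults and are assumptions here, since the slide does not state them; vertex relabeling and other details of the benchmark generator are left out.

```python
import random
from collections import Counter


def rmat_edge(scale, rng, a=0.57, b=0.19, c=0.19):
    """Pick one edge in a graph of 2**scale vertices by recursive quadrant choice."""
    u = v = 0
    for _ in range(scale):
        r = rng.random()
        u <<= 1
        v <<= 1
        if r < a:
            pass          # top-left quadrant: both bits stay 0
        elif r < a + b:
            v |= 1        # top-right quadrant
        elif r < a + b + c:
            u |= 1        # bottom-left quadrant
        else:
            u |= 1        # bottom-right quadrant, probability 1 - a - b - c
            v |= 1
    return u, v


def rmat_edges(scale, edge_factor=16, seed=0):
    """Generate edge_factor * 2**scale edges (duplicates and self-loops kept)."""
    rng = random.Random(seed)
    return [rmat_edge(scale, rng) for _ in range(edge_factor * (1 << scale))]


edges = rmat_edges(scale=10)
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
mean_degree = 2 * len(edges) / (1 << 10)
print(f"mean degree {mean_degree:.0f}, max degree {max(degree.values())}")
```

Because the top-left quadrant is favored at every level, low-numbered vertices collect far more edges than the average vertex, which is the kind of imbalance the load-balancing slides address.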


BFS: Evolution of performance

[Checconi2014]


IBM leadership in graph processing: Graph500

Date    Rank  System           Model  Nodes  Cores  Scale  GTEPS
Jul’15  2     Sequoia          Q      96k    1.5M   41     23751
Nov’14  1     Sequoia          Q      96k    1.5M   41     23751
Jun’14  2     Sequoia          Q      64k    1M     40     16599
Nov’13  1     Sequoia          Q      64k    1M     40     15363
Jun’13  1     Sequoia          Q      64k    1M     40     15363
Nov’12  1     Sequoia          Q      64k    1M     40     15363
Jun’12  1     Sequoia/Mira     Q      32k    512k   38     3541
Nov’11  1     BG/Q prototype   Q      4k     64k    32     253
Jun’11  1     Intrepid/Jugene  P      32k    128k   38     18
Nov’10  1     Intrepid         P      8k     32k    36     7

http://www.graph500.org


BFS: Breakthrough results

• IBM Blue Gene/Q Sequoia
  - 96 racks
  - 96k nodes
  - 1.5M cores
  - 6M threads
• R-MAT
  - scale: 41
  - 2T vertices (T = $10^{12}$)
• 23.8T TEPS
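A back-of-the-envelope check of these numbers. It assumes the Graph500 convention that a scale-s graph has 2^s vertices and an edge factor of 16; the edge factor is not stated on the slide. The time estimate is rough, since the benchmark counts only edges in the traversed component.

```python
SCALE = 41
EDGE_FACTOR = 16        # assumption: Graph500 default, not stated on the slide
TEPS = 23.8e12          # from the slide: 23.8T TEPS

vertices = 2 ** SCALE
edges = EDGE_FACTOR * vertices
seconds_per_bfs = edges / TEPS   # upper estimate: assumes every edge is traversed

print(f"vertices: {vertices:.3e}")
print(f"edges:    {edges:.3e}")
print(f"approx. time per BFS: {seconds_per_bfs:.2f} s")
```

Under these assumptions the graph has roughly 2.2 trillion vertices and 35 trillion edges, and a full traversal takes on the order of a second or two.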


BFS: Algorithm


BFS: Techniques and strategies

• 1D decomposition
• Compressed adjacency list with a coarse index
• Direction optimization
• Load balancing
• Message compression
• Multithreading


BFS: 1D decomposition w/o load balancing

• The vertices are partitioned among the m computing nodes $n_0, n_1, \ldots, n_{m-1}$
• The edges incident on the vertices assigned to a computing node $n_i$ are all stored on $n_i$
• Visiting an edge (u, v) involves cooperation and communication between the owners of u and v
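The three points above can be sketched in a few lines. This is an illustration only: it assumes a block partition (contiguous vertex ranges per node), which the slide does not specify, and the names `owner`, `distribute` and `remote_visits` are hypothetical.

```python
def owner(vertex, num_vertices, m):
    """Node index that owns `vertex` under a block 1D partition (assumed scheme)."""
    block = (num_vertices + m - 1) // m   # vertices per node, rounded up
    return vertex // block


def distribute(edges, num_vertices, m):
    """Store each vertex's adjacency list on the node that owns the vertex."""
    local_adj = [dict() for _ in range(m)]
    for u, v in edges:
        # An undirected edge appears in the adjacency list of both endpoints.
        for a, b in ((u, v), (v, u)):
            local_adj[owner(a, num_vertices, m)].setdefault(a, []).append(b)
    return local_adj


def remote_visits(local_adj, num_vertices, m):
    """Count adjacency entries whose target lives on a different node."""
    count = 0
    for node, adj in enumerate(local_adj):
        for neighbors in adj.values():
            count += sum(1 for v in neighbors if owner(v, num_vertices, m) != node)
    return count


path = [(0, 1), (1, 2), (2, 3)]
parts = distribute(path, num_vertices=4, m=2)
print(parts)
print("remote visits:", remote_visits(parts, num_vertices=4, m=2))
```

Every remote entry corresponds to a message between two owners during the search, which is why message volume dominates at scale.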


BFS: Direction optimization
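The figure for this slide is not preserved in the transcript. As a stand-in, here is a single-process sketch of the general idea as it is commonly described: expand the frontier top-down while it is small, and switch to a bottom-up step (unvisited vertices look for a parent in the frontier) once the frontier covers a large share of the remaining edges. The threshold `alpha` and its default value are illustrative assumptions, not values taken from the slides.

```python
def direction_optimizing_bfs(adj, source, alpha=4):
    """BFS over adjacency lists `adj` (list of lists, undirected graph).

    Returns a parent list with -1 for unreached vertices.
    `alpha` is a hypothetical tuning knob for the top-down/bottom-up switch.
    """
    n = len(adj)
    parent = [-1] * n
    parent[source] = source
    frontier = {source}
    while frontier:
        unvisited = [v for v in range(n) if parent[v] == -1]
        frontier_edges = sum(len(adj[u]) for u in frontier)
        unvisited_edges = sum(len(adj[v]) for v in unvisited)
        next_frontier = set()
        if frontier_edges * alpha > unvisited_edges:
            # Bottom-up: each unvisited vertex stops at its first frontier neighbor.
            for v in unvisited:
                for u in adj[v]:
                    if u in frontier:
                        parent[v] = u
                        next_frontier.add(v)
                        break
        else:
            # Top-down: each frontier vertex claims its unvisited neighbors.
            for u in frontier:
                for v in adj[u]:
                    if parent[v] == -1:
                        parent[v] = u
                        next_frontier.add(v)
        frontier = next_frontier
    return parent


graph = [[1, 2, 3], [0, 4], [0, 4], [0], [1, 2, 5], [4]]
print(direction_optimizing_bfs(graph, source=0))
```

In the bottom-up step a vertex stops scanning as soon as it finds one parent, so many edges are never examined; in a distributed setting this translates into fewer messages.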


BFS: Impact of direction optimization

[Figure panels: graph properties; number of messages]


BFS: Need for load balancing

• Motivation
  - Large scales ⇒ load balancing issues
  - Irregular nature of the graph
  - Vertices with huge number of neighbors
• LB: the number of vertices to rebalance
• The set H contains the LB vertices with the highest degrees:
  - Created during the data distribution phase
  - H-vertices do not follow the 1D distribution
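Selecting the set H can be sketched as a top-k query over vertex degrees. This is only an illustration of the definition above; how the edges of H-vertices are then spread across nodes is not described on this slide and is not modeled here. The function name `select_heavy_vertices` is hypothetical.

```python
import heapq
from collections import Counter


def select_heavy_vertices(edges, lb):
    """Return (H, degree): the `lb` highest-degree vertices and all degrees."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    heavy = set(heapq.nlargest(lb, degree, key=degree.get))
    return heavy, degree


# A star around vertex 0 plus one extra edge: vertex 0 dominates the degree count.
star = [(0, i) for i in range(1, 6)] + [(1, 2)]
H, deg = select_heavy_vertices(star, lb=1)
share = sum(deg[v] for v in H) / sum(deg.values())
print(f"H = {H}, share of edge endpoints covered by H = {share:.2f}")
```

Even a small H can account for a large share of all edge endpoints in a skewed graph, which is why treating these vertices separately pays off.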


BFS: Load balancing


BFS: Algorithms for H-vertices


BFS: Impact of individual optimizations


BFS: Scalability (in the “weak” sense)

Scale: 34–40


Summary

• Big Data, Dark Data, Cognitive Computing
• Graph models
• Traditional architectures and algorithms
• IBM leadership in graph processing
  - Graph500
  - IBM Blue Gene/Q Sequoia (96k nodes, 6M threads)
  - R-MAT scale: 41 (2T vertices), performance: 23.8T TEPS
  - Techniques
    • 1D decomposition
    • Direction optimization
    • Load balancing
  - Scalability


References

[Checconi2014] F. Checconi, F. Petrini. Traversing Trillions of Edges in Real-time: Graph Exploration on Large-scale Parallel Machines. Parallel and Distributed Processing Symposium (IPDPS), 2014 IEEE International, Phoenix, Arizona, 19-23 May 2014.


Further reading

• Parallel algorithms developed at IBM Research for BFS and SSSP problems, 2014 (in Russian)
• Graph Community Detection Algorithm for Distributed Memory Parallel Computing Systems, 2015

