HAMA: An Efficient Matrix Computation with the MapReduce Framework

Sangwon Seo, Edward J. Woon, Jaehong Kim, Seongwook Jin, Jin-soo Kim, Seungryoul Maeng
IEEE 2007

Dec 3, 2014, Kyung-Bin Lim

Outline

Introduction
Methodology
Experiments
Conclusion

Apache HAMA

Easy-to-use tool for data-intensive scientific computation
Provides massive matrix/graph computations as its primary functionality
Fundamental design changed from MapReduce-based matrix computation to BSP-based graph processing
Modeled after Pregel, running on HDFS
– Uses ZooKeeper as a synchronization barrier
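To make the BSP model concrete, here is a minimal Python sketch of supersteps: each peer computes locally, queues messages, then waits at a barrier before the next superstep begins. `threading.Barrier` stands in for the ZooKeeper-based barrier, and `run_bsp` / `all_sum` are hypothetical names chosen for illustration; this is not Apache Hama's BSP API.

```python
import threading


def run_bsp(num_peers, num_supersteps, compute):
    """Toy BSP runner. threading.Barrier is a stand-in for a ZooKeeper barrier."""
    barrier = threading.Barrier(num_peers)
    # inboxes[s][p]: messages delivered to peer p at the start of superstep s
    inboxes = [[[] for _ in range(num_peers)] for _ in range(num_supersteps + 1)]
    lock = threading.Lock()
    states = [0] * num_peers

    def peer(pid):
        for step in range(num_supersteps):
            # 1. Local computation on messages sent during the previous superstep.
            states[pid], outgoing = compute(pid, states[pid], inboxes[step][pid], num_peers)
            # 2. Communication: queue messages for the next superstep.
            with lock:
                for dest, msg in outgoing:
                    inboxes[step + 1][dest].append(msg)
            # 3. Barrier synchronization: no peer starts step + 1 until all arrive.
            barrier.wait()

    threads = [threading.Thread(target=peer, args=(p,)) for p in range(num_peers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return states


def all_sum(pid, state, messages, n):
    """Hypothetical example job: every peer ends up holding the sum of all peer ids."""
    if not messages:  # superstep 0: broadcast our own id to every peer
        return pid, [(dest, pid) for dest in range(n)]
    return sum(messages), []  # superstep 1: sum what everyone sent
```

With four peers, `run_bsp(4, 2, all_sum)` returns `[6, 6, 6, 6]`: the barrier guarantees every broadcast from superstep 0 is queued before any peer reads its inbox in superstep 1.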

Our Focus

This paper describes an earlier version (0.1) of HAMA
– Latest version: 0.7.0, released Mar. 2014

Focuses only on matrix computation with MapReduce
Presents simple case studies

The HAMA Architecture

We propose a distributed scientific framework called HAMA (based on HPMR)
– Provides transparent matrix/graph primitives

The HAMA Architecture

HAMA API: easy-to-use interface
HAMA Core: provides matrix/graph primitives
HAMA Shell: interactive user console

Contributions of HAMA

Compatibility
– Takes advantage of all Hadoop features

Scalability
– Scalable as a consequence of Hadoop compatibility

Flexibility
– Multiple compute engines are configurable

Applicability
– HAMA’s primitives can be applied to various applications

Outline

Introduction
Methodology
Experiments
Conclusion

Case Study

Using a case-study approach, we introduce two basic primitives implemented with the MapReduce model on HAMA
– Matrix multiplication and finding a linear solution

We also compare them with MPI versions of the same primitives

Case Study

Representing matrices
– By default, HAMA uses HBase (a NoSQL database)

HBase is modeled after Google’s Bigtable
– A column-oriented, semi-structured, highly scalable distributed database
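A toy, in-memory Python sketch of how a matrix can map onto an HBase-style table: one row key per matrix row and one column per nonzero entry, so zeros cost nothing to store. The class name and the `column:<j>` column naming are hypothetical, chosen for illustration; the slides do not give HAMA's actual table schema, and no HBase client is used here.

```python
from collections import defaultdict


class ToyMatrixTable:
    """Hypothetical HBase-like layout for a matrix (not HAMA's real schema).

    row key     -> matrix row index i
    column name -> "column:<j>" (family "column", qualifier = column index j)
    cell value  -> A[i][j]; zero entries are not stored at all
    """

    def __init__(self):
        self.rows = defaultdict(dict)  # row key -> {column name: value}

    def put(self, i, j, value):
        if value != 0:
            self.rows[i][f"column:{j}"] = value
        else:
            # Writing a zero deletes the cell, keeping the table sparse.
            self.rows[i].pop(f"column:{j}", None)

    def get(self, i, j):
        # Missing cells read back as 0.
        return self.rows.get(i, {}).get(f"column:{j}", 0)

    def get_row(self, i):
        """Return row i as {j: value}, similar to reading a single row key."""
        return {int(name.split(":")[1]): v for name, v in self.rows.get(i, {}).items()}

    @classmethod
    def from_dense(cls, dense):
        table = cls()
        for i, row in enumerate(dense):
            for j, value in enumerate(row):
                table.put(i, j, value)
        return table
```

For example, `ToyMatrixTable.from_dense([[1, 0], [0, 2]])` stores only two cells, and `get_row(1)` returns `{1: 2}`.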

Case Study – Multiplication: Iterative Way

Iterative approach (Algorithm)

Case Study – Multiplication: Iterative Way

Simple, naïve strategy

Works well with sparse matrices
– Sparse matrix: most entries are 0
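The algorithm figure for the iterative approach is not reproduced in this transcript. As an assumption, the Python sketch below phrases the naive strategy as one simulated MapReduce pass over dict-of-dicts sparse matrices: map work is keyed by the shared index k, emitting the partial row A[i][k] · B[k, :], and reduce sums partial rows per output row i. The name `iterative_multiply` and this keying are hypothetical and may differ from HAMA's. The point it illustrates is that only nonzero entries generate intermediate records.

```python
from collections import defaultdict


def iterative_multiply(A, B):
    """Sparse C = A * B as one simulated MapReduce pass (illustrative sketch).

    A, B: sparse matrices as {row: {col: value}} containing only nonzeros.
    Returns (C, emitted) where emitted counts intermediate map outputs.
    """
    # Column view of A: k -> {i: A[i][k]}
    a_cols = defaultdict(dict)
    for i, row in A.items():
        for k, v in row.items():
            a_cols[k][i] = v

    # Map phase: for each index k, emit (i, A[i][k] * B[k, :]) for nonzero A[i][k].
    intermediate = defaultdict(list)
    emitted = 0
    for k, b_row in B.items():
        for i, a_ik in a_cols.get(k, {}).items():
            intermediate[i].append({j: a_ik * b_kj for j, b_kj in b_row.items()})
            emitted += 1

    # Reduce phase: sum all partial rows that share the same output row i.
    C = {}
    for i, partials in intermediate.items():
        acc = defaultdict(int)
        for partial in partials:
            for j, v in partial.items():
                acc[j] += v
        C[i] = dict(acc)
    return C, emitted
```

For A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]] in sparse form, `iterative_multiply({0: {0: 1, 1: 2}, 1: {1: 3}}, {0: {0: 4}, 1: {0: 5, 1: 6}})` yields C = `{0: {0: 14, 1: 12}, 1: {0: 15, 1: 18}}` with `emitted == 3` rather than 4, because the zero entry A[1][0] produces no map output.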

Case Study – Multiplication: Block Way

Multiplication can be done using sub-matrices (blocks)

Works well with dense matrices
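Why blocking is valid: if A and B are partitioned into conformable q × q grids of sub-matrices, each output block is a sum of block products, so every product can be computed independently and summed afterwards. The block notation below is ours, not the slide's.

```latex
C_{IJ} = \sum_{K=1}^{q} A_{IK}\, B_{KJ}, \qquad I, J = 1, \dots, q

% Example with q = 2:
\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}
=
\begin{pmatrix}
A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\
A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22}
\end{pmatrix}
```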

Case Study – Multiplication: Block Way

Block approach
– Minimizes data movement (network cost)

Case Study – Multiplication: Block Way

Block Approach (Algorithm)
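The algorithm figure is not in this transcript, so here is a minimal Python sketch of a block approach under stated assumptions: dense matrices as lists of lists, a block size `bs`, and map/reduce emulated in one process. The map step emits each block product A_IK · B_KJ keyed by (I, J), and the reduce step sums the products per key. The function names and the pairing strategy are hypothetical, not the paper's exact algorithm.

```python
from collections import defaultdict


def split_blocks(M, bs):
    """Split a dense matrix into {(I, J): block} with blocks of at most bs x bs."""
    blocks = {}
    for r0 in range(0, len(M), bs):
        for c0 in range(0, len(M[0]), bs):
            blocks[(r0 // bs, c0 // bs)] = [row[c0:c0 + bs] for row in M[r0:r0 + bs]]
    return blocks


def matmul(X, Y):
    """Plain dense product, used here for the small per-block products."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def block_multiply(A, B, bs):
    """C = A * B computed block by block (MapReduce-style emulation)."""
    a_blocks, b_blocks = split_blocks(A, bs), split_blocks(B, bs)

    # Map: pair A_IK with B_KJ (matching inner index K), emit product keyed by (I, J).
    intermediate = defaultdict(list)
    for (I, K), a in a_blocks.items():
        for (K2, J), b in b_blocks.items():
            if K == K2:
                intermediate[(I, J)].append(matmul(a, b))

    # Reduce: sum the partial block products for each output block (I, J).
    C = [[0] * len(B[0]) for _ in range(len(A))]
    for (I, J), products in intermediate.items():
        for p in products:
            for i, row in enumerate(p):
                for j, v in enumerate(row):
                    C[I * bs + i][J * bs + j] += v
    return C
```

For example, `block_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]], 1)` returns `[[19, 22], [43, 50]]`. For n × n matrices with n divisible by `bs`, the map step emits q³ block products of bs² entries each (q = n / bs), about n³ / bs values in total, so larger blocks shrink the intermediate data that must be shuffled.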

Case Study – Finding Linear Solution

Ax = b: solve for x

A: known square, symmetric, positive-definite matrix
b: known vector

Use the Conjugate Gradient (CG) method

Case Study – Finding Linear Solution

Conjugate Gradient method
– Find a direction (conjugate direction)
– Find a step size (line search)
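For reference, the standard textbook form of CG (not necessarily the slide's exact notation) makes the two steps explicit: the step size α_k comes from an exact line search along p_k, and the next direction p_{k+1} is made A-conjugate to the previous ones.

```latex
r_0 = b - A x_0, \qquad p_0 = r_0

\alpha_k = \frac{r_k^{\top} r_k}{p_k^{\top} A\, p_k}            % step size (line search)
x_{k+1} = x_k + \alpha_k\, p_k
r_{k+1} = r_k - \alpha_k\, A p_k
\beta_k = \frac{r_{k+1}^{\top} r_{k+1}}{r_k^{\top} r_k}
p_{k+1} = r_{k+1} + \beta_k\, p_k                                % next conjugate direction
```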

Case Study – Finding Linear Solution

Conjugate Gradient Method (Algorithm)
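A compact, single-process Python sketch of the textbook CG iteration, using plain lists so it stays self-contained. The matrix-vector product `matvec` is the expensive step that a MapReduce version would distribute, since each row's dot product is independent. The function names and the `tol` / `max_iter` parameters are illustrative assumptions, not HAMA parameters.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    # The distributable step: each output entry is an independent row dot product.
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve Ax = b for symmetric positive-definite A, starting from x0 = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # r0 = b - A x0 = b, since x0 = 0
    p = list(r)          # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter or 10 * n):
        if rs_old ** 0.5 < tol:
            break        # residual norm small enough: converged
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                       # step size (exact line search)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]      # next A-conjugate direction
        rs_old = rs_new
    return x
```

For example, `conjugate_gradient([[4, 1], [1, 3]], [1, 2])` converges in two iterations to approximately `[1/11, 7/11]`, the exact solution of that 2 × 2 system. In exact arithmetic CG finishes in at most n iterations; the `10 * n` cap is only a safeguard against floating-point drift.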

Outline

Introduction
Methodology
Experiments
Conclusion

Evaluations

TUSCI (TU Berlin SCI) cluster
– 16 nodes, each with two Intel P4 Xeon processors and 1 GB of memory
– Connected by an SCI (Scalable Coherent Interface) network in a 2D torus topology
– Running under OpenCCS (an environment similar to HOD, Hadoop On Demand)

Test sets

HPMR’s Enhancements

Prefetching
– Increases data locality

Pre-shuffling
– Reduces the amount of intermediate output to shuffle

Evaluations

Comparison of average execution time and scaleup for matrix multiplication

Evaluations

Comparison of average execution time and scaleup for CG

Evaluations

Comparison of average execution time for CG when a single node is overloaded

Outline

Introduction
Methodology
Experiments
Conclusion

Conclusion

HAMA provides an easy-to-use tool for data-intensive computations
– Matrix computation with MapReduce
– Graph computation with BSP