72130310 임규찬. 1. abstract of paper 1.abstract of paper 2.reference of paper – late 2....
![Page 1: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/1.jpg)
Tarazu: Optimizing MapReduce on Heterogeneous Clusters
72130310 임규찬
![Page 2: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/2.jpg)
1. Abstract of Paper
   1. Abstract of paper
   2. Reference of paper – LATE
2. Introduction
3. Issue with Heterogeneity
4. Tarazu
5. Experimental Result
Contents
![Page 3: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/3.jpg)
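The paper studies how to optimize MapReduce in a heterogeneous cluster environment.
◦ Data-center-scale clusters are adopting heterogeneous hardware for economic reasons.
◦ MapReduce makes big-data processing possible on such clusters.
With existing techniques, performance actually got worse.
◦ Prior work based on straggler task management is not effective.
As an example, the paper compares against "Improving MapReduce Performance in Heterogeneous Environments".
Abstract of Paper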
![Page 4: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/4.jpg)
Optimizing for heterogeneity by controlling straggler tasks
◦ Straggler condition: a node is available but is performing poorly
◦ Can arise for many reasons, such as faulty hardware and misconfiguration
Proposes the LATE scheduler
◦ Longest Approximate Time to End
◦ Uses the progress rate of each task
  ProgressRate = ProgressScore / (amount of time the task has been running)
Unfortunately, LATE alone is not sufficient to address hardware heterogeneity.
Reference of Paper: Improving MapReduce Performance in Heterogeneous Environments
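The progress-rate idea on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the sample task table are hypothetical, and the time-left estimate `(1 - ProgressScore) / ProgressRate` follows the LATE paper as commonly described, not this slide.

```python
def progress_rate(progress_score: float, elapsed_seconds: float) -> float:
    """Progress rate = progress score / time the task has been running."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return progress_score / elapsed_seconds


def estimated_time_left(progress_score: float, elapsed_seconds: float) -> float:
    """Estimate the remaining time as (1 - progress score) / progress rate."""
    rate = progress_rate(progress_score, elapsed_seconds)
    if rate == 0:
        return float("inf")  # no progress yet, so no finite estimate
    return (1.0 - progress_score) / rate


def pick_speculation_candidate(tasks: dict) -> str:
    """Return the task with the longest approximate time to end.

    `tasks` maps a task name to (progress_score, elapsed_seconds).
    """
    return max(tasks, key=lambda name: estimated_time_left(*tasks[name]))


# Hypothetical running tasks: name -> (progress score in [0, 1], seconds running)
running = {"map_a": (0.9, 90.0), "map_b": (0.2, 80.0), "map_c": (0.5, 50.0)}
candidate = pick_speculation_candidate(running)
```

With these sample numbers `map_b` has the lowest progress rate and the most work left, so it is the one a LATE-style scheduler would consider re-executing first.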
![Page 5: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/5.jpg)
A Hindi word meaning "balance"
◦ Designed to pursue balance in MapReduce computation
Introduction - Tarazu (तराजू)
![Page 6: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/6.jpg)
A framework built to process large volumes of data in parallel in a distributed computing environment
◦ Optimized for homogeneous clusters.
Introduction - MapReduce
![Page 7: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/7.jpg)
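Computing on systems built from different kinds of cores
◦ GPGPU using CPU/GPU
  Aims to improve performance by maximizing the respective strengths of the CPU and the GPU.
  OpenCL, CUDA, DirectCompute and others exist.
  Not covered in this paper.
◦ Clustering using high-end/low-end nodes
  Optimizes for monetary factors such as power and price.
  This paper uses 10 Xeon nodes and 80 Atom nodes.
Introduction - Heterogeneous Computing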
![Page 8: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/8.jpg)
Four-phase execution model
◦ Map computation
  Produces <Key, Value> tuples
◦ Shuffle
  All-Map-to-all-Reduce personalized communication
◦ Sort
  Groups all the tuples for the same key
◦ Reduce computation
  Processes all the tuples for a key and produces the final output
Issue with Heterogeneity - Background: MapReduce
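As an illustration of the four phases, here is a minimal single-process word-count sketch in Python. The function names, the CRC32-based partitioning, and the two-reducer default are assumptions made for the example; a real MapReduce runtime distributes each phase across nodes.

```python
import zlib
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (key, value) tuple for every word."""
    return [(word, 1) for doc in documents for word in doc.split()]


def shuffle_phase(tuples, num_reducers):
    """Shuffle: route every tuple to the reducer that owns its key."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in tuples:
        index = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[index].append((key, value))
    return partitions


def sort_phase(partition):
    """Sort: group all the tuples that share a key."""
    grouped = defaultdict(list)
    for key, value in sorted(partition):
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: process all values for a key and produce the final output."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(documents, num_reducers=2):
    result = {}
    for partition in shuffle_phase(map_phase(documents), num_reducers):
        result.update(reduce_phase(sort_phase(partition)))
    return result


counts = run_job(["a b a", "b c"])
```

Because every key is owned by exactly one reducer, merging the per-reducer outputs with `update` never overwrites a count.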
![Page 9: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/9.jpg)
Dynamic load balancing in MapReduce
◦ Slower nodes get fewer tasks; faster nodes get more tasks
MapReduce on a heterogeneous cluster is slower than on a homogeneous one
◦ 20-75% slower for six out of eleven benchmarks
◦ Heterogeneity can degrade performance
Poor performance is due to two key factors
◦ One non-intuitive
◦ The other intuitive
Issue with Heterogeneity - Reasons for poor performance on heterogeneous clusters
![Page 10: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/10.jpg)
Factor 1: Non-intuitive
◦ Interaction between load balancing and network traffic
Heterogeneity causes remote tasks
◦ Xeon nodes are fast and Atom nodes are slow, so Xeon nodes steal tasks from Atom nodes
◦ Remote tasks generate network traffic
◦ The network traffic problem is exacerbated when the Shuffle is heavy
Issue with Heterogeneity - Reasons for poor performance on heterogeneous clusters
![Page 11: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/11.jpg)
Factor 2: Intuitive
◦ Reduce-phase imbalance is amplified by heterogeneity
Reduce-phase load imbalance
◦ Different processing speeds make the Reduce phase take a long time
Issue with Heterogeneity - Reasons for poor performance on heterogeneous clusters
![Page 12: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/12.jpg)
Issue with Heterogeneity - A Simple(?) analytical model
Map Finish Time, MFT (the finish time of whichever of the high-end/low-end systems completes its Map computation later)
Amount of input data crossing the bisection (data moved by remote tasks + shuffle data)
Shuffle Finish Time, SFT (the time due to remote tasks, or MFT)
Reduce Finish Time, RFT (the time due to remote tasks, or MFT)
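One way to read the MFT and SFT definitions on this slide is as "take the later of two times". The Python sketch below encodes that reading only. The `max` form for SFT, the `bisection_bandwidth` parameter, and all sample numbers are assumptions for illustration, not the paper's exact model, and RFT is left out because the slide does not give enough detail to model it.

```python
def map_finish_time(mft_high: float, mft_low: float) -> float:
    """MFT: whichever of the high-end/low-end systems finishes Map later."""
    return max(mft_high, mft_low)


def shuffle_finish_time(mft: float, bisection_data: float,
                        bisection_bandwidth: float) -> float:
    """SFT under the assumed reading: the Shuffle ends either when the
    bisection traffic (remote-task data + shuffle data) has drained,
    or at MFT, whichever is later."""
    if bisection_bandwidth <= 0:
        raise ValueError("bisection_bandwidth must be positive")
    return max(mft, bisection_data / bisection_bandwidth)


# Hypothetical numbers: seconds, megabytes, megabytes per second
mft = map_finish_time(mft_high=100.0, mft_low=140.0)
sft_network_bound = shuffle_finish_time(mft, bisection_data=2000.0,
                                        bisection_bandwidth=10.0)
sft_map_bound = shuffle_finish_time(mft, bisection_data=1000.0,
                                    bisection_bandwidth=10.0)
```

In the first case the network is the bottleneck (200 s against an MFT of 140 s); in the second the Shuffle is hidden behind Map computation and SFT equals MFT.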
![Page 13: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/13.jpg)
Two problems in MapReduce
◦ Map-side built-in load balancing results in remote Map tasks
◦ Reduce-side load imbalance across the nodes
Tarazu consists of three components
◦ Communication-Aware Load Balancing of Map computation (CALB)
◦ Communication-Aware Scheduling of Map computation (CAS)
◦ Predictive Load Balancing of Reduce computation (PLB)
Tarazu
![Page 14: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/14.jpg)
Based on a key observation
◦ It follows from the overlap between Map computation and Shuffle
When Shuffle is critical: 'no-steal mode'
◦ Remote tasks are picked up only once the Shuffle ends
  There are no remote Map tasks to compete with the Shuffle
  Reduces the I/O processing overhead
  Slower nodes perform more work
Tarazu - Communication-Aware Load Balancing of Map computation
![Page 15: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/15.jpg)
When Map computation is critical: 'task-steal mode'
◦ This case is the concern of CAS.
CALB changes mode using shuffleLag
◦ Uses the monitoring that MapReduce already performs for fault tolerance
  shuffleLag: the difference between the number of Map tasks that have completed their computation and the number that have completed their communication, over all nodes
◦ Deciding the source of criticality once is enough, without repeated, dynamic checks.
Tarazu - Communication-Aware Load Balancing of Map computation
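The shuffleLag-based mode choice can be sketched as follows. The `NodeProgress` record, the fixed `lag_threshold`, and the sample counts are hypothetical: the slide only says that CALB switches mode using shuffleLag, not how the cut-off is chosen.

```python
from dataclasses import dataclass


@dataclass
class NodeProgress:
    maps_computed: int   # Map tasks that finished their computation
    maps_shuffled: int   # Map tasks whose output has been communicated


def shuffle_lag(nodes) -> int:
    """Difference between completed Map computations and completed
    Map communications, summed over all nodes."""
    computed = sum(node.maps_computed for node in nodes)
    shuffled = sum(node.maps_shuffled for node in nodes)
    return computed - shuffled


def choose_mode(nodes, lag_threshold: int) -> str:
    """Assumed rule: a large lag means the Shuffle is falling behind,
    so treat Shuffle as critical and stop stealing remote tasks."""
    if shuffle_lag(nodes) > lag_threshold:
        return "no-steal"
    return "task-steal"


# Hypothetical snapshot of two nodes
cluster = [NodeProgress(10, 4), NodeProgress(6, 5)]
lag = shuffle_lag(cluster)
```

Since the slide notes that deciding the source of criticality once is enough, a scheduler built on this sketch would call `choose_mode` a single time per job instead of polling it.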
![Page 16: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/16.jpg)
Determines how many remote tasks are needed
◦ Used in CALB's 'task-steal' mode
◦ Used to avoid increasing the SFT
To avoid traffic bursts, CAS spreads out the remote tasks by interleaving them with local tasks
Tarazu - Communication-Aware Scheduling of Map computation
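The interleaving step can be illustrated with a small Python sketch that spreads remote tasks evenly through a node's local task queue. The `interleave` function and the task labels are hypothetical, and the sketch covers only the ordering, not how CAS decides the number of remote tasks.

```python
def interleave(local_tasks, remote_tasks):
    """Return one schedule in which remote tasks are spaced evenly
    among local tasks instead of being issued back to back."""
    schedule = []
    n_local, n_remote = len(local_tasks), len(remote_tasks)
    issued = 0  # remote tasks placed so far
    for i, task in enumerate(local_tasks, start=1):
        schedule.append(task)
        # Remote tasks that should have been issued after i local tasks
        due = (i * n_remote) // n_local
        while issued < due:
            schedule.append(remote_tasks[issued])
            issued += 1
    # With no local tasks at all, only remote work remains
    schedule.extend(remote_tasks[issued:])
    return schedule


plan = interleave(["L1", "L2", "L3", "L4", "L5", "L6"], ["R1", "R2"])
```

Here the two remote tasks land after the third and sixth local tasks, so their input reads are separated by local computation.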
![Page 17: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/17.jpg)
CAS has other benefits
◦ By interleaving remote tasks with local tasks, CAS achieves better overlap between remote task communication and local task computation on both the sender and receiver sides
◦ Remote tasks read input data faster by avoiding bursts
Tarazu - Communication-Aware Scheduling of Map computation
![Page 18: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/18.jpg)
Better load balance in the Reduce phase
◦ Skews the intermediate key distribution
◦ Reduces the max term in RFT
Each Reduce task's share takes into account the number of fast/slow nodes.
Tarazu - Predictive Load Balancing of Reduce computation
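Skewing the intermediate key distribution can be pictured as a weighted partitioner: Reduce tasks on faster nodes own a larger slice of the key space. The Python sketch below is only an illustration. The integer weights, the CRC32 hash, and the function names are assumptions, and how PLB actually predicts the right skew is not described on this slide.

```python
import bisect
import zlib
from itertools import accumulate


def reducer_for_point(cumulative, point: int) -> int:
    """Map a point in [0, total weight) to the reducer owning that slice."""
    return bisect.bisect_right(cumulative, point)


def make_weighted_partitioner(weights):
    """Build a key -> reducer index function.

    `weights` holds one positive integer per Reduce task; a task with
    weight 3 receives about three times the keys of a task with weight 1.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be positive integers")
    cumulative = list(accumulate(weights))
    total = cumulative[-1]

    def partition(key: str) -> int:
        point = zlib.crc32(key.encode("utf-8")) % total
        return reducer_for_point(cumulative, point)

    return partition


# Hypothetical setup: reducer 0 on a fast node, reducer 1 on a slow node
partition = make_weighted_partitioner([3, 1])
owners = [reducer_for_point([3, 4], p) for p in range(4)]
```

With weights `[3, 1]`, three of every four hash slots belong to the reducer on the fast node, which is the kind of skew that shortens the slowest Reduce task.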
![Page 19: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/19.jpg)
Uses a heterogeneous cluster environment
◦ 10 Xeon-based and 80 Atom-based server nodes
Uses Hadoop 0.20.2
Compares against another solution, LATE
Experimental Methodology
![Page 20: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/20.jpg)
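The heterogeneity-aware technique maximizes the strengths of the system
◦ In Shuffle-critical cases, the results reflect the sheer number of Atom nodes
◦ In Map-critical cases, the results reflect the performance of the Xeon nodes
Experimental Result - Performance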
![Page 21: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/21.jpg)
Experimental Result - Effect of CALB, CAS and PLB
![Page 22: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/22.jpg)
Experimental Result - Sensitivity to extent of heterogeneity
![Page 23: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/23.jpg)
Experimental Result - Effect of skewed input data distribution
![Page 24: 72130310 임규찬. 1. Abstract of Paper 1.Abstract of paper 2.Reference of paper – LATE 2. Introduction 3. Issue with Heterogeneity 4. Tarazu 5. Experimental](https://reader033.vdocuments.site/reader033/viewer/2022052701/56649dde5503460f94ad68ab/html5/thumbnails/24.jpg)
Improving MapReduce Performance in Heterogeneous Environments, University of California, Berkeley
https://developers.google.com/appengine/docs/python/dataprocessing/
http://www.cpubenchmark.net/
Reference