Presenter: 董耀文 (first-year master's student, Computer Science)
A Dynamic MapReduce Scheduler for Heterogeneous Workloads
Chao Tian, Haojie Zhou, Yongqiang He, Li Zha
Outline
- Background
- Question?
- So!
- Related work
- MapReduce procedure analysis
- MR-Predict
- Scheduling policies
- Evaluation
- Conclusion
Background
As the Internet keeps growing, many Internet service providers need to process enormous amounts of data.
The MapReduce framework has become a leading solution: it is designed for large commodity clusters consisting of thousands of nodes built from commodity hardware.
Background
The performance of a parallel system like MapReduce is closely tied to its task scheduler.
The current scheduler in Hadoop uses a single queue and schedules jobs with an FCFS (first come, first served) policy.
Yahoo's capacity scheduler and Facebook's fair scheduler use multiple queues to allocate different resources in the cluster.
Background
In practice, different kinds of jobs often run simultaneously in a data center. These jobs place different workloads on the cluster, including I/O-bound and CPU-bound workloads.
Background
Hadoop's scheduler is not aware of these workload characteristics; it prefers to simultaneously run map tasks from the job at the top of the queue.
Because tasks from the same job tend to share the same characteristics, this can reduce the throughput of the whole system and seriously hurt the productivity of the data center.
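The effect above can be illustrated with a toy model; the demand numbers and job names below are illustrative assumptions, not measurements from the paper.

```python
# Toy model: each workload demands a fraction of disk bandwidth and of CPU
# (normalized to 1.0 per resource). The numbers are illustrative assumptions,
# not measurements from the paper.
DEMANDS = {
    "io-bound-job":  (0.9, 0.2),  # heavy disk I/O, light compute (e.g. sort)
    "cpu-bound-job": (0.1, 0.9),  # light disk I/O, heavy compute (e.g. grep)
}

def utilization(jobs):
    """Disk and CPU utilization when the given jobs run simultaneously."""
    disk = min(1.0, sum(DEMANDS[j][0] for j in jobs))
    cpu = min(1.0, sum(DEMANDS[j][1] for j in jobs))
    return disk, cpu
```

Running tasks of one job type alone leaves one resource mostly idle, while pairing the two types keeps both disk and CPU busy; this is the opportunity a workload-aware scheduler can exploit.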
Question
How can the hardware utilization rate be improved when different kinds of workloads run on a MapReduce cluster?
So!
The authors design a new triple-queue scheduler consisting of a workload prediction mechanism, MR-Predict, and three queues: a CPU-bound queue, an I/O-bound queue, and a wait queue.
They classify MapReduce workloads into three types, and MR-Predict automatically predicts the class of a newly arriving job based on this classification.
Jobs in the CPU-bound queue and the I/O-bound queue are dispatched separately so that different types of workloads run in parallel.
Their experiments show that this approach can increase system throughput by up to 30%.
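The triple-queue design described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and method names, and the rule of comparing a job's I/O rate against DIOR, are assumptions based on the deck's evaluation slides.

```python
from collections import deque

class TripleQueueScheduler:
    """Sketch of a triple-queue scheduler: wait, CPU-bound, and I/O-bound queues."""

    def __init__(self):
        self.cpu_bound = deque()  # jobs whose map tasks are compute-heavy
        self.io_bound = deque()   # jobs whose map tasks are disk-heavy
        self.wait = deque()       # newly submitted jobs awaiting classification

    def submit(self, job):
        # New jobs first enter the wait queue until they are classified.
        self.wait.append(job)

    def classify(self, job, io_rate, dior):
        # Assumed rule: a job moving data at or above the disk I/O rate
        # (DIOR) is I/O-bound; otherwise it is CPU-bound.
        self.wait.remove(job)
        (self.io_bound if io_rate >= dior else self.cpu_bound).append(job)

    def next_pair(self):
        # Dispatch one CPU-bound and one I/O-bound job together when both
        # exist, so the two resource types are exercised in parallel.
        cpu = self.cpu_bound.popleft() if self.cpu_bound else None
        io = self.io_bound.popleft() if self.io_bound else None
        return cpu, io

# Hypothetical usage with the deck's DIOR of 31.2 MB/s:
scheduler = TripleQueueScheduler()
scheduler.submit("terasort")
scheduler.submit("grep-count")
scheduler.classify("terasort", io_rate=128.0, dior=31.2)   # >= DIOR -> I/O-bound
scheduler.classify("grep-count", io_rate=5.7, dior=31.2)   # <  DIOR -> CPU-bound
```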
Related work
- Scheduling algorithms in parallel systems [11, ...].
- Applications have different workloads, with large computation and I/O requirements [10].
- How I/O-bound jobs affect system performance [6].
- A gang-scheduling algorithm that runs CPU-bound and I/O-bound jobs in parallel to increase hardware utilization [7].
Related work
The scheduling problem in MapReduce has attracted much attention [2, 10].
Yahoo and Facebook designed schedulers for Hadoop: the capacity scheduler [4] and the fair scheduler [5].
MapReduce procedure analysis
Map-shuffle phase:
1. Initialize input data
2. Compute the map task
3. Store output results to local disk
4. Shuffle map task result data out
5. Shuffle reduce input data in
MapReduce procedure analysis
Reduce-compute phase:
1. Tasks run the application logic
MR-Predict
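The deck does not spell MR-Predict out in text, but judging from the evaluation formulas later in the deck, the core test appears to compare a job's map-phase data I/O rate against the measured disk I/O rate (DIOR). A minimal sketch, with the function signature and parameter names assumed:

```python
def predict(slots, mb_per_task, task_seconds, dior_mb_s=31.2):
    """Classify a job from one sampled map task's statistics (a sketch).

    slots:        concurrently running map slots per node
    mb_per_task:  total data a map task reads, writes, and shuffles (MB)
    task_seconds: duration of the sampled map task (s)
    dior_mb_s:    measured disk I/O rate; 31.2 MB/s in the deck's setup
    """
    rate = slots * mb_per_task / task_seconds  # aggregate MB/s on the node
    return "I/O-bound" if rate >= dior_mb_s else "CPU-bound"
```

For example, with the deck's TeraSort figures, `predict(8, 64 + 64, 8)` yields an aggregate rate well above DIOR, so the job lands in the I/O-bound queue.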
Scheduling policies
Evaluation: Environment
- 6 DELL 1950 nodes connected by Gigabit Ethernet
- CPU: 2 quad-core 2.0 GHz; Memory: 4 GB; Disk: 2 SATA disks
- Input data: 15 GB; map slots & reduce slots: 8
- DIOR: 31.2 MB/s (measured without the reduce phase in Hadoop)
Evaluation: Resource utilization
TeraSort: a total-order sort (sequential I/O) benchmark
8 × (64 MB + 64 MB) / 8 s ≥ 31.2 MB/s
Evaluation: Resource utilization
Grep-Count: uses [.]* as the regular expression.
8 × (64 MB + 1 MB + 1 MB + SID) / 92 s ≥ 31.2 MB/s
Evaluation: Resource utilization
WordCount: splits the input text into words, shuffles every word in the map phase, and counts each word's occurrences in the reduce phase.
8 × (64 MB + 64 MB + 64 MB + SID) / 35 s ≥ 31.2 MB/s
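Plugging the slides' numbers into the rate test gives the following sketch. SID (presumably the shuffled intermediate data size) is not given in the deck and is taken as 0 MB here, so the Grep-Count and WordCount rates are lower bounds.

```python
# Aggregate map-phase data I/O rate: slots * (MB moved per task) / task time (s).
# SID is unknown from the slides and assumed 0 MB, so two rates are lower bounds.
DIOR = 31.2  # measured disk I/O rate, MB/s
SLOTS = 8

terasort  = SLOTS * (64 + 64) / 8            # 128.0 MB/s
grep      = SLOTS * (64 + 1 + 1 + 0) / 92    # ~5.7 MB/s
wordcount = SLOTS * (64 + 64 + 64 + 0) / 35  # ~43.9 MB/s

for name, rate in [("TeraSort", terasort), ("Grep-Count", grep),
                   ("WordCount", wordcount)]:
    kind = "I/O-bound" if rate >= DIOR else "CPU-bound"
    print(f"{name}: {rate:.1f} MB/s -> {kind}")
```

With SID = 0, TeraSort and WordCount clear the DIOR threshold while Grep-Count falls well below it, which matches Grep-Count behaving as a CPU-bound workload.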
Evaluation: Triple-queue scheduler experiments
Every job runs five times; 15 jobs run in total.
Conclusion
The scheduler correctly distributes jobs into different queues in most situations.
The Triple-Queue Scheduler can increase map task throughput by 30% and reduce the makespan by 20%.
Thank you for listening.