
Source: prof.ict.ac.cn/DComputing/uploads/2013/DC_5_2_CloudRank-D.pdf (HPCA 2013 HVC Tutorial)

INSTITUTE OF COMPUTING TECHNOLOGY

CloudRank-D: A Benchmark Suite for Private Cloud Systems

Jing Quan, Jianfeng Zhan

Institute of Computing Technology, Chinese Academy of Sciences and University of Science and Technology of China

http://prof.ict.ac.cn/CloudRank

HVC Tutorial, HPCA 2013

Contents

• Background & Motivation

• Introduction of CloudRank‐D

• Use cases

What is a Private Cloud?

• Private cloud

– The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises. (Source: "The NIST Definition of Cloud Computing," National Institute of Standards and Technology; retrieved 24 July 2011.)

http://blogs.technet.com/b/yungchou/archive/2011/03/21/what-is-private-cloud.aspx


Typical Data Processing Application

[Figure: applications such as a recommender system, a social network, and a search engine produce MapReduce jobs through a client front end; a scheduler on the Hadoop master node deploys the job flow onto worker nodes that share HDFS.]


User Concerns

[Figure: a cluster of Xeon nodes and a cluster of Atom nodes.]

• How can we quantitatively measure these systems?
• Which one is better (ranking systems)?
• How can we guide optimization?


What is CloudRank-D?

[Figure: CloudRank-D links private cloud systems, data processing, and ranking systems.]

General description: CloudRank-D is a benchmark suite for evaluating private cloud systems that are shared for running data processing applications.


Why CloudRank-D?

| Benchmark | Target of evaluation |
|---|---|
| MineBench | Data mining algorithms |
| GridMix | Hadoop framework |
| HiBench | Hadoop framework |
| WL suite | Hadoop framework |
| CloudRank-D | The whole system |


Our Focus: Evaluating the Whole System

[Figure: CloudRank-D vs. GridMix etc. Each is shown with a stack of applications (data analysis), the Hadoop framework, and the system platform. CloudRank-D measures the performance of software and hardware together, while GridMix and similar suites measure Hadoop performance.]


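Comparison of Different Benchmark Suites (y = yes, n = no)

| Representative applications | MineBench | GridMix | HiBench | WL suite | CloudSuite | CloudRank-D |
|---|---|---|---|---|---|---|
| Basic operations | n | y | y | y | n | y |
| Classification | y | n | y | n | y | y |
| Clustering | y | n | y | n | n | y |
| Recommendation | n | n | n | n | n | y |
| Sequence learning | y | n | n | n | n | y |
| Association rule mining | y | n | n | n | n | y |
| Data warehouse operations | n | n | n | y | n | y |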


Comparison of Different Benchmark Suites (cont.)

| Workload description | MineBench | GridMix | HiBench | WL suite | CloudSuite | CloudRank-D |
|---|---|---|---|---|---|---|
| Submission pattern | n | n | n | y | n | y |
| Scheduling strategies | n | n | n | n | n | y |
| System software configuration | n | n | n | n | n | y |
| Data models | n | n | n | n | n | y |
| Data semantics | n | n | n | n | n | y |
| Scalable data size | y | y | n | y | n | y |
| Category of data-centric computation | n | n | n | y | n | y |


Contents

• Background & Motivation

• Introduction of CloudRank-D
  – Methodology

• Use cases


CloudRank‐D Methodology

[Figure: workloads with usage patterns run on the system platform; the resulting performance reports are fed back to tune the system until it reaches its peak performance.]

Ⅰ. Measure systems
Ⅱ. Find a suitable system
Ⅲ. Optimize systems


Configurable Workloads with Tunable Usage Patterns

• Scalable applications and input data sets
  – Representative application domains
  – User-specific
  – Scalable data size
• Tunable submission patterns
  – Modeling production system logs
• Configurable runtime system
  – Experience from industry and academia


CloudRank-D Methodology: Workloads with Usage Patterns

Usage patterns:
• Scalable applications and input data sets
• Tunable submission patterns
• Configurable framework


Scalable Applications and Input Data Sets

• Submitted jobs composed of appropriate applications
• Expanded data sets


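Applications and Input Data Sets

| No. | Category | Application | Data size | Data semantics |
|---|---|---|---|---|
| 1 | Basic operation | Sort | Scalable (up to 10 PB) | Automatically generated |
| 2 | Basic operation | Word count | Scalable (up to 10 PB) | Automatically generated |
| 3 | Basic operation | Grep | Scalable (up to 10 PB) | Automatically generated |
| 4 | Classification | Naive Bayes | | Scientist Search |
| 5 | Classification | Support vector machine | | Scientist Search |
| 6 | Clustering | K-means | Scalable | Sogou corpus |
| 7 | Recommendation | Item-based collaborative filtering | Scalable | Ratings on movies |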


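Applications and Input Data Sets (cont.)

| No. | Category | Application | Data size | Data semantics |
|---|---|---|---|---|
| 8 | Association rule mining | Frequent pattern growth | Fixed | Retail market basket data, click-stream data, traffic accident data, collection of web HTML documents |
| 9 | Sequence learning | Hidden Markov model | Scalable | Scientist Search |
| 10 | Warehouse operation | Grep select | | Automatically generated table |
| 11 | Warehouse operation | Ranking select | | Automatically generated table |
| 12 | Warehouse operation | Aggregation | | Automatically generated table |
| 13 | Warehouse operation | Uservisits-ranking join | | Automatically generated table |

You can add any applications you want!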


Application Combinations Demonstration

• Web crawling, data mining, machine learning, image processing → Naive Bayes & SVM, HMM & IBCF & FPG: 35%
• Text indexing, log processing → basic operations: 31%
• Reporting, data storage → Hive: 34%

Source: wiki.apache.org/hadoop/PoweredBy
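The category percentages on this slide can be turned into a job mix for a benchmark run. The following is a minimal Python sketch. It assumes that the 35% / 31% / 34% split applies per submitted job, that the Hive share is covered by the warehouse operations (items 10-13), and that applications within a category are equally likely. The short application labels are hypothetical.

```python
import random
from collections import Counter

# Category weights from the PoweredBy-derived mix (35% / 31% / 34%).
# Assumption: the warehouse operations (items 10-13) are the "Hive" jobs;
# the application labels are hypothetical short names.
APP_MIX = {
    "mining_and_learning": (0.35, ["naive_bayes", "svm", "hmm", "ibcf", "fpg"]),
    "basic_operations": (0.31, ["sort", "word_count", "grep"]),
    "hive": (0.34, ["grep_select", "ranking_select", "aggregation", "uservisits_ranking_join"]),
}


def sample_jobs(n_jobs, seed=None):
    """Draw n_jobs (category, application) pairs following the category weights."""
    rng = random.Random(seed)
    categories = list(APP_MIX)
    weights = [APP_MIX[c][0] for c in categories]
    jobs = []
    for _ in range(n_jobs):
        category = rng.choices(categories, weights=weights, k=1)[0]
        # Within a category every application is equally likely (an assumption).
        jobs.append((category, rng.choice(APP_MIX[category][1])))
    return jobs


if __name__ == "__main__":
    mix = Counter(category for category, _ in sample_jobs(1000, seed=7))
    print(mix)
```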


Data Set Sizes Demonstration

| Number of map tasks | Percentage | Input size |
|---|---|---|
| < 10 | 40.57% | 128 MB ~ 1.25 GB |
| 10 ~ 500 | 39.33% | 1.25 GB ~ 62.5 GB |
| 500 ~ 2000 | 12.03% | 62.5 GB ~ 250 GB |
| > 2000 | 8.07% | > 250 GB |

Source: Workload Characterization on a Production Hadoop Cluster: A Case Study on Taobao
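The size ranges in this table are consistent with one map task per 128 MB block (10 × 128 MB = 1.25 GB, 500 × 128 MB = 62.5 GB, 2000 × 128 MB = 250 GB). The Python sketch below assumes that block size. It maps an input size to its map-task count and table row, and it samples input sizes according to the row percentages. The 8000-map cap on the open-ended last row is a hypothetical choice needed for sampling.

```python
import math
import random

BLOCK_MB = 128  # assumption: one map task per 128 MB block, matching the size ranges

# (label, min_maps, max_maps, share of jobs). Shares come from the Taobao table;
# the 8000-map cap on the open-ended ">2000" row is hypothetical.
BUCKETS = [
    ("<10", 1, 9, 0.4057),
    ("10~500", 10, 500, 0.3933),
    ("500~2000", 501, 2000, 0.1203),
    (">2000", 2001, 8000, 0.0807),
]


def map_count(input_mb, block_mb=BLOCK_MB):
    """Number of map tasks when each block of the input becomes one split."""
    return max(1, math.ceil(input_mb / block_mb))


def bucket_of(maps):
    """Return the table row that a job with `maps` map tasks falls into."""
    for label, _, hi, _ in BUCKETS[:-1]:
        if maps <= hi:
            return label
    return BUCKETS[-1][0]


def sample_input_mb(rng):
    """Pick a row by its share, then a map count inside that row; return size in MB."""
    row = rng.choices(BUCKETS, weights=[b[3] for b in BUCKETS], k=1)[0]
    _, lo, hi, _ = row
    return rng.randint(lo, hi) * BLOCK_MB


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        size_mb = sample_input_mb(rng)
        print(size_mb, map_count(size_mb), bucket_of(map_count(size_mb)))
```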


Workloads with Usage Patterns

Usage patterns:
• Scalable applications and input data size
• Tunable submission patterns
• Configurable framework


Submission Patterns

• Submission intervals
• Submission orders


Submission Intervals

From the Facebook report, the distribution of job inter-arrival times was roughly exponential with a mean of 14 seconds (source: "Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling," in Proceedings of the 5th European Conference on Computer Systems).

[Figure: probability density function of the inter-arrival times.]
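An exponential inter-arrival distribution with mean μ has density f(t) = (1/μ)·e^(-t/μ) for t ≥ 0, so job submissions form a Poisson process. Here is a minimal Python sketch that generates submission times with μ = 14 s. Seeding the generator is an added convenience so that runs are repeatable.

```python
import random
import statistics

MEAN_INTERVAL_S = 14.0  # mean inter-arrival time reported for the Facebook trace


def submission_times(n_jobs, mean_s=MEAN_INTERVAL_S, seed=None):
    """Absolute submission times (seconds from the start of the run).

    Gaps between consecutive submissions are exponentially distributed with
    mean `mean_s`, i.e. submissions form a Poisson process.
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n_jobs):
        times.append(t)
        t += rng.expovariate(1.0 / mean_s)  # expovariate expects the rate (1 / mean)
    return times


if __name__ == "__main__":
    times = submission_times(1000, seed=42)
    gaps = [b - a for a, b in zip(times, times[1:])]
    print(f"mean gap: {statistics.mean(gaps):.1f} s")
```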


Submission Orders

• For workloads with different resource sizes and different categories:
  – Submitting jobs randomly
  – Submitting jobs in batch mode
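The two submission orders can be sketched as follows. This minimal Python illustration rests on one assumption: "batch mode" is read here as submitting all jobs of one category back to back before moving on to the next category. The exact CloudRank-D behavior may differ.

```python
import random


def random_order(jobs, seed=None):
    """Submit jobs in a uniformly random order."""
    order = list(jobs)
    random.Random(seed).shuffle(order)
    return order


def batch_order(jobs, key):
    """Submit jobs batch by batch (assumed reading of "batch mode").

    Jobs with the same key (e.g. category) go out back to back, and batches
    appear in the order in which their first job was listed.
    """
    batches = {}
    for job in jobs:
        batches.setdefault(key(job), []).append(job)  # dicts keep insertion order
    return [job for batch in batches.values() for job in batch]


if __name__ == "__main__":
    jobs = [("sort", "basic"), ("svm", "ml"), ("grep", "basic"), ("hmm", "ml")]
    print(random_order(jobs, seed=3))
    print(batch_order(jobs, key=lambda job: job[1]))
```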

Hadoop Configurations

| Dimension | Explanation |
|---|---|
| Map/reduce number | Affects system utilization |
| Scheduling policy | Hadoop chooses which job to run according to this policy |

Main parameters:
• mapred.tasktracker.map.tasks.maximum
• mapred.tasktracker.reduce.tasks.maximum
• mapred.child.java.opts
• dfs.block.size


Hadoop Settings

| Parameter | Value |
|---|---|
| mapred.tasktracker.reduce.tasks.maximum | Usually equal to the number of cores on the node |
| dfs.block.size | The default is 64 MB; increase it so that most workloads do not spawn too many map tasks |
| Number of map tasks (adjusted through the block size) | 10~100 per node; it is better if each map task runs for more than 1 minute |
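The rules of thumb on this slide can be combined into a small helper. The Python sketch below assumes that each HDFS block of the input becomes one map task and that block sizes are chosen from powers of two between 64 MB and 1 GB. The candidate list and the function names are hypothetical. It does not check the one-minute-per-map guideline, which depends on the application.

```python
import math

MB = 1024 * 1024
# Candidate values for dfs.block.size: the 64 MB default, doubling up to 1 GB.
# The candidate list is a hypothetical choice.
CANDIDATE_BLOCK_MB = [64, 128, 256, 512, 1024]


def maps_per_node(input_mb, block_mb, nodes):
    """Average number of map tasks per node if each block becomes one map task."""
    return math.ceil(input_mb / block_mb) / nodes


def choose_block_mb(input_mb, nodes, max_maps_per_node=100):
    """Smallest candidate block size that keeps maps per node at or below the limit.

    The smallest qualifying block keeps as much parallelism as possible;
    if no candidate is large enough, the largest one is returned.
    """
    for block_mb in CANDIDATE_BLOCK_MB:
        if maps_per_node(input_mb, block_mb, nodes) <= max_maps_per_node:
            return block_mb
    return CANDIDATE_BLOCK_MB[-1]


def suggest_settings(cores, input_mb, nodes):
    """Property values following the Hadoop Settings rules of thumb."""
    return {
        # reduce slots per TaskTracker: usually the node's core count
        "mapred.tasktracker.reduce.tasks.maximum": cores,
        # dfs.block.size is given in bytes
        "dfs.block.size": choose_block_mb(input_mb, nodes) * MB,
    }


if __name__ == "__main__":
    # 1 TB of input on a 128-node cluster with 8 cores per node
    print(suggest_settings(cores=8, input_mb=1024 * 1024, nodes=128))
```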


Scheduling Policy

• Common scheduling algorithms
  – First in, first out (FIFO)
  – Fair-share scheduler
  – Capacity scheduler
• Fair-share scheduling works well

Source: Workload Characterization on a Production Hadoop Cluster: A Case Study on Taobao
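The difference between FIFO and fair sharing can be seen with a toy slot allocator. This Python sketch is a simplified illustration only. Hadoop's actual Fair Scheduler also has pools, weights, minimum shares and preemption, and none of these are modeled here.

```python
def fifo_allocate(pending, free_slots):
    """FIFO: the earliest-submitted job takes as many slots as it can use,
    then the next job gets what is left, and so on."""
    alloc = {}
    for job, tasks in pending:
        give = min(tasks, free_slots)
        alloc[job] = give
        free_slots -= give
    return alloc


def fair_allocate(pending, free_slots):
    """Toy fair share: hand out slots one at a time, round-robin over jobs
    that still have unscheduled tasks, so jobs converge to equal shares."""
    alloc = {job: 0 for job, _ in pending}
    remaining = dict(pending)
    while free_slots > 0 and any(remaining.values()):
        for job, _ in pending:
            if free_slots == 0:
                break
            if remaining[job] > 0:
                alloc[job] += 1
                remaining[job] -= 1
                free_slots -= 1
    return alloc


if __name__ == "__main__":
    # (job, number of pending tasks) in submission order
    pending = [("big_sort", 100), ("small_grep", 4)]
    print("FIFO:", fifo_allocate(pending, 10))
    print("Fair:", fair_allocate(pending, 10))
```

With 10 free slots, FIFO gives every slot to the large job that arrived first, while the fair allocator lets the small job run all of its tasks at once.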


CloudRank-D Methodology: Our Metrics

• Focus
  – From the user's perspective
  – Easy to compare and understand
• Metrics
  – Data processed per second (DPS) and data processed per joule (DPJ)
• How to compute them?
  – DPS = total input data size / total run time
  – DPJ = total input data size / total energy consumption
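Here is a minimal Python sketch that computes the two metrics in the units of the result charts (KB/s and KB/J). Two assumptions are labeled in the code. Total run time is taken as the span from the first job submission to the last job completion. Energy is obtained by integrating periodic power readings with the trapezoidal rule. The demo numbers are made up for illustration.

```python
def energy_joules(power_samples):
    """Integrate (timestamp_s, watts) samples with the trapezoidal rule to get joules."""
    return sum(
        (t1 - t0) * (p0 + p1) / 2.0
        for (t0, p0), (t1, p1) in zip(power_samples, power_samples[1:])
    )


def cloudrank_metrics(jobs, power_samples):
    """Return (DPS in KB/s, DPJ in KB/J).

    jobs: list of (input_kb, submit_time_s, finish_time_s) tuples.
    Assumption: total run time = first submission to last completion.
    """
    total_input_kb = sum(input_kb for input_kb, _, _ in jobs)
    total_run_time_s = max(f for _, _, f in jobs) - min(s for _, s, _ in jobs)
    dps = total_input_kb / total_run_time_s
    dpj = total_input_kb / energy_joules(power_samples)
    return dps, dpj


if __name__ == "__main__":
    jobs = [(1000, 0.0, 10.0), (3000, 5.0, 20.0)]  # illustrative values only
    power = [(0.0, 100.0), (10.0, 100.0), (20.0, 100.0)]  # cluster power readings
    print(cloudrank_metrics(jobs, power))  # (200.0, 2.0)
```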


Contents

• Background & Motivation

• Introduction of CloudRank‐D

• Use cases


How to use CloudRank-D?


Use Case 1: Comparing Two Hardware Platforms

[Figure: Cluster 1 built from Xeon nodes; Cluster 2 built from Atom nodes.]

Each of the two clusters comprises 128 nodes.


Procedures

Step 1: Prepare the hardware platform and build the foundation platform
Step 2: Customize workloads
Step 3: Run workloads
Step 4: Get results and optimize systems


Base Information

• Evaluating two private cloud systems 

• Using all workloads we provide

• Deploying a uniform software platform

• Adopting the same configuration


Software Configuration

| Software stack | Setting |
|---|---|
| Hadoop version | 0.20.2 |
| Hive version | 0.6.0 |
| Mahout version | 0.6 |
| Map/reduce slots | 4 map slots and 2 reduce slots |
| Hadoop system configuration | Default |
| Hadoop scheduling algorithm | Fair scheduler |
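The Hadoop-side settings in this table could be written into mapred-site.xml. The Python sketch below renders them. It assumes Hadoop 0.20's contrib Fair Scheduler, which is selected with `mapred.jobtracker.taskScheduler = org.apache.hadoop.mapred.FairScheduler` and whose jar must be on the JobTracker classpath. All other properties are left at their defaults, matching the "Default" row.

```python
import xml.etree.ElementTree as ET

# Settings from the Software Configuration table: 4 map slots, 2 reduce slots,
# fair scheduler. The scheduler class is the Hadoop 0.20 contrib FairScheduler;
# its jar must also be on the JobTracker's classpath.
MAPRED_SITE = {
    "mapred.tasktracker.map.tasks.maximum": "4",
    "mapred.tasktracker.reduce.tasks.maximum": "2",
    "mapred.jobtracker.taskScheduler": "org.apache.hadoop.mapred.FairScheduler",
}


def render_site_xml(properties):
    """Render a {name: value} dict as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_site_xml(MAPRED_SITE))
```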


Run your workloads

Job submission patterns:

• Submission interval: exponentially distributed, with a specified mean submission interval (here 14 seconds)
• Submission order: random
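Putting the two submission settings together, a workload driver could shuffle the job list and wait an exponentially distributed gap before each submission. The Python sketch below follows that approach. The commands use the `hadoop jar` CLI with the Hadoop 0.20.2 examples jar, but the jar location and HDFS paths are hypothetical placeholders, and `dry_run=True` only prints the schedule.

```python
import random
import shlex
import subprocess
import time

# Hypothetical job commands: the jar location and HDFS paths are placeholders.
JOBS = [
    ["hadoop", "jar", "hadoop-0.20.2-examples.jar", "wordcount", "/bench/wc/in", "/bench/wc/out"],
    ["hadoop", "jar", "hadoop-0.20.2-examples.jar", "grep", "/bench/grep/in", "/bench/grep/out", "a.*b"],
    ["hadoop", "jar", "hadoop-0.20.2-examples.jar", "sort", "/bench/sort/in", "/bench/sort/out"],
]


def build_plan(jobs, mean_interval_s=14.0, seed=None):
    """Random submission order plus exponential gaps (mean 14 s by default)."""
    rng = random.Random(seed)
    order = list(jobs)
    rng.shuffle(order)
    plan, t = [], 0.0
    for cmd in order:
        plan.append((t, cmd))
        t += rng.expovariate(1.0 / mean_interval_s)
    return plan


def run_plan(plan, dry_run=True):
    """Launch each command at its planned offset without waiting for earlier jobs."""
    start = time.monotonic()
    procs = []
    for at, cmd in plan:
        if dry_run:
            print(f"t={at:7.1f}s  {shlex.join(cmd)}")
            continue
        time.sleep(max(0.0, at - (time.monotonic() - start)))
        procs.append(subprocess.Popen(cmd))  # jobs overlap, as on a shared cluster
    for proc in procs:
        proc.wait()


if __name__ == "__main__":
    run_plan(build_plan(JOBS, seed=1), dry_run=True)
```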


An example result

Comparison between the Xeon and Atom clusters on the two metrics:

• Xeon
  – Less time, more energy
• Atom
  – More time, less energy

[Figure: bar charts of total data processed per second (KB/s) and total data processed per joule (KB/J) for the Xeon and Atom clusters.]


Optimization (cont.)

• Tuning the interval

The best performance occurred when the interval value was 70.
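Tuning the interval amounts to a parameter sweep: run the workload once per candidate value and keep the value with the best metric. Here is a minimal Python sketch. `measure` is a hypothetical hook that runs the benchmark with a given interval and returns DPS or DPJ.

```python
from typing import Callable, Dict, Iterable, Tuple


def sweep_interval(
    measure: Callable[[float], float], intervals: Iterable[float]
) -> Tuple[float, Dict[float, float]]:
    """Run the benchmark once per candidate interval and return the best one.

    `measure` is a hypothetical hook that runs the workload with the given
    interval and returns a higher-is-better score such as DPS or DPJ.
    """
    results = {interval: measure(interval) for interval in intervals}
    best = max(results, key=results.get)
    return best, results
```

Because individual runs are noisy, repeating each measurement and comparing averages would be a sensible refinement.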


Use Case 2: Scheduling Evaluation

"I have designed a new Hadoop scheduling algorithm, but I don't have workloads to test it. How can I evaluate the scheduler so that people trust the evaluation results?"


Using CloudRank-D

Step 1: Build the foundation platform with different scheduling policies
Step 2: Customize workloads to match production scenarios
Step 3: Run the workloads
Step 4: Get the metrics under each scheduling policy


Our Result

[Figure: bar charts of total data processed per joule (KB/J) and total data processed per second (KB/s) for the fair scheduler and the FIFO scheduler.]

The fair scheduler works better than the FIFO scheduler.


• Contact us
  – Website: http://prof.ict.ac.cn/CloudRank/


Thanks