
Page 1: HPC with Clouds and Cloud Technologies

High Performance Parallel Computing with Clouds and Cloud Technologies

Jaliya Ekanayake and Geoffrey Fox
School of Informatics and Computing, Indiana University Bloomington

Cloud Computing and Software Services: Theory and Techniques, July 2010

Presented by: Inderjeet Singh

Page 2: HPC with Clouds and Cloud Technologies

Overview

Introduction
Problem
Data Analysis Applications
Evaluations and Analysis
Performance of MPI on Clouds
Benchmarks and Results
Conclusions and Future Work
Critique

Page 3: HPC with Clouds and Cloud Technologies

Introduction

Page 4: HPC with Clouds and Cloud Technologies

Clouds and Cloud Technologies

Apache Hadoop (open-source implementation of Google MapReduce)

DryadLINQ (Microsoft API for Dryad)

CGL-MapReduce (iterative version of MapReduce)

These are referred to interchangeably as cloud technologies, parallel runtimes, or cloud runtimes.

Page 5: HPC with Clouds and Cloud Technologies

Advantages of Clouds

On-demand provisioning of resources
Customizable virtual machines (VMs) with root privileges
Very fast provisioning (within minutes)
Pay only for what you use
Better resource utilization

Page 6: HPC with Clouds and Cloud Technologies

Features

Cloud technologies:
Moving computation to the data
Better quality of service (QoS)
Simple communication topologies
Distributed file systems (HDFS, GFS)

MPI-based HPC:
Most HPC applications are based on MPI
Many fine-grained communication topologies
Use of fast networks

Page 7: HPC with Clouds and Cloud Technologies

MapReduce

A software framework that supports distributed computing on large datasets across clusters of computers.

Map step: the master node takes the input, partitions it into smaller sub-problems, and distributes them to worker nodes. A worker node may repeat this in turn, leading to a multi-level tree structure. Each worker processes its smaller problem and passes the answer back to its master node.

Reduce step: the master node collects the answers to all the sub-problems and combines them to form the output.
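As a minimal illustration of the model (plain Python, not the Hadoop API), a word-count sketch:

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: turn each input record into (key, value) pairs."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce step: group values by key and combine them."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    docs = ["the cloud runs map reduce", "map reduce on the cloud"]
    print(reduce_phase(map_phase(docs)))
    # {'the': 2, 'cloud': 2, 'runs': 1, 'map': 2, 'reduce': 2, 'on': 1}
```

In a real framework, the map outputs would be partitioned by key and shuffled across the network before the reduce step; here the in-memory grouping stands in for that shuffle.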

Page 8: HPC with Clouds and Cloud Technologies

Large Data/Compute-Intensive Applications

Traditional approach:
Execution on clusters, grids, or supercomputers
Moving both the application and the data to the available computational power
Efficiency decreases with large datasets

Better approach:
Execution with cloud technologies
Moving the computation to the data
A more data-centric approach

Page 9: HPC with Clouds and Cloud Technologies

Comparison of features supported by the different cloud technologies and MPI

Page 10: HPC with Clouds and Cloud Technologies

Problem

What applications are best handled by cloud technologies?
What overheads do they introduce?
Can traditional parallel runtimes such as MPI be used in the cloud?
If so, what overheads do they have?

Page 11: HPC with Clouds and Cloud Technologies

Data Analysis Applications

Types of applications (based on their communication pattern):

Map only (Cap3)
MapReduce (HEP)
Iterative/complex style (matrix multiplication and K-means clustering)

Page 12: HPC with Clouds and Cloud Technologies

Cap3: a sequence assembly program that operates on a collection of gene sequence files to produce several outputs

HEP: a High Energy Physics data analysis application

K-means clustering: iteratively refines a set of cluster centers

Matrix multiplication: Cannon's algorithm
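A minimal sketch of the K-means refinement loop (plain NumPy, not the paper's implementation); each iteration assigns points to the nearest center and then recomputes the centers, which is what makes the algorithm iterative:

```python
import numpy as np

def kmeans(points, centers, iterations=10):
    """Iteratively refine cluster centers (Lloyd's algorithm)."""
    for _ in range(iterations):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned points,
        # keeping the old center if a cluster ends up empty.
        centers = np.array([points[assignment == k].mean(axis=0)
                            if np.any(assignment == k) else centers[k]
                            for k in range(len(centers))])
    return centers, assignment

rng = np.random.default_rng(0)
pts = rng.standard_normal((200, 2))
centers, labels = kmeans(pts, pts[:3].copy())
```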

Page 13: HPC with Clouds and Cloud Technologies
Page 14: HPC with Clouds and Cloud Technologies

Iterative/Complex Style

MapReduce does not support iterative/complex style applications, so the authors [Fox] built CGL-MapReduce.

CGL-MapReduce supports long-running tasks and retains static data in memory across invocations.
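A hypothetical sketch of the iterative pattern this enables (illustrative names, not the CGL-MapReduce API): the static data is loaded once and reused across iterations, while only the small variable data (e.g., cluster centers) is communicated each round:

```python
def iterative_mapreduce(static_partitions, variable_data,
                        map_fn, reduce_fn, converged):
    """Driver loop: static_partitions stay resident across iterations."""
    while not converged(variable_data):
        # Each map task sees its cached static partition plus the
        # current (small) variable data.
        intermediate = [map_fn(part, variable_data)
                        for part in static_partitions]
        # Reduce combines the per-partition results into new variable data.
        variable_data = reduce_fn(intermediate)
    return variable_data
```

In vanilla MapReduce, each iteration would be a fresh job that reloads the static data; avoiding that reload is what makes CGL-MapReduce efficient on iterative workloads.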

Page 15: HPC with Clouds and Cloud Technologies

Evaluation and Analysis

Performance is measured as the average running time.

Overhead = [P × T(P) − T(1)] / T(1), where P is the number of parallel processes and T(P) is the running time on P processes.

(The slide shows the experimental setups: one for DryadLINQ, one for Hadoop/CGL-MapReduce/MPI.)
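A small helper showing how the overhead formula above works (the timings are made-up, not the paper's measurements):

```python
def parallel_overhead(t_parallel, t_serial, processes):
    """Overhead = [P * T(P) - T(1)] / T(1)."""
    return (processes * t_parallel - t_serial) / t_serial

# Hypothetical job: 100 s serially, 14 s on 8 processes.
print(parallel_overhead(t_parallel=14.0, t_serial=100.0, processes=8))  # 0.12
```

A perfectly scaling application has T(P) = T(1)/P and hence zero overhead; anything above zero is time spent on communication and runtime machinery.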

Page 16: HPC with Clouds and Cloud Technologies
Page 17: HPC with Clouds and Cloud Technologies
Page 18: HPC with Clouds and Cloud Technologies

CAP3 (map only) and HEP (MapReduce) perform well with cloud runtimes.

K-means clustering and matrix multiplication (both iterative) show high overheads with cloud runtimes compared to the MPI runtime.

CGL-MapReduce, however, shows lower overhead for large datasets.

Page 19: HPC with Clouds and Cloud Technologies

Performance of MPI on a Private Cloud

Goals:

What is the overhead of virtual machines (VMs) on parallel MPI applications?
How do applications with different communication/computation (C/C) ratios perform on the cloud?
What is the effect of different CPU-core-to-VM assignment strategies when running these MPI applications?

Page 20: HPC with Clouds and Cloud Technologies

Three MPI applications with different C/C ratio requirements:

Matrix multiplication (Cannon's algorithm)
K-means clustering
Concurrent wave equation solver

Page 21: HPC with Clouds and Cloud Technologies

Computation and Communication complexities of the different MPI applications used

Page 22: HPC with Clouds and Cloud Technologies

Benchmarks and Results

Eucalyptus- and Xen-based cloud infrastructure

16 nodes, each with two quad-core Intel Xeon processors and 32 GB of memory

Nodes connected by 1-gigabit Ethernet

Same software configuration for both bare-metal nodes and VMs:
◦ OS: Red Hat Enterprise Linux Server release 5.2
◦ OpenMPI version 1.3.2

Page 23: HPC with Clouds and Cloud Technologies

Different CPU core/virtual machine assignment strategies were tested.

Invariant used to select the number of MPI processes: number of MPI processes = number of CPU cores used.

Page 24: HPC with Clouds and Cloud Technologies

Matrix Multiplication (Cannon’s)

◦ Speedup decreases by 34% between bare metal and 8 VMs/node at 81 processes
◦ Caused by the exchange of large messages and heavier communication

Speedup, fixed matrix size (5184 × 5184)

Performance, 64 CPU cores
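For reference, a single-process NumPy sketch of Cannon's algorithm (a simulation of the block layout, not the paper's MPI code); the block rotations in the loop are exactly the large message exchanges blamed for the slowdown above:

```python
import numpy as np

def cannon_matmul(A, B, q):
    """Cannon's algorithm on a q x q grid of blocks, simulated in one
    process. In an MPI implementation each block lives on its own rank
    and the list rotations below become point-to-point shifts."""
    b = A.shape[0] // q
    Ab = [[A[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(q)] for i in range(q)]
    Bb = [[B[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(q)] for i in range(q)]
    Cb = [[np.zeros((b, b)) for _ in range(q)] for _ in range(q)]
    # Initial alignment: shift A's row i left by i, B's column j up by j.
    for i in range(q):
        Ab[i] = Ab[i][i:] + Ab[i][:i]
    for j in range(q):
        col = [Bb[i][j] for i in range(q)]
        col = col[j:] + col[:j]
        for i in range(q):
            Bb[i][j] = col[i]
    # q steps: multiply-accumulate, then shift A left and B up by one.
    for _ in range(q):
        for i in range(q):
            for j in range(q):
                Cb[i][j] += Ab[i][j] @ Bb[i][j]
        for i in range(q):
            Ab[i] = Ab[i][1:] + Ab[i][:1]
        for j in range(q):
            col = [Bb[i][j] for i in range(q)]
            col = col[1:] + col[:1]
            for i in range(q):
                Bb[i][j] = col[i]
    return np.block(Cb)

# Check against NumPy's matmul on a small example.
A = np.random.rand(6, 6); B = np.random.rand(6, 6)
assert np.allclose(cannon_matmul(A, B, q=3), A @ B)
```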

Page 25: HPC with Clouds and Cloud Technologies

K-Means Clustering

◦ Communication is much smaller than computation
◦ Communication here depends on the number of clusters formed
◦ Overhead is large for small data sizes, so less speedup is observed

Total overhead (number of MPI processes = 128)

Performance, 128 CPU cores

Page 26: HPC with Clouds and Cloud Technologies

Concurrent Wave Equation Solver

◦ The amount of communication is fixed and data transfer rates are low
◦ The lower C/C ratio of O(1/n) leads to more latency-bound communication and lower performance on VMs
◦ 8 VMs per node has 7% more overhead than a bare-metal node

Total overhead (number of MPI processes = 128)

Performance, 128 CPU cores
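A minimal serial sketch of the finite-difference update such a solver parallelizes (illustrative, not the benchmark code); in the MPI version each rank owns a segment of the string and exchanges only its two boundary points per step, which is why the communication volume is fixed:

```python
import numpy as np

def wave_step(u_prev, u_curr, c=1.0, dt=0.01, dx=0.1):
    """One explicit time step of the 1-D wave equation u_tt = c^2 * u_xx."""
    r2 = (c * dt / dx) ** 2
    u_next = np.empty_like(u_curr)
    # Interior points: second-order central difference in space and time.
    u_next[1:-1] = (2 * u_curr[1:-1] - u_prev[1:-1]
                    + r2 * (u_curr[2:] - 2 * u_curr[1:-1] + u_curr[:-2]))
    # Fixed ends. In an MPI decomposition, neighboring ranks would
    # instead exchange their segment boundary values here.
    u_next[0] = u_next[-1] = 0.0
    return u_next
```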

Page 27: HPC with Clouds and Cloud Technologies

◦ In a multi-VM configuration, scheduling of the I/O operations of DomUs (user domains) happens via Dom0 (the privileged domain)

Figure: communication between Dom0 and DomUs with 1 VM per node (top) and with 8 VMs per node (bottom)

Page 28: HPC with Clouds and Cloud Technologies

When using multiple VMs on multi-core CPUs, it is good to use runtimes that support in-node communication (OpenMPI vs. LAM-MPI)

Figure: LAM-MPI vs. OpenMPI in different VM configurations
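A small mpi4py ping-pong sketch of the kind used for such comparisons (a generic latency micro-benchmark, not the authors' benchmark): two ranks placed on the same node exercise the in-node path, while ranks on different nodes go over the network.

```python
# Run with, e.g.: mpirun -np 2 python pingpong.py
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.zeros(1024, dtype=np.uint8)  # 1 KB message
reps = 1000

comm.Barrier()
start = time.perf_counter()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1)
        comm.Recv(buf, source=1)
    elif rank == 1:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
elapsed = time.perf_counter() - start

if rank == 0:
    # Half the average round-trip time approximates one-way latency.
    print(f"one-way latency ~ {elapsed / (2 * reps) * 1e6:.1f} us")
```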

Page 29: HPC with Clouds and Cloud Technologies

Conclusions and Future Work

Cloud runtimes work well for pleasingly parallel (map-only and MapReduce) applications with large datasets.

Overheads of cloud runtimes are high for parallel applications that require iterative/complex communication patterns (MPI-based applications).

Work needs to be done on finding cloud-friendly algorithms for these applications.

CGL-MapReduce is efficient for iterative-style MapReduce applications (e.g., K-means).

Page 30: HPC with Clouds and Cloud Technologies

Overheads for MPI applications increase as the number of VMs per node increases (22-50% degradation).

In-node communication is important; MapReduce applications (not susceptible to latencies) may perform well on VMs deployed on clouds.

Future work: integration of MapReduce and MPI (a biological DNA sequencing application).

Page 31: HPC with Clouds and Cloud Technologies

Critique

No results for implementations of the pleasingly parallel applications (Cap3, HEP) with MPI; time comparisons between MPI and the cloud runtimes are missing.

Missing evaluations of HPC applications implemented with cloud runtimes on the private cloud, which is critical to showing the effect of multi-VM/multi-core configurations on the performance of these applications.

The clusters running different OSes have different memory sizes (16/32 GB), which could bias the results.

Page 32: HPC with Clouds and Cloud Technologies

References

Jaliya Ekanayake and Geoffrey Fox, "High Performance Parallel Computing with Clouds and Cloud Technologies," Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 34, 20 pages, 2010.

High Performance Parallel Computing with Clouds and Cloud Technologies (slides): http://www.slideshare.net/jaliyae/high-performance-parallel-computing-with-clouds-and-cloud-technologies

MapReduce, Wikipedia: http://en.wikipedia.org/wiki/MapReduce

Page 33: HPC with Clouds and Cloud Technologies