On the Varieties of Clouds for Data Intensive Computing


Post on 23-Feb-2016


TRANSCRIPT

Page 1: On the Varieties of Clouds for  Data Intensive Computing

On the Varieties of Clouds for Data Intensive Computing

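Presenter: 董耀文, student ID 1098308101, first-year master's student (Class A), Computer Science and Information Engineering @ Antslab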

Robert L. Grossman, University of Illinois at Chicago and Open Data Group

Yunhong Gu, University of Illinois at Chicago

Page 2: On the Varieties of Clouds for  Data Intensive Computing

Outline
  What is a cloud?
  Types of clouds
  Clouds provide on-demand computing capacity
  Experimental studies
  Research questions

Page 3: On the Varieties of Clouds for  Data Intensive Computing

What Is a Cloud?

Page 4: On the Varieties of Clouds for  Data Intensive Computing

What Is a Cloud?
  An infrastructure.
  Operates at the scale and reliability of a data center.
  Provides resources or services.
  Examples: Google, Hadoop, Amazon's EC2

Page 5: On the Varieties of Clouds for  Data Intensive Computing

Types of Clouds

Page 6: On the Varieties of Clouds for  Data Intensive Computing

Types of Clouds: Architectural Model
  Loosely coupled commodity computers.
  Computing instances on demand.
  Amazon's EC2
    US $0.10 per hour
    1.0 to 1.2 GHz 2007 Opteron or Xeon processor
    1.7 GB memory
    160 GB disk
    Moderate I/O performance
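The hourly price on this slide makes on-demand costs easy to estimate. The following is a minimal sketch, assuming billing in whole instance-hours and ignoring storage and data transfer charges; the function name and the 30-instance workload are hypothetical and do not come from the slides.

```python
from decimal import Decimal

# Hourly price quoted on the slide (US dollars per instance-hour).
PRICE_PER_INSTANCE_HOUR = Decimal("0.10")


def on_demand_cost(instances: int, hours: int,
                   price: Decimal = PRICE_PER_INSTANCE_HOUR) -> Decimal:
    """Return the cost of running `instances` machines for `hours` whole hours."""
    if instances < 0 or hours < 0:
        raise ValueError("instances and hours must be non-negative")
    return price * instances * hours


# Hypothetical workload: 30 instances for 8 hours.
cost = on_demand_cost(30, 8)
print(f"US ${cost}")
```

Decimal arithmetic is used here so that currency amounts are not subject to binary floating-point rounding.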

Page 7: On the Varieties of Clouds for  Data Intensive Computing

Types of Clouds
  Tightly coupled versus loosely coupled

Page 8: On the Varieties of Clouds for  Data Intensive Computing

Types of Clouds: Architectural Model
  Computing capacity on demand.
  Google's MapReduce
    TeraSort used 1800 machines, each with:
      2 GHz Xeon processors
      4 GB memory
      2 x 160 GB IDE disks
    TeraSort
      Sorts 10^10 100-byte records (1 TB of data).
      Required approximately 891 s.
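The TeraSort figures on this slide can be turned into rough throughput numbers. The sketch below is back-of-the-envelope arithmetic only: it assumes decimal units (1 TB = 10^12 bytes) and measures sorted data divided by elapsed time, not the actual disk or network traffic, which is higher because data is read, shuffled, and written.

```python
# Figures taken from the slide.
TOTAL_BYTES = 10**12      # 1 TB of data (decimal terabyte, an assumption)
RECORD_BYTES = 100        # size of each record
MACHINES = 1800
ELAPSED_SECONDS = 891

# Number of records implied by the data size and record size.
records = TOTAL_BYTES // RECORD_BYTES

# Sorted bytes per second across the whole cluster, and per machine.
aggregate_rate = TOTAL_BYTES / ELAPSED_SECONDS
per_machine_rate = aggregate_rate / MACHINES

print(f"records:     {records:.0e}")
print(f"aggregate:   {aggregate_rate / 1e9:.2f} GB/s")
print(f"per machine: {per_machine_rate / 1e6:.2f} MB/s")
```

The per-machine figure is small compared with what a single disk can sustain, which suggests that coordination and data movement across the cluster, rather than raw disk bandwidth on any one node, account for much of the sort time.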

Page 9: On the Varieties of Clouds for  Data Intensive Computing

Types of Clouds Architecural Model

Open source
  Eucalyptus (Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems)
    University of California, Santa Barbara
    Linux and Xen
  Hadoop
    Hadoop MapReduce
    HDFS
    HBase

Page 10: On the Varieties of Clouds for  Data Intensive Computing

Types of Clouds Programming Model

On-demand computing instances support any computing model compatible with loosely coupled clusters.
  Amazon EC2
MapReduce: operates on <key, value> pairs.
  Map: maps each <key, value> pair into a new <key, value> pair.
  Reduce: merges the values that share the same key.
Sector/Sphere: User Defined Function (UDF)
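The map and reduce steps described on this slide can be sketched in a few lines of single-process Python. This is an illustration of the programming model only, not Hadoop's or Google's API: the `map_reduce` helper, the word-count functions, and the sample input lines are all assumptions made for the example. The map function here may emit several pairs per input pair, which is the common generalization of the one-to-one description above.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Apply map_fn to every <key, value> pair, group by key, then reduce."""
    groups = defaultdict(list)
    # Map phase: each input pair yields zero or more intermediate pairs.
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            # Shuffle step: collect intermediate values under their key.
            groups[out_key].append(out_value)
    # Reduce phase: merge all values that share the same key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(_line_number, line):
    """Emit <word, 1> for every word in the line."""
    for word in line.split():
        yield word, 1


def word_count_reduce(_word, counts):
    """Sum the counts collected for one word."""
    return sum(counts)


# Hypothetical input: <line number, line text> pairs.
lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
result = map_reduce(enumerate(lines), word_count_map, word_count_reduce)
print(result)
```

In a real cluster the map calls run in parallel on the nodes that hold the data, and the grouping step moves intermediate pairs across the network; the dictionary stands in for both here.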

Page 11: On the Varieties of Clouds for  Data Intensive Computing

Types of Clouds Management Model

  Internal versus hosted
  Private versus shared
  Combinations (hybrid)

Page 12: On the Varieties of Clouds for  Data Intensive Computing

Types of Clouds Payment Model

  Pay as you go.
  Buy.
  Arrange with a third party to pay for the exclusive use of cloud resources for a specified period of time.

Page 13: On the Varieties of Clouds for  Data Intensive Computing

Types of Clouds: What's New?
  New scale.
    Hadoop
    Google
  New simplicity that clouds provide.
    Amazon's EC2 and S3
    AMI (Amazon Machine Image)

Page 14: On the Varieties of Clouds for  Data Intensive Computing

Clouds Provide On-Demand Computing Capacity: Google Cloud
  GFS (Google File System)
  MapReduce
  BigTable

Page 15: On the Varieties of Clouds for  Data Intensive Computing

Clouds Provide On-Demand Computing Capacity: MapReduce
  [Figure: map tasks emitting <key, value> pairs, for example m(Bear) and m(River)]

Page 16: On the Varieties of Clouds for  Data Intensive Computing

Clouds provide on-demand computing capacity

Page 17: On the Varieties of Clouds for  Data Intensive Computing

Clouds Provide On-Demand Computing Capacity: UDT (UDP-based Data Transfer)
  Designed for extremely high speed networks.
  Concurrent UDT flows share the available bandwidth fairly.
  Resides completely at the application level.
  Supports user-defined congestion control algorithms.
  Traverses firewalls more easily.
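To illustrate how a pluggable congestion controller can lead concurrent flows toward a fair share, here is a toy simulation. It uses generic additive-increase, multiplicative-decrease (AIMD), which is an assumption chosen for simplicity and not UDT's actual control algorithm. The `AimdController` class, its method names, the link capacity, and the starting rates are all hypothetical, and every flow is assumed to see a loss in the same round.

```python
class AimdController:
    """Hypothetical user-defined congestion controller (not UDT's real API)."""

    def __init__(self, rate, increase=1.0, decrease=0.5):
        self.rate = rate            # current sending rate, arbitrary units
        self.increase = increase    # additive step applied when no loss occurs
        self.decrease = decrease    # multiplicative factor applied on loss

    def on_ack(self):
        self.rate += self.increase

    def on_loss(self):
        self.rate *= self.decrease


def simulate(flows, capacity, rounds):
    """Run synchronized rounds; all flows back off when the link is overloaded."""
    for _ in range(rounds):
        if sum(flow.rate for flow in flows) > capacity:
            for flow in flows:
                flow.on_loss()
        else:
            for flow in flows:
                flow.on_ack()
    return [flow.rate for flow in flows]


# Two flows that start far apart on a link of capacity 100.
flows = [AimdController(rate=90.0), AimdController(rate=10.0)]
initial_gap = abs(flows[0].rate - flows[1].rate)
final_rates = simulate(flows, capacity=100.0, rounds=200)
final_gap = abs(final_rates[0] - final_rates[1])
print(f"gap between flows: {initial_gap:.2f} -> {final_gap:.2f}")
```

Additive increases leave the gap between the two rates unchanged, while each multiplicative decrease halves it, so the rates converge over repeated loss events. Because the controller sits behind a small interface, a different policy could be swapped in without touching the simulation loop.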

Page 18: On the Varieties of Clouds for  Data Intensive Computing

Experimental Studies

Page 19: On the Varieties of Clouds for  Data Intensive Computing

Experimental Studies
  10 Gb/s networks.
  30 Dell 1435 computers, each with:
    4 GB memory
    1 TB disk
    2.0 GHz dual-core AMD Opteron 2212
    1 Gb/s NIC

Page 20: On the Varieties of Clouds for  Data Intensive Computing

Experimental Studies

  [Figure: Hadoop compared with Sector]

Page 21: On the Varieties of Clouds for  Data Intensive Computing
Page 22: On the Varieties of Clouds for  Data Intensive Computing

Experimental Studies: CreditStone
  Workload consists of credit card transactions.
  Some of the transactions are flagged.
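A workload of this kind can be sketched as follows. The slide does not describe the record layout or the statistic that is computed, so everything here is an assumption made for illustration: the field names, the number of terminals, the flag rate, and the per-terminal flagged ratio are hypothetical and are not the benchmark's actual specification.

```python
import random
from collections import Counter


def generate_transactions(n, terminals=5, flag_rate=0.1, seed=0):
    """Yield n synthetic transactions, a fraction of which are flagged."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    for i in range(n):
        yield {
            "id": i,
            "terminal": f"T{rng.randrange(terminals)}",
            "amount": round(rng.uniform(1.0, 500.0), 2),
            "flagged": rng.random() < flag_rate,
        }


def flagged_ratio_by_terminal(transactions):
    """Return the fraction of flagged transactions seen at each terminal."""
    totals, flagged = Counter(), Counter()
    for tx in transactions:
        totals[tx["terminal"]] += 1
        flagged[tx["terminal"]] += int(tx["flagged"])
    return {terminal: flagged[terminal] / totals[terminal] for terminal in totals}


ratios = flagged_ratio_by_terminal(generate_transactions(10_000))
for terminal, ratio in sorted(ratios.items()):
    print(f"{terminal}: {ratio:.3f}")
```

A per-key aggregate like this maps naturally onto either MapReduce (key by terminal, sum in the reducer) or a user-defined function applied to each data segment.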

Page 23: On the Varieties of Clouds for  Data Intensive Computing

Research Questions
  Clouds are quite easy to use.
  Develop appropriate network protocols, architectures, and middleware for wide area clouds.
  How can different clouds interoperate?
  Develop standards and standards-based architectures for cloud services.
  Alternate storage, compute, and table services.

Page 24: On the Varieties of Clouds for  Data Intensive Computing

THANKS