![Page 1: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/1.jpg)
An Overview of Cloud Computing: My Other Computer is a Data Center
Robert Grossman
Open Cloud Consortium
January 7, 2010
![Page 2: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/2.jpg)
Part 1. What is a Cloud?
![Page 3: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/3.jpg)
What is a Cloud?
Software as a Service (SaaS)
![Page 4: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/4.jpg)
What Else is a Cloud?
Platform as a Service (PaaS)
![Page 5: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/5.jpg)
Is Anything Else a Cloud?
Infrastructure as a Service (IaaS)
![Page 6: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/6.jpg)
Are There Other Types of Clouds?
Large Data Cloud Services
ad targeting
![Page 7: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/7.jpg)
What is Virtualization?
![Page 8: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/8.jpg)
Idea Dates Back to the 1960s
Virtualization first widely deployed with IBM VM/370.
IBM Mainframe
IBM VM/370
Guest operating systems: CMS, MVS, CMS (each running an App)
Native (Full) Virtualization. Examples: VMware ESX
![Page 9: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/9.jpg)
What Do You Optimize?
Goal: Minimize latency and control heat.
Goal: Maximize data (with matching compute) and control cost.
![Page 10: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/10.jpg)
Scale is new
![Page 11: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/11.jpg)
Elastic, Usage Based Pricing Is New
1 computer in a rack for 120 hours costs the same as 120 computers in three racks for 1 hour.
Elastic, usage based pricing turns capex into opex. Clouds can be used to manage surges in computing needs.
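The equivalence on this slide is plain machine-hour arithmetic. A minimal sketch, assuming a flat hourly price per machine; the function name and the rate of 0.10 are illustrative assumptions, not quoted prices:

```python
def usage_cost(machines: int, hours: int, rate_per_machine_hour: float = 0.10) -> float:
    """Cost under usage-based pricing: pay only for machine-hours consumed."""
    machine_hours = machines * hours
    return machine_hours * rate_per_machine_hour


# Same 120 machine-hours, very different wall-clock time.
slow = usage_cost(machines=1, hours=120)   # one machine for five days
fast = usage_cost(machines=120, hours=1)   # 120 machines for one hour
print(slow == fast)
```

Under this pricing assumption the bill depends only on machine-hours, so a surge can be served by renting many machines briefly instead of buying them.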
![Page 12: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/12.jpg)
Simplicity Offered By the Cloud is New
+ .. and you have a computer ready to work.
A new programmer can develop a program to process a container full of data with less than a day of training using MapReduce.
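To show why the model is quick to learn, here is a minimal in-memory sketch of a MapReduce word count. It is not the Hadoop or Google API; the function names are hypothetical, and a real framework would run the map and reduce calls on many machines:

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on a list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = run_job(["my other computer", "is a data center", "a data cloud"])
print(counts["data"])
```

The programmer writes only `map_phase` and `reduce_phase`; grouping and distribution are the framework's job.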
![Page 13: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/13.jpg)

| | Databases | Data Clouds |
|---|---|---|
| Scalability | 100’s TB | 100’s PB |
| Functionality | Full SQL-based queries, including joins | Optimized access to sorted tables (tables with single keys) |
| Optimized | Databases optimized for safe writes | Clouds optimized for efficient reads |
| Consistency model | ACID (Atomicity, Consistency, Isolation & Durability) – database always consistent | Eventual consistency – updates eventually propagate through system |
| Parallelism | Difficult because of ACID model; shared nothing is possible | Basic design incorporates parallelism over commodity components |
| Scale | Racks | Data center |
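The eventual-consistency model can be illustrated with a toy store in which a write lands on one replica and reaches the others only later. This is a sketch under simplifying assumptions (single writer, no conflicts, hypothetical class and method names), not how any particular data cloud implements replication:

```python
class EventuallyConsistentStore:
    """Writes land on replica 0 and are propagated to the other replicas later."""

    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # updates accepted but not yet propagated

    def write(self, key, value):
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def read(self, key, replica=0):
        # A read may hit any replica, so it can return stale data.
        return self.replicas[replica].get(key)

    def propagate(self):
        """Apply all pending updates to the remaining replicas."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("x", 1)
stale = store.read("x", replica=2)   # not yet visible on replica 2
store.propagate()
fresh = store.read("x", replica=2)   # visible after propagation
```

An ACID database would not let the stale read happen; a data cloud accepts it in exchange for cheap, parallel reads.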
![Page 14: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/14.jpg)
What Resource is Managed?
Supercomputer Center Model: scarce processors wait for data
– Manage cycles
– Wait for an opening in the queue
– Scatter the data to the processors
– And gather the results
Data Center Model: persistent data wait for queries
– Manage data
– Persistent data waits for queries
– Computation done locally
– Results returned
![Page 15: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/15.jpg)
Part 2. Data Centers as the Unit of Computing
“Cloud computing has become the center of investment and innovation.”
Nicholas Carr, 2009 IDC Directions
Cloud computing is at the top of the Gartner hype cycle.
![Page 16: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/16.jpg)
[Figure: timeline of experimental science, simulation science, and data science. Labels: 1609 (30x), 1670 (250x), 1976 (10x-100x), 2004 (10x-100x).]
![Page 17: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/17.jpg)
Requirements for Clouds
Scale to Data Centers
Scale Across Data Centers
Support Large Data Flows
Support Security, Auditing
Support Real Time Alerts
Business X X
E-science X X X
Healthcare X X
Defense X X X X X
![Page 18: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/18.jpg)
Transition Taking Place
A handful of players are building multiple data centers a year and improving with each one. This includes Google, Microsoft, Yahoo, …
A data center today costs $200 M – $400+ M
Berkeley RAD Report points out the analogy with the semiconductor industry, where companies stopped building their own Fabs and started leasing Fabs from others as Fabs approached $1B
![Page 19: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/19.jpg)
Which is the Operating System?
Workstation: VM 1 … VM 5, running on a Hypervisor
Data center: VM 1 … VM 50,000, running on a Data Center Operating System
![Page 20: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/20.jpg)
How Do You Program A Data Center?
![Page 21: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/21.jpg)
Some Programming Models for Data Centers
Operations over data center of disks
– MapReduce (“string-based”)
– User-Defined Functions (UDFs) over data center
– SQL and Quasi-SQL over data center
– Data analysis / statistics over data center
Operations over data center of memory
– Grep over distributed memory
– UDFs over distributed memory
– SQL and Quasi-SQL over distributed memory
– Data analysis / statistics over distributed memory
![Page 22: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/22.jpg)
Part 3. Open Cloud Consortium
![Page 23: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/23.jpg)
U.S. 501(c)(3) not-for-profit corporation
Supports the development of standards and interoperability frameworks.
Supports reference implementations for cloud computing.
Manages testbeds: Open Cloud Testbed, Intercloud Testbed, Open Science Data Cloud
Develops benchmarks.
www.opencloudconsortium.org
![Page 24: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/24.jpg)
OCC Members
Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo
Universities: CalIT2, Johns Hopkins, Northwestern, University of Illinois at Chicago, University of Chicago
Government agencies: NASA
Organizations: Sector Project
![Page 25: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/25.jpg)
Open Cloud Testbed
Phase 2: 9 racks, 250+ nodes, 1000+ cores, 10+ Gb/s
Networks: MREN, CENIC, Dragon, C-Wave
Software: Hadoop, Sector/Sphere, Thrift, KVM VMs, Eucalyptus VMs
![Page 26: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/26.jpg)
Intercloud Testbed
Infrastructure as a Service
– Virtual Data Centers (VDC)
– Virtual Networks (VN)
– Virtual Machines (VM)
– Physical Resources
Platform as a Service
– Cloud Compute Services
– Data & Storage as a Service
Open Virtualization Format (OVF)
Open Cloud Computing Interface (OCCI)
SNIA Cloud Data Management Interface (CDMI)
Large Data Cloud Interoperability Framework
Dynamic infrastructure service linking IaaS and DaaS
Dynamic infrastructure service naming and linking entities in the IaaS layers
Working with Infrastructure 2.0 Working Group
![Page 27: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/27.jpg)
Open Science Data Cloud
sky cloud
biocloud
Planning to work with 5 international partners (all connected with 10 Gbps networks).
![Page 28: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/28.jpg)
MalStone (OCC-Developed Benchmark)

| | MalStone A | MalStone B |
|---|---|---|
| Hadoop | 455m 13s | 840m 50s |
| Hadoop streaming with Python | 87m 29s | 142m 32s |
| Sector/Sphere | 33m 40s | 43m 44s |

Sector/Sphere 1.20 and Hadoop 0.18.3, with no replication, on Phase 1 of the Open Cloud Testbed in a single rack. The data set spanned 20 nodes, with 500 million 100-byte records per node.
![Page 29: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/29.jpg)
Some Lessons Learned (So Far)
Python over Hadoop Distributed File System surprisingly powerful.
Tuning Hadoop can be a large (unacknowledged) cost.
Performance of a cloud computation can be significantly impacted by just 1 or 2 nodes that are a bit slower.
Wide area clouds can be practical in some cases.
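The lesson about slow nodes follows from how a parallel stage finishes: it is done only when its slowest node is done. A minimal sketch with made-up per-node times (20 nodes at 100 seconds, one of them slowed to 130 seconds); the numbers are illustrative assumptions, not measurements from the testbed:

```python
def job_time(node_times):
    """A parallel stage completes when its slowest node completes."""
    return max(node_times)


uniform = [100.0] * 20                    # every node takes 100 s
with_straggler = [100.0] * 19 + [130.0]   # one node is 30% slower

slowdown = job_time(with_straggler) / job_time(uniform)
print(f"whole job slowed by a factor of {slowdown:.2f}")
```

In this toy model a single node that is 30% slower makes the whole job 30% slower, even though the other 19 nodes finish on time.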
![Page 30: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/30.jpg)
Part 4. Sector
http://sector.sourceforge.net
![Page 31: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/31.jpg)
Sector Overview
Sector is fast
– As measured by MalStone & Terasort
Sector is easy to program
– Supports UDFs, MapReduce & Python over streams
Sector does not require extensive tuning.
Sector is secure
– A HIPAA compliant Sector cloud is being set up
Sector is reliable
– Sector v1.24 supports multiple master node servers
![Page 32: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/32.jpg)
Google’s Large Data Cloud
Storage Services
Data Services
Compute Services
Google’s Stack
Applications
Google File System (GFS)
Google’s MapReduce
Google’s BigTable
![Page 33: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/33.jpg)
Hadoop’s Large Data Cloud
Storage Services
Compute Services
Hadoop’s Stack
Applications
Hadoop Distributed File System (HDFS)
Hadoop’s MapReduce
Data Services
![Page 34: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/34.jpg)
Sector’s Large Data Cloud
Storage Services
Compute Services
Sector’s Stack
Applications
Sector’s Distributed File System (SDFS)
Sphere’s UDFs
Routing & Transport Services
UDP-based Data Transport Protocol (UDT)
Data Services
![Page 35: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/35.jpg)
Generalization: Apply User Defined Functions (UDF) to Files in Storage Cloud
[Figure: the map/shuffle and reduce stages, each shown as a UDF.]
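The generalization can be sketched as one primitive, "apply a function to every file", with MapReduce recovered as two such passes. This is an in-memory illustration only; `apply_udf` and the other names are hypothetical and are not Sphere's actual API:

```python
from collections import defaultdict


def apply_udf(files, udf):
    """Apply a user-defined function to every file. Each call is independent,
    so a real system could run the calls in parallel, close to the data."""
    return {name: udf(records) for name, records in files.items()}


def bucket_udf(records):
    """First pass (plays the role of map/shuffle): group values by key within a file."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[key].append(value)
    return dict(buckets)


def merge_buckets(per_file):
    """Bring together the buckets that share a key across all files."""
    merged = defaultdict(list)
    for buckets in per_file.values():
        for key, values in buckets.items():
            merged[key].extend(values)
    return dict(merged)


def sum_udf(values):
    """Second pass (plays the role of reduce): aggregate one bucket."""
    return sum(values)


files = {
    "part-0": [("a", 1), ("b", 2)],
    "part-1": [("a", 3), ("c", 4)],
}
bucketed = apply_udf(files, bucket_udf)
totals = apply_udf(merge_buckets(bucketed), sum_udf)
print(totals)
```

Because the UDF is arbitrary, the same primitive also covers work that does not fit a key-value shape, such as processing each image file as a whole.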
![Page 36: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/36.jpg)
Hadoop vs Sector

| | Hadoop | Sector |
|---|---|---|
| Storage Cloud | Block-based | File-based |
| Programming Model | MapReduce | UDF & MapReduce |
| Image processing | Difficult with MapReduce | Easy with UDF |
| Protocol | TCP | UDT |
| Replication | At write | At write or period. |
| Security | Not yet | HIPAA capable |
| Language | Java | C++ |

Source: Gu and Grossman, Sector and Sphere, Phil. Trans. Royal Society A, 2009.
![Page 37: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/37.jpg)
Terasort - Sector vs Hadoop Performance

| | 1 Rack | 2 Racks | 3 Racks | 4 Racks |
|---|---|---|---|---|
| Nodes | 32 | 64 | 96 | 128 |
| Cores | 128 | 256 | 384 | 512 |
| Hadoop | 85m 49s | 37m 0s | 25m 14s | 17m 45s |
| Sector | 28m 25s | 15m 20s | 10m 19s | 7m 56s |
| Speed up | 3.0 | 2.4 | 2.4 | 2.2 |

Sector/Sphere 1.24a, Hadoop 0.20.1 with no replication on Phase 2 of Open Cloud Testbed with co-located racks.
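The speed-up figures are the ratio of Hadoop's running time to Sector's for each rack count. A short check using the timings from this slide (the helper name `to_seconds` is an illustrative choice):

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a string such as '85m 49s' to a number of seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m (\d+)s", duration).groups()
    return int(minutes) * 60 + int(seconds)


# Timings for 1, 2, 3 and 4 racks, copied from the slide.
hadoop = ["85m 49s", "37m 0s", "25m 14s", "17m 45s"]
sector = ["28m 25s", "15m 20s", "10m 19s", "7m 56s"]

speedups = [round(to_seconds(h) / to_seconds(s), 1) for h, s in zip(hadoop, sector)]
print(speedups)  # [3.0, 2.4, 2.4, 2.2]
```

The computed ratios match the slide's speed-up figures for all four rack counts.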
![Page 38: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/38.jpg)
Sector Applications
Distributing the 15 TB Sloan Digital Sky Survey to astronomers around the world (joint with JHU, 2005)
Managing and analyzing high throughput sequence data (Cistrack, University of Chicago, 2007)
Detecting emergent behavior in distributed network data (Angle, won SC 07 Analytics Challenge)
Image processing for high throughput sequencing
Wide area clouds (won SC 09 BWC with 100 Gbps wide area computation)
New ensemble-based algorithms for trees
Graph processing
![Page 39: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/39.jpg)
Cistrack Database
Analysis Pipelines & Re-analysis Services
Cistrack Web Portal & Widgets
Cistrack Large Data Cloud Services
Ingestion Services
Cistrack Elastic Cloud Services
![Page 40: My Other Computer is a Data Center (2010 v21)](https://reader036.vdocuments.site/reader036/viewer/2022062418/554dbf74b4c905c2488b4be7/html5/thumbnails/40.jpg)
Thank you
For more information, please see blog.rgrossman.com