cloud computing basics - chapter 01

Upload: inayat

Post on 03-Jun-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/11/2019 Cloud Computing Basics - Chapter 01

    1/8

    Comparison of Multiple Cloud Frameworks

    Gregor von Laszewski, Javier Diaz, Fugang Wang, Geoffrey C. Fox

    Pervasive Technology Institute, Indiana University

    Bloomington, IN 47408, U.S.A.

    e-mail: {laszewski, javier.diazmontes, kevinwangfg, gcfexchange}@gmail.com

    AbstractToday, many cloud Infrastructure as a Service

    (IaaS) frameworks exist. Users, developers, and

    administrators have to make a decision about which

    environment is best suited for them. Unfortunately, the

    comparison of such frameworks is difficult because either

    users do not have access to all of them or they are comparing

    the performance of such systems on different resources, which

    make it difficult to obtain objective comparisons. Hence, the

    community benefits from the availability of a testbed on which

    comparisons between the IaaS frameworks can be conducted.

    FutureGrid aims to offer a number of IaaS including Nimbus,

    Eucalyptus, OpenStack, and OpenNebula. One of the

    important features that FutureGrid provides is not only the

    ability to compare between IaaS frameworks, but also to

    compare them in regards to bare-metal and traditional high-

    performance computing services. In this paper, we outline

    some of our initial findings by providing such a testbed. As one

    of our conclusions, we also present our work on making access

    to the various infrastructures on FutureGrid easier.

    Cloud, Grid; Nimbus; Eucalyptus; OpenStack; OpenNebula;

    RAIN; FutureGrid

    I.

    INTRODUCTION

    Cloud computing has become an important tool to deliverinfrastructure as a service (IaaS) to users that require a greatdeal of customization and management of their own softwarestacks. In addition, we observe that users demand furtherabstraction and expect platforms as a service (PaaS) to bereadily available to conduct higher-level developmentefforts. Together IaaS and PaaS can provide potent solutionsnot only to business users, but also to the educational andscientific computing communities. We observe that in thescientific community we can distinguish a number of typicaluser categories based on their usage in regards to scale,virtual machines used, computational tasks, and the numberof users served by the services. In particular we areinterested in the following use cases.

    First, we distinguish users that demand very few imagesbut want to run them for a long period of time withguarantees on high-availability. Such environments aretargeted to support what is today termed the long tail ofscience. In this case, many millions of scientific users withmoderate computing needs benefit from the delivery of lesscompute intense services.

    Second, we distinguish scientists requiring a largenumber of resources to conduct actual calculations andanalyses of data to be exposed to their community. This isthe traditional high-performance computing use case.

    Third, we identified a class of users with requirements in-between the two previous cases. These users have modestbut still significant demand of compute resources. Hence ouruse cases motivate three classes of infrastructures:1.

    Traditional high performance computing (HPC) servicesoffered as part of traditional HPC centers.

    2.

    Hosting services for production such as services thatcater to a community, including gateways, Web servers

    and others. Such services may interface with servicesthat run on traditional HPC resources.

    3.

    Access to moderate compute resources to support manymoderate calculations to its many users that demand it.

    Within this paper we focus on the third class of users. Wewill identify the rational for some of the choices that weoffer in FutureGrid and identify how to simplify access to anenvironment that provides so many choices.

    The paper is structured as follows. In Section II, wepresent an overview of FutureGrid and explain whyFutureGrid provides the current services it offers. We alsooutline some future directions motivated by simple usagepatterns observed in FutureGrid. One of the main objectives

    of this paper is to contrast the different IaaS frameworks weoffer and project first results of our qualitative comparisondescribed in Sections III and IV, respectively. Based on ourqualitative analysis we strive to provide answers to thefollowing questions:1.

    Which IaaS framework is most suited for me?2.

    How can I compare these frameworks not just betweeneach other, but also to bare-metal?

    The later is of special interest as at this time many of the

    Cloud frameworks are still under heavy development and

    pathways to utilize multiple of them are of current interest.

    In Section V, we impart our thoughts on providing PaaSofferings attractive for our user communities. Next, in

    Section VI, we discuss what implications this multitude ofservice offerings has for the user community. In Section VIIwe introduce rain and in Section VIII we detail thescalability tests performed in different FG infrastructures.Finally, Section IX collects the conclusions of this work.

  • 8/11/2019 Cloud Computing Basics - Chapter 01

    2/8

    II.

    FUTUREGRID

    The FutureGrid project is sponsored bypartners from Indiana University,Chicago, University of Florida, San DiegoCenter, Texas Advanced Computing CentVirginia, University of Tennessee, UniverCalifornia, Dresden, Purdue University, and

    Among its sites Futuregrid has a sresources totaling about 5000 computeinclude a variety of different platforms aaccess heterogeneous distributed computistorage resources. This variety of resourallow interesting interoperability and scalabiAdditionally, FG provides a fertile environvariety of IaaS and PaaS frameworks.

    Early on we were faced with two elem(1) Does a user need access to moreframework, and (2) if so, which frameoffer? To answer these questions, we analyfrom the user community of FutureGrid as

    registration process. Each project requestorfeedback on the technologies that wereexecution of their projects. The result of tdepicted in Figure 1. Figure 2 shows the deprojects we host on FutureGrid classifidisciplines and goals. We observe the follo1. Nimbus and Eucalyptus were requested

    not surprising, as we made most advertsystems and recommended them forprojects on FG. Originally we hadEucalyptus but scalability issues ofinstall on FutureGrid resulted in a rNimbus requests.

    2.

    High Performance Computing (HP

    requested as third highest category. Thiour affiliation with traditional HPC coas the strong ties to XSEDE.

    3. Hadoop and map/reduce were requ36.78% of the projects. This number ione reported in Figure 1, as we combinHadoop and MapReduce while only coentries.

    4.

    We saw recently an increase in requestIt has just become one of the prefersolutions for cloud computing within acompanies, but also within the research

    5.

    We have seen an increased demand fOpenNebula. It is quite popular as part

    Cloud efforts and has gained substantiaUS projects such as Clemson UnivLaboratory as part of their cloud strateg

    6.

    Not surprisingly the largest contingentgroup of technology experts.

    Based on this analysis we spent our effoservices within FutureGrid. As a resultproviding the following partitioning betlisted in Figure 3. However, the number ofbetween these services can be chang

    SF and includesUniversity ofSupercomputingr, University ofsity of SouthernGrid 5000.t of distributedores. Resourcesllowing users tog, network, andes and serviceslity experiments.ent to explore a

    entary questions:than one IaaS

    orks should weed data gatheredart of the project

    ad to provide usrelevant for theis information isographic of the

    ed by scientificing:the most. This is

    isement for theseducational classore requests forthe Eucalyptus

    cent increase in

    ) services are

    s is motivated bymunities as well

    ested by abouts higher than theed the values fornsidering unique

    s for OpenStack.red open sourcelarge number ofcommunity.r the support ofof the European

    l backing also byrsity and Fermiies.f our users is the

    rt to enable suche are currently

    een services asnodes associateded by request.

    OpenNebula does not appear on thnot made it officially accessible.conducted scalability experimentsmonitored usage patterns that couldin our current deployment strategy.taking into account also other factodo on the portal. An interesting obsour OpenNebula tutorial that we hoof our most popular tutorials (seenot offer OpenNebula on Futureprompts us to identify if we should

    At present, we are integratinresources between the IaaS framcould decide to assign some resotomorrow the same resources coulIaaS farmework Hence, the differtestbed can be resized on-demanexperiments or scientific problerequirements.

    Figure 2: The distribution of the scientiwhile reviewing the project requests. (No

    to be integrated into th

    E

    Life

    Science

    15%

    Inter-

    operability

    3%

    Technolog

    y

    Evaluation

    24%

    Figure 1: Technology choices made as paprocess in FutureGrid. Note that multiple

    total will be more than 100%.

    s chart because we haveNevertheless, we have(see Section IV) and

    motivate a possible shiftFor this decision we ares, such as what our userservation we made is thatt on the FG portal is oneigure 3) although we do

    Grid at this time. Thislso offer OpenNebula.

    g mechanisms to floatworks. Thus, today weurces to Nimbus, whilebe assigned to another

    nt infrastructures of ourd to support scalability

    s with high resource

    ic areas that we identifiedte that 20 project have yet

    is data).

    ducation

    9%

    Computer

    Science

    35%other

    Domain

    Science

    14%

    rt of the project applicationntries could be selected so the

  • 8/11/2019 Cloud Computing Basics - Chapter 01

    3/8

    III. OVERVIEW OF CLOUD IAASFRA

    One fundamental concept in cloud compproviding Infrastructure as a Service (Iresources to customers and users instead omaintaining compute, storage, and network.

    achieved by offering virtual machines to tto establish such a service, a number ofavailable including Eucalyptus, NimbuOpenStack (see Figure 4). Next, wediscussion about these frameworks and ouqualitative differences between them.

    A. Nimbus

    The Nimbus project [1] is working on tthey term Nimbus Infrastructureand Ni

    Nimbus Infrastructure: The Nimbus prNimbus Infrastructure to be an opencompatible Infrastructure-as-a-Servicespecifically targeting features of interestcommunity such as support for proxy cschedulers, best-effort allocations and oththis mission, Nimbus is providing its own ia storage cloud compatible with S3 compatiby quota management, EC2 compatible cloconvenient cloud client which uses internall

    Nimbus Platform: The Nimbus platforprovide additional tools to simplify the minfrastructure services and to facilitate theother existing clouds (OpenStack and Am

    Figure 4: Typical partitioning of FG compute reso

    framework

    Figure 3:Number of hits on our tutorial pages

    53074551

    3916

    1,794

    0100020003000400050006000

    MEWORKS

    uting is based onaaS) to deliverf purchasing andTypically, this is

    e users. In orderframeworks ares, OpenNebula,rovide a shortline some major

    wo products thatbus Platform.

    oject defines thesource EC2/S3-implementation

    to the scientificredentials, batchers. To supportplementation ofle and enhancedd services, and aWSRF.is targeting to

    nagement of theintegration with

    azon). Currently,

    this includes the following toolslaunching, controlling, and monitb) a context broker service thatcluster launches automatically and

    B. OpenNebula

    OpenNebula [3, 4] is an ope

    design is flexible and modular tdifferent storage and network infrand hypervisor technologies [5]. Iresource needs, resource addsnapshotting, and failure of ph

    Furthermore, OpenNebula suppointerface with external clouds, wisolation, and multiple-site supportsupplement the local infrastructurefrom public clouds to meet peakhigh availability strategies. Thus,centralized management systemmultiple deployments of OpenNebu

    OpenNebula supports diffe

    including REST-based interfacesinterfaces, and the emerging cloudthe de-facto AWS EC2 API standar

    The authorization is basedkeypairs, X.509 certificates or LDintegrates fine-grained role basedauthorization capabilities.

    The authorization is basedkeypairs, X.509 certificates or LDsupports fine-grained ACLs to maroles as part of its authorization cap

    The storage subsystem sconfiguration, from non-shared fitransferring via SSH to shared file s

    Lustre) or LVM with CoW (copy-oserver, from using commodity harsolutions.

    C. OpenStack

    OpenStack [7] is a collection ofto deliver public and private clocurrently include OpenStack COpenStack Object Storage (calledImage Service (called Glance). Oand has received considerableopenness and the support of compa

    Nova is designed to provisnetworks of virtual machines, cr

    scalable cloud computing platform.redundant, scalable object storstandardized servers in order to stodata. It is not a file system or real-but rather a long-term storage systestatic data. Glance provides discdelivery services for virtual disk im

    rces by IaaS

    1,265 1,200

    ) cloudinit.d coordinatesoring cloud applications,coordinates large virtualrepeatedly [1, 2]

    source IaaS toolkit. Its

    o allow integration withstructure configurations,

    t can deal with changingitions, live migration,ysical resources [3, 6].ts cloud federation toich provides scalability,. This lets organizationswith computing capacitydemands, or implementsingle access point and

    an be used to controla.ent access interfaces

    , OGF OCCI serviceAPI standard, as well as.

    on passwords, ssh rsaP. This framework alsoACLs that as part of its

    on passwords, ssh rsaP. This framework alsoage users with multiple

    bilities.pports any backendle systems with imageystems (NFS, GlusterFS,

    n-write), and any storageware to enterprise-grade

    open source componentsuds. These componentsompute (called Nova),

    Swift), and OpenStackenStack is a new effortmomentum due to itsies.on and manage largeeating a redundant and

    Swift is used to create age using clusters ofe petabytes of accessibleime data storage system,

    for more permanent orovery, registration, andges.

  • 8/11/2019 Cloud Computing Basics - Chapter 01

    4/8

    D.

    Eucalyptus

    Eucalyptus [8] promises the creation of on-premiseprivate clouds, with no requirements for retooling theorganization's existing IT infrastructure or need to introducespecialized hardware. Eucalyptus implements an IaaS(Infrastructure as a Service) private cloud that is accessiblevia an API compatible with Amazon EC2 and Amazon S3. Ithas five high-level components: Cloud Controller (CLC) thatmanages the virtualized resources; Cluster Controller (CC)controls the execution of VMs; Walrus is the storage system,Storage Controller (SC) provides block-level networkstorage including support for Amazon Elastic Block Storage(EBS) semantics; and Node Controller (NC) is installed ineach compute node to control VM activities, including theexecution, inspection, and termination of VM instances.

    IV.

    QUALITATIVE FEATURE COMPARISON OF THE IAAS

    FRAMEWORKS

    All these IaaS frameworks have been designed to allowusers to create and manage their own virtual infrastructures.

    However, these frameworks have differences that need to beconsidered when choosing a framework. Some qualitativefeatures to consider as part of the selection are summarizedin Table 1 and enhance findings from others [9].

    Software deployment. An important feature is the easeand frequency of the software deployments. From our ownexperience the easiest to deploy is OpenNebula because weonly have to install a single service in the frontend for a basicconfiguration while no OpenNebula software is installed inthe compute nodes. Nimbus is also relatively easy to install,as only two services have to be configured in the frontendplus the software installation in each compute node. On theother hand, the deployment of Eucalyptus and OpenStack ismore difficult due to the number of different components to

    configure and the different configuration possibilities thatthey provide. The learning curve for OpenStack is still steep,as the documentation needs some improvements

    In addition to a single install we also have to considerupdate and new release frequencies. From the release notesand announcements of the framework we observe that majorupdates happen on a four or six months schedule, with manyrelease candidates and upgrades that fix intermediate issues.Furthermore, we observed that the installation deploymentdepends on scalability requirements. As such, an OpenStackdeployment of four nodes may look quite different from onethat utilizes 60 nodes. Hence, it is important that integrationwith configuration management toolkits exist in order tominimize the effort to repeatedly adapt to configuration

    requirements for a production system. Tools such as chef andpuppet provide a considerable value add in this regards. Theyserve as a repeatable template to install the services in caseversion dependent performance comparisons or featurecomparisons are conducted by the users.

    Interfaces. Since Amazon EC2 is a de-facto standard, allof them support the basic functionality of this interface,namely image upload and registration, instantiate VMs, aswell as describe and terminate operations. However, theOpenStack project noticed disadvantages due to features that

    EC2 is not exposing. Thus, OpenStack is considering toprovide interfaces that diverge from the original EC2.

    Storage. Storage is very important in cloud because wehave to manage many images and they must be available forusers anytime. Therefore, most of the IaaS frameworksdecided to provide cloud storage. In the case of Nimbus, it iscalled Cumulus and it is based on the POSIX filesystem.

    Cumulus also provides a plugin supporting various storagesystems including PVFS, GFS, and HDFS (under a FUSEmodule). Cumulus uses http/s as a communication protocol.

    OpenStack and Eucalyptus provide more sophisticatedstorage systems. In OpenStack it is called called Swift, andin Eucalyptus it is called Walrus. Both of them are designedto provide fault tolerance and scalability. OpenStack canstore the images in either the POSIX filesystem or Swift. Inthe first case, images are transferred using ssh while in thesecond one are transferred using http/s. Finally, OpenNebuladoes not provide a cloud storage product, but its internalstorage system can be configured in different ways. Thus, wecan have a shared filesystem between frontend and computenodes; we can transfer the images using ssh; or we can use

    LVM with CoW to copy the images to the compute nodes.Networking. The network is managed differently for

    each IaaS framework while providing various options ineach of them:

    Eucalyptus offers four different networking modes:managed, managed-noLAN, system, and static [10]. Inthe two first modes, Eucalyptus manages the network ofthe VMs. The modes differ in the network isolationprovided by vLAN. In the system mode, Eucalyptusassumes that IPs are obtained by an external DHCPserver. In the static mode, Eucalyptus manages VM IPaddress assignment by maintaining its own DHCPserver with one static entry per VM.

    Nimbus assigns IPs using a DHCP server that can be

    configured in two ways: centralized and local. In thefirst case, a DHCP service is used that one configureswith Nimbus-specific MAC to IP mappings. In thesecond case, a DHCP server is installed on everycompute node and automatically configured with theappropriate addresses just before a VM boots.

    OpenStack supports two modes of managing networksfor virtual machines: flat networking and vLANnetworking. vLAN based networking requires a vLANcapable managed switch that you can use to setupvLANs for your systems. Flat Networking uses LinuxEthernet bridging to connect multiple compute hoststogether.

    The OpenNebula network contains the followingoptions: host-managed vLANs where the networkaccess is restricted through vLAN tagging, which alsorequires support from the hardware switches; Ebtablesto restrict the network access through Ebtables rules;and Open vSwitch to restrict network access with OpenvSwitch Virtual Switch.

    Hypervisors. All of the IaaS frameworks support KVMand XEN, which are the most popular open sourcehypervisors. OpenNebula and Openstack also supportVMWare. Eucalyptus only supports VMWare in its

  • 8/11/2019 Cloud Computing Basics - Chapter 01

    5/8

    commercial version. Nimbus does not support VMWare.Additionally, OpenStack also supports LXC, UML andMicrosoft's HyperV. This makes OpenStack a quite

    attractive choice for experimenting with differenthypervisors environments.

    Authentication. All of the IaaS frameworks supportX.509 credentials as authentication method for users.OpenStack and OpenNebula also support authentication viaLDAP, although it is quite basic. OpenNebula also supportssh rsa keypair and password authentication. OpenStackfocuses recently on the development of a keystore. Nimbusallows interfacing with existing Grid infrastructureauthentication frameworks.

    V. OVERVIEW OF SELECTED PAASFRAMEWORKS

    A. Platform as a Service

    Platform as a Service (PaaS) offers higher-level servicesto the users that focus on the delivery of a platform todevelop and reuse advanced services.. Hence, developers canbuild applications without installing any tools on theircomputer and deploy those applications without worryingabout system administration tasks. On FutureGrid weprovide Hadoop and Twister to our users as they providepopular choices in the community. Twister is a productdeveloped at Indiana University (IU) and used as part of

    courses taught within the university. This also explains whyit is so frequently requested as part of the project registration.

    VI.

    SUPPORTED IAASAND PAASFRAMEWORKS IN

    FUTUREGRID

    As already outlined in Section II FutureGrid provides atthis time Nimbus, OpenStack, and Eucalyptus on variousresources. However, We also experimented with anOpenStack installation with great success. At this timeNimbus is our preferred IaaS framework due to its easyinstall, the stability, and the support that is provided whileincluding the authors of the Nimbus project as fundedpartners into the FutureGrid Project. This has a positiveimpact in support questions, but also in the development offeatures motivated by FutureGrid. As part of our PaaSofferings, we provide various ways of running Hadoop on

    FG. This is achieved by either using Hadoop as part of thevirtualized environment or exposing it through the queuingsystem via myHadoop [11]. Twister [12] is contributedthrough community efforts by IU.

    A. Unicore and Genesis II

    Additionally, services such as Unicore and Genesis II areavailable as part of the HPC services.

    OpenStack Eucalyptus 2.0 Nimbus OpenNebula

    Interfaces EC2 and S3, Rest Interface.Working on OCCI

    EC2 and S3, Rest Interface.Working on OCCI

    EC2 and S3, RestInterface

    Native XML/RPC, EC2 and S3,OCCI, Rest Interface

    Hypervisor KVM, XEN, VMware Vsphere,

    LXC, UML and MS HyperV

    KVM and XEN. VMWare in the

    enterprise edition.

    KVM and XEN KVM, XEN and VMWare

    Networking - Two modes:(a) Flat networking

    (b) VLAN networking

    -Creates Bridges automatically-Uses IP forwarding for public

    IP

    -VMs only have private IPs

    - Four modes: (a) managed; (b)

    managed-novLAN; (c) system;

    and (d) static

    - In (a) & (b) bridges are created

    automatically

    - IP forwarding for public IP

    -VMs only have private IPs

    - IP assigned using aDHCP server that can be

    configured in two ways.

    - Bridges must exists inthe compute nodes

    - Networks can be defined tosupport Ebtable, Open vSwitch

    and 802.1Q tagging

    -Bridges must exists in thecompute nodes

    -IP are setup inside VM

    Software

    deployment

    - Software is composed by

    component that can be placed in

    different machines.

    - Compute nodes need to installOpenStack software

    - Software is composed by

    component that can be placed in

    different machines.

    - Compute nodes need to installOpenStack software

    Software is installed in

    frontend and compute

    nodes

    Software is installed in frontend

    DevOpsdeployment

    Chef, Crowbar, Puppet Chef*, Puppet*(*according to vendor)

    no Chef, Puppet

    Storage (Image

    Transference)

    - Swift (http/s)- Unix filesystem (ssh)

    Walrus (http/s) Cumulus (http/https) Unix Filesystem (ssh, sharedfilesystem or LVM with CoW)

    Authentication X509 credentials, LDAP X509 credentials X509 credentials,Grids

    X509 credential, ssh rsa keypair,password, LDAP

    Avg. Release

    Frequency

    4 month 6 month

    License OpenSource - Apache OpenSource Commercial OpenSource Apache OpenSource Apache

    Table 1:Feature comparison of IaaS frameworks. indicates a positive evaluation. The more checkmarks the better.

  • 8/11/2019 Cloud Computing Basics - Chapter 01

    6/8

    B.

    Globus

    Originally we expected that the demand for Globus [13]would be higher than what we currently see. However, this isexplainable due to the following factors: a) Globus demandhas been shrinking because many new projects investigatecloud technologies instead of using Grid toolkits; and b) Gridservices have been offered in stable form as part of OSG andXSEDE and thus the demand within a testbed is small assufficient resources are available to experiment with suchsystems in production environments. Projects that requestedGlobus include the SAGA team to investigateinteroperability, the EMI project, and the Pegasus Project.Nevertheless, only the later has actively requested a realpermanent Globus infrastructure on bare metal resources.The other projects have indicated that their needs can befulfilled by raining Globus onto a server based on imagesthat are managed by the projects themselves. Furthermore,we observe that the Globus project has moved over the lastyear to deliver package based installs that make suchdeployments easy. We had only one additional user thatrequested not the installation of Globus, but the installationof GridFTP to more easily move files along. This requestcould be satisfied while using the CoG Kit [14] andabstracting the need for moving data away from a protocoldependency. Interoperability projects would be entirelyserved by a virtual Globus installation on images via rain(see Section VII). In many of these cases even bare-metalaccess is not needed. In fact such use cases benefit from theability to start up many Globus services to as part of thedevelopment of advanced workflows that are researched bythe community.

    At this time we are not supporting any other PaaS such asmessaging queues or hosted databases. However, we intendto adapt our strategies based on evaluating user requests withavailable support capabilities.

    VII. RAINING

    Due to the variety of services and limited resourcesprovided in FG, it is necessary to enable a mechanism toprovision needed services onto resources. This includes alsothe assignment of resources to different IaaS or PaaSframeworks. We have developed, as first step to address thischallenge, a sophisticated image management toolkit thatallows us to not only provision virtual machines, but alsoprovision directly onto bare-metal. Thus, we use the termrainingto indicate that we can place arbitrary software stackonto a resource. The toolkit to do so is called rain.

    Rainmakes it possible to compare the benefits of IaaS,

    PaaS performance issues, as well as evaluating whichapplications can benefit from such environments and howthey must be efficiently configured. As part of this process,we allow the generation of abstract images and universalimage registration with the various infrastructures includingNimbus, Eucalyptus, OpenNebula, OpenStack, but also bare-metal via the HPC services. This complex process isdescribed in more detail in [15]. It is one of the uniquefeatures about FutureGrid to provide an essential componentto make comparisons between the different infrastructures

    more easily possible. Our toolkit rain is tasked withsimplifying the full life cycle of the image managementprocess in FG. It involves the process of creating,customizing, storing, sharing and deploying images fordifferent FG environments. One important feature in ourimage management design is that we are not simply storingan image but rather focus on the way an image is created

    through templating. Thus it will be possible at any time toregenerate an image based on the template that is used toinstall the software stack onto a bare operating system. Inthis way, we can optimize the use of the storage resources.

    Now that we have an elementary design of managingimages, we can dynamically provision them to create bare-metal and virtualized environments while raining them ontoour resources. Hence, rain will offers four main features:

    Create customized environments on demand.

    Compare different infrastructures.

    Move resources from one infrastructure to another bychanging the image they are running plus doing neededchanges in the framework.

    Ease the system administrator burden for creatingdeployable images.

    This basic functionality has been tested as part of ascalability experiment motivated by our use case introducedin the introduction. Here, a user likes to start many virtualmachines and compare the performance of this task betweenthe different frameworks.

    VIII. INFRASTRUCTURE SCALABILITY STUDY

    We have performed several tests to study the scalabilityof the infrastructures installed in our cluster at IndianaUniversity called India. The India cluster is composed byIntel Xeon X5570 with 24GB of memory, a single drive of500GB with 7200RPMm 3Gb/s and an interconnection

    network of 1Gbps Ethernet.The idea of these tests is to provision as many physical

    machines (PM) or virtual machines (VM) at the same time aspossible. Obviously this is an important test that is ofespecial interest to our traditional HPC community thatneeds many compute resources to operate in parallel. Testssucceed if all the machines have ssh access and we cansuccessfully connect and run commands on all the machines.We measure the time that takes since the request is firstissued until we have access to all the machines.

    For that purpose we have used our image generator andregistration tools and created a CentOS 5 image. Weregistered it in the different infrastructures: HPC, OpenStack,Eucalyptus and OpenNebula. This process gives us an

    identical image template. Therefore, the only difference inthe image is the version of the 2.6 kernel/ramdisk used. InHPC we use the ramdisk modified by xCAT, in Eucalyptuswe use a XEN kernel and in OpenStack or OpenNebula weuse a generic kernel with KVM support. The total size of theimage without compression is 1.5 GB. In the case of netbootit is compressed to about 300MB. Figure 5 shows the resultsof the performed tests. In the following sections we describethe software used in each infrastructure, the results obtainedand the problems we found.

  • 8/11/2019 Cloud Computing Basics - Chapter 01

    7/8

    A.

    HPC (Moab/xCAT)

    We used Moab 6.0.3 and xCAT 2.6.provision images to the compute nodes.111 nodes provisioned with the same image

    we did not have any problems and observedscalability. This behavior is based on theexecution the provisioning step in pabottleneck in our experiments was introducretrieving the image from the server. Sicompressed and relatively small, the scalabis preserved. However, provisioning eventakes more time in comparison to its virtualiThis easily explained by the fact that the prothe physically machine must be called whicneeded in case of using virtualized maprocess of rebooting takes place in memorconsuming.

    B. OpenStack

    In our experiments, we have used theOpenStack. In order to make an efficient imOpenStack caches images in the computedoes not need to transfer the image over ttime it is requested and the instantiation of a

    Due to scalability issues by provisioninof VMs, we modified our strategy to submibatches of 10 VMs at a time (maximum). Tthe case of provisioning 32 VMs, we have10 VMs and once 2 VMs. This is aOpenStack and is going to be solved in futhrough the method we chose for Cactus [16

    Furthermore, we observed that if the im

    in the compute nodes, the scalability of tvery limited. In fact, we started to have prboot 16 VMs and observed a failure rateour test. We observed that VMs got stuckstatus (this is the status where the image iscompute node).

    On the other hand, once we had the imaof the compute nodes, we were able to bosimultaneously. However, placing the imagwas not an easy task, as we had to deal wi

    Figure 5: Scalability Experiment of IaaS Framewor

    to dynamicallye had a total of

    . During the tests

    very good linearood behavior of

    rallel. A smalled at the time ofce the image isility of this setup

    a single imagezed counterparts.cess of rebootingh is naturally nothines. Here theand is less time

    actus version ofage provisioning,

    nodes. Thus, ite network everyVM is faster.g higher numbert VM requests inhis means that inequested 3 timesverified bug inture releases. Or].age is not cached

    is framework isoblems trying tof about 50% forin the launchingtransferred to the

    e cached in mostt up to 64 VMs

    es into the cacheth a 50% failure

    rate. Besides observing that thelaunching status, we also observerunning, but we could not accessproblem is mainly related to the facinject the public network configAlternatively, it creates bridges anto direct all the traffic toward the

    observed that the bridges, the iptanot created properly. In particulimportant because we observed thgets confused and duplicate entriesinformation that was not properly clinformation makes the VMs inaccethis problem is persistent and one nthe database to remove the wOpenStack.

    Finally, another known problethat it is not able to automaticallyVMs, so we have to look into the poVM manually. This feature has bversion, though. Our intention is t

    also with Diablo once it is available

    C.

    Eucalyptus

    We use version 2.03 of Eucalopen source version. Eucalyptus althe compute nodes. Just like insimilar issues with Eucalyptusnumbers of VMs. Thus, we modifiplan and also introduced batched rwith a delay of 6 seconds. This is dof Eucalyptus that prevents userscommand several times in a short p

    The tests with this infrastructurebecause we could only get 16 VMs

    Even though, the failure rate was vOpenStack, configures the puforwarding and creates the bridgethis creates problems, like in the casimilar errors like missing bridgAdditionally, we had problemassignment of public IPs to the VVMs did not get public IPs andaccessible for users.

    D. OpenNebula

    We used OpenNebula versiOpenNebula does not cache imagessupports three basic transfer plugins

    NFS has a terrible performance becthe image through the network. Sused because is still easy to conperformance. The last one seems mbut it should provide the best perfoone selected by the OpenNebulaexperiments [17, 18]. We were ablat the same time with almost a 100only got one error in the caseconfiguration does not use an imag

    s on FutureGrid

    image got stuck in thethat other images were

    them via ssh. The laterthat OpenStack does notration inside the VM.conducts IP forwarding

    private IP. At times we

    les entries or both werer, the iptables issue isat sometimes OpenStackin the iptables using oldeaned up. This erroneousssible. We observed thateeds to manually modifyong entries or restart

    of OpenStack Cactus isassign public IPs to theol and assign one to eachen added in the Diablo

    o rerun our experiments

    on FutureGrid.

    ptus, which is the latestso caches the images inpenStack, we observedhile requesting larger

    ed our initial experimentquests (10 at a time) bute to an annoying featurerom executing the sameriod of time.were quite disappointing

    running at the same time.

    ry high. Eucalyptus, likelic network with IPat running time. Thus,

    se of OpenStack, we gotes and iptables entries.

    with the automatics. This means that sometherefore they were not

    on 3.0.0. By defaultin the compute nodes. Itnamed nfs, ssh and lvm.

    use the VMs are readingH was the one that we

    figure and has a betterore difficult to configure,rmance, because it is the

    team to perform theire to instantiate 148 VMs

    of success. In fact, wef 148 VMs. Since ourcache, it was very slow

  • 8/11/2019 Cloud Computing Basics - Chapter 01

    8/8

    and limited by the network, as each image must be copied toeach VM. This is done in a sequential process. However, thiscan be improved by introducing caches as others haveshowcased [19].

    IX. CONCLUSIONS

    This paper provides evidence that offering not one butmany IaaS frameworks is needed to allow users to answerthe question which IaaS framework is right for them. Wehave to be aware that users requirements may be verydifferent from each other and the answer to this questioncannot be universally cast.

    We identified an experiment to be executed onFutureGrid to provide us with feedback on a question manyof our users have: Which infrastructure framework should Iuse under the assumption I have to start many instances. Ourclose ties to the HPC community and researchers utilizingparallel services and programming methodologies naturallymotivate this experiment. However, such users tend not to beexperts in IaaS frameworks and a tool is needed to allow the

    comparison of their use cases in the different frameworks. Inthis paper, we have demonstrated that we achieved a majormilestone towards this goal and proven that our efforts canlead to significant comparisons between deployedframeworks. This also includes the ability to identify a set ofbenchmarks for such infrastructures in order to furtheridentify which framework is best suited for a particularapplication. Our test case must be part of such a benchmark.

    Concretely, we found challenges in our scalabilityexperiments while raining images on them. This wasespecially evident for Eucalyptus and even OpenStack. Asmany components are involved in the deployment they arealso not that easy to deploy. Tools provided as part ofdevelopments such as Chef [20] and Puppet [21] can

    simplify deployments especially if they have to be donerepeatedly or require modifications to improve scalability.We claim that the environment to conduct an OpenStackexperiment with just a handful VMs may look quite differentfrom a deployment that uses many hundreds of servers onwhich Openstack may be hosted. This is also documentednicely as part of the OpenStack Web pages that recommendsmore complex service hierarchies in case of largerdeployments.

    On the other hand, we have seen that OpenNebula isvery reliable and easy to deploy. Although in ourexperiments we found it is quite slow due to the lack ofcaching images. Nevertheless, we think that this problem iseasy to solve. In fact, after we finished the tests we also

    found other community members having the same problem.A plugin to provide caching support is already available,which can significantly improve managing images acrossmany nodes in OpenNebula.

    Through the ability of rain it will become easier for us todeploy PaaS on the IaaS offerings, as we will be able tocreate templates that facilitate their installation andpotentially their upgrade. Due to this ability it is possible toreplicate the environments and introduce reproducibleenvironment.

    The answer to the question, which IaaS question shouldone use, depends on the requirements of the applications andthe user community. OpenNebula and Nimbus were easier toinstall, but we observed that OpenStack and Eucalyptus haveconsiderably worked on these issues in newer versions to bereleased shortly. Nimbus has turned out to be very reliable,but our OpenStack and Eucalyptus Clouds have been

    extremely popular resulting in resource starvation. We willaddress this issue by assigning dynamically resourcesbetween the clouds. From our own observations workingwith applications it is clear that we do need a testbed, such asFutureGrid, to allow users to cast their own evaluations.

    ACKNOWLEDGEMENTS

    We would like to acknowledge the rest of the FG team for

    its support in the preparation and execution of these tests.

    This material is based upon work supported in part by the

    National Science Foundation under Grant No. 0910812.

    REFERENCES

    [1]Nimbus Project Web Page. http://www.nimbusproject.org, May 2012.

    [2] K. Keahey, I. Foster, T. Freeman, and X. Zhang, "Virtual Workspaces:

    Achieving Quality of Service and Quality of Life in the Grid," Scientific

    Programming Journal, vol. 13, pp. 265-276, 2005.

    [3] OpenNebula Web Page. http://www.opennebula.org, May 2012.

    [4] I. M. Llorente, R. Moreno-Vozmediano, and R. S. Montero, "Cloudcomputing for on-demand grid resource provisioning," Advances in

    Parallel Computing, vol. 18, pp. 177-191, 2009.

    [5] R. S. Montero, R. Moreno-Vozmediano, and I. M. Llorente, "AnElasticity Model for High Throughput Computing Clusters," J. Parallel

    and Distributed Computing, 2010.

    [6] B.Sotomayor, R. S. Montero, I. M. Llorente, and I.Foster, "VirtualInfrastructure Management in Private and Hybrid Clouds," IEEE Internet

    Computing, vol. 13, pp. 14-22, Sep.-Oct 2009 2009.

    [7] OpenStack Web Page. http://www.openstack.org, May 2012.[8] Eucalyptus Web Pages (Open Source). http://open.eucalyptus.com/,

    May 2012.[9] P. Sempolinski and D. Thain, " A Comparison and Critique of

    Eucalyptus, OpenNebula and Nimbus. ," presented at the IEEE 2nd Int.

    Conf. on Cloud Computing Technology and Science, 2010.

    [10] Eucalyptus Network Configuration 2.0.http://open.eucalyptus.com/wiki/EucalyptusNetworkConfiguration_v2.0,

    May 2012.

    [11] myHadoop. http://sourceforge.net/projects/myhadoop/, May 2012.[12] Twister. http://www.iterativemapreduce.org/, May 2012.

    [13] Globus Project. http:/ /www.globus.org/, May 2012.

    [14] G. von Laszewski, I. Foster, J. Gawor, and P. Lane, "A JavaCommodity Grid Kit," Concurrency and Computation: Practice and

    Experience, vol. 13, pp. 643-662, 2001.[15] J. Diaz, G. v. Laszewski, F. Wang, and G. Fox, " Abstract Image

    Management and Universal Image Registration for Cloud and HPC

    Infrastructures " IEEE Cloud. 2012.

    [16] Running 200 VM instances on OpenStackhttp://blog.griddynamics.com/2011/05/running-200-vm-instances-on-

    openstack.html, May 2012.[17] U. Schwickerath, "CERN Cloud Computing Infrastructure," presented

    at the ICS Cloud Computing Conference, 2010.

    [18] CERN Scaling up to 16000 VMs. http://blog.opennebula.org/?p=620,May 2012.

    [19] Cache optimization in OpenNebula.

    http://dev.opennebula.org/issues/553, May 2012.[20] Chef. http://www.opscode.com/chef/, May 2012

    [21] Puppet. http://puppetlabs.com/, May 2012.