Lecture 5: Build-A-Cloud

http://www.cs.columbia.edu/~sambits/

Life Cycle in a Cloud

Build image(s) for the software/application that we want to host on the cloud (lecture 4)

Request a VM – pass appropriate parameters such as resource needs and image details (lecture 3)

When the VM is started up, parameters are passed to it at appropriate run levels to auto-configure the software image (lecture 4)

Now, in this lecture
– Let's monitor the provisioned VM
– Manage it at run time
– As the workload changes, change the amount of requested resources

What we shall learn

We shall put together a cloud piece by piece
– OpenNebula as the cluster manager
– KVM as the hypervisor for host machines
– Creating and managing guest VMs
– Creating cluster application(s) using VMs
– Application-level management

Interesting sub-topics which we will touch on
– Monitoring the cluster and applications in such an environment
– Example application-level management
  o How to add on-demand resource scaling using OpenNebula and Ganglia

Cloud Setup

Basic Management
– Image Management
– VM Monitoring & Management
– Host Monitoring & Management

[Figure: private cloud client; Management Layer (Image Management, VM Management, Host Management, VN Management); Infrastructure Info]

Our stack for the cloud

OpenNebula – for managing a set of host machines that run a hypervisor

KVM – hypervisor on the host machines

Ganglia – for monitoring the guest VMs

Glue code for implementing Application management: e.g. resource scaling

OpenNebula Setup

Install OpenNebula management node
– Download and compile the source on the mgmt-node (easy installation, install root as oneadmin)
– Set up sshd on all hosts which have to be added (also install Ruby on them)
– Allow root of the mgmt-node to have password-less access to all the managed hosts
– Set up the image repository (a shared-FS-based setup is required for live migration)
– If you do not have a Linux server, download VirtualBox and create a Linux VM on your laptop

OpenNebula Architecture
– Tools written on top of OpenNebula interact with the core via XML-RPC
– The core exposes VM, Host, and Network management APIs
– The core stores all installation and monitoring information in an SQLite3 (or MySQL) DB
– Most of the DB information can be accessed using XML-RPC calls
– All the drivers are written in Ruby and run as daemons, which in turn call small shell scripts to get the work done

Create a Cloud

Start the one daemon
– Edit $ONEHOME/etc/oned.conf for necessary changes (quite intuitive)
– Put login:passwd in $ONEHOME/etc/one_auth
– “one start” starts the daemon
– It keeps all the DB and logs in $ONEHOME/var/
– NOTE: if you want to do a fresh setup, simply stop oned, delete $ONEHOME/var/, and start the OpenNebula daemon again

Set up ssh on host machines (allow oneadmin password-less entry)
– Append the admin node's public key (.ssh/id_rsa.pub) to the host server's .ssh/authorized_keys
– chmod 600 .ssh/authorized_keys

Add hosts to OpenNebula
– Use the command onehost
  o The command is written in Ruby
  o The command basically makes an XML-RPC call to the OpenNebula server's HostAllocate call
– E.g.
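As an illustration of what such a client does, here is a minimal Python sketch that builds the XML-RPC request for a host-allocate call. The method name `one.host.allocate`, the parameter order, the driver names (`im_kvm`, `vmm_kvm`, `tm_nfs`), and the endpoint URL are assumptions; check them against the XML-RPC reference for your OpenNebula release before use.

```python
import xmlrpc.client

# Assumed values: adjust to your installation and OpenNebula version.
ONE_ENDPOINT = "http://localhost:2633/RPC2"
HOST_ALLOCATE = "one.host.allocate"


def host_allocate_request(session, hostname, im_mad, vmm_mad, tm_mad):
    """Return the XML-RPC request body for a host-allocate call (no network I/O)."""
    params = (session, hostname, im_mad, vmm_mad, tm_mad)
    return xmlrpc.client.dumps(params, methodname=HOST_ALLOCATE)


def add_host(session, hostname, im_mad="im_kvm", vmm_mad="vmm_kvm", tm_mad="tm_nfs"):
    """Send the call to a running oned; returns whatever the server replies."""
    server = xmlrpc.client.ServerProxy(ONE_ENDPOINT)
    return server.one.host.allocate(session, hostname, im_mad, vmm_mad, tm_mad)
```

`host_allocate_request` is useful for inspecting the payload without a server; `add_host` needs a reachable oned and a valid session string.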

Configure network
– Fixed: defines a fixed set of IP-MAC pairs
– Ranged: defines a class network
– E.g. a fixed-set network setting (assuming you have been allotted a set of static IP addresses, how would you set it up?)
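One possible way to produce a fixed network definition is sketched below: a small Python helper that renders a template from a list of IP-MAC pairs. The attribute names (NAME, TYPE, BRIDGE, LEASES) follow the template style of the OpenNebula 1.4 documentation as best recalled, and the bridge name and addresses are made-up placeholders, so treat the output format as an assumption to verify.

```python
def fixed_network_template(name, bridge, leases):
    """Render a FIXED virtual-network template from (ip, mac) pairs.

    The attribute names are assumptions based on OpenNebula 1.4-style templates.
    """
    lines = [f'NAME = "{name}"', "TYPE = FIXED", f"BRIDGE = {bridge}"]
    for ip, mac in leases:
        lines.append(f"LEASES = [IP={ip}, MAC={mac}]")
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
example = fixed_network_template(
    "Public",
    "br0",
    [("192.168.0.5", "50:20:20:20:20:20"), ("192.168.0.6", "50:20:20:20:20:21")],
)
```

The rendered text would then be saved to a file and registered with the virtual-network command of your OpenNebula release.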

Note: a good site for help: http://www.opennebula.org/documentation:rel1.4:vgg

How to access OpenNebula

All APIs can be called using XML-RPC client libraries
– OpenNebula command-line client (Ruby)
– Java client

Setup Monitoring

Requirements of monitoring
– Need something which stores resource-monitoring data as a time series
– Exposes interfaces for querying it and for simple aggregation of data
– Automatically archives the older data

How to achieve it?
– Install Ganglia!
– Tune the VM images to automatically report their monitoring data via Ganglia
– Install gmond on the host servers

What is Ganglia?
– It is open-source software (BSD license)
– Distributed monitoring of clusters and grids
– Stores time-series data and historical data as archives (RRDs)

How to get Ganglia
– Download the source code from http://ganglia.info/downloads.php
– For some Linux distributions, RPMs are available

Components of ganglia

It has two prime daemons
– Gmond: a multi-threaded daemon which runs on the monitored nodes
  o Collects data on the monitored nodes and broadcasts it as XML (can be accessed at port 8649)
  o Configuration file: /etc/gmond.conf
– Gmetad:
  o Periodically polls a collection of child data sources
  o Parses the collected XML and saves all numeric metrics to round-robin databases
  o Exports the aggregated XML over a TCP socket to clients (port 8651)
  o Configuration file: /etc/gmetad.conf
  o One for each cluster
– Round-robin database
  o RRDtool is a well-known tool for creating, storing, and retrieving/plotting RRD data
  o Maintains data at various granularities; e.g. the defaults are:
    • 1 hour of data averaged over 15 sec (rra[0])
    • 1 day of data averaged over 6 min (rra[1])
    • 1 week of data averaged over 42 min (rra[2])
– The web GUI tools
  o These are a collection of PHP scripts started by the web server to extract the Ganglia data and generate the graphs for the website
– Additional tools
  o gmetric to add extra stats: in fact anything you like, numbers or strings, with units etc.
  o gstat to get at the Ganglia data to do anything else you like
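To see why a round-robin archive stays a fixed size, it helps to count the rows each granularity needs. The short Python calculation below uses only the three figures quoted above (time span and averaging step); it is plain arithmetic, not a readout of any actual Ganglia configuration.

```python
# (archive, covered span in seconds, averaging step in seconds), from the list above
RRAS = [
    ("rra[0]", 60 * 60, 15),                # 1 hour at 15 sec
    ("rra[1]", 24 * 60 * 60, 6 * 60),       # 1 day at 6 min
    ("rra[2]", 7 * 24 * 60 * 60, 42 * 60),  # 1 week at 42 min
]


def rows_needed(span_seconds, step_seconds):
    """Number of averaged samples needed to cover the span at the given step."""
    return span_seconds // step_seconds


row_counts = {name: rows_needed(span, step) for name, span, step in RRAS}
# Each archive works out to 240 rows.
```

The coarser archives cover longer periods with the same number of rows, which is how older data is kept at lower resolution without growing the file.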

Note: good site for help: http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia

How to get monitoring-data?

How to get the time-series data?
– Ganglia stores all RRDs in “/var/lib/ganglia/rrds/cluster_name/machine_ip”
– There is an RRD file for each metric
– Data is collected at a fixed time interval (default is 15 sec)
– One can retrieve the complete time series of monitored data from each RRD file using rrdtool, e.g.:
  o Get the average load_one for every 15 sec of the last 1 hour:
  o rrdtool fetch load_one.rrd AVERAGE --end now --start e-1h -r 15
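A minimal Python sketch of the same query follows. It builds the rrdtool fetch argument list and parses the text output into (timestamp, value) pairs. The helper names are hypothetical, and the parser assumes the usual fetch output layout: a header line of data-source names, a blank line, then "timestamp: value" rows.

```python
import math
import subprocess


def fetch_argv(rrd_path, cf="AVERAGE", start="e-1h", end="now", resolution=15):
    """Argument list for `rrdtool fetch`, mirroring the command shown above."""
    return ["rrdtool", "fetch", rrd_path, cf,
            "--end", end, "--start", start, "-r", str(resolution)]


def parse_fetch_output(text):
    """Turn `rrdtool fetch` output into [(timestamp, value)], skipping NaN rows."""
    samples = []
    for line in text.splitlines():
        stamp, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not stamp.strip().isdigit() or not fields:
            continue  # header line with data-source names, or a blank line
        value = float(fields[0])  # first data source only
        if not math.isnan(value):
            samples.append((int(stamp), value))
    return samples


def fetch_samples(rrd_path):
    """Run rrdtool (must be installed) and return the parsed samples."""
    out = subprocess.run(fetch_argv(rrd_path), capture_output=True,
                         text=True, check=True).stdout
    return parse_fetch_output(out)
```

Averaging the returned values gives the kind of utilization figure the scaling logic later in the lecture needs.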

How to get monitoring-data? …

How to access this data from inside a program
– Either use an SSH library (for Perl, Python or Java) and remotely execute the rrdtool command with the correct parameters
– Or write a small XML-RPC server which exposes a function to run rrdtool fetch queries
– E.g. a Perl XML-RPC server

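use strict;
use warnings;
use Frontier::Daemon;
use Log::Log4perl qw(:easy);    # assumed logger behind $log

Log::Log4perl->easy_init($DEBUG);
my $log = get_logger();

# Address and port to listen on (placeholder values)
my ($server_ip, $server_port) = ('127.0.0.1', 8080);

# Example method exposed over XML-RPC: adds its two arguments
sub sum {
    my ($auth, $arg1, $arg2) = @_;
    my $subroutine = (caller(0))[3];    # fully qualified name of this sub
    $log->info("$subroutine - " . $_[1]);
    $log->debug("$subroutine - " . join(";", @_));
    return { SUCCESS => 1, MESSAGE => $arg1 + $arg2 };
}

# Starts serving requests; the call does not return
my $d = Frontier::Daemon->new(
    methods   => { sum => \&sum },
    LocalAddr => $server_ip,
    LocalPort => $server_port,
    debug     => 1,
);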
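For comparison, here is a rough Python sketch of an XML-RPC server exposing an rrdtool fetch function, using only the standard library. The function names, the listening address, and the path check are illustrative assumptions; the RRD directory is the one quoted earlier in the lecture.

```python
import os
import subprocess
from xmlrpc.server import SimpleXMLRPCDispatcher, SimpleXMLRPCServer

RRD_ROOT = "/var/lib/ganglia/rrds"  # location given earlier in the lecture


def rrd_path(cluster, host, metric):
    """Build the RRD file path, refusing components that could escape RRD_ROOT."""
    for part in (cluster, host, metric):
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"bad path component: {part!r}")
    return os.path.join(RRD_ROOT, cluster, host, metric + ".rrd")


def fetch_metric(cluster, host, metric, start="e-1h", resolution=15):
    """Run `rrdtool fetch` locally and return its raw text output."""
    argv = ["rrdtool", "fetch", rrd_path(cluster, host, metric), "AVERAGE",
            "--end", "now", "--start", start, "-r", str(resolution)]
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def register(dispatcher):
    """Expose fetch_metric on any XML-RPC dispatcher or server."""
    dispatcher.register_function(fetch_metric, "fetch_metric")
    return dispatcher


def serve(host="127.0.0.1", port=8000):
    """Blocking server loop; call this on the node that holds the RRD files."""
    register(SimpleXMLRPCServer((host, port))).serve_forever()
```

A client would then call `fetch_metric` through `xmlrpc.client.ServerProxy` and parse the returned text on its side.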

Create a Multi-tiered Clustered Application

Let us consider a two-tiered TPC-W (a web server and database performance benchmark)

How to create an application on custom images
– Create a 6-GB file using dd (a utility for converting and copying files)
– Attach a loop-back device to it
– Partition it into 3 (swap, boot and root)
– Format the partitions with a file system (say ext3)
– Install the complete OS and application stack on the relevant partitions
– Install gmond and configure it
– Save it as a custom image

For TPC-W one will need
– Apache Tomcat server
– Java implementation of TPC-W
– MySQL server

We will need a load balancer which can route HTTP requests to the various backend servers (and is also HTTP-session aware)
– I am using HAProxy (easy to install and configure)
– Nginx and lighttpd are other popular HTTP proxy servers
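As a sketch of what a session-aware load-balancer setup might look like, the Python helper below renders a small HAProxy configuration with cookie-based stickiness from a list of backends. The section name, cookie name, timeouts, and addresses are placeholders, and the directives should be checked against the HAProxy version you install.

```python
def haproxy_config(backends, listen_port=80):
    """Render a minimal HAProxy config; backends is a list of (name, "ip:port")."""
    lines = [
        "defaults",
        "    mode http",
        "    timeout connect 5s",
        "    timeout client 30s",
        "    timeout server 30s",
        "",
        "listen tpcw",
        f"    bind *:{listen_port}",
        "    balance roundrobin",
        # Session awareness: pin each browser to one backend via a cookie.
        "    cookie SERVERID insert indirect nocache",
    ]
    for name, addr in backends:
        lines.append(f"    server {name} {addr} check cookie {name}")
    return "\n".join(lines) + "\n"


# Placeholder backend addresses for illustration only.
cfg = haproxy_config([("tpcw0", "10.0.0.11:8080"), ("tpcw1", "10.0.0.12:8080")])
```

Regenerating this file with one more `server` line and reloading HAProxy is one way a post-install script could attach a newly provisioned VM.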

Installing a multi-tier application

Install a two-tiered application
– Create a template for the load balancer
– Create a template for TPCW
– Deploy the LB-VM (using OpenNebula)
– Deploy the TPCW-VM (using OpenNebula)
– Attach the TPCW application VMs to the LB-VM
– Test using a web browser whether the setup is working
– Create a client template
– Deploy the client VM
– Test the client

[Figure: the Client connects to the LoadBalancer, which fronts TPCW-0 and TPCW-1]

Application Level Operation

One needs to maintain application-level information (e.g. which VM is the load balancer and which VMs are backend servers).

Keep Application level knowledge in some local database.

Application Level Operation: e.g. Dynamic provisioning

Case 1: increasing capacity using replication
– Monitor the average utilization of VMs over, say, 1 min (using Ganglia)
– If the average utilization of all the VMs under the load balancer is above, say, 70%
  o Provision a new VM using OpenNebula (reactive provisioning is also supported by EC2)
  o Run the post-install script to add the new VM to the application

Case 2: increasing capacity using migration/resizing
– Monitor the average utilization of VMs over, say, 1 min (using Ganglia)
– If only one VM is over-utilized and the host does not have more resources
  o Migrate it to another host and resize it to a higher capacity (note: OpenNebula does not support this)
  o Migrate-and-resize VM
    • Migrate the image to another host
    • Change the VM configuration file to the new configuration
    • Start the VM with the new configuration file (with more RAM and CPU)
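The two cases can be sketched as a single decision function. This is one reading of the rules above, not a definitive policy: case 1 is taken to mean that every VM is above the threshold, and the input dictionaries and action names are hypothetical.

```python
def scaling_action(utilization, host_has_spare, threshold=0.70):
    """Pick a provisioning action from per-VM average utilization (0.0 to 1.0).

    utilization:    dict of VM name -> average utilization over the window
    host_has_spare: dict of VM name -> True if its host still has free resources
    Returns (action, vm) with action in {"replicate", "migrate_and_resize", "none"}.
    """
    if not utilization:
        return ("none", None)
    hot = [vm for vm, load in utilization.items() if load > threshold]
    if len(hot) == len(utilization):
        return ("replicate", None)  # case 1: all VMs are above the threshold
    if len(hot) == 1 and not host_has_spare.get(hot[0], False):
        return ("migrate_and_resize", hot[0])  # case 2: one hot VM, host is full
    return ("none", None)
```

For example, `scaling_action({"tpcw0": 0.9, "tpcw1": 0.3}, {"tpcw0": False})` selects `tpcw0` for migrate-and-resize, while two VMs at 80% and 90% trigger replication.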

Application Level Operations (e.g. Dynamic Provisioning) …

Where and how to implement the application-scaling logic:
– Application-scaling logic needs knowledge of the application topology
  o It obviously resides above the infrastructure management layer (i.e. OpenNebula)
– Choose an easy-to-build language (Perl, Python, Ruby, Java, etc.)
  o An XML-RPC client is required to access OpenNebula
– Write a management program, in the language of your choice, which
  o Installs a multi-tier application and stores the application topology in a local DB
  o Periodically monitors
    • Average load on each server
    • Proxy errors
  o Implements case 1 and case 2
  o Runs the post-install script, which adds the VM to the load balancer and restarts it

Problem: live-resize and migrate-and-resize are not present in OpenNebula
– Hack: create a script which does the following (very dirty, but it works)
  o Migrate the current VM to the destination host
  o Alter the configuration file of the migrated VM
  o Destroy and recreate the VM
– Neater solution
  o Add a class in include/RequestManager.h (say VirtualMachineResize, similar to the class VirtualMachineMigrate)
  o Add another method in src/rm/RequestManager.cc (say migrateResize)
  o Implement the class in src/rm/RequestManagerResize.cc (implement the resize)

Solution Architecture

Application Manager (written above OpenNebula). The high-level control flow is:
– Periodically monitor the workload change and application performance
– Manage the current configuration and actuate configuration changes
– Calculate the changed capacity (using some model and feedback from the monitoring block)
– Find the new configuration of the application
– Go ahead and start the process of actuating the new change
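One iteration of this control flow could look like the Python sketch below. The capacity model is a deliberately simple assumption (load spreads evenly across identical VMs, and the 70% target is borrowed from the dynamic-provisioning example); `monitor` and `actuate` are hypothetical callables standing in for the Ganglia query and the OpenNebula calls.

```python
import math


def required_vms(current_vms, avg_utilization, target=0.70):
    """Smallest VM count that keeps average utilization at or below the target."""
    demand = current_vms * avg_utilization  # total load, in units of one full VM
    return max(1, math.ceil(demand / target))


def control_step(monitor, actuate, current_vms, target=0.70):
    """Monitor, compute the new capacity, and actuate only if it changed."""
    avg = monitor()                                  # feedback from monitoring
    wanted = required_vms(current_vms, avg, target)  # capacity model
    if wanted != current_vms:
        actuate(wanted)                              # apply the new configuration
    return wanted
```

A manager process would call `control_step` on a timer, passing the returned count back in as `current_vms` on the next round.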

How to use (demo!)

Command-line scripts
– VM lifecycle steps
  o Creation: show template and image naming
  o Suspension: just the command
  o Migration: migration (suspend and migrate)
  o Deletion: removing the image
– Show Ganglia monitoring
  o Host monitoring through the VM lifecycle
  o VM monitoring

Cloud Management using this Setup

Integrate OpenNebula monitoring with Ganglia and make it more efficient

Use monitoring for VM placement on hosts.

Use monitoring to do reactive provisioning
