Apache Whirr
By Lars George
![Page 1: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/1.jpg)
Apache Whirr
Cloud Services Made Easy
Lars George, Cloudera
HUGUK Meetup
12 May 2011
Friday, May 20, 2011
![Page 2: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/2.jpg)
What is Apache Whirr?
▪ An Apache Incubator project for running services in the cloud
▪ A community for sharing code and configuration for running clusters
▪ People: 9 committers (6 orgs), more contributors and users
▪ Projects: Hadoop, HBase, ZooKeeper, Cassandra
▪ More coming: Voldemort, ElasticSearch, Mesos, Oozie
▪ Whirr is used for
  ▪ Testing
  ▪ Evaluation and proof of concept
  ▪ Production (future)
![Page 3: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/3.jpg)
High Level Overview
▪ Whirr uses low-level API libraries to talk to cloud providers
▪ Orchestrates clusters based on roles
▪ Configuration files define entire cluster
  ▪ Override anything you need to change on the command line
▪ Spins up minimal images, only base OS and SSH
  ▪ Ships your credentials
▪ Services are a combination of roles
  ▪ Define bootstrapping and configuration actions
▪ Adding your own service is (nearly) trivial
  ▪ No dealings with cloud provider internals
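The "configuration file plus command-line overrides" idea can be sketched in a few lines of Java. This is only an illustration: the class `LayeredClusterConfig`, its methods, and the rule that `--cluster-name` maps to `whirr.cluster-name` are assumptions of this sketch, not Whirr's actual implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: command-line overrides layered on top of a cluster properties file. */
final class LayeredClusterConfig {

    /** Parses "--key value" pairs; each key becomes "whirr.<key>" (an assumption of this sketch). */
    static Properties parseOverrides(String[] args) {
        Properties overrides = new Properties();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Expected an option, got: " + args[i]);
            }
            overrides.setProperty("whirr." + args[i].substring(2), args[i + 1]);
        }
        return overrides;
    }

    /** Returns the file's properties with every override applied on top. */
    static Properties merge(String fileContents, Properties overrides) {
        Properties merged = new Properties();
        try {
            merged.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        merged.putAll(overrides); // later values win, so the command line beats the file
        return merged;
    }

    public static void main(String[] args) {
        String file = "whirr.cluster-name=flumedemo\nwhirr.provider=aws-ec2\n";
        Properties merged = merge(file, parseOverrides(new String[] {"--cluster-name", "test"}));
        System.out.println(merged.getProperty("whirr.cluster-name"));
        System.out.println(merged.getProperty("whirr.provider"));
    }
}
```

Keeping the file as the baseline and applying overrides last means a shared spec file can stay untouched while individual runs change single values.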
![Page 4: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/4.jpg)
Steps in writing a Whirr service
▪ 1. Identify service roles
▪ 2. Write a ClusterActionHandler for each role
▪ 3. Write scripts that run on cloud nodes
▪ 4. Package and install
▪ 5. Run
![Page 5: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/5.jpg)
1. Identify service roles
▪ Flume, a service for collecting and moving large amounts of data
▪ Flume Master
  ▪ The head node, for coordination
  ▪ Whirr role name: flumedemo-master
▪ Flume Node
  ▪ Runs agents (generate logs) or collectors (aggregate logs)
  ▪ Whirr role name: flumedemo-node
https://github.com/cloudera/flume
![Page 6: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/6.jpg)
2. Write a ClusterActionHandler for each role

    public class FlumeNodeHandler extends ClusterActionHandlerSupport {

      public static final String ROLE = "flumedemo-node";

      @Override
      public String getRole() {
        return ROLE;
      }

      @Override
      protected void beforeBootstrap(ClusterActionEvent event)
          throws IOException, InterruptedException {
        addStatement(event, call("install_java"));
        addStatement(event, call("install_flumedemo"));
      }

      // more ...
    }
![Page 7: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/7.jpg)
Handlers can interact...

    public class FlumeNodeHandler extends ClusterActionHandlerSupport {

      // continued ...

      @Override
      protected void beforeConfigure(ClusterActionEvent event)
          throws IOException, InterruptedException {
        // firewall ingress authorization omitted

        // The running cluster is available from the event.
        Cluster cluster = event.getCluster();

        // Look up the instance that holds the master role and pass its
        // private address to the node's configure script.
        Instance master = cluster.getInstanceMatching(role(FlumeMasterHandler.ROLE));
        String masterAddress = master.getPrivateAddress().getHostAddress();
        addStatement(event, call("configure_flumedemo_node", masterAddress));
      }
    }
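To see the handler pattern without a cloud account, here is a self-contained sketch with simplified stand-ins. `HandlerSketch`, `Event`, `NodeHandler`, and this version of `call` are hypothetical and are not Whirr's or jclouds' real classes; the address `10.0.0.1` is made up. The sketch only shows how a handler queues script calls in the bootstrap phase and then uses another role's address in the configure phase.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-ins for the handler pattern; none of these types are Whirr's real classes. */
final class HandlerSketch {

    /** Collects the shell statements a handler wants to run on its instances. */
    static final class Event {
        final Map<String, String> privateAddressByRole; // role name -> private address
        final List<String> statements = new ArrayList<>();

        Event(Map<String, String> privateAddressByRole) {
            this.privateAddressByRole = privateAddressByRole;
        }
    }

    /** Renders a function call as one shell line: the function name followed by its arguments. */
    static String call(String function, String... args) {
        StringBuilder line = new StringBuilder(function);
        for (String arg : args) {
            line.append(' ').append(arg);
        }
        return line.toString();
    }

    /** Mirrors the node handler: install in bootstrap, then configure using the master's address. */
    static final class NodeHandler {
        static final String MASTER_ROLE = "flumedemo-master";

        void beforeBootstrap(Event event) {
            event.statements.add(call("install_java"));
            event.statements.add(call("install_flumedemo"));
        }

        void beforeConfigure(Event event) {
            String masterAddress = event.privateAddressByRole.get(MASTER_ROLE);
            if (masterAddress == null) {
                throw new IllegalStateException("No instance with role " + MASTER_ROLE);
            }
            event.statements.add(call("configure_flumedemo_node", masterAddress));
        }
    }

    /** Runs both phases for the node role and returns the queued statements in order. */
    static List<String> statementsForNode(Map<String, String> privateAddressByRole) {
        Event event = new Event(privateAddressByRole);
        NodeHandler handler = new NodeHandler();
        handler.beforeBootstrap(event);
        handler.beforeConfigure(event);
        return event.statements;
    }

    public static void main(String[] args) {
        Map<String, String> addresses = new LinkedHashMap<>();
        addresses.put("flumedemo-master", "10.0.0.1"); // made-up address
        statementsForNode(addresses).forEach(System.out::println);
    }
}
```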
![Page 8: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/8.jpg)
3. Write scripts that run on cloud nodes
▪ install_java is built in
▪ Other functions are specified in individual files

    function install_flumedemo() {
      curl -O http://cloud.github.com/downloads/cloudera/flume/flume-0.9.3.tar.gz
      tar -C /usr/local/ -zxf flume-0.9.3.tar.gz
      echo "export FLUME_CONF_DIR=/usr/local/flume-0.9.3/conf" >> /etc/profile
    }
![Page 9: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/9.jpg)
You can run as many scripts as you want
▪ This script takes an argument to specify the master

    function configure_flumedemo_node() {
      MASTER_HOST=$1
      # Write the site configuration that points this node at the master.
      cat > /usr/local/flume-0.9.3/conf/flume-site.xml <<EOF
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>flume.master.servers</name>
        <value>$MASTER_HOST</value>
      </property>
    </configuration>
    EOF
      # Start the Flume node in the background; it finds the master through flume-site.xml.
      FLUME_CONF_DIR=/usr/local/flume-0.9.3/conf \
        nohup /usr/local/flume-0.9.3/bin/flume node > /var/log/flume.log 2>&1 &
    }
![Page 10: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/10.jpg)
4. Package and install
▪ Each service is a self-contained JAR:

    functions/configure_flumedemo_master.sh
    functions/configure_flumedemo_node.sh
    functions/install_flumedemo.sh
    META-INF/services/org.apache.whirr.service.ClusterActionHandler
    org/apache/whirr/service/example/FlumeMasterHandler.class
    org/apache/whirr/service/example/FlumeNodeHandler.class

▪ Discovered using java.util.ServiceLoader facility
▪ META-INF/services/org.apache.whirr.service.ClusterActionHandler:

    org.apache.whirr.service.example.FlumeMasterHandler
    org.apache.whirr.service.example.FlumeNodeHandler

▪ Place JAR in Whirr’s lib directory
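The discovery step relies on the standard `java.util.ServiceLoader` mechanism. The sketch below shows it end to end with stand-in types: `RoleHandler`, `MasterHandler`, and `NodeHandler` are hypothetical, and a temporary directory plays the part of the JAR so the example runs without any packaging step.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Sketch of ServiceLoader discovery; RoleHandler and its implementations are stand-ins. */
final class HandlerDiscoverySketch {

    /** Stand-in for the handler interface named by the META-INF/services file. */
    public interface RoleHandler {
        String getRole();
    }

    public static final class MasterHandler implements RoleHandler {
        public String getRole() { return "flumedemo-master"; }
    }

    public static final class NodeHandler implements RoleHandler {
        public String getRole() { return "flumedemo-node"; }
    }

    /**
     * Writes a provider-configuration file into a temporary directory (standing in for
     * the JAR), then asks ServiceLoader which handlers it can find there.
     */
    static List<String> discoverRoles() {
        try {
            Path root = Files.createTempDirectory("handler-jar");
            Path services = Files.createDirectories(root.resolve("META-INF/services"));
            // The file is named after the interface and lists one implementation per line.
            String providers = MasterHandler.class.getName() + "\n"
                    + NodeHandler.class.getName() + "\n";
            Files.write(services.resolve(RoleHandler.class.getName()),
                    providers.getBytes(StandardCharsets.UTF_8));

            List<String> roles = new ArrayList<>();
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] {root.toUri().toURL()},
                    HandlerDiscoverySketch.class.getClassLoader())) {
                for (RoleHandler handler : ServiceLoader.load(RoleHandler.class, loader)) {
                    roles.add(handler.getRole());
                }
            }
            return roles;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discoverRoles());
    }
}
```

Because the lookup is driven by a file on the classpath, adding a service means dropping in a JAR; nothing in the core code has to name the new handler classes.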
![Page 11: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/11.jpg)
5. Run
▪ Create a cluster spec file

    whirr.cluster-name=flumedemo
    whirr.instance-templates=1 flumedemo-master,1 flumedemo-node
    whirr.provider=aws-ec2
    whirr.identity=${env:AWS_ACCESS_KEY_ID}
    whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

▪ Then launch from the CLI

    % whirr launch-cluster --config flumedemo.properties

▪ or Java

    Configuration conf = new PropertiesConfiguration("flumedemo.properties");
    ClusterSpec spec = new ClusterSpec(conf);
    Service s = new Service();
    Cluster cluster = s.launchCluster(spec);
    // interact with cluster
    s.destroyCluster(spec);
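The spec file keeps credentials out of the file by referring to environment variables with `${env:...}`. The configuration library expands these references when the file is loaded; the sketch below only imitates that behaviour to show what happens, and is not that library's implementation. The class name `EnvInterpolationSketch` and the value `example-id` are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: expand ${env:NAME} references against a map of environment variables. */
final class EnvInterpolationSketch {

    private static final Pattern ENV_REFERENCE =
            Pattern.compile("\\$\\{env:([A-Za-z_][A-Za-z0-9_]*)\\}");

    /** Replaces every ${env:NAME} in value; fails loudly if a variable is missing. */
    static String resolve(String value, Map<String, String> environment) {
        Matcher matcher = ENV_REFERENCE.matcher(value);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String replacement = environment.get(matcher.group(1));
            if (replacement == null) {
                throw new IllegalArgumentException(
                        "Environment variable not set: " + matcher.group(1));
            }
            // quoteReplacement keeps '$' and '\' in secrets from being read as group references.
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }

    public static void main(String[] args) {
        // In real use the map would be System.getenv(); a fixed map keeps the demo repeatable.
        Map<String, String> env = new HashMap<>();
        env.put("AWS_ACCESS_KEY_ID", "example-id");
        System.out.println(resolve("${env:AWS_ACCESS_KEY_ID}", env));
    }
}
```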
![Page 12: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/12.jpg)
Demo
![Page 13: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/13.jpg)
Orchestration
▪ Instance templates are acted on independently in parallel
▪ Bootstrap phase
  ▪ start 1 instance for the flumedemo-master role and run its bootstrap script
  ▪ start 1 instance for the flumedemo-node role and run its bootstrap script
▪ Configure phase
  ▪ run the configure script on the flumedemo-master instance
  ▪ run the configure script on the flumedemo-node instance
▪ Note there is a barrier between the two phases, so nodes can get the master address in the configure phase
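The two-phase flow with a barrier can be sketched with a thread pool. This is a minimal illustration under stated assumptions, not Whirr's orchestration code: `TwoPhaseSketch` is a hypothetical class, and each "script run" is reduced to a log entry. The barrier comes from `ExecutorService.invokeAll`, which returns only after every submitted task has finished.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: roles run in parallel within a phase, with a barrier between phases. */
final class TwoPhaseSketch {

    /** Runs bootstrap for all roles, waits, then runs configure for all roles. */
    static List<String> launch(List<String> roles) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, roles.size()));
        try {
            runPhase(pool, "bootstrap", roles, log);
            runPhase(pool, "configure", roles, log);
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(log);
    }

    private static void runPhase(ExecutorService pool, String phase,
                                 List<String> roles, List<String> log) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String role : roles) {
            tasks.add(() -> {
                log.add(phase + ":" + role); // stands in for running the role's script
                return null;
            });
        }
        try {
            // invokeAll blocks until every task in this phase has completed: the barrier.
            pool.invokeAll(tasks);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during " + phase, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(launch(Arrays.asList("flumedemo-master", "flumedemo-node")));
    }
}
```

The order of entries within a phase can vary from run to run, but every bootstrap entry always precedes every configure entry, which is what lets a node handler rely on the master's address being known.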
![Page 14: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/14.jpg)
Going further: Hadoop
▪ Service configuration
  ▪ Flume example was very simple
  ▪ In practice, you want to be able to override any service property
  ▪ For Hadoop, Whirr generates the service config file and copies it to the cluster
  ▪ E.g.

        hadoop-common.fs.trash.interval=1440

  ▪ Sets fs.trash.interval in the cluster configuration
▪ Service version
  ▪ Tarball is parameterized by

        whirr.hadoop.tarball.url
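One way to picture the override mechanism: properties carrying the `hadoop-common.` prefix are picked out of the cluster spec, the prefix is stripped, and the remainder is written as Hadoop-style configuration XML. The sketch below is an assumption about the general shape of that step, not Whirr's code; `HadoopConfigSketch` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: turn prefixed cluster-spec properties into Hadoop-style configuration XML. */
final class HadoopConfigSketch {

    static final String PREFIX = "hadoop-common.";

    /** Keeps only the prefixed entries and strips the prefix from their keys. */
    static Map<String, String> serviceProperties(Map<String, String> clusterSpec) {
        Map<String, String> result = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, String> entry : clusterSpec.entrySet()) {
            if (entry.getKey().startsWith(PREFIX)) {
                result.put(entry.getKey().substring(PREFIX.length()), entry.getValue());
            }
        }
        return result;
    }

    /** Renders the properties as a <configuration> document. */
    static String toXml(Map<String, String> properties) {
        StringBuilder xml = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            xml.append("  <property>\n")
               .append("    <name>").append(escape(entry.getKey())).append("</name>\n")
               .append("    <value>").append(escape(entry.getValue())).append("</value>\n")
               .append("  </property>\n");
        }
        return xml.append("</configuration>\n").toString();
    }

    /** Minimal escaping for text placed inside XML elements. */
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> spec = new HashMap<>();
        spec.put("whirr.cluster-name", "hadoop");
        spec.put("hadoop-common.fs.trash.interval", "1440");
        System.out.print(toXml(serviceProperties(spec)));
    }
}
```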
![Page 15: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/15.jpg)
Going further: HBase
[Diagram: clients on the public side (Web Browser, HTable, REST Client) reach the private Master, RegionServer, RegionServer, and RestServer processes, which listen on their UI, RPC, and HTTP ports.]
![Page 16: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/16.jpg)
Going further: HBase
▪ Service composition

    whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,5 hadoop-datanode+hadoop-tasktracker+hbase-regionserver

▪ HBase handlers will do the following in beforeConfigure():
  ▪ open ports
  ▪ pass in ZooKeeper quorum from zookeeper role
  ▪ for non-master nodes: pass in master address
▪ Notice that the Hadoop cluster is overlaid on the HBase cluster
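The instance-templates value packs two things into one string: how many instances each template gets, and which roles are stacked on each of those instances (joined with `+`). A small parser makes the structure explicit. `InstanceTemplateSketch` and `Template` are hypothetical names for this sketch; Whirr's own parsing may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch: parse "count role+role,count role+role" into templates. */
final class InstanceTemplateSketch {

    /** One template: how many instances to start and which roles each one runs. */
    static final class Template {
        final int count;
        final List<String> roles;

        Template(int count, List<String> roles) {
            this.count = count;
            this.roles = Collections.unmodifiableList(new ArrayList<>(roles));
        }
    }

    /** Splits on commas for templates, on whitespace for the count, and on '+' for roles. */
    static List<Template> parse(String value) {
        List<Template> templates = new ArrayList<>();
        for (String part : value.split(",")) {
            String[] countAndRoles = part.trim().split("\\s+", 2);
            if (countAndRoles.length != 2) {
                throw new IllegalArgumentException("Expected '<count> <roles>', got: " + part);
            }
            int count = Integer.parseInt(countAndRoles[0]);
            templates.add(new Template(count, Arrays.asList(countAndRoles[1].split("\\+"))));
        }
        return templates;
    }

    /** Total number of machines the spec asks for. */
    static int totalInstances(List<Template> templates) {
        int total = 0;
        for (Template template : templates) {
            total += template.count;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Template> templates = parse(
                "1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,"
                + "5 hadoop-datanode+hadoop-tasktracker+hbase-regionserver");
        for (Template template : templates) {
            System.out.println(template.count + " x " + template.roles);
        }
        System.out.println("instances: " + totalInstances(templates));
    }
}
```

For the HBase spec above this yields two templates: one machine carrying four roles and five machines carrying three roles each, six machines in total.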
![Page 17: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/17.jpg)
Challenges
▪ Complexity
  ▪ Degrees of freedom
    ▪ #clouds × #OS × #hardware × #locations × #services × #configs = a big number!
  ▪ Known good configurations, recipes
  ▪ Regular automated testing
▪ Debugging
  ▪ What to do when the service hasn’t come up?
  ▪ Logs
  ▪ Recoverability
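To get a feel for how quickly the degrees of freedom multiply, here is the product worked through in code. The counts are invented purely for illustration (3 clouds, 4 OSes, 5 hardware types, 6 locations, 8 services, 10 configs); they do not come from the talk.

```java
/** Sketch: size of the configuration space as a product of its dimensions. */
final class ConfigurationSpaceSketch {

    /** Multiplies the dimension sizes, failing on overflow instead of wrapping silently. */
    static long combinations(long... dimensionSizes) {
        long total = 1;
        for (long size : dimensionSizes) {
            total = Math.multiplyExact(total, size);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical counts: clouds, OSes, hardware types, locations, services, configs.
        System.out.println(combinations(3, 4, 5, 6, 8, 10)); // prints 28800
    }
}
```

Even with these modest numbers there are 28,800 combinations, which is why a short list of known good configurations and regular automated testing matter.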
![Page 18: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/18.jpg)
What’s next?
▪ Release 0.5.0 coming soon
  ▪ Support local tarball upload (WHIRR-220)
  ▪ Allow to run component tests in memory (WHIRR-243)
▪ More services (ElasticSearch, Voldemort)
▪ More (tested) cloud providers
▪ Pool provider/BYON
▪ Cluster resizing
▪ https://cwiki.apache.org/confluence/display/WHIRR/RoadMap
![Page 19: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/19.jpg)
Questions
▪ Find out more at
  ▪ http://incubator.apache.org/whirr
![Page 20: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/20.jpg)
provision
[Diagram: four instances are provisioned behind the public/private boundary: three with Ubuntu 10.04, 8GB RAM, and 8 cores, and one with Ubuntu 10.04, 2GB RAM, and 1 CPU. Roles shown: hbase-master, hbase-regionserver, hbase-regionserver, hbase-restserver.]
![Page 21: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/21.jpg)
install
[Diagram: each of the four instances (hbase-master, hbase-regionserver, hbase-regionserver, hbase-restserver) now has login: hbase, java: 1.6.0_16, hbase: 0.90.1.]
![Page 22: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/22.jpg)
configure
[Diagram: the private Master, RegionServer, RegionServer, and RestServer processes are configured and listening on their UI, RPC, and HTTP ports.]
![Page 23: Apache Whirr](https://reader034.vdocuments.site/reader034/viewer/2022052505/5552c03fb4c90581158b46e6/html5/thumbnails/23.jpg)
manage
[Diagram: a Master and a RegionServer listening on UI and RPC ports, with a second master (Master #2) added and listening on UI and RPC.]