gcc 2014 scriptable workshop

26
Outline 1. Working with CloudMan 2. Deploying bioinformatics tools with CloudBioLinux 3. Deploying a genome assembly and annotation pipeline with Fabric

Upload: enis-afgan

Post on 17-Jul-2015

263 views

Category:

Education


1 download

TRANSCRIPT

Page 1: GCC 2014 scriptable workshop

Outline 1.  Working with CloudMan

2.  Deploying bioinformatics tools with CloudBioLinux

3.  Deploying a genome assembly and annotation pipeline with Fabric

Page 2: GCC 2014 scriptable workshop

Working with CloudMan Enis Afgan

@ GCC 2014, Baltimore

Page 3: GCC 2014 scriptable workshop

Researchers

`

100sGB

100+

Page 4: GCC 2014 scriptable workshop

Why the cloud? Infrastructure Customization

Page 5: GCC 2014 scriptable workshop

How to use the cloud?

1.  Get an account on the supported cloud

2.  Start a master instance via the cloud web console or CloudLaunch

3.  Use CloudMan’s web interface on the master instance to manage the platform

4.  Use or customize Galaxy

Page 6: GCC 2014 scriptable workshop

YOUR TURN

Page 7: GCC 2014 scriptable workshop

Launch an instance 1.  Visit biocloudcentral.org 2.  Enter the access key and secret key

provided by Dave Clements on 6/25

3.  Provide your email address 4.  Use your initials as the cluster name 5.  Set any password (and remember it) 6.  Keep Large instance type 7.  Start your instance

Wait for the instance to start (~2-3 minutes)

For more details, see wiki.galaxyproject.org/CloudMan

Page 8: GCC 2014 scriptable workshop

Launch an instance 1.  Visit usegalaxy.org/cloudlaunch 2.  Enter your access key and secret key

(credits provided by Dave Clements on 6/25)

3.  Choose New cluster 4.  Set any name as the cluster name

5.  Set any password

6.  Keep cloudman_keypair 7.  Keep Large instance type

8.  Launch your instance Wait for the instance to start (~2-3 minutes)

For more details, see wiki.galaxyproject.org/CloudMan

Page 9: GCC 2014 scriptable workshop
Page 10: GCC 2014 scriptable workshop

Manual scaling •  Explicitly add 1 worker node to your cluster

•  Node type corresponds to node processing capacity

•  Research use of Spot instances

Page 11: GCC 2014 scriptable workshop

Auto-scaling

Page 12: GCC 2014 scriptable workshop

Using an S3 bucket as a data source

Page 13: GCC 2014 scriptable workshop

Accessing an instance over ssh

Use the terminal (or install Secure Shell for Chrome extension)

SSH using user ubuntu and the password you chose when launching an instance:

[local machine]$ ssh ubuntu@<instance IP address>

Page 14: GCC 2014 scriptable workshop

Once logged in

•  You have full system access to your instance, including sudo; use it as any other system

•  CloudMan-managed file systems are located under /mnt

•  Can submit jobs to SGE job manager via the standard qsub command

Page 15: GCC 2014 scriptable workshop

Customize your instance - install a new tool

$ cd /mnt/galaxy/export

$ wget http://heanet.dl.sourceforge.net/project/dnaclust/parallel_release_3/dnaclust_linux_release3.zip

$ unzip dnaclust_linux_release3.zip

Page 16: GCC 2014 scriptable workshop

Use the new tool in the cluster mode 1.  Create a new sample shell file to run the tool; call it job_script.sh

with the following content: #$ -cwd /mnt/galaxy/export/dnaclust_linux_release3/dnaclust -l -s 0.9 /mnt/workshop-data/mtDNA.fasta

2.  Submit single job to SGE queue qsub job_script.sh

3.  Check the queue: qstat -f 4.  Job output will be in the local directory in file job_script.sh.o# 5.  Submit the same job a number of times:

qsub job_script.sh (*10) watch qstat ‒f

1.  See all jobs lined up 6.  [optional] See auto-scaling in action (if enabled) [1.5-2 mins]

Page 17: GCC 2014 scriptable workshop

Sharing-an-Instance •  Share the entire CloudMan platform

•  Includes all of user data and even the customizations

•  Publish a self-contained analysis

•  Make a note of the share-string and send it to your neighbor

Page 18: GCC 2014 scriptable workshop

UNDERPINNING CONCEPTS Technical details

Page 19: GCC 2014 scriptable workshop

CloudMancore

Tool as a

service

- cloud resource management- multi-cloud interface- service management and monitoring- user interaction- system state

- well-defined interface / API- web interface

- self-contained service definition- implements the default service interface- implements optional service hooks

- service dependencies- automatic dependency management- dynamic loading

Page 20: GCC 2014 scriptable workshop

Launcher /Management

Console

Application(s)(eg, Galaxy)

1°2°

6°, 8°

Persistent data

repositoryStart CloudMan

Setup services

10°

CM-wCM-wCM-w

...

FS 1 FS ...

Block storage

Contextualize image

CloudMan MICloudMan MI

...

CloudMan MI

CloudMan machine image

11°

S3/Swift

CloudMan master instance

Snap1 Snap.. Managed FS

Inst. storage

or orFS archive

Page 21: GCC 2014 scriptable workshop
Page 22: GCC 2014 scriptable workshop

What do you get?

•  Cluster-in-the-cloud: SGE

•  Galaxy on the Cloud + control

•  Customizable: tools, configs, data

•  Sharable

•  Extensible

Page 23: GCC 2014 scriptable workshop

Supported cloud middleware

1.  Amazon Web Services

2.  OpenStack

3.  Eucalyptus

4.  OpenNebula

Page 24: GCC 2014 scriptable workshop

Building •  Leverage CloudBioLinux build framework

•  Via GVL flavor

•  Base CloudMan machine image

•  Tools and data

•  There are also more specific CBL flavors available

•  cloudman

github.com/afgane/gvl_flavor

Page 25: GCC 2014 scriptable workshop

Deploying

•  Integrated with BioCloudCentral.org app

•  Use the public one, deploy your own or run locally

•  BCC supports multiple clouds

Page 26: GCC 2014 scriptable workshop

Launcher /Management

Console

Application(s)(eg, Galaxy)

1°2°

6°, 8°

Persistent data

repositoryStart CloudMan

Setup services

10°

CM-wCM-wCM-w

...

FS 1 FS ...

Block storage

Contextualize image

CloudMan MICloudMan MI

...

CloudMan MI

CloudMan machine image

11°

S3/Swift

CloudMan master instance

Snap1 Snap.. Managed FS

Inst. storage

or orFS archive

Troubleshooting /mnt/galaxy[Indices]

/mnt/cm/paster.log cm-<hash>

/usr/bin/ec2autorun.log /tmp/cm/cm_boot.log

/mnt/cm/paster.log

2

1

3