Providing Bioinformatics Services on Cloud

Christophe Blanchet, Clément Gauthey

Infrastructure Distributed for Biology (IDB), IDB-IBCP, CNRS FR3302, Lyon, France

http://idee-b.ibcp.fr

EGI CF13, Manchester, 9 April 2013

IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001).

Upload: stratuslab

Post on 07-Nov-2014

DESCRIPTION

Improvements in experimental technologies force biologists to face a deluge of data that requires relevant tools and sufficient resources to analyze. The cloud helps bioinformatics experts define virtual appliances with pre-installed tools and workflows, and helps scientists deploy them, on demand, on national research infrastructures. Presented by Christophe Blanchet and Clément Gauthey at the EGI Community Forum in Manchester, UK, in April 2013.

TRANSCRIPT

Page 1: Providing Bioinformatics Services on Cloud

Christophe Blanchet, Clément Gauthey

Infrastructure Distributed for Biology

IDB-IBCP CNRS FR3302 - LYON - FRANCE

http://idee-b.ibcp.fr

IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001)

Providing Bioinformatics Services on Cloud

C. Blanchet and C. Gauthey

EGI CF13, Manchester, 9 April 2013

Infrastructure Distributed for Biology - IDB

CNRS-IBCP FR3302, Lyon, FRANCE

Page 2: Providing Bioinformatics Services on Cloud

Bioinformatics Today

• Biological data are big data

• 1512 online databases (NAR Database Issue 2013)

• Sanger Institute, UK: 5 PB

• Beijing Genomics Institute, China: 4 sites, 10 PB

➡ Big data in a lot of places

• Analysing such data has become difficult

• Scale-up of the analyses: from gene/protein to complete genome/proteome, ...

• A lot of different tools used daily

• That need to be combined in workflows

• Usual interfaces: portals, Web services, federation, ...

➡ Datacenters with ease of access/use

• Distributed resources

• Experimental platforms: NGS, imaging, ...

• Bioinformatics platforms

➡ Federation of datacenters

[Figure: distributed platform sites, labelled ADN (DNA), BI, M and CC]

Page 3: Providing Bioinformatics Services on Cloud

Sequencing Genomes

source: www.politigenomics.com/next-generation-sequencing-informatics

Complete genome sequencing has become a lab commodity with NGS (cheap and efficient)

source: www.genomesonline.org

Page 4: Providing Bioinformatics Services on Cloud

Infrastructures in Biology

Lots of tools and web services to process and visualize lots of data

Page 5: Providing Bioinformatics Services on Cloud

The scene

• Bioinformatics service providers

• Is it easy to deploy a lot of (incompatible) tools?

• To connect them to public databases?

• To limit transfers of huge data?

• To provide users with their own computing resources?

• With their own isolated storage?

• Scientists

• Is it easy to access/use these tools?

• To adapt them to your usage?

• To get your own or other tools deployed on a datacenter?

• To combine them?

• To get your own computing/storage resources?

Page 6: Providing Bioinformatics Services on Cloud

IDB’s Cloud

• Cloud workbench for Biology

• 13 turnkey bioinformatics appliances (as of Apr. 2013)

• Running since Sept. 2011, open to the biology community

• Lyon, FRANCE

• Powered by

• StratusLab

• Compute nodes, block storage

• 900+ cores, 4+ TB RAM, 36 TB vdisks

• Mainly Intel Sandy Bridge servers with 32 cores and 128 GB RAM

• Bigmem servers with 64 cores and 768 GB RAM

• VMs from 1 to 64 cores, 512 MB to 760 GB RAM

• + OpenStack

• Object storage (Swift)

• 200+ TB redundant & scalable storage
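The VM size range quoted on this slide (1 to 64 cores, 512 MB to 760 GB RAM) can be turned into a simple pre-flight check before asking for an instance. This is only a minimal sketch: the `VmRequest` type and the function name are hypothetical, it assumes 1 GB = 1024 MB, and it does not call any StratusLab or OpenStack API.

```python
from dataclasses import dataclass

# Limits quoted on the slide: VMs from 1 to 64 cores, 512 MB to 760 GB RAM.
# Assumption: 1 GB is counted as 1024 MB.
MIN_CORES, MAX_CORES = 1, 64
MIN_RAM_MB, MAX_RAM_MB = 512, 760 * 1024


@dataclass(frozen=True)
class VmRequest:
    """Hypothetical description of a requested virtual machine."""
    cores: int
    ram_mb: int


def fits_cloud_limits(req: VmRequest) -> bool:
    """Return True if the request lies within the quoted VM size range."""
    return (MIN_CORES <= req.cores <= MAX_CORES
            and MIN_RAM_MB <= req.ram_mb <= MAX_RAM_MB)
```

For example, `fits_cloud_limits(VmRequest(cores=8, ram_mb=16 * 1024))` is accepted, while a 128-core request is rejected.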

Page 7: Providing Bioinformatics Services on Cloud

Driven through a simple web interface

Page 8: Providing Bioinformatics Services on Cloud

Integrate Bioinformatics Tools in Cloud

[Figure: bioinformatics tools (BLAST, GOR4, FastA, SSearch, Abyss, ClustalW, Ray, BWA, PhyML) and Linux virtual machines (RedHat, CentOS, Debian, Ubuntu, Suse) are combined to create a new appliance for the Bioinformatics Marketplace (Sequence, Structure, NGS, Galaxy, ARIA, …)]

• Appliances are virtual machines

• Small: a few GB, easy to convert to most virtualization formats

• Installed and pre-configured with common bioinformatics tools

• e.g. BLAST, ClustalW, ARIA, MEME, HMMER, TopHat, BWA, SAMtools, etc.

Page 9: Providing Bioinformatics Services on Cloud

Bioinformatics Appliances

Page 10: Providing Bioinformatics Services on Cloud

Select your bioinformatics tools

Page 11: Providing Bioinformatics Services on Cloud

Run Bioinformatics Cloud Instances

[Figure: instances are launched from the Bioinformatics Marketplace (Sequence, Structure, NGS, Galaxy, ARIA, …) on IBCP's cloud resources. Labels: PaaS (BLAST, Clustal, etc.); IaaS; Portal; Master & Storage VM; Workers VM; Shared FS; launch jobs; ssh; CNS; ARIA; Launch Instances]
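The figure shows a master VM launching jobs on worker VMs over ssh. One way this could be planned is sketched below, assuming a plain round-robin assignment; the host names and the function name are hypothetical, and the slides do not say how the IDB cloud actually schedules jobs. The function only builds argument lists and executes nothing.

```python
from itertools import cycle


def plan_ssh_jobs(workers, commands):
    """Pair each command with a worker host in round-robin order.

    Returns a list of argv lists (["ssh", host, command]) that could be
    passed to subprocess.run; nothing is executed here.
    """
    if not workers:
        raise ValueError("at least one worker is required")
    plan = []
    # zip stops when the commands are exhausted; cycle repeats the workers.
    for host, command in zip(cycle(workers), commands):
        plan.append(["ssh", host, command])
    return plan
```

Because the workers share a file system with the master in the figure, each command can refer to input and output paths on that shared mount.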

Page 12: Providing Bioinformatics Services on Cloud

Manage your Cloud Instances

Page 13: Providing Bioinformatics Services on Cloud

Biological Data in Cloud

[Figure: the bioinformatics cloud (Portal; PaaS with BLAST, Clustal, etc.; IaaS; Master & Storage VM; Workers VM; Shared FS; launch jobs via ssh) is connected to public data sources (UNIPROT, PDB, EMBL, PROSITE, genomes), shared over NFS, and to user persistent data on a pdisk (iSCSI). Users upload their data and get their results via scp or http/S3]
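The figure names scp as one way to upload data and fetch results. A minimal sketch of the two transfers follows; the user, host and paths in the usage note are hypothetical placeholders, and the functions only build `scp` argument lists without running them.

```python
def scp_upload_cmd(local_path, user, host, remote_dir, recursive=False):
    """Build an scp command that copies local_path into remote_dir on host."""
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")  # copy directories recursively
    cmd += [local_path, f"{user}@{host}:{remote_dir}/"]
    return cmd


def scp_download_cmd(user, host, remote_path, local_dir, recursive=False):
    """Build an scp command that copies remote_path on host into local_dir."""
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")
    cmd += [f"{user}@{host}:{remote_path}", local_dir]
    return cmd
```

For instance, `scp_upload_cmd("reads.fastq", "alice", "vm.example.org", "/data")` yields `["scp", "reads.fastq", "alice@vm.example.org:/data/"]`, which can be handed to `subprocess.run`.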

Page 14: Providing Bioinformatics Services on Cloud

Example: ‘biocompute’ Appliance

• Use your own instance(s)

• With pre-installed standard bioinformatics tools

• BLAST, FastA, SSearch, HMM, ...

• ClustalW2, Clustal-Omega, Muscle, ...

• Bowtie(2), BWA, samtools, ...

• MEME, R, etc.

• Connected to public reference data

• Uniprot, EMBL, genomes, PDB, etc.

• Automatically shared with the VMs
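Once logged in to an instance, a user may want to confirm which of the listed tools are actually on the `PATH`. The sketch below uses Python's standard `shutil.which`; the executable names in `EXPECTED_TOOLS` are assumptions based on the usual upstream command names, and the 'biocompute' appliance may name or package them differently.

```python
import shutil

# Assumed executable names for some of the tools listed on the slide.
EXPECTED_TOOLS = [
    "blastp", "clustalw2", "clustalo", "muscle",
    "bowtie2", "bwa", "samtools", "meme",
]


def missing_tools(tools, which=shutil.which):
    """Return the tools for which `which` finds no executable on the PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools(EXPECTED_TOOLS)` on the instance returns an empty list when every assumed executable is installed; the `which` parameter exists only so the lookup can be replaced in tests.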

Page 15: Providing Bioinformatics Services on Cloud

Example: Galaxy portal for NGS analyses

• Analyse NGS data

• The Galaxy portal is widely used in the community

• connected to large public data: sequences and indexes

• large user data (GBs)

• Preserve workflows and results (persistent storage)

Page 16: Providing Bioinformatics Services on Cloud

Example: Proteomics

• Motivation

• Collaboration with a mass spectrometry platform

• Running out of space on their local resources

• Protein identification

• Experimental mass spectrometry data

• Reference databases: nr, Swiss-Prot

• Reference screening tools: OMSSA, X!Tandem

• User interface

• Remote display

• NX

• Reference GUIs

• SearchGUI

• PeptideShaker

source: PeptideShaker site

Page 17: Providing Bioinformatics Services on Cloud

Conclusion

• Provide turnkey bioinformatics appliances

• Standard tools and pipelines

• Interoperability: ready to run on cloud

• Easier to transfer appliances than data (GB vs TB)

• Provide a cloud infrastructure tightly connected to existing bioinformatics infrastructure

• IDB’s public bioinformatics cloud

• Linked to public biological databases

• In collaboration with the French Bioinformatics Institute

• Ease the usage by scientists

• Usual bioinformatics gateways

• Persistent and large ubiquitous storage

• Web interface for cloud management
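The "GB vs TB" argument in the conclusion can be made concrete with a small calculation. The sizes and the link speed below are illustrative assumptions (a 5 GB appliance, a 2 TB dataset, a 1 Gbit/s link with no protocol overhead), not figures from the talk.

```python
GB = 10**9   # bytes, decimal units
TB = 10**12  # bytes, decimal units


def transfer_seconds(size_bytes, link_bits_per_s):
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    if link_bits_per_s <= 0:
        raise ValueError("link speed must be positive")
    return size_bytes * 8 / link_bits_per_s


# Assumed example values: 5 GB appliance, 2 TB dataset, 1 Gbit/s link.
LINK = 10**9
appliance_s = transfer_seconds(5 * GB, LINK)
dataset_s = transfer_seconds(2 * TB, LINK)
```

Under these assumptions the appliance moves in 40 seconds while the dataset needs 16,000 seconds (about 4.4 hours), a factor of 400, which is why bringing the appliance to the data is attractive.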

Page 18: Providing Bioinformatics Services on Cloud

Perspectives

• Define good practices for providing the academic community and industry with bioinformatics services!

• French Bioinformatics Institute - IFB

• Goals are to provide core bioinformatics resources to the national and international life science research community in key fields such as genomics, proteomics, systems biology, etc.

• Aims at building a national academic cloud devoted to bioinformatics, inspired by the model evaluated through IDB’s cloud.

• European ELIXIR infrastructure

• To build a sustainable European infrastructure for biological information, supporting life science research and its translation

• IFB will be the French representative in ELIXIR.

Page 19: Providing Bioinformatics Services on Cloud

• Acknowledgment

• StratusLab members

• Co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and by the French National Research Agency's Arpege Programme (ANR-10-SEGI-001).

Questions?

http://idee-b.ibcp.fr