Transcript
Page 1: Providing Bioinformatics Services on Cloud

Christophe Blanchet, Clément Gauthey

Infrastructure Distributed for Biology

IDB-IBCP CNRS FR3302 - LYON - FRANCE

http://idee-b.ibcp.fr

IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001)

Providing Bioinformatics Services on Cloud

C. Blanchet and C. Gauthey

EGI CF13, Manchester, 9 April 2013

Infrastructure Distributed for Biology - IDB

CNRS-IBCP FR3302, Lyon, FRANCE

Page 2: Providing Bioinformatics Services on Cloud

Bioinformatics Today

• Biological data are big data

• 1512 online databases (NAR Database Issue 2013)

• Sanger Institute, UK, 5 PB

• Beijing Genomics Institute, China, 4 sites, 10 PB

➡ Big data in a lot of places

• Analysing such data has become difficult

• Scale-up of the analyses: from gene/protein to complete genome/proteome, ...

• Many different tools in daily use

• That need to be combined in workflows

• Usual interfaces: portals, Web services, federation, ...

➡ Datacenters with ease of access/use

• Distributed resources

• Experimental platforms: NGS, imaging, ...

• Bioinformatics platforms

➡ Federation of datacenters

[Figure: distributed platforms, with sites labelled ADN, BI, M and CC]

Page 3: Providing Bioinformatics Services on Cloud

Sequencing Genomes

source: www.politigenomics.com/next-generation-sequencing-informatics

Complete genome sequencing has become a lab commodity with NGS (cheap and efficient)

source: www.genomesonline.org

Page 4: Providing Bioinformatics Services on Cloud

Infrastructures in Biology

Lots of tools and web services to process and visualize lots of data

Page 5: Providing Bioinformatics Services on Cloud

The scene

• Bioinformatics services providers

• Is it easy to deploy lots of (incompatible) tools?

• To connect them to public databases?

• To limit the transfer of huge data?

• To provide users with their own computing resources?

• With their own isolated storage?

• Scientists

• Is it easy to access/use these tools?

• To adapt them to your usage?

• To get your/other tools deployed on a datacenter?

• To combine them?

• To get your own computing/storage resources?

Page 6: Providing Bioinformatics Services on Cloud

IDB’s Cloud

• Cloud workbench for Biology

• 13 turnkey bioinformatics appliances (as of Apr. 2013)

• Running since Sept. 2011, open to the biology community

• Lyon, FRANCE

• Powered by

• StratusLab

• Compute nodes, Block storage

• 900+ cores, 4+ TB RAM, 36 TB vdisks

• Mainly Intel Sandy Bridge servers with 32 cores and 128 GB RAM

• Bigmem servers with 64 cores and 768 GB RAM

• VMs from 1 to 64 cores, 512 MB to 760 GB RAM

• + OpenStack

• Object storage (Swift)

• 200+ TB redundant & scalable storage

Page 7: Providing Bioinformatics Services on Cloud

Driven through a simple web interface

Page 8: Providing Bioinformatics Services on Cloud

Integrate Bioinformatics Tools in Cloud

[Figure: bioinformatics tools (BLAST, GOR4, FastA, SSearch, Abyss, ClustalW, Ray, BWA, PhyML) are installed on Linux virtual machines (RedHat, CentOS, Debian, Ubuntu, Suse) to create new appliances, which are listed in the Bioinformatics Marketplace (Sequence, NGS, Structure, Galaxy, ARIA, (…)).]

• Appliances are virtual machines

• Small: a few GB, easy to convert to most virtualization formats

• Installed and pre-configured with common bioinformatics tools

• e.g. BLAST, ClustalW, ARIA, MEME, HMMER, TopHat, BWA, Samtools, etc.
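
The slide says appliances are easy to convert between virtualization formats but does not name a tool. As a minimal sketch, assuming the appliance image is a qcow2 file and that `qemu-img` is available, the conversion command could be assembled as below. The file names are hypothetical, and the command is only printed, not executed.

```python
import shlex


def qemu_img_convert_cmd(src, dst, src_fmt="qcow2", dst_fmt="vmdk"):
    """Build a `qemu-img convert` command line (assumed tool, not named in the slides)."""
    # -f: format of the input image, -O: format of the output image
    return ["qemu-img", "convert", "-f", src_fmt, "-O", dst_fmt, src, dst]


# Hypothetical appliance image names, for illustration only.
cmd = qemu_img_convert_cmd("biocompute.qcow2", "biocompute.vmdk")
print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to review the command before handing it to `subprocess.run`.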

Page 9: Providing Bioinformatics Services on Cloud

Bioinformatics Appliances

Page 10: Providing Bioinformatics Services on Cloud

Select your bioinformatics tools

Page 11: Providing Bioinformatics Services on Cloud

Run Bioinformatics Cloud Instances

[Figure: appliances from the Bioinformatics Marketplace (Sequence, NGS, Structure, Galaxy, ARIA, (…)) are launched as instances on IBCP's cloud resources. Labels: Portal, Launch Instances, IaaS, PaaS, Master & Storage VM, Workers VM, Shared FS, ssh, launch jobs, BLAST, Clustal, etc., ARIA, CNS.]

Page 12: Providing Bioinformatics Services on Cloud

Manage your Cloud Instances

Page 13: Providing Bioinformatics Services on Cloud

Biological Data in Cloud

• Upload your data: scp, http/S3

• Get your results: scp, http/S3

[Figure: the Bioinformatics Cloud and its data. Labels: Public data sources (UNIPROT, PDB, EMBL, PROSITE, Genomes), shared (NFS); User persistent data, pdisk (iSCSI); Portal, IaaS, PaaS, Master & Storage VM, Workers VM, Shared FS, ssh, launch jobs, BLAST, Clustal, etc., ARIA, CNS.]
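
The slide names scp as one of the ways to upload data and fetch results. A minimal sketch of both directions follows, assuming a running instance reachable over SSH. The user name, host name and paths are hypothetical placeholders, and the commands are only assembled and printed.

```python
import shlex


def scp_upload_cmd(local_path, user, host, remote_dir):
    """Copy a local file into a directory on the cloud instance."""
    return ["scp", local_path, f"{user}@{host}:{remote_dir}/"]


def scp_download_cmd(user, host, remote_path, local_dir="."):
    """Copy a result file from the cloud instance to a local directory."""
    return ["scp", f"{user}@{host}:{remote_path}", local_dir]


# Hypothetical instance address and paths, for illustration only.
upload = scp_upload_cmd("reads.fastq", "root", "vm.example.org", "/data")
download = scp_download_cmd("root", "vm.example.org", "/data/results.tsv")
print(shlex.join(upload))
print(shlex.join(download))
```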

Page 14: Providing Bioinformatics Services on Cloud

Example: ‘biocompute’ Appliance

• Use your own instance(s)

• With pre-installed standard bioinformatics tools

• BLAST, FastA, SSearch, HMM, ...

• ClustalW2, Clustal-Omega, Muscle, ...

• Bowtie(2), BWA, samtools, ...

• MEME, R, etc.

• Connected to public reference data

• Uniprot, EMBL, genomes, PDB, etc.

• Automatically shared with the VMs
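
To illustrate how a pre-installed tool might be pointed at the shared reference data, here is a minimal sketch that assembles a protein BLAST search. It assumes the appliance ships the BLAST+ command-line suite (`blastp`); the database mount path and file names are hypothetical, since the slides do not say where the shared data appears inside the VM.

```python
import shlex


def blastp_cmd(query, db, out, threads=4, evalue=1e-5):
    """Build a BLAST+ `blastp` command with tabular output."""
    return [
        "blastp",
        "-query", query,          # FASTA file with the query sequences
        "-db", db,                # path prefix of a formatted BLAST database
        "-out", out,              # where to write the hits
        "-outfmt", "6",           # tab-separated table, one hit per line
        "-evalue", str(evalue),   # report only hits below this E-value
        "-num_threads", str(threads),
    ]


# Hypothetical location of the shared UniProt database inside the VM.
cmd = blastp_cmd("query.fasta", "/db/uniprot/uniprot_sprot", "hits.tsv", threads=8)
print(shlex.join(cmd))
```

Because the reference databases are shared with the VMs, only the query file and the small result table would need to move in or out of the cloud.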

Page 15: Providing Bioinformatics Services on Cloud

Example: Galaxy portal for NGS analyses

• Analyse NGS data

• The Galaxy portal is widely used in the community

• Connected to large public data: sequences and indexes

• Large user data (GBs)

• Preserve workflows and results (persistent storage)

Page 16: Providing Bioinformatics Services on Cloud

Example: Proteomics

• Motivation

• Collaboration with a mass spectrometry platform

• Running out of space on their local resources

• Protein identification

• Mass experimental data

• Reference databases: nr, Swiss-Prot

• Reference screening tools: OMSSA, X!Tandem

• User interface

• Remote display

• NX

• Reference GUIs

• SearchGUI

• PeptideShaker

source: PeptideShaker site

Page 17: Providing Bioinformatics Services on Cloud

Conclusion

• Provide turnkey bioinformatics appliances

• Standard tools and pipelines

• Interoperability: ready to run on cloud

• Easier to transfer appliances than data (GB vs TB)

• Provide a cloud infrastructure tightly connected to existing bioinformatics infrastructure

• IDB’s public bioinformatics cloud

• Linked to public biological databases

• In collaboration with the French Bioinformatics Institute

• Ease the usage by scientists

• Usual bioinformatics gateways

• Persistent and large ubiquitous storage

• Web interface for cloud management
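
The "GB vs TB" argument in the conclusion can be made concrete with a rough transfer-time estimate. This is a back-of-the-envelope sketch: the 5 GB appliance, the 5 TB dataset and the 1 Gbit/s link are illustrative assumptions, not figures from the slides, and protocol overhead is ignored.

```python
GB = 10**9   # bytes
TB = 10**12  # bytes


def transfer_hours(size_bytes, link_bits_per_s):
    """Ideal transfer time in hours for a given payload and link speed."""
    return size_bytes * 8 / link_bits_per_s / 3600


link = 10**9  # assumed 1 Gbit/s network link

appliance_h = transfer_hours(5 * GB, link)  # assumed appliance size
dataset_h = transfer_hours(5 * TB, link)    # assumed dataset size

print(f"appliance: {appliance_h * 60:.1f} min")
print(f"dataset:   {dataset_h:.1f} h")
```

Under these assumptions the appliance moves in under a minute while the dataset takes about eleven hours, which is the reason for shipping the tools to the data rather than the reverse.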

Page 18: Providing Bioinformatics Services on Cloud

Perspectives

• Define good practices to provide the academic community and industry with bioinformatics services!

• French Bioinformatics Institute - IFB

• Goals are to provide core bioinformatics resources to the national and international life science research community in key fields such as genomics, proteomics, systems biology, etc.

• Aims at building a national academic cloud devoted to Bioinformatics, inspired by the model evaluated through IDB’s cloud.

• European ELIXIR infrastructure

• To build a sustainable European infrastructure for biological information, supporting life science research and its translation

• IFB will be the French representative in ELIXIR.

Page 19: Providing Bioinformatics Services on Cloud

• Acknowledgment

• StratusLab members

• co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and by the French National Research Agency's Arpege Programme (ANR-10-SEGI-001).

Questions ?

http://idee-b.ibcp.fr

