Deployment of BioXSD-enabled services on a Cloud
Workshop for Web Service Providers in Bioinformatics, 2 June 2010, CBS, Lyngby
Outline
• IBCP, provider of BioXSD-enabled services
• Cloud Computing
• RENABI GRISBI, French infrastructure
Bioinformatics Integrated Tools
gbio-pbil.ibcp.fr/ws
GBIO WS resources
(figure: the biologist's machine sends HTTP/SOAP requests to the Web Services host, which dispatches jobs to virtual machines running on the computing nodes, physical machines virtualized with VMware ESX/Xen and sharing central storage over NFS/S3)
Web Services clients
• Protein sequence analysis at large scale with generic services
• User: obtain as much information as possible about their protein(s)
• Bioinformatics tools
  • use case with common software: BLAST, ClustalW, GOR4, CATH-Gene3D HMMscan, ...
• Biological data
  • analyzing large sets of proteins, obtained for example from Next Generation Sequencing
  • using international databases, e.g. UniProt
EMBRACE use case «Usage of Generic WS»
(figure: a master splits the user's protein sequences across 10s to 1,000s of protein-similarity tool instances (BLAST, FastA, SSearch), which read the Swiss-Prot and TrEMBL knowledge bases (MB to 100s of GB, read-only) from shared storage; the results are then gathered)
https://bioinformatics.bmc.uu.se/WP4/content/view/125/50/
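The split/run/gather pattern of this use case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the EMBRACE implementation: sequences arrive as FASTA text, the master deals them round-robin into chunks, and `run_tool` is a hypothetical stand-in for a call to a similarity-search Web service such as BLAST (here it only returns sequence lengths).

```python
from concurrent.futures import ThreadPoolExecutor


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split(records, n_chunks):
    """Master step: deal records round-robin into at most n_chunks chunks."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return [chunk for chunk in chunks if chunk]  # drop empty chunks


def run_tool(chunk):
    """Hypothetical tool call; a real one would invoke BLAST, FastA or SSearch."""
    return [(header, len(seq)) for header, seq in chunk]


def gather(chunks, max_workers=4):
    """Run the tool on every chunk in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(run_tool, chunks))
    return sorted(hit for part in partial_results for hit in part)


fasta = ">p1\nMKT\nAYI\n>p2\nGSH\n>p3\nMA\n"
print(gather(split(parse_fasta(fasta), 2)))  # [('p1', 6), ('p2', 3), ('p3', 2)]
```

Threads are enough here because each real tool call would spend its time waiting on a remote service; the number of chunks is the knob that scales from 10s to 1,000s of parallel jobs.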
BioXSD Services
Benefits of Controlled Vocabulary, Ontology and Schema
• verify the quality of the input/output data
  • expected data type: integer, string, ...
  • expected value: size, allowed range, ...
  • identify abnormal data
• verify the format of the input data
  • return a detailed error message: what is wrong in the format
  • convert the data into the right format
• evaluate the complexity/cost of the data
  • input data and its transfer/computation cost (time and €)
  • output data and its storage/transfer cost (time and €)
• workflow
  • group different processes according to the data
• make the Web services «user-friendly»
• Web services selection
  • BioCatalogue, Seekda
• Web services composition
  • Taverna, Triana, ...
  • switch to another provider in case of failure
• Web services customization
  • let users map/connect to their own ontology
• rich plugins to input data
  • specific forms with data converters and adequate help/documentation about the data to input, on different hardware/supports
  • data conversion in workflows
• rich plugins to display data
Providers / Users
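As a sketch of the first two benefits (checking expected types and values, and returning a detailed error message rather than a bare failure), here is a small validator for a protein sequence input. The allowed alphabet and the length bounds are illustrative assumptions, not constraints taken from BioXSD or EDAM; in a schema-driven service these rules would come from the schema itself.

```python
# Illustrative constraints (assumptions, not BioXSD rules).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYX")  # 20 standard residues + X (unknown)


def validate_protein(seq, min_len=10, max_len=10_000):
    """Check type, length range and alphabet of a protein sequence.

    Returns a list of human-readable error messages; an empty list means
    the input passed every check.
    """
    if not isinstance(seq, str):
        return [f"expected a string, got {type(seq).__name__}"]
    errors = []
    if not min_len <= len(seq) <= max_len:
        errors.append(
            f"length {len(seq)} outside allowed range [{min_len}, {max_len}]"
        )
    for position, residue in enumerate(seq.upper(), start=1):
        if residue not in AMINO_ACIDS:
            errors.append(f"invalid residue {residue!r} at position {position}")
    return errors


print(validate_protein("MKT1AYIAKQR"))  # ["invalid residue '1' at position 4"]
```

Collecting every problem, instead of stopping at the first one, is what lets the service tell the user exactly what is wrong with the input.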
Cloud?
Cloud: principles
(figure: Cloud Computing; the biologist's machine reaches the cloud portal and Web services over HTTP; these run on a virtual infrastructure hosted by the cloud nodes that form the physical infrastructure of a cloud site)
Cloud: middlewares
• Open source
  • HIPerNet (HIPCAL project)
  • Eucalyptus (Amazon EC2/S3-like)
  • OpenNebula (OGF, StratusLab EU FP7)
  • others: Nimbus, CloudStack, ...
• Commercial
  • Amazon EC2
  • Google Apps
  • IBM
  • Microsoft
  • Yahoo, ...
Cloud Comparison
Cloud Console
Deploying on Cloud
(figure: the same protein-similarity workflow, in which a master splits the user's sequences across 10s to 1,000s of tool instances (BLAST, FastA, SSearch) reading the Swiss-Prot and TrEMBL knowledge bases (MB to 100s of GB, read-only) from shared storage before the results are gathered, deployed as a virtual infrastructure on IBCP's cloud of 5 servers, 40 cores and 160 GB; the biologist's machine reaches the VMs over SSH and Web services)
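With 40 cores on IBCP's cloud, 1,000s of tool instances cannot all run at once. A back-of-the-envelope sketch, assuming every job uses a fixed number of cores and all chunks take the same time (both assumptions, not figures from the slides), estimates how many successive waves of jobs are needed:

```python
import math


def schedule_waves(n_chunks, total_cores=40, cores_per_job=1):
    """Number of successive waves needed to run n_chunks jobs.

    Assumes every job uses cores_per_job cores and takes the same time.
    """
    slots = total_cores // cores_per_job  # jobs that can run concurrently
    if slots == 0:
        raise ValueError("one job needs more cores than the cloud provides")
    return math.ceil(n_chunks / slots)


def wall_time_minutes(n_chunks, minutes_per_chunk, **kwargs):
    """Rough makespan: number of waves times the duration of one chunk."""
    return schedule_waves(n_chunks, **kwargs) * minutes_per_chunk


print(schedule_waves(1000))        # 25
print(wall_time_minutes(1000, 3))  # 75
```

The estimate ignores VM start-up, data staging and uneven chunk sizes, so it is a lower bound rather than a prediction.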
Bioinformatics Services on a CLOUD
Steps: Connect to Cloud, Choose VMs, Deploy VMs, Submit SOAP, Process SOAP, Results, Compute job (slide annotation: already done by IBCP)
(figure: the biologist's machine (steps 1 and 5) talks HTTP/SOAP to the bioinformatics virtual apps; 2. biological data are registered in cloud storage (S3); 3. jobs run on virtual machines hosted by the computing nodes, physical machines virtualized with VMware ESX/Xen and sharing central storage over NFS/S3; 4. results are retrieved; a cloud console manages the VMs)
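The "Submit SOAP" step can be illustrated with a minimal sketch that builds a SOAP 1.1 envelope using only the Python standard library. The service namespace `urn:example:gbio`, the operation `runBlast` and its parameter names are hypothetical placeholders, not the actual GBIO interface; a real client would take them from the service's WSDL and POST the envelope over HTTP.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:gbio"  # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("ws", SERVICE_NS)


def build_soap_request(operation, params):
    """Wrap an operation and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def extract_field(xml_text, field):
    """Return the text of the first service element named field, or None."""
    node = ET.fromstring(xml_text).find(f".//{{{SERVICE_NS}}}{field}")
    return None if node is None else node.text


# Hypothetical operation and parameter names.
request = build_soap_request("runBlast", {"sequence": "MKTAYIAKQR", "database": "swissprot"})
print(request)
```

The same `extract_field` helper could read a job identifier or result field out of the service's response envelope, provided the response uses the same namespace.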
Perspectives
Steps: Connect to Cloud, Choose VMs, Deploy VMs, Submit SOAP, Process SOAP, Results, Compute job
(figure: 0. the biologist's machine connects over HTTP/S and SSH to a cloud console (EC2, ONE, hipcal) that deploys bioinformatics virtual appliances as VMs; 1. biological data are registered in cloud storage (S3); 2. HTTP/SOAP requests reach the bioinformatics virtual apps, whose jobs run on virtual machines hosted by the computing nodes (VMware ESX/Xen, central storage over NFS/S3); 3. results are retrieved)
What next?
• Platform-as-a-Service
  • provide scientists with pre-defined VMs to deploy
    • on the Research Infrastructure, e.g. RENABI GRISBI
    • on their own computer/cloud
  • bioinformatics centers become providers of virtual appliances
• Infrastructure-as-a-Service
  • users can connect to the community cloud to deploy the required VMs
  • deploy the required infrastructure according to the workflow:
    • VXDL/VPXI, developed by the HIPCAL project
    • Haizea with OpenNebula.org
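Deploying infrastructure "according to the workflow" amounts, at its simplest, to turning each workflow step's parallelism into a VM count. The sketch below is a simplified illustration under stated assumptions (all VMs offer the same number of cores, each tool appears once in the workflow, all steps run concurrently); the `Step` and `plan_vms` names are hypothetical, and this is not how VXDL/VPXI or Haizea describe or schedule resources.

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    """One workflow step (hypothetical description format)."""
    tool: str           # e.g. a virtual appliance name
    parallel_jobs: int  # jobs this step runs at the same time
    cores_per_job: int


def plan_vms(workflow, cores_per_vm=8):
    """Map each tool to the number of identical VMs its step needs.

    Assumes each tool appears once and all steps run concurrently.
    """
    plan = {}
    for step in workflow:
        if step.cores_per_job > cores_per_vm:
            raise ValueError(f"{step.tool}: a job does not fit in one VM")
        jobs_per_vm = cores_per_vm // step.cores_per_job
        plan[step.tool] = math.ceil(step.parallel_jobs / jobs_per_vm)
    return plan


workflow = [Step("blast", parallel_jobs=100, cores_per_job=1),
            Step("clustalw", parallel_jobs=4, cores_per_job=2)]
print(plan_vms(workflow))  # {'blast': 13, 'clustalw': 1}
```

A real planner would also account for step ordering (VMs freed by one step can be reused by the next) and for network and storage requirements, which is what dedicated description languages add.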
GRISBI - Grid, Support to Bioinformatics, www.grisbio.fr
GRISBI - Grid Support to Bioinformatics
• National Research Infrastructure
  • RENABI, IBISA 2008-2010, Institut des Grilles 2009-2010
• 6 centers from RENABI
  • PRABI, MIGALE, GenOuest, CBIB Bordeaux, BIPS, CIB
• 8 sites, including 7 CNRS institutes: IBCP Lyon, SBR Roscoff, CBiB Bordeaux, CIB Lille, IRISA Rennes, LBBE Lyon, MIGALE Jouy-en-Josas, BIPS Strasbourg
• 40 participants
• Computing resources
  • 1200 cores, 220 TB storage
Goal: make challenging bioinformatics applications dealing with large-scale biological systems possible
(map © RENABI GRISBI 2009, www.grisbio.fr: 6 centers (CIB, BIPS, GenOuest, Migale, PRABI, CBiB) with 1000 cores, 2 TB RAM and 150 TB of storage)
GRISBI site
(figure: at each site, grid services (UI, SE, CE) run on virtual machines hosted by the computing nodes, physical machines virtualized with VMware ESX)
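GRISBI Infrastructure
(map of the GRISBI sites CIB, BIPS, GenOuest, Migale, PRABI and CBiB, annotated with per-site resources: 412 cores/75 TB, 48 cores/30 TB, 128 cores/1.5 TB, 216 cores/62 TB, 376 cores/50 TB and 32 cores/5 TB, for a total of 1212 cores and 224 TB; further annotations: 150 cores/10 TB, 15 cores/1 TB (06/2010), 136 cores/3.2 GB, 426 cores/17 TB, 120 cores/2 TB, 192+2 (06/2010), 20 cores/5 TB, and the dates 09/2010 and 08/2010)
+ Core services (IDG): WMS (GRIF), LFC & VOMS (CC)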
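As a consistency check on the infrastructure map, the first six core/storage pairs printed on it (412 c/75 TB, 48 c/30 TB, 128 c/1.5 TB, 216 c/62 TB, 376 c/50 TB, 32 c/5 TB) add up to the total shown there, 1212 cores and 224 TB once 223.5 TB is rounded. Which pair belongs to which site is not recoverable from the transcript, so the list below is deliberately unlabeled:

```python
# Core and storage pairs as printed on the GRISBI infrastructure map
# (site labels omitted: the pairing with site names is not recoverable).
SITE_RESOURCES = [(412, 75.0), (48, 30.0), (128, 1.5),
                  (216, 62.0), (376, 50.0), (32, 5.0)]  # (cores, TB)

total_cores = sum(cores for cores, _ in SITE_RESOURCES)
total_tb = sum(tb for _, tb in SITE_RESOURCES)
print(total_cores, total_tb)  # 1212 223.5
```

These totals also agree with the "1200 cores, 220 TB storage" quoted in the GRISBI overview.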
GRISBI use case
This use case will be used tomorrow as hands-on material by the LaBRI team (a RENABI GRISBI partner).
Conclusions
• Standard integration of bioinformatics tools
  • freely available to the community
  • tools annotated with BioXSD/EDAM
• Cloud perspective
  • use standard infrastructures
  • infrastructure managed through a Web interface
  • Platform- and Infrastructure-as-a-Service
• Perspectives
  • Research Infrastructures based on public/private clouds
  • provide the community with bioinformatics Virtual Appliances
Acknowledgment
CNRS - Centre National de la Recherche Scientifique
University of Lyon 1
The ANR through the HIPCAL project
GIS IBISA through the project GRISBI PF 2008
The European Commission through the EU FP7 EGEE III project, contract number INFSO-RI-222667.
CNRS IBCP: E. Bettler, C. Combet, G. Deléage, C. Eloto, C. Gauthey, A. Joseph, A. Michon, J. Pessey, F. Penin
HIPCAL: Pascale Vicat-Blanc and partners
EMBRACE partners
GRISBI partners
CNRS IBCP, Institute of Biology and Chemistry of Proteins, 7 passage du Vercors, 69007 LYON, FRANCE
Thanks
• Questions?