Activities of the COST D37 GridChem Computational Chemistry Workflow Group. EGEE'07 Conference, Budapest, 01.10.2007. Partners in the CCWF Working Group: Thomas Steinke, Tim Clark (DE); Hans-Peter Lüthi, Martin Brändle (CH); Peter Murray-Rust, Henry Rzepa (UK).
Thomas Steinke
Zuse Institute Berlin (ZIB) <www.zib.de>
Activities of the COST D37 GridChem Computational Chemistry Workflow Group
EGEE'07 Conference
Budapest
01.10.2007
Partners in the CCWF Working Group
• Thomas Steinke, Tim Clark (DE)
• Hans-Peter Lüthi, Martin Brändle (CH)
• Peter Murray-Rust, Henry Rzepa (UK)
• Antonio Márquez (ES)
• Kurt Mikkelsen (DK)
Sites on the map: Berlin, Manno, Erlangen, London, Sevilla, Zürich, Cambridge, København
- CSCS (Manno, CH)
- ZIB (Berlin, DE)
“Traditional” Workflow in Computational Chemistry
Workflows have a long tradition in the CC domain.
in the 80’s – 90’s:
• start
• knowledge base (DB search)
• automated/manually edited molecular structures
• molecular simulations: method / program A, method / program B, …
• properties
• primary visualization / quality control
• analysis / archival / DB storage
• new insights?
Databases: Computational protocol (T. Clark, 1998)
Complete protocol runs automatically with less than 0.5% failure rate.
Cleanup → 2D → 3D conversion → VAMP optimization → Calculate properties
~3,000 compounds per processor day (3 GHz Xeon)
Enhanced 3D-Databases: A Fully Electrostatic Database of AM1-Optimized Structures. B. Beck, A. Horn, J. E. Carpenter, and T. Clark, J. Chem. Inf. Comput. Sci. 1998, 38, 1214-1217.
source: Tim Clark, Uni Erlangen
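The throughput and failure-rate figures on this slide can be turned into rough planning numbers. The sketch below is plain arithmetic on the two quoted values (~3,000 compounds per processor day, less than 0.5% failures); the function names and the database size in the demo are hypothetical and not part of the original protocol.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_compound(compounds_per_day: float = 3000.0) -> float:
    """Average wall time per compound on one processor."""
    return SECONDS_PER_DAY / compounds_per_day


def processor_days(n_compounds: int, compounds_per_day: float = 3000.0) -> float:
    """Processor days needed to push n_compounds through the protocol."""
    return n_compounds / compounds_per_day


def failure_upper_bound(n_compounds: int, failure_rate: float = 0.005) -> float:
    """Upper bound on failed compounds, given a failure rate below 0.5%."""
    return n_compounds * failure_rate


if __name__ == "__main__":
    n = 100_000  # hypothetical database size, chosen only for illustration
    print(f"{seconds_per_compound():.1f} s per compound")
    print(f"{processor_days(n):.1f} processor days for {n} compounds")
    print(f"fewer than {failure_upper_bound(n):.0f} failures expected")
```

At the quoted rate a single compound takes about 29 seconds on average, so the manual follow-up of failures, rather than the compute time, is likely what limits such a campaign.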
Distributed Computing Environment in the 90’s
QM packages
Distributed Computing Environment in the 90’s
Example: UniChem
distributed environment for quantum-chemical simulations
Cray Research Inc. 1991-(2004)
CCWF Chemical Illustrator Applications
• Molecular design of functionalised enzynes: Hans-Peter Lüthi, Martin Brändle, Zürich; Peter Murray-Rust, Cambridge; Henry Rzepa, London
• Quantum chemical based QSAR/QSPR: Tim Clark, Erlangen; Jon Essex, Southampton
• High-order dynamic and static electrostatic molecular properties: Kurt Mikkelsen, Copenhagen
• Computational heterogeneous catalysis: Antonio M. Márquez Cruz, Javier Fdez. Sanz, Sevilla
Molecular Design Workflow (Enzyne Design)
Steps:
• Generation and archiving of data
• Extraction: XPath queries
• Statistical analysis
Diagram components: DB, QC Input, QC Application, QC Output, Input, Output, Parser, XML, XPath Query, XSLT, Statistical Analysis
source: Hans-Peter Lüthi, ETH Zürich
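The extraction step on this slide (XPath queries over XML-encoded QC output, followed by statistical analysis) might look roughly like the sketch below. The XML layout, element names, attribute names and numbers are invented for illustration; the slide does not show the actual schema. Only the limited XPath subset of Python's standard `xml.etree.ElementTree` is used.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical archive of parsed QC output; all values are made up.
ARCHIVE_XML = """
<archive>
  <calculation id="mol-001" method="AM1">
    <property name="total_energy" units="eV">-12.0</property>
    <property name="dipole" units="debye">1.5</property>
  </calculation>
  <calculation id="mol-002" method="AM1">
    <property name="total_energy" units="eV">-13.0</property>
    <property name="dipole" units="debye">2.5</property>
  </calculation>
  <calculation id="mol-003" method="AM1">
    <property name="total_energy" units="eV">-11.0</property>
  </calculation>
</archive>
"""


def extract_property(xml_text, name):
    """Collect one named property from every calculation via an XPath query."""
    root = ET.fromstring(xml_text.strip())
    nodes = root.findall(f".//calculation/property[@name='{name}']")
    return [float(node.text) for node in nodes]


def summarize(values):
    """Minimal statistical analysis of the extracted values."""
    spread = statistics.stdev(values) if len(values) > 1 else 0.0
    return {"n": len(values), "mean": statistics.mean(values), "stdev": spread}


if __name__ == "__main__":
    energies = extract_property(ARCHIVE_XML, "total_energy")
    print(summarize(energies))
```

Keeping the query separate from the statistics mirrors the slide's split between extraction and analysis: a different property is a different query string, not a different parser.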
Quantum Chemical Based QSAR and QSPR
Pipeline:
• 2D-Database
• 2D → 3D (conformations, tautomers): generate structures, conformations and protonation states
• VAMP: semiempirical MO geometry optimization and electron density
• ParaSurf: generate isodensity surfaces, spherical-harmonic fits and local properties
• QSPR: apply models
Application areas: Virtual Screening, ADME/Tox., Pharmacokinetics, Molecular Info, Materials Design, Multiscale Modeling, Property Optimization
source: Tim Clark, Uni Erlangen
Properties: Free Energies of Hydration
[Plot: calculated vs. experimental Gsolv(H2O), both axes in kcal mol-1, range -14 to 4]
N = 362
MUE = 0.85 kcal mol-1
RMSD = 1.09 kcal mol-1
r2 = 0.88
q2 = 0.83
source: Tim Clark, Uni Erlangen
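For readers unfamiliar with the quality measures quoted on this slide, the sketch below computes MUE (mean unsigned error), RMSD and r2 for paired experimental and calculated values. It assumes r2 is the squared Pearson correlation coefficient, which the slide does not state, and it does not attempt q2, which requires cross-validation of the underlying model. The numbers in the demo are invented.

```python
import math


def mue(experimental, calculated):
    """Mean unsigned error between calculated and experimental values."""
    pairs = list(zip(experimental, calculated))
    return sum(abs(c - e) for e, c in pairs) / len(pairs)


def rmsd(experimental, calculated):
    """Root-mean-square deviation between calculated and experimental values."""
    pairs = list(zip(experimental, calculated))
    return math.sqrt(sum((c - e) ** 2 for e, c in pairs) / len(pairs))


def r_squared(experimental, calculated):
    """Squared Pearson correlation (an assumption about how r2 was defined)."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_c = sum(calculated) / n
    cov = sum((e - mean_e) * (c - mean_c) for e, c in zip(experimental, calculated))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    return cov * cov / (var_e * var_c)


if __name__ == "__main__":
    # Invented hydration free energies in kcal/mol, for illustration only.
    exp = [-6.0, -4.0, -2.0, 0.0]
    calc = [-5.0, -4.5, -2.5, 1.0]
    print(f"MUE  = {mue(exp, calc):.2f}")
    print(f"RMSD = {rmsd(exp, calc):.2f}")
    print(f"r2   = {r_squared(exp, calc):.2f}")
```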
Computing the NCI database (P. Murray-Rust, ’05)
MOPAC PM5
source: Peter Murray-Rust et al., Uni Cambridge / Unilever Institute
Workflow built with Taverna
Times to run jobs
[Plot: time / s (0 to 120,000) against (n basis functions)^4 (0 to 1.E+09)]
source: Peter Murray-Rust et al., Uni Cambridge / Unilever Institute
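Plotting run time against the fourth power of the number of basis functions suggests an approximately linear relation t ≈ c·n⁴. Under that assumption, a prefactor fitted to a few finished jobs can be used to estimate the run time of new ones, as sketched below. The function names and the timing data are hypothetical; the slide's actual data points are not recoverable from the transcript.

```python
def fit_prefactor(n_basis, times):
    """Least-squares fit of t = c * n**4 through the origin; returns c."""
    x = [float(n) ** 4 for n in n_basis]
    return sum(xi * ti for xi, ti in zip(x, times)) / sum(xi * xi for xi in x)


def predict_time(prefactor, n_basis):
    """Estimated run time in seconds for a job with n_basis basis functions."""
    return prefactor * float(n_basis) ** 4


if __name__ == "__main__":
    # Hypothetical timings in seconds, generated from c = 1e-4.
    sizes = [50, 100, 150]
    timings = [625.0, 10000.0, 50625.0]
    c = fit_prefactor(sizes, timings)
    print(f"c = {c:.2e} s per (basis function)^4")
    print(f"estimated time for 120 basis functions: {predict_time(c, 120):.0f} s")
```

On this reading, the axis maximum of 1.E+09 corresponds to roughly 178 basis functions, since 178⁴ ≈ 1.0 × 10⁹.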
Protocol
Log Files
Parse
System Crashes
Science Errors
Analysis
Pathological Behaviour
Statistics
Other Science
Disseminate Results
Unsuitable Data
Program Crashes
Inform Developer
source: Peter Murray-Rust et al., Uni Cambridge / Unilever Institute
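The log-parsing step of this protocol sorts each job into one of the failure categories named on the slide. A minimal sketch is given below; the marker strings are hypothetical placeholders, since real programs print their own messages, and the mapping from marker to category is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical marker strings; real QC programs use different messages.
RULES = [
    ("system_crash", ("Segmentation fault", "No space left on device")),
    ("program_crash", ("INTERNAL ERROR",)),
    ("unsuitable_data", ("UNSUPPORTED ELEMENT",)),
    ("science_error", ("SCF NOT CONVERGED",)),
]


def classify_log(text):
    """Return the first matching category, 'ok' for clean runs, else 'unknown'."""
    for category, markers in RULES:
        if any(marker in text for marker in markers):
            return category
    if "JOB FINISHED" in text:
        return "ok"
    return "unknown"


def tally(logs):
    """Count categories over a mapping of job name to log text."""
    return Counter(classify_log(text) for text in logs.values())


if __name__ == "__main__":
    logs = {
        "mol-001": "step 1\nJOB FINISHED\n",
        "mol-002": "step 1\nSCF NOT CONVERGED\n",
        "mol-003": "Segmentation fault\n",
    }
    print(tally(logs))
```

Jobs that end up as "unknown" are the ones a human still has to inspect, which matches the slide's point that machines highlight cases for people to consider.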
source: Peter Murray-Rust et al., Uni Cambridge / Unilever Institute
Conclusions from NCI “Experiment” (2005)
Protocols can be automated
Machines can highlight unusual behaviour, geometries and distribution of results for humans to consider
Computational programs can provide high quality “experimental” molecular properties
Motivation
The orchestration of complex workflow scenarios is on today’s agenda.
• complex scientific solution paths linking in-house and (commercial) legacy codes
• transformation of scientific ventures into a scientifically validated protocol, allowing highly (semi-)automated data generation (pre-processing) and data processing steps
Goals of the CCWF Working Group
• implementation of workflow environments for QC by adapting standard (Grid) technologies
• fostering standard techniques (interfaces) for handling quantum chemical data in a flexible and extensible format, to ensure interoperability of application programs and to support efficient access to chemical information based on a CC ontology
• implementation of computational chemistry illustrator scenarios to demonstrate the applicability of our approach
Generic Workflow
1. Automatic generation + validation of input data
2. Submission, monitoring, and gathering of output data of simulation jobs
3. Integration of results (primary data) into project database
4. Data mining and visualization techniques to reduce complexity
5. Knowledge generation by applying methods of statistical analysis and pattern recognition.
6. On-line publication and archiving of valuable scientific data.
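The six steps above can be sketched as a chain of stages that pass a shared state along. This is only a toy skeleton under stated assumptions: every stage body is a placeholder (the "simulation" simply returns the length of the molecule name), and all function names are hypothetical. A real implementation would call a workflow engine and QC programs at these points.

```python
def generate_inputs(state):
    # Step 1: generate and validate inputs (here: drop empty names).
    valid = [m for m in state["molecules"] if m.strip()]
    return {**state, "inputs": valid}


def run_jobs(state):
    # Step 2: placeholder for job submission, monitoring and output gathering.
    outputs = {m: float(len(m)) for m in state["inputs"]}
    return {**state, "outputs": outputs}


def store_results(state):
    # Step 3: integrate primary data into the project "database" (a dict).
    database = dict(state.get("database", {}))
    database.update(state["outputs"])
    return {**state, "database": database}


def mine_data(state):
    # Step 4: reduce complexity, here by ranking entries by their value.
    ranked = sorted(state["database"], key=state["database"].get)
    return {**state, "ranked": ranked}


def analyse(state):
    # Step 5: stand-in for statistical analysis and pattern recognition.
    values = list(state["database"].values())
    return {**state, "mean": sum(values) / len(values)}


def publish(state):
    # Step 6: stand-in for on-line publication and archiving.
    report = f"{len(state['database'])} results, mean {state['mean']:.2f}"
    return {**state, "report": report}


STAGES = [generate_inputs, run_jobs, store_results, mine_data, analyse, publish]


def run_workflow(molecules):
    """Run all six stages in order and record which stages completed."""
    state = {"molecules": molecules, "history": []}
    for stage in STAGES:
        state = stage(state)
        state["history"].append(stage.__name__)
    return state


if __name__ == "__main__":
    print(run_workflow(["water", "methane", ""])["report"])
```

Recording the completed stages in `history` is one simple way to make a run restartable, since a failed campaign can resume from the first stage that is missing.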
Challenges
Diversity: Molecular properties derived from state functions obtained with electronic-structure methods.
• ab-initio, semi-empirical, DFT, approximate potentials
• Gaussian, COLUMBUS, Dalton, Turbomole, MOPAC, Vamp, CPMD, …
Data formats: How to implement seamless data export/import?
• ~80 relevant formats known in CC: XYZ, MDL, SDF, PDB, …
• OpenBABEL
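To give a feel for what data export/import involves, here is a minimal reader and writer for the XYZ format, the simplest of the formats listed (atom count, comment line, then one "symbol x y z" line per atom). This is a hand-written sketch and not the OpenBABEL API; in practice a conversion library would handle this and the many other formats. The water geometry in the demo is approximate and only illustrative.

```python
def parse_xyz(text):
    """Parse XYZ text into (comment, [(symbol, x, y, z), ...])."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    comment = lines[1] if len(lines) > 1 else ""
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms


def write_xyz(comment, atoms):
    """Serialize atoms back to XYZ text with six decimals per coordinate."""
    rows = [str(len(atoms)), comment]
    rows += [f"{s:<2} {x:12.6f} {y:12.6f} {z:12.6f}" for s, x, y, z in atoms]
    return "\n".join(rows) + "\n"


WATER_XYZ = """3
water (approximate geometry, illustration only)
O 0.000 0.000 0.117
H 0.000 0.757 -0.467
H 0.000 -0.757 -0.467
"""

if __name__ == "__main__":
    comment, atoms = parse_xyz(WATER_XYZ)
    print(write_xyz(comment, atoms))
```

Even this trivial format needs a consistency check (declared versus actual atom count), which hints at why seamless conversion across ~80 formats is listed as a challenge.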
Challenges (cont.)
Scaling, Robustness, Load Balancing: I can handle O(10) jobs by hand, but what about campaigns of O(1000) jobs?
• workflow system
• computational resources, distributed computing
• persistence, automated failure recovery, …
• long simulation times, sometimes unpredictable
Acceptance: ease of use, GUI + CLI
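The step from O(10) hand-managed jobs to O(1000) is largely a matter of automating retries and bookkeeping. The sketch below runs a campaign on a local thread pool with a simple retry policy. It is an illustration under stated assumptions, not how any of the workflow systems discussed here work: `run_with_retry`, `run_campaign` and the flaky demo worker are hypothetical, and a real campaign would submit to a batch system or Grid middleware instead of calling a Python function.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_retry(job, worker, max_attempts=3):
    """Call worker(job) up to max_attempts times and record the outcome."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = worker(job)
            return {"job": job, "status": "done", "attempts": attempt, "result": result}
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
    return {"job": job, "status": "failed", "attempts": max_attempts, "error": str(last_error)}


def run_campaign(jobs, worker, max_attempts=3, max_workers=8):
    """Run all jobs concurrently; results come back in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda j: run_with_retry(j, worker, max_attempts), jobs))


if __name__ == "__main__":
    seen = set()

    def flaky_square(job):
        # Demo worker: every third job fails once before succeeding.
        if job % 3 == 0 and job not in seen:
            seen.add(job)
            raise RuntimeError("transient failure")
        return job * job

    results = run_campaign(range(1000), flaky_square)
    done = sum(r["status"] == "done" for r in results)
    retried = sum(r["attempts"] > 1 for r in results)
    print(f"{done} of {len(results)} jobs done, {retried} needed a retry")
```

Returning a status record per job instead of raising keeps one failed job from aborting the whole campaign, and the records give the persistence layer something concrete to store.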
What I Want…
• ease of use: workflow orchestration, usage, installation / maintenance
• sharing of workflow descriptions with my colleagues: standard languages
• support in a heterogeneous environment: laptop – server – cluster – supercomputer – grid
Which Workflow System?
… to be spoilt for choice?
Some Assessment Criteria
• workflows in distributed systems
• supported batch systems: PBS (, LSF)
• support for managing large files
• recovery / backup
• quality of the documentation
• customizability
• PKI / security
• required installation effort
• Web interface
• WF language
• robustness, stability
• Grid environment
• open source
• restart/stop/debugging
• user/installation base
• status & exception handling
• legacy codes and Web services
• project development activity
• GUI
TRIANA Experiences (2005/06)
• workflow orchestration
• integration of web services
• semantic check of WSDL files
• support for self-written Triana modules
• negligible control logic overhead
• pre-requisite for migration to Grid environments
- proprietary workflow description language in TRIANA (BPEL is announced)
- GUI robustness for very complex workflow definitions
GWES Experiences (MediGRID, since 2006)
integration of web services and legacy codes
monitoring + debugging support
Grid environments
under active development (A. Hoheisel et al./FhG FIRST)
- workflow orchestration (WF GUI builder in preparation)
- proprietary workflow description language
OMII Server: Attractive Features
Workflows
• language: BPEL (Active BPEL)
• WF editor (Eclipse)
• Web Services customization
Job submission & monitoring via WS job manager API
• persistent (job recovery), in-memory (via Hibernate)
Distributed Resource Management (DRM)
• Condor-G, Globus GRAM
• SSH-exec
• your own plug-ins, e.g. PBS
Data
• GridSAM file staging support within job (JSDL): file stage in/out
• Apache Virtual File System library (vfs): FTP, local files, http, https, sftp; zip, jar, tar, bzip2, gzip; ram (data in memory)
• GridFTP
OMII/Active BPEL Experiences (3 months)
• workflow orchestration (Eclipse plugin)
• standardized WF language
• monitoring support
• Grid environments
• security features: https + signed messages (X.509 cert.)
• active development (UK eScience)
- deployment requires manual workarounds
- learning barrier (BPEL)
- BPEL editor not fully mature (validation of BPEL workflows)
Summary
• there are a couple of workflow systems available
• design/development of workflow systems is still ongoing research
• not yet decided for our working group
• barriers: ease of use vs. robustness
• middleware stack: more complicated Grid environments vs. script-based approaches on clusters
• standards vs. proprietary but powerful/sufficient WF languages
• BPEL has a high chance to survive
Acknowledgement
Core members of D37 CCWF working group
• Hans-Peter Lüthi, ETH Zurich
• Tim Clark, CCC Uni Erlangen
• J. A. Townsend, P. Murray-Rust, S. M. Tyrrell, Y. Zhang, Uni Cambridge/Unilever Inst.
developers of the workflow systems mentioned in this talk
QUESTIONS?