Page 1: Protein Folding Landscapes in a Distributed Environment

NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE

Protein Folding Landscapes in a Distributed Environment

All Hands Meeting, 2001

University of Virginia: Andrew Grimshaw, Anand Natrajan

Scripps (TSRI): Charles L. Brooks III, Michael Crowley

SDSC: Nancy Wilkins-Diehr

Page 2: Protein Folding Landscapes in a Distributed Environment

Outline

• CHARMM
  – Issues
• Legion
• The Run
  – Results
  – Lessons
• AmberGrid
• Summary

Page 3: Protein Folding Landscapes in a Distributed Environment

CHARMM

• Routine exploration of folding landscapes helps in the search for a solution to the protein folding problem

• Understanding folding is critical to structural genomics, biophysics, drug design, etc.

• Key to understanding cell malfunctions in Alzheimer’s, cystic fibrosis, etc.

• CHARMM and Amber benefit the majority (>80%) of bio-molecular scientists

• Structural genomics & protein structure prediction

Page 4: Protein Folding Landscapes in a Distributed Environment

Folding Free Energy Landscape: Molecular Dynamics Simulations

100-200 structures to sample (r, Rgyr) space

[Figure: folding free energy landscape plotted over r and Rgyr]
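The slides do not spell out how the landscape itself is computed; as a hedged aside (an assumption, not stated in the deck), such surfaces are typically estimated from the probability density sampled over the two coordinates,

    F(r, Rgyr) = -kB T ln P(r, Rgyr) + constant,

with P(r, Rgyr) accumulated from the 100-200 sampling runs (e.g., by umbrella sampling with a reweighting step such as WHAM).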

Page 5: Protein Folding Landscapes in a Distributed Environment

Application Characteristics

• Parameter-space study
  – Parameters correspond to structures along & near the folding path
• Path unknown - there could be many paths, or a broad one
  – Many places along the path are sampled to determine local low free energy states
  – The path is the valley of lowest free energy states, from the high free energy state of the unfolded protein to the lowest free energy state (the folded native protein)

Page 6: Protein Folding Landscapes in a Distributed Environment

Folding of Protein L

• Immunoglobulin-binding protein
  – 62 residues (small), 585 atoms
  – 6500 water molecules, 20085 atoms in total
  – Each parameter point requires O(10^6) dynamics steps
  – Typical folding surfaces require 100-200 sampling runs
• CHARMM using the most accurate physics available for classical molecular dynamics simulation
  – PME, 9 Å cutoff, heuristic list update, SHAKE
• Multiple 16-way parallel runs - maximum efficiency

Page 7: Protein Folding Landscapes in a Distributed Environment

Application Characteristics

• Many independent runs
  – 200 sets of data to be simulated in two sequential runs
    • Equilibration (4-8 hours)
    • Production/sampling (8-16 hours)
• Each point has a task name, e.g., pl_1_2_1_e (see the sketch below)
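To make the task structure concrete, here is a minimal Python sketch of how the ~200 parameter points and their two sequential stages could be enumerated. It is illustrative only: the deck does not show the real scripts, and the meaning of the indices in names like pl_1_2_1_e (and the grid sizes below) are assumptions.

    # Hypothetical sketch of task-spec generation for the CHARMM landscape run.
    # Assumptions (not from the slides): the indices in "pl_1_2_1_e" index grid
    # points along the two sampling coordinates plus a replica number, and the
    # trailing "_e"/"_p" mark the equilibration and production stages.
    import itertools
    import os

    COORD1_POINTS = range(1, 11)   # assumed grid along the first coordinate
    COORD2_POINTS = range(1, 11)   # assumed grid along the second coordinate
    REPLICAS = range(1, 3)         # assumed replicas per point (10 x 10 x 2 = 200)

    def make_tasks(root="tasks"):
        tasks = []
        for i, j, k in itertools.product(COORD1_POINTS, COORD2_POINTS, REPLICAS):
            base = f"pl_{i}_{j}_{k}"
            taskdir = os.path.join(root, base)
            os.makedirs(taskdir, exist_ok=True)
            # Two sequential stages per parameter point:
            tasks.append({"name": base + "_e", "stage": "equilibration",
                          "dir": taskdir, "hours": (4, 8)})
            tasks.append({"name": base + "_p", "stage": "production",
                          "dir": taskdir, "hours": (8, 16),
                          "depends_on": base + "_e"})
        return tasks

    if __name__ == "__main__":
        specs = make_tasks()
        print(len(specs) // 2, "parameter points,", len(specs), "stage runs")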

Page 8: Protein Folding Landscapes in a Distributed Environment

Scientists Using Legion

Scientists provide:
• Binaries for each architecture type
• Script for dispatching jobs
• Script for keeping track of results (see the sketch below)
• Script for running the binary at a site
  – optional feature in Legion

Legion provides:
• Abstract interface to resources
  – queues, accounting, firewalls, etc.
• Binary transfer (with caching)
• Input file transfer
• Job submission
• Status reporting
• Output file transfer
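The results-tracking script itself is not shown in the deck; the following is a minimal, hypothetical sketch of what such a loop could look like, assuming (purely for illustration) that a finished task deposits a "<task name>.out" file in its task directory:

    # Hypothetical results-tracking sketch (not the actual script from the run).
    # Assumption: a finished task writes "<task name>.out" into its task directory.
    import os
    import time

    def track(tasks, poll_seconds=600):
        """Poll task directories until every task has produced its output file."""
        pending = {t["name"]: t["dir"] for t in tasks}
        while pending:
            finished = [name for name, d in pending.items()
                        if os.path.exists(os.path.join(d, name + ".out"))]
            for name in finished:
                print("finished:", name)
                del pending[name]
            if pending:
                time.sleep(poll_seconds)
        print("all tasks complete")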

Page 9: Protein Folding Landscapes in a Distributed Environment

Legion

Complete, Integrated Infrastructure for Secure Distributed Resource Sharing

Page 10: Protein Folding Landscapes in a Distributed Environment

Grid OS Requirements

• Wide-area
• High Performance
• Complexity Management
• Extensibility
• Security
• Site Autonomy
• Input / Output
• Heterogeneity
• Fault-tolerance
• Scalability
• Simplicity
• Single Namespace
• Resource Management
• Platform Independence
• Multi-language
• Legacy Support

Page 11: Protein Folding Landscapes in a Distributed Environment

Transparent System

Page 12: Protein Folding Landscapes in a Distributed Environment

npacinet

Page 13: Protein Folding Landscapes in a Distributed Environment

The Run

Page 14: Protein Folding Landscapes in a Distributed Environment

Computational Issues

• Provide improved response time
• Access a large set of resources transparently
  – geographically distributed
  – heterogeneous
  – different organisations

5 organisations, 7 systems, 9 queues, 5 architectures, ~1000 processors

Page 15: Protein Folding Landscapes in a Distributed Environment

Resources Available

System             Site      CPU                Processors
IBM Blue Horizon   SDSC      375 MHz Power3     512/1184
HP SuperDome       CalTech   440 MHz PA-8700    128/128
IBM SP3            UMich     375 MHz Power3     24/24
IBM Azure          UTexas    160 MHz Power2     32/64
Sun HPC 10000      SDSC      400 MHz SMP        32/64
DEC Alpha          UVa       533 MHz EV56       32/128

Page 16: Protein Folding Landscapes in a Distributed Environment

Scientists Using Legion

Scientists provide:
• Binaries for each architecture type
• Script for dispatching jobs
• Script for keeping track of results
• Script for running the binary at a site
  – optional feature in Legion

Legion provides:
• Abstract interface to resources
  – queues, accounting, firewalls, etc.
• Binary transfer (with caching)
• Input file transfer
• Job submission
• Status reporting
• Output file transfer

Page 17: Protein Folding Landscapes in a Distributed Environment

Mechanics of Runs

Steps, all performed through Legion (a sketch follows below):
• Register binaries
• Create task directories & specification
• Dispatch equilibration
• Dispatch equilibration & production
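A minimal orchestration sketch of the steps above, building on the earlier task-generation sketch. The command names invoked here (legion_register_binary, legion_dispatch) are placeholders invented for illustration, not the actual Legion CLI:

    # Hypothetical orchestration sketch of the run mechanics on this slide.
    # The shelled-out command names are placeholders, NOT real Legion commands.
    import subprocess

    def sh(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    def run_landscape(tasks, binaries):
        # 1. Register one binary per architecture with the grid.
        for arch, path in binaries.items():
            sh(["legion_register_binary", "--arch", arch, path])  # placeholder

        # 2. Task directories & specifications are assumed to exist already
        #    (see the make_tasks() sketch earlier).

        # 3. Dispatch the equilibration runs first.
        for t in tasks:
            if t["stage"] == "equilibration":
                sh(["legion_dispatch", t["name"], t["dir"]])       # placeholder

        # 4. Dispatch the production runs once their equilibrations finish.
        for t in tasks:
            if t["stage"] == "production":
                sh(["legion_dispatch", "--after", t["depends_on"],
                    t["name"], t["dir"]])                          # placeholder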

Page 18: Protein Folding Landscapes in a Distributed Environment

Distribution of CHARMM Work

[Pie chart: SDSC IBM 71%, CalTech HP 24%, with the remaining ~5% spread across UTexas IBM, UVa DEC, SDSC Cray, SDSC Sun, and UMich IBM]

Page 19: Protein Folding Landscapes in a Distributed Environment

Problems Encountered

• Network slowdowns
  – Slowdown in the middle of the run
  – 100% loss for packets of size ~8500 bytes
• Site failures
  – LoadLeveler restarts
  – NFS/AFS failures
• Legion
  – No run-time failures
  – Archival support lacking
  – Must address binary differences

[Diagram: Legion linking UVa, SDSC, and UMich over the network]

Page 20: Protein Folding Landscapes in a Distributed Environment

Successes

• Science accomplished faster
  – 1 month on 128 SGI Origins @ Scripps
  – 1.5 days on national grid with Legion
• Transparent access to resources
  – User didn’t need to log on to different machines
  – Minimal direct interaction with resources
• Problems identified
• Legion remained stable
  – Other Legion users unaware of large runs
• Large grid application run at powerful resources by one person from a local resource
• Collaboration between natural and computer scientists
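For scale (taking "1 month" as roughly 30 days, an assumption for the arithmetic): 30 days / 1.5 days ≈ 20, i.e. about a twenty-fold reduction in wall-clock turnaround, obtained by spreading the independent runs over the ~1000 grid processors listed earlier instead of the single 128-processor machine at Scripps.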

Page 21: Protein Folding Landscapes in a Distributed Environment

AmberGrid

Easy Interface to Grid

Page 22: Protein Folding Landscapes in a Distributed Environment

Legion GUIs

• Simple point-and-click interface to Grids
  – Familiar access to distributed file system
  – Enables & encourages sharing
• Application portal model for HPC
  – AmberGrid
  – RenderGrid
  – Accounting

Transparent Access to Remote Resources
Intended Audience is Scientists

Page 23: Protein Folding Landscapes in a Distributed Environment

Logging in to npacinet

Page 24: Protein Folding Landscapes in a Distributed Environment

View of contexts (Distributed File System)

Page 25: Protein Folding Landscapes in a Distributed Environment

Control Panel

Page 26: Protein Folding Landscapes in a Distributed Environment

Running Amber

Page 27: Protein Folding Landscapes in a Distributed Environment

Run Status (Legion)

Graphical View (Chime)

Page 28: Protein Folding Landscapes in a Distributed Environment

Summary

• CHARMM Run
  – Succeeded in starting big runs
  – Encountered problems
  – Learnt lessons for the future
  – Let’s do it again!
    • with more processors, systems, organisations
• AmberGrid
  – Showed proof of concept for a grid portal
  – Need to resolve licence issues