Protein Folding Landscapes in a Distributed Environment All Hands Meeting, 2001


Protein Folding Landscapes in a Distributed Environment
All Hands Meeting, 2001

University of Virginia: Andrew Grimshaw, Anand Natrajan
Scripps (TSRI): Charles L. Brooks III, Michael Crowley
SDSC: Nancy Wilkins-Diehr

NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE
Outline
• CHARMM
  – Issues
• Legion
• The Run
  – Results
  – Lessons
• AmberGrid
• Summary
CHARMM
• Routine exploration of folding landscapes helps in the search for a protein folding solution
• Understanding folding is critical to structural genomics, biophysics, drug design, etc.
• Key to understanding cell malfunctions in Alzheimer's, cystic fibrosis, etc.
• CHARMM and Amber benefit the majority (>80%) of biomolecular scientists
• Structural genomics & protein structure prediction
Folding Free Energy Landscape
[Figure: folding free energy surface mapped by molecular dynamics simulations; 100-200 structures to sample the (r, Rgyr) space. Axes: r and Rgyr.]
Application Characteristics
• Parameter-space study
  – Parameters correspond to structures along & near the folding path
    • Path unknown – there could be many paths, or a broad one
  – Many places along the path sampled to determine local low free energy states
  – Path is the valley of lowest free energy states from the high free energy state of the unfolded protein to the lowest free energy state (the folded native protein)
Folding of Protein L
• Immunoglobulin-binding protein
  – 62 residues (small), 585 atoms
  – 6500 water molecules, 20085 atoms total
  – Each parameter point requires O(10^6) dynamics steps
  – Typical folding surfaces require 100-200 sampling runs
• CHARMM using the most accurate physics available for classical molecular dynamics simulation
  – PME, 9 Å cutoff, heuristic list update, SHAKE
• Multiple 16-way parallel runs for maximum efficiency
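A quick arithmetic check of the system size quoted above; the 3-site water model is an assumption that is consistent with the atom counts on the slide (a sketch in Python):

```python
# Sanity check on the Protein L system size.
# Assumes a 3-site water model (e.g., TIP3P), which matches the slide's numbers.
protein_atoms = 585
waters = 6500
atoms_per_water = 3

total_atoms = protein_atoms + waters * atoms_per_water
print(total_atoms)  # 20085, matching the slide
```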
Application Characteristics
• Many independent runs
  – 200 sets of data to be simulated in two sequential runs
    • Equilibration (4-8 hours)
    • Production/sampling (8 to 16 hours)
• Each point has a task name, e.g., pl_1_2_1_e
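A minimal sketch of how the ~200 parameter points and their two sequential runs could be enumerated. The three-index layout and the ranges below are illustrative assumptions read off the single example task name on the slide (pl_1_2_1_e), not the actual scheme:

```python
# Enumerate ~200 parameter points, each simulated in two sequential runs.
# Task-name layout and the "_p" production suffix are assumptions for illustration.

phases = [
    ("e", "equilibration", "4-8 hours"),
    ("p", "production/sampling", "8-16 hours"),  # runs after equilibration completes
]

tasks = []
for i in range(1, 11):          # hypothetical index ranges giving 200 points
    for j in range(1, 5):
        for k in range(1, 6):
            point = f"pl_{i}_{j}_{k}"
            tasks.append([f"{point}_{suffix}" for suffix, _, _ in phases])

print(len(tasks), "parameter points")   # 200
print(tasks[0])                         # ['pl_1_1_1_e', 'pl_1_1_1_p']
```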
Scientists Using Legion
Scientists provide:
• Binaries for each type
• Script for dispatching jobs
• Script for keeping track of results
• Script for running the binary at a site
  – optional feature in Legion
Legion provides:
• Abstract interface to resources
  – queues, accounting, firewalls, etc.
• Binary transfer (with caching)
• Input file transfer
• Job submission
• Status reporting
• Output file transfer
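A minimal sketch of this division of labour: the scientist supplies small dispatch and bookkeeping scripts, while the grid middleware handles binary/file transfer, job submission and status reporting. The submit, status and fetch_output callables are hypothetical stand-ins, not Legion's actual interface:

```python
# Sketch of the scientist-side scripts described above.
# `submit`, `status`, `fetch_output` are placeholders for whatever the grid
# middleware provides; they are not Legion API calls.

import time

def dispatch(tasks, submit):
    """Dispatch script: hand every task to the grid and remember its job id."""
    return {task: submit(task) for task in tasks}

def track(jobs, status, fetch_output, poll_seconds=600):
    """Tracking script: poll job status and collect output as runs finish."""
    pending = dict(jobs)
    while pending:
        for task, job_id in list(pending.items()):
            state = status(job_id)           # e.g., "queued", "running", "done"
            if state == "done":
                fetch_output(job_id, f"results/{task}")
                del pending[task]
        time.sleep(poll_seconds)
```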
Legion
Complete, Integrated Infrastructure for Secure Distributed Resource Sharing
Grid OS Requirements
• Wide-area
• High Performance
• Complexity Management
• Extensibility
• Security
• Site Autonomy
• Input / Output
• Heterogeneity
• Fault-tolerance
• Scalability
• Simplicity
• Single Namespace
• Resource Management
• Platform Independence
• Multi-language
• Legacy Support
Transparent System
npacinet
The Run
Computational Issues
• Provide improved response time
• Access a large set of resources transparently
  – geographically distributed
  – heterogeneous
  – different organisations

5 organisations, 7 systems, 9 queues, 5 architectures, ~1000 processors
Resources Available

System            Site      Processor          Processors
HP SuperDome      CalTech   440 MHz PA-8700    128/128
IBM SP3           UMich     375 MHz Power3     24/24
DEC Alpha         UVa       533 MHz EV56       32/128
IBM Blue Horizon  SDSC      375 MHz Power3     512/1184
Sun HPC 10000     SDSC      400 MHz SMP        32/64
IBM Azure         UTexas    160 MHz Power2     32/64
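Because the pool spans several architectures, a separate CHARMM binary has to be supplied for each type (see "Binaries for each type" above and "Must address binary differences" later). A hedged sketch of the kind of site-to-binary mapping this implies; the paths are hypothetical placeholders:

```python
# Illustrative mapping from site/architecture to the CHARMM binary built for it.
# Paths are hypothetical placeholders, not the actual locations used in the run.

CHARMM_BINARIES = {
    "CalTech HP SuperDome (PA-8700)":  "binaries/charmm.hpux_pa8700",
    "UMich IBM SP3 (Power3)":          "binaries/charmm.aix_power3",
    "UVa DEC Alpha (EV56)":            "binaries/charmm.osf1_ev56",
    "SDSC IBM Blue Horizon (Power3)":  "binaries/charmm.aix_power3",
    "SDSC Sun HPC 10000":              "binaries/charmm.solaris_sparc",
    "UTexas IBM Azure (Power2)":       "binaries/charmm.aix_power2",
}

def binary_for(site: str) -> str:
    """Pick the binary registered for a given site."""
    return CHARMM_BINARIES[site]
```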
Mechanics of Runs
[Workflow diagram: create task directories & specification, register binaries with Legion, then dispatch the equilibration and production runs.]
Distribution of CHARMM Work
[Pie chart: share of CHARMM work by resource – SDSC IBM, CalTech HP, UTexas IBM, UVa DEC, SDSC Cray, SDSC Sun, UMich IBM. Two resources carried roughly 71% and 24% of the work; the remainder contributed 0-2% each.]
Problems Encountered
• Network slowdowns
  – Slowdown in the middle of the run
  – 100% loss for packets of size ~8500 bytes
• Site failures
  – LoadLeveler restarts
  – NFS/AFS failures
• Legion
  – No run-time failures
  – Archival support lacking
  – Must address binary differences
Successes
• Science accomplished faster
– 1 month on 128 SGI Origins @Scripps
– 1.5 days on national grid with Legion
• Transparent access to resources
– User didn’t need to log on to different machines
– Minimal direct interaction with resources
• Problems identified
• Legion remained stable
– Other Legion users unaware of large runs
• Large grid application run on powerful remote resources by one person from a local resource
• Collaboration between natural and computer scientists
AmberGrid
Easy Interface to Grid
Legion GUIs
• Simple point-and-click interface to Grids
  – Familiar access to distributed file system
  – Enables & encourages sharing
• Application portal model for HPC
  – AmberGrid
  – RenderGrid
  – Accounting

Transparent access to remote resources; the intended audience is scientists.
[Screenshot slides: logging in to npacinet; view of contexts (the distributed file system); Control Panel; running Amber; run status (Legion) with a graphical view (Chime).]
Summary
• CHARMM Run
  – Succeeded in starting big runs
  – Encountered problems
  – Learnt lessons for the future
  – Let's do it again!
    • more processors, systems, organisations
• AmberGrid
  – Showed proof of concept: a grid portal
  – Need to resolve licence issues