HIGH PERFORMANCE COMPUTING, THE SCIENTIFIC IMPERATIVE
High Performance Computing
and
Biomedical Research
Ralph Roskies
Professor of Physics-University of Pittsburgh
Scientific Director-Pittsburgh Supercomputing Center
Coalition for Academic Scientific Computation, September 18, 2002
Why the recent emphasis on
computing in biomedicine?
Economics- in 15 years, for the same cost
Can do 10,000 times more processing
Can store 100,000 times more data
(a rough check of the implied doubling times appears below)
Software developments
New algorithms for processing
The Web for finding data
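As a rough check on those growth factors (an illustrative calculation, not from the talk): assuming steady exponential improvement over 15 years, 10,000x implies roughly a 13.5-month doubling time for processing and 100,000x roughly a 10.8-month doubling time for storage. A minimal sketch in Python:

    # Illustrative back-of-the-envelope check (growth factors from the slide,
    # steady exponential improvement assumed): if capability per dollar grows
    # by a factor F over Y years, the implied doubling time is Y / log2(F).
    import math

    def doubling_time_months(factor, years):
        """Months per doubling implied by total growth 'factor' over 'years'."""
        return 12 * years / math.log2(factor)

    print(f"processing: {doubling_time_months(10_000, 15):.1f} months per doubling")
    print(f"storage:    {doubling_time_months(100_000, 15):.1f} months per doubling")
    # -> roughly 13.5 and 10.8 months, respectively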
Now possible to do things
previously not feasible
Acquiring, storing, finding, accessing large
amounts of data (terabytes to petabytes of
text, numbers, images)
Assembling large databases and
repeatedly reprocessing them to ask new
questions
Simulations based on realistic models to
understand the data,
do predictive, as opposed to descriptive,
biology and medicine
Why High Performance Computing?
HPC is a discovery tool.
Bringing more problems within reasonable human
timescales encourages creativity and exploration
HPC is a time machine
Incorporating more realism in models crosses a
threshold of relevance to experiment
HPC really involves computing, networking,
visualization, storage
Real-time fMRI
3.0T MRI scanner
In 1996, this needed a supercomputer (Cray T3E)
Today, it's routine (SGI Onyx)
Compelling biomedical investigations
that HPC enables today
Genomics
Analyzing and storing images for
revolutionizing medical care
Blood flow and heart disease
Structural biology
Data Explosion
Figure: Exponential Growth of GenBank, 1982-2002 (number of gigabases, reaching roughly 20 by 2002; growth shown for 2002 reflects additions only up to May 23)
courtesy of Thom Dunning, BioGrid North Carolina
Simulations linking genes and
diseases
Based on Utah’s resource of 1.5M people,
with their genealogy, record-linked to cancer
and death records back to the early 1900s.
Typical pedigree goes back 8 generations
Need genotypic data on hundreds of people alive
today
University of Utah Division of Genetic Epidemiology and
Center for High Performance Computing
Image Analysis
USC Microtomography
High-throughput 3D microtomography
using data from electron microscopes
Need high performance computing for
real-time analysis
Compare 3-D MRI
of normal and
knockout mice
Allan Johnson-Duke
Blood Flow
1990s- Realistic geometry
Artificial valve design
(Charles Peskin et al, Courant Institute;
150 hours C90)
Today- role of turbulence in
loosening plaque, leading to
embolisms
(Henry Tufo et al, Argonne; 10^4 hours TCS)
Tomorrow- designing heart
pumps to minimize damage to
individual blood cells
(Jim Antaki et al, University of Pittsburgh,
millions of hours TCS?)
Structural Biology e.g.
How Do Aquaporins Work?
Aquaporins: proteins that conduct
large volumes of water through cell
membranes while filtering out charged
particles like hydrogen ions.
Massive simulation showed that
water moves through aquaporin
channels in single file. Oxygen leads
the way in. Halfway through, the
water molecule flips over.
That breaks the ‘proton wire’
Klaus Schulten et al, U. of Illinois, SCIENCE (April 19, 2002)
35,000 hours TCS
For a given computing capability, certain
important problems get solved.
Many problems need more computing
power than we currently have
Protein folding
Analyzing genomic and proteomic data
Cell modeling and metabolism
Protein folding
Critical for drug design.
Understanding misfolding is
crucial for diseases like
Alzheimer's and mad cow disease.
Today's most powerful
systems can only simulate
microseconds of real time,
but folding takes milliseconds
or more (a rough estimate of the gap follows below)
Figure: villin headpiece (red = native, blue = partially folded)
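A rough sketch of why that gap is so large (illustrative figures, not from the slides: a typical 2 fs molecular-dynamics timestep and an assumed throughput of ~5 simulated nanoseconds per day):

    # Back-of-the-envelope estimate of the folding timescale gap.
    # TIMESTEP_FS and NS_PER_DAY are assumed illustrative values.
    TIMESTEP_FS = 2.0      # femtoseconds of simulated time per MD step
    NS_PER_DAY = 5.0       # simulated nanoseconds per wall-clock day (assumed)

    def steps_needed(simulated_ms):
        """MD timesteps required to cover 'simulated_ms' milliseconds."""
        return simulated_ms * 1e12 / TIMESTEP_FS     # 1 ms = 1e12 fs

    def wallclock_years(simulated_ms):
        """Wall-clock years at the assumed throughput."""
        days = simulated_ms * 1e6 / NS_PER_DAY       # 1 ms = 1e6 ns
        return days / 365.0

    print(f"timesteps for 1 ms: {steps_needed(1.0):.1e}")     # ~5e11 steps
    print(f"years for 1 ms:     {wallclock_years(1.0):.0f}")  # centuries on one machine

So even optimistic single-machine throughput leaves millisecond folding far out of reach, which is the point of the slide.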
Genomics and Proteomics
Data is increasing rapidly. Computational
demands for integrating, mining and
analyzing that data grow even faster.
By 2005:
Genomic data - petabytes
Computational needs - 10 teraflops
courtesy of
Thom Dunning,
BioGrid North Carolina
from TimeLogic
Cell modeling
Need to take account of
spatial inhomogeneity
cell geometry
signal variability (stochastic behavior; a minimal stochastic sketch follows below)
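To illustrate what stochastic behavior means here, a minimal, non-spatial sketch of stochastic reaction kinetics (Gillespie-style). The rate constants and copy numbers are made up, and this is not MCell's method: MCell performs spatially realistic Monte Carlo simulation with actual cell geometry.

    # Minimal Gillespie stochastic simulation of reversible ligand-receptor
    # binding, L + R <-> LR. Illustrative only: rates and counts are assumed,
    # and there is no spatial geometry here (unlike MCell).
    import math
    import random

    def gillespie(L=100, R=50, LR=0, k_on=0.005, k_off=1.0, t_end=5.0):
        t, trajectory = 0.0, [(0.0, LR)]
        while t < t_end:
            a_bind = k_on * L * R        # propensity of a binding event
            a_unbind = k_off * LR        # propensity of an unbinding event
            a_total = a_bind + a_unbind
            if a_total == 0.0:
                break
            t += -math.log(1.0 - random.random()) / a_total  # time to next event
            if random.random() * a_total < a_bind:
                L, R, LR = L - 1, R - 1, LR + 1
            else:
                L, R, LR = L + 1, R + 1, LR - 1
            trajectory.append((t, LR))
        return trajectory

    # Two runs with identical parameters give different trajectories --
    # the run-to-run variability that matters when copy numbers are small.
    for run in range(2):
        print(f"run {run}: final bound receptors = {gillespie()[-1][1]}")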
Synaptic Transmission
Many neurological diseases due to problems
of release or absorption of neurotransmitters
like acetylcholine, glutamate, glycine,
GABA, serotonin
Joel Stiles, PSC and Tom Bartol, Salk- MCell
Unusual Medical Success
In Slow-Channel Congenital Myasthenic Syndrome,
the channel closes more slowly upon binding, so the
electrical current continues longer than normal.
A particular patient presented puzzling symptoms.
Stiles experimented with the model parameters, and
simulations showed that one could explain the
symptoms if the receptors also opened slowly; this
was then verified medically (a toy kinetic sketch
of the effect follows below).
Unusual interplay of simulation and medical
diagnosis- depends critically on realistic
geometry and on stochastic modeling
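A toy illustration of that kinetic reasoning (made-up rate constants and a deliberately crude two-step model; the actual work used detailed MCell simulations with realistic synaptic geometry): after a transmitter pulse, bound-closed receptors open at rate k_open and open channels close at rate k_close, so slowing k_close stretches the current out, and slowing k_open as well delays and reshapes it.

    # Toy two-step channel model (assumed rates; NOT the MCell model used
    # in the study). Current is proportional to the open fraction O.
    def simulate(k_open, k_close, t_end=50.0, dt=0.01):
        CB, O = 1.0, 0.0                 # receptors start bound and closed
        peak, t_above = 0.0, 0.0
        for _ in range(int(t_end / dt)):
            dCB = -k_open * CB
            dO = k_open * CB - k_close * O
            CB += dCB * dt
            O += dO * dt
            peak = max(peak, O)
            if O > 0.05:                 # how long the current stays "significant"
                t_above += dt
        return peak, t_above

    for label, k_open, k_close in [
        ("normal receptor         ", 2.0, 1.0),
        ("slow closing only       ", 2.0, 0.2),
        ("slow opening and closing", 0.2, 0.2),
    ]:
        peak, dur = simulate(k_open, k_close)
        print(f"{label}: peak open fraction {peak:.2f}, "
              f"duration above 5%: {dur:.1f} time units")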
HPC has enormous promise for
biomedicine and improving health
See e.g.
The Biomedical Information Science and
Technology Initiative (BISTI) report, June 1999
www.nih.gov/about/director/060399.htm
PITAC Report to the President, “Transforming
Health Care Through Information Technology”
February 2001, www.itrd.gov/pubs/pitac/index.html
Department of Energy, Computational Structural
Biology http://cbcg.lbl.gov/ssi-csb/Program.html
PITAC Recommendations for NIH
Pilot projects and Enabling Technology
Centers should be established to extend the
practical uses of information technology to
health care systems and biomedical research
NCRR is doing some of this, but Resource budgets are limited to
$700K.
NIBIB?
PITAC recommendations (cont’d)
Programs should be established to increase
the pool of biomedical research and health
care professionals with training at the
intersection of health and information
technology.
NPEBC programs are a start. But biologists will soon be
overtaken by technical developments and the associated
analysis needs.
PITAC recommendations (cont’d)
A scalable national computing infrastructure
should be provided to support the biomedical
research community;
Still badly needed
NSF and civilian DOE have each recently
invested ~$100M in HPC infrastructure.
Biomedical researchers are very heavy users of NSF
and DOE facilities. (At PSC, close to 50% of usage this past
year).
NIH has almost no investment in these or
comparable resources. (PSC has 30 times the
compute power of NCI’s ABCC at Frederick)
Hardware is not enough
Also need support people, knowledgeable in
both computing and biology to interact with
and support the biomedical research
community.
Emerging paradigm
Grid Computing
Stresses collaboration, seamless
access to data wherever located
Multiple Computers
Distributed data sets
High speed networks
Common Interface (a toy sketch of this idea follows below)
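A minimal sketch of the "common interface to distributed data" idea. The catalog, hostnames, and URLs below are hypothetical placeholders, not a real grid middleware API; production grid software of the era (e.g., Globus) handled authentication, scheduling, and transfer far more completely.

    # Toy common interface over distributed data sets: the caller names a
    # logical data set; the interface finds a reachable replica and fetches it.
    from urllib.request import urlopen

    # Hypothetical replica catalog: logical name -> locations at different sites.
    CATALOG = {
        "genbank-slice-042": [
            "https://datagrid.site-a.example.edu/genbank/slice-042",
            "https://mirror.site-b.example.org/genbank/slice-042",
        ],
    }

    def fetch(dataset_name, timeout=10):
        """Fetch a logical data set from the first reachable replica."""
        for url in CATALOG[dataset_name]:
            try:
                with urlopen(url, timeout=timeout) as response:
                    return response.read()       # raw bytes of the data set
            except OSError:
                continue                         # replica unreachable; try the next
        raise RuntimeError(f"no reachable replica for {dataset_name}")

    # data = fetch("genbank-slice-042")  # same call regardless of which site serves it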
We urge you to consider:
NIH should establish HPC Centers, with
leading-edge hardware, biomedically oriented
support staff, research into relevant
algorithms, and vigorous training.
NIH should actively cooperate with NSF,
DOE, and other agencies and shoulder
its fair share in building the national
computing infrastructure.
Comparable budget scale is ~$100M/year.
There are many sites in the nation that
could respond credibly to an NIH
solicitation for such a Center, including
many minority institutions as partners.
Because computing infrastructure cuts
across Institutes, it is not, and will
never be, the major priority of any
Institute.
A cross-Institute initiative of
this magnitude and importance
cannot happen without
leadership from the Director