HIGH PERFORMANCE COMPUTING, THE SCIENTIFIC IMPERATIVE

High Performance Computing and Biomedical Research
Ralph Roskies
Professor of Physics, University of Pittsburgh
Scientific Director, Pittsburgh Supercomputing Center

Coalition for Academic Scientific Computation, September 18, 2002












Why the recent emphasis on computing in biomedicine?

Economics: in 15 years, for the same cost, you can
- do 10,000 times more processing
- store 100,000 times more data
(a quick back-of-the-envelope check follows below)
Software developments:
- new algorithms for processing
- the Web for finding data
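
A quick check of those multipliers, as promised above. This is a minimal Python sketch; the 15-year factors come from the slide, and everything else is derived arithmetic.

    # Implied annual growth behind "10,000x processing, 100,000x storage in 15 years"
    import math

    years = 15
    for label, factor in [("processing", 10_000), ("storage", 100_000)]:
        per_year = factor ** (1 / years)                         # annual multiplier
        doubling_months = 12 * math.log(2) / math.log(per_year)  # months per doubling
        print(f"{label}: ~{per_year:.2f}x per year, doubling every ~{doubling_months:.0f} months")

That works out to a doubling roughly every 13-14 months for processing and every 11 months for storage, in line with Moore's-law-era hardware trends.
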
Now possible to do things previously not feasible
- acquiring, storing, finding, and accessing large amounts of data (terabytes to petabytes of text, numbers, images)
- assembling large databases and repeatedly reprocessing them to ask new questions
- simulations based on realistic models, both to understand the data and to do predictive, as opposed to descriptive, biology and medicine

Why High Performance Computing?

HPC is a discovery tool.
- Bringing more problems within reasonable human timescales encourages creativity and exploration.
HPC is a time machine.
- Incorporating more realism in models crosses a threshold of relevance to experiment.

HPC really involves computing, networking, visualization, and storage.

Real-time fMRI

In 1996, real-time fMRI required a supercomputer (a Cray T3E coupled to a 3.0T MRI scanner). Today, it's routine (an SGI Onyx suffices).

Compelling biomedical investigations that HPC enables today
- genomics
- analyzing and storing images for revolutionizing medical care
- blood flow and heart disease
- structural biology

Data Explosion

[Chart: exponential growth of GenBank, 1982-2002; y-axis: number of gigabases, 0 to 20. The 2002 figure reflects additions only through May 23. Courtesy of Thom Dunning, BioGrid North Carolina.]

Simulations linking genes and diseases

Based on Utah's resource of 1.5M people, with their genealogy, record-linked to cancer and death records back to the early 1900s.
- A typical pedigree goes back 8 generations.
- Need genotypic data on hundreds of people alive today.
(University of Utah Division of Genetic Epidemiology and Center for High Performance Computing)

Image Analysis

USC Microtomography
- high-throughput 3D microtomography using data from electron microscopes
- need high performance computing for real-time analysis

Compare 3-D MRI of normal and knockout mice (Allan Johnson, Duke)

Blood Flow

1990s: realistic geometry; artificial valve design (Charles Peskin et al., Courant Institute; 150 hours on a C90)

Today: the role of turbulence in loosening plaque, leading to embolisms (Henry Tufo et al., Argonne; 10^4 hours on the TCS)

Tomorrow: designing heart pumps to minimize damage to individual blood cells (Jim Antaki et al., University of Pittsburgh; millions of hours on the TCS?)

Structural Biology, e.g., How Do Aquaporins Work?

Aquaporins are proteins that conduct large volumes of water through cell walls while filtering out charged particles like hydrogen ions. A massive simulation showed that water moves through aquaporin channels in single file, with oxygen leading the way in. Halfway through, the water molecule flips over; that breaks the 'proton wire'.

(Klaus Schulten et al., U. of Illinois, Science, April 19, 2002; 35,000 hours on the TCS)

For a given computing capability, certain important problems get solved. Many problems need more computing power than we currently have:
- protein folding
- analyzing genomic and proteomic data
- cell modeling and metabolism

Protein folding
- critical for drug design
- understanding misfolding is crucial for diseases like Alzheimer's and mad cow disease
- today's most powerful systems can only simulate microseconds of real time, but folding takes milliseconds or more

[Image: villin headpiece; red, native state; blue, partially folded]

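To make the timescale gap concrete, a minimal sketch: the microsecond and millisecond figures are the slide's, and everything else is just the ratio.

    # Gap between what can be simulated and what a fold actually takes
    reachable_s = 1e-6   # ~microseconds of real time per heroic run (slide)
    needed_s = 1e-3      # folding takes milliseconds or more (slide)
    print(f"Need at least {needed_s / reachable_s:,.0f}x more simulated time per fold")

Since molecular-dynamics cost grows linearly with simulated time, closing the gap means at least a thousandfold increase in usable computing power.
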
Genomics and Proteomics

Data is increasing rapidly, and the computational demand for integrating, mining, and analyzing that data grows even faster. By 2005: genomic data in petabytes; computational needs around 10 teraflops. (From TimeLogic; courtesy of Thom Dunning, BioGrid North Carolina.)

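One reason the computational demand outruns the data itself: many core analyses, such as all-against-all sequence comparison, scale roughly quadratically with database size. A minimal illustration (the growth factors are invented for the example):

    # If the database grows k-fold, all-vs-all comparison work grows ~k^2-fold
    for k in (2, 4, 10):
        print(f"data x{k} -> pairwise comparison work x{k * k}")
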
Cell modeling

Need to take account of spatial inhomogeneity:
- cell geometry
- signal variability (stochastic behavior)

Synaptic Transmission

Many neurological diseases are due to problems with the release or absorption of neurotransmitters like acetylcholine, glutamate, glycine, GABA, and serotonin.

(Joel Stiles, PSC, and Tom Bartol, Salk: MCell)

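The flavor of this kind of stochastic modeling can be illustrated with a toy Gillespie-style simulation of neurotransmitter-receptor binding. To be clear, this is not MCell, which tracks Monte Carlo diffusion of individual molecules through realistic 3-D geometry; the rates and copy numbers below are invented for illustration.

    # Toy stochastic (Gillespie) simulation of ligand-receptor binding/unbinding.
    # At small copy numbers, repeated runs differ visibly -- the "signal
    # variability (stochastic behavior)" referred to above.
    import random

    def gillespie(n_ligand=50, n_free=20, n_bound=0, k_on=0.01, k_off=0.1, t_end=50.0):
        t = 0.0
        while t < t_end:
            a_bind = k_on * n_ligand * n_free   # propensity of a binding event
            a_unbind = k_off * n_bound          # propensity of an unbinding event
            a_total = a_bind + a_unbind
            if a_total == 0:
                break
            t += random.expovariate(a_total)    # exponential wait to next event
            if random.random() < a_bind / a_total:
                n_ligand -= 1; n_free -= 1; n_bound += 1
            else:
                n_ligand += 1; n_free += 1; n_bound -= 1
        return n_bound

    for run in range(3):
        print(f"run {run}: receptors bound at t_end = {gillespie()}")
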
Unusual Medical Success

In Slow Channel Congenital Myasthenic Syndrome, the channel closes more slowly upon binding, so the electrical current continues longer than normal.
- A particular patient presented puzzling symptoms.
- Stiles experimented with the model parameters, and simulations showed that the symptoms could be explained if the receptors also opened slowly; this was then verified medically.
- An unusual interplay of simulation and medical diagnosis, depending critically on realistic geometry and on stochastic modeling.

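A toy two-state sketch of why slower closing prolongs the current (the rate constants are invented; the actual analysis relied on MCell's full stochastic, geometric model):

    # Open channels after a release event decay roughly as N(t) = N0*exp(-beta*t),
    # so the current's decay constant is 1/beta: a smaller closing rate beta
    # means a longer-lasting current, as in the slow-channel syndrome.
    import math

    normal_beta, slow_beta = 1.0, 0.2   # hypothetical closing rates (1/ms)
    for t_ms in (1, 5, 10):
        print(f"t={t_ms} ms: open fraction normal {math.exp(-normal_beta * t_ms):.3f}, "
              f"slow {math.exp(-slow_beta * t_ms):.3f}")

The additional hypothesis, that the receptors also opened slowly, affects the rising phase of the current rather than its decay.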

HPC has enormous promise for biomedicine and improving health

See, e.g.:
- The Biomedical Information Science and Technology Initiative (BISTI) report, June 1999, www.nih.gov/about/director/060399.htm
- PITAC Report to the President, "Transforming Health Care Through Information Technology," February 2001, www.itrd.gov/pubs/pitac/index.html
- Department of Energy, Computational Structural Biology, http://cbcg.lbl.gov/ssi-csb/Program.html

PITAC Recommendations for NIH

Pilot projects and Enabling Technology Centers should be established to extend the practical uses of information technology to health care systems and biomedical research.
- NCRR is doing some of this, but Resource budgets are limited to $700K.
- NIBIB?

PITAC recommendations (cont'd)

Programs should be established to increase the pool of biomedical research and health care professionals with training at the intersection of health and information technology.
- NPEBC programs are a start, but biologists will soon be overtaken by technical developments and the associated analysis needs.

PITAC recommendations (cont'd)

A scalable national computing infrastructure should be provided to support the biomedical research community.
- This is still badly needed.
- NSF and civilian DOE have each recently invested ~$100M in HPC infrastructure.
- Biomedical users are very heavy users of NSF and DOE facilities (at PSC, close to 50% of usage this past year).
- NIH has almost no investment in these or comparable resources. (PSC has 30 times the compute power of NCI's ABCC at Frederick.)

Hardware is not enough

We also need support people, knowledgeable in both computing and biology, to interact with and support the biomedical research community.

Emerging paradigm: Grid Computing

Stresses collaboration and seamless access to data wherever it is located:
- multiple computers
- distributed data sets
- high-speed networks
- a common interface

We urge you to consider:
- NIH should establish HPC Centers, with leading-edge hardware, biomedically oriented support staff, research into relevant algorithms, and vigorous training.
- NIH should actively cooperate with NSF, DOE, and other agencies and shoulder its fair share in building the national computing infrastructure.
- The comparable budget scale is ~$100M/year.
- There are many sites in the nation that could respond credibly to an NIH solicitation for such Centers, including many minority institutions as partners.
- Because computing infrastructure cuts across Institutes, it is not, and will never be, the major priority of any single Institute.

A cross-Institute initiative of this magnitude and importance cannot happen without leadership from the Director.