Implementing advanced IT facilities for the Indiana Genomics Initiative
Craig A. Stewart, Indiana University
[email protected]
HPC@IDC meeting
April 23-24, 2002, HPC User Forum meeting, Santa Fe, New Mexico
HPC@IDC April 2002

License terms
• Please cite as: Stewart, C.A. Implementing advanced IT facilities for the Indiana Genomics Initiative. 2002. Presentation. Presented at: HPC User Forum (Santa Fe, New Mexico, 23 Apr 2002). Available from: http://hdl.handle.net/2022/15220
• Except where otherwise noted, by inclusion of a source url or some other note, the contents of this presentation are © by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share (to copy, distribute and transmit the work) and to remix (to adapt the work) under the following conditions: attribution: you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

Indiana University’s Goals
• IT Goal: “To be a leader in absolute terms in information technology.” (IU President Myles Brand, 1996)
• Goals of the Indiana Genomics Initiative: To advance understanding of life’s processes, develop new therapies for human diseases, improve the quality of human health in Indiana, and enhance the strength of the central Indiana high-tech economy

IU in a nutshell
• Founded in 1820
• $2B annual budget
• 8 campuses
• Campuses well connected, especially IUB, IUPUI, and Purdue’s campus at W. Lafayette, which are connected by I-light
• IU operates TransPAC and the GlobalNOC

IT@IU in a nutshell
• Academic programs in IT through computer science, library and information sciences, engineering and technology, and most notably through the new School of Informatics
• CIO: Vice President Michael A. McRobbie
• ~$100M annual budget
• Technology services offered university-wide
• Pervasive Technology Labs

School of Medicine in a nutshell
• 2nd largest School of Medicine in the US
• IU Cancer Center: nationally recognized leader
• Regenstrief Institute: longstanding leader in medical informatics
• National leader in optical and tomographic imaging
• Longstanding leader in genetically influenced diseases, including Huntington’s (Conneally) and alcoholism (Li); currently the lead institution in a national study of bipolar disorder

INGEN
• Created by a $105M grant from the Lilly Endowment to Indiana University
• Involves the IU School of Medicine (IUPUI), the Departments of Biology and Chemistry (IUB), the Center for Genomics and Proteomics (IUB), and University Information Technology Services
• Composed of “Programs” (central research areas) and “Cores” (supporting units that are generally also research areas)

IT and INGEN
• INGEN’s IT core is a critical part of the infrastructure for the initiative as a whole
– Networking (using the I-light facility)
– Supercomputing
– Massive Data Storage
– Visualization
– Support
• IT is one of the paths by which INGEN should enhance the Indiana economy

Supercomputing: Oct 17 IU/IBM announcement
• IU tripled the capacity of its IBM SP to more than 1 TFLOPS (one trillion floating-point operations per second).
• IU’s SP is very large by the standard of supercomputers owned by individual universities
• A large part of this acquisition was made possible by funding from INGEN
• IU and IBM also announced a partnership to develop new supercomputer applications for the life sciences

[Photo. Credit: Tyagan Miller. May be reused by IU for noncommercial purposes. To license for commercial use, contact the photographer.]

Sun E10000
• IU is a Sun “Center of Excellence” and is pursuing collaborative research with Sun in the area of chemical informatics
[Photo. Credit: Tyagan Miller. May be reused by IU for noncommercial purposes. To license for commercial use, contact the photographer.]

AVIDD
• Analysis & Visualization of Instrument-Driven Data
• Large, distributed Intel-compatible Linux cluster
• Distributed data storage/data staging
• Distributed visualization
• Education is a key component of this initiative: distributed courses (IUB, IUPUI, IUN) taught via Access Grid at the advanced undergraduate/beginning graduate level

Massive Data Storage
• IU has a large data storage system based on IBM and STK robotic tape systems.
• IU’s massive data storage system is based on HPSS (High Performance Storage System), which provides excellent security.
• >300 TB current capacity
• Mirrored storage in Indianapolis and Bloomington should protect stored data against loss at either site
• IU was the first installation to implement remote HPSS movers over long-haul networks

[Photo. Credit: Tyagan Miller. May be reused by IU for noncommercial purposes. To license for commercial use, contact the photographer.]

Advanced Visualization
• UITS, the IU School of Medicine, and IUPUI Computer & Information Science have already collaborated to create 3DIVE (3-D Interactive Volume Explorer)
• CAVE
• ImmersaDesk
• IU-designed passive 3D environments (4 ft square screen, 5 ft square footprint)

Accomplishments & Challenges
• Past accomplishments
– fastDNAml
– 3DIVE
• Challenges
– Broader engagement with life scientists
– Data heterogeneity
– New application areas

fastDNAml

Building Phylogenetic Trees
• Goal: an objective means by which phylogenetic trees can be estimated in tolerable amounts of wall-clock time, producing phylogenetic trees with measures of their uncertainty

Why is tree-building an HPC problem?
• The number of bifurcating unrooted trees for n taxa is $\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
• For 100 taxa the number of possible trees is ~$10^{182}$
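The tree count grows super-exponentially with the number of taxa. The Python sketch below is a hypothetical illustration and is not part of fastDNAml. It does three things:

* It evaluates the closed form for the number of trees.
* It cross-checks that value against the standard stepwise-insertion argument: the k-th taxon can be attached to any of the 2k-5 edges of an unrooted binary tree on k-1 taxa.
* It confirms the ~10^182 figure for 100 taxa.

The helper names are invented for this sketch. The enumeration estimate also rests on an assumption: a rate of 10^12 tree evaluations per second, meaning one tree per operation at 1 TFLOPS. That rate is a deliberately optimistic placeholder, not a benchmark.

```python
from math import factorial, log10

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def count_unrooted_trees(n: int) -> int:
    """Closed form (2n-5)! / (2^(n-3) * (n-3)!) for n >= 3 labelled taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def count_by_insertion(n: int) -> int:
    """Same count built stepwise: there is one unrooted tree on 3 taxa, and
    the k-th taxon can join any of the 2k-5 edges of a tree on k-1 taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    total = 1
    for k in range(4, n + 1):
        total *= 2 * k - 5
    return total


def log10_years_to_enumerate(n: int, trees_per_second: float = 1e12) -> float:
    """log10 of the years needed to score every tree at a fixed rate.

    ASSUMPTION: the default 1e12 trees/s is a hypothetical, wildly
    optimistic rate. The result is returned as a logarithm so that it
    cannot overflow a float for large n.
    """
    log_seconds = log10(count_unrooted_trees(n)) - log10(trees_per_second)
    return log_seconds - log10(SECONDS_PER_YEAR)


if __name__ == "__main__":
    print(count_unrooted_trees(10))                   # 2027025
    print(f"{log10(count_unrooted_trees(100)):.1f}")  # 182.2, i.e. ~10^182
    print(f"{log10_years_to_enumerate(100):.1f}")     # 162.7
```

Even at that generous rate, exhaustive enumeration of the trees for 100 taxa would take roughly 10^162 years. This is why the number of possible trees, and not just the cost of scoring one tree, makes the problem an HPC problem.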

fastDNAml
• Developed by Gary Olsen
• Derived from Felsenstein’s PHYLIP programs
• One of the more commonly used maximum likelihood (ML) methods
• The first phylogenetic software implemented as a parallel program (at Argonne National Laboratory, using the P4 libraries)
• Olsen, G.J., et al. 1994. fastDNAml: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Computer Applications in the Biosciences 10: 41-48
• An MPI version is now available from IU (development supported by an IBM SUR grant)

Performance of fastDNAml
[Figure: speedup (y-axis, 0-70) vs. number of processors (x-axis, 0-70) for 50, 101, and 150 taxa, compared with perfect scaling]
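The figure compares fastDNAml's measured speedup with perfect scaling. The slide does not define these quantities, so the standard definitions are assumed here:

* Speedup on p processors is S(p) = T(1)/T(p).
* Parallel efficiency is E(p) = S(p)/p.
* Perfect scaling is the line S(p) = p.

Below is a minimal Python sketch of that arithmetic. The timing table is entirely hypothetical; it stands in for real measurements and does not reproduce the plotted data.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S(p) = T(1) / T(p): serial time over time on p processors."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Parallel efficiency E(p) = S(p) / p; 1.0 means perfect (linear) scaling."""
    return speedup(t_serial, t_parallel) / processors


# HYPOTHETICAL wall-clock times in seconds, keyed by processor count.
# Illustrative placeholders only; these are NOT measured fastDNAml results.
timings = {1: 3200.0, 8: 420.0, 16: 220.0, 32: 125.0, 64: 80.0}

if __name__ == "__main__":
    t1 = timings[1]
    for p in sorted(timings):
        s = speedup(t1, timings[p])
        e = efficiency(t1, timings[p], p)
        print(f"p={p:3d}  speedup={s:6.2f}  perfect={p:3d}  efficiency={e:.2f}")
```

An efficiency that falls below 1.0 as p grows is the numerical counterpart of a speedup curve bending away from the perfect-scaling line.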

Current projects
• Data integration
• Gamma knife
• Pedigree analysis
• PET scan analysis
• Protein families
• AMASS: shotgun sequence assembly
• Data, data, data

HPC and life sciences
• The HPC hardware and software market is set to expand dramatically thanks to the life sciences
• The HPC and life sciences communities don’t share a common language
• Biomedical researchers are no more conservative than anyone else
• Biomedical researchers are not alone in creating bad code
• Both communities have lots to offer each other, but at present it seems to be up to the HPC community to reach out (when was the last time an astronomer saved your life?)
• The HPC community has been slow to take advantage of the opportunities offered by collaboration with life scientists
• This will be like the dot-com bust, sort of. The key question is: how great will the similarities be?

Challenges: creating collaborations with life scientists
• Need to challenge the “I can do it on my desktop” mentality when appropriate
• Go for the low-hanging fruit
• Remember that physics, astronomy, and other traditional HPC codes have a head start of many years
• Need to recognize the complexity of the life sciences

Current approaches @ IU
• Really clever batch scripts… then portals
• Appropriate documentation
• Door-to-door consulting
• Proof-of-concept projects
• Contributions to open source/community code efforts

Keys to success in partnerships @ IU
• Long history of openness and diversity in HPC uses
• Accountability and service philosophy
• Supercomputing time and programming support as baseline services
• Central computing center staff hired from several disciplines (including biology)
• Computer scientists who actually care about applications
• History and a certain amount of luck

Summary
• IU has thus far been very successful in implementing advanced IT infrastructure for life scientists
• Reaching out has been essential to the formation of partnerships
• Industry partnerships have been essential to success
• So far, so good…

Acknowledgements
• IBM research relationships & SUR grants
• Sun and Center of Excellence relationships
• Compaq relationship
• Computer scientists at IU (esp. Randall Bramley, Dennis Gannon, Shiaofen Fang)
• State of Indiana
• Lilly Endowment

Important URLs
• University Information Technology Services: www.indiana.edu/~uits/
• UITS Research & Academic Computing Division: www.indiana.edu/~uits/rac
• InGen IT Core: www.indiana.edu/~rac/bioinformatics/ingen.html
• IU Teraflop SP announcement: www.indiana.edu/~rac/outreach.html
• IT@IU: it.iu.edu