High Performance Computing for University Medical Research: A Successful Implementation Dr. Craig A.

High Performance Computing for
University Medical Research:
A Successful Implementation
Dr. Craig A. Stewart, Ph.D.
Director, Research and Academic Computing, University
Information Technology Services
Director, Information Technology Core, Indiana Genomics Initiative
Dr. Richard Repasky, Ph.D.
Bioinformatics Specialist
License Terms
• Please cite this presentation as: Stewart, C.A. and R. Repasky. High
Performance Computing for University Medical Research: A Successful
Implementation. 2007. Presentation. Presented at: Bio-IT World Conference
& Expo (Boston, MA, 24-26 Apr 2007). Available from:
http://hdl.handle.net/2022/14600
• Portions of this document that originated from sources outside IU are shown here
and used by permission or under licenses indicated within this document.
• Items indicated with a © are under copyright and used here with permission. Such
items may not be reused without permission from the holder of copyright except
where license terms noted on a slide permit reuse.
• Except where otherwise noted, the contents of this presentation are copyright
2007 by the Trustees of Indiana University. This content is released under the
Creative Commons Attribution 3.0 Unported license
(http://creativecommons.org/licenses/by/3.0/). This license includes the following
terms: You are free to share – to copy, distribute and transmit the work and to
remix – to adapt the work under the following conditions: attribution – you must
attribute the work in the manner specified by the author or licensor (but not in any
way that suggests that they endorse you or your use of the work). For any reuse
or distribution, you must make clear to others the license terms of this work.
Bioinformatics and
Biomedical Research
• Bioinformatics, Genomics, Proteomics, ____ics all
promise to radically change our understanding of
biological function and the way biomedical research is
done.
• Traditional biomedical researchers must take
advantage of new possibilities
• “Post-genomic” research must take advantage of the
tremendous store of detailed knowledge held by
traditional biomedical researchers
Anopheles gambiae
• From www.sciencemag.org/feature/data/mosquito/mtm/index.html
Source Library: Centers for Disease Control
PHIL Photo Credit: Jim Gathany
IU’s goals for the Indiana
Genomics Initiative (INGEN)
• Build on traditional strengths of IU School of Medicine
• Build on IU's strength in Information Technology
• Add new programs of research made possible by the sequencing of the
human genome
• Perform the research that will generate new treatments for human
disease in the post-genomic era
• Improve human health generally and in the State of Indiana particularly
• Enhance economic growth in Indiana
• INGEN was created by a $105M grant from the Lilly Endowment, Inc.
and launched in December 2000
• The goal of this talk is to explain how advanced information technology
was implemented to help meet these goals.
Outline
• Background information about IU
• The Indiana Genomics Initiative (INGEN)
• The INGEN Information Technology Core
• Facilities
• Service
• Some key projects
• Status and summary of success factors
• Acknowledgements
IU in a nutshell
• $2B Annual Budget
• 8 campuses, 90,000 students, 3,900 faculty
• 878 degree programs; > 100 programs ranked within top 20
of their type nationally
• Nation’s second largest school of medicine
• 1,347 M.D., Ph.D., and M.D./Ph.D. students
• Sole school of medicine in Indiana
• Traditional strengths in human genetic diseases (e.g.,
alcoholism, Huntington's disease) and medical records (Regenstrief
Institute)
IT @ IU in a nutshell
• CIO: Vice President Michael A. McRobbie
• ~$100M annual budget
• Technology services offered university-wide
• Networking
• IU operates the Network Operations Center for Abilene
• High Performance Computing
• First university in the US to own a 1 TFLOPS supercomputer
• The Top 500 list has, for the past several years, included at least one IU
supercomputer
INGEN Structure
[Diagram: INGEN is organized into Programs and Cores. Legible labels: Bioethics, Genomics, Medical Informatics, Education, Training, Tech Transfer, Information Technology, Gene Expression, Proteomics, Cell & Protein Expression, In vivo Imaging, Animal. Further label fragments: Integrated, Human, Expression, Imaging.]
Indiana Genomics Initiative
[Diagram: organization of the Indiana Genomics Initiative]
Programs: Genomics; Medical Informatics; Bioinformatics; Training; Education; Bioethics
Cores: Proteomics; Information Technology; In Vivo Imaging; Genotyping and Gene Expression; Cell and Protein Expression; Technology Transfer
Further core label fragments: Animal, Human, Drosophila, Expression, Integrated, Microscopy
Information Technology Core
• Foci:
• High Performance Computing
• Visualization (esp. 3D)
• Massive Data Storage
• Support for use of all of the above
• $6.7M budget for IT Core
• Baseline IT services for the School of Medicine remain the
responsibility of the School of Medicine CIO
Challenges for UITS
and the INGEN IT Core
• Assist traditional biomedical researchers in adopting use of
advanced information technology (massive data storage,
visualization, and high performance computing)
• Assist bioinformatics researchers in use of advanced computing
facilities
• Questions we are asked:
• Why wouldn't it be better just to buy me a newer PC?
• Questions we ask:
• What do you do now with computers that you would like to do faster?
• What would you do if computer resources were not a constraint?
Steps in meeting the challenge
• Use INGEN funding to enhance IU’s high
performance computing hardware environment
• Use INGEN funding to add dedicated staff
supporting INGEN researchers
• Proof of concept projects showing advanced
capabilities of IU’s IT environment
• Outreach to get many people using at least the basic
capabilities of IU’s advanced IT environment
Hardware Environment
• I-Light network
• High Performance Computing
• IBM SP – 1.005 TFLOPS
• Sun E10000 52 GFLOPS
• Large, distributed Linux cluster – 1.1 TFLOPS
• Massive Data Storage system
• Advanced Visualization Systems
• CAVE
• John-E-Box
IBM Research SP
(Aries/Orion Complex)
• Acquired 9/96; expanded in 1998, 1999, 2000, 2001, and 2002 with the help of
IU IT Strategic Plan funds, IBM SUR grants, and the INGEN grant from
Lilly Endowment, Inc.
• Geographically distributed at IUB and IUPUI
• 632 CPUs, 1.005 TFLOPS
• First university-owned supercomputer in the US to exceed 1 TFLOPS
processing capacity
• Initially 50th, now 112th on the Top 500 supercomputer list
• Distributed memory system with shared memory nodes
• AIX 5.1 and a wealth of software, including SAS, SPSS, S-Plus,
Mathematica, Matlab, Maple, Gaussian, GIS, scientific/numerical
libraries, Oracle and DB2, and more
IBM Research SP
(Aries/Orion)
©2000 Tyagan Miller
Sun E10000 (Solar)
• Acquired 4/00
• Shared memory architecture
• ~52 GFLOPS
• 64 CPUs at 400 MHz, 64 GB memory
• > 2 TB external disk
• Solaris 2.8
• Supports some bioinformatics software not
available under AIX (e.g. GCG/SeqWeb)
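As a rough worked example, the figures quoted for the IBM SP (632 CPUs, 1.005 TFLOPS) and the Sun E10000 (64 CPUs, ~52 GFLOPS) can be turned into an average rating per CPU. This is only a sketch: it assumes the quoted numbers are aggregate ratings for the whole machine (the slides do not say whether they are peak or sustained), and the dictionary and function names are illustrative.

```python
# Figures as quoted on the slides. Assumption: each is an aggregate rating
# for the whole system; the slides do not say peak or sustained.
systems = {
    "IBM SP (Aries/Orion)": {"cpus": 632, "gflops": 1005.0},
    "Sun E10000 (Solar)": {"cpus": 64, "gflops": 52.0},
}


def gflops_per_cpu(cpus, gflops):
    """Average GFLOPS contributed by each CPU of a system."""
    return gflops / cpus


per_cpu = {name: gflops_per_cpu(**spec) for name, spec in systems.items()}

for name, value in per_cpu.items():
    print(f"{name}: {value:.2f} GFLOPS per CPU")
```

The IBM SP was expanded over several years, so its per-CPU value is an average across whatever processor generations it contains.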
Sun E10000 (Solar)
©2000 Tyagan Miller
Distributed Linux Cluster
• AVIDD (Analysis and Visualization of
Instrument-Driven Data)
• 1.1 TFLOPS, 0.5 TB RAM, 10 TB Disk
• Tuned, configured, and optimized for handling
real-time data streams
Massive Data Storage System
• Based on HPSS (High Performance Storage System)
• 180 TB capacity with existing tapes; total
capacity of 480 TB
• First distributed HPSS installation; STK 9310
Silos in Bloomington and Indianapolis
• Automatic overnight replication of data between
Indianapolis and Bloomington via I-Light.
Critical for biomedical data, which is
often irreplaceable.
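The value of a second copy depends on being able to confirm that it matches the first. The sketch below shows one generic way to check a replica by comparing SHA-256 checksums file by file. It is not the HPSS interface and does not describe how IU's replication is implemented; the directory names, file names, and function names are hypothetical stand-ins for two storage sites.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(primary: Path, replica: Path) -> List[str]:
    """Return relative paths that are missing from, or differ in, the replica."""
    problems = []
    for source in sorted(p for p in primary.rglob("*") if p.is_file()):
        relative = source.relative_to(primary)
        copy = replica / relative
        if not copy.is_file() or sha256_of(source) != sha256_of(copy):
            problems.append(relative.as_posix())
    return problems


# Demonstration with throwaway directories standing in for two sites.
with tempfile.TemporaryDirectory() as tmp:
    site_a = Path(tmp) / "site_a"
    site_b = Path(tmp) / "site_b"
    site_a.mkdir()
    site_b.mkdir()
    (site_a / "scan1.dat").write_bytes(b"image data")
    (site_b / "scan1.dat").write_bytes(b"image data")   # already replicated
    (site_a / "scan2.dat").write_bytes(b"more data")    # not yet replicated
    demo_mismatches = find_mismatches(site_a, site_b)

print(demo_mismatches)  # ['scan2.dat']
```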
STK Silo
©2000 Tyagan Miller
Advanced Visualization
• Advanced Visualization Lab – recognized as leader
in implementation of 3D and other advanced
visualization technologies
• CAVE – Immersive 3D environment
• John-E-Box – IU designed, low-cost passive 3D
device. Under construction now, planned for
installation in multiple INGEN-affiliated labs
John-E-Box
Invented by John N. Huffman, John C.
Huffman, and Eric Wernert
Specific benefits in hardware
environment as a result of
INGEN funding:
• Funded significant fraction of upgrade of IU’s IBM
SP to 1 TFLOPS
• Funded addition of STK Silo in Indianapolis (and
tapes) to provide redundant storage of data
• Funded placement of visualization equipment within
the School of Medicine
So, what now that we have
all of this hardware?
• Strategic relationships with vendors
• University Information Technology Services has a history of
excellent customer support and long-term, collaborative research.
• Focus on provision of facilities and services as a competitive
advantage.
• Annual customer satisfaction survey – user satisfaction typically >
95%. These results are probably not representative of the School of Medicine as of 2000.
• More information available at
http://www.indiana.edu/~rac/siguccs_copyright.html
• It’s people – consulting staff – that make the hardware useful for
researchers
INGEN IT Core Support Staff
• Visualization programmer, HPC programmer,
and bioinformatics database specialist hired to
support INGEN
• Staff added to existing management units within
UITS
• economy of scale (management, exchange of
expertise)
• Ensures that INGEN staff add to, rather than substitute for, base-funded consulting support
So, why is this better than
just buying me a new PC?
• Unique facilities provided by IT Core
• Redundant data storage
• HPC – better uniprocessor performance; trivially
parallel programming, parallel programming
• Visualization in the research laboratories
• Hardcopy document – INGEN's advanced IT
facilities: The least you need to know
• Outreach efforts
• Demonstration projects
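"Trivially parallel" work consists of many independent tasks that never need to communicate, so each can run on its own processor. The sketch below illustrates the pattern with Python's standard `concurrent.futures` module. The workload is a made-up placeholder, not any INGEN analysis, and `analyze_subject` and `run_all` are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor


def analyze_subject(subject_id):
    """Stand-in for one independent analysis job (hypothetical workload)."""
    # A deterministic synthetic signal; a real job would load and process data.
    samples = [((subject_id * 31 + i * 17) % 101) / 100 for i in range(10_000)]
    return subject_id, sum(samples) / len(samples)


def run_all(subject_ids, max_workers=4):
    """Run every job in its own worker process; jobs share no state."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(analyze_subject, subject_ids))


if __name__ == "__main__":
    results = run_all(range(8))
    for subject_id, mean in sorted(results.items()):
        print(subject_id, round(mean, 3))
```

Because the tasks are independent, the same function could be run serially on a PC or spread across as many processors as are available without changing the analysis code.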
Example projects
• Multiple simultaneous Matlab jobs for brain imaging.
• Installation of many commercial and open source
bioinformatics applications.
• Site licenses for several commercial packages
• Evaluation of several software products that were not
implemented.
Creation of new software
• Gamma Knife – Penelope. Modified existing version for
more precise targeting with IU's Gamma Knife.
• Karyote (TM) Cell model. Developed a portion of the code
used to model cell function.
http://biodynamics.indiana.edu/
• PiVNs. Software to visualize human family trees
• 3-DIVE (3D Interactive Volume Explorer).
http://www.avl.iu.edu/projects/3DIVE/
• fastDNAml – maximum likelihood phylogenies
(http://www.indiana.edu/~rac/hpc/fastDNAml/index.html)
• Protein Family Annotator – collaborative development with
IBM, Inc.
Data Integration
• Goal set by IU School of Medicine: Any researcher
within the IU School of Medicine should be able to
transparently query all relevant public external data
sources and all sources internal to the IU School of
Medicine to which the researcher has read privileges
• IU has more than 1 TB of biomedical data stored in
massive data storage system
• There are many public data sources
• Different labs were independently downloading,
subsetting, and formatting data
• Solution: IBM DiscoveryLink and DB2 Information
Integrator
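The idea of querying several sources through one interface can be illustrated on a small scale with SQLite, which lets a single connection attach a second database and join across both. This is an analogy only: it does not use DiscoveryLink or DB2 Information Integrator, and every table, column, and value below is invented for the example.

```python
import sqlite3

# The main database stands in for a public data source.
conn = sqlite3.connect(":memory:")
# A second, attached database stands in for a lab's own data.
conn.execute("ATTACH DATABASE ':memory:' AS lab")

conn.execute("CREATE TABLE main.genes (gene_id TEXT PRIMARY KEY, description TEXT)")
conn.execute("CREATE TABLE lab.expression (gene_id TEXT, sample TEXT, level REAL)")

conn.executemany(
    "INSERT INTO main.genes VALUES (?, ?)",
    [("G1", "hypothetical kinase"), ("G2", "hypothetical transporter")],
)
conn.executemany(
    "INSERT INTO lab.expression VALUES (?, ?, ?)",
    [("G1", "s1", 2.5), ("G1", "s2", 3.5), ("G2", "s1", 0.4)],
)

# One query spans both sources; the caller does not need to know where each table lives.
rows = conn.execute(
    """
    SELECT g.gene_id, g.description, AVG(e.level)
    FROM main.genes AS g
    JOIN lab.expression AS e ON e.gene_id = g.gene_id
    GROUP BY g.gene_id, g.description
    ORDER BY g.gene_id
    """
).fetchall()

for row in rows:
    print(row)
```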
Centralized Life Science Database
(CLSD)
• Based on use of IBM DiscoveryLink(TM) and DB2
Information Integrator(TM)
• Public data is still downloaded, parsed, and put
into a database, but now the process is
automated and centralized.
• Lab data and programs like BLAST are included
via DiscoveryLink's wrappers.
• Implemented in partnership with IBM Life
Sciences via IU-IBM strategic relationship in the
life sciences
• IU contributed the data parsers
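The download, parse, and load step described above can be sketched for one common format. The example below parses FASTA-formatted text and loads the records into a database table. It is a minimal illustration, not IU's parser code: the choice of FASTA, the use of SQLite, the table layout, and the sample records are all assumptions made for the example.

```python
import io
import sqlite3


def split_header(header):
    """Split 'id description...' into (id, description)."""
    identifier, _, description = header.partition(" ")
    return identifier, description


def parse_fasta(handle):
    """Yield (identifier, description, sequence) from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                identifier, description = split_header(header)
                yield identifier, description, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        identifier, description = split_header(header)
        yield identifier, description, "".join(chunks)


# Invented sample input standing in for a downloaded public file.
sample = ">seq1 hypothetical protein\nATGC\nGGTA\n>seq2\nTTAA\n"
records = list(parse_fasta(io.StringIO(sample)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (id TEXT PRIMARY KEY, description TEXT, sequence TEXT)")
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)
loaded_count = conn.execute("SELECT COUNT(*) FROM sequences").fetchone()[0]

print(records)
print(loaded_count)
```

Running one such parser centrally, on a schedule, replaces the separate download and formatting work that each lab would otherwise repeat.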
Status Overall
• So far, so good
• 108 users of IU’s supercomputers
• 104 users of massive data storage system
• Six new software packages created or enhanced, more than 20
packages installed for use by INGEN-affiliated researchers
• 1 TB of biomedical data stored in the massive data storage
system
• Three software packages made available as open source
software as direct result of INGEN
• The INGEN IT Core is providing services valued by traditionally
trained biomedical researchers as well as researchers in
bioinformatics, genomics, proteomics, etc.
Success in meeting goals?
• Work on Penelope code for Gamma Knife likely to be
first major transferable technology development.
Stands to improve efficacy of Gamma Knife
treatment at IU
• Excellent success in supporting basic research
• Development of open source software (licensed
under terms similar to the GNU Lesser General Public License) provides
opportunities for technology transfer
• Participation in grants and industrial partnerships
provides economic benefit for IU
Success factors
• Creation of new position, Chief Information Officer and
Associate Dean, within IU School of Medicine, and
significant improvement in basic IT infrastructure
within the IU School of Medicine
• INGEN has permitted IU to build on excellent IT
infrastructure
• Dedicated (but not isolated) staff supporting INGEN
researchers
• Commitment to customer service
• Outreach (in the proper formats)
Success factors, cont'd
• Scientific collaborations
• Strategy research on behalf of IU School of
Medicine
• Accountability
• Leveraging of industrial partnerships
Funding Support
• This research was supported in part by the Indiana Genomics
Initiative (INGEN). The Indiana Genomics Initiative (INGEN) of
Indiana University is supported in part by Lilly Endowment Inc.
• Joint Study Agreement with IBM, Inc. Protein Family Annotator:
School of Informatics - M Dalkilic, Center for Genomics and
Bioinformatics - P Cherbas, Univ. Information Technology Services &
INGEN IT Core - C Stewart.
• This work was supported in part by Shared University Research
grants from IBM, Inc. to Indiana University.
• This material is based upon work supported by the National Science
Foundation under Grant No. 0116050 and Grant No. CDA-9601632.
Any opinions, findings and conclusions or recommendations
expressed in this material are those of the author(s) and do not
necessarily reflect the views of the National Science Foundation
(NSF).
Additional Information
• Further information is available at
• ingen.iu.edu
• http://www.indiana.edu/~uits/rac/
• http://cgb.indiana.edu/
• http://www.ncsc.org/casc/paper.html
Acknowledgements (People)
• UITS Research and Academic Computing Division
managers: Mary Papakhian, David Hart, Stephen
Simms, Richard Repasky, Matt Link, John Samuel,
Eric Wernert, Anurag Shankar
• INGEN Staff: Andy Arenson, Chris Garrison, Huian Li,
Jagan Lakshmipathy, David Hancock
• UITS Senior Management: Associate Vice President
and Dean Christopher Peebles, RAC(Data) Director
Gerry Bernbom
• Assistance with this presentation: John Herrin, Malinda
Lingwall