Global Analysis of Arthropod Evolution – a successful grid project
Craig A. Stewart, Rainer Keller, Matthias Hess, Uwe Woessner, Martin Aumüller, Matthias Müller, Richard Repasky, David Hart, Huian Li, Donald K. Berry
University Information Technology Services, Indiana University
High Performance Computing Center Stuttgart
And many other contributors…
© Copyright Trustees of Indiana University 2004
License Terms
• Please cite this presentation as: Stewart, C.A., R. Keller, M. Hess, U. Wössner, M. Aumüller, M. Müller, R. Repasky, D. Hart, H. Li and D.K. Berry. Global grid analysis of arthropod evolution – a successful grid project. 2004. Presentation. Presented at: 7th HLRS Metacomputing and GRID Workshop (Stuttgart, Germany, 26 Apr 2004). Available from: http://hdl.handle.net/2022/14782
• Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document.
• Items indicated with a © or denoted with a source URL are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
• Except where otherwise noted, the contents of this presentation are copyright 2004 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.
Outline
• The SCxy conference and the HPC Challenge
• The biological problem
• The software used
• The global grid
• The results!
• Acknowledgements
The SCxy conference and the HPC Challenge
• Supercomputing Conference (sponsored by ACM and IEEE)
• High Performance Computing (HPC) Challenge
– demonstrates new capabilities in advanced computing systems
– (or sometimes silly supercomputer tricks)
Biological problem
Are hexapods a single evolutionary group? Are ecdysozoans a single evolutionary group?
A partial bestiary
All organism illustrations copyright Jennifer Fairman, 2003 (www.fairmanstudios.com). Used by agreement.
Software and data analysis
• Non-grid preparatory work
– Download sequences from NCBI (67 taxa, 12,162 bp, mitochondrial genes for 12 proteins)
– Align sequences with Multi-Clustal
– Determine rate parameters with TreePuzzle
• Grid preparatory work
– Analyze performance of fastDNAml with Vampir
– Meetings via Access Grid & COVISE
• The grid software
– PACX-MPI – Grid/MPI middleware
– COVISE – Collaboration and visualization
– fastDNAml – Maximum Likelihood phylogenetics
PACX-MPI
• A project of HLRS (High Performance Computing Center Stuttgart)
• PACX-MPI (PArallel Computer eXtension) enables seamless execution of MPI-conforming parallel applications on a Grid.
• The application is recompiled and linked with PACX-MPI.
• Communication between MPI processes within one system is done with the vendor MPI, while communication with other parts of the metacomputer is done via the connecting network.
• Key advantages:
– Optimized vendor MPI library is used.
– Two daemons (MPI processes) take care of communication between systems – allows bundling of communication.
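The routing idea described on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions, not PACX-MPI's actual API or implementation: the function name `route_messages`, the host names, and the message format are all hypothetical. It shows messages between ranks on the same system staying local (where the vendor MPI would be used), while messages between systems are grouped per pair of systems, as a pair of daemons might bundle them.

```python
from collections import defaultdict


def route_messages(messages, host_of):
    """Toy model of metacomputer routing (hypothetical, not the PACX-MPI API).

    messages: list of (src_rank, dst_rank, payload) tuples
    host_of:  dict mapping each rank to the name of the system it runs on
    Returns (local, bundles): messages that stay on one system, and
    messages grouped by (source system, destination system).
    """
    local = []
    bundles = defaultdict(list)
    for src, dst, payload in messages:
        if host_of[src] == host_of[dst]:
            # Same system: the vendor MPI library would handle this.
            local.append((src, dst, payload))
        else:
            # Different systems: hand over to the communication daemons,
            # which can bundle traffic for the same pair of systems.
            bundles[(host_of[src], host_of[dst])].append((src, dst, payload))
    return local, dict(bundles)


# Hypothetical layout: ranks 0-1 on one system, ranks 2-3 on another.
host_of = {0: "t3e", 1: "t3e", 2: "sp", 3: "sp"}
messages = [(0, 1, "a"), (0, 2, "b"), (1, 3, "c"), (2, 3, "d")]
local, bundles = route_messages(messages, host_of)
print("local:", local)
print("bundled:", bundles)
```

In this model the two cross-system messages end up in a single bundle, which is the property the slide highlights: wide-area traffic can be aggregated instead of being sent message by message.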
COVISE
• COllaborative VIsualization and Simulation Environment
• A project of HLRS (High Performance Computing Center
Stuttgart)
• Focus on collaborative and interactive use of supercomputers
• Interactive startup of calculation on a Computational Grid
• Real-time visualization of the results and the performance of computation.
fastDNAml
• ML analysis of phylogenetic trees based on DNA sequences
• Foreman/worker MPI program
• Heuristic search for best trees
• For 67 taxa: 2.12 × 10^109 trees
• Goal: 300 bootstraps, 10 jumbles per bootstrap – 3000 executions (more than 3x typical!)
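The size of the search space quoted for 67 taxa can be checked with a short calculation. The sketch below assumes the slide counts unrooted, strictly bifurcating trees, for which the standard formula is (2n-5)!! = 3 × 5 × ... × (2n-5); the function name is illustrative only. It also multiplies out the planned number of fastDNAml executions.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Number of distinct unrooted bifurcating trees on n_taxa labelled tips.

    Uses the double factorial (2n-5)!! = 3 * 5 * ... * (2n-5).
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd numbers 3, 5, ..., 2n-5
        count *= k
    return count


trees = unrooted_tree_count(67)
digits = str(trees)
# Leading digits and order of magnitude, truncated rather than rounded.
print(f"{digits[0]}.{digits[1:3]} x 10^{len(digits) - 1} trees for 67 taxa")

# Planned workload: 300 bootstraps, each with 10 jumbles.
bootstraps, jumbles = 300, 10
executions = bootstraps * jumbles
print(f"{executions} executions")
```

Because the count is far too large to enumerate, an exhaustive search is out of the question, which is why a heuristic search repeated over many bootstraps and jumbles is used instead.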
Why this project on a grid?
• Important & time-sensitive biological question requiring massive computer resources
• A biologically-oriented code that scales well
• Grid middleware environment & collaboration tool well suited to the task at hand
• Opportunity to create a grid spanning every continent on earth (except Antarctica)
The metacomputers
• Five metacomputers: One, Two, Three, Four, Five

System                  Processors  Country
Origin 2000             32          Spain
Linux cluster           64          Japan
Linux cluster           12          Australia
IBM SP                  32          US
T3E                     128         Germany
IBM SP                  64          US
Dec Alpha               4           Brazil
Sun Fire 6800           16          Singapore
Hitachi SR8000          32          Germany
Cray T3E                128         UK
Cray T3E                32          US
IBM SP (Blue Horizon)   32          US
Dec Alpha (Lemieux)     64          US
Linux system            1           Tunisia

Five functional units; 8 types of systems
(several on Top500 list); 6+ vendors; 641
processors; 9 countries, 6 continents
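The totals on this slide can be cross-checked against the per-system figures listed above. The sketch below assumes that the systems, processor counts, and countries on the slide appear in the same order, so that the n-th system pairs with the n-th count and the n-th country; the totals themselves do not depend on that pairing.

```python
# (system, processors, country); row pairing is an assumption based on listing order.
systems = [
    ("Origin 2000", 32, "Spain"),
    ("Linux cluster", 64, "Japan"),
    ("Linux cluster", 12, "Australia"),
    ("IBM SP", 32, "US"),
    ("T3E", 128, "Germany"),
    ("IBM SP", 64, "US"),
    ("Dec Alpha", 4, "Brazil"),
    ("Sun Fire 6800", 16, "Singapore"),
    ("Hitachi SR8000", 32, "Germany"),
    ("Cray T3E", 128, "UK"),
    ("Cray T3E", 32, "US"),
    ("IBM SP (Blue Horiz)", 32, "US"),
    ("Dec Alpha (Lemieux)", 64, "US"),
    ("Linux system", 1, "Tunisia"),
]

total_processors = sum(cpus for _, cpus, _ in systems)
countries = {country for _, _, country in systems}

print(f"{len(systems)} systems, {total_processors} processors, "
      f"{len(countries)} countries")
```

The sum and the number of distinct countries agree with the 641 processors and 9 countries stated on the slide.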
The results
• ~200 trees were analyzed during the course of the week
• The biological results are still being analyzed
• Our HPC Challenge project was awarded the prize for “Most geographically distributed application”
Things we learned
• Matching the coarseness of the parallelism to network speeds was important
• There was real value to the use of the metacomputer concept within the overall grid
• You can distribute a lot of machine computations, but less of the human work (=> simplicity is a virtue)
• Today there are few large-scale grids delivering computational services for biological computation in a persistent fashion
• The temporary grid we created ranks as one of the larger grids ever created for biological computing
For further information
• fastDNAml: http://www.indiana.edu/~rac/hpc/fastDNAml/
• PACX-MPI: www.hlrs.de/organization/pds/projects/pacx-mpi
• COVISE: www.hlrs.de/organization/vis/covise
• HLRS: www.hlrs.de
• UITS: uits.iu.edu
• Center for Genomics and Bioinformatics: www.cgb.indiana.edu
• SCxy: www.supercomp.org
• about.uits.iu.edu/divisions/rac/index.html
• about.uits.iu.edu/divisions/rac/pubsstaff.html
• ingen.iu.edu
• it.iu.edu
Acknowledgments
• This research was supported in part by the Indiana Genomics Initiative. The Indiana Genomics Initiative of Indiana University is supported in part by Lilly Endowment Inc.
• This work was supported in part by Shared University Research grants from IBM, Inc. to Indiana University.
• This material is based upon work supported by the National Science Foundation under Grant No. 0116050 and Grant No. CDA-9601632. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).
• Assistance with this presentation: John Herrin, Malinda Lingwall, W. Les Teach
• Thanks to the SciNet team and SC2003 organizers!
• This project was an outcome of a kind invitation from Prof. Dr. Michael Resch and HLRS to Craig Stewart last year.
Our partners
Rainer Keller, Matthias Hess – HLRS, University of Stuttgart
Richard Repasky – UITS, Indiana University
John Colbourne – Center for Genomics and Bioinformatics, Indiana University
Craig Stewart, David Hart – UITS, Indiana University
Jennifer Steinbachs – Center for Genomics and Bioinformatics, Indiana University
Uwe Woessner – HLRS, University of Stuttgart
Donald Berry – UITS, Indiana University
Matthias Mueller – HLRS, University of Stuttgart
Huian Li – UITS, Indiana University
Gary W. Stuart – Center for Genomics and Bioinformatics, Indiana University
Michael Resch – HLRS, University of Stuttgart
Eric Wernert – UITS, Indiana University
Martin Aumüller, Ulrich Lang – HLRS, University of Stuttgart
Markus Buchhorn – Australian National University
Hiroshi Takemiya – National Institute of Advanced Industrial Science & Technology, Japan
Rim Belhaj – ISET'Com, Tunisia
Wolfgang E. Nagel – ZHR, Technical University of Dresden
Sergiu Sanielevici – Pittsburgh Supercomputing Center
Sergio Takeo Kofuji – LCCA/CCE-USP
David Bannon – Victorian Partnership for Advanced Computing, Australia
Norihiro Nakajima – Japan Atomic Energy Research Institute
Rosa Badia – CEPBA-IBM Research Institute
Mark A. Miller – San Diego Supercomputer Center
Hyungwoo Park – Korea Institute of Science and Technology Information
Rick Stevens – Argonne National Laboratory
Fang-Pang Lin – National Center for High Performance Computing
John Brooke – Manchester Computing
David Moffett – Purdue University
Tan Tin Wee – National University of Singapore
Greg Newby – Arctic Region Supercomputer Center
J.C.T. Poole – CACR, Caltech
Ramched Hamza – Sup'com, Tunisia
Mary Papakhian, John N. Huffman – UITS, Indiana University
Leigh Grundhoeffer – UITS, Indiana University
Ray Sheppard – UITS, Indiana University
Peter Cherbas – Center for Genomics and Bioinformatics, Indiana University
Stephen Pickles, Neil Stringfellow – CSAR, University of Manchester
Thank you!
Questions?