Penguin Computing and Indiana University partner for “above campus” and campus bridging services to the community
Craig A. Stewart ([email protected])
Executive Director, Pervasive Technology Institute
Associate Dean, Research Technologies
Associate Director, CREST
Matthew Link
Director, Systems, Research Technologies
Associate Director, CREST
George Turner
Manager, High Performance Systems
William K. Barnett
Director, National Center for Genome Analysis Support
Associate Director, Center for Applied Cybersecurity Research, PTI
Indiana University - pti.iu.edu
Presented SC11 Exhibits Hall, Nov 14-17, IEEE/ACM SC11 conference, Seattle, WA
License terms
• Please cite this presentation as: Stewart, C.A., M.R. Link, G. Turner, W.K. Barnett. 2011. Penguin Computing and Indiana University partner for “above campus” and campus bridging services to the community. Presented SC11 Exhibits Hall, Nov 14-17, IEEE/ACM SC11 conference, Seattle, WA. http://hdl.handle.net/2022/13880
• Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document.
• Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
• Except where otherwise noted, the contents of this presentation are copyright 2011 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share (to copy, distribute, and transmit the work) and to remix (to adapt the work) under the following condition: attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.
Some CI resources available to science and engineering researchers in US (March 2011)
[Bar chart of estimated capacity in TFLOPS (0 to 12,000) by resource category: NSF Track 1; Track 2 and other major facilities; campus HPC/Tier 3 systems; workstations at Carnegie research…; volunteer computing; commercial cloud (IaaS and PaaS)]
Based on: Welch, V.; Sheppard, R.; Lingwall, M.J.; Stewart, C. A. 2011. Current structure and past history of US
cyberinfrastructure (data set and figures). http://hdl.handle.net/2022/13136
Adequacy of research CI
[Pie chart of responses: Never (10.6%); Some of the time (20.2%); Most of the time (40.2%)]
Responses to a question asking whether researchers had sufficient access to cyberinfrastructure resources. Survey sent to 5,000 researchers selected randomly from 34,623 researchers funded by NSF as Principal Investigators 2005-2009; results based on 1,028 responses.
Stewart, C.A., D.S. Katz, D.L. Hart, D. Lantrip, D.S. McCaulay and R.L. Moore. Technical Report: Survey of
cyberinfrastructure needs and interests of NSF-funded principal investigators. 2011. http://hdl.handle.net/2022/9917
Clouds look serene enough
Photo by http://www.flickr.com/photos/mnsc/
http://www.flickr.com/photos/mnsc/2768391365/sizes/z/in/photostream/
http://creativecommons.org/licenses/by/2.0/
But is ignorance bliss?
• In the cloud, do you know:
– Where your data are?
– What laws prevail over the physical location of your data?
– What license you really agreed to?
– What is the security (electronic / physical) around your data?
– And how exactly do you get to that cloud, or get things out of it?
– How financially secure your provider is? (The fact that something seems unimaginable, like such-and-such cloud provider going out of business abruptly, does not mean it is impossible!)
Cloud computing - NIST
• Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
• This cloud model promotes availability and is composed of five essential characteristics (On-demand self-service, Broad network access, Resource pooling, Rapid elasticity, Measured service); three service models (Cloud Software as a Service (SaaS), Cloud Platform as a Service (PaaS), Cloud Infrastructure as a Service (IaaS)); and four deployment models (Private cloud, Community cloud, Public cloud, Hybrid cloud).
• Key enabling technologies include: (1) fast wide-area networks, (2) powerful, inexpensive server computers, and (3) high-performance virtualization for commodity hardware.
• Source: http://www.nist.gov/itl/cloud/index.cfm
Above Campus services
• “We are seeing the early emergence of a meta-university — a transcendent, accessible, empowering, dynamic, communally constructed framework of open materials and platforms on which much of higher education worldwide can be constructed or enhanced.” (Charles Vest, president emeritus of MIT, 2006)
• Goal: achieve economies of scale while retaining a reasonable measure of control
• See: Brad Wheeler and Shelton Waggener. 2009. Above-Campus Services: Shaping the Promise of Cloud Computing for Higher Education. EDUCAUSE Review, vol. 44, no. 6 (November/December 2009): 52-67. www.educause.edu/EDUCAUSE+ReviewEDUCAUSEReviewMagazineVolume44AboveCampusServicesShapingtheP/185222
Penguin Computing and IU partner for “Cluster as a Service”
• Just what it says: Cluster as a Service
• Cluster physically located on IU’s campus, in IU’s Data Center
• Available to anyone at a .edu or an FFRDC (Federally Funded Research and Development Center)
• To use it:
– Go to podiu.penguincomputing.com
– Fill out the registration form
– Verify via your email
– Get out your credit card
– Go computing
• This builds on Penguin’s experience: Penguin currently hosts Life Technologies’ BioScope and LifeScope in the cloud (http://lifescopecloud.com)
We know where the data are … and they are secure
POD IU (Rockhopper) specifications
Server Information
Architecture: Penguin Computing Altus 1804
TFLOPS: 4.4
Clock Speed: 2.1 GHz
Nodes: 11 compute; 2 login; 4 management; 3 servers
CPUs: 4 x 2.1 GHz 12-core AMD Opteron 6172 processors per compute node
Memory Type: Distributed and Shared
Total Memory: 1408 GB
Memory per Node: 128 GB 1333 MHz DDR3 ECC
Local Scratch Storage: 6 TB locally attached SATA2
Cluster Scratch: 100 TB Lustre

Further Details
OS: CentOS 5
Network: QDR (40 Gb/s) InfiniBand; 1 Gb/s Ethernet
Job Management Software: SGE
Job Scheduling Software: SGE
Job Scheduling Policy: Fair Share
Access: key-based ssh login to head nodes; remote job control via Penguin’s PODShell
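Given the table above (SGE for job management and scheduling, key-based ssh access), submitting work from a login node might look like the minimal sketch below. This is an illustration under assumptions, not a documented Rockhopper recipe: the job name, file names, and echo payload are invented; only qsub and the standard SGE directive syntax are taken as given.

#!/usr/bin/env python
# Minimal sketch: write an SGE job script and submit it with qsub.
# Assumes execution on a login node after key-based ssh; the job
# name, script file, and echo payload are illustrative placeholders.
import subprocess

job_script = """#!/bin/bash
#$ -N hello_rockhopper
#$ -cwd
#$ -j y
echo "Hello from $(hostname)"
"""

with open("hello.sge", "w") as handle:
    handle.write(job_script)

# qsub hands the script to SGE; the Fair Share policy noted in the
# table above determines when the job actually runs.
subprocess.run(["qsub", "hello.sge"], check=True)

Job status would then be checked with SGE’s qstat in the usual way.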
Available applications at POD IU (Rockhopper)
COAMPS: Coupled Ocean/Atmosphere Mesoscale Prediction System.
Desmond: a software package developed at D. E. Shaw Research to perform high-speed molecular dynamics simulations of biological systems on conventional commodity clusters.
GAMESS: a program for ab initio molecular quantum chemistry.
Galaxy: an open, web-based platform for data-intensive biomedical research.
GROMACS: a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
HMMER: used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments.
Intel: compilers and libraries.
LAMMPS: a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.
MM5: the PSU/NCAR mesoscale model, a limited-area, nonhydrostatic, terrain-following sigma-coordinate model designed to simulate or predict mesoscale atmospheric circulation. The model is supported by several pre- and post-processing programs, which are referred to collectively as the MM5 modeling system.
mpiBLAST: a freely available, open-source, parallel implementation of NCBI BLAST.
NAMD: a parallel molecular dynamics code for large biomolecular systems.
Available applications at POD IU (Rockhopper), continued
NCBI-Blast: the Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.
OpenAtom: a highly scalable and portable parallel application for molecular dynamics simulations at the quantum level. It implements the Car-Parrinello ab-initio Molecular Dynamics (CPAIMD) method.
OpenFoam: the OpenFOAM® (Open Field Operation and Manipulation) CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. It has a large user base across most areas of engineering and science, from both commercial and academic organisations. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics.
OpenMPI: InfiniBand-based Message Passing Interface 2 (MPI-2) implementation (a parallel-job sketch follows this list).
POP: an ocean circulation model derived from earlier models of Bryan, Cox, Semtner and Chervin, in which depth is used as the vertical coordinate. The model solves the three-dimensional primitive equations for fluid motions on the sphere under hydrostatic and Boussinesq approximations.
Portland Group: compilers.
R: a language and environment for statistical computing and graphics.
WRF: the Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. It features multiple dynamical cores, a 3-dimensional variational (3DVAR) data assimilation system, and a software architecture allowing for computational parallelism and system extensibility.
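For the MPI-capable packages listed (mpiBLAST, GROMACS, LAMMPS, and others running over OpenMPI), a parallel job adds an SGE parallel-environment request and an mpirun launch line. In the sketch below, the parallel environment name "openmpi", the slot count, and the database and query file names are assumptions about site configuration rather than documented Rockhopper settings.

#!/usr/bin/env python
# Sketch: submit an MPI job (mpiBLAST under OpenMPI) through SGE.
# The parallel environment name "openmpi", the slot count, and the
# data file names are hypothetical; adjust to the actual site setup.
import subprocess

slots = 24  # number of MPI ranks to request (placeholder)
job_script = f"""#!/bin/bash
#$ -N mpiblast_demo
#$ -cwd
#$ -j y
#$ -pe openmpi {slots}
mpirun -np {slots} mpiblast -p blastn -d nt_db -i query.fasta -o hits.txt
"""

with open("mpiblast.sge", "w") as handle:
    handle.write(job_script)

# Submit to SGE; mpirun launches one mpiblast rank per granted slot.
subprocess.run(["qsub", "mpiblast.sge"], check=True)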
IU / POD as an example of effective campus bridging
• The goal of campus bridging is to enable the seamlessly integrated use among a scientist or engineer’s personal cyberinfrastructure; cyberinfrastructure on the scientist’s campus; cyberinfrastructure at other campuses; and cyberinfrastructure at the regional, national, and international levels; as if they were proximate to the scientist.
– Short form: the goal of campus bridging is to make local, regional, and national cyberinfrastructure facilities appear as if they were peripherals to your laptop
• We remember that the speed of light is fixed, but latency is not the biggest problem!
• The biggest problems:
– Not enough CI resources available to most researchers
– When you go from your campus to the national cyberinfrastructure it can feel like you are falling off a cliff! That’s why you need bridging….
• More info on campus bridging at http://pti.iu.edu/campusbridging/
• IU is collaborating with Penguin Computing to support the national research community in general and in particular with two NSF-funded projects:
– eXtreme Science and Engineering Discovery Environment (XSEDE)
– National Center for Genome Analysis Support
XSEDE and Penguin – part 1
• XSEDE (eXtreme Science and Engineering Discovery Environment) is a project, an institution, and a set of services.
– As a project, XSEDE is a five-year, $121 million grant award made by the National Science Foundation (NSF) to the National Center for Supercomputing Applications (NCSA) at the University of Illinois and its partners via program solicitation NSF 08-571. XSEDE is a successor to the NSF-funded TeraGrid project, which itself succeeded the NSF supercomputer center program that began in the 1980s.
– As an institution, XSEDE is a collaboration led by NCSA and 18 partner organizations to deliver a series of instantiations of services, each instantiation being developed through a formal systems engineering process.
– As a set of services, XSEDE integrates supercomputers, visualization and data analysis resources, data collections, and software into a single virtual system for enhancing the productivity of scientists, engineers, social scientists, and humanities experts.
XSEDE and Penguin – part 2
• Under TeraGrid, it was never possible to buy “TeraGrid-like” cycles, and many people viewed the allocation process as very slow
• XSEDE is speeding up the allocation process considerably
• IU is working with Penguin Computing to install the basic open source XSEDE software environment on Rockhopper
• It will be possible to buy “XSEDE-like” cycles in a matter of minutes using a credit card
• In some circumstances this will be a much better way to meet peak needs, or use startup funds, than buying and installing “clusters in a closet.”
NCGAS, POD IU, and campus bridging
• The National Center for Genome Analysis Support
• A Cyberinfrastructure Service Center affiliated with the Pervasive Technology Institute at Indiana University (http://pti.iu.edu)
• Dedicated to supporting life science researchers who need computational support for genomics analysis
• Initially funded by the National Science Foundation Advances in Biological Informatics (ABI) program, grant #1062432
• Provides access to genomics analysis software on supercomputers customized for genomics studies, including POD IU
• Particularly focused on supporting genome assembly codes (a run sketch follows this list) such as:
– de Bruijn graph methods: SOAPdenovo, Velvet, ABySS
– consensus methods: Celera, Newbler, Arachne 2
• For more information, see http://ncgas.org
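As a concrete picture of the de Bruijn graph assemblers named above, the sketch below chains Velvet’s two standard stages: velveth hashes the reads into k-mers, then velvetg builds the graph and writes contigs. The read file, output directory, and k-mer length are placeholder choices, not NCGAS recommendations.

#!/usr/bin/env python
# Sketch of a minimal Velvet run (a de Bruijn graph assembler).
# reads.fastq, the output directory, and k=31 are illustrative only.
import subprocess

out_dir = "velvet_asm"
kmer = "31"  # hash (k-mer) length; must be odd

# Stage 1: velveth hashes the reads into k-mers.
subprocess.run(["velveth", out_dir, kmer, "-fastq", "-short", "reads.fastq"],
               check=True)
# Stage 2: velvetg builds the de Bruijn graph and writes contigs
# to velvet_asm/contigs.fa.
subprocess.run(["velvetg", out_dir, "-cov_cutoff", "auto"], check=True)

On a cluster such as Rockhopper, these two commands would normally be placed in an SGE job script like the earlier sketches rather than run on a login node.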
Summary
• IU and its partners are collaborating with Penguin Computing Inc. to implement a new model of ‘above campus’ services that provides many of the advantages of “cloud” services while avoiding many of the drawbacks.
• The service provided is “Cluster as a Service”: a real, high-performance supercomputer cluster
• Access is simple: if you are at a .edu or an FFRDC, get out your credit card and go computing
• As examples of effective campus bridging:
– This service is being supported by the IU National Center for Genome Analysis Support
– IU is providing the open source components of the XSEDE software environment to provide an “XSEDE-like” environment that you can access in minutes with a credit card
• Establishing this partnership was possible through the involvement of our key academic partners: the University of California, Berkeley; the University of Virginia; and the University of Michigan
Absolutely Shameless Plugs
• XSEDE12: Bridging from the eXtreme to the campus and beyond
• July 16-20, 2012 | Chicago
• The XSEDE12 Conference will be held at the beautiful InterContinental Chicago (Magnificent Mile) at 505 N. Michigan Ave. The hotel is in the heart of Chicago’s most interesting tourist destinations and best shopping.
• Watch for Calls for Participation, coming early January
• And please visit the XSEDE and IU displays in the SC11 Exhibits Hall!
For more information…
• https://podiu.penguincomputing.com/
• http://pti.iu.edu/ci/systems/rockhopper
Thanks
• Penguin Computing, Inc. for their willingness to forge new paths with IU
• Staff of the Research Technologies Division of University Information Technology Services, affiliated with the Pervasive Technology Institute, who were involved in the implementation of Rockhopper: George Turner, Robert Henschel, David Y. Hancock, Matthew R. Link, Richard Knepper
• Those involved in campus bridging activities: Guy Almes, Von Welch, Patrick Dreher, Jim Pepin, Dave Jent, Stan Ahalt, Bill Barnett, Therese Miller, Malinda Lingwall, Maria Morris, Gabrielle Allen, Jennifer Schopf, Ed Seidel
• All of the IU Research Technologies and Pervasive Technology Institute staff who have contributed to the development of IU’s advanced cyberinfrastructure and its support
• NSF for funding support (Awards 040777, 1059812, 0948142, 1002526, 0829462, 1062432, and OCI-1053575, which supports the Extreme Science and Engineering Discovery Environment)
• Lilly Endowment, Inc. and the Indiana University Pervasive Technology Institute
• Any opinions presented here are those of the presenter and do not necessarily represent the opinions of the National Science Foundation or any other funding agencies