
Get the convenience of cloud while keeping your rights –
through the IU / Penguin Computing partnership
Craig A. Stewart ([email protected]) - Executive Director, Pervasive Technology Institute;
Associate Dean, Research Technologies; Associate Director, CREST
Matthew Jacobs – Senior Vice President, Corporate Development, Penguin
Computing
Barbara Hallock – Senior Systems Analyst
Richard Knepper - Manager, Campus Bridging and Research Infrastructure
William K. Barnett - Director, National Center for Genome Analysis Support; Director,
Science Community Tools; Associate Director, Center for Applied Cybersecurity
Research
Some CI resources available to science and engineering researchers in the US (March 2011)
[Chart: aggregate TFLOPS (scale 0 to 12,000) by resource type – NSF Track 1; Track 2 and other major facilities; campus HPC/Tier 3 systems; workstations at Carnegie research…; volunteer computing; commercial cloud (IaaS and PaaS)]
Based on: Welch, V.; Sheppard, R.; Lingwall, M.J.; Stewart, C.A. 2011. Current structure and past history of US cyberinfrastructure (data set and figures). hdl.handle.net/2022/13136
Adequacy of research CI
[Chart: Never (10.6%); Some of the time (20.2%); Most of the time (40.2%)]
Responses to asking whether researchers had sufficient access to cyberinfrastructure resources – survey sent to 5,000 researchers selected randomly from 34,623 researchers funded by NSF as Principal Investigators 2005-2009; results based on 1,028 responses.
Stewart, C.A., D.S. Katz, D.L. Hart, D. Lantrip, D.S. McCaulay and R.L. Moore. Technical Report: Survey of cyberinfrastructure needs and interests of NSF-funded principal investigators. 2011. hdl.handle.net/2022/9917
Clouds look serene enough
Photo by www.flickr.com/photos/mnsc/
www.flickr.com/photos/mnsc/2768391365/sizes/z/in/photostream
creativecommons.org/licenses/by/2.0
Cloud computing - NIST
• Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
• This cloud model promotes availability and is composed of five essential characteristics (on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service); three service models (Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)); and four deployment models (private cloud, community cloud, public cloud, hybrid cloud).
• Key enabling technologies include:
  – Fast wide-area networks
  – Powerful, inexpensive server computers
  – High-performance virtualization for commodity hardware
www.nist.gov/itl/cloud/index.cfm
But is cloud computing all the pundits claim?
• Where are your data?
• What laws prevail at the physical location of your data?
• What license did you agree to?
• Did you read the license terms?
  – “When you upload or otherwise submit content to our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works … communicate, publish, publicly perform, publicly display and distribute such content.” www.google.com/intl/en/policies/terms
• What is the security (electronic / physical) around your data?
• And how exactly do you get to that cloud, or get things out of it?
• How financially secure is your provider? (The fact that something seems unimaginable, like a cloud provider going out of business abruptly, does not mean it is impossible!)
• If you care about parallel performance, is a cloud provider the right solution?
Above-campus services – not exactly clouds
• Above-campus services
  – "We are seeing the early emergence of a meta-university – a transcendent, accessible, empowering, dynamic, communally constructed framework of open materials and platforms on which much of higher education worldwide can be constructed or enhanced.” Charles Vest, president emeritus of MIT, 2006
• Goal: achieve economies of scale while retaining a reasonable measure of control
• See: Brad Wheeler and Shelton Waggener. 2009. Above-Campus Services: Shaping the Promise of Cloud Computing for Higher Education. EDUCAUSE Review, vol. 44, no. 6 (November/December 2009): 52-67. www.educause.edu/EDUCAUSE+ReviewEDUCAUSEReviewMagazineVolume44AboveCampusServicesShapingtheP/185222
Penguin Computing and IU partner for “Cluster as a Service”
• Just what it says: Cluster as a Service
• Cluster physically located on IU’s campus, in IU’s Data Center
• Available to anyone at a .edu institution or FFRDC (Federally Funded Research and Development Center)
• To use it:
  – Go to podiu.penguincomputing.com
  – Fill out the registration form
  – Verify via your email
  – Get out your credit card
  – Go computing
• This builds on Penguin’s experience – currently hosting Life Technologies' BioScope and LifeScope in the cloud (lifescopecloud.com)
We know where the data are … and they are secure
POD IU (Rockhopper) specifications

Server Information
  – Architecture: Penguin Computing Altus 1804
  – TFLOPS: 4.4
  – Clock speed: 2.1 GHz
  – Nodes: 11 compute; 2 login; 4 management; 3 servers
  – CPUs: 4 x 2.1 GHz 12-core AMD Opteron 6172 processors per compute node
  – Memory type: Distributed and shared
  – Total memory: 1408 GB
  – Memory per node: 128 GB 1333 MHz DDR3 ECC
  – Local scratch storage: 6 TB locally attached SATA2
  – Cluster scratch: 100 TB Lustre

Further Details
  – OS: CentOS 5
  – Network: QDR (40 Gb/s) InfiniBand, 1 Gb/s Ethernet
  – Job management software: SGE
  – Job scheduling software: SGE
  – Job scheduling policy: Fair share
  – Access: Key-based ssh login to head nodes; remote job control via Penguin's PODShell
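Since Rockhopper schedules work through SGE, a batch job is described in a small script of scheduler directives plus the command to run. The sketch below is illustrative only: the job name, the parallel environment name (`orte`), the slot count, and the application binary (`./my_app`) are assumptions, not documented Rockhopper settings.

```shell
# Write a minimal SGE batch script. All names here are illustrative
# assumptions (job name, PE "orte", binary "./my_app"), not POD defaults.
cat > myjob.sh <<'EOF'
#!/bin/bash
#$ -N my_job            # job name
#$ -cwd                 # run from the submission directory
#$ -pe orte 24          # request 24 slots (assumed parallel environment name)
#$ -l h_rt=01:00:00     # one-hour wall-clock limit
mpirun -np $NSLOTS ./my_app
EOF
echo "Created myjob.sh; submit with 'qsub myjob.sh' and monitor with 'qstat'."
```

On a fair-share system like this one, the scheduler weighs each user's recent usage when ordering queued jobs, so the script requests only the resources it needs.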
Applications on POD IU (Rockhopper)
COAMPS – Coupled ocean/atmosphere mesoscale prediction system.
Desmond – Desmond is a software package developed at D. E. Shaw Research to perform high-speed molecular dynamics simulations of biological systems on conventional commodity clusters.
GAMESS – GAMESS is a program for ab initio molecular quantum chemistry.
Galaxy – Galaxy is an open, web-based platform for data-intensive biomedical research.
GROMACS – GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
HMMER – HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments.
Intel – Compilers and libraries.
LAMMPS – LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.
MM5 – The PSU/NCAR mesoscale model (known as MM5) is a limited-area, nonhydrostatic, terrain-following sigma-coordinate model designed to simulate or predict mesoscale atmospheric circulation. The model is supported by several pre- and post-processing programs, which are referred to collectively as the MM5 modeling system.
mpiBLAST – mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST.
NAMD – NAMD is a parallel molecular dynamics code for large biomolecular systems.
Applications on POD IU (Rockhopper) (2)

NCBI-BLAST – The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.
OpenAtom – OpenAtom is a highly scalable and portable parallel application for molecular dynamics simulations at the quantum level. It implements the Car-Parrinello ab-initio Molecular Dynamics (CPAIMD) method.
OpenFOAM – The OpenFOAM® (Open Field Operation and Manipulation) CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. It has a large user base across most areas of engineering and science, from both commercial and academic organisations. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics.
OpenMPI – InfiniBand-based Message Passing Interface 2 (MPI-2) implementation.
POP – POP is an ocean circulation model derived from earlier models of Bryan, Cox, Semtner, and Chervin in which depth is used as the vertical coordinate. The model solves the three-dimensional primitive equations for fluid motions on the sphere under hydrostatic and Boussinesq approximations.
Portland Group – Compilers.
R – R is a language and environment for statistical computing and graphics.
WRF – The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. It features multiple dynamical cores, a 3-dimensional variational (3DVAR) data assimilation system, and a software architecture allowing for computational parallelism and system extensibility.
More about POD – underlying technology
• On-demand HPC system
  – Compute, storage, low-latency fabrics, GPU, non-virtualized
• Robust software infrastructure
  – Full automation
  – Internet (150 Mb, burstable to 1 Gb)
  – User and administration space controls
  – Secure and seamless job migration
  – Extensible framework
  – Complete billing infrastructure
• Services
  – Custom product design
  – Site and workflow integration
  – Managed services
  – Application support
• HPC support expertise
  – Skilled HPC administrators
  – Leverage 13 years serving the HPC market
Scyld HPC Cloud Management System
• Create and manage user and group hierarchies
• Simultaneously manage multiple collocated clusters
• Create customer-facing web portals
• Use web services to integrate with back-end systems
• Deploy HTML5-based cluster management tools
• Securely migrate user workloads
• Efficiently schedule and manage cluster resources
• Create and deploy virtual head nodes for user-specific clusters
• Created by POD developers and administrators
12 Million Commercial Jobs and Counting…
• Replaced in-house image analysis cluster with POD and co-located storage
• Provides cloud analysis services on POD for worldwide bioinformatics customers
• Replaced Amazon AWS cloud usage with the PODTools workflow migration system
• Nihon ESI provides crash analysis services to Honda R&D during Japan’s brown-outs
• Current data centers: Salt Lake City, Indiana University, Mountain View
• 1,500 cores (AMD and Intel); 240 TB on-demand storage
The POD Advantage
• Persistent, customized user environment
• High-speed Intel and AMD compute nodes (physical)
• Fast access to local storage (data guaranteed to be local)
• Highly secure (https, shared-key authentication, IP matching, VPN)
• Billed by the fractional core hour
• HPC expertise included (Penguin’s core business for many years)
• Cluster software stack included
• Troubleshooting included in support
• Collocated storage options available
• Highly dependable and dynamically scalable
IU / POD - an example of campus bridging
• The goal of campus bridging is virtual proximity …
• The biggest problems:
– Not enough CI resources available to most researchers
– When you go from your campus to the national
cyberinfrastructure it can feel like you are falling off a cliff!
That’s why you need bridging….
• More info on campus bridging at pti.iu.edu/campusbridging
• IU is collaborating with Penguin Computing to support the
national research community in general and particularly with
two NSF-funded projects:
– eXtreme Science and Engineering Discovery Environment
(XSEDE)
– National Center for Genome Analysis Support
XSEDE and Penguin – part 1
• XSEDE (eXtreme Science and Engineering Discovery Environment) is a project, an institution, and a set of services.
– As a project, XSEDE is a five-year, $121 million grant award made by
the National Science Foundation (NSF) to the National Center for
Supercomputing Applications (NCSA) at the University of Illinois and its
partners via program solicitation NSF 08-571.
– XSEDE is a successor to the NSF-funded TeraGrid project
– As an institution, XSEDE is a collaboration led by NCSA and 18 partner
organizations to deliver a series of instantiations of services, each
instantiation being developed through a formal systems engineering
process.
– As a set of services, XSEDE integrates supercomputers, visualization
and data analysis resources, data collections, and software into a
single virtual system for enhancing the productivity of scientists,
engineers, social scientists, and humanities experts.
XSEDE and Penguin – part 2
• Under TeraGrid, it was never possible to buy “TeraGrid-like”
cycles, and many people viewed the allocation process as
very slow
• XSEDE is speeding up the allocation process considerably
• IU is working with Penguin Computing to install the basic open
source XSEDE software environment on Rockhopper
• For the first time ever, it is possible to buy “XSEDE-like” cycles in a matter of minutes using a credit card
• In some circumstances this will be a much better way to meet
peak needs, or use startup funds, than buying and installing
“clusters in a closet.”
NCGAS & POD IU
• The National Center for Genome Analysis Support
• A Cyberinfrastructure Service Center affiliated with the Indiana University Pervasive Technology Institute (pti.iu.edu)
• Dedicated to supporting life science researchers who need computational support for genomics analysis
• Initially funded by the National Science Foundation Advances in Biological Informatics (ABI) program, grant #1062432
• Provides access to genomics analysis software on supercomputers customized for genomics studies, including POD IU
• Particularly focused on supporting genome assembly codes such as:
  – de Bruijn graph methods: SOAPdenovo, Velvet, ABySS
  – Consensus methods: Celera, Newbler, Arachne 2
• For more information, see ncgas.org
Summary
• IU and its partners are collaborating with Penguin Computing Inc. to implement a new model of above-campus services that provides many of the advantages of cloud services while avoiding many of the drawbacks.
• The service provided is Cluster as a Service – a real, high-performance supercomputer cluster
• Access is simple – if you are at a .edu institution or an FFRDC, get out your credit card and go computing
• As examples of effective campus bridging:
  – This service is being supported by the IU National Center for Genome Analysis Support
  – IU is providing the open source components of the XSEDE software environment to provide an XSEDE-like environment that you can access in minutes with a credit card
• Establishing this partnership was possible through the involvement of our key academic partners: University of California Berkeley, University of Virginia, University of Michigan
For more information…
• podiu.penguincomputing.com
• pti.iu.edu/ci/systems/rockhopper
License terms
• Please cite this presentation as: Stewart, C.A., M. Jacobs, B. Hallock, R. Knepper and W.K. Barnett. Get the convenience of cloud while keeping your rights – through the IU / Penguin Computing partnership. 2012. Presentation. http://hdl.handle.net/2022/14704
• Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document.
• Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
• Except where otherwise noted, the contents of this presentation are copyright 2011 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (creativecommons.org/licenses/by/3.0). This license includes the following terms: You are free to share – to copy, distribute and transmit the work – and to remix – to adapt the work – under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.
Thanks
• Penguin Computing, Inc. for their willingness to forge new paths with IU
• Staff of the Research Technologies division of University Information Technology Services, affiliated with the Pervasive Technology Institute, who were involved in the implementation of Rockhopper: George Turner, Robert Henschel, David Y. Hancock, Matthew R. Link, Richard Knepper
• Those involved in campus bridging activities: Guy Almes, Von Welch, Patrick Dreher, Jim Pepin, Dave Jent, Stan Ahalt, Bill Barnett, Therese Miller, Malinda Husk, Maria Morris, Gabrielle Allen, Jennifer Schopf, Ed Seidel
• All of the IU Research Technologies and Pervasive Technology Institute staff who have contributed to the development of IU’s advanced cyberinfrastructure and its support
• NSF for funding support (Awards 040777, 1059812, 0948142, 1002526, 0829462, 1062432, OCI-1053575 – which supports the Extreme Science and Engineering Discovery Environment)
• Lilly Endowment, Inc. and the Indiana University Pervasive Technology Institute
• Any opinions presented here are those of the presenter and do not necessarily represent the opinions of the National Science Foundation or any other funding agencies.