Penguin Computing / IU Partnership
HPC “cluster as a service” and
Cloud Services
CASC Spring Meeting 2012
Craig A. Stewart ([email protected])
Executive Director, Pervasive Technology Institute
Associate Dean, Research Technologies
Matthew Jacobs
SVP Corporate Development
Penguincomputing.com
Please cite as:
Stewart, C.A. and M. Jacobs. 2012. “Penguin Computing /
IU Partnership HPC ‘cluster as a service’ and Cloud
Services.” Presentation. Presented at the Coalition for
Academic Scientific Computation (CASC) meeting, 29 February 2012,
Arlington, VA. http://hdl.handle.net/2022/14441
The image on slide 1 (title slide) and slides 3–7 are ©
Penguin Computing, Inc., all rights reserved, and may not be
reused without permission from Penguin Computing, Inc.
Other slides (except where explicitly noted) are copyright
2011 by the Trustees of Indiana University, and this
content is released under the Creative Commons
Attribution 3.0 Unported license
(http://creativecommons.org/licenses/by/3.0/)
2
What is POD
 On-demand HPC system
> Compute, storage, low latency fabrics, GPU, non-virtualized
 Robust software infrastructure
> Full automation
> User and administration space controls
> Secure and seamless job migration
> Extensible framework
> Complete billing infrastructure
> Internet (150 Mb, burstable to 1 Gb)
 Services
> Custom product design
> Site and workflow integration
> Managed services
> Application support
 HPC support expertise
> Skilled HPC administrators
> Leverage 13 yrs serving HPC market
3
Penguin HPC Cloud Services
 Penguin Computing on Demand
> True HPC in the cloud on a pay-as-you-go basis
> Overflow, targeted workloads, targeted user sets
 Post-Purchase Collocation
> Collocation services provided by Penguin
> Cost reduction, budget reallocation
 Public-Private On-Demand Partnerships
> Penguin-owned and operated PODs, hosted at academic or government facilities
> Revenue sharing, augment local resources, self-sustaining growth
 POD Hybrid
> On-premise cluster sized to mean usage, plus POD for peak demand
> Save on initial capital outlay while sustaining a high service level to users
 OEM HPC Cloud
> POD distribution to internal or external customers
> Augment local resources and expertise, fund growth
 HPC SaaS Platform
> Hosting platform for SaaS providers
> On-demand delivery platform for ISVs
 Turnkey Managed Services
> Remote managed services for Penguin and non-Penguin clusters
> Augment local expertise, reduce costs
4
Scyld HPC Cloud Management System
Create and Manage User and Group Hierarchies
Simultaneously Manage Multiple Collocated Clusters
Create Customer Facing Web Portals
Use Web Services to Integrate with Back-End Systems
Deploy HTML5 Based Cluster Management Tools
Securely Migrate User Workloads
Efficiently Schedule and Manage Cluster Resources
Create and Deploy Virtual Headnodes for User-Specific Clusters
Created by POD Developers and Administrators
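The web-services hooks mentioned above are not documented on this slide, so the sketch below is only a generic illustration of how a back-end system could poll a cluster-management service over HTTPS. The host name, endpoint path, bearer-token header, and JSON fields are hypothetical and are not Scyld's actual API.

```python
# Generic sketch of polling a cluster-management web service over HTTPS.
# The base URL, endpoint, and response fields are HYPOTHETICAL examples,
# not the real Scyld Cloud Management API.
import json
import urllib.request

BASE_URL = "https://cloud-manager.example.edu/api/v1"  # hypothetical host


def list_clusters(api_token):
    """Fetch the list of managed clusters as JSON (hypothetical endpoint)."""
    req = urllib.request.Request(
        BASE_URL + "/clusters",
        headers={"Authorization": "Bearer " + api_token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for cluster in list_clusters("YOUR-API-TOKEN"):
        print(cluster["name"], cluster["state"])  # hypothetical fields
```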
5
12 Million Commercial Jobs and Counting…
 Current data centers: Salt Lake City,
Indiana University, Mountain View
 1,500 cores (AMD and Intel)
 240 TB on-demand storage
 Replaced in-house image analysis cluster with POD and co-located storage
 Provides cloud analysis services on POD for worldwide bioinformatics customers
 Replaced Amazon AWS cloud usage with the PODTools workflow migration system
 Nihon ESI provides crash analyses to Honda R&D during Japan's brown-outs
6
The POD Advantage
 Persistent, customized user environment
 High-speed Intel and AMD compute nodes (physical)
 Fast access to local storage (data guaranteed to be local)
 Highly secure (https, shared key authentication, IP matching, VPN)
 Billed by the fractional core hour
 HPC expertise included (Penguin’s core business for many years)
 Cluster software stack included
 Troubleshooting included in support
 Collocated storage options available
 Highly dependable and dynamically scalable
7
Clouds look serene enough - But is ignorance bliss?
 In the cloud, do you know:
> Where your data are?
> What laws prevail over the physical location of your data?
> What license you really agreed to?
> What is the security (electronic / physical) around your data?
> And how exactly do you get to that cloud, or get things out of it?
> How secure your provider is financially? (The fact that something seems unimaginable, like cloud provider such-and-such going out of business abruptly, does not mean it is impossible!)
Photo by http://www.flickr.com/photos/mnsc/: http://www.flickr.com/photos/mnsc/2768391365/sizes/z/in/photostream/ (http://creativecommons.org/licenses/by/2.0/)
8
Penguin Computing & IU partner for “Cluster as a Service”
 Just what it says: Cluster as a Service
 Cluster physically located on IU’s campus, in IU’s Data Center
 Available to anyone at a .edu or FFRDC (Federally Funded
Research and Development Center)
 To use it:
> Go to podiu.penguincomputing.com
> Fill out registration form
> Verify via your email
> Get out your credit card
> Go computing
 This builds on Penguin's experience - Penguin currently hosts Life
Technologies' BioScope and LifeScope in the cloud
(http://lifescopecloud.com)
9
We know where the data are … and they are secure
10
An example of NET+ Services / Campus Bridging
 "We are seeing the early emergence of a meta-university — a
transcendent, accessible, empowering, dynamic, communally
constructed framework of open materials and platforms on which
much of higher education worldwide can be constructed or
enhanced.” Charles Vest, president emeritus of MIT, 2006
 NET+ Goal: achieve economies of scale and retain a reasonable
measure of control. See: Brad Wheeler and Shelton Waggener.
2009. Above-Campus Services: Shaping the Promise of Cloud
Computing for Higher Education. EDUCAUSE Review, vol. 44, no. 6
(November/December 2009): 52-67.
 Campus Bridging goal – make it all feel like it’s just a peripheral
to your laptop (see pti.iu.edu/campusbridging)
11
IU POD – Innovation Through Partnership
 True On-Demand HPC for Internet2
 Creative Public/Private model to address HPC shortfall
 Turning lost EC2 dollars into central IT expansion
 Tiered channel strategy expansion to EDU sector
 Program and discipline-specific enhancements under way
 Objective third party resource for collaboration
> EDU, Federal and Commercial
12
POD IU (Rockhopper) specifications
Server Information
Architecture: Penguin Computing Altus 1804
TFLOPS: 4.4
Clock Speed: 2.1 GHz
Nodes: 11 compute; 2 login; 4 management; 3 servers
CPUs: 4 x 2.1 GHz 12-core AMD Opteron 6172 processors per compute node
Memory Type: Distributed and Shared
Total Memory: 1408 GB
Memory per Node: 128 GB 1333 MHz DDR3 ECC
Local Scratch Storage: 6 TB locally attached SATA2
Cluster Scratch: 100 TB Lustre
Further Details
OS: CentOS 5
Network: QDR (40 Gb/s) InfiniBand; 1 Gb/s Ethernet
Job Management Software: SGE
Job Scheduling Software: SGE
Job Scheduling Policy: Fair Share
Access: key-based SSH login to head nodes; remote job control via Penguin's PODShell (see the submission sketch below)
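Since Rockhopper is driven by SGE with key-based SSH access, a minimal batch submission from a head node might look like the sketch below. The parallel-environment name, run-time limit, and application command are illustrative assumptions, and PODShell remote submission is not shown because its syntax is not covered in these slides.

```python
#!/usr/bin/env python
# Minimal sketch: build and submit an SGE batch job from a Rockhopper
# head node (reached via key-based SSH). The parallel-environment name,
# run-time limit, and application command are illustrative assumptions.
import subprocess
import tempfile

JOB_SCRIPT = """#!/bin/bash
#$ -N example_job            # job name
#$ -cwd                      # run in the submission directory
#$ -pe orte 48               # 48 slots = one 4 x 12-core Opteron node (PE name assumed)
#$ -l h_rt=01:00:00          # one-hour wall-clock request
mpirun -np 48 ./my_mpi_app input.dat
"""


def submit(script_text):
    """Write the job script to disk, hand it to qsub, and return qsub's reply."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script_text)
        path = f.name
    out = subprocess.run(["qsub", path], capture_output=True, text=True, check=True)
    return out.stdout.strip()


if __name__ == "__main__":
    print(submit(JOB_SCRIPT))  # e.g. "Your job 12345 (example_job) has been submitted"
```

Under the Fair Share policy noted above, when a submitted job actually starts depends on the recent usage of the submitting account as well as on requested resources.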
13
Available applications at POD IU (Rockhopper)
COAMPS: Coupled Ocean/Atmosphere Mesoscale Prediction System.
Desmond: Desmond is a software package developed at D. E. Shaw Research to perform high-speed molecular dynamics simulations of biological systems on conventional commodity clusters.
GAMESS: GAMESS is a program for ab initio molecular quantum chemistry.
Galaxy: Galaxy is an open, web-based platform for data-intensive biomedical research.
GROMACS: GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
HMMER: HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments.
Intel: Compilers and libraries.
LAMMPS: LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.
MM5: The PSU/NCAR mesoscale model (known as MM5) is a limited-area, nonhydrostatic, terrain-following sigma-coordinate model designed to simulate or predict mesoscale atmospheric circulation. The model is supported by several pre- and post-processing programs, which are referred to collectively as the MM5 modeling system.
mpiBLAST: mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST.
NAMD: NAMD is a parallel molecular dynamics code for large biomolecular systems.
14
Available applications at POD IU (Rockhopper)
NCBI-Blast: The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.
OpenAtom: OpenAtom is a highly scalable and portable parallel application for molecular dynamics simulations at the quantum level. It implements the Car-Parrinello ab-initio Molecular Dynamics (CPAIMD) method.
OpenFOAM: The OpenFOAM® (Open Field Operation and Manipulation) CFD Toolbox is a free, open-source CFD software package produced by OpenCFD Ltd. It has a large user base across most areas of engineering and science, from both commercial and academic organisations. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics.
OpenMPI: InfiniBand-based Message Passing Interface 2 (MPI-2) implementation.
POP: POP is an ocean circulation model derived from earlier models of Bryan, Cox, Semtner and Chervin in which depth is used as the vertical coordinate. The model solves the three-dimensional primitive equations for fluid motions on the sphere under hydrostatic and Boussinesq approximations.
Portland Group: Compilers.
R: R is a language and environment for statistical computing and graphics.
WRF: The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. It features multiple dynamical cores, a 3-dimensional variational (3DVAR) data assimilation system, and a software architecture allowing for computational parallelism and system extensibility.
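Many of the packages listed on these two slides (e.g., GROMACS, NAMD, mpiBLAST, LAMMPS) are MPI codes launched through the OpenMPI stack inside an SGE job. Purely as an illustration, and with binary names, input files, and process counts as assumptions to be checked against each package's own documentation, a small helper for composing such a launch line could look like this:

```python
# Illustrative only: compose an OpenMPI launch line for one of the MPI
# applications listed above. Binary names, input files, and process
# counts are assumptions; consult each package's documentation.
import shlex


def mpi_command(n_procs, binary, *args):
    """Return an mpirun invocation as a list of argv tokens."""
    return ["mpirun", "-np", str(n_procs), binary] + list(args)


if __name__ == "__main__":
    # e.g. an assumed NAMD run over 48 processes (one Rockhopper compute node)
    cmd = mpi_command(48, "namd2", "apoa1.namd")
    # Print the line that would be placed in an SGE job script:
    print(" ".join(shlex.quote(token) for token in cmd))
```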
15