
FutureGrid Computing Testbed as a Service Details

July 3 2013 Geoffrey Fox for FutureGrid Team

[email protected]

http://www.infomall.org http://www.futuregrid.org

School of Informatics and Computing Digital Science Center Indiana University Bloomington https://portal.futuregrid.org

Topics Covered

• Current Status
• Recap Overview
• Details of Hardware
• More sample FutureGrid Projects
• ScaleMP
• High Performance Cloud Infrastructure?
• Details – XSEDE Testing and FutureGrid
• Relation of FutureGrid to other Projects
• MOOC’s
• Services
• FutureGrid Futures
• Cloudmesh TestbedaaS Tool
• Security in FutureGrid
• Details of Image Generation on FutureGrid
• Details of Monitoring on FutureGrid
• Appliances available on FutureGrid

Current Status


Basic Status

• FutureGrid has been running for 3 years – 339 projects; 2009 users (September 13, 2013)
• Funding available through September 30, 2014 with a No Cost Extension, which can be submitted in mid-August (45 days prior to the formal expiration of the grant)
• Participated in Computer Science activities (call for white papers and presentation to the CISE director)
• Participated in OCI solicitations
• Pursuing GENI collaborations

Technology

• OpenStack becoming the best open source virtual machine management environment
  – Also more reliable than previous versions of OpenStack and Eucalyptus
  – Nimbus switching to an OpenStack core with projects like Phantom
  – In the past Nimbus was essential as the only reliable open source VM manager
• XSEDE integration has made major progress; 80% complete
• These improvements will allow much greater focus on TestbedaaS software
• Solicitations motivated adding “on-ramp” capabilities: develop code on FutureGrid
  – Burst or shift to other cloud or HPC systems (Cloudmesh)

Assumptions

• “Democratic” support of Clouds and HPC likely to be important
• As a testbed, offer bare metal or clouds on a given node
• Run HPC systems with similar tools to clouds, so HPC bursting as well as Cloud bursting
• Define images by templates that can be built for different HPC and cloud environments
• Education integration important (MOOC’s)

Recap Overview

FutureGrid Testbed as a Service

• FutureGrid is part of XSEDE, set up as a testbed with a cloud focus
• Operational since Summer 2010 (i.e. coming to the end of its third year of use)
• The FutureGrid testbed provides to its users:
  – Support of Computer Science and Computational Science research
  – A flexible development and testing platform for middleware and application users looking at interoperability, functionality, performance or evaluation
  – A user-customizable environment supporting Grid, Cloud and HPC, accessed interactively, with software run with and without VM’s
  – A rich education and teaching platform for classes
• Offers OpenStack, Eucalyptus, Nimbus, OpenNebula, HPC (MPI) on the same hardware, moving to software defined systems; supports both classic HPC and Cloud storage

FutureGrid Operating Model

• Rather than loading images onto VM’s, FutureGrid supports Cloud, Grid and Parallel computing environments by provisioning software as needed onto “bare-metal” or VM’s/hypervisors using (changing) open source tools
  – Image library for MPI, OpenMP, MapReduce (Hadoop, (Dryad), Twister), gLite, Unicore, Globus, Xen, ScaleMP (distributed shared memory), Nimbus, Eucalyptus, OpenNebula, KVM, Windows …
  – Either statically or dynamically
• Growth comes from users depositing novel images in the library
• FutureGrid is quite small with ~4700 distributed cores and a dedicated network

[Diagram: Image1, Image2, …, ImageN – Choose, Load, Run]
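The Choose–Load–Run flow above can be pictured as a small driver loop. Below is a minimal Python sketch under that interpretation; TestbedClient, choose_image and provision are invented names for illustration, not a real FutureGrid or Cloudmesh API.

```python
# Hypothetical sketch of the Choose -> Load -> Run provisioning flow.
# TestbedClient and its methods are illustrative, not an actual FutureGrid API.

class TestbedClient:
    def __init__(self, image_library):
        self.image_library = image_library          # name -> image template

    def choose_image(self, requirements):
        # Pick the first template that advertises every required feature.
        for name, template in self.image_library.items():
            if requirements.issubset(template["features"]):
                return name
        raise LookupError("no image satisfies %s" % requirements)

    def provision(self, image_name, target="bare-metal", nodes=1):
        # Load the image onto VMs or bare metal, then report what would run.
        print(f"loading {image_name} onto {nodes} {target} node(s)")
        return {"image": image_name, "target": target, "nodes": nodes}


library = {
    "hadoop-cluster": {"features": {"hadoop", "hdfs"}},
    "mpi-hpc":        {"features": {"mpi", "openmp"}},
}

client = TestbedClient(library)
chosen = client.choose_image({"mpi"})
job = client.provision(chosen, target="bare-metal", nodes=4)
print("run:", job)
```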

FutureGrid Partners

• Indiana University (Architecture, core software, Support)
• San Diego Supercomputer Center at University of California San Diego (INCA, Monitoring)
• University of Chicago / Argonne National Labs (Nimbus)
• University of Florida (ViNe, Education and Outreach)
• University of Southern California Information Sciences (Pegasus to manage experiments)
• University of Tennessee Knoxville (Benchmarking)
• University of Texas at Austin / Texas Advanced Computing Center (Portal, XSEDE Integration)
• University of Virginia (OGF, XSEDE Software stack)
• Institutions hosting FutureGrid hardware (shown in red on the original slide): IU, UCSD, Chicago, Florida, TACC

FutureGrid offers Computing Testbed as a Service

• Software (Application or Usage) – SaaS
  – CS research use, e.g. test a new compiler or storage model
  – Class usages, e.g. run GPU & multicore applications
• Platform – PaaS
  – Cloud, e.g. MapReduce
  – HPC, e.g. PETSc, SAGA
  – Computer Science, e.g. compiler tools, sensor nets, monitors
• Infrastructure – IaaS
  – Software Defined Computing (virtual clusters)
  – Hypervisor, bare metal
  – Operating system
• Network – NaaS
  – Software Defined Networks
• FutureGrid uses Testbed-aaS tools: provisioning, image management, IaaS interoperability, NaaS and IaaS tools, experiment management, dynamic IaaS and NaaS, DevOps
• FutureGrid Cloudmesh (includes RAIN) uses dynamic provisioning and image management to provide custom environments for general target systems; this involves (1) creating, (2) deploying, and (3) provisioning one or more images in a set of machines on demand

Selected List of Services Offered

• Cloud PaaS: Hadoop, Iterative MapReduce, HDFS, HBase, Swift Object Store
• IaaS: Nimbus, Eucalyptus, OpenStack, ViNe
• GridaaS: Genesis II, Unicore, SAGA, Globus
• HPCaaS: MPI, OpenMP, CUDA
• TestbedaaS: FG RAIN/Cloudmesh, Portal, Inca, Ganglia, DevOps (Chef, Puppet, Salt), experiment management (e.g. Pegasus)

Hardware(Systems) Details

FutureGrid: a Grid/Cloud/HPC Testbed

[Map: FutureGrid sites connected by a private/public FG network; includes a 12 TF disk-rich + GPU system with 512 cores; NID: Network Impairment Device]

Heterogeneous Systems Hardware

• India – IBM iDataPlex: 256 CPUs, 1024 cores, 11 TFLOPS, 3072 GB RAM, 512 TB storage, IU, Operational
• Alamo – Dell PowerEdge: 192 CPUs, 768 cores, 8 TFLOPS, 1152 GB RAM, 30 TB storage, TACC, Operational
• Hotel – IBM iDataPlex: 168 CPUs, 672 cores, 7 TFLOPS, 2016 GB RAM, 120 TB storage, UC, Operational
• Sierra – IBM iDataPlex: 168 CPUs, 672 cores, 7 TFLOPS, 2688 GB RAM, 96 TB storage, SDSC, Operational
• Xray – Cray XT5m: 168 CPUs, 672 cores, 6 TFLOPS, 1344 GB RAM, 180 TB storage, IU, Operational
• Foxtrot – IBM iDataPlex: 64 CPUs, 256 cores, 2 TFLOPS, 768 GB RAM, 24 TB storage, UF, Operational
• Bravo – large disk & memory: 32 CPUs, 128 cores, 1.5 TFLOPS, 3072 GB RAM (192 GB per node), 192 TB storage (12 TB per server), IU, Operational
• Delta – large disk & memory with Tesla GPU’s: 32 CPUs + 32 GPU’s, 192 cores, 9 TFLOPS, 3072 GB RAM (192 GB per node), 192 TB storage (12 TB per server), IU, Operational
• Lima – SSD test system: 16 CPUs, 128 cores, 1.3 TFLOPS, 512 GB RAM, 3.8 TB (SSD) + 8 TB (SATA) storage, SDSC, Operational
• Echo – large memory ScaleMP: 32 CPUs, 192 cores, 2 TFLOPS, 6144 GB RAM, 192 TB storage, IU, Beta
• TOTAL: 1128 CPUs + 32 GPU’s, 4704 cores + 14336 GPU cores, 1550 TB storage

FutureGrid Distributed Computing TestbedaaS

[Map: India (IBM) and Xray (Cray) at IU; Bravo, Delta, Echo at IU; Lima at SDSC; Hotel at Chicago]

Storage Hardware

• Xanadu 360: 180 TB, NFS, IU, new system
• DDN 6620: 120 TB, GPFS, UC, new system
• SunFire x4170: 96 TB, ZFS, SDSC, new system
• Dell MD3000: 30 TB, NFS, TACC, new system
• IBM: 24 TB, NFS, UF, new system
• Substantial backup storage at IU: Data Capacitor and HPSS

Support

• Traditional Drupal portal with the usual functions
• Traditional ticket system
• System admin and user-facing support (small)
• Outreach group (small)
• Strong systems admin collaboration with the software group

More Example Projects

ATLAS T3 Computing in the Cloud

• Running 0 to 600 ATLAS simulation jobs continuously since April 2012
• The number of running VMs responds dynamically to the workload management system (Panda)
• Condor executes the jobs; Cloud Scheduler manages the VMs
• Using cloud resources at FutureGrid, University of Victoria, and National Research Council of Canada

[Charts: completed jobs per day since March; CPU efficiency in the last month; number of simultaneously running jobs since March (1 per core)]

Improving IaaS Utilization

• Challenge
  – Utilization is the catch-22 of on-demand clouds
• Solution
  – Preemptible instances: increase utilization without sacrificing the ability to respond to on-demand requests
  – Multiple contention management strategies (a simple policy is sketched below)

[Chart: ANL Fusion cluster utilization 03/10–03/11, courtesy of Ray Bair, ANL]

Paper: Marshall P., K. Keahey, and T. Freeman, “Improving Utilization of Infrastructure Clouds”, CCGrid’11
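To make the contention idea concrete, here is a minimal Python sketch of one possible backfill-with-preemption policy: idle capacity is filled with preemptible instances, and the youngest preemptible instances are killed first when an on-demand request arrives. This is only an illustration of the general technique, not the scheduler from the cited paper.

```python
# Minimal sketch of backfill with preemption (illustrative only; not the
# scheduler from the Marshall/Keahey/Freeman paper).

class Cloud:
    def __init__(self, total_cores):
        self.total = total_cores
        self.on_demand = []      # list of core counts
        self.preemptible = []    # list of core counts, oldest first

    def free(self):
        return self.total - sum(self.on_demand) - sum(self.preemptible)

    def submit_preemptible(self, cores):
        # Backfill: only admit if idle capacity exists right now.
        if cores <= self.free():
            self.preemptible.append(cores)
            return True
        return False

    def submit_on_demand(self, cores):
        # Preempt youngest preemptible instances until the request fits.
        while cores > self.free() and self.preemptible:
            victim = self.preemptible.pop()          # youngest first
            print(f"preempting a {victim}-core instance")
        if cores <= self.free():
            self.on_demand.append(cores)
            return True
        return False                                  # truly out of capacity


cloud = Cloud(total_cores=64)
cloud.submit_preemptible(32)      # fills otherwise idle cores
cloud.submit_preemptible(16)
cloud.submit_on_demand(40)        # forces preemption to serve the on-demand request
print("utilized cores:", cloud.total - cloud.free())
```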

Improving IaaS Utilization

• Preemption disabled: average utilization 36.36%, maximum utilization 43.75%
• Preemption enabled: average utilization 83.82%, maximum utilization 100%

[Charts: infrastructure utilization (%) over time with preemption disabled and enabled]

SSD experimentation using Lima

• Lima @ UCSD: 8 nodes, 128 cores
  – AMD Opteron 6212
  – 64 GB DDR3
  – 10GbE Mellanox ConnectX-3 EN
  – 1 TB 7200 RPM enterprise SATA drive
  – 480 GB SSD SATA drive (Intel 520)

[Chart: HDFS I/O throughput (Mbps) comparison for SSD and HDD using the TestDFSIO benchmark; for each file size, ten files were written to the disk]
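For reference, TestDFSIO is driven from the Hadoop command line; a small Python wrapper along the following lines could reproduce "ten files per file size" write runs. The jar location varies between Hadoop releases, so treat it as a placeholder, and the file sizes chosen here are illustrative rather than the ones used on Lima.

```python
# Sketch of driving the TestDFSIO write benchmark for several file sizes.
# The jar location is a placeholder -- it differs between Hadoop releases.
import subprocess

HADOOP_TEST_JAR = "/path/to/hadoop-test.jar"   # placeholder path

def run_testdfsio(file_size_mb, n_files=10):
    cmd = [
        "hadoop", "jar", HADOOP_TEST_JAR, "TestDFSIO",
        "-write",
        "-nrFiles", str(n_files),
        "-fileSize", str(file_size_mb),
    ]
    # TestDFSIO appends its summary (including throughput) to
    # TestDFSIO_results.log and also prints it to the console.
    subprocess.run(cmd, check=True)

for size_mb in (128, 256, 512, 1024):          # illustrative sizes
    run_testdfsio(size_mb)
```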

Ocean Observatory Initiative (OOI)

• Towards observatory science: sensor-driven processing
  – Real-time event-based data stream processing capabilities
  – Highly volatile need for data distribution and processing
  – An “always-on” service
• The Nimbus team is building platform services for integrated, repeatable support of on-demand science
  – High availability
  – Auto-scaling
• From regional Nimbus clouds to commercial clouds

ScaleMP

ScaleMP vSMP

• vSMP Foundation is virtualization software that creates a single virtual machine over multiple x86-based hardware compute nodes
• Provides a large-memory, large-core-count SMP to users virtually, by leveraging commodity MPP hardware
• Allows the use of MPI, OpenMP, Pthreads, Java threads, and serial jobs on a single unified OS

ECHO – FG vSMP System

• ScaleMP version 5.4 virtualized SMP
  – 6 TB DDR3 RAM [/proc/meminfo]
  – 192 Intel Xeon E5-2640 cores @ 2.5 GHz
  – 96 TB system storage (RAID)
• Scale applications with minimal effort
  – OpenMP or Pthreads for easy many-core parallelism
  – 6 TB main memory for in-memory DB access
  – 90%+ MPI efficiency for classic HPC applications
• Future: provision vSMP via cloud infrastructure

High Performance Cloud Infrastructure?

Virtualized GPUs in a Cloud

• Need for GPUs on clouds
  – GPUs are becoming commonplace in scientific computing
  – Great performance-per-watt
• Different competing methods for virtualizing GPUs
  – Remote API for CUDA calls: rCUDA, vCUDA, gVirtus
  – Direct GPU usage within a VM: our method
• Also need InfiniBand in the cloud infrastructure
  – High speed, low latency interconnect
  – RDMA & IPoIB to VMs support many traditional HPC applications via MPI
• Supercomputing today uses high performance interconnects and advanced accelerators (GPUs)
  – Goal: provide the same hardware at minimal overhead to build the first ever HPC cloud

Direct PCI Virtualization

• Allow VMs to directly access GPU hardware
  – Uses the Xen 4.2 hypervisor
• Enables CUDA and OpenCL code
• Utilizes PCI-passthrough of the device to the guest VM
  – Hardware-directed I/O virtualization (VT-d or IOMMU)
  – Provides direct isolation and security of the device
  – Removes host overhead
• Can also use InfiniBand Virtual Functions via SR-IOV
  – IB + GPU = HPC

[Diagram: Dom0 runs OpenStack Compute with the CPU & DRAM; Dom1…DomN run tasks with virtual device drivers for GPUs and IB Virtual Functions; the VMM (VT-d / IOMMU) maps PCI Express GPUs 1–3 and the IB physical function to the guests]

Performance of GPU enabled VMs

[Charts: GPU maximum FLOPS (maxspflops, maxdpflops), Stencil2D and S3D benchmarks, and GPU bus speed (bspeed_download, bspeed_readback) for Delta and ISI systems, native vs. VM, on C2075 (Fermi) and K20m (Kepler); InfiniBand bandwidth benchmark comparing native and VM read/write with rCUDA v3/v4 over GigE, IPoIB, and IB verbs]

Results

• Xen performs relatively well for virtualizing GPUs
  – Best: near-native, within 1% of native on roughly 2/3 of the Fermi benchmarks and 1/3 of the Kepler benchmarks
  – Average overhead: 2.8% on Fermi, 4.9% on Kepler (2% without FFT)
  – Worst: Kepler FFT (15% overhead)
  – The Xen VM performs near-native for memory transfer
• Xen works for enabling InfiniBand in VMs
  – Read/write bandwidth and latency perform at near-native rates
  – Support for SR-IOV Virtual Functions soon to come
• Can now build a High Performance Cloud Infrastructure?

OpenStack Integration

• Integrated into the OpenStack “Havana” release
  – Xen support for full virtualization with libvirt
  – Custom libvirt driver for PCI-passthrough
  – Use instance_type_extra_specs to specify PCI devices
• Example: lspci inside a guest VM shows the passed-through devices

[root@test-nvidia-xqcow2-vm-58 ~]# lspci
...
00:04.0 3D controller: NVIDIA Corporation Device 1028 (rev a1)
00:05.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
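As a rough illustration of what such a driver ultimately has to produce, the libvirt domain XML for a PCI-passthrough device is a hostdev element naming the host PCI address. The sketch below builds that fragment and a flavor extra_specs entry in Python; the extra_specs key and the PCI address are illustrative, since the slide does not give the exact schema used by the FutureGrid driver.

```python
# Sketch: flavor extra spec plus the libvirt <hostdev> fragment that a
# PCI-passthrough driver has to generate.  The extra_specs key below is
# illustrative, not the exact key used by the FutureGrid driver.

def hostdev_xml(bus, slot, function=0, domain=0):
    """Return a libvirt <hostdev> element for a host PCI device."""
    return (
        "<hostdev mode='subsystem' type='pci' managed='yes'>\n"
        "  <source>\n"
        f"    <address domain='0x{domain:04x}' bus='0x{bus:02x}' "
        f"slot='0x{slot:02x}' function='0x{function:x}'/>\n"
        "  </source>\n"
        "</hostdev>"
    )

# Hypothetical flavor definition requesting one GPU and one IB Virtual Function.
gpu_flavor_extra_specs = {"pci_passthrough:devices": "gpu:1,ib_vf:1"}

# Host GPU at PCI address 0000:09:00.0 (address chosen for illustration).
print(gpu_flavor_extra_specs)
print(hostdev_xml(bus=0x09, slot=0x00))
```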


Experimental Deployment: FutureGrid Delta

• 16x 4U nodes in 2 racks
  – 2x Intel Xeon X5660
  – 192 GB RAM
  – Nvidia Tesla C2075 (Fermi)
  – QDR InfiniBand (ConnectX-2)
• Management node
  – OpenStack Keystone, Glance, API, Cinder, Nova network
• Compute nodes
  – nova-compute, Xen, libvirt

Details – XSEDE Testing and FutureGrid

Software Evaluation and Testing on FutureGrid

• The Technology Investigation Service (TIS) provides a capability to identify, track, and evaluate hardware and software technologies that could be used in XSEDE or any other cyberinfrastructure
• XSEDE Software Development & Integration (SD&I) uses best software engineering practices to deliver high quality software through XSEDE Operations to Service Providers, End Users, and Campuses
• XSEDE Operations Software Testing and Deployment (ST&D) performs acceptance testing of new XSEDE capabilities

SD&I testing for XSEDE Campus Bridging for EMS/GFFS (aka SDIACT-101)

• Genesis II SD&I test plan
• Full test pass involving…
  a. Xray as only endpoint (putting heavy load on a single BES – Cray XT5m, Linux/Torque/Moab)
  b. India as only endpoint (testing on an IBM iDataPlex, Red Hat 5/Torque/Moab)
  c. Centurion (UVa) as only endpoint (testing against a Genesis II BES)
  d. Sierra set up fresh following the CI installation guide (testing the correctness of the installation guide)
  e. Sierra and India (testing load balancing to these endpoints)

XSEDE SD&I and Operations testing of xdusage (aka SDIACT-102)

• Joint SD&I and Operations test plan
• xdusage gives researchers and their collaborators a command line way to view their allocation information in the XSEDE central database (XDCDB)
• Full test pass involving…
  a. FutureGrid Nimbus VM on Hotel (emulating TACC Lonestar)

     % xdusage -a -p TG-STA110005S
     Project: TG-STA110005S/staff.teragrid
     PI: Navarro, John-Paul
     Allocation: 2012-09-14/2013-09-13 Total=300,000 Remaining=297,604 Usage=2,395.6 Jobs=21
     PI Navarro, John-Paul portal=navarro usage=0 jobs=0

  b. Verne test node (emulating NICS Nautilus)
  c. Giu1 test node (emulating PSC Blacklight)
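In a test pass like this, the xdusage output can be checked mechanically. Below is a minimal Python sketch of such a check, assuming only the flags and fields shown above; the pass/fail criteria are illustrative, not the official SD&I/Operations acceptance criteria.

```python
# Sketch of an acceptance check around xdusage output; the expectations are
# illustrative, not the official SD&I/Operations test criteria.
import re
import subprocess

def check_xdusage(project):
    out = subprocess.run(
        ["xdusage", "-a", "-p", project],
        capture_output=True, text=True, check=True,
    ).stdout
    # The report should mention the project and include an allocation line.
    assert project.split("-", 1)[-1] in out, "project id missing from report"
    alloc = re.search(r"Allocation: .* Total=([\d,.]+) Remaining=([\d,.]+)", out)
    assert alloc, "allocation line missing"
    total = float(alloc.group(1).replace(",", ""))
    remaining = float(alloc.group(2).replace(",", ""))
    assert 0 <= remaining <= total, "remaining SUs out of range"
    return {"total": total, "remaining": remaining}

if __name__ == "__main__":
    print(check_xdusage("TG-STA110005S"))
```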

Comparison with EGI

EGI Cloud Activities v. FutureGrid

https://wiki.egi.eu/wiki/Fedcloud-tf:FederatedCloudsTaskForce

EGI Phase 1 workbenches (setup: Sept 2011 – March 2012), the corresponding capability, and the FutureGrid response:

1. Running a pre-defined VM image – VM management – Cloudmesh, templated image management
2. Managing users’ data and VMs – Data management – Not addressed, due to multiple FG environments and lack of manpower
3. Integrating information from multiple resource providers – Information discovery – Cloudmesh, FG Metrics, Inca, FG Glue2, Ubmod, Ganglia
4. Accounting across resource providers – Accounting – FG Metrics
5. Reliability/availability of resource providers – Monitoring – Not addressed (as FutureGrid is a testbed, not production)
6. VM/resource state change notification – Notification – Provided by IaaS for our systems
7. AA across resource providers – Authentication and authorisation – LDAP, role-based AA
8. VM images across resource providers – VM sharing – Templated image repository

Activities Related to FutureGrid

Essential and Different Features of FutureGrid in the Cloud Area

• Unlike many clouds such as Amazon and Azure, FutureGrid allows robust, reproducible (in performance and functionality) research: you can request the same node with and without a VM
  – Open, transparent technology environment
• FutureGrid is more than a cloud; it is a general distributed sandbox – a cloud/grid/HPC testbed
• Supports 3 different IaaS environments (Nimbus, Eucalyptus, OpenStack), and projects involve 5 (also CloudStack, OpenNebula)
• Supports research on cloud tools, cloud middleware and cloud-based systems
• FutureGrid has itself developed middleware and interfaces to support FutureGrid’s mission, e.g. Phantom (cloud user interface), ViNe (virtual network), RAIN/Cloudmesh (deploy systems), and security/metric integration
• FutureGrid has experience in running cloud systems

Related Projects

• Grid5000 (Europe) and OpenCirrus, with managed flexible environments, are closest to FutureGrid and are collaborators
• PlanetLab has a networking focus with a less managed system
• Several GENI related activities, including the network-centric EmuLab, PRObE (Parallel Reconfigurable Observational Environment), ProtoGENI, ExoGENI, InstaGENI and GENICloud
• BonFIRE (Europe) cloud supporting OCCI
• Recent EGI Federated Cloud with OpenStack and OpenNebula, aimed at EU Grid/Cloud federation
• Private clouds: Red Cloud (XSEDE), Wispy (XSEDE), Open Science Data Cloud and the Open Cloud Consortium are typically aimed at computational science
• Public clouds such as AWS do not allow reproducible experiments or bare-metal/VM comparison, and do not support experiments on low level cloud technology

Related Projects in Detail I

• EGI Federated Cloud (see https://wiki.egi.eu/wiki/Fedcloud-tf:UserCommunities and https://wiki.egi.eu/wiki/Fedcloud-tf:Testbed#Resource_Providers_inventory ), with about 4910 documented cores according to those pages; mostly OpenNebula and OpenStack
• Grid5000 is a scientific instrument designed to support experiment-driven research in all areas of computer science related to parallel, large-scale, or distributed computing and networking. Experience from Grid5000 is a motivating factor for FG. However, the management of the various Cloud and PaaS frameworks is not addressed.
• EmuLab provides the software and a hardware specification for a network testbed. Emulab is a long-running project and, through its integration into GENI and its deployment at a number of sites, has produced a number of tools that we will try to leverage. These tools have evolved from a network-centric view and allow users to emulate network environments to further their research goals. Additionally, some attempts have been made to run IaaS frameworks such as OpenStack and Eucalyptus on Emulab.

Related Projects in Detail II

• PRObE (Parallel Reconfigurable Observational Environment), using EmuLab, targets scalability experiments at the supercomputing level while providing a large-scale, low-level systems research facility. It consists of recycled supercomputing servers from Los Alamos National Laboratory.
• PlanetLab consists of a few hundred machines spread over the world, mainly designed to support wide-area networking and distributed systems research.
• ExoGENI links GENI to two advances in virtual infrastructure services outside of GENI: open cloud computing (OpenStack) and dynamic circuit fabrics. ExoGENI orchestrates a federation of independent cloud sites and circuit providers through their native IaaS interfaces and links them to other GENI tools and resources. ExoGENI uses OpenFlow to connect the sites and ORCA as control software; ORCA plugins for OpenStack and Eucalyptus are available.
• ProtoGENI is a prototype implementation and deployment of GENI largely based on Emulab software. ProtoGENI is the control framework for GENI Cluster C, the largest set of integrated projects in GENI.

Related Projects in Detail III

• BonFIRE from the EU is developing a testbed for internet-as-a-service environments. It provides offerings similar to Emulab: a software stack that simplifies experiment execution while allowing a broker to assist in test orchestration based on test specifications provided by users.
• OpenCirrus is a cloud computing testbed for the research community that federates heterogeneous distributed data centers. It has partners from at least 6 sites. Although federation is one of the main research focuses, the testbed does not yet employ generalized federated access to its resources, according to discussions at the last OpenCirrus Summit.
• Amazon Web Services (AWS) provides the de facto standard for clouds. Recently, projects have integrated their software services with resources offered by Amazon, for example to utilize cloud bursting in the case of resource starvation as part of batch queuing systems. Others (MIT) have automated and simplified the process of building, configuring, and managing clusters of virtual machines on Amazon’s EC2 cloud.

Related Projects in Detail IV

• InstaGENI and GENICloud build two complementary elements of a federation architecture that takes its inspiration from the Web. Their goals are to make it easy, safe, and cheap for people to build small clouds and run cloud jobs at many different sites. For this purpose, GENICloud/TransCloud provides a common API across cloud systems and access control without identity; InstaGENI provides an out-of-the-box small cloud. The main focus of this effort is to provide a federated cloud infrastructure.
• Cloud testbeds and deployments: in addition, a number of testbeds exist providing access to a variety of cloud software, including Red Cloud, Wispy, the Open Science Data Cloud, and the Open Cloud Consortium resources.
• XSEDE is a single virtual system that scientists can use to share computing resources, data, and expertise interactively. People around the world use these resources and services, including supercomputers, collections of data, and new tools. XSEDE is devoted to delivering a production-level facility to its user community. It is currently exploring clouds but has not yet committed to them; XSEDE does not allow provisioning of the software stack in the way FG allows.

Link FutureGrid and GENI

• Identify how to use the ORCA federation framework to integrate FutureGrid (and more of XSEDE?) into ExoGENI
• Allow FG (XSEDE) users to access GENI resources and vice versa
• Enable PaaS level services (such as a distributed HBase or Hadoop) to be deployed across FG and GENI resources
• Leverage the image generation capabilities of FG and the bare metal deployment strategies of FG within the GENI context
  – Software defined networks plus cloud/bare metal dynamic provisioning gives software defined systems

Typical FutureGrid/GENI Project

• Bringing computing to data is often unrealistic, as repositories are distinct from computing resources and/or the data is distributed
• So one can build and measure the performance of virtual distributed data stores, where software defined networks bring the computing to distributed data repositories
• Example applications already on FutureGrid include network science (analysis of Twitter data), “deep learning” (large scale clustering of social images), earthquake and polar science, sensor nets as seen in smart power grids, pathology images, and genomics
• Compare different data models: HDFS, HBase, object stores, Lustre, databases

MOOC’s

Integrate MOOC Technology

• We are building MOOC lessons to describe core FutureGrid capabilities
• Will help especially educational uses
  – 36 semester-long classes: over 650 students (Cloud Computing, Distributed Systems, Scientific Computing and Data Analytics)
  – 3 one-week summer schools: 390+ students (Big Data, Cloudy View of Computing (for HBCU’s), Science Clouds)
  – 7 one- to three-day workshops/tutorials: 238 students
• Science Cloud Summer School available in MOOC format
• First high level software: IP-over-P2P (IPOP)
• Overview and details of FutureGrid
• How to get a project, use HPC and use OpenStack

Online MOOC’s

• Science Cloud MOOC repository – http://iucloudsummerschool.appspot.com/preview
• FutureGrid MOOC’s – https://fgmoocs.appspot.com/explorer
• A MOOC that will use FutureGrid for class laboratories (for advanced students in the IU online Data Science masters degree) – https://x-informatics.appspot.com/course
• The MOOC introduction to FutureGrid can be used by all classes and tutorials on FutureGrid
• Currently use Google Course Builder: Google Apps + YouTube
  – Built as a collection of modular ~10 minute lessons
• Develop techniques to allow clouds to efficiently support MOOC’s
• Twelve ~10 minute lesson objects in this lecture
• IU wants us to close caption if used in a real course

FutureGrid hosts many classes per semester; how to use FutureGrid is a shared MOOC

Services

Which Services should we install?

• We look at statistics on what users request
• We look at interesting projects, as described in their project descriptions
• We look for projects which we intend to integrate with, e.g. XD TAS, XSEDE
• We look at community activities

Technology Requests per Quarter

[Chart: technology requests per quarter for HPC, Eucalyptus, Nimbus, OpenNebula, OpenStack, and the average of the rest, each with a polynomial trend line]

“Poly” is a polynomial fit (see the sketch below for how such a fit can be computed)
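For readers who want to reproduce this kind of trend line, a least-squares polynomial fit over quarterly counts is one line of NumPy. The request counts below are placeholders for illustration only, not FutureGrid's actual numbers.

```python
# Polynomial trend line over quarterly technology requests.
# The counts are illustrative placeholders, not real FutureGrid data.
import numpy as np

quarters = np.arange(1, 9)                                   # 8 quarters
openstack_requests = np.array([1, 2, 2, 4, 6, 9, 12, 16])    # made-up counts

# Fit a degree-2 polynomial (as an Excel "Polynomial" trend line would).
coeffs = np.polyfit(quarters, openstack_requests, deg=2)
trend = np.poly1d(coeffs)

for q in quarters:
    print(f"Q{q}: observed={openstack_requests[q - 1]:2d}  trend={trend(q):5.1f}")
```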

Selected List of Services Offered

• Cloud PaaS: Hadoop, Iterative MapReduce, HDFS, HBase, Swift Object Store
• IaaS: Nimbus, Eucalyptus, OpenStack, ViNe
• GridaaS: Genesis II, Unicore, SAGA, Globus
• HPCaaS: MPI, OpenMP, CUDA
• TestbedaaS
  – Infrastructure: Inca, Ganglia
  – Provisioning: RAIN, Cloudmesh
  – VMs: Phantom, Cloudmesh
  – Experiments: Pegasus, Precip, Cloudmesh
  – Accounting: FG, XSEDE


Education Technology Requests


Details – FutureGrid Futures

Lessons learnt from FutureGrid

• Unexpected major use from Computer Science and middleware
• Rapid evolution of technology: Eucalyptus → Nimbus → OpenStack
• Open source IaaS maturing, as in “PayPal To Drop VMware From 80,000 Servers and Replace It With OpenStack” (Forbes)
  – “VMware loses $2B in market cap”; eBay expects to switch broadly?
• Need interactive, not batch, use; nearly all jobs are short
• Substantial TestbedaaS technology is needed, and FutureGrid developed some of it (RAIN, Cloudmesh, operational model)
• Lessons more positive than the DoE Magellan report (aimed at an early science cloud), but the goals were different
• Still serious performance problems in clouds for networking and device (GPU) linkage; many activities outside FG are addressing this
  – One can get good InfiniBand performance with a particular OS plus Mellanox drivers, but it is not general yet
• We identified characteristics of “optimal hardware”
• Run the system with an integrated software (computer science) and systems administration team
• Build a Computer Testbed as a Service community

Future Directions for FutureGrid

• Poised to support more users as technology like OpenStack matures
  – Please encourage new users and new challenges
• More focus on academic Platform as a Service (PaaS) – high-level middleware (e.g. Hadoop, HBase, MongoDB) – as IaaS gets easier to deploy
  – Expect increased Big Data challenges
• Improve education and training with a model for MOOC laboratories
• Finish Cloudmesh (and integrate with Nimbus Phantom) to make FutureGrid a hub from which to jump to multiple different “production” clouds commercially, nationally and on campuses; allow cloud bursting
  – Several collaborations developing
• Build an underlying software defined system model with integration with GENI and high performance virtualized devices (MIC, GPU)
• Improved ubiquitous monitoring at the PaaS, IaaS and NaaS levels
• Improve the “Reproducible Experiment Management” environment
• Expand and renew hardware via federation

FutureGrid is an onramp to other systems

• FG supports education & training for all systems
• A user can do all work on FutureGrid, OR
• A user can download appliances to local machines (VirtualBox), OR
• A user will soon be able to use Cloudmesh to jump to a chosen production system
• Cloudmesh is similar to OpenStack Horizon, but aimed at multiple federated systems
  – Built on RAIN and tools like libcloud and boto, with protocol (EC2) or programmatic API (Python) access (see the libcloud sketch below)
  – Uses a general templated image that can be retargeted
  – One-click template & image install on various IaaS & bare metal, including Amazon, Azure, Eucalyptus, OpenStack, OpenNebula, Nimbus, HPC
  – Provisions the complete system needed by the user, not just a single image; copes with resource limitations and deploys the full range of software
  – Integrates our VM metrics package (TAS collaboration) that links to XSEDE (VMs are different from traditional Linux in the metrics supported and needed)
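As a flavor of the multi-cloud programmatic access Cloudmesh builds on, Apache libcloud exposes the same driver interface across providers. A minimal sketch that lists images and starts a node on an EC2-compatible endpoint follows; credentials, region, and the image/size selection are placeholders, and this is not Cloudmesh code.

```python
# Minimal Apache libcloud sketch: the same driver pattern works for EC2,
# OpenStack, Azure, etc.  Credentials and IDs below are placeholders.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

ACCESS_KEY = "YOUR-ACCESS-KEY"
SECRET_KEY = "YOUR-SECRET-KEY"

Driver = get_driver(Provider.EC2)
conn = Driver(ACCESS_KEY, SECRET_KEY, region="us-east-1")

sizes = conn.list_sizes()
images = conn.list_images()

# Pick a small size and some image (selection logic is purely illustrative).
size = sizes[0]
image = images[0]

node = conn.create_node(name="fg-onramp-demo", size=size, image=image)
print("started", node.name, node.state)
```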

Proposed FutureGrid Architecture


Summary: Differences between FutureGrid I (current) and FutureGrid II

Usage (FutureGrid I → FutureGrid II)
• Target environments: Grid, Cloud, and HPC → Cloud, Big Data, HPC, some Grids
• Computer Science: per-project experiments → repeatable, reusable experiments
• Education: fixed resource → scalable use of commercial clouds, FutureGrid II, and appliances, per tool and audience type
• Domain Science: software develop/test → software develop/test across resources using templated appliances

Cyberinfrastructure (FutureGrid I → FutureGrid II)
• Service model: IaaS+PaaS+SaaS → CTaaS including NaaS+IaaS+PaaS+SaaS
• Provisioning model: static → software-defined
• Configuration/extensibility: fixed size → federation
• User support: help desk → help desk + community based
• Flexibility: fixed resource types → software-defined + federation
• Deployed software: proprietary, closed source, open source → open source
• IaaS hosting model (FutureGrid II): public and private distributed cloud

Federated Hardware Model in FutureGrid I

• FutureGrid internally federates heterogeneous cloud and HPC systems
• Want to expand with federated hardware partners
• HPC services: federation of HPC hardware is possible via Grid technologies (however we do not focus on this, as it is done well by XSEDE and EGI)
• Homogeneous cloud federation (one IaaS framework)
  – Integrate multiple clouds as zones
  – Publish the zones so we can find them in a service repository
  – Introduce trust through uniform project vetting
  – Allow authorized projects by zone (a zone can determine whether a project is allowed on its cloud)
  – Integrate trusted identity providers => trusted identity providers & trusted project management & local autonomy

Federated Hardware Model in FutureGrid II

• Heterogeneous cloud federation (multiple IaaS)
  – Just as in the homogeneous case, but in addition to zones we also have different IaaS frameworks, including commercial ones
  – Like an Amazon + FutureGrid federation
• Federation through Cloudmesh
  – HPC + Cloud
  – Develop a “driver’s license model” (online user test) for RAIN
  – Introduce service access policies; Cloudmesh is just one such possible service, e.g. enhance previous models with a role based system allowing restriction of access to services
  – Development of policies on how users gain access to such services, including consequences if they are broken
  – Automated security vetting of images before deployment

Cloudmesh TestbedaaS Tool

Avoid Confusion

To avoid confusion with the overloaded term “Dynamic Provisioning”, we will use the term RAIN.

What is RAIN?

[Diagram: RAIN maps templates & services – Hadoop and others – onto virtual clusters, OS images, virtual machines, and resources]

RAIN/RAINING is a concept.

Cloudmesh is a toolkit implementing RAIN that also supports general target environments. It includes a component called Rain that is used to build and interface with a testbed so that users can conduct advanced reproducible experiments.

Cloudmesh TestbedaaS Tool

An evolving toolkit and service to build and interface with a testbed so that users can conduct advanced reproducible experiments

Cloudmesh Functionality View

Cloudmesh Layered Architecture View

• Interfaces: Portal, CMD shell, command line, API
• Provisioner queue: AMQP
• Provision management and Cloud metrics (REST)
• Infrastructure scheduler (REST)
• Image management / RAIN: VM image generation, VM provisioning
• Provisioner abstraction over OS provisioners: Teefaa, Cobbler, OpenStack Bare Metal
• IaaS abstraction and user on-ramp: Amazon, Azure, Eucalyptus, OpenCirrus, ...
• Data layer

Cloudmesh RAIN Move

• Orchestrates resource re-allocation among different infrastructures
• Command line interface to ease access to this service
• Exclusive access to the service to prevent conflicts
• Keeps status information about the resources assigned to each infrastructure, as well as the history, in order to make predictions about future needs
• Scheduler that can dynamically re-allocate resources and support manual planning of future re-allocations (a toy version is sketched below)
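A toy Python sketch of the re-allocation idea: a mover keeps an inventory of how many nodes are assigned to each infrastructure (OpenStack, HPC, Eucalyptus) and shifts a node from the most over-provisioned one toward the most under-provisioned one. The class and method names are invented for illustration; they are not the Cloudmesh/RAIN API.

```python
# Toy sketch of RAIN-style resource shifting between infrastructures.
# Class and method names are illustrative, not the Cloudmesh/RAIN API.

class MoveScheduler:
    def __init__(self, assignment):
        # assignment: infrastructure name -> number of nodes currently assigned
        self.assignment = dict(assignment)
        self.history = []                      # record of past moves

    def plan_move(self, demand):
        """Suggest moving one node from the most over-provisioned
        infrastructure to the most under-provisioned one."""
        surplus = {k: self.assignment[k] - demand.get(k, 0) for k in self.assignment}
        donor = max(surplus, key=surplus.get)
        receiver = min(surplus, key=surplus.get)
        if surplus[donor] <= 0 or donor == receiver:
            return None                        # nothing sensible to move
        return (donor, receiver, 1)

    def apply(self, move):
        donor, receiver, nodes = move
        self.assignment[donor] -= nodes
        self.assignment[receiver] += nodes
        self.history.append(move)              # kept for future predictions


sched = MoveScheduler({"openstack": 8, "hpc": 8, "eucalyptus": 4})
demand = {"openstack": 12, "hpc": 5, "eucalyptus": 3}
move = sched.plan_move(demand)
if move:
    sched.apply(move)
print(sched.assignment, sched.history)
```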

Use Case: Move Resources

[Diagram, shown in several animation steps: the CM/FG CLI component and autonomous runtime services drive FG Move, which consults the FG metrics and scheduler components and the FG provisioning component (Teefaa) to move nodes of the FutureGrid fabric between the OpenStack, HPC, and Eucalyptus move controllers]


CloudMesh Feature Summary

• Provisioning
  – RAIN of bare metal
  – RAIN of VMs
  – RAIN of platforms
  – Templated image management
• Resource inventory
• Experiment management with IPython
• Integration of external clouds
• Integration of HPC resources
• Project, role, and user based authorization framework

Cloudmesh Federation Aspects

• Federate HPC services
  – Covered by Grid technology
  – Covered by Genesis II (often used)
• Thus: this should not be the focus of our activities, as it is addressed by others
  – We provide users the ability to access HPC resources via key management
  – This is logical, as each HPC resource in FG is independent

Federated Cloud services

• Data: no shared data services
• Accounting (via Cloudmesh): a uniform metric framework has been developed that allows us to integrate with accounting; for example, XSEDE integration will include accounting data from our cloud platforms
• Authentication & Authorization: LDAP with project and role based authentication; can integrate with various IaaS (Eucalyptus, OpenStack; Nimbus does not support projects)

Federated Cloud Services

• Templated images: Cloudmesh will integrate with RAIN, giving access to a templated image library that allows images to run on multiple IaaS frameworks across its federation
• VM management: Cloudmesh users can easily manage all their VMs (even on different IaaS frameworks) through a single API, command line and GUI
• Cloud bursting: HPC services will be augmented by cloud bursting services; users of Cloudmesh will not be aware of this detail, but we intend to add information services for it in a future version

Federated Cloud Services

• Current: cloud shifting
  – Administrators can shift resources between IaaS and HPC; this is done via bare metal provisioning, and Cloudmesh provides convenient role based access to the service
  – Administrators and users will be able to use bare metal provisioning via Cloudmesh through role, project, and user based access
• Future: autonomous cloud shifting
  – Resources will be aligned by an autonomous service that is guided by metrics and user demand


Cloudmesh TestbedaaS Tool Screenshots


User Side Federation with Cloud Mesh UI


Interactive Cloudmesh with IPython


CloudMesh: Example of Moving a Service


Cloudmesh One Click Install

Hadoop one-click Install


Registering External Clouds


Details -- Security

Security issues in FutureGrid Operation

• Security for TestbedaaS is a good research area (and cybersecurity research is supported on FutureGrid)!
• Authentication and Authorization model
  – This is different from those in use in XSEDE and changes between releases of VM management systems
  – We need to largely isolate users from these changes, for obvious reasons
  – Non-secure deployment defaults (in the case of OpenStack)
  – OpenStack Grizzly (just released) has reworked the role based access control mechanisms and introduced a better token format based on standard PKI (as used in AWS, Google, Azure)
  – Custom: we integrate our distributed LDAP between the FutureGrid portal and the VM managers; the LDAP server will soon synchronize with XSEDE via AMIE
• Security of dynamically provisioned images
  – The templated image generation process automatically puts security restrictions into the image, including the removal of root access (a sketch of such a hardening step follows)
  – Images include a service allowing designated users (project members) to log in
  – Images are vetted before role-dependent bare metal deployment is allowed
  – No SSH keys are stored in images (just a call to the identity service), so only certified users can use them

Some Security Aspects in FG

• User management – users are vetted twice
  – (a) When they come to the portal, all users are checked to see whether they are technical people who could potentially benefit from a project
  – (b) When a project is proposed, the proposer is checked again
• Surprisingly, so far vetting of most users is simple
• Many portals do not do (a)
  – Therefore they have many spammers and people not actually interested in the technology
  – As we have wiki and forum functionality in the portal, we need (a) so we can avoid vetting every change in the portal, which would be too time consuming

Image Management

• Authentication and Authorization
  – Significant changes in technologies within IaaS frameworks such as OpenStack
  – OpenStack
    • Evolving integration with enterprise Authentication and Authorization frameworks such as LDAP
    • Simplistic default setup scenarios without securing the connections
    • Grizzly changes several things

Significant OpenStack changes

• Grizzly:
  – “A new token format based on standard PKI functionality provides major performance improvements and allows offline token authentication by clients without requiring additional Identity service calls.”
  – “More organized management of multi-tenant environments with support for groups, impersonation, role-based access controls (RBAC), and greater capability to delegate administrative tasks.”
• Havana:
  – Introduction of groups
  – Renaming of tenants to projects (meaning what we already call a project)

A new version comes out …

• We need to redo the security work and the integration into our user management system
• This needs to be done carefully
• Should we federate accounts? Results:
  – (A) Use the same keys as in HPC → Yes
  – (B) Use the same password as in the portal → No

(Cont.) Recent Lesson

• Indirect federation (example: OpenStack)
  – Each cloud has its own identity service. This is done on purpose, to experiment with multiple identity services and replicate what we see in commercial clouds; e.g. Google and AWS have their own identity services.
  – Two choices for managing user authentication:
    • Same password as the portal (deployed at TACC)
      – This model has security disadvantages, as we are not able to give access to cloud services due to the issue of passwords stored in rc files by users
      – The OpenStack code was modified
    • Using a different password (deployed at IU)
      – We use the native approach of rc files and do not use the portal password
      – This is the model that we will use in future

Federation with XSEDE

• We can receive new user requests from XSEDE and create accounts for such users
• How do we approach SSO?
  – The Grid community has made this a major task
  – However, we are not just about XSEDE resources; what about EGI, GENI, …, Azure, Google, AWS?
  – Two models: (a) VO’s with federated authentication and authorization; (b) user-based federation, where the user manages multiple logins to various services through a key ring with multiple keys
  – We believe that for the majority of our users (b) is sufficient. However, we use the same keys on HPC as we do on Clouds.

Details – Image Generation

Life Cycle of Images

• (a) Creating and customizing images, (b) storing images, (c) registering images, (d) instantiating images
• The user selects properties and software stack features meeting his/her requirements
• An abstract image repository feeds Nimbus, Eucalyptus, OpenStack, OpenNebula, and bare metal deployments

Phase (a) & (b) from Lifecycle Management

• Creates images according to the user’s specifications
  – OS type and version
  – Architecture
  – Software packages
• Images are not aimed at any specific infrastructure
• The image is stored in the repository

[Flowchart: command line tools collect the requirements (OS, version, hardware, …); if a matching base image exists in the repository it is retrieved, otherwise the image generation server boots a VM (via OpenNebula) with the base OS (e.g. CentOS 5/6, x86/x86_64) and base software, installs FG, cloud, and user software, updates the image, and stores the user’s image in the repository]

Performance of Dynamic Provisioning

• 4 phases:
  a) Design and create the image (security vetting)
  b) Store it in the repository as a template with components
  c) Register the image to the VM manager (cached ahead of time)
  d) Instantiate (provision) the image

[Charts: provisioning time from registered images (phase d) for OpenStack and xCAT/Moab versus the number of machines (1, 2, 4, 8, 16, 37); time to generate an image (phases a and b) for CentOS 5 and Ubuntu 10.10, broken down into boot VM, create base OS, install util packages, install user packages, compress image, and upload image to the repository; and total time when 1, 2, or 4 images are generated at the same time]

Time for Phase (a) & (b)

[Charts: time to generate an image for CentOS 5 and Ubuntu 10.10, broken down into boot VM, create base OS, install util packages, install user packages, compress image, and upload image to the repository; and total time when 1, 2, or 4 images are generated at the same time]

Time for Phase (c)

[Charts: time to deploy/stage an image on xCAT/Moab (retrieve image from the repository, untar the image and copy it to the right place, retrieve kernels and update xCAT tables, xCAT packimage) and on cloud frameworks (OpenStack, Eucalyptus) and bare metal (retrieve image from the repository or client, untar image, customize image for the specific IaaS framework, unmount image, upload image to the cloud framework from the client side, and wait until the image is in available status)]

Time for Phase (d)

[Chart: provisioning time for images (phase d) with OpenStack and xCAT/Moab versus the number of machines (1, 2, 4, 8, 16, 37)]

Why is bare metal slower

• HPC bare metal is slower because the time is dominated by the last phase, which includes a bare metal boot
• In clouds we do a lot of the work in memory and avoid a bare metal boot by using an in-memory boot
• We intend to repeat the experiments on Grizzly and will then have more servers

Details – Monitoring on FutureGrid

Monitoring and metrics are critical for a testbed

Monitoring on FutureGrid

• Inca: software functionality and performance
• Ganglia: cluster monitoring
• perfSONAR: network monitoring – Iperf measurements
• SNAPP: network monitoring – SNMP measurements

Monitoring is important, and even more needs to be done.

Transparency in Clouds helps users understand application performance

• FutureGrid provides transparency of its infrastructure via monitoring and instrumentation tools
• Example (Nimbus provides VMM information):

$ cloud-client.sh --conf conf/alamo.conf --status
Querying for ALL instances.
[*] - Workspace #3132. 129.114.32.112 [ vm-112.alamo.futuregrid.org ]
State: Running
Duration: 60 minutes.
Start time: Tue Feb 26 11:28:28 EST 2013
Shutdown time: Tue Feb 26 12:28:28 EST 2013
Termination time: Tue Feb 26 12:30:28 EST 2013
Details: VMM=129.114.32.76
*Handle: vm-311
Image: centos-5.5-x86_64.gz

Messaging and Dashboard provided unified access to monitoring data

• The messaging tool provides programmatic access to monitoring data
  – Single format (JSON)
  – Single distribution mechanism via the AMQP protocol (RabbitMQ)
  – Single archival system using CouchDB (a JSON object store)
• Information gatherers send messages in a common representation language to the messaging service; consumers query the database and receive results
• The dashboard provides an integrated presentation of monitoring data in the user portal
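A minimal sketch of what an information gatherer might do with this stack: publish a JSON monitoring message over AMQP using the pika client. The exchange name, routing key, host name, and message fields are invented for illustration; the slide does not specify FutureGrid's actual message schema.

```python
# Sketch: publish one JSON monitoring message over AMQP (RabbitMQ) with pika.
# Exchange, routing key, and message fields are illustrative placeholders.
import json
import time

import pika

message = {
    "source": "ganglia",                 # hypothetical gatherer name
    "host": "example-node",              # placeholder host name
    "metric": "load_one",
    "value": 0.42,
    "timestamp": time.time(),
}

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.exchange_declare(exchange="monitoring", exchange_type="topic")
channel.basic_publish(
    exchange="monitoring",
    routing_key="futuregrid.ganglia.load",
    body=json.dumps(message),
)
connection.close()
```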

Virtual Performance Measurement

• Goal: a user-level interface to hardware performance counters for applications running in VMs
• Problems and solutions:
  – VMMs may not expose hardware counters
    • Addressed in most recent kernels and VMMs
  – Strict infrastructure deployment requirements
    • Exploration and documentation of minimum requirements
  – Counter access may impose high virtualization overheads
    • Requires careful examination of the trap-and-emulate infrastructure
    • Counters must be validated and interpreted against bare metal
  – Virtualization overheads show up in certain hardware event types, i.e. TLB and cache events
    • An on-going area for research and documentation

Virtual Timing

• Various methods for timekeeping in virtual systems: real time clock, interrupt timers, time stamp counter, tickless timekeeping (no timer interrupts)
• Various corrections are needed for application performance timing; tickless is best
• PAPI currently provides two basic timing routines:
  – PAPI_get_real_usec for wallclock time
  – PAPI_get_virt_usec for process virtual time
    • Affected by “steal time” when the VM is descheduled on a busy system
• PAPI has implemented steal time measurement (on KVM) to correct for time deviations on loaded VMMs (see the sketches below)
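The real/virtual distinction PAPI makes (PAPI_get_real_usec vs. PAPI_get_virt_usec) has a rough analogue in plain Python: wall-clock time keeps advancing while the process is descheduled, whereas process CPU time does not. The snippet below is only an analogy to illustrate the two clocks, not a PAPI example.

```python
# Rough analogy of wallclock vs. process-virtual time (not a PAPI example):
# sleeping advances the wall clock but not the process CPU clock, much as a
# descheduled VM accumulates real time but not virtual time.
import time

real_start = time.time()            # wallclock, like PAPI_get_real_usec
virt_start = time.process_time()    # CPU time of this process, like PAPI_get_virt_usec

total = sum(i * i for i in range(2_000_000))   # real work: burns CPU time
time.sleep(1.0)                                # "descheduled": burns only wall time

real_elapsed = time.time() - real_start
virt_elapsed = time.process_time() - virt_start
print(f"real    {real_elapsed:6.3f} s")   # includes the sleep
print(f"virtual {virt_elapsed:6.3f} s")   # roughly just the squaring loop
```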

Effect of Steal Time on Execution Time Measurement

• The real execution time of a matrix-matrix multiply increases linearly per core as other applications are added, while the virtual execution time remains constant, as expected
• When the competing applications are other virtual guests, both real and virtual execution times increase in lockstep: the guests are “stealing” time from each other, creating the need for a virtual-virtual time correction (stealing occurs when VM’s are descheduled to allow others to run)
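On KVM guests, the kernel already exposes the accumulated steal time in /proc/stat (the 8th value on the cpu line, in clock ticks). A small Python sketch that reads it, as one way to approximate the kind of correction described above:

```python
# Read accumulated CPU steal time from /proc/stat on a Linux (e.g. KVM) guest.
# Fields on the "cpu" line: user nice system idle iowait irq softirq steal ...
import os

def steal_seconds():
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("cpu "):          # aggregate line for all CPUs
                fields = line.split()
                steal_ticks = int(fields[8])      # 8th value after "cpu"
                return steal_ticks / ticks_per_sec
    return 0.0

print(f"time stolen from this guest so far: {steal_seconds():.2f} s")
```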

Details – FutureGrid Appliances

Education and Training Use of FutureGrid

• 28 semester-long classes: 563+ students
  – Cloud Computing, Distributed Systems, Scientific Computing and Data Analytics
• 3 one-week summer schools: 390+ students
  – Big Data, Cloudy View of Computing (for HBCU’s), Science Clouds
• 7 one- to three-day workshops/tutorials: 238 students
• Several undergraduate research REU (outreach) projects
• From 20 institutions
• Developing 2 MOOC’s (Google Course Builder) on Cloud Computing and the use of FutureGrid, supported by either FutureGrid or downloadable appliances (custom images)
  – See http://iucloudsummerschool.appspot.com/preview and http://fgmoocs.appspot.com/preview
• FutureGrid appliances support Condor/MPI/Hadoop/Iterative MapReduce virtual clusters

Educational appliances in FutureGrid

• A flexible, extensible platform for hands-on, lab-oriented education on FutureGrid
• Executable modules – virtual appliances
  – Deployable on FutureGrid resources
  – Deployable on other cloud platforms, as well as virtualized desktops
• Community sharing
  – Web 2.0 portal, appliance image repositories
  – An aggregation hub for executable modules and documentation

Grid appliances on FutureGrid

• Virtual appliances
  – Encapsulate the software environment in an image
    • Virtual disk, virtual hardware configuration
• The Grid appliance
  – Encapsulates cluster software environments
    • Condor, MPI, Hadoop
  – Homogeneous images at each node
  – A virtual network forms a cluster
  – Deploy within or across sites
• Same environment on a variety of platforms
  – FutureGrid clouds, student desktops, private clouds, Amazon EC2, …

Grid appliance on FutureGrid

• Users can deploy virtual private clusters

[Diagram: GroupVPN credentials (obtained from the web site) are copied into a virtual machine to instantiate a Hadoop worker on the Hadoop + virtual network (GroupVPN); repeating the step adds more workers, each receiving a virtual IP via DHCP (e.g. 10.10.1.1, 10.10.1.2)]