Transcript Document

Big Data Open Source Software
and Projects
ABDS in Summary IV: Layer 5 Part 2
Data Science Curriculum
March 5 2015
Geoffrey Fox
http://www.infomall.org
School of Informatics and Computing
Digital Science Center
Indiana University Bloomington
Functionality of 21 HPC-ABDS Layers
Here are 21 functionalities (including the subparts of layers 11, 14 and 15): 4 cross-cutting at the top, then 17 in order of the layered diagram, starting at the bottom.
1) Message Protocols:
2) Distributed Coordination:
3) Security & Privacy:
4) Monitoring:
5) IaaS Management from HPC to hypervisors: Part 2
6) DevOps:
7) Interoperability:
8) File systems:
9) Cluster Resource Management:
10) Data Transport:
11) A) File management
B) NoSQL
C) SQL
12) In-memory databases & caches / Object-relational mapping / Extraction Tools
13) Inter process communication: Collectives, point-to-point, publish-subscribe, MPI:
14) A) Basic Programming model and runtime, SPMD, MapReduce:
B) Streaming:
15) A) High level Programming:
B) Application Hosting Frameworks
16) Application and Analytics:
17) Workflow-Orchestration:
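The slide's count of 21 functionalities (4 cross-cutting plus 17 layered, once the lettered subparts of layers 11, 14 and 15 are counted separately) can be checked by holding the list as data. This is a minimal sketch; the names CROSS_CUTTING, LAYERED and functionality_labels are hypothetical, chosen only for illustration.

```python
# Hypothetical encoding of the 21 HPC-ABDS functionalities listed above.
# Layers 1-4 are cross-cutting; layers 5-17 form the layered stack,
# with layers 11, 14 and 15 split into lettered subparts.
CROSS_CUTTING = {
    1: "Message Protocols",
    2: "Distributed Coordination",
    3: "Security & Privacy",
    4: "Monitoring",
}

LAYERED = {
    5: ["IaaS Management from HPC to hypervisors"],
    6: ["DevOps"],
    7: ["Interoperability"],
    8: ["File systems"],
    9: ["Cluster Resource Management"],
    10: ["Data Transport"],
    11: ["File management", "NoSQL", "SQL"],
    12: ["In-memory databases & caches / Object-relational mapping / Extraction Tools"],
    13: ["Inter process communication"],
    14: ["Basic Programming model and runtime", "Streaming"],
    15: ["High level Programming", "Application Hosting Frameworks"],
    16: ["Application and Analytics"],
    17: ["Workflow-Orchestration"],
}


def functionality_labels():
    """Return labels such as '11A': cross-cutting ones first, then layer 5 upward."""
    labels = [str(n) for n in CROSS_CUTTING]
    for layer, parts in LAYERED.items():
        if len(parts) == 1:
            labels.append(str(layer))
        else:
            # Subparts are lettered A, B, C, ... as on the slide.
            labels.extend(f"{layer}{chr(ord('A') + i)}" for i in range(len(parts)))
    return labels


if __name__ == "__main__":
    labels = functionality_labels()
    print(len(CROSS_CUTTING), "cross-cutting +",
          len(labels) - len(CROSS_CUTTING), "layered =", len(labels))
```

Keeping the subparts as lists under one layer number mirrors the slide's numbering, so the total of 21 falls out of the structure rather than being hard-coded.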
OpenStack SubProjects
March 2015
• 16 OpenStack capabilities (11 integrated projects plus 5 under development) http://www.openstack.org/software/roadmap/
• OpenStack Compute (code-name Nova) - integrated project since the Austin release
• OpenStack Networking (code-name Neutron) - integrated project since the Folsom release
• OpenStack Object Storage (code-name Swift) - integrated project since the Austin release
• OpenStack Block Storage (code-name Cinder) - integrated project since the Folsom release
• OpenStack Identity (code-name Keystone) - integrated project since the Essex release
• OpenStack Image Service (code-name Glance) - integrated project since the Bexar release
• OpenStack Dashboard (code-name Horizon) - integrated project since the Essex release
• OpenStack Telemetry (code-name Ceilometer) - integrated project since the Havana release
• OpenStack Orchestration (code-name Heat) - integrated project since the Havana release
• OpenStack Database (code-name Trove) - integrated project since the Icehouse release
• OpenStack Data Processing (code-name Sahara) - integrated project since the Juno release
• New capabilities under development for the Juno release and beyond:
– Bare Metal (Ironic)
– Queue Service (Zaqar)
– Shared file system (Manila)
– DNS Service (Designate)
– Key Management (Barbican)
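The integration history above can be queried once it is held as data. A minimal sketch, assuming only the release order Austin, Bexar, Cactus, Diablo, Essex, Folsom, Grizzly, Havana, Icehouse, Juno (OpenStack named its releases alphabetically); integrated_by is a hypothetical helper, not part of any OpenStack tool.

```python
# Release names in chronological (and alphabetical) order, Austin (2010) to Juno (2014).
RELEASES = ["Austin", "Bexar", "Cactus", "Diablo", "Essex",
            "Folsom", "Grizzly", "Havana", "Icehouse", "Juno"]

# Integrated projects and the release in which each was integrated,
# taken from the list above (March 2015 code names).
INTEGRATED_SINCE = {
    "Nova": "Austin", "Swift": "Austin", "Glance": "Bexar",
    "Keystone": "Essex", "Horizon": "Essex",
    "Neutron": "Folsom", "Cinder": "Folsom",
    "Ceilometer": "Havana", "Heat": "Havana",
    "Trove": "Icehouse", "Sahara": "Juno",
}


def integrated_by(release):
    """Return, sorted by name, the projects integrated in or before the given release."""
    cutoff = RELEASES.index(release)
    return sorted(name for name, since in INTEGRATED_SINCE.items()
                  if RELEASES.index(since) <= cutoff)


if __name__ == "__main__":
    print(integrated_by("Essex"))
```

For example, integrated_by("Essex") returns ['Glance', 'Horizon', 'Keystone', 'Nova', 'Swift']. Neutron was still called Quantum when it was integrated in Folsom; the table uses the later code name.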
FutureGrid: IaaS request popularity by year
OpenNebula
• http://en.wikipedia.org/wiki/OpenNebula and http://opennebula.org/; Apache License.
• OpenNebula orchestrates storage, network, virtualization,
monitoring, and security technologies to deploy multi-tier services
(e.g. compute clusters) as virtual machines on distributed
infrastructures, combining both data center resources and remote
cloud resources, according to allocation policies
• The toolkit includes features for integration, management,
scalability, security and accounting. It also claims standardization,
interoperability and portability, providing cloud users and
administrators with a choice of several cloud interfaces (Amazon
EC2 Query, OGF Open Cloud Computing Interface and vCloud) and
hypervisors (Xen, KVM and VMware), and can accommodate
multiple hardware and software combinations in a data center
• A good system that has been strongly promoted in Europe but is little used in
the USA, where it has been eclipsed by OpenStack
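OpenNebula deploys VMs "according to allocation policies". The sketch below is a hypothetical simplification, not OpenNebula's scheduler or API: Host and place_vm are illustrative names. It contrasts two common placement strategies, packing (fill the busiest host that still fits, keeping other hosts free) and striping (spread VMs onto the least busy host).

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_cpus: int
    running_vms: int = 0


def place_vm(hosts, cpus_needed, policy="packing"):
    """Pick a host for one VM under a simple allocation policy.

    'packing' prefers the busiest host that still fits (fewer hosts in use);
    'striping' prefers the least busy host (spreads load).
    Returns the chosen Host, or None if no host has enough free CPUs.
    """
    candidates = [h for h in hosts if h.free_cpus >= cpus_needed]
    if not candidates:
        return None
    if policy == "packing":
        chosen = max(candidates, key=lambda h: h.running_vms)
    elif policy == "striping":
        chosen = min(candidates, key=lambda h: h.running_vms)
    else:
        raise ValueError(f"unknown policy: {policy}")
    # Record the new VM on the chosen host.
    chosen.free_cpus -= cpus_needed
    chosen.running_vms += 1
    return chosen


if __name__ == "__main__":
    hosts = [Host("a", free_cpus=8, running_vms=3), Host("b", free_cpus=8)]
    print(place_vm(hosts, 2, "striping").name)
```

Packing tends to leave whole hosts idle (useful for power saving), while striping reduces contention between VMs; a real scheduler would combine such rankings with many other constraints.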
CoreOS
• http://en.wikipedia.org/wiki/CoreOS
• https://coreos.com/
• Open source Linux distribution aimed at Docker
• CoreOS is a fork of Chrome OS: it uses the software development kit (SDK)
freely available through Chromium OS as a base, adding new functionality and
customizing it to support hardware used in servers
• CoreOS is an open source lightweight operating system based
on the Linux kernel and designed for providing infrastructure
to clustered deployments, while focusing on automation,
ease of applications deployment, security, reliability and
scalability.
– As an operating system, CoreOS provides only the minimal
functionality required for deploying applications inside software
containers, together with built-in mechanisms for service
discovery and configuration sharing
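CoreOS provides service discovery and configuration sharing through its etcd distributed key-value store. The class below is not etcd's API; it is a hypothetical, single-process sketch of the idea: services register an address under a key with a time-to-live (TTL), clients discover live instances by key prefix, and shared configuration is simply a key without a TTL.

```python
import time


class ServiceRegistry:
    """Toy in-memory registry for service discovery and shared configuration.

    Entries written with a TTL disappear unless they are re-written
    (refreshed) before the TTL expires; entries without a TTL persist.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the TTL logic can be tested
        self._entries = {}    # key -> (value, expiry time or None)

    def put(self, key, value, ttl=None):
        expiry = None if ttl is None else self._clock() + ttl
        self._entries[key] = (value, expiry)

    def get(self, key):
        value, expiry = self._entries.get(key, (None, None))
        if expiry is not None and self._clock() >= expiry:
            # Expired: the service stopped refreshing its registration.
            del self._entries[key]
            return None
        return value

    def discover(self, prefix):
        """Return the live values whose keys start with prefix, sorted."""
        live = []
        for key in list(self._entries):
            if key.startswith(prefix):
                value = self.get(key)
                if value is not None:
                    live.append(value)
        return sorted(live)


if __name__ == "__main__":
    reg = ServiceRegistry()
    reg.put("/services/web/1", "10.0.0.1:80", ttl=30)
    reg.put("/config/db_url", "postgres://db:5432/app")
    print(reg.discover("/services/web/"), reg.get("/config/db_url"))
```

The TTL is what makes discovery self-healing: a crashed container simply stops refreshing its key and drops out of the results.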
VMware vCloud, ESX, ESXi
• VMware ESX http://en.wikipedia.org/wiki/VMware_ESX is an enterprise-level computer virtualization product offered by VMware. ESX is a
component of VMware's larger offering, VMware Infrastructure, which
adds management and reliability services to the core server product.
VMware recommends that deployments running the earlier ESX
architecture migrate to the newer ESXi hypervisor architecture.
• VMware ESX and ESXi are VMware's enterprise software Type 1
hypervisors for guest virtual servers; they run on host server hardware
without an underlying operating system.
• vSphere http://en.wikipedia.org/wiki/VMware_vSphere uses VMware’s
ESXi hypervisor and adds management capabilities (comparable to OpenStack)
• Note that the desktop product VMware Workstation is a Type 2 hypervisor: it runs on top of a host operating system
• VMware has historically been a software vendor focused on virtualization
technologies. It entered the cloud IaaS market when it launched the
VMware vCloud Hybrid Service (vCHS) into general availability in
September 2013. http://en.wikipedia.org/wiki/VCloud This allows
customers to migrate work on demand from their "internal cloud" of
cooperating VMware hypervisors to a remote cloud of VMware
hypervisors.
– This is called cloud bursting
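Cloud bursting, as described above, amounts to a placement rule: run work on the internal cloud while capacity lasts, and send the overflow to the remote cloud. A minimal first-fit sketch, assuming jobs are described only by a core count; burst_schedule and its arguments are hypothetical, and this is not how vCloud Hybrid Service itself decides placement.

```python
def burst_schedule(jobs, internal_capacity):
    """Assign jobs to the internal cloud until it is full, then burst the rest.

    jobs: list of (name, cores) tuples, placed in the given order.
    internal_capacity: total cores available on the internal cloud.
    Returns a dict mapping each job name to "internal" or "remote".
    """
    placement = {}
    used = 0
    for name, cores in jobs:
        if used + cores <= internal_capacity:
            placement[name] = "internal"
            used += cores
        else:
            # Not enough local capacity left: burst this job to the remote cloud.
            placement[name] = "remote"
    return placement


if __name__ == "__main__":
    jobs = [("etl", 8), ("train", 16), ("report", 4)]
    print(burst_schedule(jobs, internal_capacity=20))
```

Because the loop is first-fit, a small job that arrives after a large one has burst can still run internally (here "report" stays local while "train" goes remote); a stricter policy would burst everything after the first overflow.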
Amazon, Azure, Google Clouds
• Gartner has a “magic quadrant” summarizing public clouds (28 May 2014)
http://www.gartner.com/technology/reprints.do?id=1-1UKQQA6&ct=140528
• Note that Amazon is way ahead!
• Google, with GCE (Google Compute Engine), is just starting in IaaS; previously it offered
PaaS with Google App Engine
• Microsoft has recently expanded Azure
but is still catching up
Dynamic Orchestration and Dataflow
• Software (Application or Usage): SaaS
– Use HPC-ABDS
– Class usages, e.g. run GPU & multicore
– Applications
– Control Robot
• Platform: PaaS
– Cloud, e.g. MapReduce
– HPC, e.g. PETSc, SAGA
– Computer Science, e.g. compiler tools, sensor nets, monitors
• Infrastructure: IaaS
– Software Defined Computing (virtual clusters)
– Hypervisor, Bare Metal
– Operating System
• Network: NaaS
– Software Defined Networks
– OpenFlow, GENI
Amazon Web Services AWS
• Compute: Elastic Compute Cloud (EC2) offers multitenant, fixed-size and
nonresizable, Xen-virtualized VMs without autorestart. Single-tenant VMs
are available via Dedicated Instances. There are special options for HPC,
including graphics processing units (GPUs). AWS does not have any formal
private cloud offerings, though it is willing to negotiate such deals (such as
its deal for the U.S. intelligence community cloud).
• Storage: VM storage is ephemeral. Persistence requires VM-independent
block storage (Elastic Block Store). There is an option for SSDs, as well as
storage performance guarantees (Provisioned IOPS). Object-based storage
(Simple Storage Service [S3]) is integrated with a CDN (CloudFront), there is
an option for long-term archive storage (Glacier), and AWS offers its own
cloud storage gateway appliance.
• Network: AWS offers a full range of networking options. Complex
networking and IPsec VPN is done via Amazon Virtual Private Cloud (VPC).
Third-party connectivity is via partner exchanges (AWS Direct Connect).
• Security: RBAC (Role-Based Access Control) is per-element, with customer-defined roles and exceptional control over permissions. AWS has obtained
many security and compliance-related certifications and audits.
Google Compute Engine
• Google has been operating App Engine since 2008, but did not enter the IaaS
market until the general-availability launch of GCE in December 2013.
• Compute: GCE offers multitenant, fixed-size and nonresizable, KVM-virtualized
VMs, metered by the minute. Provisioning is exceptionally fast (typically under 1
minute).
• Storage: VM storage is persistent, and there is also VM-independent block storage.
All block storage is encrypted.
• Network: Third-party private connectivity is not supported. Customers cannot
bring their own private IP addresses (although this need may be addressed
by GCE's Advanced Routing features). There is no back-end load balancing.
• Security: RBAC permissions apply to the whole account.
• Google's strategy for Google Cloud Platform centers on the concept of allowing
other organizations to "run like Google" by taking Google's highly innovative
internal technology capabilities and exposing them as services that other
companies can purchase. Consequently, although Google is a late entrant to the
IaaS market, it is primarily productizing existing capabilities, rather than having to
engineer those capabilities from scratch. It should therefore be able to advance its
offering more rapidly than most competitors.
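Per-minute metering, as offered by GCE, matters most for short or irregular jobs. The arithmetic is sketched below under stated assumptions: the hourly rate is a hypothetical figure, any minimum charge is ignored, and hourly rounding is included only as a contrasting model, not as any particular provider's price list.

```python
import math


def metered_cost(runtime_seconds, rate_per_hour, granularity_seconds):
    """Charge for one VM run, rounding the runtime up to whole billing units.

    granularity_seconds=60 models per-minute metering;
    granularity_seconds=3600 models per-hour metering.
    Assumes no minimum charge.
    """
    units = math.ceil(runtime_seconds / granularity_seconds)
    billed_hours = units * granularity_seconds / 3600
    return billed_hours * rate_per_hour


if __name__ == "__main__":
    run = 61 * 60   # a hypothetical 61-minute job
    rate = 0.60     # hypothetical price in dollars per VM-hour
    print("per-minute:", round(metered_cost(run, rate, 60), 2))
    print("per-hour:  ", round(metered_cost(run, rate, 3600), 2))
```

With these assumed numbers, per-minute metering bills 61 minutes ($0.61), whereas rounding up to whole hours bills 2 hours ($1.20) for the same work.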
Microsoft Azure
• The Azure business was previously strictly PaaS with a Windows
and .NET focus, but Microsoft launched Azure Infrastructure
Services (which include Azure Virtual Machines and Azure Virtual
Network) into general availability in April 2013, thus entering the
cloud IaaS market.
• Compute: Azure VMs (Linux or Windows) are fixed-size, paid-by-the-VM, and Hyper-V-virtualized; they are metered by the minute.
• Storage: Block storage ("virtual hard disk") is persistent and VM-independent. Object-based cloud storage is integrated with a CDN.
• Network: There is no support for complex network topologies.
Third-party connectivity is via partner exchange (Azure
ExpressRoute).
• Security: Virtual network topology limitations prevent useful
deployment of most security-related virtual appliances, such as a
perimeter intrusion detection/prevention system (IDS/IPS). RBAC
uses Azure Active Directory, but permissions are whole-account.
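The three provider slides contrast RBAC granularity: AWS permissions are per-element, while GCE and Azure permissions apply to the whole account. The sketch below models that difference with hypothetical (principal, action, resource) grants; is_allowed, the grant sets and all names in them are illustrative, and this does not reflect any provider's actual policy language.

```python
# Hypothetical grant model: each grant is (principal, action, resource),
# where resource "*" means "every resource in the account".


def is_allowed(grants, principal, action, resource):
    """Return True if any grant lets principal perform action on resource."""
    return any(
        p == principal and a == action and r in (resource, "*")
        for p, a, r in grants
    )


# Per-element style (as described for AWS): permission on one named VM only.
PER_ELEMENT = {("alice", "stop_vm", "vm-web-1")}

# Whole-account style (as described for GCE and Azure): the same permission
# necessarily covers every VM in the account.
WHOLE_ACCOUNT = {("alice", "stop_vm", "*")}

if __name__ == "__main__":
    for label, grants in [("per-element", PER_ELEMENT), ("whole-account", WHOLE_ACCOUNT)]:
        print(label, is_allowed(grants, "alice", "stop_vm", "vm-db-1"))
```

Under the whole-account model, letting alice stop the web VM also lets her stop the database VM; per-element grants allow the narrower permission.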
Google Cloud DNS & Amazon Route 53
• Google Cloud DNS
– Authoritative DNS server available as a service in Google Cloud
– The service is efficient, fault-tolerant and available globally
– The service can be used by user-hosted services in Google
Cloud or by third-party applications
– https://developers.google.com/cloud-dns/what-is-cloud-dns
• Amazon Route 53
– Authoritative DNS server available as a service in Amazon AWS
– Provides a fault-tolerant, very fast DNS service.
– Like Google Cloud DNS, the service can be used by services
hosted in the Amazon cloud or by third-party applications
– The service is available in all continents except Africa
– http://aws.amazon.com/route53/
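Both services are authoritative DNS servers: they answer only for zones their customers host, rather than resolving arbitrary names recursively. A minimal sketch of that lookup logic, assuming a hypothetical zone for example.com (a reserved documentation domain, with documentation IP addresses) and ignoring delegation, wildcards, CNAMEs and the SOA record a real negative answer carries; ZONE and answer are illustrative names.

```python
# Hypothetical zone data for a domain this server is authoritative for.
ZONE_ORIGIN = "example.com."
ZONE = {
    ("www.example.com.", "A"): ["192.0.2.10", "192.0.2.11"],
    ("example.com.", "MX"): ["10 mail.example.com."],
    ("mail.example.com.", "A"): ["192.0.2.25"],
}


def answer(qname, qtype):
    """Answer a query as a (greatly simplified) authoritative server would.

    Returns (rcode, records):
      NOERROR + records  -> the name has data of that type
      NOERROR + []       -> the name exists but has no data of that type
      NXDOMAIN + []      -> the name does not exist in the zone
      REFUSED + []       -> the name is outside the zone we serve
    """
    qname = qname.lower()
    if not qname.endswith("."):
        qname += "."  # treat names as fully qualified
    if not (qname == ZONE_ORIGIN or qname.endswith("." + ZONE_ORIGIN)):
        return "REFUSED", []
    records = ZONE.get((qname, qtype), [])
    if records:
        return "NOERROR", records
    name_exists = any(name == qname for name, _ in ZONE)
    return ("NOERROR" if name_exists else "NXDOMAIN"), []


if __name__ == "__main__":
    print(answer("www.example.com", "A"))
    print(answer("www.example.org", "A"))
```

Refusing out-of-zone names is what distinguishes an authoritative-only service from a recursive resolver, which would instead go and look the name up elsewhere.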