Fault Tolerance Design Techniques


An Introduction to Cloud Computing
Seattle University
Course: Computing System
Professor: Dr. Yingwu Zhu
By
Navsimrat Kaur
Pooja Singhal
Sangeetha Codla Diwakar
Outline
Introduction
 Key Characteristics and Benefits of Cloud
 Different Cloud Service Models
 Case Study A: Amazon EC2 - Pooja
 Case Study B: Google App Engine - Navi
 Case Study C: Microsoft Azure - Sangeetha
 Current Issues and Limitations of Cloud
 Summary
 References

What is Cloud Computing
 Cloud computing is a technology that uses the internet and central
remote servers to maintain data and applications.
 It provides on demand resources and services over the internet
with the power of scalability and reliability.
Key Characteristics and Benefits
Economy: Cost is the reason most frequently cited for choosing the cloud.
 Zero upfront infrastructure cost
 Pay per use

Time: Just-in-time infrastructure

Elasticity: Scale up or down on demand, which improves testability and
experimentation

Better resource utilization

Potential for shrinking processing time

Overflow excess traffic to the cloud
http://www.dotcominfoway.com/technology/cloud-computing
Different Cloud Service Models
http://hrushikeshzadgaonkar.wordpress.com/
3 Building Blocks
SaaS: An on-demand software delivery model in which software and its
associated data are hosted centrally in the cloud and are typically accessed
by users through a client, normally a web browser, over the Internet.
*Allows users to run existing online applications.

PaaS: A platform for building and deploying applications. It includes hardware
(servers, networks, load balancers, etc.) and software (operating systems,
databases, application servers, etc.). PaaS providers include Google App Engine,
Microsoft Azure, and Salesforce.com’s Force.com.
*Allows users to create their own cloud applications using supplier-specific tools
and languages.

IaaS: The infrastructure itself: hardware (servers, networks, load balancers, etc.)
delivered as virtualized resources, on which users run their own software
(operating systems, databases, application servers, etc.).
*Allows users to run any application they please on cloud hardware of their
choice.

CASE STUDY 1 – Amazon EC2
Amazon EC2

EC2: Amazon Elastic Compute Cloud is a web service that provides
resizable compute capacity in the cloud. It is designed to make web
scale computing easier for developers.

Gives the user a virtual machine instance in the cloud on which to host and run
applications.

Uses XEN Para-Virtualization Architecture
XEN Para-Virtualization Architecture
http://tr.opensuse.org/An_Introduction_to_Virtualization
Amazon EC2 Core Features

Amazon Machine Images (AMIs): An AMI contains all the information necessary to boot
instances of the user’s software. Ready-made template images are also
available and allow instant use of EC2.

Amazon EC2 Instance: A running system based on an AMI is referred to as
an instance.

Amazon Elastic Block Store (EBS): Offers persistent storage for EC2 instances.
Designed to protect data by automatically creating replicas. EBS-backed EC2 instances
can be stopped and restarted.

Elastic Load Balancing: Automatically distributes incoming application
traffic across multiple Amazon EC2 instances.

Auto Scaling: Automatically scales the number of EC2 instances up or down, enabled by
Amazon CloudWatch.
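
The load balancing and auto scaling ideas above can be illustrated with a minimal sketch. It is not Amazon's implementation: the round-robin policy, the CPU thresholds, and every name below are assumptions chosen only for illustration.

```python
from itertools import cycle


def round_robin(requests, instances):
    """Assign each request to the next instance in turn (assumed policy)."""
    assignment = {name: [] for name in instances}
    for request, name in zip(requests, cycle(instances)):
        assignment[name].append(request)
    return assignment


def desired_instance_count(current, avg_cpu, scale_up_at=70.0,
                           scale_down_at=30.0, minimum=1, maximum=10):
    """Threshold rule: add one instance when busy, remove one when idle.

    The thresholds and limits are hypothetical defaults, not AWS values.
    """
    if avg_cpu > scale_up_at:
        return min(current + 1, maximum)
    if avg_cpu < scale_down_at:
        return max(current - 1, minimum)
    return current
```

A monitoring metric (here an average CPU percentage) drives the scaling decision, while the balancer only decides where each request goes.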
Amazon EC2 Functionality






Select a pre-configured, templated image to get up and running
immediately (or configure a new AMI).
Configure security and network access on the Amazon EC2 instance.
Choose instance type(s) and operating system.
Start, terminate, and monitor instances of the AMI as needed.
Determine whether you want to run in multiple locations, use static IP
endpoints, or attach persistent block storage to your instances.
Pay only for the resources that are actually consumed.
Amazon EC2 Demo
Amazon EC2 Failure: Analysis
On April 21, 2011, a failure in Amazon’s Northern Virginia data center
caused dozens of popular websites to be out of service for a considerable
amount of time.
 Affected: Foursquare , Reddit, Hootsuite, Quora, many other companies
 Unaffected: Netflix, SimpleGeo, SmugMug
What really happened?







While attempting to upgrade the primary EBS network, Amazon engineers
accidentally routed some traffic to a backup network with insufficient capacity.
A large number of EBS nodes lost their connection to the replicas they had
created, causing them to immediately look for somewhere to create new replicas.
Instances trying to read or write these volumes also got stuck.
To stabilize and restore the EBS cluster, all control APIs were disabled in the
affected Availability Zone, which made the service unavailable.
The Amazon team took 12 hours to bring the replication storm under control.
Recovering customers’ data took much longer than that, and 0.07% of EBS volumes
were unrecoverable.
Amazon EC2 Failure:
Lessons Learned

Better communication with clients in a crisis:
The harshest criticism of Amazon was the lack of any response for more than 40 minutes.





The incident showed the weaknesses of the cloud and also highlighted the risks for those
who have become totally dependent on it.
The cloud is still maturing and evolving.
Don’t store data on the instance; if you do, back it up frequently. Also make
an AMI of your instance for easy re-creation or cloning.
Design your systems with the cloud in mind: each component (EC2
instance) should be able to die without affecting the whole system.
Netflix uses Chaos Monkey (a set of scripts) that runs through its AWS
processes and occasionally shuts them down to ensure that the rest of the
system is able to keep running. Netflix also uses Amazon’s redundant cloud
backup infrastructure.
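
The "any component can die" lesson can be sketched as a toy chaos test. This is not Netflix's Chaos Monkey; the `Service` class, the instance names, and the rule of always leaving one instance running are assumptions made for this illustration.

```python
import random


class Service:
    """A service backed by several interchangeable instances (hypothetical)."""

    def __init__(self, instance_names):
        self.alive = {name: True for name in instance_names}

    def kill(self, name):
        self.alive[name] = False

    def handle(self, request):
        """Serve the request from any instance that is still alive."""
        for name, up in self.alive.items():
            if up:
                return f"{request} served by {name}"
        raise RuntimeError("no healthy instance left")


def chaos_round(service, rng):
    """Shut down one randomly chosen live instance.

    Assumption for this sketch: the last live instance is never killed.
    Returns the name of the victim, or None if nothing was shut down.
    """
    live = [name for name, up in service.alive.items() if up]
    if len(live) <= 1:
        return None
    victim = rng.choice(live)
    service.kill(victim)
    return victim
```

After each round, a test would send a request through `handle` and check that it still succeeds; a failure means some component was a single point of failure.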
Pay Model: Free Tier
Pay Model: On Demand Instances
Pay Model: Reserved Instances
So What’s So Amazing About Amazon EC2?







Elastic
Completely Controlled
Flexible
Reliable
Secure
Designed for use with other web services
Inexpensive
CASE STUDY 2 – Google App Engine
Overview





Run your application on Google infrastructure.
Build your app using
 Any JVM-based interpreter or compiler
 Python
 Go
Applications built on Google infrastructure are
 Easy to build, maintain, and scale.
The user can have the app served from the free appspot.com domain
or from their own domain name.
The starting package is free
 10 applications
 500 MB storage
 5 million page views per month
High Level Architecture
http://www.byteonic.com/2009/why-java-is-a-better-choice-than-using-python-on-google-appengine/
How does it work?








Dynamic web serving
Persistent storage
Automatic scaling and load balancing
APIs for user authentication and sending email.
Fully featured local development environment.
Task queues
Scheduled tasks
Secure Environment

Sandbox
 The sandbox isolates the application from the operating system, hardware, and physical location
of the server in a secure and reliable way.
 This makes load balancing easy.
DataStore




A powerful distributed data storage service.
 Grows with the amount of traffic.
Stores data objects as entities. An entity can have one or more properties
of different types.
 Creates, updates, and deletes happen in a transaction.
Entities can also belong to entity groups, which are defined by a hierarchy of
relationships between entities.
Uses optimistic concurrency control.
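
Optimistic concurrency can be sketched as "read a version, write only if the version is unchanged, retry on conflict". The in-memory `Store` below is a hypothetical stand-in, not the App Engine Datastore API, and the retry count is an assumed default.

```python
class ConflictError(Exception):
    """Raised when another writer changed the entity first."""


class Store:
    """In-memory stand-in: each key maps to a (version, value) pair."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        # Missing keys are reported as version 0 with no value.
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        """Commit only if nobody else has written since we read."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise ConflictError(key)
        self._data[key] = (current_version + 1, value)


def update_with_retry(store, key, fn, retries=3):
    """Re-read and re-apply fn until the write commits or retries run out."""
    for _ in range(retries):
        version, value = store.read(key)
        try:
            store.write(key, version, fn(value))
            return True
        except ConflictError:
            continue
    return False
```

No lock is held while the application computes the new value; conflicts are detected only at commit time, which works well when concurrent writes to the same entity are rare.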
Types of Datastore

High Replication datastore
 Synchronous replication
 Highly available and reliable
 Available for reads and writes even during planned downtime
 Data replicated using the Paxos algorithm.
 Three times as expensive as Master/Slave

Master/Slave datastore
 Asynchronous replication
 One datacenter is the master at any given time for writes, and therefore
offers strong consistency.
Services


Memcache
 When to use?
 Speed up common datastore queries
 Session data, user preferences and frequently performed queries
 When not to use?
 Values can expire unexpectedly from cache. Make sure that your
application runs normally if the value is suddenly not available.
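
The advice about unexpected expiry amounts to treating the cache as optional. A minimal cache-aside sketch follows; `ExpiringCache`, `get_user_prefs`, and the key format are hypothetical names for illustration and are not the App Engine Memcache API.

```python
class ExpiringCache:
    """Stand-in for a cache whose entries may vanish at any time."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)  # None means "not cached"

    def set(self, key, value):
        self._items[key] = value

    def evict(self, key):
        """Simulate an unexpected expiry."""
        self._items.pop(key, None)


def get_user_prefs(user_id, cache, load_from_datastore):
    """Try the cache first; on a miss, fall back to the datastore and refill."""
    key = f"prefs:{user_id}"
    prefs = cache.get(key)
    if prefs is None:
        prefs = load_from_datastore(user_id)
        cache.set(key, prefs)
    return prefs
```

Because every read path can fall back to the datastore, losing a cached value costs only latency, never correctness.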
Quota
URL Fetch







Communicate with other hosts using HTTP or HTTPS requests.
The URL to be fetched can use any port in the ranges 80-90, 440-450, and 1024-65535.
A fetch can use any of GET, POST, PUT, HEAD, and DELETE.
A request handler cannot call its own URL.
The default deadline for a URL fetch response is 5 seconds; the maximum is
10 seconds for online requests and 10 minutes for offline requests.
Supports both synchronous and asynchronous requests.
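
The port restriction can be checked before a fetch is attempted. The sketch below assumes the allowed ranges are 80-90, 440-450, and 1024-65535, and that a URL without an explicit port uses the scheme default; `fetch_port_allowed` is a hypothetical helper, not part of the URL Fetch API.

```python
from urllib.parse import urlsplit

# Assumed reading of the port ranges listed on the slide.
ALLOWED_PORT_RANGES = [(80, 90), (440, 450), (1024, 65535)]
DEFAULT_PORTS = {"http": 80, "https": 443}


def fetch_port_allowed(url):
    """Return True if the URL uses http/https and an allowed port."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        return False
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return any(low <= port <= high for low, high in ALLOWED_PORT_RANGES)
```

For example, `http://example.com:8080/x` passes the check while `http://example.com:22/` does not.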
Quota
Mail



Sending emails
 The message to be sent is queued and the call returns immediately.
 The Mail service contacts each recipient’s mail server, delivers the message,
and retries if the server is unavailable.
 If the Mail service fails to send the message, an error message is sent to
the address of the sender.
Receiving emails
 Receive emails of the form [email protected]
 Received as HTTP requests
Quota
BlobStore

Allows the app to serve data objects that can be up to 2 gigabytes in size.
Useful for serving large files, e.g., video or image files, or for allowing users to
upload large files.

Blobs cannot be modified once they are created.

Quota

Capabilities & Images


Capabilities

Detect outages and scheduled downtime.

Reduce downtime by detecting if capability is available or not
Images

Manipulate images (rotate, crop, resize) using the Images service

Supports JPEG, PNG, GIF, BMP, TIFF, and ICO formats.
Channel

Creates a persistent connection between the application and Google servers
http://code.google.com/appengine/docs/java/channel/overview.html
Channel Quota
OAuth


A protocol that allows a user to grant a third party limited permission without
having to give his or her username or password to that third party.
Steps between the user, the consumer, and the service provider:
 The consumer calls a web service to get a request token for the app.
 The user’s browser is redirected to the authentication URL; the user signs in and
tells Google Accounts that the consumer is authorized to access the service on
the user’s behalf.
 The consumer calls the web service to get an access token.
 The consumer is now authorized to call the service.
Task Queues

Apps can perform work outside of user requests, e.g., background
work. Task queues are an efficient and powerful tool for background processing.
 Push Queues
 Configure a queue and add tasks to it. App Engine takes care of the
rest.
 Easy to implement but restricted to use within App Engine.
 Pull Queues
 Best choice if a different system will consume the tasks.
 The task consumer leases a specific number of tasks from the queue and
is responsible for deleting them afterwards.
 Gives more flexibility and control over when and where tasks are
processed.
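
The lease-then-delete contract of a pull queue can be sketched with a small in-memory model. `PullQueue` and its method names are hypothetical and are not the App Engine Task Queue API; time is passed in explicitly so the behavior is easy to follow.

```python
class PullQueue:
    """In-memory model of a pull queue with leases (illustration only)."""

    def __init__(self):
        self._tasks = {}    # task_id -> payload
        self._leases = {}   # task_id -> time at which the lease expires
        self._next_id = 0

    def add(self, payload):
        self._next_id += 1
        self._tasks[self._next_id] = payload
        return self._next_id

    def lease(self, max_tasks, lease_seconds, now):
        """Hand out up to max_tasks tasks that are not currently leased."""
        leased = []
        for task_id, payload in self._tasks.items():
            if len(leased) == max_tasks:
                break
            if self._leases.get(task_id, 0) <= now:
                self._leases[task_id] = now + lease_seconds
                leased.append((task_id, payload))
        return leased

    def delete(self, task_id):
        """The consumer calls this once the task has been processed."""
        self._tasks.pop(task_id, None)
        self._leases.pop(task_id, None)
```

A task that is leased but never deleted becomes available again once its lease expires, so work is not lost if a consumer crashes partway through.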
 Quota
Users, Multitenancy and XMPP




Authenticate users
 Google Accounts
 Google Apps domain
 OpenID
Multitenancy: One instance of an application serves many clients.
XMPP: Send and receive messages to and from any XMPP-compatible
chat service, e.g., Google Talk
XMPP quota
Billing Model
http://code.google.com/appengine/docs/billing.html
CASE STUDY 3 – Microsoft Azure
Microsoft Azure Platform


The Windows Azure platform is a simple, reliable, and
powerful Microsoft platform for creating cloud
applications, online services, and websites.
Core products:
 Windows Azure
 SQL Azure
 Windows Azure Platform AppFabric
Windows Azure



Windows Azure is the operating system that helps
developers build, host, and scale applications through
Microsoft datacenters.
Applications run in internet-accessible data
centers.
Data is stored on machines in an internet-accessible data
center.
Windows Azure apps run in data centers
accessed via internet
White Paper: Introducing windows azure
Main Components of Windows Azure
White Paper: Introducing windows azure
Windows Azure Components





Compute: Runs apps in the cloud.
Storage: Stores data in the cloud (blobs, tables, queues).
Fabric Controller: Deploys, monitors, and manages the
apps in the cloud.
Content Delivery Network: Speeds up access to stored data
by maintaining cached copies of it.
Connect: Allows connections between on-premises
computers and Windows Azure applications.
Application Roles
An application can have one or more instances of each of these
roles.
 Web Roles – Make it easy to host web-based applications. A web role has
IIS configured within it and acts as the front end, so creating WCF and ASP.NET apps is easy.

 Worker Roles – For general Windows-based code. A worker role does not have
IIS configured. It handles processing such as user interactions and
video processing.

When a user submits a request, the resulting task can involve both
front-end and back-end work. The web role takes care of the front-end
work and hands the processing over to the worker
role.
 VM Roles – Help move Windows Server apps to Windows
Azure.
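
The hand-off from a web role to a worker role can be sketched as a producer and a consumer sharing a queue. The slides do not say how the hand-off is implemented; the queue and the function names below are assumptions, and nothing here uses the Azure SDK.

```python
from collections import deque


def web_role_handle(queue, request_id, video_name):
    """Front end: accept the request and hand the heavy work to the queue."""
    queue.append((request_id, video_name))
    return f"request {request_id} accepted"


def worker_role_run_once(queue, results):
    """Back end: take one job off the queue and process it, if any is waiting."""
    if not queue:
        return False
    request_id, video_name = queue.popleft()
    results[request_id] = f"processed {video_name}"
    return True
```

The front end replies as soon as the job is queued, so slow processing does not block user requests, and more worker instances can be added without changing the web role.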
Submitting App to Azure


The developer submits
 App
 Config – tells the platform how many instances of each
role (web, worker) to run.
Fabric Controller – based on the config file, creates
a VM for each instance.
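
The "one VM per role instance" rule can be sketched in a few lines. The config is modeled here as a plain dictionary and the VM names are invented; the real service configuration format and the Fabric Controller's behavior are not shown on the slides.

```python
def plan_vms(config):
    """Return one (hypothetically named) VM per requested role instance."""
    vms = []
    for role, instance_count in config.items():
        for index in range(instance_count):
            vms.append(f"{role}-vm-{index}")
    return vms


# Example: two web role instances and one worker role instance.
planned = plan_vms({"web": 2, "worker": 1})
```

Changing the instance counts in the config changes how many VMs are planned, without touching the application code.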
Setting up a simple Azure Application

http://www.microsoft.com/windowsazure/getstarted/
Browser Output
Sample Application to store file data to Azure








Create a simple text file, “myfile.txt”.
Create a console application in VS2010.
Reference the Microsoft Azure storage DLL.
Create a blob. (Blobs are used for storage; an application can
interact with them as if they were local files.)
Set the file reference to the blob.
Upload the file from the local machine to the blob.
Get the URI of the blob.
The data can now be accessed through this URI.
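
The upload-then-access-by-URI flow can be mimicked with an in-memory stand-in. This is not the Azure storage library used in the original demo (a VS2010 console application); `InMemoryBlobStore`, the container name, and the placeholder host are all assumptions for illustration.

```python
class InMemoryBlobStore:
    """Hypothetical stand-in for a blob storage account (not the Azure SDK)."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._blobs = {}  # URI -> bytes

    def upload(self, container, blob_name, data):
        """Store the bytes under container/blob_name and return the blob's URI."""
        uri = f"{self.base_url}/{container}/{blob_name}"
        self._blobs[uri] = data
        return uri

    def download(self, uri):
        """Fetch the stored bytes back through the URI."""
        return self._blobs[uri]


store = InMemoryBlobStore("https://storage.example.invalid")  # placeholder host
file_data = b"hello from myfile.txt"   # stands in for the local file contents
blob_uri = store.upload("files", "myfile.txt", file_data)
```

Once uploaded, the URI is the only thing a client needs to retrieve the data, which mirrors the last two steps of the list.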
Current Problems/Limitations of Cloud

EC2 Limitations
 Not easy to recover if something goes wrong after creating an instance.
 Random loss of instances
 Server configuration woes: configuring, running, and monitoring EC2 instances

Azure Limitations
 Azure provides application-level cloud computing, not infrastructure-level cloud computing
like Amazon. Users can only deploy applications; there is no choice of OS.
 Security concerns: We cannot be sure who has access to the data.
 Learning curve: Working with storage such as blobs, tables, and queues requires some
experience.
 Poor debugging and logging support for deployed applications
 Untested compared to Google’s and Amazon’s offerings.

App Engine Limitations
 Non-ancestor queries may return stale results in the High Replication Datastore.
 Data may be unavailable during planned downtime or failures with the Master/Slave
datastore.
Summary


Cloud Computing Introduction, Benefits, Cloud Service Models
Case study 1: Amazon EC2
 EC2 Architecture
 EC2 Core Features and Functionality
 Demo of Launching an EC2 Instance
 April 2011 Failure Analysis and Lessons Learned
 Different Pay Models
 Highlights

Case study 2: Google App Engine
 Overview
 High Level Architecture
 Data Stores
 High Replication
 Master/Slave
 Services and Quotas
 Billing Model


Case study 3: Microsoft Azure
 Windows Azure Definition
 Main Components
 Roles
 Web Roles
 Worker Roles
 VM Roles
 Sample Applications:
 Hello World!
 Accessing file data.
Limitations
References
 Google 
 http://www.microsoft.com/windowsazure
 http://www.jackofallclouds.com/
 http://aws.amazon.com/ec2/
 http://cloud-computing.learningtree.com/tag/amazon-ec2/
 White Paper on AWS Cloud Best Practices, 2010 By Jinesh Varia
 White Paper on Amazon EC2 on Red Hat Enterprise Linux
http://www.microsoft.com/en-us/cloud/developer/
http://www.microsoft.com/enus/cloud/developer/resource.aspx?resourceId=introducing-windowsazure&fbid=kRj7B2TdjLB
http://www.microsoft.com/windowsazure/getstarted/
http://www.microsoft.com/windowsazure/sdk/
 http://kasunpanorama.blogspot.com/2010/07/understanding-cloud-computing-feeleasy.html
http://code.google.com/appengine/docs/
Google App engine paper by Alexander Zahariev Helsinki University of Technology
Team members contributions



Pooja – Amazon EC2
Navi – Google App Engine
Sangeetha – Microsoft Azure