An Introduction to Azure


Jimmy Narang
1
• A service in the cloud has to:
• Be able to handle arbitrary node failures
• Be available all the time
• Be able to scale up or down on demand without the need to re-write the
code
• Handle platform or software upgrades
2
• The service design must be:
• Loosely coupled
• Such that node failures do not affect functionality
• Nodes can be initialized and added easily
• State of the service is decoupled from nodes
• Scale can be achieved through quantity (scale out)
3
• Cloud: thousands of connected servers
• Azure: an operating system for the cloud
• Abstracts away hardware: switches, servers, disks, routers, load balancers
• Manages deployment, so that a developer can upload code and hit ‘run’
• Provides reliable common storage that can be accessed from any node
• Provides a familiar development platform
4
• A service boundary
• Roles
• Each role has a number of identical instances
• Two types of roles: web roles and worker roles
• Storage
• Accessible from any instance
• Blobs, tables, queues
• Endpoints
• External: communicate outside the service boundary
• Internal: communicate within the service boundary
5
[Diagram: a service boundary. A load balancer (LB) forwards traffic from the external endpoints to n web role instances; m worker role instances communicate over internal endpoints; all instances can reach cloud storage.]
6
• Developers write their code and describe a service model
• Service model includes role definitions, VM Size, instance counts,
endpoints, etc.
• code + service model is packed and uploaded to Azure, which
deploys the service in Microsoft Datacenters
7
• Two types: web roles and worker roles
• No admin access; cannot install applications
• Choose a particular VM capacity for each role
• Specify number of instances per role
• Azure starts a fresh instance if an existing one crashes
• Code:
• Extend the RoleEntryPoint class for worker roles; optional for web roles.
• ASP.NET for web roles
8
• Each service runs in an isolated boundary
• The service deployment is assigned a Virtual IP address (VIP)
• The service is reachable externally via ‘external endpoints’ on this VIP
• External endpoints: ports selected to be exposed to the outside
world for in-coming connections to the service
• Usually HTTP and HTTPS on web roles (i.e., ports 80 and 443)
• Can be TCP endpoints on worker roles
• Both web and worker roles can make outbound connections to
Internet resources
• via HTTP or HTTPS and via Microsoft .NET APIs for TCP/IP sockets.
9
• Azure provides APIs to obtain internal IPs of each instance in
each role
• Roles can define ‘internal endpoints’ (ports exposed within the
service) to communicate between instances
10
• Accessed from anywhere using account name and storage key
• Exposed in the form of URIs:
• http://<accntName>.queue.core.windows.net/<queueName>
• http://<accntName>.blob.core.windows.net/<container>/<blobName>
• http://<accntName>.table.core.windows.net/<tableName>
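The URI patterns above can be assembled mechanically. The following is a minimal sketch in Python; the helper name `storage_uri` and the account, queue, container and blob names are made up for illustration and are not part of any Azure library.

```python
def storage_uri(account: str, service: str, *path: str) -> str:
    """Build a storage URI following the patterns listed above.

    `service` is "queue", "blob" or "table"; `path` holds the resource
    segments (queue name, container and blob name, or table name).
    """
    if service not in ("queue", "blob", "table"):
        raise ValueError(f"unknown storage service: {service!r}")
    base = f"http://{account}.{service}.core.windows.net"
    return "/".join([base, *path])


# "myaccount", "jobs", "photos" and "cat.jpg" are hypothetical names.
queue_uri = storage_uri("myaccount", "queue", "jobs")
blob_uri = storage_uri("myaccount", "blob", "photos", "cat.jpg")
print(queue_uri)
print(blob_uri)
```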
11
• Queues: often the best way to communicate between roles
• Messages can be 8 KB max
• Use messages as pointers to blobs/tables for larger data
• Can create several queues per account
• Not guaranteed FIFO; no priority queues either.
• Guaranteed each message will be seen at least once
12
• Create / Delete queue
• Get / Put message
• Peek message (queueName, n)
• Delete message (queueName, msgId, popReceipt)
• ‘Get message’ does not lead to deletion!
• Clear queue
13
• MessageID: a GUID associated with each message
• VisibilityTimeout: default 30 seconds, max 2 hours. Messages
not deleted within this interval will return to the queue
• PopReceipt: a string retrieved with every get-message call.
• PopReceipt + MessageID are required to delete a message
• MessageTTL: (7 days) messages not deleted within this interval
are garbage collected
14
[Diagram: producer P puts messages 1, 2, 3 on a queue that is read by consumers C1 and C2.]
• C1: GetMsg (returns 1)
• C2: GetMsg (returns 2)
• C2: DeleteMsg #2
• C1 dies
• C2: GetMsg (returns 3)
• Visibility timeout on Msg #1
• C2: DeleteMsg #3
• C2: GetMsg (returns 1)
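The consumer trace above can be replayed against a small in-memory model. This is only a sketch of the visibility-timeout and pop-receipt semantics described on the previous slides; `SimQueue` and its simulated clock are hypothetical stand-ins, not the Azure storage API.

```python
import uuid
from collections import OrderedDict


class SimQueue:
    """In-memory stand-in for a storage queue (not the Azure SDK)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self.now = 0  # simulated clock, in seconds
        self._msgs = OrderedDict()  # msg_id -> [body, visible_at, pop_receipt]

    def put(self, body):
        msg_id = str(uuid.uuid4())
        self._msgs[msg_id] = [body, self.now, None]
        return msg_id

    def get(self):
        """Hide the first visible message and return (msg_id, pop_receipt, body)."""
        for msg_id, entry in self._msgs.items():
            if entry[1] <= self.now:
                entry[1] = self.now + self.visibility_timeout
                entry[2] = str(uuid.uuid4())  # fresh receipt on every get
                return msg_id, entry[2], entry[0]
        return None  # nothing visible right now

    def delete(self, msg_id, pop_receipt):
        entry = self._msgs.get(msg_id)
        if entry is None or entry[2] != pop_receipt:
            raise KeyError("unknown message or stale pop receipt")
        del self._msgs[msg_id]


q = SimQueue(visibility_timeout=30)
for body in (1, 2, 3):
    q.put(body)

m1 = q.get()              # C1: GetMsg (returns 1)
m2 = q.get()              # C2: GetMsg (returns 2)
q.delete(m2[0], m2[1])    # C2: DeleteMsg #2
# C1 dies here without deleting message 1
m3 = q.get()              # C2: GetMsg (returns 3)
q.delete(m3[0], m3[1])    # C2: DeleteMsg #3
q.now += 30               # visibility timeout on message 1 expires
m1_again = q.get()        # C2: GetMsg (returns 1)

trace = [m1[2], m2[2], m3[2], m1_again[2]]
print(trace)
```

Because message 1 is delivered twice, a consumer in this model has to tolerate repeated delivery, which is what "seen at least once" implies.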
15
• A large chunk of (raw binary) data
• Blob Operations:
• Create / Delete
• Read / Write: byte range (page blob) or blocks (block blob)
• Lease the blob
• Create a snapshot
• Create a copy
• Mount as drive (page blob)
16
• Hierarchy: accounts, containers, blobs
• http://<account>.blob.core.windows.net/<container>/<blobname>
• An account can contain multiple containers
• A container can contain multiple blobs (containers themselves cannot be nested; a ‘/’ in the blob name can be used to simulate a hierarchy)
• Fine grained access control can be granted to containers/blobs
(grant permissions for individual operations such as read, write,
delete, list, take snapshot etc.)
17
• A block blob is a sequential list of blocks
• Each block has an ID
• Blocks are immutable
• Upload blocks out of order / in parallel
• PutBlock to upload a block
• PutBlockList to stitch uploaded blocks into a blob
• Order of upload doesn’t matter; order in PutBlockList matters.
• PutBlockList: first commit wins (all uncommitted blocks are
garbage collected)
18
[Diagram: blocks 1, 3, 4, 2 and 4 are uploaded out of order; the committed blob consists of blocks 2, 3, 4.]
PutBlob (name);
PutBlock(BlockId1);
PutBlock(BlockId3);
PutBlock(BlockId4);
PutBlock(BlockId2);
PutBlock(BlockId4);
PutBlockList(BlockId2, BlockId3, BlockId4);
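The PutBlock / PutBlockList sequence above can be modelled in a few lines. This is a rough sketch under stated assumptions: `SimBlockBlob` is a hypothetical in-memory class, a repeated upload of the same block ID is assumed to replace the earlier uncommitted copy, and only uncommitted blocks can be committed. The block contents are invented.

```python
class SimBlockBlob:
    """In-memory stand-in for a block blob (not the Azure SDK)."""

    def __init__(self):
        self.uncommitted = {}  # block_id -> bytes, uploaded but not yet part of the blob
        self.committed = []    # ordered list of (block_id, bytes) forming the blob

    def put_block(self, block_id, data):
        # Assumption: a later upload with the same ID replaces the earlier one.
        self.uncommitted[block_id] = data

    def put_block_list(self, block_ids):
        missing = [b for b in block_ids if b not in self.uncommitted]
        if missing:
            raise KeyError(f"blocks not uploaded: {missing}")
        # The order of this list, not the upload order, defines the blob.
        self.committed = [(b, self.uncommitted[b]) for b in block_ids]
        # Blocks left out of the commit are discarded.
        self.uncommitted.clear()

    def read(self):
        return b"".join(data for _, data in self.committed)


blob = SimBlockBlob()
blob.put_block("BlockId1", b"A")
blob.put_block("BlockId3", b"C")
blob.put_block("BlockId4", b"d")
blob.put_block("BlockId2", b"B")
blob.put_block("BlockId4", b"D")  # second upload of block 4
blob.put_block_list(["BlockId2", "BlockId3", "BlockId4"])

content = blob.read()
print(content)
```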
19
• Page blobs: A collection of pages
• Specify blob size at creation time.
• Entire range initialized to 0 at creation
• Read/Write specific byte ranges, no ‘commit’ required (unlike
block blobs)
• 512 Byte alignment required for write operations; not required
for read
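A minimal sketch of the page blob rules, using a hypothetical in-memory `SimPageBlob` class rather than the Azure API. It assumes that "512 byte alignment" means both the write offset and the write length are multiples of 512, and that the blob size itself is a multiple of 512.

```python
PAGE_SIZE = 512


class SimPageBlob:
    """In-memory stand-in for a page blob (not the Azure SDK)."""

    def __init__(self, size):
        if size % PAGE_SIZE:
            raise ValueError("size must be a multiple of 512 (assumption of this sketch)")
        self._data = bytearray(size)  # the whole range starts out as zeros

    def write(self, offset, data):
        if offset % PAGE_SIZE or len(data) % PAGE_SIZE:
            raise ValueError("writes must start and end on 512-byte boundaries")
        if offset + len(data) > len(self._data):
            raise ValueError("write past the end of the blob")
        self._data[offset:offset + len(data)] = data  # visible at once, no commit step

    def read(self, offset, length):
        # Reads may use any byte range.
        return bytes(self._data[offset:offset + length])


page_blob = SimPageBlob(2048)
page_blob.write(512, b"x" * 512)
unaligned_read = page_blob.read(510, 4)  # straddles the page boundary

try:
    page_blob.write(100, b"y" * 512)
    unaligned_write_rejected = False
except ValueError:
    unaligned_write_rejected = True
```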
20
• A lease is a timed (1 min) lock on a blob
• Acquire lease: create a lease for a blob without one
• Renew: request to hold the existing lease
• Release
• Break: end the lease, but ensure that another instance cannot acquire it
until the current lease has expired
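The four lease operations can be sketched with a simulated clock. `SimBlobLease` is a hypothetical in-memory model, not the Azure API; the exact renewal rules are an assumption made for illustration (renewal restarts the one-minute timer and is refused after a break).

```python
import uuid

LEASE_SECONDS = 60  # the one-minute duration stated above


class SimBlobLease:
    """In-memory model of a lease on one blob (not the Azure SDK)."""

    def __init__(self):
        self.now = 0  # simulated clock, in seconds
        self._lease_id = None
        self._expires_at = 0
        self._broken = False

    def _held(self):
        return self._lease_id is not None and self.now < self._expires_at

    def acquire(self):
        if self._held():
            raise RuntimeError("blob is already leased")
        self._lease_id = str(uuid.uuid4())
        self._expires_at = self.now + LEASE_SECONDS
        self._broken = False
        return self._lease_id

    def renew(self, lease_id):
        if lease_id != self._lease_id or self._broken:
            raise RuntimeError("lease cannot be renewed")
        self._expires_at = self.now + LEASE_SECONDS

    def release(self, lease_id):
        if lease_id != self._lease_id:
            raise RuntimeError("not the lease holder")
        self._lease_id = None  # available to others immediately

    def break_lease(self):
        # The lease keeps blocking others until it expires, but cannot be renewed.
        self._broken = True


lease = SimBlobLease()
first = lease.acquire()   # held until t=60
lease.now = 50
lease.renew(first)        # now held until t=110
lease.break_lease()
lease.now = 100
try:
    lease.acquire()
    blocked_after_break = False
except RuntimeError:
    blocked_after_break = True
lease.now = 110
second = lease.acquire()  # the broken lease has expired
lease.release(second)
third = lease.acquire()   # release frees the blob immediately
```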
21
• Can scale up to billions of entries and terabytes of data
• Contains a set of ‘entities’ (rows) with ‘properties’ (columns)
• (Partition Key, Row Key) defines the primary key
• Partition key is used to partition the table into storage nodes
• Row key uniquely identifies an entity within a partition
22
• No Fixed schema, except for Partition Key, Row Key, and
Timestamp
• Properties are stored as <name, typed value>
• Two entities can have very different properties
• Common data types – int, string, guid, timestamp etc. –
supported.
• Limits on the size of an entity (1 MB) and the number of properties (255,
including keys and timestamp)
23
• Queries:
• Always return whole entities, no projections
• Only ‘from’, ‘take’ (max 1000), and ‘where’ operators supported: no select,
sort, group-by, join, etc.
• Normal Boolean and comparison operators supported.
• For good performance, ‘where’ should have the partition key
• Insert / Delete
• Update: Replaces the original entity
• Merge: modifies properties in place
24
• ACID guaranteed for transactions involving a single entity.
• Group Transactions have restrictions, such as:
• Only possible for entities in the same partition
• Entity needs to be identified by primary key
• Max 100 operations per ‘batch’
• Snapshot isolation: there will be no dirty reads
• Application needs to ensure cross-table consistency
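The group-transaction restrictions above lend themselves to a client-side check before a batch is submitted. The following is a sketch only; `batch_problems` and the key values are hypothetical, and it covers just the three restrictions listed on this slide.

```python
MAX_BATCH_OPERATIONS = 100  # limit stated above


def batch_problems(operations):
    """Return the reasons why `operations` cannot form one group transaction.

    Each operation is a (partition_key, row_key) tuple. An empty list means
    the batch satisfies the restrictions listed above.
    """
    problems = []
    if len(operations) > MAX_BATCH_OPERATIONS:
        problems.append(
            f"{len(operations)} operations exceeds the maximum of {MAX_BATCH_OPERATIONS}"
        )
    if any(pk is None or rk is None for pk, rk in operations):
        problems.append("every entity must be identified by its full primary key")
    if len({pk for pk, _ in operations}) > 1:
        problems.append("all entities must share one partition key")
    return problems


# "orders-2010" and the row names are made-up example keys.
good_batch = [("orders-2010", f"row{i}") for i in range(100)]
too_big = good_batch + [("orders-2010", "row100")]
mixed = [("orders-2010", "row1"), ("orders-2011", "row1")]

print(batch_problems(good_batch))
print(batch_problems(too_big))
print(batch_problems(mixed))
```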
25
• A partition (i.e. all entities with the same partition key) is
served by the same ‘node’
• ‘node’ here should not be thought of as a single server, but a single
‘place’.
• Entity locality: Entities within the same partition are stored together
• Tradeoffs in choosing the partition key:
• large partitions: efficient group queries
• small partitions: spread across more nodes => greater scalability
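To make the partition-key tradeoff concrete, here is a small in-memory sketch. `SimTable` is a hypothetical model that simply groups entities by partition key; it does not reflect how Azure actually places partitions on storage nodes. The customer and order data are invented.

```python
from collections import defaultdict


class SimTable:
    """In-memory model of a table grouped by partition key (not the Azure SDK)."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # partition_key -> {row_key: entity}

    def insert(self, entity):
        pk, rk = entity["PartitionKey"], entity["RowKey"]
        if rk in self._partitions[pk]:
            raise KeyError("an entity with this primary key already exists")
        self._partitions[pk][rk] = entity

    def query(self, predicate, partition_key=None):
        """Return (matching entities, number of partitions scanned)."""
        if partition_key is not None:
            scanned = [self._partitions.get(partition_key, {})]
        else:
            scanned = list(self._partitions.values())
        matches = [e for part in scanned for e in part.values() if predicate(e)]
        return matches, len(scanned)


def is_large_order(entity):
    return entity["Total"] > 50


table = SimTable()
# Entities in the same table may carry different properties (no fixed schema).
table.insert({"PartitionKey": "alice", "RowKey": "order-1", "Total": 30})
table.insert({"PartitionKey": "alice", "RowKey": "order-2", "Total": 75, "Coupon": "SPRING"})
table.insert({"PartitionKey": "bob", "RowKey": "order-1", "Total": 120})
table.insert({"PartitionKey": "carol", "RowKey": "order-1", "Total": 90})

in_partition, scanned_one = table.query(is_large_order, partition_key="alice")
everywhere, scanned_all = table.query(is_large_order)
```

In this model a query that names the partition key touches one partition, while a query without it has to visit every partition, which is the reason the ‘where’ clause should include the partition key.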
26
• Updating an entity is a multi-step process:
• Get the entity from the server
• Update it locally, and submit to server
• Entity can get changed in that time
• Use E-tags (“version numbers”) stored in the header associated
with each entity
• Update only if version number matches with the one you were expecting
• Or use ‘If-Match: *’ to update unconditionally
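The read-modify-write cycle with E-tags can be sketched as follows. `SimEntityStore` is a hypothetical in-memory model, and using integers as E-tags is an assumption made for readability; a real service would return opaque values.

```python
class PreconditionFailed(Exception):
    """Raised when the supplied E-tag no longer matches the stored entity."""


class SimEntityStore:
    """In-memory model of E-tag based optimistic concurrency (not the Azure SDK)."""

    def __init__(self):
        self._rows = {}  # key -> (etag, entity)

    def insert(self, key, entity):
        self._rows[key] = (1, dict(entity))
        return 1

    def get(self, key):
        etag, entity = self._rows[key]
        return etag, dict(entity)  # hand out a copy, as a remote read would

    def update(self, key, entity, if_match):
        current_etag, _ = self._rows[key]
        if if_match != "*" and if_match != current_etag:
            raise PreconditionFailed("entity changed since it was read")
        new_etag = current_etag + 1
        self._rows[key] = (new_etag, dict(entity))
        return new_etag


store = SimEntityStore()
key = ("customers", "alice")  # hypothetical (partition key, row key)
store.insert(key, {"Visits": 1})

# Two clients read the same version.
etag_a, copy_a = store.get(key)
etag_b, copy_b = store.get(key)

copy_a["Visits"] += 1
store.update(key, copy_a, if_match=etag_a)      # succeeds, version moves on

copy_b["Visits"] += 10
try:
    store.update(key, copy_b, if_match=etag_b)  # stale version
    conflict_detected = False
except PreconditionFailed:
    conflict_detected = True

# Retry: re-read, re-apply the change, submit with the fresh E-tag.
etag_b, copy_b = store.get(key)
copy_b["Visits"] += 10
store.update(key, copy_b, if_match=etag_b)
final_etag, final_entity = store.get(key)

# "*" skips the version check entirely.
forced_etag = store.update(key, {"Visits": 0}, if_match="*")
```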
27
• Use for debugging, performance monitoring, traffic analysis etc.
• Based on logging: no remote desktop access to instances
• Choose the required Log sources: Azure, IIS logs, Windows
event logs, Perf counters, Crash dumps (and others)
• Then dump the logs locally or store them in Azure storage (at
scheduled intervals or on-demand)
28
• X Drive
• Mount a page blob as a VHD (per instance)
• SQL Azure
• Complete relational SQL storage in the cloud
• Azure appliance
• A container of pre-configured hardware with Azure installed
• Content Delivery Network
• Mark public blobs to be copied to edge locations across a region
29
• <DEMO>
30