[Figure: Windows Azure is managed through Visual Studio, the Windows Azure Portal, and the REST APIs / PowerShell (PS) cmdlets. In the US-North Central region, racks of servers, each with a top-of-rack (TOR) switch and power distribution units (PDUs), are managed by Fabric Controllers (FC).]



[Figure: Datacenter network. Datacenter routers connect to aggregation routers and load balancers; cluster network aggregation (Agg) switches connect Clusters 1 to 5; each cluster consists of racks of servers, each rack with a top-of-rack (TOR) switch and power distribution units (PDUs).]
[Figure: A cluster. The cluster aggregation switch (AGG) connects Racks 1 through 20; each rack contains servers, a TOR switch, and PDUs.]
Inside a Physical Server

[Figure: A physical server attached to a TOR switch and a PDU. Separated from the host partition by a trust boundary, the server runs guest VMs (PaaS VM role instances and an IaaS VM role), each assigned its own CPUs; the remaining CPUs are unallocated.]
• The Fabric Controller (FC) deploys the role instances in at least two different fault domains.
• Different roles are allocated to fault domains independently.
• An even distribution is maintained when scaling up or down.
• There is no way to control the fault domain mapping, but it can be queried for each role instance through:
  • the Portal
  • the REST service management APIs (“FaultDomain”)
• Queuing can be defined between the layers (only load balancing by default).

[Figure: The Azure Load Balancer in front of a Web Role and a Worker Role. Web Role instances 0 and 2 are in Fault Domain 0 and instance 1 is in Fault Domain 1; Worker Role instance 0 is in Fault Domain 0 and instance 1 is in Fault Domain 1. The figure shows two racks (each with a TOR switch and a PDU) under an aggregation switch (AGG).]
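The slide states that the FC spreads each role's instances evenly over at least two fault domains and places each role independently. The sketch below assumes a simple round-robin rule (instance i goes to fault domain i mod 2), which reproduces the placement in the figure; the function names are hypothetical and this is not the FC's actual allocation algorithm.

```python
from collections import Counter
from typing import List


def assign_fault_domains(instance_count: int, fd_count: int = 2) -> List[int]:
    """Hypothetical round-robin placement: instance i goes to fault domain i mod fd_count."""
    if fd_count < 2:
        raise ValueError("instances must be spread over at least two fault domains")
    return [i % fd_count for i in range(instance_count)]


def is_even(assignment: List[int], fd_count: int = 2) -> bool:
    """True if the fault domain sizes differ by at most one instance."""
    counts = Counter(assignment)  # Counter returns 0 for fault domains with no instances
    sizes = [counts[fd] for fd in range(fd_count)]
    return max(sizes) - min(sizes) <= 1


# Each role is placed independently, so both roles start at fault domain 0.
web_role = assign_fault_domains(3)
worker_role = assign_fault_domains(2)
print(web_role, worker_role)  # [0, 1, 0] [0, 1]
```

Under this assumed rule the placement for n instances is a prefix of the placement for n + 1, so adding or removing the last instance keeps the distribution even, matching the "scaling up or down" bullet.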
• Update domains (UDs) control how the service is updated.
• Only one UD of a role is updated at a time.
• Scenarios:
  • User initiated: the PaaS service owner updates the service package or chooses a different Guest OS.
  • Platform initiated: the platform updates the Guest OS of PaaS services when a new version is released (e.g. security fixes), or updates the server (hypervisor).
• Implementation details:
  • Role instances are assigned to different UDs in circular (round-robin) order.
  • UDs are aligned across the different roles of the service.
  • Up to 20 UDs per service (5 by default).

[Figure: The Azure Load Balancer in front of a Web Role and a Worker Role. Web Role instances 0, 1 and 2 are in Update Domains 0, 1 and 2; Worker Role instances 0 and 1 are in Update Domains 0 and 1.]
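The circular assignment and the UD-by-UD walk can be sketched as follows. This is a minimal model assuming that "alignment" means UD n of every role is updated in the same step; the names `assign_update_domains` and `update_walk` are hypothetical, and the 5/20 limits are the ones stated on the slide.

```python
from typing import Dict, Iterator, List, Tuple

DEFAULT_UD_COUNT = 5  # default from the slide
MAX_UD_COUNT = 20     # per-service upper limit from the slide


def assign_update_domains(instance_count: int, ud_count: int = DEFAULT_UD_COUNT) -> List[int]:
    """Circular assignment: instance i goes to UD i mod ud_count."""
    if not 1 <= ud_count <= MAX_UD_COUNT:
        raise ValueError(f"ud_count must be between 1 and {MAX_UD_COUNT}")
    return [i % ud_count for i in range(instance_count)]


def update_walk(
    roles: Dict[str, int], ud_count: int = DEFAULT_UD_COUNT
) -> Iterator[Tuple[int, List[Tuple[str, int]]]]:
    """Yield (ud, [(role, instance), ...]); UD n of every role is updated in the same step."""
    placement = {role: assign_update_domains(n, ud_count) for role, n in roles.items()}
    for ud in range(ud_count):
        batch = [
            (role, i)
            for role, uds in placement.items()
            for i, u in enumerate(uds)
            if u == ud
        ]
        if batch:  # skip UDs that hold no instances
            yield ud, batch


for ud, batch in update_walk({"WebRole": 3, "WorkerRole": 2}):
    print(ud, batch)
```

For the roles in the figure this prints three steps: UD 0 (both instance 0s), UD 1 (both instance 1s), and UD 2 (Web Role instance 2 only), so at most one instance per role is down at any time.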

Web Role
FD0
UD0
IN_0
UD1
FD1
IN_1
UD2
IN_2










Worker Role
FD0
UD0
IN_0
UD1
FD1
IN_1
mode

• Walk upgrade domain
• Rollback update
• Swapping
• Change deployment configuration
• Delete role instances
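To illustrate how "walk upgrade domain" and "rollback update" relate, here is a hypothetical sketch of a customer-controlled walk: one UD is updated at a time, a health check (an assumption, supplied by the caller) decides whether to continue, and on failure every UD touched so far is rolled back, most recent first. The callbacks and the function name are illustrative only, not an Azure API.

```python
from typing import Callable, List


def manual_walk_with_rollback(
    ud_count: int,
    update_ud: Callable[[int], None],
    is_healthy: Callable[[int], bool],
    rollback_ud: Callable[[int], None],
) -> List[int]:
    """Walk UDs one at a time; if a UD fails its health check, roll back all touched UDs.

    Returns the UDs left on the new version: all of them on success, none after a rollback.
    """
    touched: List[int] = []
    for ud in range(ud_count):
        update_ud(ud)  # only this UD is taken down and updated
        touched.append(ud)
        if not is_healthy(ud):
            for done in reversed(touched):  # undo the most recently updated UD first
                rollback_ud(done)
            return []
    return touched


log = []
result = manual_walk_with_rollback(
    5,
    update_ud=lambda ud: log.append(("update", ud)),
    is_healthy=lambda ud: ud != 2,  # hypothetical: UD 2 fails its health check
    rollback_ud=lambda ud: log.append(("rollback", ud)),
)
print(result, log)
```

Because only one UD is touched per step, a bad update is caught after affecting a single UD's share of the instances, and the rollback undoes only the UDs that were actually updated.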
Running Highly Available Cloud Virtual Machines
• A sample application demonstrates Windows Azure usage (an application migrated from customer premises).
• Sample application specifics:
  • High redundancy for each component
  • A load balancer for the front end
  • The data layer can be implemented with SQL Server or SQL Azure (as here); alternatively, it can use Windows Azure storage
• Set up the whole application in the same affinity group to gain physical proximity.

[Figure: The Azure Load Balancer in front of a Frontend availability set of Front End VMs; queueing or load balancing between the front end and a Backend availability set of Backend VMs; geo-distributed storage or SQL Azure as the data tier.]
• Availability sets tell the platform how to allocate VMs in the datacenters, to isolate the impact of hardware faults and infrastructure updates.
• Availability sets are defined through the portal or the REST APIs.
• An availability set has to be defined for each redundant application tier to achieve the 99.95% SLA.
  • We do not offer an SLA unless at least two VM instances are defined and used in each availability set.
  • The application SLA is compositional: it is the product of the SLAs of its components (each tier, compute, networking, etc.).
• There is no correspondence between the fault domains used in different availability sets.
  • For example, a Front End failure may cause unavailability of the entire service.
  • Thus, queuing or load balancing is added between the availability sets.

[Figure: The Azure Load Balancer in front of a Frontend availability set whose Front End VMs are in Fault Domains 1, 2 and 1; queueing or load balancing between the tiers; a Backend availability set whose Backend VMs are in Fault Domains 2 and 1; geo-distributed storage or SQL Azure as the data tier.]
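The compositional SLA can be worked through numerically. This minimal sketch assumes the tiers fail independently and that every tier must be up for the application to be up, so their availabilities multiply; `composite_sla` is a hypothetical helper, and 99.95% is the per-availability-set figure from the slide.

```python
import math


def composite_sla(*component_slas: float) -> float:
    """Availability of components that must all be up at once.

    Assumes independent failures, so the individual availabilities multiply.
    """
    for sla in component_slas:
        if not 0.0 <= sla <= 1.0:
            raise ValueError(f"SLA must be a fraction in [0, 1], got {sla}")
    return math.prod(component_slas)


frontend_tier = 0.9995  # 99.95% availability-set SLA from the slide
backend_tier = 0.9995
print(f"{composite_sla(frontend_tier, backend_tier):.8f}")  # 0.99900025
```

Two 99.95% tiers in series give roughly 99.90%, lower than either tier alone; every additional required component (networking, the data tier) lowers the composite figure further.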
• Scenario: platform-initiated update of the servers that run the IaaS VM instances.
• Goal: high redundancy for the IaaS service.
• Each role (VM instance) is allocated to a different update domain (up to 5).
  • When physical servers are updated, only a fraction of the capacity (one update domain, or less) is touched at a time.
  • There is no mapping between update domains in different availability sets.
• Updating the IaaS service itself is the customer's responsibility.
  • In some cases a customer VM update and an infrastructure update can happen at the same time.
  • IaaS update notifications are sent to avoid this.
• Hardware failures can occur at any time. Thus, a platform update combined with a hardware failure could still cause a service outage for availability sets with only two VMs.

[Figure: The Azure Load Balancer in front of a Frontend availability set whose Front End VMs are in Update Domains 1, 0 and 2; queueing or load balancing between the tiers; a Backend availability set whose Backend VMs are in Update Domains 0 and 1; geo-distributed storage or SQL Azure as the data tier.]
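The last bullet, update plus hardware failure, can be checked with a small model. In this sketch each VM has a fault domain and an update domain; a platform update takes down one UD while a hardware fault takes down one FD. The `Vm` class, VM names, and placements are hypothetical and chosen only to illustrate the two-VM case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Vm:
    name: str
    fault_domain: int
    update_domain: int


def surviving_vms(
    vms: List[Vm], updating_ud: Optional[int] = None, failed_fd: Optional[int] = None
) -> List[Vm]:
    """VMs still running while one UD is being updated and one FD has failed."""
    return [
        vm for vm in vms
        if vm.update_domain != updating_ud and vm.fault_domain != failed_fd
    ]


# Hypothetical availability sets; names and placements are illustrative only.
two_vms = [Vm("fe-0", 0, 0), Vm("fe-1", 1, 1)]
three_vms = two_vms + [Vm("fe-2", 0, 2)]

print(len(surviving_vms(two_vms, updating_ud=0)))                 # 1
print(len(surviving_vms(two_vms, updating_ud=0, failed_fd=1)))    # 0 (outage)
print(len(surviving_vms(three_vms, updating_ud=0, failed_fd=1)))  # 1
```

With two VMs, the update and the fault can each remove one VM and leave nothing running; a third VM in another update domain survives the same combination.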














[Figure: operations Capture, Shutdown, Add, Delete.]
Infrastructure Operations Impacting Customer Services



• Symptom: issue with customer code or a customer VM
  • Healing operation: reboot the VM(s)
  • Potential causes: role instance or Guest OS crash (PaaS); customer OS crash (IaaS)
• Symptom: issue with a physical server or rack
  • Healing operation: allocate the impacted customer VMs to different server(s)
  • Potential causes: physical server software failure; physical server hardware failure; rack / PDU / ToR failure

Note: your role instance keeps the same VM and VHDs, preserving cached data in the resource volume.
Aspect: Cloud Services (PaaS) vs. Azure VMs (IaaS)
• Fault domain count: two per role (PaaS); two per availability set (IaaS).
• Update domain count: five by default, up to twenty (PaaS); five (IaaS).
• Platform update: UD by UD (PaaS); UD by UD (IaaS).
• Administrator-initiated update: UD by UD, blast, customer-controlled UD walk, or VIP swap (PaaS); administrator controlled, can be automated using PowerShell or the REST management APIs (IaaS).
• Highly available frontend and backend addressability: Windows Azure provides a load balancer per role, and queuing is recommended for backend roles (PaaS); the administrator defines endpoints in the VMs and maps them to a load-balanced set, and queuing is recommended for backend roles (IaaS).
• SLA: 99.95% uptime for roles with two or more role instances (PaaS); 99.95% uptime for availability sets with two or more VMs (IaaS).
• Multi-service collocation: yes, using affinity groups (PaaS); yes, using affinity groups (IaaS).
• Automated UD/FD management when the service grows or shrinks: yes, except when deleting a specific instance (PaaS); yes when the service grows, no when it shrinks (IaaS).