AZR302

[Figure: Datacenter. Labels: SQL Server, Word Server, Exchange Online, SQL Azure.]

[Figure: Datacenter network, two designs side by side.
DLA Architecture (Old): DC Router, Access Routers, Aggregation + LB (AGG, LB), TOR, 20 Racks, Digi, 40 Nodes, APC.
Quantum10 Architecture (New): DC Routers (DCR), BL, Spine, TOR, Digi, 40 Nodes, APC.]

[Figure: Node provisioning. Labels: Image Repository (Maintenance OS, Parent OS, Role Images), FC Host Agent, Fabric Controller, Windows Azure OS, Node, Windows Azure Hypervisor, Windows Deployment Server, PXE Server, US-North Central Datacenter.]

$A(h, g) = C(h, g) / \sum_{g' \in G} C(h, g')$
$X(g) = \min_{h \in H} A(h, g)$

Service model example (www.mycloudapp.net, behind a Load Balancer):
Role B: Worker Role, Count: 2, Update Domains: 2, Size: Medium

[Figure: Physical Node. Four Guest Partitions, each with a Role Instance and a Guest Agent. A trust boundary separates them from the Host Partition, which runs the FC Host Agent. Outside the node: Fabric Controller (Primary), Fabric Controller (Replica) …, Image Repository (OS VHDs, role ZIP files).]

[Figure: Role Virtual Machine. Labels: C:\, Resource Disk, Dynamic VHD, Windows VHD, Role VHD, OS Volume, Resource Volume, Role Volume, Guest Agent, Role Host, Role Entry Point.]

[Figure: Virtual disk I/O path. Labels: Virtual Disk Driver, Local RAM Cache, Local On-Disk Cache, Disk Blob.]

[Figure: Role Virtual Machine disks. C:\ OS Disk (RAM Cache, Local Disk Cache); D:\ Resource Disk (Dynamic VHD); E:\, F:\, etc.]
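The two formulas earlier in this transcript, A(h, g) and X(g), can be worked through numerically. The slides do not define the symbols, so the sketch below makes no claim about what h, g, or C stand for. It only assumes that C(h, g) is a non-negative number stored for every pair, that the dictionary covers every g in G for each h, and that the names `share` and `min_share` and the sample values are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Pair = Tuple[Hashable, Hashable]


def share(c: Dict[Pair, float], h: Hashable, g: Hashable) -> float:
    """A(h, g) = C(h, g) / sum of C(h, g') over all g' in G."""
    # Assumption: every g' in G has an entry for this h in the dictionary.
    total = sum(value for (hh, _), value in c.items() if hh == h)
    if total == 0:
        raise ValueError(f"C(h, g') sums to zero for h={h!r}")
    return c.get((h, g), 0.0) / total


def min_share(c: Dict[Pair, float], g: Hashable) -> float:
    """X(g) = minimum over all h in H of A(h, g)."""
    all_h = {hh for (hh, _) in c}
    return min(share(c, h, g) for h in all_h)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
C = {
    ("h1", "g1"): 3.0, ("h1", "g2"): 1.0,
    ("h2", "g1"): 1.0, ("h2", "g2"): 1.0,
}

if __name__ == "__main__":
    for g in ("g1", "g2"):
        print(g, min_share(C, g))
```

With these sample values, A(h1, g1) = 3/4 and A(h2, g1) = 1/2, so X(g1) is the smaller of the two.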
Data Disks (label continuing the disk layout figure)

[Figure: Update domains (max is 20). FrontEnd-1, FrontEnd-2, Middle Tier-1, Middle Tier-2 and Middle Tier-3 spread across Update Domain 1, Update Domain 2 and Update Domain 3.]

Problem | Fabric Detection | Fabric Response
Role instance crashes | FC guest agent monitors role termination | FC restarts role
Guest VM or agent crashes | FC host agent notices missing guest agent heartbeats | FC restarts VM and hosted role
Host OS or agent crashes | FC notices missing host agent heartbeat | Tries to recover node; FC reallocates roles to other nodes
Detected node hardware issue | Host agent informs FC | FC migrates roles to other nodes; marks node "out for repair"

Health monitoring intervals and timeouts:
Guest Agent Connect Timeout: 25 min
Guest Agent Heartbeat: 5s
Guest Agent Heartbeat Timeout: 10 min
Role Instance Launch: Indefinite
Load Balancer Heartbeat: 15s
Load Balancer Heartbeat Timeout: 30s
Role Instance Heartbeat: 15s
Role Instance "Unresponsive" Timeout: 30s
Role Instance Start to Role Instance Ready (for updates only): 15 min

[Figure: Update domain walk-through. FrontEnd-1, FrontEnd-2, Middle Tier-1, Middle Tier-2, Middle Tier-3.]

Service model example (www.mycloudapp.net, behind a Load Balancer):
Role B: Worker Role, Count: 2, Update Domains: 2, Size: Medium

Allocation algorithm: prefer nodes hosting the same UD as the role instance's UD.
[Figure: Allocation 1 and Allocation 2, each showing Service B Role A-1 (UD 2) and Service B Role B-2 (UD 2).]

expiredate.year = currentdate.year + 1;

Date | Start | End | Primary | Secondary | Backup1 | Backup2
Friday, January 13 2012 | 11:00 AM | 10:59 AM | densamo | gagupta | padou | anue
Saturday, January 14 2012 | 11:00 AM | 10:59 AM | jimjohn | mkeating | chuckl | padou
Sunday, January 15 2012 | 11:00 AM | 10:59 AM | anilingl | absingh | chuckl | padou
Monday, January 16 2012 | 11:00 AM | 10:59 AM | sushantr | lisd | saadsyed | sushantr
Tuesday, January 17 2012 | 11:00 AM | 10:59 AM | coreysa | ppatwa | ksingh | ritwikt
Wednesday, January 18 2012 | 11:00 AM | 10:59 AM | wakkasr | soupal | ritwikt | padou
Thursday, January 19 2012 | 11:00 AM | 10:59 AM | roylin | mkeating | anue | padou

Event | Date (PST) | Response and Recovery Timeline
Initiating Event | 2/28/2012 16:00 | Leap year bug begins
Detection | 2/28 17:15 | 3 x 25 min retry for first batch hit; nodes start going to HI (cascading failure)
Phase 1 | 2/28 16:00 to 2/29 05:23 | New deployments fail initially and are then marked offline globally to protect clusters
Phase 2 | 2/29 02:57 to 2/29 23:00 | Service management offline for 7 clusters (staggered recovery)

[Figure: Host OS with Host Agent and Hypervisor; Application VM with Guest Agent; Public Key and Private Key.]

[Figure: Host OS, Host Agent and Hypervisor hosting several App VMs, each with a Guest Agent. Customer 1, Customer 2, Customer 3.]

[Figure: Staged update rollout across nodes. Each node shows its OS, Networking Plugin and Host Agent (HA v1 or HA v2) at version 118 or 119, hosting VM 1 through VM 5 with Guest Agents (GA v1 or GA v2).]
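The statement `expiredate.year = currentdate.year + 1;` from the slides only bumps the year field, which produces a date that does not exist when the current date is February 29. The timeline's start of 16:00 PST on 2/28/2012 corresponds to 00:00 UTC on 2/29/2012. The sketch below reproduces the failure and shows one possible fix. It is an illustration in Python, not the code that ran in production; `naive_expiry` and `safe_expiry` are hypothetical names, and clamping to February 28 is an assumed policy (rolling forward to March 1 would be another reasonable choice).

```python
import calendar
from datetime import date


def naive_expiry(start: date) -> date:
    """Mimic `expiredate.year = currentdate.year + 1`: change only the year."""
    # date.replace raises ValueError if the result is not a real calendar day.
    return start.replace(year=start.year + 1)


def safe_expiry(start: date) -> date:
    """Add one year, clamping the day to the last valid day of the target month."""
    target_year = start.year + 1
    last_day = calendar.monthrange(target_year, start.month)[1]
    return date(target_year, start.month, min(start.day, last_day))


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_expiry(leap_day)
    except ValueError as err:
        print("naive year bump failed:", err)
    print("safe expiry:", safe_expiry(leap_day))
```

On any day other than February 29 the two functions agree, which is why a bug of this kind can stay hidden for years.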
@WindowsAzure
@teched_europe
Hands-On Labs
DOWNLOAD Windows Azure
Meetwindowsazure.com
Windowsazure.com/teched
http://europe.msteched.com
www.microsoft.com/learning
http://microsoft.com/technet
http://microsoft.com/msdn
http://europe.msteched.com/sessions