Exchange 2013 architecture

For a given mailbox’s connectivity, the protocol being used is always served by the server that hosts the active database copy.

[Diagram: User, Layer 4 LB, CAS, DAG1 with MBX-A and MBX-B]

Exchange Online service changed the engineering approach to monitoring
• Scale drives automation
• Component based monitoring does not tell the story

Bringing the learnings from the service to the enterprise
• Monitoring based on the end user’s experience
• Protect the user’s experience through recovery oriented computing

Customer Touch Points
“stuff breaks and the Experience does not”

[Diagram: LB, CAS-1, CAS-2, and a DAG of MBX-1, MBX-2, and MBX-3, each running OWA, with copies of DB1 and DB2]

1. OWA send
2. OWA failure
3. OWA fast recovery
4. OWA verified as healthy

1. OWA send
2. OWA failure
3. OWA fast recovery
4. Failover server’s databases
5. OWA verified as healthy
6. Server becomes “good” failover target (again)

Exchange 2013 Server: Managed Availability
• “take human driven action”
• “state of the world”
• “restore service or prevent failure”

System Level Checks
1. Mailbox Self Test (e.g. OWA MST) [detection 5m]
2. Protocol Self Test (e.g. OWA PST) [detection 20 secs]
3. Proxy Self Test (e.g. OWA PrST) [detection 20 secs]

End User Experience Level Checks
4. Customer Touch Point (CTP) (e.g. OWA CTP) [detection 20m]

Sampling: Probe Definition, Probe, Probe Results (Samples)
Detection: Monitor Definition, Monitor, Monitor Results (Alerts)
Recovery: Responder Definition, Responder, Responder Results (Responses)
Other labels: Monitor States, Healthy, Notification Item

Sequenced HA Responder Pipeline Example
Responders, in order: Restart Responder, Reset AppPool Responder, Failover responder, Bugcheck responder, Offline Responder, Escalate Responder
Named Times: T1 = 00:00:00, T2 = 00:00:10, T3 = 00:00:30

                           Per Server                                 Per Group
RecoveryAction    Enabled  Minutes Between  Max Allowed  Max Allowed  Minutes Between  Max Allowed
                           Actions          Per Hour     Per Day      Actions          Per Day
ForceReboot       True     720              N/A          1            600              4
SystemFailover    True     60               N/A          1            60               4
RestartService    True     60               N/A          1            60               4
ResetIISPool      True     60               N/A          1            60               4
DatabaseFailover  True     120              N/A          1            120              4
ComponentOffline  True     60               N/A          1            60               4
ComponentOnline   True     5                12           288          5                Large
MoveClusterGroup  True     240              N/A          1            480              3
ResumeCatalog     True     5                4            8            5                12
WatsonDump        True     480              N/A          1            720              4

USER
• CTP Health Set: OWA
SYSTEM
• Proxy Health Set: OWA.Proxy
• Protocol Health Set: OWA.Protocol

*See slide 13 to view monitoring layer details
See Appendix for property name definitions

The Bottom Line
Managed Availability + Retries…“stuff breaks and the Experience does not”

[Diagram: NLB, CAS-1, CAS-2, and a DAG of MBX-1, MBX-2, and MBX-3, each running OWA, with copies of DB1 and DB2]

1. OWA send
2. OWA failure
3. OWA failure detected
4. OWA restart App pool
5. OWA restart complete
6. OWA verified as healthy

1. OWA send
2. OWA failure
3. OWA failure detected
4. OWA restart App pool
5. OWA restart failed
6. Failover server’s databases
7. OWA service restarts
8. OWA verified as healthy
9. Server becomes “good” failover target (again)

Bringing the learnings from the service to the enterprise
• Monitoring based on the end user’s experience
• Protect the user’s experience through recovery oriented computing

@MSFTExchange
Join the conversation, use #IamMEC
www.iammec.com
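The per-server recovery action limits listed earlier can be read as two checks: a minimum gap between actions and a cap on actions per day. The sketch below is one way to model that reading, not how Managed Availability implements throttling. The class name, method names, and the rolling 24-hour window are assumptions made for illustration. Only the limits for RestartService (60 minutes between actions, 1 allowed per day) come from the slides. The per-hour and per-group limits are left out to keep the example short.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ServerThrottle:
    """Hypothetical per-server throttle for a single recovery action."""
    minutes_between_actions: int
    max_allowed_per_day: int
    history: list = field(default_factory=list)  # times of past actions

    def is_allowed(self, now: datetime) -> bool:
        # Assumption: "per day" is treated as a rolling 24-hour window.
        day_ago = now - timedelta(days=1)
        recent = [t for t in self.history if t > day_ago]
        if len(recent) >= self.max_allowed_per_day:
            return False  # daily cap reached
        gap = timedelta(minutes=self.minutes_between_actions)
        if recent and now - max(recent) < gap:
            return False  # too soon after the previous action
        return True

    def record(self, now: datetime) -> None:
        self.history.append(now)


# Limits from the RestartService row (per-server columns).
restart_service = ServerThrottle(minutes_between_actions=60, max_allowed_per_day=1)

start = datetime(2014, 1, 1, 9, 0)  # arbitrary timestamp for the example
first = restart_service.is_allowed(start)
restart_service.record(start)
second = restart_service.is_allowed(start + timedelta(minutes=90))
next_day = restart_service.is_allowed(start + timedelta(hours=25))
```

Under these assumptions the first restart is allowed, a second attempt 90 minutes later is refused because the daily cap of 1 is already used, and an attempt 25 hours later is allowed again. When a throttle refuses an action, the next responder in a sequenced pipeline (for example a failover, then an escalation) would be the natural fallback.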