Exchange 2013 architecture
For a given mailbox’s connectivity, the protocol being used is always served by the server that hosts the active database copy.
[Diagram: User, Layer 4 LB, CAS, DAG1 (MBX-A, MBX-B)]

Exchange Online service changed the engineering approach to monitoring
Scale drives automation
Component based monitoring does not tell the story
Bringing the learnings from the service to the enterprise
Monitoring based on the end user’s experience
Protect the user’s experience through recovery oriented computing
Customer Touch Points
“stuff breaks and the Experience does not”
[Diagram: LB, CAS-1, CAS-2, and a DAG of MBX-1, MBX-2 and MBX-3; OWA instances and database copies DB1 and DB2]

1. OWA send
2. OWA failure
3. OWA fast recovery
4. OWA verified as healthy

1. OWA send
2. OWA failure
3. OWA fast recovery
4. Failover server’s databases
5. OWA verified as healthy
6. Server becomes “good” failover target (again)
Exchange 2013 Server
Managed Availability
“take human driven action”
“state of the world”
“restore service or prevent failure”
System Level Checks
1. Mailbox Self Test (e.g. OWA MST) [detection 5m]
2. Protocol Self Test (e.g. OWA PST) [detection 20 secs]
3. Proxy Self Test (e.g. OWA PrST) [detection 20 secs]
End User Experience Level Checks
4. Customer Touch Point – CTP (e.g. OWA CTP) [detection 20m]
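The four check levels and their detection times can be written down as plain data. The following is a minimal Python sketch, not an Exchange API: the names `Check`, `CHECKS` and `checks_by_detection_time` are hypothetical, and only the check names, examples and detection times come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    level: str              # "system" or "end user experience"
    example: str
    detection_seconds: int


# Values copied from the slide; the data structure itself is illustrative only.
CHECKS = [
    Check("Mailbox Self Test", "system", "OWA MST", 5 * 60),
    Check("Protocol Self Test", "system", "OWA PST", 20),
    Check("Proxy Self Test", "system", "OWA PrST", 20),
    Check("Customer Touch Point", "end user experience", "OWA CTP", 20 * 60),
]


def checks_by_detection_time(checks):
    """Return the checks ordered from fastest to slowest detection."""
    return sorted(checks, key=lambda c: c.detection_seconds)


if __name__ == "__main__":
    for c in checks_by_detection_time(CHECKS):
        print(f"{c.name} ({c.example}): {c.detection_seconds}s [{c.level}]")
```

Ordering by detection time makes the layering visible: the system level self tests notice a problem within seconds to minutes, while the Customer Touch Point check is the slowest but closest to what the user sees.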
Sampling: Probe Definition, Probe, Probe Results (Samples)
Detection: Monitor Definition, Monitor, Monitor Results (Alerts)
Recovery: Responder Definition, Responder, Responder Results (Responses)
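The sampling, detection and recovery stages above can be sketched as three small functions that pass results along. This is an illustrative Python model only: `ProbeResult`, `Alert`, `run_probe`, `evaluate_monitor`, `run_responder` and the `max_failures` threshold are hypothetical names and assumptions, not part of Exchange.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeResult:
    """One sample produced by a probe."""
    probe: str
    succeeded: bool


@dataclass
class Alert:
    """One monitor result."""
    monitor: str
    healthy: bool


def run_probe(name: str, check: Callable[[], bool]) -> ProbeResult:
    """Sampling: execute one check and record the sample."""
    try:
        ok = bool(check())
    except Exception:
        ok = False  # a probe that raises counts as a failed sample
    return ProbeResult(name, ok)


def evaluate_monitor(name: str, samples: List[ProbeResult],
                     max_failures: int) -> Alert:
    """Detection: unhealthy once failed samples exceed the assumed threshold."""
    failures = sum(1 for s in samples if not s.succeeded)
    return Alert(name, healthy=failures <= max_failures)


def run_responder(alert: Alert, action: Callable[[], str]) -> List[str]:
    """Recovery: act only on an unhealthy alert and return the responses."""
    return [] if alert.healthy else [action()]
```

For example, three failed samples from a hypothetical OWA probe with `max_failures=2` produce an unhealthy alert, and only then does the responder action run.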
Notification Item

Sequenced HA Responder Pipeline Example
Monitor States (Named Times): Healthy, T1, T2, T3
T1 (00:00:00): Restart Responder, Reset AppPool Responder
T2 (00:00:10): Failover responder, Bugcheck responder, Offline Responder
T3 (00:00:30): Escalate Responder
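The named times above suggest a monitor that moves through states as a failure persists, with different responders attached to each state. The Python sketch below models that idea under stated assumptions: the grouping of responders under T1, T2 and T3 is inferred from the slide layout, and `PIPELINE`, `monitor_state` and `eligible_responders` are hypothetical names.

```python
from datetime import timedelta

# Named times and responder names are from the slide; which responders belong
# to which named time is an assumption based on the slide layout.
PIPELINE = [
    ("T1", timedelta(seconds=0), ["Restart Responder", "Reset AppPool Responder"]),
    ("T2", timedelta(seconds=10),
     ["Failover responder", "Bugcheck responder", "Offline Responder"]),
    ("T3", timedelta(seconds=30), ["Escalate Responder"]),
]


def monitor_state(unhealthy_for):
    """Return the latest named state reached, or "Healthy" if nothing has failed.

    `unhealthy_for` is how long the monitor has been unhealthy (a timedelta),
    or None while the monitor is healthy.
    """
    if unhealthy_for is None:
        return "Healthy"
    state = "Healthy"
    for name, start, _responders in PIPELINE:
        if unhealthy_for >= start:
            state = name
    return state


def eligible_responders(unhealthy_for):
    """Return the responders attached to the current monitor state."""
    state = monitor_state(unhealthy_for)
    for name, _start, responders in PIPELINE:
        if name == state:
            return responders
    return []
```

The point of the sequence is escalation: cheap, local recovery is tried first, broader actions follow only if the failure persists, and a human is involved last.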
                            Per Server                                  Per Group
RecoveryAction    Enabled   Minutes Between  Max Allowed  Max Allowed   Minutes Between  Max Allowed
                            Actions          Per Hour     Per Day       Actions          Per Day
ForceReboot       True      720              N/A          1             600              4
SystemFailover    True      60               N/A          1             60               4
RestartService    True      60               N/A          1             60               4
ResetIISPool      True      60               N/A          1             60               4
DatabaseFailover  True      120              N/A          1             120              4
ComponentOffline  True      60               N/A          1             60               4
ComponentOnline   True      5                12           288           5                Large
MoveClusterGroup  True      240              N/A          1             480              3
ResumeCatalog     True      5                4            8             5                12
WatsonDump        True      480              N/A          1             720              4
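To illustrate how per-server limits like these could gate a recovery action, here is a minimal Python sketch. It is not how Exchange implements throttling: `ServerThrottle`, `THROTTLES` and `is_allowed` are hypothetical, the sliding 60-minute and 24-hour windows are an assumption, and only the numeric limits are taken from the values listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ServerThrottle:
    minutes_between_actions: int
    max_per_hour: Optional[int]   # None where the table says N/A
    max_per_day: int


# Per-server values copied from the table; a subset is enough for the sketch.
THROTTLES = {
    "ForceReboot": ServerThrottle(720, None, 1),
    "RestartService": ServerThrottle(60, None, 1),
    "ComponentOnline": ServerThrottle(5, 12, 288),
    "ResumeCatalog": ServerThrottle(5, 4, 8),
}


def is_allowed(action: str, now_min: float, past_min: List[float]) -> bool:
    """Return True if `action` may run at `now_min`.

    `past_min` holds the times (in minutes) of earlier runs on this server.
    Sliding windows are an assumption made for this sketch.
    """
    t = THROTTLES[action]
    # Minimum spacing since the most recent run.
    if past_min and now_min - max(past_min) < t.minutes_between_actions:
        return False
    # Hourly cap, when one is defined.
    last_hour = [p for p in past_min if now_min - p < 60]
    if t.max_per_hour is not None and len(last_hour) >= t.max_per_hour:
        return False
    # Daily cap.
    last_day = [p for p in past_min if now_min - p < 24 * 60]
    return len(last_day) < t.max_per_day
```

With these numbers, a second `RestartService` on the same server is refused for the rest of the 24-hour window even after the 60-minute spacing has passed, because the daily cap is 1.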
USER
CTP Health Set: OWA
SYSTEM
Proxy Health Set: OWA.Proxy
Protocol Health Set: OWA.Protocol
*See slide 13 to view monitoring layer details
See Appendix for property name definitions
The Bottom Line
Managed Availability + Retries… “stuff breaks and the Experience does not”
[Diagram: NLB, CAS-1, CAS-2, and a DAG of MBX-1, MBX-2 and MBX-3; OWA instances and database copies DB1 and DB2]

1. OWA send
2. OWA failure
3. OWA failure detected
4. OWA restart App pool
5. OWA restart complete
6. OWA verified as healthy

1. OWA send
2. OWA failure
3. OWA failure detected
4. OWA restart App pool
5. OWA restart failed
6. Failover server’s databases
7. OWA service restarts
8. OWA verified as healthy
9. Server becomes “good” failover target (again)
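The two sequences above combine a retry on the client side with recovery actions on the server side. The Python sketch below imitates that flow in miniature. It is an illustration under assumptions, not Exchange behavior: `send_with_retries` and the callables passed to it are hypothetical, and the log strings only loosely mirror the slide's steps.

```python
from typing import Callable, List


def send_with_retries(send: Callable[[], bool],
                      recoveries: List[Callable[[], bool]],
                      log: List[str]) -> bool:
    """Try `send`; on failure run recovery actions in order.

    After each recovery action that reports success, the send is retried.
    Returns True as soon as a send succeeds, False if every option is used up.
    """
    log.append("send")
    if send():
        return True
    log.append("failure detected")
    for recover in recoveries:
        name = getattr(recover, "__name__", "recovery")
        if recover():
            log.append(f"{name} complete")
            log.append("retry")
            if send():
                log.append("verified as healthy")
                return True
        else:
            log.append(f"{name} failed")
    return False
```

Passing a hypothetical `restart_app_pool` that fails followed by a `failover_databases` that succeeds reproduces the shape of the second sequence: the first recovery does not help, the second does, and the retried send goes through.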
Bringing the learnings from the service to the enterprise
Monitoring based on the end user’s experience
Protect the user’s experience through recovery oriented computing
@MSFTExchange
Join the conversation, use #IamMEC
www.iammec.com
http://fasttrack.office.com//
http://channel9.msdn.com/Events/TechEd
www.microsoft.com/learning
http://microsoft.com/technet
http://microsoft.com/msdn