SESSION CODE: EXL407 Scott Schnoll Principal Technical Writer Microsoft Corporation EXCHANGE SERVER 2010 HIGH AVAILABILITY DEEP DIVE (c) 2011 Microsoft.
Agenda
► Exchange Server 2010 High Availability Deep Dive
  – Database Availability Group Networks
  – Active Manager
  – Best Copy Selection
  – Datacenter Activation Coordination Mode

Exchange Server 2010 High Availability Deep Dive: Database Availability Group Networks

DAG Networks
► A DAG network is a collection of one or more subnets
► There are two types of DAG networks
  – MAPI network: connects DAG members to network resources (Active Directory, other Exchange servers, DNS, etc.)
    • Registered in DNS / DNS configured
    • Uses default gateway
    • Client for Microsoft Networks/File and Print Sharing enabled
  – Replication network: used for/by continuous replication (log shipping and seeding)
    • Not registered in DNS / DNS not configured
    • Typically no default gateway
    • Client for Microsoft Networks/File and Print Sharing disabled
► All DAGs must have:
  – Exactly one MAPI network
  – Zero or more Replication networks
    • Separate network(s) on separate subnet(s)
    • With multiple Replication networks, LRU (least recently used) determines which one is used
► DAG networks are automatically created when a Mailbox server is added to the DAG
  – Based on the cluster's enumeration of networks
    • Cluster enumeration is based on subnet
    • One cluster network is created for each subnet
► Maximum round-trip network latency between all DAG members must be 500 ms or less
  – Regardless of the latency of the solution, customers should validate that the network between all DAG members can satisfy the data protection and availability goals of the deployment
  – You may need to investigate increasing the number of databases or decreasing the number of mailboxes per database to achieve the desired goals

Example: both members on the same MAPI and Replication subnets

Server / Network    IP Address / Subnet Bits   Default Gateway
EX1 – MAPI          192.168.0.15/24            192.168.0.1
EX1 – REPLICATION   10.0.0.15/24               N/A
EX2 – MAPI          192.168.0.16/24            192.168.0.1
EX2 – REPLICATION   10.0.0.16/24               N/A

Name           Subnet(s)        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15), EX2 (192.168.0.16)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15), EX2 (10.0.0.16)         False                 True

Example: each member on its own MAPI and Replication subnets

Server / Network    IP Address / Subnet Bits   Default Gateway
EX1 – MAPI          192.168.0.15/24            192.168.0.1
EX1 – REPLICATION   10.0.0.15/24               N/A
EX2 – MAPI          192.168.1.15/24            192.168.1.1
EX2 – REPLICATION   10.0.1.15/24               N/A

Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True

► Collapse the subnets into two DAG networks and disable replication on the MAPI network:
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04
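The per-subnet enumeration described above can be modeled with Python's standard `ipaddress` module. This is an illustrative sketch, not how the cluster service is implemented: `enumerate_dag_networks` is a hypothetical helper, and numbering networks in order of first appearance is an assumption that happens to reproduce the DAGNetwork01 to DAGNetwork04 names in the second example.

```python
import ipaddress


def enumerate_dag_networks(interfaces):
    """Group (server, "ip/prefix") pairs into one network per subnet.

    Hypothetical model of subnet-based enumeration. Names are assigned as
    DAGNetwork01, DAGNetwork02, ... in order of first appearance (an assumption).
    """
    by_subnet = {}  # dicts preserve insertion order
    for server, address in interfaces:
        iface = ipaddress.ip_interface(address)
        by_subnet.setdefault(iface.network, []).append(f"{server} ({iface.ip})")
    return {
        f"DAGNetwork{i:02d}": (str(subnet), members)
        for i, (subnet, members) in enumerate(by_subnet.items(), start=1)
    }


# Second example: each server has its own MAPI and Replication subnets,
# so four subnets produce four DAG networks.
two_subnet = [
    ("EX1", "192.168.0.15/24"),
    ("EX1", "10.0.0.15/24"),
    ("EX2", "192.168.1.15/24"),
    ("EX2", "10.0.1.15/24"),
]

for name, (subnet, members) in enumerate_dag_networks(two_subnet).items():
    print(name, subnet, ", ".join(members))
# DAGNetwork01 192.168.0.0/24 EX1 (192.168.0.15)
# DAGNetwork02 10.0.0.0/24 EX1 (10.0.0.15)
# DAGNetwork03 192.168.1.0/24 EX2 (192.168.1.15)
# DAGNetwork04 10.0.1.0/24 EX2 (10.0.1.15)
```

Running the same function on the first example (both members on 192.168.0.0/24 and 10.0.0.0/24) yields only two networks, each with two interfaces, which is why no collapsing is needed there.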
After collapsing the subnets, the DAG networks are:

Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24, 10.0.1.0/24         EX1 (10.0.0.15), EX2 (10.0.1.15)         False                 True

► Automatic detection occurs only when members are added to the DAG
  – If networks are added after a member is added, you must perform discovery:
    Set-DatabaseAvailabilityGroup -Identity <DAGName> -DiscoverNetworks
► DAG network configuration is persisted in the cluster registry
  – HKLM\Cluster\Exchange\DAG Network
► DAG networks include built-in encryption and compression
  – Encryption: Kerberos SSP EncryptMessage/DecryptMessage APIs
  – Compression: Microsoft XPRESS, based on the LZ77 algorithm
► Block cross-network communication to minimize heartbeat traffic
  [Figure: Subnet 1 through Subnet 4, with some cross-subnet paths marked Allowed and others marked Blocked]
► If using iSCSI storage, configure the DAG and the cluster to ignore the iSCSI networks
  1. Set-DatabaseAvailabilityGroupNetwork -Identity <DAGNetworkName> -ReplicationEnabled:$false -IgnoreNetwork:$true
  2. cluster network <ClusterNetworkName> /prop Role=0
► When a DAG spans multiple subnets, the DAG needs an IP address on the MAPI network for each subnet
► Use DHCP in site resilience configurations to assign IP addresses to the Replication network
  – Enables delivery of the static routes that are typically required
  – If using static IP addresses, use netsh to configure static routes
► Configure a DNS TTL on service access connection records that is consistent with your SLA, e.g. ~5 minutes for a one-hour RTO SLA

Exchange Server 2010 High Availability Deep Dive: Active Manager

Active Manager
► What are the three Active Manager roles?
  – Standalone
  – PAM (Primary Active Manager)
  – SAM (Standby Active Manager)
► Role state transitions are logged in the Microsoft-Exchange-HighAvailability/Operational event log (Crimson Channel)

Active Manager Functionality
► Mount and Dismount Databases
► Provide Database Availability Information
► Provide Interface for Administrative Tasks
► Monitor for Failures
► Maintain Database and Server State Information

AutoMount on DAG Members
► In a DAG, all AutoMount operations are coordinated through the PAM
► AutoMount operations occur:
  – When the first server in the DAG is initialized
  – When ownership of the PAM role changes
► Checks msExchMasterServerOrAvailabilityGroup to determine all databases hosted on the DAG
► Checks whether the database can be mounted on startup
  – If msExchEDBOffline is TRUE, stop processing
  – If msExchEDBOffline is FALSE, proceed with processing
► Checks persistent database information stored in the cluster registry
► Determines whether the database is mounted on another DAG member
  – If the database is mounted on another server, take no action
  – If the database is not mounted on another server, proceed
► Checks AdminDismount in the cluster registry:
  – If AdminDismount is TRUE, take no action
  – If AdminDismount is FALSE, proceed
► Checks persistent database state information in the cluster registry for the server on which the database was last mounted
  – If that server is available, issue a mount request to the Information Store on that server
  – If the server is not available or the property is not set, issue a mount request to the next server in the sorted list
► If the AutoMount operation succeeds:
  – Update persistent database state information stored in the cluster database
  – Propagate the information to all other DAG members

Mount / Dismount Database Copy
► Mount Database
  – An administrator action invoked through a task
  – The last part of a move operation
► Dismount Database
  – An administrator action invoked through a task
  – The first part of a move operation

Mount Database – DAG Member
► Initiate an RPC to a member of the DAG
  – If the server contacted is not the PAM, the task is referred to the PAM
  – If the server is the PAM, continue with no referral
► Checks msExchMasterServerOrAvailabilityGroup to ensure the database is hosted in the DAG
  – If the database is hosted in the DAG, proceed
  – If the database is not hosted in the DAG, error out
► Checks whether the database is already mounted
  – If already mounted, the task fails
  – If not already mounted, the task continues
► PAM invokes a callback
  – This invokes a pre-check for the database mount operation
  – Persistent database state is updated to show the mount as Initiated
► PAM invokes an RPC call to the Information Store to mount the database
  – If the mount fails, the task fails
  – If the mount succeeds, the task completes successfully
► Persistent database state is updated to record the results of the operation and propagated to the other members

Dismount Database – DAG Member
► Task initiates a call to the PAM or is referred to the PAM
► PAM checks that the msExchMasterServerOrAvailabilityGroup value matches the DAG
► PAM verifies that the database is mounted in the DAG by checking persistent database state information stored in the cluster registry
  – If the database is mounted, the task proceeds
  – If the database is dismounted, the task fails
► PAM updates persistent state information in the cluster database to show the state as Initiated
► PAM makes an RPC call to the Information Store on the DAG member and invokes the dismount
  – If the dismount operation succeeds, persistent database state information stored in the cluster database is updated
  – If the dismount operation fails, the task fails

Auto Dismount – DAG Member
► Occurs when a DAG loses quorum
► All DAG members are running (but may not be participating in the cluster)
► Databases are dismounted as quickly as possible to avoid split brain
  – The Information Store service is terminated
► The dismount operation should attempt to update database state information in the cluster database
► This is the only case where a database operation occurs on a server other than the PAM

Active Manager – Move Database
► Move Database
  – An administrator action invoked by a task
  – An automatic operation initiated by the PAM (failover)
► Begins with a Dismount operation and ends with a Mount operation

Exchange Server 2010 High Availability Deep Dive: Best Copy Selection

Best Copy Selection
► The process of finding the best copy of an individual database to activate, given a list of potential copies for activation and their status
► Active Manager selects the “best” copy to become the new active copy when the existing active copy fails or when an administrator performs a targetless switchover

Best Copy Selection – RTM
► Sorts copies by copy queue length to minimize data loss, using activation preference as a secondary sort key if necessary
► Selects from the sorted list based on which set of criteria each copy meets
► Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy

Best Copy Selection – SP1
► Sorts copies by activation preference when the auto database mount dial is set to Lossless
  – Otherwise, sorts copies by copy queue length, using activation preference as a secondary sort key if necessary
► Selects from the sorted list based on which set of criteria each copy meets
► Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy

Best Copy Selection
► Is the database mountable?
  – Is copy queue length <= AutoDatabaseMountDial?
    • If yes, the database is marked as current active and a mount request is issued
    • If not, the next best database is tried (if one is available)
► During best copy selection, any servers that are unreachable or “activation blocked” are ignored

Best Copy Selection Criteria

Criteria   Copy Queue Length   Replay Queue Length   Content Index Status
1          < 10 logs           < 50 logs             Healthy
2          < 10 logs           < 50 logs             Crawling
3          N/A                 < 50 logs             Healthy
4          N/A                 < 50 logs             Crawling
5          N/A                 < 50 logs             N/A
6          < 10 logs           N/A                   Healthy
7          < 10 logs           N/A                   Crawling
8          N/A                 N/A                   Healthy
9          N/A                 N/A                   Crawling
10         Any database copy with a status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource

Best Copy Selection – RTM
► Four copies of DB1
► DB1 currently active on Server1
  [Diagram: DB1 on Server1 (marked as failed), with copies of DB1 on Server2, Server3, and Server4]

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy

► Sort the list of available copies by copy queue length (using activation preference as a secondary sort key if necessary):
  – Server3\DB1
  – Server2\DB1
  – Server4\DB1
► Only two copies meet the first set of criteria for activation (CQL < 10; RQL < 50; CI = Healthy):
  – Server3\DB1 (lowest copy queue length, tried first)
  – Server2\DB1
  – Server4\DB1 does not meet this set (CQL = 10; CI = Crawling)

Best Copy Selection – SP1
► Four copies of DB1, with the same status as in the RTM example
► DB1 currently active on Server1
► Auto database mount dial set to Lossless
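The sorting and criteria matching described above can be sketched as a small Python model. This is a hedged illustration, not Active Manager code: `CopyStatus`, `criteria_rank`, `attempt_order`, and `choose_copy` are hypothetical names, N/A is treated as "not evaluated", copies are tried criteria set by criteria set while keeping the sort order within each set (an assumption), and `logs_missing_after_acll` is a made-up input standing in for whatever ACLL could not recover. The dial limits of 0, 6, and 12 logs are the Exchange 2010 AutoDatabaseMountDial values Lossless, GoodAvailability, and BestAvailability.

```python
from dataclasses import dataclass

# Hypothetical model of Exchange 2010 best copy selection; names are illustrative.


@dataclass
class CopyStatus:
    name: str
    activation_preference: int
    copy_queue_length: int      # CQL, in logs
    replay_queue_length: int    # RQL, in logs
    ci_state: str               # content index state, e.g. "Healthy", "Crawling"
    db_state: str               # e.g. "Healthy", "DisconnectedAndHealthy"
    reachable: bool = True
    activation_blocked: bool = False


# Criteria sets 1-9 from the table: (CQL limit, RQL limit, required CI state).
# None stands for N/A (not evaluated); limits are exclusive ("< 10 logs").
CRITERIA = [
    (10, 50, "Healthy"),
    (10, 50, "Crawling"),
    (None, 50, "Healthy"),
    (None, 50, "Crawling"),
    (None, 50, None),
    (10, None, "Healthy"),
    (10, None, "Crawling"),
    (None, None, "Healthy"),
    (None, None, "Crawling"),
]
# Criteria set 10: any copy in one of these database states.
LAST_RESORT_STATES = {"Healthy", "DisconnectedAndHealthy",
                      "DisconnectedAndResynchronizing", "SeedingSource"}
# Logs that may be lost for each AutoDatabaseMountDial value.
DIAL_MAX_LOST_LOGS = {"Lossless": 0, "GoodAvailability": 6, "BestAvailability": 12}


def criteria_rank(c):
    """Return the first criteria set (1-10) the copy meets, or None."""
    for rank, (max_cql, max_rql, ci) in enumerate(CRITERIA, start=1):
        if ((max_cql is None or c.copy_queue_length < max_cql)
                and (max_rql is None or c.replay_queue_length < max_rql)
                and (ci is None or c.ci_state == ci)):
            return rank
    return 10 if c.db_state in LAST_RESORT_STATES else None


def attempt_order(copies, sort_by_preference):
    """Names of copies in the order they would be tried."""
    # Unreachable or activation-blocked servers are ignored.
    usable = [c for c in copies if c.reachable and not c.activation_blocked]
    if sort_by_preference:  # SP1 with the dial set to Lossless
        usable.sort(key=lambda c: c.activation_preference)
    else:                   # RTM, or SP1 with any other dial value
        usable.sort(key=lambda c: (c.copy_queue_length, c.activation_preference))
    ranked = [(criteria_rank(c), c) for c in usable]
    ranked = [(r, c) for r, c in ranked if r is not None]
    ranked.sort(key=lambda rc: rc[0])  # stable: keeps the order above within a set
    return [c.name for _, c in ranked]


def choose_copy(copies, dial, logs_missing_after_acll):
    """SP1-style selection: first copy whose post-ACLL loss fits the dial, else None."""
    for name in attempt_order(copies, sort_by_preference=(dial == "Lossless")):
        if logs_missing_after_acll[name] <= DIAL_MAX_LOST_LOGS[dial]:
            return name
    return None


copies = [
    CopyStatus("Server2\\DB1", 2, 4, 0, "Healthy", "Healthy"),
    CopyStatus("Server3\\DB1", 3, 2, 2, "Healthy", "DisconnectedAndHealthy"),
    CopyStatus("Server4\\DB1", 4, 10, 0, "Crawling", "Healthy"),
]
print(attempt_order(copies, sort_by_preference=False))  # ['Server3\\DB1', 'Server2\\DB1', 'Server4\\DB1']
print(attempt_order(copies, sort_by_preference=True))   # ['Server2\\DB1', 'Server3\\DB1', 'Server4\\DB1']
```

With the table's data, the queue-length ordering puts Server3\DB1 first, while the Lossless ordering puts Server2\DB1 first. Server4\DB1 ends up last in both because it only meets criteria set 4.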
Best Copy Selection – SP1
► Sort the list of available copies by activation preference:
  – Server2\DB1 (lowest preference value, tried first)
  – Server3\DB1
  – Server4\DB1

Best Copy Selection
► After Active Manager determines the best copy to activate
  – The Replication service on the target server attempts to copy missing log files from the source (ACLL)
    • If successful, the database mounts with zero data loss
    • If unsuccessful (lossy failure), the database mounts based on the AutoDatabaseMountDial setting
    • If the data loss is outside the dial setting, the next copy is tried
► If an activated database copy is mounted
  – It generates new log files (using the same log generation sequence)
  – Transport Dumpster requests are initiated for the mounted database to recover lost messages
  – When the original server or database recovers, it runs through divergence detection and either performs an incremental resync or requires a full reseed

Exchange Server 2010 High Availability Deep Dive: Datacenter Activation Coordination Mode

Datacenter Activation Coordination Mode
► DAC mode is a property of a DAG
► Acts as an application-level form of quorum
  – Controls whether a Mailbox server attempts to mount its active databases on startup
  – Designed to prevent multiple copies of the same database from mounting on different members due to loss of network connectivity (split brain)
► Also enables use of the site resilience tasks
  – Stop-DatabaseAvailabilityGroup
  – Restore-DatabaseAvailabilityGroup
  – Start-DatabaseAvailabilityGroup
► RTM: Use DAC mode for DAGs with three or more members that are extended to two Active Directory sites
  – Don't enable it for two-member DAGs where each member is in a different AD site, or for DAGs where all members are in the same AD site
► SP1: DAC mode can be enabled for all DAGs
► If using Third Party Replication (TPR) mode, check with your vendor for guidance on DAC mode
► Uses the Datacenter Activation Coordination Protocol (DACP)
► A bit in memory (in MSExchangeRepl.exe) set to either:
  – 0 = can't mount
  – 1 = can mount
► Active Manager startup sequence
  – DACP is set to 0
  – The DAG member communicates with the other DAG members it can reach to determine the current value of their DACP bits
    • If the starting DAG member can communicate with all other members on the StartedServers list, its DACP bit switches to 1
    • If the starting DAG member can communicate with another member, and that other member's DACP bit is set to 1, the starting member's DACP bit switches to 1
    • If the starting DAG member can communicate with another member, and that other member's DACP bit is set to 0, the starting member's DACP bit remains at 0
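The startup rules above can be sketched as a pure function. This is a hedged model rather than MSExchangeRepl.exe code: `dacp_bit_on_startup` and its inputs are hypothetical, the network exchange is reduced to a dictionary of the DACP bits the starting member could read from peers it reached, and the rules are applied in the order listed. Treating a member that reaches no one (while other members are on the StartedServers list) as staying at 0 is an assumption consistent with the bit starting at 0.

```python
def dacp_bit_on_startup(self_name, started_servers, reachable_bits):
    """Return the DACP bit a starting DAG member ends up with (hypothetical model).

    started_servers: names on the StartedServers list.
    reachable_bits:  {member name: DACP bit} for the other members this
                     server can currently reach (an illustrative input).
    """
    bit = 0  # DACP starts at 0 (can't mount)
    others = set(started_servers) - {self_name}
    if others.issubset(reachable_bits):
        bit = 1  # reached every other member on the StartedServers list
    elif any(peer_bit == 1 for peer_bit in reachable_bits.values()):
        bit = 1  # a reachable peer already has its bit set to 1
    # Otherwise the bit stays 0: only peers with bit 0 (or none) were reached.
    return bit


started = {"MBX-A", "MBX-B", "MBX-C", "MBX-D"}
print(dacp_bit_on_startup("MBX-A", started, {"MBX-B": 0}))                          # 0
print(dacp_bit_on_startup("MBX-A", started, {"MBX-B": 0, "MBX-C": 1}))              # 1
print(dacp_bit_on_startup("MBX-A", started, {"MBX-B": 0, "MBX-C": 0, "MBX-D": 0}))  # 1
```

The first call is the split-brain case DAC mode guards against: the restarting member can see only a peer that is also unable to mount, so it keeps its databases dismounted.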
[Figure: DAG1 stretched across a primary datacenter and a secondary datacenter, with Outlook clients, CAS-Pri and CAS-Sec, an HT2010 server in each datacenter, an FSW, an AWS, and Mailbox servers MBX-A through MBX-D. In the final build, MBX-A and MBX-B show a DACP bit of 0, and MBX-C and MBX-D show a DACP bit of 1.]

Resources
► Exchange Team Blog – http://aka.ms/ehlo
► Exchange 2010 Documentation – http://aka.ms/ex2010docs
► My Blog – http://aka.ms/schnoll
► Twitter: @schnoll

Resources
► Sessions On-Demand & Community: www.msteched.com/Australia
► Microsoft Certification & Training Resources: www.microsoft.com/australia/learning
► Resources for IT Professionals: http://technet.microsoft.com/en-au
► Resources for Developers: http://msdn.microsoft.com/en-au