Role-Based High Availability with Exchange 2007 Jim McBee http://www.ithicos.com Who is Jim McBee!!?? • Consultant, Writer, MCSE, MVP and MCT – Honolulu, Hawaii • Principal clients.


Role-Based High Availability with Exchange 2007

Jim McBee http://www.ithicos.com

Who is Jim McBee!!??

• Consultant, Writer, MCSE, MVP and MCT – Honolulu, Hawaii
• Principal clients (Dell, Microsoft, SAIC, Servco Pacific)
• Author – Exchange 2003 Advanced Administration (Sybex)
• Contributor – Exchange and Outlook Administrator
• Blog – http://mostlyexchange.blogspot.com

– http://www.directory-update.com

Agenda

• High availability versus fault tolerance
• Resiliency versus high availability
• Server roles
• Providing higher availability
• Continuous replication technologies

Fault tolerance

• Designing and building a server that is resistant to failure
• All servers should be fault tolerant
• RAID disks
• ECC memory
• Redundant power supplies
• UPS systems
• Active Directory and DNS

High availability

• Components of your system that allow quicker recovery from a failure
• Examples include…
  – Clustering
  – Load balanced hosts
  – Built-in redundancy or load balancing
  – DNS / application redundancy or load balancing

Resiliency

• Solutions that allow for continuity of operations
• Recovery in the event of a serious disaster
• Not solutions that are invoked when applying a service pack or during a brief power outage
• Usually not automatic failover
• Examples include…
  – Standby Continuous Replication
  – Local Continuous Replication

Server roles

Roles configured at installation

• Simplify installation
  – Optimize the server for the jobs it performs
  – Increase availability through the most efficient and economic means
  – Manage the servers more intuitively

Exchange 2007 Server Roles

By defining well-described roles, we can:
  – Remove unnecessary functionality
    • Benefit: optimize server performance
  – Reduce the attack surface
    • Benefit: reduced exposure in the perimeter

[Diagram: Edge Transport Server in the perimeter network; Hub Transport, Client Access, Mailbox, and Unified Messaging servers in the protected network]

Server Roles 1/5

• Edge Transport
  – Must be on its own separate physical machine
  – No other roles installed
  – May be workgroup member or joined to an Active Directory domain
  – Uses Active Directory Application Mode (ADAM) for configuration and recipient information
  – Perimeter policy enforcement
  – Message hygiene
    • Anti-spam
    • Transport anti-virus
  – Not required

Server Roles 2/5

• Client Access Server (CAS)
  – Supports Outlook Web Access, Exchange ActiveSync, Outlook Anywhere (formerly RPC/HTTPS), POP3 and IMAP4 protocols, Autodiscover, Availability, and Web services
  – At least one CAS in each Active Directory site and domain where mailbox servers exist
  – Requires good network connection (low latency) to mailbox servers
  – Uses RPC communication to mailbox server

Server Roles 3/5

• Hub Transport
  – Handles message delivery and routing (see EX03)
  – Applies policies to incoming and outgoing mail
  – Can handle message hygiene functions
  – Reduces cost and complexity
    • Provides more predictable routing
    • Reduces downtime

Server Roles 4/5

• Mailbox
  – Responsible for serving mailbox databases and public folders
  – Mailbox access through MAPI
  – Possible to require MAPI encryption
  – Possible to run without public folders

Server Roles 5/5

• Unified Messaging
  – Placed in the protected corporate network
  – Requires that Mailbox and Hub Transport roles exist
  – Check with your phone vendor to see if their phone system will work with UM server
    • May require PBX gateway

Things to Consider

• Interdependencies
  – Mailbox servers require the Hub Transport role for message delivery, even to the same database
  – The CAS role provides OWA, ActiveSync, RPC over HTTP, the Availability Service, Autodiscover, and more
  – The Edge role requires a Hub Transport server
• Fault tolerance
  – Mailbox servers can only talk to Hub Transport servers in the same Active Directory site
  – Mailbox servers will talk to Hubs on the same server before other Hubs in the same Active Directory site
  – For proxy and redirect scenarios, CAS connects to the "best" CAS
• CAS is not the same as FE servers
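
The Hub Transport selection rule in the fault tolerance bullets can be sketched in Python. This is a simplified model for illustration, not Exchange code: the `Server` type, the role strings, the site names, and `pick_hubs` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    site: str          # Active Directory site name (hypothetical values)
    roles: frozenset   # e.g. frozenset({"Mailbox", "HubTransport"})


def pick_hubs(mailbox, servers):
    """Return usable Hub Transport servers for a mailbox server, preferred first."""
    # Only Hubs in the mailbox server's own Active Directory site are candidates.
    hubs = [s for s in servers
            if "HubTransport" in s.roles and s.site == mailbox.site]
    # A Hub role on the mailbox server itself sorts ahead of other Hubs in the site
    # (False sorts before True, and sorted() is stable for the rest).
    return sorted(hubs, key=lambda s: s.name != mailbox.name)
```

If the list comes back empty, the model matches the interdependency above: a mailbox server with no Hub Transport server in its site cannot deliver mail.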

High availability

Focus on Availability and Resiliency

• Improve data availability and resiliency
  – Protect mailbox data from failures and corruptions
  – Reduce time required to restore mailbox data
  – Provide data redundancy
• Service availability
  – Make mailbox data more available
  – Make cluster failover less painful
  – Make cluster management easier
  – Support for ‘stretch’ or ‘geo-clusters’
  – Allow large mailboxes inexpensively

Hub Transport Server High Availability Options

• Use redundant hardware
• Automatically load balanced and redundant with multiple Hub Transport servers
• Inbound SMTP mail
  – Direct delivery to Hub Transport from Internet
  – Direct delivery to Hub Transport from third-party SMTP system
  – Load balancing
    • Third-party load balancing
    • Windows Network Load Balancing (NLB)
• Server failure will result in failure of current connections
• May result in some data loss for any messages in the Hub Transport server queue database

Client Access Server High Availability Options

• Redundant hardware
  – Windows NLB or third-party load balancing
  – Round robin DNS (not the best solution)
• Server failure will result in current connections being lost
  – User may need to re-establish connection

Unified Messaging Server High Availability Options

• Redundant hardware
  – Windows NLB or third-party load balancing
  – Round robin DNS
• PBX or gateway redundancy
  – Some PBXs may have load balancing options for multiple UM servers
• Server failure will result in the loss of any current connections or call transfers in progress

Mailbox Server High Availability and Resiliency Options

• Resiliency and recoverability
  – Local continuous replication (LCR)
  – Standby continuous replication (SCR)
    • Requires Exchange 2007 SP1
• High availability
  – Cluster continuous replication (CCR)
  – Single copy clusters (SCC)
• CCR and SCC require dedicated servers
  – No other roles can exist on a clustered node except Mailbox
  – Other roles must be on their own hardware
• Changes to transaction log files
  – 1MB in size
  – Log file is completely written after 15 minutes
  – Checkpoint depth is still 20MB / Storage Group

Single Copy Clusters

• Requires Microsoft Cluster Services
• Benefits
  – Improved Exchange Cluster setup
  – Traditional clustering used today
  – Failovers use the same data copy
• Exchange Virtual Server = Clustered Mailbox Server
• 2 to 8 node Active /

SCC Caveats

• Requires expensive hardware with shared storage
• Can be complicated for admins to learn
• Doesn’t protect from storage/data issues
  – Data redundancy provided through partners
• Servers must be on same IP subnet
• Hardware must be in the Windows Server Catalog

Local Continuous Replication

• Additional copy of the logs and database
  – On the same server
  – On a different volume
• Benefits
  – Easy configuration
  – Single datacenter
  – Doesn’t require expensive hardware
  – Online backups
  – Very quick restoration of service
• Caveats
  – Adds additional CPU/memory/disk overhead
  – Initial seeding required
  – Manual activation
  – Additional storage requirements
  – One database per storage group

[Diagram: Local Continuous Replication. With LCR enabled, the mailbox server copies and verifies log files (E0000000011.log, E0000000012.log) from D:\SG1\Logs to D:\SG1\Copy\Logs, then advances the copy of the database by playing those logs, producing an updated database. E00.log is shown only in the source log directory.]

Local Continuous Replication Tips

• One database per storage group
• Plan for additional hardware resources
  – Minimum 20% additional CPU overhead
  – Additional 1GB of RAM
  – Will more than double IOPS requirements
• Maximum database size approximately 200GB
• Separate storage into LUNs
  – Do not break LUNs into separate partitions
  – Put each database on a separate LUN
  – Isolate active and passive LUNs
• Use battery-backed storage controllers
  – Configure caching controllers for 75% write / 25% read
• LCR activation is manual
  – Use Restore-StorageGroupCopy cmdlet
  – Use backup copy “in place” or move it
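
The hardware planning figures above can be turned into a rough sizing floor. The Python sketch below is only an illustration: `lcr_resource_floor` is a hypothetical helper, it assumes the 20% CPU overhead is relative to the current CPU load, and because the slide says IOPS will "more than double", the doubled value is a lower bound rather than a target.

```python
def lcr_resource_floor(cpu_percent, ram_gb, iops):
    """Estimate minimum resources for a mailbox server after enabling LCR.

    Inputs are the server's current (pre-LCR) figures. The results are floors,
    not recommendations: real overhead depends on workload.
    """
    return {
        # Assumption: "20% additional CPU" is applied relative to current load.
        "cpu_percent": cpu_percent * 1.20,
        # One additional gigabyte of RAM.
        "ram_gb": ram_gb + 1,
        # "More than double" IOPS, so 2x is only the lower bound.
        "iops": iops * 2,
    }
```

For example, a server that runs at 40% CPU with 8GB of RAM and 500 IOPS would need to be planned for at least 48% CPU, 9GB of RAM, and more than 1,000 IOPS.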

Local continuous replication

Clustered Continuous Replication

[Diagram labels: File Share Witness, KB 921181]

• Benefits
  – Potentially no single point of failure
  – Two copies of the data on separate servers
  – No need for shared / SAN storage
  – Full redundancy with automatic recovery
  – Backup mailboxes without disturbing production
  – Doesn’t require validation for clustered configuration

CCR Advantages

• No single point of failure
• Fast recovery
• Simplified hardware and storage requirements
• Simplified deployment
• Out-of-the-box replication solution
• Can “stretch” the cluster to a second data center
• Ability to offload VSS-based backups to passive node
• Can integrate with SCR

CCR Caveats

• Requires Microsoft Cluster Services
  – Majority Node Set cluster
  – Requires a third “voting” node, which uses a shared folder
• Two-node, Active/Passive only
• Backup:
  – Streaming backup against production storage groups
  – VSS backup against production and replica storage groups
• Limit of one database per storage group
• Can be used for PF database if it is the only PF database in the organization
• Initial database seeding required
• Servers must be on same IP subnet
• Transaction logs pulled over SMB shares
• Some scenarios require log validation and replay
• Database failure does not cause failover
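
The reason a two-node Majority Node Set cluster needs a third "voting" member can be shown with a little arithmetic. The Python sketch below is a generic majority check, not the Cluster service's implementation; `has_quorum` is a hypothetical name.

```python
def has_quorum(votes_online, total_votes):
    """A majority cluster keeps running only while more than half the votes are reachable."""
    return votes_online > total_votes // 2


# Two nodes and no witness: losing either node loses the majority.
two_node_survives_failure = has_quorum(votes_online=1, total_votes=2)

# Two nodes plus the file share vote: one node and the share still form a majority.
with_witness_survives_failure = has_quorum(votes_online=2, total_votes=3)
```

With only two votes, a single failure stops the cluster; adding the shared-folder vote lets the surviving node stay online.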

Standby Continuous Replication

• Coming in Service Pack 1
• Source and target machines can be
  – Stand-alone
  – In two different MSCS clusters
  – On different subnets
• Controlled per storage group
• Many-to-one and one-to-many supported
• Manually activated

Replication to a standby server

LCR versus CCR versus SCR

• LCR
  – Focused towards resiliency
  – Improve restore time
  – Administrator has to initiate restore manually
  – Single data-center solution
  – Implements log shipping and replay out of the box
    • Log files are copied locally and replayed
• CCR
  – Targeted towards site resiliency
  – Automatic failovers
  – Single or two-data center solution
  – Supports “stretch” option
  – Implements log shipping and replay out of the box
    • Log files are copied to remote server and replayed
  – Simplifies cluster deployment
    • No SAN or shared storage
• SCR
  – Provides site and server resiliency
  – “Cold spare” approach cuts hardware costs
  – Can be combined with LCR, CCR, and SCC for maximum flexibility

Continuous Replication Basics

• Exchange store runs normally
• Replication service keeps a copy of the database up-to-date
  – Copies, inspects, and replays log files
• In CCR, Cluster service provides failover
  – Move network identity (client transparency)
• LCR activation is manual
  – Restore-StorageGroupCopy task

Continuous Replication Basics

• A ‘pull’ model
• Exchange server creates log files normally
• Log files are copied by Replication service
  – Exxnnnnnnnn.log files copied as they appear
  – Exx.log is copied for handoff/failover
• If it can’t be copied, the loss setting (AutoDatabaseMountDial) is consulted
  – Lossless (0 logs lost)
  – GoodAvailability (3 logs lost)
  – BestAvailability (6 logs lost, default setting)
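
The AutoDatabaseMountDial check can be read as a simple threshold comparison. The Python sketch below models only the three settings and log counts listed above; `can_mount_automatically` and the dictionary name are hypothetical, and the real failover logic involves more than this one comparison.

```python
# Maximum number of lost log files tolerated by each setting, as listed on the slide.
MAX_LOGS_LOST = {
    "Lossless": 0,
    "GoodAvailability": 3,
    "BestAvailability": 6,  # default setting
}


def can_mount_automatically(setting, logs_lost):
    """Return True if the database copy may mount without administrator action."""
    return logs_lost <= MAX_LOGS_LOST[setting]
```

For instance, a failover that loses four log files would mount automatically under BestAvailability but would wait for an administrator under GoodAvailability or Lossless.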

Continuous Replication

[Diagram: the Store writes to the source DB and the source log directory; the Replication service copies logs to the Inspector directory, moves them to the target log directory, and replays them into the DB copy]

Continuous Replication

[Diagram: the same replication pipeline (Store, source DB, source log directory, Replication service, Inspector directory, target log directory, DB copy) annotated with the counters LastLogCopyNotified, LastLogCopied, LastLogInspected, and LastLogReplayed]

Continuous Replication Monitoring

• LastLogCopyNotified: Last generation seen in the source directory
• LastLogCopied: Last generation copied to Inspector directory by Replication service
• LastLogInspected: Last generation inspected and moved to log file directory
• LastLogReplayed: Last generation replayed into the database copy
• Available through Performance Monitor
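
Subtracting neighbouring counters gives a rough picture of where a replica is falling behind. The Python sketch below is an illustrative model only: the `CopyStatus` type, `backlog`, and `generation_from_name` are hypothetical names, and it assumes the default `E00` log prefix. It relies on the fact that the generation number in a log file name is hexadecimal.

```python
from dataclasses import dataclass


@dataclass
class CopyStatus:
    last_log_copy_notified: int  # last generation seen in the source directory
    last_log_copied: int         # last generation copied to the Inspector directory
    last_log_inspected: int      # last generation inspected and moved
    last_log_replayed: int       # last generation replayed into the database copy


def generation_from_name(filename, prefix="E00"):
    """Parse the hexadecimal generation out of a name such as E0000000012.log."""
    stem = filename[len(prefix):-len(".log")]
    return int(stem, 16)


def backlog(status):
    """Number of log generations waiting at each stage of the pipeline."""
    return {
        "waiting_for_copy": status.last_log_copy_notified - status.last_log_copied,
        "waiting_for_inspection": status.last_log_copied - status.last_log_inspected,
        "waiting_for_replay": status.last_log_inspected - status.last_log_replayed,
    }
```

A steadily growing copy backlog points at the network or the source server, while a growing replay backlog points at the disks that hold the database copy.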

Divergence

• When the copy has information not in the original it is diverged
• Divergence may be in database or log files
• Lossy failover will produce a divergence
• ‘Split-brain’ on a cluster also causes divergence
• Even if clients can’t connect, background maintenance still modifies the database
• Administrator error can cause divergence!
  – e.g. running eseutil /r

Recovering from Divergence

• Re-seed will always work
  – Expensive for large databases
• Look at the common case
  – Lossy failover
  – Only a few log files are lost
• Built-in solutions
  – Decreased log file size to reduce data loss
  – Lost Log Resilience (LLR)

Transport Dumpster

• Feature built into the Hub Transport server role
• Runs to redeliver mail to clustered mailbox servers (CMSs) in its site
• Uses the creation time of the last log file copied
• CCR only in RTM
• Use Set-TransportConfig to change default settings (setting is organization-wide)
• Set MaxDumpsterSizePerStorageGroup to be 1.5 times the size of the maximum message that can be sent (default value is 18MB)
• Recommend MaxDumpsterTime be 7.00:00:00, which is seven days (default value)
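
The dumpster sizing rule is simple arithmetic. The Python sketch below applies the 1.5x rule and formats a Set-TransportConfig command line from the two parameter names given above. `dumpster_command` is a hypothetical helper, rounding up to a whole megabyte is an assumption, and the generated line should be reviewed before it is run in the Exchange Management Shell.

```python
import math


def dumpster_command(max_message_mb, retention="7.00:00:00"):
    """Build a Set-TransportConfig line sized at 1.5x the largest allowed message."""
    # Assumption: round up to a whole number of megabytes.
    size_mb = math.ceil(max_message_mb * 1.5)
    return (
        f"Set-TransportConfig -MaxDumpsterSizePerStorageGroup {size_mb}MB "
        f"-MaxDumpsterTime {retention}"
    )
```

With a 10MB message size limit the rule gives 15MB per storage group; a 12MB limit gives 18MB, which matches the default value quoted on the slide.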

Backups from Passive Database

• Backing up the passive moves the performance hit off the active
• Back up the active or the passive?
  – Remember, they can change designations
• Passive backup is VSS only
  – Data Protection Manager v2
• Active backup can be VSS or streaming ESE

Questions?

Thanks for attending!

Book giveaway and e-mail notice

• Please give me a piece of paper with your name for drawing
• Include your e-mail address or give me a business card if you want:
  – 20% discount code for Directory Update software
  – Notification e-mail when Mastering Exchange Server 2007 is available