Multi-Site Clustering for Hyper-V Disaster Recovery Greg Shields, MVP, vExpert Senior Partner Concentrated Technology www.ConcentratedTech.com @ConcentratdGreg About the speaker Over 15 years of Windows experience  Administrator – Managed.


Multi-Site Clustering
for Hyper-V
Disaster Recovery
Greg Shields, MVP, vExpert
Senior Partner
Concentrated Technology
www.ConcentratedTech.com
@ConcentratdGreg
About the speaker
Over 15 years of Windows experience
• Administrator – Managed environments ranging from a few dozen to many thousands of users…
• Consultant – Hands-on and strategic…
• Speaker – TechMentor, TechEd, Windows Connections, MMS, VMworld, ISACA, others…
• Analyst/Author – Fourteen books and counting…
• Columnist – TechNet Magazine, Redmond Magazine, Windows IT Pro Magazine, TechTarget Online, others…
• All-around good guy…
What Makes a Disaster?
Which of the following would you consider a disaster?
• It impacts your datacenter and causes damage, and that damage causes all processing in that datacenter to cease
• It interrupts the functionality of your datacenter for an extended period of time
• It immediately ceases all processing on a server
• It causes problems with a service, shutting down that service and preventing some action from occurring on the server
• It causes a server or an entire rack of servers to inadvertently and rapidly power down
What Makes a Disaster?
Which of the following would you consider a disaster?
• It immediately ceases all processing on a server
• It causes problems with a service, shutting down that service and preventing some action from occurring on the server
• It causes a server or an entire rack of servers to inadvertently and rapidly power down
Just a bad day…
What Makes a Disaster?
• Your decision to “declare a disaster” and move to “disaster ops” is a major one
• The technologies used for disaster protection are different from those used for high availability
   • More complex
   • More expensive
• Failover and failback processes involve more thought
   • You might not be able to just “fail back” with the click of a button
Multi-Site Hyper-V == Single-Site Hyper-V
Multi-site Hyper-V looks very much the same as single-site Hyper-V:
• Some Hyper-V hosts
• Some networking and storage
• Virtual machines that Live Migrate around
Microsoft has not done a good job of explaining this fact!
But there are some major differences too…
• VMs can Live Migrate across sites
• Sites typically have different subnet arrangements
• Data in the primary site must be replicated to the DR site
• Clients need to know where your servers go!
Constructing Site-Proof Hyper-V: Three Things
At a very high level, Hyper-V disaster recovery is three things:
1. A storage mechanism
2. A replication mechanism
3. Target servers & a cluster
• Once you have these three things, layering Hyper-V atop them is easy.
Constructing Site-Proof Hyper-V: Three Things
[Diagram: primary and backup Hyper-V servers with a storage device in each site; a replication mechanism copies primary-site storage to the storage device(s) serving the target servers in the backup site]
Thing 1: A Storage Mechanism
Typically, two SANs in two different locations
• Fibre Channel, iSCSI, FCoE, heck, even JBOD
• Similar model or manufacturer: similarity → proper replication
• Backup SAN doesn’t necessarily need to be of the same size or speed as the primary SAN
• Replicated ≠ full data (not always): DR is not for everything!
DR Environments: Where Old SANs Go To Die!
Thing 2: A Replication Mechanism
Replication between SANs must occur in one of two ways:
1. Synchronously
   • Changes are made on one node at a time
   • Subsequent changes on primary SAN must wait for ACK from backup SAN
2. Asynchronously
   • Changes on backup SAN will eventually be written
   • Changes are queued at primary SAN to be transferred at intervals
Thing 2: A Replication Mechanism
1. Synchronously
● Changes are made on one node at a time. Subsequent
changes on primary SAN must wait for ACK from backup SAN.
Sequence, from the primary-site storage device to the backup-site storage device:
   1) Change committed at primary site
   2) Change replicated to secondary site
   3) Change committed at secondary site
   4) Acknowledgement of change returned to primary site
   5) Change complete
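The synchronous sequence can be sketched in a few lines of Python. This is a hypothetical model, not a SAN vendor's API: `StorageDevice` and `synchronous_write` are invented names, and a real array replicates blocks over Fibre Channel or IP rather than appending strings. What it does show is the rule from the slide: each change runs through commit, replicate, remote commit, acknowledge and complete before the next one starts, so the backup copy never lags the primary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageDevice:
    """Hypothetical stand-in for a SAN volume: an ordered list of committed changes."""
    name: str
    committed: List[str] = field(default_factory=list)

    def commit(self, change: str) -> None:
        self.committed.append(change)


def synchronous_write(primary: StorageDevice, backup: StorageDevice,
                      change: str, log: List[str]) -> None:
    """Replicate one change synchronously; returns only after the backup has acknowledged it."""
    primary.commit(change)                                    # 1) committed at primary site
    log.append(f"{change}: committed at {primary.name}")
    log.append(f"{change}: replicated to {backup.name}")      # 2) sent to secondary site
    backup.commit(change)                                     # 3) committed at secondary site
    log.append(f"{change}: committed at {backup.name}")
    log.append(f"{change}: ACK returned to {primary.name}")   # 4) acknowledgement
    log.append(f"{change}: complete")                         # 5) next write may now start


primary = StorageDevice("primary")
backup = StorageDevice("backup")
log: List[str] = []
for change in ["change-1", "change-2", "change-3"]:
    synchronous_write(primary, backup, change, log)
    # Each write waits for its ACK, so both sites hold identical data at this point.
    assert primary.committed == backup.committed
```

The price of that guarantee is that every write waits for the round trip to the backup site, which is the performance and distance trade-off discussed under "Food for Thought".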
Thing 2: A Replication Mechanism
2. Asynchronously
● Changes on backup SAN will eventually be written. They are queued at primary SAN to be transferred at intervals.
Sequence, from the primary-site storage device to the backup-site storage device:
   Changes 1, 2, 3 and 4 committed at primary site → changes replicated to secondary site
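Asynchronous replication can be modelled the same way, as a minimal sketch assuming a simple in-memory queue (real arrays use journals or snapshots). `AsyncReplicatedVolume` and its methods are hypothetical names. Writes are acknowledged as soon as the primary commits them, and whatever is still queued when the primary fails is the data you lose.

```python
from collections import deque
from typing import Deque, List


class AsyncReplicatedVolume:
    """Hypothetical model of asynchronous SAN replication (not a vendor API)."""

    def __init__(self) -> None:
        self.primary: List[str] = []
        self.backup: List[str] = []
        self.pending: Deque[str] = deque()

    def write(self, change: str) -> None:
        self.primary.append(change)   # committed locally; no wait on the remote site
        self.pending.append(change)   # queued for the next replication interval

    def replicate(self) -> None:
        """Called once per interval: transfer everything queued so far, in order."""
        while self.pending:
            self.backup.append(self.pending.popleft())

    def lost_if_primary_fails_now(self) -> List[str]:
        """Changes committed at the primary that the backup site has not received yet."""
        return list(self.pending)


vol = AsyncReplicatedVolume()
for i in range(1, 5):                   # changes 1-4 committed at the primary site
    vol.write(f"change-{i}")
vol.replicate()                         # changes replicated to the secondary site
vol.write("change-5")
vol.write("change-6")
print(vol.lost_if_primary_fails_now())  # exposure until the next interval runs
```

Run as-is, this prints `['change-5', 'change-6']`: the two writes made since the last interval exist only at the primary site.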
Food for Thought
Which would you choose? Why?
Synchronous
● Assures no loss of data
● Requires a high-bandwidth, low-latency connection
● Write and acknowledgement latencies impact performance
● Requires shorter distances between storage devices
Asynchronous
● Potential for loss of data during a failure
● Leverages smaller-bandwidth connections; more tolerant of latency
● No performance impact from waiting on the backup SAN
● Potential to stretch across longer distances
Your Recovery Point Objective makes this decision…
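One way to put numbers on this trade-off is sketched below in Python. It rests on stated assumptions: light in optical fiber covers roughly 200 km per millisecond, only propagation delay is counted (real links add switching, SAN commit time and protocol overhead), and the worst-case asynchronous loss is approximated as one replication interval plus one transfer. The function names are hypothetical.

```python
FIBER_KM_PER_MS = 200.0  # assumption: light travels roughly 200 km per ms in optical fiber


def sync_write_penalty_ms(distance_km: float) -> float:
    """Lower bound on latency synchronous replication adds to every write:
    one round trip to the backup SAN (propagation delay only)."""
    return 2 * distance_km / FIBER_KM_PER_MS


def async_worst_case_loss_s(interval_s: float, transfer_s: float) -> float:
    """Rough upper bound on data lost with asynchronous replication:
    everything written since the last completed transfer."""
    return interval_s + transfer_s


def choose_replication(rpo_s: float, interval_s: float, transfer_s: float) -> str:
    """Let the Recovery Point Objective (in seconds) drive the choice."""
    if rpo_s == 0:
        return "synchronous"
    if async_worst_case_loss_s(interval_s, transfer_s) <= rpo_s:
        return "asynchronous"
    return "asynchronous with a shorter interval, or synchronous"
```

Under these assumptions, sites 100 km apart add at least 1 ms to every synchronous write and sites 1,000 km apart add at least 10 ms, which is why synchronous replication favours shorter distances. A 15-minute RPO, by contrast, is comfortably met by asynchronous replication every 5 minutes with a 1-minute transfer.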
Thing 2½: Replication Processing Location
There are also two locations for replication processing…
1. Storage Layer
● Replication processing is handled by the SAN itself
● Agents are often installed on virtual hosts or machines to ensure crash consistency
● Easier to set up, fewer moving parts. More scalable
● Concerns about crash consistency
2. OS / Application Layer
● Replication processing is handled by software in the VM OS
● This software also operates as the agent
● More challenging to set up, more moving parts. More installations to
manage/monitor. Scalability and cost are linear
● Fewer concerns about crash consistency
Thing 3: Target Servers and a Cluster
• Finally, you need target servers and a cluster in the backup site.
[Diagram: backup site with two Hyper-V servers, redundant network switches, and storage]
Clustering’s Sordid History
Windows NT 4.0
- Microsoft Cluster Service “Wolfpack”
- “As the corporate expert in Windows clustering, I
recommend you don’t use Windows clustering”
Windows 2000
Greater availability, scalability. Still painful
Windows 2003
- Added iSCSI storage to traditional Fibre Channel
- SCSI Resets still used as method of last resort (painful)
Windows 2008
- Eliminated use of SCSI Resets
- Eliminated full-solution HCL requirement
- Added Cluster Validation Wizard and pre-cluster tests
- Clusters can now span subnets (ta-da!)
Windows 2008 R2
- Improvements to Cluster Validation Wizard and
Migration Wizard
- Additional cluster services
- Cluster Shared Volumes (!) and Live Migration (!)
So, What IS a Cluster?
[Diagram: clustered Hyper-V servers sharing a quorum drive and storage for Hyper-V VMs]
So, What IS a Multi-Site Cluster?
[Diagram: Hyper-V servers, network switches, and iSCSI storage in the primary and backup sites, with a witness server in a separate witness site]
Quorum: Clustering’s Most Confusing Configuration
• Ever been to a Kiwanis meeting…?
• A cluster “exists” because it has quorum between its members. Quorum is achieved via a voting process
   - Different clubs, different rules
   - Different clusters, different rules
   - Different from resource failover
• If a cluster “loses quorum”, the entire cluster shuts down and ceases to exist. It stays down until quorum is regained
• Multiple quorum models exist
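The voting rule can be sketched as a strict-majority check. This assumes the static vote arithmetic of the Windows Server 2008 R2 era covered here, where every configured vote counts toward the total whether or not its owner is online; `has_quorum` and `cluster_states` are hypothetical helper names, not Failover Clustering APIs.

```python
from typing import List


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Strict majority of all configured votes; exactly half is NOT quorum."""
    return votes_present > total_votes // 2


def cluster_states(total_votes: int, votes_online_over_time: List[int]) -> List[str]:
    """The whole cluster runs only while it holds quorum and stays stopped until quorum returns."""
    return ["running" if has_quorum(v, total_votes) else "stopped"
            for v in votes_online_over_time]


# Five voters drop away one at a time, then one comes back.
print(cluster_states(5, [5, 4, 3, 2, 3]))
```

This prints `['running', 'running', 'running', 'stopped', 'running']`. Note that a tie loses: with four votes, two survivors are not enough, so a fourth voter tolerates no more failures than a third one does.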
Four Options for Quorum
1. Node and Disk Majority
2. Node Majority
3. Node and File Share Majority
4. No Majority: Disk Only
Quorum in Multi-Site Clusters
• Node and Disk Majority
• Node Majority
• Node and File Share Majority
• No Majority: Disk Only
Microsoft recommends using the Node and File Share Majority model for multi-site clusters
• This model provides the best protection against a full-site outage
• Protection against a full-site outage requires a file share witness in a third geographic location
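To see why the third-site witness matters, the Python sketch below counts votes for a full-site outage under Node Majority and under Node and File Share Majority. It assumes one vote per node, two nodes per site, and (for the second model) a file share witness in a third site that the outage does not reach; the site names and helpers are hypothetical.

```python
from typing import Dict


def has_quorum(votes_present: int, total_votes: int) -> bool:
    return votes_present > total_votes // 2  # strict majority; a tie is not quorum


def survives_site_outage(nodes_per_site: Dict[str, int], failed_site: str,
                         third_site_witness: bool) -> bool:
    """One vote per node, plus one file share witness vote if third_site_witness is set."""
    total = sum(nodes_per_site.values()) + (1 if third_site_witness else 0)
    present = sum(n for site, n in nodes_per_site.items() if site != failed_site)
    if third_site_witness:
        present += 1  # the witness lives outside both sites, so the outage cannot take it
    return has_quorum(present, total)


sites = {"primary": 2, "backup": 2}
print("Node Majority:", survives_site_outage(sites, "primary", third_site_witness=False))
print("Node and File Share Majority:", survives_site_outage(sites, "primary", third_site_witness=True))
```

With two nodes per site, Node Majority is left with 2 of 4 votes after losing a site (a tie, so no quorum), while the third-site witness leaves 3 of 5.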
Quorum in Multi-Site Clusters
• Use the Node and File Share Majority quorum
● Prevents an entire-site outage from impacting quorum.
● Enables creation of multiple clusters if necessary.
[Diagram: third site for the witness server; Hyper-V servers, network switches, and iSCSI storage in the primary and backup sites]
I Need a Third Site? Seriously?
Here’s where Microsoft’s ridiculous quorum notion gets unnecessarily complicated…
• What happens if you put the quorum’s file share in the primary site?
● The secondary site might not automatically come online after a primary site failure
● Votes in secondary site < votes in primary site
I Need a Third Site? Seriously?
Here’s where Microsoft’s ridiculous quorum notion gets unnecessarily complicated…
• What happens if you put the quorum’s file share in the secondary site?
● A failure in the secondary site could cause the primary site to go down.
● Votes in secondary site > votes in primary site.
This problem gets even weirder as time passes and the number of servers in each site changes
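The placement problem on these slides can be enumerated with the same vote counting. This hypothetical Python sketch assumes one vote per node plus one file share witness vote, and checks whether the cluster keeps quorum when either the primary or the secondary site fails, for each possible witness location.

```python
from typing import Dict, List, Tuple


def has_quorum(votes_present: int, total_votes: int) -> bool:
    return votes_present > total_votes // 2  # strict majority; a tie is not quorum


def keeps_quorum(nodes: Dict[str, int], witness_site: str, failed_site: str) -> bool:
    """One vote per node plus one file share witness vote (hypothetical model)."""
    total = sum(nodes.values()) + 1
    present = sum(n for site, n in nodes.items() if site != failed_site)
    if witness_site != failed_site:
        present += 1  # the witness survived the outage
    return has_quorum(present, total)


def placement_report(nodes: Dict[str, int]) -> List[Tuple[str, str, bool]]:
    """(witness location, failed site, cluster keeps quorum?) for every combination."""
    return [(witness, failed, keeps_quorum(nodes, witness, failed))
            for witness in ("primary", "secondary", "third")
            for failed in ("primary", "secondary")]


for row in placement_report({"primary": 2, "secondary": 2}):
    print(row)
```

For two nodes per site, only the third-site witness survives both failures. With an uneven three-and-two split there are six votes, and losing the larger site leaves exactly three, which is not a majority even with the third-site witness; that is one way the problem gets weirder as servers are added to one site.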
I Need a Third Site? Seriously?
[Diagram: third site for the witness server; Hyper-V servers, network switches, and iSCSI storage in the primary and backup sites]
Multi-Site Cluster Tips/Tricks
Manage Preferred Owners & Persistent Mode options
● Make sure your servers fail over to servers in the same site first
● But also make sure they have options for failing over elsewhere
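A simplified model of the preferred-owner idea, in Python: build an ordered owner list with same-site nodes first and remote nodes after, then fail over to the first online node in that order. The node names (`HV1` to `HV4`) and helpers are hypothetical, and Failover Clustering's real placement logic considers more than this one list.

```python
from typing import Dict, Iterable, List, Optional


def preferred_owners(node_sites: Dict[str, str], home_site: str) -> List[str]:
    """Ordered owner list: same-site nodes first, other sites after,
    so a VM stays local when it can but still has somewhere to go."""
    local = sorted(n for n, site in node_sites.items() if site == home_site)
    remote = sorted(n for n, site in node_sites.items() if site != home_site)
    return local + remote


def pick_failover_target(owners: List[str], online: Iterable[str],
                         failed_node: str) -> Optional[str]:
    """Return the first online node in owner order, skipping the failed one."""
    online_set = set(online)
    for node in owners:
        if node != failed_node and node in online_set:
            return node
    return None  # nowhere left to run


nodes = {"HV1": "primary", "HV2": "primary", "HV3": "backup", "HV4": "backup"}
owners = preferred_owners(nodes, "primary")
```

If `HV1` fails while `HV2` is up, the VM stays in the primary site on `HV2`; only when both primary-site hosts are down does it move to `HV3` in the backup site.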
Multi-Site Cluster Tips/Tricks
Consider carefully the effects of Failback
● Failback is a great solution for resetting after a failure
● But Failback can be a massive problem-causer as well
● Its effects are particularly pronounced in Multi-Site Clusters
● Recommendation: Turn it off (until you’re ready)
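A hypothetical failback policy check, loosely modelled on the choices a clustered role offers (prevent failback, fail back immediately, or fail back only between two hours). The function and its parameters are illustrative assumptions, not a cluster API. The point is that keeping failback off, or confining it to a quiet window, stops VMs from bouncing back across the WAN at a bad moment.

```python
from typing import Optional, Tuple


def should_fail_back(allow_failback: bool,
                     window: Optional[Tuple[int, int]],
                     hour: int) -> bool:
    """window=None means 'immediately'; (start, end) is a half-open range of
    hours (0-23) that may wrap past midnight."""
    if not allow_failback:
        return False                      # the slide's recommendation until you're ready
    if window is None:
        return True                       # fail back as soon as the preferred node returns
    start, end = window
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end    # e.g. (22, 2) covers 22:00 to 01:59
```

With failback allowed only between 22:00 and 02:00, a primary-site host that recovers at noon would not pull its VMs back until that evening.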
More Multi-Site Cluster Tips/Tricks
Resist creating clusters that support other services
● A Hyper-V cluster is a Hyper-V cluster is a Hyper-V cluster
Use disk “dependencies” as Affinity/Anti-Affinity rules
● Hyper-V all by itself doesn’t have an elegant way to affinitize VMs
● Setting disk dependencies against each other is a work-around
Add Servers in Pairs
● Ensures that a server loss won’t cause site split brain
● This is less of a problem with the File Share Witness configuration
Multi-Site Cluster Tips/Tricks
• Segregate traffic!!!
Most Important!
Ensure that networking remains available when VMs migrate from the primary to the backup site
• Clustering can span subnets!
- This is good, but only if you plan for it…
● Crossing subnets also means changing the IP address, subnet mask, gateway, etc., at the new site
● This is either done automatically by using DHCP and dynamic DNS, OR must be updated manually
● DNS replication is also a problem. Clients will require time to update their local cache
● Consider reducing DNS TTL or clearing client cache
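The two checks in this list can be sketched with Python's standard `ipaddress` module. The subnets come from the reserved documentation ranges and stand in for hypothetical primary (192.0.2.0/24) and backup (198.51.100.0/24) sites. The DNS figure is a rough estimate under one assumption: a client that cached the old record just before the new one reached its DNS server keeps the old record for a full TTL.

```python
import ipaddress


def needs_readdressing(vm_ip: str, site_subnet: str) -> bool:
    """True if the VM's current address is not valid in the site it moved to,
    so IP address, subnet mask and gateway must change (via DHCP or by hand)."""
    return ipaddress.ip_address(vm_ip) not in ipaddress.ip_network(site_subnet)


def worst_case_stale_dns_s(dns_replication_s: float, record_ttl_s: float) -> float:
    """Rough worst case before every client stops using the old address:
    time for the new record to replicate, plus one full TTL of client caching."""
    return dns_replication_s + record_ttl_s


# A VM from the hypothetical primary subnet lands in the backup site:
print(needs_readdressing("192.0.2.25", "198.51.100.0/24"))
```

This prints `True`. With 15 minutes of DNS replication and a one-hour TTL, a client could keep using the old address for up to 4,500 seconds; dropping the TTL to five minutes cuts that to 1,200 seconds, which is the reasoning behind reducing DNS TTL.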
Enjoy and share this material
• Feel free to promote this material
• Encourage your peers to pass certification
• Blog, Tweet and share this material and your experience on Facebook
• You’re an Expert? We will be happy to have you as a Backup Academy contributor. Apply here.
Web: http://www.backupacademy.com
E-mail: [email protected]
Twitter: BckpAcademy
Facebook: backup.academy
