Gopal Ashok Program Manager Microsoft Corporation DAT306 Agenda Introduction to High Availability and Disaster Recovery SQL Server Always On Technologies Developing Your Availability Solution Conclusion.
Download ReportTranscript Gopal Ashok Program Manager Microsoft Corporation DAT306 Agenda Introduction to High Availability and Disaster Recovery SQL Server Always On Technologies Developing Your Availability Solution Conclusion.
Gopal Ashok Program Manager Microsoft Corporation DAT306 Agenda Introduction to High Availability and Disaster Recovery SQL Server Always On Technologies Developing Your Availability Solution Conclusion Introduction to High Availability and Disaster Recovery Definitions Introduce key terms and concepts Business Continuity Planning Overview of the BCP process SQL Server High Availability Planning How does BCP apply to SQL Server availability? High Availability and Disaster Recovery: Definition High Availability High availability is a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period Availability defined in terms of service level agreements (SLA) Recovery Time Data loss during unplanned downtime A highly available application should be accessible by users x% of the time Disaster Recovery Processes and procedures designed to restore business operations due to a natural or human-induced disaster Typically involves providing redundancy spanning multiple sites or across geographic regions Defining x and SLA Availability Class Acceptable Downtime (hrs/yr) OR RTO Acceptable Data Loss (time of last copy) OR RPO Tier 1 >99.99% (1 hr or less) 5 min or less Tier 2 99.9% - 99.99% (18.5 hrs) 5 mins to 8.5 hrs Tier 3 (<99.9%) (Hours to days) Hours to days Recovery Time Objective (RTO) guided by availability requirements How much downtime can you tolerate? Recovery Point Objective (RPO) guided by criticality of application data How much data can you lose? RPO Tier1 RTO Protection Levels Protection against resource failures Machine Database Corruption Disk Regional DR Location Redundancy Building < 10 miles Geographic DR Protection against Natural Disasters Protection against Network Outages Site Failures Location Redundancy Local HA – City, County – < 100-200 miles Location Redundancy – State, Country – > 100-200 miles Business Continuity Planning Impact Analysis Critical Functions Threat Identification Recovery Objectives Analysis Solution Design Maintenance Solution Design Achieve recovery objectives for relevant threats within specified constraints like budget, human resources etc Cost\Benefit analysis of solutions Implementation Deploy the recommended solution Testing Testing Implementati on Test to see if the solution meets the recovery requirements Maintenance Yearly testing and review of procedures SQL Server High Availability Planning Analysis Application tiers serviced by the databases Causes of database downtime Protection levels: Local HA, Regional DR, Geographic DR Analysis Solution Design Need to understand what solutions exists? What are the characteristics and cost of the solution? Maintenance Solution Design Implementation What are the deployment steps and best practices? Testing How do I test my implementation? Maintenance How do I monitor and maintain the solution? Testing Implementation Database Downtime Drivers Failure Unplanned Downtime Protection User Errors Database Downtime Online Planned Downtime Administration Predictable Resourcing Analysis Solution Design Understand the solutions and choices before making a decision Solution Design Solution Architecture HA Capabilities Limitations and Caveats Cost Vector Solution Design SQL Server Always On Technologies Always On Technologies Provides a full range of options to minimize downtime and maintain appropriate levels of application availability Increases Availability Decreased Downtime • • • • • Solution Design Backup and Restore Log Shipping Database Mirroring Failover Clustering Peer-Peer Replication • Online Index Operations • Table Partitioning • Enhanced Locking • Resource Governor • Database Snapshot • Dedicated Admin Connection • Dynamic Configuration Always On Technology Overview Solution Design Architecture Overview How does it work? Solution Characteristics Data Loss Guarantees Failover Characteristics Redundancy Levels and Utilization Cost Limitations and Caveats Increases Availability • • • • • Backup and Restore Log Shipping Database Mirroring Failover Clustering Peer-Peer Replication Backup and Restore Base availability technology for any solution Protects against failures and recovery from errors Provides Local HA and Site DR Need to ensure the backups are accessible if site goes down High RTO due to restore time RPO=0 can never be guaranteed Types: Full, Differential, and Transaction Log File-group backup/restore for large databases Backup Compression provides faster and smaller backups in SQL Server 2008 Solution Design Log Shipping Automated transaction log backup and restore provides redundancy at the database level SQLLogship.exe provides the underlying framework for doing automated backup, copy and restore Backup on primary instance Restore on secondary instance(s) Scheduling is done through SQL Server Agent jobs SQL Server 2008 provides sub-minute scheduling interval providing the ability to do quick backup and restores No automatic failover capabilities Solution Design Database Mirroring A database level high availability solution that provides complete protection against data loss and fast recovery through automatic failover Maintains a redundant database by shipping log blocks when the transactions are committed on the principal Synchronous and Asynchronous modes provide the spectrum of options to choose between availability and performance Automatic failover when using witness server Solution Design Failover Clustering Solution Design Instance level protection built on Windows Failover Clustering shared disk model Cluster nodes typically co-located within the same site to provide local HA Regional DR possible using VLAN and stretch storage level replication No built in data redundancy like database mirroring and log shipping Data protection has to be provided at the storage level or by combining with other solutions Transactional Replication Solution Design A high performance data replication solution that provides granular table level replication Logical data movement provides flexibility and better hardware utilization Key scenarios: Customized application-specific DR Real-time reporting on secondary server that be used for Site DR Scale out application queries with ability to use any one database copy for Site DR Two types relevant for HA and DR Transactional and Peer-to-Peer Always On Solution Characteristics RPO Redundancy and Utilization Failover Solution Design Cost Hardware App Perf Impact Manageability Low Low Low * Low High Low * Low Low Low Cluster High*** Low *** Low*** Transactional Replication Low Low High Peer-Peer Replication Low Low High Solutions No Data Loss (RPO=0) Failover Unit Inst DB Tab Auto Failover (RTO) Sync Async Multiple * Log Shipping DBM Read + ** Write * Database Mirroring and Log Shipping can provide point in time read capability using STANDBY or database snapshots respectively ** Database Mirroring provides fastest failover to hot secondary *** Depends on SAN technology What’s New in SQL Server 2008 New Features Resource Governor Manage SQL Server workloads and resources by specifying limits on resource consumption Backup Compression Reduce backup and restore time Feature Enhancements Database Mirroring Automatic recovery from page corruption Log stream compression Faster recovery on failover Log Shipping Sub-Minute Log Shipping Backup compression Failover Clustering 16 nodes Rolling upgrade Peer-Peer Replication Hot add new nodes Backup Compression Common questions: “How much compression will I see?” “Will it be comparable to, say, SQL Litespeed?” One simple answer: “It depends!” All data compresses differently – the compression ratio achieved depends on: The type of data in the database Whether the data in the database is already compressed Whether the data/database is encrypted “We saw an 85 percent reduction in file size using SQL Server 2008 Backup Compression,” says Colin Neller, Senior Software Engineer at ServiceU and part of the company’s SQL Server 2008 implementation team. “A backup file that was previously over 300 GB is now only 40 GB, and the job runs in about half the time.” Backup Compression: Backup Performance Backup of a 322 MB Adventureworks database Compressed Uncompressed Hardly any CPU used (avg 5%), runtime = 39.5s, compression ratio of 0. A LOT more CPU used (avg 25%) BUT runtime = 21.6s (45% improvement) and backup stored in 76.7MB (4.2x compression ratio) DEMO: Increasing Availability Using Always On Technologies Solution Design Developing Your Availability Solution using SQL Server Always On Technologies Recap Application availability requirements or SLA drive primary solution choices RPO and RTO are the key metrics used to define the SLA Need mitigation against planned and unplanned downtimes Solution Design Application Availability Unplanned downtime Multiple solution choices that provides varying cost\benefits Clustering Planned Downtime Database Mirroring Other requirements apart from application SLA factor into the choice Understand constraints and tradeoffs you can make Log Shipping Peer-Peer Replication Always On Solution Characteristics RPO Redundancy and Utilization Failover Solution Design Cost Hardware App Perf Impact Manageability Low Low Low * Low High Low * Low Low Low Cluster High*** Low *** Low*** Transactional Replication Low Low High Peer-Peer Replication Low Low High Solutions No Data Loss (RPO=0) Failover Unit Inst DB Tab Auto Failover (RTO) Sync Async Multiple * Log Shipping DBM Read + ** Write * Database Mirroring and Log Shipping can provide point in time read capability using STANDBY or database snapshots respectively ** Database Mirroring provides fastest failover to hot secondary *** Depends on SAN technology AdventureWorks Inc Scenario Adventureworks Inc is a manufacturing company that manufactures and sells bicycles across the world. There are a number of applications, some that are mission critical that run on multiple SQL Server Instances The DBA team is run by Darren who is responsible for deploying and managing the application databases. One of his core responsibilities is to ensure availability of all application databases in order to meet the application SLA Solution Design One datacenter located in Omaha Three applications Manufacturing – Tier 1 Finance – Tier 2 Scheduling – Tier 3 Manufacturing application runs on a dedicated SQL Server 2008 Instance All other applications run on a second instance Availability of manufacturing application is critical Implement a solution at the lowest possible cost Application Requirements Applications Data Loss RPO=0 RTO in secs Failover Unit Inst DB Auto Failover Read Solution Design Multiple Sites Read Write Tab Manufacturing Finance Scheduling Manufacturing application has strict SLA’s Finance application requires readability on the secondary The reports are run every 4 hours and need to be fresh as of the last one hour. To offload the reporting load from the main system they would like to utilize the mirror Solution Choice for Manufacturing Application Solution Design Solutions Data Loss RPO=0 Fast RTO Failover Unit Inst DB Tab Auto Failover Read >1 Sites\ Copy Read Write Cluster SAN Replication DBMClustering - Sync can provide a zero data loss solution that can also provide fast DBM - Async instance level failover Use RAID configuration to provide data Log Shipping redundancy on the SAN Transactional If a redundant copy is required that can Replication provide instance failover with zero data loss use SAN replication Peer-Peer Replication High Cost Solution Use synchronous database mirroring if instance failover is not needed Clustering with RAID Solution Choice for Finance Application Solution Design Solutions Data Loss RPO=0 Fast RTO Failover Unit Inst DB Tab Auto Failover Read >1 Sites\ Copy Read Write Cluster SAN Replication DBM - Sync DBM - Async Shipping LogFor database level redundancy with acceptable data loss with minimal perf impact, Transactional asynchronous database mirroring is an optimal Replication choice Peer-Peer Use database snapshots at periodic intervals to provide Replication a readable snapshot of the data for reporting Low cost solution Reports Finance Scheduling Async Database Mirroring Omaha Datacenter Db Snapshot every hour Adding a Regional Datacenter Into the Mix Solution Design Regulatory and compliance requirements drive the need for having a additional datacenter within a 10 mile radius to provide redundancy against site level failure. It is now required that all applications have the ability to failover to the regional datacenter across the river in Council Bluff The SLA need to be maintained for tier 1 applications even in the case of site failures Regional Site Solution Choices Solution Design Manufacturing Cluster with SAN Sync Mirroring no witness Reports Finance Scheduling Async Database Mirroring Db Snapshot every hour Log Shipping Omaha Datacenter CB Datacenter A Complete Topology Solution Design Considering the potential of floods and tornadoes destroying the regional data centers, Adventureworks Inc wants to maintain a disaster recovery site in San Antonio, TX The disaster recovery site has lower SLA requirements for all applications The manufacturing application can have an RPO of 1 hour The RTO is set at 4 hours Topology Diagram Manufacturing Sync Mirroring No witness Cluster with SAN Log Shipping Solution Design Scale Out and Availability Scenario Adventureworks is building a new web based order management system that allows customers from all over the world access the system and place orders The core group of customers are in Western Europe, South East Asia and North America Solution Design Requirements – – – – Geo Redundancy Data Locality High Availability Local Read-Scale Workload Characteristics – Mainly reads – Few writes Application Characteristics – Each user logging in connects to a particular server Partitioned based on user-id and region Writes from a user always happen on one server regardless of the region the user log in from – All reads redirected to the closest geolocation Reasonable tolerance for latency (5-10 minutes) Replication Topology Asia1 Peer Nodes Read-Only Servers Solution Design Asia2 Implementing and maintaining a HA solution Licensing Facts Passive servers are mirror, log shipped secondary and clustering passive node No license required on passive if it is truly passive A passive server does not need a license if the number of processors in the passive server is equal to or less than the number of processors in the active server. The passive server can take the duties of the active server for 30 days. Afterwards, it must be licensed accordingly. HA Features Edition Support Feature Express Workgroup Standard Database Mirroring 1 Failover Clustering 2 Enterprise Comments Advanced high availability solution that includes fast failover and automatic client redirection Backup Log-shipping Data backup and recovery solution Online System Changes Includes Hot Add Memory, dedicated administrative connection, and other online operations Online Indexing Online Restore Fast Recovery ₁Single thread redo ₂ Limited to 2 node cluster Database available when undo operations begin Summary There is no “one size fits all” solution Consider the cost\benefits\constraints and compare that to availability requirements of the organization to determine the best solution Use the charts to understand cost, benefit and constraints of the various SQL Server High Availability solutions TEST the solution to ensure it can meet the availability requirements and meet SLA’s SQL Server Word of the Day Wednesday, May 13 DATA COMPRESSION *Game cards may be picked up at the SQL Server booths in the TLC Additional Resources • • Team Forum: http://social.msdn.microsoft.com/Forums/enUS/sqldisasterrecovery/threads Other: http://sqlcat.com/ External Resources http://www.microsoft.com/sqlserver/2008/en/us/hi gh-availability.aspxURL http://sqlcat.com/ SQL Server 2008 Business Value Calculator: www.moresqlserver.com Resources www.microsoft.com/teched www.microsoft.com/learning Sessions On-Demand & Community Microsoft Certification & Training Resources http://microsoft.com/technet http://microsoft.com/msdn Resources for IT Professionals Resources for Developers www.microsoft.com/learning Microsoft Certification and Training Resources Related Content Breakout Sessions DAT302 All You Need to Know about Microsoft SQL Server 2008 Failover Clusters DAT318 Microsoft SQL Server 2008 Virtualization Considerations and Best Practices DAT322 Tips and Tricks for Successful Database Mirroring Deployments with Microsoft SQL Server Hands On Lab DAT07-HOL Microsoft SQL Server 2008 Peer-to-Peer Replication DAT08-HOL Microsoft SQL Server 2008 Table Index and Partitioning DAT12-HOL Microsoft SQL Server 2008 Database Mirroring, Part 1 DAT13-HOL Microsoft SQL Server 2008 Database Mirroring, Part 2 DAT16-HOL Using Microsoft SQL Server 2008 Resource Governor for Predictable Performance Complete an evaluation on CommNet and enter to win! © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.