Gopal Ashok
Program Manager
Microsoft Corp

What is this talk about?
- Ensuring IT services and operational continuity in the enterprise
- Protecting mission-critical SQL Server databases using Always On technologies
- Analysis, solution design, implementation, testing, and maintenance
- Deployments and best practices

Defining HA and DR
- High availability is a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period.
- Disaster recovery involves the processes and procedures designed to restore business operations after a natural or human-induced disaster. It typically involves providing redundancy spanning multiple sites or geographic regions.

Availability is defined in terms of service level agreements (SLAs) covering recovery time and data loss during unplanned downtime:
- Recovery Time Objective (RTO), guided by availability requirements: how much downtime can you tolerate?
- Recovery Point Objective (RPO), guided by the criticality of application data: how much data can you lose?

Availability class   Acceptable downtime (hrs/yr) or RTO   Acceptable data loss (time of last copy) or RPO
Tier 1               >99.99% (1 hr or less)                5 min or less
Tier 2               99.9% to 99.99% (1 to 8.5 hrs)        5 min to 8.5 hrs
Tier 3               <99.9% (hours to days)                Hours to days

Protection Levels
- Local HA: protection against resource failures (machine, database corruption, disk, resource bottlenecks); location redundancy: building, < 10 miles
- Regional DR: protection against network outages and site failures; location redundancy: city, county, < 100 miles
- Geographic DR: protection against natural disasters; location redundancy: state, country, > 100 miles

SQL Server High Availability Planning
Planning is a cycle of analysis, solution design, implementation, testing, and maintenance.
- Analysis: the application tiers serviced by the databases; the required protection levels (local HA, regional DR, geographic DR); the causes of database downtime
- Solution design: What solutions exist? What are the characteristics and cost of each solution?
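The downtime figures attached to the availability tiers follow from simple arithmetic: the downtime budget is the unavailable fraction of the measurement period. A minimal sketch in Python, assuming a 365-day (8,760-hour) year and ignoring leap years; the function name is illustrative only:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime (hours) permitted over the period at the given availability."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    # The unavailable fraction of the period is (100 - availability) percent.
    return period_hours * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for pct in (99.99, 99.9, 99.0):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% availability -> {hours:.2f} hrs/yr (~{hours * 60:.0f} min)")
```

At 99.99% the budget is about 0.88 hours (roughly 53 minutes) per year, which is why Tier 1 means about an hour of downtime or less; at 99.9% it is 8.76 hours per year.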
- Implementation: What are the deployment steps and best practices?

Database Downtime Drivers

Solution Design
Understand the available technology options and their characteristics before making a decision: solution architecture, HA capabilities, limitations and caveats, and cost.

Always On Technologies
Always On technologies provide a full range of options to minimize operational downtime and maintain appropriate levels of application availability.

Always On Solution Characteristics
Log shipping, database mirroring (DBM), failover clustering, transactional replication, and peer-to-peer replication are compared on RPO, redundancy and utilization, failover, and cost (hardware, application performance impact, manageability), and on whether each offers no data loss (RPO = 0), its failover unit (instance, database, or table), automatic failover (RTO), synchronous or asynchronous operation, multiple secondaries, and read or write access on the secondary.
* Log shipping and database mirroring can provide point-in-time read capability using STANDBY mode or database snapshots, respectively.
** Database mirroring provides the fastest failover to a hot secondary.
*** Depends on SAN technology.

Increasing Availability: ServiceU
ServiceU provides solutions for reserved-seat ticketing, box office management, event management, and online payments. No service = no revenue.
Planned downtime:
- RPO = 0 (no data loss)
- RTO = 60 seconds maximum. Some database changes may require downtime longer than 60 seconds; in those cases every effort is made to minimize the service interruption.
Unplanned downtime:
- Loss of a database server: RPO = 0 (no data loss); RTO = 60 seconds maximum
- Loss of the primary data center, or of the entire database storage unit in the primary data center: RPO = 3 minutes maximum; RTO = 15 minutes total, including evaluation of the issue

ServiceU High Availability Architecture
- Basic principle: redundancy for all components
- 3-node cluster: redundancy during a single-node failure, patching, etc.
- No Majority: Disk Only quorum model: availability during multi-node failure
- No automatic failback to the preferred node

ServiceU Disaster Recovery Architecture
- Using log shipping to set up database mirroring

Upgrading to SQL Server 2008
- Started from Windows Server 2003/SQL Server 2005
- Upgraded both the OS and SQL Server to 2008
- Had to do this with very little downtime. How much? Let's find out!

Primary Site Upgrade Process
1. Application switchover to a temporary SQL Server 2008 cluster on Windows Server 2008:
   - Establish asynchronous DBM from SQL Server 2005 to SQL Server 2008
   - Block users
   - Switch DBM to synchronous mode
   - DBM failover
   - Redirection
   - Remove DBM
   Total end-user downtime: 10 minutes
2. Upgraded the primary cluster to 2008 and repeated the steps above. Downtime: 6 minutes

Windows Server 2008 & SQL Server 2008: Better Together
- Failover clustering: rolling upgrade and patching; up to 16 nodes
- Database mirroring: automatic recovery from page corruption; log stream compression; faster recovery on failover
- Resource Governor: manage SQL Server workloads and resources by specifying limits on resource consumption
- Backup compression: reduce backup and restore time
- Log shipping: sub-minute log shipping; backup compression
- Replication: hot add of new nodes in peer-to-peer replication; improved performance over WAN links

Database Mirroring
- Compression: benefit vs. cost
- Automatic page repair
- Rolling upgrade using mirroring

Failure Is Not an Option: bWin
- Sports betting, soft and skill games
- 1 million bets per day on more than 90 sports
- The mission: failure is not an option, and money is not a problem. Rather lose availability and performance than data.
- Environment: 100+ TB of data, 850+ databases, 100 instances, 450K SQL statements/sec

bWin High Availability Architecture
- Datacenter A: principal, 32 IA64 dual-core CPUs
- Datacenter B: mirror, 32 IA64 single-core CPUs
- Database mirroring from principal to mirror
- Log shipping with a 1-hour delay and log shipping with no delay, each with log backup and database backup file servers
- Hardware: 64 network ports (1 Gbps); 400 local SAS drives on 16 RAID controllers (for OS, TempDB, and log files: low latency); 16 HBAs for a SAN system with 256 disks and 256 GB cache

Scale Out and Availability Scenario
Adventureworks is building a new web-based order management system that allows customers from all over the world to access the system and place orders. The core customer groups are in Western Europe, Southeast Asia, and North America.
- Requirements: geo-redundancy, data locality, high availability, local read scale
- Workload characteristics: mainly reads, few writes
- Application characteristics:
  - Each user logging in connects to a particular server, partitioned by user ID and region
  - Writes from a user always happen on one server, regardless of the region the user logs in from
  - All reads are redirected to the closest geolocation
  - Reasonable tolerance for latency (5 to 10 minutes)

Replication Topology: peer nodes and read-only servers (e.g., Asia1, Asia2)

Key to Success
It's not the vendor! It's not the technology! It's not the features!

Licensing Facts
- Passive servers are the mirror, the log-shipping secondary, and the passive node in a cluster.
- No license is required on a passive server if it is truly passive.
- A passive server does not need a license if it has the same number of processors as the active server, or fewer.
- The passive server can take over the duties of the active server for 30 days. Afterwards, it must be licensed accordingly.
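ServiceU's choice of the No Majority: Disk Only quorum model trades a single point of failure (the quorum disk) for survivability when several nodes fail at once. The simplified vote-counting sketch below, in Python, assumes one vote per node, that every node can reach the quorum disk, and no file-share witness; the enum and function names are hypothetical and this is not the Windows clustering API:

```python
from enum import Enum


class QuorumModel(Enum):
    NODE_MAJORITY = "Node Majority"
    NODE_AND_DISK_MAJORITY = "Node and Disk Majority"
    DISK_ONLY = "No Majority: Disk Only"


def has_quorum(model: QuorumModel, total_nodes: int, nodes_up: int,
               disk_up: bool = True) -> bool:
    """Simplified check: does the surviving partition keep the cluster running?"""
    if model is QuorumModel.DISK_ONLY:
        # Only the quorum disk counts: one node that owns the disk is enough.
        return disk_up and nodes_up >= 1
    # Majority models: one vote per node, plus one for the witness disk if used.
    uses_disk = model is QuorumModel.NODE_AND_DISK_MAJORITY
    total_votes = total_nodes + (1 if uses_disk else 0)
    votes_up = nodes_up + (1 if uses_disk and disk_up else 0)
    return votes_up > total_votes // 2  # strict majority of all votes


def max_node_failures(model: QuorumModel, total_nodes: int) -> int:
    """Largest number of simultaneous node failures survived while the disk is online."""
    failures = 0
    while failures < total_nodes and has_quorum(model, total_nodes, total_nodes - failures - 1):
        failures += 1
    return failures


if __name__ == "__main__":
    for model in QuorumModel:
        print(f"{model.value}: 3-node cluster survives "
              f"{max_node_failures(model, 3)} node failure(s)")
```

Under these assumptions a 3-node cluster using node majority survives one node failure, while disk-only survives two, provided the quorum disk itself stays online.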
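The Adventureworks routing rules (each user's writes land on one server chosen by user ID and region, while reads go to the closest location) can be sketched in Python as below. The server names, region codes, and the in-memory partition map are hypothetical, "closest" is simplified to "same region as the login", and replication between the peers is not modeled:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    region: str
    writable: bool  # True: peer node that accepts writes; False: read-only server


# Hypothetical topology: one writable peer and one read-only server per core region.
TOPOLOGY = [
    Server("EU-Peer", "EU", True), Server("EU-Read", "EU", False),
    Server("ASIA-Peer", "ASIA", True), Server("ASIA-Read", "ASIA", False),
    Server("NA-Peer", "NA", True), Server("NA-Read", "NA", False),
]


def _find(region: str, writable: bool) -> Server:
    for server in TOPOLOGY:
        if server.region == region and server.writable == writable:
            return server
    raise LookupError(f"no {'peer' if writable else 'read-only'} server in {region!r}")


def route(home_region_by_user: dict[str, str], user_id: str,
          login_region: str, is_write: bool) -> str:
    """Pick a server name for one request from user_id."""
    if is_write:
        # Writes always go to the peer that owns the user's partition,
        # regardless of where the user logged in from.
        return _find(home_region_by_user[user_id], writable=True).name
    # Reads go to the closest copy (modeled here as the login region).
    return _find(login_region, writable=False).name


if __name__ == "__main__":
    homes = {"user42": "EU"}  # hypothetical user-to-home-region partition map
    print(route(homes, "user42", "ASIA", is_write=True))   # EU-Peer
    print(route(homes, "user42", "ASIA", is_write=False))  # ASIA-Read
```

Keeping each user's writes on a single peer avoids conflicting updates to the same rows on different peer-to-peer nodes; the read-only servers only need to catch up within the 5 to 10 minute latency the application tolerates.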
HA Features: Edition Support
Editions compared: Express, Workgroup, Standard, Enterprise.
- Database mirroring¹: advanced high availability solution that includes fast failover and automatic client redirection
- Failover clustering²
- Backup log shipping: data backup and recovery solution
- Online system changes: includes Hot Add Memory, dedicated administrative connection, and other online operations
- Online indexing
- Online restore
- Fast recovery: database available when undo operations begin
¹ Single-thread redo (Standard edition)
² Limited to a 2-node cluster (Standard edition)

Resources
- www.microsoft.com/teched: Sessions On-Demand & Community
- www.microsoft.com/learning: Microsoft Certification & Training Resources
- http://microsoft.com/technet: Resources for IT Professionals
- http://microsoft.com/msdn: Resources for Developers

Related Content
Breakout Sessions:
- DAT312 All You Needed to Know about Microsoft SQL Server 2008 Failover Clustering
Hands-on Labs:
- DAT12-HOL Microsoft SQL Server 2008 Database Mirroring, Part 1
- DAT12-HOL Microsoft SQL Server 2008 Database Mirroring, Part 2
- DAT05-HOL Microsoft SQL Server 2008 Data Snapshots
- DAT07-HOL Microsoft SQL Server 2008 Peer-to-Peer Replication
- DAT06-HOL Microsoft SQL Server 2008 Online Operations

Complete an evaluation on CommNet and enter to win an Xbox 360 Elite!

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.