Building Mission Critical Systems with SQL Server 2005

Download Report

Transcript Building Mission Critical Systems with SQL Server 2005

Building Highly Available Systems with SQL Server™ 2005

Availability

What does it mean to you?

Can your customers get done, what they need to get done, when they need to do it?

 Why not?

 Site is unavailable  System is unavailable  Database is unavailable  Database is

partially un

available  Table is unavailable  Data is unavailable

24x7x365

Take Advantage When?

How much work to leverage the technology?

Upgrade Immediate Minimal Work to Leverage Design and Architect

Partial Database Availability

Online Piecemeal Restore

Instant File Initialization

Fast Recovery

Online Index Operations

When Criteria Met 

Snapshot Isolation

Statement-level Snapshot 

Snapshot Isolation

Transaction-level Snapshot (RO) 

Failover Clustering

Database Mirroring

Log Shipping

Database Snapshots

Online Index Operations

When Criteria NOT Met (minority) 

Snapshot Isolation

With Update Conflict Detection 

Replication

 

Improving Availability from Installation to Design Availability in Layers to minimize downtime and data loss

Improved Data Availability without Requiring Standby

Downtime is reduced and/or prevented for these barriers:

 Database is

partially un

available  Table is unavailable  Data is unavailable

Barriers to Availability

Isolated Failures  Continuing to work with isolated failures  Limiting the scope of failure  Partial Database Availability  Online Piecemeal Restore  Supporting Technology  Instant File Initialization  How do they work?

What happens when…

 Disks Fail  In SQL Server ™ 2000  Database is marked suspect  Users are unable to access the database  In SQL Server ™ 2005   Filegroup is marked offline Users are able to access undamaged data

What happens when…

 Recovery begins  In SQL Server ™ 2000  

Database

is in a restoring state Users are

unable

to access the database   File needs to be recreated and zero initialized File Restore can proceed –

offline

 In SQL Server ™ 2005 

Filegroup

is in a restoring state    Users are

able

to access undamaged data File can be recreated with instant file Initialization Piecemeal Restore can proceed –

online

How is This Possible?

  Fine grained operations are based on “functional partitioning” Partitioning – in this sense – does not require Partitioned Tables  Partitioned Tables benefit significantly from fine grained operations  Partitioning for fine grained operations requires secondary, non-primary data files where data is strategically placed  Recovery of your damaged devices can be prioritized and then the database can be brought online in stages

Functional Partitioning

Strategies to separate Objects/Data  Related Object-groupings  Separate tables – strategically placed on different filegroups  Time-based data placement/partitioning  Structures designed for sliding window scenario  List-based groupings/partitioning  Range-based partitioning based on complete lists   To fully leverage Partial Database Availability for partitioned objects – use Partitioned Tables Partitioned Tables – new feature in SQL Server ™ 2005 to further simplify the process of building large data warehouses

Benefits of Partitioning

 Speed in managing sliding window  Partition manipulation outside of active table  Piecemeal Backup  Backup active components more frequently, inactive less frequently  Partial Database Availability  If a filegroup becomes unavailable the undamaged data remains available  Online Piecemeal Restore  During the restore, the undamaged data remains available

Partial Database Availability

Improving Availability for Isolated Disaster  Undamaged data remains available while damaged data is inaccessible  File Status shown in sys.database_files catalog view  Page Errors written to suspect_pages table in msdb  Agent alerts:  Notification of the damaged file  Can take the database offline, if desired  Can automate the restore, for read-only data

Database Components

TicketSalesD B Primary File1 File2

 Database 

consists of…

Filegroups

consist of…

 Files 

consist of…

Extents

consist of…

 Pages

consist of data

2004 2003 2002 2001 Log File3 File4 File5 File6 File Header 0 1 2 3

extent

0 7 4 5 6 8 9 10 11

extent

1 15 12 13 14 16 17 18 19

extent

2 23 20 21 22 24 25 26 27

extent

3 31 28 29 … 30

Improving Data Availability, Part I Partial Database Availability

Name Title Company

Online Piecemeal Restore

Improving Availability during Recovery      Readonly filegroups can be restored without rolling forward log changes  Almost any component (page, file, filegroup) can be restored – ONLINE If a page is damaged – restore only that page from a file, filegroup or database backup If a file is damaged – restore only that file from a file, filegroup or database backup If a filegroup is damaged – restore only that filegroup from a filegroup or database backup Users can access the database during the restore

Instant File Initialization

Improving Availability by Reducing Downtime   SQL Server™ 2000  All data and log files must be zero initialized  Downtime during recovery negatively impacted by the file creation phase of restore SQL Server™ 2005  Only log files must be zero initialized  Downtime during recovery significantly reduced by skipping zero initialization during the file creation phase of restore  Not only a benefit to Restore    Database Creation All Restores: File, Filegroup and Database Restores Database File Changes: autogrow, manual resizing

Improving Data Availability, Part I Online Piecemeal Restore

Name Title Company

Summary: Isolated Failures

Technology Partial Database Availability Instant File Initialization Online Piecemeal Restore Improves

Data Availability

Undamaged data/partitions remains available 

Recovery Time

Recover only that which is damaged  

File, Filegroup, and Database Restore

Missing files are created quickly  

Database Creation Time

Files are not zero-initialized

Autogrow and Manual Growth time

Additional space is quickly added

Recovery Time

– Less time to create files  

Data Availability

Undamaged data/partitions remains available during recovery

Recovery Time

Recover only that which is damaged –

online

When

Upgrade Immediate Upgrade Immediate Upgrade Immediate

Barriers to Availability

Concurrency Requirements  Database is available but the application/user cannot complete required operations  What about operational impacts?

 Maintenance Operations which cause blocking  New Online Index Rebuilds  What about application impact?

 Poorly designed and/or long running transactions  Varying data access patterns  New Snapshot Isolation options

What happens when…

 Indexes need to be rebuilt  In SQL Server™ 2000  Index rebuilds require an exclusive table-level lock, resulting in offline rebuilds  Users are

unable

to access the table  In SQL Server™ 2005  Rebuilds of an index can be performed online if a few simple criteria are met  Users are

able

to access the table

Online Index Operations

Improving Concurrency during Index Maintenance  SQL Server™ 2000   Offline Index Rebuilds ; table data is unavailable during operation  SQL Server™ 2005  Includes all of the above offline operations, plus…  New ALTER INDEX…REBUILD:   ONLINE – allows concurrent user access (queries as well as modifications) to the index during rebuild OFFLINE – works using locks (same as SQL Server™ 2000)  Rebuild options: DBCC DBREINDEX and CREATE with DROP_EXISTING If online is not possible by default, consider design alternatives to fully leverage online index rebuilds

Online Index Rebuilds

Name Title Company

What happens when…

 Readers and Writers desire the same data  In SQL Server™ 2000 

Locking

is used to guarantee the intended level of isolation  Users must wait to access locked data  Concurrency and performance compromised   Correctness is compromised when lower isolation levels are used to avoid locking In SQL Server™ 2005  

Locking

OR

Versioning

can be used to guarantee a variety of isolation levels With

versioning

, Readers won’t block writers and writers won’t block readers   Performance improved if contention was primary bottleneck Correctness is not compromised due to use of lower isolation levels

Snapshot Isolation

Improving Concurrency in Mixed Workloads  SQL Server™ 2000    Isolation implemented solely through locking Mixed workloads may experience:  Concurrency problems due to blocking  The Inconsistent Analysis problem SQL Server™ 2005  Isolation implemented using locking and versioning  Mixed workloads can improve read consistency and performance using:  Read committed with Statement-level snapshot to improve statement-level consistency  Snapshot Isolation to improve transaction-level consistency

Snapshot Isolation

Name Title Company

Summary: Concurrency Requirements

Technology Online Index Operations Snapshot Isolation

Statement-level Snapshot

Snapshot Isolation

Transaction-level Snapshot for read-consistency Transaction-level Snapshot for Update Conflict Resolution

Improves

 

Table concurrency

tables being rebuilt remain available

Downtime due to Maintenance

no longer required for majority of indexes 

Row concurrency

locked rows prior and consistent version remain accessible 

Accuracy

long running aggregates/statements use consistent version from

statement

start 

Analysis/Query Time

Queries do not wait!

Row concurrency

locked rows prior and consistent version remain accessible 

Accuracy

long running aggregates/statements use consistent version from

transaction

start 

Analysis/Query Time

Queries do not wait!

When

Minimal Work to Leverage Design and Architect Minimal Work to Leverage Minimal Work to Leverage Design and Architect

Improved Availability with Standby Technologies

Downtime is reduced and/or prevented for these barriers:

 Site is unavailable  System is unavailable  Database is unavailable

Barriers to Availability

Catastrophic Failures     Database is completely unavailable Server is unavailable Site is unavailable Standby Technologies  Failover Clustering  Database Mirroring  Replication  Log Shipping  Supporting Technology  Fast Recovery  How do they work?

Failover Clustering

Server-level Redundancy     Established High Availability Technology Hot Standby: Automatic Detection and Automatic Failover  No work loss exposure and no direct impact to workload Protects against node failures Geographically Dispersed Failover Clusters with approved hardware

Failover Cluster

 Recovery on failover improved by Fast Recovery

Failover Clustering

New for SQL Server™ 2005        Faster Failover through Fast Recovery Supports up to an 8-node Failover Cluster with Enterprise Edition Supports up to a 2-node Failover Cluster with Standard Edition Supports mounted volumes for better explicit disk usage – helps in server consolidation Supports dynamic AWE for better memory utilization Unattended setup All SQL Server data services participate  Database Engine, SQL Server Agent, Full-Text Search  Analysis Services – Now has multiple instances

Fast Recovery

Improving Availability by Reducing Downtime  Not only beneficial to Failover Clustering  On every server startup, Restart Recovery runs to guarantee consistency  Restart Recovery has two phases:  REDO: rolls forward committed transactions  UNDO: rolls back any incomplete transactions  In SQL Server™ 2005, users are allowed access after REDO SQL Server™ 2005

Redo

ONLINE SQL Server™ 2000

Undo

ONLINE

Database Mirroring

Database-level Redundancy  Upcoming High Availability Technology  Released for testing and prototyping in SQL Server™ 2005 RTM  Certified for Production Use in the first half of 2006  Supports three configurations:  High Availability  High Protection  High Performance

Database Mirroring

Technology Overview   Principal Database handles user activity Mirror Database receives changes via secure, dedicated TCP channel  Server does NOT require a license if the server acts solely for redundancy   Optional Witness Server  Lightweight mechanism to help provide quorum  Can run on any SQL Server Edition Supports three configurations:    High Availability High Protection High Performance

Database Mirroring

Basic Principal of Synchronous Mirroring

Commit Acknowledge Acknowledge Write to Local Log DB Log Transmit to Mirror Committed in Log Write to Remote Log Log Constantly Redoing on Mirror DB

Database Mirroring

Configuration Summary 

High Availability Automatic Detection

Automatic Failover

Uses synchronous form of mirroring High Protection

No Automatic Detection

Manual Failover

Uses synchronous form of mirroring

High Performance No Automatic Detection

Manual Failover

Uses asynchronous form of mirroring

   

Requires Witness Principal performance is affected by network speed

 

Does not require Witness Principal performance is

 

Does not require Witness Principal performance is NOT affected by

Mirror database is available for read-only analysis through

distance

the use of Database Snapshots

network speed and distance

Database Mirroring

Name Title Company

Database Scale Out

Peer to Peer Replication  Identical databases continuously synchronize in near real time  Scale query workloads beyond what’s possible with a single database Example: Distributed Trading System Chicago London Tokyo

Availability through Scalability

Peer to Peer Replication  Enables load-balancing and improved availability through scalability  Database failures shouldn’t bring down the application system  Database upgrades should be done without outages  Individual databases can be taken online/offline and maintained without application downtime  Warm Standby  Small possibility of some data loss on failure

Peer-to-Peer Replication

 Based on Established Transaction Replication Technology  Based on Bi-directional Transactional Replication  All participants are peers   Schema is identical on all sites Publish the updates made on “their” data   Subscribe to others to pick up their changes No hierarchy as in “normal” transactional replication  A given set of data can be updated at only one site at a time  Data “ownership” is purely logical; does not prevent conflicts  SQL Server prevents a change from round-tripping

Peer to Peer Topology

London

Logreader Agent

Chicago

Logreader Agent

Peer to Peer Transactional Replication

Dist DB Distribution Agent

Tokyo

Logreader Agent Dist DB Distribution Agent Dist DB Distribution Agent

Peer-to-Peer Replication

Name Title Company

Log Shipping

Database-level Redundancy   Established High Availability Technology Supports multiple secondary servers  Secondary for Failover  Secondary for Reporting  Secondary with delay for Human Error Recovery   Can be combined with other technologies such as Failover Clustering and Database Mirroring New for SQL Server™ 2005  Integration in SQL Server Management Studio  Log Shipping is not delayed during Database or Differential Backups

Summary: Standby Technologies

Technology Failover Clustering Database Mirroring

High Availability Configuration

Log Shipping Database Mirroring

High Protection Configuration

Database Mirroring

High Performance Configuration

Replication Partial Database Availability Backup and Restore Standby

Hot Hot Warm Warm Warm Warm Online Cold

Protection

Server Database Database Database

When

Minimal Work to Leverage Minimal Work to Leverage Minimal Work to Leverage Minimal Work to Leverage Database Database (Publication of objects) Filegroup/File Database Filegroup/File Minimal Work to Leverage Design and Architect Upgrade Immediate Upgrade Immediate

Barriers to Availability

People

I’m going to modify this data…right … here !

This job would be great if it weren’t for… …the users …the staff …us

Barriers to Availability

Many more barriers than discussed Only some are addressable by database technology Be sure to consider people, planning, procedures and training

 Microsoft SQL Server™ 2005 gives you greatly improved tools to overcome these barriers to availability:  Database Server Failure or Disaster      Isolated Disk Failure Data Access Concurrency Limitations Database Maintenance and Operations Availability at Scale User or Application Error

Summary

   SQL Server™ 2005 offers greater availability – immediately  Many technologies available just by upgrading!

 Some architected/implemented over time SQL Server™ 2005 is more Available  Partially damaged databases remain available  Databases being recovered remain available  Instant File Initialization, Fast Recovery  New and Improved Replication Alternatives SQL Server™ 2005 is more Robust

Take Advantage When?

How much work to leverage the technology?

Upgrade Immediate Minimal Work to Leverage Design and Architect

Partial Database Availability

Online Piecemeal Restore

Instant File Initialization

Fast Recovery

Online Index Operations

When Criteria Met 

Snapshot Isolation

Statement-level Snapshot 

Snapshot Isolation

Transaction-level Snapshot (RO) 

Failover Clustering

Database Mirroring

Log Shipping

Database Snapshots

Online Index Operations

When Criteria NOT Met (minority) 

Snapshot Isolation

With Update Conflict Detection 

Replication

 

Improving Availability from Installation to Design Availability in Layers to minimize downtime and data loss