Disk and I/O Tuning on SQL Server 2005

Transcript Disk and I/O Tuning on SQL Server 2005

Disaster Recovery Mistakes
Jason Hall – Quest Software
DASSUG – 10.11.2007
Copyright © 2006 Quest Software
Agenda
•
•
•
•
•
Top 10 Disaster Recovery Mistakes
Best and Worst practices
About Quest Software
Resources
Q&A
1
Top 10 Disaster Recovery Mistakes
10. Misuse of Database Recovery Models
•
•
•
One of the first things considered when
building/receiving a database
Wrong choice can have disastrous implications
Recovery models will impact:
– Performance
– Backup time
– DR Complexity
– Recoverability
– Administration
Recovery Model Options
• Simple
– Transactions are minimally logged.
– Log file is only used for data consistency
– No log backups
• Bulk-Logged
– Bulk transactions are minimally logged while other
transactions are fully logged
– Improves performance of bulk operations
– Log backups/restores possible but no point in time recovery
• Full
– All transactions are fully logged
– Performance impact on all transactions
– Point in time recovery possible
9. Poorly Designed Retention Policies
•
•
•
•
•
Must consider data stored locally/remotely/offsite
When a disaster occurs, where would you like to
backup to be
– Where is it?
When calculating restore time take into
consideration
– Network transfer time
– Tape load time
Potential to make a 30 minute restore (downtime)
last hours/days if tention policy not well defined
Much more than what is offered in SSMP
8. Backup files are safe after written to disk?
• True or False?
– Backup files are written in a proprietary format that is only
useful during a restore operation.
8. Backup files are safe after written to disk?
• True or False?
– Backup files are written in a proprietary format that is only
useful during a restore operation.
• Backup files are a plain text dump of data
• Anyone with access to notepad/wordpad can view
potentially sensitive data contained in a backup file
• Same care taken to secure database should be taken
to secure backup file
– Folder permissions
– Encryption disk/take
• Show me!!!
7. Not realizing that a backup does not contain
everything.
• Backup files contain quite a bit
• They do not contain everything needed in all disaster
scenarios
• Mappings of logins to users
–
–
–
–
Master..syslogins -> login | SID
Userdb..sysusers - > dbuser | SID
No guarantee that SID is consistent between servers
Database access could be comprimised
• Full text catalogue’s in SQL Server 2000
– In SQL 2005 they are included in a full backup
• Database restore is only the first step in a recovery
process
6. Ignoring (de-valuing) system databases
• System databases are just as important (if not more
so) than user databases
• Master contains server instance level information
– Login/password
– Configuration settings
• MSDB contains SQL Server job information
– Job code
– Job history
– Backup information
• Model contains a template for new database creation
– May be important if you’ve made changes to it
• TempDB?
5. Relying only on backup and recovery
• Must consider all types of failure
– Physical vs. Logical
Physical vs. Logical Failure
•
•
Must consider all types of failure
Obvious (physical)
–
–
–
•
Not so obvious (logical)
–
•
Disk or controller failure rendering database
unusable
Network outage
Power outage
DROP/TRUNCATE table (non logged
operations)
High Availability solutions typically protect against physical
failures but provide no protection against logical failures as
they result in multiple copies of bad data.
5. Relying only on backup and recovery
•
•
Must consider all types of failure
– Physical vs. Logical
High Availability solutions combined with backup
and recovery provide a robust DR solution
HA and BR
• High Availability refers to uptime. “How can I ensure
that my server is available 99.999 percent of the
time?”
– Clustering
– Disk Mirroring
– Replication
• Backup and Recovery will fill the gaps left when HA
solutions are impractical (cost) or will not suffice
• Log Shipping can be considered a combination of the
two
4. Thinking that a successful backup ensures
recoverability
• Many factors can affect recoverability
– Media failures
– Security
– Network
• Run RESTORE VERIFYONLY as often as possible
– Realize that this also does not guarantee recoverability
• Run periodic recovery scenarios that test all aspects
of a disaster recovery plan
– Find the backup files
– Restore system databases(what if SA password lost or
unknown?)
– Restore the user database(s)
• Potentially run orphaned users script
– Connect the application to the restored database
3. No forecasting or trending of backup size and
performance
• As the database grows, the space required to backup
will grow with it
• Just because you have the space to do backups
today does not mean you will 6 months from now
• Just because you can fit a backup into a
maintenance window today does not mean you will 6
months from now
• Trending and Forecasting are critical in maintaining a
reliable and stable DR architecture
2. Backup direct to tape
• No longer best practices
• Tape is:
– Cheap
– Slow
– Unreliable
• Disk is:
– Also cheap (and getting cheaper)
– Fast (and getting faster)
– Reliable (and getting more reliable)
• Tape is still a valuable piece of a DR architecture
– Backup Disk to Tape
– Only archive to tape after you have successful disk
backup(s)
1. DBA loses control
• Your neck is on the line
• Make sure that you can restore the database with no
involvement from anyone else
• When the system is down:
–
–
–
–
You shouldn’t need to call the “tape guy”
You shouldn’t need any usernames or passwords
You shouldn’t need to fill holes in documentation
You shouldn’t need to explain to your boss that a database
restore hasn’t even begun because someone else dropped
the ball
How to build a Disaster Recovery plan?
•
•
How much data can be lost?
What technologies are available?
– HA and BR
•
•
•
•
How much time do you have (SLA’s)?
Is data security an issue?
How can we leverage our DR plan to refresh other
environments?
How are we going to ensure that our plan is
working?
How Much Time/Data Can You Afford To Lose?
• Affected by Recovery Mode – Simple or Full
• Reduce time to backup & capture more data
granularity using the various backup techniques:
– Log
– Differential
– Full
• Never more backup jobs than (CPUs – 1)
• Speed backups by spreading backup I/O across
multiple files and/or partitions
• Speed recovery by keeping backups on local disks
NOTE: High-availability techniques should complement
but NOT a replacement to backup & recovery.
19
Do You Have An SLA For Backup/Restore?
• Make sure databases aren’t corrupt by regularly
applying DBCC tests.
• Perform backups with verification to ensure file is
written properly.
• Remember that more files in a recovery can take
more time.
– Lots of log file backups can add more time/complexity to recovery
• Make sure you test both backup AND RESTORE for
cumulative time to completion!
– Test should include full OS+ recovery.
– Test should include pulling from tape.
– Don’t sign the SLA until you KNOW how long it takes!
20
Are Your Backups Secure and Available?
• Native backup files are human-readable with a hex
editor.
• Don’t forget about major legislation:
– HIPAA
– Sarbanes-Oxley (SOX)
• Legislation and new business practices mean the
lifespan of archived data must be at least 7 years.
– We often don’t have the tape drives any more!
– Some progressive sites are using “tiered storage” so data is
never off-line
21
Spend A Lot Of Time Refreshing Other
Environments?
• Recovery is often about setting up other
environments like Dev, QA and staging servers
– Do you need to refreshed only specific data sets in your other
environments?
– Do you only need schema and database objects such as
procedures and functions?
• SQL Server 2005 can really help here:
– Partial backups & piecemeal restores
– Database Snapshots
– Online restore features
22
Do You Have Centralized Reporting?
• Managing multiple servers with multiple databases is
difficult.
– Takes too much time to check each server individually
– Multiple tasks associated with backups (DBCC’s, file verification,
etc) add to the work load
– Managing multiple jobs that conflict can be difficult as well
• You must always backup Master and MSDB, plus
any user databases. So you always have multiple
DBs per instance to backup.
23
Backup & Recovery Best Practices
• Don’t rely on maintenance plans. Perform DBCCs
check first, then backup if successful.
• Verify file is written properly.
– Backup to local disk first.
– Archive to tape later, or to other network archival site.
• Use FULL recovery mode.
• Full backup nightly with log backups throughout the
day.
• Make sure Master and MSDB are also backed up.
• Periodically test a full recovery scenario, as if the
server hardware was irrecoverable.
• Ensure recovery when SA password is lost or
unknown.
24
Things to Avoid (Worst Practices)
• Backing up directly to tape.
• Backup up to the same physical drive as the
databases.
• Failing to check for corruption before backup.
• Failing to check for successful backup (both the SQL
Server Agent job and disk write).
• Failing to verify the backup (RESTORE
VERIFYONLY).
• Failing to test the full restore process.
25
How Do You Fix Self-Inflicted Data Problems?
• Where possible, use a log reader/recovery tool to fix
a single transaction issue. (NOTE: Only works in
FULL recovery mode.)
• Wrap everything you do in a transaction!!
• Some update/delete issues occur in an environment
with lots of new inserts happening all the time. Your
only option then is to:
– Restore to a staging database
– Recovery the lost data back in to production
– Can takes hours!
26
Resources
• TechNet support webcast on Backup & Restore at
http://support.microsoft.com/kb/325257/en-us
• Partial restores on SQL Server at
http://support.microsoft.com/kb/321836/en-us
• File and filegroup backup and restore at
http://support.microsoft.com/kb/281122/en-us
• The SearchSQLServer.com Backup and Recovery
Learning Guide at
http://searchsqlserver.techtarget.com/generic/0,295582,sid87_g
ci1166473,00.html?track=sy41&asrc=RSS_RSS-3_41
27
Q&A
• Send questions to me at: [email protected]
THANK YOU!
28
Who is Quest?
Accolades & Awards
•
TechTarget 2006 Product of the Year with LiteSpeed
•
Best of Tech Ed award 2006 with Spotlight on SQL Server
•
SQL Server Magazine, Platinum reader’s choice award
•
SQL Server Magazine Readers Choice Awards, winner in 11
categories, 2005
•
No. 1 vendor in Distributed Data Management Facilities, IDC,
2005
•
Microsoft ISV Partner of the Year, 2004
•
Microsoft TechEd Best of Show Winner, 2004
•
Microsoft TechEd Europe Best of Show Winner, 2005
•
No. 1 vendor in Application Management Software, Gartner
Dataquest, 2005
•
Jolt Productivity Award Winner
•
Network Computing Editor’s Choice Winner
•
No. 8 in the “Who’s Who in Enterprise Software,” Investor’s
Business Daily
30
31
LiteSpeed for SQL Server
Industry standard backup
and recovery tool
90% compressed backup
files and 70% faster backup
and restore operations
Object Level Recovery
Integrated Reporting
Spotlight on SQL Server Enterprise
Real time performance
diagnostics
Topological view of SQL
Server environment
Intuitive UI
Point and click
resolutions
OS and Database
drilldowns
Performance Analysis for SQL Server
Historical Performance
Analysis and Trending
Dimensional Analysis
Automatic Baselining
Integrated Reporting
Lightweight Stealth Collect
Agent
SQL Optimizer for SQL Server
Industry leading tuning
algorithms provide improved
query performance
Programmatically create
and test alternative SQL
statements