Disk and I/O Tuning on SQL Server 2005

download report

Transcript Disk and I/O Tuning on SQL Server 2005

Strategies for Addressing Common
SQL Server Challenges
Presented by Kevin Kline
SQL Server MVP
Tuesday, July 14, 2007
Copyright © 2006 Quest Software
Speaker bio
About Quest Software
Accolades & Awards
Surviving the Data Avalanche
Disk IO Tuning
Speaker Bio – Kevin Kline
• Started in IT in 1986. BS in MIS in 1989 from
University of Alabama.
• Microsoft SQL Server MVP since 2004
• Author on 7 database books
– 1 on Oracle, 2 on SQL Server, 2 on SQL, 1 on DB design, 1 on DB
• President of PASS (www.sqlpass.org)
– Conference is next Sept 16-18 in Denver, CO
– Over 130 sessions on SQL Server, BI & Dev
• Blogs for SQLMag and SQLBlog
• Monthly columns in SQL Server Magazine and
Database Trends & Applications
• Worked for NASA, US Army, and Deloitte & Touche
before Quest Software.
Accolades & Awards
TechTarget 2006 Product of the Year with LiteSpeed
Best of Tech Ed award 2006 with Spotlight on SQL Server
SQL Server Magazine, Platinum reader’s choice award
SQL Server Magazine Readers Choice Awards, winner in 11
categories, 2005
No. 1 vendor in Distributed Data Management Facilities, IDC,
Microsoft ISV Partner of the Year, 2004
Microsoft TechEd Best of Show Winner, 2004
Microsoft TechEd Europe Best of Show Winner, 2005
No. 1 vendor in Application Management Software, Gartner
Dataquest, 2005
Jolt Productivity Award Winner
Network Computing Editor’s Choice Winner
No. 8 in the “Who’s Who in Enterprise Software,” Investor’s
Business Daily
What is a VLDB?
• Used to be a set size:
>100mb in the 1980’s
>100gb in the early 1990’s
>1tb in the early 2000’s
>10tb in the mid 2000’s
• Now, a more flexible definition for VLDB prevails:
– It’s a VLDB when you can no longer use “common and standard”
techniques for management and performance
How big are SQL Server databases today?
• Hundreds of multi-terabyte databases on SQL Server
are known. At least an equal number are unknown.
• An example Microsoft customer has a 270tb data
– Growing 3tb per day, with 1tb of deletions per day
– Definitely a VLDB
• An example EMC customer in the DOE has a 200tb
database (with lots of blob data)
– Possibly not a VLDB
Why do apps generate more data today?
• Compliance and Auditing Requirements
Especially for telecom, finance, and health industries
Requirements to retain data for 7-10 years
Internal processes in industries like insurance
• Real Time Business Intelligence
– Old ETL/Mainframe-derived data is losing ground
– Live streaming data is producing more accurate, more timely
• Data is both more granular and more records (e.g. drive-through
fast-food vendors in the restaurant industry)
• More sophisticated business processes
• Longer-term data is still kept on-line for better decision making
(e.g. the leap-year birthday phenomenon in the retail industry)
Why else is there more data today?
• Backup retention is a HUGE driver
• Recovery from tapes is losing ground to:
– High-availability alternatives (e.g. hot standby’s, clustering,
replicated data) are often easier than tapes:
• Old versions of application aren’t around
• Platform software (OS, database, drivers) aren’t the right
• Tape drives or other important hardware may not be around
– Tiered storage is a growing alternative
• Replace the “sliding window” of data with a tiered set of storage
based on performance and cost
• Recovery is the overlooked component of backup
strategy. TEST!!!
What other problems arise with VLDBs?
• Server proliferation becomes a big headache with
– Additional servers are needed for high-availability, replicated, or hot
standby servers
– Often, architecture divides processing onto a several servers
– ETL servers
• KISS principles dictate that when you have
something big and unwieldy, that you break it down
into to more manageable components
– Partitioning in SQL Server 2005
– Metadata server with partitioned data warehouses in BI VLDBs
• Loading data and, separately, cleaning data is an
enormous issue
Storage Strategies – Tiered Storage
• Tapes don’t get upgraded when the system gets
• Some progressive customers are using tiered storage
– Active application data is on state-of-the-art disk arrays (e.g.
RAID10, high RPM speeds)
– Near-term older data is kept on less expensive disk arrays (e.g.
RAID5, middle RPM speeds)
– Old, long-term data is kept on very inexpensive SATA drives (e.g.
high volume, low RPM speeds) and never deleted
Storage Strategies – Disk and IO Design
• Storage admin might give the DBA disk according to
the volume s/he needs, but not the IO
– Schema design is key; not always a knob to turn
– Schema design (done poorly) often contributes to data bloat
through poor normalization and/or poor choice of data types
– Misaligned block sizes
– LUNs set up for serial IO transaction load, not large block reads
– Exchange best practices are carried forward to SQL Server even
when they don’t apply!
– OLTP and OLAP have conflicting needs (IO/sec versus MB/sec)
• Remedy of first resort is often “throw more hardware
at it”
Storage Strategies – Personnel
• Applications need an overall architect!
• Personnel (storage admin, database admin) within a
company often don’t communicate
• Ensure that there’s one version of the “truth”
– Developers aren’t always thinking beyond the deadline
• Coding to business requirements, not performance
• Often thinking as in row-based mode
– Many companies could benefit from a database programmer role
• DBAs very often spend a LOT of time fixing the bad
code of other people
Storage Strategies – SAN
• Other apps may suck up all the cache.
• Other apps may suck up all of the IO on the SAN
• DBAs often data the volume of storage they ask for,
but don’t know to also ask for IO as well!
• Test your SAN to ensure it carries the load effectively
Managing VLDBs, part 1
• Backup
– You have to get clever with VLDBs, using something like database
– Date range backups; e.g. older data doesn’t change, so you don’t
need to back it up
– Partitioning, especially read-only partitions, can amplify your ability
to manage a SQL Server system – not only backups, but also
indexing and defragmentation
– Serialized may be better than parallel backup
– More thought needs to go into multi-database backup and recovery!
– More databases on a single instance means more IO contention
and shorter maintenance windows
• Indexing
– New on-line indexing opens up new capabilities; slower due to
trickle effect, but on-line through-out the process
Managing VLDBs, part 2
• Partitioning
– Multiple IO paths to the SAN
– Allows parallel indexing, backups, data cleanup operations, and
much more
– Be careful of updating statistics and DBCC commands. They run
per tables across all partitions.
– Sexy!
• Transaction Processing
– Views can help mitigate transactions that have run amok
– Views work on all versions of SQL Server
– Partitioning is a big help in SQL Server 2005
– Smart clustering can facilitate parallel loads, even to a single table
– Load jobs are long running, single-threaded jobs – thus tied to a
processor – meaning that no more load jobs than number of CPUs.
Quest Management Suite for SQL Server
The Basics of I/O
1. A single fixed disk is inadequate except for the
simplest needs
2. Database applications require a Redundant Array of
Inexpensive Disks (RAID) for:
a. Fault tolerance
b. Availability
c. Speed
d. Different levels offer different pros/cons
RAID Level 5
• Pros
– Highest Read data transaction rate; Medium Write data transaction rate
– Low ratio of parity disks to data disks means high efficiency
– Good aggregate transfer rate
• Cons
– Disk failure has a medium impact on throughput; Most complex controller
– Difficult to rebuild in the event of a disk failure (compared to RAID 1)
– Individual block data transfer rate same as single disk
RAID Level 1
• Pros
– One Write or two Reads possible per mirrored pair
– 100% redundancy of data
– RAID 1 can (possibly) sustain multiple simultaneous drive failures
Simplest RAID storage subsystem design
• Cons
– High disk overhead (100%)
– Cost
RAID Level 10 (a.k.a. 1 + 0)
• Pros
– RAID 10 is implemented as a striped array whose segments are RAID 1
– RAID 10 has the same fault tolerance as RAID level 1
RAID 10 has the same overhead for fault-tolerance as mirroring alone
– High I/O rates are achieved by striping RAID 1 segments
– RAID 10 array can (possibly) sustain multiple simultaneous drive failures
– Excellent solution for sites that would have otherwise go with RAID 1 but
need some additional performance boost
SAN (Storage Area Network)
• Pros
– Supports
multiple systems
– Newest
matches RAID1
/ RAID1+0
• Cons
– Expense and
– Must measure
for bandwidth
requirements of
internal RAID,
and I/O
Overview by Analogy
Monitoring Disk Performance
1. Physical Disk
2. Logical Disk
Monitoring Raw Disk Physical Performance
Avg. Disk sec/Read and Avg. Disk sec/Write
Transaction Log Access
Avg disk writes/sec should be <= 1 msec (with array
accelerator enabled)
Database Access
Avg disk reads/sec should be <= 15-20 msec
Avg disk writes/sec should be <= 1 msec (with array
accelerator enabled)
Remember checkpointing in your calculations!
Monitoring Raw I/O Physical Performance
1. Counters - Disk Transfers/sec, Disk Reads/sec, and
Disk Writes/sec
2. Calculate the nbr of transfers/sec for a single drive:
a. First divide the number of I/O operations/sec by
number of disk drives
b. Then factor in appropriate RAID overhead
3. You shouldn’t have more I/O requests (disk
transfers)/sec per disk drive:
8KB I/O Requests
Sequential Write
Random Read/Write
10K RPM 9-72 GB
15K RPM 9–18 GB
Estimating Average I/O
1. Collect long-term averages of I/O counters (Disk
Transfers/sec, Disk Reads/sec, and Disk Writes/sec)
2. Use the following equations to calculate I/Os per second
per disk drive:
a. I/Os per sec. per drive w/RAID 1 = (Disk Reads/sec + 2*Disk
Writes /sec)/(nbr drives in volume)
b. I/Os per sec. per drive w/RAID 5 = (Disk Reads/sec + 4*Disk
Writes /sec)/(nbr drives in volume)
3. Repeat for each logical volume. (Remember
4. If your values don’t equal or exceed the values on the
previous slide, increase speeds by:
a. Adding drives to the volume
b. Getting faster drives
Queue Lengths
1. Counters - Avg. Disk Queue Length and Current Disk
Queue Length
a. Avg Disk Queue <= 2 per disk drive in volume
b. Calculate by dividing queue length by number of drives in
2. Example:
a. In a 12-drive array, max queued disk request = 22 and
average queued disk requests = 8.25
b. Do the math for max: 22 (max queued requests) divided by
12 (disks in array) = 1.83 queued requests per disk during
peak. We’re ok since we’re <= 2.
Do the math for avg: 8.25 (avg queued requests) divided by
12 (disks in array) = 0.69 queued requests per disk on
average. Again, we’re ok since we’re <= 2.
Disk Time
1. Counters - % Disk Time (%DT), % Disk Read Time
(%DRT), and % Disk Write Time (%DWT)
a. Use %DT with % Processor Time to determine time spent
executing I/O requests and processing non-idle threads.
b. Use %DRT and %DWT to understand types of I/O performed
2. Goal is the have most time spent processing non-idle
threads (i.e. %DT and % Processor Time >= 90).
3. If %DT and % Processor Time are drastically different,
then there’s usually a bottleneck.
Database I/O
1. Counters – Page Reads/sec, Page Requests/sec,
Page Writes/sec, and Readahead Pages/sec
2. Page Reads/sec
a. If consistently high, it may indicate low memory allocation or
an insufficient disk drive subsystem. Improve by optimizing
queries, using indexes, and/or redesigning database
b. Related to, but not the same as, the Reads/sec reported by
the Logical Disk or Physical Disk objects
3. Page Writes/Sec: Ratio of Page Reads/sec to Page
Writes/sec typically ranges from 5:1 and higher in OLTP
4. Readahead Pages/Sec
a. Included in Page Reads/sec value
b. Performs full extent reads of 8 8k pages (64k per read)
Tuning I/O
1. When bottlenecking on too much I/O:
a. Tuning queries (reads) or transactions (writes)
b. Tuning or adding indexes
Tuning fill factor
d. Placing tables and/or indexes in separate file groups on
separate drives
e. Partitioning tables
2. Hardware solutions include:
a. Adding spindles (reads) or controllers (writes)
b. Adding or upgrading drive speed
Adding or upgrading controller cache. (However, beware
write cache without battery backup.)
d. Adding memory or moving to 64-bit memory.
Trending and Forecasting
1. Trending and forecasting is hard work!
2. Create a tracking table to store:
a. Number of records in each table
b. Amount of data pages and index pages, or space consumed
Track I/O per table using fn_virtualfilestats
d. Run a daily job to capture data
3. Perform analysis:
a. Export tracking data to Excel
b. Forecast and graph off of data in worksheet
4. Go back to step 2d and repeat
Disk Rules of Thumb for Better Performance
1. Put SQL Server data devices on a non-boot disk
2. Put logs and data on separate volumes and, if possible, on
independent SCSI channels
3. Pre-size your data and log files; Don’t rely on AUTOGROW
4. RAID 1 and RAID1+0 are much better than RAID5
5. Tune TEMPDB separately
6. Create 1 data file (per filegroup) for physical CPU on the server
7. Create data files all the same size per database
8. Add spindles for read speed, controllers for write speed
9. Partitioning … for the highly stressed database
10. Monitor, tune, repeat…
Quest Management Suite for SQL Server
• LiteSpeed for SQL Server Enterprise – Advanced
compression and encryption
– Efficiently manage SQL Server backup and recovery operations
• Spotlight on SQL Server Enterprise – Real-time
performance diagnostics
– Diagnose and resolve SQL Server performance Issues
• Change Director – 24x7 SQL Server change
– Track database changes on any SQL Server
• Capacity Manager for SQL Server – Automated
storage and resource planning
– Automate the process of capacity and resource planning
Quest Product Bundle
Resources, VLDB
• Download my white papers at Quest.com, read my
blog at http://www.sqlmag.com and http://sqlblog.com
• See the full panel discussion from the PASS 2006
Summit at
• Project REAL at
• SQL Server Customer Advisory Blog at
Resources, Disk IO Tuning
• See my webcast and read my article on SQL Server
Magazine called ‘Bare Metal Tuning’ to learn about
file placement, RAID comparisons, etc.
• Check out www.baarf.com and www.SQL-ServerPerformance.com
• Storage Top 10 Best Practices at
Call to Action – Next Steps
• Learn more about Quest Management Suite for SQL
Server: http://www.quest.com/sql_server/
– Download trials
– Read white papers
– Review case studies
• Email us with your questions: [email protected]
Send questions to me at: [email protected]
Send broader technical questions to: [email protected]
Send Sales questions to: [email protected]
For sales & licensing questions, call:
– UK: +44.1628.518000