Transcript Slide 1

SQL Server Storage and
Index Structures
Physical Data Organization
Indexes
B-Trees
SQL Server Data Access
Clustered and Non-Clustered
Creating, Altering, Dropping Indexes
Choosing your Indexes
Maintaining your Indexes
SQL Server Storage Hierarchy
• Database
• Extent
– 8 contiguous 8K data pages (64K per extent)
– Once an extent is full, the next record will take up a
whole additional extent.
– Pre-allocating space saves time.
SQL Server Storage Hierarchy
• Page
– 8K bytes (8,192)
– # of records/page varies with bytes/record
– Types of pages: data and index pages
– Page Split
• When a page becomes full, it splits
– New page allocated: ½ the data from the old page moves to the new one
• Rows
– Maximum of 8,060 bytes per row and 1,024 columns per table
B-tree Key Points to Remember
• Tree portion includes key attributes only
– ordered as in create index statement
• Keys packed in index pages
– Fewer bytes per key -> more keys per
page/extent -> fewer page faults per access.
• Clustered indexes have records at the leaves
– Records are in data pages
– Data pages are sequentially linked
• Non-clustered indexes point into the heap or the
tree portion of the clustered index
Create Index Statement
• Create [unique] [clustered | nonclustered]
index <index name> on <table or view>
(<col name> [asc | desc] [, …])
[include (<col name> [, …])]
[with (…)]
– with options: pad_index, fillfactor, ignore_dup_key,
drop_existing, statistics_norecompute,
sort_in_tempdb, online, allow_row_locks,
allow_page_locks, maxdop
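For example – a minimal sketch using hypothetical names (an Orders table with CustomerID and OrderDate columns):

    create nonclustered index IX_Orders_CustomerID
        on Orders (CustomerID asc, OrderDate desc);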
Create Index Details
• Asc/Desc
– Ascending or descending sort order for the index
• Include (cool!)
– Includes non-key columns in the leaf pages of a non-clustered index
• Allows very fast access to non-key attributes
• Useful with very large records – fewer page faults
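For example – a sketch with hypothetical names (Orders, CustomerID, OrderDate, TotalDue); the included columns ride along in the leaf pages without becoming part of the key:

    create nonclustered index IX_Orders_Customer_incl
        on Orders (CustomerID)
        include (OrderDate, TotalDue);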
Create Index “with” Details
• Pad_Index = (on|off)
– Applies the fill factor to the index’s non-leaf pages
• Fillfactor = <1% – 100%>
– Default: index pages are as full as possible, minus
room for about two records
– Fill factor only controls how full pages are when the
index is created or rebuilt
• Once a page splits, each page goes to about 50%
• Ignore_dup_key
– Softens the unique key constraint somewhat
• You still get an error message, but no rollback
• Useful for keeping values unique without failing the
transactions that try to insert duplicates
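For example – a sketch with hypothetical names (Parts, PartNumber):

    create unique nonclustered index IX_Parts_PartNumber
        on Parts (PartNumber)
        with (pad_index = on, fillfactor = 70, ignore_dup_key = on);
    -- leaf pages start 70% full, non-leaf pages get the same padding,
    -- and duplicate inserts produce a warning instead of a rollback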
Create Index “with” Details
• Drop_Existing
– Any existing index with the same name is
dropped as part of this create statement
• More efficient than a drop index followed by a create
for a clustered index, as there is no need to touch the
non-clustered indexes or data pages
• Statistics_NoRecompute
– Default: SQL Server automates the process of
updating the statistics on tables/indexes
– This option says you will maintain the stats yourself
• DON’T USE THIS!
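For example – a sketch assuming a clustered index named IX_Orders_OrderDate already exists on a hypothetical Orders table:

    create clustered index IX_Orders_OrderDate
        on Orders (OrderDate)
        with (drop_existing = on);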
Create Index “with” Details
• Sort_In_TempDB
– Only useful when tempdb is on a physically
separate drive
– Otherwise, reads/writes for the sort compete with the
reads/writes that write the data and index pages
• This makes sense if and only if you understand disk
writes – discussion
• Online
– Keeps the table available to users while the index is
created – sounds good, but ….!!
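For example – a sketch with hypothetical names (Orders, ShipCity); online = on requires an edition that supports online index operations:

    create nonclustered index IX_Orders_ShipCity
        on Orders (ShipCity)
        with (sort_in_tempdb = on, online = on);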
Create Index “with” Details
• Allow row/page locks
– Don’t change these unless you have a really good reason
• MAXDOP
– Overrides the system setting for max degree of
parallelism while building the index
• How many processors are used to construct an
index – MAXDOP sets a limit on how many
processors per operation.
– Compare and contrast these terms
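For example – a sketch with hypothetical names (Orders, EmployeeID):

    create nonclustered index IX_Orders_EmployeeID
        on Orders (EmployeeID)
        with (allow_row_locks = on, allow_page_locks = on, maxdop = 4);
    -- maxdop = 4 caps this index build at four processors regardless of
    -- the server-wide max degree of parallelism setting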
Create Index “with” Details
• ON
– Can store the index separately from the data
• Space for the index spread across drives
• I/O for indexes does not compete with physical data
retrieval
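For example – a sketch assuming a filegroup named IndexFileGroup has already been created on a separate drive:

    create nonclustered index IX_Orders_ShipRegion
        on Orders (ShipRegion)
        on IndexFileGroup;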
XML Indexes
• Indexes into XML data
– XML is VERY unstructured
– A column can be of type xml in SQL Server
– Create an index on the xml column
– Page 276 for more details
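For example – a sketch assuming a hypothetical Orders table that has a clustered primary key and an xml column named OrderDetailsXml:

    create primary xml index PXML_Orders_Details
        on Orders (OrderDetailsXml);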
Implied indexes
• created by some constraints
– Primary Key
– Unique
• Can easily end up with duplicate
constraints and not realize it
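For example – a sketch with hypothetical names; both constraints below create indexes behind the scenes, so adding your own unique index on PartName later would duplicate one of them:

    create table Parts
    (PartNumber int not null primary key,     -- implied (clustered) index
     PartName varchar(50) not null unique     -- implied unique non-clustered index
    );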
Deciding what indexes go where?
• Indexes speed access, but costly to
maintain
– Almost every update to the table requires altering
both the data pages and every index.
• All inserts and deletions affect all indexes
• Many updates will affect non-clustered indexes
• Sometimes less is more
– Not creating an index may sometimes be best
• Does the code for the transaction have a where clause? What
columns are used? Is a sort required?
Deciding what indexes go where?
• Selectivity
– Indexes, particularly non-clustered indexes,
are primarily beneficial in situations where
there is a reasonably HIGH LEVEL of
Selectivity within the index.
• % of values in column that are unique
• Higher percentage of unique values, the higher the
selectivity
– If 80% of parts are either ‘red’ or ‘green’, the
column is not very selective
Choosing Clustered Index
• Only one per table! - Choose wisely
• By default, the primary key creates the clustered index
– Do you really want your primary key to be the
clustered index?
– Option: create table MyFooExample
(column1 int identity
primary key nonclustered,
column2 ….
)
– Changing clustered index can be costly
• How long? Do I have enough space?
Clustered Indexes Pros & Cons
• Pros
– Clustered indexes best for queries where
columns in question will frequently be the
subject of
• RANGE query (e.g., between)
• Group by with max, min, count
– Search can go straight to particular point in
data and just keep reading sequentially from
there.
– Clustered indexes helpful with order by based
on clustered key
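For example – a sketch of the kind of range query that benefits, using hypothetical names (Orders, OrderDate):

    select OrderID, CustomerID, OrderDate
    from Orders
    where OrderDate between '2020-01-01' and '2020-03-31';
    -- with a clustered index on OrderDate, the search seeks to the first
    -- qualifying row and reads the data pages sequentially from there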
Clustered Indexes Pros & Cons
• The Cons – two situations
– Don’t use a clustered index on a column just
because it seems the thing to do (e.g., the
primary key default)
– Lots of inserts in non-sequential order
• Constant page splits, affecting data pages as well as
index pages
• Choose a clustered key that is going to be sequential
when inserting
• Or perhaps don’t use a clustered index at all?
Column Order Matters
• (P#, S#, Qty)
– P# S# together are primary key
• One index that includes all columns is not useful
in all situations!
– You only end up storing the data a second time.
• A clustered index on P#,S# is not the same as S#,P#
– P#,S# can look up P# fairly easily, but looking up S#
requires a linear search.
– S#,P# can look up S# fairly easily, but not P#.
• Note that even though the S#,P# key order means you can’t
look up P# quickly, there are some advantages in
including P# in the key.
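For example – a sketch using hypothetical names for the (P#, S#, Qty) table; only one clustered index can exist, and the key order decides which single-column lookups are cheap:

    -- key order (PartNo, SupplierNo): fast lookups by PartNo
    create clustered index IX_SP_PartSupplier on SP (PartNo, SupplierNo);
    -- key order (SupplierNo, PartNo): fast lookups by SupplierNo (pick one or the other)
    -- create clustered index IX_SP_SupplierPart on SP (SupplierNo, PartNo);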
Dropping Indexes
• Sometimes it makes sense to constantly reanalyze the situation and add indexes
– DON’T FORGET TO DROP INDEXES!!
• Big overhead for inserts and deletes
– Always ask yourself: “Can I get rid of any of
these?”
– DROP INDEX <table name>.<index name>
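For example – a sketch with hypothetical names; either form works (run only one):

    drop index Orders.IX_Orders_ShipCity;        -- older table.index form
    -- drop index IX_Orders_ShipCity on Orders;  -- newer form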
Index Tuning Wizard
• Hopefully you will evolve to the point you
don’t need to use this gadget
– But still can be quite handy
• Uses workload file generated using sql
server profiler (ch 19)
• Not ideal to depend on this tool, but it may
make some suggestions that you have not
thought of.
Maintaining Indexes
• Page Splits
– Insert/delete order and rate are critical
• Fragmentation
– Not OS fragmentation – e.g., the defrag tool
– Happens when the database grows, pages split,
and then data is eventually deleted.
– B-trees are great at maintaining balance on
insertions, but with deletes you can end up with
many pages containing a small # of records.
Fragmentation Problems
• Wasted space
– SQL Server allocates an extent at a time
• Could end up with an extent containing a single
page, with a single record.
• Thrashing (way too many disk hits)
– Could end up with page 1 of the data on one
extent, page 2 on another, page 3 on the
first, page 4 on another, ….
– Records all over the place
• A bit better for inserts but really bad for reads!
Identifying Fragmentation vs. Page Splits
• DBCC SHOWCONTIG
– Page 283
– Demo with northwind
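For example – a sketch against Northwind; the index name assumes the default PK_Orders primary key:

    use Northwind;
    dbcc showcontig ('Orders');              -- the table’s clustered index or heap
    dbcc showcontig ('Orders', PK_Orders);   -- a specific index

A scan density near 100% means little fragmentation; low values suggest a rebuild.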
DBREINDEX & Fillfactor
• DBCC DBREINDEX
– Can drop and rebuild an index
• Usually best to use drop_existing instead
– Completely rebuilds the index
– If you supply just the table name, rebuilds all
indexes on the table.
• Re-establishes base fillfactors, etc.
– Strongly recommend disallowing transactions
while doing this.
– Rebuilding is probably better.
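For example – a sketch against Northwind, again assuming the PK_Orders index name:

    use Northwind;
    dbcc dbreindex ('Orders');                  -- rebuild every index on the table
    dbcc dbreindex ('Orders', PK_Orders, 80);   -- rebuild one index with fillfactor 80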
Summary
• Clustered indexes usually faster than nonclustered
• Only place non-clustered indexes on
columns with high selectivity (>95% of
rows are unique on that column)
• All data manipulation language statements
can benefit from indexes, but inserts,
deletes, and updates are slowed.
• Indexes take up space and require page
hits.
Summary
• An index is used only if the first column in the
index is relevant to the query
• Indexes can hurt as much as they help
– Make sure you don’t add one by accident.
• Indexes can provide structured-data
performance for unstructured XML, but
there is overhead involved.
Summary
• Is there a high level of selectivity on the
data?
– If yes, and the column is frequently the target of a
where clause, then add an index
• Have I dropped indexes I no longer need?
– Why not?
• Do I have a maintenance strategy
established?
– Why not?
Critical Questions
• Are there lots of inserts or modifications to
this table?
– If yes, keep indexes to a minimum
• Is this a reporting table?
– E.g., not many inserts, but lots of reports run in many
different ways
– If yes, more indexes are fine.
• Is there a high level of selectivity on the data?
– If yes, and the column is frequently the target of a
where clause, then add an index