Transcript Slide 1
SQL Server Storage and Index Structures Physical Data Organization Indexes B-Trees SQL Server Data Access Clustered and Non-Clustered Creating, Altering, Dropping Indexes Choosing your Indexes Maintaining your Indexes SQL Server Storage Hierarchy • Database • Extent – 8 contiguous 64K data pages – Once extent full, next record will take up a whole additional extent. – Pre-allocating space saves time. SQL Server Storage Hierarchy • Page – 64K bytes – # of records/page varies with bytes/record – Types of Pages: Data and Index pages – Page Split • When page becomes full, it splits – New page allocated: ½ data from old page moved to new • Rows – 8060 bytes and 1024 columns B-tree Key Points to Remember • Tree portion includes key attributes only – ordered as in create index statement • Keys packed in index pages – Fewer bytes per key -> more keys per page/extent -> fewer page faults per access. • Clustered indexes have records at leafs – Records are in data pages – Data pages sequentially linked • Non-Clustered indexes point into heap or tree portion of clustered index Create Index Statement • Create [unique] [clustered | nonclustered] index <indexName> on <table or view> (col. Name> [asc|desc] [,…]) include (<col name> [,…]) (with … pad_index, fillfactor, ignore_dup_key drop_existing, statistics_norecompute sort_in_tempDB, online, allow_row_locks, allow_page_locks, maxdop Create Index Details • Asc/Desc – Ascending & descending sort order for index • Include (cool!) – Includes col in leaf nodes of clustered index • Allows very fast access to non-key attribute • Useful with very large record – fewer page faults Create Index “with” Details • Pad_Index= (on|off) – Initial fill-factor for index’s non-leaf pages • Fill Factor = <1% – 100%> – Default is index pages are as full as possible minus two records – Fill factor is how full after index is created • Once split goes to 50% • Ignore_dup_key – Circumvent unique key constraint somewhat • Still get error message, but no rollback • useful for storing unique values but trashing transactions Create Index “with” Details • Drop_Existing – Any existing index with same name is dropped with this create statement • More efficient than drop index followed by create for clustered index as no need to touch nonclustered indexes or data pages • Statistics_nonrecompute – Default: sql server automates the process of updating the statistics on tables/ indexes – This option says you will maintain stats • DON’T USE THIS! Create Index “with” Details • Sort_In_tempdb – Only useful when tempdb on physically separate drive – Reads/write for sort compete with read/writes to write data and index pages • This make sense if and only if you understand disk writes - discussion • Online – Keeps table available to users while creating index – sounds good, but ….!! Create Index “with” Details • Allow row/page locks – Don’t use unless really good • MAXDOP – Overrides system setting for max degree of parallelism while building index • How many processes are used to construct an index. MAXDOP sets limit on how many processors per operation. – Compare and contrast these terms Create Index “with” Details • ON – Can store index separately from data • Space for index spread across drives • I/O for indexes not compete with physical data retrieval XML Indexes • Indexes into XML data – Xml VERY unstructured – Column can be of type xml in sql server – Create index on xml column – Page 276 for more details Implied indexes • created by some constraints – Primary Key – Unique • Can easily end up with duplicate constraints and not realize it Deciding what indexes go where? • Indexes speed access, but costly to maintain – Almost every update to table requires altering both data pages and every index. • All inserts and deletions affect all indexes • Many updates will affect non-clustered indexes • Sometimes less is more – Not creating an index sometimes may be best • Code for tranasaction have where clause? What columns used? Sort requried? Deciding what indexes go where? • Selectivity – Indexes, particularly non-clustered indexes, are primarily beneficial in situations where there is a reasonably HIGH LEVEL of Selectivity within the index. • % of values in column that are unique • Higher percentage of unique values, the higher the selectivity – If 80% of parts are either ‘red’ or ‘green’ not very selective Choosing Clustered Index • Only one per table! - Choose wisely • Default, primary key creates clustered index – Do you really want your prime key to be clustered index? – Option: create table foo myfooExample (column1 int identify primary key nonclustered column2 …. ) – Changing clustered index can be costly • How long? Do I have enough space? Clustered Indexes Pros & Cons • Pros – Clustered indexes best for queries where columns in question will frequently be the subject of • RANGE query (e.g., between) • Group by with max, min, count – Search can go straight to particular point in data and just keep reading sequentially from there. – Clustered indexes helpful with order by based on clustered key Clustered Indexes Pros & Cons • The Cons – two situations – Don’t use clustered index on column just because seems thing to do (e.g., primary key default) – Lots of inserts in non-sequential order • Constant page splits, include data page as well as index pages • Choose clustered key that is going to be sequential inserting • Don’t use a clustered index at all perhaps? Column Order Matters • (P#, S#, Qty) – P# S# together are primary key • One index that includes all columns is not useful in all situations! – Only end up storing data a second time. • Clustered index of P#S# not same as S#P# – P#S# can lookup P# fairly easily, but looking up S# requires a linear search. – S#P# can lookup S# fairly easily, but not P#. • Note that even though key of S#P# means can’t lookup P# quickly, are some advantages in include P# in key. Dropping Indexes • Sometimes makes sense to constantly reanalyze situation and add indexes – DON’T FORGET TO DROP INDEXES!! • Big overhead for inserts and deletes – Always ask yourself: “Can I get rid of any of these?” – Drop INDEX <TABLE NAME> <Index name> Index Tuning Wizard • Hopefully you will evolve to the point you don’t need to use this gadget – But still can be quite handy • Uses workload file generated using sql server profiler (ch 19) • Not ideal to depend on this tool, but it may make some suggestions that you have not thought of. Maintaining Indexes • Page Splits – Insert/delete order and rate critical • Fragmentation – Not OS fragementation – e.g. defrag tool – Happens when database grows, pages split, and then data eventually deleted. – Btrees great on maintaining balance on insertions, but with deletes, can end up with many pages containing small # of records. Fragmentation Problems • Wasted space – Sql server allocates an extend at a time • Could end up with an extent, containing single page, with single record. • Thrashing (way too many disk hits) – Could end up with page 1 of data on one extend, page 2 on another, page 3 on the first, page 4 on another, …. – Records all over the place • Bit better for inserts but really bad for reads! Identifying Fragmentation vs. page splits • DBCC SHOWCONTIG – Page 283 – Demo with northwind DBREINDEX & Fillfactor • DBCC DBREINDEX – Can drop index and rebuild • Usually best to use drop-existing – Completely rebuilds the index – If supply table name, rebuilds all indexes on table. • Re-establishes base fillfactors etc. – Strongly recommend disallow transactions while doing this. – Rebuilding is probably better. Summary • Clustered indexes usually faster than nonclustered • Only place non-clustered indexes on columns with high selectivity (>95% of rows are unique on that column) • All data manipulation language statements can benefit, from indexes, but inserts, deletes, and updates are slowed. • Indexes take up space and require page hits. Summary • Index used only if first column in index is relevant to query • Indexes can hurt as much as they help – Make sure don’t add one by accident. • Indexes can provided structured data performance to unstructured XML, but overhead involved. Summary • Is there a high level of selectivity on the data? – if yes and is frequently target of where clause, then add index • Have I dropped indexes I no longer need? – Why not? • Do I have a maintenance strategy established? – Why not? Critical Questions • Are there lots of inserts of modifications to this table? – If yes, keep indexes to minimum • Is this a reporting table? – E.g. not many inserts but lots of reports run many different ways – If yes, more indexes are fine. • Is there a high level of selectivity on the data? – If yes and is frequently target of where clause, then add index