Transcript statistics

Statistics That Need Special
Attention
Joe Chang
www.qdpma.com
Jchang6 @ yahoo
http://www.qdpma.com/CBO/Statistics.html
About Joe
•
•
•
•
•
SQL Server consultant since 1999
Query Optimizer execution plan cost formulas (2002)
True cost structure of SQL plan operations (2003?)
Database with distribution statistics only, no data 2004
Decoding statblob/stats_stream
– writing your own statistics
• Disk IO cost structure
• Tools for system monitoring, execution plan analysis
See http://www.qdpma.com/
Download: http://www.qdpma.com/ExecStatsZip.html
Blog: http://sqlblog.com/blogs/joe_chang/default.aspx
Statistics – Special Attention
• What works automatically?
– No need for special attention
• What does not work by default
– or simple adjustments, trace flags etc.
– What could cause spectacular failures
• Ok, now what do we do?
– scheduled jobs, triggered jobs, in-procedure,
optimize for “fake data” Other?
Topics
• Quick Statistics Overview
– Details available elsewhere
• Weak point of SQL Server Statistics
– Recompute set points
– Sampling
– Compile parameter etc.
• Options – Trace 2371, 4136
• Workarounds
SQL Performance
SQL
Tables
natural keys
Indexes
Query
Optimizer
DOP Memory
Parallel plans
API Server Cursors:
open, prepare,
execute, close?
SET NO COUNT
Information
messages
Execution
Plan
Storage
Engine
Hardware
Statistics
& Compile
parameters
Compile
Row estimate
propagation
errors
Recompile
temp table /
table variable
Index & Stats
Maintenance
Tables and SQL combined
implement business logic
Natural keys with unique
indexes, not SQL
Index and Statistics
maintenance policy
1 Logic may need more than
one execution plan?
Compile cost versus
execution cost?
Plan cache bloat?
The Execution Plan links all the elements of performance
Index tuning alone has limited value
Over indexing can cause problems as well
Factors to Consider
SQL
Tables
Indexes
Statistics
Query
Optimizer
Storage
Engine
Hardware
Compile
Parameters
DOP
memory
STATISTICS
SQL Server Statistics
• Principles: it should just work
• Statistics are automatically
– created and updated
• Indexes and columns
Statistics Used by the Query Optimizer in Microsoft SQL Server 2008
http://technet.microsoft.com/en-us/library/dd535534(v=sql.100).aspx
Eric N. Hanson and Yavor Angelov
Contributor Lubor Kollar
DBCC SHOW_STATISTICS
DBCC SHOW_STATISTICS('LINEITEM',L_SHIPDATE_CLUIDX)
header
Density vector
Histogram
Options:
WITH STAT_HEADER, DENSITY VECTOR, HISTOGRAM, STATS_STREAM
binary
Density Vector
• Statistics binary storage structure allows for
30 rows in density vector
• Full nonclustered index key consists of
explicitly declared columns plus cluster key
columns (not in NC key)
• Limit of 15 columns in clustered index key
• And 15 columns in nonclustered index key
Histogram
• Up to 200 steps
• Captures low and high bounds
• Attempts to capture skewed distribution
– frequent values equal rows, others in range rows
UPDATE STATISTICS – STATS_STREAM
DBCC SHOW_STATISTICS('LINEITEM',L_SHIPDATE_CLUIDX) WITH STATS_STREAM
UPDATE STATISTICS LINEITEM(L_SHIPDATE_CLUIDX)
WITH STATS_STREAM =
0x010000000300000000000000000000000792A163000000009A128038F3...
, ROWCOUNT = 59986052
, PAGECOUNT = 574971
Statistics Structure
• Stored (mostly) in binary field
Scalar values
Density Vector – limit 30, half in NC, half Cluster key
Histogram
Up to 200 steps
Consider not blindly using IDENTITY on critical tables
Example: Large customers get low ID values
Small customers get high ID values
http://sqlblog.com/blogs/joe_chang/archive/2012/05/05/decoding-stats-stream.aspx
Statistics
• Auto - re-compute at 20% (first 6, 500 rows)
• Sampling strategy
– How much to sample - theory?
– Random pages versus random rows
– Histogram Equal and Range Rows
– Out of bounds, value does not exist
– etc.
Statistics Used by the Query Optimizer in SQL Server 2008
Eric N. Hanson and Yavor Angelov, Contributor: Lubor Kollar
Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator
Joseph Sack
http://msdn.microsoft.com/en-us/library/dd535534.aspx
System tables, views, functions
sys.stats
sys.stats_columns
sys.dm_db_stats_properties (2008R2 sp2, 2012 sp1, 2014 RTM)
;WITH k AS (
SELECT k.object_id, k.stats_id, k.stats_column_id, c.column_id, c.name
FROM sys.stats_columns k
INNER JOIN sys.columns c ON c.object_id = k.object_id AND c.column_id = k.column_id
)
SELECT s.name, o.name, d.name, d.stats_id, d.auto_created
, ISNULL(STUFF(( SELECT ', ' + name
FROM k WHERE k.object_id = d.object_id AND k.stats_id = d.stats_id
ORDER BY k.stats_column_id FOR XML PATH(''), TYPE,
ROOT).value('root[1]','nvarchar(max)'),1,1,'') ,'') as Keys
, p.rows, p.rows_sampled, p.steps, p.modification_counter, p.last_updated
FROM sys.objects o JOIN sys.schemas s ON s.schema_id = o.schema_id
JOIN sys.stats d ON d.object_id = o.object_id
OUTER APPLY sys.dm_db_stats_properties(d.object_id , d.stats_id) p
WHERE o.is_ms_shipped = 0
ORDER BY s.name, o.name, d.stats_id
;WITH b AS (
SELECT d.object_id, d.index_id, part = COUNT(*)
, reserved = 8*SUM(d.reserved_page_count)
, used = 8*SUM(d.used_page_count )
, in_row_data = 8*SUM(d.in_row_data_page_count)
, lob_used = 8*SUM(d.lob_used_page_count)
, overflow = 8*SUM( d.row_overflow_used_page_count)
, row_count = SUM(row_count)
, notcompressed = SUM(CASE data_compression WHEN 0 THEN 1 ELSE 0 END)
, compressed = SUM(CASE data_compression WHEN 0 THEN 0 ELSE 1 END) -- change to 0 for SQL Server 2005
FROM sys.dm_db_partition_stats d WITH(NOLOCK)
INNER JOIN sys.partitions r WITH(NOLOCK) ON r.partition_id = d.partition_id
GROUP BY d.object_id, d.index_id
), j AS (
SELECT j.object_id, j.index_id, j.key_ordinal, c.column_id, c.name,is_descending_key
FROM sys.index_columns j
INNER JOIN sys.columns c ON c.object_id = j.object_id AND c.column_id = j.column_id
)
SELECT t.name, o.name, ISNULL(i.name, '') [index]
, ISNULL(STUFF(( SELECT ', ' + name + CASE is_descending_key WHEN 1 THEN '-' ELSE '' END
FROM j WHERE j.object_id = i.object_id AND j.index_id = i.index_id AND j.key_ordinal >0
ORDER BY j.key_ordinal FOR XML PATH(''), TYPE, ROOT).value('root[1]','nvarchar(max)'),1,1,'') ,'') as Keys
, ISNULL(STUFF(( SELECT ', ' + name
FROM j WHERE j.object_id = i.object_id AND j.index_id = i.index_id AND j.key_ordinal = 0
ORDER BY j.column_id FOR XML PATH(''), TYPE, ROOT).value('root[1]','nvarchar(max)'),1,1,'') ,'') as Incl
, i.index_id
, CASE WHEN i.is_primary_key = 1 THEN 'PK'
WHEN i.is_unique_constraint = 1 THEN 'UC'
WHEN i.is_unique = 1 THEN 'U'
WHEN i.type = 0 THEN 'heap'
WHEN i.type = 3 THEN 'X'
WHEN i.type = 4 THEN 'S' ELSE CONVERT(char, i.type) END typ
, i.data_space_id dsi
, b.row_count , b.in_row_data in_row , b.overflow ovf , b.lob_used lob
, b.reserved - b.in_row_data - b.overflow -b.lob_used unu
, 'ABR' = CASE row_count WHEN 0 THEN 0 ELSE 1024*used/row_count END
, y.user_seeks, y.user_scans u_scan, y.user_lookups u_look, y.user_updates u_upd
, b.notcompressed ncm , b.compressed cmp
, rw_delta = b.row_count - s.rows, s.rows_sampled --, s.unfiltered_rows
, s.modification_counter mod_ctr, s.steps
, CONVERT(varchar, s.last_updated,120) updated
, i.is_disabled dis, i.is_hypothetical hyp, ISNULL(i.filter_definition, '') filt
FROM sys.objects o JOIN sys.schemas t ON t.schema_id = o.schema_id
JOIN sys.indexes i ON i.object_id = o.object_id
LEFT JOIN b ON b.object_id = i.object_id AND b.index_id = i.index_id
LEFT JOIN sys.dm_db_index_usage_stats y ON y.object_id = i.object_id AND y.index_id = i.index_id AND y.database_id = DB_ID()
OUTER APPLY sys.dm_db_stats_properties(i.object_id , i.index_id) s
WHERE o.is_ms_shipped = 0
Statistics Auto/Re-Compute
• Automatically generated on query compile
• Recompute at 6 rows, 500, every 20%?
http://www.sqlskills.com/blogs/erin/understanding-when-statistics-will-automatically-update/
Understanding When Statistics Will Automatically Update
Has this changed? 2008 R2
Trace 2371 – lower threshold auto recomputed for large tables
http://support.microsoft.com/kb/2754171
Statistics Sampling
• Sampling theory
– True random sample
– Sample error - square root
N
• Relative error 1/ N
• SQL Server sampling
– Random pages
• But always first and last page???
– All rows in selected pages
Query Optimizer Assumptions
• Statistics has captured lower and upper
bounds
• If there are no range rows between range hi
keys, then it knows there are no rows in
between
• The page sampling method compensates for
lack of true random row sample
Hypothesis
• If we insert/update rows,
• then dm_db_stats_properties
modification_counter > 0
• Should not we assume that there is new data?
Row Estimate Problems (at source)
• Skewed data distribution
– Does query compile with low, medium or high
skew parameter value?
– What about subsequent executes?
• Errors – due to random page sample
• Out of bounds
• Value does not exist
Row estimate errors at source – is classified under statistics topic
Problems in Statistics
• Scenario – update all statistics with default
• Insert rows, new (int / identity) values that are
higher than upper bound at time of stats
• if plan compiles with previously existing value
– Then fine
• If plan compiles with new value,
– Then optimizer knows value “does not exist”
– But plan will show 1 row
SELECT * FROM PART
WHERE P_PARTKEY = 1999999
SELECT * FROM PART
WHERE P_PARTKEY = 2000001
SELECT * FROM LINEITEM
WHERE L_PARTKEY = 2000001
SELECT * FROM PART
JOIN LINEITEM ON L_PARTKEY = P_PARTKEY
WHERE P_PARTKEY = 2000001
SELECT * FROM PART
JOIN LINEITEM ON L_PARTKEY = P_PARTKEY
WHERE P_PARTKEY = 1999999
Loop Join - Table Scan on Inner Source
Estimated out from first 2 tabes (at right) is
zero or 1 rows. Most efficient join to third
table (without index on join column) is a loop
join with scan. If row count is 2 or more, then
a fullscan is performed for each row from
outer source
Default statistics rules may lead to serious ETL issues
Consider custom strategy
Limited distinct values
• Column has less than 200 distinct values
– Statistics via histogram knows the exact values
– i.e., it also knows what values do not exist
• Now insert rows with different values
– If >= 20%, then statistics update is triggered
– Next query w/explicit SARG on new value good
• Ex WHERE bIsProcessed = 0
– If too few row to trigger stats, then bad news
Compile Parameter Not Exists
Main procedure has cursor around view_Servers
First server in view_Servers is ’CAESIUM’
Cursor executes sub-procedure for each Server
sql:
SELECT MAX(ID) FROM TReplWS
WHERE Hostname = @ServerName
But CAESIUM does not exist in TReplWS!
Good and Bad Plan?
SqlPlan Compile Parameters
SqlPlan Compile Parameters
<?xml version="1.0" encoding="utf-8"?>
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.50.2500.0">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementText="@ServerName varchar(50) SELECT @maxid = ISNULL(MAX(id),0)
FROM TReplWS WHERE Hostname = @ServerName"
StatementId="1" StatementCompId="43" StatementType="SELECT" StatementSubTreeCost="0.0032843" StatementEstRows="1"
StatementOptmLevel="FULL" QueryHash="0x671D2B3E17E538F1" QueryPlanHash="0xEB64FB22C47E1CF2"
StatementOptmEarlyAbortReason="GoodEnoughPlanFound">
<StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true"
ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" />
<QueryPlan CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="168">
<RelOp NodeId="0" PhysicalOp="Compute Scalar" LogicalOp="Compute Scalar"
EstimateRows="1" EstimateIO="0" EstimateCPU="1e-007“
AvgRowSize="15" EstimatedTotalSubtreeCost="0.0032843" Parallel="0" EstimateRebinds="0" EstimateRewinds="0">
</RelOp>
<ParameterList>
<ColumnReference Column="@ServerName" ParameterCompiledValue="'CAESIUM'" />
</ParameterList>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
Compile parameter values at bottom of sqlplan file
Microsoft Responds (Empire Strikes Back)
http://blogs.msdn.com/b/saponsqlserver/archive/2011/09/07/changes-toautomatic-update-statistics-in-sql-server-traceflag-2371.aspx
Changes to automatic update statistics in SQL Server – traceflag 2371
Trace flag 4136
http://blogs.msdn.com/b/axinthefield/archive/2010/11/04/sql-server-traceflags-for-dynamics-ax.aspx
QL Server Trace Flags for Dynamics AX
Disables use of the histogram
2371
http://blogs.msdn.com/b/saponsqlserver/archive/2011/09/07/changes-to-automatic-update-statistics-in-sql-server-traceflag-2371.aspx
2014
The New and Improved Cardinality Estimator in SQL Server 2014
http://blogs.technet.com/b/dataplatforminsider/archive/2014/03/17/th
e-new-and-improved-cardinality-estimator-in-sql-server-2014.aspx
Filtered Stats and CE Model Variation
http://www.queryprocessor.com/ce_filteredstats/
http://www.somewheresomehow.ru/
Dima Puligin
Upper Bound Problem
• Insert dummy row with maximum value
– SET IDENTITY_INSERT table ON
– int 2,147,483,647, datetime Dec 31, 9999
• OPTION (OPTIMIZE FOR (@P1 UNKNOWN))
• SQL Server 2014 – not a problem
Temp Table and Table Variable
• Forget what other people have said
– Most is cr@p
• Temp Tables – subject to statistics auto/re-compile
• Table variable – no statistics, assumes 1 row
• Question: In each specific case: does the statistics
and recompile help or not?
– Yes: temp table
– No: table variable
Is this still true?
Multiple joins to one table
SELECT xxx
FROM Cession.RetroTransaction rt
INNER JOIN Common.Code c1 ON c1.CodeId = rt.TransactionTypeId
INNER JOIN Cession.NARSplit ns ON ns.NARSplitId = rt.NARSplitId AND ns.CessionId = rt.CessionId
INNER JOIN CRM.Company comp ON comp.CompanyId = ns.CompanyId
INNER JOIN Cession.RetroTransactionPeriod rtp ON rt.RetroTransactionPeriodId = rtp.RetroTransactionPeriodId
LEFT OUTER JOIN Cession.RetroTransactionAllocation BasePremium WITH ( NOLOCK ) ON BasePremium.RetroTransactionId = rt.RetroTransactionId
AND BasePremium.AllocationTypeId = '02C48EDA-57ED-4704-99CD-6122AE372B3D'
AND BasePremium.AllocationGrossNetTypeId = '09E95F88-4546-49B9-B4FB-A068CDBDBFDF'
LEFT OUTER JOIN Cession.RetroTransactionAllocation BaseAllowance WITH ( NOLOCK ) ON BaseAllowance.RetroTransactionId = rt.RetroTransactionId
AND BaseAllowance.AllocationTypeId = 'CA5047A3-43CD-496B-8F71-3E95E209806E'
LEFT OUTER JOIN Cession.RetroTransactionAllocation PolicyFee WITH ( NOLOCK ) ON PolicyFee.RetroTransactionId = rt.RetroTransactionId
AND PolicyFee.AllocationTypeId = 'FCD31932-840F-4BCA-99DC-F55BD46E46AF'
AND PolicyFee.AllocationGrossNetTypeId = '09E95F88-4546-49B9-B4FB-A068CDBDBFDF'
LEFT OUTER JOIN Cession.RetroTransactionAllocation PolicyFeeAllowance ON PolicyFeeAllowance.RetroTransactionId = rt.RetroTransactionId
AND PolicyFeeAllowance.AllocationTypeId = 'E570DA71-6B6C-4742-AB8B-DDFDB5286938'
LEFT OUTER JOIN Cession.RetroTransactionAllocation TempFlatExtra WITH ( NOLOCK ) ON TempFlatExtra.RetroTransactionId = rt.RetroTransactionId
AND TempFlatExtra.AllocationTypeId = '82CB2D45-904F-4B50-8C36-A39CD2F33D9D'
AND TempFlatExtra.AllocationGrossNetTypeId = '09E95F88-4546-49B9-B4FB-A068CDBDBFDF'
LEFT OUTER JOIN Cession.RetroTransactionAllocation TempFlatExtraAllowance ON TempFlatExtraAllowance.RetroTransactionId = rt.RetroTransactionId
AND TempFlatExtraAllowance.AllocationTypeId = '43E49878-9AC3-4A1F-A13F-19A1BA7D3AFA'
LEFT OUTER JOIN Cession.RetroTransactionAllocation PermFlatExtra WITH ( NOLOCK ) ON PermFlatExtra.RetroTransactionId = rt.RetroTransactionId
AND PermFlatExtra.AllocationTypeId = '889639AB-620C-4234-9B9E-2970A4500B57'
AND PermFlatExtra.AllocationGrossNetTypeId = '09E95F88-4546-49B9-B4FB-A068CDBDBFDF'
LEFT OUTER JOIN Cession.RetroTransactionAllocation PermFlatExtraAllowance ON PermFlatExtraAllowance.RetroTransactionId = rt.RetroTransactionId
AND PermFlatExtraAllowance.AllocationTypeId = '43DFA681-44FB-4025-B75A-1E12498A679B'
LEFT OUTER JOIN Cession.RetroTransactionAllocation CashSurrender WITH ( NOLOCK ) ON CashSurrender.RetroTransactionId = rt.RetroTransactionId
AND CashSurrender.AllocationTypeId = 'BCBDF86C-DCC5-49DA-BD2E-CD42CE402E99'
LEFT OUTER JOIN Common.Code splittype ON splittype.CodeId = ns.SplitTypeId
WHERE rt.CessionId = N'86037aac-465a-4e47-8bf1-6f8868852af5'
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.50.2500.0"
xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1“
StatementEstRows="5004190000“
StatementId="1" StatementOptmLevel="FULL“
StatementOptmEarlyAbortReason="TimeOut" StatementSubTreeCost="44029"
StatementText="SELECT rt.RetroTransactionId, FROM Cession.vwRetroTransaction AS rt
WHERE rt.CessionId = N'86037aac-465a-4e47-8bf1-6f8868852af5'"
StatementType="SELECT" ParameterizedText="(@1 nvarchar(4000)) " QueryHash="0x19274C9794934C69"
QueryPlanHash="0x80E6A4E14DF1DC0A">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true"
CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan CachedPlanSize="896" CompileTime="296392"
CompileCPU="296383" CompileMemory="379856">
<RelOp AvgRowSize="365" EstimateCPU="133.341" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0“
EstimateRows="5004190000" LogicalOp="Gather Streams" NodeId="0" Parallel="true" PhysicalOp="Parallelism“
EstimatedTotalSubtreeCost="44029">
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance
xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.50.2500.0“
xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1“
StatementEstRows="24.2411“
StatementId="1" StatementOptmLevel="FULL“
StatementOptmEarlyAbortReason="TimeOut" StatementSubTreeCost="0.683036“
StatementText="SELECT rt.RetroTransactionId FROM Cession.vwRetroTransaction rt WHERE rt.CessionId=@1“
StatementType="SELECT" QueryHash="0x19274C9794934C69" QueryPlanHash="0xD5FF47FBFB2AD6AE">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true"
NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan DegreeOfParallelism="1" MemoryGrant="2720“ CachedPlanSize="232“
CompileTime="51" CompileCPU="51" CompileMemory="2712">
<RelOp AvgRowSize="365" EstimateCPU="0.000101328" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0"
EstimateRows="24.2411" LogicalOp="Left Outer Join" NodeId="0" Parallel="false" PhysicalOp="Nested Loops"
EstimatedTotalSubtreeCost="0.683036">
plan