Trouble Shooting SQL First Steps
Download
Report
Transcript Trouble Shooting SQL First Steps
Trouble Shooting SQL
For the SQL User Group UK
Matthew Stephen
Systems Engineer
eSolutions
Microsoft Ltd
Agenda
What to do when trouble starts
Available resources
Researching error messages
Performance problems
When the trouble starts
Don’t make assumptions
Don’t ignore things you don’t understand
Don’t guess
Don’t trust second hand information
Don’t get tunnel vision
Collect evidence – cross reference
Collect all the facts:
– from SQLDiag.exe
Contains WinMSD, sp_configure, error logs and flight
recorder. see Q233332 for cluster.
SQL Agent error logs
Application and System event logs CSV or EVT format.
Cluster Logs see Q168801
All DMP files – same directory as err logs.
SQL Profiler trace – usually for performance
ODBC trace – remember to turn it off!
Reproducible scenario?
Log files from installation and service pack. SQLStp.log,
SQLSP.log, SQLClstr.log in %systemroot%
SQLDiag.txt
Collect all the symptoms:
How often does the problem occur?
What’s changed?
Has it ever worked?
Is there a pattern to the problem?
Are all users affected?
Client or server problem?
Is SQL Clustered?
What are the error messages?
Research
Books On Line!!!!
Query fundamentals, Advanced query concepts,
Optimising DB performance, Replication,
Building SQL Apps, SQL Server architecture,
Administering SQL server. Troubleshooter
MSDN
Technet
Platform SDK
Knowledge Base
Ask someone else
MSDN, Technet and Platform SDK
Have a machine with these on and VS
6.0/.NET
MSDN
TECHNET is best used on line
1Gb APIs, Win32 API, sample code
www.microsoft.com/technet
SQL Server resource kit
Platform SDK – header files
Esp. DtsPkg.h
WinError.h – for HRESULTS and SCODEs
(essentially the same thing)
Knowledge Base
http://support.microsoft.com
Don’t use phrases
Use exact error message
Start with error numbers
HRESULTs can take 5 different forms
0x80020005, 8002005, -2147352571,
2147352571
Custom HRESULTs generated by VB are
always offset by vbOjectError (2147221504)
Useful books
Inside Microsoft Windows 2000
http://www.microsoft.com/mspress/books/4354.asp
Hitchhiker’s Guide to Visual Basic & SQL
http://www.microsoft.com/mspress/books/1597.asp
Debugging Applications
http://www.microsoft.com/mspress/books/4023.asp
Programming ADO
http://www.microsoft.com/mspress/books/3445.asp
Designing Secure Web-Based Applications for Microsoft
Windows 2000
http://www.microsoft.com/mspress/books/4293.asp
Programming Distributed Applications with COM and
Microsoft Visual Basic 6.0
http://www.microsoft.com/MSPress/books/2137.asp
Developing XML Solutions
http://www.microsoft.com/mspress/books/3535.asp
Good external websites to
keep in touch with:
http://www.dejanews.com
This is an archive of online newsgroups. The
microsoft.public.* newsgroups are monitored by support
professionals. You may find your problem answered there.
http://www.microsoft.com/sql
Official SQL Server homepage. Go here for whitepapers, etc.
http://msdn.microsoft.com/sql
SQL Server MSDN home. Great development stuff here.
http://www.microsoft.com/data
This is the official home of MDAC.
http://www.sqlmag.com
Great online magazine for DTS stuff.
http://www.swynk.com/sysapps/sql.asp
Good technical articles.
And more
http://www.sqlwire.com
http://www.sqlservercentral.com
http://www.sqlteam.com
http://www.swynk.com
http://www.sql-server-performance.com/
http://www.sqldts.com
http://www.sqlsecurity.com
http://www.sqlpass.org
http://www.insidesqlserver.com
http://www.devx.com/gethelp
http://www.mssqlserver.com
Researching an Error Message
Often 99% of this effort involves obtaining the
description associated with an error number.
Visual Studio includes the Error Lookup tool
Use “net helpmsg msg#” from the command line to
look up operating system error numbers.
Just type in the error number (which can be a HRESULT)
and hey presto!
-[Microsoft][ODBC SQL Server
Driver][DBNETLIB]ConnectionOpen (Connect()).NativeError-231-SQLState-01000
Net helpmsg 231= ‘All pipe instances are busy’
WinError.h is included in Visual Studio and the
Platform SDK. It is the header file that defines
operating system errors. In addition, it includes a
very useful section that details how to interpret
HRESULTs / SCODEs
In WinError.h you’ll see the following logical structure of a HRESULT / SCODE:
Values are 32 bit values layed out as follows:
3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---+-+-+-----------------------+-------------------------------+
|Sev|C|R|
Facility
|
Code
|
+---+-+-+-----------------------+-------------------------------+
where
Sev - is the severity code
00 – Success
01 – Informational
10 – Warning
11 – Error
C - is the Customer code flag
R - is a reserved bit
Facility - is the facility code
Code - is the facility's status code
The severity field will almost always be 11 (which is why the hexadecimal representation
of a HRESULT begins with an 8). The facility code is very important. It tells you which
sub-system returned the error.
Facility codes
following values can be used for facility:
#define
#define
#define
#define
#define
#define
#define
#define
#define
#define
#define
#define
#define
FACILITY_WINDOWS
FACILITY_STORAGE
FACILITY_SSPI
FACILITY_SETUPAPI
FACILITY_RPC
FACILITY_WIN32
FACILITY_CONTROL
FACILITY_NULL
FACILITY_MSMQ
FACILITY_MEDIASERVER
FACILITY_INTERNET
FACILITY_ITF
FACILITY_DISPATCH
8
3
9
15
1
7
10
0
14
13
12
4
2
#define FACILITY_CERT
11
The most common facility codes used are
FACILITY_WIN32 (7), FACILITY_ITF (4) and FACILITY_NULL (0).
FACILITY_WIN32
Error codes from functions in the Win32 API.
E.g. 0x8007xxxx result of a Win32 API error.
Strip out xxxx, convert it to decimal and use net helpmsg
to look up the error description directly.
For example: DTS reports an HRESULT of
0x8007007B
Strip out the severity and the facility code (which we can
see is FACILITY_WIN32) which leaves us with 0x007B.
Convert 0x007B to decimal which is 123.
Net helpmsg 123 returns: “The filename, directory name,
or volume label syntax is incorrect”.
We now know straight away that the user probably typed
in an incorrect filename or directory name as a parameter
to DTSRun.EXE.
FACILITY_ITF
Used for most status codes returned from interface
methods.
Meaning of the error is defined by the interface.
For example, a client OLE DB application may
receive 0x80040E07. As this has a facility code of
FACILITY_ITF (4) you now know that this error has
been reported by a specific interface, which in this
case is OLE DB.
From OleDbErr.h you’ll find that the macro for this
HRESULT is DB_E_CANTCONVERTVALUE and that
the error description is:
“Data or literal value could not be converted to the type
of the column in the data source, and the provider was
unable to determine which columns could not be
converted. Data overflow or sign mismatch was not the
cause”.
FACILITY_NULL
Used for common HRESULT errors. This is where
the infamous 0x80040005 comes from:
“Unspecified error”.
Other useful FACILITY_NULL HRESULTs are (see
WinError.h for a complete list):
0x80000002 “E_OUTOFMEMORY”
Ran out of memory
0x80000003 “E_INVALIDARG”
One or more arguments are invalid
0x80000005 “E_POINTER”
Invalid pointer
0x80000009 “E_ACCESSDENIED”
General access denied error
Convert HRESULTs to Hex
You may also see an HRESULT reported as its
decimal representation which is e.g.-2147024773.
As HRESULTS are easier to work with in their
hexadecimal form.
HRESULTS are interpreted as signed long integer
data types.
The binary value is converted to a decimal value using
twos complement.
With twos complement the leftmost bit is used to indicate
the sign of the number: 1 is negative, 0 is positive.
If the sign of the number is negative, the rest of the bits
are flipped from their positive representation and one is
added (this prevents two possible forms of 0 i.e. +0 and -0
which is the case with ones complement).
Convert from negative decimal to
hexadecimal (twos complement):
Subtract one from the positive decimal
representation:
2147024773 becomes 2147024772
Convert the decimal to its binary equivalent (binary
is easier to work with when you arrange the digits
in groups of four starting from the right):
2147024772 becomes 111 1111 1111 1000 1111 1111
1000 0100
Flip all the bits:
000 0000 0000 0111 0000 0000 0111 1011
Replace the negative with a 1:
1000 0000 0000 0111 0000 0000 0111 1011
Convert the groups of four binary digits to their
hexadecimal value:
0x8007007B
Convert from hexadecimal to
decimal
Convert the hexadecimal to its binary equivalent:
0x8007007B becomes 1000 0000 0000 0111 0000
0000 0111 1011
Strip out the leading bit:
000 0000 0000 0111 0000 0000 0111 1011
Flip all the bits:
111 1111 1111 1000 1111 1111 1000 0100
Convert to decimal:
2147024772
Add one:
2147024773
Add the negative sign:
-2147024773
Access Violation or Assertion
Failure
An access violation is an attempt by the process to
execute a memory operation (i.e. either read / write
to a location in memory) that is not allowed by the
underlying page protection (i.e. the process has
not allocated that memory first or that memory has
been marked read-only)
An assertion is a programmatic mechanism used
to detect extraordinary conditions during
processing. It is basically a checkpoint that makes
sure that certain conditions must be met at a
certain point in the application.
SQL Server treats them both in the same way by
dumping the call stack.
The call stack
SQL Server basically “catches” the exception and dumps the
call stack and some memory to the errorlog.
Once you have determined what type of exception you have,
the next step is to read the call stack.
The readability of a call stack is totally dependant on having
the correct symbols installed. The call stack relies on the
symbol file (sqlservr.dbg for SQL Server 6.5 and sqlservr.pdb
for SQL Server 7.0 / 2000) present.
The symbol file is created when SQL Server is built so for
every build of SQL Server there is a unique symbol file to go
with it. This also means that each SQL Server service pack
updates the symbol file as well.
If the call stack is empty, are contains functions that seem
incorrect when compared to their memory location chances
are that the symbol file is not correct.
AVs - usually an MS problem
Dodgy hardware can cause AVs – these will
usually be very random
Dodgy third party in process extensions
Consistant AVs are most likely an MS fault.
Search the knowledge base and look for the
‘Exception Address = 7801166c’
Function at the top of the short stack dump in the
error log
Phone Microsoft support – it’s free for bugs!
Performance problems
Poor indexes
Transaction log not on own drive
Blocking - Q271509
Interfering processes
Recomplilation – temp tables
INF: Troubleshooting Application Performance with SQL Server
[sqlserver] ID: Q224587
Contains useful profiler events
Bad Server configuration
Perfmon/Task Manager
If in any doubt use defaults
Loads of good stuff in the knowledge base
Server side cursors
Do you need:
Auto grow shrink
Auto stats
Dynamic memory
Poor Indexes
Use the index tuning wizard
Better than an expert!
Tune select sets of queries at once
Exclude complicated queries to isolate
extremes
Based on MS Research technology
Non clustered indexes use the
clustered index!
Do you really need a clustered index?
Sample I/O Subsystem Layout
18GB drives
FileA.mdb
FileB.mdb
FileC.mdb
FileD.mdb
50% read
50% write
FileE.mdb
FileF.mdb
FileG.mdb
FileH.mdb
DB LOG, OS
M
Y
F
I
L
E
G
R
O
U
P
All Tables
and all
indexes
On single
filegroup
Important PerfMon Counters – see BOL
Monitoring Disk I/O & Detecting Excess Paging
Isolating Disk Activity Created by SQL Server
SQL Server: Buffer Manager Page Reads/sec
SQL Server: Buffer Manager Page Writes/sec
Monitoring CPU Usage
PhysicalDisk: % Disk Time
PhysicalDisk: Avg. Disk Queue Length
Memory: Page Faults/sec
GET YOUR
Processor:% Processor Time
Others
BASELINE!
Processor: % Privileged Time
Processor: %User Time
System: Processor Queue Length
Monitoring Memory Usage
Memory: Available Bytes
Memory: Pages/sec
Important PerfMon
Counters
Isolating Memory Used by SQL Server
SQL Server: Buffer Manager: Free Buffers
SQL Server: Memory Manager: Total Server
Memory (KB)
SQL Server: SQL Statistics:
Recompilations/sec
SQL Server: Buffer Manager: Buffer cache
hit ratio
SQL Server: Buffer Manager: Readahead
pages/sec