Transcript Slide 1

10 DOs and 10 DON’Ts
for the SQL Server CLR
Matt Whitfield
Who am I?
• A geek, mostly
• I hang about on
ask.sqlservercentral.com
• I provide (now free-of-charge)
SQL Server tools
• I help run the #SQLSoton user
group
• I work for a small company in
Southampton
What’s this presentation about?
• The CLR is a hugely untapped resource in
SQL Server
• I’m on a bit of a mission to make people
more aware of what you can do and how
you can do it best
• This presentation addresses the ‘how you
can do it best’ bit…
…at a ‘starter’ level
So what is the CLR anyway?
• CLR = Common Language Runtime
• In the SQL Server world, we use the term
CLR to refer to database objects that are
implemented in a CLR language (C#,
VB.NET …)
• We can make procedures, functions, userdefined types, user-defined aggergates
and triggers
DO #1:
Know your transactions
• The easiest way to wrap your transactions
in the CLR is using the
TransactionScope() class
• If you are just using a single connection,
then a local transaction will be issued
• However, if you connect using a different
connection string (even in the same
database) a distributed transaction will be
enlisted
DON’T #1:
Add strings together
• Basic concept in all .NET code
• string is Immutable
– An instance of a string cannot be changed
– Special compiler support for strings can make
it look like you can change it
• Adding strings together places load on the
managed heap and garbage collector
• Use StringBuilder instead
DO #2:
Know structs v classes
• In .NET, there are two types of object
– Reference types
– Value types
• Reference types are passed around by
their reference – functions will operate on
the same object
• Value types are passed around in their
entirety – functions will operate on a copy
of the object
DO #2:
Know structs v classes
• What does that mean?
DEMO TIME
Any volunteers?
DON’T #2:
Go overboard
• The CLR support in SQL Server is a great
programming tool
• It allows you to perform an extremely wide
variety of operations
• It also, therefore, allows you to do dumb
stuff
• Don’t start trying to implement all your
procedures in CLR for the sake of it
DON’T #2:
Go overboard
• Don’t wrap your T-SQL in a CLR
procedure that just passes that T-SQL
back to the engine (i.e. don’t make CRUD
procedures in the CLR)
• Don’t create CLR types when they don’t
really add value to the task at hand –
remember there is an overhead
DO #3:
Dispose your IDisposables
• If an object implements IDisposable, then it
expects you to call Dispose() on it when you are
finished
• The idea is that Dispose() gives the object the
chance to release unmanaged resources early –
before garbage collection
• You don’t necessarily know why an object is
IDisposable, so you need to trust that it is
necessary
• There is huge misconception about this
DO #3:
Dispose your IDisposables
• The easiest way to guarantee disposal is
through the using() {} block.
• When you exit the scope of a using block,
the resource that you allocated in that
block is guaranteed to be disposed, no
matter how you exit the block (Exception,
return)
• You can find incorrect examples of this all
over the web, MSDN included
DO #3:
Dispose your IDisposables
DEMO TIME
DON’T #3:
Forget that data scales
• Data is often bigger in your production
environment than in your test environment
• Don’t implement anything that assumes a
fixed size of data (e.g. a static array
without bounds checking)
• Don’t assume that an access method that
is fastest with a small amount of data will
be fastest with a large amount of data
DO #4:
Aim to write SAFE code
• Permission sets control what your CLR code can
do
• SAFE is a restrictive permission set, but
targeting it means that you cannot affect the
stability of the SQL Server Process
• EXTERNAL_ACCESS allows access to
resources outside of SQL Server, but be careful
not to introduce unnecessary waits
• Web service call in a trigger – I’m looking right at
you
DO #4:
Aim to write SAFE code
• UNSAFE is called unsafe for a reason.
• You need to be very sure about what you
are doing under the UNSAFE permission
set.
• How long will that static variable live for?
• What exactly is happening when another
thread is started?
DON’T #4:
Treat NULL as an afterthought
• NULLs happen
• Always assume that you will receive a NULL
value as a parameter or in data that you read
back from the database
• Use nullable types to wrap simple types in your
data structures if you need to
• The Sqlxxx types all represent NULL with the
IsNull property
• Nullable types represent NULL with the
HasValue property
DO #5:
Understand the GAC-lack
• Only ‘blessed’ assemblies can be loaded
from the GAC
• Other assemblies need to be loaded
through CREATE ASSEMBLY
• This can create a maintenance issue if
application layer code shares an assembly
with the database layer code
DO #5:
Understand the GAC-lack
•
•
•
•
•
•
•
•
•
Mscorlib.dll
System.Data.dll
System.dll
System.Xml.dll
System.Security.dll
System.Web.Services.dll
System.Data.SqlXml.dll
System.Transactions.dll
System.Configuration.dll
•
•
•
•
Microsoft.VisualBasic.dll
Microsoft.VisualC.dll
CustomMarshalers.dll
System.Data.OracleClient.dll
DON’T #5:
Use finalisers
• Finalisers are the .NET equivalent of
destructors – code which is fired when an
object is freed.
• In .NET, garbage collection does the free,
and so calls your finaliser for you
• This means that your class is always
promoted to a Generation 1 collection…
• What on earth is that?
DON’T #5:
Use finalisers
• Garbage collection is done in generations
0, 1 and 2
• Generation 0 is the cheapest, implying the
least work for the garbage collector
• A ‘dead’ object with a finaliser is always
skipped – guaranteeing promotion to
Generation 1
• All objects referenced by the object with
the finaliser are also kept alive
DO #6:
Understand managed memory
• Managed memory isn’t the same as Native
memory
• The managed memory used by the CLR
comes from the MemToLeave address
space
• Moving objects between native and
managed memory is called marshalling,
and it has a cost associated with it
DON’T #6:
Reinvent the wheel
• Does your boss like the ‘Just Do It’
attitude?
• It works for Nike, for coding not so much
• Take the time to find out if the function you
want already exists (it probably does)
• If you can’t find it, try community sites –
they are immensely helpful
DO #7:
Use the context connection
• If you need to get data from the database
that your code is running against…
…use the context connection
• Connection string is:
“context connection=true;”
• Connecting using a standard connection
requires elevation to
EXTERNAL_ACCESS
DON’T #7:
Access data in scalar functions
• CLR Scalar Functions can be a lot faster
than their T-SQL equivalents, particularly
for string manipulations and complex
procedural logic
• Accessing data in scalar functions is not
cool
• You will absolutely kill performance by
setting up a connection, running a query,
returning the result
DO #8:
Write efficient code
• If you are running CLR code in SQL Server, then
you are running it in the place within your
architecture that is hardest to scale
• It might be easy to write a brute-force algorithm,
but when will it become a problem?
• When you’re on holiday – and you certainly don’t
want to spend the time on re-writing the
algorithm to be more efficient then
• Put your code in a normal app to profile it
DON’T #8:
Over optimise
• Hang on… didn’t you just say to write
efficient code?
• Yes, but there’s a difference between
efficient and unintelligible
• Simple, well structured code often has
suitable performance characteristics
• Don’t spend 90% of the time on the last
10% performance gain
• Find and solve the major bottlenecks first
DO #9:
Understand boxing
• No, not the sport, or the day after
christmas
• We learned about value and reference
types earlier
• All objects can be treated the same way
• So how is a value type passed in a context
that expects a reference (e.g. List<T>)
• Boxing
DO #9:
Understand boxing
• Boxing is taking your value type, and
enclosing it in an Object instance, so it
becomes a reference type
• This has implications for equality
• 1 == 1, but (object)1 != (object)1
• Why?
• Because when boxed, each 1 has it’s own
box & reference, and these are not equal
DON’T #9:
Move the middle tier in
• We’ve established that sometimes it can
be useful to run middle tier code in the
database
• This should not be a default position
• Running middle tier code in the database
will limit your ability to scale-out – scaling
out the database layer is inherently more
complex than scaling out a middle tier.
DO #10:
Use Dictionary<TKey, TValue>
• Dictionary objects allow fast (close to
O(1)) access to a large list of objects
• Finding an object by key rather than
looping through is the exact equivalent of
doing an index seek rather than a scan
• When implementing objects for dictionary
access, over-ride the GetHashCode and
Equals methods of Object
DO #10:
Use Dictionary<TKey, TValue>
• Be careful, though
• If two objects would return true when
Equals was called between them, then
their GetHashCode calls must return the
same value
• This makes it unsuitable for fuzzy
matching (e.g. allowing tolerance between
float values)
DON’T #10:
Call Environment.Exit
• So Mladen Prajdic told me I should put this
in. I think he was joking but…
• This shows some of the dangers of
working in the UNSAFE permission set
• What happens when you call
Environment.Exit?
• It exits your procedure and rolls back?
DOs:
•
•
•
•
•
•
•
•
•
•
Know your transactions
Understand structs & classes
Dispose your IDisposables
Aim to write SAFE code
Understand the GAC
Understand managed memory
Use the context connection
Write efficient code
Understand boxing
Use Dictionary objects
DON’Ts:
•
•
•
•
•
•
•
•
•
•
Add strings together
Go overboard
Forget that data scales
Treat NULL as an afterthought
Use finalisers
Re-invent the wheel
Access data in scalar functions
Over-optimise
Move the middle tier in
Call Environment.Exit
Thanks!
• Any questions – drop me an email:
[email protected]
@atlantis_uk on twitter
• If you live near Southampton – come to the
#SQLSoton user group
• If not – find your local user group at Community
corner