HOW TO OPTIMIZE A HIERARCHY IN SQL SERVER Louis Davidson (drsql.org) [email protected] Hierarchies Hierarchy Types • Trees - Single Parent Hierarchies • Graphs – Multi Parent.

Download Report

Transcript HOW TO OPTIMIZE A HIERARCHY IN SQL SERVER Louis Davidson (drsql.org) [email protected] Hierarchies Hierarchy Types • Trees - Single Parent Hierarchies • Graphs – Multi Parent.

HOW TO OPTIMIZE A
HIERARCHY IN SQL
SERVER
Louis Davidson (drsql.org)
[email protected]
3
Hierarchies
4
Hierarchy Types
• Trees - Single Parent Hierarchies
• Graphs – Multi Parent Hierarchies
Wood with Tape
Piece of Wood
Screw and Tape
Tape
Screw
• Note: Graphs can be complex to deal with as a whole, but often
you can deal with them as a set of trees
5
Hierarchy Uses
• Trees
• Species
• Jurisdictions
• “Simple” Organizational Charts (Or at least the base
manager-employee part of the organization)
• Directory folders
• Graph
• Bill of materials
• Complex Organization Chart (all those dotted lines!)
• Genealogies
• Biological (Typically with limit cardinality of parents to 2 )
• Family Tree – (Sky is the limit)
6
Implementation of a Hierarchy
• “There is more than one way to shave a dog”
• None of which are pleasant for the dog or the shaver
• And the doctor who orders it only asks for a bald dog
• Hierarchies are not at all natural to manipulate/query
using relational code
• And the natural, recursive processing of a node at a time is
horribly difficult and slow in relational code
• So, multiple methods of processing them have arisen
through the years
• The topic (much like the topic of how cruel it is to shave
a dog), inspires religious-like arguments
• I find all of the implementation possibilities fascinating,
so I set out to do an overview of them all…
7
Working with Trees - Background
• Node Recursion
• Relational Recursion
8
Cycles in Hierarchies
Grandparent
Parent
Child
• “I’m my own grandpa” syndrome
• Must be understood or can cause infinite loop in
processing
• Generally disallowed in trees
• Generally handled in graphs
9
Tree Processing Algorithms
• There are several methods for processing trees in
SQL
• We will cover
• Fixed Levels
• Adjacency List
• HierarchyId
• Path Technique
• Nested Sets
• Kimball Helper Table
• Without giving away too much, pretty much all of
the methods have some use…
10
Preconceived Notions
Which method/algorithm do
you expect to be fastest?
• Fixed Levels
• Adjacency List
• HierarchyId
• Path Technique
• Nested Sets
• Kimball Helper Table
11
Coding for trees
• Manipulation:
• Creating a new node
• Moving/Reparenting a node
• Deleting a node (with/without children)
• Usage
• Getting the children of a node
• Getting the parent of a node
• Aggregating along the tree
• Note: No tree algorithms allow for “simple” SQL solutions to all
of these problems
• We will have demos of all of these operations…
12
Reparenting Example
• Starting with:
• Perhaps ending with:
Dragging along
all of it’s child
nodes along
with it
13
Implementing a tree – Fixed Levels
CREATE TABLE CompanyHierarchy
(
Company
varchar(100) NULL,
Headquarters varchar(100) NULL,
Branch
varchar(100) NULL,
PRIMARY KEY (Company, Headquarters, Branch)
)
Very limited, but very fast and easy to work with
I will not demo this structure today because it’s use is both
extremely obvious and limited
14
Implementing a tree – Adjacency List
• Every row includes the key value of the parent in the row
• Parent-less rows have no parent value
• Code to get information out is the most complex to write (though not as
inefficient as it might seem)
• CREATE TABLE CompanyHierarchy
(
)
Organization
varchar(100) NOT NULL PRIMARY KEY,
ParentOrganization varchar(100) NULL REFERENCES
CompanyHierarchy (Organization),
Name
varchar(100) NOT NULL
15
Implementing a tree – Path Method
Every row includes a representation of the path to their parent
Processing makes use of like and string processing (I have seen a case
that used fixed length binary values)
Limitation on path size for string manipulation/indexing
CREATE TABLE CompanyHierarchy
(
OrganizationId
int NOT NULL PRIMARY KEY,
Name
varchar(100) NOT NULL,
Path
varchar(900)
)
16
Implementing a tree – HierarchyId
Somewhat unnatural method to the typical SQL Programmer
Similar to the Path Method, and has some of the same limitations when moving
around nodes
Node path does not use data natural to the table, but rather positional
locationing
CREATE TABLE CompanyHierarchy
(
OrganizationId
int NOT NULL PRIMARY KEY,
Name
varchar(100) NOT NULL,
OrgNode
hierarchyId not null
)
17
Implementing a tree – Nested Sets
Query processing is done using range queries
Structure is quite slow to maintain due to fragile structure
Can produce excellent performance for queries
CREATE TABLE CompanyHierarchy
(
Organization
varchar(100) NOT NULL PRIMARY KEY,
Name
varchar(100) NOT NULL,
Left
int NOT NULL,
Right
int NOT NULL
)
18
Implementing a tree – Kimball Helper
• Developed initially for data warehousing since data is
modified all at once with a fixed cost
• Basically explodes the hierarchy into a table that
turns all hierarchy manipulations into a relational
query
• Maintenance can be slightly costly, but using the data
is extremely fast
19
Implementing a tree – Kimball Helper
• For the rows in yellow, expands to the table shown:
ParentId
ChildId
Distance
ParentRootNode
ChildLeafNode
1
1
1
1
2
2
2
1
2
4
5
2
4
5
0
1
2
2
0
1
1
1
1
1
1
0
0
0
0
0
1
1
0
1
1
20
Demo Setup
• For each style of hierarchy, we will see how to:
• Implement a physical model that models the corporate
hierarchy of the previous graphics
• Create Stored Procedures for Insert, Delete, and
Reparenting a Node
• Queries to access and aggregate the data in the
hierarchy
• We will do this for two sets of data, the data in the
presentation, and then a randomly generated set.
21
Demo Code
• Example code available in download
22
Did I change any of your minds?
23
Graphs
• Generally implemented in same manner as
adjacency list
• Can be processed in the same manner as an adjacency
list
• Primary difference is child can have > 1 parent node
• Cycles are generally acceptable
• Graph structure will always be external to data
structure
• Graphs are even more natural data structures
than trees
24
Graphs are Everywhere
• Almost any many to many can be a graph
ActingCast
Actor
Movie
MovieDirectory
Director
25
Graph Demo
PersonInterest
Person
PersonConnection
Interest
Contact info
• Louis Davidson - [email protected]
• Website – http://drsql.org <-- Get slides here
• Twitter – http://twitter.com/drsql
• SQL Blog
http://sqlblog.com/blogs/louis_davidson
• Simple Talk Blog – What Counts for a DBA
http://www.simple-talk.com/community/blogs/drsql/default.aspx