Disjoint Set Structures for Operations over Sets
CS2223 Recitation 3, March 30, 2005
Reference: textbook, pp. 175-180
Song Wang
Problem Description
• Given:
  – A set S with N objects, identified by the numbers 1 to N.
  – Disjoint partitions (subsets) of the set S:
    • Every item belongs to some partition.
    • No item belongs to more than one partition.
• What to do:
  – Find: given an object, find which set contains it.
  – Merge: given two sets, merge them into one set.
• Why:
  – These are basic, frequently used functions for set operations such as union and intersection.
  – Consequently, they are important for many other algorithms, such as finding the minimum spanning tree.
Preliminaries
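• Data structure for a set: tree
• Ex. (figure): three trees, one per set
  – Set 1: root 1 with children 7, 8, 9
  – Set 2: root 2 with child 5
  – Set 3: root 3 with children 4, 6
• The parent (root) node denotes each set.
• The smallest object is used as the parent node (one choice).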
Preliminaries II
• Degraded linked list: an array that records only the parent of each node

  Index: 1 2 3 4 5 6 7 8 9
  Array: 1 2 3 3 2 3 1 1 1

• Some adaptation: a root stores 0 instead of its own index

  Index: 1 2 3 4 5 6 7 8 9
  Array: 0 0 0 3 2 3 1 1 1
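As a rough illustration of the array representation, the sketch below builds the first array from the three example sets. The helper name `build_parent_array` and the unused slot at index 0 (kept so that Python indices match the object numbers 1 to N) are assumptions made for this example, not part of the slides.

```python
def build_parent_array(partitions, n):
    """Return a list where entry i holds the label of the set containing i.

    The label is the smallest member of each set, as in the example.
    Index 0 is unused padding so indices line up with object numbers 1..n.
    """
    parent = [0] * (n + 1)
    for members in partitions:
        label = min(members)          # smallest object denotes the set
        for obj in members:
            parent[obj] = label
    return parent


partitions = [{1, 7, 8, 9}, {2, 5}, {3, 4, 6}]
set_array = build_parent_array(partitions, 9)
print(set_array[1:])  # [1, 2, 3, 3, 2, 3, 1, 1, 1]
```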
Solution 1: find1()
Index: 1 2 3 4 5 6 7 8 9
Array: 1 2 3 3 2 3 1 1 1

find1(7): 1 (belongs to set 1)
find1(2): 2 (belongs to set 2)

Function find1(x)
    return set[x]
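A minimal Python sketch of find1, assuming the array is stored as a list named `set_array` (chosen to avoid shadowing Python's built-in `set`) with an unused slot at index 0:

```python
# Index 0 is unused padding; entries 1..9 are the example array.
set_array = [0, 1, 2, 3, 3, 2, 3, 1, 1, 1]


def find1(x):
    """Return the label of the set containing x: a single array lookup."""
    return set_array[x]


print(find1(7))  # 1
print(find1(2))  # 2
```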
Solution 1: merge1()
Index: 1 2 3 4 5 6 7 8 9
Array: 1 2 3 3 2 3 1 1 1

Merge set 1 and 2:

Index: 1 2 3 4 5 6 7 8 9
Array: 1 1 3 3 1 3 1 1 1

Procedure merge1(a,b)
    i <- min(a, b)
    j <- max(a, b)
    for k <- 1 to N do        (scan the whole array)
        if set[k] = j then set[k] <- i
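The scan in merge1 could be written in Python roughly as follows. Passing the array as an argument and the name `set_array` are choices made for this sketch only.

```python
def merge1(set_array, a, b):
    """Relabel every member of the set with the larger label.

    a and b are set labels; the whole array is scanned, so the cost is
    proportional to N no matter how small the sets are.
    """
    i, j = min(a, b), max(a, b)
    for k in range(1, len(set_array)):   # index 0 is unused padding
        if set_array[k] == j:
            set_array[k] = i


set_array = [0, 1, 2, 3, 3, 2, 3, 1, 1, 1]
merge1(set_array, 1, 2)
print(set_array[1:])  # [1, 1, 3, 3, 1, 3, 1, 1, 1]
```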
Performance Analysis of find1() and merge1()
• Case study: n calls to find and at most N-1 calls to merge (n is comparable to N).
• Function find1 takes constant time: Θ(1)
• Procedure merge1 takes linear time: Θ(N)
• Total: n * Θ(1) + (N-1) * Θ(N) = Θ(N^2), or Θ(n^2)
Can We do Better?
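(Figure: the trees for Set 1 (root 1; children 7, 8, 9), Set 2 (root 2; child 5), and Set 3 (root 3; children 4, 6). Merging set 1 and 2 attaches root 2, together with its child 5, under root 1.)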
Solution 2: merge2()
Index: 1 2 3 4 5 6 7 8 9
Array: 1 2 3 3 2 3 1 1 1

Merge set 1 and 2:

Index: 1 2 3 4 5 6 7 8 9
Array: 1 1 3 3 2 3 1 1 1

Procedure merge2(a,b)
    if a < b then set[b] <- a
    else set[a] <- b
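A sketch of merge2 in Python, under the assumption (consistent with the arrays above, where only entry 2 changes) that the root with the larger label is attached under the root with the smaller label. The list name `parent` is introduced here for illustration.

```python
def merge2(parent, a, b):
    """Link two roots in constant time: the larger label goes under the smaller.

    a and b must be roots of different trees.
    """
    if a < b:
        parent[b] = a
    else:
        parent[a] = b


parent = [0, 1, 2, 3, 3, 2, 3, 1, 1, 1]   # index 0 is unused padding
merge2(parent, 1, 2)
print(parent[1:])  # [1, 1, 3, 3, 2, 3, 1, 1, 1]
```

Only one entry changes, which is why node 5 still points to 2 rather than directly to the new root 1.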
Solution 2: find2()
Index: 1 2 3 4 5 6 7 8 9
Array: 1 1 3 3 2 3 1 1 1

(Figure: Set 1 after the merge: root 1 with children 7, 8, 9, and 2; node 2 has child 5.)

find2(5): 1 (belongs to set 1). The whole path from node 5 to the root node 1 must be traversed.

Function find2(x)
    r <- x
    while set[r] != r do      (only for a root, r = set[r])
        r <- set[r]
    return r
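The root walk of find2 can be sketched in Python like this. The list name `parent` and the padding slot at index 0 are assumptions of the example.

```python
def find2(parent, x):
    """Follow parent links from x until reaching a root (a node that is its own parent)."""
    r = x
    while parent[r] != r:
        r = parent[r]
    return r


# Array after merging sets 1 and 2 with merge2; index 0 is unused padding.
parent = [0, 1, 1, 3, 3, 2, 3, 1, 1, 1]
print(find2(parent, 5))  # 1  (path 5 -> 2 -> 1)
print(find2(parent, 4))  # 3
```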
Performance Analysis of find2() and merge2()
• Case study: n calls to find and at most N-1 calls to merge (n is comparable to N).
• Function find2 takes linear time in the worst case: Θ(N)
• Procedure merge2 takes constant time: Θ(1)
• Total: n * Θ(N) + (N-1) * Θ(1) = Θ(N^2), or Θ(n^2). No improvement!
What is the Problem?
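• The worst case: a linear tree
  – Start from six single-node trees: 1, 2, 3, 4, 5, 6.
  – Merge2(5,6), then Merge2(4,5), and so on down to Merge2(1,2), builds the chain 1 <- 2 <- 3 <- 4 <- 5 <- 6.
  – Find2(6)? It has to walk the entire chain.
• The height of the tree is essential for performance.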
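To make the worst case concrete, the sketch below replays that merge sequence and counts how many links find2 follows. The step-counting variant `find2_with_steps` is a hypothetical helper added for this example, and merge2 is assumed to attach the larger root under the smaller one.

```python
def merge2(parent, a, b):
    """Attach the root with the larger label under the smaller one."""
    if a < b:
        parent[b] = a
    else:
        parent[a] = b


def find2_with_steps(parent, x):
    """Like find2, but also report how many parent links were followed."""
    r, steps = x, 0
    while parent[r] != r:
        r = parent[r]
        steps += 1
    return r, steps


n = 6
parent = list(range(n + 1))        # every node starts as its own root
for a in range(n - 1, 0, -1):      # Merge2(5,6), Merge2(4,5), ..., Merge2(1,2)
    merge2(parent, a, a + 1)

root, steps = find2_with_steps(parent, n)
print(root, steps)  # 1 5
```

With N nodes the same pattern gives a chain of N-1 links, so a single find can cost time proportional to N.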
How to Avoid a Bad Merge Tree
(Figure: two trees over nodes 1 to 7, rooted at 1 and at 4, with the alternative results of Merge(1,4).)
Who’s whose subtree?
• Tree t1 has height h1 and tree t2 has height h2.
• If h1 < h2: t1 becomes a subtree of t2, and the merged tree's height is h2.
• If h1 == h2: t1 becomes a subtree of t2, and the merged tree's height is h1 + 1.
• The root of the tree is not always the smallest node any more!
Theorem 5.9.1, p. 177
• A tree containing k nodes has height at most ⌊log k⌋.
• Proof by induction.
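One way the induction might go is sketched below. It assumes the logarithm is base 2 (written lg) and that trees are only ever combined by the height-based rule from the previous slide; the symbols a, b, h_a, h_b are introduced here for the sketch.

```latex
\textbf{Claim.} Starting from single-node trees and merging only by height,
every tree with $k$ nodes has height $h \le \lfloor \lg k \rfloor$.

\textbf{Base case.} $k = 1$: $h = 0 = \lfloor \lg 1 \rfloor$.

\textbf{Inductive step.} Let $k > 1$. The tree was formed by merging two trees
with $a$ and $b$ nodes, where $1 \le a \le b$ and $a + b = k$. Both $a, b < k$,
so by the induction hypothesis their heights satisfy
$h_a \le \lfloor \lg a \rfloor$ and $h_b \le \lfloor \lg b \rfloor$.
\begin{itemize}
  \item If $h_a \ne h_b$: the height does not grow, so
        $h = \max(h_a, h_b) \le \lfloor \lg b \rfloor \le \lfloor \lg k \rfloor$.
  \item If $h_a = h_b$: the height grows by one, so
        $h = h_a + 1 \le \lfloor \lg a \rfloor + 1 = \lfloor \lg 2a \rfloor
           \le \lfloor \lg k \rfloor$, because $2a \le a + b = k$.
\end{itemize}
```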
Solution 3: merge3()
Procedure merge3(a,b)
    if height[a] = height[b] then
        height[a] <- height[a] + 1
        set[b] <- a
    else if height[a] > height[b] then set[b] <- a
    else set[a] <- b
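A Python sketch of merge3 follows, replaying the merge order that produced a chain under merge2. The list names `parent` and `height` are assumptions of the example. Because the root is no longer the smallest node, the sketch looks up each root with find2 before merging.

```python
def find2(parent, x):
    """Follow parent links from x up to the root."""
    r = x
    while parent[r] != r:
        r = parent[r]
    return r


def merge3(parent, height, a, b):
    """Merge by height: the shorter tree goes under the taller one.

    a and b must be roots of different trees.
    """
    if height[a] == height[b]:
        height[a] += 1               # equal heights: result is one level taller
        parent[b] = a
    elif height[a] > height[b]:
        parent[b] = a
    else:
        parent[a] = b


n = 6
parent = list(range(n + 1))          # index 0 is unused padding
height = [0] * (n + 1)               # a single node has height 0
for x in range(n - 1, 0, -1):        # same order as the linear-tree example
    merge3(parent, height, find2(parent, x), find2(parent, x + 1))

print(parent[1:])   # [5, 5, 5, 5, 5, 5]
print(height[5])    # 1
```

The same six merges now give a tree of height 1 rooted at node 5 instead of a chain of height 5.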
Performance Analysis of find2() and merge3()
• Case study: n calls to find and at most N-1 calls to merge (n is comparable to N).
• Function find2 takes logarithmic time in the worst case: Θ(log N)
• Procedure merge3 takes constant time: Θ(1)
• Total: n * Θ(log N) + (N-1) * Θ(1) = Θ(n log n). Some improvement.
Path Compression in find3()
• Intuitive explanation
  – The larger the fan-out (more children per node), the lower the tree.
(Figure: a tree rooted at 6, before and after Find3(20).)
Solution 3: find3()
Function find3(x)
    r <- x
    while set[r] != r do      (first traverse of the path: find the root)
        r <- set[r]
    i <- x
    while i != r do           (second traverse of the path: connect nodes on the path to the root)
        j <- set[i]
        set[i] <- r
        i <- j
    return r
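The two passes of find3 could look like this in Python. The example starts from a six-node chain to make the effect of compression visible; the list name `parent` is an assumption of the sketch.

```python
def find3(parent, x):
    """Find the root of x, then re-point every node on the path at that root."""
    # First pass: walk up to the root.
    r = x
    while parent[r] != r:
        r = parent[r]
    # Second pass: walk the same path again and attach each node to the root.
    i = x
    while i != r:
        j = parent[i]     # remember the next node before overwriting the link
        parent[i] = r
        i = j
    return r


# Chain 6 -> 5 -> 4 -> 3 -> 2 -> 1; index 0 is unused padding.
parent = [0, 1, 1, 2, 3, 4, 5]
print(find3(parent, 6))  # 1
print(parent[1:])        # [1, 1, 1, 1, 1, 1]
```

After one call on the deepest node, every later find on this tree follows at most one link.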
Performance Analysis of find3() and merge3()
• Case study: n calls to find and at most N-1 calls to merge (n is comparable to N).
• Function find3 takes only slightly more than constant time per call, averaged over the whole sequence.
• Procedure merge3 takes constant time: Θ(1)
• Total: close to Θ(n). Best one!
Summary
• Find1() and merge1(): best for find, worst for merge (height = 1, always)
• Find2() and merge2(): best for merge, worst for find (height = N, worst case)
• Find2() and merge3(): mixing the above (height = lg N, worst case)
• Find3() and merge3(): mixing the above (height close to 1); best for both
(Figures: a flat tree with root 1 and children 7, 8, 9, 2, 5; and a tree with root 1 and children 7, 8, 9, 2, where node 2 has child 5.)
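For reference, the best combination (find3 with merge3) might be packaged as a small class like the one below. The class name `DisjointSets`, its method names, and the choice to accept arbitrary members rather than roots in `merge` are assumptions of this sketch. Note that once paths are compressed, the stored `height` values are only upper bounds on the true heights.

```python
class DisjointSets:
    """Disjoint sets over objects 1..n (hypothetical wrapper, not from the slides)."""

    def __init__(self, n):
        self.parent = list(range(n + 1))   # index 0 is unused padding
        self.height = [0] * (n + 1)        # upper bound after compression

    def find(self, x):
        """find3: locate the root, then compress the path from x to it."""
        r = x
        while self.parent[r] != r:
            r = self.parent[r]
        while x != r:
            nxt = self.parent[x]
            self.parent[x] = r
            x = nxt
        return r

    def merge(self, x, y):
        """merge3 applied to the roots of the sets containing x and y."""
        a, b = self.find(x), self.find(y)
        if a == b:
            return                         # already in the same set
        if self.height[a] == self.height[b]:
            self.height[a] += 1
            self.parent[b] = a
        elif self.height[a] > self.height[b]:
            self.parent[b] = a
        else:
            self.parent[a] = b


ds = DisjointSets(9)
for x, y in [(1, 7), (1, 8), (1, 9), (2, 5), (3, 4), (3, 6)]:
    ds.merge(x, y)
print(ds.find(7) == ds.find(9))  # True:  7 and 9 are both in set 1
print(ds.find(5) == ds.find(4))  # False: 5 is in set 2, 4 is in set 3
```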