Disjoint set structures for operations over sets: CS2223 Recitation 3, March 30, 2005


Disjoint set structures

For operations over sets (Reference: textbook, pp. 175-180)

CS2223 Recitation 3, March 30, 2005

Song Wang

Problem Description

• Given:
  – A set S with N objects, identified by the numbers 1 to N.
  – Disjoint partitions (subsets) of the set S:
    • Every item belongs to one partition.
    • No item belongs to more than one partition.
• What to do:
  – Find: given an object, find which set contains it.
  – Merge: given two sets, merge them into one set.
• Why:
  – Basic and frequently used functions for set operations such as union and intersection.
  – Consequently, an important problem for many other algorithms, such as finding the minimum spanning tree.

Preliminaries

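• Data structure for a set: a tree.
• Example (figure): three trees.
  – Set 1: root 1 with children 7, 8, 9
  – Set 2: root 2 with child 5
  – Set 3: root 3 with children 4, 6
• The parent (root) node denotes each set.
• One choice: use the smallest object as the parent node.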

Preliminaries II

• Degraded linked list: an array that records only the parent of each object.

Index: 1 2 3 4 5 6 7 8 9
Array: 1 2 3 3 2 3 1 1 1

• Some adaptation (each root is marked with 0 instead of its own index):

Index: 1 2 3 4 5 6 7 8 9
Array: 0 0 0 3 2 3 1 1 1
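The parent-array representation can be sketched in Python as follows. This is a minimal illustration, not the recitation's code: the helper name `build_parent_array`, its input format, and the unused slot at index 0 (kept so that indices match the slides' 1-based numbering) are all assumptions.

```python
def build_parent_array(n, trees):
    """Return a 1-based parent array for objects 1..n.

    `trees` maps each root to the list of its direct children
    (a hypothetical input format). Index 0 is an unused placeholder.
    """
    parent = list(range(n + 1))      # every object starts as its own root
    for root, children in trees.items():
        for child in children:
            parent[child] = root
    return parent


# The three sets from the example: {1, 7, 8, 9}, {2, 5}, {3, 4, 6}.
parent = build_parent_array(9, {1: [7, 8, 9], 2: [5], 3: [4, 6]})
print(parent[1:])  # [1, 2, 3, 3, 2, 3, 1, 1, 1]

# The adapted form marks each root with 0 instead of its own index.
adapted = [0 if parent[i] == i else parent[i] for i in range(1, 10)]
print(adapted)     # [0, 0, 0, 3, 2, 3, 1, 1, 1]
```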

Solution 1: find1()

Index: 1 2 3 4 5 6 7 8 9
Array: 1 2 3 3 2 3 1 1 1

find1(7): 1 (belongs to set 1)
find1(2): 2 (belongs to set 2)

Function find1(x)
  return set[x]
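A direct Python rendering of find1 might look like the sketch below. The name `set_label` and the unused index 0 are assumptions made to keep the 1-based numbering of the slide.

```python
# Label array from the slide; index 0 is an unused placeholder (assumption)
# so that objects keep their 1-based numbers.
set_label = [0, 1, 2, 3, 3, 2, 3, 1, 1, 1]


def find1(x):
    """Return the label of the set containing x with a single array lookup."""
    return set_label[x]


print(find1(7))  # 1
print(find1(2))  # 2
```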

Solution 1: merge1()

Index: 1 2 3 4 5 6 7 8 9
Array: 1 2 3 3 2 3 1 1 1

Merge set 1 and 2:

Index: 1 2 3 4 5 6 7 8 9
Array: 1 1 3 3 1 3 1 1 1

Procedure merge1(a,b)
  i <- min(a, b)
  j <- max(a, b)
  for k <- 1 to N do          (scan the whole array)
    if set[k] = j then set[k] <- i
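The scan in merge1 can be sketched in Python as below. Passing the array as a parameter named `labels` and leaving index 0 unused are assumptions for the sake of a self-contained example.

```python
def merge1(labels, a, b):
    """Merge the sets labelled a and b by relabelling every member of the
    larger-numbered set; the loop touches all N entries."""
    i, j = min(a, b), max(a, b)
    for k in range(1, len(labels)):   # scan objects 1..N
        if labels[k] == j:
            labels[k] = i


labels = [0, 1, 2, 3, 3, 2, 3, 1, 1, 1]   # index 0 unused (assumption)
merge1(labels, 1, 2)
print(labels[1:])  # [1, 1, 3, 3, 1, 3, 1, 1, 1]
```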

Performance Analysis of find1() and merge1()

• Case study: n calls to find and at most N-1 calls to merge (n is comparable to N).
• Function find1 takes constant time: Θ(1)
• Procedure merge1 takes linear time: Θ(N)
• Total: n·Θ(1) + (N-1)·Θ(N) = Θ(N²), or Θ(n²)

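Can We Do Better?

(Figure: merging set 1 and set 2. Before: root 1 with children 7, 8, 9; root 2 with child 5; root 3 with children 4, 6. After: root 2, together with its child 5, is attached as a child of root 1.)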

Solution 2: merge2()

Index: 1 2 3 4 5 6 7 8 9
Array: 1 2 3 3 2 3 1 1 1

Merge set 1 and 2:

Index: 1 2 3 4 5 6 7 8 9
Array: 1 1 3 3 2 3 1 1 1

Procedure merge2(a,b)
  if a
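The merge2 pseudocode is cut off in the transcript. The Python sketch below is one plausible completion, not a recovery of the original slide: it assumes the smaller root becomes the parent of the larger one, which matches the "smallest object as the parent node" choice and the before/after arrays shown here. The `parent` parameter and unused index 0 are likewise assumptions.

```python
def merge2(parent, a, b):
    """Merge the trees rooted at a and b (a != b) with one assignment.

    Assumed rule (the source pseudocode is truncated): the smaller root
    becomes the parent of the larger root.
    """
    if a < b:
        parent[b] = a
    else:
        parent[a] = b


parent = [0, 1, 2, 3, 3, 2, 3, 1, 1, 1]   # index 0 unused (assumption)
merge2(parent, 1, 2)
print(parent[1:])  # [1, 1, 3, 3, 2, 3, 1, 1, 1]
```

Only the entry for root 2 changes; node 5 still points at 2, so no scan of the array is needed.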

Solution 2: find2()

Index: 1 2 3 4 5 6 7 8 9
Array: 1 1 3 3 2 3 1 1 1

(Figure: set 1 after the merge. Root 1 has children 7, 8, 9 and 2; node 2 has child 5.)

find2(5): 1 (belongs to set 1)
Need to traverse the whole path from node 5 to the root node 1.

Function find2(x)
  r <- x
  while set[r] != r do
    r <- set[r]
  return r

Only for a root, r = set[r].
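The root walk of find2 can be sketched in Python as follows, using the array after the merge of sets 1 and 2. Passing the array as a `parent` parameter and leaving index 0 unused are assumptions.

```python
def find2(parent, x):
    """Follow parent links from x until reaching a root (parent[r] == r)."""
    r = x
    while parent[r] != r:
        r = parent[r]
    return r


parent = [0, 1, 1, 3, 3, 2, 3, 1, 1, 1]   # index 0 unused (assumption)
print(find2(parent, 5))  # 1  (path 5 -> 2 -> 1)
print(find2(parent, 4))  # 3
```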

Performance Analysis of find2() and merge2()

• Case study: n calls to find and at most N-1 calls to merge (n is comparable to N).
• Function find2 takes linear time: Θ(N) in the worst case.
• Procedure merge2 takes constant time: Θ(1)
• Total: n·Θ(N) + (N-1)·Θ(1) = Θ(N²), or Θ(n²)

No improvement!

What is the Problem?

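• The worst case: a linear tree.

(Figure: starting from the single nodes 1 to 6, Merge2(5,6), Merge2(4,5), and so on up to Merge2(1,2) build the chain 1, 2, 3, 4, 5, 6, in which each node is the parent of the next. What does Find2(6) cost?)

• The height of the tree is essential for performance.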
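The worst case can be reproduced with a short Python sketch. The helper `find2_steps`, which also counts the links it follows, is a hypothetical addition for illustration, and the merges are written inline under the assumption that the smaller root adopts the larger one.

```python
def find2_steps(parent, x):
    """Return (root, number of parent links followed) starting from x."""
    r, steps = x, 0
    while parent[r] != r:
        r = parent[r]
        steps += 1
    return r, steps


n = 6
parent = list(range(n + 1))        # index 0 unused; each node is its own root
# Merge2(5,6), Merge2(4,5), ..., Merge2(1,2): the smaller root adopts the larger.
for a in range(n - 1, 0, -1):
    parent[a + 1] = a

print(parent[1:])              # [1, 1, 2, 3, 4, 5]
print(find2_steps(parent, 6))  # (1, 5)
```

With N nodes in a chain, a find on the deepest node follows N-1 links, which is the Θ(N) worst case.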

How to Avoid a Bad Merge Tree

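(Figure: two trees with roots 1 and 4 over the nodes 1 to 7, and the two possible results of Merge(1,4): the tree rooted at 4 attached under 1, or the tree rooted at 1 attached under 4.)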

Who’s whose subtree?

• Tree t1 has height h1 and tree t2 has height h2.
• If h1 < h2: t1 becomes a subtree of t2, and the merged tree's height is h2.
• If h1 == h2: t1 becomes a subtree of t2, and the merged tree's height is h1 + 1.
• The root of the tree is not always the smallest node any more!

Theorem 5.9.1 (p. 177)

• A tree containing k nodes has a height of at most ⌊log k⌋.
• Proof by induction.
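One way the induction might go is sketched below. It assumes the trees are built only by the height rule from the previous slide and takes the logarithm to be base 2 (written lg); the slide does not state the base, and the textbook's own proof may be organized differently.

```latex
\textbf{Claim.} A tree with $k$ nodes built by the height rule has height
$h \le \lfloor \lg k \rfloor$.

\textbf{Base case.} $k = 1$: $h = 0 = \lfloor \lg 1 \rfloor$.

\textbf{Inductive step.} Merge two trees with $a$ and $b$ nodes, $k = a + b$,
whose heights satisfy $h_a \le \lfloor \lg a \rfloor$ and
$h_b \le \lfloor \lg b \rfloor$ by the induction hypothesis.
\begin{itemize}
  \item If $h_a \ne h_b$: $h = \max(h_a, h_b)
        \le \lfloor \lg \max(a, b) \rfloor \le \lfloor \lg k \rfloor$.
  \item If $h_a = h_b$: $h = h_a + 1
        \le \lfloor \lg \min(a, b) \rfloor + 1
        \le \lfloor \lg (k/2) \rfloor + 1 = \lfloor \lg k \rfloor$,
        since $\min(a, b) \le k/2$.
\end{itemize}
```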


Solution 3: merge3()

Procedure merge3(a,b)
  if height[a] = height[b] then
    height[a] <- height[a] + 1
    set[b] <- a
  else if height[a] > height[b] then
    set[b] <- a
  else
    set[a] <- b
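The merge-by-height rule translates to Python roughly as follows. The explicit `parent` and `height` parameters, the unused index 0, and the sequence of merges in the usage lines are assumptions chosen for a self-contained example.

```python
def merge3(parent, height, a, b):
    """Merge the trees rooted at a and b (a != b), attaching the shorter
    tree under the taller one; on a tie, b goes under a and a grows by 1."""
    if height[a] == height[b]:
        height[a] += 1
        parent[b] = a
    elif height[a] > height[b]:
        parent[b] = a
    else:
        parent[a] = b


n = 6
parent = list(range(n + 1))   # index 0 unused; every node starts as a root
height = [0] * (n + 1)        # a single node has height 0

merge3(parent, height, 1, 2)  # tie: 2 goes under 1, height[1] becomes 1
merge3(parent, height, 3, 4)  # tie: 4 goes under 3, height[3] becomes 1
merge3(parent, height, 1, 3)  # tie: 3 goes under 1, height[1] becomes 2
merge3(parent, height, 1, 5)  # 1 is taller: 5 goes under 1, no height change

print(parent[1:])  # [1, 1, 1, 3, 1, 6]
print(height[1])   # 2
```

The merged tree holds 5 nodes and has height 2, which agrees with the bound ⌊lg 5⌋ = 2.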

Performance Analysis of find2() and merge3()

• Case study: n calls to find and at most N-1 calls to merge (n is comparable to N).
• Function find2 takes logarithmic time: Θ(log N) in the worst case, since the tree height is at most ⌊log N⌋.
• Procedure merge3 takes constant time: Θ(1)
• Total: n·Θ(log N) + (N-1)·Θ(1) = Θ(n log n)

Some improvement.

Path Compression in find3()

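• Intuitive explanation
  – The more children fan out from each node, the lower the height of the tree.

(Figure: a tree rooted at 6, shown before and after Find3(20). After the call, the nodes on the path from 20 up to the root are attached directly to the root 6.)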

Solution 3: find3()

Function find3(x)
  r <- x
  while set[r] != r do        (first traverse of the path: find the root)
    r <- set[r]
  i <- x
  while i != r do             (second traverse of the path: connect nodes on the path to the root)
    j <- set[i]
    set[i] <- r
    i <- j
  return r
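The two passes of find3 can be sketched in Python as below. The six-node chain used as input is an assumed example (the worst case from the earlier slide), not the tree in the figure, and index 0 is again left unused.

```python
def find3(parent, x):
    """Find the root of x, then point every node on the path at the root."""
    r = x
    while parent[r] != r:     # first pass: locate the root
        r = parent[r]
    i = x
    while i != r:             # second pass: compress the path
        j = parent[i]         # remember the old parent before overwriting
        parent[i] = r
        i = j
    return r


parent = [0, 1, 1, 2, 3, 4, 5]   # chain 1 <- 2 <- 3 <- 4 <- 5 <- 6
print(find3(parent, 6))  # 1
print(parent[1:])        # [1, 1, 1, 1, 1, 1]
```

After one call on the deepest node, every node on the path is a direct child of the root, so later finds on those nodes follow a single link.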

Performance Analysis of find3() and merge3()

• Case study: n calls to find and at most N-1 calls to merge (n is comparable to N).
• Function find3 takes little more than constant time.
• Procedure merge3 takes constant time: Θ(1)
• Total: close to Θ(n)

Best one!

Summary

• find1() and merge1(): best for find, worst for merge (height = 1, always).
• find2() and merge2(): best for merge, worst for find (height = N, worst case).
• Mixing the above, find2() and merge3(): height = lg N, worst case.
• Mixing the above, find3() and merge3(): height close to 1. Best for both.