Transcript MST

Data Structure & Algorithm
11 – Minimal Spanning Tree
JJCAO
Some material borrowed from Prof. Yoram Moses & Princeton COS 226
Weighted Graphs
G = (V, E), wt
wt: E → R
wt(G) = \sum_{(u,v) \in E[G]} wt(u, v)
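As a minimal sketch of the definition above, the weight of a graph is just the sum of its edge weights. The edge-list representation and the names graph_weight and example_edges are assumptions made for illustration, not part of the slides.

```python
# Hypothetical representation: a graph is a list of (u, v, weight) triples.
def graph_weight(edges):
    """Return wt(G): the sum of wt(u, v) over all edges (u, v) of G."""
    return sum(w for _, _, w in edges)


example_edges = [("a", "b", 4), ("b", "c", 2), ("a", "c", 7)]
total = graph_weight(example_edges)  # adds 4 + 2 + 7
```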
Sub-Graphs
Note: G' is not a spanning sub-graph of G
Minimum Spanning Tree
• A subgraph
• A tree
• Spans G
• Of minimal weight
MST Origin
Otakar Borůvka (1926).
• Electrical Power Company of Western Moravia in Brno.
• Most economical construction of an electrical power network.
• A concrete engineering problem that is now a cornerstone problem in combinatorial optimization.
MST describes arrangement of nuclei in the
epithelium for cancer research
http://www.bccrc.ca/ci/ta01_archlevel.html
Normal Consistency
[Hoppe et al. 1992]
• Based on angles between unsigned normals
• May produce errors on close-by surface sheets
MST is a fundamental problem with diverse applications
• Network design.
  – telephone, electrical, hydraulic, TV cable, computer, road
• Approximation algorithms for NP-hard problems.
  – traveling salesperson problem, Steiner tree
• Indirect applications.
  – max bottleneck paths
  – LDPC codes for error correction
  – image registration with Renyi entropy
  – learning salient features for real-time face verification
  – reducing data storage in sequencing amino acids in a protein
  – model locality of particle interactions in turbulent fluid flows
  – autoconfig protocol for Ethernet bridging to avoid cycles in a network
• Cluster analysis.
Minimum Spanning Tree on the Surface of a Sphere (5000 Vertices)
Minimum Spanning Tree
Input: a connected, undirected graph G with a weight function wt on the edges
Goal: find a minimum-weight spanning tree for G
Fact: if all edge weights are distinct, the MST is unique
Brute force: try all possible spanning trees
• problem 1: not so easy to implement
• problem 2: far too many of them
Ex: [Cayley, 1889]: V^{V-2} spanning trees on the complete graph on V vertices.
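To see why brute force is hopeless, Cayley's formula can be evaluated for a few sizes. This is only an illustrative sketch; the function name spanning_trees_of_complete_graph is made up here.

```python
def spanning_trees_of_complete_graph(v):
    """Cayley's formula: number of spanning trees of the complete graph on v labelled vertices."""
    if v < 2:
        raise ValueError("this sketch only handles v >= 2")
    return v ** (v - 2)


# Even modest graphs have astronomically many spanning trees.
counts = {v: spanning_trees_of_complete_graph(v) for v in (4, 10, 20)}
```

Already for 20 vertices the count has 24 decimal digits, so enumerating all spanning trees is not practical.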
Main algorithms of MST
1. Kruskal's algorithm
2. Prim's algorithm
3. …
Algorithms 1 and 2 both run in O(E lg V) using ordinary binary heaps
Both are greedy algorithms that reach a globally optimal solution
Two Greedy Algorithms
• Kruskal's algorithm. Consider edges in ascending order of cost. Add the next edge to T unless doing so would create a cycle.
• Prim's algorithm. Start with any vertex s and greedily grow a tree T from s. At each step, add the cheapest edge to T that has exactly one endpoint in T.
"Greed is good. Greed is right. Greed works. Greed clarifies, cuts through, and captures the essence of the evolutionary spirit."
- Gordon Gekko
Cycle Property
• Let T be a minimum spanning tree of a weighted graph G
• Let e be an edge of G that is not in T, and let C be the cycle formed by e with T
• For every edge f of C, weight(f) ≤ weight(e)
Proof:
• By contradiction
• If weight(f) > weight(e), we can get a spanning tree of smaller weight by replacing f with e
Edges cross the cut
Cut (Partition) Property
Lemma:
Let G = (V, E) and X ⊂ V.
If e is a lightest edge connecting X and V-X, then e appears in some MST of G.
Proof:
• Let T be an MST of G
• If T does not contain e, consider the cycle C formed by e with T, and let f be an edge of C other than e that crosses the partition
• By the cycle property, weight(f) ≤ weight(e)
• Since e is a lightest edge across the partition, weight(e) ≤ weight(f)
• Thus, weight(f) = weight(e)
• We obtain another MST by replacing f with e
Locally optimal choice (of lightest edges) ⇒ globally optimal solution (MST)
Disjoint Set ADT
An application of disjoint-set data structures
Linked List Implementation
Union in Linked List Implementation
Worst-Case Example
• n: the number of MAKE-SET operations
• m: the total number of MAKE-SET, UNION, and FIND-SET operations
• We can easily construct a sequence of m operations on n objects that requires Θ(n^2) time
Weighted Union Heuristic
• Each set id includes the length of the list
• In Union, append the shorter list at the end of the longer one
Theorem: Performing m > n operations takes O(m + n lg n) time
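The weighted union heuristic for the list representation might be sketched as follows. Python lists stand in for the linked lists of the slides, and the class name ListDisjointSets and its fields are assumptions made for this example.

```python
class ListDisjointSets:
    """Disjoint sets where every element stores its representative directly."""

    def __init__(self):
        self.rep = {}      # element -> representative of its set
        self.members = {}  # representative -> list of elements in that set

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find_set(self, x):
        return self.rep[x]  # a single lookup

    def union(self, x, y):
        a, b = self.rep[x], self.rep[y]
        if a == b:
            return a
        # Weighted union: always relabel the shorter list.
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a
        for z in self.members[b]:
            self.rep[z] = a
        self.members[a].extend(self.members.pop(b))
        return a
```

An element is relabelled only when its list is the shorter one, so its list at least doubles each time; that is where the n lg n term in the theorem comes from.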
Simple Forest Implementation
Find-Set(x) - follow pointers from x up to the root
Union(c,f) - make c a child of f and return f
Worst-Case Example
[Figure: the tree degenerates into a single chain of nodes n, …, 3, 2, 1]
Weighted Union Heuristic
• Each node includes a weight field: weight = # elements in the sub-tree rooted at the node
• Find-Set(x) - as before, O(depth(x))
• Union(x,y) - always attach the smaller tree below the root of the larger tree, O(1)
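A possible sketch of the forest with the weighted union heuristic is given below. The class name Forest, the array layout, and the depth helper are assumptions for illustration. Unlike the O(1) Union on the slide, which takes two roots, this union first looks up the roots of its arguments.

```python
class Forest:
    """Disjoint-set forest over elements 0..n-1 with union by size."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.size = [1] * n           # number of elements in the subtree of each root

    def find_set(self, x):
        while self.parent[x] != x:    # follow pointers up to the root
            x = self.parent[x]
        return x

    def depth(self, x):
        d = 0
        while self.parent[x] != x:
            x = self.parent[x]
            d += 1
        return d

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return rx
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx           # rx is now the root of the larger tree
        self.parent[ry] = rx          # attach the smaller tree below it
        self.size[rx] += self.size[ry]
        return rx
```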
Weighted Union
Theorem: Any k-node tree created using the weighted-union heuristic has height ≤ lg(k)
Proof: By induction on k
Find-Set Running Time: O(lg n)
2nd heuristic: Path Compression
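Path compression can be sketched as a two-pass Find-Set: first locate the root, then point every node on the path directly at it. The parent-array representation and the function name find_set are assumptions for this example.

```python
def find_set(parent, x):
    """Return the root of x and compress the path from x to that root."""
    root = x
    while parent[root] != root:   # first pass: locate the root
        root = parent[root]
    while parent[x] != root:      # second pass: re-point nodes at the root
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


# A chain 4 -> 3 -> 2 -> 1 -> 0 becomes a star after one find.
chain = [0, 0, 1, 2, 3]
root_of_4 = find_set(chain, 4)
```

After the call, every node that was on the path is a direct child of the root, so later Find-Set calls on those nodes take a single step.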
The function lg* n
lg* n = the number of times we have to take log2 repeatedly, starting from n, to reach 1
lg* 2 = 1
lg* 2^2 = 2
lg* 2^16 = lg* 65536 = 4
=> lg* n ≤ 5 for all practical values of n
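The iterated logarithm described above (usually written lg* n) can be computed directly by counting how often log2 must be applied. This is a small sketch; the function name lg_star is an assumption.

```python
import math


def lg_star(n):
    """Count how many times log2 must be applied to n before the value drops to 1 or below."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


values = [lg_star(2), lg_star(4), lg_star(16), lg_star(65536)]
```

The function grows extremely slowly: an input needs on the order of 2^65536 before the count reaches 5.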
Theorem (Tarjan): If
S = a sequence of O(n) Unions and Find-Sets
The worst-case time for S with
– Weighted Unions, and
– Path Compressions
is O(n lg* n)
The average time is O(lg* n) per operation
(compared with O(lg n) per operation in the Linked List Implementation)
Theorem (Tarjan): Let
S = a sequence of O(n) Unions and Find-Sets
The worst-case time for S with
– Weighted Unions, and
– Path Compressions
is O(n α(n))
The average time is O(α(n)) per operation; α(n) < 5 in practice
Connected Components using Union-Find
Reminder:
• Every node v is connected to itself
• If u and v are in the same connected component, then v is connected to u and u is connected to v
• Connected components form a partition of the nodes and so are disjoint
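One way to compute connected components with union-find is sketched below. The function name connected_components and the vertex numbering 0..n-1 are assumptions; the find step uses path halving, a simple variant of path compression, and the union is unweighted to keep the sketch short.

```python
def connected_components(n, edges):
    """Return the connected components of an undirected graph on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


components = connected_components(6, [(0, 1), (1, 2), (3, 4)])
```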
MST-Kruskal
Kruskal's algorithm for the minimum spanning tree works by considering edges in order of increasing cost and adding to the tree those edges that connect two previously disjoint components.
The minimum spanning tree describes the cheapest network that connects all of a given set of vertices.
[Figure: Kruskal's algorithm on a graph of distances between 128 North American cities]
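A minimal sketch of Kruskal's algorithm on top of a union-find structure might look like this. The function name kruskal, the (weight, u, v) edge format, and the small example graph are assumptions for illustration, not the pseudocode from the slides.

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) on vertices 0..n-1. Returns (total_weight, tree_edges)."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, tree = 0, []
    for w, u, v in sorted(edges):          # ascending order of cost
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                       # edge would close a cycle
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru                    # weighted union
        size[ru] += size[rv]
        tree.append((u, v, w))
        total += w
    return total, tree


example = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
mst_weight, mst_edges = kruskal(4, example)
```

On this example the edge of weight 3 is skipped because vertices 0 and 2 are already connected through vertex 1.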
Example
MST-Kruskal
Running Time: O(E lg V), dominated by sorting the edges
MST-Prim-Jarník
Example
Decrease_key(v,x)
We use a min-heap to hold the edges in G-T
How can we implement Decrease_key(v,x)?
Simple solution:
• Change the value for v to x
• Follow the strategy of Heap_insert from v upwards (sift up)
• Cost: O(lg V)
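The simple solution above can be sketched with an array-based binary min-heap that also remembers where each item sits. The class MinHeap and its method names are assumptions for this example; only insert, decrease_key, and a peek at the minimum are shown.

```python
class MinHeap:
    """Binary min-heap with a position map so decrease_key can locate an item."""

    def __init__(self):
        self.heap = []  # entries are [key, item]
        self.pos = {}   # item -> index in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            p = (i - 1) // 2
            if self.heap[p][0] <= self.heap[i][0]:
                break
            self._swap(i, p)
            i = p

    def insert(self, item, key):
        self.heap.append([key, item])
        self.pos[item] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def decrease_key(self, item, key):
        i = self.pos[item]
        if key > self.heap[i][0]:
            raise ValueError("new key is larger than the current key")
        self.heap[i][0] = key  # change the value ...
        self._sift_up(i)       # ... then restore heap order upwards

    def peek_min(self):
        return self.heap[0][1]
```

The sift-up walks at most the height of the heap, which gives the logarithmic cost stated above.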
MST-Prim
Running Time: O(E lg V) with a binary min-heap
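A compact sketch of Prim's algorithm follows. Note the swapped-in technique: instead of Decrease_key it uses Python's heapq with lazy deletion (stale heap entries are skipped when popped), which keeps the same O(E lg V) bound. The function name prim and the adjacency-list format are assumptions for illustration.

```python
import heapq


def prim(adj, s):
    """adj: dict vertex -> list of (weight, neighbor). Returns (total_weight, tree_edges)."""
    in_tree = {s}
    heap = [(w, s, v) for w, v in adj[s]]
    heapq.heapify(heap)
    total, tree = 0, []
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)      # cheapest candidate edge
        if v in in_tree:
            continue                       # stale entry: both endpoints already in T
        in_tree.add(v)
        tree.append((u, v, w))
        total += w
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return total, tree


# Small example graph on vertices 0..3, given as (weight, u, v).
weighted_edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
adjacency = {v: [] for v in range(4)}
for w, u, v in weighted_edges:
    adjacency[u].append((w, v))
    adjacency[v].append((w, u))
prim_weight, prim_edges = prim(adjacency, 0)
```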
Does a linear-time MST algorithm exist?
Euclidean MST
Given N points in the plane, find an MST connecting them, where the distance between each pair of points is their Euclidean distance.
Brute force. Compute ~ N^2/2 distances and run Prim's algorithm.
Ingenuity. Exploit geometry and do it in ~ c N lg N.
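The brute-force approach might be sketched with the array-based (dense graph) version of Prim's algorithm, which evaluates each pairwise distance exactly once. The function name euclidean_mst and the sample points are assumptions; math.dist requires Python 3.8 or later.

```python
import math


def euclidean_mst(points):
    """Return (total_length, tree_edges) of an MST over 2D points, edges as index pairs."""
    n = len(points)
    if n == 0:
        return 0.0, []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from the tree to each point
    link = [None] * n      # tree vertex that realises best[v]
    best[0] = 0.0
    total, tree = 0.0, []
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        if link[u] is not None:
            tree.append((link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return total, tree


sample_points = [(0, 0), (1, 0), (1, 1), (5, 1)]
length, edges_used = euclidean_mst(sample_points)
```

This takes time quadratic in N; the geometric N lg N approach mentioned above is not shown here.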
Scientific application: clustering
k-clustering. Divide a set of objects into k coherent groups.
Distance function. Numeric value specifying "closeness" of two objects.
Goal. Divide into clusters so that objects in different clusters are far apart.
[Figure: outbreak of cholera deaths in London in the 1850s (Nina Mishra)]
Applications.
• Routing in mobile ad hoc networks.
• Document categorization for web search.
• Similarity searching in medical image databases.
• Skycat: cluster 10^9 sky objects into stars, quasars, galaxies.
Single-link clustering
Single link. Distance between two clusters equals the distance between the two closest objects (one in each cluster).
Single-link clustering. Given an integer k, find a k-clustering that maximizes the distance between the two closest clusters.
Single-link clustering algorithm
"Well-known" algorithm for single-link clustering:
• Form V clusters of one object each.
• Find the closest pair of objects such that each object is in a different cluster, and merge the two clusters.
• Repeat until there are exactly k clusters.
Observation. This is Kruskal's algorithm (stop when there are k connected components).
Alternate solution. Run Prim's algorithm and delete the k-1 max weight edges.
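The observation above can be sketched as Kruskal's algorithm with an early stop. The function name single_link_clusters, the use of 2D points with Euclidean distance, and the sample data are assumptions for illustration.

```python
import math
from itertools import combinations


def single_link_clusters(points, k):
    """Merge the closest pair of clusters until k clusters remain; return lists of point indices."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All pairs of objects, closest first (Kruskal's edge order).
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: math.dist(points[p[0]], points[p[1]]))
    components = n
    for u, v in pairs:
        if components <= k:
            break                          # stop at k connected components
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


sample = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
three_clusters = single_link_clusters(sample, 3)
```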
Dendrogram
Tree diagram that illustrates the arrangement of clusters.
Dendrogram of human cancers
Tumors in similar tissues cluster together