Transcript Greedy Algorithms - Computer Science
Analysis & Design of Algorithms (CSCE 321) Prof. Amr Goneid
Department of Computer Science, AUC
Part 8. Greedy Algorithms
Prof. Amr Goneid, AUC 1
Greedy Algorithms

Microsoft Interview example from: http://www.cs.pitt.edu/~kirk/cs1510/

Outline:
The General Method
Continuous Knapsack Problem
Optimal Merge Patterns
1. Greedy Algorithms

Methodology:
Start with a solution to a small subproblem.
Build up to the whole problem.
Make choices that look good in the short term, though not necessarily in the long term.
Greedy Algorithms

Disadvantages:
They do not always work; short-term choices may be disastrous in the long term.
Correctness is hard to prove.

Advantages:
When they work, they work fast.
Simple and easy to implement.
2. The General Method

Let a[ ] be an array of elements that may contribute to a solution. Let S be a solution.

Greedy (a[ ], n)
{
  S = empty;
  for each element (i) from a[ ], i = 1:n
  {
    x = Select (a, i);
    if ( Feasible (S, x) ) S = Union (S, x);
  }
  return S;
}
The General Method (continued)

Select:
Selects an element from a[ ] and removes it. Selection is optimized to satisfy an objective function.

Feasible:
True if the selected value can be included in the solution vector, False otherwise.

Union:
Combines the value with the solution and updates the objective function.
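As a minimal sketch (not part of the original slides), the general method can be written in Python with Select, Feasible, and Union supplied as functions; the names follow the pseudocode, and the small driving example at the end is purely hypothetical:

```python
def greedy(a, select, feasible, union):
    """Generic greedy skeleton mirroring the general method."""
    a = list(a)          # local copy; select removes elements from it
    s = []               # the solution being built up
    for _ in range(len(a)):
        x = select(a)            # pick (and remove) the locally best element
        if feasible(s, x):       # can x be added without violating constraints?
            s = union(s, x)      # commit x to the solution
    return s

# Hypothetical instance: greedily pick numbers that keep the running
# sum <= 10, always trying the largest remaining number first.
result = greedy(
    [7, 2, 5, 3],
    select=lambda a: a.pop(a.index(max(a))),
    feasible=lambda s, x: sum(s) + x <= 10,
    union=lambda s, x: s + [x],
)
# result == [7, 3]
```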
3. Continuous Knapsack Problem
Continuous Knapsack Problem

Environment:
A number of objects 1, 2, ..., n.
Object (i) has total weight w_i and total profit p_i. The fraction taken of object (i) is continuous: 0 <= x_i <= 1, for 1 <= i <= n.
A knapsack of capacity m.
The Problem

Problem Statement:
For n objects with weights w_i and profits p_i, obtain the set of fractions of objects x_i which will maximize the total profit without exceeding a total weight m.

Formally:
Obtain the set X = (x_1, x_2, ..., x_n) that will

maximize  Σ_{1<=i<=n} p_i x_i

subject to the constraints:

Σ_{1<=i<=n} w_i x_i <= m ,   0 <= x_i <= 1 ,   1 <= i <= n
Optimal Solution

Feasible Solution:
A solution satisfying the constraints.

Optimal Solution:
A feasible solution that maximizes the profit.

Lemma 1:
If Σ_{1<=i<=n} w_i <= m, then x_i = 1 for all i is optimal.

Lemma 2:
An optimal solution will give Σ_{1<=i<=n} w_i x_i = m.
Greedy Algorithm

To maximize profit, choose the highest p_i first.
Also choose the highest x_i, i.e., the smallest w_i first.
In other words, let us define the "value" of an object (i) to be the ratio v_i = p_i / w_i, and choose first the object with the highest value v_i.
Algorithm

GreedyKnapsack ( p[ ], w[ ], m, n, x[ ] )
{
  insert indices (i) of items in a maximum heap on value v_i = p_i / w_i;
  zero the vector x;
  Rem = m;
  for k = 1..n
  {
    remove top of heap to get index (i);
    if (w[i] > Rem) then break;
    x[i] = 1.0;
    Rem = Rem - w[i];
  }
  if (k <= n) x[i] = Rem / w[i];
}  // T(n) = O(n log n)
Example

n = 3 objects, m = 20
P = (25, 24, 15), W = (18, 15, 10), V = (1.39, 1.6, 1.5)
Objects in decreasing order of V are {2, 3, 1}.
Set X = (0, 0, 0) and Rem = m = 20.
k = 1: choose object i = 2: w_2 < Rem, so set x_2 = 1, w_2 x_2 = 15, Rem = 5.
k = 2: choose object i = 3: w_3 > Rem, so break.
Since k <= n, x_3 = Rem / w_3 = 0.5.

Optimal solution is X = (0, 1.0, 0.5).
Total profit is Σ_{1<=i<=n} p_i x_i = 31.5 and total weight is Σ_{1<=i<=n} w_i x_i = m = 20.
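The algorithm above can be sketched as runnable Python. Note that heapq provides only a minimum heap, so the values v_i are negated to simulate the maximum heap of the pseudocode:

```python
import heapq

def greedy_knapsack(p, w, m):
    """Continuous knapsack: return the fractions x maximizing total profit."""
    n = len(p)
    heap = [(-p[i] / w[i], i) for i in range(n)]  # negate for a max-heap on v_i
    heapq.heapify(heap)
    x = [0.0] * n
    rem = m                              # remaining knapsack capacity
    while heap:
        _, i = heapq.heappop(heap)       # object with the highest value v_i
        if w[i] > rem:
            x[i] = rem / w[i]            # take only a fraction of this object
            break
        x[i] = 1.0                       # take the whole object
        rem -= w[i]
    return x

# The slides' example: n = 3, m = 20
x = greedy_knapsack([25, 24, 15], [18, 15, 10], 20)
# x == [0.0, 1.0, 0.5], total profit = 24 + 7.5 = 31.5
```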
4. Optimal Merge Patterns
(a) Definitions

Binary Merge Tree: A binary tree with external nodes representing entities and internal nodes representing merges of these entities.

Optimal Binary Merge Tree: The weighted sum of the paths from the root to the external nodes is optimal (e.g., minimum). Assuming that node (i) contributes to the cost by p_i and the path from the root to that node has length L_i, optimality requires a pattern that minimizes

L = Σ_{i=1}^{n} p_i L_i
Optimal Binary Merge Tree

If the items {A, B, C} contribute to the merge cost by P_A, P_B, P_C, respectively, then the three different merge patterns {{A,B},C}, {A,{B,C}} and {{A,C},B} will cost:

P_1 = 2(P_A + P_B) + P_C
P_2 = P_A + 2(P_B + P_C)
P_3 = 2P_A + P_B + 2P_C

Which of these merge patterns is optimal?
(b) Optimal Merging of Lists

Lists {A, B, C} have lengths 30, 25, 10, respectively. The cost of merging two lists of lengths n and m is n + m. The three different merge patterns will cost:

P_1 = 2(30 + 25) + 10 = 120
P_2 = 30 + 2(25 + 10) = 100
P_3 = 25 + 2(30 + 10) = 105

P_2 is optimal, so the merge order is {{B,C},A}.
The Greedy Method

Insert the lists and their lengths in a minimum heap of lengths.
Repeat:
  Remove the two lowest-length lists (p_i, p_j) from the heap.
  Merge the lists with lengths (p_i, p_j) to form a new list with length p_ij = p_i + p_j.
  Insert p_ij and its list into the heap.
until all lists are merged into one final list.

Trace for (A: 30, B: 25, C: 10): merge C and B into BC (length 35), then merge BC and A into BCA (length 65).
The Greedy Method

Notice that lists B (25 elements) and C (10 elements) have each been merged (moved) twice, while list A (30 elements) has been merged (moved) only once.
Hence the total number of element moves is 2(25) + 2(10) + 30 = 100.
This is optimal among the merge patterns.
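The greedy merge method can be sketched directly with Python's heapq (this code is an illustration, not from the slides):

```python
import heapq

def optimal_merge_cost(lengths):
    """Total element moves for the greedy (optimal) merge pattern.

    Repeatedly merges the two shortest remaining lists, as in the
    greedy method above; each merge of lengths a and b costs a + b moves.
    """
    heap = list(lengths)
    heapq.heapify(heap)                 # minimum heap of list lengths
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)         # the two lowest lengths
        b = heapq.heappop(heap)
        merged = a + b                  # cost of this merge = a + b moves
        total += merged
        heapq.heappush(heap, merged)
    return total

print(optimal_merge_cost([30, 25, 10]))   # the slides' lists A, B, C
```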
(c) Huffman Coding Terminology

Symbol: A one-to-one representation of a single entity.
Alphabet: A finite set of symbols.
Message: A sequence of symbols.
Encoding: Translating symbols to a string of bits.
Decoding: The reverse.
Example: Coding Tree for 4-Symbol Alphabet (a, b, c, d)

Encoding: a -> 00, b -> 01, c -> 10, d -> 11.
The coding tree has root (abcd) with left child (ab) on branch 0 and right child (cd) on branch 1; the leaves are a, b (under ab) and c, d (under cd).
Decoding: 0110001100 -> b c a d a.
This is fixed-length coding.
Coding Efficiency & Redundancy

L_i = length of the path from the root to symbol (i) = number of bits representing that symbol.
P_i = probability of occurrence of symbol (i) in the message.
n = size of the alphabet.
<L> = Average Symbol Length = Σ_{i=1}^{n} P_i L_i  bits/symbol (bps)

For fixed-length coding, L_i = L = constant, so <L> = L (bps).
Is this optimal (minimum)? Not necessarily.
Coding Efficiency & Redundancy

The absolute minimum <L> in a message is called the Entropy. The concept of entropy as a measure of the average content of information in a message was introduced by Claude Shannon (1948).
Coding Efficiency & Redundancy

Shannon's entropy represents an absolute limit on the best possible lossless compression of any communication. It is computed as:

H = - Σ_{i=1}^{n} P_i log2 P_i = Σ_{i=1}^{n} P_i log2 (1 / P_i)   (bps)
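The entropy formula above translates directly into a few lines of Python (a sketch added for illustration):

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol (bps)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The slides' 4-symbol alphabet with P = {0.5, 0.25, 0.125, 0.125}:
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bps
```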
Coding Efficiency & Redundancy

Coding Efficiency:  η = H / <L> ,  0 <= η <= 1
Coding Redundancy:  R = 1 - η ,  0 <= R <= 1

η = 1 only when the actual <L> reaches the entropy H.
Example: Fixed-Length Coding

4-symbol alphabet (a, b, c, d). All symbols have the same length L = 2 bits.
Message: abbcaada

Symbol (i) | p_i   | -log2 p_i | -p_i log2 p_i | code | L_i
a          | 0.5   | 1         | 0.5           | 00   | 2
b          | 0.25  | 2         | 0.5           | 01   | 2
c          | 0.125 | 3         | 0.375         | 10   | 2
d          | 0.125 | 3         | 0.375         | 11   | 2

<L> = 2 (bps),  H = 1.75 (bps)
Example

Entropy H = 0.5 + 0.5 + 0.375 + 0.375 = 1.75 (bps)
Coding Efficiency η = H / <L> = 1.75 / 2 = 0.875
Coding Redundancy R = 1 - 0.875 = 0.125
This is not optimal.
Result

Fixed-length coding is optimal (perfect) only when all symbol probabilities are equal.
To prove this: with n = 2^m symbols, L = m bits. If all probabilities are equal,

p_i = 1/n = 2^(-m) ,  so  -log2 p_i = m

H = - Σ_{i=1}^{n} p_i log2 p_i = -(1/n) Σ_{i=1}^{n} log2 p_i = m

Hence  η = H / L = 1.
Variable Length Coding (Huffman Coding)

The problem: given a set of symbols and their probabilities, find a set of binary codewords that minimizes the average length of the symbols.
Variable Length Coding (Huffman Coding)

Formally:
Input: A message M(A, P) with a symbol alphabet A = {a_1, a_2, ..., a_n} of size (n) and a set of probabilities for the symbols P = {p_1, p_2, ..., p_n}.
Output: A set of binary codewords C = {c_1, c_2, ..., c_n} with bit lengths L = {L_1, L_2, ..., L_n}.
Condition: minimize  <L> = Σ_{i=1}^{n} p_i L_i
Variable Length Coding (Huffman Coding)

To achieve optimality, we use optimal binary merge trees to code symbols of unequal probabilities.
Huffman Coding: more frequent symbols occur nearer to the root (shorter code lengths); less frequent symbols occur at deeper levels (longer code lengths).
The Greedy Method

Store each symbol in a parentless node of a binary tree.
Insert the symbols and their probabilities in a minimum heap of probabilities.
Repeat:
  Remove the lowest two probabilities (p_i, p_j) from the heap.
  Merge the symbols with (p_i, p_j) to form a new symbol (a_i a_j) with probability p_ij = p_i + p_j.
  Store symbol (a_i a_j) in a parentless node with the two children a_i and a_j.
  Insert p_ij and its symbol into the heap.
until all symbols are merged into one final alphabet (the root).
Trace the path from the root to each leaf (symbol) to form the bit string for that symbol, concatenating "0" for a left branch and "1" for a right branch.
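The greedy method can be sketched in Python with heapq. This is an illustration added here, not the slides' code; the exact branch labeling depends on tie-breaking in the heap, so individual codewords may differ from a hand-drawn tree, but the code lengths are the same:

```python
import heapq

def huffman_codes(probs):
    """Huffman codewords for symbols 0..n-1 with the given probabilities.

    Repeatedly merges the two least probable nodes, as in the greedy
    method above. Heap entries carry an integer tiebreak so that tuples
    with equal probabilities still compare without error.
    """
    n = len(probs)
    heap = [(p, i, i) for i, p in enumerate(probs)]   # (prob, tiebreak, node)
    heapq.heapify(heap)
    tree = {}            # internal node -> (left_child, right_child)
    count = n
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # the two lowest probabilities
        p2, _, right = heapq.heappop(heap)
        tree[count] = (left, right)          # new parentless merged node
        heapq.heappush(heap, (p1 + p2, count, count))
        count += 1
    codes = [''] * n
    def walk(node, code):
        if node < n:                 # leaf: an original symbol
            codes[node] = code
            return
        left, right = tree[node]
        walk(left, code + '0')       # "0" for a left branch
        walk(right, code + '1')      # "1" for a right branch
    walk(count - 1, '')              # trace paths from the root
    return codes

# Example (1) from the slides: P = {0.5, 0.25, 0.125, 0.125}
print(huffman_codes([0.5, 0.25, 0.125, 0.125]))   # ['0', '10', '110', '111']
```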
Example (1):

4-symbol alphabet A = {a, b, c, d} of size (4).
Message M(A, P): abbcaada, P = {0.5, 0.25, 0.125, 0.125}, H = 1.75

Symbol (i) | p_i   | -log2 p_i | -p_i log2 p_i
a          | 0.5   | 1         | 0.5
b          | 0.25  | 2         | 0.5
c          | 0.125 | 3         | 0.375
d          | 0.125 | 3         | 0.375
Building the Optimal Merge Table

s_i  p_i    |  s_i  p_i   |  s_i  p_i  |  s_i   p_i
d    0.125  |  cd   0.25  |  bcd  0.5  |  abcd  1.0
c    0.125  |  b    0.25  |  a    0.5  |
b    0.25   |  a    0.5   |            |
a    0.5    |             |            |
Optimal Merge Tree for Example (1)

a (50%), b (25%), c (12.5%), d (12.5%)

The tree is built bottom-up: first merge c and d into (cd), then (cd) and b into (bcd), then (bcd) and a into (abcd), the root. Each left branch is labeled 0 and each right branch 1. Tracing the root-to-leaf paths gives:

a_i | c_i | L_i (bits)
a   | 0   | 1
b   | 10  | 2
c   | 110 | 3
d   | 111 | 3
Coding Efficiency for Example (1)

<L> = 1 × 0.5 + 2 × 0.25 + 3 × 0.125 + 3 × 0.125 = 1.75 (bps)
H = 0.5 + 0.5 + 0.375 + 0.375 = 1.75 (bps)
η = H / <L> = 1.75 / 1.75 = 1.00 ,  R = 0.0

Notice that symbols exist only at leaves, i.e., no symbol's code is the prefix of another symbol's code. This is why the method is also called "prefix coding".
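The prefix property is what makes left-to-right decoding unambiguous. A small illustrative sketch (added here, using Example (1)'s code table):

```python
def decode(bits, codes):
    """Decode a bit string using a prefix-free code table (symbol -> code).

    Because no codeword is a prefix of another, scanning left to right
    and emitting a symbol at each complete match is unambiguous.
    """
    inverse = {c: s for s, c in codes.items()}   # code -> symbol
    out, cur = [], ''
    for b in bits:
        cur += b
        if cur in inverse:          # a complete codeword has been read
            out.append(inverse[cur])
            cur = ''
    return ''.join(out)

# Example (1)'s codes: a = 0, b = 10, c = 110, d = 111
print(decode("0101100", {'a': '0', 'b': '10', 'c': '110', 'd': '111'}))
# -> "abca"
```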
Analysis

The cost of the initial insertions into the minimum heap is O(n log n).
The repeat loop is executed (n - 1) times. In each iteration, the worst-case cost of removing the least two elements is 2 log n and of inserting the merged element is log n.
Hence, the complexity of the Huffman algorithm is O(n log n).
Example (2):

4-symbol alphabet A = {a, b, c, d} of size (4).
P = {0.4, 0.25, 0.18, 0.17}, H = 1.909

Symbol (i) | p_i  | -log2 p_i | -p_i log2 p_i
a          | 0.40 | 1.322     | 0.5288
b          | 0.25 | 2         | 0.5
c          | 0.18 | 2.474     | 0.4453
d          | 0.17 | 2.556     | 0.4345
Example (2): Merge Table

s_i  p_i   |  s_i  p_i   |  s_i  p_i   |  s_i   p_i
d    0.17  |  b    0.25  |  a    0.40  |  cdba  1.0
c    0.18  |  cd   0.35  |  cdb  0.60  |
b    0.25  |  a    0.40  |             |
a    0.40  |             |             |
Optimal Merge Tree for Example (2)

First merge c and d into (cd), then (cd) and b into (cdb), then (cdb) and a into (cdba), the root. Tracing the root-to-leaf paths gives:

a_i | c_i | L_i (bits)
a   | 1   | 1
b   | 01  | 2
c   | 001 | 3
d   | 000 | 3
Coding Efficiency for Example (2)

a (40%), b (25%), c (18%), d (17%)
<L> = 1 × 0.40 + 2 × 0.25 + 3 × 0.18 + 3 × 0.17 = 1.95 (bps)
η = H / <L> = 1.909 / 1.95 = 97.9% ,  R = 2.1%
Coding is optimal (97.9%) but not perfect.
Important result: perfect coding (η = 100%) can be achieved only for probability values of the form 2^(-m) (1/2, 1/4, 1/8, ... etc.)
File Compression

Variable-length codes can be used to compress files. Symbols are initially coded using ASCII (8-bit) fixed-length codes.
Steps:
1. Determine the probabilities of the symbols in the file.
2. Build the merge tree (or table).
3. Assign variable-length codes to the symbols.
4. Encode the symbols using the new codes.
5. Save the coded symbols in another file, together with the symbol code table.
The Compression Ratio = <L> / 8
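Steps 1 and the final ratio can be sketched as follows (an added illustration; the code-length table passed in is assumed to come from a Huffman tree built as in the examples above):

```python
from collections import Counter

def compression_ratio(message, code_lengths):
    """<L> / 8: average variable code length vs. 8-bit fixed-length ASCII.

    code_lengths maps each symbol to its variable-length code size in
    bits; here we use the lengths derived in Example (1).
    """
    freq = Counter(message)              # step 1: symbol probabilities
    n = len(message)
    avg = sum(freq[s] / n * code_lengths[s] for s in freq)   # <L> in bps
    return avg / 8

# Example (1)'s message with code lengths {a: 1, b: 2, c: 3, d: 3}:
print(compression_ratio("abbcaada", {'a': 1, 'b': 2, 'c': 3, 'd': 3}))
# -> 1.75 / 8 = 0.21875
```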
Huffman Coding Animations

For examples of animations of Huffman coding, see:
http://www.cs.pitt.edu/~kirk/cs1501/animations/Huffman.html
http://peter.bittner.it/tugraz/huffmancoding.html