Greedy Algorithms - Computer Science


Analysis & Design of Algorithms (CSCE 321) Prof. Amr Goneid

Department of Computer Science, AUC

Part 8. Greedy Algorithms


Greedy Algorithms

[Slide: Microsoft Interview example, from http://www.cs.pitt.edu/~kirk/cs1510/]

Greedy Algorithms

- Greedy Algorithms
- The General Method
- Continuous Knapsack Problem
- Optimal Merge Patterns

1. Greedy Algorithms

Methodology:

- Start with a solution to a small subproblem.
- Build it up to a solution of the whole problem.
- Make choices that look good in the short term, but not necessarily in the long term.

Greedy Algorithms

Disadvantages:

- They do not always work.
- Short-term choices may be disastrous in the long run.
- Correctness is hard to prove.

Advantages:

- When they work, they work fast.
- They are simple and easy to implement.

2. The General Method

Let a[ ] be an array of elements that may contribute to a solution, and let S be a solution. The general template is:

Greedy(a[ ], n)
{
    S = empty;
    for each element (i) from a[ ], i = 1 to n
    {
        x = Select(a, i);
        if (Feasible(S, x))
            S = Union(S, x);
    }
    return S;
}

The General Method (continued)

Select:

Selects an element from a[ ] and removes it. Selection is optimized to satisfy an objective function.

Feasible:

True if selected value can be included in the solution vector, False otherwise.

Union:

Combines value with solution and updates objective function.

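To make the template concrete, here is a minimal C++ sketch of the same skeleton. It is an illustration only: the int candidate type, the two function parameters, and sorting as the selection policy are assumptions, not something fixed by the slides.

#include <algorithm>
#include <vector>

// Generic greedy skeleton: order the candidates, then commit to each one
// that keeps the partial solution feasible. The candidate type and the two
// function-pointer parameters are illustrative assumptions.
std::vector<int> greedy(std::vector<int> a,
                        bool (*feasible)(const std::vector<int>&, int),
                        bool (*better)(int, int))
{
    std::sort(a.begin(), a.end(), better);   // Select: best candidate first
    std::vector<int> S;                      // S = empty
    for (int x : a)                          // examine each element once
        if (feasible(S, x))                  // Feasible(S, x)
            S.push_back(x);                  // Union(S, x)
    return S;
}

Each element is examined once after an O(n log n) ordering step; the knapsack and merge algorithms below follow exactly this shape.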

3. Continuous Knapsack Problem


Continuous Knapsack Problem

Environment:

- A number of objects (i), 1 ≤ i ≤ n, each with total weight w_i and total profit p_i.
- The fraction x_i of object (i) taken is continuous: 0 ≤ x_i ≤ 1.
- A knapsack of capacity m.

The problem

Problem Statement:

For n objects with weights w_i and profits p_i, find the set of fractions of objects x_i that maximizes the total profit without exceeding the total weight m.

Formally:

Obtain the set X = (x_1, x_2, …, x_n) that will maximize

$$\sum_{1 \le i \le n} p_i x_i$$

subject to the constraints

$$\sum_{1 \le i \le n} w_i x_i \le m, \qquad 0 \le x_i \le 1, \quad 1 \le i \le n$$

Optimal Solution

Feasible Solution:

A solution that satisfies the constraints.

Optimal Solution:

A feasible solution that maximizes the profit.

Lemma 1:

If $\sum_{1 \le i \le n} w_i \le m$, then $x_i = 1$ for all $i$ is optimal.

Lemma 2:

An optimal solution will give $\sum_{1 \le i \le n} w_i x_i = m$.

Greedy Algorithm

- To maximize profit, choose the highest p_i first.
- Also choose the highest x_i, i.e., the smallest w_i first.
- In other words, define the "value" of an object (i) as the ratio v_i = p_i / w_i, and choose first the object with the highest v_i.

Algorithm

GreedyKnapsack(p[ ], w[ ], m, n, x[ ])
{
    insert the indices (i) of the items into a maximum heap keyed on value v_i = p_i / w_i;
    zero the vector x;
    Rem = m;
    for k = 1 to n
    {
        remove the top of the heap to get index (i);
        if (w[i] > Rem) then break;
        x[i] = 1.0;
        Rem = Rem - w[i];
    }
    if (k <= n) x[i] = Rem / w[i];
}
// T(n) = O(n log n)
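A runnable C++ rendering of the same algorithm, offered as a sketch (sorting indices by value replaces the maximum heap, which keeps the O(n log n) bound; the names are illustrative):

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Continuous (fractional) knapsack with the greedy rule v_i = p_i / w_i.
std::vector<double> greedyKnapsack(const std::vector<double>& p,
                                   const std::vector<double>& w,
                                   double m)
{
    const std::size_t n = p.size();
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    // Sort indices by decreasing "value" v_i = p_i / w_i
    std::sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
        return p[a] / w[a] > p[b] / w[b];
    });

    std::vector<double> x(n, 0.0);   // fractions taken, initially zero
    double rem = m;                  // remaining capacity
    for (std::size_t i : idx) {
        if (w[i] > rem) {            // next object does not fit whole:
            x[i] = rem / w[i];       // take the fraction that fits, then stop
            break;
        }
        x[i] = 1.0;                  // take the whole object
        rem -= w[i];
    }
    return x;
}

For the data of the example below (P = (25, 24, 15), W = (18, 15, 10), m = 20), this returns X = (0, 1.0, 0.5).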

Example

- n = 3 objects, m = 20
- P = (25, 24, 15), W = (18, 15, 10), V = (1.39, 1.6, 1.5)
- Objects in decreasing order of V: {2, 3, 1}
- Set X = {0, 0, 0} and Rem = m = 20
- k = 1: choose object i = 2. w_2 < Rem, so set x_2 = 1, w_2 x_2 = 15, Rem = 5
- k = 2: choose object i = 3. w_3 > Rem, so break
- k <= n, so x_3 = Rem / w_3 = 0.5

Optimal solution: X = (0, 1.0, 0.5). Total profit $\sum_{1 \le i \le n} p_i x_i = 31.5$; total weight $\sum_{1 \le i \le n} w_i x_i = m = 20$.

4. Optimal Merge Patterns

(a) Definitions

Binary Merge Tree: A binary tree with external nodes representing entities and internal nodes representing merges of these entities.

Optimal Binary Merge Tree: The sum of path costs from the root to the external nodes is optimal (e.g., minimum). Assuming that node (i) contributes to the cost by p_i and the path from the root to that node has length L_i, optimality requires a pattern that minimizes

$$\sum_{i=1}^{n} p_i L_i$$

Optimal Binary Merge Tree

If the items {A, B, C} contribute to the merge cost by P_A, P_B, P_C, respectively, then the following three different patterns will cost:

[Three merge trees: (1) merge A and B first, then with C; (2) merge B and C first, then with A; (3) merge A and C first, then with B.]

P_1 = 2(P_A + P_B) + P_C
P_2 = P_A + 2(P_B + P_C)
P_3 = 2P_A + P_B + 2P_C

Which of these merge patterns is optimal?

(b) Optimal Merging of Lists

Lists {A, B, C} have lengths 30, 25, and 10, respectively. The cost of merging two lists of lengths n and m is n + m. The three merge patterns above will then cost:

P_1 = 2(30 + 25) + 10 = 120
P_2 = 30 + 2(25 + 10) = 100
P_3 = 25 + 2(30 + 10) = 105

P_2 is optimal, so the merge order is {{B,C},A}.

The Greedy Method

- Insert the lists and their lengths into a minimum heap keyed on length.
- Repeat until all lists are merged into one final list:
  - Remove the two lowest-length lists (p_i, p_j) from the heap.
  - Merge the lists with lengths (p_i, p_j) to form a new list of length p_ij = p_i + p_j.
  - Insert p_ij and its list into the heap.

Trace for {A: 30, B: 25, C: 10}: merge C (10) with B (25) to get BC (35), then merge BC (35) with A (30) to get BCA (65). A code sketch of this loop follows.
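The loop is a direct min-heap computation; here is a short C++ sketch of it (std::priority_queue with std::greater serves as the minimum heap; the function name is illustrative):

#include <functional>
#include <queue>
#include <vector>

// Optimal merge cost via a minimum heap of list lengths, as in the slides.
// Returns the total number of element moves.
long long optimalMergeCost(const std::vector<long long>& lengths)
{
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>> heap(lengths.begin(), lengths.end());
    long long total = 0;
    while (heap.size() > 1) {
        long long a = heap.top(); heap.pop();   // two lowest-length lists
        long long b = heap.top(); heap.pop();
        total += a + b;                         // cost of merging them
        heap.push(a + b);                       // merged list re-enters the heap
    }
    return total;
}

Called on the lengths above, optimalMergeCost({30, 25, 10}) returns 100, matching the trace.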

The Greedy Method (continued)

- Notice that both list B (25 elements) and list C (10 elements) have been merged (moved) twice, while list A (30 elements) has been merged (moved) only once.
- Hence the total number of element moves is 2(25) + 2(10) + 30 = 100.
- This is optimal among the merge patterns considered.


(c) Huffman Coding Terminology

- Symbol: A one-to-one representation of a single entity.
- Alphabet: A finite set of symbols.
- Message: A sequence of symbols.
- Encoding: Translating symbols to a string of bits.
- Decoding: The reverse.

Example: Coding Tree for 4-Symbol Alphabet (a,b,c,d)

Encoding: a → 00, b → 01, c → 10, d → 11

[Coding tree: the root (abcd) branches 0 to (ab) and 1 to (cd); (ab) branches 0 to a and 1 to b; (cd) branches 0 to c and 1 to d.]

Decoding: 0110001100 → b c a d a

This is fixed-length coding.

Coding Efficiency & Redundancy

- L_i = length of the path from the root to symbol (i) = the number of bits representing that symbol.
- P_i = probability of occurrence of symbol (i) in the message.
- n = size of the alphabet.
- <L> = average symbol length = $\sum_{i=1}^{n} P_i L_i$ bits/symbol (bps).

For fixed-length coding, L_i = L = constant, so <L> = L.

Is this optimal (minimum)? Not necessarily.

Coding Efficiency & Redundancy

- The absolute minimum <L> for a message is called the Entropy.
- The concept of entropy as a measure of the average content of information in a message was introduced by Claude Shannon (1948).

Coding Efficiency & Redundancy

Shannon's entropy H represents an absolute limit on the best possible lossless compression of any communication. It is computed as:

$$H = -\sum_{i=1}^{n} P_i \log_2 P_i = \sum_{i=1}^{n} P_i \log_2 \frac{1}{P_i} \quad (\text{bps})$$

Coding Efficiency & Redundancy

Coding Efficiency: η = H / <L>, with 0 ≤ η ≤ 1

Coding Redundancy: R = 1 − η, with 0 ≤ R ≤ 1

η = 1 (R = 0) corresponds to perfect coding, reached when the actual average length <L> equals the optimal value H.

Example: Fixed-Length Coding

4-symbol alphabet (a, b, c, d). All symbols have the same length L = 2 bits.

Message: abbcaada

Symbol (i)   p_i     -log p_i   -p_i log p_i   code   L_i
a            0.5     1          0.5            00     2
b            0.25    2          0.5            01     2
c            0.125   3          0.375          10     2
d            0.125   3          0.375          11     2

<L> = 2 (bps), H = 1.75 (bps)

Example

Entropy H = 0.5 + 0.5 + 0.375 + 0.375 = 1.75 (bps)

Coding Efficiency η = H / <L> = 1.75 / 2 = 0.875

Coding Redundancy R = 1 − 0.875 = 0.125

This is not optimal.

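These numbers are easy to verify in code; a small check in standard C++:

#include <cmath>
#include <cstdio>
#include <vector>

// Entropy, efficiency and redundancy for the 4-symbol example above.
int main()
{
    std::vector<double> p = {0.5, 0.25, 0.125, 0.125};
    double H = 0.0;
    for (double pi : p) H -= pi * std::log2(pi);   // H = -sum p_i log2 p_i
    double L = 2.0;                                // fixed-length code: 2 bits/symbol
    double eta = H / L;                            // coding efficiency
    double R = 1.0 - eta;                          // coding redundancy
    std::printf("H = %.3f bps, eta = %.3f, R = %.3f\n", H, eta, R);
    // Prints: H = 1.750 bps, eta = 0.875, R = 0.125
    return 0;
}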

Result

Fixed-length coding is optimal (perfect) only when all symbol probabilities are equal.

To prove this: with n = 2^m symbols, L = m bits and <L> = m (bps).

If all probabilities are equal,

$$p_i = \frac{1}{n} = 2^{-m}, \qquad -\log_2 p_i = m$$

$$H = -\sum_{i=1}^{n} p_i \log_2 p_i = \frac{1}{n} \sum_{i=1}^{n} (-\log_2 p_i) = m$$

Hence

$$\eta = \frac{H}{\langle L \rangle} = 1$$

Variable Length Coding (Huffman Coding)

The problem:

- Given a set of symbols and their probabilities,
- find a set of binary codewords that minimize the average length of the symbols.

Variable Length Coding (Huffman Coding)

Formally:

- Input: a message M(A, P) with a symbol alphabet A = {a_1, a_2, …, a_n} of size (n) and a set of probabilities for the symbols P = {p_1, p_2, …, p_n}.
- Output: a set of binary codewords C = {c_1, c_2, …, c_n} with bit lengths L = {L_1, L_2, …, L_n}.
- Condition: minimize

$$\langle L \rangle = \sum_{i=1}^{n} p_i L_i$$

Variable Length Coding (Huffman Coding)

- To achieve optimality, we use optimal binary merge trees to code symbols of unequal probabilities.
- Huffman Coding: more frequent symbols occur nearer to the root (shorter code lengths); less frequent symbols occur at deeper levels (longer code lengths).

The Greedy Method

- Store each symbol in a parentless node of a binary tree.
- Insert the symbols and their probabilities into a minimum heap keyed on probability.
- Repeat until all symbols are merged into one final alphabet (the root):
  - Remove the two lowest probabilities (p_i, p_j) from the heap.
  - Merge the symbols with (p_i, p_j) to form a new symbol (a_i a_j) with probability p_ij = p_i + p_j.
  - Store symbol (a_i a_j) in a parentless node with the two children a_i and a_j.
  - Insert p_ij and its symbol into the heap.
- Trace the path from the root to each leaf (symbol) to form the bit string for that symbol: concatenate "0" for a left branch and "1" for a right branch. (A code sketch of this procedure follows.)
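The following C++ sketch implements this method (the node layout, names, and tie-breaking are illustrative choices, not fixed by the slides):

#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Huffman coding per the greedy method above: repeatedly merge the two
// parentless nodes of lowest probability, then read codes off the tree.
struct Node { double p; int left; int right; char sym; };

std::vector<std::string> huffman(const std::vector<std::pair<char, double>>& alphabet)
{
    std::vector<Node> tree;
    using Entry = std::pair<double, int>;            // (probability, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& [s, p] : alphabet) {            // one parentless leaf per symbol
        heap.push({p, (int)tree.size()});
        tree.push_back({p, -1, -1, s});
    }
    if (tree.empty()) return {};
    while (heap.size() > 1) {                        // merge the two lowest probabilities
        auto [pi, i] = heap.top(); heap.pop();
        auto [pj, j] = heap.top(); heap.pop();
        heap.push({pi + pj, (int)tree.size()});
        tree.push_back({pi + pj, i, j, 0});          // parent node of a_i and a_j
    }
    std::vector<std::string> code(alphabet.size());
    // Trace root-to-leaf paths: "0" for a left branch, "1" for a right branch.
    std::function<void(int, const std::string&)> walk =
        [&](int v, const std::string& bits) {
            if (tree[v].left < 0) {                  // leaf: record its codeword
                for (std::size_t k = 0; k < alphabet.size(); ++k)
                    if (alphabet[k].first == tree[v].sym) code[k] = bits;
                return;
            }
            walk(tree[v].left,  bits + "0");
            walk(tree[v].right, bits + "1");
        };
    walk((int)tree.size() - 1, "");
    return code;
}

For Example (1) below, huffman({{'a',0.5},{'b',0.25},{'c',0.125},{'d',0.125}}) produces code lengths {1, 2, 3, 3}; the exact 0/1 labels depend on how ties are broken in the heap, but the average length is the same.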

Example (1):

- 4-symbol alphabet A = {a, b, c, d} of size (4).
- Message M(A, P): abbcaada, P = {0.5, 0.25, 0.125, 0.125}
- H = 1.75

Symbol (i)   p_i     -log p_i   -p_i log p_i
a            0.5     1          0.5
b            0.25    2          0.5
c            0.125   3          0.375
d            0.125   3          0.375

Building The Optimal Merge Table

Step 1: d 0.125, c 0.125, b 0.25, a 0.5
Step 2 (merge c, d): cd 0.25, b 0.25, a 0.5
Step 3 (merge cd, b): bcd 0.5, a 0.5
Step 4 (merge bcd, a): abcd 1.0

Optimal Merge Tree for Example (1)

a (50%), b (25%), c (12.5%), d (12.5%)

[The tree is built in stages: first c and d merge into (cd), with branches 0 → c and 1 → d; then b and (cd) merge into (bcd), with 0 → b and 1 → (cd); finally a and (bcd) merge into the root (abcd), with 0 → a and 1 → (bcd).]

The resulting codewords:

a_i   c_i   L_i (bits)
a     0     1
b     10    2
c     110   3
d     111   3

Coding Efficiency for Example (1)

<L> = 1(0.5) + 2(0.25) + 3(0.125) + 3(0.125) = 1.75 (bps)

H = 0.5 + 0.5 + 0.375 + 0.375 = 1.75 (bps), so η = H / <L> = 1.75 / 1.75 = 1.00 and R = 0.0

Notice that symbols exist only at the leaves, i.e., no symbol's code is the prefix of another symbol's code. This is why the method is also called "prefix coding".

Analysis

The cost of building the minimum heap by insertion is O(n log n).

The repeat loop executes (n − 1) times. In each iteration, the worst-case cost of removing the two least elements is 2 log n, and inserting the merged element costs log n.

Hence, the complexity of the Huffman algorithm is O(n log n).

Example (2):

- 4-symbol alphabet A = {a, b, c, d} of size (4).
- P = {0.4, 0.25, 0.18, 0.17}
- H = 1.909

Symbol (i)   p_i    -log p_i   -p_i log p_i
a            0.40   1.322      0.5288
b            0.25   2          0.5
c            0.18   2.474      0.4453
d            0.17   2.556      0.4345

Example(2): Merge Table

s i d p i 0.17

s i p i s i p i s i p i c 0.18

b 0.25

b a 0.25

cd 0.40

a 0.35

a 0.40

0.40

cdb 0.60

cdba 1.0

Prof. Amr Goneid, AUC 43

Optimal Merge Tree for Example (2)

a_i   c_i   L_i (bits)
a     1     1
b     01    2
c     001   3
d     000   3

[Tree: the root (cdba) branches 0 → (cdb) and 1 → a; (cdb) branches 0 → (cd) and 1 → b; (cd) branches 0 → d and 1 → c.]

Coding Efficiency for Example (2)

a (40%), b (25%), c (18%), d (17%)

<L> = 1(0.40) + 2(0.25) + 3(0.18) + 3(0.17) = 1.95 bps, H = 1.909 bps

η = H / <L> = 97.9%, R = 2.1%

Coding is optimal (97.9%) but not perfect.

Important result: perfect coding (η = 100%) can be achieved only for probability values of the form 2^(-m) (1/2, 1/4, 1/8, … etc.).

File Compression

- Variable-length codes can be used to compress files.
- Symbols are initially coded using ASCII (8-bit) fixed-length codes.

Steps:

1. Determine the probabilities of the symbols in the file.
2. Build the merge tree (or table).
3. Assign variable-length codes to the symbols.
4. Encode the symbols using the new codes.
5. Save the coded symbols in another file, together with the symbol code table.

The compression ratio = <L> / 8.
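For instance, with the Example (1) codes, <L> = 1.75 bps gives a compression ratio of 1.75 / 8 ≈ 0.22, i.e., the encoded data occupies about 22% of the original space (ignoring the overhead of the stored code table).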

Huffman Coding Animations

For examples of animations of Huffman coding, see:

http://www.cs.pitt.edu/~kirk/cs1501/animations Huffman.html

http://peter.bittner.it/tugraz/huffmancoding.html