No Slide Title - Zhejiang University


§1 Greedy Algorithms
2. Huffman Codes – for file compression
〖Example〗 Suppose our text is a string of length 1000 that comprises the characters a, u, x, and z. Then it will take 8000 bits to store the string as 1000 one-byte characters.
We may encode the symbols as a = 00, u = 01, x = 10, z = 11. For example, aaaxuaxz is encoded as 0000001001001011. Then the space taken by the string with length 1000 will be 2000 bits + space for the code table. /* log C bits are needed in a standard encoding, where C is the size of the character set */
Notice that we have only 4 distinct characters in that string. Hence we need only 2 bits to identify them.
 frequency ::= number of occurrences of a symbol.
In string aaaxuaxz , f(a) = 4, f(u) = 1, f(x) = 2, f(z) = 1.
The size of the coded string can be reduced using variable-length codes, for example, a = 0, u = 110, x = 10, z = 111. Then aaaxuaxz is encoded as 00010110010111.
Note: If all the characters occur with the same frequency, then
there are not likely to be any savings.
1/17
§1 Greedy Algorithms
Representation of the original code in a binary tree /* trie */

[Figure: a binary trie with edges labeled 0 (left) and 1 (right); the leaves hold a = 00, u = 01, x = 10, z = 11]

 If character Ci is at depth di and occurs fi times, then the cost of the code = Σ di fi .
Cost ( aaaxuaxz → 0000001001001011 ) = 2·4 + 2·1 + 2·2 + 2·1 = 16
Representation of the optimal code in a binary tree

[Figure: a full binary trie with leaves a = 0, x = 10, u = 110, z = 111]

Now, with a = 0, u = 110, x = 10, z = 111, the string 00010110010111 decodes to aaaxuaxz.
Cost ( aaaxuaxz → 00010110010111 ) = 1·4 + 3·1 + 2·2 + 3·1 = 14
The trick is: no code is a prefix of another, and all nodes either are leaves or have two children.
What makes this decoding method work? How can you decode it?
 Any sequence of bits can always be decoded unambiguously if the characters are placed only at the leaves of a full tree – such kind of code is called a prefix code.
Find the full binary tree of minimum total cost where all
characters are contained in the leaves.
2/17
§1 Greedy Algorithms
 Huffman’s Algorithm (1952)
void Huffman ( PriorityQueue heap[ ], int C )
{   consider the C characters as C single node binary trees,
    and initialize them into a min heap;
    for ( i = 1; i < C; i++ ) {
        create a new node;
        /* be greedy here */
        delete root from min heap and attach it to left_child of node;
        delete root from min heap and attach it to right_child of node;
        weight of node = sum of weights of its children;
        /* weight of a tree = sum of the frequencies of its leaves */
        insert node into min heap;
    }
}
T = O( C log C )
3/17
§1 Greedy Algorithms
〖Example〗 Build the Huffman code for:

Ci : a   e   i   s   t   sp  nl
fi : 10  15  12  3   4   13  1

[Figure: the C – 1 = 6 greedy merges – (nl,1)+(s,3) = 4, then 4+(t,4) = 8, 8+(a,10) = 18, (i,12)+(sp,13) = 25, (e,15)+18 = 33, and finally 25+33 = 58 – build the trie step by step]

The resulting codes are
a : 111, e : 10, i : 00, s : 11010, t : 1100, sp : 01, nl : 11011
Cost = 3·10 + 2·15 + 2·12 + 5·3 + 4·4 + 2·13 + 5·1 = 146
4/17
§1 Greedy Algorithms
3. Approximate Bin Packing
 The Knapsack Problem
A knapsack with a capacity M is to be packed. Given N items, each item i has a weight wi and a profit pi . If xi is the percentage of the item i being packed, then the packed profit will be pi xi .
An optimal packing is a feasible one with maximum profit. That is, we are supposed to find the values of xi such that Σi=1..n pi xi obtains its maximum under the constraints

    Σi=1..n wi xi ≤ M  and  xi ∈ [0, 1] for 1 ≤ i ≤ n

http://acm.zju.edu.cn/show_problem.php?pid=2109 (Sunny Cup 2004)
Q: What must we do in each stage?
A: Pack one item into the knapsack.
Q: On which criterion shall we be greedy?
 maximum profit
 minimum weight
 maximum profit density pi / wi
5/17
n = 3, M = 20,
(p1, p2, p3) = (25, 24, 15)
(w1, w2, w3) = (18, 15, 10)
Greedy on maximum profit density gives ( 0, 1, 1/2 ), P = 31.5.
§1 Greedy Algorithms
 The Bin Packing Problem
Given N items of sizes S1 , S2 , …, SN , such that 0 < Si ≤ 1 for all 1 ≤ i ≤ N . Pack these items in the fewest number of bins, each of which has unit capacity. /* NP Hard */
〖Example〗 N = 7; Si = 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8

[Figure: An optimal packing uses 3 bins – B1 = {0.3, 0.7}, B2 = {0.5, 0.4, 0.1}, B3 = {0.8, 0.2}]
6/17
§1 Greedy Algorithms
 On-line Algorithms
Place an item before processing the next one; the decision can NOT be changed later.
〖Example〗 Si = 0.4 , 0.4 , 0.6 , 0.6
You never know when the input might end. Hence an on-line algorithm cannot always give an optimal solution.
【Theorem】There are inputs that force any on-line
bin-packing algorithm to use at least 4/3 the optimal
number of bins.
7/17
§1 Greedy Algorithms
 Next Fit
void NextFit ( )
{   read item1;
    while ( read item2 ) {
        if ( item2 can be packed in the same bin as item1 )
            place item2 in the bin;
        else
            create a new bin for item2;
        item1 = item2;
    } /* end-while */
}
【Theorem】Let M be the optimal number of bins required to pack
a list I of items. Then next fit never uses more than 2M bins. There
exist sequences such that next fit uses 2M – 2 bins.
8/17
 First Fit
§1 Greedy Algorithms
void FirstFit ( )
{   while ( read item ) {
        scan for the first bin that is large enough for item;
        if ( found )
            place item in that bin;
        else
            create a new bin for item;
    } /* end-while */
}   /* can be implemented in O( N log N ) */
【Theorem】Let M be the optimal number of bins required to pack
a list I of items. Then first fit never uses more than 17M / 10 bins.
There exist sequences such that first fit uses 17(M – 1) / 10 bins.
 Best Fit
Place a new item in the tightest spot among all bins.
T = O( N log N ) and bin no. < 1.7M
9/17
§1 Greedy Algorithms
〖Example〗 Si = 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8

[Figure: Next Fit packs B1 = {0.2, 0.5}, B2 = {0.4}, B3 = {0.7, 0.1}, B4 = {0.3}, B5 = {0.8} – 5 bins;
First Fit packs B1 = {0.2, 0.5, 0.1}, B2 = {0.4, 0.3}, B3 = {0.7}, B4 = {0.8} – 4 bins;
Best Fit packs B1 = {0.2, 0.5}, B2 = {0.4, 0.3}, B3 = {0.7, 0.1}, B4 = {0.8} – 4 bins]
〖Example〗 Si = 1/7+ε, 1/7+ε, 1/7+ε, 1/7+ε, 1/7+ε, 1/7+ε,
1/3+ε, 1/3+ε, 1/3+ε, 1/3+ε, 1/3+ε, 1/3+ε,
1/2+ε, 1/2+ε, 1/2+ε, 1/2+ε, 1/2+ε, 1/2+ε
where ε = 0.001.
 The optimal solution requires 6 bins. However, all the three on-line algorithms require 10 bins.
10/17
§1 Greedy Algorithms
 Off-line Algorithms
View the entire item list before producing an answer.
Trouble-maker: The large items
Solution: Sort the items into non-increasing sequence of sizes. Then
apply first (or best) fit – first (or best) fit decreasing.
〖Example〗 Si = 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8 is sorted into 0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1

[Figure: first fit decreasing packs B1 = {0.8, 0.2}, B2 = {0.7, 0.3}, B3 = {0.5, 0.4, 0.1} – 3 bins, which is optimal]
【Theorem】Let M be the optimal number
of bins required to pack a list I of items.
Then first fit decreasing never uses more than
11M / 9 + 4 bins. There exist sequences such
that first fit decreasing uses 11M / 9 bins.
Simple greedy heuristics can give good results.
11/17
§2 Divide and Conquer
Divide: Smaller problems are solved recursively (except base cases).
Conquer: The solution to the original problem is then formed from the
solutions to the subproblems.
Cases solved by divide and conquer
 The maximum subsequence sum – the O( N log N ) solution
 Tree traversals – O( N )
 Mergesort and quicksort – O( N log N )
Note: Divide and conquer makes at least two recursive calls
and the subproblems are disjoint.
12/17
§2 Divide and Conquer
1. Running Time of Divide and Conquer Algorithms
【Theorem】The solution to the equation
    T(N) = a T(N / b) + Θ( N^k log^p N ),
where a ≥ 1, b > 1, and p ≥ 0, is

    T(N) = O( N^(log_b a) )        if a > b^k
    T(N) = O( N^k log^(p+1) N )    if a = b^k
    T(N) = O( N^k log^p N )        if a < b^k
〖Example〗 Mergesort has a = b = 2, p = 0 and k = 1.
T = O( N log N )
〖Example〗 Divide with a = 3, and b = 2 for each recursion;
Conquer with O( N ) – that is, k = 1 and p = 0 .
T = O( N^1.59 )
If conquer takes O( N² ) then T = O( N² ) .
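Spelling out the substitutions for these examples against the theorem's three cases:

```latex
\begin{aligned}
&\text{Mergesort: } a = 2,\; b = 2,\; k = 1,\; p = 0
  \;\Rightarrow\; a = b^k
  \;\Rightarrow\; T(N) = O(N^k \log^{p+1} N) = O(N \log N).\\
&\text{Divide: } a = 3,\; b = 2,\; k = 1,\; p = 0
  \;\Rightarrow\; a > b^k
  \;\Rightarrow\; T(N) = O(N^{\log_b a}) = O(N^{\log_2 3}) = O(N^{1.59}).\\
&\text{Quadratic conquer: } a = 3,\; b = 2,\; k = 2
  \;\Rightarrow\; a < b^k = 4
  \;\Rightarrow\; T(N) = O(N^k \log^p N) = O(N^2).
\end{aligned}
```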
13/17
§2 Divide and Conquer
2. Closest Points Problem
Given N points in a plane. Find the closest pair of points. (If two
points have the same position, then that pair is the closest with
distance 0.)
 Simple Exhaustive Search
Check N ( N – 1 ) / 2 pairs of points. T = O( N² ).
 Divide and Conquer – similar to the maximum subsequence sum problem
〖Example〗 Sort according to x-coordinates and divide; conquer by forming a solution from left, right, and cross.
14/17
§2 Divide and Conquer
It is so simple: just like finding the max subsequence sum, we clearly have a = b = 2, and hence an O( N log N ) algorithm.
But is it really so? How about k? Can you find the cross distance in linear time?
15/17
§2 Divide and Conquer
If NumPointsInStrip = O( √N ), we have Textra = O( N ) with a simple check:
/* points are all in the strip */
for ( i = 0; i < NumPointsInStrip; i++ )
    for ( j = i + 1; j < NumPointsInStrip; j++ )
        if ( Dist( Pi , Pj ) < δ )
            δ = Dist( Pi , Pj );
The worst case: NumPointsInStrip = N.
However, if the points in the strip are also sorted by y-coordinates, then for any pi , at most 7 points need to be considered, and Textra = O( N ):
/* points are all in the strip */
/* and sorted by y coordinates */
for ( i = 0; i < NumPointsInStrip; i++ )
    for ( j = i + 1; j < NumPointsInStrip; j++ )
        if ( Dist_y( Pi , Pj ) > δ )
            break;
        else if ( Dist( Pi , Pj ) < δ )
            δ = Dist( Pi , Pj );
16/17
§2 Divide and Conquer
Note: Sorting y-coordinates in each recursive call gives
O( N log N ) extra work instead of O( N ).
Solution: Please read the last paragraph on p.374.
3. The Selection Problem – self-study the O( N ) algorithm
4. Big Integer Multiplication and Matrix Multiplication
Self-study: only of theoretical interest.
17/17