Approximation Algorithms - Renmin University of China


Approximation Algorithms
1
Motivation
• By now we’ve seen many NP-Complete problems.
• We conjecture that none of them has a
polynomial-time algorithm.
2
Motivation
• Is this a dead-end? Should we give up
altogether?
3
Motivation
• Or maybe we can settle for good
approximation algorithms?
4
Introduction
• Objectives:
– To formalize the notion of approximation.
– To demonstrate several such algorithms.
• Overview:
– Optimization and Approximation
– VERTEX-COVER, SET-COVER
5
Optimization
• Many of the problems we’ve
encountered so far are really
optimization problems.
• I.e., the task can be naturally
rephrased as finding a
maximal/minimal solution.
• For example: finding a maximal
clique in a graph.
6
Approximation
• An algorithm which returns an answer C
which is “close” to the optimal solution
C* is called an approximation algorithm.
• “Closeness” is usually measured by the
ratio bound ρ(n) the algorithm produces,
• which is a function that satisfies, for
any input size n, max{C/C*, C*/C} ≤ ρ(n).
7
VERTEX-COVER
• Instance: an undirected graph G=(V,E).
• Problem: find a set C ⊆ V of minimal size s.t.
for any (u,v) ∈ E, either u ∈ C or v ∈ C.
Example:
8
Minimum VC is NP-hard
Proof: It is enough to show the decision
problem below is NP-Complete:
• Instance: an undirected graph G=(V,E) and a
number k.
• Problem: to decide if there exists a set V’ ⊆ V of
size k s.t. for any (u,v) ∈ E, u ∈ V’ or v ∈ V’.
This follows immediately from the following
observation.
9
Minimum VC is NP-hard
Observation: Let G=(V,E) be an undirected
graph. The complement V\C of a vertex-cover C
is an independent-set of G.
Proof: Two vertices outside a vertex-cover
cannot be connected by an edge. ∎
10
COR(B) 523-524
VC - Approximation Algorithm
• C ← ∅
• E’ ← E
• while E’ ≠ ∅
– do let (u,v) be an arbitrary edge of E’
– C ← C ∪ {u,v}
– remove from E’ every edge
incident to either u or v.
• return C.
11
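A minimal runnable sketch of this algorithm in Python (the function name and the edge-set representation are my own, not from the slides):

# 2-approximation for VERTEX-COVER: repeatedly pick an arbitrary
# uncovered edge and add both of its endpoints to the cover.
def approx_vertex_cover(edges):
    cover = set()
    remaining = set(edges)
    while remaining:
        u, v = next(iter(remaining))     # an arbitrary edge of E'
        cover.update((u, v))             # C <- C U {u, v}
        # remove from E' every edge incident to either u or v
        remaining = {(a, b) for (a, b) in remaining
                     if a not in (u, v) and b not in (u, v)}
    return cover

# Example: on the path a-b-c-d the optimum is {b, c};
# the algorithm returns at most twice that many vertices.
print(approx_vertex_cover({("a", "b"), ("b", "c"), ("c", "d")}))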
Demo
Compare this cover to the
one from the example
12
Polynomial Time
• C ← ∅
• E’ ← E                                  O(n²)
• while E’ ≠ ∅ do                         O(n) iterations
– let (u,v) be an arbitrary edge of E’
– C ← C ∪ {u,v}
– remove from E’ every edge incident
to either u or v                          O(n²) in total
• return C                                O(1)
13
Correctness
The set of vertices our algorithm returns is
clearly a vertex-cover, since we iterate
until every edge is covered.
14
How Good an Approximation is it?
Observe the set of edges our algorithm chooses:
no two of them share a vertex! ⇒ any VC contains
at least one vertex of each such edge, while
our VC contains both, hence it is at most twice as large.
15
The Traveling Salesman
Problem
16
The Mission:
A Tour Around the World
17
The Problem:
Traveling Costs Money
$1795
18
Introduction
• Objectives:
– To explore the Traveling Salesman
Problem.
• Overview:
– TSP: Formal definition & Examples
– TSP is NP-hard
– Approximation algorithm for special cases
– Inapproximability result
19
TSP
• Instance: a complete weighted undirected
graph G=(V,E) (all weights are non-negative).
• Problem: to find a Hamiltonian cycle of
minimal cost.
(Figure: an example complete weighted graph.)
20
Polynomial Algorithm for TSP?
What about the greedy
strategy:
At any point, choose the closest
vertex not explored yet?
21
The Greedy $trategy Fails
(Figure: an example weighted graph on which the
greedy strategy yields a tour much more expensive
than the optimal one.)
22
TSP is NP-hard
The corresponding decision problem:
• Instance: a complete weighted undirected
graph G=(V,E) and a number k.
• Problem: to decide whether there exists a
Hamiltonian cycle whose cost is at most k.
24
TSP is NP-hard
Theorem: HAM-CYCLE ≤p TSP.
Proof: By the straightforward efficient
reduction illustrated below (verify!):
(Figure: HAM-CYCLE instance G → TSP instance in which
edges of G get weight 0, non-edges get weight 1,
and the bound is k=0.)
25
What Next?
• We’ll show an approximation
algorithm for TSP,
• which yields a ratio-bound of 2
• for cost functions which satisfy
a certain property.
26
The Triangle Inequality
Definition: We’ll say the cost function c
satisfies the triangle inequality, if
∀u,v,w ∈ V : c(u,v) + c(v,w) ≥ c(u,w)
27
COR(B) 525-527
Approximation Algorithm
1. Grow a Minimum Spanning Tree
(MST) for G.
2. Return the cycle resulting from a
preorder walk on that tree.
28
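A compact Python sketch of these two steps (illustrative code of my own, assuming the input is a symmetric cost matrix that satisfies the triangle inequality):

import heapq

# 2-approximation for metric TSP: MST (Prim) + preorder walk.
def approx_metric_tsp(cost):
    n = len(cost)
    # 1. Grow a Minimum Spanning Tree, rooted at vertex 0.
    children = {v: [] for v in range(n)}
    in_tree = [False] * n
    frontier = [(0, 0, 0)]               # (edge weight, vertex, parent)
    while frontier:
        w, v, parent = heapq.heappop(frontier)
        if in_tree[v]:
            continue
        in_tree[v] = True
        if v != parent:
            children[parent].append(v)
        for u in range(n):
            if not in_tree[u]:
                heapq.heappush(frontier, (cost[v][u], u, v))
    # 2. Return the cycle resulting from a preorder walk on that tree.
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour + [0]

# Example: four points on a line, cost = distance.
print(approx_metric_tsp([[abs(i - j) for j in range(4)] for i in range(4)]))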
Demonstration and Analysis
The cost of a
minimal Hamiltonian
cycle ≥ the cost of an
MST (deleting any edge of
the cycle yields a spanning tree).
29
Demonstration and Analysis
The cost of a
preorder walk is
twice the cost of the tree
(every tree edge is traversed exactly twice).
30
Demonstration and Analysis
Due to the triangle
inequality, the Hamiltonian
cycle obtained by shortcutting
the walk is no more expensive.
Altogether: cost(cycle) ≤ 2·cost(MST) ≤ 2·OPT.
31
COR(B) 528
What About the General
Case?
• We’ll show TSP cannot be
approximated within any
constant factor ρ ≥ 1,
• By showing the corresponding
gap version is NP-hard.
32
gap-TSP[]
• Instance: a complete weighted undirected
graph G=(V,E).
• Problem: to distinguish between the
following two cases:
YES: there exists a Hamiltonian cycle whose
cost is at most |V|.
NO: the cost of every Hamiltonian cycle is
more than ρ·|V|.
33
Instances
(Figure: the two kinds of instances; the minimum
tour cost is |V| in one case and more than ρ·|V|
in the other.)
34
What Should an Algorithm
for gap-TSP Return?
(Figure: answer YES when the minimum tour cost is at
most |V|, NO when it is more than ρ·|V|; any cost
inside the gap between them is a DON’T-CARE.)
35
gap-TSP & Approximation
Observation: An efficient approximation of
factor ρ for TSP implies an efficient
algorithm for gap-TSP[ρ]: on a YES instance the
approximation returns a tour of cost at most ρ·|V|,
on a NO instance every tour costs more, so comparing
its output against ρ·|V| decides.
36
gap-TSP is NP-hard
Theorem: For any constant ρ ≥ 1,
HAM-CYCLE ≤p gap-TSP[ρ].
Proof Idea: Edges from G cost 1. Other
edges cost much more.
37
The Reduction Illustrated
(Figure: HAM-CYCLE instance G → gap-TSP instance in
which edges of G get weight 1 and non-edges get
weight ρ·|V|+1.)
Verify (a) correctness
(b) efficiency
38
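A sketch of this reduction in Python (illustrative code of my own; the ρ·|V|+1 weight follows the figure above):

# Map a HAM-CYCLE instance (n vertices, edge list) to a gap-TSP instance.
def reduce_ham_cycle_to_gap_tsp(n, edges, rho):
    expensive = rho * n + 1              # weight of the non-edges of G
    cost = [[expensive] * n for _ in range(n)]
    for u, v in edges:
        cost[u][v] = cost[v][u] = 1      # edges of G cost 1
    for v in range(n):
        cost[v][v] = 0
    return cost

# If G has a Hamiltonian cycle, some tour costs exactly n <= |V|;
# otherwise every tour uses an expensive edge and costs
# at least (n - 1) + rho*n + 1 > rho*n.
print(reduce_ham_cycle_to_gap_tsp(3, [(0, 1), (1, 2), (2, 0)], rho=2))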
Approximating TSP is NP-hard
gap-TSP[ρ] is NP-hard
⇒ Approximating TSP within
factor ρ is NP-hard
39
Summary
• We’ve studied the Traveling Salesman
Problem (TSP).
• We’ve seen it is NP-hard.
• Nevertheless, when the cost function
satisfies the triangle inequality, there
exists an approximation algorithm with
ratio-bound 2.
40
Summary
• For the general case we’ve proven there is
probably no efficient approximation
algorithm for TSP.
• Moreover, we’ve demonstrated a generic
method for showing approximation
problems are NP-hard.
41
SET-COVER
• Instance: a finite set X and a family F of
subsets of X, such that
X = ∪_{S ∈ F} S
• Problem: to find a set C ⊆ F of minimal size
which covers X, i.e. -
X = ∪_{S ∈ C} S
42
SET-COVER: Example
43
SET-COVER is NP-Hard
Proof: Observe the corresponding decision
problem.
• Clearly, it’s in NP (Check!).
• We’ll sketch a reduction from (decision)
VERTEX-COVER to it:
44
VERTEX-COVER ≤p SET-COVER
(Figure: VC instance → SC instance with
one element for every edge, and
one set for every vertex,
containing the edges it covers.)
45
COR(B) 530-533
Greedy Algorithm
• C ← ∅
• U ← X
• while U ≠ ∅ do
– select S ∈ F that maximizes |S ∩ U|
– C ← C ∪ {S}
– U ← U − S
• return C
The loop runs at most min{|X|, |F|} times, and each
iteration takes O(|F|·|X|) time.
46
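A runnable Python version of the greedy rule (illustrative code of my own, not from the slides):

# Greedy SET-COVER: always take the set covering the most
# uncovered elements.
def greedy_set_cover(X, F):
    uncovered = set(X)
    cover = []
    while uncovered:
        best = max(F, key=lambda S: len(S & uncovered))   # maximize |S ∩ U|
        cover.append(best)
        uncovered -= best                                 # U <- U - S
    return cover

X = {1, 2, 3, 4, 5}
F = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(X, F))   # [{1, 2, 3}, {4, 5}]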
Demonstration
compare to
the optimal
cover
47
Is Being Greedy Worthwhile?
How Do We Proceed From Here?
• We can easily bound the approximation
ratio by log n.
• A more careful analysis yields a tight
bound of ln n.
48
Loose Ratio-Bound
Claim: If ∃ a cover of size k, then after k iterations
the algorithm has covered at least ½ of the elements.
Suppose it doesn’t, and observe the
situation after k iterations:
(Figure: a bar of the n elements; what we covered is
less than half, and more than ½ remains uncovered.)
49
Loose Ratio-Bound
Claim: If ∃ a cover of size k, then after k iterations
the algorithm has covered at least ½ of the elements.
Since this uncovered part (more than ½ of the n
elements) can also be covered by k sets...
50
Loose Ratio-Bound
Claim: If ∃ a cover of size k, then after k iterations
the algorithm has covered at least ½ of the elements.
...there must be a set, not chosen yet,
whose size is at least ½n·1/k.
51
Loose Ratio-Bound
Claim: If ∃ a cover of size k, then after k iterations
the algorithm has covered at least ½ of the elements.
Thus in each of the k iterations we’ve covered
at least ½n·1/k new elements, and the claim is
proven!
52
Loose Ratio-Bound
Claim: If ∃ a cover of size k, then after k iterations
the algorithm has covered at least ½ of the elements.
Therefore after k·log n iterations (i.e. after
choosing k·log n sets) all the n elements must be
covered, and the bound is proved.
53
Tight Ratio-Bound
Claim: The greedy algorithm approximates
the optimal set-cover within factor
H(max{ |S| : S ∈ F })
where H(d) is the d-th harmonic number:
H(d) ≝ Σ_{i=1}^{d} 1/i
54
Tight Ratio-Bound
H(n) = Σ_{k=1}^{n} 1/k = 1 + Σ_{k=2}^{n} 1/k
≤ 1 + ∫_{1}^{n} (1/x) dx = 1 + ln n
55
Claim’s Proof
• Whenever the
algorithm chooses
a set, charge 1.
• Split the cost
between all
covered vertices.
(Figure: a chosen set covers five new elements;
each is charged 0.2.)
56
Analysis
• That is, we charge every
element x ∈ X with
c_x ≝ 1 / |S_i − (S_1 ∪ ... ∪ S_{i−1})|
• where S_i is the first set which
covers x.
57
Lemma
Lemma: For every S ∈ F,
Σ_{x ∈ S} c_x ≤ H(|S|)
Proof: Fix an S ∈ F. For any i, define
u_i ≝ |S − (S_1 ∪ ... ∪ S_i)|
(the number of members of S left
uncovered after i iterations).
Let k be the smallest index for which u_k = 0.
For 1 ≤ i ≤ k: S_i covers u_{i−1} − u_i elements from S.
58
Lemma
Observations: u_0 = |S|, u_k = 0, and H(0) = 0.
Our greedy strategy promises that S_i covers at least
as many new elements as S does, so for any 1 ≤ i ≤ k:
|S_i − (S_1 ∪ ... ∪ S_{i−1})| ≥ u_{i−1}.
For any b > a in ℕ: H(b) − H(a) = 1/(a+1) + ... + 1/b ≥ (b−a)·1/b.
This yields:
Σ_{x ∈ S} c_x = Σ_{i=1}^{k} (u_{i−1} − u_i) · 1/|S_i − (S_1 ∪ ... ∪ S_{i−1})|
≤ Σ_{i=1}^{k} (u_{i−1} − u_i) · 1/u_{i−1}
≤ Σ_{i=1}^{k} (H(u_{i−1}) − H(u_i))
= H(u_0) − H(u_k)    [a telescopic sum]
= H(|S|) − H(0) = H(|S|). ∎
59
Analysis
Now we can finally complete our analysis:
|C| = Σ_{x ∈ X} c_x ≤ Σ_{S ∈ C*} Σ_{x ∈ S} c_x
≤ |C*| · H(max{ |S| : S ∈ F })
60
Summary
• As it turns out, we can sometimes find
efficient approximation algorithms for
NP-hard problems.
• We’ve seen two such algorithms:
– for VERTEX-COVER (factor 2)
– for SET-COVER (logarithmic factor).
61
The Subset Sum Problem
• Problem definition
– Given a finite set S and a target t, find a subset S’ ⊆ S whose
elements sum to t
• All possible sums
– S = {x1, x2, .., xn}
– Li = set of all possible sums of {x1, x2, .., xi}
• Example
– S = {1, 4, 5}
– L1 = {0, 1}
– L2 = {0, 1, 4, 5} = L1 ∪ (L1 + x2)
– L3 = {0, 1, 4, 5, 6, 9, 10} = L2 ∪ (L2 + x3)
• Li = Li-1 ∪ (Li-1 + xi)
62
Subset Sum, revisited:
• Given a set S of numbers, find a subset S’
that adds up to some target number t.
• To find the largest possible sum that doesn’t
exceed t:
T = {0};
for each x in S {
T = union(T, x+T);   // x + T adds x to each element of T;
                     // |T| can double at each step
remove elements from T that exceed t;
}
return largest element in T;
Complexity: O(2^n)
• (Aside: How should we implement T?)
63
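The same idea as runnable Python (an illustrative sketch of my own):

# Exact subset sum: largest achievable sum not exceeding t.
# The set of reachable sums can double each step, hence O(2^n).
def exact_subset_sums(S, t):
    T = {0}
    for x in S:
        T |= {y + x for y in T}          # union(T, x + T)
        T = {y for y in T if y <= t}     # drop sums that exceed t
    return max(T)

print(exact_subset_sums([1, 4, 5], 8))   # 6 = 1 + 5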
Trimming:
• To reduce the size of the set T at each
stage, we apply a trimming process.
• For example, if z and y are consecutive
elements with (1−d)·y ≤ z ≤ y, we can remove y
and let z represent it.
• If d=0.1, {10,11,12,15,20,21,22,23,24,29}
→ {10,12,15,20,23,29}
64
Subset Sum with Trimming:
• Incorporate trimming in the previous
algorithm:
T = {0};
for each x in S {                // trimming parameter 0 < d < 1/n
T = union(T, x+T);
T = trim(d, T);
remove elements from T that exceed t;
}
return largest element in T;
• Trimming only eliminates values, it
doesn’t create new ones. So the final
result is still the sum of a subset of S
that does not exceed t.
65
• At each stage, values in the trimmed T
are within a factor somewhere between
(1−d) and 1 of the corresponding values
in the untrimmed T.
• The final result (after n iterations) is
within a factor somewhere between (1−d)^n
and 1 of the result produced by the
original algorithm.
66
• After trimming, the ratio between
successive elements in T is at least
1/(1−d), and all of the values are
between 0 and t.
• Hence the maximum number of elements
in T is log_{1/(1−d)} t ≤ (log t) / d.
• This is enough to give us a polynomial
bound on the running time of the
algorithm.
67
Subset Sum – Trim
• Want to reduce the size of a list by “trimming”:
– L: the original list
– L’: the list after trimming L
– d: trimming parameter, d ∈ [0..1]
– y: an element that is removed from L
– z: the corresponding (representing) element in L’ (also in L)
– (y−z)/y ≤ d
– (1−d)·y ≤ z ≤ y
• Example
– L = {10, 11, 12, 15, 20, 21, 22, 23, 24, 29}
– d = 0.1
– L’ = {10, 12, 15, 20, 23, 29}
– 11 is represented by 10: (11−10)/11 ≤ 0.1
– 21, 22 are represented by 20: (21−20)/21 ≤ 0.1, (22−20)/22 ≤ 0.1
– 24 is represented by 23: (24−23)/24 ≤ 0.1
68
Subset Sum – Trim (2)
• Trim(L, d)    // L: y1, y2, .., ym
1. L’ = {y1}
2. last = y1    // most recent element z in L’ which represents elements in L
3. for i = 2 to m do
4.   if last < (1−d)·yi then    // yi cannot be represented by last
5.     append yi onto the end of L’
6.     last = yi
7. return L’
• Running time: O(m)
• Example
– L = {10, 11, 12, 15, 20, 21, 22, 23, 24, 29}
– d = 0.1
– L’ = {10, 12, 15, 20, 23, 29}
69
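As runnable Python (an illustrative sketch; the input list must be sorted):

# Trim(L, d): keep an element only when the last kept element
# cannot represent it, i.e. when last < (1 - d) * y.
def trim(L, d):
    trimmed = [L[0]]
    last = L[0]
    for y in L[1:]:
        if last < (1 - d) * y:
            trimmed.append(y)
            last = y
    return trimmed

print(trim([10, 11, 12, 15, 20, 21, 22, 23, 24, 29], 0.1))
# [10, 12, 15, 20, 23, 29]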
Subset Sum – Approximate Algorithm
• Approx_subset_sum(S, t, e)    // S = x1, x2, .., xn
1. L0 = {0}
2. for i = 1 to n do
3.   Li = Li-1 ∪ (Li-1 + xi)
4.   Li = Trim(Li, e/n)
5.   Remove elements that are greater than t from Li
6. return the largest element in Ln
• Example
– S = {104, 102, 201, 101}, t=308, e=0.20, d = e/n = 0.05
– L0 = {0}
– L1 = {0, 104}
– L2 = {0, 102, 104, 206}
• After trimming 104: L2 = {0, 102, 206}
– L3 = {0, 102, 201, 206, 303, 407}
• After trimming 206: L3 = {0, 102, 201, 303, 407}
• After removing 407: L3 = {0, 102, 201, 303}
– L4 = {0, 101, 102, 201, 203, 302, 303, 404}
• After trimming 102, 203, 303: L4 = {0, 101, 201, 302, 404}
• After removing 404: L4 = {0, 101, 201, 302}
– Return 302 (= 201 + 101)
70
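Putting Trim and the main loop together as runnable Python (an illustrative sketch of my own; it reproduces the worked example above):

# FPTAS for subset sum: returns a sum within (1 - e) of the
# largest subset sum not exceeding t.
def approx_subset_sum(S, t, e):
    d = e / len(S)                        # per-iteration trimming parameter
    L = [0]
    for x in S:
        L = sorted(set(L) | {y + x for y in L})   # Li = Li-1 U (Li-1 + xi)
        trimmed, last = [L[0]], L[0]              # Trim(Li, d)
        for y in L[1:]:
            if last < (1 - d) * y:
                trimmed.append(y)
                last = y
        L = [y for y in trimmed if y <= t]        # remove elements > t
    return max(L)

print(approx_subset_sum([104, 102, 201, 101], t=308, e=0.20))   # 302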
Subset Sum - Correctness
• The approximate solution C is not smaller than (1−e)
times an optimal solution C*
– i.e., (1−e)·C* ≤ C
• Proof
– for every element y in L there is a z in L’ such that
• (1−e/n)·y ≤ z ≤ y
– for every element y in Li there is a z in Li’ such that
• (1−e/n)^i · y ≤ z ≤ y
– If y* is an optimal solution in Ln, then there is a
corresponding z in Ln’
• (1−e/n)^n · y* ≤ z ≤ y*
– Since (1−e) ≤ (1−e/n)^n    [ (1−e/n)^n is increasing in n ]
• (1−e)·y* ≤ (1−e/n)^n · y* ≤ z
– So the value z returned is not smaller than (1−e) times the
optimal solution y*
71
Subset Sum – Correctness (2)
• The approximation algorithm is fully
polynomial
• Proof
– Successive elements z and z’ in Li’ differ by a
factor of at least 1/(1−e/n), i.e.
• z’/z ≥ 1/(1−e/n)
– The number of elements in each Li is at most
• log_{1/(1−e/n)} t    [ t is the largest value ]
• = (ln t) / (−ln(1−e/n))
• ≤ (ln t) / (e/n)    [Eq. 2.10: x/(1+x) ≤ ln(1+x) ≤ x, for x > −1]
• = (n ln t) / e
72
Summary:
• Not all problems are computable.
• Some problems can be solved in polynomial
time (P).
• Some problems can be verified in
polynomial time (NP).
• Nobody knows whether P=NP.
• But the existence of NP-complete
problems is often taken as an indication
that PNP.
• In the meantime, we use approximation to
find “good-enough” solutions to hard
problems.
73
What’s Next?
• But where can we draw the line?
• Does every NP-hard problem have an
approximation?
• And to within which factor?
• Can approximation be NP-hard as
well?
74