1) What should we teach in approximation algorithms courses?


What should be taught in approximation algorithms courses?
Guy Kortsarz, Rutgers Camden
Advanced issues presented in many lecture notes and books:
• Coloring a 3-colorable graph using vectors.
• Paper by Karger, Motwani and Sudan.
• Things a student needs to know:
A separation oracle for: A is PSD.
Getting a random vector r in R^n: choose a value from the Normal distribution for every entry.
Given a unit vector v, v · r has the standard Normal distribution.
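A minimal numerical check of these two facts (my illustration in numpy; v here is just a random unit vector):

import numpy as np

rng = np.random.default_rng(0)
n = 1000
# a random vector r in R^n: an independent N(0,1) value in every entry
r = rng.standard_normal(n)
# for any unit vector v, the projection v . r is again N(0,1)
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
samples = np.array([v @ rng.standard_normal(n) for _ in range(10000)])
print(samples.mean(), samples.std())   # close to 0 and 1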
Things a student needs to know:
• There is a choice of unit vectors v_i, one for every i ∈ V, so that for every (i, j) ∈ E, v_i · v_j ≤ -1/2.
A student needs to know:
• S = {i | r · v_i ≥ t} for a suitable threshold t: the threshold method, by now standard.
• The sum of two independent Normal random variables is also Normal.
• Two inequalities (non-trivial) about the Normal distribution.
• The above can be used to find a large independent set.
• Combined with the greedy algorithm this gives roughly an n^{1/4} ratio approximation algorithm.
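A sketch of the threshold step (my illustration; the SDP vectors v_i and the threshold t are assumed to be given, e.g., from an SDP solver, and the clean-up of surviving edges is the standard one):

import numpy as np

def threshold_round(vectors, t, edges):
    # Pick a random Gaussian direction r and take S = {i : r . v_i >= t}.
    # When v_i . v_j <= -1/2 on every edge, few edges survive inside S in expectation,
    # and deleting one endpoint per surviving edge leaves a large independent set.
    vectors = np.asarray(vectors)
    r = np.random.standard_normal(vectors.shape[1])
    S = {i for i, v in enumerate(vectors) if v @ r >= t}
    for i, j in edges:
        if i in S and j in S:
            S.discard(j)
    return S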
Advanced methods are also required in the following topics, often taught:
• The seminal result of Jain, with the simplification of Nagarajan et al.: a 2-ratio for Steiner Network.
• The beautiful 3/2 ratio by Calinescu, Karloff and Rabani for Multiway Cut: geometric embeddings.
• Fakcharoenphol, Rao and Talwar, optimal random tree embedding. With this one can get O(log n) for undirected multicut.
How to teach sparsest cut?
• Many still teach the embedding of a metric into L1 with O(log n) distortion, by Linial, London and Rabinovich.
• Advantage: relatively simple.
• The huge challenge posed by the Arora, Rao and Vazirani result: sqrt{log n} for unweighted sparsest cut.
• Teach the difficult lemma? Very advanced. Very difficult.
• A proof appears in the book of Williamson and Shmoys.
Simpler topics?
• I cannot complain if it is TAUGHT! Of course not. Let me give a list of basic topics that are always taught:
• Ratio 3/2 for TSP; the simple approximation of 2 for min-cost Steiner tree.
• Set-Cover, simple approximation ratio.
• Knapsack, PTAS. Bin packing, constant ratio.
• Set-Coverage. BUT: only with costs 1.
Knapsack Set-Coverage
• The Set-Coverage problem: given a set system and a number k, select k sets that cover as many elements as possible.
• Knapsack version, not that well known:
• Each set has a cost c(s) and there is a bound B on the maximum sum of costs of the sets we can choose.
• Maximize the number of elements covered.
Result due to Khuller, Moss and Naor, 1997, IPL
• The (1-1/e) ratio is possible.
• In the usual algorithm and analysis, (1-1/e) only follows if we can add the last set in the greedy choice. Thus it fails.
• Because most of the time, adding the last set would give cost larger than B.
• Trick: guess the 3 sets in OPT of least cost. Then apply greedy (don't go over budget B).
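A sketch of the enumerate-and-greedy scheme (my rendering: the "guess" of the three sets of OPT is implemented by trying every collection of at most 3 sets within budget and completing it greedily by density, never exceeding B):

from itertools import combinations

def budgeted_max_coverage(sets, cost, B):
    # sets: dict name -> set of elements; cost: dict name -> positive cost; B: budget.
    # Try every start of at most 3 sets within budget, complete it greedily by
    # (newly covered elements)/cost without exceeding B, and keep the best cover found.
    names = list(sets)
    best_cov, best_sol = set(), []
    starts = [()] + [c for k in (1, 2, 3) for c in combinations(names, k)]
    for start in starts:
        if sum(cost[s] for s in start) > B:
            continue
        chosen = list(start)
        covered = set().union(*(sets[s] for s in chosen)) if chosen else set()
        spent = sum(cost[s] for s in chosen)
        while True:
            cands = [(len(sets[s] - covered) / cost[s], s) for s in names
                     if s not in chosen and spent + cost[s] <= B and sets[s] - covered]
            if not cands:
                break
            _, s = max(cands)
            chosen.append(s)
            covered |= sets[s]
            spent += cost[s]
        if len(covered) > len(best_cov):
            best_cov, best_sol = covered, chosen
    return best_sol, best_cov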
Why do I know this paper?
• I became aware of this result only several years after it was published, and only because I worked on Min-Power Problems. No conference version!
• This result seems absolutely basic to me. Why is it not taught?
• Remark: Choosing one (least cost) set of OPT gives unbounded ratio. Choosing the two sets of smallest cost gives ratio 1/2. Guessing the three sets of least cost and then greedy gives (1-1/e).
First general neglected topic
• Important and not taught: maximizing a submodular non-decreasing function under a Matroid constraint, ratio 1/2, Fisher, Nemhauser, Wolsey, 1977.
• Improved in 2008(!) to the best possible (1-1/e) by Vondrak in a brilliant paper.
First story: a submission I refereed
• I got a paper to referee, and it was obvious that it was maximizing a submodular function under a Matroid constraint.
• If memory serves, the capacity-1 case of the following Matroid: G(V,E) with edge capacities, fix S ⊆ V. T reaches S if every vertex in T can send one unit of flow to S.
• The set of all T that reach S is a special Matroid called a Gammoid. Everything in this paper was known!
• I asked Chekuri (everybody must have an oracle) what the Matroid is, and Chekuri answered. Paper erased.
Story 2: a worse outcome.
• Problem. Input like Set-Cover, but the collection of sets is partitioned into groups S_i.
• Required: choose at most one set from every S_i and maximize the number of elements covered.
• The paper gave ratio 1/2. This is maximizing a submodular coverage function subject to a partition Matroid.
PLEASE!!! Do not try to check who the authors are. Not ethical. Unfair to the authors as well.
• Nice applications, but it was accepted and the ratio is not new.
Related to pipage rounding
• Due to Ageev and Sviridenko.
• Dependent rounding is a generalization of pipage rounding, by Gandhi, Khuller, Parthasarathy, Srinivasan.
• Say that we have an LP and a constraint Σ_i x_i = k. Randomized rounding cannot derive exact equality.
• Pipage rounding: instead of going to a larger set of solutions, as from IP to LP, we replace the objective function.
The principles of pipage rounding
• We start with an LP maximization with objective function L(X).
• Define a non-linear function F.
• Show that the maximum of F is attained at an integral point.
• Show that the integral points of F belong to the polyhedron of L. Namely, a point is feasible for L as long as it is integral and feasible for F.
The principles of pipage rounding
• Then, show that F(X_int) ≥ L(X*)/α for some α > 1.
• Here X_int is the (integral) optimum of F and X* is the optimum fractional solution for L.
• Because X_int is known to be feasible for L due to its integrality, we get an α approximation.
Example: Max Coverage
• Max Σ_j w_j·z_j
S.T.
Σ_{i : element j belongs to set i} x_i ≥ z_j
Σ_{set i} x_i = p
x_i and z_j are integral
In Set-Coverage we bound the number of sets.
The function F
F(x) = Σ_j w_j·(1 - Π_{i : element j belongs to set i} (1 - x_i))
Define a function on a cycle, as a function of ε.
The idea is to make plus ε and then minus ε over the cycle.
• Make one entry on the cycle smaller by ε and another larger by ε.
The function F
• F(x) = Σ_j w_j·(1 - Π_{i : element j belongs to set i} (1 - x_i))
• The idea is to make plus ε and then minus ε all over the cycle.
• But to show convexity we make just one entry on the cycle smaller by ε and another larger by ε.
• The ε appears as ε² in this term.
The function F
• As ε appears as ε² in this term, the second derivative is positive.
• Thus F is convex as a function of ε.
• Which means that the maximum is attained on the boundary.
• For example, for x² with x between -4 and 3, the maximum is at the boundary point -4.
Changing the entries two by two
• Putting plus and minus ε alternately along a cycle makes at least one entry integral.
• Moreover, we can decompose a cycle into two matchings, and there are two ways to increase and decrease by ε.
• One of the two directions makes the function no smaller.
• This implies that the optimum of F is integral.
Thus the optimum of F is integral
• It is not hard to see that on integral vectors F and L have the same value.
• Another inequality, quite hard to prove: for an element contained in k sets, 1 - Π_{i=1..k}(1 - x_i) ≥ (1 - (1-1/k)^k)·z_j, which gives F(X) ≥ (1 - (1-1/k)^k)·L(X).
• This gives a slightly better than 1-1/e ratio if k is small.
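A sketch of the rounding loop for the simple cardinality constraint Σ_i x_i = p (my simplification: the slides discuss the cycle version, here only one pair of fractional coordinates is touched at a time; the data layout, sets as a list of element sets and weights as a dict, is my assumption):

def F(x, sets, weights):
    # Smoothed coverage objective: sum_j w_j * (1 - prod_{i : j in S_i} (1 - x_i)).
    val = 0.0
    for j, w in weights.items():
        miss = 1.0
        for i, S in enumerate(sets):
            if j in S:
                miss *= 1.0 - x[i]
        val += w * (1.0 - miss)
    return val

def pipage_round(x, sets, weights, tol=1e-9):
    # Round a fractional x with integral sum, preserving the sum and never decreasing F:
    # F is convex along the direction e_i - e_j, so one of the two extreme moves is
    # at least as good, and each move makes at least one coordinate integral.
    x = list(x)
    while True:
        frac = [i for i, v in enumerate(x) if tol < v < 1 - tol]
        if len(frac) < 2:
            break
        i, j = frac[0], frac[1]
        cands = []
        for d in (min(1 - x[i], x[j]), -min(x[i], 1 - x[j])):
            y = list(x)
            y[i] += d
            y[j] -= d
            cands.append(y)
        x = max(cands, key=lambda y: F(y, sets, weights))
    return [round(v) for v in x]

# toy use: p = 2, three sets
sets = [{1, 2}, {2, 3}, {3, 4}]
weights = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
print(pipage_round([2/3, 2/3, 2/3], sets, weights))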
Submodularity: related to a very basic technique.
• f is submodular if f(A) + f(B) ≥ f(A∪B) + f(A∩B).
• It makes a lot of difference whether f is non-decreasing or not. If not, in my opinion it represents concavity.
• If non-decreasing, it brings us to the next lost simple subject: Submodular cover problems.
• Input: U, a submodular non-decreasing function f, and a cost c(u) per item u.
• Required: a set S of minimum cost so that f(S) = f(U).
Wolsey, 1982, did much better
• Each iteration, pick the item u so that help_u(S)/c(u) is maximum, where help_u(S) = f(S+u) - f(S).
• The ratio is max_{u∈U} ln f(u) + 1.
• Example: For Set-Cover, ln|s| + 1, with s the largest set.
• Example: The same for Set-Cover with hard capacities. A paper in 1991, and one in 2002, did this result again (the second 20 years after Wolsey). A special case, after 20 years! But it is worse yet.
• Wolsey did better than that. The natural LP has unbounded ratio even for Set-Cover with hard capacities.
• Wolsey found a fabulous LP of gap max_{u∈U} ln f(u) + 1.
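A sketch of this greedy rule (my illustration; f is given as a value oracle on subsets, and the toy example at the end is weighted Set-Cover, with f(S) = number of elements covered):

def greedy_submodular_cover(U, cost, f):
    # Wolsey-style greedy: repeatedly add the item maximizing marginal gain per unit cost,
    # until f(S) = f(U).  f must be non-decreasing and submodular.
    S = set()
    target = f(U)
    while f(S) < target:
        best, best_density = None, 0.0
        for u in U - S:
            gain = f(S | {u}) - f(S)
            if gain > 0 and gain / cost[u] > best_density:
                best, best_density = u, gain / cost[u]
        if best is None:
            break  # no progress possible; cannot happen if f(U) is reachable
        S.add(best)
    return S

sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}}
cost = {"a": 3.0, "b": 1.0, "c": 2.0}
f = lambda S: len(set().union(*(sets[s] for s in S)))
print(greedy_submodular_cover(set(sets), cost, f))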
More general: density
Not taught at all, but just cited. Why?
Here is a formal way:
Universe U and a function f: 2^U → R^+.
Each element u in U has a cost c(u).
The function f is non-decreasing.
We want to find a minimum cost W so that f(W) = f(U).
We usually take, for S ⊆ U, c(S) = Σ_{u∈S} c(u),
but it works for any subadditive cost function.
The density claim
• Say that we already created a set S via a greedy algorithm.
• Now say that at any iteration we are able to find some Z so that:
(f(Z+S) - f(S))/c(Z) ≥ (f(U) - f(S))/(δ·opt)
• Then the final set S has cost bounded by
(δ·ln f(U) + 1)·opt
What does it mean?
• Think for the moment of δ = 1.
• Say that the current set S has no intersection with the optimum.
• Then if we add all of OPT to S we certainly get a feasible solution.
• Then clearly f(S+OPT) = f(U).
• And
• (f(S+OPT) - f(S))/c(OPT) = (f(U) - f(S))/c(OPT)
• = (f(U) - f(S))/opt
• It means that we ask for a set Z to add whose density matches the density of adding all of OPT.
Proof continued
• Before the last set Z_i is added, f(U) - f(Z_1 + ... + Z_{i-1}) ≥ 1.
• We may assume that the cost of every set added is at most opt; in particular c(Z_i) ≤ opt.
• Therefore it remains to bound:
Σ_{j ≤ i-1} c(Z_j)
Let us concentrate on what happens before
Z_i is added.
By the previous claims
• 1 ≤ f(U) - f(Z_1 + Z_2 + ... + Z_{i-1}) ≤ Π_{j ≤ i-1} (1 - c(Z_j)/(δ·opt))·f(U)
• 1/f(U) ≤ Π_{j ≤ i-1} (1 - c(Z_j)/(δ·opt))
• Take ln and use ln(1+x) ≤ x:
-ln f(U) ≤ Σ_{j ≤ i-1} (-c(Z_j)/(δ·opt))
Σ_{j ≤ i-1} c(Z_j) ≤ δ·opt·ln f(U)
and so the ratio of (δ·ln f(U) + 1) follows.
A paper of mine
• Min c·x subject to A·B·x ≥ b, with A having positive entries and B a flow matrix. Ratio logarithmic.
• We got much more general results. The above, I was sure then and am sure now, was KNOWN, and we presented it as known.
• Referees: Cite, or prove submodularity! We had to prove it (the referees did not agree it is known!).
• Example: this gives log n for directed Source Location. Maybe the first time it was stated, but I considered it known.
• This log n was proved at least 4 times since then.
Remarks
• The bad thing about these 4 papers is not that they did not know our paper (that is to be expected) but that they would think such a simple result is NOT KNOWN.
• It is good to know the result of Wolsey: for example, we used it recently (Hajiaghayi, Khandekar, K, Nutov) to give a lower bound of about log² n for a problem in fashion: Capacitated Network Design (Steiner network with capacities). The first lower bound for hard capacities.
Dual fitting and a mistake we all make
• 1992. GK to Noga Alon:
• This (spanner) result bears similarities to the proof done by Lovász for Set-Cover.
• Noga Alon (seems very unhappy, maybe angry): Give me a break! That is folklore. Lovász told me he wrote it so he would have something to cite.......
• Everybody cites Lovász here. It is simply not true.
• We do not know the basics. The result was known many years before 1975.
• Should we cite folklore? Yes!
HOW to teach dual fitting for Set-Cover, unweighted?
• Let S be the collection of sets and T the elements.
• The dual (costs 1): Maximize Σ_{t∈T} y_t
• Subject to: Σ_{t∈s} y_t ≤ c(s) = 1 for every set s.
• We define a dual: if the greedy chose a star of length i, each element in that star gets 1/i.
[Figure: a greedy star of length 5; each of its elements gets dual value 1/5 = 0.2.]
The bound on the sum of elements of a given set
[Figure: the dual values received by the elements of a single set; the j-th-from-last element of the set to be covered gets at most 1/j, so the values are at most 1, 1/2, 1/3, ... and the sum over the set is at most a harmonic number, i.e., O(log n).]
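A sketch of the greedy with this dual assignment (my illustration; it assumes every element appears in some set):

def greedy_set_cover_with_duals(sets, universe):
    # Unweighted greedy Set-Cover.  When a star of length i is chosen, each element
    # newly covered by it receives dual value 1/i.
    uncovered = set(universe)
    chosen, y = [], {}
    while uncovered:
        s = max(sets, key=lambda name: len(sets[name] & uncovered))   # largest star
        star = sets[s] & uncovered
        for t in star:
            y[t] = 1.0 / len(star)
        chosen.append(s)
        uncovered -= star
    return chosen, y

# The dual constraint sum_{t in s} y_t <= 1 may be violated, but only by a factor
# H(|s|); dividing all y_t by H_n yields a feasible dual, certifying the ln n ratio.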
Primal Dual of GW
• Goemans and Williamson gave a rather well-known Primal-Dual algorithm. Always taught, and it should be.
• A question I asked quite a few researchers, and I do not remember a correct response: Why reverse delete?
• Why not Michael Jackson?
• The GW primal-dual imitates recursion.
• In Local Ratio, reverse delete follows from the recursion.
Local Ratio for covering problems
• Give weights to items so that every minimal solution is a ρ approximation. Reduce the items' costs by the weights chosen.
• Elements of cost 0 enter the solution.
• Make the solution minimal.
• Recurse.
• No need for reverse delete. The recursion implies it.
• Simpler for Steiner Network, in my opinion.
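As an illustration of this scheme (my example, not from the slides): weighted Vertex Cover, where the weight function puts ε on both endpoints of an uncovered edge, so every minimal solution of zero-cost vertices is a 2-approximation:

def local_ratio_vertex_cover(adj, cost):
    # adj: dict vertex -> set of neighbours; cost: dict vertex -> nonnegative cost.
    cost = dict(cost)
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    for u, v in edges:
        eps = min(cost[u], cost[v])   # local-ratio weight function on this edge
        cost[u] -= eps
        cost[v] -= eps
    # zero-cost vertices enter the solution; every edge now has such an endpoint
    solution = {v for v in cost if cost[v] == 0}
    # make the solution minimal
    for v in sorted(solution):
        rest = solution - {v}
        if all(a in rest or b in rest for a, b in edges):
            solution = rest
    return solution

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cost = {1: 2.0, 2: 1.0, 3: 3.0, 4: 1.0}
print(local_ratio_vertex_cover(adj, cost))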
Local Ratio
• Without it I do not think we could find a ratio 2 for Vertex Feedback Set.
• A recent result of K, Langberg, Nutov. A minor result (the main results are different), but it solves an open problem of a very smart person: Krivelevich.
• Covering triangles: gap 2 for the (polynomial) LP.
• Open problem: is it tight?
• Not only did we show a tight family, we also showed it is as hard as approximating VC. Used LR in the proof.
Group Steiner problem on trees
• Input: An undirected weighted tree T = (V, E) rooted at r, and subsets S_1, ..., S_p ⊆ V.
• Goal: Find a subtree that connects at least one vertex from each S_i to r.
• The Garg, Konjevod and Ravi proof, while quite simple, can be much, much further simplified. In both proofs: O(log n · log p) ratio.
• The easier (unpublished) proof is by Khandekar and Garg.
The theorem of Garg, Konjevod and Ravi
• There is an O(h·log p)-approximation algorithm for Group Steiner on trees, where T = (V, E), rooted at r, has depth h.
• Simple observation: we may assume that the groups only contain leaves, by adding zero-cost edges.
• The GKR result uses LP methods.
The fractional LP
• Minimize Σ_e cost(e)·x_e
f_r(g) = 1 for every g
f_e(g) ≤ x_e
f_v(g) ≤ Σ_{v' child of v} f_{vv'}(g)
f_v(g) = f_{par(v)v}(g)
The x_e are capacities. Under them, the total flow from r to the leaves that belong to g is 1. If we set x_e = 1 for the edges of the optimum we get an optimum solution.
Thus the above (fractional) LP is a relaxation.
The rounding method of GKR
• Consider x_e, and say that the parent edge of e is (par(v), v).
• Independently for every e, add it to the solution with probability x_e / x_{par(v)v}.
• We show that the expected cost is bounded by the LP cost.
• The probability that an edge gets connected to the root is a telescoping product.
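A sketch of one rounding iteration (my rendering; each edge is identified by its lower endpoint v, and parent[] and x[] are assumed to be given as dicts from the tree and the LP solution):

import random

def gkr_round(parent, x, root):
    # parent[v] is the parent of v (root excluded); x[v] is the LP value of edge (parent[v], v);
    # the virtual edge above the root has value 1.  Each edge is kept independently with
    # probability x[v]/x[parent[v]].  An edge reaches the root iff it and all its ancestor
    # edges are kept, which happens with probability exactly x[v]: the ratios telescope.
    kept = set()
    for v in parent:
        denom = 1.0 if parent[v] == root else x[parent[v]]
        if denom > 0 and random.random() < min(1.0, x[v] / denom):
            kept.add(v)
    connected = set()
    for v in kept:
        u, ok = parent[v], True
        while u != root:
            if u not in kept:
                ok = False
                break
            u = parent[u]
        if ok:
            connected.add(v)
    return connected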
The probability that an edge is chosen
• All terms cancel but the first and the last. The first is x_e. The last is the value of the edge from the parent of r to r, which we may assume is 1.
• Since this is the case, e contributes x_e·cost(e) to the expected cost.
• Therefore, the expected cost is the LP value, which is at most the integral optimum.
However: what is the probability that a group is covered?
The probability a group is covered
• Let v be a vertex at level i in the tree. Then the probability that after the rounding there is a path from v to a vertex in g is at least:
f_v(g) / ((h-i+1)·x_{par(v)v})
Let P(v) be the probability that the group is not covered
• Let P(v) be the probability that there is no path from v to a leaf in group g. In the next inequalities a vertex v' is always a child of v and the corresponding edge is e = (v, v').
• P(v) = Π_{v'} (1 - x_e·(1 - P(v'))/x_{par(v)v})
• Explanation: The probability that the group gets connected to a child v' (within the subtree of v') is (1 - P(v')). Independently, the probability that the edge (v, v') gets selected is x_e/x_{par(v)v}. The product is because the events are independent for different children.
Proof continued
• (1 - P(v')) is the probability that v' can reach a leaf of g by a path after the randomized process.
• By the induction assumption (v' is at level i+1):
(1 - P(v')) ≥ f_{v'}(g)/((h-i)·x_{vv'})
Therefore:
P(v) ≤ Π_{v'} (1 - x_e·f_{v'}(g)/(x_{par(v)v}·(h-i)·x_e)) = Π_{v'} (1 - f_{v'}(g)/(x_{par(v)v}·(h-i)))
Proof continued
• We use the inequality 1-x ≤ exp(-x) to get:
P(v) ≤ exp(-Σ_{v'} f_{v'}(g)/(x_{par(v)v}·(h-i)))
• From the constraints of the LP we get:
• P(v) ≤ exp(-f_v(g)/(x_{par(v)v}·(h-i)))
Ending the proof
• Use the inequality exp(-x) ≤ 1/(1+x), together with f_v(g) ≤ x_{par(v)v}, to get:
• P(v) ≤ 1 - f_v(g)/((h-i+1)·x_{par(v)v})
• This ends the induction step.
• We now only have to consider v = r.
Proof continued
For the root we may think of x_{par(r)r} = 1.
• For the root, f_r(g) = 1, and thus the probability that a group is covered is at least 1/(h+1). The probability that a group is not covered in (h+1)·ln p independent iterations is at most
(1 - 1/(h+1))^{(h+1)·ln p} ≤ exp(-ln p) = 1/p
End of proof.
• Since each group is not covered with probability at most 1/p, we can take every uncovered group and join it by a shortest path to r. A shortest path from any group member to r costs at most opt.
• Thus the expected cost of this final stage is:
(1/p)·p·opt = opt
• Thus the total expected cost is (h+1)·ln p·opt + opt.
Making h = log n
• Question: If the input tree for Group Steiner is very tall to begin with, how do we get an O(log² n) ratio?
• Use FRT? It loses a log n and is complicated.
• Basic but probably not widely known: Chekuri, Even and Kortsarz show how to reduce the height of any tree to log n with a penalty of 8 on the cost. Combinatorial!
• In summary, we get an elementary analysis of an O(log n·log p) approximation ratio for Group Steiner on trees.
Recursive greedy
• Never taught. Directed Steiner, a basic problem.
• A gem by Charikar et al. Say that the number of terminals to be covered is z. There is a child u in T* whose density is at most opt/z.
• Let z' be the number of terminals in T*_u.
• The analysis stops once we cover at least z'/(h-1) terminals. Details omitted, but this gives a telescoping product which means the density returned is at most h·opt/z.
• Can make h = O(1/ε) with a ratio penalty of n^ε (Zelikovsky). Time: larger, but in the ballpark of n^h = n^{O(1/ε)}.
Alternative approximation algorithm for Directed Steiner
• This was apparently known (Chekuri told me), in a more complex form, since 1999.
• The simpler way (as far as I know) is by Mendel and Nutov.
• Create a graph H in which each path from the root r to some terminal u of length at most 1/ε is a node.
• There is a directed edge from p' to p if p extends p' by one edge.
• By the theorem of Zelikovsky, a solution of cost at most O(n^ε)·opt is embedded in H.
A non-recursive greedy approximation for Directed Steiner
• For every terminal t, make a group H_t of all paths of length at most 1/ε that start at r and end at t.
• This reduces the problem to Group Steiner on trees: connect at least one member of H_t by a path from r, for every t. Our analysis works and it is a page and a half.
• This gives a non-Recursive-Greedy algorithm of two pages for Directed Steiner with the same ratio: n^ε. The only black boxes are the (very complex) height reduction of CEK and the Zelikovsky theorem.
Certificate of failure
• Many papers say that: 'The value opt of OPT is KNOWN.'
• Knowing opt?? How can we know opt? Absurd. It would mean P=NP.
• I first saw this in a paper of Hochbaum and Shmoys from J. ACM 1984. The paper is called: Powers of graphs: A powerful approximation technique for bottleneck problems.
• Certificate of failure. Take a guess λ. Even if λ < opt, the algorithm may return a solution of cost at most ρ·λ (≤ ρ·opt).
• Alternatively, it may return failure. In this case λ < opt is guaranteed to hold (this is why it is a certificate).
Certificate of failure
• In case λ ≥ opt, the algorithm returns a solution of cost at most ρ·λ.
• Binary search: find λ such that the algorithm fails for λ/2 but succeeds with λ. As λ/2 < opt, and the algorithm returns a solution of cost at most ρ·λ, the ratio is 2ρ.
• Referees of my papers failed to understand this, many, many times. The convention does not seem to be known to all. It should be!
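A minimal sketch of the doubling search this convention supports (my illustration; solve_with_guess, rho and lam0 are placeholders for a concrete problem, and failure must certify that the guess is below opt):

def approx_with_certificate(solve_with_guess, lam0):
    # solve_with_guess(lam) returns a solution of cost <= rho*lam, or None as a
    # certificate that lam < opt.  When it first succeeds at some lam > lam0 we know
    # lam/2 < opt, so the returned solution costs at most rho*lam < 2*rho*opt.
    # (Assumes the starting guess lam0 is at most opt.)
    lam = lam0
    sol = solve_with_guess(lam)
    while sol is None:          # failure certifies lam < opt, so doubling is safe
        lam *= 2
        sol = solve_with_guess(lam)
    return sol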
Density LP: useful and basic
• Say that you have an LP for a covering problem that has some good ratio.
• But now you only want to cover k of the elements. For every element x there will be a variable y_x that says how much of x is taken.
• We write Σ_x y_x = k but then divide the sum by k, which means that the objective value is also divided by k. Thus we try to solve a density LP.
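One possible concrete rendering of this (my formulation, not from the slides), for a covering LP with objective Σ_e cost(e)·x_e and element variables written y_j:
Minimize (1/k)·Σ_e cost(e)·x_e
S.T.
Σ_{e covers j} x_e ≥ y_j for every element j
Σ_j y_j = k
0 ≤ x_e, 0 ≤ y_j ≤ 1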
Density LP
• You can get the original ratio with a penalty of O(log² n) in the ratio.
• The number of items inside the solution may be much more than k; therefore whether we can get exactly k may depend on the possibility of a density decomposition.
• I was first shown this (by Chekuri) about 6 years ago. What do I not know about LP now? I fear that a lot.
Application of the basics, example 1
• Broadcast problem, directed graph, Steiner set S.
• A vertex r knows a message and the goal is to transmit it to all of S. Let K be the set of vertices that know the message and N those who do not. At every round, a directed matching from K to N is chosen.
• The endpoints in N of the matching join K.
• Minimize the number of rounds.
• Let k = |S|. Remark: this result was obtained with Elkin.
Algorithm
• Find a vertex u that has at least sqrt{k} terminals at distance at most opt from u.
• Remove a tree T_u with exactly sqrt{k} terminals and height at most opt from G. Let N be the remaining vertices.
• Iterate until no such u exists.
• Let K' be the union of the trees and R the set of their roots. Clearly the number of roots is at most sqrt{k}.
• We cannot employ recursion, but we can inform all of K' in 2·sqrt{k} + 2·opt rounds.
To finish, it is enough to inform a distance-opt dominating set D ⊆ N
• Cover N∩S by trees rooted at D. No vertex in those trees has more than sqrt{k} terminals at distance opt. So informing the rest of N, given D ⊆ K, requires opt + sqrt{k} rounds.
• How do we inform a distance-opt dominating set?
• Reduce to the minimization version of maximizing a non-decreasing submodular function under a partition Matroid.
Define a new graph
[Figure: the auxiliary graph, with nodes labeled (k, n_1), ..., (k, n_p), vertices z, p, q, and edges of length opt; the remaining details are not recoverable from the transcript.]
Finding a k < |S| arborescence from r with minimum maximum outdegree
[Figure: a flow network with source s', sink t', a middle set W, and capacities sqrt{k}, λ, k and 1 on the edges; the remaining details are not recoverable from the transcript.]
Solution
• Solution obtained with Khandekar and Nutov.
• Take the edges that carry flow and an arborescence from r to W. Flow(W) is non-decreasing and submodular.
• We prove there exists a feasible W of size sqrt{k/λ}. Non-trivial proof, omitted.
• The capacity of vertices and edges, divided by λ, is also sqrt{k/λ}.
• By the Wolsey theorem, this gives about a sqrt{k/λ} ratio approximation. The LP gap is sqrt{k}!
Summary
• It goes without saying that my opinions bind only me.
• My intention is not to change courses for real. That would be presumptuous.
• Will I follow my own advice? Yes.
• I cannot only use the wonderful existing slides.
• The little man always had to struggle in very difficult circumstances.
• Thank you