On Routing without Regret Avrim Blum CMU Portions of this talk are joint work with Eyal Even-Dar, Katrina Ligett, Yishay Mansour, and Brendan McMahan.

On Routing without Regret
Avrim Blum, CMU
Portions of this talk are joint work with Eyal Even-Dar, Katrina Ligett, Yishay Mansour, and Brendan McMahan.
Plan for this talk:
- History and background on "no-regret" algorithms.
- Some recent results / new directions, especially those motivated by routing problems.
- What can we say about global behavior if everyone is optimizing in this way, in the Wardrop traffic model?
Consider the following setting…
- Each morning, you need to pick one of N possible routes to get to work.
- But traffic is different each day. Not clear a priori which will be best.
- When you get there you find out how long your route took. (And maybe others too, or maybe not.)
- Is there a strategy for picking routes so that, in the long run, whatever the sequence of traffic patterns has been, you've done not much worse than the best fixed route in hindsight? (In expectation, over internal randomness in the algorithm.)
- Yes.
"No-regret" algorithms for repeated games
The setup:
- Repeated play of a matrix game with N rows. (Algorithm is the row player; rows represent different possible actions.)
[Figure: payoff matrix — Algorithm picks rows, Adversary (world, life) picks columns]
- At each time step, the algorithm picks a row, life picks a column.
- Alg pays the cost of the action chosen.
- Alg gets the column as feedback (or just its own cost in the "bandit" model).
- All entries are scaled to be losses/costs between 0 and 1.
"No-regret" algorithms for repeated games
Define total regret in T time steps as the difference between the (expected) total cost incurred and the cost of the best fixed row in hindsight. Average regret is (total regret)/T, which we want to go to 0 or better [= "no-regret" algorithm].
AKA "combining expert advice in the decision-theoretic setting".
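To make the definition concrete, here is a minimal sketch (Python, with made-up cost data and a placeholder algorithm that picks rows uniformly at random) of computing total and average regret against the best fixed row in hindsight:

```python
import numpy as np

# Hypothetical data: costs[t][i] = cost of row i at time step t, scaled to [0, 1].
rng = np.random.default_rng(0)
T, N = 1000, 5
costs = rng.random((T, N))

# Placeholder algorithm: suppose it chose row chosen[t] at step t (uniformly at random).
chosen = rng.integers(0, N, size=T)

alg_cost = costs[np.arange(T), chosen].sum()   # (expected) total cost incurred
best_fixed = costs.sum(axis=0).min()           # cost of the best fixed row in hindsight
total_regret = alg_cost - best_fixed
print("total regret:", total_regret)
print("average regret:", total_regret / T)     # a no-regret algorithm drives this to 0
```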
Some intuition & properties of no-regret algs
- Time-average performance is guaranteed to approach the minimax value V of the game (or better, if life isn't adversarial).
- In fact, the existence of no-regret algs yields a proof of the minimax theorem.
- If you can implement the adversary (separation) oracle, you can use this to get approximately minimax-optimal play. Also, two NR algorithms played against each other will have their empirical distribution approach minimax optimal.
- Algorithms must be randomized or else it's hopeless.
History and development (abridged)
- [Hannan'57]: Algorithm with total regret O((TN)^{1/2}).
- Can see T^{1/2} is necessary from a coin-flipping example.
- Re-phrasing: need only T = O(N/ε²) steps to get time-average regret down to ε. (Will call this quantity T_ε.)
- Game-theorists viewed N as fixed, constant, and not so important as T, so pretty much done.
History and development (abridged)
- [Hannan'57]: Algorithm with total regret O((TN)^{1/2}).
  - T_ε = O(N/ε²) steps to get average regret down to ε.
- Learning theory, '80s-'90s:
  - Q: given a space of hypotheses (like conjunctions over n boolean features), can you do online prediction in a way that does nearly as well as the best of them in hindsight (ignoring computational issues: here N = 2^n)?
  - [LittlestoneWarmuth'89]: Weighted-majority algorithm.
    - E[#mistakes] ≤ OPT·(1+ε) + ε⁻¹ log N,
    - or ≤ OPT + O((T log N)^{1/2}) if ε is set based on T,
    - or T_ε = O((log N)/ε²).
  - For intuition, think of the case where OPT = 0.
  - Can replace "log N" with the #bits needed to describe opt.
- [FreundSchapire] realized this could apply to the game setup.
Weighted-majority & variants
- Initialize all weights of all actions to 1. Pick action i with probability p_i = w_i/W, where W = Σ_i w_i.
- Given cost vector c = (c_1, c_2, …, c_N), update w_i ← w_i·(1-ε)^{c_i}.
- Won't give the proof, since we will analyze a more general setting & algorithm instead.
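A minimal sketch of this update rule (Python; the cost sequence below is synthetic and the particular tuning of ε is not from the talk):

```python
import numpy as np

def randomized_weighted_majority(cost_vectors, eps=0.1):
    """Randomized weighted majority: w_i <- w_i * (1 - eps)^{c_i} at each step.

    cost_vectors: sequence of length-N arrays with entries in [0, 1].
    Returns the algorithm's expected total cost.
    """
    cost_vectors = np.asarray(list(cost_vectors), dtype=float)
    N = cost_vectors.shape[1]
    w = np.ones(N)                       # initialize all weights to 1
    expected_cost = 0.0
    for c in cost_vectors:
        p = w / w.sum()                  # pick action i with probability w_i / W
        expected_cost += p @ c           # expected cost this step
        w *= (1.0 - eps) ** c            # multiplicative update
    return expected_cost

# Toy usage: N = 3 actions, action 0 is usually best.
rng = np.random.default_rng(1)
costs = rng.random((500, 3)); costs[:, 0] *= 0.3
print(randomized_weighted_majority(costs), costs.sum(axis=0).min())
```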
A generalization before continuing our story…
- A natural generalization of our regret goal: what if we also want that on rainy days, we do nearly as well as the best route for rainy days?
- And on Mondays, do nearly as well as the best route for Mondays.
- More generally, we have N "rules" ("on Mondays, use path P"). Goal: simultaneously, for each rule i, guarantee to do nearly as well as it on the time steps in which it fires.
- For all i, want E[cost_i(alg)] ≤ (1+ε)·cost_i(i) + O(ε⁻¹ log N).
This is the "specialists" or "sleeping experts" problem. Studied theoretically in [B95][FSSW97][BM05]; used in practice [CS'96, CS'99] for document classification.
A generalization before continuing our story…
- Simple alg (joint with Yishay Mansour):
  - Define the "relaxed regret" with respect to rule i as:
      R_i = E[cost_i(alg)]/(1+ε) − cost_i(i).     (Want R_i ≤ ε⁻¹ log N.)
  - Give rule i weight w_i = (1+ε)^{R_i}. Pick with probability p_i = w_i/W.
  - Initially, all weights are 1 and sum to N.
  - Prove the sum of weights never increases. [insert proof here]
  - Conclude R_i ≤ log_{1+ε} N ≈ ε⁻¹ log N.
- Can extend to rules that can be fractionally on too.
A generalization before continuing our story…
- R_i = E[cost_i(alg)]/(1+ε) − cost_i(i).
- Give rule i weight w_i = (1+ε)^{R_i}. Pick with probability p_i = w_i/W.
- Prove the sum of weights never increases.
That's it!
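A minimal Python sketch of this rule (the interface — a stream of (awake, cost) pairs — is an assumption for illustration; it also assumes at least one rule fires each step):

```python
import numpy as np

def sleeping_experts(steps, N, eps=0.1):
    """Relaxed-regret sleeping-experts rule as sketched on the slide:
    R_i = (alg's expected cost on steps where rule i fires)/(1+eps) minus rule i's
    own cost on those steps; rule i gets weight (1+eps)^{R_i}, and each step we pick
    among the awake rules with probability proportional to weight.

    steps: iterable of (awake, cost), where awake is a boolean length-N array of the
    rules that fire this step and cost is a length-N array of their costs in [0, 1].
    """
    R = np.zeros(N)                                 # relaxed regrets (all weights start at 1)
    for awake, cost in steps:
        w = (1.0 + eps) ** R
        w_awake = np.where(awake, w, 0.0)
        p = w_awake / w_awake.sum()                 # distribution over awake rules only
        alg_cost = p @ cost                         # algorithm's expected cost this step
        # Only awake rules update: charged the alg's cost, credited their own cost.
        R = np.where(awake, R + alg_cost / (1.0 + eps) - cost, R)
    return R                                        # each R_i should stay <= log_{1+eps} N
```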
History and development contd…
- [Hannan'57]: T_ε = O(N/ε²).
- Weighted-majority: T_ε = O((log N)/ε²).
- So, conceivably we can do well even when N is exponential in the natural problem size (like in online routing), if only we could implement this efficiently.
- Learning theory '90s-'00s: a series of results giving efficient implementations/alternatives in various settings:
  - [HelmboldSchapire97]: best pruning of a given DT.
  - [BChawlaKalai02]: (1+ε) static-optimal for list-update.
  - [TakimotoWarmuth02]: online shortest paths.
  - [KalaiVempala03]: elegant setting generalizing all of the above.
  - [Zinkevich03]: online convex programming.
  - [AwerbuchKleinberg04][McMahanB04]: bandit model for [KV].
  - [Kl,FlKaMc05]: bandit version of [Z03].
Kalai-Vempala setting
- Set S of feasible points in R^m, of bounded diameter. (Think of these as indicator vectors for possible paths.)
- For t = 1 to T do:
  - Alg picks x_t ∈ S, adversary picks cost vector c_t.
  - Alg pays x_t · c_t.
- Goal is to compete with respect to the best fixed x in hindsight: the x ∈ S that minimizes x · (c_1 + c_2 + … + c_T).
- Don't store S explicitly. Instead, assume we have an oracle for the offline problem: given c, find the best x ∈ S.
  - E.g., S is convex, or S is the set of paths from v_s to v_t.
  - Goal is to use this to solve the online problem.
Kalai-Vempala algorithm
- Assume we have an oracle for the offline problem: given c, find the best x ∈ S.
- The algorithm is very simple:
  - Just pick the x_t ∈ S that minimizes x·(c_0 + c_1 + … + c_{t-1}),
  - where c_0 is picked from an appropriate distribution.
  - In fact, this is very similar to Hannan's original alg.
- Form of bounds:
  - T_ε = O(diam(S) · (L1 bound on the c's) · log(m) / ε²).
  - For online shortest paths, T_ε = O(nm·log(n)/ε²).
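A minimal Python sketch of this "follow the perturbed leader" scheme. The offline oracle is just a function handle; the toy usage enumerates a small explicit path set, and the choice of a fresh exponential perturbation each step is one standard variant, assumed here rather than taken from the slides:

```python
import numpy as np

def follow_the_perturbed_leader(cost_vectors, oracle, m, eta=1.0, seed=0):
    """At step t, play the x in S minimizing x . (c_0 + c_1 + ... + c_{t-1}),
    where c_0 is a random perturbation and oracle(c) solves the offline problem:
    given a cost vector c of length m, return the best x in S (e.g., a shortest-path
    routine returning an edge-indicator vector)."""
    rng = np.random.default_rng(seed)
    cumulative = np.zeros(m)
    total_cost = 0.0
    for c in cost_vectors:
        c0 = rng.exponential(scale=eta, size=m)     # random perturbation c_0
        x = oracle(cumulative + c0)                 # offline oracle on perturbed history
        total_cost += x @ c
        cumulative += c
    return total_cost

# Toy usage: S = three "paths", given explicitly as 0/1 indicator vectors over m = 4 edges.
paths = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1]], dtype=float)
oracle = lambda c: paths[np.argmin(paths @ c)]
costs = np.random.default_rng(2).random((300, 4))
print(follow_the_perturbed_leader(costs, oracle, m=4),
      (paths @ costs.sum(axis=0)).min())            # compare to best fixed path in hindsight
```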
Analysis sketch [KV]
Two algorithms walk into a bar…
- Alg A picks x_t minimizing x_t·C_{t-1}, where C_{t-1} = c_1 + … + c_{t-1}.
- Alg B picks x_t minimizing x_t·C_t, where C_t = c_1 + … + c_t. (B has fairy godparents who add c_t into the history.)
Step 1: prove B is at least as good as OPT:
    Σ_t (B's x_t)·c_t ≤ min_{x ∈ S} x·(c_1 + … + c_T).
Uses a cute telescoping argument.
Now, A & B start drinking and their objectives get fuzzier…
Step 2: at the appropriate point, prove A & B are similar and yet B has not been hurt too much.
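For reference, the telescoping step in Step 1 can be written as a short induction on T (a standard reconstruction of the argument, not the slide's own derivation), with x_t^B = argmin_{x ∈ S} x·C_t:

```latex
\begin{align*}
\sum_{t=1}^{T} x_t^B \cdot c_t
  &= \sum_{t=1}^{T-1} x_t^B \cdot c_t \;+\; x_T^B \cdot c_T \\
  &\le x_{T-1}^B \cdot C_{T-1} + x_T^B \cdot c_T   && \text{(inductive hypothesis)} \\
  &\le x_T^B \cdot C_{T-1} + x_T^B \cdot c_T       && \text{($x_{T-1}^B$ minimizes $x \cdot C_{T-1}$ over $S$)} \\
  &= x_T^B \cdot C_T \;=\; \min_{x \in S} x \cdot (c_1 + \dots + c_T).
\end{align*}
```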
Applications
Efficient online algorithms to perform nearly as well as:
- the best fixed path in hindsight (in routing),
- the best fixed search tree in hindsight (in data structures),
- …
Potential use for algorithmic problems that can be viewed as finding an apx-optimal solution to an exponential-size matrix game, if we can fit both players into this framework.
Can combine KV with sleeping-experts too
- Say you are given N "conditions" or "features" to pay attention to (is it raining?, is it a Monday?, …).
- Each day satisfies some conditions and not others.
- What can we do?
  - For each condition i, run a copy of KV on just the days satisfying that condition.
  - Then view these N algorithms as "sleeping experts" and feed their suggestions as inputs into [BM]. (A sketch of this wiring follows below.)
- For each condition i, on the days satisfying that condition we do nearly as well as the best x ∈ S for the days satisfying that condition.
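A minimal Python sketch of that wiring, under illustrative assumptions: an explicit small path set, a toy FPL learner per condition, the relaxed-regret weights from earlier, and at least one condition firing each day. Class and function names are made up for the example:

```python
import numpy as np

class PathFPL:
    """Toy follow-the-perturbed-leader learner over an explicit set of paths
    (rows of `paths` are 0/1 edge-indicator vectors)."""
    def __init__(self, paths, eta=1.0, seed=0):
        self.paths, self.eta = paths, eta
        self.cumulative = np.zeros(paths.shape[1])
        self.rng = np.random.default_rng(seed)
    def suggest(self):
        perturbed = self.cumulative + self.rng.exponential(self.eta, self.cumulative.shape)
        return self.paths[np.argmin(self.paths @ perturbed)]
    def update(self, edge_costs):
        self.cumulative += edge_costs

def kv_plus_sleeping_experts(days, paths, N, eps=0.1):
    """One FPL copy per condition, combined with the relaxed-regret weights from
    earlier.  `days` yields (awake, edge_costs): which of the N conditions hold
    today (boolean array) and the edge-cost vector revealed at the end of the day."""
    learners = [PathFPL(paths, seed=i) for i in range(N)]
    R = np.zeros(N)                                  # relaxed regrets of the N "experts"
    expected_total = 0.0
    for awake, edge_costs in days:
        suggestions = np.array([ln.suggest() for ln in learners])
        expert_costs = suggestions @ edge_costs      # cost each expert's suggested path incurs
        w = np.where(awake, (1.0 + eps) ** R, 0.0)
        p = w / w.sum()                              # mix only over the awake experts
        expected_total += p @ expert_costs
        R = np.where(awake, R + (p @ expert_costs) / (1.0 + eps) - expert_costs, R)
        for i in np.flatnonzero(awake):
            learners[i].update(edge_costs)           # each copy sees only the days it fires on
    return expected_total
```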
Bandit setting [AK] [MB]
- What if the alg is only told the cost x_t·c_t and not c_t itself?
  - E.g., you only find out the cost of your own path, not of all edges in the network.
- Can you still perform comparably to the best path in hindsight? (Which you don't even know!)
- Ans: yes, though the bounds are worse.
- Basic idea is fairly straightforward:
  - All we need is an estimate of C_{t-1} = c_1 + … + c_{t-1}.
  - So, pick a basis B and occasionally sample a random x ∈ B.
  - Use dot-products with the basis vectors to reconstruct an estimate of C_{t-1}. (Helps for B to be as orthogonal as possible.)
  - Even if the adversary is adaptive, it still can't bias your estimate too much.
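A minimal Python sketch of the sampling idea (the exploration rate, basis choice, and importance-weighting details here are assumptions for illustration, not the exact scheme from [AK]/[MB]):

```python
import numpy as np

def bandit_cost_estimates(true_costs, basis, gamma=0.1, seed=0):
    """With small probability gamma, play a uniformly random element of a fixed basis B
    (rows of `basis`, assumed invertible, e.g. m linearly independent paths) and observe
    only the scalar cost x . c_t.  Importance-weighting those observations gives an
    unbiased estimate of each b_j . c_t, and multiplying by B^{-1} reconstructs an
    estimate of c_t itself; summing the estimates over time estimates C_{t-1}."""
    rng = np.random.default_rng(seed)
    m = basis.shape[0]
    basis_inv = np.linalg.inv(basis)
    estimates = []
    for c in true_costs:
        y_hat = np.zeros(m)                        # estimates of b_j . c for each basis vector
        if rng.random() < gamma:                   # exploration step
            j = rng.integers(m)
            observed = basis[j] @ c                # the only feedback: cost of the path played
            y_hat[j] = observed * m / gamma        # importance weighting => unbiased in expectation
        estimates.append(basis_inv @ y_hat)        # estimate of the full cost vector c_t
    return np.array(estimates)

# Toy check: the estimates are unbiased, so their average approaches the true average cost.
basis = np.eye(3)
costs = np.tile([0.2, 0.5, 0.9], (20000, 1))
print(bandit_cost_estimates(costs, basis).mean(axis=0))   # roughly [0.2, 0.5, 0.9]
```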
Now on to the last part of the talk…
What if everyone started using NR algs?
- What if the changing cost function is due to other players in the system optimizing for themselves?
- No-regret can be viewed as a nice definition of reasonable self-interested behavior.
- What happens to the overall system if everyone uses one?
  - In zero-sum games, behavior quickly approaches minimax optimal.
  - In general-sum games, does behavior quickly (or at all) approach a Nash equilibrium? (After all, a Nash equilibrium is exactly a set of distributions that are no-regret with respect to each other.)
  - Well, unfortunately, no.
A bad example for general-sum games
- Augmented Shapley game from [Z04]:
  - The first 3 rows/cols are the Shapley game (rock/paper/scissors, but if both do the same action then both lose).
  - The 4th action, "play foosball", has a slight negative payoff if the other player is still doing r/p/s, but a positive one if the other player does the 4th action too.
  - NR algs will cycle among the first 3 actions and have no regret, but do worse than the only Nash equilibrium, which is both playing foosball.
- But how about routing, since this has more structure?
Consider Wardrop/Roughgarden-Tardos traffic model
- Given a graph G. Each edge e has a non-decreasing cost function c_e(f_e) that gives the latency of that edge as a function of the amount of traffic f_e using it.
- Say 1 unit of traffic (infinitesimal users) wants to travel from v_s to v_t. E.g., a simple case: two parallel edges from v_s to v_t with c_e(f) = f and c_e(f) = 2f; the Nash flow is (2/3, 1/3).
- A Nash equilibrium is a flow f* such that all paths with positive flow have the same cost, and no path is cheaper.
- Useful notions:
  - Cost(f) = Σ_e c_e(f_e)·f_e = cost of the average user under f.
  - Cost_f(P) = Σ_{e ∈ P} c_e(f_e) = cost of using path P given flow f.
  - So, Cost(f*) = min_P Cost_{f*}(P).
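A quick worked check of the two-edge example and of the definitions above:

```latex
% Two parallel edges from v_s to v_t with c_1(f) = f and c_2(f) = 2f, one unit of traffic.
% At Nash, both used paths have equal cost:
%   f_1 = 2 f_2,  f_1 + f_2 = 1  ==>  f^* = (2/3, 1/3),  each path costing 2/3.
\[
  \mathrm{Cost}(f^*) \;=\; \sum_e c_e(f^*_e)\, f^*_e
  \;=\; \tfrac{2}{3}\cdot\tfrac{2}{3} + \tfrac{2}{3}\cdot\tfrac{1}{3}
  \;=\; \tfrac{2}{3}
  \;=\; \min_P \mathrm{Cost}_{f^*}(P).
\]
```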
- What happens if people use no-regret algorithms?
Global behavior of NR algs [B-EvenDar-Ligett]
- On day t, have flow f_t.
- Average regret ≤ ε by some time T.
- So, avg_t[Cost(f_t)] ≤ ε + min_P avg_t[Cost_{f_t}(P)].
- What we'd like to say is that the time-average flow f_avg is ε-Nash:
    Cost(f_avg) ≤ ε + min_P Cost_{f_avg}(P).
- Or, even better, that most f_t are ε-Nash:
    Cost(f_t) ≤ ε + min_P Cost_{f_t}(P).
- But there are problems if the cost functions are too sharp.
But can show if bounded slope…
Proof sketch:
1. For any edge e, time-avg cost ≤ flow-avg cost. So,
     f_e^avg · avg_t[c_e(f_e^t)] ≤ avg_t[c_e(f_e^t) · f_e^t].
2. Summing over all edges, and applying the regret bound:
     avg_t[Cost_{f_t}(f_avg)] ≤ avg_t[Cost(f_t)] ≤ ε + min_P avg_t[Cost_{f_t}(P)],
   which in turn is ≤ ε + avg_t[Cost_{f_t}(f_avg)].
3. This means that actually, for each edge, the time-avg cost must be pretty close to the flow-avg cost, which (by the assumption of bounded slope) means the costs can't vary too much over time.
4. This then lets you swap quantifiers (cost/avg) to get:
     Cost(f_avg) ≤ ε' + min_P Cost_{f_avg}(P),
   where ε' = O((ε · max-slope · n)^{1/2}).
Can also get bounds for "most" f_t too.
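The inequality in step 1 is a Chebyshev-sum-inequality style argument; written out (a reconstruction, not the slide's own derivation): since c_e is non-decreasing, f_e^t and c_e(f_e^t) move together over time, so for any two days t, t',

```latex
\[
  \bigl(f_e^t - f_e^{t'}\bigr)\bigl(c_e(f_e^t) - c_e(f_e^{t'})\bigr) \;\ge\; 0,
\]
% and averaging this over all pairs (t, t') gives exactly
%   avg_t[f_e^t] \cdot avg_t[c_e(f_e^t)] \le avg_t[f_e^t \, c_e(f_e^t)],
% i.e. time-average cost is at most flow-average cost on every edge.
```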
Summary/Open problems
- Regret-minimizing algorithms, especially as motivated by online routing-type problems.
- Can perform comparably to the best fixed path in hindsight, even with very limited information.
- The no-regret property is sufficient to converge to Nash in the Wardrop model.
Open problems:
- Algorithmic use of [KV] for fast apx minimax.
- [KV] with apx oracles; internal regret.
- Nash convergence bounds are pretty loose, especially for "most f_t close to ε-Nash".