Regret Minimization and Job Scheduling


Yishay Mansour, Tel Aviv University


Decision Making under uncertainty

• Online algorithms
  – Stochastic models
  – Competitive analysis
  – Absolute performance criteria
• A different approach:
  – Define "reasonable" strategies
  – Compete with the best (in retrospect)
  – Relative performance criteria

Routing

• Model: each day
  1. select a path from source to destination
  2. observe the latencies
  – each day the values differ
• Strategies: all source-destination paths
• Loss: the average latency on the selected path
• Performance Goal: match the latency of the best single path

Financial Markets: options

• Model: stock or cash. Each day, set a portfolio, then observe the outcome.
• Strategies: invest either all in stock or all in cash
• Gain: based on the daily changes of the stock
• Performance Goal: implements an option!

Machine learning – Expert Advice

• Model: each time step
  1. observe the expert predictions
  2. predict a label
• Strategies: experts (online learning algorithms)
• Loss: prediction errors
• Performance Goal: match the error rate of the best expert, in retrospect

Parameter Tuning

• Model: multiple parameters
• Strategies: settings of the parameters
• Optimization: any
• Performance Goal: match the best setting of the parameters

Parameter Tuning

• Development cycle:
  – develop product (software)
  – test performance
  – tune parameters
  – deliver "tuned" product
• Challenge: can we combine
  – testing
  – tuning
  – runtime

Regret Minimization: Model

• Actions A = {1, …, N}
• Time steps: t ∈ {1, …, T}
• At time step t:
  – the agent selects a distribution p_t(i) over A
  – the environment returns costs c_t(i) ∈ [0,1]
    • adversarial setting
• Online loss: l_t(on) = Σ_i c_t(i) p_t(i)
• Cumulative loss: L_T(on) = Σ_t l_t(on)

External Regret

• Relative performance measure:
  – compares to the best strategy in A
  – the basic class of strategies
• Online cumulative loss: L_T(on) = Σ_t l_t(on)
• Action i cumulative loss: L_T(i) = Σ_t c_t(i)
• Best action: L_T(best) = min_i { L_T(i) } = min_i { Σ_t c_t(i) }
• External Regret = L_T(on) − L_T(best)
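As a concrete illustration, external regret can be computed directly from these definitions. A minimal Python sketch (array names are illustrative, not from the talk):

```python
import numpy as np

def external_regret(costs, plays):
    """costs: T x N array of c_t(i) in [0,1]; plays: T x N array of distributions p_t."""
    online_loss = np.sum(costs * plays)      # L_T(on) = sum_t sum_i c_t(i) p_t(i)
    action_losses = costs.sum(axis=0)        # L_T(i) = sum_t c_t(i)
    best = action_losses.min()               # L_T(best)
    return online_loss - best                # external regret
```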

External Regret Algorithm

• Goal: minimize the regret
• Algorithm:
  – track the regrets
  – weights proportional to the regret
• Formally, at time t:
  – compute the regret to each action: Y_t(i) = L_t(on) − L_t(i), and r_t(i) = max{ Y_t(i), 0 }
  – set p_{t+1}(i) = r_t(i) / Σ_i r_t(i)
  – if all r_t(i) = 0, select p_{t+1} arbitrarily
• Notation: R_t = < r_t(1), …, r_t(N) > and ΔR_t = Y_t − Y_{t−1}
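A minimal sketch of this weight-update rule (regret matching) under the full-information model above; variable names are illustrative:

```python
import numpy as np

def regret_matching(costs):
    """Play proportionally to positive regrets. costs: T x N array of c_t(i)."""
    T, N = costs.shape
    online_loss = 0.0                  # L_t(on)
    action_loss = np.zeros(N)          # L_t(i) for each action i
    p = np.full(N, 1.0 / N)            # arbitrary initial distribution
    for t in range(T):
        online_loss += costs[t] @ p    # add l_t(on)
        action_loss += costs[t]
        r = np.maximum(online_loss - action_loss, 0.0)   # r_t(i) = max{Y_t(i), 0}
        p = r / r.sum() if r.sum() > 0 else np.full(N, 1.0 / N)
    return online_loss, action_loss
```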

External Regret Algorithm: Analysis

R_t = < r_t(1), …, r_t(N) > and ΔR_t = Y_t − Y_{t−1}
• LEMMA: ΔR_t · R_{t−1} = 0
  – Σ_i (c_t(i) − l_t(on)) r_{t−1}(i) = Σ_i c_t(i) r_{t−1}(i) − Σ_i l_t(on) r_{t−1}(i)
  – Σ_i l_t(on) r_{t−1}(i) = [ Σ_i c_t(i) p_t(i) ] Σ_i r_{t−1}(i) = Σ_i c_t(i) r_{t−1}(i),
    since p_t(i) = r_{t−1}(i) / Σ_j r_{t−1}(j)
• LEMMA: max_i R_T(i) ≤ ||R_T||, and ||R_T||² ≤ Σ_t ||ΔR_t||² ≤ 2NT,
  hence max_i R_T(i) ≤ √(2NT)

External regret: Bounds

• Average regret goes to zero
  – "no regret" – Hannan [1957]
• Explicit bounds
  – Littlestone & Warmuth '94
  – CFHHSW '97
  – External regret = O(log N + √(T log N))
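Bounds of this flavor are typically achieved by exponential-weight schemes (weighted majority / Hedge) rather than the regret-matching rule above. A minimal Python sketch, with the learning rate eta as an assumed tuning choice:

```python
import numpy as np

def hedge(costs, eta=None):
    """Exponential weights over N actions; costs: T x N array in [0,1]."""
    T, N = costs.shape
    if eta is None:
        eta = np.sqrt(np.log(N) / T)      # typical choice giving O(sqrt(T log N)) regret
    w = np.ones(N)
    total = 0.0
    for t in range(T):
        p = w / w.sum()                   # p_t proportional to the weights
        total += p @ costs[t]             # accumulate l_t(on)
        w *= np.exp(-eta * costs[t])      # multiplicative update
    return total
```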

Dominated Actions

• Model: action y dominates action x if y is always better than x
• Goal: do not play dominated actions
• Goal (unknown model): the fraction of times we play dominated actions is vanishing
• Example (per-step costs; here action x is always cheaper, so x dominates y):

  Cost of action y:  .3  .8  .9  .6  .3
  Cost of action x:  .2  .4  .7  .3  .1

Internal/Swap Regret

• Internal Regret
  – Regret(x,y) = Σ_{t: a(t)=x} ( c_t(x) − c_t(y) )
  – Internal Regret = max_{x,y} Regret(x,y)
• Swap Regret
  – Swap Regret = Σ_x max_y Regret(x,y)
• Swap Regret ≥ External Regret
  – Σ_x max_y Regret(x,y) ≥ max_y Σ_x Regret(x,y)
• Mixed actions
  – Regret(x,y) = Σ_t ( c_t(x) − c_t(y) ) p_t(x)
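With the mixed-action definition above, internal and swap regret can be computed directly from the play history. A small illustrative sketch:

```python
import numpy as np

def swap_and_internal_regret(costs, plays):
    """costs, plays: T x N arrays; Regret(x,y) = sum_t (c_t(x) - c_t(y)) p_t(x)."""
    M = plays.T @ costs                 # M[x, y] = sum_t p_t(x) * c_t(y)
    own = np.diag(M)                    # sum_t p_t(x) * c_t(x)
    regret_xy = own[:, None] - M        # Regret(x, y)
    swap = regret_xy.max(axis=1).sum()  # sum_x max_y Regret(x, y)
    internal = regret_xy.max()          # max_{x,y} Regret(x, y)
    return swap, internal
```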

Dominated Actions and Regret

• Assume action y dominates action x
  – for every t: c_t(x) > c_t(y) + δ
• Assume we used action x for n time steps
  – Regret(x,y) > δn
• If Swap Regret < R then
  – a dominated action is used at most R/δ times
  – so the fraction of time a dominated action is used is at most R/(δT)

Calibration

• Model: each step, predict a probability and then observe the outcome
• Goal: predictions calibrated with the outcomes
  – during the time steps where the prediction is p, the average outcome is (approximately) p
• Example (predicting the probability of rain): predictions .3, .5, .3, .5, .3; calibration compares the average outcome on the ".3 days" with 1/3 and on the ".5 days" with 1/2

Calibration to Regret

• Reduction to swap/internal regret:
  – discretize the probabilities, say to {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}
  – loss of action x at time t: (x − c_t)²
  – y*(x) = argmax_y Regret(x,y); here y*(x) = avg(c_t | prediction x)
  – consider Regret(x, y*(x))
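A small sketch of the calibration check with discretized forecasts, as in the reduction above (function and variable names are assumptions):

```python
import numpy as np

def calibration_error(predictions, outcomes, grid=None):
    """For each discrete prediction p, compare p with the average outcome on the steps
    where p was predicted. predictions, outcomes: 1-D arrays of equal length."""
    if grid is None:
        grid = np.round(np.arange(0.0, 1.01, 0.1), 1)
    errs = {}
    for p in grid:
        mask = np.isclose(predictions, p)
        if mask.any():
            errs[p] = abs(outcomes[mask].mean() - p)   # |avg(c_t | prediction = p) - p|
    return errs
```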

Internal regret

• No internal regret
  – [Foster & Vohra], [Hart & Mas-Colell]
  – based on the approachability theorem [Blackwell '56]
• Explicit bounds
  – [Cesa-Bianchi & Lugosi '03]: internal regret = O(log N + √(T log N))
  – [Blum & Mansour]: swap regret = O(log N + √(TN))

Regret: External vs Internal

• External regret
  – "You should have bought S&P 500"
  – match boy i to girl i
• Internal regret
  – "Each time you bought IBM you should have bought SUN"
  – stable matching
• Limitations:
  – no state
  – additive over time

[Even-Dar, Mansour, Nadav, 2009]

Routing Games

• Atomic:
  – finite number of players
  – player i transfers flow from s_i to t_i
• Splittable flows
• Cost_i = Σ_{p ∈ paths(s_i, t_i)} Latency(p) · flow_i(p)
  – the latency on edge e depends on the total flow on e, e.g. L_e(f_{1,L} + f_{2,T})

Cournot Oligopoly

• [Cournot 1838]
• Firms select production levels; the market price depends on the TOTAL supply
• Firms maximize their profit = revenue − cost
  – (Figure: market price P as a function of the overall quantity, with firm costs Cost_1(X) and Cost_2(Y))
• Best response dynamics converges for 2 players [Cournot 1838]
  – a two-player oligopoly is a super-modular game [Milgrom, Roberts 1990]
• Best response dynamics diverges for n ≥ 5 players [Theocharis 1960]

Resource Allocation Games

• Advertisers set budgets, e.g. $5M, $10M, $17M, $25M
• Each advertiser wins a proportional market share
  – e.g. the $25M advertiser's allocated rate = 25 / (5 + 10 + 17 + 25)
• Utility:
  – concave utility from the allocated rate
  – quasi-linear with money: U = f(allocated rate) − $25M
• The best response dynamics generally diverges for linear resource allocation games

Properties of Selfish Routing, Cournot Oligopoly and Resource Allocation Games

• There exist λ_1, …, λ_n > 0 such that the (weighted) social welfare λ_1 u_1(x) + λ_2 u_2(x) + … + λ_n u_n(x) is concave
• The utility of a player is convex in the vector of actions of the other players
• Games with these two properties are called Socially Concave Games

The relation between socially concave games and concave games

• Concave games [Rosen '65]: the utility of a player is strictly concave in her own strategy
  – a sufficient condition for equilibrium uniqueness
• (Figure: within normal form games with mixed strategies, zero-sum games, atomic splittable routing, Cournot, and resource allocation games are socially concave; concave games have a unique Nash equilibrium, and the two classes overlap.)

The average action and average utility converge to NE

• If each player uses a no-regret procedure in a socially concave game, then their joint play converges to Nash equilibrium:
  – Theorem 1: the average action profile (over days 1, …, T) converges to an ε(T)-Nash equilibrium
  – Theorem 2: the average daily payoff of each player converges to her payoff in the NE
• Convergence of the "average action" and of the "average payoff" are two different things!

The Action Profile Itself Need Not Converge

• Example (routing from s to t): the players use one path on even days and the other path on odd days
  – here the average action converges to (½, ½) for every player
  – but the average cost is 2, while the average cost in the NE is 1

Correlated Equilibrium

• CE: a joint distribution Q
• Each time step t, a joint action is drawn from Q
  – each player's action is a best response
• Theorem [HM, FV]: multiple players, each playing with low internal (swap) regret, converge to a CE

[Even-Dar, Kleinberg, Mannor, Mansour, 2009]

Job Scheduling: Motivating Example

GOAL: minimize the load on the servers
(Figure: users send requests through a load balancer to a set of servers.)

Online Algorithms

• Job scheduling
  – N unrelated machines (machine = action)
  – each time step a job arrives
    • it has different loads on different machines
  – the algorithm schedules the job on some machine, given its loads
  – goal: minimize the loads (makespan or L_2)
• Regret minimization
  – N actions (machines)
  – each time step:
    • first, the algorithm selects an action (machine)
    • then, it observes the losses (job loads)
  – goal: minimize the sum of losses

Modeling Differences: Information

• Information model: what does the algorithm know when it selects an action/machine?
• Known cost: first observe the costs, then select an action
  – job scheduling
• Unknown cost: first select an action, then observe the costs
  – regret minimization

Modeling Differences: Performance

• Theoretical performance measure:
  – comparison class
    • job scheduling: best (offline) assignment
    • regret minimization: best static algorithm
  – guarantees:
    • job scheduling: multiplicative
    • regret minimization: additive and vanishing
• Objective function:
  – job scheduling: global (makespan)
  – regret minimization: additive

Formal Model

• N actions
• Each time step t, algorithm ON
  – selects a (fractional) action: p_t(i)
  – observes losses c_t(i) in [0,1]
• Average loss of ON on action i at time T: ON_T(i) = (1/T) Σ_t p_t(i) c_t(i)
• Global cost function:
  – C_∞(ON_T(1), …, ON_T(N)) = max_i ON_T(i)
  – C_d(ON_T(1), …, ON_T(N)) = [ Σ_i (ON_T(i))^d ]^{1/d}
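A minimal sketch of these quantities (the average loads and the two global cost functions); names are illustrative:

```python
import numpy as np

def average_loads(costs, plays):
    """ON_T(i) = (1/T) * sum_t p_t(i) c_t(i), for T x N arrays costs and plays."""
    return (costs * plays).mean(axis=0)

def makespan(avg_loads):                          # C_inf
    return np.max(avg_loads)

def ld_norm(avg_loads, d=2):                      # C_d
    return np.sum(np.asarray(avg_loads) ** d) ** (1.0 / d)
```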

Formal Model

• Static optimum:
  – consider any fixed distribution α, played at every time step
  – the static optimum α* minimizes the cost C
• Formally:
  – let α ◊ L = ( α(1) L(1), …, α(N) L(N) ) denote the Hadamard (or Schur) product, where L_T(i) = (1/T) Σ_t c_t(i)
  – best fixed distribution: α*(L) = argmin_α C(α ◊ L)
  – static optimality: C*(L) = C(α*(L) ◊ L)

Example

• Two machines, makespan objective; observed loads L_1 = 4, L_2 = 2
• The static optimum equalizes the two final loads:
  – α*(L) = ( L_2 / (L_1 + L_2), L_1 / (L_1 + L_2) ) = (1/3, 2/3)
  – final loads: ( α*(1) L_1, α*(2) L_2 ) = (4/3, 4/3)
  – makespan = L_1 L_2 / (L_1 + L_2) = 4/3
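A quick sanity check of the closed form used in this two-machine example (illustrative only):

```python
def two_machine_static_opt(L1, L2):
    """alpha* = (L2/(L1+L2), L1/(L1+L2)); both final loads equal L1*L2/(L1+L2)."""
    alpha = (L2 / (L1 + L2), L1 / (L1 + L2))
    return alpha, L1 * L2 / (L1 + L2)

print(two_machine_static_opt(4.0, 2.0))   # ((1/3, 2/3), 4/3)
```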

Our Results: Adversarial General

• General feasibility result:
  – assume C is convex and C* is concave
    • includes the makespan and the L_d norm for d > 1
  – there exists an online algorithm ON such that, for any loss sequence L:
    C(ON) < C*(L) + o(1)
  – rate of convergence about √(N/T)

Our Results: Adversarial Makespan

• Makespan algorithm:
  – there exists an algorithm ON such that, for any loss sequence L:
    C(ON) < C*(L) + O(log² N / √T)
• Benefits:
  – very simple and intuitive
  – improved regret bound
  – for two actions, the probability changes by at most |Δp_t(1)| ≤ 1/(2√T) per step

Our Results: Adversarial Lower Bound

• We show that for many non-convex C there is a non-vanishing regret
  – includes the L_d norm for d < 1
• Non-vanishing regret ⇒ ratio > 1: there is a sequence of losses L such that
  C(ON) > (1 + γ) C*(L), where γ > 0

Preliminary: Local vs. Global

(Figure: time is partitioned into blocks B_1, B_2, …, B_k; low regret in each block yields overall low regret.)

Preliminary: Local vs. Global

• LEMMA:
  – assume C is convex and C* is concave,
  – assume a partition of time into blocks B_i,
  – assume that in each time block B_i the regret is at most R_i.
  Then: C(ON) − C*(L) ≤ Σ_i R_i

Preliminary: Local vs. Global

• Proof:
  – C(ON) ≤ Σ_i C(ON(B_i))                 [C is convex]
  – Σ_i C*(L(B_i)) ≤ C*(L)                 [C* is concave]
  – C(ON(B_i)) − C*(L(B_i)) ≤ R_i          [low regret in each B_i]
  – summing: Σ_i [ C(ON(B_i)) − C*(L(B_i)) ] ≤ Σ_i R_i
  – hence C(ON) − C*(L) ≤ Σ_i R_i.  QED
• It is therefore enough to bound the regret on subsets.

Example

• Two machines, two time steps; arrival losses (M_1, M_2): (2, 1) at t = 1 and (1, 2) at t = 2
  – static optimum: α* = (½, ½), cost = 3/2
  – local (per-step) optimum: (1/3, 2/3) then (2/3, 1/3), cost = 4/3
  – global offline optimum: (0, 1) then (1, 0), cost = 1

Stochastic case:

• Each time t the costs are drawn from a joint distribution
  – i.i.d. over time steps, not necessarily independent between actions
• INTUITION: assume two actions (machines)
• Load distribution:
  – with probability ½: (1, 0)
  – with probability ½: (0, 1)
• Which policy minimizes the makespan regret?!
• Regret components:
  – MAX(L(1), L(2)) = sum/2 + |Δ|/2
  – where sum = L(1) + L(2) and Δ = L(1) − L(2)

Stochastic case: Static OPT

• Natural choice (model based): always select (½, ½)
• Observations:
  – assume (1, 0) occurs T/2 + Δ times and (0, 1) occurs T/2 − Δ times
  – loads: (T/4 + Δ/2, T/4 − Δ/2)
  – makespan = T/4 + Δ/2 > T/4
  – static OPT: T/4 − Δ²/T < T/4
    • w.h.p. OPT is T/4 − O(1)
  – sum = T/2 and E[|Δ|] = O(√T)
  – Regret = O(√T)

Can we do better?!

Stochastic case: Least Loaded

• Least loaded machine: select the machine with the lower current load
• Observation:
  – the machines have (nearly) the same load: |Δ| ≤ 1
  – sum of loads: E[sum] = T/2
  – expected makespan = T/4
• Regret:
  – least-loaded makespan: LLM = T/4 ± √T
  – Regret = MAX{ LLM − T/4, 0 } = O(√T)
    • the regret counts only the "bad" deviation
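A simulation sketch comparing the static (½, ½) policy with the least-loaded policy on the (1,0)/(0,1) distribution; the parameters and names are assumptions:

```python
import numpy as np

def simulate(T=10000, policy="least_loaded", seed=0):
    rng = np.random.default_rng(seed)
    loads = np.zeros(2)
    for _ in range(T):
        job = np.array([1.0, 0.0]) if rng.random() < 0.5 else np.array([0.0, 1.0])
        if policy == "static":
            p = np.array([0.5, 0.5])                  # fixed fractional allocation
        else:                                         # least loaded: pick the lighter machine
            p = np.array([1.0, 0.0]) if loads[0] <= loads[1] else np.array([0.0, 1.0])
        loads += p * job
    return loads.max()                                # makespan; compare with T/4

print(simulate(policy="static"), simulate(policy="least_loaded"))
```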

Can we do better?!

Stochastic case: Optimized Finish

• Algorithm:
  – select action (½, ½) for the first T − 4√T steps
  – play least loaded afterwards
• Claim: Regret = O(T^{1/4})
  – until step T − 4√T, w.h.p. Δ < 2√T
  – there exists a time t in [T − 4√T, T] with Δ = 0 and sum = T/2 + O(T^{1/4})
  – from 1 to t: regret = O(T^{1/4})
  – from t to T: regret = O(√(T − t)) = O(T^{1/4})
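The "optimized finish" schedule above can be sketched as a small variant of the same simulation (the switch point T − 4√T follows the slide; everything else is illustrative):

```python
import numpy as np

def optimized_finish(T=10000, seed=0):
    rng = np.random.default_rng(seed)
    loads = np.zeros(2)
    switch = T - int(4 * np.sqrt(T))          # play (1/2, 1/2) until T - 4*sqrt(T)
    for t in range(T):
        job = np.array([1.0, 0.0]) if rng.random() < 0.5 else np.array([0.0, 1.0])
        if t < switch:
            p = np.array([0.5, 0.5])
        else:                                  # least loaded in the final stretch
            p = np.array([1.0, 0.0]) if loads[0] <= loads[1] else np.array([0.0, 1.0])
        loads += p * job
    return loads.max() - T / 4                 # compare with O(T**0.25)
```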

Can we do better?!

Stochastic case: Any time

• An algorithm with low regret at any time t
  – does not plan for a final horizon T
• Variant of least loaded:
  – put weight ½ + T^{−1/4} on the least loaded machine
• Claim: Regret = O(T^{1/4})
  – idea:
    • Regret = max{ (L_1 + L_2)/2 − T/4, 0 } + Δ
    • every O(T^{1/2}) steps, Δ = 0
    • the play stays very near (½, ½)

Can we do better?!

Stochastic case: Logarithmic Regret

• Algorithm:
  – use phases whose lengths shrink exponentially
    • T_1 = T/2 and T_k = T_{k−1}/2, so there are log T phases
  – every phase cancels the deviations of the previous phase
    • deviation from the expectation
• Works for any probabilities and actions!
  – assuming the probabilities are known

Can we do better?!

Stochastic case

• Assume that each action's cost is drawn from a joint distribution
  – i.i.d. over time steps
• Theorem (makespan / L_d):
  – known distributions: Regret = O(log T / T)
  – unknown distributions: Regret = O(log² T / T)

Summary

• Regret minimization
  – external
  – internal
  – dynamics
• Job scheduling and regret minimization
  – different global function
  – open problems:
    • exact characterization
    • lower bounds

Makespan Algorithm

• Outline:
  – a simple algorithm for two machines
    • regret O(1/√T)
    • simple and almost memory-less
  – a recursive construction:
    • given three algorithms, two for k/2 actions each and one for 2 actions, build an algorithm for k actions
    • main issue: what kind of feedback to "propagate"
    • regret O(log² N / √T), better than the general result

Makespan: Two Machines

• Intuition: keep the online loads balanced
• Failed attempts:
  – use standard regret minimization
    • on an unbalanced input sequence L, the algorithm will put most of the load on a single machine
  – use the optimum to drive the probabilities
• Our approach: use the online loads
  – not the optimum or static cumulative loads

Makespan Algorithm: Two actions

• At time t maintain probabilities p_{t,1} and p_{t,2} = 1 − p_{t,1}
• Initially p_{1,1} = p_{1,2} = ½
• At time t:
  p_{t+1,1} = p_{t,1} + ( p_{t,2} l_{t,2} − p_{t,1} l_{t,1} ) / (2√T) = ½ + ( ON_{t,2} − ON_{t,1} ) / (2√T)
  where ON_{t,i} is the online load on machine i after step t
• Remarks:
  – uses the online loads
  – almost memory-less
  – the step size is bounded: |Δp_{t,1}| ≤ 1/(2√T)
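A sketch of this two-action update in code; the formula above is the assumed reading of the slide, and the clipping to [0,1] is an added safeguard:

```python
import numpy as np

def makespan_two_actions(losses):
    """losses: T x 2 array of l_{t,i} in [0,1]. Keeps the online loads balanced."""
    T = len(losses)
    on = np.zeros(2)                     # online loads ON_{t,i}
    p = np.array([0.5, 0.5])
    for t in range(T):
        on += p * losses[t]
        # p_{t+1,1} = 1/2 + (ON_{t,2} - ON_{t,1}) / (2 sqrt(T)); step of at most 1/(2 sqrt(T))
        p1 = np.clip(0.5 + (on[1] - on[0]) / (2 * np.sqrt(T)), 0.0, 1.0)
        p = np.array([p1, 1.0 - p1])
    return on
```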

Makespan Algorithm: Analysis

• View the change in the probabilities as a walk on the line [0, 1], starting at ½.

Makespan Algorithm: Analysis

• Consider a small interval of length ε
• Total change in loads:
  – identical on both machines
    • the walk started and ended with the same Δ
• Consider only the losses in the interval
  – local analysis
• The local optimum is also in the interval
• The online algorithm used a "similar" probability
  – loss of at most ε per step

Makespan Algorithm: Analysis

• Simplifying assumptions:
  – the walk is "balanced" in every interval
    • add "virtual" losses to return to the initial state
    • only O(√T) additional losses
    • relates the learning rate to the regret
  – losses "cross" the interval's boundary line
    • needs more sophisticated "bookkeeping"
    • make sure an update affects at most two adjacent intervals
  – regret accounting:
    • loss in an interval
    • additional "virtual" losses

Makespan: Recursive algorithm

• Recursive construction (figure): algorithm A3 sits on top of algorithms A1 and A2.

Makespan: Recursive

• The algorithms:
  – Algorithms A1 and A2:
    • each has "half" of the actions
    • each gets the actual losses of its own actions and "balances" them
    • each works in isolation, simulating and not considering the actual loads
  – Algorithm A3:
    • gets the average load in A1 and in A2
    • balances the "average" loads

Makespan: Recursive

• The combined output: the probability of an action is the product of A3's probability for its half and the probability that A1 (or A2) assigns to the action within that half.
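One plausible reading of the combination step as code (which algorithm carries which symbol is an assumption; the semantics, a product of the half-selection and within-half probabilities, is what the slide describes):

```python
import numpy as np

def combine(q, p1, p2):
    """q: A3's distribution over the two halves; p1, p2: distributions of A1 and A2
    within their halves. Returns a distribution over all k actions (sums to 1)."""
    return np.concatenate([q[0] * np.asarray(p1), q[1] * np.asarray(p2)])
```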

Makespan: Recursive

• Analysis (intuition):
  – assume perfect ZERO regret (just for intuition)
  – the output of A1 and A2 is completely balanced
    • the average equals the individual loads: maximum = average = minimum
  – the output of A3 is balanced
    • the contribution of A1's machines equals that of A2's
• Real analysis: need to bound the amplification of the regret.