Simple Stochastic Games Mean Payoff Games Parity Games Uri Zwick Tel Aviv University Zero sum games –3 –5 –2 Mixed strategies Max-min theorem …



Simple Stochastic Games Mean Payoff Games Parity Games

Uri Zwick Tel Aviv University

Zero sum games

[Figure: payoff matrix of a zero-sum game; extracted entries: 1, 0, 1, 2, –5, 7, –3, 2, –2]

● Mixed strategies
● Max-min theorem

Stochastic games

[Shapley (1953)]

[Figure: payoff matrices of a stochastic game; extracted entries: 1, 0, 1, 2, –5, 7, –3, 2, –2, 3, 2, –7, –4, –3, –1, 4, –1, 7]

Mixed positional (memoryless) optimal strategies

Simple Stochastic games (SSGs)

[Figure: payoff matrices, each a single row or column; extracted entries: 2, –5, 7, 2, –4, –1, 4, –1, 7]

● Every game has only one row or column
● Pure positional (memoryless) optimal strategies

Simple Stochastic games (SSGs): Graphic representation

Vertex types: M = MAX, m = min, R = RAND

The players construct an (infinite) path, with steps indexed 0, 1, …

● Terminating version
● Non-terminating version
● Discounted version

Fixed-duration games are easily solved using dynamic programming.
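The dynamic-programming remark can be illustrated with a minimal backward-induction sketch. The encoding is an assumption made for this example, not something taken from the slides: each vertex has a kind ("MAX", "min" or "RAND") and a list of outgoing edges carrying payoffs, RAND picks an edge uniformly, and the payoff of a play is the sum collected over a fixed number of moves.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: vertex -> (kind, [(target, edge payoff), ...]).
Game = Dict[str, Tuple[str, List[Tuple[str, float]]]]

def fixed_duration_values(game: Game, steps: int) -> Dict[str, float]:
    """Value of playing exactly `steps` moves from each vertex (backward induction)."""
    values = {v: 0.0 for v in game}  # with zero moves left nothing is collected
    for _ in range(steps):
        new_values = {}
        for v, (kind, edges) in game.items():
            outcomes = [payoff + values[target] for target, payoff in edges]
            if kind == "MAX":
                new_values[v] = max(outcomes)
            elif kind == "min":
                new_values[v] = min(outcomes)
            else:  # RAND: uniform choice among the outgoing edges
                new_values[v] = sum(outcomes) / len(outcomes)
        values = new_values
    return values

# Small illustrative game (made up for this sketch).
game: Game = {
    "a": ("MAX", [("b", 2.0), ("c", -1.0)]),
    "b": ("min", [("a", 3.0), ("c", 0.0)]),
    "c": ("RAND", [("a", 4.0), ("b", -2.0)]),
}
print(fixed_duration_values(game, 2))
```

Each round of the loop costs time linear in the number of edges, so a game of duration k is solved in k such rounds.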

Simple Stochastic games (SSGs): Graphic representation – example

[Figure: example game graph with a marked start vertex; legend: M = MAX, m = min, R = RAND]

Simple Stochastic games (SSGs): Reachability version [Condon (1992)]

[Figure: game graph with a 0-sink and a 1-sink; legend: M = MAX, m = min, R = RAND]

● No weights
● All probabilities are ½

Objective: MAX maximizes, and min minimizes, the probability of reaching the 1-sink.

Technical assumption: the game halts with probability 1.

Simple Stochastic games (SSGs): Basic properties

● Every vertex in the game has a value
● Both players have positional optimal strategies
● Positional strategy for MAX: a choice of an outgoing edge from each MAX vertex
● Decision version: is the value at least (at most) a given threshold?

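“Solving” binary SSGs

The values $v_i$ of the vertices of a game are the unique solution of the following equations, where $j$ and $k$ denote the two successors of a non-sink vertex $i$:

$$
v_i = \begin{cases}
\max(v_j, v_k) & \text{if } i \text{ is a MAX vertex} \\
\min(v_j, v_k) & \text{if } i \text{ is a min vertex} \\
\tfrac{1}{2}(v_j + v_k) & \text{if } i \text{ is a RAND vertex} \\
0 & \text{if } i \text{ is the 0-sink} \\
1 & \text{if } i \text{ is the 1-sink}
\end{cases}
$$

The values are rational numbers requiring only a linear number of bits.

Corollary: the decision version is in NP ∩ co-NP.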
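As a rough illustration of the value equations of a binary SSG (max at MAX vertices, min at min vertices, average at RAND vertices, 0 and 1 at the sinks), the sketch below approximates their solution by repeated substitution. This is plain value iteration, chosen for brevity; it converges only in the limit and is not claimed to be the method of the slides. The dictionaries `kind` and `succ` and the example game are assumptions made for this sketch.

```python
from typing import Dict, Tuple

def ssg_values(kind: Dict[str, str], succ: Dict[str, Tuple[str, str]],
               iterations: int = 200) -> Dict[str, float]:
    """Approximate the values of a halting binary SSG by value iteration."""
    # Start from 0 everywhere except the 1-sink.
    values = {v: 1.0 if k == "1-sink" else 0.0 for v, k in kind.items()}
    for _ in range(iterations):
        new_values = {}
        for v, k in kind.items():
            if k in ("0-sink", "1-sink"):
                new_values[v] = values[v]
                continue
            a, b = (values[w] for w in succ[v])  # the two successors
            if k == "MAX":
                new_values[v] = max(a, b)
            elif k == "min":
                new_values[v] = min(a, b)
            else:  # RAND: each successor with probability 1/2
                new_values[v] = (a + b) / 2
        values = new_values
    return values

# Hypothetical example game; every cycle passes a RAND vertex with a sink child,
# so the game halts with probability 1.
kind = {"zero": "0-sink", "one": "1-sink",
        "r1": "RAND", "n": "min", "r2": "RAND", "x": "MAX"}
succ = {"r1": ("one", "n"), "n": ("r2", "x"),
        "r2": ("zero", "r1"), "x": ("r1", "zero")}
values = ssg_values(kind, succ)
print(values)  # close to r1 = 2/3, n = 1/3, r2 = 1/3, x = 2/3
```

Solving the four equations by hand gives $v_{r1} = \tfrac12(1 + \tfrac12 v_{r1})$, hence $v_{r1} = 2/3$, which the iteration approaches from below.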

Markov Decision Processes (MDPs)

[Figure legend: M = MAX, m = min, R = RAND]

Theorem [Derman (1970)]: Values and optimal strategies of an MDP can be found by solving an LP.
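One common way to write such an LP is sketched below for a binary MDP that has only MAX and RAND vertices and halts with probability 1. It is given as an illustration, not as Derman's own formulation; the symbols $v_i$, $j$ and $k$ (the two successors of vertex $i$) are notation assumed here.

```latex
\begin{aligned}
\text{minimize}\quad & \sum_i v_i \\
\text{subject to}\quad
& v_i \ge v_j,\; v_i \ge v_k && \text{for every MAX vertex } i, \\
& v_i = \tfrac12 (v_j + v_k) && \text{for every RAND vertex } i, \\
& v_{\text{0-sink}} = 0,\quad v_{\text{1-sink}} = 1.
\end{aligned}
```

Under the halting assumption every feasible vector dominates the value vector, so the minimizer is the value vector itself, and an optimal strategy picks at each MAX vertex a successor whose constraint is tight. For an MDP with only min vertices the symmetric program maximizes the sum with the inequalities reversed.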

NP ∩ co-NP – Another proof

Deciding whether the value of a game is at least (at most) a given threshold is in NP ∩ co-NP.

To show that the value is at least the threshold, guess an optimal strategy $\sigma$ for MAX, then find an optimal counter-strategy $\tau$ for min by solving the resulting MDP.

Is the problem in P?

Mean Payoff Games (MPGs) [Ehrenfeucht, Mycielski (1979)]

[Figure: diagram relating the non-terminating version, the discounted version, MPGs and reachability SSGs; two labels read (PZ’96); legend: M = MAX, m = min, R = RAND]

Pseudo-polynomial algorithm

Mean Payoff Games (MPGs) [Ehrenfeucht, Mycielski (1979)]

Value: the average payoff of the cycle
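To make "average of the cycle" concrete: once both players fix positional strategies, every vertex has exactly one chosen outgoing edge, so the play from any start vertex runs into a cycle and repeats it forever. The sketch below follows the play and averages the payoffs on that cycle. The `choice` map (vertex to chosen successor and edge payoff) is a hypothetical encoding used only for this example.

```python
from fractions import Fraction
from typing import Dict, Tuple

def cycle_mean(choice: Dict[str, Tuple[str, int]], start: str) -> Fraction:
    """Mean payoff of the cycle reached from `start` when every vertex
    follows its single chosen edge choice[v] = (next vertex, payoff)."""
    first_visit = {}  # vertex -> position in the play at which it was first seen
    payoffs = []      # payoff collected at each step of the play
    v = start
    while v not in first_visit:
        first_visit[v] = len(payoffs)
        v_next, w = choice[v]
        payoffs.append(w)
        v = v_next
    cycle = payoffs[first_visit[v]:]  # drop the finite prefix before the cycle
    return Fraction(sum(cycle), len(cycle))

# Made-up example: a -> b -> c -> b -> c -> ...
choice = {"a": ("b", 5), "b": ("c", 2), "c": ("b", -4)}
print(cycle_mean(choice, "a"))  # the prefix payoff 5 does not affect the average
```

The finite prefix leading into the cycle contributes nothing to the long-run average, which is why only the cycle is averaged.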

Parity Games (PGs)

[Figure: example vertices labeled with priorities 3 and 8; players EVEN and ODD]

EVEN wins if the largest priority seen infinitely often is even.

Equivalent to many interesting problems in automata and verification:
● Non-emptiness of $\omega$-tree automata
● Modal $\mu$-calculus model checking

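Parity Games (PGs) → Mean Payoff Games (MPGs)

[Stirling (1993)] [Puri (1995)]

[Figure: example vertices labeled with priorities 3 and 8; players EVEN and ODD]

● Change priority $k$ to payoff $(-n)^k$, where $n$ is the number of vertices
● Move payoffs to outgoing edges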
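The two steps of the reduction can be sketched as follows, assuming the usual choice of payoff $(-n)^k$ for a vertex of priority $k$, with $n$ the number of vertices. The function name and the dictionary encoding are hypothetical. The point of the exponential scaling is that on any simple cycle the largest priority dominates the sum, so the sign of the cycle's mean payoff matches the parity of its largest priority.

```python
from typing import Dict, List, Tuple

def parity_to_mean_payoff(priority: Dict[str, int],
                          succ: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    """Turn vertex priorities into edge payoffs: every edge leaving a vertex
    of priority k gets payoff (-n)**k, where n is the number of vertices."""
    n = len(priority)
    return {(u, w): (-n) ** priority[u]
            for u, targets in succ.items() for w in targets}

# Made-up example: a single cycle a -> b -> c -> a.
priority = {"a": 3, "b": 8, "c": 1}
succ = {"a": ["b"], "b": ["c"], "c": ["a"]}
weights = parity_to_mean_payoff(priority, succ)
print(weights)                # (-3)**3, (-3)**8 and (-3)**1 on the three edges
print(sum(weights.values()))  # positive: the largest priority on the cycle, 8, is even
```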

Simple Stochastic games (SSGs): Additional properties

● An SSG is said to be binary if the outdegree of every non-sink vertex is 2
● A switch is a change of a strategy at a single vertex
● A switch is profitable for MAX if it increases the value of the game (the sum of the values of all vertices)
● A strategy is optimal iff no switch is profitable
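The notion of a profitable switch can be tried out on a small binary SSG. In the sketch below, the value of a MAX strategy is approximated by value iteration with min playing optimally against it, and a switch counts as profitable when it raises the sum of all vertex values. The helper names, the encoding and the tolerance `eps` are assumptions made for this illustration; value iteration stands in for an exact solver.

```python
from typing import Dict, List, Tuple

def values_under(kind: Dict[str, str], succ: Dict[str, Tuple[str, str]],
                 sigma: Dict[str, str], iterations: int = 500) -> Dict[str, float]:
    """Approximate vertex values when MAX follows sigma and min plays optimally."""
    values = {v: 1.0 if k == "1-sink" else 0.0 for v, k in kind.items()}
    for _ in range(iterations):
        new_values = {}
        for v, k in kind.items():
            if k in ("0-sink", "1-sink"):
                new_values[v] = values[v]
            elif k == "MAX":
                new_values[v] = values[sigma[v]]  # MAX is bound to its chosen edge
            elif k == "min":
                new_values[v] = min(values[w] for w in succ[v])
            else:  # RAND
                new_values[v] = sum(values[w] for w in succ[v]) / 2
        values = new_values
    return values

def profitable_switches(kind: Dict[str, str], succ: Dict[str, Tuple[str, str]],
                        sigma: Dict[str, str], eps: float = 1e-9) -> List[str]:
    """MAX vertices at which switching sigma increases the sum of all values."""
    base = sum(values_under(kind, succ, sigma).values())
    result = []
    for v in sigma:
        first, second = succ[v]
        switched = dict(sigma)
        switched[v] = first if sigma[v] == second else second
        if sum(values_under(kind, succ, switched).values()) > base + eps:
            result.append(v)
    return result

# Hypothetical halting binary SSG with a single MAX vertex "x".
kind = {"zero": "0-sink", "one": "1-sink",
        "r1": "RAND", "n": "min", "r2": "RAND", "x": "MAX"}
succ = {"r1": ("one", "n"), "n": ("r2", "x"),
        "r2": ("zero", "r1"), "x": ("r1", "zero")}
print(profitable_switches(kind, succ, {"x": "zero"}))  # switching at x helps MAX
print(profitable_switches(kind, succ, {"x": "r1"}))    # no profitable switch left
```

In this example the strategy sending x to the 0-sink gives a value sum of 1.75, while sending x to r1 gives 3, so the second strategy admits no profitable switch and is optimal.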

A randomized subexponential algorithm for binary SSGs [Ludwig (1995)] [Kalai (1992), Matousek-Sharir-Welzl (1992)]

● Start with an arbitrary strategy $\sigma$ for MAX
● Choose a random MAX vertex $i$
● Find the optimal strategy $\sigma'$ for MAX in the game in which the only outgoing edge from $i$ is $(i, \sigma(i))$
● If switching $\sigma'$ at $i$ is not profitable, then $\sigma'$ is optimal
● Otherwise, let $\sigma$ be $\sigma'$ with the switch at $i$ applied, and repeat

A randomized subexponential algorithm for binary SSGs [Ludwig (1995)] [Kalai (1992), Matousek-Sharir-Welzl (1992)]

[Figure: the MAX vertices arranged in a row; labels: “All correct!” and “Would never be switched!”]

There is a hidden order of the MAX vertices under which the optimal strategy returned by the first recursive call correctly fixes the strategy of MAX at vertices $1, 2, \ldots, i$.

Exponential algorithm for PGs [McNaughton (1993)] [Zielonka (1998)]

[Figure: partition of the game graph; labels: “Vertices of highest priority (even)”, “Vertices from which EVEN can force the game to enter A”, “First recursive call”, “Second recursive call”]

In the worst case, both recursive calls are on games of size $n - 1$.
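The recursion with its two calls can be sketched in the standard textbook form of Zielonka's algorithm. This is an assumed formulation, since the slide only names the pieces; the encoding (owner 0 for EVEN and 1 for ODD, priorities on vertices, every vertex having at least one successor) is hypothetical.

```python
from typing import Dict, List, Set, Tuple

def attractor(vertices: Set[str], owner: Dict[str, int], succ: Dict[str, List[str]],
              target: Set[str], player: int) -> Set[str]:
    """Vertices of the subgame from which `player` can force the play into `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in vertices - attr:  # iterates over a fresh set, so attr may grow
            nxt = [w for w in succ[v] if w in vertices]
            if owner[v] == player:
                ok = any(w in attr for w in nxt)
            else:
                ok = all(w in attr for w in nxt)
            if ok:
                attr.add(v)
                changed = True
    return attr

def zielonka(vertices: Set[str], owner: Dict[str, int], priority: Dict[str, int],
             succ: Dict[str, List[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (winning set of EVEN, winning set of ODD) for the subgame on `vertices`."""
    if not vertices:
        return set(), set()
    d = max(priority[v] for v in vertices)
    p = d % 2  # the player who benefits from the highest priority
    top = {v for v in vertices if priority[v] == d}
    a = attractor(vertices, owner, succ, top, p)
    win = list(zielonka(vertices - a, owner, priority, succ))  # first recursive call
    if not win[1 - p]:
        # The opponent wins nowhere in the rest, so p wins the whole subgame.
        win[p], win[1 - p] = set(vertices), set()
        return win[0], win[1]
    b = attractor(vertices, owner, succ, win[1 - p], 1 - p)
    win = list(zielonka(vertices - b, owner, priority, succ))  # second recursive call
    win[1 - p] = win[1 - p] | b
    return win[0], win[1]

# Made-up example: EVEN wins from u and x, ODD wins by staying at w forever.
owner = {"u": 0, "w": 1, "x": 0}
priority = {"u": 2, "w": 1, "x": 0}
succ = {"u": ["u"], "w": ["w", "u"], "x": ["w", "u"]}
print(zielonka({"u", "w", "x"}, owner, priority, succ))
```

Both recursive calls run on strictly smaller vertex sets, but each may remove only a single vertex, which is the source of the exponential worst case.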

Deterministic subexponential algorithm for PGs [Jurdzinski, Paterson, Z (2006)]

[Figure label: Second recursive call]

Idea: Look for small dominions!

Dominions of size $s$ can be found in $O(n^s)$ time.

Dominion: a (small) set from which one of the players can win without the play ever leaving this set.

Open problems

● Polynomial algorithms?

● Faster subexponential algorithms for parity games?

● Deterministic subexponential algorithms for MPGs and SSGs?

● Faster pseudo-polynomial algorithms for MPGs?
