Oblivious Equilibrium for Stochastic Games with Concave Utility
Sachin Adlakha, Ramesh Johari, Gabriel Weintraub and Andrea Goldsmith
Wireless environments are reactive
• Scenario: Many devices competing for common wireless
resources (spectrum sharing)
• Typical approach: Assume that the environment is non-reactive.
• Flawed assumption at best: in cognitive radios, the environment
consists of other cognitive devices, which makes it a highly
reactive environment.
Question:
How to design policies for such systems?
Large scale wireless interference games
We want to study optimal dynamic strategies for devices in
the following type of game:
• Each device decides its power allocation over the available
bands
• Interference felt from other devices is treated as pure
noise
Model these wireless systems as stochastic games.
What is the performance loss from assuming a non-reactive
environment?
Foundational theory
The standard solution concept for a stochastic game is
Markov perfect equilibrium (MPE):
Each player's policy maximizes net present value
starting from any state, given the policies of other players.
MPE is computationally difficult to compute and requires
excessive information exchange between devices.
Question: When can we approximate MPE with simple
strategies?
Foundational theory
Oblivious policies: A device reacts to its own state and
the long-run average of other devices' states.
– Oblivious policies are simple to compute and implement
– Each policy computation is a 1-D dynamic program
– No information exchange between devices
– Completely distributed implementation
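As a toy illustration of the 1-D dynamic program behind an oblivious policy: with the other devices summarized by a fixed long-run average, each device solves a single-agent problem over its own scalar state. All dynamics, payoffs, grids, and parameter values below are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Toy 1-D dynamic program for an oblivious best response (sketch only).
A, B = 0.9, 1.0          # linear state dynamics x' = A*x + B*a (noise omitted)
beta = 0.95              # discount factor
f_bar = 0.5              # fixed long-run average of other devices' states
states = np.linspace(0.0, 1.0, 21)    # discretized own state
actions = np.linspace(-0.5, 0.5, 11)  # discretized actions

def payoff(x, a):
    # Hypothetical separable, concave payoff: pi_1(x, f_bar) + pi_2(a)
    return -(x - f_bar) ** 2 - a ** 2

# Value iteration over the 1-D state grid.
V = np.zeros(len(states))
for _ in range(500):
    V_new = np.empty_like(V)
    for i, x in enumerate(states):
        nxt = np.clip(A * x + B * actions, 0.0, 1.0)
        # nearest-grid lookup of the continuation value
        idx = np.abs(states[None, :] - nxt[:, None]).argmin(axis=1)
        V_new[i] = np.max(payoff(x, actions) + beta * V[idx])
    V = V_new

# Extract the greedy oblivious policy from the converged value function.
policy = np.empty(len(states))
for i, x in enumerate(states):
    nxt = np.clip(A * x + B * actions, 0.0, 1.0)
    idx = np.abs(states[None, :] - nxt[:, None]).argmin(axis=1)
    policy[i] = actions[np.argmax(payoff(x, actions) + beta * V[idx])]
```

Because f_bar is a constant rather than a time-varying distribution, the state space of the dynamic program is just the device's own state, which is what makes the computation 1-D.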
Our model
• m players
• State of player i is x_{i,t}; action of player i is a_{i,t}
• State evolution:
  x_{i,t+1} = A x_{i,t} + B a_{i,t} + w_{i,t}
• A Markov policy is a decision rule based on the current state
and the empirical distribution:
  a_{i,t} = μ(x_{i,t}, f_{-i,t})
• Payoff: π(x_{i,t}, a_{i,t}, f_{-i,t}),
where f_{-i,t} = empirical distribution of other players' states
Markov Perfect Equilibrium (MPE)
• A Markov perfect equilibrium is a vector of Markov
policies, where each player has maximized present
discounted payoff, given the policies of other players.
Oblivious Equilibrium (OE)
• In an oblivious policy, a player responds instead to x_{i,t}
and only the long-run average f̄_{-i}(m):
  a_{i,t} = μ(x_{i,t}, f̄_{-i}(m))
• Such a problem setting was developed by Weintraub et al.
• Oblivious equilibrium (OE) is a vector of policies, one per
device, where each device has chosen an optimal
oblivious policy.
• In an oblivious equilibrium, each player has maximized
present discounted payoff using an oblivious policy,
given the long-run average state induced by other players'
policies.
[Figure: empirical distribution of players over states (# of players vs. state)]
Question: When is MPE close to OE?
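A minimal simulation sketch of the linear model above, contrasting the two policy types. The feedback gains, noise scale, and the choice to summarize the empirical distribution by its mean are all hypothetical, chosen only to make the contrast concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 50            # number of players
A, B = 0.8, 1.0   # linear dynamics x_{i,t+1} = A x_{i,t} + B a_{i,t} + w_{i,t}
T = 200           # simulation horizon

def markov_policy(x_i, f_minus_i):
    # Reacts to own state AND the current empirical distribution
    # (summarized here by its mean -- an illustrative choice).
    return 0.1 * (np.mean(f_minus_i) - x_i)

def oblivious_policy(x_i, f_bar):
    # Reacts only to own state and a fixed long-run average f_bar.
    return 0.1 * (f_bar - x_i)

x = rng.normal(size=m)
f_bar = 0.0  # assumed long-run average state
for t in range(T):
    a = np.array([oblivious_policy(x[i], f_bar) for i in range(m)])
    x = A * x + B * a + 0.1 * rng.normal(size=m)
```

Note the informational difference: the Markov policy would need the full cross-section f_{-i,t} broadcast every period, while the oblivious policy uses only the constant f̄, so no per-period information exchange is required.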
Assumptions
[A1] State dynamics are linear
[A2] MPE and OE exist
[A3] Payoff function is separable in state and action:
  π(x, a, f) = π_1(x, f) + π_2(a)
[A4] Payoff function is jointly concave in state and action
[A5] Payoff function is uniformly bounded over the state
space
[A6] Payoff is Gateaux differentiable w.r.t. f_{-i}
Main theorem
Under [A1]-[A6], the oblivious equilibrium payoff is
approximately optimal over Markov policies as m → ∞.
In other words, OE is approximately an MPE.
The basic idea is that as m → ∞, the true state distribution
is close to the time average, so knowledge of other
players' policies does not significantly improve the payoff.
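The concentration idea behind the theorem can be checked numerically. In a toy stable linear system (dynamics and parameters assumed purely for illustration), the empirical mean state at a fixed time approaches the long-run average as the number of players m grows:

```python
import numpy as np

rng = np.random.default_rng(1)
A = 0.7  # stable linear dynamics x_{t+1} = A x_t + w_t, long-run mean 0

def empirical_mean_deviation(m, T=500):
    # Run m independent players for T steps and measure how far the
    # time-T empirical mean state is from the long-run average (0 here).
    x = np.zeros(m)
    for _ in range(T):
        x = A * x + rng.normal(size=m)
    return abs(x.mean())

# Average the deviation over repeated runs for a small and a large m.
devs = [np.mean([empirical_mean_deviation(m) for _ in range(20)])
        for m in (10, 1000)]
# With many players the empirical state distribution concentrates around
# its long-run average, so reacting only to that average loses little.
```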
Light Tail
• The proof of the main theorem relies on a light-tail
condition.
• Define
  g(y) = sup_{x,a,f} |∂π(x, a, f)/∂f(y)|
• g(y) can be interpreted as the maximum rate of change
of the payoff function w.r.t. a small change in the fraction of
players at state y.
• Light Tail: For all ε > 0, there exists c > 0 such that for all
m:
  E[ g(U_m) 1{U_m > c} ] < ε,
  where U_m is distributed according to f̄(m).
• The light-tail condition is endogenous: checking it requires
first knowing the equilibrium outcome.
Main Contribution and Future Work
• Our main contribution is identifying an exogenous set of
conditions under which the light-tail condition is satisfied.
• These conditions are on model primitives and hence are
easily verifiable.
• Future Work: The assumption of linear dynamics is
restrictive. We need to identify conditions on state evolution
under which the light-tail result holds.
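A quick numerical sanity check of a light-tail condition of this form, for a toy choice of g and state distribution (both are assumptions for illustration, not objects from the paper): with g(y) = y^2 and U drawn from an Exponential(1) distribution, the truncated expectation E[g(U) 1{U > c}] vanishes as the truncation level c grows, so for any ε > 0 a suitable c exists.

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo estimate of the tail expectation E[g(U) 1{U > c}].
# Illustrative choices: g(y) = y**2 stands in for the payoff sensitivity
# to the fraction of players at state y, and U ~ Exponential(1) stands in
# for the equilibrium state distribution.
U = rng.exponential(scale=1.0, size=1_000_000)
g = lambda y: y ** 2

def tail_term(c):
    return np.mean(g(U) * (U > c))

# The tail expectation shrinks toward 0 as c grows: light tail holds here.
terms = [tail_term(c) for c in (0.0, 2.0, 5.0, 10.0)]
```

A heavy-tailed U (for example, one with infinite second moment) would instead keep the tail expectation bounded away from 0 for every c, which is exactly the failure mode the condition rules out.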