Oblivious Equilibrium for Stochastic Games with Concave Utility
Sachin Adlakha, Ramesh Johari, Gabriel Weintraub and Andrea Goldsmith
Wireless environments are reactive
• Scenario: Many devices competing for common wireless
resources (spectrum sharing)
• Typical approach: Assume that the environment is non-reactive.
• Flawed assumption at best: in cognitive radios, the environment
consists of other cognitive devices, which makes it a highly
reactive environment.
Question:
How to design policies for such systems?
Large scale wireless interference games
We want to study optimal dynamic strategies for devices in
the following type of game:
• Each device decides its power allocation over the available
bands
• Interference felt from other devices is treated as pure
noise
Model these wireless systems as stochastic games.
What is the performance loss from assuming a non-reactive
environment?
Foundational theory
The standard solution concept for a stochastic game is
Markov perfect equilibrium (MPE):
Each player's policy maximizes net present value
starting from any state, given the policies of other players.
MPE is computationally difficult to compute and requires
excessive information exchange between devices.
Question: When can we approximate MPE with simple
strategies?
Foundational theory
Oblivious policies: A device reacts to its own state and
the long-run average of other devices' states.
– Oblivious policies are simple to compute and implement
– Each policy computation is a 1-D dynamic program
– No information exchange between devices
– Completely distributed implementation
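As a toy illustration of the 1-D dynamic program behind an oblivious policy: with the other devices summarized by a fixed long-run average, each device solves a single-agent problem over its own scalar state. All dynamics, payoffs, grids, and parameter values below are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Toy 1-D dynamic program for an oblivious best response (sketch only).
A, B = 0.9, 1.0          # linear state dynamics x' = A*x + B*a (noise omitted)
beta = 0.95              # discount factor
f_bar = 0.5              # fixed long-run average of other devices' states
states = np.linspace(0.0, 1.0, 21)    # discretized own state
actions = np.linspace(-0.5, 0.5, 11)  # discretized actions

def payoff(x, a):
    # Hypothetical separable, concave payoff: pi_1(x, f_bar) + pi_2(a)
    return -(x - f_bar) ** 2 - a ** 2

# Value iteration over the 1-D state grid.
V = np.zeros(len(states))
for _ in range(500):
    V_new = np.empty_like(V)
    for i, x in enumerate(states):
        nxt = np.clip(A * x + B * actions, 0.0, 1.0)
        # nearest-grid lookup of the continuation value
        idx = np.abs(states[None, :] - nxt[:, None]).argmin(axis=1)
        V_new[i] = np.max(payoff(x, actions) + beta * V[idx])
    V = V_new

# Extract the greedy oblivious policy from the converged value function.
policy = np.empty(len(states))
for i, x in enumerate(states):
    nxt = np.clip(A * x + B * actions, 0.0, 1.0)
    idx = np.abs(states[None, :] - nxt[:, None]).argmin(axis=1)
    policy[i] = actions[np.argmax(payoff(x, actions) + beta * V[idx])]
```

Because f_bar is a constant rather than a time-varying distribution, the state space of the dynamic program is just the device's own state, which is what makes the computation 1-D.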
Our model
• m players
• State of player i is x_{i,t}; action of player i is a_{i,t}
• State evolution:
  x_{i,t+1} = A x_{i,t} + B a_{i,t} + w_{i,t}
• A Markov policy is a decision rule based on the current state
and the empirical distribution:
  a_{i,t} = μ(x_{i,t}, f_{-i,t})
• Payoff: π(x_{i,t}, a_{i,t}, f_{-i,t}),
where f_{-i,t} = empirical distribution of other players' states
Markov Perfect Equilibrium (MPE)
• A Markov perfect equilibrium is a vector of Markov
policies, where each player has maximized present
discounted payoff, given the policies of other players.
Oblivious Equilibrium (OE)
• In an oblivious policy, a player responds instead to x_{i,t}
and only the long-run average f̄_{-i}(m):
  a_{i,t} = μ(x_{i,t}, f̄_{-i}(m))
• Such a problem setting was developed by Weintraub et al.
• Oblivious equilibrium (OE) is a vector of policies, one per
device, where each device has chosen an optimal
oblivious policy.
• In an oblivious equilibrium, each player has maximized
present discounted payoff using an oblivious policy,
given the long-run average state induced by other players'
policies.
[Figure: empirical distribution of players over states (# of players vs. state)]
Question: When is MPE close to OE?
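A minimal simulation sketch of the linear model above, contrasting the two policy types. The feedback gains, noise scale, and the choice to summarize the empirical distribution by its mean are all hypothetical, chosen only to make the contrast concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 50            # number of players
A, B = 0.8, 1.0   # linear dynamics x_{i,t+1} = A x_{i,t} + B a_{i,t} + w_{i,t}
T = 200           # simulation horizon

def markov_policy(x_i, f_minus_i):
    # Reacts to own state AND the current empirical distribution
    # (summarized here by its mean -- an illustrative choice).
    return 0.1 * (np.mean(f_minus_i) - x_i)

def oblivious_policy(x_i, f_bar):
    # Reacts only to own state and a fixed long-run average f_bar.
    return 0.1 * (f_bar - x_i)

x = rng.normal(size=m)
f_bar = 0.0  # assumed long-run average state
for t in range(T):
    a = np.array([oblivious_policy(x[i], f_bar) for i in range(m)])
    x = A * x + B * a + 0.1 * rng.normal(size=m)
```

Note the informational difference: the Markov policy would need the full cross-section f_{-i,t} broadcast every period, while the oblivious policy uses only the constant f̄, so no per-period information exchange is required.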
Assumptions
[A1] State dynamics are linear
[A2] MPE and OE exist
[A3] Payoff function is separable in state and action:
  π(x, a, f) = π_1(x, f) + π_2(a)
[A4] Payoff function is jointly concave in state and action
[A5] Payoff function is uniformly bounded over the state
space
[A6] Payoff is Gateaux differentiable w.r.t. f_{-i}
Main theorem
Under [A1]-[A6], the oblivious equilibrium payoff is
approximately optimal over Markov policies as m → ∞.
In other words, OE is approximately an MPE.
The basic idea is that as m → ∞, the true state distribution
is close to the time average, so knowledge of other
players' policies does not significantly improve the payoff.
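The concentration idea behind the theorem can be checked numerically. In a toy stable linear system (dynamics and parameters assumed purely for illustration), the empirical mean state at a fixed time approaches the long-run average as the number of players m grows:

```python
import numpy as np

rng = np.random.default_rng(1)
A = 0.7  # stable linear dynamics x_{t+1} = A x_t + w_t, long-run mean 0

def empirical_mean_deviation(m, T=500):
    # Run m independent players for T steps and measure how far the
    # time-T empirical mean state is from the long-run average (0 here).
    x = np.zeros(m)
    for _ in range(T):
        x = A * x + rng.normal(size=m)
    return abs(x.mean())

# Average the deviation over repeated runs for a small and a large m.
devs = [np.mean([empirical_mean_deviation(m) for _ in range(20)])
        for m in (10, 1000)]
# With many players the empirical state distribution concentrates around
# its long-run average, so reacting only to that average loses little.
```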
Light Tail
• The proof of the main theorem relies on a light-tail
condition.
• Define
  g(y) = sup_{x,a,f} |∂π(x, a, f)/∂f(y)|
• g(y) can be interpreted as the maximum rate of change
of the payoff function w.r.t. a small change in the fraction of
players at state y.
• Light Tail: For all ε > 0, there exists c > 0 such that for all
m:
  E[ g(U_m) 1{U_m > c} ] < ε,
  where U_m is distributed according to f̄(m).
• The light-tail condition is endogenous: checking it requires
first knowing the equilibrium outcome.
Main Contribution and Future Work
• Our main contribution is identifying an exogenous set of
conditions under which the light-tail condition is satisfied.
• These conditions are on model primitives and hence are
easily verifiable.
• Future Work: The assumption of linear dynamics is
restrictive. We need to identify conditions on state evolution
under which the light-tail result holds.
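A quick numerical sanity check of a light-tail condition of this form, for a toy choice of g and state distribution (both are assumptions for illustration, not objects from the paper): with g(y) = y^2 and U drawn from an Exponential(1) distribution, the truncated expectation E[g(U) 1{U > c}] vanishes as the truncation level c grows, so for any ε > 0 a suitable c exists.

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo estimate of the tail expectation E[g(U) 1{U > c}].
# Illustrative choices: g(y) = y**2 stands in for the payoff sensitivity
# to the fraction of players at state y, and U ~ Exponential(1) stands in
# for the equilibrium state distribution.
U = rng.exponential(scale=1.0, size=1_000_000)
g = lambda y: y ** 2

def tail_term(c):
    return np.mean(g(U) * (U > c))

# The tail expectation shrinks toward 0 as c grows: light tail holds here.
terms = [tail_term(c) for c in (0.0, 2.0, 5.0, 10.0)]
```

A heavy-tailed U (for example, one with infinite second moment) would instead keep the tail expectation bounded away from 0 for every c, which is exactly the failure mode the condition rules out.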