
Multi-Agent Learning Mini-Tutorial

Gerry Tesauro

IBM T.J.Watson Research Center

http://www.research.ibm.com/infoecon http://www.research.ibm.com/massdist


Outline

 Statement of the problem
 Tools and concepts from RL & game theory
 “Naïve” approaches to multi-agent learning
    ordinary single-agent RL; no-regret learning
 “Sophisticated” approaches
    minimax-Q (Littman); Nash-Q (Hu & Wellman)
    tinkering with learning rates: WoLF (Bowling), multiple-timescale Q-learning (Leslie & Collins)
    fictitious play; evolutionary game theory; “strategic teaching” (Camerer talk)
 Challenges and Opportunities

Normal single-agent learning

 Assume that the environment has observable states, characterizable expected rewards and state transitions, and that all of the above is stationary (MDP-ish)
 Non-learning, theoretical solution to a fully specified problem: the DP formalism
 Learning: solve by trial and error without a full specification: RL + exploration, Monte Carlo, ...

Multi-Agent Learning Problem:

An agent tries to solve its learning problem while other agents in the environment are simultaneously trying to solve their own learning problems.

 Non-learning, theoretical solution to a fully specified problem: game theory

Basics of game theory

 A game is specified by: players (1…N), actions, and payoff matrices (functions of joint actions).

Example: Rock-Paper-Scissors (each matrix is read with rows = that player’s own action):

A’s payoff                B’s payoff
     R   P   S                 R   P   S
R    0  −1   1            R    0  −1   1
P    1   0  −1            P    1   0  −1
S   −1   1   0            S   −1   1   0

 If the payoff matrices are identical, the game is cooperative; otherwise it is non-cooperative (zero-sum = purely competitive).
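As a concrete illustration (a minimal sketch, not code from the tutorial; all names are my own), the Rock-Paper-Scissors payoffs and the cooperative / zero-sum distinction can be written out directly:

```python
# Illustrative sketch: Rock-Paper-Scissors payoffs and game-type checks.
# payoff[i][j] = row player's payoff when playing action i against action j.
R, P, S = 0, 1, 2
A_PAYOFF = [[0, -1, 1],
            [1, 0, -1],
            [-1, 1, 0]]
# Zero-sum: B's payoff is the negative of A's under a common indexing
# (A's action selects the row, B's action selects the column).
B_PAYOFF = [[-v for v in row] for row in A_PAYOFF]

def is_cooperative(a, b):
    """Identical payoff matrices -> cooperative game."""
    return a == b

def is_zero_sum(a, b):
    """Payoffs always cancel -> purely competitive game."""
    return all(x + y == 0 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Rock-Paper-Scissors comes out zero-sum and non-cooperative, as the slide's classification predicts.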

Basic lingo…(2)

 Games with no states: (bi-)matrix games. Games with states: stochastic games / Markov games (state transitions are functions of joint actions).
 Games with simultaneous moves: normal form. Games with alternating turns: extensive form.
 Number of rounds = 1: one-shot game. Number of rounds > 1: repeated game.
 Deterministic action choice: pure strategy. Non-deterministic action choice: mixed strategy.

Basic Analysis

 A joint strategy x is Pareto-optimal if there is no x′ that improves everybody’s payoffs.
 An agent’s x_i is a dominant strategy if it’s always best regardless of others’ actions.
 x_i is a best-response to others’ x_−i if it maximizes payoff given x_−i.
 A joint strategy x is a (Nash, correlated) equilibrium if each agent’s strategy is simultaneously a best-response to everyone else’s strategy, i.e. no agent has an incentive to deviate.
 A Nash equilibrium always exists, but there may be exponentially many of them, and they are not easy to compute.

What about imperfect information games?

 Nash equilibrium requires knowledge of all payoffs. For imperfect-information games, the corresponding concept is the Bayes-Nash equilibrium (Nash plus Bayesian inference over hidden information). Even more intractable than regular Nash.

Can we make game theory more tractable?

 Active area of research.
 Symmetric games: payoffs are invariant under swapping of player labels. Can look for symmetric equilibria, where all agents play the same mixed strategy.
 Network games: agent payoffs depend only on interactions with a small number of neighbors.
 Summarization games: payoffs are simple summarization functions of the population’s joint actions (e.g. voting).

Summary: pros and cons of game theory

 Game theory provides a nice conceptual/theoretical framework for thinking about multi-agent learning.

 Game theory is appropriate provided that:
    the game is stationary and fully specified;
    there is enough computing power to compute an equilibrium;
    other agents can be assumed to also be game theorists;
    the equilibrium coordination problem can be solved.

 The above conditions rarely hold in real applications.
 Multi-agent learning is not only a fascinating problem, it may be the only viable option.

Naïve Approaches to Multi-Agent Learning

 Basic idea: the agent adapts, ignoring the non-stationarity of other agents’ strategies.

1. Fictitious play: the agent observes the time-average frequency of the other players’ action choices, and models:

    prob(action k) = (# times k observed) / (# total observations)

The agent then plays a best response to this model.

 Variants of fictitious play: exponential recency weighting, “smoothed” best response (~softmax), small adjustment toward best response, ...
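The frequency model and best-response step above can be sketched in a few lines (an illustrative sketch; the function and variable names are my own, not from the tutorial):

```python
from collections import Counter

# Minimal fictitious-play sketch.
# payoff[i][j] = our payoff when we play i and the opponent plays j.

def best_response(payoff, opp_counts):
    """Best response to the empirical distribution of opponent actions:
    prob(action k) = (# times k observed) / (# total observations)."""
    total = sum(opp_counts.values())
    expected = [
        sum(row[k] * n / total for k, n in opp_counts.items())
        for row in payoff
    ]
    return max(range(len(payoff)), key=lambda i: expected[i])

# Example: matching pennies from the row player's point of view.
PENNIES = [[1, -1],
           [-1, 1]]
history = Counter({0: 7, 1: 3})           # opponent mostly played action 0
action = best_response(PENNIES, history)  # -> match the frequent action
```

The exponential-recency and smoothed-best-response variants mentioned above would change only how `opp_counts` is accumulated or how `expected` is turned into an action.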

What if all agents use fictitious play?

 Strict Nash equilibria are absorbing points for fictitious play.
 The typical result is limit-cycle behavior of the strategies, with increasing period.
 In certain cases, the product of the empirical distributions converges to Nash even though actual play cycles (penny-matching example).

More Naïve Approaches…

2. Evolutionary game theory:

“Replicator Dynamics” models: large population of agents using different strategies, fittest agents breed more copies.

 Let x = the population strategy vector, with x_k = the fraction of the population playing strategy k. The growth rate is then:

    dx_k/dt = x_k [ u(e_k, x) − u(x, x) ]

 The above equation can also be derived from an “imitation” model.
 Nash equilibria are fixed points of the above equation, but not necessarily attractors (they may be unstable or neutrally stable).
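The replicator equation above can be integrated numerically with a simple Euler step (an illustrative sketch under my own naming, using Rock-Paper-Scissors as the game):

```python
import numpy as np

# Euler-integration sketch of the replicator equation
#   dx_k/dt = x_k [u(e_k, x) - u(x, x)]
# for Rock-Paper-Scissors.
U = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])   # u(e_k, x) = (U @ x)[k]

def replicator_step(x, dt=0.01):
    fitness = U @ x        # payoff of each pure strategy vs. the population
    average = x @ fitness  # u(x, x): population-average payoff
    return x + dt * x * (fitness - average)

x = np.array([0.5, 0.3, 0.2])
for _ in range(2000):
    x = replicator_step(x)
# The step preserves sum(x) = 1; for RPS the interior Nash point
# (1/3, 1/3, 1/3) is a fixed point but only neutrally stable, so
# trajectories cycle around it rather than converging.
```

This directly illustrates the bullet above: the Nash point is a fixed point of the dynamics, yet not an attractor.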

Many possible dynamic behaviors...

(Figure: phase portraits showing limit cycles, attractors, and unstable fixed points.)

Also saddle points, chaotic orbits, ...

Replicator dynamics: auction bidding strategies

More Naïve Approaches…

3. Iterated Gradient Ascent (Singh, Kearns and Mansour): again does a myopic adaptation to the other players’ current strategy.

    dx_i/dt = ∂u(x_i, x_−i) / ∂x_i

 Coupled system of linear equations: u is linear in x_i and x_−i.
 Analysis for two-player, two-action games: either converges to a Nash fixed point on the boundary (at least one pure strategy), or yields limit cycles.
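For the two-player, two-action case analyzed above, each mixed strategy is a single probability, and because u is linear in it the gradient is just a payoff difference. A minimal sketch (my own names, not the authors’ code):

```python
# Sketch of iterated gradient ascent in a 2x2 game.
# p, q = the probability each player assigns to their first action.
A = [[1, -1], [-1, 1]]   # matching pennies, row player's payoffs
B = [[-1, 1], [1, -1]]   # zero-sum: B = -A

def u(M, p, q):
    """Expected payoff of mixed strategies (p, 1-p) vs (q, 1-q)."""
    return (p * q * M[0][0] + p * (1 - q) * M[0][1]
            + (1 - p) * q * M[1][0] + (1 - p) * (1 - q) * M[1][1])

def iga_step(p, q, eta=0.01):
    dp = u(A, 1, q) - u(A, 0, q)   # du/dp (u is linear in p)
    dq = u(B, p, 1) - u(B, p, 0)   # du/dq for the column player
    clip = lambda z: max(0.0, min(1.0, z))
    return clip(p + eta * dp), clip(q + eta * dq)
```

For matching pennies the interior Nash point (0.5, 0.5) is a fixed point of `iga_step`, and trajectories started elsewhere orbit around it, matching the limit-cycle outcome described above.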

Further Naïve Approaches…

4. Dumb Single-Agent Learning: use a single-agent algorithm in a multi-agent problem and hope that it works.
 No-regret learning by pricebots (Greenwald & Kephart)
 Simultaneous Q-learning by pricebots (Tesauro & Kephart)
 In many cases this actually works: the learners converge either exactly or approximately to self-consistent optimal strategies.
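The single-agent algorithm being reused here is ordinary tabular Q-learning; in the multi-agent setting each agent simply runs the same update independently, treating the others as part of the environment. A minimal sketch (state/action names are hypothetical):

```python
# Minimal tabular Q-learning backup, the "dumb single-agent" approach:
#   Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    old = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

# Hypothetical pricebot-style usage: states are price levels,
# actions are price moves, reward is the profit earned this round.
Q = {}
q_update(Q, 'low_price', 'raise', 1.0, 'high_price', ['raise', 'cut'])
```

Nothing in the update accounts for the other learners, which is exactly why convergence is not guaranteed in general, even though it often works in practice as noted above.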

“Sophisticated” approaches

 These approaches take into account the possibility that other agents’ strategies might change.

5. Multi-Agent Q-learning:

 Minimax-Q (Littman): convergent algorithm for two-player zero-sum stochastic games
 Nash-Q (Hu & Wellman): convergent algorithm for two-player general-sum stochastic games; requires use of a Nash equilibrium solver
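The key step that distinguishes minimax-Q from ordinary Q-learning is valuing each state by the minimax value of the stage game rather than by a simple max. In general this requires a linear program, but for 2x2 zero-sum games there is a closed form. The sketch below shows that valuation step only (illustrative, not Littman's implementation):

```python
# Stage-game valuation inside minimax-Q, specialized to 2x2 zero-sum games.
def minimax_value_2x2(G):
    """Value of the zero-sum matrix [[a, b], [c, d]] for the row maximizer."""
    (a, b), (c, d) = G
    lower = max(min(a, b), min(c, d))   # best guaranteed by a pure row
    upper = min(max(a, c), max(b, d))   # best the column player concedes
    if lower == upper:                  # pure-strategy saddle point
        return lower
    p = (d - c) / (a - b - c + d)       # equalizing mixed strategy on rows
    return p * a + (1 - p) * c          # equalized payoff = game value
```

Minimax-Q then backs this value up through the Bellman equation exactly where single-agent Q-learning would use max over its own actions.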

More sophisticated approaches...

6. Varying learning rates

 WoLF: “Win or Learn Fast” (Bowling): the agent reduces its learning rate when performing well, and increases it when doing badly. Improves convergence of IGA and policy hill-climbing.
 Multi-timescale Q-learning (Leslie): different agents use different power laws t^−n for learning-rate decay; this achieves simultaneous convergence where ordinary Q-learning doesn’t.
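The WoLF rate-switching idea can be stated in a few lines. This is an illustrative sketch of the rate selection only, not Bowling's full WoLF-PHC algorithm, and the step sizes and baseline are hypothetical:

```python
# "Win or Learn Fast": small steps when winning, larger steps when losing.
ETA_WIN, ETA_LOSE = 0.01, 0.04   # hypothetical step sizes, ETA_WIN < ETA_LOSE

def wolf_rate(current_payoff, baseline_payoff):
    """'Winning' = the current strategy does at least as well as a baseline
    (e.g. the payoff of the average strategy played so far)."""
    return ETA_WIN if current_payoff >= baseline_payoff else ETA_LOSE
```

Moving slowly while ahead and quickly while behind is what improves the convergence of gradient-based learners such as IGA, as noted above.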

More sophisticated approaches...

7. “Strategic Teaching”: recognizes that other players’ strategies are adaptive.
 “A strategic teacher may play a strategy which is not myopically optimal (such as cooperating in Prisoner’s Dilemma) in the hope that it induces adaptive players to expect that strategy in the future, which triggers a best-response that benefits the teacher.” (Camerer, Ho and Chong)

Theoretical Research Challenges

 Proper theoretical formulation?
 “No short-cut” hypothesis: massive on-line search à la Deep Blue to maximize expected long-term reward.
 (Bayesian) model and predict the behavior of other players, including how they learn based on my actions (beware of infinite model recursion).
 Trial-and-error exploration  continual Bayesian inference using all evidence over all uncertainties (Boutilier: Bayesian exploration).

When can you get away with simpler methods?

Real-World Opportunities

Multi-agent systems where you can’t do game theory (covers everything :-))
 Electronic marketplaces (Kephart)
 Mobile networks (Chang)
 Self-managing computer systems (Kephart)
 Teams of robots (Bowling, Stone)
 Video games
 Military/counter-terrorism applications