
Multi-Agent Learning Mini-Tutorial

Gerry Tesauro

IBM T.J.Watson Research Center

http://www.research.ibm.com/infoecon http://www.research.ibm.com/massdist


Outline

 Statement of the problem
 Tools and concepts from RL & game theory
 “Naïve” approaches to multi-agent learning
    ordinary single-agent RL; no-regret learning
 “Sophisticated” approaches
    minimax-Q (Littman); Nash-Q (Hu & Wellman)
    tinkering with learning rates: WoLF (Bowling), multiple-timescale Q-learning (Leslie & Collins)
    fictitious play; evolutionary game theory; “strategic teaching” (Camerer talk)
 Challenges and Opportunities

Normal single-agent learning

 Assume that the environment has observable states, characterizable expected rewards and state transitions, and that all of the above is stationary (MDP-ish)
 Non-learning, theoretical solution to a fully specified problem: the DP formalism
 Learning: solve by trial and error without a full specification: RL + exploration, Monte Carlo, ...

Multi-Agent Learning Problem:

An agent tries to solve its learning problem while other agents in the environment are simultaneously trying to solve their own learning problems.

 Non-learning, theoretical solution to a fully specified problem: game theory

Basics of game theory

 A game is specified by: players (1…N), actions, and payoff matrices (functions of joint actions).

Example: Rock-Paper-Scissors (each matrix is read with rows = that player’s own action):

A’s payoff                B’s payoff
     R   P   S                 R   P   S
R    0  −1   1            R    0  −1   1
P    1   0  −1            P    1   0  −1
S   −1   1   0            S   −1   1   0

 If the payoff matrices are identical, the game is cooperative; otherwise it is non-cooperative (zero-sum = purely competitive).
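As a concrete illustration (a minimal sketch, not code from the tutorial; all names are my own), the Rock-Paper-Scissors payoffs and the cooperative / zero-sum distinction can be written out directly:

```python
# Illustrative sketch: Rock-Paper-Scissors payoffs and game-type checks.
# payoff[i][j] = row player's payoff when playing action i against action j.
R, P, S = 0, 1, 2
A_PAYOFF = [[0, -1, 1],
            [1, 0, -1],
            [-1, 1, 0]]
# Zero-sum: B's payoff is the negative of A's under a common indexing
# (A's action selects the row, B's action selects the column).
B_PAYOFF = [[-v for v in row] for row in A_PAYOFF]

def is_cooperative(a, b):
    """Identical payoff matrices -> cooperative game."""
    return a == b

def is_zero_sum(a, b):
    """Payoffs always cancel -> purely competitive game."""
    return all(x + y == 0 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Rock-Paper-Scissors comes out zero-sum and non-cooperative, as the slide's classification predicts.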

Basic lingo…(2)

 Games with no states: (bi-)matrix games. Games with states: stochastic games / Markov games (state transitions are functions of joint actions).
 Games with simultaneous moves: normal form. Games with alternating turns: extensive form.
 Number of rounds = 1: one-shot game. Number of rounds > 1: repeated game.
 Deterministic action choice: pure strategy. Non-deterministic action choice: mixed strategy.

Basic Analysis

 A joint strategy x is Pareto-optimal if there is no x′ that improves everybody’s payoffs.
 An agent’s x_i is a dominant strategy if it’s always best regardless of others’ actions.
 x_i is a best-response to others’ x_−i if it maximizes payoff given x_−i.
 A joint strategy x is a (Nash, correlated) equilibrium if each agent’s strategy is simultaneously a best-response to everyone else’s strategy, i.e. no agent has an incentive to deviate.
 A Nash equilibrium always exists, but there may be exponentially many of them, and they are not easy to compute.

What about imperfect information games?

 Nash equilibrium requires knowledge of all payoffs. For imperfect-information games, the corresponding concept is the Bayes-Nash equilibrium (Nash plus Bayesian inference over hidden information). Even more intractable than regular Nash.

Can we make game theory more tractable?

 Active area of research.
 Symmetric games: payoffs are invariant under swapping of player labels. Can look for symmetric equilibria, where all agents play the same mixed strategy.
 Network games: agent payoffs depend only on interactions with a small number of neighbors.
 Summarization games: payoffs are simple summarization functions of the population’s joint actions (e.g. voting).

Summary: pros and cons of game theory

 Game theory provides a nice conceptual/theoretical framework for thinking about multi-agent learning.

 Game theory is appropriate provided that:
    the game is stationary and fully specified;
    there is enough computing power to compute an equilibrium;
    other agents can be assumed to also be game theorists;
    the equilibrium coordination problem can be solved.

 The above conditions rarely hold in real applications.
 Multi-agent learning is not only a fascinating problem, it may be the only viable option.

Naïve Approaches to Multi-Agent Learning

 Basic idea: the agent adapts, ignoring the non-stationarity of other agents’ strategies.

1. Fictitious play: the agent observes the time-average frequency of the other players’ action choices, and models:

    prob(action k) = (# times k observed) / (# total observations)

The agent then plays a best response to this model.

 Variants of fictitious play: exponential recency weighting, “smoothed” best response (~softmax), small adjustment toward best response, ...
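The frequency model and best-response step above can be sketched in a few lines (an illustrative sketch; the function and variable names are my own, not from the tutorial):

```python
from collections import Counter

# Minimal fictitious-play sketch.
# payoff[i][j] = our payoff when we play i and the opponent plays j.

def best_response(payoff, opp_counts):
    """Best response to the empirical distribution of opponent actions:
    prob(action k) = (# times k observed) / (# total observations)."""
    total = sum(opp_counts.values())
    expected = [
        sum(row[k] * n / total for k, n in opp_counts.items())
        for row in payoff
    ]
    return max(range(len(payoff)), key=lambda i: expected[i])

# Example: matching pennies from the row player's point of view.
PENNIES = [[1, -1],
           [-1, 1]]
history = Counter({0: 7, 1: 3})           # opponent mostly played action 0
action = best_response(PENNIES, history)  # -> match the frequent action
```

The exponential-recency and smoothed-best-response variants mentioned above would change only how `opp_counts` is accumulated or how `expected` is turned into an action.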

What if all agents use fictitious play?

 Strict Nash equilibria are absorbing points for fictitious play.
 The typical result is limit-cycle behavior of the strategies, with increasing period.
 In certain cases, the product of the empirical distributions converges to Nash even though actual play cycles (penny-matching example).

More Naïve Approaches…

2. Evolutionary game theory:

“Replicator Dynamics” models: large population of agents using different strategies, fittest agents breed more copies.

 Let x = the population strategy vector, with x_k = the fraction of the population playing strategy k. The growth rate is then:

    dx_k/dt = x_k [ u(e_k, x) − u(x, x) ]

 The above equation can also be derived from an “imitation” model.
 Nash equilibria are fixed points of the above equation, but not necessarily attractors (they may be unstable or neutrally stable).
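The replicator equation above can be integrated numerically with a simple Euler step (an illustrative sketch under my own naming, using Rock-Paper-Scissors as the game):

```python
import numpy as np

# Euler-integration sketch of the replicator equation
#   dx_k/dt = x_k [u(e_k, x) - u(x, x)]
# for Rock-Paper-Scissors.
U = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])   # u(e_k, x) = (U @ x)[k]

def replicator_step(x, dt=0.01):
    fitness = U @ x        # payoff of each pure strategy vs. the population
    average = x @ fitness  # u(x, x): population-average payoff
    return x + dt * x * (fitness - average)

x = np.array([0.5, 0.3, 0.2])
for _ in range(2000):
    x = replicator_step(x)
# The step preserves sum(x) = 1; for RPS the interior Nash point
# (1/3, 1/3, 1/3) is a fixed point but only neutrally stable, so
# trajectories cycle around it rather than converging.
```

This directly illustrates the bullet above: the Nash point is a fixed point of the dynamics, yet not an attractor.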

Many possible dynamic behaviors...

(Figure: phase portraits showing limit cycles, attractors, and unstable fixed points.)

Also saddle points, chaotic orbits, ...

Replicator dynamics: auction bidding strategies

More Naïve Approaches…

3. Iterated Gradient Ascent (Singh, Kearns and Mansour): again does a myopic adaptation to the other players’ current strategy.

    dx_i/dt = ∂u(x_i, x_−i) / ∂x_i

 Coupled system of linear equations: u is linear in x_i and x_−i.
 Analysis for two-player, two-action games: either converges to a Nash fixed point on the boundary (at least one pure strategy), or yields limit cycles.
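For the two-player, two-action case analyzed above, each mixed strategy is a single probability, and because u is linear in it the gradient is just a payoff difference. A minimal sketch (my own names, not the authors’ code):

```python
# Sketch of iterated gradient ascent in a 2x2 game.
# p, q = the probability each player assigns to their first action.
A = [[1, -1], [-1, 1]]   # matching pennies, row player's payoffs
B = [[-1, 1], [1, -1]]   # zero-sum: B = -A

def u(M, p, q):
    """Expected payoff of mixed strategies (p, 1-p) vs (q, 1-q)."""
    return (p * q * M[0][0] + p * (1 - q) * M[0][1]
            + (1 - p) * q * M[1][0] + (1 - p) * (1 - q) * M[1][1])

def iga_step(p, q, eta=0.01):
    dp = u(A, 1, q) - u(A, 0, q)   # du/dp (u is linear in p)
    dq = u(B, p, 1) - u(B, p, 0)   # du/dq for the column player
    clip = lambda z: max(0.0, min(1.0, z))
    return clip(p + eta * dp), clip(q + eta * dq)
```

For matching pennies the interior Nash point (0.5, 0.5) is a fixed point of `iga_step`, and trajectories started elsewhere orbit around it, matching the limit-cycle outcome described above.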

Further Naïve Approaches…

4. Dumb Single-Agent Learning: use a single-agent algorithm in a multi-agent problem and hope that it works.
 No-regret learning by pricebots (Greenwald & Kephart)
 Simultaneous Q-learning by pricebots (Tesauro & Kephart)
 In many cases this actually works: the learners converge either exactly or approximately to self-consistent optimal strategies.
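The single-agent algorithm being reused here is ordinary tabular Q-learning; in the multi-agent setting each agent simply runs the same update independently, treating the others as part of the environment. A minimal sketch (state/action names are hypothetical):

```python
# Minimal tabular Q-learning backup, the "dumb single-agent" approach:
#   Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    old = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

# Hypothetical pricebot-style usage: states are price levels,
# actions are price moves, reward is the profit earned this round.
Q = {}
q_update(Q, 'low_price', 'raise', 1.0, 'high_price', ['raise', 'cut'])
```

Nothing in the update accounts for the other learners, which is exactly why convergence is not guaranteed in general, even though it often works in practice as noted above.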

“Sophisticated” approaches

 These approaches take into account the possibility that other agents’ strategies might change.

5. Multi-Agent Q-learning:

 Minimax-Q (Littman): convergent algorithm for two-player zero-sum stochastic games
 Nash-Q (Hu & Wellman): convergent algorithm for two-player general-sum stochastic games; requires use of a Nash equilibrium solver
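The key step that distinguishes minimax-Q from ordinary Q-learning is valuing each state by the minimax value of the stage game rather than by a simple max. In general this requires a linear program, but for 2x2 zero-sum games there is a closed form. The sketch below shows that valuation step only (illustrative, not Littman's implementation):

```python
# Stage-game valuation inside minimax-Q, specialized to 2x2 zero-sum games.
def minimax_value_2x2(G):
    """Value of the zero-sum matrix [[a, b], [c, d]] for the row maximizer."""
    (a, b), (c, d) = G
    lower = max(min(a, b), min(c, d))   # best guaranteed by a pure row
    upper = min(max(a, c), max(b, d))   # best the column player concedes
    if lower == upper:                  # pure-strategy saddle point
        return lower
    p = (d - c) / (a - b - c + d)       # equalizing mixed strategy on rows
    return p * a + (1 - p) * c          # equalized payoff = game value
```

Minimax-Q then backs this value up through the Bellman equation exactly where single-agent Q-learning would use max over its own actions.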

More sophisticated approaches...

6. Varying learning rates

 WoLF: “Win or Learn Fast” (Bowling): the agent reduces its learning rate when performing well, and increases it when doing badly. Improves convergence of IGA and policy hill-climbing.
 Multi-timescale Q-learning (Leslie): different agents use different power laws t^−n for learning-rate decay; this achieves simultaneous convergence where ordinary Q-learning doesn’t.
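The WoLF rate-switching idea can be stated in a few lines. This is an illustrative sketch of the rate selection only, not Bowling's full WoLF-PHC algorithm, and the step sizes and baseline are hypothetical:

```python
# "Win or Learn Fast": small steps when winning, larger steps when losing.
ETA_WIN, ETA_LOSE = 0.01, 0.04   # hypothetical step sizes, ETA_WIN < ETA_LOSE

def wolf_rate(current_payoff, baseline_payoff):
    """'Winning' = the current strategy does at least as well as a baseline
    (e.g. the payoff of the average strategy played so far)."""
    return ETA_WIN if current_payoff >= baseline_payoff else ETA_LOSE
```

Moving slowly while ahead and quickly while behind is what improves the convergence of gradient-based learners such as IGA, as noted above.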

More sophisticated approaches...

7. “Strategic Teaching”: recognizes that other players’ strategies are adaptive.
 “A strategic teacher may play a strategy which is not myopically optimal (such as cooperating in Prisoner’s Dilemma) in the hope that it induces adaptive players to expect that strategy in the future, which triggers a best-response that benefits the teacher.” (Camerer, Ho and Chong)

Theoretical Research Challenges

 Proper theoretical formulation?
 “No short-cut” hypothesis: massive on-line search à la Deep Blue to maximize expected long-term reward.
 (Bayesian) model and predict the behavior of other players, including how they learn based on my actions (beware of infinite model recursion).
 Trial-and-error exploration  continual Bayesian inference using all evidence over all uncertainties (Boutilier: Bayesian exploration).

When can you get away with simpler methods?

Real-World Opportunities

Multi-agent systems where you can’t do game theory (covers everything :-))
 Electronic marketplaces (Kephart)
 Mobile networks (Chang)
 Self-managing computer systems (Kephart)
 Teams of robots (Bowling, Stone)
 Video games
 Military/counter-terrorism applications