
Reinforcement Learning in
Strategy Selection for a
Coordinated Multirobot System
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND
CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL.
37, NO. 6, NOVEMBER 2007
Kao-Shing Hwang, Member, IEEE, Yu-Jen Chen,
and Ching-Huang Lee
Advisor : Ming-Yuan Shieh
PPT completion rate: 100%
Student : Ching-Chih Wen (S/N: M9820108)
OUTLINE
 Abstract
 Introduction
 SYSTEM FORMATION
   Basic Behavior
   Role Assignment
   Strategies
   Learning System
   Dispatching System
 EXPERIMENTS
 CONCLUSION
ABSTRACT
 This correspondence presents a multi-strategy decision-making
system for robot soccer games. Through reinforcement processes,
the coordination between robots is learned in the course of a game.
 The responsibility of each player varies with the change of its role
across state transitions. Therefore, the system uses several strategies,
such as an offensive strategy, a defensive strategy, and so on, for a
variety of scenarios.
 The main task assigned to the robots under each strategy is simply to
occupy good positions.
 Using the Hungarian method, each robot can be assigned to its
designated spot at minimal cost.
INTRODUCTION(1/3)
 Reinforcement learning has attracted increasing interest in the fields
of machine learning and artificial intelligence recently since it
promises a way to use only reward and punishment in achieving a
specific task [1].
Fig.1
INTRODUCTION(2/3)
 Traditional reinforcement-learning algorithms are often concerned
with single-agent problems; however, no agent can act alone since it
must interact with other agents in the environment to achieve a
specific task [3].
 Therefore, we focus here on high-level learning rather than on
basic-behavior learning.
 The main objective of this correspondence is to develop a
reinforcement-learning architecture for multiple coordinated
strategies in a robot soccer system.
INTRODUCTION(3/3)
 In this correspondence, we utilize the robot soccer system as our
test platform since this system can fully implement a multi-agent
system.
Fig.2
SYSTEM FORMATION
Fig.3
SYSTEM FORMATION-Basic Behavior
 1) Go to a Position
 2) Go to a Position With Avoidance
 3) Kick a Ball to a Position
Fig.4
Fig.5
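As a rough illustration of how these three basic behaviors might be exposed to the higher layers, the sketch below defines one function per behavior. The kinematic model, control gains, and avoidance rule are placeholders assumed for this example, not the controllers described in the paper.

```python
import math

# Illustrative sketch only: the paper defines these three basic behaviors,
# but the control laws below are simple placeholders, not the original ones.

def go_to_position(pose, target, k_v=1.0, k_w=2.0):
    """Drive from pose (x, y, theta) toward a target point (x, y)."""
    x, y, theta = pose
    dx, dy = target[0] - x, target[1] - y
    distance = math.hypot(dx, dy)
    heading_error = math.atan2(dy, dx) - theta
    # Wrap the heading error to [-pi, pi] and apply proportional control (assumed).
    heading_error = math.atan2(math.sin(heading_error), math.cos(heading_error))
    return k_v * distance, k_w * heading_error   # (linear speed, turn rate)

def go_to_position_with_avoidance(pose, target, obstacles, margin=0.3):
    """Like go_to_position, but detour when an obstacle is closer than `margin`."""
    for ox, oy in obstacles:
        if math.hypot(ox - pose[0], oy - pose[1]) < margin:
            # Hypothetical detour: aim at a point offset to one side of the obstacle.
            target = (ox - margin, oy + margin)
            break
    return go_to_position(pose, target)

def kick_ball_to_position(pose, ball, goal_target, offset=0.1):
    """Approach the ball from behind so the kick direction points at goal_target."""
    angle = math.atan2(goal_target[1] - ball[1], goal_target[0] - ball[0])
    approach = (ball[0] - offset * math.cos(angle), ball[1] - offset * math.sin(angle))
    return go_to_position_with_avoidance(pose, approach, obstacles=[])
```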
SYSTEM FORMATION-Role Assignment(1/3)
 1) Attacker position
Fig.6
Fig.7
Fig.8
SYSTEM FORMATION-Role Assignment(2/3)
 2) Sidekick position
Fig.9
Fig.10
 3) Backup position
 4) Defender position
SYSTEM FORMATION-Role Assignment(3/3)
 5) Goalkeeper position
Fig.11
SYSTEM FORMATION-STRATEGIES(1/2)
 1) Primary part:
The attacker's weighting is W_a.
 2) Offensive part:
The weightings of the sidekick and the backup are W_s and W_b,
respectively.
 3) Defensive part:
The weightings of the defender and the goalkeeper are W_d and W_g,
respectively.
SYSTEM FORMATION-STRATEGIES(2/2)
 According to the different weightings, different strategies can be
developed. We can develop three strategies as follows:
 1) Normal strategy:
W_a = W_s = W_b = W_d = W_g. (W_a, W_s, W_b, W_d, W_g) = (1, 1, 1, 1, 1) is an
example used in our simulations.
 2) Offensive strategy:
W_a > max(W_s, W_b) and min(W_s, W_b) > max(W_d, W_g).
(W_a, W_s, W_b, W_d, W_g) = (2, 1.5, 1.5, 1, 1) is an example used in our simulations.
 3) Defensive strategy:
W_a > max(W_d, W_g) and min(W_d, W_g) > max(W_s, W_b).
(W_a, W_s, W_b, W_d, W_g) = (2, 1, 1, 1.5, 1.5) is an example used in our simulations.
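The three strategies above amount to a small lookup from strategy name to role weights. The sketch below simply restates the slide's example weight vectors; the dictionary layout and function name are illustrative choices, not part of the paper.

```python
# Example weight vectors (W_a, W_s, W_b, W_d, W_g) taken from the slide.
# The keys follow the role order: attacker, sidekick, backup, defender, goalkeeper.
STRATEGY_WEIGHTS = {
    "normal":    {"W_a": 1.0, "W_s": 1.0, "W_b": 1.0, "W_d": 1.0, "W_g": 1.0},
    "offensive": {"W_a": 2.0, "W_s": 1.5, "W_b": 1.5, "W_d": 1.0, "W_g": 1.0},
    "defensive": {"W_a": 2.0, "W_s": 1.0, "W_b": 1.0, "W_d": 1.5, "W_g": 1.5},
}

def select_weights(strategy):
    """Return the role weights for the chosen strategy."""
    return STRATEGY_WEIGHTS[strategy]
```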
SYSTEM FORMATION-LEARNING SYSTEM(1/3)
Fig.12
SYSTEM FORMATION-LEARNING SYSTEM(2/3)
 1) States:
Fig.13
 2) Actions: The actions of Q-learning are spontaneous decisions on
the strategies taken in each learning cycle. Each action is
represented by a set of weights.
SYSTEM FORMATION-LEARNING SYSTEM(3/3)
 3) Reward Function:
— Gain a point: r = 1.
— Lose a point: r = −1.
— Others: r = 0.
 4) Q-Learning: Based on the states, actions, and reward function, we
can fully implement the Q-learning method.
 Here, the ε-greedy method is chosen as the action-selection policy, and
the probability of exploration ε is 0.1. The learning rate α is 0.8, and
the discount factor γ is 0.9.
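The slide gives all the pieces of the learner: strategy-valued actions, the ±1 reward, ε = 0.1, α = 0.8, and γ = 0.9. A minimal sketch of how they fit together is shown below; the table-based storage and the omitted state encoding are assumptions for illustration, not the paper's implementation.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.8, 0.9, 0.1            # values reported on the slide
ACTIONS = ["normal", "offensive", "defensive"]   # each action selects a strategy (weight set)

Q = defaultdict(float)   # Q[(state, action)]; the state encoding is not reproduced here

def choose_action(state):
    """Epsilon-greedy selection: explore with probability EPSILON, otherwise exploit."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One-step Q-learning update; reward is +1 (point gained), -1 (point lost), 0 otherwise."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```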
SYSTEM FORMATION-DISPATCHING SYSTEM
 First, we introduce the method used to compute the cost.
 Since the cost for each robot to reach each target is known, we can
compute the total cost of all robots moving to their dispatched
positions.
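Given a cost matrix with one row per robot and one column per target position, the minimum-cost dispatch can be found with a Hungarian-style assignment solver. The sketch below is an assumed illustration that uses Euclidean distance as the cost and SciPy's linear_sum_assignment; the paper's own cost definition and solver implementation may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dispatch(robot_positions, target_positions):
    """Assign each robot to one target so that the total travel cost is minimal.

    Euclidean distance is an assumed cost for illustration; any cost matrix
    with one row per robot and one column per target would work the same way.
    """
    robots = np.asarray(robot_positions, dtype=float)
    targets = np.asarray(target_positions, dtype=float)
    cost = np.linalg.norm(robots[:, None, :] - targets[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # optimal assignment (Hungarian-style)
    return dict(zip(rows.tolist(), cols.tolist())), float(cost[rows, cols].sum())

# Hypothetical example: three robots dispatched to three strategy positions.
assignment, total_cost = dispatch([(0, 0), (1, 2), (3, 1)], [(2, 2), (0, 1), (3, 0)])
```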
EXPERIMENTS(1/4)
 Multiple Strategy Versus the Benchmark
Fig.14
Fig.15
EXPERIMENTS(2/4)
 Multiple Strategy Versus Each Fixed Strategy
Fig.16
Fig.17
EXPERIMENTS(3/4)
 Multiple Strategy Versus Defensive Strategy
Fig.18
Fig.19
EXPERIMENTS(4/4)
 Multiple Strategy Versus Normal Strategy
Fig.20
Fig.21
CONCLUSION
 1) Hierarchical architecture: The system is designed hierarchically,
from basic behaviors to strategies. The basic behaviors can also be
reused in other vehicle systems.
 2) A general learning system platform: If another strategy is
designed, it can easily be added into our learning system without
much alteration. Through the learning process, we can map the state
to the best strategy.
 3) Dynamic and quick role assignment: In this system, the role of
each robot is changeable. We use the linear programming method to
speed up our computation and to find the best dispatch under a
strategy.
REFERENCES
[1] F. Ivancic, “Reinforcement learning in multiagent systems using game theory concepts,” Univ. Pennsylvania,
Philadelphia, Mar. 2001. Tech. Rep. [Online]. Available: http://citeseer.ist.psu.edu/531873.html
[2] V. R. Konda and J. N. Tsitsiklis, “On actor-critic algorithms,” SIAM J. Control Optim., vol. 42, no. 4, pp. 1143–1166,
2003.
[3] Y. Shoham, R. Powers, and T. Grenager, “On the agenda(s) of research on multi-agent learning,” in Artificial Multiagent
Learning: Papers From the 2004 Fall Symposium, S. Luke, Ed. Menlo Park, CA: AAAI Press, Tech. Rep. FS-04-02, 2004,
pp. 89–95.
[4] M. Kaya and R. Alhajj, “Modular fuzzy-reinforcement learning approach with internal model capabilities for multiagent
systems,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 2, pp. 1210–1223, Apr. 2004.
[5] M. C. Choy, D. Srinivasan, and R. L. Cheu, “Cooperative, hybrid agent architecture for real-time traffic signal control,”
IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 33, no. 5, pp. 597–607, Sep. 2003.
[6] K. S. Hwang, S. W. Tan, and C. C. Chen, “Cooperative strategy based on adaptive-learning for robot soccer systems,”
IEEE Trans. Fuzzy Syst., vol. 12, no. 4, pp. 569–576, Aug. 2004.
[7] K. H. Park, Y. J. Kim, and J. H. Kim, “Modular Q-learning based multiagent cooperation for robot soccer,” Robot.
Auton. Syst., vol. 35, no. 2, pp. 109–122, May 2001.
[8] H. P. Huang and C. C. Liang, “Strategy-based decision making of a soccer robot system using a real-time self-organizing fuzzy decision tree,” Fuzzy Sets Syst., vol. 127, no. 1, pp. 49–64, Apr. 2002.
[9] M. Asada and H. Kitano, “The RoboCup Challenge,” Robot. Autonom. Syst., vol. 29, no. 1, pp. 3–12, 1999.
[10] F. S. Hillier and G. J. Lieberman, Introduction to Operations Research. Boston, MA: McGraw-Hill, 2001.
[11] V. Chvatal, Linear Programming. San Francisco, CA: Freeman, 1983.
[12] Accessed on 22nd of March 2003. [Online]. Available: http://www.fira.net/soccer/simurosot/overview.html
[13] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs, NJ:
Prentice-Hall, 1982.
[14] K. S. Hwang, Y. J. Chen, and T. F. Lin, “Q-learning with FCMAC in multi-agent cooperation,” in Proc. Int. Symp.
Neural Netw., 2006, vol. 3971, pp. 599–602.