Transcript Document

Fuzzy Reinforcement Learning Agents
By Ritesh Kanetkar
Systems and Industrial Engineering
Lab Presentation
May 23, 2003
What is an agent?

An agent is a computer system situated in some environment, and
that is capable of autonomous action in this environment in order to
meet its design objectives.

An autonomous agent should be able to act without the direct
intervention of humans or other agents, and should have control
over its own actions and internal state.
COMPUTER INTEGRATED MANUFACTURING LAB
Department of Systems and Industrial Engineering
Why Agents?





Ability to act autonomously
Flexibility, scalability and modularity characteristics
Real-time performance
Suitability for distributed applications
Ability to work co-operatively in teams
Learning in Agents

Supervised Learning
- Neural Network
Unsupervised Learning
- Reinforcement Learning
Supervised vs. Unsupervised

Supervised Learning
- Learning under a skilled teacher
- Learning through presentation of input-output pairs
- Given a set of inputs, attempts to predict the output values

Unsupervised Learning
- No supervisor present
- The only data available comes through feedback
- Learning through evaluation of actions
Reinforcement Learning





Maps states to actions
Input is the current state S1
Output is the selected action
The action changes the state to S2
After the mapping is evaluated, a reinforcement signal is given to the agent
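
The interaction loop described on this slide can be sketched in Python. This is only an illustration, not the presenter's implementation: the two-state toy environment, the `step` function, the reward values, the exploration rate and the simple utility-table update are all assumptions made for the example.

```python
import random

# Hypothetical toy environment: states "S1" and "S2", actions "a1" and "a2".
# The transition and reward rules below are assumptions for illustration only.
def step(state, action):
    """Return (next_state, reinforcement) for taking `action` in `state`."""
    next_state = "S2" if state == "S1" else "S1"
    reinforcement = 1.0 if action == "a1" else 0.0
    return next_state, reinforcement

def run(episodes=200, learning_rate=0.1, explore=0.1, seed=0):
    rng = random.Random(seed)
    actions = ["a1", "a2"]
    # The state-to-action mapping is stored as a table of utilities.
    utility = {(s, a): 0.5 for s in ("S1", "S2") for a in actions}
    state = "S1"
    for _ in range(episodes):
        # Input: current state. Output: selected action.
        if rng.random() < explore:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: utility[(state, a)])
        # The action changes the state, and a reinforcement signal comes back.
        next_state, reinforcement = step(state, action)
        # Move the utility of the chosen mapping toward the received signal.
        old = utility[(state, action)]
        utility[(state, action)] = old + learning_rate * (reinforcement - old)
        state = next_state
    return utility

utility = run()
```

In this toy setting the utility of the rewarded action "a1" rises above that of "a2" in both states, so the greedy choice settles on "a1".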
Reinforcement Learning

Advantages



Less environment-oriented programming
Works in a changing environment
Problems


Large number of possible states
Considers only discrete events (real-world problems are continuous)
How does RL work?
[Figure: a part passes through states S1, S2 and S3. At each stage one of three machines processes it: M1 (T=30), M2 (T=20) or M3 (T=10). The first-stage actions are a1, a2 and a3, the second-stage actions are b1, b2 and b3, and every machine starts with R=0.5.]
Continued



Aim 1: To find the shortest processing time.
Ideal actions: a3 – b3.
Assumptions:
- The action with the highest utility is chosen.
- Each machine bids for the part according to its utility value (initially 0.5 for all machines).
- The winning machine gives part of its utility to the previous winning agent for successfully creating the state for it.
Continued (Rule for reward)

Rule for giving reward to previous winning agent

t (min)   10    20    30
r         0.3   0.2   0.1

Reward from state S0 and S1, say 0.25 for our model.
Continued (Calculations of utility value)
Iteration   State   Machine selected   Utility value
1           S1      M2                 (0.5 - 0.25) + 0.2 = 0.45
2           S1      M1                 (0.5 - 0.25) + 0.1 = 0.35
3           S1      M3                 (0.5 - 0.25) + 0.3 = 0.55
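
The arithmetic in this table can be reproduced with a short Python sketch. The names `PROCESSING_TIME`, `REWARD_BY_TIME`, `PAYMENT_TO_PREVIOUS` and `updated_utility` are hypothetical. The numbers come from the slides: an initial utility of 0.5, a payment of 0.25 to the previous winner, the processing times T shown for M1 to M3, and the reward r from the processing-time rule.

```python
# Processing time (min) of each machine, as shown in the machine figure.
PROCESSING_TIME = {"M1": 30, "M2": 20, "M3": 10}
# Reward r received as a function of processing time t (min).
REWARD_BY_TIME = {10: 0.3, 20: 0.2, 30: 0.1}
# Share of utility the winner passes back to the previous winning agent.
PAYMENT_TO_PREVIOUS = 0.25

def updated_utility(utility, machine):
    """Utility of the selected machine after paying back and being rewarded."""
    reward = REWARD_BY_TIME[PROCESSING_TIME[machine]]
    return round(utility - PAYMENT_TO_PREVIOUS + reward, 2)

utilities = {"M1": 0.5, "M2": 0.5, "M3": 0.5}
# Order in which the machines are selected in the three iterations of the table.
for machine in ["M2", "M1", "M3"]:
    utilities[machine] = updated_utility(utilities[machine], machine)

# The machine with the highest utility wins the next bid.
best = max(utilities, key=utilities.get)
print(utilities, best)
```

After the three iterations M3 holds the highest utility, which agrees with the ideal action of choosing the machine with the shortest processing time.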
Continued (Changes in utility value)
Iterations   M1     M2     M3     M1     M2     M3
0            0.5    0.5    0.5    0.5    0.5    0.5
1            0.5    0.45   0.5
2            0.35   0.45   0.5
3            0.35   0.45   0.55
Use of Fuzzy Logic

Fuzzy logic maps states (the environment) to actions, which removes the restriction to discrete events.

Fuzzy logic integrates the multiple rewards into a single feedback signal.

Because the action space is large, traditional lookup tables cannot be used, so the mapping must be generalized.

Human language can be incorporated.
Problems







Agents as dynamical systems interacting with the environment
Network of agents (multi-agent system)
Multiple reward system
Multiple criteria systems
Continuous events system
Large state space in real world problems
Bargaining problems
Fuzzy Inference System (FIS)
FIS

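FIS rule base is made of N rules:
$R_i$: if $s_1 = L_1^i$ and ... and $s_{N_I} = L_{N_I}^i$
then $y_1 = O_1^i$ and ... and $y_{N_O} = O_{N_O}^i$
where
$s = (s_1, \dots, s_{N_I})$ = input vector
$R_i$ = i-th rule
$L_j^i$ = fuzzy label of input $j$ in rule $i$
$y = (y_1, \dots, y_{N_O})$ = output vector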
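
How the premise of one such rule could be evaluated is sketched below in Python. The slides do not specify the membership functions or the AND operator, so the triangular shapes, the label names and breakpoints in `LABELS`, and the use of the product as AND are all assumptions for illustration.

```python
def triangular(x, left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy labels for inputs scaled to [0, 1].
LABELS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

def rule_truth(inputs, rule_labels):
    """Truth value of 'if s1 is L1 and ... and sn is Ln', using product as AND."""
    truth = 1.0
    for value, label in zip(inputs, rule_labels):
        truth *= triangular(value, *LABELS[label])
    return truth

# Rule premise: if s1 is low and s2 is medium.
alpha = rule_truth([0.25, 0.5], ["low", "medium"])
```

Here `alpha` is the truth value of the rule for the input (0.25, 0.5): the first input is half "low", the second is fully "medium", and the product combines the two.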
Fuzzy Inference System (FIS)

Layer 1: Input layer
- Defines the input variables needed to describe the states completely.
Layer 2: Linguistic labels
- This layer does the fuzzification process.
Layer 3: Rules
- This layer defines the if-then rules, giving rule truth values.
Layer 4: Output layer
- Gives the FIS output.
Assumptions




The number of input variables and fuzzy labels is selected depending on the problem.
The number of rules is determined by the number of elements in the first two layers (the product of the number of labels for each input variable).
Each rule has a predefined number of outputs.
So the only difficult part left is finding the conclusion for every possible combination (the rule conclusions).
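
The rule count stated above can be checked with a few lines of Python. The input names and labels in `labels_per_input` are hypothetical; only the counting rule (product of the number of labels per input) comes from the slide.

```python
from itertools import product
from math import prod

# Hypothetical inputs and their fuzzy labels.
labels_per_input = {
    "slack_time": ["negative", "zero", "positive"],
    "processing_time": ["short", "long"],
}

# One rule premise per combination of labels, taking one label per input.
premises = list(product(*labels_per_input.values()))
# Number of rules = product of the number of labels for each input variable.
rule_count = prod(len(labels) for labels in labels_per_input.values())
```

With three labels on one input and two on the other, the rule base has 3 x 2 = 6 premises, and each premise still needs a conclusion.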
What does it do?



Maps states to actions.
Rules can be formulated in human language.
Each rule contains:
- A value Vi used to approximate the optimal evaluation function.
- An action set Ui.
- A parameter vector wi giving the weight of each action in the rule, used to approximate the policy.
The final output is the weighted average of all the actions.
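
One way to read the weighted average is sketched below in Python. It assumes that each rule proposes the action with the largest weight in its parameter vector and that the rule truth values serve as the averaging weights. Both choices, and the name `global_action`, are assumptions for illustration and not details given in the slides.

```python
def global_action(rules, truth_values):
    """Weighted average of the actions proposed by the rules.

    rules: list of dicts with 'actions' (candidate actions of the rule) and
           'weights' (one weight per candidate action).
    truth_values: truth value of each rule for the current state.
    """
    total = sum(truth_values)
    if total == 0:
        raise ValueError("no rule is active for this state")
    output = 0.0
    for rule, alpha in zip(rules, truth_values):
        # Assumption: each rule proposes its highest-weighted action.
        best = max(range(len(rule["actions"])), key=lambda k: rule["weights"][k])
        output += alpha * rule["actions"][best]
    return output / total

rules = [
    {"actions": [0.0, 1.0], "weights": [0.2, 0.8]},  # proposes 1.0
    {"actions": [0.0, 1.0], "weights": [0.6, 0.4]},  # proposes 0.0
]
u = global_action(rules, truth_values=[0.75, 0.25])
```

The more strongly a rule fires for the current state, the more its proposed action contributes to the final output.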
FIS Output
[Figure: FIS output, showing primary reinforcement and internal reinforcement through the critic.]
Procedure






Estimate the evaluation function corresponding to the current state: $V_t(S_{t+1}) = v_t \cdot \Phi_{t+1}$
Compute the TD error $\varepsilon_{t+1}$.
Tune the parameters $v$ and $w$.
Estimate the new evaluation function with the new conclusion vector $v_{t+1}$.
Update the learning rate.
Compute and trigger the global action $U_{t+1}$.
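
The first steps of this procedure can be sketched in Python for the critic part only. The slides do not give the TD error formula or the update rule, so this sketch assumes the standard TD(0) form with a discount factor `gamma` and a fixed learning rate `beta`, and it leaves out the tuning of w and the learning rate update. The name `critic_step` is hypothetical.

```python
def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))

def critic_step(v, phi_t, phi_next, reward, gamma=0.9, beta=0.1):
    """One assumed TD(0) update of the conclusion vector v.

    v:        rule values v_t (one per rule)
    phi_t:    rule truth values for state S_t
    phi_next: rule truth values for state S_{t+1}
    """
    value_t = dot(v, phi_t)        # V_t(S_t)
    value_next = dot(v, phi_next)  # V_t(S_{t+1}) = v_t . Phi_{t+1}
    td_error = reward + gamma * value_next - value_t
    # Move each rule value in proportion to how active the rule was in S_t.
    v_new = [vi + beta * td_error * p for vi, p in zip(v, phi_t)]
    return v_new, td_error

v, td_error = critic_step(
    v=[0.0, 0.0], phi_t=[1.0, 0.0], phi_next=[0.0, 1.0], reward=1.0
)
```

In this example only the first rule was active in the previous state, so only its value moves, by `beta` times the TD error.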
Problem





Single machine scheduling problem
3 parts
Each part has its own earliness-tardiness penalties, due date and processing time
19 time slots on the machine
Minimize the deviation from due dates, reducing the penalties
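
The objective can be made concrete with a small Python sketch that scores every sequence of the three parts. The slides do not list the actual part data, so every number in `PARTS` is hypothetical. The sketch also assumes that parts run back to back from time 0 with no idle time, which is a simplification.

```python
from itertools import permutations

# Hypothetical data: the slides do not give the actual values.
# part: (processing_time, due_date, earliness_penalty, tardiness_penalty)
PARTS = {
    "P1": (4, 6, 1.0, 3.0),
    "P2": (5, 12, 1.0, 2.0),
    "P3": (6, 16, 2.0, 4.0),
}
TIME_SLOTS = 19  # time slots available on the machine

def schedule_cost(sequence):
    """Total earliness/tardiness penalty when parts run back to back from time 0."""
    clock, cost = 0, 0.0
    for part in sequence:
        processing, due, early_pen, tardy_pen = PARTS[part]
        clock += processing  # completion time of this part
        if clock > TIME_SLOTS:
            return float("inf")  # sequence does not fit on the machine
        cost += early_pen * max(0, due - clock) + tardy_pen * max(0, clock - due)
    return cost

# Three parts give only 3! = 6 sequences, so all of them can be checked.
best_sequence = min(permutations(PARTS), key=schedule_cost)
best_cost = schedule_cost(best_sequence)
```

Exhaustive search is practical only because there are three parts; it serves here as a reference against which a learned schedule could be compared.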
Work in progress




Currently working with the single machine scheduling problem with earliness/tardiness penalties and due dates.
Identifying the various parameters.
Understanding the mathematics behind the FIS.
Incorporating a bargaining model in the FIS.
Thank You