Learning Automata
Instructor: Saeed Shiry
1
Introduction
 An automaton is a machine or control mechanism designed to automatically follow a predetermined sequence of operations or respond to encoded instructions.
 The concept of the learning automaton grew out of a fusion of the work of psychologists in modeling observed behavior, the efforts of statisticians to model the choice of experiments based on past observations, the attempts of operations researchers to implement optimal strategies in the context of the two-armed bandit problem, and the endeavors of system theorists to make rational decisions in random environments.
2
Stochastic Learning Automata–
Reinforcement Learning
3
Stochastic Learning Automata–
Reinforcement Learning
 In classical control theory, the control of a process is based on complete knowledge of the process/system. The mathematical model is assumed to be known, and the inputs to the process are deterministic functions of time.
 Later developments in control theory considered the uncertainties present in the system.
 Stochastic control theory assumes that some of the characteristics of the uncertainties are known. However, all those assumptions on uncertainties and/or input functions may be insufficient to successfully control the system if the system changes.
 It is then necessary to observe the process in operation and obtain further knowledge of the system, i.e., additional information must be acquired on-line, since a priori assumptions are not sufficient.
 One approach is to view these as problems in learning.
4
reinforcement learning
 A crucial advantage of reinforcement learning compared to other learning approaches is that it requires no information about the environment except for the reinforcement signal.
 A reinforcement learning system is slower than other approaches for most applications, since every action needs to be tested a number of times for a satisfactory performance.
 Either the learning process must be much faster than the environment changes, or the reinforcement learning must be combined with an adaptive forward model that anticipates the changes in the environment.
5
applications of learning
automata
 Some recent applications of learning automata to real-life problems:
   control of absorption columns,
   bioreactors,
   control of manufacturing plants,
   pattern recognition,
   graph partitioning,
   active vehicle suspension,
   path planning for manipulators,
   distributed fuzzy logic processor training,
   path planning and action selection for autonomous mobile robots.
6
learning paradigm
 The learning paradigm presented by the learning automaton may be stated as follows: a finite number of actions can be performed in a random environment.
   When a specific action is performed, the environment provides a random response which is either favorable or unfavorable.
 The objective in the design of the automaton is to determine how the choice of the action at any stage should be guided by past actions and responses (see the sketch after this slide).
 The important point to note is that the decisions must be made with very little knowledge concerning the “nature” of the environment.
   The uncertainty may be due to the fact that the output of the environment is influenced by the actions of other agents unknown to the decision maker.
7
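The interaction just described can be written as a simple loop. The sketch below is only a Python skeleton of that loop; choose_action, environment_response and update are placeholders to be supplied by the designer, not functions defined in the lecture:

def run_automaton(choose_action, environment_response, update, steps=100):
    # Skeleton of the automaton-environment interaction: at each stage an
    # action is chosen, the environment returns a favorable (0) or
    # unfavorable (1) response, and the response guides future choices.
    history = []
    for n in range(steps):
        a = choose_action(history)       # decision guided by past actions and responses
        b = environment_response(a)      # random favorable/unfavorable response
        update(history, a, b)            # learning step (reinforcement schemes come later)
        history.append((a, b))
    return history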
The automaton and the
environment
8
The environment
 The environment in which the automaton “lives” responds to the action of the automaton by producing a response, belonging to a set of allowable responses, which is probabilistically related to the automaton action.
 The term environment is not easy to define in the context of learning automata. The definition encompasses a large class of unknown random media in which an automaton can operate.
9
The environment
 Mathematically, an environment is represented by a triple {a, c, b}, where:
   a represents a finite action/output set,
   b represents a (binary) input/response set, and
   c is a set of penalty probabilities, where each element ci corresponds to one action ai of the set a.
10
The environment
 The output (action) a(n) of the automaton belongs to the set a, and is applied to the environment at time t = n.
 The input b(n) from the environment is an element of the set b and can take on one of the values b1 and b2.
   In the simplest case, the values bi are chosen to be 0 and 1, where 1 is associated with the failure/penalty response.
 The elements of c are defined as:
   ci = Prob{ b(n) = 1 | a(n) = ai }   (i = 1, 2, ...)
 Therefore ci is the probability that the action ai will result in a penalty input from the environment.
 When the penalty probabilities ci are constant, the environment is called a stationary environment (see the sketch after this slide).
11
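As a concrete illustration of these definitions, a minimal Python sketch of a stationary P-model environment (the class name and the penalty values are illustrative, not from the lecture):

import random

class StationaryEnvironment:
    # Stationary P-model environment: action i draws the response
    # b = 1 (penalty) with probability c[i] and b = 0 (reward) otherwise.
    def __init__(self, c):
        self.c = c                                   # penalty probabilities, one per action

    def response(self, i):
        return 1 if random.random() < self.c[i] else 0

env = StationaryEnvironment([0.7, 0.2, 0.5])         # e.g., three actions
print(env.response(1))                               # 1 with probability 0.2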
Models
 P-model
   Models in which the input from the environment can take only one of two values, 0 or 1, are referred to as P-models. In this simplest case, the response value of 1 corresponds to an “unfavorable” (failure, penalty) response, while an output of 0 means the action is “favorable”.
 Q-model
   A further generalization of the environment allows finite response sets with more than two elements that may take a finite number of values in an interval [a, b]. Such models are called Q-models.
 S-model
   When the input from the environment is a continuous random variable with possible values in an interval [a, b], the model is named the S-model.
12
The automaton
 The automaton can be represented by a quintuple {F, a, b, F(•,•), H(•,•)}, where:
 F is a set of internal states. At any instant n, the state f(n) is an element of the finite set F = {f1, f2, ..., fs}.
 a is a set of actions (or outputs of the automaton). The output or action of the automaton at the instant n, denoted by a(n), is an element of the finite set a = {a1, a2, ..., ar}.
 b is a set of responses (or inputs from the environment). The input from the environment b(n) is an element of the set b, which could be either a finite set or an infinite set, such as an interval on the real line: b = {b1, b2, ..., bm} or b = {(a, b)}.
13
The automaton
 F(•,•): F × b → F is a function that maps the current state and input into the next state. F can be deterministic or stochastic:
   f(n+1) = F[f(n), b(n)]
 H(•,•): F × b → a is a function that maps the current state and input into the current output.
 If the current output depends only on the current state, the automaton is referred to as a state-output automaton. In this case, the function H(•,•) is replaced by an output function G(•): F → a, which can be either deterministic or stochastic:
   a(n) = G[f(n)]
14
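A minimal sketch of one step of a (deterministic) state-output automaton in this notation; the transition table below is illustrative, not taken from the lecture:

def automaton_step(state, b, F, G):
    # f(n+1) = F[f(n), b(n)] followed by a(n+1) = G[f(n+1)].
    next_state = F[(state, b)]      # transition table F: (state, input) -> next state
    action = G[next_state]          # output function G: state -> action
    return next_state, action

F = {("f1", 0): "f1", ("f1", 1): "f2", ("f2", 0): "f2", ("f2", 1): "f1"}
G = {"f1": "a1", "f2": "a2"}
print(automaton_step("f1", 1, F, G))   # -> ('f2', 'a2')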
The Stochastic Automaton
 In a stochastic automaton, at least one of the two mappings F and G is stochastic.
 If the transition function F is stochastic, the elements fij^b of F represent the probability that the automaton moves from state fi to state fj following an input b:
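In the notation above, the standard form of this transition probability is:

  fij^b = Prob{ f(n+1) = fj | f(n) = fi, b(n) = b },   i, j = 1, 2, ..., s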
15
The Stochastic Automaton
 For the mapping G, the definition is similar.
 Since the fij^b are probabilities, they lie in the closed interval [0, 1]; and to conserve probability measure we must have:
   Σj fij^b = 1   for every i and every input b
16
The Stochastic Automaton
17
Automaton and Its
Performance Evaluation
 A learning automaton generates a sequence of actions on the basis of its interaction with the environment.
 If the automaton is “learning” in the process, its performance must be superior to “intuitive” methods.
 To judge the performance of the automaton, we need to set up quantitative norms of behavior.
 The quantitative basis for assessing the learning behavior is quite complex, even in the simplest P-model and stationary random environments.
 To introduce the definitions for “norms of behavior”, we will consider this simplest case.
18
Norms of Behavior
 If no prior information is available, there is no basis on which the different actions ai can be distinguished.
 In such a case, all action probabilities would be equal, a “pure chance” situation.
 For an r-action automaton, the action probability vector p(n), with pi(n) = Pr{a(n) = ai}, is then given by:
   pi(n) = 1/r,   i = 1, 2, ..., r
 Such an automaton is called the “pure chance automaton,” and will be used as the standard for comparison.
19
Norms of Behavior
 Consider a stationary random environment with penalty probabilities {c1, c2, ..., cr}.
 We define a quantity M(n) as the average penalty for a given action probability vector:
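In the P-model notation above, the standard form of this quantity is:

  M(n) = E[ b(n) | p(n) ] = Σi ci·pi(n)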
20
Norms of Behavior
 For the pure-chance automaton, M(n) is a constant, denoted by M0:
   M0 = (1/r) Σi ci
 Also note that:
   E[ M(n) ] = E[ b(n) ]
 i.e., E[M(n)] is the average input to the automaton.
21
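As a numerical illustration (values chosen only for the example): with r = 2 and penalty probabilities c1 = 0.2 and c2 = 0.6, the pure-chance automaton incurs M0 = (0.2 + 0.6)/2 = 0.4. A learning automaton is then said to be expedient if its asymptotic average penalty, lim E[M(n)] as n → ∞, is smaller than this pure-chance value M0.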
Norms of Behavior
22
Variable Structure Automata
 A more flexible learning automaton model can be created by considering more general stochastic systems in which the action probabilities (or the state transitions) are updated at every stage using a reinforcement scheme.
 For simplicity, we assume that each state corresponds to one action, i.e., the automaton is a state-output automaton.
reinforcement scheme
 A reinforcement scheme can be represented as follows:
 where T1 and T2 are mappings.
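In the usual notation, the update of the action probability vector can be written as

  p(n+1) = T1[ p(n), a(n), b(n) ]

with T2 denoting the corresponding mapping used when the state-transition probabilities, rather than the action probabilities, are updated (the exact arguments of T2 depend on the formulation).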
Linear Reinforcement
Schemes
General linear schemes:
 The parameter a is associated with the reward response, and the parameter b with the penalty response. If the learning parameters a and b are equal, the scheme is called the linear reward-penalty scheme LR-P.
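For an r-action automaton that selected action ai at stage n, the standard form of the general linear scheme is:

  reward (b(n) = 0):   pi(n+1) = pi(n) + a·[1 - pi(n)],   pj(n+1) = (1 - a)·pj(n)   for all j ≠ i
  penalty (b(n) = 1):  pi(n+1) = (1 - b)·pi(n),   pj(n+1) = b/(r - 1) + (1 - b)·pj(n)   for all j ≠ i

A minimal Python sketch of this update (the function name and parameter values are illustrative; setting b = 0 gives the reward-inaction scheme discussed later):

def lrp_update(p, i, beta, a=0.1, b=0.1):
    # One step of the linear reward-penalty update for the chosen action i.
    # beta = 0 means reward, beta = 1 means penalty (P-model).
    # Setting b = 0 turns this into the linear reward-inaction (LR-I) scheme.
    r = len(p)
    q = list(p)
    if beta == 0:                        # reward: shift probability toward action i
        for j in range(r):
            q[j] = (1 - a) * p[j]
        q[i] = p[i] + a * (1 - p[i])
    else:                                # penalty: shift probability away from action i
        for j in range(r):
            q[j] = b / (r - 1) + (1 - b) * p[j]
        q[i] = (1 - b) * p[i]
    return q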
Linear Reinforcement
Schemes
 By analyzing the eigenvalues of the resulting difference equation, it can be shown that the asymptotic solution of the set of difference equations enables us to conclude:
 Therefore, the multi-action automaton using the LR-P scheme is expedient for all initial action probabilities and in all stationary random environments.
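For instance, in the two-action case the asymptotic action probabilities of the LR-P scheme satisfy E[p1(∞)] = c2/(c1 + c2), so the asymptotic average penalty is 2·c1·c2/(c1 + c2), which is never larger than the pure-chance value M0 = (c1 + c2)/2; hence the automaton is expedient.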
Expediency
 Expediency is a relatively weak condition on the learning behavior of a variable-structure automaton.
 An expedient automaton will do better than a pure-chance automaton, but it is not guaranteed to reach the optimal solution.
 In order to obtain a better learning mechanism, the parameters of the linear reinforcement scheme are changed as follows:
   If the learning parameter b is set to 0, then the scheme is named the linear reward-inaction scheme LR-I.
   This means that the action probabilities are updated in the case of a reward response from the environment, but no penalties are assessed (a short simulation sketch follows below).
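A short usage sketch combining the lrp_update function above with illustrative penalty probabilities (values chosen only for the example); with b = 0 the LR-I automaton concentrates its probability on the action with the lowest penalty probability:

import random

c = [0.7, 0.2, 0.5]                                # illustrative penalty probabilities; action 1 is best
p = [1/3, 1/3, 1/3]                                # start from the pure-chance vector
for n in range(5000):
    i = random.choices(range(3), weights=p)[0]     # choose an action according to p(n)
    beta = 1 if random.random() < c[i] else 0      # P-model response from the environment
    p = lrp_update(p, i, beta, a=0.05, b=0.0)      # LR-I: reward-only update
print(p)                                           # p[1] is typically close to 1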
Interconnected Automata
 It is possible that there is more than one automaton in an environment.
 If the interaction between different automata is provided by the environment, the multi-automata case is not different from the single-automaton case: the environment reacts to the actions of multiple automata, and the environment output is a result of the combined effect of the actions chosen by all automata.
 If there is direct interaction between the automata, as in hierarchical (or sequential) automata models, the actions of some automata directly depend on the actions of others.
 It is generally recognized that the potential of learning automata can be increased if specific rules for interconnections can be established.
Example: A Vehicle Control

Since each vehicle’s planning layer will include two automata — one for lateral, the
other for longitudinal actions — the interdependence of these two sets of actions
automatically results in an interconnected automata network.
Application of Learning Automata
to Intelligent Vehicle Control
 Designing a system that can safely control a vehicle’s actions while contributing to the optimal solution of the congestion problem is difficult.
 When the design of a vehicle capable of carrying out tasks such as vehicle following at high speeds, automatic lane tracking, and lane changing is complete, we must also have a control/decision structure that can intelligently make decisions in order to operate the vehicle in a safe way.
Vehicle Control

The aim here is to design an automata
system that can learn the best possible action
(or action pairs: one for lateral, one for
longitudinal) based on the data received from
on-board sensors.
The Model
 For our model, we assume that an intelligent vehicle is capable of two sets of lateral and longitudinal actions.
   Lateral actions are shift-to-left-lane (SL), shift-to-right-lane (SR) and stay-in-lane (SiL).
   Longitudinal actions are accelerate (ACC), decelerate (DEC) and keep-same-speed (SM).
 There are nine possible action pairs, provided that speed deviations during lane changes are allowed.
Sensors
 An autonomous vehicle must be able to ‘sense’ the environment around itself.
 In the simplest case, it is to be equipped with at least one sensor looking in the direction of possible vehicle moves.
 Furthermore, an autonomous vehicle must also have knowledge of the rate of its own displacement.
 Therefore, we assume that there are four different sensors on board the vehicle: a headway sensor, two side sensors, and a speed sensor.
   The headway sensor is a distance-measuring device which returns the headway distance to the object in front of the vehicle. An implementation of such a device is a laser radar.
   Side sensors are assumed to be able to detect the presence of a vehicle traveling in the immediately adjacent lane. Their outputs are binary. Infrared or sonar detectors are currently used for this type of sensor.
   The speed sensor is simply an encoder returning the current wheel speed of the vehicle.
Automata in a multi-teacher
environment connected to the
physical layers
Mapping
 The mapping F from sensor module outputs to the input b of the automata can be a binary function (for a P-model environment), a linear combination of the four teacher outputs, or a more complex function, as is the case for this application.
 An alternative and possibly more ideal model would use a linear combination of teacher outputs with adjustable weight factors (e.g., an S-model environment).
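A minimal sketch of such a combination (the weights, the normalization and the function name are illustrative assumptions, not taken from the thesis):

def combined_reinforcement(teacher_outputs, weights):
    # Combine the four teacher outputs (each in [0, 1], 1 = strongest penalty vote)
    # into a single S-model style response in [0, 1] using adjustable weights.
    total = sum(w * t for w, t in zip(weights, teacher_outputs))
    return total / sum(weights)

# e.g., headway, left-side, right-side and speed teachers
b = combined_reinforcement([1, 0, 0, 1], weights=[0.4, 0.2, 0.2, 0.2])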
buffer in regulation layer

The regulation layer is not expected to carry
out the action chosen immediately. This is not
even possible for lateral actions. To smooth
the system output, the regulation layer carries
out an action if it is recommended m times
consecutively by the automaton, where m is a
predefined parameter less than or equal to
the number of iterations per second.
35
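A minimal sketch of this buffering rule (the class name and the value of m are illustrative):

class ActionBuffer:
    # Pass an action on to the regulation layer only after it has been
    # recommended m times in a row by the planning automaton.
    def __init__(self, m=3):
        self.m = m
        self.last = None
        self.count = 0

    def push(self, action):
        if action == self.last:
            self.count += 1
        else:
            self.last = action
            self.count = 1
        return action if self.count >= self.m else None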
References
 PhD thesis: Unsal, Cem, "Intelligent Navigation of Autonomous Vehicles in an Automated Highway System: Learning Methods and Interacting Vehicles Approach",
 http://scholar.lib.vt.edu/theses/available/etd5414132139711101/
 http://ceit.aut.ac.ir/~shiry/lecture/machinelearning/tutorial/LA/
36
Cellular Learning Automata
Amirkabir University - Machine Learning Course
Cellular Automata (CA)
 A cellular automaton is a network of cells arranged in a specified topology.
 For each cell, certain other cells are defined as its neighbors.
   Two properties of the neighborhood: 1) every cell is a neighbor of itself; 2) if cell x is a neighbor of cell y, then cell y is also a neighbor of cell x.
 Each cell has a finite number of states, and at any moment each cell is in exactly one of these states.
38
 In a CA, a set of local rules is defined that determines the next state of each cell based on the states of its neighbors.
 Figure: Neighborhood + Rules → Next Step
39
The basic properties of cellular automata are:
 a) The space is discrete.
 b) Time advances in discrete steps.
 c) The states that cells can take are finite in number.
 d) All cells are identical.
 e) Cells are updated synchronously.
 f) The rules are not random; they are applied deterministically.
 g) The rule at each location depends only on the values of its neighbors.
40
Example: Game of Life
Rules:
 1. Any cell with two or three live neighbors stays alive.
 2. Any cell with four or more live neighbors dies of overcrowding.
 3. Any cell with one or no live neighbors dies of loneliness.
 4. Any dead cell with exactly three live neighbors comes to life.
41
Capabilities and shortcomings of cellular automata
 A CA is a model built from simple, identical components governed by very simple local rules, yet it can ultimately model complex systems.
 A major drawback of CA is determining the exact form of the rules required for a particular application; moreover, CA is suited to modeling deterministic systems.
 We should therefore look for a method in which, without having to specify the exact form of the rules, suitable rules are extracted over time. Making the CA cells intelligent and adding a learning capability to them is one such method!
42
Cellular Learning Automata
 A CLA is a CA in which every cell is equipped with an LA.
 This model is better than a CA because of the learning capability it possesses.
 It is also superior to an LA, since it is a collection of LAs that can interact with one another.
 The main idea of CLA is to use learning to adjust how state transitions take place in the CA.
43
Mathematical definition of cellular learning automata
 A d-dimensional cellular learning automaton is a tuple
   CLA = (Z^d, f, A, N, F)
 such that:
 Z^d is a lattice of ordered d-tuples of integers. This lattice can be a finite, semi-finite, or infinite lattice.
 f is a finite set of states.
 A is a set of learning automata (LAs), with one learning automaton assigned to each cell.
 N = {x1, ..., xm} is a finite subset of Z^d, called the neighborhood vector.
 F: f^m → b is the local rule of the CLA, where b is the set of values that can be accepted as the reinforcement signal.
44
Applications of cellular learning automata
 Image processing (noise removal and edge detection)
 Resource allocation in mobile networks (channel assignment and channel admission)
 Modeling phenomena (rumor spreading and economic markets)
 Computational grids
 Design of multi-agent systems
 Optimization algorithms
45
Application of CLA in image processing
 First, the image is mapped onto a two-dimensional cellular learning automaton, so that each pixel is assigned to one cell of the learning automaton.
 Depending on the particular application, the possible actions of the cells and the local penalty/reward rule are determined.
46
Example: detecting image edges using CLA
 Each automaton has two actions: 1) the pixel belongs to an edge; 2) the pixel does not belong to an edge.
 Initially, each automaton selects one of its actions at random; the number of automata selecting the first action is taken to be smaller than the number selecting the second action.
 At each iteration, every automaton compares its situation with that of its neighbors and corrects its behavior based on this comparison, on the basis that:
   • If two to four of a cell's neighboring automata are on an edge, this cell is probably on an edge too.
   • If one, or more than four, of a cell's neighboring automata are not on an edge, this cell is probably not on an edge either.
47
Accordingly:
 • If a cell in the CLA selects its first action and the number of automata in its 8-cell neighborhood that have selected the same action is between two and four, the selected action is considered appropriate and is rewarded.
 • If a cell in the CLA selects its second action and the number of automata in its 8-cell neighborhood that have selected the same action is one or more than four, the selected action is considered appropriate and is rewarded.
 • Any other case means that the selected action was wrong and it is penalized.
 The above procedure is repeated a fixed number of times, or until all of the automata reach a stable state.
48
Performance of the CLA method in edge extraction
49
References
 H. Beigy and M. R. Meybodi, "A Mathematical Framework for Cellular Learning Automata", Advances in Complex Systems, 2004.
 M. R. Meybodi and M. R. Kharazmi, "Cellular Learning Automata and Its Applications in Image Processing", Amirkabir University, Fall 1382 (2003).
50