AI – Week 22
Sub-symbolic AI Two: An Introduction to Reinforcement Learning
Lee McCluskey, room 3/10
Email [email protected]
http://scom.hud.ac.uk/scomtlm/cha2555/
Reinforcement Learning
Resources
Support Resources:
A 5-minute introduction:
http://www.youtube.com/watch?v=m2weFARriE8
See the first 10-20 minutes of this one:
http://www.youtube.com/watch?v=ifma8G7LegE
Longer video sequence (homework):
http://videolectures.net/mlss08au_szepesvari_rele/
To Read:
http://www.nbu.bg/cogs/events/2000/Readings/Petrov/rltutorial.pdf
Reinforcement Learning
• Reinforcement learning is defined by characterizing a learning problem, not by characterizing learning methods.
• Reinforcement learning differs from supervised learning, the kind of learning studied in most current research, e.g. in machine learning, statistical pattern recognition, and artificial neural networks.
Definition of Terms
• Policy
• Reward function
• Value function
• Model of the environment (optional)
Policy
• A policy defines the learning agent's way of
behaving at a given time. It is a mapping from
perceived states of the environment to actions
to be taken when in those states.
• It corresponds to what in psychology would be called a set of stimulus-response rules or associations.
• The policy is the core of any reinforcement learning agent.
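
A policy can be pictured as nothing more than a lookup table from states to actions. A minimal sketch in Python; the state and action names below are invented for illustration:

# A deterministic policy as a lookup table: each perceived state
# maps to the action taken in that state.
# (State and action names here are hypothetical.)
policy = {
    "start":     "move_right",
    "corridor":  "move_right",
    "junction":  "move_up",
    "near_goal": "move_up",
}

def act(state):
    """Return the action the policy prescribes for this state."""
    return policy[state]

print(act("junction"))  # -> move_up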
Reward Function
• A reward function defines the goal in a
reinforcement learning problem.
• It maps each perceived state (or state-action pair)
of the environment to a single number, a reward,
indicating the intrinsic desirability of that state.
• A reinforcement learning agent's sole objective is to
maximize the accumulated reward over a period of
time.
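
A reward function can likewise be sketched as a function from states to numbers. The states and reward values below are made up for illustration:

# A reward function maps each perceived state to a single number
# indicating the intrinsic desirability of that state.
# (State names and reward values are hypothetical.)
def reward(state):
    if state == "goal":
        return 10.0   # reaching the goal is desirable
    if state == "pit":
        return -10.0  # falling into the pit is undesirable
    return -1.0       # a small cost per step encourages short paths

print(reward("goal"))  # -> 10.0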
Value Function
• The reward accumulated over a period of time is known as the value; a value function estimates this long-run value for each state.
• Whereas a reward function indicates what is good in an immediate sense, a value function specifies what is good in the long run.
• Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow, and the rewards available in those states.
• Rewards are in a sense primary, whereas values, as predictions of rewards, are secondary. Without rewards there could be no values, and the only purpose of estimating values is to achieve more reward.
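
One common way to make "reward accumulated over a period of time" precise is the discounted return, where each future reward is weighted by a factor gamma between 0 and 1. A minimal sketch, with an invented episode of rewards:

# The discounted return for one episode: each future reward is
# weighted by gamma, so nearer rewards count for more.
def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for r in reversed(rewards):  # fold from the end of the episode
        g = r + gamma * g
    return g

# Rewards observed along one hypothetical episode:
episode = [-1.0, -1.0, -1.0, 10.0]
print(discounted_return(episode))  # -> about 4.58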
Model of the Environment
• This represents and mimics the behaviour of the
environment.
• Models are used for planning, by which we mean
any way of deciding on a course of action by
considering possible future situations before they
are actually experienced.
• Models are optional.
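
At its simplest, a model is a function that predicts the next state and reward for a given state-action pair, so the agent can try out actions "in its head". A sketch with a made-up transition table:

# A model predicts (next_state, reward) for a (state, action) pair,
# letting the agent "imagine" outcomes before experiencing them.
# (The transition table is hypothetical.)
model = {
    ("start", "move_right"):    ("corridor", -1.0),
    ("corridor", "move_right"): ("junction", -1.0),
    ("junction", "move_up"):    ("goal", 10.0),
}

def simulate(state, action):
    return model[(state, action)]

# Planning means trying actions in the model rather than the world:
print(simulate("junction", "move_up"))  # -> ('goal', 10.0)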
General Idea
[Diagram: the agent SENSEs the state of the Environment and EFFECTs ACTIONs upon it, receiving REWARDs]
Represent the world as an agent interacting with
the environment:
• Conduct “Trial and Error” or “Sampling” experiments in order to
solve some goal.
• Sense a reward (or negative reinforcement) as a result of some
behaviour that moves towards the goal.
• Add more weight to using that behaviour (or less weight) and
continue trials.
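
This sense-effect-reward cycle can be sketched as a simple loop. The Environment class below is a stand-in invented for illustration, not any particular library:

import random

class Environment:
    """A stand-in environment: step() returns (next_state, reward, done)."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Move left (-1) or right (+1) along positions 0..3.
        self.pos = max(0, min(3, self.pos + action))
        reward = 1.0 if self.pos == 3 else -0.1   # SENSE a reward
        return self.pos, reward, self.pos == 3

env = Environment()
for trial in range(5):                            # repeated trial-and-error
    state, done, total = env.reset(), False, 0.0
    while not done:
        action = random.choice([-1, 1])           # sample a behaviour
        state, reward, done = env.step(action)    # EFFECT an action
        total += reward
    print(f"trial {trial}: accumulated reward {total:.1f}")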
Reinforcement Learning
RL – the idea is pervasive
Reinforcement Learning
Output of RL
The goal of RL is to learn a mapping
SITUATION => ACTION
which optimises the rewards obtained.
Note connection with AI Planning:
• Situation = Goal, State, Actions
• Input to MetricFF, output solution
• Action is head(solution). Assuming solution is optimal, this is the
best action to take.
RL “comes into its own” when the conditions for using a planner are not met, e.g. a partially observable state, or actions that are not well specified.
Reinforcement Learning
Challenges of RL
• One of the challenges that arise in reinforcement learning is the trade-off between exploration and exploitation: the agent must exploit the actions it already knows to be rewarding, but it must also explore in order to discover better ones.
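
A standard way of handling this trade-off, widely used though not covered in detail here, is the epsilon-greedy rule: with probability epsilon pick a random action (explore), otherwise pick the action with the best current value estimate (exploit). A sketch with invented value estimates:

import random

def epsilon_greedy(action_values, epsilon=0.1):
    """Explore with probability epsilon; otherwise exploit the
    action with the highest current value estimate."""
    if random.random() < epsilon:
        return random.choice(list(action_values))      # explore
    return max(action_values, key=action_values.get)   # exploit

# Hypothetical value estimates for three actions:
q = {"left": 0.2, "right": 0.5, "up": -0.1}
print(epsilon_greedy(q))  # usually "right", occasionally random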
Tic-Tac-Toe
• Although it might look like a simple problem, it cannot readily be solved in a satisfactory way through classical techniques.
• For example, the classical "minimax" solution from
game theory is not accurate in this case because it
assumes a particular way of playing by the
opponent.
Tic-Tac-Toe
• This example has a relatively small, finite state
set, whereas reinforcement learning can be used
when the state set is very large, or even infinite.
For example, Gerry Tesauro (1992, 1995) combined a reinforcement learning algorithm with an artificial neural network to learn to play backgammon, which has approximately 10^20 states. With this many states it is impossible ever to experience more than a small fraction of them.
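
The tic-tac-toe treatment in the reading learns a table of state values, nudging the value of each visited state toward the value of the state that follows it. A simplified sketch of that style of update; the board encoding and numbers here are illustrative:

# Temporal-difference style update for a tic-tac-toe value table:
# after moving from state s to s_next, nudge V(s) toward V(s_next).
values = {}   # board state (a 9-character string) -> estimated value
ALPHA = 0.1   # step size: how far to move toward the newer estimate

def update(state, next_state):
    v = values.setdefault(state, 0.5)        # unseen states start at 0.5
    v_next = values.setdefault(next_state, 0.5)
    values[state] = v + ALPHA * (v_next - v)

values["XXX......"] = 1.0                    # a winning position is worth 1
update("XX.......", "XXX......")
print(values["XX......."])                   # -> 0.55, pulled toward the win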
Summary
• Reinforcement learning uses a formal framework defined in terms of states, actions, and rewards.
• The concepts of maximising value, and of value functions, are the key features of reinforcement learning methods.
• Reinforcement learning is a computational approach to understanding and automating goal-directed learning and decision-making.