Reinforcement learning (Part I, intro)


CS 4700:
Foundations of Artificial Intelligence
Bart Selman
Reinforcement Learning
R&N – Chapter 21
Note: in the next two parts of RL, some of the
figure/section numbers refer to an earlier edition of R&N
with a more basic description of the techniques.
The slides provide a self-contained description.
In our discussion of Search methods (developed for
problem solving), we assumed a given State Space and
operators that lead from one State to one or more
Successor states with a possible operator Cost.
The State space can be exponentially large but is in principle
known. The difficulty was finding the right path (sequence of
moves). This problem was solved by searching through the various
alternative sequences of moves. In tough spaces, this leads to
exponential searches.
Can we do something totally different?? Avoid search…
Why don’t we “just learn” how to make the right
move in each possible state?
In principle, need to know very little about
environment at the start. Simply observe another
agent / human / program make steps (go from
state to state) and mimic!
Reinforcement learning: Some of the earliest AI
research (1960s). It works! Principles and ideas still
applicable today.
Environment we consider is a basic game (the simplest
non-trivial game):
Tic-Tac-Toe
The question: Can you write a program that learns
to play Tic-Tac-Toe?
Let’s try to re-discover what Donald Michie did in
1962. He did not even use a computer! He hand-simulated one.
The first non-trivial machine learning program!
Tic-tac-toe (or Noughts and
crosses, Xs and Os)
Now, we don’t want…
[Figure: game tree for 3x3 Tic-Tac-Toe, starting 3 moves per player in, showing optimal play with alternating “X’s turn” and “O’s turn” levels and outcomes marked “loss”.]
What else can we think of?
Basic ingredients needed:
1) We need to represent board states.
2) What moves to make in different states.
It may help to think a bit probabilistically … pick moves
with some probability and adjust probabilities through a
learning procedure …
Learn from human opponent
We could try to learn directly from a human what
moves to make…
But, some issues:
1) Human may be a weak player.  We want to
learn how to beat him/her!
2) Human may play “nought” (second player)
and computer wants to learn how to play
“cross” (first player).
Answer:
Let’s try to “just play” human against machine
and learn something from wins and losses.
To start: some basics of the “machine”
For each board state where cross is on-move,
have a “match box” labeled with that state.
Requires a few hundred matchboxes.
Each match box has a number of colored “beads” in it, each
color represents a valid move for cross on that board.
E.g. start with ten
beads of each color
for each valid move.
1) To make a move, pick up the box labeled with the
current state, shake it, and pick a random bead.
Check its color and make that move.
2) New state: wait for the human counter-move.
New state: repeat the above.
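The matchbox setup and the move procedure above can be sketched in a few lines. This is only an illustration, not Michie's actual bookkeeping: the 9-character board string, the square numbering 0-8, and the names `valid_moves`, `new_matchbox`, and `pick_move` are assumptions made for the example.

```python
import random

def valid_moves(board):
    """Indices 0-8 of the open squares on a 9-character board string."""
    return [i for i, cell in enumerate(board) if cell == " "]

def new_matchbox(board, beads_per_move=10):
    """One matchbox: a bead count for every valid move on this board."""
    return {move: beads_per_move for move in valid_moves(board)}

def pick_move(matchboxes, board):
    """Shake the box labeled `board` and draw one bead at random."""
    if board not in matchboxes:
        matchboxes[board] = new_matchbox(board)
    box = matchboxes[board]
    moves = list(box)
    # A move with more beads is proportionally more likely to be drawn.
    return random.choices(moves, weights=[box[m] for m in moves], k=1)[0]

matchboxes = {}
board = "X O" + " " * 6   # cross to move; squares 0 and 2 are taken
move = pick_move(matchboxes, board)
```

With ten beads per color, every valid move is equally likely, so the first games are played uniformly at random.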
The game ends when one of the parties wins
or there are no more open spaces.
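The end-of-game test can be written down directly. A minimal sketch, assuming the board is a 9-character string read row by row with " " for an open square; the name `outcome` and the return values are choices made for this example.

```python
# The eight winning lines: three rows, three columns, two diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def outcome(board):
    """Return "X" or "O" for a win, "draw" if the board is full, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    if " " not in board:
        return "draw"
    return None   # game continues
```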
This is how the machine plays. How well will it
play? What is it doing initially?
Machine needs to learn! How? Can you think of a
strategy? The first successful machine learning
program in history (not involving search)…
Let’s try to come up with a strategy… What do we
need to do?
Reinforcement Learning
It works! The machine does not need that
many games. Quite surprising!
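The transcript does not preserve the update rule itself. One commonly described version of the matchbox scheme adjusts the beads after each game: add beads of the chosen color to every box used in a win, remove one after a loss. The sketch below follows that idea. The bead amounts (+3, +1, -1), the rule that a box is never left empty, and the name `reinforce` are assumptions for illustration, not taken from the slides.

```python
def reinforce(matchboxes, history, result):
    """Adjust bead counts for every (board, move) the machine used this game.

    matchboxes: dict mapping board -> {move: bead count}
    history:    list of (board, move) pairs played by the machine
    result:     "win", "draw", or "loss" from the machine's point of view
    """
    delta = {"win": 3, "draw": 1, "loss": -1}[result]   # assumed amounts
    for board, move in history:
        box = matchboxes[board]
        box[move] = max(0, box[move] + delta)
        if sum(box.values()) == 0:
            box[move] = 1   # assumption: never leave a box empty

board = "X O" + " " * 6
matchboxes = {board: {1: 10, 4: 10}}
reinforce(matchboxes, [(board, 4)], "win")    # move 4 now has 13 beads
reinforce(matchboxes, [(board, 1)], "loss")   # move 1 now has 9 beads
```

Moves that led to wins become more likely to be drawn again; moves that led to losses gradually disappear from their boxes.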
Comments
Learning in this case took “advantage of”:
1) State space is manageable. Further reduced by
using 1 state to represent all isomorphic states
(through board rotations and symmetries).
We quietly encoded some knowledge about tic-tac-toe!
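Collapsing isomorphic states can be sketched as picking one representative per symmetry class. This is an illustrative approach, assuming a 9-character board string read row by row; the index maps and the choice of the lexicographically smallest variant as the "label" are assumptions of this example.

```python
# new_board[i] = old_board[index_map[i]]
ROTATE = (6, 3, 0, 7, 4, 1, 8, 5, 2)   # quarter turn clockwise
MIRROR = (2, 1, 0, 5, 4, 3, 8, 7, 6)   # left-right flip

def transform(board, index_map):
    return "".join(board[i] for i in index_map)

def symmetries(board):
    """All 8 rotations and reflections of a board (may contain repeats)."""
    variants = []
    for b in (board, transform(board, MIRROR)):
        for _ in range(4):
            variants.append(b)
            b = transform(b, ROTATE)
    return variants

def canonical(board):
    """One fixed representative for all boards that are the same up to symmetry."""
    return min(symmetries(board))

corner_a = "X" + " " * 8
corner_b = " " * 8 + "X"
same_box = canonical(corner_a) == canonical(corner_b)   # True: both are corner openings
```

A move drawn from the canonical board's box still has to be mapped back through the same rotation or flip before it is played on the real board.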
2) What if state space is MUCH larger? As for any
interesting game…
Options:
a) Represent the board by “features.” E.g., the number of
each kind of piece on a chess board, but not their
positions. It’s like having each matchbox
represent a large collection of states. The notion of
“valid moves” becomes a bit trickier.
b) Don’t store “match boxes” / states explicitly,
instead learn a function (e.g. neural net) that
computes the right move directly when given
some representation of the state as input.
c) Combination of a) and b).
d) Combine a), b), and c) with some form of
“look-ahead” search.
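Option a) can be illustrated on the Tic-Tac-Toe boards used so far. A minimal sketch, assuming a 9-character board string and a deliberately crude feature (piece counts only); the name `feature_key` is hypothetical.

```python
from collections import Counter

def feature_key(board):
    """Summarize a board by piece counts only, ignoring positions."""
    counts = Counter(board)
    return (counts["X"], counts["O"])

boards = ["X O" + " " * 6,
          "  X" + "   " + "O  ",
          "XX" + " " * 7]

# One "matchbox" per feature key instead of one per board.
boxes = {}
for b in boards:
    boxes.setdefault(feature_key(b), []).append(b)
# boxes[(1, 1)] holds the first two boards; boxes[(2, 0)] holds the third.
```

The two boards sharing the key (1, 1) have different open squares, which is why “valid moves” gets trickier once a box stands for many states.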