
Learning to Play
Blackjack
Thomas Boyett
Presentation for CAP 4630
Teacher: Dr. Eggen
Objective
Design and implement an agent that learns to play
Blackjack.
The rational agent approach to AI described in
Chapter 2 of our textbook was used as a guide.
Accomplish this without supplying the agent with
theories or rules about how to play the game
optimally.
Specifying the Task Environment
The task environment can be considered the problem to which a
rational agent is the solution.
Designing a good solution always includes gaining in-depth
knowledge of the problem.
PEAS
Performance – Objective measure of work quality.
Environment – The things the agent will interact with.
Actuators – The means by which the agent acts on its environment.
Sensors – The means by which the agent perceives its environment.
Other Properties of the Task
Environment
Fully observable vs. partially observable
Deterministic vs. stochastic
Episodic vs. sequential
Static vs. dynamic
Discrete vs. continuous
Single agent vs. multi-agent
The Rules
Unlike most card games, you play only against the dealer.
Whoever has the highest-valued hand without exceeding 21 is
the winner. Going above 21 is called busting and is an immediate loss.
Aces are worth 11 or 1, your choice.
Kings, Queens, Jacks, and Tens are worth 10.
All other cards are worth their face value.
The suit of the card is ignored.
The dealer gives you two cards face up and deals himself two cards,
one of them face up. This is one of the features of Blackjack that causes
it to be a partially observable task environment.
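The card-value rules above can be sketched as a small function. This is an illustrative sketch, not code from the presentation; the representation of cards (ints 2–10 plus the strings 'J', 'Q', 'K', 'A') is an assumption.

```python
def hand_value(cards):
    """Best Blackjack value of a hand.

    `cards` is a list of ranks: ints 2-10 plus 'J', 'Q', 'K', 'A'
    (a hypothetical encoding; suits are ignored, per the rules).
    Aces count as 11 unless that would bust the hand.
    """
    total = 0
    aces = 0
    for card in cards:
        if card == 'A':
            aces += 1
            total += 11
        elif card in ('J', 'Q', 'K'):
            total += 10
        else:
            total += card
    # Downgrade aces from 11 to 1 while the hand would bust.
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total
```

For example, `hand_value(['A', 'K'])` yields 21 (a Blackjack), while `hand_value(['A', 'A', 9])` downgrades one ace to reach 21.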
The Rules
If you want another card you can hit. If you are
satisfied with your hand you can stand. Whoever
has the best score wins the game. If the scores
are equal then the game is a draw.
If either player receives an ace and any card
worth 10 on the initial deal, they have a
Blackjack. A Blackjack is an immediate victory
unless both players have one, in which case the
game is a draw.
Types of Rational Agents
Simple reflex agents
Model-based reflex agents
Goal-based agents
Utility-based agents
All types can be extended to be learning agents.
The Blackjack agent will be designed as a learning
model-based reflex agent.
The Components of a Learning
Agent
Performance element
Critic
Learning element
Problem Generator
The Performance Element
The part of the agent that chooses what to
do.
The Blackjack agent in this design will be
limited to hitting and standing. Optimal
winning and betting are separate and
complex problems.
The Performance Element
Chooses to hit or stand based on the dealer’s
value and its own value.
Actions are stored in a reference table.
Rows represent the dealer’s value and columns
represent the agent’s value.
The Performance Element
(rows: dealer’s value 2–11; columns: agent’s value 2–20)

        2   3   4   …  16  17  18  19  20
  2     H   H   H   …   S   S   S   S   S
  3     H   H   H   …   S   S   S   S   S
  4     H   H   H   …   S   S   S   S   S
  5     H   H   H   …   S   S   S   S   S
  6     H   H   H   …   S   S   S   S   S
  7     H   H   H   …   H   S   S   S   S
  8     H   H   H   …   H   S   S   S   S
  9     H   H   H   …   H   S   S   S   S
  10    H   H   H   …   H   H   S   S   S
  11    H   H   H   …   H   H   S   S   S
The Critic
The critic tells the learning element if the
results of an action were good or bad.
The critic must be objective and independent
of the learning element.
The Critic
If the agent chooses to hit:
The outcome is good if the agent did not bust.
The outcome is bad if the agent did.
If the agent chose to stand:
The outcome is good if the agent won the game.
The outcome is bad if the agent lost.
The outcome is ignored if the game ends in a
draw. Neither dealer nor player benefits from a
draw or is penalized by it.
The Learning Element
Makes improvements to the performance
element.
Works in direct response to feedback
provided by the critic.
The Learning Element
The actual structure of a
lookup table entry is
four values that
represent the agent’s
previous experience with a
specific
dealer/player value
combination. The learning
element maintains these
values:
# Hits resulting in bad
feedback
# Hits resulting in good
feedback
# Stands resulting in
bad feedback
# Stands resulting in
good feedback
The Learning Element
The good/bad ratios of the hitting and
standing results are computed, and whichever ratio
is larger determines the perceived optimal action.
This approach allows the agent to improve
based on previous results. Thousands of
games must be played to generate a reliable
lookup table.
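The four-counter entries and the ratio comparison can be sketched as follows. This is an illustrative sketch; the counter layout, the `record`/`best_action` names, and the treatment of a zero denominator (taking the raw good count as the ratio) are assumptions, since the presentation does not specify them.

```python
from collections import defaultdict

# Per-(dealer, agent) entry: [good hits, bad hits, good stands, bad stands].
GH, BH, GS, BS = 0, 1, 2, 3

counts = defaultdict(lambda: [0, 0, 0, 0])

def record(entry_key, action, feedback):
    """Learning element: update one entry from the critic's feedback."""
    if feedback is None:          # draws carry no information
        return
    index = {('H', 'good'): GH, ('H', 'bad'): BH,
             ('S', 'good'): GS, ('S', 'bad'): BS}[(action, feedback)]
    counts[entry_key][index] += 1

def best_action(entry_key):
    """Compare good/bad ratios; the larger ratio wins."""
    gh, bh, gs, bs = counts[entry_key]
    hit_ratio = gh / bh if bh else float(gh)      # assumed zero-division rule
    stand_ratio = gs / bs if bs else float(gs)
    return 'H' if hit_ratio > stand_ratio else 'S'
```

With the slide's hypothetical entry for (8, 12) — 45 good hits, 5 bad hits, 20 good stands, 30 bad stands — the hit ratio of 9 beats the stand ratio, so `best_action((8, 12))` returns 'H'.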
The Learning Element
An example of table
computation: a hypothetical
table entry for a dealer with
value 8 and a player with
value 12.
GH = 45 (good hits)
BH = 5 (bad hits)
GS = 20 (good stands)
BS = 30 (bad stands)
GH/BH = 45/5 = 9
GS/BS = 20/30 ≈ 0.667
Since GH/BH is greater than
GS/BS, this data evaluates
to MUST HIT on (8,12).
Problem Generator
The problem generator’s job is to
occasionally tell the learning agent to try a
non-optimal action for a given situation.
At the cost of sometimes behaving less
optimally, the agent is given the opportunity
to find less obvious ways to perform better.
The Problem Generator
Force the agent to play a set of games in which it
either only hits or only stands.
This is a naïve policy if you are playing to win, but it allows
the agent to learn about the quality of both choices
in all circumstances.
Problem generation may seem counterproductive,
but it allows the agent to learn information that
otherwise would have been left undiscovered.
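The forced-policy idea can be sketched as a schedule that occasionally overrides the learned table. The block size and the 80/10/10 split between normal play, forced-hit games, and forced-stand games are assumptions for illustration; the presentation does not give a schedule.

```python
def problem_generator(game_number, block=100):
    """Occasionally force an exploratory action (hypothetical schedule).

    Games are grouped into blocks of `block`.  In every run of ten
    blocks, two are exploration phases: one where the agent only
    hits and one where it only stands, so both actions accumulate
    statistics in every (dealer, agent) situation.
    """
    phase = (game_number // block) % 10
    if phase == 8:
        return 'H'      # forced-hit block
    if phase == 9:
        return 'S'      # forced-stand block
    return None         # normal play: defer to the learned table
```

The caller would use the learned table whenever `problem_generator` returns `None`, and the forced action otherwise.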
Results
Before being allowed to learn:
Average Win%: 17%
Average Loss%: 80%
After being allowed to learn without guidance from a problem
generator (50,000 games):
Average Win%: 32%
Average Loss%: 60%
After being allowed to learn with a problem generator (50,000 games):
Average Win%: 45%
Average Loss%: 49%
Results in Perspective
A player that always hits:
Average Win%: 15%
Average Loss%: 80%
A player that flips a coin to decide hit/stand:
Average Win%: 24%
Average Loss%: 70%
Results in Perspective
A player that always stands:
Average Win%: 40%
Average Loss%: 55%
A professional Blackjack player who uses
Basic (Optimal) Strategy:
Average Win%: 45%
Average Loss%: 49%
References
Stuart Russell and Peter Norvig. Artificial Intelligence:
A Modern Approach, Second Edition. Prentice Hall, 2003.