Agents that can play multi-player games

Practical techniques for agents playing multi-player games
Quiz: Complexity of Minimax
Chess has an average branching factor of ~30, and each game takes on average ~40 moves.
If it takes ~1 millisecond to compute the value of each board position in the game tree, how long would it take to figure out the value of the game using Minimax?
A few milliseconds
A few seconds
A few minutes
A few hours
A few days
A few years?
A few decades?
A few millennia (thousands of years)?
More time than the age of the universe?
Answer: Complexity of Minimax
More time than the age of the universe. The full game tree has roughly 30^40 ≈ 10^59 positions, so at ~1 millisecond per position the search would take on the order of 10^48 years.
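A quick back-of-the-envelope check of that answer (my own calculation, using the slide's rough numbers of b ≈ 30, m ≈ 40, and 1 millisecond per position):

```python
# Rough estimate of full Minimax search time for chess-like numbers.
# Assumptions (not spelled out on the slide): b = 30, m = 40 plies,
# 1 millisecond per board position, ~1.4e10 years as the age of the universe.
b, m = 30, 40
positions = b ** m                          # roughly b^m positions in the game tree
seconds = positions * 1e-3                  # 1 ms per position
years = seconds / (60 * 60 * 24 * 365)
print(f"{positions:.2e} positions, about {years:.2e} years of computation")
print(f"That is roughly {years / 1.4e10:.2e} times the age of the universe.")
```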
Strategies for coping with complexity
• Reduce b (the branching factor)
• Reduce m (the depth of the search)
• Memoize (remember positions already analyzed)
Reduce b: Alpha-beta pruning
[Figure: game tree with a Max (∆) root over Min (∇) nodes and numeric leaves; the middle Min child is marked <=1.]
During Minimax search (assume depth-first, left-to-right order):
First get a 6 for the left-most child of the root.
For the middle child of the root, the first child is a 1.
The agent can stop searching the middle child after this 1.
Reduce b: Alpha-beta pruning
[Figure: the same game tree; the middle Min child is marked <=1 after its first leaf is examined.]
The agent can stop searching the middle child after this 1.
The reason is that this is a Min node,
and by finding a 1,
we’ve already guaranteed that Min would select AT MOST a 1.
So, we’ve guaranteed that Max would not select this child, and we can move on.
Quiz: Reduce b: Alpha-beta pruning
[Figure: the same game tree, with the middle Min child marked <=1 and the right subtree not yet searched.]
What other nodes will be visited, if the agent continues with this
technique?
What will be the values of those nodes?
Answer:
Reduce b: Alpha-beta pruning
[Figure: annotated game tree. Continuing the search, the right Min (∇) child is marked <=3 after its first Max (∆) child evaluates to 3, so its remaining children are pruned.]
What other nodes will be visited, if the agent continues with this
technique?
What will be the values of those nodes?
Quiz: Reduce b: Alpha-beta pruning
[Figure: the same game tree with all internal node values hidden.]
Suppose the algorithm visits nodes depth-first, but Right-to-Left.
What nodes will be visited, and what are the values of those nodes?
Answer:
Reduce b: Alpha-beta pruning
[Figure: annotated game tree for the right-to-left search. The root still evaluates to 4, but more nodes are visited and fewer branches are pruned.]
Going right-to-left in this tree, there are fewer opportunities for pruning: the effectiveness of pruning depends on the values in the tree and the order in which they are visited.
On average, this technique tends to cut branching factors down to
their square root (from b to √b).
Reduce m: evaluation functions
[Figure: the same three-level game tree (depth m=3).]
Suppose searching to a depth of m=3 is just too expensive.
What we’ll do instead is introduce a horizon (h), or cutoff.
For this example, we’ll let h=2.
No nodes will be visited beyond the horizon.
Reduce m: evaluation functions
[Figure: the game tree with nodes at the horizon (depth h=2) marked with ? because their values are not yet known.]
Problem: how do we determine the value of non-terminal nodes at
the horizon?
The general answer is to introduce evaluation functions, which
estimate (or guess) the value of a node.
Reduce m: evaluation functions
[Figure: the same game tree with ? nodes at the horizon.]
Let E(n) = w1 f1(n) + w2 f2(n) + … + wk fk(n)
Each fi(n) is a “feature function” of the node that returns some real number describing the node in some way.
Each wi is a real number weight or parameter.
One common way to create E(n) is to get game experts to come up
with appropriate fi and wi.
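As a concrete illustration (my sketch, not lecture code), a linear evaluation function of this form is just a weighted sum of feature values; the feature names below are hypothetical placeholders:

```python
# A minimal sketch of E(n) = w1*f1(n) + w2*f2(n) + ... + wk*fk(n).
def linear_evaluation(node, feature_fns, weights):
    """Estimate the value of `node` as a weighted sum of feature functions."""
    return sum(w * f(node) for f, w in zip(feature_fns, weights))

# Hypothetical usage for some board game:
# features = [piece_advantage, mobility, center_control]   # expert-designed fi
# weights  = [1.0, 0.3, 0.1]                                # expert-chosen wi
# E = linear_evaluation(board, features, weights)
```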
Hex Evaluation Function Example
fi = shortest path for red – shortest path for blue
[Figure: an example Hex position annotated with each player's shortest remaining path to a connection.]
As an example, one possible fi function for Hex could be the shortest path to a solution for Red, minus the shortest path to a solution for Blue.
Hex Evaluation Function Example
fi = shortest path for red – shortest path for blue
[Figure: the same example Hex position.]
If Red is Max, we can assign wi = -1. This encodes the intuition that if Red has a longer shortest path than Blue, then this is a bad position for Red.
Hex Evaluation Function Example
Can you think of some other potential fi for Hex?
Notice, the important thing is that fi should be correlated with Value(n).
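Here is a sketch (my own, under assumed board conventions, not the lecture's code) of how the shortest-path feature above could be computed: the player's own stones cost 0, empty cells cost 1, opponent stones are impassable, and Dijkstra's algorithm finds the cheapest connection between the player's two sides.

```python
import heapq

# Assumptions: `board` maps (row, col) -> 'R', 'B', or None (empty) on an n x n
# rhombus; Red connects top to bottom, Blue connects left to right.
NEIGHBORS = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0)]

def shortest_path(board, n, player):
    """Fewest empty cells `player` must still fill to connect their two sides."""
    def cost(cell):
        owner = board.get(cell)
        if owner == player:
            return 0          # already own this cell
        if owner is None:
            return 1          # would need to place a stone here
        return None           # blocked by the opponent

    sources = [(0, c) for c in range(n)] if player == 'R' else [(r, 0) for r in range(n)]
    dist, heap = {}, []
    for s in sources:
        c = cost(s)
        if c is not None:
            dist[s] = c
            heapq.heappush(heap, (c, s))
    while heap:
        d, (r, col) = heapq.heappop(heap)
        if d > dist.get((r, col), float('inf')):
            continue
        # Reached the far side?
        if (player == 'R' and r == n - 1) or (player == 'B' and col == n - 1):
            return d
        for dr, dc in NEIGHBORS:
            nxt = (r + dr, col + dc)
            if not (0 <= nxt[0] < n and 0 <= nxt[1] < n):
                continue
            c = cost(nxt)
            if c is not None and d + c < dist.get(nxt, float('inf')):
                dist[nxt] = d + c
                heapq.heappush(heap, (d + c, nxt))
    return float('inf')       # no connection is possible

def f_shortest_path(board, n):
    # fi = shortest path for Red minus shortest path for Blue, as on the slide.
    return shortest_path(board, n, 'R') - shortest_path(board, n, 'B')
```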
Learning an evaluation function
Experts are often good at coming up with fi functions.
But it’s often hard for a game expert (or anyone) to come up with the best wi weights for E(n).
An alternative approach is to create an algorithm to learn the wi weights from data.
What’s the data?
To do machine learning, you need data that contains inputs and labels.
For an evaluation function, that means board positions and values.
But we’ve already said that it’s hard to figure out the right value for
many board positions – that’s the whole point of the evaluation
function in the first place.
Instead of asking people to label boards with values, a common
approach is to have a simulation of the agent playing against itself.
The outcome of the game is used as the value of the board positions
along the way.
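A minimal sketch (my illustration) of that self-play data collection: play a game, remember the positions along the way, and label each one with the final outcome. The `game` interface used here (initial, is_terminal, children, outcome) is an assumption, not something defined in the lecture.

```python
import random

def self_play_episode(game, choose_move=None):
    """Play one game against itself and return (position, value) training pairs."""
    pick = choose_move or (lambda moves: random.choice(moves))
    positions, n = [], game.initial()
    while not game.is_terminal(n):
        positions.append(n)
        n = pick(list(game.children(n)))
    label = game.outcome(n)              # e.g. +1 Max wins, -1 Min wins, 0 draw
    return [(p, label) for p in positions]
```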
Quiz: What’s the learning algorithm?
Once you’ve collected enough examples of
board positions and values, there are lots of
algorithms to do the learning.
For the kind of evaluation function I introduced,
name some appropriate learning techniques
that we’ve discussed.
Answer: What’s the learning algorithm?
Once you’ve collected enough examples of board
positions and values, there are lots of algorithms to do
the learning.
For the kind of evaluation function I introduced, name
some appropriate learning techniques that we’ve
discussed.
Two come to my mind:
- Linear Regression
- Gradient Descent (a small sketch follows)
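For instance, a plain gradient-descent version might look like the sketch below (my illustration, assuming a squared-error loss over (board, value) pairs like those gathered from self-play):

```python
def train_weights(data, feature_fns, lr=0.01, epochs=100):
    """Learn the wi in E(n) = sum_i wi*fi(n) from (board, value) training pairs."""
    w = [0.0] * len(feature_fns)
    for _ in range(epochs):
        for board, target in data:
            x = [f(board) for f in feature_fns]
            prediction = sum(wi * xi for wi, xi in zip(w, x))
            error = prediction - target
            # Gradient of (prediction - target)^2 with respect to wi is 2 * error * xi.
            w = [wi - lr * 2 * error * xi for wi, xi in zip(w, x)]
    return w
```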
Quiz: Horizon effect example
1. What is Red’s best
move?
2. If we use a horizon of 2 and the “shortest-path” evaluation, what will Red choose?
Answer: Horizon effect example
1. What is Red’s best
move?
If Red moves to either of these squares, Blue can easily block it by
moving to the other one.
If Red moves to either of these squares, Blue can easily win by
moving to the other one.
Answer: Horizon effect example
1. What is Red’s best
move?
If Red moves to any of these squares,
Blue can win by moving here.
Answer: Horizon effect example
1. What is Red’s best
move?
Red’s only chance is to move here first.
In fact, if Red does that, Red should win the game.
But – that’s hard to see, and requires seeing many moves in
advance.
Answer: Horizon effect example
2. If we use a horizon of 2 and the “shortest-path” evaluation, what will Red choose?
This choice gives Red a shortest path of 3.
Many of Blue’s responses would
decrease Blue’s shortest path to 2
for a difference of 1 in favor of Blue.
Answer: Horizon effect example
2. If we use a horizon of 2 and the “shortest-path” evaluation, what will Red choose?
This choice gives Red a shortest path of 3.
Many of Blue’s responses would
decrease Blue’s shortest path to 2
for a difference of 1 in favor of Blue.
But that’s basically the evaluation for many of Red’s moves. So Red has no idea which move is best, and must pick randomly.
Memoization
Memoization involves remembering/memorizing certain good
positions or strategies or moves, to avoid doing a lot of search when
such positions or moves become available.
Some common examples:
• Opening book: A database of good positions for a particular player
in the beginning phases of a game, as determined by game experts.
(These are especially important for chess.)
• Closing book: A database of board positions that are close to the
end of the game, with the best possible strategy for completing the
game from that position.
• Killer moves: A technique of remembering when some move in a game tree results in a big change to the game (e.g., someone’s queen gets taken in chess). If this happens in one place in a game tree, it’s a good idea to check for it in other branches of the tree as well.
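As a small sketch of the memoization idea itself (my illustration, not Chinook's or any particular program's data structure), a transposition table simply caches values of positions that have already been analyzed:

```python
transposition_table = {}

def memoized_value(board, depth, compute_value):
    """Only search a (board, depth) pair the first time it is encountered.
    `board` must be hashable, e.g. a tuple of tuples; `compute_value` is the
    expensive search routine, such as alpha-beta Minimax."""
    key = (board, depth)
    if key not in transposition_table:
        transposition_table[key] = compute_value(board, depth)
    return transposition_table[key]
```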
Full Algorithm
Initial Call: Value(root, 0, -∞, +∞).
α: best value for Max found so far. β: best value for Min found so far.
Value(n, depth, α, β):
- If n is a terminal node, return ∆’s utility
- If depth >= cutoff, return E(n)
- If n is ∆’s turn:
    v = -∞
    For each c ∊ Children(n):
        v = max(v, Value(c, depth+1, α, β))
        if v >= β: return v        (pruning step)
        α = max(α, v)
    Return v
- If n is ∇’s turn:
    v = +∞
    For each c ∊ Children(n):
        v = min(v, Value(c, depth+1, α, β))
        if v <= α: return v        (pruning step)
        β = min(β, v)
    Return v
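A runnable Python rendering of this pseudocode (a sketch of mine; the game-specific pieces, namely is_terminal, utility, evaluate, is_max_turn, and children, are assumed to be supplied by a `game` object):

```python
import math

def value(n, depth, alpha, beta, game, cutoff):
    """Alpha-beta Minimax with a depth cutoff, following the slide's pseudocode."""
    if game.is_terminal(n):
        return game.utility(n)            # ∆'s utility at a terminal node
    if depth >= cutoff:
        return game.evaluate(n)           # E(n) at the horizon
    if game.is_max_turn(n):               # ∆'s turn
        v = -math.inf
        for c in game.children(n):
            v = max(v, value(c, depth + 1, alpha, beta, game, cutoff))
            if v >= beta:                 # pruning step
                return v
            alpha = max(alpha, v)
        return v
    else:                                 # ∇'s turn
        v = math.inf
        for c in game.children(n):
            v = min(v, value(c, depth + 1, alpha, beta, game, cutoff))
            if v <= alpha:                # pruning step
                return v
            beta = min(beta, v)
        return v

# Initial call, as on the slide:
# value(root, 0, -math.inf, math.inf, game, cutoff)
```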
Benefits to Complexity
• Reduce b: O(b^m) → O(b^(m/2))
• Reduce m: O(b^m) → O(b^h), h << m
• Memoize: O(b^m) → O(1), if the board position has already been analyzed before
Note: alpha-beta pruning is an exact method: the best
move using alpha-beta pruning is the same as the best
move without it (normal Minimax).
Horizon cutoffs are approximate methods: you may get a bad choice of action if you run into a horizon effect.
Example of a Real System:
Chinook (checkers player)
Checkers:
• Red moves first.
• Each move is either
- a diagonal move one square
- a diagonal jump over an opponent’s
piece, which removes the opponent’s
piece.
• Multiple jumps are possible.
• If a jump is available to a player, the player must take it.
• Ordinarily, pieces can only move forward.
• If a piece gets to the opposite side of the
board, it gets “crowned”.
• A “crowned” piece can move forward or
backwards.
Example of a Real System:
Chinook (checkers player)
Chinook:
Chinook is a computer program that plays Checkers.
1990: It became the first computer program to win the
right to compete for a world championship in a game or
sport. (It came second in the US National competition.)
The checkers governing body didn’t like that, so instead it created the World Man-Machine Competition.
1994: Chinook wins the World Man-Machine
Competition against Dr. Marion Tinsley, after Tinsley
withdraws due to health problems.
1995: Chinook defended its title against Don Lafferty.
After that, the program creators decided to retire
Chinook.
2007: The program creators proved that the best anyone can do against Chinook is a draw.
Example of a Real System:
Chinook (checkers player)
Chinook:
The Chinook system:
• Minimax + alpha-beta pruning
• Hand-crafted evaluation function (no learning
component)
Linear function with features like:
• Num. pieces for each player
• How many kings for each player
• How many kings are “trapped”
• How many pieces are “runaways” – nothing
to stop them from being crowned
• Opening move database from checkers experts
• End-game database that stores the best move
from all positions with 8 or fewer pieces.
Stochastic Games
Many games (Backgammon, Monopoly, World of Warcraft, etc.) involve some randomness.
An attack against another creature may or may not succeed, or the number of squares your piece is allowed to move may depend on a dice roll.
Stochastic Games:
Giving Nature a turn
[Figure: a game tree in which chance nodes (marked ?) appear alongside the usual ∆ and ∇ nodes, with Nature's outcomes labeled 1 and 2.]
In stochastic games, we give “Nature” a turn in the game tree whenever it’s time for a dice roll or some random event.
We’ll represent Nature’s turn with a ?, and call these “chance nodes”.
Stochastic Games:
Giving Nature a turn
[Figure: the same game tree, with each chance node's branches labeled with Nature's outcomes 1 and 2.]
We’ll define the Value of a chance node to be the expected value of its child nodes.
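A small sketch (my illustration) of that rule in code: Max and Min nodes are handled as in ordinary Minimax, and a chance node's value is the probability-weighted average of its children. The `game` interface (node_type, children, outcomes, etc.) is an assumption, not from the lecture.

```python
def expectiminimax(n, game):
    """Value of a game tree with chance nodes (no pruning, for clarity)."""
    if game.is_terminal(n):
        return game.utility(n)
    kind = game.node_type(n)                       # 'max', 'min', or 'chance'
    if kind == 'max':
        return max(expectiminimax(c, game) for c in game.children(n))
    if kind == 'min':
        return min(expectiminimax(c, game) for c in game.children(n))
    # Chance node: expected value over Nature's (probability, child) outcomes.
    return sum(p * expectiminimax(c, game) for p, c in game.outcomes(n))
```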
Quiz: Stochastic Games
[Figure: a game tree containing chance (?) nodes, Max (∆) and Min (∇) nodes, and numeric leaves.]
Assume each branch of a chance node has equal probability.
What is the value of the root node for this game tree?
Answer: Stochastic Games
[Figure: the same game tree, annotated with the computed values of the chance, Max, and Min nodes.]
Assume each branch of a chance node has equal probability.
What is the value of the root node for this game tree?
The value of the root node is 2.667.
Partially-Observable Games:
Poker Example
Simple Poker game:
Deck contains 2 Kings and 2 Aces.
Each player is dealt one card.
1st round: P1 can raise (r) or check (k).
2nd round: P2 can call (c) or fold (f).
Partially-Observable Games:
Poker Example
[Figure: a game tree in which a chance node first deals the cards (probability 1/6 both players get Kings, 1/3 P1 gets a King and P2 an Ace, 1/3 P1 gets an Ace and P2 a King, 1/6 both get Aces); under each deal, P1 (∆) raises or checks and P2 (∇) calls or folds, leading to payoffs such as -2, -1, 0, 1, and 2.]
Computing equilibria in games with
imperfect information
If the game has perfect recall (meaning no player ever forgets their own past moves and observations):
For 2-player, zero-sum games:
Finding equilibria still amounts to Linear Programming.
It’s possible to compute equilibria in polynomial time in the
size of the game tree.
For general-sum games:
As hard as the general problem of finding Nash equilibria.
For games without perfect recall: hard even for zero-sum
games.
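To give a flavor of the “equilibria amount to Linear Programming” point, here is a sketch (my illustration, for the simplest possible case: a two-player zero-sum game in matrix rather than tree form) that computes the row player's maximin mixed strategy with scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

def maximin_strategy(payoff):
    """payoff[i][j]: row player's payoff when row plays i and column plays j."""
    A = np.asarray(payoff, dtype=float)
    m, k = A.shape
    # Variables: x_1..x_m (mixed strategy) and v (game value); maximize v = minimize -v.
    c = np.concatenate([np.zeros(m), [-1.0]])
    # For every column j: sum_i A[i, j] * x_i >= v  <=>  -A[:, j].x + v <= 0
    A_ub = np.hstack([-A.T, np.ones((k, 1))])
    b_ub = np.zeros(k)
    # Probabilities sum to 1.
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = [1.0]
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]          # (mixed strategy, game value)

# Example: matching pennies -> uniform strategy [0.5, 0.5], value 0.
# strategy, value = maximin_strategy([[1, -1], [-1, 1]])
```

The equilibria of two-player zero-sum extensive-form games with perfect recall can be found by a similar, larger LP over the sequence form of the game tree, which is what makes them solvable in polynomial time in the size of the tree.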