Machine Learning in Computer Game Players
Chikayama & Taura Lab.
M1 Ayato Miki
1. Introduction
2. Computer Game Players
3. Machine Learning in Computer Game Players
4. Tuning Evaluation Functions
   ◦ Supervised Learning
   ◦ Reinforcement Learning
   ◦ Evolutionary Algorithms
5. Conclusion
Improvements in Computer Game Players
◦ DEEP BLUE defeated Kasparov in 1997
◦ GEKISASHI and TANASE SHOGI at WCSC 2008
Strong computer game players are usually developed by strong human players
◦ Input heuristics manually
◦ Devote a lot of time and energy to tuning
Machine learning enables automatic tuning using a large amount of data
It is not necessary for the developer to be an expert in the game
1. Introduction
2. Computer Game Players
3. Machine Learning in Computer Game Players
4. Tuning Evaluation Functions
   ◦ Supervised Learning
   ◦ Reinforcement Learning
   ◦ Evolutionary Algorithms
5. Conclusion
Games
Game Trees
Game Tree Search
Evaluation Function
Turn-based games
◦ ex. tic-tac-toe, chess, shogi, poker, mah-jong…
Additional Classification
◦ two-player or otherwise
◦ zero-sum or otherwise
◦ deterministic or non-deterministic
◦ perfect or imperfect information
Game Tree Model
[Figure: game tree model — nodes alternate between the player's turn and the opponent's turn; edges are moves such as move 1 and move 2]
ex. Minimax search algorithm
[Figure: minimax tree — Max nodes take the maximum and Min nodes the minimum of their children's values; the leaf evaluations propagate a value of 5 to the root]
Difficult to search all the way to the leaf nodes
◦ 10^220 possible positions in shogi
Stop the search at a practicable depth
and "evaluate" the nodes there
◦ using an evaluation function
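The depth-limited search described above can be sketched as follows. This is a minimal illustration, not the talk's own code; `children` and `evaluate` are hypothetical stand-ins for the game's move generator and evaluation function.

```python
def minimax(state, depth, maximizing, children, evaluate):
    """Depth-limited minimax: search `depth` plies, then fall back on
    the evaluation function at the horizon (or at terminal positions)."""
    moves = children(state)
    if depth == 0 or not moves:
        return evaluate(state)
    values = [minimax(child, depth - 1, not maximizing, children, evaluate)
              for child in moves]
    return max(values) if maximizing else min(values)

# Toy tree: internal nodes are lists of children, leaves are evaluation values.
tree = [[3, 5], [1, 8]]
root_value = minimax(tree, depth=2, maximizing=True,
                     children=lambda s: s if isinstance(s, list) else [],
                     evaluate=lambda s: s)  # max of min(3, 5) and min(1, 8)
```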
Estimate the superiority of the position
Elements
◦ feature vector of the position
◦ parameter vector
V(s) = f(φ(s), θ)
  φ(s): feature vector of position s
  θ: parameter vector
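A minimal sketch of such an evaluation function in its common linear form, V(s) = θ · φ(s): the inner product of the feature vector and the parameter vector. The two features below are made-up illustrations, not features from the talk.

```python
def evaluate(phi_s, theta):
    """Linear evaluation: V(s) = sum_i phi_i(s) * theta_i."""
    return sum(f * w for f, w in zip(phi_s, theta))

phi_s = [2.0, 5.0]   # e.g. material difference, mobility (made-up features)
theta = [1.0, 0.1]   # parameters to be tuned by learning
value = evaluate(phi_s, theta)
```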
Introduction
Computer Game Players
Machine Learning in Computer Game Players
Tuning Evaluation Functions
◦ Supervised Learning
◦ Reinforcement Learning
◦ Evolutionary Algorithms
Conclusion
Initial work
◦ Samuel’s research [1959]
Learning objective
◦ What do computer game players learn?
Many useful techniques
◦ Rote learning
◦ Quiescence search
◦ 3-layer neural network evaluation function
And some machine learning techniques
◦ Learning through self-play
◦ Temporal-difference learning
◦ Comparison training
Opening Book
Search Control
Evaluation Function
Automatic construction of evaluation function
◦ Construct and select a feature vector automatically
◦ ex. GLEM [Buro, 1998]
◦ Difficult
Tuning evaluation function parameters
◦ Make a feature vector manually and tune its parameters automatically
◦ Easy and effective
Introduction
Computer Game Players
Machine Learning in Computer Game Players
Tuning Evaluation Functions
◦ Supervised Learning
◦ Reinforcement Learning
◦ Evolutionary Algorithms
Conclusion
Supervised Learning
Reinforcement Learning
Evolutionary Algorithm
Provide the program with example positions and their exact evaluation values
[Figure: example positions labeled with exact values such as 50 and 20]
Adjust the parameters in a way that minimizes the error between the evaluation function outputs and the exact values
[Figure: for an example position with exact value 50, the output V(s) = 40 leaves an error of 10]
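The tuning described above can be sketched as stochastic gradient descent on the squared error between V(s) and the exact labels, assuming the linear evaluation form. The positions, labels, and learning rate are made-up illustration values.

```python
def train(examples, theta, lr=0.01, epochs=2000):
    """Repeatedly nudge theta to reduce (V(s) - exact value)^2."""
    for _ in range(epochs):
        for phi, exact in examples:
            v = sum(f * w for f, w in zip(phi, theta))   # V(s) = theta . phi(s)
            error = v - exact
            # Gradient of the squared error w.r.t. each weight is error * f.
            theta = [w - lr * error * f for w, f in zip(theta, phi)]
    return theta

examples = [([1.0, 2.0], 50.0), ([3.0, 1.0], 20.0)]   # (features, exact value)
theta = train(examples, [0.0, 0.0])
```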
Problems
◦ Manual labeling of positions is costly
◦ Exact quantitative evaluation is difficult
Consider a softer approach
Soft Supervised Training
Require only relative order for the possible moves
◦ Easier and more intuitive
[Figure: positions are compared by relative order (>) rather than by exact values]
Comparison training using records of expert games
Simple relative order:
the expert move > the other moves
Based on optimal control theory
Minimize the cost function J:

J(s_0, s_1, …, s_{N−1}, θ) = Σ_{i=0}^{N−1} l(s_i, θ)

s_i: example positions in the records
N: total number of example positions
l(s_i, θ): error function
Error function:

l(s, θ) = Σ_{m=1}^{M−1} T[ξ(s'_m, θ) − ξ(s'_{m0}, θ)]

s'_m: child position after move m (m ≠ m_0)
M: total number of possible moves
m_0: the move played in the record
ξ(s'_m, θ): minimax search value
T(x): order discriminant function
Sigmoid function:

T(x) = 1 / (1 + e^{−kx})

◦ k is a parameter that controls the gradient
◦ As k → ∞, T(x) becomes the step function
◦ In that case the error function counts "the number of moves that were considered to be better than the move in the record"
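The error function above can be sketched as follows: a sigmoid order discriminant T applied to the difference between each child's search value and that of the child reached by the expert's move. The values below are made-up; in practice they would come from minimax search.

```python
import math

def T(x, k=5.0):
    """Sigmoid order discriminant; tends to the step function as k grows."""
    return 1.0 / (1.0 + math.exp(-k * x))

def comparison_error(child_values, expert_index, k=5.0):
    """l(s, theta): sum over the non-expert moves of T(xi_m - xi_m0)."""
    v0 = child_values[expert_index]
    return sum(T(v - v0, k) for i, v in enumerate(child_values)
               if i != expert_index)

values = [0.3, 1.2, 0.9]              # search values of the child positions
low = comparison_error(values, 1)     # expert move already ranked best
high = comparison_error(values, 0)    # expert move ranked worst
```

When the expert move already has the highest value, every difference is negative and the error is small; ranking it below other moves pushes the error up, which is exactly what the minimization penalizes.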
30,000 professional game records and 30,000 high-rating game records from SHOGI CLUB 24 were used
The weight parameters of about 10,000 feature elements were tuned
The program then won the World Computer Shogi Championship 2006
It is costly to accumulate a training data set
◦ It takes a lot of time to label positions manually
◦ Using expert records has been successful
But what if there are not enough expert records?
◦ New games
◦ Minor games
Other approaches need no training set
◦ ex. reinforcement learning (next)
Supervised Learning
Reinforcement Learning
Evolutionary Algorithm
The learner gets a "reward" from the environment
In the domain of games, the reward is the final outcome (win/lose)
Reinforcement learning requires only objective information about the game
[Figure: with only final outcomes as rewards, values such as +100 and −100 must be propagated back through every position of a long game]
Inefficient in games…
[Figure: temporal-difference learning — each position's value is updated toward the value of the next position]

TD error = r + V(s_{t+1}) − V(s_t)
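The formula above can be sketched as one TD update on a table of state values: V(s_t) is moved toward r + V(s_{t+1}). The states, reward, and learning rate below are made-up illustration values, and discounting is omitted to match the slide's formula.

```python
def td_update(V, s_t, s_next, r, alpha=0.1):
    """One temporal-difference step: V(s_t) += alpha * TD error."""
    td_error = r + V[s_next] - V[s_t]
    V[s_t] += alpha * td_error
    return td_error

V = {"mid_game": 0.0, "near_win": 0.5}
delta = td_update(V, "mid_game", "near_win", r=0.0)
```

Because each position learns from its successor's estimate rather than waiting for the final outcome, credit flows backward one step per update, which is what makes this more efficient than whole-game backups.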
Trained through self-play

Version          Features                     Strength
TD-Gammon 0.0    Raw board information        Top of computer players
TD-Gammon 1.0    Plus additional heuristics   World-championship level
Falling into a local optimum
◦ Lack of playing variation
Solutions
◦ Add intentional randomness
◦ Play against various players (computer/human)
Credit Assignment Problem (CAP)
◦ Not clear which action was effective
Supervised Learning
Reinforcement Learning
Evolutionary Algorithm
Initialize Population
Randomly Vary Individuals
Evaluate “Fitness”
Apply Selection
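The loop above can be sketched as follows. This is a generic illustration, not the experiment described next: the parameter dimension, population size, and fitness function are made-up (fitness here is closeness to a fixed target vector rather than game results).

```python
import random

def evolve(fitness, dim=3, pop_size=10, generations=20, sigma=0.1):
    # Initialize population with random parameter vectors.
    population = [[random.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Randomly vary individuals: one Gaussian-mutated offspring each.
        offspring = [[x + random.gauss(0.0, sigma) for x in ind]
                     for ind in population]
        # Evaluate fitness and apply selection: keep the best pop_size.
        candidates = population + offspring
        candidates.sort(key=fitness, reverse=True)
        population = candidates[:pop_size]
    return population

target = [0.5, -0.2, 0.8]
final = evolve(lambda ind: -sum((x - t) ** 2 for x, t in zip(ind, target)))
```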
Evolutionary algorithm for a chess player
Using an open-source chess program
◦ Attempt to tune its parameters
Make 10 initial parents
◦ Initialize parameters with random values
Create 10 offspring from each surviving parent by mutating the parental parameters:

θ'_i = θ_i + N(0, s'_i)

N(μ, σ): Gaussian random variable
s'_i: strategy parameter
Each player plays ten games against 10 randomly selected opponents
The ten best players become the parents of the next generation
Material value
Positional value
Weights and biases of three neural networks
Each network has 3 layers
◦ Input = arrangement of a specific area (front 2 rows, back 2 rows, and center 4×4 square): 16 inputs
◦ Hidden = 10 units
◦ Output = worth of the area arrangement: 1 output
Initial rating = 2066 (Expert)
◦ the rating of the open-source player
10 independent trials (each with 50 generations)
Best rating = 2437 (Senior Master)
But the program cannot yet compete with the strongest chess programs (R2800~)
Introduction
Computer Game Players
Machine Learning in Computer Game Players
Tuning Evaluation Functions
◦ Supervised Learning
◦ Reinforcement Learning
◦ Evolutionary Algorithms
Conclusion
                         Advantages                 Disadvantages
Supervised Learning      Direct and effective       Manual labeling cost
Reinforcement Learning   Wide application           Local optima, CAP
Evolutionary Algorithm   Wide application, no CAP   Indirect, random dispersion
Automatic position labeling
◦ Using records or computer play
Sophisticated rewards
◦ Consider the opponent's strength
◦ Move analysis for credit assignment
Experiments in other games