Ralf Herbrich, Thore Graepel
Applied Games Group
Microsoft Research Cambridge

Tutorial (13:00 – 15:30)

Part 1 (13:00 – 14:10)
• Why Machine Learning and Games?
• Machine Learning in Commercial Games

Coffee Break (14:10 – 14:20)

Part 2 (14:20 – 15:30)
• Reinforcement Learning
• Unsupervised Learning
• Supervised Learning
• Testbeds
• Future Challenges
Part 1 (13:00 – 14:10): Why Machine Learning and Games?
Test Beds for Machine Learning
• Perfect instrumentation and measurements
• Perfect control and manipulation
• Reduced cost
• Reduced risk
• Great way to showcase algorithms
Improve User Experience
• Create adaptive, believable game AI
• Compose great multiplayer matches based on skill and social criteria
• Mitigate network latency using prediction
• Create realistic character movement
Partially observable stochastic games
• States only partially observed
• Multiple agents choose actions
• Stochastic pay-offs and state transitions depend on the state and all the other agents’ actions
• Goal: optimise long-term pay-off (reward)
Just like life: complex, adversarial, uncertain, and we are in it for the long run!
What is the best AI?
• Always takes optimal actions
• Delivers best entertainment value
From a single player’s perspective, this is a Partially Observable Markov Decision Process (POMDP).
Approximate solutions:
• Reinforcement Learning
• Unsupervised Learning
• Supervised Learning
Machine Learning in Commercial Games
[Diagram: agents in games, non-player characters and human players, 1977 vs. 2001]
Number of games consoles sold
• Xbox 360: 19 million
• Wii: 18 million
• PS3: 10 million
Global revenue: video game software & hardware
• 2005: 8.7 billion US$
• 2006: 10.3 billion US$
• 2007: 13.7 billion US$
Hardware development
• Graphics (graphics cards, displays)
• Sound (5.1, 7.1 surround, speakers, headphones)
• CPU speed, cache
• Memory, hard disks, DVD, HD DVD, Blu-ray
• Networking (broadband)
Software development
• Graphics (rendering etc.)
• Sound (sound rendering etc.)
• Physics simulation (e.g., Havok engine)
• Networking software (e.g., Xbox Live)
• Artificial Intelligence
Creatures (1996)
• Objective is to nurture creatures called norns
• Model incorporates artificial life features
• Norns have neural network brains
• Their development can be influenced by player feedback
Black & White (2001)
• Peter Molyneux’s famous “God Game”
• Player determines the fate of villagers as their “God” (seen as a hand)
• Creature can be taught complex behaviour
• Good and Evil: actions have consequences
Colin McRae Rally 2.0 (2001)
• First car racing game to use neural networks
• Variety of tracks, drivers and road conditions
• Racing line provided by the author; the neural network keeps the car on the racing line
• Multilayer perceptrons trained with RPROP
• Simple rules for recovery and overtaking
Applications of ML in games
• Battle Cruiser: 3000AD (1996)
• Spin: Sprint Car Racing (2001)
• Virtua Fighter 4 (2001)
Source: http://www.gameai.com/games.html
Reasons for not using Machine Learning
• Designers afraid of unpredictable behaviour
• Technical difficulty of making it work
Forza Motorsport: Drivatar™
• Adaptive avatar for driving
• Separate game mode
• Basis of all in-game AI
• Basis of “dynamic” racing line
TrueSkill™
Problem
• Match outcomes between arbitrary number of teams and players per team
• Partial orderings into global orderings
• Fair competitive matchmaking
Solution
• Probabilistic model of match outcomes and skill
• Efficient Bayesian inference
• Generalisation of the Chess ranking algorithm Elo
Xbox Live
• Launched September 2005
• Every Xbox 360 game uses TrueSkill™
• > 12 million players
• > 2 million matches per day
• > 2 billion hours of accumulated game-play
Halo 3
• Launched September 2007
• Largest entertainment launch in history
• Often > 200,000 players concurrently playing
• Reference implementation of TrueSkill™
Part 2 (14:20 – 15:30): Reinforcement Learning
[Diagram: the Agent receives the game state from the Game and chooses an action; the Learning Algorithm observes the game state, the action and the reward/punishment, and sends parameter updates to the Agent.]
Q(s,a) is the expected reward for action a in state s:
• α is the rate of learning
• a is the action chosen
• r is the reward resulting from a
• s is the current state
• s’ is the state after executing a
• γ is the discount factor for future rewards
Q-Learning (off-policy):
Q(s,a) ← Q(s,a) + α [r + γ max_a’ Q(s’,a’) - Q(s,a)]
SARSA (on-policy), where a’ is the action actually chosen in s’:
Q(s,a) ← Q(s,a) + α [r + γ Q(s’,a’) - Q(s,a)]
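As a minimal sketch, the two update rules above translate directly into code. The tabular representation, state/action names, and learning constants below are illustrative assumptions, not values from the tutorial.

```python
# Sketch of the tabular Q-learning and SARSA updates shown above.
# State/action names and constants are illustrative assumptions.

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best available action in s'."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action actually chosen in s'."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

actions = ["THROW", "KICK", "STAND"]
states = ["3ft/GROUND", "2ft/GROUND"]
Q = {(s, a): 0.0 for s in states for a in actions}

# One observed transition: KICK at 3ft/GROUND earned reward 10.0.
q_learning_update(Q, "3ft/GROUND", "KICK", 10.0, "2ft/GROUND", actions)
```

The only difference between the two rules is the bootstrap term: the maximum over next actions (Q-learning) versus the action the policy actually took (SARSA).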
[Example Q-table: rows are game states (1ft–6ft, GROUND or KNOCKED), columns are actions (THROW, KICK, STAND). The 3ft/GROUND row holds values such as THROW 13.2, KICK 10.2, STAND -1.3; executing an action at 3 ft yields a reward of +10.0, and other entries shown include 3.2, 6.0, 4.0.]
Reinforcement Learner
Game state features
• Separation (5 binned ranges)
• Last action (6 categories)
• Mode (ground, air, knocked)
• Proximity to obstacle
Available actions
• 19 aggressive (kick, punch)
• 10 defensive (block, lunge)
• 8 neutral (run)
Q-function representation
• One-layer neural net (tanh)
• Linear
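The discretisation above (5 separation bins, 6 last-action categories, 3 modes, plus obstacle proximity) defines a small, enumerable state space. The sketch below shows one way such a state could be encoded as a Q-table row index; the bin boundaries and the binary obstacle flag are invented for illustration.

```python
# Sketch: encode the slide's discrete game-state features as one Q-table row.
# The bin edges and the binary obstacle flag are assumptions for illustration.
import bisect

SEPARATION_EDGES = [1.0, 2.0, 4.0, 8.0]   # feet; yields 5 bins (assumed)
MODES = ["ground", "air", "knocked"]

def state_index(separation_ft, last_action_id, mode, near_obstacle):
    sep_bin = bisect.bisect_left(SEPARATION_EDGES, separation_ft)  # 0..4
    m = MODES.index(mode)                                          # 0..2
    obs = 1 if near_obstacle else 0
    # Mixed-radix encoding: (sep, last_action, mode, obstacle) -> row index.
    return ((sep_bin * 6 + last_action_id) * 3 + m) * 2 + obs

N_STATES = 5 * 6 * 3 * 2   # 180 rows; 19+10+8 = 37 actions per row
```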
In-Game AI Code
• Reward for decrease in Wulong Goth’s health: early in the learning process … after 15 minutes of learning
• Punishment for decrease in either player’s health: early in the learning process … after 15 minutes of learning
1. Collect experience
2. Learn transition probabilities and rewards
3. Revise value function and policy
4. Revise state-action abstraction
5. Return to 1 and collect more experience
[Diagram: state abstraction over Speed and Left Distance: too coarse, just right!, too fine]
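Steps 1–3 of the loop above (collect experience, learn transition probabilities and rewards, revise the value function) can be sketched as count-based model estimation followed by value iteration. The tiny experience log and the two-state MDP it induces are invented for illustration.

```python
# Sketch of steps 1-3: estimate transition probabilities and mean rewards
# from logged (s, a, r, s') experience, then revise the value function by
# value iteration. The example experience below is invented.
from collections import defaultdict

def estimate_model(experience):
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    visits = defaultdict(int)
    for s, a, r, s2 in experience:
        counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1
    P = {sa: {s2: n / visits[sa] for s2, n in nxt.items()}
         for sa, nxt in counts.items()}
    R = {sa: total / visits[sa] for sa, total in reward_sum.items()}
    return P, R

def value_iteration(P, R, states, actions, gamma=0.9, iters=100):
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(R.get((s, a), 0.0) +
                    gamma * sum(p * V[s2]
                                for s2, p in P.get((s, a), {}).items())
                    for a in actions)
             for s in states}
    return V

experience = [("A", "go", 1.0, "B"), ("A", "go", 1.0, "B"),
              ("A", "go", 0.0, "A")]
P, R = estimate_model(experience)
V = value_iteration(P, R, states=["A", "B"], actions=["go"])
```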
[Diagram: representational complexity is adapted by splitting and merging abstract states.]
Real-time racing simulation. Goal: as fast lap times as possible.
• Laser range finder measurements as features
• Progress along track as reward
• Actions: Coast, Accelerate, Brake, Hard-Left, Hard-Right, Soft-Left, Soft-Right
Current games have unrealistic physical movement
• Moonwalk
• Hovering
• Only death scenes are realistic: rag-doll physics releases joint constraints
Compromise between hard-wired and learned controller
• Motion sequencer with corrections
• FOX controller: based on cerebellar model articulation controller (CMAC) neural network trained by reinforcement learning
• Can follow paths and climb up and down slopes
• Trained monopeds (“hopper”) and bipeds
[Video clips: hopper training, hopper trained, biped training, biped trained]
Unsupervised Learning
Motion capture
• Fix markers at key body positions
• Record their position in 3D during motion
• Fundamental technology in animation today
Free download of mocap files: www.bvhfiles.com
Gaussian Process Latent Variable Model (GPLVM)
• Generative model for dimensionality reduction
• Probabilistic equivalent to PCA which defines a probability distribution over data
• Non-linear manifolds based on kernels
• Visualisation of high-dimensional data
• Back-projection from latent to data space
• Can deal with missing data
[Diagram: latent variables x map through weight matrix W to data y.]
• SPCA: marginalise over x and optimise W
• GPLVM: marginalise over W and optimise x
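The SPCA bullet above has a well-known closed form: marginalising the latent x gives y ~ N(0, W Wᵀ + σ²I), and the maximum-likelihood W is recovered from the top eigenvectors of the sample covariance (Tipping & Bishop's probabilistic PCA). Below is a minimal numpy sketch on synthetic data; the true loading matrix and noise level are invented for the example.

```python
# Sketch of maximum-likelihood probabilistic PCA (the SPCA bullet above):
# marginalising x gives y ~ N(0, W W^T + sigma^2 I); the ML solution uses
# the top eigenvectors of the sample covariance. Data here is synthetic.
import numpy as np

def ppca_ml(Y, q):
    """Y: (n, d) centred data; q: latent dimension."""
    n, d = Y.shape
    S = Y.T @ Y / n                     # sample covariance
    vals, vecs = np.linalg.eigh(S)      # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]
    sigma2 = vals[q:].mean()            # discarded variance -> noise level
    W = vecs[:, :q] * np.sqrt(np.maximum(vals[:q] - sigma2, 0.0))
    return W, sigma2

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))                      # 1-D latent variable
W_true = np.array([[2.0], [1.0], [0.0]])           # true loading (assumed)
Y = x @ W_true.T + 0.1 * rng.normal(size=(500, 3))
Y -= Y.mean(axis=0)
W, sigma2 = ppca_ml(Y, q=1)
```

The GPLVM flips this: it marginalises W instead and optimises the latent positions x, which is what makes kernelised, non-linear manifolds possible.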
Supervised Learning
Goal: learn from skilled players how to act in a first-person shooter (FPS) game
Test environment:
• Unreal Tournament FPS game engine
• Gamebots control framework
Idea: naive Bayes classifier to learn under which circumstances to switch behaviour
Variables:
• St: bot’s state at time t
• St+1: bot’s state at time t+1
• H: health level
• W: weapon
• OW: opponent’s weapon
• HN: hear noise
• NE: number of close enemies
• PW: weapon close by?
• PH: health pack close by?
Naive Bayes (“inverse programming”)
• Specify probabilities P(H|St+1) and P(St+1|St)
• Use Bayes’ rule for P(St+1|H,W,...) etc.
Supervised learning
• Recognition of the state of the human trainer
• Reading out variables from the game engine
• Determine relative frequencies to estimate probabilities P(H|St+1) (table)
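A minimal sketch of the relative-frequency approach above: transition and feature probabilities are estimated by counting over logged human play, and Bayes' rule (with the naive independence assumption) scores candidate next states. The class name, the add-one smoothing, and the toy log are assumptions for illustration, not the tutorial's implementation.

```python
# Sketch of the naive Bayes behaviour switcher described above: relative
# frequencies estimate P(s'|s) and P(feature|s'); Bayes' rule picks s'.
# Class name, add-one smoothing, and the toy log are assumptions.
from collections import Counter

class BehaviourSwitcher:
    def fit(self, log):
        # log: list of (state, next_state, features) with features a dict
        self.trans = Counter()        # (s, s') transition counts
        self.state_count = Counter()  # s' counts
        self.feat = Counter()         # (s', feature, value) counts
        for s, s2, f in log:
            self.trans[(s, s2)] += 1
            self.state_count[s2] += 1
            for name, value in f.items():
                self.feat[(s2, name, value)] += 1
        return self

    def score(self, s, s2, f):
        # Unnormalised P(s'|s, features) under the naive Bayes assumption.
        p = self.trans[(s, s2)] + 1
        for name, value in f.items():
            p *= (self.feat[(s2, name, value)] + 1) / (self.state_count[s2] + 2)
        return p

    def next_state(self, s, f, states):
        return max(states, key=lambda s2: self.score(s, s2, f))

# Toy log: the human retreats at low health and keeps attacking otherwise.
log = [("attack", "retreat", {"H": "low"}),
       ("attack", "retreat", {"H": "low"}),
       ("attack", "attack", {"H": "high"}),
       ("attack", "attack", {"H": "high"})]
nb = BehaviourSwitcher().fit(log)
```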
Forza Motorsport: Drivatar™
• Adaptive avatar for driving
• Separate game mode
• Basis of all in-game AI
• Basis of “dynamic” racing line
[Diagram: recorded player driving feeds the Drivatar learning system, which produces a behaviour model (Drivatar racing line); together with vehicle interaction and racing strategy, the “built-in” AI behaviour development tool, and the controller, this yields the car behaviour: Drivatar AI driving.]
Two-phase process:
1. Pre-generate possible racing lines prior to the race from a (compressed) racing table.
2. Switch between the lines during the race to add variability.
Compression reduces the memory needed per racing line segment; switching makes the racing lines smoother.
[Diagram: a racing line is built from segments a1, a2, a3, a4.]
Physics (“causal”): the physics simulation system maps the car position and speed at time t, static car properties, and the car controls to the car position and speed at time t+1.
Control (“inverse”): the controller maps the car position and speed at time t, static car properties, and the desired position and speed at time t+1 to the car controls.
Given:
• Match outcomes: orderings among k teams consisting of n1, n2, ..., nk players, respectively
Questions:
• Skill si for each player
• Global ranking among all players
• Fair matches between teams of players
[Factor graph for two players: skills s1, s2; performances p1, p2; outcome y12. Possible outcomes: player 1 wins over player 2 (and vice versa).]
[Factor graph: Gaussian prior factors over skills s1, s2, s3, s4; team performance variables t1, t2, t3; ranking likelihood factors over outcomes y12, y23.]
Fast and efficient approximate message passing using Expectation Propagation.
Leaderboard
• Global ranking of all players
Matchmaking
• For gamers: most uncertain outcome
• For inference: most informative
• Both are equivalent!
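Since TrueSkill generalises the Elo algorithm, the "most uncertain outcome" criterion above can be illustrated with plain Elo. The scale 400 and K = 32 below are conventional Elo constants, not TrueSkill parameters.

```python
# Sketch: the Elo update that TrueSkill generalises, plus the matchmaking
# idea above -- prefer the pairing whose outcome is most uncertain.
# Scale 400 and K=32 are conventional Elo constants, not TrueSkill values.

def elo_expected(r1, r2):
    """Probability that player 1 beats player 2."""
    return 1.0 / (1.0 + 10.0 ** ((r2 - r1) / 400.0))

def elo_update(r1, r2, outcome, K=32.0):
    """outcome: 1 if player 1 won, 0 if player 2 won, 0.5 for a draw."""
    delta = K * (outcome - elo_expected(r1, r2))
    return r1 + delta, r2 - delta

def match_uncertainty(r1, r2):
    """1.0 for a perfectly even match, near 0 for a foregone conclusion."""
    return 1.0 - abs(2.0 * elo_expected(r1, r2) - 1.0)

r1, r2 = elo_update(1500.0, 1500.0, 1)   # evenly matched; player 1 wins
```

TrueSkill replaces the single rating by a Gaussian belief (mu, sigma) per player and updates it by message passing, which also handles teams and multi-way orderings.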
[Plot: level (0–40) vs. number of games (0–400) for two players, char and SQLWildman, comparing TrueSkill™ estimates with the Halo 2 rank.]
Testbeds
TORCS
• Multi-platform car racing simulation (Linux, FreeBSD, MacOSX and Windows) under GPL
• 50 different cars, more than 20 tracks, and 50 opponents
• Lighting, smoke, skid-marks and glowing brake disks
• Damage model, collisions, tire and wheel properties, aerodynamics
• http://torcs.sourceforge.net/
• Next competition: Car Racing @ CIG 2008!
NERO
• Based on real-time NeuroEvolution of Augmenting Topologies (rtNEAT)
• Goals of NERO:
• Demonstrate the power of state-of-the-art machine learning technology
• Create an engaging game based on it
• Provide a robust and challenging development and benchmarking domain for AI researchers
• http://www.nerogame.org/
RoboCup Soccer Server
• Each simulated robot player has its own play strategy
• Every simulated team consists of a collection of programs
• The games last for about 10 minutes
Sensors
• Aural: limited capacity, limited range
• Visual: field of view, range of view, noisy
• Physical
Actions
• Catch (goalie only)
• Dash
• Kick
• Move
• Say
• Turn
• Turn neck
More information: http://sserver.sourceforge.net/
Halo 2 Beta
• Available at the APG web page
• 4 datasets of game outcomes gathered during the Beta testing period for the Xbox game Halo 2
• 120,000 match outcomes between 6,000 players
Chess Rankings
• Available at the ChessBase web page
• 3.75 million Chess games from 1560 – 2007
• 220,000 (semi-professional) players
• Largest annotated collection of Chess games in the world
Future Challenges
Mentioned in this tutorial
• Adaptive and learning game AI
• Realistic physical movement
• Online gaming interactions
Not mentioned in this tutorial
• Adaptive input devices
• Dialogue generation
• Computer vision
• New game genres based on machine learning
Computer games can be used as test beds for machine learning research. Machine learning can be used to improve the user experience in computer games. Both research and applications are in their infancy and there are many open questions. Game frameworks exist to plug in machine learning algorithms, but it is never easy. But fun...