Ralf Herbrich, Thore Graepel
Applied Games Group
Microsoft Research Cambridge

Tutorial (13:00 – 15:30)

Part 1 (13:00 – 14:10)
• Why Machine Learning and Games?
• Machine Learning in Commercial Games

Coffee Break (14:10 – 14:20)

Part 2 (14:20 – 15:30)
• Reinforcement Learning
• Unsupervised Learning
• Supervised Learning
• Testbeds
• Future Challenges
Why Machine Learning and Games?
Test Beds for Machine Learning
• Perfect instrumentation and measurements
• Perfect control and manipulation
• Reduced cost
• Reduced risk
• Great way to showcase algorithms

Improve User Experience
• Create adaptive, believable game AI
• Compose great multiplayer matches based on skill and social criteria
• Mitigate network latency using prediction
• Create realistic character movement

Partially observable stochastic games
 States are only partially observed
 Multiple agents choose actions
 Stochastic pay-offs and state transitions depend on the state and all the other agents' actions
 Goal: optimise the long-term pay-off (reward)

Just like life: complex, adversarial, uncertain, and we are in it for the long run!
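In symbols (a standard discounted-return formulation; the slides state the goal only in words), each agent seeks a policy maximising its expected long-term pay-off

  E[ Σ_{t=0..∞} γ^t r_t ],  0 ≤ γ < 1,

where r_t is the agent's pay-off at time t and the discount factor γ trades off immediate against future rewards.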
From a single player's perspective: a Partially Observable Markov Decision Process (POMDP).

Approximate solutions:
• Reinforcement Learning
• Unsupervised Learning
• Supervised Learning

What is the best AI?
• Always takes optimal actions
• Delivers the best entertainment value
Machine Learning in Commercial Games
[Figure: game agents then and now — non-player characters and human players, 1977 vs. 2001.]
Number of games consoles sold
• Xbox 360: 19 million
• Wii: 18 million
• PS3: 10 million

Global revenue: video game software & hardware
• 2005: 8.7 billion US$
• 2006: 10.3 billion US$
• 2007: 13.7 billion US$
Hardware development
• Graphics (graphics cards, displays)
• Sound (5.1/7.1 surround, speakers, headphones)
• CPU speed, cache
• Memory, hard disks, DVD, HD DVD, Blu-ray
• Networking (broadband)

Software development
• Graphics (rendering etc.)
• Sound (sound rendering etc.)
• Physics simulation (e.g., Havok engine)
• Networking software (e.g., Xbox Live)
• Artificial intelligence
Creatures (1996)
 Objective is to nurture creatures called norns
 Model incorporates artificial-life features
 Norns have neural network brains
 Their development can be influenced by player feedback
Black & White (2001)
 Peter Molyneux's famous "god game"
 Player determines the fate of villagers as their "god" (seen as a hand)
 The creature can be taught complex behaviour
 Good and evil: actions have consequences
Colin McRae Rally 2.0 (2001)
 First car racing game to use neural networks
 Variety of tracks, drivers and road conditions
 Racing line provided by the author; a neural network keeps the car on the racing line
 Multilayer perceptrons trained with RPROP
 Simple rules for recovery and overtaking
Applications of ML in games
• Battle Cruiser: 3000AD (1996)
• Spin: Sprint Car Racing (2001)
• Virtua Fighter 4 (2001)
Source: http://www.gameai.com/games.html
Reasons for not using machine learning
• Designers are afraid of unpredictable behaviour
• Technical difficulty of making it work
Drivatar™ in Forza Motorsport
• Adaptive avatar for driving (a separate game mode)
• Basis of all in-game AI
• Basis of the "dynamic" racing line
Problem
• Match outcomes between an arbitrary number of teams and players per team
• Partial orderings into global orderings
• Fair competitive matchmaking

Solution
• Probabilistic model of match outcomes and skill
• Efficient Bayesian inference
• Generalisation of the chess ranking algorithm Elo

Xbox Live
• Launched September 2005
• Every Xbox 360 game uses TrueSkill™
• > 12 million players
• > 2 million matches per day
• > 2 billion hours of accumulated game-play

Halo 3
• Launched September 2007
• Largest entertainment launch in history
• Often > 200,000 players playing concurrently
• Reference implementation of TrueSkill™
Reinforcement Learning
[Diagram: the reinforcement learning loop — the agent observes the game state and sends an action to the game; the game returns a reward or punishment together with the new game state; the learning algorithm uses these to update the agent's parameters.]
Q-learning (off-policy):

  Q(s,a) ← Q(s,a) + α [ r + γ max_{a'} Q(s',a') − Q(s,a) ]

SARSA (on-policy):

  Q(s,a) ← Q(s,a) + α [ r + γ Q(s',a') − Q(s,a) ]

where
• Q(s,a) is the expected reward for taking action a in state s
• α is the learning rate
• a is the action chosen
• r is the reward resulting from a
• s is the current state
• s' is the state after executing a
• a' is the next action (the maximising action in Q-learning; the action actually chosen by the policy in SARSA)
• γ is the discount factor for future rewards
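Both update rules translate directly into code. Below is a minimal tabular sketch in Python; states and actions are arbitrary hashable keys (the fighting-game states of the next slide would plug straight in), and ALPHA, GAMMA, EPSILON and the epsilon-greedy exploration policy are illustrative assumptions, not values from the tutorial.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

Q = defaultdict(float)                  # Q[(s, a)] -> expected discounted reward

def epsilon_greedy(s, actions):
    """Explore with probability EPSILON, otherwise exploit the Q-table."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def q_learning_update(s, a, r, s_next, actions):
    """Off-policy: bootstrap from the best available action in s'."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy: bootstrap from the action the policy actually chose in s'."""
    target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```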
[Figure: example Q-table — rows are game states (distance 1–6 ft crossed with mode GROUND/KNOCKED), columns are the actions THROW, KICK and STAND; entries are Q-values such as 13.2, 10.2, −1.3 for the 3 ft/GROUND state, with an example reward of +10.0.]
Reinforcement learner

Game state features
• Separation (5 binned ranges)
• Last action (6 categories)
• Mode (ground, air, knocked)
• Proximity to obstacle

Available actions
• 19 aggressive (kick, punch)
• 10 defensive (block, lunge)
• 8 neutral (run)

Q-function representation
• One-layer neural net (tanh)
• Linear

[Diagram: the reinforcement learner sits alongside the in-game AI code, which supplies game states and executes the chosen actions.]
[Videos: reward for a decrease in Wulong Goth's health — early in the learning process vs. after 15 minutes of learning.]
[Videos: punishment for a decrease in either player's health — early in the learning process vs. after 15 minutes of learning.]
1. Collect experience
2. Learn transition probabilities and rewards
3. Revise the value function and policy
4. Revise the state–action abstraction
5. Return to 1 and collect more experience

[Figure: a state abstraction over Speed and Left Distance at three granularities — too coarse, just right, too fine — with split and merge operations trading off representational complexity.]
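The slides do not spell out the split/merge criteria, so the fragment below is only an illustrative sketch of the idea for a one-dimensional abstraction (say, over Left Distance): split a cell when the returns observed in it vary too much ("too coarse"), merge neighbouring cells whose value estimates barely differ ("too fine"). SPLIT_VAR, MERGE_DIFF and the sample threshold are invented.

```python
import statistics

SPLIT_VAR, MERGE_DIFF = 4.0, 0.1

def refine(cells, samples):
    """cells: sorted list of (lo, hi) intervals over one state variable;
    samples: dict mapping a cell to the returns observed inside it."""
    out = []
    for lo, hi in cells:
        rs = samples.get((lo, hi), [])
        if len(rs) > 10 and statistics.pvariance(rs) > SPLIT_VAR:
            mid = (lo + hi) / 2.0              # too coarse: split in half
            out += [(lo, mid), (mid, hi)]
        else:
            out.append((lo, hi))
    merged = [out[0]]
    for lo, hi in out[1:]:
        plo, phi = merged[-1]
        v_prev = statistics.fmean(samples.get((plo, phi), [0.0]))
        v_cur = statistics.fmean(samples.get((lo, hi), [0.0]))
        if abs(v_prev - v_cur) < MERGE_DIFF:   # too fine: merge neighbours
            merged[-1] = (plo, hi)
        else:
            merged.append((lo, hi))
    return merged
```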
 Real-time racing simulation
 Goal: lap times as fast as possible
 Laser range finder measurements as features
 Progress along the track as reward
 Available actions:
• Coast
• Accelerate
• Brake
• Hard-Left
• Hard-Right
• Soft-Left
• Soft-Right
 Current games have unrealistic physical movement
• Moonwalking
• Hovering
 Only death scenes are realistic
• Rag-doll physics releases the joint constraints
 Compromise between a hard-wired and a learned controller
 Motion sequencer with corrections
 FOX controller: based on a cerebellar model articulation controller (CMAC) neural network trained by reinforcement learning
 Can follow paths and climb up and down slopes
 Trained monopeds ("hopper") and bipeds
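The slide names CMAC without further detail, so here is a generic tile-coding (CMAC) function approximator in Python as a reminder of what such a network computes: several randomly offset tilings each activate one cell, and the prediction is the sum of the active cells' weights. The layout, learning rate and the assumption that inputs are normalised to [0, 1] are illustrative, not the FOX controller's actual configuration.

```python
import numpy as np

class CMAC:
    """Tile-coding approximator: overlapping tilings with one active cell each."""

    def __init__(self, n_tilings=8, tiles_per_dim=10, dims=2, lr=0.2, seed=0):
        self.n_tilings, self.tiles, self.lr = n_tilings, tiles_per_dim, lr
        rng = np.random.default_rng(seed)
        # Random offset per tiling so the tilings overlap rather than coincide.
        self.offsets = rng.uniform(0, 1.0 / tiles_per_dim, (n_tilings, dims))
        self.w = np.zeros((n_tilings,) + (tiles_per_dim,) * dims)

    def _active(self, x):
        """Yield the index of the active cell in each tiling (x in [0,1]^dims)."""
        for t in range(self.n_tilings):
            idx = np.minimum(((x + self.offsets[t]) * self.tiles).astype(int),
                             self.tiles - 1)
            yield (t, *idx)

    def predict(self, x):
        return sum(self.w[i] for i in self._active(np.asarray(x, float)))

    def train(self, x, target):
        """Distribute the prediction error equally over the active cells."""
        err = target - self.predict(x)
        for i in self._active(np.asarray(x, float)):
            self.w[i] += self.lr * err / self.n_tilings
```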
[Videos: the hopper during and after training; the biped during and after training.]
Unsupervised Learning
Motion capture
 Fix markers at key body positions
 Record their positions in 3D during motion
 Fundamental technology in animation today
 Free download of mocap files: www.bvhfiles.com
Gaussian Process Latent Variable Model (GPLVM)
 Generative model for dimensionality reduction
 Probabilistic equivalent of PCA which defines a probability distribution over data
 Non-linear manifolds based on kernels
 Visualisation of high-dimensional data
 Back-projection from latent space to data space
 Can deal with missing data

[Diagram: latent variables x are mapped through the weight matrix W to the observed data y.]
• SPCA: marginalise over x and optimise W
• GPLVM: marginalise over W and optimise x
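To make the "marginalise over W, optimise x" contrast concrete, here is a sketch of the GPLVM objective in Python: the negative log marginal likelihood of the data Y given the latent positions X, minimised over X. The RBF kernel, fixed hyperparameters, PCA-style initialisation and the off-the-shelf optimiser (with numerical gradients) are simplifying assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def gplvm_nll(x_flat, Y, q, noise=0.1, lengthscale=1.0):
    """Negative log marginal likelihood of Y (N x D) given latents X (N x q)."""
    N, D = Y.shape
    X = x_flat.reshape(N, q)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq / lengthscale ** 2) + noise * np.eye(N)
    _, logdet = np.linalg.slogdet(K)
    # 0.5 * D * log|K| + 0.5 * tr(K^-1 Y Y^T), dropping constants.
    return 0.5 * D * logdet + 0.5 * np.trace(np.linalg.solve(K, Y @ Y.T))

def fit_gplvm(Y, q=2, iters=200):
    N = Y.shape[0]
    U, S, _ = np.linalg.svd(Y - Y.mean(0), full_matrices=False)
    x0 = (U[:, :q] * S[:q]).ravel()      # initialise at principal components
    res = minimize(gplvm_nll, x0, args=(Y, q), method="L-BFGS-B",
                   options={"maxiter": iters})
    return res.x.reshape(N, q)           # low-dimensional latents, e.g. for visualisation
```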
Supervised Learning
 Goal: learn from skilled players how to act in a first-person shooter (FPS) game
 Test environment:
• Unreal Tournament FPS game engine
• Gamebots control framework
 Idea: a naive Bayes classifier learns under which circumstances to switch behaviour
Variables
• St: bot's state at time t
• St+1: bot's state at time t+1
• H: health level
• W: weapon
• OW: opponent's weapon
• HN: hear noise
• NE: number of close enemies
• PW: weapon close by?
• PH: health pack close by?

Naive Bayes ("inverse programming")
• Specify the probabilities P(H | St+1) and P(St+1 | St)
• Use Bayes' rule for P(St+1 | H, W, ...) etc.

Supervised learning
• Recognition of the state of the human trainer
• Reading out variables from the game engine
• Determine relative frequencies to estimate the probabilities P(H | St+1) (table)
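A minimal sketch of this classifier in Python, assuming invented behaviour states and a crude Laplace smoothing; in the real setup the variables (H, W, OW, ...) are read out of the game engine, and the counts come from watching a human trainer.

```python
from collections import Counter, defaultdict

STATES = ["attack", "retreat", "collect_health", "collect_weapon"]  # assumed

transition = defaultdict(Counter)   # transition[s][s'] counts from traces
emission = defaultdict(Counter)     # emission[(var, s')][value] counts

def learn(traces):
    """traces: iterable of (s, s_next, obs) triples from the human trainer,
    where obs maps variable names ('H', 'W', ...) to observed values."""
    for s, s_next, obs in traces:
        transition[s][s_next] += 1
        for var, val in obs.items():
            emission[(var, s_next)][val] += 1

def prob(counter, key, k=10):
    """Laplace-smoothed relative frequency; k stands in for the domain size."""
    return (counter[key] + 1) / (sum(counter.values()) + k)

def next_state(s, obs):
    """argmax over s' of P(s'|s) * prod_i P(o_i|s') -- Bayes' rule up to a constant."""
    def score(s_next):
        p = prob(transition[s], s_next)
        for var, val in obs.items():
            p *= prob(emission[(var, s_next)], val)
        return p
    return max(STATES, key=score)
```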
Drivatar™ in Forza Motorsport
• Adaptive avatar for driving (a separate game mode)
• Basis of all in-game AI
• Basis of the "dynamic" racing line
[Architecture diagram: recorded player driving is fed into the Drivatar learning system — the "built-in" AI behaviour development tool — which learns the Drivatar racing line and a behaviour model for vehicle interaction and racing strategy; a controller then turns these into car behaviour, producing the Drivatar AI driving.]
 Two-phase process:
1. Pre-generate possible racing lines prior to the race from a (compressed) racing table.
2. Switch between the lines during the race to add variability.
 Compression reduces the memory needed per racing line segment.
 Switching makes for smoother racing lines.
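The slides give only the two phases, so this fragment is one possible realisation, not Drivatar's actual scheme: each track segment carries several pre-generated candidate lines, the active candidate is switched at random, and joins are linearly blended for smoothness. switch_prob and blend_steps are invented parameters.

```python
import random

def build_line(candidates, switch_prob=0.2, blend_steps=10):
    """candidates: one list per track segment, each holding alternative
    racing-line segments (lists of lateral offsets along the track)."""
    line, current = [], 0
    for seg_options in candidates:
        if random.random() < switch_prob:          # phase 2: switch lines
            current = random.randrange(len(seg_options))
        seg = list(seg_options[current % len(seg_options)])
        if line:
            # Blend the first few samples towards the previous segment's end
            # so a switch does not produce a kink in the driven line.
            for i in range(min(blend_steps, len(seg))):
                t = (i + 1) / (blend_steps + 1)
                seg[i] = (1 - t) * line[-1] + t * seg[i]
        line.extend(seg)
    return line
```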
[Diagram: racing line segments a1–a4. Physics ("causal"): the car position and speed at time t, the car controls, and the static car properties feed the physics simulation system, which yields the car position and speed at time t+1. Control ("inverse"): the controller is given the desired position and speed at time t+1 and must find the car controls that achieve them.]
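The "inverse" box can be read as a small optimisation problem: find the controls that make the causal physics step land on the desired next state. A toy numerical sketch, where the physics function f and the control bounds are placeholders:

```python
import numpy as np
from scipy.optimize import minimize

def invert_controls(f, state, desired_next, u0, bounds):
    """Search for controls u such that f(state, u) is close to desired_next."""
    cost = lambda u: np.sum((f(state, u) - desired_next) ** 2)
    return minimize(cost, u0, bounds=bounds, method="L-BFGS-B").x

# Toy "physics": controls are (throttle, steering) nudging position and heading.
f = lambda s, u: s + np.array([0.5 * u[0], 0.1 * u[1]])
u = invert_controls(f, np.zeros(2), np.array([0.3, -0.05]),
                    u0=np.zeros(2), bounds=[(-1, 1), (-1, 1)])
```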
 Given:
• Match outcomes: orderings among k teams consisting of n1, n2, ..., nk players, respectively
 Questions:
• A skill si for each player, such that we obtain
• a global ranking among all players, and
• fair matches between teams of players
 Possible outcomes: Player 1 wins over Player 2 (and vice versa)
[Factor graphs: in the two-player case, skills s1 and s2 generate performances p1 and p2, which determine the outcome y12. In the team case, Gaussian prior factors sit on the skills s1, ..., s4, team performances t1, t2, t3 aggregate them, and ranking likelihood factors connect them to the observed outcomes y12 and y23.]
Inference: fast and efficient approximate message passing using Expectation Propagation.
 Leaderboard
• Global ranking of all players
 Matchmaking
• For gamers: the most uncertain outcome
• For inference: the most informative match
• Both are equivalent!
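For the special case of two players and no draws, the published TrueSkill update equations fit in a few lines of Python. The constants below are the conventional defaults (mu = 25, sigma = 25/3, beta = 25/6) rather than anything specific to Xbox Live, and team play and draw margins are omitted.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()
BETA = 25.0 / 6.0                     # performance noise

def v(t):                             # additive mean correction
    return N.pdf(t) / N.cdf(t)

def w(t):                             # multiplicative variance correction
    return v(t) * (v(t) + t)

def update(winner, loser):
    """winner, loser: (mu, sigma) skill beliefs; returns the updated pairs."""
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = sqrt(2 * BETA ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c
    mu_w += sig_w ** 2 / c * v(t)     # winner's mean moves up
    mu_l -= sig_l ** 2 / c * v(t)     # loser's mean moves down
    sig_w *= sqrt(max(1 - sig_w ** 2 / c ** 2 * w(t), 1e-9))
    sig_l *= sqrt(max(1 - sig_l ** 2 / c ** 2 * w(t), 1e-9))
    return (mu_w, sig_w), (mu_l, sig_l)

# Example: two fresh players -- update((25.0, 25/3), (25.0, 25/3))
```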
[Chart: level (0–40) versus number of games played (0–400) for the players "char" and "SQLWildman", each tracked under both TrueSkill™ and the Halo 2 rank.]
Testbeds
TORCS
 Multi-platform car racing simulation (Linux, FreeBSD, MacOS X and Windows) under the GPL
 50 different cars, more than 20 tracks, and 50 opponents
 Lighting, smoke, skid marks and glowing brake disks
 Damage model, collisions, tire and wheel properties, aerodynamics
 http://torcs.sourceforge.net/
 Next competition: Car Racing @ CIG 2008!
NERO
 Based on real-time NeuroEvolution of Augmenting Topologies (rtNEAT)
 Goals of NERO:
• Demonstrate the power of state-of-the-art machine learning technology
• Create an engaging game based on it
• Provide a robust and challenging development and benchmarking domain for AI researchers
 http://www.nerogame.org/
RoboCup Simulation League
 Each simulated robot player has its own play strategy
 Every simulated team consists of a collection of programs
 Games last about 10 minutes

Sensors
• Aural: limited capacity, limited range
• Visual: field of view, range of view, noisy
• Physical

Actions
• Catch (goalie only)
• Dash
• Kick
• Move
• Say
• Turn
• Turn neck

More information: http://sserver.sourceforge.net/
Halo 2 Beta
• Available at the APG web page
• 4 datasets of game outcomes gathered during the beta-testing period of the Xbox game Halo 2
• 120,000 match outcomes between 6,000 players

Chess rankings
• Available at the ChessBase web page
• 3.75 million chess games from 1560 – 2007
• 220,000 (semi-professional) players
• Largest annotated collection of chess games in the world
Future Challenges
Mentioned in this tutorial
• Adaptive and learning game AI
• Realistic physical movement
• Online gaming interactions

Not mentioned in this tutorial
• Adaptive input devices
• Dialogue generation
• Computer vision
• New game genres based on machine learning
Summary
 Computer games can be used as test beds for machine learning research.
 Machine learning can be used to improve the user experience in computer games.
 Both research and applications are in their infancy and there are many open questions.
 Game frameworks exist to plug in machine learning algorithms, but it is never easy.
 But fun...