Ralf Herbrich, Thore Graepel
Applied Games Group, Microsoft Research Cambridge

Tutorial (13:00 – 15:30)
Part 1 (13:00 – 14:10)
• Why Machine Learning and Games?
• Machine Learning in Commercial Games
• Reinforcement Learning
Coffee Break (14:10 – 14:20)
Part 2 (14:20 – 15:30)
• Unsupervised Learning
• Supervised Learning
• Testbeds
• Future Challenges
Why Machine Learning and Games?

Test Beds for Machine Learning
• Perfect instrumentation and measurement
• Perfect control and manipulation
• Reduced cost
• Reduced risk
• Great way to showcase algorithms

Improve User Experience
• Create adaptive, believable game AI
• Compose great multiplayer matches based on skill and social criteria
• Mitigate network latency using prediction
• Create realistic character movement

Partially observable stochastic games
• States are only partially observed
• Multiple agents choose actions
• Stochastic pay-offs and state transitions depend on the state and on all the other agents' actions
• Goal: optimise the long-term pay-off (reward)
• Just like life: complex, adversarial, uncertain, and we are in it for the long run!

What is the best AI?
• Always takes optimal actions
• Delivers the best entertainment value

Approximate solutions, from a single player's perspective:
• Partially Observable Markov Decision Processes (POMDPs)
• Reinforcement Learning
• Unsupervised Learning
• Supervised Learning
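To make the single-player view of this formulation concrete, here is a minimal, illustrative sketch (not from the tutorial; the toy game and all names such as ToyShooterDuel are hypothetical). The true state is hidden behind a noisy observation, transitions and pay-offs are stochastic, and the agent optimises a discounted long-term pay-off.

```python
import random

class ToyShooterDuel:
    """Toy partially observable stochastic game: the hidden state is the
    opponent's position; the agent only sees a noisy reading of it."""

    def reset(self):
        self.opponent_pos = random.choice(["left", "right"])
        return self._observe()

    def _observe(self):
        # Partial observability: the reading is wrong 20% of the time.
        if random.random() < 0.2:
            return random.choice(["left", "right"])
        return self.opponent_pos

    def step(self, action):
        # The opponent is an agent too: it may move before the shot lands
        # (stochastic state transition).
        if random.random() < 0.3:
            self.opponent_pos = "left" if self.opponent_pos == "right" else "right"
        reward = 1.0 if action == self.opponent_pos else -1.0
        return self._observe(), reward

def discounted_return(game, policy, horizon=100, gamma=0.95):
    """The long-term pay-off the agent is trying to optimise."""
    obs = game.reset()
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        obs, reward = game.step(policy(obs))  # act on partial information only
        total += discount * reward
        discount *= gamma
    return total

# Example: a reactive policy that shoots where it last saw the opponent.
print(discounted_return(ToyShooterDuel(), policy=lambda obs: obs))
```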
Machine Learning in Commercial Games

[Slide: agents in games, from non-player characters (1977) to human players (2001).]

Number of games consoles sold
• Xbox 360: 19 million
• Wii: 18 million
• PS3: 10 million

Global revenue: video game software and hardware
• 2005: 8.7 billion US$
• 2006: 10.3 billion US$
• 2007: 13.7 billion US$

Hardware development
• Graphics (graphics cards, displays)
• Sound (5.1 and 7.1 surround, speakers, headphones)
• CPU speed, cache
• Memory, hard disks, DVD, HD DVD, Blu-ray
• Networking (broadband)

Software development
• Graphics (rendering etc.)
• Sound (sound rendering etc.)
• Physics simulation (e.g., the Havok engine)
• Networking software (e.g., Xbox Live)
• Artificial intelligence

Creatures
• Objective is to nurture creatures called Norns
• Model incorporates artificial-life features
• Norns have neural network brains
• Their development can be influenced by player feedback

Black & White
• Peter Molyneux's famous "god game"
• Player determines the fate of villagers as their "god" (seen as a hand)
• The creature can be taught complex behaviour
• Good and evil: actions have consequences

Colin McRae Rally 2.0
• First car racing game to use neural networks
• Variety of tracks, drivers and road conditions
• Racing line provided by the author; a neural network keeps the car on the racing line
• Multilayer perceptrons trained with RPROP
• Simple rules for recovery and overtaking

Other applications of ML in games
• Battle Cruiser: 3000AD (1996)
• Spin: Sprint Car Racing (2001)
• Virtua Fighter 4 (2001)
Source: http://www.gameai.com/games.html

Reasons for not using machine learning
• Designers are afraid of unpredictable behaviour
• Technical difficulty of making it work

Drivatar (Forza Motorsport)
• Adaptive avatar for driving
• Separate game mode
• Basis of all in-game AI
• Basis of the "dynamic" racing line

TrueSkill™
Problem
• Match outcomes between an arbitrary number of teams and players per team
• Turn partial orderings into global orderings
• Fair, competitive matchmaking
Solution
• Probabilistic model of match outcomes and skill
• Efficient Bayesian inference
• Generalisation of the chess rating system Elo

Xbox Live
• Launched September 2005
• Every Xbox 360 game uses TrueSkill™
• More than 12 million players
• More than 2 million matches per day
• More than 2 billion hours of accumulated game-play

Halo 3
• Launched September 2007
• Largest entertainment launch in history
• Often more than 200,000 players concurrently playing
• Reference implementation of TrueSkill™

Reinforcement Learning

[Diagram: the agent receives the game state and a reward or punishment from the game and chooses an action; a learning algorithm updates the agent's parameters.]

Q-learning and SARSA
• Q(s,a) is the expected reward for action a in state s
• α is the learning rate
• a is the action chosen
• r is the reward resulting from a
• s is the current state
• s′ is the state after executing a
• γ is the discount factor for future rewards

Q-learning (off-policy):
Q(s,a) ← Q(s,a) + α [ r + γ max_a′ Q(s′,a′) − Q(s,a) ]

SARSA (on-policy):
Q(s,a) ← Q(s,a) + α [ r + γ Q(s′,a′) − Q(s,a) ]
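To make the two update rules concrete, here is a minimal tabular sketch in Python. The action set, the constants and the epsilon-greedy exploration are illustrative assumptions, not details from the tutorial.

```python
import random
from collections import defaultdict

ACTIONS = ["THROW", "KICK", "STAND"]  # illustrative action set
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor for future rewards
EPSILON = 0.1  # exploration rate (assumed, not from the slides)

Q = defaultdict(float)  # Q[(state, action)] = expected reward

def choose_action(state):
    """Epsilon-greedy action selection from the Q-table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s_next):
    """Off-policy: bootstrap from the best action in the next state."""
    target = r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy: bootstrap from the action actually chosen next."""
    target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```

The only difference between the two functions mirrors the equations above: Q-learning bootstraps from the greedy next action, SARSA from the next action the policy actually takes.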
Learning to Fight
Q-table over abstract game states (rows) and actions (columns); states pair a separation (1ft to 6ft) with a mode (GROUND or KNOCKED). Illustrative values:

                  THROW   KICK   STAND
  1ft / GROUND     13.2   10.2    -1.3
  ...
  6ft / KNOCKED     3.2    6.0     4.0

Game state features
• Separation (5 binned ranges)
• Last action (6 categories)
• Mode (ground, air, knocked)
• Proximity to obstacle

Available actions
• 19 aggressive (kick, punch)
• 10 defensive (block, lunge)
• 8 neutral (run)

Q-function representation
• One-layer neural net (tanh)
• Linear

[Diagram: the reinforcement learner plugs into the in-game AI code.]

[Videos: behaviour early in the learning process and after 15 minutes of learning; first with reward for decreases in Wulong Goth's health, then with punishment for decreases in either player's health.]

Learning the state abstraction
1. Collect experience
2. Learn transition probabilities and rewards
3. Revise value function and policy
4. Revise state-action abstraction
5. Return to 1 and collect more experience
[Figure: abstract states over speed and left-distance are split and merged; representational complexity ranges from too coarse through just right to too fine.]

Learning to race
• Real-time racing simulation; goal: lap times as fast as possible
• Laser range finder measurements as features
• Progress along the track as reward
• Actions: Coast, Accelerate, Brake, Hard-Left, Hard-Right, Soft-Left, Soft-Right

Realistic character movement
• Current games have unrealistic physical movement: moonwalking, hovering
• Only death scenes are realistic: rag-doll physics releases the joint constraints
• Compromise between a hard-wired and a learned controller: a motion sequencer with corrections
• FOX controller: based on a cerebellar model articulation controller (CMAC) neural network trained by reinforcement learning
• Can follow paths and climb up and down slopes
• Trained monopeds ("hopper") and bipeds
[Videos: hopper and biped, during training and once trained.]

Unsupervised Learning

Motion capture
• Fix markers at key body positions and record their 3D positions during motion
• Fundamental technology in animation today
• Free download of mocap files: www.bvhfiles.com

Gaussian Process Latent Variable Models (GPLVM)
• Generative model for dimensionality reduction
• Probabilistic equivalent of PCA: defines a probability distribution over the data
• Non-linear manifolds based on kernels
• Visualisation of high-dimensional data
• Back-projection from latent space to data space
• Can deal with missing data
Model: latent variables x are mapped through a weight matrix W to the data y.
• SPCA: marginalise over x and optimise W
• GPLVM: marginalise over W and optimise x

Supervised Learning

Learning to act from skilled players
• Goal: learn from skilled players how to act in a first-person shooter (FPS) game
• Test environment: Unreal Tournament FPS game engine with the Gamebots control framework
• Idea: a naive Bayes classifier learns under which circumstances to switch behaviour

Variables
• S_t: bot's state at time t
• S_t+1: bot's state at time t+1
• H: health level
• W: weapon
• OW: opponent's weapon
• HN: hear noise
• NE: number of close enemies
• PW: weapon close by?
• PH: health pack close by?

Naive Bayes ("inverse programming")
• Specify the probabilities P(H | S_t+1), ..., and P(S_t+1 | S_t)
• Use Bayes' rule to compute P(S_t+1 | H, W, ...) etc.

Supervised learning
• Recognise the state of the human trainer
• Read out the variables from the game engine
• Determine relative frequencies to estimate the probabilities P(H | S_t+1) (as a table)
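A minimal sketch of this recipe, assuming the relative-frequency estimation described above. The data structures and function names are hypothetical, and a small eps term stands in for smoothing of unseen counts.

```python
from collections import defaultdict

# Counts learned from a human trainer's play. Features are the slide's
# variables (H, W, OW, HN, NE, PW, PH); states are behaviours to switch to.
transition_counts = defaultdict(float)  # (s_t, s_t1) -> count
feature_counts = defaultdict(float)     # (s_t1, feature, value) -> count
state_counts = defaultdict(float)       # s_t1 -> count

def observe(s_t, s_t1, features):
    """Record one labelled frame of the trainer's behaviour."""
    transition_counts[(s_t, s_t1)] += 1.0
    state_counts[s_t1] += 1.0
    for name, value in features.items():
        feature_counts[(s_t1, name, value)] += 1.0

def posterior(s_t, features, states, eps=1e-6):
    """P(s_t1 | s_t, features) by Bayes' rule, with the naive
    independence assumption over the features."""
    scores = {}
    for s_t1 in states:
        p = transition_counts[(s_t, s_t1)] + eps  # proportional to P(s_t1 | s_t)
        for name, value in features.items():
            p *= (feature_counts[(s_t1, name, value)] + eps) / (state_counts[s_t1] + eps)
        scores[s_t1] = p
    z = sum(scores.values())
    return {s: p / z for s, p in scores.items()}
```

At run time the bot would call posterior with its current state and the feature values read from the game engine, and switch to the most probable next behaviour.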
Drivatar in detail
• Adaptive avatar for driving; separate game mode
• Basis of all in-game AI and of the "dynamic" racing line
[Diagram: Drivatar architecture: recorded player driving feeds the Drivatar learning system, which produces a behaviour model; together with the "built-in" AI behaviour development tool, the Drivatar racing line, vehicle interaction and racing strategy, and the controller for car behaviour, this yields the Drivatar AI driving.]

Drivatar racing line: a two-phase process
1. Pre-generate possible racing lines prior to the race from a (compressed) racing table.
2. Switch between the lines during the race to add variability.
Compression reduces the memory needed per racing line segment; switching makes smoother racing lines.

Physics ("causal"): the physics simulation system maps the car position and speed at time t, the static car properties and the car controls to the car position and speed at time t+1.
Control ("inverse"): the controller must find the car controls that achieve a desired position and speed at time t+1.

TrueSkill™ in detail
Given match outcomes (orderings among k teams consisting of n1, n2, ..., nk players, respectively), infer
• a skill s_i for each player,
• a global ranking among all players,
• fair matches between teams of players.
Possible outcomes: player 1 wins over player 2 (and vice versa).
[Figure: TrueSkill factor graph with Gaussian prior factors over the player skills s_1, ..., s_4, performances p_1, p_2, team performances t_1, t_2, t_3 and ranking likelihood factors over the outcome variables y_12, y_23.]
Fast and efficient approximate message passing using Expectation Propagation.

Leaderboard: the global ranking of all players.
Matchmaking: for gamers, the most uncertain outcome; for inference, the most informative match. Both are equivalent!
[Plot: level against number of games (0 to 400) for the players "char" and "SQLWildman", each under TrueSkill™ and under the Halo 2 rank.]

Testbeds

TORCS
• Multi-platform car racing simulation (Linux, FreeBSD, MacOS X and Windows) under the GPL
• 50 different cars, more than 20 tracks, and 50 opponents
• Lighting, smoke, skid marks and glowing brake disks
• Damage model, collisions, tyre and wheel properties, aerodynamics
• http://torcs.sourceforge.net/
• Next competition: Car Racing @ CIG 2008!

NERO
• Based on real-time NeuroEvolution of Augmenting Topologies (rtNEAT)
• Goals of NERO: demonstrate the power of state-of-the-art machine learning technology, create an engaging game based on it, and provide a robust and challenging development and benchmarking domain for AI researchers
• http://www.nerogame.org/

RoboCup soccer server
• Each simulated robot player has its own play strategy
• Every simulated team consists of a collection of programs
• Games last about 10 minutes
• Sensors: aural (limited capacity and range), visual (field of view, range of view, noisy) and physical
• Actions: Catch (goalie only), Dash, Kick, Move, Say, Turn, Turn Neck
• More information: http://sserver.sourceforge.net/

Halo 2 Beta dataset
• Available at the Applied Games Group web page
• 4 datasets of game outcomes gathered during the beta-testing period of the Xbox game Halo 2
• 120,000 match outcomes between 6,000 players

Chess rankings dataset
• Available at the ChessBase web page
• 3.75 million chess games from 1560 – 2007
• 220,000 (semi-professional) players
• Largest annotated collection of chess games in the world

Future Challenges

Mentioned in this tutorial
• Adaptive and learning game AI
• Realistic physical movement
• Online gaming interactions

Not mentioned in this tutorial
• Adaptive input devices
• Dialogue generation
• Computer vision
• New game genres based on machine learning

Conclusions
• Computer games can be used as test beds for machine learning research.
• Machine learning can be used to improve the user experience in computer games.
• Both research and applications are in their infancy and there are many open questions.
• Game frameworks exist to plug in machine learning algorithms, but it is never easy. But fun...