Evolving Winning Controllers for Virtual Race Cars Yonatan Shichel & Moshe Sipper
Outline
• Introduction
  – Artificial Intelligence
  – AI in games
    • Robocode: Java-based tank-battle simulator
    • RARS: Robot Auto Racing Simulator
• Evolutionary Computation
  – Key concepts in evolution
  – Genetic Algorithms (GA)
  – Genetic Programming (GP)
• GP-RARS: evolution of winning controllers for virtual race cars
  – Game description
  – Previous work
  – Evolutionary environment setup & calibration
  – Experiments and Results
  – Discussion
  – Result Analysis
• Concluding Remarks
Introduction
Artificial Intelligence (AI) Definition (Russell & Norvig, 2003): “systems that [act/think] [like humans/rationally]”
AI in Games
• games are natural candidates for AI
• games provide a variety of challenges
• games allow exploration of real-world realms
• games allow comparison to human behavior
• games can be rewarding to master
• games are fun!
Robocode
Robocode
• tank-battle simulation
• Java-based, open-source programming game
• simplistic physical model
• active gamer community
  – extensive online robot library
  – ongoing tournaments
RARS: Robot Auto Racing Simulator
RARS: Robot Auto Racing Simulator
• car-race simulation
• C++-based, open-source programming game
• sophisticated physical model
• inactive gamer community
  – limited online robot library
  – tournaments held between 1995 and 2003
Evolutionary Computation “a family of algorithmic approaches aimed at finding optimal solutions to search problems of high complexity”
Key concepts in Evolution
The Origin of Species (Darwin, 1859):
• a population is composed of many individuals
• individuals differ in characteristics, which are inheritable by means of sexual reproduction
• the environment consists of limited resources, leading to a struggle for survival
• fitter individuals are more likely to survive and reproduce, passing their characteristics to their offspring
• as time passes, populations slowly adapt to their surrounding environment
Genetic Algorithms (GA)
Inspired by Darwin's evolutionary principles:
• a fixed-size population is composed of many solution instances for the problem at hand
• solutions are encoded in genomes
• a fitness function determines how fit each individual is
• the population is re-populated on each generation
• fitter individuals have higher probabilities of being selected for the next generation
• genetic operators (crossover and mutation) are applied to selected individuals to create new individuals
• the process is repeated for many generations
Genetic Algorithms (GA)
A schematic flow of a basic GA:

    g = 0
    initialize population P[0]
    evaluate P[0]                 // assign fitness values to individuals
    while (termination condition not met) do
        g = g + 1
        select P[g] from P[g-1]
        crossover P[g]
        mutate P[g]
        evaluate P[g]
    end while
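The schematic flow can be sketched in Python as follows. This is a minimal illustration and not the authors' code. The bit-string genome, the OneMax fitness (count of 1-bits), binary tournament selection, one-point crossover, bit-flip mutation and all parameter values are assumptions, chosen only to make each pseudocode step concrete.

```python
import random


def onemax(genome):
    """Toy fitness: the number of 1-bits (stands in for a real fitness measure)."""
    return sum(genome)


def tournament(pop, fits, k, rng):
    """Return a copy of the fittest of k distinct, randomly chosen individuals."""
    best = max(rng.sample(range(len(pop)), k), key=lambda i: fits[i])
    return list(pop[best])


def run_ga(n_bits=20, pop_size=30, generations=40, p_cx=0.9, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    # g = 0: initialize and evaluate P[0]
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fits = [onemax(ind) for ind in pop]
    # termination condition: a fixed number of generations
    for _ in range(generations):
        # select P[g] from P[g-1]
        pop = [tournament(pop, fits, 2, rng) for _ in range(pop_size)]
        # crossover P[g]: one-point crossover on consecutive pairs
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cx:
                cut = rng.randrange(1, n_bits)
                a, b = pop[i], pop[i + 1]
                pop[i], pop[i + 1] = a[:cut] + b[cut:], b[:cut] + a[cut:]
        # mutate P[g]: flip each bit independently with probability p_mut
        for ind in pop:
            for j in range(n_bits):
                if rng.random() < p_mut:
                    ind[j] = 1 - ind[j]
        # evaluate P[g]
        fits = [onemax(ind) for ind in pop]
    return max(fits)


print(run_ga())
```

Every other GA customization choice can be swapped in at the step it belongs to: genome, fitness, selection, crossover, mutation, termination and initialization.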
Genetic Algorithms (GA)
GA customization:
• genome representation
• fitness measure
• selection method
• crossover method
• mutation method
• termination condition
• initial population creation
Genetic Programming (GP) “an evolutionary computation approach aimed at the creation of computer programs rather than static solutions”
Genetic Programming (GP) • individual’s genome is composed of LISP expressions
Genetic Programming (GP)
example of a LISP expression: (+ (* x x) 1) ==> x^2 + 1
Genetic Programming (GP)
• LISP expressions are composed of functions and terminals
Genetic Programming (GP)
functions: {+, *}
terminals: {1, x}
example: in (+ (* x x) 1), + and * are functions; x and 1 are terminals
Genetic Programming (GP)
• LISP expressions evaluate to numeric values, hence representing functions
Genetic Programming (GP)
evaluation of the LISP expression (+ (* x x) 1):
x                 -2   -1    0    1    2
(+ (* x x) 1)      5    2    1    2    5
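The sketch below shows one way to represent and evaluate such an expression. It assumes nested Python tuples as the tree representation; `parse` and `evaluate` are hypothetical helper names. Each function node applies its operator to its evaluated children, and each terminal evaluates to x or to its constant. Evaluating (+ (* x x) 1) for x from -2 to 2 reproduces the values above.

```python
import operator

# Function set {+, *} (both of arity 2) and terminal set {1, x}.
FUNCTIONS = {"+": operator.add, "*": operator.mul}


def parse(text):
    """Parse a LISP expression such as "(+ (* x x) 1)" into nested tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        tok = tokens[pos]
        if tok == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return tuple(items), pos + 1  # skip the closing ")"
        try:
            return int(tok), pos + 1      # integer constant terminal
        except ValueError:
            return tok, pos + 1           # symbol: function name or variable

    expr, _ = read(0)
    return expr


def evaluate(expr, x):
    """Evaluate a parsed expression for a given value of the variable x."""
    if isinstance(expr, tuple):
        op, *args = expr
        return FUNCTIONS[op](*(evaluate(arg, x) for arg in args))
    return x if expr == "x" else expr


tree = parse("(+ (* x x) 1)")
print(tree)                                       # ('+', ('*', 'x', 'x'), 1)
print([evaluate(tree, x) for x in range(-2, 3)])  # [5, 2, 1, 2, 5]
```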
Genetic Programming (GP)
• genetic operators are defined to operate on (and return) LISP expressions
Genetic Programming (GP)
subtree substitution crossover:
parents:
  (+ (* x x) 1)          = x^2 + 1
  (- 1 (* 1 (+ x 1)))    = -x
the subtree (* x x) of the first parent is swapped with the subtree (* 1 (+ x 1)) of the second parent
offspring:
  (+ (* 1 (+ x 1)) 1)    = x + 2
  (- 1 (* x x))          = 1 - x^2
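Subtree substitution crossover can be sketched as follows. The sketch assumes trees are stored as nested tuples and nodes are addressed by paths of child indices; all helper names are hypothetical. `swap_subtrees` performs the exchange at given crossover points. `crossover` picks those points uniformly at random over all nodes, whereas real GP systems often bias the choice toward internal nodes.

```python
import random


def subtree_paths(tree, path=()):
    """Yield the path (tuple of child indices) of every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def swap_subtrees(p1, path1, p2, path2):
    """Subtree substitution crossover at fixed crossover points."""
    s1, s2 = get_subtree(p1, path1), get_subtree(p2, path2)
    return replace_subtree(p1, path1, s2), replace_subtree(p2, path2, s1)


def crossover(p1, p2, rng):
    """Pick one random crossover point in each parent and swap the subtrees."""
    path1 = rng.choice(list(subtree_paths(p1)))
    path2 = rng.choice(list(subtree_paths(p2)))
    return swap_subtrees(p1, path1, p2, path2)


parent1 = ("+", ("*", "x", "x"), 1)           # x^2 + 1
parent2 = ("-", 1, ("*", 1, ("+", "x", 1)))   # -x
child1, child2 = swap_subtrees(parent1, (1,), parent2, (2,))
print(child1)  # ('+', ('*', 1, ('+', 'x', 1)), 1)   i.e. x + 2
print(child2)  # ('-', 1, ('*', 'x', 'x'))           i.e. 1 - x^2
```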
Genetic Programming (GP)
random subtree growth mutation:
original: (+ (* x x) 1)          = x^2 + 1
a randomly chosen node (the second x) is replaced by a randomly grown subtree, (- 1 1)
mutant:   (+ (* x (- 1 1)) 1)    = 1
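Random subtree growth mutation can be sketched under the same nested-tuple assumption. A node is picked uniformly at random and replaced by a subtree grown from a hypothetical small function set, subject to a depth limit. The first print reproduces the slide's example by replacing the second x (path (1, 2)) with (- 1 1).

```python
import random

# Hypothetical function set (name -> arity) and terminal set for illustration.
FUNCTIONS = {"+": 2, "-": 2, "*": 2}
TERMINALS = ["x", 1]


def grow(rng, max_depth):
    """Grow a random tree of depth <= max_depth (a lone terminal has depth 0)."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name,) + tuple(grow(rng, max_depth - 1) for _ in range(FUNCTIONS[name]))


def node_paths(tree, path=()):
    """Yield the path (sequence of child indices) of every node."""
    yield path
    if isinstance(tree, tuple):
        for i in range(1, len(tree)):
            yield from node_paths(tree[i], path + (i,))


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def mutate(tree, rng, max_depth=3):
    """Random subtree growth: replace a random node with a freshly grown subtree."""
    path = rng.choice(list(node_paths(tree)))
    return replace_subtree(tree, path, grow(rng, max_depth))


original = ("+", ("*", "x", "x"), 1)
print(replace_subtree(original, (1, 2), ("-", 1, 1)))
# ('+', ('*', 'x', ('-', 1, 1)), 1)
print(mutate(original, random.Random(42)))
```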
Genetic Programming (GP)
A schematic flow of a basic GP:

    g = 0
    initialize population P[0]
    evaluate P[0]                 // assign fitness values to individuals
    while (termination condition not met) do
        g = g + 1
        while (P[g] is not full) do
            OP = choose a genetic operator
            select individual(s) from P[g-1] according to OP's inputs
            apply OP on selected individuals
            add the resulting individuals to P[g]
        end while
        evaluate P[g]
    end while
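The operator-driven loop can be sketched as a generic Python driver. This illustrates the loop under stated assumptions and is not the authors' implementation. Operators are passed in as (probability, number of inputs, function) triples, and the population size stays fixed. The usage example plugs in toy integer individuals, stand-in operators and example probabilities only to show the control flow; real GP would use tree operators instead.

```python
import random


def run_gp(population, fitness, operators, select, generations, rng):
    """Generational GP loop: fill P[g] by repeatedly applying chosen operators.

    operators: list of (probability, n_inputs, op) triples, where
               op(parents, rng) returns a list of offspring.
    select:    select(population, fits, rng) returns one individual.
    """
    fits = [fitness(ind) for ind in population]          # evaluate P[0]
    weights = [w for w, _, _ in operators]
    for _ in range(generations):                         # termination condition
        offspring = []
        while len(offspring) < len(population):          # while P[g] is not full
            _, n_inputs, op = rng.choices(operators, weights=weights)[0]
            parents = [select(population, fits, rng) for _ in range(n_inputs)]
            offspring.extend(op(parents, rng))
        population = offspring[:len(population)]         # drop overflow from 2-child ops
        fits = [fitness(ind) for ind in population]      # evaluate P[g]
    return population, fits


# Toy usage: integer "individuals" and stand-in operators, only to show the flow.
def reproduce(parents, rng):
    return [parents[0]]


def average(parents, rng):          # stand-in for crossover (two parents, two children)
    m = (parents[0] + parents[1]) // 2
    return [m, m + 1]


def nudge(parents, rng):            # stand-in for mutation
    return [parents[0] + rng.choice([-1, 1])]


def binary_tournament(pop, fits, rng):
    i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[i] if fits[i] >= fits[j] else pop[j]


rng = random.Random(0)
ops = [(0.40, 1, reproduce), (0.50, 2, average), (0.10, 1, nudge)]
start = [rng.randint(0, 100) for _ in range(20)]
pop, fits = run_gp(start, lambda x: -abs(x - 10), ops, binary_tournament, 30, rng)
print(max(fits))
```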
GP-RARS: evolution of winning controllers for virtual race cars
Basic Rules
• one or more cars drive on a track for a given number of laps
• cars are damaged when colliding or driving off track
• a car may be disabled and disqualified if its damage exceeds a certain level
• the winner is the driver that finishes first
Game Variants
• number of cars: one, two, multiple
• number of tracks: one, multiple
• race length: short, long
• controller program: generic, specialized
• driver class: reactive (c2), optimal-path (c1)
Controlling the Car
• movement: desired speed variable
• steering: wheel angle variable
• fuel & damage: pit stop request flag
Car Sensors
situation variables:
• current speed, drift speed and heading
• current track segment ID
• position on current track segment
• distances from left and right road shoulders
• distance to next track segment
• radii and lengths of current and next track segments
additional data:
• complete track layout
• nearby cars information
Car Sensors ...some basic RARS situation variables:
The Challenge
PEAS system (Russell & Norvig, 2003):
• Performance measure
• Environment
• Actuators
• Sensors
The Challenge
is the environment...    RARS                  GP-RARS
...observable?           fully                 fully
...deterministic?        partially             partially
...episodic?             no                    no
...static?               either                static
...discrete?             continuous            continuous
...single agent?         single or multiple    single
static: the environment changes only through the intervention of the active agent. In the basic RARS game the environment can be non-static when more than one agent is active; GP-RARS is single-car and thus fully static.
Previous Work
• planning approaches:
  – Genetic Algorithms (Eleveld, Sáez)
  – A* search (Pajala)
• reactive approaches:
  – Decision Trees (Wang)
  – Action Tables (Cleland)
  – Artificial Neural Networks (Ng, Pyeatt, Coulom)
  – Evolving Neural Networks (Stanley)
Evolutionary Setup & Calibration
• genome representation
• fitness measure
• selection method
• crossover method
• mutation method
• termination condition
• initial population creation
Genome Representation
• each individual is composed of two trees:
  – steering tree
  – throttling tree
• trees evaluate to numeric values, which are truncated to fit game-world restrictions
• trees are defined using an extensive set of functions and terminals, both simple and complex
Genome Representation
• terminal set (simple): {cur-rad, nex-rad, to-end, nex-len, v, vn, to-lft, to-rgt, track-width, random-constant, 0, 1}
• terminal set (complex): {a, a-angle, off-center, inner-wall, outer-wall, closest-wall}
• function set (arity in parentheses): {add(2), sub(2), mul(2), div(2), abs(1), neg(1), tan(1), if-greater(4), if-positive(3), if-cur-straight(2), if-nex-straight(2)}
blue terminals and functions are the ones chosen after a calibration process
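The sketch below shows one plausible reading of the function set and of the truncation step, with each individual evaluated as a (steering, throttling) pair of trees. The semantics of the conditional functions are inferred from their names and arities. Protected division, the straight-segment test (radius 0) and the limits `MAX_WHEEL_ANGLE` and `MAX_SPEED` are hypothetical. None of this is taken from the GP-RARS code or from the RARS API.

```python
import math
import operator


def protected_div(a, b):
    # Assumption: division by zero returns 1.0, a common GP convention.
    return a / b if b != 0 else 1.0


ARITHMETIC = {
    "add": operator.add, "sub": operator.sub, "mul": operator.mul,
    "div": protected_div, "abs": abs, "neg": operator.neg, "tan": math.tan,
}


def ev(expr, s):
    """Evaluate a tree against a situation dict s mapping terminal names to values."""
    if isinstance(expr, (int, float)):
        return float(expr)                      # constant terminal
    if isinstance(expr, str):
        return float(s[expr])                   # sensor terminal, e.g. "v" or "a"
    op, *args = expr
    if op == "if-greater":                      # (if-greater a b c d): c if a > b, else d
        a, b, c, d = args
        return ev(c, s) if ev(a, s) > ev(b, s) else ev(d, s)
    if op == "if-positive":                     # (if-positive a b c): b if a > 0, else c
        a, b, c = args
        return ev(b, s) if ev(a, s) > 0 else ev(c, s)
    if op in ("if-cur-straight", "if-nex-straight"):
        # Assumption: a straight segment is reported with radius 0.
        radius = s["cur-rad"] if op == "if-cur-straight" else s["nex-rad"]
        b, c = args
        return ev(b, s) if radius == 0 else ev(c, s)
    return ARITHMETIC[op](*(ev(arg, s) for arg in args))


# Hypothetical game-world limits, for illustration only.
MAX_WHEEL_ANGLE = 1.0
MAX_SPEED = 300.0


def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def drive(steering_tree, throttling_tree, s):
    """Evaluate an individual's two trees and truncate them to the legal ranges."""
    angle = clamp(ev(steering_tree, s), -MAX_WHEEL_ANGLE, MAX_WHEEL_ANGLE)
    speed = clamp(ev(throttling_tree, s), 0.0, MAX_SPEED)
    return angle, speed


situation = {"cur-rad": 0.0, "nex-rad": 350.0, "v": 90.0, "a": 180.0,
             "to-lft": 12.0, "to-rgt": 18.0}           # made-up sensor readings
steer = ("div", ("sub", "to-lft", "to-rgt"), "v")       # hypothetical trees
throttle = ("if-cur-straight", 500.0, ("mul", "a", 1.0))
print(drive(steer, throttle, situation))  # angle about -0.067; speed clamped to 300.0
```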
Fitness Measure
• fitness evaluation was performed on a single-lap, single-car race on one track: sepang, a track believed to exhibit a variety of track features
• two fitness measures were used:
  – race distance
  – modified race time
Selection Method
• several methods were examined for a population of 250 individuals:
  – tournament of k, with k = {2, 3, 4, 5, 6, 7}
  – fitness-proportionate selection
  – square-fitness-proportionate selection
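The selection schemes listed above can be sketched as follows. The sketch assumes higher fitness is better, so a time-based fitness such as race time would need to be negated or inverted first. Tournament selection here draws k distinct contestants, and `power=2` turns roulette-wheel selection into square-fitness proportionate selection. The driver names and fitness values in the example are made up.

```python
import random


def tournament_select(population, fitness, k, rng):
    """Return the fittest of k distinct individuals drawn uniformly at random.

    Assumes higher fitness is better and k <= len(population).
    """
    contestants = rng.sample(range(len(population)), k)
    return population[max(contestants, key=lambda i: fitness[i])]


def proportionate_select(population, fitness, rng, power=1):
    """Fitness-proportionate (roulette-wheel) selection.

    power=2 gives square-fitness proportionate selection.
    Assumes non-negative fitness values with a positive sum.
    """
    weights = [f ** power for f in fitness]
    return rng.choices(population, weights=weights)[0]


rng = random.Random(0)
pop = ["d1", "d2", "d3", "d4", "d5"]
fit = [1.0, 4.0, 2.0, 0.5, 3.0]
print(tournament_select(pop, fit, 3, rng))
print(proportionate_select(pop, fit, rng, power=2))
```

Larger k raises selection pressure, since the best of more contestants is kept. Squaring the fitness does the same for roulette-wheel selection.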
Crossover & Mutation
• crossover: subtree substitution
• mutation: random subtree growth
• operator probabilities:
  – 40% reproduction
  – 50% crossover
  – 10% mutation:
    • 5% random-constant mutation
    • 5% structural (subtree) mutation
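One way to turn these probabilities into an operator choice for each new individual is a cumulative lookup on a uniform random number, sketched below. The operator names are hypothetical labels; only the percentages come from the slide.

```python
import random
from collections import Counter

# (cumulative upper bound, operator) pairs built from the slide's percentages.
OPERATOR_TABLE = [
    (0.40, "reproduction"),          # 40%
    (0.90, "crossover"),             # +50%
    (0.95, "constant-mutation"),     # +5%
    (1.00, "structural-mutation"),   # +5%
]


def choose_operator(u):
    """Map a uniform draw u in [0, 1) to a genetic operator."""
    for bound, name in OPERATOR_TABLE:
        if u < bound:
            return name
    return OPERATOR_TABLE[-1][1]     # guard against u == 1.0


rng = random.Random(7)
print(Counter(choose_operator(rng.random()) for _ in range(10_000)))
# roughly 4000 reproduction, 5000 crossover, 500 of each mutation type
```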
Initialization & Termination
• initial population creation:
  – Koza's 'ramped-half-and-half' method: for each k = {4, 5, 6, 7, 8}:
    • 10% of the trees grown to a depth of up to k
    • 10% of the trees grown to a depth of exactly k
• termination condition:
  – evolution stops after 255 generations
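Ramped half-and-half with the depths listed above can be sketched as follows. In Koza's terminology, trees built to a depth of up to k come from the 'grow' method, and trees of exactly depth k come from the 'full' method. The function and terminal subsets, the terminal probability used by `grow`, and forcing a function at the root are assumptions for illustration.

```python
import random

# Hypothetical subset of the function set (name -> arity) and terminal set.
FUNCTIONS = {"add": 2, "mul": 2, "neg": 1}
TERMINALS = ["v", "a", "to-end", 1.0]


def full(rng, depth):
    """'Full' method: every branch reaches exactly `depth`."""
    if depth == 0:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name,) + tuple(full(rng, depth - 1) for _ in range(FUNCTIONS[name]))


def grow(rng, depth, is_root=True):
    """'Grow' method: terminals may be chosen early, so depth is at most `depth`."""
    p_terminal = len(TERMINALS) / (len(TERMINALS) + len(FUNCTIONS))
    if depth == 0 or (not is_root and rng.random() < p_terminal):
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name,) + tuple(grow(rng, depth - 1, is_root=False)
                           for _ in range(FUNCTIONS[name]))


def tree_depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])


def ramped_half_and_half(pop_size, depths=(4, 5, 6, 7, 8), rng=None):
    """For each k: pop_size/10 trees grown up to depth k, pop_size/10 of exactly depth k."""
    rng = rng or random.Random()
    per_bucket = pop_size // (2 * len(depths))   # 10% each when there are 5 depths
    population = []
    for k in depths:
        population += [grow(rng, k) for _ in range(per_bucket)]
        population += [full(rng, k) for _ in range(per_bucket)]
    # Fill any remainder (pop_size not divisible by 10) with grown trees.
    while len(population) < pop_size:
        population.append(grow(rng, depths[-1]))
    return population


pop = ramped_half_and_half(250, rng=random.Random(1))
print(len(pop), max(tree_depth(t) for t in pop))
```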
Experiments & Results
• several evolutionary runs were made
• the two best runs were taken, and the best driver of the last generation was extracted from each
• each driver was then tested in 10 single-lap, single-car races
Experiments & Results best run, race-distance fitness: GP-Single-1 160.0 ± 0.4 seconds
Experiments & Results best run, modified-race-time fitness: GP-Single-2 160.9 ± 0.3 seconds
...but how do they drive?
Result Comparison
• comparison to human-crafted drivers
  – on the training track
  – on 'unseen' tracks
• comparison to machine-crafted drivers
Result Comparison: single-car, single-lap race on sepang
#    Driver      Class   Lap Time (sec.)
1    Dodger13    1       146.3 ± 0.1
2    K1999       1       146.6 ± 0.1
3    K2001       1       147.1 ± 0.1
4    SmoothB4    1       148.3 ± 0.1
5    Bulle2      1       150.4 ± 0.1
6    Sparky5     1       150.4 ± 0.1
7    SmoothB3    1       153.3 ± 0.1
8    Felix16     1       153.6 ± 0.1
9    SmoothB2    1       156.5 ± 0.1
10   GPSingle1   2       160.0 ± 0.4
11   GPSingle2   2       160.9 ± 0.3
12   Vector      2       160.1 ± 0.1
13   WappuCar    2       161.7 ± 0.1
14   Apex8       2       162.5 ± 0.2
15   Djoefe      2       163.7 ± 0.1
16   Ali2        2       164.1 ± 0.1
17   Mafanja     2       164.4 ± 0.3
18   SBv1r4      2       165.7 ± 0.1
19   Burns       2       168.4 ± 5.7
20   Eagle       2       169.3 ± 0.6
21   Bulle       2       169.5 ± 0.2
22   Magic       2       174.0 ± 0.1
23   JR001       2       178.5 ± 0.1
Result Comparison: Aug. 2004 season results (16 tracks)
#    Driver
1    Vector
2    Eagle
3    GPSingle2
4    GPSingle1
5    SBv1r4
6    Bulle
7    Mafanja
8    Magic
9    WappuCar
10   Djoefe
11   Burns
12   Ali2
13   Apex8
14   JR001
Result Comparison: Previous Works Results
Authors: Eleveld (GA), Ng et al. (ANN), Coulom (ANN), Cleland (Action Tables), Stanley et al. (Evolving ANN)
Track     Reported Time (sec.)   GP-Single-1    GP-Single-2
v01       37.8 ± 0.1             38.1 ± 1.7     34.9 ± 0.1
suzuka    149.7 ± 0.1            177.1 ± 5.2    167.5 ± 0.3
race7     85.7 ± 0.2             61.9 ± 0.6     63.3 ± 0.4
v03       59.4                   55.3 ± 0.5     49.3 ± 0.1
oval      33.0                   31.0 ± 0.1     30.8 ± 0.1
complex   209.0                  196.2 ± 6.0    204.6 ± 1.3
clkwis    38.0                   37.8 ± 0.1     36.4 ± 0.1
v01       37.4                   38.1 ± 1.7     34.9 ± 0.1
clkwis    37.6 / 37.9            37.8 ± 0.1     36.4 ± 0.1
Conclusions
• GP-Drivers rank higher than any human-crafted driver in their class when racing on their training track
• GP-Drivers rank among the top human-crafted drivers in their class when racing on new, unseen tracks
• GP-Drivers perform better than any machine-crafted driver developed by past RARS researchers
Discussion
Performance Analysis GPSingle2 on sepang (159.9 sec)
Performance Analysis Dodger13 on sepang (146.5 sec)
Performance Analysis GPSingle2 on clkwis
Genome Representation
• terminal set (simple): {cur-rad, nex-rad, to-end, nex-len, v, vn, to-lft, to-rgt, track-width, random-constant, 0, 1}
• terminal set (complex): {a, a-angle, off-center, inner-wall, outer-wall, closest-wall}
• function set (arity in parentheses): {add(2), sub(2), mul(2), div(2), abs(1), neg(1), tan(1), if-greater(4), if-positive(3), if-cur-straight(2), if-nex-straight(2)}
blue terminals and functions are the ones “chosen” by evolution (in best-of-run)
Genetic Analysis GP-Single-2, Steering (% (% (% (% (ifg 0.70230484 a α (* n -0.9850136)) (- a (neg a))) (- (% 1.0 (% v a)) (neg a))) (- ( (* n (neg n)) (neg a)) (neg a))) (- (% 1.0 (% v a)) (neg (% (% 1.0 (% v a)) (% v a)))))
Genetic Analysis GP-Single-2, Steering, after replacing (% 1.0 (% v a)) with (% a v): (% (% (% (% (ifg 0.70230484 a α (* n -0.9850136)) (- a (neg a))) (- (% a v) (neg a))) (- ( (* n (neg n)) (neg a)) (neg a))) (- (% a v) (neg (% (% a v) (% v a)))))
Genetic Analysis GP-Single-2, Steering, after further simplification: (% (% (% (% (ifg 0.70230484 a α (* n -0.9850136)) (+ a a)) (+ (% a v) a)) (- ( (neg (* n n)) (neg a)) (neg a))) (- (% a v) (neg (* (% a v) (% a v)))))
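The hand simplification above can be mechanized with a few rewrite rules applied bottom-up to nested tuples, as in the sketch below. The rules are ordinary algebraic identities: 1/(v/a) = a/v, x - (-y) = x + y, x·(-y) = -(x·y), and (a/v)/(v/a) = (a/v)². They hold only where no denominator is zero, so they may change behavior if `%` is a protected division. The example applies the sketch to the second argument of the top-level `%` in the original steering tree. There it goes one step further than the slide by also removing the negation inside the subtraction.

```python
def rewrite(e):
    """Apply one algebraic identity at the root of e, if any matches."""
    op = e[0]
    # (% 1.0 (% x y)) -> (% y x), i.e. 1/(x/y) = y/x
    if op == "%" and e[1] == 1.0 and isinstance(e[2], tuple) and e[2][0] == "%":
        return ("%", e[2][2], e[2][1])
    # (- x (neg y)) -> (+ x y)
    if op == "-" and isinstance(e[2], tuple) and e[2][0] == "neg":
        return ("+", e[1], e[2][1])
    # (* x (neg y)) -> (neg (* x y))
    if op == "*" and isinstance(e[2], tuple) and e[2][0] == "neg":
        return ("neg", ("*", e[1], e[2][1]))
    # (% (% x y) (% y x)) -> (* (% x y) (% x y)), i.e. (x/y)/(y/x) = (x/y)^2
    if (op == "%" and isinstance(e[1], tuple) and isinstance(e[2], tuple)
            and e[1][0] == e[2][0] == "%" and e[1][1:] == e[2][1:][::-1]):
        return ("*", e[1], e[1])
    return e


def simplify(e):
    """Simplify children first, then re-simplify the node after each rewrite."""
    if not isinstance(e, tuple):
        return e
    e = (e[0],) + tuple(simplify(child) for child in e[1:])
    new = rewrite(e)
    return e if new == e else simplify(new)


part = ("-", ("%", 1.0, ("%", "v", "a")),
        ("neg", ("%", ("%", 1.0, ("%", "v", "a")), ("%", "v", "a"))))
print(simplify(part))
# ('+', ('%', 'a', 'v'), ('*', ('%', 'a', 'v'), ('%', 'a', 'v')))
```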
Genetic Analysis GP-Single-2, Steering ...
Genetic Analysis GP-Single-2, Steering: behavior depends on the distance, a, to the upcoming curve. When the next turn is far enough away, the controller slightly adjusts the wheel angle to prevent drifting off the track; when approaching a curve, however, the controller steers according to the relative curve angle, so steep curves result in extreme wheel-angle values.
Genetic Analysis
• what's a/v?
• a – distance to next obstacle; v – current speed; a/v – time to crash!
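A tiny sketch checks the interpretation of a/v as a time to crash. At constant speed v, covering distance a takes a/v. When a and v are nonzero, the evolved form (% 1.0 (% v a)) computes the same value. Units are assumed consistent: any distance unit, and that unit per second for speed.

```python
import math


def time_to_crash(a, v):
    """Time to reach the next obstacle at constant speed.

    a: distance to the next obstacle, v: current speed (consistent units).
    Returns infinity when the car is not moving forward.
    """
    if v <= 0:
        return math.inf
    return a / v


print(time_to_crash(200.0, 100.0))  # 2.0
print(1.0 / (100.0 / 200.0))        # 2.0, the evolved form (% 1.0 (% v a))
```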
Genetic Analysis GP-Single-2, Throttling (ifpos (abs (% v a)) (- (% 1.0 (% v a)) (neg (- (* n (* n -0.86818504)) (neg a)))) (% (neg (- (- (* n (neg toright)) (neg a)) (neg a))) (- (% 1.0 (% v a)) (neg (% (* n (neg n)) (% v a))))))
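Genetic Analysis GP-Single-2, Throttling: (abs (% v a)) is never negative, so the if-positive test selects its first branch whenever (% v a) is nonzero, and throttling effectively reduces to: (- (% 1.0 (% v a)) (neg (- (* n (* n -0.86818504)) (neg a))))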
Genetic Analysis GP-Single-2, Throttling
Future Work
• apply GP to other RARS variants
  – multiple-car scenarios
  – long (endurance) races
• use GA to plan optimal paths
• migrate research to TORCS
Bibliography
• Russell, Stuart and Norvig, Peter. Artificial Intelligence: A Modern Approach. 2nd edition. Prentice Hall, 2003. ISBN 0-13-790395-2
• Darwin, Charles. On the Origin of Species: By Means of Natural Selection or the Preservation of Favoured Races in the Struggle for Life. London: John Murray, 1859. ISBN 0-486-45006-6
• Shichel, Yehonatan, Ziserman, Eran and Sipper, Moshe. GP-Robocode: Using Genetic Programming to Evolve Robocode Players. 8th European Conference on Genetic Programming. Springer, 2005, pp. 143-154
• Eleveld, Doug. [Online] http://rars.sourceforge.net/selection/douge1.txt
• Pajala, Jussi. [Online] http://rars.sourceforge.net/selection/jussi.html
• Wang, Zhijin. Car Simulation Using Reinforcement Learning. Computer Science Department, University of British Columbia, Vancouver, B.C., Canada, 2003
• Ng, Kim C, et al. MoNiF: a modular neuro-fuzzy controller for race car navigation. IEEE International Symposium on Computational Intelligence in Robotics and Automation, Monterey, CA, USA, 1997, pp. 74-79. ISBN 0-8186-8138-1
• Pyeatt, Larry D and Howe, Adele E. Learning to Race: Experiments with a Simulated Race Car. 11th International Florida Artificial Intelligence Research Society Conference, Sanibel Island, Florida, USA, 1998
• Coulom, Rémi. Reinforcement Learning Using Neural Networks, with Applications to Motor Control. PhD Thesis, Institut National Polytechnique de Grenoble, 2002
• Cleland, Ben. Reinforcement Learning for Racecar Control. M.Sc. Thesis, University of Waikato, 2006
• Stanley, Kenneth, et al. Neuroevolution of an automobile crash warning system. Genetic and Evolutionary Computation Conference, 2005, pp. 1977-1984. ISBN 1-59593-010-8
• Sáez, Yago, et al. Driving Cars by Means of Genetic Algorithms. Parallel Problem Solving from Nature – PPSN X. Springer, 2008, pp. 1101-1110