WCCI 2008 Tutorial on Computational Intelligence and Games, part 2 of 3

DESCRIPTION
WCCI 2008 Tutorial on Computational Intelligence and Games by Simon Lucas, Julian Togelius and Thomas Runarsson, part 2 of 3

TRANSCRIPT
CIG case study: car racing
• A prolonged example of applying CI to a game: car racing
• Sensor representation and input selection
• Incremental evolution
• Competitive coevolution
• Player modelling
• Content creation
Racing games
• On the charts for the last three decades
• Can be technically simple (computationally cheap) or very sophisticated
• Easy to pick up and play, but possess almost unlimited “depth” (a lifetime to master)
• Can be played on your own or with others
CI in racing games
• Learning to race
• on your own, against specific opponents, against opponents in general, on one or several tracks, using simple or complex cars/physics models, etc.
• Modelling driving styles
• Creating entertaining game content: tracks and opponent drivers
A simple car game
• Optimised for speed, not for prettiness
• 2D dynamics (momentum, understeer, etc.)
• Intended to qualitatively replicate a standard toy R/C car driven on a table
• Bang-bang control (9 possible commands)
• Walls are solid
• Waypoints must be passed in order
• Fitness: continuous approximation of waypoints passed in 700 time steps
• Inputs
• Six range-finder sensors (evolvable pos.)
• Waypoint sensor, Speed, Bias
• Networks
• Standard MLP, 9:6:2
• Outputs interpreted as thrust/steering
Fig. 2. The initial sensor setup, which is kept throughout the evolutionary run for those runs where sensor parameters are not evolvable. Here, the car is seen in close-up moving upward-leftward. At this particular position, the front-right sensor returns a positive number very close to 0, as it detects a wall near the limit of its range; the front-left sensor returns a number close to 0.5, and the back sensor a slightly larger number. The front, left and right sensors do not detect any walls at all and thus return 0.
range 200 pixels, as have the three sensors pointing forward-left, forward-right and backward respectively. The two other
sensors, which point left and right, have reach 100; this is
illustrated in figure 2.
B. Neural networks
The controllers in the experiments below are based on
neural networks. More precisely, we are using multilayer
perceptrons with three neuronal layers (two adaptive layers)
and tanh activation functions. A network has at least three
inputs: one fixed input with the value 1, one speed input
in the approximate range [0..3], and one input from the
waypoint sensor, in the range [-π..π]. In addition to this, it might have any number of inputs from wall sensors, in
the range [0..1]. All networks have two outputs, which are
interpreted as driving commands for the car.
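The controller just described can be sketched as a plain forward pass; this is a minimal illustration, with random stand-in weights (the real weights are evolved) and hypothetical input values:

```python
import math
import random

def mlp_forward(inputs, w_hidden, w_out):
    """One forward pass of a multilayer perceptron with two adaptive
    layers and tanh activation functions, as in the controllers above."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)))
              for row in w_hidden]
    return [math.tanh(sum(w * h for w, h in zip(row, hidden)))
            for row in w_out]

# Hypothetical 9:6:2 controller with random weights.
random.seed(0)
w_hidden = [[random.gauss(0, 0.1) for _ in range(9)] for _ in range(6)]
w_out = [[random.gauss(0, 0.1) for _ in range(6)] for _ in range(2)]

# Inputs: bias (1), speed, waypoint angle, then six wall-sensor readings.
sensors = [1.0, 1.2, 0.4, 0.0, 0.3, 0.9, 0.0, 0.1, 0.0]
thrust, steering = mlp_forward(sensors, w_hidden, w_out)
```

The two tanh outputs lie in (-1, 1) and are interpreted as thrust and steering commands.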
C. Evolutionary algorithm
The genome is an array of floating point numbers, of
variable or fixed length depending on the experimental setup.
Apart from information on the number of wall sensors and
hidden neurons, it encodes the orientation and range of the
wall sensors, and weights of the connections in the neural
network.
The evolutionary algorithm used is a kind of evolution strategy, with µ = 50 and λ = 50. In other words, 50 genomes (the elite) are created at the start of evolution. At
each generation, one copy is made of each genome in the
elite, and all copies are mutated. After that, a fitness value is
calculated for each genome, and the 50 best individuals of
all 100 form the new elite.
There are two mutation operators: Gaussian mutation
of all weight values, and Gaussian mutation of all sensor
parameters (angles and lengths), which might be turned on
or off. In both cases, the standard deviation of the Gaussian
distribution was set to 0.3.
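The (50+50) scheme just described can be sketched as follows; the toy fitness function here merely stands in for the racing simulation, and mutating every gene each step is the weight-mutation operator described above:

```python
import random

def evolve(fitness, genome_len=20, mu=50, lam=50, sigma=0.3,
           generations=30, seed=0):
    """A (mu + lambda) evolution strategy: copy each elite genome,
    Gaussian-mutate every gene of the copies (std. dev. sigma), then
    keep the best mu of all mu + lam individuals as the new elite."""
    rng = random.Random(seed)
    elite = [[0.0] * genome_len for _ in range(mu)]
    for _ in range(generations):
        # One mutated child per elite genome (mu = lam here).
        children = [[g + rng.gauss(0, sigma) for g in parent]
                    for parent in elite]
        pool = elite + children
        pool.sort(key=fitness, reverse=True)
        elite = pool[:mu]
    return elite[0]

# Toy fitness standing in for the car simulation: maximised at all ones.
best = evolve(lambda g: -sum((x - 1.0) ** 2 for x in g))
```

Because selection keeps the parents in the pool, the best fitness never decreases between generations.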
TABLE I
THE FITNESS OF THE BEST CONTROLLER OF VARIOUS GENERATIONS ON THE DIFFERENT TRACKS, AND NUMBER OF RUNS PRODUCING PROFICIENT CONTROLLERS. FITNESS AVERAGED OVER 10 SEPARATE EVOLUTIONARY RUNS; STANDARD DEVIATION BETWEEN PARENTHESES.

Track  10           50           100          200          Pr.
1      0.32 (0.07)  0.54 (0.2)   0.7 (0.38)   0.81 (0.5)   2
2      0.38 (0.24)  0.49 (0.38)  0.56 (0.36)  0.71 (0.5)   2
3      0.32 (0.09)  0.97 (0.5)   1.47 (0.63)  1.98 (0.66)  7
4      0.53 (0.17)  1.3 (0.48)   1.5 (0.54)   2.33 (0.59)  9
5      0.45 (0.08)  0.95 (0.6)   0.95 (0.58)  1.65 (0.45)  8
6      0.4 (0.08)   0.68 (0.27)  1.02 (0.74)  1.29 (0.76)  5
7      0.3 (0.07)   0.35 (0.05)  0.39 (0.09)  0.46 (0.13)  0
8      0.16 (0.02)  0.19 (0.03)  0.2 (0.01)   0.2 (0.01)   0

Last but not least: the fitness function. The fitness of a controller is calculated as the number of waypoints it has
passed, divided by the number of waypoints in the track,
plus an intermediate term representing how far it is on its way
to the next waypoint, calculated from the relative distances
between the car and the previous and next waypoint. A
fitness of 1.0 thus means having completed one full track
within the allotted time. Waypoints can only be passed in the
correct order, and a waypoint is counted as passed when the
centre of the car is within 30 pixels from the waypoint. In
the evolutionary experiments reported below, each car was
allowed 700 timesteps (enough to do two to three laps on
most tracks in the test set) and fitness was averaged over
three trials.
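The fitness function described above might be sketched like this; the exact form of the intermediate term is our assumption (the paper only says it is calculated from the relative distances to the previous and next waypoint):

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def waypoint_fitness(passed, total, car, prev_wp, next_wp):
    """Waypoints passed divided by waypoints per lap, plus a continuous
    term for progress towards the next waypoint. The intermediate term
    here - the fraction of the way from the previous waypoint to the
    next, judged by relative distances - is an illustrative assumption."""
    d_prev, d_next = dist(car, prev_wp), dist(car, next_wp)
    intermediate = d_prev / (d_prev + d_next)
    return (passed + intermediate) / total

# Halfway between waypoints on a 30-waypoint track, 15 already passed:
f = waypoint_fitness(15, 30, (5.0, 0.0), (0.0, 0.0), (10.0, 0.0))
```

A value of 1.0 then corresponds to exactly one completed lap within the allowed time.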
IV. EVOLVING TRACK-SPECIFIC CONTROLLERS
The first experiments consisted in evolving controllers for
the eight tracks separately, in order to test the software
in general and to rank the difficulty of the tracks.
For each of the tracks, the evolutionary algorithm was run
10 times, each time starting from a population of “clean”
controllers, with all connection weights set to zero and sensor
parameters as explained above. Only weight mutation was
allowed. The evolutionary runs were for 200 generations
each.
A. Fixed sensor parameters
1) Evolving from scratch: The results are listed in table I,
which is read as follows: each row represents the results for
one particular track. The first column gives the mean of the
fitnesses of the best controller of each of the evolutionary
runs at generation 10, and the standard deviation of the
fitnesses of the same controllers. The next three columns
present the results of the same calculations at generations 50,
100 and 200, respectively. The “Pr” column gives the number
of proficient best controllers for each track. An evolutionary
run is deemed to have produced a proficient controller if
its best controller at generation 200 has a fitness (averaged,
as always, over three trials) of at least 1.5, meaning that it
completes at least one and a half laps within the allowed time.
For the first two tracks, proficient controllers were pro-
duced by the evolutionary process within 200 generations,
but only in two out of ten runs. This means that while it is
possible to evolve neural networks that can be relied on to
Track         1            2            3            4            5            6            7            8
Fitness (sd)  1.66 (0.12)  1.86 (0.02)  2.27 (0.45)  2.66 (0.3)   2.19 (0.23)  2.47 (0.18)  0.22 (0.15)  0.15 (0.01)
TABLE V
FITNESS OF A FURTHER EVOLVED GENERAL CONTROLLER WITH EVOLVABLE SENSOR PARAMETERS ON THE DIFFERENT TRACKS. COMPOUND FITNESS
2.22 (0.09).
Track  10           50           100          200          Pr.
1      1.9 (0.1)    1.99 (0.06)  2.02 (0.01)  2.04 (0.02)  10
2      2.06 (0.1)   2.12 (0.04)  2.14 (0)     2.15 (0.01)  10
3      3.25 (0.08)  3.4 (0.1)    3.45 (0.12)  3.57 (0.1)   10
4      3.35 (0.11)  3.58 (0.11)  3.61 (0.1)   3.67 (0.1)   10
5      2.66 (0.13)  2.84 (0.02)  2.88 (0.06)  2.88 (0.06)  10
6      2.64 (0)     2.71 (0.08)  2.72 (0.08)  2.82 (0.1)   10
7      1.53 (0.29)  1.84 (0.13)  1.88 (0.12)  1.9 (0.09)   10
8      0.59 (0.15)  0.73 (0.22)  0.85 (0.21)  0.93 (0.25)  0
TABLE VI
FITNESS OF BEST CONTROLLERS, EVOLVING CONTROLLERS
SPECIALISED FOR EACH TRACK, STARTING FROM A FURTHER EVOLVED
GENERAL CONTROLLER WITH EVOLVED SENSOR PARAMETERS.
Fig. 5. Sensor setup of controller specialized for track 5. While more or less retaining the two longest-range sensors from the further evolved general controller it is based on, it has added medium-range sensors in the front and back, and a very short-range sensor to the left.
controllers. For each track, 10 evolutionary runs were made,
where the initial population was seeded with the general
controller and evolution was allowed to continue for 200
generations. Results are shown in table VI. The mean fitness
improved significantly on the first six tracks, and much of
the fitness increase occurred early in the evolutionary run,
as can be seen from a comparison with table V. Further,
the variability in mean fitness of the specialized controllers
from different evolutionary runs is very low, meaning that the
reliability of the evolutionary process is very high. Perhaps
most surprising, however, is that all 10 evolutionary runs
produced proficient controllers for track 7, on which the
general controller had not been trained (and indeed had very
low fitness) and for which it had previously been found to
be impossible to evolve a proficient controller from scratch.
Analysis of the evolved sensor parameters of the specialized controllers shows a remarkable diversity, even among
controllers specialized for the same track, as evident in
figures 5, 6 and 7. Sometimes, no similarity can be found
between the evolved configuration and either the original
sensor parameters or those of the further evolved general
controller the specialization was based on.
Fig. 6. Sensor setup of a controller specialized for, and able to consistently reach good fitness on, track 7. Presumably the use of all but one sensor and their angular spread reflects the large variety of different situations the car has to handle in order to navigate this more difficult track.
Fig. 7. Sensor setup of another controller specialized for track 7, like the one in figure 6 seemingly using all its sensors, but in a quite different way.
VII. OBSERVATIONS ON EVOLVED DRIVING BEHAVIOUR
It has previously been found that the evolutionary approach
used in this paper can produce controllers that outperform
human drivers [4]. To corroborate this result, one of the
authors measured his own performance on the various tracks,
driving the car using keyboard inputs and a suitable delay
of 50 ms between timesteps. Averaged over 10 attempts,
the author’s fitness on track 2 was 1.89, it was 2.65 on
track 5, and 1.83 on track 7, numbers which compare rather
unfavourably with those found in table VI. The responsible
author would like to believe that this says more about the
capabilities of the evolved controllers than those of the
author.
Traces of steering and driving commands from the evolved
controllers show that they often use a PWM-like technique,
in that they frequently - sometimes almost every timestep -
change what commands they issue. For example, the general
controller used as the base for the specializations above
employs the tactic of constantly alternating between steering
left and right when driving parallel to a wall, giving the
appearance that the car is shaking. Frequently alternating
Example video
Evolved with 50+50 ES, 100 Generations
Choose your inputs (+ their representation)
• Using third-person inputs (cartesian inputs) seems not to work
• Either range-finders or waypoint sensor can be taken away, but some fitness lost
• A little bit of noise is not a problem, actually it’s desirable
• Adding extra inputs (while keeping core inputs) can reduce evolvability drastically!
If you don’t know your inputs...
• Memetic techniques (e.g. memetic ES) can sort out useful from useless inputs
• Principle: evolve neural network weights together with a mask: whether connections are on or off
• Masks and weights are evolved at different time scales; after every mask mutation, weight space is searched - if no fitness increase, the mask is reverted
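The mask-and-revert principle in the bullets above might be sketched as follows. This is a simplified single-bit-flip, hill-climbing variant of our own devising, not the exact memetic ES; the toy fitness function is likewise hypothetical:

```python
import random

def memetic_step(weights, mask, fitness, rng, inner_steps=20, sigma=0.3):
    """One outer step: flip one mask bit, hill-climb the weights under
    the new mask, and revert the flip unless fitness improved."""
    base = fitness(weights, mask)
    new_mask = list(mask)
    new_mask[rng.randrange(len(mask))] ^= 1  # toggle one connection
    best_w, best_f = list(weights), fitness(weights, new_mask)
    for _ in range(inner_steps):  # weight-space search at the fast timescale
        cand = [w + rng.gauss(0, sigma) for w in best_w]
        f = fitness(cand, new_mask)
        if f > best_f:
            best_w, best_f = cand, f
    if best_f > base:
        return best_w, new_mask, best_f
    return weights, mask, base  # no fitness increase: mask reverted

# Toy problem: only input 0 matters; each active irrelevant input costs 0.1.
def toy_fitness(w, m):
    return -abs(w[0] * m[0] - 1.0) - 0.1 * sum(m[1:])

rng = random.Random(0)
weights, mask = [0.0] * 5, [1] * 5
score = toy_fitness(weights, mask)
for _ in range(30):
    weights, mask, score = memetic_step(weights, mask, toy_fitness, rng)
```

Since an unimproved mask flip is always reverted, the score is monotonically non-decreasing over outer steps.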
Learning controllers with irrelevant inputs present
Togelius, Gomez and Schmidhuber (2008)
Generalization and specialization
• A controller evolved for one track does not necessarily perform well on other tracks
• How do we achieve more general game-playing skills?
• Is there a tradeoff between generality and performance?
Fig. 1. The eight tracks. Notice how tracks 1 and 2 (at the top), 3 and 4, 5 and 6 differ in the clockwise/anti-clockwise layout of waypoints and associated starting points. Tracks 7 and 8 have no relation to each other apart from both being difficult.
how to evolve controllers that provide robust performance over several tracks. These controllers are then validated on tracks for which they have not been evolved. Finally, these controllers are further evolved to provide better fitness on specific tracks, conclusions are drawn, and further research is suggested.
II. THE CAR RACING MODEL
The experiments in this article were performed in a 2-dimensional simulator, intended to qualitatively, if not quantitatively, model a standard radio-controlled (R/C) toy car (approximately 17 centimeters long) in an arena with dimensions approximately 3*2 meters, where the track is delimited by solid walls. The simulation has the dimensions 400*300 pixels, and the car measures 20*10 pixels.
R/C toy car racing differs from racing full-sized cars in several ways. One is the simplified controls; many R/C cars have only three possible drive modes (forward, backward, and neutral) and three possible steering modes (left, right and center). Other differences are that many toy cars have bad grip on many surfaces, leading to easy skidding, and that damaging such cars in collisions is harder due to their low weight.
The dynamics of the car are based on a reasonably detailed mechanical model, taking into account the small size of the car and bad grip on the surface, but is not based on any actual measurements [13][14]. The model is similar to that used in [4], and differs mainly in its improved collision handling; after more experience with the physical R/C cars, the collision response system was reimplemented to make collisions more realistic (and, as an effect, more undesirable). Now, a collision may cause the car to get stuck if the wall is struck at an unfortunate angle, something often seen in experiments with physical cars.
A track consists of a set of walls, a chain of waypoints, and a set of starting positions and directions. A car is added to a track in one of the starting positions, with the corresponding starting direction, both position and angle being subject to random alterations. The waypoints are used for fitness calculations.
For the experiments we have designed eight different tracks, presented in figure 1. The tracks are designed to vary in difficulty, from easy to hard. Three of the tracks are versions of three other tracks with all the waypoints in reverse order, and the directions of the starting positions reversed.
The main differences between our simulation and the real R/C car racing problem have to do with sensing. As reported in Tanev et al. as well as in [4], there is a small but not unimportant lag in the communication between camera, computer and car, leading to the controller acting on outdated perceptions. Apart from that, there is often some error in estimations of the car’s position and velocity from an overhead camera. In contrast, the simulation allows instant and accurate information to be fed to the controller.
III. EVOLVABLE INTELLIGENCE
A. Sensors
The car experiences its environment through two types of sensors: the waypoint sensor and the wall sensors. The waypoint sensor gives the difference between the car’s current orientation and the angle to the next waypoint (but not the distance to the waypoint). When pointing straight at a waypoint, this sensor thus outputs 0; when the waypoint is to the left of the car it outputs a positive value, and vice versa. As for the wall sensors, each sensor has an angle (relative to the orientation of the car) and a range, between 0 and 200 pixels. The output of the wall sensor is zero if no wall is encountered along a line with the specified angle and range from the centre of the car; otherwise it is a fraction of one, depending on how close to the car the sensed wall is. A small amount of noise is applied to all sensor readings, as it is to starting positions and orientations.
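A wall sensor of this kind can be sketched as a sampled ray-cast. The sampling step and the linear closeness scaling below are our assumptions; the text only specifies the zero / fraction-of-one behaviour:

```python
import math

def wall_sensor(car_x, car_y, heading, angle, sensor_range, is_wall,
                step=2.0):
    """Range-finder sketch: walk along a ray from the car's centre.
    Returns 0 if no wall lies within range, otherwise a fraction of one
    that grows as the wall gets closer."""
    a = heading + angle  # sensor angle is relative to car orientation
    d = step
    while d <= sensor_range:
        x = car_x + d * math.cos(a)
        y = car_y + d * math.sin(a)
        if is_wall(x, y):
            return 1.0 - d / sensor_range
        d += step
    return 0.0

# Toy arena with a wall at x >= 100; a forward-pointing sensor, range 200.
reading = wall_sensor(0.0, 0.0, 0.0, 0.0, 200.0, lambda x, y: x >= 100.0)
```

A wall at half the sensor's range thus yields a reading of 0.5, while a sensor pointing away from all walls yields 0.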
In some of the experiments the sensor parameters are mutated by the evolutionary algorithm, but in all experiments they start from the following setup: one sensor points straight forward (0 radians) in the direction of the car and has
Incremental evolution
• Introduced by Gomez & Miikkulainen (1997)
• Change the fitness function f (to make it more demanding) as soon as a certain fitness is achieved
• In this case, add new tracks to f as soon as the controller can drive 1.5 rounds on all tracks currently in f
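The track-adding schedule in the bullets above might be sketched as follows; `evaluate(controller, track)` is a hypothetical stand-in for running the simulation and returning laps completed:

```python
def incremental_fitness_schedule(evaluate, all_tracks, threshold=1.5):
    """Sketch of incremental evolution: evaluate on a growing set of
    tracks, adding the next track once the controller reaches the
    threshold (1.5 laps) on every track currently in the set."""
    active = [all_tracks[0]]
    def fitness(controller):
        scores = [evaluate(controller, t) for t in active]
        if min(scores) >= threshold and len(active) < len(all_tracks):
            active.append(all_tracks[len(active)])  # make f more demanding
        return sum(scores) / len(scores)
    return fitness, active

# Toy evaluator: the "controller" is just the number of laps it drives.
fitness, active = incremental_fitness_schedule(lambda c, t: c,
                                               ["t1", "t2", "t3"])
```

Each time the current track set is mastered, the fitness function silently becomes harder, which is the core of the incremental approach.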
Incremental evolution
• Controllers evolved for specific tracks perform poorly on other tracks
• General controllers, that can drive almost any track, can be incrementally evolved
• Starting from a general controller, a controller can be further evolved for specialization on a particular track
• drive faster than the general controller
• works even when evolution from scratch did not work!
Two cars on a track
• Two cars with solo-evolved controllers on one track: disaster
• they don’t even see each other!
• How do we train controllers that take other drivers into account? (avoiding collisions or using them to their advantage)
• Solution: car sensors (rangefinders, like the wall sensors) and competitive coevolution
Video: navigating a complex track
Competitive coevolution
• The fitness function evaluates at least two individuals
• One individual’s success is adversely affected by the other’s (directly or indirectly)
• Very potent, but seldom straightforward; e.g. Hillis (1991), Rosin and Belew (1996)
Competitive coevolution
• Standard 15+15 ES; each individual is evaluated through testing against the current best individual in the population
• Fitness function a mix of...
• Absolute fitness: progress in n time steps
• Relative fitness: distance ahead of or behind the other car after n time steps
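The mix of the two fitness components can be sketched as a simple blend; the linear weighting is our assumption, with the weight controlling where between pure absolute and pure relative fitness the evaluation sits:

```python
def coevolution_fitness(progress_self, progress_other, w_absolute=0.5):
    """Absolute fitness is own progress over n timesteps; relative
    fitness is the distance ahead of (or, if negative, behind) the
    other car. w_absolute = 1 gives pure absolute fitness, 0 pure
    relative fitness, and 0.5 the 50/50 mix."""
    absolute = progress_self
    relative = progress_self - progress_other
    return w_absolute * absolute + (1.0 - w_absolute) * relative
```

Keeping some absolute component in the mix is one way to fight the loss-of-gradient problem discussed below: even when both cars drive equally well, fitness still rewards driving fast.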
Video: absolute fitness
Video: 50/50 fitness
Video: relative fitness
Problems with coevolution
• Over-specialization and cycling
• Can be battled with e.g. archives
• Loss of gradient
• Can be battled with careful fitness function design, e.g. combining absolute and relative fitness
• Much more research needed here!
Multi-population coevolution
• Typically, competitive coevolution uses one or two populations
• Many more populations can be used!
• Can help against cycling and overspecialization
• The phenotypical diversity between populations can be useful in itself
Example: 1 versus 9 populations
Togelius, Burrow, Lucas (2007)
Player modelling
• Can we create players that drive just like specific human players?
• The models need to be...
• Similar in terms of performance
• Similar in terms of playing (driving) style
• Robust
Direct modelling
• Let a player drive a number of tracks
• Use supervised learning to associate inputs (sensors) with outputs (driving commands)
• e.g. MLP/Backpropagation or k-nearest neighbour
• Suffers from generalization problems, and that any approximation is likely to lead to worse playing performance
Indirect modelling
• Let a human drive a test track, record performance, speed and orthogonal deviation at the various waypoints of the track
• Start from a good, general evolved neural network controller, and evolve it further
• Fitness: negative difference between controller and player for the three measures above
Fig. 2. The test track and the car.
First of all, we design a test track, featuring a number of different types of racing challenges. The track, as pictured in figure 2, has two long straight sections where the player can drive really fast (or choose not to), a long smooth curve, and a sequence of nasty sharp turns. Along the track are 30 waypoints, and when a human player drives the track, the way he passes each waypoint is recorded. What is recorded is the speed of the car when the waypoint is passed, and the orthogonal deviation from the path between the waypoints, i.e. how far to the left or right of the waypoint the car passed. This matrix of two times 30 values constitutes the raw data for the player model.
The actual player model is constructed using the Cascading Elitism algorithm, starting from a general controller and evolving it on the test track. Three fitness functions are used, based on minimising the following differences between the real player and the controller: f1: total progress (number of waypoints passed within 1500 timesteps), f2: speed at which each waypoint was passed, and f3: orthogonal deviation with which each waypoint was passed. The first and most important fitness measure is thus total progress difference, followed by speed and deviation difference respectively.
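The cascading selection step, with f1 taking priority over f2 over f3, might be sketched like this. The cut sizes used here are illustrative assumptions, not necessarily those used in the experiments:

```python
def cascading_elitism_select(population, f1, f2, f3, cuts=(50, 30, 20)):
    """Cascading selection with three objectives: keep the best cuts[0]
    individuals by f1, of those the best cuts[1] by f2, and of those
    the best cuts[2] by f3. Earlier objectives thus dominate later ones."""
    stage1 = sorted(population, key=f1, reverse=True)[:cuts[0]]
    stage2 = sorted(stage1, key=f2, reverse=True)[:cuts[1]]
    return sorted(stage2, key=f3, reverse=True)[:cuts[2]]
```

In a full run, the individuals surviving the final cut would form the elite that reproduces into the next generation.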
D. Results
In our experiments, five different players’ driving was sampled on the test track, and after 100 generations of the Cascading Elitism algorithm with a population of 100, controllers whose driving bore an acceptable degree of resemblance to the modelled humans had emerged. The total progress varied considerably between the five players - between 1.31 and 2.59 laps in 1500 timesteps - and this difference was faithfully replicated in the evolved controllers, which is to say that some controllers drove much faster than others. Progress was made on the two other fitness measures as well, and though there was still some numerical difference between the real and modelled speed and orthogonal deviation at most waypoint passings, the evolved controllers do reproduce qualitative aspects of the modelled players’ driving. For example, the controller modelled on the first
[Plot: fitness (progress, speed) and fitness (orthogonal deviation) against generations 0-50; legend: speed, progress, orthogonal deviation.]
Fig. 3. Evolving a controller to model a slow, careful driver.
[Plot: fitness (progress, speed) and fitness (orthogonal deviation) against generations 0-50; legend: speed, progress, orthogonal deviation.]
Fig. 4. Evolving a controller to model a good driver. The lack of progress on minimising the progress difference is because the progress of the modelled driver is very close to that of the generic controller used to initialise the evolution.
author drives very close to the wall in the long smooth curve, very fast on the straight paths, and smashes into the wall at the beginning of the first sharp turn. Conversely, the controller modelled on the anonymous and very careful driver who scored the lowest total progress crept along at a steady speed, always keeping to the center of the track.
V. TRACK EVOLUTION
Once a good model of the human player has been acquired, we will use this model to evolve new, fun racing tracks for the human player. In order to do this, we must know what it is for a racing track to be fun, how we can measure this property, and how the racing track should be represented in order for good track designs to be within easy reach of the evolutionary algorithm. We have not been able to find any previous research on evolving tracks, or for that matter any sort of computer game levels or environments. However, Ashlock
The test track supposedly requires a varied repertoire of driving skills
Content creation
• Creating interesting, enjoyable levels, worlds, tracks, opponents etc.
• Not the same as well-playing opponents
• Probably the area where commercial game developers need most help
• What makes game content fun? Many theories, e.g. Thomas Malone, Raph Koster, Mihály Csíkszentmihályi
Track evolution
• Using the controllers we evolved to model human players, we evolve tracks that are fun to drive for the modelled player
• Fitness function:
• Right amount of progress
• Variation in progress
• High maximum speed
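A possible combination of these three criteria is sketched below. The target progress, weights and exact functional form are all our assumptions; the slide only names the criteria:

```python
def track_fun_fitness(progresses, max_speed, target_progress=1.5,
                      weights=(1.0, 1.0, 0.01)):
    """Illustrative blend of the three criteria: mean progress close to
    a target (the 'right amount'), high variation in progress across
    trials, and high maximum speed."""
    mean_p = sum(progresses) / len(progresses)
    var_p = sum((p - mean_p) ** 2 for p in progresses) / len(progresses)
    closeness = -abs(mean_p - target_progress)
    return (weights[0] * closeness + weights[1] * var_p
            + weights[2] * max_speed)
```

Under this blend, a track on which the modelled player's progress varies from trial to trial scores above an equally on-target but perfectly predictable one.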
The collision detection in the car game works by sampling pixels on a canvas, and this mechanism is taken advantage of when the b-spline is transformed into a track. First, thick walls are drawn at some distance on each side of the b-spline, this distance being either set to 30 pixels or subject to evolution, depending on how the experiment is set up. But when a turn is too sharp for the current width of the track, this will result in walls intruding on the track and sometimes blocking the way. The next step in the construction of the track is therefore “steamrolling” it, or traversing the b-spline and painting a thick stroke of white in the middle of the track. Finally, waypoints are added at approximately regular distances along the length of the b-spline. The resulting track can look very smooth, as evidenced by the test track, which was constructed simply by manually setting the control points of a spline.
D. Initialisation and mutation
In order to investigate how best to leverage the representational power of the b-splines, we experimented with several different ways of initialising the tracks at the beginning of the evolutionary runs, and different implementations of the mutation operator. Three of these configurations are described here.
1) Straightforward: The straightforward initial track shape is a rectangle with rounded corners. Each mutation operation then perturbs one of the control points by adding numbers drawn from a Gaussian distribution with standard deviation 20 pixels to both x and y coordinates.
2) Random walk: In the random walk experiments, mutation proceeds as in the straightforward configuration, but the initialisation is different. A rounded rectangle track is first subject to random walk, whereby hundreds of mutations are carried out on a single track, and only those mutations that result in a track on which a generic controller is not able to complete a full lap are retracted. The result of such a random walk is a severely deformed but still drivable track. A population is then initialised with this track and evolution proceeds as usual from there.
3) Radial: The radial method of mutation starts from an equally spaced radial disposition of the control points around the center of the image; the distance of each point from the center is generated randomly. Similarly, at each mutation operation the position of the selected control point is simply changed randomly along the respective radial line from the center. Constraining the control points to a radial disposition is a simple method to exclude the possibility of producing a b-spline containing loops, therefore producing tracks that are always fully drivable.
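The radial representation and its mutation can be sketched as follows; the arena centre, radius bounds and mutation strength are illustrative assumptions:

```python
import math
import random

def radial_track(radii, cx=200.0, cy=150.0):
    """Control points at equal angular spacing around the arena centre,
    at evolvable distances. Constraining points to radial lines rules
    out self-intersecting (looped) b-splines by construction."""
    n = len(radii)
    return [(cx + r * math.cos(2 * math.pi * i / n),
             cy + r * math.sin(2 * math.pi * i / n))
            for i, r in enumerate(radii)]

def mutate_radial(radii, rng, sigma=20.0, r_min=30.0, r_max=140.0):
    """Perturb one control point along its radial line, clamped so the
    track stays inside the arena (bounds and sigma are illustrative)."""
    out = list(radii)
    i = rng.randrange(len(out))
    out[i] = min(r_max, max(r_min, out[i] + rng.gauss(0, sigma)))
    return out

rng = random.Random(1)
radii = [100.0] * 12
mutated = mutate_radial(radii, rng)
points = radial_track(mutated)
```

The resulting control points would then be fed to the b-spline and "steamrolling" steps described above to produce a drivable track.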
E. Results
We evolved a number of tracks using the b-spline representation, different initialisation and mutation methods, and different controllers derived using the indirect player modelling approach.
Fig. 5. Track evolved using the random walk initialisation and mutation.
Fig. 6. A track evolved (using the radial method) to be fun for the first author, who plays too many racing games anyway. It is not easy to drive, which is just as it should be.
1) Straightforward: Overall, the tracks evolved with the straightforward method looked smooth, and were just as easy or hard to drive as they should be: the controller for which the track was evolved typically made a total progress very close to the target progress. However, the evolved tracks didn’t differ from each other as much as we would have wanted. The basic shape of a rounded rectangle shines through rather more than it should.
2) Random walk: Tracks evolved with random walk initialisation look weird and differ from each other in an interesting way, and so fulfil at least one of our objectives. However, their evolvability is a bit lacking, with the actual progress of the controller often quite a bit off the target progress, and maximum speed low.
Fig. 7. A track evolved (using the radial method) to be fun for the second author, who is a bit more careful in his driving. Note the absence of sharp turns.
3) Radial: With the radial method, the tracks evolve rather quickly and look decidedly different depending on what controller was used to evolve them, and can thus be said to be personalised. However, there is some lack of variety in the end results, in that they all look slightly like flowers.
4) Comparison with segment-based tracks: It is interesting to compare these tracks with some tracks evolved using the segment-based representation from our previous paper. Those tracks do show both the creativity evolution is capable of and a good ability to optimise the fitness values we define. But they don’t look like anything you would want to get out and drive on.
VI. DISCUSSION
We believe the method described in this paper holds great promise, and that our player modelling method is good enough to be usable, but that there is much that needs to be done in order for track evolution to be incorporated in an actual game. To start with, the track representation and mutation methods need to be developed further, until we arrive at something which is as evolvable and variable as the segment-based representation but looks as good as (and is closed like) the b-spline-based representation.
Further, the racing game we have used for this investigation is too simple in several ways, not least graphically but also in its physics model being two-dimensional. A natural next step would be to repeat the experiments performed here in a graphically advanced simulation based on a suitable physics engine, such as Ageia's PhysX technology [19]. In such a simulation, it would be possible to evolve not only the track in itself, but also other aspects of the environment, such as buildings in a city in which a race takes place. This could be done by combining the idea of procedural content creation [20][21] with evolutionary computation. Another exciting prospect is evolving personalised competitors, building on the results of our earlier investigations into co-evolution in car racing [10].
In the section above on what makes racing fun, we describe a number of potential measures of entertainment value, most of which are not implemented in the experiments described here. Defining quantitative versions of these measures would definitely be interesting, but we believe it is more urgent to study the matter empirically. Malone's and Koster's oft-cited hypotheses are just hypotheses, and as far as we know there are no psychological studies that tell us what entertainment metric would be most suitable for particular games and types of player. Real research on real players is needed.
Finally, we note that although we distinguished between different approaches to computational intelligence and games at the beginning of this paper, many experiments can be viewed from several perspectives. The focus in this paper on using evolutionary computation for practical purposes in games is not at all incompatible with using games for studying under what conditions intelligence can evolve, a perspective we have taken in some of our previous papers. On the contrary.
VII. ACKNOWLEDGEMENTS
Thanks to Owen Holland, Georgios Yannakakis, Richard Newcombe and Hugo Marques for insightful discussions.
REFERENCES
[1] G. Kendall and S. M. Lucas, Proceedings of the IEEE Symposium on Computational Intelligence and Games. IEEE Press, 2005.
[2] P. Spronck, "Adaptive game AI," Ph.D. dissertation, University of Maastricht, 2005.
[3] I. Tanev, M. Joachimczak, H. Hemmi, and K. Shimohara, "Evolution of the driving styles of anticipatory agent remotely operating a scaled model of racing car," in Proceedings of the 2005 IEEE Congress on Evolutionary Computation (CEC-2005), 2005, pp. 1891–1898.
[4] B. Chaperot and C. Fyfe, "Improving artificial intelligence in a motocross game," in IEEE Symposium on Computational Intelligence and Games, 2006.
[5] J. Togelius and S. M. Lucas, "Evolving controllers for simulated car racing," in Proceedings of the Congress on Evolutionary Computation, 2005.
[6] ——, "Evolving robust and specialized car racing skills," in Proceedings of the IEEE Congress on Evolutionary Computation, 2006.
[7] K. Wloch and P. J. Bentley, "Optimising the performance of a formula one car using a genetic algorithm," in Proceedings of the Eighth International Conference on Parallel Problem Solving From Nature, 2004, pp. 702–711.
[8] D. Cliff, "Computational neuroethology: a provisional manifesto," in Proceedings of the First International Conference on Simulation of Adaptive Behavior: From Animals to Animats, 1991, pp. 29–39.
[9] D. Floreano, T. Kato, D. Marocco, and E. Sauser, "Coevolution of active vision and feature selection," Biological Cybernetics, vol. 90, pp. 218–228, 2004.
[10] J. Togelius and S. M. Lucas, "Arms races and car races," in Proceedings of Parallel Problem Solving from Nature. Springer, 2006.
[11] D. A. Pomerleau, "Neural network vision for robot driving," in The Handbook of Brain Theory and Neural Networks, 1995.
[12] J. Togelius, R. D. Nardi, and S. M. Lucas, "Making racing fun through player modeling and track evolution," in Proceedings of the SAB'06 Workshop on Adaptive Approaches for Optimizing Player Satisfaction in Computer and Physical Games, 2006.
[13] D.-A. Jirenhed, G. Hesslow, and T. Ziemke, "Exploring internal simulation of perception in mobile robots," in Proceedings of the Fourth European Workshop on Advanced Mobile Robots, 2001, pp. 107–113.
The collision detection in the car game works by sampling pixels on a canvas, and this mechanism is taken advantage of when the b-spline is transformed into a track. First, thick walls are drawn at some distance on each side of the b-spline, this distance being either set to 30 pixels or subject to evolution, depending on how the experiment is set up. But when a turn is too sharp for the current width of the track, this will result in walls intruding on the track and sometimes blocking the way. The next step in the construction of the track is therefore "steamrolling" it, or traversing the b-spline and painting a thick stroke of white in the middle of the track. Finally, waypoints are added at approximately regular distances along the length of the b-spline. The resulting track can look very smooth, as evidenced by the test track, which was constructed simply by manually setting the control points of a spline.
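The wall-drawing and steamrolling steps can be sketched on a plain pixel grid. This is a hypothetical reconstruction, not the game's actual code: the function name, the stamp radius, and the canvas size are all our own assumptions.

```python
import math

def rasterise_track(spline_points, width=30, canvas_size=400):
    """Return a canvas_size x canvas_size grid: 1 = wall pixel, 0 = drivable."""
    canvas = [[0] * canvas_size for _ in range(canvas_size)]

    def stamp(cx, cy, radius, value):
        # Paint a filled disc of the given value onto the canvas.
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                if dx * dx + dy * dy <= radius * radius:
                    px, py = int(cx) + dx, int(cy) + dy
                    if 0 <= px < canvas_size and 0 <= py < canvas_size:
                        canvas[py][px] = value

    # Step 1: draw thick walls at a fixed offset on each side of the spline.
    for (x0, y0), (x1, y1) in zip(spline_points, spline_points[1:]):
        heading = math.atan2(y1 - y0, x1 - x0)
        for side in (-1, 1):
            wx = x0 + side * width * math.cos(heading + math.pi / 2)
            wy = y0 + side * width * math.sin(heading + math.pi / 2)
            stamp(wx, wy, 4, 1)

    # Step 2: "steamroll" the track -- repaint its middle as drivable,
    # clearing any wall pixels that intruded on sharp turns.
    for (x, y) in spline_points:
        stamp(x, y, width - 5, 0)

    return canvas
```

Because the steamrolling pass runs after the wall pass, any wall segment that a sharp turn pushed onto the racing line is erased again, which is exactly the repair described above.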
D. Initialisation and mutation
In order to investigate how best to leverage the representational power of the b-splines, we experimented with several different ways of initialising the tracks at the beginning of the evolutionary runs, and different implementations of the mutation operator. Three of these configurations are described here.
1) Straightforward: The straightforward initialisation forms a track shaped as a rectangle with rounded corners. Each mutation operation then perturbs one of the control points by adding numbers drawn from a Gaussian distribution with standard deviation 20 pixels to both the x and y coordinates.
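As a minimal sketch, assuming the track genome is simply a list of (x, y) control points (the function name is ours, not the paper's), the straightforward mutation could look like:

```python
import random

def mutate_straightforward(control_points, sigma=20.0):
    """Perturb one randomly chosen control point with Gaussian noise of
    standard deviation 20 pixels on each axis, as described above."""
    points = list(control_points)
    i = random.randrange(len(points))
    x, y = points[i]
    points[i] = (x + random.gauss(0, sigma), y + random.gauss(0, sigma))
    return points
```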
2) Random walk: In the random walk experiments, mutation proceeds as in the straightforward configuration, but the initialisation is different. A rounded rectangle track is first subjected to a random walk, whereby hundreds of mutations are carried out on a single track, and only those mutations that result in a track on which a generic controller is not able to complete a full lap are retracted. The result of such a random walk is a severely deformed but still drivable track. A population is then initialised with this track and evolution proceeds as usual from there.
3) Radial: The radial method of mutation starts from an equally spaced radial disposition of the control points around the center of the image; the distance of each point from the center is generated randomly. Similarly, at each mutation operation the position of the selected control point is simply changed randomly along its radial line from the center. Constraining the control points to a radial disposition is a simple way to exclude the possibility of producing a b-spline containing loops, thereby producing tracks that are always fully drivable.
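A sketch of the radial scheme follows; the centre coordinates and radius bounds are illustrative assumptions, not values from the paper:

```python
import math
import random

CENTER = (200.0, 200.0)          # assumed image centre
MIN_R, MAX_R = 30.0, 180.0       # assumed radius bounds

def radial_init(n_points):
    """Place control points at equally spaced angles, each at a random
    distance from the centre."""
    cx, cy = CENTER
    points = []
    for i in range(n_points):
        angle = 2 * math.pi * i / n_points
        r = random.uniform(MIN_R, MAX_R)
        points.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return points

def radial_mutate(points):
    """Move one control point to a new random distance along its own
    radial line; angles never change, so the spline cannot self-intersect."""
    cx, cy = CENTER
    out = list(points)
    i = random.randrange(len(out))
    x, y = out[i]
    angle = math.atan2(y - cy, x - cx)
    r = random.uniform(MIN_R, MAX_R)
    out[i] = (cx + r * math.cos(angle), cy + r * math.sin(angle))
    return out
```

Since every point stays on its own radial line, the control polygon visits angles in strictly increasing order, which is why loops are impossible by construction.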
E. Results
Evolutionary selection can thus be seen to guarantee that the top speed is not dropped in favour of the other fitness measures.
Figure 3 displays three tracks that evolution tailored to the player models of two of the authors; track (a) is evolved for a final progress of 1.1 (since the respective human player was not very skilled), while tracks (b) and (c) are instead evolved on the model of a much more skilled player for a final progress of 1.5. For tracks (a) and (b) all three fitness measures were used, while for track (c) only the progress fitness was used.
The main difference between tracks (a) and (b) is that track (a) is broader and has fewer tricky passages, which makes sense as the player model used to evolve (a) drives slower. Both contain straight paths that allow the controller to achieve high speeds. In track (b) we can definitely notice the presence of narrow passages and sharp turns, elements that force the controller to reduce speed but only sometimes cause the car to collide. Those elements are believed to be the main source of final progress variability. These features are also notably absent from track (c), on which the good player model has very low variability. The progress of the controller is instead limited by many broad curves.
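A fitness combining the three measures named above (closeness of progress to the target, progress variability across trials, and top speed) could be sketched as follows; the weighted-sum form and the weight values are our own assumptions, not the paper's actual formula:

```python
def combined_fitness(progresses, top_speed, target_progress,
                     w_dev=1.0, w_var=0.5, w_speed=0.01):
    """Hypothetical weighted combination: reward mean progress close to
    the target, high variability across trials, and a high top speed."""
    mean_p = sum(progresses) / len(progresses)
    var_p = sum((p - mean_p) ** 2 for p in progresses) / len(progresses)
    return (-w_dev * abs(mean_p - target_progress)
            + w_var * var_p
            + w_speed * top_speed)
```

Under this sketch, a track on which the modelled driver hits the target progress exactly always outscores one that misses it by the same margin of speed and variability.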
Fig. 3. Three evolved tracks: (a) evolved for a bad player with target progress 1.1, (b) evolved for a good player with target fitness 1.5, (c) evolved for a good player with target progress 1.5 using only progress fitness.
7 Conclusions
We have shown that we can evolve tracks that, for a given controller, will yield a predefined progress for the car in a given time, while maximizing the maximum speed.
Footnote 1: The target progress is set between 50 and 75 percent of the progress achievable by the specific controller in a straight path. As a comparison, in Formula 1 races this ratio (calculated as the ratio between average speed and top speed) is about 70 percent, and for the latest Need for Speed game it is between 50 and 60 percent.
Video: evolved TORCS drivers
Video: real car control
More on these topics
• http://julian.togelius.com
• e.g. Togelius, Lucas and De Nardi: “Computational Intelligence in Racing Games”
• Togelius, Gomez and Schmidhuber: “Learning what to ignore” on Friday, 11.10, room 606
• Car Racing Competition on Tuesday 15.00, room 402