Back

Results and Conclusions:

     We apologize, the mentioned graphs and figures are not yet available on our webpage.

     We ran the program four times, each run having different settings. In the first run, both pheromones and task specialization were available to the ghosts. They could communicate with each other and run differing sets of code within the team. In the next, run we changed the parameters so the ghosts couldn’t use pheromones this made communication impossible. We did the third run with pheromones available, but task specialization was turned off. The ghosts could communicate but they all ran the same program within the team. In the fourth run, the ghosts were not able to use pheromones or task specialization. To ensure statistical accuracy, we repeated each of the four different runs several times. We maintained the same parameters such as maze size, number of rounds per generation, and number of Pacmen in the maze. Since the code for the ghost teams is randomly generated at the beginning of a run, there was moderate variation in the results. However, the new sets of data for the repeated tests were consistent with our original findings in the first series of runs.

     The results of our test runs are shown in Figure 1. The graph of fitness vs. number of generations for the four parameter variations shows three main trends. The first trend is a general increase in fitness as the number of generations increases. That is, as time goes on, the ghosts are able to catch more Pacmen than the previous generation. That trend demonstrates that our evolution method is effective.

     The second trend is that the rate of evolution decreases over time. The differences in fitness between consecutive later generations are much smaller than the differences between consecutive earlier generations. Because of this, the graphs flatten out at a particular fitness, perhaps indicating that the ghosts have evolved to near perfection or that a fitness of 10 (the maximum achievable fitness) is unattainable given the particular set of parameters of the program.

     The third trend involves the demonstration of punctuated equilibrium. Punctuated equilibrium is the theory that evolution does not occur as a result of gradual changes, but instead it occurs through large changes between periods of little change. Early in the graphs, the fitnesses rise extremely quickly. The curve is much flatter both before and after these large changes. This differs from our original prediction that the change would be a more gradual curve.

     Looking beyond the general trends on the fitness versus number of generations graph, we see marked differences between the different runs. The most obvious difference is the highest fitness that the respective routines obtain. The highest fitness occurred in the run where the ghosts are able to use pheromones and task specialization. When we eliminated the ghosts’ ability to communicate by turning pheromones off, the ghosts were not able to catch as many Pacmen as those who could communicate. The same thing happened when their ability to evolve task specialization is taken away.

     What is fascinating, though, is that when both pheromones and task specialization are taken away, the ghosts evolve to be better than the two runs where the ghosts lacked either pheromones or task specialization. This indicates that if the ghosts use either pheromones or task specialization, they require the other factor to function well.

     Another difference between the four graphs is the rate of evolution. Figure 2 shows an expanded view of generations 1 through 30. All of the runs with one or both factors (pheromones or task specialization) turned off rise earlier and faster than the run with both factors on. This is useful information because while these three routines do not achieve as high a fitness as the routine with both factors on, they achieve their maximum fitness much faster. In general, the more simple a routine is, the faster it can evolve since there are fewer variables to accommodate in each generation. It might be useful to turn off certain factors in the program so that a better routine will evolve faster than one with all the factors turned on.

     Figure 3 shows the how the performance of the ghost teams improves as the program runs. The figure shows the fitness of each of the 400 teams in the generations 1, 50, and 250 ranked in ascending order. The distribution from generation 1, where the routines were essentially random, has uniformly low values. By generation 50, the overall fitness has risen considerably and the distribution is no longer as flat. However, there are quite a few ghost teams that are still not performing well relative to the average. The distribution of the final generation indicates a much higher overall fitness than generation 50, and there are many fewer weak teams compared with the average.

     The graph of ghost program length versus number of generations (Figure 4) shows some interesting features of genetic programming. Genetic programming is notorious for producing "trash code," or code that serves no purpose (such as an if statement that always evaluates to false, leaving the second argument to always occur). The program lengths were taken from the run with pheromones and task specialization available to the ghosts. The graph fluctuates considerably but maintains a general upward trend that flattens toward the end. The programs never approach the maximum program length of 800 commands (which we dictated) indicating that we allotted enough space for the amount of time we ran it. When compared to the graph of the fitnesses vs. generation (Figure 1), the general trend shows that the program lengths tend to get longer as the fitness increases. However, the huge fluctuations show that shorter routines are able to perform as well or better then long ones. Although longer programs aren’t considered less efficient in terms of our fitness scores, they do take longer for a processor to run. As programmers we would like our code to be not only efficient in terms of fitness, but in terms of processor usage as well. In the future we might use program length as a factor in fitness.

     We also noticed several trends in the evolved teams of ghosts. First of all, they heavily favored relative directions (Forward, Backward, Left, Right) over absolute direction (North, South, East, West). In non-specialized ghosts, this difference was even more pronounced. We expected this because when task specialization is on the teams could theoretically evolve a strategy such as "send two ghosts to the top of the maze and two to the bottom." The ghosts in Appendix D employed a similar strategy. They sent one of their team members to the lower right hand corner and simply left him there to scare the Pacmen to the upper left-hand corner.

     Evolved ghosts of both types made heavy use of "if" conditional statements. While only one in nine random ghosts contains an if statement, almost all evolved ghosts do. Without if statements, the ghosts could only have one possible behavior. Especially in runs with task specialization disabled, most teams will require multiple behaviors from their ghosts (the team in the appendix does have one ghosts without an if statement).

     The evolved ghosts often derived information they needed through unconventional methods. For example, a ghost would occasionally form a statement like: "If squares to dot in direction forward is greater than 56, then..." That particular example was evolved on a 20 by 20 maze. So how could that if’s condition ever be satisfied? The answer lies in an implementation issue that the ghosts learned how to exploit. Functions such as squares-to-dot would simply return a huge value, the number of squares in the entire maze, if there was no dot in the specified direction. Thus the above program really asks the questions "are there any dots at all in front of me?" In a way that is just as effective as the way a human programmer would do it, by using the function "number of dots in dir."

Next