Evolutionary Algorithms have been used with much success for the automatic generation of programs. In particular, Koza's [Koza 92] Genetic Programming has enjoyed considerable popularity and widespread use. Koza's method originally employed Lisp as its target language, and others still generate Lisp code. However, most experimenters generate a homegrown language, peculiar to their particular problem.
Many other approaches to automatic program generation using Evolutionary Algorithms have also used Lisp as their target language. Lisp enjoys much popularity for a number of reasons, not least of which is the property of Lisp of not having a distinction between programs and data. Hence, the structures being evolved can directly be evaluated. Furthermore, with reasonable care, it is possible to design a system such that Lisp programs may be safely crossed over with each other and still remain syntactically correct.
Evolutionary Algorithms have been used to generate other languages, by using a grammar to describe the target language. Researchers such as Whigham [Whigham 95] and Wong and Leung's LOGENPRO system [Wong 95] used context free languages in conjuntion with GP to evolve code. Both systems exploited GP's use of trees to manipulate parse trees, but LOGENPRO did not explicitly maintain parse trees in the population, and so suffered from some ambiguity when trying to generate a tree from a program. Whigham's work did not suffer from this, and had the added advantage of allowing an implementor to bias [Whigham 96] the search towards parts of the grammar.
Another attempt was that of Horner [Horner 96] who introduced a system called Genetic Programming Kernel (GPK), which, similar to standard GP, employs trees to code genes. Each tree is a derivation tree made up from the BNF definition. However, GPK has been criticised [Paterson 97] for the difficulty associated with generating the first generation - considerable effort must be put into ensuring that all the trees represent valid sequences, and that none grow without bounds. GPK has not received widespread usage.
An approach which generates C programs directly, was described by Paterson [Paterson 97]. This method was applied to the area of evolving caching algorithms in C with some success, but with a number of problems, most notably the problem of the chromosomes containing vast amounts of introns. This method will be more fully discussed in section 3.
We describe a different approach to using BNF definitions, and develop a system that evolves individuals containing no introns. This system can be used to evolve programs in any language. We adopt the approach that the genotype must be mapped to the phenotype [Keller 96] [Ryan 97a] [Ryan 97b] [Gruau 94] rather than treating the actual executable code as data. In this respect we diverge from Koza's approach, but with the result that the individuals tend to be much smaller.