Genetic algorithms vs. genetic programming - PLT perspective?

Given definitions of genetic algorithm and genetic programming, some PLT-inspired observations:

  1. As interpreters are a special case of programs, genetic programming is a special case of genetic algorithms. If this is correct, then some statements in the referenced articles need at least rewording (e.g., about computational complexity).
  2. It is (probably) interesting to investigate what the requirements are for a PL to serve as the object language in genetic programming (e.g., the set of (more or less) valid programs being closed under mutation).
  3. Related to the previous one - typing issues.
  4. DSL perspective
  5. Reflection, staged computation, etc. (bring in the slider).

Am I barking up the fallen tree? Any comments are appreciated.

Convergence?

I think it's hard to talk about GP seriously without some sort of rigorous demonstration of convergence.

Yes

Given the brevity of your comments I think this is an appropriately brief response.

Now, more seriously, the first three at least are current issues in GA/GP work. As time (which is inversely proportional to speed) is the main issue in most work, fancy PL theory hasn't yet had much impact.

concatenative languages might

concatenative languages might be ok for genetic programming. as you imply, the trick is to make the language as robust to rearrangement as possible - since concatenative programs are "just a string of terms" they might work moderately well, but there's still a problem with the types on the stack. can tcl (where everything is a string) be used in a concatenative manner?
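to make that concrete, here's a toy sketch in python (the names and the language itself are mine, not any real system): a postfix language where any string of tokens is a runnable program, because the interpreter simply skips operators when the stack underflows.

    import random

    # a toy postfix language: a program is just a list of tokens.
    # any rearrangement of tokens is still a runnable program because
    # the interpreter tolerates stack underflow and leftover values.
    OPS = {
        "add": lambda a, b: a + b,
        "mul": lambda a, b: a * b,
    }

    def run(program):
        stack = []
        for token in program:
            if token in OPS:
                if len(stack) >= 2:      # silently skip ops on underflow
                    b, a = stack.pop(), stack.pop()
                    stack.append(OPS[token](a, b))
            else:
                stack.append(token)      # literals are pushed as-is
        return stack                     # whatever is left is the result

    # crossover by splicing token strings never produces an
    # invalid program in this language.
    parent1 = [1, 2, "add", 3, "mul"]
    parent2 = [4, "mul", 5, "add"]
    cut1, cut2 = random.randrange(len(parent1)), random.randrange(len(parent2))
    child = parent1[:cut1] + parent2[cut2:]
    print(run(child))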

anyway, some practical results, in case they're useful:

i've twice had some success (i.e. the programs gave interesting results) with genetic programming. in both cases i used very restrictive, domain-specific languages that described directed graphs. the nodes processed values of a single type.

in broad terms: rearranging the program order changed the arrangement of arcs; changing a "line" of the program changed the pattern of arcs and/or the processing at a node.

in one case the data were binary 0 or 1 with implicit timing information - the nodes were similar to "logic gates" and there was a clock signal that triggered the carry-forwards from one node to another. so with cyclic graphs feedback occurred. the aim was to generate rhythmic patterns.

in the other case the data were coordinates on a sphere. these could be added, etc., while remaining valid (a group closed under the various operations). initially there was a mapping from 2D rectangular coordinates and finally there was a mapping to colour. so this generated 2D coloured images.

(the other, equally difficult, half of the problem is finding a way to quantify the quality of the competing results.)

remember that a node may have

remember that a node may have 0 to many incoming arcs (in my approach). so one solution is to have a node contain both an "initial value" and a "binary function", and then fold the function over the input values. if the values and operation form a closed group then everything is guaranteed to work.
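in python the idea looks something like this (the class and names are a sketch of mine, not the original code):

    from functools import reduce
    import operator

    # a node holds an initial value plus a binary function; evaluating
    # folds the function over the incoming values, so zero incoming
    # arcs is fine (you just get the initial value). if the value set
    # is closed under the function, evaluation is total.
    class Node:
        def __init__(self, initial, fn):
            self.initial = initial
            self.fn = fn

        def evaluate(self, incoming):
            return reduce(self.fn, incoming, self.initial)

    n = Node(0, operator.add)
    print(n.evaluate([]))         # 0 - no incoming arcs
    print(n.evaluate([1, 2, 3]))  # 6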

i can't remember the exact details, but i think only my points-on-a-sphere code worked like that.

oh, and you can also rewrite

oh, and you can also rewrite the graph dynamically. for example, i associated nodes and arcs with points on a sphere. the arc is then "connected" to the nearest node. since the data values are also points on a sphere you can "jump to" a value.

(this gives you non-linearity, even if your operations at nodes are linear).

one final comment - the "rhyt

one final comment - the "rhythm" code, which generated a sequence of binary values - is "general" in that it can be used to find a solution to "any problem", providing you find the appropriate mapping from the binary output to the space of interesting representations (eg for text, interpreting it as a sequence of 7 bit ascii values).

saves you worrying so much about the programming - the problem is just one of encoding.

also, of course, the output is cyclic, with a maximum period depending on the size of the graph (number of arcs, since they store state). which is only to be expected.

Clarification

I agree that my initial post is overly terse and cryptic (though not expressive :-) ).

I posted because I was intrigued by this sentence from the definition of GA:

GP algorithms typically require running time that is orders of magnitude greater than that for genetic algorithms, but they may be suitable for problems that are intractable with genetic algorithms.

My understanding is that GP is a special case of GA, so both statements of the quote (computational complexity and intractability) come as a surprise. This led me to question my understanding of GA and GP, and as I had several spare minutes between discussions on MDA, I sketched a few PLT-related issues I imagined on the spot and logged them on LtU so as not to forget them. While my only question was whether every GP is also a GA, it is quite off-topic for LtU, so instead I hope to hear some personal experience stories (like Andrew's, thanks!) on the overlap between GA/GP/evolutionary thingies and PLT. I don't have any specific expectations; it just seems like a good topic for anecdotes.

ok, here's my take:

ok, here's my take:

you can divide the inner loop of genetic algorithms into two steps: generating a child; evaluating the child. in the simplest cases, evaluating a child of length n takes O(n). but when the child is a program with "complexity" n, then evaluation is likely much more expensive.
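as a python sketch (all the names here are made up), the loop is just:

    import random

    def step(population, mutate, evaluate):
        # one inner-loop step: generate a child, then evaluate it.
        # for a plain ga, evaluate is O(n) in the genome length; for
        # gp, evaluate means *running* the child as a program, which
        # can cost far more. the loop itself is identical either way.
        parent = random.choice(population)
        child = mutate(parent)
        # replace the worst individual if the child beats it
        worst = min(range(len(population)),
                    key=lambda i: evaluate(population[i]))
        if evaluate(child) > evaluate(population[worst]):
            population[worst] = child
        return population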

so typically gp is orders of magnitude more expensive than the most brain-dead trivial ga problems because the inner evaluation loop takes longer. but it's a continuous spectrum.

also the ga/gp distinction depends, to some extent, on an artificial division between "evaluating a program" and "measuring the fitness of the program's output". you can take even the simplest ga output as not a result that needs to be measured, but as a program whose execution gives you the fitness. so everything is gp.

so, to take my rhythm generating example, you could imagine three different approaches. at the simplest (ga) extreme, the state is a binary sequence that represents the rhythm directly (bang drum on "1") and is treated as "dna". at the most complex extreme (gp) it's a program that generates a sequence of values. but in between you might choose a more complex representation like the fourier transform of the final rhythm. that can be seen as ga or as gp.
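a sketch of those three decoders (python with numpy; the thresholds and the toy "circuit" are inventions of mine, not the original code):

    import numpy as np

    def decode_direct(genome):
        # ga extreme: the genome *is* the rhythm (bang drum on 1)
        return [b % 2 for b in genome]

    def decode_fourier(genome, length=16):
        # middle ground: the genome holds fourier coefficients;
        # threshold the reconstructed signal to get a binary rhythm
        signal = np.fft.irfft(np.asarray(genome, dtype=float), n=length)
        return [int(x > 0) for x in signal]

    def decode_program(genome, length=16):
        # gp extreme: the genome is a program - here a toy circuit of
        # xor "gates" over a shifting state, run for `length` ticks
        state = list(genome)
        out = []
        for _ in range(length):
            out.append(state[0] % 2)
            state = state[1:] + [(state[0] ^ state[-1]) % 2]
        return out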

the choice depends on the complexity you are looking for - in a sense, the degree of long-range correlation you want in your output. a ga is going to have a very hard time generating a regular rhythm. a fourier transform might tend to be too regular. with luck a program that models a digital circuit with components that tend to build "oscillators" might get you something that sounds good....

hmm. i guess that's obvious! (this is your slider?)

GP as a stochastic language

I do not think that GP is a special case of GA. GA requires a fixed genotype -> phenotype encoding where the phenotypes are objects in the solution space, e.g. function calls with certain argument tuples. Once you have fixed this mapping, the algorithm acts on a discrete and finite solution space.

In the GP setting you manipulate the phenotypes directly, mostly at the level of functional expressions. GP is a stochastic method for creating terms in a formal language over some alphabet, e.g. {1,x,y,sin,exp,log}, with compositional operators, e.g. {add,mul,compose}. The set of terms that GP can create is unbounded. That is the main reason why GP requires more runtime resources: the terms can grow arbitrarily large, and they usually do. There is no intrinsic criterion to cut the solution space down or partition it into regions.
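A sketch of such term creation (Python; the probabilities and the hand-imposed depth cutoff are my own additions - which is rather the point, since without them the terms are unbounded):

    import random

    # Stochastic term generation over the alphabet above ("compose"
    # omitted for brevity). Terms are nested tuples; nothing bounds
    # their size unless a depth cutoff is imposed by hand.
    LEAVES = ["1", "x", "y"]
    UNARY = ["sin", "exp", "log"]
    BINARY = ["add", "mul"]

    def grow(depth):
        if depth == 0 or random.random() < 0.3:
            return random.choice(LEAVES)
        if random.random() < 0.5:
            return (random.choice(UNARY), grow(depth - 1))
        return (random.choice(BINARY), grow(depth - 1), grow(depth - 1))

    print(grow(4))  # e.g. ('mul', ('sin', 'x'), '1')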

Kay

You're correct (if I am)

My understanding is that GP is a special case of GA, so both statements of the quote (computational complexity and intractability) come as a surprise.

I believe your understanding is correct, and the distinction made between the two is due to historical reasons. My colleague has written a paper addressing this issue.

WHILE := FOR

From the paper:

Usually a maximum depth or size of tree is imposed in GP to avoid memory overflow (caused by bloat).

That's a funny argument. It's like asserting that FOR and WHILE are equivalent because, with limited computer resources, you can simulate each WHILE with a FOR, a sufficiently large loop bound, and a BREAK. Or that integrals are just finite sums because one cannot calculate infinitely many terms in practice. Usually mathematicians look for somewhat more profound reasons before they equate things, but boys who know how to type on a computer keyboard may have a different attitude.
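For the record, the simulation in question might look like this (a Python sketch; the bound is arbitrary, which is rather the point):

    # Simulating WHILE with a bounded FOR plus BREAK: it "works" only
    # if the bound happens to be large enough - which is exactly why
    # the equivalence is not very profound.
    BOUND = 10**6

    def while_as_for(condition, body, state):
        for _ in range(BOUND):
            if not condition(state):
                break
            state = body(state)
        return state

    # A Collatz-style loop; terminates well within the bound for 27.
    print(while_as_for(lambda n: n != 1,
                       lambda n: n // 2 if n % 2 == 0 else 3 * n + 1,
                       27))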

Kay

Garbage in - garbage out

Usually mathematicians look for somewhat more profound reasons before they equate things, but boys who know how to type on a computer keyboard may have a different attitude.

The validity of any statement about whether GA and GP are the same depends on their definitions. As there is no formal, widely accepted definition of either GA or GP, I would not expect a formal proof of their relationship. Meanwhile, I agree: if you define GA to operate on a finite domain and GP on an infinite (though countable) one, then they are different by definition. I suspect the author of the paper suggested coming to some other definitions.

To bring this more or less back to PLT topics: it looks like the Evolutionary Computation (EC) community might benefit from cooperation with the PLT community - denotational semantics alone could bring at least a unification of vocabulary. Acquiring a taste for category theory may be a useful side effect as well. Applications of metaprogramming ideas look promising.

On the other hand, I believe EC people have a lot of great ideas just waiting to be generalized in the context of PLT.

my (amateur, non-biologist) understanding

my (amateur, non-biologist) understanding is that the solutions that evolution "arrives at" are often rather ad hoc and, from a mathematical point of view, quite inelegant. so while arbitrary limits on recursion depth (or the need for a stack at all) might disturb a mathematician of delicate sensibilities, that doesn't mean it won't work.