archives

A language for blind uncomprehending idiots who have no idea how programs work.

I'm at present working on a thing which isn't really a programming language, because it's to be used by a thing which isn't really a programmer.

This is part of a machine learning project. Each 'program' is a genome which is operated on by a genetic algorithm. The genetic algorithm of course is the aforementioned blind uncomprehending idiot that has no idea how programs work. It combines and mutates genomes at random; the fact that these genomes code for programs is irrelevant to it.

The PL challenge is to build a programming language (genome encoding) that has the evolvable properties that will enable a GA to make progress. Ordinary programming languages (Koza's work with LISP included) are extraordinarily poorly suited to evolutionary methods.

What is required for evolvability:

First, small changes in the program (usually) make small changes in semantic behavior;

Second, combining elements of different programs results in a new one that's reasonably likely to have semantics similar to both (or all) parents, although there must be a chance that it does something completely different and unexpected instead.

Third, the genetic encoding must support the production of programs that are arbitrarily short or long, arbitrarily simple or complex, and must be able to get there mostly in 'baby steps.'

Fourth, there must be significant retention and potential recombination of structure even when that structure does not semantically affect the current program. (non-coding or 'junk' DNA, alias 'introns', is a reservoir of potentials that may be invoked in the case of mutations and usually code for something that's useful when that particular mutation happens.

Fifth, absolutely random genomes must make a program that has a definite semantics - any runtime or compile error is instantly lethal to that genome.

Sixth, tree representations with branch-swap combination operations are largely considered to be a failed experiment at this point, because they sort of failed many of the evolvability criteria given above. Most combinations did not express anything close to the semantics of any parent, and mutations weren't usually 'small' changes as measured in semantics. There wasn't much of a 'smooth gradient' that genetic algorithms could climb. So, it's time to build something else.

Our blind uncomprehending idiot is attempting to debug the code. It has lots of patience and lots of energy, but absolutely no clue how the genomes are related to semantics; it just has a 'combination' operator and a 'mutation' operator and gets fed 'solution effectiveness' information that it uses to determine which genomes to favor or disfavor.

Now, what kind of programming language should the blind uncomprehending idiot use? Remember that readability, syntactic cleanliness, intuitive design, etc, score absolutely zero here. The programmer can't read, has no idea what syntax is, and has no 'intuition' as such. Its efforts to debug mostly amount to slicing programs at random places and in random directions and sticking them together with bits it sliced from other genomes.

I've managed to build a 'language' that seems to have good evolvable properties for GAs and specifies neural networks. It is, however, horrifying to human sensibilities.

The genomes are metaprograms; A path through them is traversed (each state contains a 'next state' pointer) and any 'output bits' stored at states along the path are output in sequence and become part of the input to the neural network builder. So the genome works something like a funge, and the output of running the funge is a linear program.

The 'structure' results from the path taken through the genome funge. The path is specified by the "next operation" addresses in each node. If part of a path is particularly important then 'introns' can arise to protect it from minor mutations. Such introns would take the form of next-instruction pointers in nodes near the path node, pointing back onto the path. Mutations to instruction pointers usually change the destinations by a small distance in funge-space so the idea of introns in nodes that are 'near' the path makes sense.

I've been pretty careful with the format of the bit strings accepted by the neural network builder, and no catenation of output messages results in an outright error. It might result in a completely useless network, but never an error. And Artificial neural networks are fairly forgiving in terms of retaining most of their functionality when a few nodes here and there are changed or missing or when a few new ones appear, so this
seems more "reasonable" than trying to evolve Lisp programs using subexression exchange.

I think it would be possible to use the same funge-ish mechanism to create output strings that are read as other kinds of program as well, with a different kind of 'interpreter' than the one that builds neural networks. But my confidence is less, because most other kinds of 'program' are far less forgiving of minor structural changes. If you were doing any semantics other than constructing neural networks, how would you do it?