Evolved Turing neural networks - Unorganized machines and the brain

(Today is Alan Turing's 100th birthday. I was delighted to finally stumble upon this piece, which I had missed thus far and find utterly interesting.)

Evolved Turing neural networks

by Craig Webster and William Fleming

In a report written in 1948 Alan Turing proposed the use of a genetic algorithm (GA) to “train” a particular type of neural network he called a B-type. Despite the apparent advantages of these networks Turing’s proposal has remained undeveloped until now [...]

Evolved design

Evolution is not a rational process and often yields solutions which are unexpected and quite different from those designed in a minimal, rational manner. As a result, evolved solutions are often very difficult to understand. For the same reason it is usually a mistake to second-guess, or otherwise constrain, the “solution space” beforehand when tackling complex problems with GAs. [...]

Evolution selects only for the function of the overall network, not for the tidy compartmentalisation of function into individual components. Neural networks of all kinds and the products of GA design have the tendency to quickly become algorithmically opaque for this reason. The advantage of GAs, of course, is that evolution often works when no rational or minimal solution is known, or where finding an optimal solution by other methods would take an unreasonably long time (for example, the Travelling Salesman Problem, large instances of which could take longer than the lifetime of the universe to solve exactly).

Unorganized machines and the brain

In a fascinating and farsighted report written in 1948 Alan Turing suggested that the infant human cortex was what he called an unorganized machine[...]

Turing defined the class of unorganized machines as largely random in their initial construction, but capable of being trained to perform particular tasks. There is good reason to consider the cortex unorganized in this sense:
there is insufficient storage capacity in the DNA which controls the construction of the central nervous system to exactly specify the position and connectivity of every neurone, and by not hard-wiring brain function before birth we are able to learn language and other socially important behaviors which carry great evolutionary advantage [...]

Link to the November 1946 letter of Alan Turing to William Ross Ashby, advising him to experiment with these ideas by using the ACE.

Also :

Turing's B-type neural networks and Turing's Last Programs (on AlanTuring.net), and

The Alan Turing Home Page (by Andrew Hodges)


Against Genetic Algorithms

I have often heard (here and possibly elsewhere) that GAs are ineffective, and that other metaheuristics (such as hill climbing) consistently outperform them. However, I do not remember the references/sources for that (potentially controversial, given the number of people enamored with GAs) statement. Do you have references to articles making those comparisons, either against or in defense of genetic algorithms?

I, too, have some

I, too, have some difficulty finding the references you ask about. To me it seems that GAs, and especially Turing's evolutionary networks, simply haven't been given enough research effort compared to other disciplines for anyone to make a fair statement. I believe it's still mostly speculative, one way or the other, simply for lack of objective data. Compucology.net does give an example application and rationale in the context of chip design, though.

One way evolutionary algorithms can be designed to fail

If crossover acts as a more destructive form of mutation (as I've seen in some applications of genetic programming), then hillclimbing should be just as effective and a good deal more efficient. For crossover to be useful, traits from each parent that affect fitness have to be inherited intact. You can't just toss both parents in a blender and hope for the best.

However, I cannot believe that hillclimbing consistently outperforms genetic algorithms, because the space of potential problems is so varied that there must be some at which hillclimbing is less efficient.
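For reference, the hillclimbing baseline being discussed is only a few lines. Here is a minimal sketch of hill climbing with random restarts on bitstrings (function name and parameters are mine, purely illustrative), run on the trivial "onemax" landscape where it excels:

```python
import random

def hill_climb_with_restarts(fitness, n_bits, restarts=20, steps=1000, seed=0):
    """Random-restart hill climbing on bitstrings: flip one bit at a time,
    keeping the flip only if fitness does not decrease."""
    rng = random.Random(seed)
    best, best_fit = None, float("-inf")
    for _ in range(restarts):
        x = [rng.randint(0, 1) for _ in range(n_bits)]
        fx = fitness(x)
        for _ in range(steps):
            i = rng.randrange(n_bits)
            x[i] ^= 1                  # flip one bit
            fy = fitness(x)
            if fy >= fx:
                fx = fy                # accept the (non-worsening) move
            else:
                x[i] ^= 1              # revert the flip
        if fx > best_fit:
            best, best_fit = x[:], fx
    return best, best_fit

# "onemax" fitness: count of 1-bits; unimodal, so hill climbing finds
# the all-ones optimum easily.
best, score = hill_climb_with_restarts(sum, 32)
print(score)
```

Whether this simple loop beats a GA on a given problem is exactly the empirical question raised above.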

Thanks for the feedback

Thanks for the feedback.

I am not yet familiar enough with either side of the argument to make a useful judgement, even just for myself.

But what caught my attention, and what I found rather surprising (not to say more), is the claim in the first article I linked to, stating "[...]these networks [of] Turing’s proposal [have] remained undeveloped until now".

I mean, it is historically known that even though Turing introduced us to the concept of his machines (including the universal one) as early as his 1936 paper "On Computable Numbers, with an Application to the Entscheidungsproblem", it was not until the mid-40s (supposedly around 1945/1946), thus a decade later, when John von Neumann started to build upon Turing's model of computation, that Turing's work on his machines really took off among the worldwide research community.

And, from then on, the Turing machines of his 1936 paper have generated all the research offspring built upon them that we know of today, spanning more than six decades without interruption.

On the other hand, Turing's B-type machines, among the very last research Turing was busy with during his final six or so years (work he had started a couple of years after the war was won by the Allies(*), roughly at the same time von Neumann was really "jumping in" re: Turing machines), seemingly haven't attracted anywhere near the same interest thus far. Or not until the mid-90s, maybe(?)

That's what I found curious.

EDIT:
(*) and largely thanks to him!

I have used GAs and experienced this....

In the usual use case, Genetic Algorithms are less efficient than hill-climbing with restarts.

The usual use case is where you map a bitfield to some data structure that expresses potential solutions of a particular type in a simple way, and attempt to evolve that using your best guesses as to the parameters of the evolution.

In order to get better performance from Genetic Algorithms than is available via hill-climbing with restarts, you need either more understanding of the problem and its solution space (together with the ability to guess good parameters of evolution for it), or, ironically, less understanding of the problem and its solution space.

If you have more understanding, you will be able to "lay out" the genes of the Genetic Algorithm so that its combining operator (subtree swap, crossover, or whatever) is most likely to keep related values together, or swap things for other things that make some sense in each other's context. For example, if your GA is manipulating a tree of program code, you may have a "subtree swap" combiner operation that limits itself to subtrees returning the correct numeric types, in the same approximate scale. Or if your Genome is a simple vector of piece values to be used in a minmaxing chess player, you will want to put closely related values (such as the values of pawns under various circumstances) close together in the GA so that they are unlikely to be separated by the "crossover" operator.
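The point about gene layout can be made concrete: under one-point crossover, the chance that two gene positions are separated by the cut grows linearly with the distance between them, which is why related values belong close together. A small sketch (function names are mine, not from the comment):

```python
import random

def one_point_crossover(a, b, rng):
    """Standard one-point crossover: pick a cut point and swap the tails."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def separation_rate(i, j, length, trials=100_000, seed=1):
    """Empirical probability that genes at positions i and j (i < j) end
    up inherited from different parents after one random cut."""
    rng = random.Random(seed)
    sep = 0
    for _ in range(trials):
        cut = rng.randrange(1, length)
        if i < cut <= j:        # the cut falls between the two genes
            sep += 1
    return sep / trials

# Adjacent genes are split far less often than distant ones; the exact
# rate is |j - i| / (length - 1).
print(separation_rate(0, 1, 32))   # close to 1/31
print(separation_rate(0, 31, 32))  # 1.0: any cut separates the endpoints
```

So placing the pawn-value genes of the chess example next to each other really does make them far less likely to be torn apart by crossover.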

But that's only half the battle. As to the rest of it, you have the parameters of evolution to decide, and at this stage of understanding, deciding those parameters is truly a black art. You have to determine your population size, your rate of replacement, your degree of elitism or "evolutionary pressure", your rate of mutation, and the basic mechanics relating one generation to the next.

Each time a generation Y succeeds a generation X, Y contains new members which are the offspring of members of X, and these new members replace some members of X which are not members of Y. The fraction of the population size that gets replaced is your rate of replacement. You have to decide how often a "mutation" or random change occurs in the new population. That is the mutation rate. You also have to decide how much of a "survival edge" and/or "breeding advantage" is granted to members by higher scores on the fitness function. That is the elitism or evolutionary pressure. And finally you have to decide how new population members are produced from old ones; that is the basic mechanics.

If your "elitism" is total, meaning, only the very most fit individual ever reproduces, then you rapidly reduce the population to a series of copies of this one individual and no evolution takes place, save by hill-climbing on your mutation function. If your "elitism" is zero, meaning fitness score is not relevant to deciding which individual/s have offspring, then you get random, undirected variation, not evolution. The question is at what rate to eliminate the "inferior" individuals. These individuals may carry rare genes that would, in some combination you haven't seen yet, be better than your most-fit individuals. If you eliminate them the first time they show up, or never allow "inferior" individuals to create offspring, you never get to see those combinations and your evolution goes nowhere. But if you don't bias the breeding/survival at all in favor of fitness, that's another way to get nowhere.

The problem is that optimal values for these parameters - population size, replacement rate, mutation rate, and elitism - vary from problem to problem and genome layout to genome layout; it is often claimed to be better to vary these parameters even at different generations in the same run. It's very very hard to predict what the best values for them are. In order to get performance better than hill climbing with restarts, you have to guess good values *and* have basic mechanics of reproduction that are likely to preserve related values together.
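The four knobs described above (population size, replacement rate, mutation rate, and selection pressure) can be seen together in a minimal generational GA. This is a sketch under my own assumptions, not the commenter's code; the function name, tournament selection, and all default values are illustrative choices:

```python
import random

def evolve(fitness, n_bits,
           pop_size=60,          # population size
           replacement=0.5,      # fraction of the population replaced per generation
           mutation_rate=0.02,   # per-bit flip probability in offspring
           pressure=2,           # tournament size: the "elitism" knob
           generations=200, seed=0):
    """Minimal generational GA: tournament selection, one-point crossover,
    per-bit mutation; offspring replace the least-fit individuals."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def pick():
        # Tournament selection: larger tournaments mean stronger pressure;
        # pressure=1 degenerates to random, undirected variation.
        return max(rng.sample(pop, pressure), key=fitness)

    n_new = int(pop_size * replacement)
    for _ in range(generations):
        children = []
        while len(children) < n_new:
            a, b = pick(), pick()
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]                       # one-point crossover
            child = [bit ^ (rng.random() < mutation_rate)   # per-bit mutation
                     for bit in child]
            children.append(child)
        pop.sort(key=fitness)       # replace the worst n_new individuals
        pop[:n_new] = children
    return max(pop, key=fitness)

best = evolve(sum, 32)   # again the trivial onemax landscape
print(sum(best))
```

Every hard-coded number here is exactly one of the black-art decisions the comment describes: change any of them and the run can behave very differently.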

You can also get better results with a GA when you *don't* have any idea what the solution you're after looks like. Genetic Algorithms with heavily "meta" genomes and ridiculously vague fitness functions can produce novel solutions you'd never have imagined were part of the solution space at all, which is something that I haven't ever seen demonstrated via hill-climbing.

For example "life on Earth" can be considered as a set of genetic-algorithm solutions to some ridiculously vague fitness function about converting matter into copies of other matter, in the presence of some local energy sources and gravitation. But it wasn't necessary to specify the *process* by which matter was to be converted into copies of other matter, or even the mechanics of reproduction. The fitness function doesn't know anything about DNA or the layout of genomes at all, let alone sex; sex is just one of the wonderful things, like vision, intelligence, and a sense of smell, that was "invented" along the way as an enabler for a certain set of solution classes to be expressed.

But this latter use of GA? Where you truly *don't* understand the solution space well enough to write a hill-climber? It isn't fast. Not by a long shot.

Ray

Configuring the Genome

you will be able to "lay out" the genes of the Genetic Algorithm so that its combining operator (subtree swap, crossover, or whatever) is most likely to keep related values together

It seems that permutation of genome structure should also be a common mutation, i.e. the GA exploration should itself eventually determine which traits should be kept together.

Perhaps you were looking for

Perhaps you were looking for this? The paper I mentioned there is When Will a Genetic Algorithm Outperform Hill Climbing?

Yes, I think it is the post

Yes, I think it is the post I was remembering, thank you.

I have looked at this paper a bit more deeply, and tried to find more resources on the topic (rigorous comparisons of GAs against other metaheuristics; there are a lot of surveys on GAs alone, but that's not what I want), but I couldn't find anything very conclusive that seemed to represent a consensus. Here are the most relevant links I've got:

1. A rather triumphant reply to this paper, by Prügel-Bennett in 2004: When a Genetic Algorithm Outperforms Hill-Climbing (PS)

A toy optimisation problem is introduced which consists of a fitness gradient broken up by a series of hurdles. The performance of a hill-climber and a stochastic hill-climber are computed. These are compared with the empirically observed performance of a genetic algorithm (GA) with and without crossover. The hill-climber with a sufficiently large neighbourhood outperforms the stochastic hill-climber, but is outperformed by a GA both with and without crossover. The GA with crossover substantially outperforms all the other heuristics considered here. The relevance of this result to real world problems is discussed.

2. A theoretical discussion of the importance of crossover in a problem related to the one discussed in the article you cited: Jansen and Wegener's 2004 Real royal road functions—where crossover provably is essential

Mutation and crossover are the main search operators of different variants of evolutionary algorithms. Despite the many discussions on the importance of crossover nobody has proved rigorously for some explicitly defined fitness functions fn : {0, 1}^n -> R that a genetic algorithm with crossover (but without idealization) can optimize fn in expected polynomial time while all evolution strategies based only on mutation (and selection) need expected exponential time. Here such functions and proofs are presented. For some functions one-point crossover is appropriate while for others uniform crossover is the right choice.

3. A rather recent StackExchange discussion: Why has research on genetic algorithms slowed?. Provides some content, but I personally found it inconclusive.

While discussing some intro level topics today, including the use of genetic algorithms; I was told that research has really slowed in this field. The reason given was that most people are focusing on machine learning and data mining.
Update: Is this accurate? And if so, what advantages does ML/DM have when compared with GA?

Well (to keep a taste of controversy), I'm happy to see that type systems are not the only research field so intrinsically fascinating to practitioners that they have, maybe, not bothered enough with establishing empirical proof of relevance for skeptical outsiders.

In the first paper, by Adam Prügel-Bennett, there is a mention on page 15 of a hybrid combination of a GA and a "powerful mutation operator (such as Tabu search or a fast hill-climber)" that "has outperformed all other heuristics on a set of standard problems which has been competitively tested by a large number of authors", on the problem of graph colouring. Are such techniques used in register allocation nowadays?

Such techniques are not used

Such techniques are not used in register allocation, as far as I know.

Type systems are in a much better state than GAs ;)

I have not found a comparison of GAs with hill climbing on non-toy problems. What has happened in the GA literature is that first there was a lot of hype about them, which caused many people to try them. Lo and behold, they worked somewhat, like any algorithm that somehow selects for higher fitness. Then after a while GAs became a subject matter in and of themselves, which caused three types of research papers to appear: (1) papers that theoretically "prove" that GAs work on certain classes of problems; (2) papers that started looking for toy problems where GAs outperform other algorithms; (3) papers that apply GAs to slightly less toy problems but didn't compare them with any other algorithm. But as far as I know, out of the probably thousands of papers on GAs, there has not been a single paper that demonstrates that GAs outperform hill climbing on a non-toy problem (or at least I was not able to find one; if somebody here has found one, that would be great).

Note that the paper I linked to is from some of the most famous people in genetic algorithms, including Holland himself. The paper predictably concludes as if it's a victory for genetic algorithms. The contents of the paper don't support that at all:

There is a class of optimization problems where GAs were thought to outperform hill climbing. The problem is basically the following. The space to be searched is the space of bit strings of length k*n. The fitness of a bitstring is as follows: cut up the string in k pieces of length n, and count the number of pieces that have all bits set to 1. That number is the fitness. So the optimum is the bit string of all ones. This problem has been widely used as an example where GAs work well. However then somebody actually did the experiment, and found out that hill climbing worked >10x better.

So the paper sets out to find another toy problem where GAs do work better. After several tries they come up with a problem, though I'm not exactly sure what the problem is because it's described in extremely vague terms. Nowhere in the paper do they actually describe the problem clearly.

Anyway, after extremely careful tuning of the parameters of the GA, and no tuning of the parameters of the hill climber, the GA outperforms the hill climber by a factor of 2. For example for the hill climber they use a very low mutation rate, which leads to a very slow algorithm. Because the paper does not describe the problem accurately I was not able to do an independent tuning of the hill climber, but I'm pretty sure that if you double the mutation rate then the algorithm will work almost 2x faster.

Given that there is so much trouble to come up with a toy problem where GAs actually work, the conclusion of the paper should have been that GAs don't work. Even if there is a class of problems where GAs do outperform hill climbing, the question is whether that class would ever come up in the real world.
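The fitness function described here is the classic "Royal Road" function, and it is easy to state in code. A sketch (the block length and names are my choices for illustration):

```python
def royal_road(bits, n=8):
    """Royal Road fitness: cut the bitstring into consecutive blocks of
    length n and count the blocks whose bits are all ones."""
    return sum(all(bits[i:i + n]) for i in range(0, len(bits), n))

# A k*n-bit string has optimum fitness k (every block complete).
# Here k=4, n=8: three complete blocks of ones, one block of zeros.
s = [1] * 24 + [0] * 8
print(royal_road(s))  # 3
```

Note the deceptive flatness: 7 ones in a block score exactly the same as 0 ones, which is what was supposed to reward crossover (assembling complete blocks from different parents) over bit-flipping.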

Contrast this with machine learning, where instead of trying to find toy problems to fit your favorite algorithm, they try to find algorithms to fit a problem. People always compare their algorithms not just against a low baseline, but against the state of the art. For example for the MNIST handwriting recognition problem (http://yann.lecun.com/exdb/mnist/) there is a long history of algorithms that improved upon previous algorithms. Note also that this is a real world problem. If GAs are really as good as they are hyped to be, then why is the field not in the same state as machine learning?

Anyway, maybe this is far too harsh on GAs; I'd love to be proven wrong. But at the very least, try a proven and non-hyped algorithm like hill climbing or beam search as well, if you have to solve an unstructured optimization problem.

Darwintunes.org

GAs fill a small niche in AI. I myself have been underwhelmed by their performance as students performed various experiments with them. They have an appeal, certainly to students, since the concepts are remarkably simple.

Personally, I believe they are better suited to cases where the domain model is close to nature: tree-shaping algorithms, say, or music. The BBC aired an experiment done on darwintunes.org, where you can actively participate in modelling a piece of music.

Regarding Turing's comment: I think that, explained with current-day knowledge, his remarks suggest he expected the brain to be like a neural computer, not a GA. Personally, I believe that neural computing is where the next breakthrough in IT will be, though they have problems constructing one. Which I don't really understand; a fruit fly has about 100,000 neurons and can do all kinds of things 'out of the box' which are nearly impossible for von Neumann-style computers. Certainly they can get a measly 100,000 neurons on a chip by now, or can't they? No idea, really...

Related

reddit's askscience is a great place to lurk; e.g.,

Also, a couple of recent events on the usefulness of neural networks:

There are some really cool things going on in this area today (this year even), but we still have a long way to go before we can even simulate an E. coli bacterium or use the neural networks that actually work more widely.

Also relevant: Intel reveals

Also relevant: Intel reveals neuromorphic chip design. Definitely mind-contorting to think of computation this way.

Simulating just one neuron

Simulating just one neuron accurately is impossible with current technology. One neuron still contains too many atoms, and even a couple of atoms are impossible to simulate because quantum mechanics is hard to simulate (this is also the reason why quantum computers could possibly be huge: you can do so much computation with so little). Certainly a whole fruit fly is an incredibly complex beast.

Blue brain

Is that not one of the goals of the Blue Brain project? I phrase that as a question because I'm not sure where they are up to at this point in time. The original goal was indeed a molecular level simulation of a single neuron that was physically plausible.

Simulation of quantum physics is easy: it's just doing it accurately for small-scale systems that is hard. Whether or not the aggregate behaviour of millions of atoms is enough to model the behaviour of a neuron is still, of course, an open question.

Supervised

Darwintunes relies on an expensive resource, human attention and supervision, to provide the fitness function. While we shouldn't dismiss the possibility of human-driven fitness functions, we should seek approaches that don't require much attention.

An interesting recent

A recent application of evolutionary computing might be of interest to some here, given the discussion in this thread. What I found most interesting was their fitness function's clear connection to cyclomatic complexity, used to measure the inherent complexity of programs (and other metrics of modularity, of course).

Given how intricate and difficult to understand most evolved solutions are, adding this simple criterion to any fitness function could actually make solutions more understandable.