archives

Chemistry, Graph Tranformation and Programming Languages

I’d love to have a LISP equivalent of a graph rewriting system, towards making something that chemists and biologists could use to computationally “play” the universe of molecules and the reactions between them.

To solicit some advice (perhaps provoke a collaboration?) from experts in the field of programming languages, I give to you what could be a considered a summary of a paper I presented at ICGT 2004. It was a great conference (if anyone who attended is reading), but was focused more towards software engineering and I think most of the people there had more important things to do than to listen to me.

The hope is to generalize the notion of synthetic organic chemistry to graph grammars, and create a stronger underlying formalism that would allow us to make useful tools and think of new problems and applications that can be solved by thinking of them in a “chemical” fashion.

What kinds of problems do people in synthetic organic chemistry think about?

Before we talk about that, let me provide a quick intro to chemistry. Chemists generally deal in molecules, which can often be sufficiently described in terms of vertex and edge labeled graphs which I will call chemical graphs. Molecules undergo reactions with other molecules to make new molecules. For today’s sake, reactions can be considered a relation between n-tuples of molecules. Reactions can be partitioned into a set of classes, we will call chemical transformations. Chemical transformations can be denoted by only considering a small “reactive” portion of the reactant molecules and products, they go by names like “Diels Alder Cycloaddition”, “Cope Re-arrangment” and “Mitsunobu Coupling”. Chemical transformations can be considered rewriting rules over chemical graphs.

Some chemists try to make weird and interesting molecules that can be found in nature, usually extracting things from weird creatures deep in the ocean or rare plants. They then try to build those molecules using only chemicals that are commercially available; and the chemical transformations that they know. Sometimes they find new chemical transformations in their quest to build their new molecule. This process can be considered a type of deduction over chemical graphs, where the axioms are the available chemicals and known chemical graph transformations; and the theorems are the target chemical graphs. These deduction paths are called synthetic routes to the chemist. Sometimes these synthetic routes can be tricky or require many (>50!) steps. The problem of asking wether or not a synthetic route exists to you target is equivalent to the semi-group word problem, and thus undecidable. With just a few assumptions, we can consider the problem of determining synthetic routes equivalent to the tiling problem, and is thus NP-hard. There is a good story behind the synthetic route of a molecule that is an anti-cancer agent, called Taxol, originally isolated from the rare yew tree. The last 60 or so years of chemistry has developed to give us synthetic routes to millions of molecules.

How many different molecules are there? The set of all molecules is infinite, the space of all “theoretically” synthetically accessible molecules is also infinite (but a subset of the former). The space of all practically synthetically accessible molecules has been estimated to be around 1062. There isn’t enough time to make each of them, although I think we do actually have enough atoms in the universe to do it. We are often interested in finding new molecules to do interesting things. Some chemists spend their time “fishing” in this set of synthetically accessible molecules. They do this by either thinking of a new molecule that would seem to be amenable to synthesis or make many molecules in parallel and try to use selection to find ones that have interesting properties.

Molecules do lots of really neat things, they are used as dyes, explosives, foodstuffs and medicines to name a few. Computational systems to help us navigate the infinitely large space of molecules would help us find new molecules to do things just as cool, or even cooler.

There is plenty of good software out there that could be used to do the stuff I am talking about. A good list of examples is in the paper, but they all have a very “applied” flavour; none of them are really instances of a more general framework of a grammar over graph rewriting. I wonder if some inter-discplinary thinking might not only help produce the kind of software I am thinking about, but also provide generalizations of concepts that may be useful to larger audience.

Perhaps it is worth noting that chemical analogies have crossed paths with areas of programming languages before, for instance, the chemical abstract machine. I do however consider the analogy “weak”, in that trying to understand CHAMs hasn’t given me any good ideas about chemistry. People seem to be talking a little more about applying concurrency to the description of things like biochemical pathways; but I haven’t seen anything that is really so compelling that it would help an investigator “in the field”. I hope it doesn’t sound like I’m suggesting computer science is guilty of not giving anything to chemistry (obviously not true). In case your feelings are hurt, you may hear about a "DNA Computer" in a science-type blog every so often; just about all those things are absolute rubbish, stay far away.

So if you found any of this interesting or have any ideas, please post below. I'd really love to be able to follow up on my ICGT paper (which was more a glorified tutorial/proposal) with something a little more concrete.

On the Unusual Effectiveness of Logic in Computer Science

In 2001, Moshe Vardi organised a workshop devoted to a the topic of The Unusual Effectiveness of Logic in Computer Science with papers presented covering such topics as "Logic as the calculus of computer science" (Vardi) and "Descriptive complexity" (Immerman), and later a gang consisting of Halpern, Harper, Immerman, Kolaitis, Vardi, and Vianu published a likewise named 24 page article in the July 2002 issue of the Bulletin of Symbolic Logic.

The title is derived from Wigner's famous article on The Unreasonable Effectiveness of Mathematics in the Natural Sciences, which was devoted to raising and attempting to answer the important question: why should mathematics have been so useful to natural scientists? With respect to logic, my answer for the effectiveness of LICS is that, while computation is a physical phenomenon, it is a phenomenon that is best understood via powerful abstractions, and the most powerful abstractions we have at the moment are abstractions in mathematical logic, because of the fundamental relationship of Turing completeness to Goedelian incompleteness.

Links derived from Richard Zach's Motivating Intro Logic for Philosophy majors (and others).

More links...