Chemistry, Graph Tranformation and Programming Languages

I’d love to have a LISP equivalent of a graph rewriting system, towards making something that chemists and biologists could use to computationally “play” the universe of molecules and the reactions between them.

To solicit some advice (perhaps provoke a collaboration?) from experts in the field of programming languages, I give to you what could be a considered a summary of a paper I presented at ICGT 2004. It was a great conference (if anyone who attended is reading), but was focused more towards software engineering and I think most of the people there had more important things to do than to listen to me.

The hope is to generalize the notion of synthetic organic chemistry to graph grammars, and create a stronger underlying formalism that would allow us to make useful tools and think of new problems and applications that can be solved by thinking of them in a “chemical” fashion.

What kinds of problems do people in synthetic organic chemistry think about?

Before we talk about that, let me provide a quick intro to chemistry. Chemists generally deal in molecules, which can often be sufficiently described in terms of vertex and edge labeled graphs which I will call chemical graphs. Molecules undergo reactions with other molecules to make new molecules. For today’s sake, reactions can be considered a relation between n-tuples of molecules. Reactions can be partitioned into a set of classes, we will call chemical transformations. Chemical transformations can be denoted by only considering a small “reactive” portion of the reactant molecules and products, they go by names like “Diels Alder Cycloaddition”, “Cope Re-arrangment” and “Mitsunobu Coupling”. Chemical transformations can be considered rewriting rules over chemical graphs.

Some chemists try to make weird and interesting molecules that can be found in nature, usually extracting things from weird creatures deep in the ocean or rare plants. They then try to build those molecules using only chemicals that are commercially available; and the chemical transformations that they know. Sometimes they find new chemical transformations in their quest to build their new molecule. This process can be considered a type of deduction over chemical graphs, where the axioms are the available chemicals and known chemical graph transformations; and the theorems are the target chemical graphs. These deduction paths are called synthetic routes to the chemist. Sometimes these synthetic routes can be tricky or require many (>50!) steps. The problem of asking wether or not a synthetic route exists to you target is equivalent to the semi-group word problem, and thus undecidable. With just a few assumptions, we can consider the problem of determining synthetic routes equivalent to the tiling problem, and is thus NP-hard. There is a good story behind the synthetic route of a molecule that is an anti-cancer agent, called Taxol, originally isolated from the rare yew tree. The last 60 or so years of chemistry has developed to give us synthetic routes to millions of molecules.

How many different molecules are there? The set of all molecules is infinite, the space of all “theoretically” synthetically accessible molecules is also infinite (but a subset of the former). The space of all practically synthetically accessible molecules has been estimated to be around 1062. There isn’t enough time to make each of them, although I think we do actually have enough atoms in the universe to do it. We are often interested in finding new molecules to do interesting things. Some chemists spend their time “fishing” in this set of synthetically accessible molecules. They do this by either thinking of a new molecule that would seem to be amenable to synthesis or make many molecules in parallel and try to use selection to find ones that have interesting properties.

Molecules do lots of really neat things, they are used as dyes, explosives, foodstuffs and medicines to name a few. Computational systems to help us navigate the infinitely large space of molecules would help us find new molecules to do things just as cool, or even cooler.

There is plenty of good software out there that could be used to do the stuff I am talking about. A good list of examples is in the paper, but they all have a very “applied” flavour; none of them are really instances of a more general framework of a grammar over graph rewriting. I wonder if some inter-discplinary thinking might not only help produce the kind of software I am thinking about, but also provide generalizations of concepts that may be useful to larger audience.

Perhaps it is worth noting that chemical analogies have crossed paths with areas of programming languages before, for instance, the chemical abstract machine. I do however consider the analogy “weak”, in that trying to understand CHAMs hasn’t given me any good ideas about chemistry. People seem to be talking a little more about applying concurrency to the description of things like biochemical pathways; but I haven’t seen anything that is really so compelling that it would help an investigator “in the field”. I hope it doesn’t sound like I’m suggesting computer science is guilty of not giving anything to chemistry (obviously not true). In case your feelings are hurt, you may hear about a "DNA Computer" in a science-type blog every so often; just about all those things are absolute rubbish, stay far away.

So if you found any of this interesting or have any ideas, please post below. I'd really love to be able to follow up on my ICGT paper (which was more a glorified tutorial/proposal) with something a little more concrete.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Interestingly enough, I have

Interestingly enough, I have been contemplating playing with a small scale version of an idea along these lines as a possible end of year thing to do so as to complete the project requirement in my high school chemistry class.

One of the challenges I see is that (at least at the high school chemistry level), a lot of the techniques for dealing with chemical reactions are more heuristic in flavor than algorithmic, in that they more of a good guess than definitive answer as to the nature of the reaction.

OK, I officially feel old. And stupid.


Bah, Humility!

I am inspired to work that much harder to learn what I want to learn when I meet people who are half my age and know twice as much as I do about the subjects that fascinate me. I would much rather be in a group where everyone is smarter than and knows more than me, rather than being in a group where everyone knows less.

I am often humbled on the #haskell irc channel, and here on LtU as well. I am also convinced that if I haven't been humbled recently, I'm not trying hard enough.

(Works for unicycling too, if I haven't unexpectedly left my unicycle at high speed during a ride, I'm not trying to improve my skills.)

--Shae Erisson - ScannedInAvian.com

But I was Waiting for an "Organic Chemistry Prolog"!

Wherein one would

  1. build molecular structures that characterise a problem,
  2. place them in a beaker,
  3. execute concurrent chemical reactions,

  4. analyze the results of the reaction (which represent the solution of the problem) and,
  5. voila, problem solved.

I was hoping we could use this "Organic Chemistry Prolog" to solve difficult problems in computer science. But it appears we're at an impasse - who's going to make the first move: chemists or programmers?

Adleman tried....

and everyone was (rightfully) really excited to think of the possibilities of solving TSP with parallel computing on DNA.

A lot of people worked in this area for several years; they even felt like they were competing with quantum computing.

It was a nice ride, and we learned some interesting things, but it seems somehow obvious now that computing with molecules (in the "wet" sense) just doesn't really work.

but it seems somehow obvious

but it seems somehow obvious now that computing with molecules (in the "wet" sense) just doesn't really work

I'm curious, could you elaborate a bit on that (or point me to relevant articles)?

Some quotes and links articles

Interestingly, googling "The Failure Of DNA Computing", brings up articles which propose "the success of failure of DNA computing....". Maybe I should write an article with that title.


I'll end my comment with some of my own thoughts (I don't mean to snowjob anyone with quotes), but you may consider these somewhat telling:


Adlmen Interview:
Leonard Adleman sends his regrets. In an e-mail FAQ he uses to fend off journalists seeking interviews, the University of Southern California computer scientist and world-famous cryptographer who invented the field of DNA computing confesses that "DNA computers are unlikely to become stand-alone competitors for electronic computers." He continues, somewhat apologetically: "We simply cannot, at this time, control molecules with the deftness that electrical engineers and physicists control electrons."

Linus Torvalds Interview:
I think that's a load of bull. I see all these news reports that say "Hey, we had a chemistry set that computed pi to seven digits!" That's basically what they're doing. They're not doing computing. They're doing pattern matching with DNA. And that's fine. That's what you want to do if what you are matching is DNA, right? But if you actually want to do computation you obviouisly don't want to do this biological solution of stuff and just hoping that the answer will come, right? So I'll believe it when I see it.

There are a number of popular articles on DNA computing, it is a really fun idea to think about. But look past the inflationary headlines ("DNA-Nano Computer Could Help Fight Cancer"), and look carefully at where the actual science is. It can be tricky if you don't have a background in chemistry.

There are numerous examples of "molecular logic gates". Not the kinds of things that could be used to effect current, but the "inputs" are usually two different molecules and the "output" is something like the emission of light at a specfic wavelength. These things really can't be coupled together past a step ot two, and it's not at all obivous what could be done to alleviate that.

It is also a mistake that DNA computing somehow rivals quantum computing. The latter has a pretty rich theory and brings up new questions about decidability and complexity, the former does not.

In summary: Grrr. DNA/Molecular computing sucks.

Mathematics of organic chemistry?

It seems you would need some formal infrastructure first. Do you have references to, say CS-like/logic based, formalisms of organic chemistry?

There are no such formalisms...

and maybe it is becuase nobody actually needs them to do anything useful.

But I think that, maybe, we need something like that to make a slick graph rewriting system, that would certainly be useful.

Some Links to DNA Computing Resources

Molecular Computation of Solutions to Combinatorial Problems by Leonard M. Adleman. DNA Computing: A Primer is a discussion of the method used by Adleman. Eric Baum's papers DNA Based Computing describe the obstacles to DNA computing (seems it's currently very I/O-bound!-). As always, Google is a good starting point.

I was wrong to speak of an "Organic Chemistry Prolog" - the technique might better be described as massively-parallel "generate and test". Backtracking is absent and simple matching is used [rather than unification, although this might be debatable - there are multiple ways to match atoms and/or molecules. Just thinking out loud here, though.]

Topic Appears to Have Active Ongoing Research

Searching Google for topics such as "Computer Aided Synthesis Design"
yields such gems as The SynGen program for organic synthesis design and others(search for first occurrence of CASD). It appears that this is an active research area although there also may be some resistance to using new tools.

Cited in the paper

as is LHASA and some other packages.

Sertanty Inc.'s ChIP platform has something really modern and cool going. Great company too, I just wish I had a command of the resources to get my hands on some of their software...

All of these softwares do *work* (Sertanty's works very well). But I wonder if such software could be generalized to make better chemical software, and to consider the application of chemical problem solving to other domains...

You sure?

Though I am usually a fervent protagonist of out-of-the box abductive thinking, I doubt the offspring of "chemical problem" solving (where I partly agree with your view on DNA-computing; let's not go there).

The field of algorithms is rather rich, and there are already a lot of CS people in the chemistry/bioinformatics field. That's why I thought a good approach would be to formalize an/your abstraction of organic chemistry - this would provide a sound basis for implementing (adaptions of), I presume, well-known algorithms. From what I know of bioinformatics, say the folding problem, this already would be non-trivial. Of course, I might be totally incorrect since I am not a domain expert on chemistry.

We're on the same page.

Trying to see if there is a (interesting) formalism is the first thing I want to do; a calculus with graphs and graph rewriting rules in which you would be able to do things like:

-specifications of rules at multiple "levels of detail"

-ability to apply at rules "partially"

-specify global contexts that change the way rules are applied

-Handle things like multiple potential applications etc. in a clean way

The fact that no formalism jumps out at me, suggests maybe that there really isn't anything that interesting here.

Let me back away from my statement about "chemical problem solving". I certainly didn't think that chemists have invented any algorithms, at least none that are not explored in the world of computing.

What I think I mean, is that by having a flexible graph rewriting system that you can show "chemistry" is just an instance of; it would be convienent for people to explore the use of graph transformation in what they study.

Suppose instead of molecules, we were modelling fish that could swim in schools; eat each other etc. It would be convienent to map the fish onto molecules, specify graph trasformations and start doing stochastic applications of the rules. Am I making sense here?

Oops, I did it again

Ahum, I reread the topic. Hmpf, well, uhm, my apologies? Just read you were already going for the formalism; I got distracted after reading your paper where most examples described molecules in some (slightly odd) syntax, and, somehow, I presumed some people must already have tried to build formal logics for organic chemistry (I think some logic, maybe modal, would best fit your needs).

Ah well, back to your original question. Well, I guess you are correct in thinking that it is worthwhile to study the applicability of logics and algorithms/reasoners over graphs for organic chemistry; although, I suspect you are looking at a rather rich graph structure there (what to do with 3d-properties for instance).

To me it seems you need to start of with defining the domain and rules of just a chemical logic. Most of the general reasoning algorithms you would need then I guess you could adapt from any good book on AI. (In the end it seems your primary interest can be translated to algorithms for stochastically generating the derivable terms for an elaborate logic on graphs)

If you want to build a general framework for reasoning in various manners over graphs which also may contain these kind of rich structures, uhm, well, that would be interesting but very non-trivial also.

Ah well, succes.

Note: another more pragmatic way would be, of course, to start playing with Prolog, or a forward reasoner (There was a simple forward reasoner mentioned on LtU just lately but sorry, I forgot the name)

Not merely graph rewriting...

What I think I mean, is that by having a flexible graph rewriting system that you can show "chemistry" is just an instance of; it would be convienent for people to explore the use of graph transformation in what they study.

Suppose instead of molecules, we were modelling fish that could swim in schools; eat each other etc. It would be convienent to map the fish onto molecules, specify graph trasformations and start doing stochastic applications of the rules. Am I making sense here?

But each molecule is represented by a separate graph, and there are multiple graphs involved in a reaction. The orientation of these graphs relative to one another is relevant and may change as the reaction progresses. The electronegativity and structure of the vertices of each graph are similarly relevant. So this problem, even in a simplified form, is very complex because we are modeling a physical 3-dimensional system. In classical terms, we're solving an "N-body problem".

But to do this correctly we need quantum mechanics (or at least a simplified version with s- and p- orbitals, etc. that correctly characterizes the bonds).

Now while chemical graph rewrite rules might be used to easily characterise the effect of a single ion upon a single graph structure held in a fixed orientation, as the complexity of the interacting components increases the interactions of even only two graph structures (two interacting molecules) becomes inordinately complex.

No.

The paper makes it clear that I am not talking about physical simulation; just up to determining plausible synthetic routes.

Criticisms apply nonetheless

The problems of intractibility cannot be wished away. If the representation is inaccurate, the system will not properly predict; if the representation is accurate then the problem may prove intractable except in sufficiently-constrained cases.

The paper describes aspects of SMILES and other systems, speaks hopefully of a "graph-rewriting" system for chemical reactions, draws parallels between mathematical proofs and "chemical deduction" and then launches into a wish-list of features for such a system.

You envision a base representation (SMILES-like) of reactants and a set of constraints. But I am uncomfortable with the concept of "reaction" as an operator except in very simplified models, since it hides complex interactions (N-body again) between reactants. Indeed I believe that, if there is no such system, then that is precisely why there is no such system available.

The paper Analysis of Metabolic Pathways by Graph Transformation also given at ICGT meets my approval because it is more specific than your paper, better elaborated and recognizes the limitations of various approaches.

Analogies between linguistics and chemistry have limits of applicability. For example, while computer science is largely devoted to context-independent grammars, any "chemical grammar" would be context-dependent. I believe in this particular instance (your paper) the analogy has been stretched near, if not beyond, the breaking-point.

But perhaps you should try the U.S. Patent Office!-))