Reversing operations

If one wants to iterate through all states s1 reachable from an initial state s0 in an imperative language (think of a chess board and a set of possible moves), one could do something like:

for i in 1..#changes {
    s1 := copy(s0)
    change(s1, i)
    // process s1
}

It is usually faster and less memory intensive to do the following instead:

for i in 1..#changes {
    change(s0, i)  // pushes undo data onto a stack
    // process s0
    undo(s0)       // pops undo data from the stack
}
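In Python, this undo-stack pattern might look like the following sketch (the state, the moves, and the helper names are purely illustrative, not part of the question):

```python
# A minimal sketch of destructive update with undo. The "state" is just a
# mutable list and each "change" bumps one cell; the undo data for a change
# is the value it overwrites.

def change(state, i, undo_stack):
    """Apply move i destructively, pushing what it clobbers."""
    undo_stack.append((i, state[i]))   # save the old value
    state[i] = state[i] + 1            # the (toy) destructive update

def undo(state, undo_stack):
    """Reverse the most recent change by popping its undo record."""
    i, old = undo_stack.pop()
    state[i] = old

s0 = [0, 0, 0]
undo_stack = []
for i in range(len(s0)):
    change(s0, i, undo_stack)
    # ... process s0 here ...
    undo(s0, undo_stack)

assert s0 == [0, 0, 0]   # every change was rolled back
```

Only the overwritten values are saved, so memory use is proportional to the depth of changes rather than the size of the state.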

Are there general techniques for inferring the "undo data" that an operation should save, and possibly the corresponding undo operation?


You may enjoy the Galois

You may enjoy the Galois system. Traditional speculative parallelism does exactly what you don't want: it speculatively runs on a copy. To go back, it scraps the copied state and uses the cached state. Galois does what you want: when it speculates ahead, it records a log of undo operations. To go back, it can just play the log.

To provide a more solid linguistic basis, it may be possible to adapt bidirectional languages such as lenses. I've been thinking more in terms of incremental computation, but won't say much more there as it's active research :)

Persistent Data Structures

Most computations are not conservative, and your ability to infer state change is severely limited across any modular composition. So you'll probably need to constrain your vocabulary if you want reversible computing.

Keep it simple and sufficient. I think one simple and sufficient answer is to just select a data structure with O(1) copy costs and a storage cost proportional to the changes applied (or at least well below linear), i.e., a persistent data structure. Clojure explicitly uses this technique to support MVCC.

Reversible computations are

Reversible computations are very interesting and useful. If you have an aggregate operation on a set and an element is removed or its value changes, you can "reverse" its effect on the aggregate to efficiently update the value of the aggregate operation. Likewise, if you have a bi-directional binding, you can reverse the bound-to operation to determine its effects on the LHS; I do this a lot in Bling.
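As a sketch of that first point, here is how reversing an element's contribution keeps a sum aggregate up to date incrementally (the SumAggregate class is hypothetical; note that this relies on + being invertible, so it would not work for aggregates like min or max):

```python
# Incrementally maintain an aggregate (a sum) by reversing an element's
# effect, instead of recomputing over the whole set.

class SumAggregate:
    def __init__(self, values):
        self.total = sum(values)

    def remove(self, v):
        self.total -= v          # reverse v's effect on the aggregate

    def update(self, old, new):
        self.total += new - old  # reverse old's effect, apply new's

agg = SumAggregate([1, 2, 3, 4])   # total = 10
agg.remove(3)                      # total = 7
agg.update(4, 10)                  # total = 13
```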

Unfortunately, David is right, most computations are not conservative and in general it is difficult to compute reversals. I'm guessing most quantum-computing programming languages are based on reversible operations because they have to be, but such languages are very theoretical and probably don't make sense in your context.

As dmbarbour says, most

As dmbarbour says, most everyday computations are not reversible, i.e. they are many:1 mappings. This is the same as saying information is lost during the computation, and that information is exactly the "undo data" you are asking about.

There is a simple way to make any computation reversible, and that is to change information-destroying operations into information-moving operations, where the information is moved to a junk area that's never read. In general we can take any Turing machine and give it a second tape, on which we write any information which would otherwise be lost during an update of the first tape. This doesn't affect our computation, which only cares about the first tape, but it does make our computations 1:1, and importantly the second tape is the "undo data" which lets us get back to our starting configuration.

Moving more high-level than Turing Machines, if you can enumerate the domain of every destructive (many:1) operation then instead of every output value having a set of possible inputs, it has an ordered list of possible inputs. For example, reversing square(x) for an output of 25 will give the set {5, -5}. If we enumerate the integers as 0, 1, -1, 2, ... then we can put the set {5, -5} into their enumerated order as the list [5, -5].

Each time a destructive operation (like square) is called, it should look up its argument in the list of possible inputs and log the index to the "undo data". For example, "square(-5)" should enumerate the integers and their squares until it gets to -5: (0,0), (1,1), (-1,1), (2,4), (-2,4), (3,9), (-3,9), (4,16), (-4,16), (5,25), (-5,25). It then logs the number of values it found with squares equal to square(-5), which in this case is 1. Thus the undo data for square(x) is 1 bit, which specifies whether it's positive or negative.

To undo a computation, given its undo data, we just reverse each operation, popping an index off the undo data if the operation is destructive.
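The scheme can be sketched in Python (all names are illustrative; the enumeration order 0, 1, -1, 2, -2, ... follows the comment above):

```python
def enum_ints():
    """Enumerate the integers as 0, 1, -1, 2, -2, ..."""
    yield 0
    n = 1
    while True:
        yield n
        yield -n
        n += 1

def square_logged(x, undo_log):
    """Compute x*x, logging x's index among the preimages of the result."""
    y = x * x
    index = 0
    for n in enum_ints():
        if n == x:
            break
        if n * n == y:      # another preimage of y occurs before x
            index += 1
    undo_log.append(index)
    return y

def unsquare(y, undo_log):
    """Invert square using the logged preimage index."""
    index = undo_log.pop()
    seen = 0
    for n in enum_ints():
        if n * n == y:
            if seen == index:
                return n
            seen += 1

log = []
y = square_logged(-5, log)   # log holds [1]: -5 is the second preimage of 25
assert unsquare(y, log) == -5
```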

Such a scheme is probably vastly inefficient and requires enumerable values, but it's as generic as I can think of at the moment. Note that it is near-optimal space-wise, generalises the Turing Machine case (everything on a Turing Machine is enumerable, since we can enumerate tapes), and is amenable to arbitrary optimisations (like having "square" log 0/1 based on the sign, rather than doing the full enumeration).

Also, a fun fact: for any real life computer, the undo log is the heat it gives off during the computation!

Also, a fun fact: for any

Also, a fun fact: for any real life computer, the undo log is the heat it gives off during the computation!

You mean for any "perfect" theoretically thermally optimal computer. I'm sure we are nowhere near that point yet in the computers we actually build.


While not answering your immediate question, I wonder if the Darcs theory of patches might be relevant?


The J language has first-class support for this (for pure expressions) in terms of Under (&.). Many built-in operations have pre-defined inverses, and it's possible to define them yourself for your own functions.
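For readers unfamiliar with J, the shape of Under can be approximated in Python (the name `under` is hypothetical; J additionally knows many built-in inverses, so you don't supply `f_inv` by hand):

```python
import math

def under(f, f_inv, g):
    """Do g 'under' the transformation f: compute f_inv(g(f(x)))."""
    return lambda x: f_inv(g(f(x)))

# Example: adding 1 "under" the logarithm is multiplication by e,
# since exp(log(x) + 1) == x * e.
succ_under_log = under(math.log, math.exp, lambda y: y + 1)
assert abs(succ_under_log(2.0) - 2.0 * math.e) < 1e-9
```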

Learning things like this is

Learning things like this is why I still love LtU.

Bling had something similar to J. Still, if you defined a method composed only of reversible operations, the method itself was also reversible. Bling also applied this concept to derivatives (inverse of op, derivative of op); i.e., ops went beyond being just computations.

It is really hard for me to grok the J under ops. How do they handle multi-value cases? E.g., a = b + c: when we go in reverse, we have values for a and c and wish to compute b (b = a - c). Bling always took the left-most value, so to say: right = left + width; by inversion, assignment would always shove values into left (left' = right' - width). Getting values into width required using the abstraction right0 = width + left instead. Since a + b was not the same as b + a, this was a weird semantic corner of the language.

Immutable structures

I've written game AIs using both your proposed methods.

There's also a third approach: use immutable data structures and instead of change() use a function that constructs a new board from the old one with the appropriate move. Eg. a quadtree type structure or one of many other similar structures. The new tree is largely built from pieces of the old tree so you don't need to make a complete new copy. You don't need to undo, you can just keep the old and new trees in memory (efficient because they share data) and just free the new tree when you're done.
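A minimal sketch of that path-copying idea, using nested tuples in place of a quadtree (illustrative only):

```python
# An immutable board as a tuple of row tuples. "Moving" builds a new board
# that shares every unchanged row with the old one (path copying), so the
# old board stays valid and no undo is needed.

def set_cell(board, r, c, v):
    """Return a new board with cell (r, c) replaced; other rows are shared."""
    row = board[r][:c] + (v,) + board[r][c + 1:]
    return board[:r] + (row,) + board[r + 1:]

old = ((0, 0), (0, 0))
new = set_cell(old, 0, 1, 7)

assert old == ((0, 0), (0, 0))   # the old board is untouched
assert new == ((0, 7), (0, 0))
assert new[1] is old[1]          # the unchanged row is shared, not copied
```

With a tree of branching factor k, each update copies only the O(log n) nodes on the path from the root, which is where the memory savings come from.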

Easy undo is one of those things that can come for free with immutable FP. (Though these data structures all work fine outside of the context of FP.) Of course there's a trade-off, you may find that the performance you get from a flat array is much better than anything you could get from a tree structure.

Immutability is basically the

Immutability is basically the problem he is avoiding. Instead of paying a copy cost on every operation, his insight is that it may be cheaper to be destructive and recover.

I agree that copy can be cheap with FP tricks. To help prime the search: zippers and Chris Okasaki's book on functional data structures.
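To prime that search a little further, here is a minimal list zipper in Python (illustrative; Huet-style zippers generalise the same idea to trees):

```python
# A list zipper: the structure is split into a reversed left context and
# the focus-plus-right remainder, so movement and editing at the focus are
# O(1) and every intermediate version can be kept cheaply.

def from_list(xs):
    return ([], list(xs))            # (reversed left context, focus..right)

def right(z):
    left_ctx, rest = z
    return ([rest[0]] + left_ctx, rest[1:])

def left(z):
    left_ctx, rest = z
    return (left_ctx[1:], [left_ctx[0]] + rest)

def set_focus(z, v):
    left_ctx, rest = z
    return (left_ctx, [v] + rest[1:])

def to_list(z):
    left_ctx, rest = z
    return list(reversed(left_ctx)) + rest

z = from_list([1, 2, 3])
z = right(z)             # focus moves to 2
z = set_focus(z, 9)
assert to_list(z) == [1, 9, 3]
```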

How about dynamic scope?

Or better to say, assignments whose scopes are bound dynamically. So compare

dlet (x = 4) in { ... }

with

dlimit (change(s1, i)) in { ... }

An implementation could intercept all assignments in the limited method call, push the old values on a stack (or just trap them in a closure), and re-assign the old values when the body is finished.
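Python's context managers can approximate this (the `dlimit` helper below is hypothetical and only restores fields named up front; a real implementation would trap all assignments, e.g. via `__setattr__`):

```python
import contextlib

@contextlib.contextmanager
def dlimit(obj, *fields):
    """Save the named fields of obj, then restore them when the block exits."""
    saved = [(f, getattr(obj, f)) for f in fields]
    try:
        yield obj
    finally:
        for f, old in saved:
            setattr(obj, f, old)

class State:
    x = 4

s = State()
with dlimit(s, "x"):
    s.x = 99          # destructive update inside the dynamic extent
assert s.x == 4       # automatically rolled back on exit
```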

Program Inversion

There is some work by Torben Mogensen on semi-inversion in functional languages (GPCE 2005 and PEPM 2008). There is also a paper in PLDI'11 that looks relevant (haven't read it yet, but the abstract sounds interesting):

Path-based Inductive Synthesis for Program Inversion
Saurabh Srivastava, Sumit Gulwani, Jeffrey S. Foster, Swarat Chaudhuri

lots of links between immutable data

It seems to me that it's tricky when most data is immutable but there are a lot of links, references or pointers. I've been trying to think about whether it's reasonable to build distributed data structures using IPLD (the Linked Data work that builds upon IPFS and JSON-LD). The trendy cornerstone technique there is hashing whatever content you are storing and using it as the ID for finding and linking to it. The upside is you don't care where it is stored, and you can be certain that nobody can mutate data without your knowledge because the hash matches. So all content ends up immutable.

But if you insist on replacing all pointers (either in memory, as in a CAM [content-addressed memory], or URLs on the web) with hashes, and the hash changes every time the content changes, then if you need to update the content, you'd better hope there weren't too many untracked links to it, because they will all be out of date: pointing to the old content, which still exists as long as any participant has a copy of it. So naively it seems that such systems are only good for recording history, not for holding "current" data and being able to link to it.

And yet, git is such a system, and does both. There is also IPNS, a way of boxing an immutable hash with a mutable one, so that, as with git, you can update the outermost link, the one that points to the whole tree of objects, and make it point to a different tree.

But what kind of granularity is that? Suppose you want to store your family tree in IPLD records... each living person is going to have changing attributes over time. Changing links to other people, places and things which are themselves always changing. Only dead people can have unchanging hashes in this tree, but even then, it may be that research discovers more facts about them.
IPNS is too coarse for every node in the tree to have its own name, so one would probably still have to fall back to the good old hierarchical directory structure: a particular Bob Jones might be referred to as /Jones/Bob2354 instead of just a hash. Something like xpath to drill down to the data that you want, and hope that it doesn't get restructured so much that the same path doesn't work anymore. IPFS so far is oriented toward storing a whole website under each IPNS name, not a single document.

I had a lot of Java experience a long time ago, and in practice it was never a problem that String is immutable, because you can still replace a string anywhere by changing the reference to it. More likely you will have multiple references to a shared mutable object rather than intentionally sharing a String (even though interning makes a lot of unintentional sharing occur). So, you work around the immutability by putting immutable data inside a mutable box. But in the limit I don't see how it works to try to make as much data as possible immutable in any situation in which there are a lot of untracked links, pointers or references.

From reading about Xanadu (also a long time ago), there were a few big ideas, but the most relevant ones might be 1) all links should be bidirectional, not one-way links like we have on the web today (easy enough to achieve - you just store links separately rather than within the hypertext); 2) whatever tricks can be employed to be able to point to subparts of data, which are necessary to make separately-stored links possible. The plan was for text to be editable in the extreme and yet for a link to point to an individual phrase, word or maybe even character range within a paragraph within a document, and link either to a fixed version of that text, or to the changing version so that the link stays up-to-date if desired. But the documentation is so sparse and chaotic and inconsistent and full of domain-specific jargon that I always seemed to be missing something about how to implement such a thing. (And apparently everybody else did too, otherwise that's what we'd be using now.) Does anybody have a handle on that? There was supposed to be some obscure invention of a multi-dimensional tree that was held up for a while as the ultimate solution.

Another downside of replacing pointers with hashes, in general, is that hashes are so long. I wonder if in the long term, if we really know how to handle immutable data and make it mutable anyway, we will use hashes instead of pointers within programs, within memory. The CAM technique everywhere. But if you want to do this without wasting too much memory, you need a 64-bit hash (same size as a pointer) which is likely-enough unique to avoid hash clashes. It's much shorter than, say, a SHA2, and so it can't scale to distributed data structures. (So maybe you need a varint SHA2: use the 64-bit prefix when it is enough, or a longer prefix when a clash occurs.) Whereas ordinary location-based pointers are always unique by definition. And mutable. So easy...

re: hashes

I'm currently using secure hashes pretty heavily for links between structured binary data. My goal is to keep many versions for very large codebases and databases (e.g. the size of Wikipedia) with flexible structure sharing and efficient comparisons and histories and sharing and so on.

While hashes are long, they're really not so bad! They're much shorter than many URLs or filenames with directory structure, at least. I use 280-bit BLAKE2b secure hashes encoded in a consonants-only base32, which hence requires 56 bytes per hash. The hash is trivial to recognize for conservative GC purposes.
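The core mechanics can be sketched in a few lines (an illustrative in-memory store, not the system described above; `digest_size=35` gives the 280-bit BLAKE2b digests mentioned):

```python
# Content-addressed storage: each blob is keyed by its own hash, so links
# are hashes and stored data is immutable by construction.

import hashlib

store = {}

def put(data: bytes) -> str:
    """Store a blob and return its hash, which serves as the link."""
    h = hashlib.blake2b(data, digest_size=35).hexdigest()  # 280-bit digest
    store[h] = data
    return h

def get(h: str) -> bytes:
    """Fetch a blob and verify it still matches its hash."""
    data = store[h]
    assert hashlib.blake2b(data, digest_size=35).hexdigest() == h
    return data

link = put(b"hello")
assert get(link) == b"hello"
```

Because the key is recomputed from the content on every read, any mutation of the stored bytes is detected immediately, which is the integrity property being relied on here.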

The main trick is to design data structures that can work efficiently with relatively large hashes. The easiest way to do this is to inline data structures until some heuristic threshold is reached, so we have larger binaries. For example, a trie could inline many nodes into one large blob rather than use one hash per node. Normal-sized pointers can be used when parsing the binary, which might include dozens or hundreds of nodes. A secondary trick is to delay construction of the hash reference or otherwise avoid intermediate nodes, e.g. model a trie with an explicit 'compact' operator that returns the compacted trie, so that we can add huge batches of key-value updates between compactions.

Also, while hashes are shorter than many URLs, we can also add that hashes don't require authorization or versioning metadata. This can save space in the larger context of systems programming. Obviously not relevant in every use case. But it's very useful in some other cases, e.g. for memoization, caching, sharing of results.

IMO, hashes are excellent for scaling pure functional programming to large scale persistent systems, a purely functional alternative to virtual memory, a basis for large scale structure sharing (e.g. immutable refs to shared libraries or DLLs). I don't consider hashes to be appropriate everywhere. Just wherever you need very big immutable data structures. That said, it's usually not difficult to isolate mutations to just a few toplevel objects, especially when you can efficiently model full databases as first-class values.


Xanadu used a data structure called an "Enfilade" to do the transclusion. This was all made public around 1999 when the source code was published. See:

* Icon has both backtracking and non-backtracking assignments