## Can referential transparency be bad? (puzzle included)

So in Haskell, I would take this data type:
 data Tree = Leaf | Node Tree Tree 
and write a function that maps leaves to leaves, and nodes to nodes with swapped children.
"mirror an immutable binary tree" was easy. What about "preserving existing sharing"?
For that I need to cheat a bit and augment my data type with identifiers, and pretend that meets the requirements of the puzzle.

Or I may use StableName, but then I will end up inside IO kitchen sink.
One may argue that ability to recognize sharing is not referentially transparent, and therefore it's a feature that it's available only inside IO computations. I however feel that this binary choice between purity and impurity is too harsh - can I have an indulgence, please?

To be constructive, I decided to redesign the puzzle so it is a fair game.
Now it sounds as: "clone an immutable binary tree obtaining maximum sharing".
I believe that while pure code cannot recognize sharing, it can at least manifest its desire to create sharing (subject to compiler's whim, of course).

The full tree of depth n (having 2n-1 nodes) will therefore be compressed to a DAG of n nodes - impressive ratio!

Additional points if your code does not create intermediate structures (this is unfair towards PLs without implicit stack, BTW), if it traverses the original tree only once, if it can be reused for other data types, and if you can state it's computational complexity and/or maximum size of the stack (sure, some of these tasks are much harder than others).

## Comment viewing options

### 2 thoughts

I recognise the second puzzle: Alan Bawden told me about it around 6 years ago, with only the single traversal constraint. Solving that problem in scheme suggested a harder problem that I thought was insoluble in a clean, general way: write a letcyc macro that creates cyclic data structures analogous to those one can create with Haskell let statements.

On this binary choice between purity and impurity is too harsh - I quite agree, and it seems that much harsher when you generalise the first puzzle to allow cyclic trees. Core Haskell just doesn't seem well suited for this kind of thing, maybe a point in favour of Peter van Roy's argument for ramified language design.

### My Thoughts

The first is sort of a technical objection to the statement of the problem; Haskell doesn't have a defined space semantics and it thus becomes hard to discuss sharing except as an implementation issue. For the problem to be well-stated, you'd have to pose it in a language where the semantics defines what is shared and what is not.

Putting that aside, the second thought is that this problem can be solved without too much difficulty because the leaves are all interchangable. First you generate an "infinite" trie where each lookup slot in the trie is the cannonical representation of the tree of that shape, and the trie is constructed to preserve sharing, much like the memoization tricks one can do in Haskell. Cloning a tree involves looking up the appropriate node in the trie.

Something line this, perhaps? I think this creates maximal sharing (under a "reasonable" model of sharing), but I haven't looked into the bonus points.

data Tree = Leaf | Node Tree Tree deriving Show

data FullTrie a = T a (FullTrie (FullTrie a))

lookupTrie :: Tree -> FullTrie a -> a
lookupTrie Leaf       (T l n) = l

### Cool

I wasn't 100% sure it would work.

The trie technique is pretty much taken from Ralf Hinze's paper Generating Generalized Tries. I've occasinally thought of using tries like this for general memoization of tree-structured data, but this is the first time I've tried it.

Doing the data structure transformations in your head is a little mind-bending, but its also very cool exersize.

Also, on a second thought, I think this techinque would also extend to labeled trees; you just include the label in the trie.

### a strange puzzle

The "preserving existing sharing" requirement dips below the language level. It's like saying "do such and such in a C program which keeps all values in registers." You could do it, sure, but you have to know what's going on behind the scenes. Imagine a Haskell implementation that always copies things and doesn't pass by reference.

### not so strange

To spell out the point I made obliquely before, is it so strange to want to be able to make a clean map on a tree with some cycles that doesn't need to create an infinitely large structure

Consider:
 data Tree = Leaf | Node Tree Tree

 cyctree = let (a,b) = (Node Leaf b, Node a Leaf) in b 

mirrorredtree= f cyctree 

where f is your preferred pure solution to Anton's problem?

How OK is it that, as you explore the structure of mirroredtree, you will be requiring a boundless amount of storage for what is conceptually a small datum?

I think it's clear that the pure Haskell solution isn't a good one. That doesn't mean there isn't a purely functional way to do this: it may be that if we rework the Bird-Meertens approach to work well with coalgebraic notions, we might get a pleasing PL out of it.

A further thought: consider:
 c=1:c d=1:1:d 

Clearly c and d have to be different in our new model Haskell, even though, for all n, take n c and take n d are the same, thus we lose one of the important equations of Haskell, the take-lemma. What should we replace it with? And (easy question), should drop 0 d and drop 1 d be the same or different?

### Cycles

You talk about cycles as if that's something you can create in Haskell.
Haskell doesn't say anything about what happens when you make a recursive definition. It might or might not create a cycle depending on the implementation.

And as you note, if you starting making sharing observable then nice properties start to disappear. So it's a tradeoff, you can have a more low level language, but you'll pay a price.

### What you ask is closely related to a problem...

...I had a while back and for which I haven't yet seen a good pure functional solution.

With sharing, it's easy to construct what is observationally a tree with O(exp N) elements from only O(N) nodes stored in memory. (I hope that's reasonably correct terminology.)

So very roughly, my problem is about efficiently building vectors from expressions in basis elements, eg. 5*(e_1+3*e_2+5*(e_3+e_4)) should evaluate to (5,15,25,25). An obvious way to evaluate this is from bottom up but it turns out there's a really nice 'adjoint' way to evaluate such expressions from the top down that can exploit sharing in the tree describing the expression. In particular, I want to evaluate these vectors in time O(N) for an O(N)-node expression tree that observationally has O(exp N) nodes.

Rather than clutter LtU with the description, I'll just give a link to my original post on comp.lang.functional.

This problem came up in the context of writing automatic differentiation code. I believe that at least one published functional automatic differentiation algorithm suffers from an exponential running time because of this problem.

### I believe that it needs symbolic computation

In other words, I don't think there is a solution which respects referential transparency, ie is purely extensional.

And of course this arises in both AD and symbolic computation. Guess why I've been harassing Oleg for a while to figure out how to do typed symbolic computation? If course, in pure Oleg style, he came close - but note the use of the untyped Template Haskell rather than the typed MetaOCaml (which he also knows well). The point is that in symbolic computation, you only have intensional equality, so sharing is something easier to deal with. So it is good that StablePrt (see above) does all its real work in IO!

### Reverse-mode AD in a Functional Framework

You're basically asking how to incorporate reverse-mode AD into a functional programming language. Jeff Siskind and I have developed a method for accomplishing this. Working code, along with a preliminary version of the manuscript (a revised version should appear there shortly) are available at http://www.bcl.hamilton.ie/~qobi/stalingrad/.

### Great!

I'm glad to see someone working on this problem as it's been bugging me for ages.

One thing I'll add is that I think that it should be considered separately from AD. It is used in reverse mode AD but it also has other applications.

### Trouble Shared is Trouble Halved

On the topic of sharing, I notice that Trouble Shared is Trouble Halved hasn't had much discussion on LtU.
A nexus is a tree that contains shared nodes, nodes that have more than one incoming arc. Shared nodes are created in almost every functional program--for instance, when updating a purely functional data structure--though programmers are seldom aware of this. In fact, there are only a few algorithms that exploit sharing of nodes consciously. One example is constructing a tree in sublinear time. In this pearl we discuss an intriguing application of nexuses; we show that they serve admirably as memo structures featuring constant time access to memoized function calls. Along the way we encounter Boolean lattices and binomial trees.

### Representing cyclic structures as nested datatypes

Representing cyclic structures as nested datatypes describes a way to explicitly represent and preserve sharing in the special case where sharing only consists of backlinks.

We show that cyclic structures, i.e., ï¬nite or possibly inï¬nite structures with back-
pointers, unwindable into possibly inï¬nite structures, can be elegantly represented as
nested datatypes. This representation is free of the various deï¬ciencies characteriz-
ing the more naive representation as mixed-variant datatypes. It is inspired by the
representation of lambda-terms as a nested datatype via the de Bruijn notation.

### let, or symbolic lambda

If you look at it closely, functional program itself has lots of sharing using let, or in order words, the lambda abstraction.

original: (\f . f + (f + f)) (1 + 1)

essentially gives you a tree (if you view the + as a node constructor, and number as the leaf) of

unfolded: (1 + 1) + ((1 + 1) + (1 + 1))

You can easily have functions operates on the unfolded form as if it is an ordinary binary tree, while keeping its original form for data structure construction.

But again, I have been working on this for a while, AFAIK, there isn't a generic solution for all cases, unless you involve meta programming to symbolicallyy manipulate the lambdas. Template Haskell (Oleg's solution recently posted to Haskell mailinglist) only does it statically at compile time, which is limited in expressiveness.