"Folding"/FP traversal over trees

I've been working on defining a "fold" operation for trees.

There was some helpful discussion in the past "Folding over trees", but I haven't been able to find anything beyond the XML-type examples suggested there.

My problem: I want to have more control of the traversal. That is, when a certain node is reached I want the possibility of skipping further descent, or reentering an already-traversed node.

So yeah, this isn't actually a "fold". Maybe its a "visitor with control"?


Basically I'm trying to cleanly separate
a) details of the traversal process specific to the data structure (how to get a child, etc.)
b) processing of the current node
c) control of the next step in the traversal

Olin Shiver's paper "The Anatomy of a Loop" (previously on LtU) discusses a general framework for separating these kinds of concerns.. But I'm looking for something a bit simpler.. (and am doing this in C++ besides)

So... any suggestions on general FP approaches to this kind of problem? [hopefully this isn't OT!]


I'm currently using something like the foldts used in the prior discussion, modified so that the seed IS the function (or equivalently the function is 'apply'). (is this a monad?)
These seed-closures then return an extra value indicating whether they want to continue
in the regular traversal order: (schemeish syntax here)

(seed 'fup node)   => (next-seed, reenter-children?)
(seed 'fdown node) => (next-seed, skip-children?)
(seed 'fhere leaf) => (next-seed, skip-the-rest-of-the-children?)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

iterators

iterators fulfill all your abstraction requirements.

see Design Pattern for the Implementation of Graph Algorithms for a thesis focusing on graph algorithms and abstraction, or the boost graph library if you are interested in a particular c++ implementation.

a purely functional implementation should be easily derived thereof, because through the abstraction state is wrapped up nicely.

Sounds Like Oleg Again

You may wish to see Towards the best collection traversal interface and Generic Zipper and its applications.

Since you're trying to do functional programming in C++, I highly recommend a look at Phoenix V2.

Phoenix V2

Wow! Thanks.

Zippers in C++?

I don't see one in Phoenix or anywhere else.

Do you know of an implementation?

TBD

It's still a goal of mine to do a generic zipper in O'Caml and C++. The C++ case is complicated by the lack of monad support even in Boost Lambda or Phoenix. I don't believe the issues to be insurmountable, and in fact the Syntax extension for Monads in Ocaml will no doubt inform the implementation, because O'Caml doesn't natively support monads, either, yet here we have monads including Oleg Kiselyov's delimited continuations, the point behind the addition of which, from my point of view, is precisely to support the generic zipper, which Oleg characterizes as "a delimited continuation reified as a data structure."

conditions should be composable

Like all good designs the key is separations!

Separate the "accumulate state" that may depend on the history of the traversal, the conditions that are used to decide or not if a sub-tree should be visited, and finaly (and more obviously) the algorithms that do the traversals.

Returning a triplet of booleans to indicate what should be done at each tree node to indicate if the left, righ or this element should be "processed" has the advantage that two different conditions of iterations can easily be "and"ed or "or"ed together.

I am sure this is a standard DB trick because it is then quite easy to compile a multi-condition tree query by generating each tree traversal function and composing them together with the same logical condition as the query.

Example: generate an iterator to find x GT a and another to find x LT b, now compose then with and "and" and you get an iterator for a LT x LT b. (sorry for the fortran but I tried to use the less then, greater symbols and got the HTML formating instead).

Note that this does not address the order in which the traversal is done. You could add another attribute: left, right, don't care and complain if you try to compose incompatible iterators (left with right for example)

Algorithms on more than one iterator can be seen as joins.

I implemented this in C++ templates a few years back and last time I looked was still surprized not to have seen a similar solution in boost yet (didn't check lately).

If you want to control the

If you want to control the traversal internally, what you can do is use lazy evaluation to let the user make the the choice of how to traverse the data structure:

type 'a tree =
  | Node of 'a tree * 'a * 'a tree
  | Leaf

let rec fold node leaf tree =
  match tree with
  | Node(l, x, r) -> node (lazy (fold node leaf l)) x (lazy (fold node leaf r))
  | Leaf -> leaf

val fold : ('a lazy_t -> 'b -> 'a lazy_t -> 'a) -> 'a -> 'b tree -> 'a 

The trick in this definition of fold is that the recursive calls down the tree are suspended in thunks, so that the client can choose whether to force them or not. Haskell is nice in this regard because this happens automatically.

Now, you can get a preorder or postorder traversal with no particular difficulty (these functions are deliberately inefficient to be concise):

let preorder tree =
  fold (fun l x r -> x :: (Lazy.force l @ Lazy.force r)) [] tree

let postorder tree =
  fold (fun l x r -> (Lazy.force l @ Lazy.force r @ [x])) [] tree

Likewise, if you are multiplying the elements of a tree, and want to stop evaluating as soon as you find a 0, you can do it like this:

let multiply tree = 
  fold 
    (fun l' x r' -> 
       if x = 0 then 
         0
       else 
         match Lazy.force l' with
         | 0 -> 0
         | lval -> x * lval * (Lazy.force r'))
    1
    tree

separation of control, structural recursion vs. loops vs. HOFs

Actually, I initially went about this by passing a closure to the Node function.

I abandoned this aproach because it still embeds the control-flow within the node-evaluation function.
Of course, this is basically the essence of any recursive function, but I want to separate out the recursion from the processing. Perhaps this is just misguided?

Again, I refer to "Scoping based on control flow graph" (AKA "The Anatomy of a Loop").

I see that you (neelk) advocated for structural recursion in that discussion as well:

However, I'm only partly convinced by his case for his loop language. The reason is that I think pattern matching gives you a simpler alternative. If you program with structural recursion -- that is, if every recursive call is to variables that were revealed in a pattern match, then the termination property is clear, and evident in the code. This even scales to handle "iteration-protocol" style operations, by writing a function that enumerates the elements as a lazy list.

Perhaps you're right -- given a language where the structural recursion is cleanly expressed in syntax.
I definitely agree with you that:

[Packaging up the recursion into a higher-order function] is great, but there's a fly in the ointment. Inductive proofs all structure the induction slightly differently each time (there's induction, course-of-values induction, structural induction, and so on), and your HOF encodes exactly one such induction scheme. Contorting your code to make it fit can be unnatural.

I guess I really want to reopen the previous discussion (as I tried before! :-) ).

I also find this quote from Benjamin Cutler (in the same discussion) interesting, especially from the C++ perspective:

What I'm trying to say is, for somebody that understands what fold does, the function that uses fold in its definition seems elegant and simple. For somebody that doesn't understand what fold does, it seems mysterious and complicated. On the other hand, any sort of loop definition will be simple to understand for both people.

"Strategic Programming"?

"... I want to separate out the recursion from the processing. Perhaps this is just misguided?"

This sounds quite similar to what some have called "strategic programming" (as seen on LtU in 2002. Here's the start of the abstract from Lämmel and Vissers's paper:

Strategic programming is generic programming with the use of strategies. A strategy is a generic data-processing action which can traverse into heterogeneous data structures while mixing uniform and type-specific behaviour. With strategic programming, one gains full control over the application of basic actions, most notably full traversal control.

mapReduce

Thank you for the excellent paper.

Upon looking at the author's other publications, I was reminded of Google's MapReduce (prior LtU discussions here and here).
This is a good example of a case where the traversal must be strongly separated from the per-element processing, since the traversal is parallelized across a network. (though I don't think Lämmel's mapReduce paper makes reference to "strategies")

Excerpt from The Essence of Strategic Programming (italics in original):

The established programming paradigms provide little support to effectively separate
traversal control and basic actions. We contend that there is not just a lack of
expressiveness but also a lack of a general design method for traversal functionality. It
is common practice to implement traversals according to one of the following options:

a rely on a built-in normalisation strategy, or
b entangle traversal and basic computations, or
c compromise the primary, typed representation of data.

Option (a) is common practice in rule-based programming (e.g., algebraic term
rewriting, expert systems, abstract state machines). Here, the programmer supplies a
set of computation rules and relies on the system’s built-in evaluation strategy (e.g.,
leftmost-innermost) for applying these to the relevant parts of the input data. This approach
breaks down when extra precautions are required to enforce termination, restriction
to a scope, or side conditions involving extra data. Then, programmers end up
adhering to option (b), that is, they encode traversal functionality in computation rules.

Option (b) is common practice in standard functional programming. Here, the programmer
encodes traversals as families of mutually recursive functions via patternmatch
equations. In this approach, the traversal strategy is heavily entangled with the
basic actions
as illustrated in Fig. 1. Virtually, the same could be said about the basic
visitor technique in standard object-oriented programming. [...]

Option (c) suggests to operate on an untyped representation type, maybe even in
a basically untyped environment such as Prolog. This approach suffers from a lack of
the very valuable static checks [...].

[..]

2 Strategic traversal = basic actions + traversal control
The key idea underlying strategic programming is the separation of problem-specific
ingredients of traversal functionality (i.e., basic actions) and reusable traversal schemes
(i.e., traversal control). In order to enable the definition of traversal schemes, an incarnation
of strategic programming provides primitives for ‘one-layer’ traversal that cater
for generic access to the immediate components of heterogeneous data structures.

The traversals they build up from a few combinators are wonderfully succinct. Now I'll have to look at whether they're too succinct.. ;)