## Bidirectional fold and scan

Bidirectional fold and scan, O'Donnell, J.T. In Functional Programming, Glasgow 1993, Springer Workshops in Computing.

Bidirectional fold generalises foldl and foldr to allow simultaneous communication in both directions across a list. Bidirectional scan calculates the list of partial results of a bidirectional fold, just as scanl and scanr calculate the partial results of a foldl or foldr. Mapping scans combine a map with a scan, and often simplify programs using scans. This family of functions is significant because it expresses important patterns of computation that arise repeatedly in circuit design and data parallel programming.

The bidirectional fold is a bit surprising at first, but the real reason I am posting this is to encourage discussion on the connection between functional and data parallel programming.


### Tail-recursive convolution

Here's a nice functional programming exercise in the style of this paper. Write a function that takes two lists [x1 x2 ... xn] and [y1 y2 ... yn] and returns their symbolic convolution [x1#yn x2#y(n-1) ... xn#y1] (where x#y is a pairing operation). The function should be tail recursive and do no more than n recursive calls. Hint: dataflow variables allow writing a straightforward tail-recursive append. (This is exercise 16 of chapter 3 in CTM.)
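For concreteness, here is one way the dataflow-variable trick can be sketched in Python (a solution spoiler, so skip it if you want to try the exercise first). The `Hole` class below is a hypothetical stand-in for an unbound dataflow variable, and the loop plays the role of the tail-recursive traversal:

```python
class Hole:
    """Hypothetical stand-in for an unbound dataflow variable: a one-shot cell."""
    __slots__ = ("value",)

def convolve(xs, ys):
    """Symbolic convolution [(x1, yn), ..., (xn, y1)] in one forward pass.

    Each step conses (x_i, hole_i) onto the result and records the hole;
    at the end, "unifying" the reversed hole list with ys binds hole_i
    to y_(n+1-i), which is exactly the dataflow-variable solution.
    """
    result = []   # [(x1, h1), ..., (xn, hn)], holes still unbound
    holes = []    # creation order: [h1, ..., hn]
    for x in xs:                              # the tail-recursive loop
        h = Hole()
        result.append((x, h))
        holes.append(h)
    for h, y in zip(reversed(holes), ys):     # the final unification Acc = Ys
        h.value = y                           # h_n = y1, ..., h_1 = yn
    return [(x, h.value) for x, h in result]  # read off the bound holes

print(convolve([1, 2, 3], ["a", "b", "c"]))  # → [(1, 'c'), (2, 'b'), (3, 'a')]
```

In Oz the final binding step is a single unification of the accumulated hole list against Ys, so no auxiliary traversal logic is needed; the Python version spells that unification out as a loop.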

### There and back again

The functional pearl "There and back again" by Danvy and Goldberg uses this problem as a motivational example.

Here is the first page:

# There and back again

Olivier Danvy and Mayer Goldberg

Dear Reader:

Before proceeding any further, could you first ponder on the two following riddles?

### Computing a symbolic convolution:

Given two lists [x1; x2; ...; xn−1; xn] and [y1; y2; ...; yn−1; yn],
where n is not known in advance, write a function that constructs

[(x1; yn); (x2; yn−1); ...; (xn−1; y2); (xn; y1)]

in n recursive calls and with no auxiliary list.

### Detecting a palindrome:

Given a list of length n, where n is not known in advance,
determine whether this list is a palindrome
in ceiling(n/2) recursive calls and with no auxiliary list.

-- Jens Axel Søgaard
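For illustration, the second riddle can be sketched in Python in the "there and back again" style: the recursion advances a slow pointer one step and a fast pointer two steps, and the comparisons happen on the way back from the middle. (This is only a sketch of the structure; Python list slicing copies, whereas the pearl's solution pattern-matches on cons cells in constant time.)

```python
def is_palindrome(xs):
    """There-and-back-again palindrome check: roughly ceil(n/2) recursive calls."""
    def walk(slow, fast):
        # Returns (ok_so_far, rest): rest is the back half still to compare.
        if not fast:                 # even length: slow has reached the middle
            return True, slow
        if len(fast) == 1:           # odd length: skip the middle element
            return True, slow[1:]
        ok, rest = walk(slow[1:], fast[2:])
        # On the way back, compare the front element against the back half.
        return ok and slow[0] == rest[0], rest[1:]
    ok, _ = walk(xs, xs)
    return ok

print(is_palindrome([1, 2, 3, 2, 1]))  # → True
print(is_palindrome([1, 2, 3]))        # → False
```

No auxiliary list is built: the "second half" that the returning calls compare against is just the suffix the slow pointer had reached.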

### Aha, but there is a difference

Yes, my exercise is a variation on the first of Danvy's exercises. The difference is that I ask for the function to be tail recursive. This difference makes a big difference!

### Does it matter?

The space requirements for n stack frames or n dataflow variables are both O(n). Is there really a difference?

### Constant factors sometimes make a difference

Well, is there a difference whether append is tail-recursive or not, given that the input list is already O(n)? In both the append and the convolution examples, the difference in space is (just) a constant factor. If the input list has 10 million elements, it can make a difference. But the purpose of my little exercise was to show an example of the expressive power of dataflow variables when used in sequential functional programming.

### Solution to both exercises

Note, though, that the solution on page 3 of the paper is tail recursive.

### Tail recursive but 2n recursive calls

> the solution on page 3 in the paper is tail recursive

Yes, but it does more than n recursive calls, because it has to call a helper function. My exercise explicitly asked for a single function that does just n recursive calls. (As a warm-up, write a tail-recursive append that takes just n recursive calls and that does not need any helper functions.)

### Tail-recursive append without helper functions

Here is a tail-recursive append written in Oz that does not need any helper functions:

    proc {Append Xs Ys Zs}
       case Xs
       of nil then Zs=Ys
       [] X|Xr then Zr in
          Zs=X|Zr
          {Append Xr Ys Zr}
       end
    end


Although this is a function, I have written it in procedural notation to highlight why it is tail recursive. (The procedural notation is a partial translation to the Oz kernel language, which is the process calculus underlying Oz.) This definition uses an unbound dataflow variable Zr, which is passed to the recursive call. Because Zr is left unbound before the recursive call, the list construction X|Zr can be done *before* the call, which makes the Append tail recursive. The above definition can be written in a functional notation:

    fun {Append Xs Ys}
       case Xs
       of nil then Ys
       [] X|Xr then X|{Append Xr Ys}
       end
    end


This has the same semantics as the previous definition. The symbolic convolution also uses unbound dataflow variables. Both the tail-recursive append and the symbolic convolution show the expressive power of dataflow variables for sequential functional programming. (Of course, concurrent programming is where dataflow variables really shine. These examples show that they are also useful for sequential programming.)
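As a rough Python analogue (with a hypothetical `Hole` class standing in for an unbound dataflow variable), the same hole-threading idea turns append into a loop that builds the result top-down, cons cell by cons cell, before the "recursive call" happens:

```python
class Hole:
    """Hypothetical stand-in for an unbound dataflow variable: a one-shot cell."""
    __slots__ = ("value",)

def append(xs, ys):
    """Build xs ++ ys top-down. Each cons cell gets a hole for its tail,
    which the next iteration fills -- mirroring the tail-recursive Append."""
    result = Hole()         # plays the role of Zs
    z = result
    for x in xs:
        zr = Hole()         # unbound tail, like Zr in the Oz code
        z.value = (x, zr)   # Zs = X|Zr, done *before* the next step
        z = zr
    z.value = ys            # base case: Zs = Ys (ys kept as a plain list here)
    return result

def to_list(cell):
    """Flatten the (head, tail-hole) chain back into a Python list."""
    out, v = [], cell.value
    while isinstance(v, tuple):
        out.append(v[0])
        v = v[1].value
    return out + list(v)

print(to_list(append([1, 2, 3], [4, 5, 6])))  # → [1, 2, 3, 4, 5, 6]
```

The loop keeps only constant extra state (`z`), which is the point of the exercise: the hole makes the construction tail recursive.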

### Was one thing that wowed me about Oz.

The naive translation of the functional notation to Alice ML would be:

    fun append (nil, ys)   = ys
      | append (x::xs, ys) = x::append(xs, ys);
    val x = append([1,2,3], [4,5,6]);
    inspect x;

But this doesn't capture what Oz is actually doing. Instead, you have to thread a promise through as a parameter, just as you did in your procedural code above:

    fun append (nil, ys, p)   = Promise.fulfill(p, ys)
      | append (x::xs, ys, p) =
        let
           val px = Promise.promise()
        in
           Promise.fulfill(p, x::(Promise.future px));
           append(xs, ys, px)
        end;
    val p = Promise.promise();
    append([1,2,3], [4,5,6], p);
    inspect (Promise.future p);

Definitely impressed me that in Oz you could express it in the functional style and the compiler would automatically translate it to the more procedural style. It means that you can be operating on the results before the function has completed, if you choose to push the function call to a separate thread. (Edit Note: Not to mention you get tail recursion for free.)
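To make the "operating on the results before the function is completed" point concrete, here is a rough Python sketch (my own, using a hypothetical `Promise` built from `threading.Event`) that mirrors the Alice ML code: the producer fulfils one cons cell at a time, and a consumer in another thread can already walk the fulfilled prefix, blocking only on tails that do not exist yet:

```python
import threading

class Promise:
    """Minimal single-assignment promise, sketched with threading.Event."""
    def __init__(self):
        self._done = threading.Event()
        self._value = None
    def fulfill(self, value):
        self._value = value
        self._done.set()
    def future(self):        # blocks until fulfilled, like a dataflow read
        self._done.wait()
        return self._value

def append(xs, ys, p):
    """Promise-threaded append, mirroring the Alice ML version."""
    for x in xs:
        px = Promise()
        p.fulfill((x, px))   # the head is visible before the tail exists
        p = px
    p.fulfill(ys)            # finally bind the last tail to ys (a plain list)

def to_list(p):
    """Consumer: walks the list, blocking on each not-yet-fulfilled tail."""
    out, v = [], p.future()
    while isinstance(v, tuple):
        out.append(v[0])
        v = v[1].future()
    return out + list(v)

p = Promise()
threading.Thread(target=append, args=([1, 2, 3], [4, 5, 6], p)).start()
print(to_list(p))  # the consumer runs concurrently with the producer
```

This is only a sketch of the idea; Oz gives you the same behaviour without the explicit promise plumbing.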

### Dataflow variables and monads.

I'm curious about the connection between dataflow variables (in the presence of residuating functions) and monads.

If we have a sequence:

f(w,x),g(x,y),h(y,z)

where f, g, and h residuate unless their first arguments are instantiated, we force a sequencing even though it is not explicitly required in the code.

Which leads me to wonder whether arrows are related to the sequencing of predicates with a more complicated dataflow-variable topology. (http://www.haskell.org/arrows/)

Dataflow variables are very similar to (and a generalization of) unix pipes, and unix pipes are very similar to monads:

http://okmij.org/ftp/Computation/monadic-shell.html

Has all of this been formalized?

----------------------------------------------------
If people never did silly things nothing intelligent would ever get done.
--Ludwig Wittgenstein

### Dataflow variables like pipes?

> Dataflow variables are very similar to (and a generalization of) unix pipes

I'm not sure we are using "dataflow variable" the same way here, but this doesn't seem right to me.

The ports/streams of message-passing concurrency are generalizations of pipes, but dataflow variables in the CTM sense are monotonic (once a value is set, it can't be taken away) and deterministic (they only have one observable value), which are not properties shared by pipes.

### fast and loose

Yeah I was playing a bit fast and loose with terminology, but it wasn't with the idea of a dataflow variable.

Data that goes through a unix pipe is completely untyped/unstructured, but in practice a line of text acts in much the same way as a dataflow variable. Lines of text are monotonic: once they go into standard out, we are stuck with that value. I think identifying an analogy between the set of valuations of a given dataflow variable in a non-deterministic language and the lines that pass through a unix pipe is a fairly accurate one.

Consider the following program:

echo "ls" | grep ls | sh

This program will list the current directory, and if a line corresponds to a dataflow variable, it could be written roughly as:

echo("ls",Y),grep(Y,"ls",Z),sh(Z,Output)

how about:

ls /bin/ | grep ls | sh

ls("/bin/",Y),grep(Y,"ls",Z),sh(Z,Output)

Here Y acquires a number of valuations, all of which are then "piped" through to grep, which kills off all but one and sends it to sh.

Anyhow, that was the correspondence I was referring to. The generalization is simply that the topology of the flow doesn't have to be totally linear.


### Ceci n'est pas un pipe

> I think identifying an analogy between the set of valuations of a given dataflow variable in a non-deterministic language and the lines that pass through a unix pipe is a fairly accurate one.

I guess I'm still not seeing this one. I can see how you are saying that a stream is like a sequence of single-assignment values.

However, the only access method in this instance is by sequence, whereas dataflow variables have naming.

Are you observing that declarative concurrency with dataflow variables imposes an observed order on evaluation? If so, I'd have to say that is sort of the whole idea.

### Still not what I meant.

I was trying to say that each line of text was like one valuation on a single dataflow variable, not that it's like a valuation on a sequence of dataflow variables without naming, i.e. f(x,y),g(y,z) where y takes on the successive values of each line of output from f. Pipes have naming as well; it's just that the names are restricted to identifying standard output, standard input, and standard error.

> Are you observing that declarative concurrency with
> dataflow variables imposes an observed order on
> evaluation? If so, I'd have to say that is sort of
> the whole idea.

Yeah, I think this is what I'm observing. Isn't this also the whole idea of monads and arrows? If so, is there a formal connection?


### Lambda: the Ultimate Sequencer

> If so, is there a formal connection?

I believe that the most primitive way (among the popular ones) of specifying the sequence of evaluation is CPS. Everything else is sugar on top of it ;-)

For a more serious account, see "From Direct Style to Monadic Style through CPS".
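As a tiny illustration (my own sketch, not from that paper): in CPS the order of evaluation is fixed by how the continuations are nested, much as dataflow dependencies fix it above:

```python
def add_cps(a, b, k):
    k(a + b)        # "return" by calling the continuation

def mul_cps(a, b, k):
    k(a * b)

# Sequence the two steps explicitly: mul runs only once add has delivered x.
add_cps(1, 2, lambda x:
    mul_cps(x, 10, lambda y:
        print(y)))  # → 30
```

Swapping the nesting swaps the evaluation order, which is exactly the sequencing role that `>>=` plays in a monad.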