Folding neither Left nor Right (or how to avoid overspecifying the solution to a problem)

Can anyone tell me which functional languages support a non-order-specific fold, and what those operations are called? I read somewhere that reduce is sometimes non-order-specific, but other places claim it is the same as foldl. Clearly it depends on the language, but I don't trust those sources, so I thought I'd ask my favourite group of egg-heads. :-)

It seems that specifying a left or right fold when the function being folded is associative over-specifies the solution to a problem. In other words, you are solving a more specific problem than the one stated. That is commonplace in imperative code, but it should be more easily avoided in functional code. Perhaps this is moot, because it might be an "easy" problem for a compiler to figure out whether a "clean" (i.e. purely functional) function is associative and/or commutative based on its primitives.

I am studying the problem of optimizing/parallelizing pure functional code, so any good pointers to basic primers on the internet would be much appreciated. I am particularly interested in the common-knowledge optimizations of functional code. I don't have time to purchase any books, so online references would be best. It would be cool to gather a compendium of functional optimization tips and tricks in this thread, but that might be hoping for too much.

I'm not exactly sure I see

I'm not exactly sure I see how you could have a sort of "neutral" fold like you speak of. The reason you have foldl and foldr in, for example, Haskell is that when you pass a function to a fold, the associativity of the function becomes essentially irrelevant: the list you are folding over is decomposed one element at a time, and the function is applied to that element along with the result of its previous application.

So, for example, although one can talk about the associativity of (-), and say that the evaluation of:

0 - 1 - 2 - 3 - 4

could be interpreted either as:

(((((0 - 0) - 1) - 2) - 3) - 4)

or

(0 - (1 - (2 - (3 - (4 - 0)))))

... for foldl and foldr respectively, in terms of the implementation of fold this is somewhat misleading, because the fold functions cannot determine the associativity of the function or operator passed to them. You have to specify that explicitly, which is partly what is meant by foldl vs. foldr. In fact, in a functional language based on the lambda calculus (which is nearly all of them), function application is always left-associative, and it's only operators that may have a different associativity. But when you pass operators to functions, from that point on they are treated like any other function, so their associativity is, again, basically lost in the mix.
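To make that concrete, here is a sketch of the standard Prelude-style definitions (ignoring the Foldable generalization); notice that the grouping is fixed by the recursion pattern, not by the operator passed in:

```haskell
import Prelude hiding (foldl, foldr)

-- foldl nests applications to the left, foldr to the right.
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl _ z []     = z
foldl f z (x:xs) = foldl f (f z x) xs

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z []     = z
foldr f z (x:xs) = f x (foldr f z xs)

-- foldl (-) 0 [0,1,2,3,4] == -10   -- (((((0-0)-1)-2)-3)-4)
-- foldr (-) 0 [0,1,2,3,4] == 2     -- (0-(1-(2-(3-(4-0)))))
```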

If there were a way for the fold function itself to determine, at the runtime level, what the associativity of the operator passed to it was, then I suppose you could have it determine whether to fold left or right according to that information. I don't know of any languages that do this sort of thing though. However, you can "fake" it if you like by creating an ADT that encodes a function along with an associativity, and then passing that to a sort of "intelligent" fold. Not a very elegant solution though, in my opinion.
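A minimal sketch of that ADT idea (all names here are made up): the wrapper records which direction the operator expects, and a single fold dispatches on the tag.

```haskell
-- Hypothetical sketch: tag an operator with its intended fold direction,
-- and let one `fold` dispatch on the tag. Nothing checks the tag is right.
data Tagged a = FoldsLeft  (a -> a -> a)
              | FoldsRight (a -> a -> a)

fold :: Tagged a -> a -> [a] -> a
fold (FoldsLeft  f) z = foldl f z
fold (FoldsRight f) z = foldr f z
```

As noted above, nothing stops you from tagging a non-associative operator wrongly, which is exactly the inelegance of this approach.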

Anyway, I don't know if any of that helps, but maybe it does :)

Isn't "reduce" another name for "fold" rather than "filter"

Hi Darin,

Thanks for your response.

I have never seen "reduce" called "filter". Can you point me somewhere on the web where the terms "reduce" and "filter" are used to mean one another?

I realize that foldl and foldr are specifically order-dependent by their very definition in Haskell. My issue is that, as a programmer, I would rather define a function such as "sum" without specifying whether to use foldl or foldr, since both end up with the same result. The idea is that if I don't specify the order, then the compiler can theoretically optimize and parallelize the code more easily.

Maybe I am barking up the wrong tree here.
:-)

They don't necessarily give

They don't necessarily give the same result in Haskell - foldl never terminates given an infinite list for input, whereas foldr can. Consider folding (&&) over an infinite list with a False in it, for example, or using the fold to build a data structure not all of which is evaluated.
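For example, folding (&&) over an infinite list terminates with foldr but not with foldl, because (&&) never inspects its second argument once the first is False:

```haskell
-- foldr can stop early: (&&) is lazy in its second argument, so the
-- recursion below the first False is never forced.
allOf :: [Bool] -> Bool
allOf = foldr (&&) True

-- allOf (False : repeat True) returns False;
-- foldl (&&) True (False : repeat True) loops forever.
```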

Good point

Good point, thanks for bringing it up.

speaking of names: filter -> sieve

I think of filters as stateful processors of streams, e.g. low pass filters, etc. Thus I think a better name for "filter" is "sieve" since it eliminates things from the stream (Smalltalk calls it "select"). When writing stream processors that can also be signal processors, using the name "filter" is confusing to people who are more familiar with signal processing than functional programming.

Folding

When I've used an unspecified fold, I've called it "fold", i.e., neither l nor r. But using this function carries a proof obligation, since the implementation can assume that the operator is associative, and that cannot be machine checked (in general).

BTW, the "sum" function gives the same result with foldl and foldr if you add up Ints, but certainly not Doubles. (For Double you should really use the Kahan summation algorithm: http://en.wikipedia.org/wiki/Kahan_Summation_Algorithm)
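For reference, here is a sketch of Kahan (compensated) summation in Haskell, using a strict left fold:

```haskell
import Data.List (foldl')

-- Compensated (Kahan) summation: carry a correction term c that
-- recovers the low-order bits lost in each floating-point addition.
kahanSum :: [Double] -> Double
kahanSum = fst . foldl' step (0, 0)
  where
    step (s, c) x = (s', c')
      where
        y  = x - c          -- apply the running correction
        s' = s + y
        c' = (s' - s) - y   -- the bits of y lost when adding it to s
```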

i'm obviously clueless

How about typing operations so that the fold knows what kind of associativity they respect? Then fold does the right thing based on that. Sort of like Monads: they aren't proven by the compiler; whoever writes one has to make sure not to blow up the contract.

Sure

You could have a type of associative operators, and ways to combine them that maintain associativity, and then some way to inject functions into this type where you take on a proof obligation. It could be helpful.

But the comparison with the Monad class isn't quite right. As far as I know, no Haskell compiler uses the (unproven) monad laws for any transformation. So if you lie to the compiler about something being a monad then nothing bad happens (which is lucky since many types in the monad class aren't. :) )

"associativity" vs. "associates"

Close, but not quite. It's rather perverse that these two forms of the same word have rather different technical meanings. You seem to be getting these meanings confused.

First, if you talk about the "associativity of subtraction," most mathematicians would take that to mean "subtraction is associative," which means that for all x, y, and z:

x - (y - z) == (x - y) - z

You seem to understand that this is not the case. In fact, it's easy to use elementary algebra to prove that this equality holds if and only if z = 0. (Although this isn't the case in group theory.) Therefore, there is no "associativity of subtraction," at least when you are talking to a mathematician.

On the other hand, there are the conventions that define the standard order of operations. Somebody will say that "subtraction associates to the left" to mean that when we write

x - y - z

we actually mean

(x - y) - z

This is purely convention, no more and no less. "associates" has nothing to do with fold-left vs fold-right.

On the other hand, "associativity" is quite relevant to fold-left and fold-right. Namely, when passed an associative operator, fold-left computes the same value as fold-right when at least one of these additional conditions is true:

1. the initial element is the identity element for that operation
2. the operation is also commutative.

If the operation is not associative, all bets are off: fold-left will generally not compute the same thing as fold-right.
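(++) illustrates condition 1: it is associative but not commutative, so the two folds agree exactly when the seed is the identity []:

```haskell
-- (++) is associative but not commutative.
-- With the identity [] as the seed, foldl and foldr agree:
agreeL, agreeR :: [Int]
agreeL = foldl (++) [] [[1],[2],[3]]   -- [1,2,3]
agreeR = foldr (++) [] [[1],[2],[3]]   -- [1,2,3]

-- With a non-identity seed they differ:
diffL, diffR :: [Int]
diffL = foldl (++) [0] [[1],[2],[3]]   -- [0,1,2,3]
diffR = foldr (++) [0] [[1],[2],[3]]   -- [1,2,3,0]
```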

"fully"

I've generally found it easiest to explain by calling associative operations "fully" associative instead of "left" or "right" associative. People seem to get that pretty easily.

Subtraction

(Although this isn't the case in group theory.)

Depends on the point of view really:

(a) If one considers '-' as a binary operation [over a set of integers], then the structure is not a group but rather a quasigroup without identity (similarly, nonzero rationals under division).

(b) If one considers '-' as a notation for the inverse element, then we have a group [of integers] under addition where the '+' operation is implied:

a - b - c  =  a + (-b) + (-c)

Neutral folds

A compiler for a Turing-complete language can't ascertain whether an arbitrary function of two variables is commutative. So, could we hope to come up with an incomplete but conservative solution for determining commutativity that succeeds in simple cases? Unfortunately, even that seems intractably difficult to implement, and even then, only likely to work in the most simplistic of cases.

An area for research

So does this mean it is a good area for research? One solution might be to allow programmers to provide attributes for functions with regard to their properties (e.g. associative, commutative).

Go for it

I'm sure it's worth researching for your own benefit, and you just might come up with something. :) I really don't know what has or has not been done with your question, but I can offer generic advice.

It's pretty well known that you can use associativity and commutativity properties to optimize folds, especially really large folds across multiple machines. As you suggest, you could declare something to be associative and commutative, though I certainly would not trust a hurried programmer to create these declarations.

Tim's right. Associativity and commutativity are undecidable properties in general. Unchecked declarations might be the way to go if you are looking to get something done, but this probably isn't a viable line of research.

There are interactive theorem provers that help humans generate a machine-checkable proof of associativity and commutativity. This is a possible application of proof-carrying code, as a means of safe optimization rather than a means of security. (Though the two are interrelated.)

Another tactic for dealing with undecidability is to restrict your language. Can you find a restricted language that produces only associative and commutative operations, and is still sufficient to express most or all operations of interest efficiently? Can you restrict your language so that a fully automated theorem prover can decide whether or not something is associative and commutative?

Regular expressions, SQL, HTML, CSS, and PDF are examples of languages that sacrificed Turing-completeness in order to gain something else. (Such languages also have a tendency to creep back toward Turing-completeness.)

Finally, you may decide that you don't really need associativity and commutativity: your conceptual domain and range may actually be a quotient type. Then you should prove that the functions you define are in fact well-defined.

Associativity, Commutativity, and Quotients

A language which supports declaring the associativity and commutativity of operators in a non-trivial way is OBJ3, and presumably other members of the OBJ family. It is covered in section 2.4.1 of Introduction to OBJ3 (the correct link is to the PDF; there is also a link to a PS version on the page). In general, the OBJ family is interesting.

Leon's remark at the end makes me again bring up Constructing Polymorphic Programs with Quotient Types. A function accepting an unordered pair would be a commutative function.

Crazy idea

The typical PLT approach would be to define an algebra of symmetric functions, enabling you to build up any symmetric function from a set of primitive symmetric functions. Then the primitive symmetric functions could be certified to be symmetric by the compiler and off you go.

It turns out there's a much simpler approach that works in many cases.

Suppose we are dealing with numerically-valued functions, or more generally, functions taking values in an integral domain. Let f be such a function. We say that f is symmetric when f(x,y) = f(y,x). Note that this is equivalent to 1/2(f(x,y) + f(y,x)) = f(x,y). For any function f, symmetric or not, 1/2(f(x,y) + f(y,x)) is called the symmetrization of f, since it is always symmetric in x and y. But only when f is symmetric does f equal its symmetrization.

Underlying the symmetrization construction is the symmetrize combinator, which we could define in Haskell as

symmetrize :: Fractional c => (a -> a -> c) -> a -> a -> c
symmetrize f x y = (f x y + f y x) / 2

Suppose that this symmetrize combinator was provided by a language's standard library and certified to produce only symmetric functions. (Not unlike how a memoization function in a standard library is assumed, from the outside, to be side-effect free.) Then there could be a neutral fold taking only folders produced by symmetrize. The programmer's burden shifts to proving f = symmetrize f externally to the type system.

You can play similar tricks for associative functions using so-called associators.

Aggregation and databases

The operation you want is called aggregation, which, abstractly, takes a multiset, a binary operator that is associative and commutative over the members of the multiset, and the identity element for that operation. The sum and the product of the elements of a multiset are the aggregations of addition and multiplication, respectively, over that multiset.

SQL databases implement aggregate functions like SUM, PRODUCT, MIN, MAX, etc., but not a higher-order aggregation function. This is in fact highly relevant to you, since SQL is after all a declarative language that gets compiled to an execution plan, and many database engines do in fact exploit the commutativity and associativity of the underlying binary operations to optimize and parallelize the SQL aggregate functions; that is, the problem you're interested in here is intimately related to many database query optimization problems.

Aggregation is also of fundamental importance to dimensional modeling in OLAP database applications.

So if I understand you,

So if I understand you, aggregate atomic operations are mostly ignored in many functional languages? In the Google map/reduce framework, do they parallelize the reduce function, I wonder?

So if I understand you,

So if I understand you, aggregate atomic operations are mostly ignored in many functional languages?

I'm not sure I understand this question. Anyway, I think the deal here is that most functional languages use lists as their most common data structure. Fold right is a very natural operation over a list, since it essentially performs a recursive computation that's isomorphic to the list it's given (the good old "replace cons and nil with your choice of function and value" description of fold right).

Aggregation, on the other hand, is an operation over sets or multisets (a.k.a. "bags"), a data structure that functional languages have not tended to emphasize as much as lists. Relational query languages, on the other hand, have the relation, a set of tuples, as their fundamental data type.

So I think one approach to what you're interested in would be for the language to provide abstract set and multiset datatypes, and higher-order aggregation combinators over them. The contract here is that while there may be a way to construct a set recursively, when you deconstruct the same set you can't necessarily recover the order in which you constructed it (which is what makes sets and lists different). Then the underlying concrete data structures, and your implementation of the aggregation combinators, can be such that the operation parallelizes.
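A sketch of that contract in Haskell (names hypothetical): the bag's constructor would be hidden behind a module boundary, so clients cannot observe construction order, and the aggregation is free to reduce divide-and-conquer style:

```haskell
-- Hypothetical multiset with an order-unspecified aggregation.
-- In a real module the Bag constructor would not be exported.
newtype Bag a = Bag [a]

fromList :: [a] -> Bag a
fromList = Bag

-- Contract (unchecked): f is associative and commutative, z its identity.
aggregate :: (a -> a -> a) -> a -> Bag a -> a
aggregate f z (Bag xs) = go xs
  where
    go []  = z
    go [x] = x
    go ys  = go l `f` go r          -- the halves could be reduced in parallel
      where (l, r) = splitAt (length ys `div` 2) ys
```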

Didn't the Kleisli Query System do this?

So I think one approach to what you're interested in would be for the language to provide abstract set and multiset datatypes, and higher-order aggregation combinators over these.

If I recall correctly, Kleisli took this approach, including primitive structural recursion operators for lists, sets and bags. These primitives were also exposed via comprehension syntax for each type of collection.

I think Kleisli deserves a bit more attention for its efforts to bridge databases and programming languages.

In the Google map/reduce framework, do they parallelize the reduce function, I wonder?

Google's map/reduce requires the user to supply a custom partitioning function. The partitioning controls the distribution of the data before the reduce step; it has to be intelligent enough to guarantee that the distribution does not break associativity.

Besides that, reduce operations that are both commutative and associative can be optimized through an additional interface called a combiner.

I guess such decisions are deliberately burdened onto the user because a generic solution (which might be transparently implemented on top of fold[lr]) cannot automatically decide distributability.

"insert" in APL & J

I don't think this answers your question, but it's another way of looking at folds. In J, the "insert" operation works by inserting the specified verb between all the elements of a list. So "+/" applied to "1 2 3 4 5" means "1 + 2 + 3 + 4 + 5". "/" is the insert operator; "+" is the verb to insert. I've always found this easier to explain than fold.

That is indeed useful. I

That is indeed useful. I want to write an introductory programming text at some point, and I find this a useful way to think about fold/reduce.

Foldr

That is indeed useful. I want to write an introductory programming text at some point, and I find this a useful way to think about fold/reduce.

Isn't this how everyone explains foldr? That is, you think of foldr f a applied to a cons list as replacing the conses by f and the nil by a. This then motivates how to define a foldr equivalent for other algebraic data types.

The J-style way of thinking about the 'over' adverb is only correct when the verb you're using is associative. But I don't think the usual functional programming explanation of foldr (as described above) is any less intuitive than James's explanation.

But the APL operation...

...performs n-1 binary operations on an array of size n, while fold performs n binary operations on a list of size n. Of course, they turn out to be equal for some choices of binary operation.
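In Haskell terms, the J/APL insert corresponds to foldr1 (n-1 applications, no seed) rather than foldr (n applications with a seed):

```haskell
-- foldr1 inserts the operator between the elements, like J's "/":
insertStyle :: Int
insertStyle = foldr1 (+) [1,2,3,4,5]     -- 1 + 2 + 3 + 4 + 5

-- foldr additionally combines with a seed:
seedStyle :: Int
seedStyle = foldr (+) 0 [1,2,3,4,5]      -- 1 + (2 + (3 + (4 + (5 + 0))))
```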

Data structure API contracts

While not exactly the same as what you asked, I've dealt with this situation in Edison by introducing a family of folds whose order is unspecified by the API contract. Concrete data structure implementations are then free to fold in any order that is convenient.

Of course, the behavior is fixed for any particular concrete implementation.

SAC Array Fold

The array fold operation in Single Assignment C has an unspecified order of evaluation. I believe this is solely so that the compiler can optimise the heck out of programs that use it.

It's not just the direction

Thinking of foldl and foldr as "the same, but in different directions" is useful, but not the whole story. In reality, both work from the left, as dictated by the structure of lists, but foldr is plain recursion while foldl is accumulator recursion. The difference in behaviour is quite noticeable.

You could of course have a symmetrical fold, with a signature like

fold :: (b -> b -> b) -> (a -> b) -> b -> [a] -> b,

which could split the input sequence at any point it likes. But of course, for plain lists, you just get foldr again. For a different sequence type, say something tree-like, you could implement a divide-and-conquer scheme, which is actually a good idea. I've seen it implemented at least twice: MPI has it (sort of; MPI is a C API and allows only some simple reductions, but it does implement them in parallel and it makes sure they are indeed associative), and Dessy (a library of "Democratic Sequences" by Robert Will) had it in full when it was available.

However, the skewed reductions (foldl and foldr) are just so damn useful that you'd provide them anyway. And the symmetric reduction might need a counterpart with an accumulating parameter, though it's not entirely obvious what the natural way to implement that would be.
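For plain lists, the symmetric fold above can be sketched as follows; splitting in the middle gives the balanced divide-and-conquer scheme (correct only when the combining operator is associative):

```haskell
-- Symmetric fold: `op` combines sub-results, `one` injects a single
-- element, `z` is the result for an empty sequence. The split point is
-- an implementation choice; this version halves the list each time.
fold :: (b -> b -> b) -> (a -> b) -> b -> [a] -> b
fold _  _   z []  = z
fold _  one _ [x] = one x
fold op one z xs  = fold op one z l `op` fold op one z r
  where (l, r) = splitAt (length xs `div` 2) xs
```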

And the direction is already in the list

In reality, both work from the left, as dictated by the structure of lists, ...

For that matter, some of the inherent asymmetry is already in the structure of the lists. There is an intuitive, pre-theoretic understanding of a list as neither left- nor right-recursive, but more symmetrically just an (ordered) collection. Think of an array rather than a cons- or snoc-list. Or better, picture a list as a row of cells, rather than as a "tree" built from cons cells.

I came from an APL background and had that intuition about arrays when I first started using ML and Miranda; the need to commit to a left- or right-recursive understanding of lists was a difficult part of the transition for me. Of course, now it's hard to get back to that other way of thinking; now I find it hard to even think of the structures without thinking of the traversals as an inherent part of them.

Endomorphisms

The problem with a non-specific fold is that, for it to be well-defined, it must be over an operation + that is associative; and for that to be well-typed, + must be an endomorphism: (a + b) + c = a + (b + c) requires that the types of the outer +'s match, which demands that the types of both operands be the same, hence + : S x S → S. You cannot lift Cons to an endomorphism, since its two arguments have different types: Cons : α x List(α) → List(α).

However, you can morph the structure of nonempty lists built from the singleton constructor and list concatenation:

• [_] : α → List(α)
• ++ : List(α) x List(α) → List(α)

Now define fold as

fun fold (+) [x]      => x
  | fold (+) (m ++ n) => (fold (+) m) + (fold (+) n)


Then fold (+) is a semigroup homomorphism whenever (A, +) is a semigroup. It is clear that the source of the "non-specificity" in the fold is the indeterminate choice of where exactly to deconstruct a list l into a catenation m ++ n.

Parametrizing by direction (left or right)

You may be interested in The Essence of the Iterator Pattern, which featured here before.

Idioms are a generalization of Monads, and they allow an idiomatic map operation (a map with effects) for a very large family of data types.
Of interest to you is the fact that all idioms have a dual idiom (in the paper it is called the Backwards idiom). By default an idiom will propagate effects from left to right, but with Backwards it will do it from right to left (this is discussed briefly in Section 4.3).

Now, accumulations (or, in other words, reductions) are just one instance of idioms, so you can get reductions from the left or from the right (these are just generalizations of foldl and foldr for all Traversable "containers"). On page 11 of this presentation you can see these two functions: reduceL and reduceR. There they are already specialized, but we could instead have defined:

reduce :: Traversable t => IAdapter (Acc b) -> (a -> b -> b) -> b -> t a -> b
reduce dir f = flip (runE . paccum dir (E . f))

The parameter of type "IAdapter (Acc b)" is the direction you wish. So, you could have a function:

sum :: IAdapter (Acc Int) -> [Int] -> Int
sum p = reduce p (+) 0

that does not specify whether the traversal is left-to-right or right-to-left, delaying that decision to the calls of "sum". Perhaps more interestingly, the function

sub :: IAdapter (Acc Int) -> [Int] -> Int
sub p = reduce p (-) 0

depending on which direction you choose, will give you different results.

The point is that while verifying properties such as associativity is a hard problem, parametrizing the way the binary function associates is not. This gives you a way to specify functions without committing to either Left or Right.

Thanks to everyone who has responded, it is much appreciated. These are very illuminating posts.

Syntactic sugar

Is the difference between foldl and foldr just the order of the parameters to the function? If so, then you can make a "universal" fold with just a little syntactic sugar to tell which parameter is the seed and which is the list, putting them in the order they will be used. Something like:

let fold f a b = match a, b with
| (seed z), xs => fold_right f z xs
| xs, (seed z) => fold_left f z xs


That is what I do in dodo.

No, it's not just taking

No, it's not just taking arguments in a different order.

foldl c n = foldr (flip c) n . reverse

foldr isn't quite definable in terms of foldl (in a lazy language).
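As a sanity check, the reverse/flip encoding really does agree with foldl on finite lists (and only on finite lists):

```haskell
-- foldl in terms of foldr, via reverse and flip; valid for finite lists.
foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr c n = foldr (flip c) n . reverse

-- foldlViaFoldr (-) 0 [1,2,3] == foldl (-) 0 [1,2,3] == -6
```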

NESL

NESL is a data-parallel functional language. It has a parallel fold, which is actually called "scan", the more general operator. It folds an associative binary function over a vector in O(log n) depth, and returns both a vector of intermediate results and the final result.

You might find http://www.cs.cmu.edu/~scandal/nesl.html useful reading, though admittedly the web-page is a bit old.