Structural recursion on sets


I am not sure I am entirely happy with the usual structural recursion on sets approaches. They all look like a bit of cheating, i.e. using lists under covers, broadly speaking.

I mean SRI/SRU/powerset/etc tricks (

E.g. in the SRI method (collection insertion representation), we can do matching using the insertion operator that 'disassembles' a non-empty set thereby maing it amenable to something like foldr.

On the other hand, the 'choose' operator is criticised (elsewhere) as 'non-deterministic'. I am aware of various laws and such that make SRI 'work', I just have a feeling that the insertion operator is a kind of deus ex machina as much as 'choose' is !

Please comment.


Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

What is your beef, exactly?

I'm assuming you mean finite sets, so the axiom of choice doesn't come into play.

Are you concerned that "iterating" over a set imposes an ordering (whether non-determinstic, determined by some "secret sauce" such as a hash index or an ordering function like Erlang imposes), etc?

Here's a way to soothe your mind. Map and filter can easily be viewed as parallel operators when going from set-to-set, with traversal order an unimportant implementation detail when implementing on machines incapable of processing every element in parallel.

With fold[l|r], it depends on the communativity and associativity of the operator in question. If the operator is both communative and associative, the results of the fold are order-independent. Otherwise, the results of the fold *are* order-dependent, and traversing a set with fold without specifying an ordering produces non-determinstic results. So indeed, you shouldn't do that. :)

Some languages, of course, feature sorted sets in which you can specify an ordering. Of course, if the order is partial, you may still have trouble.

No, no AC ;)

Are you concerned that "iterating" over a set imposes an ordering (whether non-determinstic, determined by some "secret sauce" such as a hash index or an ordering function like Erlang imposes), etc?

Yes, more or less (see my earlier response to neelk).

Map and filter can easily be viewed as parallel operators

Yes, they can be viewed so, but we just pretend that the ops are parallel and over sets whilst under covers we work with lists. It is harder to pretend that deconstructing a sri set, when computing its cardinality for example, is parallel. If you have the intuiton for that, please share.

So, my concern with structural recursion over sets is implementational and perhaps philosophical rather than theoretical. We know that the 'choose' or 'sri' primitive is arguably not practically viable and yet we reason as if either is. Sort of a bit of cheating, eh ?

I do appreciate your comments.

Your sense that using a

Your sense that using a choice operator and something like sri are equivalent is basically correct -- the two are interderivable. However, sri-style operators are somewhat easier to use if you need to do automated reasoning. Here's why:

A finite set is isomorphic to a list type quotiented by the equivalence relation generated by the congruent closure of associativity and idempotence of cons. That is, you take the congruent closure of the following two equations:

   x :: y :: rest == y :: x :: rest
   x :: y :: rest == x :: rest       when x = y

This means that any function on lists is also uniquely determines a function on sets, if the function is invariant under modifying the list argument by associativity or idempotence. Now, it's easy an easy inductive argument to show that a fold function fold f initis invariant under this relation if the functional argument f is also associative and indempotent, since fold just replaces calls to cons with calls to f.

So the bulk of the proof burden lies in showing that f(x, f(x, a)) == f(x, a) and that f(x, f(y, a)) == f(y, f(x, a)). Note that the variables x and y are universally quantified here. This makes automatic or semi-automatic checking easier, because computers are generally quite bad at instantiating existentially quantified variables at types which have an interesting equality structure.

If you give a choice operator, it will have a signature and specification like:

   mem    : (elt * set) -> bool
   choose : set -> (elt * set) option 

   forall xs:set. 
      forall x: elt. not (mem(x, xs)) and choose(xs) == None 
      exists x:elt, ys:set. choose(xs) == Some(x, ys) and 
                            not(mem(x, ys) and 
                            add(x, ys) == xs

Basically, the specification of choose is that either you get None when its argument is empty, or you get an arbitrary element of the argument and the set that's the result of taking that element out. Note that this specification has an existential quantifier in it. Automatically proving things about programs that use choose directly is harder, because the specification of choose choose existentially quantifies over its return value, and involves the membership function to boot.

Now, observe that you can define sri in terms of choose, and vice-versa.

  let rec sri f e set = 
    match choose set with
    | None -> e
    | Some(x, xs) -> f(x, sri f e xs)

   let choose set = 
     sri (fun x acc ->
            match acc with
            | None          -> Some(x, Set.emptyset)
            | Some(y, rest) -> Some(y, Set.add(x, rest)))

For a set API in a general purpose language, you probably want to expose choose. However, if what you're interested in is in letting programmers define some new group functions in a database query language, you may prefer to use sri in order to simplify checking that what they've defined actually is a good group function.

Ordering and implementation

Thank you. Very nice summary !

I am familiar with derivations similar to yours and have no problem with them. However, what bothers me about 'sri' is that any naive or otherwise implementation would rely on the coincidental ordering introduced during the set construction with the insert operator. So, it seems that we only pretend to traverse a set while in fact we work with a list roughly speaking.

The choice operator gives somehow warmer feeling, but it is unclear how one would go about implementing it as a primitive without translating to 'sri'.