Species: making analytic functors practical for functional programming

Species: making analytic functors practical for functional programming, Jacques Carette and Gordon Uszkay. Submitted to MSFP 2008.

Inspired by Joyal's theory of species, we show how to add new type constructors and constructor combinators to the tool set of functional languages. We show that all the important properties of inductive types lift to this new setting. Species are analytic functors, representing a broader range of structures than regular functors. This includes structures such as bags, cycles and graphs. The theory is greatly inspired by combinatorics rather than type theory: this adds interesting new tools to bear, but also requires more work on our part to show that species form a good foundations for a theory of type constructors. The combinatorial tools provide a calculus for these structures which has strong links with classical analysis, via a functorial interpretation of generating series. Furthermore, we show how generic programming idioms also generalise to this richer setting. Once the theory is understood, various methods of implementation are relatively straightforward.

Learning more about the theory of species and working out its implications for programming has been on my to-do list for several years now. It really seems like the natural way to more tightly integrate combinatoric ideas into the functional programming style. (For an example of how fruitful this connection can be, consider how datatype differentiation explains zippers/functional pointers.)

However, there's been this whole "graduation" and "primary research focus" stuff that's kept getting in the way, so I'm happy to see that Carette and Uszkay are figuring it out so I can just crib from them. :)

Comment viewing options

Just beat me!

I was just about to post this as well. Species have come up on LtU many times over the years, sometimes explicitly and sometimes in the background. Lots of people seem to share the hunch that this is going to be a very important direction for functional and datatype-generic programming in the coming years, so I'm very excited to see this work being done!

(Incidentally, I think this is also a great example of some of the stuff that generally falls under the cultural heading of "functional programming", but is really only tangentially related to the way that most people understand that term.)

This is *submitted*

Please note that this paper has been submitted to MSFP but not accepted. It might be good to edit the post reflect that.

And now rejected

MSFP rejected this paper, for pretty much the same (good) reasons already brought up here: as written, the paper was way too imprecise.

And the referees were right! I was too close to the topic to properly see it, but this write-up was simply not ready for publication. This is one thing I really like about parts of the PL community: refereeing is taken very seriously and standards are high. And because the referees take their job seriously (or at least the 3 that looked at my paper), the feedback was fantastic.

It has happened to me three times now that a paper where I am co-author gets rejected and, after reading the referee reports, my conclusion is that I agree with the referees. This does not always happen, naturally, but my current sample set makes me happy to continue submitting papers to conferences such as MSFP.

In any case, by late June or mid-July, an updated version of the paper should be available, with all this feedback taken into account.

Confusing

The paper is written with frustrating vagueness, especially in its basic definitions.

The authors define a species as a certain kind of functor. But then they define species by means of vague analogy to data structures without stating clearly what the functor is. For example, we see the species 0 of "structures that can't be instantiated". What functor is this? The functor from finite sets to the empty set? Then we see the species epsilon, "a set with membership but no order on the elements". Aren't species supposed to be functors? If so, what's the functor? What does it mean for a species to be a set?

The definition of composition equates (F o G) with F(G[A]). What's A? What does it mean to equate a species to a structure? Or does the author mean (F o G)[A] = F(G[A])? What do the graphs with nodes and lines have to do with the underlying functors?

What does it mean to restrict a species to a cardinality n? Its functor maps finite sets to finite sets, so are we talking about restricting the cardinality of the sets in the functor's domain? Or its range? If the later, is the resulting functor partial?

The paper gets clearer later on (e.g. the definition of molecular species). But by then most of the non-expert audience is lost, which is unfortunate given the impressive breadth of the field that the paper covers and the translation into a generic-programming style implementation in Haskell at the end.

Thanks for the feedback

The paper was originally written using set-theoretical and combinatorial language only (like the [Berg89] book) but then we decided that for MSFP we could safely assume categorical knowledge. We had originally tried to give more 'intuition', and unfortunately that came out as vagueness. We rewrote parts of the paper in a more categorical style, but did not redo the earlier parts. Clearly we should have done that too.

We will fix the vagueness - whether the paper gets accepted to MSFP or not. Hopefully we can do that in the coming week, and we'll update the version on the web site.

To answer some of your questions: The species 0 is the functor that maps all sets to the empty set (and all arrows to the identity). The species \epsilon of "elements" is the functor that maps any set U to the set U, and is the identity on arrows (ie keeps the arrows as is); in other words the structures on U are the elements of U. We carelessly described functors by their images on objects, in other words the "set encoding" of a data-structure [the arrows being bijections]. Think of the usual coding of a pair (a,b) as the set {{a},{a,b}}, or the disjoint sum of sets U and V as the set {(0,u)|u\in U} union {(1,v)|v\in V} [where you have to encode that as a set too]. So the low-level set maps are hideous, and that's why one wants to immediately move away from that. But the low-level maps are often closer to actual implementation details! So unlike some of the literature on models of data-structures as Functors, we wanted to keep a bit of that visible since B-species and L-species offer different perspectives on memory hierarchies [for example, something we left out: using 5-sorted species, we can model the 5 levels of memory (register, L1, L2, main memory, on-disk) ].

For composition, we should have written (F o G)[A] = (F(G[A]]) as the "on objects" definition. Graphs and nodes are then the usual visual abstraction that one always does when doing set-theory: you associate an abstract object (like a graph) to a particular set-encoding.

Cardinality restriction can be defined precisely as
F_n[U] = if |U| = n then F[U] else \emptyset

Thanks!

Thanks for elaborating on the definitions. As I spend more time with this, I'd be happy to provide more feedback directly if you'd like (just email me: tim@epicgames.com).

There is an impressive amount of information in this paper. This is an important field for future data structure work, so I see a lot of value in presenting the material in a way that's accessible to technical people in other disciplines.

email?

[Sorry for the noise on LtU] Did my email make it through?

What's functorial composition?

I still can't see the difference between composition and functorial composition. The definitions in the paper are F(G[A]) and F[G[U]], which (to me) look identical. (Is there a difference between brackets and parentheses?) I'm guessing the difference has to do with U being the "underlying set", but I have no sense of what that means.

Imprecision, again

I have uploaded a new version of the paper where the definitions are much more precise [at least on the object part of the functor, the arrow part is usually straightforward]. You guys really hammered the point home that the intuitive'' definitions were simply insufficient [they below in a much longer expository paper, along with many more pictures, but skipping the formal definition was a mistake].

Composition is really substitution - F\circ G is really an F-structure on a set of G-structures obtained by partitioning the underlying set. It corresponds to composition of power series. This is what people mean when they say "a List of Trees". Functorial composition corresponds to putting an F-structure on the set of all G-structures on the underlying set, which corresponds to composition of series coefficients. The neutral element for Functorial composition is the species of pointed sets, while X is the unit for composition.

Sets versus other collection types in species construction

Jacques, Gordon,

In reading your paper i am curious as to how central the role of set is as the collection widget. Could you parameterize the construction over a monad that represented a "notion of collection"? For example could you build species over multi-sets or lists or trees? Could you build species over species?

Best wishes,

--meredith

The role of Set in species

Set is used (as a category) because it is a very well-understood category with many useful properties for encoding'' data-structures. It is not the only category that can be used -- but you need at least a CCC (and possibly more, I have not checked all the details) to make things work. But the main point is that sets are used as in the standard encoding'' of classical mathematics in set-theory. So while the result is a set, that is as useful to the theory as saying that all Haskell ever manipulates is bit patterns. This is true, but not the point. One needs to abstract away from the 'implementation mechanism'.

Species allow you to built multisets, lists and trees -- why would you want to use those for further encodings? I don't quite understand this part of the question.

It has recently been shown that generalized species (see reference in the paper) form a CCC. I am reasonably sure that the category of generalized species is rich enough to built most of the theory of species atop that.

Having said that, I am less sure that species built on categories other than Set would have the fundamental property of corresponding exactly to analytic functors (ie those functors which have series expansions). The proof of that theorem seem more Set-oriented than the rest of the development of the theory.

The role of Set in species

Jacque,

Thanks for your thoughtful response. One of my motivations for wanting to generalize the notion of collection -- even though species can construct notion of collection -- is factorizations. We might want to think of various complex data structures in terms of different factorizations through notions of collection. This is very useful if we have different kinds of support for notions of collection -- e.g. hardware support for XML processing -- and want to target efficient representations of our specific data structures over multiple "platforms".

On a different note, can i expect an analytic solution to the analogue of the equation for the golden mean? That is, can i expect an analytic solution to

GM(x)*GM(x) ~ GM(x) + 1

How does this solution compare to the continued fraction

1 + (1/(1 + 1/(...)))?

More generally, is there correspondence between continued fraction representations and analytic solutions to the analogous domain equations?

Best wishes,

--meredith

Some answers

Could you be more precise about your notion of 'factorization'? Note that species (like regular functors) really are about abstract notions of types and containers. They are explicitly quite independent of representation and implementation issues. To me, species neither helps nor hinders efficient representations, that is an orthogonal issue. But if I understood better your notion of 'factorization', I might be able to say something more cogent.

You can generally expect analytic solutions for species equations whenever the analogue equation has an analytic solution. In the case of your equation, the result is not very interesting as a type constructor as there is no type variable in the answer. In other words, you get a structure as solution, which is the same structure that you get interpreting the continued fraction as a structure, but it has no places to put data in it. This is to be expected, since it is the generalization of a 'constant', so it is pure structure.

However, for other equations, if the analytic equation has an analytic solution (in terms of continued fractions or series), then you can expect the species equation to also have a solution.

Places to put data are overrated

Jacques,

Thanks again for your detailed answers. i'll give more factorization examples in a later posting. But, as "closed" structures, these are actually of great interest to me. Set theory without atoms is closed. Here's a finite approximation for a free theory (that needs to be whacked down by a structural equivalence corresponding to the set theoretic axioms)

S ::= { S* }

A similar equation characterizes Conway games -- which are extraordinarily expressive.

G ::= { G* | G* }

All of the reflective soln's i've devised for various computational calculi are closed. For example a "closed" Ï€-calculus

P,Q ::= 0
x?(y).P
x!(P)
P|Q
@x
x,y ::= 'P'

This is expressive enough to encode the ordinary Ï€-calculus; and yet, there is no "hole" for data. The solution to the equation is pure structure. (Note: i've also shown a solution for an FM-style set theory with atoms that follows this same procedure.)

Finally, every such structure i've investigated has a representation in terms of a 2-level type decomposition + a 'knot tying' eqn. So, you can always get from a "pure structure" spec of this form to something that "carries data".

Best wishes,

--greg