## Generating nested data types at runtime

Hi,

I'm currently implementing an algorithm which allows one to synthesise coalgebras given various local/global specifications, these being coalgebras of a covariant endofunctor T on Set^m where Set is the category of sets and m >= 0.

I have a problem regarding the genericity of what I am trying to implement, which I hope someone can shed some light on. Although I'm using Haskell, I'd be interested if any other languages are capable of what I ask. My problem is not with the algorithm itself but rather with an initial preprocessing stage.

In general terms I want to construct a collection of interleaving data types given some specification of how they need to be interleaved. This needs to happen at runtime. Previously for my MSc I implemented a fragment of this algorithm, at that time I got round this problem by generating the Haskell code with a script -- kind of an ugly solution. On the other hand I guess its hardly a common thing for the user to create their own data types.

I'd very much appreciate any pointers,

Cheers,

Rob

## Comment viewing options

### Terminology

While I am interested in what I believe you're trying to do, I can't be sure because you've used a term I don't understand, "coalgebra", and for which the wikipedia entry is both 1) mostly defined in terms of other terms with wikipedia entries with these same 2 properties, and 2) gives no example which allows me to deduce what the term's useful application might be in the context of programming languages.

Could you do me (and probably a number of others) a favor and give a good reference answer for "What is a coalgebra and why is it useful?" which is easily understood by someone who 100% understands Haskell (type theory and all)?

### I second this.

While I have a vague idea what a coalgebra is, I'm interested in more details too.

### Ah, Template Haskell -- that

Ah, Template Haskell -- that looks like what I need, cheers

### Motivating Coalgebra

The wikipedia page for F-coalgebra is sadly very sparse and seems to provide very little if any motivation. Here's my own take on coalgebra; I've tried to make few assumptions about previous knowledge.

In Computer Science there seem to be many different kinds of 'transition' from a state to (or over) a subset of states; e.g. deterministic (a trace), sibling-ordered (a tree), nondeterministic (a relation), probability distribution (discete Markov chain), monotone neighbourhoods (important in conditional logics), coalitional (where agents' joint action chooses next state), multisets (yielding multigraphs) and many others. Also we might have two or more kinds of transition in parallel at a particular state e.g. two trees over the same states, or one might freely choose between different types of transition or require some alternation.

All of these transition structures turn out to be coalgebras. More precisely the notion of 'transition' can be captured as an endofunctor T on the category Set of sets (or Set^n for multisorted transitions). Given a set X then TX is the collection of 'transitions' over X e.g. (i) for binary trees TX = X^2 (Cartesian product) is the collection of pairs of states (ii) for nondeterminstic transition systems TX = PX (Powerset of X) is the collection of subsets of X, thought of as possible successors. The coalgebras for T consist of a set of states X and a function c: X --> TX.

It's not surprising that for any notion of transition and set X that we can define a set TX of possible transitions. However saying T is an endofunctor really specifies that it acts on functions in a nice way. I won't go into the details but I will say that from first principles one can deduce that any reasonable notion of transition yields an endofunctor T. The action on functions yields notions of bisimulation/behavioural equivalence, amongst many other things.

Two examples:

1. Haskell programs f: A -> B may be understood as (usually infinite) coalgebras.

Suppose we wrote the standard eval :: Expr -> Nat  for the data types:

data Expr = Value Nat | Add Expr Expr data Nat = Zero | Succ Nat 

Writing this program is the much the same as defining a coalgebra c: X -> TX where: T is the (polynomial) functor TX = {Zero} + {Succ}*X induced by Nat and X is the collection of all expressions Expr (hence infinite). For example, we'd define c(Value Zero) = c(Add (Value Zero) (Value Zero)) = 0 and we'd send other expressions t to (Succ,u) e.g. c(Add (Value Succ t) u) = (Succ,(Add t u)). If one wrote a Haskell program it would most likely include: eval (Add s t) = (eval s) + (eval t) -- this can be seen as defining the coalgebra c recursively.

Then the 'behaviour' of an expression in the coalgebra c is a sequence of Succ-labelled transitions which terminates at Zero, e.g. Add (Value (Succ Zero)) (Value (Succ Zero)) --> (Succ,Add (Value Zero) (Value (Succ Zero))) --> (Succ,(Succ,(Add (Value Zero) (Value Zero)))) --> (Succ,(Succ,Zero)) .

This shows that a program can be understood as a coalgebra.

On the other hand many functional programs can also be understood as a mixture of unfolding (coalgebraic) and folding (algebraic) e.g. the factorial function. The Haskell function unfoldr takes a coalgebra c: X --> {[]} + A * X and an initial state x as input and outputs the 'behaviour' i.e. an infinite stream [a1,a2,a3,...] where c(x) = (a1,y), c(y) = (a2,z) etc. Then computing 4! can be understood as unfolding to get [1,1,2,3,4], then performing a multiplicative fold: see anamorphism and the reference there.

There has also been considerable interest in programs that may be understood as finite coalgebras e.g. finite state automata / transducers etc. Generic algorithms have emerged to synthesise them given specifications.

2. Data can be understood as (usually finite) coalgebras.

Roughly speaking, by data I mean the finite structures we use to represent information/knowledge. These could be particular kinds of data structures e.g. linked lists & heaps, or more generally the finite transition structures used in Computer Science to reason about situations, systems etc. For example: discrete Markov chains, metric spaces and multiagent systems governed by social-laws -- they all have a notion of state and transition, making them coalgebras.

Another important example is the knowledge representation formalism used in description logic, this forms the technical basis for the Semantic Web. There data is represented by a collection of individuals and relationships between them (graphs), together with an ontology which relates these relationships to each other (conditions on these graphs).

I am looking at coalgebras more from this second perspective, essentially because the generic algorithm I have been working on constructs finite coalgebras. Roughly, given various local/global partial observations about coalgebras for a functor T, the algorithm constructs a coalgebra of the intended type that satisfies these requirements; if no such coalgebra is possible it provides a proof to that effect. I hope that this does have applications in (functional) programming, perhaps for using partially specified data, type inference or other things.

### Interesting.

Well, I have to admit, my understanding of coalgebras still involves entirely too much hand-waving at this point. However, the equations in your off-the-cuff explanation don't really work out.

I tried, briefly, to work out your example myself, though I have yet to be successful. I'll continue trying as it would be a valuable exercise for me.

I'm most definitely interested in looking at your work once you decide it's ready for public display. :-)

### I'd be interested in what

I'd be interested in what sense you think my equations don't work out... I'll admit it was off-the-cuff though. I am generally interested in talking about coalgebra in such a way that other people find it interesting, so I really appreciate feedback.

As a side note I'm thinking I'll probably implement it in Agda, dependent types seem too good to miss.

Your coalgebraic morphism "c" has a type error...

As to how to make your work interesting enough and relatively easy to comprehend, which both helps garner wider attention, I suggest reading Doron Zeilberger's 65th Opinion for inspiration, which I pretty much agree with entirely, all the way through.

Start with simple examples that are also interesting enough to properly motivate your work, that people are likely to be at least somewhat familiar with. Of course the best examples vary by audience... Finite automata, mealy machines, and friends would be an obvious choice for a computer science-oriented audience, while Markov chains would probably be a better choice of topic for many other crowds. As the topics are so similar, maybe they should be in the same paper, with a subsequent demonstration of a mealy machine with probabilistic transitions, which would probably help each crowd appreciate the value of the other crowd's perspective.

The coalgebraic generalization of natural numbers is the classic example, and it does have the benefit of being particularly easy to understand, but I also think that it doesn't properly motivate the work. I appreciate that you tried to go beyond this example in your explanation, trying to coalgebraically formalize arithmetic expressions.

In my book, any motivating reason to sit down and learn Agda would be great! :-) Hope this helps!

### Hmm...

I suspect that you can't fully do what you'd like at actual runtime, but you could instead implement a staged embedded language in which the types are constructed in the runtime of the surrounding haskell program. Rather than an actual ADT you would then either have a "meta-ADT" which is a syntax tree of sorts describing the actual ADTs you've constructed, or you could use HOAS and encode your data type constructors and lambdas, if that makes sense...

The papers here may be of interest: http://okmij.org/ftp/Computation/tagless-typed.html