Data, Codata, and Their Implications for Equality, and Serialization

I'm working on a toy language, intended as a functional implementation of Date and Darwen's vision for a 'D' language to compete with SQL.

Right now, I'm considering how far to go with the ideas in Turner's Total Functional Programming.

Distinguishing data from codata and statically enforcing termination is pretty much entirely a win for this language, I think. Nearly all of the functions and I-wish-they-were-functions-SQL-views-queries-procedures I have in any of my example databases are structurally recursive or have entirely opaque implementations.

Some questions I am stuck on:

• Where do the function types fall on this distinction? They are codata, right, because you can only observe them by applying (~= deconstructing) them? (As an aside, if so, is it possible to write a Turner-style codata declaration that starts codata (->) a b = ? I'm not sure if the answer is yes, no, or not without referencing the built-in (->) type.)
• It seems like equality and serializability, both obviously of interest to a database language, are sensibly defined for precisely those types that are data "all the way down" (counting function types as codata types). True, false, or true-ish-but-distinguish-them-anyway?
• There is a weird thing that happens to function types when the following conditions are met:

• The argument type and result type are both data types
• The argument type is "finitely inhabited" (I'm not sure what the real name of this concept is) in that it classifies finitely many values. data DayOfWeek = Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday is finitely inhabited, as are Haskell's Int and even Float, but neither Integer nor [DayOfWeek]

When this happens, the (necessarily total) functions magically look like data again. Taking |t| to be the cardinality of the type t, |a -> b| = |b| ^ |a| when these conditions are met. At least in principle (it might take a long time in some or most cases), we can compare functions of such a type for (extensional) equality by evaluating them at every inhabitant of the argument type. We can serialize them by (there may be more efficient ways, but who cares) serializing this table of results. What's happening here? Could/should a language recognize this special case and allow serializing functions with types like DayOfWeek -> Bool? Or provide conversions to/from Map a b from/to a -> b if and only if these conditions hold? Or ignore this whole thing as an oddity?

Comment viewing options

Good questions

I love these questions you have.

Some considerations:
- some languages work with functions as text (JavaScript) or as structure (Lisp). Obviously you still need to apply them to do anything useful, but it opens the door to function building and function transforming.
- a function which is finitely inhabited is equivalent to a table, as you noted, and this table can be built progressively by caching/memoizing. The technique can apply to any pure function (but in these the table may be unbounded).

Type class of enumerable

Your final question about "finitely inhabited" types seems like it would be adequately captured with type classes:

class FI a
count :: Int
enum :: Int -> a

instance FI a => FI (a -> b)
...


I agree

I have a typeclass just like that in my prelude.

The only thing is, unless I am missing something, I think the instance needs to be:

instance (FI a, FI b) => FI (a -> b)
...


In practice it works pretty nicely, because when functions really are simple you can get a tabular version for documentation very, very easily.

I split it up, though, because its possible for (countably-) infinite types to belong in the enum "half" of that typeclass.

Other fun goodies you get are things like

class Finite a
count :: a -> Natural -- pass me undefined, but my parameter needs to be here so that we know which count we are talking about :(

class Enumerable a
fromInteger :: Natural -> a
toInteger :: a -> Natural

-- in the monad of random variables
uniform_distribution :: (Finite a, Enumerable a, RandomGen g) => a -> Rand g a
uniform_distribution a = do
n <- getRandomR (0, (size a) - 1)
return $fInteger n  There's an unenforced law that when something is both enumerable and finite that its fromInteger works on [0, count-1] and its toInteger produces results in [0, count-1]. I'm not thrilled by this rule because its not clearly part of the contract of Finite nor is it clearly part of the contract of Enumerable. Anyone got a better idea? I'm also not thrilled by the (undefined :: DayOfWeek) that you have to pass to uniform_distribution so that it can pass it to size, but I'm not sure if there is anything to be done about it. I think there is some trick to be pulled with using type-level naturals to encode the size. Even further down the rabbit hole, maybe it should really be a type-level cardinal, and the type function Cardinality :: * -> Kcardinality is defined at every type in *? AFAICT you also want AFAICT you also want the instance: instance (FI a, Data b) => Data (a -> b) In addition to: instance (FI a, FI b) => FI (a -> b) Yes This is what I intended with my last hurried post. Data? (Oops, this was a silly question.) Functions as Tries? It may be worth remarking that even when the domain of a function space is countable rather than finite, you can often find codata structures which are isomorphic to function spaces. The classic example is that Stream α ~= Nat → α. If I may use fixpoint notation, that's νy. α * y ~= (μy. 1 + y) → α. Note that α1+y ~= α * y. You can calculate the structure you need by familiar laws of exponentiation. For more in this vein, see Ralf Hinze's "Memo functions, polytypically". Thorsten Altenkirch also did some work in this area. I learned from Thorsten that if we take codata Bush x = x :-: Bush (Bush x) data Tree = Leaf | Node Tree Tree we get that Bush x ~= Tree -> x. tabulate :: (Tree -> x) -> Bush x tabulate f = f Leaf :-: tabulate$ \ l -> tabulate $\ r -> f (Node l r) project :: Bush x -> Tree -> x project (x :-: _) Leaf = x project (_ :-: b) (Node l r) = project (project b l) r  So, yes, functions are codata, and they correspond to infinite memo tries in a systematic way, whenever the domain can be considered a set of finite paths. Amusingly, functions from codata to data types sometimes have an inductive flavour. E.g., μx. β + (α → x), being the free monad on the ‘read an α’ command, is an isotope of Stream α → β when the elements of β are finitary. The idea is that any function must eventually stop consuming its input and deliver an output. Of course, we need to translate away the inner (α → x) at each node, depending on what α happens to be, so codata will probably return. Functions from codata to codata, as explored by Ghani, Hancock and Pattinson, often have a mixed coinductive and inductive flavour. Total functions in Stream α -> Stream β are given by νy. μx. (β * y) + (α -> x), so that only finitely many inputs may be consumed before each output is emitted. You can play similar games to bake fairness properties into types. E.g., you can represent an infinite sequence whose elements are either αs or βs not in any specific pattern of alternation, but none the less admitting a total demux operator yielding a separate Stream α and Stream β. But I digress. *head explodes* *head explodes* So, yes, functions are codata, and they correspond to infinite memo tries in a systematic way, whenever the domain can be considered a set of finite paths. It'll take me a long time to wrap my head around this whole thread, but the possibilities are enticing. I think the first time I made a flimsy connection about this relationship, ie. functions=codata, was in the finally, tagless paper, where they said that the lambda calculus had a coalgebraic structure. When I read that, I reasoned that if data=algebra, then functions=coalgebra implies function=codata. I still don't understand the full implications, but this thread has highlighted some possibilities. For instance, if functions=codata=infinite memo tries, then is compilation the construction of such a trie from a function definition, ie. the trie is the executable instruction stream with edges being branching instructions? By "finite paths", do you mean the domain is a non-recursive or structurally inductive algebraic type? That would make sense, at least for non-recursive algebraic types would elaborate into a sequence of instructions with a finite number of branches as a trie. codata Jan Rutten has been thinking and writing about this for a very long time -- but somehow few people in the FP community seem to be aware of his work, or 'mining' it appropriately. Look at his papers, especially the ones from the late 90s. My personal favourite his his 2003 TCS paper on behavioural differential equations (i.e. defining streams up to behavioural equivalence via differential equations). What Conor is pointing out can be generalized: there is an isomorphism between S () -> x for any species S (considered as a type constructor) and codata T x where T x ~= xS (). The isomorphism works because S is graded, so the molecular decomposition of S as an infinite series translates to an infinite product for T. The most complicated thing (which is not entirely obvious from the above code) is that the labelled product 'Node Tree Tree' translates to Functorial Composition, aka 'Bush (Bush x)' - and I do not know how to do this systematically. More specifically, what I don't know how to do is to take either the algebraic equation, or even better, the corresponding differential equation, and translate it systematically to a useful (algebraic or differential) equation for Bush. In this case, x Bush' = Tree * Bush, but how do I "integrate" this to get Bush(x) ~= x*Bush(Bush(x)). On the origin of species equations Apologies for what's probably a dumb question, but can you explain where you got the following equation? x Bush' = Tree * Bush On the other hand, it looks like Conor's equation for Bush follows in a straightforward way from the goal of being isomorphic to Tree -> x: Tree = 1 + Tree2 Bush x = xTree = x1 + Tree2 = x * xTree2 = x * (xTree)Tree = x * (Bush x)Tree = x * Bush (Bush x) On that species equation I differentiated the equation Bush x = x^Tree(x) which gives Bush' x = Tree(x) x^(Tree(x)-1) then multiplied both sides by x (so as to not need virtual species), and then refolded the definition. Thanks a lot for that derivation of the Bush equation -- I had not seen that. I do believe that your derivation does contain some key ingredients to prove a more general version of this idea. The other example seems different Ok, your differential equation was derived from the original equation. I should have tried that, but I thought you might have had some way to write it down directly. As an aside, one thing that bugged me after posting that derivation was that it didn't seem to use at any step that Tree is inductive. If we take (Bush x) to be defined as (Tree -> x), then all of those steps are valid for any Tree that satisfies the Tree equation, data or codata. But the correspondence between (Tree -> x) and the codata version of Bush only holds if Tree is inductive, so what gives? The answer is that the re-write rule of the derivation above is only justified in being applied finitely many times. If we instead start from the desired coinductive definition of Bush, then the last two substitutions will need the inductive hypothesis for justification. I agree there's probably a more general version of this, but Conor's other example, Î¼x. Î² + (Î± â†’ x) = Stream Î± â†’ Î², doesn't seem to have the same flavor as the Bush/Tree example. Starting from Foo Î± Î² = Stream Î± â†’ Î², it seems that we need to rely on the parametricity of Stream Î± â†’ Î² to make progress. Anyone know how to do this? Good observation It would be quite instructive to do this inside a proof assistant, to really see what assumptions are needed. Of course, this may well be at the very edge of what proof assistants can handle comfortably (i.e. one might run into all sorts of limitations of Coq and/or Agda which are not otherwise a problem in the theory itself). And while I claimed "more general", I just meant up to (Species -> x), not more than. Like you, I am quite sure that there is indeed a 'higher algebra' of this which would cover Conor's example, but it is much trickier. One can easily unwind Î¼x. Î² + (Î± â†’ x) to Î² + Î²Î± + Î²Î±2 + ..., and Stream Î± â†’ Î² to Î²Î±Nat, but these are not isomorphic! The condition that makes it work is that continuity forces one to take a strictly finite amount of information from Stream Î±, and then we have the isomorphism. Note that I believe it is continuity, not parametricity, which makes this work -- but I would be happy to learn that these are somehow tightly related. pigworker tricked us The step I don't see is "then we have the isomorphism." I think the reason is: those types aren't actually isomorphic (even in the case Î² is data). The type Î¼x. Î² + (Î± â†’ x) encodes more information: if you have such a type, then you get to observe how many steps were required to produce a Î². BTW, my suggestion that we use parametricity was nonsensical. Sorry. What's an isotope? I read Conor's 'isotope' as isomorphic data type, but perhaps it's a reference to an isotopic data type? I'm familiar with isotopy from topology, but I don't really see how that concept would apply to a pair of data types. Is it possible the data types are isotopic but not isomorphic? Also, I made a minor edit to the parent post for clarity. Isotopes and streamwatching By "isotope", I do indeed mean "isomorphic type", but in somewhat loose and informal sense here. That said, there's more than a whiff of topology to what's going on, with all the connections between computability and continuity. Fair cop, I did rather skip whistling over a bump when I switched from coding functions with finitary domains (where it's clear how to implement both directions of the iso directly) to functions from codata. For the latter, it's easy to interpret the codes as functions, but it's rather trickier to extract the code from the function. So what's the status of the claim? One needs rather stronger reasoning principles to show that every total function has a code than you get just by being able to apply functions to codata. That's distinctly meta. You need to ask "what can this function possibly do with its input?". Having said that, if we define  {-co-}data Stream a = a :> Stream a data Eater a b = Return b | Eat (a -> Eater a b)  I bet we can use some deep wickedness to implement a Haskell function  encode :: (Stream a -> Bool) -> Eater a Bool  which will indeed produce the code. When I say wickedness, I mean using unsafePerformIO to embed effects in the input to the stream-consuming function which interact with an element supply. Which is all to say that these more concrete characterisations of functions from codata are really external observations on computability, but that they can be realised if you have some means to spy on the machines. Informal isotopes I agree with all of the above. My hangup was looking for an actual isomorphism, when in fact the type Î¼x. Î² + (Î± â†’ x) can encode Stream Î± â†’ Î², but not vice versa. E.g. the following Eaters correspond to the same Stream function: foo = Return False foo' = Eat (\a -> Return False)  my mistake, then That's a fair point. We get that every computable function has an encoding, and we can get hold of that encoding if we can do some sort of intensional observation of the function. But extensionally speaking, we might get multiple (clearly, infinitely many) encodings for the same function. If I want to legitimize my use of "isotope" in these examples, as well as define the things computationally, then I need to consider functions from codata up to the finer equivalence induced by the more intrusive observations (e.g., applying functions to booby-trapped inputs and detecting the kaboom). dirty money where my pottymouth is import Prelude hiding (catch) import Control.Exception import System.IO.Unsafe import Control.DeepSeq data Stream a = a :> Stream a data Eater a b = Ret b | Eat (a -> Eater a b) encode :: NFData b => (Stream a -> b) -> Eater a b encode f = unsafePerformIO$ let b = f (error "Kaboom!") in
catch (b deepseq return (Ret b)) $\ (ErrorCall "Kaboom!") -> return . Eat$ \ a -> encode (f . (a :>))


Sorry, NSFW, I know...

Better with delimited continuations

Nicolas Oury and I just implemented encode in SML/NJ using Filinski's encoding of shift and reset:

datatype 'a Cons = C of 'a * 'a Stream
withtype 'a Stream = unit -> 'a Cons

datatype ('a, 'b) Eater = Ret of 'b | Eat of 'a -> ('a, 'b) Eater

fun tryOne () = shift (fn k => Eat (fn a => k (C (a, tryOne))))
fun encode f = reset (fn () => Ret (f tryOne))


This has the advantage of running f only once. It even works correctly with side-effects in f :-)

bravo!

Yes, that's what I *really* wanted.

Bushes and trees

I took me a little while to recognise that familiar seeming computation. I have at least one generalisation there.

Î¼x. Î² + (Î± â†’ x) is an isotope of Stream Î± â†’ Î²

E.g., Î¼x. Î² + (Î± â†’ x), being the free monad on the â€˜read an Î±â€™ command, is an isotope of Stream Î± â†’ Î² when the elements of Î² are finitary.

Why do you say that the elements of Î² need to be finitary? I did an implementation and it seems quite irrelevant what Î² is.

What if &beta; is codata?

Take β to be Stream α. Express the identity function.

Clear. Thanks!

Stream a -> Sream b

The same transformation on Stream a -> Stream b gives you a well-known coidinuctive type: the Stream Processors.

We implemented that using shift/reset in SML, following up on our previous post:

datatype 'a Cons = C of 'a * 'a Stream
withtype 'a Stream = unit -> 'a Cons

datatype ('a, 'b) SP =
Put of 'b * (unit -> ('a,'b) SP)
| Get of 'a -> ('a, 'b) SP

fun probeOne () = shift (fn k => Get (fn a => k (C (a, probeOne))))
fun emitOne s =
let val C(v, s') = s() in
shift(fn k => Put (v, k));
emitOne s'
end

fun encode f = reset (fn () => emitOne (f probeOne))

Reals

A possibly interesting observation:

All of the types I have found for expressing exact real arithmetic are codata.

Why, since the computable reals are just as countable as the integers?

Is it just an artifact of the way we think about them? In that we usually ignore problems like "arbitrary" precision integer types that can only represent numbers up to, say, GHC's (2^8)^(2^29)?

Categorical Programming Language

I'm not sure this is relevant, but Hagino's Categorical Programming Language - or at least the definition syntax - allows one to define the type of functions like this:

right object exp(X, Y ) with curry is
eval: prod(exp, X) â†’ Y
end object


This is not vacuous - the primitive arrows in the syntax refer to morphisms between objects (approximately, types of values), this definition actually gives you an (exponential) object whose elements are functions.

It may be an oversimplification, but "right" seems to corresponds roughly to "codata", and "eval" is indeed the primitive observation for the new type. This example comes from chapter 3 of the thesis.