Trouble understanding Danvy's functional unparsing

I've read Danvy's paper here[1], and I fail to understand why CPS is necessary for the types to work out. I've written a basic version of "functional sprintf" in OCaml that doesn't use CPS and seems to typecheck fine:


let int_ s i = s ^ string_of_int i

let float_ s i = s ^ string_of_float i

let (@) f g s a b = g (f s a) b

let sprintf fmt = fmt ""

let (_s : string) =
let x = sprintf (int_ @ float_) 1 1.0 in
let y = sprintf (int_ @ float_ @ int_) 1 1.0 2 in
x ^ y

[1]: http://www.brics.dk/RS/98/12/BRICS-RS-98-12.pdf

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Seems to work

I think your definition also works. Arguably it is also in CPS, in the sense that you do thread the string prefix `s` as a "destination" parameter -- but Danvy's definition threads both s and a functional continuation k so it is more sophisticated.

One property of Danvy's encoding, that he remarks on, is that format concatenation is just function composition -- while your operator (@) is slightly more complex. Whether that justifies the extra CPS complexity is a subjective question, but I would note that the Haskell community tends to favor definitions (eg. Van Laarhoven lenses) that can be composed with function composition directly, as it allows to reuse their convenient (.) infix operator without fuss.

I think it is even possible to use a CPS-based solution to remove the need for an infix operator (@) altogether: write basic combinators in such a way that (int float (lit "foo") int end) is a valid format. There was a paper about going very far with these tricks (you could essentially encode LR parsing to express arbitrary grammars without separators as direct application of CPS-style combinators), but right now I don't remember the reference.

Note that OCaml uses functional unparsing nowadays, using GADTs to represent formats. I like the idea of having a first-order representation of formats as a plain data-structure, even though it can be expressed in a GADT-less language with those encodings. (The OCaml type has many type parameters and might be simplified if it dropped backward-compatibility with previous exposed types; but it also supports relatively sophisticated features such as dynamic format substitution).

There are many variations around the type-safe printf theme. From Oleg's webpage on unparsing, I found a reference to work by Kenichi Asai in 2007-2008 that discusses the use of CPS in much detail (with solution in more direct style using delimited control), which you may be interested in.

Found it

The reference I had in mind is the pair of "Techniques for embedding postfix languages in Haskell", Chris Okasaki, 2002, which explains the technique for direct embedding of postfix programs (without explicit sequencing), and "Typed quote/antiquote or: Compile-time parsing", Ralf Hinze, 2011, which scales the idea all the way up to LL(1) parsing.

There is also a paper that

There is also a paper that describes how to use memoized CPS to do full context free grammar parsing in polynomial time (in Scheme).

Memoization of Top-down Parsing

This is closely related to tabling in logic programming.

Oh I love that paper

The frustrating thing comes from a different paper I read the other day that referenced this paper. They said that no one references this paper because it's impossible to read.

:/ it's scheme, I find it easy to read. However I can't make head or tail of the more common (these days) sort of notation used by the other paper which, among other things, purported to be translating this paper into something comprehensible. Why is it that CS researchers and programmers can't read each others work?

* Because one speaks Greek and the other speaks Latin


Long live format strings?

Note that OCaml uses functional unparsing nowadays, using GADTs to represent formats.

Right, but if I understand correctly formats are still expressed as format strings in source code, and get translated into the GADT-based representation by the compiler.

On a related note I remember a talk by Vincent Balat some years ago explaining that the ocsigen web framework had a printing library relying on Danvy's functional unparsing (without syntactic sugar) but that they dropped it because the resulting error messages were very poor.

Indeed; and I think it is

Indeed; and I think it is sensible to respect the standard format that users expect. That said, if that had to be redone from scratch, it should use a different syntax than string literals (strings with an explicit marker that those are format strings) to avoid making the type of string literals ambiguous.