Invertible Syntax Descriptions: Unifying Parsing and Pretty Printing

In Invertible Syntax Descriptions: Unifying Parsing and Pretty Printing, Rendel Tillmann and Klaus Ostermann at the University of Marburg, Germany apply the "don't repeat yourself" principle to parsers and pretty printers.

Parsers and pretty-printers for a language are often quite similar, yet both are typically implemented separately, leading to redundancy and potential inconsistency. We propose a new interface of syntactic descriptions, with which both parser and pretty-printer can be described as a single program. Whether a syntactic description is used as a parser or as a pretty-printer is determined by the implementation of the interface. Syntactic descriptions enable programmers to describe the connection between concrete and abstract syntax once and for all, and use these descriptions for parsing or pretty-printing as needed. We also discuss the generalization of our programming technique towards an algebra of partial isomorphisms.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Partial isomorphisms

I think their idea of partial isomorphisms is the coolest thing in the paper. It will be interesting to see what else can be built with that.


What would happen if we redefined constructors at the language level to be partial isomorphisms like these?

It seems that might eliminate the need (in Haskell-like languages) to have a namespace for constructors beside the normal namespace. When constructor names appear in pattern contexts they would be sugar for the elimination side of the isomorphism, and when they appear in application expression contexts they would take on the type of the introduction side of the isomorphism.

A problem might arise because the partial isomorphisms described in the paper are partial in both directions. A version that was total in the introduction direction and partial in the elimination directory seems better suited to this use.

The issue after that would be that the the introduction side of a type like Iso (Rational, Rational) Complex has type (Rational, Rational) -> Complex instead of Rational -> Rational -> Complex that is used now. This might be insurmountable without bigger language changes, but maybe it could be spackled over with a form like Constructor (TCons Rational (TCons Rational TNil)) Complex and a type-level function to change the list of argument types and the result type into the usual curried function type.

Languages with extensible records might be able to benefit even more from a change like this by viewing constructors as partial isomorphisms between a suitable record of arguments and the constructed type.

Stack-based iso constructors

Martijn van Steenbergen and I are working on something like this, inspired by wanting to have a bidirectional URL parser/printer. (So the webserver is guaranteed to be able to handle the URLs that are generated.)

The repository is here:

We're using a stack to push and pop values on/off. Your example would become Iso (Rational :- Rational :- r) (Complex :- r), i.e. the constructor takes 2 Rationals from the stack and pushed a Complex value on the stack, and the destructor does the reverse.

Your most recent commit

says you have some ideas inspired by a JSON router. Which one?

Personally, I don't understand the point behind your bidirectional URL parser/printer (the Sitemap pattern). I also don't understand the point behind Rails' Routing and the many imitations since the basic idea for routing actions was done in Struts 1.0.

To me, the routing system should be an object graph with dynamically removable and addable nodes. This is because the real problem is the user experience in the browser and the clumsiness of how hyperlinks are navigated in a browser; one of the huge design flaws in modern browsers, IMHO. Edit: I guess if I were to formalize this, I would do it using graph grammars and not have a statically defined parser somewhere.

Reminds me of abstract interpretation

....although to be honest so do many other things.

In AI there is a reversable mapping between two domains: on one side you have many values that are "the same" and a Galois Connection to describe the composition of the maps. The domains generally describe data.

In this work there is a reversable mapping between two domains: on one side you many values that are "the same" (different string representations of the same code) and if I've read the paper properly then the isomorphism is preserving that equality relation under composition of the maps. The large difference seems to be that the domains are describing code.

Seems very cool. It will require much more thinking about.

Wanted: Pointful Syntax

This is a great idea. But the code is terse and hard to follow. What we really need is an syntax that brings clarity to code build using the unified parsing and pretty-printing framework, analogous to comprehension syntax bringing clarity to monad operations. I think Douglas McClean has the right underlying idea for this. For primitive parser combinators, you'd write the isomorphism explicitly as in the paper; but for user parsers you'd use something that looks like enhanced pattern-matching to define both the forward-operation (parsing) and inverse-operation (printing) pointfully. This requires further investigation.

Partial logic language features

Looking at this from another perspective, if you wrote just the parser in a logic language like Prolog (paying careful attention to the other of the joins), you could construct it reversibly so that given a known result you can solve for a canonical string that produced it.

The authors do address logic programming as an alternative...

...but they're not optimistic about it. In section 4.3, they write, "in contrast to logic programming[, our model of] invertibility can actually be made to work in practice."

I'm a little more hopeful. It wouldn't necessarily be easy -- but not every logic programming system uses plain old DFS, and not every logic program has to use impure features.

Logic Programming

I've used this approach previously.