Research in Programming Languages

Interesting blog post by Crista Lopes. Here is some text from the bottom that struck a chord with me:

In order to do experimental design research AND be scientifically honest at the same time, one needs to let go of claims altogether. In that dreadful part of a topic proposal where the committee asks the student “what are your claims?” the student should probably answer “none of interest.” In experimental design research, one can have hopes or expectations about the effects of the system, and those must be clearly articulated, but very few certainties will likely come out of such type of work. And that’s ok! It’s very important to be honest. For example, it’s not ok to claim “my language produces bug-free programs” and then defend this with a deductive argument based on unproven assumptions; but it’s ok to state “I expect that my language produces programs with fewer bugs [but I don't have data to prove it].” TB-L’s proposal was really good at being honest.

We've talked a little about programming language design research before.

What does focusing tell us about language design?

A blog post about Call-By-Push-Value by Rob Simmons: What does focusing tell us about language design?

I think that one of the key observations of focusing/CBPV is that programs are dealing with two different things - data and computation - and that we tend to get the most tripped up when we confuse the two.

  • Data is classified by data types (a.k.a. positive types). Data is defined by how it is constructed, and the way you use data is by pattern-matching against it.
  • Computation is classified by computation types (a.k.a. negative types). Computations are defined by their eliminations - that is, by how they respond to signals/messages/pokes/arguments.

There are two things I want to talk about, and they're both recursive types: call-by-push-value has positive recursive types (which have the feel of inductive types and/or algebras and/or what we're used to as datatypes in functional languages) and negative recursive types (which have the feel of recursive, lazy records and/or "codata" whatever that is and/or coalgebras and/or what William Cook calls objects). Both positive and negative recursive types are treated by Paul Blain Levy in his thesis (section 5.3.2) and in the Call-By-Push-Value book (section 4.2.2).
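To make the distinction concrete, here is a minimal OCaml sketch. It is only an approximation: OCaml does not separate value types from computation types the way CBPV does, and the names `int_list`, `stream`, `from` and `take` are invented for illustration. The positive type is built from constructors and consumed by pattern matching; the negative type is a recursive record of suspended computations that only does work when one of its fields is invoked.

```ocaml
(* Positive recursive type: defined by its constructors,
   used by pattern matching. *)
type int_list = Nil | Cons of int * int_list

let rec sum (l : int_list) : int =
  match l with
  | Nil -> 0
  | Cons (x, rest) -> x + sum rest

(* Negative recursive type: a recursive record of suspended computations,
   defined by how it responds to the "head" and "tail" messages. *)
type stream = { head : unit -> int; tail : unit -> stream }

(* The infinite stream n, n+1, n+2, ...; nothing runs until a field is called. *)
let rec from (n : int) : stream =
  { head = (fun () -> n); tail = (fun () -> from (n + 1)) }

(* Observing the first k elements of a stream turns computation into data. *)
let rec take (k : int) (s : stream) : int_list =
  if k <= 0 then Nil else Cons (s.head (), take (k - 1) (s.tail ()))

let () = assert (sum (take 4 (from 1)) = 10)
```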

In particular, I want to claim that Call-By-Push-Value and focusing suggest two fundamental features that should be, and generally aren't (at least simultaneously) in modern programming languages:

  • Support for structured data with rich case analysis facilities (up to and beyond what are called views)
  • Support for recursive records and negative recursive types.

Previously on Rob's blog: Embracing and extending the Levy language; on LtU: Call by push-value, Levy: a Toy Call-by-Push-Value Language.

And let me also repeat CBPV's slogan, which is one of the finest in PL advocacy: Once the fine structure has been exposed, why ignore it?

Parametric Grammars

I am curious why it seems there is little or no research on adapting grammars and parsers to support parametric polymorphism. It seems to me that grammars and types are more or less the same thing, and technology related to polymorphic type systems should apply to grammars and parsing as well.

Existing grammar technology has woeful compositional properties compared to type systems. I am currently using Dypgen which allows dynamic extension of a grammar. But first let me backtrack a bit:

Suppose you have an executable recursive descent parser for statements, where the parser accepts a list of statement forms and tries each one until one succeeds. If you put that list in a global variable, it is easy to extend the system at run time: construct a suitable data structure for parsing the new statement, push it onto the statement list, and store the resulting list back in the global variable.

The use of a global variable here rather than a weak functional technique is mandatory when you consider that some statements may be composed from others, and we want the recursion to extend nested statements to include the new production too.
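A minimal sketch of that arrangement, assuming a pre-tokenized input (a list of strings) and a generic tree as the result; `stmt_parser`, `statement_forms`, `extend` and the three sample forms are hypothetical names, not anything from Dypgen. The point to notice is that `block_form` recurses through `parse_stmt`, which reads the global list at call time, so a form registered later is also accepted in nested positions.

```ocaml
type ast = Node of string * ast list

(* A statement form consumes tokens and returns a tree plus the leftover tokens. *)
type stmt_parser = string list -> (ast * string list) option

(* Global, mutable list of statement forms. *)
let statement_forms : stmt_parser list ref = ref []

(* Try each registered form in order until one succeeds. *)
let parse_stmt : stmt_parser = fun toks ->
  let rec try_forms = function
    | [] -> None
    | form :: rest ->
      (match form toks with
       | Some _ as result -> result
       | None -> try_forms rest)
  in
  try_forms !statement_forms

let skip_form : stmt_parser = function
  | "skip" :: rest -> Some (Node ("skip", []), rest)
  | _ -> None

(* "begin" stmt "end": the nested statement goes back through parse_stmt. *)
let block_form : stmt_parser = function
  | "begin" :: rest ->
    (match parse_stmt rest with
     | Some (body, "end" :: rest') -> Some (Node ("block", [body]), rest')
     | _ -> None)
  | _ -> None

let () = statement_forms := [skip_form; block_form]

(* Run-time extension: push a new form and store the list back. *)
let extend (form : stmt_parser) : unit =
  statement_forms := form :: !statement_forms

let halt_form : stmt_parser = function
  | "halt" :: rest -> Some (Node ("halt", []), rest)
  | _ -> None

let () =
  (* Before the extension, "halt" is rejected, even when nested. *)
  assert (parse_stmt ["begin"; "halt"; "end"] = None);
  extend halt_form;
  (* After it, the old block form accepts the new statement inside itself. *)
  assert (parse_stmt ["begin"; "halt"; "end"]
          = Some (Node ("block", [Node ("halt", [])]), []))
```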

Now as to Dypgen: it is better because it is purely functional. After a new production for a statement is added, it rebuilds the parser engine, so the recursion required to support nested statements still works.

BUT .. we are still adding a new production to a statement, which is similar to hacking an OCaml variant type and adding a new case, then recompiling. It's not the recompilation that concerns me here, but the fact that we're forced to modify the old grammar to extend it.

The thing is, the *right way(tm)* to do this would seem to be open recursion: in OCaml you can define polymorphic variants with a type parameter standing for the recursive positions, and close that parameter to form a concrete type. For an extension you add new variants to the open form and then close that. With this technology we have real subtyping: we have a type which is open for modification, and can be trivially closed for use, thus satisfying the open/closed principle.
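For readers who have not seen the idiom, here is a small sketch of it on an expression type rather than a grammar. The names `base`, `ext`, `eval_base`, `eval_ext`, `eval` and `eval2` are made up for this example. The recursive positions are a type parameter, the evaluator takes the recursive call as an argument, and "closing" is just tying the knot with `let rec`.

```ocaml
(* Open form: the recursive positions are the type parameter 'a. *)
type 'a base = [ `Num of int | `Add of 'a * 'a ]

(* Open evaluator: recursion goes through the [self] argument. *)
let eval_base (self : 'a -> int) (e : 'a base) : int =
  match e with
  | `Num n -> n
  | `Add (a, b) -> self a + self b

(* Closing the recursion: the argument type is the recursive type
   ('a base as 'a). *)
let rec eval e = eval_base eval e

(* Extension: a new open form that includes the old one, untouched. *)
type 'a ext = [ 'a base | `Neg of 'a ]

let eval_ext (self : 'a -> int) (e : 'a ext) : int =
  match e with
  | #base as b -> eval_base self b   (* old cases are delegated, not copied *)
  | `Neg a -> 0 - self a

(* Closing the extended form. *)
let rec eval2 e = eval_ext eval2 e

let () =
  assert (eval (`Add (`Num 1, `Num 2)) = 3);
  (* The new case is accepted inside an old case. *)
  assert (eval2 (`Add (`Neg (`Num 1), `Num 5)) = 4)
```

Nothing in `base` or `eval_base` was edited to add `Neg`, yet the old `Add` case recurses into the new one. That is the property the extensible grammar is missing.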

Why can't we do this for grammars?

There are some really trivial uses for this. Dypgen already supports 3 polymorphic operators, namely *, + and ?. But now suppose I want to define "comma separated list of arbitrary-nonterminal", which in fact I need a lot.

I'm being asked to use a technology so seriously archaic it is worse than Basic or Cobol: it doesn't even have "subroutines". What I need here is actually quite flat: it needs parametric grammars, though not open recursion.
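To show what such a "subroutine" looks like once the grammar lives in a language with parametric polymorphism, here is a hedged sketch using hand-rolled parser combinators over a token list. This is not Dypgen syntax; the type `p` and the functions `comma_list`, `ident` and `number` are assumptions made up for the example.

```ocaml
(* A parser for values of type 'a over a list of string tokens. *)
type 'a p = string list -> ('a * string list) option

(* Parametric nonterminal: comma-separated list of any [item] parser.
   An empty list is accepted; a trailing comma is left unconsumed. *)
let comma_list (item : 'a p) : 'a list p = fun toks ->
  let rec more acc toks =
    match toks with
    | "," :: rest ->
      (match item rest with
       | Some (x, rest') -> more (x :: acc) rest'
       | None -> (List.rev acc, toks))
    | _ -> (List.rev acc, toks)
  in
  match item toks with
  | None -> Some ([], toks)
  | Some (x, rest) -> Some (more [x] rest)

(* Two unrelated nonterminals to instantiate it with. *)
let ident : string p = function
  | x :: rest
    when x <> "" && (match x.[0] with 'a'..'z' | 'A'..'Z' -> true | _ -> false) ->
    Some (x, rest)
  | _ -> None

let number : int p = function
  | x :: rest ->
    (match int_of_string_opt x with
     | Some n -> Some (n, rest)
     | None -> None)
  | [] -> None

let () =
  assert (comma_list number ["1"; ","; "2"; ","; "3"] = Some ([1; 2; 3], []));
  assert (comma_list ident ["x"; ","; "y"; ")"] = Some (["x"; "y"], [")"]))
```

`comma_list` is written once and reused at two result types, which is exactly what the built-in *, + and ? operators get to do and user-defined productions do not.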

Given the huge amount of research into type systems .. why am I still using Assembler to write my grammars?