LtU Forum

On constness

I'm trying to decide how I want to approach the issue of immutability in my language. So far, I've identified three general approaches.

Obviously, there are the purely functional languages, in which there is no mutability at all. I'm not interested in that approach, mainly because my focus is on multi-paradigm programming and that approach is too narrow for me.

The second approach is that used by C++, where 'const' is a type modifier - that is, a mutable type can be declared read-only after the fact.

The third approach is that taken by Java and many other languages where there are separate mutable and immutable types, which may conform to a common interface but which have separate implementations. For example, you have ArrayList and ImmutableList, both of which implement the List interface.

An advantage of the Java approach is that the immutable implementations can in many cases be simpler and more efficient than their mutable counterparts. A case in point is ImmutableList, which has an optimization whereby empty lists are represented by a reference to a static singleton instance.

On the other hand, it's extra work to have to implement two versions of every collection. That extra work isn't a problem for the most commonly-used containers, but for the long tail of specialized and possibly application-specific types it can be a significant burden.

C++ on the other hand has it's own version of the 'double work' problem, whereby a lot of classes end up having to implement two versions of many methods, a 'const' and 'non-const' version. However, this can be solved by language design - a 'conditional const' type modifier which is const only if the enclosing type is also const. This allows many of these double-definitions to be collapsed.

However, you still have the problem of maximizing efficiency of container implementations. Suppose I have a C++ class X that internally contains a vector. Adding the const modifier to X doesn't switch the implementation of the vector over to a more efficient, immutable version; And even if the language supported it, it wouldn't be very efficient because now you have to add a level of indirection between X and the vector (using virtual methods or some other technique) so that X's implementation can switch between the mutable and immutable implementation of vector based on whether X is mutable.

So I'm wondering if there are any other interesting approaches out there, or if there's a sweet spot that I'm missing.

Jonathan Blow's next foray into game language design

As of about a week ago, Jonathan Blow (creator of various well received games such as Braid as well as various PL efforts) has pair of talks on (later both put on youtube) where he issues a call to arms for game developers to design a new programming language around their concerns. He discusses what he sees as some of the shortcomings of existing languages that are supposed to be C++ replacements, and some of the requirements he feels game devs have. There was a Q&A session as well.
The talks are more practical than PL theoretical, but interesting (and occasionally frustrating) never the less.

Twitter feed recommendation: Meredith Patterson

Meredith tweets under the handle @maradydd and contributes much to the tiny niche comprising the intersection of type theory and security. For example:

  1. She shares links to and comments upon interesting contentshe summarises a 1964 post of Doug McIlroy's asserting four key failures of programming languages. (To recast these concerns as desiderata, I call these the McIlroy Systems-PL Minimum);
  2. Her retweets are sharp, e.g., on requirements gaps in formal specifications of programming languages; and 'It’s almost as if basing systems around an untyped programming language is a bad idea';
  3. And she promotes worthy things like the @typetheorypodcast and neat little sequent calculus tutorials.

Kaya: Declarative Reactive

The presentation of Kaya at the Future of Programming Workshop at the Strange Loop conference.

Kaya is declarative like SQL, reactive like a spreadsheet, and a spreadsheet metaphor is used to render powerful, expressive data structures. Code is not written in a text editor, but instead you compose applications in a spreadsheet-like editor. The resulting contextual nature ameliorates some of the typical need for control structures, and leads to a more natural way to compose applications.

Controlling time and space

Evan's (the Elm guy) strangeloop talk, it is actually quite good (and not bad for Prezi); slides:

Video of talk:

A very nice (and useful) overview about the different flavors of FRP.

Extended Axiomatic Language

Axiomatic language is a formal system for specifying recursively enumerable sets of hierarchical symbolic expressions. But axiomatic language does not have negation. Extended axiomatic language is based on the idea that when one specifies a recursively enumerable set, one is simultaneously specifying the complement of that set (which may not be recursively enumerable). This complement set can be useful for specification. Extended axiomatic language makes use of this complement set and can be considered a form of logic programming negation. The web page defines the language and gives examples.

Seeking artricle on syntax sugar and comparing programming languages

I recall there was an article on LtU that defined term syntax sugar. IIRC the generic idea was that syntax sugar requires only the local transformation. And language features that require non-local program rewrites cannot be called syntax sugar and it can be core of argument that one language is higher-level then another.

Optimisation by repeated beta- and eta-reduction

The following post recently showed up on Planet Haskell: Morte: an intermediate language for super-optimising functional programs. From the post:

Now suppose there were a hypothetical language with a stronger guarantee: if two programs are equal then they generate identical executables. Such a language would be immune to abstraction: no matter how many layers of indirection you might add the binary size and runtime performance would be unaffected.

Here I will introduce such an intermediate language named Morte that obeys this stronger guarantee.

The typed lambda calculus possesses a useful property: every term in the lambda calculus has a unique normal form if you beta-reduce everything. If you're new to lambda calculus, normalizing an expression equates to indiscriminately inlining every function call.

I am worried about this because the author explicitly wishes to support both folds and unfolds, and, according to Catamorphisms and anamorphisms = general or primitive recursion, folds and unfolds together have the expressive power of general recursion—so that not every term has a normal form (right?). More generally, it seems to me that being able to offer the strong guarantee that the author offers implies in particular a solution to the halting problem, hence a non-Turing-complete language.

Later, the author says:

You can take any recursive data type and mechanically transform the type into a fold and transform functions on the type into functions on folds.
I have a similar concern here; it seems to me to be saying that folds can express general recursion, but I thought (though I don't have a reference) that they could express only primitive recursion.

Have I got something badly conceptually wrong?

Re-thinking Prolog

A recent paper by Oleg Kiselyov and Yukiyoshi Kameyama at the university of Tsukuba discusses weaknesses and areas for improvement to Prolog.

Quite many computations and models are mostly deterministic. Implementing them in Prolog with any acceptable performance requires the extensive use of problematic features such as cut. Purity is also compromised when interfacing with mainstream language libraries, which are deterministic and cannot run backwards. Divergence is the constant threat, forcing the Prolog programmers to forsake the declarative specification and program directly against the search strategy. All in all, Classical Prolog is the exquisite square peg in the world with mostly round holes

The strong points of Prolog can be brought into an ordinary functional programming language. Using OCaml as a representative, we implement lazy guessing as a library, with which we reproduce classical Prolog examples. Furthermore, we demonstrate parser combinators that use committed choice (maximal munch) and can still be run forwards and backwards. They cannot be written in Classical Prolog. Logic variables, unification, and its WAM compilation strategy naturally emerge as a "mere optimization" of the Herbrand universe enumeration.

The paper mentions the strength of the approach used by miniKanren (which embeds logic programming with fairer search strategy than normal Prolog into Scheme) and Hansei (which embeds probability based nondeterminism into Ocaml using delimited continuations to allow direct-style expression of monadic code).

After motivating some choices by studying the prototypical example of running append backwards they cover running parsers with "maximal munch" rule backwards - something that cannot be (declaratively) expressed in prolog.

A very interesting paper on logic programming! It also thanks Tom Schrijvers of CHR fame at the end.

Request For Advice and Guidance On Writing a Scheme To C Compiler?

Dear Lambda The Ultimate,

I've been working very hard recently on writing something I thought would be entirely straightforward: A scheme to C compiler that bootstraps - nothing I do seems to work. On my first attempt (which was able to compile and run its own parser) there was no GC or tail calls so it demanded >8GB of memory and failed, on my next the emitted code blows up so huge that it grinds to a halt. I hope someone could give me some advice on what I should do or what I need to learn to complete this? It would also be interesting to discuss and read about other members experiences.


I've been interested in programming languages since I started programming a long time ago, and have explored and written parsers, type checkers and interpreters for a fair selection of different types of programming languages like lisp (of course!), prolog (and minikanren), a cut down version of haskell with hindly milnor type checking as well as some unusual things like linear and reversible programming languages.

A long time ago I read Marc Feeley - 90 Minute Scheme to C compiler when it came up on this site and it was fascinating to understand (or so I thought) how a scheme compiler could be done, in particular using the CPS transform to deal with tail calls and call-with-current-continuation. This got me interested in continuations and I studied Appel - Compiling with Continuations but I didn't write a compiler back then.


Since then I've read about Abstracting and Representing Control from Olivier Danvy and written metacircular interpreters that emit continuation semantics (pure lambda terms without any control operators). I also came to prefer delimited continuations rather than call-with-current-continuation and decided that I would like to implement those directly in my own compiler. I noticed that in Marc Feeley's compiler he was able to just use the continuation semantics of call-with-current-continuation as is because it's already in CPS form, on the other hand the continuation semantics for shift and reset are not in CPS form so this technique cannot be used: one would have to CPS transform twice to get the necessary metacontinuations - but this also causes a huge blowup in the number of lambda terms. So I studied other approaches to direct implementation of shift/reset:

In the end I felt like building a runtime for my compiler to target (in C) that handled delimited continuations was something I could add later on but for now I should forget about that (and if it was require in bootstrapping I had an easy way to interpret them), so I just used Matt Might's very clear and simple explanation of hybrid higher order CPS transform How to Compile with Continuations.

Getting Started

When the got the idea to write a self hosting scheme compiler I quickly wrote out a parser that's able to parse as much of scheme as I needed as well as the file itself. Then I initially came up with something very simple that just performed closure conversion: It was a surprise to me that this was basically enough to execute lambda using C procedures for code and vectors for their environment - it just lacked garbage collection and tail call elimination. I pushed on with this hoping lack of optimisations wouldn't be an issue ending up with a system that had one compiler pass per file as well, some utility functions and a script that chained them all together. This quite successfully compiles all the small simple scheme programs I gave to it (getting a program with the Y combinator to compile and run for the first time was very exciting!) but it just isn't good enough to bootstrap.

So I looked back a bit and read over Guy Steele's RABBIT paper and decided to start again from scratch. This time I started with a C runtime that had a simple two space copy and collect GC which uses the stack as the root - the stack itself is executed by a trampoline (a while loop that pops a continuation off the stack and calls it) and the continuations pop arguments off the stack, compute something with them then push a continuation onto the stack and return. I wrote some tests for the runtime: for example compiling recursive and iterative factorial functions by hand. The value of this approach is that you get tail call optimisation "for free" it only requires doing a CPS transform in the compiler.

So the overall architecture of the compiler is now:

  • Parse a scheme file to get the s-expression tree
  • "desugaring" - I have looked into syntactic closures and I think that I would like to build a good macro system out of this in future but to start with I decided to just hard code expansions for all the special forms.
  • mutability analysis - since I use what I found out later is called "flat-closures" I had to box variables that SET! is used on
  • continuation passing style transform - this makes every call a tail call
  • lambda lifting - I added this optimisation at the end but it didn't help at all [it looks for in place lambda applications like ((lambda (var ..) body ...) param ...) and adds its free variables as extra parameters - this means that it can stack allocate them instead of heap allocate then]
  • closure conversion - this removes all free-variables from the code by creating closures (code & environment vector pairs that holds the closed over variables)
  • hoisting - I give every lambda a name and move it to the top level since the code part of each closure will be implemented as a C function
  • c-gen - this translates continuation passing procedure calls into stack machine operations that my runtime can execute, it also handles taking things off the stack to allocate them on the heap

after that actual C source code is emitted and some simple programs like an infinite lazy stream of fibonacci numbers do execute (with the correct space usage i.e. TCO and GC are working) - but when I started to head towards bootstrapping just adding a recursive definition of the EQUAL? function blow up horrible creating 40k lines of C.

Looking for more to read

I've looked for more modern things to read that might help me build a self hosting scheme compiler like:

and tried to find short readable implementations (studying real world compilers is just too hard since they are so complex and enormous)


but all in all I just don't know what I'm missing - I don't want to give up though because I think there's an important difference between understanding something and actually doing it. In particular I feel like I can't progress in my study of programming languages until I pass this block (I want to experiment with building compilers using Futamura projections like PyPy does after I complete this for one thing).

XML feed