Adequate bootstrap for compiler with defmacro?


I am writing a Lisp->JavaScript compiler (link), and I am at the point where I want to add defmacro to the language, so that higher-level, convenient operators like let can be added to the language in terms of simpler, core forms.

The boostrap problem is this: the macro bodies need a sizable portion of the language as support code (e.g. AST objects, collections, map, other utilities...), but I don't want to write that part of the language without the convenient operators afforded by macros.

I see a couple of possible solutions:

  1. Add a non-turing-complete macro system (e.g. syntax-rules), and write the convenience macros in that system;
  2. Write the convenience macros as built-ins in the compiler's host language;
  3. Use some kind of external tool, e.g. pipe my S-expressions through a Scheme and utilize its macro system;
  4. Bite the bullet and write the defmacro support code without the convenience macros.

Any suggestions? Thanks,

-- Manuel

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

negative suggestion

(4) is "giving up". (2) is "giving up". (3) is "somebody else's problem". So: pick (1).


A different sort of challenge

Well, (4) isn't really giving up; it's tackling the problem the hard way. It's still an interesting problem, though, because you can set yourself the goal of minimizing the code: figure out your dependencies, and order the tasks so that you build the more powerful pieces first. For example, (map) can be written pretty easily if you've got tail recursion; then you can use it instead of writing loop macros.

I recently wrote an interpreter using a combination of these strategies. For my unit tests, I constructed the ASTs in the host language (C++); but that was impractical for the larger sort of functions I needed to write, so I needed a parser. I didn't want to write the parser in the C++, but I couldn't write it in the target language until I had a parser to read it. So I wrote it in Scheme syntax (almost the same, except for characters) and wrote a tiny Scheme program to reader the parser code and generate C++ that would build the AST of the parser code. Since the parser couldn't rely on most of my language's library yet, it was a nice challenge.

I also made sure to check in the generated C++; if I ever distributed the language, people would be able to build it without having a Scheme interpreter around, unless they needed to modify the parser.


Sure, there can be good reasons to use any of the approaches. I was a little bit tongue in cheek just encouraging bootstrapping efforts to minimize dependencies and work because that's the direction in which lastingly elegant solutions are found.


edit: (oops... misthreaded... this was a reply to "different kind of challenge")

This sounds like too much infrastructure

The problem here may be that you are writing a compiler rather than an interpreter. In "real" LISP implementations, all that is needed is CONS cells, NIL, and atoms. If you are thinking about ASTs, you are too far up the processing chain to be doing macros in a LISP system.

Doing macros well is tricky. It requires that the compiler be able to execute a target-compatible interpreter.

Look ma, no interpreter

Doing macros well is tricky. It requires that the compiler be able to execute a target-compatible interpreter.

I have implemented a Lisp->JavaScript compiler, including macros, without an interpreter.

Here's what I did:

  1. I defined a minimal ("Lisp-2") kernel language with operators such as %%lambda (the usual, with Common Lisp-like optional and rest arguments), %%defvar (for creating a global variable binding), %%defun (for creating a global function binding), and %%defmacro (for mapping a function name to an expander function, that takes a form and produces a form). This kernel language is compiled to JavaScript.

    For example, (%%lambda (x) x) is compiled to the following JavaScript code (_key_ is a calling convention argument).

    function(_key_, _v_x){ lisp_arity_min_max(arguments.length, 2, 2); return ((_v_x)); }
  2. Using the target language (JavaScript), I defined datatypes and some operations for the different kinds of forms (string forms, symbol forms, number forms, compound (list) forms). Among the operations are things like %%compound-elt (for accessing the nth element of a list form), %%compound-slice (for taking a slice out of a list form), and %%compound-apply (for destructuring a form).
  3. In JavaScript, I wrote a quasiquote implementation, based on Alan Bawden's "Quasiquotation in LISP."
  4. With these basics in place, I can bootstrap a comfortable defmacro in terms of the lowlevel %%defmacro:
    ;;; (defmacro name sig &rest body)  [defmacro-form]
    ;;;  0        1    2         3
    (%%defmacro defmacro
      (%%lambda (defmacro-form)
        `(%%defmacro ,(%%compound-elt defmacro-form 1)
           (%%lambda (%%form)         
               (%%lambda ,(%%compound-elt defmacro-form 2)             
                   ,@(%%compound-slice defmacro-form 3)))           
               (%%compound-slice %%form 1))))))

    The outer %%defmacro sets defmacro to a function that takes as input a defmacro form (of the shape described in the comment at the top), and returns its expansion in terms of the lowlevel, built-in %%defmacro.
    "Calls" to the defmacro macro result in %%defmacro calls, that use the %%compound-apply destructuring operation, thereby providing Common Lisp-like destructuring lambda lists to macro writers.

  5. Now, I can start defining other macros, e.g.
    (defmacro when (test &rest consequent)
      `(if ,test (progn ,@consequent) null))
    (defmacro unless (test &rest alternative)
      `(if ,test null (progn ,@alternative)))

I think the same approach could also be used when targetting C or machine language – I would need to use dynamic loading, but it looks as if a separate interpreter can be avoided by structuring the bootstrap in a certain way.

Just to nitpick

I think that syntax-rules actually is turing complete.