Making a one-pass compiler by generating fexprs that generate code

I'm starting to write a simple compiler that transcompiles a small scripting language for general game playing (at least for chess-like games and a few other board games) into C, which will then be compiled in memory with the Tiny C compiler's library (libtcc).

I noticed that there's a mismatch between the order in which parser generators trigger actions and the order in which tree nodes need to be visited so that identifiers can be type-checked and used to generate code.

Parser generators trigger actions from the bottom up. Ideally, when you generate code, you visit the tree's nodes in whatever order gives you the type information before you encounter the identifiers used in an expression. Since fexprs let you control the order in which you visit the parts of an inner expression, they're perfect for that.

So I'm writing my parser so that the parse generates an s-expression containing fexprs which, when run, semantically check the program and transcompile it in a single pass.
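For concreteness, here is a minimal toy sketch (invented names like Expr, Fexpr, and "let" — not the real compiler) of the key mechanism: a fexpr receives its operand trees unevaluated and calls eval itself, so the fexpr, not the parser's bottom-up action order, decides when each subtree is visited.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <variant>
#include <vector>

// An expression node: a number, a symbol, or a call (list of subexpressions).
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;
struct Expr { std::variant<double, std::string, std::vector<ExprPtr>> node; };

ExprPtr num(double v)                { return std::make_shared<Expr>(Expr{v}); }
ExprPtr sym(std::string s)           { return std::make_shared<Expr>(Expr{std::move(s)}); }
ExprPtr list(std::vector<ExprPtr> v) { return std::make_shared<Expr>(Expr{std::move(v)}); }

using Env = std::map<std::string, double>;
double eval(const ExprPtr& e, Env& env);

// A fexpr gets the RAW operand trees plus the environment; it evaluates
// (or skips, or reorders) operands explicitly.
using Fexpr = std::function<double(const std::vector<ExprPtr>&, Env&)>;

std::map<std::string, Fexpr> ops = {
    // "let" forces the binding to be processed BEFORE the body that uses it,
    // regardless of the order the parser produced the nodes.
    {"let", [](const std::vector<ExprPtr>& o, Env& env) {
        env[std::get<std::string>(o[0]->node)] = eval(o[1], env);
        return eval(o[2], env);
    }},
    {"+", [](const std::vector<ExprPtr>& o, Env& env) {
        return eval(o[0], env) + eval(o[1], env);
    }},
};

double eval(const ExprPtr& e, Env& env) {
    if (auto* n = std::get_if<double>(&e->node)) return *n;
    if (auto* s = std::get_if<std::string>(&e->node)) return env.at(*s);
    auto& call = std::get<std::vector<ExprPtr>>(e->node);
    std::vector<ExprPtr> rest(call.begin() + 1, call.end());
    return ops.at(std::get<std::string>(call[0]->node))(rest, env);
}
```

In a compiler, eval would return generated code and type information instead of a number, but the control-of-order idea is the same.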

This also suggests a new version of Greenspun's 10th rule:
A sufficiently complex C++ program will contain an informally specified, buggy and incomplete implementation of John Shutt's Kernel language.


Looking forward

I'm fascinated. I've suspected for years that fexprs might make a good substratum for compilation, perhaps giving the programmer especially natural degrees of freedom; but I haven't explored it, so I couldn't explain why a device that collapses the compile-time/run-time distinction would be good for compilation. Now that you say it, it seems natural that introducing fexprs into compilation would tend to collapse the compilation process into a single pass.

I'm also reminded of the mirror-image analogy I noticed, in passing, between fexprs and "micros" (the topic of Shriram Krishnamurthi's dissertation — Shriram having been on my dissertation committee). From my dissertation (page 7):

Each fexpr specifies a computation directly from the (target-language) operands of the fexpr call to a final result, bypassing automatic operand evaluation and thus giving the programmer complete semantic control over computation from the target language. [...]

A remarkably parallel statement can be made about micros ([Kr01]): each micro specifies a translation directly from the (source-language) operands of the micro call to a target expression, bypassing automatic operand translation and thus giving the programmer complete syntactic control over translation from the source language. Thus, micros are to translation what fexprs are to computation. The parameters of the analogy are that micros bypass processing that would happen after a macro call, and are inherently syntactic; while fexprs bypass processing that would happen before a procedure call, and are inherently semantic.

The analogy also extends to internal mechanics of the devices: micros, as treated in [Kr01], rely heavily on a function dispatch that explicitly performs source translations, compensating for the loss of automatic operand translations — just as fexprs (treated here) rely heavily on a function eval that explicitly performs evaluations, compensating for the loss of automatic operand evaluations.
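The mirror image can be made concrete with a toy sketch (invented names; Operand, micro_add, and fexpr_add are illustrations, not anything from [Kr01] or the Kernel report): both combiners receive raw operands, but the micro-style one calls an explicit translate to produce target text, while the fexpr-style one calls an explicit eval to produce a value.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A toy operand carrying both a source form and a runtime value.
struct Operand { std::string src; int val; };

// Micro-style combiner: receives UNtranslated source operands and calls an
// explicit translate() itself, giving syntactic control over translation.
std::string translate(const Operand& o) { return "(" + o.src + ")"; }
std::string micro_add(const std::vector<Operand>& ops) {
    return translate(ops[0]) + " + " + translate(ops[1]);  // emits target text
}

// Fexpr-style combiner: receives UNevaluated operands and calls an explicit
// eval_op() itself, giving semantic control over computation.
int eval_op(const Operand& o) { return o.val; }
int fexpr_add(const std::vector<Operand>& ops) {
    return eval_op(ops[0]) + eval_op(ops[1]);              // computes a value
}
```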

In the context of using fexprs in compilation, I might say fexprs face forward to the output, while micros look backward to the input.

Tree traversal control

I probably could generate code without fexprs if the syntax of my language always put the type information sufficiently before the use of the variables, so that the types are completely parsed before the expressions are; and in my language that's no doubt true.

But in that case I'd have to write the compiler out of combiners that don't care what order they run in. I'm sure that's been done a lot of times.

But that whole process just feels like a kludge.

Using fexprs I can combine a deterministic tree-traversal order with the code generation, and do it succinctly.

It's the succinctness of combining the tree-traversal policy with everything else that's so nice.

Also s-expressions are ASTs instead of CSTs

The other nice thing here is the preprocessing that happens while building the s-expressions.

Those s-expressions form an actually ABSTRACT syntax tree, rather than the concrete syntax tree that parsers can generate automatically. It contains only the necessary elements.

It is a two-stage process, so maybe you could call it a two-pass compiler, sort of.

That first stage elides unnecessary information from the parse.

I could give an example from my test code.
the type:
goes through multiple levels of parse nodes to build the lists, but in the end the lists build nothing more than this (actual C++):

L(f_function_type,
  L(L(L(f_simple_type, L("type", "real"), "double"),
      L(LVar("a",1), LVar("b",1), LVar("c",1))),
    L(L(f_simple_type, L("type", "string"), "char *"),
      L(LVar("d",1), LVar("e",1), LVar("f",1)))),
  L(f_simple_type, L("type", "integer"), "int"));

Where L is a list builder, LVar(string, int) builds a token that knows its line number (1 being a placeholder in this case), and the functions that start with "f_" are fexprs.
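A hedged sketch of what such helpers could look like (the shapes of Node and Token here are guesses for illustration, not the real definitions of L and LVar):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <variant>
#include <vector>

// A node is either a token (text plus source line) or a list of nodes.
struct Node;
using NodePtr = std::shared_ptr<Node>;
struct Token { std::string text; int line; };
struct Node { std::variant<Token, std::vector<NodePtr>> v; };

// LVar builds a token that remembers the line it came from.
NodePtr LVar(const std::string& text, int line) {
    return std::make_shared<Node>(Node{Token{text, line}});
}

// L builds a list node from any number of child nodes.
template <typename... Args>
NodePtr L(Args... args) {
    return std::make_shared<Node>(Node{std::vector<NodePtr>{args...}});
}
```

In the real version, L presumably also accepts string literals and fexpr objects directly; this sketch only keeps the list-and-token skeleton.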

The lists that start with strings aren't expressions because they're parameters to fexprs that will never be evaluated.

Also, note that the first parameter to f_function_type isn't an expression; it's a list of lists where only part of each element is an expression, and that fexpr knows which elements of the list to eval.

The fexprs are actually building and returning complex objects that know how to generate code, know what the types are, and, in the case of function types, have symbol tables. If you defined the same parameter name twice, f_function_type would throw an exception.
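The duplicate-name check can be sketched like this (assumed behaviour, not the actual f_function_type): the object the fexpr returns owns a symbol table, and inserting a parameter name twice throws.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Sketch of an object returned by a function-type fexpr: it carries a
// symbol table mapping parameter names to their generated C types.
struct FunctionType {
    std::map<std::string, std::string> symbols;  // param name -> C type

    void add_param(const std::string& name, const std::string& ctype) {
        // map::emplace fails (returns false) if the key already exists,
        // which is exactly the duplicate-parameter case.
        if (!symbols.emplace(name, ctype).second)
            throw std::runtime_error("duplicate parameter: " + name);
    }
};
```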

Composable Type Inference

If I understand correctly, there is another way to do this (one-pass compiling), which is composable type inference. If any program fragment can be given a type in isolation, and when you compose program fragments the type of the composition depends only on the types of the fragments (so you have principal typings), then the types can be inferred and checked bottom-up, in the same order the parser produces the AST.
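A minimal sketch of the bottom-up idea, using a monomorphic stand-in for HM inference (Typed, lit_int, and add are invented for illustration): each fragment carries its own type, and composing fragments consults only the children's types, so checking and code generation can ride along with the parser's bottom-up actions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// A typed fragment: its synthesized type plus the code generated for it.
struct Typed { std::string type; std::string code; };

Typed lit_int(const std::string& s)  { return {"int", s}; }
Typed lit_real(const std::string& s) { return {"double", s}; }

// Composition depends ONLY on the fragments' types, not on any outer
// context -- the property that makes bottom-up, one-pass checking work.
Typed add(const Typed& a, const Typed& b) {
    if (a.type != b.type)
        throw std::runtime_error("type mismatch: " + a.type + " vs " + b.type);
    return {a.type, "(" + a.code + " + " + b.code + ")"};
}
```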

The HM-style type system I designed here:

does this (you can see that the types are generated before each parser returns the AST). I am using the same type system and inference in my scripting language:

However, here I am using separate passes to keep things simple, but I will probably move to the single-pass approach I used in the C++ version in the future.