C-Machine

I semi-formalized some part of the translation process for a compiler I am working on. I need people, to do my homework, to shoot as many holes in it as possible, and possibly supply references. I am aware of Thompsons work, but I'll admit I never really got past his elaborate notation.

Below a convenience notation I use. Please read it as a preview. I could have done it shorter, the full notation and evaluation process is elaborated as you read. I should rewrite it such that it introduces the full notation in a few paragraphs. Excuses for that.

A DOT Evaluator
The gist of DOT evaluation, semi-formal. A DOT evaluator is a combinatorial evaluation strategy used as an intermediate notation to make clear how thunks can be set up and reduced.

Note that there is nothing new about this notation. It is a machine-amiable convenience notation used to show, with a few invariants, that it is possible to safely translate combinatorial expressions to C, not as a new notation for combinatorics.

A DOT expression is a syntactic notation for standard combinatorial expressions. A combinatorial DOT expression consists of constants, variables written in lower case, dots '.', combinators in uppercase, and series of squared series out of the former four. A rewrite rule is a squared series starting with a combinator producing a combinator expression, bound variables may occur in rewrites.

Examples:

[I x] = [x]

[K x y] = [x]

[S x y z] = [x z .] [y z]

(What is the expression for lazy evaluation of S?)

Basic Rewrite Rules

A DOT evaluator rewrites a combinator with a fixed strategy. Always, the last expression in a series is evaluated. When it reduces to a constant, its value is placed in the first not-nested dot encountered reading from right-to-left. When it reduces to a squared series, that squared series replaces the original term. Evaluation halts when the last expression does not reduce.

The above rewriter gives a fixed rewrite strategy on DOT expressions. Note that the order of the series rewritten to matters.

Example, using some extra notation:

[F n] = if n = 1 then [1] else [* . .] [n] [F .] [- n 1]

Evaluation of F 3:

[F 3]

= (if 3 = 1 then [1] else [* . .] [3] [F .] [- 3 1])

= [* . .] [3] [F .] [- 3 1]
= [* . .] [3] [F .] [2]
= [* . .] [3] [F 2]

= [* . .] [3] (if 2 = 1 then [1] else [* . .] [2] [F .] [- 2 1])

= [* . .] [3] [* . .] [2] [F .] [- 2 1]

= [* . .] [3] [* . .] [2] [F .] [1]

= [* . .] [3] [* . .] [2] [F 1]

= [* . .] [3] [* . .] [2] (if 1 = 1 then [1] else [* . .] [1] [F .] [- 1 1])

= [* . .] [3] [* . .] [2] [1]

= [* . .] [3] [* . 1] [2]

= [* . .] [3] [* 2 1]

= [* . .] [3] [2]

= [* . 2] [3]

= [* 3 2]

= [6]

Expand and apply rules.

The above rewrite rules are not complete. Reducing a combinator also implies that all extra arguments are inherited by the first squared expression of the series reduced to. Also, a Curried expression is treated as a constant. Dots or variables may not occur in the first position, rather, a special apply @ combinator is used where [@ v] = v, in case the v held a squared expression. In the rules above, the apply is implicit. I.e., the rewrite for I should be read as [I x] = [@ x].

Examples:

[K x y z] 
= [ @ x z ]

also,

[K . y z] [S x] 

= [K [S x] y z]

= [@ [S x] z]

= [S x z]

Well-formedness and complete reduction
(I need help getting this section right.)

All variables occurring on the right-hand side must be bound on the left-hand side of a rewrite. In the right-hand-side of a rewrite, a squared expression must be bound to an exclusive dot to its right, there may not be more dots than squared expressions. All squared expression either start with a combinator followed by arguments, or hold one constant.

Since a rewrite substitutes all variables, no variables will occur when rewriting a term which doesn't contain variables; a term without variables will therefore rewrite to another term without variables. Since the last term, during the rewriting of a dot expression, cannot contain dots, since all dots in the last term will always be filled, terms rewritten are always strongly reduced. Since all terms rewritten to always start with a combinator, albeit apply, all squared terms always start with a combinator.

The above invariants are sufficient to guarantee that each combinator can be compiled directly to a C routine, and that, by the confluency lemma, terms who have a normal form reduce to that. (I guess.) (I should show Turing equivalence here.)

Shift/Reset
The shift/reset pair saves and restores a stack position. It cannot be expressed directly, therefore some extra notation with curly braces is used. When a shift is encountered, curly braces are introduced, and evaluation continues in that expression, when a value is determined the curly braces are lost, when a reset is reduced, the whole evaluation continues with that expression.

For example (check with compiler internals):

[ Shift G ]

= { [G .] [Reset] }

= { [G [Reset]] }

= {[K . . ] [ F ] [ Reset . ] [3] }

= {[K . . ] [ F ] [ Reset 3 ] }

= [3]

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Not really sure where you're

Not really sure where you're going with this; you might get better reactions if you do some editing. Otherwise it's like listening to half of a conversation with yourself :)

True

I admit I abuse LtU as my personal scratch pad ;)

Please No

Really, please don't do that. LtU should not be anyone's personal scratch pad. The policies document is quite clear about this.

Neat.

This is an interesting notation. At first I thought you just wanted to eliminate nesting, but are you trying to implement CPS?

You lost me at "Expand and apply rules", though. Why do you write the @ sometimes and not others?

I would prefer a more formal presentation of your notation. For example, a grammar, rules for substitution, etc.

Yah.

The whole idea is just to have an intermediate representation which shows that 1) there is a straightforward representation for combinators in a thunk form, 2) thunks can be easily evaluated with a reverse-polish evaluator, and 3) a number of invariants are maintained during rewriting. And lastly, of course, the representation compiles to C, or some other language, in a straightforward manner.

It is all known stuff, the intermediate DOT representation just makes all that known stuff explicit.

Your questions:

Regarding CPS, no that was exactly what I was trying to avoid since I want to be as close as possible to a one-on-one correspondence between compiled code and the source.

An apply rule is introduced since, for example, in [. .] [f] [3], which is equivalent to an expression let x = f in let y = 3 in x y, the first squared series [. .] doesn't start with a combinator, therefore, isn't a thunk. Therefore, in the right-hand side of rewrites, variable or dot applications are made explicit with the @ combinator. (I want all squared expressions to be thunks, i.e., they need to start with a combinator, otherwise it doesn't compile trivially. Stated otherwise, if each thunk starts with a combinator, then all thunks can be evaluated trivially by applying the definition of the combinator; I need that invariant for the translation.)

The expand rule is there to deal with the case where the arity of the rewrite is lower than the number of arguments supplied. Which happens a lot in functional languages when dealing with curried expressions. For example, say f = map inc, and you evaluate f (1,2,3). Then [f] rewrites to [map inc], but [f (1,2,3)] should rewrite to [map inc (1,2,3)].

I am writing the whole enchalada down in TeX atm, will post it here. I don't think there is a lot new or interesting about the notation, I just needed (something like) it to be sure that the translation I thought of works.

[This is also a reply to Andy]

the whole enchalada?

The correct phrase is "The Whole Enchilada".

The Enchilada Programming Language is similar to the C-Machine in that it is also based on term rewriting.
However, Enchilada can't be easily compiled because every redex (must) carry a cryptographic hash.

Web site issues

What is the enchilada.jnlp Java Web Start applet that site says can't load? I don't have Java installed/enabled on my system these days since it is a huge annoyance and most desktop vendors that write code in Java (that I depend on) compile it to binary form to maximize interop.

This isn't a complaint, just a hopefully helpful observation that learning about your language feels a bit like a struggle and not hacker friendly.