archives

C-Machine

I semi-formalized some part of the translation process for a compiler I am working on. I need people, to do my homework, to shoot as many holes in it as possible, and possibly supply references. I am aware of Thompsons work, but I'll admit I never really got past his elaborate notation.

Below a convenience notation I use. Please read it as a preview. I could have done it shorter, the full notation and evaluation process is elaborated as you read. I should rewrite it such that it introduces the full notation in a few paragraphs. Excuses for that.

A DOT Evaluator
The gist of DOT evaluation, semi-formal. A DOT evaluator is a combinatorial evaluation strategy used as an intermediate notation to make clear how thunks can be set up and reduced.

Note that there is nothing new about this notation. It is a machine-amiable convenience notation used to show, with a few invariants, that it is possible to safely translate combinatorial expressions to C, not as a new notation for combinatorics.

A DOT expression is a syntactic notation for standard combinatorial expressions. A combinatorial DOT expression consists of constants, variables written in lower case, dots '.', combinators in uppercase, and series of squared series out of the former four. A rewrite rule is a squared series starting with a combinator producing a combinator expression, bound variables may occur in rewrites.

Examples:

[I x] = [x]

[K x y] = [x]

[S x y z] = [x z .] [y z]

(What is the expression for lazy evaluation of S?)

Basic Rewrite Rules

A DOT evaluator rewrites a combinator with a fixed strategy. Always, the last expression in a series is evaluated. When it reduces to a constant, its value is placed in the first not-nested dot encountered reading from right-to-left. When it reduces to a squared series, that squared series replaces the original term. Evaluation halts when the last expression does not reduce.

The above rewriter gives a fixed rewrite strategy on DOT expressions. Note that the order of the series rewritten to matters.

Example, using some extra notation:

[F n] = if n = 1 then [1] else [* . .] [n] [F .] [- n 1]

Evaluation of F 3:

[F 3]

= (if 3 = 1 then [1] else [* . .] [3] [F .] [- 3 1])

= [* . .] [3] [F .] [- 3 1]
= [* . .] [3] [F .] [2]
= [* . .] [3] [F 2]

= [* . .] [3] (if 2 = 1 then [1] else [* . .] [2] [F .] [- 2 1])

= [* . .] [3] [* . .] [2] [F .] [- 2 1]

= [* . .] [3] [* . .] [2] [F .] [1]

= [* . .] [3] [* . .] [2] [F 1]

= [* . .] [3] [* . .] [2] (if 1 = 1 then [1] else [* . .] [1] [F .] [- 1 1])

= [* . .] [3] [* . .] [2] [1]

= [* . .] [3] [* . 1] [2]

= [* . .] [3] [* 2 1]

= [* . .] [3] [2]

= [* . 2] [3]

= [* 3 2]

= [6]

Expand and apply rules.

The above rewrite rules are not complete. Reducing a combinator also implies that all extra arguments are inherited by the first squared expression of the series reduced to. Also, a Curried expression is treated as a constant. Dots or variables may not occur in the first position, rather, a special apply @ combinator is used where [@ v] = v, in case the v held a squared expression. In the rules above, the apply is implicit. I.e., the rewrite for I should be read as [I x] = [@ x].

Examples:

[K x y z] 
= [ @ x z ]

also,

[K . y z] [S x] 

= [K [S x] y z]

= [@ [S x] z]

= [S x z]

Well-formedness and complete reduction
(I need help getting this section right.)

All variables occurring on the right-hand side must be bound on the left-hand side of a rewrite. In the right-hand-side of a rewrite, a squared expression must be bound to an exclusive dot to its right, there may not be more dots than squared expressions. All squared expression either start with a combinator followed by arguments, or hold one constant.

Since a rewrite substitutes all variables, no variables will occur when rewriting a term which doesn't contain variables; a term without variables will therefore rewrite to another term without variables. Since the last term, during the rewriting of a dot expression, cannot contain dots, since all dots in the last term will always be filled, terms rewritten are always strongly reduced. Since all terms rewritten to always start with a combinator, albeit apply, all squared terms always start with a combinator.

The above invariants are sufficient to guarantee that each combinator can be compiled directly to a C routine, and that, by the confluency lemma, terms who have a normal form reduce to that. (I guess.) (I should show Turing equivalence here.)

Shift/Reset
The shift/reset pair saves and restores a stack position. It cannot be expressed directly, therefore some extra notation with curly braces is used. When a shift is encountered, curly braces are introduced, and evaluation continues in that expression, when a value is determined the curly braces are lost, when a reset is reduced, the whole evaluation continues with that expression.

For example (check with compiler internals):

[ Shift G ]

= { [G .] [Reset] }

= { [G [Reset]] }

= {[K . . ] [ F ] [ Reset . ] [3] }

= {[K . . ] [ F ] [ Reset 3 ] }

= [3]