Linear Logic and Permutation Stacks--The Forth Shall Be First

Linear Logic and Permutation Stacks--The Forth Shall Be First by Henry Baker, 1993.

Girard's linear logic can be used to model programming languages in which each bound variable name has exactly one "occurrence"--i.e., no variable can have implicit "fan-out"; multiple uses require explicit duplication. Among other nice properties, "linear" languages need no garbage collector, yet have no dangling reference problems. We show a natural equivalence between a "linear" programming language and a stack machine in which the top items can undergo arbitrary permutations. Such permutation stack machines can be considered combinator abstractions of Moore's Forth programming language.
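To make the "no implicit fan-out" idea concrete, here is a toy sketch (instruction names are my own, not Baker's) of a linear stack evaluator: every value popped off the stack is consumed, so copying has to be requested explicitly with `dup`, and destruction explicitly with `drop`. Nothing is ever shared, so there is nothing for a garbage collector to do and no reference can dangle.

```python
import copy

def run(program, stack=None):
    """Evaluate a tiny linear stack language (hypothetical op names).

    Every value popped from the stack is consumed: there is no implicit
    sharing, so no garbage collector is needed and no reference can
    dangle. Copying must be requested explicitly with 'dup'.
    """
    stack = stack or []
    for op in program:
        if op == "dup":            # explicit duplication (deep copy)
            stack.append(copy.deepcopy(stack[-1]))
        elif op == "drop":         # explicit destruction
            stack.pop()
        elif op == "swap":         # a permutation of the top items
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "add":          # consumes both operands, produces one
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:                      # anything else is a literal: push it
            stack.append(op)
    return stack

print(run([2, "dup", "add"]))      # -> [4]
```

The `swap` case hints at the paper's title: permutations of the top stack items are exactly the "structural" moves a linear language is allowed for free.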

I remembered this paper while chatting with a friend who's designing a stack-based instruction set and looking for relevant compilation techniques (imagine compiling C to Forth). Do you have some relevant references?

Today I found this paragraph particularly intriguing:

Since Forth is usually implemented on a traditional von Neumann machine, one thinks of the return stack as holding "return addresses". However, in these days of large instruction caches, in which entire cache lines are read from the main memory in one transaction, this view should be updated. It is well-known that non-scientific programs have a very high rate of conditional branches, with the mean number of instructions between branches being on the order of 10 or less. Forth programs are also very short, with "straight-line" (non-branching) sequences averaging 10 items or less. In these environments, it makes more sense to view the return stack itself as the instruction buffer cache! In other words, the return stack doesn't hold "return addresses" at all, but the instructions themselves! When a routine is entered, the entire routine is dumped onto the top of the return stack, and execution proceeds with the top item of this stack. Since routines are generally very short, the transfer of an entire routine is about the same amount of work as transferring a complete cache line in present architectures. Furthermore, an instruction stack-cache-buffer is normally accessed sequentially, and therefore can be implemented using shift register technology. Since a shift register can be shifted faster than a RAM can be accessed, the "access time" of this instruction stack-cache-buffer will not be a limiting factor in a machine's speed. Executing a loop in an instruction stack-cache-buffer is essentially the making of connections necessary to create a cyclic shift register which literally cycles the instructions of the loop around the cyclic shift register.

Imagine that!
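To see what Baker is describing, here is a toy model (word and instruction names are made up) in which the "return stack" holds the instructions themselves: entering a routine dumps its whole body onto the stack, and the machine simply pops and executes the top item. Note that no return address ever appears anywhere.

```python
# Toy model of Baker's return-stack-as-instruction-buffer (assumed op names):
# a call dumps the callee's entire body onto the return stack, and execution
# always proceeds with the top item of that stack.
def execute(word, defs, data=None):
    data = data or []
    rstack = list(reversed(defs[word]))   # top of return stack = next instruction
    while rstack:
        insn = rstack.pop()
        if insn in defs:                  # a call: dump the callee's body
            rstack.extend(reversed(defs[insn]))
        elif insn == "dup":
            data.append(data[-1])
        elif insn == "*":
            data.append(data.pop() * data.pop())
        else:                             # a literal: push it
            data.append(insn)
    return data

defs = {"square": ["dup", "*"],
        "quad":   ["square", "square"]}
print(execute("quad", defs, [3]))         # -> [81]
```

Since Forth routines average ten items or less, the "dump the whole body" step is comparable to today's cache-line fill, which is the point of the quoted paragraph.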


The RAFTS project had some work on converting Forth to SSA. I believe it also gave some details about inverting that transformation which might be useful for such an endeavor.

Of course any work on compiling typical languages to stack machines, e.g. Java to JVM, would be relevant and there is tons of it.

Alternatively, going off your referenced article, the real logic, or rather a closer one, that you'd want for Forth is ordered linear logic. A proof transformation taking an intuitionistic proof to an ordered linear logic proof would correspond to a compiler (assuming an embedding of C into the lambda calculus). Technically, this is just a change of syntax, but you might be able to find some pre-existing work in this vein and it may provide a different, useful perspective (if nothing else it will help avoid ad-hockery and overly detailed specification).

Cat is linear

I believe Cat is linear in the sense Baker was talking about. Calling 'dup' actually copies an array and mutation does not cause visible side effects. If I'm incorrect, I'm sure Christopher will step in and clarify.

The problem with this approach is that dealing with data structures besides arrays is often very awkward. Pointer-based data structures in particular (e.g. binary trees) are quite difficult to deal with. It can also be much less efficient than bump-allocating small cells in a nursery and copying out of it; all of the explicit deconstruction with the linear approach is hardly cheap. You can avoid allocation by reusing cells, but this requires very awkward approaches to simple problems. These problems come on top of what I already perceive as the difficult act of manipulating recursively-defined data structures in a stack-based language.
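A small sketch of the awkwardness being described, using the usual tuple encoding of binary trees: a linear insert has to explicitly deconstruct and rebuild every node on the search path, since no node may be referenced twice. Each step below frees (conceptually) one node and allocates a replacement, which is exactly the cost being compared unfavorably to nursery allocation.

```python
# Sketch of linear-style insertion into a binary search tree encoded as
# (key, left, right) tuples. Every node on the path is explicitly
# consumed and a fresh node allocated in its place; untouched subtrees
# are moved, not shared.
def insert(tree, key):
    if tree is None:
        return (key, None, None)
    k, left, right = tree                      # consume the node...
    if key < k:
        return (k, insert(left, key), right)   # ...and rebuild it
    return (k, left, insert(right, key))

t = None
for k in [5, 2, 8]:
    t = insert(t, k)
print(t)   # -> (5, (2, None, None), (8, None, None))
```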

I'm of the opinion that uniqueness types are a more realistic approach than "full linearity". It's a shame they haven't been more widely adopted.

Cat is no longer linear

There was a version of Cat that was linear, but this is no longer the case.

Reasoning?

What was the reason behind this move?

I couldn't figure out how to

I couldn't figure out how to translate an object oriented programming language to Cat, without passing the entire set of live objects from function to function.

Would you elaborate?

Would you elaborate? This doesn't make any sense to me.

I think...

He means the set of live objects in an OO program usually forms a graph. There is no efficient means of handling (updates to) that in a stack-based language, as far as I know.

lo i am confused about define:"stack"

presumably this is a different sense of "stack based language" than what people mean when talking about the jvm?

The JVM is a hybrid

The JVM is a hybrid stack-based/register-based intermediate language.

Why not?

Why wouldn't there be? Plenty of stack-based OO languages already exist. Is your comment specific to linear stack-based languages?

No

It is specific to the interplay between a stack-based language (as in, one without a heap) and OO languages, say with Java-like semantics. But ask cdiggins.

Since when do stack-based

Since when do stack-based languages not have a heap? Can you give an example of a language that demonstrates what you're talking about?

?

As far as I understand it, there are languages where values are (or can be) automatically allocated on the heap, and languages where values are allocated on the stack, or sometimes on a separate value/operand stack.

You can always have a heap through calls to the OS, the question is whether the language implementation assumes that values are placed on a stack. [Some stack based languages have explicit stack manipulation instructions.]

Examples: Forth, Joy, PostScript, and Factor.

Not mutually exclusive

Now that cdiggins's Cat is no longer linear, I'm not aware of any stack languages where the stack contains values (as opposed to references to values). In Factor for instance, all objects are heap-allocated and the stack is a stack of references. This lets you have Java-style OO with mutable state.

Edit: fix typo

Ah well

Guess I got it wrong then.

Hmm...

Surely Factor allocates some things on the stack, like small integers and bools?

Um

Isn't Forth a counter-example to both your comment and the comment of marco's that you're following up? What am I missing?

Not much

The confusion is whether a stack-based language is the same as a language with a stack.

It isn't.

Forth CREATE DOES>

Forth has defining words that allocate heap storage and define operations on the allocated data. See here.

Forth also has simple heap variables with fetch (@) and store (!) operations.

Stack-based by Nature

Since Forth programs are a list of symbols which are processed in sequence, it is pretty natural to let them manipulate a stack, but there isn't any requirement that the symbols should do that.

Forth: Stack-based by nature instead of stack-based by design?

linear, functional, stack, object oriented

I couldn't figure out how to translate an object oriented programming language to Cat, without passing the entire set of live objects from function to function

Well, do that, then. Values in the OO language don't have to map to values in the functional stack language. For example, in Java, an Object is a first-class value. In translation, an Object ID and a first-class heap would both be first-class values instead. What's wrong with that?

-t
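A rough sketch of the suggestion above (all names are mine): the heap becomes an ordinary first-class value, here a map from object IDs to field records, threaded through every call, so "mutation" means returning an updated heap alongside the result.

```python
# Sketch of translating OO state into a first-class heap value: object
# references become plain IDs, and every operation takes and returns the
# heap explicitly instead of mutating shared state.
def new_object(heap, fields):
    oid = len(heap)                      # fresh object ID
    return {**heap, oid: fields}, oid    # updated heap + reference

def set_field(heap, oid, name, value):
    obj = {**heap[oid], name: value}     # rebuild the one changed record
    return {**heap, oid: obj}

heap = {}
heap, p = new_object(heap, {"x": 1})
heap = set_field(heap, p, "x", 42)
print(heap[p]["x"])   # -> 42
```

In a linear stack language the heap value would simply ride on the stack and be consumed and reproduced by each word, which is presumably what "passing the entire set of live objects from function to function" amounts to.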

Recursive posts would be difficult in a linear logic

Although luckily they work quite well in practice. More seriously I was about to suggest the preprint on using a linear logic to describe heap shapes as interesting, when I found that it has been discussed already, mentioning this paper as interesting work...

Why didn't it work out?

In an abstract sense I am doing something roughly the same, so possibly I could use it.

But then, if it is so good, are there any languages doing this?

[I was referring to the second quote.]

Some pointers

I've collected some pointers here. I find the general idea of use-once references intriguing, but I never found out if they would work well in a general purpose PL.

It works well in Clean

But it does place a burden on the programmer. I don't really remember the drawbacks; I think it had to do with the fact that some general-purpose routines don't have linear type annotations when you need them, and vice versa, and annotating is not nice of course (nor is maintaining the linear invariant); but you should really ask an expert.

But you can always wrap a linear type in a monadic thingy; not the reverse.

Thanks

Thanks for posting this Luke. I hadn't come across it before, but it's been (for me) one of those papers that gave me the small change in perspective needed to suddenly see some connections between a couple of seemingly different things I've been reading on lately (in particular, it's helped to illuminate some parts of Baez and Stay's "Rosetta stone" paper, recently mentioned on LtU here, that I hadn't fully digested).

To answer the question, only fractionally tongue-in-cheek:

I remembered this paper while chatting with a friend who's designing a stack-based instruction set and looking for relevant compilation techniques (imagine compiling C to Forth). Do you have some relevant references?

The answer is (perhaps) in Baker's paper, a statement as true today as it was then:

"There has never been a simple or particularly effective mathematical model to aid in the deep understanding or compiling of programs in these languages."

There is a kind of symbiotic, emergent irrationality that happens between language design and processor design. Programming language design consistently tries to respond to the hardware with the most investment behind it, which in turn consistently tries to respond to the programming languages with the most investment behind them. In the case of the algol-class and traditional functional languages, that feedback loop seems to have been sparked by a hypothesis about programming notations that, to this day, has yet to be confirmed, in spite of countless attempts.

Each field being mainly about the other adds up to an over-arching meta-field that is mainly about its own quirks.

-t

Efficiency problem

What happens when you have a loop of 11 instructions? Do we need to fall back to traditional methods?

11 efficiency

What happens when you have a loop of 11 instructions? Do we need to fall back to traditional methods?

So, we have a basic block like "A B C D [...] K" -- 11 words and let's assume these are all "primitive" words - somehow built-in - they aren't synthetic subroutine calls.

Suppose it's in an "infinite loop" - just as an easy, simplified case to think about. We could rewrite the loop, using tail calls, as: "A B C D [...] H I N" where "N" is actually a call.

"N" of course is defined as "J K A B C D E F G P" where "P" is defined as "H I J K A B C D E Q" which is defined "F G [...]" etc.

To generalize from the infinite loop case to finite loops, look up (if you don't already know it) "Duff's device".

So picture the hardware there. When we load the first instance ("A [...] N") it's all straight-line code except for the tail call "N" and we can probably detect that very quickly. So while one part of the CPU starts performing primitive operation "A", another part can start pre-fetching the definition of "N" (namely "J K A B [...]").

Which is essentially what hardware does already, sorta, except that the way we do it today is contorted like mad, for mostly historic reasons, compared to what Baker was describing.

-t

p.s.: the cyclic tail-call solution I gave isn't the only one. You can also just automagically break up long sequences of primitives into a tree of calls. Note how, in both cases (both solutions), real performance (wall-clock time) is a lot easier to think about and predict than with what we have today. And the HW is likely to be much simpler.
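The cyclic rewriting can be generated mechanically. Here is a sketch (routine names r0, r1, ... are invented) that cuts an 11-primitive infinite loop into routines of 9 primitives plus one tail call, cycling through the sequence until the chain closes back on itself; with 11 primitives and 9 per routine, that takes 11 routines.

```python
# Generate the chained-tail-call rewriting of an infinite loop over the
# primitives A..K: each routine holds `chunk` primitives followed by a
# tail call to the next routine, and the last routine calls the first.
def chain_loop(body, chunk=9):
    n = len(body)
    defs, pos, names = {}, 0, []
    while True:
        name = f"r{len(defs)}"
        names.append(name)
        defs[name] = [body[(pos + i) % n] for i in range(chunk)]
        pos = (pos + chunk) % n
        if pos == 0:                      # chain has closed: link the calls
            for i, nm in enumerate(names):
                defs[nm].append(names[(i + 1) % len(names)])
            return defs

defs = chain_loop(list("ABCDEFGHIJK"))
print(defs["r0"])   # -> ['A','B','C','D','E','F','G','H','I','r1']
# defs["r1"] begins J K A B C D E F G, matching the "N" routine above,
# and the chain closes after 11 routines.
```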