archives

Linear Logic and Permutation Stacks--The Forth Shall Be First

Linear Logic and Permutation Stacks--The Forth Shall Be First by Henry Baker, 1993.

Girard's linear logic can be used to model programming languages in which each bound variable name has exactly one "occurrence"--i.e., no variable can have implicit "fan-out"; multiple uses require explicit duplication. Among other nice properties, "linear" languages need no garbage collector, yet have no dangling reference problems. We show a natural equivalence between a "linear" programming language and a stack machine in which the top items can undergo arbitrary permutations. Such permutation stack machines can be considered combinator abstractions of Moore's Forth programming language.

I remembered this paper while chatting with a friend who's designing a stack-based instruction set and looking for relevant compilation techniques (imagine compiling C to Forth). Do you have some relevant references?

Today I found this paragraph particularly intriguing:

Since Forth is usually implemented on a traditional von Neumann machine, one thinks of the return stack as holding "return addresses". However, in these days of large instruction caches, in which entire cache lines are read from the main memory in one transaction, this view should be updated. It is well-known that non-scientific programs have a very high rate of conditional branches, with the mean number of instructions between branches being on the order of 10 or less. Forth programs are also very short, with "straight-line" (non-branching) sequences averaging 10 items or less. In these environments, it makes more sense to view the return stack itself as the instruction buffer cache! In other words, the return stack doesn't hold "return addresses" at all, but the instructions themselves! When a routine is entered, the entire routine is dumped onto the top of the return stack, and execution proceeds with the top item of this stack. Since routines are generally very short, the transfer of an entire routine is about the same amount of work as transferring a complete cache line in present architectures. Furthermore, an instruction stack-cache-buffer is normally accessed sequentially, and therefore can be implemented using shift register technology. Since a shift register can be shifted faster than a RAM can be accessed, the "access time" of this instruction stack-cache-buffer will not be a limiting factor in a machine's speed. Executing a loop in an instruction stack-cache-buffer is essentially the making of connections necessary to create a cyclic shift register which literally cycles the instructions of the loop around the cyclic shift register.

Imagine that!