## The Killer Application for Concatenative Languages

I have long had a gut feel that the ability to algebraic
refactor / rewrite / simplify programs is a killer feature of
Joy / concatenative languages.

See Manfred von Thun's Rewriting System for Joy paper for more info on this.

I have just noted an practical application for this...

Out here in the embedded computing world there is a vast body of Legacy
assembler applications which cannot be ported to new CPU's as the old
parts go obselete.

Write in a concatenative language of your choice fragments that exactly
duplicate the behaviour of each machine instruction.

"assemble" the assembler source code into the concatenative language.

Algebraically simplify as much as possible. (Quite likely to something
even simpler than the original source!)

Compile the concatenative language to new CPU.

## Comment viewing options

### The hard part is arbitrary memory reads and writes.

Though the function call stack is important, all the non-function-call-related instructions in most machine languages isn't stack based. They manipulate registers and an memory locations at computed offsets. I don't see how the transformations of a simple rewrite system can help untangling that mess.

### Also huge parts of those

Also huge parts of those kind of programs are manipulating things like timers, i/o-ports etc. directly, because with the often rather limited processing power and memory they can't affort doing it via library-calls. Those things variate very much between different kinds of processors and you have to emulate them too.

### byte codes are concatenative languages

Well in the same way as a stack based language's operators consume a stack and return a stack, the instructions for virtual machines consume the entire current state of the machine and return a new state of the machine. Thus all virtual machine instruction sets are purely functional concatenative languages -- until you get to I/O as Karsten Wagner points out.

So whatever rewriting you could do in a concatenative language like Joy, you could do directly on the instructions or byte codes of a virtual machine. Personally I think it is easier to do rewriting on expression trees rather than strings of operators.

### rewriting

I have long had a gut feel that the ability to algebraic refactor / rewrite / simplify programs is a killer feature of Joy / concatenative languages.

You can do this with any pure expression language. I don't see why this is an advantage for concatenative languages.

### Refactoring

You can do this with any pure expression language. I don't see why this is an advantage for concatenative languages

The key feature is the absense of bound named variables. You can take any contiguous run of words in a concatenative program and factor it out into a new separate word by simply copying and pasting. Compare this to factoring out a program fragment in a traditional language where you need to be observant of the bound variables referenced in the fragment.

### This is very well put. I

This is very well put. I consider an another advantage of concatenative languages (which I think should be called compositional languages, but I fear momentum may have won out here) is the fact that they can return multiple values. This allows more code to be reduced trivially, e.g.:

{ swap swap } -> { }
{ cons uncons } -> { }

### Really?

I would like to see more empirical studies about the practical potential for optimization on this level, and how often optimizable patterns come up in real code.

Absence of bound variables means that patterns which take variables from a different depth look different, while in a language with nested expressions they look the same modulo variable names (which are easily abstracted away).

There are definitely high level optimizations which are easier performed on a more high level code representation. Data flow analysis (e.g. propagation of inferred types) is easier with named variables because a given reference keeps the same name all the time, it's not shuffled around in the stack as different values are needed near the top. Gathering usages of a variable (e.g. to see whether it needs to be materialized in the generic representation or it's enough to unbox its contents) is easier: with named variables you scan usages of the name, with a stack based language there is a higher probability of the value being moved around without performing essential operations on it which shouldn't disable the optimization.

### Trivial transformation

The key feature is the absense of bound named variables. You can take any contiguous run of words in a concatenative program and factor it out into a new separate word by simply copying and pasting.

This has obvious intuitive appeal, but it doesn't make a significant difference for automated transformations. The analysis involved in identifying variables that are free in a term being extracted, and making them into parameters of the resulting procedure, is trivial.

### Fusion Optimizations

This is a valid point, but I think of greater consequence is the ability to identify and fuse higher-order operations. For example, consider the case of:

define f(x) = map(g, reverse(reverse, map(h, (x))));

Now expressed compositionally (I still refuse to use the term concatenative), you have:

define f { [h] map reverse reverse [g] map }

Now apply the following trivial rewriting transformations :

transform { reverse reverse } -> { }
transform { [f] map [g] map } -> { [f g] map }

You end up with:

define f { [h g] map }

Which is significant and easily achieved.

This is a faked example, but there are substantially more non-trival transformations. I have slides from a recent presentation at http://www.cdiggins.com/cat/cat.pdf which explores this very idea.

### A.k.a. macros

Transformations like this can easily be expressed as macros. Here's a working example of the above transformations, using PLT Scheme (edit: modified to support multiple transforms per macro):

; Use Jens Axel SÃ¸gaard's syntax unexpander for PLT,
; to support multiple transformation passes
(require (planet "pp-syntax.ss" ("soegaard" "syntax.plt")))

; macro to define optimization transformers
(define-syntax define-transform
(syntax-rules (=>)
((_ name (other-name ...) (patt => result) ...)
(define-syntax name
(syntax-rules (name other-name ...)
; match pattern and perform transformation
(patt result)
...
; handle calls to 'name' that don't match pattern
((_ arg (... ...))
((#%top . name) arg (... ...))))))))

; define the actual transformations
(define-transform reverse () ((_ (reverse exp)) => exp))
(define-transform map () ((_ f (map g l)) => (map (compose f g) l)))

; define the test expression
(define test
(syntax
(define (f x)
(map g (reverse (reverse (map h x)))))))

; perform two transformation passes (could be made more general)
(unexpand-to-datum (expand (unexpand (expand test))))


The final line produces the desired result, having eliminated the two calls to reverse, and one call to map:

(define f (lambda (x) (map (compose g h) x)))


Notice that with the define-transform macro, actually defining transforms will often be a one-liner. From a brief look, I believe that most of the transforms in the linked slides could be expressed this way, e.g. from slide 86 (edit: also added slide 87 transform):

(define-transform filter (map)
((_ g (filter f xs)) => (filter (lambda (x) (and (f x) (g x))) xs))
((_ pred (map f xs)) => (filtermap
(lambda (x)
(let ((ans (f x)))
(values (pred ans) ans))) xs)))
; alt to latter rhs: (filtermap f g xs), filtermap constructs function


[Edit: any transforms that couldn't be expressed like this could be implemented directly, rather than via my simple define-transform macro.]

BTW, if the semantics of Cat are similar to Joy, then I would expect that the right hand side of the first transform on slide 86 should be { [dup f swap g and] filter }, instead of just { [f g and] filter }. I noticed this when converting it to use names, so if it is indeed a bug, then this seems to indicate a benefit to using local names... ;)

### I'm impressed! That is quite

I'm impressed! That is quite cool. Surely the flat representation of Cat programs versus the nested representation of applicative languages such as Scheme, must offer some advantage? If you had to argue that position what insight could you offer me?

(I hope that the advantages of Cat, is not nothing more than fewer parantheses!)

You most definitely did uncover a bug, thanks for pointing that out. I won't argue that there are aren't clear benefits to using local names.

### Linear models

I'm impressed! That is quite cool.

Macros are cool! ;)

Surely the flat representation of Cat programs versus the nested representation of applicative languages such as Scheme, must offer some advantage? If you had to argue that position what insight could you offer me?

Some advantage for what purpose? For a language implementation, the biggest advantage of the "flat" model seems to be that it's very easy to implement a language that uses it. That certainly has its benefits.

Beyond that, there are certainly benefits to exploring and thinking in terms of more than one model. It can lead to insights that might not be arrived at by thinking in terms of a single model, and may also lead to useful hybrid approaches. In that sense, "flat" representations are interesting and potentially useful, in particular because they simplify reasoning in certain areas.

For example, a related model is linear naming, which can be useful for distributed systems — see Alan Bawden's Implementing Distributed Systems Using Linear Naming. In the case of Forth/Joy/Cat, a value pushed on the stack can only be used once — to use it more than once, it has to be duplicated explicitly. In linear naming, a name can only be used once. In another related model, linear types, "values belonging to a linear type must be used exactly once" (Wadler).

As you probably know, Henry Baker pointed out connections between these models in Linear Logic and Permutation Stacks--The Forth Shall Be First. The section "Forth is a system of linear combinators" points out that combinator systems "replace random access to a value by means of a name with steering logic to propagate values to the locations in an expression tree where they will be used." Stacks are one possible steering logic. Forth, Joy and Cat could be described as linear-valued languages with purely stack-based steering logic (trips right off the tongue!)

However, purely stack-based linear-valued languages tend to have a pair of features which are somewhat at odds with each other for real implementations: programs tend to be lower-level without local names, in that the explicit stack operations that are needed in a program are not usually a feature of the program's problem domain; but at the same time, a pure stack model results in some inefficiencies that can only really be addressed by abandoning the purity of the stack model. This means that the language doesn't get as much of an efficiency benefit from being lower-level as it might otherwise. Addressing either the low-level issue or the efficiency issue leads to hybrid models with a less constrained approach to abstracting values, in which case the stack becomes just one aspect of the value steering logic.