## The Problem with "dup" and "swap" in Stack-Based Languages

Operations like "dup" and "swap" in stack-based languages are special in that they return multiple results (unlike most other languages, which simulate multiple results by returning tuples).

I find expressing the concept in a type system challenging (if anyone else has done it successfully I would definitely like to hear about it).

One of the core problems is the type of a function like eval, which is supposed to execute any function:

  eval : (a (a -> b) -> b)

This makes a lot of sense when a function only has one argument and one result. I can reduce every function to an equivalent one having one argument:

  (a b -> c) = (a -> b -> c)

But I do not know how to reduce a function to having one result. This leads to specific problems with certain terms, which I go into in more depth on my blog ( http://cdiggins.com/2006/12/09/type-ambiguity-in-cat-and-the-importance-of-formalization/ ).

The only solution I can think of is to create another eval function which returns two results:

  eval2 : (a (a -> b c) -> b c)
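To make the multiple-results behaviour concrete, here is a small dynamically typed sketch in Python (just an illustration, not Cat itself): every word is a function from stack to stack, with the top of the stack at the end of a list. Dynamically this is unproblematic; the difficulty above is purely about giving such words static types.

```python
# A dynamically typed sketch (not Cat): every word is a function from
# stack to stack, with the top of the stack at the end of the list.
def dup(s):
    return s + [s[-1]]            # "two results": the value and its copy

def swap(s):
    return s[:-2] + [s[-1], s[-2]]

def ev(s):
    # pop the quoted function off the top, apply it to the rest
    return s[-1](s[:-1])

print(dup([1, 2]))        # [1, 2, 2]
print(swap([1, 2]))       # [2, 1]
print(ev([1, 2, dup]))    # [1, 2, 2]
```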

Any thoughts?


### Stop me if this is dumb, but

Stop me if this is dumb, but aren't dup and swap really defined by their side effects on the stack? (I'm playing with stack programming by writing a (dynamically typed) stack interpreter, where my eval is a function (stack -> stack).)

### Not dumb at all!

... aren't dup and swap really defined by their side effects on the stack?

I believe this to be a perfectly valid imperative model for untyped or weakly typed stack-based languages. In other words, some languages, like Forth and various assembly languages, do take this approach. My quest, however, is to design a functional stack-based language with a formal type system.

One goal of the Cat language is to assure that at every step in the program the configuration of the types on the stacks is unambiguous, and can be determined in advance.

If you look through the documentation of various untyped languages like the CLI (and I believe also the JVML) you will see statements about the requirement for the type and number of values on the stack to be unambiguous and knowable at compile time (someone jump in and correct me if I am making a mistake here). My goal is to express these kinds of requirements formally using a type system.

### What do you mean by "untyped"?

A little off-topic, but I'm curious to know what you mean by "untyped" with regard to the JVM and CLR. The bytecode verifier will reconstruct the type information necessary to make sure all the operations are safe and so, at least from that perspective, the language is typed. I believe that the Java 6 bytecode format even adds type annotations to make the verifier's job easier.

### Then I stand corrected.

I stand corrected; I had always thought the CIL and JVML were considered untyped or weakly typed languages. Perhaps this used to be true in the past? Anyway, it doesn't really matter where I got my misconceptions from. :-)

Thanks for correcting me.

### I'm not at the caliber of many who post here, but:

Here is how I've approached strongly-typed stack stuff in a hobby project...

The entire stack is typed -- rather than invent some new type for representing the type of a stack for this post, I'll use a cons-style notation to create stack types out of tuple types. Some arbitrary portion of a stack type might be variables.

A stack with a single int on top might be (int,'resta). One with two ints: (int,(int, 'restb)). The final type variable always represents the type of "the rest of" the stack.

All operations/functions would be from a stack to a stack:

AddInt : (int,(int,'a)) -> (int, 'a)
Dup: ('a,'b) -> ('a, ('a, 'b))
Swap: ('a,('b,'rest)) -> ('b,('a,'rest))
Eval (calls fn on the top of the stack, with the rest of the stack as its argument):
('a->'b, 'a) -> 'b
Swap . Dup : ('a,('b,'rest)) -> ('b,('b,('a,'rest)))
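As a sanity check on the signatures above, here is a hypothetical Python rendering of the same cons-style stacks (nested pairs, top on the left); the dynamic behaviour matches what the types predict:

```python
# Cons-style stacks as nested pairs: (top, rest_of_stack).
def dup(s):
    a, rest = s
    return (a, (a, rest))

def swap(s):
    a, (b, rest) = s
    return (b, (a, rest))

s = (1, (2, "rest"))          # a stack of type (int,(int,'rest))
print(dup(s))                 # (1, (1, (2, 'rest')))
# "Swap . Dup" read left to right: swap first, then dup
print(dup(swap(s)))           # (2, (2, (1, 'rest')))
```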

I think the key thing you're looking for is:
- All operations/functions are from a stack, to a stack.
- You can use a type variable to represent the remainder of the stack in the signatures of functions.

This way the type of a function fully encapsulates its effects upon the stack, and preserves (as a variable) whatever parts of the stack it does -not- affect.

Perhaps I've made some horrendous logical errors that someone else can point out to me; but this seems to be a very straightforward way of dealing with strongly-typed stacks and certainly worked "well enough" for my purposes so far.

Cheers!

### Interesting

This looks very similar to the Cat type system, which I find heartening, as it indicates I may be on a good path. :-)

It is interesting that you chose to express stacks where the leftmost item is the top of the stack. I have been flip-flopping on this design issue. How comfortable do you feel about your choice? Do you greatly prefer it to the alternative?

I have some type questions. How do you type the following terms:

  compose : ('a->'b 'c->'d) -> ??

Where compose takes two functions and combines them to create a new one, or do you not support this function? Also what about the following:

  [dup] dip eval : ???

Those are the terms which have given me the most grief so far. Thanks in advance.

### Maybe I'm missing something

Maybe I'm missing something (haven't closely followed), but what's wrong with

dup :     (a, r) -> (a, a, r)
swap :    (a, b, r) -> (b, a, r)
eval :    ((r)->(s), r) -> (s)
compose : ((r)->(s), (s)->(t), u) -> ((r)->(t), u)

where I use a,b,c for ordinary type variables, and r,s,t for stack (row) variables.

I didn't understand what dip is supposed to do... :-)

EDIT: Now I do. :-) So I would expect

dip : ((r)->(s), a, r) -> (a, s)

which then leaves you with

[dup] dip eval : ((a, a, r)->(s), a, r) -> (s)

by just doing a bit of unification on the types above.

(Also fixed type of compose.)

### Oh-oh

Now we have multiple stack notations floating around :-)

The general consensus, which I have now adopted, seems to be that the following notation is preferable:

dup :     (r a -> r a a)
swap :    (r b a -> r a b)
eval :    (r (r -> s) -> s)
compose : (u (r -> s) (s -> t) -> u (r -> t))

What are your thoughts on the new notation?

I corrected your version of compose (which I believe is based on a mistake I made earlier). Despite it being correct, the issue I have is with the type of compose. Consider the following:

define f { [42] [+] compose } // same as [42 +]
[42] : ( -> ( -> int))
[+] : ( -> (int int -> int))

Notice that if you do a derivation the s variable maps both to "int int" (consumption of +) and to "int" (production of 42). Or am I missing something here?

### It's really simple

Notice that if you do a derivation the s variable maps both to "int int" (consumption of +) and to "int" (production of 42). Or am I missing something here?

Yes: the types for [42] and [+] are wrong. They should be:

[42] :    (r) -> (r (s)->(s int))
[+] :     (u) -> (u (v int int)->(v int))
compose : (w (x)->(y) (y)->(z)) -> (w (x)->(z))

Now you just unify:

first cat yields: (r (s)->(s int))  ==  (u)
==>  u = r (s)->(s int)

second cat yields: (u (v int int)->(v int))  ==  (w (x)->(y) (y)->(z))
==>  z = v int
y = v int int
u = w (x)->(y)

propagating u yields: r (s)->(s int)  ==  w (x)->(y)
==>  w = r
x = s
y = s int

propagating y yields: v int int  ==  s int
==>  s = v int

Taking the input type of [42] and the output of compose, and substituting, gives you the final type:

(r) -> (r (v int)->(v int))
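The substitutions above can be checked mechanically. The following toy row unifier (my own sketch, not anything from Cat) represents a row as a pair of a row-variable name and a list of element types, unifies rows element-wise from the right, and lets the shorter row's variable absorb the leftover prefix; running it reproduces the bindings of the derivation:

```python
# Toy row unifier. A row is (row_var, [types]); a type is ('int',) or
# ('fn', in_row, out_row).
def walk_row(r, subst):
    v, ts = r
    while v in subst:                 # expand bound row variables
        v2, ts2 = subst[v]
        v, ts = v2, list(ts2) + list(ts)
    return v, list(ts)

def unify_row(r1, r2, subst):
    v1, ts1 = walk_row(r1, subst)
    v2, ts2 = walk_row(r2, subst)
    n = min(len(ts1), len(ts2))
    for a, b in zip(ts1[len(ts1)-n:], ts2[len(ts2)-n:]):
        unify_type(a, b, subst)       # match elements from the right
    if len(ts1) > len(ts2):
        subst[v2] = (v1, ts1[:len(ts1)-n])
    elif len(ts2) > len(ts1):
        subst[v1] = (v2, ts2[:len(ts2)-n])
    elif v1 != v2:
        subst[v1] = (v2, [])

def unify_type(t1, t2, subst):
    if t1[0] == 'fn' and t2[0] == 'fn':
        unify_row(t1[1], t2[1], subst)
        unify_row(t1[2], t2[2], subst)
    else:
        assert t1 == t2               # only int here; no type variables needed

INT = ('int',)
fn42   = ('fn', ('s', []), ('s', [INT]))           # (s)->(s int)
fnplus = ('fn', ('v', [INT, INT]), ('v', [INT]))   # (v int int)->(v int)
subst = {}
# first cat:  (r (s)->(s int))  ==  (u)
unify_row(('r', [fn42]), ('u', []), subst)
# second cat: (u (v int int)->(v int))  ==  (w (x)->(y) (y)->(z))
fxy = ('fn', ('x', []), ('y', []))
fyz = ('fn', ('y', []), ('z', []))
unify_row(('u', [fnplus]), ('w', [fxy, fyz]), subst)
print(walk_row(('x', []), subst))   # ('v', [('int',)])        i.e. x = v int
print(walk_row(('z', []), subst))   # ('v', [('int',)])        i.e. z = v int
```

So the final type is (r) -> (r (v int)->(v int)), exactly as derived by hand.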

Really, I think you are making it all too complicated. AFAICS, the type system of your language can be as simple as the language. You don't need subtyping, you don't need that type compose operator, you don't even need kinds. You only need row polymorphism.

Let me give it a try. Here is a simple grammar:

(programs)  p ::= empty | x | i | [p] | p p
(types)     t ::= a | int | (r)->(r)
(rows)      r ::= A | r t
(schemes)   s ::= !a1..anA1..Am.t 

I distinguish type and row variables syntactically by capitalisation (a and A), so I don't need kinds. "!" stands for forall. Note that there are no empty rows, because you never need them.

Now just five straightforward syntax-directed typing rules. T is the environment mapping variables to type schemes.


--------------------- (EMPTY)
T :- empty : (r)->(r)

T(x) = !a1..anA1..Am.t
------------------------------------------ (VAR)
T :- x : t[t1/a1]..[tn/an][r1/A1]..[rm/Am]

--------------------- (CONST)
T :- i : (r)->(r int)

T :- p : t
--------------------- (QUOTE)
T :- [p] : (r)->(r t)

T :- p1 : (r1)->(r2)    T :- p2 : (r2)->(r3)
-------------------------------------------- (COMPOSE)
T :- p1 p2 : (r1)->(r3)

That's it. The typing rules are nondeterministic (as usual with implicit polymorphism), so you have to guess the right types and rows where needed. You implement this using fresh type variables and unify them later (which is what I sort of implicitly did in my computation above).

### Doh

Yes, that is very simple indeed. Thanks Andreas!

### Great stuff!

"int" (production of 42). Or am I missing something here?

Yes: the types for [42] and [+] are wrong. They should be:

[42] : (r) -> (r (s)->(s int))
[+] : (u) -> (u (v int int)->(v int))
compose : (w (x)->(y) (y)->(z)) -> (w (x)->(z))

I was just leaving the rho variables (r,s,u,v,w) implied, however it is clear to me that just leads to complicated overcompensation. I see that by making them explicit everything is much simpler.

Really, I think you are making it all too complicated. AFAICS, the type system of your language can be as simple as the language

Wow, I see what you mean! That is an impressively compact set of rules. So, is this type system blindingly obvious, or is there some novelty to it, despite its simplicity?

I distinguish type and row variables syntactically by capitalisation (a and A), so I don't need kinds.

This statement confuses me. You have a separate distinction between types and rows. Doesn't making a distinction imply that they are separate kinds, even if you don't call it out explicitly? I'm probably missing something fundamental here as well.

I have to say Andreas, as always I really appreciate your help here!

### Explanations

I was just leaving the rho variables (r,s,u,v,w) implied, however it is clear to me that just leads to complicated overcompensation.

Yes, in particular you no longer see where the variables are shared. For instance, in the type of eval this sharing is crucial.

So, is this type system blindingly obvious, or is there some novelty to it, despite its simplicity?

Well, frankly, I'd say it's fairly obvious, at least to somebody familiar with polymorphic type systems. And I'm quite sure that something along these lines has been done many times before, although I don't have any reference. Sorry for that. :(

Doesn't making a distinction imply that they are separate kinds, even if you don't call it out explicitly?

In a sense, yes. If you want, I didn't need explicit kinds. But usually types and kinds are what a given type system defines as such, so if it does not define kinds, there aren't any. ;-)

Everything else makes perfect sense to me, but I have trouble fully grasping the implications of this rule. It seems to be mapping types to type variables, but I don't know what all of the components mean (e.g. where does "t" come from, and what specifically does "t1/a1" mean?)

"u[v/a]" is standard notation for substituting the free variable a by v in u (equivalent notations also used in literature are u[a|->v] or u[a:=v]). The type t in the rule is the body of the type scheme !a1..Am.t classifying x, looked up in T. The substitutions are instantiating it to a monomorphic type. The types and rows in the substitutions can be anything - for a typing derivation you have to "guess them right" nondeterministically (just like the rows in the other rules). That is the usual declarative formulation of ML-style implicit polymorphism.

Algorithmically, you use the fresh-type-variables-and-unification approach I was mentioning to implement this: instead of some ti or rj you substitute fresh unification variables and determine the actual types they represent incrementally by unification based on the constraints imposed from the other rules downstream. In this system, the only rule that actually imposes a constraint is the COMPOSE rule.

If you want to dive in deeper: Pierce's Types and Programming Languages has a readable 20-page chapter on type reconstruction that explains all this. If you want to get yourself bootstrapped with type systems, that book is a must-read anyway.

### T(x) =

T(x) = !a1..anA1..Am.t
------------------------------------------ (VAR)
T :- x : t[t1/a1]..[tn/an][r1/A1]..[rm/Am]

Everything else makes perfect sense to me, but I have trouble fully grasping the implications of this rule. It seems to be mapping types to type variables, but I don't know what all of the components mean (e.g. where does "t" come from, and what specifically does "t1/a1" mean?)

Sorry for being dense.

### Attempt at further clarification

So for other people attempting to read this thread, I want to point out that Andreas here is making explicit the so-called rho (ρ) variable which is implicit in every stack function and represents the "rest of the stack". In Cat every function has the following form (using the updated notation which lists the stack right to left/top to bottom):

  A : stack
  B : stack
  ∀ρ.(ρ A -> ρ B)

This means to say "this function maps all stacks with the configuration of types on top (labelled A) to a new stack where A has been replaced by the new type configuration B."

This follows from earlier discussion in the forums.

I hope this helps clear things up a bit for other readers?

### Keep in mind...

All functions are from a stack, to a stack -- so they are all single-argument functions, really: so 'compose' takes as its input not two functions, but a stack with two functions at the top. Like so, reusing my earlier notation (which was chosen to eliminate any "magic" in having a type variable represent the remainder of the stack):

compose : ('b->'c, ('a->'b, 'rest)) -> ('a->'c, 'rest)

And if I googled dip correctly:

dip : ('a,('b->'c, 'b))->('a,'c)

I'm afraid I don't know enough about notation or evaluation order to know how to interpret '[dup] dip eval'.

However, just for fun, here are some implementations in O'Caml. The stack is represented as nested "consed" tuples with the first item representing the top, and the second item being a tuple with the rest of the stack.

let eval (fn,stack) = fn stack
let dip (a,(fn,stack)) = (a,fn stack)
let cons (hd,(rest,stack)) = (hd::rest,stack)
let uncons(list,stack) = (List.hd list, (List.tl list, stack))
let dup (x,stack) = (x,(x,stack))
let nil stack = ([],stack)


The types I have been giving have been more or less exactly what OCaml's type inference gives you.

If I interpret "[dup] dip eval" as pushing a 'dup' onto the top of the stack, calling eval, then dip, I can rewrite it as a function:

let cdiggins stack =
let pushedDup = (dup,stack) in
let evaled = eval pushedDup in
dip evaled


That function has type: "('a -> 'b) * 'a -> ('a -> 'b) * 'b", or, rewriting to use "," instead of "*" for tuples:

('a->'b,'a) -> ('a->'b,'b)
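To double-check that type, here is a direct Python transliteration of the O'Caml above (hedged: same nested-pair stack encoding, dynamically typed; `inc` is just a sample function standing in for the 'a->'b on top):

```python
def dup(s):
    x, stack = s
    return (x, (x, stack))

def ev(s):
    fn, stack = s
    return fn(stack)

def dip(s):
    a, (fn, stack) = s
    return (a, fn(stack))

def cdiggins(stack):
    # push dup, eval, dip -- as in the O'Caml version
    return dip(ev((dup, stack)))

inc = lambda s: (s[0] + 1, s[1])         # a sample ('a -> 'b) on top
result = cdiggins((inc, (41, "rest")))
print(result[1])                         # (42, 'rest')
assert result[0] is inc                  # the function survives on top: ('a->'b,'b)
```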

Hopefully that is clear as mud! :)

Cheers

### Thanks

I know my response is a bit delayed, but this post is very helpful. It took me a while to grasp it fully. Thank you very much.

### Are you sure you are trying

Are you sure you are trying to type dup and swap? What you're providing is a formal stack effect notation, I think, one that is possibly checked at compile time. But a type? The type is stack -> stack. Unless maybe you think of swap as having the type (stack -> stack) -> (stack -> stack), where the argument is actually the rest of the program, and it produces a new program.

Well, I was going to write more, but instead I googled and found a strongly-typed Forth that you may want to look at for inspiration: http://home.vrweb.de/stephan.becher/forth/ In that, dup has type ( SINGLE -- 1ST 1ST ), where SINGLE apparently has many subtypes and 1ST represents the actual type of the value.

### StrongForth

StrongForth expresses a useful type system for the Forth language, but since it only types Forth it lacks higher-order functions. The type system is mostly inadequate to express common higher-order functional idioms (if I am not mistaken, please correct me if I am wrong).

So if I understand correctly, you are questioning the core premise of whether different functions in a stack based language have different types. Is this correct?

As you say, all functions map from a stack to a stack. So technically they can all be viewed as having the same type. I can't prove your assertion wrong, but my intuition says that a function which consumes a stack with two integers on top and produces a new stack with one integer on top is fundamentally different from a similar function which consumes two strings and produces one string. This indicates to me that they are in fact examples of two different types, since they are incompatible.

So by way of a concrete example:

  4 2 add_ints              // well-typed
  "hello " "world" add_ints // ill-typed

So what I'm getting at is that it is useful to make a distinction between different stack configurations, and calling these different configurations "types" appeals to my intuition. Of course I can not state this with any great authority, since I am not an expert in type theory.

### I think that since you're

I think that since you're defining a type system, you get to say what the types are. You could give every term a type (Stack->Stack), but that's not very interesting.

### Of course I can not state

Of course I can not state this with any great authority, since I am not an expert in type theory.

Same goes for me! ;)

dup :: S a -> S a a
swap :: S a b -> S b a

works of course, but that still leaves you with eval, etc. I think you might need dependent typing, but see disclaimer above ;)

OTOH, if eval does what I think it does, it's not a function. It's function application.

f (x,y) -- What is the type of the "space" between f and (x,y) ?

same with

[ dup ] eval right? That's the same as
dup

The act of applying the function to the stack doesn't have a type does it? eval is syntax, not a function. At least if I may be allowed to appeal to _my_ intuition :)

### Good question

OTOH, if eval does what I think it does, it's not a function. It's function application.

Can't we represent function application using a function? I don't see why not.

f (x,y) -- What is the type of the "space" between f and (x,y) ?

This is an insightful question. I have never thought of asking it before. I don't know what "(x, y)" is supposed to represent in this context, but I might be able to answer the question for "f g".

In mathematical terms "f g" is equal to "g o f", the composition of g with f.

In semantic terms:

  [f g] = [f] [g] compose

Hopefully this answers your question (but I fear I may just be dancing around the correct response).

eval is syntax, not a function

I don't see why it can't have a type, and be treated exactly like any other function (e.g. passed as a parameter returned from a function). The following makes perfect sense to me:

  [eval] dip : (A (A -> B) c -> B c)

Perhaps I am missing something fundamental here, but if it has first-class status then I believe it no longer qualifies as simply syntax.

### Hmm maybe to borrow a term

Hmm, maybe to borrow a term from Lisp: "special form"? Yes, eval is first class, but I'm having difficulty viewing it as a function per se. I'm not sure it's sane to type eval. E.g. in languages with string evals the result of eval can be any value in the language. Your eval is more like Lisp's, but Lisp is dynamically typed, so no help there. Maybe I just have a personal problem :)

### [dup] dip eval

On your blog you mentioned a term - [dup] dip eval - as an example of the problem. I agree that's a good example... I had the feeling there was something like that going on, but couldn't put my finger on it. One thing to note is that the type rules I listed in the last thread will I believe derive any of the types you identified as possibilities for that term. The point is that there is no principal type for such a term in your type system.

If you added constraints to your type system you could find a principal type for that term "Z a (X->Y) -> W Y where Z a a = W X", but then I think unification will be undecidable.

Making every function fixed arity will solve your problem, as you've noted, but will that be a reasonable reduction in power? When you talk about proving your type system complete, do you know what you want it to be complete with respect to? It's not going to be complete with respect to the untyped semantics, since for example the untyped language would allow terms that pushed and popped an unbounded number of values.

### One thing to note is that

One thing to note is that the type rules I listed in the last thread will I believe derive any of the types you identified as possibilities for that term. The point is that there is no principal type for such a term in your type system.

Yes, that is precisely how I arrived at the problem: I was creating a derivation tree and came up with two legal derivations. I assume this is what is commonly called an "ambiguous typing".

If you added constraints to your type system you could find a principal type for that term "Z a (X->Y) -> W Y where Z a a = W X", but then I think unification will be undecidable.

This is an interesting solution. A decidable type system is not a particularly high priority for me. I have mostly abandoned hope of having both an expressive and decidable type system for Cat. I don't believe having both characteristics is a reasonable goal.

Another alternative solution which may also work is to resort to my original composition rules which used stack difference.

  [dup] dip eval : (B\(a a) (B -> C) -> (a a)\B C)

For those unfamiliar with the notation: lowercase variables are type variables, while uppercase variables are stack variables. B\(a a) represents the difference between the stack B and the stack with two a's on top of it, and (a a)\B represents the difference between the stack with the a's on top and the stack B. More is available on this in the current draft of the paper (warning: the stack syntax is the reverse of the current discussion).

Notice this also implicitly captures the following constraints on B:

  B : nil | (a) | (a a) | ∀ρ.(ρ a a)

The other reason I return to this is that the stack difference operator is not intended to be a purely theoretical construct (i.e. solely for describing the type system). Such a thing becomes crucial, I believe, to express the type of functions like the compose operator:

  compose : ((A -> B) (C -> D) -> (C\B A -> B\C D))

Note that all primitives need to have types which can be expressed using the type annotation system.

Postscript: I want to thank you very much, BTW, for your patient tutelage in learning the notation and style of producing derivation trees. I would very much like to credit you in the finished paper, perhaps you could share your full name and contact info with me by email? cdiggins@gmail.com. The same also applies to Mario B. who has been a big help in getting started down this path.

### Another alternative solution

Another alternative solution which may also work is to resort to my original composition rules which used stack difference.
 [dup] dip eval : (B\(a a) (B -> C) -> (a a)\B C) 

I see your point about how your \ notation could be a better way to encode the constraints, but this looks a little wrong to me. Shouldn't it be:

 [dup] dip eval : (B\(a) B (B->C) -> B\(a a) C) 

EDIT: Oops, this is in error. I'll come back to it.

I find this notation a little confusing though, since B needn't be a prefix of (a), in which case you're pulling off of the stack before it. I don't think the clamping description you give for X\Y is sufficient to explain what's going on here when |X|>|Y| (though I may be wrong). Maybe you should explicitly model the entire stack:

 [dup] dip eval : (B\(D a) B (B->C) -> B\(D a a) C) 

Postscript: I want to thank you

It's no problem - I'm happy to help where I can. I may send you an email sometime, but don't worry about crediting me.

### Completeness

Since my other post was quite long, I wanted to address this separately:

Making every function fixed arity will solve your problem, as you've noted, but will that be a reasonable reduction in power?

As you can imagine it severely restricts the usability of the language compared to untyped languages like Forth and Joy. I would rather avoid it if at all possible.

When you talk about proving your type system complete, do you know what you want it to be complete with respect to? It's not going to be complete with respect to the untyped semantics, since for example the untyped language would allow terms that pushed and popped an unbounded number of values.

I agree. I believe it is unreasonable to aspire to the entire semantics of untyped stack-based languages. Unfortunately I can't express with any clarity the completeness I aspire to. I am hoping to achieve something which expresses the majority of the semantics of Joy, but I am happy to part with some of the functions which have effects on the entire stack.

### One thing you could try is

One thing you could try is to establish a correspondence with terms in a typed lambda calculus with an appropriate recursion operator. Then maybe you could aim to show completeness with respect to that interpretation.

### CPS?

This may be obvious enough that it hasn't been mentioned, but you can always use a continuation to turn a function with multiple return values into a single-valued one:

a -> b c   =>   a -> (b -> c -> 'd) -> 'd
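For instance, here is a small Python sketch of the transformation (divmod_cps is my own invented example, not from the thread): the two-result function never returns a pair; it hands both results to a continuation k.

```python
# CPS version of a two-result function: instead of returning (b, c),
# pass both results to a continuation k, and return whatever k returns.
def divmod_cps(a, b, k):
    return k(a // b, a % b)

print(divmod_cps(17, 5, lambda q, r: (q, r)))       # (3, 2)
print(divmod_cps(17, 5, lambda q, r: q * 10 + r))   # 32
```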


I can't, however, claim to know whether that would play nice with a stack-based language like Cat...

### Thank you!

I didn't actually realize this fact. I really appreciate you pointing it out to me. Previously I understood continuations only superficially, not formally. This is very helpful.

### Tuple type inlining

You could use a type-domain operation that "inlines" the constituents of a tuple type into the containing tuple type. I used this operation once in the past for syntactic sugar.

Assume that the ".." suffix means "inline this tuple type into its container". (If the suffixed type is not a tuple type, then ".." has no effect.)

eval : (A.. (A.. -> B..) -> B..)
dup  : (V -> V V)


When "eval" is applied to "dup", "A" will bind to "V" and "B" will bind to "(V V)".

Another interesting function is one that converts a traditional single-input-single-output function into one that operates directly on the stack. So, lets say you have a function "foo" that accepts a tuple of size two and returns a tuple of size three. The lifted version of "foo" would pop two values off the stack and push three values on to it.

lift     : (A.. (A -> B) -> B..)
foo      : ((int string) -> (bool bool float))

foo lift : (int string -> bool bool float)
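Dynamically, a hypothetical lift could look like this in Python (list-based stack with the top at the end; `foo`, `lift`, and the explicit arity argument are my own stand-ins for illustration, not Cat primitives):

```python
def lift(fn, arity):
    """Turn a tuple -> tuple function into a stack -> stack function:
    pop `arity` values, apply fn, push each result back."""
    def lifted(stack):
        cut = len(stack) - arity
        return stack[:cut] + list(fn(tuple(stack[cut:])))
    return lifted

# foo : (int, string) -> (bool, bool, float)
foo = lambda t: (t[0] > 0, t[1] == "x", float(t[0]))
print(lift(foo, 2)([9, 3, "x"]))   # [9, True, True, 3.0]
```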


I'm not too familiar with Cat, but I think the inline operation can replace the special "any*" type as well.

### This is interesting

So could this be classified as a unification operator? It seems to do what Andreas does in his unification step when creating a type derivation.

If I understand what Andreas has been saying earlier, there appears to be no actual need to make such an operator explicit. A correct type system seems to be unambiguous about what to do.

The lift function is a great idea, though. When Cat supports tuples it will come in very handy.

### I'm getting curious

I'd like to take a look at this problem too. Do you have a minimal set of stack operations that covers all the essential problems?

### Unfortunately I don't.

Unfortunately I don't. However, it looks like Andreas Rossberg's posting earlier provides a complete solution.

For example: E dup * (where E is an expression evaluating to a single value) could be rewritten as (let x = E in (* x x)). It gets a bit complicated with expressions like if swap then -; I guess continuations would come in handy there.
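The E dup * rewrite is easy to check concretely; a small Python illustration of both sides (stack words as list functions, my own sketch):

```python
def dup(s): return s + [s[-1]]
def mul(s): return s[:-2] + [s[-2] * s[-1]]

# "7 dup *" on a stack containing just 7...
print(mul(dup([7])))    # [49]
# ...versus the rewrite "let x = 7 in (* x x)":
x = 7
print(x * x)            # 49
```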