The Type of 42

The following is a reposting from my blog at http://www.artima.com/weblogs/viewpost.jsp?thread=182140.

When I was designing Cat I started with the assumption that the type of a number such as 42, must be a function which pushes a primitive integer type onto the stack. So the type of 42 would be:

42 : ()->(int)

The type of a function which yielded 42 would be:

[42] : ()->(()->(int))

In Joy the operation for evaluating functions is "i". The equivalent in Cat is "eval". The type of eval in Cat currently is:

eval : (A (A:any*)->(B:any*))->(B)

This meant that the following was badly typed in Cat:

42 eval

This is because eval expected a function on the top of the stack, but 42 produced a primitive on the top of the stack. You would have to instead write:

[42] eval

The decision for this type design was based on two things:

  1. That kind of type system was more familiar to me
  2. The programs more closely mapped to assembly language as-is

In assembly an operation which yields 42 and the value 42 are very different things. My thought was that I wanted to be able to trivially map Cat functions to assembly code. My concern is that the execution loop of an assembly program where values were executable would be non-trivial. I assumed you would need a switch statement and this would signficiantly hurt performance (Cat is striving to become a full competitor with real assembly code, so performance is a big concern). I am no longer convinced that this would actually be a problem.

The first issue is writing the type of numbers. It appears that what is needed is the ability to name new types and express them recursively:

42 : int;
int = ()->(int);

The next challenge is converting to assembly code. I haven't worked out all of the details, but I believe that when generating assembly, I need to simply insert type conversion routines at key places (i.e. where an integer is being executed, or treated like a function).

Do all this seem sane and rational?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

The answer is 6x8

Do all this seem sane and rational?

How can we tell? Without a description of your type system's typing rules and some idea about what you are trying to accomplish via your type system, I'm not sure that any of us can assess the sanity and rationality of the proposal.

Sorry ...

There is an informal description of the type system http://www.cat-language.com/manual.html. I hope that helps provide some background. I am still working on a formal description of the Cat type system.

What is your type system for?

I am still working on a formal description of the Cat type system.

You might want to check out this thread.

What kinds of programs would fail to receive a type in your system? For example, your (A:any*)->(B:any*) type looks at first glance to match any arbitrary function without side-effects (and possibly even any WITH side-effects; again I can't tell from the description how it might detect that side-effects occur).

If all programs (not just some well-formed subset of programs) can be typed, then your type system is equivalent to an inconsistent logic, which isn't usually considered useful.

I will grant that if the types were explicitly written, this might still serve as a useful hint to programmers who come later of the intent of a given program, but you clearly state that you want inference. So I have to confess, I'm not sure why you want to have a type system at all for Cat.

Not all programs can be

Not all programs can be typed. Any well-typed Cat program without side-effects will indeed match (A:any*)->(B:any*). Some programs fail to be typed, because they violate the typing rules of primitive operations:

For example the primitive add_int has the following type:

define add_int : (int int)->(int);

So the following program:

define bad { "string" 42 add_int }

Can not be typed.

A Cheshire Cat?

How about:

define mystery : (A:any*)->();

define strange_adder { 1 2 3 4 mystery add_int }

What will strange_adder's type be?

(int int)->(int)

strange_adder : (int int)->(int)

The mystery function must be a clear_stack. The star parameter when used alone in the consumption of a function (rather than a function parameter) is interpreted as a greedy consumer of stack values.

Propagation instead of inference?

Even if the type system is unsound, propagation should be able to cut down on typing a lot. The type system still doesn't seem as useful, though.

Will the real integers please stand up?

The first issue is writing the type of numbers. It appears that what is needed is the ability to name new types and express them recursively:

42 : int;
int = ()->(int);

This is at odds with the stated original assumption, "that the type of a number such as 42, must be a function which pushes a primitive integer type onto the stack." That statement implies a type definition more like:

int = ()->(prim_int)

Can you clarify what you have in mind here?

The former definition implies that there's no such thing as a primitive integer in the language, that all integers are functions which return themselves (the same function). This also implies that other operations are somehow able to break the encapsulation on these "functions" to access the primitive integers hidden inside, although these primitive values would presumably never appear on the stack.

The latter definition implies a dichotomy between two different forms of integer - can both types appear on the stack, for example? This raises consistency questions.

I imagine that what you're getting at is that the "actual" value on the stack will just be a representation of an integer, but you want to pretend it's a function for consistency reasons, so that you can claim that every value in the language is a function. Unlike in the lambda calculus, though, this appears to be a fiction - presumably, your evaluator will have special logic to handle such values, so they're not really independent functions. This raises the question of what this is really buying you.

In general, this seems to be getting at the idea of self-evaluating values. In a traditional interpreter, literal values in an expression are terms which are converted to runtime values by the evaluator. I take it that Cat programs can be represented as data, on the stack, so it becomes possible to eliminate the the usual term/value distinction. The "usual" way to do this is to have values which evaluate "to themselves", but for Cat, you want every value to be a function. These two requirements don't seem to have been fully reconciled.

One thing that might cut through all of this quite quickly is to implement a very small evaluator for a Cat core, in a language like SML. Expressing the core evaluator in a typed lambda calculus should give immediate insights into some of these questions. That would have the added benefit of communicating the core language semantics to many LtU readers quite quickly. It doesn't have to be big: Greg Buchholz just posted two evaluators in a single post, here. Although for this purpose, I'd stay away from both GADTs and typeclasses, which is partly why I suggested SML.

Good idea

One thing that might cut through all of this quite quickly is to implement a very small evaluator for a Cat core, in a language like SML. Expressing the core evaluator in a typed lambda calculus should give immediate insights into some of these questions. That would have the added benefit of communicating the core language semantics to many LtU readers quite quickly.

That sounds like a very good idea. It would definitely lend clarity to my reasoning. Thank you!

Eval in two lines

Great!

BTW, I've previously implemented a toy Joy interpreter, and the eval function was as simple as (in SML):

(* eval : stack -> stack *)
fun eval (Prim f :: s) = eval (f s)
  | eval s = s

Afaict, this implements correct Joy semantics. Everything on the stack is either a primitive function, or a self-evaluating value, including lists (a.k.a. "quotations").

This used a value datatype like this:

datatype value =
  | Prim of (stack -> stack)
  | Var of string
  | List of value list (* a reverse stack *)
  | Int of int
  (* etc *)

withtype stack = value list

If you want to be strict, you could have a separate term type which is the type of the elements in a quotation (list), and a value type which is the type of values on the stack. But the only difference between the two is that quotations don't contain primitives, and the stack doesn't contain variables (at least, I don't think it can observably do so in Joy). Sharing a common value type simplifies things a little.

From a type perspective, there's nothing special about the above: ints would just be ints, not functions. But if you implemented a term/value distinction, I could imagine pretending that all terms are functions which push values onto the stack. You could translate the constant term 42 into the function (fn s => push 42 s), for example, which as you say has type ()->int. But those functions wouldn't normally exist on the main stack, they'd only exist (nominally) in quotations, and as such, their types are really at a different level than those of ordinary functions in the language.

I still don't see what this would achieve, though. My implementation of Joy's "i" operation, which pushes the elements of a quotation onto the stack, looks like this:

fun push_elems (l, s) = foldl push_elem s l

and push_elem (Var v, s) = push_elems (lookup v, s)
  | push_elem     (x, s) = x :: s

IOW, pushing of variables is special, since it pushes their value(s) onto the stack, but all other values are pushed as is. The use of the same representation for quoted values (terms) and actual values is convenient. Introducing some kind of meta-level functions here seems to add complexity for uncertain benefit.

Being strict

If you want to be strict, you could have a separate term type which is the type of the elements in a quotation (list), and a value type which is the type of values on the stack.

I should rephrase that. If you're developing an interpreter for purposes of clarifying the semantics, you do want to be strict, and it's important to make this term/value distinction at the type level. In the Joy case, it would statically enforce that functions intended to work on the stack vs. on quotations are not confused with each other, for example, and in general it will improve the explanatory power of the interpreter.