Lambda the Ultimate

On Understanding Types, Data Abstraction, and Polymorphism
started 8/11/2002; 3:01:57 AM - last post 8/12/2002; 8:23:10 AM
jon fernquest - On Understanding Types, Data Abstraction, and Polymorphism
8/11/2002; 3:01:57 AM (reads: 1921, responses: 7)
On Understanding Types, Data Abstraction, and Polymorphism
In this classic survey of types and polymorphism, the typed lambda-calculus is extended "with type facilities that are more powerful and expressive than those of existing programming languages."

"Object-oriented languages provide both a framework and a motivation for exploring the interaction among the concepts of type, data abstraction, and polymorphism, since they extend the notion of type to data abstraction and since type inheritance is an important form of polymorphism. We develop a lambda-calculus based model for type systems that allows us to explore these interactions in a simple setting, unencumbered by complexities of production programming languages... The evolution of languages from untyped universes to monomorphic and then polymorphic type systems is reviewed. Mechanisms for polymorphism such as overloading, coercion, subtyping, and parameterization are examined. A unifying framework for polymorphic type systems is developed in terms of the typed lambda-calculus augmented to include binding of types by quantification as well as binding of values by abstraction. The typed lambda-calculus is augmented by universal quantification to model generic functions with type parameters, existential quantification and packaging (information hiding) to model abstract data types, and bounded quantification to model subtypes and type inheritance. In this way we obtain a simple and precise characterization of a powerful type system that includes abstract data types, parametric polymorphism, and multiple inheritance in a single consistent framework." (1)

Precise definitions and examples of the terminology behind polymorphism: universal, ad-hoc, parametric, inclusion, overloading, and coercion (4).
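For a quick taste of two of those categories, a hedged sketch in Haskell (the names len and Describe are my own):

len :: [a] -> Int           -- parametric: one definition, uniform at every type
len []     = 0
len (_:xs) = 1 + len xs

class Describe a where      -- ad-hoc (overloading): separate code per type
  describe :: a -> String

instance Describe Int  where describe n = "the int "  ++ show n
instance Describe Bool where describe b = "the bool " ++ show b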
Posted to LC by jon fernquest on 8/11/02; 3:02:59 AM

jon fernquest - Re: On Understanding Types, Data Abstraction, and Polymorphism
8/11/2002; 3:52:15 AM (reads: 1123, responses: 0)
Great diagram relating various typing schemes. In chapter 4, on universal quantification, the example of a generic id function, the identity for all types, polymorphic as well as monomorphic, is the same as the Rank-2 polymorphism wiki example. Both the argument and the type of the argument have to be supplied as parameters to this id function.
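In GHC Haskell the connection comes out roughly as follows (a sketch; both is my own name, and the forall in argument position is what makes the type rank-2):

{-# LANGUAGE RankNTypes #-}

-- both receives a polymorphic function and uses it at two different types;
-- both id (3, True) evaluates to (3, True).
both :: (forall a. a -> a) -> (Int, Bool) -> (Int, Bool)
both f (n, b) = (f n, f b)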

Any more good examples of how Rank-2 polymorphism can be useful?

Matthew Danish - Re: On Understanding Types, Data Abstraction, and Polymorphism
8/11/2002; 9:01:50 AM (reads: 1103, responses: 0)
This may be a matter of mistaken terminology, but I don't see how the following quote is self-consistent:

``[...] the weaker requirement that all expressions are guaranteed to be type-consistent although the type itself may be statically unknown; this can generally be done by introducing some run-time type checking. Languages in which all expressions are type-consistent are called strongly typed languages. If a language is strongly typed its compiler can guarantee that the programs it accepts will execute without type errors.''

But when there are run-time type checks, there exists the possibility of the checks failing, or else there would be no need for the checks to be used in the first place. So how can the compiler guarantee that the programs it accepts will have no run-time type errors, then? The only way I can see is to silently pretend that no error has been detected and continue on in some fashion; but this is far worse behavior than simply signalling an abnormal condition and allowing it to be handled in some manner.

In a related vein, the authors also fail to mention that S-expressions can contain values with types more interesting than ATOM and CONS. I don't think there's any dispute over the types in (1 "ABC" SYMBOL :KEYWORD 5.2 #(1 2 #\A) #S(FOO :BAR 1/2) (1 . 2)). And that's just types with printable representations. Whatever "Pure Lisp" is, it's not like any Lisp that's been around since 1960. While it cannot be determined at "analysis-time" (what if you don't compile?) that an S-expression will contain any more specific types than ATOM or CONS, that does not preclude it from containing such types. Of course, given the confusion evident in the above quote, this mistake probably comes as no surprise.

Any thoughts? Did I misread, or miss, something?

Frank Atanassow - Re: On Understanding Types, Data Abstraction, and Polymorphism
8/11/2002; 10:17:27 AM (reads: 1155, responses: 0)
Any more good examples of how Rank-2 polymorphism can be useful?

You need it for first-class modules (modules which can be passed around and returned from functions), and for polymorphic recursion (where recursive calls in a polymorphic function are invoked at a different type from the parent call).
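For the polymorphic-recursion case, the standard Haskell illustration is a nested datatype (names are mine):

-- Nest's recursion changes the type parameter, so depth's recursive
-- call is at type Nest (a, a); the signature is mandatory because
-- polymorphic recursion cannot be inferred.
data Nest a = Leaf a | Node (Nest (a, a))

depth :: Nest a -> Int
depth (Leaf _) = 0
depth (Node n) = 1 + depth n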

But when there are run-time type checks, there exists the possibility of the checks failing, or else there would be no need for the checks to be used in the first place. So how can the compiler guarantee that the programs it accepts will have no run-time type errors, then?

I think the authors take the view that a "type error" occurs when the runtime tries to perform an operation like executing a procedure accepting ints on a parameter which is a string (or "treating an arbitrary S-expression as a program"). In languages like LISP, the runtime does a type check before that occurs, as you say. If the check fails, the runtime does something like printing an error message, but it will never crash or silently proceed.

In other words, the authors don't consider the failure of a dynamic type check to be a "type error", because it will be trapped; more broadly, trapped errors are not considered errors. The property that all errors are trapped is usually called "safety" nowadays.

In a related vein, the authors also fail to mention that S-expressions can contain values with types more interesting than ATOM and CONS... Whatever "Pure Lisp" is, it's not like any Lisp that's been around since 1960.

No, it certainly isn't; that's why they called it "Pure LISP" and not, for instance, "MacLisp" or "Common LISP". :) But, for type-theoretic purposes, it really doesn't make much difference if you include additional "types" like int, string, symbol, etc., since they just entail some extra reduction rules which are all relatively similar. It makes a difference to the programmer, of course, but it doesn't change the character of the type system and is easily coped with in practice, so pedagogically speaking it is better to leave them out.

While it cannot be determined at "analysis-time" (what if you don't compile?) that an S-expression will contain any more specific types than ATOM or CONS, that does not preclude it from containing such types.

I'm afraid I don't understand this remark. First, even if you interpret your program, there is an analysis phase before it gets executed, for example, parsing and scope-checking. Second, it is not possible to distinguish at "analysis-time" (statically) even between ATOM and CONS. Of course, if you are only talking about the source text, then it is possible to distinguish not only between ATOM and CONS, but all the other "types" as well, along with all the expression forms and so on; that's what parsing is.

Personally, I agree with you that the standard treatment of untyped languages and strong typing and dynamic typing is a bit muddled. I feel it is confusing to call things like ATOM, CONS, SYMBOL, INT, etc. in LISP/Scheme "types". A better perspective is to regard LISP/Scheme as having only one type, which has no particular name. The sets characterized by atom?, cons?, symbol?, int?, etc. are merely (covering) subsets of this one, universal type, just as the sets of even and odd integers are subsets of the type 'int', or the sets of empty and non-empty lists of integers are subsets of the type 'list(int)', in a statically typed language. The difference is just that the LISP runtime is specially equipped to deal with atoms, conses, symbols, ints, etc.
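A sketch of this perspective in Haskell (constructor names are mine):

-- the single, universal type; LISP's "types" appear only as tags
data Univ = IntU Int | SymbolU String | NilU | ConsU Univ Univ

-- cons? merely tests which subset of Univ a value falls in
isCons :: Univ -> Bool
isCons (ConsU _ _) = True
isCons _           = False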

This is why untyped languages are sometimes called un(i)typed languages.

Matthew Danish - Re: On Understanding Types, Data Abstraction, and Polymorphism
8/11/2002; 11:47:13 AM (reads: 1111, responses: 1)
I think the authors take the view that a "type error" occurs when the runtime tries to perform an operation like executing a procedure accepting ints on a parameter which is a string

So this truly is a confusion of terminology, then (perhaps on both my part and theirs). But wouldn't a language that allowed its runtime to perform such an operation be termed an "untyped" one? (More below).

(or "treating an arbitrary S-expression as a program").

This depends on how you define "program". The following S-expression: (if T 1 (asdf zxcv qwerty mnb)) can be evaluated with no ill effects, though any analyzer might note that those four symbols have no definition. Perhaps it should be noted that while any arbitrary sequence of characters can be fed into an SML (or Haskell, or whatever) environment, no program will be run unless the analysis step completes successfully; whereas a Lisp environment will attempt to evaluate a given S-expression as far as possible. Not to mention that even after that S-expression has been defined (and bound, presumably, to some variable) its meaning can change depending on the state of the environment (i.e., I can bind those symbols to various values that would allow EVAL to process them, 'twere it to do so).

I'm afraid I don't understand this remark. First, even if you interpret your program, there is an analysis phase before it gets executed

That is exactly my point; I suppose it was not very well put, however. In a lot of the literature I've seen, there are many references to "compile-time" when what is really meant is "analysis-time". Compilation and analysis are not tied together, and I wanted to distinguish them. The little note in parentheses was supposed to indicate that I was using the term "analysis-time" in place of "compile-time".

Second, it is not possible to distinguish at "analysis-time" (statically) even between ATOM and CONS.

This is not what I meant either (what a horrible sentence I must have written). An S-expression, whether depicted in printed representation or in memory, consists of atoms and conses and that is all that can be known about it without run-time inspection of its contents. In other words: if something is an S-expression then you know that it consists of atoms and conses. Any further knowledge requires investigation of the actual run-time contents.
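Something like this sketch, in Haskell (type names are mine):

-- All that is known of an S-expression as such is this shape; which
-- atom sits inside requires inspecting the value at run time.
data Atom  = IntA Int | StringA String | SymbolA String
data SExpr = AtomS Atom | ConsS SExpr SExpr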

A better perspective is to regard LISP/Scheme as having only one type, which has no particular name.

This is a thought-workaround to explain the fact that in Lisp variables are not associated with types. The proper way to think of types in Lisp is that they are associated with objects, not with variables. Once one abandons the concept that variables must have types, this becomes much easier to grasp. Variables, to a Lisper, are handles that can be bound to objects in order to refer to and manipulate them. The type of an object indicates the set of objects to which it belongs and, in Lisp, is bundled along with the object itself. In statically-typed languages the type is bundled instead with the variable which refers to the object; hence the need for the concept of a "typed variable". But this doesn't make Lisp (or other "dynamically typed" languages) "untyped"; such a label only makes sense for a language that does not make provisions for type information in any way.

As the original paper states: type-consistent languages are strongly-typed; type-consistency may be achieved through run-time type checks. Therefore if Lisp environments perform run-time type checks, then they can be type-consistent and as a result, strongly-typed. It also notes that ``every statically typed language is strongly-typed but the converse is not necessarily true.''

Ehud Lamm - Re: On Understanding Types, Data Abstraction, and Polymorphism
8/11/2002; 12:21:38 PM (reads: 1161, responses: 0)
I read the paper a long time ago (and I find it to be really good), but I remember that the terminology and taxonomy used are a bit old-fashioned (I seem to recall that the classification of polymorphism kinds was problematic). Beware!

Frank Atanassow - Re: On Understanding Types, Data Abstraction, and Polymorphism
8/12/2002; 7:50:51 AM (reads: 1031, responses: 0)
So this truly is a confusion of terminology, then (perhaps on both my part and theirs). But wouldn't a language that allowed its runtime to perform such an operation be termed an "untyped" one?

No, I don't think I made myself clear. When, for example, an application of an int-accepting procedure to an argument is evaluated, the runtime of a strongly typed language checks whether the argument is an int. If it is, it substitutes the argument in and executes the procedure body. If it isn't, it raises a "type error". If the language is not strongly typed, it may elide the check, just perform the substitution, and execute the procedure body, likely eliciting a crash. The difference is that, in the first case, the runtime is effectively trapping potential crashes.
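A sketch of that difference in Haskell, with made-up names:

data Val = IntV Int | ProcV (Val -> Val)

-- a strongly typed runtime checks the tag and traps the failure;
-- an unchecked runtime would skip the test and jump blindly, crashing
apply :: Val -> Val -> Val
apply (ProcV f) v = f v
apply _         _ = error "type error: not a procedure"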

(or "treating an arbitrary S-expression as a program").
This depends on how you define "program".

One would hope that the notion of LISP program is well-defined and non-trivial. :) I think their point is that there exist more S-expressions than LISP programs.

This is a thought-workaround to explain the fact that in Lisp variables are not associated with types.

Who is to say which is the workaround and which is "correct"? They are just models.

The proper way to think of types in Lisp is that they are associated with objects, not with variables.

I'm aware of this perspective.

Frank Atanassow - Re: On Understanding Types, Data Abstraction, and Polymorphism
8/12/2002; 8:23:10 AM (reads: 1027, responses: 0)
Having given it some thought, it seems to me that the unitype model and Matthew's tag model of types may not be wholly incompatible.

In the statically typed language I am working on now, it is possible to define not only datatypes, but also datakinds, via a declaration which looks the same as a declaration for an algebraic datatype. The programmer is then obligated to give, for each data constructor of the datakind, a datatype declaration. For example, one can define a datakind of tuples:

data Tuple k = Nil | Cons k (Tuple k)  -- the datakind: Nil and Cons are its constructors
data Nil = Unit                        -- the datatype declared for constructor Nil
data Cons t tuple = Cons t tuple       -- the datatype declared for constructor Cons

The key thing is that every data constructor (tag) of a datakind becomes a type constructor.
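So, in that notation (my reading of the sketch, not tested syntax):

-- With Nil and Cons promoted, one can form types of kind Tuple (roughly),
--   e.g.  Cons Int (Cons Bool Nil)
-- whose values would be built with the matching term-level constructors,
--   e.g.  Cons 3 (Cons True Unit)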

Now, forgetting that for a moment, if I were to define a datatype for Scheme values in a language like Haskell, I would write something like:

data Val = Int Int | Cons Val Val | Nil | Proc (Val -> Val)

and there you see all the "types" of Scheme appearing as data constructors (tags) for Val. So, by analogy with the case for datakinds, one can imagine a level of terms below this, which relates to types as types relate to kinds. Indeed, that level is just what we call "values" (as opposed to "types" or "kinds" or "superkinds"; unfortunately there is no good name for them), but my point is that it is not so odd to regard data constructors as the type constructors of a dynamically typed language.
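To make the tag model concrete (my examples, reusing the Val type above): the Scheme pair (1 . 2) is the tagged term Cons (Int 1) (Int 2), and a dynamic "type check" is just a pattern match on the tag:

isProc :: Val -> Bool
isProc (Proc _) = True
isProc _        = False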