First-Class Values: Types as Values within a Programming Language

I have been looking at and using programming languages for nearly 30 years now. I have just finished reading http://lambda-the-ultimate.org/node/100 and have found the various discussions "interesting". In terms of language design, I have looked at Chris Strachey and Robert Milne's book "A Theory of Programming Language Semantics", Dana Scott's "Denotational Semantics", and David Harland's "Polymorphic Programming Languages" and "Concurrency and Programming Languages".

What I am interested in is the following question. In static type checking (ST) systems as well as dynamic type checking systems, what is the effect of making types (or type tags) first-class citizens of the programming language? What is the effect of being able to create new types within the code - in other words, of having a function that on different invocations returns a new type? How does ST handle this (either theoretically or practically)?

Secondly, in ST languages, how is one able to create polymorphic structures without recourse to "type variables"? Or, to put it another way, how do ST languages determine the type of a type variable?

I may not be grokking the way types are used in OCaml and Clean. A minor example of one problem I see with them is that I cannot (as far as I have yet found - if anyone knows how this would be achieved, please let me know) create types that take any kind of value of any type.

Let me explain. I understand that it is easy to create a structure in these languages using "type variables". So I can create a list that can take any type of value, as long as all element values are the same type. What I want to do is create a list that has a variety of values of different types (simultaneously) without having to create a union type. There are problems I have been involved in that required this kind of solution. The solutions developed ended up being far more complicated than the problem required because of the static type checking available at the time.

[ 1 2.0 array.of.something "a string" true ]

In OCaml the following creates a tree:

type 'a btree = Empty | Node of 'a * 'a btree * 'a btree;;

My problem is that 'a sets all Nodes to have the same type for the datum values, where I really need the datum values to be of different types.

The requirement is that the code should not be required to be recompiled to add in an additional type signature for the structure. In other words, it needs to be future proof.

One area where I see types being used as first-class values is in a compiler: it has to be able to create new types on the fly and process them in the analysis of a programming language source file.

I have been working on a personal project, a new language that has types, code, processes, environments, etc. as first-class values - some of the facilities are to allow the creation of new code values (from within functions) and having functions defined as either lazy or eager. Its purpose is to investigate abstraction. One area of abstraction is to see how to add code in the definition of a function that evaluates lazy parameters to give an eager parameter scheme.
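
For comparison with an existing language: in Haskell, parameters are lazy by default, and forcing them inside the function definition recovers an eager parameter scheme - essentially the abstraction described above, in the other direction. A rough sketch (the function names are purely illustrative):

{-# LANGUAGE BangPatterns #-}

-- Lazy by default: the second parameter is never evaluated here.
lazyFirst :: Int -> Int -> Int
lazyFirst x _ = x

-- Bang patterns force both arguments before the body runs,
-- giving eager parameters within a lazy language.
eagerAdd :: Int -> Int -> Int
eagerAdd !x !y = x + y

main :: IO ()
main = print (lazyFirst (eagerAdd 1 2) undefined)   -- prints 3; undefined is never forced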

Where I can use ideas from existing languages I will do so.


Intersection Types

To say that something can be of "any kind of value of any type" is to say that it is a Universal Type - which is a distinct type in itself. But this can effectively mean that the type is completely undecidable - what can you do with this type - add? append? iterate? - and what mechanism would be provided to dispatch to the particular method?

Anyhow, to get around this, the typical method is to declare a datatype that represents the intersection of all the kinds that you can stick in the above list. Something along the lines of:

datatype unitype = UTstring    of string
                 | UTint       of int
                 | UTchar      of char
                 | UTword      of word
                 | UTreal      of real
                 | UTlist      of unitype list
                 | UTpair      of unitype*unitype
                 | UTfun       of unitype->unitype
                 | UTunit      of unit

Then you could consider the type of your list to be of unitype. The downside is that you (a) have to attach a constructor to each value; and (b) add a new line to the type declaration whenever you encounter another type not explicitly laid out in the intersection.
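
In use, that means injecting every element with its constructor by hand. Transliterated into Haskell (with the constructor set trimmed, and a UTbool case added since the wish-list example contains a boolean), the list from the original post becomes something like:

data Unitype = UTstring String
             | UTint    Int
             | UTreal   Double
             | UTbool   Bool
             | UTlist   [Unitype]
             deriving Show

-- Every value must be wrapped in its constructor.
wishList :: [Unitype]
wishList = [UTint 1, UTreal 2.0, UTlist [UTint 9, UTint 8, UTint 7],
            UTstring "a string", UTbool True]

main :: IO ()
main = print wishList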

Not necessarily - it could

Not necessarily - it could also be the type forall a.a.

The problem with that is that, modulo side-effects, the only thing you can do with it is take the identity. It's narrowing it down to types you know support some operation that's interesting - even if the operation's "take a look at the type tag and dispatch based on that".

Java

It's narrowing it down to types you know support some operation that's interesting [...]

Which even lowly Java supports — I can declare a List<Bar> where Bar is some interface, and then put in objects of any class that supports the interface. This subtyping is usually the feature that I miss most when I occasionally try out Haskell.

IIRC, one of the problems...

...is that no one has come up with a sound type system for OO subtype polymorphism - i.e. a type system that is decidable.

Even then it seems complicated

Chopsticks and forks

This is similar to writing Lisp code where you are determined not to use macros and expect static checks. I'd recommend reading (more) Haskell code to see how a similar problem is solved using Haskell idioms.

Unfortunately I can't think of many examples at the moment, but Oleg's site has a ton of neat code in Haskell and too many other languages (which makes me jealous).

Not at all

I've read plenty of Haskell code, and I know various ways to solve the problems that I would usually solve with subtyping. I still miss subtyping when programming in Haskell. Are we not allowed to mention features we'd like to see in languages we like?

I also miss it sometimes. A

I also miss it sometimes. A lot of that could be solved with existential types and impredicativity (so that I could have a list of Showable things, for example), but not all of it - record and variant subtyping's useful.

Row polymorphism

AFAICT, row polymorphism for records and variants is superior (at least in combination with existential types).

Impredicativity and Row Polymorphism

Putting impredicativity and row-variable polymorphism in the same language promises to get you quite close to first-class modules. See Daan Leijen's recent work, especially Qualified types for MLF.

Impredicativity is nicer than existential types for Philippa's example - you can construct that list of Showable values without needing to inject them into an existential container. I can't see how row-variable polymorphism is at all relevant to that example.

I was rather dismayed

I was rather dismayed recently when I realised that not only does a project of mine fake first-class modules in Haskell, but that it has unavoidable mutually recursive module dependencies - and thus takes some serious effort to pull off in the middle of a 'nice' system.

All I'd done was build a transformer stack for implementations of an API - except the implementations needed to be able to call the 'final' versions of calls.

Incidentally, you still need an existential for the type class example to get the right level of polymorphism, sadly - it just doesn't have to be faked via the datatype mechanism.

Impredicativity and existential types

Impredicativity is nicer than existential types

I don't follow. How are existential types interesting without some form of impredicativity in the first place? And how can you construct a list of Showables without at least faking existentials?

I can't see how row-variable polymorphism is at all relevant to that example.

I was commenting on the "record and variant subtyping's useful" part of Philippa's post.

Not meant to be exclusive

I meant to compare existential types in a predicative and an impredicative system. In a predicative system like Haskell you have to make a data type to hide the existential quantifier if you want a list of existentially typed values. In an impredicative system, you can just work with a list of whatever polymorphic type you like.
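
For reference, the datatype-hiding trick looks something like this in GHC (the wrapper name MkShowable is made up for illustration):

{-# LANGUAGE ExistentialQuantification #-}

-- The data declaration hides the existential quantifier: each element
-- may have a different type a, as long as Show a holds.
data Showable = forall a. Show a => MkShowable a

instance Show Showable where
  show (MkShowable x) = show x

showables :: [Showable]
showables = [MkShowable (1 :: Int), MkShowable 2.0,
             MkShowable "a string", MkShowable True]

main :: IO ()
main = mapM_ print showables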

Of course not

I didn't mean that. Nonetheless, "occasionally trying out" sounds quite different now that you put it this way :)

That's a union type

Your example is what is called a discriminated union type (the "discriminated" adjective distinguishes it from the undiscriminated unions of C and C++). Intersection types are different.

Syntactically, intersection types are what you get if you replace the "|" characters in your example with "∧". Semantically, a value of an intersection type is simultaneously a member of more than one type. It can be treated as either type, without a dynamic check. In fancier terminology, the type is inhabited only by those values that belong to each of its component types.

I won't hazard an attempt at a meaningful example of an intersection type, since I'm at the limit of my understanding in this area.

Records

Perhaps the most intuitive and easy to understand (though not the only useful one) is the intersection of record types - the values of the intersection of A and B have all the fields from both of them (thus, while you intersect the extents of A and B, you actually take a union of their features). I believe we already had a discussion of this phenomenon.
[on edit: e.g., Conjunction types]

My Bad

I knew what I wanted to say; I just couldn't remember the correct technical term. :-)

Intersection type example?

Am I correct in thinking that Perl's lists have an intersection type? They are both lists and integers (corresponding to the length of the list).

Needs to be user-specifiable

"Intersection types" refers to a feature of a static type system that lets you define a derived type T by specifying a set of types S. Any value that inhabits every type in S also inhabits T.

For example, if the language used the '&' symbol to intersect types, you could do something like:

class Dog {
   Bark() {}
}
class Cat {
   Meow() {}
}

Run(x : Dog&Cat) {
   x.Bark();
   x.Meow();
}

Run(new Dog()); // -- error
Run(new Cat()); // -- error

class MyWeirdAnimal extends Dog, Cat {}

Run(new MyWeirdAnimal()); // -- OK

A 'new Dog()' value inhabits the type 'Dog'. A 'new Cat()' value inhabits the type 'Cat'. A 'new MyWeirdAnimal()' value inhabits both the 'Dog' and 'Cat' types and therefore inhabits the 'Dog&Cat' type.

Not Really

Perl's curious that way, in that the notion of scalar versus aggregate context can determine which operation to perform. I tend to think of it as an operator-oriented language with container types, but even that's not entirely accurate.

Maybe it helps to think of evaluating an array in scalar context as calling a method on the array different from the one called by evaluating the array in list context.

Single-type lists exist because of recursive combinators

What I want to do is create a list that has a variety of values of different types (simultaneously) without having to create a union type.

The way I see it, the reason this is not such a good idea is that a datatype like a list, abstractly, is really something that provides structure for higher-order combinators like map and fold to work on. More generally, the whole idea of looping over the elements of a data structure and doing "the same thing" with each of them presupposes that they are all subcases of one type, which the operation you loop with accepts. That is, if you ever have a function that processes the elements of a mixed concrete type list uniformly, you're implementing a union type implicitly, by hand.
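
To make that concrete, here is the implicit union written out by hand in Haskell (the type and names are invented for illustration):

-- The "mixed list processed uniformly" is a hand-rolled union...
data Mixed = MInt Int | MStr String

-- ...and the uniform operation is a case analysis over it.
describe :: Mixed -> String
describe (MInt n) = "int: "    ++ show n
describe (MStr s) = "string: " ++ s

main :: IO ()
main = mapM_ (putStrLn . describe) [MInt 1, MStr "a string"]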

There are problems I have been involved in that required this kind of solution.

I am really skeptical of this, and strongly suspect you've missed some abstraction. For example, I often hear Lisp advocates complain about the fact that languages like ML don't allow lists of elements of mixed types, and that this is somehow a big advantage of Lisp. Frankly, the only examples I've seen that are really compelling involve using Lisp lists as a concrete representation for a domain-specific language; and in this case, the abstraction that the argument is missing is that the abstract common type of the mixed concrete type elements of the list is "expression of embedded language X." The real advantage of Lisp in this regard is that its lists are far more syntactically lightweight for DSL use than anything ML provides, not that its lists can hold mixed-type elements.

Existential types

...are what you want. For example, a list of type List(∃a.a) can have arbitrary different types of elements.

The problem is, as others already pointed out, what can you do with such a list? Since you do not know anything about the type of an individual element (it is just "some type"), you cannot use it for anything but passing it around abstractly.

So, in addition you need at least one of the following three features:

1. Intensional type analysis. Lets you inspect the actual type dynamically, by a form of case expression over types. This essentially is a fancy version of downcast - with all its disadvantages. But given it, the type ∃a.a can be viewed as a first-class pair of a value and its type (the combination of these two features is also known as type "dynamic", a universal type).

2. Subtyping. Allows you to form types of the form ∃a≤T.a, such that you can at least do anything you could do with a T.

3. Qualified polymorphism. Superficially similar to subtyping, but technically different, and in fact more flexible. Take a look at Haskell type classes (and their combination with existential types) to get an idea.

Short of that, you have to resort to sum types (not necessarily universal), and that is in fact the conventional way to address this. I agree with 'em' that chances are that there is a serious design error somewhere if you really believe to need existential types. For example, I don't see why you'd need them in a conventional compiler, except if you want a reflective compilation interface.

Your deepest desires: on screwing

Excuse me, but the elements of a list of type List(∃a.a) are not of "arbitrary different types"; rather, they are all of type ∃a.a.

But this whole thread, frankly, is surely beside the point. The poster does not really want to stick things of different types into a list; he only thinks he does. This is like the guy from that Twilight Zone episode who releases an evil djinni that fulfills all his wishes to the letter, but always in the worst way possible.

The poster thinks he wants to change some language L to L' to allow building such lists because he has some set X of problems which he thinks L' can solve but L can't. And maybe L' can solve them, but only at the cost of creating other problems which are far worse. In fact, he does not only want to solve X in L': he also wants L' to be able to solve all the other problems Y which L was already able to solve: he wants the extension L->L' to be conservative. But it won't be.

What the poster really wants is undoubtedly more dependent on the specific nature of what the problems X are, and in fact might require a number of different techniques rather than one blanket solution.

Language designers often fall into the trap of thinking that by adding this or that feature they will conservatively address some class of problems, because it only involves extending the syntax this way or that. But they fail to see the effect it has on the language as a whole, on its properties, on its semantics.

For example, if you add Type:Type to a language suddenly you get a language where you must drop all pretense of regarding your types as sets, since there is no set which is its own member (by definition). Consequently also "functions" stop being functions, "integers" stop being integers, etc. (*) It is a bit like one of those jokes about screwing in a lightbulb: instead of one guy turning the bulb, he remains stationary while an army rotates the building.

Don't rotate; screw.

(*) These systems can be interesting and worthwhile. But they are not solutions to the problem under the conditions stated.

Differences of opinion -

But this whole thread, frankly, is surely beside the point. The poster does not really want to stick things of different types into a list; he only thinks he does. This is like the guy from that Twilight Zone episode who releases an evil djinni that fulfills all his wishes to the letter, but always in the worst way possible.

Here we have a difference of opinion. In your experience, you may never have come across a problem where you want to process/handle a number of different things - that's fine. However, that doesn't mean that other people haven't.

I have had to solve problems in the past where the solution to the problem was "best"/"cleanest" (from the problem's point of view) if there had been a heterogeneous list/sequence/stack to place the various "entities" into. Due to the constraints of the programming languages, a considerably more "complicated" solution was developed.

You may well have come up with a different solution - no problem.

The poster thinks he wants to change some language L to L' to allow building such lists because he has some set X of problems which he thinks L' can solve but L can't. And maybe L' can solve them, but only at the cost of creating other problems which are far worse.

Frank, please cite examples of the problems created or give references to papers that discuss examples of these problems. I think that there are those of us who would be interested in seeing what difficulties would actually arise. If they are not obvious, it is then all the more important that they be stated clearly.

In fact, he does not only want to solve X in L': he also wants L' to be able to solve all the other problems Y which L was already able to solve: he wants the extension L->L' to be conservative. But it won't be.

Just as a comment, I have never found any language that suits all problems - I use whatever language is available out of my toolbox that has the particular facilities that I need for the problem at hand. I am, however, interested in some ideas that (from my perspective) have not been adequately investigated. Hence my questions, to get a feel for what other people know. Some of the responses have led to some interesting information.

For example, if you add Type:Type to a language suddenly you get a language where you must drop all pretense of regarding your types as sets, since there is no set which is its own member (by definition).

I have seen this statement made before - in a research paper from (I think) about the turn of the century. From my perspective, [Type:Type] represents the set and is not the set. The research paper (a copy of which I have on one of my backups somewhere) did not seem to differentiate between something acting as a representation and being the thing represented. Your perspective may vary - again, not a problem.

For me, each element of the Type Space represents a different Space of Values. Each element of each of these Spaces represents a different value within that space. Hence one of the values of the Type space can and does represent the Type space.

Consequently also "functions" stop being functions, "integers" stop being integers, etc.

[Humour]
Question: What's an integer?
Answer: A whole can of worms that will cause more arguments than is useful - a mathematician represents them by sets (or other things), a prisoner by marks on the cell wall, Scrooge McDuck by the coins in his vault.
[/Humour]

If the idea that a set can contain a value that represents itself causes the mathematical theorists heartburn, so be it. I personally don't see a problem since we are dealing with representations.

I was trained as an electrical engineer, and as part of the history of engineering we were required to understand that engineers had been using a mathematical technique (algorithm) which worked in practice but could not be proved by mathematicians for over half a century - till one bright spark jumped laterally a long way and found a solution to why it worked. This pleased the mathematicians and really made no difference to the engineering solution.

Since we are dealing with computational machinery (where every "thing" being computed is represented by some "thing") and not a theoretical quantity, a difference from theory will arise.

True as far as it goes

While all your points have a great deal of truth to them, I have to say that:

Bruce Rennie: If the idea that a set can contain a value that represents itself causes the mathematical theorists heartburn, so be it. I personally don't see a problem since we are dealing with representations.

isn't warranted, because the manner in which we are "dealing" with representations is to manipulate them according to some logic (by the Curry-Howard Isomorphism). Frank's point, or at least my interpretation of it, is that, consciously or otherwise a programmer is making a set of assumptions about the algebraic properties of their code, and that (some of) those assumptions are violated instantaneously, with everything that that implies for the correctness of our code, the moment you introduce something like Type:Type into your language.

In practice, of course, many languages do have the moral equivalent of Type:Type, and people continue to write software in them, but such software traditionally ends up requiring all manner of contortions, some of which fall under the charming-sounding rubric of "design patterns" (many of which simply aren't necessary, or are at least trivial to express, in more expressive languages than Smalltalk, C++, or Java) and others of which simply fall into the category of being unsound, e.g. "instanceof" in Java. If the language in question supports metaprogramming, you now have to worry about non-termination in recursive types, etc.

So while the map isn't the territory, in our domain, the map might best be described as an encoding of the territory—a certain amount of fidelity is, in fact, required for the code to make any sense. To put it in circuit engineering terms, introducing Type:Type has a strong tendency to introduce noise into the system. The signals a user of the language thinks they're sending have a greater likelihood of getting garbled in transmission.

Systems of Logic

isn't warranted, because the manner in which we are "dealing" with representations is to manipulate them according to some logic (by the Curry-Howard Isomorphism). Frank's point, or at least my interpretation of it, is that, consciously or otherwise a programmer is making a set of assumptions about the algebraic properties of their code, and that (some of) those assumptions are violated instantaneously, with everything that that implies for the correctness of our code, the moment you introduce something like Type:Type into your language.

Agreed - according to some logic. This automatically means that there are some premises at the basis of the logic that cannot be proved; they are assumed to be true (if you like, the Gödel points - if I understand this correctly). If a modification of these premises is used, then what is the result? Do some of the logical inconsistencies disappear? This is a question that must be asked and looked at.

If the basis of the logic (and logic is based on one's fundamental beliefs about what is correct) is not allowed to be investigated, then one cannot escape from the "boundaries" of the box.

Computers (as we know them) and the "science" behind them have only been in existence for a very short time. So I'll use an example of a logic system that gave rise to a monstrosity - the epicyclic universe. The funny thing was that at the same time it was coming into being, an alternative was also being proposed.

Put it down as an intuition, but to me it appears that the real fundamentals haven't been looked at closely enough. I don't have the background (mathematically) to delve deeply into this, but there must be those who are active in the field who can.

Take the above comments for what they are worth (and they may be worth nothing to you).

I have been given enough to start another thread about another subject raised. But I need to formulate the question more clearly.

Today is a good day, I have wood to cut, lawns to mow, a family to spend time with and Jesus to walk with. May much blessing fall upon all of you today.

Heterogenous collections

I have had to solve problems in the past where the solution to the problem was "best"/"cleanest" (from the problem's point of view) if there had been a heterogeneous list/sequence/stack to place the various "entities" into. Due to the constraints of the programming languages, a considerably more "complicated" solution was developed.

Can you give an example? In my experience, heterogeneous collections are almost always a hack/poor man's version of a more specific abstraction (object, record, interface, etc.) that a better design (and maybe a better language) would solve more cleanly.

I'd be very interested if you have a specific counter-example.

Heterogenous collections

The simplest example is a print routine. Another is a communications channel between applications (my working background includes communication systems). Both of these, at a more abstract level, process lists/streams of different values. You may disagree, and from your perspective that's okay. But from my perspective and that of those I was working with, the problem required this higher-level approach. Because we looked at the problem from this level, we were able to develop a solution that worked where the "experts" were stuck. The tools we had to use were primitive: assembler, COBOL, and a scripting language called TCL (no relation to the language of that name today). The solution was multi-streamed, multi-user, multi-tasked, multi-machine, and it worked. We (myself and another colleague) were able to get the system to work under another OS and communications network very quickly.

Another, off the top of my head, that arose many times over the years is BOM (bill of materials) processing. This one is usually related to databases.

As an apt comment: computer languages are tools and, as such, it should be in the hands of the craftsman to choose which tool to use where and when - whether static/dynamic, typed/untyped, functional/imperative or high-level/low-level.

We are supposed to be craftsmen/craftswomen. Let us develop the "best" tools for the various problem domains that we experience. No two of us will ever have exactly the same set of problems to solve, but we can certainly learn from each other.

I'll leave it here as I need to get out and downtown. Lots of things to do (as previously posted).

Print routine

Every time heterogeneous lists appear, people talk about print routines. Danvy's Functional unparsing is really cool, typesafe, and implemented without fancy type systems (Standard ML, nothing but basic HM types). The technique applies to many other domains.
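
The core trick is small: each format directive is a function over a continuation and a string accumulator, and directives compose with ordinary function composition. A rough Haskell rendering of the idea (Danvy's paper uses Standard ML; these names are illustrative):

-- Each directive extends the accumulated string, then passes control on.
lit :: String -> (String -> a) -> String -> a
lit s k acc = k (acc ++ s)

int :: (String -> a) -> String -> Int -> a
int k acc n = k (acc ++ show n)

str :: (String -> a) -> String -> String -> a
str k acc s = k (acc ++ s)

-- Run a format: the number and types of the arguments are derived
-- from the format expression itself, with only Hindley-Milner types.
sprintf :: ((String -> String) -> String -> a) -> a
sprintf fmt = fmt id ""

main :: IO ()
main = putStrLn (sprintf (lit "x = " . int . lit ", s = " . str) 42 "hi")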

(edit: fixed link url)

C format strings

Thank you, Carlos, for the pointer to the paper. Interesting reading - I'll need to look at it more closely. The only problem I have with it is the fact that it is based on C format strings (reminds me of the days when I programmed in Fortran IV and Fortran 10).

From my perspective, C format strings are a kludge and an unpleasant way of outputting strings. I do know that other people find them pleasant - each to his/her own in that regard. The problem I see with them, and one not addressed by the paper, is the printing of "abstract" structures (sequences, lists, communication channels, trees, records, environments, functions, etc.) in a simple uniform way, i.e.,

print (sort list)

print (sort table1)

print 12 "This is a string" [ 1 2 34 [ ....] ]

etc.

In addition,

Every time heterogeneous lists appear, people talk about print routines.

it is the one common example of a (system-supplied) procedure/function that you can't duplicate by writing your own (with the possible exception of C) - polymorphic and able to process an unlimited number of parameters.

On the whole, this is one reason why I like Icon/Unicon - they both give you the ability to write your own versions of the system-supplied functions. You have access to most of the fundamental features.

For poorly specified problems

Heterogeneous lists tend to be useful when you're dealing with something that isn't well specified. The reasons for a lack of exacting specification are manifold: it could be that you're having to process data from some other poorly specified system; perhaps the requirements for a given portion of code are not that strict, and flexibility is more important; maybe code that is flexible enough to perform adequately on all manner of input and keep on ticking is more valuable to you than guarantees of getting things right for most cases while being fragile for anything else.

Whatever the reason for not having tight specifications, the result can often be that heterogeneous lists are a decent answer - sure, if you knew more about exactly what sorts of items you'll ever need to add, or knew exactly how many items you'll need, or any of a number of other things, then you could likely put together a much cleaner solution, with much better guarantees of correctness, using objects, or records, or union types, or something else; sometimes you don't have that information, though.

Sometimes flexibility is of more value in a solution than certainty; sometimes certainty is far more important than flexibility. There's plenty of scope, from dynamic languages, through static types, all the way to formal methods, to cover the needs of different sorts of problems. Can't we just use the tool that suits the problem instead of trying to claim that one is better than the other?

I don't buy it.

Here's why (a sketch in Haskell-style notation; I hope the intention is clear):

data T = SomeTypeOfData Int
       | SomeOtherTypeOfData (Int, Int)
       | UnknownTypeOfData String

decode :: RawDataStream -> [T]
decode = ...  -- try to decode as much as we currently understand

This trivially lets you add cases as you learn more about the data you're supposed to be processing, and you can get stronger guarantees from the type system(*) than from a type system that allows heterogeneous lists.

(*) Such as full case/match coverage.

The point is that what you

The point is that what you have there is something that needs to be taken down, modified, and recompiled every time a new case that you hadn't anticipated comes along. That may well be a good answer for some projects, but other projects have constraints where not having to do that sort of thing is more valuable than stronger type-system guarantees.

Think of it like security - sometimes having as many security features in place as possible and locking absolutely everything down is the way to go. Other times security just isn't that important, and all those locked-down things simply make it impossible for anyone to actually do anything, so you sacrifice some security for a little usability. Sometimes there are projects for which correctness is not the most important feature; flexibility is more important. In that case you may well trade off a little type safety for a little flexibility.

Updating running software

I don't see what updating running software has to do with the strictness of the type system. I haven't looked at it too closely yet, but Acute would seem to provide the necessary language support to make software updates more seamless, what with dynamic loading support, type-safe distribution, etc.

For the Example Given, Good Points

However, it's worth pointing back at my post here, which copies an example from "Code Reuse Through Polymorphic Variants." The example begins with a simple interpreter for variable substitution, then develops one for the untyped lambda calculus that reuses the variable substitution interpreter, then develops one that does addition and multiplication reusing the variable substitution interpreter, then develops one that combines the lambda calculus and addition and multiplication, all without recompiling any of the earlier definitions. Indeed, they can even be compiled separately. So you can have your polymorphism, static type safety, and extensibility all at the same time, contrary to common belief among those not familiar with the really good statically-typed languages.

Updating a live system without stopping it, however, is indeed a much harder problem. Another poster whose name I can't recall right now has already done me the favor of not having to link to Acute again with respect to those issues. :-)

Also have a look into generalised

Also have a look into generalised algebraic data types (GADTs), a generalisation of existential types; the GHC compiler has a Haskell extension to support GADTs using first-class phantom types (check here for more info).

See also Martin-Löf type theory, dependent type systems, and the Epigram programming language, where types are basically first-class entities.

In Static Type checking

In Static Type checking systems as well as Dynamic Type Checking systems, what is the effect of making Types (or Type Tags) first class citizens of the programming language?

I think you end up with dependent types. If you know Haskell, you might like...

So I can create a list that can take any type of value, as long as all element values are the same type. What I want to do is create a list that has a variety of values of different types (simultaneously) without having to create a union type.

There are two main solutions. Use someone else's predefined union-of-everything library: Data.Dynamic. Or use a fully static solution like HLists (Strongly typed heterogeneous collections), or nested tuples...

{-# OPTIONS -fglasgow-exts #-}
data Nil = Nil

-- Type-level induction over nested pairs: Nil ends the "list",
-- and a pair (a,b) is a cons cell whose tail b must itself have a length.
class    Length a                 where len :: a -> Integer
instance Length Nil               where len _ = 0
instance Length b => Length (a,b) where len (h,t) = 1 + len t

list_of_stuff = (1, (2.0, ([9,8,7], ("a string", (True, Nil)))))

main = print (len list_of_stuff)

The same trick transliterates into C++ with nested pairs:

#include <iostream>
#include <utility>
#include <vector>

using namespace std;
class Nil {};

// Recursive case: a pair is one element plus the length of its tail.
template <class H, class T> int length(pair<H,T> x) {
    return 1 + length(x.second);
}
// Base case: Nil terminates the nested-pair "list".
int length(Nil x) { return 0; }

int main(int argc, char* argv[])
{
    // std::vector replaces the non-standard compound literal (int[]){9,8,7}.
    cout << length( make_pair(1,
                    make_pair(2.0,
                    make_pair(vector<int>{9,8,7},
                    make_pair("a string",
                    make_pair(true,Nil())))))) << endl;
    return 0;
}
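
And the first option, Data.Dynamic, pairs each value with a run-time representation of its type and offers a checked downcast. A minimal sketch:

import Data.Dynamic (Dynamic, toDyn, fromDynamic)
import Data.Maybe (mapMaybe)

-- A list of differently typed values, each carrying its type tag.
mixed :: [Dynamic]
mixed = [toDyn (1 :: Int), toDyn (2.0 :: Double), toDyn "a string", toDyn True]

-- fromDynamic is the checked downcast: Just at the right type, Nothing otherwise.
ints :: [Int]
ints = mapMaybe fromDynamic mixed

main :: IO ()
main = print ints   -- prints [1]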

...You may also be interested in existential types

I second the HList papers

I think the original poster may be interested in reading about HLists.

"Strong Types for Relational Databases" is also a very good paper which uses HLists to model relational algebra.

I was about to post a question on LtU myself along similar lines to this post: how does one deal with a type system in a database environment?

A user may create/alter/remove tables WHILE the database server is running; are statically typed languages still useful in such a scenario? As far as I can tell, the "Strong Types for Relational Databases" paper shows how statically typed tables can be defined in Haskell code by a programmer. How can Haskell's (extended) type system help if one attempted to build a complete database server (a server which receives SQL queries, processes its own data, and returns results)?

A third solution

Extensible union types are, in my not-so-humble opinion, a better solution to the problem posed by the original poster. I first encountered them in the paper "Monad Transformers and Modular Interpreters" (Liang, Hudak & Jones; 1995). They are not ideal, but they are much more flexible than union-of-everything types, and not as heavyweight as HLists.

Here's some Haskell code to do what the original poster was asking for:

{-# OPTIONS_GHC -fglasgow-exts #-}
{-# OPTIONS_GHC -fallow-overlapping-instances #-}

{- **************************************************************
   * Extensible Union Types used to print "heterogeneous" lists *
   ************************************************************** -}

import ExtUnion

-- Make Extensible Type Unions instances of the Show class, but only
-- if every member of the union is an instance of Show.
instance (Show head, Show tail) => Show (TypeCons head tail) where
    show (TCHead x)  = show x
    show (TCTail xs) = show xs
instance Show TypeNil where
    show TypeNil = error "show: cannot show TypeNil"

-- Here's a type union.
type PrintableTypeUnion =
      TypeCons Float
    ( TypeCons Int
    ( TypeCons String
    ( TypeCons Bool
               TypeNil
    )))

-- Making ad-hoc extensions to the union is trivial.
type LargerPrintableTypeUnion = TypeCons [Ordering] PrintableTypeUnion

-- Let's see it in action.
main = print ([ toTypeList (1::Int)
              , toTypeList (2.0::Float)
              , toTypeList [LT, EQ, GT]
              , toTypeList "a string"
              , toTypeList True
              ] :: [LargerPrintableTypeUnion])
-- Running main prints:
-- [1,2.0,[LT,EQ,GT],"a string",True]

The code imported from ExtUnion is closely based on code in the aforementioned paper, but yell if you want me to post it here.

Extensible unions as negation of extensible records

Peter McArthur wrote:

main = print ([ toTypeList (1::Int)
              , toTypeList (2.0::Float)
              , toTypeList [LT, EQ, GT]
              , toTypeList "a string"
              , toTypeList True
              ] :: [LargerPrintableTypeUnion])

The OOHaskell paper describes a similar tool called `consEither', which creates a similar extensible union type Either a (Either b ...) on the fly. No prior declarations are needed. Incidentally, such a union type supports safe downcasting (aka typecase).

Moreover, by negating HList's extensible records (in the Curry-Howard sense) we can obtain extensible unions automatically. No typeclasses are needed and no type declarations. These extensible unions solve the expression problem for Haskell.

http://www.haskell.org/pipermail/haskell/2006-July/018172.html

The other solution is described in Koji Kagawa's paper `Polymorphic Variants in Haskell', referenced in the message.
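
For the flavour of the consEither approach, here is the nested-Either shape it builds - a sketch of the shape only, not OOHaskell's actual machinery:

-- A three-way union as nested Eithers.
type U = Either Int (Either Double String)

vals :: [U]
vals = [Left 1, Right (Left 2.0), Right (Right "a string")]

-- Safe "typecase": peel off one Either layer per alternative.
showU :: U -> String
showU = either show (either show show)

main :: IO ()
main = mapM_ (putStrLn . showU) vals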

Sounds Like the Expression Problem Again

Bruce Rennie: In OCaml the following creates a tree:

type 'a btree = Empty | Node of 'a * 'a btree * 'a btree;;
My problem is that 'a sets all Nodes to have the same type for the datum values, where I really need the datum values to be of different types.

The requirement is that the code should not be required to be recompiled to add in an additional type signature for the structure. In other words, it needs to be future proof.

So you need a type that functions can operate on, and both the type and the functions need to be independently extensible? That's called the "expression problem," and there are a few good resources available describing solutions to it, the best one that I know of even being in O'Caml. I wrote about them here, and particularly recommend following the link to "On the (un)reality of virtual types." Please also see Jacques Garrigue's Code Reuse Through Polymorphic Variants, which demonstrates many different approaches to the issue, all within O'Caml.

Update: "Code Reuse Through Polymorphic Variants" is concise enough that I can probably convey its value inline, so let me just crib from it. It defines an evaluator for the lambda calculus, incrementally going from a language with only variables; to one with variables, abstraction, and application; to one with variables, abstraction, application, and basic arithmetic, all without recompiling previous definitions. I'm going to show the code for each and offer my own brief explanation, please see the paper linked above for more details. Here goes:

The language of variables is, of course, dirt simple:

type var = [`Var of string]

let eval_var sub (`Var s as v : var) =
  try List.assoc s sub with Not_found -> v

Let's try it:

# eval_var [] (`Var "x");;
- : [> `Var of string ] = `Var "x"

# eval_var ["x", `Var "y"] (`Var "x");;
- : [> `Var of string ] = `Var "y"

So far so good, eh? But now we want to add abstraction and application, i.e. we want the lambda calculus, but without having to change how variables get evaluated. That looks like this:

type 'a lambda = [`Var of string | `Abs of string * 'a | `App of 'a * 'a]

let gensym = let n = ref 0 in fun () -> incr n; "_" ^ string_of_int !n

let eval_lambda eval_rec subst : 'a lambda -> [> 'a lambda] = function
    #var as v -> eval_var subst v
  | `App(l1, l2) ->
      let l2' = eval_rec subst l2 in
      begin match eval_rec subst l1 with
        `Abs(s, body) ->
          eval_rec [s,l2'] body
      | l1' ->
          `App (l1', l2')
      end
  | `Abs(s, l1) ->
      let s' = gensym () in
      `Abs(s', eval_rec ((s,`Var s')::subst) l1)

let rec eval1 subst = eval_lambda eval1 subst

The salient points here are:

  • the type of lambda expressions is left open: it's parameterized by a type variable for the type of terms, 'a.
  • the new type includes the type constructor for variables, i.e. the new type is a subtype of the var type.
  • the eval_lambda function relies on open recursion (the eval_rec parameter) and is annotated to indicate that it takes a lambda expression to "at least a lambda expression," i.e. the open recursion can return a result that is a subtype of lambda expressions, which are themselves a subtype of var expressions. It's this parallel of open types, in the form of polymorphic variants, and open functions via open recursion that is the key to the whole thing.
  • the var case is handled by calling the existing eval_var function; the other cases are handled partially by calling the recursive functional parameter on subterms.
  • to actually get a working evaluator for the language being defined, we need to close the recursion; that's what "eval1" does.

So how does our lambda calculus evaluator do now?

# eval1 [] (`App(`Abs("x",`Var"x"), `Var"y"));;
- : 'a lambda as 'a = `Var "y"

Now let's add a couple of arithmetic functions, namely addition and multiplication:

type 'a expr =
    [`Var of string | `Num of int | `Add of 'a * 'a | `Mult of 'a * 'a]

let map_expr f : 'a expr -> [> 'a expr] = function
    #var | `Num _ as e -> e
  | `Add(e1, e2) -> `Add (f e1, f e2)
  | `Mult(e1, e2) -> `Mult (f e1, f e2)

let eval_expr eval_rec subst (e : 'a expr) : 'a =
  match map_expr (eval_rec subst) e with
    #var as v -> eval_var subst v
  | `Add(`Num m, `Num n) -> `Num (m+n)
  | `Mult(`Num m, `Num n) -> `Num (m*n)
  | e -> e

let rec eval2 subst = eval_expr eval2 subst

This is a bit more subtle: the expr type is a subtype of var, but not of lambda expressions. Its term type is still open, though. Also, because we're now dealing with arithmetic, we introduce a function to evaluate subterms before doing the pattern-matching so as to deal with redexes. Beyond that, the pattern remains the same: let eval_var deal with variables and just handle the rest. Note that any pattern not handled explicitly is just returned as-is; again as part of remaining open. And, as always, we need to close the recursion to get an actual evaluator:

# eval2 [] (`Add(`Mult(`Num 3, `Num 2), `Var"x"));;
- : 'a expr as 'a = `Add (`Num 6, `Var "x")

Now watch how easy it is to tie the lambda and expr evaluation together!

type 'a lexpr = [ 'a lambda | 'a expr]

let eval_lexpr eval_rec subst : 'a lexpr -> [> 'a lexpr] = function
    #lambda as x -> eval_lambda eval_rec subst x
  | #expr as x -> eval_expr eval_rec subst x

let rec eval3 subst = eval_lexpr eval3 subst

A lexpr is either a lambda expression or an expr expression; evaluating a lexpr just delegates to eval_lambda and eval_expr. Just because we can, though, we once again leave the types and evaluators open, in case someone wants to extend them later. As always, we close the recursion to get an actual evaluator.

# eval3 [] (`Add(`App(`Abs("x",`Mult(`Var"x",`Var"x")),`Num 2), `Num 5));;
- : 'a lexpr as 'a = `Num 9

And there it is: the incremental extension of a type and the functions that operate on it, without recompiling any of the previous definitions.

and also

Open data types and open functions, which were discussed quite recently on LtU here.

Thanks for all the feedback and discussion

My thanks to you all for your time in providing feedback on the original post. I have a fairly long post to come, but that will wait till tonight.

Just a couple of quick clarifications: each value in the universe of discourse belongs to one and only one type (type space). Each type (type space) is represented by a value in the type Type. Every value carries with it, as an intrinsic part of the value, the type (type space) it belongs to. The values exist in the framework of the programming language, and within its computational system they cannot be decomposed or separated.

A simple example of the use of a list containing different types of values is

map print [ 1 2.0 array.of.something "a string" true ]

which should print out the external representation of each value. print should be defined for each type created.

More details to follow.

Again thank you to all who have responded.

Difference between union types and subtypes?

What is the principal problem in handling union types as subtypes? The problem of subtyping has not been solved (i.e. it is undecidable or something like that) in static type systems, but in my mind there is no practical difference between subtyping and union types: a programming language could infer the interface signatures by looking at all available uses of the elements of a collection.

Fundamentally different

There are at least two fundamental differences between (disjoint) union types and subtyping:

1. Openness vs. closedness. A union type describes all possible alternatives exhaustively. Consequently, a type checker can check that you never forget to handle any of them. With subtyping OTOH the number of alternatives is unlimited - this makes them extensible but also weaker in checking, because you always need catch-all clauses that may accidentally cover unintended cases, especially when you add some later.

2. Subsumption vs. injection. To construct a value of union type you have to explicitly inject it. With subtyping, any value of a subtype implicitly is a value of any supertype. This is more convenient on one hand, but also makes types less precise, because they are only ever exact up to subtyping.

If you meant non-disjoint union types then the story is a bit different, but these are known to be a very bad idea anyway.

BTW, the problem of subtyping (inference) is not undecidable - it is just largely impractical, and most likely will remain so.
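
A small example of the checking that closedness buys - with GHC's -fwarn-incomplete-patterns, every non-exhaustive match over a closed union is flagged (names are illustrative):

data Shape = Circle Double | Square Double   -- a closed union

area :: Shape -> Double
area (Circle r) = pi * r * r
area (Square s) = s * s

-- If a Triangle constructor is later added to Shape, the compiler
-- points at every function, like area, that fails to handle it.
-- Note also that each value is explicitly injected via its constructor.
main :: IO ()
main = print (map area [Circle 1.0, Square 2.0])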

What prevents a compiler from keeping union types open?

Openness vs. closedness. A union type describes all possible alternatives exhaustively. Consequently, a type checker can check that you never forget to handle any of them. With subtyping OTOH the number of alternatives is unlimited - this makes them extensible but also weaker in checking, because you always need catch-all clauses that may accidentally cover unintended cases, especially when you add some later.

What prevents a compiler from keeping union types open? I do not see it as a problem: a compiler could easily maintain an open slot for extending a union type in released code. The open slot could be optimized away at run time or link time.

Subsumption vs. injection. To construct a value of union type you have to explicitly inject it. With subtyping, any value of a subtype implicitly is a value of any supertype. This is more convenient on one hand, but also makes types less precise, because they are only ever exact up to subtyping.

Again, I see no problem: a compiler could easily provide a typing mechanism where only subtypes of a type are allowed. In this case union types work like C++ templates with constraints: I can pass any subtype of type T, but not T, and a different piece of code will be instantiated according to the type.

Not problems but features

You misunderstand. Closedness is not a problem, it's a feature. As I said, it allows stronger static checking, and that is often more important than after-the-fact extensibility.

Regarding subsumption I don't understand your comment. What you describe is how subsumption works already. Again, it is an advantage of union types that you do not need subsumption, because that makes the type system more precise.

Subtyping has a different set of advantages, so in short, subtyping and union types simply make different trade-offs.

I do not see how closedness allows for stronger type checking.

Closedness is not a problem, it's a feature

If you do not want to add more types to a union type, that is. But what if you do?

As I said it allows stronger static checking, and that often is more important than after-the-fact extensibility

I fail to see how closedness allows for stronger type checking. If a type B is a subtype of type A, then B can fit into any function that requires an A. I do not see how the static checking can be any stronger.

Subtyping has a different set of advantages, so in short, subtyping and union types simply make different trade-offs.

What I am saying is that subtyping and union types are the same thing, and there shouldn't be any trade-offs/constraints unless explicitly requested.

Subtyping and unions again

If you do not want to add more types to a union type, that is. But what if you do?

Then you change the definition and the type checker happily points out every single place you need to adapt. Not so with subtyping (or other open approaches), where code handling existing cases may accidentally(!) subsume new ones without the type checker being able to know and point it out. Depending on the application, this may or may not be what you want.

If a type B is a subtype of type A, then B can fit in any function that requires an A. I do not see how stronger static checking can be.

By enabling you to say: "Here I want to allow A, and A only!". As soon as you have subtyping, you cannot express that anymore.

What I am saying is that subtyping and union types are the same thing, and there shouldn't be any trade-offs/constraints unless explicitely requested.

What shall I reply to that? I can only repeat that they are not the same thing, and they have different, incompatible advantages. I explained why. You won't change that fact by arguing that one should be turned into the other.

By enabling you to say:

By enabling you to say: "Here I want to allow A, and A only!". As soon as you have subtyping, you cannot express that anymore.

How is that a problem? By using subtyping + constraints, you can achieve just that. For example:

template <class T extends foo> void action(T &a) {
}

In the above, only subtypes of foo are allowed.

I can only repeat that they are not the same thing, and they have different, incompatible advantages. I explained why. You won't change that fact by arguing that one should be turned into the other.

I am not saying that one should be turned into the other. I am saying that they are the same thing! The only difference is that subtyping is open union typing. Don't get caught up in current implementations; think freely. Here is an example:

(* a union type *)
type Foo = Foo1
         | Foo2
         | Foo3

void action(Foo param) { bla bla }

//subtypes
class Foo {
}

class Foo1 extends Foo {
}

class Foo2 extends Foo {
}

class Foo3 extends Foo {
}

template <class T extends Foo> void action(T param) { bla bla}

In both cases, the function 'action' accepts only the 3 elements that are related to Foo: Foo1, Foo2, Foo3.

It seems that subtyping == union types.

Circling

In the above, only subtypes of foo are allowed.

You missed the whole point. My example was about being able to express a precise type, i.e. exclude any other type. With subtyping there is no way to exclude subtypes. Of course, that's the whole point of subtyping, but sometimes it is not what you want. See below.

I am saying that they are the same thing!

We are going in circles. I explained why they aren't, and my explanation was not based on any "implementation", but on the standard notions in type theory as well as practical arguments that you want to have both on occasion.

To address your example, with the subtyping formulation you cannot prevent anybody from defining additional subtypes of Foo or FooN somewhere else and call action with them, possibly breaking all its invariants (Philippa gave a good example). So you will never have the same guarantees as with a closed union type.

union types = enum on steroids

You examine unions like enums using a case-construct. As the union type is closed, the type checker can verify that all cases are handled. I hope this helps.

Look Up for Counterexample

If you scroll up a bit, you'll find my crib from "Code Reuse Through Polymorphic Variants," in O'Caml, of an elaboration of three evaluators for a language. The types (var, lambda, expr, and lexpr) are all open thanks to being based on polymorphic variants. The evaluators are all open thanks to relying on open recursion. However, the evaluators are also all annotated to accept only the specific type they evaluate, but to possibly return a subtype—any subtype—of the type that they accept. So as you can see, it's quite possible to restrict a function to taking a specific (union) type and still remain open. And that's just one possible approach; the "Code Reuse Through Polymorphic Variants" web page provides about half a dozen others, with various uses of modules, objects, regular variants, and so on. You might find it interesting.

They are not the same thing,

They are not the same thing, and that's why any attempt to make them the same thing will impose constraints on typical use cases for one or the other. If union types're open, I've no way of knowing that my standard inductive list type's not just been turned into a tree by Moe or Joe "helpfully" adding a data constructor Branch a (List a) (List a). That's a constraint on what I can do with the type system.

You can in fact do subtyping on union types. It turns out to work 'backwards' compared to subtyping on records, but nevertheless has its uses (but see earlier comment about row polymorphism - I've a bad habit of confusing the two because row polymorphism effectively comes with a notion of subtyping).

The openness of a type can be user-defined.

If you want your union type closed, then the language should allow just that. But in some cases it may need to be open.

A type T1 that is a part of a union type T is a subtype of T, for all practical purposes.

No. I might have more than

No. I might have more than one data constructor that takes parameters of exactly the same type. There's a link, but it's not as strong as you're claiming.
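
A short illustration of the point (constructor names invented):

-- Two data constructors with exactly the same parameter type: the tag,
-- not the parameter type, distinguishes the cases, so the constructors
-- cannot simply be read as subtypes of Reading.
data Reading = Temperature Double | Pressure Double

classify :: Reading -> String
classify (Temperature _) = "temperature"
classify (Pressure    _) = "pressure"

main :: IO ()
main = putStrLn (classify (Temperature 21.5))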

You're right that open unions have their uses - the classical example would be exceptions. I've advocated them as an option before. They most certainly should not be the default though.

Could you please tone down the claims you make when you think you've spotted a link? You do this regularly, and you get it wrong.

Detailed comments on responses

Again Thanks to all of you that have responded.

In relation to the comment I made earlier about each value belonging to only one type: with respect to the system that I am building, each value carries its type with it at all times. Examples of this are as follows:

The integer 1 would be represented in the system as [Int:1], where for the purposes of this discussion [:] is an indivisible entity in the underlying computational system. The exact coding of these "entities" is an implementation issue and does not impact the system as such - I believe the term to describe these is that they are the Gödel points.

Likewise the Integer 2 would be represented by [Int:2].

In the case of the two Boolean values True and False, these would be represented by [Bool:True] and [Bool:False] respectively. Every value is distinguishable from every other value and it is obvious from which type (type space) the value comes.

Types themselves are also values in the universe of discourse and are represented within the computational system. Examples of this would be [Type:Type], which represents the Type type (type space). For discussion, the following would also be valid:

[Type:Int] would represent the type Integer,
[Type:Bool] would represent the type Boolean,
[Type:Float] would represent the type Float (e.g. 2.0),
[Type:List] would represent the type List (here I am talking about a polymorphic list).

Where I am coming from in terms of structured types is that the components of a structured type do not themselves have to have a specified type; however, the system is certainly designed to allow me to specify one, and in many cases this will be done. What this means is that when I use the type generator, a new type is brought into existence that is distinguishable from every other type that currently exists. Part of the creation of the type is the creation of a value generator for that type.

In addition to the above, I am building in the facility (extensible) to specify a constraint on the allowable values that a component may take. In other words, I will be able to specify anything from not caring what values are allowed (all are allowed) right up to specifying which specific values from specific type or types are allowed.
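To sketch the idea in an existing language such as OCaml (all names here are hypothetical - my system is not OCaml), a component would carry its constraint as an ordinary predicate, with "don't care" being the constant-true predicate:

type 'a component = { check : 'a -> bool }

(* supplying a value checks it against the component's constraint *)
let set (c : 'a component) (v : 'a) =
  if c.check v then Some v else None   (* None: constraint violated *)

let any : int component = { check = (fun _ -> true) }   (* all values allowed *)
let nibble : int component = { check = (fun n -> n >= 0 && n < 16) }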

There is no provision for a "universal" type Univ. In addition, no type will be a sub-type of any other. All types (type spaces) are disjoint.

Thank you for the reference to intersection types, but this doesn't appear to be what I am looking for. As I said above, no value will belong to more than one type.

From my perspective, sub-typing is the wrong solution to the problem. It seems to me that defining an "expression" or "function" that describes a constraint is more meaningful. That is not to say that the concept of a sub-type is not valid; I just don't see it as an ultimately viable solution. I prefer to consider the constraint a sub-range specification and not the declaration/specification of a new type.

Please take what I am about to write as purely a personal opinion.
[[Opinion On]]
I consider the concept of sub-typing and sub-types as furphies. I personally consider sub-typing to belong to the same bucket as XML (and its associated technologies) - a "bad" solution looking for a problem to solve. Wherein by "bad" I mean that it is an overly and unnecessarily complicated solution. It just doesn't have the "Ah Ha" quality about it. And my opinion of XML and its ilk comes from my data communications days.
[[Opinion Off]]

Again, the type of every value is known at all times and is not separable from the value it represents. Therefore having a list (of different types of values) doesn't hide the type of each of the component values in the list.

Since I have no hierarchy in the type system, upcasting and downcasting are not relevant to what I am trying to achieve. However, a functional mapping between two types (type spaces) is not only allowed but encouraged when appropriate. Examples of this are as follows:

[Float:_] to [Int:_] or
[Int:_] to [Float:_]

would be expected with the appropriate mapping rules. Another example could be

[Circle:_] to [Ellipse:_]

There may not be appropriate mappings from one type to another; this is a programmer decision.

I will continue to look at the Haskell type classes, but at first view they don't appear to be relevant to the information I am after.

How do dependent types relate to lattices? I will take a look at the various references given later, but has anyone got a simple explanation?

I'll also look at the reference given for Code Reuse Through Polymorphic Variants. I am more than willing to look at the various ideas that you each have raised to glean a better understanding. I am reserving the right to not incorporate any of them if they don't fit within or modify the model I am currently working with.

In relation to the previously given example of a polymorphic list, which will probably seem unpalatable to some, I give an example from my days of writing communications programs (in COBOL, of all things).

A messaging system sends a variety of messages from one process to another. One can specify that all messages belong to the type [Type:Message] wherein the contents of the messages could/do belong to different types (control/protocols and data).
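For contrast, the usual ML rendering of that messaging example is a single variant type whose constructors carry differently-typed payloads - exactly the union-type encoding I would rather avoid (constructor names invented for illustration):

type message =
  | Control of string   (* protocol commands *)
  | Data of string      (* payload *)
  | Ack of int          (* sequence number *)

let queue : message list = [ Control "HELO"; Data "payload"; Ack 42 ]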

Lastly an observation: DSLs are problem domain solutions not solution domain solutions and as such are directed at solving the particular problem domain problem, which is where I am trying to work.

Again I wish to thank everyone who has responded, I appreciate the time and effort each of you has taken in responding. You have given me food for thought.

Dependent types

Your notion of first-class types seems to resemble so-called dependent types. There have been a number of experimental languages with dependent types, e.g. Cayenne or Epigram, which emphasize the ability to express all kinds of properties on the type level.

Note however that type theories with unrestricted Type:Type are known to be undecidable, and often otherwise problematic as well (e.g. logical inconsistencies can arise). This is usually circumvented by introducing towers of universes Type_i, where Type_i : Type_{i+1}, but not vice versa.
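A short Lean sketch makes the tower concrete (Lean 4 syntax; just an illustration of the universe discipline):

-- each universe lives in the next one up
#check (Type : Type 1)     -- accepted: Type (i.e. Type 0) : Type 1
#check (Type 1 : Type 2)   -- accepted
-- but (Type : Type) is rejected, which is what blocks Girard's paradox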

I am a long-time fan of the idea that values are types.

I am a long-time fan of the idea that values are types. See an old comment of mine here:

http://lambda-the-ultimate.org/node/1273#comment-14206

First of all, I have to disagree on subtyping being 'bad' (as in 'XML is bad'). Subtyping is nothing more than classification based on desired properties. In other words, a grouping of values (hence my claim that subtyping equals union types). And with grouping comes reuse. The real problem with subtyping is its implementations: current implementations enforce a cast-in-stone relationship between types which may not always be desirable.

Secondly, I think you can build all types out of three basic types: 0, 1, and a tuple of those. The infinite combinations of 0 and 1 in different configurations can produce infinitely many types.

For example, a bit is a union type = 0 | 1.
A byte is a tuple of 8 bits.
A short integer is a tuple of 2 bytes.
An integer is a tuple of 2 16-bit words.
A pointer is a tuple of 32 bits.
A linked list is a tuple of two pointers and an integer.
A class is a tuple of its members and a vtable.

The point being?

So how do you distinguish a pointer from a 32 bit integer? How do you distinguish pointers to bytes from pointers to integers? How do you express the types of functions?

More importantly, how do you express type abstraction? Your examples seem to completely miss the fact that types should express properties (i.e. semantics), not representations (i.e. implementation).

Of course you can encode any kind of information into bit sequences, but that's not news. What's the point of using such a low-level approach for types?

So how do you distinguish a

So how do you distinguish a pointer from a 32 bit integer?

You distinguish a pointer from a 32-bit integer by using a strong type definition. For example:

type ptr T = integer 

Then you cannot pass a pointer to any function that requires an integer, but you can do pointer arithmetic, because a pointer is just an integer.

If you want pointers without integer properties, then you could declare a tuple with one member:

type ptr T = {address : integer} 

Then pointer operations can work on the address, but a ptr is not an integer.
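For what it's worth, here is how the same distinction can be sketched today in OCaml with a phantom type parameter (module and names invented): the representation is an integer and arithmetic is available through the module, but a pointer cannot be passed where an integer is expected, and pointers to different element types do not mix.

module Ptr : sig
  type 'a t
  val of_int : int -> 'a t
  val offset : 'a t -> int -> 'a t
end = struct
  type 'a t = int
  let of_int n = n
  let offset p n = p + n   (* pointer arithmetic on the representation *)
end

type byte_cell   (* phantom element types *)
type int_cell

let p : byte_cell Ptr.t = Ptr.of_int 0x1000
let q = Ptr.offset p 4
(* let _ : int = p              -- rejected: a ptr is not an integer *)
(* let _ : int_cell Ptr.t = p   -- rejected: element types differ *)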

More importantly, how do you express type abstraction? Your examples seem to completely miss the fact that types should express properties (i.e. semantics), not representations (i.e. implementation).

Expressing semantics is a different ballgame. I think that the best approach is to have types as functions. Values are functions anyway, just with constant computations. For example, one could declare a positive number like this:

type positive n = n as integer and n > 0
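One way an existing static language approximates such a "type as predicate" is an abstract type whose only constructor enforces the condition. A minimal OCaml sketch (module name invented):

module Positive : sig
  type t
  val make : int -> t option   (* None when the predicate fails *)
  val to_int : t -> int
end = struct
  type t = int
  let make n = if n > 0 then Some n else None
  let to_int n = n
end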

A union type can then be expressed like this:

type foo T = T as foo1 or T as foo2 or T as foo3

Of course you can encode any kind of information into bit sequences, but that's not news. What's the point of using such a low-level approach for types?

The point is flexibility: one can make any type desired, from low-level pointers to high-level functions. The compiler can stay small (it can even provide only the IR)...if coupled with a powerful macro system that can transform code into basic elements at compile time, then you have a powerful language appropriate for all tasks.

It goes without saying that this is just an idea which might not work in reality :-).

Try it out yourself

It goes without saying that this is just an idea which might not work in reality :-).

Why not have a go at it yourself? You have lots of ideas and I think developing them in more detail will mature them.

If you need background reading, I'd recommend Benjamin Pierce's Types and Programming Languages. It has implementations of all the type systems that are presented, and you can use these as the basis for your own experiments. The book contains enough theory to extend your intuition about the type systems with solid proofs of their power and limitations.

I am trying.

Thanks a lot for the link. I've read parts of the book you mention online, if I recall correctly, and also a couple of other books that I have found lying around on the web.

I have a full time job as a programmer and I do not really have the time to undertake such a big project. Although in my spare time I am trying to write a compiler framework so I can test these ideas.

Types as First Class Values

I have just realised that my first question doesn't seem to have been answered, so I'll repeat it.

What is the effect of making Types (or Type Tags) first class citizens of the programming language?

What is the effect of being able to create new types within the code, in other words having a function that on different invocations returns a new type?

How do ST languages handle this (types as first class values), either theoretically or practically?

And my apologies to Robert Milne for not getting his name right in the first instance.

AFAIK

The answers to these would seem to lie around the already-mentioned dependent types or possibly having Ontic-like semantics.

Non top-level types?

What is the effect of being able to create new types within the code, in other words having a function that on different invocations returns a new type?

Hmm. Maybe you need to supply code snippets of some examples of what you'd like to do with these first-class types. Are you asking that function to return values of a new type? That doesn't seem possible/desirable in a static language. It seems like only the stuff inside the lexical scope of the original function would be able to use these types. This seems somewhat related to existential types to me.

Another point you'll no doubt want to investigate is why pretty much all languages require types to be declared at the top level.

Also, I think you might be interested in the Omega language, which has higher order, extensible kinds ("kinds" are the types of types).

pretty much all languages

pretty much all languages require types to be declared at the top level

Well, I have no statistics, but many of the ST languages I'm familiar with allow (at least some) types to be declared at other than the top-level. Take a look at Standard ML, for example.

What is the effect of making

What is the effect of making Types (or Type Tags) first class citizens of a programming language?

These are actually two different questions, with very different answers.

I'll answer the second question first.

First, think of a type as a classification of data. A type tag is then a piece of data that encodes what the type of another piece of data is. So the structure of the tag tells you what the type of the object is, and a "dynamically typed value" would be a pair, consisting of a tag and a value of the type indicated by the tag.

So you might have a datatype of tags like:

datatype tag =
     IntTag
   | BoolTag
   | PairTag of tag * tag
   | FunTag of tag * tag

Now that you have a datatype of tags (these are richer than usual), you can write an interpretation function that takes a tag, and computes the appropriate type from it.

typeof : tag -> type
typeof IntTag              = int
typeof BoolTag             = bool
typeof PairTag(tag1, tag2) = typeof(tag1) * typeof(tag2)
typeof FunTag(tag1, tag2)  = typeof(tag1) -> typeof(tag2)

Notice that this function takes a piece of data (a tag) as input, and computes a type from it, so we are using a program value at compile-time.

Finally, we can give the type of "dynamic values":

type dynamic_value = (t:tag * typeof(t))

What this type says is that we have a pair, consisting of a tag named t, and a second component whose type is typeof(t).

So a typical program using a dynamic value will project out the type tag, case-analyze it to figure out what type the second component is, and then do something appropriate. This is all sensible and safe from a type-theoretic point of view.
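As an aside, a limited form of this can already be written down with GADTs; here is a sketch in present-day OCaml (names mine), where the tag's type index ties each tag to the type of value it describes:

type _ tag =
  | Int  : int tag
  | Bool : bool tag
  | Pair : 'a tag * 'b tag -> ('a * 'b) tag

type dyn = Dyn : 'a tag * 'a -> dyn   (* a tagged, "dynamic" value *)

let rec show : type a. a tag -> a -> string = fun t v ->
  match t with
  | Int  -> string_of_int v
  | Bool -> string_of_bool v
  | Pair (t1, t2) ->
      let (a, b) = v in
      "(" ^ show t1 a ^ ", " ^ show t2 b ^ ")"

let () =
  match Dyn (Pair (Int, Bool), (3, true)) with
  | Dyn (t, v) -> print_endline (show t v)   (* prints (3, true) *)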

These sorts of definitions are called "dependent types", and are pretty common in theorem proving systems. Limited forms of them form the basis of the ML module system, and for Haskell's guarded algebraic types. Full dependent types will almost certainly start showing up in the next generation of functional language designs, soon to be brought to you by a grad student near you. :)

The answer to your first question, however, about what happens when you make the types themselves (rather than things like type tags) into program values is that many good properties of the type system start to break down, with more properties breaking down the more you can do. Simply adding a type of types to the language is enough to make typechecking undecidable, though it does not let you write programs that dump core. If you can get the type of a value and then case-analyze the type itself, you additionally lose any facility for data abstraction.

having and making first class types

They seemed good (very general and basic) questions, which can be hard to answer. I can answer for a Smalltalk-like DT system, but I can't say much about ST languages. Here's a short remark on my ignorance about ST, before I tackle the DT perspective.

I find complexity and/or ambiguity from ST at compile time hard to tolerate, because it forces me to embrace classes of problems I could arrange to avoid by design. (I suspect avoiding large classes of costs by design, instead of facing them head-on, is one big distinguishing characteristic among developers.) So, for example, I don't understand Haskell beyond surface impressions. I only came close to understanding ML back in 1995, when I had to learn it well enough to read Andrew Appel's book on compiling with continuations. But I soon turned to reading the C and assembler source code for New Jersey ML instead, which told me what I really wanted to know. :-)

Bruce Rennie: What is the effect of making Types (or Type Tags) first class citizens of the programming language?

This happens in some languages like Smalltalk, where a Type is an object of type Class, and every object in the system is associated with a Class. (And each Class has a type which is a Metaclass.)

It makes a language 'reflective' if you can write code asking Type objects questions. If Types survive to runtime (which would make it a lot easier to have code ask questions), and changing these Type objects also changes system behavior, this is part of being 'meta-complete' as well.

A typical thing to ask a Type at runtime is, "What can I do with your instances?" In Smalltalk, dispatching a method to an object amounts to finding a method in the object's class's method table, or in a superclass's method table (which ought to be cached for performance).
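A rough sketch of that lookup, written in OCaml for concreteness (field and function names invented; real Smalltalk also handles doesNotUnderstand and caches lookups):

type cls = {
  name    : string;
  methods : (string * (unit -> string)) list;   (* selector -> method *)
  super   : cls option;
}

(* walk up the class chain until a method for the selector is found *)
let rec lookup (c : cls) (selector : string) =
  match List.assoc_opt selector c.methods with
  | Some m -> Some m
  | None ->
      (match c.super with
       | Some s -> lookup s selector
       | None -> None)   (* Smalltalk would send doesNotUnderstand here *)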

Bruce Rennie: What is the effect of being able to create new types within the code, in other words having a function that on different invocations returns a new type?

Mainly it means at runtime you can have all the sorts of changes that might only occur at development time in other more static systems. You'd want to make sure the runtime had good behavior under scaling, in case something extreme happened, like making a new type for every client request sent to a server, as a plausible worst case example.

It makes it hard to reason about what happens with the new types, when trying to predict the limits of behavior at compile time, before the system starts running.

I don't think Classes == Type

This happens in some languages like Smalltalk, where a Type is an object of type Class, and every object in the system is associated with a Class. (And each Class has a type which is a Metaclass.)

You'll have to forgive me, as I am speaking intuitively rather than from a sound theoretical basis, but I don't believe that a class is the same thing as a type, especially in Smalltalk. I believe that Smalltalk has one type, "object" (things that can receive messages), which includes classes. I say this because there's no way to know for sure what class something is. Someone can easily create an object that lies through its teeth, and you probably don't care that it's lying anyway (duck typing). Having classes (in Smalltalk, anyway) does not, to me, equate to first-class types. Smalltalk has only one type that just happens to be very, very flexible.

duck typing

Logan Capaldo: You'll have to forgive me, as I am speaking intuitively rather than from a sound theoretical basis, but I don't believe that a class is the same thing as a type, especially in Smalltalk.

I'm not attack oriented, which is uncommon, and I prefer shades of gray in my monotones, and not just black and white. I can squint and see all your points as true for a choice of meaning for 'type' and for a set of goals (e.g. to see a unifying quality in Smalltalk objects).

I think the intent of a class under Smalltalk is informal representation of type; but this needn't fit a definition of type under other criteria. So it depends on what you want to extract from a line of reasoning. For example, I can say what I meant to extract from my class as type view.

You can use duck typing as a definition of type, and have a compiler statically predict whether you risk having a doesNotUnderstand failure at runtime. (If the goal is predicting runtime failure at compile time, then a Smalltalk class works as a type, even if only as a duck type.) The subset of methods Mx used on an object x downstream from the point where x is passed as an argument amounts to the required type of the argument. Specifying the formal type of x as a Tx where Tx >= Mx (Mx is a subset of methods in Tx) would be safe if every value passed as x had at least all the methods in Tx.

That's the sense in which I meant a Smalltalk class was a type. The only kind of static typing I might add to Lisp and Smalltalk would be based on duck typing. I only want to catch inconsistency at compile time between how I use a value and the ability to respond to that use.
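OCaml's object types are one static rendering of exactly this idea: the inferred type of a function records just the set of methods it actually sends. A small sketch (names invented):

let describe x = x#name ^ " says " ^ x#sound
(* inferred: < name : string; sound : string; .. > -> string *)

let dog = object
  method name = "Rex"
  method sound = "woof"
end

let () = print_endline (describe dog)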

Catching Errors vs. Waxing Philosophical

Aha I see. I sometimes forget that the main point of type-checking is to make sure the program is right, not to play mental games with myself ;).

Real uses

So, actually, what are the uses of types and advanced typing features?
- Catching errors.
- Function overloading: a single "print" function to remember.
- Optimisation hints for the compiler and precise memory sizing (for example, bit fields in C, "RANGE" types in Ada and VHDL)
- Formal proving

What are the other uses?

Compiler checked documentation

Another use that I'd add to that list is that a type provides a form of compiler checked documentation. Glancing at the type can give quite a bit of information about the function.

Stoy, not Scott?

Just a minor bibliographic and spelling correction; when you say that you have read:

" ... Dana Scott's "Denottaional Semantics" ...

do you mean Joe Stoy's book? Amazon has the information on it as:

"Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory" by Joseph E. Stoy

which I think is correct.

Of course the work on domain theory and denotational semantics is largely Scott's and Strachey's (and Milne's, and many, many others now), but the book itself (if you mean the one I suspect) is Joe Stoy's. (Scott also contributed a foreword to the book on Strachey and his work, written not long after Strachey had passed away.)

Stoy, not Scott?

Thank you, you are correct. I have just got my copy of the book out and can see my mistake. Dana Scott's name is in a larger font than Joseph Stoy's name on the front cover.

I rushed a little too fast with my typing.

regards and thanks.

Some people do not understand types.

I have posted a comment in this discussion showing the benefits of functional programming languages' type systems, but I saw no reaction to it, which was a great disappointment. I think people don't get type systems, especially those whose main line of work is in mainstream imperative programming languages.

A major problem in Java (i.e. the handling of null) is a non-problem in functional programming languages, because null is handled logically, as a separate type...the problem has been solved for many years now, yet mainstream programming languages ignore the solution...
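For example, with an option type (an OCaml sketch; ML's option and Haskell's Maybe work the same way), absence is a separate case that the type makes you confront, rather than a null hiding inside every reference type:

let greet (name : string option) =
  match name with
  | Some n -> "Hello, " ^ n
  | None -> "Hello, stranger"   (* the compiler warns if this case is missing *)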

Another day, another disappointment...

Not sure I would agree

The above makes it clear that when it comes to factorial, number 0 is a different type than the rest of the numbers.

Using ML as the basis for defining the type of natural numbers:

datatype number = Zero | Succ of number

val zero = Zero
val one = Succ(Zero)
val two = Succ(Succ(Zero))
val three = Succ(Succ(Succ(Zero)))

(* multiplication is not predefined on number, so define it via addition *)
fun add Zero n = n
  | add (Succ m) n = Succ (add m n)

fun mul Zero _ = Zero
  | mul (Succ m) n = add n (mul m n)

fun factorial Zero = Succ(Zero)
  | factorial (Succ x) = mul (Succ x) (factorial x)

From this definition, it should be clear that Zero and Succ are constructors of number, and that both constructors yield the same type. More specifically, the set of numbers is defined as the union of zero and all succeeding numbers (ignoring negative integers). The real difference in this case is not that the types are different, but rather that the value of zero does not recursively reference the number type, whereas the succ numbers do. I'd be more inclined to say that the advantage that you cite has more to do with mapping induction directly to the functional recursion over a type.

The case of lists would be similar in recursion:

datatype 'a list = Nil | Cons of 'a * ('a list)

(The subthread on Downcasting in ML is also somewhat related.)

Certainly.

0 is not strictly a separate type...but it can be thought of as such in the context of a value-based type system...just like nil.

A type defines the domain of possible values

In this case, 0 and all succeeding numbers define the range of values for the type of natural numbers. Dependent types would allow you to have much finer control over the values within a type, but once defined, the stipulated values would all have the same type. Stated differently, the set of natural numbers is {0, 1, 2, ..., n, ...}.

It's not that I disagree with your conclusion that functional languages provide a fine facility; it's just that I think you are defining "type" a bit loosely. In my opinion, the advantage you are talking about has to do with functional languages directly supporting proof by induction, which naturally lends itself to recursion over a type. As in all cases of proof by induction, you must have a base case to start with, and then prove that each succeeding value follows from it.

Range types aren't types then?

Some languages allow for defining a range type (Ada, for example). Zero is a range type by itself. Therefore one can consider factorial (for example) to be defined twice (in other words, to be overloaded): once for the range [0] and once for the range [1, inf).

Sure, but...

taking such a perspective, you are treading dangerously close to dependent types (a more widely used example, though, is taking the head of a cons list, not factorial).

Not bad by itself, but worth knowing that.

Yeap.

Just as a reminder, one of the relevant LtU threads:

Dependent types: literature, implementations and limitations ?

I think dependent types should be the way forward for programming languages, but I do not see it happening in the next 100 years.

Depends how wide your value

Depends how wide your value of "dependent types" is - the odds of something like GADTs not making their way into the mainstream seem pretty slim to me.

The big four FP languages

Scheme: The typical Scheme definition of factorial (straight out of SICP) does not use pattern matching. IIRC, Scheme can support pattern matching, but I don't know the language well enough to say how it would really differ.

(define (factorial n)
  (if (= n 0)
      1
      (* n (factorial (- n 1)))))


Standard ML: SML comes closer to your view of how functions can pattern match on specific values:

fun factorial 0 = 1
  | factorial n = n * (factorial (n-1))

There are a couple of problems with your analysis though. First, there are not two functions here - there is only a single function that has the type (int -> int). You may model in your head that there are two functions being overloaded with two types, but the programming language has no semantics along these lines. The second problem is that the clauses are order dependent. The following ML code will go into an infinite loop, because the first clause always matches (most SML compilers will in fact flag the second clause as redundant):

fun factorial n = n * (factorial (n-1))
  | factorial 0 = 1

Finally, the syntactic form of repeating the function name over the different patterns is really just syntactic sugar for:

fun factorial n =
   case n of
      0 => 1
    | n => n * (factorial (n-1))

which coincidentally corresponds to the form of the Scheme, OCaml, and Haskell code.

OCaml: Caml uses the match construct:

let rec factorial n =
   match n with
      0 -> 1
    | n -> n * (factorial (n - 1))


Haskell: Haskell uses guards to accomplish same:

factorial n | n == 0 = 1
            | otherwise = n * factorial (n - 1)

Haskell

I think you can write it as 2 separate functions in Haskell:

factorial 0 = 1
factorial n = n * factorial (n-1)

But I guess it is still order dependent.

Not separate

I think you can write it as 2 separate functions in Haskell

Even though the syntax might suggest otherwise, this does not define 2 separate functions. The syntax has the same meaning as the SML example, and similar desugaring rules.

Yes, this is just sugar for

Yes, this is just sugar for case.

Various Haskell derivatives and Haskell-like languages use most-specific pattern matching: Hope is an example.

OCaml

In OCaml it can also be written everso slightly more succinctly as

let rec factorial = function
    0 -> 1
  | n -> n * (factorial (n - 1))

I've done a bit more in Standard ML...

...so it doesn't surprise me that there are closer approximations in OCaml and Haskell than I could come up with in short order. I did make sure they compiled first, so at least I know they were legal snippets. I now know a wee bit more about OCaml and Haskell. :-)

Some people...

I think you could have been a bit more explicit in stating that making the null/non-null distinction a part of the type of a value means that a value cannot ever be null when that is not explicitly allowed by its type -- that is, it cannot unexpectedly become null through programmer error on either side of the API. I think that is what most people on that thread were missing.

In my own experience such things did require small "leaps" in understanding before I finally "got" the stronger, more expressive type systems. Being more explicit might help more people make the leap (to the benefit of all :)).

Good observation! thanks.

Good observation! thanks.

...that a value cannot ever

...that a value cannot ever be null when that is not explicitly allowed by its type

Isn't that the progress property of type safety?

More assumptions

The definition of preservation and progress in these notes seems pretty standard. Getting null where non-null-ref a is expected is a violation of preservation, unless your type system is so broken that you can conclude null : non-null-ref a. Whether it's a violation of progress as well depends on your evaluation rules.

Of course, in the context of how Achilleas might have more effectively explained non-null reference types, he couldn't have assumed his readers would have the formal background in type theory to draw this conclusion for themselves.