GADTs in a dynamically type-checked language?

I've been reading papers about Ωmega, and I'm very fascinated by them.

At the same time, I'm invested in untyped, dynamically type-checked programming. I'm not sure I want to give it up - it has some conveniences in interactive programming that I consider essential.

But is there actually a reason why GADTs wouldn't work in a dynamically type-checked language? FGJω and an extension of C# show that you can encode (some?) uses of GADTs once you have type operators/higher kinds or equational type constraints, respectively. Couple that with exact runtime types as in the CLR and it seems to me that theoretically you could enforce GADT properties even in a dynamically type-checked language.

Now, you might ask, what's the point? Let's take the type-safe AST evaluator (yeah, I know... ;)). Well, of course, you'd lose the ability to prevent the construction of ill-formed AST objects at compile-time, but you'd still have these guarantees at run-time. And that would be an improvement over plain dynamically type-checked languages, no?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

dynamic GADTs

My intuition is that GADTs would be nothing special in dynamically typed languages.

Could you explain what you would believe to be a difficulty for GADTs in dynamically typed languages? Which properties do you expect would be problematic?

Flying blind...

First, it seems to me that you'd need exact runtime types for type arguments, as we discussed recently, and which is something that no existing dynamically checked language (that I know of), except C# 4.0, has.

Second, it's unclear to me how to translate the examples from the Ωmega papers to such an untyped framework, i.e. what the pragmatics of it are. I guess I'll have to try it out, but your comment seems to indicate that it's no big deal, which I find heartening.

I must say though that I don't understand your "nothing special" remark. Isn't the ability to rule out the construction of certain data objects through the type system in an untyped language something special?

Dynamic Typing?

As I understand it, a dynamically type-safe language will have well-defined behavior even if the input is of an unexpected type. That does not mean the behavior will predictably be a thrown exception.

It seems you are aiming to more explicitly reject (or "rule out") inputs that aren't of the correct type. This could be done if you have a little reflection and first-class types, but I believe it goes above and beyond dynamic typing.

I've not heard that C# 4.0 is a dynamically typed language.

First, it seems to me that

First, it seems to me that you'd need exact runtime types for type arguments, as we discussed recently, and which is something that no existing dynamically checked language (that I know of), except C# 4.0, has.

Assuming I understand what you mean by "exact runtime types for type arguments", then I'm working on adding exactly that to Magpie right now.

Magpie is dynamically-typed, but has an optional static checking pass that it performs. Types are "erased" at runtime (i.e. the type annotations are ignored when a method is invoked), but type parameters will be available at runtime.

All types are first-class in Magpie, so it's fairly straightforward for something like an instance of a generic type to also have a reference to its type parameter: it's just another object.

Hopefully that makes at least a little sense. I can elaborate more if you want.

Exact runtime types

I'm using this term as it's defined in the CLR generics paper: it's possible to distinguish a list of integers from a list of strings at runtime.

the type annotations are ignored when a method is invoked

That certainly mustn't be the case. The runtime has to be the ultimate arbiter of type safety: if a function parameter has a type annotation, the runtime must verify that it matches the type of the actual argument value when the function is called.

You cannot have your cake and eat it, too

I don't think you can have "exact runtime types" without (some form of) proper (i.e. static) type system. How do you know the "exact" type of the empty list? Of a function?

I'm not sure I understand

I'm not sure I understand why static types are necessary for this. If you have an object representing an empty list, and it has a field referencing the type (i.e. a type object, first-class class, or whatever you want to call it: a runtime object that represents a type), don't you know enough to tell what type an empty list is without any static types?

Type or data?

A "field referencing the type" is merely an implementation detail, it doesn't explain where you get that type from. If the static type system does not fill it in, who does? If the programmer fills it in himself, without any verification (like you suggest in your other post), then I don't see how this can honestly be called a type system - "types" have been replaced by essentially random data. It also defeats the point of having an untyped language, namely that (supposedly) the programmer doesn't have to worry about types.

If the static type system

If the static type system does not fill it in, who does?

It would be provided as an argument when constructing the object, just like other arguments. So, if you wanted a list of ints, something as simple as:

var myList = List new(Int)
without any verification (like you suggest in your other post), then I don't see how this can honestly be called a type system

A fair point. It depends on your definition of "type". If you're going with the formal type theory definition (which is perfectly valid, and what I'd expect here) then yes, there's nothing in the above example that's a type.

More informally, if you consider a "type" to be some construct that gives you information about a collection of objects, then the above does. Python, for example, doesn't have types in the formal sense, but it does have first-class classes, and you can use those classes to answer questions about the capabilities of instances of that class (for instance, membership, which is the main question you ask of formal types).

But, of course, none of this happens statically (i.e. before runtime) so it isn't anywhere near as useful as a real type system. What I'm trying to get working for Magpie right now is the following:

var myList = List new[Int]

The difference between the square brackets and regular parentheses is that square brackets indicate a static argument to the method. They will be evaluated at runtime (so that type arguments are available at runtime) but also at type-check time. This means they're available to the type-checker for use in subsequent type annotations. With that, you should be able to define a generic list that statically strongly-types its methods based on the passed-in type argument.

I'm not an academic, but I think that gives you something that meets the formal definition of a type: we've got information we're using statically to tell which operations are valid and which aren't.

That's the idea at least. I'm still hacking the implementation together so maybe this will fall over on me.

It also defeats the point of having an untyped language, namely that (supposedly) the programmer doesn't have to worry about types.

You can also omit the type annotation or argument, in which case it will default to Dynamic and fall back to working like a vanilla dynamically-typed language.

You're right that it

You're right that it requires the programmers to manually annotate types. However, there are still advantages: the types can be concerned with the data's meaning rather than its representation (which is what drives many people to dynamically-typed languages), and the explicit construction of types can also trigger runtime invariants checks.

The latter seems especially useful to me: while many properties may have to be asserted by the programmer, checking that they are respected when composing types (e.g. a sorted bag of [x]) rather than erroring out in an internal routine (or never) seems like a win. In my experience, the problem with large systems isn't that we need computers to infer or check the properties of individual modules, but rather that it is hard to track and check invariants when composing multiple modules. Arguably, compile-time checks are better for that, but moving runtime checks as close to the actual composition of modules as possible also seems like a worthy goal.

I'll also note that statically typed languages may also have to manually annotate types: for instance, while a data type may only depend on a strict order on its element (e.g. sorted bag of integers), there may be multiple ways one will want to compare integers. In the end, static types can be propagated to a certain extent, and that's certainly useful, but the general case still requires manual annotation, both in a typed and an untyped setting.

Confusing terminology

I don't think you can have "exact runtime types" without (some form of) proper (i.e. static) type system.

Sorry for my unclear terminology. What I meant was that the language doesn't require being able to assign types to expressions statically, without running the program. OTOH, objects do indeed have types (tags), and as Derek said, what I'm trying to do is basically run (parts of) the type system at runtime.

How do you know the "exact" type of the empty list? Of a function?

I should have been clearer here, too. Object constructors do take type arguments, as in new List[Integer](). Furthermore, functions may have type constraints for their parameters and result - if they're not given, they default to something like C#'s dynamic.

What are the values of type arguments?

I often find it helpful to think of type parameters as real parameters. I.e. they are extra parameters passed to methods and classes (if you have them in your language) whose values range over the universe of static types. When you allocate a new object, you can think of actually binding the type arguments as "fields" in the object or metaobject. When you use a parameterized method in a first class way, you actually are binding the type arguments in a little closure for the method.

Using this analogy, whenever you apply a method you must supply its type arguments, just as you supply its value arguments. But what type argument do you supply? In a dynamically typed world, the best the compiler could do would be...Any? That doesn't seem helpful...

That's exactly how I look at

That's exactly how I look at it with Magpie: type arguments are just another kind of argument.

But what type argument do you supply?

If all types are first-class (i.e. they're available at runtime as values) then you can just pass in the object representing that type.

it's possible to distinguish

it's possible to distinguish a list of integers from a list of strings at runtime.

Yeah, Magpie will support that. The basic idea is that you can pass both static and dynamic arguments to a method:

foo[Int](123, "hi")

Here, Int is a static argument (a reference to the global variable containing the Int type object) and 123 and "hi" are dynamic. Static arguments will be evaluated at type-check type (so that the results can be used in subsequent type annotations) and at runtime. The latter means that type arguments will also be available a runtime. In the case of a method that happens to be a constructor for a generic type, the idea is that the runtime evaluation will store that type argument with the constructed object so that it can be used later. That also means we'll be able to distinguish between a list of ints and a list of strings (even if its empty).

The runtime has to be the ultimate arbiter of type safety

The runtime semantics don't have the same concept of a "type" as the type-checker. The runtime is safe (in the same way that all dynamic languages are, by checking that a method exists before invoking it), but it isn't type-safe. If a program has already statically type-checked (and the type-checking isn't busted), there's no need to check the types again at runtime.

You can, of course, go out of your way to lie to the type-system and defeat the static checking, but you do that at your own risk. I don't know if it makes a lot of sense to do redundant type-checking at runtime just for that minority case. The worst case is still just falling back to regular dynamic semantics, which aren't that bad.

Different goals

You can, of course, go out of your way to lie to the type-system and defeat the static checking, but you do that at your own risk.

For me, that's an absolute no-no. If I declare the type of a function parameter for example, I want the absolute guarantee that that parameter will always hold values of that type, and never any others.

Nonsensical

What GADTs would even mean in an untyped language is far from clear to me. The G part of GADTs relies on type parameters which simply aren't there at all in an untyped language. Consider one of the foundational GADTs,

data EQ :: * -> * -> * where Refl :: EQ a a

There is only one, atomic, value in this GADT. In an untyped language there is no way to distinguish different Refls at different types. Even if you did have a way to differentiate the Refls and identify 'a', what does that accomplish? The whole point of this GADT is to unify to type variables; there are no type variables in an untyped language. There is no way to propagate this type information.

To make this work out you could create an elaborate framework of detailed tags, the "exact runtime types" that Manuel mentions, and a system to propagate and check these tags. Of course, this is nothing more than a type system. All that is happening in this scenario is that you are designing a statically typed language and building a type checker for it. The type checker just happens to be invoked at run-time. This is a relatively unexplored area that I have brought up a few times before, but it is statically typed programming, not dynamically typed or "softly" typed.

There is a possible hybrid solution. You could do as I described above, but only incrementally solve the type constraints generated by the type checker, potentially allowing the program to continue even in the face of missing or even inconsistent type information. I suspect you could make a reasonable system like this for plain HM, but that the potential to have an inconsistent constraint store would seriously weaken the benefit of GADTs.

At any rate, it is certainly not "nothing special" unless all you meant was that the "trivial" type erasure of a type correct program using GADTs will have the same operational semantics. This is pretty clearly not what Manuel is asking for though.

.

dup

Similar Thoughts

I'm also trying to grok what Manuel was asking. Though, he mixes the words 'dynamically typed' and 'untyped', those do mean different things from where I'm sitting. Nonetheless, it is unclear to me that a "trivial" type erasure of a correct program using GADTs isn't the desired translation of a statically typed GADT program into the same with dynamic types...

untyped = dynamically type-checked (= unityped), no?

That's how I'm using these words. By untyped (and the others) I basically mean that the language generally doesn't try to assign types to expressions at compile-time/statically, but rather checks the types of values at runtime/dynamically.

No. That doesn't seem

No. That doesn't seem typical at all.

Untyped and unityped are usually the same (due to terminology borrowed from 'untyped lambda calculus') but there is also a sense in which unsafe languages might also be 'untyped' but not 'unityped'.

Technically, if a language is unityped, then you know ALL the types statically. It isn't difficult, since there is only one type. There is no purpose to type-checking a unityped language, since you know the answer in advance.

'Dynamic type' suggests that there is more than one a type, but that the type of a variable or expression might not be known before executing it. Dynamic type-checking usually occurs right at the site of an operation. For example, in JavaScript, the arithmetic code invoked by 3 * {owl:"hoot"} peeks at the types of its parameters, tries to convert the object to a number, and eventually returns NaN (rather than raising an error). JavaScript, Ruby, PHP, and MS Excel are examples of dynamically typed languages - and implicit conversions between 'types' are a regular tool in the dynamically typed toolbox. [edit: checked example in SpiderMonkey, corrected output.]

It is unclear to me what you are aiming for. You certainly want dynamic typing, but it also seems you don't want it to get to the call-site before raising the wtf exception. This suggests staged typing.

OK

For the purposes of this discussion, let's use C# as a model, but we always declare all variables and function parameters with dynamic (that's also the sense in which I called, somewhat stretching things, C# a dynamically type-checked language above). This is analogous to Lisp (if Lisp had parameterized classes and exact runtime types - i.e. the types include the type arguments for parameterized classes).

It is unclear to me what you are aiming for. You certainly want dynamic typing, but it also seems you don't want it to get to the call-site before raising the wtf exception. This suggests staged typing.

The type system should, at runtime, prevent the construction of class instances/objects that are illegal per GADT rules.

Reify GADT constructions?

The type system should, at runtime, prevent the construction of class instances/objects that are illegal per GADT rules.

Neglecting syntax issues, does this not boil down to reifying the GADT types as an actual field of the resulting object in the target language, and turning the GADT typechecking rules into rules that get applied to how the "type" field should be constructed?

The syntax problems, I think, this could be solved with Scheme macros in a way that would end up looking a lot like what you want.

[edit: Sorry - should have read Derek Elkins' post first, which makes much the same point as I do, much more clearly]

Fields...

One minor detail is that it's not objects that remember the type fields, but rather instantiations of parameterized classes. As in the CLR, parameterized classes have different instantiations at runtime. You'd have List[String], List[Integer], etc. No need for objects themselves to remembers the type arguments, they're stored in the class instantiations. But that's an implementational detail.

Yes

To make this work out you could create an elaborate framework of detailed tags, the "exact runtime types" that Manuel mentions, and a system to propagate and check these tags. Of course, this is nothing more than a type system. All that is happening in this scenario is that you are designing a statically typed language and building a type checker for it. The type checker just happens to be invoked at run-time. This is a relatively unexplored area that I have brought up a few times before, but it is statically typed programming, not dynamically typed or "softly" typed.

That's what I'm talking about. See my reply to David re possible terminological confusion wrt dynamic typing.

Do you have any pointers to these previous discussions handy?

Why not both?

Of course, this is nothing more than a type system. All that is happening in this scenario is that you are designing a statically typed language and building a type checker for it. The type checker just happens to be invoked at run-time. This is a relatively unexplored area that I have brought up a few times before, but it is statically typed programming, not dynamically typed or "softly" typed.

That's a pretty accurate description of Magpie, but I don't see why that would make it not dynamically-typed. I think the clearest description is that it's both dynamically-typed and statically-typed:

  1. It statically checks that operations are valid based on the annotated types of variables.
  2. At runtime, it makes sure a method exists before trying to dispatch it.

The former sounds like static typing to me, the latter like dynamic. Is this really a dichotomy?