A Self-Checking Type System

I'm a little hesitant to ask the big brains here to take a look at my little language Magpie, but I managed to pull something off in it that I think some of you might find interesting. (Or, at the very least, you can tell me what I did wrong.)

Magpie is a hybrid dynamic/static language. The type system it uses during type-checking is implemented in Magpie. So, during static checking, it switches between statically walking the code and dynamically evaluating type annotations and type comparisons.

I wrote a bit about it here and I'd appreciate any feedback you have on it.


A Self-Checking Type System

What properties are you looking for in a type system?

The type system itself isn't

The type system itself isn't particularly interesting. It's a basic mixture of ML and an OOP language: classes, functions, tuples, unions, and interfaces. I'm working on parametric types.

The only thing novel here, if there is anything novel, is that the type system is implemented in the same language which means it can type-check its own implementation, and that users can extend the type system without having to hack the interpreter itself. The type system is defined almost entirely at the user level.

Still more interesting than today's average, IMHO (re: tuples)

The type system itself isn't particularly interesting

AFAIK, tuples are still, in 2010, an oft-missing type-system feature in many languages, including the mainstream ones, and especially in the OO flavor (but yes, you also said Magpie is hybrid in that respect as well, being "a basic mixture of ML and an OOP language").

So, I'm glad (again) to see someone putting them upfront in the language's design, as you do with Magpie. :)

Okay, so just a few questions (note: asked independently of the "hybrid dynamic/static language" and type-checking extensibility aspects; so, let's say, first, from a "classical" static type-checking point of view, where user-defined type-checking amendments would never be possible):

1. Are you willing, a priori, to always enforce the usual covariance properties for tuple types?

That is:

1.a. (A, B, C) <: (A, B) <: (A)
and
1.b. A <: U and B <: V => (A, B) <: (U, V)

where "<:" is the sub-typing relation, if any, in some sense.

(The covariance property being essentially shown in 1.b.)

2. Will tuple types' values possibly be mutable/stateful, as class types' instances are, or are we talking about immutable, "pure" tuple values (say, as if an n-tuple type were implemented by a pure function from the domain {1, 2, ..., n}, n a natural integer, to the co-domain T1 x T2 x ... x Tn)?

If so (i.e., if you'd want them mutable), what is your rationale?

3. Does (or will) Magpie provide tuple field access by name? By position? By pattern matching?

AFAIC, I'd want (1); I wouldn't want mutability in (2); and pattern matching is, I think, the most powerful/nicest-to-have option for (3), but also the one needing the most work for you to implement. So, I guess either of the first two options might just do it for a start, there.

(Note I don't want mutability in (2) because I basically agree that this opinion (re: tuple types) makes sense if one wants practical tuple types in a practical language that also tries to keep things as orthogonal as possible. If you allow mutable tuple values, then I believe you introduce through them an overlap between their semantics and that of class/object types, when the latter are stateful too. And breaking orthogonality in the language often brings a higher probability of accidental complexity in programs' run-time semantics...)
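For what it's worth, the two properties in (1) amount to a purely structural check. Here is a minimal Python sketch (all names hypothetical, with Python classes standing in for Magpie types):

```python
def tuple_subtype(sub, sup):
    """Structural tuple subtyping: width (1.a) plus covariant depth (1.b)."""
    # Width (1.a): (A, B, C) <: (A, B) -- extra trailing fields are allowed.
    if len(sub) < len(sup):
        return False
    # Depth / covariance (1.b): each corresponding field must be a subtype.
    return all(issubclass(s, t) for s, t in zip(sub, sup))

class Animal: pass
class Cat(Animal): pass

assert tuple_subtype((Cat, Cat, Cat), (Animal, Animal))  # 1.a and 1.b together
assert not tuple_subtype((Animal,), (Cat, Cat))          # too few fields
assert not tuple_subtype((Animal, Animal), (Cat, Cat))   # wrong direction
```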

Thanks!

[EDIT]
Note I'm asking (1) above because when it's time for you (if you contemplate it) to embrace user-definable type-checking amendments over the tuple types' "nature", then you may have (well, note I'm not sure) to reify that sub-typing relation "( , , ,...) <: ( , ,...)" somewhere for them as well (as for the other natures: class types, functions/lambdas, ...) in your type-checker dispatch mechanism.

This, to take into account your choice of a duck-typing approach vs. "nominal sub-typing" (i.e., driven by name hierarchies, as most often encountered in "classical" OO), with regard to the tuple types' element types; following the same path of ideas you had with those canAssignFrom/getMemberType annotations for class types w.r.t. their relation with interfaces.

Now, I guess(?) that if there is no such notion of a sub-typing relation of some sort for tuple types (a hypothetical relation chosen to be meaningful for the kind of operators you'd want to allow over tuple values), then there's no problem for tuple types w.r.t. their element types, the latter being of any type nature in Magpie's built-in/base type system, even when combined with user-definable static type-checking over them... so just ignore my question (1) above (well, unless someone sees another similar problem!)

& HTH

1.a. (A, B, C) <: (A, B) <:

1.a. (A, B, C) <: (A, B) <: (A)

It doesn't do that right now, but maybe it should. I'm iffy on whether I want to allow that. Doing so won't cause any problems, but it could hide some things that might be user errors (accidentally passing too many parameters to a function).

A <: U and B <: V => (A, B) <: (U, V)

Yup.

Does (or will) Magpie provide tuple field access by name ? By position ? By pattern matching ?

It does it by named field right now (fields are named "_0", "_1", etc.). Pattern matching isn't implemented yet but will be at some point, I hope. If not full pattern matching, it should at least have multiple assignment, like:

var x, y = 1, 2

But that isn't implemented yet.

Will tuple types' values be possibly mutable/with state as for class types' instances, or are we talking about immutable, "pure" tuple types' values

Immutable. They have to be, or covariance would be unsafe. Mutability would also make them generally less useful for lots of other stuff like hash keys. If you want mutability, that's what objects are for. (Of course, strictly speaking, tuples are objects too.)

you may have (well, note I'm not sure) to reify that sub-typing relation "( , , ,...)

That should work correctly as it currently is. Like other type-checking operations, type checking a tuple just recurses into the same canAssignFrom on the fields, so a tuple of some new user-defined type should work fine. The code in question is just:

def Tuple canAssignFrom(other Type -> Bool)
    let otherTuple = as[Tuple](other) then
        // With the same number of fields.
        if count != otherTuple count then return false

        // And the fields must be assignable.
        for field = this
        for otherField = otherTuple do
            let otherTypeField = as[Type](otherField) then
                if field canAssignFrom(otherTypeField) not then return false
            end
        end

        true
    else false
end

You may notice that changing the first line where it checks the number of fields to:

// With enough fields.
if count > otherTuple count then return false

would give you the other subtype relation you first mentioned.
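That dispatch can be mimicked in a few lines of Python (a rough sketch, all names hypothetical): each type object carries its own can_assign_from, so checking a tuple just recurses into its fields, and any user-defined Type subclass participates automatically:

```python
class Type:
    """Base type object; default assignability is identity."""
    def can_assign_from(self, other):
        return self is other

class TupleType(Type):
    def __init__(self, *fields):
        self.fields = fields

    def can_assign_from(self, other):
        if not isinstance(other, TupleType):
            return False
        # With the same number of fields...
        if len(self.fields) != len(other.fields):
            return False
        # ...and each field recurses into its own can_assign_from,
        # so user-defined Type subclasses plug in for free.
        return all(f.can_assign_from(o)
                   for f, o in zip(self.fields, other.fields))

Int, Str = Type(), Type()
assert TupleType(Int, Str).can_assign_from(TupleType(Int, Str))
assert not TupleType(Int, Str).can_assign_from(TupleType(Int))
```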

Thanks for the

Thanks for the clarification/complement info!

Well, sounds pretty nice to me, then, on the tuple types' topic, so far. :)

If you want mutability,

If you want mutability, that's what objects are for

(Oh, and that was part of my point, when I expressed the same preference about it, sure. Thanks again for the sound reply :)

Point for effort

Going down another rabbit hole myself I enjoy seeing others pursuing "pureness" in language design.

Your pattern rhymes well with an insight I've had during my own language design - namely, that types are conventions. I'm heading down this rabbit hole (not for static checking) with the idea that type conventions should be tested like unit tests, and hence allow the user great control over the definition of types. Your approach is interesting and I might end up with something like it along the way.

Would like to see Magpie presented in a somewhat more accessible manner (take the CoffeeScript site as an example).

Docs

Agreed, more documentation is needed. I'm working on it, but I'm trying to nail down the basic object model first. A bunch of things are still in flux so Magpie isn't quite at the point where it's ready for other people to take a look at the language as a whole.

+1 from me, too, as I do

+1 from me, too, as I do find your ideas very interesting, indeed.

After reading your post about Magpie's type system bootstrap, it looks to me that, at least from a design point of view, the language is already at the stage where it doesn't miss much (if anything) to be practical and useful. Anyway, although I, too, am following other "rabbit holes", I look forward to reading more about Magpie's approach, and the details of its (dynamic, user-level) type-checking of phrases written in the language itself (which I believe is novel as well, btw)... and to possibly seeing how I can relate it as a useful complement to my own experiments.

I still have to spend some more time thinking about it, but thank you for sharing your story already!

Not there yet...

the language is already at the stage where it doesn't miss much (if anything) to be practical and useful.

Well, provided all you need to do is print to stdout, yes it's useful. :)

I have a lot of work left to do in terms of interop with existing Java code so that you can actually write programs that do real work, and fleshing out the object system so that things like constructors and inheritance work correctly. But it's getting there.

Once I've got it into a roughly usable language, I'll try to write a decent guide to using it, and hopefully it will be more approachable then.

Don't play nice with others.

Why do you immediately want to interoperate with Java code? This is a potentially fatal mistake that Scala made early on. You'll be stuck with supporting interoperability with Java (and running on the JVM) forever. And just for some hashtables, trees, and lists? Not worth it!

On the shoulders of giants

I'm fairly certain that Scala's design does not relegate it strictly to the JVM. There was (when I looked) a .NET version, albeit one that was ill-maintained.

Regardless, having a large existing standard library is an important adoption criterion for anything approaching real-world use. No need to make more work for yourself if you don't have to.

That said, it's not entirely clear what is available to the methods you implement to define a Magpie type. Is it purely untyped/dynamic code? If that's the case, I have to question the utility of static typing when user code can cause errors in the type checker.

Is it purely untyped/dynamic

Is it purely untyped/dynamic code?

That doesn't seem to be the case. Static type checking as we generally present it (or have it presented to us and discussed) seems to be Magpie's "nominal case", as I understood it. But one of the language's innovations appears to be that checking can depend upon user code (into which the type checker's implementation dispatches), in specific syntactic contexts/constructs for type expressions, before the programs' actual run time.

Well, I hope I didn't misunderstand what I think is the language creator's nice idea, one that was/is worth pursuing(?)

From what I understood

From what I understood, the type checker hooks into two method implementations: getMemberType and canAssignFrom. The type checker will actually send the implementations of these methods (user code) into an (untyped?) Magpie interpreter to get the result.

In your post, you refer to them as 'in specific syntactical contexts/constructs for type expressions', which to me implies some limitation on what code may be used within these two method definitions. From the OP, I did not get the impression that there are any limitations on those implementations (which led to my last post).

That design decision will have a lot of practical implications, and I'm not sure that it has been spelled out.

I understood these methods

I understood these methods as being just a specific kind of type "annotation" (in Magpie's parlance, anyway), purposely introduced (or "ad hoc", if one wishes) for the specific type-checking contexts of conformance in assignments (canAssignFrom) and the resolution of type members' ... well, membership (getMemberType). Both, of course, are relevant to type-checking tasks at call sites in the rest of the user code, later on.

But the actual explanation of this Magpie feature seemed to me to be in the discussion continuing in the paragraph that follows in munificent's blog post, the one titled "A Side-Effect: One Less Language". That is where I understood it as being more syntax-and-user-code dependent than being just a "one-spot language-adhocery-to-fit-them-all" in the language's type-checking; quoting:

With all types being actual live Magpie objects, that raises an odd question: “What is a type annotation?”[...] But in Magpie, even type expressions need to yield Magpie objects, so there is no meaningful distinction between expressions and type annotations. Type annotations are just regular Magpie expressions, evaluated dynamically during type-checking. If you have a function like:

parseNumber(value String -> Int | Nothing)
// ...
end

That String is just a regular Magpie expression, as is Int | Nothing. In the case of the latter, | is just an operator on the Class class. This means that not only is Magpie’s type system extensible, even its type annotations are. If you wanted to, you could define new operators or functions and use them in type annotations[...]

So Magpie seems to try to move some of the type-checking decision rules out of the language and its supporting base library themselves, towards the programmer's own expressions (unlike Java, C#, ...).
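The "| is just an operator on the Class class" idea can be sketched in a few lines of Python (all names hypothetical): type objects are ordinary values, so an annotation like "Int | Nothing" is just an expression that evaluates, at check time, to another type object:

```python
class MagpieClass:
    """Stand-in for Magpie's first-class type objects."""
    def __init__(self, name):
        self.name = name

    def __or__(self, other):
        # "|" is just an operator on the class, building a union type.
        return MagpieClass(self.name + " | " + other.name)

Int = MagpieClass("Int")
Nothing = MagpieClass("Nothing")

# A type annotation is an ordinary expression evaluated during
# type-checking; its result is itself a live type object.
annotation = eval("Int | Nothing")
assert annotation.name == "Int | Nothing"
```

Amusingly, Python itself later adopted a similar trick: since PEP 604, `int | None` is a runtime expression over type objects.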

[Edit] As I said upfront, I think this is a Magpie idea worth developing. I can even extrapolate it (or is it interpolating?) and think of at least one possible use case, say, in C#'s past, had it had Magpie's approach to type-checking right from v1:

For instance, I'm referring to when they allowed "?" as part of static type expressions, e.g. "T? ...", and gave those the specific, actual meaning of "Nullable<T> ..." during the C# (2.0) compiler's type-checking process. The bottom line is, obviously, that the language's definition had to evolve (hence the v2) and the compiler had to be upgraded, ...

Granted, they did profit from another orthogonal CLR change (generics) which was seen as needed anyway... But still: the language and compiler had to evolve if only to include this, even if generics had already been available. (Note this isn't about the relevance/quality of the idea of introducing the nullable types' "T?" syntactic sugar for "Nullable<T>" from C# v1 to v2, though; that, I guess, pertains to another discussion.)

Still, the questions come:

Still, the questions come:

In the case of the latter, | is just an operator on the Class class[...]

How does the Type Checker run that code when the code itself needs to be Type Checked?

Type annotations are just regular Magpie expressions, evaluated dynamically during type-checking.

This implies that they're not type-checked at all, but it's not completely explicit.

If it's not untyped, does the type checker cascade down the rabbit hole? What happens when that code encounters an error? What constraints are put on that code to avoid some of these problems and/or to allow it to run in isolation?

The blog post is short on these (imo critical) details.

The blog post is short on

The blog post is short on these (imo critical) details

Agreed. munificent replied to this type of remark already, though ("I have a lot of work left[...]and fleshing out the object system so that things like constructors and inheritance work correctly[...]").

Anyway, this is where I'll be glad to learn more about the current status of the language's design per se (otherwise pretty sound and wise to me, aside from this kind of novelty, which does need more thinking), and then w.r.t. its available implementation.

What happens when that code

What happens when that code encounters an error?

Exceptions aren't in the language yet, so there's no happy solution for that. When they are, you'll likely get a type-check error saying that an error occurred during type-checking.

What constraints are put on that code to avoid some of these problems and/or to allow it to run in isolation?

Code representing types or type annotations is evaluated in a slightly different context from regular code, but there is overlap in that they share the same set of global variables. (One way to look at Magpie is that it uses the global variable scope as the type-checker's symbol table.) This means you could indeed do perverse stuff like muck with global state during type checking. I think the simplest solution is: don't do that.

The blog post is short on these (imo critical) details.

A fair point. I was writing for a general audience which is much less technical than the one here. While I definitely want the language to do the right thing, I have less motivation to fully formally specify things than an academic would. I'm not trying to publish papers on this, just create a usable language. (Ideally I'd do both, but my time is limited.)

Thanks

Thanks for the clarification!

I'd like to echo Y-Sharp's interest in seeing things move along, since it might seem as though I don't. Really, with the off-the-wall sort of things I'm dabbling with I shouldn't even think of throwing stones.

That said, the error during type-checking seems dangerous to me (while wearing the 'commercial bizdev' hat). It seems like a short step for someone to want to do something 'clever' with the type implementation like have it defined via DB schema, or XML, or some web service somewhere. Having your program not compile nicely because a database is unreachable (and having to debug that) seems like the sort of thing that makes me worry that the downsides are worse than the upsides.

I have less motivation to fully formally specify things than an academic would.

Again, not a critique on your work; in fact I for one appreciate the less academic tone. And I can certainly appreciate the difficulty in trying to translate brain to blog. Just trying to keep the discussion from roaming into hypotheticals.

That said, the error during

That said, the error during type-checking seems dangerous to me (while wearing the 'commercial bizdev' hat). It seems like a short step for someone to want to do something 'clever' with the type implementation like have it defined via DB schema, or XML, or some web service somewhere.

Yeah, I can see it being risky, but to me it's no more risky (and hopefully safer) than any dynamic language in wide use right now. Sure, you could possibly get errors during type-checking, but is that any less safe than no type checking at all?

My expectation is that very few users would ever need to touch Magpie code at this level anyway. It's there as an implementation detail, in much the same way that printf() happens to be implemented in C, but not something that typical users would need to modify.

Even if you did want things like building types to reflect DB schemata, you could do that without touching the core type system itself: just imperatively build a class that reflects the schema. That's what I consider a more normal use case.
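In a language with first-class types, "imperatively build a class" is a one-liner. Here's a rough Python analogue of what that use case might look like (the schema and all names are hypothetical):

```python
# Hypothetical schema, as might be read from a live database at startup.
schema = {"users": ["id", "name", "email"]}

def class_from_table(table, columns):
    # Imperatively build a class whose attributes mirror the table's
    # columns, without touching the type system's own machinery.
    return type(table.capitalize(), (), {col: None for col in columns})

Users = class_from_table("users", schema["users"])
assert Users.__name__ == "Users" and hasattr(Users, "email")
```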

This is also my intuition

This is also my intuition about how Magpie's type-checking extensibility feature should/will be used:

My expectation is that very few users would ever need to touch Magpie code at this level anyway

Now, while in some cases (not all, hopefully) it certainly won't be trivial for people to make sure that their type-checking extensions terminate in a practical amount of time and/or convey the exact meaning they want to achieve in the programs using them, the very nice thing I hope to see happen is:

when they can, they'll be able to coin those extensions "once and for all" in Magpie libraries dedicated to this aspect, allowing very interesting opportunities for incremental evolution of the language's type system itself (in specific extensions' usage contexts), and of its expressive power along with it, through the mere use of such libraries.

I can't really speculate further than that (which is maybe too much already), but I suspect that this may eventually lead you/us (after Magpie implementations have been around and in use for some time) to the need for more formal study of how such extensions are designed, and of the patterns to follow to put them to work best in programs (e.g., in the safest way possible, when that is one's primary concern).

But of course I'm among those with the opinion that our (temporary) inability to formally prove important properties (or lack thereof) about innovations that came out of our intuition should not, a priori, be a reason to prevent us from starting to implement them — and then, to refine them, as we stumble upon practical or theoretical difficulties for the applications/use cases we target.

Again, I believe you have found a nice "thinking-out-of-the-box project" to continue with, there, in Magpie's design and implementation. :)

But of course I'm among

But of course I'm among those with the opinion that our (temporary) inability to formally prove important properties (or lack thereof) about innovations that came out of our intuition should not, a priori, be a reason to prevent us from starting to implement them

I agree. I think theory and proof is definitely important and I'm glad that lots of other people do it. But for me, that isn't as important a goal as trying to build a good language that I want to use and enjoying myself in the process.

That said, the error during

That said, the error during type-checking seems dangerous to me (while wearing the 'commercial bizdev' hat). It seems like a short step for someone to want to do something 'clever' with the type implementation like have it defined via DB schema, or XML, or some web service somewhere.

Note that this is what they are doing with Type Providers in F#. I think the upsides far outweigh the downsides.

Yup

The type checker will actually send the implementations of these methods (user code) into an (untyped?) Magpie interpreter to get the result.

That's correct.

Also, I surely agree with

Also, I surely agree with this:

I'm fairly certain that Scala's design does not relegate it strictly to the JVM[...] Regardless, having a large existing standard library is an important adoption criterion for anything approaching real world use. No need to make more work for yourself if you don't have to.

as I guess that, ideally, a new PL's design that tries to innovate both in its concepts and in the constructs used to put them to work should not a priori exclude either of the two mainstream (and competing) VMs; well, unless the creator has strong reasons/opinions to do so...

That would be, I suppose, a bit sad, in the sense that the language could miss interesting opportunities for use/applications in the context of the respective tooling support (sometimes disjoint re: user-friendliness, implementation availability, standards, etc.) found on top of those VMs.

But at the same time I can relate to Ben's point too, at least to the extent that too tight a coupling between the language's design and its expectations w.r.t. a specific underlying or sidekick VM is likely, precisely, to make efficient implementations on other VMs more difficult.

Since I suspect that the mid-/long-term evolution of VM strength (market-wise) is an even more difficult bet to make, together with the impression I have that Magpie's design is heading towards a language not meant (only) for academics who'd have a strong preference for one VM... then yes, I'd keep the language's relation with the JVM non-exclusive of .NET's (and others'), well, as much as possible/feasible within the roadmap.

That said, it's not entirely

That said, it's not entirely clear what is available to the methods you implement to define a Magpie type.

Everything. Magpie is basically a dynamic/untyped core with a layer of optional static typing on top. So the parts of the type system that are implemented in Magpie use that base dynamic behavior. This means that even though the code defining the types is statically-checked, some of that static checking will likely occur *after* that code is actually called and used.

This also means you can indeed do dumb stuff like have a non-terminating type-checker, or even an impure one. On the other hand, that may let you do really interesting stuff like automatically generating types on startup by reading the schema from a live SQL DB.

I have to question the utility of static typing when user code can cause errors in the type checker.

That depends on your definition of "user code". Yes, the Magpie code in the base lib that defines the types can (and currently does) have bugs in it. Those bugs are part of the type-checker, so the type checker is buggy. It would be equally buggy if I moved that code over to Java.

It's also true that some other user code could accidentally break the type system by doing something screwy like replacing canAssignFrom() in Interface. I'd hope that most users are smart enough to not do that. (Conversely, smart users can use that to their advantage to extend or change the type system. Don't you wish you could fix the covariant array "bug" in Java?)

My theory is that people can get by in dynamic languages without any type-checking at all, so even a "breakable" system can still add value. My philosophy is more "we're all adults here" than "the language won't let you hurt yourself", though I'm always keen to minimize sharp corners.

Agreed

Understood. I'm not personally married to the JVM. Me saying "interop with Java code" really just means "be able to do anything other than just print". Right now, you can't write programs in Magpie that do real work because there's no I/O facilities or other "touch the outside world" support beyond print. I need to do *something* here obviously, but the degree to which it looks Magpie-centric or if it just piggybacks the JVM is up in the air.

It's not really about treating Java classes as first-class citizens, because Magpie's type system is pretty different. It's just about being able to have Magpie talk to the outside world in *some* fashion.

Well, provided all you need

Well, provided all you need to do is print to stdout, yes it's useful.

Well, I was alluding more to: Magpie's type system fundamentals, the way you addressed the expression problem, your choice to stick with static type checking, your choice to follow the uniform access principle (which I find very important, and whose rationale Meyer explained at great length while designing Eiffel), your type masquerades, ... I like all of this in Magpie very much. A number of other languages, early in their design, did allow more than "printing to stdout", but too bad they missed the point of picking an actual "right core" (one that does more than just follow the fashion of the time, for instance... as has happened too often), unlike what you've been doing so far, imho.

[Edit] Otherwise, I may have slightly different "tastes" in a couple of other areas, like some features of the concrete syntax Magpie seems to be heading towards, but I regard them as less important to future-proof than the points above. (Well, yes: language design is all about future-proofing language concepts and constructs, even before future-proofing the programs written in the language, isn't it?) Especially since I was glad to see these choices also come with a rationale that I agree with.

If you don't mind...

I have a lot of work left to do in terms of interop with existing Java code so that you can actually write programs that do real work, and fleshing out the object system so that things like constructors and inheritance work correctly. But it's getting there.

Okay, and if you don't mind contemplating .NET's CLR as well, I'd love to help, just so you know. :)

By the way, I know, kind of

By the way, I know this is kind of random and quite unimportant at this stage, but if only for the record (and it doesn't really call for replies, even if I put a question mark):

Magpie is a hybrid dynamic/static language

Then what about coining this as "stynamic" type-checking, in the case of Magpie? (Hmm... Ok, maybe just sounds silly. It's just jargon is funny to play with :)

Type checking quantification

I assume based on your description that you're not type checking generics until all type parameters are around. So you won't be able to express higher rank types, and will have to type check generics each time they get instantiated (do you plan on caching instantiations?), sort of like C++ templates. Is that right?

Good questions

I'm hoping that Magpie's dynamism will let me simplify generics at least a little. Because types are already first-class in the language, you don't need generics in some places where you would need them in C++: you can just pass a type in as a regular value parameter. The only time generics come into play is when you want a type parameter to affect type annotations.

I assume based on your description that you're not type checking generics until all type parameters are around.

That's currently the case, yes. Magpie doesn't have generic classes or methods just yet, only generic functions. Right now, the type-checker checks the generic function body after it gets instantiated with a concrete type.

I don't mind that for efficiency's sake (it can always be optimized). What I don't like about that is that (like C++) an uninstantiated generic will never be checked. So you can have hidden bugs.

My current plan to address that is to allow type parameter constraints as in C# (similar to C++ concepts, I believe). Then I can check a generic function outside of instantiation by just temporarily instantiating it with the constraint types.

An actual instantiation then won't need to recheck the generic function's body: it just needs to test that the type parameters meet the constraints.
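That plan (check the body once against the constraint, then only re-test the subtype relation at each instantiation) can be sketched in Python; all names here are hypothetical stand-ins:

```python
class Comparable:
    """Hypothetical constraint type: guarantees a `compare` member."""
    def compare(self, other):
        raise NotImplementedError

def check_generic_body(body_check, constraint):
    # Check the generic's body once, with the constraint type itself
    # temporarily standing in for the type parameter.
    return body_check(constraint)

def check_instantiation(type_arg, constraint):
    # An actual instantiation only re-tests the usual subtype relation;
    # the body never needs to be re-checked.
    return issubclass(type_arg, constraint)

# A body that only uses members the constraint guarantees passes:
assert check_generic_body(lambda T: hasattr(T, "compare"), Comparable)

class Version(Comparable):
    def compare(self, other):
        return 0

assert check_instantiation(Version, Comparable)
assert not check_instantiation(int, Comparable)
```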

So you won't be able to express higher rank types

Good question. Magpie's type system is weird and open-ended enough that you may be able to sneak something like that (or possibly even dependent types?) in there, but I haven't looked into it. Most of my background is C# and C++, so I don't know as much about more complex type system features as I'd like. "Learn Haskell" has been on my to-do list for a while.

What I don't like about that

What I don't like about that is that (like C++) an uninstantiated generic will never be checked.

Also it means you lose modular type checking - you need the whole program to type check anything.

I agree that there are ways to fix this. Your setup is actually very similar to the type checking protocol I'm working on. One thing I do that you could consider, but might not care about, is to have the type annotations associated with expressions in a powerful logic (like Coq), so that the custom type rules can actually be proved sound.

Anyway, it's interesting. Good luck!

Just make type parameters

Just make type parameters into types and define type rules for them, and then you can typecheck bodies separately, by "instantiating" the type parameters to themselves.

If I understand you

If I understand you correctly, that's the current plan. You'll be able to specify a constraint, which is just a type, for any given type parameter. Then you can check the body by "instantiating" the parameter using the constraint itself as the type argument.

Then, when a generic is actually instantiated, we just need to check that the type argument matches the constraint (which is just the usual subtype relation). As far as I understand it, that's how C# works. I don't know if that's the best system, but it's one I understand, so I'm planning to start from there.

In fact, now that I think about it, that's exactly how checking regular function arguments works, and is indeed critical there for dodging the need for whole-program analysis. If I'm clever, I should even be able to reuse the code for both of these cases...

Yes, to have studied/used it

Yes, to have studied/used it several times (for different purposes, in different use cases), I can confirm you that this is (at least, the essential of) C# v2.0 (and above)'s compiler type-checking scheme idea, regarding constrained generics:

As far as I understand it, that's how C# works

From my experience, though, it's also often useful to remember, and again in the case of C# v2.0+ on top of .NET's CLR 2.0, than this isn't a language feature that you can consider only/purely at the level of the language's design decisions which had to be made.

What I mean is:

well, it works pretty fine for C# because they have, also, let's say, "cheated a bit" in some sense — and I suppose, was done (maybe?) if only to ease C# 2.0's implementation, but more likely so that other .NET v2+ languages can make profit of it just the same — by conveniently reifying formal generic type parameters in .NET v2 and provided introspection (reflection) support for them, & at the BCL level, thus, for generic types instanciation (and related run-time querying over types too, to some extent).

Note that it doesn't mean that all aspects of C# generics have been (or had to be) reified in the .NET v2+, but the subtyping relation conformance constraint that some formal generic type parameters can bear (as an option, since unconstrained genericity is also possible) is definitely so, if only for one.

(This is still today, I believe, one (if not "the") main big difference between C#'s and Java's support for generics — would/could all other things in both phases be considered "equal" otherwise, outside of type introspection: during compile-time as much as during run-time.)

& HTH

Yes. I find it helpful to

Yes. I find it helpful to think of type parameters as "real" parameters--i.e. they are extra arguments to the function or method, or are extra fields in the class. In fact, from what I read of your design, since types are first class, you could easily use this as an implementation strategy.

Given this view you can think of polymorphic specialization as a form of value specialization.

I agree with this advice

I agree with this advice,

In fact, from what I read of your design, since types are first class, you could easily use this as an implementation strategy

which I think is a wise one, precisely in order to avoid that kind of problem or "unwelcome tediousness" that Magpie implementations coming after the first one, might encounter:

e.g., if one relies on a possibly too strong adherence to some specifics of a VM's underlying run-time types, w.r.t to their representation and semantic found in that one (first chosen) VM only, be it the JVM or .NET, doesn't matter.

And there, generics are the typical example of a strong, essential design choices difference between the JVM and .NET: let's say .NET would be picked up first. Then, trying to make much (too much) profit of the "built-in" BCL support for generics type arguments in the CLR v2.0 (though my reminder about it above could appear seductive, if one has the idea to take some "shortcuts" thanks to it) would certainly require a significant refactoring when the JVM is contemplated in its turn.

Also, anyway, Magpie's design seems to be heading to give a specific interpretation of what the static type-checking of generic constructs consists in, and when/how it will be triggered: hence another reason, IMHO, to follow the kind of approach suggested by Ben, and not rely too much (if at all) on what whatever VM "says" on the topic.

Agreed. Magpie is

Agreed. Magpie is implemented purely as an interpreter right now hoisted well above the JVM. Magpie's concepts of types and generics aren't at all tied to how the JVM thinks of them.

Reified generics won't happen on the JVM

And there, generics are the typical example of a strong, essential design choices difference between the JVM and .NET: let's say .NET would be picked up first. Then, trying to make much (too much) profit of the "built-in" BCL support for generics type arguments in the CLR v2.0 (though my reminder about it above could appear seductive, if one has the idea to take some "shortcuts" thanks to it) would certainly require a significant refactoring when the JVM is contemplated in its turn.

Sadly I think the choice to implement Java generics with erasure has precluded reified generics in the JVM by setting up a constraint system for backward compatibility that is unsolvable. No one that I am aware of in the JVM implementation community is willing to step up to the plate to bite that one off, and the entrenched bureaucracy would reject it anyway. Those in charge of steering the Java language don't see the potential language-level benefits as worth much, and the dynamic languages crowd don't want and don't need VM-level generics.

So it's up to you! :(

You're going to face the monomorphization problem at some point, whether you target the JVM or any other platform that doesn't support type parameters (which of course includes "real" machines like x86 and PPC). If you have no primitive types in your language, then the problem is considerably easier, because you can use a uniform representation for all values. If you have non-uniform representation of values there is no escape--it is going to require specialization, at least to the representation level.

Those in charge of steering

Those in charge of steering the Java language don't see the potential language-level benefits as worth much

Which I find odd considering the hundreds of security bugs found in JVM bytecode over the years, particularly as regards "full abstraction" failures, where the CLR has had comparatively few.

since types are first class,

since types are first class, you could easily use this as an implementation strategy.

That's exactly right. When I first started working on it, I thought I could dodge the need for explicit generics entirely: just pass in a type as a value argument.

It turns out they are needed for one thing: to make the type argument available for later type annotations. Without generics, I could make a List class whose constructor took an argument that was a type so that it knew its element type. But what I couldn't do is have the methods on that list then use that element type in their annotations.

So generics in Magpie really just boil down to regular functions at runtime. The only difference is that at type-check time, they are evaluated instead of just checked, and the arguments get bound to names in a scope that's visible to the type annotations.

How do you know which

How do you know which arguments to evaluate at type checking time? Is there a syntactic distinction, like foo[T](a,b)?

Exactly right. Parentheses

Exactly right. Parentheses are regular function application. Square brackets are "static" (check-time) evaluated.

So generics in Magpie really

So generics in Magpie really just boil down to regular functions at runtime. The only difference is that at type-check time, they are evaluated instead of just checked, and the arguments get bound to names in a scope that's visible to the type annotations.

Interesting. So type checking is a form of partial evaluation in your language. In this case, static and dynamic bindings are distinguished syntactically, ie. [] application instead of ().

I wonder if there might be some sort of inference algorithm that could distinguish static/dynamic without the syntactic distinction. Still, exposing static evaluation to the user has its own benefits. Is this static evaluation fully general, or specific only to types?

I wonder if there might be

I wonder if there might be some sort of inference algorithm that could distinguish static/dynamic without the syntactic distinction.

You might be able to look at the type annotations that occur within a function body. If any of them reference one of the function's formal parameters, you could infer that that function needs to be statically evaluated.

But I kind of like having a distinct syntax for that for comprehensibility.

Is this static evaluation fully general, or specific only to types?

Fully general. Magpie doesn't actually even distinguish between types and other kinds of values since types are first-class. You could use an expression that evaluates to some other type as the argument to a statically evaluated function. In practice, I'm not sure how useful that will be (maybe you could approximate something like dependent types with it?), but there's no cost for that generality.

Types are just regular values, so the grammar for type annotations is just "regular Magpie expression" and semantics for evaluating a type expression are just the regular evaluation semantics.

Fully general. Magpie

Fully general. Magpie doesn't actually even distinguish between types and other kinds of values since types are first-class.

So my follow-up question then is, why not generalize this to arbitrary staging ala MetaOCaml? Looks like you already have quote, run and splice, but you've limited it to two stages: compile-time and runtime.

Good question. I may

Good question. I may eventually end up doing that. As a programmer, any time I see a fixed number of something, my inclination is to say, "Can I make it support any number?" and two "stages" of compilation falls into that.

I'll have to do some more research and thinking to see if that makes sense. I like increasing generality, but (unlike a lot of other programmers), I'm willing to give some up if I get a lot of usability in return. If two stages really is the magic number that almost all users will need, by sticking to that, I may be able to make the language easier to use. But I'm not sure, I'll have to keep plugging away at it and see what happens.

If a language feature is too

If a language feature is too hard to write, or too hard to explain, it is probably too hard to use well and nightmarish to debug. Language design is not a competition in conciseness - a language feature has to prove its worth in being useful in all parts of a programs lifecycle.

Complicated type systems are a barrier to entry, and getting useful error messages from wrongly typed programs is still a challenge. Getting useful errors from user defined type checks will be fun too.

I would advocate going for the simplest thing that works that gives you the semantics you require, regardless of how terribly slow it will be. Write it as a purely dynamic language and only check types at function invocation/exit.

Correct then fast.

Once you have a system working, then you can start to optimize it by incorporating the static checking you require. Good luck with a turing complete type system though, it won't be easy. You could start by partially evaluating your program with known static data, or by defining a meta-stage programming language to make distinct which parts run at compile time and which at run time. Either one will make your compiler insane by several orders of magnitude.

Aside:

As Hoare said - you've got two options as a language designer - consolidation or invention. If you're going to invent a language feature, build it off an existing language. If you're looking to build a new language, built it out of existing features. Don't despair though if you are confined to the latter. You would be surprised what features and mechanics have been implemented and described.

For example, you may like the erlang approach: A dynamically typed language with optional static checking with a tool. Like in strongtalk, the type signatures are ignored by the compiler. Strongtalk decided that it could make dynamic dispatch fast enough anyway, rather than embedding a type system within a language.

I think Gilad Bracha has been an advocate of pluggable types -- type systems and annotations do not change run time semantics of programs (a bit like not having side effects in asserts), but allow simpler checks. The key idea is that you can plug any formal verification onto the language.

On duck typing: I think Wadler has done some work on dynamic casts in static typing - his paper 'Threesomes, With and Without Blame' covers this: http://docs.google.com/viewer?url=ecee.colorado.edu/~siek/popl10.pdf

If a language feature is too

If a language feature is too hard to write, or too hard to explain, it is probably too hard to use well and nightmarish to debug.

I agree that simplicity is important, but I think good API design lets you decouple complexity of implementation from complexity of use. Good hashing algorithms are quite hard to write and explain, yet a hashtable is easy to use.

My personal metric for a language feature is how much code it takes to implement. Understanding a feature requires a user to mentally simulate it, so if the code is complex, they're mental effort must be too. If the code seems straightforward, I figure the feature is comprehensible.

I would advocate going for the simplest thing that works that gives you the semantics you require, regardless of how terribly slow it will be. Write it as a purely dynamic language and only check types at function invocation/exit.

That's the current plan. (And Magpie is indeed hilariously slow right now.) But my one of my most important required semantics is static checking. If Magpie is just another dynamic language that does type tests when you call a function (which kind of defeats the purpose to me) then I wouldn't have a lot of motivation to create it. It's the fact that you can imperatively modify classes but then still get type-checking errors before main() is ever called that makes the language interesting to me.

Complicated type systems are a barrier to entry, and getting useful error messages from wrongly typed programs is still a challenge. Getting useful errors from user defined type checks will be fun too.

Agreed, but there's a distinction between the type system of the language and how the type system is implemented. The implementation of the type system in Magpie is a bit... odd, granted. But the type system actually implemented on top of that is pretty simple: just classes, interfaces, tuples, arrays, and functions.

Assuming Magpie ever has actual users, I don't expect the majority of them to ever venture beyond that, any more than the average C++ user modifies the STL. But I consider the fact that that power is there to be a bonus for the minority of users who can wield it effectively.

Either way, I didn't actually set out to create a weird self-hosted type system. All I wanted was a dynamic language with a static checking pass, and this was the simplest way I could figure out to implement it.

If you're looking to build a new language, built it out of existing features. Don't despair though if you are confined to the latter.

That's my intent with Magpie, and even the motivation behind the name: take the shiny features I like in other languages and (hopefully!) integrate them together well. The challenge is that two key features I want (dynamic open classes and static type-checking) are pretty strange bedfellows. The interaction between the two ends up leading to more novelty than I'd like. If I'm lucky, I'll get them to play nice together.

I think Gilad Bracha has been an advocate of pluggable types

Exactly right. His advocacy of optional types is a big inspiration for me and is how Magpie works right now (although I'm not sure if it will always work that way).

This is what triggered my

This is what triggered my interest,

It's the fact that you can imperatively modify classes but then still get type-checking errors before main() is ever called that makes the language interesting to me

yes, AFAIC.

Strongtalk and Erlang...

* I didn't actually set out to create a weird self-hosted type system. All I wanted was a dynamic language with a static checking pass, and this was the simplest way I could figure out to implement it.

It is easier to implement if the static pass doesn't have to show the program is type safe, but just rejects any program it can show to be type unsafe.

This is the notion of 'success typings' - you infer what the most general types functions can accept, and with a little bit of analysis you can find contradictions - some type errors. If you can't find a contradiction it does not mean the program is type safe, but hey that is what the dynamic part is for.

Both strongtalk and erlang are dynamic languages that can have a static checking pass without the self hosting part. Erlang uses something like the success typing idea in a tool called dialyzer.

Strongtalk argues it doesn't need static type information to have efficient dispatch through the use of 'hidden classes' and 'polymorphic inline caches' (built upon the work of the self vm).

Meanwhile Wadler seems to have incorporated the notion of blames into a dynamic type system, so that when you can't show a check is safe, when you make it at runtime you know where the problem happened. This gives you as much static checking as you can, but better diagnosis on runtime type errors.

All of these are pretty powerful approaches, some of which adopted in modern languages.

I don't think you need the self-hosting bit to get a workable tool with dynamic classes and some static checking.

Yes and no...

If a language feature is too hard to write, or too hard to explain, it is probably too hard to use well and nightmarish to debug... Complicated type systems are a barrier to entry...

tef is speaking here about language features that are exposed to the user, and in that sense I agree with him (up to a point - see below). But it's important to distinguish between the externally visible language and the internal framework on which the language is realized. For example: there are advantages to an under-the-covers type system that knows about regions, even if the language itself doesn't expose that concept or exposes it in a limited way. The added internal sophistication provides useful information for diagnosis, error discovery, and so forth. It also ensures that the language is built on a consistent and extensible core, which helps ensure that the language is evolvable later.

Concerning visible complexity, I agree with tef with one caveat: inconsistency is a form of complexity in its own right. If a feature is partially exposed, in such a way that it doesn't compose naturally in the presence of other language features, then the language will feel broken in the eyes of the user.

This is the real reason that new language features are hard to integrate: it's very easy to introduce them (perhaps inadvertently) in a non-composable or not-surface-consistent way. In my observations, that is the thing that really drives users away. Features they don't understand can simply be avoided. Features they think they understand that generate surprise are deadly.

keeping it simple

Regarding “visible complexity” (comment-62757), cf. comment-1184:

Lots of time spent saying “keep it simple,” but what did they mean? Only Scott McKay defined what he meant by “simple” and that is The Right Thing, i.e. simplicity and generality of interface no matter what the cost in complexity of implementation. That's totally different to Hoare/Wirth-brand simplicity of interface and implementation over generality and/or clever optimization.

totally.

my point was about features of the language that the user, and I didn't mean the internal mechanisms on which the language is built.

another form of complexity is when abstractions don't cover failure cases of the code

(for example, python does not handle EINTR, and so you must re-implement some file handling in robust software)

Not exactly the way you did

Not exactly the way you did it, but similar approach is fairly common in non-mainstream dynamic languages. A language will support optional type annotations (using either a special mini-language or run-time type assertions) and have a type-checker as an external tool that users run at will. The type-checker itself will be written in the same language, thus can take itself as an input. Examples include, if I recall correctly, Strongtalk and Cecil.