Eric Lippert's Sharp Regrets

In an article for InformIT, Eric Lippert runs down his "bottom 10" C# language design decisions:

When I was on the C# design team, several times a year we would have "meet the team" events at conferences, where we would take questions from C# enthusiasts. Probably the most common question we consistently got was "Are there any language design decisions that you now regret?" and my answer is "Good heavens, yes!"

This article presents my "bottom 10" list of features in C# that I wish had been designed differently, with the lessons we can learn about language design from each decision.

The "lessons learned in retrospect" for each one are nicely done.


I think he has a good list,

I think he has a good list, and his questionable list is also well motivated, but I'm surprised that the arbitrary restrictions on type constraints weren't mentioned. For instance, you can't constrain a type parameter T to be a System.Delegate, a System.Enum, a System.Array, or a sealed (non-inheritable) type.

There is literally no good reason for these restrictions: they weaken what you can express, for no gain, and the underlying CLR supports all of these constraints. Array covariance was added to C# because the CLR already supported it, so it came for free. Why restrict these?
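(For contrast, a sketch of mine in Scala rather than C#, since the JVM has an analogous "special" enum base type: nothing at the runtime level stops such a bound from being expressed, or from being useful.)

```scala
// A contrast sketch (my example, not from the thread): Scala happily bounds
// a type parameter by java.lang.Enum, the JVM's analogue of System.Enum.
def parseEnum[E <: java.lang.Enum[E]](clazz: Class[E], name: String): E =
  java.lang.Enum.valueOf(clazz, name)

// java.time.DayOfWeek is an ordinary Java enum:
val monday = parseEnum(classOf[java.time.DayOfWeek], "MONDAY")
```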

random comments

For the void type, the obvious clean solution is to give the type a value. Let it be an optimization, rather than part of the language spec, that that singleton value usually doesn't need to be passed around. As one example among many, suppose you have a map() routine that works over any kind of function, and you try to pass it a function that returns void. Why shouldn't you be allowed to do that? Do you have to define a separate foreach() just to support the void type? It's silly.
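(Scala's Unit is an existing design along exactly these lines: "void" is an ordinary type with a single value, (), so no separate foreach is needed. A minimal sketch:)

```scala
// Unit is a real type with a real value, so generic code needs no special
// case for "functions that return nothing".
def mapAll[A, B](xs: List[A])(f: A => B): List[B] = xs.map(f)

val lengths: List[Int]  = mapAll(List("a", "bb"))(_.length) // B = Int
val effects: List[Unit] = mapAll(List("a", "bb"))(println)  // B = Unit, same map()
```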

For lambdas, it's been interesting watching them go, over the last few decades, from an exotic functional-language feature to something that every OO language designer likes.

Likewise, it's been interesting seeing different designers gravitate toward a uniform syntax of "kind name type details" being a great way to encode all manner of different constructs.

I don't think it would improve things to have a separate integer type for bags-of-bits. It's really nice for both programming and for language implementation that Java has a simplified set of integer types, and the unsigned integers in C are especially prone to bugs.

Many languages seem to have trouble with enums. Perhaps designers think of it as a little problem not worthy of their attention?

Possibly missing from the list:
- Better value types. He discusses why a struct needs to have all this boilerplate for equality, but why doesn't the "struct" keyword make it all work correctly? And also give you an ordering operation, a toString, and a serialization method? Every OO language deserves something like a case class (see the sketch after this list).
- Reified generics. It's better to compile in such a way that you can communicate across multiple languages as well as across the network, and it's hard to do that if your ABI involves a particular kind of generics that is never going to quite match what any other language is doing. Type erasure in Java has real practical benefits.
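(To illustrate the first bullet, a minimal Scala sketch of what a "do the right thing" value-type declaration buys you; equality, hashing, toString, and serializability are derived, and ordering takes one extra line:)

```scala
// One declaration derives structural equality, hashCode, toString, copy,
// and Serializable; ordering is the only piece that needs an extra line.
case class Point(x: Int, y: Int)

object Point {
  implicit val ordering: Ordering[Point] = Ordering.by(p => (p.x, p.y))
}

assert(Point(1, 2) == Point(1, 2))           // structural equality for free
assert(Point(1, 2).toString == "Point(1,2)") // readable toString for free
```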

Type erasure is a mistake.

Type erasure is a mistake, both for performance and interoperability. Even Java designers agree now, and they're going to fix it as far as that's possible. https://en.wikipedia.org/wiki/Project_Valhalla_(Java_language)

Academia will follow in ~10 years ;-)

That depends on the use case

For example, I believe the Scala implementors have stated multiple times that there are things in Scala that would have simply been impossible to implement without being able to just cheat by having, e.g., plain erased List everywhere. Granted, it might be the case that those issues would not exist these days with invokedynamic and whatnot, but...

(There's also an argument to be made that reification combined with reflection makes it much too easy to subvert parametricity.)

In short: It's a non-trivial issue.

EDIT: I should say that, personally, I'm not at all convinced that having a multi-language VM is necessarily a good idea.

Right. These are engineering

Right. These are engineering considerations. Ada's emphasis on real-time systems comes to mind regarding how generics are compiled.

re: cheating

Has anybody brainstormed how to do whatever Scala needed to do, w/out that kind of cheating? It seems like good food for thought, that such problems exist.

They would have had to give

They would have had to give up the JVM and thus Java interoperability... and that would have meant that they would have gained no traction. So: "Death". :)

EDIT: I'm sure the language, or at the very least the implementation of the language, would have been much better and simpler, but... see above. :)

re: multi-language VM

So a non-virtual hardware machine supports multiple languages. What is it about VMs that causes them to be less amenable to that? Are there other possible ways to do / connotations to have about "VM" that would work better (and still be of any use)?

Two languages implemented on

Two languages implemented on a hardware "VM" don't try to share an object model. You need explicit conversions or wrappers from one language's values to the other.

What kind of things are

What kind of things are impossible? Wasn't there Scala.NET even?

Reflection is at fault for the failure of parametricity. Reflection violates it even with type erasure.

That was back when scala was

That was back when Scala was much simpler. Note: you can cheat with the CLR as well, but you give up the ability to interface with generics-using libraries. Many core libraries don't bother with generics, maybe for this reason, but probably because they were written before generics were a thing.

What kind of things are impossible?

Higher-kinded types.

Not supported by the java vm or clr. But erasure makes interoperability possible in the former case.

As per my previous post,

As per my previous post, erasure is possible in the CLR; you just don't use CLR generics. Anyways, the main reason Scala.NET didn't go anywhere was lack of interest - C# wasn't lagging as far behind as Java was, while F# was considered decent by FP enthusiasts. Also, even when I was there, there was already talk of a JavaScript backend with an eye on being where the users were (.NET is mostly enterprisey).

I'd love to hear more about

I'd love to hear more about this. My first impression -- from mailing lists and such -- was that, in fact, reification was one of the biggest hurdles for Scala.Net (or whatever it was called), but I'll happily admit that my impression was wrong given enough evidence! :)

(I'll happily agree that, given enough escape hatches, anything is possible.)

It was a problem to be

It was a problem to be solved (for interop), but I think the bigger problem was that the person doing the work moved to Australia and there just wasn't much community interest in solving Scala.NET's problems.

I took a whack at it when I started at Microsoft but quickly lost interest; C# just wasn't that bad.

Haha :) Is "moved to

Haha :) Is "moved to Australia" a euphemism, or do people who work on the Scala compiler just generally emigrate to foriegn, sunnier, lands, or what? ;)

(I've mentioned Paul Phillips in this thread, and I tend to believe him about the goings-on in the scala-compiler-development-area.)

Uhm, no, I meant that guy

Uhm, no, I meant that guy literally moved to Australia. It happened while I was there.

Fair enough, I guess.

Fair enough, I guess. Personally, as sentences go, I think it ranks along the likes of "my wife just walked in as I was Hoovering The Talking Seal" :D.

I think you're basically

I think you're basically right, but I also seem to recall something about the whole "traits = interfaces with implementations" thing being basically impossible on the JVM without being able to just cast everything to Object. (Just for reference, I think I gleaned this from a talk by Paul Phillips. He may be disgruntled, but he does know what he's talking about.)

Paul Phillips will be the

Paul Phillips will be the first one to tell you he knows what he's talking about, and that should be quite telling in itself.

The JVM doesn't support traits very well, but they are implemented in Scala as interfaces with inlined state for extending classes, so object casts aren't technically required (since all types involved can be modeled as interfaces). This is quite bad for separate compilation of course, but otherwise completely workable. (It could have changed since 2007, but I doubt it since the solution was robust enough)
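(A source-level sketch of that encoding, under my understanding of the scheme: the trait can be modeled as a plain interface because it declares its state abstractly, and each extending class supplies the actual field.)

```scala
// The trait owns behavior but only declares its state, so it can compile to
// an interface; the field is "inlined" into every extending class.
trait Counter {
  var count: Int                  // abstract state
  def bump(): Unit = count += 1   // behavior (on pre-Java-8 JVMs, scalac
                                  // forwarded bodies like this to a helper)
}

class Clicks extends Counter {
  var count: Int = 0              // the inlined copy of the trait's state
}
```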

Don't diss paulp, because he

Don't diss paulp, because he has way more cred[1] than you.

(I'm emphatically not trying to start a fight, but...)

[1] I should say: Publicly established cred.

Right, and he'll be the

Right, and he'll be the first one to tell you that. You know, if someone has to emphasize over and over again how much code they wrote and their check-ins to the Scala code repository, oh, and how bad this other guy's code was even though it did things that even today's Scala IDE can't accomplish, then really something must be wrong. The Scala community is a graveyard of self-promoters and huge egos, but at the end of the day nothing good really comes out of those behaviors.

Please stop. Your epidermis

Please stop. Your epidermis is showing :).

EDIT: Btw, you're not engaged in shameless self-promotion almost daily on HN? That's a surprise to me. Please get real and just admit that you have some sort of problem with the guy.

(I don't mind people having interpersonal problems... I have problems with people denying -- incredibly -- that they have interpersonal problems.)

EDIT#2: Anyway, I'm off... have your last word :) I may respond tomorrow, but whatever...

I don't promote myself like

I don't promote myself like that, at least not in that way. Anyways, it is not like my thoughts about this are unique, and I'm very happy to be out of what I see as a very toxic community. Live and let live.

What do you mean?

I'm very happy to be out of what I see as a very toxic community

Are you leaving HN for good, then?

No, the Scala community. I

No, the Scala community. I was having nightmares about it and developing an ulcer before I decided to just give it up. Nothing is worth that.

HN is fine, people are quite pragmatic and anyways it never gets personal; the same with LtU. In the Scala community, it was always personal. paulp came after me, but my only interaction with him was extremely personal. And you know, someone like Martin earned their cred, they don't have to brag about it, anyone who sees their work or works with them knows that they are good. But this guy, wow.

You know, I don't have much cred, I admit that; I also don't try to tell people my work is good, I just exhibit it and let them judge for themselves. This is how it should be: they either like it or hate it. And in the field I'm working in (inventive PL design), there are plenty of critics, plenty of personalities, plenty of judging, but it is done in a way that is at least... not health-debilitating.

I see. I don't perceive the

I see. I don't perceive the Scala community as particularly toxic, but I guess it might just be me "being used to it" and being pretty thick-skinned in general.

Re: Martin vs. PaulP: I haven't had any personal interaction with either of them, but my impression obviously differs from yours. FWIW, I don't think that Martin is "stupid" (or any such thing), I just think he's pig-headed and in some sense *wrong* to pursue the direction that he's going -- I'm not sure if he actually needs grant money these days, but as someone put it Scala seems to suffer greatly from the "grad student/grant money attention deficit disorder". Martin is obviously extremely intelligent -- otherwise he couldn't have come up with the giant mess that the Scala Collections turned out to be!

I actually use Scala in day-to-day work, but not because it's objectively "good". It's just the least bad option if you're targeting the JVM. (Scala.JS was a huge boon to productivity.)

I actually use Scala in

I actually use Scala in day-to-day work, but not because it's objectively "good". It's just the least bad option if you're targeting the JVM.

No love for Ceylon? It looks promising.

Yes, Ceylon is indeed very

Yes, Ceylon is indeed very interesting. A colleague of mine investigated doing practical things in it and found it... interesting, but lacking for real-world applications.

(We will re-visit it periodically.)

A colleague of mine

A colleague of mine investigated doing practical things in it and found it... interesting, but lacking for real-world applications.

Was this documented anywhere?

Alas, no -- it should have

Alas, no -- it should have been a blog post at least, but... As it is, it's a "personal communication" type of citation. Apologies, and take it for what it's worth.

EDIT: ... which may not be much, but I seem to recall the main issues having to do with the lack of higher-kinded types (which has been addressed recently!)... and possibly something about unrestricted side-effects, but that latter part may just be my invention(!).

Fallacy battle

I'd suggest you both stick to technical conversation topics, that is, how to compile Scala to (non)reified generics. What Sean wrote matches my experience/recollections and makes sense. In general, you can run any typed language on the JVM with (significant) overhead — just erase all types to MyLanguageAny.

Otherwise, you're both sampling logical fallacies: one argues from authority (or rather, from what he remembers from a talk of the authority), the other does an ad hominem on the authority, then it becomes personal.
If we were to discuss elsewhere the Scala community (and we probably shouldn't), I could dig relevant evidence, but for now I'll refrain from posting it.
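(A hypothetical sketch of the "erase all types to MyLanguageAny" strategy mentioned above; GuestAny and friends are made-up names. Every guest-language value gets boxed into one host type, and typed operations downcast at use sites, which is exactly where the overhead comes from.)

```scala
// "Erase all types to MyLanguageAny": one host type for every guest value.
sealed trait GuestAny
final case class GuestInt(i: Int)                  extends GuestAny
final case class GuestFun(f: GuestAny => GuestAny) extends GuestAny

// The erased "apply": a downcast (here, a match) on every single call.
def callGuest(f: GuestAny, arg: GuestAny): GuestAny = f match {
  case GuestFun(g) => g(arg)
  case other       => sys.error(s"not callable: $other")
}
```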

Agreed. I'll try to avoid

Agreed. I'll try to avoid personal things from now on. Thanks for the reminder, btw! It's much too easy to get carried away and "personal" and thus non-substantial during these arguments.

That said... my original point in bringing up paulp was that he's an undisputed expert in the Scala compiler implementation (until ca. late 2014), and that this means that we should take his opinions seriously when it comes to actually implementing a compiler and if/how Scala's interop requirements make the language, compiler and/or tools more complicated.

Theory may have advanced since then, but his 2014 lecture circuit provided some incredible food for thought about the practicalities of designing a practical language. Especially the talk about collections and why the Scala Collections design is horribly misguided and wrong.

EDIT: It's no secret that I'm a Haskell fan, but paulp (during one of his talks) gave us a really valuable insight, speaking about his failure to install GHC on a new-fangled OS X Mavericks (or whatever it was): if your "language" (i.e. compiler + tools + &c) doesn't work out of the box without further fiddling... then it's not going to win. Platform support is a huge thing these days, which is why I'm quietly hopeful for GHCJS, Scala.js and similar things. :)

EDIT#2: Btw, I apologize. This should be a place for dispassionate discussion, and I kind of... failed to be dispassionate. I apologize to everyone.

Which is the JVM talk?

Which is the JVM talk? https://www.youtube.com/watch?v=o2WtIroR7Ag ? (Not watched yet).

To understand Paul's claim, somebody will have to take a look and see the argument (of course, nobody has that duty). He's an "authority", but arguing from authority is just a (useful but dangerous) heuristic. Without understanding the actual argument, we won't know if it's purely technical and absolute (and then it's just right or wrong), or it's contingent on technical constraints we don't care about, or on value judgements.

While you can't argue from authority, authorities do tend to be right more often, so it's often worth listening to them (though this is still an unfair heuristic). Since it's Paul and I know he's smart, there's a higher chance I'll get to watch that video (I tend to hate watching videos, just like him; slides are much faster).

Sorry, this is just a sore

Sorry, this is just a sore topic for me. I really want nothing to do with scala and probably shouldn't go near conversations about it. My only interaction with paulp was quite negative; I'm glad to be out of it.

You know, many people took

You know, many people took many whacks at the Scala collections, including myself. And it was a hard problem, quite easy to get wrong, requiring lots of experimentation. People just don't appreciate that. They see result X and assume it was caused by Y (e.g., bored grad students), when that is completely wrong. It was/is just a hard problem with a lot of competing requirements. But really, people imagine what they want.

It was a self-made "hard

It was a self-made "hard problem". I don't see the same kind of fidgeting over abominations like "CanBuildFrom" in e.g. O'Caml or Haskell. It was a situation where the people in charge said "oh, we have features X,Y,Z, let's see if we can use X,Y,Z to implement collections!" rather than "how do we make collections maximally useful to the programmer?". (The inheritance chain on most Scala collections is completely insane -- to save, what, two or three lines of code?)

Does haskell's rich

Does Haskell's rich collection library have to satisfy both in-place mutable and immutable uses, and both functional and OO styles of use? Does it have to be compatible with the JCL? Or you could fork that into different libraries to satisfy each demand, but would that be better?

And that is the crux of it: scala is a big tent language and that tent is way too big, at least for the most opinionated users to be very happy. Haskell, in contrast doesn't have to worry about this, it is quite opinionated already and you either like that or not. Scala proves that you can't make everyone happy, and you'll just get yelled at for trying.

Nope. I'm sorry, but that

Nope. I'm sorry, but that kind of unjustified assumption is fine in a pub. "I don't get X" doesn't mean it doesn't make sense.

Haskell/Ocaml's interfaces are trivial, but they have their own downsides: in Haskell you can't encapsulate which kind of sequence you choose.

People have experimented with not sharing so much code as nowadays, but they had problems with code duplication, inconsistency, and so on (http://lampwww.epfl.ch/~odersky/papers/fsttcs2009.pdf). A damning indictment is that these problems still survive.

I have strong issues with what actually happened with Scala's 2.8 collection library, but it's a different story. (Basically, too few iterations over the design before committing to it).

Jules Jacobs (who's also around here) wrote a couple of blog posts pointing out problems in all collection libraries around. Including the one in Haskell. (You'll see different designs for those too, with people dissing them as well.)

https://julesjacobs.github.io/2014/10/16/the-best-collections-library-design-1.html

One point is: CanBuildFrom ensures that e.g. map on a collection of type X returns a new collection of type X if appropriate (I'm aware I'm oversimplifying). Jules claims that map, instead, should always return a sequence, whatever's the source.
If "true", that would remove a huge source of complexity — but lots of libraries don't do that. Also, such a claim is a value judgment over what is more convenient, so we don't have an easy way of "checking" whether it's true or not.

Note that C#/Linq made the

Note that C#/Linq made the right call here: all comprehensions return an uncomputed iterable. But this makes them much less useful for many cases, and I find myself using Linq less and less over time. They also make debugging a bit of a pain, as deferred control-flow constructs typically do. For loops support breakpoints, at least.
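(The same trade-off is easy to show with Scala's views, which defer work the way LINQ comprehensions do:)

```scala
// Strict: computed now, easy to inspect in a debugger.
val strict = List(1, 2, 3).map(_ * 2)        // List(2, 4, 6)

// Deferred, LINQ-style: nothing runs until the view is forced.
val deferred = List(1, 2, 3).view.map(_ * 2) // no work done yet
val forced   = deferred.toList               // the mapping happens here
```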

This might still be

This might still be pub-level talk, but ask paulp about the collections and how/why they are broken :). See https://www.youtube.com/watch?v=4jh94gowim0

One point is: CanBuildFrom ensures that e.g. map on a collection of type X returns a new collection of type X if appropriate (I'm aware I'm oversimplifying). Jules claims that map, instead, should always return a sequence, whatever's the source.
If "true", that would remove a huge source of complexity — but lots of libraries don't do that. Also, such a claim is a value judgment over what is more convenient, so we don't have an easy way of "checking" whether it's true or not.

And that's solving a non-problem -- in practice this isn't a big-enough deal to justify the enormous cognitive load of CanBuildFrom and the insanity that follows, especially regarding compilation error messages. (Btw, the collections are still unbelievably buggy. Understandably.) paulp refers to this whole idea as the "bitset gimmick" and he's absolutely right that it's a gimmick. It has marginal value at best... maybe we should just take a step back and not try to force $GRAND_IDEA onto collections...?

It's also a completely artificial problem: if we didn't have incredible levels of subtyping and hierarchy, it wouldn't actually be a problem! For example, in Haskell an fmap just gives you back the type of structure you started with. Parametricity... simple!

RE: Jules' article on collection design

That's interesting. Jules, have you posed a link here before? Have you already done work on part III?

Higher order

It does make sense to return a sequence so that results don't have to be stored and can be generated as needed - and also so that the assumptions of the original collection don't have to apply to the result collection, such as uniqueness or whether the elements are orderable.

But it might be easier to specify arbitrary processing as loops over collections - consider such "filters" as finite impulse response filters - such filters rely on a neighborhood of values.

Or on other sorts of optimized filters that process whole blocks with a little bit of overlap between blocks, such as FFT-based filtering, or approximations that use recursive filters with some sort of overlap between blocks and a crossfade.

I mention these more complex filters because I remember someone had a presentation on translating arbitrary filters into streams automatically. It involved coding in a simpler, more direct form and having some sort of metacompiler translate the code into a different form.

So instead of having collections or streams, you have some sort of metacompiler. I didn't understand the talk, which was probably attempting things much simpler than my examples, but I liked the idea.
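(Re the FIR case above, a tiny Scala sketch of mine to make the point concrete: each output needs a sliding neighborhood of inputs, so it isn't an element-wise map.)

```scala
// A FIR filter as a sliding-window computation: output i is the dot product
// of the taps with a neighborhood of the signal, not a function of one element.
def fir(signal: Vector[Double], taps: Vector[Double]): Vector[Double] =
  signal.sliding(taps.length).map { window =>
    window.zip(taps).map { case (x, t) => x * t }.sum
  }.toVector
```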

Object-orientation creates hard issues

(I know relatively little about Scala and less about the specifics of the "new collections" design, although I have seen examples of the corresponding signatures and understand them. My point is not to contradict anyone in this conversation, but rather to point out a subtle technical aspect in the design of object-oriented languages that not enough people know about.)

It is not completely fair to compare OCaml and Haskell's design with Scala's, because an important point with Scala is that it is an object-oriented language. For fundamental reasons, designing APIs can be harder if you stick to an object-oriented style. The best example I know of is from the article Generalized Algebraic Data Types and Object-Oriented Programming by Andrew Kennedy and Claudio Russo, 2005: the function "flatten : List (List a) -> List a" cannot be expressed in OO (Object-Oriented) style as a method of a List<a> class without using a very powerful feature that is the OO counterpart of GADTs, namely equality constraints in method prototypes (it is unclear to me whether or not Scala supports this feature, though I looked carefully at it a few years back; in any case it is not widely known and probably not robust enough to use).

The problem is that the method "flatten" of the class List<a> is only well-typed if the type (a) happens to be equal to a type of the form List<b>.

To work around this limitation, a different style is used in most OO languages that have a List class and implement a "flatten" feature somewhere in their standard library. Making it an external function works, but the Scala choice (even before the "new collections" work, I believe) is to reify the static assumption (there exists b such that a = List<b>) into an implicit witness that a can be converted into List<b>.
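(A condensed sketch of that workaround, using a standalone MyList so the evidence is visible; the real library uses an implicit conversion rather than <:<, but the idea is the same: flatten exists on every list, yet only type-checks when the caller can prove the element type is itself a list.)

```scala
// flatten is declared for every MyList[A], but the implicit evidence
// ev: A <:< MyList[B] makes it compile only when A really is a list type.
final case class MyList[A](elems: List[A]) {
  def flatten[B](implicit ev: A <:< MyList[B]): MyList[B] =
    MyList(elems.flatMap(a => ev(a).elems))
}

val flat = MyList(List(MyList(List(1)), MyList(List(2, 3)))).flatten // MyList(List(1, 2, 3))
// MyList(List(1, 2, 3)).flatten  -- rejected: no evidence that Int <:< MyList[B]
```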

I understand the "new collections" design in Scala to be a result of a further exploration of this direction. I find it interesting that it can be traced back to a fundamental difference (in the need for static type equalities) between object-oriented and functional-style APIs.

Indeed

It is not completely fair to compare OCaml and Haskell's design with Scala's, because an important point with Scala is that it is an object-oriented language. For fundamental reasons, designing APIs can be harder if you stick to an object-oriented style.

Indeed, and that's my (and I suspect paulp's) point! It seems that all of this effort is ultimately completely pointless -- my interest would be in the realm of does this actually solve any problems?[1], not "so we have this solution... can we find any problems to solve?"

[1] I mean in the general sense of making it easier to avoid mistakes, etc. etc.

Encapsulation

OO languages enable more encapsulation than Haskell, and that's much more likely to be a real concern.

Suppose I want to change how Strings are implemented. Compare how much code is impacted in Haskell vs an OO language. With OO, that can be a localized change. With Haskell's standard design, you end up altering lots of code. I understand you can witness that in practice in Haskell, due to String vs bytestring vs ...: the initial String implementation is agreed to be a bad idea, but lots of code is stuck using it.
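(A hedged sketch of that claim, with made-up names: clients see only the interface, so the representation can be swapped in one place.)

```scala
// Clients program against the interface...
trait Str {
  def length: Int
  def charAt(i: Int): Char
}

object Str {
  // ...and this constructor is the single place that picks a representation.
  // Changing ArrayStr to some other implementation touches no client code.
  def apply(s: String): Str = new ArrayStr(s.toCharArray)

  private final class ArrayStr(cs: Array[Char]) extends Str {
    def length: Int = cs.length
    def charAt(i: Int): Char = cs(i)
  }
}
```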

Next, can you write code that is independent of how Strings are implemented? Not with the standard design, and the Haskell alternatives supporting this have been criticized (because the appropriate typeclasses end up being lawless).

ML modules allow supporting this better, but the complexity of the feature is comparable with Scala. I can't comment on any ML collection library, but heuristically I'd be surprised if they had hit a perfect solution.

Well, that is the main point

Well, that is the main point of FP vs. OOP, the problem is that scala's niche is straddling both worlds. It does a pretty good job at that, but some of the FP enthusiasts that have been attracted to it want to get rid of the OOP...scala should be Haskell because...why?

Subtyping is actually a nice feature that just happens to have a high complexity price in a functional world. I don't think anyone has gone as far with collections in two worlds as scala has. I'm not really sure if it's the "best" decision, but design is about making a bunch of trade offs that won't make everyone happy.

In fact, Haskell should be more like Scala on modules

Not quite. But Haskell might finally add something like ML modules (the ones you encode in Scala via the cake pattern).

Socially, Scala allows encoding Haskell idioms to attract Haskellers.

Technically, FPers have IMHO one valid point against pure OO. That is, if *everything* is late-bound, *everything* is hard to control (where is this call to this.foo going to?) — the same problem that killed AOP. That's why C# chose to *not* make virtual the default, why ML has subtyping in very specific places. It's something that Bob Harper, Paul Phillips and countless others rant about.

OTOH, Alan Kay argues that if everything is late-bound, everything can be patched — which you need because you don't know what you get wrong, so the alternative would be to rewrite half your program.
But some people like looking at the whole program to convince themselves it's right — and to do that, it's better to do the huge rewrite! This set of people aligns well with FPers/Haskellers.

I don't think Scala was ever

I don't think Scala was ever set up to attract Haskellers. I mean, that just happened, but it felt more like an invasion :)

C# having non-virtual by default was a carry-over from C++, and it isn't a big thing at that. And anyways, you have higher-order functions in FP that amount to an even more extreme form of late binding, and that often involve explicit rather than implicit higher-order reasoning on the part of the user. In one case you have different variant behavior, something that isn't very scary and has straightforward real-world analogies; on the other hand you have just... f.

I think the tension between OOP and FP mostly shows up in the type system, the classic tension between parametricity and subtyping. At least, this is where much of the complexity comes from that people complain about in the collection library.

Changing how strings are implemented

Under the pure OO approach, changing how all strings are implemented in a modular way isn't even possible, is it?

Just edit the code of the

Just edit the code of the string class?

Which is the string class?

Which is the string class? Isn't the point of pure OO that anyone is allowed to pass in their own impostor string object as long as it adheres to the correct interface?

No, pure OOP has never

No, pure OOP has never specified that interface must be separated from implementation. That is a rather recent thing (Objective-C protocols), and is rather controversial at that (it does not allow for encapsulated interfaces).

Complexity of Scala vs ML modules

ML modules allow supporting this better, but the complexity of the feature is comparable with Scala.

Actually, no, not even close. ML modules are basically just System F-omega + non-recursive subtyping and quantifier hoisting. Scala is far, far away from that.

You can do higher kinded

You can do higher kinded types just fine by inserting casts to and from Object in the right places. With type erasure you need to insert strictly more casts (that is, for all generics and not just for the higher kinded ones).
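(Scala on the JVM is the working example of this: the higher-kinded abstraction is checked at compile time, and after erasure F[_] is simply gone from the bytecode, with the casts inserted at use sites.)

```scala
import scala.language.higherKinds // needed on Scala 2.12 and earlier

// A higher-kinded interface: the JVM has no notion of F[_], but erasure
// means it never has to; map's erased signature is roughly
// (Object, Function1) => Object.
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

val listFunctor: Functor[List] = new Functor[List] {
  def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
}
```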

There's some buzz that Don

You don't even need to cast to and from object, you can use branding to enforce casts to and from a base type, and those casts are mostly safe even in C#.

There's some buzz that Don Syme is working on HKT for the CLR. Unfortunately it's private, so we can't see the real goings on.

Very nice!


I'm only relaying

I'm only relaying information from others; see paulp's talks on JVM interop. My suspicion is that casts would have significant overhead if everything weren't Object... and that this would not be well-optimised by a JIT.

(... but that's just my vague recollection of a couple of talks, so take it for what that's worth.)

My suspicion is that casts

My suspicion is that casts would have significant overhead if everything weren't Object... and that this would not be well-optimised by a JIT.

My microbenchmarks on the CLR indicate that casts between subclasses are much faster than casts to and from System.Object, and also much faster than to and from interfaces. I don't recall how interfaces compare to System.Object casts.

Reflection and parametricity

IIRC, you *can* combine parametricity and the right reflection interface. If you do that, def identity[T]: T => T can only be a polymorphic identity function, while def typecase[T: Typeable]: T => T can do a typecase on T. IIUC, GHC Haskell provides this interface (https://hackage.haskell.org/package/base-4.8.1.0/docs/Data-Typeable.html).
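(The same interface can be sketched in Scala, with ClassTag standing in for Typeable; the point is that giving up parametricity shows up in the signature:)

```scala
import scala.reflect.ClassTag

// No evidence about T: the only total implementation is the identity.
def ident[T](x: T): T = x

// Evidence demanded in the signature: the function may now inspect T,
// so the loss of parametricity is visible to every caller.
def typecase[T](x: T)(implicit ct: ClassTag[T]): T =
  if (ct.runtimeClass == classOf[String])
    x.asInstanceOf[String].reverse.asInstanceOf[T]
  else x
```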

Does a vm need to know about types at all? It's a target!

Maybe these vms are just too high level. They start out targeting a specific language instead of just supporting some libraries and being a platform.

I realize that the jvm tries to be a safe interchange format but it probably fails at that. Maybe the "safe interchange formats" need to be totally separate layers from actual implementation formats.

You know, the best thing about the jvm is not the language (which is damn weak); it's the included group of multiprocessor-supporting garbage collectors.

If they abstracted not a language but just basic processor abilities, and tied those to good garbage collectors, then you'd have a better and more flexible vm than the jvm or .net.

Assembly language is a perfectly good target and it knows nothing about types.

Maybe these vms are just too

Maybe these vms are just too high level.

Agreed. C-- tried to forge this path, but it didn't gain much traction. Object models, dispatch, fixed subtyping: it's either too general or not general enough.

Assembly language is a perfectly good target and it knows nothing about types.

The question is whether it should. LLVM is an acceptable pseudo-assembly-level target language if you don't much care about typing. I'm more interested in one that supports enough typing to convey meaningful benefits, but that doesn't overly impede innovative new abstractions.

I'm not sure it's possible

I'm not sure that it's possible to have such a generic type system. Type systems that permit everything are not interesting. Interesting type systems restrict what you can do to some subset of possible operations. To encode all possible source-language type systems, the in-VM type system needs to be more general, therefore less interesting. To cope with new interesting type systems it would have to be the most general type system, which is no type system at all.

In other words, whatever type system you have for the VM, there will be possible type systems that break it; the only question is whether the systems that break it are interesting. By defining a VM type system you are basically saying you can predict what all the interesting future type systems are, and that they are all compatible with this VM type system. I would not be comfortable making that claim. I think the best you can achieve is a VM type system that can support all the current language type systems that are interesting.

Why not a meta system

You can have a "type system" that's jit/trace/optimizable but simple.

You can also have a system where the vm knows nothing about it and depends on the compiler to generate good code in the first place - like assembly language - and that just needs hooks for the garbage collector to figure out JUST ENOUGH to be able to collect them. You could even hand arbitrary code over to the garbage collector so that it can parse types, and put hook calls in for the gc (think of Boehm: it's in C, but it still manages to collect, and it DOES have precise rather than conservative modes).

You can also have things in between for the optimizer and garbage collector, including handing arbitrary optimizer code or arbitrary garbage collector code over to the jit/tracing optimizer and gc. That could be simple type info, it could be logic programs (in order to be better behaved than arbitrary code), or it could be truly arbitrary code.

The point isn't to capture

The point isn't to capture all of the high-level semantics, but rather to expose a typed assembly language to ensure the safety of low-level semantics. So something with jump labels, structs, arrays, bounded pointers, etc., sort of like the Mu type system that gasche linked.

TAL

Something like this: http://www.cs.cornell.edu/talc/papers/talx86-wcsss.pdf I have been looking at something like this in my own work, where I am looking for a type system that can type all the intermediate stages between source and target code in a nano-pass compiler. To do this it has to capture all the high level and low level semantics in one type system.

I don't imagine it will be of any use apart from compiling the source language that I have in mind, and don't intend it to support future language level abstractions.

Why?

My question would be: Why would you want to? AFAICT, there's no feasible definition of "low-level semantics" that's shared with more than, say, 2 or 3 other languages at most. Personally, I think the question of multi-language VMs is an ill-posed question... at best.

(I apologise for not having any kind of formal or statistically-backed proof. Perhaps this is a ripe area for some young student to earn a few degrees/PhDs/prizes in? We're sorely lacking in empirical research in this area, IMO. I'll leave it there.)

AFAICT, there's no feasible

AFAICT, there's no feasible definition of "low-level semantics" that's shared with more than, say, 2 or 3 other languages at most.

I don't think a typed assembly language is infeasible or ill-posed at all. The point is only to reflect a thin hardware abstraction to avoid as many platform dependencies as possible, while enabling as many reasonable safety properties on this abstract machine as possible, nothing more. Something like the Mu VM, but with a few more enhancements would be a decent example.

Yes

Well, we have that, I think. It's called "x64" and whatever-ARM-call-their-instruction-set. It's not very valuable in practice :).

EDIT: Alright, it's not typed, but I'm not convinced that any kind of "typed things" make any reasonable sense at this level (essentially "assembly"). At least strings of instructions are easy to reproduce and debug.

This is one of the few places where I actually appreciate LISP/Scheme-type languages, because there's actually a formal way to "go up the tower" of abstraction.

LLVM

"LLVM is an acceptable pseudo-assembly-level target language"

No, it's not. It's seriously compromised by failing to provide a suitable basis for control. What's the point of a "low level" VM that provides a handful of low-level computations (shifts, additions, etc.) and only two (nontrivial) control transfer instructions, one being a call and the other a call with an exception-handling trampoline? How do you do continuation passing, control exchange, coroutines, or other such more interesting things with such a lame instruction set?

It's a one-language VM. It was designed for C/C++. Putting the calling conventions into the backend is begging the question. Control management is interesting; a handful of calculations that can already be done easily in C isn't: there's no need for yet another low-level virtual machine like C when lack of control is the biggest problem in C.

Exactly. You put it so much

Exactly. You put it so much better than I did! Thanks!

Jump/tail-call?

How do you do continuation passing, control exchange, coroutines, or other such more interesting things with such a lame instruction set?

IIRC, we know that all of that requires, basically, just tail calls from the backend; then the backend can transform to continuation-passing style, and everything else can be expressed on top easily — that is, by desugaring. In other words, adding one primitive would be enough for all those things. (Indeed, there aren't infinite primitives to add).
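(A tiny sketch of why the one primitive suffices: once control is an explicit continuation and every transfer is a tail call, "return" and "throw" are both just jumps, and coroutine-style transfers are built the same way.)

```scala
// Continuation-passing style: control transfer = tail call to a continuation.
def divCps[R](a: Int, b: Int)(ok: Int => R, err: String => R): R =
  if (b == 0) err("div by zero") // "throw" is a tail call to err
  else ok(a / b)                 // "return" is a tail call to ok

val result: String =
  divCps(10, 2)(n => s"result: $n", msg => s"failed: $msg")
```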

Topic move

I'm very interested in a conversation on flexible VM instruction sets, but I think the aforementioned Micro Virtual Machines as Solid Foundation for Language Development would be a better place to discuss it, and possibly preserve some valuable discussion space here for in-retrospect discussions of surface language design.

(Thank you for the remark on Scala stuff, I felt it was out of place as well.)

tail call

That may be so in a high level theoretical model, but in a low level machine one needs to distinguish, for example, a direct jump, an indirect jump, jumps with some way to preserve the current continuation, etc etc. In my Felix language I copied the PDP-11 instruction set and provided branch-and-link, which in turn requires both label variables and a computed (indirect) goto instruction.

Note such an instruction makes no mention of passing any data, and also makes no assumption about the existence of a stack. At this level a "continuation" is just a machine code address.

So the issue here is that at the low level a subroutine call saves the current continuation on the machine stack and calls some procedure which returns by popping the continuation off the machine stack and jumping to it. In other words procedure calls, even without any thought of passing arguments or returning values, are actually quite high level instructions.

It turns out (IMHO) that whilst stacks have good performance, you can't design a decent programming language with just subroutine calls .. in fact the stack is an evil enemy. We're a long way here from a functional programming system with closures and tail calls! What we need is a machine to *implement* such a system.

Relevant pointer

naasking and you know about this already, but for future reference, we discussed Micro Virtual Machines as Solid Foundation for Language Development on LtU.

+1

+1

Maybe these vms are just too

Maybe these vms are just too high level.

(I snipped most of your message, but that's for reader convenience. I mean to respond to all of it.)

No.

There's no feasible way to make a "generic" VM unless all your hosted languages are essentially the same language. (Dynamic/static doesn't matter, interestingly.) If we look at the JVM and recent initiatives, they're talking about Ruby, JS, Java, Groovy, etc. The major point is: They're all essentially the same language! Sure some of them are dynamically type checked, some are statically checked, but... they all have the same object model.

If that doesn't tell you something, then I don't know what would.

(Sorry about the self-reply,

(Sorry about the self-reply, I just thought it belonged in a separate post, but I think it might still be relevant.)

The most recent entrant into the "generic VM" space was LLVM, and while it's been spectacularly successful for C-like languages, it still needed quite a few patches to accommodate the Haskell LLVM backend. Even then, while the newish LLVM Haskell backend does show some promise on numerically intensive benchmarks, it's not a completely obvious thing to swap in LLVM as the default backend. (Otherwise, the GHC devs would have done it, I'm sure. They're actually pragmatic about what goes underneath the Haskell compiler and runtime.)

(That's more of an "experience report" than any kind of "proof", but there it is. Take it for what it's worth. :))

LLVM doesn't enforce an ABI,

LLVM doesn't enforce an ABI, and it doesn't enforce an object model. The fact that two languages compile to LLVM says nothing about their interoperability. LLVM is strictly for code gen.

Now, the JVM defines an ABI, it defines an object model; it is much higher level. Two languages that compile to the JVM should be able to talk to each other; in fact, that is the entire point! So the design constraints are completely different.

Exactly...

... and that's my point. That's why GHC/LLVM needed custom extensions... and why LLVM isn't a "generic VM". Thus negating the ideal of a "generic VM".

Bonus question: Can GHC/LLVM interact with any other languages on LLVM?

EDIT: Bonus answer (SPOILERS!): No, because LLVM is not a generic VM. It's basically just an IR for a compiler. And that's fine, but it's not a "generic VM".

Well, think about what VM

Well, think about what VM has traditionally meant: hardware abstraction. That ABI abstraction was layered onto that later in the JVM; that "VM" gained this extra meaning was probably an accident. But given the context of kids building a new compiler infrastructure, they probably saw the JVM as their main competition, so they named it accordingly, even though it turned out that LLVM became something more like an improved GCC backend.

I think this is why Microsoft decided to call the CLR a runtime rather than a VM. Technically, it was competing with the JVM, but Microsoft also had a big interest in traditional VMs, so calling it a CLVM would be confusing. And of course, there is also ABI abstraction with COM, which pops up in WinRT (where the CLR can be used, but non-CLR C++ or JavaScript can be used as well; it is just the ABI that is standardized, along with ref-counting to deal with memory management in a robust, performant way).

"Void" is an Actor that is also a type

Void is an Actor that is also a type. Void is also a reserved word in many programming languages for the Actor.

What's the state of C#?

What's the state of C#?

It seems fine, actually. It

It seems fine, actually. It keeps being improved at a conservative pace that its users can deal with; there are still a lot of features being considered that are kind of exciting, though other languages already have these.

C# itself is getting a big visibility boost from OSS efforts. Unity continues to play a big role in Indie game development. Though I honestly don't see it expanding much outside of the Microsoft world, even though I think that would be a good thing (C# is better than Java, but the future of Java is Scala anyways).