Generics: The Importance of Wildcards

Martin Bravenboer writes about generic wildcards in Java, and concludes that it is unfortunate that C# will not support wildcards or a similar mechanism.

Eric Gunnerson from Microsoft replies.

I was originally a type-erasure fan, but these days I am not so sure. I hope this turns into a fruitful discussion that helps me decide...

P.S.

The Java paper was mentioned on LtU before.

Hairy details

While scanning Eric Gunnerson's blog, I came across these remarks on Java 5.0 autoboxing:

The key difference between C# boxing and Java 5.0 boxing is that C# boxes between primitive types and object, and Java does it between primitive types and the predefined wrapper types. This allows them to do implicit conversions both directions (an Integer and an int are pretty close in behavior).


There are some issues on this that have to do with object identity. Consider the following:



Map<Integer, String> m = new HashMap<Integer, String>();
m.put(3, foo);
String s = m.get(3);

What does the third line do? Well, it turns out that the two 3 values aren't the same because they are autoboxed to distinct reference objects. That behavior is a bit surprising.

To improve the situation, they've come up with an interesting modification, which I believe is not part of the standard, but part of javac.

They get around this (and I'm not making this up) by playing tricks for small integer values, but not for large integer values. They did not give an exact description of how this behavior is controlled.




Brwtfltlft? My god, I am thinking.... 31 bits....?
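For the curious, here is a minimal sketch of the behavior being described. The cache bounds are my own assumption (the commonly cited range is -128 to 127); as noted above, no exact description was given:

public class BoxingIdentity {
    public static void main(String[] args) {
        Integer a = 3,    b = 3;      // small values, autoboxed
        Integer c = 1000, d = 1000;   // large values, autoboxed

        System.out.println(a == b);       // true: small values share a cached object
        System.out.println(c == d);       // typically false: each boxing yields a fresh object
        System.out.println(c.equals(d));  // true: equals() compares the int values
    }
}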

Wow

Wow, talk about the blind flaming the blind. The problem with variance in both languages isn't type erasure versus runtime support, or wildcards versus pure generics. Neither writer seems to even understand the problem.

The problem is that both languages only expose mutable arrays. If they exposed read-only arrays (similarly to lists in Haskell), those would be covariant in C# and Java; and if they exposed write-only arrays, those would be contravariant.

But there is a solution that is more fundamental still: to separate arrays from mutable references entirely, so that arrays are always covariant. Then the type of arrays of integers is a subtype of the type of arrays of numbers, and the true source of the invariance is exposed: it is the reference type constructor (like IORef t in Haskell) that is invariant.
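To make this concrete, here is a small Java sketch of my own (not from either article) showing that a read-only view is covariant and a write-only view is contravariant, using the very wildcards under discussion:

import java.util.List;

class VarianceViews {
    // Read-only view: covariant. A List<Integer> may be passed wherever
    // "some list of Numbers that I only read from" is expected.
    static double sum(List<? extends Number> xs) {
        double total = 0;
        for (Number n : xs) total += n.doubleValue();
        return total;
    }

    // Write-only view: contravariant. A List<Number> or List<Object> may be
    // passed wherever "some list that I only write Integers into" is expected.
    static void fillWithSmallInts(List<? super Integer> xs) {
        for (int i = 0; i < 3; i++) xs.add(i);
    }
}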

Microsoft employs the greatest set of programming language researchers ever assembled; they should be able to do better than this.

Maybe MS don't know that they do

... but they have great taste covered.

single inheritance is both co- and contravariant?

read-only arrays (similarly to lists in Haskell), those would be covariant

I assume you mean the container (e.g. array) is read-only, not class members of the class instances contained in it.

Edit: most of the comment was removed because it was in error. Thanks.

Re: single inheritance is both co- and contravariant?

I don't understand this comment. Covariance and contravariance are properties of typing rules, or of type constructors such as List⟨t⟩ or Function⟨arg, val⟩; they are entirely unrelated to references or assignment. In languages which conflate inheritance with subtyping, inheritance hierarchies as a rule are covariant, meaning that an instance of a derived class is-a (can substitute for) an instance of its parent class.

As for references, read-only references are covariant, as expected; a read-only reference which points to an instance of a subtype can also be assumed to refer to the supertype, without impacting type safety. Conversely, write-only refs must be contravariant: a write-only ref to a subtype cannot accept an instance of the supertype, but a ref to a supertype can always accept instances of a subtype.

Covariance and contravariance for more complex data types can be derived by repeatedly applying these rules. For instance, a function which takes a write-only ref to T as an argument is covariant in T: a function which takes a write-only ref to Sub and writes to it can always be assumed to take a ref to Super instead.
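A small Java sketch of that derivation, with a hypothetical WriteRef interface standing in for a write-only ref (my own illustration, not from the parent comment):

class DerivedVariance {
    static class Super {}
    static class Sub extends Super {}

    // A hypothetical write-only ref: you can only store into it.
    interface WriteRef<T> { void store(T value); }

    // A function that takes a write-only ref to Sub as an argument...
    static void useSubSink(WriteRef<? super Sub> sink) {
        sink.store(new Sub());
    }

    public static void main(String[] args) {
        // ...can also be handed a write-only ref to Super: contravariance of
        // the ref makes the function covariant in the element type.
        WriteRef<Super> superSink = new WriteRef<Super>() {
            public void store(Super value) { System.out.println("stored " + value); }
        };
        useSubSink(superSink);
    }
}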

Re: single inheritance is both co- and contravariant?

Edit: this comment has been heavily pruned, as several of my original points were erroneous.

Covariance and contravariance are properties of typing rules, or of type constructors such as List⟨t⟩ or Function⟨arg, val⟩; they are entirely unrelated to references or assignment.

Assignment and referencing are the substitutions that the Liskov Substitution Principle applies to.

Assignment and referencing are type conversion operators.

In languages which conflate inheritance with subtyping

Okay, I will contemplate what semantics inheritance could have if it were not conflated with subtyping of a type's interface. I suppose "non-typing inheritance" could have some behavioral semantics orthogonal to the types of a type's interfaces; that would then be (see the caveat below on virtual methods) either an orthogonal, undeclared assumption of the programmer or an orthogonal language not considered by the typing system, e.g. CSS is not considered by the DOM typing system.

Thus when compiling (possibly inferred) static subtyping, if there is an identifier for an interface and it is declared that a subtype inherits that interface, then afaics that is not conflating "non-typing inheritance" with subtyping; rather, that is giving a name to a subtype group and reusing it via composition. If you choose to call that "inheritance", then let us note that it is "typing inheritance" and is thus not conflated with "non-typing inheritance".

The key, I guess, is that an interface has no implementation.

If the programmer has made some behavioral assumptions about that "typing inheritance" other than what is enforced by the subtyping, then the programmer is not aided by any OOP enforcement of those assumptions. Thus there is no conflation from the language, but rather from the programmer's misuse of OOP subtyping. The programmer should apply a declared behavioral inheritance paradigm to formalize such implicit assumptions, e.g. pre/post-condition proving.

Note, if you are thinking of virtual methods: these break the Liskov Substitution Principle and are thus dynamic behavioral typing (e.g. a square that inherits from a rectangle but enforces equal-length sides in its virtual set method), and thus yes, they are a conflation that should not be in any modern language that claims to be statically type safe.
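For concreteness, the classic shape of that violation (my own sketch):

class Rectangle {
    protected int width, height;
    void setWidth(int w)  { width = w; }
    void setHeight(int h) { height = h; }
    int area() { return width * height; }
}

class Square extends Rectangle {
    // The overrides keep the sides equal, so code written against Rectangle
    // ("set width 4, set height 5, expect area 20") silently gets 25: the
    // subtype no longer substitutes behaviorally for the supertype.
    @Override void setWidth(int w)  { width = w; height = w; }
    @Override void setHeight(int h) { width = h; height = h; }
}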

, inheritance hierarchies as a rule are covariant, meaning that an instance of a derived class is-a (can substitute for) an instance of its parent class.

Due to LSP, the hierarchies are only covariant if the method parameters (and writable members) are contravariant and return values (and readable members) are covariant.
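Java itself illustrates half of this rule: an override may narrow the return type (covariant returns), while widening a parameter type is not treated as overriding at all. A small sketch of my own:

class Animal {
    Animal reproduce() { return new Animal(); }
}

class Cat extends Animal {
    // Covariant return type: the override may promise something narrower.
    @Override Cat reproduce() { return new Cat(); }

    // Note: a method taking a *wider* (contravariant) parameter type would
    // not override anything in Java; it would merely overload.
}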

By "derived" you mean it augments and does not remove interfaces, i.e. it makes the type less generalized (narrower) and more specific (less wide). Covariance preserves the order of the hierarchy of interface generality, and contravariance reverses that order.

And "substitute for" includes assignment or taking a reference, which are read and write type conversion operations from the type of what is being assigned or referenced to the type being assigned to.

As for references, read-only references are covariant, as expected; a read-only reference which points to an instance of a subtype can also be assumed to refer to the supertype, without impacting type safety.

Conversely, write-only refs must be contravariant: a write-only ref to a subtype cannot accept an instance of the supertype, but a ref to a supertype can always accept instances of a subtype.

If you are referring to the r/w attribute of the reference identifier, then assignment (substitution) can never occur if the left-hand operand is not writable and the right-hand operand is not readable. Thus such an operation is always covariant, as it agrees with the hierarchical order of the typing system.

If instead you are referring to the r/w attribute of what the reference points to, then I agree.

Edit: what I had originally written below this point was incorrect.

Covariance and contravariance for more complex data types can be derived by repeatedly applying these rules. For instance, a function which takes a write-only ref to T as an argument is covariant in T: a function which takes a write-only ref to Sub and writes to it can always be assumed to take a ref to Super instead.

I cannot parse that well. My explanation is in the very first link of this post.

Btw, referential transparency can be a significant benefit, because it enables all covariant type conversion.

Tim Sweeney wrote:

But there is a solution that is more fundamental still: to separate arrays from mutable references entirely, so that arrays are always covariant. Then the type of arrays of integers is a subtype of the type of arrays of numbers, and the true source of the invariance is exposed: it is the reference type constructor (like IORef t in Haskell) that is invariant.

If Sweeney means the references to the parametrized types (e.g. arrays), which means they can take a reference to a subtype, then yes, eliminating references and using only literals can never incur the unsafe type conversions. But isn't that a rather meaningless assertion? If we can't take a reference, then what is the utility of subtyping?

Perhaps the notion that explicit constructors are harmful (Gilad Bracha) applies.

It appears to me that Sweeney perhaps missed the point that what makes Haskell covariant for parametrized types is referential transparency, i.e. that function arguments are read-only.

Here are my rough thoughts on how Haskell does have subtyping and why (its) covariant substitution is always due to immutability.

Orthogonal to type-erasure

I was originally a type-erasure fan, but these days I am not so sure. I hope this turns into a fruitful discussion that helps me decide...

Ignoring technical artefacts of the two platforms in question, is the adoption of wildcards really related to the choice for or against type erasure? Wildcards are just crippled existential quantification (with ad-hoc restrictions, like "capturing" only at top level, due to the desire to leave quantifiers implicit). I see no fundamental reason why existential quantification couldn't be supported in a type-passing framework like the CLR.
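For readers less familiar with the Java side, List<?> is exactly such an implicit existential, and "capture" is what briefly names the hidden witness type. A quick sketch (mine, not from the comment above):

import java.util.List;

class WildcardsAsExistentials {
    // List<?> reads as "there exists some T such that this is a List<T>".
    static int length(List<?> xs) { return xs.size(); }

    // Capture conversion names that hidden T for the duration of one call,
    // which is what lets a helper move elements around within the same list:
    static <T> void swap(List<T> xs, int i, int j) {
        T tmp = xs.get(i);
        xs.set(i, xs.get(j));
        xs.set(j, tmp);
    }

    static void swapFirstTwo(List<?> xs) {
        swap(xs, 0, 1);   // the wildcard is captured as a fresh type variable
    }
}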

Type deducible from call stack...

I am still going through the Java paper. Something I don't get [well, my life is like that ;-)]

In the cited paper on page 6 they discuss functional language parametric polymorphism.

This is what I understand from it for now:

They discuss a list example, go on to ML's value restriction, and then argue that, because of the side effects of an impure language, there are not enough static guarantees that an application of a polymorphic function will be type-safe (because in a functional language this information is derivable from the stack?).

Que? This seems untrue to me. Should I read this as: we don't have enough information at JVM-class-load time to prove type consistency of the whole program? What is this stack stuff?

Maybe they meant function call graph.

I agree... The stack is a runtime thing and cannot be used for static analysis. I think they might have meant function call graph, which is like a static description of all possible runtime stacks.

Makes more sense

That would make more sense, at least to me. But then, assignments making type inference hard? They must have meant something else.

[edit: snipped a comment everybody knows]

Mutability?

I also don't know what the exact type stuff has to do with mutability. I thought the reason you can't statically determine the exact type of something in an OO language was subtyping:

Shape f(boolean x) {
    if (x)
        return new Square();
    else
        return new Circle();
}

mutability (somewhat) leads to the value restriction

The problem isn't in determining exact types; it's in whether you can generalize an expression to have a polymorphic type and retain a sound type system.

Essentially, people came up with a rule (for inferring a polymorphic type) that worked when you didn't have mutation, but then when they added mutation, that rule was no longer sound.

So Andrew Wright suggested the value restriction, which most ML implementations have adopted, I believe.

http://citeseer.csail.mit.edu/wright93polymorphism.html

(Other solutions were proposed in the research literature of that time, such as adopting an effect system to retain soundness. But the value restriction is really quite understandable, and the remarkable thing is that most programs already conformed to it.)

I didn't read the Java paper closely enough to know whether they were talking about this particular detail or not. But this is one place where mutability comes into play with type inference.

Ouch - generalization

Ha! Thanks for the reference. I noted and somewhat understood OCaml's odd behavior but didn't read the paper with the design decisions.

Still, it isn't about mutability, right? It seems it is the same question as: Can I stick \x -> x + 1 into a list of type [a->a]? (That was the thing I edited away above)

Dated 93, hmmm, I guess most people nowadays call this value restriction, well, a feature?

Hmpf

Ah, now I get it. They were discussing exact typing vs subtyping in programming languages [which also allow polymorphic types]. Seems to me that they got some stuff wrong in their reasoning though; as far as I know, you do have static guarantees in subtyped languages [even with polymorphism] and you don't need to carry around all type information [at run time]. Ah well, whatever.

[edit: didn't get it, it's about generalization, see argument above by felix]