Subsumption at all costs

Gilad Bracha gives a pretty compelling argument with good examples on why we should favor subsumption over other things (ADT/inheritance) in "Subsuming Packages and other Stories".

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

It turns out...

It turns out that you cannot have ADTs, inheritance and subsumption in one language.

Is there another paper that argues this claim? I'm not sure you can reach this conclusion from a few examples all involving name shadowing.

It has nothing to do with name shadowing

It simply falls out from the fact ADTs use type abstraction and inheritance uses procedural abstraction.

There are other good points Gilad has made in the past not mentioned in this blog post. When he was at Sun, he wrote a blog post about "Modular Development for Development", where he gave some examples of how the Java package system, while sufficient for most needs, is actually quite lacking when you want to do Jigsaw-like combinator-like functionality. In fact, it's also verbose and littered with corner cases where you simply can't do something, such as when you want a package of packages but don't want to expose subpackage's implementation details to one another.

You can still have subsumption and inheritance in one language, and then provide type abstraction via pluggable types - in this way you don't have to restrict your use cases to ADTs. This is a part of Gilad's design preferences most don't see.

The issue is name shadowing

Here's the argument he gives (from the comments) that allegedly shows that you cannot have ADTs, inheritance, and subsumption:

Say you have ADT_S with abstract type S, with private feature m of type int. You extend this with ADT_T with abstract type T. T has feature m of type String. No contradiction, as ADT_T cannot see the unexposed feature m of S. Now suppose you add

foo(T t){return t.m;}

to ADT_S. What is the type of the returned value? It seems obvious that it is String. But t also has type S by subsumption, and so one can prove that t.m has type int.

In other words:

class S {
    private int m;
    foo(T t){return t.m;}
}
class T extends S 
{
    public string m;
}

There are two unrelated members named 'm' ("as ADT_T cannot see the unexposed feature m of S"). The problem is that Gilad assumes that the reference to "t.m" will resolve to the public member of T, shadowing the member of S -- presumably because that's what Java does. But what if this was simply a compiler error because it cannot be determined which 'm' is intended, and some form of ambiguity resolution was required? Then any danger of silently violating subsumption is averted.

Maybe there's some other argument that can be made, but the flaw in the one presented here seems pretty obvious to me.

I didn't get everything

Case 1 (and the other similar cases) is caused by overloading, there is a well-know 'tension' between overloading and inheritance.

Case 3 is interesting and IMHO a bug in Java specification: C shouldn't have access to foo!

It seems that Java sins by having the same

It seems that Java sins by having the same syntax for instance members (a.foo) and class members (A::foo).

In case 3, I agree this should not be allowed. Separate packages means there should be no coupling of the sort.

I am most interested in case 4. Is it accepted practice that a casting allows to bypass polymorphism?

{
((A) new B()).f(); // 42
new B().f(); // “!”
}

How can the first line return anything other than "!"?

A violation of subsumption or of something else?

The field shadowing problem is interesting. But I'm not totally convinced that it is a violation of subsumption. Bracha's "case 1" example:

class S { final int f = 42;}
class T extends S { final String f = “!”;}

new T().f; // “!”
((S) new T()).f; // 42

The first field access is for a field named "T.f" and the second is for a field named "S.f". They are different fields. The first part of the field name ("S." or "T.") is determined by the static type of the expression, which is fully under programmer control.

I see how this can be confusing, but it seems wrong to categorize it as a violation of subsumption.

What if you pass an instance

What if you pass an instance of S to code operating on a T? The old field is accessed, not the new field as some might expect.

The coercion S->T also happens implicitly, so you could argue that this is the source of the problem, but this shadowing behaviour is inconsistent with how methods overriding is treated in these languages, so a case for a violation of subsumption is certainly possible.

What is "semantic substitution"?

Taking another look at this, I think the problem here is that "semantic substitution" hasn't been defined. It seems like Bracha's definition is along the lines of simple context-ignorant AST substitution, which doesn't correctly compensate for the fact that Java uses static type information to perform slot lookup.

That's a reasonable definition of substitution, since it's probably close to how the human brain works. But this just means that the root problem is how the language's definition of safe substitution matches up with the programmer's intuition -- a fuzzier problem. The subsumption behavior is just a symptom.

And even if...

...you decide to call that subsumption failure, you could always choose to solve the problem by e.g. having a different syntax for accessing an object through its public interface than is used to access fields with local knowledge. Then public access would satisfy the subsumption property.

In any event, the claim I quoted above doesn't look warranted to me.

Strange conclusion for case 1

For the case 1, the conclusion shouldn't be "It would be much better to ensure that fields are always private, or better yet, avoid field references altogether, as in Self or Newspeak.", but that if you allow this then the language should enforce that T.f's type is a subtype of S.f so that the substition principle is preserved.

Subsumption

The real issue is shadowing of member names and type-dependent namespacing for lookup. In Java (or any of the other languages in the same class), when you write o.a, it really means o.a_of_static_class_of(o).

I agree that this semantics is ugly and confusing. It implies that the operational semantics of the language is type-dependent, even for basic operations, and erasure becomes difficult (that is the root cause for the implementation difficulties he was mentioning).

I am not sure it violates the substitution principle, though. True, technically, you cannot simply beta-substitute objects, as that might change the static type to a subtype. But that is not necessarily the same as subsumption -- because subtyping can be coercive (and is so in these languages). And even though Liskov named it the Substitution Principle, she was talking about subtyping.

Shadowing is inevitable

The issue is that in the presence of shadowed member names, you really only have two options:

  1. Declare by fiat that separately-declared members with the same name are the same member.
  2. Use other contextual information to disambiguate references.

Bracha argues that the second of these leads to a violation of subsumption, but the first is a pretty clear violation too. So shadowing of names leads to ugly semantics no matter what approach you take.

You obviously can't systematically rule out name shadowing in a large or distributed project, so the question is really how often it happens in practice, and what you do to mitigate it. I wrote about this a bit on my blog a while back.

The most direct approach, of course, would be to eliminate inheritance, or at least decouple inheritance from subtyping.

Name clashes are inevitable

Shadowing to me implies that, in context, a name that can be understood as referring to multiple things is resolved to refer to one (the shadowing symbol) over the others (the shadowed symbols). If you deal with name clashes by requiring that the context be amended to remove ambiguity, then IMO you don't have shadowing.

Name clashes are semantically benign. You do need reasonable policies to ensure that they don't lead to programmer error, but IMO this isn't particularly hard (even though it's frequently gotten wrong). If accidental name clash issues are a big problem in your language, then you have poor namespace management facilities in your language. There's nothing deep here.

It sounds like you agree...

Maybe you disagree with my choice of terminology, but you seem to agree with my conclusion: name clashes happen, and you need policies that deal with them. Name clashes are only "semantically benign" when your semantics make them so.

The specific case I am interested in is:

  1. Programmer X creates class A
  2. Programmer Y creates class B, inherits from A, and adds member M
  3. Programmer X (oblivious to Y) modifies class A to have its own member M
Is this "shadowing" or just a "name clash" in your terminology?

Given an object a and a member reference like a.M, what do we expect the behavior to be when a is statically/dynamically of type A or B?

There is great variety in how current languages answer that question. You may argue that there is "nothing deep here," but this is at least a design question with breadth.

You're right

I agree that I was mostly just nitpicking terminology with you. Most of the tone of that post was directed at Gilad Bracha's blog -- not yours. I agree that this is an important issue to get right (and have opinions on how to do it). But I think the linked blog's "ADTs, inheritance, subsumption: choose two" makes the issue sound much more substantial than it actually is.

Regarding the case you're interested in: you've described a name clash. Shadowing would be if one of those members M blocks visibility of the other.