Preventing downcasting and adding a universal variant

I am working on a rather simple object-oriented programming language. Here are some minor differences from Java:

  1. No downcasting (i.e. casting from a super-type to a sub-type [edit: thanks Ohad!])
  2. A universal variant (i.e. can hold any value plus type information)
  3. No virtual functions (dynamic dispatch through interfaces only)

What I am wondering about now (and I wrote a bit more on my blog) is whether preventing downcasting would cause any long-term problems (in other words, has the community reached a consensus on this issue?), and whether a variant type can cause violations of the Liskov substitution principle. Any insights into the problem would be appreciated.


Minor correction

Surely your explanation of downcasting is meant to be the other way around...

Thanks, definitely!

Thanks, it definitely was. Made the correction.

I'm assuming

I'm assuming that a term of the variant type then can have an arbitrary method invoked on it, which will complete if the actual type supports the method, else fail with a suitable exception?

Sounds to me like java.lang.Object plus introspection plus nicer syntax--in other words, duck typing added on top of the usual Java approach.

I'm not sure why you would want to eliminate downcasting. Sometimes, downcasting is a poor substitute for more powerful type queries, and can violate parametricity if not done right. On the other hand, there are many cases where an abstraction requires polymorphism, but not any old Object will do--it's far nicer to be able to declare exactly what the problem domain requires, rather than accepting any old Object and not failing until an exception is raised when an operation cannot be performed.
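
For instance, something like this hypothetical Java sketch declares exactly what the operation requires, so there is no Object-typed parameter and no cast to fail:

// A bound on the type parameter states exactly what the domain requires;
// no Object-typed parameter, and no possibility of a cast failure.
static <T extends Comparable<T>> T max(T a, T b) {
    return a.compareTo(b) >= 0 ? a : b;
}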

Parametricity

downcasting is a poor substitute for more powerful type queries, and can violate parametricity if not done right.

Hm, can you elaborate? How can the presence of downcasts (let alone more powerful "type queries") not violate parametricity?

Don't you already know the answer to that...?

Non-Parametric Parametricity,
Georg Neis, Derek Dreyer and Andreas Rossberg, ICFP 2009.

:)

Principia Narcissus

I just like the title of the paper, but to my good fortune, it's also on topic: Principia Narcissus.

Language vs terms

Well, no. :-) That paper only shows that you can work around the lack of parametricity in the language and reestablish parametricity for individual expressions. It does not give you a way to define the semantics of cast such that the entire language would remain parametric. I asked because I thought Scott was talking about the language feature in general, not just individual uses, but maybe I misunderstood.

I'm lost

There is just too much back story here that I am missing. For example, what do you both mean by "parametricity"? How can one violate it, and why do people care?

Parametricity...

So, one of the most cliched bits of wisdom in the software field is that when we use a library, we should program strictly to the interface, and not write any code that depends on details of the underlying implementation.

The reason we repeat this advice over and over is because it's what lets us develop programs in a modular way -- we can swap out one implementation of a library with another, without disturbing the behavior of the rest of the system. So if we program to the finite map interface, we can replace the hash table implementation without breaking all its clients.

Languages can help support this style of modular programming, if they support the construction of abstract data types: if you allow modules to export types whose implementation is hidden, then by construction client programs can't rely on the exact implementation type. The only way they can access the hidden type is by the module operations that manipulate it.

The PL jargon for this property is "parametricity", because we want client programs to be parameterized in the implementation of a library.

In an OO language, a class exposes a type, and a set of operations on that type, without exposing the fields (the representation). However, downcasts, and type analysis in general, break this property. A client program can use type analysis to figure out how the abstract type was implemented, and branch on that implementation. If you can write a client program that depends on whether the finite map is implemented with a hash table or a binary tree, then you can't switch between those two implementations without changing the behavior of the client. In this way, a client program can circumvent the information hiding mechanisms of the language.
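
For instance, here's a hypothetical Java sketch (FiniteMap and the two implementing classes are made-up names) of a client using type analysis to pierce the abstraction:

interface FiniteMap { Integer lookup(String key); }
class HashFiniteMap implements FiniteMap { public Integer lookup(String k) { return null; } }
class TreeFiniteMap implements FiniteMap { public Integer lookup(String k) { return null; } }

class Client {
    // A non-parametric client: its answer depends on which implementation
    // it is handed, not just on the FiniteMap interface, so swapping
    // implementations changes observable behavior.
    static String describe(FiniteMap m) {
        if (m instanceof HashFiniteMap)
            return "backed by a hash table";
        return "backed by something else";
    }
}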

parametricity could still be preserved with downcast

If inheritance can be classified as private or public (as in C++) and downcasts respect this classification (as dynamic_cast does), then parametricity can still be preserved.

class Public_A { public: virtual ~Public_A() {} };  // polymorphic, as dynamic_cast requires
class Public_B : public Public_A { /* ... */ };

class Private_B_1 : private Public_B {};
class Private_B_2 : private Public_B {};

// user code
Public_A *a = /* ... */;
dynamic_cast<Public_B *>(a);     // allowed
dynamic_cast<Private_B_1 *>(a);  // fails (yields null): Public_B is a private base

Regardless of public/private inheritance, implementation classes should not be in the public interface anyway; thus, the module user should not be able to observe this detail in the first place.

Security

Neel has, of course, done a great job answering the question. I would only like to call attention to James Iry's link to Principia Narcissus, which addresses the tension between type-directed programming and parametricity with an eye towards parametricity's role in ensuring data privacy and integrity, and Washburn and Weirich's related Generalizing Parametricity Using Information Flow. I also recommend taking a look at Type-checking zero knowledge, in which IMHO the key motivating sentence is "Zero-knowledge proofs are given dependent types where the messages kept secret by the proof are existentially quantified in the logic." In case the connection isn't obvious, see Mitchell and Plotkin's Abstract Types Have Existential Type.

To summarize, parametricity is important from a software engineering perspective in order to prevent changing implementations of an interface from blocking progress in executing clients of the interface, and from a security perspective in order to prevent private information from leaking across an interface. That is, ensuring progress and security are intimately related, an insight that is perhaps found most explicitly in Mark Miller's Ph.D. thesis.

I'm assuming that a term of

I'm assuming that a term of the variant type then can have an arbitrary method invoked on it, which will complete if the actual type supports the method, else fail with a suitable exception?

This is the plan: working with only the variant type, we can program as if using a fully dynamic language.
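
For a rough idea of the semantics, here is a sketch using Java reflection (the Variant class and its by-name invoke are made-up illustrations; it matches methods by arity only):

import java.lang.reflect.Method;

// Invoke a method by name on the wrapped value: completes if the dynamic
// type supports it, otherwise fails with an exception.
class Variant {
    private final Object value;
    Variant(Object value) { this.value = value; }

    Object invoke(String name, Object... args) {
        try {
            for (Method m : value.getClass().getMethods())
                if (m.getName().equals(name) && m.getParameterCount() == args.length)
                    return m.invoke(value, args);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
        throw new RuntimeException("value does not understand: " + name);
    }
}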

Sounds to me like java.lang.Object plus introspection plus nicer syntax

Close, but not quite. Consider a class D derived from a base class B. In Java, a function f(B b) { ... } can downcast the B to a D. This would be explicitly impossible in my proposed system. By forcing an upcast, the original object's type information cannot be retrieved by any means. Any casting to a variant from that point on would cause the variant to see the object only as an object of type "B", never of "D".

Okay, so the next question is "why these constraints?" I believe it makes the language more tractable and easier to analyze (for humans and tools) if you can be assured that an object will only ever be used as its declared type (and not arbitrarily cast to some other type). This is admittedly the part of the question I am unsure about.

... rather than accepting any old Object and not failing until an exception is raised when an operation cannot be performed.

I agree with this sentiment. I am thinking that, to achieve this while maintaining the goals above, it makes sense to include algebraic data types.

Now I'm a bit confused

If the variant is universal (i.e. can hold any Object), then the only operations which can be safely performed on it are those defined for Object, no? While Objects in Java can do a lot... doing most useful things requires a term to be of a stronger type than Object. And in the absence of static analysis that can prove a strengthening operation valid, you need to downcast. (Or do an unchecked downcast, which is no better and in many ways worse.)

Or is your variant capable of parameterization on an interface---i.e. you can cast a D to a var<B> iff D implements B; and the resulting variant is capable of invoking methods defined in B? How does that differ from a simple reference of type B?

I can see the reason to restrict implementation inheritance, and maintain a cleaner separation between concrete classes (which are at the leaves of the subtyping graph) and interfaces. My current toy language project does that. :) OTOH, I'm not restricting downcasting in any way.

One thing you might consider--a wrap<T> type, which takes an object (or variant) of type T and creates a "blind proxy" that will delegate any method calls on the object that match the interface of T, but does not permit downcasting or other insufficiently parametric operations.
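
Something along these lines, perhaps (a minimal sketch using java.lang.reflect.Proxy; Wrap and wrap are made-up names, and it handles interface types only):

import java.lang.reflect.Proxy;

// A "blind proxy": forwards calls declared on the interface iface to the
// hidden target. The proxy's class implements only iface, so a downcast
// to the target's concrete class fails.
final class Wrap {
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(),
            new Class<?>[] { iface },
            (proxy, method, args) -> method.invoke(target, args));
    }
}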

Edited (twice) because I forgot that Java-style template syntax looks like markup to LtU....

Variant casts aren't downcasts

If the variant is universal (i.e. can hold any Object), then the only operations which can be safely performed on it are those defined for Object, no?

By "safely" you mean can't throw an exception, right? Well the variant can throw an exception for any operation. As long as people know what they are getting into by using them I consider it fine. There are after all lots of happy dynamic language users out there.

The variant is a rule-breaker in my language. Only the variant carries type information (e.g. "are you an instance of T?") and allows casts (e.g. "cast to T") which may throw exceptions at run-time (e.g. "I am not a T").

Or is your variant capable of parameterization on an interface---i.e. you can cast a D to a var<B> iff D implements B; and the resulting variant is capable of invoking methods defined in B?

No. What I propose for now is a simple runtime-checked variant that can hold anything and will support dynamic function lookup.

What you could do is the following:

void f(variant v) {
  try {
    B b = v.cast<B>();  // run-time checked projection out of the variant
    // ... use b ...
  }
  catch {
    Console.WriteLine("v does not hold an instance of B");
  }
}
Doing most useful things requires a term to be of a stronger type than Object.

I agree, but in an OOPL you can avoid using "Object" (which is the analogue of my proposed variant) by careful design much of the time.

What is confusing here is that the variant is being equated with "Object". An "Object" is a super-type of all objects, but I don't consider "variant" to be one. This is because no type other than variant supports "is_type_of" or "cast_to". So technically, casting out of a variant is not a downcast like it is in Java (note: I don't even know what to call it).

One goal of my proposed system is that during static analysis you can't end up with a value of type B and wonder if it is going to be cast to a D at some point; that simply cannot happen (even if you cast it to a variant). So if your code is correct with regard to values of type B, you aren't going to screw things up by passing in a value of type D.

As far as I can tell this is going to lend some integrity to the type system (less chance of violating LSP), while providing a degree of dynamicism (is that a word?) to programmers if desired.

Some idle thoughts

I believe Boo has a type called "Duck" which works more or less as you describe: http://boo.codehaus.org/Duck+Typing. I'm not a Boo guy, though, so I might be missing a subtlety of what Boo does.

If you are going to have this dynamic type, it might be worthwhile to have runtime type switching a la Scala instead of try/cast/catch.
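
Something like the following, sketched in Java 21's pattern-switch syntax, with Object standing in for the hypothetical variant:

// Type-case on the variant's contents in one construct, instead of a
// chain of try/cast/catch blocks.
static String describe(Object v) {
    return switch (v) {
        case Integer i -> "an int: " + i;
        case String s  -> "a string: " + s;
        default        -> "something else";
    };
}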

It's also worth reading up on Wadler's Well-Typed Programs Can't Be Blamed, about the boundaries between dynamic and static typing.

What is confusing here is

What is confusing here is that the variant is being equated with "Object". An "Object" is a super-type of all objects, but I don't consider "variant" to be one.

Ok, so you want to build an OO language that has no default universal super type, in which injection into and projection from a universal type would be manual operations. I agree that this would be an improvement on current OO languages. Virgil also lacks a universal super type.

This was part of my confusion

I was assuming automatic injection, which would make your variant essentially the same as Object.

Of course, the projection operation from variant to a concrete type is fundamentally equivalent to a downcast--it's an operation that can fail, and it uses runtime information to perform a strengthening operation that cannot, in the general case, safely be performed at compile-time. (And the injection operation, like automatic subsumption, aka upcasting, in OO languages, is information-lossy.)

One thing that Java does, which I consider a (minor) design mistake, is that Object is non-abstract. "new Object" is perfectly valid Java, and returns a lowercase-o object which is useless apart from its identity (a name, essentially). While this facility itself is occasionally useful, it should have been provided in some other (leaf) class, and Object should have been abstract.

I was assuming automatic

I was assuming automatic injection, which would make your variant essentially the same as Object.

I think it's still a little different, even if you had implicit casting. Consider the following, with D deriving B:

variant foo(B b) {
    return b; // implicit injection, but it captures the static type-tag B
}
void main() {
    variant v = foo(new D());
    bool isD = v.is_type<D>(); // false: the tag was fixed at the upcast to B
}

You could also allow a safe downcast form...

that is, you could require any use of a downcast to come paired with an error handler for the case of the test failing:

    let downcast x : C = e in   (* downcast e to class C, and name that x *)
      body 
    otherwise                   (* in case the cast fails *)
      handler_body 

Is it fair to explicitly raise

BadCastException or #doesNotUnderstand or whatever in the handler body? :)

Many attempts to limit the partiality of functions, which I suspect is what motivates the OP, merely push the complexity onto the caller. Sometimes, a partial function becomes effectively total with a well-behaved caller.

And, unless you ban exception mechanisms altogether, any partial function can be made to look total by raising one.

Valid point

This is definitely a valid approach.

I am trying to limit casting exceptions to functions that deal with variants. I believe this would make static analysis simpler and the code easier to understand. Basically: if you see variants, expect potential casting exceptions and possible violations of the Liskov substitution principle.

I should give a better example of what I am trying to avoid. Consider the following code:

  bool f(B b) {
    try {
      D d = downcast<D>(b); // throws unless b is actually a D
      return true;
    }
    catch {
      return false;
    }
  }

This function can be called with any argument of type "B", but it returns true only when the argument is actually a "D". The problem with this code is that f violates the LSP: if you pass it an object of type "D", a subtype of "B", the function changes behavior.

This is why I think that preventing downcasts could make a language easier to analyze, understand, and optimize.

Why do people need downcasting in the first place?

Since I'm not a big fan of runtime errors, I wonder what all the reasons are that lead us to need downcasting at all. Is it something lacking in our languages, or is it something inherent in the real-world problems we are trying to solve? Thanks for any thoughts/insight/edification.

Incompleteness

Type systems are inherently incomplete, thanks to the halting problem. Therefore, there will always be situations where the programmer knows that some situation is not possible, but the type system cannot prove it. If your language isn't total, then there will be the opportunity for a possibly-failing runtime check to circumvent the type system here. Downcasting is just one way to write that. (Incomplete pattern matches, exceptions, and anything else that raises an error can be used here too.)
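
For a mundane (hypothetical) Java example:

import java.util.ArrayList;
import java.util.List;

class Example {
    public static void main(String[] args) {
        // The programmer knows xs only ever holds Strings, but at the
        // static type List<Object> the type system cannot prove it, so a
        // possibly-failing checked downcast is needed.
        List<Object> xs = new ArrayList<>();
        xs.add("hello");
        String s = (String) xs.get(0); // fine here, unprovable in general
        System.out.println(s.length());
    }
}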

Downcasting

Programmers in languages of the Java ilk seem to use downcasting either as a poor man's algebraic-datatype dispatch or as a way to work around a limitation of the static type system.

Bang on

This is it. In a Java-ish language, downcasting seems to be the path of least resistance for implementing certain kinds of algorithms (such as parse-tree walkers).
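
For instance, a tree walker in a Java-ish language often ends up looking something like this hypothetical sketch, where each instanceof-plus-downcast pair plays the role of a pattern-match arm:

abstract class Expr {}
class Lit extends Expr { final int value; Lit(int v) { value = v; } }
class Add extends Expr { final Expr left, right; Add(Expr l, Expr r) { left = l; right = r; } }

class Eval {
    // Downcasting as a poor man's algebraic-datatype dispatch.
    static int eval(Expr e) {
        if (e instanceof Lit) return ((Lit) e).value;
        if (e instanceof Add) {
            Add a = (Add) e;
            return eval(a.left) + eval(a.right);
        }
        throw new IllegalArgumentException("unhandled node type");
    }
}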

Of course...

in functional programming languages, it could be said, algebraic datatypes are the path of least resistance for implementing certain kinds of algorithms (such as GUIs) that are more cleanly handled with objects. :)

Occurrence Typing

You could use occurrence typing, as in Typed Scheme (Tobin-Hochstadt & Felleisen, POPL 08), to ensure that all downcasts are safe.
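
The flavor of it, very roughly, is flow-sensitive narrowing--here rendered with Java 16+ pattern instanceof for illustration (Typed Scheme does this with arbitrary predicates, not just type tests):

// Inside the true branch, the checker knows o is a String, so the "cast"
// is proven safe by the control flow rather than checked at run time.
static int length(Object o) {
    if (o instanceof String s)
        return s.length();
    return 0;
}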

Neat stuff.

I assume you mean The Design and Implementation of Typed Scheme. Looks very promising. I'll look into it.

I'm thick

I'm (still, haven't quite given up yet) trying to read that and grok its relation to down-casting, but it is slow going for me. Is the idea that the compiler can prove that you aren't allowed to do a particular down-cast?

Much appreciated

Thanks a lot to everyone who has contributed to this thread. It has been very illuminating. There is a lot of information to process! :)