Implicit conversion (subtyping) vs Explicit conversion

In the design discussions for Zen (see: http://lambda-the-ultimate.org/node/5376) we have hit an important decision point, and I wanted to get opinions from the PL community about which approach they think is best. The choice is between:

- Implicit conversions with explicit invariance
- Implicit invariance with explicit conversions

By way of an example, given a type like Set<Int>|Set<Float>, we could implicitly convert it to Set<Int|Float> where required (so Set<Int>|Set<Float> <: Set<Int|Float>); a sketch of the two readings follows the questions below.

- Could this cause problems where an algorithm relies on the property that all elements of the collection have the same type?
- How does this interact with mutability? Is it only sound to implicitly convert with immutable collections?
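
To make the two readings concrete, here is a minimal sketch in Scala 3 (which has union types); the names are invented for illustration and are not Zen syntax. Scala's immutable Set happens to be invariant, so even the sound direction has to be written explicitly here.

// Set[Int] | Set[Float] says "all elements share one type";
// Set[Int | Float] says "elements may be mixed freely".
def sumAll(xs: Set[Int | Float]): Double =
  xs.toList.map {
    case i: Int   => i.toDouble
    case f: Float => f.toDouble
  }.sum

val ints: Set[Int] = Set(1, 2, 3)

// Explicit conversion: widen Set[Int] to Set[Int | Float] at the call site.
sumAll(ints.toSet[Int | Float])

The question in the post is whether the language should insert that widening automatically, i.e. treat Set<Int>|Set<Float> as a subtype of Set<Int|Float>.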

Then there is the question of boilerplate. With implicit conversion we need an annotation to say 'do not implicitly convert'; by contrast, forcing explicit conversion means we need an annotation (a function call) to perform the conversion.

- Which option results in the most boilerplate depends on how often each situation occurs. Do people find they want to convert types (subsumption/subtyping, etc.) more often than they want types to remain fixed?

I would greatly appreciate views from the community on their experiences of designing languages with these features, or using languages with either approach.

Always do it for immutable collections

This conversion is completely unsound for mutable collections. If we pass a MutableSet<Int> to a procedure expecting a MutableSet<Int|Float>, then it will feel free to add a Float to it, which violates its original invariant.
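
A hedged sketch of that failure in Scala 3, whose mutable.Set is invariant precisely so that the compiler rejects the call:

import scala.collection.mutable

def addHalf(s: mutable.Set[Int | Float]): Unit =
  s += 0.5f                      // perfectly legal for the declared parameter type

val onlyInts = mutable.Set[Int](1, 2, 3)
// addHalf(onlyInts)             // rejected by the compiler; if it were allowed,
//                               // onlyInts would now contain 0.5f and its
//                               // "all Int" invariant would be silently broken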

But for immutable collections, it is always sound. A procedure that is prepared to accept either ints or floats in the set it receives should be fine with a set that is all ints or all floats. I see no reason not to permit this conversion always.

By the same token, it should be possible to pass either an ImmutableSet<Int> or an ImmutableSet<Float> where an ImmutableSet<Int|Float> is allowed. This coercion is what Algol 68 called uniting (it was the first language to introduce the word "union" for alternative types).
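
For comparison, Scala 3's covariant immutable List gives exactly this "uniting" behaviour with no annotation at all (List stands in for an immutable set here, since Scala's immutable Set is invariant):

def describe(xs: List[Int | Float]): String = xs.mkString(", ")

describe(List(1, 2, 3))        // List[Int]   <: List[Int | Float], accepted implicitly
describe(List(0.5f, 1.5f))     // List[Float] <: List[Int | Float], accepted implicitly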

I see no reason not to

I see no reason not to permit this conversion always.

If the set values are unboxed and the types are not size-compatible, this conversion simply won't work.

Not subtyping

So we cannot allow the type system to subsume the types; that makes sense to me.

There is another option, implicit conversion: a compiler pass resolves the type mismatch by automatically inserting a conversion, as some languages do with numeric types.

A third option is to automatically generate a wrapper type class that allows a read of "Int|String" to work by boxing the read value.
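
As a rough illustration of that third option (sketched in Scala 3 with invented names; the Zen design may differ), the generated wrapper could be a read-only view over whichever alternative is present, boxing each element into the union on the way out:

// A read-only view over Set[Int] | Set[String]; reads produce Int | String,
// and there is deliberately no `add`, so the wrapped set's homogeneity
// cannot be violated through the view.
final class UnionReadView(underlying: Either[Set[Int], Set[String]]):
  def size: Int = underlying.fold(_.size, _.size)
  def foreach(f: (Int | String) => Unit): Unit = underlying match
    case Left(ints)  => ints.foreach(f)
    case Right(strs) => strs.foreach(f)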

This is the set situation in

This is the set situation in my language, if it helps. This would be a set definition:

/*
  The presence of '@' denotes existing variables; its absence
  denotes new variables. I write the function operator backwards
  (<-) to achieve a data-centric solution. Existing variables are
  expected to be applied later as parameters through pattern
  matching. New variables don't require application and
  automatically lift one level up when their children are applied.
  I introduced meta sections, filled in between '<' and '>',
  that accept only valid types.
*/

Set <- (
    (
        Item <- <@T>,
        Next <- (@Set @T | ())
    ) <- (T <- @Type)
)

Now if we want to instantiate a set we can do it by first providing a type as the first parameter:

UnmutableSet <- @Set @Int,
MutableSet   <- @Set (@Int | @Float)


Then we can provide elements by pattern matching:

X <- @UnmutableSet (1, 2, 4)


Now it would be perfectly safe to apply the elements of the unmutable set X to construct a mutable set with the same elements:

Y <- @MutableSet @X


In this case the implicit conversion works.

Let's see the other side by redefining our unmutable set:

UnmutableSet <- (@Set @Int | @Set @Float)


now we can write:

X <- @UnmutableSet (1, 2, 4)


or

X <- @UnmutableSet (0.1, 0.2, 0.4)


but in neither case can X contain both Int and Float elements. I think I am dealing only with unmutable sets in this example too, so the situation is again safe: it is either a set of ints or a set of floats, never mixed. Again, I can do the following no matter what X is, a set of ints or a set of floats:

Y <- @MutableSet @X


I hope I didn't miss the point :/

General Principle

I propose a general principle: almost all conversions should be explicit initially. Then, very carefully, add implicit conversions to simplify the concrete syntax, but only when you have use cases to add to the regression test suite, and only when the implicit conversions act together soundly. (1)

As a guide, favour isomorphisms and natural transforms for implicit conversions.

What you make implicit is, to quite some degree, what distinguishes your language from others. Providing explicit conversions, even when they are not commonly written out, has useful consequences: there is a way to make a complex combination more explicit, and, for value conversions, the explicit form may sometimes usefully be promoted to a function.
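
For instance, Scala 3's Conversion type class makes that progression concrete (the Meters/Millimeters types here are invented for illustration): the conversion starts life as an ordinary function and is only later, and locally, promoted to an implicit one.

import scala.language.implicitConversions

final case class Meters(value: Double)
final case class Millimeters(value: Double)

// Step 1: an ordinary, explicit, nameable function.
def metersToMillimeters(m: Meters): Millimeters = Millimeters(m.value * 1000)

// Step 2: promoted to an implicit conversion once its use cases are understood;
// the explicit spelling remains available at every call site.
given Conversion[Meters, Millimeters] = m => metersToMillimeters(m)

def drill(depth: Millimeters): Unit = println(s"drilling to ${depth.value} mm")

@main def demo(): Unit =
  drill(metersToMillimeters(Meters(1.5)))  // explicit
  drill(Meters(1.5))                       // implicit, via the given Conversion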

However, there are some implicit conversions which are hard not to build into a language. One of these is the conversion of functions to closures and the corresponding application operators. It's quite common to implicitly convert a function in the function position of an application to a closure and use the closure application operation in the concrete syntax, and to convert a function used in argument position to a closure as well.

Another common implicit conversion is eta-expansion. Some languages, such as Felix, implicitly convert type constructors to function closures this way; OCaml, for example, does not.
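
In Scala 3, for example, eta-expansion of methods (including a constructor's apply) to function values happens implicitly:

def twice(x: Int): Int = x * 2

val asClosure: Int => Int = twice                             // method implicitly eta-expanded to a closure
val boxed: List[Option[Int]] = List(1, 2, 3).map(Some.apply)  // constructor's apply, eta-expanded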

Almost universally, specialisation is implicit: when applying a function to an argument, the parameter type of the function is specialised to match the argument. However, this is not always possible, especially if you have type inference. A good example is OCaml polymorphic variants, which sometimes require explicit conversions as well as explicit type annotations.
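
A trivial Scala sketch of that implicit specialisation:

def wrap[A](a: A): List[A] = List(a)

val ws = wrap("hello")   // A is specialised to String with no annotation
val wi = wrap(42)        // A is specialised to Int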

So again, as a general rule, start with as much explicit as you can, if only to ensure there is in fact an explicit way of expressing what you intend to make implicit. I think you may be surprised how much is already implicit, and finding explicit representations is often quite challenging.

(1) By "soundly" I mean that if the conversions are ambiguous, the semantics are the same. Often several compositions of conversions could be expected when omitted, but the choice has no consequences. Indeed in languages with overloading (such as C++) implicit choices abound and are actually the basis of apparently generic behaviour. One finds quickly this is only apparent when trying to generalise it, and it can certainly be confusing.

This actually leads to another principle: if the program is modified, or the programmer simply makes a mistake, and an error is found, the diagnostic should be able to pinpoint and explain the source of the problem. C++ template errors and HM type inference are two examples of very bad design choices by this principle, since in both cases it is impossible to explain the errors well.

Examples of poor choices

There is no better example of very bad design choices than C++. It supports:

- overloading
- implicit conversions between numeric types
- implicit conversion to bool of both integers and pointers
- implicit conversion of 0 to NULL pointers
- implicitly applied, explicitly defined operator conversions
- implicit overloads of function templates
- implicit specialisation
- implicit conversion of references of references to references
- implicit conversion of non-const to const pointers and references
- implicit constructor conversions
- implicit overloading of functions on assignment
- implicit use of the "this" pointer in methods
- implicit type variables inside template classes
- implicit qualification of names in classes and namespaces
- introduction of implicit qualification with using declarations
- implicit conversion of arrays to pointers
- implicit conversion of functions to function pointers
- implicit conversion of function pointers to function closures
- implicit class-nominal subtyping
- sporadic, silently lazy name binding (dependent name lookup)
- implicit disambiguation of ambiguous parses of some constructions as definitions or declarations
- implicit conversion of pointers to void*

... er, did I leave anything out?

Some of these implicit conversions actually lead to unsoundness in the type system. The most well known is that the implicit conversion of an array to a pointer and the implicit subtyping of pointers together make the type system unsound: an array of Derived converts to a pointer to Base, which, when incremented, moves the wrong distance through the pointed-at array. This in turn shows that the underlying C language is fragile and badly designed, because it is hard to generalise.

Another, less well known unsoundness is that inside a constructor the "this" pointer is non-const even if the value being constructed is bound by a const reference: if the "this" pointer is allowed to escape, the object becomes mutable despite being declared const.

Yet another issue, this time also well known, is the implicit slicing of a value of a derived type on assignment to a variable of base type. The problem is that the variance is wrong: the assignment is unsound. Be wary of this if you have a language with mutable objects (as already mentioned in a previous post). Assignment is invariant; you cannot "subtype" either the LHS or the RHS.

Implicit defined inside the type system

The unsoundness problem is why I prefer to start with a sound type system with no implicit conversions. Implicit conversions can then be added as a type class inside the type system, forcing all conversions to be sound. This reminds me of the difference between intensional and extensional, in that with an intensional equality there is no definition of equality inside the system, whereas with an extensional one there is. I am not very clear whether I am applying the concept of intensional/extensional correctly; does anyone have a clearer understanding of this?
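
One way to picture that, sketched in Scala 3 with an invented Into type class: every conversion is an ordinary instance that the type checker verifies, and nothing converts unless an instance exists and is invoked explicitly.

// A user-defined conversion type class kept inside the type system.
trait Into[A, B]:
  def into(a: A): B

given Into[Int, Double] with
  def into(a: Int): Double = a.toDouble

// An extension method makes the conversion explicit at the call site.
extension [A](a: A)
  def into[B](using c: Into[A, B]): B = c.into(a)

val d: Double = 3.into[Double]   // explicit, checked, and sound by construction of the instance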

intensionally extensional

https://en.wikipedia.org/wiki/Extensional_and_intensional_definitions#Intensional_definition

https://en.wikipedia.org/wiki/Extensionality

I somehow doubt adding conversions to a type class is enough to make them sound. It is only enough to make the typing sound. For all suitably composable conversions f, g, h, k the requirement is f . g = h . k; that is, all squares must commute.

This is typically why, in C++, user-defined implicit conversions used in matching an argument to a parameter are limited to one.

However it is worse: an implicit conversion should probably also be a functor. This means the above square must also commute when any two of the operations are explicit. Roughly, if you add two shorts and then convert to int, you should get the same result as converting to int first and adding two ints. Otherwise the conversion silently changes the semantics (which C is famous for). Yet there is no unsoundness in the typing here; it's the semantics that are unsound.
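
A quick Scala illustration of that square failing to commute when the addition stays at the narrower width (the wrap-around mirrors C's behaviour):

val a: Short = 30000
val b: Short = 30000

// Convert first, then add: the "expected" result.
val convertThenAdd: Int = a.toInt + b.toInt       // 60000

// Add at Short width, then convert: overflow silently changes the answer.
val addThenConvert: Int = (a + b).toShort.toInt   // -5536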

I think this explains why matching an argument to a parameter with polymorphic subsumption is an acceptable implicit conversion: it is an embedding, specified by the MGU of the unification, hence monic.

Subtyping, however, seems risky because it discards information. So I'd tend to require subtyping conversions to be explicit, so the user can always point to where in the code they deliberately threw information away. If, as in C++, you have overloading and implicit subtyping conversions, the semantics can be silently changed by adding an overload, which is known as hijacking. [And yes, hijacking can also happen with type classes, although it seems less likely.]
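
A minimal sketch of hijacking in Scala, using the built-in Int-to-Double widening in place of a subtyping conversion:

object Before:
  def describe(d: Double): String = "double"
  val r = describe(1)     // only one overload: 1 widens to Double, r == "double"

object After:
  def describe(d: Double): String = "double"
  def describe(i: Int): String = "int"
  val r = describe(1)     // the new overload wins: r == "int", though no call site changed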

Recursive and Generative, Soundness and Completeness

I am not sure how those definitions apply to a recursive and generative definition. For example, if I say that 0 is a natural number and that for every natural number n, n+1 is also a natural number, would that be extensional? I am listing all the members, even though I am doing so algorithmically.
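
As code, that generative definition might look like the following Scala sketch; the two constructors enumerate every member from inside the system.

// An inductive (generative) definition: Zero is a Nat, and Succ of a Nat is a Nat.
enum Nat:
  case Zero
  case Succ(n: Nat)

val three: Nat = Nat.Succ(Nat.Succ(Nat.Succ(Nat.Zero)))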

If so, this corresponds to my definition of "within the system", as we must be able to state both "0" and "+1" inside the logic in which we are making the extensional definition.

On the other hand, I get the impression that if I provide a special operator that may not even be definable within the system, that is intensional, as it refers to some necessary and sufficient condition that may not be enumerable within the system.

Commutativity is not a requirement for type soundness. The property of soundness means that a program which passes type checking cannot "go wrong" for some definition of "go wrong". Completeness means there is a valid type for every program that does not go wrong.

So if we have a sound type system, then we cannot within the type system make any unsound rules (because if the type system is sound, then the rules must be sound). However, such rules may not necessarily be complete, as you may only include one direction of a bidirectional equivalence. This does not make the system unsound, just incomplete. As I do not know of any sound and complete type system, the lack of completeness does not seem problematic as long as we preserve soundness.

Commutativity

I didn't say commutativity was required of operators, just of implicit conversions. The reason I think it is a sane requirement in that case is that since you cannot "see" the actual operations chosen, it had better not make any difference which ones are picked.

The issue is not type soundness, but semantic soundness. Obviously we can have two pairs of functions for which

    f . g != h . k

even though the types agree (the comparison is type safe!). I'm saying you should not model f, g, h, k as implicit operations. In C++ this problem does in fact arise, and one has to very carefully decipher the disambiguation rules in the C++ Standard to discover what actually happens.

Type checking reduces the possible class of errors in a program; it doesn't stop the program going wrong. More expressive type systems reduce the class further in theory, but in practice this may not be realised unless the typing is intuitive, because programmers can always bypass the type system one way or another (by casts or by simulation).

I think it is also incorrect to say that in a sound type system you cannot make unsound rules. The rules of the type system are not governed by the type system; they're outside the type system by definition. If you wish to catch errors in the introduction of new typing rules, you need a kinding system for those rules. It will not catch all errors, so it will not stop you making your type system unsound, but it will help.

Extensional rules cannot be unsound

My point was that extensional rules are inside the type system and therefore cannot be unsound.

Consider a type system like Haskell's. If we add type families to the type system, we can add type-level operations that are a bit like type functions, and yet all the types are subject to the normal type-checking rules. So you can add any type families you like without making the Haskell type system unsound (provided the type families themselves are sound, so they must be coherent, but the definition of type families is part of the type system).
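
Scala 3's match types play a similar role (a sketch, not a claim that they coincide with Haskell's type families): they are type-level functions, yet they are defined and checked entirely inside the ordinary type system.

// A type-level function defined inside the type system; this follows the
// standard match-type example from the Scala documentation.
type Elem[X] = X match
  case String      => Char
  case Array[t]    => t
  case Iterable[t] => t

def headOfString(s: String): Elem[String] = s.head        // Elem[String] reduces to Char
def headOfInts(a: Array[Int]): Elem[Array[Int]] = a(0)    // Elem[Array[Int]] reduces to Int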

So this is a fundamental point: extensional definitions cannot make the system hosting them unsound, as they are inside the system.