Overlapping Instances + Functional Dependencies Unsound?

In the course of a BitC discussion today, somebody pointed me at this recent note by Simon Peyton-Jones. In it, Simon cites Oleg and Martin as pointing out that (1) type families do not play nicely with overlapping instances, (2) functional dependencies appear to, but (3) the two are equivalent, so something smells wrong.

Simon gives two illustrating examples, both of which are constructed by exploiting lexical hiding across modules. I found this quite disturbing, since we rely on functional dependencies in BitC and we've been struggling with the instance resolution question. But looking at Simon's examples, it seems to me that both of them result from obvious type checking failures in the assumed implementation. I would greatly appreciate a sanity check on this.

A functional dependency constitutes a statement by the programmer that a deterministic function exists between the type parameters of the type function and its type result. We are told that such a function exists, but not how it is defined. That is: the functional dependency is an assertion that such a deterministic function exists, and we should interpret it as meaning that the typing of the program is sound provided the assertion is true. The assertion itself remains to be checked.

Both of Simon's examples construct circumstances in which two modules construct conflicting instantiations of a functional dependency by exploiting the limitations of scope visibility. Both break a type class:

class C a b | a -> b

The first does so directly by introducing instances C Int Bool and C Int Char. The second does so by simultaneously introducing C Int Bool and C 'a [a] forall a. This is a problem because the second instance might instantiate polymorphically to C Int [Int].

In both cases, it seems to me that we have a global determinism requirement on the type function that isn't being checked correctly, and that this is the root of the problem. In the first case, the overlap is direct. In the second case, it is sufficient to note that the two instances are not unifiable. The root problem appears to be that a global assertion is not being globally checked. That such a failure to check would lead to unsoundness does not seem either complex or surprising to me. Indeed, I would be surprised by any other outcome!

So here is my question: assuming we do check that the functional dependency requirements are not violated, does an unsoundness problem remain? It's not too late to drop fundeps from BitC v1 if we need to be conservative about this.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

I think, this just shouldn't have type-checked:

From one of the examples:

module M2 where

  import X

  instance C Int Bool

  g :: T Int -> Bool

  g (MkT x) = x

I don't know Haskell too well but my 2 pennies: If you just follow the values, I think this is where any value may be converted to a boolean. It hints at that the definition of T should have been written/interpreted with probably an existential somewhere since type information is lost.

(x should have been typed with a skolem constant and not unify with Bool, therefor the problem is not with functional dependencies but with the definition of type T.)

Take a closer look

Take a closer look at the type signature of MkT in that example. MkT x having type T Int should imply that x has type Bool -- assuming instances cannot overlap.

Depends on your semantics

If you type infere with the given instances as a database of facts, you're right. If you interpret the types without that knowledge, but just use what you can infer from the Class declaration, the thing should have an existential type, which maybe later can be used to specialize that type to type Bool within the context of the whole program - if there wouldn't be overlapping instances in the whole program.

That's what I remarked later. He shouldn't typecheck with a database of facts, but just get the types right first, and only use the overlapping instances the last moment to maybe optimize some stuff away.

I guess SPJ looked too much at prolog and not enough at HOL proofs when he devised his method for type checking classes and instances.

[ In essence, SPJ uses a closed world semantics. From a single instance C Int Bool he derives that all instances of C are Int Bool (which is true in a closed world/the scope of that particular module, but false within the scope of the whole program), that and the fact that the type of T is off is the mistake. ]

TBH, it's a lesson from my own Hi compiler.

Ok, so, I am building a compiler, named Hi -forthcoming in another year or another few centuries,- which shares a few features with Haskell. I skimmed the 'Typing Haskell in Haskell' paper by SPJ, didn't like some of it, and implemented another type checker in/for Hi which, amongst others, should solve that closed-world assumption approach.

At least, it solves it conceptually. Can't say I really trust my type checker too much at the moment.

But, I doubt there is any real conceptual problem with functional dependencies, I just don't like the proposed solution by SPJ. It seems just too close to: "Ok, if I can derive that all lists only contain characters, then I can use that fact." One normally doesn't use overlapping instances for that either, only, maybe to optimize within a specific context.

Not closed world semantics at all

The class declaration claims that for any given a there is only one b such that there is an instance C a b. Deducing that b ~ Bool from C Int b in a scope where there is an instance C Int Bool is not making a closed world assumption at all.

The fishy part, as shap says, is in how the compiler checks whether you live up to the promise made by the fundep.

I thought

there is only one b such that there is an instance C a b.

contrary, that the class declaration states that 'b' depends on 'a,' no uniqueness implications? But, I can't say I know Haskell that good.

[ Ok, guess I was smoking the wrong, uuh, pipe, here. Wth are they trying to solve then? ]

Not so much a Haskell thing really

contrary, that the class declaration states that 'b' depends on 'a,' no uniqueness implications? But, I can't say I know Haskell that good.

As far as I can tell, the term "functional dependency" originates from relational database theory, where "a -> b" means exactly that for any "a" there is a single uniquely determined "b". This is essentially how GHC uses it as well, both for drawing inferences and rejecting programs where the relation does not hold.

uh. nope? or, uh, yes.

From the original post:

A functional dependency constitutes a statement by the programmer that a deterministic function exists between the type parameters of the type function and its type result.

I.e., a->b just posits the existence of a Skolem function. Or rather, yes, b is uniquely determined by a, not for any b a is unique.

The Root of the Problem

I am not sure but isn't Haskell's type checking with `type families' an instance of a closed world assumption (proving with a database of facts), where the world just later turns out to be a bit more open than assumed?

If that is so, I don't think there's a lot wrong with the types being written, or unsoundness. It's just a matter of whether you want to use that closed world assumption or not, and the benefits/costs of that approach.

I am little bit puzzled what

I am little bit puzzled what you mean by unsoundness here (but then again, I am really no expert on Haskell).

A problem where overlapping instances can lead to logical unsoundness in HOL is described here:

Checking Conservativity of Overloaded Definitions in Higher-Order Logic

I am not sure if there is a connection to your problem.

Int cast to Bool

One of Simon's examples can lead to an int being interpreted as a bool. It's not hard to generalize the problem to other mis-typings.

Haskell's module system

Haskell's module system provides a form of namespace support, but that support is not so well designed - c.f. http://hackage.haskell.org/trac/haskell-prime/wiki/FirstClassLabels

It also runs into problems which parallel ones commonly found in the implementation of multiple inheritance - i.e. what happens when different modules import from a common third module?

See also the discussion here - http://www.mail-archive.com/haskell@haskell.org/msg21158.html -

All of the above makes me want to ask why it is that you see "overlapping instance" as a feature to be copied rather than a hackish workaround for this problematic name space support in Haskell? Seems like it would be better to start with a different namespace model - e.g. like C++'s notion of namespace - and then improve on that.

My Understanding

I haven't scruitinized GHC enough to verify this, but I expect GHC is unsoundly mixing environments derived from conflicting instance declarations in different modules.

For soundness, each function needs to either be:

1. Typechecked and executed fully under the current module's instance declarations, not accepting any of the runtime typeclass environments implied by "MyClass t => .." constraints on the function. Here, any use of "C Int t" must unify t with Bool.

2. Typechecked solely on the basis of the current module's class definitions (ignoring its instance declarations), and relying on the caller's runtime typeclass environments. Here, any use of "C Int t" can't derive t from local instance declarations, since these may differ from other modules' instances.

I think Haskell mixes type inference from instances the current module with environments from the caller's module derived from a potentially different and conflicting set of instances.

At any rate, Haskell's underlying typeclass approach is so implicit that it complicates understanding which instance's bindings (local or caller's) are applicable in a given scenario, and fundeps then draw their types into question.

It's always useful to rethink a given program's typeclass semantics by rewriting it in Cayenne using dependent types, with classes represented as record types and instances are record values, and all environments passed around explicitly.

Overlapping Instances + FD in GHC are still sound

It should be stressed that GHC does NOT admit the problematic examples
in the Simon Peyton-Jones message. That is, they do not type check,
and no segmentation or other run-time faults arise. Even the example
Bad at the end of the message does not type check in GHC
6.10. Actually, even the module M does not type check.

For example, in the following program

class C a b | a -> b
instance C Int Bool
data T a = forall b. C a b => T b
foo :: T Int
foo = T True

bar :: T Int -> T Int
bar (T x) = T (not x)

although the definition foo is accepted, the definition bar is rejected.
Although the type checker could deduce from function dependencies that
x must be a boolean value, the type checker still refuses to type
check the program because T is existential, and quantification imposes
the abstraction barrier.

Simon Peyton-Jones message was discussing a general situation; it
seems that in general one has to check for conflicts between instances
eagerly. I personally think that GHC does the right thing when it
refuses to improve `rigid type variables' -- however irritating that
turns out to be.