J&: Nested Intersection for Scalable Software Composition

J&: Nested Intersection for Scalable Software Composition by Nathaniel Nystrom, Xin Qi, Andrew C. Myers. 2006.
We identify the following requirements for general extension and composition of software systems:
  1. Orthogonal extension: Extensions may require both new data types and new operations.
  2. Type safety: Extensions cannot create run-time type errors.
  3. Modularity: The base system can be extended without modifying or recompiling its code.
  4. Scalability: Extensions should be scalable. The amount of code needed should be proportional to the functionality added.
  5. Non-destructive extension: The base system should still be available for use within the extended system.
  6. Composability of extensions.
The first three of these requirements correspond to Wadler’s expression problem. Scalability (4) is often but not necessarily satisfied by supporting separate compilation; it is important for extending large software. Non-destructive extension (5) enables existing clients of the base system and also the extended system itself to interoperate with code and data of the base system, an important requirement for backward compatibility. Nested inheritance addresses the first five requirements, but it does not support extension composition. Nested intersection adds this capability.
Compare this approach to one taken by Scala (or read the section 7).

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

This approach looks

This approach looks promising. No doubt intersection types and dependent types have a multitude of useful applications yet to be explored. Has something similar been done with modules?

Mixin modules

Has something similar been done with modules?

If you don't mind the plug, you might be interested in a new paper by Derek Dreyer and myself, which discusses (nested) mixin composition as a basis for full-blown ML modules.

First-class Modules Redux

That's a great paper, thanks! This is exactly what I was after. I've always meant to investigate mixins, and your paper put them into perspective as compared to ML modules.

I'm interested in any developments in module systems, with the ultimate goal unifying modules with the core language. I see that you mentioned first-class modules as future work too. You and I have touched on this subject before in fact. I've been digesting Extensible Records With Scoped Labels, which mentions first-class modules by extending MLF with a powerful record calculus ala Jone's Using Parameterized Signatures to Express Modular Structure. I believe Jones addresses your type abstraction and translucency concerns in parameterized signatures.

Alas, while Hindley-Milner systems might be able to specify type sharing, they are not up to the task of Leijen's Prelude example which uses impredicative higher-ranked polymorphism. I wonder whether there's a simple resolution to that using explicit type annotations. I think MLF is beyond me at the moment.

I'm curious what additional complexity your MixML brings to the first-class modules table, and whether it too can be expressed in MLF with a suitable record calculus, and what that calculus will look like.

MixML and first-class modules

Modules as first-class values wouldn't be any harder in the MixML setting than elsewhere. The main reason we omitted them was space.

Expressing MixML in a record calculus isn't that simple. You can regard the IL from the appendix and the evidence translation into it as such a step. Obviously, the result is not how you would want to write programs, esp considering its way of coping with unresolved recursion. Going the other way round, i.e. expressing records and functions in terms of modules (plus some local type inference), is much more compelling.

As for Jones' paper: while nice it doesn't get anywhere close to addressing all the subtle issues that pop up in the context of full-featured modules (not to mention recursive ones). If you haven't read it already, Harper & Pierce' chapter in ATTAPL discusses a number of them. Derek's thesis is a definite reference if you want to go into the nasty technicalities.

Going the other way round,

Going the other way round, i.e. expressing records and functions in terms of modules (plus some local type inference), is much more compelling.

Any way to reduce the overall complexity of a language is welcome. MixML on its own is already quite complicated, and to tack on a core language makes it very intimidating to less experienced programmers. Consider the number of concepts in MixML above and beyond a core language of functions, records, etc.: modules, units, unit suspension/instantiation, sealing, imports/exports, kinds, opacity/translucency/transparency, mixin composition/merge, and some injection/projection required for recursive type definitions that I don't quite understand yet.

Contrast this with more popular OO languages like C#/Java: classes, methods, interfaces, type parameters, public/private/protected/internal modifiers. Almost half the number of concepts to learn. I'm not saying the additional concepts aren't well-motivated, just that the burden on the user is much higher. If defining a core language in terms of modules is easier than the other way around, then I'm all for it as long as the end-user burden is reduced.

Questions/notes on MixML:

1. Typo on page 5? "For that reason, a dynamic occurs check is performed when a type definition is evaluated". The "dynamic occurs check" reads oddly. "dynamic check" alone captures the idea does it not?

2. MixML appears to require exceptions in order to support its linking, due to the dynamic cycle checks. What if a language does not support exceptions? For that matter, how is the exception caught and handled, since module imports and linking often occurs at top-level. Does the user have to keep possible runtime exceptions in mind when importing modules?

3. What level of type inference can we expect from MixML, particularly when used to define a core language? Given module type systems are generally more powerful and complex than core language type systems, I'd wager inference would be very difficult.

Complexity

MixML on its own is already quite complicated

While this is certainly true, one point we are trying to bring across in the paper is that it is actually simpler than conventional accounts for ML modules (esp when extended with recursion), because we are able to unify a lot of constructs. And I can assure you that it is significantly simpler than most mainstream OO languages. The reason that this may not show is that you hardly ever look at them at a comparable level of deconstruction and formal rigour. The paper primarily presents an idealised calculus, not necessarily a real language.

Re Q1: "occurs check" is standard terminology for the check performed in unification algorithms to detect cycles.

Re Q2: Indeed, this is a question that has to be addressed for a more concrete language design. But in fact, we have recently found a way to prove that these checks aren't needed.

Re Q3: You can easily combine MixML with an HM-style core language -- no difference relative to current MLs. If you want to unify language levels then you certainly face an open research question -- again, as with current ML. I can only envision some form of local type inference / bidirectional type checking.

And I can assure you that it

And I can assure you that it is significantly simpler than most mainstream OO languages.

Of course. I was just noting the user's perceived burden. MixML is certainly simpler than "modular type classes".

Re Q2: Indeed, this is a question that has to be addressed for a more concrete language design. But in fact, we have recently found a way to prove that these checks aren't needed.

Neat. Can I get a hint? :-)

Re: modules as first-class values

You said it's no harder than supporting any other system. Just to check my understanding, all that's really needed is higher-ranked polymorphism correct?

Technical

Neat. Can I get a hint? :-)

It's very technical. Intuitively, you have to show that cyclic types are never inhabited, so don't matter for soundness. However, actually proving this is not easy, and in our case involves step-indexed logical relations.

all that's really needed is higher-ranked polymorphism correct?

Well, no, what I meant is that you can still do it in the style of e.g. Moscow ML.

As for Jones' paper: while

As for Jones' paper: while nice it doesn't get anywhere close to addressing all the subtle issues that pop up in the context of full-featured modules (not to mention recursive ones). If you haven't read it already, Harper & Pierce' chapter in ATTAPL discusses a number of them..

I am reading ATTAPL now, so I'll pay careful attention to that chapter. The most significant omission of Jones' paper is obviously abstract types, but he tackles that, recursive definitions, and other barriers to full-blown first-class modules in First-Class Modules for Haskell. While not as syntatically usable as the ML module language, it does seem that he's gotten pretty close to unifying core and module language. I'll see if ATTAPL (or anyone here?) can point out any remaining deficiencies.

Actually, I totally forgot

Actually, I totally forgot about Dan Leijen's paper First-class polymorphism with existential types. It's a more elegant solution than the skolemization Jones' Haskell paper uses. Existentials are much easier to use in that work, so I'm curious about how you think this work furthers the integration of modules with the core language. In re-reading Derek's Higher Order Modules paper, I take your point that second-class modules supposedly propagate strictly more type information that first-class modules, so perhaps there is still something fundamentally missing from the core language. I suspect that multistaging + even limited first-class modules should provide anything second-class modules can currently handle.

Comparison with Scala

I haven't had time to look at this in any detail, and I'm afraid I might not find the time... But I would note that at least a few aspects of their comparison with Scala are out-of-date. In particular, Scala's traits at this point do support fields, and there is a (relatively) lightweight encoding of virtual classes in Scala. I would be curious to see an updated comparison, since at this point I'm fairly familiar with Scala's capabilities.

Can you provide a pointer?

The OOPSLA paper is from a couple of years ago, and may be a little out of date, but I'm having trouble finding any documentation of either of the Scala features you mention. Scala does have virtual types but from my browsing, it seems that virtual classes are at least not implemented yet, and I can't find any explanation of a design. Can you point to any documentation or paper that discusses this?

The J& approach does not have virtual classes in the Beta sense. Instead, nested classes are inherited as members of their containing class, not as members of a containing object. Nested inheritance is a module-level mechanism that works even without instances of the containing class. One benefit is that in J&, whole packages can be inherited and intersected.

Pointers...

Regarding fields in traits, see the end of the Scala Language Spec, the section called "Changes in Version 2.0", under "Changes in the Mixin Model." It says

The syntax of traits has been generalized from version 1.0, in that traits are now allowed to have mutable fields.
The main spec no longer makes reference to this old restriction.

Regarding virtual classes in Scala, you're right that there is currently no implementation in the language. There is a relatively lightweight encoding, although it does require some boilerplate (see here). I believe there are plans to "formalize" this encoding with some syntactic sugar in a future version of Scala, but I don't know for sure.

Regarding J&, you've made me curious. I'll have to try to find time to look at the paper...

old news, anyone?

This little abstract reads just like Stroustrup and Alexandrescu. People have been doing this stuff in C++ for years.

I think you missed a

I think you missed a critical point from the abstract:

Type safety: Extensions cannot create run-time type errors.

That is most definitely NOT available in C++.

C++ doesn't do this

C++ doesn't support intersection/composition at the package level, let alone type-safe intersection. As described in the paper, in J& you can compose two compilers to get a new compiler that implements a merge of the two languages. And it doesn't require source access to the two merged compilers.