1ML — Core and modules united

This is an amazing paper that I've been anticipating ever since I read the abstract about half a year ago.

ML is two languages in one: there is the core, with types and expressions, and there are modules, with signatures, structures and functors. Modules form a separate, higher-order functional language on top of the core. There are both practical and technical reasons for this stratification; yet, it creates substantial duplication in syntax and semantics, and it reduces expressiveness. For example, selecting a module cannot be made a dynamic decision. Language extensions allowing modules to be packaged up as first-class values have been proposed and implemented in different variations. However, they remedy expressiveness only to some extent, are syntactically cumbersome, and do not alleviate redundancy.

We propose a redesign of ML in which modules are truly first-class values, and core and module layer are unified into one language. In this "1ML", functions, functors, and even type constructors are one and the same construct; likewise, no distinction is made between structures, records, or tuples. Or viewed the other way round, everything is just ("a mode of use of") modules. Yet, 1ML does not require dependent types, and its type structure is expressible in terms of plain System Fω, in a minor variation of our F-ing modules approach. We introduce both an explicitly typed version of 1ML, and an extension with Damas/Milner-style implicit quantification. Type inference for this language is not complete, but, we argue, not substantially worse than for Standard ML.

An alternative view is that 1ML is a user-friendly surface syntax for System Fω that allows combining term and type abstraction in a more compositional manner than the bare calculus.

Basically, it unifies module and core languages, without dependent types, while supporting type inference without requiring more type annotations than OCaml (i.e. only on module or higher-rank types). It leaves applicative functors for future work.

An extended technical report and a prototype interpreter are available here.


To be clear, it natively

To be clear, it natively supports a subset of applicative functors.

Looks pretty cool so far. Add support for implicits, and you have a nice, small but expressive language.

Records, Modules and Typeclasses

This sounds similar to some discussions we had a while back where I was advocating combining records, modules and type-classes (which can be done by adding implicits). It seems like it's going in the right direction to me; the main differences I would want are purity with side-effect control, borrowing a bit from Haskell, but strict (with yield) not lazy.

See this discussion: http://lambda-the-ultimate.org/node/4865#comment-78081

Actually, it's exactly what

Actually, it's exactly what you proposed there - a single type system unifying records and modules. What Andreas did is that he made modules into first-class objects while keeping the type system decidable; in essence, if I understand his paper correctly, he makes sure that types are not "expanded" when type-checking, i.e. abstract types cannot be instantiated by module signatures (which might contain yet other abstract types, which would need to be instantiated, and so on, exploding the type-checking system).

I guess purity is a topic rather orthogonal to the type system, although the one problem I see is that there are no truly ideal effect typing systems out there. Laziness has more to do with the implementation than with the type system (as long as you avoid dependent types).

Predicativity

Indeed, but working with ML involves dealing with various issues that are the result of design decisions taken in ML before considering that records and modules (and type-classes) might be merged into one construct. The paper goes to a lot of trouble to handle opaque associated types, when I would just go with type-families and avoid the problem. If types cannot depend on values you lose nothing by doing this, and get a cleaner, more easily understandable type system. I agree though that predicativity is the key property, but my approach is to consider only type system features that are predicative.

Module specific

The idea that modules are just (dependent) records is as old as the module literature. Systems that "merge" them were already devised 25 years ago. The reason for the complexity is not that people didn't see this and took a wrong turn somewhere in the past; it is inherent in first-class modules.

The usual definition of a module is a container that can bundle all language entities. Type families + plain records might give you an easier system (depending on your metric), but they don't give you modules, let alone first-class ones. (Nor vice versa, btw: ML modules cannot express family polymorphism.)

What is a Module?

I was not claiming any originality, but I was certainly advocating this approach back then (and still do). I like 1ML and think it represents an improvement over ML, and is going in the right direction. I find that you have a very specific definition of module that I have trouble with. It is certainly an ML-centric view of a module. We can have predicative associated types in the way Haskell does them (and this is really just a type-family declared in the record):

data Readable x = Readable { -- interface
    type IndexType = Int 
    data ValueType x :: *
    source :: x -> IndexType -> IO (ValueType x)
}

a = Readable (Array Int) { -- implementation
   data ValueType (Array Int) = Int
   source a n = return a[n]
}

This seems quite module-like to me. It seems reasonable to say module instance types can only depend on types. We can have externally visible types in the 'record definition', as well as type-dependent and hidden type definitions in both the 'record definition' and the 'record instance'. We can also have types that are externally visible, but where we hide the constructors/destructors inside the record (these either being type-dependent, or in the record definition only). What am I missing?

Hm

Hm, that example is not valid Haskell AFAICT. I'm not sure what semantics you envision, so let me ask a couple of questions.

  • Can I write a basic signature like

    data M = M {
      data T :: *
      f :: T -> Int
    }
    

    that is not required to be artificially indexed by an external type?

  • Can I define different modules with this signature

    m1 = M {
      data T = A | B
      f A = 0; f B = 1
    }
    m2 = M {
      data T = C Int
      f n = n
    }
    

    with different implementation types?

  • Can I then reference those from other modules or signatures (never mind the dot notation):

    data N = N {
      data V :: *
      g :: m1.T -> V
    }
    n = N {
      data V = C m1.T
      g = C
    }
    

Also, Haskell itself already defines its notion of a module. Can I put type class definitions into a record? Instances? Other things I can define in an actual Haskell module? If not, how do records represent modules, even under a completely Haskell-centric view?

Not Haskell

I am using a Haskell like type system, but with extensions that are not in Haskell. Typeclasses don't exist as such, and are replaced by implicit parameters, with records acting as modules and type-classes.

To answer the questions: (1) the basic signature is just a type definition in the class; it has to be the same for all instances. (2) No, you cannot have types that depend on values, so all implementations must have the same types, or types that depend on a type parameter. (3) Yes, you can reference the types from other module signatures.

So we would have to do this:

data M x = M {
  data T x :: *
  f :: T x -> Int
}

data Phantom1
m1 = M {
  data T Phantom1 = A | B
  f A = 0; f B = 1
}

data Phantom2
m2 = M {
  data T Phantom2 = C Int
  f n = n
}

data N x = N {
  data V x :: *
  g :: (M Phantom1).T -> V x
}

data Phantom3
n = N {
  data V Phantom3 = C ((M Phantom1).T)
  g = C
}

Not modular

And can you take your code snippet above and put it into a record? In particular, can you define a phantom type in a record? The way I understand your explanation, you can't. So there still is some richer notion of module underlying your language that you cannot capture with records.

The requirement to index everything by external types is key to that problem. The obvious idea of a module is that it is modular. That means things like self-contained, encapsulated, compositional. In your encoding, you cannot express module records without referring to a specific external phantom index type. The closest to what you could call "module" there would be the pair of that phantom type and the record. This pair in turn cannot be bundled as a self-contained (i.e. modular) entity, unless you have existential types. If you have existential types you could pack it up manually, but then you could not access this entity without unpacking everywhere (while losing all type sharing). Nor could you nest or abstract such existentials in a compositional (i.e. modular) fashion. To do so, you would have to manually do the kind of phase-splitting encoding and quantifier pushing that is complicated about the semantics in the paper.
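
To make the packaging cost concrete, here is a simplified Haskell sketch of the manual existential bundling described above (the record is simplified to a value component, and all names are invented for illustration):

{-# LANGUAGE ExistentialQuantification #-}

-- the "module" record, indexed by its abstract type
data M t = M { val :: t, f :: t -> Int }

-- a self-contained bundle hides the index existentially
data SomeM = forall t. SomeM (M t)

-- every use must unpack, and two unpackings of the same bundle
-- yield abstract types that cannot be identified with each other
use :: SomeM -> Int
use (SomeM m) = f m (val m)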

Run time typing

I have a simple philosophy: types are for compile time, and values are for runtime. If you want runtime polymorphism it's better to use a datatype. So in the case where I have several records with different types, types are not the right mechanism; rather, this is what I want:

data M1 = M1 {
  data T = A | B
  type S = Int
  f :: T -> Int
}

data M2 = M2 {
  data T = C Int
  type S = Int
  f :: T -> Int
}

data M1or2 = C_M1 M1 | C_M2 M2

data N1 = N1 {
  type V = C M1.S
  g :: M1.S -> V
}

n1 = N1 {
  g = C
}

The only real difference between data M1or2 and a type-class is that the datatype is runtime polymorphic and the type-class is statically polymorphic; I propose implicit parameters to deal with this. The other difference is that type classes are open and datatypes closed. I propose to allow open datatypes; the syntax is currently like this:

data X ...
data X | C1 Int
data X | C2 Bool

We don't need the 'M1or2' if we can statically resolve the type, we can just use M1 or M2. In the dynamic case we could automatically create this datatype. This allows the same code to be used, but with dynamic polymorphism if the record is passed explicitly and static polymorphism if it is passed implicitly.
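
For comparison, the dynamic/static distinction drawn here corresponds roughly to the difference between an existential datatype and a class constraint in Haskell; this is only a hedged sketch, and Showable and the function names are invented:

{-# LANGUAGE ExistentialQuantification #-}

-- dynamic polymorphism: the dictionary travels inside the value
data Showable = forall a. Show a => Showable a

describeDyn :: Showable -> String
describeDyn (Showable x) = show x

-- static polymorphism: the dictionary is resolved at compile time
describeStat :: Show a => a -> String
describeStat = show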

Finally, with some syntactic sugar we could, if we wanted, make the definitions of the pairs M1/m1 and M2/m2 look like instances of a single datatype.

If I combine this with the phantom types example, then M1orM2 would just include the phantoms like this:

data M1orM2 = M1 (M Phantom1) | M2 (M Phantom2)

It wouldn't be hard to use an open sum datatype and automatically append to it as you create the instances.

Not runtime typing

No, I wasn't talking about runtime typing. In fact, the very purpose of phase splitting is maintaining the typing/runtime phase separation, by factoring apart types and values that are bundled in a module (to be clear, the F-ing approach doesn't actually do a full phase-splitting transformation in the traditional sense, but it has some aspects of it).

I'm talking about abstraction. (1) A module needs to be able to express its own implementation details locally. And (2) it needs to be able to enforce that client code remains independent of those details. In a typed language, abstract types play a central role in this.

And even if you didn't care about ADTs, as soon as you have any form of nominal type definition (like datatypes), you already have the same problem. A construct that cannot encapsulate such a definition does not really qualify as a module.

I'm still not clear whether you agree or disagree that records in your language cannot express everything you can express in a "file". If you agree, then clearly, they aren't modules. If you disagree, then you would have to demonstrate how you can wrap an arbitrary piece of program into a self-contained record, in a straightforward, compositional fashion.

Considering Static Only

Yes, I agree that you should be able to encapsulate any valid program fragment in a module. Starting from the premise that Module = Record = Type-Class, the core language is defined by only allowing that which can be wrapped in a record.

As for making it straightforward, syntax can be introduced to do things like automatically declare the phantom types.

The other property I want is one type system (and one type of types, so no small and large types), and again I can pick language features to keep that requirement as I am not working with the constraints of an existing language.

Solutions

I'm pretty sure that the only solutions in the intersection of those requirements are variations of the simply-typed lambda calculus. ;)

Simply Typed Lambda Calculus.

Well, at least the language will be simple. Expressive power relates to being able to express complex problems in simple ways. The other way to view this would be to make everything in the language a module. Maybe I'll end up re-inventing ML (which at least is educational), but that doesn't feel like where I am going at the moment. I see language design as very much a question of balancing different trade-offs, and ML isn't what I am looking for, or I would already be using it and not working on something new.

Kinds of Records

I am reasonably sure that there are many solutions to this, for example one solution would be to make the ADT instances types and introduce kinds for their signatures. Now I can declare a type of kind 'x' for each implementation of the signature, and with the right type this can contain any part of the language (using existentials for data hiding etc). This would not allow nested modules, but you could do automatic promotion to the appropriate 'level', so that module instances with no associated types (that do not depend on a module type parameter) are values. A module containing an associated type that does not depend on a module type parameter would need to be promoted to a type, and its module signature would need to be a kind. A module containing a nested module signature would need to be a kind. This preserves predicativity by stratification.

Type Families vs. Modules

(Nor vice versa, btw: ML modules cannot express family polymorphism.)

Any ideas on how we could solve this? I don't usually use Haskell, but most uses of type families that I've seen were for associated types of type classes, which seem to be expressible with ML modules (if we added suitable implicits). I don't know, however, if "raw" type families are really useful, and if yes, how often.

Program Units can express

Program Units can express family polymorphism (see the Jiazzi paper), but they are not first class in that context (units are first class in ML/Racket, but not Java).

Edit: never mind, I'm confusing type families with family polymorphism, which are completely unrelated.

Type Families

Outside of type-classes type-families can do things like:

{-# LANGUAGE TypeFamilies, EmptyDataDecls #-}

data Z
data S x
data family Nat x :: *
data instance Nat Z = Z
data instance Nat (S x) = S (Nat x)

isnat :: Nat x -> ()
isnat _ = ()

"isnat" will only pass type checking on valid 'Nats'.

Records, type-families and implicit parameters seem to cleanly separate structure (records), type relations (families) and overload resolution (implicits) in an orthogonal way. Modules cover two (records and families) and type-classes cover two (records and implicits). I am not sure what you get with type-families and implicits; some kind of logic programming, perhaps (search-based overload resolution and maps from heads to goals for further search).

What's the issue?

Is it that in Haskell you can have a type class on type (Foo A) that produces the associated type (Bar A)? Whereas in ML, you don't have a way to pattern match on types? Those don't seem like they belong in the same mechanism to me (polytypic meta-programming and modules).

Type Families are non-parametric

The way I understand type families, they allow non-parametric implementations. For example, say I have a type for arrays, Array a. I could have one generic implementation (for most types), but have a specific implementation for Array Bool, which would be implemented as a concrete bitvector type.
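
A minimal Haskell sketch of such a non-parametric representation choice, using a data family (the constructor names are hypothetical):

{-# LANGUAGE TypeFamilies #-}

-- a data family permits a different representation per type index
data family Array a

-- generic boxed representation for Int elements
data instance Array Int = IntArray [Int]

-- packed representation for Bool elements
newtype instance Array Bool = BitVector Integer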

That's in Haskell. In ML, an array like this would probably be implemented either with a module or with an applicative functor producing a separate module for each element type.

Hm... actually, when I think about it, this might actually be possible in 1ML, by writing the functor with an if statement that returns a BitVector module if the element type is bool, and a normal Array with type t = 'a module otherwise.
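
1ML aside, the value-driven half of that idea -- choosing between two implementations of one structural signature at run time -- can be approximated in Haskell with an existential; this is only a hedged analogue (names invented), since nothing here branches on the element type itself:

{-# LANGUAGE ExistentialQuantification #-}

-- one "signature": empty, insert, toList, with representation r hidden
data Coll e = forall r. Coll r (e -> r -> r) (r -> [e])

listColl :: Coll e
listColl = Coll [] (:) id

countColl :: Coll ()
countColl = Coll (0 :: Int) (\() n -> n + 1) (\n -> replicate n ())

-- selecting the implementation is an ordinary dynamic decision
choose :: Bool -> Coll ()
choose fast = if fast then countColl else listColl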

type families are parametric

The implementations of type families I have seen all behave like a map from one type to another (limited type-level functions).

If I understand type theory

If I understand type theory correctly, just because a function has a parameter does not make it parametric.

Wikipedia:

parametricity is an abstract uniformity property enjoyed by parametrically polymorphic functions, which captures the intuition that all instances of a polymorphic function act the same way.

GHC user manual:

The difference between vanilla parametrised type constructors and family constructors is much like between parametrically polymorphic functions and (ad-hoc polymorphic) methods of type classes. Parametric polymorphic functions behave the same at all type instances, whereas class methods can change their behaviour in dependence on the class type parameters. Similarly, vanilla type constructors imply the same data representation for all type instances, but family constructors can have varying representation types for varying type indices.

Not what you originally stated

This sounds reasonable, but what you originally stated about specialising an array seems different. A type family is a type level map from one type to another. Type families cannot specialise the value level, that's what type-classes do. Look at the Nat validation type-family I posted elsewhere in this thread. What is that specialising?

Specialized arrays

Well, by "specialized arrays" I meant exactly type maps; you have a generic type (family) Array a, that maps to a specialized BitVector when a == bool. There's an example on slide 5 of this slide show.

weird, but right.

It seems a weird way of looking at it to me, as the array implementations (ParArray etc.) are pre-existing and have nothing to do with the type-family. As a description of type families this seems confusing, as it conflates the type map and the value-level implementation of the arrays. It is of course a valid use of type families (and could even be part of the motivation for them), but a pure type-level description is, I think, a better explanation of them.

parametric

On further thought I am not sure, but it seems this interpretation is reading too much into 'parametric'. I don't think there is any requirement for smoothness of functions. For example, intersection type systems would still be parametric, but functions can be over discontinuous type intersections.

I think smoothness (and the generic aspect you referred to) is simply a property of parametric polymorphism in universally quantified type systems, and has nothing to do with parametricity itself.

"Parametricity" is that smoothness

In the type theory/PL literature/community, "parametricity" is really never just the property of having parameters, even though the word might suggest otherwise. This dates back to Strachey's distinction between "parametric (i.e., uniform) polymorphism" and "ad-hoc polymorphism" — intersection types model the latter, and I think no PL people would call them parametric. I see why the etymology of the words is misleading you, but that's the terminology we have.

At least for me, searching "parametricity" on Google Scholar confirms this idea. The first 20 papers on parametricity, at least at a glance, are extremely related to this concept.

https://scholar.google.com/scholar?hl=en&q=parametricity&btnG=&lr=lang_en%7Clang_de%7Clang_it&as_sdt=1%2C5&as_sdtp=

Universal, Parametric and Finitary

Yes, it appears parametric is overloaded with this definition, although the reason for choosing this term does not appear obvious to me. Why was parametric (clearly from "parameter") given this specific meaning? Universal polymorphism seems to be an alternate term with a clearer meaning.

Intersection type systems would appear to have finitary polymorphism.

Isn't there a difference

Isn't there a difference between module system existentials and dependent sums? If you create a module with a type in it, then outside of the module you can't see what type that is, whereas if you make a record with a type in it then you can get that type out again. I think you may be able to do module system existentials with dependent sums + HoTT by defining the equalities such that you cannot observe the first component of the dependent sum.

Modules have both

A pivotal feature of ML modules is that they coherently support both transparent and opaque type components, which is the essence of the whole business with "type sharing". Harper & Lillibridge first formalised that in their translucent sums calculus (a calculus of records with both kinds of type components), which was later superseded by the theory of singleton types/kinds. Turns out that isn't quite enough, though, and you still need proper weak sums (existentials) on top to express local type abstraction. The main insight of F-ing modules was that, at that point, you can just drop all the other stuff and express modules with plain existentials and universals directly, since you're going to need elaboration anyway.

Visible type with hidden constructor/destructor

I would be happy with having the type visible but un-constructable. I think this is required to statically determine types, and for specialisation in any case. I don't see a real need to 'hide' types (unlike functions), as they cannot 'do' anything at runtime. The only reason to hide types, in my current opinion, is for runtime polymorphism, which can be done with existential or sum types.

What about exposing existing types?

If you expose the definition of a type, and that type is just a type synonym of a pre-existing type, you can't suddenly hide existing constructors and destructors. That's what hiding types is for.

If a type is visible but without associated constructors/destructors, it does look essentially still hidden, like an existential. In Haskell, it's enough to define a newtype without exporting its definition. But does this encoding buy anything over having hidden type members?

Does this encoding buy anything?

But does this encoding buy anything over having hidden type members?

No, it doesn't. In fact, it is significantly less expressive. For example, by existentially hiding "type t = int", it is a no-op to convert a list(int) to an abstract list(t), an int->int to a t->t, or a ref(int) to a ref(t). With the newtype approach, you'd need to perform an expensive map over the list, write a wrapper for the function, and it is outright impossible to abstract the reference.
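
In Haskell terms the asymmetry looks roughly like this (a hedged sketch; MkT and the helper names are invented):

newtype T = MkT Int

-- abstracting a list requires an O(n) traversal
absList :: [Int] -> [T]
absList = map MkT

-- abstracting a function requires writing a wrapper
absFun :: (Int -> Int) -> (T -> T)
absFun g (MkT n) = MkT (g n)

-- there is no corresponding conversion from IORef Int to IORef T at all
-- (short of the Coercible mechanism discussed later in the thread)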

Hiding Types

If 't' is completely hidden, then there is no problem in the first place. It's simply a scoped type: it cannot occur outside the module, nor in the module interface. No externally visible functions in the module can return or be passed a 't'. It simply does not exist outside the module.

The more interesting case is where you want to expose the type, but not the contents or structure. For example a 'Window' type that represents a screen window. The graphics module may have functions that return windows and get passed windows, but it should be impossible for anything outside the graphics module to deconstruct a window. It is opaque. This is the use case for which I was suggesting that hiding the constructors and destructors would be sufficient. A 'Window' type defined outside the module would be a conflict, or a different type (if we disambiguate types with their signature as an optional prefix).

With regards to abstracting the reference, that should be easy. In Haskell a reference is a monad, so if you have a StateT Int you can use a functor to push the constructor inside the monad, as in fmap :: (a -> b) -> m a -> m b; so given data T = T Int, we can write fmap (\x -> T x) and apply that to the reference.

Monads don't help

Yes, I was talking about the latter. Of course, the newtype approach works fine for a Window type, because it is a trivial example as far as abstraction goes. As for references, the Haskell equivalent would be using data abstraction to abstract an IORef Int as an IORef T, for a newtype T, which is impossible. Monads don't help here.

Type Equality

The 'Window' example may be trivial, but it is enough for any Abstract Data Type, or Object. What kind of thing needs something more sophisticated?

As for the abstraction over an IORef, it should be possible. All that is missing is the identity (IORef x = IORef y) iff (x = y). What I mean is that, from a type-system perspective, IORef is just an ordinary type constructor, and from a semantics point of view it should be allowed.

Wasn't the point that you

Wasn't the point that you weren't assuming t = int but were making it a different type? If in one context you have type t, and in another you learn that t = int, don't you support hiding types?

Hidden type equality.

Well, I think this part of the discussion is about local types, so for example you have "type t = int" hidden so that you cannot see 't' from outside the module. Now consider you have an "x : IORef t" and a "y : IORef int" both inside the module. It should be fine to assign x to y, but the type system will only allow it if it considers "IORef t" to be the same type as "IORef int". We don't need it to be a different type if it is not visible outside the module.

I think you still

I think you still misunderstand. The discussion is about being able to export T, but not expose its definition. For example, in ML:

module M :> {
  type T
  val v : T
  val l : List T
  val f : T -> T
  val r : Ref T
} = {
  type T = Int
  val v = 7
  val l = [8, 9]
  val r = ref 0
  val f x = x + !r
}

M.r := M.v  -- fine
M.r := 0  -- error

This is straightforward existential quantification. But you cannot express the equivalent with newtype.

Coercible

For the record (I'm sure Andreas knows about this), in modern Haskell you can avoid the overhead by using the Coercible type-class, which permits no-op coercion between ref(int) and ref(t). But it's well known that it's difficult to control the visibility of type class instances, and this has consequences for, e.g., Safe Haskell, which relies on data abstraction for its safety guarantees.
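
For instance, GHC accepts the following precisely because the newtype constructor is in scope, which is why controlling visibility becomes the crux (a hedged sketch; T and castRef are invented names):

import Data.Coerce (coerce)
import Data.IORef (IORef)

newtype T = MkT Int

-- a zero-cost cast of the whole reference: no traversal, no wrapper
castRef :: IORef Int -> IORef T
castRef = coerce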

Universal Visibility of Classes

There is a reasonable argument that type-classes should always be visible everywhere. They represent (or should be used to represent) universally true properties (like commutativity of addition).

From this point of view, trying to restrict the scope of a type class is a bad idea. Objects and Modules are a better model for things that have restricted visibility.

True ≠ known

"Universally true" is not the same as "universally known" or "accessible". Information hiding / encapsulation is a core mechanism for enforcing modularity.

Imprecise

Yes, sorry, I knew what I meant :-). My point was type classes are anti-modular and express truths that hold across all modules (or should be used for this, mixing them with modularity seems confused at best).

Nice!

Wow, this is nice. Thanks to Andreas. Sorta wish Alice were alive to move in this direction. :-)

Only small types

The paper, which I've just skimmed so far, had some language about maintaining predicativity by only [ ... words ... ] small types. Would someone, Andreas perhaps, explain what kind of module use is ruled out to make this nice? Bonus points for example code.

Small = not a module type

No module you can write in ordinary ML is ruled out (*). The way to think about this is that instead of distinguishing "types" and "signatures" syntactically, 1ML distinguishes them semantically, such that "small" corresponds to what is an ordinary type in traditional ML. There effectively is only one place where the distinction is relevant, and that is signature matching: you can have the equivalent of abstract types, but not the equivalent of abstract signatures:

type S = {type T}        ;; large, because it contains an abstract type
A = {type T = int} :> S  ;; fine, abstracting int
B = {type T = S} :> S    ;; error, S is large, cannot be abstracted

(*) Other than cases of OCaml's abstract signatures, which are undecidable for the same reason.

Thanks

That makes sense. I've encountered a similar obstruction in my module semantics, though I'm not targeting System F. It seems as though first-class signatures would be genuinely useful in some cases, and I wonder what your thoughts are on this.

One example that I remember is an abstract Map. It seems that an instance of Map should come equipped with a theory of its Key type. So, HashMap requires a hashable Key and TreeMap requires an orderable Key, etc. But this puts an abstract signature in Map.

It looks possible to allow this kind of thing in most useful cases by doing something like Coq universe checking at link-time, i.e. we keep track of signature flow and make sure it's a DAG. Thoughts?

Impredicative abstraction

You can still have impredicative abstraction, you just need to be more explicit. The paper briefly discusses two possibilities: either by piggybacking on datatypes (i.e., constructor arguments may be large), or by introducing a "wrapped" type that explicitly injects the large into the small universe (while forbidding subtyping):

C = {type T = wrap S} :> S    ;; ok

That's an easy extension. I don't think link-time checks would help. Unrestricted impredicativity already makes subtyping undecidable.

Differences

I guess our setups are quite different (e.g. I don't have the impredicativity of System F to use). First class signatures lead to potential cardinality paradoxes for me, so your reference to big and small types resonated and I wanted to try to understand a little better.

My module system is also nominal, not structural, so I don't have the subtyping worries. Do you have any good arguments (passed on by value or by reference) in favor of structural typing for modules? It seems to me that we're already using names for labels, and it doesn't make much sense to me to want to name specific labels but then attempt to infer deeper structural relationships.

Oops

This last sentence makes me sound much more certain than I actually am about how this stuff ought to work. I'm genuinely interested in examples that work better with structural typing of modules / records. The 1ML approach looks pretty smooth and I'm trying to figure out if that's the way to go.

Stratified Type System

I think I prefer the idea of a stratified type system with universes, rather than large and small types in a single universe. This would be like applying the stratification of dependent types to non-value-dependent type systems.

Stratified types

Small and large are two universes. That's their whole point.

Intuitionistic Logic

So it's just a standard intuitionistic logic with universes then. That makes it a lot easier to understand for me. What's interesting is that this is effectively a restricted dependent type system. I wonder if the system would be simpler starting from the full calculus-of-constructions (there are some very small and neat type checkers out there) and adding axioms for the restrictions.

Stratified types

Predicative stratified types can introduce complexity, though, as soon as you have polymorphism. Instantiating one polymorphic type at another might not be sound. I'm tinkering with a solution to this problem that looks promising to me. I really need to run my ideas by someone who knows what they're doing, though.

Indeed

On further reflection, I don't think stratified types are necessary. I think if you enforce a clean separation of interface and implementation, and instead build your hierarchies out of the interfaces, you can get away with the interfaces being plain types, related by type constraints (typeclasses). This is what I had originally planned, but shifting viewpoint to the way ML does things confused me.

Nominal is not fully modular

No worries. It's a fair question.

At the module level, nominal typing is not an option if you want to be truly modular. By that I mean e.g. being able to compile modules independently from one another. That is only possible if all interface information is structural. If you model interface conformance based on nominal types, then you always need to share a nominal interface definition at least. You then have incremental compilation at best (like Java), but not independent compilation.

Also, nominal doesn't save you from actually doing all the deep structural checks if you want it to be sound. Otherwise you end up with a system like Java, where class loading just matches up names without rechecking their definitions, and thus is fundamentally unsound (you can e.g. get a NoSuchMethodError at any time).

Sounds like Gilad's argument

As I recall (here it is), you were taking the other side of that argument. I guess you're making the same point as Derek did in that thread. I'm not convinced, though. If there is agreement beforehand on what the module signature should be, then that absolutely should be shared explicitly in the code (as opposed to shared over lunch).

If we're talking about post hoc reuse of the form "hey, I built this module and it happens to exactly match what you need" then really, what are the odds that everything, including all of the function names, matches? And even if they do, requiring an explicit coercion provides a place for someone to examine the precise semantics (not just the types) and make a determination that there is a real semantic match.

The modular linking problem

The modular linking problem is easy enough to solve, just use program units for Java. But I got out of this game because the real problem was semantic, and the modular linking problem is an academic one that isn't a big problem to most. In the real world, enforced module versioning is important, since even without a change in signature, you are still likely going to be incompatible!

Explicit coercions come with reflection and really late binding. You are on your own at that point, but if you can tolerate linking, C# assemblies are semantically versioned and quite safe.

Economics, not just ergonomics

I don't necessarily mind being explicit about sharing an interface per se. But if it breaks separate compilation, that can mean the difference between constant (because parallel) and linear (because sequential) compilation times, for example. That can be a very real problem in large projects.

*If* it breaks separate compilation?

I think I'm missing your point. Is there something about having a shared signature that breaks separate compilation? How is this different than having a shared data type?

Think of them as closed expressions

The model for separate compilation, at least from the point of view of ML modules, is functors: you can view a compilation unit as a closed functor over its imports; being closed, you can compile it without knowing the actual identities of the modules you link against later. And you can parameterise over (the identity of) nominal datatypes just fine, by (structurally) abstracting over a module that contains them, with the right shape.
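
A rough Haskell approximation of this view, with a record standing in for the structural import interface and a type parameter for the identity of a nominal datatype (all names invented):

-- the imports of a "unit", described structurally
data Imports item = Imports { showItem :: item -> String }

-- the unit is closed: it compiles against the shape of its imports,
-- without knowing which concrete item type it is later linked against
unit :: Imports item -> item -> String
unit imp x = "item: " ++ showItem imp x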

If, however, module types themselves are nominal, how do you abstract over them? You can't, without having another structural layer on top that allows you to express their expected shape.

IOW, separate compilation requires the ability to model modules as closed expressions, including their types. Nominal interfaces OTOH preclude expressing these types in a closed manner.

You always need something structural at the top to abstract away from direct references.

Shades of nominality

I think my use of terminology may have lost you some time ago. I mean that modules are nominal in the same sense that data types are nominal. These are incompatible:

data Foo = Bar | Baz Foo
data Nat = Zero | Succ Nat

We do know their structure (and so they're structural in that sense), but type compatibility is based on the name (Foo or Nat) not the structure.

So back to modules, in ML you write down a structure as a record which gets some inferred type and then you have the problem of matching that to a signature (the "subtyping worries" I mentioned). My approach is more like Haskell. You'd write something like this:

theory -- optional name omitted
    type Monad(a)
    return: a -> Monad(a)
    bind: Monad(a) -> (a -> Monad(b)) -> Monad(b)

context ListMonad
    type Monad(a) = List(a)
    return(x) = [x]
    bind(xs f) = xs.map(f).flatten

As soon as you assign to Monad(a), the context (structure) you're building needs to satisfy the theory (signature) of Monad.

So... do you see any problems ahead for me with this scheme?

Similar Thoughts

This is very much along the lines of my own thinking. All modules need a signature, which can optionally include axioms.

Yeah, you've written a

Yeah, you've written a number of things on LtU in which I see parallels with my own project. I do allow axioms in theories.

Module identities

I'm not sure I understand. I agree it's fine for modules to have identity (they have identity in some MLs as well). But how does this avoid structural (sub)typing, while still allowing the kind of separate compilation I was describing?

Specifically, can I define a structurally equivalent theory elsewhere, and they are compatible? If not, how can I independently compile two units using the same theory?

As for Haskell, Backpack, the proposed package system for it, notably is a module system with structural signatures, very much in the tradition of ML modules.

Explicit coercion subtyping

Backpack looks neat. I'll have to investigate that more carefully. That looks very similar to the style of my system (the holes and polarity stuff).

I wonder if the difference is that I require explicit subtyping coercions in situations where you'd expect things to just work?

Looking at the Backpack paper here, figure 2 on page 4 shows a snippet where two array packages are defined. The way I would do this is that arrays-a and arrays-b would need to explicitly pull in arrays-sig. I don't ever link symbols by name. Symbols are always tied to shared declaration sites. With that in mind, where does this example require subtyping? If it doesn't, do you have an example that does? Or do you think that requiring the common arrays-sig breaks separate compilation somehow?

Correct me if I'm wrong, but

Correct me if I'm wrong, but I think the issue is whether string names are part of a module's interface. In ML modules the names of the functions are significant. This essentially makes modules compile-time maps of strings, just as in JavaScript objects are run-time maps of strings. This is unlike e.g. Java, where polymorphic methods always explicitly refer to the interface that they are implementing. ClassA.foo may be a totally different method than ClassB.foo even though they have the same name. They only become compatible if ClassA and ClassB inherit from a common superclass or interface that declares foo. This means that string names are in some sense only relevant for the programmer and not for the compiler. The identifier "foo" is declared in a unique place, and from that point on it would be possible to refer to it by unique ID rather than by name. So the issue isn't that modules have identity, but that signatures have identity.

This doesn't prevent separate compilation: a module and its client can be compiled separately, they just both need to refer to the same signature and not just "accidentally" have the same shape.

It does prevent

It does prevent separate compilation. In the Java model, to compile one class "module" implementing an interface, you need access to the interface "module", and any classes or interfaces it mentions transitively. There is no mechanism in the language that allows breaking this dependency.

I don't think Java allows

I don't think Java allows post-hoc interface extension, so a class in Java is required to list all of the interfaces it implements. That hurts modularity, since anyone who uses a class must know about all of the abstractions it satisfies. Though Jules' specific point seems correct - the client and module can themselves be separated. My system doesn't require listing all interfaces satisfied at a declaration point. An interface is a dictionary and can be filled in later. I still don't see any problem with separate compilation.

Clarification attempt, plus alternatives

I think Andreas has a central point about separate compilation: If a module uses any name, that name's signature must be described somewhere in that module, or else the module can't be independently typechecked.

This is a barrier for nominal typing. Every client needs to be able to identify the unique type they're actually importing. If they can't, then they can't determine that two imports of type (Foo -> Bar) and (Bar -> Baz) actually use the same type Bar. But if the client module tries to describe Bar itself, it can't; that's the point of nominal types.

I'm not sure, but there might be an approach to working around this. Nominal logic doesn't just give us a quantifier over fresh names; it also lets us assert equality of names. So to manage nominal typing, module authors could diligently export enough equality propositions so their clients could glue (Foo -> Bar) to (Bar -> Baz). However, if we have any syntax for these equality assertions, such as (ModuleA.Bar = ModuleB.Bar), then we can probably write (ModuleA.Foo -> ModuleA.Bar) in the first place, and we're back to structural typing.

I think technically that return to structural typing doesn't cover all uses of nominal typing, but would we miss out on anything we care about?

My system doesn't require listing all interfaces satisfied at a declaration point.

Then you couldn't expect to typecheck interface conformance at the implementation site. Maybe that's what you expect, and you want to typecheck it at the usage site. Then the module containing that usage site must describe the signatures of all the interfaces that need to be checked.

It would be useless to put the interface's signature in a module of its own. Both the implementation site and the usage site would be typechecked independently of that module, so that module would never be consulted.

Injecting some of my own opinion on this, I think separate typechecking is useful to care about, but it gets buried in other issues. A typechecking-time dependency is just another name for an extension to the typechecker, and I like the idea of a modular typechecker, so why not?

So I look at my own module system design sketches as a matter of assigning rich program semantics to independently typecheckable clusters of modules, while also providing algorithms for crawling through clusters, diagnosing missing pieces, and doing DBMS-like tasks on the stored module data. A single module may not have rich enough semantics for programming or math, but the module crawler can bundle it up into a cluster, and a cluster does.

Even in a scheme like this, the signature of a cluster would tend to be structurally typed by the above argument.

Kicking the can down the road

Thanks for the clarifications, I agree with most of what you said. Let me reinforce one other point, though.

Even if you do not care about fully separate compilation (I do, especially at scale), having nominal module interfaces is ultimately just kicking the can down the road. You have to deal with a structural notion of interface descriptions at some level, if not explicitly then at least implicitly.

Assume a language with nominal module interfaces. Now you are building a package that provides some such interfaces and some modules implementing them. Somebody can then write other packages dependent on those, using an incremental compilation approach.

But how do you express the interface of the package? It exports nominal interfaces, but at the package level, you somehow need to describe their structure. Packages essentially would be a (structural) module system one level up. That is exactly the situation with Backpack in Haskell.

You can argue that the language does not need to have explicit syntax for interfaces at package level. But they still exist semantically, and I'd argue that not being able to express them makes the language incomplete. At some point, the ability to express these interfaces will likely be needed, and at that point, people typically resort to ad-hoc solutions.

Again, Haskell may serve as an example: an implementation like GHC always had its .hi files to (structurally) describe the interface of Haskell modules. These are outside the language, and their syntax and semantics are not properly defined anywhere. Mostly you can ignore them as a programmer, except when you actually need module-level recursion. Backpack essentially turns these into a proper, semantically well-defined notion of structural interfaces, in a module language one level up.

That works, but of course, introducing a structurally typed package system as a sort of meta module system, yet another language level, introduces significant redundancy. It is far more economical to make the module system itself structural, and thereby directly self-contained.

Nothing to see here

Move along

automatic lifting

You could include a mechanism for automatic lifting from one level to another, like universe inference in a dependently typed language.

Nominal

I think you and Ross may both still be using nominal differently than I am. Specifically, it sounds like you're imagining this:

theory
    import TheoryNameIHaventSeenDefined

This is useful with datatypes, where you can generate code that passes around a handle to data without ever inspecting it. Your comments on kicking the can down the road make sense in that context, but that is not what I'm talking about. Instead, just assume that I know the structure of any theory whose name I know.
Look at what happens when I try to encode the subtyping-nontermination example from your paper:

theory TH1
    type A1
    f1: A1 -> ()
    
theory TH2
    type A2
    f2: (TH1 where type A1 = A2) -> ()
   
theory TH3 
    import TH1
    
    type A1 = TH2

theory TH4
    import TH2, TH3
     
    unify f1 = f2
    -- Error:  TH2 =/= (TH1 where type A1 = A2)
    -- Theories TH2 and TH1 are nominally different.  Cast required.

With names, there are only ever a bounded number of theories around, so unification can't explode in this kind of situation. They can also act to spell out a coinduction to the compiler -- assume same-named things are equal -- where it might be undecidable for general structures, since you don't know where to give up and stop expanding things.

Not quite

Specifically, it sounds like you're imagining this: [...]

Do you mean an unbound identifier? If so, then no, that's not what I have in mind (nor would ever want to support).

Let me try to explain what I meant using your example. Imagine your example was a file, i.e., a compilation unit, and you want to be able to compile units like that separately. How would you express a specification of what that unit defines, other than structurally? Because the "interface" of such a compilation unit would need to include descriptions of the theory definitions it provides.

Regarding the content of the example, I don't think it reflects the one from the paper, for several reasons: (1) the members have different names in different theories, so are unrelated, (2) you are comparing the types of f1/f2, not the types TH1 and TH3, and most importantly, (3) you are only comparing for equality, not for inclusion (equality is easier, and perfectly decidable in ML as well).

Also, I'm confused how the issue of decidability of this example relates to the current discussion about the purpose of structural typing. Isn't that a separate issue?

Theories implemented as fields

Do you mean an unbound identifier? If so, then no [...]

Oops, I guess it was me imagining that.

(1) the members have different names in different theories, so are unrelated

Ah! I wonder if Matt's named theory types could desugar to a format where the theory names turned into field names (or parts of field names). They're practically nominal types within the module, but they don't challenge the structural typing of the module interface any more than field names would.

That might have an interesting effect: After desugaring, all theories in a module would have different field names. This would make it challenging to use two different same-named theories from different modules, but I can see a couple of approaches Matt might take: Have a theory..as syntax reminiscent of import..as, or just make do with defining little modules that have no purpose but to translate a module so it uses one theory name rather than another. In the latter approach, subtyping between theories would practically be the same as equality during per-module typechecking, but proper subtypes could still matter for linking exports to imports.

Matt, is this anything like what you're thinking of? It probably wouldn't help for me to bring up a system nobody's actually talking about. :)

Matt, is this anything like

Matt, is this anything like what you're thinking of?

Well, see my post to Andreas: I don't use names as anything other than a way to point at which symbol you mean. So each theory gets a unique theory symbol and resolving name conflicts is a pedestrian problem of adding qualifiers.

Structural

Regarding the content of the example, I don't think it reflects the one from the paper, for several reasons [...]

You're right. I was trying to understand whether the example applied to my system, or if not, why not. I suspect it doesn't and it's related to nominal theories, but I'm confused at the moment as to what the right way is to map the example into my setting (witnessed by my dead post above). This part of the discussion probably doesn't belong here, though, as the topic is 1ML.

How would you express a specification of what that unit defines, other than structurally?

Definitions of fresh symbols can introduce both a theory and a satisfying instance at once, so there isn't a regress.

I think the difference in our viewpoints stems from how we view the role of modules. I gather that you're looking at them as a way to package up arbitrary declarations into a bundle. From that point of view, names belong in the module system.

I view name management as user interface / HCI. It's not something that is semantically nice and I think we shouldn't be routinely packaging up aspects of name management into first class objects. As an example, suppose we add localization support into our language: exported symbols can have names and documentation provided in several languages and each client module chooses which language to be written in (and is allowed to name symbols with the localized name). Should this feature affect the module system?

Obviously I think it shouldn't, and so I don't want first class packages and first class modules to be the same mechanism. First class modules are about logical separation and abstraction. First class packages should only be used when the intention is to manipulate code.

Names

Definitions of fresh symbols can introduce both a theory and a satisfying instance at once, so there isn't a regress.

Hm, I'm afraid I don't understand what you mean by that.

I don't quite follow your concern about names. A module provides a set of definitions. There needs to be a way to address the individual definitions. AFAICT, there are only two options: by index or by name. Obviously, the first is far worse, and labelled products are preferable to unlabelled ones. Other than that, names shouldn't matter indeed.

Finally, about modules vs packages: of course, the point of 1ML is to have a language with fewer levels and less redundancy rather than more. Making a rather arbitrary distinction between modules and packages is a step in the wrong direction, as far as I'm concerned.

Modularity of modules

I see these three as distinct mechanisms:

- Packages: Units of code distribution
- Theories: Units of logical correctness
- Namespace management: Bringing symbols into scope and giving them names

If you think combining them into one mechanism would necessarily make things simpler, you should check out object oriented programming :).

But in honesty, I haven't really been focused much on programming at scale. My instinct is that if the only reason you're splitting a symbol declaration from its definition is to support separate compilation, then that's something the tools should be doing for you. I think theories should be introduced for logical modularity.

The library and client

The library and client can still be compiled separately. That you need the interface to compile the library/client is irrelevant since it's just a type definition. In the ML approach you could duplicate that type definition in the library and client but I don't see what that buys you in practice.

Separate compilation is largely a red herring anyway, since (1) what we really want is fast & incremental compilation, and (2) we want cross-module optimizations. So some sort of dependency tracking combined with caching is probably a better solution (if Foo.f gets inlined into Bar.g and Foo.f is changed, then Bar.g needs to be recompiled).
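
A minimal sketch of the kind of dependency tracking meant here, assuming a map from each definition to the definitions that inlined it (all names invented):

import qualified Data.Map as Map
import qualified Data.Set as Set

-- definition -> the definitions that inlined it (its dependents)
type Deps = Map.Map String (Set.Set String)

-- everything that must be recompiled when `changed` is edited:
-- the transitive closure of its dependents
dirty :: Deps -> String -> Set.Set String
dirty deps changed = go (Set.singleton changed)
  where
    go frontier
      | Set.null next = frontier
      | otherwise     = go (Set.union frontier next)
      where
        next = Set.difference
                 (Set.unions [ Map.findWithDefault Set.empty d deps
                             | d <- Set.toList frontier ])
                 frontier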

what we really want is fast

what we really want is fast & incremental compilation

This is actually quite easy to achieve, but it is an uphill battle convincing the community to implement their type systems differently. I'm not aware of one truly incremental type checker that I didn't implement myself.

Incremental type-checking

In the OCaml community, my understanding is that we haven't really felt the need for incremental type-checking yet.

The main tool with incremental needs is Merlin, an editor assistant which interprets the incomplete program chunks sent by editors and provides IDE-like services. It does incremental parsing (by teaching a parser generator how to expose its intermediate state), but the OCaml type-checker is fast enough to just keep snapshots (indexed by parsing state) and run the non-incremental implementation on them.

I'm not a Merlin developer myself, but my impression is that the typechecking-related performance issues came from incorrect programs that depend on extremely large external libraries: the cost of loading those library interfaces (in the hundreds of megabytes) can dominate the compilation time if the compilation fails early with a typing error. So it would seem that a different internal representation of types (that does not require interfaces to be traversed/freshened at import time) or a form of cross-typechecker-call sharing of large dependencies would be more important for them than actual incrementality.

Incremental type checking is

Incremental type checking is not just about performance, but also about using history to report type errors better. A stateless type checker does not take into account the order in which code was typed, while a continuously incremental one does (since the code either existed or not after the last edit).

Also, being incremental would absolutely benefit a slower compiler like scalac, especially if you don't want type checking to lag programmer typing by a few seconds or more. I reject the excuse that such long delays are tolerable or even desirable in an IDE (as is also claimed for Visual Studio/C#); these are just rationalizations of poor user experiences.

testify!

word. up. lots.

compositional type checking

I wonder how this would compare to a compositional type inference algorithm with undo (reversible unification). You could subtract a program fragment (undoing the unifications that joined that fragment to the rest of the program type) and then add the new fragment in.
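
The undo machinery such an algorithm needs is essentially a trail of mutable-cell writes that can be unwound, much as in Prolog implementations of unification; a minimal sketch under that assumption (all names invented):

import Data.IORef

-- a trail of undo actions, newest first
newtype Trail = Trail (IORef [IO ()])

newTrail :: IO Trail
newTrail = Trail <$> newIORef []

-- record the old value before overwriting, so the write can be undone
setWithUndo :: Trail -> IORef a -> a -> IO ()
setWithUndo (Trail t) ref new = do
  old <- readIORef ref
  modifyIORef t (writeIORef ref old :)
  writeIORef ref new

-- unwind all recorded writes, newest first
undoAll :: Trail -> IO ()
undoAll (Trail t) = do
  undos <- readIORef t
  sequence_ undos
  writeIORef t []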

Can you expand on the idea

Can you expand on the idea of "incremental type checker", maybe provide an example?

Give me a week, (or two, but

Give me a week, (or two, but probably just one), and I'll give you a video.

Separate compilation matters

Separate compilation is largely a red herring anyway

Emphatic disagreement. Separate compilation is a goalpost that signals that you achieved the design of a modular language. "Fast & Incremental" is not harsh enough a requirement to force you to describe your interface language, and this is key to programming practice. It's not (only) about compilation time.

That said, I agree with your point that nominal typing is not incompatible with separate compilation: a library and its users can still be given meaning separately, they only need to reference a common interface in an ancestor module.

On the other hand, I have observed in practice that requiring people to agree on a single ancestor module can be a cause of compatibility issues and split library communities. This works well when the nominal type or signature is part of the "standard library", but can be socially problematic with grass-roots library development -- whereas structural types have been observed in the wild to make this easier.

Separate compilation is an

Separate compilation is an ideal that is necessary but not sufficient, and it is also a fuzzy property, not a binary one. How much implementation detail does a module have to expose to be recompiled? Keep in mind that signatures are just an abstraction of these details.

Structural types only deal with syntactic type incompatibilities, not semantic ones. The reason Jiazzi went nowhere is that people were more concerned about the latter problem than the former; assemblies with hard versioning are what the industry moved to.

It had nothing to do with the lack of structural types (which, again, Jiazzi provided for Java); it turns out this is a concern that mostly exists only in academia, and people in industry have more difficult problems to solve.

For a modular language you

For a modular language, you only need the type checking of the client to depend only on the signature, not on the implementation, of the library. That's of course a very important property that should not be violated. Whether compilation of the client depends on the implementation of the library is irrelevant for language design: given any possible language you can always design some run-time representation that makes separate compilation possible, so it's not actually constraining the language design at all. It's only constraining the implementation of the compiler to produce slow code, or to do optimization in the linker, which doesn't really deserve to be called separate compilation.

If you are allowed to make your own linker, then the term separate compilation is meaningless, except for modular type checking, since the linker is presumably not allowed to signal a type error. If you are not allowed to make your own linker, but are instead forced to use the C linker or some other linker, that is quite an artificial constraint on your compiler. That's why, in my opinion, the requirements of modular type checking and of fast and incremental compilation should not be put together in a single separate-compilation requirement. They are two different requirements that deserve different solutions.

I agree

Requiring machine code generation in an abstract setting is a bad idea; the most obvious example is generating code that manipulates an abstract numeric type. If you implement that with pointers and the ground type turns out to be Nat or Float, you just lost a whole lot of performance.

ML does not agree

I agree that the interface definition should facilitate separate compilation. With ML there is a problem, and that is that associated types can vary by modules (implementations) for a single signature (interface). It seems to me this design decision is the cause of a lot of the type system complexities in this paper.

I think the simpler solution is to make all types that are visible outside of an implementation either defined in the interface, or completely dependent on a type parameter of the implementation.

We can achieve similar results to functors in ML by having type level functions on interfaces and even subclass hierarchies of interfaces.

I'm not an expert in ML but it seems to me it has the same issues as Ada generics, where the thing you really want to work with is the signature not the module.

So for translating into ML speak when I say module I mean a pair of interface (ml signature) and implementation (ml module).

deploying interfaces/signatures

I think I understand the distinction you're making between implementing a Java interface and satisfying a module signature, but I'm not sure what the implications are from an engineering perspective.

To restate the problem: In Java, if library A provides a bunch of code plus an interface and library B implements it, then library B cannot be compiled without having first compiled library A. Having a copy of the interface in B would create a different interface.

And the proposed solution: In ML, if library A provides a bunch of code plus a signature and library B satisfies it, then library B *can* be compiled without having first compiled library A, provided that library B contains a copy of the signature, which will equate structurally with the original signature.

Is that reasonably accurate?

My question is: How do you, from a practical/engineering standpoint, deal with the "copying" of signatures?

It seems like a bad idea to copy/paste signatures between projects, so you'd have to store them in a separate repository to be shared independently. In Java, the compiler is generally fast enough not to bother with this, but it's not totally unheard of to have jars that only contain interfaces (especially from dependency injection and/or TDD mocking aficionados).

Even still, in both cases you'd have libraries A and B both depending on interface/signature library X. In the Java case, A and B can't be compiled without having a compiled X on hand. In the ML case, A and B can't be compiled without having the source for X on hand.

Am I missing something?

Removed

[Removed joke because it was dumb.]

I don't think the situation in ML is really that bad. You put your signatures in a common place and share them. When you're writing the module, you look at the interface in your editor and make sure you're satisfying it. When you try to seal the module to the signature, you'll get an error if you did it wrong.

I'm still missing something...

"You put your signatures in a common place and share them."

OK, sooo how is that different than what you do with Java interfaces?

It seems like the practical outcome differs not by separate compilation, but by tolerance of multiple-compilation.

dislike

One thing I strongly dislike about there being e.g. .h and .c files is that then who really knows which one to document/comment? Especially in languages without visibility modifiers.

One more example

Here's the ML example you give in your paper that breaks type checking:

type T = {
    type A;
    f: A -> ()
}
type U = {
    type A;
    f: (T where type A = A) -> ()
}
type V = T where type A = U

If I had theories-as-types in my more nominal approach, I think the only way to introduce the same structure would be to explicitly indicate the recursion:

theory
    type A
    f: A -> ()

theory
    type B
    g: (theory of A where type A = B) -> ()

theory TH
    h: TH -> ()

    -- extends theory of A:
    type A = TH
    f = h
    
    -- extends theory of B:
    type B = TH
    g = h^ -- cast to (theory of A where type A = TH) -> ()
           -- works because (theory of A where type A = TH) <: TH 

That is, I don't think there's any way the recursion could crop up during unification of non-recursive looking things.

That said, recursively using a signature as a type in the signature being defined looks kind of weird. Is that something you allow?

Elm's Records

Elm's records provide an interesting approach to this, too: http://elm-lang.org/learn/Records.elm

Hm

Hm, can you elaborate? From a quick glance, I only see standard row polymorphism.

Compared with Scala objects

How does this proposal compare to Scala objects/traits, which also are first class and I believe have similar expressiveness? Specifically it would be interesting to compare it with the DOT calculus.

ML and Scala are incomparable

Scala does lack quite a bit of the expressiveness that ML modules provide (and vice versa). I actually answered a related question on StackOverflow a while ago, although that only considers the much simpler modules of SML.

Unfortunately, I don't know the DOT calculus well enough to say much about it.

type classes of module signatures.

I want to gather together my thoughts scattered above to get some opinions. I will try and use ML like terms to avoid confusion. If we say modules must only be values (no types that depend on the module instance), and all modules must have a signature (which is a type and can therefore have associated type definitions), we avoid the complexities involved in 1ML. We can have functors as type level functions (type-families) on signatures. If we wanted modules with associated types, they would have different signatures, but could be related by a type-class. Any problems with this approach?

Well

First, you will have a tough time arguing that this resembles modules. Second, throwing in type families does not avoid complexities, it adds them – the type theory of type families is significantly more complicated than System F.

Something that enables separate complilation

So if not a module, what do I call something that packages code fragments for separate compilation, and allows different implementations of the components to be substituted, functors created etc?

Personally I find the theory of type-families simpler (more regular), and orthogonal to the other features I want (records and implicits).

What is a module?

I don’t think there is a proper authoritative definition of what a module is, but the common ground, going back way into the 70s, seems to be that a module is a

  • unit of grouping
  • unit of encapsulation
  • unit of compilation (and potentially, deployment)

The first means that you are able to take an arbitrary set of language-level definitions and wrap them into a module, without otherwise changing their form or meaning. Often, that also implies that modules are a name spacing mechanism.

The second means that a module is able to hide or privatise certain details from its implementation, either by hiding certain components completely, or by making details of their definitions abstract. The former typically implies that a module has its own local scope, the latter the ability to make types abstract.

The third means that modules are "independent" entities, and have a notion of interface (implicitly or explicitly) that is a sufficiently complete and self-contained description of what the module defines to allow separate compilation (or at least separate type checking) against uses of a module.
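
For concreteness, here is a minimal OCaml sketch of all three criteria at once (the example itself is mine, purely illustrative); split across files, the sig part is what a .mli would contain and what clients compile against separately:

module Counter : sig
  type t                       (* encapsulation: the type is abstract *)
  val zero : t
  val incr : t -> t
  val to_int : t -> int
end = struct
  type t = int                 (* grouping: definitions wrapped unchanged *)
  let zero = 0
  let incr n = n + 1
  let to_int n = n
end

Clients can be type checked (and compiled) against the signature alone, without ever seeing the struct.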

This is what modules provide in almost any language I’m aware of, be it Modula-2, ML, Haskell, or even proposals for C; take your pick. Now compare: does the thingy you describe provide any of this?

As for the complexity of type families, have you actually looked at the theory? ;) To express modules, type families are neither necessary nor sufficient, and primarily overkill.

All good.

Okay, so what I am proposing meets all those criteria. Based on the way you had been arguing against it, I had assumed you had some other definition in mind.

Er

That was a rhetorical question. ;) I think the correct answer is that it meets none of these criteria, for reasons I already pointed out, scattered across earlier replies.

Type Families and Signatures.

Type families are just type-indexed-types which we covered in the HList paper (and implemented using type classes, although primitive implementations are a lot simpler, avoiding functional dependencies).

I can only conclude you haven't really understood what I am proposing, as it seems to trivially satisfy the requirements of being a module.

- Unit of grouping: The module is the unit of grouping for values. The signature is the unit of grouping for types.

- Unit of encapsulation: The module groups values, the signature groups types.

- Unit of compilation: The signature defines the interface for separate compilation. You reference the signature to use a module (without tying it to an implementation). You define implementations of signatures by referencing the signature.

The restriction that "modules" of the same signature must not have visibly different types does not affect any of the above. You can use subclass relations on signatures to substitute implementations with different visible types (module signatures). Signatures would either be in separate files, or the compiler would extract them into interface files automatically.

signature Array -- abstract signature, has no possible implementations
    type ValueType :: *
    type IndexType :: *
    get :: Array -> IndexType -> IO ValueType
    set :: Array -> IndexType -> ValueType -> IO ()

signature ArrayOfInts <: Array
    type ValueType = Int
    type IndexType = Int

We would then provide the implementation of ArrayOfInts (but there may be multiple implementations, which can be passed first class anywhere an ArrayOfInts is acceptable):

module ArrayOfInts
    get array index = do ...
    set array index value = do ...

I am just using the keywords "signature" and "module" to reference the ML equivalents. Actually a signature is nothing but a modified record type, and an implementation a value of that record type.

We might go on to define a map as a 'functor' on two array signatures (implementations can vary at runtime):

signature Map -- abstract signature
    type KeyType :: * <: Array
    type ValueType :: * <: Array
    lookup :: Map -> KeyType -> IO ValueType

Neither here nor there

Indeed, I haven't understood what you are proposing. And I still don't. Previously you sketched something based on type families, where "signatures" were classes and type components were indexed, and which didn't qualify for any of the criteria above. Now it seems you are proposing something completely different. Also, all of a sudden you throw in HList, which again seems entirely unrelated. (Are your "records" actually HLists? If so, I don't get how any of this is supposed to work.)

At first glance, your examples look very similar to what I describe, but you haven't explained what the semantics is, nor where you think it is different. Or how it scales to higher-order polymorphism. Or how you avoid undecidability.

I'm also confused about your other claims:

- Unit of grouping: The module is the unit of grouping for values. The signature is the unit of grouping for types.

If the "module" itself cannot contain type definitions, then it is not a grouping mechanism as I described it. Moreover, what semantic relevance has a signature with types? How do you access these types?

- Unit of encapsulation: The module groups values, the signature groups types.

Which is just repeating the above without saying anything about encapsulation. Can a record contain private definitions? Can it abstract type definitions?

- Unit of compilation: The signature defines the interface for separate compilation. You reference the signature to use a module (without tying it to an implementation). You define implementations of signatures by referencing the signature.

I don't know what this means. If signatures are just ("modified"?) types, then naturally you can have as many different values for them (i.e., modules) as you want. If not, I don't know what they are, but certainly not types in the usual sense of the word.

Your "functor" example also doesn't square up for me.

I'm sorry, but you are just hand-waving from left to right. Until you actually work out a coherent story, write it up in a sufficiently precise and comprehensive manner (which in a tricky context like this, probably means sketching a formal semantics), and provide at least some plausible evidence for your claims, discussing this further is kind of pointless, and I don't feel the greatest urge to do so.

Its all from HList

Let's start with HList, where we implemented type-indexed-types in Haskell. The key thing is that a type family is a type-indexed-type. The 'key' is the first type, the value the result, so:

{-# LANGUAGE TypeFamilies #-}
data A = A; data C = C  -- the key types, needed for this to compile
data family X a
data instance X A = B
data instance X C = D

Creates a type-indexed-type called 'X'. The first key in 'X' is 'A', and the corresponding value is 'B'. As a type-indexed-type we would write this:

X = {A = B, C = D}

In the HList paper we would have written this (note X is a type, so we cannot really write the above in Haskell without a type assignment operator, so let's pretend the environment is an anonymous HList):

X :=: ((A :=: B) :*: (C :=: D))

The 'records' are not HLists, they are type-indexed-types, and type-indexed-types are built into the type system, so they are plain types.

Signatures are types, and only types, just like a record definition is a type. The 'type-indexed-types' are open, so the Array signature is actually the empty type-indexed-type for IndexType and ValueType, plus the types of the get and set functions. It's just like a record type with the addition of the type-indexed-types. They are clearly plain types; for example, the type of the signature above is:

Array = {
    IndexType = {},
    ValueType = {},
    Get = ArrayOfInts -> IndexType -> IO ValueType,
    Set = ArrayOfInts -> IndexType -> ValueType -> IO ()
} 

Where '{}' are type-indexed-types and we create 'Get' and 'Set' as the type-level keys for the values 'get' and 'set'.

The signature for ArrayOfInt defines entries for 'ArrayOfInt' in the type-indexed-types of the Array type. With the ArrayOfInt signature in scope we now have the type:

Array = {
    IndexType = {ArrayOfInt = Int},
    ValueType = {ArrayOfInt = Int},
    Get = ArrayOfInts -> IndexType -> IO ValueType,
    Set = ArrayOfInts -> IndexType -> ValueType -> IO ()
}

So far everything is just plain types.

The 'module' itself cannot (and in my opinion should not) contain type definitions. That is what the associated signature is for. So in ML you would have a module for ArrayOfInt; in my system you need a pair of signature and module. This is good because the signature is all you need for separate compilation, and it can be distributed separately from the implementation in the case of a binary library distribution (like C header files).

So the pair of 'module' and 'signature' make a grouping mechanism as you describe it. All the types have to go into the signature. Many languages have this separation (C++ classes, Ada packages), but not done cleanly enough that any language feature can be packaged.

Manual phase splitting

Okay, so the one thing I think I understand now is that you are using type families to encode type-level records. But I don't understand why. If all you want is type-level records then add type-level records to the language. It is a trivial extension. Why would you want to introduce the semantic hammer of type families just to encode something trivial?

But the more serious point is: by forcing separation of types and values you are completely missing the point of modules. You are basically dropping down to the low level of something analogous to raw System F. At that point, I could just as well program in System F plus records itself, your suggestion doesn't seem to add anything interesting.

But nobody wants to do that. As I mentioned earlier, you would be putting the burden of manually phase-splitting (separating into types and terms) each and every individual module on the programmer. That does not scale. It is not compositional. To express a functor, the programmer would have to abstract and supply types and terms separately. To nest modules, she would have to create two separate, parallel nesting structures, one for types, one for terms. To express the equivalent of a higher-order "functor", she would have to write two symmetric higher-order functions, one for types, one for terms. And if she actually wanted to pass a "module" around as a first-class entity, she would have to manually bundle both types and terms together after all, by manually forming an existential type. And if she wanted to unbundle that first-class thing, she'd be stuck in the local scope of existential open.
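
A rough OCaml rendering of that burden (the encoding is mine, just for illustration): the functor keeps types and terms together, while the phase-split version threads the type component as a type parameter and the term component as a record, separately and in parallel.

(* Together, as a functor: *)
module type ORD = sig
  type t
  val compare : t -> t -> int
end

module MkSet (O : ORD) = struct
  type elem = O.t
  let empty : elem list = []
  let add x s =
    if List.exists (fun y -> O.compare x y = 0) s then s else x :: s
end

(* Manually phase-split: the "type half" is the parameter 'a, the
   "term half" is a record of operations, and every client has to
   pass both around in sync. *)
type 'a ord = { compare : 'a -> 'a -> int }

let mk_set_add (o : 'a ord) : 'a -> 'a list -> 'a list =
  fun x s -> if List.exists (fun y -> o.compare x y = 0) s then s else x :: s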

And this isn't even touching the question of what to do with language definitions that simultaneously introduce both types and terms, like e.g. datatypes in FP, or classes in OO. You would need a way to phase-split all of them!

The whole point of modules is that they enable you to consistently bundle definitions of diverse kind into one coherent entity, and that you can aggregate, abstract, apply, encapsulate such entities compositionally. Desugaring this compositional notion of module into low-level System F, by automatically phase-splitting all definitions, is exactly what the F-ing elaboration does for you.

There are a lot of other things that remain mysterious about your proposal, like how you define subtyping or matching, how you allow encapsulation, and so on, but I'll stop there. Basically, what you propose is simply not solving the problem that modules are intended to solve.

Type-Indexed-Types and Associated-Values

Type-level records, type-level maps, type-level sets, etc. are all examples of type-indexed-types. Type-indexed-types are simpler to implement, and more general than any of these other things, which can be built on top of them fairly trivially. You seem to find type-indexed-types somehow complex or difficult, and I don't understand why. They are the core of everything in the HList paper, and it makes quite a strong argument for them, although not explicitly. If you re-read the HList paper (and the follow-up OOHaskell paper) and think about what kind of language would make expressing these kinds of concepts simple and natural, it should give you an idea of where I am coming from.

I strongly disagree with your point about 'missing the point of modules', and your statement "nobody wants to do that" is clearly false as at least one person (me) does want to do that. I really don't think what I am trying to do is that original or weird, but from your reaction it seems perhaps it is?

Looking at the functor issue: yes, specifying a functor has two parts, and deliberately so, because a functor may have more than one implementation. The signature defines the interface and its relation to other types, so for example a Map might be a functor on two parameters, both of which are random-access containers. Something like "Map 'a 'b" where 'a is a random-access container, and so is 'b. This is a type signature for all map implementations. So the functor is really just a parametric type with constraints on the types of 'a and 'b. All these examples have applicative semantics, but generative is possible if the signature type is in a monad. Using IO as an example (remembering that I am using a monadic effect system based on type-indexed-types and don't actually have an IO monad):

type GenMap a b = IO {...}

To pass the module around as a first-class entity does not require any existential types, as the type will be inferred automatically. This is the same for any first-class value: you can pass a String or an Int, but whether it is a string or an int does not require existentials. For example:

id :: a -> a
id x = x

putStrLn $ show $ id "xyz"
putStrLn $ show $ id 3

Notice how I did not need an existential to pass the string or the int to 'id'? Now I can obviously pass any module to id as well (even though different modules have different types).

The only time you might need existentials is for runtime/dynamic polymorphism, as the above handles all static cases. As I pointed out earlier, you don't need existentials if you have open data types (at the moment I prefer open data types to existentials for simplicity, but this is not a final decision). For runtime polymorphism a simple datatype like the following could be declared:

data Array = IntArray ArrayOfInt | FloatArray ArrayOfFloat ...

However looking at the signatures for Array, we already have something like this type defined in the type-indexed-type 'ArrayType' which can perform the same function.

This leads to the general solution for datatypes: we encode a datatype as a type-indexed-type plus a bunch of singleton type constructors, so that the above becomes:

Array = {IntArray = ArrayOfInt, FloatArray = ArrayOfFloat}

Where the indexes are automatically considered 'newtype' definitions.

So now we can use this in a type-case (see the variant example in the HList paper):

case (x <: Array) of
    IntArray -> ...
    FloatArray -> ...

and we have runtime polymorphism and an encoding for datatypes that cleanly splits types and values. We do allow types to have associated values, but not the other way around. Records already do this. So the phase splitting is no more complex than the existing type definitions for records.
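
If it helps, the nearest existing analogue to such open data types is probably OCaml's polymorphic variants (the comparison is mine, not part of the proposal): constructors live in an open row, and a type-case is just a match with a catch-all.

(* Open variants as a rough analogue of the proposed open data types:
   new constructors can be used without editing a closed declaration. *)
let describe a = match a with
  | `IntArray _ -> "array of ints"
  | `FloatArray _ -> "array of floats"
  | _ -> "some other array"                (* the row stays open *)

let s1 = describe (`IntArray [| 1; 2; 3 |])
let s2 = describe (`BoolArray [| true |])  (* a later "extension", still typed *)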

The subtyping is defined as a constraint on types, effectively a type class that relates the subtypes to their supertypes.

The encapsulation mechanism (as in restricting access) is something I am still working on, and is less well defined. It might be worth discussing if you agree with me so far.

Things I can't figure out

- What's the point of outlawing types in modules? Couldn't you de-sugar types-in-modules to a local signature and a module implementing it?

- The syntax is a little funky since your signature statement is actually introducing a type of the given name inside of a new signature.

- If your signature isn't functionally dependent on any particular type, you have to introduce a "phantom" type, upon which all types are dependent?

- What does code look like that uses an abstract type like Array?

FYI: I strongly agree with Andreas that thinking carefully about what these things mean is a good idea.

Types in Modules

- Types are outlawed in modules because they belong in signatures. I want a clean separation between the type level and the value level. It means that modules can be treated as plain values and be first class without any complexities. One reason to do this is to make the implementation and type system simpler to understand; a second is to keep people's thinking clean and simple when they program in this language.

- I am open to improvements in syntax; note I don't actually use the above syntax, I use something more like Haskell record syntax with associated types. I was trying to highlight the way ML concepts map into this system. Introducing the type is simply adding into type-indexed-types. See the explanation of the type system in the response above.

- No, each signature is a plain type. Its type is its type. I have written out some of the types using type-indexed-types in the response above.

- It looks like code that uses a record. I would of course use iterators to access the Array, so we would actually have something like the following (note I will use the IO monad, but I am actually using an effect system also based on type-indexed-types):

type Array = {
    type ArrayType,
    type IteratorType,
    begin = ArrayType -> IteratorType,
    end = ArrayType -> IteratorType,
    empty = ArrayType
} 

type ArrayOfInts = Array :> {
    type ArrayType = ArrayOfInts,
    type IteratorType = ArrayIntIterator,
} 

type Iterator = {
    type IteratorType,
    type ValueType,
    source = IteratorType -> IO ValueType,
    sink = IteratorType -> ValueType -> IO (),
    successor = IteratorType -> IteratorType
}

type BidirectionalIterator = Iterator :> {
    predecessor = IteratorType -> IteratorType
} 

type ArrayIntIterator = BidirectionalIterator :> {
    type IteratorType = ArrayIntIteratorType,
    type ValueType = Int
}
    
reverse f l | (f == l) || (successor f == l) = return ()
reverse f l = do
    x <- source f
    y <- source l 
    sink f y
    sink l x
    reverse (successor f) (predecessor l)

let a = createSomeArray in
    reverse (begin a) (end a)

The semantics of the modules are obvious, as this uses nothing but type-indexed-types and some syntactic sugar.

Well

Previously you wrote 'signature Array = ' but then seemed to actually be introducing a type named Array with associated things. Now you've changed it to 'type Array = ' and you seem to be introducing a proper signature (record type).

The code I wanted to see is code using Array in the abstract. What is the type of 'get' outside of the signature in which it's defined? Also, a functor example with a Graph built out of an abstract Array would be useful to see.

Types are outlawed in modules because they belong in signatures. I want a clean separation between the type level and the value level.

You can always turn a signature into a type of proper values with an existential type (there exists some type components for which these are the values satisfying a signature). The problem is when you find yourself needing to refer to those types after you've sealed them up. Do examples writing functions/functors against the abstract signatures.

Open Data Types and Compositional Type Inference

I just made up the "signature" / "module" syntax to make clear the relation to ML signatures and modules. 'type X =' is the equivalent Haskell-like syntax, which is more along the lines I am working with. Syntax is of course superficial, so I am trying to stay close to existing things to make it easier to understand.

Do you use existentials with type-classes simply because type classes are open? With open data types, you can extend a data type like you can define a new instance for a type class. I am not sure I even need existentials with open data types.

In the above example, I added the Iterator intermediary as this is how I would currently define an Array, so 'source' is the equivalent of get. I have changed the syntax slightly, as I can see the supertype is too easily confused with a type constructor. Adding in the ':>' operator hopefully makes this clearer.

So there is an example use in 'reverse'. What is the type of 'source' in the example? It's something like:

source :: a -> IO c

Would be the type inferred from the definition of 'reverse'. Remember I am using compositional typing, so the type of reverse is something like:

{
    source : a -> IO c,
    source : b -> IO d,
    sink : a -> d -> IO (),
    sink : b -> c -> IO (),
    successor : a -> a,
    predecessor : b -> b
} |- a -> b -> IO ()

So it depends on the types of 'a' and 'b' passed in.

If we look at the application of reverse, we see 'begin' and 'end' applied to a. We know the type of 'a' because (assume, for this example) createSomeArray returns an ArrayOfInt type. We know the return type of 'begin' because the return type of begin applied to an ArrayOfInt is an associated type, and hence completely determined. The same applies to 'end', and hence we know the argument types of 'source'; and because the result of source is an associated type of the iterator, we know that type too, etc.

Modular type checking

So 'reverse' just gets a 'bag of constraints' type which isn't checked for coherence until all types are grounded? Or do you have some way of verifying that the constraints you have accumulated are consistent with the signatures seen so far?

Compositional Type Inference and AST slicing

With compositional type checking there are requirements and definitions. The types are checked for coherence at the point where the program fragment containing the definition and the fragment containing the requirement are joined.

This can follow the parser, so we have a parser that always outputs typed expressions / statements. Parsing a simple fragment like:

a

Infers the type {a : t} |- t and inference proceeds bottom up with parsing, as expressions are returned with types already inferred and checked.

Module signatures provide the interface between discrete components, so the inferred types have to agree with the module signatures for the current code block (for exports) and imported signatures.

So its not "until all types are grounded", but until all types are defined (basically the module boundaries). However the type of 'reverse' is a completely valid type all on its own. It's not just a principal type but is a principal typing. It should be trivial to see that the typing I gave for 'reverse' is a judgement of the form A |- M :t which represents all possible typings for 'reverse', noting that M is the definition of reverse.

It produces good error messages too. Let's say we got the parameters the wrong way around for 'sink' in the example above. It doesn't matter in which order the compiler tries to join the fragments together: when the fragment containing the use of 'sink' is joined to the fragment containing the definition of 'sink', there will be a type error, which gets explained as "you tried to use sink like this" with the types inferred from the rest of the inference, "but the definition is like this" with the type inferred for the definition (if in the same module) or the type specified in the module signature (if in a different module). There is no left-to-right bias, so if you use 'sink' polymorphically (as seen with source and sink above) each requirement gets propagated until the definition is joined; at this point we can test each use to see if it is an instance of the polymorphic type and return appropriate errors, or, if it is in fact a monomorphic type, we can report all uses, including the first, as errors.

However, there is no reason that the type checking needs to follow the parser order. You can in fact slice the Abstract Syntax Tree at any point and look at the required vs. defined types. This could be used interactively to narrow down type errors.

However, for the moment I like the simplicity of having the parser always output type-checked code. It doesn't have to stop at the first error; it could just propagate the bottom type, but for simplicity it currently terminates on the first error.

Just for fun, here's the definition of a fragment of the type-checking algorithm (for lambda calculus + let, expressions only, no definitions) in logic. This uses the logic interpreter I have on GitHub https://github.com/keean/Clors which uses sound unification, sound inequality (using constraint logic programming), and iterative deepening, which means it does not respect clause order but finds the shortest solution. As such this logic program represents a formal non-deterministic definition of the inference algorithm.

Given some basic definitions:

unifyall(cons(T, Tail), T) :-
    unifyall(Tail, T).
unifyall(nil, T).

unifyeach(cons(Tapp, Tail), Tdef, Cxt) :-
    duplicate_term(Tdef, pair(C1, Tapp)),
    unifyeach(Tail, Tdef, Csub),
    append(Csub, C1, Cxt).
unifyeach(nil, T, nil).

split(X, cons(def(Y, T), Tail1), Tail2, cons(def(Y, T), Tail3)) :-
    dif(X, Y),
    split(X, Tail1, Tail2, Tail3).
split(X, cons(def(X, T), Tail1), cons(T, Tail2), Tail3 ) :-
    split(X, Tail1, Tail2, Tail3).
split(X, nil, nil, nil).

nat(zero).
nat(succ(X)) :-
    nat(X).

bool(true).
bool(false).

This is the inference algorithm:

expr(var(X), cons(def(X, Type), nil), Type).
expr(nat(X), nil, nat) :-
    nat(X).
expr(bool(X), nil, bool) :-
    bool(X).
expr(pair(X, Y), C3, prod(A, B)) :-
    expr(X, C1, A),
    expr(Y, C2, B),
    append(C1, C2, C3).
expr(app(Fun, Arg), Cxt, B) :-
    expr(Fun, C1, arrow(A, B)),
    expr(Arg, C2, A),
    append(C1, C2, Cxt).
expr(lam(Var, Body), Cxt, arrow(A, B)) :-
    expr(Body, C1, B),
    split(Var, C1, Def, Cxt),
    unifyall(Def, A).
expr(let(Var, Body, Rhs), Cxt, T2) :-
    expr(Body, C1, T1),
    expr(Rhs, C2, T2),
    split(Var, C2, Def, C3),
    unifyeach(Def, pair(C1, T1), C4),
    append(C3, C4, Cxt).

Here is a sample program to type:

:- expr(
  let(cast2, lam(f, lam(x, lam(y, app(app(var(f), var(x)), var(y))))), var(cast2)),
Cxt, Typ).

Which outputs:

yes(Cxt5 = nil, Typ1 = arrow(arrow(T6, arrow(T7, B4)), arrow(T6, arrow(T7, B4)))).

Along with the 40 step proof, which I have omitted for brevity.

Loster

The more I stare at this, the more lost I get:

type ArrayOfInts = Array :> {
    type ArrayType = ArrayOfInts,

Is that recursion on ArrayOfInts? Is ArrayOfInts the type of an array of ints or is it the type (signature) of a record?

I'm also still confused about how 'reverse' is typed. I gather you're saying that it has a 'bag of constraints' type until it reaches the module boundary at which point it is unified with the signature for the module. So what does the type look like then, if you import it from somewhere else?

I think it would be helpful to me if you could do an example where there are multiple instances of Array.

Also, you keep saying the semantics are simple ... type-indexed types. Is there a core calculus for this that you could describe and translate to?

Recursive Types and Objects

If you re-read the OOHaskell paper, you will see that creating a new object is related to the fix-point operator.

In this case an object of type 'ArrayOfInts' contains the values (but the contents are not part of the interface). The recursive type is what makes this an object. Without the recursive type, it's just an interface, and would need an external datatype to store the values.

At the module boundary, assuming you have the relevant imports, reverse would be typed like this:

reverse :: {Subtype a Iterator, Subtype b Iterator} => a -> b -> IO ()

Which you might write

reverse :: (a <: Iterator) -> (b <: Iterator) -> IO ()

In the HList paper we translate to type-classes with functional-dependencies, but type-indexed-types are simpler than functional-dependencies, so it doesn't make much sense unless you are trying to implement in Haskell.

Objects vs. Modules

I thought we were talking about modules. I understand why objects are recursive. Are your modules actually objects?

Without the recursive type, it's just an interface, and would need an external datatype to store the values.

This loses me again. An external datatype to store the values?

reverse :: (a <: Iterator) -> (b <: Iterator) -> IO ()

But where are the annotations as to which iterator we're talking about? What does the type for a function look like when it has to relate several types to a common module? Like 'get' takes an IndexType and returns a ValueType ... Could you show an example for 'doubleGet' that takes two IndexTypes, from different modules, and returns a pair of values, of corresponding ValueType?

DoubleGet

Well, modules are abstract data types, and why not make them recursive so they are objects? They don't have to be, though.

type DoubleGet = {
    type DoubleGetType
    type Array1Type
    type Array2Type
    get2 = DoubleGetType -> Array1Type.IndexType -> Array2Type.IndexType
        -> (Array1Type.ValueType, Array2Type.ValueType)
}

type DoubleIntGet = DoubleGet :> {
    type DoubleGetType = DoubleGetInt
    type Array1Type = ArrayOfInt
    type Array2Type = ArrayOfInt
}
 
get2 doubleGet 1 2

This assumes that the implementation of 'DoubleIntGet' stores the arrays internally and defines get2 appropriately. But I get that's not really what you were asking.

To get what you want, we can just pass the module types directly as they are first class:

get2 :: (a <: Array) -> (b <: Array)
    -> a.IndexType -> b.IndexType
    -> (a.ValueType, b.ValueType)

This can be combined with implicits like this:

gets :: [a <: Array] -> [b <: Array]
    -> a.IndexType -> b.IndexType
    -> (a.ValueType, b.ValueType)

However, in this case you would still have to pass one of the arrays in directly, as you cannot resolve two identically typed implicits. The implicits act as optional parameters, so the same definition can be used passing modules by value, or using them like type classes.

Another option is to define:

data DoubleGet (a <: Array) (b <: Array) = DG {
    get2 :: a.IndexType -> b.IndexType -> (a.ValueType, b.ValueType)
}

Then get2 can have the type:

get2 :: [DoubleGet (a <: Array) (b <: Array)]
    -> a.IndexType -> b.IndexType
    -> (a.ValueType, b.ValueType)

which can be resolved implicitly.

Here's where I'm still confused

Here's where I'm still confused. Compare these:

reverse :: (a <: Iterator) -> (b <: Iterator) -> IO ()

get2 :: (a <: Array) -> (b <: Array)
    -> a.IndexType -> b.IndexType
    -> (a.ValueType, b.ValueType)

In the case of reverse, you must intend for a and b to be types of iterators, so that this function takes two iterators (not just two iterator signatures). Otherwise I don't see how that type makes any sense for 'reverse'.

But in the case of get2, you reference a.IndexType. That says to me that a is itself a type and you're looking up an associated type (since modules don't contain types).

modules are abstract-data-types

Usually modules contain an abstract data type. You seem to be trying to identify the two, but I don't follow the details. Maybe the recursion would make more sense to me if I did.

Constrained Type Variables

Yes, reverse takes two iterators, which are values, but every value has a type, and reverse is still polymorphic, just constrained such that it must be passed a type-indexed-type with an IndexType 'key'. The 'ValueType' is constrained to be the 'value' of the type looked up in IndexType, with the key being the actual type of 'a'.

You can implement all this with type classes and functional dependencies, but type-indexed-types are simpler; their effective type class is:

class Lookup hListOfPairs key value | hListOfPairs key -> value ...

I don't see the confusion. These are type signatures, so in 'reverse' a is the type of the iterator passed (Iterator is not its type; it's a subtype constraint), and in get2 it's the type of the Array passed.

In both the signatures you quote, 'a' and 'b' are constrained type variables.

A further possibility is:

get2 :: [a <: Array] -> [b <: Array]
  -> a.ArrayType -> b.ArrayType
  -> a.IndexType -> b.IndexType
  -> (a.ValueType, b.ValueType)

With the appropriate signatures. In this case the actual 'ArrayType' is visible outside the module, which may not be desired. The module types should be resolvable without being passed explicitly.

OK, gotcha. Can I recommend

OK, gotcha. Can I recommend you change the equals to 'where'?

type Array where {
    type ArrayType,
    type IteratorType,
    begin = ArrayType -> IteratorType,
    end = ArrayType -> IteratorType,
    empty = ArrayType
}

I've entertained a similar syntax (probably with similar but different semantics).

So how do you write the type signature of a function that doesn't actually need a value of type Array, but rather just needs an index or value? Maybe something like this?

 foo: { a <: Array } |- a.IndexType -> a.ValueType 

Well, no... that probably says that 'a' is a variable in the environment of foo (rather than a parameter)?

Implicit Parameters

Okay, I'm happy to take that advice. As the array type is implicit, but could optionally be passed explicitly, and the types of 'IndexType' and 'ValueType' can depend on the ground type of 'a', I would write:

foo : [a <: Array] -> a.IndexType -> a.ValueType

So the first parameter is optional: if it's passed the value of an array, it uses that one as the dictionary, and its type as the "type-dictionary". If you don't pass the first parameter, it is resolved implicitly from the environment; however, there must be exactly one type in the environment that meets the constraints for implicit resolution to succeed.
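
OCaml has no implicits, but an optional dictionary argument gives a loose analogy for this pass-explicitly-or-resolve behaviour (the analogy and all names are mine; the default value stands in for resolution from the environment):

type 'a ord = { compare : 'a -> 'a -> int }

let int_ord = { compare = Stdlib.compare }

(* ?ord plays the role of the implicit module parameter *)
let max_by ?(ord = int_ord) (x : int) (y : int) =
  if ord.compare x y >= 0 then x else y

let a = max_by 3 5                                          (* resolved "implicitly" *)
let b = max_by ~ord:{ compare = (fun x y -> compare y x) } 3 5
                                                            (* dictionary passed explicitly *)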

The judgement syntax is currently only used for inferred types, and you can't write the inferred type for 'foo' without seeing the actual definition. I do understand the type you have written, though, and it does make sense; it's just not the way I am doing it at the moment.

Hopefully you now have a fairly clear sketch of how I intend record types to function as modules and type classes, using implicits and type-indexed-types (providing type-level maps, type-level sets, and type-level 'static' variants). The other parts of the language are mutability, the effect-inference system (not the IO monad, which I have only used in the examples to avoid explaining this here), and the type-level logic language. The latter allows defining axioms in signatures that modules providing implementations must obey, and encapsulating imperative mutation inside a module implementation behind an externally pure functional signature, with axioms and the effect system being used to prove that the module implementation behaves as implied by a pure functional signature.

Optional parameter of type Array

So the first parameter is optional: if it's passed the value of an array, it uses that one as the dictionary, and its type as the "type-dictionary".

What is the dictionary that is not the type-dictionary? I'm getting the vibe again that we're combining objects and modules.

I'm asking for a way to specify a type parameter without specifying a value parameter of that type. Kind of like this:

foo: forall a <: Array. a.IndexType -> a.ValueType

Except that's not what you're envisioning, I think.

My system feels similar to yours in some ways. Exceptions I see right now are that 1) I'm not trying to identify types and signatures (so type X and theory of X are different things), and 2) I don't have the goal of monomorphizing all types at compile-time like you do.

Implicits

So you either pass a value (because there is more than one possible type that could be resolved in the environment) or you implicitly resolve the type from the environment (because there is only one matching type). I don't think this has anything to do with objects. I'm probably not describing the mechanism very well, but it's like Haskell's mechanism for passing dictionaries to implement type classes, except you can pass the dictionary explicitly too.

I don't have a way of specifying a type parameter without a value; you would pass a value of that type (a type constructor?), as in:

data Z = Z

You pass the type-constructor 'Z' as a value where you want type 'Z'.

As to the differences, theories are logic programs over types, not necessarily types themselves. Monomorphizing is not an explicit goal, but something desirable. I think I may even be able to handle polymorphic recursion using type-indexed-types. The goal is more that runtime polymorphism is always explicitly requested (a bit like declaring a virtual function in C++) so that the programmer is aware when they are paying the performance cost.

Hmm

The usual technique in Haskell is to pass bottom, cast to the appropriate type. Seems kind of kludgy, though, for common uses.

I'm still confused. I understand how you intend to index the other parts of the signature with a type index. But it looks like you still have two meanings for e.g. Array: the type of an actual array value, and the type of a module value that provides all of the associated values for Array.

I'm probably not describing the mechanism very well, but its like Haskell's mechanism for passing Dictionaries to implement type classes, except you can pass the dictionary explicitly too.

Do you have to have a module around? Or can you just use type-indexed values?

As for the differences, we're certainly thinking about the semantics differently, but I'm not sure that there isn't a simple mapping between the schemes. Who knows.. I don't understand yours very well yet.

Both.

Just realised I missed this question. I am probably making things more confusing than they need to be, as I want both (a bit like Ada). So modules would be passed separately, but there might also be objects.

There seems to be very little difference between an object and some modules. If you define an ADT in a module, then how does that differ from an object? Obviously modules don't have to hold values; they could define types that carry the values separately. Modules seem to be a superset of objects and interfaces.

Modules vs. objects

If you haven't already seen it, you might check out this article On Understanding Data Abstraction, Revisited. I don't know if it's exactly on point. Object systems generally don't allow types as fields, and modules generally aren't created as pervasively as objects. Operations in a module generally have parameters on which they're to operate. To encode objects with modules, you'd need to create a new module every time you wanted to return a new value.

Object = Module + State + FixedPoint

I see the point about needing a new module for each new value, but that would be the same for immutable objects. It's more likely that you would use mutable state for an object. I am using 'data' as a single keyword to define modules, records, etc. Also, what I am calling a module is what ML calls a signature. Technically these are module interface definitions; you can have multiple implementations of a module.

An ADT defines a type to hold the value external to the module:

data SetADT x = SetADT {
    data Set -- externally visible definition of set type

    insert : Set -> x -> Bool
    find : Set -> x -> Bool

private:
    data Set = Set (ref (List x)) -- internally visible definition of set type
}

An object stores the value in the module; the only slightly tricky thing is that the functions are defined on the module itself, which requires first-class modules.

data SetObj x = SetObj {
    insert : [SetObj x] -> x -> Bool 
    find : [SetObj x] -> x -> Bool

private:
    s : ref (List x)
}

Note: '[' ']' represent boxed types, not lists (for which I am using 'List'). See: http://www.andres-loeh.de/PiSigma/PiSigma.pdf

Objects = Modules - Type-level components + First-classness

In particular, modules can very well have state, and they can be recursive. But they are not usually first-class values (exceptions notwithstanding). On the other hand, objects do not usually have type components (exceptions like Scala notwithstanding) or other non-value members.

why not unify the two?

I guess I see these things as reasonably moving towards each other (e.g. cf. Scala). I guess I want to know, from a UX perspective, what is it that really needs to be different about them? If we made a unifying abstraction, would it be really all that bad? I pretty much want everything to be first class; to be parameterizable; I'd probably like to move towards stateless things; etc.

Indeed

Well, that's exactly the direction this paper is trying to go, by making modules truly first-class. The tiny bit missing in 1ML is module recursion. :)

A few years ago we developed MixML as a very general but minimal language for (second-class) recursive modules, based on mix-in composition. As it turns out, the result already is quite object-like in some ways. Alas, the semantics of recursive modules is very complicated and involves various trade-offs (e.g. less static checking, the requirement that every sort of language-level entity (values, types, whatever else your language has) supports a suitable form of fixed point, etc.).

If it was possible to integrate the 1ML and MixML calculi, then that would be one possible solution to a module-object unification -- at least from my perspective. The result would look somewhat like Scala, but with more expressive structural typing, type components, and vertical and horizontal composition.

However, combining 1ML and MixML is highly non-trivial. And whether such a unification is really worth the complexity (and the additional trade-offs) is a wide-open question, too. I have my doubts, to be honest. And if I had to make a choice, I'd rather have full composability and limited recursion than the other way round.

Is there a well-known,

Is there a well-known, compelling example to justify full recursive modules?

Two choices

From my perspective, there are only two coherent choices.

Either you have no module-level recursion, and limit fixed points to specific definitions, e.g., functions and types. Then everything is easy. That's what I meant above, and it's all that 1ML has.

Or you do have module-level recursion. But then it better be properly compositional. In particular, such a system should still allow any module to be abstracted ("functorised") over its imports, even if it is mutually recursive with them. One motivation for that is e.g. to enable separate compilation of recursive modules (the paper discusses an example). MixML provides this in the presence of the full expressive power of ML modules.

Anything in between seems unsatisfactory to me, because it would not be properly closed under modular abstraction, and thus bound to bite you eventually.

Dependent Types

The other alternative is a dependent type system, which would allow values to have associated types.

...are nice

...but as I pointed out elsewhere, associated types solve a completely different problem that has very little to do with composable modules, let alone separate compilation for recursive ones.

No Associated Types

If you don't want associated types in modules, then it's trivial to make them first class, as all the type-level structure is in the signature. As I would require all modules to have a signature, separate compilation is no problem: the signature goes in a header shared by the implementation and the code using the module. As modules are first class, composability is handled; functors are a suitable signature plus a function that returns a module. Recursive modules are defined as a functor, then fixed (see the object example below).
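
For comparison, OCaml's (second-class) recursive modules already have roughly this shape, in that each recursive module must be annotated with its signature up front and the compiler ties the knot. A standard example:

module rec Tree : sig
  type t = Leaf | Node of Forest.t
  val size : t -> int
end = struct
  type t = Leaf | Node of Forest.t
  let size = function Leaf -> 1 | Node f -> 1 + Forest.size f
end
and Forest : sig
  type t = Tree.t list
  val size : t -> int
end = struct
  type t = Tree.t list
  let size f = List.fold_left (fun n t -> n + Tree.size t) 0 f
end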

Objects and stuff

The ML terminology is Structure = Module, Signature = Interface. Using 'module' to mean the interface is going to cause confusion, I think.

You can have objects with state if you want. It raises some issues you'll have to be careful with. An alternative for state would raise some other issues, though.

I can't tell what you intend to happen at object creation time. Is the boxed [SetObj x] parameter a self parameter that will be bound at object creation? If so, why doesn't s have one? Usually the concern with fixpoints and objects is supporting inheritance. Is that what you're doing?

Signatures and Modules

I think it would be worth using signature in future in the interest of keeping things consistent with other languages.

The boxed [SetObj x] prevents infinite recursion in the type checking, and is an iso-recursive type without the type declaration. They are suspended types, and inhabit kind ↑Type. Quoting the relevant section from the paper I linked to above:

Lifting: A lifted type ↑σ contains boxed terms [t]. Definitions are not unfolded inside boxes. If a box is forced using !, then evaluation can continue. To enable the definition of recursive types we introduce a type former Rec which turns a suspended type (i.e., an inhabitant of ↑Type) into a type. Rec comes together with a constructor fold and an eliminator unfold.

Boxing wasn't the issue

I wasn't asking about the boxing. I was asking about the use of that parameter with boxed type. Typically in an OOP language, you would write myset.insert(5), not myset.insert(myset,5). This usually works because the object myset contains a record of methods that are specific to that particular object -- not for the entire class. So that's what I'm asking about. Also class inheritance makes it where you often want to tie the recursive knot later, after all classes in the inheritance tree are known. Basically, I just don't know what you're doing.

Unnecessary Parameter

I think you are right; I've included that parameter unnecessarily. In any case, instances of the record carry their own definitions of the methods as well as the value. You would probably want to have a constructor function.

Object signature:

data SetObj x = SetObj {
    insert : x -> Bool 
    find : x -> Bool

private:
    s : ref (List x)
}

Object implementation:

mk_setObj self = SetObj {
    insert x = ... self.s
    find x = ... self.s
    s = ref []
}

Constructing an object:

o1 = fix mk_setObj
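
In OCaml one can write essentially the same thing (my transliteration, purely illustrative): the object is a record of closures over private state, and let rec ties the knot that 'fix' does above.

type 'a set_obj = {
  insert : 'a -> bool;   (* returns true if the element was new *)
  find   : 'a -> bool;
}

let mk_set_obj (eq : 'a -> 'a -> bool) : 'a set_obj =
  let s = ref [] in                        (* private state, like ref (List x) *)
  let rec self = {
    insert = (fun x ->
      if self.find x then false            (* methods reach each other via self *)
      else (s := x :: !s; true));
    find = (fun x -> List.exists (eq x) !s);
  } in
  self

let o1 = mk_set_obj (=)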

Lambda-the-ultimate still shines

If I can indulge in a meta-comment, this paper has also been discussed in the places that threaten LtU's role as a place to discuss programming languages, namely reddit and hackernews. With all due respect to the few commenters that provided very informed summaries of the work (it's not clear to me why you would even try to foster scientific commentary in a place that is primarily obsessed with venture capitalism, but tastes and colors...), the quality of the discussion there was absolutely terrible. Most commenters interpreted the webpage subtitle "unifying ML into one language" as a call to merge Standard ML and OCaml, and the discussion went downhill from there.

Ha!

Thanks for reminding me that I should also unify the titles of page and paper. :) Done, now less misleading.