Types are anti-modular

I understand this has been tweeted all over tarnation and discussed on Reddit, but a quick Google search of LtU seems to show that it hasn't been mentioned here. Which is a shame. So, here goes: Types Are Anti-Modular by Gilad Bracha. A nice follow-up on some of the theses that Gilad put forth in FLOSS Weekly 159: Newspeak, where he was interviewed by Randal Schwartz.

A couple of quotes from the interview:

Learning is brain damage. I've spent the last 18 to 20 years unlearning the things I learned in my Ph.D. studies.

Also:

[The fact that Newspeak is not mainstream] is actually a competitive advantage.

Not exactly a new sentiment but still… Overall, the FLOSS interview with Bracha was almost as good as the one with Ingalls.

Ha! IIRC, I asked about this

Ha! IIRC, I asked about this very issue just a few months back here on LTU.

Specifically, I believe I asked about potential transitive resolution of imported types used in argument and return types in a module's exported terms (e.g., functions).

One wants to "keep it very simple," but in reality it's much easier said than done.

- Scott

This is what functors and structural matching are all about!

One of the main points of functors and structural signature matching (aka structural subtyping) in the ML module system is that one does not necessarily need to agree upon interfaces completely up front. I can provide a module of signature S1, you as a client can abstract yourself (using a functor) over a module of signature S2, and we write our modules completely independently. Then, if S1 matches S2, voila! If not, you can write a coercion module after the fact. How the absence of types would make this kind of programming any more modular is completely beyond me.
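
To make that concrete, here is a minimal OCaml sketch of the happy case (the names S2, Client, and Provider are made up for illustration):

module type S2 = sig
  val greet : string -> string          (* the interface the client assumes *)
end

(* The client abstracts over any module matching S2, via a functor. *)
module Client (M : S2) = struct
  let run () = print_endline (M.greet "world")
end

(* Written completely independently; it happens to provide more than S2. *)
module Provider = struct
  let greet name = "hello, " ^ name
  let shout name = String.uppercase_ascii name
end

(* Structural signature matching: Provider matches S2, so linking is just
   functor application, with no up-front agreement between us. *)
module Linked = Client (Provider)
let () = Linked.run ()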

So, at least if you program in a civilized language with a proper structural module system, Bracha's argument doesn't hold water AFAICS.

Interfaces in Google Go also work this way

You can code against a set of private interfaces describing the features you want from the types you use. Then the code in that module is only checked at compile time against its own contents. (In Go, the table mapping the methods of an interface to those of a type is computed at runtime using reflection.)

"You cannot typecheck modules in isolation [...] because types capture cross-module knowledge" is wrong because typechecking within modules can be separated from typechecking/linkage between modules.

http://research.swtch.com/2009/12/go-data-structures-interfaces.html

No reflection involved

Just a comment: Go interfaces give you a certain degree of structural typing, but it is comparably weak. Also, while lookup tables are generally constructed at runtime, there is no reflection involved (IIRC, Go deliberately does not have any form of reflection). The construction is fully determined by static types and environments, and only parameterized over other dynamic lookup tables. Somewhat similar to the translation of type classes in Haskell.
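
To illustrate the dictionary idea, here is a rough OCaml analogue of my own (not Go's actual mechanism, nor Haskell's): the "lookup table" is just an explicit record of functions, and its construction is fully determined by the static types involved.

type 'a show = { show : 'a -> string }

let show_int : int show = { show = string_of_int }

(* The table for a compound type is built from the tables of its parts;
   no reflection is needed, only the statically known component tables. *)
let show_pair (da : 'a show) (db : 'b show) : ('a * 'b) show =
  { show = (fun (a, b) -> "(" ^ da.show a ^ ", " ^ db.show b ^ ")") }

(* Generic code is parameterized over the table, much like a method set. *)
let describe d x = "value = " ^ d.show x
let () = print_endline (describe (show_pair show_int show_int) (1, 2))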

Just for clarification

The language seems to have some amount of reflection: http://golang.org/pkg/reflect/

I would like to think that

I would like to think that Gilad knows about functors and structural matching, and that his criticism goes beyond the technical definition of separate compilation to include informal semantics that can't be captured by the type system.

Huh?

How can types inhibit informal semantics that they don't capture? If anything, the inability to capture assumptions about inter-module dependencies would be an argument that existing type systems are too *weak*, not too strong.

Moreover, the argument that someone should know about X has no bearing whatsoever on whether their argument is refuted by X.

My point was I don't think

My point was I don't think Gilad's point was just about types and separate/independent compilation, as this problem is pretty much solved. In this case, I'm questioning our understanding of the thesis, not the argument of its maker.

Right

Obviously, I agree with Derek. :)

Moreover, I think that the other part of Bracha's argument -- namely, that type inference as an alternative to explicit signatures would leak implementation details -- is equally misleading. Inference will certainly expose certain details in general. But it won't do so any more than an untyped language would. In fact, you can generally make more observations in an untyped language, so by Bracha's own line of argument, untyped languages are even less modular!

Right

Bracha is making an extremely strong claim here, as I point out in the comments there. He thinks that all static information is unmodular, because you have to agree on it ahead of time. This includes names of exports.

Basically, he wants everything, even variable reference, to be a message send, but he phrased it in a way to provoke more discussion.

If it includes names of

If it includes names of exports, why doesn't it include names of messages?

Assumptions

...or the format of the message payload?

Seriously, you have to assume something (= agree on it ahead of time), otherwise you cannot do anything. For this purpose, a minimal module signature, a structural object type, a message protocol, or whatever else you prefer to come up with, are pretty much isomorphic implementations of "something". And not checking that assumption early on doesn't make it less of an assumption.
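
For instance, in OCaml the very same minimal assumption can be written either as a module signature or as a structural object type (my own toy example):

module type ASSUMED = sig val render : int -> string end   (* a minimal module signature *)
type assumed = < render : int -> string >                  (* a structural object type *)
(* ...or, untyped: "I will send you a 'render' message carrying an int."
   The assumption is the same; only when (and whether) it is checked differs. *)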

messages

Because that allows "independent compilation", which is the definition of "modular" that Gilad is using. Not that I think this makes sense, but there you go.

Why?

I still don't see why naming exports (or any other similar feature) would necessarily prohibit the kind of independent compilation he desires.

Naming exports

If your module can import names from another module, then to compile your module, the compiler must know what those names are. That requires either the compiler to have previously looked at the other module (not independent compilation) or some shared interface file. Gilad declares that interfaces also violate independent compilation, leaving you with nothing.
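
For concreteness, here is how the "shared interface file" route looks with OCaml's compiled interfaces (file names are hypothetical):

(* foo.mli -- the shared interface file; clients are checked against this *)
val version : string

(* blah.ml -- imports a name from module Foo *)
let () = print_endline Foo.version

(* Sketch of the separate compilation steps:
     ocamlc -c foo.mli     produces foo.cmi, the compiled interface
     ocamlc -c blah.ml     checked against foo.cmi; foo.ml need not exist yet *)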

What do you mean by import?


module Blah
  import foo, bar, baz
  ...
end

I can certainly write a separate compiler for languages with this kind of module, without having to know about other modules, since Blah is declaring what its own imports are. Of course, you will later need a linker to resolve Blah's names to stuff, but ML functors and mixin systems (e.g., Derek's RTG and PLT's units) have supported this kind of stuff for ages.
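
A rough OCaml sketch of that style (signature and module names invented): Blah names its imports as functor parameters, so compiling it only needs their interfaces, not any particular implementation.

module type FOO = sig val x : int end
module type BAR = sig val f : int -> int end

(* Separately compilable: nothing here depends on a concrete Foo or Bar. *)
module Blah (Foo : FOO) (Bar : BAR) = struct
  let result = Bar.f Foo.x
end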

What am I missing?

I've found most of this

I've found most of this discussion vague because I don't know what is meant by compile -- to what? The assumption seems to be that compilation isn't full, since some sort of linker will be left (in most comments). Likewise, basic optimizations like stack allocation seem to be assumed unnecessary as well. The question, as I understand it so far, leads to solutions that are worse than useless.

(Edit: this wasn't aimed at Sam, but more at a meta level, at the type of back-and-forth the vague formulation is leading to.)

Compile means compile

I'm not sure what you're confused about here. Separate compilation is a well-understood notion. You compile modules of a program separately, and a module that has some dependencies on unknown other modules will essentially be compiled as a function. Then, when you want to form a full executable program, you have to produce some additional code to link the already-compiled modules (essentially by function application). There are other possibilities, but that's a common one.
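
A small OCaml sketch of that (names invented): a module with an unknown dependency is compiled as a functor, and forming the program is then essentially function application.

module type LOG = sig val log : string -> unit end

(* Compiled separately, knowing only the LOG interface. *)
module App (Log : LOG) = struct
  let main () = Log.log "starting"
end

(* "Linking": apply the already-compiled functor to a concrete module. *)
module ConsoleLog = struct let log = print_endline end
module Linked = App (ConsoleLog)
let () = Linked.main ()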

But it's worth noting that one need not care about separate compilation per se in order to enjoy the benefits of types and modularity. They are useful for robust program structuring even if you only care about compiling whole programs.

What must be accomplished by

What must be accomplished by the compiler and what can be accepted as a link-time / load-time action? For example, compiling without IPO (interprocedural optimization) should be the 1% case rather than the 99% one, yet separate compilation of modules prohibits this. If I accept IPO (including across module boundaries) as a necessary part of the compiler, separate compilation is impossible (unless you have ridiculous import statements). If you can get something to compile separately, by my definition, it's relatively useless.

We can push these optimizations into load / link / JIT time (and many, like tuning, really do need to be there). Considering there's no problem there -- static separate compilation just builds ASTs etc. -- it sounds like we're not really talking about compilers but about type checkers / verifiers. There, I agree with Gilad's comments about type systems being either too weak or too cumbersome for true switching of modules without testing. My opinion doesn't really matter: the point is that a different problem is being worried about than how to compile.

If the goal isn't correctness but fast compilation, a simple solution is to use an interpreter. If you want your cake and to eat it too (fast code with fast compilation), there's a heck of a body of literature on how to write fast compilers that isn't used in practice (and much of the modern JS JIT stuff is actually about avoiding this trade off). E.g., parallel, SIMD, and incremental evaluation at a fine-grained level. I'm increasingly a fan of PGO/tuned approaches, and in those, the set of problems is way different from what's being discussed here.

My opinion doesn't really

My opinion doesn't really matter: the point is that a different problem is being worried about than how to compile.

This is exactly what I thought. We are getting hung up on a detail that really isn't that important. I spent my early grad school years working on modularity and separate compilation and found that it was easy to do even for an OO language like Java. So it works, but it was kind of meh: separate compilation wasn't really the problem with modules; the problem was more about high-level semantics that couldn't be captured by a type.

If the goal isn't correctness but fast compilation, a simple solution is to use an interpreter.

I'm partial to incremental compilation and live programming myself, but again is this the problem people really care about? Or is there something else about developer efficiency that would still act as a bottleneck?

I'm partial to incremental

I'm partial to incremental compilation and live programming myself, but again is this the problem people really care about?

The usability of these approaches for software in the large is questionable. I'm obviously biased towards the GUI domain (more for coordination than performance reasons), but, even then, I acknowledge that traditional approaches have heavy usability barriers (when considering the target audience of designers). My ideal would probably be more like PBD on top of a tangible change-propagation layer. The stuff on languages for incremental algorithms has been very questionable; I haven't seen anything compelling to me outside of the GUI and graphics domains.

However... The problem of a fast compiler itself is pretty important. E.g., in the mobile space, there's a big demand for a fast optimizing compiler for CSS/DOM/JavaScript code. The current dominance of low-level languages in mobile systems seems symptomatic of our failure here. Some funny solutions are possible, like Opera Mini's and SkyFire's attempts at offloading (SkyFire's is definitely one to watch, and we're seeing it elsewhere like in OnLive), but once we take power and connectivity into consideration for mobile, local computation solutions are still desirable, and heterogeneity likely forces our hand for local compilation.

The particular approach of getting a fast compiler by specifying it in a declaratively incremental/parallel language and code generating the compiler from it has been compelling enough for me to base my thesis on it. I'm specifying the CSS language and code generating a layout engine / compiler with all sorts of optimizations too hard to do by hand (can email you about all that, including some new GUI language we're doing on top that you might like). I've definitely taken the high road here -- the status quo for optimizing compilers is to roll your own for anything above lexing/parsing.

There is quite a big

There is quite a big difference between compilation during development and compilation when the product is deployed, so I think that they should be considered separately.

The only reason we are in the mess today of transferring source JavaScript code over the wire rather than machine code or optimized bytecode is that it is convenient and we must think the compilation technology is good enough. If the problems become too great, there are always better (but less convenient) solutions to consider.

The stuff on languages for incremental algorithms has been very questionable; I haven't seen anything compelling to me outside of the GUI and graphics domains.

This sounds right but the field is still young, I wouldn't give up yet. I had a lot of fun turning the Scala compiler into a true incremental compiler via cheap dynamic dependency tracking and re-execution, so I believe there are still many more pragmatic solutions to explore, possibly at a very non-invasive language level.

The particular approach of getting a fast compiler by specifying it in a declaratively incremental/parallel language and code generating the compiler from it has been compelling enough for me to base my thesis on it.

Generating a compiler from a specification is a sort of holy grail of PL. Many of us have attempted it and most (or all?) of us have failed at it, so I wish you luck :). CSS sounds like a more manageable problem, perhaps. I would love to hear about the details, and perhaps you could publish something about it at PLDI next year; it's in my neighborhood.

This sounds right but the

This sounds right but the field is still young, I wouldn't give up yet

I haven't :) Revisions is one particular newcomer that increases usability.

More intriguing, I think we have a lot of unexpected pay-offs ahead. For example, I've been playing with the idea of optimal parallel speculative execution by using incremental repair for the CSS stuff (not really essential for our domain, but we need to shove stuff like that on top of the real work if we do want to do something like PLDI :)). It's a useful tool to have in your bag!

re: incremental algorithms

stuff on languages for incremental algorithms has been very questionable; I haven't seen anything compelling to me outside of the GUI and graphics domains

There is plenty of value in incremental algorithms (and incremental compilation) for live data-fusion and control systems, too.

The question is more about

The question is more about whether a language-based solution is the right way to get there.

Another place where it makes sense is the coming problem of cache vs. recompute in the face of slow memory.

Incremental for Job Control and Concurrency

Suppose a language requires us to express non-incremental algorithms in an incremental manner, forcing our hand for any non-real-time calculation (such as ad-hoc loops or recursion). In each step (or instant), the incremental process provides a view of the result.

A non-incremental algorithm can be modeled in such a language; it just presents a 'Maybe' type for the intermediate view, with the final answer or nothing. This wouldn't be much easier than providing a useful intermediate answer.
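
A minimal OCaml sketch of that encoding (type and function names are mine): an incremental process is a step function that exposes an intermediate view, and a non-incremental algorithm is wrapped so that its view is an option that stays None until the final answer exists.

(* Each step yields either the final answer or a view plus the rest. *)
type ('a, 'view) step =
  | Done of 'a
  | More of 'view * (unit -> ('a, 'view) step)

(* Wrapping a non-incremental computation: the intermediate view is None. *)
let opaque (f : unit -> 'a) : ('a, 'a option) step =
  More (None, fun () -> Done (f ()))

(* Job control lives outside the algorithm: the scheduler decides how many
   steps to run (fuel) before inspecting the current view or giving up. *)
let rec run_for fuel s =
  match s with
  | Done _ -> s
  | More (_, k) -> if fuel <= 0 then s else run_for (fuel - 1) (k ())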

I believe this would allow us to more effectively express a lot of issues - such as job control, timeouts, concurrency, CPU resource scheduling, and priority. In state-of-the-art languages, most of these issues are handled implicitly and badly.

I'm not convinced there's a future case for languages that DON'T enforce a model of incremental computation; I question whether there will be any value in such languages going forward.

Extensionality in danger?

What you want to do is break the "function computation/evaluation as a black box" model of usual programming languages, and choose a finer-grained semantics reflecting the "low-level" considerations you have (computation time, memory consumption, scheduling...).

The current general approach¹ is to layer a finer-grained model where such "less extensional" objects are defined on top of the previous paradigm: explicit datatypes with explicit "evaluation" computations, encoding those finer-grained aspects you're interested in.
Functions are a meta-level tool of abstraction and definitional convenience; all "serious matter" should be handled by explicit data types.

Your claim amounts to saying that *everyone* should drop the black-box model and move to a given finer-grained model you're more interested in for your application domain. That is, you hope that your specific domain will become the de facto universal model in the future. I'm not convinced, but the future will tell.

¹: Haskell monads and the Scala approach of defining DSLs on top of Scala by reification are both instances of this approach. It's maybe questionable for Haskell monads, as they may also be seen as a way to translate higher concepts *into* functions, seen as the basic construct of computation. So maybe it's only some Haskell monads, or a particular style of monad use.

PS: agreed, programmers already reason non-extensionally about functions and programs, with consideration of complexity, runtime, memory consumption, etc. This is, however, mostly informal, not part of the language semantics (and, e.g., the compiler can break most of this unspecified reasoning).

No danger there.

I'm very interested in secure, robust composition and extension of systems. My current model, RDP, is easily extensible - agents can discover each other and interact through shared registries (both public and private), thus supporting plug-ins and the introduction of new services. I favor object capability security, which is very effective as a basis for 'black box' services.

So I'm not breaking any black boxes.

But I am interested in improving on the traditional request-wait-response protocol typically associated with function evaluation and composition. Our choice of communication and composition protocols (message passing, REST, streaming and dataflows, etc.) affects which rules each process must obey, and how we integrate a new component or extension with the rest of the system. In an incremental model, for example, we will always have an incremental request (for each step, possibly with incremental data) and an incremental response (with each intermediate view).

And I'd also force many of today's broken black boxes to be more obvious about it. For example, you can't hide delay or divergence in an open system. It's unsafe to even try. The fact that our algorithms aren't incremental today doesn't mean they happen instantaneously; it simply means that we use hackish mechanisms such as timeouts and 'kill -9' to stop a request that is taking too long.

Black boxes in physical reality cannot subvert physics. This comes as a surprise to many programmers, but: black boxes in virtual realities cannot subvert physics, either. We can have black boxes, so long as they all obey certain rules.

import

What you write is fine from Gilad's perspective, and that's basically exactly what Newspeak, his language, does, except you'd write it this way:

module Blah(foo, bar, baz) ... end

This makes it clear that we've just written a function, which we'll link later.

If what you want to do is write something like this:

module Blah
  import foo
  x + 1 // x comes from foo
end

Then you need static information about foo ahead of time. This is the sort of thing you can do with the ML module system, or with Units, or with any number of other systems.

Another way to think about the perspective that Gilad's taking (which, again, I disagree with) is that every variable reference is really a message send, and static knowledge about variables is trying to rule out MessageNotUnderstood, which he doesn't want to try to do.

Thanks, that makes sense

So if we removed the open Foo construct from ML's module language, then it would be Gilad-ically acceptable? I could definitely live with that, since open is my least favorite feature of ML. :)

I am not quite on board with treating modules as functions, though. Mixin linking seems very natural to me. That is, if you have a module M which imports an interface I and exports J, and a module M' which imports J and exports I, then you should be able to put M and M' together to get an I and a J (perhaps up to some well-foundedness constraints).

Stuff like this is a PITA in ML, but is a reasonable thing to want.
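
For the record, OCaml's recursive modules give a limited form of this knot-tying, though it is indeed clunkier than real mixin linking (toy signatures standing in for I and J):

module rec M : sig val even : int -> bool end = struct
  let even n = n = 0 || M'.odd (n - 1)    (* M needs odd from the other side *)
end
and M' : sig val odd : int -> bool end = struct
  let odd n = n <> 0 && M.even (n - 1)    (* M' needs even back from M *)
end

let () = assert (M.even 10 && M'.odd 7)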

open is not enough

No, open is a red herring. If you have a structure Foo with members x and y, then Foo.z is a static error. It is that static error that Gilad objects to, because it requires knowledge of Foo when compiling the references.
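
Concretely, in OCaml (the offending line is left as a comment so the snippet compiles):

module Foo = struct
  let x = 1
  let y = 2
end

let _ = Foo.x + Foo.y
(* let _ = Foo.z
   ...is rejected at compile time with "Unbound value Foo.z"; producing
   that error is exactly what requires knowing Foo's contents. *)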

Ahead of time?

Yeah, and my point is that you don't have to agree on static information "ahead of time". I can write whatever code I want, you can write your code assuming whatever "static information" you want about my code, and if they turn out not to link up immediately (e.g. because we named our methods differently), then big whoop: we can write coercion code "later in time" to convert the names of stuff in my exports to the names of stuff in your imports. I fail to see what the problem is that he's complaining about.
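
A tiny OCaml sketch of that "coercion code later in time" (all names invented): your code expects greet, mine exports shout, and an adapter written after the fact bridges the two.

(* Your code, written against the names you assumed: *)
module type GREETER = sig val greet : string -> string end
module YourClient (G : GREETER) = struct
  let hello () = print_endline (G.greet "world")
end

(* My code, written independently, with a different name for the export: *)
module MyLib = struct let shout s = String.uppercase_ascii ("hi " ^ s) end

(* The coercion module, written after the fact, and the actual linking: *)
module Adapter : GREETER = struct let greet = MyLib.shout end
module Linked = YourClient (Adapter)
let () = Linked.hello ()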

Fallacious

Seems a bit of a fallacious argument.

You could, for instance, allow compiling a module A which references another module B without checking B's types, and defer the typechecking to runtime, when the two modules are dynamically linked. Would that be modular?

And what about languages that auto-compile from source?

Dependencies are not modular in general, whatever the type system. Since you need to know the API, you need at least an agreement on method names, and maybe on types as well.

A hypothetical counterexample ...

I don't think I quite know all the relevant jargon here, but I'm remembering an old programming language where there was a "Morphic" datatype which was part of the language, and thinking it may be a counterexample.

Morphics supported allocation, destruction, the addition or deletion of attributes, mutation of or reference to the value of an attribute, listing of the current set of attribute names, and "locking and unlocking" (when a Morphic was locked, addition or deletion of attributes, or mutation of or reference to attributes declared private, would fail. Morphics could only be unlocked in the same compilation unit that had most recently locked them).

There were no user-defined types at all. In practice, we defined what we called types procedurally by creating a Morphic, adding relevant named attributes to it, and locking it. Routines were first-class values, so an attribute of a Morphic could have a "routine" or "method" as its value (the last two distinguished because a "method" was a closure that contained a self pointer and so had a way to refer to the particular Morphic it was carried by, and a "routine" did not).

Independent compilation was easy, because "Morphic" was sufficient as a signature for linking. A signal value for a runtime error would be returned if you queried or attempted to mutate an attribute that a particular Morphic lacked at that particular instant.
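
As a rough reconstruction of what that might look like (entirely my own sketch, in OCaml terms, with the public/private distinction omitted):

type value =
  | Int of int
  | Str of string
  | Routine of (value list -> value)             (* no self pointer *)
  | Method of (morphic -> value list -> value)   (* closure carrying self *)

and morphic = {
  attrs : (string, value) Hashtbl.t;
  mutable locked : bool;
}

let make () = { attrs = Hashtbl.create 8; locked = false }

(* Referencing a missing attribute yields a runtime signal value (None),
   never a static error, which is why "Morphic" alone suffices as a
   signature for linking. *)
let get m name = Hashtbl.find_opt m.attrs name

(* Locking blocks further additions or mutations. *)
let set m name v =
  if m.locked then false else (Hashtbl.replace m.attrs name v; true)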

If I were updating it, I think I'd add the idea of interfaces and invariants to allow some more useful automated checking; but there could be a lot of different Morphics from different sources that all satisfied the particular set of interfaces and invariants required by a particular module.

Are the "Interfaces and Invariants" what he means by "structural typing?"

Does repeating that something has to satisfy the "iteration" interface in a module that actually uses iteration over it count as repeating unnecessary information in many places?

Or is it the need for a context in which the interface name "iteration" can be resolved that he's against?

Also, many of the interfaces (though not the invariants) could be inferred. It seems that routines to check invariants couldn't be part of the object; that would be meaningless because if something were the wrong type it would carry the wrong invariant-checking routines -- so they would be checking a different set of invariants than the set required by the particular module at hand.

Would the invariant-checking routines (for example, a routine that checks to make sure that the polar and rectangular coordinates of a complex number refer to the same point) be what he considers to be redundant or repetitive information? Even if it were included in a module that wouldn't work correctly unless it were true, and not included in a module that refers to those attributes (say to build a density model of a region in complex space represented by a sample of complex numbers in a list) and which would work correctly regardless?

Or is checking invariants the kind of operation he considers to be leaking implementation information?

Curious....

Composition is Anti-Modular

at least in the sense Gilad Bracha seems to be using anti-modular. You can defer putting all the pieces of a program together until later and later, but at some point, before you can run a program, you do have to put them together. You could probably even go as far as saying that meaning is anti-modular in this sense. Which is why most people use a more pragmatic definition of modular, involving well-defined contracts between modules.