Types for Flexible Objects

Types for Flexible Objects, by Pottayil Harisanker Menon, Zachary Palmer, Alexander Rozenshteyn, Scott Smith:

Scripting languages are popular in part due to their extremely flexible objects. These languages support numerous object features, including dynamic extension, mixins, traits, and first-class messages. While some work has succeeded in typing these features individually, the solutions have limitations in some cases and no project has combined the results.

In this paper we define TinyBang, a small typed language containing only functions, labeled data, a data combinator, and pattern matching. We show how it can directly express all of the aforementioned flexible object features and still have sound typing. We use a subtype constraint type inference system with several novel extensions to ensure full type inference; our algorithm refines parametric polymorphism for both flexibility and efficiency. We also use TinyBang to solve an open problem in OO literature: objects can be extended after being messaged without loss of width or depth subtyping and without dedicated metatheory. A core subset of TinyBang is proven sound and a preliminary implementation has been constructed.

An interesting paper I stumbled across quite by accident. It promises quite an ambitious set of features: generalizing previous work on first-class cases while supporting subtyping, mutation, and polymorphism, all with full type inference, in an effort to match the flexibility of dynamically typed languages.

It does so by introducing a host of new concepts that are almost-but-not-quite generalizations of existing concepts, like "onions", which are a kind of type-indexed extensible record, and "scapes", which are a sort of generalization of pattern-matching cases.
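
To give a rough intuition for "type-indexed", here is a value-level Haskell sketch of my own (not how TinyBang works; onions are resolved statically, whereas this toy uses runtime TypeReps):

import           Data.Dynamic  (Dynamic, fromDynamic, toDyn)
import           Data.Maybe    (listToMaybe, mapMaybe)
import           Data.Typeable (TypeRep, Typeable, typeOf)
import qualified Data.Map      as Map

-- Components are keyed by their own type; adding a second component of
-- the same type overrides the first, mimicking the right bias of "&".
newtype Onion = Onion (Map.Map TypeRep Dynamic)

emptyOnion :: Onion
emptyOnion = Onion Map.empty

-- "onion on" one more component
(&.) :: Typeable a => Onion -> a -> Onion
Onion m &. x = Onion (Map.insert (typeOf x) (toDyn x) m)

-- project out the component of the requested type, if present
project :: Typeable a => Onion -> Maybe a
project (Onion m) = listToMaybe (mapMaybe fromDynamic (Map.elems m))

-- project (emptyOnion &. (42 :: Int) &. 'c') :: Maybe Char  ==>  Just 'c'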

Instead of approaching objects via a record calculus, they approach them via the dual: variant matching. Matching functions then have degenerate dependent types, which I first saw in the paper Type Inference for First-Class Messages with Match-Functions. As an interesting aside, Scott Smith was a coauthor on that paper too, but it isn't referenced in the "flexible objects" paper, despite the fact that "scapes" are "match-functions".

Overall, quite a dense and ambitious paper, but the resulting TinyBang language looks very promising and quite expressive. Future work includes making the system more modular, as it currently requires whole program compilation, and adding first-class labels, which in past work has led to interesting results as well. Work exploiting row polymorphism is particularly interesting because it supports efficient compilation to index-passing code for both records and variants. It's not clear if onions and scapes are also amenable to this sort of translation.

Edit: a previous paper was published in 2012, A Practical, Typed Variant Object Model -- Or, How to Stand On Your Head and Enjoy the View. BigBang is their language that provides syntactic sugar on top of TinyBang.

Edit 2: commas fixed, thanks!

Comma comma comma comma comma chameleon

It turns out that Pottayil Harisanker Menon Zachary Palmer Alexander Rozenshteyn Scott Smith is four people.

Meme. Brutally installed

Meme. Brutally installed through eyes. Culture club will never be the same.

Now I've found a whole culture of comma chameleon...

And here I have lived my

And here I have lived my whole life thinking they were singing "comma comma comma comma comma comedian." This is going to trigger a midlife crisis.

Scapes vs. First-Class Patterns

Our scapes generalize first-class cases by supporting composition of arbitrary case expressions, not just the addition of one clause. [..] We believe that scapes offer the first high-level algebraic case composition operator which can directly support object inheritance for a variant encoding of objects.

Whatever happened to first-class patterns? Or even better: prisms?

Perhaps I'm not seeing what is special about these scapes. Is there some special property they offer?

The return values of

The return values of first-class patterns must all unify to the same type. This was a problem with all variant-based object encodings until match-functions, and then scapes, as far as I know. The return type depends on the argument value (variant/message), which is why match-functions/scapes are lightweight dependent types.

Edit: although I see what you mean, in that you can argue that the feature I mention is the property provided by onions. Scapes are actually less expressive on their own than first-class patterns, which return a "Maybe T" to support match failure. Scapes don't support match failure and so are total.
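
For concreteness, a minimal Haskell sketch of that style of first-class pattern (names are mine, not from any of the papers); note that composing two patterns forces them to share one result type b, which is exactly the unification restriction described above:

-- a first-class pattern: try to match an a, maybe producing a b
type Pattern a b = a -> Maybe b

-- compose two patterns, preferring the left one; both operands are
-- forced to produce the same result type b
orElse :: Pattern a b -> Pattern a b -> Pattern a b
orElse p q x = maybe (q x) Just (p x)

isZero :: Pattern Int String
isZero 0 = Just "zero"
isZero _ = Nothing

anyNum :: Pattern Int String
anyNum n = Just (show n)

-- orElse isZero anyNum 0 == Just "zero"
-- orElse isZero anyNum 7 == Just "7"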

Scapes don't support match

Scapes don't support match failure and so are total.

Ah, that's the detail that I was somehow ignoring. We need dependent types for first-class patterns to be equivalent to scapes. OTOH, scapes also seem to depend on dependent types.

We just finished more material

I am the Zachary Palmer on the paper described in the post and I'm glad to see that some people might have enjoyed the read. For anyone who might be interested, we've just put up a technical report on a related topic. This report describes a language with which we're experimenting, PatBang, which extends TinyBang to have higher-order, first-class, and recursive patterns. Please feel free to share your thoughts; we love feedback. :)

Very interesting extension

Very interesting extension of TinyBang. With higher-order and recursive patterns, PatBang captures a lot of the power of the pattern calculus and even some type systems. It looks pretty expressive and the concepts are themselves a little mind-bending, although I'm still getting used to the syntax needed to manage these concepts.

I think the only remaining downer is the requirement for whole program compilation. Do you expect modular type checking of PatBang to be difficult?

This is probably premature, but what sort of compilation strategy do you envision for PatBang or TinyBang? I mentioned in the story post that other row polymorphic systems are amenable to dictionary-passing implementations for pattern matching, but it's not clear whether this is still the case here, particularly given PatBang's higher-order patterns.

Also, the first sentence of the intro is a little confusing:

Patterns provide an important dimension of expressiveness to functional programming languages because they describe a concise syntax for data is destruction.

the only remaining downer is

the only remaining downer is the requirement for whole program compilation

Why is this a downer?

I can understand a desire for incremental compilation (so we can avoid redundant effort). And for staged compilation (so we can have late-binding for values).

But I haven't understood the value of separate module compilation in a very long time. It hinders optimization, results in more fragile code and weaker safety properties, it isn't much better than an obfuscator for protecting IP (and I want my code exposed-source anyway), and so on.

What is the appeal?

Because non-modular

Because non-modular compilation inhibits modularization by definition. Packaging, shipping and composing programs after the fact becomes very difficult with a closed-world assumption.

Non-disagreement?

I think Dave is complaining about local code generation whereas you're arguing for modular semantics, type checking, etc. These two positions aren't at odds (or, at least, can coexist)! But I think this work suffers from a lack of modular semantics, as you complained.

Software components are very

Software components are very modular.

That doesn't mean we want to compile them separately. There are often many cross-component optimizations, specializations, and dead-code or redundant-code eliminations one can perform once one knows the larger context of usage.

I'm not sure why you're imagining a closed-world assumption. Packaging, shipping, and composing programs is quite possible without compiling them first, or by incrementally recompiling them live.

Modularity is a gradient

Separate compilation is useful in isolating type errors; it's not just about optimizations. Also, those optimizations are often not that useful in practice, and trading them off for better compilation performance is preferred; consider the mess that is C++ templates!

Finally, the question is usually not "if" information seeps across a module boundary, but how much. The worst case is that the entire implementation seeps through; the best case is that nothing seeps through (but this isn't practical if the module is "used"), with many viable points in between. See this paper for a more thorough discussion.

Note that modules don't really require explicit signatures, although they help. We can always infer the signature as part of a unit compilation process; this is how typeless avoids whole program compilation without requiring type annotations that would otherwise effectively act as explicit signatures.

Separate compilation is

Separate compilation is useful in isolating type errors

Is this true? I acknowledge there seems to be a correlation between separate compilation and isolating type errors, but my intuition is that the cause lies elsewhere.

I do know for certain that separate compilation, in actual practice, seems to obfuscate many errors - e.g. if you have a null-pointer assertion, you might find that (a) you don't have the code to see where it occurred, or (b) you have the wrong version of the code and your pointers are off in whitespace.

If my goal was isolating errors, I'd be more interested in securable programming models, eschewing ambient authority, controlling access to state. I'd be interested in types. I'd be very interested in live programming, incremental compilation, and IDEs that can identify conflicting edits.

But I don't think separate compilation would ever make my list.

those optimizations are often not that useful in practice

That depends on the programming model. :)

Dataflow models seem to benefit more than control-flow models. Dataflow models with broadcast communication (tuple-spaces, shared state, or publish-subscribe) seem to benefit more than dataflow models without it. Also, a model with commutative and idempotent effects can greatly benefit.

the question is usually not "if" information seeps across a module boundary, but how much?

We should also be concerned about "when" information seeps across module boundaries.

Consider a choice:

  1. implementation details leak across system boundaries at runtime
  2. implementation details leak across system boundaries at compile time

Now, there are the obvious potential scenarios:

  • there are no obvious problems at compile-time
  • there are obvious problems at compile-time

We don't have solid evidence, but I think anyone here can make a good, educated guess about how these scenarios are likely to intersect. It seems to me that, if I try too hard to support boundaries for separate compilation, I might become a victim of my own success: shifting more logic to runtime, dimming my awareness of module interactions, creating a boundary for my own comprehension.

Rhetorical question: What, if any, is the relationship between Greenspun's tenth rule, and separate compilation?

YinYang

The YinYang paper you linked is quite interesting work which seems very related to what we are doing. Please correct me if any of the following is wrong, of course. :) The term assignment graph described in Section 2 seems to be quite similar to a type constraint set; that is, an extension "# seed A ▷ Duck" seems to be quite similar to a subtype constraint "α_A <: Duck".

One interesting property that YinYang has is that method calls are nominally tied to trait declarations; that is, "A.Waddle()" infers that A is some kind of Duck because the Duck trait declaration includes the Waddle method. TinyBang does not require or offer such a declaration mechanism, so the desugaring of "A.Waddle()" would only infer that A needs to be able to Waddle.

One reason that TinyBang does not use a signature inference approach similar to YinYang is the weakly dependent type behavior of scapes. The TinyBang expression "((char -> 'z') & (int -> 4)) arg" is inferred to have type "int" whenever arg is "int"; this will be true regardless of how many boundaries the arg variable has crossed, how complex its type became, or how complicated the bodies of these (otherwise type-constant) functions become. These weakly dependent properties allow TinyBang to model complex, widely-scoped invariants, but they are also the reason that our inferred types make for bad module signatures: they expose a lot of implementation detail, meaning that almost any interesting change to the implementation will change the signature.
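
Roughly the closest off-the-shelf Haskell approximation of that behavior (a sketch for illustration only, not a description of TinyBang's inference): making the result type track the argument type already requires an explicit class declaration with an associated type, whereas in TinyBang it falls out of inference on the bare expression.

{-# LANGUAGE TypeFamilies #-}

-- a class standing in for ((char -> 'z') & (int -> 4))
class Dispatch a where
  type Result a
  dispatch :: a -> Result a

instance Dispatch Char where
  type Result Char = Char
  dispatch _ = 'z'      -- plays the role of (char -> 'z')

instance Dispatch Int where
  type Result Int = Int
  dispatch _ = 4        -- plays the role of (int -> 4)

-- dispatch 'q'        == 'z'
-- dispatch (3 :: Int) == 4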

I hope I didn't ramble on for too long. Thank you for the thoroughly interesting and quite related read. :)

The term assignment graph

The term assignment graph described in Section 2 seems to be quite similar to a type constraint set

The primary difference in YinYang is that constraints always flow in the requirement direction and never in the "provides" direction, which means we have to do something special for encapsulated values in methods that cannot take on any more traits.

TinyBang does not require or offer such a declaration mechanism, so the desugaring of "A.Waddle()" would only infer that A needs to be able to Waddle.

Ya, YinYang doesn't support duck typing by default. You could always make Waddle an abstract method in one trait though, which is roughly equivalent.

These weakly dependent properties allow TinyBang to model complex, widely-scoped invariants, but they are also the reason that our inferred types make for bad module signatures: they expose a lot of implementation detail, meaning that almost any interesting change to the implementation will change the signature.

YinYang signatures are also not fit for human consumption, and inference is sensitive to small implementation changes, e.g. consider changing:

def foo: 
 return 10

to

def foo:
 return 10 + 20

The latter signature will actually include "+"'s arguments and return values, since "+" is a public method. YinYang's encapsulation function will hide private terms (non-argument/return values) as well as private members (if encapsulating through a trait), so implementation changes do not disturb the signature except when they occur at the periphery of the compilation unit.

I am getting into a bit of dependent typing in YinYang when considering how to model dynamic type checks; e.g. consider:

def foo(x,y):
 switch x:
  case z : int => y.Waddle()

If x is an int in this case, then y must be a duck. I haven't implemented this yet and I'm not sure if this scheme will work; the closer I approach dependent types, the more nervous I get about scalability.

A sort of dual limitation

Thanks for the feedback. :) I think I've sorted out a relationship between TinyBang and YinYang. In TinyBang, because all data is pushed forward, we cannot typecheck code unless we have a concrete example of data. (This will lead us to requiring type signatures at module boundaries, because we can use the type signatures as concrete examples.) In YinYang, however, we must have an example of use in order to correctly construct an object; that is, the naïve translation of something like a Java factory class would not work unless the code at the construction site exhibited the behavior necessary to ensure that the trait was included. For instance, the class

public class FooFactoryImpl implements FooFactory {
  public Foo make() {
    return new FooImpl();
  }
}

is not correctly translated to

def make():
  new()

since we won't know anything about the object by the time it crosses the module boundary. (I imagine something like a type cast or an optional explicit form of object construction would get around this problem.)

This isn't really a tremendous limitation or anything, but I feel like it completes a puzzle for me and I wanted to share. :)

You can always specify your

You can always specify your trait extensions explicitly (new is FooImpl), but it's not how YinYang was designed to be used; it doesn't need factories given mixin-like traits and trait inference.

We could actually allow for inference of trait extensions for encapsulated object allocations in a way that is safe; type safety and modularity aren't the issue. The issue is how to satisfy constructor arguments.

Conditional Constraints?

In response to your dependent typing comment, it sounds like it gets pretty close to a full-blown conditional constraint system. I've seen similar work in the space (e.g. I was just looking at "A versatile constraint-based type inference system" by François Pottier) and it seems like you are right: there are significant obstacles in terms of scalability. I'm not terribly familiar with efforts to simplify conditional constraint sets, though, so I'm afraid I'm not in a position to give a summary.

Packaging, shipping, and

Packaging, shipping, and composing programs is quite possible without compiling them first, or by incrementally recompiling them live.

In what form would you package and ship such programs? With whole program compilation you're forced to ship the entire source code or some AST-equivalent form. As you previously mentioned, this has licensing issues, and not just for proprietary code; even some open source licenses would preclude this sort of compilation.

Separate compilation also impacts the scalability of the build process. Modular compilation implies parallel compilation along module boundaries, and incremental recompilation as you previously mentioned.

If code distribution is the

If code distribution is the norm for a language - e.g. in Python or JavaScript - then developers and their communities will choose licenses with this in mind.

Separate compilation of modules might be a sufficient condition for a scalable build process. But it is certainly not a necessary condition. Incremental compilation can be based more on memoization, caching, dynamic algorithms. Parallel compilation can be based more on pipelines, purity of the compilation process, and comprehending dataflows.

Assuming your concerns of licensing and scalability were addressed, would you be satisfied? Or are we beating around some deeper root-concern here? Is it due to bad experiences with C and C++? I've been confused about the often strong emotional reactions surrounding this issue.

Is it due to bad

Is it due to bad experiences with C and C++? I've been confused about the often strong emotional reactions surrounding this issue.

Not just C++; whole-program compilation in other languages like MLton has also led to pathological cases of some sort. Certainly far less common than with C++, of course.

It also seems like a good idea to have well-defined boundaries for understandability and debugging purposes. If your language semantics requires global reasoning, it just seems like an impediment to mental scaling. If your language is not an impediment to mental scaling, then it probably has a modular compilation strategy, so why not specify it?

Further, if you have some feature that depends on whole program resolution, like Haskell's type classes, first-class cases or multimethods, it's not at all clear how this interacts with dynamic code loading. A modular compilation strategy clarifies all such issues.

I'm not particularly against whole program compilation for systems programming, since the environments here are far more constrained and dynamic loading less pressing, but it certainly seems more important in higher level systems.

the moment you specify it, you lose something

If your language is not an impediment to mental scaling, then it probably has a modular compilation strategy, so why not specify it?

Because, the moment you specify it, you lose something.

The moment you start drawing hard lines, you lose the expressiveness of fuzzy lines. And this has a real cost in terms of flexibility, adaptability, certain forms of natural robustness and ad-hoc specialization. This is an interest of mine, cf. my article on stone soup programming.

If your language semantics requires global reasoning, it just seems like an impediment to mental scaling.

I agree there. I'm all for 'compositional' reasoning, which is by nature even more constrained than 'modular' reasoning.

But I think (hope) reasoning can be modular or compositional without having so many hard lines. This should even be more natural. Just think about how you go about your day, the plans you make, how your goals are integrated into a reasonable course of action.

how this interacts with dynamic code loading

I strongly believe that any such interaction issues should be addressed in semantics, not via an independent 'modular compilation strategy'.

Imposing whole-program is a design defect

If you can explain modular compilation (and typechecking, etc.), you can do whole-program as well (just make the whole program a single unit). So none of the problems you pointed out are a fundamental limitation of languages/systems designed with modular compilation in mind.

I find the fight for modular compilation to be a good design pressure. You smell something wrong when you see some ambient authority (without necessarily being able to immediately tell what will go wrong with it); I smell issues when whole-program something is mentioned -- in a way that makes it clear that the alternative of modularity is not available.

Indeed

I whole-heartedly agree. Whole-program means that you are almost inevitably going to run into a scalability problem. It's bad for both compilers and users.

If you cannot compile modularly, then -- even if you are fine with a JIT -- you cannot generally compile lazily or in parallel, which is what high-performance runtimes need to do.

And if you cannot even type-check modularly, then that means that the user also cannot reason modularly.

It's fine to use whole-program compilation. It's not fine to require it.

Re: scalable, lazy, parallel

You assert several things: "inevitably going to run into scalability problems", "cannot generally compile lazily or in parallel", "cannot type-check modularly means the user cannot reason modularly".

But I am not convinced. Indeed, I have already provided counter-arguments regarding scalability, parallelism, and modular reasoning elsewhere in this discussion. Those issues are orthogonal to source-level modules.

I'm sure laziness could also be addressed, but I don't know of any motivation for doing so. At least in the context of whole-program compilation and per-callsite specialization, I can compute precisely what I need and I'm not aware of a good reason to be lazy about it.

Can you provide strong logical justifications for your assertions? Or is your "whole-hearted agreement" based more in your heart than your head?

There is a bit of a grey

There is a bit of a grey area around whole program. There is always a whole-program step; even in C you have the linker. The question is how much happens in that step. There are obviously huge advantages to having modular type checking, but for compilation/optimization I'm not so sure. What you want is fast compilation and good optimization quality. There is no real reason why that should happen at module/file granularity; those boundaries are there for humans. You want to optimize across module boundaries. If you have some other mechanism to do incremental/fast compilation, without necessarily losing cross-module optimization, that is better. You could imagine some kind of dependency tracking, like self-adjusting computation, where only part of the compilation gets invalidated if you change the code. For example, if function foo in module A got inlined into function bar in module B, then if foo's code changes, bar has to be recompiled.
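
A minimal sketch of that kind of invalidation tracking (hypothetical names, not tied to any particular compiler): record which definitions were compiled against which others, then recompile the transitive dependents of whatever changed.

import qualified Data.Map as Map
import qualified Data.Set as Set

-- dependents of d = definitions whose compiled code baked in knowledge
-- of d (e.g. because they inlined it)
type Dependents = Map.Map String (Set.Set String)

-- everything that must be recompiled when `changed` is edited
invalidate :: Dependents -> String -> Set.Set String
invalidate deps changed = go (Set.singleton changed) [changed]
  where
    go seen []     = seen
    go seen (d:ds) =
      let direct = Map.findWithDefault Set.empty d deps
          new    = Set.difference direct seen
      in  go (Set.union seen new) (ds ++ Set.toList new)

-- if A.foo was inlined into B.bar, editing A.foo invalidates both:
-- invalidate (Map.fromList [("A.foo", Set.fromList ["B.bar"])]) "A.foo"
--   == Set.fromList ["A.foo", "B.bar"]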

The question is how much

The question is how much information from the linked units seeps into the linker (or, to put it another way, how strongly signatures encapsulate unit implementation details). For humans, this influences how they can understand various forms of semantic feedback; for compilers, this influences how scalable their analyses can be. Incremental/fast compilation only raises the ceiling; it doesn't make the unscalable scalable (e.g. consider CFA).

You could imagine some kind of dependency tracking like self adjusting computation where only part of the compilation gets invalidated if you change the code.

YinYang works like this right now: it uses Glitch (dependency tracing + operation logging) to incrementally adjust unit data-flow analysis and signatures on a code change; it's fairly efficient and can deliver real-time feedback. But having signatures is still quite useful for scalability reasons! And of course, the whole live programming thing means Glitch is diligently applied (it's not just type-checking computations that get invalidated, but program execution nodes!).

What is quite cool is that modularity is not just a static source code concept, but also applies to execution; e.g. the fewer dynamic dependencies between two nodes, the better the program's incremental and parallel execution characteristics. There is a grand unified theory of modularity to be found somewhere, I think; at one point in my life I was interested in finding it (but it made for a disastrous thesis proposal...).

In the long run, we agree

In the long run, we agree that a whole-program requirement isn't practical. Our current strategy is to start from that position and work backwards, determining how much of the whole-program requirement we can relax.

Our objective is to infer types for a programming language with a very flexible data structure and a duck-typing feel. Constraint-based types are excellent for this except that, because they are so precise, the types we infer are not suited to be used as an interface: they describe too much of the inner workings of the value. At some point, we need to know from the programmer which of the module's internal invariants should be exposed and which should not. At this point, it seems that explicit type signatures at module boundaries should be sufficient; this allows the programmer to draw the line wherever they choose.

Re: "It's a design defect because I smell issues"

I am interested in any objective, concrete issues you can provide.

It is true that optimization and fragility issues can be partially addressed by performing some whole-program compilation. However, you seem to be considering only half the picture. As a language designer, after you make a decision to support separate compilation of modules, you will make other decisions that curtail the compile-time communication between modules. As a concrete example, you are likely to eschew compile-time arguments, functions, and return values. The optimizations you achieve with whole-program compilation of a system designed for modular compilation are unlikely to be very expressive, explicit, or effective compared to what a language designer who chose whole-program compilation will achieve.

Further, while whole-program compilation of a system designed for modular compilation can address some of the fragility and safety concerns (especially those resulting from module-version management), we still have weaker safety properties just because we push certain forms of inter-module negotiations into the runtime. I have a hypothesis that separate compilation of modules tends to obfuscate implementation-leaks rather than avoid them.

And all that doesn't even begin to address the expressiveness issues! I didn't mention those as 'a problem' because, in context, Sandro had just finished mentioning how PatBang was super-expressive before he voiced "the only downer". But that doesn't mean we should ignore the issue. A system designed for modular compilation will certainly curtail some forms of expressiveness, in particular those involving subprogram transforms that would otherwise cross compile-time module-instance boundaries.

As a language designer, I seem to have this choice:

  • I choose whole-program compilation and get a bunch of powerful explicit optimizations, compile-time negotiations between modular elements, greater compile-time comprehension of relationships between modules, greater expressiveness.
  • I choose separate compilation of modules and my language smells better to people who can't seem to explain why; I lose a bunch of expressiveness; a bunch of optimizations become a great deal more complicated; and I get a whole additional set of version-management, DLL-hell, and fragility issues (as though source-code version management isn't bad enough). To address those, I'm told "but you can do whole-program compilation in this system!"

I'm having a very difficult time seeing why option #2 is the better one.

I do find merit in incremental builds. Even though some changes can require rebuilding huge chunks of the program graph, many changes tend to be local or naturally incremental. But incremental compilation does not imply separate compilation of modules. It seems to me that simple memoization, perhaps coupled with a little analysis of the opaque parametric types, is sufficient.

Though, focusing on incremental builds can be misleading. In my experience, separate compilation of modules is pretty much orthogonal to the issue of whether you'll need to rebuild your whole system after a given change. If a module is a dependency of many other modules, then much recompilation must occur. The difference - which has dubious relevance - is whether the whole system compile happens one module at a time or whole-program. I think it an interesting point that an incremental strategy for whole-program compilation seems much more likely to deal effectively with identifying "irrelevant changes" than just relying on separate compilation of modules.

I also find merit in separate compilation at higher layers in staged computations: mobile software agents, services, UI extensions or plugins. Though, I still want source-code or a representation with full-abstraction; in this case I simply mean "I don't recompile the usage context" by providing semantic support for dynamic behaviors.

I am interested if anyone can provide a good, objective argument in favor of separate compilation of source-level modules, under the assumption that incremental compilation and staging can be addressed effectively by an orthogonal mechanism.

A system designed for

A system designed for modular compilation will certainly curtail some forms of expressiveness, in particular those involving subprogram transforms that would otherwise cross compile-time module-instance boundaries.

I don't think this is right. Given any particular language feature (e.g. "subprogram transforms" -- whatever that means) that you want to support, there is a compilation process that is as modular as is possible, i.e. the minimal amount of information about modules is exposed in order to be able to compose them. If this minimal amount of information is too large then you have an honest problem with reasoning in your language, along the lines that Sandro pointed out earlier. The only thing that I see you might gain by keeping everything whole-program is the ability to use ad hoc schemes for modularity without going through the process of formalizing how exactly they work. This is what I took you to be getting at earlier with your "fuzzy lines" comments. But any particular scheme that works does so for a reason and that reason can be formalized.

If this minimal amount of

If this minimal amount of information is too large then you have an honest problem with reasoning in your language, along the lines that Sandro pointed out earlier.

Let's try a concrete example: Haskell's type classes. Instance resolution is global, so what are the reasoning consequences?

1. overriding existing instances involves a newtype declaration, e.g. overriding the ordering on numbers (see the sketch after this list).
2. how many Haskell programs need to enable overlapping instances?
3. type-level search allows a sort of compile-time metaprogramming, although there is no reason in principle why this sort of search couldn't also happen at some sort of link time.
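
To make point 1 concrete (a standard Haskell idiom, not from the paper): you can't locally swap in a different Ord Int instance, so descending order is obtained by wrapping values in a newtype that carries the alternative instance.

import Data.List (sort)

newtype Desc a = Desc a deriving (Eq)

-- the "overriding" ordering lives on the wrapper, not on a itself
instance Ord a => Ord (Desc a) where
  compare (Desc x) (Desc y) = compare y x

sortDescending :: Ord a => [a] -> [a]
sortDescending = map unwrap . sort . map Desc
  where unwrap (Desc x) = x

-- sortDescending [3,1,2] == [3,2,1]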

Does this imply Haskell-style type classes are anti-modular in important ways? I know some people think so, like the BitC folks, and there have been all sorts of proposals to address this, like named instances and instance arguments. Haskell seems to get along ok though. (Edit: although I do think something better would solve all of these issues, and provide a sane story for modular linking)

I don't think there is any

I don't think there is any general problem with globally resolving certain things, as long as you make enough guarantees to reason locally in the ways you need to reason. I don't like Haskell type classes very much (particularly the way they link), but I think you may be trying to frame the problems with them in too abstract a setting. My problem with them is that we shouldn't be dispatching on types; looking at them as inferred parameters is much cleaner, IMO.

Typeclasses

I'm confused about what this example is intended to argue or demonstrate. But I do believe we could benefit from more precisely modeling the 'dataflow semantics' of typeclasses, i.e. how they enter our environment and how they are selected.

I've made a few attempts to model typeclasses with regards to distributed programming, extensions, plugins, open systems. My approach has usually drifted towards a 'staged programming' model, where one of the stages is just the gathering of typeclass instances (or publishing, for active module concepts). Despite my best efforts, I found this awkward and a terrible waste of a fine stage. Maybe someone else can do better.

But during these efforts, I found variations that are almost as expressive, simpler, more configurable, and support spatial reasoning. The idea is simple: typeclasses become objects in some form of scope-managed special variable. There seems to be a relationship to algebraic effects and reader monads.
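
As a rough Haskell sketch of that idea (my names, using mtl's Reader as the scope-managed variable): the dictionary is an ordinary value in the environment, and a subcomputation can be run under a different one.

import Control.Monad.Reader (Reader, ask, local, runReader)
import Data.List (sortBy)

-- the "typeclass" reified as an ordinary value
newtype OrdDict a = OrdDict { cmp :: a -> a -> Ordering }

-- code that needs an ordering asks the environment for it
sortHere :: [a] -> Reader (OrdDict a) [a]
sortHere xs = do
  d <- ask
  return (sortBy (cmp d) xs)

-- `local` rebinds the dictionary for just one subcomputation
demo :: ([Int], [Int])
demo = runReader both (OrdDict compare)
  where
    both = do
      ascending  <- sortHere [3, 1, 2]
      descending <- local (const (OrdDict (flip compare))) (sortHere [3, 1, 2])
      return (ascending, descending)

-- demo == ([1,2,3], [3,2,1])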

Really, this is probably the better option. Attempting to precisely model Haskell's typeclasses seems to involve a lot of awkward effort for a negative return on investment - e.g. a loss of fine-grained configurations for different subprograms.

provide a sane story for modular linking

Indeed. I'm all for "sane stories for modular linking". Programmers can model a modular system, model the linking process, perform modular reasoning. I just don't see how separate compilation is supposed to be a sane part of this story. (If I'm missing something, I'd like to know what.)

Flat global namespaces (and

Flat global namespaces (and consequently resolution) are actually a good thing for humanity. They don't so much hinder separate compilation as they do enable a global dictionary of things in the programming universe (like Google does for the web...).

The problem is that type checking is still intimately involved in name resolution, which I think is quite wrong. Names are for humans; don't waste them on compilers (tm J. Edwards). Once we accept that, resolving names to GUIDs through some process outside the compiler avoids the problem completely.

source modules vs. modeled modules - same diff

The ability to model a modular structure within the language is something I encourage: software agents and blackboards, services and clients, frameworks and components, functions and objects. Programmers can operate within ad-hoc, formally or implicitly understood module systems. They can have modular or spatial reasoning, and very flexible organization of behaviors and data. A well-designed module system within the program also enables simplistic memoization-based incremental compilation to be relatively effective.

But I think this does not make a good (or even relevant) argument in favor of separate compilation of modules. Consider:

  1. there is no inherent reason the modeled modules will align with source-level modules; in practice, they often won't
  2. even if you can make your compiler aware ("oh, hey, this is a model of a module that crosses several source boundaries; please compile it"), then in that case the "module" is effectively the "whole program" as far as the people in favor of separate compilation are concerned
  3. even for modules modeled within the language, there can be a great deal of benefit from compile-time specialization and negotiation between modules. Interestingly, compile-time communication can be modular yet still contradict separate compilation of modules
  4. even for modules modeled within the language, programmers can benefit from high-level metaprogramming that configures, arranges, transforms, and operates on several of these semantic modules together, to specify ad-hoc subsystems

From the former two points, it seems dishonest to answer the issue by saying that I can model modules within the language. From the latter two points, it would be pointless to do so, since I'd just end up repeating my arguments (including even the 'fuzzy lines'/adaptive-modules concerns) for each module system invented within the language.

When I say "modular compilation will certainly curtail some forms of expressiveness, in particular those that involving subprogram transforms that would otherwise cross compile-time module-instance boundaries", this true for logical 'module-instances' modeled within the application, just as it is for source-level modules that the compiler recognizes.

Whole-program and compilation

I've corrected the typos in the PDF on our website; thanks for calling them out. :)

The whole-program requirement can be a bit of a downer, yes; one doesn't want to typecheck the entire stdlib just for "Hello, world." Because we had such aggressive demands for our type system -- being able to typecheck a scripting-style programming language with no type declarations -- we started by assuming whole-program and intend to see how much of that requirement we can relax (as opposed to letting the modularity requirement constrain our expressiveness). We're working on a model now which would use static type signatures at module boundaries. (The problem of defining a subtype relation between two polymorphic constrained types is not known to be decidable -- see http://pl.cs.jhu.edu/papers/sas96.pdf -- but we believe that signature matching can be solved with a slightly weaker relation.) In terms of type system rules and operational semantics, though, modularity doesn't seem to present a big problem.

We actually have been considering our compilation strategy; Hari, another author on the above, is the lead on that work. Most of the subtle issues with the above type signature approach have to do with memory layout of data structures. We aim to avoid any form of runtime hashing or other expensive lookup when accessing the components of onions but, given how flexible the data structures are, there doesn't seem to be a general memory layout for onions that will work for all programs without an expensive runtime lookup. So we intend to pick a different, tailored memory layout for each program. This will allow us to generate lookups which are simply offsets based on runtime tags, but it means that each module has a different layout for its data; we're currently considering a few different ways to resolve this.

Whole program is a

Whole program is a reasonable place to start, I think.

YinYang avoids the undecidable subtyping problem by not bothering with subtyping checks, using compatibility checks instead.

YinYang just uses dictionaries. I'm doing so many other horrible things in my language (like logging operations for replay/live programming purposes) that method lookup is just in the noise at this point. For the same reason, compilation in YinYang doesn't optimize with custom layouts, which avoids having to change them on a code change (besides, YinYang supports dynamic inheritance/uninheritance, so this would screw that up).

Why?

Can someone explain this:

seal = fixpoint (seal -> obj ->
  obj & (msg -> obj ('self (seal obj) & msg))) in ...

What's the point of onioning obj to the left of a scape that matches any msg? How is that different than the following?

seal = fixpoint (seal -> obj ->
  msg -> obj ('self (seal obj) & msg)) in ...

Another question: is shadowing allowed? x & x

Onioning the Object

The distinction between the two is that, in the case of the former, any non-scape values on obj are still visible. That is, if we have

def obj = `counter 0 & ...

then

def sObj = seal obj in sObj.counter

will still work (bearing in mind that dot projection is really just sugar for a scape that gets the contents of the `counter label).

In response to your second question, the onion operator concatenates shallowly and asymmetrically, preferring elements toward the right. For instance,

(`A 3 & `B 2) & (`B 4 & `C 5)

is equivalent to

`A 3 & `B 4 & `C 5

because the projector `B is the determining factor and we prefer elements on the right. "x & x" is equivalent to "x", of course, taking "x" to be a variable (and thus not inducing aggregate side effects).
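
In record-ish terms, the label case above behaves like a right-biased shallow merge; a tiny Haskell analogy (labels as strings, purely for illustration):

import qualified Data.Map as Map

-- right-biased shallow merge: on a label clash, the right operand wins
onionMerge :: Map.Map String Int -> Map.Map String Int -> Map.Map String Int
onionMerge l r = Map.union r l   -- Map.union is left-biased, so pass r first

example :: Map.Map String Int
example = onionMerge (Map.fromList [("A", 3), ("B", 2)])
                     (Map.fromList [("B", 4), ("C", 5)])
-- == Map.fromList [("A", 3), ("B", 4), ("C", 5)]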

We made the asymmetric decision because it allows us to encode a lot of fundamentally asymmetric things (like inheritance and overriding) with very little effort.

Oh, I see

Thanks. Yes, I somehow missed that the first time through.

It may seem unusual that obj is, at the top level, a heterogeneous “mash” of a record field (the `x) and a function (the scape which handles `inc). This is sensible because onions are type-indexed [27], meaning that they use the types of the values themselves to identify data.

Weird.

(7 & 3) + 1 is just 4.

Just like in C!