Import Systems

An import system connects the inside and the outside of a language from within the language itself. Imports or includes are declared as statements that get resolved by a preprocessor, a compiler, or the language runtime. Particular solutions such as those of C/C++, Java, or Python are well known but hardly ever reflected upon, except when people struggle yet again with one or another.

What is the state of the art in 2013? Are there innovations? What are the perspectives under aspects such as security, ease of configuration, and robustness? Should a loose organization wired together by external configuration files or a "classpath" be preferred over stricter forms of organization, like a plugin repository at a dedicated path? Is there an abstract, platonic ideal that conflicts with the gritty reality of real operating systems?

Just to avoid misunderstandings: I'm not asking about advanced module systems and import expressions that might contain parameters, inject dependencies, or call for dependency-resolution solvers that choose between module variants - at least not insofar as they don't contribute to the idea of how a language relates to its environment.

no wildcards

In my opinion, wildcard imports are considered a mistake nowadays. Java programmers almost never use import foo.*. Of course, C/C++ programmers are stuck with their #include mechanism, which is always wildcard-style.
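
For concreteness, here is the classic hazard, sketched in Python (the same point applies to Java's import foo.*): a wildcard import can silently shadow an existing name.

    # Wildcard import: every public name from math lands in our
    # namespace, silently shadowing the built-in pow().
    from math import *

    print(pow(2, 10))   # 1024.0 -- math.pow, a float, not the built-in

    # Explicit import: provenance is obvious, nothing else is shadowed.
    from math import sqrt
    print(sqrt(2))      # 1.4142135623730951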

disagree :-)

Haxe before version 3 did not support .*, and that drives me nuts daily. Haxe 3 is, I think, now supporting .* :-)

DI, plugins, reflection

I think most of the state of the art here is happening in dependency injection frameworks, plugin models, and also in some reflective frameworks (e.g. in Java). It's an area where dynamic features can be quite valuable.

Modules/import should die someday

This whole idea that there should be some sort of unit above the language's own abstraction capabilities has been necessary but has never worked very well. It is much better just to build search and versioning straight into the language somehow, and then throw all code into some sort of Internet-like global namespace.

Excellent idea

That is an excellent idea for brittle, unmaintainable, unevolvable, creaky and downright broken software. At Google we have an absurd amount of code--nearly all of it open, shared, and reusable--and have learned very painfully that unstructured reuse of code is extremely bad. It leads to long build times, huge binaries, frequent breakages, and a considerable drag on productivity and evolution. We have learned the hard way that modules really do help programming in the large.

Modules are compatible with

Modules are compatible with Sean's vision. They'll be first class, subject to functional processing, transformation, inclusion in lists, etc.

Static reasoning about modules

First class modules are very difficult to make static guarantees about. Dealing with a codebase in the many millions of lines of code requires static reasoning, and I am skeptical that first class modules can be designed that don't defeat large-scale tools.

Static reasoning in

Static reasoning in first-class staged programming systems does require alternative mechanisms - e.g. dependent types, or evidence-carrying code that allows reconstructing a proof (cf. Microsoft Research's F*). I'd say some such techniques already beat traditional large-scale tools, at least for expressiveness and utility in open or long-running systems.

Static local reasoning can similarly be supported if imports themselves are described or implicitly constrained in terms of proofs and contracts (e.g. import by type, constraint models).
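
As a toy sketch of the latter (hypothetical module names, with Python type hints standing in for real proofs or contracts), an "import by type" could scan candidate modules for an export matching a requested signature:

    import importlib
    from typing import get_type_hints

    def import_by_type(candidates, export, hints):
        # Return the first module among `candidates` whose `export`
        # carries exactly the requested annotations. A stand-in for
        # import-by-constraint; a real system would check proofs or
        # contracts rather than mere type hints.
        for name in candidates:
            module = importlib.import_module(name)
            fn = getattr(module, export, None)
            if callable(fn) and get_type_hints(fn) == hints:
                return module
        raise ImportError("no candidate exports %s with %s" % (export, hints))

    # Hypothetical usage: bind to whichever installed module offers
    # parse(s: str) -> int, no matter which package provides it.
    # mod = import_by_type(["fastparse", "slowparse"], "parse",
    #                      {"s": str, "return": int})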

I don't really expect these techniques to take off this decade, though.

Self-imposed structure

If we integrate modules seamlessly into "the language's own abstraction capabilities," then I think the "Internet-like global namespace" will not be unstructured, due to its own self-imposed structure. When Sean McDirmid talks about search and versioning, I hear flexible dependency resolution and immunity to (or at least awareness of) breaking changes. (Maybe I hear too much? I might be coloring over this with my own vision.)

Nevertheless, maybe I'm not telling you anything you didn't expect; your concerns would still be valid whenever the module developers didn't impose an effective structure on themselves. Programmers would need to use semantically honest and open-minded search terms, or else they'd have brittle one-to-one dependencies. Programmers would need to do frequent deprecation-related refactoring, or else they'd drag down build times and binary sizes with several API-incompatible versions of a "single" dependency. It would be nice to minimize these responsibilities too, but how? I expect cultural conventions to become more supportive than formalism at this point.

Language agnostic modules

I think language-provided modules are best, and there are certainly opportunities along the lines you describe. However, real systems for the foreseeable future are going to continue to be multi-module and multi-language. Thus some kind of language-agnostic, or at least inter-language, module system is needed.

That is an excellent idea

That is an excellent idea for brittle, unmaintainable, unevolvable, creaky and downright broken software.

I'm sure someone said this about the web once; then we fixed the problem without going the Yahoo-directory route. The module problem in programming seems to be mostly one of legacy, but I don't think it's an intrinsic one. I really see no reason for the multiple layers of modularity that we have now - just look at how awkward the classes-vs.-modules debates were 15 or so years ago. I bet we can solve this problem with one top-level program-organization abstraction (like classes or traits, or functions if you are from that world) + much better tooling to handle the macro issues (search, versioning).

I played the module game before, and this is just my feeling on where we should have been going instead.

I'm sure someone said this

I'm sure someone said this about the web once, then we fixed the problem without going the Yahoo directory approach.

Well, humans tend to handle 404s, syntax errors and grammar ambiguities much better than your average programming language does.

I don't disagree with your feeling that having both modules and classes/traits/functions carries some level of redundancy.

Oh, the problem is much more

Oh, the problem is much more difficult than that. It's versioning and trust that are the biggest issues with not having deployment modules. There is also cost to consider, but then we fly into the problem of component markets.

Wouldn't it suffice to

Wouldn't it suffice to replace the harsh "module X not found" error on a failed import with a search result? In a search-as-you-type editor this possibly wouldn't differ much, UX-wise, from your more radical suggestion. It is also no law of nature that autocompletion must work from the root of a package to the leaf. I'm annoyed by long namespace paths as well, but wouldn't mind them if I could type "Token" after "import" and get a list of all packages containing "Token" in their path, ordered by certain preferences.
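
A minimal sketch of that in Python, using pkgutil (top-level modules only; a real tool would walk nested packages and rank the results by preference):

    import pkgutil

    def search_importables(term):
        # Importable top-level module names containing `term` --
        # what a failed import could offer instead of a hard error.
        term = term.lower()
        return sorted(info.name for info in pkgutil.iter_modules()
                      if term in info.name.lower())

    print(search_importables("token"))   # e.g. ['token', 'tokenize']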

Well, we know at least this about how NOT to do it.

Whatever else happens, we now know to avoid building a system where a file must be scanned/preprocessed/parsed every time it is imported.

C uses a preprocessor that can change the meaning of a header file between iterations of reading it. A source file can define preprocessor directives and then #include the same header file to get different effects. Or the header file itself can define preprocessor directives and thereby change the meaning of its next inclusion, regardless of what the importing module does. Therefore you have to re-preprocess and re-parse that header file every time someone includes it -- and every time something they include includes it -- and every time something they include includes something that includes it -- ad nauseam. Although the exponential base is low in typical usage, the time spent processing .h files grows exponentially with the depth of the include graph, and beyond a certain point a large project does not scale.

To avoid the problem, your 'importables' need to be idempotent, so there's no need to read any of them more than once when building a project. Peculiarities of any source module's interaction with them (for example, its choice of which definitions to import) need to be defined in the including module, not in the 'importable'.
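
Python's module cache is one design along these lines: a module body executes once, and every later import of the same name is served from the cache. A minimal sketch:

    import sys, types

    # Build a module by hand and register it, just as the import
    # machinery would after executing a file once.
    mod = types.ModuleType("fake_header")
    exec("print('module body executed'); VALUE = 42", mod.__dict__)
    sys.modules["fake_header"] = mod

    # Both imports are served from sys.modules; the body never re-runs.
    import fake_header
    import fake_header as again
    assert again is mod and again.VALUE == 42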

The collapse of the universe

In Python, module caching has always been quirky. The language uses a flat cache (sys.modules) that associates a dot-separated module path "M1.M2..." as used in import statements with a (module) object. This makes (module) look-ups really fast, but it can happen that the same physical module is accessed via different import paths and yields different (module) objects. In a sense the "physics" is not coordinate-invariant: changing the perspective means changing the world.

Usually, for us, "changing the perspective" means that we use different sets of paths, e.g. a classpath which changes from device to device. Having the wrong classpath means that the world is simply broken: galaxy clusters go black and vanish from our universe, if the whole universe doesn't collapse outright. Although the collapse may not be entirely avoidable, it happens so frequently that I wonder whether the number of occurrences couldn't be somewhat reduced.
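
A minimal reproduction of the quirk, assuming a layout of pkg/__init__.py plus pkg/mod.py, with both the project root and pkg/ placed on sys.path:

    import os, sys

    sys.path[:0] = [".", os.path.join(".", "pkg")]

    import pkg.mod   # cached under the key "pkg.mod"
    import mod       # same file, executed a second time, cached as "mod"

    # One physical module, two module objects: the cache is keyed by
    # the dotted path at the import site, not by the file's identity.
    assert sys.modules["pkg.mod"] is not sys.modules["mod"]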

Managing Complexity

It sounds as if you know something about this, but it is fun to post simple answers to monumental problems. The basic problem of regulation is to see, and to identify. A classic in this area is the paper by Conant and Ashby. Here are a few more references. The principles are universal. An observable thing has bounded complexity - Gurevich's postulate two.