A module system for the C family

Doug Gregor of Apple presented a talk on "A module system for the C family" at the 2012 LLVM Developers' Meeting.

The C preprocessor has long been a source of problems for programmers and tools alike. Programmers must contend with widespread macro pollution and include-ordering problems due to ill-behaved headers. Developers habitually employ various preprocessor workarounds, such as LONG_MACRO_PREFIXES, include guards, and the occasional #undef of a library macro to mitigate these problems. Tools, on the other hand, must cope with the inherent scalability problems associated with parsing the same headers repeatedly, because each different preprocessing context could affect how a header is interpreted, even though the programmer rarely wants that. Modules seek to solve this problem by isolating the interface of a particular library and compiling it (once) into an efficient, serialized representation that can be imported cheaply whenever that library is used, improving both the programmer's experience and the scalability of the compilation process.
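
A minimal sketch of the include-ordering problem described above, written as a single C file with comments marking where two headers would have been textually included. The headers liba.h and libb.h, the macro TIMEOUT, and the function libb_timeout are all invented for illustration and do not come from the talk.

```c
/* ---- stand-in for a hypothetical liba.h ---- */
#define TIMEOUT 5            /* unprefixed macro leaks into everything included later */

/* ---- stand-in for a hypothetical libb.h, included after liba.h ---- */
#ifndef TIMEOUT
#define TIMEOUT 30           /* the default that libb's author intended */
#endif

/* libb's inline function silently picks up liba's value instead of its own default. */
static inline int libb_timeout(void)
{
    return TIMEOUT;
}
```

Here libb_timeout() yields 5 rather than libb's intended 30. With the opposite include order libb would see its own default (and liba's later #define would then clash with it), so the meaning of libb.h depends on what happened to be included before it. That dependence is what prevents a tool from parsing the header once and reusing the result.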

Slides [PDF] and video [MP4]

Slides and videos from other presentations from the meeting are also available.

a nice improvement

The approach this group describes is to replace #include by a form of module import, which is how programmers think of the facility anyway. The perhaps unobvious trick is to scope the goal down to just killing #include. The rest of the preprocessor is left alone, and in particular, you can import a macro definition from another module.

I'm glad the authors consider how to transition from current C to C with modules. However, in this case, I'm not clear what the precise transition problem is; maybe this is a case where the feature can be added and that's the end of the story. Providers of C libraries can ship both .h files and modules, where the modules are built by including the .h files in them. Users of C libraries can use #ifdef and autoconf to choose between modules and #include.
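
A rough sketch of what that #ifdef fallback might look like on the user's side. HAVE_MODULES is a hypothetical configure-time macro, format_greeting is an invented example function, and the spelling "import std.stdio;" is an assumption about the proposed syntax rather than anything standard C accepts today.

```c
/* HAVE_MODULES is a hypothetical macro that a configure script might define. */
#ifdef HAVE_MODULES
import std.stdio;        /* assumed spelling of the proposed import; not standard C */
#else
#include <stdio.h>       /* fallback: classic textual inclusion */
#endif

/* The rest of the file is the same whichever branch was taken. */
int format_greeting(char *buf, size_t size, const char *name)
{
    /* snprintf returns the number of characters the full greeting needs. */
    return snprintf(buf, size, "Hello, %s!", name);
}
```

With HAVE_MODULES undefined, the import line sits in a skipped preprocessor group, so an ordinary C compiler accepts the file unchanged. That supports the point that the feature can be adopted incrementally.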

how does it compare to Knit?

http://www.cs.utah.edu/flux/papers/knit-osdi00.pdf

A small pragmatic step

This seems to be a very simple proposal: add support for import and export, but no external linking or renaming (which Knit and other unit systems provide).

we're still using C?

oh man!

In OS and embedded systems,

In OS and embedded systems, sure! No one has tried to build a decent module system for C++ yet, but then I think that would look something like Jiazzi (if it was done units style).

This is a module system for C++

The LLVM project is for "the C family", including C++. Doug Gregor chairs the ISO C++ committee's study group on modules, the main motivation comes from the C++ community, and IIUC the focus is on modules for C++, not C. If it works for C and Objective-C too, great. Replacing #include with import is relevant to the whole family that shares the C preprocessor, even if you're not using C itself.

How do you handle

How do you handle inheritance? Is there any signature ascription going on?

Not sure what you mean here.

Not sure what you mean here. What does any of this have to do with inheritance? Why do you think anything would need to change about it?

If your module system

If your module system supports signature ascription, then you'll have to figure out what you can safely hide while still enabling separate compilation (the whole point of a module system actually). I guess it's pretty much expected that inheritance relationships can't be hidden if one is dealing with C++, but what about non-public, possibly virtual methods? Are cyclic dependencies allowed, where mutual extension occurs between two modules? Name clashes can then become a problem.

I'm assuming that they just did the simple thing here, used header files as module signatures, and added an import directive mainly for namespace control (this might not even be about separate compilation, the standard conflation of modules for both tasks).

i know, it was meant to be rhetorical :-)

i just am sad about it, is all. especially since i'm doing objective-c at the moment.

I'm not familiar with the

I'm not familiar with the term "signature ascription", sorry. The only explanations I could find seemed to be of some concept specific to SML, a language I don't know.

In the module proposal here, modules are about separate compilation (basically standardizing a form of precompiled headers) and encapsulation (importing a module only brings in the symbols that were explicitly exported, instead of everything in the header). Namespaces are orthogonal to modules.

(Edit: That was meant to be a reply to Sean, sorry about the misthreading.)

module abuse

You are still talking about namespace management, not encapsulation. Encapsulation is hiding something so it cannot be used outside of a designated scope. Namespace management is allowing us to name a symbol in some scope when that symbol is not encapsulated to that scope. So if we were talking about Java, "private" would encapsulate a member, while package import would make some class/member visible in a certain scope (namespace management) without qualification. Java lacks modules, or rather, classes act as poor modules in Java (namespace management gets intertwined here). Java really has no good story here, and C definitely has never had a good story for separate compilation that I know of.

Also, separate compilation is more about the safety problem of being able to detect type errors in a modular way vs. the simple physical problem of compiling bits separately (if you need to track dependencies, it's not really separate compilation!). Or to put it another way, it's about safe linking of program parts whose bits are compiled separately. If one module changes, do we know when we need to recompile the other modules, or is some sort of hacky dependency management scheme needed?

Big change

After reading the PDF/PowerPoint type presentation, I'm not too sure what to think.

It's a big change for C. There's a new notion of "public" and "private". It doesn't introduce namespaces to C, and it doesn't seem to interact with the namespace system in C++. That may need to be thought through.

There's a certain amount of UNIX semantics built into this.

module ClangAST {
  umbrella header "AST/AST.h"
  module * { }
}

assumes that programs are stored as files in a UNIX-type directory tree. C and C++ did not previously assume that.

The notion of wildcard includes is troublesome. Now the structure of the program sometimes (but not always) depends on the contents of directories.

This looks like too much to solve the #include problem, but not enough to manage modules well.