How important is language support for namespace management?

This paper lists the most widely used languages on Red Hat Linux in 2001/2002. They were: C, C++, Shell, (Emacs) Lisp, and assembler (in that order).

It's generally agreed in the literature that language support for namespace management is important, but only one (C++) of the five most popular languages actually has that feature. The other four scrape by with manual prefixing of exported names, yet they have produced some of the largest and most widely used software systems in existence (Linux kernel, GNU, Emacs).

I don't want to go as far as saying that the lack of namespace management could be a cause for their success, but rather ask the question: Is it crazy to design a new language with a single, flat namespace, or could it actually be worthwhile?

Flat Namespace

I am enamored of the concept of shallow, global namespaces. I believe that languages encourage whatever they make easy. When creating little namespace niches for a project is easy, then that is what people will do. The negative effects: it becomes difficult to share, search, and reuse code across projects, difficult to integrate projects and automate integration tests, difficult to know when a change in one project breaks other projects.

On the other hand, global namespaces create political contention for popular names, can result in elongated names, and not all project code is intended for sharing.

Avoiding political contention has historically been achieved by de-facto mediation. For example, in Java, we might use 'org.lambda-the-ultimate.ClassName' to name a package. In Haskell, you would instead obey the hackage/cabal conventions. For Perl, I understand that you use 'package-Descriptive-Sentence' and drop it into cpan. Many of these approaches achieve a 'global' namespace, but fail to keep the names shallow, short, or memorable.

I've been wondering if a Wiki-based naming convention might be more suitable - contention for names would be much higher, but would be resolved socially and mechanically (i.e. by fistfights, arguments, and global refactoring). A few black eyes and bruised fists among central contributors might be an acceptable sacrifice if it results in shorter, more memorable names for regular users.

As an aside, C's namespace is not truly 'flat' in the sense discussed above. In-the-large, C can be observed as having an ad-hoc hierarchy based around filesystem organizations. C seems to choose the worst of each world when it comes to namespaces. Collision resolution is based on ordered parameters to the linker (plus explicit overrides). Its namespace even fails to inform a linker where to find the implementation code.

Names should be structured at least well enough to help the linker (and type-system, and IDE, and so on) locate the implementation. A truly flat, unstructured namespace is achievable, but will not simplify things; instead, that would shift essential complexity to IDE, linker, project-files and makefiles. There are many advantages to having the names indicate where an IDE or linker should look for definitions, making it easy to bounce around code and precisely determine what needs to be rebuilt after a change.

So I would not recommend a truly flat namespace unless you have a global dictionary. However, a very 'shallow' namespace can be achieved by eliminating hierarchy beyond 'function@package'.

Flat namespace is definitely

Flat namespace is definitely possible in a language that supports naturalistic naming, I think we've gone over this in another thread before. Basically, natural language doesn't have namespaces, names all exist in a flat dictionary, ambiguity is resolved using context.

There would be a lot more required to make a naturalistic naming system available, like how to name code constructs in a natural way when they are often artificial constructs. But then I'm thinking ahead to code wikis...

Discoverability, Language

Discoverability, Language Features, and the First Step Toward Composition [LtU] held an interesting discussion on this subject.

That said, I'm entirely wary of context-based naming. Humans have a hard enough time with dangling modifiers and similar grammatical structures in natural languages even when they're in full control of the context. In a multi-developer environment, we risk modifying the context assumed by other developers. Further, context-based names may hinder refactoring (such as extracting a method).

Shallow artificial namespaces, minus context, are what I imagine would be a likely target for the core language. Anything further could be handled by extensible syntax mechanisms. In the language I'm developing, I am thinking that even 'import package (x y z)' might be handled by extensible attribute grammar, causing a parser to replace 'x' with 'x@package' for the rest of the file.
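To make the 'import package (x y z)' idea concrete, here is a rough sketch in Python. It is not an attribute grammar, just a line-by-line text rewriter, and the function name qualify_imports and the exact import syntax it accepts are my own assumptions for illustration. It consumes each import line and qualifies the listed bare names for the rest of the file.

```python
import re

# Assumed surface syntax: "import <package> (<name> <name> ...)" on its own line.
IMPORT_RE = re.compile(r"^\s*import\s+(\w+)\s*\(([^)]*)\)\s*$")
# A bare identifier: not part of a longer word and not next to an '@'.
BARE_NAME_RE = re.compile(r"(?<![@\w])\w+(?![@\w])")

def qualify_imports(source: str) -> str:
    """Rewrite imported bare names to 'name@package' (hypothetical helper)."""
    bindings = {}  # bare name -> package it was imported from
    out = []

    def qualify(match):
        name = match.group(0)
        return f"{name}@{bindings[name]}" if name in bindings else name

    for line in source.splitlines():
        m = IMPORT_RE.match(line)
        if m:
            for name in m.group(2).split():
                bindings[name] = m.group(1)
            continue  # the import line itself is consumed
        out.append(BARE_NAME_RE.sub(qualify, line))
    return "\n".join(out)

rewritten = qualify_imports("import package (x y z)\nresult = x + y + other")
```

After the rewrite every use site carries its package, so a reader or tool can tell where 'x' comes from without looking back at the import line. Names that are already qualified are left alone.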

Basically, natural language

Basically, natural language doesn't have namespaces, names all exist in a flat dictionary, ambiguity is resolved using context.

I think this only works when there's a protocol for backtracking and/or disambiguation. Humans do this naturally during conversation, by assuming people mean X until they encounter a contradiction and backtrack to the point of assumption and reconstruct the conversation using assumption Y instead, or they engage a communication protocol requesting the other party to more precisely define their terms. We have some tools and mechanisms which look like this, but nothing exactly like it.

Backtracking Parsers

Backtracking is an old, common technology in parsers and in automated theorem proving (which may be used for logic programming, type-inference and such). We can produce a lazy forest of ASTs from text then further filter by type analysis. We've intentionally gone in the other direction, though - avoiding these approaches where possible because backtracking can result in exponential search paths (and sloooow parsing). But I'm not seeing what we're missing on that aspect.

Though it should be noted that, quite often, ambiguities go forever unresolved among humans. We even use that as a basis for humor (double entendres). Sometimes we even use ambiguity for income (psychics and palmistry... and politics).

More relevantly, 'human conversation' is interactive. We can indicate confusion by asking "what do you mean?", or providing a response that describes our understanding, or by body language. We can gain confirmation of our understanding, or denial of it - and sometimes further explanation. We can challenge one another's understanding (test questions) in order to gauge it. We depend on these mechanisms on a regular basis.

It would be nice if software development also took more advantage of such mechanisms - i.e. via ever more interactive IDEs and good support for language-side heuristics, strategies, databases - but this is an area where we lack tools.

Dictionaries and Contexts

Flat namespace is definitely possible in a language that supports naturalistic naming, I think we've gone over this in another thread before. Basically, natural language doesn't have namespaces, names all exist in a flat dictionary, ambiguity is resolved using context.

A namespace solves what name you want from what dictionary.

Namespace mgmt and build mgmt are orthogonal

Names should be structured at least well enough to help the linker (and type-system, and IDE, and so on) locate the implementation. A truly flat, unstructured namespace is achievable, but will not simplify things; instead, that would shift essential complexity to IDE, linker, project-files and makefiles.

Not quite. You could well have a (require foo) form, that tells the compiler and linker where to find the package "foo" (edit: e.g. the file "foo.l" in some standard directory), but the package's symbols would still be imported into the only, global namespace (and would thus usually be prefixed with "foo-", as in Elisp).

Not so orthogonal

I recall use of the word "should", not could :-).

The technique you describe does not allow a compiler/interpreter/IDE to generally know the source for a symbol at the point of the symbol's use. That is, code such as (require foo) (require baz) could result in a very different program than does (require foobar) (require baz). It remains unclear to a linker/compiler how to handle naming conflicts. A refactoring IDE would be a pain to write.

A moderately sane variation on that is the hyperstatic global environment, where names can be traced to their source. Add some export control to the hyperstatic environment (to prevent accidental exports and coupling to implementation-details), and allow 'import as' or some sort of qualified reference to imported names, it would be reasonably complete.
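A toy model of what I mean by hyperstatic, sketched in Python. The class and method names are made up for this post. The assumption baked in is that each definition sees a frozen snapshot of the bindings that existed when it was made, so a later redefinition shadows the old name for new code but never changes what existing code refers to. Each binding also records its source.

```python
class HyperstaticEnv:
    """Toy hyperstatic global environment (illustrative names only)."""

    def __init__(self):
        self._bindings = {}  # name -> (source, value), latest definition only

    def define(self, name, source, make_value):
        # make_value receives a snapshot of the bindings visible right now;
        # later definitions cannot alter what it captured.
        snapshot = dict(self._bindings)
        self._bindings[name] = (source, make_value(snapshot))

    def origin(self, name):
        return self._bindings[name][0]

    def value(self, name):
        return self._bindings[name][1]

env = HyperstaticEnv()
env.define("greet", "lib_a", lambda seen: (lambda: "hello"))
# 'shout' captures the 'greet' that exists at this point (the one from lib_a).
env.define("shout", "lib_b",
           lambda seen: (lambda g=seen["greet"][1]: g().upper()))
# Redefining 'greet' shadows it for later code but leaves 'shout' untouched.
env.define("greet", "app", lambda seen: (lambda: "hi"))

shout_result = env.value("shout")()
greet_result = env.value("greet")()
```

Here 'shout' keeps calling lib_a's 'greet' even after the application redefines it, and origin("greet") reports which source the current binding came from.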

Naming conflicts are verboten

Let's say that naming conflicts are simply not allowed, i.e. the compiler gives up immediately and yells at you. That shouldn't be a problem — in my proposed scheme, naming conflicts wouldn't happen in normal use, because all names (except for kernel language words) would be prefixed with their module name.

In this case, it's always clear what definition a name refers to.
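A minimal sketch of that scheme in Python, with made-up names (FlatNamespace, NameConflict): one global table, prefixing is only a convention, and any redefinition is a hard error.

```python
class NameConflict(Exception):
    """Raised when a name is defined twice in the single global namespace."""

class FlatNamespace:
    def __init__(self):
        self._defs = {}  # name -> (defining module, value)

    def define(self, module, name, value):
        if name in self._defs:
            owner = self._defs[name][0]
            raise NameConflict(f"{name!r} is already defined by module {owner!r}")
        self._defs[name] = (module, value)

    def lookup(self, name):
        return self._defs[name][1]

ns = FlatNamespace()
# With the prefix convention, two modules can both offer a "parse".
ns.define("json", "json-parse", lambda s: ("json", s))
ns.define("xml", "xml-parse", lambda s: ("xml", s))

# Without the prefix, the second definition is rejected outright.
ns.define("json", "parse", lambda s: ("json", s))
try:
    ns.define("xml", "parse", lambda s: ("xml", s))
    conflict = None
except NameConflict as exc:
    conflict = str(exc)
```

Note that nothing in the table enforces the prefix; it only works as long as everyone follows the convention.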

(As to the hyperstatic environment, I'm sceptical because it goes against both the C and Lisp traditions.)

Naming conflicts are still possible

Naming conflicts are still possible. (Defining the consequence doesn't avoid the cause.) And there is still no clear relationship between a name and the module it comes from. You might see Casey's comment, and my response, for some of the issues your proposal handles poorly.

As to the use of brick and mortar, I'm skeptical because it goes against the mud and straw traditions.

Ha!

As to the use of brick and mortar, I'm skeptical because it goes against the mud and straw traditions.

I had that coming. :D

mud houses

actually sometimes have some benefits over brick and mortar. of course, that doesn't mean the same goes for hyperstatic vs. c/lisp.

Alan Kay

From Scientific American, 1984:

The same notation that specifies elevator music specifies the organ fugues of Bach. In a computer the same notation can specify actuarial tables or bring a new world to life. The fact that the notation for graffiti and for sonnets can be the same is not new. That this holds also for computers removes much of the new technology's mystery and puts thinking about it on firmer ground.

As with most media from which things are built, whether the thing is a cathedral, a bacterium, a sonnet, a fugue or a word processor, architecture dominates material.

Of course, I've seen/heard Alan repeat that emboldened part of the quote at least one other time: His OOPSLA 1997 speech, The Computer Revolution Hasn't Happened Yet.

A little harder to get right than you might think

Let's say that naming conflicts are simply not allowed, i.e. the compiler gives up immediately and yells at you.

Yes, "simply". ;-)

See this paper I just posted to my blog.

Where's the problem?

Can you describe in your own words where you see the problem?

I don't see how that paper relates to the issue; in fact I don't even get what its claim is.

It is saying that as you add

It is saying that as you add new attribute values, if you structure your queries a certain way, then erroneous services will be treated as fulfilling the demands of the query. In other words, wild cards don't guard against the addition of a service that you don't want. If that service you don't want happens to be located, say, closer to your mobile device using INS, then the mobile device could swap to that erroneous service. INS is supposed to guard against name resolution failures and service failures, but not necessarily misconfiguration of queries. The authors of this paper are saying that INS as-is was stupid in that regard.

Namespace management is considered fairly important if you want the ability to configure mobile, pervasive, ubiquitous computing devices that can fallback to other external services in the event of failures in primary services.

However, you don't want something like a Java Messaging Service chain of responsibility to erroneously have the wrong service handling a message. Who knows what the repercussions of that are (ahem... I know of one military case I can't speak of)...

Namespaces and Mobility

What you're describing seems far afield of the 'source-code namespace' that the OP was describing. They're good points, though; I have a lot of interest in mobile agent programming and mobile devices, and agree that changing between implementations of services introduces its own namespace issues.

Removing all ambient authorities for designating services by name can go a long way towards eliminating these problems (admittedly, by eliminating easy access to problematic solutions). Developers are forced towards more robust naming solutions. Some of the better solutions embrace object capability languages plus language support for at least one of the many unum patterns.

No

What you're describing seems far afield of the 'source-code namespace' that the OP was describing.

The context is perhaps far afield.

The basic ideas are NOT.

Here is another example:

Java Security: Hostile Applets, Holes & Antidotes by Gary McGraw and Ed Felten, and followed up in Securing Java. For example, see Chapter 5 Section 7: You're Not My Type for coverage on so-called type confusion attacks from bad VM implementation.

So let's see. We started off with you replying "Namespaces and Mobility". Your next reply should be titled, "Namespace and Mobility and Security". :)

Namespaces and Mobility and Security :)

That those issues are combined in Java says more about Java than it does about how "The basic ideas" are related. There is no requirement that the source-code namespace bear any relationship at all to the code and registries accessed by mobile objects, applets, etc.

Separating "the basic ideas" of source and runtime namespaces - and keeping their implementation separate - is useful for reasoning about consistent semantics for mobile code, and for reasoning about security, and for supporting hot-pluggable language runtimes.

No management is better than mismanagement

Perhaps the better question is whether namespace systems actually solve namespace issues, in the sense of correctly distinguishing two things with the same name. I mean, that shouldn't be hard, right?

Okay then, so I create a new language. It's got namespaces. The handful of users my language has think it's all great!

But then more people show up, and soon they don't all talk to each other regularly. I write some Official Namespace Policy Guidelines but, unexpectedly, people don't follow them perfectly. Two libraries are released that, for whatever reason, use the same namespace and have name collisions. Both libraries are for some reason amazingly useful, and I'd like to use both in an application. How does the namespace system handle this?

At any rate, people keep using the language. Some popular libraries grow and develop over time, but people are busy, and the libraries aren't always completely up-to-date. Now I'm writing a piece of code that uses two libraries, each of which uses a different, incompatible version of some third library, causing namespace collisions between the two versions. How does the namespace system handle this?

At some point, two popular libraries with related functionality get tired of duplicating effort, so they collaborate to produce a third library providing the best of the shared features. To avoid unnecessarily breaking client code with the new versions, they'd like the new library to be able to occupy both original namespaces, or otherwise be easily substituted in. How would the namespace system handle this?

Another popular library has some developers leave over a dispute regarding open-source licensing (apparently, the license text of the original library ended sentences with prepositions and there was no consensus on relicensing to "correct the grammar") and produce their own reimplementation of the shared library. They'd like it to be a drop-in replacement for the other. The interface and functionality are the same, backed up by a huge suite of unit tests checking for identical behavior between corresponding functions. An application I'm working on uses two libraries, one of which requires the replacement library. The other library has no opinion on preposition placement, and uses the original basically by default, but could work with the replacement. Does the namespace system give me any means to easily reconcile this?

...I could probably come up with plenty more semi-contrived examples, but the point is: if a namespace system ostensibly exists to improve modularity and organization, it really ought to not ever make things worse. If the answer to any of the above scenarios is "modify the code for the problematic librar[y|ies]" then my modularity and organization are not being improved and I'll be sad. If namespace issues actively obstruct me from using quick workarounds, I'll be even sadder.

It's easy to design a system to make easy things a bit easier, but hard to do so without simultaneously making hard things a lot harder!

The benefit of not providing any namespace management at all is that it forces people to deal with all the headaches sooner, and accustoms them to dealing with irritating namespace issues, so that when an unusually tricky issue comes up--well, it's only a bit worse than the constant day-to-day hassles, and at least there's nothing getting in the way of hacking together some workaround.

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.

-- Douglas Adams

Unless you have a clear idea of the benefits, limitations, flexibility, and extensibility of a namespace system--or anything else baked into a language, for that matter--maybe better to leave it out entirely.

Import/Export Controls

The examples you describe are resolved readily enough by controlling exports and imports.

That is, importing a package should only introduce the names that library explicitly exports. This prevents you from importing the package's dependencies, and avoids unnecessary conflict. Additionally, you should be able to control the names you import (i.e. via 'import org.blog.whatever as W' or 'import foo (x y z)') such that someone adding a new export to an existing library will not affect the meaning of the code that uses it.

These controls resolve the 'two libraries' problems - or at least the namespace problems - by allowing them to coexist. There are, of course, many other potential problems with multiple conflicting libraries. These other issues must be resolved by other language design elements, such as: (a) abstract modules to avoid hard-coded inter-lib dependencies, and (b) object-capability language to avoid the 'multiple authoritative frameworks' issues that can arise with global state, FFIs, and multiple variations of a library.
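Roughly what I have in mind, modeled in Python. The Package class and its methods are invented for this example and are not any real module system. Only explicitly exported names are visible, and the importer either lists the names it wants or takes the whole package under a qualifier.

```python
from types import SimpleNamespace

class Package:
    """Toy package with an explicit export list (hypothetical model)."""

    def __init__(self, name, definitions, exports):
        self.name = name
        self._definitions = dict(definitions)
        self._exports = set(exports)

    def import_only(self, *names):
        """Model of 'import foo (x y z)': bind exactly the listed names."""
        missing = [n for n in names if n not in self._exports]
        if missing:
            raise ImportError(f"{self.name} does not export {missing}")
        return {n: self._definitions[n] for n in names}

    def import_as(self):
        """Model of 'import foo as W': exports are reachable only via W."""
        return SimpleNamespace(**{n: self._definitions[n] for n in self._exports})

# Version 2 of 'foo' adds a new export 'z'; '_helper' is never exported.
foo_v1 = Package("foo", {"x": 1, "y": 2, "_helper": 99}, exports=["x", "y"])
foo_v2 = Package("foo", {"x": 1, "y": 2, "z": 3, "_helper": 99},
                 exports=["x", "y", "z"])

scope_v1 = foo_v1.import_only("x")
scope_v2 = foo_v2.import_only("x")  # same bindings despite the new export
W = foo_v2.import_as()              # W.z exists, but never as a bare 'z'
```

The importer's bare names are the same against either version, so a new export in the library cannot silently capture a name the client was already using.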

Funny

Additionally, you should be able to control the names you import (i.e. via 'import org.blog.whatever as W' or 'import foo (x y z)') such that someone adding a new export to an existing library will not affect the meaning of the code that uses it.

The paradoxical thing is that such conflicts (where a change to a library affects the meaning of an importer) are simply not possible* in a "primitive" system like Elisp's — because all names are always prefixed with their module name, they never conflict with the importer's names.

(* As long as all programmers have the discipline to prefix their symbols. [Edit: and there's a central naming authority, which can be assumed thanks to the internet.])

Fully qualified names

I favor fully qualified names as well. But developers tend to be lazy, and aren't so interested in writing out the large package names every time. I would not be surprised to learn of many cases where elisp developers slipped in discipline.

And I know for certain that C developers often hate using package_name_function(args) all over the place, and that they introduce plenty of conflicts in practice (often because what was intended to be a local-use-only library eventually gets promoted).

Since you seem insistent on promoting Elisp's design, how about you answer Casey's challenges?

Modify the code

My answer is actually "modify the code for the problematic librar[y|ies]". This doesn't work for the general case, of course, but I'm assuming an environment like the Linux kernel, where code is continually massaged by a tight-knit group of programmers.

But what if it is "necessary"?

This prevents you from importing the package's dependencies, and avoids unnecessary conflict.

Unless the package does, in fact, need to re-export things from its dependencies. Some sort of data structure, perhaps, that forms part of the public interface for the package, that you need to work with as part of using the library.

With two libraries doing that with incompatible versions of the data structure library, then what? Could tie the names to the package that re-exported them somehow, but then if you have two libraries using the same version of the data structure, your code can't pass data from one to the other. Or disallow any sort of re-exporting, but that just rearranges the problems for the "incompatible versions" case. Either you've got the same name referring to different things, or different names referring to the same thing, or things you can't give a name to at all.

Not to mention that I'm distrustful of any solution that relies on other programmers doing the right thing. People make mistakes, libraries might behave improperly in ways that will only become obvious much later. "Well, they shouldn't have done that" fails to placate when one is faced with the resulting compiler/linker errors.

If a namespace system can't actually keep the association between names and logical entities straight in a useful, adaptable way, then what is it good for?

Import Controls

With Import/Export controls, you can control both sides - what is exported, and what is imported, and under which names things are imported. So you can also control conflict on the import side.

And even if you're using nominative typing, the type-system (and the IDE, and the users) can know easily enough when two names mean the same type. This isn't so different from 'typedef' in C/C++ or 'type' (as opposed to 'newtype') in Haskell. Having multiple names referring to the same type or same value is never a problematic scenario. Thus, from the "Either you've got..." list, we choose "different names referring to the same thing".
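A small Python illustration of "different names referring to the same thing". The libraries here are stand-ins built with SimpleNamespace, and all names are hypothetical. Two libraries re-export one data structure under different names, and data still flows between them because both names resolve to the same entity.

```python
from types import SimpleNamespace

class Tree:
    """Defined once, in a hypothetical shared 'datastruct' package."""
    def __init__(self, items):
        self.items = list(items)

# Two stand-in libraries re-export the same class under different names.
lib_a = SimpleNamespace(Tree=Tree, build=lambda xs: Tree(xs))
lib_b = SimpleNamespace(Container=Tree, size=lambda t: len(t.items))

same_entity = lib_a.Tree is lib_b.Container
count = lib_b.size(lib_a.build([1, 2, 3]))  # value built by lib_a, used by lib_b
```

This only covers the case where both libraries use the same version of the data structure. With incompatible versions there would be two distinct classes, and the identity check above is exactly what a tool could use to tell the cases apart.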

Separate Authorship

The problem you are talking about here is that of separate authorship of symbols.

That is addressable by making identifiers be composed of an author and a name.

You can then remap authors in the case of collisions.

This is essentially how CL packages work -- a CL package is more properly an author-environment.
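A sketch of the author-plus-name idea in Python. This is only loosely inspired by CL packages and does not reproduce their actual API; Ident, remap_authors and merge are names I made up. Identifiers are (author, name) pairs, and a collision between two authors who picked the same label is fixed by renaming one author on import.

```python
from typing import NamedTuple

class Ident(NamedTuple):
    author: str
    name: str

def remap_authors(env, mapping):
    """Return a copy of env with author labels renamed according to mapping."""
    return {Ident(mapping.get(i.author, i.author), i.name): v
            for i, v in env.items()}

def merge(*envs):
    """Combine environments, refusing to silently overwrite an identifier."""
    merged = {}
    for env in envs:
        for ident, value in env.items():
            if ident in merged:
                raise KeyError(f"collision on {ident.author}:{ident.name}")
            merged[ident] = value
    return merged

# Two unrelated libraries both used the author label "acme".
lib_one = {Ident("acme", "parse"): "parser from library one"}
lib_two = {Ident("acme", "parse"): "parser from library two"}

try:
    merge(lib_one, lib_two)
    collided = False
except KeyError:
    collided = True

# Renaming the author of the second library locally resolves the collision.
combined = merge(lib_one, remap_authors(lib_two, {"acme": "acme2"}))
```

The remapping is local to the importer, so neither library has to be edited to make them coexist.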