The ideal style/system for namespaces.

I've been designing my own PL as a hobby for some time now, and I've reached the point where I need to start thinking about some kind of system for organizing functions, constants, variables, etc. The two systems I'm most familiar with are Perl's (namespaces largely by convention) and Java's (namespaces by enforcement). I'm curious what you guys think about the topic: are there features of some other system that are must-haves, or, conversely, complete messes that should be avoided at all costs for anything of practical use? So far I've {un,}fortunately been leaning towards namespaces by enforcement, BUT I don't think that is quite ideal, since the language isn't intended to be purely object-oriented and there isn't anything good to pick for enforcing a file name like there is in Java.

A bit about the language: I've been trying to design it to be as close as possible to an ideal language for applied mathematics and physics (I've given theoretical work some thought, but I'm not sure how to accommodate it just yet). Because of this it has taken on a more functional design, though not to the point where it becomes difficult to write general programs in. It originally started as an open-source clone of the Frink language/system, but I've since decided that I didn't like Frink's grammar (there are a few things in it that make it difficult, if not impossible, to parse with a yacc-like parser), so I've started taking it in my own direction. There is an evaluator on the website, but please be gentle with it, since it all runs on a VM at my house and the interpreter has some issues with recursion (its memory profile grows very fast). On the back end it uses PARI/GP to do the actual calculations, which gives it high-precision floating-point math along with complex math, etc.

You can read about the language and its development on my site at http://simcop2387.info/
Please be gentle with me if I've accidentally put this post in the wrong place.

EDIT: made the URL above into a link.


Review

I'll point you to a few older threads on the subject that I found interesting:

Namespace design is important. At the very least, it affects compilation, backwards and forwards compatibility issues, code reuse, support for refactoring IDEs, whether a makefile system is needed, etc. You could do a lot better than Perl... and you can do so without becoming nearly so restrictive as Java.

Maybe I'll go code Ackermann on your VM now. :-) [Well, drat, page not loading.]

Thanks for the information

Thanks for the information; I'm going to read those threads right now. They all look like great places to start. I consider myself reasonably well informed on the theory involved in all this, but that doesn't mean I can tell what will work in practice.

The outage may have just been a temporary interruption. (It's hosted on a free web host that doesn't have any illusions about five 9s, but it's free and otherwise decent. It's loading for me right now; if not, I'll start seriously investigating.)

ML module system

Since you didn't mention ML module systems in your post, I'm assuming that they're worth mentioning here. One introduction is in the OCaml manual. Module systems of this kind provide namespace management and more.

Module System != Namespace Management

Calling a module system namespace management is stretching it. It's like calling Java classes namespace management, whereas, to me, namespaces seem like an orthogonal feature to both.

Module System >= Namespace Management

Namespace management is a central aspect of any useful module system. Of course, it's far from being the only one. But it's not orthogonal either.

Module System <= Namespace Management

I would assume ML had namespaces if I could write stuff like "inria.fr.Functor.apply". As it is, I only see a series of (parameterized) grouped declarations. A module in OCaml's sense is just a pure declaration of a construct.

Your mileage may vary depending on your definition of what modules or namespaces are, but if I take the Java/C view of it, OCaml's modules have little to do with it.

[ Let's continue this discussion until we got the p.o. right ;) ]

Yes you can

I would assume ocaml had namespaces if I could write stuff like: "inria.fr.Functor.apply."

And you can. You just nest modules. No idea, though, whether that does or does not constitute "a pure declaration of a construct", and if I should worry about that...
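For instance, a minimal sketch in OCaml (the module names here are made up for illustration, and `Functor` is just an ordinary module name, not an ML functor):

```ocaml
(* Nesting modules yields hierarchical, dot-qualified names,
   much like "inria.fr.Functor.apply". *)
module Inria = struct
  module Fr = struct
    module Functor = struct
      let apply f x = f x
    end
  end
end

let () =
  (* Fully qualified access from outside the nest: *)
  assert (Inria.Fr.Functor.apply (fun x -> x + 1) 41 = 42);
  (* Or open one level to abbreviate the path: *)
  let open Inria.Fr in
  assert (Functor.apply succ 0 = 1)
```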

if I take the java/C view on it, ocaml's modules have little to do with it.

Er... What has C possibly to say about namespacing, except that, apparently, you can live with its complete absence?

How to speak Obamanese?

Oh come on. You get the point, I'll call it C++ then. You happy now?

As far as I can see, ML's modules are mostly used for introducing structured types, not, as per the Java/C++ convention, for ordering declarations in a certain scope. To me, those are different things.

And yes, you should bother. The question is central to what namespaces are for: for organizing modules, or for organizing software in a space where there are just a lot of different publishers.

That's why I chose the inria.fr.Functor example. What if glasgow.edu.uk also publishes its own Functor definition?

To be fair...

I didn't understand what you meant about C either. Maybe it's just that I've never been a big user of C++, but I would never have assumed that a comment about C referred to a feature of C++.

--Wpedantic-errors

Yeah, well. Most people I know hardly make the distinction, since they normally happily use C++ extensions such as arrows all over the place where old ANSI C forbids them. It was a typo and should have been clear from context.

Sorry

There's no need to get defensive... I didn't mean to criticize you, I just wanted to point out that Andreas wasn't the only one who misunderstood.

Ah well

Not defensive at all, just thought it was clear from context. So, I'll publicly state it was a typo, sorry for that, also excuses to Andreas then. For the rest, let's get back on topic.

Err...

What 'arrows' are you referring to? The only thing I can think of in either C or C++ that I would describe as an 'arrow' is the syntax for accessing members of a struct via pointer, foo->bar, which is ANSI C.

I must be older than you are

Hmm, I distinctly remember compilers complaining when I used that in C sources, but I can't say I ever bothered about it. C99? Hmm, found a reference in a C91 book. It ain't K&R C?

foo->bar syntax

The foo->bar syntax goes back to at least the mid 70s. See, for example, page 5 of the 1974 version of the C reference manual.

Strange

I definitely remember C compilers giving warnings about using C++ extensions in C code. I must either a) have remembered it wrong (was it using <<?) or b) it complained about something else (referencing something which was declared in C++ style)?

Nah, I remembered correctly. It must have been a compiler bug.

Could well be such a warning

I don't know which compiler, but back not too long ago (grin) I know some C/C++ compilers, such as Borland's for example, would do double duty based just on the file name extension. I can easily imagine such compilers emitting warnings along this line (good for the C++ adoption curve back then; C++ as just a "better C" was a big sell to crufty non-OO C programmer reactionaries :-). I cut my own teeth on Zortech C++ (grin again), and I think it was that way as well, but it's really been way too long to be sure.

Well, at least that mistake is cleared

It probably was something easy, like some implementor thinking that since '->' occurs a lot in Java/C++ sources, it ain't C; therefore my sources gave warnings when compiled with strict conformance; therefore a baffled me thought it ain't C.

Propagation of mistakes.

[Edit: this is actually a response to the post below, but I just thought this thread had continued long enough. Hey, I agree; I looked it up myself, and arrow is in C. I just distinctly remember a warning about using C++ constructs in C code, not an error, and being baffled by that message myself. I am pretty sure it was a wrongly issued warning which gave me the impression that arrow isn't in C.]

Again '->' pointer dereference is very much in C but ....

One might easily get a C++-ish error message in a hybrid compiler, such as:

err: foo.c:182 operator '->' not overloaded for type Foo

Maybe some crap like that.

?

It's easy to support "Inria.Fr.Functor.Apply" in the OCaml module system. Ur's module system is essentially the ML system, and it allows you to view nested directory structures in exactly this kind of way. It only takes a tiny preprocessing step to support this with any ML module system realization.

My Point

So either you do, or you don't, implement namespaces with a small preprocessing step.

Please use XML Qnames for everything. Thanks.

In XML, there is one big, flat, global name-space (for element and attribute names).

Every "fully qualified name" is composed of a "name-space name" (an arbitrary URL) and an unqualified name (a simple identifier).

There is (in XML) no syntax for writing a Qname directly. Instead, you must declare a namespace prefix, and then use that prefix. For example:

xmlns:math="http://www.example.com/NS/mathematics"

math:sum

The first line declares the local prefix "math" to indicate the namespace "http://www.example.com/NS/mathematics".

"sum" is the unqualified part of the name.

math:sum gives the full qname.

Of course, you can make defaults. For example:

xmlns="http://www.example.com/NS/generic"

print

In the scope of the default declaration, the unqualified name "print" is automatically qualified to refer to the namespace "http://www.example.com/NS/generic".

That's about as far as XML goes. I'd suggest three additions. First, as in this example:

xmlns:math="http://www.example.com/NS/mathematics"
xmlns:misc="http://www.example.com/NS/miscellaneous"
xmlname:sum="math:sum"
xmlname:print="misc:print"

sum
print

The final "sum" and "print" refer to "math:sum" and "misc:print" respectively.

Second, let:

xmlns:foo="*:"

define "foo" as the prefix for a namespace with a unique and unprintable (pseudo-)URL. So, for example:

xmlns:foo="*:"
xmlns:bar="*:"
xmlns:baz="http://www.example.com/NS/"
xmlns:bat="http://www.example.com/NS/"

foo:x
bar:x
baz:x
bat:x

"foo:x" and "bar:x" are distinct Qnames. "baz:x" and "bat:x" are the same as one another, distinct from "foo:x" and "bar:x". No other prefix equivalent to "foo" or "bar" can be defined.

Third, add a prefix-less syntax for Qnames:

xmlns:foo="http://www.example.com/NS/"

foo:bar
::http://www.example.com/NS/bar

The last two lines name the same Qname.

So.... why? My rationale: You have all the simple semantics of a flat, global namespace. Your global namespace is large enough and structured enough that with only a modicum of sanity programmers can avoid name collisions without needing to coordinate ahead of time. Within local lexical environments, you have short, friendly names for everything. Additionally, you can map your identifiers directly into XML element names and that is likely to prove parsimonious in the long run.
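For comparison, here is how the standard part of the above looks in actual XML (the URLs are placeholders, and the xmlname:, "*:", and "::" forms proposed above are extensions, not valid XML):

```xml
<doc xmlns="http://www.example.com/NS/generic"
     xmlns:math="http://www.example.com/NS/mathematics">
  <!-- "math:sum" is qualified by the declared prefix;
       "print" falls into the default namespace. -->
  <math:sum>
    <print/>
  </math:sum>
</doc>
```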

Won't simple binding handle

Won't simple binding handle the first extension you desire?

xmlname:sum="math:sum"
xmlname:print="misc:print"

Minus language-specific issues (like declaration syntax, const qualifiers, etc.), shouldn't we be able to simply write:

/* import machinery stuff */
...
/* resolve use of a few locally unqualified names */
val sum = math:sum
val print = misc:print
/* Example use of names */
int incrPrint(int x) = print(sum(x, 1));

I'm sleep deprived right now; I may well be missing some important capability related to moving this normal language binding capability into the regular namespace machinery. If so, I know someone will let me know :-)

re won't simple binding

No, I don't think that's equivalent.

If I say

xmlname:sum="math:sum"

then after that the identifier denoted by the lexeme "sum" is the same as the identifier denoted by the lexeme "math:sum".

If instead I say:

val sum = math:sum

then the lexemes "sum" and "math:sum" denote different identifiers - but presumably each denotes an identifier bound to a common value.

Interesting. I want to get

Interesting. I want to get it, but right now I can't wrap my head around "two identifiers are 'the same' identifier." I guess a nap hasn't cleared my head :-)

Lately I've been (way too lazily) implementing a sorta-kinda ML'ish language in Scheme (Racket), which supports import time renaming via prefixes and so forth. So anywho, I have a mental model of id's bound to values, including to reference values for mutable lvalues - the binding itself never changes. So "q = M.r" is no different than, say, "a = 1; b = a": there's no difference in the relations R(q,M.r) and R(a,b). For imported types, the semantics are similar - just an immutable "type foo = M.ty" style binding of a name to a type instead of a term. Yada yada.

So I imagine my thinking is kinda stuck in one direction.

I'm sure there's a counter example that distinguishes simple name/value (or name/type) binding from some notion of "two identifiers are the 'same' identifier" - but I can't think of it right now.

So given a hide+exp+imp-identifiers module system (plus renaming-style convenience features), it strikes me that alpha substitution of any 'renamed' imported identifiers with the fully qualified module+identifier names in an importing module would yield a semantically equivalent module, yes? (???)

Hmmm, I think I'm gonna learn something today :-)

S.

it's not "two identifiers are the same"

See my answer to Anton, below.

It isn't that "two identifiers are the 'same' identifier".

It is that "two lexemes denote the same identifier."

In a case insensitive language, foo and FOO are distinct lexemes which (in appropriate contexts) denote the same identifier.

Sorry, but

then after that the identifier denoted by the lexeme "sum" is the same as the identifier denoted by the lexeme "math:sum".

By the usual definitions of the terms "lexeme" and "identifier", this is incoherent. It implies a redefinition of those terms which seems likely to have several unfortunate consequences.

It destroys the usual 1:1 relationship between lexemes at the lexical level and identifiers at the syntactic level. It becomes necessary to use a term other than "identifier" to refer to what we normally call identifiers, because identifiers would now be a separate domain of values to which lexemes can be bound in arbitrary ways. If the term "lexeme" is used for this purpose, it erodes the normally clear distinction between the lexical and syntactic views of a program. It introduces an extra layer of indirection which duplicates the functionality of the usual binding of identifiers to values.

Given that multiple identifiers can be bound to the same value in most languages, this all seems very unnecessary.

"lexeme" v. "identifier"

By the usual definitions of the terms "lexeme" and "identifier", this is incoherent. It implies a redefinition of those terms which seems likely to have several unfortunate consequences.


It destroys the usual 1:1 relationship between lexemes at the lexical level and identifiers at the syntactic level. It becomes necessary to use a term other than "identifier" to refer to what we normally call identifiers, because identifiers would now be a separate domain of values to which lexemes can be bound in arbitrary ways. If the term "lexeme" is used for this purpose, it erodes the normally clear distinction between the lexical and syntactic views of a program. It introduces an extra layer of indirection which duplicates the functionality of the usual binding of identifiers to values.

With all due respect I think you are simply mistaken.

I regard a "lexeme" as a sequence of characters, typically one matching some regular pattern applicable within a context established by characters "to the left" of the lexeme. I regard an "identifier" as, as you suggest, a class of objects in an abstract syntax.

I believe my understandings are customary, as illustrated by the following example:

In the new programming language I've defined, identifiers are case insensitive.

I claim that what that typically means is that for each identifier (that contains a cased character) there exists more than one lexeme which denotes that single identifier.

I would describe a typical lexical analyzer as a program that takes a source text as input, that recognizes lexemes, and that produces as (ordinary) output (for lexically valid input) a sequence of syntactic objects (namely syntactic objects which are in some sense "primitive" in the grammar for the language being parsed).

You make a separate point here:

Given that multiple identifiers can be bound to the same value in most languages, this all seems very unnecessary.

I disagree. What you are describing (multiple names bound to "the same value") solves one set of problems. What I am describing (how to manage a distributed, decentralized namespace) solves a different set of problems.

For example, the ability to bind two distinct identifiers to a common value is one aspect of procedural abstraction. It is also a mechanism by which two bodies of code, which use common identifiers in incompatible ways, can be composed to produce a third body of code in which all of the named things in either original are named - but naming conflicts have been eliminated.

XML QNames and similar mechanisms solve different problems. In particular, they solve the problem of allowing well behaved (convention-following) programmers to reliably choose globally unique names within a flat, shared name-space with little or no need for ongoing coordination with any central authority. XML QNames contrasted with other solutions (such as simple "naming conventions") have three specific advantages (in my view): They allow names to be organized into abstract hierarchies; they provide a suitable mechanism for abbreviating very long names; and they "pun" nicely -- encourage parsimony -- by being so widely used throughout the W3C family of standards and tools related to those.

identifier-syntax

Given that multiple identifiers can be bound to the same value in most languages, this all seems very unnecessary.

It seems that there's some value in being able to abstract over identifiers, as evidenced by Scheme's identifier-syntax and Common Lisp's define-symbol-macro.

(I haven't thought this issue through yet, but I was thinking about it recently, so I thought I'd post that here.)

Simple modules offer a simple solution

So far I've {un,}fortunately been leaning towards having namespaces by enforcement BUT I don't think that is quite ideal since the language isn't intended to be purely object oriented and there isn't anything good to pick for enforcing a file name like there is in Java.

The ML/OCaml module system has already been mentioned. At its core, that general approach is similar to the approaches used in Scheme and Haskell. Their solution to the issue of not having "anything good to pick for enforcing a file name" is generally to have "modules" whose names typically map in some trivial way to file names.

Modules in this most basic sense are essentially arbitrary collections of definitions, but in practice they tend to be organized by their purpose and relationships, just as classes in a language like Java are. The effect from a namespace perspective ends up being similar to Java's, in that a module hierarchy maps trivially to a directory structure and ultimately to a file/module name.

There are some important differences compared to an OO language, though. One difference has to do with the scope of names. In an OO system, if two classes both define an identifier with the same name, there's no conflict because the name is disambiguated by the class of the object that it's used with.

In more functional languages, there's no such automatic disambiguation in the core language (ignoring type classes and object systems for the moment.) This means that client modules need to resolve any name conflicts between modules that they import, typically by specifying a prefix for identifiers from the module being imported.

For example, in the Haskell module system, you'd typically use something like this:

import qualified My.Really.Amazing.Module as A
import qualified BigCo.Mediocre.Module as Bleh

...and then if both modules define an identifier foo, you can refer to A.foo and Bleh.foo in the importing module. (Or import one of them without a prefix, if you like.) Other features typically provided by these systems include being able to selectively import or hide specified identifiers, and rename individual identifiers.
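A compilable sketch of the same idea, using the standard Data.Map module (the phone-book contents are made up; both the Prelude and Data.Map export a function named lookup, so the qualified import keeps the two from clashing):

```haskell
import qualified Data.Map as M

-- A small map whose module exports names that also exist in the Prelude.
phone :: M.Map String Int
phone = M.fromList [("alice", 1), ("bob", 2)]

main :: IO ()
main = do
  print (M.lookup "bob" phone)        -- Data.Map's lookup, via the prefix
  print (lookup "bob" [("bob", 2)])   -- the Prelude's lookup, unqualified
```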

The library system in R6RS Scheme follows a similar approach (slightly complicated by having to deal with macros and having explicit support for versioning.)

These systems are mostly quite simple, and provide a way of organizing code and managing namespaces without raising any technical difficulties to speak of. Broadly, Java solves the namespace problem by conflating namespaces with classes; Haskell and R6RS Scheme solve it by conflating namespaces with modules.

SML, OCaml, and some Scheme implementations (e.g. Racket and Scheme 48) take this a step further by providing higher-order modules - modules that can be parameterized by other modules. This is powerful, but also complicates things, and is mostly beyond the domain of namespaces, dealing instead with a notion of modules as first-class software components, more like OO classes than the simpler module-as-set-of-definitions approach.

That's what I wonder about

Import qualified is hardly used in Haskell; I didn't know about it, and you can read about the complaint here. I wonder if that is just the effect of being able to abbreviate qualified names? Having said that, abbreviation is a nice feature to have.

Musty old wiki pages

Import qualified is hardly used in Haskell

That's sort of true in the sense that it's only really needed when conflicts occur, but then it's pretty indispensable, and commonly used. A grep through the source of about 100 Hackage packages that I have handy reveals 3,357 uses of "import qualified".

The page you linked to is old, ported from the old Haskell wiki and based on an 11-year old email. Haskell has changed a smidgen since then. For a start, Hackage appeared, gained over 2,200 packages, and introduced many more name conflicts than existed in 1999.

Nowadays, the Haskell libraries contain all sorts of name conflicts, because of various reimplementations of basic functions to support libraries like Data.List, newer types like ByteString, and general type classes like Foldable and Traversable.

If you search Hoogle for a common function name like map, you'll find dozens of versions of it, and "import qualified" is commonly used to deal with that. (You can also specify a module abbreviation without the qualified keyword, but that just opens the door to conflicts again.)

In any case, in general, the module systems of all the languages I mentioned support such renaming in some form - something like that is essential to deal with conflicts practically.

Addendum, added in an edit:

This general approach to modules is a simple one that seems to be a sweet spot in terms of practicality. There's also an argument that if you want more sophisticated module system features, you can provide them orthogonally, which avoids complicating the namespace management features of the module system with higher-level semantic constructs.

E.g., in Racket, the basic module system follows the simple model I've described (again, I'm ignoring macros for this discussion.) Higher-order modules are provided as a separate feature on top of the basic module system. I believe one of Matthew Flatt's papers makes the case for this approach.

Haskell does something along similar lines, by providing type classes as a feature orthogonal to modules, instead of providing higher-order modules as in ML. Oleg and Shan cover the equivalences between these approaches.

Usefulness of modules?

Starting with a language like C, I can see one clear use for adding "modules": avoiding linker errors. Couldn't even "import" be skipped altogether if we just always used fully qualified names throughout a program's compilation units for any "exported" (really just "not-hidden-from-linker") names?

Otherwise, it seems all this importing and re-exporting of imports and prefixing and import-time renaming is a lot of meaningless convenience-driven cruft just meant to cut down on the light typing burden of software development; IMHO, a relatively trivial burden given the overall software development enterprise.

Of course, I'm not addressing ML modules and PLT Units and the like that actually do ADD some important capabilities to their respective languages - abstract data types, recursive module dependency resolution, a kind of module level dependency injection or whatever.

S.

p.s. While I'm at it, the hierarchical package/namespace/module organization seems like some kind of empty reflex as well at the moment. Ever notice that so many software language constructs (hell, tons of our more abstract cultural artifacts) just magically offer some kind of... surprise, TREE-LIKE decomposition? As if the tree were the magical solution to nearly all problems; as if we all had those old plastic flow chart templates but with only a box and a uni-directional arrow on them :-)

I have this rough (perhaps incoherent, therefore) idea that a "module system" might someday really be a "decomposition system." We'd have some family of ways to decompose systems - old fashioned top down "functional" decomposition, layer-based decomposition, common-substrate decomposition (like stdlib routines used anywhere in a program), chain-of-filters decomposition, and so on.

All code belongs to one or another decomposition unit with a particular internal structure and set of rules for composition and communication with other decomposition units - who-calls, constructors vs. users of values, etc. The "communications" between units of decomposition would be strictly enforced at the language level. 'main' would be the only standalone compilation unit - EVERYTHING else MUST be in a formal library which always conforms to one or the other decomposition unit types. You couldn't compile a program, or even type code, without explicitly stating the program (or library) decomposition structures (and so on recursively to the parts).

Get the decomposition right, and you're good. Screw it up, and just a little "refactoring", or one or two more "imports", or exporting a few more symbols, or making a few variables globally scoped - all to give "needed" ad hoc, willy-nilly access to other parts of the system - will get you NOWHERE, by design.

Then, we can have language level tools to "get the system structure right," and not just make do with the piles of ridiculous drawings plus hand waving most folks use today. We can also be more confident, and much sooner, when we think we just "got it wrong" on the total project structure and not throw good money after bad.

Imagine - a "module" system (really, much more ambitious, I'll admit) that actually did something for someone besides just avoid linker errors and save typing!

p.p.s. In my experience, we

p.p.s. In my experience, in practice, with "modules" we conflate program decomposition with dividing tasks between multiple developers; at first blush, a pretty stupid idea IMHO. And over the last few decades, I've seen plenty of arbitrarily and poorly structured "modules" contrived for just this purpose.

Just think of the popularity of "unit" testing. What is the "unit"? Usually something like "Jim's task this week." A really, really crappy basis for module structure, but often the *primary* consideration in the delineation of a "module".

Or how many of us have seen someone make several "modules" out of a single compilation unit just because a file was "too long" (for what?) - even though none of the new modules will be reused or add a single important abstraction to the original "too long" solution (file of code). Again, can we imagine any more stupid criteria for defining "modular program structure" or "solution/problem decomposition"? (I read stats on this around 15 years ago that reported whacky consistency on the upper bound of most source file sizes - don't recall the details.)

Sheesh! We're losing before we even started, IMHO.

My "decomposition" system would have an anti-relationship with the division of labor on purpose - decomposition must drive program structure and ONLY getting the problem and solution structure right could lead to a reasonable (successful) decomposition of the program artifacts.

Division of labor and ongoing testing would all be concerns completely separate from the decomposition system. I like Jackson's Problem Frames ideas on software design, if that helps you understand where I'm coming from. He's a wonderful critic of "top down", "object oriented", and other *simplistic structural clichés* that he believes are often the basis for not just inadequate but WRONG decomposition strategies.

S.