public vs. published interfaces

Gilad Bracha is about to set in motion a JSR that may -- in a glacially unstoppable JCP fashion -- eventually address one of my pet peeves with Java: lack of distinction between public and published interfaces. These terms are due to Martin Fowler [PDF, 68K]:

One of the growing trends in software design is separating interface from implementation. The principle is about separating modules into public and private parts so that you can change the private part without coordinating with other modules. However, there is a further distinction -- the one between public and published interfaces.

... The two cases are quite different, yet there's nothing in the Java language to tell the difference -- a gap that's also present in a few other languages. Yet there's something to be said for the public-published distinction being more important than the more common public-private distinction.

Or, in the words of Erich Gamma:

A key challenge in framework development is how to preserve stability over time. The more miles a framework gets the better you understand how you should have built it in the first place. Therefore you would like to tweak and improve it. However, since your framework is heavily used you are highly constrained in what you can change. At this point it is crucial to have well defined APIs and to make it clear to the clients what is published API and what internal code is. For published APIs you should commit to stability and for internal code you have the freedom to change it.

To fully appreciate the kind of pain that this JSR is intended to ease, consider how developers deal with this problem today:

  • The Eclipse model, as described by Erich Gamma:

    A good example of how I like to see reuse at work is Eclipse. It's built of components we call plug-ins. A plug-in bundles your code and there is a separate manifest where you define which other plug-ins you extend and which points of extension your plug-in offers. Plug-ins provide reusable code following explicit conventions to separate API from internal code. The Eclipse component model is simple and consistent too. It has this kernel characteristic. Eclipse has a small kernel, and everything is done the same way via extension points.

    Some other projects have adopted similar conventions. For example, France Telecom is known to maintain the distinction between lib and api packages.

  • Unpublished javadoc.

    J2SE implementations consist of two parts:

    1. Classes and interfaces implementing the published J2SE APIs.
    2. Internal implementation artifacts that aren't meant to be exposed to users of the J2SE library.

    Sun generates Javadoc only for the "official" classes. Implementation artifacts are undocumented and are not supposed to be relied on.

Both of these approaches amount to the same thing: convention. Nothing stops you from using the non-published public interfaces. It will be interesting to see what will come out of Bracha's JSR.


Modules vs Packages

Doesn't this come down to Java not having a modular interface system? In Dylan (IIRC), and I presume other languages, you expose your interfaces separately and possibly independently depending on what you want exposed in a separate module file.

Does it come down to an inability to expose multiple interfaces dependent on context?

The Martin Fowler link is

The Martin Fowler link is wrong.

Re: Martin Fowler link

Fixed, thanks.

C# has internal

public, private and internal

Internal is accessible only to code within the same "assembly", which I guess is much like a module, with the proviso that one "assembly" can nominate its "friends", which then also have access to classes and methods declared "internal".

My Java is quite rusty

My Java is quite rusty and I don't know much about C#, but I think the default access in Java (when no access modifier is specified) is similar to "internal" (access granted to code from same package). However, there's no possibility to declare other packages as friends.
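To make the comparison concrete, here is a minimal sketch of Java's default (package-private) access. The class names `Ledger` and `LedgerClient` are hypothetical; both sit in the same package, which is the only thing granting `LedgerClient` access.

```java
// Hypothetical example: both top-level classes live in the same package.
// No access modifier means package-private: visible inside this package only.
class Ledger {
    // Package-private field: readable and writable by other classes in this package.
    int total;

    // Package-private method: callable from this package, invisible from any other.
    void add(int amount) {
        total += amount;
    }
}

class LedgerClient {
    // Same package, so both the class Ledger and its members are accessible here.
    static int post(int... amounts) {
        Ledger ledger = new Ledger();
        for (int a : amounts) {
            ledger.add(a);
        }
        return ledger.total;
    }
}
```

Moving `LedgerClient` into any other package would turn both references into compile errors, and, unlike C#'s friend assemblies, there is no way to grant a named second package an exception.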

Can one selectively declare which internal classes/methods "assembly friends" in C# have access to? If they get access to everything "internal" this may not be desirable. I understand the proposal as certain packages being able to access more of a package than "the general public" but still not having the same access rights to everything "internal" as the components from the original package itself.

.NET assemblies and friends

.NET assemblies are arbitrary collections of classes (more of a module than a package), and the classes they contain aren't restricted as to namespace (a namespace is orthogonal to an assembly). Assemblies are also "conceptual" entities, in that an assembly may comprise more than one physical file, allowing a portion of an assembly to be re-deployed without disturbing the remainder. This means that .NET assemblies can be fairly granular and self-contained -- "public" interfaces on an assembly pretty much are "published", while "internal" interfaces are "public-to-me-and-my-friends".

I'm not sure I agree with the idea that a module should simply expose two levels of "public" interfaces -- in the case of a componentised library, it makes a sort of sense, where you would want one "library-level" API for your users, and a "side-door" API to cooperate with other components. But in the general case, there may be more than two levels of access you wish to grant, and the problem becomes rather complicated.

Using the "friend" mechanism, .NET 2.0 allows multiple levels and vectors of access. A single assembly may be divided into several: a central "hub" assembly, containing core code, and multiple "gateway" assemblies containing cooperative code. The "hub" allows internal access to all the "gateways", and the gateways in turn allow internal access to their respective cooperative peers. Access is controlled at a fairly fine grain, and of course any of these assemblies can expose truly "public" interfaces as they see fit.

The biggest limitation here is that "friends" must be declared by name, in order to preserve security. That means that you must know ahead of time which other assemblies you're willing to trust, which may not be possible.

What's the Difference?

Could someone explain what the difference in this case is between a 'public' and 'published' interface? And what's the difference between a 'lib' and an 'api'?

A public API can be called

A public API can be called by other classes in the same application. A published API has been made available outside your codebase.

You can change a public API more easily than a published API. If you change a public API you have to change other code in your application. Refactoring tools can do this for you automatically. If you change a published API you have to coordinate the change with all the users of your API and you lose all your powerful tool support.

In this case, the 'api' package contains code that is published to third parties and the 'lib' package contains code that is used in the implementation of the 'api' package but that is not guaranteed to be stable between releases.
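One common way to soften the cost of changing a published Java API is to keep the old signature as a deprecated forwarder for a release or two, so existing callers still compile (with a deprecation warning) while they migrate. A minimal sketch; `Geometry`, `area` and `areaOf` are hypothetical names, not from any real library.

```java
// Hypothetical published class whose method is being renamed.
final class Geometry {
    private Geometry() {}

    /**
     * Old published entry point, kept so that existing clients still compile.
     * @deprecated use {@link #areaOf(double, double)} instead.
     */
    @Deprecated
    public static double area(double w, double h) {
        // Forward to the new method so behaviour stays identical.
        return areaOf(w, h);
    }

    /** New published entry point. */
    public static double areaOf(double width, double height) {
        return width * height;
    }
}
```

Callers of `Geometry.area` now get a compiler warning rather than an error, which is the grace period a published API needs; once its users have moved, the forwarder can be deleted. For a merely public method, you would just rename it and fix the callers in one refactoring.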

Tool support for published API migration

If you change a published API you have to coordinate the change with all the users of your API and you lose all your powerful tool support.

There are refactoring tools which allow you to package up automated migration scripts which can be run by external API users. They just aren't in common use yet, as they tend to require that all of your users use the same refactoring tools (and the tools themselves are fairly new). I wouldn't be surprised if we see a standardization effort in the next couple of years. There's too much value in allowing easy API migrations.
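At its simplest, such a migration script is a table of old-to-new names applied to client code. The sketch below is deliberately naive and uses a hypothetical `RenameMigration` class: whole-word regular-expression replacement on source text, applied rule by rule in insertion order. It will also rewrite matches inside comments and string literals; real migration tools work on syntax trees and resolved types, so they can tell an unrelated identifier with the same name apart.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy migration: rename published identifiers in client source text.
final class RenameMigration {
    // Rename rules, applied in insertion order: old identifier -> new identifier.
    private final Map<String, String> renames = new LinkedHashMap<>();

    RenameMigration rename(String oldName, String newName) {
        renames.put(oldName, newName);
        return this;
    }

    // Apply each rule to the source, matching whole words only (\b boundaries).
    String apply(String source) {
        String result = source;
        for (Map.Entry<String, String> rule : renames.entrySet()) {
            Pattern p = Pattern.compile("\\b" + Pattern.quote(rule.getKey()) + "\\b");
            result = p.matcher(result).replaceAll(Matcher.quoteReplacement(rule.getValue()));
        }
        return result;
    }
}
```

For example, a rule `rename("area", "areaOf")` rewrites `Geometry.area(w, h)` but leaves a variable called `areas` alone.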

Re: Tool support for published API migration

Sounds interesting. Do you know the names of any off the top of your head?

Pretty much any Java IDE...

has at least a simple implementation of it. IntelliJ IDEA has long had a limited "Migrate" utility that handles class moves and renames. Eclipse is adding a fuller utility that also handles method moves, renames, and signature changes. I believe JDeveloper is already shipping with something similar. NetBeans is now shipping with the Jackpot project, which provides a fairly full DSL for both migrations and for creating code audits/quickfixes.

That's true, but because

That's true, but because there is no standard for this it doesn't begin to address the problem. The distinction between a public and published interface is that you have no control over the users of your API, and so cannot force them to use any particular IDE.

Modula-3

An alternative to a fixed published/public/protected/internal/whatever scheme to be verified is the one used in Modula-3: an arbitrary number of "partial revelations" (I hope I remember correctly) of the interface provided by a module.

Can't you do that just by

Can't you do that just by creating a new module that imports the whole old module interface but only exports part of it?
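In Java, the closest everyday analogue is to hand clients a narrower interface than the implementation actually has. A minimal sketch with hypothetical names (`CounterView`, `Counter`). Unlike a Modula-3 revelation this is not airtight: a client that can name the concrete class can still downcast to it, which is why `Counter` here is kept package-private.

```java
// Hypothetical partial view: the only operation ordinary clients are promised.
interface CounterView {
    int value();
}

// Full implementation with an extra operation meant for trusted code only.
// Package-private, so code in other packages cannot name it or cast to it.
final class Counter implements CounterView {
    private int count;

    @Override
    public int value() {
        return count;
    }

    // Package-private: not part of the exported view.
    void increment() {
        count++;
    }

    // Hand out the narrow view rather than the concrete type.
    CounterView view() {
        return this;
    }
}
```

Code holding only a `CounterView` can read the count but has no `increment` to call.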

Not just by convention

At least in Java there are various means of preventing access to public but unpublished APIs. Gilad Bracha's upcoming JSR and the sister JSR 277 (module system) should make it easy and standardized but it is possible without them.

Eclipse uses the OSGi framework for this kind of thing, which "wires" plug-ins (OSGi "bundles") together using special ClassLoaders. These can ensure that only the right packages are being used in a given dependency.

NetBeans, in a somewhat similar fashion, permits package restrictions. By default no packages from a module JAR are available for use by other modules (i.e. cannot be linked against). As the author of a module you may enumerate certain packages to be "public" and usable by other modules. (Or you may list packages to export to only certain named "friend" modules.) As a back door, a module may request to use any package from another module - if it "signs a waiver" by declaring a dependency on the exact implementation version of the provider module, thus making the fragile nature of this dep explicit. All of this is enforced at compile time (through the Ant build infrastructure) and at runtime (through ClassLoaders).

I believe the J2SE partially enforces the set of published APIs by restricting access to internal packages via SecurityManager, but I don't know much about this.

There is a third basic technique for differentiating public from published classes/packages that I know about: validation. In the Java world, the "100% Java" testing tool is probably one of the earliest examples. More generally, Lattix lets you define hierarchical "rules" about which components in an app can access which other components (or external components such as the JRE or libraries), as part of your application modelling; rule violations can be browsed interactively in the GUI tool, or reported as warnings or errors during a build. There are probably many other examples in this area. Of course this style presumes that the user of the API cares enough about the public/published distinction to explicitly run such a tool.
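The validation approach boils down to checking every dependency edge against a set of rules. A minimal sketch in that spirit, with a hypothetical rule format (forbidden package prefixes plus exempt client prefixes) and edges supplied as class-name pairs; it is not how Lattix or any other tool represents its rules, and a real checker would extract the edges from bytecode or source. The nested `record` needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dependency-rule checker.
final class DependencyRules {
    // One dependency edge: class `from` refers to class `to`.
    record Edge(String from, String to) {}

    // Prefixes of packages that are public but unpublished, e.g. "com.example.lib.".
    private final List<String> forbiddenPrefixes = new ArrayList<>();
    // Prefixes of client packages allowed to use them anyway (the implementation itself).
    private final List<String> exemptPrefixes = new ArrayList<>();

    DependencyRules forbid(String prefix) { forbiddenPrefixes.add(prefix); return this; }
    DependencyRules exempt(String prefix) { exemptPrefixes.add(prefix); return this; }

    // Returns one message per edge that reaches into an unpublished package.
    List<String> check(List<Edge> edges) {
        List<String> violations = new ArrayList<>();
        for (Edge e : edges) {
            boolean exempt = exemptPrefixes.stream().anyMatch(e.from()::startsWith);
            boolean forbidden = forbiddenPrefixes.stream().anyMatch(e.to()::startsWith);
            if (forbidden && !exempt) {
                violations.add(e.from() + " uses unpublished " + e.to());
            }
        }
        return violations;
    }
}
```

Configured with `forbid("com.example.lib.")` and `exempt("com.example.")`, an outside client calling `com.example.api.Service` passes, while the same client reaching into `com.example.lib.Impl` is reported.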

With regards to upgrading clients of published APIs after an incompatible change (or deprecation): this is indeed a relatively young area for tools, especially among open-source choices. I am following the Jackpot project which may be successful in providing this kind of functionality for Java apps. It uses javac's native syntax tree and semantic model to represent a body of code - currently undergoing standardization. Jackpot can then run queries or transformations on the model, which can be written in a simple DSL for the common cases or in Java for more sophisticated cases.

Re: Not just by convention

jglick wrote:

... Bracha's upcoming JSR and the sister JSR 277 (module system) should make it easy and standardized...
Dalibor's take on this is entertaining as usual:

Last time someone tried to get people excited around one of those ueber-exciting JSRs that will totally reshape the future of Java (deployment), was JSR 277. That's the one where OSGi meets Maven, and they have a love child that is like a CPAN for Java, only with small JARs. Sorta. Kinda. It's hard to tell since the JSR 277 has not produced anything since its inception last summer, besides enthusiastic exclamations of support when it was announced.

The man has a point. JSR 277

The man has a point. JSR 277 is one of the most important JSRs in quite a while and they haven't made any effort to communicate with the users.

Standardization is very important at this level

Gilad Bracha's upcoming JSR and the sister JSR 277 (module system) should make it easy and standardized but it is possible without them.

True, but standardization would provide many benefits, particularly in the area of tooling. Right now, just about every Java IDE and build system has some conception of project structure, modularization, dependency rules, and (sometimes) versioning. Unfortunately, all of these conceptions are independent, extra-linguistic, and painful to map between. Putting these concepts into Java would make it simple to move between tools. It would also allow the tools to become more powerful, including analysis and critique of modularization, and automatic refactoring to improve modularization.

Two thoughts

First is that package-protected is the likely mechanism to allow this behavior if you really need it. If it's your code, then it's in your package, and if you didn't make it public then nobody outside your package can call it.

My second thought is that this is yet another effort to distrust the programmer and will result in a bunch of corner cases that don't quite work - much like Java's crummy type system doesn't quite work either. Fact is, you don't need the compiler to enforce this stuff really - just put in a comment/naming convention to make your intent known and then break stuff when people don't follow the rules. They'll learn.

I can't count the number of times I've discovered some method made protected that probably should have been declared public. Compiler enforced access control is a waste of resources.

The purpose of the extension

The purpose of the extension is to avoid having to pack everything that "needs to interoperate" on a level below public access into a single, huge package.

I can't count the number of times I've discovered some method made protected that probably should have been declared public.

Depends on who wrote the code. I usually think before I make something public/protected/private. When people change my modifiers to access a protected method because they "need it", they're usually just taking the wrong approach.

I see no arguments against compiler-checked access modifiers other than that it makes it a bit more difficult to write some messy code in a hurry that "I'll clean up one day". It's not the compiler's fault if programmers make the wrong choices.

And I don't see it as a kind of "distrust". I see it as a way of being able to make certain guarantees about how the objects behave and enforce the correct way to use them.

and then break stuff when people don't follow the rules. They'll learn.

If I rewrite my internal interface, and somebody broke the rules and uses it somewhere else extensively, then it's me who has to clean up his mess so I get the program to compile to test my new code.

If somebody messes with my private state at some point without my knowledge which causes something to crash later in my code then it's again me who'll have to hunt down the source of the error.

I can't count the number of

I can't count the number of times I've discovered some method made protected that probably should have been declared public.

Me, too. Students seem to be taught that if they don't know if a method should be public, then make it private (i.e., be defensive). This seems like a good idea, but when the protection level is enforced by the compiler it can be a real PITA. The problem is that violating these mechanisms causes compiler errors rather than warnings. If I could reuse a whole library except for one method that is private, then I would rather see a warning and take my chances than have to write/maintain my own version of the library. You could always add another "enforce" modifier for cases where you really want a hard guarantee that a private method is private (i.e., where you are using the mechanism to enforce security).

If somebody messes with my private state at some point without my knowledge which causes something to crash later in my code then it's again me who'll have to hunt down the source of the error.

If you are talking about code reuse within a team, then yes, this can happen. But a warning on compile would show the location of the violation just as effectively as an error, wouldn't it? If however, you were talking about reuse externally — publishing a library for anyone to reuse — then you don't even have to know if someone violates your contract. They might get an error, and they might even (incorrectly) blame it on your code, but you don't have to know nor care about it.

Onus is on the developer using the code

If I rewrite my internal interface, and somebody broke the rules and uses it somewhere else extensively, then it's me who has to clean up his mess so I get the program to compile to test my new code.

Sounds like you have a dependency problem in your workflow.

I don't know where you work, but in my experience the one who writes to your interface is likely the one who owns the program, and if he wants to ship with your updates, it's on him to fix the program. All you have to honor is your interface.

Stepping into the other guy's shoes - if I need to ship something and the only way I can make your code work for me is to break encapsulation - then I'll do it happily and deal with the consequences. After all, if I have to bust your interface to accomplish my task, maybe your interface wasn't adequate.

In my experience, these enforced limits just make existing code less and less useful for people who want to do new things you didn't anticipate.

My opinion was based on

My opinion was based on what I'm working on, (lower-level) APIs in a huge program that are used by different programmers in other modules, but within the same app so the whole thing wouldn't start up until everything is fixed.

For developing stand-alone libraries, that's a somewhat different matter. Although if you sell your library for some $1000 and keep breaking your clients' code and just tell them "told you so" - they might go looking for a different vendor, even if it's their fault.

So this was with that kind of application development in mind. In other cases, it may not be essential. At home I'm writing Lisp. CLOS has no access modifiers and it's fine with me. I also enjoy dynamic typing. Just because I am more productive that way doesn't mean that you can live without access control/static typing in large applications with hundreds of MByte of source code.

I don't think it makes code less useful - unless wrong access is set which indeed is annoying. I've run into this as well. Had to copy & paste an entire source file from a 3rd party library because of a totally unnecessary "private" modifier. However, I still blame it on the developers, not the language ...

After all, if I have to bust your interface to accomplish my task, maybe your interface wasn't adequate.

Maybe the interface isn't adequate, maybe you're using the wrong library, ... whatever, it's not the language's fault.

Re: Two thoughts

tblanchard wrote:

Fact is, you don't need the compiler to enforce this stuff really - just put in a comment/naming convention to make your intent known and then break stuff when people don't follow the rules.

You sound like Erik Naggum:

if you live among thieves and bums who steal and rob you, by all means go for the compiler who smacks them in the face. if you live among nice people who you would _want_ to worry about a red-hot plate they can see inside a kitchen window or a broken window they need to look inside your house before reporting as a crime (serious bug), you wouldn't want the kind of armor-plating that you might want in the 'hood. that doesn't mean the _need_ for privacy is any different. it's just that in C++ and the like, you don't trust _anybody_, and in CLOS you basically trust everybody. the practical result is that thieves and bums use C++ and nice people use CLOS. :)

I'm almost convinced by his argument.

Be wary

These arguments by emotional analogies are usually misleading and unhelpful. See how they intend to play on your emotions by talking about "thieves and bums" vs "nice people"? Don't fall for it.

Personally, I like the simplicity of public vs private. I think their value is seriously undermined when they are just a "convention". I don't think that I'm a criminal or surrounded by criminals for having this opinion.

What starts out simple...

...quickly bogs down in complex scenarios. Nothing wrong with public and private until (a) someone really needs access to private definitions that the original designer could not foresee - problems with the Open-Closed principle; or (b) the software grows to a level where a private/public dichotomy is no longer sufficient for all the abstraction boundaries that are required (which is what this thread is about).

From my perspective, language designers should recognize that the visibility properties are "meta" information. As such, coming up with a way to manage meta-information holds the key to coming up with a long-term solution (as opposed to a band-aid). Sure it makes it simple to intertwine metadata with the implementation details, but it also presents problems of flexibility.

The problem with the visibility of private, protected, public, package (or whatever) is that it does not take into account that visibility of methods is very much a function of the layer of abstraction. What might be public at one level, should be private at another. And that point where the abstraction takes over is not always on a nice us-vs-them boundary.

I am wary

I have been treated like a criminal by other developers who locked me out of some useful functionality. In the end I pick their locks to make things work. My choice, of course, but I've got to eat too.
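In Java the usual lock pick is reflection: `Field.setAccessible(true)` suppresses the language access check on a private member. A minimal sketch with hypothetical `Vault` and `LockPick` classes. Whether it works depends on the runtime, since a security manager or, on newer JDKs, the module system can refuse the request.

```java
import java.lang.reflect.Field;

// Hypothetical class whose author decided `secret` is nobody else's business.
final class Vault {
    private String secret = "combination";
}

final class LockPick {
    // Reads a private field by name, bypassing the compile-time access check.
    static Object readPrivate(Object target, String fieldName) throws ReflectiveOperationException {
        Field f = target.getClass().getDeclaredField(fieldName);
        f.setAccessible(true); // may throw if a security policy or module boundary forbids it
        return f.get(target);
    }
}
```

The fragility is visible in the call itself: the field name is just a string, so the compiler can no longer tell you when the author of `Vault` renames or removes it.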

I have no trouble with the "idea" of public/private, but why must the compiler "enforce" it by refusing to build the program when I disagree with the author? It's only his opinion vs mine. Intent noted, now please stand aside while I ship this thing.

I think we wouldn't be having this discussion if it were possible to build a program with a flag like -dont-enforce-access-controls so the program would treat all violations of encapsulation as warnings. If you want to build strict - hey - god bless. If you want to build sloppy - hey - good luck.

This is why I like the Objective-C compiler. It's very polite and quite helpful with warnings, but it will still build your program if it can. You ignore warnings at your own risk.

Software workaround for a project management problem?

I have no trouble with the "idea" of public/private, but why must the compiler "enforce" it by refusing to build the program when I disagree with the author? It's only his opinion vs mine. Intent noted, now please stand aside while I ship this thing.

If your code has been overlooked as a customer of that feature, which is now getting in the way of shipping your application, the right solution is to go up the chain and have you added as a customer. Bypassing the compiler can have consequences like:

* The other guy can at some point rename/rewrite/delete his rightfully private method.

* The other guy could assume that a certain condition does not exist, looking (normally) only at his own code for calls to the private method. Your calling his private method may violate that condition, and now he has a bug to chase because of you.

Even without static checks, if the method has comments like "don't call me unless you're in the ABC group", then the right thing is not to call it, but to find a mediator/manager/architect/etc. to discuss it.

Zero, one, infinity rule

This seems to be about change control boundaries. You start at level zero with a language that has no inner boundaries. Then you say: we need one change control boundary. You add the public/private distinction.

That is OK for a while, then you start thinking: I wish we had two levels of change control. Private, public, published. How plausible is it that two will be enough? Surely this is the point to move to a more general structure.

More General Structures

For sure!

I believe the creators of Java have said repeatedly that they made it pretty darned simple on purpose, explicitly avoiding things which in the long run might be more powerful, but in the short term would prevent people from learning the language. (single/multiple inheritance, public/private/friend/package, etc.)

Would it be neat to have "a more general structure" for just about every distinction made in languages? How does one make that maintainable rather than have it just lead to spaghetti?

Rule is a partial order

I understand the "zero, one, infinity rule" as proposing a partial order on the set {0,1,2,∞}. 2 is worse than the others.

Maybe 0) a distinction is unnecessary, or maybe 1) a binary distinction is appropriate, or perhaps ∞) a more general structure is best. The zero, one, infinity rule is silent on this.

The underlying idea is that a three way split is rare. If there really are three options, (left, right, straight on?) then the rule misleads, it is much better to have left and right rather than making do with left and filling your code with left,left,left as an idiom for right :-)

The typical situation is that a single binary distinction proves unsatisfactory because a threshold is being applied to a continuous quantity and it turns out that a finer quantisation is required. Adding a second threshold is quite attractive, because it is a small increment in complexity. On the other hand, the same dynamic that is creating the need for finer quantisation is likely to still be active. A third threshold will be added eventually. Meanwhile, although the language considered in isolation is simple, minimising the quantisation error when you only have three levels to play with is an ongoing complication.

The underlying issue is how much money it will cost to propagate a change to a function. If it has been kept private, it should be cheap. If it has been widely published it will be expensive. Perhaps one needs an intermediate level, public, but one is thresholding a continuous quantity (cost in time and money) with a wide dynamic range. Using two thresholds to implement a 3-level quantisation looks like the kind of compromise that will wear badly.
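The thresholding argument can be made concrete. Suppose, purely as an illustration, that the cost of changing a definition is estimated in person-hours needed to propagate the change; two thresholds then quantise that number into three levels. The names and cut-off values below are hypothetical; the point is only how much of the range a fixed set of cut-offs lumps together.

```java
// Hypothetical three-level scale obtained from two thresholds.
enum Exposure { PRIVATE, PUBLIC, PUBLISHED }

final class ExposureQuantiser {
    private ExposureQuantiser() {}

    // Hypothetical thresholds, in estimated person-hours to propagate a change.
    static final double PUBLIC_THRESHOLD = 1.0;
    static final double PUBLISHED_THRESHOLD = 100.0;

    // Two thresholds map a continuous cost onto three discrete levels.
    static Exposure classify(double changeCostHours) {
        if (changeCostHours < PUBLIC_THRESHOLD) {
            return Exposure.PRIVATE;
        }
        if (changeCostHours < PUBLISHED_THRESHOLD) {
            return Exposure.PUBLIC;
        }
        return Exposure.PUBLISHED;
    }
}
```

Here `classify(99)` and `classify(101)` land in different levels, while `classify(101)` and `classify(100_000)` share one even though the last cost is roughly three orders of magnitude larger. That collapse at the top of a wide dynamic range is the quantisation error described above.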

I've always liked the idea

I've always liked the idea behind Eiffel's access controls, where you provide a list of types that have access to each attribute. An empty list would be equivalent to private, the current type equivalent to protected, and the ANY type equivalent to public. I haven't written anything in Eiffel, however, so I don't know how well this works in practice. I've also not heard of any other languages employing something similar... anyone out there who knows some or can speak about the utility of this feature?
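Java has no per-feature export list, but a similar effect can be approximated with a "passkey": a method demands an argument of a type that only the chosen client class can construct. A minimal sketch with hypothetical names (`Account`, `Auditor`). This is an idiom layered on Java's ordinary access rules, not a language feature, and it grants access to one class only, whereas an Eiffel export list also covers the listed class's descendants.

```java
import java.util.Objects;

// Hypothetical class with one selectively exported feature.
final class Account {
    private long balanceCents = 12_345;

    // Public, but callable only by code that can present an Auditor.Key.
    public long balanceFor(Auditor.Key key) {
        Objects.requireNonNull(key, "Auditor.Key required"); // rejects a null stand-in
        return balanceCents;
    }
}

final class Auditor {
    // The key's constructor is private, so only code inside Auditor can create one.
    public static final class Key {
        private Key() {}
    }

    private static final Key KEY = new Key();

    long audit(Account account) {
        return account.balanceFor(KEY);
    }
}
```

Any other class can see `balanceFor`, but without a way to obtain a `Key` it can only pass `null`, which is rejected at run time.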