Symbol visibility (public, private, protected, etc.)

I'd be curious to know what people's opinions are on the matter of symbol visibility, and whether there are any new and fresh ideas in this space.

Now, you might think that symbol visibility would be a rather dull and pedestrian topic for a language theorist - few research papers on language design even mention the topic. However, I think that this is actually a very interesting area of language design.

Limiting the visibility of symbols has immense practical value, especially for large code bases: the main benefit is that it simplifies reasoning about the code. If you know that a symbol can only be referenced by a limited subset of the code, it makes thinking about that symbol easier.

There are a number of languages which have, in pursuit of the goal of overall simplicity, removed visibility options - for example Go does not support "protected" visibility. I think this is exactly the wrong approach - reducing overall cognitive burden is better achieved by having a rich language for specifying access, which allows the programmer to narrowly tailor the subset of code where the symbol is visible.

Here's a real-world example of what I mean: One of the code bases I work on is a large Java library with many subpackages that are supposed to be internal to the library, but are in fact public because Java has no way to restrict visibility above the immediate package level. In fact, many of the classes have a comment at the top saying "treat as superpackage private", but there's no enforcement of this in the language.

This could easily be solved if Java had something equivalent to C++'s "friend" declaration: the subpackages could then all be made private, and each could declare the library as a whole to be a friend.

However, I wonder if there's something that's even better...

as a grunt in the trenches programmer

i mostly heartily concur. especially since i just did some stuff in go. i say mostly because usability is a double-edged banana. things can (a) be poorly done in the language spec or (b) even if done well then the end-users can go crazy and perhaps make horribly complicated relationships that just make the code harder to grok for the next person down the line.

Different interfaces for different clients

This is only an idea, I don't know if it is implemented anywhere. I would prefer to write only minimal visibility annotations within code, at most something like Node.js's exports magic variable. Instead, visibility would be controlled by separate interface files (akin to ML's module interfaces), and you could have several interfaces for different clients - your library would have the full interface to all modules, "subclassing" libraries would have a partial interface, and client programs would have only a very limited API. This would also allow types to be made abstract.

The only thing that is problematic with this approach is that it would be quite cumbersome for the programmer, or at least I have not yet thought of a way to make it obvious which interface applies to which part of code.

Something even better

Use modules & module signatures for information hiding.

Friends are important

Not all visibility relationships are expressible via hierarchical containment. Protected is especially useful in languages that support implementation specialization via inheritance.

This is expressible with

This is expressible with module signatures, provided you have something that can model inheritance with modules (e.g. mixin composition of modules). Suppose you have class A, with some private and some protected and some public methods. Then you define B to inherit from A. You can model this with a module A, to which you apply a signature which hides the private methods but not the public and protected methods in A. Then you do mixin composition to obtain B, then you apply a signature which hides the protected methods from B to obtain the public version of B. This is of course a bit more manual than with a protected keyword, but I'm not convinced that protected is such a common pattern that it deserves its own keyword if it can be expressed using just signatures.

what is the rule of thumb here?

So we can have a bunch of things that can be wired together to achieve a goal. Or we can implement another tool that is a succinct way of getting that goal. When do we know we should add that to the system? Vs. trying to not do that, to keep things "simple"? What are people's experiences here? I always like the sound of e.g. go-lang's parsimony, but then whenever I go to use something like that it drives me freaking nuts. "Just give me protected!" I rant and rave at the screen...

I don't think there is an

I don't think there is an easy rule of thumb. You would have to look at a number of case studies and see whether protected makes them clearer and easier to express. Then you decide whether that is worth the added complexity.

Information Hiding

Controlling access to information is useful for reasoning about programs. However, there are many means to achieve this, and I'm not at all fond of the 'symbol' based visibility model.

Symbol based visibility is problematic for several reasons. It greatly increases coupling within the 'class' or whatever unit of visibility is specified. This can hinder reuse, refactoring, and decomposition of said unit into something finer grained. Using symbols for binding can also hinder composition and inductive reasoning: gluing subprograms together (even methods) based on hidden symbols is very ad-hoc and non-uniform. In some cases, it even hinders testing of subprograms. Further, symbol-based visibility is generally second-class, inflexible, and difficult to extend or to precisely graduate or attenuate.

My alternative of preference is the object capability model (cf. [1][2]). Attenuation can be modeled by transparently wrapping one object within another. But I also see value in typeful approaches. Linear types make it easy to reason about fan-in, fan-out, exclusivity. Modal types can make it feasible to reason about when and where access occurs, locality. Quantified (∀,∃) types with polymorphism can make it easy to reason about what kind of access a given subprogram is given.

Symbol visibility is simplistic and easy to compute compared to some of these other mechanisms, so performance is one area where it has an edge. More sophisticated designs, such as the object capability model, would rely on deep partial evaluation and specialization for equivalent performance.

Don't agree

Symbol based visibility is problematic for several reasons.

To the extent that I understand the problems you're complaining about, I think they're solvable without getting rid of symbols or even the use of symbol visibility as the mechanism for information hiding. In particular, it sounds like many of your complaints would be solved with a way to bind to sub-theories rather than pulling in the entire theory of a set of symbols.

binding without symbols

If binding is orthogonal to use of symbols, that does address one complaint (re: "gluing subprograms together (even methods) based on hidden symbols is very ad-hoc"). But I don't see how it helps with the others.

Could you give some examples

Could you give some examples of the problems you mention?

aggregate complexity

The problems I'm noting are mostly shallow in nature. Refactoring is hindered for obvious reasons: code that calls a method private to a specific code unit cannot (directly) be refactored into a separate code unit. Testing is hindered for obvious reasons: ensuring a 'private' method is visible to the testing framework. Flexibility is reduced for obvious reasons: our visibility policies aren't themselves programmable.

The remaining point, regarding composition, is simply an issue of symbol-based binding in general (be it public or private) - especially if the binding is recursive.

Anyhow, while the source problems are shallow, they do add up. And that aggregation becomes the real problem. Isolated examples don't much help with exposing this class of problem. Best I can recommend is that you read about some of the alternatives I mentioned (such as object capability model, linear types) and the motivations behind them.

These problems are of

These problems are of public/private/protected in particular. Modules like in OCaml don't have these problems because you have an explicit construct to selectively hide components: signature ascription (which is similar to casting an object to an interface).

For example if you have a module Foo that you give a public signature Bar with PublicFoo = Foo : Bar, then you can still test the hidden functions in Foo by simply testing on Foo instead of PublicFoo.
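To make the shape of that concrete for readers without an ML background, here is a loose dynamic analogue in Python. Python has no signature ascription, so this is only a sketch: the `ascribe` helper and the `Foo`, `Bar` and `PublicFoo` names are hypothetical, and the hiding comes from which object you hand to clients, not from a type checker.

```python
from types import SimpleNamespace


def _square(x):
    return x * x


def area(side):
    return _square(side)


# Foo plays the role of the full module: every component is reachable.
Foo = SimpleNamespace(square=_square, area=area)


def ascribe(module, signature):
    """Return a view of `module` exposing only the names listed in `signature`."""
    return SimpleNamespace(**{name: getattr(module, name) for name in signature})


# Bar plays the role of the public signature: the names clients may use.
Bar = ("area",)
PublicFoo = ascribe(Foo, Bar)

# Clients are handed PublicFoo; tests can still exercise Foo.square directly.
```

The point carried over from the OCaml version is that nothing in `Foo` itself is marked private; the restriction is a separate step applied on top of it.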

export control, too

The same problems apply to, for example, Haskell's approach to hiding information within modules (based on export control - a form of symbol visibility management). It isn't just OO class public/private/etc..

I think refining and/or adapting interfaces is useful for the flexibility and testing concerns. It does not seem sufficient to address the other two concerns.

In Javascript,

I tend to use a combination of the object capability model and multiple specifically tailored 'interfaces' per object. As Javascript doesn't have classes (only objects), an interface takes the form of a wrapper object.

The entity that creates an object can then distribute these interfaces on a need to know/do basis. In Javascript this scheme is particularly useful, as it protects (to an extent) against cross-site scripting attacks.
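A minimal sketch of this wrapper-per-client scheme, written in Python rather than JavaScript. The `make_account` function and the three facet names are made up for illustration. Note that Python closures can be introspected, so unlike a properly confined JavaScript environment this shows the structure of the idea and is not a security boundary.

```python
from types import SimpleNamespace


def make_account(balance):
    """Create an account and return separate facets over the same hidden state."""
    state = {"balance": balance}

    def get_balance():
        return state["balance"]

    def deposit(amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        state["balance"] += amount

    def withdraw(amount):
        if amount > state["balance"]:
            raise ValueError("insufficient funds")
        state["balance"] -= amount

    # Each facet is a wrapper exposing only what one kind of client needs.
    auditor = SimpleNamespace(get_balance=get_balance)
    depositor = SimpleNamespace(get_balance=get_balance, deposit=deposit)
    owner = SimpleNamespace(get_balance=get_balance, deposit=deposit, withdraw=withdraw)
    return auditor, depositor, owner
```

The creator keeps `owner` and hands `auditor` or `depositor` to other code, which is the need to know/do distribution described above.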

Being JavaScript, this approach is of course not statically typed. But it could be.

Submodules in Racket

Matthew Flatt has a great paper on explicit phase separation in Racket submodules.

Visible vs Published

I don't think that a variety of levels, such as private, protected, package, public, etc. justifies the complexity cost. I've worked on quite a few real-world systems where goofy things were done to work around accessibility restrictions. Sometimes, to get the job done, you need to violate encapsulation or otherwise take dependencies on implementation details. I value making this explicit, but do not value making it difficult. For example, the use of a leading underscore in Python is preferable over explicit reflection in Java.

There are also different schools of thought on what the default visibility setting should be. Should you be required to export the public interface? Or hide auxiliary types & functions? I prefer the former, so long as the private bits are still accessible. With the CommonJS module system for example, you can't get at internals even if you're willing to consciously violate public contracts.

Although Clojure offers a {:private true} metadata, it's pretty common to see all functions defined as public and the promised/versioned/reliable functionality to be manually copied from an implementation namespace to a separate publicly documented namespace. See clojure.core.async for an example of that. As a consumer of such a library, I like this pattern, but wish it was a little less verbose to accomplish myself. There are also some unaddressed issues with aliasing mutable values, such as thread-local vars.

knock on effects

So after the code was mangled to get the product out the door, and then the 3rd party vendor changed the library and thus broke the mangling, what happened / what happens? I think the root cause of these problems is not public/protected/private, I think it is e.g. not using open source code :-) I say that half really seriously.

Availability of source is irrelevant here

the root cause of these problems is [...] not using open source code

Just because you can look at or change the source, doesn't mean you can deploy the change.

For example, you may know for a fact that you're deploying your code onto a Linux machine running a particular version of the JVM. You don't expect to change operating systems and you're unable to change JVM versions without affecting any other services running on the machines you're deploying to. The Java base classes may not expose a public mechanism for facilities they cannot reliably provide for Windows. However, you know that you can safely rely on the fact that a particular private method exists for an underlying native Linux API.

Another example I've encountered: The source code I was programming against existed at runtime as a dynamically loaded library, stored in read-only memory, on widely deployed consumer electronics.

the 3rd party vendor changed the library and thus broke the mangling, what happened / what happens?

Depends. When you knowingly violate a library contract, you need to have a contingency plan. You can 1) not upgrade 2) do feature detection, employing a fallback 3) plan with the 3rd party on a transition plan. Or any of an infinite number of other things.
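Option 2 can be sketched in a few lines of Python. Everything here is hypothetical: `VendorV1`, `VendorV2` and `_fast_total` stand in for a third-party library whose private method disappears between versions.

```python
class VendorV1:
    """Hypothetical library version that happens to have a private fast path."""

    def _fast_total(self, values):
        return sum(values)


class VendorV2:
    """Hypothetical later version in which the private method was removed."""


def total(lib, values):
    # Feature detection: look for the private method instead of assuming it exists.
    fast = getattr(lib, "_fast_total", None)
    if callable(fast):
        return fast(values)
    # Fallback that does not depend on the library's internals at all.
    result = 0
    for value in values:
        result += value
    return result
```

The caller still depends on an internal detail, but the dependency is confined to one function and degrades to the slow path when the vendor changes the library.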

Hell, I've had to work around accessor protections on C# code written by me!

The world of deployed software is complex.

thanks

These are great examples, thanks. So
(1) if the whole software and hardware stack were done sanely such that we didn't have to do all this hackery, what would it look like?
(2) if we assume we can't have a sane stack all the way through (ahem) then at our top level where we consume and interface with those other things, what would that best solution look like? so that (2a) it doesn't make the same mistakes and (2b) it somehow wrangles the mistakes of others? E.g. I mean what if we had a principled approach to wrangling the hacks?
p.s. you're hired!

Who says what we have now isn't sane?

I mean, it's certainly not *ideal*, but it is very much sane... from the operational perspective of any one rational actor. The net result is that the organism of engineering teams and the software community may seem chaotic, but it's built from individually sane decisions, at least mostly.

You use the word "hackery", but I want to be clear: It's only hackery if you perceive it as such. Going back to my original post, I don't think it's hackery to call private methods denoted by leading underscores in a Python codebase. It is, however, hackery to have to use reflection to do similarly in Java. The distinction as I see it: In Python, I think "Oh, this is a private method. Is it safe to use it? Yes." then I go ahead and use it. In Java, I think the same thing, but have to take perfectly sensible code and mangle it into the exact same logic, but lifted into the domain of reflection. It's one thing to encourage (or even require/enforce) some sort of assertion of my intention to violate encapsulation. It's an entirely separate thing to force me to dramatically switch interfaces to my language runtime to accomplish a task that is mechanically identical between the two. And it's *totally unacceptable* to disallow such behavior completely.
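For anyone who hasn't seen the Python side of this comparison, a small sketch. The `Cache` class is invented for illustration. The single underscore is purely a convention, and the double underscore triggers name mangling, which makes access more explicit without forbidding it.

```python
class Cache:
    def __init__(self):
        self._entries = {}   # single underscore: private by convention only
        self.__hits = 0      # double underscore: stored as _Cache__hits

    def get(self, key):
        self.__hits += 1
        return self._entries.get(key)

    def _evict_all(self):
        self._entries.clear()


cache = Cache()
cache._entries["a"] = 1      # allowed; the underscore only signals intent
cache.get("a")
cache._evict_all()           # a "private" call uses the ordinary call syntax
hits = cache._Cache__hits    # the mangled name is still reachable
```

The call to `_evict_all` reads exactly like a public call, which is the contrast with Java being drawn above: the intent to cross the boundary is visible in the name, but the code does not change shape.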

(1) if the whole software and hardware stack were done sanely such that we didn't have to do all this hackery, what would it look like?

Pretty much exactly the same, just significantly less in total (non-hackery included)!

what if we had a principled approach to wrangling the hacks?

I may be pretty far from what you were asking now, but hopefully this gives you some insight into my perspective. I'll take your questions to be:

"How should we handle publishing, enforcing, and amending agreements between parties within software systems?"

I think that this is a very human question. In practice, this is most often solved with social (rather than technological) means. Honestly, I don't have any good technical answers. I'd just rather we discuss the problem holistically, rather than assume that there is a universal logical system that will solve the problem magically.

I've said before: How does a type system prove that your game is fun? It doesn't. Similarly, how does a function being marked as "public" guarantee that somebody upstream won't just delete the function and break the API? It doesn't!

it is very much sane... from

it is very much sane... from the operational perspective of any one rational actor [..] it's built from individually sane decisions

This strikes me as analogous to: "every voice in my head is sane, therefore I'm sane" ;)

Is making sausage math?

You found part of programming that is more like making sausage than math. A lot of attention used to be given to this sort of question at places like Apple in the late 80's and early 90's, where not breaking third party developers was a priority, and interface design was influenced by expected future need to make changes that don't instantly break apps depending on established api contracts.

It's hard to change what has been exposed, so good design often amounted to "don't expose it if you want to reserve option to change", and paid lip service to name and version schemes. But it's only easy to do linear versions, which isn't nearly granular enough for multiple complex interactions among devs who sinter together libraries with a graph of version inter-dependencies.

How should we handle publishing, enforcing, and amending agreements between parties within software systems?

Symbol names have something to do with publishing. Folks who love types dearly hope enforcement is done via types, even though this might require strong AI to represent complex interface contracts correctly in a verifiable way.

Amending agreements is a lot like mutation in concurrent code: don't do it when possible to avoid, because the chaos is costly to resolve. You should not change contracts without also changing names and/or types too, in a way that unambiguously notifies consumers of the change.

At a personal level, when updating old code, it's very dangerous to change the meaning of any old method, or a field in data structures. In the worst case, it might compile and build anyway, despite static type checking, and pass tests just well enough to let you ship before you find out what you did. Much safer is a scheme to add new methods and stop using old ones (if contracts permit this). Just treat code like immutable data in functional programming, and you'll usually be fine; let old deprecated code become unused and then garbage collected. But you may never know when devs consuming a library stop using old symbols.

In my daily work, if I ever change the meaning of a field, I also change its name so a build breaks if I miss a single place that needs fixing. Every single place a symbol is used must be examined to see if new behavior matches the expected contract at each point. The old name becomes a "todo: fix me" marker that must be studied everywhere it appears.

There isn't a nice answer. Making sausage is not for the squeamish.

exposed = visible or published ??

It's hard to change what has been exposed, so good design often amounted to "don't expose it if you want to reserve option to change"

This is why I've drawn a distinction between visible (or accessible) and published. "Exposed" is too vague.

Let's go with the Apple example. It is trivial to see or call all the private bits of an Objective-C API. However, you need to go *looking* for the private stuff. Either with tools (object browsers, decompilers, etc) or with source (not so lucky in this example). Apple enforces the ban on the use of private APIs with automated validation tools on their end. They can only do this now because they control the app store, the primary distribution channel. This wasn't always true, so they had to try to discourage the use of private functionality by not publishing its existence (primarily in documentation, including auto-complete) and by introducing some barrier to accidental use (not published in default headers).

Although distribution control enables enforcement, it isn't necessary for the approach to work. A warning could be raised during build, test, lint, or load time. That warning could be treated as an error at any of those times too, even if recognized earlier. For example, a network administrator may choose not to allow private API usage in deployed applications on company workstations for fear of future compatibility issues. However, you can bet your bottom dollar that administrator would want the ability to overrule such a restriction for a critical line of business application! Can always pay somebody to fix it; might even be worth it.
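As a sketch of how "visible but not published" could surface as a warning that policy can escalate or overrule, here is one way to do it with Python's standard `warnings` module. The `PrivateAPIWarning` category, the `unpublished` decorator, `internal_checksum` and `call_with_policy` are all made-up names, not part of any real library.

```python
import functools
import warnings


class PrivateAPIWarning(UserWarning):
    """Hypothetical warning category for calls into unpublished functionality."""


def unpublished(func):
    """Mark `func` as visible but not published: calling it emits a warning."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(f"{func.__name__} is not a published API",
                      PrivateAPIWarning, stacklevel=2)
        return func(*args, **kwargs)
    return wrapper


@unpublished
def internal_checksum(data):
    return sum(data) % 256


def call_with_policy(policy, data):
    """Run the call under a policy: 'error' forbids it, 'ignore' overrules the ban."""
    with warnings.catch_warnings():
        warnings.simplefilter(policy, PrivateAPIWarning)
        return internal_checksum(data)
```

With the "error" policy the call raises `PrivateAPIWarning`; with "ignore" it goes through silently. That is the administrator's ban and the override for the critical line of business application, expressed as configuration instead of as a language rule.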

locks only keep out the honest

(Read my use of "exposed" as meaning exposed in the public contract, not merely discoverable when you poke around behind the facade. Otherwise merely existing means being exposed and there's no difference between existing and exposed.)

I stipulate your points; we seem to agree. Finding entry points, then making up your own interfaces so you can call them will very often work — up until someone patches private code with supposedly no third-party consumers. For example, nothing stops you from re-declaring C++ classes with private fields made public, but responsibility for the contract violation is clear when this happens.

To the extent Apple polices use of private interfaces, they are doing a service to third-party developers who might otherwise simply get burned when rules are broken. A bad quality experience for users reflects well on no one, so Apple has incentive to stop devs from burning themselves. I find it slightly amusing Apple gets cast as the bad guy here. You can't let people insert themselves wherever they want.

(For example, you can't stop a burglar from picking locks and taking up residence in a living room easy chair. But that doesn't mean they get to stay when you come home. Finding and picking the lock doesn't grandfather a new contract they write themselves without your consent. It would make a funny comic strip panel though.)

I think it's good to have tools letting you express what you wish was (publicly) visible for various sorts of use, using both general rules and explicit case-by-case white-listing as seems convenient. Additionally, I think devs should think about each entry point and decide (then document) who is expected to call what and when.

Simplistic assumptions

When you knowingly violate a library contract, you need to have a contingency plan. You can 1) not upgrade 2) do feature detection, employing a fallback 3) plan with the 3rd party on a transition plan. Or any of an infinite number of other things.

Right. Except that in practice, that doesn't work. What happens in practice is this: you provide a component A, some other party provides a component B using your A, and stupidly, relying on implementation details it is not supposed to rely on. And then there are plenty of clients Ci who use A and B together.

Now you release a new version of A -- and it breaks all Ci. Now, those clients couldn't care less that it was actually B who should be blamed for all their problems. You'll get all the fire, and you'll have to deal with it. And more often than not, you'll be forced to back out of some changes, or provide stupid workarounds to keep irresponsible legacy code working. Or not change the system to begin with.

That is not a scalable model of software development. Modularity only exists where it can be properly enforced.

Energy, time, space

and stupidly, relying on implementation details it is not supposed to rely on

An engineer takes a look at a library, estimates the implementation and assumes a number of invariants, needs a performant algorithm thus designs it against those invariants, tests it, and it works. There is a good case that the provider of component A shares the blame since he could have known that the concreteness of software development forces his users to break the abstraction. (A good module where it may be assumed that you cannot get away with a pristine abstraction exposes its innards. In a good manner, I am not claiming spaghetti code is good.)

And then, somewhere, a too high-brow attitude anyway. "Look, a compiler is a functor! Now everything is neat and explained." Software development must, and will, be messy since reality will always kick in. If A breaks B then we'll fix it again.

Enforcement

Ideally, the prohibition against reliance on encapsulated details would be tied to publication. i.e. the system will not let component B be published because it relies on internal details of component A and the publisher of A has selected a policy of not allowing such dependence.

But it would be hard to enforce this technically unless everything was going through a marketplace for publication. I suppose it could be enforced legally (the license specifies no dependence on internal details). Even if you choose firm language-level rejection of encapsulation violations, the effectiveness of such measures depends on the distribution method. Are you distributing a library as a header file and binary blob or running a web service?

Legalities not appreciated

I know a man who told me he fired a programmer because "his code looked like a painting." True story.

Generalizing from that sample size of one, I doubt a marketplace for publication will be accepted.

Respectful and disrespectful imports

Not only do I think it's hard to enforce module encapsulation, I think the rewards from cracking open an encapsulated module are sometimes worth the cost of brittleness and voided warranties.

Ideally, the module system would make it easy to publish code that respects encapsulation boundaries, but would still make it possible to publish code that doesn't. Whoever installs a disrespectful module should have to manually resolve and maintain this situation quite a bit more than if they installed a respectful one, but only because that's essential complexity. If the module system hadn't made it possible, they'd still have spent that effort plus whatever effort they needed to work around the module system.

Three days ago, I probably wouldn't have said this. This thread's been food for thought.

I've also been thinking about legal annotations on code, some of which could talk about what's allowed at publication time, so our positions are very similar.

Depends on release management

Releasing a new version of A doesn't necessarily break anyone. Deploying a new version of A does that. If B is locked to A version 1 and you change the relied upon internals in A version 2, then the Ci clients will only all fail if you force the new A upon them.

Different language ecosystems perform at different levels of poorly in this regard. I'd like to see more progress in the versioning, release management, deployment, and other software ecosystem aspects. However, there are also some fundamental differences between releasing software libraries and deploying services.

The blame will fall on the last system to change anything. If Team A deploys v2 of Service A to a shared cluster, they will break all the Ci clients. However, if Team A' deploys v2 of some Code Artifact A to a build repository, then Team B will get the public blame, and rightfully so, if they upgrade to A v2 and then redeploy without any validation.

My point is basically this: Depending on internals is going to happen from time to time. How smoothly you can deal with the repercussions is what's most important to me.

Do what I say, not what I do

Stepping back a bit (which is unhelpful in a specific situation, but helpful when planning for, say, language design), the problem seems to be a shear between A's declared interface (what it's claimed to do) and practical interface (what it actually does, for the purposes of B). We judge B by testing it, which is practical, and therefore if there's a shear between the declared and practical interfaces, the actual form of B will favor the practical over the declared. Testing is always the preferred criterion (what's that line about 'I've only proven it correct, I haven't tested it'?), so it's... impractical... to demand that B depend only on A's declared interface unless it's possible to test B using the declared interface. Language design affects the shape of B's practical dependencies on A, the shape of A's declared interface, and how well or badly matched those are to each other. Hmm.

Testing?

I don't follow. Are you suggesting that testing needs more privileged access (to dependent components) than ordinary execution? If so, why? If not, how is testing relevant to the problem?

And how is Knuth's quote related? The problem of today's SE practice certainly isn't too much trust in proofs and too little testing -- it's too much trust in tests and too little reasoning. And FWIW, tractable reasoning is only enabled by proper (i.e., reliable) abstraction.

Leakage is not privilege

When an abstraction leaks, no privilege is needed.

The client ultimately ("end user") cares that the software does what they want it to, full stop. In a showdown, actual behavior trumps abstruse mathematical contracts. It follows, logically, that the winning move for contracts is to not be in conflict with behavior.

Hiding Implementation Details

I think there needs to be a mechanism for separating implementation from interface. I don't think hiding beyond that is necessary, although I don't agree with the OO approach of combining data-hiding with objects. I find the Ada way of having modules for data hiding and tagged types for object polymorphism to be a much better system.

Having said that, the object capability model is the way to go for an operating system. For real runtime data-hiding you need to manipulate the segment registers or page tables anyway.

I think the reasons for each are different and need to be kept separate. Interface/implementation data hiding is about enabling structure in large projects and allowing teams of people to work together effectively, and is a static source code thing. Capabilities are about security and need runtime enforcement to be secure, and are dynamic, as I should be able to remove a permission from a running program.

contracts

I like to give the hiding of implementation details a more concrete rubric: code with public contract.

Whatever is public is what is necessary for the consumer of the interface. Sometimes this does mean exposing a mechanism or implementation detail-- because it's necessary for proper use.

That which is enforced-private should be all those elements of the implementation irrelevant to the consumer.

Capability models then have a framework to sit in. In an operating system where there is a menagerie of consumers, a fine-grained and variable approach to interface consumption fits cleanly.

Modelling and Upstream

I try to avoid modifying the internals of libraries. I don't want to be tied to maintaining compatibility with future versions, so I stick strictly to the API and never use private APIs. I would rather re-implement the functionality in the application than use a private API, and I would adjust development times and prioritise features accordingly. As I prefer to use open-source libraries (even when developing on closed platforms like iOS), in the rare cases where library changes are required I have worked with the library developers to get the changes I need accepted into the library as an upstream patch, meaning I don't have to be responsible for future maintenance of that code.

I understand the temptation to use private APIs and break the abstractions, but in my hard-won experience it is always a bad idea.