Libraries suck

I just had this argument crystallize for me after a conversation with Manuel Simoni: http://akkartik.name/blog/libraries

I'm realizing I'm pretty far-out liberal on Steve Yegge's spectrum of programmers, so I'd love to hear reactions from people here -- who seem to be from all over the spectrum.


I'd say this is pretty much

I'd say this is pretty much what I have to say about object oriented programming... First it seems like you can code at a higher abstraction level, then you suddenly get hit with 120+ character identifier names claiming you have made some kind of type error with a parameter you have never heard of. Oh well, if you try the Boost library for C++, I guess you asked for it.

I've come to the conclusion that you shouldn't use object oriented programming, you should use object oriented thinking. Objects and classes will do fine when reasoning about stuff, but for actual implementation there are other constructs that are better at making things modular. Some languages actually call them "modules".

Don't twist this as an FP

Don't twist this as an FP vs. OOP argument. C++ sucks on its own level with non-modular type checking; you can totally have modular type checking in an OOP language.

As for the OP, abstractions invariably leak, but we just have to deal with that. There is definitely a cost to reusing a library, and the benefit you gain from reuse should outweigh that cost; if it doesn't, don't use the library.

You said it

This is basically it -- the cited problem is with type-checking C++ templates. Lacking "concepts" or a similar mechanism, template errors are identified at the deepest possible level rather than the shallowest possible level. Taking that and turning it into a criticism of "libraries" is definitely a bridge too far.

I don't disagree with anything you said, and yet..

Words like abstraction and reuse and cost/benefit have long led me astray.

Units of reuse aren't made in one big-bang moment. They evolve. You write code for one use case. Then maybe you find it works in another, and another. The organic process of reuse requires a lengthy gestation period. But our language and our libraries obscure this fact.

We imagine our job as programmers is to create abstractions. It is not. It is to understand the code surrounding us, that we interface with and use to create new things. Our job is to *understand* abstractions that have come before and proved themselves and maybe, occasionally, if we're very good and a little lucky, to create an abstraction that others will find useful. The conventional way we view libraries obscures this fact. It makes it seem like creating a unit of reuse is just a matter of running gcc/ar/ranlib. And so we're awash in libraries with prematurely ossified interfaces.

I still don't know what you

I still don't know what you are getting at; your thoughts are a bit too abstract. I think it might be related to Joel Spolsky's "Abstractions Leak" or the "Duct Tape Programmer" posts.

Yeah, I might not be saying anything well-posed or original

I'm mostly just working things out for myself, and looking for feedback on the process. What I have so far -- and what this post tried to summarize -- is that the world would be a better place if we all knew a little bit more about our dependencies.

I think this is consistent with Joel Spolsky's posts. Part of the reason jwz is such a good programmer (and also part of Feynman's prowess) is this willingness to bounce between the two sides of abstraction boundaries. To truly understand that all abstractions leak is to be willing to learn about their internals, and if everyone did this we might even end up building less leaky abstractions rather than shackling ourselves to their leakier initial drafts.

I didn't have any twist in

I didn't have any twist in mind, I merely saw two cases that seemed to share a common type of problem: a promise of abstraction that bogs you down in implementation details.

It might be C++ specific when it comes to OOP, though I'd say the problems are there for Python as well, except the error messages are a bit easier to decrypt. Apart from that I don't have experience with other OO languages.

Pragmatism is in order

This reminds me of the approach to early rigorous mathematics education where you're not allowed to use a theorem until you've proven it yourself. In areas where there are deep but fundamental theorems this can send you down some pretty long rabbit holes. With CS, the situation seems even worse -- isn't your OS just a library? If you can't use a modern filesystem without understanding how it's implemented, I foresee a low-productivity future for you.

I agree

If a part of your codebase never kicks up an issue, it doesn't matter whether you understand it or not :) I tried to allude to pragmatic considerations when I said start with lowest-maturity/highest-risk.

---

And oh, guilty as charged, I'm pretty low-productivity. Productivity is over-rated.

Fahrenheit 451

The way I read this is that we should not become so dependent on our libraries that when they fail we cannot fix them. Libraries can fail due to licensing, bugs, inflexibility, or poor hierarchy. Understanding the internals increases our own ability to use the library, as well as increases the life of the library, should a maintainer go away, or an owner choose to no longer distribute it, etc.

Yup

With hindsight, perhaps the names I've used are confusing. Libraries aren't bad if you learn how they work. But in today's world they mostly seem to discourage poking inside them. We're more concerned with some speculative future issue like, "If I make a change how will I merge it upstream? I'll be maintaining my hack forever." Don't worry about that for now, just go understand how it works.

Thanks for working with my incompetence and taking the trouble to figure out what I meant :)

Immature libraries do in fact suck.

What I have discovered, for what it's worth, is that standard libraries are mostly excellent and third-party libraries mostly suck.

The distinction is mostly about maturity of design and implementation.

A library's maturity grows, IMO, roughly with the order of magnitude of the number of projects it's been used in and debugged against. The standard libraries of the language are used in tens of thousands to hundreds of thousands of programming projects -- possibly millions. So you get library maturity of, say, four, five, or six. A third-party library, usually, has been used in hundreds or maybe thousands of projects -- maybe tens of thousands if it's something popular like Boost. So you have maturity of, say, two or three, possibly four.
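(Read literally, this heuristic is just the base-10 order of magnitude of the project count. A toy Python sketch of that reading, nothing more authoritative:)

    import math

    def maturity(projects):
        # Order of magnitude of the number of projects a library
        # has been used in and debugged against.
        return math.floor(math.log10(projects)) if projects > 0 else 0

    print(maturity(250_000))  # standard-library territory: 5
    print(maturity(3_000))    # typical third-party library: 3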

This is important, because level-one maturity is usually a first-cut design and often has categorical mistakes that make it impossible to do some broad categories of useful tasks. Level-one maturity usually also has unnecessarily complex interfaces, unnecessary or undocumented dependencies, portability or build problems, and lots and lots of bugs.

Usually around maturity level two or three, implementations whose designs are informed by the flaws and shortcomings of previous implementations become available, and these are better (simpler interfaces and no unnecessary or undocumented dependencies) but you'll be fighting with obsolete or misleading documentation, bugs in the new implementations, and sometimes significant incompatibilities between library versions. At level three or four, you'll still be looking at incompatibilities in different versions of the library, but by now they'll be minor, and the documentation will be catching up.

Once something has maturity level five or six, I expect it to be well designed, completely and usefully documented, consistently implemented, and don't expect to encounter any significant bugs in it. Mature libraries are a joy to work with - they save you tons of implementation effort and have already had (and fixed!) the mistakes you were going to make.

But most libraries, aside from the language runtime libraries specified by the standard, never reach that point.

Ray

Exactly

I'd just add that maturity = #users * time.

I almost feel we need a different name for immature libraries (which are now the vast majority of all libraries). That would allow users to set expectations about the level of churn in the interface, and free up library writers to correct earlier missteps. I think the Go language did this well, and lots of other projects do think about gradually firming up interfaces. But many projects also stay on one side or the other of that divide. My impression is that open-source C++/Java projects have historically cared about compatibility too soon, and the Rails ecosystem cares about compatibility too little. Six years in, fairly standard gems are still constantly in a state of flux.

Another side-effect of a new name for immature libraries: it would utterly invalidate the title of my article.

However..

This is less important and more subjective, but I think we also need more attention to deprecation and deletion from mature libraries. Mature libraries tend to be great from a stability or performance perspective, but bad APIs do creep in. Libc, the Java standard libraries, and Common Lisp all have lots of ugly warts that don't look like they'll ever be excised. (Is this just because they were prematurely ossified back in the day?)

My pet peeve: Lisp's multiple flavors of equality are still with us 50 years later, leading later languages like JavaScript astray. The superior approach IMO is to use structural equality by default, and offload the semantics of eq to an operator that returns a stable address for a value. In my toy language I call this operator addr. So I'd replace (eq a b) in Common Lisp with something like (equal (addr a) (addr b)).
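(Python happens to ship the pieces of this proposal already, which makes for a minimal sketch; id plays the role of addr here:)

    a = [1, 2, 3]
    b = [1, 2, 3]

    print(a == b)          # True: structural equality is the default
    print(id(a) == id(b))  # False: distinct objects

    def eq(x, y):
        # id returns a stable address for the value's lifetime,
        # so this recovers Lisp's eq (Python also spells it "x is y").
        return id(x) == id(y)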

Yeah, and?

So I'd replace (eq a b) in Common Lisp with something like (equal (addr a) (addr b)).

... and then people would just write

(defun eq (a b) (equal (addr a) (addr b)))

and mutter vaguely insulting questions about why you didn't provide it for them. And when enough of them had done so, people would say, "hey, look, there are dozens of definitions of this thing, it ought to be a standard item."

Not necessarily

a) That isn't what actually happened. In reality eq came first, in an era more miserly of compute cycles.

b) It *could* happen for some features, but you don't know that for sure. Short-circuiting the gradually-clarifying need for something is a recipe for bloat.

c) If it *did* happen, I'd argue the system was working as designed. That's kind of my whole point: that things should prove themselves before we freeze them.

Most programmers don't actually use eq that often. Python and Ruby and Perl still don't have it. No vaguely insulting muttering has resulted. (Is it really a big enough inconvenience to cause even muttered insult?) Most of the time eq is just an optimization, usually a premature one, and even otherwise one that can mostly be delegated to the implementation.

I think languages should provide opinionated defaults and empower users to change them if necessary. Wart lets you override the default equality comparison to simulate eq, or extend it to do so just for specific types or in specific scenarios.

Python and Ruby both have "eq".

Python's "eq" is the "is" operator. Ruby's "eq" is, confusingly, named "equal?". (I'm a fan of Baker's "egal" predicate, myself.)

Yes you're right

I was corrected elsewhere as well.

Reuse is more basic than programming

I don't care directly about abstraction, but I do care to act in ways that will keep on giving. The pursuit of reuse leads me to abstract away any non-portable details, until mathematical argument and computation are some of the only dependencies left. That's why I end up programming.

Understanding one's dependencies is useful, but so is black-box modularity, i.e. designing a subsystem so that it's reusable across a wide range of possible interacting subsystems. This kind of design comes in handy as the developer communication cost and reengineering cost rise. Our current systems may not always give us perfect black-box modularity, but we probably use them because their cost is low enough to compete with the costs of the alternatives.

Personally, I'm concerned about how the communication cost and reengineering cost will rise once I'm no longer around to maintain what I've made. Source code documentation would make reengineering another option here, but not if the "I'm no longer around" is actually "I'm focused on another project and I'd rather not take a detour to redesign the language I'm building it on." So black-box modularity is important to me even on a solitary developer basis. :)

(Disclaimer: Kartik and I have debated this topic for quite a while now.)

Agreed, though I don't follow your title :)

It's been a while since we debated, but if I recall correctly we aren't really disagreeing, just focusing on different parts of the state space. I'm totally fine with black box modularity for mature libraries as long as we pay attention to the process of gestation, and don't treat all libraries as equally mature.

I'd also prefer to leave a little escape hatch for even seemingly-mature libraries -- just in case we find a better API or something :) But that should be covered by the usual deprecation best practices.

The price of "free" libraries

This is a specification and quality control problem. In computing, we have become used to an environment where library vendors are not held responsible for non-compliance with their own specifications. Libraries suck because they are allowed to suck.

There are exceptions, and they are not cheap. See the International Mathematical and Statistical Libraries (IMSL). Each mathematical subroutine comes with a signed statement by a mathematician about the accuracy and limitations of the subroutine.

Many years ago, during the heyday of Ada, I ordered an Ada compiler, with a purchase order specifying a validated Ada compiler. The product showed up with a sheet of paper saying that the compiler did not pass validation and a validated version would follow. I was at an aerospace company at the time, and simply turned the package over to incoming quality control. The package was marked with a large red tag - REJECTED by Incoming Inspection - Does Not Conform To Specification. The vendor had to refund $40,000, was put on quality watch by purchasing, and a competing product was procured.

Aerospace still has that attitude. If A won't properly interoperate with B, you check the spec to see who's wrong. The non-compliant vendor gets a reject and has to fix their side. If the spec isn't specific enough to decide who's wrong, the spec is defective. This is why you can unbolt a Pratt and Whitney engine off a jetliner, bolt on a Rolls-Royce engine, and go fly.

That's a great story

Part of the problem is what I refer to, for want of a better symbol, as the tragedy of the commons. To some extent such tragedies are unavoidable in society. The trajectory of bureaucracies made such patterns hard to see -- you had to watch them closely over decades. Software is great because we can see dysfunction laid out over a period of months. Since the problems are easier to spot, I tend to be optimistic that we'll be able to do better with free software. You get a lot from free software in exchange for a certain amount of "doesn't validate completely against spec." I just think we don't squeeze out all the benefit we could because of these mental shackles, these imaginary constraints on the changes we can make.

A delightful snippet

Programmers manage abstraction boundaries, that's our stock in trade. Managing them requires bouncing around on both sides of them. If you restrict yourself to one side of an abstraction, you're limiting your growth as a programmer[1].

Don't manage labor; eliminate it.

You are not describing abstraction boundaries, you are describing technical contradictions, and programmers who find solutions to such contradictions are not merely programmers but society changers. Solving a technical contradiction requires innovation. Frequently, those who solve the puzzle don't waste time bouncing around from one contradiction to the opposing contradiction. Rather, they see something nobody has seen before, and they design and code it.

You may have put your finger on it

Well there's something to be said for looking at both sides of an abstraction during development (if that were never necessary then we'd never write any functions in the first place).

But I think that maybe what you're saying here is that, when a software problem is found, we shouldn't have to pierce multiple layers of abstraction to find it, but rather the bug should manifest as a specification error (and if bugs are otherwise possible, it's a failure of the specification that should have excluded such a possibility).

If that's what you're getting at, I agree completely. Programs should be correct by construction.

correct by construction?

If that were to become true then yes, my entire point might be moot. Can you point me to any papers on 'correct by construction'?

I'm not sure why you use the qualifier 'during development'. Do you mean during development of the abstraction? I'm pushing for 'always'. Or 'during any development I do'.

My post was deliberately agnostic of language or platform. But I suppose I don't know much about the life of say a Haskell programmer.

It's not necessarily a defect

So much as a bad process. But another saying besides my credo, "Don't manage labor; eliminate it", is also "One fact, one place, one time". Things work best if a specification is the single point of change.

Consider manually writing DAOs and DTOs, manually ferrying the data to domain objects, manually encoding visitors, manually doing change tracking, etc. There are a lot of points of failure in such a process.

Suppose an underlying field changes, or the order of serialization changes. How do you permanently avoid such issues? Such synchronicity issues in code can be handled through macro-expressivity or a custom DSL equivalent to a macro, and many additional safe ways, although one would ideally subsume all others.
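(A minimal Python sketch of that single point of change; the Person spec is invented for illustration, and a macro or DSL would do the same job more thoroughly:)

    # One fact, one place, one time: the field list is declared once.
    PERSON_FIELDS = ("name", "email", "age")

    class Person:
        def __init__(self, **kwargs):
            for f in PERSON_FIELDS:
                setattr(self, f, kwargs.get(f))

        # The DTO plumbing is derived from the spec, never hand-written,
        # so a schema change is a one-line edit.
        def to_dict(self):
            return {f: getattr(self, f) for f in PERSON_FIELDS}

        @classmethod
        def from_dict(cls, d):
            return cls(**{f: d.get(f) for f in PERSON_FIELDS})

    p = Person.from_dict({"name": "Ada", "email": "ada@example.com", "age": 36})
    assert p.to_dict()["age"] == 36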

It's a theory of organizations, which can be automated by computers acting on behalf of humans.

Fair enough

Don't repeat yourself, then?

I've definitely had some success, in the working world, formalizing hairy problems enough that the repetitive error-prone parts are mechanically derived from a formal spec, as you say.

Entanglement

Kartik's article misses the real problem, entanglement, which happens when choice of abstraction is tightly coupled to a particular implementation. This problem is common due to import by name (cf. modularity without a name and ban on imports). The same problems are experienced - e.g. with respect to failure modes, and monolithic dependencies - whether we import libraries by name or bind specific services by URLs.

The surface of any service involves many abstractions: authorities and responsibilities, roles and contracts, protocols and workflows, APIs and data schemas. The same can be said of the surface of a library, except that many PLs are poor at expressing the full range of service abstractions - they cannot effectively address instancing, sessions, concurrency, maintenance, auditing, authority, persistence, extension, resource management. A consequence of this weak expressiveness is that many libraries become monolithic frameworks that Greenspun a fair portion of a PL themselves and require an awful lot of boilerplate to integrate.

I've long advocated that PLs should better address decomposition of apps into services and composition of services into apps. If your PL does not effectively address services-as-libraries, then "libraries suck" is a property of your PL.

Once you accept service-based decomposition as part of a PL design, we can address important questions such as "good" models and modularizations of services - expressing and enforcing service-level abstractions; controlling binding and entanglement; addressing concerns of security, safety, extensibility, upgrade, concurrency, auditing, distribution, persistence, partial-failure, resilience, etc.

The real problem

... is not imports or names. Import polymorphism is a real problem, as Gilad discussed. But imports are fine just so long as you remember they are only for interaction with people (to aid with binding resolution). Imports are fuzzy. Imports should be dev time.

Yes, this is something I

Yes, this is something I learned when I did Jiazzi: the external configuration of imports in units/modules/components wasn't really that great a feature; all the use cases for configurable connections were either contrived, or I was abusing the system to do something else that wasn't really related to modules (open classes).

Specific use-cases can

Specific use-cases can reveal many problems. But they are a biased lens. Many systemic problems cannot be observed through use-cases, especially when those use-cases are influenced by or described in terms of integration with an existing system. Issues such as software reuse, reconfiguration, and securable composition are unlikely to be observed in use-cases for any particular project. And yet there is great value in addressing them.

Wonderful, but I'm

Wonderful, but I'm commenting on today's world. If you simply make module import from Java-like languages externally configurable, you aren't doing much because imports are almost always made with specific implementations in mind. If we are talking about alternatives to type parameters, then ya, there is a use case we can address with external linking. But the two concepts are very different.

Scalable interaction with people

Even if we use imports "only for interaction with people", that's enough to lead developers straight to dependency hell.

An individual has the time and energy to directly maintain a number of relationships that is relatively small compared to the population as a whole. In a group of N people there are on the order of N^2 potential communication relationships. That simple law is why we need hierarchies and bureaucracies. And it has a significant impact on the modular organization of software.

Relationships with people are subject to variations over time and space, shifting in response to markets, resources, and requirements. If the number of relationships is small, we could perhaps maintain all these relationships concretely, by hand at dev-time. But, in general, we must specify policies and roles and manage relationships and "interactions with people" - including imports - automatically.

If you don't handle it in the PL, you'll end up hacking it (horribly) with tools like autoconf, cpp, and cmake.

An alternative to hierarchies

David, if I understand correctly, you're saying that hierarchies arise to organize a certain kind of decision so that large numbers of people are in sync and can work together.

Another possible solution: the population converges on using a specific solution over time. It requires individuals to be more conservative in the dependencies they rely on, and to grow an immune system that can assess new dependencies and choose between competing ones that provide overlapping functionality. Individuals may also go out of sync with the herd for short or long periods.

I used darcs for a couple of years but now I'm on git. That required me to learn to migrate my repos, something that an organization might delegate to one specialized little team in the basement.

I glimpse an analogue to the CAP theorem here. The 'market solution' allows individuals to go out of sync for periods of time in hopes of increased flexibility, and the ability to manoeuvre more quickly in response to changes in the environment. Hierarchies try to keep everyone in sync all the time, 'protecting' individuals who haven't grown an immune system yet, in hopes that obviating the need for redundant immune system management will provide efficiencies at scale. Both seem plausible as responses to the quadratic explosion you describe. Which is better depends on how much manoeuvring the environment requires.

In science fiction terms, we're talking about Star Trek vs Cyberpunk worldviews.

Hierarchy is a local structure

I did not mean to suggest a preference for global hierarchy, or whatever it is you're imagining with the "Star Trek worldview".

I'm quite fond of system heterogeneity and data model independence. I see heterogeneity as natural and valuable for evolving systems, relationships, and markets. I make much effort to address and support integration of heterogeneous systems in PL designs. (And heterogeneity benefits greatly from effective automation of relationships and imports.)

Hierarchy is typically a local structure in a larger, heterogeneous system. E.g. a company may have a hierarchy, but there are many different companies and the relationships between them aren't really part of any hierarchy. In secure modular software systems, small hierarchies will tend to exist in the resource models and distribution of authority.

A point to consider: even to "assess new dependencies and choose between competing ones" can become an O(N^2) effort if the number of competing solutions is proportional to the number of people or companies. (The prevalent "Not Invented Here" philosophy makes this a not unreasonable assumption.) More importantly, often the choice of competing solutions is a property of a relationship, not well isolated to any group.

Sure

The Enterprise is just one starship among many :) (Ok, I think that analogy's outlived its usefulness.)

Empirically it doesn't seem to me that there's a quadratic explosion in competing choices. Do you see that differently? It also seems intuitively reasonable: there are just so many different problems. If everyone in the world were working on todo list apps or Erlang compilers I'm sure there'd be a lot more todo list apps or Erlang compilers. But fortunately the world seems to need both, and a zillion other things besides.

I'm trying hard not to characterize either side as a strawman. Let me know if my biases are betraying me. Our world is a lot more driven by hierarchies than people realize. My sense is that after favoring large organizations for a couple of hundred years, the pendulum is swinging back a bit toward many more, smaller organizations. That shift has implications for software that I'm trying to work out.

There is a roughly linear

There is a roughly linear increase in competing products with population. The quadratic explosion happens in the number of comparisons needed to choose the best one (re: the effort to "assess new dependencies and choose between competing ones"). Similarly, integration code tends to grow quadratically with heterogeneous choices.

That you don't see quadratic explosions directly is because there are already practices and mechanisms to mitigate them. Packaging of decisions (e.g. Ubuntu comes with Gedit; no need for each user to consider the many text editors). Compilers often use intermediate languages or representations: M front-end languages and N back-end optimizers need M+N translators through an IR rather than M×N direct pairings.

Competition is further mitigated by the general fact that, in any competition, there are usually a few clear leaders to choose between. The rest are mostly ignored, yet are still important for the health of the system (new leaders must come from somewhere).

While there aren't many Erlang compilers, that's mostly because (relative to more popular languages) there aren't many Erlang users. :)

Ok, that makes sense

(Trying to return to the original topic..)

Ubuntu and compilers are responsible for taking some set of possible configurations and testing them to make sure they work. I am assuming that such testing will always be necessary. At any point the integrator will choose between a handful of choices -- and support some small subset of them -- for every component of the system he's providing. This doesn't seem like a problem to me.

This is my core disagreement (or difference in emphasis), I think: PL techniques will never suffice for avoiding DLL hell. You'll always need to be mindful of your (transitive) dependencies. That's a social issue, an education issue.

Social Programming Experience

PL techniques will never suffice for avoiding DLL hell. You'll always need to be mindful of your (transitive) dependencies. That's a social issue, an education issue.

Hmm. I don't make a strong distinction. Cf. my article social aspects of PL design. We're really in the business of programming experience design (PXD) - a term coined by Sean McDirmid in a discussion with Jonathan Edwards in An IDE is not enough.

A wise PL designer will consider various social acts that contribute to development, dependencies, deployment, configuration, integration, maintenance, extension. Even if the vision is mostly of a solitary programmer, addressing rare social elements can improve the programming experience. A PL designer with a great interest in social experience or CSCW might pursue concepts like ubiquitous programming and micro-programming (cf. my article on ubiquitous programming with pen and paper).

Similar to social aspects, a PL designer can consider didactic aspects. Curl language, for example, was carefully designed to address issues such as a smooth learning curve and avoiding discontinuity spikes. My own efforts address those plus additional didactic concerns: compositional properties as a formal basis for intuition, a smooth progression from UI to PL and back so people gain useful programming intuitions without being aware of them, object capabilities as a means to formally graduate authority of a growing developer, support for revocation and resilience so people can perform exploratory programming and recover from mistakes. (I describe a few of these in a recent LtU comment regarding the motivations for RDP.) Bret Victor recently described Learnable Programming, which certainly benefits from various PL design elements.

I vaguely remember that I once thought as you seem to - that PL design is something quite distinct from various social, didactic, CSCW, and UX aspects. That view is encouraged by many programming languages. But it is not essential. Today, programmers experience a great canyon between PX and UX. But, like most canyons, we can find some places that are easy to bridge, and even find their ends if we walk far enough along their edges. If you seek a smooth live programming experience, I think that leads quite handily to regions of the design space where the distinction between PX and UX is only a fuzzy line on a political map.

I agree that developers will always need to concern themselves with dependencies, relationships, integration. The question is: how much will the PL help? When I consider possibilities such as stone soup programming I think the PLs can help a great deal in addressing various aspects - social and otherwise - of dependency management.

You've given me lots to digest

Many thanks!

My science fiction comment was based on your motivations for RDP :)

You've placed the blame in the wrong spot

Is there a need in a large scale system for automatic selection of components? Perhaps. I'm not really convinced of that, but I'm not arguing against it either.

What I am arguing here is that this shouldn't have anything to do with 'import'. Importing symbols should just be about namespace management. e.g. when you write 'import MP3Player', you should just be bringing in symbols defined in that namespace.

Look at your "import by need" idea from your link. It suffers from two additional problems to the ones you have listed:

- It resolves look-up after dev-time via a name. That's bad. Variables should be declared. Name look-up post dev-time is bad.

- You have some ad hoc assertion mechanism that specifies some properties the import is to have and you have to specify them every time you import that function.

Note that even C-style header files didn't have those problems. The main problem we had with C-style header files (ignoring the fact that it was a terrible pre-processor hack) is that they required each declared symbol to have a single global implementation. The solution, IMO, is to instead track such partially specified symbols as parameters and leave imports to namespace management.
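(A small Python sketch of this factorization; the names play and silent_decoder are hypothetical. Importing would only bring names into scope; the unimplemented dependency is tracked as an ordinary parameter, and "linking" is parameter substitution:)

    # The dependency is a parameter, not a name resolved behind our back.
    def play(path, decoder):
        # decoder: any callable turning a path into an iterable of frames.
        return list(decoder(path))

    # "Link time": substitute a concrete implementation.
    def silent_decoder(path):
        return [0, 0, 0]  # stand-in frames

    print(play("song.mp3", decoder=silent_decoder))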

Loading symbols or

Loading symbols or specifications I don't have a problem with, so long as it happens in a structured manner to control entanglement and coupling. My articles modules divided: interface and implement and user-defined syntax each describe structured use of names for loading symbols, specifications, or even full syntax.

I disagree with your "that's bad" assertions. Since you make no attempt to justify those assertions, I am unsure what assumptions you're making... but I doubt I share them. Look-up of behaviors after dev-time by shared symbols is the heart of OOP, row polymorphism, and other disciplines for modularity. More broadly, runtime lookups are quite useful for service brokering, hyperlinking, mime-types and app extensions, mobile code, etc.

That aside, nothing about import-by-need requires it happen "after dev-time". Much can be resolved statically. A language designer can address concerns for enforcing when certain dependencies are resolved, whether at dev-time or at deployment-time or even some later time. So long as we have clear stages, the timing of dependency resolution can be quite orthogonal to the mechanism for dependency resolution.

The problem of specifying on every import can itself be modularized. I did address that in the linked article. I agree that one might more conveniently address it with named interface modules, at a moderate cost to flexibility and uniformity.

Closer to agreement

Your "modules divided" article seems reasonable and along the lines of what I'm suggesting/doing. You still seem to be coupling import with interface, whereas I think it's simpler to allow people to import whatever they want (interface, implementation, new types, etc.). The important bit is that importing gets you exactly what's there statically at import time -- not also a promise to link to some other implementation. Linking is parameter substitution, and should be an orthogonal mechanism to namespace management.

I don't understand why you say that 'import by need' can be resolved at dev-time. In some cases I'm sure it can, but, in general, implementations need not even be dev-ed yet when you import an interface. What am I missing?

I didn't offer too much detail for why post-dev name resolution is bad other than to compare it to declaring variables. I still won't, because I think the advantages of static binding are well known and the analogy is pretty strong. You say that "shared symbols" are the heart of OOP, etc., but I was careful to refer to "names". The heart of OOP is dispatch on messages, but static binding of names to messages is consistent with OOP, commonly done, and is IMO a good thing. The same applies to row polymorphism or "other disciplines of modularity".

You're assuming we must

You're assuming we must support a "general case", but you seem to be forgetting that general cases don't happen unless we want them to. A language designer can ensure, for example, that validation tests on import are either pure or constrained in authority according to stage.

Re: You say that "shared symbols" are the heart of OOP, etc., but I was careful to refer to "names".

I noticed your use of names. But your careful use of names ("It resolves look-up after dev-time via a name. Name look-up post dev-time is bad.") was inappropriate when describing the problems with import-by-need, which uses shared symbols but does not use names. So, with that context, I assumed you have not seen or experienced much of a difference between the two (which is not unusual).

it's simpler to allow people

it's simpler to allow people to import whatever they want

If by simple you mean uniform, yes. But it's simplistic. It leads to complex entanglement issues. Consider your reaction to the assertion: "it's simpler to allow people to mutate whatever they want".

That said, by 'interface' I'm envisioning something close to an ML module signature, perhaps with some behavioral contracts. Support for type descriptions or a few fully constrained definitions would not be unreasonable.

The important bit is that importing gets you exactly what's there statically at import time -- not also a promise to link to some other implementation. Linking is parameter substitution, and should be an orthogonal mechanism to namespace management.

No, no, no. That is exactly wrong.

The important bit is that importing DOES NOT tightly couple you to what is available at import time (whether that time is static or otherwise). This decoupling provides the wiggle room to avoid entanglement, enables orthogonal code upgrades, supports adaptation to different environments or resources, and enables users to leverage preferences and policies.

Namespaces are not essential for computation or scalable programming. At best, they're a convenience. Linking of modular components, whether via substitution or unification or some other means, is the essential and important bit.

After reading your other

After reading your other post, I don't think we're in nearly as much disagreement as this new post implies. The most important issue semantically is linking, which I agree can be more than just vanilla parameter substitution (unification or lightweight constraint integration are fine) even though I still think of it as just substitution.

But...

The important bit is that importing gets you exactly what's there statically at import time

No, no, no. That is exactly wrong.

The important bit is that importing DOES NOT tightly couple you to what is available at import time (whether that time is static or otherwise). [...]

The mechanism for avoiding tight coupling is linking. You specify that your component has a certain freedom (a generalized "parameter" that could be refined through unification / constraint solving) and then that "parameter" can be substituted/refined at a later link time.

Nothing in the preceding paragraph requires or has anything to do with imports! The only reason 'import' is related to any of this is that library producers will generally deliver an implementation and a specification and library clients will generally import the specification. And the specification should be imported verbatim.

And I see no reason to forbid importing any kind of value or type. Any such restriction would just be annoying and serve no clear purpose. It's not like mutability at all in that regard. I guess you could argue that by forcing programmers to import only abstractions you are nudging them in the right direction, but I'd hate such a rule. To me that's just as arbitrary and annoying as a rule that functions can only be 100 lines long. No thanks. Just get rid of the obstacles that would make me reluctant to use abstraction wherever appropriate and I won't need any prodding from a nanny language.

As a reason for why I will sometimes want to import an implementation rather than an interface, I believe that you should import enough specification that you can, in principle, prove the correctness of your local component. Finding a useful abstraction that specifies all of the properties a client will need is non-trivial. In cases where you're not going to bother abstracting (no parameterization), it's often better to just depend on the full implementation (at least until you have time to find a good abstraction).

Finally keep in mind the context: I'm responding to your claims that imports are bad. I'm not trying to prove that namespaces or packages must be present in a language. I'm only arguing for the existence of clean approaches to imports (orthogonal to linking) that don't have the entanglement issues you're worried about.

The phrase "correctness of

The phrase "correctness of your local component" is a contradiction in terms. Correctness is always contextual, never local. I.e. a function is correct with respect to a type or spec; a type is correct with respect to some larger context. (This is also why correctness is not compositional.) But you can prove partial correctness up to a partial specification and context.

It's not like mutability at all in that regard. [...] won't need any prodding from a nanny language.

Many of your complaints have been uttered before (with similar pathos) by people who don't see issues with pervasive mutability, who believe their personal discipline, experience, and foresight are reliable at scale. They don't like the idea of nanny languages, either.

You say you "see no reason", but I think that's an issue of perspective. Try a different one.

We prevent pervasive mutability for reuse, reasoning, and sanity of downstream developers. Controlling entanglement is similarly in support of downstream developers, who may wish to extricate part of a project or take most of a project code but with a few extensions or tweaks to fit a new context (and hence a slightly different notion of 'correctness'). To that downstream developer, your close-minded efforts to import implementation details is cause for much frustration.

Much of the time, you are a downstream developer. Designs that benefit downstream developers benefit you. Therefore, you have much reason to favor nanny languages that 'nudge' upstream developers, though you may not initially appreciate them when writing your own bits of code.

We have "don't pollute the river" laws because, no matter how much you believe you can depend on your own discipline, you can't depend on the discipline of every industry upstream.

I'm only arguing for the existence of clean approaches to imports (orthogonal to linking) that don't have the entanglement issues you're worried about.

You have yet to present an argument that the importing of implementation details and ad-hoc values and types won't have entanglement issues.

Correctness is always

Correctness is always contextual, never local. I.e. a function is correct with respect to a type or spec; a type is correct with respect to some larger context.

So? That you need details of the context was my point. Bind to the details you need. If you don't have an abstraction in mind, bind to the implementation if it's convenient. Yes, having implementation details scattered afar isn't good. So don't do that. IMO the ball is in your court to explain why namespace boundaries are a good place to enforce abstraction boundaries.

I'm going to abandon the discussion of which metaphor is most appropriate. That's getting a little too indirect. Except to say that I intend to prevent river pollution by filtering it out periodically at hydroelectric plants in my language.

So? That you need details of

So? That you need details of the context was my point. Bind to the details you need.

If you "bind to the details you need" and "you need details of the context", then the result is unreusable, inflexible code - i.e. code that is bound to a specific context, entangled with it.

IMO the ball is in your court to explain why namespace boundaries are a good place to enforce abstraction boundaries.

I did describe how I reached this conclusion in the linked article on modules divided so I did not see any reason to repeat it. I'll repeat the most immediately significant points for the lazy:

  • Interfaces must be very tightly constrained in their dependencies to control entanglement.
  • To support effective reuse of specification code, interfaces may derive from one another and support limited parameterization.

The articles also provide an objective, operating definition of entanglement, in terms of the number of modules that must be copied to use a particular subprogram in a new project (new context).

Interfaces can provide some simple values (primitives, records, etc.), but nowhere near "any kind of value or type", simply because they cannot arbitrarily import other interfaces. But I don't see how implementations can ad-hoc import implementations while still controlling entanglement! I don't consider namespaces a "good place to enforce abstraction boundaries". Rather, I consider them a bad place to bind implementation details or context, for reasons of entanglement.

If you "bind to the details

If you "bind to the details you need" and "you need details of the context", then the result is unreusable, inflexible code - i.e. code that is bound to a specific context, entangled with it.

True, but there's no general way to avoid that. Proving that you're using some value in a way that produces the desired result usually requires additional specification of that value over and above e.g. the HM type. It doesn't need to make the code any more brittle or entangled because the code can be usable even if the proofs are broken. But there definitely seems to be some pressure to postpone proofs to later when things are somewhat stabilized in order to avoid breaking proofs repeatedly as things evolve (similarly, there's pressure to delay optimizations).

Interfaces must be very tightly constrained in their dependencies to control entanglement.
To support effective reuse of specification code, interfaces may derive from one another and support limited parameterization.

I don't see anything about namespaces in that list. The reason it's difficult for me to follow the argument from your blog is that your imports do two things, and it's not clear to me which parts of the argument you think would still apply to the factorization I've been discussing.

[Interfaces] cannot arbitrarily import arbitrary other interfaces.

This doesn't make sense in my terminology. Interfaces don't import anything. Importing is namespace management. Do you have in mind that someone could write 'import windows.h' and suddenly everything they write has thousands of dependencies? That's not how my system works, for example.

Rather, I consider them a bad place to bind implementation details or context, for reasons of entanglement.

Namespaces are just for code organization. I should be able to organize my code any way I please. I don't want to be told that I can't split up my code along a boundary because that would violate some arbitrary rule. Organizational units are not abstraction units.

Construction by Proof

The module system I described leverages constraints to complete the links, i.e. construction by proof. This is a general way to address additional proofs (beyond HM types) without hurting flexibility or adaptability to contexts. (Actually, while enhancing flexibility and adaptability to contexts.)

I described interfaces in the article quite precisely. Why do you substitute your own terminology? Roughly, an interface is a module signature, ML style. It can serve as a namespace, albeit a closed namespace (as opposed to the cross-file C++ namespaces).

Your position that you should be able to organize code any way you please is contradicted if you have a goal to control entanglement. Entanglement is primarily an issue of organization, not abstraction, but I do not find it surprising that organizational constraints impact abstraction (cf. the earlier discussion of services). You've made your position and desires clear, but never justified them as wise.

This conversation is too

This conversation is too broken to fix easily. Maybe we can revisit this when I have something that I can share and point to and ask "where's the entanglement?"

Organizational units are not

Organizational units are not abstraction units.

This seems a rather extraordinary claim to make in passing.

I think if I made a list of organizational units, the vast majority would directly correspond to abstractions. Consider just a few: directory, text file, paragraph, service, module, DLL, record, function, role, group, account.

Might a more rational position be just the opposite? Organizational units are the heart of many useful abstractions. What is a cloud? an organization of water vapor. What is bicycling? an organization of man, device, and motion. What is a monad? a structure that organizes code to compose in a sequence. Even broad abstractions - space, time, water, grass - get measured, divided, sometimes named into useful structures and organizations.

Entanglement is an organizational issue. I do not find it surprising that constraints on organization (e.g. to control entanglement) can impact abstraction. I do find it surprising that you assume organization and abstraction to be separate issues. If you've some great revelation that will make your position as obviously correct as you assume it to be, I'd be happy to hear it.

What is bicycling?

What is bicycling? an organization of man, device, and motion.

This seems a rather extraordinary claim. Bicycling is a form of exercise and sport! Motion is an abstraction and can't be part of an organization! Women use bicycles, too!

If you've some great revelation that will make your position as obviously correct as you assume it to be, I'd be happy to hear it.

I'm not claiming that I'm going to shock and awe you into agreeing with me. I'm just hoping to provide enough context that we can communicate effectively.

I'm just hoping to provide

I'm just hoping to provide enough context that we can communicate effectively.

Then could you make a habit of justifying more of your assertions? Your arguments tend to consist of assertions I find dubious ("Name look-up post dev-time is bad.", "Importing symbols should just be about namespace management.", "I should be able to organize my code any way I please.", "Organizational units are not abstraction units.") presented without any explanation or justification. I suppose you believe them to be self-evident. When I ask for justification ("I am unsure what assumptions you're making", "Might a more rational position be just the opposite?") I'm simply rebuffed or ignored.

I don't know how you hope to provide any context except through communication. Perhaps you should blog some of your ideas or examples, so that you can link to them for extra context. It won't help for the lazy people who can't be bothered to follow links, of course, but they'd have no right to complain about lacking context.

Infrastructure support for contracts, capabilities, and unit tests

What's becoming clear to me is that some heavy infrastructure is needed to support all the compile-time, link-time, and runtime functionality you need in order to fully integrate interfaces, contracts, capabilities, and unit tests with modules and imports.

A "types-only interface specification" tells you the HM types of arguments and return values for a set of functions. A "contract" names a set of semantics those functions are supposed to have and a set of capabilities which they are not supposed to exceed. These two together are an "interface" as I'm thinking of it.

The contract is answered by a corresponding assertion in the unit that it conforms to a named contract. Next you have a "unit test" that verifies conformance to the contract for at least a few dozen cases, while profiling the proposed import to see about its performance and to see how much of its code gets exercised by the test cases.

Ideally the unit test reports 100% code coverage (meaning no unrelated functionality is hidden in the unit) and contract conformance. In practice 100% code coverage is hard to achieve when you don't compose the unit test while looking at the proposed implementation, so you will probably configure your system to prefer whichever potential import has the greatest code coverage (and speed) within the unit test, and may extend the unit test by generating "random" cases to test (at the expense of potentially long test runs) to try to get code coverage higher. Alternatively, in full paranoia mode, you may configure your system to "stub out" any code not exercised by the unit test with a jump to an error handler.
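(A minimal Python sketch of that selection step, with all names invented and real coverage measurement elided: the contract's unit tests run against each proposed import, and the best-conforming, fastest candidate wins:)

    import math, time

    # The contract's unit tests: checks any conforming import must pass.
    SQRT_CONTRACT = [
        lambda f: f(0.0) == 0.0,
        lambda f: abs(f(9.0) - 3.0) < 1e-9,
        lambda f: abs(f(2.0) ** 2 - 2.0) < 1e-9,
    ]

    def conformance(candidate):
        # Returns (checks passed, elapsed seconds) for ranking.
        start = time.perf_counter()
        passed = sum(1 for check in SQRT_CONTRACT if check(candidate))
        return passed, time.perf_counter() - start

    candidates = {"math.sqrt": math.sqrt, "pow-half": lambda x: x ** 0.5}
    scores = {name: conformance(f) for name, f in candidates.items()}

    # Prefer the import passing the most checks, breaking ties by speed.
    best = max(scores, key=lambda n: (scores[n][0], -scores[n][1]))
    print(best, scores[best])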

The unit tests go with the contract, and may require as much code or more code than the unit being imported! Their only virtue in terms of saving development effort is if the contracts themselves are standardized, and the unit tests may therefore be part of a standard library. In this case an implementation of the functionality could also have been part of the standard library, so, um, why not?? Because maybe the import integrates with something hairy or performs better? So, unit tests may include a known-valid but somehow undesirable implementation of the functionality itself to check the potential import against.

Each potential import may carry with it additional contracts (and unit tests) for subcontractor interfaces/libraries that *it* needs. The capability limitations you specified with the original contract that a unit must fulfil are transitive, and must be applied to these subcontractor units as well. If subcontractor libraries contain code not exercised by your unit tests of the library they're subcontracting for, you may need to auto-stub that code as well. This is because the unit tests for subcontractor units in untrusted code are themselves untrusted; you don't know that they're checking the right thing, so you can't accept their word that something conforms as testimony that the conforming thing is relevant or conformant to your own requirements.

Selecting the units that your build will use means selecting imports such that the transitive closure of all imports starting with those needed by your program is small/manageable/fast/has other desirable characteristics.

Eventually you identify a set of imports that will "complete" your program, and you link them all together. Now you have to compile in the capability management, valid-return assertions, and checks for any type information you couldn't prove statically, that you got from the program and from every subcontracting unit.

If you imported something in full-paranoia mode and inserted jumps to a handler rather than trusting un-exercised code, you have to add the handler, and the handler ought to record the set of arguments that led to testing the unchecked code, extending your unit test. If your unit test included a known-correct but undesirable implementation of the functionality, your handler may be able to jump to that implementation and retry, assuming it doesn't require state information that exists in a form stored by and only available to the imported code.

In all of this, you have to distinguish "trusted" code -- i.e. code that someone you have paid money to and have a contract with accepts specific liability for, or locally-developed code that doesn't merit fully paranoid checking because you know you didn't put anything malicious into it -- from "found" code -- i.e. code downloaded from somewhere, from someone you don't have any business relationship with, which requires full paranoia even at the expense of runtime checks.

Is this a reasonable description of the infrastructure you need to make this "interchangeable imports via contracts and capabilities" model a reality?

Ray

automatic mix of proofs and tests

I've been interested in techniques that might allow developers to express general contracts then use those to automatically generate a mix of tests and local proofs. This allows simple assertions to cover large numbers of tests, while allowing us to move forward with high confidence that might be less than 100%. I've seen systems that are good for proofs, and I've seen systems for automatic generation of tests, but I've not seen a system that provides smooth interpolation between the two.
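(For the test-generation half, a sketch of what exists today using Python's Hypothesis library; the sorting contract is invented for illustration, and the proof half has no comparably off-the-shelf tool:)

    from hypothesis import given, strategies as st

    # A general contract, stated once as a property...
    def sort_contract(sort_fn, xs):
        out = sort_fn(xs)
        assert out == sorted(out)   # output is ordered
        assert sort_fn(out) == out  # sorting is idempotent
        assert len(out) == len(xs)  # length is preserved

    # ...from which the tool generates many concrete test cases.
    @given(st.lists(st.integers()))
    def test_builtin_sort(xs):
        sort_contract(sorted, xs)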

I think 'auto-stubbing' to remove untested code is an awful idea. Even if code is untested or incorrect with respect to the domain model, it may still be protecting more global properties (type safety, duration coupling, etc.) that are easy to validate or enforce structurally but cannot be generically achieved via stub. Similarly, injecting runtime tests is dubious if the only thing you can do upon failing a test is violate abstractions a bit earlier.

Now you have to compile in the capability management

Why would you need to do this?

Of the capability models I've studied, object capabilities are the only ones worth learning and knowing. Object capabilities describe first-class, tight coupling of designation and authority. Developers must explicitly manage their object capabilities, and explicit management is a significant aspect of their value for secure interaction design - ensuring visibility, awareness, and that the path of least resistance is to grant least authority.

Using ocaps, you don't need to pass tests or type-checks to protect the important abstractions and services. You only need memory-safety, which is easy to ensure generically. Incorrect code is constrained in the damage it can cause by the object capability model. Developers can readily mix trusted and untrusted code.
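A rough sketch of the discipline (Python is not capability-safe, so treat this as an illustration of the pattern, not an enforcement mechanism): untrusted code receives exactly the capabilities it is handed and holds no names for anything else.

class AppendOnlyLog:
    """A narrow capability: grants appending, nothing else."""
    def __init__(self):
        self._lines = []
    def append(self, line):
        self._lines.append(line)

def untrusted_plugin(log):
    # The plugin can use what it was explicitly given...
    log.append("hello from plugin")
    # ...but was never handed the filesystem, the network, or any way
    # to read or rewrite the log's history.

log = AppendOnlyLog()
untrusted_plugin(log)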

Even better if your programming language also enforces real-time properties. (Batch processing tasks can always be modeled via incremental state manipulation with real-time increments.) If you have both ocaps and real-time properties, you're additionally protected against denial of service attacks and most timing-based covert channels.

You can have a healthy paranoia without elevating it to insanity.

Is this a reasonable description of the infrastructure you need to make this "interchangeable imports via contracts and capabilities" model a reality?

No. You can do a better job with simpler mechanisms.

Many of my comments that

Many of my comments that involve 'should' should be read as 'the way I think this should work' rather than as an assertion that I can prove these preferences are optimal. I do, of course, have reasons for my preferences.

Why do I think that unsupervised binding of names is bad? For one, it's a security problem. As I recall, Z-Bo's link has examples of what can go wrong. When I type a name, almost always, I'm attempting to reference a specific entity (even if that entity is a parameter). I want my IDE to make any ambiguity clear, allow me to resolve it, and make it easy for me to inspect what I've referenced. That requires it to happen at dev-time. For the same reasons that it's good for security, it's good for avoiding certain bugs.

Regarding 'organizational boundaries are not abstraction boundaries', I think it's useful to be able to import some but not all of the symbols defined in an abstraction, to rename symbols and give them local aliases, and even to split an abstraction between several namespaces (though that's probably rare). I'm not arguing that this is the one true way (though obviously I prefer it), but I am arguing that I don't see any entanglement issues. You claim you've demonstrated the entanglement issues on your blog, but then you summarize the argument in a way that doesn't mention name importing or namespaces at all.

The entanglement issue that I understand you to be describing has to do with failure to abstract properly, leading to dependencies of abstractions on things that should be implementation details. If names are semantically part of your abstractions then, yes, you can run into entanglement issues from failing to abstract from the names. But since I'm the one arguing that names and organization should not be semantically meaningful, I have a hard time understanding why you think my approach leads to entanglement.

The reason I am tempted to bail on this discussion is that it doesn't seem to be getting anywhere, and that I think it will be much easier for me to explain my position with a little IDE for my language and some documentation than by trying to figure out what set of concepts we're not using the same words for.

There are insecure

There are insecure approaches to binding of names. There are also secure approaches. It isn't clear to me that `unsupervised` is the relevant distinction. (If you depend heavily on supervision for computer security, you're doing it wrong.)

Use of names for 'specific entities' seems either rarer or less specific than you've been implying. For example, it is unlikely that you're using version numbers or hashes in your names. But I agree with a goal to understand and control ambiguity and resolution, both at dev-time and run-time.

Where you speak of current practice ("When I type a name, almost always [...]"), I note that you're constrained by current languages. It's difficult to appreciate other possibilities without experiencing them. I expect your habits would change if you had more options readily available.

The entanglement issue that I understand you to be describing has to do with failure to abstract properly

Entanglement is strictly an organizational issue. Entanglement may be objectively measured by the following question: "I want to take this syntactically bounded subprogram from project A and install it in project B. How much code do I need to haul to project B for this subprogram to be complete in its meaning?" Entanglement nearly always exists. The issue is controlling entanglement (e.g. with a big-O constraint or an asymptotic bound), not preventing it.
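That question can be made mechanical. A crude sketch, with a made-up dependency graph and line counts: take the transitive closure of the subprogram's references and total the code that must travel with it.

from collections import deque

deps = {                      # unit -> units it references (illustrative)
    "subprogram": ["util", "names"],
    "util": ["names"],
    "names": [],
}
loc = {"subprogram": 200, "util": 500, "names": 3000}

def haul_cost(root):
    """Lines of code that must move with `root` to keep it meaningful."""
    seen, queue = {root}, deque([root])
    while queue:
        for dep in deps[queue.popleft()]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sum(loc[unit] for unit in seen)

print(haul_cost("subprogram"))  # 3700: the entanglement cost of this unit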

Entanglement isn't about abstraction. However, organization has a significant impact on expression of abstraction. It is necessary to constrain organization to control entanglement. Those constraints WILL impact abstractions.

Your "importing symbols from namespaces" does not control entanglement in any way. At the very least, you'll end up entangling the code that lists symbols from each namespace.

It isn't clear to me that

It isn't clear to me that `unsupervised` is the relevant distinction. (If you depend heavily on supervision for computer security, you're doing it wrong.)

Well, it seems we agree on the problems and some of the goals. This subthread started with my criticism of your proposed technique of import by need. That criticism was based on a particular understanding of what you had in mind, that perhaps was not correct. If that's the case, then I don't mind retracting my criticism, but I still haven't understood the details that would lead me to believe that's the case.

It's difficult to appreciate other possibilities without experiencing them.

It depends, but it's not that difficult in my experience. It's difficult to appreciate them without understanding them, though.

It's difficult to appreciate other possibilities without experiencing them.

I don't have trouble understanding this. I understand that I'm speaking to my personal viewpoints, even if, for expedience, that isn't always clear in every sentence. You're the one who blogs about the non-existence of whole classes of solutions. When I speak up that I think you've missed a possibility in the design space, you insist that I consult your proof to the contrary.

For example, it is unlikely that you're using version numbers or hashes in your names.

Usually when I'm developing a module, the specific intention I have in mind with a symbol is "the most current version of this construct". Occasionally, I do intend "this specific version of this construct", and support for that is in the works.

Entanglement may be objectively measured by the following question: "I want to take this syntactically bounded subprogram from project A and install it in project B. How much code do I need to haul to project B for this subprogram to be complete in its meaning?"

In my system you will be able to export precisely what you need, regardless of how things are organized. And what you need is strictly about how well you've abstracted the dependencies. I have never claimed importing symbols from namespaces addressed entanglement - merely that it doesn't cause it.

Precise control of export is

Precise control of export is valuable. It can prevent accidental bindings. But I've yet to encounter or design a PL or architecture where it appreciably controls entanglement. Relevantly, programs become entangled due to lots of fine-grained, short-sighted, intentional bindings.

"the most current version of this construct".

Oh? And do you mean that globally? (Is there even a total ordering on versions?) Or is there also some implicit constraint on where you look for this version? And where do half-completed versions fit?

My intended approach is actually more precise and formal about the sort of linker magic traditional languages leave to ad-hoc tooling. My motivation had a lot to do with understanding and modeling names (esp. nominative types) in open, distributed, live programs where multiple versions must coexist.

Oh? And do you mean that

Oh? And do you mean that globally? (Is there even a total ordering on versions?) Or is there also some implicit constraint on where you look for this version? And where do half-completed versions fit?

-- Version 1 source:
x = 1 
y = x 

-- Version 2 source:
x = 2
y = x 

a = y -- a = 2
b = y@ver1 -- b = 1

Syntax is made up. The point is that no annotation means "the version in the current source", but there is some way of bringing in older versions from source control.

I would be much happier if

I would be much happier if you were to take no annotation to mean 'automatically annotate as requiring the version tested against'. 'Current' or 'latest' version is far too ambiguous, far too dangerous.
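One way to read "automatically annotate as requiring the version tested against" mechanically -- a hedged sketch with invented names: when the suite passes, record a content hash per dependency, and resolve unannotated references through that record instead of through "latest".

import hashlib

def pin(source_text):
    """Content hash identifying the exact version the tests ran against."""
    return hashlib.sha256(source_text.encode()).hexdigest()[:12]

# Captured when the test suite last passed; an unannotated use of `y`
# would resolve through this table, never through "most current".
lockfile = {"y": pin("x = 2\ny = x\n")}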

You must remove all ambient authority from name designations

Any open-ended handling generally suggests ambient authority, which will be abused, misused, or hard for programmers to reason about. See a related point I made a long time ago: A little harder to get right than you might think.

But imports are fine just so long as you remember they are only for interaction with people (to aid with binding resolution).

A crucial innovation of bondi and its underlying Pattern Calculus is that the pattern variables used for reduction are separate from the binding variables used to control scope. In terms of usability, how would you say the various techniques in Haskell grade out for "interaction with people"?

Imports are fuzzy. Imports should be dev-time.

Can you give me a practical example of what you mean? Does it mean the system allows a request whenever a subject has some capability that would satisfy it? Or does it mean the subject is required to explicitly present the capabilities that it wants to use with each request?

Consider my question in light of how you might implement configuring a system's run-time implementation.

Hey, that's the thread that triggered my article!

I found your clarifying follow-up useful, Z-Bo.

Fuzzy linking

Can you give me a practical example of what you mean? Does it mean the system allows a request whenever a subject has some capability that would satisfy it? Or does it mean the subject is required to explicitly present the capabilities that it wants to use with each request?

What if the answer was the subject is required to explicitly present the capabilities that it wants to use with each request?

A unique set of challenges

Build the system configuration out of rights-amplification primitives, such as sealer/unsealer pairs and sibling communication. Alternatively, you could use an EQ operator, but EQ is antithetical to this whole subthread, since it ruins any hope of a good equational theory for the language.
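For concreteness, a sealer/unsealer pair in the style of E's brands, sketched in Python (whose privacy is by convention only, so this shows the pattern rather than real protection):

def make_brand():
    class Box:                         # fresh, opaque envelope per brand
        def __init__(self, payload):
            self._payload = payload
    def seal(payload):
        return Box(payload)
    def unseal(box):
        if not isinstance(box, Box):   # only this brand's boxes open here
            raise ValueError("sealed under a different brand")
        return box._payload
    return seal, unseal

seal, unseal = make_brand()
sealed = seal("secret")
assert unseal(sealed) == "secret"

seal2, unseal2 = make_brand()          # a second brand...
try:
    unseal2(sealed)                    # ...cannot open the first's boxes
except ValueError:
    pass

Holding the box alone grants nothing; holding the box plus the matching unsealer amplifies to the payload, which is the rights-amplification step.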

Also, encourage developers to learn sound patterns for secure configuration, such as Grant Matcher.

Another big implication is that anything globally accessible must be transitively immutable.

anything globally accessible

anything globally accessible must be transitively immutable

In live programming, we must acknowledge that the entire program is mutable. Every function, constant, module, etc. is mutable, and we do access those. But authority to cause this mutation should be pretty well controlled - e.g. in many cases, it is not and should not be an authority held by the program itself. So I'd tend to weaken that constraint a bit. Try, instead: "authority for mutation must not be globally accessible".

A capability-secure program can be deeply mutable. But there must not be any hidden channels for communication between subprograms.

Getting imports right is hard.

There are (at least) two major concerns with importing libraries. And I contend that most posts here are ignoring at least one of them.

First, there is the question of how to determine *what* to import. Usually this is done by naming libraries and then looking for a matching named library to import.

Other schemes are possible - for example, looking for something that provides a named/specified interface, finding potentially several, and picking one to import. If you find several possible imports satisfying your interface requirement, you have the ability to pick, e.g., the one that runs fastest, creates the fewest *additional* requirements, or requires the least additional space -- any of which can be valuable. If you do this, though, you must make absolutely sure you're importing something that *DOES* what you think it does; the interface must be checked against contracts and heavy use of assertions.
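A small sketch of that selection step (the candidate implementations, spot-checks, and cost metric are all invented for illustration): discard candidates that fail the contract checks, then choose among the survivors by measured cost.

import timeit

def contract_ok(impl):
    """Spot-check the claimed sort interface against known answers."""
    try:
        return impl([3, 1, 2]) == [1, 2, 3] and impl([]) == []
    except Exception:
        return False

def pick_import(candidates):
    verified = [c for c in candidates if contract_ok(c)]
    # Among conforming candidates, prefer the one that runs fastest.
    return min(verified,
               key=lambda c: timeit.timeit(
                   lambda: c(list(range(100, 0, -1))), number=100))

chosen_sort = pick_import([sorted, lambda xs: sorted(xs)])

Note that the spot-checks only establish conformance on the sampled points -- exactly the gap the paragraph above warns about.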

The second major concern, however, is the question of what an imported library is *allowed* to do - is it "mobile code" that needs to be sandboxed or is it trusted code that you don't need to protect your runtime from? Can it be permitted to access the local filesystem, access the network (and if so to which specific hosts and using what protocols)? Can it "see" or manipulate resources not passed to it as arguments generally? Can it be permitted to read or modify the local environment at the call site? Can it longjmp() to a call frame created by a completely different library, or by your runtime itself? Can it be permitted to treat expressions in its arguments syntactically (as, eg, a lisp macro does) rather than as subexpressions to be evaluated before it starts (as procedure calls in every eager language do)?

The above, even the insane things, are a big and incomplete list of things that executable machine code can in many circumstances do; if you're importing executable machine code that's going to run with access to non-virtualized hardware, it can completely destroy your runtime access guarantees and language call semantics. Remember, raw machine code is not constrained to obey your runtime guarantees and may have been composed in raw binary, completely bypassing your compiler and its "correct by construction" efforts.

Here is my point: if you make imports at all ambiguous in resolution -- that is, if just anybody can substitute one service for another and relink, or if there's any chance that a service will be chosen without human input -- then you're going to get malicious services that try to fulfill whatever interface requirements they need to fulfill in order to get selected and linked, but that also try to install spyware, snoop the user's cache, send spam, or whatever. In fact we're already seeing this to some extent with bogus and "snoopy" libraries, some of them from major media vendors.

The more flexible and powerful your procedure calls are -- whether that takes the form of high abstraction levels like lisp macrology or the form of low abstraction levels like raw machine code with non-virtualized hardware access -- the more imported procedures have a corresponding ability to *break* abstraction barriers if misused or malicious. Thus, the more you will have to exercise control over what imported procedures are allowed to do.

No set of assertions ensuring that the named interface is fulfilled correctly can assure that nothing else is done; you can't prove a negative. Therefore it becomes needful to strictly limit what imports are *allowed* to do, at the level of primitives or at the level of capabilities.

Bear

Human input on import

if there's any chance that a service will be chosen without human input -- then you're going to get malicious services

To be fair, human input doesn't really make a difference. Humans lack any real ability to inspect any code for obfuscated malicious intent. It's hard enough to spot accidental malign bugs and security holes that nobody is trying to hide. A human will do what humans always do: favor some sources over others based on who they decided to trust a long time ago (which updates slower than it should due to decision fatigue). In terms of code, that might be based on signatures or shared private registries. If we want to express anything stronger than a weak or moderate trust preference, the onus is ultimately on the machine.

Eliminating ambient authority is certainly useful for controlling and reasoning about what an import can do. I depend heavily on the object capability model and a few simple patterns (involving fine-grained registries) to control what an import can do.

Proving any theorem about code will tend to eliminate both malicious and buggy code. As Benjamin Pierce notes in Types Considered Harmful, it doesn't much matter which theorems you attempt to prove. So proof-carrying code is also a great way to ensure safe and secure code. Type systems, linear types, etc. are ways of providing lots of small theorems to prove.

A follow-up

I just wrote a follow-up in an attempt to synthesize several conversations I had here and elsewhere, and to clarify some of the fog in the original post. (I'm especially interested in your reactions to the paragraph on reuse; that might be a fertile source of disagreement.)