Entanglement and Import by Specification

I've been thinking. Sorry, I can't help it. There is no 'off' switch on this thing.

A ubiquitous problem in programming is Entanglement.

We want clean, documented, specified interfaces, but it is all too easy to create an inadvertent reliance on undocumented, unspecified features of a particular implementation. Those features may or may not be correct according to the actual intended/specified interface, and they may or may not be carried forward into future implementations of that interface. This is a problem because code that relies on them breaks later, when a new implementation comes along.

Another Entanglement problem is that when we import a given library, we implicitly import all of that library's dependencies, and all of their dependencies, ad nauseam. The transitive closure of what gets imported often contains a lot of duplicated functionality, because subcomponents that needed "similar" services imported different libraries when they could reasonably have imported the same one.

Finally, it is all too easy to write an implementation of some specification that fails to fully meet it, and not notice, because it happens to work for our particular program. This is a related problem: other programs that use the library then inherit its bugs.

It seems to me that we don't really want to import ephemeral, overspecific, and possibly buggy implementations of functionality; we want to import hopefully less-ephemeral _specifications_ of functionality. Instead of saying "give me library foobar, located at /path/to/foobar" we should be saying "import a library which implements the foobar specification, where the specification file is located at /path/to/foobar."
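To make the idea concrete, here is a minimal Python sketch of "import by specification", assuming the specification is expressed as a `typing.Protocol`. The names `QueueSpec`, `implements`, `import_by_spec`, `DequeQueue`, and the registry are all hypothetical illustrations, not an existing API. The client asks for the spec and never names the implementation.

```python
from collections import deque
from typing import Callable, Dict, List, Protocol, runtime_checkable


@runtime_checkable
class QueueSpec(Protocol):
    """Hypothetical 'queue' specification: only these operations are promised."""
    def put(self, item: object) -> None: ...
    def get(self) -> object: ...
    def empty(self) -> bool: ...


# Hypothetical registry: specification -> implementations that claim to meet it.
_REGISTRY: Dict[type, List[Callable[[], object]]] = {}


def implements(spec: type):
    """Class decorator recording that the decorated class claims `spec`."""
    def register(cls):
        _REGISTRY.setdefault(spec, []).append(cls)
        return cls
    return register


def import_by_spec(spec: type) -> object:
    """Ask for *a* conforming implementation, not a particular library.
    Choosing among candidates is the linker's job; this sketch simply
    takes the first one registered."""
    candidates = _REGISTRY.get(spec)
    if not candidates:
        raise ImportError(f"no implementation registered for {spec.__name__}")
    instance = candidates[0]()
    # Structural check only: method names are present, behavior is not tested.
    if not isinstance(instance, spec):
        raise ImportError(f"{type(instance).__name__} does not expose {spec.__name__}")
    return instance


@implements(QueueSpec)
class DequeQueue:
    def __init__(self) -> None:
        self._items: deque = deque()

    def put(self, item: object) -> None:
        self._items.append(item)

    def get(self) -> object:
        return self._items.popleft()

    def empty(self) -> bool:
        return not self._items
```

A client writes `q = import_by_spec(QueueSpec)` and depends only on `put`, `get`, and `empty`. Note that `runtime_checkable` only verifies that the methods exist; checking that an implementation actually *behaves* as specified needs an executable test suite, which is where the argument goes next.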

The distinction is admittedly small from the programmer's POV, but it would allow the linker to choose among multiple implementations that meet the spec, optimizing for performance, size, or other considerations. It would also call for an executable specification: a test suite that checks all (and only) the functionality the specification guarantees, run whenever a programmer claims that their library meets the spec. Where different subcomponents import similar functionality from different implementations of the same spec, the linker could simply relink them all to one shared implementation, eliminating redundant branches of their dependency trees. And finally, we should be testing our libraries anyway, right? Why not build the test suite directly into the standard library, along with more than one implementation of each spec? Using them, surely we'd discover bugs a lot faster.
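A minimal sketch of the two mechanisms in the paragraph above: an executable specification that tests all and only the guaranteed properties, and a "linker" that keeps only candidates passing that suite, picks the cheapest by some cost metric, and hands the same choice to every importer. Everything here is assumed for illustration: the stack spec, the `check`/`conforms`/`link` helpers, `ListStack`, the deliberately broken `BuggyStack`, and the numeric costs (which might stand for size or speed).

```python
from typing import Callable, Dict


class SpecViolation(Exception):
    """Raised when an implementation breaks a guaranteed property."""


def check(cond: bool, msg: str) -> None:
    if not cond:
        raise SpecViolation(msg)


def stack_spec_suite(make: Callable[[], object]) -> None:
    """Hypothetical executable spec for a LIFO stack: it tests every
    guaranteed property and nothing the spec leaves open."""
    s = make()
    check(s.is_empty(), "a new stack is empty")
    s.push("a")
    s.push("b")
    check(not s.is_empty(), "a stack holding items is not empty")
    check(s.pop() == "b", "pop returns the most recently pushed item")
    check(s.pop() == "a", "pop then returns the next most recent item")
    check(s.is_empty(), "popping every item empties the stack")


class ListStack:
    def __init__(self) -> None:
        self._items: list = []

    def push(self, x: object) -> None:
        self._items.append(x)

    def pop(self) -> object:
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items


class BuggyStack(ListStack):
    """Claims the stack spec but pops from the wrong end (FIFO)."""
    def pop(self) -> object:
        return self._items.pop(0)


def conforms(make: Callable[[], object], suite: Callable) -> bool:
    try:
        suite(make)
        return True
    except Exception:  # a SpecViolation, or any crash, counts as non-conforming
        return False


# Hypothetical linker state: one chosen implementation per spec, shared by all importers.
_linked: Dict[str, Callable[[], object]] = {}


def link(spec_name: str, suite: Callable,
         candidates: Dict[Callable[[], object], float]) -> Callable[[], object]:
    """Pick the cheapest candidate that passes the spec's suite;
    later requests for the same spec reuse that choice."""
    if spec_name not in _linked:
        passing = [c for c in candidates if conforms(c, suite)]
        if not passing:
            raise ImportError(f"no conforming implementation of {spec_name}")
        _linked[spec_name] = min(passing, key=lambda c: candidates[c])
    return _linked[spec_name]
```

For example, `link("stack", stack_spec_suite, {BuggyStack: 0.5, ListStack: 1.0})` returns `ListStack`: `BuggyStack` is cheaper but fails the suite. Caching the choice per spec is one simple way to get the deduplication described above, since two subcomponents asking for the same spec end up sharing one implementation.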

If code breaks when a different library implementation is swapped in, then the specification test suite is failing to test some property that the code relies on. So either the test suite is incomplete, or the code that claims to rely only on that specification is wrong. This would make inadvertent Entanglement much easier to detect. Likewise, making an executable test suite for each library part of a language's standard would help enormously in testing every new or updated version of the standard library, would allow multiple library versions optimized for different constraints and purposes to be provided transparently, and would make it absolutely clear what a specification does and does not guarantee.

Parameterized modules have been discussed here before, Gilad Bracha has blogged about them, and dmBarbour has posted here about linkers as constraint solvers.

I imagine providing a debugging version of each standard library, created specifically to protect against Entanglement. These should implement all of the spec and, where possible, behave reliably in no other respect. Where the spec doesn't say, for example, which element of a set is returned or in what order things are evaluated, the debugging versions should deliberately randomize these things. Where the spec says the library may throw an error, the debugging version should occasionally do so for no reason at all, exposing code that has no error recovery. Once your program runs correctly in debug mode (using the debugging versions of all libraries), you can simply relink in performance mode.
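A minimal Python sketch of such a debugging build, for a hypothetical set specification. `DebugSet`, `pop_any`, `CapacityError`, and the `fault_rate` parameter are assumptions for illustration only. The sketch randomizes what the assumed spec leaves open (which element `pop_any` returns, and iteration order) and randomly injects the one error the assumed spec permits. A fixed `seed` makes a failing run reproducible.

```python
import random
from typing import Hashable, Iterator, Optional


class CapacityError(Exception):
    """Hypothetical error that the assumed set spec says add() *may* raise."""


class DebugSet:
    """Debug build of a hypothetical set specification.

    Guarantees only what the assumed spec promises:
      - an element is a member after a successful add()
      - pop_any() removes and returns *some* element
      - iteration visits every element once, in unspecified order
    Everything the spec leaves open is randomized."""

    def __init__(self, seed: Optional[int] = None, fault_rate: float = 0.05) -> None:
        self._items: set = set()
        self._rng = random.Random(seed)  # seeded so a failing run can be replayed
        self._fault_rate = fault_rate

    def add(self, x: Hashable) -> None:
        # Spec-permitted failure, injected for no reason at all.
        if self._rng.random() < self._fault_rate:
            raise CapacityError("injected: the spec allows add() to fail")
        self._items.add(x)

    def __contains__(self, x: object) -> bool:
        return x in self._items

    def __len__(self) -> int:
        return len(self._items)

    def pop_any(self) -> Hashable:
        if not self._items:
            raise KeyError("pop_any from an empty set")
        x = self._rng.choice(list(self._items))  # the spec doesn't say which element
        self._items.remove(x)
        return x

    def __iter__(self) -> Iterator:
        order = list(self._items)
        self._rng.shuffle(order)  # no caller may depend on iteration order
        return iter(order)
```

Code that passes under many different seeds is not relying on any of the randomized choices; code that passes only under some seeds has found an Entanglement. The performance build would drop the shuffling and fault injection while still passing the same specification suite.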

So, anyway, I'm just throwing this out there for your consideration and feedback.


Eventually consistent distributed STM

An implementation based on a paper I posted on LtU a couple of years ago. It turned out to offer a great programming model: it looks like REST, but dynamic, like Google Docs versus a static Web site. The eventual consistency aspect allows server scalability and lets clients modify data offline. Versions merge later, when connectivity is back, much as in distributed source control.