Library maintenance - key to language success?

The process by which language libraries are maintained may have a bigger impact on language success than generally recognized.

A common event for developers is finding a bug in some crucial library function. The big question is, what happens then? There are several models:

  1. The library is commercially supported.

    1. The vendor is responsive to bug reports. The user submits a bug report and the problem is fixed.
    2. The vendor is not responsive to bug reports.
  2. The library is supported by volunteers.

    1. There is a community supporting the library, with multiple people permitted to make updates.
    2. The library is supported by a single person who is active in developing the library.
    3. The library is supported by a single person who is not active, and no one else can change the code.

In case 1.2, one has the option of generating negative publicity for the vendor, which sometimes helps.

GCC and Perl (CPAN) third-party libraries are under organizational source control, so the absence of one person doesn't mean the code is never fixed; they're in case 2.1. Open source projects can stall out in case 2.3. This is common in the Python third-party library world, and Go seems to be taking the same route.

This is an organizational issue, not a licensing issue. It's quite possible for a library with an open source license to get stuck in case 2.3. Anyone can potentially do a fork, or write a replacement, but that's only worth the trouble if there is a serious need for the library. A sign of this problem is finding many packages for a language which do essentially the same thing.

Library maintenance is

Library maintenance is highly relevant, especially for volunteer-driven projects, and mostly underrated. Once you have launched a core language (compiler, interpreter) that has stirred substantial interest and traction, the next most important thing is to provide a clear platform for how other people can contribute code. You can never yourself cover all the application areas, protocols and frameworks that users of your language will want to tackle. On the other hand, many people are able and willing to contribute a module or a library that covers some useful part of an application domain. Missing out on that will hurt your language in the long run.

Perl and its CPAN are a shining example. My take is that its topical structure is the basis for its strength. You have very general top-level categories (network protocols, text, graphics, databases, ...) and at least some effort to replicate topical grouping on the second level (Syntax::Highlight::HTML, Syntax::Highlight::Perl, ...). This maps to module names that you use in import ("use") statements in code. Someone looking for a library can zoom in along the problem domain, easily find suitable modules, and compare what's there.

A module author usually started out as a module user who didn't quite find what he was looking for, and set out to fill that niche. CPAN makes it easy for him to find the right spot for the new module, give it a sensible name, register it and upload it. There is help available for coming up with the right structure for your module and fulfilling the necessary protocols.

Python does a worse job here. Python has name spaces like Perl, but its standard library has no structure at all (apart from some lone 'xml' and 'email' ... name spaces), maybe in a misguided attempt to keep standard library names short. Modules that belong together conceptually are separated by entirely different names. This carries over straight to Python's version of CPAN, called 'PyPi' (Python package index): a huge flat space of module names, no structure, no grouping by name spaces. You can search the web site by keywords and hope that the index brings up a sensible list of modules. No way to search systematically, no topological grouping, no corresponding naming scheme, and hence little systematical discipline for those writing modules. Python has a flourishing ecosystem, but I'd say in spite and not because of its library organization.

CPAN is a library archive

CPAN is a library archive which holds code. PyPi is a directory of links to libraries hosted elsewhere. CPAN has a process for transferring control of an existing module when the maintainer is unresponsive. PyPi does not. This may be a significant reason that Python, which is generally considered to be a better language than Perl, has not supplanted it after two decades.

A system for library maintenance which doesn't have strong "ownership" issues is a design problem worth working on. This is both a technical problem and a social problem. "Wiki" type systems successfully deal with the ownership problem, but allow too much churn for code. Nevertheless, looking at how Wikipedia does things is instructive.

An interesting approach would be a test-driven code management system, where multiple people can edit but new releases must pass tests before they can go live. This is appropriate for libraries, which are usually not too tough to test in a scripted way. If you submit a bug report and a test which elicits the bug, then you or others should be allowed to submit a fix. The automated process checks that all previously passed tests still pass, and that the new test also passes. Then some human approval should be required, but not necessarily that of the original developer.
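The acceptance gate described above can be sketched in a few lines. This is a hypothetical illustration only; all names (`run_test`, `gate`) are made up, and real systems would run tests in sandboxed processes rather than in-process.

```python
# Hypothetical sketch of the test-gated acceptance process described above.
# A submitted fix goes forward to human approval only if: the new test fails
# against the old code (i.e., it actually elicits the reported bug), all
# previously passing tests still pass against the fix, and the new test passes.

def run_test(test, impl):
    """Run one test function against an implementation; True means it passed."""
    try:
        test(impl)
        return True
    except AssertionError:
        return False

def gate(existing_tests, new_test, old_impl, new_impl):
    """Decide whether new_impl may proceed to (non-author) human approval."""
    if run_test(new_test, old_impl):
        return False  # the "bug report" test already passes: nothing to fix
    if not all(run_test(t, new_impl) for t in existing_tests):
        return False  # the fix broke previously passing tests
    return run_test(new_test, new_impl)  # the fix must resolve the reported bug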

Tuning the social aspects of this is the hard part. It's worth working on.

Although I see the advantage

Although I see the advantage of a unified code store, I actually like the idea of separating the concerns of library organisation and code storage. The GitHub model of maintaining code, with repository forking and pull requests, is probably too successful to be ignored. In this regard I think PyPi is well suited. It leaves the question "where" to the code author(s), and also how they organize collaboration. In case of an abandoned module, a fork could take over and create a new entry in the directory. Users would realize over time that the new entry is actually better than the original, and the original library would dry up in terms of usage and/or test compliance. This would work as a pattern of shifting ownership.

But even in this model of distributing library organisation and code maintenance the library organisation would want to unify some aspects: (a) File/folder structure of a library, including standard files like MANIFEST, LICENSE, etc., suitable for the accepted package manager. (b) Build process, i.e. a unified interface how a module is built on a client platform. (c) Test interface like you wrote, i.e. a unified way to run unit tests and test coverage over the library.
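A registration-time check for aspects (a) through (c) could be as simple as the following sketch. The file and entry-point names here are illustrative only, not any real index's policy.

```python
import os

# Illustrative policy: aspect (a) as standard files, aspects (b) and (c) as
# conventional entry points for build and test. None of these names are
# mandated by any real index; they only sketch a unified package interface.
REQUIRED_FILES = ["MANIFEST", "LICENSE", "README"]
REQUIRED_ENTRY_POINTS = ["build.py", "runtests.py"]

def check_layout(pkg_dir):
    """Return a list of problems; an empty list means the layout conforms."""
    problems = []
    for name in REQUIRED_FILES + REQUIRED_ENTRY_POINTS:
        if not os.path.isfile(os.path.join(pkg_dir, name)):
            problems.append("missing " + name)
    return problems
```

A directory could run such a check on upload and refuse (or flag) packages that don't conform, which is roughly what CPAN's tooling conventions achieve socially.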

These seem to be solved satisfactorily with CPAN in the Perl world, while Python struggles with easy_install vs. pip vs. ... and (AFAICS) no generally established testing and coverage framework.

In this regard I think PyPi


"In this regard I think PyPi is well suited. It leaves the question "where" to the code author(s), and also how they organize collaboration."

That's the problem. The code author retains control, and the only way to fix something in an abandoned package is to fork the entire package. There are five packages in PyPi for parsing ISO standard dates, all with different bugs and none of them completely correct. That's where the "author retains control" approach leads. I'm arguing that we need a source management model that doesn't work that way.
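For the common case, Python's standard library already suffices, which makes the five-way duplication all the more striking. A minimal sketch using only `datetime.strptime` (full ISO 8601, with time zone offsets, week dates, ordinal dates and fractional seconds, is exactly the territory where competing packages disagree):

```python
from datetime import datetime

def parse_iso(text):
    """Parse the common 'YYYY-MM-DDTHH:MM:SS' subset of ISO 8601.

    Deliberately narrow: no time zone offsets, week dates, ordinal
    dates, or fractional seconds -- the corners where parsers diverge.
    """
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")
```

The point is not that this sketch replaces those packages, but that the ecosystem lacks a process for converging the remaining corner cases into one correct implementation.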

("easy_install" is generally considered to be a disaster. It makes assumptions about where things are stored, and when those assumptions are not valid, it fails. This happens a sizable fraction of the time. "python setup.py install" generally does the right thing.)

There are five packages in

"There are five packages in PyPi for parsing ISO standard dates, all with different bugs and none of them completely correct. That's where the "author retains control" approach leads. I'm arguing that we need a source management model that doesn't work that way."

But even if you have some process to revoke ownership, the next author taking over the package can fail the same way as the previous, fixing some bugs and introducing new ones. It all hinges on the authors and their commitment and capabilities, unless you have a group of elite hackers at hand that can be assigned to flawed packages and rescue them. But this is probably very unlikely, at least in volunteer-driven projects.

There might be more promise in exposing similar packages to the users, making them easily comparable, penalizing flaws and shortcomings, and rewarding quality code (e.g. by a StackOverflow-like reputation system). It's the old strategy of letting users rate and giving authors incentives to perform. Or am I missing something fundamental?!

" If you submit a bug report

" If you submit a bug report and a test which elicits the bug, then you or others should be allowed to submit a fix. The automated process checks that all previously passed tests still pass, and that the new test also passes. Then some human approval should be required, but not necessarily that of the original developer."

I realized that you are already proposing a different scheme where "authorship" is no longer tied to "ownership", replacing it with a (nearly) automated authoring process based on unit tests. I wonder if that might work. Even if the existing tests are written carefully enough to cover the basic contract of each function or method, it might be too easy to come up with a failing test that, when fixed, effectively changes the spec of the unit.

Re: library maintenance is

No way to search systematically, no topological grouping, no corresponding naming scheme, and hence little systematical discipline for those writing modules. Python has a flourishing ecosystem, but I'd say in spite and not because of its library organization.

I can understand the sentiment, in any role that we consider (programmer, system developer, language developer etc) each of those properties seems more than desirable. My natural assumption would be that they are necessary because they ensure that the large scale structure of the language (basically the ecosystem around the language itself) is understandable. That it can be explored and navigated by users.

However, the success of Python does not imply that it is a corner case that succeeded despite a lack of necessary structure. It could simply mean that our assumptions about the need for that structure are wrong. I tend to use Google as a master index for all Python documentation, regardless of what type of project / library / system component it is that I am looking for. Weakly organised, incomplete search appears to trump systematic structure for my uses, even if I am not sure why.

Each case has its uses. Each has its problems.

Hierarchical structures are good when you don't know the terminology relating to what you're looking for. In that case you can narrow it down, discovering things one level at a time until you home in on something that answers the need. If I want to find an unfamiliar library to do an unfamiliar task I'm looking at in the vague language of a spec created by managers (or worse -- by salespeople), I will resort to the pain of navigating hierarchical structures hunting for a library.

If you *do* know the terminology of what you're looking for, hierarchical organization is just in your way. You at least want to be able to hopscotch the process of finding stuff, and would definitely prefer just entering the appropriate buzzwords and getting to a page of results. In the limit case, of course, you know the *name* of the library you're looking for and it isn't a string of letters that also resembles a common word. Such 'unique names' allow you to jump immediately to relevant results in a search engine.

The problem with search is that when it's been dumbed down for nontechnical use, and you don't know a unique name, it's not sufficiently precise. Google et al. want to search for stemmed forms and related words and so on, and no longer allow you to turn that behavior off. Further, they no longer respect boolean conditions and connected-phrase requirements (or even specific-word requirements!) that once permitted people with more precise needs to make more precise searches.

When you're searching on technical vocabulary that has very precise meanings not shared by the stemmed forms and common-language related words, that just clutters up the results you're looking for, or worse, buries them under tons of irrelevant-but-more-popular pages.

So, I tend to agree that a search engine is appropriate for navigating to a desired library; but unless you have the good fortune to be searching for a specific thing whose unique name you know, that search engine is not Google. That search engine is something that still provides full-text boolean search and respects connected-phrase requirements.
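The precise behavior being asked for, exact required words plus connected phrases, with no stemming, is easy to state as code. A toy sketch (function names are illustrative):

```python
def matches(doc, required_words=(), phrases=()):
    """Exact-word AND matching plus connected-phrase matching.

    Deliberately no stemming or related-word expansion:
    'thread' does not match 'threads', which is the whole point.
    """
    words = set(doc.lower().split())
    if not all(w.lower() in words for w in required_words):
        return False
    text = doc.lower()
    return all(p.lower() in text for p in phrases)

def search(docs, required_words=(), phrases=()):
    """Return only the documents satisfying every requirement."""
    return [d for d in docs if matches(d, required_words, phrases)]
```

Real full-text engines add inverted indexes for speed, but the matching semantics, boolean AND over exact terms plus verbatim phrase containment, are exactly this simple.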

the complex task of managing libraries

CPAN is a system for managing code; it involves a lot of tasks such as

- documentation extracted from sources / presented on the web site
- installation
- managing dependencies between libraries
- a procedure for building native-code libraries
- a test framework / the common requirement that a library ship with tests

and of course the non-functional, political requirement of managing library ownership.

So managing all this is quite a task. In Java there is Maven, which handles the technical aspects of the cycle, but there is no central repository like CPAN that collects all the open-source goodness.
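Of the tasks listed above, dependency management has the clearest algorithmic core: install a library only after everything it requires is installed, which is a topological sort. A minimal depth-first sketch:

```python
def install_order(deps):
    """Topologically sort libraries so dependencies come first.

    deps maps each library name to the list of libraries it requires.
    Raises ValueError on a circular dependency.
    """
    order, seen = [], set()

    def visit(lib, stack=()):
        if lib in stack:
            raise ValueError("dependency cycle at " + lib)
        if lib not in seen:
            for d in deps.get(lib, []):
                visit(d, stack + (lib,))   # install prerequisites first
            seen.add(lib)
            order.append(lib)

    for lib in deps:
        visit(lib)
    return order
```

Real package managers layer version constraints and conflict resolution on top of this, which is where most of the engineering effort goes.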

Go has the same problem

I was just checking out Go's approach to third-party libraries. See "Libraries Written in Go". This has the usual problem. There are, for example, five packages for connecting to MySQL, by five different people. The first one is broken, the second one worked when I tried it, #3 and #4 I haven't tried, and #5 has gone missing.

There's no reason for five packages; these all have to do essentially the same thing. What users need is one that Just Works. But there's no process here which converges on a single good package. This sort of thing makes managers unwilling to use a new language for real projects.

It's the social part of the library development process that's broken. Someone more social than me should work on this.

wiki style

Has the wiki-style model of collaboration ever been tried for maintaining the libraries of a programming language?

By a wiki-style model I mean no code ownership: anybody can edit any code, but conflicts have to be resolved by some sort of voting procedure, and moderators can review / revert the edits.

I have proposed this before,

I have proposed this before, and I'm working on what I think is necessary to achieve it, like finding anything useful in a global open namespace.

I think it's a beautiful idea that we should aspire to, but it won't be very easy.

Entanglement of wiki-based

Entanglement of wiki-based modules is one of the bigger problems for which I've not found a complete answer. Traditional wikis gain much value from entanglement, i.e. every page is directly linked to other pages. But that same entanglement seems to be problematic for a programming model. Entanglement hinders partial reuse, version and configuration management, live programming, etc.

I have found a lot of partial answers, i.e. with linking based on content search constraint satisfaction, and reifying the module systems and platforms within the language itself. I think a global namespace can work, but can easily grow problematic unless we're really careful about what gets named and how names are allowed, through code, to entangle with other names.

Wiki-based programming environments are a very promising idea. But I think they're easy to do badly, and will require careful PL design to make them work well.

As I said in my last paper,

As I said in my last paper, I agree with Jonathan Edwards that names "should be for users, not compilers!" How textual names are bound to constructs should be a concern for the IDE/search engine, which has to instead deal with the global namespace.

Dependency is a much harder problem to solve. I don't think we should spend too much effort isolating code from other code; maybe we should just spend more effort on awareness of when code entries become broken, so they can be fixed. Really, success is mostly about setting good editorial policy.

Controlling entanglement of

Controlling entanglement of constructs, and hence controlling naming, is for the users who will better be able to maintain, experiment with, share, and reuse code. Compilation and usability concerns are, in practice, deeply related. Idealistic aspirations about what "should be" separable concerns do not make them so.

Success is mostly about good infrastructure. Discipline, foresight, and "good editorial policy" can help in that last mile, but failure is the common and predictable consequence of chaining several of those 'last mile' efforts together, of relying too heavily on discipline. Naturally, since we spend most of our efforts on that last mile (e.g. 90% of effort on 10% of code) we might get the impression that discipline, policy, and foresight are what matter most. But that impression is a consequence of observational bias and attribution error.

Any solution to problems associated with managing dependencies will ultimately be embedded in infrastructure.

I think people are too buggy for wiki style to work.

Consider questions of security. Wiki Style has no accountability. Wikipedia has endless debates about what to do about all the people who want an entry for "Thomas Edison" on the "Douchebag" disambiguation page, and at any given moment, some page you access may be in the few brown moments between vandalism and reversion. Now consider that while Wikipedia vandalism happens mostly for "grins and giggles" by people who get bored after a few times and go away, there are people who have persistent financial and ideological motives to insert attacks into a software library.

I would anticipate that the admin effort of reviewing code and reverting changes inserted by people who want the sorting libraries to, e.g., detect when they are being used to sort email addresses and send those addresses to a spammer, would be far greater than the effort of developing reliable libraries by hand.

The issue is that with code libraries, the ABSENCE of certain things is very important, and there are hostile elements that have financial and ideological motives to insert those very things into code that other people use.

Wiki style can work fine,

Wiki style can work fine, and it won't take special admin efforts. Automated tests and type checking (or contracts, or other proofs) would generally be sufficient to control against breaking changes. Security can be addressed by having a network of micro-Wikis (e.g. a project wiki that inherits from a public wiki), and also by securable programming models (e.g. object capability model, generic programming, linear types).

Or better yet, maybe code

Or better yet, maybe code should degrade if no one cares about it! That is, the wiki exists, there are some protections against stupidity and vandalism, but really if something breaks it's just someone's responsibility to fix it or throw it out and replace it. We don't have to get very fancy here; not many expected the first wiki to "run itself," but it did!

If you aren't entrusting

If you aren't entrusting anything important or valuable to the code, that might be okay.

Nobody entrusts wikipedia

Nobody entrusts Wikipedia with anything important or valuable, yet it has become very useful.

Wikipedia has fault-tolerant

Wikipedia has fault-tolerant interpreters. Those interpreters typically know how to correct for errors, recognize vandalism, resolve ambiguities, and search edit histories. Also, the dependencies between pages are quite weak both spatially and temporally; a change in one page has very little effect on the meanings of other pages. I would not expect similar tolerances from an automated interpreter of code. In these respects, extrapolation from Wikipedia or C2 seems optimistic.

Yep, I'm sure something

Yep, I'm sure similar mechanisms would evolve for a code wiki. But trying to think of and solve all the problems up front won't be very efficient.

We definitely should come up with some new ideas, but I'm guessing someone will eventually just throw something up for a dynamic language like Python without much fuss, with a few simple but innovative ideas, and it will be wildly successful; we'll all feel bad that we didn't act faster.

lots of bad programmers and low standards

This comment has to be anecdotal because it is so subjective. It could be seen as a free-floating insult applicable to "most" working programmers. I'll describe the problems I see in terms of the specific example of web site software but I mean that only as an example: similar problems apply as well in other popular categories of software.

A hell of a lot of people seem to make their living writing and deploying poorly engineered software systems. For example, the low end of web site design and implementation is notorious for its proliferation of sites that are terrible to use and often buggy, and that, instead of being seriously fixed, either (1) muddle through indefinitely in that state; (2) fail and go away; or (3) fail and get replaced with new software with a different set of problems.

Robustness, maintainability and extensibility are qualities of some software systems that are rarely of any demonstrated value in the marketplace.

Conceptually simple features rarely get built if they don't fall out nearly "for free" from the framework.

Many tried and true frameworks are both language-specific monoliths and... barely work. The most popular paradigm for fixing an issue after implementation is not to look deep and get to the root of the problem, but to search "stackexchange" and similar resources to find the right magical alchemical formula to clear a "problem ticket".

Quite a few working programmers who produce this kind of product write publicly about their approach, either through blogging or questions posted to various help forums. In isolation their comments seem innocent enough, but if you read enough of them the impression you get is of an enormous, handsomely paid pool of incompetence and, frankly, widespread disinterest in fixing the systemic problems. People just want "solutions" that get them through the next day.

The side of the road is littered with high concept projects that die for lack of economic support before any value from them can be realized. Off the top of my head, I would have to struggle to find any current public figures who are programmers known for "doing the right thing" and developing really smart, well-built systems.

The market for such shoddy labor and such shoddy projects has for a long time been quite robust. Customers are buying. Companies are making a profit. The people who type in the source code are getting paid and often they are paid rather well. No clamor is heard for "higher quality" or "more features", not as a rule.

The average poster to "Hacker News" (Y Combinator) seems to be fascinated with the question of what kind of system they can slap together in a weekend that might gain enough buzz to make them rich. The financiers in general seem to be fascinated by extracting as much money as they can from such crap.

I am middle aged, I guess. If I look back at the list of good programmers I "grew up with" -- the folks who really valued knowing what they were doing and doing it well -- most have ceased being software developers at all except for a few who are senior or management-level custodians of legacy systems they didn't build.

It was not like this in "the old days". In the old days, people who could program at all were rare and it was more common that professional success fell to those who could program well. Nowadays, in contrast, it is common to discourage young people from studying and pursuing a CS career because the market is saturated and seemingly satisfied with (at best) mediocrity.

So...

Libraries.

This is an organizational issue, not a licensing issue.

I don't buy that. There is no form of organization (short of "training" and "paying") that will magically produce high quality libraries. There are no customers for high quality libraries, at least to a first approximation.

New languages that are potentially good have trouble gaining acceptance because the average programmer can't appreciate them. The average programmer can't appreciate them because language innovations solve problems the average programmer will never face or even understand.

The shoddy condition of our repositories of libraries is reflective of both the supply and the demand sides of software economics.

We are redundant.

Mediocrity is in a perverse way a hard-won success story.

Consider this; those of us who hack languages and dev tools and frameworks and libraries have, as our general goal, making it easier for people to write programs.

Insofar as we succeed, it becomes possible for people who have not the skills that were once required of programmers to write programs. They give vague instructions, don't think precisely, don't happen to think of better approaches or more maintainable designs, and hack together preassembled and redundant parts that really aren't the best parts for the jobs.

The fact that they can succeed in producing something that works at all is nothing short of miraculous, and a testimony to the effectiveness and accumulated effort of the developers of languages and tools.

If you are looking for programmers who are absolutely sharp, ultra-careful, and fully understand programs and the implications of the changes they are about to make before they go making changes, you will find them whenever you look at the people who write machine code using a raw binary editor on a 16k machine. The fact that most people don't take that kind of care now -- and don't have to -- is a great if perverse testament to the success we've had so far in tools development -- and to the fact that CPU and memory are now in such great supply that even code which profligately wastes them can be considered useful.

The challenge now before us is even greater; next-generation development tools must take these heaving masses of barely-working crap and transform them into simple, efficient, easy-to-understand code with a minimum of redundant parts, while preserving desired behaviors and making those behaviors (and therefore the bugs in them) orders of magnitude easier to understand than they were in the source code as originally written.

It's my opinion that the source code as written by the developers is of increasingly little value going forward, for precisely the reasons you mention; most developers either don't give a crap or have deficient skills.

Only when we can develop systems that transform stinking code into something not just more "optimized" but also cleaner and better-designed, will we make it possible for the same skills-deficient people who write stinking code in the first place to see how the behavior *should* have been produced or how the behavior would be most simply or cleanly produced, and therefore understand how code that someone else wrote works (even if that code was originally written in a horrible stinking form) and find the bugs in it.

IOW, people with deficient skills are writing source code that's such a mess, and programs which are so badly designed, that it requires dev tools capable of substantially cleaning it up and redesigning it before people with the same level of deficient skills can also understand and work on it.

Of course, if we can analyze it so well as to redesign and clean up, we can also optimize it to make it orders of magnitude more efficient. Not that it's going to be easy.... But it's just what has to be done.

But remember this. When you make something idiot proof, nature responds by producing more and better idiots. So effectively, such tools will merely serve to redefine what "deficient" means when talking about skills.... 'Old guard' programmers who we'd see as having deficient skills will still be making the same crotchety observations in fifty years that we're making now.

Ray

standards, feedback, avoiding shoddy outcomes

(The wiki sub-topic is interesting.) I have trouble picking a starting quote, so let's try several :-)

  • John Nagle: A common event for developers is finding a bug in some crucial library function.
  • Tom Lord: The market for such shoddy labor and such shoddy projects has for a long time been quite robust.
  • Ray Dillinger: When you make something idiot proof, nature responds by producing more and better idiots.

I like Tom's rant a lot — especially sociological parts of it — and I agree with most of it, but no one part gives me a place to launch a comment. And Ray's reply is good enough. So responding to John's OP may work best, while favoring Tom's focus on shoddy common results (but this may still meander).

I've been thinking about writing related libraries under MIT license with one library mostly about green threads and rewriting C into async form via continuation passing style that targets green threads. (Yeah, yeah, unix-like green processes imitating Erlang style practices, why not.) But I expect anyone using such libraries to rewrite them into incompatible versions, with massive forking and incompatible branches. It would be a nightmare unless there were a lot of tests, so that's a priority: some form of self-checking built into libraries, plus docs specifying generally what is supposed to happen, since otherwise you can't even audit whether tests verify what they ought.
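The green-thread idea can be sketched in miniature with Python generators standing in for the rewritten-into-CPS C code (a toy illustration only; the libraries described would be C, and the names here are made up):

```python
from collections import deque

def scheduler(tasks):
    """Run generator-based cooperative tasks round-robin until all finish.

    Each task yields at its suspension points, which is the same role
    the CPS rewrite plays for C: turning blocking code into resumable steps.
    """
    queue = deque(tasks)
    trace = []
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))  # run the task until its next yield
            queue.append(task)        # then reschedule it at the back
        except StopIteration:
            pass                      # the task ran to completion
    return trace

def worker(name, steps):
    """A toy green thread that yields control after each unit of work."""
    for i in range(steps):
        yield "%s:%d" % (name, i)
```

Interleaving falls out of the queue discipline: each task runs one step, then yields the processor, which is the cooperative behavior the C green-process runtime would provide.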

I had in mind pursuing a third model added to John's list: the library is unsupported at first (if ever), but forking is encouraged. That would only work if the docs made enough clear, and tests checked any part you wanted to depend upon. (This would be a crazily thorough amount of tests.) As a joke, the docs would pursue a story about 26 brothers, named alphabetically A to Z, as a metaphor to explain what happens as they fork and version libraries with erratic merging. Younger brothers would naturally hate their elders for not doing everything perfectly in early releases. Absence of a standard is an intended part of the model. Fixing bugs is easy, but working on another version is hard when it's randomly different.

If there was a simple high level language in the mix—which clearly makes sense—how could it be successful? There wouldn't be a standard version, if balkanization was the norm, unless someone pushed one fork as a known target. It seems like some form of standardization is required to get a status called success. But then you're shut out from having instant gratification to make changes. (This is just another generic static vs dynamic conflict.)

After lots of experience, my personal opinion is that shoddy software is often caused by incomplete docs and tests, since this impedes checking quality, and tolerance by the market lets it happen. I want to relate a good anecdote about this, but I also want to avoid smearing anyone's reputation, so anything I say had best be laundered to bland vagueness. Still, there's a consistent principle: most folks will go to inordinate length to avoid experiencing any form of negative feedback, even if this was the only feedback that would lead to improvement. For improvement, criticism is much more effective than praise, but folks avoid it like they do acute pain. I used to entertain coworkers at lunch who asked, "But why do they do XYZ?", by always saying it was to avoid exposure to negative results.

Tom Lord: ... folks who really valued knowing what they were doing and doing it well -- most have ceased being software developers at all except for a few who are senior or management-level custodians of legacy systems they didn't build.

It has become unfashionable to write descriptions of software before it is developed, and this is a big problem. One coworker, among the best engineers I've worked with, developed high-availability software at Tandem, with a strong focus on maintenance and stability in the face of change and system failures. We both complain about interacting with interfaces whose behavior has not been specified, even informally, except in high-level hand-wavy language in marketing docs. (And when we review coarse project docs and object, "This won't work unless you address problem ABC," the software still reaches its ship date without addressing problem ABC, which by then is an open problem in bug tracking, because criticism is simply ignored.)

I'm now in my early 50s and some coworkers are older still, by as much as a dozen years, and we're still cranking away at development, perhaps because we were more interested in doing the right thing than in getting rich. (During the dot-com era, focusing on money seemed to work much of the time, for lots of folks, and they retired.) Writing bespoke code from scratch, from high-level design down to bit twiddling and unit tests, still happens, but opportunities to do so may be less widespread now.

My biggest problems now are caused by layers above and below me, written by folks who are no longer with us, whose behavior has never been described in plain English, and whose code contains no comments, by and large, nor is qualified by any test other than running the complete monolithic system. Many variables have exactly the same names as types, and many method names are substrings of one another, so it's impossible to search globally without dozens or hundreds of false positives. Deciding whether code you see has a bug is hard when you have no means of establishing what it was supposed to do.

This has given me a lot to think about, and it informs my interest in libraries that permit testing what they do (as well as runtime inspection of actual behavior in a normally operating production system), since without means of detailed oversight one cannot criticize and audit in pursuit of quality to whatever degree is desired. Without being able to falsify a hypothesis easily, bounding the possible behavior of a system is difficult. I'm far more interested in being able to do that than in using a standard system I can't examine myself for empirical results.

Everything that degrades accountability is a problem. While I like the idea of wiki-style organization of code and docs, I hate to see obfuscation of purpose in docs, or of cause and effect in code, and social pressure to be nice in a wiki environment may encourage folks to remove means of getting negative feedback. When someone asks, "Does this code make me look fat?" the answer should be an honest one, not "You look fabulous." There should be an incentive to clarify the purpose of code; otherwise folks have an incentive to hide why their code does anything. (Moe says, "You can't prove I made a mistake if you have no idea what my code was supposed to do. It does whatever it does, on purpose, just like I meant.") Code's objective must be stated independently of its realization, at least briefly and informally, or else you can't audit how the two correlate.

One last thing: once you start idiot-proofing something, you should take it as far as you can. I once wrote an API whose implementation aggressively checked everything it could, so it was very likely to assert when used incorrectly. But the guy writing to this API responded by becoming less careful, since casual mistakes were likely to be caught by an assert. Then I realized I should have added code to detect randomized API calls, since those were now likely to occur. For any contract requirement a caller must obey, if you don't check it at runtime, they may not bother to ensure it themselves once you've added any safety net at all.
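A sketch of the kind of aggressive checking I mean, with hypothetical names throughout: every precondition stated in the docs is also asserted at runtime, so misuse fails immediately at the call site instead of corrupting state later.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical API: a fixed-size name field with documented preconditions. */
typedef struct {
    char   buf[16];
    size_t len;
} name_t;

/* Contract: n and s are non-null, and s (plus its NUL) must fit in buf.
 * Each documented requirement is asserted, so violating any of them
 * asserts here rather than silently truncating or overflowing. */
void name_set(name_t *n, const char *s) {
    assert(n != NULL);                  /* precondition: valid target */
    assert(s != NULL);                  /* precondition: valid source */
    assert(strlen(s) < sizeof n->buf);  /* precondition: fits with NUL */
    memcpy(n->buf, s, strlen(s) + 1);
    n->len = strlen(s);
}
```

The catch described above applies directly: once callers learn that `name_set` asserts on every mistake, they stop validating inputs themselves, so any precondition you don't assert becomes the one nobody checks.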