Diseases in Code

- Synopsis -

What is Code Disease?

A code disease is a pervasive property of a code base that harms or destroys basic ease of software development and maintenance. Often propagating problems up to the business execution and strategic levels, code diseases are costly and risk-inducing. As its strong name indicates, a code disease is very serious and potentially threatening, with its first victims the morale and sanity of the developers who live with it daily, and its final victims the customers who are punished for relying on your business’s affected systems.

Please find the PDF here - Diseases in Code (rev. 5)

LtU is the first place I've posted this, so hopefully I can get a bit of feedback!

This is nothing but F# propaganda

You erect a set of criteria on which you say F# is strong and all other languages weak, and then claim that no other language is fit for purpose, all others suffering from "show-stoppers".

You may, of course, believe what you like and advocate for what you will, but you should not claim impartiality while you do it.

Troll comment.

Next time, please make a more substantial attempt at comprehending the material before firing back with a knee-jerk reaction.

Didn't think I'd have to say it, but I'm looking only for constructive feedback here. If I wanted the intellectual rigor of a peanut gallery, I would have published this at Hacker News.

Calm down

I'm not sure you should be throwing stones about intellectual rigor after posting that paper.

I mean it starts off by stating that code disease is a more concrete code smell, something that is actually a problem, and then lists some subjective, unmeasurable program traits which are caused by even more subjective unmeasurable code traits.

And then there are charts detailing "typical" programs and their diseases, but nothing explaining how these values were measured or what you used as "typical".

And then I stop reading because you're just making stuff up.

"And then I stop reading

"And then I stop reading because you're just making stuff up."

I'd love to know the manner of rigor you applied to come to THAT conclusion :)

Am I the only one around here who finds it ironic how all of my arguments must be backed by full platonic rigor, while any objections to them may be unaddressably vague, accusatory, and knee-jerk?

Bullet 1 in your code

Bullet 1 in your code disease list is "Structural Complexity".

How is this defined, and how did you measure it?

And more specifically, how did you measure it to arrive at the results in the chart on page 6?

Thank you for this excellent

Thank you for this excellent question!

I essentially used the techniques described in the article linked to from the paper - Cycles and Modularity in the Wild. Measurements for 'typical' programs were more general and experiential - mostly from my study of and professional experience with these code bases.

I will make it clearer in the paper that this is how I am defining structural complexity. The rigor is there, but the paper needs to point to it more prominently.

Lots of work left to do!

Okay, that link has a good

Okay, that link has a good description of the measurements used, but I don't see a correlation between its measurements and code quality.

Why does having more top level types, or having more dependencies necessarily lead to problems? Why is that metric more important than others? And how do you tie language design to this end result while excluding other effects (programming environment, the relative education/skill of developers, size/style of standard libraries, etc.)?

In short, I don't see these sorts of things documented, so I have to assume that the research wasn't done...
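
For concreteness, here is roughly the kind of measurement being argued about: counting top-level types, the dependencies between them, and how many types sit inside a dependency cycle. This is only a minimal sketch under my own assumptions, not the linked article's tooling and not the paper's method. The function name `structural_metrics`, the dict-of-sets input format, and the example type names are all hypothetical.

```python
from typing import Dict, Set


def structural_metrics(deps: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count types, dependency edges, and types that are part of a cycle.

    `deps` maps a type name to the set of type names it depends on
    (a hypothetical input format chosen for this sketch).
    """
    nodes = set(deps) | {d for targets in deps.values() for d in targets}
    edges = sum(len(targets) for targets in deps.values())

    # Tarjan's strongly connected components algorithm (recursive form,
    # fine for small graphs). A type is "in a cycle" if its component
    # has more than one member or if it depends on itself.
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack = []
    on_stack: Set[str] = set()
    in_cycle: Set[str] = set()
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in deps.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            if len(component) > 1 or v in deps.get(v, ()):
                in_cycle.update(component)

    for v in sorted(nodes):
        if v not in index:
            visit(v)

    return {
        "types": len(nodes),
        "dependencies": edges,
        "types_in_cycles": len(in_cycle),
    }


# Hypothetical example: Order and Customer depend on each other.
example = {
    "Order": {"Customer", "Invoice"},
    "Customer": {"Order"},
    "Invoice": set(),
    "Logger": set(),
}
metrics = structural_metrics(example)
```

Numbers like these are easy to compute, which is exactly why the open question stands: nothing in the counts themselves says how they relate to code quality.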

"Constructive criticism" is an equivocal expression

See my writeup on the problems with the term.

And now it is nothing but Idris propaganda.

Added Idris to the document as was initially intended but elided due to the need for more research on it. Idris happens to rate higher than F# in this document's narrow purview, and now I have apparently become a well-compensated shill for the Idris programming language rather than F# :)

*** * ** *****

Where are all these star charts coming from? Best guesses?

My personal knowledge,

My personal knowledge, experience, and some informal reasoning (as I would rather call it :). I wish I had more diversified sources for the data, but I know of no directly applicable prior work. Please feel free to enlighten me here.

I could attempt to document my justifications for the rating in each cell of both matrices, but that seems outside the scope of the paper, and more significantly, beyond my current time budget.

However, I'm more than happy to explain my justifications for any individual cell here and to modify them as I collect more / better information. But again, that's not really the intent of the paper.


Your article doesn't offer any way to identify a diseased codebase or understand costs to real companies, much less developer morale. The various charts have numbers without sources or argument. It seems to me your claims are neither falsifiable nor verifiable.

Your arguments might be better served by a few case studies or autopsies... examples of how 'diseased' codebases have harmed companies and developer morale. Data is not the plural of anecdote, but at this point even anecdotes would help.

Agreed. Unfortunately, I'm

Agreed. Unfortunately, I'm just a working programmer (stiff), so falsifiability is not a level of rigor I currently feel a need to pursue. Even an anecdotal study would require more ambition than I currently have :(

However, I really hope this paper could inspire someone else (most likely a future version of myself) to consider doing exactly these things as time becomes available.

Hopefully this paper will at least become a starting point for more useful, applicable, and rigorous work down the road. After all, I had to get my initial thoughts down somewhere!

Thank you kindly for your input so far!


Seems related to anti-patterns, about which a fair amount has been written, and many patterns identified.

The Language Deficiency Matrix seems very subjective. It would need a lot more substantiation to provide any basis for fruitful discussion.

Definitely related, what with

Definitely related, what with code disease being more concrete yet at the same time more general. Unlike a pattern, which is inherently abstract, a code disease is a concrete code factor that can be pointed out in a particular code base and be potentially resolved. A code disease may also have no anti-pattern to describe it, so the previous literature mostly misses the mark.

The language deficiency matrix is somewhat subjective, but likely less than one would initially suspect. Regardless, as you say it needs a lot more substantiation for discussion toward that end (if only to help people realize how objective it really is). Fortunately, the LDM is not at all the point of the paper, so it should not block too much discussion fruitful to the paper's main content.