Larger Subroutines == Fewer Defects

Why is the defect density curve U-shaped with component size?
This paper explores the empirical results of a number of recent (and not-so-recent) papers showing that larger software components are proportionately much more reliable than smaller software components within the same system, up to a certain size after which they rapidly deteriorate. This is strongly counter-intuitive to basic notions of software engineering such as modularisation.

The paper first demonstrates that a logarithmic distribution of faults with component complexity closely fits the observed data over a range of component sizes and languages up to around 200 lines or so (deemed medium size here), after which approximately quadratic behaviour is observed. The paper will review mitigating influences for this non-intuitive behaviour before concluding that none is really satisfactory. It then unites this complex behaviour in a simple mathematical model of the physiology of the human two-level memory system. The resulting component fault rate model accurately predicts the observed data for all languages in this study.
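Read naively (this is my own illustrative sketch with placeholder constants a, b, c, d, not the paper's fitted model), the abstract describes cumulative defects D growing with component size s roughly as

    D(s) \approx a \ln s + b      \quad (s \lesssim 200 \text{ lines})
    D(s) \approx c\,s^{2} + d     \quad (s \gtrsim 200 \text{ lines})

so the defect *density* D(s)/s falls as small components grow and climbs again for large ones, which is the U-shape the question above refers to. The paper's actual model derives this shape from the two-level human memory system rather than from curve fitting.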

If this is true, how should it affect the design of programming languages?


More expressivity == better?

That seems to be the conclusion. If you are going to get optimal fault rates around the 200-400 line component size, then you want those 300 lines to do as much as possible.

I don't believe it

I've heard this argument before... and I don't believe it. My functions tend to be short; few are much longer than 60 lines in a traditional language, or about 30-40 lines in a functional language. My code tends to be short, too; usually a good 20-40% shorter than your average competent programmer's. Usually, I don't end up finding many bugs after I'm convinced I have it right. The bugs that are found, I rarely have trouble finding. Often, I don't even need to go looking for them; I know exactly where to go in my code to fix them.

It's very hard to control for intangibles (or near-intangibles) in software engineering. I believe it's more important to look at how well the task has been broken down. If this is well done:

  1. Each function will be reasonably simple
  2. You won't have too many of them
  3. Your functions will compose well to accomplish many different tasks with a minimum of glue to do so.

If this isn't so well done, your code balloons in size... which leads to errors.
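To make point 3 above concrete, here is a tiny hypothetical Haskell sketch (my own example, nothing from the paper): each piece is small and does one job, and the final definition is just composition with essentially no glue.

    -- Hypothetical example: counting the distinct words in a string.
    -- Each function is small and single-purpose; the last is pure composition.
    import Data.Char (isAlpha, toLower)
    import qualified Data.Set as Set

    normalise :: String -> String
    normalise = map toLower . filter (\c -> isAlpha c || c == ' ')

    tokens :: String -> [String]
    tokens = words . normalise

    distinctWords :: String -> Int
    distinctWords = Set.size . Set.fromList . tokens

When the decomposition goes well, most code ends up looking like that last line: existing parts plugged together, with no ballooning.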

I don't believe most programmers are skilled at creating their own functions. They have the ability to use functions that somebody else gives them, but they lack the skill to create functions that model the task at hand.

You have to remember...

...that this is a statistical model, so it could be perfectly valid yet consistent with your personal experience. The model is showing the fat part of the bell curve, and you could be on a tail.

You are right.

I was responding to using this as an argument that one's functions should be 100-200 lines in length. And in fact, I got into a rather heated debate in school years ago with a software engineering geek who made this exact argument.

I made the mistake of assuming this paper was making the same statement... whereas it is simply trying to explain the observation. I suppose it goes to show how extraordinarily tricky it is to draw meaningful conclusions from statistics, especially conclusions that prescribe a course of action.

Yes, my functions tend to be

Yes, my functions tend to be very short also, but I don't have measurements which show that they're less buggy than longer ones.

One such principle which I subscribed to for many years is the belief that modularisation or structural decomposition is a good design concept and therefore always improves systems. This is so wide-spread as to be almost unchallengeable. It is responsible for the important programming language concept of compilation models which are either separate, with guaranteed interface consistency (e.g. C++, 'new-style' C, Ada and Modula-2), or independent (e.g. 'old-style' C and Fortran), whereby a system can be built in pieces and glued together later. It is a very attractive concept with strong roots in the "divide and conquer" principle of traditional engineering. Of course, the proof of any such concept relies on substantiation by the observation and measurement of real systems.

In fact, I'd really like to see a study where my preconceived notions were comfortably reinforced. Is there one available?

Mental Limits?

Assuming the above is generally true, you could make a reasonable argument based on complexity and the limits of human cognition. It could be the sweet spot between many highly connected small units and a few sparsely connected large units.
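To spell that trade-off out (my own back-of-the-envelope sketch with made-up constants, not anything from the paper): split N total lines into components of size s, charge a fixed glue/interface cost g per component, and let the cost of understanding a component's internals grow quadratically with its size. Then

    \text{effort}(s) \approx \frac{N}{s}\,g + \frac{N}{s}\,c\,s^{2} = \frac{gN}{s} + cNs

which is minimised at s* = \sqrt{g/c}: an interior sweet spot between many small, heavily interconnected units and a few large, internally complex ones.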

Proper input validation and error checking (good for removing the semantic gotchas that would appear as bugs) would probably push up the tipping point as well.

From a language-design POV, I think the lesson is a familiar one: make the semantics of a component as surprise-free as possible when you only have the interface to go by. Reading the implementation of dozens of components is unproductive.
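As a small, hypothetical illustration of "surprise-free if you only have the interface to go by" (my example, not the paper's): push the surprising case into the type, so a caller never needs to read the implementation to discover it.

    import qualified Data.Map as Map

    -- Surprising interface: this signature hides the fact that the key may be
    -- missing (the function would have to throw or return a dummy value):
    --   priceOf :: String -> Map.Map String Double -> Double

    -- Surprise-free interface: the "not found" case is visible in the type,
    -- so the interface alone tells you how to call it safely.
    priceOf :: String -> Map.Map String Double -> Maybe Double
    priceOf = Map.lookup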

Also from Les Hatton: Failure

Also from Les Hatton: Failure-aware programming: an introduction and some predictions

"There is an abrupt minimum for components of size 1. This may indicate support for systems built from effectively single line components for example, in functional languages, that they are the most reliable way of implementing a system, (for which the author confidently expects to be made the patron saint of functional languages)."

Apparently, in "Code Complete" (which, unfortunately, I do not have), section 5.5, Steve McConnell cites various studies which correlate routine size with cost, defect rate, and understandability.

e.g. "Routines averaging 100 to 150 lines of code need to be changed least."

See: Characteristics of High-Quality Routines, How Long Can a Routine Be?

The studies referenced appear to examine assembler and Algol-derived languages; maybe this explains the unexpectedly high recommended line count (we all know how verbose these languages are ;-)

Another aspect.

The studies referenced appear to examine assembler and Algol-derived languages; maybe this explains the unexpectedly high recommended line count

And they're all procedural languages without native support for garbage collection. I'd really like to see the studies repeated for a wider variety of languages and paradigms (Python, Java, Haskell, etc.).

Re: Another aspect

Yes, I agree. In languages like Factor, you just can't write a function that's 200 lines long. You can't even write a function that's 20 lines long. However, most functions are not independent components. While they can be used separately and are useful, they are heavily integrated with the surrounding code; they're not separate and in need of "gluing". I can't think of one example where I've needed to glue something together in Factor.

I think one important thing t

I think one important thing to keep in mind is that this report only shows a correlation, not any causality. It could very well be that people use smaller functions for difficult, fault-prone code and larger functions for more routine tasks.

Interesting Correlation

As noted above, the languages used by the papers referenced are all old traditional favourites (C/Pascal/Ada/etc). Interestingly, in more modern languages (Java/C++/Scheme/ML) I have found the sweet spot to be 20-40 lines per method, but 200-400 lines per class/module! That correlates nicely with the observation in this paper about functions in the older languages.

I'll also note that my personal experience suggests that things remain manageable (although not ideal) within an order of magnitude of the above figures, after which they rapidly become difficult to maintain.