Metrics for Programmatic Complexity?

Are there any current metrics available for programmatic complexity? For example, there are computational complexity classes for different types of programs, and big-O notation for the runtime complexity of different algorithms, but is there a measure of how hard it is to actually program a given task in a given system?

For example, let's say the task is converting a list of numbers representing miles into kilometers. In C++, the code would be something like

float *miles_to_kilometers(float *mile_list, int length) {
  float *kilo_list = new float[length];
  for (int i = 0; i < length; i++) {
    kilo_list[i] = mile_list[i] * 1.61;
  }
  return kilo_list;
}

In Perl, it would be something like:

sub miles_to_kilometers {
  my @result = map { $_ * 1.61 } @{$_[0]};
  return \@result;
}

My intuition would be that the Perl version would have a lower programmatic complexity because the loop is built-in rather than explicitly coded. Is there a metric available that evaluates such propositions?

I'm not aware of many good

I'm not aware of many good generic metrics, but a custom language-to-language normalization can work. That means looking at language A and language B and coming up with a single unit of complexity that can be measured in both. You have to make logical but mostly informal arguments about how constructs in each language map to that single unit. This is what I did in my dissertation, which compared complexity between Java and SuperGlue (the language described in the thesis). You can check out the evaluation section of my dissertation if you are interested in the gruesome details (off of http://lamp.epfl.ch/~mcdirmid).

LoC

I like Lines-of-Code as measured by wc -l as a metric of program complexity. Haskell scores well on this example:

milesToKilometers = map (* 1.61)

My favorite metric is

My favorite metric is described in Are Ours Really Smaller Than Theirs? where Haskell shines even more. ;)

More snippets

Y'know on second glance currying doesn't bring that much to the table in this basic example. Compare with the only slightly less nice Erlang:

miles_to_kilometers(L) -> [N * 1.61 || N <- L].

Lisp is a bit wordier:

(defun miles-to-kilometers (list)
  (mapcar (lambda (n) (* n 1.61)) list))

but Smalltalk could be better since collections understand the message * so there's no need to map/collect:

Collection>>milesToKilometers
  ^self * 1.61

And if we accept this for awk it's hard to beat:

{ $1 * 1.61 }

But C++ just sucks it would seem. :-)

Code complexity

The "most meaningful" metric I can think of is the laborious, costly and time-consuming method of taking a not too small random sample of programmers of different experience levels and watching how each of them performs at a predefined task* using his or her preferred language. NASA once tried something like this: http://www-aig.jpl.nasa.gov/public/home/gat/lisp-study.html

I suspect that if you use the search function you might find that there have already been some discussions on this subject.

*which should be not too easy, but not too complex either, since it should probably be finished in finite time.

Concerning the miles_to_kilometers example:
A more idiomatic form of C++ would use the STL like this

float *miles_to_kilometers2(float* mile_list, int length) {
  float *kilo_list = new float[length];
  transform(mile_list, mile_list + length, kilo_list, bind1st(multiplies<float>(), 1.61f));
  return kilo_list;
}

Or using the boost lambda library one could write

float *miles_to_kilometers3(float* mile_list, int length) {
  float *kilo_list = new float[length];
  transform(mile_list, mile_list + length, kilo_list, _1 * 1.61);
  return kilo_list;
}

which makes it look much more like the Perl example. Also, it might be preferable to use some container class (STL) instead of a raw array.

A few very subjective remarks from my side:
Since Perl has a for-loop construct, I suspect there are also quite a few Perl programmers out there who would implement the above function using for, so this might say more about the programmer's experience than about the language (though I suspect it takes much longer to reach an adequate level of experience in C++ than in Perl).

There are a few other things apart from for vs. map, which might be indicative of complexity:

  • Memory management is one, of course. In C/C++ it is probably more natural to modify the array in place instead of returning a modified copy (it is an imperative language, after all), and to let the caller of the function take care of creating a copy of the array when he doesn't want it modified. Otherwise one either has to pray that the caller later delete[]s the container, or one should think of using a smart pointer, or the Boehm GC, or ...
  • Using bind1st and the STL algorithms properly isn't easy at first, especially since a slight typing error might lead to quite incomprehensible error messages (especially when templates are involved).
  • For me, a source of complexity in the Perl example is that one has to create references and dereference variables manually, and that the dereferencing operators are different for arrays, hashes and scalars. This happens twice in the example, once in "@{$_[0]}" and once (probably for performance, I don't know Perl well enough) in "return \@result;". It would be nice if it weren't necessary to take care of this. Perl's handling of references can get especially ugly when one has to deal with more complex data structures.

And while I don't agree with Luke's use of LoC as a metric of complexity, I see nothing but sheer elegance when looking at that Haskell function: no hassle with memory management or with dereferencing of variables or any other annoyances. This fits nicely with my personal experience that the languages of the ML family tree are quite good at keeping unnecessary complexity away.

Now I have to go,
happy new year!

The more I think about it

The more I think about it, the more it seems that big O can be used to indicate the complexity of the actual coding task. An O(n) algorithm like your example is pretty simple. O(n^2) has the complexity of coding nested for loops. An O(n log n) algorithm has the complexity of coding a tree structure, etc. But maybe you meant something different?

What I deal with at my work is coding complex business rules. Often these can be presented as some quite obscure DFSMs. Since any regular language can be recognized by an FSM, I suppose the 'complex business rules' class of programming task would have the same complexity as coding an FSM. So what's the big O for the execution of an FSM? Constant time?
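
For what it's worth, a table-driven FSM makes the cost easy to see: each input symbol is one table lookup, so a run over n symbols is O(n), i.e. constant time per transition. A minimal sketch (the DFA and names here are my own, purely for illustration):

```cpp
#include <string>

// Hypothetical DFA over the alphabet {a, b} accepting strings with an
// even number of a's.  Each symbol costs one array lookup: constant
// work per transition, linear in the length of the input.
bool accepts_even_as(const std::string& input) {
    // transition[state][symbol]: state 0 = even a's (accepting), 1 = odd
    const int transition[2][2] = {
        {1, 0},  // from state 0: 'a' -> 1, 'b' -> 0
        {0, 1},  // from state 1: 'a' -> 0, 'b' -> 1
    };
    int state = 0;
    for (char c : input)
        state = transition[state][c == 'a' ? 0 : 1];
    return state == 0;
}
```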

Not really

This isn't really the case, because in any given language, the feature you're talking about, even at O(n^2), may be a single primitive in the language.

For example, a configuration file controls even larger swaths of programmatic complexity, yet configuration files are usually themselves very simple.

So, it is really primitive-dependent.

Does a language's standard library count?

I wouldn't determine a language's complexity based on included library functions, with the exception of functions that would be difficult to implement within the language proper. The reason is that if particular functions (such as map in Perl) are that much of a timesaver, then many programmers will either have their own bag of tricks or create them on the fly for larger projects. The C++ example could be rewritten using a C++ version of map(). Of course, for very short examples, the C/C++ version would still be more complex since you have to include your own libraries. Also, as a counter to my above point, the "flavor" of the included standard library affects the flavor of the code that is generally written in that language, so the typical code in that language may be more or less complex.

Of course, complexity has a number of other faces to it. Certain items may be dirt simple for one programmer but may not be obvious to another. So you end up with different metrics when dealing with the lone coder vs. a group. So if you are working on a code segment that others may have to maintain, then you'll end up with more code (making it more complex to you), but it may be easier for less experienced people to work with. But I may be confusing complexity with productivity, so this may not apply at all.

The Library

I would say that the library does count, though the size of the library also creates complexity, as does the potential (mis-)interaction of its parts.

The point is to see what the complexity is of a _task/goal_ (not the language) given a set of starting primitives.

Off the cuff, with no research and no clue, I would guess at the following for a somewhat arbitrary assignment of coding complexity:

For the simple integration of non-selective, non-looping primitives, the complexity is C, where C is the number of primitives used (including repeats, though probably it should count input and output parameters as well).

For a branching or looping task, the complexity is C^n, where C is the number of primitives used and n is the number of nested branches or loops.
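
Just to pin the guess down, the two cases above could be written as one hypothetical scoring function (this is a sketch of the off-the-cuff metric only; the name is mine):

```cpp
#include <cmath>

// Guessed coding-complexity score from the comment above:
//   straight-line composition of C primitives scores C,
//   nesting n levels of branches or loops scores C^n.
// Note the cases agree at n = 1, since C^1 = C.
double coding_complexity(int primitives, int nesting_depth) {
    return nesting_depth == 0
        ? static_cast<double>(primitives)
        : std::pow(static_cast<double>(primitives), nesting_depth);
}
```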

Cyclomatic Complexity

If my only spec is "converting a list of numbers representing miles into kilometers", I think I might write those two examples as...
template <class T> list<T> mi_to_km(list<T> mi)
{
    list<T> km;
    transform(mi.begin(),mi.end(),back_inserter(km),
              bind1st(multiplies<T>(),1.61));
    return km;
}
...and...
sub mi_to_km { map $_*1.61, @_ }
...you might also be interested in cyclomatic complexity.
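
For reference, cyclomatic complexity boils down to a count on the control-flow graph, M = E - N + 2P (edges, nodes, connected components). A minimal sketch, with a hypothetical function name:

```cpp
// McCabe's cyclomatic complexity of a control-flow graph:
//   M = E - N + 2P
// A branch-free function scores 1; each independent branch point
// adds one more linearly independent path.
int cyclomatic_complexity(int edges, int nodes, int components) {
    return edges - nodes + 2 * components;
}
```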

cyclomatic complexity

Is that related to the set of programs that can be written in fewer characters than the word cyclomatic? If so then {$1*1.61} in awk qualifies for this one. :-)

(Please excuse my unusual sense of fun tonight. :-))

Excellent!!

Cyclomatic complexity appears to be very much along the lines of what I am looking for. Thanks for the info!

Complexity?

Are we talking about how complex a program is to read, to write, or to specify?

Read

This kind of metric is applied to code. At that stage your concerns are debugging, testing, and maintaining it. So I would say "read".

(note: you could also look at implementation complexity, but nobody mentioned it; the answer would be "run")

Actually Coding

I'm looking at the complexity of actually coding.