The long tail of programming languages

Charles Simonyi's Intentional Software blog has an interesting piece by Magnus Christerson, on what he calls the long tail of programming languages. It seems that (not surprisingly) programming languages exhibit the same power law popularity distribution that has been observed in so many other areas. Christerson's entry devolves into more of a marketing spiel for Intentional's products by the end, but he makes some interesting observations on niche and domain-specific languages along the way. If nothing else, the plot of programming language popularity ranking is interesting (although I find it a little hard to believe that more people program in Postscript than in Ocaml).

It's not that hard to believe

It's not that hard to believe there's more Postscript than Ocaml code out there though.

No argument there. However, a

No argument there. However, as the website from which the popularity statistics originated says:

The TIOBE Programming Community index gives an indication of the popularity of programming languages. The index is updated once a month. The ratings are based on the world-wide availability of skilled engineers, courses and third party vendors. The popular search engines Google, MSN, and Yahoo! are used to calculate the ratings. Observe that the TPC index is not about the best programming language or the language in which most lines of code have been written. [emphasis mine]

OK, I'll Bite...

Allan McInnes: If nothing else, the plot of programming language popularity ranking is interesting (although I find it a little hard to believe that more people program in Postscript than in Ocaml).

Why?

Anecdotal evidence

I'm assuming that you're questioning my comment about Postscript and Ocaml, not the assertion about the interesting nature of the plot. So, in answer to that question:

Why? Mostly because I seem to come across people who program in Ocaml much more often than I come across people who program in Postscript. I know there are some folks who write directly in Postscript, but they seem (to me) to be few and far between.

And yes, this is purely anecdotal evidence. That's why I said "I find it a little hard to believe..." as opposed to "their statistics are wrong...". If you have evidence that counters my impressions of the relative popularity of Ocaml and Postscript I'd be interested to hear it.

.postfix dislike might you why idea no have I


%!
/Times-Roman findfont
20 scalefont
setfont
newpath
84 524 moveto
(Writing PostScript directly can be very much fun.) show
84 504 moveto
(My favorite examples are RandomFonts such as:) show
84 484 moveto
(http://www.letterror.com/foundry/beowolf/) show 
84 464 moveto
(and http://cgm.cs.mcgill.ca/~luc/randomfont.html) show
showpage

Too silly?

--Shae Erisson - ScannedInAvian.com

Well, if your programs eventually want to print...

...you may sooner or later do some postscript. Postscript and O'Caml don't really overlap, as one is mostly a dedicated print formatting language, and the other is more general purpose. To me it's about as useful as asking why more people do SQL than do Java.

Clarification

My impression was that most Postscript was auto-generated these days, as opposed to hand-coded. Hence my surprise at Postscript being "more popular" than Ocaml. Although I suppose it depends on how you (or, more importantly, the people doing the survey) define "popularity". As Marc has already pointed out, their methods are a little suspect.

As an aside, I really didn't intend to have this turn into some kind of language pissing match. I don't even use Ocaml [edit: I think I have probably written more hand-coded Postscript than I have Ocaml. Of course, even that only amounts to a couple of dozen lines]. I was just surprised by some of the relative rankings (with PS/Ocaml being the one that most caught my attention).

PostScript code for fractal drawing

you may sooner or later do some postscript

Here is a PostScript program for drawing fractals that I wrote many years ago. You can send it directly to the printer. Be careful if you use it for drawing very intricate fractals, since it can use a lot of processing time on the printer! The source code has some examples of other fractals (just uncomment them and have fun). The data for the fractals is compatible with the FractaSketch program.

I believe it

Although it does get used for some commercial work, OCaml is still mostly a
dabbler language. Many people use it for small tasks because it's more soothing
to the soul than C++. But I'd argue that OCaml is a language without a true
niche.

Postscript, on the other hand, is at the foundation of a huge industry. Anyone
working on applications that generate pretty output digs into Postscript. And,
believe it or not, there's a good number of (crazy!) people who like to work
directly in Postscript, because it's more precise and more direct than using an
object-based editing program.

Similarly, I'd believe that more people program in Forth than OCaml. Forth has
a fairly poor internet presence, but it gets used in high volume embedded
applications. Odds are you're familiar with something that's been written in
Forth (though you wouldn't know it), but the same can't be said of OCaml.

Invisible Forth

Open Firmware alone pretty much guarantees that the last point is true.

Dodgy data = dodgy theories

It is interesting to note that the evidence presented for this piece is the Tiobe survey.

(I thought we had covered this survey in an earlier thread, but was unable to find it via search or google.)

Here is how they get their results:
The search query '+"<language> programming" -tv' is used to calculate the TPC Index. This query is executed for the regular Google, MSN, and Yahoo! web search and the Google newsgroups for the last 12 months. The formula that is applied is #(normalized Google web hits) + #(normalized MSN web hits) + #(normalized Yahoo! web hits) + #(normalized Google newsgroup hits). The term "normalized" means that the sum of all web hits of the first 50 languages for a query is taken and evenly distributed.

Wow, that seems guaranteed to come up with solid results. ;-)
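For what it's worth, the quoted formula can be sketched in a few lines of Python. This is only my reading of it: I am assuming "normalized" means each language's hit count divided by the total hits for all ranked languages on that search engine, which the quoted text does not state precisely. The function name, engine labels, and hit counts are all made up for illustration.

```python
def tpc_ratings(hits_by_engine):
    """Sum each language's share of hits across search engines.

    hits_by_engine maps an engine name to {language: hit_count}.
    ASSUMPTION: "normalized" is read as hits / total hits on that engine.
    """
    ratings = {}
    for engine, hits in hits_by_engine.items():
        total = sum(hits.values())
        if total == 0:
            continue  # an engine with no hits contributes nothing
        for language, count in hits.items():
            ratings[language] = ratings.get(language, 0.0) + count / total
    return ratings


if __name__ == "__main__":
    # Hypothetical hit counts, not real survey data.
    sample = {
        "engine_a": {"Postscript": 300, "Ocaml": 100},
        "engine_b": {"Postscript": 50, "Ocaml": 150},
    }
    for language, rating in sorted(tpc_ratings(sample).items()):
        print(language, round(rating, 3))
```

Note that under this reading the rating measures nothing but how often the phrase turns up, which is exactly the weakness discussed here.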

Particularly strange is that Javascript has actually dropped in April (according to their methodology), in spite of the fact that AJAX has been a frenzied topic of late.

But this isn't that surprising: how likely is the exact phrase "X programming" to show up in most postings about language X?

This methodology may give some vague idea of language popularity, but to actually use it as the basis for technical decisions, or for dodgy memes about "where the money is" for PL development, seems like a bad idea to me.

Note to self

Note to self: avoid editorial comments when posting articles to LtU. I posted this item because I thought the power law distribution aspect was interesting, particularly as it applies to the "under the radar" nature of many of the languages that get discussed here (just what does constitute "under the radar" on that curve?). And yet the majority of the comments so far have focused on my comment expressing surprise regarding the relative rankings of PS/Ocaml. Guess that'll teach me to open my big mouth...

Power Law comparisons?

Wikipedia's article about the Power Law links to a blog post about the Power Law in the blogosphere.

What can I say? It seems to reinforce the Explorers, Pioneers, Townsfolk comparison mentioned here recently.

Maybe you could use this sort of data to support the idea that most people are Townsfolk about everything? A more interesting study might compare whether people fit into the same part of the power law curve in many parts of their life.

For an on-topic study, what about comparing the features present in the languages to see if that can predict power law ranking?

Maybe popularity is a function of the notable people in a community rather than features?
Can you predict where Fortress will end up being ranked?

Can you use these measurements to design the academic equivalent of a manufactured boy band?


What do you think is interesting about the power law distribution?

--Shae Erisson - ScannedInAvian.com

Interest in power law distribution

For an on-topic study, what about comparing the features present in the languages to see if that can predict power law ranking?

Actually, I think one of the interesting things we can learn from the power law distribution of languages is that looking at features as a way to predict ranking won't really tell us a whole lot. As Clay Shirky points out in talking about the blogosphere, power law distributions tend to arise as a result of network effects in a system where many individuals are presented with a large range of choices. While features may play a part, they are probably not the dominant force in shaping language popularity. Which perhaps goes a long way towards explaining why "better" languages like Haskell and Oz are not popular, and perhaps helps to answer the question "why do they program in C++?" that was raised in another forum topic here.
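To make the network-effect argument concrete, here is a minimal Python sketch of a "rich get richer" process. It is not Shirky's model; I am substituting a Simon-style process in which each new programmer either starts using a brand new language (with a small probability) or picks an existing language with probability proportional to its current user count. The function name, parameter names, and default values are all assumptions chosen for illustration. Note that language features appear nowhere in it.

```python
import random


def simulate_adoption(n_programmers=10_000, p_new_language=0.01, seed=0):
    """Return per-language user counts, most popular first.

    ASSUMPTION: choices depend only on existing popularity, never on
    any property of the language itself.
    """
    rng = random.Random(seed)
    users = [1]  # start with a single language that has one user
    for _ in range(n_programmers):
        if rng.random() < p_new_language:
            users.append(1)  # someone adopts a previously unused language
        else:
            # Pick an existing language, weighted by its current user count.
            chosen = rng.choices(range(len(users)), weights=users)[0]
            users[chosen] += 1
    return sorted(users, reverse=True)


if __name__ == "__main__":
    counts = simulate_adoption()
    print("languages:", len(counts))
    print("top five user counts:", counts[:5])
    print("bottom five user counts:", counts[-5:])
```

Processes of this kind are known to produce heavy-tailed popularity curves, so a skewed ranking on its own says little about the relative merits of the languages involved.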


Can you predict where Fortress will end up being ranked?

Can you use these measurements to design the academic equivalent of a manufactured boy band?
I wouldn't suggest using the results in the linked survey for any kind of prediction. The actual rankings seem like they may not be very accurate due to the nature of the survey. And, as I said above, design may have very little to do with popularity. Perhaps you could use this information to help you design a marketing campaign for a language, but I doubt it would help you design the language itself.

What do I find interesting about the power law distribution shown in the Tiobe survey? Here are a few things:
  • For starters, the fact that it exists at all. Which seems to indicate something other than "features" as a popularity driver for languages.
  • The way that the power law ties into previous discussions here about languages "just under the radar", and the pressures on "popular" languages to support certain features (Shirky points out similar pressures on "A-list" bloggers to become more like mainstream media outlets).
  • The fact that, due to the scale-free nature of a power law distribution, the notion of an "under the radar" language doesn't have any real meaning (where do you draw the line?)
  • The possibility of homeostasis in systems subject to the kind of feedback that produces power law distributions, and what that says about the difficulties faced by new languages in getting adopted.
  • The possibility that the global nature of the internet will allow niche languages that might previously have died out to thrive and survive, in the same way that Amazon can stock "long tail" books.
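The scale-free point in the list above can be checked numerically. The sketch below assumes an idealized curve of the form rating = scale * rank^(-exponent); the scale and exponent values are arbitrary placeholders, not fitted to the survey. For such a curve, doubling the rank shrinks the rating by the same factor no matter where on the curve you start, so no rank stands out as a natural cutoff.

```python
def power_law(rank, scale=100.0, exponent=1.0):
    """Idealized popularity at a given rank (hypothetical parameters)."""
    return scale * rank ** (-exponent)


def ratio_when_rank_doubles(rank, exponent=1.0):
    """How much the rating shrinks going from `rank` to `2 * rank`."""
    return power_law(2 * rank, exponent=exponent) / power_law(rank, exponent=exponent)


if __name__ == "__main__":
    # The printed ratio should be the same (up to rounding) at every rank.
    for rank in (1, 5, 25):
        print(rank, ratio_when_rank_doubles(rank))
```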

Power Law Distribution

I thought the power law distribution aspect was interesting

There is a key fallacy in the "long tail of distribution" reasoning: the claim that because the long tail is numerically superior in aggregate, it is where the opportunities lie.

This is sort of true, but to make it work for you (in business or in research), you have to be able to cover the WHOLE tail at once, or pick just the right mote in the tail that is headed for greatness.

The long tail itself tells you nothing other than that you have a diverse population. Stephen Jay Gould wrote a whole book about this: Full House.

just what does constitute "under the radar" on that curve?

An interesting test case for the whole "under the radar sweet-spot" argument. If this principle worked, you should be able to pick your distribution cut offs (upper and lower), and some unusually high percentage of languages there should have some "good qualities" as per the sweet-spot.

Am I alone in thinking this is unlikely? ;-)

Re: Note to self

I posted this item because I thought the power law distribution aspect was interesting

I posted OO runtime graphs are scale-free for the same reason: it may not be an earth-shattering observation that object graphs exhibit a power-law distribution, but it certainly is a mildly interesting piece of trivia. Another piece of power-law trivia: On the Pareto distribution of Sourceforge project [PDF, 141K]. The long tail of software was mentioned here: http://bnoopy.typepad.com/bnoopy/2005/03/the_long_tail_o.html.

Bad Statistics and Silly Conclusions

Almost any set of nominal data can be ordered in such a manner as to display an apparent power law distribution. The given data here could as easily have been arranged in an apparent normal distribution, for example. [Try it: select the highest value and make it the mean; select the two lower values and place them symmetrically around the first value; continue 'til done.]

A true power law distribution requires that the domain be defined on an interval scale. In other words, the absence of an interval scale on the X-axis tells us that it makes no sense to speak of power law distributions (indeed of any particular distribution).
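The "try it" rearrangement described above is easy to sketch in Python. The function name and the sample ratings are hypothetical; the only point is that the same values, placed in a different order along a nominal axis, trace out a bell-like shape instead of a long tail.

```python
def arrange_as_bell(values):
    """Place the largest value in the middle and alternate the rest outward."""
    ordered = sorted(values, reverse=True)
    left, right = [], []
    for i, value in enumerate(ordered[1:]):
        # Alternate sides so each side falls away from the peak.
        (left if i % 2 == 0 else right).append(value)
    return left[::-1] + ordered[:1] + right


if __name__ == "__main__":
    ratings = [20, 1, 9, 3, 5, 2, 14]  # made-up numbers, not survey data
    print("ranked:  ", sorted(ratings, reverse=True))
    print("as bell: ", arrange_as_bell(ratings))
```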

The chart

Ruby seems to be missing. I can't imagine that it's not in the top 50 programming languages.

I agree with others about the crappy statistics.

Ruby is there

between bash and Tcl/Tk