Programming Language Popularity

Might as well complete the daily trifecta and post the article on Programming Language Popularity. The author combines search, advertising, and job data to try to measure various aspects of popularity. It's open to criticism, but the results are not very surprising given the weights applied, and coming up with a truly objective measurement is probably impossible.

In conclusion, if we look at the data available to us, especially as presented in the final, normalized chart below, we can see that there are broad patterns in language usage. Beyond the overall ranking, it is also possible to see whether a language is more used (jobs) or promoted (ads), and also whether it is used for open source projects, where presumably the participants have chosen a language because they feel it is truly the best choice, rather than dictated by management or commercial needs.
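To make the "weights applied" point concrete, here is a minimal sketch of how several sources might be normalized and combined into one ranking. It is not the article's actual method: the counts, the equal weights, and the function names are all made up for illustration.

```python
# All numbers below are hypothetical; none come from the article.
raw = {
    "search":      {"Java": 900, "C": 600, "Python": 300},
    "ads":         {"Java": 50,  "C": 10,  "Python": 5},
    "jobs":        {"Java": 400, "C": 300, "Python": 100},
    "open_source": {"Java": 200, "C": 400, "Python": 100},
}

# Hypothetical weights; changing these changes the final ranking.
weights = {"search": 0.25, "ads": 0.25, "jobs": 0.25, "open_source": 0.25}


def normalize(counts):
    """Scale one source so that its largest count becomes 1.0."""
    top = max(counts.values()) or 1  # avoid dividing by zero
    return {lang: n / top for lang, n in counts.items()}


def combined_scores(raw, weights):
    """Weighted average of the normalized per-source scores."""
    total_weight = sum(weights.values())
    scores = {}
    for source, counts in raw.items():
        for lang, share in normalize(counts).items():
            scores[lang] = scores.get(lang, 0.0) + weights[source] * share / total_weight
    return scores


scores = combined_scores(raw, weights)
for lang, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:8s} {score:.3f}")
```

Shifting weight from one source to another (say, from ads to open source) can reorder the languages, which is why the choice of weights largely determines the result.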

"Windows" is a language now?

Extra data point

Sheesh - it's an extra data point added out of curiosity.

See also..

screwed up

This survey is basically useless given the skew that Google results can add to it. Some early faults I see:

1. There are a number of factors affecting these search results that I don't think anyone knows how to discount. Take programming portals in which every page carries a menu linking to articles on each subject: if the site has 200 articles on VB and 15 articles on C, then both languages return 215 results, because the menu text "c programming" appears as a link on every page. (The question, of course, is whether Google actually indexed each of these pages, and the answer is beyond the surveyor's control.)

2. A number of the terms used mean widely varying things, or are not programming languages at all but programming specializations. Doug Orleans has already pointed out that Windows is not a programming language. We might also note that shell programming is not a language but a specialty or domain, and anyway, when you talk about shell programming, which shell is it? Does shell programming affect the results returned for Unix programming? Probably. What is interesting, of course, is that there is an interplay between shell programming and Windows programming as well.
The query 'windows "shell programming" explorer' returned 5,740 hits:
http://www.google.com/search?hl=en&lr=&ie=UTF-8&c2coff=1&q=windows+%22shell+programming%22+explorer&btnG=Search
Minus the explorer and with ' -bash -cygwin' it returned 82,000: http://www.google.com/search?hl=en&lr=&ie=UTF-8&c2coff=1&q=windows+%22shell+programming%22+-bash+-cygwin&btnG=Search
Obviously you could keep on whittling this down with -korn etc. etc.

Finally, if you examine the query "c programming" (http://www.google.com/search?hl=en&ie=UTF-8&q=%22c+programming%22&btnG=Google+Search),
the 7th result I got was http://devcentral.iticentral.com/, in which the only instance of the string "c programming" was an ad for a book on MFC programming (you'll find it in the cached version). There does not seem to be any method for weeding out false positives with this survey. Should we just guesstimate the rest of the way?

Other false positives:

"windows programming" -" x-windows programming" returns 183,000
".net programming" is going to have a lot of false positives: result #6 of that search, http://www.gustavo.net/programming/c__tutorials.shtml, has no mention of the .NET Framework on the page.

If we try the following search: ".net programming" -" microsoft .net "
we get 303,000 results, of which the second, http://www.gustavo.net/programming/, as seen in the cache http://216.239.59.104/search?q=cache:zlg4uKupyUQJ:www.gustavo.net/programming/+%22.net+programming%22+-%22+microsoft+.net+%22&hl=en
shows us that Google has returned the page because the string "gustavo.net - programming" appears in the page.
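A rough illustration of why a string like "gustavo.net - programming" can satisfy the phrase query ".net programming". This is only a sketch under the assumption that the search engine lowercases text and treats punctuation as word separators; how Google really tokenizes pages is not something I know, and the helper names are made up.

```python
import re


def tokens(text):
    # Assumption: lowercase everything and split on any non-alphanumeric run,
    # so "." and "-" simply disappear.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(phrase, page_text):
    """True if the phrase's tokens occur consecutively in the page's tokens."""
    needle = tokens(phrase)
    hay = tokens(page_text)
    n = len(needle)
    return any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))


# The page string quoted above matches, although it is not about .NET.
print(phrase_matches(".net programming", "gustavo.net - programming"))

# A string that is about .NET but words it differently does not match.
print(phrase_matches(".net programming", "Programming the Microsoft .NET Framework"))
```

Under that assumption, any domain ending in .net followed by the word "programming" counts as a hit for the phrase.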

You are ignoring the other data sources

I responded to a lot of comments in the OSNews thread, so I'll try to be brief here.

1) Maybe Google results should be seen as "visibility" rather than "popularity". That still counts for something. It's easier to tell your potential client that you're going to use this thing he's heard something about, unless you're a really good talker:-)

2) There are three other data sources that are not Google, and that provide more accurate results for the type of information sought there. Taking all 4 sources together does, in my opinion, give you some rough ideas.

3) It's evident to everyone that it's not as scientific as we would like. What is harder than complaining about that is coming up with useful ideas on how to refine and improve the results. I'm more than willing to take good ideas into consideration, or if you're the individualistic sort, feel free to go out and do something better. It's obvious that the results are of interest!

BTW, to repeat for those here wondering "where is XYZ?", I used the Overture data as the cutoff point. No one is paying for ads about Erlang, for instance, and jobs data is also hard to come by for these languages.

improving

"3) It's evident to everyone that it's not as scientific as we would like. What is harder than complaining about that is coming up with useful ideas on how to refine and improve the results."

Well, my main complaint about the use of Google is that a single-pass query does not allow you to ascertain that a result is in fact about what you're discussing. This gets magnified in the instances I pointed out. I am willing to agree that a "tcl programming" query is probably all going to be about Tcl programming, but .net programming? C programming? These skew the results.

Given the Google API, one could build something that refined the search.

I'm not sure how it should be refined; my initial ideas would involve the following:

1. Searches done at different times could be used to filter out the false positives contained within ads, such as the MFC programming one.

2. Doing a first pass with the search term, followed by a pass with the term plus additional negative terms, could provide results that hopefully restrict false positives, or be used to define a range.
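The second idea could look something like the sketch below. It only builds the two query strings and turns a pair of hit counts into a range; it does not call Google or the Google API, so the counts have to be read off the result pages by hand. The function names are made up, and the base count in the usage example is invented (only the 303,000 figure comes from the search above).

```python
from urllib.parse import urlencode


def build_query(phrase, exclude=()):
    """Quoted phrase followed by negative terms (multi-word terms are quoted)."""
    parts = [f'"{phrase}"']
    for term in exclude:
        parts.append(f'-"{term}"' if " " in term else f"-{term}")
    return " ".join(parts)


def search_url(query):
    """URL for running the query by hand in a browser."""
    return "http://www.google.com/search?" + urlencode({"q": query})


def hit_range(base_hits, refined_hits):
    """Treat the two hit counts as the bounds of a range, lowest first."""
    low, high = sorted((base_hits, refined_hits))
    return low, high


first_pass = build_query(".net programming")
second_pass = build_query(".net programming", exclude=["microsoft .net"])
print(search_url(first_pass))
print(search_url(second_pass))

# 500000 is an invented first-pass count, used only to show the range.
print(hit_range(500000, 303000))
```

The range is sorted rather than assumed to shrink, since the counts reported for a narrower query are not guaranteed to be smaller.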

Both of these, however, are unsatisfactory in that the Google search interface is just not rich enough to produce results we can be reasonably certain are about the programming language itself, as opposed to links to places that are about the programming language, or just letters etc. arranged on the page in such a way that the result has popped up erroneously. This is mainly a problem in the context of windows programming, .net programming, or c programming. Mostly, the Google results would function as a good list of URIs for a bot to consume, one able to bring more advanced checking into place to determine: hey, is this really about c programming?
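As a very rough sketch of what such a bot's check might do once it has fetched a page: count how often the phrase appears in the page text outside of links, so that a menu entry alone does not count. The class and function names and the threshold of two mentions are arbitrary assumptions, and a real checker would need far more than this.

```python
import re
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect page text, skipping anything inside <a>, <script> or <style>."""

    SKIP = {"a", "script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many skipped elements we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def mentions_outside_links(html, phrase):
    """Count whole-word occurrences of the phrase in non-link text."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).lower().split())
    pattern = r"(?<!\w)" + re.escape(phrase.lower()) + r"(?!\w)"
    return len(re.findall(pattern, text))


def looks_on_topic(html, phrase, min_mentions=2):
    # min_mentions is an arbitrary threshold chosen for illustration.
    return mentions_outside_links(html, phrase) >= min_mentions


menu_page = ('<html><body><a href="/c">C programming</a>'
             '<p>Two hundred articles about Visual Basic.</p></body></html>')
tutorial = ('<html><body><h1>C programming tutorial</h1>'
            '<p>This C programming guide covers pointers.</p>'
            '<a href="/more">more C programming</a></body></html>')

print(looks_on_topic(menu_page, "c programming"))
print(looks_on_topic(tutorial, "c programming"))
```

The whole-word match also keeps "MFC programming" from being counted as "c programming".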

About ads: I wonder if they are really good for checking the popularity of a language: http://www.java.net/cs/user/view/cs_msg/6301
In a case such as this, is knowing the language really a benefit or an actual requirement?

Define popularity.

Basically, what we have here is "how popular are some programming languages/environments on some widely-available internet resources". There are other ways to measure how popular programming languages are:

I actually see three 'worlds' of language use:

  • Academia, at the graduate/undergraduate level:
    • How many students learn X at the undergraduate level? (in the US/rest of the World/country to country)
    • How many papers does X generate?
  • Industry
    • Well, that means surveying ISVs and big software companies, but also what is used in house in other industries.
    • What jobs are available (like in this survey)
  • Open Source: pretty much covered here, although one could weigh projects by their relative size.
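Weighing open source projects by size, as suggested in the last point, could be sketched like this. The project list is entirely made up, and lines of code is just one possible stand-in for "relative size".

```python
from collections import defaultdict

# Hypothetical projects: (name, language, lines of code).
projects = [
    ("tiny-script", "Perl", 500),
    ("small-tool", "Perl", 1500),
    ("big-server", "C", 200000),
    ("mid-app", "Python", 20000),
]


def count_by_language(projects):
    """One vote per project, regardless of size."""
    counts = defaultdict(int)
    for _name, lang, _size in projects:
        counts[lang] += 1
    return dict(counts)


def size_weighted(projects):
    """Each project votes with its size (lines of code here)."""
    totals = defaultdict(int)
    for _name, lang, size in projects:
        totals[lang] += size
    return dict(totals)


print(count_by_language(projects))
print(size_weighted(projects))
```

With these invented numbers Perl leads when counting projects, while C leads once size is taken into account.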

Survey Methods

If I had the time, financial resources and inclination to undertake a survey of that magnitude, I'd probably aim to get paid for it, rather than give it away for free in order for people to make snide comments about "windows programming":-)

Infoworld survey

Infoworld has a related survey of developers that rates language popularity. Of course, IW is going to be slanted toward whatever its audience is composed of.