Python and Scientific Computing

This interesting blog post argues that in recent years Python has gained libraries that make it the language of choice for scientific computing (primarily over MATLAB and R).

I find the details discussed in the post interesting. Two small points occur to me. First, in several domains Mathematica is still the tool of choice; from what I could see, nothing free, let alone open source, is even in the same ballpark in these cases. Second, I find it interesting that several of the commenters mentioned IPython. It seems to be gaining ground as the primary environment many people use.

The platform vs. the language

As a long time R user, one of the things that has kept me using R is not so much the language, but the platform. It is the wide variety of statistical tools available combined with a relatively comfortable language to stitch them all together that works for me.

Having seen some of the interesting developments on the PL front on the Java platform (Clojure, Scala, etc.) and on the .NET platform (C#, F#, etc.), I wonder whether a similar thing may end up happening with R. From a design point of view, the R package system is less modular than the class libraries in Java and .NET, and that may pose some problems.

It is interesting to compare R (and of course S and S-Plus, the languages from which R evolved) to the statistical analysis languages that preceded it, such as SAS and SPSS. These older languages were effectively just sequences of commands, with some limited flow control and macro programming capabilities. The S/R/S-Plus family of languages represented a major step forward in bringing more advanced programming concepts to statistical programs. S/R/S-Plus have proper first-class functions, a number of different forms of "object-oriented" style programming where the function called is determined by the data type passed, and numerous other "advanced" features.
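To make the two features mentioned above concrete, here is a rough sketch in Python rather than R (since Python is the language the post is about). It uses `functools.singledispatch` as a loose analogue of S3-style dispatch, where the function that runs is chosen by the type of the argument, and it passes a function around as an ordinary value. The names `summary` and `apply_to_all` are made up for illustration and are not taken from any R or Python library.

```python
from functools import singledispatch


@singledispatch
def summary(x):
    # Fallback used when no more specific version is registered for type(x).
    return f"object of type {type(x).__name__}"


@summary.register
def _(x: list):
    # Version chosen when the argument is a list (treated as a numeric vector).
    if not x:
        return "empty list"
    return f"list of {len(x)} values, mean {sum(x) / len(x):.2f}"


@summary.register
def _(x: dict):
    # Version chosen when the argument is a dict (treated as a small table).
    return "table with columns: " + ", ".join(x)


def apply_to_all(f, items):
    # Functions are first-class values: f is passed in like any other argument.
    return [f(item) for item in items]


if __name__ == "__main__":
    for line in apply_to_all(summary, [[1, 2, 3], {"a": 1, "b": 2}, 3.5]):
        print(line)
```

The caller only ever writes `summary(x)`; which body runs depends on the data type passed, which is the style of "object-oriented" programming described above.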

When you say platform do you

When you say platform do you mean set of available libraries or do you mean runtime system?

Probably libraries.

I assumed platform meant libraries rather than runtime, though of course I like to hear about runtime stuff. :-) By the word platform, most folks mean "what I can invoke" in available interfaces, as opposed to "how I can invent" in available infrastructure.

Yarkoni's blog post (on Python eating other languages' lunch, especially R's) seems to be mainly about more libraries appearing in Python to do what he previously did in R, which indicates a very healthy Python ecosystem. A desire to use one common tool when feasible seems perfectly rational.

Most languages seem capable of roughly similar feats, so when it comes to language choices I no longer expect much difference in what gets done, only in how. Crossing boundaries between systems is a source of cost and complexity, so there's good reason to stick with one PL as long as the cost/benefit analysis works out. (Sorry if this sounds banal when it's so close to common consensus.)

Almost all the design hobby work I do lately is about libraries affecting runtime, in a way mostly orthogonal to what you would do in a language on top. So there's almost no cause for me to discuss it here. Except I think there might be more things you can do in a PL if the runtime abstracts the OS, so it's feasible to do things we normally say only the OS does. But parity remains in what can be done in any PL when features are language agnostic.

Yeah, I figured the claim was

Yeah, I figured the claim was about libraries. But of course Yarkoni's post is mostly about Python+libraries compared to R+libraries.

I did mean the set of

I did mean the set of libraries, rather than the runtime system.

So you disagree with the

So you disagree with the claim that for most scientific tasks Python libraries are getting to be as good, even if they are not there yet?

For me personally ggplot2 is

For me personally ggplot2 is still far superior to matplotlib. In other areas Python is getting there. Python is a pleasant language, and it is especially nice to stay in one language, rather than to have to export your data in language A to a file which is then read in language B to plot it, so I'm all for Python becoming the best tool in all areas.

ggplot for python.

Very nice, I'm going to try

Very nice, I'm going to try that out.

so I'm all for Python

so I'm all for Python becoming the best tool in all areas.

Except for programming, of course :)

true in bioinformatics

which makes me a teensy bit sad since i'm more of a static typing bigot. inside story from one place was that they started off with scala, but then got over-ridden since there's so much more in the way of desired 3rd party libraries for biotech in python, and since a lot of the non-cs-yet-programmer types were coming from a python background more than a statically typed one (let alone one that would be comfy with scala i guess).

How useful and/or practical

How useful and/or practical is static typing (or strong typing?) in exploratory data analysis work? Not sure.
I recall some work on Haskell libraries for bioinformatics. I don't think it went very far.

i meant 'bioinformatics' in the widest sense

including: exploratory data analysis work; web sites to deal with biological samples; systems to correlate dna to predictions about getting diseases; controlling robots to do work on physical samples; etc.

also, if somebody came to me and said "my brand-new new-fangled algorithm-code i just whipped up here says you will die in 40 seconds (after lifting off from Kourou, French Guiana)... oh, wait, i screwed up the units here, hold on..., er, make that 40 years. i think. sorry, my bad!" i'd be annoyed. :-)
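A small sketch of how types can catch that kind of unit mix-up, written in Python since that is the language under discussion. The `Seconds` and `Years` wrappers and the `time_remaining_message` function are hypothetical names invented for this example. With the annotations below, a static checker such as mypy would reject passing `Seconds` where `Years` is expected; the explicit `isinstance` check is an added assumption so the error also shows up at run time without a checker.

```python
from dataclasses import dataclass

# Julian year of 365.25 days, used here only as an illustrative constant.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


@dataclass(frozen=True)
class Seconds:
    value: float


@dataclass(frozen=True)
class Years:
    value: float


def years_to_seconds(y: Years) -> Seconds:
    # The only sanctioned way to go from one unit to the other.
    return Seconds(y.value * SECONDS_PER_YEAR)


def time_remaining_message(t: Years) -> str:
    # Refuse anything that is not explicitly tagged as years.
    if not isinstance(t, Years):
        raise TypeError(f"expected Years, got {type(t).__name__}")
    return f"{t.value:g} years"


if __name__ == "__main__":
    print(time_remaining_message(Years(40)))
    try:
        time_remaining_message(Seconds(40))  # the unit mix-up
    except TypeError as err:
        print("rejected:", err)
```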

Symbolic vs. Numeric work

From what I can tell from people doing the sort of work I am mostly interested in, Mathematica is (still) the tool of choice for exploratory, conceptual, mostly symbolic work. Like many others, I find this troubling, and it can complicate life even in an academic setting. For example, I am still looking for a non-Mathematica alternative to Dynamo, an evolutionary game theory toolkit, for a seminar I am giving next semester.

Another thing that concerns me is this notebook industry. I wasn't particularly inspired by the browser-based model, though I haven't explored this too deeply. But as some mentioned in these discussions, a model like ESS for R is more appealing. With org mode, that now seems a better approach for enabling reproducible exploratory research, right?