R in the New York Times

The New York Times says Data Analysts Captivated by Power of R.

R is ... the name of a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.

Hmmm, "fine tune financial models". Does R stand for Recession?

More seriously, does data mining plus multi-core machines add up to an important language direction for the next few years? How well does R fare on such boxen?

More on R previously on LtU here and here.

R is a DSL

I think that R is an example of a powerful DSL. As I understand it, the higher-level the language, the better it describes the final intention of its user, and the easier it is to perform all sorts of optimization tricks under the hood.

I think tweaking the R runtime to perform better in a multicore environment is easier than doing the same for C or Java or Python.

R is great but slow.

It may have changed a lot in the 4 years since I used it at university, but used in the standard way it is bad for large data sets. A lot of it is written in Fortran. It does most processing in memory. However, I still use it, as most statistics can be done on smallish aggregated data sets anyway and memory is large enough now. It has everything statistical you could ever want unless you are a professor in stats, and most of them use it anyway. If it's not there, the profs normally create a module for it for others to use.

R and Perl

It is funny that R and Perl are the two most widely used programming languages in bioinformatics. The only reason I can imagine is that they provide something the field needs (text processing and statistical computing). I say funny because, in my view, both of them have quite obscure syntax and semantics, yet many bioinformaticians (especially those who come from a biology background) have to struggle to learn them. Poor them!

"R stand for Recession"

At first glance, I read it as "R stand for Regression".
Good joke ;-)

Great picture

There are a bunch of statistics textbooks that use R, and the article itself is apparently controversial within the R/S community.

But by far the best thing is the picture.

Whoever thought they'd see an illustration of static scoping in the Times!

Parallel R

For a complete treatment of the state of parallel processing in R see this as yet unpublished JSS article.

The authors are far more optimistic than I am about R's prospects. R's semantics guarantee virtually unlimited access to the executing environment, which would be hard to replicate across boxes. For example, R allows functions to call parent.frame(), which returns the environment from which the function was called (the environment itself, not a copy): dynamic scope on demand. Functions can then change that environment, adding variables, changing state, you name it. Mutability stands in opposition to parallelization, and R is crazy mutable.
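
To make the parent.frame() point concrete, here is a minimal sketch in R. The function names set_in_caller and compute are made up for illustration; only parent.frame() and assign() are base R. As far as I know, what comes back from parent.frame() is the caller's environment itself rather than a snapshot, so a write made by the callee is visible to the caller afterwards.

```r
# Hypothetical helper: writes a variable into whichever environment called it.
set_in_caller <- function(name, value) {
  caller <- parent.frame()              # environment of the calling function
  assign(name, value, envir = caller)   # mutate the caller's local state
  invisible(NULL)
}

# Hypothetical caller: it never assigns `total` itself,
# yet `total` exists in its frame after the call returns.
compute <- function() {
  set_in_caller("total", 42)
  total
}

result <- compute()
print(result)
```

A runtime that wanted to ship compute() to another box would have to assume that any callee might reach back into the caller's frame like this, which is the kind of thing that makes automatic parallelization hard.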

Even on a single box there is tons of non-reentrant C and Fortran code under the hood. Not that it couldn't be rewritten, but rather that it won't be.

R has a great community, but the community's collective strength is statistics, not algorithms or building concurrent systems. As someone who expects to be using R for years (largely against my will), I'm hoping for the best, but not expecting big improvements. At the same time, smart people are smart people. Perhaps I'm underestimating my colleagues.