The R Project

One of my best friends is a Ph.D. student in a well-respected geology department, and an avid R user. He informs me that the programming language "R" is highly fashionable in his department, and is increasingly popular across his field.

I mention this because R encourages functional programming, and I have not heard it mentioned on LtU. Here is a quote from the language manual:

R is a system for statistical computation and graphics. It provides, among other things, a programming language, high level graphics, interfaces to other languages and debugging facilities. This manual details and defines the R language.

The R language is a dialect of S which was designed in the 1980s and has been in widespread use in the statistical community since. Its principal designer, John M. Chambers, was awarded the 1998 ACM Software Systems Award for S.

The language syntax has a superficial similarity with C, but the semantics are of the FPL (functional programming language) variety with stronger affinities with Lisp and APL. In particular, it allows “computing on the language”, which in turn makes it possible to write functions that take expressions as input, something that is often useful for statistical modeling and graphics.

It is possible to get quite far using R interactively, executing simple expressions from the command line. Some users may never need to go beyond that level, others will want to write their own functions either in an ad hoc fashion to systematize repetitive work or with the perspective of writing add-on packages for new functionality.
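To make "computing on the language" a little more concrete, here is a minimal sketch of a function that receives an expression rather than just its value. The helper name describe is my own invention for illustration; substitute, deparse, eval and parent.frame are standard base R functions.

```r
# Hypothetical helper: report both the source text and the value of an expression.
describe <- function(expr) {
  code <- substitute(expr)              # capture the unevaluated argument expression
  value <- eval(code, parent.frame())   # evaluate it in the caller's environment
  list(text = deparse(code), value = value)
}

x <- c(1, 4, 9)
res <- describe(sqrt(x) + 1)
res$text    # "sqrt(x) + 1"
res$value   # 2 3 4
```

This is roughly the trick that lets plotting and modeling functions label their output with the expression the user typed.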

Here is another quote from the FAQ:

The design of R has been heavily influenced by two existing languages: Becker, Chambers & Wilks' S and Sussman's Scheme. Whereas the resulting language is very similar in appearance to S, the underlying implementation and semantics are derived from Scheme.

The upshot is that "S" resolves free variables against global variables only (as in C, there is just local and global scope), while "R" is lexically scoped. I applaud R for getting this right, but the FAQ makes an interesting counterpoint that I paraphrase here:

Nested lexically scoped functions also imply a further major difference. Whereas S stores all objects as separate files in a directory somewhere (usually .Data under the current directory), R does not. Having everything in memory is necessary because it is not really possible to externally maintain all relevant environments of symbol/value pairs. This difference seems to make R faster than S.

The down side is that if R crashes you will lose all the work for the current session. Saving and restoring the memory images can be a bit slow, especially if they are big. In S this does not happen, because everything is saved in disk files and if you crash nothing is likely to happen to them. (In fact, one might conjecture that the S developers felt that the price of changing their approach to persistent storage just to accommodate lexical scope was far too expensive.)
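A small sketch of what nested lexically scoped functions look like in practice may help here. The name make_counter is hypothetical; the point is only that each returned function carries its own in-memory environment of symbol/value pairs, which is the kind of thing the FAQ says is hard to maintain in external files.

```r
# Hypothetical example: each call to make_counter() creates a fresh environment.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # updates `count` in the enclosing (defining) environment
    count
  }
}

a <- make_counter()
b <- make_counter()

first  <- a()   # 1
second <- a()   # 2
other  <- b()   # 1, because b has its own `count`
```

The variable count is not global and is not visible from the top level, yet it survives between calls because the inner function keeps a reference to the environment in which it was defined.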

Other than scope, R tries to be as close to S as possible. I'll end with two amusing remarks from the introduction:

Warning: for() loops are used in R code much less often than in compiled languages. Code that takes a `whole object' view is likely to be both clearer and faster in R.
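As a rough illustration of the "whole object" view, here is the same computation written with an explicit loop and then as a single vector expression. The data and variable names are made up for the example.

```r
x <- c(1.5, 2.0, 3.5, 4.0)

# Explicit loop, as one might write it in a compiled language
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Whole-object version: arithmetic applies elementwise to the vector
squares_vec <- x^2

# Logical indexing replaces a loop with an if inside it
total <- sum(x[x > 2])   # 3.5 + 4.0 = 7.5
```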

Note that any ordinary assignments done within the function are local and temporary and are lost after exit from the function. Thus the assignment X <- qr(X) does not affect the value of the argument in the calling program.
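Here is a minimal sketch of that last remark, using the same X <- qr(X) assignment. The function name decompose and the matrix M are assumptions of mine; qr and diag are base R.

```r
# Hypothetical function: reassigns its argument locally, then returns the rank.
decompose <- function(X) {
  X <- qr(X)   # rebinds the local name X to a QR decomposition object
  X$rank
}

M <- diag(2)         # 2 x 2 identity matrix
r <- decompose(M)    # 2

is.matrix(M)         # TRUE: the caller's M is still a plain matrix
```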


Briefly mentioned on LtU

Briefly mentioned on LtU, but worthy of more discussion.

Interpreting intent

R is great for statistics because it's designed for that. R really is very targeted, and it has a lot of implicit functions and data conversions that save inordinate amounts of time and pointless hoop jumping if you just want to roll up your sleeves and dig through data. That's great when you're doing any sort of statistics or data mining, but a horrible pain in the ass if you're trying to do some sort of general-purpose programming.

I love R!

For stats analysis it's superb and rich.

I will admit to terrifying my ex-colleagues by creating a largish program in R that had nary a single loop. :-)

So if you are a data muncher, head for R now, it's lovely.

The R language is surprisingly sophisticated

The language itself has a number of interesting features that in many cases go beyond what is available in more mainstream languages. The function-call semantics, in particular, is very cool and combines lazy evaluation, positional and named arguments, defaults, and split-level scoping to virtually obviate argument-handling tedium. I found it so unusual I wrote about it on my blog: Wondrous oddities: R's function-call semantics.
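A short sketch of some of those features together, in case it helps. The function scale_by and its parameters are hypothetical, made up purely for illustration; the behaviour shown (defaults that refer to other arguments, matching by name in any order, and arguments that are never evaluated unless used) is standard R.

```r
# Hypothetical function: `label` defaults to an expression involving `factor`,
# and `unused` is never touched inside the body.
scale_by <- function(x, factor = 2, label = paste("scaled by", factor), unused) {
  list(value = x * factor, label = label)
}

r1 <- scale_by(10)                       # positional argument, defaults for the rest
r2 <- scale_by(factor = 3, x = 10)       # named arguments in any order
r3 <- scale_by(10, fac = 4)              # partial matching of the name `factor`
r4 <- scale_by(10, unused = stop("never evaluated"))  # lazy: stop() is never run

r1$label   # "scaled by 2"
r2$label   # "scaled by 3"
```

The default for label is evaluated inside the function only when label is first needed, which is why it can see whatever value factor ended up with.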