From our recent discussion on R, I thought this paper deserved its own post (ECOOP final version) by Floreal Morandat, Brandon Hill, Leo Osvald, and Jan Vitek; abstract:
R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. This rather unlikely linguistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular. With millions of lines of R code available in repositories, we have an opportunity to evaluate the fundamental choices underlying the R language design. Using a combination of static and dynamic program analysis we assess the success of different language features.
Excerpts from the paper:
One of our discoveries while working out the semantics was how eager evaluation of promises turns out to be. The semantics captures this with C; the only cases where promises are not evaluated is in the arguments of a function call and when promises occur in a nested function body, all other references to promises are evaluated. In particular, it was surprising and unnecessary to force assignments as this hampers building inï¬nite structures. Many basic functions that are lazy in Haskell, for example, are strict in R, including data type constructors. As for sharing, the semantics cleary demonstrates that R prevents sharing by performing copies at assignments.
The R implementation uses copy-on-write to reduce the number of copies. With superassignment, environments can be used as shared mutable data structures. The way assignment into vectors preserves the pass-by-value semantics is rather unusual and, from personal experience, it is unclear if programmers understand the feature. ... It is noteworthy that objects are mutable within a function (since ï¬elds are attributes), but are copied when passed as an argument.
But then, they do a corpus analysis to see what features programmers actually use! We don't do enough of these in PL; examples from the paper:
R symbol lookup is context sensitive. This feature, which is neither Lisp nor Scheme scoping, is exercised in less than 0.05% of function name lookups...The only symbols for which this feature actually mattered in the Bioconductor vignettes are c and file, both popular variables names and built-in functions.
Lazy evaluation is a distinctive feature of R that has the potential for reducing unnecessary work performed by a computation. Our corpus, however, does not bear this out. Fig. 14(a) shows the rate of promise evaluation across all of our data sets. The average rate is 90%. Fig. 14(b) shows that on average 80% of promises are evaluated in the ï¬rst function they are passed into.
And so on. A lot of great data-driven insights.