Lisp-Stat does not seem to be in good health lately.

The Journal of Statistical Software http://www.jstatsoft.org/ has a Special Volume devoted to the topic: "Lisp-Stat, Past, Present and Future".

In the world of statistics, it appears that XLISP-STAT http://www.stat.uiowa.edu/~luke/xls/xlsinfo/xlsinfo.html has lost out to the S family of languages: S / R / S-plus.

In fact, the S languages are not statistical per se; instead they provide an environment within which many classical and modern statistical techniques have been implemented.

An article giving an excellent overview of the special volume is: "The Health of Lisp-Stat" http://www.jstatsoft.org/v13/i10/v13i10.pdf

Some of the articles describe the declining user base of the language due to defections, whilst other articles describe active projects using XLisp-Stat, often leveraging the power of the language, in particular for producing dynamic graphics.

The S family of languages, originally developed at Bell Labs, has much to recommend it. S is an expression language with functional and class features. However, as Luke Tierney, the original creator and main developer of XLisp-Stat (and now an R developer), explains in "Some Notes on the Past and Future of Lisp-Stat" http://www.jstatsoft.org/v13/i09/v13i09.pdf ,

"While R and Lisp are internally very similar, in places where they differ the design choices of Lisp are in many cases superior."


Social reasons

The main cause of the downfall of XLisp-Stat appears to be the primary maintainer stopping support. At least that is the impression I gain from the papers above.

R is an interesting language. It's like Scheme with a different syntax and more warts. Check out section 10.7 of the Introduction to R:

The discussion in this section is somewhat more technical than in other parts of this document. However, it details one of the major differences between S-Plus and R. [10.7 Scope]

Well I probably wouldn't wait till section 10 before I discussed scope in a programming language tutorial. This nicely illustrates the difference between the programmer and statistician communities. There is no reason for R, or S-Plus, or XLisp-Stat to exist. All their functionality could be easily implemented in any other dynamic language and then someone else would maintain the infrastructure. However the focus would be different, and this appears to be something that the statistical community values (and the existing investment in statistical code).

At least R isn't fundamentally a crock, which from my experience the other big numerical language, Matlab, most definitely is (and from the discussion of scoping rules, S-Plus is as well.)
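For readers who have not met the scoping issue: R resolves a function's free variables in the environment where the function was created (lexical scope), which is the behaviour the quoted section of the Introduction to R goes on to describe. A minimal sketch of that behaviour, written in Python rather than R, with `make_account`, `deposit` and `balance` as made-up names for illustration:

```python
def make_account(total):
    """Return two closures that share one private `total`."""

    def deposit(amount):
        nonlocal total          # refers to the `total` of the enclosing call
        total += amount
        return total

    def balance():
        return total            # free variable, resolved where `balance` was defined

    return deposit, balance


deposit, balance = make_account(100)
deposit(50)

total = -1                      # an unrelated global with the same name
print(balance())                # the closure still sees its own `total`
```

Under lexical scoping the global `total` is never consulted, so the account keeps its own state. A language that looked up free variables globally instead would need the state to be passed around or stored somewhere explicitly.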

The R core development team works principally in Scheme

There are some R-related articles on IBM developerWorks that introduce R to developers.

Laird gives a short introduction to S and R:

A Bell Labs team began developing a research project called "S" back in the mid-'70s. Eventually, the project became a full-blown, general-purpose computing language, with rich statistical capabilities. ... Project leader Dr. John Chambers received the ACM Software System Award in 1999, in recognition that, among other achievements, "S has forever altered the way people analyze, visualize, and manipulate data." Among S's many strengths, it "plays nicely" with modules written in such other languages as Fortran and C.

Insightful Corporation sells a commercially successful, widely respected, descendant of S it calls S-PLUS. In the early 1990s, Robert Gentleman and Ross Ihaka of the University of Auckland began work on R, which they released as free software, and which evolved (according to Ihaka) to resemble S quite closely. R's implementation, though, along with a few of its interfaces are entirely different from S and S-PLUS. The R core development team, which Chambers joined in 1997, works principally in Scheme. ...

Language choice in statistics should become less of an issue due to the OmegaHat project. Chambers describes it as a component-based statistical computing environment in which many languages (Java, R, S, Lisp-Stat, Fortran and C included) can be used together.

A talk covering both of the above is "The R and Omegahat Projects in Statistical Computing" (Ripley, 2001).

A Couple of Notes...

Noel, you're right. We should all just go back to using FORTRAN (actually suggested by some Swedish guy at the Joint Statistical Meetings a few years back). Seriously though, the reason these tools exist, rather than statistical packages simply being tacked onto an existing language (actually a suggestion made in Jan de Leeuw's paper in the above), is that most statisticians are not programmers, nor do they desire to become programmers. The interesting bit is the analysis of data, not writing elegant software or pumping out cool hacks. The bulk of R users will probably never write a package, and some may never move beyond interactive use (though I would suggest to those users that they should explore R's "literate analysis" tools). Putting a "better" language (from a language designer's point of view) in front of these users is not likely to impress them. That's the same reason scope isn't introduced until section 10---most users aren't even going to read that document at all (they'll read Modern Applied Statistics with S-PLUS or some similar book).

Hell, having a full programming language may actually be a hindrance. SAS is basically the de facto standard in several fields and, I assure you, it has nothing to do with the quality of its programming language (it's a jumped-up mainframe batch language---IIRC related to PL/I). Among other things (it's really, really good at regression models on large amounts of data, for example) it's trusted by entities like the FDA. Anything the statistician codes is, in some sense, suspect since there's no assurance that the results you get aren't due to a bug in the code. (Incidentally, this is a point in favor of the open and peer-reviewed development of, at least, the core statistical features of ANY statistical package.)

Anyways, I ramble. Moving on to the Omegahat thing, most of the effort there (due mostly to Duncan Temple Lang) is concerned with bindings for R to other things (Java, Lisp-Stat, Matlab, Perl, Python, etc). I don't know that the common statistical framework part has really caught on. Personally I like Luke's common statistical virtual machine idea--there's really no reason why R, Lisp-Stat and more domain-specific languages like BUGS and such can't all live in the same VM and share data structures and core routines (or somebody else's VM---though I think you'd really like to have vector math primitives; there's no such thing as a scalar in R for a reason).
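To make the "no such thing as a scalar" remark concrete: in R a single number is just a vector of length one, and arithmetic recycles the shorter operand. A rough sketch of that idea in Python, using a hypothetical toy class `Vec` (not any real library, and not how R is implemented):

```python
class Vec:
    """Hypothetical toy vector: every value, even a single number, is a Vec."""

    def __init__(self, *values):
        if not values:
            raise ValueError("this sketch assumes non-empty vectors")
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def _zip_recycled(self, other):
        # Promote a bare Python number to a length-1 Vec, then recycle the
        # shorter operand until it matches the longer one.
        if not isinstance(other, Vec):
            other = Vec(other)
        n = max(len(self), len(other))
        return [(self.values[i % len(self)], other.values[i % len(other)])
                for i in range(n)]

    def __add__(self, other):
        return Vec(*(a + b for a, b in self._zip_recycled(other)))

    def __mul__(self, other):
        return Vec(*(a * b for a, b in self._zip_recycled(other)))

    def __eq__(self, other):
        return isinstance(other, Vec) and self.values == other.values

    def __repr__(self):
        return f"Vec{tuple(self.values)}"


x = Vec(1, 2, 3, 4)
print(x + 10)          # the "scalar" 10 is treated as a length-1 vector
print(x * Vec(1, 2))   # the shorter vector is recycled to length 4
```

The point of the sketch is only that element-wise operations need no special scalar case once everything is a vector. It silently recycles even when the lengths do not divide evenly, which is a simplification.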

Agreed, but...

I agree with many of your points. I'm not advocating Fortran, but a more modern language like (modern) Scheme or O'Caml, perhaps with some domain-specific tweaks. These languages have several advantages: they are easier to compile, and speed is always a problem in my experience; and high-level operations (such as pattern matching and array comprehensions) make algorithms clearer and hence raise the bar for what is 'obviously' correct.
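As a small illustration of the kind of clarity meant here, this is a sketch in Python (standing in for Scheme or O'Caml) that estimates a transition matrix from a sequence of state indices. The function names and the example data are made up for illustration; destructuring each adjacent pair and building the result with comprehensions keeps the code close to the mathematical description.

```python
def transition_counts(path, n_states):
    """Count how often state i is immediately followed by state j."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, nxt in zip(path, path[1:]):   # destructure each adjacent pair
        counts[prev][nxt] += 1
    return counts


def normalise_rows(counts):
    """Divide each row by its total; an all-zero row stays all zero."""
    totals = [sum(row) for row in counts]
    return [[c / t if t else 0.0 for c in row]
            for row, t in zip(counts, totals)]


path = [0, 0, 1, 0, 1, 1]                   # hypothetical observed states
counts = transition_counts(path, n_states=2)
probs = normalise_rows(counts)
print(counts)
print(probs)
```

Each function is short enough to check by eye against its one-line description, which is the sense of 'obviously' correct intended above.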

Right, so, like I said: 99% o

Right, so, like I said: 99% of R users don't care about ease of compilation or " 'obviously' correct" algorithms. For the most part they care about getting their job done. At best the things that make language designers and researchers happy are orthogonal to this task and a hindrance at worst. Nobody ever says "wow, language X's ease of compilation made fitting my non-linear model a breeze!" The only compelling reason for switching away from a special-purpose statistical language to a more general purpose language is to take advantage of the libraries available for the language, not because it makes data analysis any easier.

I Disagree

I disagree with practically every statement you make:

R users don't care about ease of compilation or " 'obviously' correct" algorithms

In the statistical work I've done (e.g. clustering HMMs) speed is really important. If something takes a week or more to run and I can make it run faster with very little effort I'm really happy. Other people I know in the area (statistical machine learning) have the same problem. A high-level language that can be easily compiled would be a real boon. Furthermore, as you say "Anything the statistician codes is, in some sense, suspect since there's no assurance that the results you get aren't due to a bug in the code" so anything that raises the bar for 'obviously correct' algorithms is a big win as well. This is "getting their job done."

At best the things that make language designers and researchers happy are orthogonal to this task and a hindrance at worst.

I've used R and I've used Matlab, and if they had pattern matching and array comprehensions it would be a lot easier to write complex algorithms in them. If they had clean semantics they could be easily compiled; see above.

So in conclusion I argue that a better language makes development faster (less code must be written; code has fewer bugs) and gets results faster (code runs faster). Besides, the so-called "special-purpose statistical languages" aren't really that special (with the exception of Mathematica) from a PL point of view. Sure, they tend to have some nice notation for array slices, and overloaded operators for arrays, but that could be easily accommodated in a modern general-purpose language, and modern languages have features far in advance of those in, say, R.