
Interactive scientific computing; of pythonic parts and goldilocks languages

Graydon Hoare has an excellent pair of blog posts about programming languages for interactive scientific computing.
technicalities: interactive scientific computing #1 of 2, pythonic parts
technicalities: interactive scientific computing #2 of 2, goldilocks languages

These posts explain and contrast two approaches to scientific computing: on one side Python with its ecosystem (SciPy, SymPy, NumPy, IPython, and Sage), and on the other Julia. Graydon presents them as the results of two different design traditions: one (Python) following Ousterhout's Dichotomy, which pairs a convenient scripting language with a fast systems language underneath it, and the other rejecting that dichotomy (in the tradition of Lisp/Dylan and ML) in favor of a single general-purpose language.

I don't necessarily buy the whole argument, but the posts are a good read, and have some rather insightful comments about programming language use and design.
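
Still, the dichotomy is easy to see in the Python stack itself. As a toy illustration (mine, not from the posts, and assuming NumPy is installed), here is the same dot product written at both layers:

    # Ousterhout's Dichotomy in miniature: the convenient scripting layer
    # (a plain Python loop) and the fast systems layer it delegates to
    # (NumPy's compiled internals).
    import numpy as np

    def dot_scripting(xs, ys):
        # scripting-language version: flexible, readable, slow
        return sum(x * y for x, y in zip(xs, ys))

    xs = np.random.rand(1_000_000)
    ys = np.random.rand(1_000_000)

    slow = dot_scripting(xs, ys)   # interpreted, one element at a time
    fast = float(xs @ ys)          # a single call into compiled BLAS code
    assert np.isclose(slow, fast)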

Quotes from the first post:

There is a further split in scientific computing worth noting, though I won't delve too deep into it here; I'll return to it in the second post on Julia. There is a division between "numerical" and "symbolic" scientific systems. The difference has to do with whether the tool is specialized to working with definite (numerical) data, or indefinite (symbolic) expressions, and it turns out that this split has given rise to quite radically different programming languages at the interaction layer of such tools, over the course of computing history. The symbolic systems typically (though not always) have much better-engineered languages. For reasons we'll get to in the next post.

[..]

I think these systems are a big deal because, at least in the category of tools that accept Ousterhout's Dichotomy, they seem to be about as good a set of hybrid systems as we've managed to get so far. The Python language is very human-friendly, the systems-level languages and libraries that it binds to are well enough supported to provide adequate speed for many tasks, the environments seem as rich as any interactive scientific computing systems to date, and (crucially) they're free, open source, universally available, easily shared and publication-friendly. So I'm enjoying them, and somewhat hopeful that they take over this space.
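
The numerical/symbolic split from the first quote can be demonstrated with the very tools the post discusses. A small sketch, assuming NumPy and SymPy are installed:

    # The same identity, handled two ways: SymPy manipulates the indefinite
    # (symbolic) expression, NumPy evaluates definite (numerical) data.
    import numpy as np
    import sympy as sp

    x = sp.symbols("x")
    expr = sp.sin(x)**2 + sp.cos(x)**2
    print(sp.simplify(expr))                  # 1, by symbolic rewriting

    xs = np.linspace(0.0, np.pi, 5)
    print(np.sin(xs)**2 + np.cos(xs)**2)      # array of 1.0s, by arithmetic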

Quotes from the second post:

the interesting history here is that in the process of implementing formal reasoning tools that manipulate symbolic expressions, researchers working on logical frameworks -- i.e. with background in mathematical logic -- have had a tendency to produce implementation languages along the way that are very ... "tidy". Tidy in a way that befits a mathematical logician: orthogonal, minimal, utterly clear and unambiguous, defined more in terms of mathematical logic than machine concepts. Much clearer than other languages at the time, and much more amenable to reasoning about. The original manual for the Logic Theory Machine and IPL (1956) makes it clear that the authors were deeply concerned that nothing sneak into their implementation language that was some silly artifact of a machine; they wanted a language that they could hand-simulate the reasoning steps of, that they could be certain of the meaning of their programs. They were, after all, translating Russell and Whitehead into mechanical form!

[..]

In fact, the first couple generations of "web languages" were abysmally inefficient. Indirect-threaded bytecode interpreters were the fast case: many were just AST-walking interpreters. PHP initially implemented its loops by fseek() in the source code. It's a testament to the amount of effort that had to go into building the other parts of the web -- the protocols, security, naming, linking and information-organization aspects -- that the programming languages underlying it all could be pretty much anything, technology-wise, so long as they were sufficiently web-friendly.

Of course, performance always eventually returns to consideration -- computers are about speed, fundamentally -- and the more-dynamic nature of many of the web languages eventually meant (re)deployment of many of the old performance-enhancing tricks of the Lisp and Smalltalk family, in order to speed up later generations of the web languages: generational GC, JITs, runtime type analysis and specialization, and polymorphic inline caching in particular. None of these were "new" techniques; but it was new for industry to be willing to rely completely on languages that benefit, or even require, such techniques in the first place.

[..]

Julia, like Dylan and Lisp before it, is a Goldilocks language. Done by a bunch of Lisp hackers who seriously know what they're doing.

It is trying to span the entire spectrum of its target users' needs, from numerical inner loops to glue-language scripting to dynamic code generation and reflection. And it's doing a very credible job at it. Its designers have produced a language that seems to be a strict improvement on Dylan, which was itself excellent. Julia's multimethods are type-parametric. It ships with really good multi-language FFIs, green coroutines and integrated package management. Its codegen is LLVM-MCJIT, which is as good as it gets these days.
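
The "AST-walking interpreters" in the quote above are easy to make concrete. Here is a deliberately naive sketch (the node classes are my own, purely illustrative): every evaluation re-traverses the tree, which is exactly why these interpreters were the slow case.

    # A naive AST-walking interpreter for arithmetic: each evaluation
    # walks the tree node by node, with no bytecode and no compilation.
    from dataclasses import dataclass

    @dataclass
    class Num:
        value: float

    @dataclass
    class BinOp:
        op: str
        left: object
        right: object

    def evaluate(node):
        if isinstance(node, Num):
            return node.value
        if isinstance(node, BinOp):
            lhs, rhs = evaluate(node.left), evaluate(node.right)
            return {"+": lhs + rhs, "-": lhs - rhs, "*": lhs * rhs}[node.op]
        raise TypeError(f"unknown node: {node!r}")

    # (1 + 2) * 4: the tree is re-walked on every call to evaluate().
    tree = BinOp("*", BinOp("+", Num(1.0), Num(2.0)), Num(4.0))
    print(evaluate(tree))  # 12.0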
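
Of the performance tricks listed, polymorphic inline caching is perhaps the simplest to sketch: each call site remembers which method each receiver type resolved to, so only the first call with a given type pays for the full lookup. A toy version, with hypothetical Circle and Square classes standing in for real receivers:

    # A toy polymorphic inline cache: one cache per call site, mapping
    # receiver type -> resolved method.
    class InlineCache:
        def __init__(self, method_name, max_entries=4):
            self.method_name = method_name
            self.max_entries = max_entries  # past this, real VMs treat the site as megamorphic
            self.entries = {}               # type -> function

        def call(self, receiver, *args):
            cls = type(receiver)
            fn = self.entries.get(cls)
            if fn is None:                  # miss: slow lookup, then cache it
                fn = getattr(cls, self.method_name)
                if len(self.entries) < self.max_entries:
                    self.entries[cls] = fn
            return fn(receiver, *args)

    class Circle:
        def __init__(self, r): self.r = r
        def area(self): return 3.14159 * self.r ** 2

    class Square:
        def __init__(self, s): self.s = s
        def area(self): return self.s ** 2

    site = InlineCache("area")              # simulates one call site
    shapes = [Circle(1.0), Square(2.0), Circle(3.0)]
    print(sum(site.call(s) for s in shapes))  # later Circles hit the cache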
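
And to see what type-parametric multimethods buy you, compare Python, where functools.singledispatch dispatches on the first argument only. A toy dispatcher over the types of all arguments conveys the flavor, though it does exact-type matching only; Julia's dispatch additionally understands subtyping and type parameters:

    # A toy multimethod: implementations are selected by the tuple of
    # argument types, in the spirit of Julia's (far richer) dispatch.
    class MultiMethod:
        def __init__(self, name):
            self.name = name
            self.impls = {}                 # (type, ...) -> function

        def register(self, *types):
            def decorator(fn):
                self.impls[types] = fn
                return fn
            return decorator

        def __call__(self, *args):
            key = tuple(type(a) for a in args)
            impl = self.impls.get(key)
            if impl is None:
                raise TypeError(f"no method {self.name}{key}")
            return impl(*args)

    combine = MultiMethod("combine")

    @combine.register(int, int)
    def _(a, b): return a + b

    @combine.register(str, str)
    def _(a, b): return f"{a} {b}"

    print(combine(1, 2))                    # 3
    print(combine("multi", "methods"))      # "multi methods"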