Languages best suited for scientific computing?

I work in a field where the standard for high-performance scientific computing is still Fortran (albeit Fortran 95 nowadays). The array-based nature of Fortran provides a relatively clean and intuitive syntax for solving the systems of equations often involved in numerical simulations. The simplicity of Fortran has also facilitated highly efficient Fortran compiler implementations.

However, I'm searching for a more modern and general-purpose scientific computing language. Fortran is probably not the ideal language for writing networked graphical applications, and neither, I would argue, is MATLAB or IDL. SciPy has deservedly gained traction amongst scientists recently and is certainly a very attractive option. Unfortunately, CPython can exhibit underwhelming performance, requiring inconvenient work-arounds for CPU-intensive code (see here for an example).
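As a rough illustration of the kind of work-around meant here (a minimal sketch, not the linked example): in CPython an explicit loop runs every iteration through the interpreter, so a common trick is to push the loop into C-implemented built-ins. The function names below are made up for the illustration.

```python
import operator


def dot_loop(a, b):
    """Dot product with an explicit Python-level loop."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_builtin(a, b):
    """Same result, but the iteration is driven by the built-ins
    sum() and map(), which are implemented in C in CPython."""
    return sum(map(operator.mul, a, b))
```

Both functions return the same value. The second is usually faster in CPython, but it is also less readable, which is the inconvenience.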


I'm of the belief that the ideal language should also be functional and open-source. Some contenders I've tried:

  • SAC lacks many desirable features for a general purpose programming language.
  • Boo is one of the most promising new "main-stream" languages but multidimensional array operations are not a core feature (although it recently gained array slices). It is also tied to the CLR.
  • OCaml probably comes closest, although again multidimensional arrays/matrices are not first-class citizens and the syntax is unfamiliar for most scientists (although the "OCaml Whitespace Thing" might help here).

Any suggestions in my quest for a better language are welcome. And yes, I'm aware of hacks in C++ (Blitz) and Java (JScience); neither of these is very promising going forward IMHO!

Sun's latest entry

Sun has been developing Fortress. If I remember correctly, it tries to make implicit parallelization simple, among other things.

Maybe Haskell?

There's a blog post here describing 11 reasons to use Haskell as a mathematician:
http://sigfpe.blogspot.com/2006/01/eleven-reasons-to-use-haskell-as.html

Perhaps some of the reasons listed apply to scientists as well. (Disclaimer: I'm not a scientist or a Haskell expert)

Not Directly

Haskell numeric performance is nothing impressive. The native code generator in GHC doesn't even use SIMD instructions yet.

It might be useful in a staged system, as in "Generative Code Specialisation for High-Performance Monte Carlo Simulations".

Data Parallel Haskell should eventually offer reasonably good performance and some automatic (SMP) parallelism.

Haskell not the best choice here

Unfortunately, the requirements of (pure) mathematicians differ from those of people doing numerical computations. In my experience, scientific computation is about arrays, and these are not a Haskell strong point. Using various monads it is at least possible to do things like in-place array updates, but the code generated isn't all that great. It's not absolutely terrible though - within a factor of 2 of what I get with C++ for some simple-minded experiments I tried (like evaluating simple expressions at each point in an array). But the Haskell code to achieve this can be a bit verbose.

For a Haskell-like language with better array support there's always Clean.

C++

This is one niche where C++ seems to do very well. Many C++ compilers support (as a nonstandard extension) the C99 "restrict" keyword--which essentially asserts to the compiler that a given pointer is to a unique (unaliased) object, which permits the optimization of caching the pointed-to object in registers. Since Fortran restricts aliasing, Fortran compilers have been performing similar optimizations for years.

C++ also has some rather high quality numeric libraries (Blitz++) which combine high performance with nice language-level semantics. Why you characterize Blitz as a "hack", I don't know--the stuff the Blitz guys do is downright brilliant. (Remember--the C++ template mechanism is a Turing-complete lazy functional language, which the Blitz developers take full advantage of).

Some other useful attributes of C++ for numerical programming include:

* No overspecification of floating-point behavior (those who do heavy-duty numerical analysis thoroughly hate strict IEEE-754 floating point), so C++ is readily portable to hardware that has other floating-point semantics.
* Some level of control of memory access/allocation patterns--a key concern for high-level numerical programming on modern hardware is avoiding cache misses.

Hate IEEE-754?

"(those who do heavy-duty numerical analysis thoroughly hate strict IEEE-754 floating point)"

Really? My impression is just the opposite -- that what people hate is half-hearted 754 implementations, for which Intel's inadequate handling of denorms is the poster child. (Intel either clamps them to zero, thereby (quasi-)silently flushing all precision, or lets a trap-handler emulate denormalized arithmetic hundreds of times slower than adequate hardware would have.)

"C++ is readily portable to hardware that has other floating-point semantics."

What machines are you referring to? Except for near-extinct breeds of legacy-bound IBM or Cray machines and microcontrollers with no fp support at all, I believe that every computer manufactured commercially today has IEEE floating point.
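
For anyone who hasn't run into denorms (subnormals), here is a minimal Python sketch of the gradual underflow that full IEEE-754 support provides. It assumes the platform's float is an IEEE binary64 double and that the hardware is not set to flush to zero; the helper name is hypothetical.

```python
import sys


def gradual_underflow_demo():
    """Check a few properties that hold only when subnormals are supported."""
    smallest_normal = sys.float_info.min  # smallest positive normal double
    a = 1.5 * smallest_normal
    b = smallest_normal
    diff = a - b  # 0.5 * smallest_normal, which lies below the normal range
    return {
        # With subnormals, a != b guarantees a - b != 0.
        "diff_is_nonzero": diff != 0.0,
        "diff_is_subnormal": 0.0 < diff < smallest_normal,
        # 5e-324 is the smallest positive subnormal; halving it rounds to zero.
        "half_of_smallest_subnormal_is_zero": (5e-324 / 2) == 0.0,
    }
```

With flush-to-zero, diff would come out as 0.0 even though a != b, which is the silent loss of precision complained about above.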

OCaml

I am using OCaml for atmospheric data analysis and number crunching and have been, overall, very happy with it. The C FFI is relatively simple to work with, and it has a decent selection of available libraries. I have also found its type inference and strict typing to be very beneficial, both for updating old code and writing new code. While + vs +. may seem like an issue at first, I have found that it helps avoid more problems than it causes.

I'm surprised!

I'm surprised nobody's mentioned APL and its derivatives, J and K. At least J is still actively developed, and it's "free" (as in beer, not as in speech, IIRC).

Nial

I am learning Nial now, and it seems to be very suitable for scientific computing (or so the site claims). The array-based syntax is very consistent and intuitive.

The source is available for download under the Artistic License.

ZPL?

Speaking of array languages, has anyone here ever taken ZPL out for a spin?

ZPL is an array programming language designed from first principles for fast execution on both sequential and parallel computers. It provides a convenient high-level programming medium for supercomputers and large-scale clusters with efficiency comparable to hand-coded message passing. It is the perfect alternative to using a sequential language like C or Fortran and a message passing library like MPI.

Sage + Cython

If you were interested in SciPy you'll love Sage. Obviously CPython isn't fast enough for intensive computation, but you can use various schemes for improving performance. One option I like is writing C code within Python with Cython. Another option is simply to write the compute-intensive parts of your code in C or Fortran and then use those functions within Python.
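
To make the second option concrete, here is a minimal sketch that uses only the standard library's ctypes module rather than Cython. The C math library's cos() stands in for a hand-written C or Fortran routine (that substitution is an assumption of this sketch), and load_c_cos is a made-up helper name.

```python
import ctypes
import ctypes.util


def load_c_cos():
    """Return the C library's cos() as a Python callable, or None if the
    math library cannot be located on this platform."""
    path = ctypes.util.find_library("m")  # libm on most Unix-like systems
    if path is None:
        return None
    try:
        libm = ctypes.CDLL(path)
    except OSError:
        return None
    c_cos = libm.cos
    # Declare the C signature: double cos(double)
    c_cos.argtypes = [ctypes.c_double]
    c_cos.restype = ctypes.c_double
    return c_cos


c_cos = load_c_cos()
if c_cos is not None:
    print(c_cos(0.0))
```

A routine of your own would be compiled into a shared library and loaded the same way. Each call still crosses the Python/C boundary, so this pays off mainly when one call does a lot of work, e.g. a whole array loop rather than a single cos().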

Great suggestions

Thanks everyone for the informative replies to my original post. It is evident there is probably not yet an ideal "silver bullet". However the suggestions here have certainly made me re-evaluate some of my assumptions and explore some new languages.

Cython and Python/Numpy/Scipy/Sage (possibly coupled with IPython1) currently appeal most to my aesthetics. The combination appears to be a reasonable amalgam of a powerful, general-purpose language with more specific scientific computing constructs. The syntax is easy for a scientist to work with, there is a solid set of libraries, and there is a vibrant, growing community of contributors.

C++/Blitz++ might also be a solid option. However I have some reservations about programmer productivity with C++, especially for newcomers to the language or for non-computer scientists.

OCaml is an extraordinary language and my experiences with it have been extremely positive (and I agree the "+." syntax for floating-point addition makes sense in a strongly statically typed language). I would like to see more energy/drive in the community to make this language a truly great contender (for general-purpose and scientific computing). I've been following OCaml developments for many years. While there is promising work being done, I've noticed some fracturing in the ranks over this time: many have moved on to F#, some brilliant OCaml hackers have been hired by Jane Street Capital, limiting their ability to contribute back to the community, and still others have become frustrated with aspects of OCaml to the point of inventing new languages, e.g. Neko.

I'm hopeful that some of the current research-level languages will evolve to provide a more complete solution in this area (a great mix might be a concurrent OCaml/G'Caml + Nial + a cleaner Python inspired syntax?).

F#

F# seems to be well suited for interactive, exploratory programming with good performance and interesting visualization capabilities.

The productizing of F# should make its already wonderful Visual Studio addin even better.

It's cross platform for those on Unix and other platforms where Mono runs.

The original poster mentioned Boo. Boo is an interesting language, but I'm not sure about its performance.

Scala?

Check out Scala since it has the interactive, exploratory abilities of F#, and it can supposedly generate bytecode for either the JVM or the CLR. (If one is paranoid enough about patent torpedoes, one might want to steer away from MS-based stuff; everybody's got a different comfort zone, tho, I'm sure.)

(¿and ignore the fact that Java apparently screws the pooch when it comes to IEEE [1] [2]? oh, er, wait, maybe not so much any more, right.)

Titanium

I think Fortress is very neat, but not quite at the heavy deployment phase (then again, at such a stage, usage would probably inspire personal attention by the designers :). It's high on my todo list of languages worth getting a feel for :)

In terms of getting code right, Titanium has a neat combination of Java-like features (GC, generics), concurrency features (textual barriers keep you honest, OpenMP style loops), and limited modality support (global address space with basic type support). I can't really endorse it as I haven't used it, but I find the feature combination intriguing -- getting code to work efficiently on both SMPs and over interconnects without destroying readability is tough. I believe it can be interpreted as UPC's safe and friendly Java cousin. The feature set is also interesting program analysis-wise...

one language to rule them all

I enjoyed reading this! Thank you JustinTrellis for posting and all you others for your comments. *place mental bookmark for LTU*

You might find this interesting:
"The Search For a New HPC Language"
http://www.hpcwire.com/hpc/837711.html

It describes three big programming languages for the future of large-scale computing: Chapel (Cray), Fortress (Sun) and X10 (IBM). If I remember correctly, only X10 and Chapel remain in the competition for the gold! :-) However, in my opinion, not DARPA but the USERS will determine the final winner.

Chapel - http://chapel.cs.washington.edu/

Looks like everything I wanted from Fortran, like generics, classes and tons of array functions. All wrapped up with a simple, intuitive syntax. My favourite! (Read the paper "Finite Difference Stencils Implemented Using Chapel")

Fortress - http://projectfortress.sun.com/

Exotic notation in "mathematical language" as prime example? Have they learned nothing from history? Programmers do not want weird symbols (APL). (Here is the weird code example: NAS-CG.pdf)

Remark: I know that you don't have to use the notation if you don't want to. But the language should not offer too many degrees of freedom, because then you are stuck with having to draw up design documents etc. before you can even write a single line of code. Especially if you are collaborating with non-professional programmers (always).

X10 - www.research.ibm.com/x10/

Looks like something from Java with the array and number crunching features reworked. An easy-to-digest presentation of X10: X10-PSC-Tutorial-v10.ppt. A close second thanks to its array extensions, but basically it's Java without the latest fancy 5.0 or 6.0 stuff (for better or worse. :-))

PS. A glimpse of our history, before even FORTRAN: "A Dozen Precursors of Fortran" (120 min, Google video), a talk at the Computer History Museum by Don Knuth.