techniques for JIT (parallel?) compilation of straight line numerical code

Reading A Methodology for Generating Verified Combinatorial Circuits, which involved some code generation/staging and AST optimization, I found myself wondering if there are libraries for just-in-time compilation (and then dynamic loading/linking) of simple numerical code. C seems like a decent language as a generation target (in that it's fairly assembly-like, and in its later (C99?) incarnations has fairly portable support for low level types like (and operations for) e.g. 80-bit floating point values.

Further, although I'm tempted to write everything in one language (currently C++) since it has made interoperability trivial, there must be some reasonable argument that the overhead of launching an optimizing C compiler surely gives you the leeway to say, invoke Meta-Ocaml routines that do the code-generation magic (never having embedded another language in a C++ program or called wrapped C++ in another language, I'm not sure how expensive or hard this is).

Anyone have some good references (to theory or practical implementations) or advice? Even the basic technique necessary to say, compile and dynamically load/link C code into another language (while probably somewhat platform specific) is foreign to me :)

I'm not at all committed to C as the code generation target; I just want something that compiles and optimizes loopless straight-line numerical computation well, and bonus points if it can split up the computation so that it runs in a distributed parallel fashion (N cpus per node, local memory on each node, relatively expensive network communication). It would also be great if code could be reordered to take advantage of cache locality and processor parallel-numeric abilities as well (of course this should be handled by the compiler and not addressed directly by the person generating the numerical code?). I have no idea which compilers (Intel's C compiler might be good) can do such analyses and how to encourage them to do so; by default I use gcc, which I understand is somewhat basic.

My intended application is repeated Estimation-Maximization (EM) parameter optimization; the computation for accumulating counts for each parameter as a function of the previous iteration's count can be compiled and made branchless, optimized (sharing common subexpressions, etc.), and then run a few hundred times. Thus I want to compile and load compiled code on the fly (the generated code of course depends on the input automata I'm estimating parameters for). For historical reasons, my work so far has been entirely C++, but obviously, switching to code generation/compilation would allow me to use any language as the driver as long as it can access the results of the compiled code without expensive marshalling or interprocess communication.

(apologies if this is more a compilers/tools than a language question)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Supercomputer Toolkit

Here's a ref on generating the straight-line code from higher-level code, then scheduling for parallel processing, etc.:

Andrew A. Berlin and Rajeev J. Surati. "Partial Evaluation for Scientific Computing: The Supercomputer Toolkit Experience".

They used Scheme and a parallel computer of their own design, rather than C++ and PCs, but it sounds a lot like what you're wanting to do, and they applied it to real problems and learned some lessons about how their system could've been more broadly useful.