[Newbie] Where can I learn about VM / C interfacing?

Hi all,
I worked through EOPL (1st edition) and consequently can write interpreters/(simple) byte code virtual machines for simple languages.

Now I'd like to learn how to interface these interpreters/vms to libraries written in C. Most scripting languages seem to have some kind of FFI and I am eager to learn how to interface my (admittedly toy) languages to existing libraries.

Can anyone point me at books/papers etc that deal with this aspect?

Is it necessary for the interpreter/vm to be written in C to interface with a library written in C? (If this is a dumb question, please do not hesitate to correct me. I am very much a newbie).

Any pointers to something that tells me "How to interface an interpreter/VM with C libraries" (or something that comes close) are greatly appreciated.

Ideally someone has written a book that deals with these aspects but *any* pointers are fine! I am totally flummoxed.

Thanks in advance,

Best thing to do is look at JNI

I don't know of any specific "How do I build an FFI" books, because it's mostly thought too small a problem to be worthy of that big a treatment. That said, your best bet is probably to look at books on JNI, the Java Native Interface. There are a couple of good ones out there, but the Spec+Programmer's Guide is probably a good place to start: http://www.amazon.com/gp/product/0201325772/qid=1135098629/sr=8-1/ref=pd_bbs_1/104-8525119-8007901?n=507846&s=books&v=glance.

It won't tell you how to build an FFI, but it is a great way of learning the issues behind a C interface.

Thanks for the suggestion. But

the specific problem I am trying to solve is that of how to *build* an FFI. Let me restate that. I am looking for information on how to get the language interpreters/virtual machines I build (using what I learned from EOPL) to be able to use C libraries.

So any books referred need not be exclusively about how to build an FFI. I can imagine that may be too narrow a scope for a book. But if a "how to build compilers and interpreters" book (like EOPL) has a couple of chapters on this topic, that would be nice to know.

Or some papers. Or lectures. Or something.

Thus, I am not sure that examining the JNI spec (I have *used* JNI), while a worthwhile activity in itself, will help me solve *my* problem. Examining how the JVM implements JNI might. But hopefully someone can suggest a simpler way.

Thanks again,

Some Publications

Two publications I know of are: No-Longer-Foreign: Teaching an ML compiler to speak C "natively"
and Foreign Interface for PLT Scheme

If you just wanted to interface to C libraries without all the tedium of writing a C FFI, I'd just port your VMs to PLT Scheme and use its (very good) FFI. You can do some fun things with PLT's FFI, like generating machine code on the fly and getting the FFI to execute it for you.

You might also be interested in libffi, and in the C ABI (Application Binary Interface) for your platform.
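A small illustration of why the ABI matters, sketched with Python's ctypes module (which is itself built on libffi). This assumes a POSIX system where `ctypes.CDLL(None)` exposes the C library's symbols; that loading step and the choice of `atof` are assumptions made for the example. ctypes treats a return value as a C int unless told otherwise, so the signature has to be declared before the call.

```python
import ctypes

# Assumption: POSIX system. CDLL(None) gives access to symbols already
# loaded into the process, which includes the C library.
libc = ctypes.CDLL(None)

# Declare the C signature: double atof(const char *).
# Without restype, ctypes would assume an int return value and read the
# result from the wrong place on most ABIs.
libc.atof.argtypes = [ctypes.c_char_p]
libc.atof.restype = ctypes.c_double

value = libc.atof(b"2.5")
print(value)
```

An FFI built from scratch has to make the same decisions by hand, which is the work libffi takes over.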

Oh, and just remembered there are some Haskell papers as well. Search for "Haskell greencard"


To address your question of whether the interpreter has to be written in C: in my opinion, although this doesn't have to be the case, it would make things much easier. Many languages with actively used FFIs also have some amount of C infrastructure in the language implementation itself. Take, for example, Perl, Python or OCaml, all of which have very heavily used FFIs for wrapping libraries or writing things in a 'lower-level' manner.

That said, another route that would be nearly as simple would be to use one of the above-mentioned languages as the implementation language for your interpreter. Then the values, functions and AST of your toy language would just be values in the already FFI-able high-level language. You would then make some kind of wrapper that uses the FFI of the high-level language to further export symbols from C to the toy language's interpreted environment (or vice versa).
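A minimal sketch of that route, assuming Python as the host language and a POSIX system where `ctypes.CDLL(None)` exposes the C library. The toy language (nested tuples in prefix form), `wrap_c`, `GLOBAL_ENV` and `evaluate` are all hypothetical names invented for the example.

```python
import ctypes

# Assumption: POSIX system; CDLL(None) exposes the C library's symbols.
libc = ctypes.CDLL(None)

def wrap_c(name, argtypes, restype):
    """Look up a C symbol by name and attach its signature."""
    fn = getattr(libc, name)
    fn.argtypes = argtypes
    fn.restype = restype
    return fn

c_strlen = wrap_c("strlen", [ctypes.c_char_p], ctypes.c_size_t)

# Global environment of the toy language: C functions sit next to
# ordinary host-language functions.
GLOBAL_ENV = {
    "abs": wrap_c("abs", [ctypes.c_int], ctypes.c_int),
    # Toy strings are Python str, so convert to bytes at the boundary.
    "strlen": lambda s: c_strlen(s.encode("utf-8")),
    "+": lambda a, b: a + b,
}

def evaluate(expr, env=GLOBAL_ENV):
    """Evaluate a prefix expression: a literal or (operator, arg, ...)."""
    if not isinstance(expr, tuple):
        return expr  # literal int or str
    op, *args = expr
    return env[op](*[evaluate(a, env) for a in args])

result = evaluate(("+", ("abs", -7), ("strlen", "hello")))
print(result)
```

The interpreter itself never touches C. The only FFI-aware code is the small wrapper that fills the environment.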

I think another important question to consider is, what is the purpose of your FFI? Perhaps there is a precise definition of what 'FFI' means (aside from the words the acronym expands to), but it seems like when people say it, they often mean one of two things. The common meaning is (1) a system to implement or wrap libraries in C/asm/etc. and then use them in a 'higher' language. A less common meaning (and perhaps a misuse of 'FFI') is (2) poking at C-style data blocks using a high-level language, i.e., packing and unpacking data blocks from files or other (otherwise unorganized) memory heaps, etc.

Are you just interested in (1), i.e. the 'wrapping problem'?
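For contrast, sense (2) can be sketched with Python's standard `struct` module, with no C library involved at all. The record layout below is a hypothetical one chosen for illustration.

```python
import struct

# Hypothetical C layout, assumed for illustration:
#   struct record { uint16_t id; uint8_t flags; uint8_t pad; float value; };
# "<" selects little-endian with no implicit padding, so the pad byte is
# spelled out explicitly as "x".
RECORD = struct.Struct("<HBxf")

# Pack host-language values into a C-style data block...
raw = RECORD.pack(513, 3, 1.5)

# ...and unpack such a block back into host-language values.
record_id, flags, value = RECORD.unpack(raw)
print(RECORD.size, record_id, flags, value)
```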

Also, I would agree with your reaction to the suggestion of looking at JNI. Java, again in my own opinion, is often too complex (and hairy?) for its own good, and certainly for the purposes of modeling a new (and probably much simpler) FFI system. Again, I believe that the languages I listed above will be more amenable to learning from. There are probably more that others can add, such as Lua, Scheme, and LISP, though I haven't investigated them myself.

[OCaml Manual for its FFI]

Take a look at Lua-ML

I strongly suggest reading Norman Ramsey's paper about Lua-ML, where he discusses several FFI designs and presents a clever new design based on higher-order functions.
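To give a flavour of what a design based on higher-order functions can look like, here is a loose Python sketch. It is not the paper's code or API: `Conv`, `num`, `string` and `func` are hypothetical names, and the tagged-tuple value representation is an assumption made for the example. Each type is described by a pair of conversion functions, and a combinator builds the conversion for a function type out of the conversions for its arguments and result.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Conv:
    embed: Callable[[Any], Any]    # host value -> interpreter value
    project: Callable[[Any], Any]  # interpreter value -> host value

# Assumption: interpreter values are tagged tuples such as ("num", 3.0).
def _project(tag):
    def go(v):
        if v[0] != tag:
            raise TypeError(f"expected {tag}, got {v[0]}")
        return v[1]
    return go

num = Conv(lambda x: ("num", float(x)), _project("num"))
string = Conv(lambda s: ("str", s), _project("str"))

def func(*arg_convs, result):
    """Build the Conv for a function type from the Convs of its parts."""
    def embed(host_fn):
        def interp_fn(*interp_args):
            host_args = [c.project(a) for c, a in zip(arg_convs, interp_args)]
            return result.embed(host_fn(*host_args))
        return ("fun", interp_fn)

    def project(v):
        interp_fn = _project("fun")(v)
        def host_fn(*host_args):
            interp_args = [c.embed(a) for c, a in zip(arg_convs, host_args)]
            return result.project(interp_fn(*interp_args))
        return host_fn

    return Conv(embed, project)

# Export a host function to the interpreter by stating only its type.
repeat = func(string, num, result=string).embed(lambda s, n: s * int(n))
out = repeat[1](("str", "ab"), ("num", 3.0))
print(out)
```

The glue code for each exported function shrinks to a one-line type description, and a type mismatch is caught at the boundary by the projection.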


Haskell FFI

A nicely designed FFI is Haskell FFI.

The specification doesn't show how to implement it though.

Thanks everyone

for the suggestions and pointers. Your suggestions have helped me refine my thinking. I've realised that I was trying to answer two distinct questions.

1) How do I 'import' existing C libraries into my interpreters? (the 'wrapping' problem, as Hammer puts it). I agree that using the host language's FFI (PLT Scheme's in my case) would give me this effect.

2) How does one implement an FFI?

The papers on PLT Scheme's FFI and the one on Lua-ML are almost exactly what I was looking for. I'll need to dig into source to see how exactly these are built (and once I understand how maybe I can write it up) but the papers are a great help. This forum is a great resource.

Thanks again.

See also the FFCALL libraries

There is a GPL library (or set of libraries), FFCALL, the most important being avcall "-- build a C argument list incrementally and call a C function on it." This is basically the inverse of stdarg.
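The idea of building an argument list incrementally and then calling through it can be sketched as follows. This does not use avcall; it uses Python's ctypes (built on libffi) as a stand-in, and it assumes a POSIX system where `ctypes.CDLL(None)` exposes the C library. `ArgList` is a hypothetical helper written for the example.

```python
import ctypes

# Assumption: POSIX system; CDLL(None) exposes the C library's symbols.
libc = ctypes.CDLL(None)

class ArgList:
    """Collect typed arguments one at a time, then perform the call."""

    def __init__(self, fn, restype):
        self.fn = fn
        self.restype = restype
        self.args = []

    def push(self, ctype, value):
        # Each argument is recorded together with its C type.
        self.args.append(ctype(value))
        return self

    def call(self):
        # The signature is only known once all arguments have been pushed.
        self.fn.restype = self.restype
        self.fn.argtypes = [type(a) for a in self.args]
        return self.fn(*self.args)

# Equivalent of the C call: strncmp("apple", "apply", 4)
pending = ArgList(libc.strncmp, ctypes.c_int)
pending.push(ctypes.c_char_p, b"apple")
pending.push(ctypes.c_char_p, b"apply")
pending.push(ctypes.c_size_t, 4)
result = pending.call()
print(result)
```

An interpreter is in exactly this position: the argument count and types of a foreign call are only known at run time, so the call cannot be written out statically in C.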

I have also adapted its reentrant trampoline code (callback/trampoline_r/trampoline.c) to have the Win32 callback functions call Perl subroutines.
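The callback direction (C code calling back into a routine written in the scripting language) can be sketched in the same way. This is not the FFCALL trampoline code; it uses Python's ctypes with the C library's `qsort`, and assumes a POSIX system where `ctypes.CDLL(None)` exposes the C library.

```python
import ctypes

# Assumption: POSIX system; CDLL(None) exposes the C library's symbols.
libc = ctypes.CDLL(None)

IntPtr = ctypes.POINTER(ctypes.c_int)

# C function-pointer type: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int, IntPtr, IntPtr)

def py_compare(a, b):
    """Comparison routine written in the scripting language."""
    return (a[0] > b[0]) - (a[0] < b[0])

# Wrapping the Python function yields something C can call like a plain
# function pointer. Keep a reference so it is not garbage collected.
compare_cb = CMPFUNC(py_compare)

libc.qsort.argtypes = [IntPtr, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), compare_cb)
sorted_values = list(values)
print(sorted_values)
```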

And libffi, as well

Searching around, I discover that gcc now ships with libffi, which is similar to, but different from, ffcall.

If you google "libffi ffcall", you can see discussion of using them and the trade-offs.

Neko FFI

I'm a big fan of the OCaml C FFI, the only bad point being that the GC doesn't scan the C stack, so you need to use macros that register parameters and local values as roots.

In Neko, this is not the case since the GC will scan the C stack. The Neko FFI documentation is pretty clear, and so is the implementation, since a lot of operations are declared in the header as macros and the others form a clear and regular API (inspired by OCaml).

I would avoid any kind of FFI that needs to implement primitives in terms of VM stack manipulation, since it tends to be both less clear and less efficient.
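The two styles can be contrasted with a toy sketch. Everything here is hypothetical (`VM`, `prim_add_stack`, `prim_add_direct`, `call_direct`), and a Python toy only illustrates the clarity point; it says nothing about efficiency.

```python
class VM:
    """Toy VM whose only state is an operand stack."""
    def __init__(self, stack=None):
        self.stack = list(stack or [])

# Style A: the primitive manipulates the VM stack itself.
def prim_add_stack(vm):
    b = vm.stack.pop()
    a = vm.stack.pop()
    vm.stack.append(a + b)

# Style B: the primitive is an ordinary function over values.
def prim_add_direct(a, b):
    return a + b

def call_direct(vm, fn, argc):
    """Stack shuffling done once by the VM, for every style-B primitive."""
    split = len(vm.stack) - argc
    args = vm.stack[split:]
    del vm.stack[split:]
    vm.stack.append(fn(*args))

vm_a = VM([2, 3])
prim_add_stack(vm_a)

vm_b = VM([2, 3])
call_direct(vm_b, prim_add_direct, 2)

print(vm_a.stack, vm_b.stack)
```

In style B the stack discipline lives in one place, so each primitive can be read and tested without knowing anything about the VM.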

two more

Two approaches to interfacing existing code to some kind of VM. Both pages are short and informal.

The function binding system used in the game 'Dungeon Siege':


Interfacing Win32 API and SIOD: