Embedded concurrent FPish languages?

Given the expanding pool of small multi-core chips (that search doesn't include all of them, e.g. ARM, MIPS, ...), what "advanced" (at least having garbage collection :-) languages can fit in a small memory space and yet support some form of concurrency which takes advantage of these various architectures? There are two things that would be neat: #1 a chip + language that Just Works but is affordable (e.g. a decent breadboard development kit is < $200), #2 a nice language which would somehow be relatively easy to port to these different architectures.

Is this a compelling thing for neat languages to target? Or are we doomed to hand-roll with C? :-)

I've poked around reading up on some of the usual suspects like Erlang, various Schemes, various concurrent C things, I guess there's OcamlP3L, but I haven't come across any of them clearly being already targeted or easily re-targeted at these whacky chips. (I guess maybe ARM would be the most likely to be targeted but haven't hit pay dirt yet.) Presumably two of the hold-ups are: #1 whacky instruction sets and #2 super small memory space. E.g. something like Spin is not-so-portable I guess. And I don't know how expensive the SEAforth stuff is, or how much I want to be working in Forth.

Also, debugging (sorry if an ad shows up first) might be "fun".

Maybe occam

Hm, well, if it worked for LEGOs... Dunno how portable it really would be to nutty things like the Propeller.


Occam is, as I think you're already aware, not really an "FP" language. Its sequential subset is pretty solidly imperative (more like Hoare's original 1978 CSP than the later version published in his textbook). The original occam doesn't support garbage collection either - largely because it statically allocates everything. Since occam-pi supports dynamic process creation and mobility, I imagine that it probably supports some kind of garbage collection (although I couldn't tell you what it uses). But I'm not sure if the Transterpreter (which is what you'd want to use to run occam on small micros) supports the full occam-pi language yet.

Good point

Sorry, I should be less completely random and imprecise when I talk about these things. I guess if there were something Erlang-esque (I'd heard of Termite and thanks for the reminder (below), I'll go look again) which is functional-but-with-easy-mutation that might bring nice properties to the development process. On the other hand, if Occam tools can help prevent painting myself into a deadlock corner, that's pretty attractive in my mind.

Breaking the deadlock

Occam by itself won't prevent you from generating deadlocks (although there are some common process-interaction patterns used within the occam community that allow you to avoid deadlock - Jeremy Martin's PhD thesis documents several of them). However, since occam has much in common with CSP it's fairly easy to model the process interaction in your occam program using CSP, and then apply a tool like FDR2 or Jeremy Martin's Deadlock Checker to verify freedom from deadlock (FDR2 also lets you verify a lot of other properties as well).

Re: process interaction patterns

Some day I would like to know or determine whether the patterns in Occam, OTP, P3L, and any others are similar or surprisingly different. I wonder how much of the difference is due to language differences, or whether there are patterns present in one and missing in another which could be successfully brought over.

Some of the low-level

Some of the low-level Haskell stuff might be relevant. They describe the high-level architecture using Haskell, type check it, try to prove whatever properties they're interested in, and then provide a semantics-preserving translation to C. I wish I had time to look for the references, but they're here on LtU somewhere, so I'm sure someone else will dig them up. They've performed this process with some Windows drivers, I believe. Ada is also good, but it's not functional.


Termite is designed after the Erlang model; using Gambit-C it compiles to C.


Have you seen Hume? It's a statically typed, mostly functional language. The runtime footprint for programs is less than 50 KB. Apparently, it has been discussed earlier in another forum topic.

Embedded Gofer

An older project, but possibly worth looking at: Malcolm Wallace produced a version of Gofer that was extended with I/O register access; communicating processes; interrupts; and a real-time incremental garbage collector. His doctoral thesis on the topic can be found here, and a download of "Embedded Gofer" is available from his web-site.

Some more people seeing the light?

Good to see more folks getting the word out about implicitly parallel approaches, which seems better than sticking with the shared-mutable-state thing. Oh, wait! They are researchers who made an implicitly parallel Haskell, which I guess isn't targeted at teeny machines.

Well, more commercial things coming out of the embedded world, at least. Hopefully something in there isn't just a hacked up C/++ but something more clean and fun.

But is it an oncoming train?

I'm skeptical that implicitly parallel (or concurrent) programming is the way to go for embedded systems. There are two reasons for my skepticism:

  1. Predictability - we have enough trouble getting predictable timing out of existing systems, particularly when you throw in nifty compiler optimizations, caches, etc. Implicit parallelization seems like yet another opportunity for the compiler to do things that make it even harder to predict timing (perhaps it would help if the language had syntax or annotations that let us express timing constraints the compiler would be required to fulfill).
  2. World/Machine mapping - the real world is parallel, and explicitly expressing the concurrency needed to respond to real-world events tends to lead to a clearer system architecture. It's also (in my experience) conceptually easier to write the code. I think it's noteworthy that the "dramatic example" of producer-consumer implicit parallelism in the embedded.com article you link to looks suspiciously like a process- or actor-oriented description of the problem (i.e. it's pretty much explicitly parallel, and identifies the actors in what sounds like a synchronous dataflow network).
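That actor-ish reading of producer-consumer is easy to make concrete. Here's a minimal sketch (mine, not from the article; names are illustrative) in Haskell, expressing the two parties as explicitly communicating processes over a channel from GHC's Control.Concurrent:

```haskell
-- Producer-consumer as two explicitly communicating processes.
-- A sketch only: a real embedded system would use bounded buffers
-- and care about timing, which plain Chan does not.
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM)

main :: IO ()
main = do
  chan <- newChan :: IO (Chan Int)
  -- Producer: pushes a fixed stream of samples into the channel.
  _ <- forkIO $ forM_ [1 .. 5] (writeChan chan)
  -- Consumer: drains the channel and reports what it saw.
  xs <- replicateM 5 (readChan chan)
  print (sum xs)
```

The point is just that the "implicit" parallelism in the article is already written down here as an explicit process network, which matches the world/machine-mapping argument above.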

Joy (or Cat)

I believe an implementation or dialect of Joy, which resembles a functional Forth, could be appropriate for embedded systems development. My own language Cat, which is a typed language derived from Joy, might also have some promise in this area. I have been working on an implementation of Cat appropriate for embedded systems, but I don't have anything ready to share with the public at large. If you are particularly interested in this sort of thing feel free to contact me and I'll share what I've got so far.

Speaking of whacky instruction sets, you might want to see a rather hare-brained proposal I made on the concatenative mailing list for a SEAforth-inspired 4-bit instruction set.

O'Caml, with effort

It has apparently been asked before, and with some hackery O'Caml could be a contender.


I don't know if it aims at the embedded segment, but Manticore is definitely for multicore machines.

Stream-processing: StreamIT, Flask, Regiment/WaveScript

I strongly believe in stream-processing languages for this sort of thing (languages where the program is an explicit dataflow graph that can execute in parallel). I hadn't seen OcamlP3l before; it looks to fall within this family. I'm kind of surprised that it didn't go further. Maybe it was just a little too early.

Anyway, there have been a bunch of functional approaches to streaming and dataflow computation. Id and pH (mentioned by raould above) were implicitly parallel and usable for this sort of thing, though too general purpose.

Functional reactive programming (FRP, discussed here) is a high-level functional approach, which is nice, but not high performance, embedded, or parallel.

Recently, there've been a slew of research stream languages: BrookGPU, Sequoia, and StreamIT. StreamIT can run on all manner of things, tiled architectures, Cell, and GPUs are in-progress. But they haven't particularly focused on embedded use. And it's not functional, or very high-level (static memory allocation).

Flask targets TinyOS, in distributed sensor networks. But none of the "motes" running TinyOS have any concurrency internally.

My own contribution, Regiment, which became WaveScript, is a high-level, ML-like stream-processing language that attempts to capture some of the benefits of FRP while having high-performance parallel and distributed execution. We use a multi-stage evaluation to reduce the source (FRP-like) program to a dataflow graph that can be scheduled in parallel.
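To make the "program as an explicit dataflow graph" idea concrete, here's a toy Haskell sketch (illustrative only; this is not WaveScript or Regiment syntax): each operator is a stream transformer, and the composed pipeline is the graph a scheduler could partition across cores.

```haskell
-- A toy stream-processing pipeline: source -> window -> detect -> sink.
-- Streams are modeled as lazy lists; the operator names and the
-- threshold are made up for illustration.
type Stream a = [a]

-- Chop the stream into non-overlapping windows of n samples.
window :: Int -> Stream Double -> Stream [Double]
window n xs
  | length xs >= n = take n xs : window n (drop n xs)
  | otherwise      = []

-- Fire when the energy (here just the sum) of a window exceeds a threshold.
detect :: Double -> Stream [Double] -> Stream Bool
detect thresh = map ((> thresh) . sum)

-- The dataflow graph is just operator composition.
pipeline :: Stream Double -> Stream Bool
pipeline = detect 10 . window 4

main :: IO ()
main = print (pipeline [1,2,3,4, 5,5,5,5, 0,0,0,1])
```

Each stage here is independent of the others except through its input stream, which is exactly the property that lets a runtime schedule the stages in parallel.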

In our first deployment (a sensor network to acoustically localize marmots), we ended up using our MLton code generator. We ported MLton to the ARM processors that we were using. MLton has a number of advantages over OCaml in this space. Since then we've moved to a C-backend so we can play around with GC strategies and such.

We've targeted multicore machines, and embedded machines, but not yet embedded multicore machines.

Sounds fun

Cool! Glad to hear there's some activity around these things. With respect to PLT/implementation, can you perhaps summarize the challenges of getting a high-level language to work on a resource-constrained system? E.g. are there HLLs which seem to map down more easily? Presumably something which cons's inefficiently would not be good (unless you can get tricky with mem reuse, I guess). How can we have our cake and eat it etc.

Functional reactive

Functional reactive programming (FRP, discussed here) is a high-level functional approach, which is nice, but not high performance, embedded, or parallel.

There was at least some experimentation with real-time and embedded FRP, e.g. this.

My own contribution, Regiment, which became WaveScript, is a high-level, ML-like stream-processing language that attempts to capture some of the benefits of FRP while having high-performance parallel and distributed execution. We use a multi-stage evaluation to reduce the source (FRP-like) program to a dataflow graph that can be scheduled in parallel.

I mentioned Regiment several years ago, but it did not seem to garner interest then.


The folks at PacSoft are mean :-) in that they do all sorts of interesting things (like, also, Timber) but those things don't seem to get too far beyond the academic boundary?

Real-Time or not?

You mention garbage collection and multicore; I'm under the impression that there aren't that many 'real' (not just research) GCs which are multi-CPU friendly, i.e. not 'stop the world' when doing a garbage collection on one CPU.

And if you additionally need a real-time GC, then maybe the answer is none... Of course I'd be happy to be proven wrong.

No hard requirements

Sounds true. I'm not personally looking for hard real-time + GC + multicore, although if that existed and wasn't a bear to use that could be really keen. (I'm not sure what differences there might be between multi-core and multi-cpu, by the way; are they possibly different beasts sometimes?)

If, say, Erlang could fit, it would be fine with me to have 1 Erlang per core/CPU, with their own memory areas, and use message passing among those core-bound VMs. I think Erlang is claimed to be soft-real-time, whatever that means :-)

Similar thing could go for Scala+Actors+Remoting or OCaml+MPI (yuck on both?) or whatever, I guess. So maybe it comes down to: a small enough friendly language that has a decent library for message passing... but I don't expect anything like that would really fit on the smallest of devices like the Propeller. That thing is pretty darned small, it seems. Well, maybe somehow PICBIT Scheme or something could be applied? Or the previously mentioned Transterpreter.
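As a rough illustration of the "one VM per core, message passing between them" shape, here's a Haskell sketch using forkOn to pin a worker to a capability as a stand-in for a core-bound VM. This is illustrative only: GHC threads share one heap, unlike separate per-core Erlang instances, and the message values here are made up.

```haskell
-- One "core-bound VM" modeled as a thread pinned to capability 0,
-- talking to the main thread purely by message passing over MVars.
-- (Pinning only matters under the threaded RTS; the logic is the
-- same either way.)
import Control.Concurrent (forkOn)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

main :: IO ()
main = do
  inbox  <- newEmptyMVar
  outbox <- newEmptyMVar
  -- The "VM" on core 0: receive a message, do some work, reply.
  _ <- forkOn 0 $ do
         msg <- takeMVar inbox
         putMVar outbox (msg * 2 :: Int)
  putMVar inbox 21        -- send a request to the core-bound worker
  answer <- takeMVar outbox
  print answer
```

With their own memory areas per core, as suggested above, the MVars would be replaced by whatever inter-core mailbox the hardware provides, but the programming model stays the same.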

So, the question and the answers are very "it depends" upon the actual hardware available.


I realized I misspoke above as there has been some work on real-time and parallel FRP.

For parallel GC, and real-time GC, I recommend reading Erez Petrank and David Bacon's publications (respectively). Petrank's 2007 "Stopless" paper describes a real-time *and* parallel collector.

Sure, many of these are "research" systems. But I also think that the Metronome stuff from IBM Research (Bacon et al.) gets used pretty seriously. It was discussed on LtU here.


Thanks for the reference to Stopless, I'll read the paper.

As for Metronome, it's real-time, sure, but I doubt very much that it scales for SMP loads, given that IBM doesn't talk about SMP at all in their presentations...