Pico Lisp: A Case for Minimalist Interpreters?

Pico Lisp: A Radical Approach to Application Development describes a minimalist Lisp interpreter which has been used to build real applications for ~20 years. If you ignore the tendency toward self-aggrandizement, there's some food for thought here. Pico Lisp has native support only for symbols, integers, and lists (no arrays, no structures), and is implemented as a straightforward interpreter. Yet in the benchmarks (usual disclaimers here), Pico Lisp fares well against a compiled Lisp. Is there a case to be made for ultra lean and mean interpreters as a viable alternative to compilers?

Note that I've had years to

Note that I've had years to indoctrinate myself in the virtues first of R4RS, then CL in general, then SBCL in particular, so take my language bigotry with a grain of salt. But that said...

The "compiled Lisp" you refer to happens to be a byte compiler, producing interpreted byte code. It's not nonsense to call it compiled: people call javac a compiler, too. But to use it as the only representative for the performance of compiled Lisp has a strawman feel to it: it's a bit like choosing to judge "compiled OCaml" by ocamlc instead of ocamlopt.

I think the ultra-austere set of types probably needs more justification than the benchmarks in the paper, too. Once you go beyond stripping the numeric tower out of R4RS, I think you're well into the regime where you need to get serious about convincing people that the missing types (vectors, hash tables...) aren't important. And I don't think benchmarks which happen to be problems where even in R4RS no one would use vectors or hash tables qualify as getting serious.

Admittedly lots of classic performance-critical operations can be done through the FFI: the 1988 pixel manipulations the paper mentions, and also modern things like compression, encryption, numeric code (FFTs, matrices, special functions...), and pattern matching. But lots of other things resist that: e.g., compilers/linkers/assemblers, and various AI-ish algorithms (e.g., code from Norvig's _Artificial Intelligence: A Modern Approach_ or _Paradigms of Artificial Intelligence Programming_). And even when FFI works it can be a chore: hard/soft layers are a fine idea, but it can be especially easy and flexible to have them in the same language.

So there seem to be four performance points: (1) Perl and Python and CLISP and hugs are plenty fast enough for lots of things, (2) R4RS had far too many non-numeric data types, (3) compilation is compilation (native code generation is beside the point, so CLISP is a fair representative of compiled performance), and (4) compilation isn't worth it.

The first point is uncontroversial. I am facing writer's block on an appropriate way to respond to the third point. I'd find the second and fourth point a lot more convincing if the author were to take on a native compiler (not a byte compiler) in something which naturally compiles to tight loops (not loops which cons a new cell on each pass) and which doesn't happen to be "well, duh, of course you don't need vectors or hash tables for that." I'd nominate an FFT (because that's a reasonably manageable piece of code to write as a standalone benchmark), BDDs (because they're not *too* much code, and because they aren't so obviously contrived to need data types other than cells), regexp matching (because I have seen with my very own eyes CLers being quite smug about the performance of CL-PPCRE), or any number of things from Norvig's books.

"it's a bit like choosing to

"it's a bit like choosing to judge "compiled OCaml" by ocamlc instead of ocamlopt." Except that ocamlc is only about four times slower than ocamlopt on my machine, while CLisp is something like ten times slower than SBCL. Roughly, of course.

Dynamic binding

PicoLisp, it seems, is a classical dynamic-binding interpreter using a shallow binding strategy. That makes it interesting for playing around with small dynamic-binding interpreters, for one thing.

I wonder if one could add a "special" declaration and a "function" quote (for funargs) to PicoLisp; the interpreter would ignore them, and an offline checker would investigate whether any nonspecial variables were being used outside their static scopes. Anyone have any ideas about that?
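For anyone unfamiliar with the shallow-binding strategy mentioned above, here is a toy sketch in Python (not PicoLisp's actual code; all names are made up). Each symbol has a single current value cell, so lookup is a plain read; entering a dynamic binding saves the old value and restores it on exit:

```python
# Minimal sketch of shallow dynamic binding (not PicoLisp's actual code):
# each symbol has one "value cell"; entering a binding saves the old value
# on a stack and restores it on exit, so lookup needs no environment walk.
# (Toy simplification: None doubles as "unbound".)

values = {}          # symbol -> current value (the shallow value cells)
save_stack = []      # saved (symbol, old value) pairs for restoration

def bind(sym, val):
    save_stack.append((sym, values.get(sym)))
    values[sym] = val

def unbind(n):
    for _ in range(n):
        sym, old = save_stack.pop()
        if old is None:
            values.pop(sym, None)
        else:
            values[sym] = old

def lookup(sym):
    return values[sym]       # O(1): just read the value cell

bind("x", 1)
def show_x():
    return lookup("x")

bind("x", 2)                 # callee's binding shadows the caller's
assert show_x() == 2         # dynamic scope: sees the latest binding
unbind(1)
assert show_x() == 1         # caller's binding restored on exit
```

The offline checker proposed above would then just have to verify, per the "special" declarations, that every non-special variable reference stays inside its static scope.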

I second William's sentiment...

...using a bytecode compiler without describing it clearly as such makes it hard to give credence to the other points made in this article.

I think there's a strong case to be made for a clean, simple interpreter like Pico Lisp, but that case would also need to contain the phrase "yes, performance will take a hit" or it will appear disingenuous, IMHO.

Lean and mean

I get the feeling that too much of the mean-ness - i.e. speed - is being attributed to the lean-ness - a spartan data type set. That's not necessarily true. You *can* have vectors, hashtables, and the R5RS lot without making the interpreter's execution any more complex than it would be without these types.

The only place where the lean-ness might have considerable impact on the mean-ness in an interpreter is the numeric tower, and this is because processors work natively with certain number types only. If you abstract away numbers and the various numeric types as generic objects, you may have to give up the performance advantage of a cell directly encoding, say, a 32-bit integer or float. A way to get over this is to provide numeric vectors and fast native operators that work on them - in which case you get more mean-ness in spite of sacrificing lean-ness.
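A rough illustration of that trade-off (hypothetical names, Python standing in for an interpreter's internals): summing boxed number objects pays a dispatch and an allocation on every element, while a "native" vector operator dispatches once for the whole packed vector:

```python
# Hypothetical sketch of boxed-number cost vs a native vector operator.
import array

class Num:                     # a "generic object" wrapping a machine number
    def __init__(self, v):
        self.v = v
    def add(self, other):      # dispatch + allocation on every operation
        return Num(self.v + other.v)

def sum_boxed(nums):
    acc = Num(0)
    for n in nums:             # one method dispatch + one new box per cell
        acc = acc.add(n)
    return acc.v

def sum_native(vec):
    # one "operator" applied to a packed numeric vector: no boxing at all
    return sum(vec)

boxed = [Num(i) for i in range(1000)]
packed = array.array('i', range(1000))   # 32-bit ints stored contiguously
assert sum_boxed(boxed) == sum_native(packed) == 499500
```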

Author posted a challenge on comp.lang.lisp

succinctness is false

http://esoteric.voxelperfect.net/wiki/HQ9_Plus

Definition of succinct misunderstood

Many people seem to believe that 'succinct' is a synonym for 'compressed', regardless of the comprehensibility of the result, but that's not the case. The American Heritage Dictionary defines 'succinct' as "Characterized by clear, precise expression in few words". That's the sense in which the word is being used in this context.

Piccola

I am intrigued by the approach used in the Piccola language which provides no data types at all! (edit: other than forms)

(obligatory fixed point: I mentioned Piccola once before on LtU)

LISP lists = Piccola forms?

Take LISP, replace lists with tuples, add names to members of tuples and you have Piccola forms. Am I missing something?

Lists and tuples are equivalent anyway, aren't they? They only have a small implementation difference: the function used to navigate from one member to the next is different.
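A toy illustration of that "small implementation difference" (Python pairs standing in for cons cells): walking a cons list follows cdr pointers, while walking a tuple increments an index:

```python
# Toy illustration: the same sequence as a cons list vs a flat tuple.
cons_list = (1, (2, (3, None)))   # (car, cdr) pairs; nil = None
tup = (1, 2, 3)

def nth_cons(lst, n):
    while n > 0:                  # navigate by following cdr pointers
        lst = lst[1]
        n -= 1
    return lst[0]

def nth_tuple(t, n):
    return t[n]                   # navigate by direct indexing

assert nth_cons(cons_list, 2) == nth_tuple(tup, 2) == 3
```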

It's still pretty large

I was sort of expecting something along the lines of the insane tiny Forth implementations that are out there. I mean in theory a Lisp interpreter could be absolutely tiny.

Alas, this "pico" Lisp is actually larger than Lua (about twice the size in fact).

Agreed

I agree that it's still large in comparison to Forth (of which I am very fond), but I haven't seen much in the way of tiny implementations of languages that aren't either low-level (like Forth) or teaching toys (like implementations of Scheme in one page of code).

I'd love to see some examples of ultra-tiny Lisp interpreters. With Forth, I've gotten to the point where I could write the core in a few hundred bytes (not an exaggeration), but in the process I move a lot of concerns from the language implementer to the user. A tiny Lisp might be a better route to take.
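For a sense of the scale involved, here's a toy sketch of such a core in Python (symbols, integers, and lists only, and lexically scoped unlike PicoLisp; everything here is made up for illustration):

```python
# A toy "ultra-tiny" Lisp evaluator: symbols, integers, and lists only.
import operator

GLOBAL = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def ev(x, env):
    if isinstance(x, str):                 # symbol: variable lookup
        return env[x]
    if not isinstance(x, list):            # integer: self-evaluating
        return x
    op = x[0]
    if op == "quote":
        return x[1]
    if op == "if":
        return ev(x[2], env) if ev(x[1], env) else ev(x[3], env)
    if op == "lambda":                     # close over the defining env
        params, body = x[1], x[2]
        return lambda *args: ev(body, {**env, **dict(zip(params, args))})
    f = ev(op, env)                        # application
    return f(*[ev(a, env) for a in x[1:]])

# ((lambda (n) (* n n)) 7)  =>  49
assert ev([["lambda", ["n"], ["*", "n", "n"]], 7], GLOBAL) == 49
```

As with tiny Forths, the real size cost hides in the reader, the printer, and memory management rather than in eval/apply itself.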

Old thread, but I just took

Old thread, but I just took a look at the language spec and was also quite alarmed at the size of this "little" Lisp dialect. Good takeaway for me - a limited set of data types does not by itself a small language make.

Tiny languages for Tiny machines

Is there a case to be made for ultra lean and mean interpreters as a viable alternative to compilers?

Yes!

I am working with small 8-bit microcontrollers these days (AVRs in C) which, while pretty speedy, are severely constrained memory-wise (16-256K flash for code; 1-8KB SRAM for data). I do most of my work in C, but would love to see a truly lean interpreter to use to play around with hardware (sensors, LEDs, etc).

My daily programming consists of: modify, compile, download, test, modify, compile, download ... etc

Forth is often the answer, but it is typically coded in assembler and I tend to move around between different controllers in the AVR family, so pure ANSI C is preferable.

But (outside my own strange needs), why a dynamic, high-level interpreted language for tiny microcontrollers? Well, there is a new language customer out there: robot hobbyists! Their current choices: BASIC and C.

Now, getting something like Lisp to live in 128K code space and 4K of RAM (AVR ATMega128) would be a truly minimalistic feat!

No problem

Now, getting something like Lisp to live in 128K code space and 4K of RAM (AVR ATMega128) would be a truly minimalistic feat!

That's pretty doable. See e.g.:

1. BIT: A Very Compact Scheme System for Embedded Applications: "With this system, we demonstrate that it's clearly possible to run Scheme programs on a microcontroller with 64 KB of memory such as the Motorola 68HC11. Executables that include the whole library can be as little as 22 KB."

2. XS: Lisp on Lego MindStorms

3. LEGOScheme (project may be moribund)

4. Not Lisp, but the Transterpreter implements the occam-pi language, which is modeled on the CSP calculus.

AVRs use the Harvard architecture

The big problem with porting these tiny languages is that the AVR uses a Harvard architecture (the ones you mention seem, for the most part, to expect a von Neumann architecture). So only 4K (in the case of the ATMega128) is available for the heap (minus call stacks, etc).

As bad as this sounds, the AVR stuff is pretty cheap and fast (and is fairly popular in embedded and gaining popularity in hobby robotics). The program space flash (128K on the ATMega128) is writable, so I've seen one implementation of Forth "compile" words into the flash. Still, for languages requiring GC, 4K isn't a lot to work with. (Ironically, the processor runs up to 16 MIPS -- so doing GC time-wise probably wouldn't be too painful).
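To get a feel for what GC over a heap that small looks like, here's a toy mark-sweep collector over a fixed pool of cons cells (Python standing in for what would be C on the AVR; all names are hypothetical, and a real version would tag pointers instead of relying on atoms being strings):

```python
# Toy mark-sweep GC over a fixed pool of cons cells, as one might write in C
# for a 4K heap (Python used only for illustration). Atoms are strings here;
# plain integers denote cell indices.
POOL = 512                        # e.g. 512 cells * 8 bytes = 4K
car = [None] * POOL
cdr = [None] * POOL
marked = [False] * POOL
free_list = list(range(POOL))     # indices of free cells

def cons(a, d, roots):
    if not free_list:
        collect(roots)            # GC only when the pool is exhausted
        if not free_list:
            raise MemoryError("heap full")
    i = free_list.pop()
    car[i], cdr[i] = a, d
    return i

def mark(i):
    while isinstance(i, int) and not marked[i]:
        marked[i] = True
        mark(car[i])              # recurse on car...
        i = cdr[i]                # ...iterate on cdr to bound stack depth

def collect(roots):
    for r in roots:
        mark(r)
    for i in range(POOL):         # sweep: reclaim unmarked cells
        if not marked[i] and i not in free_list:
            free_list.append(i)
        marked[i] = False

roots = []
lst = cons("a", cons("b", None, roots), roots)
roots.append(lst)
collect(roots)                    # only the two live cells survive
assert len(free_list) == POOL - 2
```

At 16 MIPS, sweeping a few hundred cells like this really is cheap; the pain is that the pool is all the data space you get.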

PICBIT

When I posted that comment, I was actually thinking about PICBIT: A Scheme System for the PIC Microcontroller, but I forgot that there were two papers and didn't realize that the latter one was more appropriate (I've worked with PIC and Ubicom/Scenix chips, but not the Motorola ones). The PICBIT paper talks about running with 2K of RAM.

Compiler vs Interpreter

I just scanned the paper; it looks very interesting. But it seems to be describing running a byte code interpreter (VM) on the PIC. I am more interested in interactivity. What would you have to strip from a Lisp in order to have a very simple (but extensible) "interpreter" running in ROM (plus just 2-4K of RAM)? *Potential answer: Just about everything that makes Lisp interesting and useful ;-)

Sure, parsing and lexing would be expensive (but speed is not needed -- the language would be a "control" or "shell" language for the microcontroller). What I want is a way to "script" a microcontroller.

Why would you want an

Why would you want an ordinary interpreter on a system that size? Would being able to take in new parsed sexps be sufficient?

A DSL, really..

Not to hijack this thread, but I stumbled here while looking around for interpreter ideas for a small project: I've got an AVR microcontroller board controlling an iRobot Create. While tethered to a PC, I want a "casual" programmer to be able to try out a few things interactively (rather than install a full C toolchain).

For the actual autonomous action, scripts would be loaded onto an SD card connected to the AVR. Aside from the debugging capabilities I would have by putting the interpreter on the AVR, I like the idea of *not* relying on a PC environment.

However, you are right, I wouldn't want an "ordinary interpreter". I want an interpreter engine upon which to build a DSL specifically geared toward driving and reacting to the Create.

My first thought was to go with a very minimalistic Tcl-like interpreter (but Tcl is geared toward strings, and I have no use for strings).

Try building yourself a tiny

Try building yourself a tiny cons+atoms lisp with a binary rep to get started with? If you don't mind keeping the initial prototypes PC-bound you should be able to do that in a day or so in an FPL without having planned it out beforehand, though it won't be the most efficient interpreter/VM ever.
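One way to read "binary rep" (my interpretation, not the poster's code): make every value a single machine word, with a tag bit distinguishing small-integer atoms from cell references into a flat heap array. A toy sketch:

```python
# Toy "binary rep" for a cons+atoms Lisp: every value is one 32-bit word,
# with the low bit as a tag (1 = small integer atom, 0 = cell reference).
import array

heap = array.array('I', [0] * 2 * 256)   # 256 cells, two words each

def make_int(n):     return (n << 1) | 1      # tag small integers
def int_val(w):      return w >> 1
def is_int(w):       return w & 1 == 1

next_free = [0]                               # bump allocator (no GC here)
def cons(a, d):
    i = next_free[0]; next_free[0] += 1
    heap[2*i], heap[2*i + 1] = a, d
    return i << 1                             # cell refs have tag bit 0

def car(w): return heap[w]        # w == 2*i exactly because the tag bit is 0
def cdr(w): return heap[w + 1]

NIL = make_int(0)                 # reuse integer 0 as nil for this sketch
lst = cons(make_int(7), cons(make_int(8), NIL))
assert is_int(car(lst)) and int_val(car(lst)) == 7
assert int_val(car(cdr(lst))) == 8
```

With that in place on a PC, eval/apply over the word representation is a day's work, and porting to C for the target is mostly mechanical.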

Scheme for an AVR

We have a Scheme implementation for AVR-based Mica-series wireless sensor nodes. It's entirely interpreted, with primitives written in C, and pages to serial flash for virtual memory. Programs are stored as lists, and can in fact transmit themselves over the wireless network; if you don't need this feature, it should be possible to change the program representation to something more compact. The interpreter can run on a PC or a sensor node.
[actornet paper] [actornet software]

Linux for $100 and change

I dabbled in robots about five years ago and again last year, and I might do it again.

Five years ago, I had thoughts like yours, although AVR in-system-programming makes downloading so convenient that I was never seriously tempted to bring my 1980s-era Forth skillz up to date.

These days, though, you can get something like Gumstix, which (for $109 at the price I just surfed to, or $154 if you choose to hang your Bluetooth adapter there) gets you Linux and 64M RAM. There is a reasonably-complicated robot problem, popular here in Dallas, that is my mental benchmark for entry-level robot design (gathering up cans). I think today the natural place for AVRs in a flexible robot in that class would be to hang the AVRs from some sort of bus (TWI, e.g.) running to a Gumstix-class Linux+wireless board. Then unless you just enjoy the achievement of building your control hardware from $18 of parts (which I grant is pretty neat), running more than device-driver-level software on an AVR seems unnecessary. And I like working in Lisp, but I don't mind writing C or assembly for the kinds of 1-page-of-code device drivers that I envision on the AVRs. ("Device drivers" for midrange embedded microcontrollers, the kinds of devices that include big chunks of network stacks or polygon-shading libraries or filesystems or whatever, might be a different matter.)

Linux is not tiny enough ;-)

When it comes to low power consumption, nothing beats a simple 8-bit micro without an OS. GC can be a power hog too (as can just managing a malloc/free heap), so maybe that is why Forth always seems to find a home on the smallest processors.

The question I am really pondering is how small can a high level language get before it loses all its "character" and usefulness. Specifically, I am talking about target hosted interpreters (not PC hosted compilers).

Good Opening

This is not directly relevant to your post, but is interesting nevertheless and is in this vein. I've run across Adam Dunkels before (probably due to Protothreads which I'm fairly certain have been discussed on LtU before). His work is pretty cool.

'should be entertaining, especially to those of us who like to do things like implementing raytracers in 256 bytes of object code. Oh, and useful too; yes, it should be useful as well.

a bit late but,

You may want to check out Joy (a cousin of Forth) with an implementation in ANSI C.

$ du -ksh joy
96K     joy

speaking of small LISPs...

Todd, your posts reminded me of my first real programming project, at the Oregon Museum of Science and Industry. I modified a LISP 1.5 interpreter written in assembler for the PDP-8, to use both banks of memory (4K words each). Which is to say, the original version loaded and ran modest programs in 4K (12-bit) words. So I would think you could do something with 2-4K of RAM, since it would *all* be available for LISP programs and data, and you have by comparison a huge space for the interpreter. There is even a trick called cdr-coding that would use that RAM more efficiently, with a little more work from the interpreter.