LinearLisp: a proof of concept in full linearization for AST-walking interpreters

Or, how to walk an AST with (almost) no references on the heap and how it pays off:

https://github.com/ysharplanguage/LinearLisp

Implementation: C# / .NET 6.

Comes with basic microbenchmarks for some contenders: Microsoft's PowerShell, Microsoft's JScript, and Python 3.10+


PowerShell is so darn slow,

PowerShell is so darn slow that you can't even afford to use the same upper bounds / constants as in the other interpreters' microbenchmarks, if you're just curious to see the timings of all four scripting languages in under a minute or so.

Disclaimer & caveats

Still a pretty raw experiment. Unit tests are still embarrassingly missing. It's also still unclear to me how the performance gains observed so far vs. its (non-linearizing) predecessor (*) may or may not scale with larger or much larger source programs - I need to check on that.

(* i.e., roughly 5x to 8x speedups on the same machine, same C# compiler, same microbenchmarks)

What is "full linearization for AST-walking interpreters"?

Can you explain what linearization means in this context, and what the relevance is? The GitHub repo doesn't explain much.

Sure, in a nutshell...

Sure, in a nutshell...

The basic idea isn't exactly new, and no doubt it's been done in other implementation languages, but I was curious to see how it would fare implemented in C#, with as little compromise as possible with respect to the central representation choice.

Tbh, I started off pretty skeptical, more or less expecting that it would in fact be a rather silly thing to do - that it wouldn't make much of a positive difference, if any at all, or might even make things worse.

As it turns out, you never know until you actually try things out!

So here it goes:

You collapse the whole AST into a single big, flat, one-dimensional array, over which your interpreter's eval / apply evaluation core runs a cursor.

Thus, all the references to objects on the heap (formerly nodes, children lists, attributes, etc.) which used to be the first concrete runtime representation of your AST are now gone.

Indeed, the whole "AST" is now just an array of integers. My intuition was that it might be - who knows? - friendlier to CPU cache locality for optimized, compiled C# to "eat" only a single int[] array (or almost exclusively so, anyway) for the sake of walking the AST, rather than chasing the more commonplace bunch of references to reference types, or to boxed value types (for literals). That's what we typically end up with when we take the "tree" in "AST" literally and "naively" design our representation data structures around it (i.e., again, with representation types for "nodes", "child lists", "attributes", etc.).
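To make the cursor idea concrete before getting into LinearLisp's actual layout, here is a minimal Python sketch of the technique under a deliberately simplified encoding of my own (the LIT / ADD / MUL opcodes are hypothetical, not LinearLisp's format): the whole "AST" is one flat list of ints, and evaluation is a recursive walk over a cursor, using the host's call stack.

```python
# A minimal sketch (NOT LinearLisp's actual encoding): a tiny arithmetic
# language whose entire "AST" is one flat list of ints, walked by a cursor.
LIT, ADD, MUL = 0, 1, 2  # hypothetical opcodes for this sketch only

def eval_flat(code, pos=0):
    """Evaluate the subtree starting at `pos`; return (value, next position)."""
    op = code[pos]
    if op == LIT:
        return code[pos + 1], pos + 2
    left, pos = eval_flat(code, pos + 1)   # recurse on the host call stack
    right, pos = eval_flat(code, pos)
    return (left + right if op == ADD else left * right), pos

# (41 + 1) * 2, linearized in prefix order:
program = [MUL, ADD, LIT, 41, LIT, 1, LIT, 2]
value, _ = eval_flat(program)
print(value)  # 84
```

No node objects, no child lists: the only heap object in sight is the one array itself.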

So, in LinearLisp, this expression / term:

( n + 1 )

becomes:

12, 3, 4, 5, 6, -1, 0, 18, 0, -15, 0, -20

From left to right:

12 gives the total subtree length to come;

then 3 is its arity (i.e., the number of children: "n", "+", "1");

then 4, 5, 6 are the relative offsets to each child, from left to right (each offset being relative to its own cell): 4 points to the "0, 18" that follows - i.e., the "n" symbol, at index 18 in the current environment / entry in the symbol table;

then 5 points to the "0, -15" that follows - the builtin "+" operator (note the negative sign for builtins vs. programmer-defined identifiers);

then 6 points to "0, -20", finally - that'd be the literal constant "1" for that particular source program, interned in the constant pool (also a negative index by convention, since you can shove both interned literals and language builtins into the same pool if you conventionally agree to allocate the literals' indices beyond those used for builtins);

finally, the -1 in the middle is a redundant marker ending the children offset list, which is only useful for variable-arity builtins that may not want to bother comparing the counts of formal vs. actual arguments.
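Reading that layout back, a few lines of Python can decode the children from the offsets. The one assumption here (consistent with the arithmetic of the example: cell 2 + 4 = cell 6, etc.) is that each offset is relative to its own cell:

```python
# Decoding the `( n + 1 )` example above. Assumption (consistent with the
# arithmetic of the example): each child offset is relative to its own cell.
linear = [12, 3, 4, 5, 6, -1, 0, 18, 0, -15, 0, -20]

length = linear[0]           # 12: total length of the subtree
arity = linear[1]            # 3: number of children
children = []
for k in range(arity):       # the offset cells sit at indices 2, 3, 4
    cell = 2 + k
    target = cell + linear[cell]
    children.append((linear[target], linear[target + 1]))

print(children)  # [(0, 18), (0, -15), (0, -20)]
# (0, 18)  -> the symbol "n", index 18 in the symbol table
# (0, -15) -> the builtin "+" (negative index by convention)
# (0, -20) -> the interned literal 1 (also negative, beyond the builtins)
```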

That's then all that is needed for an "AST walk" without any heap or garbage collector getting in the way - strictly speaking of the AST walk per se, anyway; of course, you're still going to need a heap (under obsessive GC surveillance) for your dynamic environments, closures, etc. - while still being able to eval anything just by using the host implementation language's "natural" call stack.

Cf. the Linearize and Rehydrate methods in the code; together they implement the bijection between the original S-expression (the initial AST, right after parsing) and the linearized version, which the evaluator eventually consumes as just one one-dimensional array.
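For readers without the C# at hand, a toy Python analogue of such a Linearize / Rehydrate pair might look as follows. This is a simplified length-prefixed scheme of my own for illustration, not the repo's exact format: atoms become (0, index) pairs, inner nodes become a length, an arity, and the flattened children.

```python
# A toy analogue of Linearize / Rehydrate (simplified scheme, NOT the repo's
# exact format): atoms encode as (0, index); inner nodes as a total length,
# an arity, then the children flattened in order.
def linearize(node, out=None):
    out = [] if out is None else out
    if isinstance(node, int):          # atom: a symbol / constant-pool index
        out += [0, node]
    else:                              # inner node: [length, arity, children...]
        start = len(out)
        out += [0, len(node)]          # length placeholder, patched below
        for child in node:
            linearize(child, out)
        out[start] = len(out) - start  # a node's length is always >= 2,
    return out                         # so the atom tag 0 stays unambiguous

def rehydrate(flat, pos=0):
    """Inverse of linearize: rebuild the tree starting at `pos`."""
    if flat[pos] == 0:                 # atom
        return flat[pos + 1], pos + 2
    arity, pos = flat[pos + 1], pos + 2
    children = []
    for _ in range(arity):
        child, pos = rehydrate(flat, pos)
        children.append(child)
    return children, pos

tree = [18, -15, -20]                  # i.e., ( n + 1 ) as indices
flat = linearize(tree)
assert rehydrate(flat)[0] == tree      # the round trip is a bijection
```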

HTH.

Clearly, though

Clearly, though...

That's the sort of technique worth giving a shot only under the assumption that your implementation language's "native arrays" are fast enough - which, thankfully, is the case with the .NET CLR.

It was actually a post by Matt Warren - about how, from the early days on, the CLR team put a lot of effort into optimizing those good old arrays in the runtime - that gave me the idea to look into it.

Okay, I think I get it

Okay, makes sense. Representing trees as arrays is usually more efficient for traversal.

When you called it Linear Lisp, I was thinking of the similarly named work of Henry Baker.

Forgive my ignorance, and sorry for any confusion: I believed I was coining a new name of my own for the technique, and I wasn't at all aware of Baker's paper, let alone the (accidental) homonym.

So anyway, for the record

So anyway, for the record: just by the mere switch to this rather peculiar IR (for a managed host language like C#), I went from computing 12! (which fits in 32 bits) 100,000 times in about two seconds:

https://replit.com/@ysharp_design/Super-Lean

( ^ LinearLisp's predecessor - aka YoctoLisp, with more bells and whistles, e.g., set-theoretic operators, operator overloading, .NET interop, etc. - but about 10x slower, as it turned out)

... to computing 20! (which fits in 64 bits) 1,000,000 times in just under 2 seconds instead, on the same machine (my laptop, not replit's).

Not too shabby for otherwise the same extensible interpreter core in a couple hundred lines of C# :)

And yes, I do know that Python (i.e., CPython, not PyPy, of course) isn't considered a fast kind of interpreter anyway, but it's nice to at least play in the same ballpark with as small a codebase, no? Somehow, I doubted I could do it while sticking to C# - so, you never know!

But again, I still need to check on larger, more involved algorithms to know how that pans out with more realistic usage.

Quick update.

Quick update.

Okay, so I finally took the time to go a little larger in source program size, and I brought the classic unoptimized Sieve of Eratosthenes into the tests, faithfully following the same algorithm and idiomatic data structures (an array, plus a list) in each of:

* C# / .NET 6

* Microsoft JScript (Windows Scripting Host)

* Python 3

* and "LinearLisp" (*), after adding to the latter an lvalue-capable indexing operator for arrays (to allow arbitrary mutation of the i-th element, of course, needed for that sieve algorithm)...

Then, when I run this benchmark - outputting all the primes below 15,485,865, non-inclusive (that's the first million primes) - I get, on the same hardware and with about the same prior idle CPU:

C# (compiled with optimizations on)... 120ms, consistently

MS JScript ... between 62 and 64 secs (over several runs, to be sure)

Python 3 ... between 4.8 and 5.2 secs

LinearLisp ... between 11.5 and 12 secs

But! I know why!

Python puts me to shame precisely on these array accesses!

Other than that, I'm still within +/- 10% of Python's yard with things such as the dumb doubly recursive Fibonacci(47) - which takes both Python and my toy ~ 12 minutes to cope with... versus ~ 16 secs for C#.
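For reference, by "doubly recursive Fibonacci" I mean the usual textbook definition with two recursive calls per step (shown here in Python):

```python
# The textbook doubly recursive Fibonacci used in such microbenchmarks.
# fib(47) is what takes CPython ~12 minutes; keep n small when trying it.
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(10))  # 55
```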

Could it be that the moral of the story is that AST-walking interpreters indeed don't have to be slower (or not much slower, anyway) than bytecode-eating ones, even when implemented in a managed environment?

Cheers,

(*) Obviously, I didn't even bother trying PowerShell...

FTR - the vernaculars under test

The Python -

thePrimes = []

def getPrimesBelow(n):
    if (n <= 2):
        return
    prime = [True] * n
    prime[0] = False
    prime[1] = False
    i = 2
    while (i * i < n):
        if (prime[i]):
            j = i
            while (j * i < n):
                prime[j * i] = False
                j = j + 1
        i = i + 1
    for p in range(0, len(prime)):
        if (prime[p]):
            thePrimes.append(p)


The JScript -

var thePrimes = [];

function getPrimesBelow(n)
{
    if (n <= 2) return;
    var prime = new Array(n);
    for (var i = 2; i < n; i++) prime[i] = true;
    prime[1] = prime[0] = false;
    for (var i = 2; i * i < n; i++)
    {
        if (prime[i]) for (var j = i; j * i < n; j++) prime[j * i] = false;
    }
    for (var p = 0; p < prime.length; p++) if (prime[p]) thePrimes.push(p);
}


The LinearLisp -

( let
  (
    thePrimes ( ... )
    GetPrimesBelow ( ( n ) =>
      (
        let ( prime ( true [] n ) )
        ( ( 2 < n ) ?
          (
            ( ( prime @ 1 ) = false ), ( ( prime @ 0 ) = false ),
            ( i = 2 ),
            ( while ( ( i * i ) < n ): (
              (
                ( prime @ i ) ?
                (
                  ( j = i ),
                  ( while ( ( j * i ) < n ): (
                    ( ( prime @ ( j * i ) ) = false ), ( j = ( j + 1 ) ) )
                  ) )
                :
                prime
              ),
              ( i = ( i + 1 ) )
            ) ),
            ( for p = 0, ( # prime ):
              ( ( prime @ p ) ? ( thePrimes : p ) : thePrimes )
            )
          )
          : thePrimes )
      ) )
  ) ( ( ) => ( ( GetPrimesBelow 15485865 ), thePrimes ) ) )


The C# -

static List<long> GetPrimesBelow(long n)
{
    var found = new List<long>();
    if (n <= 2) return found;
    var prime = new bool[n];
    for (var i = 2; i < n; i++) prime[i] = true;
    prime[1] = prime[0] = false;
    for (var i = 2; i * i < n; i++)
    {
        if (prime[i]) for (var j = i; j * i < n; j++) prime[j * i] = false;
    }
    for (long p = 0; p < prime.Length; p++) if (prime[p]) found.Add(p);
    return found;
}


Hopefully there's no cheating - they're all the same naive version of that thing.

PS.
And yes, the accumulation of the result in a global variable is done on purpose (it's the sort of thing implementors have to watch out for).

PPS.
(Ah, and I opted for this admittedly awkward version of the outer loop because I didn't want to go through the hassle of bringing Sqrt() in as a new builtin in LinearLisp, which was beside the point for this comparison between the interpreters. I should put that sort of batteries in the language later, though... if ever.)

Hmmm...

I can't help but wonder what we'd see if we followed the same principle - using the heap strictly for environments, closures, and values only, but not in any way for the sake of digesting the IR - after porting this humble super-flat-AST-walking toy to C99...

If you're a C99 / C++ champion and curious about it, too... Well, just hit me up!