CPS without GC?

I am steadily growing frustrated with the programming languages and more importantly their implementations that I use both at work and at home. I won't bore you with why. So I thought I'd have a go at creating a language and RTL that I would like to use.

Personally, I happen to think OO is greatly overrated and prefer a functional/procedural approach and would probably prefer to use CPS for the implementation but would rather shy away from the complexities of garbage collection. But as far as I can tell it seems impossible to implement continuation passing style (and closures, of course) without garbage collection. Correct?

(I can tell as I write this that anyone responding is just going to say 'GC has decades of research behind it and these days can be extremely efficient while remaining relatively simple. What's the problem?' so rather than explain why I don't want GC perhaps my question can be taken as a hypothetical inquiry. :))

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

CPS

But as far as I can tell it seems impossible to implement continuation passing style (and closures, of course) without garbage collection.

You can support some special kinds of closures without the need for heap allocation (hence garbage collection). Obvious examples include closures with no captured variables. But even a closure that captures variables can be allocated on the stack as long as the environments containing the captured variables are guaranteed to outlive the closure on the call stack. Fortunately all the closures created by CPS transformations are of the latter variety.

BTW, a simple garbage collector is not very difficult to code--there's no reason to sweat it. No-one says you have to use a super fancy generational or train-track collector; you can just go with a dumb and simple mark-and-sweep collector. Trust me, if you can implement the rest of the language, a simple GC should prove no challenge.

Boehm gc

Or if you are using C you can simply link against Boehm's gc. Everybody seems to be doing it these days. :)

CPS without GC

The problem is closures with unknown extent, not with CPS. CPS on its own doesn't introduce closures with unknown extent into a language that doesn't have them, so if you don't otherwise put them into your language there's no reason you need GC. As it happens, this paper talks about that a little in section 3.

Paper name and author

Ghostview refuses to load that file on my machine. Can you give the paper name and author so I can look for a pdf? Also, it will make the LtU archives more relevant if the link ever goes stale.

Thanks!

Automatically Restructuring Programs for the Web

Automatically Restructuring Programs for the Web, discussed on LtU several times.

And one of the authors is Jacob Matthews :-)

Ahh.

Ahh, I've inadvertantly and unnecessarily coupled closures with CPS. Thanks. I'd considered supporting only closures that outlived their closed vars (in a traditional call-stack implementation) but thought it might be too limiting. I see CPS can actually help here.

Damn, it's so obvious when smart people point it out to you.

It's not difficult to write your own GC (at least for testing).

If you do not want to use Boehm's GC, then you can write your own. It is really easy; all you have to do is use two stacks for each thread: one for general data, and one for pointers. Allocation should take place using a global lock on a critical region from a predefined allocated area by advancing a pointer. At collection time, all threads should be suspended (except the current one, of course) then the collector should do the following:

a) scan the root set for objects; mark reachable objects (set/reset a bit in the object)
b) calculate the new address of object
c) traverse pointers again, as in step (a), and adjust them to point to the new memory address of the object.
d) traverse objects; move reachable objects to their new address and finalize unreachable objects.

It's very easy, and should not take you more than 200 lines of code.

But debugging a GC...

...is another matter altogether. Any bug in a garbage collector is, by definition, a Bug From Hell. Unless you REALLY want to learn about garbage collectors, the sane route is to use a canned one. :)

Design by Contract

When writing a GC you really have to adhere to a design by contract discipline and have debug code to thoroughly check every precondition, postcondition and invariant. I've found with that you can nail GC problems pretty quickly.

The KISS principle applies well to writing GCs.

When a new GC is written, the best way to proceed is to apply the KISS principle: making a simple GC as first, secure that it is correct, then proceed (if necessary) with more complex choices.

I have written many GCs in the last few months for C/C++ in order to learn how to do it. I've written completely naive C++ GCs based on special pointer classes (not reference counting) down to C GCs that examine the stack and global areas in the lowest level (for Windows only, as I am not a Linux hacker). It's hard to do the latter type of GC in C without knowing type information or knowing exactly where the data are (fortunately Windows provides APIs for walking stacks and access global areas), so I imagine that writing a GC for a new programming language it can be easier, since type information will be available at compiler-level, as well as thread information.

CPS and the stack

Doesn't CPS eventually lead to stack overflow, if your language (or its implementation) doesn't do tail-call elimination? Most invokations of continuations are tail-calls, so TCE makes it work nicely. But in a language without--each continuation adds a stack frame, and the stack tends to grow without bound, does it not?

Or am I missing something here?

Yes and no

Yes is would lead to stack overflow, but it seems the original poster is talking about using it for the implementation. Anyways, the poster would simply not use a language without TCO. I'm not sure why you bring it up.

Other than assembly...

all the languages I can think of that mandate TCO, have garbage collectors. When he mentioned a lack of GC, that made me assume a C/C++ implementation, which does not specify TCO. Nor can you manually encode a tail-call in C/C++, as interprocedural gotos are not allowed.

(Indeed, C/C++ has many features which are TCO-hostile, such as taking the address of items on the stack...how many C/C++ compilers can be counted on to implement TCO, anyway?)

Depending on what kind of lan

Depending on what kind of language your implementing and how you do it, you might have to implement your own GC even if youre implementing it in language that itself has GC.

Pre-Scheme?

all the languages I can think of that mandate TCO, have garbage collectors
While Pre-Scheme does not support full TCO, it does support local TCO, and it has no GC.

Of course, we can argue that Pre-Scheme is too low-level and therefore qualifies for assembly language :-)