Any research on garbage collection for a pure langauge?

I've been trying to find papers about garbage collection for a pure langauge. I can't find anything closer than garbage collectors for langauges that seldom mutate their objects. I've read a lot of Henry Baker's papers, and a few others. It seems to me that there should be big advantages to garbage collecting a pure language. Can anyone suggest any papers, either positive or negative, on the topic?

The reason I feel there should be advantages is that objects in a pure langauge can only reference older objects. Also which objects a given object references never change, so perhaps contiguous chunks of unchanging objects could arise.

[edit: Thomas Lord pointed out that I am not talking about pure langauges, but an even more restricted class of language. I was thinking about total pure languages, but I don't think you need totality to ensure that all objects only reference older objects, so that your reference graph is acyclic. I'd still love papers on garbage collection specialized to pure languages or total languages.]

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.


The reason I feel there should be advantages is that objects in a pure langauge can only reference older objects.

That's not close to true. For one thing, consider a constructor that yields multiple, mutually-referring objects all at once. For another thing, consider constructors that let you construct an object lazily where when the definition of a component is later provided, mutual reference can arise. Such examples are still "pure".

If you have only unchanging objects constructed out of earlier (completely) constructed objects then the reference counting literature would seem to be what you want - you're dealing with trees (at least as I understand you).

ah, you're right

You're absolutely right, that is still pure. Sorry, I've been mostly playing with total functional languages lately, so I forgot about such cases. I'll add a note to this effect at the end of my original post. Thank you!

Lazy updates

Another way old data can point to new data is when thunks are updated to normal form, which is done to prevent expressions being evaluated more than once. So even in total pure languages it is possible to grate cycles if you allow laziness.

You might like these papers:
Generational Garbage Collection For Haskell
Memory Management for Persistence

Generational garbage collection can net a big gain over a one-stage copying collector.

Pure Languages = Immutable Data

Pure languages tend to create lots of immutable data with few circular references. This greatly simplifies GC because there is no need to mark-and-sweep for immutable data, and thus reference counting provides a near optimal solution. Overall speed, however, is very poor when the compiler is not able to make efficient reuse of previous allocations.

Garbage Collection is a term closely related to memory management in languages with mutable data structures. As such, you may have more luck searching for just "memory management" or similar.

not reference counting

Reference counting, as far as I know, doesn't have the nice property of being faster than stack allocation, which copying collectors, for example, can have.

Is a copying collector the best we can do, even for a language that only creates acyclic graphs among its objects? (Not trees, Thomas, since I can do \x -> f x x)

Region-based memory

Region-based memory management is the best we can do, and it's not clear that we can do it well enough just yet, and it requires more static analysis and runtime params (but is more predictable).

Yes copying can be faster

Yes copying can be faster but only if you can pay the memory price.

Basic Question on Array and Hash Dict Intensive Langs

In the context of a typed, strict, "mostly functional" language with mutable arrays and hash dictionaries, I've often wondered about the cost of the generational copying collector's write barrier.

Given the hash code calculation, collision handling, etc. of dictionaries, I don't imagine an efficiently implemented write barrier would cost very much.

I can imagine that array update, OTOH, might suffer a significant penalty - compared to, a non-generational collector, or a C/Pascal/Modula-2/etc. style non-gc language.

One potential saving grace is that a typed language can detect assignments of non-heap data to an array slot and skip a write barrier check altogether. That handles a nice set of bool/integer/char array oriented algorithms - and perhaps arrays of doubles as well, given a fancier compiler and runtime that efficiently manages unboxed doubles.

Anyway, are there any good papers (minus the "functional data structures" style papers) on measured costs of the write barrier for array updates, various implementation strategies, etc.?

Thanks much!


p.s. If I turn something up in my own searching, I'll post links back here.

My Favourite MM References

Two useful meta lists on all aspects of MM and GC:

Although most is Java oriented there is some that is specific.