Lambda the Ultimate

inactiveTopic On Garbage Collection
started 11/22/2003; 7:12:02 PM - last post 11/28/2003; 3:20:42 PM
Patrick Logan - On Garbage Collection  blueArrow
11/22/2003; 7:12:02 PM (reads: 11081, responses: 14)
On Garbage Collection
This comes from the EROS capability-based operating system architecture email list. The leader of that project describes what he believes may be needed as a language for writing applications safely for EROS, especially with respect to garbage collection, concurrency, response time, and protection from memory allocation faults.

Several responses have been posted to the list as well. Here's the top of the thread.

In the course of deciding not to submit a DARPA proposal, I spent some time looking at what is going on in the world of penetration attacks. I conclude that we are solving the wrong problem -- or rather, we have been solving phase II in the hope that other people would solve phase I. I have now concluded that this hope is in vain...

There are those who may argue that we already have a replacement. Possible candidates might include C#, Java, or D (I restrict my attention here to statically typed languages, but without prejudice). I think that none of these are a good replacement.
Posted to implementation by Patrick Logan on 11/22/03; 7:14:28 PM

Nick Barnes - Re: On Garbage Collection  blueArrow
11/24/2003; 4:24:45 AM (reads: 643, responses: 1)
Some of these EROS people seem rather ill-informed and intemperate. They state, inter alia, (a) that there are no good GCs for concurrent multithreaded systems; (b) "nobody has demonstrated remotely useful performance out of any" ML-derived language; (c) that incremental GC is only "allegedly" incremental.

I'm biased: I worked on an SML implementation that had great performance. I have spent about ten years developing various GCs, including truly incremental ones, which were and are used in various systems, including concurrent multithreaded ones. Our MPS memory management system has been integrated into a number of such applications.

Ehud Lamm - Re: On Garbage Collection  blueArrow
11/24/2003; 4:42:20 AM (reads: 642, responses: 0)
I found the message Patrick linked to quite strange also.

There are tons of papers on these issues, if anyone is interested. I am not really into GC, but even I know about some of them...

James Hague - Re: On Garbage Collection  blueArrow
11/24/2003; 6:41:09 AM (reads: 622, responses: 0)
Agreed; these people seems to be basing their opinions entirely on hearsay. But there's some truth in still needing to be wary of garbage collectors for certain types applications. Let's say you have several hundred megabytes of essentially static data (say 3D geometry). In every garbage collected language I've used, there's always the possibility that the garbage collector will peruse all several hundred megabytes of that data at some point, resulting in a pause that's much more than the 10 milliseconds or so that can typically be afforded in real-time animated applications. (Generational collectors simply make this less frequent, but don't rule it out.)

On the subject of incremental garbage collection, here's something I've been wondering: Which existing language implementations use incremental garbage collectors (excluding simple reference counting)? GHC is one. Are there others?

Neel Krishnaswami - Re: On Garbage Collection  blueArrow
11/24/2003; 8:23:17 AM (reads: 589, responses: 0)
Hi Nick, could you talk more about the MPS system? I'm interested in how you would use it to build a good GC system -- it's open source, and it seems like a shame that so few language projects are taking advantage of it.

Nick Barnes - Re: On Garbage Collection  blueArrow
11/24/2003; 3:09:51 PM (reads: 521, responses: 0)
Sure, I can write a little about it. This is off-the-cuff, so don't expect a polished sales presentation.

We developed the MPS at Harlequin as a highly portable, flexible, reliable, and versatile memory manager, originally intended to provide (incremental, generational, mostly-copying) garbage collection for Harlequin's DylanWorks product and memory management for the ScriptWorks family of PostScript-language-compatible software products. A small team of engineers, including a number of GC experts, invested about thirty person-years of effort into the MPS at Harlequin, making it the best we could. In the process we made a number of innovations, several of which I believe are still unique to the MPS. We also worked hard on software process improvement, with great results.

Harlequin was a remarkable company, making most of its money from PostScript (ahem, "PostScript-language-compatible", as Adobe's lawyers taught us to say) but pursuing many of its founder's interests in Lisp (through various products including a Lisp development environment called "LispWorks") and other languages (e.g. a complete SML development environment "MLWorks") and technologies. The company had many strengths, but wasn't always good at identifying or commercializing them. Richard Brooksby (MPS project leader) and myself left in 1997 to form a small consultancy called Ravenbrook Ltd, initially working for Geodesic. Harlequin went bust in 1999 and the assets were purchased by Global Graphics. DylanWorks (including the MPS) was later brought to market by Functional Objects Inc as "Functional Developer", and the ScriptWorks technology (including the MPS) still forms part of the core Global Graphics product line. [the Lisp business has been spun off to make Xanalys Inc, which still makes a living from it, and in fact Ravenbrook are satisfied customers of theirs).

Enough about Harlequin. Richard Brooksby and I regretted leaving the MPS behind when we left, and in 2001 we were able to acquire the intellectual property from Global Graphics. This included the repository of MPS sources, the "Memory Management Reference" website <http://www.memorymanagement.org/> and also (maybe more importantly) the Infosys ("Memory Management Information System"), a large collection of documents including requirements, architecture, design, procedures, process documentation, document rules, review checklists, review records, minutes of meetings, references to books and papers, discussion documents, and so on and so forth. Regrettably all of this documentation was in the form of a Lotus Notes database, and was rather costly to extract into a non-proprietary format. We have also had to "redact" the code and the infosys, removing any Global Graphics trade secrets. By mid-2002 we were able to release the MPS, including some documentation, as an open-source product <http://www.ravenbrook.com/project/mps/>. We have been joined at Ravenbrook by several other ex-Harlequin staff.

The focus of the MPS is on providing memory management which is powerful, flexible, efficient, and reliable. An application using the MPS can choose a mixture of automatic and manual memory management, a combination of conservative and exact GC, a variety of generational and non-generational policies. Numerous clients can make simultaneous use of the MPS, either cooperatively or independently. The manual memory management is not "bolted on" to the side of a garbage collector: it is also flexible, fast, and thread-friendly.

All this power does have a price: integrating a client with the MPS is not trivial. It does not attempt to be a drop-in garbage collector; if you want one of those, use Boehm's GC or Great Circle (if you can still buy Great Circle). For instance, it does no root discovery: the client must tell the MPS where to find roots.

We intend to make an additional open-source release soon, with a focus on expanding the documentation and providing more example code to ease the job of integration. We have one commercial client, Configura <http://www.configura.com/>, who use the MPS for memory management including mostly-copying garbage collection in a new product of theirs. There is other commercial interest.

Nick Barnes - Re: On Garbage Collection  blueArrow
11/24/2003; 3:10:39 PM (reads: 517, responses: 0)
Argh. No paragraph breaks. Sorry.

Oh, look, an Edit this Page button. :-)

Patrick Logan - Re: On Garbage Collection  blueArrow
11/24/2003; 7:05:15 PM (reads: 492, responses: 0)
Some of these EROS people are evidently both ill-informed and intemperate.

Yeah, I thought the thread over there might spark a better conversation over here. The few times I've followed other threads there, the intemperate voices were also fairly discouraging.

Wouter van Oortmerssen - Re: On Garbage Collection  blueArrow
11/25/2003; 3:42:58 AM (reads: 460, responses: 1)
The original post touches a very interesting issue, and one I have been looking at in the past though I have no solid answer for it. I hope this doesn't just reduce to another pro/contra GC argument, as that is not the issue. In some cases, GC is really problematic, and it is interesting to keep looking at alternatives.

In an Orthogonally persistent OS like EROS for example you have the additional problem that the memory space to be managed is potentially huge, i.e. your entire harddrive. Garbage collecting that without the performance problems may be a challenge for even the most carefully tuned generational collector.

Recent PLDI and related conferences have seen a lot of renewed interest in regions, ownership, uniqueness, and all that. I'd be very interested to see if that will finally result in a generic memory management system for a (mainstream) language or an OS. For an orthogonally persistent OS it sound to me a pretty good match, but the problem with all these schemes is often that they work best with language/programmer cooperation, which often people feel is too high a price to pay (why?).

Lets face it, the problem of memory management is not "solved". The intrinsic inelegance of GC (throwing away liveness information, only to periodically recover it with brute force scanning) will mean that there will ALWAYS be a need for alternative memory management techniques in niche areas, no matter how good generational GC gets. I for one am very interested to see any advancements in this area (pointers, anyone?).

Ehud Lamm - Re: On Garbage Collection  blueArrow
11/25/2003; 4:02:45 AM (reads: 447, responses: 0)
Some interesting recent papers can be found on the Jikes RVM publications page.

Wouter van Oortmerssen - Re: On Garbage Collection  blueArrow
11/25/2003; 4:06:54 AM (reads: 446, responses: 0)
Oh and.. a post by Mark S. Miller mentions using an "Object Table" as an example of safe use of free(). But how does he intend to memory manage the object table itself (it would grow indefinitely) ? looks to me it would just be shifting the problem... or am I missing something?

Patrick Logan - Re: On Garbage Collection  blueArrow
11/25/2003; 10:00:57 AM (reads: 402, responses: 0)
Another data point is JRate, based on the GNU Compiler for Java. JRate is an implementation of RTSJ, the Real-Time Specification for Java.

...there will ALWAYS be a need for alternative memory management techniques in niche areas, no matter how good generational GC gets. I for one am very interested to see any advancements in this area (pointers, anyone?).

One of the most interesting is the Stalin compiler for Scheme. It does a liveness analysis and inserts "free()" on allocated memory in appropriate places. It also does an extensive representation analysis to allocate memory and use it efficiently.

Stalin is about 10 years old, so I don't know if it counts as an advancement.

Luke Gorrie - Re: On Garbage Collection  blueArrow
11/25/2003; 10:12:28 AM (reads: 397, responses: 0)
Another data point that I'd hesitate to call an "advancement" is the no-nonsense way the way some Lisps (e.g. CMUCL) commonly do it. They give you a garbage collected heap to serve most of your needs, but let you call malloc, mmap, and so on through their foreign-function interface. They offer you a few general pointer operations to do as you want if you really gotta.

Apparently ITA's flight-search system takes this approach for mapping in large amounts of static data, for example.

Patrick Logan - Re: On Garbage Collection  blueArrow
11/25/2003; 11:02:04 AM (reads: 398, responses: 0)
They give you a garbage collected heap to serve most of your needs, but let you call malloc, mmap, and so on through their foreign-function interface.

Yes. I didn't follow the ITA link yet, but a typical pattern I've used in Lisp and other GC'd languages is to wrap explicitly allocated memory in a GC'd object. The various commercial Smalltalks have (or had, in some cases) good FFI's including memory allocation. Then associate finalization code with the GC'd object which "free()'s" the allocation memory.

Sometimes, especially with large amounts of memory or scarce resources. (Are file descriptors scarce still?) You need to explicitly manage and free the resource, but doing so within the wrapper object is still worthwhile for encapsulation, customization, etc.

Wouter van Oortmerssen - Re: On Garbage Collection  blueArrow
11/28/2003; 3:20:42 PM (reads: 262, responses: 0)
Thanks for the pointers.

I remember looking at Stalin ages ago, but there doesn't seem to be a desciption of its memory management scheme anywhere... not in the Stalin archive, nor on the web. To automatically insert "free()", some compromises on the language will have to be made, I would like to know what they are.

RTSJ is interesting, but very limited in application as a generic memory management scheme. ScopedMemory (a region) cannot be used to store data structures that live longer, as you can't decouple regions from the execution thread (you can't exit a scoped region without deallocating everything in it). This makes it suitable for running an algorithm and making its temporary allocations realtime, but it does not work for more data structure centric code, for example, a program that manages a set of graphs, whereby each graph would be its own region (as a single graph is always deallocated atomically).

A solution here is to allow a region to attached to an object, and make entering & exiting the scope of that region automatic when methods on that object are called (and not free on scope exit but on object destruction). This of course means that now you have to somehow do memory management on the region object itself...