Garbage Collection Based on a Linear Type System

An oldie (2000), but in my current (ignorant, admittedly) mind a goodie, from Igarashi & Kobayashi. I just wish the research hadn't apparently immediately died off? Garbage Collection Based on a Linear Type System. Or, chocolate + peanut butter.

We propose a type-directed garbage collection (GC) scheme for a programming language with static memory management based on a linear type system. Linear type systems, which can guarantee certain values (called linear values) to be used only once during program execution, are useful for memory management: memory space for linear values can be reclaimed immediately after they are used. However, conventional pointer-tracing GC does not work under such a memory management scheme: as linear values are used, dangling pointers to the memory space for them will be yielded.

This problem is solved by exploiting static type information during garbage collection in a way similar to tag-free GC. Type information in our linear type system represents not only the shapes of heap objects but also how many times the heap objects are accessed in the rest of computation. Using such type information at GC-time, our GC can avoid tracing dangling pointers; in addition, our GC can reclaim even reachable garbage. We formalize such a GC algorithm and prove its correctness.
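To make the linear-value idea concrete, here's a toy sketch in Python (hypothetical names, nothing like the paper's actual formalism): a linear cell may be consumed exactly once, and consuming it reclaims its storage on the spot, with no tracing collector involved. The leftover cell is exactly the kind of dangling reference a conventional tracing GC would trip over.

```python
# Toy sketch of a "linear value": read exactly once, storage reclaimed
# immediately on use. Names here are made up for illustration.

class LinearCell:
    def __init__(self, value):
        self._value = value
        self._used = False

    def use(self):
        """Consume the linear value; a second use is a type error
        in a linear system, modeled here as a runtime error."""
        if self._used:
            raise RuntimeError("linear value used twice")
        self._used = True
        v, self._value = self._value, None   # drop the storage right now
        return v

cell = LinearCell("payload")
assert cell.use() == "payload"   # first (and only) use consumes the cell
```

In the real system the use-once discipline is checked statically, so the runtime check and the dangling `cell` object wouldn't exist; the GC-time type information the paper describes is what lets a collector skip such already-consumed slots.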


Very interesting for a system in which no GC is assumed

Skimming this, it seems to be about using linear types to provide what are essentially precise stack and register maps, to skirt the pitfalls of conservative scanning.

I mean to take a closer look at it, since in my current language project I’ve been thinking of providing library-defined smart pointer types for various GC strategies, but this sort of compiler support could help avoid the overhead of what I had intended to do: registering and unregistering GC roots at runtime.

Anything else in this vein I should be reading, while we’re at it?

the let down

The end of the paper lists a lot of to-dos that sound like showstoppers until they are done. No further work directly in this lineage appears to have happened. And when I mentioned it on the ATS list, knowledgeable people were skeptical. :-)

...my bet is that it is unlikely to work. There are many reasons. The need for type information at run-time may be too much of a burden. How fast can such type information be inferred at run-time? Also, issues like handling data generated by foreign code are not yet touched.

But if somebody else has or will go and do something where we get really good GC, I'd buy it.

I am back to Reference Counting

The problem with most GCs at the moment is that performance degrades linearly with the amount of memory allocated.

If you can guarantee that the data collected is a tree, or a DAG, instead of a general graph, you can get away with reference counting. The thing with stop-and-copy garbage collectors is that it mostly didn't matter whether you collect trees or graphs; therefore most languages implemented that (mostly Cheney). But now we have programs which need to collect gigabytes of data, and it is hard to keep pause times low.

Sure, reference counting is slow. But it is local, it is simple to implement, it is efficient in that it doesn't need a to-space (which doubles the memory requirements), and it is predictable, which means it works well with RAII schemes.
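A minimal sketch of what tree-shaped RC buys you (hypothetical names, Python for brevity): each node carries a count, a release cascades through children the instant the last ref drops, and reclamation order is fully deterministic; no to-space, no tracing.

```python
# Toy refcounting over a tree: a node is freed the moment its count
# hits zero, so reclamation is immediate and predictable.

freed = []  # records reclamation order, to show it is deterministic

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.refs = 1              # one count for the creator
        self.children = list(children)
        for c in self.children:
            c.refs += 1            # the parent holds a counted ref to each child

def release(node):
    node.refs -= 1
    if node.refs == 0:
        for c in node.children:
            release(c)             # cascade: the "domino effect" on a tree
        freed.append(node.name)

leaf_a, leaf_b = Node("a"), Node("b")
root = Node("root", [leaf_a, leaf_b])
release(leaf_a)
release(leaf_b)                    # creator drops its refs; parent still holds them
release(root)                      # last ref gone: whole subtree freed, leaves first
```

Note this only stays correct because the shape is a tree: a cycle would keep every count nonzero forever, which is exactly the invariant being bet on above.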

I am going to collect trees. Well, if I didn't mess up the invariants. There's a lot you can do with trees.

Witch's Brew

It seems to me that there's no One True Answer to GC. There's only One True Answer to issues like this and that is, "it depends". As in, everything is a balancing act / trade off / deal with the devil. So I wish instead of having it Solved Once And For All, we would instead have GC components that are architected into the system so well that we can swap them out. And munge them around. The JVM lets you configure, but not utterly replace, the GC. I wanna be able to try other GC plugins. Please, let us add yet another layer of abstraction!

(Bacon @ IBM of course has lots of interesting publications on the topic of Ref Counting vs. Tracing vs. Hybrids.)

This is what I am planning

In my language I want to provide the right hooks so that efficient GC can be implemented as a library. The core language will be unsafe and imperative. You will be able to create functional components by proving them safe.

Is that akin to Erlang?

I do not really understand it for realz, but I always heard that in Erlang GC only happens when a process dies; then I guess if it is a supervisor the kids die off, and so you get a tree / domino effect.

Sometimes I hear that ref counting can lead to domino effects, and those can be painful -- you get something like a bad GC pause feeling when it happens.

I am ignorant.

Erlang is a mature system

Not like Erlang, though I think Erlang started off that way, probably with Cheney. Last I remember Erlang had a pretty obscure GC, or I read some odd proposal where they wanted to exploit the DAG-like features. In general, it's a per-process generational copying garbage collector with a reference-counted shared heap for large binaries.

Erlang is old; it was written for soft real-time telecom applications. I Googled it, and it seems they now run into problems with that, and the language may need a redesign in places for data-center applications. (The language/GC wasn't designed to handle large binary objects well.)

Yah, the avalanches you can trigger with reference counting are well known.

But I am writing a toy scripting language, nothing for soft real time. More of an experiment where I look how far you can get by lifting C++ features in a minimal number of LOCs.

rc scenario cost primer

RC (ref counting) doesn't suck as much as sometimes reported. Each of the three cost problems with RC can be mitigated. One of them, the domino effect, is harder to mitigate without doing something systematic on a drastic scale.

Three sorts of latency cost are involved: 1) cost of a single count change, 2) frequency or total number of count changes, and 3) large groups of related counts changing in a batch because a tree becomes unreachable (the domino effect). To maximize cost impact of those — the opposite of mitigation — you can 1) make them atomic to be threadsafe, 2) make them proportional to aliases (every single ref is counted), and 3) require all refs in a newly unreachable tree be released together synchronously.

To minimize cost, instead, you would prefer 1) single-threaded RC, 2) fewer counts than aliases (often called borrowing), and 3) incremental instead of monolithic tree release (rarely done). A primary reason to avoid minimizing cost this way is when you feel obligated to pursue a one-size-fits-all solution. That is, if a single count anywhere must be threadsafe, let's make them all threadsafe; if you can track aliases exactly, lazily count them all; if the root of a tree is released, have functions recursively release all children in a single batch. Each of those reasons is lazy, in the name of simplicity. But sometimes simple is dumb to the point of malpractice, like using a list instead of a hashmap for huge collections because it's "simpler".

If you use RC everywhere, only things crossing thread boundaries need thread safety, and the less that happens the better. You can prefer an architecture where activities are usually constrained to run in a single thread, where ordinary non-atomic counts are fine, so you don't pay the cost of cache-line contention across cores. (If you make an atomic change in a cache line in core A, it doesn't know whether core B also has that cache line, so core A and core B are forced to communicate about that cache line, when all you wanted was an integer to change.)

For message-passing between threads, use thread-safe refcounting. You can also pass non-thread-safe data to another thread when "reserved": you can ask another thread to fill a buffer, then do nothing with it yourself until you get the thread-safe reply, without being forced to make the buffer's RC thread-safe. You might need to register a cross-thread root in the requesting thread, though, if the caller dies and you want to delay release until the reply occurs; but that only costs space.
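The "pay for atomicity only at thread boundaries" policy can be sketched like this (hypothetical names; Python locks stand in for atomic increments, and the in-place upgrade glosses over the race a real runtime would have to handle when an object first escapes its home thread):

```python
import threading

class PlainCount:
    """Cheap count for thread-confined objects: no lock, no contention."""
    def __init__(self):
        self.n = 1
    def incr(self):
        self.n += 1
    def decr(self):
        self.n -= 1
        return self.n == 0          # True means: last ref gone, free it

class SharedCount:
    """Lock-protected count, paid for only when a ref crosses threads."""
    def __init__(self, n=1):
        self.n = n
        self.lock = threading.Lock()
    def incr(self):
        with self.lock:
            self.n += 1
    def decr(self):
        with self.lock:
            self.n -= 1
            return self.n == 0

class Obj:
    def __init__(self):
        self.count = PlainCount()    # every object starts thread-confined

def share_across_threads(obj):
    """Upgrade the count when an object first escapes its home thread."""
    if isinstance(obj.count, PlainCount):
        obj.count = SharedCount(obj.count.n)
    return obj
```

The point is the asymmetry: the common, thread-local case stays a bare integer bump, and only the minority of objects that actually cross a boundary ever pay for synchronization.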

RC traffic does not need to be proportional to the number of pointers extant in a running program's data structures. The official count only needs to be nonzero while the number of aliases is nonzero. In some scope, like a function, if you know a count is held by someone else, like your caller, then you don't need a separate count for all your local aliases; counting them is a waste of time, because the caller's count dominates all of yours. Whenever you can statically determine a ref is held with a count in a spot that dominates a scope, that scope doesn't need more counting. If an alias can escape a scope, that's when you add a new count. When you plan to keep using something, take your own counted ref; but once you are holding it, you don't need another in scopes protected by that one.

There's a direct analogy to manual memory management. After you allocate an owned pointer, you use it freely until you delete it when you are done; each time you use it, you don't call a function to insist "make sure this is still allocated!" because you already know it is, as long as it's not yet deallocated. Grabbing and holding a counted ref amounts to "allocation", where one count protects your other uses. The other uses are sometimes said to borrow the count where allocation happened.
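Here's the borrowing idea as a tiny sketch (hypothetical names): an operation counter shows that local aliases dominated by the caller's count generate zero count traffic, and a new count is taken only when a ref escapes the scope.

```python
count_ops = 0    # how many count operations actually happened
registry = []    # a long-lived home that refs can escape into

class Obj:
    def __init__(self):
        self.refs = 1              # the creator's count

def retain(o):
    global count_ops
    o.refs += 1
    count_ops += 1

def use_borrowed(o):
    # Caller holds a count that dominates this scope, so these local
    # aliases borrow it: no retain/release, zero count traffic.
    a, b, c = o, o, o
    return a is b is c

def stash(o):
    # This ref escapes the scope into a global registry, so it needs
    # its own count; borrowing here would leave a dangling entry.
    retain(o)
    registry.append(o)

o = Obj()
use_borrowed(o)    # three aliases, zero count operations
stash(o)           # exactly one count operation, for the escaping ref
```

With naive "count every alias" RC the same code would do several increments and decrements; the static dominance argument is what justifies skipping all of them but one.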

The last problem, the domino effect when a tree becomes unreachable, is an artifact of code organization and little more. You could put the releases on a to-do list and perform them incrementally. In an object-oriented language, it's tempting to hide RC as an invisible implementation detail: you don't want to know that calling a function releases refs, and that this in turn becomes a cascade of releases when a tree becomes unreachable. In the name of abstraction, you don't want to know a monolithic tree release is the emergent effect of releasing things individually. Usually releasing resources is conflated with other sorts of cleanup logic, and the cleanup needs to occur at the right time (because you are twiddling a state machine in a non-functional language).

You can delay the release part, though. After doing "cleanup" you can hand the ref to a releaser, who gets to it as needed and can string the release process out incrementally when a tree is big. It's hard to do well without reflection, because you want a releaser to know everything about what is released just by looking at it. This means you want state information about resources to leak out of the abstraction, even if everything else is kept hidden. The sensible thing is to systematically know where resources are, without a programmer tracking them manually. Unless a language supports that idea, you would have to do it manually, making your library code look more complex.
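A releaser can be as simple as a work queue with a per-step budget (hypothetical names): killing a tree's root is O(1), and the actual freeing is spread across as many steps as you like, so no single pause sees the whole avalanche.

```python
from collections import deque

class Node:
    def __init__(self, children=()):
        self.children = list(children)

pending = deque()   # the releaser's to-do list
freed = 0           # nodes reclaimed so far

def release_root(node):
    """O(1) at the point of death: no cascade, just enqueue the root."""
    pending.append(node)

def release_step(budget):
    """Free at most `budget` nodes; children go back on the queue."""
    global freed
    while budget > 0 and pending:
        n = pending.popleft()
        pending.extend(n.children)   # defer the rest of the subtree
        freed += 1
        budget -= 1

tree = Node([Node([Node(), Node()]), Node()])   # 5 nodes total
release_root(tree)
release_step(2)     # pause stays bounded: only 2 nodes freed this step
release_step(10)    # the remainder is drained on later steps
```

This is the "reflection" point from above in miniature: the releaser can only walk `n.children` generically because the resource locations are visible to it, not buried inside opaque destructors.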

(Solving the domino effect with incremental release only gets a little mind-bending when you work out how it interacts with killing a lightweight process, when you don't want its code to run anymore. If the location of each resource is visible, a dead process doesn't need to run any further code to release them, because it can be done generically. But you really need to identify everything that acts like a resource, to avoid snags.)