## Articles on Precise GC with Parametric Polymorphism

I've tried to search CiteSeerX for a few days now and haven't managed to turn up anything concrete.

I've only managed to turned up an article on compressing stack maps (MS Research?), three or so articles/theses on "intensional polymorphism" tangentially related, and one or two articles (beyond me, I'll admit) on complex (whole program?) type reconstruction schemes.

While of interest, none of these directly address what I'd wager must be, by now, a commonplace runtime problem to tackle for many gc'd languages supporting parametric polymorphism alongside "unboxed" full machine words/ints, untraced locative pointers, perhaps unboxed doubles, and so forth.

FYI, my interests are both polymorphic variations on the mechanics of static (return address indexed) stack maps, general stack map formats, and strategies for implementing heap object headers for parametric polymorphic data structures.

As always, so many thanks in advance!

S.

## Comment viewing options

### You don't give many hints as

You don't give many hints as to what you have found. For example,
is Tolmach's 1994 Tag-free Garbage Collection Using Explicit Type Parameters a useful way into the subject? It seems fairly well cited.

(Disclaimer: this dropped out of Google into my lap; I'm not familiar with this corner of the literature, though it seems interesting.)

### There are three strategies

There are three strategies that I can think of:

1) Compile the polymorphism away by duplicating code.

2) Use runtime type tags.

3) Pass the a representation of the type of the type variable at run time.

### A 4th strategy

One can avoid all of this - see "Runtime Tags Aren't Necessary", Appel. It's not clear whether it is an advantage - he estimates the annotations to the code to equal the size of runtime tags, but probably things now changed in favor of his scheme.

Another paper is "Bounded-Space Tagless Garbage Collection for First Order Polymorphic Languages", but I'm not sure it's so interesting.

### My primary concern is

My primary concern is actually pretty basic. I want to support unboxed native word sized integers (whether 32 or 64 bit). And that ideally will include polymorphic use of these integers, as in List - or "List Int" or "(List Int)" in my language's type prefix s-expression oriented syntax.

Unboxed doubles are also highly desirable for obvious reasons (good performance for common computations), but I suppose that might be gravy, especially in polymorphic use such as "(List Dbl)" or "(Vec Dbl)" and so on.

### Stack maps can be used only

Stack maps can be used only when the stack layout is fixed, which isn't the case when combining unboxed types and polymorphism.

You need something like a bitmask-passing representation. See the post I linked to, and the previous entry linked from it. They explain various representations.

### Andrew Kennedy

has some papers on the implementation of generics in .NET, which supports unboxed data.

### "Representation passing"

"Representation passing" seems promising. So far, I believe the big challenge is to distinguish pointers from non-pointers on the stack.

A single extra word parameter to polymorphic functions can describe the traceability status of up to 32 type variables, which I presume (so far) the GC root scanner can use together with return-address-indexed stack maps for precise determination of pointer roots on the stack.

Again just so far, I posit heap objects that either 1) are 2-slot header-less values that live in heaps specialized according to slot pointer status (e.g., 4 special 2-slot heap types), or 2) live in "general purpose" heaps of variable sized values but which enjoy a header which describes the trace status of the heap value's slots, a header which will have to be similarly (?) specialized when the heap value has polymorphic slots.

Pointers are 2-machine word aligned, which gives some extra flexibility for storing pointers and immediates in the same location, say for convenient representation of algebraic data type values (e.g., Nil and Cons(car,cdr) - and so forth). So any indication of "pointer" is really just an indication of "potentially pointer" to the GC.

I suppose naturally, I'm concerned about reducing runtime overhead - extra function parameters, heap value words, construction of slot-describing bit-masks and so on - and what might possibly be accomplished more efficiently at compile time vs. runtime, or via caching the most common cases and so on. (An invariant for me is separate compilation and a "regular" system-provided linker.)

And, of course, I'm concerned that some fatal flaw in my reasoning so far will make the whole thing fall apart :-)

Many thanks for the pointers to info and articles so far. If the above sparks recall of any other papers or interesting systems (i.e., source code), I gladly appreciate anything one might toss my way.

S.

edit - I may also reorder pointer and non-pointer values in 2-slot objects, yielding just 3 special 2-slot heap types: PP, PI and II. The goal here is to have either 1) a forwarding pointer into to-space in the first slot of each special 2-word value or 2) a special-case per-heap bit-mask for the no-pointers (II) 2-slot heap type which will indicate the prior per-object copying into to-space of each value in the heap.

### Some Clean Source Code....

Upon waking, I conclude that reading the sources of a real-language implementation of precise GC for a language supporting parametric polymorphism might be the right way to go.

From experience, the smallest language implementation that enjoys this particular feature would be most suitable.

Large projects, with myriad research goals and so many folks banging on them, are often full of TONS of (important, of course) cruft added for research purposes, corner case optimizations, implementation portability and so on - all things that can greatly obscure understanding of the core RTS implementing - in this case - precise GC with parametric polymorphism.

If folks know of small'ish (or not, I guess) open source language implementations that feature such a precise GC scheme, I would greatly appreciate a pointer to the language and its implementation source code.

Thanks!

S.

### Hey!

Try the GC in use in the Hi compiler. It's a Cheney style garbage collector where you can just store arbitrary information which are either a) series of bits, b) series of pointers.

I don't see why you'ld need more than that?

[ Look in runtime/x86_64/heap ]

### Mono

Apparently, they're working on a precise GC.

Our generational GC is in a very advanced stage of development, we're right now doing extensive functional and performance testing.

The only missing bit we have right now is precise stack scanning. Which do cause some issues on workloads with very deep stacks. Luckily that's not the case for the majority of the cases.

The main issue with precise stack scanning for us is that our JIT was never engineered to support it, so retrofitting it is a huge task both in terms of gathering the required information and to generate code more friendly to stack maps - to reduce the memory footprint.

### Combine Stack Maps and Bitmask Parameters?

Can one profitably combine "traditional" stack activation frame maps with a bit mask parameter for any polymorphic parameters? The stack map itself can indicate to the GC stack scanner whether or not a bitmask parameter for polymorphic arguments is present in the activation frame.

Also, I wonder if unboxed doubles passed as a polymorphic argument might just be handled by 2 bits in the bitmask parameter indicating two non-pointer word values? (Well, even if this might allow precise GC, it doesn't account for actual proper code execution within the function, and doesn't allow for passing doubles on the FPU stack in Intel chips.)

I'm greatly interested in natural word representation in order to facilitate fairly seamless integration with C functions calling into the OS and into libraries, ala a more straightforward "FFI".

I do wonder about the impact of stack maps on locality of reference and hence speed of GC, so any thoughts there would be of interest there as well.

As always, all and any input greatly appreciated.

- Scott

### By far the easiest to

By far the easiest to implement and the fastest would probably be to compile all polymorphism away like .NET does (though they still use runtime tags in their GC iirc). Have you considered this? That is, when you see a call site foo[int](a,b) you compile a version foo_int specialized to ints. Naturally this would seamlessly integrate with C, and have excellent locality of reference.

### At the risk of code

At the risk of code explosion. It also precludes polymorphic recursion.

### Not necessarily. The .NET

Not necessarily. The .NET CLR supports both. You just have to add a level of indirection/dispatch at those points.

### Sound promising. Any more

Sound promising. Any more specific pointers to this scheme? Many thanks. - Scott

### Ref 1, Ref 2. Don't have

Ref 1, Ref 2. Don't have time at the moment to verify that they explain the approach in sufficient detail, but those are the published papers on CLR generics.

### .NET, with its runtime

.NET, with its runtime specialization, seems like a poor model for me, as separate compilation (including to .so files) is a hard language implementation prerequisite.

This also seems like overkill for just achieving precise stack scanning during GC. And this will require stack maps for the GC scanner even if somehow polymorphic functions and structures are compiled to monomorphic equivalents.

That I can live with, although I seek detailed guidance on stack map design, sharing/compression, return address lookup schemes, etc.

Might it be profitable to treat polymorphic function in a manner akin to C++ templates (compile time specialization), but _only_ specializing based upon the word status (pointer or non-traceable word) and possibly argument sizes greater than a word (e.g., doubles, complex numbers and other such common place candidates for stack passing value arguments) of any polymorphic arguments, associated local and return variables, etc. ???

Then bitmask/representation passing would become unnecessary and regular monomorphic-language style stack maps would suffice for precise GC stack scanning with untagged integer/word (and maybe larger) values for polymorphic arguments/variables.

Am I on the right track here or barking up the wrong tree. I worry about some nuance conflicting with the requirement for separate compilation and, I should have mentioned earlier, the use of a standard system provided "stupid" linker.

Again, any and all advice greatly appreciated !!!

- Scott

### Depends. Are the type

Depends. Are the type parameters reified? I.e. if you can distinguish between List<A> and List<B> for all A and B at runtime, then specializing to the reference/non-reference classification of the type won't be enough. (You may have to push through a representation of the types A and B to appropriate allocation sites of List). Also, if you have types that are larger than a word then you may need to specialize in order to have enough storage for them in objects, arrays and stacks.

### If I understand the first

If I understand the first question, no - there is no RTTI (to use the common C++ and Java terminology). The language has "ext data" types so that exceptions are not a special case, and these can be distinguished via the regular "match" pattern matching operator.

Regarding your second question, I'm on the fence. I really would like to support unboxed doubles and maybe even rational and complex numbers My language isn't particularly mathematics oriented, but efficient operations on doubles at least is still and important special case.

I'm on the fence only because I'm an individual implementing the language, and I have to "pick my poison" on which areas of complexity to tackle.

One "pragmatic" option I am considering is only supporting monomorphic unboxed arrays and structs (roughly like C), in order to more seamlessly interact with OS functions and C libraries. Polymorphic vectors and structures (roughly like ML) would be represented differently.

The type-level and rts-level options, interactions and trade offs are, indeed, somewhat overwhelming.

### Object Representation

It sounds to me like you are asking for the basics - how precise garbage collection can be done at all in the presence of parametric polymorphism.

The usual solution is to implement polymorphism so values of an unknown type are only manipulated by reference (Java, etc), and generating specialized code for each possible type if it appears otherwise (C++ templates). Then the problem is reduced so pointers are in known locations, but you may not know the exact type of the target object, which can be handled by giving every object a standard header including layout information. For an object-oriented language, that can probably be handled as another field in the vtable.

Details for one system can be found in the GHC Commentary: Storage, One notable optimization in marking is recognizing the various object types that have a fixed layout without following the info pointer. Stacks are handled by requiring layout information at a standard location in the stack frame.

Things like Tolmach's type reconstruction are only relevant if you actually want to do something like tagless object representations (and note an object oriented language usually uses a vtable anyway), or if you want to support polymorphism without either boxing or specialization.

### Thanks for the reference.

Thanks for the reference. No, I do not want to scan stack frames for type reconstruction or attempt other o'er complex schemes.

You are correct that heap allocate values (rather traditional if simplified objects, strings, symbols, records, tuples, lists, etc.) don't present too great a challenges. Excepting special 2-word heaps for headerless values (those 2-word heaps themselves are specialized for pointer/immediate pairs of values), all other n-word objects all have a 2 word header allowing for "self description" to the gc scanner.

My primary concern is precise stack scanning that allows for unboxed full-machine-word-sized integers, and this in the presence of polymorphic functions. I've found tidbits of information here and there, but no direct treatment of, say, stack map, schemes that account for this polymorphism. There's my quandary. I'll eagerly examine your references. Thanks again.

- Scott

### Unboxed Numbers

If you don't want to do anything too tricky, then GHC's stack handling will be a reasonable start - just apply the idea of making objects self-describing to stack frames as well.

If you only want to support polymorphism over two or three sizes of unboxed numbers (pointer-free), then it seems reasonable enough to handle code possibly polymorphic over unboxed types by specializing for those two sizes, and one last version for all boxed types. Then all the things you've been reading should apply individually to each version.

I'll mention again that every single boxed type is equivalent in the generated code. As far as the GC is concerned, whatever type the invocation was at, the stack just has some pointers at fixed offsets, which always point to self-describing heap objects.