Monadic and Substructural Type Systems for Region-Based Memory Management

Matthew Fluet's thesis, Monadic and Substructural Type Systems for Region-Based Memory Management:

Region-based memory management is a scheme for managing dynamically allocated data. A defining characteristic of region-based memory management is the bulk deallocation of data, which avoids both the tedium of malloc/free and the overheads of a garbage collector. Type systems for region-based memory management enhance the utility of this scheme by statically determining when a program is guaranteed to not perform any erroneous region operations.

We describe three type systems for region-based memory management:

* a type-and-effect system (a la the Tofte-Talpin region calculus);
* a novel monadic type system;
* a novel substructural type system.

We demonstrate how to successively encode the type-and-effect system into the monadic type system and the monadic type system into the substructural type system. These type systems and encodings support the argument that the type-and-effect systems that have traditionally been used to ensure the safety of region-based memory management are neither the simplest nor the most expressive type systems for this purpose.

The monadic type system generalizes the state monad of Launchbury and Peyton Jones and demonstrates that the well-understood parametric polymorphism of System F provides sufficient encapsulation to ensure the safety of region-based memory management. The essence of the first encoding is to translate effects to an indexed monad, trading the subtleties of a type-and-effect system for the simplicity of a monadic type system.

However, both the type-and-effect system and the monadic type system require that regions have nested lifetimes, following the lexical scope of the program, restricting when data may be effectively reclaimed. Hence, we introduce a substructural type system that eliminates the nested-lifetimes requirement. The key idea is to introduce first-class capabilities that mediate access to a region and to provide separate primitives for creating and destroying regions. The essence of the second encoding is to "break open" the monad to reveal its store-passing implementation.

Finally, we show that the substructural type system is expressive enough to faithfully encode other advanced memory-management features.

I've just finished chapter 4 which describes the substructural type system for regions. Figure 4.22 is a matrix defining the allowable operations for linear (L), affine (A), relevant (R) and unrestricted (U) references; R and U references have no "free" operation. How is R and U data reclaimed?


GC

In the paragraph above that Figure there is this sentence.

Conversely, a region for which the capability’s qualifier is either U or R may never be deallocated. Since there may be many copies of the capability in the program state, all of which grant access to the region, there is no guarantee that the program will not access the region in the future.

Or to put it another way, this is where you fall back on some kind of garbage collecting scheme.

Yes, that statement led me

Yes, that statement led me to believe that GC might be needed, but requiring GC seems like the very antithesis of region-based memory management. So this substructural type system cannot replace GC as other region-based memory management schemes do; it will reduce the need for GC, but not eliminate it.

Not-so-strange bedfellows

requiring GC seems like the very antithesis of region-based memory management

My understanding is that pragmatic considerations may require combining both approaches - e.g., the MLKit SML system "integrates reference-tracing garbage collection with region-based memory management."

The pragmatic considerations

The pragmatic considerations were the LIFO limitations of the type-and-effect region systems, and that region inference sometimes couldn't infer sufficiently optimal region annotations (in fact, the Makholm-Niss system suggests that optimality is unachievable without some sort of manual annotation). This work supposedly subsumes type-and-effect systems, but it seems to totally miss the point of region-based memory management: all prior region systems were trying to replace GC, not require its use or supplement it.

GC and Region Inference

GC is needed in the most general case, but this suggests that that general case is both rare and easy to avoid.

I've read that paper many

I've read that paper many times. Notice how introducing GC incurs ~30% space overhead and ~15% runtime overhead. While it's an improvement over GC alone, I'd say the conclusion is that regions+GC is a sufficient solution for most problems given the current limitations of region-based memory management. We should still try to address those limitations though.

Exercise

If the problem is with the U and R qualified region variables, what might be a solution?

I wish I knew! Garbage

I wish I knew! Garbage collection was the first thing that came to my mind as well, but I'm trying to explore the solution space. I believe simply excluding U and R qualifiers effectively reduces the solution to previous LIFO region systems which are known to be suboptimal. Makholm-Niss is complete and does not suffer from LIFO, but it supports first-order programs only; defunctionalization can solve the problem of higher-order programs for Makholm-Niss, but since I'm working on a language with dynamic loading, whole program defunctionalization seems infeasible. I've been looking for an alternative, and hoping that this substructural type system could provide one. Any suggestions?

Linearity

As you say, with the bottom half of that matrix you should be able to implement the monadic regions on top, so you aren't any worse off than that (and apparently that approach is already doing pretty well). Moreover, you still have the flexibility that moving to rgnURAL provides. The point is to have explicit allocate and free primitives rather than having them combined in a runRegion function.
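That separation can be sketched in Rust, whose move semantics make values affine by default. This is only an analogy to the calculus, not the thesis's system: `Region`, `new_region`, `alloc`, and `free_region` are all hypothetical names, and a `Vec` stands in for the region's storage.

```rust
// A minimal sketch: a region capability modeled as an affine Rust value.
// `Region` is a hypothetical stand-in; a real region would own raw memory.
struct Region {
    cells: Vec<i32>, // storage for allocated cells
}

// Creating a region yields the capability itself; holding it grants access.
fn new_region() -> Region {
    Region { cells: Vec::new() }
}

// Allocation requires (borrowed) access to the capability.
fn alloc(region: &mut Region, v: i32) -> usize {
    region.cells.push(v);
    region.cells.len() - 1
}

// Destroying the region consumes the capability by value: after this call
// the compiler rejects any further use of it, just as an affine qualifier
// on the capability would.
fn free_region(region: Region) -> usize {
    region.cells.len() // bulk-deallocate; report how many cells died
}

fn main() {
    let mut r = new_region();
    let i = alloc(&mut r, 41);
    let j = alloc(&mut r, 1);
    let sum = r.cells[i] + r.cells[j];
    let reclaimed = free_region(r);
    // `r` is moved out above; using it again would be a compile error.
    println!("{} {}", sum, reclaimed);
}
```

Because `free_region` takes the capability by value rather than being hidden inside a runRegion-style bracket, allocation and deallocation are decoupled, which is exactly the flexibility at issue.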

Straight away this lets us

Straight away this lets us use region-structures other than a stack. Why not start treating our conses as a binary tree rather than a list, for example?

I take your point, but there

I take your point, but I still feel there was insufficient discussion of the concrete benefits of this additional flexibility. What sorts of programs benefit from linear/affine types over TT regions? More importantly, what sorts of programs are still not suitably handled? For instance, TT and similar systems have trouble with iterative algorithms; are those problems solved simply by breaking the LIFO handling of regions?

Perhaps these questions would be easier to answer had the dissertation provided an inference algorithm like most other region papers.

I think it's a bit premature

I think it's a bit premature to ask for an inference algorithm - there are many possible, and inference algorithms for System F-like languages are still an ongoing effort as it is. Rather than looking for all the answers now, consider that we now have a sensible playground in which we can explore the options, in a manner open to hackers who don't want to have to create their own language. One sufficiently powerful to write a GC in, and thus powerful enough to use to explore hybrid techniques. One which provides a solid target language for inference algorithms and can guide intuitions about what forms of annotation might be useful.

There was discussion (and sufficient enough for me)

For instance, TT and similar systems have trouble with iterative algorithms; are those problems solved simply by breaking the LIFO handling of regions?

The more obvious ones, yes, that is exactly what problems are solved. From the introduction of the thesis:

For example, a loop that allocates data each iteration and uses that data only in the next iteration cannot be executed with a fixed amount of acquired memory under the nested-lifetimes regime.

This could be readily done with a linear region variable that you allocate at the end of an iteration and free at the beginning. This couldn't be done with a nesting discipline.
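The idiom can be sketched in Rust, where dropping the old binding plays the role of the explicit free. The names `step` and `run` are hypothetical, and a `Vec` stands in for each iteration's region; the point is that the two regions' lifetimes overlap, which a strictly nested (LIFO) discipline cannot express with constant live memory.

```rust
// One iteration: read from the previous iteration's region, allocate the
// next iteration's data in a fresh region.
fn step(prev: &[i32]) -> Vec<i32> {
    prev.iter().map(|x| x + 1).collect()
}

fn run(iterations: usize) -> Vec<i32> {
    let mut prev = vec![0; 4]; // region for iteration 0
    for _ in 0..iterations {
        let next = step(&prev); // allocate this iteration's region
        prev = next;            // rebinding frees the old region here,
                                // *after* the new one was created
    }
    prev
}

fn main() {
    println!("{:?}", run(3)); // each cell incremented once per iteration
}
```

At any moment at most two regions are live, regardless of the iteration count, which is the fixed-memory behavior the thesis excerpt says the nested-lifetimes regime cannot achieve.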

Part of the motivation of

Part of the motivation of rgnURAL was to demonstrate that one could have a unified core calculus that could explain both explicit memory management via regions and (have a reasonable story) for implicit memory management via GC.

Derek's point is the important one. Every region allocated reference really has three qualifiers: (1) the qualifier of the pointed-to data, (2) the qualifier of the ref cell itself, (3) the qualifier of the capability for the region in which the ref is allocated. It is the latter two that determine the memory management of the cell. When qualifier 2 (the qualifier of the ref cell itself) is A or L, then you can explicitly free the cell. When qualifier 2 is U or R, then you implicitly free the cell when the region in which it is allocated is deleted. So, U and R ref cells are handled just like they are in previous region-based memory management systems: the data is reclaimed when the region is deleted.

More interesting is how qualifier 3 (the qualifier of the capability for the region in which the ref is allocated) impacts the memory management. When the capability has qualifier A or L, then you can explicitly free the region (and all data contained therein -- regardless of the qualifiers of the contained data!). When the capability has qualifier U or R, then we are in the situation that most closely relates to your original question. In this situation, you can not explicitly free the region (and, hence, cannot free the data contained therein). So, to reclaim this data, one would need to fall back on some other mechanism; garbage collection is the natural choice. But, note that it is a slightly different form of garbage collection -- we are interested in whether region capabilities are reachable, not ref cells allocated in regions. When a region capability is unreachable, then we can reclaim the region, regardless of whether data therein is reachable. (This is just the standard notion that region based memory management can reclaim data earlier than a tracing garbage collector.) (BTW, the Cyclone language does something like this, where regions are reclaimed when their handles become unreachable.)

So, if you can program exclusively with A or L capabilities, then you won't have any need for garbage collection, and you can still have URAL references allocated within the regions. This is the situation that corresponds to translating from F^RGN to rgnURAL.

Thanks, this clears up quite

Thanks, this clears up quite a bit of confusion on my part!

When qualifier 2 (the qualifier of the ref cell itself) is A or L, then you can explicitly free the cell.

What is the benefit of this? You can't reuse that memory until the region is freed anyway, correct?

So, to reclaim this data, one would need to fall back on some other mechanism; garbage collection is the natural choice. But, note that it is a slightly different form of garbage collection -- we are interested in whether region capabilities are reachable, not ref cells allocated in regions.

So despite the fact that region capabilities have no runtime significance, they can't be erased because we may need to GC them -- at least the U/R qualified region capabilities. Tracing only U/R region capabilities is definitely a substantial improvement over tracing all data.

I'm still not clear on what types of programs would not be suitable for A/L qualified regions; what types of "sharing" imply U/R? For instance, a shared frame buffer between an application and a graphics driver?

No, if you free an A/L

No, if you free an A/L qualified ref cell, you should be able to reuse its memory in the region. Now, I'm not sure about the implementation details -- the overhead of fragmenting your regions by deallocating/reallocating individual ref cells therein may actually be more space than leaving the cells until the region is deleted. Or, you could take a tack like Cyclone, and make all A/L qualified ref cells allocated in a distinguished region (which isn't really a region at all), managed in a slightly different way.

You raise a good point about the runtime significance of region capabilities. Off the top of my head, I don't see any alternative besides keeping them around. If one wishes to use the stronger implication that an unreachable capability implies freeing a region (rather than the weaker implication that one can free a region only when there are no reachable ref cells within the region), then region capabilities do have a runtime significance. Cyclone conflates the region capability with the region handle, so sometimes you can get things to do double duty.

Hacking ahoy!

Presumably it'd be possible with a little extra work to build regions that only have cells of a given type or size. In that case, if you're mostly worried about your maximum size rather than your average then for types big enough to store a cons your total overhead can be a single word pointing to the start of a free list.
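The single-word-overhead idea can be made concrete with an intrusive free list: a freed slot stores the index of the next free slot, so the list lives inside the cells themselves and the only external bookkeeping is the list head. A hedged Rust sketch, with `Slot` and `Pool` as hypothetical names:

```rust
// Fixed-size pool: freed slots are threaded into a free list stored in the
// cells themselves, so the only per-pool overhead is the `free_head` word.
#[derive(Copy, Clone)]
enum Slot {
    Full(i64),           // a live cell
    Free(Option<usize>), // link to the next free slot, if any
}

struct Pool {
    slots: Vec<Slot>,
    free_head: Option<usize>, // the single word of overhead
}

impl Pool {
    fn new() -> Pool {
        Pool { slots: Vec::new(), free_head: None }
    }

    fn alloc(&mut self, v: i64) -> usize {
        match self.free_head {
            Some(i) => {
                // Pop the head of the free list and reuse its slot.
                if let Slot::Free(next) = self.slots[i] {
                    self.free_head = next;
                }
                self.slots[i] = Slot::Full(v);
                i
            }
            None => {
                self.slots.push(Slot::Full(v));
                self.slots.len() - 1
            }
        }
    }

    fn free(&mut self, i: usize) {
        self.slots[i] = Slot::Free(self.free_head);
        self.free_head = Some(i);
    }
}

fn main() {
    let mut p = Pool::new();
    let a = p.alloc(1);
    let b = p.alloc(2);
    p.free(a);
    let c = p.alloc(3); // reuses slot `a` without growing the pool
    println!("{} {} {} {}", a, b, c, p.slots.len());
}
```

Because every cell is the same size, reuse never fragments the region, which sidesteps the fragmentation worry raised above.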

I'm now finding myself wondering about regions with destructors. It occurs to me that if we can build those then you've probably just found the core of the nice language Stroustrup reckons is trying to get out of C++.

Now, I'm not sure about the

Now, I'm not sure about the implementation details -- the overhead of fragmenting your regions by deallocating/reallocating individual ref cells therein may actually be more space than leaving the cells until the region is deleted.

Yes, the fragmentation is what I was referring to. In thinking about it more, I'm not sure I see the advantage of a linear ref-cell though: behaviourally, is it not equivalent to a direct pointer to the pointed-to data? Writing to an A/L ref cell is just locally re-binding the name to the new data, so why allocate a cell at all?

Couldn't the same effect be achieved by a rewrite which splits the function f into its public definition f_p, and a number of inner definitions f_i (where i=1,2,...) for each write to an A/L ref cell, and lifting the ref to a parameter of f_i? Re-binding is then either a recursive call to the current f_i, or the next f_i with a new binding for r.

(* r is linear by the L qualifier *)
let f x:int o:string =
  let r = ref x in  (* type: int ref *)
    r := "foo";     (* strong update to: string ref *)
    return (r, o);;

becomes:

let f x:int o:string =
  return f_1 "foo" o;;
let f_1 r:string o:string =
  return (r, o);;

Cyclone conflates the region capability with the region handle, so sometimes you can get things to do double duty

Ah good, I was considering folding the capability into the region handle in order to reduce the number of parameters to pass around. Then the handles themselves could be garbage collected. Cyclone will prove helpful to investigate any disadvantages of this approach.

Hmm, I had a thought that region handles could be reference counted, but since they're first-class I'm now thinking that's not feasible; region handles could be embedded in other regions, so garbage collecting these by any means has the same problems as traditional GC.
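The cycle problem is easy to demonstrate with Rust's reference-counted `Rc`. Here a toy `Handle` (a hypothetical name) can store handles to other regions, mirroring handles being first-class and embeddable in regions; once two handles store each other, their counts never reach zero even when no external reference remains:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A toy "region handle" whose region can store handles to other regions.
struct Handle {
    stored: RefCell<Vec<Rc<Handle>>>,
}

fn main() {
    let a = Rc::new(Handle { stored: RefCell::new(Vec::new()) });
    let b = Rc::new(Handle { stored: RefCell::new(Vec::new()) });
    // Embed each handle in the other region: a cycle.
    a.stored.borrow_mut().push(Rc::clone(&b));
    b.stored.borrow_mut().push(Rc::clone(&a));
    // Each handle is held both by its local binding and by the other
    // region, so reference counting alone can never reclaim either one.
    println!("{} {}", Rc::strong_count(&a), Rc::strong_count(&b));
}
```

This is exactly the classic limitation of reference counting: cycles among first-class handles require a tracing collector (or weak references) to break.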

You are correct that an A/L

You are correct that an A/L qualified reference is rarely useful on its own, because one can almost always replace the use by flowing the contents of the cell through the program rather than the cell itself. Where A/L qualified references become interesting is when they are themselves in another reference. Consider the following type:

U^ref A^(U^unit + A^ref(U^int))

That is, an unrestricted ref cell whose contents is the affine disjoint sum of an unrestricted unit and an affine ref cell (whose contents is an unrestricted int). The outer ref cell can be shared throughout the program; one can then swap in a unit value to try to acquire the affine ref cell, do something that needs exclusive access to the cell, and then swap it back. See some of the Cyclone "Safe Manual Memory Management" papers for more details on this idiom.
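A rough Rust analogue of the idiom, offered only as a sketch: the shared outer cell becomes an `Rc<Cell<...>>` (freely duplicable, like a U-qualified ref), and its contents an `Option<Box<i32>>`, with `None` playing the unit and `Some(boxed)` the affine inner reference. `Cell::take` is the swap that trades the contents for a unit.

```rust
use std::cell::Cell;
use std::rc::Rc;

fn main() {
    // The shared, unrestricted outer cell holding affine contents.
    let shared: Rc<Cell<Option<Box<i32>>>> =
        Rc::new(Cell::new(Some(Box::new(1))));
    let alias = Rc::clone(&shared); // duplicated freely, like a U ref

    // Swap a unit (None) in to acquire the affine contents...
    if let Some(mut exclusive) = alias.take() {
        *exclusive += 41;           // ...work with exclusive access...
        alias.set(Some(exclusive)); // ...and swap it back.
    }

    // While one party holds the contents, every other alias sees the unit.
    println!("{:?}", shared.take().map(|b| *b));
}
```

If a second party calls `take` while the contents are checked out, it simply gets `None` and must retry, which is the dynamic "try to acquire" behavior the sum type encodes.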

Freeing memory in a heap

Reconsidering Custom Memory Allocation reports it works pretty well to use a fairly standard malloc to track and attempt to allocate from the holes in a region (as if the allocator had gotten those disjoint spaces through mmap). I think they only considered fairly long-lived regions, which could be a problem.

I thought the paper had been discussed here, but I can't find it.

I just skimmed this the

I just skimmed this the other day, interestingly enough. Allocations require header words, and considering ref cells are pointer-sized, that's a lot of overhead for a ref cell. I'm currently reading about the Streamflow allocator, which does not use headers for small objects.

As is hopefully clear from

As is hopefully clear from the other comments I've made, a lot of the motivation for my thesis work was formalizing some of the practical ideas that emerged in Cyclone. The region idiom was used extensively in Cyclone to combine various features. For example, there are (traditional) lexically scoped regions. There is a (global) heap region; the data allocated therein is garbage collected. There are dynamic regions that can be explicitly deleted. Then there are uniqueness qualifiers that can be used to explicitly free region allocated objects. Practical experience with Cyclone showed that one wants this variety of features because a one-size-fits-all approach to memory management won't suffice for all applications.

The downside of Cyclone's language design is that the essential commonalities and differences of these language features weren't factored out. This is a good thing for programmers: they get to work at a higher level of abstraction. But, it is a bad (or, at least awkward) thing for proving type soundness. rgnURAL was an attempt to boil everything down to essentials (where type soundness was easier to show) and demonstrate how to build back up to Cyclone's features (Chapter 5.5 of my thesis). One can model Cyclone's global (garbage collected) heap as a U qualified region (capability) that is created at the beginning of the program and in scope for the whole program. This just corresponds to an idiomatic use of the rgnURAL constructs, rather than the (singular) language feature of a global heap in Cyclone.

Back to your question: what types of programs would not be suitable for A/L qualified regions? Well, any Cyclone program that wants to make use of the global (garbage collected) heap. Suitability doesn't mean that you can't express the program using A/L qualified regions, but simply that it may not be the most natural. Similarly, sometimes lexically scoped regions are good enough, but sometimes you want dynamic regions.

Suitability doesn't mean

Suitability doesn't mean that you can't express the program using A/L qualified regions, but simply that it may not be the most natural.

Understood, but I'm looking at your system as a target for a higher-level language without regions, not for a human to program in, so the tolerance for complexity is correspondingly higher. As long as all programs can be expressed solely with some arrangement of A/L regions with reasonable lifetimes, I can live with that. I'm hoping that even such suboptimal lifetimes would be competitive with GC. It all depends on my region inference algorithm I suppose.

Admittedly, it would be nice to add substructural typing to handle effects in the language as well, as I've been researching type and effect systems for this purpose. I'm not against GC at all, but my language will feature processes a la Erlang, and the complexity of maintaining per-process heaps and efficiently passing messages between them is growing to the point that learning all these snazzy region systems is worth it. :-)

On that note, I haven't come across any papers on the use of regions for efficient message passing, but if you know of any, I'd appreciate a link! I'm currently reading "Safe Manual Memory Management in Cyclone".

Message Passing & Memory Mgmgt.

Here's one:


Language Support for Fast and Reliable Message-based Communication in Singularity OS, Manuel Fähndrich et al., EuroSys 2006, Leuven, Belgium.

Also, in the context of the Ynot project, we've developed a more general notion of type-and-effects (based on Hoare Type Theory) which you may find interesting. In essence, the type-and-effect system is based on Coq, so you literally get to track whatever you want in the types (as long as you can prove it).

I've read the Singularity

Singularity emphasizes garbage collection, not regions. Each Software Isolated Process (SIP) has its own heap, potentially with its own custom garbage collector, and shared data is held on a GC'd "exchange heap" between SIPs. This is similar to Erlang HiPE and Manticore. I was hoping to explore alternate designs with regions.

Ynot looks interesting. I look forward to reading your papers. :-)

[Edit: the singularity paper did mention Linear types for packet processing, which uses linear types to transfer buffers between agents. This is closer to what I had in mind.]

Linear Regions are all you need.

As Matthew's thesis shows, once you have linear regions (and System-F polymorphism), you can code up a garbage collector *in* the language. You can also code up TT-style regions. So what he has developed is at least as expressive (in terms of space complexity) as both, at least in a theoretical sense.

Inference for such a system is hard, because it really leverages a lot of the power of System-F (in particular, higher-rank polymorphism to encode things such as let-region.) System-F inference is generally undecidable, though as GHC has shown, you can get away with omitting a lot of information and have it recovered. I suspect that similar tricks can be done with the languages described by Matthew's thesis, but this was well outside the scope of the work.

Hmm, I wasn't asking for a

Hmm, I wasn't asking for a general type inference for lambda-URAL, I was just after region inference. Perhaps they are more intertwined than I believed. I will give it some more thought.