archives

Evolutionary Programming and Gradual Typing in ECMAScript 4

Evolutionary Programming and Gradual Typing in ECMAScript 4, by Lars T Hansen, Adobe Systems, is an 18 page tutorial detailing features of ECMAScript 4, and how they can be applied to existing ECMAScript code to improve it.

This tutorial uses a simple library as a running example to illustrate the evolution of a program from the ES3 "script" stage, via various levels of typing and rigor, to an ES4 package with greater guarantees of integrity and better performance potential than the original code. We then look at how union types, generic functions, and reflection can be used to work with a library whose code we can't modify.

Monadic and Substructural Type Systems for Region-Based Memory Management

Matthew Fluet's thesis, Monadic and Substructural Type Systems for Region-Based Memory Management:

Region-based memory management is a scheme for managing dynamically allocated data. A defining characteristic of region-based memory management is the bulk deallocation of data, which avoids both the tedium of malloc/free and the overheads of a garbage collector. Type systems for region-based memory management enhance the utility of this scheme by statically determining when a program is guaranteed to not perform any erroneous region operations.

We describe three type systems for region-based memory management:

* a type-and-effect system (a la the Tofte-Talpin region calculus);
* a novel monadic type system;
* a novel substructural type system.

We demonstrate how to successively encode the type-and-effect system into the monadic type system and the monadic type system into the substructural type system. These type systems and encodings support the argument that the type-and-effect systems that have traditionally been used to ensure the safety of region-based memory management are neither the simplest nor the most expressive type systems for this purpose.

The monadic type system generalizes the state monad of Launchbury and Peyton Jones and demonstrates that the well-understood parametric polymorphism of System F provides sufficient encapsulation to ensure the safety of region-based memory management. The essence of the first encoding is to translate effects to an indexed monad, trading the subtleties of a type-and-effect system for the simplicity of a monadic type system.

However, both the type-and-effect system and the monadic type system require that regions have nested lifetimes, following the lexical scope of the program, restricting when data may be effectively reclaimed. Hence, we introduce a substructural type system that eliminates the nested-lifetimes requirement. The key idea is to introduce first-class capabilities that mediate access to a region and to provide separate primitives for creating and destroying regions. The essence of the second encoding is to "break open" the monad to reveal its store-passing implementation.

Finally, we show that the substructural type system is expressive enough to faithfully encode other advanced memory-management features.

I've just finished chapter 4 which describes the substructural type system for regions. Figure 4.22 is a matrix defining the allowable operations for linear (L), affine (A), relevant (R) and unrestricted (U) references; R and U references have no "free" operation. How is R and U data reclaimed?

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

I think the real culprit behind the "poor performance" of GC languages is demonstrated in Quantifying the Performance of Garbage Collection vs. Explicit Memory Management, by Matthew Hertz and Emery D. Berger:

Garbage collection yields numerous software engineering benefits, but its quantitative impact on performance remains elusive. One can measure the cost of conservative garbage collection relative to explicit memory management in C/C++ programs by linking in an appropriate collector. This kind of direct comparison is not possible for languages designed for garbage collection (e.g., Java), because programs in these languages naturally do not contain calls to free. Thus, the actual gap between the time and space performance of explicit memory management and precise, copying garbage collection remains unknown.

We take the first steps towards quantifying the performance of precise garbage collection versus explicit memory management. We present a novel experimental methodology that lets us treat unaltered Java programs as if they used explicit memory management. Our system generates exact object reachability information by processing heap traces with the Merlin algorithm [34, 35]. It then re-executes the program, invoking free on objects just before they become unreachable. Since this point is the latest that a programmer could explicitly free objects, our approach conservatively approximates explicit memory management. By executing inside an architecturally-detailed simulator, this “oracular” memory manager eliminates the effects of trace processing while measuring the costs of calling malloc and free.

We compare explicit memory management to both copying and non-copying garbage collectors across a range of benchmarks, and include real (non-simulated) runs that validate our results. These results quantify the time-space tradeoff of garbage collection: with five times as much memory, an Appel-style generational garbage collector with a non-copying mature space matches the performance of explicit memory management. With only three times as much memory, it runs on average 17% slower than explicit memory management. However, with only twice as much memory, garbage collection
degrades performance by nearly 70%. When physical memory is scarce, paging causes garbage collection to run an order of magnitude slower than explicit memory management.

Overall, generational collectors can add up to 50% space overhead and 5-10% runtime overheads if we ignore virtual memory. Very reasonable given the software engineering benefits. However, factoring in virtual memory with its attendant faulting paints a very different picture in section 5.2:

These graphs show that, for reasonable ranges of available memory (but not enough to hold the entire application), both explicit memory managers substantially outperform all of the garbage collectors. For instance, pseudoJBB running with 63MB of available memory and the Lea allocator completes in 25 seconds. With the same amount of available memory and using GenMS, it takes more than ten times longer to complete (255 seconds). We see similar trends across the benchmark suite. The most pronounced case is javac: at 36MB with the Lea allocator, total execution time is 14 seconds, while with GenMS, total execution time is 211 seconds, over a 15-fold increase.

The culprit here is garbage collection activity, which visits far more pages than the application itself [60]

Considering the prevalence of virtual memory, this poses a significant problem for garbage collected languages, server-side systems in particular. LTU has covered garbage collectors that co-operate with virtual memory managers before by the same authors.