archives

Writing practical memory management code with a strictly typed assembly language

Writing practical memory management code with a strictly typed assembly language, by Toshiyuki Maeda, and Akinori Yonezawa.

Memory management (e.g., malloc/free) cannot be implemented in traditional strictly typed programming languages because they do not allow programmers to reuse memory regions, in order to preserve memory safety. Therefore, they depend on external memory management facilities, such as garbage collection. Thus, many important programs that require explicit memory management (e.g., operating systems) have been written in weakly typed programming languages (e.g., C). To address the problem, we designed a new strictly and statically typed assembly language which is flexible and expressive enough to implement memory management. The key idea of our typed assembly language is that it supports variable-length arrays (the arrays whose size is not known until runtime) as language primitives, maintains integer constraints between variables, and keeps track of pointer aliases. Based on the idea, we implemented a prototype implementation of the language. We also implemented a small operating system kernel which provides a memory management facility with the language.

This paper forms part of Toshiyuki Maeda's thesis, where he also uses TALK (OCaml source code available) to build a certified x86 kernel called TOS (source code available). I'm surprised I hadn't heard of this work before, as it seems quite practical.

The paper does not discuss the limitations of TALK, though the thesis does. The most significant disadvantage is described in section 2.8.1:

The type system of TALK is flexible and expressive enough to implement practical memory management code (e.g., malloc/free). However, it is not enough to efficiently implement all the data structures that can be represented in ordinary typed languages, because the type system of TALK restricts pointer manipulation in some cases in order to keep track of aliasing relations between pointers. For example, generic graph data structures including directed acyclic graphs and cyclic graphs cannot be represented naturally in the type system of TALK. This is because once a memory region is encapsulated inside an existential package, the region cannot be accessed until the package is unpacked.
They suggest prior work designed to relax linearity restrictions can be applied to TALK to enable graph structures to be captured.

While TALK is a low-level assembly language, other higher level languages have attempted to capture safe low-level memory management as well.

Problematic data structure in functional language

I am trying to build an efficient data structure for representing boolean expressions in Erlang. I am having trouble producing a good data structure in a functional context, although there are very fast data structures for imperative languages like C. The data structure is to be used for a functional SAT solver.

A boolean expression (always in conjunctive normal form) can be described as a set of variables, such as {p,q,r...} where each variable in the set is mapped to one of three values true|false|indetermined yielding something like {{p->ind},{q->true},{r->ind},...}.

The expression is described as a set of sets of literals, where a literal is a variable with a possible negation (~). So we have something like {{~p,q},{~q,r}} which describes a formula that would usually be written (~p v q)&(~q v r). Each of the inner sets has the property that if one of its literals evaluates to true then the whole set can be removed from the outer set.

To being with we can represent the expression as a list of lists where the variable-name, negation and value are all stored in one tuple. [[{name,negation,value}]]

Now if we assign a value to a variable we then need to traverse every sublist ensuring a consistent assignment is made for every occurrence of the varia ble in the expression. However, as a reward for this effort we can easily check each sublist to see whether one of its members evaluates to true (meaning the sublist can be removed).

Alternately we could represent the expression as a list of lists where just the variable name and negation are recorded {name,negation} and have the assignment values stored as a separate mapping. This makes assignments fast and safer, but we still need to traverse every sublist to see that it does not now contain a literal evaluating to true. Is there a better way, I suspect that I am thinking about it in the wrong way. Thanks.