Implementation

Run time type checking

A couple of items that are loosely related. First, Sun is updating the Java Class File Specification.

The main difference is the introduction of a new verification scheme based on type checking rather than type inference. This scheme has inherent advantages in performance (both space (on the order of 90% savings) and time (approximately 2x)) over the previous approach. It is also simpler and more robust, and helps pave the way for future evolution of the platform.
I haven't read through the fine print of the changes yet - at some point I need to update my class file disassembler - but then I still haven't gotten around to the changes in Java 5 which were much less involved. One thing that I found interesting was the use of Prolog in the specification (predicates and horn clauses).
The type rules that the typechecker enforces are specified by means of Prolog clauses. English language text is used to describe the type rules in an informal way, while the Prolog code provides a formal specification.
On a different note, there is a new thesis paper on A Semantics For Lazy Types related to the Alice ML project. The specifications for runtime type checking are in the more formal operational specs of ML and the runtime type checking is more ambitious than that associated with Java.
Type analysis needs to compare dynamic types, which have to be computed from (probably higher-order) type expressions that may contain types from other modules. If lazy linking has delayed the loading of these modules so far, then the corresponding types are represented by free type variables. For the type-level computation in our model this means that it may encounter free type variables whose denotation depends on yet unevaluated term-level expressions. To proceed, it inevitably has to eliminate these variables by triggering the necessary evaluations on term-level. In other words: type-level computation has to account for lazy types. We give two strategies for this and show soundness properties for the resulting calculi.
In general, it means you still have to with dynamic type checking in static languages - though the meaning is different than the runtime typing that occurs with dynamic PLs.

A Very Modal Model of a Modern, Major, General Type System

A Very Modal Model of a Modern, Major, General Type System, by Andrew W. Appel, Paul-Andre Mellies, Christopher D. Richards, and Jerome Vouillon. Preliminary version of August 10, 2006.

We wish to compile languages such as ML and Java into typed intermediate languages and typed assembly languages. These TILs and TALs are particularly difficult to design, because in order to describe the program transformations applied in the course of compilation, they require a very rich and expressive type system... Putting all these type ingredients together in a low-level language is an intricate exercise. A formal proof of soundness —any well-typed program does not go wrong—is thus recommended for any type system for such TILs and TALs.

It has been awhile since we discussed work in this area. The current paper is quite intriacte, it seems, and I don't have the time to read it carefully. Maybe someone else would care to elaborate. The paper makes a few technical innovations, and uses several interesting techniques. Soundness is not proved syntactically, but rather semantically.

Some LtU member will be happy to see that the authors use Coq to formalize their proofs.

An Incremental Approach to Compiler Construction

Abdulaziz Ghuloum (Indiana University), An Incremental Approach to Compiler Construction

Compilers are perceived to be magical artifacts, carefully crafted by the wizards, and unfathomable by the mere mortals. Books on compilers are better described as wizard-talk: written by and for a clique of all-knowing practitioners. Real-life compilers are too complex to serve as an educational tool. And the gap between real-life compilers and the educational toy compilers is too wide. The novice compiler writer stands puzzled facing an impenetrable barrier, “better write an interpreter instead.”


The goal of this paper is to break that barrier. We show that building a compiler can be as easy as building an interpreter.

This paper from the Scheme workshop presents an extended tutorial showing how to implement a compiler from Scheme to x86 assembly language. The compiler is written in Scheme, obviously (something that may confuse students, I might add, so beware).

An important aspect of the presentation of the material is that the compiler is built incrementally, and the product of each step is a working compiler for a growing subset of the language. This is similar to the EOPL approach. In contrast many compiler textbooks and courses follow a different route, in which each phase is dedicated to building a different part of the compiler (i.e., lexer, parser etc.), and thus the student only sees how the parts interconnect, and gets the satisfaction of seeing the compiler work, at the very end of the process.

Supporting material can be found here.

Samurai - Protecting Critical Heap Data in Unsafe Languages

Samurai - Protecting Critical Heap Data in Unsafe Languages.
Karthik Pattabiraman, Vinod Grover, Benjamin G. Zorn.
September 2006.

Programs written in type-unsafe languages such as C and C++ incur costly memory safety errors that result in corrupted data structures, program crashes, and incorrect results. Previous approaches to eliminating these errors attempt to eliminate all unsafe memory operations in a program. We present Samurai, a runtime system that allows programmers to selectively identify heap objects that are critical to correct execution of their program. Samurai supports operations to consistently read and update critical data and probabilistically guarantees that no other memory updates in the program will corrupt critical data. Samurai uses replication and randomization to provide these consistency guarantees. Because Samurai is oblivious to memory operations on noncritical data, the majority of memory operations in programs run at full speed, and Samurai is compatible with 3rd party libraries. We have annotated five benchmark programs with Samurai and we present measurements of the execution overhead and fault tolerance that Samurai provides. Samurai can be applied selectively to parts of a program’s heap allowing the execution overhead to be tailored for the needs of a particular application.

Essentially, you use a custom memory allocator for critical objects, so that they are stored redundantly (i.e, they are replicated).

Related reading: Failure-oblivious computing.

Gradual Typing for Functional Languages

Gradual Typing for Functional Languages

Static and dynamic type systems have well-known strengths and weaknesses, and each is better suited for different programming tasks. There have been many efforts to integrate static and dynamic typing and thereby combine the benefits of both typing disciplines in the same language. The flexibility of static typing can be improved by adding a type Dynamic and a typecase form. The safety and performance of dynamic typing can be improved by adding optional type annotations or by performing type inference (as in soft typing). However, there has been little formal work on type systems that allow a programmer-controlled migration between dynamic and static typing. Thatte proposed Quasi-Static Typing, but it does not statically catch all type errors in completely annotated programs. Anderson and Drossopoulou defined a nominal type system for an object-oriented language with optional type annotations. However, developing a sound, gradual type system for functional languages with structural types is an open problem.

In this paper we present a solution based on the intuition that the structure of a type may be partially known/unknown at compile-time and the job of the type system is to catch incompatibilities between the known parts of types. We define the static and dynamic semantics of a λ-calculus with optional type annotations and we prove that its type system is sound with respect to the simply-typed λ-calculus for fully-annotated terms. We prove that this calculus is type safe and that the cost of dynamism is “pay-as-you-go”.

In other news, the Holy Grail has been found. Film at 11.

This piece of genius is the combined work of Jeremy Siek, of Boost fame, and Walid Taha, of MetaOCaml fame. The formalization of their work in Isabelle/Isar can be found here.

I found this while tracking down the relocated Concoqtion paper. In that process, I also found Jeremy Siek's other new papers, including his "Semantic Analysis of C++ Templates" and "Concepts: Linguistic Support for Generic Programming in C++." Just visit Siek's home page and read all of his new papers, each of which is worth a story here.

Lisp Machine Manual

This is a prerelease version of the hypertext edition of the 6th edition of the Lisp Machine Manual. To read this manual, you need to use a browser that is capable of performing XSL transformations (Firefox, Internet Explorer and others).

A 1984 document by Richard Stallman, Daniel Weinreb and David Moon.

The preface contains a "personal note" from Stallman announcing GNU.

Some sections of interest are: cdr coding, closure manipulation functions, stack groups, and areas.

Concoqtion: Mixing Indexed Types and Hindley-Milner Type Inference

From the "Whoa!" files:

Concoqtion: Mixing Indexed Types and Hindley-Milner Type Inference

This paper addresses the question of how to extend OCaml’s Hindley-Milner type system with types indexed by logical propositions and proofs of the Coq theorem prover, thereby providing an expressive and extensible mechanism for ensuring fine-grained program invariants. We propose adopting the approached used by Shao et al. for certified binaries. This approach maintains a phase distinction between the computational and logical languages, thereby limiting effects and non-termination to the computational language, and maintaining the decidability of the type system. The extension subsumes language features such as impredicative first-class (higher-rank) polymorphism and type operators, that are notoriously difficult to integrate with the Hindley-Milner style of type inference that is used in OCaml. We make the observation that these features can be more easily integrated with type inference if the inference algorithm is free to adapt the order in which it solves typing constraints to each program. To this end we define a novel “order-free” type inference algorithm. The key enabling technology is a graph representation of constraints and a constraint solver that performs Hindley-Milner inference with just three graph rewrite rules.

Another tough-to-categorize one: dependent types, the Curry-Howard Correspondence, logic programming, theorem provers as subsystems of compilers, implementation issues... it's all in here.

Update: A prototype implementation is available here, but it took a bit of Google-fu to find, and it's brand new, so be gentle.

Update II: The prototype implementation isn't buildable out of the box, and includes a complete copy of both the Coq and O'Caml distributions, presumably with patches etc. already applied. So it's clearly extremely early days yet. But this feels very timely to me, perhaps because I've just started using Coq within the past couple of weeks, and got my copy of Coq'Art and am enjoying it immensely.

Update III: It occurs to me that this might also relate to Vesa Karvonen's comment about type-indexed functions, which occurs in the thread on statically-typed capabilities, so there might be a connection between this front-page story and the front-page story on lightweight static capabilities. That thought makes me happy; I love it when concepts converge.

Leaky regions: linking reclamation hints to program structure

Tim Harris. Leaky regions: linking reclamation hints to program structure. June 2006 Microsoft Research Technical Report MSR-TR-2006-84.

This paper presents a new mechanism for automatic storage reclamation based on exploiting information about the relationship between object lifetimes and points in a program's execution. In our system method calls are annotated to indicate that most of the objects allocated during the call are expected to be unreachable by the time it returns. A write barrier detects if objects escape from one of these calls, causing them to be retained and subsequently managed by an ordinary generational collector. We describe a tool that helps select suitable annotation sites and we outline how this process can be fully automated. We show that if these annotations are placed judiciously then the additional costs of the write barrier can be outweighed by savings in collection time.

Not really my area of interest, but since there was much discussion about GC and related topics in the discussion group I thought I'd post a link to this tech report.

Abstracting Allocation: The New new Thing

Abstracting Allocation: The New new Thing. Nick Benton.

We introduce a Floyd-Hoare-style framework for specification and verification of machine code programs, based on relational parametricity (rather than unary predicates) and using both step-indexing and a novel form of separation structure. This yields compositional, descriptive and extensional reasoning principles for many features of low-level sequential computation: independence, ownership transfer, unstructured control flow, first-class code pointers and address arithmetic. We demonstrate how to specify and verify the implementation of a simple memory manager and, independently, its clients in this style. The work has been fully machine-checked within the Coq proof assistant.

This is, of course, related to TAL, PCC etc. If you find the deatils too much, I suggest reading the discussion (section 7) to get a feel for the possible advantages of this approach.

Pluvo : new hybrid scripting language

Pluvo is a functional and imperative programming language intended to be highly practical for scripting and CGIs, interpreted using a just-in-time compiler. Its hybrid dynamic and strict typing is flexible enough to allow duck typing. Its functions are, more specifically, closures, and it allows object orientation using prototypes.

From Sean B. Palmer.

Knowing he's a big fan of Python I expected a fair bit of influence - and there is, in fact the implementation is written in Python. Flexibility over typing was to be expected too, Sean's done a lot of work around RDF. Slightly surprising was Pluvo's syntax, which owes more to Bash.

XML feed