Translation of Tree-processing Programs into Stream-processing Programs based on Ordered Linear Types

Translation of Tree-processing Programs into Stream-processing Programs based on Ordered Linear Types, Koichi Kodama, Kohei Suenaga, Naoki Kobayashi, JFP 2008.

There are two ways to write a program for manipulating tree-structured data such as XML documents: One is to write a tree-processing program focusing on the logical structure of the data and the other is to write a stream-processing program focusing on the physical structure. While tree-processing programs are easier to write than stream-processing programs, tree-processing programs are less efficient in memory usage since they use trees as intermediate data. Our aim is to establish a method for automatically translating a tree-processing program to a stream-processing one in order to take the best of both worlds.

We first define a programming language for processing binary trees and a type system based on ordered linear types, and show that every well-typed program can be translated to an equivalent stream-processing program. We then extend the language and the type system to deal with XML documents. We have implemented an XML stream processor generator based on our algorithm, and obtained promising experimental results.

Linear logic is what you get when you give up the structural rules of weakening and contraction -- it's logic when you cannot reuse or forget any hypothesis you make in the course of a proof. This means that every formula is used exactly once, which makes it useful (via Curry-Howard) for programming, as well: linear types give us a way of tracking resources and state in a type system. Intuitively, you can think of it as a kind of static analysis that ensures that an object's runtime reference count will always be one.

However, linear logic retains the structural rule of exchange, which lets us use hypotheses in a different order than we introduced them. Ordered logic is what you get when you drop exchange, as well. (Amusingly, it was invented long before linear logic -- in the 1950s, Lambek introduced it with an eye towards applications in linguistics: he wanted to express the difference between one word occurring either before, or after, another.)

This means that you can use ordered logic to express the order in your types the order in which you use data, as well, which the authors here use to design a language whose typechecker statically rules out memory-inefficient programs.