Implementation

Technometria: Google Web Toolkit

This episode of Phil Windley's Technometria podcast is dedicated to the Google Web Toolkit. The guest on the show is Bruce Johnson, a Tech Lead on GWT.

The show is very good, and more technical than usual. Many themes that are near and dear to LtU are discussed. Here are some pointers:

Bruce talks at length about the advantages of compiling from Java to JS, many of which arise from Java's static typing. He mainly talks about optimizations, but also about how static typing helps with tools in general (IDEs etc.). This was a subject of long and stormy debates here in the past.

The advantages, from a software engineering standpoint, of building in Java vs. JS are discussed. This is directly related to the ongoing discussion here on the new programming-in-the-large features added to JS2. I wonder whether someone will eventually write a compiler from Java/GWT to JS2, which would let projects move to JS2 and jump ship on Java altogether.

Bruce mentions that since JS isn't class-based, and thus doesn't directly support the OO style many people are used to, there are many ways of translating common OO idioms into JS. This is, of course, the same type of dilemma the Scheme community faces with many high-level features. Cast as a question about OOP support, it becomes: is it better to provide language constructs that let libraries add OO support in different ways, or to provide language support for one specific style? The same can be asked about a variety of features and programming styles, of course.
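To make the dilemma concrete, here is a small sketch, in OCaml (to match the ML flavor of other items below) rather than JS, of the same object encoded two ways: as a library-level record of closures, and with the language's built-in object system. This illustrates the design question only; it is not GWT's actual translation scheme.

    (* Two encodings of a "counter" object: OO as a library idiom vs. OO as
       a language feature. Illustration only -- not how GWT compiles Java. *)

    (* (a) Library-style: an object is a record of closures over hidden
       state; this is one of many possible encodings a library could pick. *)
    type counter = { bump : unit -> unit; value : unit -> int }

    let make_counter () =
      let n = ref 0 in
      { bump = (fun () -> incr n); value = (fun () -> !n) }

    (* (b) Built-in style: the language fixes a single object model. *)
    class counter_obj = object
      val mutable n = 0
      method bump = n <- n + 1
      method value = n
    end

    let () =
      let c = make_counter () in
      c.bump (); c.bump ();
      let o = new counter_obj in
      o#bump;
      Printf.printf "record: %d, object: %d\n" (c.value ()) o#value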

Finally, Bruce mentions that as far as he knows no one thought of something like GWT before they did. Well, I for one, and I don't think I was the only one, talked many times (probably on LtU) about JavaScript as a VM/assembly language of the browser, clearly thinking of JS as a target language. I admit I wasn't thinking about compiling Java... But then, I am not into writing Java, so why would I think of Java as the source language?

The End of an Architectural Era (It’s Time for a Complete Rewrite)

The End of an Architectural Era (It’s Time for a Complete Rewrite). Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, Pat Helland. VLDB 2007.

A not directly PL-related paper about a new database architecture, but the authors provide some interesting and possibly controversial perspectives:

  • They split the database engine into per-core, single-threaded instances with no communication between them.
  • Instead of using SQL from an external (web app) process to communicate with the database, they envision embedding Ruby on Rails directly into the database.
  • They state that most OLTP workloads run pre-canned transactions only, so there is no need for ad-hoc querying (a toy sketch of this style follows the list).
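Here is that toy OCaml sketch, with invented names (H-Store itself is not written in OCaml): the engine exposes no query-string interface at all, only a dispatch table of transactions registered at deployment time.

    (* Sketch: clients can only invoke transactions that were registered up
       front; there is no ad-hoc SQL entry point. Names are hypothetical. *)
    type db = (string, int) Hashtbl.t

    let procedures : (string, db -> int -> unit) Hashtbl.t = Hashtbl.create 16
    let register name proc = Hashtbl.replace procedures name proc

    (* The whole transaction mix is known before the system starts. *)
    let () =
      register "new_order" (fun db qty ->
        let sold = try Hashtbl.find db "sold" with Not_found -> 0 in
        Hashtbl.replace db "sold" (sold + qty))

    (* The single entry point: dispatch by name, reject everything else. *)
    let invoke db name arg =
      match Hashtbl.find_opt procedures name with
      | Some proc -> proc db arg
      | None -> invalid_arg ("unknown transaction: " ^ name)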

The somewhat performance-focused abstract:

In two previous papers some of us predicted the end of "one size fits all" as a commercial relational DBMS paradigm. These papers presented reasons and experimental evidence that showed that the major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines in the data warehouse, stream processing, text, and scientific data base markets.

Assuming that specialized engines dominate these markets over time, the current relational DBMS code lines will be left with the business data processing (OLTP) market and hybrid markets where more than one kind of capability is required. In this paper we show that current RDBMSs can be beaten by nearly two orders of magnitude in the OLTP market as well. The experimental evidence comes from comparing a new OLTP prototype, H-Store, which we have built at M.I.T., to a popular RDBMS on the standard transactional benchmark, TPC-C.

We conclude that the current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact excel at nothing. Hence, they are 25-year-old legacy code lines that should be retired in favor of a collection of "from scratch" specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow's requirements, not continue to push code lines and architectures designed for yesterday's requirements.

A critical comment by Amazon's CTO, Werner Vogels.

binpac: A yacc for Writing Application Protocol Parsers

binpac: A yacc for Writing Application Protocol Parsers.

R. Pang, V. Paxson, R. Sommer, and L. Peterson. ACM Internet Measurement Conference. October 2006.

A key step in the semantic analysis of network traffic is to parse the traffic stream according to the high-level protocols it contains. This process transforms raw bytes into structured, typed, and semantically meaningful data fields that provide a high-level representation of the traffic. However, constructing protocol parsers by hand is a tedious and error-prone affair due to the complexity and sheer number of application protocols. This paper presents binpac, a declarative language and compiler designed to simplify the task of constructing robust and efficient semantic analyzers for complex network protocols. We discuss the design of the binpac language and a range of issues in generating efficient parsers from high-level specifications. We have used binpac to build several protocol parsers for the "Bro" network intrusion detection system, replacing some of its existing analyzers (handcrafted in C++), and supplementing its operation with analyzers for new protocols. We can then use Bro's powerful scripting language to express application-level analysis of network traffic in high-level terms that are both concise and expressive. binpac is now part of the open-source Bro distribution.

Binpac nicely abstracts away issues such as large numbers of concurrent, asynchronous parsing processes and protocol specifics (such as HTTP's chunked encoding). A parser for a large part of HTTP is presented in the paper and fits on half a page. The authors have also written parsers for CIFS/SMB, DCE/RPC, DNS, NCP, and Sun/RPC.
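To give a feel for what binpac abstracts away, here is a hand-written OCaml sketch of the underlying problem (an illustration only, not binpac's specification language): even a trivial length-prefixed message format forces the parser to cope with input arriving in arbitrary fragments.

    (* Wire format: one length byte, then that many payload bytes.
       [feed] may be called with arbitrarily fragmented chunks, as
       happens when reassembling a TCP stream. *)
    type stream = {
      buf : Buffer.t;             (* received but not yet consumed *)
      mutable msgs : string list; (* completed messages, newest first *)
    }

    let make () = { buf = Buffer.create 64; msgs = [] }

    let feed st chunk =
      Buffer.add_string st.buf chunk;
      let rec drain () =
        let data = Buffer.contents st.buf in
        let n = String.length data in
        if n > 0 then begin
          let len = Char.code data.[0] in
          if n >= 1 + len then begin
            st.msgs <- String.sub data 1 len :: st.msgs;
            Buffer.clear st.buf;
            Buffer.add_substring st.buf data (1 + len) (n - 1 - len);
            drain ()
          end
        end
      in
      drain ()

    (* let st = make () in feed st "\003ab"; feed st "c\001x"
       leaves st.msgs = ["x"; "abc"] *)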

Derivation and Evaluation of Concurrent Collectors

Derivation and Evaluation of Concurrent Collectors, Martin T. Vechev, David F. Bacon, Perry Cheng, and David Grove. ECOOP 2005.

There are many algorithms for concurrent garbage collection, but they are complex to describe, verify, and implement. This has resulted in a poor understanding of the relationships between the algorithms, and has precluded systematic study and comparative evaluation. We present a single high-level, abstract concurrent garbage collection algorithm, and show how existing snapshot and incremental update collectors can be derived from the abstract algorithm by reducing precision. We also derive a new hybrid algorithm that reduces floating garbage while terminating quickly. We have implemented a concurrent collector framework and the resulting algorithms in IBM’s J9 Java virtual machine product and compared their performance in terms of space, time, and incrementality. The results show that incremental update algorithms sometimes reduce memory requirements (on 3 of 5 benchmarks) but they also sometimes take longer due to recomputation in the termination phase (on 4 of 5 benchmarks). Our new hybrid algorithm has memory requirements similar to the incremental update collectors while avoiding recomputation in the termination phase.
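The "reducing precision" step is easiest to see in the write barriers that distinguish the two classic families the paper derives. A schematic OCaml rendition (the types are invented for illustration; the paper's framework lives inside the J9 VM):

    (* A shaded object has been noted for (re)scanning by the collector. *)
    type obj = { mutable field : obj option; mutable marked : bool }

    let worklist : obj list ref = ref []
    let shade o =
      if not o.marked then begin o.marked <- true; worklist := o :: !worklist end

    (* Snapshot-at-the-beginning (Yuasa-style): preserve the pointer being
       overwritten, so everything reachable when collection started stays
       reachable -- more floating garbage, but easy termination. *)
    let write_snapshot src newval =
      (match src.field with Some old -> shade old | None -> ());
      src.field <- newval

    (* Incremental update (Dijkstra-style): shade the pointer being
       installed, so the collector sees edges the mutator creates -- less
       floating garbage, but termination may require rescanning. *)
    let write_incremental src newval =
      (match newval with Some v -> shade v | None -> ());
      src.field <- newval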

The Manticore Project

Status Report: The Manticore Project. Along the lines of Concurrent ML comes a new language design aimed at exploiting the parallelism of multicore CPUs.

The Manticore project is an effort to design and implement a new functional language for parallel programming. Unlike many earlier parallel languages, Manticore is a heterogeneous language that supports parallelism at multiple levels. Specifically, we combine CML-style explicit concurrency with fine-grain, implicitly threaded, parallel constructs. We have been working on an implementation of Manticore for the past six months; this paper gives an overview of our design and a report on the status of the implementation effort.

Interesting material in terms of both approach and implementation, based on an ML-style language (sans mutable variables). But the underlying assumption that language design offers the best path to solving parallelism is probably the key appeal for LtU:

The problem posed by such processors is how to effectively exploit the different forms of parallelism provided by the hardware. We believe that mechanisms that are well integrated into a programming language are the best hope for achieving parallelism across a wide range of applications, which is why we are focusing on language design and implementation.
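For readers who have not seen CML, OCaml's Event module (itself modeled on Concurrent ML) gives the flavor of the explicit-concurrency half of the design: synchronous channels and first-class, composable events. A minimal sketch (requires OCaml's threads library):

    let () =
      let open Event in
      let ch = new_channel () in
      (* A worker thread communicates its result over a typed channel. *)
      let _worker = Thread.create (fun () -> sync (send ch (21 * 2))) () in
      (* Events are first-class values: [wrap] composes a receive with a
         post-processing step before it is committed to with [sync]. *)
      let ev = wrap (receive ch) (fun n -> Printf.sprintf "got %d" n) in
      print_endline (sync ev)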

And as long as we're on the subject, I see that Reppy's book on Concurrent ML is now in paperback form (moving it more into my price range).

Zipper as Insecticide

From the 2005 ACM SIGPLAN Workshop on ML, here is An Applicative Control-Flow Graph Based on Huet's Zipper, by Norman Ramsey and João Dias.

We are using ML to build a compiler that does low-level optimization. To support optimizations in classic imperative style, we built a control-flow graph using mutable pointers and other mutable state in the nodes. This decision proved unfortunate: the mutable flow graph was big and complex, and it led to many bugs. We have replaced it by a smaller, simpler, applicative flow graph based on Huet's (1997) zipper. The new flow graph is a success; this paper presents its design and shows how it leads to a gratifyingly simple implementation of the dataflow framework developed by Lerner, Grove, and Chambers (2002).
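For those who have not met Huet's zipper: it is a purely functional cursor into a data structure that makes edits at the focus cheap, with no mutable pointers. A minimal list version (the paper applies the same idea to a graph of basic blocks):

    (* A zipper over a list: the elements before the focus are kept in
       reverse order, so moving and editing at the focus are O(1). *)
    type 'a zipper = { before : 'a list; after : 'a list }

    let of_list xs = { before = []; after = xs }

    let right z =
      match z.after with
      | x :: rest -> { before = x :: z.before; after = rest }
      | [] -> z

    (* A local edit allocates a little and mutates nothing. *)
    let insert x z = { z with after = x :: z.after }

    let to_list z = List.rev_append z.before z.after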

Tagless Staged Interpreters for Simpler Typed Languages

Finally Tagless, Partially Evaluated: Tagless Staged Interpreters for Simpler Typed Languages.
Jacques Carette, Oleg Kiselyov, and Chung-chieh Shan.

We have built the first family of tagless interpretations for a higher-order typed object language in a typed metalanguage (Haskell or ML) that require no dependent types, generalized algebraic data types, or postprocessing to eliminate tags. The statically type-preserving interpretations include an evaluator, a compiler (or staged evaluator), a partial evaluator, and call-by-name and call-by-value CPS transformers.
Our main idea is to encode HOAS using cogen functions rather than data constructors. In other words, we represent object terms not in an initial algebra but using the coalgebraic structure of the lambda-calculus. Our representation also simulates inductive maps from types to types, which are required for typed partial evaluation and CPS transformations.

Oleg explains:

It seems like common wisdom that embedding a typed object language (e.g., a DSL) into a typed metalanguage, so that all and only typed object terms can be represented, requires dependent types, GADTs, or other such advanced type systems. In fact, this problem of writing (tagless) type-preserving typed interpreters has been the motivation for most of the papers on GADTs. We show that, regardless of the merits and conveniences of GADTs, type-preserving typed interpretation can be achieved with no GADTs whatsoever, using the very simple type systems of ML or Haskell98. We also show that the same approach lets us perform statically type-preserving partial evaluation and call-by-value or call-by-name CPS transformations. The latter transformations, too, are often claimed impossible in Haskell98 or ML, requiring instead quite advanced type systems or language features.
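The essence of the encoding is small enough to show. Here it is condensed to plain OCaml modules (the paper goes much further, covering staging, partial evaluation, and the CPS transforms):

    (* Object terms are built only from the operations of a signature, so
       each interpretation is just another structure: no tags, no GADTs. *)
    module type Symantics = sig
      type 'a repr
      val int_ : int -> int repr
      val add  : int repr -> int repr -> int repr
      val lam  : ('a repr -> 'b repr) -> ('a -> 'b) repr
      val app  : ('a -> 'b) repr -> 'a repr -> 'b repr
    end

    (* One interpretation: a direct evaluator, where 'a repr = 'a. *)
    module Eval : Symantics with type 'a repr = 'a = struct
      type 'a repr = 'a
      let int_ n = n
      let add a b = a + b
      let lam f = f
      let app f x = f x
    end

    (* A term written once, interpretable by any Symantics instance. *)
    module Ex (S : Symantics) = struct
      open S
      let term = app (lam (fun x -> add x (int_ 1))) (int_ 41)
    end

    let () =
      let module M = Ex (Eval) in
      Printf.printf "%d\n" M.term   (* prints 42 *)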

The complete (Meta)OCaml and Haskell code accompanying the paper is available (see the readme).

One of the features of our approach is writing the DSL code in a form that can be interpreted in multiple ways. Recently we have become aware that the very same approach underlies `abstract categorial grammars' (ACG) in linguistics. Chung-chieh Shan has written an extensive article on this correspondence. That post itself can be interpreted in several ways: the file can be read as plain text, or it can be loaded as-is into a Haskell or OCaml interpreter.

It should be noted that the linguistic terms `tectogrammatics' and `phenogrammatics' were coined by none other than Haskell Curry, in his famous 1961 paper 'Some Logical Aspects of Grammatical Structure'. The summary of the ESSLLI workshop describes further connections to linear lambda-calculus.

The paper has been accepted to APLAS; the authors would indeed appreciate any comments.

A functional correspondence between evaluators and abstract machines

A functional correspondence between evaluators and abstract machines by Mads Sig Ager, Dariusz Biernacki, Olivier Danvy, and Jan Midtgaard, 2003.

We bridge the gap between functional evaluators and abstract machines for the lambda-calculus, using closure conversion, transformation into continuation-passing style, and defunctionalization of continuations.
We illustrate this bridge by deriving Krivine's abstract machine from an ordinary call-by-name evaluator and by deriving an ordinary call-by-value evaluator from Felleisen et al.'s CEK machine. The first derivation is strikingly simpler than what can be found in the literature. The second one is new. Together, they show that Krivine's abstract machine and the CEK machine correspond to the call-by-name and call-by-value facets of an ordinary evaluator for the lambda-calculus.
We then reveal the denotational content of Hannan and Miller's CLS machine and of Landin's SECD machine. We formally compare the corresponding evaluators and we illustrate some relative degrees of freedom in the design spaces of evaluators and of abstract machines for the lambda-calculus with computational effects.
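The end point of that first derivation is compact enough to show. Here is Krivine's machine transcribed into OCaml (de Bruijn indices, closed terms assumed); the paper's contribution is the derivation, not the machine itself:

    type term = Var of int | Lam of term | App of term * term
    type closure = Clo of term * env
    and env = closure list

    (* Machine state: a closure under evaluation and a stack of argument
       closures (call-by-name: arguments are passed unevaluated). *)
    let rec run (Clo (t, e)) stack =
      match t, stack with
      | Var 0, _         -> run (List.hd e) stack          (* look up  *)
      | Var n, _         -> run (Clo (Var (n - 1), List.tl e)) stack
      | Lam b, c :: rest -> run (Clo (b, c :: e)) rest     (* bind arg *)
      | Lam _, []        -> Clo (t, e)                     (* answer   *)
      | App (f, a), _    -> run (Clo (f, e)) (Clo (a, e) :: stack)

    let eval t = run (Clo (t, [])) []
    (* eval (App (Lam (Var 0), Lam (Var 0))) returns the identity closure *)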

I was surprised not to find this paper featured in an LtU story before, as it looks very important from both implementation and understanding points of view.
See also a more recent paper by Dariusz Biernacki and Olivier Danvy, From Interpreter to Logic Engine by Defunctionalization, which applies similar ideas in the context of a Prolog-like language.

Beyond Pretty-Printing: Galley Concepts in Document Formatting Combinators

Beyond Pretty-Printing: Galley Concepts in Document Formatting Combinators, Wolfram Kahl. 1999.

Galleys have been introduced by Jeff Kingston as one of the key concepts underlying his advanced document formatting system Lout. Although Lout is built on a lazy functional programming language, galley concepts are implemented as part of that language and defined only informally. In this paper we present a first formalization of document formatting combinators using galley concepts in the purely functional programming language Haskell.

We've talked about functional layout algorithms for mathematics before, so it seems like a good idea to link to a paper about typesetting all the text those formulas are surrounded by.
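Kahl's galleys go well beyond the pretty-printing combinators LtU readers may already know, but the starting point is the same: documents as a small algebra whose renderings are given by interpretation. A stripped-down sketch of that starting point (not Kahl's actual combinators):

    (* A tiny document algebra with one rendering. Real formatting
       combinators add grouping, width-aware layout choice, and, in
       Kahl's setting, galleys that flow into receptive areas. *)
    type doc =
      | Text of string
      | Line                 (* a line break *)
      | Cat of doc * doc
      | Nest of int * doc    (* deepen indentation of enclosed breaks *)

    let ( ^^ ) a b = Cat (a, b)

    let rec render indent = function
      | Text s -> s
      | Line -> "\n" ^ String.make indent ' '
      | Cat (a, b) -> render indent a ^ render indent b
      | Nest (i, d) -> render (indent + i) d

    let () =
      print_endline
        (render 0 (Text "section" ^^ Nest (2, Line ^^ Text "body")))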

Status Report: HOT Pickles, and how to serve them

Status Report: HOT Pickles, and how to serve them, Andreas Rossberg, Guido Tack, and Leif Kornstedt. ML Workshop 2007.

The need for flexible forms of serialisation arises under many circumstances, e.g. for doing high-level inter-process communication or to achieve persistence. Many languages, including variants of ML, thus offer pickling as a system service, but usually in a both unsafe and inexpressive manner, so that its use is discouraged. In contrast, safe generic pickling plays a central role in the design and implementation of Alice ML: components are defined as pickles, and modules can be exchanged between processes using pickling. For that purpose, pickling has to be higher-order and typed (HOT), i.e. embrace code mobility and involve runtime type checks for safety. We show how HOT pickling can be realised with a modular architecture consisting of multiple abstraction layers for separating concerns, and how both language and implementation benefit from a design consistently based on pickling.
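For contrast, OCaml's built-in Marshal module is a fair example of the "unsafe and inexpressive" system-service pickling the abstract alludes to: the type at the deserialization site is an unchecked promise, and functional values are rejected unless one opts into unsafe closure serialization, which is precisely what Alice ML's higher-order, typed pickles repair.

    (* Marshalling round-trip in stock OCaml: the annotation below is a
       promise the compiler cannot verify -- asserting the wrong type leads
       to undefined behavior, since there is no runtime type check. *)
    let () =
      let s = Marshal.to_string (1, "one") [] in
      let (n, label) : int * string = Marshal.from_string s 0 in
      Printf.printf "%d %s\n" n label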
