Implementation

The Transactional Memory / Garbage Collection Analogy

Courtesy of my shiny new commute, I have been listening to various podcasts, including Software Engineering Radio. A while back, they had an interview with Dan Grossman on his OOPSLA 2007 paper, which I have not seen discussed here.

The Transactional Memory / Garbage Collection Analogy is an essay comparing transactional memory with garbage collection based on the analogy:

Transactional memory (TM) is to shared-memory concurrency
as
garbage collection (GC) is to memory management.

Grossman presents the analogy as a word-for-word transliteration of a discussion of each of the technologies. (Hence the "fun" category.)

(As an aside, Grossman does not address message-passing, but says, "Assuming that shared memory is one model we will continue to use for the foreseeable future, it is worth improving," which is probably a correct assumption.)

One point that he does make is that

[This essay] will lead us to the balanced and obvious-once-you-say-it conclusion that transactions make it easy to define critical sections (which is a huge help in writing and maintaining shared-memory programs) but provide no help in identifying where a critical section should begin or end (which remains an enormous challenge).

The one serious weakness of the analogy, to me, is that GC does not require (much) programmer input to work, while TM does.

Although some parts of the analogy are strained, there are some interesting correspondences.
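To make the quoted point concrete, here is a small OCaml sketch. The atomic wrapper below is hypothetical, simulated with a single global mutex purely for illustration; a real TM runtime would execute the body as a transaction.

    (* Sketch only: [atomic] is a hypothetical stand-in, simulated with
       one global lock; real TM would run the body as a transaction.
       Fun.protect needs OCaml >= 4.08; Mutex is in the stdlib in OCaml 5
       (the threads library before that). *)
    let global_lock = Mutex.create ()

    let atomic body =
      Mutex.lock global_lock;
      Fun.protect ~finally:(fun () -> Mutex.unlock global_lock) body

    let balance_a = ref 100
    let balance_b = ref 0

    (* The easy part: marking a chosen region atomic is one wrapper. *)
    let transfer amount =
      atomic (fun () ->
          balance_a := !balance_a - amount;
          balance_b := !balance_b + amount)

    (* The hard part, which TM does not decide for you: is
       check-then-transfer one critical section or two? Split as below,
       another thread can overdraw the account between the two calls. *)
    let risky_transfer amount =
      if atomic (fun () -> !balance_a >= amount)  (* section 1 *)
      then transfer amount                        (* section 2 *)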

Google V8 JavaScript Engine

You can read the docs and download the C++ source here.

V8 is supposedly the main added value of Chrome, the newly announced Google browser.

Our discussion of the Chrome announcement enumerates some of the features of V8.

Closures for C

Chris Lattner's post last week to cfe-dev, "Blocks" in Clang (aka closures), has generated a certain amount of enthusiasm, since Clang is, or will be, a compiler from C to LLVM (cf. LLVM 1.5 has been released!).

The representation of closures as blocks resembles the technique used for Poletto's 'C language; cf. the related discussion on The MetaC Language.
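For anyone who has not met Blocks: they give C first-class function values that capture their lexical environment, i.e. closures. A minimal illustration of the concept in OCaml, where closures are native (the example is mine, not from the Clang proposal):

    (* A closure pairs code with its captured environment: each counter
       closes over its own mutable cell [n]. *)
    let make_counter () =
      let n = ref 0 in
      fun () -> incr n; !n

    let () =
      let c = make_counter () in
      Printf.printf "%d %d\n" (c ()) (c ())  (* prints "1 2" *)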

Continuation Fest 2008

I had received the announcement for the Continuation Fest 2008, but then completely forgot about it. Back in mid-April, some neat stuff was going on in Tokyo:

[Edit: fixed url.]

Catch me if you can: Towards type-safe, hierarchical, lightweight, polymorphic and efficient error management in OCaml

Catch me if you can: Towards type-safe, hierarchical, lightweight, polymorphic and efficient error management in OCaml, by David Teller, Arnaud Spiwack, Till Varoquaux:

This is the year 2008 and ML-style exceptions are everywhere. Most modern languages, whether academic or industrial, feature some variant of this mechanism. Languages such as Java even have a degree of out-of-the-box static coverage-checking for such exceptions, which is currently not available for ML languages, at least not without resorting to external tools.

In this document, we demonstrate a design principle and a tiny library for managing errors in a functional manner, with static coverage-checking, automatically-inferred, structurally typed and hierarchical exceptional cases, all of this for what we believe is a reasonable run-time cost. Our work is based on OCaml and features simple uses of higher-order programming, low-level exceptions, phantom types, polymorphic variants and compile-time code rewriting.

Exhaustively checked, user-friendly exception handling was a bit of an open problem for a while. As the paper details, languages offered either cumbersome, exhaustively checked polymorphic exceptions, as in Haskell; unchecked, easily extensible monomorphic exceptions, as in ML; or checked, extensible exceptions built on a universal type, as in Java.

Supporting exhaustively checked, easily extensible polymorphic exceptions seemed quite a challenge, which this paper solves using monadic error handling and nested polymorphic variants. The paper also gives a good overview of the current techniques for exception checking in OCaml, e.g. ocamlexc.
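The flavour of the technique is easy to convey. The following sketch is my own simplification in plain OCaml, using the standard result type; the paper's library adds the hierarchy and the compile-time rewriting on top. Errors travel through a monad, exceptional cases are polymorphic variants whose union is inferred, and the final match is checked against that union:

    let return x = Ok x
    let ( >>= ) m f = match m with Ok x -> f x | Error e -> Error e
    let throw e = Error e

    let safe_div x y =
      if y = 0 then throw `Division_by_zero else return (x / y)

    let safe_find tbl key =
      match Hashtbl.find_opt tbl key with
      | Some v -> return v
      | None -> throw (`Not_found key)

    (* The error type of [lookup_and_div] is inferred as the union
       [> `Division_by_zero | `Not_found of string ] -- no declarations. *)
    let lookup_and_div tbl key x = safe_find tbl key >>= safe_div x

    (* The handler must cover the inferred union; dropping a case makes
       the compiler flag the match as non-exhaustive. *)
    let run tbl key x =
      match lookup_and_div tbl key x with
      | Ok n -> Printf.printf "result: %d\n" n
      | Error `Division_by_zero -> prerr_endline "division by zero"
      | Error (`Not_found k) -> Printf.eprintf "no such key: %s\n" k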

The performance of such exceptions is understandably lower than that of native exceptions, given all the thunking and indirection that monads entail. The authors try several implementations and measure their performance against native exceptions. Ultimately, monadic error management seems acceptable for actual error handling, but not for the control-flow purposes to which native exceptions are sometimes put in OCaml.

One interesting extension is to consider how efficient the implementations would be given more sophisticated control flow operators, such as continuations, coroutines, or delimited continuations, or whether native exceptions can be salvaged using a type and effects system in place of monads.

Hardware Acceleration of Matrix Multiplication on a Xilinx FPGA

Via Jeff Newbern in the discussion forum comes the writeup from the winning entry in the MEMOCODE 2007 contest:

This year, the first MEMOCODE hardware/software codesign contest posed the following problem: optimize matrix-matrix multiplication in such a way that it is split between the FPGA and PowerPC on a Xilinx Virtex IIPro 30. In this paper we discuss our solution, which we implemented on a Xilinx XUP development board with 256 MB of DRAM. The design was done by the five authors over a span of approximately 3 weeks, though of the 15 possible man-weeks, about 9 were actually spent working on this problem. All hardware design was done using Bluespec SystemVerilog (BSV), with the exception of an imported Verilog multiplication unit, necessary only due to the limitations of the Xilinx FPGA toolflow optimizations.

Wow! This is much cooler than the kinds of programs I write :-)

This MIT/Bluespec team won the first contest at MEMOCODE 2007 and the second in MEMOCODE 2008. The results for 2008 are very interesting: The Bluespec team had the highest performance by an order of magnitude, the next fastest program was written by one guy in straight C without using the FPGA, and the rest were mostly in a hybrid of C and (V?)HDL.

Does this make Bluespec the programming tool of choice for discriminating hardware hackers?

Project Coverage

Thanks to French public funds, the next generation of Free Software code coverage tools is on its way. Project Coverage will produce a Free Software coverage analysis toolset together with the ability to generate artifacts that allow the tools to be used for safety-critical software projects undergoing a DO-178B software audit process for all levels of criticality...

The key insight of “Project Coverage” is as follows: code coverage can greatly benefit from recent advances in hardware virtualization technology as promoted, for instance, by QEMU.

Pure imperative programming

Two intensively studied intermediate representations in compiler theory are Static Single Assignment form (SSA) and continuation-passing style (CPS), and Richard Kelsey's 1995 paper, A Correspondence Between Continuation Passing Style and Static Single Assignment Form (.ps.gz), shows a nearly complete, exact correspondence between the two IRs.

The correspondence shows how the imperatively expressed SSA can be regarded as side-effect free, and Andrew Appel has pushed this idea to claim that SSA is functional programming. This result is of clear relevance to discussions turning on "what is purity?", such as here.
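The essence of the correspondence fits in a few lines. In this sketch (mine, not Kelsey's notation), the phi-node of SSA becomes the parameter of a local join-point function, and each SSA assignment becomes a binding:

    (* Imperative source:        SSA form:
         if b then x := 1          if b then x1 := 1; goto join(x1)
              else x := 2               else x2 := 2; goto join(x2)
         return x + 1              join(x3 := phi(x1, x2)): return x3 + 1

       Functional reading: the phi-node is just the parameter of the
       join-point function, and no mutation remains. *)
    let example b =
      let join x3 = x3 + 1 in    (* the join block; x3 plays phi's role *)
      if b then join 1 else join 2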

As an aside, the Wikipedia article Static Single Assignment form is rather good.

Parametric Higher-Order Abstract Syntax for Mechanized Semantics

Parametric Higher-Order Abstract Syntax for Mechanized Semantics

We present parametric higher-order abstract syntax (PHOAS), a new approach to formalizing the syntax of programming languages in computer proof assistants based on type theory. Like higher-order abstract syntax (HOAS), PHOAS uses the meta language's binding constructs to represent the object language's binding constructs. Unlike HOAS, PHOAS types are definable in general-purpose type theories that support traditional functional programming, like Coq's Calculus of Inductive Constructions. We walk through how Coq can be used to develop certified, executable program transformations over several statically-typed functional programming languages formalized with PHOAS; that is, each transformation has a machine-checked proof of type preservation and semantic preservation. Our examples include CPS translation and closure conversion for simply-typed lambda calculus, CPS translation for System F, and translation from a language with ML-style pattern matching to a simpler language with no variable-arity binding constructs. By avoiding the syntactic hassle associated with first-order representation techniques, we achieve a very high degree of proof automation.
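The core idea is small enough to sketch outside Coq. Here is my own OCaml rendition of PHOAS for the untyped lambda calculus: binders are ordinary OCaml functions, but the variable type stays a parameter, so terms can neither forge nor inspect variables.

    (* PHOAS: object-language binders are host-language functions over an
       abstract variable type 'v ("parametric" in 'v). *)
    type 'v term =
      | Var of 'v
      | Lam of ('v -> 'v term)
      | App of 'v term * 'v term

    (* A closed term must work for every choice of variable type. *)
    type closed = { term : 'v. 'v term }

    (* Each analysis instantiates 'v as it needs; counting variable
       occurrences needs only 'v = unit. *)
    let rec count_vars : unit term -> int = function
      | Var () -> 1
      | Lam f -> count_vars (f ())
      | App (a, b) -> count_vars a + count_vars b

    (* The S combinator: fun x y z -> x z (y z). *)
    let s = { term =
      Lam (fun x -> Lam (fun y -> Lam (fun z ->
        App (App (Var x, Var z), App (Var y, Var z))))) }

    let () = Printf.printf "%d\n" (count_vars s.term)  (* prints 4 *)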

I was aware of this some months ago, but held off commenting on it at Adam's request until the paper had been accepted for publication, which it now apparently has been. This is (one element of) Adam's continued work on LambdaTamer, his Coq-based environment for building certified compilers. There is a new version of LambdaTamer using this parametric higher-order abstract syntax approach; unlike the previous version, it also works in current and future versions of Coq. Finally, Adam is apparently working on a paper on "type-theoretic denotational semantics for higher order, impure object languages" and is doing a post-doc with Greg Morrisett. The relationship between Adam's work and the YNot project is a bit unclear to me; perhaps either Adam or Greg could help clarify that.

Update: Whoops. I got ahead of myself and neglected to notice that the paper is not actually yet available, although the new version of LambdaTamer is. So at the moment, this story is merely to note that the paper exists and to provide a link to the new LambdaTamer. My apologies to Adam and the LtU readership.

2nd Update: The paper is now available at the link, in either PostScript or PDF form.

Automatic Generation of Peephole Superoptimizers

Automatic Generation of Peephole Superoptimizers, Sorav Bansal and Alex Aiken, ASPLOS 2006.

Peephole optimizers are typically constructed using human-written pattern matching rules, an approach that requires expertise and time, as well as being less than systematic at exploiting all opportunities for optimization. We explore fully automatic construction of peephole optimizers using brute force superoptimization. While the optimizations discovered by our automatic system may be less general than human-written counterparts, our approach has the potential to automatically learn a database of thousands to millions of optimizations, in contrast to the hundreds found in current peephole optimizers. We show experimentally that our optimizer is able to exploit performance opportunities not found by existing compilers; in particular, we show speedups from 1.7 to a factor of 10 on some compute intensive kernels over a conventional optimizing compiler.

It's always fun to see a method that ought to be intractable rendered tractable through a combination of cleverness and strategically applied brute force.
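To give a feel for the brute-force core -- my own toy, omitting the paper's harvesting, canonicalization, and fingerprinting machinery -- enumerate every short sequence over a tiny ISA and keep those that agree with a target on a set of test inputs; survivors would then face a real equivalence check:

    (* Toy brute-force superoptimization over a one-register,
       four-instruction "ISA". Candidates passing the test vectors would,
       in the real system, still undergo a formal equivalence check. *)
    type insn = Neg | Not | Inc | Dec

    let eval i x =
      match i with Neg -> -x | Not -> lnot x | Inc -> x + 1 | Dec -> x - 1

    let run prog x = List.fold_left (fun acc i -> eval i acc) x prog

    let show = function
      | Neg -> "neg" | Not -> "not" | Inc -> "inc" | Dec -> "dec"

    (* All instruction sequences of exactly length [n]. *)
    let rec progs n =
      if n = 0 then [ [] ]
      else
        List.concat_map
          (fun p -> List.map (fun i -> i :: p) [ Neg; Not; Inc; Dec ])
          (progs (n - 1))

    let tests = [ 0; 1; -1; 42; max_int; min_int ]

    (* Search length-2 sequences equivalent to negation on the tests;
       this rediscovers identities such as  neg x = inc (not x). *)
    let () =
      progs 2
      |> List.filter (fun p -> List.for_all (fun x -> run p x = -x) tests)
      |> List.iter (fun p ->
             print_endline (String.concat "; " (List.map show p)))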

I found their performance measurements suggestive but perplexing. The results on their kernels are astoundingly good, which they claim is because they make use of the x86's SIMD instructions. But there is very little change on the SPEC benchmarks, and they subsequently suggest that their peephole optimizer catches many things that other compilers get via dataflow optimization. As a result, I don't feel I have a clear picture of what their optimizer does and doesn't do, or where it does and doesn't overlap with existing optimizations. But they definitely have enough to convince me that this is worth further study, so I really hope Bansal and Aiken publish more about this -- a tech report or journal paper with more systematic measurements and in-depth analysis would be really illuminating.
