A Verified Compiler for an Impure Functional Language

A Verified Compiler for an Impure Functional Language, Adam Chlipala.

We present a verified compiler to an idealized assembly language from a small, untyped functional language with mutable references and exceptions. The compiler is programmed in the Coq proof assistant and has a proof of total correctness with respect to big-step operational semantics for the source and target languages. Compilation is staged and includes standard phases like translation to continuation-passing style and closure conversion, as well as a common subexpression elimination optimization. In this work, our focus has been on discovering and using techniques that make our proofs easy to engineer and maintain. While most programming language work with proof assistants uses very manual proof styles, all of our proofs are implemented as adaptive programs in Coq's tactic language, making it possible to reuse proofs unchanged as new language features are added.

In this paper, we focus especially on phases of compilation that rearrange the structure of syntax with nested variable binders. That aspect has been a key challenge area in past compiler verification projects, with much more effort expended in the statement and proof of binder-related lemmas than is found in standard pencil-and-paper proofs. We show how to exploit the representation technique of parametric higher-order abstract syntax to avoid the need to prove any of the usual lemmas about binder manipulation, often leading to proofs that are actually shorter than their pencil-and-paper analogues. Our strategy is based on a new approach to encoding operational semantics which delegates all concerns about substitution to the meta language, without using features incompatible with general purpose type theories like Coq's logic.

Further work on/with LambdaTamer for certified compiler development.
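To give a flavor of the representation technique the abstract mentions, here is a rough Haskell transcription of parametric higher-order abstract syntax (the paper's development is in Coq, and the names below are mine, not the paper's): binders are meta-language functions, and terms stay parametric in the variable type so they cannot inspect variables.

    {-# LANGUAGE RankNTypes #-}

    data Term v
      = Var v
      | App (Term v) (Term v)
      | Lam (v -> Term v)

    -- A closed term works for every choice of variable type.
    type ClosedTerm = forall v. Term v

    -- (\x -> x) (\y -> y), with no concrete representation of variables at all.
    example :: ClosedTerm
    example = App (Lam Var) (Lam Var)

    -- Substitution is inherited from the meta language: instantiate the
    -- variable type at Term v, so a variable can literally hold the term
    -- that replaces it.
    squash :: Term (Term v) -> Term v
    squash (Var t)   = t
    squash (App f a) = App (squash f) (squash a)
    squash (Lam f)   = Lam (squash . f . Var)

    -- Syntactic analyses instantiate the variable type as convenient;
    -- here () is enough to count binders.
    countLams :: ClosedTerm -> Int
    countLams t = go t
      where
        go :: Term () -> Int
        go (Var _)   = 0
        go (App f a) = go f + go a
        go (Lam f)   = 1 + go (f ())

Because substitution is delegated to the meta language in this way, none of the usual lemmas about capture-avoiding substitution need to be stated, which is exactly the point the paper makes for its Coq proofs.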

Certified Programming With Dependent Types Goes Beta

Certified Programming With Dependent Types

From the introduction:

We would all like to have programs check that our programs are correct. Due in no small part to some bold but unfulfilled promises in the history of computer science, today most people who write software, practitioners and academics alike, assume that the costs of formal program verification outweigh the benefits. The purpose of this book is to convince you that the technology of program verification is mature enough today that it makes sense to use it in a support role in many kinds of research projects in computer science. Beyond the convincing, I also want to provide a handbook on practical engineering of certified programs with the Coq proof assistant.

This is the best Coq tutorial that I know of, partly for being comprehensive and partly for taking a very different tack than most, with Adam's emphasis on proof automation using Coq's Ltac tactic language. It provides an invaluable education toward understanding what's going on in either LambdaTamer or Ynot, both of which are important projects in their own right.

Please note that Adam is explicitly requesting feedback on this work.

Why Normalization Failed to Become the Ultimate Guide for Database Designers?

While trying to find marshall's claim that Alberto Mendelzon says the universal relation is an idea re-invented once every 3 years (and later finding a quote by Jeffrey Ullman that the universal relation is re-invented 3 times a year), I stumbled across a very provocative rant by a researcher/practitioner: Why Normalization Failed to Become the Ultimate Guide for Database Designers? by Martin Fotache. It shares an interesting wealth of experience and knowledge about logical design. The author is obviously well-read and, unlike the usual debates I've seen on this topic, presents the argument thoroughly and comprehensively.

The abstract is:

With an impressive theoretical foundation, normalization was supposed to bring rigor and relevance into such a slippery domain as database design. Almost every database textbook treats normalization to a certain extent, usually suggesting that the topic is so clear and consolidated that it does not deserve deeper discussion. But the reality is completely different. After more than three decades, normalization not only has lost much of its interest in the research papers, but is also still looking for practitioners to apply it effectively. Despite the vast amount of database literature, comprehensive books illustrating the application of normalization to effective real-world applications are still awaited. This paper reflects the point of view of an Information Systems academic who incidentally has been, for almost twenty years, a practitioner in developing database applications. It outlines the main weaknesses of normalization and offers some explanations for the failure of such a generous framework to become the much-needed universal guide for database designers. Practitioners might be interested in finding out (or confirming) some of the normalization misformulations, misinterpretations, inconsistencies and fallacies. Theorists could find useful the presentation of some issues where normalization theory has proved to be inadequate, not relevant, or a source of confusion.

The body of the paper presents an explanation for why practitioners have rejected normalization. The author also shares his opinion on potentially underexplored ideas, drawing from an obviously well-researched depth of knowledge. In recent years, some researchers, such as Microsoft's Pat Helland, have even said Normalization is for sissies (and later developed the point in more formal publications, such as advocating that we should be Building on Quicksand). Yet the PLT community is pushing for the exact opposite. Language theory is firmly rooted in formal grammars and proven-correct 'tricks' for manipulating and using those formal grammars; it does no good to define a language if it does not have mathematical properties ensuring reliability and repeatability of results. This represents a real tension between systems theory and PLT.

I realize this paper focuses on methodologies for creating model primitives, comparing mathematical frameworks to frameworks guided by intuition and then mapped to mathematical notions (relations in the relational model), and some may not see it as PLT. Others, such as Date, closely relate an understanding of primitives to PLT: Date claims the SQL language is to blame and has gone to the length of creating a teaching language, Tutorial D, to teach relational theory. In my experience, nothing seems to affect the lines of code in an enterprise system more than schema design, both in the data layer and the logic layer, and often an inverse relationship exists between the two; hence the use of object-relational mapping layers to consolidate the inevitable problems that arise when there are The Many Forms of a Single Fact (Kent, 1988). Mapping stabilizes the problem domain by labeling correspondences between all the possible unique structures. I refer to this among friends and coworkers as the N+1 Schema Problem, since there is generally 1 schema thought to be canonical, either extensionally or intensionally, and N other versions of that schema.

Question: Should interactive programming languages aid practitioners in reasoning about their bad data models (hand waving), perhaps by modeling each unique structure and explaining how they relate? I could see several reasons why that would be a bad idea, but as the above paper suggests, math is not always the best indicator of what practitioners will adopt. In many ways this seems to be the spirit of the idea behind work such as Stephen Kell's interest in approaching modularity by supporting evolutionary compatibility between APIs (source texts) and ABIs (binaries), as covered in his Onward! paper, The Mythical Matched Modules: Overcoming the Tyranny of Inflexible Software Construction. Similar ideas have been present in middleware systems for years, known as wrapper architectures (e.g., Don't Scrap It, Wrap It!), but haven't seen much PLT interest that I'm aware of; "middleware" might as well be a synonym for Kell's "integration domains" concept.

Simplicial Databases

Simplicial Databases, David I. Spivak.

In this paper, we define a category DB, called the category of simplicial databases, whose objects are databases and whose morphisms are data-preserving maps. Along the way we give a precise formulation of the category of relational databases, and prove that it is a full subcategory of DB. We also prove that limits and colimits always exist in DB and that they correspond to queries such as select, join, union, etc. One feature of our construction is that the schema of a simplicial database has a natural geometric structure: an underlying simplicial set. The geometry of a schema is a way of keeping track of relationships between distinct tables, and can be thought of as a system of foreign keys. The shape of a schema is generally intuitive (e.g. the schema for round-trip flights is a circle consisting of an edge from $A$ to $B$ and an edge from $B$ to $A$), and as such, may be useful for analyzing data. We give several applications of our approach, as well as possible advantages it has over the relational model. We also indicate some directions for further research.

This is what happens when you try to take the existence of ORDER BY and COUNT in SQL seriously. :-)

If you're puzzled by how a geometric idea like simplexes could show up here, remember that the algebraic view of simplicial sets is as presheaves on the category of finite total orders and order-preserving maps. Every finite sequence gives rise to a total order on its set of positions, and tables have rows and columns as sequences!
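For the record (this is the standard definition, not anything specific to the paper): the simplex category $\Delta$ has as objects the finite nonempty totally ordered sets $[n] = \{0 < 1 < \cdots < n\}$, $n \ge 0$, and as morphisms the order-preserving maps between them, and a simplicial set is a presheaf on it,

$$X \colon \Delta^{\mathrm{op}} \to \mathbf{Set}.$$

The underlying simplicial set of a schema is what plays this bookkeeping role for the relationships between tables in the paper.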

50 years of Advanced Programming – an Anniversary Seminar on Algol 60

The influential Algol 60 report was created 50 years ago, after a meeting in Paris held from 11 to 16 January 1960. A seminar marking the 50th anniversary of Algol 60 will be held next week:

This seminar marks the 50th anniversary of the publication of the ALGOL 60 report. Its significance and influence will be presented and discussed by a panel of distinguished speakers. Contributions for this anniversary from several of the pioneers in this field will be included. The talks will be followed by discussion - till 5pm. This is a joint meeting of the Computer Conservation Society and the BCS Advanced Programming Specialist Group. More details of the programme will be given before the event on the website of the BCS Advanced Group – see the page for the APG January event.

Date/Time: Thursday 14th January 2010, 2.30 - 5.00 p.m.

Venue: The Library, The Science Museum, Exhibition Road, South Kensington, London S.W.7.

Contributors:
Mike Woodger interviewed by Brian Wichmann
Brian Randell on "Early compilers"
Cliff Jones with a contribution from Tony Hoare
Richard Bornat on Peter Landin's early work
Cliff Jones with a contribution from Luca Cardelli
David Holdsworth speaking on revivifying the KDF9 compiler
Peter Onion speaking on practical aspects of using the Elliott 803 compiler

Session Chairmen: John Florentin and David Hartley

Physics, Topology, Logic and Computation: A Rosetta Stone

Physics, Topology, Logic and Computation: A Rosetta Stone by John C. Baez and Mike Stay, 2009.

In physics, Feynman diagrams are used to reason about quantum processes. In the 1980s, it became clear that underlying these diagrams is a powerful analogy between quantum physics and topology. Namely, a linear operator behaves very much like a 'cobordism': a manifold representing spacetime, going between two manifolds representing space. This led to a burst of work on topological quantum field theory and 'quantum topology'. But this was just the beginning: similar diagrams can be used to reason about logic, where they represent proofs, and computation, where they represent programs. With the rise of interest in quantum cryptography and quantum computation, it became clear that there is an extensive network of analogies between physics, topology, logic and computation. In this expository paper, we make some of these analogies precise using the concept of 'closed symmetric monoidal category'. We assume no prior knowledge of category theory, proof theory or computer science.

I am not sure whether this should be categorized as "Fun" instead of "Theory", given that "We assume no prior knowledge of category theory, proof theory or computer science".

At least one pair from the title (logic and computation) should ring some bells...
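If the categorical vocabulary is new, the "computation" column of the Rosetta stone can be previewed directly in Haskell (a sketch of my own, not taken from the paper): types and functions form a closed symmetric monoidal category, with pairing as the tensor, () as the unit, and the function type as the internal hom.

    -- Tensor product of morphisms: run two functions side by side.
    tensor :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)
    tensor f g (x, y) = (f x, g y)

    -- Symmetry (braiding): the tensor doesn't care about order.
    swap :: (a, b) -> (b, a)
    swap (x, y) = (y, x)

    -- Left unit isomorphism: () is the unit of the tensor.
    lunit :: ((), a) -> a
    lunit ((), x) = x

    -- The closed structure: morphisms (a, b) -> c correspond exactly to
    -- morphisms a -> (b -> c), witnessed by currying and uncurrying.
    closedTo :: ((a, b) -> c) -> a -> (b -> c)
    closedTo = curry

    closedFrom :: (a -> (b -> c)) -> (a, b) -> c
    closedFrom = uncurry

The paper's point is that the same structure shows up as cobordisms in topology, proofs in logic, and processes in physics.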

ParaSail, a new language oriented toward parallelism and verification

For those who enjoy seeing sausages made, as opposed to just eating them, we have started a blog following the design of a new programming language called "ParaSail" for Parallel Specification and Implementation Language (not to be confused with SAIL or MAINSAIL, languages originating in the Stanford AI Lab):

http://parasail-programming-language.blogspot.com/

ParaSail doesn't exist yet, but the blog talks about it in what my colleagues call the "software present tense," presuming some day it might. Postings are about once a week, though it varies a lot. As with most blogs, the most recent entry is presented first, but it is probably wise to read the first few entries as they give a bit of the background, before reading the more current entries.

ParaSail is intended for high-criticality software. The things that might make ParaSail of interest are its implicit parallelism, its integrated support for annotations (preconditions, postconditions, assertions, constraints, etc.), the fact that all modules are parameterized (i.e. are essentially generic templates), and that there are only two kinds of modules, interfaces and classes (which implement interfaces). A type is produced by instantiating an interface with appropriate parameters. The two kinds of modules are intended to cover the entire spectrum of grouping constructs, from packages/modules at the high end, to simple types at the low end.

ParaSail clearly inherits plenty of ideas from other languages, such as Modula-2/3, ML/CAML/OCAML, CLU, Eiffel, C++, Java, Ada, etc. The implicit parallelism and the integrated annotations were some of the reasons to justify starting from scratch, but also there is the pleasure of now and then starting with a clean sheet of paper.

In any case, it is inherently a work in progress, but comments and questions are certainly welcomed.

-Tucker Taft

The year in review, and What's to come

Naturally, in this day and age, the path to understanding the past and the future goes through Twitter.

Now, to make this more exciting, here is the challenge: what we really need is a statistical analysis of the #code2009 and #code2010 streams (and in particular, their differences). The goal is to post code that does this analysis in the most elegant and succinct way; naturally, using the languages du jour earns bonus points.
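For what it's worth, here is a rough Haskell sketch of the sort of analysis meant, assuming the two streams have already been dumped to local files, one tweet per line (the file names are hypothetical):

    import Data.Char (isAlphaNum, toLower)
    import Data.List (sortBy)
    import Data.Ord (comparing, Down (..))
    import qualified Data.Map.Strict as M

    -- Split a tweet into lower-cased tokens, keeping '#' so hashtags survive.
    tokens :: String -> [String]
    tokens = words . map keep
      where keep c = if isAlphaNum c || c == '#' then toLower c else ' '

    -- Token frequencies for a whole stream.
    freqs :: String -> M.Map String Int
    freqs = M.fromListWith (+) . map (\w -> (w, 1)) . tokens

    main :: IO ()
    main = do
      f2009 <- freqs <$> readFile "code2009.txt"   -- hypothetical dump of #code2009
      f2010 <- freqs <$> readFile "code2010.txt"   -- hypothetical dump of #code2010
      -- Signed change in count per token; the largest absolute changes are
      -- the most interesting differences between the two years.
      let delta = M.unionWith (+) (M.map negate f2009) f2010
          top   = take 20 (sortBy (comparing (Down . abs . snd)) (M.toList delta))
      mapM_ print top

Consider it a baseline to beat on elegance and succinctness, not an entry.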

Functional Pearl: Implicit Configurations —or, Type Classes Reflect the Values of Types

Functional Pearl: Implicit Configurations —or, Type Classes Reflect the Values of Types, by Oleg Kiselyov and Chung-chieh Shan:

The configurations problem is to propagate run-time preferences throughout a program, allowing multiple concurrent configuration sets to coexist safely under statically guaranteed separation. This problem is common in all software systems, but particularly acute in Haskell, where currently the most popular solution relies on unsafe operations and compiler pragmas.

We solve the configurations problem in Haskell using only stable and widely implemented language features like the type-class system. In our approach, a term expression can refer to run-time configuration parameters as if they were compile-time constants in global scope. Besides supporting such intuitive term notation and statically guaranteeing separation, our solution also helps improve the program’s performance by transparently dispatching to specialized code at run-time. We can propagate any type of configuration data—numbers, strings, IO actions, polymorphic functions, closures, and abstract data types. No previous approach to propagating configurations implicitly in any language provides the same static separation guarantees.

The enabling technique behind our solution is to propagate values via types, with the help of polymorphic recursion and higher-rank polymorphism. The technique essentially emulates local type-class instance declarations while preserving coherence. Configuration parameters are propagated throughout the code implicitly as part of type inference rather than explicitly by the programmer. Our technique can be regarded as a portable, coherent, and intuitive alternative to implicit parameters. It motivates adding local instances to Haskell, with a restriction that salvages principal types.

This paper has been mentioned in threads quite a few times, but never had its own story. Of particular interest is the discussion of how to safely support local type class instances where Haskell only supports global instances.
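As a taste of the technique, here is a condensed sketch of my own (using a unary rather than the paper's binary type-level encoding, so names and details differ from the actual paper): a run-time number is reified as a type and handed to a rank-2 polymorphic continuation, and a type class reads the value back wherever that type is in scope.

    {-# LANGUAGE EmptyDataDecls, RankNTypes, ScopedTypeVariables #-}

    data Proxy n = Proxy
    data Zero
    data Succ n

    -- Reading a type-level numeral back as a value.
    class ReflectNat n where
      reflectNat :: Proxy n -> Integer

    instance ReflectNat Zero where
      reflectNat _ = 0

    instance ReflectNat n => ReflectNat (Succ n) where
      reflectNat _ = 1 + reflectNat (Proxy :: Proxy n)

    -- Turn a run-time Integer into a type; the higher-rank type keeps
    -- distinct reified values statically separated.
    reifyNat :: Integer -> (forall n. ReflectNat n => Proxy n -> w) -> w
    reifyNat i k
      | i <= 0    = k (Proxy :: Proxy Zero)
      | otherwise = reifyNat (i - 1) (\(_ :: Proxy n) -> k (Proxy :: Proxy (Succ n)))

    -- Example "configuration": modular arithmetic whose modulus lives in the
    -- type, so numbers with different moduli cannot be mixed up.
    newtype M n = M Integer

    addM :: forall n. ReflectNat n => M n -> M n -> M n
    addM (M a) (M b) = M ((a + b) `mod` reflectNat (Proxy :: Proxy n))

    demo :: Integer
    demo = reifyNat 7 (\(_ :: Proxy n) ->
             case addM (M 5 :: M n) (M 6) of M r -> r)   -- (5 + 6) mod 7 = 4

The paper goes much further (binary encodings for efficiency, propagating arbitrary data including functions and abstract types, and the coherence argument), but this is the shape of the trick.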

Holiday Fun: How Programming Language Fanboys See Each Other's Languages

Perhaps I am a bit dense, but I find this only mildly amusing, not ROFL material. Still, it is amusing enough to share at this time of year.

Happy holidays!