Vellvm: Formalizing the LLVM Intermediate Representation for Verified Program Transformations

Vellvm: Formalizing the LLVM Intermediate Representation for Verified Program Transformations by Jianzhou Zhao, Santosh Nagarakatte, Milo M. K. Martin, and Steve Zdancewic, POPL 2012

This paper presents Vellvm (verified LLVM), a framework for reasoning about programs expressed in LLVM's intermediate representation and transformations that operate on it. Vellvm provides a mechanized formal semantics of LLVM's intermediate representation, its type system, and properties of its SSA form. The framework is built using the Coq interactive theorem prover. It includes multiple operational semantics and proves relations among them to facilitate different reasoning styles and proof techniques.

To validate Vellvm's design, we extract an interpreter from the Coq formal semantics that can execute programs from the LLVM test suite and thus be compared against the LLVM reference implementations. To demonstrate Vellvm's practicality, we formalize and verify a previously proposed transformation that hardens C programs against spatial memory safety violations. Vellvm's tools allow us to extract a new, verified implementation of the transformation pass that plugs into the real LLVM infrastructure; its performance is competitive with the non-verified, ad-hoc original.

This obviously represents huge progress in marrying the theoretical benefits of tools like Coq with the practical benefits of tools like LLVM. We can only hope that it spurs further work on practical certified software development.

Beyond pure Prolog: Power and danger

One of the sections of Oleg Kiselyov's Prolog and Logic Programming page, on Beyond pure Prolog: power and danger, points out that (i) term introspection (in the guise of the var/1 predicate) can be derived from three of Prolog's imperative features, two of which are quite mild-looking, and that (ii) this introspection potentially makes Prolog code hard to understand.

Oleg wrote this note in response to my defence of cut; it is short, sweet, and well-argued.

Deca, an LtU-friendly bare metal systems programming language

The Deca programming language is "a language designed to provide the advanced features of sophisticated, high-level programming languages while still programming as close as possible to the bare metal. It brings in the functional, object-oriented, and generic programming paradigms without requiring a garbage collector or a threading system, so programmers really only pay in performance for the features they use." The latter link provides a list of features that Deca does, will, and won't provide. Features provided include type inference, universally- and existentially-quantified types, and "a strong region-and-effect system that prohibits unsafe escaping pointers and double-free errors".

The Deca language and ideas behind it are documented in a thesis, The design and implementation of a modern systems programming language (PDF):

Low-level systems programming has remained one of the most consistently difficult tasks in software engineering, since systems programmers must routinely deal with details that programming-language and systems researchers have preferred to abstract away. At least partially, the difficulty arises from not applying the state of the art in programming-languages research to systems programming. I therefore describe the design and implementation of Deca, a systems language based on modern PL principles. Deca makes use of decades of programming-languages research, particularly drawing from the state of the art in functional programming, type systems, extensible data-types and subroutines, modularity, and systems programming-languages research. I describe Deca's feature-set, examine the relevant literature, explain design decisions, and give some of the implementation details for Deca language features. I have been writing a compiler for Deca to translate it into machine code, and I describe the overall architecture of this compiler and some of its details.

The source code for the Deca compiler, decac, is available here. The compiler is implemented in Scala and generates LLVM bitcode. (The author points out in the comments below that this implementation is a work in progress.)

The author of Deca is LtU member Eli Gottlieb, who back in 2008 posted in the forum asking for feedback on his language: Practical Bits of Making a Compiler for a New Language.

There's some more discussion of Deca over at Hacker News.

Seven Myths of Formal Methods Revisited

Software Engineering with Formal Methods: The Development of a Storm Surge Barrier Control System - Seven Myths of Formal Methods Revisited (2001), by Jan Tretmans, Klaas Wijbrans, Michel Chaudron:

Bos is the software system which controls and operates the storm surge barrier in the Nieuwe Waterweg near Rotterdam. It is a complex, safety-critical system of average size, which was developed by CMG Den Haag B.V., commissioned by Rijkswaterstaat (RWS) – the Dutch Ministry of Transport, Public Works and Water Management. It was completed in October 1998 on time and within budget.

CMG used formal methods in the development of the Bos software. This paper discusses the experiences obtained from their use. Some people claim that the use of formal methods helps in developing correct and reliable software, others claim that formal methods are useless and unworkable. Some of these claims have almost become myths. A number of these myths are described and discussed in a famous article: Seven Myths of Formal Methods [Hal90]. The experiences obtained from using formal methods for the development of Bos will be discussed on the basis of this article. We will discuss to what extent these myths are true for the Bos project.

The data for this survey were collected by means of interviews with software engineers working on the Bos project. These include the project manager, designers, implementers and testers, people who participated from the beginning in 1995 until the end in 1998 as well as engineers who only participated in the implementation phase, and engineers with and without previous, large-scale software engineering experience.

This paper concentrates on the experiences of the software engineers with formal methods. These experiences, placed in the context of the seven myths, are described in section 3. This paper does not discuss technical details about the particular formal methods used or the way they were used; see [Kar97, Kar98] for these aspects. Moreover, formal methods were only one technique used in the development of Bos. The overall engineering approach and the way different methods and techniques were combined to assure the required safety-critical quality are described in [WBG98, WB98]. Testing in Bos is described in more detail in [GWT98], while [CTW99] will give a more systematic analysis of the results of the interviews with the developers.

Discussion of formal methods and verification has come up a few times here on LtU. In line with the recent discussions on the need for more empirical data in our field, this is an interesting case study on the use of formal methods. The seven myths of formal methods are reviewed in light of a real project:

  1. Myth 1: Formal methods can guarantee that software is perfect
  2. Myth 2: Formal methods are all about program proving
  3. Myth 3: Formal methods are only useful for safety-critical systems
  4. Myth 4: Formal methods require highly trained mathematicians
  5. Myth 5: Formal methods increase the cost of development
  6. Myth 6: Formal methods are unacceptable to users
  7. Myth 7: Formal methods are not used on real, large-scale software

Dependently Typed Programming based on Automated Theorem Proving

Dependently Typed Programming based on Automated Theorem Proving, by Alasdair Armstrong, Simon Foster, and Georg Struth. [Link to preprint on arXiv; i.e., this has not yet been refereed, use at your own risk].

Mella is a minimalistic dependently typed programming language and interactive theorem prover implemented in Haskell. Its main purpose is to investigate the effective integration of automated theorem provers in a pure and simple setting. Such integrations are essential for supporting program development in dependently typed languages. We integrate the equational theorem prover Waldmeister and test it on more than 800 proof goals from the TPTP library. In contrast to previous approaches, the reconstruction of Waldmeister proofs within Mella is quite robust and does not generate a significant overhead to proof search. Mella thus yields a template for integrating more expressive theorem provers in more sophisticated languages.

Coq and Agda are demonstrating that dependently-typed programming is feasible and beneficial -- but still quite painful in practice. The point of computers is that they can automate a lot of drudgery, and a lot of proofs ought to be considered drudgery as well. In practice, though, getting machines to take over that drudgery is a huge leap. The authors present an interesting experiment in a promising direction.

The LtU angle here is that current (automated) proof assistants generate proofs which, usually, have a huge impedance mismatch with the kinds of evidence that a type-checker for a dependently-typed language needs to be convinced of the validity of some user code. So there is a non-trivial engineering issue to be solved regarding the implementation of a pleasant environment for dependently-typed programming.

Cambridge Course on "Usability of Programming Languages"

From the syllabus of the Cambridge course on Usability of Programming Languages:

Compiler construction is one of the basic skills of all computer scientists, and thousands of new programming, scripting and customisation languages are created every year. Yet very few of these succeed in the market, or are well regarded by their users. This course addresses the research questions underlying the success of new programmable tools. A programming language is essentially a means of communicating between humans and computers. Traditional computer science research has studied the machine end of the communications link at great length, but there is a shortage of knowledge and research methods for understanding the human end of the link. This course provides practical research skills necessary to make advances in this essential field. The skills acquired will also be valuable for students intending to pursue research in advanced HCI, or designing evaluation studies as a part of their MPhil research project.

Is this kind of HCI based research going to lead to better languages? Or more regurgitations of languages people are already comfortable with?

CRA-W/CDC and SIGPLAN Programming Languages Mentoring Workshop

CRA-W/CDC and SIGPLAN Programming Languages Mentoring Workshop, Philadelphia, PA (co-located with POPL 2012) Tuesday January 24, 2012

We are pleased to invite students interested in programming languages research to the first PL mentoring workshop. The goal of this workshop is to introduce senior undergraduate and early graduate students to research topics in programming language theory as well as provide career mentoring advice to help them get through graduate school, land a great job, and succeed. We have recruited leaders from the programming language community to provide overviews of current research topics, and have organized panels of speakers to give students valuable advice about how to thrive in graduate school, search for a job, and cultivate habits and skills that will help them in research careers.

This workshop is part of the activities surrounding POPL, the Symposium on Principles of Programming Languages, and takes place the day before the main conference. One goal of the workshop is to make the POPL conference more accessible to newcomers and we hope that participants will stay through the entire conference.

Through the generous donation of our sponsors, we are able to provide travel scholarships to fund student participation. These travel scholarships will cover reasonable travel expenses (airfare, hotel and registration fees) for attendance at both the workshop and the POPL conference. Anyone may apply for a travel scholarship, but first priority will be given to women and under-represented minority applicants.

The workshop registration is open to all. Students with alternative sources of funding for their travel and registration fees are welcome.

APPLICATION FOR TRAVEL SUPPORT:
The travel funding application can be accessed from the workshop web site. The deadline for full consideration of funding is December 2, 2011. Selected participants will be notified starting December 9th and will need to register for the workshop by December 24th.

ORGANIZERS:
Stephanie Weirich, Kathleen Fisher and Ron Garcia

SPONSORS:
The Computing Research Association's Committee on the Status of Women (CRA-W), the Coalition to Diversify Computing (CDC), and the ACM Special Interest Group on Programming Languages (SIGPLAN).

We don't usually post conference or workshop announcements on the front page, but this seemed sufficiently new and worthy to me. Please note that the deadline for the application for travel support is only two days away!

LTL types FRP

Alan Jeffrey (to appear 2012) LTL types FRP: Linear-time Temporal Logic Propositions as Types, Proofs as Functional Reactive Programs. To be presented at next year's Programming Languages meets Program Verification (PLPV 2012).

Functional Reactive Programming (FRP) is a form of reactive programming whose model is pure functions over signals. FRP is often expressed in terms of arrows with loops, which is the type class for a Freyd category (that is a premonoidal category with a cartesian centre) equipped with a premonoidal trace. This type system suffices to define the dataflow structure of a reactive program, but does not express its temporal properties. In this paper, we show that Linear-time Temporal Logic (LTL) is a natural extension of the type system for FRP, which constrains the temporal behaviour of reactive programs. We show that a constructive LTL can be defined in a dependently typed functional language, and that reactive programs form proofs of constructive LTL properties. In particular, implication in LTL gives rise to stateless functions on streams, and the “constrains” modality gives rise to causal functions. We show that reactive programs form a partially traced monoidal category, and hence can be given as a form of arrows with loops, where the type system enforces that only decoupled functions can be looped.

Via Alan's G+ feed.
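
To make the abstract's distinction between stateless and causal stream functions concrete, here is a minimal sketch in plain OCaml rather than the paper's dependently typed setting; the names stream, stateless, and causal are illustrative and not taken from the paper.

    (* Infinite streams of samples, one per time step. *)
    type 'a stream = Cons of 'a * (unit -> 'a stream)

    (* A "stateless" stream function applies a pure function at every instant,
       so the output at time t depends only on the input at time t.  This is
       the reading the abstract attaches to LTL implication. *)
    let rec stateless (f : 'a -> 'b) (Cons (x, xs)) : 'b stream =
      Cons (f x, fun () -> stateless f (xs ()))

    (* A "causal" stream function may also depend on past inputs, here through
       an accumulator, but never on future ones. *)
    let rec causal (step : 's -> 'a -> 's * 'b) (s : 's) (Cons (x, xs)) : 'b stream =
      let s', y = step s x in
      Cons (y, fun () -> causal step s' (xs ()))

    (* Example: a running sum is causal but not stateless. *)
    let running_sum : int stream -> int stream =
      causal (fun acc x -> let acc' = acc + x in (acc', acc')) 0

The point of the LTL typing in the paper is to enforce this causality, and the stronger decoupling required for feedback loops, at the type level rather than by convention.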

Extensible Programming with First-Class Cases

Extensible Programming with First-Class Cases, by Matthias Blume, Umut A. Acar, and Wonseok Chae:

We present language mechanisms for polymorphic, extensible records and their exact dual, polymorphic sums with extensible first-class cases. These features make it possible to easily extend existing code with new cases. In fact, such extensions do not require any changes to code that adheres to a particular programming style. Using that style, individual extensions can be written independently and later be composed to form larger components. These language mechanisms provide a solution to the expression problem.

We study the proposed mechanisms in the context of an implicitly typed, purely functional language PolyR. We give a type system for the language and provide rules for a 2-phase transformation: first into an explicitly typed λ-calculus with record polymorphism, and finally to efficient index-passing code. The first phase eliminates sums and cases by taking advantage of the duality with records.

We implement a version of PolyR extended with imperative features and pattern matching—we call this language MLPolyR. Programs in MLPolyR require no type annotations—the implementation employs a reconstruction algorithm to infer all types. The compiler generates machine code (currently for PowerPC) and optimizes the representation of sums by eliminating closures generated by the dual construction.

This is an elegant solution to the expression problem for languages with pattern matching. This paper was posted twice in LtU comments, but it definitely deserves its own story. Previous solutions to the expression problem, like Garrigue's use of recursion and polymorphic variants, are rather more involved because those languages lack the support for extensible records that makes this solution so elegant.
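
For contrast, here is a small sketch in OCaml, not MLPolyR, of the polymorphic-variant route just mentioned: extensible sums built with open recursion. It illustrates the kind of extension at stake rather than the paper's first-class cases; names such as expr0 and eval0 are made up for the example.

    (* Base language: integer literals and addition.  The recursive positions
       are a type parameter so that the sum can be extended later. *)
    type 'a expr0 = [ `Lit of int | `Add of 'a * 'a ]

    let eval0 (eval : 'a -> int) : 'a expr0 -> int = function
      | `Lit n -> n
      | `Add (a, b) -> eval a + eval b

    (* Extension: negation, added without modifying the code above.  The new
       evaluator reuses eval0 for the old cases. *)
    type 'a expr1 = [ 'a expr0 | `Neg of 'a ]

    let eval1 (eval : 'a -> int) : 'a expr1 -> int = function
      | #expr0 as e -> eval0 eval e
      | `Neg a -> - (eval a)

    (* Tie the recursive knot for the extended language. *)
    let rec eval e = eval1 eval e

    let example = eval (`Add (`Lit 1, `Neg (`Lit 2)))   (* evaluates to -1 *)

MLPolyR gets the same modularity more directly: individual match cases are first-class values that can be written independently and composed after the fact.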

Extensible records and first-class cases unify object-oriented and functional paradigms on a deeper level, since they enable first-class messages to be directly encoded. Add a sensible system for dynamics, and I argue you have most of the power people claim of dynamic languages without sacrificing the safety of static typing.

Foundations of Inference

Foundations of Inference, Kevin H. Knuth, John Skilling, arXiv:1008.4831v1 [math.PR]

We present a foundation for inference that unites and significantly extends the approaches of Kolmogorov and Cox. Our approach is based on quantifying finite lattices of logical statements in a way that satisfies general lattice symmetries. With other applications in mind, our derivations assume minimal symmetries, relying on neither complementarity nor continuity or differentiability. Each relevant symmetry corresponds to an axiom of quantification, and these axioms are used to derive a unique set of rules governing quantification of the lattice. These rules form the familiar probability calculus. We also derive a unique quantification of divergence and information. Taken together these results form a simple and clear foundation for the quantification of inference.
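
For reference, the "familiar probability calculus" that the abstract says falls out of the lattice symmetries consists, in standard notation rather than the paper's lattice-theoretic one, of the sum and product rules (with Bayes' theorem as an immediate corollary), and the derived divergence is, up to the conventions in the paper, the Kullback-Leibler form:

    p(x \lor y \mid t) = p(x \mid t) + p(y \mid t) - p(x \land y \mid t)
    p(x \land y \mid t) = p(x \mid t)\, p(y \mid x \land t)
    D(p \,\|\, q) = \sum_i p_i \log (p_i / q_i)

The paper's contribution is that these rules are forced by the symmetry axioms rather than postulated.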

For those of us who find ourselves compelled by the view of probability as a generalization of logic that is isomorphic to (algorithmic, as if there were any other kind) information theory, here is some recent i-dotting and t-crossing. The connection to Curry-Howard or, if you prefer, Krivine's classical realizability is something I hope to explore in the near future.