Validating LR(1) parsers
An LR(1) parser is a finite-state automaton, equipped with a stack, which uses a combination of its current state and one lookahead symbol in order to determine which action to perform next. We present a validator which, when applied to a context-free grammar G and an automaton A, checks that A and G agree. Validating the parser provides the correctness guarantees required by verified compilers and other high-assurance software that involves parsing. The validation process is independent of which technique was used to construct A. The validator is implemented and proved correct using the Coq proof assistant. As an application, we build a formally-verified parser for the C99 language.
I've always been somewhat frustrated, while studying verified compiler technology, that the scope of the effort has generally been limited to ensuring that the AST and the generated code mean the same thing, as important as that obviously is. Not enough attention has been paid, IMHO, to other compiler phases. Parsing: The Solved Problem That Isn't does a good job of illuminating some of the conceptual issues that arise when we take parsers seriously as functions that we would like to compose (among other things) while preserving properties that hold of the individual parsers. Perhaps this work can shed some light on possible solutions to some of those issues, in addition to being worthwhile in its own right. Note the pleasing presence of an actual implementation that has been used on the parser of a real-world language, C99.
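To make the abstract's description of an LR(1) parser concrete ("a finite-state automaton, equipped with a stack", choosing each action from the current state plus one lookahead symbol), here is a minimal Python sketch of a table-driven parser for a hypothetical two-rule grammar, with hand-built tables. The `check_reductions` function is only a loose illustration of what checking that an automaton and a grammar "agree" might involve; it is not the paper's Coq validator, which establishes considerably stronger guarantees.

```python
# Hypothetical toy grammar: E -> E + n | n, augmented with S' -> E.
GRAMMAR = {
    1: ("E", ("E", "+", "n")),
    2: ("E", ("n",)),
}

# Hand-built LR(1) tables for that grammar.
# ACTION[state][lookahead] is ("shift", s), ("reduce", rule) or ("accept",).
ACTION = {
    0: {"n": ("shift", 2)},
    1: {"+": ("shift", 3), "$": ("accept",)},
    2: {"+": ("reduce", 2), "$": ("reduce", 2)},
    3: {"n": ("shift", 4)},
    4: {"+": ("reduce", 1), "$": ("reduce", 1)},
}
GOTO = {(0, "E"): 1}

# The grammar symbol on every transition into each state (state 0 has none).
INCOMING = {1: "E", 2: "n", 3: "+", 4: "n"}


def parse(tokens):
    """Drive the automaton over (kind, value) tokens and return the sum."""
    tokens = list(tokens) + [("$", None)]
    states, values, pos = [0], [], 0
    while True:
        kind, value = tokens[pos]
        # The next action depends only on the current state and one lookahead.
        action = ACTION[states[-1]].get(kind)
        if action is None:
            raise SyntaxError(f"unexpected {kind!r} at token {pos}")
        if action[0] == "shift":
            states.append(action[1])
            values.append(value)
            pos += 1
        elif action[0] == "reduce":
            lhs, rhs = GRAMMAR[action[1]]
            args = values[-len(rhs):]
            del states[-len(rhs):]
            del values[-len(rhs):]
            # Semantic action: E -> E + n adds; E -> n passes n through.
            values.append(args[0] + args[2] if len(rhs) == 3 else args[0])
            states.append(GOTO[(states[-1], lhs)])
        else:  # accept
            return values[-1]


def check_reductions():
    """A much-simplified table/grammar consistency check: every reduce must
    name a real rule whose last right-hand-side symbol is the symbol that
    led into the reducing state."""
    for state, row in ACTION.items():
        for action in row.values():
            if action[0] == "reduce":
                if action[1] not in GRAMMAR:
                    return False
                _, rhs = GRAMMAR[action[1]]
                if INCOMING.get(state) != rhs[-1]:
                    return False
    return True
```

For example, `parse([("n", 1), ("+", None), ("n", 2)])` shifts, reduces and finally accepts with the value 3, while an input starting with `+` has no entry in `ACTION[0]` and is rejected.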
Tool Demo: Scala-Virtualized
This paper describes Scala-Virtualized, which extends the Scala language and compiler with a small number of features that enable combining the benefits of shallow and deep embeddings of DSLs. We demonstrate our approach by showing how to embed three different domain-specific languages in Scala. Moreover, we summarize how others have been using our extended compiler in their own research and teaching. Supporting artifacts of our tool include web-based tutorials, nightly builds, and an Eclipse update site hosting an up-to-date version of the Scala IDE for Eclipse based on the Virtualized Scala compiler and standard library.
Scala has always had quite a good EDSL story thanks to implicits, dot- and paren-inference, and methods-as-operators. Lately there have been proposals to give it both macros (in the camlp4 sense) and support for multi-stage programming. This paper goes into some depth on the foundations of the latter subject.
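The shallow/deep trade-off is easier to see in a toy. Below is a minimal Python sketch (not Scala, and not Scala-Virtualized's machinery): operator overloading lets user code look like ordinary host code, as in a shallow embedding, while actually building a syntax tree that can be interpreted in several ways, as in a deep embedding. Scala-Virtualized also makes built-in control structures such as `if` overridable by DSLs; since Python cannot redefine `if`, the `if_zero` function below is a hypothetical stand-in for such a hook. All class and function names are illustrative.

```python
from dataclasses import dataclass


class Exp:
    """DSL terms. Operators build syntax trees instead of computing values."""

    def __add__(self, other):
        return Add(self, lift(other))

    def __mul__(self, other):
        return Mul(self, lift(other))


@dataclass
class Const(Exp):
    value: int


@dataclass
class Var(Exp):
    name: str


@dataclass
class Add(Exp):
    left: Exp
    right: Exp


@dataclass
class Mul(Exp):
    left: Exp
    right: Exp


@dataclass
class IfZero(Exp):
    cond: Exp
    then: Exp
    other: Exp


def lift(x):
    """Wrap a plain host value as a DSL constant."""
    return x if isinstance(x, Exp) else Const(x)


def if_zero(c, t, e):
    """Hypothetical overridable conditional: builds an IfZero node for DSL
    terms and just picks a branch for plain host values."""
    if isinstance(c, Exp):
        return IfZero(c, lift(t), lift(e))
    return t if c == 0 else e


def evaluate(e, env):
    """One interpretation of the syntax tree: compute its value."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Add):
        return evaluate(e.left, env) + evaluate(e.right, env)
    if isinstance(e, Mul):
        return evaluate(e.left, env) * evaluate(e.right, env)
    if isinstance(e, IfZero):
        branch = e.then if evaluate(e.cond, env) == 0 else e.other
        return evaluate(branch, env)
    raise TypeError(e)


def show(e):
    """A second interpretation of the same tree: pretty-print it."""
    if isinstance(e, Const):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, (Add, Mul)):
        op = "+" if isinstance(e, Add) else "*"
        return f"({show(e.left)} {op} {show(e.right)})"
    if isinstance(e, IfZero):
        return f"(if {show(e.cond)} == 0 then {show(e.then)} else {show(e.other)})"
    raise TypeError(e)


# "User" code, written as if it were ordinary host-language code.
def poly(x):
    return x * x + 3


def f(x):
    return if_zero(x, 1, x * 2)
```

The same `poly` runs directly on integers (the shallow reading: `poly(4)` is 19) and, given `Var("x")`, produces a tree that both `evaluate` and `show` can interpret (the deep reading).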
Adding Delimited and Composable Control to a Production Programming Environment (add'l material), Matthew Flatt, Gang Yu, Robert Bruce Findler, Matthias Felleisen, ICFP 2007.
Operators for delimiting control and for capturing composable continuations litter the landscape of theoretical programming language research. Numerous papers explain their advantages, how the operators explain each other (or don’t), and other aspects of the operators’ existence. Production programming languages, however, do not support these operators, partly because their relationship to existing and demonstrably useful constructs—such as exceptions and dynamic binding—remains relatively unexplored. In this paper, we report on our effort of translating the theory of delimited and composable control into a viable implementation for a production system. The report shows how this effort involved a substantial design element, including work with a formal model, as well as significant practical exploration and engineering. The resulting version of PLT Scheme incorporates the expressive combination of delimited and composable control alongside dynamic-wind, dynamic binding, and exception handling. None of the additional operators subvert the intended benefits of existing control operators, so that programmers can freely mix and match control operators.
Another tour de force by the PLT folks. Does your language have delimited control, delimited dynamic binding, and exceptions? It's the new gold standard, and so far only Racket and O'Caml qualify (and maybe Haskell and Scala?)
Racket's implementation is additionally interesting because it achieves backwards compatibility with code written using undelimited call/cc and dynamic-wind. The authors mention that without this compatibility, a simpler solution based on control filters from the Subcontinuations paper would be possible.
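For readers new to the terminology, "delimited" and "composable" can be shown in miniature. The following is not PLT Scheme's implementation; it is a minimal Python sketch of the classic CPS encoding of Danvy and Filinski's shift/reset, where a computation is a function from a continuation to an answer. `reset` marks the delimiter; `shift` captures the continuation only up to that delimiter, and the captured continuation returns to its caller (it composes) rather than aborting, so it can be invoked more than once.

```python
def unit(v):
    """A computation that immediately passes v to its continuation."""
    return lambda k: k(v)


def bind(m, f):
    """Run m, then feed its result to f (sequencing in CPS)."""
    return lambda k: m(lambda v: f(v)(k))


def reset(m):
    """Delimiter: run m to completion under the identity continuation."""
    return lambda k: k(m(lambda v: v))


def shift(f):
    """Capture the continuation up to the nearest reset. Calling the captured
    continuation returns its result to the caller instead of aborting."""
    return lambda k: f(lambda v: unit(k(v)))(lambda v: v)


def run(m):
    return m(lambda v: v)


# 1 + reset(10 * shift(c -> c(1) + c(2)))  evaluates to  1 + (10 + 20)
example = bind(
    reset(
        bind(
            shift(lambda c: bind(c(1), lambda a: bind(c(2), lambda b: unit(a + b)))),
            lambda x: unit(10 * x),
        )
    ),
    lambda r: unit(1 + r),
)
```

Here `run(example)` gives 31: the captured continuation is "multiply by 10, up to the reset", used twice. With an undelimited, abortive call/cc, invoking a continuation would discard the pending addition instead of returning to it.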
The Deca programming language is "a language designed to provide the advanced features of sophisticated, high-level programming languages while still programming as close as possible to the bare metal. It brings in the functional, object-oriented, and generic programming paradigms without requiring a garbage collector or a threading system, so programmers really only pay in performance for the features they use." The latter link provides a list of features that Deca does, will, and won't provide. Features provided include type inference, universally- and existentially- quantified types, and "a strong region-and-effect system that prohibits unsafe escaping pointers and double-free errors".
The Deca language and ideas behind it are documented in a thesis, The design and implementation of a modern systems programming language (PDF):
Low-level systems programming has remained one of the most consistently difficult tasks in software engineering, since systems programmers must routinely deal with details that programming-language and systems researchers have preferred to abstract away. At least partially, the difficulty arises from not applying the state of the art in programming-languages research to systems programming. I therefore describe the design and implementation of Deca, a systems language based on modern PL principles. Deca makes use of decades in programming-languages research, particularly drawing from the state of the art in functional programming, type systems, extensible data-types and subroutines, modularity, and systems programming-languages research. I describe Deca's feature-set, examine the relevant literature, explain design decisions, and give some of the implementation details for Deca language features. I have been writing a compiler for Deca to translate it into machine code, and I describe the overall architecture of this compiler and some of its details.
The source code for the Deca compiler, decac, is available here. The compiler is implemented in Scala and generates LLVM bytecode. (The author points out in the comments below that this implementation is a work in progress.)
The author of Deca is LtU member Eli Gottlieb, who back in 2008 posted in the forum asking for feedback on his language: Practical Bits of Making a Compiler for a New Language.
There's some more discussion of Deca over at Hacker News.
Software Engineering with Formal Methods: The Development of a Storm Surge Barrier Control System - Seven Myths of Formal Methods Revisited (2001), by Jan Tretmans, Klaas Wijbrans, Michel Chaudron:
Bos is the software system which controls and operates the storm surge barrier in the Nieuwe Waterweg near Rotterdam. It is a complex, safety-critical system of average size, which was developed by CMG Den Haag B.V., commissioned by Rijkswaterstaat (RWS) – the Dutch Ministry of Transport, Public Works and Water Management. It was completed in October 1998 on time and within budget.
CMG used formal methods in the development of the Bos software. This paper discusses the experiences obtained from their use. Some people claim that the use of formal methods helps in developing correct and reliable software, others claim that formal methods are useless and unworkable. Some of these claims have almost become myths. A number of these myths are described and discussed in a famous article: Seven Myths of Formal Methods [Hal90]. The experiences obtained from using formal methods for the development of Bos will be discussed on the basis of this article. We will discuss to what extent these myths are true for the Bos project.
The data for this survey were collected by means of interviews with software engineers working on the Bos project. These include the project manager, designers, implementers and testers, people who participated from the beginning in 1995 until the end in 1998 as well as engineers who only participated in the implementation phase, and engineers with and without previous, large-scale software engineering experience.
This paper concentrates on the experiences of the software engineers with formal methods. These experiences, placed in the context of the seven myths, are described in section 3. This paper does not discuss technical details about the particular formal methods used or the way they were used; see [Kar97, Kar98] for these aspects. Moreover, formal methods were only one technique used in the development of Bos. The overall engineering approach and the way different methods and techniques were combined to assure the required safety-critical quality are described in [WBG98, WB98]. Testing in Bos is described in more detail in [GWT98], while [CTW99] will give a more systematic analysis of the results of the interviews with the developers.
Discussion of formal methods and verification has come up a few times here on LtU. In line with the recent discussions on the need for more empirical data in our field, this was an interesting case study on the use of formal methods. The seven myths of formal methods are reviewed in light of a real project:
- Myth 1: Formal methods can guarantee that software is perfect
- Myth 2: Formal methods are all about program proving
- Myth 3: Formal methods are only useful for safety-critical systems
- Myth 4: Formal methods require highly trained mathematicians
- Myth 5: Formal methods increase the cost of development
- Myth 6: Formal methods are unacceptable to users
- Myth 7: Formal methods are not used on real, large-scale software
Dependently Typed Programming based on Automated Theorem Proving, by Alasdair Armstrong, Simon Foster, and Georg Struth. [Link to preprint on ArXiv, a.k.a. this has not yet been refereed, use at your own risk].
Mella is a minimalistic dependently typed programming language and interactive theorem prover implemented in Haskell. Its main purpose is to investigate the effective integration of automated theorem provers in a pure and simple setting. Such integrations are essential for supporting program development in dependently typed languages. We integrate the equational theorem prover Waldmeister and test it on more than 800 proof goals from the TPTP library. In contrast to previous approaches, the reconstruction of Waldmeister proofs within Mella is quite robust and does not generate a significant overhead to proof search. Mella thus yields a template for integrating more expressive theorem provers in more sophisticated languages.
Coq and Agda are demonstrating that dependently-typed programming is feasible and beneficial, but still quite painful in practice. The point of computers is that they can automate a lot of drudgery. And a lot of proofs ought to be considered drudgery as well. But, in practice, this is a huge leap. The authors present an interesting experiment in a promising direction.
The LtU angle here is that current (automated) proof assistants generate proofs that usually have a huge impedance mismatch with the kind of evidence a type-checker for a dependently-typed language needs in order to be convinced of the validity of some user code. So there is a non-trivial engineering problem to solve in implementing a pleasant environment for dependently-typed programming.
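The reconstruction problem is easier to picture with a toy. Below is a minimal Python sketch, assuming an automated prover has already found an equational proof and reported it as a chain of single rewrite steps, each given as (axiom name, position, direction); the checker accepts the chain only if every step is an instance of an axiom. The term representation, step format and group axioms are hypothetical illustrations, not Mella's or Waldmeister's actual proof format; the impedance mismatch arises precisely when a prover's output is not this easy to re-check.

```python
# Terms: a str is a variable; a tuple (symbol, arg1, ..., argN) is an
# application, so ("a",) is a constant.


def match(pat, term, subst):
    """Extend subst so that pat instantiated by it equals term, or return None."""
    if isinstance(pat, str):
        if pat in subst:
            return subst if subst[pat] == term else None
        return {**subst, pat: term}
    if not isinstance(term, tuple) or pat[0] != term[0] or len(pat) != len(term):
        return None
    for p, t in zip(pat[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst


def subterm(term, pos):
    """Follow a path of 1-based argument indices; None if the path is invalid."""
    for i in pos:
        if not isinstance(term, tuple) or not 1 <= i < len(term):
            return None
        term = term[i]
    return term


def replace(term, pos, new):
    """Return term with the subterm at pos replaced by new."""
    if not pos:
        return new
    i = pos[0]
    return term[:i] + (replace(term[i], pos[1:], new),) + term[i + 1:]


def check_chain(axioms, terms, steps):
    """Accept terms[0] = terms[-1] only if each step (axiom, position,
    flipped) rewrites terms[k] into terms[k + 1] with one axiom instance."""
    if len(steps) != len(terms) - 1:
        return False
    for old, new, (name, pos, flipped) in zip(terms, terms[1:], steps):
        if name not in axioms:
            return False
        lhs, rhs = axioms[name]
        if flipped:
            lhs, rhs = rhs, lhs
        before, after = subterm(old, pos), subterm(new, pos)
        if before is None or after is None:
            return False
        s = match(lhs, before, {})
        if s is None or match(rhs, after, s) is None:
            return False
        if replace(old, pos, after) != new:
            return False
    return True


# Two hypothetical axioms about a binary operation "mul" with unit "e".
AXIOMS = {
    "right_id": (("mul", "X", ("e",)), "X"),
    "assoc": (("mul", ("mul", "X", "Y"), "Z"), ("mul", "X", ("mul", "Y", "Z"))),
}
```

For instance, the one-step chain from `mul(mul(a, e), b)` to `mul(a, b)` justified by `("right_id", (1,), False)` is accepted, while the same chain justified by associativity at the root is rejected.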
A Monadic Framework for Delimited Continuations (PDF), R. Kent Dybvig, Simon Peyton Jones, Amr Sabry. TR, June 2005.
Delimited continuations are more expressive than traditional abortive continuations and they apparently seem to require a framework beyond traditional continuation-passing style (CPS). We show that this is not the case: standard CPS is sufficient to explain the common control operators for delimited continuations. We demonstrate this fact and present an implementation as a Scheme library. We then investigate a typed account of delimited continuations that makes explicit where control effects can occur. This results in a monadic framework for typed and encapsulated delimited continuations which we design and implement as a Haskell library.
A fascinating paper about delimited control. I'm very much a newbie to delimited control, but this paper has been enormously helpful - despite the title. ;)
The basic idea of the paper is to represent the execution context as a sequence containing prompts (control delimiters) and the (partial) continuations between prompts. This model is formalized with an operational semantics, which was insightful even though it's the first operational semantics I've studied.
The authors then present an implementation of the model in terms of call/cc in Scheme. The basic idea here is to always run user code after aborting to a context near the bottom of the stack, just above a call to an underflow function; this means that even though we use undelimited call/cc, we only ever capture our (small, partial) execution context. The whole execution context (the "metacontinuation") is maintained as a sequence data structure in a global variable (basically, a list containing prompts and Scheme continuations). The underflow function destructures the metacontinuation and executes (returns to) the partial continuations stored in it. Pushing a prompt adds a delimiter to the metacontinuation, capturing a delimited continuation splits the metacontinuation at a delimiter, and composing a continuation appends to the metacontinuation.
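The metacontinuation described above is essentially a list, and its three operations are list manipulations. Here is a minimal Python sketch of just that data structure, assuming partial continuations can be modelled as plain value-to-value functions (in the paper they are Scheme continuations captured with call/cc). The function names loosely mirror the operations in the paragraph and are hypothetical, not the paper's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    """A control delimiter in the metacontinuation."""
    tag: str


# A metacontinuation is a list of prompts and frames (value -> value
# functions standing in for partial continuations); the top is the end.


def push_prompt(mk, tag):
    return mk + [Prompt(tag)]


def push_frame(mk, frame):
    return mk + [frame]


def capture(mk, tag):
    """Split mk at the most recent prompt with this tag. Returns
    (rest, captured); the prompt itself is in neither part."""
    for i in range(len(mk) - 1, -1, -1):
        if isinstance(mk[i], Prompt) and mk[i].tag == tag:
            return mk[:i], mk[i + 1:]
    raise LookupError(f"no prompt tagged {tag!r}")


def compose(mk, captured):
    """Reinstate a captured delimited continuation on top of mk."""
    return mk + captured


def underflow(mk, value):
    """Return value through every frame, innermost first; prompts are
    transparent when a value simply returns through them."""
    for item in reversed(mk):
        if not isinstance(item, Prompt):
            value = item(value)
    return value


mk = push_frame([], lambda v: v + 1)   # frame outside the delimiter
mk = push_prompt(mk, "p")
mk = push_frame(mk, lambda v: v * 2)
mk = push_frame(mk, lambda v: v + 10)  # innermost frame
rest, k = capture(mk, "p")
```

Because every operation builds a new list, the captured segment `k` can be composed onto any context, and more than once: `underflow(k, 5)` runs only the two delimited frames, while `underflow(compose(rest, k), 5)` also runs the outer one.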
I haven't even gotten to the later parts of the paper yet, but this model and the Scheme implementation alone are worth a look.
(The paper seems to be a reworked version of A Monadic Framework for Subcontinuations, discussed previously.)
Lightweight Monadic Programming in ML
Many useful programming constructions can be expressed as monads. Examples include probabilistic modeling, functional reactive programming, parsing, and information flow tracking, not to mention effectful functionality like state and I/O. In this paper, we present a type-based rewriting algorithm to make programming with arbitrary monads as easy as using ML's built-in support for state and I/O. Developers write programs using monadic values of type M t as if they were of type t, and our algorithm inserts the necessary binds, units, and monad-to-monad morphisms so that the program type checks. Our algorithm, based on Jones' qualified types, produces principal types. But principal types are sometimes problematic: the program's semantics could depend on the choice of instantiation when more than one instantiation is valid. In such situations we are able to simplify the types to remove any ambiguity but without adversely affecting typability; thus we can accept strictly more programs. Moreover, we have proved that this simplification is efficient (linear in the number of constraints) and coherent: while our algorithm induces a particular rewriting, all related rewritings will have the same semantics. We have implemented our approach for a core functional language and applied it successfully to simple examples from the domains listed above, which are used as illustrations throughout the paper.
This is an intriguing paper, with an implementation in about 2,000 lines of OCaml. I'm especially interested in its application to probabilistic computing, yielding a result related to Kiselyov and Shan's Hansei effort, but without requiring delimited continuations (not that there's anything wrong with delimited continuations). On a theoretical level, it's nice to see such a compelling example of what can be done once types are freed from the shackle of "describing how bits are laid out in memory" (another such compelling example, IMHO, is type-directed partial evaluation, but that's literally another story).
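To show what "inserts the necessary binds, units, and monad-to-monad morphisms" produces, here is a hand-written Python sketch using a finite probability-distribution monad, in keeping with the probabilistic application mentioned above. It shows only the kind of code the rewriting yields; the type-based algorithm that decides where binds and units go is not reproduced, and the monad representation and names are illustrative assumptions, not the paper's OCaml implementation.

```python
from collections import defaultdict
from fractions import Fraction


def unit(x):
    """A distribution that yields x with probability 1."""
    return {x: Fraction(1)}


def bind(dist, f):
    """Run f on every outcome of dist and merge the weighted results."""
    out = defaultdict(Fraction)
    for x, p in dist.items():
        for y, q in f(x).items():
            out[y] += p * q
    return dict(out)


def coin():
    """A fair coin returning 0 or 1."""
    return {0: Fraction(1, 2), 1: Fraction(1, 2)}


# Source the programmer would write, treating coin () as if it were an int:
#     let x = coin () in let y = coin () in x + y
# Hand-written version of what the rewriting would insert (binds + unit):
two_coins = bind(coin(), lambda x: bind(coin(), lambda y: unit(x + y)))

# Mixing a distribution with a plain value: x + 1 becomes a bind and a unit.
shifted = bind(coin(), lambda x: unit(x + 1))
```

`two_coins` comes out as the distribution 0 with probability 1/4, 1 with probability 1/2, and 2 with probability 1/4, which is exactly the result a programmer would expect from the unannotated source.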
Andrej Bauer's blog contains the PL Zoo project, which includes the Levy language, a toy OCaml implementation of Paul Levy's call-by-push-value (CBPV).
If you're curious about CBPV, this implementation might be a nice accompaniment to the book, or simply a hands-on way to check it out.
It looks like an implementation of CBPV without sum and product types, with complex values, and without effects. I guess a more hands-on way to get to grips with CBPV would be to implement any of these missing features.
The posts are 3 years old, but I've only just noticed them. The PL Zoo project was briefly mentioned here.
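In the spirit of implementing one of the missing features, here is a minimal sketch of a tiny call-by-push-value evaluator in Python (not the Levy code from the PL Zoo, which is written in OCaml), extended with one effect, printing. The tuple encoding and tag names are hypothetical. What it illustrates is the CBPV split between values and computations: `thunk` turns a computation into a value without running it, so its effects happen only when it is forced.

```python
# Hypothetical term encoding (tuples tagged by their first element).
# Values:       ("num", n)  ("var", x)  ("thunk", M)
# Computations: ("return", V)  ("to", M, x, N)  ("force", V)
#               ("lam", x, M)  ("app", M, V)  ("add", V, W)  ("print", V, M)


def value(v, env):
    tag = v[0]
    if tag == "num":
        return v[1]
    if tag == "var":
        return env[v[1]]
    if tag == "thunk":
        return ("closure", v[1], env)  # a suspended computation plus its env
    raise ValueError(f"not a value: {v!r}")


def run(c, env, out):
    """Evaluate computation c to a terminal form, either ("returned", v) or
    ("lam-closure", x, body, env). Print effects append to out."""
    tag = c[0]
    if tag == "return":
        return ("returned", value(c[1], env))
    if tag == "to":  # M to x. N: run M, bind its returned value to x, run N
        _, m, x, n = c
        _, v = run(m, env, out)
        return run(n, {**env, x: v}, out)
    if tag == "force":
        _, body, saved = value(c[1], env)
        return run(body, saved, out)
    if tag == "lam":
        return ("lam-closure", c[1], c[2], env)
    if tag == "app":  # apply the computation M to the value V
        _, x, body, saved = run(c[1], env, out)
        return run(body, {**saved, x: value(c[2], env)}, out)
    if tag == "add":
        return ("returned", value(c[1], env) + value(c[2], env))
    if tag == "print":
        out.append(value(c[1], env))
        return run(c[2], env, out)
    raise ValueError(f"not a computation: {c!r}")


# double = thunk (λn. print n; n + n): building the thunk prints nothing.
double = ("thunk", ("lam", "n", ("print", ("var", "n"), ("add", ("var", "n"), ("var", "n")))))
program = ("to", ("return", double), "f",
           ("app", ("force", ("var", "f")), ("num", 21)))
```

Running `run(program, {}, out)` with an empty list `out` returns `("returned", 42)` and leaves `[21]` in `out`: the print inside the thunk happens only once the thunk is forced and applied.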
Mike Barnett, Manuel Fähndrich, K. Rustan M. Leino, Peter Müller, Wolfram Schulte, and Herman Venter, Specification and Verification: The Spec# Experience. Preprint of an article appearing in the June 2011 CACM.
CACM tagline: Can a programming language really help programmers write better programs?
Spec# is a programming system that facilitates the development of correct software. The Spec# language extends C# with contracts that allow programmers to express their design intent in the code. The Spec# tool suite consists of a compiler that emits run-time checks for contracts, a static program verifier that attempts to mathematically prove the correctness of programs, and an integration into the Visual Studio development environment. Spec# shows how contracts and verifiers can be integrated seamlessly into the software development process. This paper reflects on the six-year history of the Spec# project, scientific contributions it has made, remaining challenges for tools that seek to establish program correctness, and prospects of incorporating verification into everyday software engineering.
Spec# is, in some ways, quite similar to JML+ESC/Java2. But Spec# is a language rather than a set of annotations, which allows it to incorporate features such as a non-null type system and a very tight integration with the IDE.
Spec# was previously mentioned on LtU back in 2005.
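Spec# writes contracts as `requires` (precondition) and `ensures` (postcondition) clauses, and its compiler emits run-time checks for them. As a rough analogy for only that run-time half, here is a minimal Python sketch; the `contract` decorator and `ContractViolation` exception are hypothetical names, and nothing here corresponds to the static verifier, which tries to prove the same conditions before the program ever runs. The `xs is not None` test is roughly the kind of check that Spec#'s non-null types aim to rule out statically.

```python
import functools


class ContractViolation(Exception):
    """Raised when a run-time contract check fails."""


def contract(requires=None, ensures=None):
    """Hypothetical decorator: check a precondition on the arguments before
    the call and a postcondition on (result, *arguments) after it."""
    def wrap(fn):
        @functools.wraps(fn)
        def checked(*args):
            if requires is not None and not requires(*args):
                raise ContractViolation(f"precondition of {fn.__name__} failed")
            result = fn(*args)
            if ensures is not None and not ensures(result, *args):
                raise ContractViolation(f"postcondition of {fn.__name__} failed")
            return result
        return checked
    return wrap


@contract(requires=lambda xs: xs is not None and len(xs) > 0,
          ensures=lambda r, xs: r in xs and all(r >= x for x in xs))
def maximum(xs):
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best
```

`maximum([3, 1, 4])` returns 4 after both checks pass, while `maximum([])` or `maximum(None)` fails the precondition before the body runs.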