LtU Forum

Self-Certification: Bootstrapping Certified Typecheckers

Self Certification: Bootstrapping Certified Typecheckers in F* with Coq by Pierre-Yves Strub, Nikhil Swamy, Cedric Fournet, and Juan Chen. POPL'2012.

Well-established dependently-typed languages like Coq provide a highly reliable way to build and check formal proofs. Researchers have also developed several other dependently-typed programming languages such as Agda, Aura, ATS, Cayenne, Epigram, F*, F7, Fine, Guru, PCML5, Ur, etc., and more are in the works, e.g., Trellys. All these languages shine in their own regard, but they lag behind Coq in the degree of safety provided by their implementations. This paper proposes a general technique called self-certification that allows a typechecker for a suitably expressive language to be certified for correctness. We have implemented this general technique for F*, a dependently typed language on the .NET platform.

Self-certification (for F*) involves implementing a typechecker for F* in F*, while using all the conveniences F* provides for the compiler-writer (e.g., partiality, effects, implicit conversions, proof automation, libraries). This type checker is given a specification (in F*) strong enough to ensure that it computes valid typing derivations. We obtain a typing derivation for the core typechecker by running it on itself, and we export it to Coq as a type-derivation certificate. By typechecking this derivation (in Coq) and applying the F* metatheory (also mechanized in Coq), we conclude that our type checker is correct. Once certified in this manner, the F* typechecker is emancipated from Coq.

Self-certification leads to an efficient certification scheme - we no longer depend on verifying certificates in Coq - as well as a more broadly applicable one. For instance, the self-certified F* checker is suitable for use in adversarial settings where Coq is not intended for use, such as run-time certification of mobile code.

F* (Fstar) was mentioned on LtU last year. I find it exciting; F* not only certifies the program but generates efficient proof-carrying bytecode for distribution and thus has much potential for certified programs in open systems where we neither trust the provider of code nor can accept the denial-of-service vulnerabilities and resource costs for validating richly typed code locally.

The self-certification of F*, and the general methods and lessons learned, open a path for incrementally modifying the language, offering much greater freedom to language developers who might pursue languages with similarly expressive type systems.

Between self-certification, generating proof-carrying code for distribution, and interop with the .NET family, F* is seducing me onto the certified programming bandwagon.

[Meta] Are we Reddit?

At other times and other places I've had fun bashing C++. It's an easy target. But if personal opinions and sarcasm are all we're going to get out of LtU then we might as well close shop and move to Reddit.

In reading through the comments on that post I see nothing at all of what I would like to see from LtU. There's no insight into the problem space, no pointers to papers solving similar problems, not cogent critiques of the approach taken or conclusion reached in the original video or paper.

I think we can do better than that. I think we must. The Internet doesn't need anther Reddit. It needs at least one Lambda the Ultimate.

Introduction to the proof engine for static verification of software

My latest blog entry talks about the proof engine of Modern Eiffel. The blog entry is a basic introduction to the proof engine. It shows the basic inference rules, the basic commands, it introduces some axioms to reason in the propositional calculus. A lot of proofs are demonstrated using the boolean operators "and" "or" and "=>".

The introduction should be easy to follow, even for SW developers without training in proof techniques and formal mathematics. The proof engine is intended to be easy to use and understand. The introduction to the proof engine demonstrates that writing proofs is just another form of programming (manipulating the state of the proof engine with proof commands).

Those not familiar with Modern Eiffel can read be previous blog entry to understand its basic concepts.

Long rant on Erlang-style Actors: Lost Dimension

This post is follow up to my previous post about programming languages. If we apply the conceptual framework that was discussed in the post we will see that modern event-driven programs designed in OO paradigm (but not necessary in OO-language) has the following dimensions of analysis and discourse.

  1. Steps (statements)
  2. Work/evaluation (blocks/composite/statements/functions/procedures)
  3. Processes (objects, methods, events)

IMHO component-based programming adds additional dimension: intentional/technical. But since this dimension is much more subtle and not universally accepted, I leave it out of this discussion.

Every piece of the code in modern OO-applications could be reasoned in the terms of these dimensions. It could be noted, that these dimensions of discourse are not independent. Each new dimension differentiate and integrates the previous one. And we could see dimensions were developed sequentially in mainstream languages:

  1. Steps: FORTRAN, BASIC (line-based)
  2. Steps+Work: Pascal, Algol, C
  3. Steps+Work+Processes: C++, Java, ...

And modern mainstream OO-languages like Java even enforce this classification making every procedure or function belonging to some object or class. This raises some objections from functional programmers, but these objections are objected by OO programmers, that feel more comfortable when no code is left unclassified from point of view of these three dimensions.

The transitions between level 1 and level 1*2 was well documented in the essay “Go to considered harmful”. I would suggest to refresh line of reasoning used in that essay, since I would use the same line of reasoning later. I do not know the essay with similar clarity that discusses transition between 1*2 and 1*2*3.

The classical functional languages are much more tricky thing to classify. Firstly they try to suppress the first level Steps calling them non-pure, and then they introduce the first class functions, that clearly add to the process dimension of the program. So the primary focus of functional languages is level 2 with suppressed but leaking level 1 and half done level 3. The functional programming does add new “effects” dimension of discourse, but this dimension is somewhat underdeveloped, since some points on it are considered “better” then other. It is possible to write event-driven process-based programs in functional languages, it just less obvious since there less of language support.

Note that all of the above is directly describing the synchronous case. When things got to asynchronous case, the same dimensions can be applied. And here we will the same problems.

The step in the synchronous case was, modifying or reading program state, and going to the next statement to execute (either textually next, or jump). Work and processes based this step notions. Lets now consider how these dimensions map on asynchronous case:

  1. Step: In asynchronous case the step is receiving event, modifying state, and sending events.
  2. Work: The work dimension would be notion of block that has single point of entry (by events) and has a single exit (producing event). Then asynchronous blocks could be parametrised as asynchronous procedures and composed using asynchronous operators.
  3. Process: The process based decomposition is quite obvious, it is a component that has state and receives and sends events.

Note that the classical actor model provides only two of these dimensions: 1*3. There is notion of step and there is notion process (actor). And there every event is processed in the single step.

Erlang truthfully implement this model. And it also assumes single step event processing. There is also no work-based decomposition. The work dimension is lost from Erlang. It is also lost from all Erlang-style implementation of actor model.

The proof is quite simple. It is not possible to esaly apply functional composition operators to Erlang actors. And it is possible to specify which events actor receives, but it is hard to specify specify which are directly or indirectly produced by the actor. So it is asynchronous version of “go to” mess of synchronous code.

Note that there is the language that support all three dimensions. It is E programming language. This is possibly the first language that allows for structured asynchronous programming. The primary enabler for such programming is “when” operator that takes operation that returns promise as parameter and returns promise as result. This allows composing asynchronous processes. However the set of asynchronous operators in E is not complete. I have also created AsyncScala library in the same paradigm, that has a richer set of asynchronous operators. There is also AsyncGroovy library in the source repository, but it somewhat dated (it could be revived, if someone wishes to take over it).

The key differences of E-style 1*2*3 actors in contrast with with classical 1*3 actors are the following:

  1. There is a notion of asynchronous expression and asynchronous operation, and code reflects structure of assumed asynchronous operations. Usually there is a Promise object that represents outcome of composite asynchronous operation.
  2. Actors are separate from event queue. Single event queue could support many actors.
  3. Actors are lightweight.
  4. The framework is responsible to delivering events to the specific objects that represent actors. The normal asynchronous code does not have operations that explicitly retrieve operations from event queue.

Since a collection of E-style actors in one vat logically represent single Erlang-style actor, then I would like to name this model sub-actor model, in order to distinguish it from classical actor model, but possibly there is a better name for them.

Thus I claim that E-style actors are more powerful then Erlang-style because they provide explicit support for additional dimension of analysis and discourse about asynchronous program: work-based decomposition of the program. Erlang-style actors do not provide explicit support for this dimension, and this causes much of the pain related to asynchronous programming in Erlang.

And if you are designing a new programming language or a library that supports asynchronous programming, please consider adding a explicit notion of compose-able asynchronous operations to it. A good smoke test for notion of asynchonous operation would be applicability of functional composition operations to asynchronous operations. It really is not that difficult, just try not to leave out the key features mentioned above. You could use E or AsyncScala as a sample.

Update 1 (2012-02-20): fixed some typos

catamorphism.com and hylomorphism.com free to a good home.

"In category theory, the concept of catamorphism (from Greek: κατά = downwards or according to; μορφή = form or shape) denotes the unique homomorphism from an initial algebra into some other algebra. The concept has been applied to functional programming as folds.”

“In computer science, and in particular functional programming, a hylomorphism is a recursive function, corresponding to the composition of an anamorphism (which first builds a set of results; also known as 'unfolding') and a catamorphism (which then folds these results into a final return value). Fusion of these two recursive computations into a single recursive pattern then avoids building the intermediate data structure. This is a particular form of the optimizing program transformation techniques collectively known as deforestation.”

For a year or so, I have been squatting on catamorphism.com and hylomorphism.com (My thought at the time was to use them for a book I was thinking of writing, and as a bonus they wouldn’t get grabbed by some scummy domain name arbitrageur). I’m joining the throngs heading for the godaddy.com exits, and it seems like a good time to consider putting either or both of these domains to good use. So:

If you or someone you know could do something interesting with either or both of these domains, I would be pleased to transfer them. No compensation is required. It could be for a book about programming as I originally envisaged, a commercial venture like a startup, redirect to video lectures, anything at all, really. I would be delighted if it's something that gratifies one's "intellectual curiosity.”

Please mention this discussion or my blog post to folks who might have an interesting idea to suggest. Thanks in advance!

R7RS public comment period (June 30, 2012)

I'm copying this [Scheme-reports] message by Marc Feeley:

This message is being posted to various lists to inform members of the Scheme community on the development of R7RS.

I am pleased to announce that the sixth draft version of R7RS ("small" language) has been completed by working group 1 and is now available at the following URL:

http://trac.sacrideo.us/wg/raw-attachment/wiki/WikiStart/r7rs-draft-6.pdf

A copy will also be posted on schemers.org .

Other documents produced by working group 1, including previous drafts and progress reports are available at the following URL:

http://www.scheme-reports.org/2012/working-group-1.html

The editors of working groups 1 and 2, in consultation with the Scheme language steering committee, have provided a mechanism for comment and discussion. For details, including instructions on how to submit a formal comment, please see this document:

http://www.scheme-reports.org/2012/process1.html

The comment period is now open and will continue until June 30, 2012.

The steering committee thanks the editors for their intensive work on the draft R7RS, and looks forward to the public comment period on this sixth draft.

Enjoy!

For the Scheme language Steering Committee,
-- Marc Feeley

Here's an interesting wiki page that discusses criticisms of R6RS and how R7RS addresses them.

Teaching challenge: culturally enriching formulae-as-types

In a G+ request, Norman Ramsey says he wants spice up his PL survey course with a little bit of "cultural enrichment" to show my students the "propositions as types" approach to proof. He's looking for source materials, and, I take it, experiences from people who have tried to teach similar things.

Active Variables in Common Lisp

A feature I lifted from ksh93, I thought I would share with LTU. Active variables in Common Lisp; variables with callbacks on reading and writing. Possibly a useful feature that I rarely see in modern programming languages.

github/cl-active-variables

Peak Abstraction

Today I learned a new term from a blog post:

An interesting phenomena I've observed over my years as a programmer is the occurrence of "Peak Abstraction" within a young programmer - and their subsequent improvement.

There is a common pattern. It typically occurs after 3-4 years of professional practice. A programmer is growing in his confidence. He begins to tackle more complex, challenging problems. And, he reflects those problems with complex, challenging data structures. He has not yet learnt the fine art of simplicity.

Every member of every data structure is a pointer or a reference. There are no simple data types. ... Those around them become increasingly desperate. They can see the code becoming inflated, inefficient, unmaintainable.

And then it happens. The programmer realises the error of their ways. They realise that they are adding neither design value nor computational value. They realise they are adding overhead and maintenance trouble. ... And thus the recovery begins. Data structures become simple, expressive, and maintainable.

The complete blog post rags mostly on perf issues, but I'm more interested in the complexity implications. I felt myself go through this before, and to be honest the more powerful the language, the worse my peak abstraction got. It was only when moving to a less expressive language (C# rather than Scala) that I had incentive to keep it simple.

Has anyone else found themselves in an abstraction trap and come down as they grew as a programmer? What can we do in PL design to avoid, or at least discourage, overuse of abstraction? Is this a case of where less might be more?

Evolution of mainstream programming language paradigms

I have been thinking recently about what makes the mainstream language a mainstream. And I think that the mainstream programming language paradigms generally evolve using the following evolution pattern:

  1. Complexity Pain: complexity of reasoning about some aspects of the programs growth fastest with program growth. At some time amount of efforts required to reason about some piece of code becomes unreasonable.
  2. Virtual Structure: Organizing program according to virtual structure that makes reasoning about that aspect of the program easier.
  3. Explicit Structure: Develop language, that makes virtual structure explicit by introducing additional meta-level constructs, that organize constructs of the previous levels into the structure.

In paradigm for mainsteam languages we could obviously see the chain when the following level organizes the previous one.

  1. [Conditional] jumps, memory assignments,and register arithmetics (assemblers).
  2. Statements (FORTRAN).
  3. Blocks, procedures, and data types (C, Pascal).
  4. Classes (abstraction over procedures and data types) (C++)
  5. Components (abstraction over classes and interfaces that differenitates techinical and intential aspects) (Java, C#, VB.NET, VB+COM, JavaScript[only halve way here])

As we see here, the next generation always abstracted previous level, and it was exactly one level abstraction. The functional languages so far do not fit into this picture.

The LISP was not one step jump, they are many step jump. Many features of LISP got into mainstream (the latest were garbage collection and lambda expressions), but there are some things to integerate (like internal DSLs). So LIPS was a kind of peak experience for programming language paradigms.

To me Haskell and ML did not offer appropriate one-step abstraction over structured programming. ML modules were only half-step abstraction over data and procedures, since data and functions were still different concepts. There were also halfway abstraction of data and behavior in the form of lambda expressions, but it was a single-method object, but GUI toolkits (current complexity challlenge at that time) required many-methods objects for appropriate abstraction.

If we look to the sequence, the next step should be some abstraction over components, and I think it would be component systems. We could see the early component systems as libraries now (Spring, GUI toolkits, EJB3, Plugins in Eclipse and IDEA, UML2, etc).

I have written more details why I think so in this document. It also contains an initial list of requirements for the possible next generation mainstream language. I think it will be a component system programming langauge (CSPL). So the document is named CSPL Challenge. I think typing and and type-checking component systems would be a quite interesting research topic. And maybe there are already some results out there, that fit requirement exactly.

XML feed