The YNot Project

The YNot Project

The goal of the Ynot project is to make programming with dependent types practical for a modern programming language. In particular, we are extending the Coq proof assistant to make it possible to write higher-order, imperative and concurrent programs (in the style of Haskell) through a shallow embedding of Hoare Type Theory (HTT). HTT provides a clean separation between pure and effectful computations, makes it possible to formally specify and reason about effects, and is fully compositional.

This was actually commented on here, by Greg Morrisett himself. Conceptually, this seems like it's related to Adam Chlipala's A Certified Type-Preserving Compiler from Lambda Calculus to Assembly Language. See, in particular, slides 23-24 of this presentation (PDF). More generally, computation and reflection seem to be gaining recognition as important features for the practical use of such powerful tools as Coq; see also SSREFLECT tactics for Coq and their accompanying paper A Small Scale Reflection Extension for the Coq system (PDF).

It's also interesting to observe that the work appears to depend on the "Program" keyword in the forthcoming Coq 8.2, the work behind which is known as "Russell," as referred to in this thread. Russell's main page in the meantime is here. Again, the point is to simplify programming with dependent types in Coq.

Update: Some preliminary code is available from the project page.

Comment viewing options

Steps to Practicality

So, what's the roadmap here? Surely we can't expect real programmers to write code using Coq syntax. Do we define a nice mainstream-friendly language syntax, a translation into Coq for typechecking, and then a translation into some runtime form that is sound if the program successfully typechecked?

This special syntax for things with effects sucks and is counter to mainstream practice and intuition, e.g. "IO Int" and "Int->IO Int" versus "Int" and "Int->Int". We should implicitly wrap all computations in monads, e.g. have Int->Int compile to Int->M(Int) for a context-dependent monad M. And have "let x=a in bx" compile to "a >== \x->bx". Then, the syntax is regular for all kinds of effects, including none. You can use recursive monads as described in the "mdo" paper to separate out the total part of the language (the identity monad without recursion), the partial part (the recursive identity monad), and the various effectful parts -- exceptions, state threads, IO.

Also, it's untidy to force programmers to generate proofs of stuff and pass them to functions in addition to ordinary parameters, such as calling a function and passing it an integer and a proof that the integer is between 0 and n. Better to have a more expressive type system with proper subtyping, e.g. have the type of natural numbers less than n as a subtype of integers.

Re: tricking programmers into using it

It does seem likely to me that if you come at the functionality from the "you are writing proofs" angle, people will avoid it because of the connotations for "proof". On the other hand, if you just say "oh, hey, we have finer-grained types now" people might be a lot more interested. Even if in the end it was basically the same thing? Like, saying the compiler's type checker proves things for you "for free" is great, but saying you have to write proofs is just scary even though people are used to writing types.

Need to learn how to use existing type systems first

I think even very good programmers could still use training in how to get the best use of type systems in languages like C++.

Last week I encountered a bug in a large C++ app that boiled down to uncanonicalised URLs. I suspect (but to be fair, don't know for sure) that the URLs were being passed around as strings and not represented using some kind of URL type/class. If the string had been casted to a URL type as early as possible, simple object orientation would have been sufficient to prevent the bug from happening.

Dependent types are an idea that feels intuitively right to me, but I wonder how many bugs in practice could be avoided by clever (or not so clever) usage of the standard C++ type system. Can people show me examples of real-world bugs that could be prevented by something like YNot but not C++ classes and templates?

The type of mistake you

The type of mistake you describe is very common. I think it is much more fundamental than not utilizing the type system (though it is that, too, of course). It is a lack of understanding of the value of abstraction in software design (ADTs in this case).

But...

Need to learn how to use existing type systems first

But industrial practice is typically 10 to 20 years behind the forefront of research. By the time dependent types are broadly available, the average coder may just be about competent with first order polymorphic types.

Can people show me examples of real-world bugs that could be prevented by something like YNot but not C++ classes and templates?

First, let me say that talking about C++ templates is remarkably tricky. Every time somebody says "you can't do this" somebody hacks up a giant ball of really clever cruft that does "do this." :-)

Anyway, you talked about the URL encoding routine. Certainly having a URL type with carefully guarded factory and conversion routines goes a long way towards eliminating URL related bugs. I can't argue with that. But when you build those conversions you have no proof that your URLs are in in fact complying with the properties you desire. Instead you have to rely on testing or hand-derived-proofs.

You used the phrase "real world" and that's a loaded phrase. In my experience everybody lives in their own little "real world." It could be that within some domains the kinds of proofs offered by C++ are in fact sufficient from a practical/economic/security stand point. In other domains, better proofs might be necessary. Is the URL type in the former or the later? That's not a question with a simple answer.

The Challenge...

...it seems to me, is to take that which can be done today (C++ template metaprogramming, "typeful" programming in ML or Haskell), add a dash of that which can't be done today, and make it not only not excruciating to do, but actually easy and, in a decade or so, as obvious as object-orientation is today (i.e. "How else would you do it?")

On the other hand, if you

On the other hand, if you just say "oh, hey, we have finer-grained types now" people might be a lot more interested. Even if in the end it was basically the same thing?

I agree. The idea of passing proofs around in your code looks a lot like a well-founded design-by-contract system of sorts. Whenever a function's specification requires a certain input, you must show that you are respecting the contract when you use it, and whenever a function's specification requires a certain output, you must show that the function respects that contract when you write it. It sounds less "threatening" to me stated like it's design by contract that is verified at compile time.

I don't see it

Trying to sell a programmer "oh, we have finer grained types" might get people more interested up front, but it won't take long before the programmer tries to pass a value to a function that's "obviously" a subtype of the parameter type, and if the compiler complains to him about it I think the interest will quickly wane.

Obviously

I think an important measure of "obviousness" is whether there's a decision procedure for proving that the subtype is, in fact, a subtype. I think you're right that having to manually discharge too many proof obligations won't fly. But manually proving a handful of more interesting ones might not be so bad.

Still don't see it

The reason I put "obviously" in quotes is that I think there are plenty of things that the programmer will find obvious that aren't obvious by a definition like yours[*]. Many of them will probably even be true.

Compare the versions of merge sort in Why Dependent Types Matter. I can assert that mergeSort preserves length trivially with lightweight type annotations - it's apparently considerably more work to prove those assertions in Epigram. Also note that paper indicates that it's easier to rework your definitions to be amenable to proof - I don't know how much worse off you'd be if you just wrote down the algorithm and tried to prove that property.

Or this presentation discusses (p25) a compiler implementation in Coq:

4,000 line compiler:
7,000 lines of lemmas and theorems includes interpreters/models of C and PPC code much is re-usable in other contexts
17,000 lines of proof scripts though with right tactics, could at least cut in half. and keep in mind, this is a very deep property.

Given that the number of lines of proof is over double the number of lines of assertions to be proved, we either have more than a handful of things requiring manual intervention or the ones that do require proof are difficult. EDIT: And 2:1 is being very generous here, since the programmer probably only cared about a few hundred or so of those theorem lines - the others being supporting lemmas.

The main argument would seem to be, though, that even if there were just a small number of hard things to prove, what is the advantage of forcing the programmer to prove them over just advising him that they aren't yet proven?

[*] I think the quantifiers need care - for any provable subtype relation there is a decision procedure that proves it.

Nothing's stopping you from

Nothing's stopping you from underspecifying in your types! I guess there's some value in keeping "well-typed enough to run" separate from "actually does what the spec says" - that is "this code can be well-typed" vs "this code has the types asked for".

I realize nothing's

I realize nothing's *stopping* me from underspecifying my types - I just don't want the fear of an impending deathmatch with type checker to *drive* me to underspecify them. And I agree with your second sentence, and think care should be taken to make sure "well-typed enough to run" is only very lightly typed (Edit Read:decidable)

Clarification

I have to imagine that I wasn't clear: if there's a decision procedure for something, then either the user of Coq has to use it, or Coq itself has to apply it automatically somewhere along the line for it to be compelling. That is, I was just using a slightly tongue-in-cheek formulation to observe that there's much more of the process of certifying software that you want to have certified in Coq.

The presentation that you quoted is, of course, referring to the CompCert project. This is a different endeavor than "just" programming with dependent types! This involves proving the semantic equivalence of all source code/object code pairs for a C-like language, which is why Greg Morrisett rightly calls it a "deep property" in the presentation. Leroy uses co-inductive big-step operational semantics and achieves this excellent result, but I'm interested in comparing-and-contrasting with Adam Chlipala's work on dependently-typed abstract syntax and denotational semantics; see his slides here, particularly slide 15, which contains a certified CPS transformation in ~250 lines, and slide 20, which gives some statistics similar to the ones you quoted. It's worth pointing out, however, that CompCert treats a very C-like, realistic language, while so far, Chlipala's work has been confined to a relatively simple lambda language.

In both cases, of course, what one would hope to arrive at would be a collection of libraries of reusable components, highly automated, with which compiler developers could implement new languages using Coq. From that perspective, Chlipala's efforts to provide generic tools and rely on Coq's own metalogic to make most traditional sorts of proofs (e.g. of progress and preservation, necessary in operational semantics) unnecessary seem quite welcome, as do tools such as U Penn's Metatheory library and Microsoft's SSREFLECT tactics.

But all of this is relative to certifying compilers developed in Coq, which isn't necessarily the same thing as making dependently-typed programming palatable to more programmers. That I really wouldn't expect to happen within Coq, but perhaps the tools I've mentioned will help someone design and implement a language, extracted from Coq, that will succeed at that goal.

A lot can be done with a

A lot can be done with a good IDE and tactics, and I imagine a lot of programmers will be willing to let the small fry pass given a lightweight way to say "fill it in for me" if it'll get some of the more complex cases easily too.

Sounds mostly good to me

I for one like the idea of more heavily integrating Monads into the language. Having pure functions expressed in the same syntax as monadic computations seems important because it makes adding a simple effect to a pure function much easier. However, I don't think you want to hide it from the type system. Improve the syntax of monadic types, yes, but the programmer needs to be able to distinguish f :: Int -> Int from f :: Int -> IO Int.

I also agree with your comment about proof passing and subtyping, though I'm quite skeptical of requiring programmers to prove those properties at all. I think something along the lines of Sage will lead to a good practice of tightly specifying types by not annoying users of the corresponding values with a bunch of proof obligations.

(Also, I don't see how Coq would fit well as a typechecker in this situation. Are you going to automatically insert tactics? Expose tactics to the programmer?)

Good Question!

I had a very similar reaction: programming in Coq is one thing; using Coq to develop a certified compiler something else.

Or is it? I have to admit to being somewhat surprised by the modest reaction to Theorem proving support in programming language semantics. It's a very interesting exercise in reflection and extraction, i.e. you can play, on a small scale, with the language within Coq, and then extract an interpreter to play with outside of Coq. In principle, there's no reason that the interpreter couldn't be a compiler. So it seems to me that if Ynot is sufficiently modular, you could program in Coq/Ynot if you wanted and extract the end-result code to Scheme, OCaml, or Haskell; or you could extract just the semantics and code gen à la CompCert or Adam Chlipala's certified compiler, put a parser on it, and have a "different" language. That's obviously an oversimplification—the devil being in the details, and all that—but I don't think it's an outrageous goal.

At least, it's no more outrageous than pushing to develop a certified compiler for a realistic, modern programming language within a proof assistant is. :-)

Better than you think

Some things are better than you might think. In particular, with the Russell extensions to
Coq, the proofs aren't actually constructed "in-line", but rather broken out as things that
you construct separate from the code. (You have the option of building explicit proofs
in-line, but we almost never do this.) Additionally, you can code and register tactics
that attempt to discharge the verification conditions automatically. Finally, Matthieu
Sozeau is adding some really nice additional features, including support for type-
classes, that should make the code even easier to read & write.

I agree that some better external syntax would be nice, but you have to be careful here:
We need a strong separation between "pure" and "impure" in large part to ensure that
when we construct proofs, they are meaningful *and* they can be erased after type-
checking. (In contrast, languages such as Sage must rely upon other techniques to
try to ensure that the "effects" that might happen within specifications are benign.
Similarly, with attempts to embed this sort of thing in Haskell, such as Cayenne,
you run into a fundamental unsoundness since you can use exceptions or divergence
to build proofs of "False".)

Having said all of this, I don't think that the syntactic structure here is at all ideal
and that there's much room for improvement. But at this point, we want to make sure
that we can actually write real code that uses effects, and prove deep properties
about them.

We do have some really neat code -- my favorite is Avi Shinnar's implementation of
dependent hash-tables (where the value's type associated with a key is dependent
on the key). He even has higher-order functions (e.g., iterators) that can
be passed effectful computations, and yet you can still reason about the
possible effects. The interface only tells you that all of the key*value pairs
will be presented to your function, but not what order. This gives you the
freedom to change the implementation of the abstraction (e.g., to anything
that implements a finite map), but also gives you enough reasoning power that
if you want to do something like map a function over the hash-table that
inserts each key*value pair into another (initially empty) hash-table, then
you can conclude at the end that they both implement the same finite-map.

Other examples at this point include things like memoizing (dependent) functions
(where the types ensure that the memoization is equivalent to re-computing)
and parsing combinators, where the types tell you what grammar the parser
will accept.

So anyway yes, we have a long way to go before this is ready for the real
world (Unreal indeed), but Coq (really Russell) has provided a good environment
to quickly prototype the ideas.

Dependent types in Scala

my favorite is Avi Shinnar's implementation of dependent hash-tables

Miles Sabin has realised this in Shapeless for Scala.

I played with this trick a

I played with this trick a lot in Scala back when I was working on it. It also works well enough in C# with plain old generics (probably Java also), although not as nicely.

It seems the Shapeless trick

It seems the Shapeless trick requires higher-kinds, so I'm not clear on how you could use this approach with C#'s generics.

The full shapeless trick

The full shapeless trick requires higher-kinds, but if you just want a "smart" dictionary you can easily vary its element type with its key type; e.g.,

T Get[T](this Dictionary[IKey, object], IKey[T] key) { ... }


Subtyping

Also, it's untidy to force programmers to generate proofs of stuff and pass them to functions in addition to ordinary parameters, such as calling a function and passing it an integer and a proof that the integer is between 0 and n. Better to have a more expressive type system with proper subtyping, e.g. have the type of natural numbers less than n as a subtype of integers.

That's precisely what Russell allows in Coq. You write the programs as if there were no proofs involved and it does all the proof passing.

Callout

I've updated the lead story to reflect the new project URL and that there is preliminary code available.

URL for SSReflect

Please note that the URL for the SSReflect tactics for Coq is obsolete. The SSReflect tactics' latest release is now located here.