Mechanized Metatheory Model-Checking

by James Cheney (beware, PDF takes up full screen)

Based on envious observations of the success of formal methods for verifying industrial hardware designs using model-checking, I will argue that "partial" techniques which provide full automation and search for counterexamples, but which do not try to verify correctness, are likely to be more useful [for "metatheory of logics and programming languages"] on a day-to-day basis activities than full verification. I will describe an unsophisticated, yet useful, implementation of such a "model-checking" approach to mechanized metatheory implemented using nominal logic programming (although the basic idea could be employed in many other settings).

Model checking meets POPLMark. I can't tell from this presentation if there's any chance of using tools similar to BLAST to search deeper or produce actual proofs.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Philip Wadler said...

"Cool, it's like QuickCheck for type systems!" at the talk IIRC.

Wrong category, eh?

Surely not admin.

Wrong category, indeed!

Thanks, fixed. I would love it if the submission page had no default category, then complained if none were selected.


Can anybody provide a cite for the ML forall/ref typing bug referenced (or any of the other typing bugs)?

Forall/ref bug

See e.g. the section "Why the value restriction exists" on this page of the MLton Wiki. Early versions of ML with references did not have any such restriction and hence were unsound.


Was this found during the formal specification of the core of ML or prior to that. Or are refs not covered by the formally specified fragment.


Was this found during the formal specification of the core of ML or prior to that.

AFAIK it was discovered very early on. I don't know the exact details of the development, but we are speaking about something 1980-ish. The formal language specification we have today (for Standard ML) evolved in the late 80s.

Or are refs not covered by the formally specified fragment.

The formally specified "fragment" of SML contains the complete language, so refs are certainly covered. ;-)

On the other hand, until recently, there never had been an attempt to do a soundness proof for the complete specification. But the combination of HM-polymorphism and references has been investigated extensively in literature ever since the problem was discovered. The paper by Andrew Wright, which motivates the solution adopted today, can serve as a starting point.


Thanks for reinterpreting my question so that it was less vacuous :)

I was recently talking to a friend about formally specifing programming languages. He asked me why one would want to do such a thing and was it really worth the trouble. I said that you could prove properties about the language if it was formally specified, but my argument was not very convincing since I couldn't think of what you might actually want to show.

Soundness "sounds" like a desirable property. In the context of a functional programming language however, I don't think that I understand what soundness means? Presumably this is something more than just soundness of the type theory with respect to the language? Or is it?

The advantage of formal specs

Soundness usually means type soundness, yes. Your comment sounds like you regard that as a trivial property, but I would strongly disagree with that. Otherwise why do we have these 99% of languages with unsound type systems floating around? Nor do I see why it should be special to FPLs.

There are many other reasons why formalisation is useful:

  • Portability. A proper formal spec unambiguously specifies the typing and the operational semantics of a language. There is no room left for interpretation, such that different implementations agree on what they actually implement (contrast this to the huge differences between, say, C++ compilers).
  • Precision. A related point: as a language implementor you don't have to do any guess-working of well-intentioned prose that is ambiguous and incomplete after all.
  • Confidence. Besides soundness, there are other properties you might want to investigate. For example, you might prove that type checking is actually decidable (a recent draft paper by Kennedy and Pierce suggests that this is far from obvious even for fairly well-defined languages like Java, C#, or Scala).
  • Coherence. The process of working out a formal spec usually uncovers all kinds of non-obvious problems and glitches in a language that otherwise tend to go unnoticed until too late.

Of course, a formal spec is no automatic guarantee of any of this. Even the SML Def has its grey areas and a few bugs. But you get considerably closer to the ideal.

soundness: keeping your promises

I like to use the following intuition: a sound analysis -- be it a type system, a control flow analysis, etc. -- is one that makes promises it can keep. There are two sides to such a thing: 1) what are the promises it makes, and 2) does it guarantee to uphold them?

The type soundness theorem for ML promises that programs "blessed" by the type checker will never produce runtime type errors. A well-typed program may not terminate, or it may have other kinds of errors (like division by zero). But the type checker guarantees that an expression of type T can only produce a value of type T (this is the "preservation" property), and it always will produce a value unless it fails to terminate or raises some non-type-related error like division by zero (this is the "progress" property).

Naturally, whenever you come up with an analysis, you can make its promises as trivial as you want, in which case you'll get a vacuous soundness theorem. For example: "if my type checker blesses your program, then your program will either terminate or it won't." That's a well-formed proposition, but not a useful property.

The usefulness of ML's type theorem is that it gives programmers invariants they can rely on and enforces them--in practice it means programmers don't have to guard against ill-typed inputs and outputs. The soundness theorem proves that it enforces these invariants correctly.

Artificial distinctions

Allow me to predict that division-by-zero will be recategorised as "obviously a type error" in ten years or so when your type systems are able to detect them. :-)

(OK so admittedly it's a very good example of choosing your promises carefully. The divide-by-zero example just always "gets me" :-))

obviously a type error

Allow me to predict that division-by-zero will be recategorised as "obviously a type error" in ten years or so

Undecidability notwithstanding... ;)

good example of choosing your promises carefully

Of course. It's one of the great delusions of CS [edit: maybe I over-curmudgeoned that one a bit...] that everyone took Milner at his word when he said "well-typed programs don't go wrong." (I think he might have even intended that as a mild joke.) The program state WRONG is a technical device that encodes precisely those states which the type system was intended to prevent, and nothing else.

That's why I prefer to describe soundness in terms of "what your type system promises." I'd love to hear about a type system that guarantees its users a pony, but until then they'll just have to stick with what they can accomplish.

I'm working on this!


I'm interested in the same area, and am currently working on an implementation that uses the Alloy model checker as its back-end for checking user defined operational semantics. Is/has anyone else working on something similar?



How to formulate a type

How to formulate a type system in a form that can be formally studied, checked or verified? For example how do I express OO interface inheritance? Constructor overloading?
Could Alloy be used for such? Apparently not

Ah, and do you have examples of "formal methods for verifying industrial hardware designs using model-checking"? I did not see any in the slides, maybe I missed them.

How to...

How to formulate a type system in a form that can be formally studied, checked or verified? For example how do I express OO interface inheritance? Constructor overloading?

I feel like this is a bit OT here, but I highly recommend that you take a look at Pierce's Types and Programming Languages. It contains plenty of examples and exercises. If you'd rather just take a look at something more concrete, you can try The Definition of Standard ML, although it doesn't contain any of the OO constructs you mention.

Model-checking hardware, and the FL functional language

You might start with Intel's Forte environment, which is Intel custom-built verification environment that integrates model-checking engines, BDDs, circuit manipulation functions, theorem proving, and a functional programming language called "FL". The system has been in deployment with Intel's major processor verification teams for several years.
You could also try this 1995 paper on Model-Checking in Industrial Hardware Design, or this discussion of hardware model-checking at IBM. An older example would be the use of the FDR2 model-checker (technically refinement checker) to verify parts of the hardware for the T9000 Transputer (sadly the papers describing that effort don't seem to be available online).

Interestingly (and perhaps more on-topic for LtU), Forte's FL programming language is described as

...a strongly typed, lazy, functional programming language. A distinguishing feature is that BDDs are built into the language and every object of the Boolean type 'bool' is represented as a BDD.
I'd never even heard of FL before I saw this. It sounds like it was developed specifically for use with Forte. But I think it's interesting that it's a lazy language - I tend to think of Haskell as the only widely used example of a lazy functional language. Anyone here know anything more about FL?

AEP (*) Attack

I'm going to go out on a limb here and guess that the specfic notion of "BDD" meant here is "Binary Decision Diagram", despite the fact that the first BDD expansion shown via the search box on that very Intel forte site says that BDD stands for "Business Desktop Deployment". And wikipedia suggests a few others.

(*) AEP = the Acronym Expansion Police :-)

Use BMC for bounded checking like this

Our experience is that BMC (bounded model checking, which transforms a fixed-depth exhaustive search into a propositional-satisfiability problem and passes that to a SAT checker) is far more powerful than the explicit-state model checking used here, when doing fixed-depth searches.

Furthermore, after using BMC to debug a system, you can use BMC to do inductive model checking for an unbounded proof, which for problems of this size will be completely impractical for explicit-state model checking. You do have to come up with the inductive hypothesis by hand, so inductive BMC isn't automatic, but for systems like this where you understand a great deal about the intended invariants, it may be a cost-effective alternative to traditional proof methods.