Applications of formal semantics

I've read this site for some time, but now I have a question I believe the people at LtU could answer: what are some of the current applications of formal semantics?

I've read or skimmed through some semantics textbooks, like Winskel's, and some less well-known ones, but one question they rarely answer satisfactorily, at least for me, is why I should try to model the semantics of a language formally in the first place. Usually, the books say at the very beginning that semantics is useful for verification and program analysis, and then say no more about it. That's the reason I tried to get hold of a copy of Semantics with Applications as soon as I discovered it (an old edition is available online). The applications mentioned in the book are proving the correctness of an implementation, program analysis, and axiomatic program verification.

I've also recently seen papers about expressing semantics in theorem provers, like here. It's cool that it is possible to extract an interpreter and a module for program analysis from the proofs about the semantics. But we could just write the interpreter and the program analysis by hand; yes, I know, extracting them from a prover makes them more trustworthy. I refrain from saying that they are "correct" because that would depend on the correctness of the theorem prover, the compiler for the language, the extraction process, and so on. Also, I don't know how this idea would fare in bigger, more realistic languages.
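
For concreteness, here's a rough OCaml sketch of the kind of interpreter such a development might extract for a Winskel-style IMP fragment. The fuel parameter is the usual trick for keeping evaluation total inside a prover like Coq; the names and the exact shape here are my own invention, not taken from any of the papers.

    (* Sketch of an extracted-style interpreter for an IMP fragment.
       The fuel argument keeps evaluation total, as a Coq definition
       must be; None means "ran out of fuel". *)
    type state = (string * int) list

    type aexp = Num of int | Var of string | Add of aexp * aexp
    type bexp = Less of aexp * aexp
    type com =
      | Skip
      | Assign of string * aexp
      | Seq of com * com
      | While of bexp * com

    let rec aeval (s : state) = function
      | Num n -> n
      | Var x -> (try List.assoc x s with Not_found -> 0)
      | Add (a, b) -> aeval s a + aeval s b

    let beval (s : state) (Less (a, b)) = aeval s a < aeval s b

    let rec ceval (fuel : int) (s : state) (c : com) : state option =
      if fuel = 0 then None
      else match c with
        | Skip -> Some s
        | Assign (x, e) -> Some ((x, aeval s e) :: s)
        | Seq (c1, c2) ->
            (match ceval (fuel - 1) s c1 with
             | Some s' -> ceval (fuel - 1) s' c2
             | None -> None)
        | While (b, body) ->
            if beval s b then
              (match ceval (fuel - 1) s body with
               | Some s' -> ceval (fuel - 1) s' c
               | None -> None)
            else Some s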

So, although I have already seen some applications of formal semantics, none of them has quite managed to impress me. What is it, then, that we gain by modeling the semantics of a language formally? What can we accomplish with a formal semantics that we couldn't without one? What would be possible without a formal semantics, but difficult and/or clumsy? What examples do we have of successful and important applications of formal semantics to programming practice?

I think these questions are very much worth asking, and the books don't provide answers. So maybe LtU can :)

My motivation...

...is indeed to ensure correctness (no scare quotes because I'm interested in using a theorem prover satisfying the de Bruijn criterion written in a language with a formal semantics and built with a certified compiler running on a processor almost certainly specified and verified using Z or something very similar). Correctness of what? Correctness of treatment of properties that are extraordinarily difficult to get right without such tools. My argument is that the next mainstream programming language needs to get two things really right that, today, no language manages at the same time:

  • Concurrency
  • Information flow security

Both of these have proven highly resistant to informal treatment. The state of the art for dealing with concurrency outside of process algebras seems to be message-passing, and that imposes limitations on the runtime architecture and programming model, along the lines of Erlang or E, for example. If you can live with that, great, but many systems can't, which is why people are interested in process calculi, software transactional memory, and so on.

Security is a huge topic, and interesting because it has both static and dynamic components. E is the gold standard here, but has essentially no static component. I'd like to see something like Flow Caml but with the necessary dynamic component included. This is a horribly inadequate description of the goal; a much better description, along with details about one effort to address it, can be found in Geoffrey Washburn's Principia Narcissus.
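
To make the static side a little more concrete, here is a toy OCaml sketch of information-flow labels via phantom types. It is only a crude approximation of what Flow Caml does with its full lattice of security levels, and every name in it is invented for illustration.

    (* Toy static information-flow tracking with phantom types: data
       labeled 'high' cannot reach a 'low' sink, because the phantom
       type parameters do not unify. *)
    module Ifc : sig
      type high
      type low
      type ('level, 'a) labeled
      val secret : 'a -> (high, 'a) labeled
      val public : 'a -> (low, 'a) labeled
      val map : ('a -> 'b) -> ('level, 'a) labeled -> ('level, 'b) labeled
      val print_public : (low, string) labeled -> unit
    end = struct
      type high
      type low
      type ('level, 'a) labeled = 'a
      let secret x = x
      let public x = x
      let map f x = f x
      let print_public s = print_endline s
    end

    let () =
      Ifc.print_public (Ifc.map String.uppercase_ascii (Ifc.public "hello"))
      (* Rejected by the type checker, as it should be:
         Ifc.print_public (Ifc.map string_of_int (Ifc.secret 42)) *)

Of course this captures none of the dynamic component, which is exactly the part the paragraph above says is missing.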

So if you think in terms of a JoCaml + Flow Caml + some stuff that neither of them does, like addressing the dynamic aspects of information flow control, you'll see that there's some rather heavy type-theoretic lifting to do. It's brutally hard to get that stuff right when you are being formal, let alone when you aren't. In addition, if you want people to use your language, you have to avoid violating their existing mental models too much, either in terms of apparent semantics or in terms of syntax. This is where I draw most of my inspiration from Tim Sweeney's work. Tim is putting a great deal of effort into making the language that he is working on look like another object-oriented/imperative/functional hybrid whose syntax resembles a cross between ML's and Pascal's, but whose semantics might resemble something like a dependently-typed Disciple more than anything else.

I think that's what the next mainstream language is going to have to be like, if it's to be anything besides another warmed-over C/C++/Java/C# or Lisp/Smalltalk/Python/Ruby clone. Neither multiprocessor/multicore systems nor the Internet are going away; the days of writing Von Neumann code that only has to deal with one implicitly-trusted user are long behind us. But addressing those issues is hard, and it will take the absolute best practices we can bring to bear on designing and implementing programming languages to tackle them effectively.

So that leaves scaling this stuff up to real languages. That's the big outstanding challenge, and is part of why I'm excited about the potential of Adam Chlipala's work with LambdaTamer. I think his program of using denotational semantics so that the metalogic is Coq's own Calculus of Inductive Constructions and "proofs" of semantic equivalence are just CIC equality, coupled with aggressive generic proof automation, is the path to scaling efforts like this up. Of course, this remains hypothetical; there's a great deal more work to be done to confirm this hypothesis. But it seems to me to be the most promising path.
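
As a loose illustration of the shallow-embedding idea (in OCaml rather than Coq, so take it as shape only): the denotation of a command is just a metalanguage function on states, so an equivalence like "skip; c ≡ c" becomes an equation between functions, which in CIC can often be discharged largely by computation.

    (* Denotational sketch: commands denote state transformers in the
       metalanguage. denote (Seq (Skip, c)) and denote c compute the
       same transformer, which is the sense in which semantic
       equivalence collapses into metalanguage equality. *)
    type state = string -> int

    type com =
      | Skip
      | Assign of string * (state -> int)  (* expressions by their denotations *)
      | Seq of com * com

    let rec denote : com -> state -> state = function
      | Skip -> fun s -> s
      | Assign (x, e) -> fun s y -> if y = x then e s else s y
      | Seq (c1, c2) -> fun s -> denote c2 (denote c1 s)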

I don't know how closely this set of thoughts tracks those of anyone else who is interested in certified software development. But I do think it helps explain why you can come away from Winskel and the Semantics Survey in Coq that's based on it not really feeling like it's worth it for the little IMP language that Winskel develops, but perhaps you can see how it could be worth it for a kind of combined JoCaml/Flow Caml/Disciple that won't scare people away. :-)

An approach to the de Bruijn criterion

And an approach that complements your (Paul's) other interests at that...

Consider Zhong Shao and coworkers' An Open Framework for Foundational Proof-Carrying Code. This provides a specification of a typed assembly language (OCAP) that uses the calculus of inductive constructions as its type system, and a notion of proof representation that allows concrete witnesses that a piece of assembly code satisfies a specification. We satisfy the de Bruijn criterion by providing two further ingredients: (i) a verifier written in OCAP code, and (ii) a proof that the verifier accepts all and only code whose correctness is witnessed by the proof it carries.
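
As a hypothetical sketch of the shape this takes (every name below is invented; OCAP's actual framework is considerably richer), the trusted base boils down to a small checker over carried proof witnesses:

    (* Hypothetical shape of the arrangement: code ships with a proof
       witness, and a small checker decides whether the witness
       establishes the code's specification. The "all and only"
       obligation in (ii) is a theorem about check, proved once. *)
    module type PCC = sig
      type code     (* assembly-level program *)
      type spec     (* a CiC proposition about the code *)
      type witness  (* concrete proof object carried with the code *)

      val check : code -> spec -> witness -> bool
    end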

Does this sound like a good fit?

Imagine physics without math...

Studying programming languages without formal semantics would be like studying physics without math. It would make the study of programming languages an exclusively soft science. Formal semantics help us understand programming languages, give us confidence in the truths we discover about languages, and prevent us from believing things that aren't true. They provide tools for exploring new semantics rigorously — many (most?) of the advances in programming language theory have arisen out of formal semantics.

Any other applications of formal semantics are just a bonus. But there are some: they're very useful in the design of languages, for example. The security concerns Paul raises are just one instance. Formal semantics provide general patterns for language design that can be applied to new languages.

Disagree

Studying programming languages without formal semantics would be like studying physics without math.

Studying programming languages without any semantics at all (not even an implementation of the language) might be like studying physics without math, but the formal semantics he's talking about aren't nearly so important. Studying programming languages without formal models is probably more akin to studying physics without mathematical rigor... and early physicists were getting pretty good results playing fast and loose with the math IIRC.

Rigor is important because it eliminates subtle mistakes, and because there are often insights to be gained from distilling concepts down to minimal models. But you can't really study physics without math, whereas you can get quite far designing programming languages without formal models.