Questions about Semantics.

Okay, I'm doing a self-study in techniques for defining the semantics of programming languages. I have found this book (Syntax and Semantics of Programming Languages), which seems pretty comprehensive. In particular, I was interested in whether there are any formalisms besides those in the book (Operational, Denotational, Domain Theory, Axiomatic, Algebraic, etc.). I was also interested in people's opinions about the strengths and weaknesses of these formalisms, and whether any of them are more popular, or considered "obsolete".

Also, since I'm interested in parsing as well, I was wondering how the use of grammar formalisms to define semantics (such as Attribute Grammars, Affix Grammars, and Wijngaarden grammars) is seen in comparison to these formalisms. I guess this paper (ACM Digital Library membership required) is a good example of one way of using a grammar to define semantics. Is it considered antiquated to use grammar formalisms to do this, or is it simply not popular?


Free the contexts!

Wow, could you have asked a question with more scope? ;-) My short answers:

Each of the semantic formalisms has its uses, depending on what you are studying.

How much semantics can you sneak into the syntax (grammar formalism)? Things get very hairy here (in HL as well as PL).

My impression is that most people favour a sharp separation between the two these days; it's easier to deal with.

Re: Semantics

Matt Estes: Okay, I'm doing a self-study in techniques for defining the semantics of programming languages. I have found this book (Syntax and Semantics of Programming Languages), which seems pretty comprehensive.

This is currently a subject of intense interest to me. Thanks for the link!

Matt: In particular, I was interested in whether there are any formalisms besides those in the book (Operational, Denotational, Domain Theory, Axiomatic, Algebraic, etc.).

Those are all of the ones I've heard of.

Matt: I was also interested in people's opinions about the strengths and weaknesses of these formalisms, and whether any of them are more popular, or considered "obsolete".

It's my understanding that the most popular formalisms have historically been denotational semantics and operational semantics, and that as time has marched on, the focus has shifted from the former to the latter. The specific form of operational semantics that you tend to find in the literature is called Structural Operational Semantics, or SOS. However, it's long been recognized that the traditional SOS isn't modular and hence doesn't scale well to real-world-sized languages: a change to a single language feature generally means changing the entire semantics. This observation led Peter Mosses to develop "Action Semantics," which does a vastly better job of modularizing language-semantics descriptions. More recently still, Espinosa, and then Liang, Hudak, and Jones, developed the notion of "Modular Monadic Semantics." Espinosa's papers and thesis on "Semantic Lego" are must-reads. Finally, Wansbrough and others have fused Action Semantics with Modular Monadic Semantics to come up with Modular Monadic Action Semantics, which was previously discussed on LtU here.
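To give a concrete feel for the monadic style, here is a minimal sketch in Haskell (my own toy example built on the mtl library, not anything taken from the papers): each clause of the evaluator constrains the monad only by the effects it actually needs, so adding, say, state or exceptions leaves the existing clauses untouched.

    -- Toy modular monadic evaluator (illustrative only; uses the mtl library).
    import Control.Monad.State
    import Control.Monad.Except

    data Expr = Lit Int
              | Add Expr Expr
              | Div Expr Expr      -- may fail
              | Get | Put Expr     -- may touch the store

    -- Each clause names only the effects it needs; the others are unaffected.
    eval :: (MonadState Int m, MonadError String m) => Expr -> m Int
    eval (Lit n)   = return n
    eval (Add a b) = do x <- eval a; y <- eval b; return (x + y)
    eval (Div a b) = do
      x <- eval a
      y <- eval b
      if y == 0 then throwError "divide by zero" else return (x `div` y)
    eval Get       = get
    eval (Put e)   = do v <- eval e; put v; return v

    -- Choosing the concrete monad stack is a separate, local decision.
    run :: Expr -> (Either String Int, Int)
    run e = runState (runExceptT (eval e)) 0

The point is that a new language feature means a new clause and possibly a new constraint, not a rewrite of the whole semantics.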

Lately a number of groups appear to have developed an interest in the use of formal tools, generally involving Rewrite Logic, to assist in the development and maintenance of programming language semantics, and in some cases even in the development of interpreters and compilers from formal specifications. The systems that I find most intuitively appealing in this regard are MetaPRL and Maude. My prejudices in favor of MetaPRL are that it's an extension of O'Caml and I'm already an O'Caml programmer, and that it's been used to develop the Mojave compiler system. Balancing that for Maude is its application to the development of the intriguing-sounding Open Calculus of Constructions.

I should also point again to DrScheme. All post-208 versions include PLT Redex, a very nice reduction semantics development environment, complete with animated reduction steps.

Matt: Also, since I'm interested in parsing as well, I was wondering how the use of grammar formalisms to define semantics (such as Attribute Grammars, Affix Grammars, and Wijngaarden grammars) is seen in comparison to these formalisms. I guess this paper (ACM Digital Library membership required) is a good example of one way of using a grammar to define semantics. Is it considered antiquated to use grammar formalisms to do this, or is it simply not popular?

I'm not familiar with this area at all. Intuitively it would seem to be related to the notion of using rewrite rules, which can be seen as a "grammar," to define your semantics, but I don't know enough about either approach to say much more. Understanding the relationship between rewrite logic and programming language semantic definition is very much an intellectual destination for me. Actually, I want to understand all of it: how you can use a tool like MetaPRL to write a compiler, from parsing the concrete syntax into higher-order abstract syntax, to rewriting the higher-order abstract syntax into an intermediate representation, to rewriting the intermediate representation into code for a real-world processor or a back-end system like C--. Hence my interest in MetaPRL: they've already done what I just described. So to the extent that I'm interested in programming language formalisms, it seems like a good place to focus on their development.

However, it's long been recog

However, it's long been recognized that the traditional SOS isn't modular and hence doesn't scale well to real-world-sized languages: a change to a single language feature generally means changing the entire semantics.

This is not always true: the operational semantics of the general kernel language given in chapter 13 of CTM is a form of SOS, and it is rather well factored. The interactions between the language concepts are small, and the kernel language can be subsetted in many ways with almost no modifications to the semantic rules.
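To illustrate the factorization point with a deliberately tiny example (mine, not the CTM kernel language): in a small-step presentation, each construct gets its own rules, and the rules for one construct never mention another, so subsetting the language just means deleting clauses.

    -- Toy small-step (SOS-style) semantics; illustrative only.
    data Expr = Num Int
              | BoolE Bool
              | Plus Expr Expr
              | If Expr Expr Expr

    step :: Expr -> Maybe Expr
    -- rules for Plus
    step (Plus (Num m) (Num n)) = Just (Num (m + n))
    step (Plus (Num m) e)       = fmap (Plus (Num m)) (step e)
    step (Plus e1 e2)           = fmap (\e1' -> Plus e1' e2) (step e1)
    -- rules for If (they never mention Plus)
    step (If (BoolE True)  t _) = Just t
    step (If (BoolE False) _ f) = Just f
    step (If c t f)             = fmap (\c' -> If c' t f) (step c)
    -- values do not step
    step _                      = Nothing

    -- reduce to a normal form by taking steps until none apply
    eval :: Expr -> Expr
    eval e = maybe e eval (step e)

Whether a semantics stays factored like this as the language grows is exactly the question, of course.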

The act of choosing which concepts are basic in the language is very important: choosing them wisely makes this kind of factorization possible; choosing them poorly makes factorization hard, and I don't believe that a clever choice of formalism can make up for a poor choice of concepts.

But I am not a professional semanticist; I will welcome any suggestions for making the semantics even better than it is.

(PS on another point: there is another semantics that has not been mentioned, and that is logical semantics, i.e., using logical models for the meaning of a program and a proof theory for the operational aspects. Logical semantics only works well for a restricted set of languages, though; these are called logic languages precisely because they have a simple logical semantics.)

I should also point again to

I should also point again to DrScheme. All post-208 versions include PLT Redex, a very nice reduction semantics development environment, complete with animated reduction steps.

In this regard, check out the VamOz interpreter for Oz. It respects the abstract machine semantics of CTM. It currently only handles the declarative concurrent subset of the language, though. (If you want to extend it, send a message to Christian Schulte.)

You can't make an omelette without breaking some eggs.

Though I've heard of Wijngaarden's (two-level) grammars, I've never seen them used. That doesn't mean anything, though; I can imagine what they are, and why they might be useful.

Marc is correct that most people favor a sharp distinction between syntax and semantics, and I would add a third layer in between: typing. The reason for this is probably that we have well-established formalisms and notations for the first two: BNF-style tree grammars for syntax and "sequents and judgements" for typing. And the latter is also recycled for SOS semantics.
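For readers who haven't seen the notation, the recycling is easy to illustrate: a typing rule and a small-step SOS rule look nearly identical on the page. Two standard rules for a conditional, stated from memory:

    G |- e1 : bool    G |- e2 : t    G |- e3 : t
    --------------------------------------------
    G |- if e1 then e2 else e3 : t

                     e1 --> e1'
    ------------------------------------------------
    if e1 then e2 else e3 --> if e1' then e2 else e3

Both are judgements defined by inference rules; only the form of the judgement differs.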

Personally, I prefer categorical denotational semantics to operational semantics, though. Besides the modularity issues, I don't think—in practice—operational semantics gives much insight, because it doesn't locate the language in a mathematical universe. Furthermore, all the accounts of operational semantics I have seen are untyped, and I think it is counterproductive to throw away the typing relation when you start talking about semantics.

However, as I understand it, operational semantics is essentially coalgebraic semantics, and so it is possible in principle that someone in the future will integrate it with denotational algebraic semantics, and then categorical (that is, typed) semantics.

Omelettes

I guess part of my interest is that I like obscurity (and two-level grammars definitely seem to fit that bill), but since I'm in the midst of working on my Master's degree (in Mathematics), I have to say that the idea of making things mathematically precise is very appealing to me.

Also, finding the book Grammars for Programming Languages by Cleaveland, which is a tour de force of using Wijngaarden grammars on all sorts of problems, helped as well. It gives examples ranging from encoding Fermat's Last Theorem to defining the language of all true logical expressions. Of course, this also means unrestricted two-level grammars are *too* powerful, but there are more tractable variations (which is what I'm trying to study).

Unfortunately, I don't have the time right now to do a self-study of category theory and connect it to computing science (since I've not found anyone doing the connection and writing it down understandably, and my thesis will be done in December, or at least that's what my advisor told me :), so I found something I did understand (parsing, context-free grammars, etc.) and tried to understand how it attacks the problem of semantics.

Two-level grammars caught my attention since they can very elegantly deal with context in languages (which, as near as I can tell, captures a lot of "semantic" information), and they show promise of being tractable both for formal proofs and for the automatic construction of tools such as compilers (like Paul was mentioning). I like my theory and practice not to be disconnected, which is why I'm a refugee from Engineering and didn't go into a CSC program immediately; well, that and a girlfriend still at my school :).
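For instance, here is roughly what a two-level grammar for the classic context-sensitive language a^n b^n c^n looks like (my own formulation, with notation loosely following the Algol 68 report: "::" for metaproduction rules and ":" for hyper-rules). The metanotion N ranges over unary numbers, L over letters, and the requirement that each metanotion be replaced consistently within a single rule is what forces the three counts to agree:

    N :: i ; N i.           (N stands for i, ii, iii, ...)
    L :: a ; b ; c.         (L stands for a letter)

    start : N a string, N b string, N c string.
    i L string : L symbol.
    N i L string : N L string, L symbol.

Because the same replacement for N must be used in all three positions of the start rule, the derived sentences are exactly a^n b^n c^n (for n >= 1).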

I've heard that...

...two-level grammars fell out of favor because they are too expressive -- it's allegedly very easy to write grammars that permit far more than you want them to. (I say allegedly because I'm relaying something I was told on ll1-discuss rather than conveying personal experience.) Using an inference-rule formulation is (I am told) just as expressive (since it's basically a functional program[*]), but it's much easier to say what you mean.

[*] edit: nondeterministic functional program (logic program?)
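To make the footnote concrete: the same language a^n b^n c^n comes out as a short "nondeterministic" functional program, here sketched in Haskell (my own illustration), with a list comprehension standing in for the guessing:

    -- a^n b^n c^n as a nondeterministic functional program.
    -- block c n s: s is exactly n copies of the character c.
    block :: Char -> Int -> String -> Bool
    block c n s = s == replicate n c

    anbncn :: String -> Bool
    anbncn s = or [ block 'a' n xs && block 'b' n ys && block 'c' n zs
                  | n <- [0 .. length s]          -- guess the count
                  , let (xs, rest) = splitAt n s
                  , let (ys, zs)   = splitAt n rest
                  ]

The guess is resolved by exhaustive search, which, as I understand it, is also how naive implementations of two-level grammars end up working.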

Actually...

I was probably the one who asked about them there :). But yeah, they can get really messy, though there have been several attempts to find restrictions that make them less expressive (but hopefully still powerful enough...) so that practical parsers can be built for them.

And honestly, the literature seems to indicate that attribute grammars devolve into a bad way to write a program (functional or imperative) on non-toy problems.

I guess I was curious whether other (non-grammar) formalisms suffer from this same problem, and it seems, from what some people indicated previously, that at least for operational semantics that has been the case. If I were to state a goal for my research, it is a technique for handling "semantics" or "context" (I guess it's more a school-of-thought issue than anything) that is both formally useful for proofs and other theoretical work, and practically useful (like context-free grammars were), so that it would actually be used to design languages (it seems like many semantics are defined after the fact, rather than as part of the design process itself?).

I realize it's arrogant to presume that I could succeed where many have not, but I noted someone in another thread linked to Hamming's You and Your Research, which seems to suggest that you should aim high :).

Well....

Attribute and van Wijngaarden grammars are great for defining small languages, especially since in multilevel grammars it is often easier to deal with implementation details that are conveniently forgotten about in denotational or operational semantics formalisms (think identification/typing).

Attribute grammars are also not dead; some languages actually still use them in their implementation (if I remember correctly, the Utrecht Hugs implementation uses attribute grammars). There are also the ASF+SDF set of tools and the little-known but good CDL3 project. (And these are only the Dutch projects.)

The problem with implementations of two-level grammars is that (a) they are often based on unification and exhaustive search (so as not to restrict the formalism too much), giving objectionable performance, and (b) multilevel grammars are mostly defined w.r.t. the concrete syntax, whereas it is just more convenient to define semantics w.r.t. the abstract syntax of a program, or code generation w.r.t. an attributed abstract tree and some lookup tables. In practice it is just easier to think of compiling/interpretation as a discrete number of transformation steps on several data structures.
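As a small invented illustration of the attributed-tree view: a declare-before-use check is one inherited attribute (the environment, flowing down and across) plus one synthesized attribute (the error list, flowing up), which in a functional language amounts to a fold over the tree:

    -- Attribute-grammar flavour in plain Haskell (invented example).
    data Stmt = Decl String | Use String | Block [Stmt]

    type Env = [String]

    -- returns (environment after the statement, errors found in it)
    check :: Env -> Stmt -> (Env, [String])
    check env (Decl x) = (x : env, [])
    check env (Use x)
      | x `elem` env = (env, [])
      | otherwise    = (env, ["use before declaration: " ++ x])
    check env (Block ss) = (env, errs)   -- declarations are local to a block
      where (_, errs) = foldl thread (env, []) ss
            thread (e, es) s = let (e', es') = check e s in (e', es ++ es')

    -- check [] (Block [Use "x", Decl "x", Use "x"])
    --   ==> ([], ["use before declaration: x"])

Threading the same information through pure hyper-rules in the syntax itself is where van Wijngaarden grammars start to feel heavy.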

In that respect, I suppose it would be next to impossible to use van Wijngaarden grammars to define a language all the way from concrete syntax up to peephole optimizations on the generated code.

For a best-of-all-worlds implementation, you might really want to look at the CDL3 implementation.

I am running out of titles...

if I remember correctly, the Utrecht Hugs implementation uses attribute grammars

There is no Utrecht Hugs implementation. You are probably thinking of Generic Haskell, which is implemented using AG. That, or Helium, but I think Helium is written in straight GHC Haskell.

Personally, I don't like AG because it's a preprocessor (I don't like them), and you can get about the same benefits using monads. But Doaitse is big on attribute grammars, and they are part of the compiler construction curriculum here.

BTW, another kind of semantics which was not mentioned above is continuation semantics. It is a kind of denotational semantics, but you can think of it as a way of doing operational semantics denotationally. I don't know an authoritative reference, but it's explained in Andrzej Filinski's Master's thesis Declarative Continuations and Categorical Duality, and probably in Andrew Appel's book Compiling with Continuations (Cambridge University Press, 1992)—though I haven't read the latter. However, I think continuation semantics is subsumed by monadic semantics.
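To give a tiny taste of "operational semantics done denotationally" (my own toy example, not from either reference): the meaning of an expression is a function taking "the rest of the computation" as an argument, which makes evaluation order and jumps explicit.

    -- Toy continuation semantics (illustrative only).
    data Expr = Lit Int | Add Expr Expr | Abort

    type Ans  = Int
    type Cont = Int -> Ans    -- "the rest of the computation"

    eval :: Expr -> Cont -> Ans
    eval (Lit n)   k = k n
    eval (Add a b) k = eval a (\x -> eval b (\y -> k (x + y)))
    eval Abort     _ = 0      -- a jump: discard the continuation

    -- eval (Add (Lit 1) (Lit 2)) id  ==>  3
    -- eval (Add (Lit 1) Abort)   id  ==>  0

Note how the left-to-right evaluation order of Add is pinned down by the nesting of the continuations, something a direct-style denotational semantics leaves implicit.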

Titles are overrated :)

Even when using something else for the semantics, is there value in having parsers that can handle context sensitivity? I mean, it seems that while we traditionally draw the line at context-free when distinguishing between syntax and semantics, there are many aspects of languages that seem more "syntactic" in nature than semantic (e.g., declaration of a variable before use really seems syntactic rather than semantic; maybe this is a bad example, but it's the first one that jumped into my head :).

Also, it seems that a lot of people passed through Wijngaarden grammars on the way to logic programming. What are the "open" and/or important problems on the boundary between syntax and semantics? Or am I just going off on a wild tangent?

I guess some of this is me having existential angst over my choice of topic for my master's thesis :). Maybe I should just stop beating the dead horse...

No worries

Hmm, Frank is right, it was Helium - the Utrecht Haskell(-like) implementation. Uhm, btw, the last time I looked there were some .ag files in that directory on my drive ;-).

I wouldn't get into any existential crisis. After my post I actually realized that most of the systems I mentioned use two-level grammars in a very restricted manner, so the correspondence with van Wijngaarden grammars is actually, uhm, maybe a bit far-fetched.

Matt, uhm, I am not sure I actually got the gist of your post?

Depending on how formal you wanna be, you can express everything from parsing, identification, semantic analysis, and code generation as a syntactic game using a van Wijngaarden grammar. Also, if I remember the notion correctly, you can, for instance, also express the language of all derivable statements of a higher-order logic using such a grammar, since there is no actual notion of computation in it. Just syntax and rules.
(I think you might even argue that when most systems become more formal - that is, syntactic - it is just a matter of which notation you like best)

Maybe it would just be nice to do a comparative study, especially since it is a master's thesis?

God now I am replying to my own posts again...

Hmm, uhm. I don't want to, so I guess you'll have to find out on your own ;-): you might want to look up where van Wijngaarden grammars stand in the Chomsky hierarchy, since I don't recall the full formal definition of VW grammars.

Two-level grammar and Chomsky Hierarchy

You might want to look up where van Wijngaarden grammars stand in the Chomsky hierarchy

IIRC, two-level grammars are unrestricted, i.e. Turing-powerful. This is partly why they are considered too powerful, given that most PLs and HLs are only mildly context-sensitive.

Too much power...

Yeah, as was pointed out, they're equivalent to Type-0 grammars. A good portion of the literature on Wijngaarden grammars is about restrictions that make them "decidable" or, even better, amenable to the automatic construction of a practical parser. Unfortunately, a lot of the restrictions people have come up with are frustrating; after all, the grammar for the language of all true statements of higher-order logic is remarkably easy to understand. Of course, for some problems it's obvious that you can't make a parser, but for others (particularly context-sensitive constructs) Wijngaarden grammars are pretty elegant.

What I've been working on a little for my thesis is mappings between things like affix grammars and attribute grammars, and in particular mapping decidable affix grammars to decidable Wijngaarden grammars.

As to the bulk of the thesis, a lot of it so far has been rather survey-like, describing (with formal definitions and proofs) the main two-level grammar formalisms. One of my earlier chapters is currently titled "Nongrammar Formalisms", covering the types of semantics noted earlier.

A lot of my frustration has been in coming up with a clear "why" for drawing the line where we do, and for why context-free grammars caught on in mainstream CSC while none of the semantics models (grammar-based or not) has caught on as a way of designing the language first, even though most people start designing the syntax of a language with a grammar.

I will probably try to do as much of a survey as possible

What I was meaning to say about the angst was that I'm afraid I've picked a bad thesis topic (or a bad approach to a good topic...), especially since I want to go on and get my Ph.D. (in CSC). Of course, part of the trickiness is that I'm in Math, and my thesis has to be "mathematical enough" to make people happy in the math department. Since I was interested in programming languages (among other things), my advisor and I came up with parsing as being "mathy" enough and contributing to several of my other "big picture" goals.

I guess I'm also afraid I'm showing my ignorance about theoretical CSC, but I've really appreciated the responses so far; they've helped me to understand the stuff a lot better, especially the bigger-picture stuff.

You might want to look at Frank Pfenning's notes...

...for his class "Computation and Deduction".

He argues for using LF as the framework for formalizing languages, because it lets you use higher-order abstract syntax in a very clean way. (HOAS represents variables and binders in the object language with variables and lambdas in the meta-language, and this works -- in the sense of meta-theorems like soundness being provable -- because functions in LF are always parametric.)
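A rough Haskell approximation of the idea (only an approximation: LF's function space is parametric by construction, whereas a Haskell function on Term can inspect its argument): object-language binders become meta-language functions, so capture-avoiding substitution is inherited from the host language.

    -- HOAS, approximated in Haskell (illustrative only).
    data Term = Lam (Term -> Term)
              | App Term Term

    -- weak-head reduction: substitution is just function application
    whnf :: Term -> Term
    whnf (App f a) = case whnf f of
                       Lam body -> whnf (body a)
                       f'       -> App f' a
    whnf t         = t

    -- (\x. x) applied to (\y. y) reduces to (\y. y)
    identity :: Term
    identity = Lam (\x -> x)

    example :: Term
    example = whnf (App identity identity)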

Algebraic Singularities, Finite Graphs & D-Brane Gauge Theories

What I was meaning to say about the angst was that I'm afraid I've picked a bad thesis topic (or a bad approach to a good topic...), especially since I want to go on and get my Ph.D. (in CSC).

If you're worried about this, then a good thing to do might be to look at the home pages of some researchers or research groups, as they sometimes have suggestions or recommendations for thesis topics or projects. For example, we have a list for Master's students here, Eike Ritter has one here and Thorsten Altenkirch has a page for final year projects.

BTW, here is an exemplar you should use to write your thesis. Be sure to read the last sentence on page one.

Infinite Languages and Grammars

As a note, Matthew Estes completed this thesis in May 2005. I can't find any dedicated LtU node for it, though.

Properties of Infinite Languages and Grammars by Matthew Scott Estes.

Wow. I admit I have devolved

Wow. I admit I have devolved into more of an infrequent lurker around here. Imagine my surprise when I just pop in for a brief visit today and find someone has posted a link to my Master's thesis... yikes. Well, I do appreciate the sentiment, although I'm not sure it's up to the standards the community here normally expects.