The Semicolon Wars

Some light reading for the holiday season: writing for American Scientist, Brian Hayes says in The Semicolon Wars:

A catalog maintained by Bill Kinnersley of the University of Kansas lists about 2,500 programming languages. Another survey, compiled by Diarmuid Piggott, puts the total even higher, at more than 8,500. And keep in mind that whereas human languages have had millennia to evolve and diversify, all the computer languages have sprung up in just 50 years. Even by the more-conservative standards of the Kinnersley count, that means we've been inventing one language a week, on average, ever since Fortran.

For ethnologists, linguistic diversity is a cultural resource to be nurtured and preserved, much like biodiversity. All human languages are valuable; the more the better. That attitude of detached reverence is harder to sustain when it comes to computer languages, which are products of design or engineering rather than evolution. The creators of a new programming language are not just adding variety for its own sake; they are trying to make something demonstrably better. But the very fact that the proliferation of languages goes on and on argues that we still haven't gotten it right. We still don't know the best notation—or even a good-enough notation—for expressing an algorithm or defining a data structure.


Pointwise analysis not optimal

A thought-provoking and enjoyable article, though I think he exaggerated some of the downsides of linguistic diversity, for dramatic effect. In fact, he pointed out that only a few languages have had really long lifespans so far, which indicates that languages are dying out as well, possibly as fast as they're being created (though this is harder to measure).

Indeed, one of those long-lived languages (Lisp) has features (lambda expressions) that have started crossing over into other languages (C#, then C++, then even Java). One can also look at crossover in the higher-order type space (Haskell type classes to Scala traits and C++ concepts).
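For the curious, a minimal Haskell sketch of the feature in question: an anonymous function (lambda) passed to a higher-order function, the idiom that C#, C++, and Java have since adopted with their own syntaxes.

    -- A lambda passed to map: the Lisp idea that crossed over.
    main :: IO ()
    main = print (map (\x -> x * x) [1 .. 5 :: Int])  -- [1,4,9,16,25]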

So to me it's less important to focus on the "pointwise" tension between individual languages at any point in time, and more important to look at the "setwise" collective evolution of the languages in current use. That kind of analysis would be fascinating, and it seems Hayes missed an opportunity to consider this.

Also, I have an issue with this sentence:

"I would argue that the distance between C and Lisp, for example, is greater than that between any pair of human languages."

Really? Greater than the difference between, say, English and Japanese? I can't agree with this example at all.

Multimembered, but?

While we are at it: if we view language evolution as a many-membered cross-fertilization process, it would be interesting to see what pressures these languages give in to.

Modula-R never saw the light of day, yet nowadays we can write database applications in almost every language. What is behind the functional programming frenzy? It is supposedly more amenable to proofs, but does anybody really do proofs about their programs? Do I need the programming language Go to deploy my distributed algorithms in the cloud? Or is it simply that every university needs its pet project to get funding? Or is a language just a way for egomaniacs and street gangs to get attention?

I rather feel that, although the environment changes over time to some degree, the substrate has changed dramatically. FORTRAN was invented to allow formula translation, LoL. Every IDE delivers a toolkit for defining custom grammars, there are dozens of compiler construction kits, etc. It is very natural to experiment with these, and often we are motivated to create domain-specific languages. So I see a time of syntactic and semantic flux coming: software agents automatically exchanging their towers of definitions, rooted in some universal deictic bootstrapping yet to come.

FP frenzy

I've yet to witness real frenzy when it comes to FP. Proof is not the point (although given the right language and tools it can be); it's more about being able to express certain algorithms succinctly and reason about them in a way that is, for the most part, easier than the kind of operational reasoning you do when you talk about procedural programs. This only really applies to purely functional languages, though, as side effects and the absence of referential transparency don't play well with equational reasoning.

Apart from all the talk about multicores, the end of Moore's law, the intricate connection of these developments to purely functional programming, and new ways of dealing with concurrency and making parts of your program concurrent (semi-)automatically, I guess the main thing is that we are witnessing a trend towards more diversity. Businesses are beginning to see that there are languages and paradigms better suited to some of their problems than the language(s) they are already using.

What I'd like to see is a move away from rolling your own DSL towards embedded DSLs. Playing with a compiler construction kit or similar software may be fun, but in the wrong hands it leads to, well, you said it: syntactic and semantic flux. Knowing what a language (or family of languages) does is the most important thing. How will you know what the semantics of your program written in DSL X really is, and that it doesn't change while you're having a coffee break because the DSL guy had so much fun with his compiler compiler?
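To make "equational reasoning" concrete, here is a minimal Haskell sketch (the functions double and inc are illustrative): because everything is pure, the map-fusion law lets us substitute equals for equals and replace two list traversals by one.

    -- Map fusion: map f (map g xs) = map (f . g) xs. With side effects
    -- in f or g, the two sides could behave observably differently.
    double, inc :: Int -> Int
    double = (* 2)
    inc    = (+ 1)

    lhs, rhs :: [Int]
    lhs = map double (map inc [1, 2, 3])
    rhs = map (double . inc) [1, 2, 3]

    main :: IO ()
    main = print (lhs == rhs)  -- True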

That's why you should ship

That's why you should ship your new programming language only together with documentation of its semantics (as has been done with Babel-17).

I'm starting to think that

I'm starting to think that you should provide a mechanized proof of the good properties of the semantics (static and dynamic).

I definitely envision

I definitely envision mechanized formalizations of Babel-17 such that you can prove properties about programs written in it. If you want to do such a thing now, for example in Isabelle, that would be great (and of course I would be more than willing to answer any questions about Babel-17 that might come up)! My personal focus currently is on a) delivering a decent-performance implementation of Babel-17, and b) employing (and growing) Babel-17 for "real-world" projects. I hope to have a Babel-17-to-Java compiler available in February.

Is there actually a

Is there actually a documented case where the value of a mechanized semantics at the time of initial release and development has been shown? There is a non-trivial burden here -- there are very few languages with even a mechanized kernel, and of those, most were done after the fact. To be clear, I'm not arguing against the existence of value; I just have little validated feel for the pros/cons and the implications for process wrt current strengths and limitations. E.g., even if something is proven about the language, it could be the wrong thing: after years of even postpartum semantics papers about Java/C++, it seems the important thing for C++ etc. turned out to be memory models and maybe templates, with the rest driven by analysis desires.

I don't know of any such

I don't know of any such case. But I think there is a relation between the fact that we try to mechanize semantics after the fact, as you noted, and the fact that such mechanization is difficult. Formalizing C semantics, for example, has proven to be a huge undertaking, and I expect an important part of the difficulty comes from the fact that C was never meant to have a mechanized semantics, so its informal specifications were very difficult to make precise and amenable to verification.

If a language was designed with formal semantics in mind, it would probably have a semantics more adapted to verification in the end. It's a form of co-design of a language and its formal semantics. And this is very interesting because when proving things about a program written in the language, we reason on the semantics of the program. A language with a semantics adapted to mechanized proofs would make proving various specifications of programs easier.

Of course, this is mostly a hypothesis right now, and on the other side there is plenty of evidence that formal semantics of even very simple and pure languages such as typed lambda calculi, while well understood now, remain highly non-trivial -- for example, the representation of variable bindings is still an active research topic.

C isn't a very good example

C isn't a very good example since it was never intended to provide "good properties of the semantics (static and dynamic)", so there is almost nothing to verify. C was intended to serve as a portable assembler, and the desired portability isn't something that can readily be formally defined (since we have no closed list of targets or hosts), much less formally verified.

I would like more languages be designed with effort to support a wider range of "properties of the semantics (static and dynamic)", even if only through informal proof-sketches and other justifications.

A formal semantics does nothing to evade the halting problem or Rice's theorem. It is trivial to provide a formal, operational semantics for many Turing-tarpit languages, but doing so wouldn't help us reason about programs without running them. Since any type-0 language can model any other, it is also difficult to attribute properties to the language itself - i.e. Haskell's purity would not help reasoning about a Whirl program represented in a Haskell monad. Similarly, a language that allegedly simplifies reasoning about deadlock and progress won't be a big help if the language doesn't offer effective control over concurrency, and developers must idiomatically model serializers, mutexes, and semaphores to control the excess of concurrency. (I'm looking at you, actors model.)
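As a small illustration of that point, here is a sketch of an imperative mini-program embedded in Haskell's State monad (assuming the mtl package; the program and names are illustrative). The host language is pure, yet reasoning about the embedded program is still operational reasoning about a threaded store:

    import Control.Monad.State

    -- Two increments of a mutable "register", modeled purely.
    incr :: State Int ()
    incr = modify (+ 1)

    program :: State Int Int
    program = do
      incr
      incr
      get

    main :: IO ()
    main = print (evalState program 0)  -- 2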

We don't benefit from "a semantics more adapted to verification" until we have chosen a useful set of properties to verify. The language designer chooses which properties will be 'easiest' to reason about and verify for programs developed in that language. But those decisions must be carefully considered if we are also to avoid the idioms and patterns that would violate the properties we wish to achieve.

The 'good semantic properties' of a language are balanced between simplifying achievement of useful properties in programs and avoiding incentives for counter-productive patterns. It isn't clear to me how we could mechanically verify either of these aspects. How do we formally define "simplicity" for achieving useful properties?

Designing with 'formal semantics in mind' is useful, insofar as it helps us reason about correctness of the language implementation or about other properties. But I think most reasoning about language design will be informal, even if the language has formal semantics.

A good way to judge the

A good way to judge the semantics of a language is by how easy it is to prove functional correctness of programs written in that language. If it is difficult even to define what functional correctness means for programs in your language, the semantics of your language is in serious trouble. Like the semantics of C.

Judging semantics

I don't really agree... the domain, not the language, is the primary factor in whether it is difficult to "even define what functional correctness means" for a program. We can define what 'sorting a list' means just as easily for C as for Haskell. If we try to define what functional correctness means for the Google AI Challenge, neither C nor Haskell will help us.

For judging semantics of a 'general purpose' programming language, I'd actually suggest you should focus on the non-functional properties - safety, robustness, resilience, performance, progress, resource costs, et cetera.

If developers can reason about these non-functional properties and achieve them without much semantic noise, then they should have greater freedom to focus their attention on 'functional correctness' and other domain properties.

It is much easier to define

It is much easier to define what sorting a list means for a purely functional language than for a language like C. Believe me.
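For instance, here is a sketch of an executable sorting specification in Haskell (the names are illustrative): the output must be ordered and a permutation of the input. Stating the same thing for a C function must additionally speak about pointers, aliasing, and mutation.

    import Data.List (sort)

    -- The output is ordered...
    ordered :: Ord a => [a] -> Bool
    ordered (x : y : rest) = x <= y && ordered (y : rest)
    ordered _              = True

    -- ...and is a permutation of the input (checking via sort is fine
    -- for an executable spec, if a little circular).
    isPermutationOf :: Ord a => [a] -> [a] -> Bool
    isPermutationOf xs ys = sort xs == sort ys

    sortsCorrectly :: Ord a => ([a] -> [a]) -> [a] -> Bool
    sortsCorrectly f xs = ordered (f xs) && f xs `isPermutationOf` xs

    main :: IO ()
    main = print (sortsCorrectly sort [3, 1, 2])  -- True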

Apart from that, functional correctness is probably the most "general purpose" you can get :-)

Of course, non-functional properties are very interesting too, but usually not more important than functional correctness. For example, what is "robustness" if not functional correctness you can actually rely on?

A program may be incorrect

A program may be incorrect but robust, or correct but fragile.

A typical program makes many assumptions - preconditions, well-formed input, that clients complete their protocols, that dependency services are available, et cetera. A 'correct' program only needs to be 'correct' within the domain of these assumed conditions. A 'robust' program would be one that behaves sanely when such conditions are violated - i.e. a robust algorithm won't crash (though it might abort gracefully), won't enter divergent loops, will leave memory and persistent stores in a well-defined state; a robust service will not leak memory or other resources when a client or dependency fails to complete its protocol.
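A small Haskell sketch of the distinction (the function names are illustrative): headUnsafe is correct on its assumed domain (non-empty lists) but fragile outside it, while headRobust stays well-behaved when the assumption is violated.

    -- Correct but fragile: crashes when the precondition fails.
    headUnsafe :: [a] -> a
    headUnsafe (x : _) = x
    headUnsafe []      = error "headUnsafe: empty list"

    -- Robust: degrades gracefully to Nothing.
    headRobust :: [a] -> Maybe a
    headRobust (x : _) = Just x
    headRobust []      = Nothing

    main :: IO ()
    main = print (headRobust ([] :: [Int]))  -- Nothing, no crash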

Of course, you could always clutter your requirements and code with a bunch of robustness assertions. Robustness by itself is trivial to achieve. But we typically also want correctness, modularity, extensibility, debug-ability, performance, and other properties. Since we want a lot of nice properties for almost every program, a little clutter for each property would quickly grow into a huge complexity problem.

In addition to letting us achieve our application or service goals, a duty of a good programming language is to provide us robustness, modularity, performance, and so on even when we aren't thinking about them - i.e. to have 'good defaults' on the path-of-least-resistance. The less we need to specify to achieve common properties, the simpler the requirements and code, which in turn simplifies verification and exploration.

And for sorting a list, we could easily make correctness conditions the same in C as in a purely functional language: produce a new list, don't mutate the old one, assume that the input list is not concurrently mutated. And then our algorithm in C would be 'correct' under the same conditions as the corresponding algorithm in the purely functional language. The difference would lie in the non-functional properties - safety, robustness, complexity, performance, generic reuse.

That's my point: If you

That's my point: if you treat C as a purely functional language, why not choose a purely functional language to begin with? It saves you a lot of work when you actually fire up an interactive theorem prover and do some work in it.

You can prove programs written in Turing-complete languages

A formal semantics does nothing to evade the halting problem or Rice's theorem. It is trivial to provide a formal, operational semantics for many turing-tarpit languages, but doing so wouldn't help us reason about programs without running them.

That's not true; or rather, there must be a misunderstanding here. You can perfectly well reason about programs written in Turing-complete languages. Rice's theorem is about *automatically* deciding a property for *all* programs. That is undecidable, but it certainly does not mean that you cannot prove a property of a given program.

By "mechanized verification" I do not mean that the computer has to do all the proving (which is hard), but that the proof is described in a formal system that the computer can check, as in usual proof assistants. Part of the proofs can probably be automated, but there is necessarily human intervention in some part, at least to define the properties to prove/check.

Since any type-0 language can model any other, it also is difficult to attribute properties to the language itself - i.e. Haskell's purity would not help reasoning about a Whirl program represented in a Haskell monad.

A good semantics for a language does not mean that all programs are easy to reason about; in any language you can write programs that are hard to read, understand, debug, etc. But I would like well-written programs, or at least programs written with verification in mind (and I like to think these are related), to be easy to reason about. And if you want formal reasoning, you need a formal semantics for the language -- or at least for the subset your program uses.

Language Properties

Your earlier position, as I understood it, was: "you should provide a mechanized proof of the good properties of the [language] semantics (static and dynamic)". I argued there is no such thing as good properties of language semantics that hold true in a formal sense (for general-purpose languages). That is: we can always write programs in a manner that undermines whatever good properties the language was attempting to achieve. Thus, I must understand 'good properties of language semantics' in a soft, informal sense - i.e. a language simplifies reasoning about some properties or discourages various buggy patterns that would hinder reasoning, but we cannot mechanically prove 'simplifies' and 'discourages'.

'Good' vs. 'formal' language semantics are two very different things - e.g. one could formalize a Turing tarpit. But I would agree they are related: formal semantics do simplify reasoning about many language properties by eliminating dependencies on a specific implementation. Formal semantics also clarify what 'correct' means for an implementation or optimization.

What 'good' semantics offer is the ability to reason about and readily achieve useful properties in programs on the path-of-least-resistance. The very nature of technological progress is the ability to accomplish more without thought or effort. The properties we don't need to think about are the properties that will exist pervasively across libraries and applications.

Statically checkable properties of programs

That is: we can always write programs in a manner that undermines whatever good properties the language was attempting to achieve.

Yes, but the compiler may statically reject such programs. That is, after all, the whole point of semantic analysis and static type systems. In these cases, it might be argued that the good property in question is a property of the static type system (or of the semantic analysis) and not of the (operational) semantics, but nevertheless any static guarantee about the dynamic behavior of programs is potentially a good thing (TM).

At least IMO--but now we are getting into value judgements and I thought LtU was a nonreligious site :-)
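To make "statically reject" concrete, a small Haskell sketch (the Sanitized type and functions are hypothetical examples): the compiler refuses any program that passes a raw String where the types demand evidence of sanitization, so the bad program never runs.

    -- A toy sanitizer; the point is the type, not the filtering.
    newtype Sanitized = Sanitized String

    sanitize :: String -> Sanitized
    sanitize = Sanitized . filter (/= ';')

    render :: Sanitized -> String
    render (Sanitized s) = s

    -- render "DROP TABLE users;"  -- rejected at compile time: String /= Sanitized
    main :: IO ()
    main = putStrLn (render (sanitize "hello; world"))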

At least Standard ML has

At least Standard ML was developed with a formal semantics guiding its development (not necessarily a mechanized one, but as the inventors of Standard ML were also busy building theorem provers, mechanization was probably on their minds, too). I think the value of this is apparent: just look at Standard ML; it is really beautiful and nice to program in for such an old language.

I found that it helped me a lot (and is still helping me, as Babel-17 is not complete yet) to design Babel-17 by basically mechanizing its formal semantics "in my head". For example, the interplay between exceptions, concurrency, and laziness is designed to yield the clearest formal semantics that occurred to me.
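For a taste of why that interplay needs careful treatment, a minimal sketch in Haskell (standing in for Babel-17 here): under laziness, the semantics must pin down exactly when a diverging value is forced, because that determines where an exception surfaces.

    import Control.Exception (SomeException, evaluate, try)

    main :: IO ()
    main = do
      let x = error "boom" :: Int
      print (length [x])  -- prints 1: length never forces x
      r <- try (evaluate x) :: IO (Either SomeException Int)
      putStrLn (either (("caught: " ++) . show) show r)  -- forcing x raises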

Unreleased languages

I suspect the number would be much, much larger if we could count all the languages that were invented but never released.

Clearly

The paper itself

This is a wonderful paper. I have never seen such a complete and readable exposition and taxonomy of programming languages. I will save this to show people who ask about what kinds of programming languages there are, and how they differ. (Believe me that my delight in this paper was clear way before I got to the end; I don't only like it because he is, as I am, a Common Lisp fan.)

If you want to see another wonderful exposition, a video that is both a history of programming languages and a delightful and funny work of performance art, see "50 in 50" by Guy L. Steele Jr. and Richard Gabriel. See http://lambda-the-ultimate.org/node/3101 and follow the links to the video, or just jump to http://blog.jaoo.dk/2008/11/21/art-and-code-obscure-or-beautiful-code/

Clojure

It's funny: I had a new experience. This article ended with the typical comment about Common Lisp holding Lisp back, but "don't change it"... For the first time, because of Clojure, that struck me as ridiculous. It's not held back anymore. There finally is a new Lisp that offers a huge library and at the same time offers all the features of a Lisp. And not having to feel that frustration was a very nice experience.

So I guess I'll break with LtU tradition and share an emotion rather than a thought.