Human Factors Research On Programming Language Syntax

Greetings All,

In the spirit of Ehud's latest comments, I would like to offer a new approach to the question of syntax. Rather than further splinter the "Let's make a programming language!" discussion, I think it reasonable to fork this topic.

As you may recall from my prior posts, I fall into the "syntax matters" camp. In my research, I am constantly looking at code fragments in a large number of languages, and it is more than enough of a challenge to keep their respective semantics sorted out without the added burden of juggling generally terse notation systems which, across languages, heavily overload the same symbols. At various conferences I see speaker after speaker transliterate slides of their pet notations into a semi-formal dialect of what one might call "CS English", and it is this phenomenon that informs my thinking.

At The Institute for End User Computing, we are working towards the long-term grand challenge goal of helping End Users to realize their full human potential by providing them with a new, legacy-free End User Computing Platform that integrates the best strands of research from all areas of CS and allied disciplines. Following the Newton model, we think it prudent to base the new platform on a new programming language. In this regard we have two, hopefully compatible, goals.

1) Provide a multi-paradigm Interlingua so projects originally developed in C, Scheme, Prolog, Java or whatever can be re-coded and elegantly expressed in a uniform syntax that makes it easy to read and modify them if needed.

2) Support End User Programming, by following the PLT Scheme language level approach, to let End Users slowly wade in deeper as they look at code and learn new concepts. (i.e. the platform should be able to serve as a teaching tool)

We think these goals can be best met with a Quasi Natural Language Syntax that can map multiple surface structures (perhaps one based on CS terms of art and another based on more verbose plain English phrases with corresponding meanings) to a canonical internal s-expression-based AST representation that can be more readily manipulated by facilities like a hygienic macro system. Likewise, we place a premium on programmer usability, which we suspect will greatly enhance software reliability, so we are willing to trade a lot of CPU cycles for a more natural and interactive programming experience. See Inform 7 and its scholarly overview for the closest live example of our general direction, and imagine a similar system targeted at multi-paradigm programming rather than text adventure simulations.
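To make the "multiple surface structures, one canonical AST" idea concrete, here is a minimal sketch in Python. It is only an illustration, not our actual design: the two surface forms (a terse `x := 1 + 2` and a verbose `let x be the sum of 1 and 2`), the `define` node, and the names `parse` and `to_sexpr` are all made up for this example, and a real system would need a proper grammar rather than two regular expressions.

```python
import re

# Both surface forms below are hypothetical, invented for this sketch.
# Terse form, in the style of CS terms of art:  total := 1 + 2
TERSE = re.compile(r"^\s*(\w+)\s*:=\s*(\d+)\s*([+*])\s*(\d+)\s*$")

# Verbose form, closer to plain English:  let total be the sum of 1 and 2
VERBOSE = re.compile(
    r"^\s*let\s+(\w+)\s+be\s+the\s+(sum|product)\s+of\s+(\d+)\s+and\s+(\d+)\s*$",
    re.IGNORECASE,
)

# English operator words mapped to the operator symbol used in the AST.
WORD_TO_OP = {"sum": "+", "product": "*"}


def parse(source):
    """Map either surface form to one canonical s-expression (nested tuples)."""
    m = TERSE.match(source)
    if m:
        name, left, op, right = m.groups()
        return ("define", name, (op, int(left), int(right)))
    m = VERBOSE.match(source)
    if m:
        name, word, left, right = m.groups()
        return ("define", name, (WORD_TO_OP[word.lower()], int(left), int(right)))
    raise ValueError(f"unrecognized syntax: {source!r}")


def to_sexpr(node):
    """Render the canonical form as s-expression text."""
    if isinstance(node, tuple):
        return "(" + " ".join(to_sexpr(child) for child in node) + ")"
    return str(node)


if __name__ == "__main__":
    for line in ("total := 1 + 2", "let total be the sum of 1 and 2"):
        print(to_sexpr(parse(line)))
```

Both input lines produce the same internal tree, so anything downstream (a macro expander, an evaluator, a pretty-printer for the other surface form) only ever has to deal with one representation.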

Regardless of your thoughts on this approach, we don't want to design from the gut or from historical accident if we can find a more grounded approach.

So I suggest that we use this thread to gather any references we can find to actual research studies looking at questions of syntax design. As a community we should also look beyond past research and try to enumerate testable hypotheses and frame possible experiments that could support or refute them.

I want to see what the actual results are if you take groups of programmers and give them a task to express in more than one syntax that maps to the same semantics. What syntax works best for ordinary End Users? How strong a lasting effect, if any, does the notation of one's first programming language have on one's ability to pick up subsequent languages with different notations? Can we design studies to factor out syntax from programming paradigm? At a low level, is it better to start indexing arrays at 0 or 1? Is the answer different for programmers with different backgrounds? At a high level, can we devise experimental designs to show whether syntax has an impact on how long it takes to learn a new language and on whether it contributes to program correctness in a positive or negative way? Is there any correlation between a programmer's subjective opinion of a language's syntax and his or her ability to write correct code in it in the shortest time possible? Does a subjectively hard-to-use syntax lead to fewer or more errors?

Let's get away from arguing based on subjective opinion and try to find some empirical evidence to back up our thinking!

Key questions to consider are the assumptions behind each study, whether it looked at programmers or non-programmers, their level of experience, and which discourse community its test subjects were drawn from (i.e. the programming paradigm under consideration and any historical notations imbued in its literature).

Even if, at the end of the day, we find that syntax is indeed irrelevant, the exercise will be a significant step forward.