Human Factors Research On Programming Language Syntax

Greetings All,

In the spirit of Ehud's latest comments, I would like to offer a new approach to the question of syntax. Rather than further splinter the "Let's make a programming language!" discussion, I think it reasonable to fork this topic.

As you may recall from my prior posts, I fall into the "syntax matters" camp. In my research, I am constantly looking at code fragments from a large number of languages, and it is more than enough of a challenge to keep their respective semantics sorted out without the added burden of juggling generally terse notation systems that heavily overload the same symbols across languages. At various conferences I see speaker after speaker transliterate slides of their pet notations into a semi-formal dialect of what one might call "CS English", and it is this phenomenon that informs my thinking.

At The Institute for End User Computing, we are working towards the long-term grand challenge goal of helping End Users to realize their full human potential by providing them with a new, legacy-free End User Computing Platform that integrates the best strands of research from all areas of CS and allied disciplines. Following the Newton model, we think it prudent to base the new platform on a new programming language. In this regard we have two, hopefully compatible, goals.

1) Provide a multi-paradigm Interlingua so that projects originally developed in C, Scheme, Prolog, Java, or whatever can be re-coded and elegantly expressed in a uniform syntax that makes it easy to read and modify them when needed.

2) Support End User Programming, by following the PLT Scheme language level approach, to let End Users slowly wade in deeper as they look at code and learn new concepts. (i.e. the platform should be able to serve as a teaching tool)

We think these goals can best be met with a quasi-natural-language syntax that maps multiple surface structures (perhaps one based on CS terms of art and another based on more verbose plain-English phrases with corresponding meanings) to a canonical internal s-expression-based AST representation that can be more readily manipulated by facilities like a hygienic macro system. Likewise, we place a premium on programmer usability, which we suspect will greatly enhance software reliability, so we are willing to trade a lot of CPU cycles for a more natural and interactive programming experience. See Inform 7 and its scholarly overview for the closest live example of our general direction, and imagine a similar system targeted at multi-paradigm programming rather than text adventure simulations.
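To make the mapping concrete, here is a minimal sketch, in ordinary Python, of two toy surface forms lowering to one canonical s-expression-style AST. The two tiny grammars and the helper names (parse_terse, parse_verbose) are purely illustrative, not our actual design:

    # Illustrative sketch only: each helper accepts one toy surface form of an
    # assignment and lowers it to the same canonical s-expression-style AST,
    # modeled here as nested tuples.

    def parse_terse(src):
        # "x := a + b"  ->  ("assign", "x", ("+", "a", "b"))
        target, expr = (part.strip() for part in src.split(":="))
        left, op, right = expr.split()
        return ("assign", target, (op, left, right))

    def parse_verbose(src):
        # "let x be the sum of a and b"  ->  the same AST as above
        words = src.split()
        return ("assign", words[1], ("+", words[-3], words[-1]))

    # Two very different surface structures, one internal representation.
    assert parse_terse("x := a + b") == parse_verbose("let x be the sum of a and b")

Once every surface dialect bottoms out in the same canonical tree, downstream facilities such as a hygienic macro expander or a pretty-printer for either dialect only ever have to deal with one representation.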

Regardless of your thoughts on this approach, we don't want to design from the gut or from historical accident if we can find a more grounded approach.

So I suggest that we use this thread to gather any references we can find to actual research studies looking at questions of syntax design. As a community we should also look beyond past research and try to enumerate testable hypotheses and frame possible experiments that could support or refute them.

I want to see what the actual results are if you take groups of programmers and give them a task to express in more than one syntax that maps to the same semantics. What syntax works best for ordinary End Users? How strong a lasting effect, if any, does the notation of one's first programming language have on one's ability to pick up subsequent languages with different notations? Can we design studies that factor out syntax from programming paradigm? At a low level, is it better to start indexing arrays at 0 or 1? Is the answer different for programmers with different backgrounds? At a high level, can we devise experimental designs that show whether syntax has an impact on how long it takes to learn a new language, and whether it contributes to program correctness in a positive or negative way? Is there any correlation between a programmer's subjective opinion of a language's syntax and his or her ability to write correct code in it in the shortest time possible? Does a subjectively hard-to-use syntax lead to fewer or more errors?

Let's get away from arguing based on subjective opinion and try to find some empirical evidence to back up our thinking!

Key questions to consider for each study are the assumptions behind it, whether it looked at programmers or non-programmers, their level of experience, and which discourse community its test subjects were drawn from (i.e. the programming paradigm under consideration and any historical notations embedded in its literature).

Even if, at the end of the day, we find that syntax is indeed irrelevant, the exercise will be a significant step forward.

preattentive visual processing

I posted this message related to this subject in another thread.

Margaret Burnett

Margaret Burnett has a large bibliography on visual languages which might be relevant.

Previous threads of interest

#1, #2.

End Users and Problem Domain Languages

Having to provide solutions for my clients, I need to focus my attention on two areas:

1. Providing a specific Problem Domain Language that allows an end user to develop solutions within the context of their problem domain. There are as many such languages as there are industries. The one thing that I have encountered is the desire on the part of many "users" to have a language for describing their problem without it looking like a computer language. Many have expressed their distaste for having to learn a "computer" language; it appears to them to be too complicated and too hard.

2. Providing a standard translation process to a Solution Domain Language (used by programmers).
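As a rough sketch of this layering (the retail-discount domain and every name below are invented for illustration, not taken from any real client system), the problem-domain vocabulary simply translates into solution-domain calls underneath:

    # Hypothetical example of a Problem Domain Language layered on a Solution
    # Domain Language. The domain (retail discounts) and all names are assumed
    # purely for illustration.

    # Solution-domain layer: what the programmer writes and maintains.
    def apply_discount(price, percent):
        return price * (1 - percent / 100)

    # Problem-domain layer: the vocabulary the end user actually sees.
    DISCOUNT_RULES = {
        "seasonal sale": 20,    # percent off
        "staff purchase": 30,
    }

    def price_under_rule(price, rule_name):
        # Translate the user's phrase into a solution-domain call.
        return apply_discount(price, DISCOUNT_RULES[rule_name])

    print(price_under_rule(50.0, "seasonal sale"))   # 40.0

The end user only ever deals with phrases like "seasonal sale"; the translation into programmer-level code stays hidden, which is the whole point of item 2.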

An area that "typically" works in both domains are the Companies that develop solutions in FORTH. They have pragmatic processes in place to create specific Problem Domain Languages when needed. My experience from the comp.lang.forth newsgroup is that the principals of these companies are quite willing to share their actual experiences in this matter.

They will have particular ideas that won't fit within the framework of LtU, but since you want to look at end user computing, their practical experience is relevant (some of it going back 30 to 35 years).

Peter,

In relation to your comment regarding

constantly looking at code fragments from a large number of languages, and it is more than enough of a challenge to keep their respective semantics sorted out without the added burden of juggling generally terse notation systems that heavily overload the same symbols across languages.

This is a natural and normal problem when trying to understand the terminology of any problem domain. We just have to be prepared for it and be willing to learn. I have worked in the Telecommunications, Postal, Airline, Accounting, Petroleum, Liquor, Retail, and Sales and Marketing industries, and they all have common terminology that means different things in each industry. This is a fact of life.

Developing an end user solution means knowing the language of the end user. The "trick" will be translating this into a common language that is hidden from the end user. Your suggestion of

2) Support End User Programming, by following the PLT Scheme language level approach, to let End Users slowly wade in deeper as they look at code and learn new concepts. (i.e. the platform should be able to serve as a teaching tool)

would only be appropriate for a small number of end users (in my experience of more than 20 years in commercial areas). Provide an appropriate PDL and you'll most likely have more success, and as I said above, this means many languages, each based on the particular industry being addressed.

This would be a good area for research.

End users, natural language

I think it would be valuable to refine the idea of "end user". I've been programming for 25 years, but I'm often an end user -- for instance, I want to customize my email client, but I don't want to download a 100 MB tarball and compile from scratch. My abilities and goals are much different from those of a VP of marketing who knows Excel and wants to tie into our back-office database. I wonder if the only thing that makes us both "end users" is our desire to accomplish a task without dealing with concepts outside the task's domain (concepts like Makefiles and database connect strings).

You might try following the citations in CiteSeer and see what you find.

Re: English-like syntax -- it's hard to find arguments in favor of NL programming. Dijkstra considered it a foolish endeavor. No one really likes AppleScript. The ubiquitous SQL Server 2000 has an English query interface, yet it remains obscure -- why? I don't think NL syntax is as important as appealing to problem domains that the user is familiar with.

Re: Inform 7 -- it's an interesting experiment, and it has definitely inspired me to think about the whole natural-language-programming thing differently. It may be the best example of literate programming ever conceived, in that the code is meant to read like a book and very few comments are needed. However, I think this literary aspect appeals to I.F. authors' particular aesthetic :) On the whole, it's been a very difficult language to learn to use correctly, even after reading the entire manual. There are just so many syntax constructions to remember. Then again, I've been corrupted by BASIC and C.

NL programming

Personally, I need all the help I can get when I program, which is why I prefer languages with clear syntax and rich static semantics ;-) I don't see why end users, who are supposed to be less experienced at programming than I am, should get less support, so I don't find the idea of natural language programming very appealing.

There's another side to this issue, of course, but I don't feel like arguing it at the moment...

visual programming

I recall a comment from Professor Shneiderman to the effect that interfaces focusing on better visual interaction will always be better than interfaces based on audio interaction. Unfortunately, I couldn't find the exact source of this comment. My personal hunch is that instead of relying on text-based languages with a few graphical guides (UML, ER, regex builders), we should experiment more with graphical 'languages' that offer text as a quick way to drop to a lower level (similar to the relationship between modern Windows or OS X and their command-line interfaces).

The following video is an interesting example of a visual 'programming language'; I have noticed that several graphics applications make use of such interfaces (naturally). The next version of Blender is supposed to have a 'node editor', a graphical representation (nodes and links) of the different operations, lighting, and textures applied to 3D models (as far as I know, there are no video presentations of this functionality). The video mentioned above is from an experimental project which is not expected to be part of the next release.

The following video is an

The problem is, those two examples are fairly domain-specific. In fact, I didn't see anything in the first example that really demonstrates anything Turing-complete, so I'd be a bit hesitant to even call it programming.

The reason there aren't any general-purpose visual programming languages is that text is such a compact way to view highly complex structures, such as programs. An advantage of graphical interfaces, though, is that it's relatively straightforward to infuse them with information that the IDE has inferred. When you're dealing with text files, you can't very well start inserting things into the source code, so you have to be a bit more imaginative when it comes to presenting information, e.g. popups, colors, underlining, etc.

higher text graph encoding density

Curtis W: The reason there aren't any general-purpose visual programming languages is that text is such a compact way to view highly complex structures, such as programs.

(I'll let Paul Snively tell you about his favorite general-purpose visual programming language.) I worked on a domain-specific visual programming language in 1993, aimed at Wall Street financial traders, which ran on NeXT boxes. I added a type system that permitted polymorphic OO dispatch. (Users had to add the meta-information showing how the inputs and outputs of one box mapped to another, and then my compiler had to make sure subtyping didn't allow type violations.)

[Edit: it was a functional dataflow system with an OO type system; evaluation was lazy, depending on which outputs the UI tried to fetch values for.]
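In today's terms, the core demand-driven trick looked roughly like this (a from-memory sketch with an invented price/shares example; it is not the actual product code):

    # Sketch of demand-driven dataflow: a Node computes its value only when
    # some output is fetched, pulling on its inputs recursively and caching
    # the result.

    class Node:
        def __init__(self, fn, *inputs):
            self.fn = fn
            self.inputs = inputs
            self._cached = None
            self._valid = False

        def value(self):
            if not self._valid:
                args = [n.value() for n in self.inputs]   # pull lazily on inputs
                self._cached = self.fn(*args)
                self._valid = True
            return self._cached

    def const(v):
        return Node(lambda: v)

    # Toy trader-style graph: total = price * shares.
    price = const(101.5)
    shares = const(200)
    total = Node(lambda p, s: p * s, price, shares)

    print(total.value())   # 20300.0 -- nothing evaluates until this fetch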

After thinking about it for a year, I concluded exactly what you just said: that text is just a more compact way to view the same sorts of structures. The trick is that coders have the ability to see the structures in the text. Most people see programming-language text as gibberish, even in less challenging notations like HTML-style markup.

After a year, we stared at the graphs we were building on this NeXT station and asked ourselves: why is the density so low? Is that the only reason this is so irritating? Why is the text representation of nearly the same bit of logic so much more compact?

Finally I got it, and told them that the semantic information was a graph in both cases, but in the text form the graph was encoded semantically rather than syntactically, whereas a graphical language showed the graph syntactically. And the use of screen real estate was expensive (read: profligate) for visual graphs.
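To see the density gap concretely, compare the same trivial dataflow graph in both encodings (again an invented example, not from the actual system):

    # The text form encodes the graph semantically, in one line; the visual
    # form has to spell it out node by node and wire by wire.

    text_form = "total = price * shares"

    # Roughly what a box-and-wire editor must place and connect on screen:
    nodes = ["price", "shares", "multiply", "total"]
    edges = [("price", "multiply"),
             ("shares", "multiply"),
             ("multiply", "total")]

One line of text versus four boxes and three wires, and the gap only widens as the logic grows.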

Programmers see the graphs of relationships between objects and dataflow. Laymen don't. (Of course, mediocre coders often don't see them either, and write code by cutting and pasting text the way today's post-modern music artists create by sampling and re-recording old music.) Later I took to remonstrating with folks who make things too complex by saying: hey, we're just editing graphs here.

The IDE for a text-based language might be willing to add some of the missing spatial dimensions for graph-seeing-impaired users; they might not realize that this is the intent of the new UI features, but the effect is to let a user experience the semantic graph content more easily despite the text encoding.

Epigram

When you're dealing with text files, you can't very well start inserting things into the source code, so you have to be a bit more imaginative when it comes to presenting information, e.g. popups, colors, underlining, etc.

Have you seen Epigram?
Granted, the current implementation leaves room for improvement, but I think the general idea of a programmer collaborating with, err, the IDE shines through anyway.

I've seen it, but I'm not

I've seen it, but I'm not sure what you mean by "collaborating with the IDE."

On a conceptual level,

On a conceptual level, development in Epigram resembles a Q&A session with the computer asking you "so how do you do that?" and "how do you know that?" as you fill in details.

"...focusing on better

"...focusing on better visual interaction will always be better than interfaces based on audio interaction."

I imagine that would depend on the preferred interaction style of the users, right? A number of people are very strongly audio-centric, both in terms of their learning style and the manner in which they formulate ideas. I imagine a textual interface can be a more natural fit, at least for that subset.