Evaluation and Usability of Programming Languages and Tools (PLATEAU)

Found via Wadler's blog: PLATEAU is calling for papers on the HCI side of PL theory, design, and tooling.

Programming languages exist to enable programmers to develop software effectively. But how efficiently programmers can write software depends on the usability of the languages and tools that they develop with. The aim of this workshop is to discuss methods, metrics and techniques for evaluating the usability of languages and language tools. The supposed benefits of such languages and tools cover a large space, including making programs easier to read, write, and maintain; allowing programmers to write more flexible and powerful programs; and restricting programs to make them more safe and secure.

Level of language in relation to the machine, and its usability

I think it is accepted as a given that the higher-level the language, the more concisely the programmer can express his intent (except for VB, where you can only express drunken mumblings...). It would be interesting to compare languages at the same level with respect to usability:

  • Can we improve on C or assembly to make them more usable without transcending levels? Do we need to?
  • Or maybe all we need is better optimizing compilers and transformers, and low-level languages should be left as evolutionary dead ends that serve only as compilation targets for higher levels?

Conciseness

I think it is accepted as a given that the higher-level the language, the more concisely the programmer can express his intent.

True (with some caveats), but unfortunately the relationship between conciseness and usability is far from clear, particularly given the range of programmer skill.

I'm not so sure. A popular

I'm not so sure.

A popular research direction is to enable computations over types. To be useful or meaningful, this would preclude abstractions that violate those types (...unless there's a gradually but dependently typed language out there?). Is supporting operations over types worth forcing programs to type-check (beyond Dynamic)? Perhaps macros and eval are higher-level, so my argument is backwards.

Maximizing Productivity

Maximizing productivity means maximizing the following:

* Feedback cycle speeds (error reporting, program correction, coding, debug cycles, compilation vs. interpretation during development)
* Automation of programming tasks (macros, "abstract syntax editors", program transformation provided by IDEs, type systems as built-in DSLs)
* Leveraging of program code (reusability, expression problem, code libraries vs. enmeshed implementation)
* Comprehension of program code (linear execution semantics & declarative code vs. nonlinear, dynamic execution semantics & imperative code; context-dependent semantics vs. context-free)

Like anything in life, productivity can be reasoned about mathematically and formally, if you can step outside the box of clinging to "this" or "that" language idea, "this" or "that" language design. If you're stuck in the tar pit, you can't see much farther than the walls surrounding you.
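
For instance, a toy model of the factors above, with every parameter hypothetical rather than measured, already lets us compare two toolchains quantitatively:

```python
# A toy productivity model (all parameters are hypothetical illustrations,
# not measurements): expected time per change = up-front comprehension cost
# plus the number of feedback iterations times the cost of one cycle.

def time_per_change(iterations, edit_s, compile_s, test_s, comprehension_s):
    """Estimated seconds to complete one change under a simple linear model."""
    return comprehension_s + iterations * (edit_s + compile_s + test_s)

# A slow batch toolchain vs. a fast incremental one that invites more,
# cheaper iterations:
batch = time_per_change(iterations=5, edit_s=60, compile_s=120, test_s=30,
                        comprehension_s=600)
incremental = time_per_change(iterations=8, edit_s=60, compile_s=5, test_s=30,
                              comprehension_s=600)
print(batch, incremental)  # 1650 1360: feedback latency dominates here
```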

Please Elaborate

  • Comprehension of program code (linear execution semantics & declarative code vs. nonlinear, dynamic execution semantics & imperative code; context-dependent semantics vs. context-free)

Like anything in life, productivity is something that can be reasoned about mathematically & formally

I'm curious. How is it you plan to mathematically and formally reason about comprehension of program code?

Very funny,

I didn't mean that you should take things that far. The point is that you can reason about what works and what doesn't. Productivity is an engineering problem.

In jest, however, I've thought about your question.

To measure the comprehensibility of code:
1. measure the work it takes to interpret it, maybe even the complexity of the interpreter required to interpret it without error (since this can sometimes determine how the programmer is required to interpret the code).
2. measure the work it takes to extract the meaning of the code

Another way is to measure:
a. the number of conscious steps taken by the reader to interpret it.
b. the number of variables requiring conscious maintenance to interpret it.

To measure the work it takes to extract meaning from the code, we consider the "semantic linearity" of routine calls (defined as how much of the implementation of a routine we must understand to understand the invocation of it). This includes factors such as the routine's name, how well it maps to a concept, whether it has unintuitive side-effects, etc.
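
As a minimal sketch (the proxies are mine and are not validated cognitive measures), metrics (a) and (b) are at least mechanically computable:

```python
# A crude, mechanical sketch of measures (a) and (b), using Python's ast
# module. Assumptions: "conscious steps" is proxied by the number of AST
# nodes visited, and "variables requiring conscious maintenance" by the
# number of distinct names the reader must track.
import ast

def comprehension_metrics(source: str) -> dict:
    tree = ast.parse(source)
    steps = sum(1 for _ in ast.walk(tree))        # proxy for (a)
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return {"steps": steps, "tracked_names": len(names)}  # proxy for (b)

print(comprehension_metrics("total = sum(x * x for x in values if x > 0)"))
```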

Measurements

1. measure the work it takes to interpret it, maybe even the complexity of the interpreter required to interpret it

OK, that's probably measurable, but you'd be measuring the wrong thing: a language can be very simple to parse but tough for a human to read. Brainfuck comes to mind. (Was that 15 pluses or 16?)
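
Indeed, a complete Brainfuck interpreter is as short as the sketch below, which shows how badly interpreter complexity can mispredict human readability:

```python
# A complete Brainfuck interpreter in ~20 lines: the language is trivial
# for a machine to interpret, yet notoriously hard for a human to read.
def brainfuck(code: str, data_in: str = "") -> str:
    tape, ptr, pc, out, inp = [0] * 30000, 0, 0, [], list(data_in)
    jumps, stack = {}, []
    for i, c in enumerate(code):                  # pre-match brackets
        if c == "[": stack.append(i)
        elif c == "]": jumps[i] = stack.pop(); jumps[jumps[i]] = i
    while pc < len(code):
        c = code[pc]
        if c == ">": ptr += 1
        elif c == "<": ptr -= 1
        elif c == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".": out.append(chr(tape[ptr]))
        elif c == ",": tape[ptr] = ord(inp.pop(0)) if inp else 0
        elif c == "[" and tape[ptr] == 0: pc = jumps[pc]
        elif c == "]" and tape[ptr] != 0: pc = jumps[pc]
        pc += 1
    return "".join(out)

print(brainfuck("++++++++[>++++++++<-]>+."))  # prints "A" (8 * 8 + 1 = 65)
```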

2. measure the work it takes to extract the meaning of the code

Can you measure that? What units would you use? :)

a. the number of concious steps taken by the reader to interpret it.

And this is even harder to define.

I think you'd need to go for a much less detailed measuring strategy. Just get two groups of programmers, give one group a program written idiomatically in one language, and the second group the same program written idiomatically in another language, and measure how long it takes them to understand the programs enough to add a simple new feature.

Of course, you'd have to control for various things, like their level of experience with their language; and I can imagine debates about exactly what is "the idiomatic way" of writing something in a given language...
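
As a sketch of how the resulting data might be analyzed (the completion times are hypothetical), a permutation test compares the groups without assuming normally distributed times, an assumption small programmer samples rarely justify:

```python
# Comparing two groups' task-completion times (minutes, hypothetical data)
# with a one-sided permutation test: no normality assumption needed.
import random

random.seed(0)
group_a = [34, 41, 29, 52, 38, 45]   # language A, hypothetical times
group_b = [48, 55, 39, 61, 50, 57]   # language B, hypothetical times

observed = sum(group_b) / len(group_b) - sum(group_a) / len(group_a)
pooled = group_a + group_b
count, trials = 0, 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = (sum(pooled[len(group_a):]) / len(group_b)
            - sum(pooled[:len(group_a)]) / len(group_a))
    if diff >= observed:
        count += 1
print(f"mean difference: {observed:.1f} min, one-sided p ~= {count / trials:.3f}")
```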

Controlling for experience

Of course, you'd have to control for various things, like their level of experience with their language; and I can imagine debates about exactly what is "the idiomatic way" of writing something in a given language...

It seems that controlling for the level of experience that people have with two languages is only slightly more tractable than "measuring the [mental] work it takes to extract the meaning of the code." ;-)

Take languages such as Scheme, ML, or Haskell, which in some sense were discovered more than they were invented. Get in a time machine and go back to shortly after they were created. Most people working with these (then new) languages still had a lot to learn about them. It takes time to develop a sense of what good idiomatic code really is.

Also... how do you control for familiarity with a given algorithm or type of task being accomplished? I'm guessing this will be comparable in importance to your kind of experiment as familiarity with the language itself.

Clearly, from a human programmer's point of view, some languages are far more usable than others. How to reasonably and rigorously measure this isn't clear to me. Is relying on the opinions of programmers with a good sense of language the best we've got?

(e.g. Joe Armstrong did quite a few usability studies in the process of creating Erlang; but I don't doubt he projected some of his own biases to a substantial degree onto the studies and the language...)

Uh?

>Like anything in life, productivity is something that can be reasoned about mathematically & formally

Except that 'real' productivity also depends on programmer training, language syntax, and so on:
things that are not so easy to reason about mathematically with concrete numbers.

You can still model those things,

measure them, and discover what works and what doesn't.

IMNSHO, language syntax is pretty much a non-issue if you have a good IDE.

For these kinds of things, we simply model the learning curve.

Also, these kinds of things are better addressed with empirical data. We don't have to capture every minute detail exactly; the point is to discover the fundamentally sound facts that illuminate what works and what doesn't.
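
For example, here is a minimal sketch of "modeling the learning curve" by fitting the classic power law of practice, time(n) = a * n^(-b), to hypothetical per-attempt times:

```python
# Fit the power law of practice, time(n) = a * n**(-b). The observations
# below are hypothetical; with real data we log-transform both axes and
# fit a line, since log(time) = log(a) - b * log(n).
import math

attempts = [1, 2, 3, 4, 5, 6, 7, 8]           # nth attempt at the task
times = [120, 95, 80, 72, 66, 62, 58, 55]     # completion time in seconds

xs = [math.log(n) for n in attempts]
ys = [math.log(t) for t in times]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
a, b = math.exp(my - slope * mx), -slope
print(f"time(n) ~= {a:.0f} * n^(-{b:.2f})")   # b says how quickly users improve
```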

Can you back up any of what

Can you back up any of what you're claiming?

Back up with what?

If you mean with someone else's research, that argument doesn't hold: someone always has to be the first to try something. What is that person going to do, seeing as he has no predecessor?

If you mean with more reasoning, I must say that most of this is obvious, or will one day turn out to be obvious in retrospect.

It's called "just do it." Necessity is the mother of invention; anything observable and measurable can be mathematically modeled.

What you say does not make sense. Perhaps you could elaborate on what you mean.

Mmm

>IMNSHO, language syntax is pretty much a non-issue if you have a good IDE.

Of course, a 'good' IDE will make Perl programs easy to maintain [sarcasm].

A sufficiently good IDE is

A sufficiently good IDE is indistinguishable from magic.

Usability <> productivity

As some other people noted, productivity is meaningful only when the user is trained.

Before getting into the details, why not look at the definition of usability. According to Wikipedia:

* Learnability: How easy is it for users to accomplish basic tasks the first time they encounter the design?
* Efficiency: Once users have learned the design, how quickly can they perform tasks?
* Memorability: When users return to the design after a period of not using it, how easily can they re-establish proficiency?
* Errors: How many errors do users make, how severe are these errors, and how easily can they recover from the errors?
* Satisfaction: How pleasant is it to use the design?

It also states that usability is a qualitative attribute.

Effectiveness

Usability requires that one can accomplish the right tasks (ideally on the first attempt, but that's efficiency).

Chicken-and-egg

I don't see any way to advance usability into the mainstream of academia, thus making it an important problem many academics will focus on, without a textbook or "lecture notes" book that pieces together an ensemble of the state of the art. That task is immensely time-consuming, but it is what I am doing now. Usability is a combinatorially explosive features problem.

Nice comments from this thread, though, thanks to Winheim Raulsh and David Barbour.

Most top departments have

Most top departments have folks that work on usability and programming languages. E.g., read about work from the groups led by Brad Myers (CMU), Rob Miller (MIT), Scott Klemmer (Stanford) and now Bjoern Hartmann (Berkeley -- his work on rapid prototyping is of interest for those in this thread and the frp/live programming groups). They're really in the CHI/UIST communities: I follow their particular work because I like their approaches, but you'll find similarly intentioned work by others and with other approaches (e.g., FSE/ICSE, but also more far out -- e.g., learning research in cog sci and I even remember some stuff in the neural net community).

Now why the LtU favorites of PLDI/POPL/ICFP (largely) ignore this style of work... that's another topic. This thread reinforces the idea that the mainstream PL community has siloed itself away from the usability one, so this knowledge isn't making it over, nor is it being endorsed for examination by the core community of those who write the guts relevant to 'mainstream' languages.

Now why LtU favorites of

Now why LtU favorites of PLDI/POPL/ICFP (largely) ignore this style of work ...

I will leave aside the conferences, but as regards why this type of work doesn't regularly appear on LtU my guess would be that none of the contributing editors follows this work. The interesting question is why no one who follows this work wants to volunteer to be a contributing editor...

Touche ;-) What would the

Touche ;-) What would the mechanics of that involve?

The mechanics are simple:

The mechanics are simple: you email me saying you want to become a C-E. I upgrade the account. From then on, you can post to the home page when you get the urge.

A hunch

I think Leo and I would both be willing to contribute, but don't expect a flood of materials (at least from me). I'm very conservative in what papers I save after reading.

People are interested, the

People are interested; the work just has to be rigorous, and not an application of other fuzzy, soft methods (e.g., Green's cognitive dimensions of notations won't get an author very far as proper evidence). Then there are dogma and informal principles (e.g., see Ruby's design document), which are really hard to evaluate scientifically. If someone could do it, that would be a breakthrough-style contribution.

I'm not sure the work in CHI/UIST is really up to PL standards. These papers tend to ignore PL as a field in its own right and lump it into CHI, which leaves me unsatisfied as a PL researcher.

people who live in glass houses

If a CHI paper proposes that a system makes a task easier to accomplish, there is an expectation of a corresponding user study contrasting the task with and without the approach. In contrast, in PL, casting something as tractable with respect to analysis, or lowering the false-positive count, is considered sufficient. One of the most informative parts of a paper should be the evaluation: what worked, what didn't, and how it fit into the bigger picture. Not including an analysis of a person actually using the language for its intended task is suspicious.

People who live in glass houses...

I think it's about time

I think it's about time someone came out with some research with a good PL-oriented user study. However, I have very little idea of the right strategy for doing such a study. The CHI studies I've seen are very user-oriented, and I'm not sure they would work well for programmers (vs. end users). How do we measure that PL method A is better than PL method B? One way would be to sit users down and ask them to solve problems using each method, but the sample would have to be large to get any reasonable results, and if one method is established, there will be significant bias in the results. The only strategy I can think of is to somehow break the methods up into more fundamental components that can be evaluated without much bias.
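
To see just how large "large" is, here is a back-of-the-envelope power calculation using the standard normal-approximation formula for a two-group comparison; the effect sizes are illustrative:

```python
# Required sample size per group for a two-group comparison:
# n ~ 2 * ((z_{1-alpha/2} + z_{1-beta}) / d)^2, where d is the
# standardized effect size (Cohen's d).
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

for d in (0.8, 0.5, 0.3):          # large, medium, small effects
    print(f"d = {d}: ~{n_per_group(d)} per group")   # d = 0.3 needs ~175 each
```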

I've published papers where a user study was definitely in order. However, given the challenge of, and lack of knowledge about, doing such a study, I've had to come up with other metrics instead, and I've never been very satisfied with those sections.

Another issue

Another problem that I've had with this type of work is its tendency to focus on novice users, or at least on users who are novices with the system under consideration. It's nice to study which types of systems are easy to learn or good for new users, but I'm much more interested in optimizing for the expert.

This phenomenon is well-known in the world of programming language design. Given a substantial innovation in language design, it may take years of community experience before common idioms and usage patterns are established (even for the designers). I don't know of any way to study this phenomenon in a controlled way, but punting by restricting attention to novices is not a satisfying answer.

Of course it's possible to argue that the system that's easiest for the beginner is also most productive for the expert, but without evidence that's just a matter of ideology.

This is the single biggest problem I've had with usability research as applied to PLs. That said, I'm not deeply familiar with the literature. Obviously this problem occurs in many areas: usability studies would not likely lead to the development of the violin, for instance. So I wonder how the usability community deals with this in general?

I like the idea of focusing

I like the idea of focusing on experts, or at least non-beginners. My criticism was that most of the HCI focus was on end users in PL (i.e., end user programming).

Perhaps language design is more of an art than a science, meaning it relies more on creativity and experience than on well-defined principles. In that case a controlled study might not be possible, and much of what passes through POPL/PLDI/OOPSLA/ICFP will have little impact, since innovative language ideas cannot be presented there without some kind of evidence. Perhaps the role of a PL conference with respect to language design is more reflective, examining PL features that have already become popular (for whatever reason).

So the best place to vet new PL design ideas is...in an implemented language. Then let the crowd decide?

E.g., read about work from

E.g., read about work from the groups led by Brad Myers (CMU), Rob Miller (MIT), Scott Klemmer (Stanford) and now Bjoern Hartmann (Berkeley -- his work on rapid prototyping is of interest for those in this thread and the frp/live programming groups).

Gina Venolia's work at MS is also pertinent. Of the names you've mentioned, Myers has perhaps been working at this the longest. Venolia, btw, is the MS researcher who proposed that MS convert Outlook into what Google is now calling its Gmail killer: Wave. (She proposed this in 2003, way before Wave ever came about. It even reached the front page of Slashdot. Yet, IMHO, judging by MS-employed bloggers, most at MS were dumbfounded by Wave when they first saw it, and struggled to understand how it was any different from Live Mesh.)

I think MS and Hewlett-Packard will probably be the two best places to look for guidance in usability research going forward. HP is facing more pressure on its automated-testing product line thanks to Micro Focus's recent acquisition of Borland.