Staking Claims: A History of Programming Language Design Claims and Evidence

Interesting paper I found on usability and PL, abstract:

While still a relatively young field, computer science has a vast body of knowledge in the domain of programming languages. When a new language is introduced, its designers make claims which distinguish their language from previous languages. However, it often feels like language designers do not feel a pressing need to back these claims with evidence beyond personal anecdotes. Peer reviewers are likely to agree.

In this paper, we present preliminary work which revisits the history of such claims by examining a number of language design papers which span the history of programming language development. We focus on the issue of claim-evidence correspondence, or determining how often claims are or are not backed by evidence. These preliminary results confirm that unsupported claims have been around since the inception of higher level programming in the 1950s. We stake a position that this behavior is unacceptable for the health of the research community. We should be more aware of valiant and effective efforts for supplying evidence to support language design claims.

I found this paper because I've been trying to answer the following question: is user testing performed on PL designs beyond the end-user kind? If yes, how is it done? If no, are we just lazy, or is user testing fundamentally inappropriate for PL design given the learning curves involved?


Author and date

Some additional information: the author is Shane Markstrum and the paper was published at the PLATEAU 2010 conference.

PLATEAU is a recent conference on "Evaluation and Usability of Programming Languages and Tools". It is, as might be expected, right on the spot for LtU, and has been mentioned here.

In the 2010 proceedings, the paper Hard-to-answer questions about code (PDF) also piqued my interest.

How quickly I forget LtU

How quickly I forget LtU threads! Just goes to show that papers are more valuable when you need them than when you are just browsing.

Somewhat unrealistic

It seems to me that it is somewhat unrealistic to expect first papers or, worse, press releases (as used for C++ and Java) to contain measured user experience data.

It's chicken and egg: until the first paper is published to introduce the language and encourage a user community, the only users are likely to have been the developers and their immediate colleagues.

It would be better to examine and expect such studies after some experience has been gained in the idioms of the particular language.

But I suspect that those studies haven't been done either :-)

Which is something for the PL community to consider.

This is really hard

... user testing fundamentally inappropriate for PL design given the learning curves involved?

User testing is an extremely good idea, but statistical testing of these user tests is generally valueless for programming language design.

You put your finger on the reason precisely: there are big learning effects, which puts you on the horns of a fatal dilemma. Attempt to model the learning process, and the parameter space blows up so much that statistical tests will always be inconclusive. Fail to model the learning process, and you get conclusive results which are worthless due to model mis-specification.[*]

IMO, a better approach is to conduct the tests, but to use exploratory data analysis techniques to develop a better qualitative understanding of what's going on. It's hard to publish stuff like that at traditional software engineering venues, though. It's counter to the methodology of software engineering research, which says that facts must have passed traditional hypothesis testing.[**]

[*] Exception: there are interesting specific questions which do not require modeling the learning process. A good recent example of this is Marceau, Fisler and Krishnamurthi's SIGCSE paper "Measuring the Effectiveness of Error Messages Designed for Novice Programmers".

[**] This has caused me difficulties in the past, when I've tried to publish in SE venues -- things my home communities (verification & PL) take to be uncontroversial facts (like higher-order imperative programming being hard) aren't facts according to the standards of SE research. I think Weltanschauung barriers like this are the hardest part of interdisciplinary work.

On McCarthy's note

When McCarthy wrote "this notation is writable and somewhat readable" he didn't mean "particularly readable" as Markstrum understood; on the contrary, McCarthy believed that it is possible, but not easy, to read S-expressions. That's why he wrote "it can be made easier to read and write at the cost of making its structure less regular."

Agree with Kazimir

Yeah, I thought so too. (That's why I registered yesterday...)

Markstrum's phrasing in the

It may be related that Markstrum's phrasing in the subsequent paragraph does not obviously suggest awareness of the difference between the S-expression notation described in the 1960 paper and that of modern Lisps.

good goal, but just the tip of the iceberg

I applaud the goal in this paper. PL designers shouldn't be able to just claim whatever they like without evidence. However, the analysis in the paper doesn't seem so good.

One thing missing is that I know of the claims that many languages are intended to support, and I largely don't see those being analyzed in this paper. I don't believe that all "golden age" languages were only motivated by being "natural" and "easy to read", but those are the only examples given. History is being, ahem, highly simplified in this paper.

I do agree that the arguments about "natural" and "easy to read" languages deserve to be lambasted. Nobody agrees on what "natural" means, to begin with. Additionally, neither of these things seems like an especially important goal. It's going to take a while to really learn to use any language, and reducing that time just doesn't seem especially important.

Not all golden-age designers limited themselves to such arguments, however. For example, Alan Kay carefully argued that some problems are better solved if (a) you have control abstraction, such as the lambda in functional languages, and (b) you think about transforming data structures rather than mutating them in place. As evidence, he submits two lines of argument. First, he walks through the kind of thinking you do with each style. Second, he points out an instance where he asked one of the best imperative programmers in the world to solve the problem imperatively, and the guy got really stumped.

This leads to the main thing I wish the author would look into. Good claims about programming language expressiveness are usually not based on user data. I would call that market research, and while it's important, it's unlikely we will see a lot of PL experts who are also good at marketing research. If the hope is that PL people get better at marketing, I sort of agree, but it would be nice to see some positive pointers on how to make this happen.

Better, though, would be to start with the sorts of evidence PL designers currently put forward. The best ones I've seen involve describing a problem and then proposing a solution to it. Minimal effort is put into justifying the problem itself; that's the market research part that is usually weak. A great deal of effort, however, is sometimes put into showing that a given solution solves that problem.

As an example, the paper describes definition- versus use-side variance and complains that there's no evidence that use-side variance is a problem developers are facing in practice. Fair enough, but there are certainly arguments being made and evidence submitted that definition-side variance removes cognitive load from developers. For example, here's a claim Martin Odersky makes to defend definition-side variance:

"Variance is something that is essential when you combine generics and subtyping, but it's also complex. There's no way to make this completely trivial. The thing we do better than Java is that we let you do it once in the libraries, so that the users don't have to see or deal with it."

It's possible to break down this claim: go examine just how much of the work ends up being done by library writers versus users. As a hint, in Java it's zero, so it seems impossible to do worse.
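To make the definition-side idea concrete, here is a small sketch. It uses Python's typing module rather than Scala or Java, which is a substitution on my part, and the names ReadOnlyBox, Animal, Cat and describe are hypothetical. The variance is declared once, on the type variable the library class uses; the user-side function needs no wildcard or other variance annotation. Python does not enforce any of this at run time, so the declaration only matters to a static checker such as mypy.

```python
from typing import Generic, TypeVar

# Library side: variance is stated once, on the type variable.
T_co = TypeVar("T_co", covariant=True)


class ReadOnlyBox(Generic[T_co]):
    """A hypothetical read-only container, covariant in its element type."""

    def __init__(self, item: T_co) -> None:
        self._item = item

    def get(self) -> T_co:
        return self._item


class Animal:
    pass


class Cat(Animal):
    pass


# User side: no variance annotation appears here. A static checker should
# accept a ReadOnlyBox[Cat] where a ReadOnlyBox[Animal] is expected, because
# the library declared the box covariant.
def describe(box: ReadOnlyBox[Animal]) -> str:
    return type(box.get()).__name__


print(describe(ReadOnlyBox(Cat())))
```

With use-side variance, the equivalent of describe would have to carry the annotation itself at every such signature, which is the cost the Odersky quote is pointing at.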

I'll stop there. Again, it's an excellent topic. How do we really know when a language is good at what it's claimed to be good at?

This leads to the main

This leads to the main thing I wish the author would look into. Good claims about programming language expressiveness are usually not based on user data. I would call that market research, and while it's important, it's unlikely we will see a lot of PL experts who are also good at marketing research. If the hope is that PL people get better at marketing, I sort of agree, but it would be nice to see some positive pointers on how to make this happen.

In HCI, they try to separate market research from scientific evidence. Instead, you are supposed to come up with quantitative or at least rigorous qualitative tests to validate parts of your arguments. The easiest to test empirically are claims of "intuitiveness:" put users in front of the tool and see if the tool is really intuitive, measure their performance via time and output.

Tools with higher learning curves are harder to measure quantitatively, as they rely on expert knowledge that is hard to acquire and that varies widely among those who have acquired it. Instead of doing a test, you could try to do an "expert survey" where tool use is not involved directly. But this is hardly evidence for a scientific paper, perhaps more useful in guiding design.

As PL researchers, we are pretty far away from providing the evidence that HCI researchers provide for conferences like CHI or UIST. However, an HCI researcher once told me that it was easy to game experiments: tell me the result you want, and I can design a test that delivers that result. So we should have some healthy cynicism when looking at that field for inspiration. The ultimate gauge of PL success is adoption, which usually comes too late for use as evidence in a publication AND can be influenced by other factors beyond usability and technical merits.

The easiest to test

The easiest to test empirically are claims of "intuitiveness:" put users in front of the tool and see if the tool is really intuitive, measure their performance via time and output.

I forgot where I read this, but 'intuitive' should raise alarm bells; neelk remarked on something similar as well. Intuitive often correlates with similar to what is already done. What is already done may be due to network effects etc., which isn't what we want to measure, and adds an additional barrier to radically different designs.

This is not to say that what is intuitive to people cannot be designed for. We know that (particular forms of) sequential reasoning are cognitively natural; direct style is thus desirable. When actually testing, we can stratify the sample to help control for bias, etc.

The ultimate gauge of PL success is adoption, which usually comes too late for use as evidence in a publication AND can be influenced by other factors beyond usability and technical merits.

As you said in another thread, then we should all be aiming for PHP :) I view adoption as a distinct design goal, and, in addition, something that should influence the *design methodology* -- as for measures of particular features, it seems too noisy.

I'm increasingly less and less believing in the need for rigorous user testing: when looking at a feature, I believe analytic arguments are sufficient for ground level design, and unless a surprise happens in basic user testing to highlight that something noisy is happening, looking deeper isn't worth the effort. Perhaps I'd feel differently for radically unusual designs or domains, but that seems more like the exception. More fruitful is probably going in the other direction: instead of testing a language feature, test a usability principle for relevance. This would also help answer the question of, for other features, what to look out for.

I'm increasingly less and

I'm increasingly less and less believing in the need for rigorous user testing: when looking at a feature, I believe analytic arguments are sufficient for ground level design

I'm not entirely convinced either. Engineers are expected to learn plenty of analytical techniques which are not "intuitive" or "natural", but no one criticizes these techniques on this basis. Engineers are just expected to learn them because the techniques are powerful tools for solving problems safely.

Programming languages are tools as well, albeit very general tools. Natural intuitiveness would be nice to have, but analytical power is more important.

You suggest PL features need

You suggest PL features need not be (very) usable if they provide analytic benefits -- I agree to an extent (there is a spectrum of needs). However, my point was more that, when concerned with human factors, usability etc. can be driven by analytic arguments (e.g., principles that have already been established).

"Usability" might exclude benefits

I agree about analytic arguments.

A user might claim to be more productive initially, but be hampered in the long run. This brings to mind this blog post:

There is another belief that goes deeper, and it is the reason that after decades of existence and millions of newbie-suffering-hours, the learning curve has not become any easier, or gone away. That belief is: the learning curve has value, it is essential for learning, and it needs to be preserved, not whittled away in the name of “ease-of-use.”

The article goes on to quote some research that backs this belief.

Even a long usability study might misleadingly show a programmer more productive or able to solve certain problems more elegantly, but not show the loss of generality or the cognitive advantages inherent in using another approach.

Source for Alan Kay story

Lexspoon, can you give a source for your Alan Kay story?

The Early History of Smalltalk paper, clearly... favorite Computer Science paper of all time.

Below my (a) and (b) line up with Lex Spoon's (a) and (b).

(a) Too many amazing quotes to list here about LISP, FEXPRs, "What seemed to be needed was complete control over what was passed in a message send; in particular when and in what environment did expressions get evaluated?", and making the non-functional "special forms" in LISP first-class. But this is probably the most revealing:

“Object-oriented” Style
This is probably a good place to comment on the difference between what we thought of as OOP-style and the superficial encapsulation called “abstract data types” that was just starting to be investigated in academic circles. Our early “LISP-pair” definition is an example of an abstract data type because it preserves the “field access” and “field rebinding” that is the hallmark of a data structure. Considerable work in the 60s was concerned with generalizing such structures [DSP *]. The “official” computer science world started to regard Simula as a possible vehicle for defining abstract data types (even by one of its inventors [Dahl 1970]), and it formed much of the later backbone of ADA. This led to the ubiquitous stack data-type example in hundreds of papers. To put it mildly, we were quite amazed at this, since to us, what Simula had whispered was something much stronger than simply reimplementing a weak and ad-hoc idea. What I got from Simula was that you could now replace bindings and assignment with goals. The last thing you wanted any programmer to do is mess with internal state even if presented figuratively. Instead, the objects should be presented as site of higher level behaviors more appropriate for use as dynamic components.

Even the way we taught children (cf. ahead) reflected this way of looking at objects. Not too surprisingly this approach has considerable bearing on the ease of programming, the size of the code needed, the integrity of the design, etc. It is unfortunate that much of what is called “object-oriented programming” today is simply old style programming with fancier constructs. Many programs are loaded with “assignment-style” operations now done by more expensive attached procedures.

Where does the special efficiency of object-oriented design come from? This is a good question given that it can be viewed as a slightly different way to apply procedures to data-structures. Part of the effect comes from a much clearer way to represent a complex system. Here, the constraints are as useful as the generalities. Four techniques used together—persistent state, polymorphism, instantiation, and methods-as-goals for the object—account for much of the power. None of these require an “object-oriented language” to be employed—ALGOL 60 can almost be turned to this style—and OOPL merely focuses the designer’s mind in a particular fruitful direction. However, doing encapsulation right is a commitment not just to abstraction of state, but to eliminate state oriented metaphors from programming.

Perhaps the most important principle—again derived from operating system architectures— is that when you give someone a structure, rarely do you want them to have unlimited privileges with it. Just doing type-matching isn’t even close to what’s needed. Nor is it terribly useful to have some objects protected and others not. Make them all first class citizens and protect all.


One little incident of LISP beauty happened when Allen Newell visited with his theory of hierarchical thinking and was challenged to prove it. He was given a programming problem to solve while the protocol was collected. The problem was: given a list of items, produce a list consisting of all of the odd indexed items followed by all of the even indexed items. Newell’s internal programming language resembled IPL-V in which pointers are manipulated explicitly, and he got into quite a struggle to do the program. In 2 seconds I wrote down:

    oddsEvens(x) = append(odds(x), evens(x))

the statement of the problem in Landin’s LISP syntax—and also the first part of the solution. Then a few seconds later:

     where odds(x) = if null(x) v null(tl(x)) then x
                        else hd(x) & odds(ttl(x))
          evens(x) = if null(x) v null(tl(x)) then nil
                        else odds(tl(x))

This characteristic of writing down many solutions in declarative form and have them also be the programs is part of the appeal and beauty of this kind of language. Watching a famous guy much smarter than I struggle for more than 30 minutes to not quite solve the problem his way (there was a bug) made quite an impression. It brought home to me once again that “point of view is worth 80 IQ points.” I wasn’t smarter but I had a much better internal thinking tool to amplify my abilities. This incident and others like it made paramount that any tool for children should have great thinking patterns and deep beauty “built-in.”
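For anyone who wants to run the quoted definition, here is a rough Python transliteration. The function names and the use of Python lists are my own choices, not Kay's; it is a sketch of the same recursive definitions, with hd/tl replaced by indexing and slicing.

```python
def odds(x):
    """Items at positions 1, 3, 5, ... (counting from 1), as in the quoted odds."""
    # Mirrors: if null(x) v null(tl(x)) then x
    if len(x) < 2:
        return list(x)
    # Mirrors: hd(x) & odds(ttl(x)), where ttl(x) is tl(tl(x))
    return [x[0]] + odds(x[2:])


def evens(x):
    """Items at positions 2, 4, 6, ... (counting from 1), as in the quoted evens."""
    # Mirrors: if null(x) v null(tl(x)) then nil
    if len(x) < 2:
        return []
    # Mirrors: odds(tl(x))
    return odds(x[1:])


def odds_evens(x):
    """All odd-indexed items followed by all even-indexed items."""
    return odds(x) + evens(x)


print(odds_evens([1, 2, 3, 4, 5]))  # [1, 3, 5, 2, 4]
```

The point of the story survives the translation: the program is little more than the problem statement written down.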

I disagree with much

I disagree with much of what he says in part (a). ADTs are not "superficial" in terms of encapsulation, nor do they have anything to do with "field rebinding", or state, or "assignment-style".

"Replace bindings and assignment with goals"? Simply putting state changes behind some higher-order abstraction doesn't eliminate them, it is just sweeping them under the rug most of the time.

Stateful vs stateless and object-oriented vs ADT-based programming are two completely separate axes, and he muddles them. I even think his argument is backwards: in fact, OO often makes it too convenient to hide away state, so people keep sneaking in more of it more frequently.

I thought he was railing against JavaBean style components

Say you have a vector; then you can access any piece of that vector. There's really no reason why you can't do this, since the representation is exposed.

Elsewhere in the paper he talks about why exposing the representation is something he didn't think was that useful, since he argues that the only time you really care about the representation is when a computation's result didn't make sense.

Therefore, I don't think he is muddling anything. Instead, you are neglecting to realize this is a HISTORY paper and is told in chronological order with some side tidbits and perspectives thrown in here and there as the story emerges. -- I tried to quote as big a paragraph as possible for this reason.

Of course

Of course, exposing representations is bad; that's what encapsulation prevents. And ADTs do that just fine. Can you elaborate on what you are trying to say with your vector example specifically, and how it relates to what I said, especially w.r.t. mutable state?

For example ...

"I mean that the pure language [Lisp-KM] was supposed to be based on functions, but its most important components -- such as lambda expressions, quotes, and conds -- were not functions at all ... My next questions was, why on Earth call it a functional language? "

For my taste, this is stunningly beautiful.

...and stunningly out of context

He is pretty clearly saying the special forms should be functional, and that Lisp should be a true functional language. He is very clearly not referring to a dynamically scoped Lisp, when you read the whole paper, since he references Irons' IMP work and his distaste for calling OO dispatch "polymorphism". It is even clearer when you realize that Kay is now acting on many of the "loose threads" he discusses in this paper as part of his Foundations of New Computing project. For a great example of what he was getting at, look at the language Nile which is defined in terms of OMeta.

If you want more info on this, Ian Piumarta has given presentations on how the "monstrous bug" in the LISP 1.5 manual is something VPRI aims to tackle as part of their FONC project.

Good questions, few answers

As the paper says, we are just starting to seriously grapple with this issue. There are few if any solid empirical results on PL usability. Gilad Bracha gave a keynote at the same workshop in which he declared that the aesthetic judgment of the language designer is all we have. That led to a lively discussion. You should come to the workshop this year and join the fun.

As for market research, I like what Steve Jobs said: "It isn't the consumers' job to know what they want."

These claims tend to be subjective. What would "proof" mean?

Asking for formal proof of claims which are subjective seems like an admission that one is doomed to frustration. And most of the claims made about programming languages are subjective.

Something *is* being claimed when someone says that, say, LISP is easier to use than assembly language, and for most people and most applications it's completely true. But it's hard to measure or formalize exactly what the subjective word "easier" means. Further, what is "easier" is different for different people and different applications. The claim may be false when taken as a universal (for example, in writing device drivers when your LISP has no memory-layout control or explicit pointers).

The things we can formally prove with evidence, such as an explicit formal semantics, type safety, referential transparency, control or absence of side effects or mutations, and so on, are not really the things we care about. We care about ease of use, productivity, the reusability of code, ease of project management, and showing that programs correctly implement specifications.

And the things we care about, we can't prove. In formal semantics where everything has to be a hundred percent true for a hundred percent of the users in a hundred percent of the cases, the claims most important to us don't mean anything. Even the claim that we can prove a particular program implements a particular specification is not formalizable, unless the specification itself is provably consistent and unambiguous. And if the specification is provably consistent and unambiguous, then its development entails the same management of complexity as the development of code.

Formal languages are tools to make human intellectual effort easier, and human intellectual effort is surprisingly resistant to analysis by formal methods. While formal languages have mathematical foundations, their value as tools and their conformance to the claims made about them depend mostly on the squishy, non-formalizable stuff that goes on between human ears.

That lack of mathematical

That lack of mathematical proofs isn't unique to PL design. The question is what other methods should we examine (and how in particular) -- e.g., analysis, qualitative studies, and pseudo-random quantitative experiments are common classes of solutions.

Sean is asking for pseudo-random quantitative experiments, while I think analytic reasoning gets us far (though there are far too few established principles today for guiding it).

Principled Programming Language Design

What principles can you name today?

I personally follow Ka-Ping Yee's principles of secure interaction design to guide language and library design. These principles were developed for UI in general, but my position is that UI should essentially be a programming language with progressive disclosure.

Thomas Green gives us some dimensions with which to understand or measure some UI-qualities of a language design.

I believe CSCW principles would also apply well.

At least Green's cognitive

At least Green's cognitive dimensions (CDs) are very qualitative. They are great principles to guide your design, but there are no empirical tests that you can run to ensure that your design adheres to those principles. Also, implementing a principle is fuzzy with varying shades of gray.

We are still in the realm of best practices, which is good enough I guess.

From cognitive science, one

From cognitive science, one review I enjoyed years ago was The Big Book of Concepts, which is great for understanding the human side of representation and general models. I forgot if it was from that book or the general reading I had been doing back then (a lot of low-level cog sci stuff, often at the level of neuroscience), but that was when I started to accept imperative programming in the small.

For the past couple of years, I've been focusing on adoption. E.g., Rogers' user-centric adoption procedure made me think about evaluation, trials / trialability, and reinvention steps as current blind spots for language support -- his book, another literature survey, is great! A colorful lesson I pulled from it is that safety and long-term oriented techniques are inherently challenged by observability: the challenges facing verification techniques are similar in many ways to those of safe sex approaches. There are a variety of diffusion models (network, ecological, probit/economic, ...), all of which we can learn from (e.g., switching costs, technological expectations, ...). A recent twist on all this has been a switch in overall methodology (rather than a focus on particular features to address): social constructionism rather than technological determinism.

David is spot on about CSCW. The psychology and cog sci of programming stuff seems important for comprehension of general programs (e.g., why Prolog programs are probably too hard to understand when employing cuts). CSCW, and sociology in general, help understand languages in multiuser settings, whether intended for multideveloper projects or a userbase distinct from the developer one. As three examples:

Control/order/uncertainty: users may be hesitant to act if there is great uncertainty in the system. Introducing control and order reduces uncertainty and increases (and guides) participation. Consider style guides: today, we can easily write checks about the 'current' program, packaging many per-organization checks like no statics (compiler flag) or passing the right type of information through an API. However, rules also exist about comparing the current version of a system against previous ones. Regression testing on functional per-program properties is, again, easy -- a test suite. Ideas like performance regression, though, require two versions of a program to compare. We could think of adding more control and order for this particular scenario, or, even better, package the high-level mechanism and let individual organizations customize it.
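As a rough illustration of what "package the high-level mechanism and let organizations customize it" could look like for performance regression, here is a minimal sketch. Everything in it is an assumption of mine: the function name is_perf_regression, the use of medians, and the 10% default tolerance. It takes timings already collected from two versions of a program and leaves the threshold as the per-organization knob.

```python
from statistics import median


def is_perf_regression(baseline_secs, candidate_secs, tolerance=0.10):
    """Return True if the candidate's median time exceeds the baseline's
    median time by more than `tolerance` (a fraction, e.g. 0.10 for 10%).

    Both arguments are sequences of wall-clock timings in seconds, gathered
    however the organization likes. The tolerance is the customizable part.
    """
    if not baseline_secs or not candidate_secs:
        raise ValueError("need at least one timing for each version")
    base = median(baseline_secs)
    cand = median(candidate_secs)
    return cand > base * (1.0 + tolerance)


# Timings for the previous version and the current one (made-up numbers).
previous = [1.0, 1.1, 0.9]
current = [1.3, 1.25, 1.4]
print(is_perf_regression(previous, current))
```

The comparison itself is trivial; the point is only that the check is inherently about two versions, which a check on the 'current' program alone cannot express.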

Metanotifications. It isn't just the message, but what wasn't said, when it was said, who said it, and all of the above relative to what usually happens. Languages typically are user and principal agnostic. Work in security is slowly changing this (e.g., increasing emphasis on provenance), as is work in version control systems (which are still frustratingly divorced from both code and IDE), but it is clear that there is much more than just the code. We might glean that something is fishy from bad indentation or some suspicious comments in the code, but that's pretty paltry: we still lose a lot of the context in day-to-day activities.

Finally, gamification and tacit knowledge / crowding out. Gamification is a mechanism to help encourage and guide use, but it also provides tunnel vision (it's great for early stage or crude efforts). In contrast, in the long term, tacit knowledge is crucial, for which extrinsic motivators are a bad match. My thoughts here are more about the software engineering aspects right now, but I think code-level applications aren't too far fetched (e.g., associating various badges like well-tested with functions for a 'novice developer' view of code).

Perhaps worth noting -- I wasn't even really cherry picking there. There's a lot of stuff and much of it you'll see at introductory or near-introductory level sociology courses!

Formalize your expectations

You claim that the things we know how to study formally are not "the things we care about" (ease of use, productivity, reusability of code, ease of correctness checking), and that those cannot be formalized.

I do not think it's true. I claim that

1. Most formal properties of language currently studied have effects on those things you care about.

2. With more work, more aspects of the things you care about would be amenable to a formal approximation.

I think that (1) is a reasonable assumption: one reason why researchers study type safety and referential transparency, for example, is that they feel it has an impact on those things they also care about.

Claim (2) is a bit more ambitious but I think it can be done. In this comment, dmbarbour exposes some ideas that are both formulated as semi-formal properties of a language ("closure under composition"), and desirable for the programmer ("scales to programming in the large"). I see plenty of formal ideas that would be related to reusability (for example the admissibility of a weakening rule, which doesn't exist in full generality in, e.g., Erlang because of the variable matching rule).

Of course, there is a formal/informal gap that will always be present. But this is the case in most sciences, and that does not mean there is no hope to make it reasonably small for a lot of interesting cases.

1. Most formal properties

1. Most formal properties of language currently studied have effects on those things you care about.

This perspective is problematic in at least two ways:

1. If we get some particular formal property, how do we know it lines up with the more general goal? E.g., how do we know it didn't introduce harm elsewhere?

2. If we get some particular formal property, but nobody uses it, does it really have an effect on those things we care about?

Another perspective is from accepting social constructionism and the sociotechnical gap (I really should just finish this darn essay and stop harping on it here!): if society will build what it needs, build what will be adopted.

If you believe the idea behind the formal property is what is needed, demonstrate the need, adoptability, etc. Details come out, e.g., something nice in a lambda calculus may be terrible when mixed with more important productivity features.

My view is also problematic (and your original view is more in line with the establishment). However, I'm increasingly seeing it now that I'm looking for it (e.g., Proebsting's great talk on disruptive language technology), and, jarringly, seeing what happens when you don't follow it (e.g., Erik Meijer's wonderful Used Language Salesman retrospective and analyses of the OLPC fiasco).

Difficult, not problematic

I don't think there is any contradiction here: I agree with most of what you say and see it as a development of the well-known claims I reproduced.

Of course, if we want to argue that a particular formal property has a desirable effect in the informal world of users, we have to jump over the gap and argue for it informally, possibly using real-world studies, etc.
Agreed, this is not very often done *rigorously* for actual formal properties; most people on the formal side tend to treat those relations as folklore and widely accepted.

I have read both your references, Proebsting's talk and Erik Meijer's (could you be more specific regarding the OLPC one?). The latter one is indeed wonderful. Still, the basic idea in this paper is that we need fundamental, formal research to provide solutions to hard problems, and then we also need people to integrate the good (and simple) parts into mainstream technologies. In this regard it seems aligned with my point of view of not questioning the eventual usefulness/relevance of formal language research.

OLPC and bananas: spirals, not pipelines

OLPC is probably now the text-book example of (questionable) technological determinism (top-down approach of creating a theory, throwing it over the wall, waiting for profit): .

I don't want to speak against doing what OLPC did -- compared to others, it was something. Likewise, it's great to randomly go off purely in theory. However, at the same time, it is also very fruitful to spiral theory with practice, such as helping you become a banana man without worrying about whether people actually eat bananas (e.g., guiding you towards barbed wire instead ;-)).

As an example, I don't really read Erik's essay as him valiantly reimplementing features from Haskell: he reinvented feature X into the better feature X' or decided to investigate new feature Y entirely. Instead of a top-down approach (or a purely practical bottom-up one), a spiral between the two. The PL community does a lot of the top-down, but is not very conducive to the bottom-up half of the spiral. Erik's work with LINQ is a great example of stellar PL design coming from it -- the integration of Rx and other evaluation strategies into it is beautiful. His examples of targeting data schemas and FFIs are further great examples of bottom-up motivation and then spiraling.

FWIW, the story doesn't end there. Despite Muhammad coming to the mountain, people are still confused: .


Thanks for both links, very interesting IMHO.
I didn't know that the OLPC hardware was so unreliable.
This defeats the whole project, even if OLPC had been deployed in a better way.

Gasche wrote: 1. Most

Gasche wrote:

1. Most formal properties of language currently studied have effects on those things you care about.

2. With more work, more aspects of the things you care about would be amenable to a formal approximation.

I'm sure that (1) is true. These things do, definitely, have an effect on the things we care about. But we can't prove it, nor even really show why, nor even compare how much each such thing contributes or in what combinations, because the things we care about are not formalizable.

(2) however is trivially false, because these things are not aspects of what we care about in the first place. If they were, then we'd have objective proof of subjective properties, which is a contradiction in terms. The only connection anything can have to a subjective property, is a connection that can only be measured and predicted subjectively.

Honestly, probably the closest we could come is a lot like our "proof" that chocolate tastes good; an opinion poll finding that a large majority of people like it.

I am reminded of one of Paul Graham's essays where he talks about the engineers who design bridges and the engineers who design chairs. The former have a fairly well-defined problem; bridging a span within a well-defined schedule while minimizing costs and maximizing the safety and reliability of the finished product -- and those factors are things that there is a pretty good understanding of for the most part. The latter, on the other hand, has to think about the shape of people's butts and what feels good or bad when you're sitting on it for hours, and whether people will think it's pretty enough for their living room. About these things there is no accepted definition. So, while the first engineer can design his bridge in an "Ivory tower" using only math and materials science, the second has to go out into the world and interview a lot of people, and test various prototypes, and record subjective responses hoping that a consensus about what is good will emerge.

Like Paul Graham, I think programming language design is much the same as the design of chairs. The chair designer has to know his materials science and math, and build something that won't fall down, just like the programming language designer has to know his Language Theory and the underlying soundness of the mathematical ideas that the language is founded on. But the chair designer also has to know the shapes and frailties of human butts and the purely subjective aesthetics of interior designers and probably some medical facts about how to treat human bodies well, and has to get feedback from people about what they find comfortable and useful. The math, in other words, is necessary but not sufficient to do a good job.

If we want to design good programming languages, we have to know the math and theory - there's no question of that - but we also have to look beyond math, and into fuzzy, ill-defined places like psychology and aesthetics and so on. We have to form a good idea who our users are, what their strengths and weaknesses and likes and dislikes are, and figure out what and how much they can and should understand and handle in a programming language.

We also have to figure out how much we trust the language users. And there's nothing more subjective than that.

No need for Psychology

A language is not like a 'chair' built to suit the bottom of an individual or aesthetics of a small family unit. Under that analogy, a language is closer to a whole stadium. Languages are built for populations, not for individuals.

Psychology is hardly worth two cents for language design. Any non-trivial project is built by composing a large number of independently developed components - that is, components built by people with different tastes and worldviews. Further, individual components are often built by smaller groups, and must be maintained over time long after the original developers depart. There is no place for psychology there, at least not of individuals.

If we're going into the 'fuzzy' side of language design, you should focus your efforts on life-cycle issues: prototyping, modularity, extension, composition, configuration management, market integration, distribution, continuous maintenance and upgrade. Add some security to mitigate the inevitable errors and occasional malice.

How much we language designers 'trust' language users is also hardly relevant. Our language designs must, however, account for how well our users trust one another in a context where most of each program is borrowed or refactored from projects independently developed by strangers.


[[ A language is not like a 'chair' built to suit the bottom of an individual or aesthetics of a small family unit. Under that analogy, a language is closer to a whole stadium. Languages are built for populations, not for individuals. ]]

This whole paragraph is useless: an industrial chair designer's goal is to build a chair for a big number of individuals. I fail to see the difference between this and a language designer who designs for 'populations', so it looks very much like a strawman argument.

As for the rest, you didn't list "cost and availability of the tools which implement the language", but IMHO this is the main reason why Ada failed against C++ even though Ada was superior on the life-cycle issues you list!

an industrial chair

an industrial chair designer's goal is to build a chair for a big number of individuals

Ray speaks of subjective aesthetics of interior designers. I do not get the impression he was talking about 'industrial' or stadium stackable chairs. Have you actually been chair shopping recently? When looking for a good computer chair, I walked into several furniture stores each with over 50 models of chair (and very little overlap) while looking for one suitable to my butt and subjective aesthetics, and then I paid $400 for it.

Languages are not for subjective aesthetics because even a lone-wolf programmer will need libraries written by other developers. To the extent one does account for aesthetics, it should be an objective measure through statistics.

As for the rest, you didn't list "cost and availability of the tools which implement the language"

It is true that there are a lot of non-technical reasons a language might fail. Indeed, one might observe that most of the factors for language 'success' are not even slightly technical - advertising, push by large companies, incumbency or integration with existing systems, cost and licensing of the implementation, et cetera.

individual-centric social science

I don't buy that we should not consider the psychology of the user when designing secure systems. While many of the concerns might be handled at the library level (e.g., widgets and authentication schemes), I'd expect language-level support to be a big win as well (even if we're far from there today).

Group-oriented concerns are important, which might draw from traditional, information, and collaborative sociological literature, but we can also look into, say, social psychology, and go down a fairly slippery slope. Furthermore, while I have not read much in the psychology world (the psychology of programming workshop has too much of a focus on novice users and university studies to keep me glued), understanding how a developer acts individually (e.g., challenges to code comprehension) seems pretty fundamental. Maybe this is not psychology nor its techniques (e.g., cognitive science instead), but there's a clear void of fundamental human understanding.

psychology and security?

I agree that language support for security is a big win. And, though quite disputable, you might argue that managing a user's expectations and awareness is some sort of 'psychology' principle.

But try a gedankenexperiment: assume the 'target audience' for your UI is a bunch of independently developed autonomous agents, with direct access to whatever RPC you want to provide, rather than human users. Build a secure and effective UI under this assumption.

When I tried this experiment, my conclusion was pretty much: the basic security and awareness and other issues are essentially identical. Perhaps data is a bit more structured to minimize fuzzy parsing, and agents have more available bandwidth and memory to track or cause changes, but the essential set of issues is the same. And, further, one might observe that an interface developed for automated agents is easier to adapt to humans (i.e. rendering and filtering structured data and events, providing 'tangible values' or 'naked objects') than is the inverse.

Security, productive maintenance, modularity, et cetera require the ability to reason about relevant system behavior with knowledge and control of only a small subset of its specification. Local reasoning properties don't really depend on the 'psychology' of the individual performing the reasoning. Rather, these are more fundamental issues of logic and information theory.

I challenge you to name some security properties for UI that truly depend on the psychology of the user as opposed to a more fundamental principle that would hold even for T1000 robots.

In the case of UI, one of

In the case of UI, one of the worst mechanisms I've seen is using pictures to circumvent phishing logins (you haven't logged in if you don't see your predetermined picture). Fine for T1000s (arguably), horrible for people.

I don't like this example because it doesn't tell me about designing languages beyond that, perhaps, as one means, authentication should be fully baked in.

one of the worst mechanisms

one of the worst mechanisms I've seen is using pictures to circumvent phishing logins (you haven't logged in if you don't see your predetermined picture)

This mechanism makes it difficult for anyone but the 'owner' of a remote account to distinguish a successful login from a false login, thus raising the costs and risks associated with a failed login. Your concern is that human users lack the memory or attention to make this mechanism work, and thus may be trapped in their own honeypot.

I agree that the mechanism you describe is less than ideal for humans. But the associated security property - raising the cost of failure - can be achieved by many other mechanisms.

The point is that we needed

The point is that we needed to vet against people to know that the solution is flawed. You could propose a new solution, but would either have to eliminate people from the picture or otherwise account for them. Applying this to languages, we can automate more and more in them, eliminating the need for people, but there's typically something left.

Human in a loop

we needed to vet against people to know that the solution is flawed

Sure, we need humans in the loop somewhere, even if it's just to understand the common errors and fallacies of human behavior and cognition.

But understanding human biases and errors only intersects with psychology (you could get the same data from other fields), and is very far removed from 'subjective' and 'aesthetic' issues or study and classification of the individual human mind or behavior.

So while we do need humans in the loop, I say we don't need psychology.

Formal *approximation*

1. Most formal properties of language currently studied have effects on those things you care about.

2. With more work, more aspects of the things you care about would be amenable to a formal approximation.

I'm sure that (1) is true. These things do, definitely, have an effect on the things we care about. But we can't prove it, nor even really show why, nor even compare how much each such thing contributes or in what combinations, because the things we care about are not formalizable.

(2) however is trivially false, because these things are not aspects of what we care about in the first place. If they were, then we'd have objective proof of subjective properties, which is a contradiction in terms. The only connection anything can have to a subjective property, is a connection that can only be measured and predicted subjectively.

The key word in (2) is *approximation*. I don't claim that you can formalize maintainability; I claim that, with some research, you could give formal specifications of aspects of programming languages that, you would informally claim, have an effect on maintainability.

Of course the connection between maintainability and those formal aspects will stay informal. But that does not make it less useful. For example, maybe some of those formal aspects will be novel and haven't been separately studied before. Studying them (formally) would be interesting and fruitful research, even more so if you can convince your peers that they indeed have a relation to maintainability.

I see claims (1) and (2) as two opposite directions of a never-ending discussion between the formal and the informal, like the "spiral" lmeyerov described. (1) is the idea that "pick some formal property, now try to guess what its informal effects will be, you will be enlightened". (2) is the idea that "pick some informal thing you care about, now try to guess which formal properties have an effect on it, you will be enlightened". Both are useful.

I will give two lengthy pragmatic examples. Feel free to skip the end of this post if you don't have time to read it.

A few weeks ago I learned about CoffeeScript, a syntactic-sugar variant of JavaScript which is intended to make it nicer to use, and is quickly gaining popularity (apparently Ruby on Rails just made it the default scripting-something language somewhere instead of JavaScript). Most "features" (syntax changes) of CoffeeScript are purely local changes, where CoffeeScript and JavaScript syntaxes are macro-expressible in terms of each other (what dmbarbour would call a "homeomorphism").
But the scoping rules have one non-local change that I immediately thought to be a really bad idea: local variable shadowing is forbidden (more precisely, there is no variable binding construct; the scope of a variable is inferred from its topmost assignment site; I think formal parameters can shadow variables, however). Name shadowing is a relatively formal idea, but I suppose most LtU users would unconsciously feel something fishy about forbidding name shadowing. I wasn't able to immediately pinpoint a pragmatic reason why I disliked the idea, until I realized it significantly hinders code reuse and refactoring, as you can't move a piece of code around without a risk of name collision (and, in CoffeeScript, silently changing the semantics of your code without any warning from the language). I hold a similar grudge against Erlang's pattern-matching syntax.

So "name shadowing" is closely related to two formal properties, alpha-equivalence and weakening (adding a variable to the context), and loosely related to an informal idea you care about, refactoring / moving code around, which (informal claim) affects maintainability. If you have understood my "moving code around" example (I may be unclear), I assume you're reasonably convinced of the informal connection between name shadowing and refactoring. I don't think I need to conduct a usability test between users of CoffeeScript and users of "CoffeeScript plus a 'let' construct" to be more scientific about my claim.
For the record, when I complained about these scoping rules on reddit, I was told by a CoffeeScript developer that name shadowing is bad in the first place, that forbidding it is the Right Thing to do, and that my concern about "moving code around" in arbitrary contexts is misguided because CoffeeScript's rules are precisely there to restrict block nesting levels to an absolute minimum, and if you respect that style you won't have any problem with name shadowing (there is nothing in scope you may shadow). So we have a questionable language feature intended to enforce a specific coding style. I personally still think they're wrong, but they have given a reasonable, informal and even "social" explanation of their choices, and that's it; besides, any argumentation at this point would be on the purely informal side, which I'm not so interested in.
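To make the "moving code around" concern concrete, here is a minimal sketch in Python rather than CoffeeScript. It rests on an assumption: Python's `nonlocal` is used only to emulate the rule that a name resolves to its outermost assignment site. The function names and the `total` variable are made up for illustration. The same helper is pasted into a context that already uses the name `total`; with shadowing the outer value survives, without it the helper silently overwrites it.

```python
def report_with_shadowing(prices):
    total = len(prices)  # outer meaning: number of items

    def average():
        # A plain assignment creates a fresh local binding in Python,
        # so this `total` shadows the outer one and leaves it intact.
        total = 0
        for p in prices:
            total += p
        return total / len(prices)

    avg = average()
    return total, avg


def report_without_shadowing(prices):
    total = len(prices)  # outer meaning: number of items

    def average():
        # Assumption: `nonlocal` stands in for a "no shadowing" rule, where
        # the name resolves to the already existing outer variable.
        nonlocal total
        total = 0
        for p in prices:
            total += p
        return total / len(prices)

    avg = average()
    # `total` now holds the sum computed by the helper, not the item count.
    return total, avg
```

Both functions contain textually the same helper (apart from the `nonlocal` line that emulates the rule), yet only the first one still returns the item count after the helper has run.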

My first example was an example of direction (1), turning the lack of a formal property into an argument about an informal thing. I also have an example of (2), trying to reason formally about an informal problem I encountered in a language. My problem was with the open Foo directive in OCaml code: it brings all the (public) definitions of the Foo module into the current scope. It is the most common way to use a module in a piece of OCaml code. I have two (related) dislikes for open:

- open may shadow all previous bindings. This is sometimes done on purpose (e.g. there is a standard Array module, but you may open your own MoreArray module to shadow some of the Array functions with an implementation more specialized to your use case), but it can also be a pain, in particular with respect to compatibility: if the next version adds a new function to Array, your code may break because some toplevel function you referenced after opening Array is now shadowed.

- Once you have more than one or two opened modules, it can be difficult to know where a given identifier has been declared. This can be alleviated by extra tooling (basically the compiler may output external annotation files with use-declaration cross-references during type checking), but it's still a problem.

Other languages (Python, Haskell...) have a restricted form of open where you explicitly reference the declarations you want to import, and the others stay qualified by the module name. My solution in OCaml is to import them manually (let bar =, or give a shorter alias (module F = Foo) instead of opening.
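The compatibility hazard can be sketched with Python's `from module import *`, taken here as a rough analogue of open (an assumption; the two are not identical). Everything named below (`array_lib`, `describe`, `run_client`) is hypothetical: the "library" is an in-memory module built for the example, and the client code is the same for both library versions.

```python
import sys
import types


def make_array_lib(version):
    """Build a hypothetical in-memory module standing in for a library."""
    mod = types.ModuleType("array_lib")
    mod.get = lambda xs, i: xs[i]
    if version >= 2:
        # Version 2 adds an export that existing clients did not anticipate.
        mod.describe = lambda xs: "array_lib.describe"
    return mod


CLIENT_STAR = """
def describe(xs):
    return "client.describe"
from array_lib import *      # rough analogue of `open`
result = describe([1, 2, 3])
"""

CLIENT_EXPLICIT = """
def describe(xs):
    return "client.describe"
from array_lib import get    # explicit import list
result = describe([1, 2, 3])
"""


def run_client(source, version):
    """Run unchanged client code against a given library version."""
    sys.modules["array_lib"] = make_array_lib(version)
    try:
        namespace = {}
        exec(source, namespace)
        return namespace["result"]
    finally:
        del sys.modules["array_lib"]
```

With the star import, upgrading the library from version 1 to version 2 changes which `describe` the client calls, although the client source did not change. With the explicit import list, the client keeps calling its own function under both versions.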

Can I relate this informal problem to a formal property of the language? I have a partially-formalized idea of "isolated name resolution", which says that name resolution should be possible without any information about the external modules. The key idea is that with "import bar from Foo", the "bar" binding is apparent, while it is implicit in "open Foo" and you need information about Foo. Note that this is different from "separate name resolution", which would be, by analogy with separate type checking, that a piece of code may be name-resolved with only a description of the "lexical interface", the exported names of the external modules.
This is also related to a different semi-formal property: that name resolution can be performed independently of type-checking (using a simplified (Γ ⊢ t) judgement "names are well-bound in t" instead of the more complex (Γ ⊢ t : σ) type judgement). OCaml, which doesn't have any kind of overloading, has this property (which may or may not be an important thing).
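One way to read the (Γ ⊢ t) judgement is as a small checker over a toy term language. The sketch below is only an illustration under stated assumptions: the term constructors (`Var`, `Lam`, `App`, `ImportFrom`, `Open`), the `interfaces` table and the `NeedsInterface` exception are all invented here and belong to no real compiler. No types appear anywhere, and only the `Open` case needs information about an external module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


@dataclass(frozen=True)
class ImportFrom:  # "import name from module" scoped over body
    module: str
    name: str
    body: object


@dataclass(frozen=True)
class Open:  # "open module" scoped over body
    module: str
    body: object


class NeedsInterface(Exception):
    """Resolution requires the export list of an external module."""


def well_bound(term, env=frozenset(), interfaces=None):
    """Check "names are well-bound in term" under the set of names env."""
    if isinstance(term, Var):
        return term.name in env
    if isinstance(term, Lam):
        return well_bound(term.body, env | {term.param}, interfaces)
    if isinstance(term, App):
        return (well_bound(term.fn, env, interfaces)
                and well_bound(term.arg, env, interfaces))
    if isinstance(term, ImportFrom):
        # The bound name is apparent in the source: no external lookup.
        # (Whether the module really exports it is a separate check.)
        return well_bound(term.body, env | {term.name}, interfaces)
    if isinstance(term, Open):
        # The bound names are implicit: the module's exports are required.
        if interfaces is None or term.module not in interfaces:
            raise NeedsInterface(term.module)
        exported = set(interfaces[term.module])
        return well_bound(term.body, env | exported, interfaces)
    raise TypeError(f"unknown term: {term!r}")
```

In this reading, a program built only from `ImportFrom` can be resolved in isolation, while a program using `Open` can at best be resolved separately, given the `interfaces` table of exported names.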

Again, I linked a relatively formal (but would need to be further refined) property, "isolated resolution", to an informal feeling of unease when using a certain programming construct. I think it is reasonably clear that they are related, even if not formally. It could help, however, to gather empirical data about the number of errors made by OCaml programmers related to this "open" construct.

All I'm saying is that if we

All I'm saying is that if we want to claim that something is easier to develop with, more usable, easier to maintain, etc, those are claims relating to human beings and we need a way to measure them relative to human beings. It is irresponsible to make such claims if you haven't done testing with actual users (and a control group, etc) and discovered that the claims are true to some stated level of significance. And since these studies involve learning curves, they are hard to do.
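As a rough idea of what "true to some stated level of significance" could mean in practice, here is a minimal sketch of a one-sided permutation test comparing a control group with a group using the new language feature. It is an assumption about how such a study might be analysed, not a description of any actual study; the function name and parameters are made up, and it says nothing about the harder problem of learning curves.

```python
import random
from statistics import mean


def permutation_p_value(control, treatment, trials=10_000, seed=0):
    """Estimate how often a random relabelling of the participants gives a
    difference in mean task time at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean(control) - mean(treatment)
    pooled = list(control) + list(treatment)
    n = len(control)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = mean(pooled[:n]) - mean(pooled[n:])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```

The inputs would be per-participant measurements such as task completion times; a small returned value suggests that the observed improvement is unlikely to be an artifact of how participants happened to be split into groups.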

The theory doesn't matter if it isn't developed in such a way that real live people can achieve the benefit that the theory is supposed to provide. Having the highest horsepower number doesn't mean a particular model of car will win races (or even operate safely) if we, for example, neglect to provide a useful control interface for the driver. And if the car has a seat made out of cast iron, nobody will drive it no matter how safe it is.

If we must avoid making

If we must avoid making claims that we can't measure, then we are not allowed to innovate in PL usability simply because we can't measure our results. The current situation of banging around in the dark sucks, yes, but the alternative sucks even more (not doing anything at all).

Because user testing is difficult to impossible, we have come up with a lot of meaningless band aids (like cognitive dimensions as best practices) that allow us to proceed and feel as if we are being scientific. But false evidence can be even worse than no evidence, as it provides us with a false sense of confidence.

Perhaps we should be allowed to make unsubstantiated claims of usability and let the market sort it out. When someone in marketing says "easier" or "more intuitive", I take that as an aspiration rather than an objectively measured quality.

Language designer's notebook: Quantitative language design

Related paper I just came across on reddit:

For any given programming language, there is no shortage of new feature ideas. Language designers must not only perform the difficult task of deciding which of many possible (and often incompatible) language features should receive priority, but they also must consider that new language features can interact with existing ones in surprising, and sometimes incompatible, ways. Language evolution often requires making a trade-off between the benefits of enabling desirable new patterns of coding and the costs of potentially breaking some existing "weird" code. In this situation, being able to quantify — using real-world data — just how unusual that "weird" code is can provide valuable clues to which way a decision should go.