Lambda the Ultimate

Best Language Design Papers (survey)
started 9/15/2002; 4:05:56 PM - last post 10/15/2002; 6:58:22 AM
Ehud Lamm - Best Language Design Papers (survey)
9/15/2002; 4:05:56 PM (reads: 2156, responses: 9)
Best Language Design Papers (survey)
I hereby announce an LtU exclusive event: the Best Language Design Papers survey. (more)


Posted to general by Ehud Lamm on 9/15/02; 4:06:15 PM

John Lawter - Re: Best Language Design Papers (survey)
9/16/2002; 12:00:36 PM (reads: 2173, responses: 0)
My nominee is

"Programming as Experience: The Inspiration for Self." Randall B. Smith and David Ungar, ECOOP '95 Conference Proceedings, Aarhus, Denmark, August, 1995.

Some of the other Self papers are good, but this one is my favorite. It is written as a retrospective, where the authors discuss why they did what they did, and also where they feel the language or its implementation falls short.

Luke Gorrie - Re: Best Language Design Papers (survey)
9/17/2002; 6:28:04 AM (reads: 2112, responses: 0)
I'll go with Hoare's "Hints on programming language design", as featured on LtU previously. http://www.cs.berkeley.edu/~necula/cs263/handouts/hoarehints.pdf

I now really appreciate this paper, having neglected several of the principles in a recent little language.

jon fernquest - Re: Best Language Design Papers (survey)
9/17/2002; 8:24:10 AM (reads: 2108, responses: 1)
Given the constraints of your survey I nominate the paper I just posted to LTU:

"A Study of Evaluation Order Semantics in Expressions with Side Effects", Nikolaos S. Papaspyrou and Dragan Maćoš, 2000.

I know it's not a classic or anything, but it helps you better understand the most popular language there is, and via rigor it furthers the goal of Write Once Run Anywhere...Reliably.

The paper seems to qualify on two points: 1. "The Analysis or rationale of the design of a specific programming language" (here, C), and 2. "The design of specific language features" -- the different types of expressions a language allows being features of the language.

(There's also the doctoral dissertation of one of the authors: "A Formal Semantics for the C Programming Language" by Nikolaos S. Papaspyrou.)

Ehud Lamm - Re: Best Language Design Papers (survey)
9/17/2002; 8:33:28 AM (reads: 2150, responses: 0)
Meta-message: If you find the constraints I gave to be too narrow, feel free to broaden them.

Ehud Lamm - Re: Best Language Design Papers (survey)
9/18/2002; 6:11:49 AM (reads: 2146, responses: 4)
Almost forgot: Brian W. Kernighan, "Why Pascal is Not My Favorite Programming Language".

More a critique of language design than anything else, but it sure is a good (and important) read.

MJSR - Re: Best Language Design Papers (survey)
9/22/2002; 6:00:42 PM (reads: 2140, responses: 3)
> Why Pascal is Not My Favorite Programming Language

So why is it good or important?

I can't figure it out. All of its significant criticisms of Pascal had already been made; those who earlier promoted Pascal were already moving on to other languages (Ada and Modula-2, among others); and Dijkstra savaging languages he didn't like was far more entertaining (although I thought you wanted to discourage, not celebrate, that sort of thing in LtU).

Ehud Lamm - Re: Best Language Design Papers (survey)
9/23/2002; 3:18:24 AM (reads: 2191, responses: 2)
True, I don't like language bashing. But reasoned critiques are always welcome (we even have a department for them!)

The reason I like this paper is that it tries to show how important small details are, when it comes to language design.

The treatment of variable sized arrays is a great example.

Classic Pascal forces you to write a Sort routine for each array length. Yuck!

C doesn't enforce array length checking, so you get flexibility without safety.

Ada provides unconstrained array types, which safely give us the flexibility we need.
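
A minimal C sketch of the middle case (illustrative code, not from any of the papers): one sort works for any length, but nothing checks the length the caller supplies.

    /* Sketch: in C the length is not part of the parameter's type, so one
       sort handles any length -- but the caller's n is taken on faith. */
    #include <stddef.h>
    #include <stdio.h>

    static void sort(int a[], size_t n)
    {
        for (size_t i = 0; i + 1 < n; i++)
            for (size_t j = 0; j + 1 < n - i; j++)
                if (a[j] > a[j + 1]) {
                    int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
                }
    }

    int main(void)
    {
        int xs[4] = { 3, 1, 4, 1 };
        sort(xs, 4);                 /* nothing verifies that 4 is correct */
        for (int i = 0; i < 4; i++)
            printf("%d ", xs[i]);
        printf("\n");
        return 0;
    }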

MJSR - Re: Best Language Design Papers (survey)
10/14/2002; 10:30:22 PM (reads: 3103, responses: 1)
> True, I don't like language bashing. But reasoned critiques are always welcome (we even have a department for them!)
>
> The reason I like this paper is that it tries to show how important small details are, when it comes to language design.
>
> The treatment of variable sized arrays is a great example.
>
> Classic Pascal forces you to write a Sort routine for each array length. Yuck!
>
> C doesn't enforce array length checking, so you get flexibility without safety.
>
> Ada provides unconstrained array types, which safely give us the flexibility we need.

It is a great example. It was great in Habermann's paper (1973), great when Lecarme and Desjardins (1975) called it a valid point, great when Wirth (1975) suggested an extension addressing it, great when the Ada rationale (1979) discussed that flexible safe approach, and still great when the ISO Pascal draft (1980) included conformant array parameters to deal with it (just to draw from Kernighan's citations). It's also the most significant criticism of Pascal in Kernighan's paper that holds up.

Let me make clear that I am not defending Pascal, which deserves and has received many fair criticisms; rather, I am criticizing Kernighan's paper, which does require pointing out some unfair criticisms of Pascal. Bias pervades Kernighan's paper, specifically his uneven treatment of the same issues in Pascal and in other languages. Some of the double standard may be explained by shifts in his views from earlier writings, but the absence of any comment on a change in his opinion of Fortran (for example) is itself conspicuous.

In _Elements of Programming Style_ (2nd ed 1978), Kernighan and Plauger advise "Use the good features of a language; avoid the bad ones". In _Software Tools_ (1976) they put this into practice: bare Fortran, "a poor language indeed for programming", was made better by a Ratfor preprocessor that adds control structures and improves the "cosmetics" of the language, although there are other weaknesses left alone. They recommended that any Fortran preprocessor available be used--"the payoff from any preprocessing is enormous".

In _Software Tools in Pascal_ (1981) they take another approach, wallowing in Pascal deficiencies (12 entries in the index). It seems appropriate to me to consider primarily the major flaws of Pascal that could not be remedied by a reasonable preprocessor comparable to the Ratfor preprocessor used in _Software Tools_ (pending further information on whether Fortran was also a toy language), and only secondarily the minor flaws that a preprocessor could fix (primarily matters of taste). Kernighan doesn't list many major flaws: array size, "There is no escape", interactive input, and efficiency concerns.

MAJOR FLAWS

Array sizes are part of the array type. As Kernighan notes, ISO Pascal provided a solution for parameters, which was in fact accepted as level one of the language standard (level zero was Jensen and Wirth Pascal, with some cleaning up). Kernighan calls this "the single biggest problem", even though, unlike the "no escape" problem, there are workarounds (at some cost): using a single array type of the largest size, passing indices into a single large array, and linked lists. (_Software Tools in Pascal_ uses both of the first two.) This is a very fair criticism, widely made long before this paper, and remedied in ISO Pascal and almost all Pascal descendants (Per Brinch Hansen, in Edison, maintained that the only problem here was string literal notation).

Kernighan complains "There is no escape", meaning no way to breach the strong typing and no way to call external subroutines to replace the standard environment. Certainly this is an insurmountable problem if your implementation can't do one of these and you need it; then you get another implementation. If your implementation can do this, and whether the language guarantees that it is possible or not, there is still a problem of portability, since one deals with underlying hardware representations and the other with the operating system. No matter how nice or ugly the features that do these, you should still isolate their uses behind a small number of primitives, as in both _Software Tools_ books. (I place bitwise operators on integers here, because in Pascal they would require either a type breach to get a small set or an external subroutine.)

Throughout the paper, Kernighan sets up a strawman of "standard Pascal" in "its pure form". The book _Comparing and Assessing Programming Languages: Ada, C and Pascal_ (1984), where Kernighan's paper was first published, contains other papers that comment on C: "To date [198?], a complete definition of the C syntax has not been published." (Feuer and Gehani); "Even though most of the existing compilers are built by a rather close-knit group at Bell Labs and MIT, there are enough differences" that C programs are not portable, and "there is no clearly defined subset of [C] that would guarantee portability." (Mateti); "K&R is not a suitable Language Reference Manual.... K&R is neither accurate nor complete" (Evans); "no two compilers accept the same syntax" (Anderson). One could fairly conclude that, in theory, neither C nor Pascal was usable in 1981; in practice, both were used successfully. Real Pascal implementations had extern and fortran directives (as in Jensen and Wirth), and documented data representations sufficiently to use variant records to breach strong typing as an extension, or provided other extensions (such as a pointer type compatible with all pointers) for "unsafe" system programming. ISO Pascal required documentation of extensions, so extensions do not innately violate the spirit or letter of the standard. Kernighan scores a lot of cheap points on Pascal by demanding a level of portability that C couldn't meet. Just as one would demand a C implementation with the standard C library (not part of the language in 1981), one would demand a standard Pascal compiler with whatever extensions one needed.

Kernighan criticizes Pascal I/O strongly, but the only example of a problem (beyond the "no escape" point) is that interactive input devices may read the first input before the first prompt is output. I never met such a Pascal compiler, and ISO Pascal now calls for "lazy I/O" for interactive files. Kernighan says this requires "very careful implementation" to fix and is "relatively costly", but this is nonsense--it's about the same difficulty as coding ungetc() in C, which is already needed for scanf("%d",&n) to know when to stop, and any time and space cost is relative to the slowness of an interactive input device and the small number of such devices. Even in an implementation with this problem, a Pascal program can solve it for all but the first line of a text file by not reading the end-of-line until it is time to read the next line; only the first prompt is lost. (Since when do Unix/C programmers want programs to produce prompts, anyway? Next thing you know someone would demand error checking and readable error messages. Programs in _Software Tools in Pascal_ don't give prompts.)
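
To make the cost comparison concrete, here is a minimal sketch (my own, in C) of the one-character pushback that ungetc() amounts to; the "lazy" read an interactive Pascal implementation needs is no more machinery than this.

    /* Sketch: a one-slot pushback buffer over getchar(), roughly what
       ungetc() provides; function names are just for the illustration. */
    #include <stdio.h>

    static int pushed;           /* the pushed-back character, if any */
    static int have_pushed = 0;  /* is the slot occupied? */

    static int get1(void)        /* read, honouring the pushback slot */
    {
        if (have_pushed) { have_pushed = 0; return pushed; }
        return getchar();
    }

    static void unget1(int c)    /* push one character back */
    {
        pushed = c;
        have_pushed = 1;
    }

    int main(void)
    {
        int c = get1();          /* peek at the next input character ... */
        unget1(c);               /* ... and put it back, unconsumed */
        printf("next character has code %d\n", get1());
        return 0;
    }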

The only other major problem (one that can't be fixed by a preprocessor) is efficiency (at compile-time or run-time). (The preprocessor fixes of minor problems should not affect run-time efficiency, but compilation would be slower; this is a valid concern, but not itself major, since a Fortran preprocessor was acceptable for _Software Tools_.) Most of Kernighan's complaints here confuse the language with its implementation; for example, his assertion that a set test like c in [ 'a' .. 'z', 'A' .. 'Z', '0' .. '9' ] must be slower than a range or array test fails on three counts--it's not certain that the implementation of the set isn't an (unpacked) array already, it's not certain that the compiler wouldn't optimize membership in a constant set to a range or array test, and in any event this is an issue for the implementation rather than the language. Similarly, assignments in the main block instead of initializers might be optimized away, so that the program need not be "bigger at run time" because Pascal lacks initializers. To quote _Elements of Programming Style_ again, "initialize variables with executable code" and "Write clearly--don't sacrifice clarity for "efficiency"" and "Let your compiler do the simple optimizations".
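
To illustrate the first two counts, a minimal sketch of what a compiler or run-time can do with a constant set: membership becomes one indexed load (the sketch assumes 8-bit characters, and the loop bounds assume a contiguous 'a'..'z' as in ASCII).

    /* Sketch: a constant set compiled to a lookup table -- membership in
       ['a'..'z','A'..'Z','0'..'9'] is a single indexed load. */
    #include <stdio.h>

    static unsigned char in_set[256];     /* one byte per character code */

    static void init_set(void)            /* done once, e.g. at compile time */
    {
        for (int c = 'a'; c <= 'z'; c++) in_set[c] = 1;
        for (int c = 'A'; c <= 'Z'; c++) in_set[c] = 1;
        for (int c = '0'; c <= '9'; c++) in_set[c] = 1;
    }

    int main(void)
    {
        init_set();
        printf("%d %d\n",
               in_set[(unsigned char)'q'],    /* 1: in the set */
               in_set[(unsigned char)'#']);   /* 0: not in the set */
        return 0;
    }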

Separate compilation may be an efficiency concern for preparing large programs. Hoare gives convincing arguments against separate compilation in "Hints on Programming Language Design" (1973), although K&R C answers one of Hoare's points in that checking between compilation units is pretty much as strict as checking within a compilation unit (i.e., none; K&R C made no checks on type or even number of arguments). Given the choice between an insecure separate compilation facility and having to compile programs "all at once", I would prefer the latter; a secure separate compilation facility, as in Modula-2 and Ada, would certainly be an advantage. But large programs may only afford the choice of insecure separate compilation[*], so I'll grant this, with some reservations, as a fair criticism. (That vendors can distribute libraries in proprietary non-portable formats is not an advantage, though.)

[*] Ehud in e-mail asked why this is true (in addition to several other constructive suggestions). It only helps my argument if it is not true; but from a 1981 point of view, the bound of what could be compiled all at once might be encountered too soon, and secure separate compilation might require tracking many type definitions that have the same underlying representation (overhead that insecure compilation would avoid). It seems an unlikely scenario, and could probably be addressed by a slower compiler with more passes.

MINOR FLAWS

The minor flaws are in most cases arguable; many of them simply express Kernighan's preference for C syntax, semantics or programming style, and it is not surprising that Pascal is a poor language to write C in. Certainly Pascal does have a number of minor but irritating problems, some of which could have been avoided at essentially no cost; but every language has these, in greater or lesser number, and some are a matter of taste (Lisp enthusiasts usually snicker at arguments over the baroque syntax of imperative programming languages). But some of the minor points in Kernighan's paper reflect poorly on its quality, and others simply don't hold up (even without the benefit of hindsight).

Kernighan says that crypt was omitted from _Software Tools in Pascal_ because he couldn't write a sensible bitwise xor function. But the xor function in _Software Tools_ was written only in terms of non-standard Fortran operators, and the compress program was changed between the two editions specifically to avoid the problems that might ensue from representing counts by arbitrary non-printable characters, which seems to be a more likely reason to exclude crypt. (Without more information about what sensible means, it is hard to suggest a solution to this; certainly a portable xor could be written with a significant but not impossible slowdown, and the programs in _Software Tools in Pascal_ "do not have to be efficient". I would prefer to write crypt in Pascal using file of set of 0..7 (on a machine with 8 bit characters) or file of byte, to avoid problems with I/O of non-printable characters.)
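
As an illustration of that "significant but not impossible slowdown", a sketch of exclusive-or built bit by bit from division and remainder -- written in C for brevity, but using only arithmetic that standard Pascal also has.

    /* Sketch: xor from div/mod/add only; slow but portable. */
    #include <stdio.h>

    static unsigned portable_xor(unsigned a, unsigned b)
    {
        unsigned result = 0, bit = 1;
        while (a != 0 || b != 0) {
            if (a % 2 != b % 2)    /* low bits differ -> result bit is 1 */
                result += bit;
            a /= 2;
            b /= 2;
            bit *= 2;
        }
        return result;
    }

    int main(void)
    {
        printf("%u\n", portable_xor(0xCAu, 0x53u));   /* prints 153, i.e. 0x99 */
        return 0;
    }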

Kernighan quotes Gannon and Horning's paper approvingly with regard to the semicolon as a terminator, but ignores another of their conclusions: "The persistence of assignment errors in TOPPS calls into serious question the treatment of the assignment symbol := as 'just another operator.' Expression-oriented languages using this convention (e.g., Algol 68) may cause unsuspected reliability problems." Why cite only the result of Gannon and Horning that puts Pascal in a bad light? (The comments about putting semicolons in "proper places" ignore that Pascal syntax explicitly allows for a terminating semicolon before the end of a case statement. C semicolon placement is of course also tricky, since compound statements and preprocessor directives do not end with a semicolon, and macros confuse the issue.)
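
For the record, the conclusion Kernighan omits points straight at a familiar C pitfall; a minimal sketch:

    /* Sketch: assignment as "just another operator" -- the first test
       assigns rather than compares, yet it compiles. */
    #include <stdio.h>

    int main(void)
    {
        int x = 0, y = 5;
        if (x = y)           /* assignment: x becomes 5, condition is true */
            printf("assigned: x is now %d\n", x);
        if (x == y)          /* comparison: what was probably intended */
            printf("equal\n");
        return 0;
    }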

Elsewhere he says that Pascal lacks a null string "perhaps because Pascal uses the doubled quote notation to indicate a quote embedded in a string"; of course, the obvious reason is that '' would have type packed array[1 .. 0] of char, and the corresponding type in C is also illegal (i.e., char nullstring[0]). Pascal versions with builtin varying length string types permit '' with no difficulties. Why speculate like this when the real reason is so easy to discover?

Kernighan calls reference parameters "nice" but "the information is only in one place, while two places would be better"; presumably this refers to using &x and *x in C calls. But f(x) in C can change x if x is an array, or if f is a macro, or if x is a global variable; x may already be a pointer. This argument is noteworthy as the rare case where C advocates prefer _more_ annotation of programs than "bondage and discipline" languages! A Pascal compiler could easily annotate code with the comment {var} before each such argument; in C the preprocessor alone makes this impossible, and the problem may not even be well-defined.
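
A minimal sketch of the three cases just listed -- macro, array, global -- none of which announces itself with an & at the call site (the names are only for the illustration).

    /* Sketch: three ways f(x) changes x in C with no & written by the caller. */
    #include <stdio.h>

    #define bump(v) ((v)++)                   /* f is a macro */

    static int g;                             /* x is a global the callee sees */
    static void set_global(int unused) { (void)unused; g = 42; }

    static void fill(int a[]) { a[0] = 7; }   /* x is an array: decays to a pointer */

    int main(void)
    {
        int x = 0;
        int arr[1] = { 0 };
        bump(x);               /* x is now 1 */
        fill(arr);             /* arr[0] is now 7 */
        set_global(g);         /* g is now 42 */
        printf("%d %d %d\n", x, arr[0], g);
        return 0;
    }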

Kernighan notes that "There is no macro processor" in Pascal, but experience with extensible languages in 1981 was probably sufficient to conclude that this would have been a bad thing. (Dennis Ritchie at the HOPL-II closing panel said "In the specific case of C, in particular, I think that the whole pre-processor stuff is basically wrong".) An interesting side effect of the C preprocessor is that it becomes difficult to add another preprocessor--it must either deal with the obfuscation of un-preprocessed C code, or with the contents of standard header files and other features implemented by preprocessing.

The complaint that non-graphic characters can be handled only in an "ad hoc" manner such as knowing character codes illustrates a significant point: this is true to some degree in every language that doesn't mandate a particular character set, including C. C only pushes the boundaries out a bit; so ANSI C can handle named files and a small number of standard non-graphic characters, but cannot handle directories without an extension or other non-graphic characters except through "ad hoc" knowledge of the character set. (Newlines are handled in Pascal by readln, writeln and eoln; the original implementation had no non-graphic characters at all.)
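
A small sketch of where the boundary falls: the language names a handful of non-graphic characters, and everything past them depends on knowing the character codes (the 27 below is ESC only under ASCII, which is exactly the point).

    /* Sketch: portable escapes versus ad hoc character codes. */
    #include <stdio.h>

    int main(void)
    {
        char newline = '\n';     /* portable: named by the language  */
        char tab     = '\t';     /* portable: named by the language  */
        char escape  = 27;       /* ad hoc: 27 is ESC only in ASCII  */
        printf("%d %d %d\n", newline, tab, escape);
        return 0;
    }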

OTHER FLAWS

Kernighan omits many criticisms of Pascal, so the paper is far from complete. Perhaps he didn't mention the problems C and Pascal share because C programming style didn't uncover them, or because any problem in both languages seemed universal and unfixable (even where other languages did better in 1981; e.g., exception handling). For example, he notes that Pascal's new procedure has no way to return control if it fails (where C's malloc returns NULL), but ignores the same problem for subroutine calls in both languages (if there is insufficient memory for local variables), which undercuts the value of recursion. For another example, case labels in Pascal don't allow ranges like '0' .. '9' but set constants do, an inconsistency that only points up the even bulkier C case labels.
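
A minimal sketch of the bulk, for '0' .. '9' alone; standard C of the period has no case ranges (some compilers add them as an extension).

    /* Sketch: without case ranges, '0' .. '9' takes ten labels. */
    #include <stdio.h>

    static const char *classify(char c)
    {
        switch (c) {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            return "digit";
        default:
            return "other";
        }
    }

    int main(void)
    {
        printf("%s %s\n", classify('7'), classify('x'));
        return 0;
    }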

PASCAL STRENGTHS

I think that Kernighan severely undervalues strong typing and other safety features. Where mentioned, they are marginalized: "of course compilers may not actually do all the checking implied in the language definition", and "This too seems like a service, although of course run-time checking does exact a penalty" or "Occasionally Pascal's type checking would warn of a slip of the hand in writing a program" (which would more commonly be known as an "error"). It's relatively easy to add features to a safe language, and very difficult to add safety to a flexible language. Ada is indeed a good example of the former among languages based on Pascal, and is apparently measurably better than C (see "Comparing Development Costs of C and Ada" by Stephen Zeigler).

Kernighan dismisses Pascal as only a "teaching" language, although the ISO standard--including the draft Kernighan cites--is a consequence of growing commercial interest in Pascal, which had attributes going far beyond its original goals. Kernighan says of comparing C and Pascal that "one is meant for getting something done while the other is meant for learning". Almost the same could be said of Cobol (the pre-eminent "getting something done" language for business) and Algol 60 (replacing "learning" with "describing computational processes"; it didn't even have I/O!), and Kernighan's paper is a bit like a Cobol advocate denouncing Algol 60 in 1970, when successors Algol 68 and Pascal were already designed and Algol 60 was becoming marginal. Statements like "my preconception was that Pascal is a wordier and less expressive language" (than Ratfor) hardly promote a sense of objectivity.

Kernighan's paper rehashes well-known criticisms of Pascal, provides some dubious criticisms and omits some better supported criticisms that C shares. The overall tone is attractive primarily to C advocates in flame wars against Pascal. The double standard between _Software Tools_ (and other writings) and _Software Tools in Pascal_ especially reveals bias; perhaps Fortran was a toy language, but not saying that also suggests bias. There are arguments for and against publishing this paper in a book (as Feuer and Gehani did) or a technical journal (apparently it was rejected by several). I can see no good argument for nominating it as one of the "Best Language Design Papers".

Ehud Lamm - Re: Best Language Design Papers (survey)
10/15/2002; 6:58:22 AM (reads: 2954, responses: 0)
Ada83 supported separate compilation. This was 1983. Maybe in '81 separate compilation was still science fiction...