> True, I don't like language bashing. But reasoned critiques
> are always welcome (we even have a department for them!)
> The reason I like this paper is that it tries to show how
> important small details are, when it comes to language design.
>
> The treatment of variable sized arrays is a great example.
>
> Classic Pascal forces you to write a Sort routine for each
> array length. Yuck!
>
> C doesn't enforce array length checking, so you get flexibility
> without safety.
>
> Ada provides unconstrained array types, which safely provide
> the flexibility we need.
It is a great example. It was great in Habermann's paper (1973), great
when Lecarme and Desjardins (1975) called it a valid point, great when
Wirth (1975) suggested an extension addressing it, great when the Ada
rationale (1979) discussed that flexible safe approach, and still great
when the ISO Pascal draft (1980) included conformant array parameters to
deal with it (just to draw from Kernighan's citations). It's also the
most significant criticism of Pascal in Kernighan's paper that holds up.
Let me make clear that I am not defending Pascal, which deserves and has
received many fair criticisms; rather, I am criticizing Kernighan's
paper, which requires pointing out some of its unfair criticisms of Pascal.
Bias pervades Kernighan's paper, specifically his uneven treatment of
the same issues in Pascal and in other languages. Some of the double
standard may be explained by shifts in his views from earlier writings,
but the absence of any comment on a change in his opinion of Fortran
(for example) is itself conspicuous.
In _Elements of Programming Style_ (2nd ed 1978), Kernighan and Plauger
advise "Use the good features of a language; avoid the bad ones". In
_Software Tools_ (1976) they put this into practice: bare Fortran, "a
poor language indeed for programming", was made better by a Ratfor
preprocessor that added control structures and improved the "cosmetics"
of the language, although other weaknesses were left alone. They
recommended using whatever Fortran preprocessor was available--"the
payoff from any preprocessing is enormous".
In _Software Tools in Pascal_ (1981) they take another approach,
wallowing in Pascal deficiencies (12 entries in the index). It seems
appropriate to me to consider primarily the major flaws of Pascal
that could not be remedied by a reasonable preprocessor comparable to
the Ratfor preprocessor used in _Software Tools_ (pending further
information on whether Fortran was also a toy language), and only
secondarily the minor flaws that a preprocessor could fix (primarily
matters of taste). Kernighan doesn't list many major flaws: array
size, "There is no escape", interactive input, and efficiency concerns.
MAJOR FLAWS
Array sizes are part of the array type. As Kernighan notes, ISO Pascal
provided a solution for parameters, which was in fact accepted as level
one of the language standard (level zero was Jensen and Wirth Pascal,
with some cleaning up). Kernighan calls this "the single biggest
problem", even though, unlike the "no escape" problem, it has
workarounds (at some cost): using a single array type of the largest
size, passing indices into a single large array, and linked lists.
(_Software Tools in Pascal_ uses both of the first two.) This is a very fair
criticism, widely made long before this paper, and remedied in ISO
Pascal and almost all Pascal descendants (Per Brinch Hansen in Edison
maintained that the only problem here was string literal notation).
Kernighan complains "There is no escape", meaning no way to breach the
strong typing and no way to call external subroutines to replace the
standard environment. Certainly this is an insurmountable problem if
your implementation can't do one of these and you need it; then you get
another implementation. If your implementation can do this, and
whether the language guarantees that it is possible or not, there is
still a problem of portability, since one deals with underlying
hardware representations and the other with the operating system. No
matter how nice or ugly the features that do these, you should still
isolate their uses behind a small number of primitives, as in both
_Software Tools_ books. (I place bitwise operators on integers here,
because in Pascal they would require either a type breach to get a
small set or an external subroutine.)
Throughout the paper, Kernighan sets up a strawman of "standard Pascal"
in "its pure form". The book _Comparing and Assessing Programming
Languages: Ada, C and Pascal_ (1984), where Kernighan's paper was first
published, contains other papers that comment on C: "To date [198?], a
complete definition of the C syntax has not been published." (Feuer and
Gehani); "Even though most of the existing compilers are built by a
rather close-knit group at Bell Labs and MIT, there are enough
differences" that C programs are not portable, and "there is no clearly
defined subset of [C] that would guarantee portability." (Mateti); "K&R
is not a suitable Language Reference Manual.... K&R is neither
accurate nor complete" (Evans); "no two compilers accept the same
syntax" (Anderson). One could fairly conclude that, in theory, neither
C nor Pascal was usable in 1981; in practice, both were used
successfully. Real Pascal implementations had extern and fortran
directives (as in Jensen and Wirth), and documented data
representations sufficiently to use variant records to breach strong
typing as an extension, or provided other extensions (such as a pointer
type compatible with all pointers) for "unsafe" system programming.
ISO Pascal required documentation of extensions, so extensions do not
innately violate the spirit or letter of the standard. Kernighan
scores a lot of cheap points on Pascal by demanding a level of
portability that C couldn't meet. Just as one would demand a C
implementation with the standard C library (not part of the language in
1981), one would demand a standard Pascal compiler with whatever
extensions one needed.
Kernighan criticizes Pascal I/O strongly, but the only example of a
problem (beyond the "no escape" point) is that interactive input
devices may read the first input before the first prompt is output. I
have never met such a Pascal compiler, and ISO Pascal now calls for "lazy
I/O" for interactive files. Kernighan says this requires "very careful
implementation" to fix and is "relatively costly", but this is
nonsense--it's about the same difficulty as coding ungetc() in C, which
is already needed for scanf("%d",&n) to know when to stop, and any time
and space cost is relative to the slowness of an interactive input
device and the small number of such devices. Even in an implementation
with this problem, a Pascal program can solve it for all but the first
line of a text file by not reading the end-of-line until it is time to
read the next line; only the first prompt is lost. (Since when do
Unix/C programmers want programs to produce prompts, anyway? Next
thing you know someone would demand error checking and readable error
messages. Programs in _Software Tools in Pascal_ don't give prompts.)
The only other major problem (one that can't be fixed by a
preprocessor) is efficiency (at compile-time or run-time). (The
preprocessor fixes of minor problems should not affect run-time
efficiency, but compilation would be slower; this is a valid concern,
but not itself major, since a Fortran preprocessor was acceptable for
_Software Tools_.) Most of Kernighan's complaints here confuse the
language with its implementation; for example, his assertion that a set
test like c in [ 'a' .. 'z', 'A' .. 'Z', '0' .. '9' ] must be slower
than a range or array test fails on three counts--it's not certain that
the implementation of the set isn't an (unpacked) array already, it's
not certain that the compiler wouldn't optimize membership in a
constant set to a range or array test, and in any event this is an
issue for the implementation rather than the language. Similarly,
assignments in the main block instead of initializers might be
optimized away, so that the program need not be "bigger at run time"
because Pascal lacks initializers. To quote _Elements of Programming
Style_ again, "initialize variables with executable code", "Write
clearly--don't sacrifice clarity for 'efficiency'", and "Let your
compiler do the simple optimizations".
Separate compilation may be an efficiency concern for preparing large
programs. Hoare gives convincing arguments against separate
compilation in "Hints on Programming Language Design" (1973), although
K&R C answers one of Hoare's points in that checking between
compilation units is pretty much as strict as checking within a
compilation unit (i.e., none; K&R C made no checks on type or even
number of arguments). Given the choice between an insecure separate
compilation facility and having to compile programs "all at once", I
would prefer the latter; a secure separate compilation facility, as in
Modula-2 and Ada, would certainly be an advantage. But large programs
may only afford the choice of insecure separate compilation[*], so I'll
grant this, with some reservations, as a fair criticism. (That vendors
can distribute libraries in proprietary non-portable formats is not an
advantage, though.)
[*] Ehud in e-mail asked why this is true (in addition to several other
constructive suggestions). It only helps my argument if it is not true;
but from a 1981 point of view, the bound of what could be compiled all
at once might be encountered too soon, and secure separate compilation
might require tracking many type definitions that have the same
underlying representation (overhead that insecure compilation would
avoid). It seems an unlikely scenario, and could probably be addressed
by a slower compiler with more passes.
MINOR FLAWS
The minor flaws are in most cases arguable; many of them simply express
Kernighan's preference for C syntax, semantics or programming style,
and it is not surprising that Pascal is a poor language to write C in.
Certainly Pascal does have a number of minor but irritating problems,
some of which could have been avoided at essentially no cost; but every
language has these, in greater or lesser number, and some are a matter
of taste (Lisp enthusiasts usually snicker at arguments over the
baroque syntax of imperative programming languages). But some of the
minor points in Kernighan's paper reflect poorly on its quality, and
others simply don't hold up (even without the benefit of hindsight).
Kernighan says that crypt was omitted from _Software Tools in Pascal_
because he couldn't write a sensible bitwise xor function. But the xor
function in _Software Tools_ was written only in terms of non-standard
Fortran operators, and the compress program was changed between the two
editions specifically to avoid the problems that might ensue from
representing counts by arbitrary non-printable characters, which seems
to be a more likely reason to exclude crypt. (Without more information
about what sensible means, it is hard to suggest a solution to this;
certainly a portable xor could be written with a significant but not
impossible slowdown, and the programs in _Software Tools in Pascal_ "do
not have to be efficient". I would prefer to write crypt in Pascal
using file of set of 0..7 (on a machine with 8 bit characters) or file
of byte, to avoid problems with I/O of non-printable characters.)
Kernighan quotes Gannon and Horning's paper approvingly with regard to
the semicolon as a terminator, but ignores another of their
conclusions: "The persistence of assignment errors in TOPPS calls into
serious question the treatment of the assignment symbol := as "just
another operator." Expression-oriented languages using this convention
(e.g., Algol 68) may cause unsuspected reliability problems." Why give
the only result of Gannon and Horning that puts Pascal in a bad light?
(The comments about putting semicolons in "proper places" ignore that
Pascal syntax explicitly allows for a terminating semicolon before the
end of a case statement. C semicolon placement is of course also
tricky, since compound statements and preprocessor directives do not
end with semicolon, and macros confuse the issue.)
Elsewhere he says that Pascal lacks a null string "perhaps because
Pascal uses the doubled quote notation to indicate a quote embedded in
a string"; of course, the obvious reason is that '' would have type
packed array[1 .. 0] of char, and the corresponding type in C is also
illegal (i.e., char nullstring[0]). Pascal versions with builtin
varying length string types permit '' with no difficulties. Why
speculate like this when the real reason is so easy to discover?
Kernighan calls reference parameters "nice" but "the information is
only in one place, while two places would be better"; presumably this
refers to using &x and *x in C calls. But f(x) in C can change x if x
is an array, or if f is a macro, or if x is a global variable; x may
already be a pointer. This argument is noteworthy as the rare case
where C advocates prefer _more_ annotation of programs than "bondage
and discipline" languages! A Pascal compiler could easily annotate
code with the comment {var} before each such argument; in C the
preprocessor alone makes this impossible, and the problem may not even
be well-defined.
Kernighan notes that "There is no macro processor" in Pascal, but
experience with extensible languages in 1981 was probably sufficient to
conclude that this would have been a bad thing. (Dennis Ritchie at the
HOPL-II closing panel said "In the specific case of C, in particular, I
think that the whole pre-processor stuff is basically wrong".) An
interesting side effect of the C preprocessor is that it becomes
difficult to add another preprocessor--it must either deal with the
obfuscation of un-preprocessed C code, or with the contents of standard
header files and other features implemented by preprocessing.
The complaint that non-graphic characters can be handled only in an "ad
hoc" manner such as knowing character codes illustrates a significant
point: this is true to some degree in every language that doesn't
mandate a particular character set, including C. C only pushes the
boundaries out a bit; so ANSI C can handle named files and a small
number of standard non-graphic characters, but cannot handle
directories without an extension or other non-graphic characters except
through "ad hoc" knowledge of the character set. (Newlines are handled
in Pascal by readln, writeln and eoln; the original implementation had
no non-graphic characters at all.)
OTHER FLAWS
Kernighan omits many criticisms of Pascal, so the paper is far from
complete. Perhaps he didn't mention the problems C and Pascal share
because C programming style didn't uncover them, or because any problem
in both languages seemed universal and unfixable (even where other
languages did better in 1981; e.g., exception handling). For example,
he notes that Pascal's new procedure has no way to return control if it
fails (where C's malloc returns NULL), but ignores the same problem for
subroutine calls in both languages (if there is insufficient memory for
local variables), which undercuts the value of recursion. For another
example, case labels in Pascal don't allow ranges like '0' .. '9' but
set constants do, an inconsistency that only points up the even bulkier
C case labels.
PASCAL STRENGTHS
I think that Kernighan severely undervalues strong typing and other
safety features. Where mentioned, they are marginalized: "of course
compilers may not actually do all the checking implied in the language
definition", and "This too seems like a service, although of course
run-time checking does exact a penalty" or "Occasionally Pascal's type
checking would warn of a slip of the hand in writing a program" (which
would more commonly be known as an "error"). It's relatively easy to
add features to a safe language, and very difficult to add safety to a
flexible language. Ada is indeed a good example of the former among
languages based on Pascal, and is apparently measurably better than C
(see "Comparing Development Costs of C and Ada" by Stephen Zeigler).
Kernighan dismisses Pascal as only a "teaching" language, although the
ISO standard--including the draft Kernighan cites--is a consequence of
growing commercial interest in Pascal, which had attributes going far
beyond its original goals. Kernighan says of comparing C and Pascal
that "one is meant for getting something done while the other is meant
for learning". Almost the same could be said of Cobol (the pre-eminent
"getting something done" language for business) and Algol 60 (replacing
"learning" with "describing computational processes"; it didn't even
have I/O!), and Kernighan's paper is a bit like a Cobol advocate
denouncing Algol 60 in 1970, when successors Algol 68 and Pascal were
already designed and Algol 60 was becoming marginal. Statements like
"my preconception was that Pascal is a wordier and less expressive
language" (than Ratfor) hardly promote a sense of objectivity.
Kernighan's paper rehashes well-known criticisms of Pascal, provides
some dubious criticisms and omits some better supported criticisms that
C shares. The overall tone is attractive primarily to C advocates in
flame wars against Pascal. The double standard between _Software
Tools_ (and other writings) and _Software Tools in Pascal_ especially
reveals bias; perhaps Fortran was a toy language, but not saying that
also suggests bias. There are arguments for and against publishing
this paper in a book (as Feuer and Gehani did) or a technical journal
(apparently it was rejected by several). I can see no good argument
for nominating it as one of the "Best Language Design Papers".