What Are The Resolved Debates in General Purpose Language Design?

In the history of PL design there has been quite a bit of vitriol spilled over design choices. These days bringing up "static vs dynamic typing" is a good way to generate some heat on a cold winter web forum.

But a few debates seem to have been pretty firmly resolved, at least for general purpose languages. GP languages have considered goto harmful for decades now, dynamic scoping is seen as a real oddity, and manual memory management is a choice only made by systems language designers.

Questions for LtU: What other debates have been "resolved" in the sense that general purpose languages are hardly ever designed with an alternative to the consensus? What debates will be resolved in the near future? And what apparently resolved debates deserve to be reopened?

One rule: "static vs dynamic typing" will be considered off topic because clearly and emphatically that debate hasn't been resolved and has no prospect of being resolved any time soon. Also, it's boring.

how about text syntax?

historically, most programming systems have been text based; there was some debate on the topic, but very few non-text examples exist, so this seems pretty firmly settled, and not only for mainstream languages :)

here's another one...

Most, if not all, current mainstream languages tend to use the same strategy to fix different problems in their design: "make the IDE fix it and it is not a problem anymore".

Thus, the user experience and productivity in such languages may vary greatly based not on the experience of the user (forgive the pun) or one's knowledge of the domain, but rather on the IDE used and one's familiarity with it. The situation is quite the contrary with languages like CL or the ML family; even Python is a good example.

...abstraction via the IDE :)

The refactoring browser

is the extreme example of this--browser-oriented metaprogramming.

the fun part about refactoring browsers is...

that most of them re-implement much of the language in parallel with the compiler/interpreter, thus introducing a great probability of errors :)

Which is why language

Which is why language implementations (at least the front end) should be simple and easy to write... even better if the same frontend used by the compiler can support the IDE too. I've had my share of headaches with IntelliJ and Eclipse finding different errors than javac, for example.

Or...

means to parse the language ought to be part of the standard library, so toolsmiths don't have to reinvent that wheel.

indeed...

an extensible and usable library at that, not just stub code. In addition, I'd say that the main condition for such a library to even pretend to be meaningful is that it must be the same code used to actually parse the language...

This is a real problem in some languages I know :)

Though, I still think that things like refactoring should be a feature of the language or its library rather than of the editor or an IDE. And how that should be done is a very interesting question :)

Fine, as long as there still

Fine, as long as there still exists a specification outside of the implementation, like a formal semantics or a language specification document.

Is Formal Semantics a Resolved Debate?

Is it? (and it should be clear I speak of having one independent of language implementation, not of agreement across languages).

I'm certainly with you in the camp that thinks of languages in terms of a (potentially abstract) syntax with an associated formal semantics. And I would assume that most here on LtU agree. But I often learn that such assumptions are either wrong or come from preferring to hang out with people that think as I do.

I can name PHP and Perl among the languages for which I've never found a formal semantics. I also imagine PLT guys are somewhat biased on the issue. I wonder how many DSLs receive formal semantics, and how many general purpose languages still receive a semantics that is highly tied into the development of a particular language implementation (to the point that the two are hardly distinguishable).

As far as I know

There are no languages in widespread use with a formal semantics. And the one coming closest might actually be Java, given the number of verification tools for it.

Formal Semantics Unresolved

Well, then I guess we can call the issue of formal semantics thoroughly unresolved regardless of wishful thinking to the contrary.

Yeah, what would you do with it anyway...

I am with you on the ideal.

But say you had a formal semantics for a language. What would you do with it? Write conformance tests, or check the compiler implementation? Tests are never exhaustive enough; and to check the compiler, you would need a formal semantics for the language the compiler is written in, as well as for the output language.

No language expressive

No language expressive enough to formally check its own implementation in terms of a formally specified semantics has ever come to my attention. I wonder if such a thing might run into problems with Gödel's Incompleteness Theorem and Rice's Theorem.

Despite that, one might be able to verify a compiler or interpreter to extreme levels of confidence, higher than for other parts of programs. A bootstrap compiler/interpreter might only cover a tiny fraction of the semantic space for the language being compiled, so one may be able to verify it more easily than a compiler written in the language itself.

There are also some major advantages to a formal semantics, particularly when it comes to proving properties that should hold true for programs assuming the implementation meets the formally specified semantics. It should be easier to reason about programs in reference to an assumed faultless compiler than it is to simultaneously reason about failures in both a program and its implementation environment.

Idealistically, the existence of a formal semantics for the language also allows one to clearly determine where blame belongs. Unfortunately, in practice, 'backwards compatibility' issues intervene. If you demand that programmers code to the specification, their programs might not work due to implementation bugs, and they'll hate you for that reason (but at least you'll hear about the bugs and be able to fix them). If you allow programmers to code to the implementation, their programs might work, but they'll hate you if you break their code by fixing (or even optimizing) the implementation.

I guess that just proves that the blame-game is negative-sum: you lose no matter what.

Still, my personal opinion is that you're better off demanding they follow the specification... forcing the issue as much as possible by having the debug implementation hide implementation details: checking for dependencies where possible, and 'randomizing' dependencies where not (such as message arrival orders, delays, disruptions, etc.).

If we're interested in

No language expressive enough to formally check its own implementation in terms of a formally specified semantics has ever come to my attention.

If we're interested in HM-typed languages, unless I misunderstood something, they can specify themselves given polymorphic let. It's probably a weaker specification than you were thinking of, however.

No language expressive

No language expressive enough to formally check its own implementation in terms of a formally specified semantics has ever come to my attention. I wonder if such a thing might run into problems with Gödel's Incompleteness Theorem and Rice's Theorem.

The incompleteness theorem does say something about what's possible, but we can have a language that can "establish" the correctness of an implementation of itself. Of course, you'll need to convince yourself by some other means that what the language establishes is in fact the correct thing.

If you demand programmers code to the specification, their programs might not work due to implementation bugs, and they'll hate you for that reason (but at least you'll hear about the bugs and be able to fix them).

Ideally, the specification is the implementation, or rather is a formally checkable interface to the implementation. Ideally, the portion you specify and check on paper should be very small.

Agreed, but...

Ideally, the specification is the implementation, or rather is a formally checkable interface to the implementation.

This really doesn't prevent programmers from coding to the implementation. It's still easy to learn implementation details experimentally, e.g. by probing the code with unit tests. For example, programmers might learn that the arrival order of messages in an actors-model language is pretty well fixed on a local machine. If programmers take advantage of this 'fact', they've still coded to the implementation, regardless of the language specification being an executable interface to the implementation, and features that the language is intended to offer (such as distribution, or a variety of optimizations) will likely fail.

Even if an implementation is a correct one, it will take some special effort to actually force programmers to code to the specification rather than to the implementation. At least for the debug implementation, one will intentionally need to integrate pseudorandom message arrival orders into an actors-model language. If programmers later decide they need local ordering of messaging, you'll need to update the language specification to support some form of locality (akin to a 'kell' in kell calculus) and allow programmers to control message ordering and scheduling for messages passed within a kell.

And then one will need to confirm the modified specification and its implementation.
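
As a rough sketch of that pseudorandom-ordering idea (hypothetical Python, not any real actor runtime): a debug-mode mailbox that deliberately scrambles delivery, so programs that silently depend on local FIFO ordering fail early rather than in the field.

    import random

    class DebugMailbox:
        # In debug mode, deliver pending messages in a pseudorandom order,
        # so code that relies on the implementation's incidental FIFO
        # ordering breaks early. In production mode, plain FIFO.
        def __init__(self, debug=True, seed=None):
            self.debug = debug
            self.rng = random.Random(seed)  # seedable, so failures reproduce
            self.pending = []

        def send(self, message):
            self.pending.append(message)

        def receive(self):
            if not self.pending:
                return None
            i = self.rng.randrange(len(self.pending)) if self.debug else 0
            return self.pending.pop(i)

Seeding the generator keeps an ordering-dependent failure reproducible once you've found it.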

True

I was just saying that if the implementation was formally checked against the specification, then you get rid of one half of what you were talking about.

Well, if I'm allowed to

Well, if I'm allowed to wiggle a bit to fit my wishful thinking into the title of this thread, I can make my statement true with either of the following assertions:

i.) Most new general purpose languages aren't "designed", but emerge spontaneously from the uncultured masses who produce such junk as PHP and Perl.
ii.) My comment applies to PL theory, not practice.

Ok, ok, just kidding. Formal semantics are really nice to have, but most people still don't bother these days, except theorists. Then again, it's pretty hard to publish research on PLT without semantics.

If the parsing library is

If the parsing library is built of parser combinators, isn't that just an executable specification?

Not in general, no...

After all, it is possible for, say, an EBNF form to be incorrect relative to a formal specification of a language that is written in another document.

Essentially, one would further need to designate a particular executable specification as the formal specification of the language itself. Not a bad start. At this point it really is just a matter of whether the executable specification language allows one to express various language concerns while avoiding semantic noise, in the form of accidentally specified implementation details. Suggesting implementation strategies as a sort of 'annotation' to the specification wouldn't hurt, so long as they are clearly suggestions.
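
For concreteness, here's what "grammar as executable specification" can look like with parser combinators, in a toy Python sketch (hypothetical code, not any particular combinator library); the grammar and the parser are literally the same object:

    # Each parser maps (text, position) to a list of (result, new_position)
    # pairs; an empty list means the parse failed at that position.

    def char(c):
        def parse(s, i):
            return [(c, i + 1)] if i < len(s) and s[i] == c else []
        return parse

    def seq(p, q):
        def parse(s, i):
            return [((a, b), k) for a, j in p(s, i) for b, k in q(s, j)]
        return parse

    def alt(p, q):
        def parse(s, i):
            return p(s, i) + q(s, i)
        return parse

    # The grammar  A ::= 'a' A | 'b'  written directly as code:
    def A(s, i):
        return alt(seq(char('a'), A), char('b'))(s, i)

    assert A("aab", 0)      # parses
    assert not A("aac", 0)  # fails

The point above still holds, though: nothing here designates this code as the specification rather than as just another implementation.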

Language defined in Language

What you describe is a general problem with things being 'implemented' in two places. If the 'definition' of a language, including its syntax rules, parser, semantics, base error-reporting standards, compiler or interpreter, etc., is provided as a standard or semi-standard library within the same language, then I expect that refactoring browsers become a great deal easier to write (even better if the above are defined extensibly, such that an IDE can integrate its own checks, error reports, optimizers, preprocessed resources for rapid edit-compile-test, etc.). The language then gets defined twice and only twice, with one of those two definitions being the language bootstrap.

Refactoring browsers are still likely to have problems with a syntax that is not fully computable without communication side-effects.

Context-free syntax

All modern languages, to my knowledge, have their syntax specified (if at all) in a grammar meta-language that's equivalent in power to Chomsky's context-free grammars. The meta-language can be BNF (originally designed for Algol-60 specification), syntax diagrams, a Yacc specification or another meta-language of equivalent power. All language aspects that cannot be handled by a context-free grammar are relegated to the fuzzy area of semantics.

It may seem strange today, but this is not the only option. Algol 68 was specified using a Van Wijngaarden (two-level) grammar, which goes beyond context-free power. I'm not aware of any other examples, but I think the point stands: context-free has won.

I think Python's nesting

I think Python's nesting based on whitespace makes it context-sensitive.

Python's nesting is at the lexical level

No, the indentation levels are handled on the lexical level. It's as if every increase in indentation is replaced by a block-start token (i.e., a '{' in C) and every decrease by a block-end.

This algorithm is not context-sensitive, unless you think of the previous indentation level as context. Haskell is similar in this regard, but its alternative syntax with visible curly braces and semicolons makes it more obvious.
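
A simplified sketch of that trick in Python (ignoring tabs, continuation lines, and inconsistent-dedent errors): the lexer keeps a stack of indentation widths and emits INDENT/DEDENT tokens exactly where a C-style language would have '{' and '}'.

    def tokenize_indentation(lines):
        levels = [0]                      # stack of open indentation levels
        tokens = []
        for line in lines:
            if not line.strip():          # blank lines don't affect nesting
                continue
            width = len(line) - len(line.lstrip(' '))
            if width > levels[-1]:
                levels.append(width)
                tokens.append('INDENT')   # the '{' of brace languages
            while width < levels[-1]:
                levels.pop()
                tokens.append('DEDENT')   # the '}'
            tokens.append(('LINE', line.strip()))
        while len(levels) > 1:            # close blocks still open at EOF
            levels.pop()
            tokens.append('DEDENT')
        return tokens

    print(tokenize_indentation(["if x:", "    f()", "    g()", "h()"]))
    # [('LINE', 'if x:'), 'INDENT', ('LINE', 'f()'), ('LINE', 'g()'),
    #  'DEDENT', ('LINE', 'h()')]

The parser downstream then sees an ordinary context-free token stream.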

Lexical level is part of the grammar

Although in a language description a distinction is made between the lexical level and the syntactical level, together they describe the grammar of the language.

Since the lexer needs to remember the indentation level, which can only be expressed in a context-sensitive grammar, the grammar as a whole is context-sensitive.

Right

In practice one can use two CFGs: one that describes the tokens, and another for the parser. The context dependency comes in precisely in the post-tokenization step, which is not treated declaratively. Post-tokenization then emits a modified token stream.

I guess one can get around the notorious ambiguities of C in the same way, replacing a NAME token by a TYPENAME token when it matches a type name stored in a dynamically built symbol table. Not sure one can tame C++ that way, though; I've never seen C++ grammars that were useful.
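
That NAME-to-TYPENAME rewrite is the classic "lexer hack" for C. A deliberately naive Python sketch (real frontends recognize typedefs much more carefully):

    def retag_typenames(tokens):
        # tokens: list of (kind, text) pairs. Rewrites NAME -> TYPENAME for
        # names introduced by a crudely recognized "typedef ... name ;".
        typenames = set()
        out = []
        for i, (kind, text) in enumerate(tokens):
            if kind == 'NAME' and text in typenames:
                kind = 'TYPENAME'              # the context-dependent retag
            elif kind == 'NAME' and i + 1 < len(tokens) \
                    and tokens[i + 1] == ('PUNCT', ';'):
                j = i                          # scan back to statement start
                while j > 0 and tokens[j - 1] != ('PUNCT', ';'):
                    j -= 1
                if tokens[j] == ('KEYWORD', 'typedef'):
                    typenames.add(text)        # symbol table grows as we go
            out.append((kind, text))
        return out

    src = [('KEYWORD', 'typedef'), ('KEYWORD', 'int'), ('NAME', 'myint'),
           ('PUNCT', ';'), ('NAME', 'myint'), ('NAME', 'x'), ('PUNCT', ';')]
    print(retag_typenames(src))  # the second 'myint' comes out as TYPENAME

With the stream patched this way, "myint x;" is unambiguous for a context-free parser.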

Associative arrays

It seems to be accepted now that a language should have an associative array type. All the "scripting languages" have one, and even C++ has one in the STL.

Even stronger

In fact, several major "scripting languages" are nothing but an associative array type and some syntactic sugar. ;)
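
The quip is nearly literal in Python, for what it's worth: an ordinary object is a dict plus attribute sugar.

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    p = Point(1, 2)
    print(p.__dict__)      # {'x': 1, 'y': 2}: the attributes live in a dict
    p.__dict__['z'] = 3    # writing the dict is writing the object
    print(p.z)             # 3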

It's one of those

It's one of those worse-is-better things. A couple of years back it was "resolved" that associative arrays were a bad general-purpose data structure, and no respectable language had them. We were told that you could always code them from scratch. So much for resolved debates.

DWIM (Do What I Mean) (InterLisp feature)

DWIM stands for Do What I Mean. In this post, I am referring to a feature of an old Lisp dialect called Interlisp (I've never used Interlisp myself; I've only read about it). It seems to me that the community has resolved that DWIM is undesirable.

In Interlisp, the compiler/interpreter would, upon encountering an error, attempt to 'patch up' the program to do something sensible. For example, DWIM might correct DEFINEQ((FACTORIAL (LAMBDA (N) (IFFN=0 THENN 1 ESLE N*8FACTTORIALNN-1)))) to DEFINEQ((FACTORIAL (LAMBDA (N) (IF N=0 THEN 1 ELSE N*(FACTORIAL N-1))))) (example from http://extravagaria.com/Files/HOPL2-Uncut.pdf).
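
To make the flavor concrete, here's a toy Python sketch of DWIM-style repair (purely illustrative; Interlisp's actual algorithm was far more elaborate): on an unknown function name, silently substitute the closest known one and keep going.

    import difflib

    # a tiny "environment" of known functions (hypothetical example)
    KNOWN = {'factorial': lambda n: 1 if n == 0 else n * KNOWN['factorial'](n - 1)}

    def dwim_call(name, *args):
        if name not in KNOWN:
            close = difflib.get_close_matches(name, list(KNOWN), n=1)
            if not close:
                raise NameError(name)
            name = close[0]            # "helpfully" patch up the program
        return KNOWN[name](*args)

    print(dwim_call('facttorial', 5))  # 120, with no hint anything was wrong

The non-locality complaint quoted below falls out immediately: which "closest known name" gets picked depends on everything else defined in the environment at the time.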

According to Thomas Bushnell, in a March 2002 discussion on comp.lang.lisp (https://groups.google.com/d/msg/comp.lang.lisp/OoGwrUcibTY/oHlRSOARXDQJ), "DWIM was a great experiment, I think; in the sense that it was worth trying, and produced a fairly clear conclusion: 'bzzt, not the right way.'" And according to Kent Pitman (http://www.nhplace.com/kent/Papers/cl-untold-story.html, as well as the same comp.lang.lisp discussion, https://groups.google.com/d/msg/comp.lang.lisp/OoGwrUcibTY/AHhj1Jk0DdIJ), a desire to have an ARPA-funded Lisp which did not have DWIM was one of the motivations for Common Lisp.

One complaint about DWIM from Thomas Bushnell was that "DWIM has a near-disastrous non-locality effect. A given piece of code will be patched up to look "sane" given the current feature set of the system. But when the feature set changes, the "nearest sane patchup" may well be very different." ( https://groups.google.com/d/msg/comp.lang.lisp/OoGwrUcibTY/oHlRSOARXDQJ )

According to Kent Pitman in ( https://groups.google.com/d/msg/comp.lang.lisp/OoGwrUcibTY/AHhj1Jk0DdIJ ), there were various horror stories involving DWIM, and "One of them I heard went like this (and again, I can't even say if this is an actual story of what DWIM would do but it gives you the idea of what it had license to do and why people both liked and feared it): A user didn't know how to use the DECL facility so figured they would just say (DECL) and that DWIM would help them. [This shows the trust aspect of a user who likes DWIM.] DWIM realized that DECL was not actually a function name so searched for a name to substitute that was better. It found DEL. [I think that was the name. It's been many years now and I'm doing this from memory. The function it found deletes files, anyway.] DEL isn't a function of no args so DWIM "helpfully" assumed (DEL NIL) to pad out a good number of args. But NIL was a wrong type of argument, so DWIM concluded a wildcard would be most appropriate ... Well, you get the idea. A lot of people just wanted DWIM turned off, and you can sort of see why. If you think Common Lisp has fuzzy semantics, or you think some of my stories about Maclisp make it sound like the unruly wild, wild west of Lisp, you should think of Interlisp as all the more --- well, not unruly but weirdly ruly.

BUT, you couldn't just turn DWIM off because it had been around a long time and lots of code needed it to survive."

It seems to me that the

It seems to me that the community has resolved that DWIM is undesirable.

Well, it seems Martin Rinard has made a career of this :)

Nothing is resolved

Rather, every decision in designing a language depends on the goals of the designer. The art is to make the choices consistent, in a way that each design decision supports the others, resulting in a symmetric, consistent, yet conceptually simple language.

Relatively relative

Those who try for a single way of doing things do tend to underestimate the differences between problem domains; but, symmetrically, those who give up altogether on finding a common medium tend to underestimate how much commonality the different problem domains really have.

Given enough people, there will be some who disagree with anything; that probably oughtn't keep us from observing that some things have been "settled". (Commentary: there are more than enough people.)

Compatibility and Popularity.

Finding new common abstractions seems a great idea, but these are often incompatible with existing abstractions, and when designing a language one must choose compatible abstractions/features from amongst mutual exclusions and dependencies.

I am not sure the number of people disagreeing with something has anything to do with its correctness. That would reduce things to a popularity contest (in the absence of any kind of provability), and opinion can be reversed by compelling argument, publicity, or politics. Following this further, what is popular today may be unpopular tomorrow, and hence nothing is resolved (in a final way).

True, the existence of

True, the existence of suitable abstractions, or even knowledge of suitable abstractions, need not imply the feasibility of getting there from here.

The number of people disagreeing with something isn't necessarily linked to its correctness (though one hopes there's a nonzero statistical correlation, especially with some reasonable thresholds applied). In fairness, this thread was originally about, as I understand it, which questions have been "resolved". It's possible for something to be settled without being right. Keep in mind, my public position of late has been "macros bad, fexprs good", directly opposing a settled decision of the Lisp community that stood for about twenty years before I started rocking the boat.

In this whole gigantic,

In this whole gigantic, multi-year thread, it seems nobody has mentioned short-circuit and/or? Or is that unresolved? I've gotten the impression lately that it's relatively resolved in mainstream languages?
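
For anyone who hasn't met the term: short-circuit and/or means the right operand is evaluated only when the left one doesn't already decide the result, as this little Python probe shows.

    def loud(x):
        print('evaluated', x)
        return x

    loud(False) and loud(True)   # prints only: evaluated False
    loud(True) or loud(False)    # prints only: evaluated True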
