poll: syntax

Imagine: You take part in a competition to design the syntax of a programming language for people who have never programmed anything before (short-term goal!), but are about to become professional programmers for the rest of their lives (long-term goal!). You want to win the competition, so be realistic. Don't mention your favorite language just because *you* like it, but because you think that it's highly user-friendly for newbies and that it scales enough as experience grows.

1. What type of syntax should this programming language have?

  • C/Java
  • Python
  • Haskell
  • Lisp
  • Forth
  • Smalltalk without math precedence: 1+2*3 = (1+2)*3
  • Smalltalk, but with math precedence: 1+2*3 = 1+(2*3)
  • Other

2. Do you think it's important to respect math precedence?


Please focus on the answers and don't add more than a short note to your reply, if at all. Put your answer at the top. This will help me evaluate the result.

In case most people choose "Other" as the best syntax I'll start another discussion about what it should look like.

Context?

How do these newbies learn to program? Are they self-taught? Or do they take an introductory course at the college level? Or....

And what is your motivation for asking this question? Seems to be rather open ended and an invitation to opinion rather than some objective criteria by which we should rigorously evaluate what makes a good first PL.

Be that as it may, LtU has a multitude of discussions about what makes a good first programming language. For what it's worth, I recommend Python for those who are self-learners. And for those who learn in an academic environment, I suggest HTDP or SICP - both of which use Scheme - though the particular PL is incidental to the lessons being taught.

But I usually come away feeling that people learn things in many different ways, and there is no one correct answer. Diversity may actually be more important than conjecturing that a single PL fits this purpose.

motivation

I'm wondering what makes syntax popular, if we ignore the people who already know a language. IOW, are C-like languages popular because most people already know C/Java-like syntax or are they popular because that's the most understandable and least ugly syntax we have today?

BTW, I've read that natural-language syntax leads newbies to believe that they can enter any English sentence, which ironically makes it more difficult to learn the language (and those who are more experienced would probably prefer a different syntax, anyway). Are there some real studies on this?

I'm wondering what makes

I'm wondering what makes syntax popular

So why not ask this question and your last one in the above? An LtU opinion poll is not going to answer any of the questions you've just posed. Further this question is extremely difficult to answer; you pick languages (and oftentimes not even that), not syntax.

If we were picking languages

I think it's more like this simplification:
We pick the language whose example code we can easily understand. Then we start learning it.

From long discussions with developers I know that you can't tell people what's so great about features like closures, etc. Even if you provide an example you'll have to simplify it so much that they'll wonder how *they* can apply it. Many of the advanced languages have features that you only begin to appreciate when you actually use them in your own code.

That's why I think that syntax is everything if you want to convince people to use your language. Now, what is the syntax of choice for someone new to programming?

I'm pretty sure that we can't easily convince existing C/Java devs to learn a new syntax. The only thing you can try to do is a side-by-side comparison of *common* everyday code snippets and prove that your language actually reduces code significantly. (I've seen something like that with OODBs vs RDBMSs and it was convincing, not only for me.)

C's syntax for variable declaration is ugly

[[are C-like languages popular because most people already know C/Java-like syntax or are they popular because that's the most understandable and least ugly syntax we have today? ]]

C's syntax is OK on average, but its syntax for variable declarations is ugly. Limbo's syntax for this part, which uses Pascal's order and C's terseness, is much better.

IMHO what matters is compatibility: when you have used C for ten years, it's much easier to grok a new language if it's a bit C-like. Still, it's annoying to see that C++ and Java copied too much of C's syntax instead of fixing it like Limbo did.

Ugliness is a matter of taste

The big problem with C (and especially C++) syntax, as I mention in another post below, is that you need to know the kind of a term to parse it correctly (i.e., is the term a type or a value?), but the syntax doesn't in general provide any clues as to what it is.

Consider the following declarations:


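struct Foo {
    // assume constructors as necessary
};
struct Bar {};
enum { bar };

// Six alternative declarations of "foo"; read each one in isolation,
// since they cannot all coexist in one scope.
Foo foo (int);   // 1
Foo foo (5);     // 2
Foo foo (Bar);   // 3
Foo foo (bar);   // 4
Foo foo ();      // 5
Foo foo;         // 6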

The first declaration of "foo" is easy to parse; it declares a function "foo" which takes an int argument and returns a Foo. The kind of the stuff in the parens (int) is "type". The second example instead declares an instance of struct Foo, also called "foo", and invokes a constructor which takes an integral type; 5 is of kind "value".

In the third and fourth cases, the tokens "Bar" and "bar" are neither a keyword bound to an implicit type (and thus known to be a type), nor a literal known to be a value. They are simply identifiers. Is the resulting expression a function declaration, or a variable declaration? The parser has to decide--but it doesn't have enough information. Instead, it has to wander into semantics-land, look up "Bar" and "bar" respectively in the symbol table, and see what the heck they are. "Bar" is a type and "bar" is a value in our example, so the third and fourth expressions are a fdecl and a vdecl respectively.

When you deal with template expansions, it actually becomes impossible for the compiler to determine the kind without some help, which is why the "typename" keyword was added to C++: to flag a given term as a type when its kind would otherwise be ambiguous. (There isn't a "valuekind" keyword, though... and strangely enough, there are certain contexts where "typename" may NOT be applied to a term of kind type.)
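To make the "typename" point concrete, here is a minimal sketch. The names first_or_default and demo_typename are made up for illustration; the only thing the example is meant to show is the spot where the keyword tells the parser that a dependent name is a type.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the first element of a container, or a
// value-initialized element if the container is empty.
template <typename Container>
typename Container::value_type first_or_default(const Container& c) {
    // Container::value_type depends on the template parameter, so the parser
    // cannot look it up yet; "typename" says: treat this name as a type.
    typename Container::value_type result{};
    if (!c.empty()) {
        result = *c.begin();
    }
    return result;
}

// Small self-check for the sketch above.
void demo_typename() {
    assert(first_or_default(std::vector<int>{7, 8}) == 7);
    assert(first_or_default(std::vector<int>{}) == 0);
}
```

One context where the keyword is not allowed is a base-class list, where the name can only be a type anyway.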

What of the fifth example, Foo foo()? In this case, we have an example of a reduce-reduce ambiguity. Is that an empty list of types, or an empty list of values, inside the parens? The parser could treat it as either! The language solves that issue by fiat, declaring that the empty parens in this case are a list of types, and that the expression is therefore a function declaration. If you want to declare an instance of Foo and use the default constructor, you *must* leave off the parentheses and use the "Foo foo" syntax.
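The fifth and sixth cases can be checked mechanically. This is only a sketch: the member "value" and the names make_foo, foo_object and demo_declarations are invented for the example, and the static_asserts simply ask the compiler how it resolved each form.

```cpp
#include <cassert>
#include <type_traits>

struct Foo {
    int value = 42;   // hypothetical member, only here so there is something to observe
};

Foo make_foo();       // empty parens: a function declaration, by the rule described above
Foo foo_object;       // no parens: a default-constructed instance of Foo

// Ask the compiler what each name turned out to be.
static_assert(std::is_function<decltype(make_foo)>::value,
              "empty parens declare a function");
static_assert(std::is_same<decltype(foo_object), Foo>::value,
              "no parens declare a variable");

// Small self-check: the variable really is an object we can read from.
void demo_declarations() {
    assert(foo_object.value == 42);
}
```

Note that make_foo is never defined; that is fine as long as nobody calls it, which is exactly why this mistake often goes unnoticed until the first use of the supposed object.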

Psychology

Seems to be rather open ended and an invitation to opinion rather than some objective criteria by which we should rigorously evaluate what makes a good first PL.

I really wish the psychology of computer programming was given a bit more attention. There's the classic book by Weinberg and I believe there used to be some kind of magazine.

Maybe Intentional Programming will lead to more experimentation.

What Chris said -- but I do

What Chris said -- but I do think observing conventional precedence rules in arithmetic expressions is particularly important for new programmers.

Python

While Python is not my personal favorite language, I think that it's the best for beginners: in Python3k they even changed the meaning of / so that now 1/3=0.3333.. instead of 0.
This kind of thing is great for beginners.
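For contrast, here is the behaviour that trips up beginners, sketched in C++, where dividing two integers still gives 0 for 1/3. The function names are made up for the example.

```cpp
#include <cassert>

// Both operands are int, so the division truncates.
int int_quotient() { return 1 / 3; }

// One operand is a double, so the result is a double close to 0.3333.
double real_quotient() { return 1.0 / 3; }

// Small self-check for the two functions above.
void demo_division() {
    assert(int_quotient() == 0);
    assert(real_quotient() > 0.333 && real_quotient() < 0.334);
}
```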

Smalltalk isn't bad for beginners but it could be improved: respecting math precedence is a start; maybe a more math-like way to call a function, "f(param1:val1 param2:val2)" instead of "f param1:val1 param2:val2", would help too, I don't know..

My own pet peeve about Smalltalk is the variable declaration at the beginning of block instead of "inline", but I don't think that beginners would care too much about this point..

Syntax?

What type of syntax should this programming language have?

  • A programmable one. But they don't know that (yet), because they need to start simple.
  • use visual layout (tabbing, margin-positioning) to structure code. Python seeks this.
  • maximal use of identifiers + minimal punctuation; a newbie tends to regard punctuation as fluff and not as parsing semantics.

Do you think it's important to respect math precedence?

  • we're talking about beginners here
  • goal: preserving syntax one-to-one with mathematical notation as much as possible
  • the associative and commutative properties of arithmetic operators are probably why operator precedence (OP) was adopted in the first place (subconsciously?)
  • it wouldn't take much work with a programmable syntax
  • meet their needs; they'll just add it themselves if you don't. (If they can't, they will grumble about it and eventually your language will be superseded by something that has OP)
  • Yes
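To pin down what is at stake in the precedence question, here is the poll's own example written out as a sketch. The function names are invented; the parentheses in the second function spell out the strictly left-to-right grouping that Smalltalk applies to binary messages.

```cpp
#include <cassert>

// Conventional math precedence: grouped as 1 + (2 * 3).
constexpr int conventional() { return 1 + 2 * 3; }

// Strict left-to-right grouping, written explicitly: (1 + 2) * 3.
constexpr int left_to_right() { return (1 + 2) * 3; }

// Small self-check: the two readings give different answers.
void demo_precedence() {
    assert(conventional() == 7);
    assert(left_to_right() == 9);
}
```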

Lots of things to consider when evaluating a syntax:

Much of what makes a good syntax is dependent on context. Is the code frequently read by humans, or mainly processed by machines? Is the code frequently written by humans, or generated by machines? If humans are involved--what is the problem domain? What is their expected level of expertise, either in programming, or in the problem domain?

In general, a good syntax for human consumption will have the following attributes:

* Is unsurprising. Typing "let x = y" should not reformat your hard drive. Things which are similar should look similar; things which are different should look different.

* Is easy to read and write. Languages which look like line noise are notoriously hard to read. Languages which are extremely verbose are painful to program in--many programmers are lazy typists. Also important is the syntactic difference between semantically distinct programs; adding some redundancy (such as requiring that things be explicitly declared before use) can catch many programmer errors early. On the other hand, redundant specification of things can also lead to errors if the redundancy isn't caught, and can also make program changes more difficult (try changing the type of an argument to a base class method in Java, and watch how many things you'll have to fix).

* Maps the problem domain well (common idioms in the problem domain are easier to type and spot), and respects the conventions of the problem domain. Shell programming languages make I/O syntactically easy. Most general-purpose languages use standard mathematical notation (or ASCII approximations thereof) for arithmetic, and respect mathematical precedence.

* Likewise, if targeted towards a particular programming community, maps their expectations (which may include resemblance to another programming language that they are familiar with). Many attribute Java's popularity to its (superficial) similarity with C/C++, and indeed many tokens and grammar productions in Java mean (nearly) the same thing as in C/C++. Java did improve quite a bit, removing some of the nastier syntax of C++ (though Java Generics thought it reasonable to more-or-less clone C++'s infamously ugly template syntax). Many arguments about braces (curly vs square vs round) are essentially pissing contests between different language communities whose members are more comfortable with a particular style--beyond catering to an extant community, there is no reason to prefer one style over another.

* Use of natural language for built-in language constructs is a mixed bag. Keep in mind that not all programmers speak English. Trying to emulate natural languages too much may result in grammars which are stilted or ambiguous. And humans don't recognize and "parse" written text in the same way that machines do (in particular, people don't deal well with deeply nested structures).

* Subject to the first rule, is regular and unambiguous. Here, "regular" means straightforward and free of exceptions and corner cases (as opposed to regular expressions). "Unambiguous" means each valid sentence means exactly one thing which is clear from its reading (note that this doesn't preclude grammar ambiguities which are clearly resolved by specification, such as the "dangling else").

* Is amenable to machine processing--both by compilers/interpreters, and by other tools which may analyze or generate code. This enables the creation of tools and libraries to assist programmers with their endeavors. Two languages with syntaxes which flunk miserably are early dialects of Fortran (the first high-level language, so it can be excused), and C++ (which features niceties like several "reduce-reduce" ambiguities, and the need for the parser to know the kind of a term while being unable to determine it from the grammar). For an example of the latter, consider the syntactic distinction between a function declaration, a function invocation, and a constructor invocation.

* Is well-specified. Each valid program should have precisely one meaning. This goes beyond "syntax", of course, but is important to consider. Note that most production languages have a few implementation dependencies in them, but in general, programs should either have a well-defined precise meaning, or fail to compile. One reason many here dislike C/C++ and such is the ease of writing programs with undefined behavior--programs which are syntactically valid but semantically rubbish.

Also keep in mind the difference between a concrete syntax (the textual or graphical representation of a programming language) and the abstract syntax (parse trees, etc). In some languages (Lisp/s-expressions, XML), the concrete syntax maps straightforwardly onto the abstract syntax; in others, it does not. Abstract syntax has different constraints than does concrete.
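The concrete/abstract distinction can be sketched in a few lines. Everything here (Expr, num, bin, to_sexpr, to_infix) is a hypothetical toy, not any real compiler's data structure: one abstract tree is built once and then rendered in two concrete syntaxes, a Lisp-like prefix form and a fully parenthesized infix form.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy abstract syntax: either a literal (op == 0) or a binary operation.
struct Expr {
    char op = 0;
    int value = 0;
    std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> num(int v) {
    std::unique_ptr<Expr> e(new Expr);
    e->value = v;
    return e;
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    std::unique_ptr<Expr> e(new Expr);
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Concrete syntax 1: s-expressions, which mirror the tree directly.
std::string to_sexpr(const Expr& e) {
    if (e.op == 0) return std::to_string(e.value);
    return std::string("(") + e.op + " " + to_sexpr(*e.lhs) + " " + to_sexpr(*e.rhs) + ")";
}

// Concrete syntax 2: infix, with every grouping made explicit.
std::string to_infix(const Expr& e) {
    if (e.op == 0) return std::to_string(e.value);
    return "(" + to_infix(*e.lhs) + " " + e.op + " " + to_infix(*e.rhs) + ")";
}

// Small self-check: one tree, two renderings.
void demo_syntax() {
    auto tree = bin('+', num(1), bin('*', num(2), num(3)));
    assert(to_sexpr(*tree) == "(+ 1 (* 2 3))");
    assert(to_infix(*tree) == "(1 + (2 * 3))");
}
```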

And keep in mind that syntax, for some reason, produces the silliest flamewars. :)

No poll on syntax is worth a

No poll on syntax is worth a thing without Pascal/Ada...

...by which you mean Algol

...by which you mean Algol, of course. ;-)

There can be only /I.

Don't forget the culmination of all such languages: PL/I.

Lispers like to claim that all languages trend towards Lisp, but if you examine languages like C++ and C#, you'll see that in fact, those languages are trending towards PL/I. While having a bad rap as a kitchen-sink language, in fact PL/I was merely ahead of its time, anticipating the kitchen-sink-language trend by 40-odd years.

According to the PL/I FAQ, "It has more power than Pascal, Fortran 95, BASIC, C, and COBOL, and has comparable facilities to Ada." Take that, Ehud!

PL/I is a really ugly

PL/I is a really ugly language... especially as regards syntax. Specifically, what I have in mind is the fact that you can often change the order of keywords at will, there is a very hairy preprocessor, it has all kinds of strange conversion rules when defining data types, which make different declarations have identical results etc. So no, for the specific exercise at hand PL/I and Pascal/Ada are as different as you could possibly imagine, given that all of them are imperative, procedural languages.

/I, Singular Sensation

And, specifically to the point of the question being asked, the PL/I FAQ informs us that

The language is suitable for beginners, as well as for anyone wanting to become a professional.

There you have it!

A Few Thoughts on Quasi Natural Language Syntax

One frequently hears the assertion that End Users will think they can type anything in English and have it understood. My suspicion is that this hypothesis stems from "AI" based systems deceptively marketed as doing just that and not an inherent result of QNL notation per se.

I think the best way to present QNL to End Users is to take a "cookbook" approach and provide them with representative utterances that work and negative examples (something we rarely see in documentation, but which has proven indispensable in the Linguistics literature where a '*' preceding an example indicates that it is not well formed) of pathological constructs that will choke the language. If you tell the users that they are interacting with a program that does not support unrestricted English, I am confident that they can adapt.

As to non-English speakers, there is no reason not to support alternate natural languages like say a romanized representation of Japanese. This would entail providing hooks for building a bi/multi-lingual lexicon of identifiers (if you wanted to support Machine Translation of Programs sans comments - which would be in unrestricted natural language that could of course be funneled through Google translations) and reserved words. Then you would need an alternate grammar for just the programming language constructs, making the task far simpler than trying to support a full human grammar.

Supporting a rich set of QNL constructs where issues like coordination, gaps, anaphora, and morphology come into play would require a different approach to parsing than what we use for Algol derivatives, but Combinatory Categorial Grammars and Parsing Expression Grammars both hold significant promise in going from a subset of English to an abstract syntax tree, and any resulting ambiguities can be resolved up front by the language design (e.g. the first parse wins, as in a PEG).
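The "first parse wins" rule is easy to sketch. What follows is a toy, not a real PEG library: Parser, literal and ordered_choice are names invented for the example, and a parser here just returns the position where its match ends (or npos on failure).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A parser takes the input and a start position and returns the position
// just past the match, or std::string::npos if it does not match.
using Parser = std::function<std::size_t(const std::string&, std::size_t)>;

// Matches one fixed piece of text.
Parser literal(std::string text) {
    return [text](const std::string& in, std::size_t pos) -> std::size_t {
        if (pos <= in.size() && in.compare(pos, text.size(), text) == 0) {
            return pos + text.size();
        }
        return std::string::npos;
    };
}

// PEG-style ordered choice: try the alternatives in order and commit to the
// first one that succeeds; later alternatives are never consulted.
Parser ordered_choice(std::vector<Parser> alternatives) {
    return [alternatives](const std::string& in, std::size_t pos) -> std::size_t {
        for (const Parser& p : alternatives) {
            std::size_t end = p(in, pos);
            if (end != std::string::npos) return end;
        }
        return std::string::npos;
    };
}

// Small self-check: the order of alternatives decides the result.
void demo_ordered_choice() {
    Parser short_first = ordered_choice({literal("as"), literal("as defined in")});
    Parser long_first  = ordered_choice({literal("as defined in"), literal("as")});
    assert(short_first("as defined in x", 0) == 2);
    assert(long_first("as defined in x", 0) == 13);
}
```

Because the order of alternatives is part of the grammar, there is never more than one parse to choose from; the price is that the language designer has to put the longer phrasing first.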

That said, there are two non-exclusive approaches to doing something English-like, which could be exposed as alternate dialects. 1) We can eschew any CS terms of art using a circumlocution like "warm fuzzy thing" instead of a scary term like "monad". 2) We could support formal PL terminology, eschewing only the sort of "line noise" compression that makes Perl and APL notoriously dense. Approach two might be ideal as an interlingua for use in conference papers by experienced programmers, as one can easily envision mechanical expansion of existing programs into such a notation, provided that the programmer supplied a mapping of any cryptic identifiers into more readily comprehensible ones.

As to people's ability to comprehend deeply nested constructs, a language design could encourage a shallow design composed from named elements perhaps by employing an interactive QNL dialog to evolve the program over time. But that would be an orthogonal dimension of system architecture similar to adding multi-paradigm or software visualization support.

Overall, these strategies would probably expand code and would in theory make for a lot of "evil" typing work. But in the real world, we could write a handful of editor macros and list them in a GUI palette to insert such boilerplate. Moreover, code is written once and potentially read many, many times, making a bias toward readability a reasonable design choice.

This also suggests a route toward the rigorous evaluation of alternate syntax designs. We could develop some sample programs that could be expressed in both QNL and conventional programming languages, express them in pure non-executable pseudo code, give programmers and end users an explanation of the target syntax(s) into which the program description should be transformed and then have them re-write the code. In the other direction, we could provide, for example, C and QNL encodings of a program and then ask programmers and non-programmers various questions about the code to evaluate how fully it was understood.

I would hypothesize that:

  • An experienced programmer will do better at encoding a problem in his or her language of choice than in converting it to QNL until some level of experience with the QNL is achieved.
  • An experienced programmer will do better at encoding a problem in QNL than in a programming language based on an unfamiliar notation (e.g. trying to write Perl after working in Scheme).
  • An experienced programmer will perform equally well in reading QNL or a familiar programming language if the QNL is based on PL terminology.
  • An experienced programmer would do better in reading a familiar language than a QNL based on circumlocutions to avoid terms of art.
  • End Users would do relatively poorly at all tasks involving conventional PL syntax unless they are math people using a PL based on the notational conventions of mathematics.
  • End Users might learn to write a QNL based on circumlocutions more readily than one based on terms of art, but they could readily move from using the former to the latter once they understood the underlying big idea being associated with the new terminology.
  • QNL should be easier to learn since our brains are wired for NL and not for the abstract strings of symbols one finds in a "line noise" encoding.
  • The best way to teach QNL would be through the use of positive and negative examples, building up skills through several Language Levels as in PLT Scheme.

Has anyone seen any experiments along these lines?

Somehow, I doubt that this sort of work has been done, since it would require many test subjects with different levels of PL familiarity and a rather complex experimental design subject to human-test-subject administrative protocols.

That said, if any work like this has been done, pointers to that research would be deeply appreciated!

Very doubtful that QNL would be useful for writing

Sure reading a program in a QNL would be easier for beginners, but I think that writing programs in a QNL would be harder than say in Python or Lisp.

As MUD shows, computers are very limited in understanding 'natural language', so beginners in QNL would suffer a lot from the limitations of the computer.
Sure eventually they would adapt but IMHO this would be significantly more painful for the beginner than learning a language well suited for beginners..

Why I'd Rather Write QNL

I would submit that QNL could eliminate many errors if it mapped to PL concepts.

For example, let us posit that we are writing some low level code and need to define a hairy data structure.

Would it not be easier to write something like.....

Foo is a pointer to an array of 33 handles to '3-d input samples' as defined in the header file 'space-ball data structures.qnl'.

.... than it would be to grab a copy of K&R to dope out the C equivalent?
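For reference, here is one way the C-style equivalent might look. It is a sketch under two assumptions that are not in the sentence itself: that a "handle" means a pointer to a pointer, and that Sample3D is a stand-in for whatever the header file defines. All names are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a '3-d input sample'.
struct Sample3D { float x, y, z; };

// Raw declarator syntax, read inside-out: foo_raw is a pointer to an array
// of 33 pointers to pointers to Sample3D.
Sample3D **(*foo_raw)[33] = nullptr;

// The same type built up step by step with aliases.
using Handle      = Sample3D **;   // assumption: a "handle" is a pointer to a pointer
using HandleArray = Handle[33];    // an array of 33 handles
HandleArray *foo_aliased = nullptr;

// The compiler confirms that both spellings name the same type.
static_assert(std::is_same<decltype(foo_raw), decltype(foo_aliased)>::value,
              "both declarations have the same type");

// Small self-check so the sketch can be exercised at run time.
void demo_declaration() {
    assert(foo_raw == nullptr);
    assert(foo_aliased == nullptr);
}
```

The aliased version reads in roughly the same order as the English sentence, which is arguably the part of the QNL proposal that does not need a natural-language parser.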

Moreover, a QNL parser need not be limited to the constructs found in a 1980's text adventure. Considerably more powerful NLP algorithms could be applied.

Likewise, devising an effective QNL surface structure is orthogonal to the question of what language features make for a good beginner language, so all else being equal, we are just postulating a notation that matches how the programmer would express his or her intention to exercise unambiguous language features in English or some other natural language.

This should be easier since it removes the artificial step of encoding the solution in a compressed parser friendly notation.

Think of how you would transliterate some Perl into English to get a sense of what QNL might look like.

Not convincing

[[ Would it not be easier to write something like.....
Foo is a pointer to an array of 33 handles to '3-d input samples' as defined in the header file 'space-ball data structures.qnl'.
.... than it would be to grab a copy of K&R to dope out the C equivalent? ]]

Except that, as I said before, C's syntax for variable declaration is ugly, so it's not suited for beginners: Limbo (Pascal) type declarations make it easy to declare this kind of thing.
And it's better than "compiler error: 'which is' not understood" if, for example, you had to use 'as defined' but not 'which is defined'.

It's the same for Perl, which is also not a good language for beginners. To be more convincing, your examples must show the advantage of using a QNL over Python (for example), not C, Perl, APL..

QNL avoids Pythonic whitespace dependencies / Errors Orthogonal

About the only thing that I've found problematic with Python syntax is its dependency on whitespace to group statements, the visual ambiguity of tabs v. spaces, and the risk of false alignment in the context of some editor font defaults. (Note that I'm not using Python day to day, so there may be an easy fix I'm missing. Moreover, this isn't so much an issue with one's own code since you can either use just spaces or just tabs, but what happens if a novice user goes cutting and pasting code fragments with different whitespace conventions in some dumb editor before feeding them to Python?)

Using indentation makes it hard to relate code verbally and might make life more inconvenient for blind programmers using off-the-shelf software (again, just speculation here).

With a QNL formulation, you could write and interact with a piece of remote software over your cell phone or run it on a PDA hands and eyes free without having to deal with line formatting issues.

---

On an unrelated note, QNL could support both 'as defined' and 'which is defined' and any other sensible formulations as alternatives that would boil down to a single canonical form. AppleScript does just this. Likewise, NLP systems *can* produce more meaningful feedback than 'compiler error' - perhaps something like:

system: 'funky data structure' needs to be defined somewhere, we don't recognize the phrase 'check it out in foobar.qnl' and were expecting a file reference of the form 'as defined in ' or 'which is defined in '. Since 'foobar.qnl' matches the grammar for a file name, should we treat it as such?

user: yes

system: Replacing 'check it out in foobar.qnl' with 'as defined in foobar.qnl'. Program now understood.

So you see, the question of error handling really is orthogonal to syntax style, but you are looking at a completely different tool chain that might draw on some AI techniques and tools like Cyc or WordNet with PEGs or CCGs used for grammar definition. This could be thought of as an entire additional layer of language processing that would replace the traditional Unix language development tools in getting from raw input to an abstract syntax tree.
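The "boil down to a single canonical form" step does not need anything exotic in the simplest case. Here is a deliberately naive sketch using plain string replacement; canonicalize is a made-up name, and a real front end would work on parsed phrases rather than raw text.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: rewrite an accepted variant of a file reference into
// the single canonical form "as defined in".
std::string canonicalize(std::string sentence) {
    const std::string variant   = "which is defined in";
    const std::string canonical = "as defined in";
    std::string::size_type pos = 0;
    while ((pos = sentence.find(variant, pos)) != std::string::npos) {
        sentence.replace(pos, variant.size(), canonical);
        pos += canonical.size();   // continue after the text just inserted
    }
    return sentence;
}

// Small self-check: the variant is rewritten, the canonical form is left alone.
void demo_canonicalize() {
    assert(canonicalize("Foo is a widget which is defined in 'foobar.qnl'.")
           == "Foo is a widget as defined in 'foobar.qnl'.");
    assert(canonicalize("Foo is a widget as defined in 'foobar.qnl'.")
           == "Foo is a widget as defined in 'foobar.qnl'.");
}
```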

I don't know... most people

I don't know... most people can't even write e-mails to other people in natural language, let alone a computer. ;-)

Anybody who writes software in a QNL

is likely to be eaten by a grue.

You are in a maze of twisty parse trees

all alike.

What problem are we solving?

If we really have to re-thrash a dead equine like "which syntax is better", let's at least try to think of something more interesting to discuss than "Which is better C or Lisp?" and similar questions.

A big problem with these discussions is that they talk as if e.g. "C syntax" or "Pascal syntax" or even "Lisp syntax" is a well-defined thing with fixed properties both good and bad, independent of the semantics it is being used for or the context it is being used in.

Syntax, like all other PL features (or program features, for that matter), can only be assessed as a design solution for a well-defined design problem. Without specifying what concepts you want to emphasize or clarify by distinguishing them in your syntax, "Is syntax X good?" is a meaningless question.

The exemplar for this kind of thinking, in my opinion, is Oz. Whether one likes the syntax there or not, it shows clear signs of being designed to unambiguously distinguish the different concepts that its designers wanted to distinguish. This may have forced them to make some choices that at first seem unfamiliar, but which in the long run make Oz a language that is very consistent, and easy to learn to read.

Again, I'm not necessarily recommending Oz's particular solutions as "the good ones": the key thing to note is the kind of thinking one has to use and the kind of context that one has to consider if we are going to make any significant improvements in our discussion about syntax.

The goal is to have a syntax

The goal is to have a syntax that appeals to programming newbies without limiting them in the long run. It's not about some special use-case. It's about replacing Java and C++. The basic language operations (control flow, function calls/definitions, objects, etc.) should have a nice syntax and there shouldn't be any surprises.

Mentioning obscure languages is pointless for the reason that you named yourself: they are not nice for beginners, so they're not acceptable (imagine there being alternatives with similar functionality, but wonderful syntax for any language you suggest).

It's not about "C vs Lisp for existing programmers". Most programmers already know C and many of them wouldn't even use Python because they're so happy with {}; and other syntax noise. I'm interested in what makes a language attractive for newbies, so we can start from scratch and get rid of stupid noise.

For example, Python has a few nice ideas, but maybe it's not readable enough compared to something more verbose like Smalltalk (readability is important for newbies and experienced devs). OTOH, from my discussions with experienced and newbie devs I got the impression that most of them found that the only desirable feature of Smalltalk syntax is the keyword message syntax. Apart from that, it doesn't look and behave enough like math and it's too cryptic (^$#|:[].). Another annoyance is that "a = b ifTrue:" behaves as one expects, but "a = b whileTrue:" is a totally different story, though it's consistent within Smalltalk. AFAICT, most people don't want to keep little syntax details in mind when developing, but just concentrate on their high-level goal and get that implemented in as few lines of code as possible while retaining readability and maintainability.

O newbie, where art thou?

The goal is to have a syntax that appeals to programming newbies without limiting them in the long run.

Unfortunately, I don't think this is a meaningful design problem. Contrary to popular assumption, "programming newbies" are not a homogeneous group. A lot depends on what we assume their existing background and knowledge is, and what they plan to do with programming. Are they high school students? Novice computer scientists? Mathematicians? Graphic designers?

Furthermore, what concepts are we trying to make clear to them? This is heavily dependent on the semantics for your language, and again on the purposes you expect they will be programming for.

There is no magic bullet in syntax any more than there is a magic bullet in any PL issue that will solve all problems for all people for all time. Syntax has to be defined in the context of a given language with given semantics for a given target audience. Only in such a context can a given syntax design be assessed.

The most readable language I know is...

...AppleScript. It's designed for newbies, and because of the focus groups involved, was effectively designed by newbies too.

But I also find it to be one of the hardest languages to write, mainly because of its widespread inconsistencies.

So I wouldn't place readability above, say, consistency.

In fact this is probably a good read for anyone trying to design a language for non-programmers.

Syntax for what, exactly?

Before you pick syntax, don't you need to decide on

* call-by-need vs call-by-name vs call-by-value
* impurity vs purity
* sequential abstraction methods (macros, classes, functions, modules, rules, type classes, meta-objects, etc)
* parallel and concurrent abstraction methods
* typing system
* effects system
* etc, etc, etc

Picking a syntax before you pick any of those seems premature. And your choices here will have a far, far greater impact on what a newbie thinks of as natural than the "braces vs spaces" debates.

Most programmers never get over a preconception that programming languages should be strictly evaluated, sequential, imperative, and either untyped or typed in a very limited way. That's all they've ever seen.

These preconceptions do more to hurt the ability for programmers to learn new things than quibbles over syntactic difference. In turn the industry suffers when problems that would be easy given another approach are dismissed as infeasible or are solved using overly complicated code in the wrong "paradigm."

If we really want to educate programmers to solve the unknown problems of their future, we need them to be learning about multiple tools. I don't mean Ruby, Python, and Smalltalk - no matter how nice those languages are and no matter how much their syntaxes differ they are all, to a pretty good first approximation, the same tool.

I want students to think in terms of: logical inference vs function evaluation. Declarative thinking vs imperative thinking. Type based reasoning vs case by case analysis. Concurrency through shared variables and locks vs message passing vs data flow vs transactions. Abstraction through meta-programming vs the million and one other ways to create abstractions.

Once we figure out how to do that, then we can discuss syntax.

Complex syntaxes can be friendly

Picking a syntax before you pick any [semantics] seems premature.

This case is made well in the History of Haskell (previously on LtU). Section 4 is about syntax, and it's well worth reading in connection with the current discussion. Section 3.6 makes a great point about the advantages of embracing superficial complexity at the syntactic level (in effect, TMTOWTDI), while avoiding deep semantic complexity.

Among other, more concrete points, it also mentions that syntax is a language's user interface (something which Stroustrup pointed out in his '94 book about the design and evolution of C++). I think that's a helpful perspective for understanding the significance of syntax. It explains why syntax is often focused on excessively by the inexperienced: it's what they interact with most directly and obviously. (A certain Usenet wag once commented that syntax is the first refuge of the inexperienced language designer.) It also explains why syntax is not of paramount importance, and takes a back seat to semantics in that respect.

People tend to be willing to learn whatever interface they need to learn, within reason, to get a job done. Once learned, almost anything can seem intuitive. The wrong semantics, on the other hand, can really get in the way of doing a job. That applies about as much to say, a word processing program, as it does to languages.

Semantics assumed?

I suspect that, implicitly buried in the original question, there are some assumed semantics. Perhaps there's an assumption that it will be easier for newbies to learn imperative programming first because imperative programming is somehow easier.

That might be true. Then again it might not. We might just be trapped in a self perpetuating loop. I learned imperative style first so it's "easy" therefore I'm going to teach you the imperative style.

But for the sake of argument let's assume that imperative coding is easier to learn than pure forms. Here's a question: if we start with purity does it make it easier to learn other less pure semantics? For instance, if the newbie can master purely functional code will imperative code seem like old hat (oh, I see, it's like I'm always in the IO monad)? Or would imperative code be deeply incomprehensible to somebody who learns purity first?

If the former, then isn't there value in starting with what might actually be, in some way, the "harder" semantics in order to ease the transition into other territories?

The same kinds of questions go in spades for concurrent programming. If we start with concurrency does the extra cognitive up-front load pay off in an easy time understanding more sequential styles as being a degenerate case? Is starting with concurrency too much of a barrier? Or is the barrier in the relatively weak mechanisms popularly used to deal with it?

I honestly don't know the answers to these questions. But I do think they're more interesting than "braces vs spaces."

Thought

I think that the importance of languages is overrated. In the early days of programming it was common experience for a scientist or engineer to be producing their own programs within a week. These were people with no computer experience "at all". Never even saw a computer! How can it be? The important skills are math and science, and the discipline and thought processes involved. Having said that, the early languages such as Fortran and Basic were perfect in that they introduced a minimum amount of constraint, letting the programmer express his or her own thought process.

A common pattern on syntax quality...

Straightforwardly denoting execution semantics (ES) is the key to a readable syntax!

Why? Because if you don't denote ES, then there is a disconnect between how the programmer interprets code, and how the computer interprets code.

Thoughts:

  • A human brain is a language interpreter too.
  • Syntax included for the parser (curly braces, parentheses, blah) does not correspond to ES, so the programmer tends to forget to add this syntactic "fluff", focusing instead on the goal (semantics) of the program.
  • Every syntax problem I see arises from an inconsistency between the syntax and the ES.

Bad examples:

  • Smalltalk--trying to emulate math expressions in a message-passing semantics is only going to confuse the user, because there is a disconnect between the execution semantics (ES) of the language, and the ES that the user expects from the syntax.
  • C's variable declarations--they don't denote the semantics of variable declarations--they only imply it.
  • Line Noise--too little denotation; too much ES; no balance.

Good example:

  • Look at Haskell--it distinguishes functions with side-effects! It denotes those semantics, and I've noticed that it helps when reading code!

Message-passing should look different from subroutine invocation, and that should be different from variable declaration, etc, etc.

Careful with what you mean by "message passing".

That particular term of art can be ambiguous, especially when Smalltalk (specifically Smalltalk-80 and its successors) is brought into the mix.

Outside the domain of Smalltalk, the term "message passing" is usually used to mean an asynchronous or semi-synchronous (in either case, one way) communication between two different processes. (Here "process" is used in the abstract sense, not the OS-centric sense of a kernel object with a private address space and symbol table).

In early dialects of Smalltalk, which were somewhat based on the actor model, that was true as well. Each class had its own process, and classes did indeed (asynchronously) pass messages back and forth.

However, in later dialects of the language, including Smalltalk-80, a synchronous model was used instead. Smalltalk-80 "messages" are fully synchronous, in that the caller suspends until the target completes and returns a value. Semantically, message passing in Smalltalk-80 is the same as method invocation in any other dynamically-typed OO language. Unfortunately, the name "message passing" stuck, and Smalltalkers still use a term that in other contexts implies asynchronous behavior, for a fully synchronous context.

Getting back to the topic of this thread though; the failure of Smalltalk to follow standard mathematical operator precedence is, for the most part, orthogonal. Smalltalk parses a + b * c as (a + b) * c, rather than a + (b * c) like most languages. The language permits custom classes to implement the binary operators, but makes no assumption as to their meaning--thus the designers decided to always evaluate them in left-to-right order. (C++, on the other hand, also permits overloading operators, but requires that they have the same precedence as the default operators).
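To make the two parsing rules concrete, here is a minimal sketch in Python (chosen only because the thread keeps coming back to it as a readable language). It is not how any Smalltalk implementation works; the function names `eval_left_to_right` and `eval_with_precedence` are made up for illustration, and only non-negative integers and the four binary operators are handled.

```python
import operator
import re

# Hypothetical helper tables for this sketch only.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(text):
    """Split '1 + 2 * 3' into [1, '+', 2, '*', 3]."""
    return [int(tok) if tok.isdigit() else tok for tok in re.findall(r"\d+|[-+*/]", text)]


def eval_left_to_right(text):
    """Smalltalk-style rule: all binary operators rank equally, applied left to right."""
    tokens = tokenize(text)
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, tokens[i + 1])
    return result


def eval_with_precedence(text):
    """School-math rule: * and / bind tighter than + and - (precedence climbing)."""
    tokens = tokenize(text)
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]
        pos += 1
        while pos < len(tokens) and PRECEDENCE[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # Operands of a tighter-binding operator are grouped first.
            right = parse(PRECEDENCE[op] + 1)
            left = OPS[op](left, right)
        return left

    return parse(1)


print(eval_left_to_right("1 + 2 * 3"))    # (1 + 2) * 3
print(eval_with_precedence("1 + 2 * 3"))  # 1 + (2 * 3)
```

The only difference between the two evaluators is the precedence table; the operators themselves mean the same thing in both.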

Which version?

Which early version of Smalltalk implemented message passing asynchronously? Smalltalk 72 certainly didn't.

Apparently, Smalltalk-76 for one.

Not quite "very early" in ST history, but there you go.

See this page on c2.com for more.

"message passing", drawing a straight mental line from wording...

...technically means that the act of synchronously passing a message is not the same as an actor asynchronously (concurrently) executing a message it receives. A blocking routine "send()" that passes a message is the same thing as asynchronous message passing. This is what I mean by denoting executional semantics--(a) using a synchronous function call to denote a message being passed, as opposed to (b) message passing being implicit, where:

(a) =
send(window, newMsg("close"))

(b) =
window close
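As a rough illustration of the (a)/(b) contrast, here is a small Python sketch. Everything in it (`Window`, `new_msg`, `send`) is a hypothetical stand-in rather than any real GUI or actor API, and the delivery is synchronous in both forms; the sketch only shows how the same operation reads when the message is denoted explicitly versus left implicit.

```python
class Window:
    """Toy receiver used only for this illustration."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False
        return "closed"


def new_msg(selector, *args):
    """Build an explicit message value: a selector name plus its arguments."""
    return (selector, args)


def send(receiver, message):
    """Deliver the message by looking up the selector on the receiver and calling it."""
    selector, args = message
    return getattr(receiver, selector)(*args)


# (a) the message is an explicit value handed to an explicit send
explicit_window = Window()
send(explicit_window, new_msg("close"))

# (b) the message send is implicit in the method-call syntax
implicit_window = Window()
implicit_window.close()
```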

Debatable

[[ Bad examples:
Smalltalk--trying to emulate math expressions in a message-passing semantics is only going to confuse the user, because there is a disconnect between the execution semantics (ES) of the language, and the ES that the user expects from the syntax. ]]

Only if the user really understands the message-passing semantics and stops his mathematical training from getting in the way, which is true for experienced programmers but not for beginners.

For them, if it looks like math then it has to work like math, and "a + b * c" does look like math.

Not like math

"a + b c" or "a + b . c" or "a + b x c" look like math. If we are going for math-like syntax we need proper mathematical notation, like square root, sub/superscripting, fractions, etc. Anything else is a poor man's solution.

Now back to lurking...

Exactly. I think contorting

Exactly. I think contorting a language's syntax just to conform to mathematical convention is just a bad idea. Smalltalk has it right here IMO. How hard is it really to put brackets and enforce explicit order of operations anyway? I already do this in all languages with "proper precedence" simply because it's a good idea. Besides, the programmer is going to have to learn the language's semantics anyway if he's going to be reasoning about his programs, so it's just more confusing to say, "this is how things are evaluated... unless it's this sort of thing, or that, or this other thing". Uniformity and simplicity!

Math is nice but not a goal

I agree. Convention is nice only if there's actual mathematical notation. OTOH I think Smalltalk got it wrong. They should forbid mixing operators without parentheses to enforce precedence, which most programmers end up doing anyway on complex equations. That and the behavior of ; (it would be better if it behaved like $ in Haskell) would create a very simple set of rules to write and read code.

I meant that Smalltalk got

I meant that Smalltalk got it right purely in the precedence-free, simple left-to-right message passing way; I'm not familiar enough with Smalltalk beyond those surface semantics, so I'll take your word for it. I plan to learn more about it in the near future, so I'll keep your comments in mind. :-)

"Just to conform"

just to conform to mathematical convention

I think it's worth pointing out that modern mathematical notation is a product of several hundred years of fine tuning rather than a random convention that someone made up one morning. It's good. It works. It's well tested. It's worth conforming to.

If the programmer never does

If the programmer never does math then that's fine, but you can't assume that. Also, what if you sometimes need to use a math app? You have to keep in mind when to use which precedence rules and you can't just copy-paste math from one context to the other.

What is easier for you: unlearning math, or simply using what everyone has known since being taught it early in school?

So you're saying that a

So you're saying that a language should add an ad-hoc mechanism just to support expressions which the user can trivially specify using brackets. Computer science is larger than arithmetic, and the sooner the developer gets that, the better.

Now, if I only could

Now, if I only could understand what you're saying. Why brackets? What ad-hoc mechanisms? Why is computer science larger than arithmetic? Since when do more people speak computer science (esp. PLs without math precedence) than arithmetic?

I think this is a really huge problem with all experts and technical people. They live in their own world with no intimate contact to less technical people. Engineers can't talk in a way that the line workers understand. Physics professors can't explain a theory to students without making things ridiculously complicated. Programmers don't understand the needs of users. Language designers don't understand the needs of normal programmers. At least, this happens very often. Strangely, everyone is aware of a few problems ("my students don't understand!", "our users find our software too complicated!", "nobody wants our product!"), but only few actually try to understand what most people really need. Instead, many experts continue to design for themselves, assuming that this is what everyone secretly wants (or hoping that by bombarding people even harder they can solve the problem) and if somebody claims something different he is sometimes even called "stupid". Wasn't user-centered design intended to solve that problem?

Is there any language that was built around that principle? I guess Java, Python, Rails (even if not a PL) come close within their own market and they show that this principle can be fruitful.

Fortress, though still in development, at least has the will to serve its target audience. Overall, the syntax isn't dead-simple (there are lots of constructs), but at least the basics are. And who knows, maybe for most programmers having lots of syntax constructs is no different from learning an API or pattern, so maybe it's not important to have a syntax with as few constructs as possible, but rather enough flexibility to implement any concept you need. Yes, that's mere speculation, but it would be interesting to know if we can stop designing compact syntax definitions (i.e.: elegance from the PL developer's perspective) and start designing readable, clean, and consistent syntaxes (what the user might actually need).

BTW, is there any wiki/project where people can collaboratively design syntaxes (together with the PL's semantics, of course, but from the point of view of the syntax)?

Now, if I only could

Now, if I only could understand what you're saying. Why brackets? What ad-hoc mechanisms? Why is computer science larger than arithmetic?

Because the semantics of arithmetic is closed, whereas the semantics of programming languages as a whole is open. Why shoehorn a language's semantics into the pigeonhole of arithmetic? It just doesn't make sense.

The precedence of arithmetic operators is ad-hoc, and was used merely for convenience when doing math by hand [1]. There is nothing magical about it, and there's no reason to force a language to use a specific semantics just because arithmetic is done that way.

I mentioned brackets simply because every language needs a grouping mechanism, and even in mathematics brackets tend to be that mechanism. Brackets let the user specify the order of operations without sacrificing the native language's semantics. If you want 3 + 4 * 2 = 11, then group it properly to enforce the order of operations: 3 + (4 * 2) = 11. It's just better style anyway, as it's more resistant to refactoring.

[1] others have mentioned that the precedence was probably influenced by the properties of arithmetic.

Since when do more people speak computer science (esp. PLs without math precedence) than arithmetic?

It's not a matter of popularity, it's a matter of knowing what domain one is working in. The domain of programming languages is not the domain of mathematics. If one is programming, one should not expect everything to be as it is when one is doing mathematics.

If we're going to criticize languages for not following mathematical principles, then we should be consistent and criticize them also for not using unlimited precision integers, rationals, and maybe even reals by default, otherwise the programmer will be surprised the first time he performs division; unfortunately, the reals aren't even computable, so what do we do then? This highlights the very disconnect we have here: computation is not mathematics, and the sooner the programmer realizes this, the better off he will be.
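The gap between school arithmetic and machine arithmetic is easy to demonstrate. The following Python sketch is just one illustration, relying on Python's built-in arbitrary-precision `int` and the standard-library `fractions.Fraction`; the variable names are arbitrary and other languages make different default choices.

```python
from fractions import Fraction

# Python integers do not overflow; they grow as needed.
big = 2 ** 100

# Exact rationals are available, but only if you ask for them.
third = Fraction(1, 3)
exact_sum = third + third + third   # exactly 1

# The default "real" numbers are binary floating point, not mathematical reals.
float_sum = 0.1 + 0.2               # not exactly 0.3
floats_agree = (float_sum == 0.3)   # False

# Division comes in more than one flavour, unlike in a school textbook.
floor_quotient = 7 // 2             # integer result
exact_quotient = Fraction(7, 2)     # exact rational result
```

Even in a language with generous numeric defaults, the programmer still has to know which kind of number is in play.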

math

It's not a matter of popularity, it's a matter of knowing what domain one is working in. The domain of programming languages is not the domain of mathematics. If one is programming, one should not expect everything to be as it is when one is doing mathematics.

With this argumentation we could also say that math notation should not be used in engineering or anywhere else. But it's a concept we learn at school, so everybody knows it. Why is it wrong to reuse that knowledge everywhere? People don't re-learn conventions in every domain. Instead, they reuse what they already know, so they can concentrate on what really matters. By reusing existing notations you definitely make it easier to learn a new concept.

Seriously, the only programmer-side argument against using math precedence is that you need to keep it in mind when overloading + and *, but frankly, why should you overload those operators? That's arbitrary. You could as well have list.append() or string.append(), etc. which is more readable, anyway.

Unconventional conventions

With this argumentation we could also say that math notation should not be used in engineering or anywhere else. But it's a concept we learn at school, so everybody knows it. Why is it wrong to reuse that knowledge everywhere? People don't re-learn conventions in every domain.

But people do end up relearning conventions in different domains. The most obvious example I can think of off the top of my head is electrical engineers using j to represent imaginary numbers, instead of the mathematician's i (the latter being easily confused with the symbol typically used to represent electrical current). That's not to say that every convention must be relearnt. But there's certainly precedent for pragmatic adaptation of conventions to new domains.

IMHO, the difference between

I think the difference between your example and my example is that as an electrical engineer you only work with j instead of i, but basic arithmetic is ubiquitous. When you calculate something simple on paper (which doesn't have to be unusual, even for a programmer :) will you unlearn the math rules you know from school or continue to use them? When you use math in an application (Excel, gnuplot, etc.) you have to use math precedence. If you have a half-way decent calculator you again normally have to use math precedence. When you read other people's math (Wikipedia, articles, simple arithmetic, helping your son) you have to use math precedence. It's just about everywhere. You can't fight it. This is a different situation than what you described.

I think the difference

I think the difference between your example and my example is that as an electrical engineer you only work with j instead of i...

Unless I'm trying to read and apply a text or article written by a mathematician :-)

Seriously though, I actually agree with you (to a certain extent) about arithmetic precedence rules: the conventions for syntactic->semantic mapping that we use are well-developed, time-tested, and have been used in a number of different domains - so any language designer that plans to modify them had better have some pretty good justifications (aside from parsing convenience) for doing so. But I object to the blanket assertion that conventions aren't adapted for different domains. I don't have problem with arithmetic precedence rules being discarded if there is a good justification for doing so.

People reuse concepts more

People reuse concepts more than notation and/or syntax. If the syntax/notation is not usable in the new domain, due to conflict or other reasons, then new syntax that fits the domain is created. As the other poster mentioned, the notations of complex mathematics differ from discipline to discipline; as an electrical engineer I can attest to that.

Programming is such a different domain that new syntax is warranted. First we define the domain and the notations used so we can reason in the domain easily, then we adopt existing concepts into the new domain.

Honestly, syntax is easy to learn; it's the semantics that are hard. Internalizing how something behaves is much harder than learning how to label it. I can rattle off at least half a dozen different ways languages express string concatenation (+, ++, &, ^, append, concat, concatenate, etc.), and it took me half a second to learn each one. But first learning about strings and the meaning of concatenation took me a long time (relatively).

Exactly my point

and that's what a programmer has to do--think in the paradigm of the language, and not its syntax. Syntax that hides the paradigm--the executional semantics--causes a problem for those who misassociate the syntax with the semantics, who tend to be beginners...

Which is why a language for beginners should either use math precedence OR should NOT have concrete syntax (but rather abstract syntax).

Another example is Forth...
Syntax with the implied semantics of:
3 5 + 7 /
Is equivalent to the denoted semantics of:
push(3)
push(5)
exec(+)
push(7)
exec(/)

Or even more heavily denoted semantics of:
push(3, stack)
push(5, stack)
exec(+)
push(7, stack)
exec(/)

Or:

Forth_interpreter.pushOntoStack(3)
Forth_interpreter.pushOntoStack(5)
Forth_interpreter.execSymbol("+")
Forth_interpreter.pushOntoStack(7)
Forth_interpreter.execSymbol("/")

The point is that denoting semantics results in easy reading and interpretation. (But obviously not in productive code writing, which is not the point).
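The push/exec reading above can be turned into a few lines of runnable code. This is a minimal Python sketch of a stack evaluator, not a real Forth: `run_forth` is a made-up name, only integers and the four arithmetic words are supported, and `/` is assumed to be floored integer division.

```python
def run_forth(source):
    """Evaluate a tiny Forth-like program and return the final stack."""
    stack = []
    words = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a // b,  # assumption: floored integer division
    }
    for token in source.split():
        if token in words:
            # exec(word): pop two operands, push the result
            b = stack.pop()
            a = stack.pop()
            stack.append(words[token](a, b))
        else:
            # push(number)
            stack.append(int(token))
    return stack


print(run_forth("3 5 + 7 /"))  # (3 + 5) / 7 with integer division
```

Each token maps to exactly one stack action, which is the sense in which the terse form and the spelled-out form say the same thing.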

syntax depends on audience & good editor!

A "good enough" syntax heavily depends on the audience & on the semantics of the language.

I think that syntax is not that important. More important is the semantics, and also the editor.

Current programming environments are the tools which work the least for their users (ie developers). There is usually no equivalent of "spelling checkers" in them: the computer doesn't work much for the developer, except for syntax highlighting & (perhaps interactive) compilation (or "interpretation" in a wide sense).

For instance, assuming an ML like (strongly statically typed, functional, eager with few side effects) semantics, it would be IMHO welcome that the environment (ie a glorified editor & interactive compiler) made type inference & completion interactively (and mixed).

Regards.

And syntax is much less important than 30 years ago: As a teenager, I started with PL/1 on punched cards (I was 14 in 1974), and at that time syntax was important: a syntax error meant 5 - 10 minutes of lost time (the time to put in the punched cards & wait for the job & read the printed output); today a syntax error is only a few seconds of my time.

Semantics is more important: even with ocamlc -dtype & emacs with interactive type queries (e g C-c C-t IIRC), a type error (in a higher order function) sometimes takes a dozen minutes to be understood & found.

And pragmatics, ie program intention & debugging, is even worse.

Regards

Syntax is not that

Syntax is not that important? Imagine you want to get a first impression of a language (e.g., Haskell), so you want to see some sample code. Since you're interested in what the language can do better than your current language (you won't learn it just because the syntax is nice) you search and eventually find that quicksort can be defined in three lines:

 qsort :: Ord a => [a] -> [a]
 qsort []     = []
 qsort (x:xs) = qsort [y | y <- xs, y < x] ++ [x] ++ qsort [y | y <- xs, y >= x]

Uhm, what was that? Obviously, the author's keyboard must be broken or he tried to encrypt his message, so nobody could understand it. Even with hard thinking I can't recognize anything that might resemble a sorting algorithm. Well, let's look at a simpler example

fac :: Integer -> Integer
fac 0 = 1
fac n | n > 0 = n * fac (n-1)

Ouch, this hurts. One has to concentrate to understand something as simple as that. Only few programmers would at this point still be interested in the language. The language fails to convince the programmer of its advantages because it talks to him in "Alienish".

It's probably of greatest importance to get the basic syntax constructs right, so the programmer can understand code examples and later doesn't have to fight with an oversimplified syntax (Lisp, Smalltalk) making the program semantics more complicated. When that is in place, I fully agree that semantics is more important.

Your examples are no more

Your examples are no more clear to the uninitiated in any language, so I don't see your point. And your quicksort is wrong.

It's easier to read a piece

It's easier to read a piece of Python code and it's easier to explain what it does. I've tried to explain various syntaxes to people new to programming and Python was readable. Smalltalk has nice aspects, too, but you quickly hit a barrier where the syntax simplicity starts to make the code more complicated (no math precedence, [X] whileTrue: vs X ifTrue:, $#|: are more complicated than if they were replaced with more descriptive solutions). Something like Haskell makes it incredibly difficult to get started because the code looks like line noise.

BTW, I was already wondering what happened to the quicksort code. It was not formatted correctly because Drupal assumed HTML... :(

I agree that Python is very

I agree that Python is very readable; I think many language designers should learn Python before attempting to design their own. Python is not as safe nor as expressive as Haskell though, and as programmers we are interested in the safety and expressiveness of our language as well as its readability. As for using operators in place of descriptive names, there is some contention over this; I believe most functional languages provide both and the operator is just an alias for the name. However, certain operators in a language are so common you should simply remember them (indexing expressions, concatenation, etc.). This is true of any language.

Python

Python is not as safe nor as expressive as Haskell though

By safe, do you mean static typing? I think we're getting into a totally different topic, then, and it doesn't matter what I personally prefer (I seriously haven't made up my mind, yet), but I think that there is a good reason why most newbie PLs use dynamic typing (one less thing to learn). If the IDE could only warn you when you have typos, for example, so no stupid errors slip through...

If you talk about real safety, I'm not sure if Python could be made as safe as E, for example, but I think that the syntax itself can't be the problem, here.

As for expressiveness, I can't judge that. When my friend and I once coded a little statistical simulation using the same code principles (he tried to map my Python code 1:1 into C++) I was amazed that the C++ code was only about 30-40% bigger than my Python code which I really tried to reduce to the shortest possible form (at the expense of readability, but the C++ code was much more horrible). My expectation was more around 200-300% bigger C++ code. I've also once read about someone who made another such comparison with C++ vs Lisp. For the comparison he implemented C++ lists in a way that they don't have side-effects (i.e.: they return a copy with the modifications). He claimed that with this he was able to nearly match Lisp's expressiveness (unfortunately, I can't remember the source code). This really makes me wonder how much influence the programming language actually has on your code and whether it's not much more important to find a good concept for the PL's library.

Say what?

Elsewhere in this thread you harangue some languages for not following conventional math operator precedence.

If you pull out a math textbook and look up factorial the formula is likely to look quite a bit like the Haskell code you wrote. Yet, in this case you say the language of math is too weird and you'd rather the programming language look nothing like it.

On to quicksort. Here's a correct (though naive) form of quicksort in Haskell

qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (p:xs) = qsort lesser ++ [p] ++ qsort greater
    where
        lesser  = [ y | y <- xs, y < p ]
        greater = [ y | y <- xs, y >= p ]


It's very clear. The first line is a type signature indicating that you can only quicksort lists of elements that can be ordered. The next line says a quicksort of an empty list is the empty list. The third line says that a quicksort of any other list is the quicksort of all the elements less than the pivot (chosen as the first element) concatenated with the pivot concatenated with the quicksort of all the elements greater than or equal to the pivot. The last two lines spell out how to find the elements that are less than or greater than the chosen pivot. In other words, the Haskell version is pretty close to how you would describe quicksort.
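For readers who want to compare notations, here is the same naive algorithm transliterated into Python. This is only a sketch for side-by-side reading, not a claim about which version is clearer; like the Haskell above, it always picks the first element as the pivot and builds new lists rather than sorting in place.

```python
def qsort(items):
    """Naive quicksort: first element is the pivot, result is a new list."""
    if not items:
        return []
    pivot, rest = items[0], items[1:]
    lesser = [y for y in rest if y < pivot]
    greater = [y for y in rest if y >= pivot]
    return qsort(lesser) + [pivot] + qsort(greater)


print(qsort([3, 1, 2, 3, 0]))
```

The structure is the same line for line: an empty case, a pivot split, two filtered sublists, and a concatenation.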

Quicksort isn't something you throw at a new programmer on the first day. Quicksort in any language requires a deep understanding of recursive thinking which takes time for programmers to really "get."

Yes, there is notation to learn. But by the time you're ready to introduce quicksort the students should be quite comfortable with other list manipulation things like list pattern matching, list comprehensions and list concatenation.

I'm not trying to pimp Haskell here in general. I'm just saying your examples happen to be in an area where Haskell is particularly good at conveying the concept being taught: recursion. I suspect you're looking at it through eyes clouded by many years of using languages that have a less declarative feel.

I wasn't talking about

I wasn't talking about higher-level math. That's not what most people know. I was talking about math precedence (+-*/^) and basic math functions everyone learns at school (ln, sin, cos, tan, ...).

As for the quicksort code: I don't know why you need => and -> and :: and : and ++ and <- to express the algorithm. Isn't it possible to replace that with short, descriptive words?

It would take less effort for me to understand the code if it didn't look like line noise.

Hmm, lesser and greater are list comprehensions, right? Couldn't that be expressed more clearly like (just an example)

lesser = every xs < p

BTW, my eyes aren't clouded. I simply don't see why all declarative languages need to look like %$%W$W§%§. Is there any way you can justify that? In many imperative languages you say
append(list, list2)
In declarative languages you nearly always have constructs like
list ++ list2
or (lemme' invent something ;)
?x shark (instead of "x is shark" like in Python)

Yes, it's shorter, but it doesn't make the code more readable and that's much more important than saving three bytes per line. Readability is the reason why many programmers give their functions descriptive names like rename() instead of ren(). If abbreviations are used to keep lines short, so more function calls can fit on a single line, then I think this totally misses the point. The code doesn't get any less complicated, so it doesn't buy you any expressiveness. What speaks against having Haskell with more readable syntax? I don't think it would take away any power from the developers, but it would make the language more attractive to new developers and the code easier to understand.

I don't know which concept (declarative vs imperative) maps most closely to how the human brain works, because I do more with imperative PLs, but I have the impression that our world is not recursive but rather iterative, so imperative PLs might be easier to think in (at least for beginners). Though I probably couldn't come up with good arguments to support my speculations.

Confusing library with the language syntax

Yes, imperative thinking is a part of our every day lives. But so is declarative thinking and recursion (just try to explain English grammar to somebody without getting recursive). The two examples you chose, quicksort and factorial, are pretty simple to explain declaratively/recursively and are much more involved to explain imperatively.

I'm slightly confused about your issues with symbols. Apparently your hypothetical student programmers are smart enough to learn that * means "multiply," contrary to a lifetime of learning that the symbol is more like an x. At the same time, your hypothetical newbie student isn't smart enough to learn that ++ means concatenate lists even though they have no particular training in what concatenation should look like.

Anyway, there's nothing inherent to Haskell (or declarative languages in general) about ++, etc. That's just a rule about the language as to what constitutes a legal identifier. In some languages you must use alpha-numerics. In some you can use symbols. Here's a rewrite of the Haskell code using words (that of course have to be defined somewhere but which can also be defined without any funky symbols.)

qsort []     = []
qsort (pivot:tail) = qsort lesser `concatenate_with` [pivot] `concatenate_with` qsort greater
    where
        lesser  = every_item_from tail (< pivot)
        greater = every_item_from tail (>= pivot)

The only unusual symbols I've left in are square brackets, which are an easy way to work with lists and take about 4 seconds to explain.
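
For what it's worth, here is a minimal sketch of how those word-named helpers might be defined so that a qsort in this style compiles. The names concatenate_with and every_item_from are made up for this thread and are not in any standard library; for brevity they are defined on top of (++) and filter, and the pattern is written as (pivot:rest) so it doesn't shadow the Prelude's tail.

```haskell
-- Hypothetical word-named helpers; neither exists in the standard library.
concatenate_with :: [a] -> [a] -> [a]
concatenate_with = (++)

-- Keep every item of the list that satisfies the predicate.
every_item_from :: [a] -> (a -> Bool) -> [a]
every_item_from items keep = filter keep items

qsort :: Ord a => [a] -> [a]
qsort []           = []
qsort (pivot:rest) = qsort lesser `concatenate_with` [pivot] `concatenate_with` qsort greater
    where
        lesser  = every_item_from rest (< pivot)
        greater = every_item_from rest (>= pivot)
```

Loaded into GHCi, qsort [3,1,2] gives [1,2,3].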

As for factorial being higher level math...um...wow. I won't begin to address that. Instead, here's a rewrite of factorial that looks more like you might find in other languages. I've even thrown in a bunch of parens and such. It's still Haskell, though.

fac n = if (n == 0)
          then 1
          else n * fac(n - 1)

Does the above look "natural" to you? If so, then you've fallen into a trap: you assume that a newbie will see the world with your eyes. I assure you that they won't. There's nothing particularly natural about the definition above any more than the equational definition you presented earlier.

At the same time, your

At the same time, your hypothetical newbie student isn't smart enough to learn that ++ means concatenate lists even though they have no particular training in what concatenation should look like.

A student definitely is smart enough. Why is the standard assumption of experts that somebody is too stupid? The question is: should we have to learn it, at all? Should we have to deal with source code that looks like ++//-*% which takes more time to decipher than more descriptive text (which at least has some resemblance to a human language)?

Does the above look "natural" to you?

You're mistakenly assuming that I apply my own familiarity with languages, here. I'm not too stupid to know that I shouldn't take myself as the ultimate reference, even if you believe that. What I don't like about the first fac code is that it looks too much like line noise, even for people who already know a PL. Yours looks OK and this one is acceptable, too:

fac 0 = 1
fac n = n * fac(n-1)

Well, fac is too small to make a point (and your qsort example could use shorter names like "filter" instead of "every_item_from", but you didn't mean it seriously, anyway, so I can stop here).

Takes more time for who to decipher?

Once you recognise them, symbols are much faster to read than lengthy identifiers. This makes it easier to transform code (or algebra, to pick a perhaps more familiar example), because it's easier to spot the relevant patterns.

Um,

obviously programming in abstract syntax (mapped 1-to-1 to semantics more or less) and programming in concrete syntax have their trade offs. Why paint our glasses?

Why not allow both? Meeting everyone's needs would require both. Perhaps an IDE can provide visualizations of both abstract and concrete syntax. There are several possibilities here (think intentional programming). Why not have our cake and eat it too?

A constructive suggestion

Why is the standard assumption of experts that somebody is too stupid? The question is: should we have to learn it, at all? Should we have to deal with source code that looks like ++//-*% which takes more time to decipher than more descriptive text (which at least has some resemblance to a human language)?

Givas, you are skating awfully close to ad hominem here. No one was calling anyone stupid, and though there are languages that do look like line noise, none of them are being suggested here as elegant syntax design.

If you like Python for "newbies" (whoever they happen to be in your view), fine. But long experience here at LtU suggests that this kind of discussion goes nowhere unless some fairly specific and objective criteria are put forward as a sensible basis for discussion.

"I like X" and "Y is ugly" won't get us anywhere.

OK

Seriously, I hoped that it would be pretty clear which syntax is the easiest to read, but it seems like most people here (who, I think, are far beyond C-like languages) don't care so much about syntax and always begin to talk about semantics.

I would like to know what most developers think about syntax. Is it important? Can anything potentially be accepted by the market? I think I won't find the answer here, but I also don't know where to ask.

What I really want is a syntax that
* is easy to read
* doesn't have noise and unnecessary statements/delimiters
* doesn't add surprises to the language like Smalltalk's "X ifTrue" vs "[X] whileTrue"
* has acceptable math notation

It would be great to have the good properties of Smalltalk combined with the cleanness and math compatibility of Python.

But this is only my personal wish and I don't know if this is what most programmers want, too. I don't care so much about syntax as long as it doesn't add unnecessary cruft (as in C++/Java/Pascal). I just want to finally have a popular PL that is not just a slow scripting language and that has similar expressiveness as Lisp, for example. I have the impression that syntax is pretty important because otherwise people wouldn't be learning C# or D, but something much more powerful and suitable for more than just scripting and web development (i.e., Ruby and Python).

Why are Lisp, Haskell, and Smalltalk unpopular (compared to C++/Java/PHP/Python)? With Smalltalk I could imagine that it might be the VM and the lack of an easy to use open-source IDE (Squeak's UI is too overloaded and too much VM-centric), but if that were the case then this problem would've been solved a long time ago. After all, there are companies that are interested in Smalltalk's success and I doubt they don't want to become more successful. Also, it's not like Smalltalk forces you to use a VM that integrates badly with the host OS. The only explanation I have is that it must be the syntax. At least, from the posts I read on the web and from my discussions with other programmers, this could actually apply to Lisp and Haskell.

Does anyone have a different good (overwhelmingly convincing) reason why Smalltalk doesn't gain significant market share?

Programming Language

Programming Language Popularity. This and issues related to popularity and/or mainstream languages have been discussed on LTU before. It sounds like your language is OCaml, SML or Scala. OCaml is gaining serious traction, particularly with Microsoft's commercialization of F#; OCaml will in general match the speed of C++, it's concise, and it's simple. Scala is available now on the JVM, and people are doing incredible things with it. There are only a few things missing from all statically checked languages; that list applies to OCaml, but not so much to Scala and F#.

There are many reasons why the languages you mention haven't gained traction, from performance issues (perceived or real), to platform issues, to insufficient static checking, to insufficient libraries, to national and/or language barriers, and so on. Claiming one reason is dominant is misrepresenting all the diverse factors involved.

People are learning C# because it's pushed by Microsoft and Novell. People are learning D because they see it as a better, safer C++ without the "VM cruft" of Java/C#. Haskell is unpopular because it's too powerful a language for most people to program in; until the "monadic revolution", it was even too hard to do I/O. It's still very much a research language, but with a growing number of practical applications. PHP is the web scripting language for C programmers; trivial to pick up and installed on all the cheap web hosts in the world.

Lisp/Scheme may be the only well-known languages that actually are unpopular because of their syntax. ;-) But even they are used heavily in certain industries (there was a recent thread about an airline reservation system written in Lisp).

The reason people keep bringing up semantics is that the issues you bring up are semantic; for instance, your issues with Smalltalk are issues with its semantics, not its syntax.

* doesn't add surprises to the language like Smalltalk's "X ifTrue" vs "[X] whileTrue"
* has acceptable math notation

This is semantics, not syntax. Comprehensions, first-class functions, objects, types, math, etc. are all semantic issues, not syntactic issues. A mathematical syntactic issue is whether integer literals like 1234 require a postfix to distinguish long (1234L) from int (1234I); operator precedence is semantics. Ring/modular arithmetic as found in hardware is semantics.

There are many reasons why

There are many reasons why the languages you mention haven't gained traction, from performance issues (perceived or real),

Did anyone try to market those languages (or a variant) as scripting languages, so people don't look at performance so much?

to platform issues,

Does this still apply to Smalltalk?

to insufficient static checking, to insufficient libraries, to national and/or language barriers, and so on. Claiming one reason is dominant is misrepresenting all the diverse factors involved.

So, I think we both agree that while Lisp and Haskell do have syntax or semantics problems for many programmers it's really strange that Smalltalk can't gain traction.

My reasoning is that if someone wants to learn a new language he first looks at example code to get a first impression. If that example code looks too strange he just gives up and tries some other language. Squeak is horribly integrated into the platform and without reading a tutorial you can't even explore the environment by trial-and-error. Other Smalltalks don't have that problem, though. Maybe most people just don't see what Smalltalk gives them that they can't get with Python or Java. When compared to other dynamic languages, Smalltalk's power lies primarily in its IDE, so maybe this is another factor that makes people wonder why they should learn it, especially if the syntax is unfamiliar and maybe also if it doesn't support math and normal function calls. Probably another problem with Smalltalk: it forces you into the object paradigm, but that's not always convenient.

When I was looking into

When I was looking into alternative languages way back when, my problems with Smalltalk were the lack of static typing and its image-based nature. I was performance-oriented at the time as well, so that eventually impacted my decision too.

As for platform availability, GNU Smalltalk seems fairly portable and the fastest freely available Smalltalk implementation; however, while its floating point performance is higher than Python's and Ruby's, it's still not in the league of C, and even Perl can beat it out on microbenchmarks. Personally, the lack of strong static typing is a deal-breaker for me.

Static typing, yes, I forgot

Static typing, yes, I didn't mention it because I assumed that dynamic typing is easier for beginners, but it definitely is another difficulty getting C++/Java developers to switch.

Why do you think that? I'm

Why do you think that? I'm not saying you are wrong, I'm just curious why you think so.

Static typing has always

Static typing has always been a topic for heated debates. Some people prefer it because they want the compiler to catch as many errors as possible. Typos, for example, can slip through in dynamic languages which is very annoying. Dynamic languages heavily depend on test cases which many programmers (or "hackers") need to get used to, first. The argument goes that you have to write much more code.

Before I start a flame war: that's not the topic, here! I choose my language depending on what is needed for my particular task, not based on some stupid prejudice.

I wasn't trying to start a

I wasn't trying to start a heated debate, I wanted to know why you think dynamic typing is easier for a beginner.

It's one less obstacle to pass before you can start running your code, that's for sure. But if that's the argument, you could also do what some old BASIC interpreters did and postpone parsing of a line until that line is being executed. That way you get to start running code even quicker.

Apologies

I'm not too stupid to know...even if you believe that.

Please accept my apologies if I gave you the impression that I think you're stupid. I was (rather clumsily) trying to make a point that it's easy for any of us to fall into the trap of assuming that something unfamiliar to us is intrinsically more difficult than something that is familiar to us. The mistake on my part was in using the phrase "if...then you...". Again, my apologies.

With that, I'm going to bow out. I've made my preferences clear elsewhere for syntax to follow semantics. I just hope that the above exchange has shown that while it's possible to write "obfuscated Haskell" it's also certainly possible to write very literate Haskell. Indeed, most languages have this property to some extent or another.

I just wanted to end the

I just wanted to end the potential confusion. No need to apologize. I wasn't angry when I wrote that. I really spent a lot of time thinking about this topic, I talked to developers, and read opinions on the web. I really want to know why some languages are less successful than others, despite their obvious semantic advantages. Marketing can do a lot (as with Rails), but it can't make something successful that the market doesn't want.

Haskell is pretty close to math notation....

If anyone is still compiling the "vote": I vote for Haskell. But this is just me, at this time.

I disagree with your claim that Haskell is "Alienish." It just depends on familiarity. And this shapes the whole issue of whether the notation is similar to math or not.

For example, the use of "|" to represent "such that" is common in set theory, analysis, topology, etc. List comprehensions are generally done in Haskell almost exactly as they would be in conventional set-theoretic notation. And the "

The Haskell factorial example is just about an exact mapping of a basic definition of factorial. (A fun thing to look at is Fritz Ruehr's "The Evolution of a Haskell Programmer," at http://www.willamette.edu/~fruehr/haskell/evolution.html ).

(On a slight aside, even the non-eager side of Miranda and Haskell means that unbounded or infinite data structures can be handled without special keywords such as "force" or "delay," so that one can make statements akin to "Let P be the set of primes" without muss or fuss. Very nice to see such a close match with real mathematics!)
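
To make the aside concrete, here is a small sketch of "Let P be the set of primes" as an infinite lazy list. It uses the well-known trial-division filter (not a true sieve of Eratosthenes, and not efficient); the names primes, firstTen and evenSquares are my own. The comprehensions also show the "|" and generator notation reading much like set-builder notation.

```haskell
-- { x^2 | x in {1..10}, x even }, written as a list comprehension.
evenSquares :: [Integer]
evenSquares = [x * x | x <- [1 .. 10], even x]

-- An unbounded list of primes: laziness means only the demanded prefix
-- is ever computed, with no explicit "force" or "delay".
primes :: [Integer]
primes = sieve [2 ..]
  where
    -- Keep the head, then drop its multiples from the rest (trial division).
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
    sieve []     = []

firstTen :: [Integer]
firstTen = take 10 primes
```

Here firstTen evaluates to [2,3,5,7,11,13,17,19,23,29], even though primes itself never ends.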

Also, the issue of operator precedence is not even so clear as you make it out to be even in math. Consider three different, and commonly used, ways to express addition of two numbers:

"31 + 43 = 74"

31
+ 43
----
74

"Take 31 and 43, then add them together"

Arguably, the second form is the most commonly used way of hand-adding numbers. The two operands are prepared, then the operation is applied. It sure looks like a version of Polish notation: Reverse Polish Notation, RPN, so familiar to so many of us from H-P's line of calculators.

(This made Lisp's "(+ 31 43)" notation look utterly natural to me when I first encountered it back in the 70s, not long after I'd started using an H-P 25 on a daily basis.)

Forth is another example, as someone just mentioned.
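
As a rough illustration of "prepare the operands, then apply the operation", here is a minimal sketch of a postfix (RPN) evaluator with an explicit stack. The function name evalRPN and the choice of Double for numbers are assumptions of mine, and only the four basic operators are handled.

```haskell
-- Evaluate a whitespace-separated postfix expression using a stack.
-- Returns Nothing on malformed input (unknown token, too few operands,
-- or leftover operands at the end).
evalRPN :: String -> Maybe Double
evalRPN input = finish (foldl step (Just []) (words input))
  where
    -- An operator pops two operands and pushes the result.
    step (Just (y:x:rest)) "+" = Just (x + y : rest)
    step (Just (y:x:rest)) "-" = Just (x - y : rest)
    step (Just (y:x:rest)) "*" = Just (x * y : rest)
    step (Just (y:x:rest)) "/" = Just (x / y : rest)
    -- Anything else must parse as a number, which is pushed.
    step (Just stack) token = case reads token of
        [(n, "")] -> Just (n : stack)
        _         -> Nothing
    step Nothing _ = Nothing

    -- A well-formed expression leaves exactly one value on the stack.
    finish (Just [result]) = Just result
    finish _               = Nothing
```

With this, evalRPN "31 43 +" gives Just 74.0, and evalRPN "1 2 3 * +" gives Just 7.0 without any precedence rules being involved.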

This gets back to the point about why the poll is misleading. People are often fondest of the languages they like. And there's the issue of background of the users, intended uses, and so on. Most engineers I worked with were comfortable with FORTRAN, then C. Others like APL, with its own syntax. Most AI folks would use something else. And so on. As I said, the fact that Haskell uses notation that looks like a 1-to-1 mapping from pure mathematics is very nice...for my particular uses. (Cf. some of Martin Escardo's papers, mentioned here on LtU before, where he takes ideas out of topology, such as Baire spaces, compactness, etc., and expresses them quite simply and elegantly and without too many extraneous symbols in Haskell.)

--Tim May

Haskell is quite alien

In "normal" language, if a concept is hard to explain, you can always show the implementation and this may help (for example, for inheritance you can explain the vtable and it will seem less magic), but for Haskell I'm not sure that you can do this: good luck for explaining monads to the beginners.

I've never been able to understand it myself and by now I must have read something like ten "tutorials"..

Little built in

Haskell has an unusually small amount of special behavior. Many things you might think were built into the language, like the notation "+" for addition or try-finally bracketing, are really just library definitions. There is a lot of syntactic sugar, and syntax you might want to extend, like do notation or numeric literals, is defined by translation into type classes you can implement. In GHC I think even most of the stuff that requires runtime support is done with library definitions making standard foreign function calls into symbols exposed by the runtime system. For another dimension of transparency, pure code can be evaluated stepwise just with source transformations, without bringing in a machine model, because there is no store to worry about.
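
As a small sketch of operators being ordinary definitions: the two operators below, |+| and |*|, are hypothetical names I made up, declared with the same precedence and left associativity so that they evaluate strictly left to right, the way the Smalltalk option in the poll does. The only assumption about the standard library is the Prelude's usual fixities for + and *.

```haskell
-- Hypothetical operators with equal precedence, associating to the left,
-- so an expression is evaluated strictly left to right.
infixl 6 |+|, |*|

(|+|) :: Integer -> Integer -> Integer
(|+|) = (+)

(|*|) :: Integer -> Integer -> Integer
(|*|) = (*)

-- Parsed as (1 |+| 2) |*| 3 because both operators share one precedence level.
smalltalkStyle :: Integer
smalltalkStyle = 1 |+| 2 |*| 3

-- Parsed as 1 + (2 * 3): the Prelude declares infixl 6 + and infixl 7 *.
mathStyle :: Integer
mathStyle = 1 + 2 * 3
```

Here smalltalkStyle is 9 and mathStyle is 7; the difference comes entirely from the fixity declarations, not from anything built into the parser for these particular symbols.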

Most monads are ordinary Haskell code. You can look at the library sources to see the real definitions of standard examples like Reader or State. It's quite a bit shorter than some dummy vtable code (by the time you cover casting and interfaces), and it's really what your program actually uses. If you want to, you can even see how GHC defines IO; ghci is happy to tell you

newtype IO a
= GHC.IOBase.IO (GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, a #))

Maybe you mean Haskell programs use more abstract concepts which don't immediately make sense even with an implementation? That's probably true, but I don't think the language design makes things harder to understand, and I'm certain it's not due to the syntax - do you think it would be easier to understand a list or continuation monad in another syntax, say Java generics?

Say what, again?

In "normal" language,... you can show the implementation ... good luck for explaining monads to the beginners.

The Monad type class and monads like List, Option, State, Reader, Parser and indeed most monads can be expressed in perfectly ordinary Haskell. Haskell does not have any special underlying mechanism for monads other than a tiny bit of syntactic sugar and it's totally optional and entirely superficial.
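
To back that up with something concrete, here is a minimal sketch of a State monad written from scratch in ordinary Haskell. It is not the library's actual source (the libraries define State via a transformer), and tick and threeTicks are illustrative names of my own. The Functor and Applicative instances are there because current GHC requires them as superclasses of Monad.

```haskell
-- A state-passing computation: take a state, return a result and a new state.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
    fmap f (State run) = State $ \s ->
        let (a, s') = run s
        in (f a, s')

instance Applicative (State s) where
    pure a = State $ \s -> (a, s)
    State runF <*> State runA = State $ \s ->
        let (f, s')  = runF s
            (a, s'') = runA s'
        in (f a, s'')

instance Monad (State s) where
    -- Run the first computation, then feed its result and state to the next.
    State run >>= next = State $ \s ->
        let (a, s') = run s
        in runState (next a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- Return the current counter and store its successor.
tick :: State Int Int
tick = do
    n <- get
    put (n + 1)
    return n

threeTicks :: State Int [Int]
threeTicks = do
    a <- tick
    b <- tick
    c <- tick
    return [a, b, c]
```

Evaluating runState threeTicks 0 gives ([0,1,2],3). The do blocks are the "tiny bit of syntactic sugar"; everything else is plain functions and a newtype.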

In Haskell98 the monad you can't see in Haskell is the IO monad. But in most languages the implementation of the IO library is hidden - ultimately buried in some C and assembly. Haskell is no different in this respect.

In other Haskell extensions there are a few monads that are hidden because they deal with foreign function calls, inter-thread communication, and other low level things. But again, that's par for the course.

I've never been able to understand it myself and by now I must have read something like ten "tutorials".

People's difficulty in grasping monads has nothing to do with their implementation being opaque. The implementation is completely transparent and I have written monads in a few languages and seen monads in half a dozen more.

Now, many type systems prevent you from creating a generic monad type - but again, that's not due to special monad support from Haskell; that's just due to the way Haskell's type system works.

I should repeat that I'm not shilling for Haskell as the best beginners language. I'm saying that if you want to throw Haskell out of contention then do it because you think statically typed, lazy, purely functional code isn't right for beginners. Don't throw it out because of a misapprehension that monads are implemented in some opaque fashion.

Monads...

Not many of the monad tutorials really seem to explain them very well; I actually grasped them when I was looking at the Mercury language tutorial. The basic idea is this:

Say you want to print a string 3 times in a row you might say:

print("hi"); print("hi"); print("hi");

which might work as expected. However, in a mathematically pure language each function can have only 1 output per given input. This would break your print function, since it affects the "world" and so the output is really different for each print("hi"). To fix this you'd probably do this:

print("hi", 1); print("hi", 2); print("hi", 3);

And so on, threading the state of the world through all your functions which touch the world, such that no two function calls affecting the world have the same input. With monads however, this is done for you in the background:

print("hi") >> print("hi") >> print("hi")

really takes the output of the function, and throws it in a tuple along with the world's state. You can think of the 3rd example using monads as being transformed into the second example by the compiler to preserve mathematical purity.
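
Here is a toy sketch of that idea in Haskell itself. Everything in it is an assumption for illustration: World is just the list of lines "printed" so far, printW stands in for print, and andThen plays the role of >>. The real IO type is opaque and this is only a simplified model of it.

```haskell
-- A stand-in "world": the lines printed so far (purely illustrative).
type World = [String]

-- A world-passing action: take a world, return a result and a new world.
type Action a = World -> (a, World)

-- Hypothetical pure print: appends the message to the world.
printW :: String -> Action ()
printW msg world = ((), world ++ [msg])

-- Explicit threading: each call receives the world the previous one returned,
-- so no two calls ever see the same input.
explicit :: Action ()
explicit w0 =
    let (_, w1) = printW "hi" w0
        (_, w2) = printW "hi" w1
    in printW "hi" w2

-- A sequencing operator that does the same plumbing behind the scenes.
andThen :: Action a -> Action b -> Action b
andThen first second w0 =
    let (_, w1) = first w0
    in second w1

implicit :: Action ()
implicit = printW "hi" `andThen` printW "hi" `andThen` printW "hi"
```

Both explicit [] and implicit [] produce ((),["hi","hi","hi"]); the second form simply hides the intermediate worlds.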

Mercury does something similar, without monads. The print predicate in Mercury looks something like this:

Print("hi", IO_1, IO_2), Print("hi", IO_2, IO_3)

and the compiler provides the following as a convenience:

Print("hi", !IO), Print("hi", !IO)

which gets transformed into the former. I think monads are a little more clean, flexible, and nice.. once I grasped them :)

Sorry if that was a little off topic, to respond to the original topic posted:

I think any syntax/semantics which removes any and all ambiguity is best. One of the more annoying points of learning to program for me when I was starting was remembering the precedence rules in languages like C or Java, which inevitably led to a sort of Lots of Irritating Superfluous Parentheses (LISP) style code. That said, I really like CL/Scheme, Factor, and Haskell. The first two remove just about all ambiguity at the syntactic level. I like Smalltalk as well, but the environment is foreign to me, and is very slow.

Haskell is pretty close to math notation....

To add something to my post that for some reason didn't appear: (Note: I tried several variants of the less than symbol followed by the hyphen symbol, but each time the Preview comment showed everything after the less than symbol, including it, not appearing. Hmmmh.)

"And the notation is as close as one can get to "is an element of" or "is taken from the set," usually denoted with the Greek small epsilon, as one can get in ASCII. Very easy to learn. And any language that tries to do the same thing with list comprehensions, selectors, and quantifiers is going to have to make choices about the ASCII representation. The designers of ISWIM, Miranda, Haskell, etc. made choices that closely match math notation. (I recall that some of these constructs came from SETL, a set theory-oriented language. ISWIM pioneered, too.)"