The English-Likeness Monster

I thought that denizens would be interested in this post by John Gruber about the problems caused by the notion that making AppleScript syntax work like English would lead to a friendlier language: http://daringfireball.net/2005/09/englishlikeness_monster

Besides Cobol and Perl, are there any other languages whose designers felt that making the syntax resemble English (an evolved rather than designed language, with legacy syntactic structures stretching back thousands of years and a history of ambiguous usage) was a good idea? Surely someone must have done some research showing why this usually leads to languages that are harder to work with?

This raises some interesting questions about the ways in which language design happens. Should we be paying more attention to the choice of syntax (even though semantics matters, I'd claim that in most of the languages used by the vast majority of people the scoping rules and type systems don't wander outside a narrow range of options), and to syntax comparisons such as the one compiled as part of the Merd language project: http://merd.sourceforge.net/pixel/language-study/syntax-across-languages/ ?


language like

There's a thread on the Rebol mailing list about this:

http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-thread.r?m=rmlCJQS

I can't remember the relevant quote that they cite, but here is an interview with Carl Sassenrath, Rebol's inventor, where some of the basis for thinking of Rebol as a human language is discussed:

http://www.rediff.com/computer/1999/sep/29carl.htm

J, with a syntax not at all like English, is nonetheless based on a set of concepts more common to human languages than to computer languages (although some of the concepts are directly translatable):

http://www.jsoftware.com/books/help/jforc/preliminaries.htm#_Toc24767903

Variables and Loops

I found the interview with Carl Sassenrath amusing in places. For example, when he said, "Computation is computation. If you go the programming language route there is no way around teaching people what a variable means or what a loop is," my first thought was that the first sentence is absolutely correct. But then I looked at the second, and realized that Haskell has neither variables nor loops.

But as far as looking like English, well, I don't know. Does speed.r look like English to you?

And, this is friendly to naive users?

"To download a copy of REBOL, you need to indicate that it is binary. Here's how:"

write/binary %rebol.zip
    read/binary http://www.rebol.com/downloads/core031.zip

Ugh.

But computation isn't computation, except in the very trivial case of "being computable", and even here you need to specify the computable problem set.

I agree

Rebol does not necessarily achieve the goal of being English-like, but it is an explicit goal, and one that has generally been more closely achieved in simple one- or two-line scripts, like

write %currentlambda.html read http://lambda-the-ultimate.org

SQL, DB3/FoxPro/Clipper, OmniMark

One obvious such language is SQL. If you're looking for more general-purpose languages, I can only offer two examples that go halfway there: Clipper (or the dbIII+ it originated from) and OmniMark. Both languages are Turing-complete and can be used for general-purpose programming, but they are meant to be used in specific niches. Here's a taste:


  • Clipper/dbIII+:

       delete for x = 1 while x > 0 next 50
       copy file "foo.txt" to file "bar.txt"
       set x to 5
    

  • OmniMark:

       using output as file "foo.txt"
          repeat over y as x
             output x when x is attached
          again
       set x to 5
    

The two languages are not related at all, but you can see some interesting similarities. They both use SET <variable> TO <value> syntax, just as AppleScript does. Both are case-insensitive. Both originated as small, purpose-specific languages, and both later kept adding more symbolic and mathematically-oriented syntax. And most interestingly, to me at least, both eventually allowed user-defined functions to have the same kind of rich syntax as the predefined language.

My personal experience is that, as with any other PL feature, English-like syntax can be helpful as long as it's not overused. That means it has to be optional, and the programmer must be able to disambiguate what he means using parentheses or something similar.

huh

I never thought I would live to see the day Clipper showed up on these Lambda pages... :P

Merd & Scala

Merd is neat, although I already registered my desire for them to stick with mathematical syntax when doing math, instead of trying to do some 'intelligent' parsing: "horizontal layout: 1+2 * 3 is (1+2) * 3 (merd innovation!)" seems to me to be flying in the face of centuries of perfectly understood mathematical rules.

Scala has a nice comparison with AsmL, and I actually found the AsmL code to be a lot more readable. It might take longer to type, but personally I would favor that slightly 'naturalized' and wordy style simply for long term maintenance.

parsing arithmetic

While making 1+2 * 3 mean (1+2) * 3 seems like asking for trouble, I've wondered if it might be a good idea to make it a compile error (or a warning).

More generally, something like:

If a subexpression could "go with" either its left or right context, to be disambiguated by precedence and/or associativity, then the whitespace on the side it "goes with" must be <= the whitespace on the other side (where "no space" < "one or more spaces" < "newline"). So "1 + {NEWLINE} 2 * 3" would be fine, but "1 + 2 {NEWLINE} * 3" would be a compile error. Does anybody else think this is a good idea?
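To make the rule concrete, here is a rough sketch in Python of how such a layout check might work (my own illustration, not taken from any existing compiler). It only handles a toy expression language with + and *, and the function and variable names are invented for the example:

    import re

    # Toy layout checker (illustration only, not a real compiler pass).
    PRECEDENCE = {'+': 1, '*': 2}

    def gap_rank(gap):
        # Order gaps: no space (0) < one or more spaces (1) < newline (2).
        if '\n' in gap:
            return 2
        return 1 if gap else 0

    def check_layout(expr):
        # Tokenize, keeping the whitespace that precedes each token.
        tokens = [(m.group(1), m.group(2))
                  for m in re.finditer(r'(\s*)(\d+|[+*])', expr)]
        warnings = []
        for i in range(2, len(tokens) - 2, 2):        # inner operands only
            left_op, right_op = tokens[i - 1][1], tokens[i + 1][1]
            left_gap = gap_rank(tokens[i][0])         # gap to the operand's left
            right_gap = gap_rank(tokens[i + 1][0])    # gap to its right
            binds_left = PRECEDENCE[left_op] >= PRECEDENCE[right_op]
            bound, other = (left_gap, right_gap) if binds_left else (right_gap, left_gap)
            if bound > other:
                warnings.append("operand %r visually groups the wrong way in %r"
                                % (tokens[i][1], expr))
        return warnings

    print(check_layout("1+2 * 3"))    # warns: '2' binds to '*' but hugs '+'
    print(check_layout("1 + 2*3"))    # fine
    print(check_layout("1 + 2 * 3"))  # fine: equal spacing on both sides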

Smalltalk

Of course, 1 + 2 * 3 has meant (1 + 2) * 3 for years if you're a Smalltalk programmer.

I remember reading about some language (L, I think--maybe it was here on LtU) that used the amount of whitespace to indicate binding strength. In other words,

1+2 * 3

was different from

1 + 2*3

and so on. I've also seen languages (does Lisp do this?) that parse the following two differently:

-3

and

- 3

such that the first is the literal "negative three"; the second is a minus sign operator followed by the literal "three".

I like the latter use of whitespace more than the former--it's quite useful in a highly-parenthesized language like Lisp. Of course, that's my $.04

Re: warning for bad math

I'm a hot-under-the-collar stick-in-the-mud about this topic. My $0.02:

I just cannot fathom how anybody could want to come up with an arbitrary system for mathematical notation when there already exists a perfectly well-defined and understood notation.

Even if the person writing the code knows how the whitespace works and uses it, what about when I go to read the code because I'm trying to track down an error? I have to then go look up what cockamamy syntax the system is using, when instead I could just have gone with stuff I've known and used since I was a kid!

Please, for the love of Something, don't anybody ever do this! I know that things like Smalltalk have already screwed the proverbial pooch in this regard, let us learn from that rather than continue down such a pernicious road.

/rant

[Edit: basically, I mean 'precedence' instead of 'notation', see below.]

The Existing System

You must have gone to quite a different school from me. I didn't use AND and OR in algebraic expressions, not even ∧ and ∨.

I guess that's why I prefer a system with as few rules as possible (Smalltalk is good; Scheme is great), because then I can just look and know, rather than having to go to a precedence table to find out how an expression parses.

Yes, that seems to be the idea

Yes, that seems to be the idea. This feature is not for arithmetic but for all the other millions of infix operators. It might be noted that languages like APL and J, which use a lot of infix syntax, also abandon precedence rules and simply parse from the right, i.e. 2 * 3 + 4 is 2 * (3 + 4).
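For what it's worth, a few lines of Python are enough to mimic that right-to-left, no-precedence reading (a toy illustration of the rule, not how APL or J are actually implemented):

    def eval_right_to_left(tokens):
        # tokens alternate number, operator, number, ...; fold from the right,
        # ignoring precedence entirely (illustration only).
        value = int(tokens[-1])
        for i in range(len(tokens) - 2, 0, -2):
            op, lhs = tokens[i], int(tokens[i - 1])
            value = lhs * value if op == '*' else lhs + value
        return value

    print(eval_right_to_left("2 * 3 + 4".split()))  # 2 * (3 + 4) = 14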

A Radical Thought

I wonder if any languages have ever tried parsing from the left to be more English-like, so 4 + 3 * 2 would be interpreted as 4 plus 3, times 2.

Then explicit parens could be used to force a lookahead by spawning and reducing new local evaluation contexts as in 2 * 3 + (7 div (4 + 9)).
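A left-to-right reading with parentheses opening nested contexts is just as easy to sketch; the snippet below is my own toy illustration of the idea (names like evaluate and operand are invented for the example, and only +, -, *, and div are handled):

    import re

    # Toy strictly left-to-right evaluator (illustration only).
    def tokenize(expr):
        return re.findall(r'\d+|[-+*()]|div', expr)

    def evaluate(tokens, pos=0):
        # Fold operators in the order they are read, with no precedence.
        value, pos = operand(tokens, pos)
        while pos < len(tokens) and tokens[pos] != ')':
            op = tokens[pos]
            rhs, pos = operand(tokens, pos + 1)
            if op == '+':
                value += rhs
            elif op == '-':
                value -= rhs
            elif op == '*':
                value *= rhs
            elif op == 'div':
                value //= rhs
        return value, pos

    def operand(tokens, pos):
        if tokens[pos] == '(':                   # parens spawn a nested context
            value, pos = evaluate(tokens, pos + 1)
            return value, pos + 1                # step past the closing ')'
        return int(tokens[pos]), pos + 1

    print(evaluate(tokenize("4 + 3 * 2"))[0])                # (4 + 3) * 2 = 14
    print(evaluate(tokenize("2 * 3 + (7 div (4 + 9))"))[0])  # 6 + (7 div 13) = 6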


Peter J. Wasilko, Esq.
Executive Director & Chief Technology Officer
The Institute for End User Computing, Inc.

These comments are not official IEUC positions unless otherwise noted.

S/N ratio

Don't you think a 5 line signature for a 4 line post is a bit excessive? Especially considering your name is already attached to every post you write?

[Admin] Sigs

The signature feature has just been removed, so you shouldn't be bothered by this kind of thing in future. I'm leaving this subthread as-is, as a reminder of the bad old days, now behind us. ;)

Yes. Both Smalltalk and Rebol

Yes. Both Smalltalk and Rebol do that.

Which operators?

I agree that on the whole having to look up operator precedence sucks.

But I think that if you force Smalltalk-like evaluation upon the mathematical operators (OR and AND would be in that category when used as discrete math, even if bit shifting isn't), then you are throwing away a perfectly usable system. See, I would pretty much never have to look things up if the operators are * and +. If somebody did have to look those up, did they even graduate from high school?

So can we separate those things which have predefined 'universal' precedence, like math, from 'arbitrary' precedence, like pointer arithmetic vs. dereferencing? I dunno, but that to me is a much more worthy goal than this claim to make things more like English.

(And if the whole idea is to be more like English, then you have to fix what OR means - I think most folks use it in the XOR sense!)

But mathematical notation ISN'T well-defined.

More precisely, mathematical notation is frequently ad-hoc--subdisciplines (or even individual authors and papers) will have their own definitions; and the various symbols used may have different meaning in different contexts.

Mathematical notation is also very dependent on such things as position, font size (sub/superscripting), font effects (bold, italics, "blackboard"), etc.-- things which are notoriously difficult to specify in a text editor, without resorting to some markup language--Tex/LaTex, XML/MathML, etc.

The notion of a "universal" mathematical notation simply doesn't exist. Given that most uses of mathematical notation are intended to be read by humans, not by computers, and it's assumed that the reader/student has the appropriate context, this isn't a problem for printed mathematical works.

But it's a different story for describing algorithms and constraints to a computer.

One way to deal with this, I suppose, is to have languages/parsers which will accept MathML (and be able to translate it meaningfully into a suitable computer-language grammar). More generally, languages should have a "kernel" grammar which is clean, unambiguous, and well-defined--and mathematical notation, should it be needed, can be viewed as a DSL.

Now, in the case of operator precedence (for well-defined issues like multiplication and addition), accommodating the mathematical world isn't unreasonable--many languages do this already. But for the more generic mathematical universe--ugh.

I Agree!

For sure, mathematical notation is a very rich and complex thing. However, I don't believe that in any mathematical system I've seen would one consider 1+2 * 3 != 1 + 2 * 3 (which is the example from Merd, above). So, yeah, having the precedence in the programming language match the basics from mathematics is the very minimum I'd require. Speaking for myself.

Regarding going beyond ASCII, I think things like Fortress are cool in that they want to make sure ASCII is still allowed. If we are interested in languages that can become widely used, I think we (very unfortunately) have to realize that ASCII is the only thing we can count on - I agree that trying to use XML or something is just fraught with peril, wailing and gnashing of teeth.

Precedence Schmeschidence...

Any code that requires you to know the precedence of the binary operators is not properly written - parentheses are cheap and make this stuff obvious. I don't think code like (1 + 2 * 3) should see the light of day.

The argument...

... is one of tradeoffs between suitability for humans, and suitability for computers. Suitability for humans can be further broken down into ease of reading, and ease of writing.

Mathematical notation, when professionally typeset, is easy to read (compared to the alternatives, and assuming the reader is suitably trained in math). It's a royal pain to write and edit--whether you're using Latex, MathML, Microsoft Word, FrameMaker, troff, or chalk on a blackboard.

Handwritten mathematical notation is easy to write (though some effects like bold and italics can be tricky); but it can be difficult to read--especially if you studied under some of the professors I had. :)

Fully-parenthesized expressions (whether infix, postfix, or prefix) have the nicety of being unambiguous. But really complicated ones are difficult for many people to read. This is one reason many mathematical texts will use different sized parentheses at different (nested) levels of grouping--even though changing the size of a printed ( or ) doesn't change its meaning, it makes things easier for humans to read. And it's a reason why most languages provide more "human intuitive" syntaxes, even if it makes parsing more complicated and introduces ambiguity into the mix. (Many argue that it's a reason why Lisp--one language which doesn't provide infix syntax, reader macros notwithstanding--hasn't been more successful. How true that is, I don't know.)

Like many things, this is a user interface issue. Coming up with the right answer is not a simple problem--and no matter what you do (and how much research backs your decision), someone ain't gonna like it.