Good languages with simple grammar

I'm looking for pointers to languages that are considered expressive and usable, yet are defined by simple lexical grammars and parser grammars.

One popular example is Python, which is almost LL(1) in simplicity. LISP is LL(1) and might be considered another example depending on one's views. C++ is, of course, a non-example since it isn't even definable in a context-free grammar.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Tcl

Can't get much simpler without being LISP.

Expressive? Usable?

Perhaps you could provide more detail on these 2 terms. "I want something nearly LL(1)" is specific enough, but "expressive" and "usable" are vague and highly debateable terms. For instance, if "usable" means the language has a commercially viable set of tools and support libraries, and a healthy growing user community, then your language list is going to be awfully short. This before even discussing grammar properties! You wouldn't need "references" to such a short list, you'd just count on your fingers.

Cheers, Brandon J. Van Every
(cruise (director (of SeaFunc) '(Seattle Functional Programmers)))

You'll have looked at it alre

You'll have looked at it already by now, but Haskell (possibly minus some syntactic sugar and operator overloading) generally isn't too bad. Things like the layout rule can be handled with a preprocess before parsing proper, too.

Maybe Standard ML?

The grammar of SML is remarkably concise and understandable. I'm not sure what class it's in, and there might be a few gotchas that complicate automated parsing, although I don't recall any off the top of my head. Either way, worth a look.

In a totally different part of the language design landscape, you might take a look at Lua. Syntactic simplicity has always been a goal, and they've done remarkably well.

Algol-style

SML has a conservative syntax in the Algol tradition. Unfortunately, it has a few quirks that render the full grammar far from LL(1), or LR(k) even.

The most obvious candidate in the Algol family rather is Pascal, which was especially designed to allow one-pass compilation and simple recursive descent parsing.

On the other hand, I always found Algol-style syntax rather boring and slightly heavy on the keyword side.

Exciting syntax

On the other hand, I always found Algol-style syntax rather boring and slightly heavy on the keyword side.

On the other other hand, maybe boring is what one should want in a syntax? Which syntaxes are exciting? ;)

Re: Exciting syntax

On the other other hand, maybe boring is what one should want in a syntax?
Bah, I rather prefer to have at least some fun when typing programs :).
Which syntaxes are exciting? ;)

To me it's all about aesthetics. YMMV, but I always found Haskell's most pleasing. You can spend your spare time fine-tuning code layout to make it shine even more ;).

\petRant{Of course, the most boring, ugly and unergonomic syntax ever invented was SGML/XML.}

Easy

Anyone who has seen an obfuscated C program can tell you which language has the most exciting syntax. ;> And with C++ allowing you to overload operators to your heart's content (without any gratuitous parens), you can write code that would make anyone's heart race! (Whether that's from excitement or because their blood is boiling is another matter)

Simplicity of LISP?

The Common Lisp reader.lisp in CMUCL is 1763 lines of non-trivial code. (Less than I expected, in fact.)

The Lisp reader as a table-driven recursive program is one of the really cool but overlookable things about Lisp. Everyone know about READ-macros?

Concatenative Languages

Joy, Forth and similar languages (and probably array languages like J) should tend to be the simplest languages to parse.

simplicity for who?

the compiler/interpreter or the programmer?

I guess Perl, and C++, are very hard to parse and compile. But they allow for some very high level contructs to the programmer.

Scheme, Python are very simple for the programmer. And i think not that hard for the compiler/interpreter ( specially Scheme ).

Forth should be very easy on the interpreter, but it's so low level it's a pain to the programmer...

Besides, simplicity is not simply clear, simple syntax. Behind a very convenient simple syntax for the programmer can hide some pretty heavy machinery going on that may be very hard to the compiler/interpreter. Continuations, tail-call optimizations, type inference, lazy evaluation, pattern-matching... all come to mind.

Lua: small, yet powerful

Lua is small and powerful. The syntax is easy to parse with Lex/Yacc; the Lua interpreter employs a hand-written scanner, though. To learn about Lua, try the book 'Programming in Lua' by its main author - it's well written and available online, too.

Smalltalk

Smalltalk has a simple syntax and it's usually considered quite expressive.

GNU Smalltalk syntax

Python

John Skaller (the author of Felix and the MIA Python implementation Vyper) has said that python was not close to LL(1) and difficult to parse.

Maybe LL(1) grammars are poised for a comeback? Doaitse Swiersta has written some cool papers on parser combinators for LL(1) grammars with error-correcting, and I've seen some papers on extensible LL(1) grammars (I'm vaguely thinking of Syntax Macros but don't have a reference handy).

Hrm

John Skaller (the author of Felix and the MIA Python implementation) has said that python was not close to LL(1) and difficult to parse.

Hrm, he argues pretty unconvincingly, I think.

In fact, Python's parser is written using an LL(1) parser generator (with EBNF extentions) which acts upon a mildly preprocessed token stream. Specifically, the preprocessor generates three kinds of whitespace-related tokens: NEWLINE, INDENT and DEDENT. INDENT and DEDENT are the moral equivalents of '{' and '}'--I think you'd have to be pretty crazy to try to handle whitespace-based indentation in the grammar itself!

His second complaint about Python's grammar concerns trailing commas at the ends of tuple definitions, e.g.

# This is a one-element tuple containing x.
(x,)

# This is the same as (x,y), a two-element tuple containing x and y.
(x,y,)

Earlier in the same message he recommends that people use a "coarse" grammar and move more of the handling to the semantic analysis phase. Well, he should consider taking his own advice because that's exactly how Python's parser deals with these trailing commas. Here's the relevant part of Python's grammar:

atom: '(' [testlist_gexp] ')' | ...
testlist_gexp: test ( gen_for | (',' test)* [','] )

The semantic actions can easily check for the presence of the trailing comma to tell the difference between '(x)' (which is identical to 'x') and '(x,)' (a one-element tuple containing x).

Languages with simple grammars

Algol 60, Pascal, Oberon, and Oberon2 are all fairly simple. IIRC, the Oberon2 report is about 18 pages. Most (all?) of Wirth's compilers use single pass recursive descent.

Rebol

There is a simplicity sweet spot to be found between the needs of the compiler/interpreter and the needs of the programmer.

Rebol seems to do an excellent job of hitting that sweet spot.

And not only that, it doesn't just build in RegExp processing, it actually provides a built-in BNF parser.

It's an excellent, easy-to-learn, easy-to-use language and I can't understand why it isn't massively more popular.