Lambda the Ultimate

Exegesis 5 is out
started 8/23/2002; 6:58:48 AM - last post 8/24/2002; 3:43:17 AM

Keith Devens - Exegesis 5 is out

8/23/2002; 6:58:48 AM (reads: 930, responses: 4)

and it's awesome. I can't get over how fantastic it is.

I'm not familiar with languages such as Icon or SNOBOL which are supposed to specialize in text processing, so I was hoping some LtU readers could make some comparisons.

Noel Welsh - Re: Exegesis 5 is out

8/23/2002; 7:22:29 AM (reads: 973, responses: 0)

It looks like good old BNF parsing. No real innovation there. Can you explain how this differs from BNF, if it does?

Keith Devens - Re: Exegesis 5 is out

8/23/2002; 7:56:07 AM (reads: 960, responses: 0)

Well, yeah, basically that's what it allows you to do. But while it looks like BNF, it's an actual parser - I wasn't aware other languages did that. It's a very nice improvement over what Perl had before.

While parsing, it'll capture a named rule into a variable of that name. So, just from specifying all the rules, you parse your data into a tree with each piece of your data named. It'll also allow you to run code anywhere within a rule, so you can modify your resulting tree *while* you capture it. You can bind any part of a rule to a variable name, so you can renumber $1, $2, $3, among other things. You can actually build up a grammar and *inherit* from it to build specializations on the grammar. It allows you to have much more control over your backtracking, it has some nice convenience features, and it cleans up what Perl had before.

Maybe someone coming from a language like Icon thinks all this is no big deal. I don't know what else is out there. Maybe there isn't much that's innovative, and maybe there is. Probably what's most important is it brings these things to the mainstream. I've never seen anything like this (the closest I've seen is REBOL's parse dialect), so that's why I'm hoping someone can give a comparison to other things I haven't seen :)

Noel Welsh - Re: Exegesis 5 is out

8/23/2002; 8:21:34 AM (reads: 963, responses: 0)

Ok, let me ramble a bit.

Plenty of languages have parser generators. A nice thing about a perser generator is that it does the work at compile-time and it can do things like minimising the number of states in your regular expressions (once you have a real parser available the need for Perl's regular expression extensions is minimised). Flex and Bison (or Lex and Yacc if you live in prehistorical times) are the canonical C implementation. Oh boy are they ugly! ANTLR, MLLex, MLYacc, etc. are implementations for other languages.

The bad thing about parser generators is that they run at compile-time. This limits their flexibility and they usually introduce a hideaous little language with poor abstraction facilities. All the rage in the functional programming community at the moment are parser combinators. Libraries of combinators (which are functions with a fancy name) that you glue together to make parsers at runtime. This gives you the flexibility to create your representation at runtime and you can use the host languague's abstraction facilities. Apparently performance is a bit poor.

Enlightened languages with powerful macro facilities allow your parsers to be specified using the macro facility so you get speed of compile time parser generators with the abstraction facilities of your host language. With Turing-complete macros you can blur the distinction between compile- and run-time.

Here's a link to a parser combinator library:

http://www.cs.uu.nl/groups/ST/Software/UU_Parsing/index.html

I think it's been discussed on LtU before.

Ehud Lamm - Re: Exegesis 5 is out

8/24/2002; 3:43:17 AM (reads: 960, responses: 0)

Even though both Icon and SNOBOL are Griswold languages, they handle pattern matching differently. Thr SNOBOL approach is closer to the Perl approach, since you explicitly create patterns (and pattern variables) and then perform a pattern matching operation between the pattern object and a given string. You can see the similar flavor in this quote, from the SNOBOL4 book: If P1 and P2 are strings or pattern structures, the statement P3 = P1 | P2 builds a new structure and assigns it as the value P3. P3 mathces any string matched by P1 or P2.

In some sense the Icon approach is more organic. The scanning is integrated with the regular control flow in the language (remember that Icon has goal directed evaluation):

Icon's string scanning facility is based on the observation that many operations on strings can be cast in terms of a succession of operations on one string at a time. By making this string, called the subject, the focus of attention of this succession of operations, it need not be mentioned in each operation.

(The Icon Programming Language, Griswold & Griswold, chp. 3)

More concretely (quoting the same chapter):

The form of a string-scanning expression is expr1 ? exp2 where expr1 provides the subject to be scanned and expr2 does the scanning. The outcome of the scanning expression is the outcome of expr2....
Scanning starts at the beginning of the subject, so that
text ? { while move(1) do write(move(1)) }
writes the even-numbered characters of text on seperate lines.