
Differentiating Parsers

A fascinating article by Oleg Kiselyov on delimited continuations:

We demonstrate the conversion of a regular parser to an incremental one in byte-code OCaml. The converted, incremental parser lets us parse from a stream that is only partially known. The parser may report what it can, asking for more input. When more input is supplied, the parsing resumes. The converted parser is not only incremental but also undoable and restartable. If, after ingesting a chunk of input, the parser reports a problem, we can "go back" and supply a different piece of input.

The conversion procedure is automatic and largely independent of the parser implementation. The parser should either be written without visible side effects, or else we should have access to its global mutable state. The parser may otherwise be written with no incremental parsing in mind, and may be available to us only in compiled form. The conversion procedure relies on inversion of control, accomplished with the help of delimited continuations provided by the delimcc library.
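For a feel of how that inversion of control works, here is a minimal OCaml sketch along these lines, assuming the delimcc library's new_prompt/push_prompt/shift. The status type, request_input, count_bytes, and the driver are illustrative names of my own, not Kiselyov's actual code:

    open Delimcc

    (* Status of an incremental parse: either a final value, or a
       request for more input carrying the captured continuation. *)
    type status =
      | Done of int
      | More of (string -> status)

    let p : status prompt = new_prompt ()

    (* The parser's only window on the outside world.  When invoked,
       it suspends the entire parse and returns control, as a More
       value, to whoever pushed the prompt. *)
    let request_input () : string =
      shift p (fun k -> More k)

    (* A toy "regular" parser written with no incrementality in mind:
       it consumes chunks until it sees an empty one, then reports
       the total byte count. *)
    let rec count_bytes acc =
      match request_input () with
      | "" -> acc
      | s  -> count_bytes (acc + String.length s)

    (* Start the parse inside a delimited context. *)
    let start () : status =
      push_prompt p (fun () -> Done (count_bytes 0))

    (* Drive it: parsing pauses at each request and resumes when fed. *)
    let () =
      let step st chunk =
        match st with
        | More k -> k chunk
        | Done _ -> failwith "parser already finished" in
      match step (step (start ()) "hello") "" with
      | Done n -> Printf.printf "parsed %d bytes\n" n
      | More _ -> assert false

Each More value holds the suspended parse; feeding it a chunk resumes exactly where request_input left off. And since delimcc's captured continuations may be reinstated more than once, re-applying an earlier More with a different chunk is what gives the undoable, "go back" behavior the abstract mentions.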

Is Small Still Beautiful?

(LtU-appropriate punchline at the end.) I suppose I'm just old enough to have been raised on the "small is beautiful" philosophy, and I still hold in awe some languages built from a relatively spare set of primitive concepts: Forth, Smalltalk, Scheme, C, and the Unix shell + utilities + pipeline all come readily to mind.

But recently, I've had some time on my hands and spent it "swimming" about in the programming language space. A few observations.

Some of our modern languages (some already a decade or more old) have type systems that require a PhD to understand fully. Languages like C and Java have a low-level threading model that almost requires a PhD to use effectively in any sufficiently complex system. (Not to make a fetish of the PhD.)

In Pike's PowerPoint on systems research, recently posted in another thread, he mentions, IIRC, that 80% of the Inferno (?) effort was spent conforming to existing, externally imposed standards! Turning to the most mainstream, popular, *production* languages - we have TRULY giant libraries that boggle the mind.

Designing an effective GUI library for the modern, newly more complex UI was a grand challenge of the late '80s and early '90s.

But today, we have Java/JVM, Perl, Python, C#/CLR (different, but still pregnant with oodles of MS APIs), MS binary APIs, a growing body of de facto Linux binary APIs, and even fairly rapidly growing, complex Scheme libraries (PLT) that actually require a separate, complex application to locate, manage, and update - and that's prior to presumably learning and eventually using the libraries in our applications. We're not in Kansas anymore.

The documentation effort alone for any of these language specific "package databases" is daunting.

And all the while, the famous "principle of least surprise" grows stronger and stronger with each new generation of computer user and each new generation of computer programmer. It recalls the joking "10 principles" of successful programming language design, the first of which, IIRC, was "use C-style curly braces" :-)

I guess that Pike's PowerPoint on systems research had a big impact on me, and I found it easy to apply to today's situation in programming language design and development. Will new languages be increasingly relegated to "toy" status until more and more design efforts and research just wither away?

And at least my aging collection of language texts still emphasizes a notion of a programming language made up of a small set of data and control concepts easily combined - typically, the smaller the better. It reminds me of my formal logic training, where the search for a minimal set of primitives to provide the foundations of logic, set theory, and mathematics formed some sort of holy grail (sounds a little like Scheme?).

So I ask - do we need an about-face? Do we need to study (and teach) how to build *large* programming languages: languages with type-checked, integrated SQL syntax; built-in rich XML support; myriad native persistence, serialization, and network communication facilities; a diverse family of concurrency mechanisms; and language-level transaction support, including distributed transaction facilities (MQSeries style?) to better support cluster computing?

As for library infrastructure and a (poor) degree of platform-based language interoperability, we have the JVM we know and love today frankly by historical accident. We have the CLR because MS has to produce a "one-up" version of whatever else is popular in the computing world. I won't restate the many, many gripes about each made by folk targeting, or potentially targeting, these platforms for their new, innovative languages (while acknowledging that surely each also has its many interesting implementation virtues). But I will invite us to recall the gripes :-)

It's arguable that we need an academic/industry consortium effort to redesign the JVM (presuming we don't start from scratch), with a new, concerted focus on language support:

- advanced calling-convention support, such as generalized last-call optimization (or, just for a brain teaser, think of efficiently supporting Common Lisp calling conventions, even CLOS multimethods, combined with higher-order functions);
- integration of compiler analysis with runtime call/loop support for optimizing GC or thread-switch safe points;
- optimized execution models to support logic programming and expert-system-style languages;
- type systems divorced, via some well-defined barrier, from the (more limited) capabilities of the runtime/JIT, leaving room for new, innovative (unanticipated) future type systems at the language level;
- safe and efficient intermixing of manifestly and latently typed code and data;
- rule- or specification-based per-language calling conventions to facilitate "auto-glue" supporting automatic cross-language library interoperability;
- support for compiler and linker customization to accommodate a variety of module systems of varying complexity, and potentially the same for macro facilities;
- and yada yada yada.

Just a brief scenario based on examples, but I hope you get the idea. I'm sure many of us could go on and on, based on current personal research or commercial interests, likely isolating even more fundamental and/or timely issues that beg attention in order to support language innovation in this apparent new era of "Big is Beautiful."

Like it or not, are we in the era of "Big is Beautiful" language design, and if so, what are we to do about it?

Put another way, given the "issues" described above, the raw CPU itself gets in our way least of all! So what's the problem? The problem is the scale of the libraries one must support in a modern language. The problem is increasing the productivity of smallish research teams by sharing a low-level three-address code, a set of SSA optimizations, a code generator, and other relatively neutral infrastructure. The problem is composing language features - *larger* features, on a *larger* scale - than the minimalist principles laid down in the days of yore ever contemplated.

In summary, it *appears* that the glue holds some promise, and clearly some languages benefit from it more than others. So can the "glue" truly become the *solution* for future language research, design, and implementation?

Scott