Parsing: The Solved Problem That Isn't

In the blog post Parsing: The Solved Problem That Isn't Laurence Tratt discusses some interesting unsolved practical problems with parsing especially in combining grammars

The general consensus, therefore, is that parsing is a solved problem. If you've got a parsing problem for synthetic languages, one of the existing tools should do the job. [...]

One of the things that's become increasingly obvious to me over the past few years is that the general consensus breaks down for one vital emerging trend: language composition. "Composition" is one of those long, complicated, but often vague terms that crops up a lot in theoretical work. Fortunately, for our purposes it means something simple: grammar composition, which is where we add one grammar to another and have the combined grammar parse text in the new language (exactly the sort of thing we want to do with Domain Specific Languages (DSLs)). To use a classic example, imagine that we wish to extend a Java-like language with SQL [...]

He goes on to mention several example problems:

  • Two LL or LR grammars may combine to produce a grammar that is neither.
  • Two unambiguous grammars may combine to produce an ambiguous grammar.
  • Two PEG grammars may combine to produce something that doesn't do what you want due to left bias.

What's the current state of the art?

Language mystery: identify the source language to a worm based on its object code

Here's a fun challenge for LtU. The team at Securelist is analyzing a worm called Duqu and found a few interesting things. One of them is that they can't figure out the source language for the core framework.

After having performed countless hours of analysis, we are 100% confident that the Duqu Framework was not programmed with Visual C++. It is possible that its authors used an in-house framework to generate intermediary C code, or they used another completely different programming language.

We would like to make an appeal to the programming community and ask anyone who recognizes the framework, toolkit or the programming language that can generate similar code constructions, to contact us or drop us a comment in this blogpost. We are confident that with your help we can solve this deep mystery in the Duqu story.

I'm not clear on how much knowing the source language helps with the security analysis, but what else were you doing with your time? All the details and clues in the object file can be found on their blog.

Informed dissent: William Cook contra Bob Harper on OOP

Ongoing discussion that you can follow on William Cook's blog.

I am not going to take sides (or keep points). I know everyone here has an opinion on the issue, and many of the arguments were discussed here over the years. I still think LtU-ers will want to follow this.

Given the nature of the topic, I remind everyone to review our policies before posting here on the issue.

Announcing Lang.NEXT - A Free Event for PL Designers and Implementers Hosted By Microsoft

The event is held at the Microsoft Campus on Apr 2-4 with talks, panels and discussions from 9-5 every day. Attendance is free, and includes lunch. Details here.

When Formal Systems Kill: Computer Ethics and Formal Methods

While ethics aren't normal LtU fare, it's sometimes interesting to see how our technical discussions fit into a larger picture.

In When Formal Systems Kill: Computer Ethics and Formal Methods February, 2012, Darren Abramson and Lee Pike make the case that the ubiquity of computing in safety critical systems and systems that can create real economic harm means that formal methods should not just be technical and economic discussions but ethical ones as well.

Computers are different from all other artifacts in that they are automatic formal systems. Since computers are automatic formal systems,techniques called formal methods can be used to help ensure their safety. First, we call upon practitioners of computer ethics to deliberate over when the application of formal methods to computing systems is a moral obligation. To support this deliberation, we provide a primer of the subfield of computer science called formal methods for non-specialists. Second, we give a few arguments in favor of bringing discussions of formal methods into the fold of computer ethics.

They also spend a good amount of time giving a lay overview of the practical, economic challenges faced by formal methods.

Julia, a language for technical computing

Julia is a new programming language by Viral Shah, Jeff Bezanson, Stefan Karpinski, and Alan Edelman.

From the blog post Why We Created Julia:

We are greedy: we want more.

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

(Did we mention it should be as fast as C?)

While we’re being demanding, we want something that provides the distributed power of Hadoop — without the kilobytes of boilerplate Java and XML; without being forced to sift through gigabytes of log files on hundreds of machines to find our bugs. We want the power without the layers of impenetrable complexity. We want to write simple scalar loops that compile down to tight machine code using just the registers on a single CPU. We want to write A*B and launch a thousand computations on a thousand machines, calculating a vast matrix product together.

We never want to mention types when we don’t feel like it. But when we need polymorphic functions, we want to use generic programming to write an algorithm just once and apply it to an infinite lattices of types; we want to use multiple dispatch to efficiently pick the best method for all of a function’s arguments, from dozens of method definitions, providing common functionality across drastically different types. Despite all this power, we want the language to be simple and clean.

Looking at the excellent Julia manual, it becomes clear that Julia is a descendant of Common Lisp. While Common Lisp has many detractors (and not entirely without reason), nobody can claim that the family of languages it spawned aren't well designed. On the contrary, languages like NewtonScript, Dylan, [Cecil and Diesel,] Goo, PLOT, and now Julia all have a hard to grasp quality without a name that makes them an improvement over many of their successors.

A Concept Design for C++

In the video A Concept Design for C++ and the related paper Design of Concept Libraries for C++ Bjarne Stroustrup and Andrew Sutton describe how they're going avoid the problems that lead to concepts getting voted out of C++11. In a nutshell they seem to be focusing on the simplest thing that could possibly work for STL (C++'s Standard Template Library).

C++ does not provide facilities for directly expressing what a function template requires of its set of parameters. This is a problem that manifests itself as poor error messages, obscure bugs, lack of proper overloading, poor specification of interfaces, and maintenance problems.

Many have tried to remedy this (in many languages) by adding sets of requirements, commonly known as "concepts." Many of these efforts, notably the C++0x concept design, have run into trouble by focusing on the design of language features.

This talk presents the results of an effort to first focus on the design of concepts and their use; Only secondarily, we look at the design of language features to support the resulting concepts. We describe the problem, our approach to a solution, give examples of concepts for the STL algorithms and containers, and finally show an initial design of language features. We also show how we use a library implementation to test our design.

So far, this effort has involved more than a dozen people, including the father of the STL, Alex Stepanov, but we still consider it research in progress rather than a final design. This design has far fewer concepts than the C++0x design and far simpler language support. The design is mathematically well founded and contains extensive semantic specifications (axioms).

Why Concatenative Programming Matters

Jon Purdy riffs on Hughe's famous "Why Functional Programming Matters" with a blog post on Why Concatenative Programming Matters.

Cambridge Course on "Usability of Programming Languages"

From the syllabus of the Cambridge course on Usability of Programming Languages

Compiler construction is one of the basic skills of all computer scientists, and thousands of new programming, scripting and customisation languages are created every year. Yet very few of these succeed in the market, or are well regarded by their users. This course addresses the research questions underlying the success of new programmable tools. A programming language is essentially a means of communicating between humans and computers. Traditional computer science research has studied the machine end of the communications link at great length, but there is a shortage of knowledge and research methods for understanding the human end of the link. This course provides practical research skills necessary to make advances in this essential field. The skills acquired will also be valuable for students intending to pursue research in advanced HCI, or designing evaluation studies as a part of their MPhil research project.

Is this kind of HCI based research going to lead to better languages? Or more regurgitations of languages people are already comfortable with?

Dennis Ritchie passed away

I have just learned that Dennis Ritchie (1941-2011) has passed away. His contributions changed the computing world. As everyone here knows, dmr developed C, and with Brian Kernighan co-authored K&R, a book that served many of us in school and in our professional lives and remains a classic text in the field, if only for its style and elegance. He was also one of the central figures behind UNIX. Major programming languages, notably C++ and Java, are descendants of Ritchie's work; many other programming languages in use today show traces of his influences.


Bjarne Stroustrup puts the C revolution in perspective: They said it couldn’t be done, and he did it.

XML feed