Lambda the Ultimate

inactiveTopic Lazy Functional Parser Combinators in Java
started 7/23/2002; 3:10:32 AM - last post 7/28/2002; 6:05:08 AM
jon fernquest - Lazy Functional Parser Combinators in Java  blueArrow
7/23/2002; 3:10:32 AM (reads: 2982, responses: 11)
Lazy Functional Parser Combinators in Java
"This paper shows how to use parser combinators in a functional language as well as Java, and shows how parser combinators can be implemented in Java. Implementing parser combinators is accomplished by combining two libraries. The first one, writtenin Haskell, defines error-correcting and analysing parser combinators.The second one consists of a small Java library implementing lazy functional behavior. The combinator library is straightforwardly coded inJava, using lazy behavior where necessary.

Using Java makes this functional approach to recursive descent parsing more accessible to programmers and projects working in mainstream languages. Note Prof. Swierstra's site with additional papers and combinator parsing web pages. See also Parser Combinators in C .

Posted to functional by jon fernquest on 7/23/02; 3:34:22 AM

jon fernquest - Re: Lazy Functional Parser Combinators in Java  blueArrow
7/23/2002; 3:29:59 AM (reads: 1299, responses: 1)
Note that a large percentage of the work in functional programming (like my last two functional programming posts) comes out of Utrecht University and the Netherlands. My favorite quote on "Utrecht University" by a well-known computational linguist:

"The reason I came over to Utrecht, if you must know the truth, is because I could not understand what was going on in current Categorial Grammar. I had this stack of papers...... and I just couldn't get through them on my own. Coming here I realized the benefit of immersing yourself in a particular culture. I volunteered to teach this material, amongst a group of people many of whom knew it better than me. I think I've come to the point where I'm up to the state-of-the-art level of knowledge." Bob Carpenter Ta! Interview

Does anyone know the history and reasons behind this consistency of approach at Utrecht (and the Netherlands in general) to linguistics (natural languages) and programming languages? (involving algebra and logic that is)

There is a similarity in their approach to And also note the applicability of a lot of this algebraic-functional approach to parsing to computational linguistics and natural language. Anyone

Martin Bravenboer - Re: Lazy Functional Parser Combinators in Java  blueArrow
7/23/2002; 4:01:53 AM (reads: 1267, responses: 1)

Note that the implementation of parser combinators (and any code with a functional smell) in Java will become much more attractive with the addition of parameterized types/generics to the Java language.

The conclusion of this paper mentions the problem of the weak type-system of Java, which forces the parser combinators to work just on java.lang.Objects. The parameterized types allows you to parameterize a function (which is represented as a class) with its argument and result. This makes functional like code in Java quite attractive (although still disgusting verbose of course).

However, I assume there is not a lot of performance gain in Java, because generics are not available in Java Bytecode and information on the parameters is not available at runtime. In C# and .NET this will be much better. I'm looking forward to a parser combinator library that exploits generics in Java or C#.

Unfortunately I'm not aware of any historical reasons why there is so much work in functional programming going on at the University of Utrecht. I'm a student over there, and indeed there is a high interest in functional programming in Utrecht. Our one and only course on parsing and grammars is even based on parser combinators in Haskell ;-) .

Ehud Lamm - Re: Lazy Functional Parser Combinators in Java  blueArrow
7/23/2002; 5:08:52 AM (reads: 1318, responses: 0)
I assume there is not a lot of performance gain in Java, because generics are not available in Java Bytecode and information on the parameters is not available at runtime

I am not sure I follow. Are you talking time or space efficiency?

Compiling by erasure can make generic code be as efficient as type specific code. Or do you have something else in mind.

Ehud Lamm - Re: Lazy Functional Parser Combinators in Java  blueArrow
7/23/2002; 5:29:42 AM (reads: 1327, responses: 0)
Does anyone know the history and reasons behind this consistency of approach at Utrecht

I was wondering about that myself. I would guess it is mostly beacuse of the impact of a few scholars, but I'd love to hear some inside information...

Martin Bravenboer - Re: Lazy Functional Parser Combinators in Java  blueArrow
7/23/2002; 5:37:39 AM (reads: 1277, responses: 0)
Time efficiency: the compiled Java Generics code will result in the same code as the original code from this article. This will therefore not remove the casts in the implementation of the original paper. Those casts are ugly, but they might also be one of the reasons the performance isn't very great?

If the casts have a serious impact on the performance of the parser combinators, I think the performance of this approach could be better in a future .NET version, with generics in the IL and .NET CLR. Maybe a JVM could remove unnecassary casts.

Jo Totland - Re: Lazy Functional Parser Combinators in Java  blueArrow
7/24/2002; 12:12:09 PM (reads: 1177, responses: 0)
Also check out parser combinators in C++, Spirit.

Ehud Lamm - Re: Lazy Functional Parser Combinators in Java  blueArrow
7/26/2002; 7:05:58 AM (reads: 1152, responses: 0)
Martin, re time efficiency, I haven't really studied the code in the paper, but in general you can compile generic code into type specific code - so no casts are needed at runtime. This is even better than having a generic IL.

For example, in Ada I can have a Stack package which is parameterized over all item types. When I instantiate the package for integers, I get a package that is identical to an integer specific version of the stack package. This works no matter what the machine code is.

Obviously, if the casts are required because the client code uses Object to interface with your code, you are out of luck.

Martin Bravenboer - Re: Lazy Functional Parser Combinators in Java  blueArrow
7/26/2002; 10:49:56 AM (reads: 1152, responses: 2)
The Java Generics (JSR 14) do not compile a type specific parameterization to a separate class, which can only be used for this parameter. Just decompile some code compiled by the Java Generics compiler: you will find classic Java code with casts and usage of the universal superclass Object. Therefore Java Generics for now just offer compile-time safety.

In general I must agree with you of course :) .

Ehud Lamm - Re: Lazy Functional Parser Combinators in Java  blueArrow
7/27/2002; 7:11:03 AM (reads: 1195, responses: 0)
Right. I'll have to check that sometime.

Quel dommage.

Ehud Lamm - Re: Lazy Functional Parser Combinators in Java  blueArrow
7/28/2002; 3:46:35 AM (reads: 1196, responses: 0)
The Java Generics (JSR 14) do not compile a type specific parameterization to a separate class, which can only be used for this parameter.

I was wondering what were the motivations for choosing this approach. Does anyone know?

Martin Bravenboer - Re: Lazy Functional Parser Combinators in Java  blueArrow
7/28/2002; 6:05:08 AM (reads: 1225, responses: 0)

Unfortunately I cannot figure out exactly why they've chosen the current solution. To me there seems to be some conflicts between te jsr 14 goals and the current public draft that is available for this jsr 14. The goals seem more ambitious to me. Most decisions are based on the huge amount of legacy code for the Java Platform or problems related to modification of the JVM. Therefore I think it was a huge mistake not to include .NET generics in the first release of .NET.

public draft of jsr 14: The proposed extension is designed to be fully backwards compatible with the current language, making the transition from non-generic to generic programming very easy. In particular, one can retrofit existing library classes with generic interfaces without changing their code.

In essence this means that you can use existing libraries as if they where parameterized. Although I understand the goals, I think this sucks. A class is parameterized or a class is not parameterized. The result of this goal is illustrated by the early access compiler that is available: you have to compile against 'stubs' for the Java Collections. These classes do not contain an implementation. At runtime the current, not parameterized, classes can be used. Of course this is great at the moment because you can use generics without requiring a new JVM, but is it worth it?

Some goals of the JSR:

G4) Simplicity. Keep it simple for users, but not necessarily for implementors: It's okay to place a larger burden upon VM and compiler implementors (within reason) if that will make generics more natural and easy to use.

Well, that sounds good to me. Now hope that they will bring this into practice.

G7) First class generics. Types involving parameters should be first-class types. This goal consists of several subsidiary goals: a) Instantiated parameterized types (e.g., List) should be first-class types.<br/> b) Type parameters (e.g., T) should be first-class types. (A consequence of this is that List is a first class type).<br/> By "first-class" we mean that these new sorts of type expressions can be used in exactly the same ways as existing type expressions. In particular, it should be possible to cast a value expression to one of these sorts of types, and to test whether an object is an instance of such a type.

I doubt whether this goal is completely satisfied by the current proposal. If the types are really first class, I would expect to be able to create a new instance of this type or pass the class to some method, which is not possible.

public draft: To facilitate interfacing with non-generic legacy code, it is also possible to use as a type the erasure of a parameterized class without its parameters. Such a type is called a raw type. Variables of a raw type can be assigned from values of any of the type's parametric instances. For instance, it is possible to assign a Vector<String> to a Vector. The reverse assignment from Vector to Vector<String> is unsafe from the standpoint of the generic semantics (since the vector might have had a different element type), but is still permitted in order to enable interfacing with legacy code. In this case, a compiler will issue a warning message that the assignment is deprecated.

Maybe great from the practical point of view: interaction with legacy code is possible, but imho this really sucks from a formal point of view.

NextGen is somewhat better. There are rumours that Java generics will be extended to NextGen in a future more complete revision of the Java Platform. I doubt whether this two fase introduction of generics is a good idea.

NextGen paper: NextGen is a more ambitious extension of Java, based on the same source language as GJ, developed by Cartwright and Steele that overcomes the limitations of GJ by introducing a separate Java class for each distinct instantiation of a generic type; all generic type information is preserved by the compiler and is available at run-time. Hence, type dependent operations are fully supported by NextGen. On the other hand, NextGen retains essentially the same level of compatibility with legacy code as GJ. For these reasons, we believe that NextGen could serve as the basis for an important step forward in the evolution of the Java programming language.

A few more links that might be interesting: Java Generics report.

There is also a presentation on generics for .NET that compares the approach chosen by .NET with the various proposals for Java. You can find some links at my overview of .NET resources. There is also a link to a paper.

(sorry for the long message ;) )