DSL Error Handling in an Object Oriented context

I have been a lurker on LTU and have already learned a ton. I am currently developing an in-house DSL, and I have come across a problem for which I haven't been able to find resources.

I am currently implementing the DSL in an object-oriented language (OK, it's Java, don't boo and hiss). I am not entirely sure how I should be handling errors that arise in the DSL (syntax errors, semantic errors). Is a log file enough? Should I have a DSLError class of some sort?

Sorry if this is a naive question. DSL construction is a fascinating (yet relatively new) topic for me.




Language environment?

How's the language hosted? How's it used? Is it a compiler, an interpreter, something integrated into a GUI, or something else entirely?

For a DSL compiler, you're usually fine with the normal compiler error mechanisms. Record position/line numbers when you lex it into tokens, then if you get an error (either syntax or semantic), spit out an error message with a line number and abort. If you want to be more user friendly, you can try to recover and spit out the rest of the error messages so they don't have to rerun each time.
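Recording positions during lexing, as described above, might look like the following sketch. The `Token` class and its fields are illustrative, not from any particular library:

```java
// Hypothetical sketch: a token that carries its source position, so later
// phases (parsing, typechecking) can report errors against the original text.
public class Token {
    public enum Kind { IDENT, NUMBER, LPAREN, RPAREN, EOF }

    public final Kind kind;
    public final String text;
    public final int line;    // 1-based line in the source file
    public final int column;  // 1-based column of the first character

    public Token(Kind kind, String text, int line, int column) {
        this.kind = kind;
        this.text = text;
        this.line = line;
        this.column = column;
    }

    @Override
    public String toString() {
        return kind + "(" + text + ") at " + line + ":" + column;
    }
}
```

Because every token knows where it came from, an error message with a line number falls out for free at any later stage.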

For an interpreter, you have many more options. Do you want to stop execution, let the user try to fix the problem, and pick up again, à la edit-and-continue or the Lisp debugger? Do you want to implement something like the Common Lisp condition system, where the environment offers a set of programmer-defined restarts when an error occurs, some of which may include fixing the code and running it again? Do you want to silently ignore errors and run what you can, like JavaScript often does? Or do you want to go the compiler route, which is still an option for interpreters?

And DSLs-in-GUIs give you a bunch more options like continuous compilation and direct manipulation. You could do things like Eclipse, where compilation errors get flagged in the margin while you type. The interface issues are conceptually simple (just translate the line number from the compiler error into a GUI annotation), but making the parser fast enough to run while you type can be a challenge, and you'd probably need some form of incremental compilation.

As for the actual nitty-gritty, yeah, you're probably best off making language errors into Java exceptions. Here's a great opportunity to take advantage of the fact that Exceptions are full-fledged Java objects, and can have methods and fields and class hierarchies. You can put things like the position/line number into the exception, as well as things like the expected type for typechecking errors, and then provide methods that perform computations on them.
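A minimal sketch of that idea follows; the class names (`DslException`, `SyntaxError`, `TypeError`) and the `report` method are my own invention, not from any framework:

```java
// Hypothetical sketch: language errors as a small exception hierarchy,
// carrying position information and error-specific data as fields.
public class DslException extends RuntimeException {
    public final int line;
    public final int column;

    public DslException(String message, int line, int column) {
        super(message);
        this.line = line;
        this.column = column;
    }

    /** Formats the error the way a batch compiler would print it. */
    public String report(String fileName) {
        return fileName + ":" + line + ":" + column + ": error: " + getMessage();
    }
}

class SyntaxError extends DslException {
    SyntaxError(String message, int line, int column) {
        super(message, line, column);
    }
}

class TypeError extends DslException {
    public final String expected; // the type the checker expected
    public final String actual;   // the type it actually found

    TypeError(String expected, String actual, int line, int column) {
        super("expected " + expected + " but found " + actual, line, column);
        this.expected = expected;
        this.actual = actual;
    }
}
```

The top-level driver can then catch `DslException`, print `report(...)`, and abort, while an IDE front end could instead read the `line`/`column` fields to place an annotation.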

John, The DSL is in a very


The DSL is in a very formative stage. As I am writing it now, it is being developed in Java (I am checking out OCaml and may switch over when I am proficient enough). At the moment, it simply accepts a file as input. Further down the road I would like to develop a simple IDE which would flag errors as you type, à la Eclipse or NetBeans.

Syntactic/semantic errors will cause the system to stop; it will not fail silently or try to run what it can while ignoring errors.

I was thinking of either using Exceptions or creating my own error class.

Thanks for the response.


EMF as DSL platform

Further down the road I would like to develop a simple IDE which would flag errors as you type, à la Eclipse or NetBeans.
Why "à la"? Why not just capitalize on the strong side of the Java platform: libraries and tools?

My current approach to DSLs (in Java) is to use EMF to define the AST (an ecore model in EMF parlance), rely on the default serialization (XMI), generate Java classes that represent the AST, and optionally generate a simple tree-based editor for that AST. In 20 minutes you are ready to code your interpreter (without writing a line of Java until that moment - sounds like marketing, eh? :-) ).

Granted, this approach is somewhat limiting, so I would not recommend it if you are planning to create the next Haskell :-)

If you later decide you want a classical textual syntax as the main representation, you are mostly on your own - the parser and editor are your responsibility (both rich-text and graphical editors are easily doable in Eclipse, but the current libraries are not well integrated with EMF). I found that, given plenty of keyboard shortcuts and the ability to edit in place, trees are not as bad as one might expect. Also, help is on the way: several OSS projects aim to augment the EMF toolchain in the editor area (GMF, Merlin, VE, to name a few).

This is not to discourage you from exploring OCaml or other platforms for PL development, but if you happen to use Java, why not use the tools?

Parser generators

There are pretty decent parser generators for Java, so you're not totally on your own with a classical textual syntax. See JavaCup and JLex.

Integrating these with Eclipse may be a little tricky. They're probably a bit too slow for real-time compilation, and IIRC Eclipse uses a rule-based scanner that's based on a completely different model from LALR parsing.



Thanks for the info. I have been using NetBeans, but I will check out the Eclipse modules you talked about.


I will check out JavaCup. I have checked out JavaCC already. In the meantime I have implemented a simple S-expression parser so I can explore some of the other concepts.
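For anyone curious, a minimal recursive-descent S-expression reader along those lines can fit in one small class. This is my own sketch, not the poster's code: atoms come back as `String`s and lists as `List<Object>`, with no error recovery beyond throwing on malformed input:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a minimal S-expression reader.
public class SExprParser {
    private final String src;
    private int pos = 0;

    public SExprParser(String src) { this.src = src; }

    /** Parses one S-expression: an atom (String) or a list (List<Object>). */
    public Object parse() {
        skipSpace();
        if (pos >= src.length()) throw new RuntimeException("unexpected end of input");
        return src.charAt(pos) == '(' ? parseList() : parseAtom();
    }

    private List<Object> parseList() {
        pos++; // consume '('
        List<Object> items = new ArrayList<>();
        skipSpace();
        while (pos < src.length() && src.charAt(pos) != ')') {
            items.add(parse());
            skipSpace();
        }
        if (pos >= src.length()) throw new RuntimeException("missing ')'");
        pos++; // consume ')'
        return items;
    }

    private String parseAtom() {
        int start = pos;
        while (pos < src.length() && !Character.isWhitespace(src.charAt(pos))
                && src.charAt(pos) != '(' && src.charAt(pos) != ')') {
            pos++;
        }
        return src.substring(start, pos);
    }

    private void skipSpace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }
}
```

For example, `new SExprParser("(add 1 (mul 2 3))").parse()` yields a three-element list whose last element is itself a list.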

Thanks again for the information.



Check out SableCC (http://www.sablecc.org) as well. It's the best parser generator I've used (apart from Parsec, perhaps).

How abstract is that AST?

In my experience, using parser generators often results in unnatural ASTs, like collections of subnodes represented as cons lists (which is unnatural if you are generating Java). I have heard good reviews of JJForester, but could not evaluate it myself (it's not in Java, right? :-( ).

[on edit: so I really mean that, when using parser generators, one starts with textual syntax, and tries to "abstractize" it somehow. What I do prefer, is starting from AST, and then choosing the serialization syntax]

It depends

With Bison, you can generate whatever you want in response to each syntax rule. With ANTLR, the default is to construct generic AST objects, but you can override that.

And with any AST structure, you can use a factory pattern to generate more specialized objects to represent the nodes. That's what I did the first time I used ANTLR (when I didn't realize I could override the default).
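A factory along those lines might look like this sketch; the interfaces and node classes (`NodeFactory`, `BinOpNode`, etc.) are illustrative names, not from ANTLR:

```java
// Hypothetical sketch: a node factory that lets a generic parser build
// specialized AST classes. The parser calls the factory instead of `new`,
// so the node types can be swapped without touching the grammar actions.
interface AstNode {}

class NumberNode implements AstNode {
    final double value;
    NumberNode(double value) { this.value = value; }
}

class BinOpNode implements AstNode {
    final String op;
    final AstNode left, right;
    BinOpNode(String op, AstNode left, AstNode right) {
        this.op = op; this.left = left; this.right = right;
    }
}

interface NodeFactory {
    AstNode number(double value);
    AstNode binOp(String op, AstNode left, AstNode right);
}

// Default implementation; subclass it to build richer nodes
// (e.g. nodes carrying source positions) without changing the parser.
class DefaultNodeFactory implements NodeFactory {
    public AstNode number(double value) { return new NumberNode(value); }
    public AstNode binOp(String op, AstNode left, AstNode right) {
        return new BinOpNode(op, left, right);
    }
}
```

The parser holds a `NodeFactory` reference and calls `factory.binOp(...)` in its rule actions, so specialization happens in one place.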

As abstract as you want to make it

SableCC allows you to define a translation from the concrete syntax tree to an abstract syntax tree in the grammar. The AST uses idiomatic Java (java.util collections, etc.)