Xtext: An IDE on the cheap

The release of Helios (Eclipse 3.6) included version 1.0 of Xtext, a language development framework.

With Xtext you can easily create your own programming languages and domain-specific languages (DSLs). The framework supports the development of language infrastructures, including compilers and interpreters, as well as full-blown Eclipse-based IDE integration.

Given a grammar, Xtext derives a parser and an IDE with syntax highlighting, code completion, code folding, an outline view, real-time error reporting, and quick fixes, among other standard IDE features. The resulting models can then be used as EMF resources (e.g., consumed by an interpreter) or, with a little more work, to generate code as well.
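
For instance, here is a minimal sketch of the "EMF resource" usage outside Eclipse (the MyDsl* names stand for the classes Xtext generates for a hypothetical language called MyDsl; exact APIs may differ between Xtext versions):

    import com.google.inject.Injector;
    import org.eclipse.emf.common.util.URI;
    import org.eclipse.emf.ecore.EObject;
    import org.eclipse.emf.ecore.resource.Resource;
    import org.eclipse.xtext.resource.XtextResourceSet;

    public class Main {
        public static void main(String[] args) {
            // The StandaloneSetup class is generated by Xtext for your language.
            Injector injector = new MyDslStandaloneSetup().createInjectorAndDoEMFRegistration();
            XtextResourceSet resourceSet = injector.getInstance(XtextResourceSet.class);
            // Parsing a file yields an ordinary EMF resource...
            Resource resource = resourceSet.getResource(URI.createURI("model/example.mydsl"), true);
            // ...whose root object is the model an interpreter would walk.
            EObject root = resource.getContents().get(0);
            System.out.println(root);
        }
    }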

Check out the video clips on their website or the webinar for a more detailed look.

yeah, i wanna know

how it compares to Intentional Software, and whatever MSFT pushes, etc.

Intentional SW

The approaches you mention are generally supposed to expose multiple surface representations from a single "deep" model. It's quite clear that Xtext implements a lot of what intentional programming is expected to do, including quasi-structural editing of the textual representation ("refactoring" and "code complete") and syncing changes among text, exported AST and graphical representations.

On the other hand, the focus in Xtext is very much on writing a single grammar and then auto-generating its structural model, with no mention of alternate surface representations. There is some interesting support for tree views and graphical modeling but it looks mostly ad hoc.

Xtext is quite ad-hoc

For example, take this Eclipse TMF forums post by somebody just checking Xtext out: XText vs. traditional parser generators and template engines. Their solution for resource identification is not well thought out. Then again, nobody takes the idea of resources seriously. This is an aspect of these tools that will only become obvious 10 or 20 years from now.

Re: Xtext is quite ad-hoc

Besides the fact that the newsgroup post is a year old, the complaints were based on misunderstandings. Xtext 1.0 comes with easy-to-use support for namespaces, including nested packages as well as nested namespace imports. This is completely file-agnostic and exactly how most JVM-based and CLR-based languages do it. That said, it's just a default; you can also use file references or whatever you like better.

I do not understand your feedback

My best guess is we are talking past each other. In the post I linked, Harald clearly communicated (to me at least) that he wanted support for automatically deriving namespace support based on the grammar itself.

Xtext does NOT do this; to achieve it, there are explicit hooks that the user has to fill in - you cover this in your replies to Harald in the post linked above. I am saying that these hooks are ugly. The fact that you and a fellow project member recommended two different ways to solve the problem just shows how ad hoc it is.

It's great that an approach like Xtext exists. Don't get me wrong. I just think it isn't the future. But as you might quip, you aren't building something for my grandkids to use (edit: I don't even have kids yet).

Xtext does namespaces

Sorry if it wasn't clear. The post you refer to is old (version 0.7). At that early stage there was no built-in support for namespaces, but it was possible to do it yourself.
Now (Xtext 1.0.0) it is there by default.
See http://vimeo.com/8235577 for a demo and http://www.eclipse.org/Xtext/documentation/1_0_0/xtext.html#scoping for an explanation.

Still talking past each other

1) The video didn't show me anything I expected... it didn't walk through the process of scoping, or describe how that even works.

2) The documentation you link to doesn't actually explain anything... sorry...

What I am looking for is how, based on some structural invariants, you can provide URI features for free... so I don't have to implement them.

Don't think so

The demo shows linking based on qualified names. It is not cross-resource, but it works as you would expect.
The documentation about scoping covers a lot of aspects (local vs. global scopes, etc.) AND it introduces the concept of an IQualifiedNameProvider, which is just what was asked for in that old referenced post. The default implementation already computes fully qualified names based on containment relationships, but it can of course easily be adapted for individual concepts.
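
To make that concrete, here is a minimal sketch of such an adaptation, written from memory against the Xtext 1.0 API (the Element type and the "lib." prefix are invented for illustration; real generated model types are EMF EObjects):

    import org.eclipse.xtext.naming.DefaultDeclarativeQualifiedNameProvider;

    // Stand-in for an interface the Xtext/EMF code generator would emit
    // for a concept in an imagined grammar.
    interface Element { String getName(); }

    public class MyDslNameProvider extends DefaultDeclarativeQualifiedNameProvider {
        // Declarative dispatch: the framework picks a qualifiedName(...)
        // overload by runtime type; everything else falls back to the
        // default, which dot-joins 'name' attributes along containment.
        String qualifiedName(Element e) {
            return "lib." + e.getName();
        }
    }

The provider would then be wired up in the language's runtime module through Xtext's Guice naming convention, e.g. a method like bindIQualifiedNameProvider() returning MyDslNameProvider.class.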

MSFT isn't pushing Oslo any more

Developers could not get their arms around it. As I stated to one MSFT Architect, who in turn passed my feedback to the designers of Oslo:

Looking at the talks for PDC... Oslo sounds like it is asking me to learn a language just to use a proprietary tool? The language had better have a very small surface API or it will be an utter disaster. XAML is easier to justify because on the Web I would need to know JavaScript, CSS and HTML just to have beautiful interactive web pages. Then to port the application to the desktop and extend it with more features, I would need more languages and more APIs. So, the 30 second elevator pitch for XAML is easy. What's the 30 second pitch for Oslo's language? What process is [Connected Systems Division] automating? What script is being optimized? Pretend I'm a 40 year old developer with a spouse and two kids.

My goal when building systems is to eliminate labor, not manage it.

Oslo was massively downsized and folded into SQL Server. Microsoft is experimenting with where to go with it next. Chances are the biggest users of it will initially be internal to MSFT, since Steve Ballmer is relentless in consolidating product lines. I'm sure he likes the idea of consolidating Microsoft Dynamics and BizTalk and SharePoint and all that somewhat overlapping stuff. SharePoint is already a powerhouse in the enterprise; IIRC it is designed by the same guy who designed the garbage collection model for .NET (Brian Harry).

As I understand it, from a second-hand source (a friend who visited VPRI once), VPRI tried to get MSFT to listen and not use a GLR engine for Oslo. MSFT chose GLR. Flashbacks to the 1980s, when Alan Kay tried to convince Redmond of the power of spreadsheets. I think the reason VPRI dislikes Oslo is that it uses GLR parsing instead of SGLR or PEG (both use "lexerless" or "scannerless" parsing), but GLR is also nondeterministic, so an SGLR tool like Spoofax doesn't change that attribute either. Some academics, like Eric Van Wyk (the creator of Copper), claim nondeterministic parsers raise the barrier to entry for people who aren't experts in defining grammars. I personally think Van Wyk misses the point, and that modeling tools based on textual DSLs should target people who read usenet://comp.compilers, gcc mailing lists, llvm mailing lists, eclipse tmf mailing lists, etc. Each generation of programmers thinks they can take a grocery clerk and turn them into not only a programmer but a compiler writer. Just read Oslo creator Douglas Purdy's bio: "[My] vision is to broaden the franchise of people building applications, allowing non-professional developers and end-users to harness the full power of computing."

I'm not really convinced Xtext is the best we can do. Last I checked, it is an amalgamation of Eclipse TMF, oAW (openArchitectureWare) and ANTLR. Yes, it defines conventions so that you can build Eclipse plug-ins that let users do a bunch of great stuff. But that stuff isn't automatic. You have to work for it. I guess if you are a famous researcher and can afford a post-doc to provide Xtext bindings, then great! Otherwise, I don't see this being widely adopted. (It is managing my labor, not eliminating it.)

Paul Klint, Eelco Visser and Ralf Lämmel are the ones who are really trying to push automation of these things for the first time since Reps' Ph.D. thesis and his work with Teitelbaum. As far back as I know, Lämmel has always been interested in what he calls "grammarware", which is a very Chomsky-like view of computer systems design, philosophically rationalist in that the grammar defines all - this also includes software like "grammar stealing"/"grammar recovery" tools ("The 500 Language Problem").

Some people go so far as to want "structured editors" that only allow syntax-directed editing. Personally, from an HCI perspective, I think structured editing is only truly valuable for recursive edits like the Rename refactoring. Once you understand that, what you really want to do is build a language to automate these recursive edits - a refactoring-specification DSL. Then you can automate notices to client library users that the API is going to go out of date.
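
To make "recursive edit" concrete, here is a toy sketch (every name here is hypothetical, not any particular tool's API): a rename is a traversal that rewrites exactly the identifier nodes that resolve to the declaration being renamed, and a refactoring-specification DSL would let you declare such traversals instead of hand-coding them.

    // Hedged sketch: rename-as-recursive-edit over a hypothetical AST.
    // 'resolvesTo' stands in for whatever binding/scoping answer the
    // language infrastructure provides.
    interface Declaration {}

    interface AstNode {
        boolean isIdentifier();
        boolean resolvesTo(Declaration d);
        void setText(String s);
        java.util.List<AstNode> children();
    }

    class Rename {
        static void rename(AstNode node, Declaration target, String newName) {
            if (node.isIdentifier() && node.resolvesTo(target)) {
                node.setText(newName);           // the local edit
            }
            for (AstNode child : node.children()) {
                rename(child, target, newName);  // the recursion that makes it safe
            }
        }
    }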

Grammars as the ultimate kernel languages...

Now the question is: what do we learn when we balance our attention between parsing and language transformation? We can derive not only parsers from grammars but also unparsers, expression generators and a few other tools (how many essentially different ones?). Keeping them all working constrains our choice of formalism more than parsing alone could. When we focus on parsing we can say that a parse_tree -> AST transformation is justified and encode it in the form of "semantic actions", but then we somehow need to hide concrete syntax information in the AST for exact unparsing in refactorings. It gets more complicated with more tools derived from our initial grammar. I don't even see the tradeoffs yet, at least not when it comes to CFGs and finite state machines.
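
As a toy illustration of "hiding concrete syntax information in the AST" (all names hypothetical): each node can carry the layout and comments surrounding it, so that concatenating the pieces back together reproduces the original file byte for byte.

    // Hedged sketch: AST nodes carrying concrete-syntax "trivia" so that
    // tools derived from the grammar can unparse a file exactly,
    // comments and layout included.
    final class Node {
        final String kind;                      // e.g. "BinaryExpr"
        final java.util.List<Node> children;    // empty for leaves
        final String text;                      // token text for leaves, "" otherwise
        final String leadingTrivia;             // whitespace/comments before the node
        final String trailingTrivia;            // whitespace/comments after the node

        Node(String kind, java.util.List<Node> children, String text,
             String leadingTrivia, String trailingTrivia) {
            this.kind = kind; this.children = children; this.text = text;
            this.leadingTrivia = leadingTrivia; this.trailingTrivia = trailingTrivia;
        }

        // Exact unparsing: trivia + token text, recursively.
        String unparse() {
            StringBuilder sb = new StringBuilder(leadingTrivia).append(text);
            for (Node child : children) sb.append(child.unparse());
            return sb.append(trailingTrivia).toString();
        }
    }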

Traditionally this led to insulated development, not only at the level of individual languages but also of the tooling for those languages. Xtext seems just a baby step ahead, but I hope it gets enough publicity to sharpen awareness of the problem domain.

Cpp?

What really turns me off about these types of endeavors (there have been a couple of similar projects for the NetBeans IDE; Project Schliemann[1] was one) is this: unless they use a 'standard' parser generator, the 'simplified' parser generator is rarely powerful enough to specify non-trivial grammars; e.g. the one in Schliemann lacked support for different lexer states.

Granted, pretty much no parser generator is 'powerful' (or should that be 'psychotic'?) enough to implement a compliant C++ parser, at least not without mighty hacks (or implementing your own, e.g. Elsa/Elkhound, or simplifying the grammar, etc.). But if I can't implement a C++ parser in Xtext, then what good is it? I know that's extreme, but it's really not very general, is it?

But besides all that: just getting even gifted/advanced/senior software developers to see the need for and desirability of DSLs is a non-trivial task. When you have an OOP hammer, everything's a nail, which is what gave us JEE in the first place.

[1] http://wiki.netbeans.org/Schliemann

Hammer time

When you have an OOP hammer, everything's a nail, which is what gave us JEE in the first place.

Shouldn't that go something like "When you have an AbstractHammerFactory everything implements the IHammerable interface"? Or perhaps just "...everything subclasses Nail"?

Touché, well played sir.

Touché, well played sir.

Xtext uses Antlr ...

which is a very nice parser generator. It performs well, has extremely good generic error recovery, and with its LL(*) algorithm you can parse a lot of languages. The lexer also comes from Antlr by default, but it can be exchanged to solve harder problems (e.g. whitespace-sensitive, Python-like languages).
That said, Xtext does not provide semantic predicates, which are indeed needed to do C++.

Just thought I could bring

Just thought I could bring bits of an update on this (that I'm only noticing, shame on me, as I'm rather interested in Oslo) and comment a bit (YMMV, as usual):

MSFT isn't pushing Oslo any more

needs to be tempered, at the very least: Don Box gave an update about it.

(Maybe there is even more recent info somewhere else that I'm not aware of yet; thanks in advance for any other pointers, then.)

[...] VPRI tried to get MSFT to listen and not use a GLR engine for Oslo. MSFT chose GLR. Flashbacks to the 1980s, when Alan Kay tried to convince Redmond of the power of spreadsheets. I think the reason VPRI dislikes Oslo is that it uses GLR parsing instead of SGLR or PEG (both use "lexerless" or "scannerless" parsing)

I'm a proponent of the idea that the qualities (and the unknown as much as the known issues) of PEG parsing, in the two complementary contexts of language processing and tool building, are definitely worth investigating further, as objectively as possible, as compared to the older LL-/*LR-based parsing algorithms that have been well studied for decades.

So I'm not going to spit in the soup; but having played quite a bit with several CTPs of the GLR-based Mg specifically, and with the rest of the M tool chain in general, I found the overall usability pretty good -- from a language-syntax-design/parsing-implementation point of view.

That said, I reckon I haven't thought much about the theoretical nondeterministic vs. deterministic aspects of Mg (M grammar), or about the status of their GLR implementation against Mg's "competitors" (*), but it worked well enough for me, and seemingly for a number of others, too.

(*) (Out of time/energy, I confess I had to surrender at some point, my Reflector buddy notwithstanding, in front of the amount of decompiled code I had to try to understand, if only regarding how the GLR-based parser exactly works in relation to some other extensibility features I was interested in.)

Also, even before Don Box's post, my feeling/guess had been for quite a while that the choice issues they'd likely face wouldn't be so much about the features of Mg/M as meta-languages (to serve as a breeding ground for all those DSLs at Microsoft's customers that Mr. Purdy is envisioning with enthusiasm ;) as about figuring out how all this new language + tools + API stack will fit, without confusion or contradiction, with the rest of ... well, say, the "pre-Oslo stack".

The latter, of course, being pretty rich already, to say the least... (for better or worse, depending on your own view about it, as usual).

[...]Each generation of programmers thinks they can take a grocery clerk and turn them into not only a programmer but a compiler writer. Just read Oslo creator Douglas Purdy's bio: "[My] vision is to broaden the franchise of people building applications, allowing non-professional developers and end-users to harness the full power of computing."

Thanks for quoting Purdy: I hadn't noticed this before. Well, I agree this is a very (i.e., likely too) strongly put statement, which can quite easily be read as a call for "flames", or at least for an equally strong counter-argument (especially from the extreme opposite point of view, which isn't mine either, but anyway...).

Just my .02

Disappointed

I looked at the screencasts that demo Xtext in Eclipse and they looked neat.

Then I watched their "webinar" (that is a truly ugly word, BTW) and was disappointed. Perhaps it was just a bad presentation, but it involved lots of irrelevant Java which, as far as I could tell, had nothing to do with the problem itself. And they spent half an hour on parsing with a BNF grammar and citing statistics about the number of project commits and early-adopter corporations, which left them about 10 minutes each to cover the three more interesting parts.

I once looked into writing an Eclipse plug-in and was similarly put off by the tons and tons of rigamarole involved. It is like the old Java EE, but worse.

At least they did not use the hated words "business logic".

"Business Logic"???

Not sure what Xtext has to do with 'business logic' but comparing Xtext development with the old Java EE leaves me speechless...

Plug-in framework, not Xtext

I was comparing the Eclipse plug-in framework to J2EE, not Xtext.

Xtext has nothing to do with business logic. Neither do 95% of the contexts in which the words "business logic" are used. So I'm glad you didn't use it. OTOH, I don't know why you mentioned Java Beans in the context of abstract syntax.

Anyway, I should apologize for my dismissive tone. While I didn't find that presentation very useful, I know it wasn't aimed at me, and I still want to take a closer look at Xtext sometime.

Ott - at the user interface level

Xtext autogenerates programming interfaces from language descriptions, while Ott (which has been mentioned before on LtU) produces proof-assistant manipulation functions.

Thanks for this

I think I clicked on that link 3 years ago but didn't know what to make of it then.

An alternative but related approach would be José Meseguer's Rewriting Logic Semantics (RLS). So far it is mainly just an interesting prototyping tool within Maude, given that the applications of it that I'm aware of are generally too slow to compete with commercial functional programming languages.

Ott has an edge in that its designers had the foresight to consider compact notations (LaTeX math expressions), but I don't believe this is a long-term competitive advantage related to methodology.

Just tossing that out there, wondering if you had an opinion.

How Does it Compare to MPS?

JetBrains' MPS sounds similar?

MPS and intentional workbench are similar

Xtext relies on text. MPS and Intentional's tool are projectional editing workbenches: they do look like text (in screenshots) but are essentially form editors on an in-memory AST. It's a very different approach. The main difference from a user's perspective is that with Xtext you don't need an IDE at all to edit the text and run compilers or interpreters. In addition, you can keep the sources in ordinary version control systems. Also, because the editor is a text editor, it feels like one. What you get from Xtext is really close to a traditional language infrastructure, though we think we rely on very nice building blocks (Antlr, Guice and EMF) and provide a great architecture (I hope 'architecture' is not one of those 'Java, J2EE, business logic' terms :-P).

text vs. form editing

They do look like text (in screenshots) but are essentially form editors on an in-memory AST. It's a very different approach.

Can Xtext support multiple textual representations (syntaxes) on the same AST (for instance, s-exprs and pretty-printed M-expressions for a LISP language)? Can the existing EMF/GMF integration be used to provide "form editing", in addition to the graphical language which was shown in the webinars?

The idea of structural editors ('form editors') is of course very old and they have consistently failed the usability test, but providing both free-form and structural 'views' on the same code is likely to be effective.
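
To illustrate what "multiple textual representations on the same AST" would mean in the Lisp example, here is a toy sketch (entirely hypothetical; nothing here is Xtext API):

    import java.util.List;

    // One AST...
    abstract class Expr {}
    class Sym extends Expr { final String name; Sym(String n) { name = n; } }
    class App extends Expr {
        final Expr head; final List<Expr> args;
        App(Expr h, List<Expr> a) { head = h; args = a; }
    }

    class Surfaces {
        // ...rendered as s-expressions: (f x y)
        static String toSExpr(Expr e) {
            if (e instanceof Sym) return ((Sym) e).name;
            App a = (App) e;
            StringBuilder sb = new StringBuilder("(").append(toSExpr(a.head));
            for (Expr arg : a.args) sb.append(' ').append(toSExpr(arg));
            return sb.append(')').toString();
        }

        // ...or as M-expressions: f[x; y]
        static String toMExpr(Expr e) {
            if (e instanceof Sym) return ((Sym) e).name;
            App a = (App) e;
            StringBuilder sb = new StringBuilder(toMExpr(a.head)).append('[');
            for (int i = 0; i < a.args.size(); i++) {
                if (i > 0) sb.append("; ");
                sb.append(toMExpr(a.args.get(i)));
            }
            return sb.append(']').toString();
        }
    }

The hard part, of course, is not printing but round-tripping: parsing either syntax back into the same AST and keeping edits in sync.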

Yes, it can

You can use graphical or form-based (or whatever) editing on top of an Xtext resource. There's a short screencast on the web site demoing it. You could also, in theory, have a different textual syntax, but nobody has done that yet. The OCL project at Eclipse uses Xtext in a projectional manner; that is, they have XMI storage under the hood and use the unparser to populate the text editor.

Interesting

I would like to use it, but how well does it do on large files? I.e., will Xtext still perform on 20kLoC?

Performance / Scalability

You don't want to have 20kLoC in one file, do you?
Xtext performs pretty well and is used in large projects. E.g. BMW uses Xtext in the automotive industry, where they have a lot of files in one project (see BMW's screencast).

That said, it highly depends on what you as a language designer do. When people face performance problems with Xtext, it most of the time boils down to expensive custom implementations of linking, scoping and validation. The framework itself is quite optimized, as we do regular profiling sessions. On the other hand, we generally do not trade good, clean code for small performance improvements...

Autogenerated code

20kLoC might seem excessive to write by hand, but sometimes source is generated by a tool (e.g. a parser generator), and it is very, very, very annoying to run into scaling problems like this.

Unfortunately, it's necessary to assume that users of your language/tool will abuse anything you give them--and yes, it's your problem.

DrScheme/DrRacket's not bad in this respect.

If you happen to have implemented your parser in PLT Racket as I did, then you can get nice coloring (and wonderful little identifier/binding arrows!) on the cheap.

I'm just happy to see some more work along these lines. For DSL implementers it's hard to do much more than an emacs mode.

I looked at the MPS link above, but it's a little more invasive than I'm interested in right now. I really just want the IDE support, not help with the language design/implementation. Anything else out there?

There are techniques...if

There are techniques... if you already have a compiler that creates syntax trees, you can adapt those trees for use in the IDE. If your language is complicated enough that compilation is relatively expensive, you can find a way to make your compiler incremental at some granularity of trees. This is what I did for Scala (though I don't think it survives today). Tools like Xtext are not as useful when the compiler already exists and the syntax/semantics of the language are fairly expensive to re-encode in another tool (if possible at all).
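
A minimal sketch of the coarsest version of that idea (not the actual Scala IDE code; every name here is made up): cache the result of compiling each top-level unit and re-run the existing batch compiler only on units whose source text changed.

    import java.util.HashMap;
    import java.util.Map;

    class IncrementalFrontEnd {
        // Result of the last compilation of a unit, keyed by the source
        // text it was produced from.
        static final class Compiled {
            final String source;
            final Object tree;   // whatever the existing compiler produces
            Compiled(String source, Object tree) { this.source = source; this.tree = tree; }
        }

        private final Map<String, Compiled> cache = new HashMap<String, Compiled>();

        Object compile(String unitName, String source) {
            Compiled hit = cache.get(unitName);
            if (hit != null && hit.source.equals(source)) {
                return hit.tree;                      // unchanged since last run: reuse
            }
            Object fresh = fullCompile(unitName, source);
            cache.put(unitName, new Compiled(source, fresh));
            return fresh;
        }

        // Stub for the pre-existing, expensive batch compiler.
        private Object fullCompile(String unitName, String source) {
            return unitName + ": <tree>";
        }
    }

Finer granularities (per top-level definition, with dependency tracking for re-typechecking) follow the same shape, just with more bookkeeping.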

Xtext ...

seems to be what you are looking for. Isn't it? Why not?

Anything else out

Anything else out there?

Oslo's toolchain (not deeply integrated with Visual Studio yet, and no integration with other IDEs)

Spoofax,

MetaEdit+,

For NetBeans,
- somebody mentioned Project Schliemann, which was pretty much dead from its conception... the people behind it didn't really think through how complex the task would be. Although it was a very cool project name! NetBeans has since moved on to GSF, and GSF is now forked into the CSL project, which is essentially GSF but with the Parsing and Indexing API.

Oh, yes, sorry svenefftinge

Xtext seems very cool, and I hope to give it a try. I just forgot to write that bit ;-). I was only asking if there were others I didn't know about for thoroughness's sake.