Embedded Languages in Java

ok, this may be one of those stupid questions that makes no sense, but can anyone give me any references, ideas, pointers, examples, etc. for embedded languages in java? i don't mean compiling non-java to class files, but taking the idea of embedded or domain-specific languages and somehow making them work in java. for c++, for example, you might think of the recursive descent parser that uses template expansion (spirit?). but java doesn't have templates. so perhaps the question is more about whether there's anything useful to carry across from using dsls into "plain old" oo design. comments? have i missed something really obvious? thanks.
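Java has no templates, but since Java 8 lambdas get you part of the way to the combinator style that Spirit (and Haskell's parser libraries) use: a parser is just a function, and small higher-order functions glue parsers together into a grammar written as ordinary Java expressions. A minimal sketch, assuming Java 8+; `Parser`, `Result`, `satisfy`, `many` and `then` are hypothetical names, not any library's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Predicate;

public class Combinators {
    /** The value a parser produced, plus the input position just after it. */
    static final class Result<T> {
        final T value;
        final int pos;
        Result(T value, int pos) { this.value = value; this.pos = pos; }
    }

    /** A parser is just a function from (input, start position) to an optional result. */
    interface Parser<T> {
        Optional<Result<T>> parse(String in, int pos);

        /** Sequence: run this parser, then next, and combine the two values. */
        default <U, R> Parser<R> then(Parser<U> next, BiFunction<T, U, R> combine) {
            return (in, pos) -> {
                Optional<Result<T>> a = parse(in, pos);
                if (!a.isPresent()) return Optional.empty();
                Optional<Result<U>> b = next.parse(in, a.get().pos);
                if (!b.isPresent()) return Optional.empty();
                return Optional.of(new Result<R>(combine.apply(a.get().value, b.get().value), b.get().pos));
            };
        }
    }

    /** Match one character that satisfies the predicate. */
    static Parser<Character> satisfy(Predicate<Character> test) {
        return (in, pos) -> {
            if (pos < in.length() && test.test(in.charAt(pos))) {
                return Optional.of(new Result<Character>(in.charAt(pos), pos + 1));
            }
            return Optional.empty();
        };
    }

    /** Zero or more repetitions (p must consume input whenever it succeeds). */
    static <T> Parser<List<T>> many(Parser<T> p) {
        return (in, pos) -> {
            List<T> items = new ArrayList<>();
            int cur = pos;
            Optional<Result<T>> r = p.parse(in, cur);
            while (r.isPresent()) {
                items.add(r.get().value);
                cur = r.get().pos;
                r = p.parse(in, cur);
            }
            return Optional.of(new Result<List<T>>(items, cur));
        };
    }

    // Grammar for "sum ::= number ('+' number)*", written as plain Java expressions.
    static final Parser<Character> DIGIT = satisfy(c -> Character.isDigit(c));

    static final Parser<Integer> NUMBER = DIGIT.then(many(DIGIT), (first, rest) -> {
        StringBuilder sb = new StringBuilder().append(first);
        for (char c : rest) sb.append(c);
        return Integer.parseInt(sb.toString());
    });

    static final Parser<Integer> PLUS_NUMBER = satisfy(c -> c == '+').then(NUMBER, (plus, n) -> n);

    static final Parser<Integer> SUM = NUMBER.then(many(PLUS_NUMBER), (first, rest) -> {
        int total = first;
        for (int n : rest) total += n;
        return total;
    });

    public static void main(String[] args) {
        String input = "12+30+0";
        Optional<Result<Integer>> r = SUM.parse(input, 0);
        // Succeed only if the whole input was consumed.
        if (r.isPresent() && r.get().pos == input.length()) {
            System.out.println(r.get().value); // prints 42
        } else {
            System.out.println("parse error");
        }
    }
}
```

Unlike Spirit, the grammar here is assembled at run time rather than expanded at compile time, but the source still reads roughly like the grammar it implements.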

next version

The next version of Java (1.6?) is supposed to allow easy hooks for languages to be used from within Java (at least that's how I understand JSR 223). I'm not exactly sure what you mean by 'making them work in java' and how that might be different from 'compiling non-java to class files.' There is a very good list available here. Just recently JetBrains released an experimental tool which allows 'meta programming.' Some say it is actually nothing more than fancy, re-invented LISP.
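JSR 223 did ship with Java 6 as the standard `javax.script` package. A minimal sketch of hosting a script engine from Java, using only that package; which engines are installed depends on the JDK (Java 6 and 7 bundled Rhino, Java 8 to 14 bundled Nashorn, which was removed in JDK 15), so the code checks for `null` rather than assuming JavaScript is available. The class and method names are illustrative.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class ScriptingDemo {
    /** Evaluate a script with a Java value bound to "x"; returns null if no such engine is installed. */
    static Object evalWithBinding(String engineName, String script, int x) throws ScriptException {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return null; // nothing registered under that name on this JVM
        }
        engine.put("x", x);         // expose a Java value to the script
        return engine.eval(script); // run the script and hand its result back to Java
    }

    public static void main(String[] args) throws ScriptException {
        // List whatever engines happen to be installed (possibly none on recent JDKs).
        for (ScriptEngineFactory f : new ScriptEngineManager().getEngineFactories()) {
            System.out.println(f.getEngineName() + " implements " + f.getLanguageName());
        }
        Object result = evalWithBinding("javascript", "x * 2 + 2", 20);
        System.out.println(result == null ? "no JavaScript engine available" : "result: " + result);
    }
}
```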

sorry, i wasn't clear. i l

sorry, i wasn't clear.

i like the way that the syntax of many languages is flexible enough to allow them to be extended so that you can build a "new language" that is used from within the existing language. but for some reason - perhaps just a lack of imagination - i can't see how that would work well in java. the syntax is just too clunky. everything is name.verb(...). so i was wondering how people worked around that.

for example, in haskell the s

for example, in haskell the syntax is sufficiently flexible to make something that looks "simple" even if it uses higher order functions to do the heavy lifting.

in lisp, you can use macros to add syntax of your own.

in c++, templates (template metaprogramming, if i have the right terminology) and the type system let you write "code" that is constructed into an ast at compile time (if i understand the technique correctly).
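Java has nothing like expression templates, but the same shape can be had at run time instead of compile time: static factory methods build an AST out of plain objects, and the AST is then evaluated (or printed, or optimised) as data. A minimal sketch; `Expr`, `Num`, `Var`, `BinOp`, `num`, `var`, `add` and `mul` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ExprDsl {
    /** An AST node: because it is plain data, it can be evaluated or printed back out. */
    interface Expr {
        double eval(Map<String, Double> env);
        String show();
    }

    static final class Num implements Expr {
        final double value;
        Num(double value) { this.value = value; }
        public double eval(Map<String, Double> env) { return value; }
        public String show() { return Double.toString(value); }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
        public double eval(Map<String, Double> env) {
            Double v = env.get(name);
            if (v == null) throw new IllegalArgumentException("unbound variable: " + name);
            return v;
        }
        public String show() { return name; }
    }

    /** Binary node used for both '+' and '*'. */
    static final class BinOp implements Expr {
        final char op;
        final Expr left, right;
        BinOp(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
        public double eval(Map<String, Double> env) {
            double l = left.eval(env), r = right.eval(env);
            return op == '+' ? l + r : l * r;
        }
        public String show() { return "(" + left.show() + " " + op + " " + right.show() + ")"; }
    }

    // The embedded "syntax": factory methods that build nodes instead of computing values.
    static Expr num(double v) { return new Num(v); }
    static Expr var(String n) { return new Var(n); }
    static Expr add(Expr a, Expr b) { return new BinOp('+', a, b); }
    static Expr mul(Expr a, Expr b) { return new BinOp('*', a, b); }

    public static void main(String[] args) {
        // Reads almost like the formula x * 2 + y; the tree is built at run time.
        Expr e = add(mul(var("x"), num(2)), var("y"));
        Map<String, Double> env = new HashMap<>();
        env.put("x", 20.0);
        env.put("y", 2.0);
        System.out.println(e.show() + " = " + e.eval(env)); // prints ((x * 2.0) + y) = 42.0
    }
}
```

In client code, `import static` on the factory methods removes the class prefix, which is about as close as plain Java gets to new syntax.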

in java, in contrast, i seem to be restricted to using a different language completely (via compiling to byte code), or moving things into xml config files, or relying on "patterns" as a way to describe how the objects fit together (it's been said in the past that lisp doesn't have patterns because you extend the language instead of making stories to aid the memory).

so i'm wondering if i've missed something.

Two other choices

One is to use the language just as you would in the usual embedded DSL approach, only less nicely, e.g.:

obj.expects(once()).method("receive").with(eq(message));

This is an example of usage from the JMock library: obj is a mock object built at runtime to simulate a behaviour and check that it is used in the expected way.
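The trick behind that style is that each call returns an object whose methods are the only legal "next words", so the type checker enforces a small grammar. A minimal sketch of the idea; `MiniMock`, `Mock`, `MethodClause`, `ArgumentClause`, `once`, `eq` and `invoke` are hypothetical stand-ins, not JMock's real API or implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

public class MiniMock {
    /** Hypothetical counterpart of once(): the expected number of calls. */
    static int once() { return 1; }

    /** Hypothetical counterpart of eq(): an argument matcher. */
    static Predicate<Object> eq(Object expected) {
        return actual -> Objects.equals(expected, actual);
    }

    /** One expectation: method name, argument matchers, expected and actual call counts. */
    static final class Expectation {
        final int expectedCalls;
        String method;
        final List<Predicate<Object>> matchers = new ArrayList<>();
        int actualCalls;
        Expectation(int expectedCalls) { this.expectedCalls = expectedCalls; }
    }

    /** After expects(...) the only legal next word is method(...). */
    static final class MethodClause {
        private final Expectation e;
        MethodClause(Expectation e) { this.e = e; }
        ArgumentClause method(String name) { e.method = name; return new ArgumentClause(e); }
    }

    /** After method(...) the only legal next word is with(...). */
    static final class ArgumentClause {
        private final Expectation e;
        ArgumentClause(Expectation e) { this.e = e; }
        @SafeVarargs
        final void with(Predicate<Object>... matchers) { e.matchers.addAll(Arrays.asList(matchers)); }
    }

    static final class Mock {
        private final List<Expectation> expectations = new ArrayList<>();

        MethodClause expects(int times) {
            Expectation e = new Expectation(times);
            expectations.add(e);
            return new MethodClause(e);
        }

        /** Hypothetical stand-in for a call the code under test makes through the mocked interface. */
        void invoke(String method, Object... args) {
            for (Expectation e : expectations) {
                if (method.equals(e.method) && matches(e.matchers, args)) {
                    e.actualCalls++;
                    return;
                }
            }
            throw new AssertionError("unexpected call: " + method + Arrays.toString(args));
        }

        private static boolean matches(List<Predicate<Object>> matchers, Object[] args) {
            if (matchers.size() != args.length) return false;
            for (int i = 0; i < args.length; i++) {
                if (!matchers.get(i).test(args[i])) return false;
            }
            return true;
        }

        /** Fail if any expectation was called the wrong number of times. */
        void verify() {
            for (Expectation e : expectations) {
                if (e.actualCalls != e.expectedCalls) {
                    throw new AssertionError(e.method + ": expected " + e.expectedCalls
                            + " call(s), got " + e.actualCalls);
                }
            }
        }
    }

    public static void main(String[] args) {
        Mock obj = new Mock();
        String message = "hello";
        obj.expects(once()).method("receive").with(eq(message)); // reads like the JMock line
        obj.invoke("receive", "hello");                         // what the code under test would do
        obj.verify();                                           // passes: exactly one matching call
        System.out.println("expectations met");
    }
}
```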

Another way is to compile some mini language to Java source code rather than directly to bytecode.
AFAIR this is what Jythonc and AspectJ do, and what is done in many "extending Java" papers, such as the recent one on implementing the pi-calculus in Java, or older ones about Jam, etc.
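A toy illustration of the source-to-source route: a hypothetical one-line rule language (`rule NAME: FIELD > NUMBER`) is translated into Java source text, which would then be compiled by the ordinary Java compiler together with the rest of the code. This is only a sketch of the general idea under that invented syntax, not how Jythonc or AspectJ work internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleTranslator {
    // Hypothetical mini-language: "rule <Name>: <field> > <number>"
    private static final Pattern RULE =
            Pattern.compile("rule\\s+(\\w+)\\s*:\\s*(\\w+)\\s*>\\s*(\\d+)");

    /** Translate one rule into the source of a Java class with a single static test method. */
    static String translate(String rule) {
        Matcher m = RULE.matcher(rule.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a rule: " + rule);
        }
        String name = m.group(1), field = m.group(2), limit = m.group(3);
        return "public final class " + name + " {\n"
                + "    public static boolean test(java.util.Map<String, Integer> record) {\n"
                + "        Integer v = record.get(\"" + field + "\");\n"
                + "        return v != null && v > " + limit + ";\n"
                + "    }\n"
                + "}\n";
    }

    public static void main(String[] args) {
        // The generated text would be saved as BigOrder.java and compiled with javac as usual.
        System.out.print(translate("rule BigOrder: quantity > 100"));
    }
}
```

Generating source rather than bytecode keeps the output readable and debuggable with normal Java tools, at the cost of an extra build step.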

indeedy. in fact the present

indeedy. in fact the presentation that nat mentioned (and which i have finally got round to reading this sunday afternoon) is all about the evolution of jmock. it's very interesting - i encourage him to make it public and/or encourage others to leave comments on his web site asking him for copies... :o)

Steve and I are writing it up

Steve and I are writing it up as an article that will go in the BCS ACCU Overload magazine and we intend to write a more academic paper, possibly for OOPSLA or a similar conference. Suggestions of conferences or journals suitable for this material are very welcome.

everything is name.verb(...).

everything is name.verb(...). so i was wondering how people worked around that.

Like this ?

new Select().field("a").field("b").field("c").fromTable("abc").runQuery();
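Behind a chain like that, each method simply records its argument and returns `this`, so the calls can be strung together. A minimal sketch, assuming a hypothetical `Select` class that only renders the SQL string; `toSql` stands in for `runQuery`, and no database access is attempted.

```java
import java.util.ArrayList;
import java.util.List;

public class Select {
    private final List<String> fields = new ArrayList<>();
    private String table;

    /** Each "word" of the query records something and returns this, so calls chain. */
    public Select field(String name) {
        fields.add(name);
        return this;
    }

    public Select fromTable(String name) {
        table = name;
        return this;
    }

    /** Terminal operation: render the SQL (a real builder might execute it here instead). */
    public String toSql() {
        if (fields.isEmpty() || table == null) {
            throw new IllegalStateException("need at least one field and a table");
        }
        return "SELECT " + String.join(", ", fields) + " FROM " + table;
    }

    public static void main(String[] args) {
        System.out.println(new Select().field("a").field("b").field("c").fromTable("abc").toSql());
        // prints: SELECT a, b, c FROM abc
    }
}
```

A common refinement is to have `fromTable` return a different type that has no `field` method, so the compiler rejects clauses written in the wrong order.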

now that you point it out, i'

now that you point it out, i'm kicking myself - i was looking at c++ output streams at about the same time as i wrote my post!

on the other hand, it's not as "languagey" as i'd hope, but it's certainly sweet. thanks.

Maybe this can help

Polyglot is a class library that can be used to create a compiler for a language that is a modification of Java. Not exactly the same approach you were asking about, but it can be used for similar purposes.

thanks, but that's too "compl

thanks, but that's too "complicated". this is for work, and anything that can't be done within the language is pretty much ruled out.

i'm fighting enough battles over general design/architecture. i don't have the energy to start fighting over the technology as well, especially when java is already a leap forwards from fortran 66...

wrong language

The biggest strength of Java is that it stops you from doing this sort of thing. If you're debating architecture, I would say that Java is the wrong place to do it. The Java mindset is very much "do it this way", and to be fair, "its way" isn't too bad. It scales with the number of coders quite easily, and coders of almost any level can get tucked in quite quickly. That's why IBM likes it -- you get almost linear increases in productivity with more coders. It may not be good for small teams of

Nicely put...

And exactly the sort of argument I so rarely see on this forum. There's some serious work to be done to determine just why Java scales so well over team sizes and is productive at such a large range of programmer skill levels. Some of it is clearly language, some of it is amazing tools, and some of it is community, but beyond that it gets hard to determine root causes.

Scaling near-linearly with team size is very nice, and decidedly unusual for a programming technology, but I can't help but feel that there's still scope for super-linear scaling. It'll require much tighter team integration and reuse support than currently exists, certainly.

Illustration

I've worked in Java for the past 5 years or so, but I don't understand what is meant by the assertion that it scales better than other languages.

I've seen large Java teams and programmers with a wide range of skill levels using Java (successfully and unsuccessfully). However, these observations could simply be explained by the fact that Java is the mainstream business language of the moment (for reasons that may not be solely due to the quality of the language) so there are a lot of people using it.

Can anyone offer examples or comparisons that illustrate how Java scales better than other languages?

Examples...

Some possibilities as to why Java scales so well with team sizes and skill-ranges (I'm not wedded to any of these):

The standard and open-source-defacto-standard class libraries are enormous, and almost all of them work in every supported environment. No training up on some third-party threading system that only one guy on the team understands, and he misuses. No ditzing around trying to get Joe-Bob's String library to compile on your machine. Documentation for standard libraries is generally of very high quality. For scalability, this means vastly lower amounts of potentially-incompatible rework, in both specification and implementation.

The modularization and encapsulation facilities in Java are adequate to the enterprise task (with caveats), and the defaults are generally well chosen. It takes novice programmers quite a bit of work to truly screw up application areas that they haven't been assigned to. In terms of scalability, this means that novice programmers can be used much more safely than in C++ or VB land. It also means that programmer tasks can be partitioned at many levels of granularity, which is necessary for large teams. Finally, with few exceptions the encapsulation and modularization functionality means lower costs for integrating large systems (libraries don't conflict), which means you can split up teams even more.

Lack of control flow abstractions beyond strict single-dispatch polymorphic method call lowers the training requirements, clarifies module boundaries, simplifies documentation, and generally prevents confusion. Yes, this does come at a cost to expressiveness that many here find too high.

Tooling is excellent, with very fast compilers, extremely clear and easy cross-platform build systems, shockingly powerful editors, and best-of-breed static and dynamic analysis systems, all available at low-or-zero cost. This means low per-developer fixed-costs, lower training costs, and generally easier availability of development talent. (Caveat: The cost and complexity of app-servers does somewhat limit this.)

Lack of syntactic extensibility eases documentation, training, and integration costs enormously, and also helps bring the power of available tools up and their costs down. Yes, this does come at a cost to expressiveness that many here find too high.

libraries

You know what? Libraries written in C are enormous as well. Here in my GNU/Linux system most of the standard interfaces are written in this arcane language. Everything from database interfacing, to GUI building to xml processing is C all the way. Most development tools are targeted for C. It's got years of mindshare, is a small language you can learn in a day and comes complete with lots of tools and libraries for any task imaginable.

Why should i exchange that for the java ones when the C ones are faster and less bloated? Or why should i go through all the trouble of compiling higher-level wrapper bindings for higher-level languages?

C clearly is the superior language, ain't it? Oh, yes, this does come at a cost to expressiveness that many here find too high.

Not quite sure who you're responding to...

...as nothing I wrote was directed toward the question of Java superiority. It was about scalability over development team sizes and developer skill levels. C is a fine language, in which a huge amount of great software has been written. That said, there are real problems with putting novice programmers on C projects, and with integrating C projects written to different libraries and environments. C has many virtues, but no one is going to include "portability", "safety", or "modularity" among them. Those hold back its scalability, and keep development costs in C very high for large projects.

(BTW, if your response was intended to be an ironic sendup of foam-flecked advocacy posts, it was well done, although it could have been a bit longer. The fanboyish 'GNU/Linux' reference was a perfect touch.)

GNU/Linux and C

What's the problem with acknowledging the GNU project?

C has many virtues, but no one is going to include "portability"...

Sorry, but i will. C is far more portable than Java. It runs on all hardware imaginable. Write once, compile everywhere. Java runs on only a few supported platforms and that's it.

If you're using specific resources from a specific platform, well, that's just as much of a portability problem as in Java or any other language...

Those hold back its scalability, and keep development costs in C very high for large projects.

Perhaps. If you don't acknowledge a large project is actually just a lot of smaller subprojects interfacing with each other, that is. Naming conventions can ease the pain of a lack of proper namespaces...

And C and C++ header files are a lot better at conveying interfaces than searching through scattered method definitions inside Java code or depending on huge, bloated IDEs to browse them...

And Makefiles convey the relationships between modules of a project just as nicely...

but, yeah, my original post was sarcastic. i'm not recommending C as an example of a fine modern language... still, it's got some good points...

Portability vs. Availability

Sorry, but i will. C is far more portable than Java.

I was afraid of this :-). I think we must distinguish two notions of portability: (1) can be ported to many platforms, (2) has the same meaning on any platform it runs. I would rather call the former availability - C surely excels in that regard. I.e.,

It runs on all hardware imaginable.

However, if by

Write once, compile everywhere.

you also suggest the latter, then it is distinctly not true, as anybody knows who has ever struggled writing a "portable" C program. In that regard, Java is clearly superior, because it is much more high-level.

modern portable C code...

... uses portable, cross-platform, highly tested open-source C libraries. Just as in java.

as anybody knows who has ever struggled writing a "portable" C program.

most people who say that haven't touched C since the 80's... you'd be amazed with how much the free software infrastructure has changed that...

I don't buy it

Have you ever had the pleasure of using some of these libraries cross-platform - say, on Linux and Windows? I have.

most people who say that haven't touched C since the 80's...

I have. And I just got a bug report that our software segfaults on Fedora 4. Most likely while calling Gtk (which does not necessarily imply that it's a bug in Gtk, of course).

you'd be amazed with how much the free software infrastructure has changed that...

I know, but it still cannot compete with a platform-independent high-level language/library, like Java is trying to be (with relative success). Here, all the pain of achieving portability is shifted to the language vendor.

Also note that such free libraries are usually written by highly competent people, who have decades of expertise in working around the inherent non-portability of C with lots of macros and conditional compilation. "Modern portable C code" is not automatic, it must be worked out.

i agree

"Modern portable C code" is not automatic, it must be worked out.

Yes, you see, i was talking about application software leveraging such good infrastructure. It's not unlike Java: people are building their apps using high-quality libs written by experienced infrastructure folks.

like
all the pain of achieving portability is shifted to the language vendor.

exactly the same for C libs, written by GTK people and others...

what's your point, then?

Abstraction

A high-level language is an unbreakable abstraction barrier that can guarantee portability (and other desirable properties). No such thing is possible with C: you're on your own. And more likely than not you will fail, unless you have access to all relevant platforms and are willing to test and debug on all of them.

E.g. we just found out that our stuff does not work on 64 bit platforms, although we tried to be very careful about platform independent abstractions - a few things slipped through nevertheless and it's not easy to locate and eliminate them now.

no doubt!

"A high-level language is an unbreakable abstraction barrier that can guarantee portability"

yes, but...

"although we tried to be very careful about platform independent abstractions - a few things slipped through"

that's still your fault, not C... :)

I find C to be sufficiently clear and high-level if i want it to be. Nothing fancy, but good enough when in the company of rock-solid libs and a few nicely designed macros to ease the pain of the lack of syntactic sugar for better control constructs.

But yes, i never tried building a GTK app with PostgreSQL access and then try to make that run on Windows, Macintosh or Sparc, so beats me...

The original point

that's still your fault, not C... :)

Sure, but my whole point was: in contrast to your original claim, C is not portable per se. You have to carefully craft your code for portability (and in fact, you always have to rely on a lot of silent assumptions, because there are many more subtleties than most people realize).

Some time ago, I had an idea for an interesting project (don't know whether it's new): building a test-bed implementation of C that completely randomizes behaviour on all aspects that are actually undefined or implementation-defined according to ISO. Not sure whether people would like it, though, because you probably wouldn't be able to write even the most trivial programs in it reliably ;-).

That's what I actually meant...

Yes, if you expend a lot of up-front effort, you can productively program in C. That effort largely comes in the form of training. Training to avoid the thousands of gotchas, training in extra-linguistic idioms (naming conventions, makefiles) necessary to manage language deficiencies, training in the extra-linguistic tooling necessary to build and deploy, training to find the errors that occur when conventions are inevitably broken, training to find adequate libraries and assess their weaknesses, training, training, training. Once you've got all that training, though, you are quite right that you can be perfectly productive in C, although I would say that there are still high ongoing costs. Without the training, you're a disaster waiting to happen, and of high negative value to your employer.

From the point of view of a developer, that's actually a pretty great deal. Training in difficult but necessary skills makes them more valuable, and increases their security. Once they are trained, they can view the training costs as sunk, and simply ignore them, as you seem to be.

From the point of view of those who employ developers, things aren't so rosy. Training costs a fortune, and drives up the costs of hiring already-trained developers. The dead-weight cost of all that training makes it very tough to scale teams. My initial comments about how Java scales well with team size pretty much all boil down to Java having greatly lower training costs.

Makes me wonder how much of the recent IT slump could be attributed to the rise of Java, VB, and related technologies. A Java or VB programmer is simply less expensive to train and maintain than a C programmer is, and in the short term that will lead to lower compensation if shops switch from C to technologies with lower cost profiles. Declining compensation means a temporary unemployment surge, as programmers search for lower-paid employment (rather than staying in one place with pay cuts). Interesting.

Half-heartedly agree

With most of your claims. But still, look at your desktop (I am assuming you're staring at one at this moment) and the programs you are running. How many code bits in memory are compiled C? 80%? 95%? They must be doing something right.

What???? ANSI C remains the m

What???? ANSI C remains the most portable language available. I can think of plenty of platforms lacking Java support, but very few indeed that don't have a C compiler.

C also has huge numbers of libraries that are standardised.

I suspect that the reason why Java scales only linearly is that, because it doesn't allow you decent abstractions, good programmers don't add as much as they could, while poor programmers find it relatively easy to understand existing code (because they won't be shocked by an abstraction which would have yielded comprehensibility benefits beyond their imaginings if they could have been bothered to understand it), and the use of garbage collection and exception tracing ameliorates the problems of sloppy programming.

Ported != Portable

C code targeted for a particular platform rarely runs unmodified on another platform unless you put in a bunch of conditional compilation statements (how big is an int anyway?)

Oh man that is such a killer

Oh man that is such a killer argument: Java programmers always check numerical code for overflows, given just how much numerical code they write.

The fact is that Java code very rarely is numerical, instead dealing with business logic. In these cases, the size of numerical types is irrelevant.

Overflow should be consistent

I'm not particularly keen on Java, so I'm not sure what windmills you're arguing against. Anyhow, if Java code overflows, then that overflow should occur identically on any platform, given that you are actually running code against the same (virtual) machine.

I mention int size in passing because C does not fully specify the range of its most basic of basic types: it only guarantees that an int is at least 16 bits wide, so it could be 16, 32, 64, or even an odd size such as 36 bits. Yes, everyone ends up programming around such a basic non-standardization of type (giving us a proliferation of basic types with annoying X_ names).
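For contrast, the Java Language Specification fixes `int` as 32-bit two's complement, so `Integer.MAX_VALUE + 1` wraps to `Integer.MIN_VALUE` on every JVM rather than being undefined or platform-dependent; since Java 8, `Math.addExact` throws `ArithmeticException` where silent wrapping is unwanted. A small illustration:

```java
public class Overflow {
    public static void main(String[] args) {
        int big = Integer.MAX_VALUE;  // 2147483647 on every JVM: int is always 32 bits
        int wrapped = big + 1;        // defined to wrap around, never undefined behaviour
        System.out.println(wrapped == Integer.MIN_VALUE); // prints true

        try {
            Math.addExact(big, 1);    // Java 8+: throws instead of wrapping
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```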

My point is that for most jav

My point is that for most java applications it simply wouldn't make a difference if the size of types were not standardised.

We already had that

ANSI C remains the most portable language available.

To repeat my argument from above: being the most widely available language does by no means make it the most portable one. As a low-level language it's far away from being the latter.

Python

The language I like to pit Java against is Python. The former tends to be used for "enterprise" applications: sprawling hundred-KLOC applications, possibly distributed over a global cluster, etc., worked on by large teams on different continents. Take IBM's WebSphere Application Server, for instance; Portal would sit on top, then you get other things like Lotus Workplace, and add-ons that utilize Portal's XML rendering for other things. The resulting application requires no fewer than two AIX servers and possibly a few helper dual-Xeon blades, and then still takes 30 minutes to start (if you're lucky). Python, on the other hand, tends to be more dainty: little configuration apps for Linux desktop distros (Ubuntu and Fedora), or at most a whole package management system (Gentoo). The teams that produce these aren't very large either; a single developer is very common, and most would not be over 5. The question is why? (That's purely a question, I don't have an answer.)

As far as super-linear scaling goes, I think it would be nice too, but I can't think of even one situation. Python, I would argue, scales super-linearly with lines of code, but any decently orthogonally organised language will; Java, being very hierarchical, only does so linearly. In terms of people, it becomes interesting. I would say that again, for a small team, Python gets a faster head start (the first derivative is bigger near zero), but it has a negative second derivative (damn those pesky second derivatives!) </physics-geek>. :D

I do wonder, though, whether the smaller Python teams result not because they don't scale, but because they don't need to scale. I've certainly never thought "this Python app is really difficult to develop -- if only I could turn it into Java and hire a bigger team". In other words, are Java apps too large and the teams larger than necessary? I would think a suggestion to redo Portal and LWP in Python wouldn't go down well...

Political vs Technical?

I think the reason is more political than technical. In my experience Java is often selected by large IT shops for political or dogmatic reasons rather than on its technical merits. In big enterprises, your worth is usually judged by the number of people underneath you, so the IT manager or architect choosing Java is also likely to want to build a large team.

Python is not a language that has a big marketing budget or hype bandwagon behind it. If it's chosen, it's chosen for technical reasons -- code clarity, for example -- by people who are trying to get things done, not climb the corporate greasy pole.

If you're debating architectu

If you're debating architecture, I would say that Java is the wrong place to do it.

i think you're confusing two different things. i was asking about embedding languages in java for my own benefit and as a "low level" implementation tool.

when i referred to arguing about architecture i was referring to things like using service oriented architecture, or not, and what that "really means" (do you use federated jms or a single server; where do you put the trade-off between few, complex messages with low coupling and simple, frequent messages with higher coupling; etc). these are options within the general "java way" as far as i can see.

Sleep, a ready for extension language implemented in Java

Hi Andrew,
I might recommend you take a look at Sleep: http://sleep.hick.org. Sleep is a scripting language built for Java with several hooks that allow one to extend and add on to the language. The base language is inspired by Perl. I like Sleep, but I'm kind of biased: I wrote it.

Maya, Nemerle

Maya is a Java compiler that can be extended with user-defined macros. LtU discussion.

Though not Java, Nemerle's macros are like Maya but even easier to use.

Scala gets pretty far without allowing users to muck with the AST. Look at the section titled "Scala is extensible". Scala compiles to Java bytecode so it might be of use to you.

We presented on this topic at SPA'05 and will at JAOO

Steve Freeman and I gave a presentation at SPA'05 entitled "Evolving an Embedded Domain Specific Language in Java". We will be presenting something very similar at JAOO this year, probably also addressing our experience with C#. If you wish, I can send you the PowerPoint slides and notes.

thanks. i'm about to email y

thanks. i'm about to email you for the info, since i suspect people may no longer be reading this (now old) thread.

Reading

Oh, we are reading it alright...

Slides for this talk

Frink

You might also look into Frink as a language that's implemented in Java and embeddable into a Java program or one from which you can call arbitrary Java code.

Frink's lexical analyzer and parser are built using a combination of JFlex and JavaCUP. (Ooh, I just noticed that JavaCUP has moved and has new features added!) JFlex and JavaCUP both allow you to build a parser for a domain-specific language relatively easily, one that can integrate well with existing code. Granted, this is not really "in" the Java language, in the sense of sharing the same parser, but the DSL and the Java code can share the same memory space, objects, and references.
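The point about sharing objects holds even without a parser generator. A minimal sketch of a hypothetical line-oriented command language (`set NAME VALUE`, `add NAME VALUE`), parsed by hand rather than with JFlex/JavaCUP, whose interpreter works directly on a `Map` that the surrounding Java code owns and reads; the class name and syntax are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandDsl {
    /** Run each line of the script against the shared map; the host sees changes directly. */
    static void run(String script, Map<String, Integer> shared) {
        int lineNo = 0;
        for (String line : script.split("\n")) {
            lineNo++;
            String[] words = line.trim().split("\\s+");
            if (words.length == 1 && words[0].isEmpty()) continue; // skip blank lines
            if (words.length != 3) {
                throw new IllegalArgumentException("line " + lineNo + ": expected 3 words");
            }
            int value = Integer.parseInt(words[2]);
            switch (words[0]) {
                case "set":
                    shared.put(words[1], value);
                    break;
                case "add":
                    shared.merge(words[1], value, Integer::sum); // treats a missing name as 0
                    break;
                default:
                    throw new IllegalArgumentException("line " + lineNo + ": unknown command " + words[0]);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> accounts = new HashMap<>(); // owned by the host program
        run("set balance 100\nadd balance 50\n", accounts);
        System.out.println(accounts.get("balance")); // prints 150
    }
}
```

A generated lexer and parser would replace the hand-written `split`-based parsing for anything with real structure, but the interpreter would still operate on the same Java objects.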