Language Oriented Programming

Sergey Dmitriev of JetBrains has written a whitepaper on domain specific languages. It is called "Language Oriented Programming: The Next Programming Paradigm" and is available at Language Oriented Programming


Lisp

I think it is very fortunate that Dmitriev is unaware of Lisp; otherwise he might never have started this quite interesting work.

Not entirely

Not that the reply is anything stellar

"Yes, I agree that you can emulate some aspects of LOP, using LISP's macros, as well as by using C++'s preprocessor and so on. But you also have a lot of restrictions with such approach.
The first problem here is that in case of macros you don't have a freedom to define any syntax for language you want to - it is restricted by the syntax of these macros.
The second problem is that you can define only local code transformations, for example you cannot define macro that changes code in many different places in your program - so, you are very restricted in the kind of code generator you can define.
Third restriction, is that you have fixed target language for code generation - LISP in your case."

What exactly does it mean that Lisp macros are restricted by their syntax? As far as I know, the transformations macros can do are arbitrary. One only has to take a look at LOOP.

As for transformations being local: given that macros can do arbitrary transformations and have the full language at their disposal, I don't believe that is a problem. Either the macros could share global state, or one global macro could walk the whole tree. Both sound potentially dangerous in general, and I'm not sure why one would want non-local transformations in the first place.

Finally, some would argue that having Lisp as the only output language isn't a problem ;) (Plus, they all compile down to machine code in the end.) Seriously, there are several examples of developers using a lispy syntax for assembler, C or their own language, simply to be able to use Lisp and macros more easily. I fail to see why one cannot transform data as well as code.

From what I understand of the article, the tool provides a way to parse languages into an AST, transform the AST, and finally interpret it or transform it again into the target language, using a different DSL each step of the way. Lisp offers the same complete language for each step, and also lets one do each step incrementally. If I don't have trouble with sexps, I can use them directly; if I do, I can choose to balance how much work should go into parsing against the benefits of non-sexp syntax. When I want to interpret or compile the language, it is much easier to stick to Lisp and output it than to define yet another interpreter. It's only if I want to output to another language that I have to define its syntax, and if my needs are simple enough, I may be able to output strings directly without having to resort to arcane FORMAT strings. It's, imho, a much more flexible and dynamic process than having to specify everything before having anything testable.

Where I find the approach interesting is in the parsing DSL itself and the integration with the editor. I'll probably look at the project from time to time, just to see if something really interesting has sprouted there.

MPS isn't enough?

I have to agree. I don't think Sergey has seen read macros in Common Lisp, which can alter the syntax quite dramatically, though it still might not be satisfactory for him.

From what I've read from Sergey's website it seems like MPS is pretty similar to ANTLR, but it comes with a graphical editor. Have I missed something here?

Those scanner, parser & tree-walker generators still aren't powerful enough IMHO. You can't create constructs that will accept an arbitrary number of sub-construct arguments, e.g. a list comprehension builder that takes an arbitrary number of filters and list generators. To make that possible you need to programmatically alter the raw AST. More experienced Lispers can probably point out other transformations that can't be done.

Sergey also seems to favour writing external DSLs instead of embedded DSLs. To me the costs of external DSLs seem to outweigh their benefits. Even if MPS can generate the compiler for you, you still need a debugger, profiler and IDE. You also need to write your own libraries, or at least a foreign function interface, to make use of the base language's libraries. With embedded DSLs you can leverage all of that from the base language natively, without additional work.

Still I have a lot of respect for the effort Sergey put into MPS. We need more advocates of bottom-up programming.

wannabe DSLs?

Dmitriev has an interesting vision, but it's a stretch to say "all the libraries in the JDK should be DSLs." It's common for programs to be substantial just because they bring together multiple libraries. With conventional programming this is simple, and the programming effort goes into solving the given problem. On the other hand, with multiple DSLs:

You could create a new language by extending an existing one, inheriting all of its concepts, modifying some of them, and adding your own.
In other words, in language oriented programming (LOP) the DSLs must be explicitly reconciled in order to be used together. Given a choice between libraries and LOP DSLs, using libraries would be easier.

> it's a stretch to say "all

it's a stretch to say "all the libraries in the JDK should be DSLs."

I agree. I'm not sure whether Dmitriev really meant to say that (since it's an easily disputed claim), although it's very easy to read what was written that way. What I would hope he meant is that some (whether it's many or few is a separate point) libraries are definitely good candidates for language orientated programming.

I have written down a few thoughts of mine on the subject of Language Orientated Programming, most of which I suspect won't come as a big surprise to LtU readers.

SQL can already be embedded

There's already a standard called SQLJ for embedding SQL into Java, and similar things for other languages. I believe it checks SQL syntax. People tend not to use SQLJ because they want to use standard Java compilers. Also, it's pretty easy to check SQL syntax in unit tests - you do have unit tests, right?

> There's already a standard

There's already a standard called SQLJ for embedding SQL into Java, and similar things for other languages.

I think you're taking things a little too literally ;) I only used SQL in Java as an example that's trivially understandable by most people [I rarely use either language, and personally feel that Java is grossly unsuited to such things, but it serves as the lingua franca of the modern programming world whether I like it or not]. Yes, SQLJ is a particular hack for SQL under Java - but what happens if you want to embed something else? What happens if the embedding is more sophisticated? Do you really want to have a specialised hack each time you embed something new?

People tend not to use SQLJ because they want to use standard Java compilers.

In the LOP view of the world, the facilities for extending the language are built into the compiler. Using a new DSL is thus no more difficult than having the relevant library lying around.

Why most of us won't be inventing little languages

I responded to this on my weblog:

"Sergey seems to believe that the main barrier preventing the invention of new computer languages is that it's too difficult to write language-specific tools. While that's a natural argument for a tools vendor to make, I think it misses something fundamental about language. The real downside of inventing a new language isn't that you have to write new editors, compilers, and many other tools. It's that other people won't understand you." [more]

Communication problem

I think this is an excellent point. The main point the LOP article misses is that programming means communicating not only with the computer, but with other programmers as well. More DSLs are not a magic bullet for that.

Language design is hard

I completely agree. Having parser generators and editor frameworks will not change the fact that language design is hard. As another poster pointed out, one of the difficulties is that programs communicate not only with the computer, but with other programmers as well. Having a Babel of languages in a single system may be worse.

I'm all for using a good DSL when it's advantageous, but I don't like the idea of having little languages sprouting everywhere.

Too Late

I don't like the idea of having little languages sprouting everywhere.
How many shells are there?

How many programs have something they each call regexps?

How many different ways do programs take command-line arguments?

Unix is an absolute mess of languages and multiple conventions with the unifying structure of byte streams (maybe signals and other small pieces too).

Oh, I forgot the config files.

But that's just Unix, maybe some other operating system has it better?

You do have a point

Yes, you have a point. Maybe I'm alarmed by someone saying that all the libraries in the JDK should be DSLs :)

But the real concern is the proliferation of ad-hoc languages, designed by people with no clue, mixed in the same source code. I can imagine the horror of trying to read a piece of code using 4, 5 languages at the same time, most of them badly designed. Yes, library design is (almost) language design, but in this case at least the concrete syntax is fixed and uniform.

> But the real concern is the

But the real concern is the proliferation of ad-hoc languages, designed by people with no clue,
> mixed in the same source code.

You're assuming, somewhat optimistically, that "proper language" designers have a clue. I can think of several mainstream languages that lead me to the opposite conclusion ;)

I can imagine the horror of trying to read a piece of code using 4, 5 languages at the same time,
> most of them badly designed.

I don't think anyone should be under any illusion that creating DSLs is suddenly going to become possible for just anyone. Although new (and "rediscovered") techniques and technologies are bringing down the labour costs required to produce a DSL, it still requires a lot of skill and experience to do well. That inevitably means that some people will create better DSLs than others. But just as one hopes that computing Darwinism leads to a gradual decline in the use of badly designed libraries, so one hopes the same will happen with badly designed DSLs.

In other words, I'd say: trust the market.

Probably Already Do

I can imagine the horror of trying to read a piece of code using 4, 5 languages at the same time, most of them badly designed.

Most programmers already embed many different languages in an application. A perfectly reasonable Perl CGI may contain regexen, SQL, and a templating language. The template may contain (X)HTML, CSS, and JavaScript. So even the most humble CGI may contain 7 different languages, some general-purpose, others domain-specific. Most programmers don't even realize this.
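To make the point concrete, here is a hedged sketch in Ruby (the example language is swapped from the Perl of the comment above, and every identifier is invented for illustration) of a single "program" that already embeds three languages: a regular expression, SQL, and an HTML template.

```ruby
# A tiny, self-contained sketch: one program that embeds several
# languages at once. All names here are invented for illustration.

# Language 1: regular expressions (a rough email pattern, not RFC-complete)
EMAIL_RE = /\A[\w.+-]+@[\w-]+\.[a-z]{2,}\z/i

# Language 2: SQL, carried as a string (a real program would parameterise it)
def lookup_query(email)
  "SELECT name FROM users WHERE email = '#{email}'"
end

# Language 3: an HTML template, via a squiggly heredoc
def render(name)
  <<~HTML
    <html><body><p>Hello, #{name}!</p></body></html>
  HTML
end

email = "alice@example.com"
raise "bad address" unless email.match?(EMAIL_RE)
puts lookup_query(email)
puts render("Alice")
```

Even this toy fragment forces the reader to switch between three grammars with entirely different quoting and escaping rules, which is exactly the situation the comment describes.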

I think you guys missed the point

Yes, there is a possibility of lots of ad-hoc or hard-to-understand languages appearing, but MPS, like all tools, is open to abuse. That doesn't mean that the tool or the ideas behind it are wrong.

Tool abuse has been a problem since mankind first invented tools. Abuse of a tool that is actually right for the job, or the selection of the wrong tool is a man-management problem, not a tool/idea problem.

The point that Sergey is trying to make is that general-purpose languages are not the right tool to solve domain specific problems. He's just trying to make it easier for the right people to create the right tool (DSLs) for the job.

Yes, there is a cost involved in learning a new language in order for people to understand the code, but I would argue that it is a lot easier to understand code written in a good DSL than to understand general-purpose-language code that is poorly written because the language doesn't have sufficient abstractions for the problem domain.

> The real downside of invent

The real downside of inventing a new language isn't that you have to write new editors, compilers,
> and many other tools. It's that other people won't understand you.

What's more difficult about using a DSL than using a library? One wouldn't expect to understand code that uses a particular library without reading that library's API documentation. Similarly, one wouldn't expect to understand code written in a particular DSL without checking that DSL's documentation. There is little or no conceptual difference between the two in my opinion.

DSL vs. Library

What's more difficult about using a DSL than using a library? One wouldn't expect to understand code that uses a particular library without reading that library's API documentation. Similarly, one wouldn't expect to understand code written in a particular DSL without checking that DSL's documentation. There is little or no conceptual difference between the two in my opinion.
Is it exen morphling to fruby a linging than to fruby vocling.
Vocabulary:
  • exen - more
  • morphling - difficult
  • fruby - learn
  • linging - language
  • vocling - vocabulary
linging fruby morphling vocling fruby morphling exen.

The language is a postfix language, where relations have a specific arity and apply to the top levels of concepts on the stack.

Relations:
  • exen/2 - "greater-than"
  • morphling/1 - "difficulty-of"
  • fruby - "to-learn"
Concepts:
  • linging - "language"
  • vocling - "vocabulary"

Re: DSL vs. Library

You seem to be saying that DSLs are more generic than libraries and therefore DSL code can be made more obfuscated than library calls, if that's the designer's intention. Not exactly a surprising result. But in practice, designers of libraries and designers of DSLs usually strive to create a design that's easier, not harder to use. So let's see what your sentences would look like in a DSL and as a sequence of library calls in a Java-like language:

DSL:
It is more difficult to learn a language than a vocabulary.

library calls:
it-is(more-difficult-than(learn(a(language)),a(vocabulary)))

Now, an expert Java programmer won't have much difficulty parsing the library call syntax above. But if this "it is more difficult ..." assertion is one of thousands of other assertions which form an expert system or a similar piece of software, I'd say the DSL form has a clear advantage.
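The "library calls" line above can actually be made to run. Here is a hedged sketch in Ruby rather than the Java-like pseudocode (all function names are invented for illustration), where each call simply builds up a string so that the nesting mirrors the DSL sentence:

```ruby
# Sketch: encoding the assertion as nested library calls.
# Each call returns a plain string; the composition order matches
# the nesting in the pseudocode above. All names are illustrative.

def a(noun)
  "a #{noun}"
end

def learn(thing)
  "to learn #{thing}"
end

def more_difficult_than(x, y)
  "more difficult #{x} than #{y}"
end

def it_is(assertion)
  "It is #{assertion}."
end

puts it_is(more_difficult_than(learn(a("language")), a("vocabulary")))
# It is more difficult to learn a language than a vocabulary.
```

Reading one such expression is easy; the comment's point stands that reading thousands of them is where a dedicated surface syntax starts to pay off.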

As an example, I'll describe the project I've been assigned to for the past three years.

The main development language is C, but several other DSLs are involved in the development. Some of them came with external tools, like the make files. Some are SGML and XML dialects, used for things like message strings, file formats, or byte-code descriptions. A couple of them are full-blown DSLs with their own custom parsers. Most of the DSLs translate to C and then get compiled and linked together with the hand-written C code.

Now, let me say that every one of those DSLs is there for a reason. If it was easier to hand-code any of the DSL code in C, it wouldn't be there. Would you, for example, consider throwing away all make files and coding them in C?

Also, I can't recall any difficulties learning them when I needed to. First of all, every DSL has a clear role in the system and when you work in one area you only need to know one language. Second, the language syntax is self-descriptive. The only "documentation" for the DSLs were the DTDs and the parser sources, but I didn't need any documentation to understand what they were doing. Third, because they translate to C, you can examine and debug the translated C code until you figure the DSL out.

All that said, I don't think this is what "language oriented programming" is about. The article is describing a much more ambitious system. I don't see how two languages can interoperate without knowing each other's semantics, but I wish Sergey Dmitriev the best of luck.

Re: DSL vs. Library

I'm sorry that I'm replying this late; I'm not used to following web fora.

DSL:
It is more difficult to learn a language than a vocabulary.
library calls:
it-is(more-difficult-than(learn(a(language)),a(vocabulary)))

The problem is that I'm very sure about the semantics of the library call. I'm not very sure about the semantics of the DSL - it looks like a natural language, but I'm pretty much aware that the NL understanding of computers is currently quite limited. So I think this DSL must be somewhat less expressive than NL. Hm. I hope they added all the connotations I have when reading such a sentence.

The same is true for any DSL: I have to learn its semantics before I can be sure I understood what the code does (as I have to do for any language). For a "normal" library, I do understand these semantics already. If the DSL does not provide a great increase in conciseness, why replace a well-understood language with a poorly-understood one?

And that's the only good reason for DSLs: When the original language becomes too clumsy to express an idea concisely that should be fixed (Being a Schemer, I prefer extending the language instead of writing a new one, though).

The problem is that I'm very

The problem is that I'm very sure about the semantics of the library call. I'm not very sure about the semantics of the DSL...

Are you sure about that? Libraries often have unexpected semantics layered on top of the semantics of the language itself. Plenty of times I've been surprised to find that a language [on edit: that should be "library," of course] had hidden state, unexpected semantics with respect to the ordering of function calls, or complex rules about initialization of values.

I think many of the same semantic choices are made during library design and language design, but library designers often aren't even aware of it.

I see your point, but I think there's a lot of grey area there...


What documentation?

One wouldn't expect to understand code that uses a particular library without reading that library's API documentation.

Actually, one would. In fact, we do it all the time. I work on an Extreme Programming team, and most of our code doesn't have any separate documentation. The code, plus the unit tests, *is* the documentation, and we like it that way. It's very important for anyone on the team to be able to dive into the code at any point and understand it, and we're able to do that pretty well because it's just Java code.

>> One wouldn't expect to und

> One wouldn't expect to understand code that uses a particular library without reading that
>> library's API documentation.
> Actually, one would. In fact, we do it all the time. I work on an Extreme Programming team,
> and most of our code doesn't have any separate documentation. The code, plus the unit
> tests, *is* the documentation

I think you're being somewhat selective in your thinking here. It's hardly a big surprise that you and your team are able to work out what your own code does, as you are the ones writing it (although come back to it in two years' time, and tell me that "the code plus the unit tests" is sufficient documentation when you haven't been actively working on the system in the meantime). The point is, if you take an external "thing" (a library or a DSL constructed by someone else), are you really going to want to scour through its code to discern what it does, how it does it, and how to use it?

As an example, imagine the following scenario. You download the Java source code using Sun's nearly-free license - it has lots of testing code, so that satisfies your stated requirements. I'll download the API docs. Then we'll time the two of us trying to find a function to do a particular job and working out what arguments to pass to it. Personally I suspect the documentation might give me the slight edge ;)

Code cannot lie

Personally I suspect the documentation might give me the slight edge ;)

Except when it is wrong and/or incomplete.

>> Personally I suspect the d

> Personally I suspect the documentation might give me the slight edge ;)
> Except when it is wrong and/or incomplete.

So from that I infer that you would never, ever check documentation in case "it is wrong and/or incomplete", always preferring to check the code itself? You'll have to forgive me if I suggest that's at best an incredibly flimsy argument. Perhaps I should take your argument to mean that you are a fan of DSLs? After all, as you appear to enjoy reading through library code to discover its API, you should also enjoy reading through a DSL's implementation to discover how to use it.

More seriously, yes documentation can sometimes be wrong - but surprisingly rarely in my experience. Personally I'll take the hit on the fact that most decent systems will document at least their most frequently used aspects, and that the proportion of incorrect / misleading material within that documentation will be very small. But if you wish to bite your nose to spite your face, I have no intention of stopping you.

Failure is only an opportunity to begin again more intelligently

So from that I infer that you would never, ever check documentation in case "it is wrong and/or incomplete", always preferring to check the code itself?
I would check the informal documentation, more often than not find that it's inadequate, and then check the formal documentation (the source code). I tend to understand ten lines of code better than two pages of hype.

Ideally, I prefer concise architectural documentation to get my bearings, and source code for everything else. The specific level of detail at which I switch from prose to source code can be visualized as a slider, regulated by the expressiveness of the PL in use. If it's machine code, I will stick to prose as much as I can. If it's a highly expressive PL, I will spend most of my time in the source. Realistically, with Java or its ilk I will need some prose in the form of source-code comments.

Perhaps I should take your argument to mean that you are a fan of DSLs? After all, as you appear to enjoy reading through library code to discover its API, you should also enjoy reading through a DSL's implementation to discover how to use it.
No, I agree with Jorgen's point that understanding a new DSL is more complex than understanding a new library. And no, I do not (always) enjoy reading source code, but sometimes it's the only option.
But if you wish to bite your nose to spite your face, I have no intention of stopping you.
And if I wish to bite somebody else's nose? :-)

> I would check the informal

I would check the informal documentation, more often than not find that it's inadequate,
> and then I will check the formal documentation (the source code). I tend to understand
> ten lines of code better than two pages of hype.
[snip]
> I agree with Jorgen's point that understanding a new DSL is more complex than understanding a new library

I don't quite understand. First of all you say that you find it so insanely difficult to understand most libraries that you have to peruse the source code, and then you're saying that DSLs are even worse than that! So how often have you checked make's source code then? ;)

I think all you're running into is that the systems you choose to use have bad documentation. And a system with bad documentation is a bad system, be it a library, a DSL or whatever else.

And if I wish to bite somebody else's nose? :-)

As we say in England, what you do behind closed curtains is your own business. But I'd strongly advise you to get the permission of your wife/fluffy toy/pet/whatever before doing so ;)

Reading source code.

I don't quite understand. First of all you say that you
> find it so insanely difficult to understand most
> libraries that you have to peruse the source code, and
> then you're saying that DSL's are even worse than that!
> So how often have you checked make's source code then?
> ;)

About once every two years. Slightly less often than I check the standard C library source code. However, because make is a language, I am effectively dependent on the documentation for make in addition to the source code, and I'm getting significant pain from low quality documentation for make implementations.

And you seem to be somewhat misreading things here: You are assuming that "reading source code" is a response to "insanely difficult". For many of us, reading source code is usually easy.

However, reading the source code of a language implementation in order to understand that language is a pain. The less constrained the language's grammar and semantics, the more of a pain. Thus full DSLs have a learning cost that is usually higher than an API's - but then again they have benefits that go beyond an API.

In my favourite language (Ruby), DSLs are commonplace, but they are implemented using the basic Ruby syntax and really are just APIs - Ruby is just flexible enough that the line sort of blurs, without losing the consistency of having a full language.
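As a hedged sketch of the kind of Ruby DSL-as-API just described (all class and method names are invented for illustration), a block evaluated with `instance_eval` is enough to make plain method calls read like a little configuration language:

```ruby
# Minimal sketch of a Ruby "DSL that is really just an API".
# The block is evaluated with `self` set to the new Recipe, so
# bare calls to `step` inside it are ordinary method calls.
# All names here are invented for illustration.

class Recipe
  attr_reader :name, :steps

  def initialize(name, &block)
    @name  = name
    @steps = []
    instance_eval(&block) if block  # run the block as this recipe
  end

  def step(text)
    @steps << text
  end
end

tea = Recipe.new("tea") do
  step "boil water"
  step "steep leaves"
end

puts tea.steps.length  # 2
```

Nothing here leaves the base language, so debuggers, libraries and editor support all keep working - which is exactly the embedded-DSL advantage argued for earlier in the thread.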

Eivind.

> You are assuming that "read

You are assuming that "reading source code" is a response to "insanely difficult".
> For many of us, reading source code is usually easy.

When I am given a big wodge of someone else's source code, I find genuine comprehension of what I am given extremely hard. Bearing in mind I'm an English speaker, I would liken it to me trying to read Norwegian armed with a Norwegian -> English dictionary. This would allow me to understand localised aspects of the writing, but at such a slow (and often unreliable) pace that understanding the entire piece would be a very difficult task. There have been occasions when I have put in the work to understand a complete system, but my most frequent tactic is to grep for a particular keyword I'm looking for. My most frequent target for such grep'ing is the standard C library headers, to find which of many files a particular #define lives in...

I suppose that what I'm saying is that while there may be a few élite people who can glance at piles of source code and discern the intention of the system it represents, most normal programmers simply can't do that (or, at least, they haven't put the effort into developing the skills that would allow them to do that). At that point, it almost doesn't matter if the source code of the API or DSL is harder to understand, because the vast majority of people won't look at the source code of either one of them.

This is getting a bit silly

I never said that I refuse to read documentation, so I see no reason to take up your challenge. Sun's javadoc is quite good, and it's silly not to use it.

What I was trying to explain is that sometimes you want to or have to read the source code, and it pays to make the source code as straightforward as possible.

Most projects don't have the resources of Sun, or the audience. Our customers aren't developers, and we aren't going to have books written about our APIs. Making the source code more readable is a better investment for us than maintaining documentation that nobody will read (including us) and that will quickly go out of date. Other people will make different tradeoffs.