Library vs. domain specific language

I suspect this question has already been discussed here on LtU, but I could not find it.

When is it necessary, or worthwhile, to design a domain-specific language instead of creating a library for some general-purpose language? Is it just a matter of more convenient syntax? What aspects should be taken into account when deciding?


Libraries can be composed...

In general, libraries can be stitched together to form programs, while DSLs are, at least within one source unit, uncomposable. If all libraries were DSLs, we would obviously have trouble writing anything at all.

Scala combines the two concepts, so you can get the benefits of both at the expense of less customized syntax and semantics for those DSLs/libraries.

What is this difference of which you speak?

To Lispers (especially Schemers (especially Racketeers)), "library" and "DSL" look like two extrema of a single continuum. In the Scheme world in particular, we refer to syntax abstraction as the direct analogue of procedural (i.e. semantic) abstraction. It's quite common for a library that mostly supplies procedures to provide some helpful syntax as well, and the fact that the difference is mostly transparent to the user is very helpful in implementing things like even streams in eager Scheme.

Don't you mean '(especially

Don't you mean '(especially Schemers [especially Racketeers])?

Sigh.

I hate that square brackets were adopted in Scheme (and Racket) as a mere alias for parentheses. I would far rather have had a little bit more syntax, specifically a concise indexed accessor for vectors and lists. Square brackets would have been perfect for that, but now cannot be used.

I know, I know -- in a language community that prides itself on using a bare minimum of syntax, it was a forlorn hope.

Historically, Lisp used

Historically, Lisp used parentheses for S-expressions (data structures) and square brackets for M-expressions (control structures). The hijacking of square brackets for use in a more elaborate S-expression notation necessarily happened after the Lispers involved lost track of the data-only status of S-expressions. (Failing to think of S-expressions as data is, of course, at the heart of the myth that "the theory of fexprs is trivial" — but I digress. :-)

Eh. I'd have said

Eh. I'd have said that the failure to keep environments with fexpr arguments, and to consistently evaluate subexpressions in the environments where they appeared, was most of the problem with early fexpr implementations.

I don't really give much of a darn about M-expressions. I'd prefer to treat square brackets as delimiting array literals in data, and as (multidimensional) array indexes in code. This also has the nice effect of providing an accessor to array indexing structure, making it trivial to iterate over the indexes of an array. For example:

(define foo 2)
[30 20 50 22][foo] ==> 50

We've both modified fexprs in ways that create (different) nontrivial theory.

You've modified fexprs by wrapping them with applicatives in Kernel, which forces evaluation (and forces it in the correct environment).

I've used promises as arguments and enabled 'non-problematic' uses of promises such as 'eval', 'force', and 'bind' in the main language, none of which exposes the syntax of the subexpression. I like having 'eval' separate from 'force': it has semantics similar to Algol's call-by-name arguments and allows me to implement, for example, looping forms directly.
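To make the 'eval' vs. 'force' distinction concrete, here is a minimal Python sketch. Python has no fexprs, so a promise is modeled as a closure over its environment; the names Promise, eval, force, and while_loop are hypothetical illustrations, not the poster's actual primitives, and 'bind' and 'break' are not modeled. The point it shows is that once arguments arrive unevaluated, a looping form can be an ordinary function rather than a macro.

```python
from typing import Callable, Dict


class Promise:
    """Hypothetical promise: an unevaluated expression closed over its environment."""

    _UNSET = object()  # sentinel meaning "not forced yet"

    def __init__(self, thunk: Callable[[], object]) -> None:
        self._thunk = thunk
        self._value: object = Promise._UNSET

    def eval(self) -> object:
        # Call-by-name: re-evaluate the expression on every use.
        return self._thunk()

    def force(self) -> object:
        # Call-by-need: evaluate at most once, then reuse the cached value.
        if self._value is Promise._UNSET:
            self._value = self._thunk()
        return self._value


def while_loop(cond: Promise, body: Promise) -> None:
    """A looping form written as an ordinary function, not a macro."""
    while cond.eval():   # eval, not force: the condition must be re-checked each pass
        body.eval()


# Usage: sum 0..4 with a loop whose condition and body are passed unevaluated.
state: Dict[str, int] = {"i": 0, "total": 0}

def step() -> None:
    state["total"] += state["i"]
    state["i"] += 1

while_loop(Promise(lambda: state["i"] < 5), Promise(step))
print(state["total"])  # 10

# force memoizes; eval does not.
counter = {"n": 0}

def tick() -> int:
    counter["n"] += 1
    return counter["n"]

p = Promise(tick)
print(p.force(), p.force())  # 1 1
print(p.eval(), p.eval())    # 2 3
```

Using force for the loop condition would cache the first result and never terminate (or never run), which is why a call-by-name 'eval' is the useful primitive here.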

I put access to the syntactic subexpression behind a primitive called 'break' which, when applied to a promise, returns the syntactic expression and the environment. But nobody would ever have a good reason to use 'break.' It's there solely for the purpose of implementing bug-for-bug compatibility with languages that got it wrong, so I put it behind some barriers intended to make people aware of its problems.

In an absolute sense, the theory of my fexprs is still trivial because of 'break', but only if someone imports the library

I_AM_INSANE_AND_WANT_TO_DESTROY_EVERY_ABSTRACTION_THAT_EXISTS

which is where I put the binding for 'break' and a few other semantically problematic forms. It's the only library whose name is in all caps. Further, no module that imports that library will ever link with any module that doesn't explicitly import it, which I hope will keep it "outside" the set of things whose use can be hidden, or that can accidentally "infect" code simply through the use of someone else's library.

Finally, if you specifically want to enable 'break' you have to invoke the interpreter with the argument

-IA!_IA!_CTHULHU_FTHAGN!

Subtle, no?

Are DSLs for programmers?

I think the first question to ask is: who is going to write "programs" in the DSL. If the answer is "a software engineer", then you can design your DSL as a library, perhaps with a syntax extension if your target programming language supports it.

But if the answer is "a domain expert", then you should not make the assumption that the domain expert can write code in a given programming language. The DSL should stand on its own, and you will probably provide a compiler or interpreter for the DSL. This doesn't preclude the idea that most of the DSL's semantics can be implemented as a library, but this is an implementation detail.

high bar

The problem is that I've often seen people write a DSL and not get it good enough to really avoid being a leaky abstraction. It's hard to say whether, in the end, it was a win. It seems you really have to know your DSL's audience and carefully consider how well you must hide the original language, and programming issues in general.

To be fair, library design

To be fair, library design is also hard.

I don't know if DSLs are really a win, at least the powerful textual ones. Perhaps a structured/graphical DSL designed for non-programmers or light programmers would be fine... just think of the DSL at that point as an extra-flexible user interface for interacting with the computer via very high-level code. You still have to be a good library/UX designer to build a good one, however.

You miss the big picture

There are some 'core features' we need for creating powerful abstractions, for example the ability to separate what from when and how, treating them as orthogonal concerns. Libraries vs. DSLs do no good if the implementer is at the mercy of some fixed 'kernel'. Please read the seminal paper on graphics programming abstractions, Myer and Sutherland's "wheel of reincarnation" paper, On the Design of Display Processors. It discusses how 'fixed instruction set' graphics co-processors evolved up to 1968, extrapolates the trend, and predicts the future very accurately.

The point is not simply about higher-level abstractions like quaternions or Clifford algebras. The same trend is happening right now in the virtual machine space. Programmers want the illusion of an infinite compute fabric, and virtual machines like the JVM, the .NET CLR, and Mono currently do not provide efficient user-land scheduling. People have built libraries on top of these runtimes to help address the shortcomings, for example the recent Big Data research on Scala done by Stanford and EPFL.

Building stuff from scratch is hard. Building on top of safe, sound primitive combinators is not as hard. Finding the right set of combinators is hard.

DSL = domain specific *language*

If the answer is "a software engineer", then you can design your DSL as a library, perhaps with a syntax extension if your target programming language supports it.

Since a DSL is a language, by that argument we should all be still programming in assembly language, just with really really high-level libraries.

by that argument we should

by that argument we should all be still programming in assembly language, just with really really high-level libraries.

We are.

Well, yeah. We are.

Yes, all programming languages can be implemented as a highly extended set of very complicated assembly macros. We don't usually think of them that way, however.

I think that a "DSL for programmers" is one that allows you to define the semantics and control structures of most other programming languages (even languages with awful semantic problems) as a library. Mimicry of the syntax of other languages isn't necessary, but if you can create a program with an abstract syntax tree isomorphic to the original program in another language, link it with your library for that language, and run it with the same semantics (for, say, at least a dozen significantly different programming languages covering several different preferred paradigms) then I'd say you have a very good "DSL for programmers."

You're making my point for me

In the terms of the OP (or other non-Lispers), if it has to be implemented with macros, it's not a library any more.

True. Is that a problem?

Was I not supposed to make your point for you?

Confusion compounded

François Rouaix said that DSLs for programmers might as well be just libraries, which I construed to be meant in the sense of the original post, namely procedure libraries. I said ironically that if that were true, we would all be programming in assembly language (that is, using the syntax of assembly language), but with lots of libraries. This is clearly not what we are doing, not even when programming in C. You and John Shutt affirmed that on the contrary, we are doing just that; he doesn't say why, but you say that our programming languages can be construed as syntax extensions (i.e. macros) on top of assembly language.

This seems to me to miss my point (or, in the spirit of irony, to make my point for me), and furthermore to be untrue: macros extend Lisp, but not to the extent that it no longer looks like Lisp, whereas Lisp or any other HLL looks nothing like assembly language. Granted, Racket has an Algol 60 mode, but this no more means that Algol 60 is a species of Lisp than the existence of the Stalin, Gambit, and Chicken compilers, which compile Scheme to C, means that Scheme is a species of C.

The central confusion here

The central confusion here, I believe, is about what the difference between the "library" approach and the "DSL" approach actually is, if there is a difference at all. Sean McDirmid had earlier suggested that libraries are the more flexible approach, whereas a DSL is monolithic, which I could see being a useful distinction, although it seems to me a DSL would still be the result of selecting some libraries. François Rouaix mentioned syntax extension as one of the techniques that may be within bounds for a library, which really throws the doors open for all sorts of stuff to count as libraries, if 'syntax extension' is meant as generally as I think of it. They also apparently reckoned that a library is explicitly invoked, making it a technique for software engineers rather than for domain experts (because the invocation is a programming concern rather than a domain concern).

You (johnwcowan) then made, as I understood it, the point that even software engineers sometimes start over with a new language rather than do everything with libraries, or we'd still be working with libraries in assembly language. I then attempted to drop a small note of humor/irony to the effect that the difference between libraries and DSLs is superficial: when you get down to it, the distinction between a high-level language and an assembly-language library involving syntax mutation is just a matter of what you call it. Perhaps my attempt was too laconic.

But I do think the distinction between libraries and DSLs is largely an illusion. In Lisp, where there is traditionally no syntax at all, it's arguable that there is no library/DSL distinction, and if libraries are then allowed to provide syntax extensions, it's hard to make the distinction even in non-Lisp languages.

No syntax?

But John, Lisp has syntax.

Not very much of it, it's true. But it has some.

The only portable programming language I'm aware of with less syntax than Lisp is Forth.

Yeah, no syntax.

I've discussed this a time or two on my blog, I'm sure. The traditional characterization of Lisp as having 'no syntax' is actually true if understood to mean 'no syntax for programs'. Because the syntax of Lisp, when properly understood, only describes data structures. There is no way of writing any sort of control structure at all; you can only write a data structure that you expect will be evaluated in a known environment. This is, of course, why the whole "theory of fexprs is trivial" thing doesn't really make any sense: the alleged trivialization of theory is a trivialization of syntactic theory, and fexprs don't have any syntactic theory —trivial or otherwise— because they can't be represented by syntax.
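To make the "programs are only data" point concrete, here is a minimal Python sketch using nested lists in place of S-expressions. The evaluator, its operator table, and the encoding (strings stand for symbols; there are no string literals) are assumptions for illustration, not any particular Lisp. The nested list has no meaning until evaluate interprets it in an environment, and quote simply hands back the very same data structure.

```python
from typing import Any, Dict

Env = Dict[str, Any]

# Hypothetical primitive operators for this sketch.
PRIMITIVES = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    ">": lambda a, b: a > b,
}


def evaluate(expr: Any, env: Env) -> Any:
    """Give meaning to a plain data structure, relative to an environment."""
    if isinstance(expr, str):          # a symbol: look it up in the environment
        return env[expr]
    if not isinstance(expr, list):     # numbers, booleans: self-evaluating
        return expr
    head, *args = expr
    if head == "quote":                # (quote datum): return the datum unevaluated
        return args[0]
    if head == "if":                   # (if test then else): evaluate only one branch
        test, then, alt = args
        return evaluate(then if evaluate(test, env) else alt, env)
    op = PRIMITIVES[head]              # otherwise: apply a primitive to evaluated args
    return op(*(evaluate(a, env) for a in args))


# The "program" is just a nested list; by itself it is not executable.
absolute = ["if", [">", "x", 0], "x", ["-", 0, "x"]]
print(evaluate(absolute, {"x": -3}))                  # 3
print(evaluate(absolute, {"x": 7}))                   # 7
print(evaluate(["quote", absolute], {}) is absolute)  # True
```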

Dragging myself back to the point,

Dragging myself back to the point, though, in this case when I mentioned Lisp having no syntax at all, my intent was to point out that when the issue of syntax extension is factored out of the discussion, the library/DSL distinction arguably vanishes.

Lisp has a rich syntax

I see Lisp through a Scheme lens, from which perspective it has as much syntax as any programming language. Its lexical syntax is fairly sparse, it's true, although if you look at the Scheme lexical syntax for numeric literals, you'll find it's much richer than that of any other language I know of: for example #e4.9+4.5i, which can also be written 49/10+9/2i (these are literals, not infix expressions).
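The equivalence of the two literals is plain exact arithmetic: 4.9 = 49/10 and 4.5 = 9/2. Python has no exact complex type, so this sketch reads the two parts of #e4.9+4.5i separately with fractions.Fraction, whose string constructor, like Scheme's #e prefix, yields the exact value of the decimal notation. It is an analogy to the Scheme reader, not an implementation of it.

```python
from fractions import Fraction

# Exact reading of the decimal digits, analogous to Scheme's #e prefix.
real_part = Fraction("4.9")
imag_part = Fraction("4.5")
print(real_part, imag_part)  # 49/10 9/2

# Without exactness, 4.9 becomes the nearest binary double, which is not 49/10.
print(Fraction(4.9) == Fraction(49, 10))  # False
print(Fraction(4.5) == Fraction(9, 2))    # True: 4.5 is exactly representable
```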

But comparing the parentheses of Lisps with the rich syntax of a C or a Haskell is to compare apples with oranges. The syntax of two-armed conditionals in C is if (expr) stmt else stmt, and in Lisp it's (if expr expr expr). That's where the true parallelism lies. Common Lisp, following Lisp tradition, obfuscates the issue by calling its lexical syntax "syntax" and its true syntax "special forms", but Scheme calls them by their right names.

What, then, is syntax extension? It is macros, which both Common Lisp and Scheme have. Typical implementations have hundreds of macros already available, and provide at least one way to write more. If all of that were spelled out in BNF instead of being mingled with the standard procedures, we'd see just how much syntax Common Lisp actually has: no other programming language standard comes close. By my count there are 119 syntax keywords in the standard, of which 93 must be implemented as macros and the remaining 26 may be either primitive or implemented as macros.

Beyond macros

I think of syntax extension in a much broader context than macros, probably shaped by my background knowledge of the historical extensible languages movement, potentially encompassing adaptive grammars and not necessarily excluding non-monotonic adaptations. Standish's 1975 post-mortem of the extensible languages movement identified (as I understood it) macros as the reason the movement failed: macros don't hide any of what came before, so everything just keeps getting more and more complicated, and you can't take the language very far from where it started that way.

Folks,

Have you ever heard of DSELs (domain-specific embedded languages)? The real question is how to do "call by intention" well when integrating languages.
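A minimal sketch of a domain-specific embedded language in Python: host operators are overloaded so that ordinary expressions build domain terms instead of computing booleans immediately. Field, Pred, and adults_in are hypothetical names; the rendered text is SQL-flavored but does no quoting or escaping, so this is not a safe query generator. The point is that the "language" is just a library, and the host's variables and functions compose its terms for free.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class Pred:
    """A predicate term of the embedded language: it can be rendered or run."""

    def __init__(self, text: str, test: Callable[[Row], bool]) -> None:
        self.text = text
        self.test = test

    def __and__(self, other: "Pred") -> "Pred":
        return Pred(f"({self.text} AND {other.text})",
                    lambda row: self.test(row) and other.test(row))

    def __or__(self, other: "Pred") -> "Pred":
        return Pred(f"({self.text} OR {other.text})",
                    lambda row: self.test(row) or other.test(row))


class Field:
    """A column reference; comparison operators build Pred terms, not booleans."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __gt__(self, value: Any) -> Pred:
        return Pred(f"{self.name} > {value!r}", lambda row: row[self.name] > value)

    def __eq__(self, value: Any) -> Pred:  # type: ignore[override]
        return Pred(f"{self.name} = {value!r}", lambda row: row[self.name] == value)


# An ordinary host-language function acts as a reusable DSL fragment.
def adults_in(city: str) -> Pred:
    # Parentheses matter: & binds tighter than > and == in Python.
    return (Field("age") > 17) & (Field("city") == city)


query = adults_in("Oslo")
rows: List[Row] = [
    {"age": 34, "city": "Oslo"},
    {"age": 12, "city": "Oslo"},
    {"age": 40, "city": "Bergen"},
]
print(query.text)                                 # (age > 17 AND city = 'Oslo')
print([r["age"] for r in rows if query.test(r)])  # [34]
```

The same term is both inspectable (its text) and executable (its test), which is the property that makes an embedded DSL feel like a language while still composing like a library.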

Another good idea --

Finally tagless interpreters.

I am wondering whether some canonical references for these two ideas (DSELs and finally tagless interpreters) should be put in the "Design Docs" or "Research Papers" section of the site. It's amazing that nobody has brought these up as options so far in this thread.
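Finally tagless interpreters are usually presented in Haskell or OCaml, using type classes or modules so that the type checker rejects ill-typed object terms. The following is only a Python approximation under that caveat: a term is written once against an abstract interface and takes the interpreter as a parameter, so the same term can be evaluated, pretty-printed, or analyzed by swapping interpreters, with no tagged AST in between. ExprSym, Evaluate, Show, CountLits, and term are hypothetical names for illustration.

```python
from typing import Protocol, TypeVar

R = TypeVar("R")


class ExprSym(Protocol[R]):
    """The syntax of a tiny language, as an interface over results of type R."""
    def lit(self, n: int) -> R: ...
    def neg(self, x: R) -> R: ...
    def add(self, x: R, y: R) -> R: ...


class Evaluate:
    """Interpreter 1: compute the value."""
    def lit(self, n: int) -> int: return n
    def neg(self, x: int) -> int: return -x
    def add(self, x: int, y: int) -> int: return x + y


class Show:
    """Interpreter 2: pretty-print the term."""
    def lit(self, n: int) -> str: return str(n)
    def neg(self, x: str) -> str: return f"(-{x})"
    def add(self, x: str, y: str) -> str: return f"({x} + {y})"


class CountLits:
    """Interpreter 3, added later without touching `term`: count literals."""
    def lit(self, n: int) -> int: return 1
    def neg(self, x: int) -> int: return x
    def add(self, x: int, y: int) -> int: return x + y


def term(sym: ExprSym[R]) -> R:
    # 8 + (-(1 + 2)), written once against the interface; no AST is built.
    return sym.add(sym.lit(8), sym.neg(sym.add(sym.lit(1), sym.lit(2))))


print(term(Evaluate()))   # 5
print(term(Show()))       # (8 + (-(1 + 2)))
print(term(CountLits()))  # 3
```

Adding a new interpretation is just another class; in this untyped setting, though, nothing stops a term from mixing interpreters, which is exactly the guarantee the typed encodings provide.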

Probably should...

Probably should...