DSLs: Embedded, standalone, or both?

I've a question for the group concerning domain-specific languages. One criterion by which DSLs can be categorized is whether they are standalone or embedded (my terms; I don't know if there are better or more precise terms for the concepts in the literature). A standalone DSL is one which contains little or no trace of the language used to implement it (and indeed, implementations of the DSL may be written in any decent general-purpose language, with similar effort). When general-purpose computing constructs (flow control, recursion, arithmetic, error handling, type processing) are needed, they must be added as part of the DSL.

HTML, to the extent that it is a programming language, would be one example.
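To make the standalone case concrete, here is a minimal sketch of a made-up three-command language (the commands `set`, `add`, and `print`, and the function name `run_program`, are invented for illustration). Nothing of the implementation language shows through to the DSL user, and even simple arithmetic exists only because the implementer added an `add` command.

```python
def run_program(program):
    """Interpret a tiny standalone language, one command per line.

    The language is hypothetical; it only knows `set`, `add`, and `print`.
    """
    env, output = {}, []

    def value(token):
        # A token is either a known variable name or an integer literal.
        return env[token] if token in env else int(token)

    for line in program.strip().splitlines():
        op, *args = line.split()
        if op == "set":
            env[args[0]] = value(args[1])
        elif op == "add":
            # Arithmetic is not inherited from Python; it had to be built in.
            env[args[0]] = env[args[0]] + value(args[1])
        elif op == "print":
            output.append(value(args[0]))
        else:
            raise SyntaxError(f"unknown command: {op}")
    return output


program = """
set x 3
add x 4
print x
"""
result = run_program(program)
print(result)  # [7]
```

Anything beyond these commands (loops, functions, error handling) would need the same treatment, which is where the cost of the standalone approach starts to show.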

An embedded DSL is one which extends, in some fashion, an existing general-purpose language which I shall refer to as the host language (one which is typically the language of implementation); code written in the host language, or a subset thereof, is also valid in the DSL. Such languages may be implemented via macros or similar "compile-time" metaprogramming means, by standalone programs which translate the DSL into the host language, or by interpreters which include the ability to evaluate statements in the host language (possibly relying on an eval() facility present in the host language itself). Generally, such DSLs are tightly coupled to their host, and writing code in the DSL will often require knowing the host language. Writing the DSL implementation in a language other than the host may be difficult, especially if the DSL is interpreted. (Writing a DSL compiler/translator is more easily done in a "third" language.)

Yacc is one example; while the part of the language for describing productions themselves isn't dependent on any other language, semantic actions pretty much have to be written in C/C++. If you want to write a LALR(1) parser in some other language, you're better off with a different tool. The universe is replete with experimental languages which compile into Java, and which typically allow Java to be embedded within. (Java generics started off this way, although adding decent polymorphism to Java is hardly domain-specific).
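A rough sketch of this kind of coupling, using Python as the host rather than C: the rule format below (`pattern => action`) and the names `compile_rules` and `apply_rules` are invented for illustration and are not how Yacc itself works. The pattern half is host-independent, but the action half is plain Python handed to `eval()`, so rule authors must know the host language.

```python
import re

# Left of "=>" is a regular expression; right of it is a Python expression.
# `g` (an assumed name) holds the groups captured by the pattern.
RULES = r"""
(\d+)\s*\+\s*(\d+)  =>  int(g[0]) + int(g[1])
(\d+)\s*\*\s*(\d+)  =>  int(g[0]) * int(g[1])
"""


def compile_rules(text):
    """Split each rule line into a compiled pattern and its action source."""
    rules = []
    for line in text.strip().splitlines():
        pattern, action = line.split("=>", 1)
        rules.append((re.compile(pattern.strip()), action.strip()))
    return rules


def apply_rules(rules, source):
    """Run the action of the first rule whose pattern matches all of `source`."""
    for pattern, action in rules:
        match = pattern.fullmatch(source)
        if match:
            # The semantic action is evaluated by the host language itself.
            return eval(action, {}, {"g": match.groups()})
    raise ValueError(f"no rule matches: {source!r}")


rules = compile_rules(RULES)
print(apply_rules(rules, "2 + 3"))  # 5
print(apply_rules(rules, "4*5"))    # 20
```

Porting this tool to another host would mean rewriting every action, which is the same reason a Yacc grammar with C actions does not move easily to another language.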

The question is: what are the design trade-offs between the two approaches? The embedded approach seems to be easier, especially in a language with decent metaprogramming. The standalone approach seems better for a language which is intended to be used by non-programmers, or which needs to spread beyond a single project, organization, or language community. But when general-purpose programming facilities are needed, that wheel is often implemented poorly. (And in the degenerate case, the result is not really a DSL but yet another general-purpose language; quite a few general-purpose languages started off as DSLs but became generalized.)

Terminology?

I actually tend to think that "DSL" can be read as "scripting language".

I am not sure of the terminology. I would believe that only DSL implementations are standalone or embedded. An embedded implementation (e.g. Lua) is usually a library to be linked into some bigger application. A standalone implementation is usually an executable (e.g. Perl).

Not the common terminology

That's not how "DSL" is normally meant. DSL (domain specific language) is a language whose syntax and semantics are optimized for some target domain. Think of DSL as being opposite to general purpose language. A DSL may be so optimized for a specific domain that it isn't even usable for generalized work or it may just be awkward to do anything not immediately tied to the domain. Obviously it can be a matter of opinion whether a language is a DSL or not.

Anyway, by this definition Lua isn't usually called a DSL. It's a general-purpose scripting language. Arguably Perl was once a DSL for doing string and text file manipulation, but it's certainly not used that way now (conjecture: any Turing-complete DSL will be used as a general-purpose language whether it should be or not). Few would argue that HTML or SQL aren't DSLs though.

"Embedded DSL" is commonly used to mean a set of libraries and macros and such that are used to make it look like there's a DSL built into a host language. The line between embedded DSL and library is rather blurry - one could say that all libraries create embedded DSLs, its just that some DSLs are more awkward than others. There's a more expansive definition beyond libraries and macros which I'll talk about later.

Scott also includes things like Yacc which are compiled down to a host language but which expose their host language. I think most people would say that Yacc isn't embedded since you can't just include a few headers, type Yacc definitions into main.c and compile it. Nor can you write any arbitrary C program in a Yacc file. But he makes a good point about its dependency on other general purpose C code to get anything useful done. Maybe things like Yacc need another name (Dependent DSL?).

Scott finally mentions compilers and interpreters that create extended languages which include a host language as a subset. This is a more expansive definition of embedded DSL than the library/macro one. As an example, many C compilers extend the C definition to include a bit of assembly as an embedded DSL. This definition is a bit problematic, though. If carried too far, it would mean that C++ is (almost) an embedded DSL in C.

Embedded DSL

To make searching easier: Embedded DSLs are often referred to as DSELs.

A wider definition

I think that by the term "Embedded DSL" Scott did not mean a language that allows code to be embedded in a host language code. I think he targeted a wider definition, more related to the nature of the language, as one that does not stand on its own merits, and captures some general-purpose features from some "host" language.
I would make one comment though: in my opinion there is no necessary connection between the "host" language and the language in which the DSL was implemented. For example, Yacc has several clones that produce output in languages other than C (and use these languages for the embedded action code). Same with ANTLR. ASP has HTML embedded with any Windows scripting language, and it's obvious it's not implemented in any of them (though we can't know for sure because it's all closed source :-) ).

Terminology: Internal vs External

I've heard DSLs described as Internal and External. Martin Fowler describes them pretty well here:
External DSL and Internal DSL