using words and notation in domain specific language syntax

I face the following dilemma:
  • Keywords are easier to remember, but they can cause confusion when people take their meaning from plain English. Also, such a language might be wordy, like Cobol.
  • Symbolic identifiers make expressions shorter, but they can become cryptic, and this defeats the whole purpose of a domain specific language, namely, that a domain expert with less experience in a general programming language can read it without effort.
I would like to hear your opinion on this, with literature pointers when possible.


ask the ppig?

The Psychology of Programming Interest Group (PPIG) might have something, though a quick Google search didn't give me good hits.

personally i think there are enough different kinds of minds in the world that if you picked one extreme or the other you'd be reaching only a subset of all programmers. so then you have to choose between (a) giving each concept either a keyword or a symbol, never both, and (b) allowing some concepts to have both a keyword and a symbol as aliases. java sucks! haskell sucks!
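Option (b) can be sketched as a lookup table in which a keyword and a symbol both resolve to the same operation. This is only an illustration: the ALIASES table, the apply_op helper, and the particular keyword/symbol pairs are hypothetical and not taken from any existing DSL.

```python
import operator

# Hypothetical alias table: each concept has a keyword and a symbolic
# alias, and both spellings map to exactly the same operation.
ALIASES = {
    "plus": operator.add, "+": operator.add,
    "times": operator.mul, "*": operator.mul,
}

def apply_op(token, left, right):
    """Look up a binary operator by keyword or by symbol and apply it."""
    try:
        return ALIASES[token](left, right)
    except KeyError:
        raise ValueError(f"unknown operator: {token!r}") from None

# Both spellings of the same concept give the same result.
result_keyword = apply_op("plus", 2, 3)
result_symbol = apply_op("+", 2, 3)
```

The cost of this design is that readers have to learn two spellings per concept, which is the trade-off between (a) and (b).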

Core vocabulary

Fwiw, most of the vocabulary of a language benefits from being predictable (regular), but the core vocabulary, the most important and often-used words, can and arguably should use irregularity to allow quick, convenient expression. An irregularity in an obscure part of a language is something to ridicule; an irregularity in the central core of the language is usual. Lisp has lots of highly regular vocabulary, but at the heart of the language are "car", "cdr", and "lambda". Natural languages follow the same pattern, with irregularities most frequent in the most commonly used parts of the language (like the verbs "be" and "do" in English). This is partly because nobody wants to dedicate real estate in their wetware to memorizing irregularities for something that's rarely used; but also, irregularities often provide redundancy, reducing transmission errors, which is particularly important with the most important/common words. There's also, I think, a sort of orienting, where the irregularities in a language tell you "pay attention! this is an important part of the language; you can tell because it's irregular!". Rather like the way a qwerty keyboard has bumps on the letters F and J, which are there so a touch-typist can feel, without looking, that their fingers are in the right places.

the whole purpose of a DSL

While some DSLs might aim to ease understanding for non-programmer experts, there are many other valid purposes/goals for various languages. Consider Regular Expressions, a DSL that makes no attempt to ease understanding, but rather aims to abbreviate the construction of little state machines for operating on strings. I'm a big fan of s-exprs and ascii symbols in prefix notation, but for, *ahem*, specific domains, infix operators, obscure symbols, and other notational conveniences are often appropriate. For me, the important thing is that there exists a notation cheatsheet that includes pronounceable names.
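As a rough illustration of the cheatsheet idea, here is the same regular expression written tersely and then with each symbol annotated by a name you can say aloud, using Python's re.VERBOSE flag. The CHEATSHEET table and the decimal-number pattern are made up for this sketch.

```python
import re

# Hypothetical cheatsheet: each symbol paired with a pronounceable name.
CHEATSHEET = {
    "^": "start of string",
    "$": "end of string",
    r"\d": "digit",
    "+": "one or more",
    "?": "optional",
    "( )": "group",
}

# Terse form: optional minus sign, digits, optional fractional part.
TERSE = re.compile(r"^-?\d+(\.\d+)?$")

# The same pattern under re.VERBOSE, where whitespace is ignored and
# each piece can carry its spoken name as a comment.
VERBOSE = re.compile(r"""
    ^           # start of string
    -?          # optional minus sign
    \d+         # one or more digits
    (\.\d+)?    # optional group: a dot, then one or more digits
    $           # end of string
""", re.VERBOSE)

def is_decimal(text):
    """Return True if text looks like a plain decimal number."""
    return bool(TERSE.match(text))
```

The terse form is what an experienced user types; the annotated form is closer to what a newcomer needs to read it.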

This is an interesting

This is an interesting topic, and reasonable people have different opinions. Here are my opinions.

The most common operations and functions are better as operators, especially if they are commonly known, like the basic operators from mathematics. So using + * ** and so on is good (and they should have the usual precedence rules of mathematics, so Forth, Lisp/Scheme dialects, and others like Smalltalk are not natural and require training to be used).

For operations that are semantically similar to the basic operators, it's still good to overload the operators. This sometimes causes a little confusion, like using * to multiply two matrices element-wise instead of performing the usual matrix multiplication. But overall it's an acceptable use, and it avoids the unreadable code with vectors, arrays, bigints, rational numbers, etc., that you see in languages like Java and Go.
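A small sketch of the readability difference, using Python's standard fractions.Fraction for the overloaded version. The Rational wrapper class is hypothetical and exists only to imitate a method-only API of the kind found in languages without operator overloading.

```python
from fractions import Fraction

a, b, c = Fraction(1, 3), Fraction(1, 6), Fraction(2)

# With overloaded operators the code reads like the mathematics,
# and * binds tighter than + as usual.
with_operators = a + b * c

class Rational:
    """Hypothetical method-only wrapper, imitating an API without overloading."""

    def __init__(self, value):
        self.value = Fraction(value)

    def add(self, other):
        return Rational(self.value + other.value)

    def multiply(self, other):
        return Rational(self.value * other.value)

# Same computation; precedence has to be spelled out by nesting calls.
with_methods = Rational(a).add(Rational(b).multiply(Rational(c)))
```

Both versions compute 1/3 + (1/6 * 2); only the notation differs.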

For the other cases you can sometimes justify operators for a very common and vaguely standard DSL (regular expressions, C or Lisp format strings, and a few more).

In some cases if you write very specialized code, like some highly mathematical code in Agda/Coq/Haskell/Mathematica, I think it's OK to use some newly defined infix operators, because the code is often so hard to write and to understand that the increased complexity given by some operator symbols is not significant.

But in most other cases, for most code, a keyword or function name is better: it's simpler to look up with a search engine or in the documentation, and it's often a little self-documenting. Zipf's law also reminds us to use shorter names for the most commonly used functions, and longer names for the less commonly used ones.

Haskell code sometimes contains too many cryptic operators (Haskell's Parsec or Diagrams come to mind). For more ordinary programmers it's better to use function names (even when used infix). Writing code is mostly a work of engineering, and in engineering you usually have to accept several trade-offs.