Terminology proposal

From a comment:
(Likewise it is ridiculous to say Lisp has no syntax--it certainly does; just a rather simple one).

I think even this is too much. Lisp's true syntax is anything but simple, and of course in fact it varies from Lisp to Lisp. I propose that we use the phrase latent syntax to describe the Lisp approach to syntax, by analogy with latent types.

The idea, of course, is that Lisp has a very trivial syntactic structure at one level, but there's an entire world of syntactic constraints that are latent in the surface syntax. These constraints are clearly syntax rather than semantics, and the programmer (and many analysis tools) must be aware of them and manage them. Note all the things in R5RS that are labeled as "syntax" or "derived syntax"...

Does this distinction make sense? I really like the phrase latent syntax to describe this, but maybe there's something better? There's abstract syntax, of course, but I'm not sure that's really the same thing. For instance, C and Java have concrete and abstract syntax, but neither of them really has latent syntax. I'm not sure what the opposite would be... manifest syntax, maybe?

Obviously this applies to XML as well. Is there a standard terminology in that community? "Schema" isn't particularly useful...

:-)

This is certainly not abstract syntax, since that term means something else entirely.

I kinda like "latent syntax". Nice one!

Preliminaries

A common beginner's question on comp.lang.lisp is "What characters can I use in variable names?" and the technically correct answer is that CL uses symbols as variable names, not strings of characters, so the question does not arise.

CL uses symbols as the names for variables and functions. What does it use for the names of symbols? Strings of characters. So the question comes back again "what characters can I use in symbol names" and the technically correct answer is that you can use any characters in symbol names.

That is hugely unhelpful. Our beginner is trying to prepare a flat file of characters that he intends to submit to LOAD or COMPILE-FILE. Both of these read their input files with READ, and READ watches for a variety of characters (parentheses, quote, full stop, space, backquote, comma, and so on) and gives them special meanings. In the first lesson our beginner is told not to use those characters in variable names. In the second lesson he learns that he can use them provided he protects them from READ with suitable quotation. In the third lesson he learns that they were not variable names at all: CL uses a two-step process to name its variables. First strings of characters name symbols, then symbols name variables.
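The first three lessons can be tried at the REPL. The sketch below is only an illustration: the variable names *odd* and *read-back* are made up for this example, and it assumes the standard readtable and that both forms run in the same package.

```lisp
;; Lesson three: a string of characters names a symbol. INTERN accepts
;; any string, including spaces and parentheses.
(defparameter *odd* (intern "two words (and parens)"))

;; The symbol can then name a variable like any other symbol.
(setf (symbol-value *odd*) 42)

;; Lesson two: in a flat file the same symbol must be protected from
;; READ, here with vertical bars (multiple escape characters).
(defparameter *read-back*
  (read-from-string "|two words (and parens)|"))

;; Both routes arrive at the same symbol in the current package.
(eq *odd* *read-back*)
```

The restriction on characters belongs to READ and the flat file, not to the symbol itself.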

In the fourth lesson our beginner learns that CL doesn't use a two step process for naming variables. CL source is an in-core data structure.

What does that mean? Why does it matter? Consider trying to illustrate differentiation with CL.

Create a variable name:

(defvar var-name (gensym))

Create a squaring form:

(defvar form `(* ,var-name ,var-name))

Create a form to define a function:

(defvar func-def `(defun square (,var-name) ,form))

Create a function suitable for use with a numerical differentiation routine:

(eval func-def)
(square 7) => 49

The point of this exercise is to differentiate the squaring form symbolically and demonstrate that evaluating the derivative yields the same result as numerical differentiation of the original function. So it is very natural to want to use the squaring form from the definition of the original function. We have the definition of the original function stored in the variable func-def. It is

(DEFUN SQUARE (#:G1538) (* #:G1538 #:G1538))

or something like it. It depends on how your system prints out uninterned symbols. An informative exercise is to look at it with the display of structure sharing enabled.

CL-USER> (setf *print-circle* t) => T
CL-USER> func-def => (DEFUN SQUARE (#1=#:G1538) (* #1# #1#))

It is the same uninterned symbol every time. If we want to submit the form to a differentiation routine we need the form and the variable. We can extract the form with (fourth func-def) and the variable with (car (third func-def)).
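To make the exercise concrete, here is a minimal sketch of the comparison. The differentiator DERIV and the helper NUMERICAL-DERIVATIVE are hypothetical, written only for this illustration; DERIV handles just numbers, symbols, and two-argument + and *. The starred variable names are also assumptions, chosen so the block stands on its own.

```lisp
;; Rebuild the definition from the text so this block is self-contained.
(defparameter *var* (gensym))
(defparameter *func-def* `(defun square (,*var*) (* ,*var* ,*var*)))
(eval *func-def*)

(defun deriv (form var)
  "Differentiate FORM with respect to the symbol VAR (toy version)."
  (cond ((numberp form) 0)
        ((symbolp form) (if (eq form var) 1 0))
        ((eq (first form) '+)
         `(+ ,(deriv (second form) var) ,(deriv (third form) var)))
        ((eq (first form) '*)            ; product rule
         `(+ (* ,(deriv (second form) var) ,(third form))
             (* ,(second form) ,(deriv (third form) var))))
        (t (error "Cannot differentiate ~S" form))))

;; Pull the body and the variable out of the in-core definition.
(defparameter *body* (fourth *func-def*))
(defparameter *arg* (car (third *func-def*)))

;; Differentiate symbolically and wrap the result in a function.
(defparameter *derivative-fn*
  (eval `(lambda (,*arg*) ,(deriv *body* *arg*))))

(defun numerical-derivative (f x &optional (h 1d-6))
  "Central difference approximation of the derivative of F at X."
  (/ (- (funcall f (+ x h)) (funcall f (- x h))) (* 2 h)))

(funcall *derivative-fn* 7)          ; exactly 14
(numerical-derivative 'square 7)     ; approximately 14
```

Because the uninterned symbol is compared with EQ, the differentiator only works when it is handed the very same symbol object that appears in the form, which is why the variable is extracted from func-def rather than typed in again.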

With these preliminaries out of the way, we can go back to the original quote

Likewise it is ridiculous to say Lisp has no syntax--it certainly does; just a rather simple one.
We have a bit of CL source code stored in func-def. We evaluated it to define a function; we extracted the arithmetic form to pass on for symbolic differentiation. What can we say about its syntax? For example, which characters were we allowed to use in the variable name?

We say that the macros we define with defmacro perform source to source transformations. It is hard to see what else we could say, but the macrofunctions that we create with defmacro operate on in-core data structures. If we say that macros do source to source transformations we are saying that CL source is an in-core data structure, and the flat files of characters that we prepare with our text editors are not source files. Whoops.
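A small sketch shows a macro function at work on a form that never existed as text. The macro TWICE and the starred variables are hypothetical names invented for this example; MACROEXPAND-1 and MACRO-FUNCTION are standard.

```lisp
;; A trivial macro: evaluate FORM two times.
(defmacro twice (form)
  `(progn ,form ,form))

;; Build a call to TWICE with LIST and GENSYM. No characters were
;; read to produce this form; it only ever exists in core.
(defparameter *counter-var* (gensym))
(defparameter *call* (list 'twice (list 'incf *counter-var*)))

;; Expand it. The result has the shape (PROGN (INCF g) (INCF g)),
;; where g is the uninterned symbol created above.
(defparameter *expansion* (macroexpand-1 *call*))

;; The macro function itself is an ordinary function from a form
;; (plus an environment) to a form.
(defparameter *expander* (macro-function 'twice))
(funcall *expander* *call* nil)
```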

Sometimes you will see the word "cello" written "'cello", as an acknowledgement that it is an abbreviation for violoncello: the 'cello belongs to the violin family of instruments. Perhaps the first bit of terminology we need is to start writing 'compiler as an acknowledgement that 'compiler is an abbreviation for parser-compiler. Then we can start talking sensibly about CL. While most languages offer you a unitary parser-compiler, CL offers you a parser, READ, a compiler, COMPILE, and a parser-compiler, COMPILE-FILE, which has a hook, DEFMACRO, to let you perform transformations between parsing and compiling.
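The separation of parser and compiler can be seen directly. In this sketch the starred variable names are made up for the illustration; READ-FROM-STRING stands in for READ on a file.

```lisp
;; Parsing: characters in, list structure out.
(defparameter *text* "(lambda (x) (* x x))")
(defparameter *parsed* (read-from-string *text*))

;; Compiling: list structure in, function out. COMPILE never sees
;; the characters.
(defparameter *compiled* (compile nil *parsed*))
(funcall *compiled* 7)                 ; 49

;; The parser can be skipped entirely by building the list by hand.
(defparameter *hand-built*
  (compile nil (list 'lambda (list 'y) (list '* 'y 'y))))
(funcall *hand-built* 7)               ; 49
```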

I don't know what the standard computer science terminology is for the situation with CL. I hope that someone will post and enlighten me. There are several ideas requiring technical terms. Perhaps the format of the files submitted to compile-file could be called syntax, but there are constraints on the in-core data structures that are used as 'source'. These constraints also have a claim on the word 'syntax'. Further, these constraints apply both to the in-core data structure and the flat file of characters, if any, from which the in-core data structure might have been derived. Thus the technical term needs to specialise in two ways to talk about its two sub-cases.
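As a rough illustration of the second kind of constraint, here is a toy checker for LET forms. BINDING-P and WELL-FORMED-LET-P are hypothetical functions written for this sketch; they are a simplification, not the rule any implementation actually enforces, and they assume the binding list is a proper list.

```lisp
(defun binding-p (b)
  "True if B looks like a LET binding: VAR, (VAR) or (VAR INIT)."
  (or (symbolp b)
      (and (consp b)
           (symbolp (first b))
           (listp (rest b))
           (null (cddr b)))))

(defun well-formed-let-p (form)
  "True if FORM is a list starting with LET whose second element is
a list of bindings."
  (and (consp form)
       (eq (first form) 'let)
       (consp (rest form))
       (listp (second form))
       (every #'binding-p (second form))))

;; READ accepts both texts: each is a perfectly good list.
(well-formed-let-p (read-from-string "(let ((x 1)) x)"))   ; T
(well-formed-let-p (read-from-string "(let (x 1) x)"))     ; NIL

;; The same constraint applies to a form that was never text.
(well-formed-let-p (list 'let (list (list 'y 2)) 'y))      ; T
```

The second text fails because 1 is not a binding, even though the reader has no complaint about it.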

exactly!

There are several ideas requiring technical terms. Perhaps the format of the files submitted to compile-file could be called syntax, but there are constraints on the in-core data structures that are used as 'source'. These constraints also have a claim on the word 'syntax'. Further, these constraints apply both to the in-core data structure and the flat file of characters, if any, from which the in-core data structure might have been derived.

Yep, that's exactly what I meant. The second set of constraints ("on the in-core data structures...") does indeed have a claim on the word syntax, and that set of constraints is exactly what I'm calling "latent syntax"... It's "latent" in the source text (if any) due precisely to the bi-level structure that you describe.

It's the "format of the files submitted to compile-file" (usually, although of course not always, S-expressions) that I'm calling "manifest syntax".

Note that the concrete operational metaphors that you use to describe the situation in CL are only one way to describe the situation. The more formal exposition used, e.g., by R5RS is another.