Is There a Standard Formalism for Describing Abstract Syntax Trees?

I'm in a position where I need to describe an algorithm that operates over an abstract syntax tree. In this sense, the language would be that of a simple 'while' language and I'm only interested in statement level detail. An example of what I'm looking for would be a good, mathematically precise description of implementing a control flow graph. Does anyone know of any articles, papers, books with such a discussion? An example:

Program:

A = 0
while A != 10
  if A == 5 then
    print "hello, 5!"
  else
    print "waiting for a 5..."
  endif
  A = A + 1
endwhile

The obvious AST:

[stamentlist]
 +[assignment]
 +[while]     
   +[statementlist]
     +[if]
       +[statementlist]
         +[print]
       +[statementlist]
         +[print]
     +[assignment]


Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

AST are not well defined (and not really abstract)

The point is, that abstract syntax trees are not well defined, and are not really abstract. One could defines ASTs as the internal representation[s] inside compilers.

In practice, AST are not really abstract (at least in the compiler I looked inside: ocaml, gcc). One of the reason is that they should not only convey the abstract syntax (i.e. the idealized tree, as implicitly understood in e.g. BNF grammars) but also concrete information which matters a lot in practice, in particular source location and names. Both are practically required because they are very useful: to give back information to the compiler user (a compiler which reports error without giving locations in the source code is useless; and debugging information, and even object code, need to keep some of the concrete information (e.g. location in source code and external names of routines).

By the way, one could view Lisp s-expressions as a minimal formalism for abstract syntax trees.

Regards.

ML data types

I would think that typically one would use ML (or Haskell) data types to define the abstract syntax tree. For example, the limited example above would be something like:

datatype term = TermList of term list
              | If of term * term
              | While of term * term
              | Assignment of term * term
              | Print of term

Or use a final, tagless

Good Point

Good point about AST's not being well defined.

To clarify, I'm mainly looking for an example of an algorithm that operates over an AST (such as creating a CFG) and how it's specified in the literature.