Lambda the Ultimate

Hidden complexities of tail-call/tail-recursion optimization
started 7/11/2001; 5:08:38 AM - last post 7/12/2003; 12:17:48 AM

Ehud Lamm - Hidden complexities of tail-call/tail-recursion optimization

7/11/2001; 5:08:38 AM (reads: 3123, responses: 4)

Hidden complexities of tail-call/tail-recursion optimization
Tail call elimination and and the difference between tail calls and tail recursion are frequent topics on language related discussion groups. Last time this came up on comp.lang.fnctional, I asked Oleg to summarize the subject for LtU. Happily, he obliged. Posted to implementation by Ehud Lamm on 7/11/01; 5:24:52 AM

Ehud Lamm - Re: Hidden complexities of tail-call/tail-recursion optimization

7/11/2001; 5:18:25 AM (reads: 2066, responses: 0)

Hidden complexities of tail-call/tail-recursion optimization

In most imperative and functional languages, a function foo(x,y,z) upon entry establishes a frame for itself on stack. The frame contains the values of function arguments, saved registers, as well as "local" variables (bindings created during execution of the function). Upon the return, the frame is popped from the stack. The existence of a frame makes it easy to implement recursive functions. Each separate invocation of fact(x) in the code below

        int fact(int x) {
           if( x < 2 ) return 1;
           return x * fact(x-1);
}

has its own frame with its own copy of variable x.

Tail-call optimization applies to situations such as follows:

        foo(int x, int y)
        { ... return bar(1,y,x); }

Function foo() invokes function bar(), and upon bar's return, pops the foo's frame off the stack and returns to the foo's caller. It may be possible to discard foo's frame _before_ the invocation of bar rather than after that. That's the essence of the tail-call optimization. Function foo() will discard its own frame, prepare the parameters for bar() and enter the bar() via a jump rather than a 'call' instruction. Function bar's frame will override function's foo frame. When bar() returns, the control will directly pass to the foo's caller. If the function foo() invokes itself in a tail position, the tail-call optimization becomes a tail-recursion optimization. The obvious benefit is that recursion can execute in constant stack space.

More extensive discussion of the tail recursion -- including the reference to the formal definition, rationale with a bit of history, and the specification for a _tail context_ can be found in Sec 3.5 "Proper tail recursion" of the R5RS.

Although tail-recursion/tail-call optimization appears straightforward, there are a number of complex issues, some of them, in general, preclude such optimizations in languages like C and C++.

Evaluation of argument expressions for the tail call:

In the simplest case such as

        int fact_tr(int x, int accum)
        { if(x < 1 ) return accum;
          return fact_tr(x-1,x*accum); }

The compiler can directly over-write the existing frame: it can multiply the old value of x by the old value of accum and write the result back into the 'accum' variable. The compiler can now decrement x (overwriting the previous value) and jump to the beginning of fact_tr. Note how the compiler took advantage of the unspecified order of evaluation of argument expression. If the compiler was required to evaluate argument expressions in their lexical order, x-1 first and x*accum later, the compiler had to use a temporary location to save the value of x before decrementing x. In general, the compiler must use temporary locations to save the results of evaluation of argument expressions. Only when all argument expressions are evaluated, the compiler should overwrite the current values of function's arguments. In Scheme, the use of temporary locations for argument expression results is mandatory unless the compiler can prove that none of the argument expressions capture continuations. See #1, #2 for more discussion.

Order of invocation of constructors and destructors:

Given the following C++ code

        int objr(int x)
        { ObjA obj(x);
          if(x < 1 ) return obj.to_int();
          return objr(x-1);
        }

evaluation of objr(2) leads to the following sequence of invocation of ObjA's constructions and destructions:

ObjA() ObjA() ~ObjA() ~ObjA()

If the tail-recursion optimization is performed, the sequence will be

ObjA() ~ObjA() ObjA() ~ObjA()

The order of invocation may matter as both a constructor and a destructor may perform i/o or execute other side-effecting operations.

Unless the C++ compiler can prove that the order will not matter, the compiler must not do the tail-call optimization.

In Scheme and other functional and non-functional languages (Java), only bindings are allocated on stack while non-primitive values go on heap. When a frame is dropped and the bindings are destroyed, the values are unaffected (and will survive the collection if, e.g., another binding points to them). In C not only the bindings are allocated on stack but the values (e.g., structures, strings and other arrays) as well. Dropping a frame has more consequences then.

 
FILE * open_png(char * file_name_stem)
{
  char file_name[256];
  if( strlen(file_name_stem) + 1 + 4 >= sizeof(file_name) )
    return (FILE*)0;
  sprintf(file_name,"%s.png",file_name_stem);
  return fopen(file_name,"rb");
}

The call to fopen() is a tail-call. If open_png() destroys its own frame before passing the control to fopen(), the name of the file to open will be destroyed as well.

Thus in general, tail-call optimization in C/C++ requires global code analysis, and is possible only in exceptional cases. More discussion here and here; Lazy Semantics - The tail-call issues look different in a lazy context; capturing continuations.

Executing setjmp() prevents any tail-cal optimizations in C/C++. Indeed, if function's frame is dropped, longjmp() will have nowhere to return to. In contrast, in Scheme execution of call-with-current-continuation does not affect the tail-call optimization. A Scheme compiler is obligated to do the tail-call optimization whether continuations are captured or not.

Biep - Re: Hidden complexities of tail-call/tail-recursion optimization

7/12/2001; 5:29:23 AM (reads: 1951, responses: 0)

Thanks, Oleg!

I think one of the best discussions (in a Scheme context) is Will Clinger's article.

It defines various levels of tail recursion in space terms, which eliminates a lot of clutter about mechanisms to achieve it.

Oleg - Re: Hidden complexities of tail-call/tail-recursion optimization

7/10/2003; 11:16:01 AM (reads: 572, responses: 0)

A recent follow-up, posted here for easy reference.

Re: Implementing call/cc or Dynamic Continuations in ANSI CL
http://groups.google.com/groups?threadm=3E9F3E99.5EE41AAB%40sonic.net

The thread discusses at length the difference between tail-recursion (syntactic property), tail call elimination and proper tail recursion. The latter is a technical term, perhaps ill-worded, and not a moral judgment and definitely is not a sum of its constituent words. Of special interest is the following excerpt from the post by William D. Clinger (Apr 21, 2003):

What I object to is the hijacking of the term "proper tail recursion" to mean more (or indeed less) than the sum of its parts.
Yes. The problem is that the phrase is not compositional: the word "proper" does not modify "tail recursion". Instead, the entire phrase "proper tail recursion" refers to a property that is achieved by implementing (syntactic) tail calls in any of several systematic ways, and is not achieved by several other systematic ways or by any unsystematic approach.
I don't like that either, but it is the way Steele and Sussman used the phrase. OTOH, their usage was better than prior usage, which overloaded "tail recursion" to mean three distinct things: syntactic tail calls, a certain approach to implementing them, and the space efficiency property. That they themselves often overloaded "proper tail recursion" to mean both a certain approach to implementing them and the space efficiency property helped cause the confusion between tail call optimization and the space efficiency property. Now that we have an established literature on "tail call optimizations" that are seldom systematic enough to achieve the space efficiency property, we have arrived at the kind of three-fold terminology that, in an ideal world, we would have had from the start:

a syntactic concept (tail calls, aka tail recursion)
implementation techniques (tail call optimization, tail merging)
space efficiency property (proper tail recursion)
It is indeed unfortunate that one of the common terms for the first is a subphrase of the established term for the third. The situation is much like that surrounding the abominable parsing technique known as "recursive descent with backup", which is not a form of the excellent parsing technique known as "recursive descent". With the parsing techniques, the confusion has led to alternative terms such as "definite clause grammar (DCG) parser" for the abominable technique, and "hand-coded LL(1) parser" for the excellent technique.
Thus I prefer "tail call" to "tail recursion" for the syntactic concept. I would be willing to consider an alternative name for the space efficiency property if one can be found. I do not, however, believe that it is helpful to deny that "proper tail recursion" is the established name for that property, or to claim that Steele and Sussman hijacked it. So far as I know, they were the first to use it, and their terminology was definitely an improvement over what had come before.

The thread also mentions three main approaches to achieving the proper tail recursion when compiling into C:

whole program compilation into one C function; map source language calls into gotos. That's the best way to test your C compiler.
trampolines (pioneered by Guy L. Steele in the first Scheme->C compiler, which was actually a Scheme->MacLISP compiler (RABBIT))
Cheney on MTA, implemented in the Chicken Scheme system

See the Scheme FAQ (www.schemers.org) for pointers to the corresponding web pages.

Many real systems rely on the combination of the above approaches. For example, Gambit compiles function calls within a cluster of functions defined in one file into gotos. Gambit uses trampolines for calls between Scheme functions that reside in separate blocks or separately compiled files.

Ehud Lamm - Re: Hidden complexities of tail-call/tail-recursion optimization

7/12/2003; 12:17:48 AM (reads: 545, responses: 0)

Thanks.

We really need to put up some links about the Cheney on MTA technique.