# Lambda the Ultimate

 Hidden complexities of tail-call/tail-recursion optimization started 7/11/2001; 5:08:38 AM - last post 7/12/2003; 12:17:48 AM
 Ehud Lamm - Hidden complexities of tail-call/tail-recursion optimization   7/11/2001; 5:08:38 AM (reads: 3123, responses: 4)
 Hidden complexities of tail-call/tail-recursion optimization Tail call elimination and and the difference between tail calls and tail recursion are frequent topics on language related discussion groups. Last time this came up on comp.lang.fnctional, I asked Oleg to summarize the subject for LtU. Happily, he obliged. Posted to implementation by Ehud Lamm on 7/11/01; 5:24:52 AM

 Ehud Lamm - Re: Hidden complexities of tail-call/tail-recursion optimization   7/11/2001; 5:18:25 AM (reads: 2066, responses: 0)
 Hidden complexities of tail-call/tail-recursion optimization In most imperative and functional languages, a function foo(x,y,z) upon entry establishes a frame for itself on stack. The frame contains the values of function arguments, saved registers, as well as "local" variables (bindings created during execution of the function). Upon the return, the frame is popped from the stack. The existence of a frame makes it easy to implement recursive functions. Each separate invocation of fact(x) in the code below ``` int fact(int x) { if( x < 2 ) return 1; return x * fact(x-1); }``` has its own frame with its own copy of variable x. Tail-call optimization applies to situations such as follows: ``` foo(int x, int y) { ... return bar(1,y,x); } ``` Function foo() invokes function bar(), and upon bar's return, pops the foo's frame off the stack and returns to the foo's caller. It may be possible to discard foo's frame _before_ the invocation of bar rather than after that. That's the essence of the tail-call optimization. Function foo() will discard its own frame, prepare the parameters for bar() and enter the bar() via a jump rather than a 'call' instruction. Function bar's frame will override function's foo frame. When bar() returns, the control will directly pass to the foo's caller. If the function foo() invokes itself in a tail position, the tail-call optimization becomes a tail-recursion optimization. The obvious benefit is that recursion can execute in constant stack space. More extensive discussion of the tail recursion -- including the reference to the formal definition, rationale with a bit of history, and the specification for a _tail context_ can be found in Sec 3.5 "Proper tail recursion" of the R5RS. Although tail-recursion/tail-call optimization appears straightforward, there are a number of complex issues, some of them, in general, preclude such optimizations in languages like C and C++. Evaluation of argument expressions for the tail call: In the simplest case such as ``` int fact_tr(int x, int accum) { if(x < 1 ) return accum; return fact_tr(x-1,x*accum); } ``` The compiler can directly over-write the existing frame: it can multiply the old value of x by the old value of accum and write the result back into the 'accum' variable. The compiler can now decrement x (overwriting the previous value) and jump to the beginning of fact_tr. Note how the compiler took advantage of the unspecified order of evaluation of argument expression. If the compiler was required to evaluate argument expressions in their lexical order, x-1 first and x*accum later, the compiler had to use a temporary location to save the value of x before decrementing x. In general, the compiler must use temporary locations to save the results of evaluation of argument expressions. Only when all argument expressions are evaluated, the compiler should overwrite the current values of function's arguments. In Scheme, the use of temporary locations for argument expression results is mandatory unless the compiler can prove that none of the argument expressions capture continuations. See #1, #2 for more discussion. Order of invocation of constructors and destructors: Given the following C++ code ``` int objr(int x) { ObjA obj(x); if(x < 1 ) return obj.to_int(); return objr(x-1); } ``` evaluation of objr(2) leads to the following sequence of invocation of ObjA's constructions and destructions: ObjA() ObjA() ~ObjA() ~ObjA() If the tail-recursion optimization is performed, the sequence will be ObjA() ~ObjA() ObjA() ~ObjA() The order of invocation may matter as both a constructor and a destructor may perform i/o or execute other side-effecting operations. Unless the C++ compiler can prove that the order will not matter, the compiler must not do the tail-call optimization. In Scheme and other functional and non-functional languages (Java), only bindings are allocated on stack while non-primitive values go on heap. When a frame is dropped and the bindings are destroyed, the values are unaffected (and will survive the collection if, e.g., another binding points to them). In C not only the bindings are allocated on stack but the values (e.g., structures, strings and other arrays) as well. Dropping a frame has more consequences then. ``` FILE * open_png(char * file_name_stem) { char file_name[256]; if( strlen(file_name_stem) + 1 + 4 >= sizeof(file_name) ) return (FILE*)0; sprintf(file_name,"%s.png",file_name_stem); return fopen(file_name,"rb"); } ``` The call to fopen() is a tail-call. If open_png() destroys its own frame before passing the control to fopen(), the name of the file to open will be destroyed as well. Thus in general, tail-call optimization in C/C++ requires global code analysis, and is possible only in exceptional cases. More discussion here and here; Lazy Semantics - The tail-call issues look different in a lazy context; capturing continuations. Executing setjmp() prevents any tail-cal optimizations in C/C++. Indeed, if function's frame is dropped, longjmp() will have nowhere to return to. In contrast, in Scheme execution of call-with-current-continuation does not affect the tail-call optimization. A Scheme compiler is obligated to do the tail-call optimization whether continuations are captured or not.
 Biep - Re: Hidden complexities of tail-call/tail-recursion optimization   7/12/2001; 5:29:23 AM (reads: 1951, responses: 0)
 Thanks, Oleg! I think one of the best discussions (in a Scheme context) is Will Clinger's article. It defines various levels of tail recursion in space terms, which eliminates a lot of clutter about mechanisms to achieve it.
 Oleg - Re: Hidden complexities of tail-call/tail-recursion optimization   7/10/2003; 11:16:01 AM (reads: 572, responses: 0)