tail recursion

Hello, and sorry if this question has already been answered somewhere, though I searched for it.

The question is: is tail recursion something badly needed _only_ in Scheme?

Here's a related question: if it isn't, that is, if it's generally possible for compilers or interpreters to correctly detect tail recursion and automatically apply an implicit goto to it, then why should I bother with it, when it just complicates matters?

I gave it some thought when I saw a recent post here about LLVM and its recent support for tail calls in Scheme, and also after playing with Haskell a bit.

So, to keep things simple, let's look at the examples:

;; in Scheme
(define (fact n)
  (define (product min max)
    (if (= min n)
        max
        (product (+ 1 min)
                 (* min max))))
  (product 1 n))

That's one way to look at a factorial, but not the _natural_ one, I guess.

-- in Haskell
-- and just to add a bit more argument
dec n = n - 1

fact 0 = 1
fact n = fact (dec n) * n

That's the _natural_, let's call it that, high-school definition of factorial, crystal clear. There's no tail call anywhere to be seen, and yet it doesn't cause a stack overflow with large numbers...
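To make the stack issue concrete, here's the same "natural" definition transcribed into Python, purely as an illustration of mine (Python performs no tail-call elimination at all, so the linear stack growth is easy to observe):

```python
import sys

def dec(n):
    return n - 1

# The "high-school" definition, transcribed directly.
def fact(n):
    if n == 0:
        return 1
    return fact(dec(n)) * n

print(fact(10))  # 3628800

# Every recursive call occupies a real stack frame, so a deep
# argument exhausts the interpreter's recursion limit.
sys.setrecursionlimit(1000)
try:
    fact(5000)
except RecursionError:
    print("stack overflow")
```

Whether the overflow shows up at 9000 or 90000 is just a matter of how big the stack (or heap) happens to be; the linear growth itself is the point.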

Am I missing something? Or is Scheme being stubborn? R6RS is coming; perhaps it should address this?

Sorry for being provocative, but it's a sincere question...


this might be the stupidest post ever on LtU

sorry for your time. :P Just because the stack broke at 9000 on Guile but not on Hugs is no proof that Haskell does it by magic. Feeding it 90000 instead just resulted in:

ERROR - Garbage collection fails to reclaim sufficient space

guess we have to live with tail calls only...

It isn't, this one is

[... and if this post isn't the stupidest post ever - expect one in the future ;-P ]

How does tail-call elimination compare to stack-less implementations of languages?

Clarification: as far as I understood it, tail-call elimination is always explained as an optimization in a stack-based language - TCE = there is no need to keep a stack-frame which won't be used (?).

Thus, stated differently: assume a stack-less virtual machine with GC; could the GC automatically get rid of "tail-call" activation records, since they aren't referenced? Can I dumb it down to a non-issue with the right operational semantics/VM design?


By a stackless implementation do you mean managing our own stack, keeping the usage of the system stack constant?

If so, there are two choices: either the custom stack works in the conventional way inside an array with explicit popping, or stack frames are allocated individually, linked, and reclaimed by the GC.

In both cases whether TCO works depends on the particular implementation. It doesn't happen automatically if you don't care about it, but it's quite easy.

In the first case you have to pop the stack frame before a tail call instead of after it. In the second case you have to pass your caller's frame to the tail call instead of your own. In both cases the callee must use your caller's return address as if it had been called by it directly.
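As a concrete sketch of the "pop before the tail call" idea (my own illustration in Python, with a trampoline standing in for the custom stack): a function returns a description of its tail call instead of performing it, so the driver's single frame is reused rather than stacked.

```python
# A tail call is represented as a ("tail-call", f, args) tuple instead
# of being performed directly; the driver loop returns to its own frame
# (i.e. "pops" the current one) before entering the callee.
def tail_call(f, *args):
    return ("tail-call", f, args)

def trampoline(f, *args):
    result = tail_call(f, *args)
    while isinstance(result, tuple) and result[:1] == ("tail-call",):
        _, f, args = result
        result = f(*args)  # the previous frame is already gone here
    return result

def product(i, acc, n):
    if i > n:
        return acc
    return tail_call(product, i + 1, acc * i, n)

def fact(n):
    return trampoline(product, 1, 1, n)

print(fact(5))          # 120
print(fact(20000) > 0)  # True: far beyond any fixed recursion limit
```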

Stackless refers to...

systems like Smalltalk or Stackless Python, where activation records are allocated from the free store (and deallocated by the garbage collector) rather than allocated from a contiguous region of memory (the "stack") and forced to obey LIFO discipline.

Stackless allocation of ARs has numerous advantages--"stack overflow" isn't a problem (until you run out of memory); it makes implementing things like first class continuations much easier, ARs can become objects like anything else, etc. The primary disadvantage of stackless ARs is performance; this often can be rather hard on the garbage collector. Many other techniques (such as "cactus stacks") exist to approximate stackless evaluation but with better performance in the common case of ARs having LIFO discipline.

CPS can be used to effect tail call elimination, at least in theory: If some function F tail-calls some other function G and passes G the RETURN continuation of F (rather than the "current" continuation of F corresponding to the location of the call-site), it might be that F's activation record is no longer live, and eligible for garbage collection. A sufficiently smart garbage collector (perhaps a new page on c2?), or one tuned to optimize this situation, might be able to detect this condition and reclaim F's activation record as soon as the call is made; thus re-using the space for G. In practice, F and G will likely have their ARs both in memory simultaneously--as the GC won't have run yet.

However, suppose after a long chain of recursive tail calls, the attempt to allocate a new AR fails due to insufficient memory. THEN the GC gets to run; and it would be able to free all the "parent" ARs (assuming they aren't referenced elsewhere). The new AR is then allocated once memory is cleaned up, and things continue from there. This could go on forever; having the essential effect of constant space.

For this to work requires either the human, or the compiler, to encode the tail call properly, to make sure that only the innermost tail call's AR is live.
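A toy version of this in Python (my own sketch: closures stand in for heap-allocated ARs, and a driver loop stands in for the machinery that reclaims dead frames). The tail call hands the caller's return continuation `k` straight through, so nothing keeps the earlier steps alive and the GC is free to collect them:

```python
# CPS factorial driven by thunks: each step returns the next thunk
# rather than calling deeper, so the host stack stays flat.  k is the
# RETURN continuation, reused unchanged by the tail call, which leaves
# the previous "activation record" (closure) unreferenced.
def fact_cps(n):
    box = []

    def finish(value):       # the outermost return continuation
        box.append(value)
        return None          # no next thunk: stop the driver

    def loop(i, acc, k):
        if i > n:
            return lambda: k(acc)               # invoke k
        return lambda: loop(i + 1, acc * i, k)  # tail call, k reused

    thunk = lambda: loop(1, 1, finish)
    while thunk is not None:  # constant host-stack space
        thunk = thunk()
    return box[0]

print(fact_cps(5))  # 120
```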

I suspect Scheme implementations work this way in practice (I haven't studied the inner workings of any of the Scheme implementations so I cannot say for sure).

Stack overflow

Stack overflow can be avoided even if the stack is implemented in an array. For example, the compiler for my language Kogut reallocates the stack as needed.

Why did you call the language

Why did you call the language Kogut? Are you fowl-obsessed? Or is there a deeper reason?


There is no deep reason. My nick means Chicken; my friends started calling me that 12 years ago. I wanted the language name to be distinctive, not to mean anything concrete.

You're missing two things

1. Yes, tail calls can be optimized. If you pump the equivalent C code into the llvm demo page (http://llvm.org/demo) you'll see that it will transform most non-tail-call versions of fact into tail calls, then apply tail recursion elimination to turn it into a loop.

2. That is not enough. Scheme requires tail calls to be optimized even if they are between different functions, potentially with no recursion involved.

I don't think this is anything special to Scheme; many functional languages benefit from this. Scheme is one of the first that requires tail call optimization as part of the language spec, though.


Tail call transformation

While this version of Factorial isn't tail-recursive:

factorial (v)
    if (v==0 || v==1) return 1;
    else return v * factorial(v-1);

This version is:

factorial_impl (v, p)
    if (v == 0 || v == 1) return p;
    else return factorial_impl(v-1, p*v);

factorial (v)
    return factorial_impl(v, 1);

In general, if a function ends with something like

return f(g(...));

which contains one tail-call (to f(); often times f() is an inlined function such as the multiplication above) and one non-tail-call (to g()), such a thing can often be transformed into a form containing only tail-calls by construction of a new auxiliary function. In the factorial case above, this trick is coded by hand. Many compilers can do this sort of thing automatically, for simple cases like this.
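Here is the same rewrite as a runnable Python sketch (my own example, not from the post), applied to summing a list: the first version ends in the `f(g(...))` shape, an addition wrapped around the recursive call, and introducing an accumulator parameter leaves only a tail call. (Note that the rewrite silently relies on + being associative.)

```python
# Non-tail version: the addition runs *after* the recursive call
# returns, so it costs one stack frame per element.
def total(xs):
    if not xs:
        return 0
    return xs[0] + total(xs[1:])

# Transformed version: the pending addition is folded into an
# accumulator, making the recursive call the very last action.
def total_acc(xs, acc=0):
    if not xs:
        return acc
    return total_acc(xs[1:], acc + xs[0])

print(total([1, 2, 3, 4]), total_acc([1, 2, 3, 4]))  # 10 10
```

(Python itself still won't eliminate the tail call, but a compiler that does can now turn `total_acc` into a loop.)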

Transformation of non-tail calls to tail calls

> Many compilers can do this sort of thing automatically,
> for simple cases like this.

Which for example? I don't know of any.

I'm not convinced that it would be a good thing even if it were practical. It would make asymptotic memory complexity dependent on optimization.

Correctness of the rewrite relies on associativity of multiplication and lack of exceptions. Usually it's the programmer who must decide whether it's safe or whether he doesn't care about the order.

RE: Which for example?

LLVM does!



No

He was talking about transformation of a non-tail-recursive factorial to a tail-recursive one. This changes the order of multiplications.

RE: No

> He was talking about transformation of a
> non-tail-recursive factorial to a tail-recursive one.
> This changes the order of multiplications.

Yes, which LLVM does. LLVM has a fairly aggressive optimization pass where it tries to turn non-tail-recursive functions into tail recursive functions (for example, it can introduce accumulator variables).

If a function is or can be made self-tail-recursive, the LLVM optimizer turns it into a loop and subjects it to standard LLVM loop optimizations as well.
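By way of illustration (hand-written Python, not actual LLVM output): once the function is self-tail-recursive, the tail call amounts to reassigning the parameters and jumping back to the top, which is exactly an ordinary loop.

```python
# factorial_impl(v-1, p*v) as a loop: the self-tail-call becomes
# parameter reassignment plus a jump back to the test.
def factorial(v):
    p = 1
    while not (v == 0 or v == 1):
        v, p = v - 1, p * v
    return p

print(factorial(5))  # 120
```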



How does it know when it's safe? For example rewriting the factorial relies on the associativity of multiplication, or (if done differently) on the fact that the parameter goes through an arithmetic sequence which could be generated in the opposite order.

RE: How does it know it's safe

It uses conservative analysis to determine that it is safe. This means it misses some cases, but handles others. It knows that +/* of integers are associative and commutative, and that 0/1 are the respective identity values.


Memory complexity dependent on optimization

> I'm not convinced that it would be a good thing even if it were practical.
> It would make asymptotic memory complexity dependent on optimization.

Yup. That's actually an issue with Java. The spec allows but does not require JVMs to do tail call optimization internally. Some JVMs do, some don't, making it a performance gotcha. In response, modern Java IDEs will detect direct tail recursion and offer to replace it with equivalent iterative code automatically.

Tail-called JVMs...

As I understand it, implementing tail calls breaks Java's security model, so if there are such JVMs in existence, you wouldn't be able to run untrusted code on them.

None of the JVMs I'm familiar with implement tail-call elimination.

Tail calls and JVMs

There is no Java bytecode for an inter-procedural jump (which is necessary for tail-call elimination, except in the special case of a function issuing a tail call to itself; intra-procedural jumps are legal). The JVM only provides instructions for intra-procedural jumps, calls to native functions, standard function calls (obeying stack discipline), and exceptions.

However, a JVM, having verified that a particular class is valid (all methods pass the bytecode verifier), can optimize it however it likes. This includes JIT compilation, and nothing precludes the native code produced by the JIT compiler from eliminating tail calls.

Tail-call JVMs can exist!

> As I understand it, implementing tail-calls breaks Java's security model

This is a generally held belief which I also subscribed to not long ago. But it is not true. Clements and Felleisen have shown that you can both have tail-calls and stack inspection (Java's security model relies on stack inspection). They had a paper in ESOP 2003 and a journal version in TOPLAS.


I should have included a link


As near as I can tell, they only do tail recursion elimination, not general tail call elimination. No mention is made of the security model issues.

It's an issue with TCE in general...

Either you can depend on it occurring (a la Scheme) or you cannot (most other environments). In those environments where you cannot depend on TCE being performed, you ought to assume that any explicitly-coded recursion will require space linear in the depth of the recursion. If that's a problem, convert to an iterative solution.

If you can depend on TCE, then you can assume tail recursion will not require linear space.

Even if one ignores the space issue, there is also a performance improvement with TCE, as you eliminate many cycles spent setting up and tearing down stack frames.