Tail call elimination decorator in Python

Features of a programming language, whether syntactic or semantic, are all part of the language's user interface. And a user interface can handle only so much complexity or it becomes unusable. This is also the reason why Python will never have continuations, and even why I'm uninterested in optimizing tail recursion.

Thus spoke Guido - as LtU readers already know.

Now, not even four weeks later, it has become clear that turning tail recursions into iterations can be achieved by an innocent little decorator in pure Python. No Rube Goldberg machine(s) in sight.

Claims tail calls but handles only tail recursion

Worse: the comment says "This function fails if the decorated function recurses in a non-tail context".

I would also suppose that in cases where the call depth limit is not reached, it's slower than just leaving the calls as ordinary non-tail calls.

So it's not very practical.

Usability issues

I think part of the problem from Guido's perspective is that tail call elimination can make the stack traces from exceptions more confusing.

How does Scheme handle this?

Tail calls in stack traces

I have been thinking about this a little lately. A stack trace might include a line like:

... 768 stack frames omitted ...

It's trivial to count the number of tail calls at run time, and maybe worth it for usability.

And: you're not required to throw away all those stack frames eagerly. You can mark them as tail calls, then throw them away later if you need the memory. Then the problem is determining which stack frames are important enough to display and which are just loops.

(I have no idea whether any Schemes do this.)
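For what it's worth, here is a minimal Python sketch of the counting idea. The Bounce/run names and the explicit trampoline are my own invention, not any Scheme's actual machinery:

class Bounce(object):
    """A deferred tail call: whom to call next, and with what."""
    def __init__(self, f, args):
        self.f, self.args = f, args

def run(f, *args):
    omitted = 0
    try:
        result = f(*args)
        while isinstance(result, Bounce):
            omitted += 1  # one stack frame elided per bounce
            result = result.f(*result.args)
    except Exception:
        print("... %d stack frames omitted ..." % omitted)
        raise
    return result

def loop(n):
    if n == 0:
        raise RuntimeError("boom")
    return Bounce(loop, (n - 1,))

run(loop, 768)  # prints "... 768 stack frames omitted ...", then re-raises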

scheme has to make tail

scheme has to make tail calls direct or it's not scheme. so there wouldn't be any stack frame (that's the whole point of tail calls).

or i'm completely confused about this, i guess.

[later] bleagh; i completely mis-read the post i was replying to. sorry.

Take a ride on the MTA

Henry Baker's Cheney on the M.T.A. algorithm does pretty much exactly what you describe for lazy stack frame collection, and Chicken Scheme implements that approach. Summarizing: it just keeps pushing frames onto the C stack, never returning, and the stack is garbage-collected when it fills up.

To address Andrew's objection, afaik this approach is observationally equivalent (modulo GC pauses, perhaps) to an eager tail recursive implementation, so it's a valid way to implement Scheme.

I don't know of any implementations that tell you how many frames have been optimized from the stack history, though. That might not be very useful, because the tail calls could be from anywhere to anywhere, so knowing how many there are wouldn't tell you much. Imperative languages don't tell you the sum of all iterations of loops you might have gone through to reach a certain point, either.

Imperative languages don't

Imperative languages don't tell you the sum of all iterations of loops you might have gone through to reach a certain point, either.

Man, it would be cool if they did, though...

MIT Scheme

IIRC, MIT Scheme can be configured to keep a fixed-size FIFO buffer of tail call frames. Since the buffer is bounded in size, this does not violate the TCO requirement.

This is entirely anecdotal, but most Schemers I've talked to don't seem to have found the lack of tail call history to be the debugging impairment that one might initially fear it to be. In the rare cases where such information is needed in specific parts of the code, it is customary to wrap the tail call in a call to the identity function.
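In Python terms the trick would look something like this (a sketch only; tco stands for whichever tail-call-eliminating decorator is in play, and is hypothetical here):

def identity(x):
    return x

@tco  # hypothetical tail-call-eliminating decorator
def countdown(n):
    if n == 0:
        raise RuntimeError("inspect the traceback here")
    # Wrapping the recursive call in identity() moves it out of tail
    # position, so its frame is retained and appears in the traceback.
    return identity(countdown(n - 1))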

proper stack traces in the presence of tail calls

MIT Scheme can be configured to keep a fixed-size FIFO buffer of tail call frames

The problem with this approach is that any loops implemented using tail recursion (which in Scheme means all loops) will quickly fill even the largest FIFO buffer. One can do better than that: I've been working on a stack tracer for SISC that compacts the FIFO buffer by detecting repetitions in tail call sequences. Early results are encouraging.
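For illustration, here is a minimal sketch of the compaction idea in Python. It only detects immediate repetitions of a contiguous window, which is surely cruder than whatever the SISC tracer actually does:

def compact(history):
    """Collapse runs like [f, g, f, g, f, g] into ('cycle', [f, g], 3)."""
    out, i, n = [], 0, len(history)
    while i < n:
        best = None
        # Try every cycle width; keep the one covering the most frames.
        for width in range(1, (n - i) // 2 + 1):
            reps = 1
            while history[i:i + width] == history[i + width * reps:
                                                  i + width * (reps + 1)]:
                reps += 1
            if reps > 1 and (best is None or width * reps > best[0] * best[1]):
                best = (width, reps)
        if best:
            width, reps = best
            out.append(('cycle', history[i:i + width], reps))
            i += width * reps
        else:
            out.append(history[i])
            i += 1
    return out

print(compact(['a', 'f', 'g', 'f', 'g', 'f', 'g', 'b']))
# ['a', ('cycle', ['f', 'g'], 3), 'b']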

Updated link to Matthias's post

Sourceforge has rearranged all their mailing-list archive links. The thread Matthias linked to is now available at https://sourceforge.net/mailarchive/message.php?msg_id=2043767.

This isn't true tail recursion...

The current stack frame needs to be popped before entering *any* function called as "return foo(...)". It doesn't matter if this call appears in "foo" or "bar". This hack appears only to work when "return foo(...)" occurs in "foo". So this really isn't true tail recursion.

But this is typical Python culture. Pythoneers claimed to support higher-order functions long before they actually did. :-)

Tail recursion vs. tail call

It is a tail recursion optimisation, but not a tail call optimisation (as the recipe's title suggests).

i think by fails they mean

i think by fails they mean it doesn't do anything (just from a quick read of the code). not that it's going to crash + burn.

[oops. this was in reply to the last thread on the page... maybe the generic reply box should be at the top of the page.]

Innocent?

It throws an exception every other call....

nocent

It is of course a little more heavyweight than a cute goto. Unfortunately the language does not provide any. O.K., it provides one, but Python sells it as an April Fools' joke. What really annoys me about the goto implementation is that it retokenizes the module that uses it. This strikes me as excessive and nocent.

Kay

Same thing in Javascript

Not because it is useful, but just because it is possible.

Unwinding a historical accident

Back in 1978 I was learning to program the Motorola 6800, an 8-bit microprocessor, from the manuals. Thinking back to that cycle-conscious era, I find it easy to imagine a little section that the manual lacked. It would have gone like this:

section n.1 Nested subroutines.

You can call a subroutine from within a subroutine. Use JSR xxxx as usual. This pushes a second return address onto the stack. The RTS at the end of the nested subroutine pops the second address and correctly returns you to immediately after the JSR xxxx.

section n.2 Avoid double popping

You should never write JSR xxxx, RTS. This is the infamous double pop. The RTS at the end of the nested subroutine pops a return address. That address is merely the address of the RTS, so the microprocessor immediately pops another return address to get back to the main routine.

Whenever you end a subroutine with a call to a nested subroutine you should use a single instruction JMP xxxx instead of the combination JSR xxxx, RTS. The RTS at the end of the nested subroutine will correctly pop the return address to the main routine. With only 128 bytes in a 6810 memory chip, you will often only be able to spare 10 or 20 bytes for the stack and cannot afford to store redundant return addresses and double pop them.

Back to the 21st Century

I find nothing implausible about my little historical day dream. The computer science tradition could have been different. Instead of double-popping being the usual approach, and tail-call elimination a clever optimisation, a paragraph or two in influential manuals could have made a huge difference.

Tail-call elimination would have been a standard part of the assembler lore, and compilers that double-popped would have been laughed at as hopelessly inefficient. Indeed, the awful way that double popping causes compiled code to blow the stack, when the obvious hand-coded version works perfectly, would make tail-call elimination a mandatory optimisation for even the crudest of compilers.

I think there is historical baggage for programming language designers to shed. When you see

and even why I'm uninterested in optimizing tail recursion.

flip mentally to the alternative history in which Guido says

and even why I'm uninterested in fixing the double-popping bug.

Nonsense

When you type "import this" into Python, part of the output is: "Flat is better than nested." It seems to me that Guido's stance on tail call optimization comes from that more general principle. It's not a historical coincidence.

What counts as flat?

Optimized tail calls result in a flatter call stack, so...? ;)

[P.S. Where do I sign up for Alan's alternative universe?]

don't be silly

You know, it's hard to be sure, but I think it means "flat for the user-programmer's brain" rather than "flat in terms of the C stack at run time, as a performance optimization". :)

This is only a toy

I wrote the code, and also tossed it up on http://littlelanguages.com

It is only a toy, and yes, I got the terminology wrong. I've spent a lot of time hacking around with minor languages, but my relationship with formal semantics isn't as close as I'd like.

I'm actually trying to find a good book to walk me through implementing an efficient lisp, just so I know more about this.

LiSP

Two books that deal specifically with Lisp implementations are SICP and Lisp in Small Pieces (LiSP, available in paper only, afaik). The latter is focused on pretty much nothing other than implementing various Scheme-like Lisps with different properties, and covers about a dozen implementations using a range of techniques. It also introduces some denotational semantics.

EOPL might also be helpful. These are all linked from the Getting Started thread.

Thanks

I've had several people suggest Lisp in Small Pieces, and I'm going to get it.

Actually...

A friend of mine and I played around with it the other day, and it did work for 3 and 4 mutually-recursive procedures, none of which called themselves. So my comment above is not correct.

I'm still skeptical that this implements true tail recursion, but the counterexample isn't nearly as simple as I thought it would be.

Actually, your hunch was right

The code as posted recurses on itself, instead of on the called procedure, so mutually-recursive procedures will not be optimized correctly.

@tail_call_optimized
def even(n):
  if n == 0:
    return True
  else:
    return odd(n-1)

@tail_call_optimized
def odd(n):
  if n == 0:
    return False
  else:
    return even(n-1)

>>> print odd(1)
False
>>> print even(1)
True

.. and here's the fix:

- return g inside the exception thrown
- have a local variable inside the while loop that is initialized to g
- when catching the exception, set this local variable to the g from the exception

Fixed version and more write-up here.
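In the spirit of that description, here is a sketch of what the repaired decorator could look like. The names TailCall and target are mine, not necessarily those of the linked fix, and it inherits the recipe's known weakness: non-tail calls between decorated functions will still misfire.

import sys

class TailCall(Exception):
    def __init__(self, g, args, kwargs):
        self.g, self.call_args, self.call_kwargs = g, args, kwargs

def tail_call_optimized(g):
    def func(*args, **kwargs):
        f = sys._getframe()
        # Every wrapper made by this decorator shares func's code object,
        # so a grandparent frame with the same code object means we were
        # reached by a tail call from inside another decorated function.
        if f.f_back and f.f_back.f_back and f.f_back.f_back.f_code is f.f_code:
            # Carry g in the exception so the trampoline knows whom to call.
            raise TailCall(g, args, kwargs)
        # Outermost call: run the trampoline, retargeting on each bounce.
        target = g
        while True:
            try:
                return target(*args, **kwargs)
            except TailCall as e:
                target, args, kwargs = e.g, e.call_args, e.call_kwargs
    return func

With this version the even/odd pair above gives odd(1) == True and even(1) == False, as it should, and deep chains like even(100000) run in constant stack space.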

Another tail-call-optimizing decorator

(Note: please nuke this post if it is too lengthy! I'm new here... )

The technique: wrap the function with a lazy wrapper, then use a trampoline to prevent tail-position wrappers from ever being evaluated.

It will still be slower than normal recursion, but at least it won't run out of stack room. It also should not fail in non-tail or mutually-recursive functions. (If it does, that's an implementation bug.)

I'm no Python guru (I prefer Ruby ;) ), and in particular I probably introduced bugs due to not understanding Python's variable scoping rules...

from LazyEvaluation.Promises import PromiseMetaClass
# I got this from: http://freshmeat.net/projects/lazypy/
class TailPromise(object):
  __metaclass__ = PromiseMetaClass

  def __init__(self,func,args,kw):
    self.__func = func
    self.__args = args
    self.__kw = kw

  def __arginfo__(self):
    return self.__args, self.__kw

  def __func__(self):
    return self.__func
  
  def __force__(self):
    return self.__func(*self.__args,**self.__kw)

def trampolined(g):

  def func(*args,**kwargs):
    old_trampolining = func.currently_trampolining

    # If we are not already trampolining, this is the outermost call:
    if func.currently_trampolining != func:
      # Set up the trampoline!
      func.currently_trampolining = func
      while 1:
        res = g(*args,**kwargs)
        if res.__class__ is TailPromise and res.__func__() is g:
          # A tail recursion! Unpack the promise and loop.
          args,kwargs = res.__arginfo__()
        else:
          func.currently_trampolining = old_trampolining
          return res
    else:
      # A tail call made while trampolining: defer it as a promise.
      return TailPromise(g,args,kwargs)

  func.currently_trampolining = None
  return func
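A usage sketch, assuming the decorator above behaves as described (mutual recursion included):

@trampolined
def even(n):
  if n == 0:
    return True
  return odd(n - 1)

@trampolined
def odd(n):
  if n == 0:
    return False
  return even(n - 1)

print(even(100000))  # True, without growing the stack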

The trampoline terminology comes, I think, from a paper about an implementation of Scheme on the JVM...

In order to let this work

In order for this to work, the passed function g must be prepared: instead of recursing into g(...) in g's function body, the call must be replaced by TailPromise(g,...), mustn't it? So it seems to be more of a design pattern for recursive functions: turning them into explicitly non-recursive ones in the implementation, and evaluating them as if they were recursive by means of the returned TailPromise wrapper and the trampoline function. That is not exactly what I'd expect from a decorator solution, where the implementation of the function is left unchanged.

Kay

No, this works

The else: return TailPromise(g,args,kwargs) part takes care of that. This is actually almost the same as the other implementation. But it uses a promise instead of an exception. The tail call position makes sure it doesn't matter what or how you return.

Yupp! My fault.


I'll also point out:

Because the TailPromise has (faked) lazily-evaluated semantics, it should work as a stand-in for the real return value in non-tail situations. Also, because it inspects the return type rather than the stack, it should work in mutually-recursive tail situations (multiple functions tail-calling each other).

It is very similar to the other implementation, though, and I wouldn't have ever thought of doing something like it without seeing the first one.

much older

http://en.wikipedia.org/wiki/Trampoline_%28computers%29

The trampoline is a general optimization technique used to implement tail calls on top of implementations without tail-call support. I think it's as old as, or older than, the first attempts at compiling functional languages to native code in the 70s...

Here, use this

Maybe this could help...

import sys

def isTailCall():
    """ True if the caller was called in tail position. """
    # Frame of the caller's caller.
    f = sys._getframe().f_back.f_back
    code = f.f_code.co_code
    ip = f.f_lasti
    # The last opcode executed there must be one of CPython 2's
    # CALL_FUNCTION variants (opcodes 131, 140, 141, 142).
    assert ord(code[ip]) in (131, 140, 141, 142)
    # It was a tail call iff the next opcode, 3 bytes later, is
    # RETURN_VALUE (opcode 83, i.e. 'S').
    return code[ip+3] == 'S'
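A quick sanity check of what it reports (CPython 2 only, since it peeks at 2.x bytecode; the example functions are hypothetical):

def probe():
    return isTailCall()

def tail_caller():
    return probe()        # call feeds RETURN_VALUE directly: tail position

def non_tail_caller():
    return probe() + 0    # an addition intervenes: not tail position

print(tail_caller())      # True
print(non_tail_caller())  # False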

Update - New Tail Recursion Decorator

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496691

It's still broke.

This one don't work too good neither. Again, try:

@tail_recursion
def even(n):
    if n == 0:
        return True
    else:
        return odd(n-1)

@tail_recursion
def odd(n):
    if n == 0:
        return False
    else:
        return even(n-1)

Then even(2) comes out False. Comment the decorators out and it works. This would be a problem even with non-mutually-recursive procedures, because you could have something like

@tail_recursion
def foo(n):
    if blah:
        return foo(bar(n))
    else:
        return baz(n)

and the tail decorator could interfere with the call to baz(n), even if baz(n) is not recursive.

Comment the decorators out

Comment the decorators out and it works.

It works when you comment out one decorator - with the expected bounded stack size. Commenting out both doesn't work for large n, of course.

About foo: I don't see a problem here unless bar() calls foo again, as in the following code:

@tail_recursion
def even(n):
    if n == 0:
        return True
    elif n == 1:
        return False
    else:
        return even(odd(n-2))

def odd(n):
    if n == 0:
        return False
    else:
        return even(n-1)

The decorator is more brittle than Crutcher's due to its dumbed-down lookup procedure. Not sure I want to investigate this issue any further, since I never used it in my work and I don't know anyone who did. However, I will place a warning in the recipe.