Recursive Algorithms that cannot be expressed with Tail-Recursion?

Are there 'useful' or 'interesting' recursive algorithms that cannot be expressed with tail recursion, and do you know of any examples?

P.S. Feel free to define the words in quotes as you see fit. That's why they're in quotes.


How about Ackermann?

ackermann(0,n) = n + 1
ackermann(m,0) = ackermann(m - 1, 1)
ackermann(m,n) = ackermann(m - 1, ackermann(m, n - 1))

(for some definitions of "interesting")
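For concreteness, here is a direct transcription of the definition above in Python (names are mine). Note that in the third clause the outer call is not in tail position, since its second argument is itself a recursive call — that nesting is what defeats naive tail-call elimination:

```python
import sys

sys.setrecursionlimit(10000)  # the call tree gets deep even for small inputs

def ackermann(m, n):
    """Direct, non-tail-recursive transcription of the definition."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    # The outer call's argument is itself a recursive call:
    return ackermann(m - 1, ackermann(m, n - 1))
```

For example, `ackermann(2, 3)` is 9 and `ackermann(3, 3)` is 61, but the values (and the recursion depth) explode quickly after that.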



(define (ack m n)
  ;; m is either a number (on the initial call) or a list used as an
  ;; explicit stack of pending first arguments.
  (cond ((number? m) (ack (list m) n))
        ((null? m) n)
        ((= (car m) 0) (ack (cdr m) (+ n 1)))
        ((= n 0) (ack (cons (- (car m) 1) (cdr m)) 1))
        (else (ack (cons (car m)
                         (cons (- (car m) 1) (cdr m)))
                   (- n 1)))))

Here is a tail-recursive version of the Ackermann function. The call stack should never expand while running this function; that isn't to say that the data will not. This uses the same idea I'd posted earlier, and naasking commented on later, but this version is definitely tail recursive :)
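The same explicit-stack idea can be sketched in Python (a loop instead of a tail call, since Python has no tail-call elimination): the list `stack` holds the pending first arguments and plays exactly the role the call stack played before — it still grows, just on the heap.

```python
def ackermann(m, n):
    """Ackermann with an explicit stack of pending first arguments."""
    stack = [m]
    while stack:
        m = stack.pop()
        if m == 0:
            n = n + 1                 # ack(0, n) = n + 1; pop and continue
        elif n == 0:
            stack.append(m - 1)       # ack(m, 0) = ack(m - 1, 1)
            n = 1
        else:
            # ack(m, n) = ack(m - 1, ack(m, n - 1)):
            # push the outer call, then work on the inner one.
            stack.append(m - 1)
            stack.append(m)
            n = n - 1
    return n
```

The invariant is that the final result equals the stacked applications `ack(s_k, ... ack(s_1, n))`, with the top of the stack applied first.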

Why do you need to explicitly implement Tail-Recursion?

I couldn't have asked for better examples.

One of my favorite aspects of functional programming is how clearly it can express recursion (see first example). However, when tail recursion has to be implemented EXPLICITLY, that elegance goes straight out the window (see second example).

But if every recursive algorithm can be made tail-recursive, then why don't most FP languages do this transformation automatically? Is this a difficult problem? Are there certain cases that are intractable (and if so, how common are they)? Am I missing something fundamental here?

Tail-call optimization is most useful...

...when it allows an algorithm which requires O(n) (or O(log n)) space if implemented naively with recursion, to instead only use O(1) space.

Some algorithms can be implemented in tail-recursive fashion using the transformations discussed above, but still require non-constant space. The difference is that this non-constant space is maintained in user data structures, usually allocated on the heap, rather than on the program stack.

The advantages to doing this are: A) stacks generally have to occupy, in most implementations, a contiguous address space--the "stack" is a somewhat scarce resource which is subject to catastrophic exhaustion--stack overflow. B) Heap-allocated data structures, not being stacks, are not limited by LIFO discipline--making possible all sorts of things (such as arbitrary continuations) not easy to do with stacks.

The *disadvantages* to this are: A) performance. Stack allocation and de-allocation are almost free (simply move the stack pointer on most architectures); whereas heap allocation is much more expensive. If GC is used to clean up the heap, then the GC has to work a *LOT* harder.

Some languages *do* do this transformation, although often as a function of how the VM works. Some VMs, such as Stackless Python's, only use a fixed amount of the native processor's call stack, and maintain the HLL "stack" entirely in data structures. (Others use native stacks as part of their implementation.) Many Scheme implementations likewise do this sort of thing under the hood; it's almost mandatory to support call/cc.

This transformation is

This transformation is something that requires a fair bit of semantic knowledge. Let's restrict attention to the case of list folds, and see what happens there. So, here's a left and right fold for lists:

  val foldl : 'b -> ('a -> 'b -> 'b) -> 'a list -> 'b
  let rec foldl nil op list = 
    match list with
    | []      -> nil
    | x :: xs -> foldl (op x nil) op xs 

  val foldr : 'b -> ('a -> 'b -> 'b) -> 'a list -> 'b
  let rec foldr nil op list = 
    match list with
    | []      -> nil
    | x :: xs -> op x (foldr nil op xs)

So the question of tail-recursion optimization is when you can replace a right fold with a left fold. Now, let's look at the two patterns of calls:

   foldl nil op [x1; x2; x3] = op x3 (op x2 (op x1 nil)) 
   foldr nil op [x1; x2; x3] = op x1 (op x2 (op x3 nil))

You can see that the elements of the list are passed to op in opposite orders in the two kinds of fold. So it should be clear that to rewrite one call pattern into the other, it's sufficient for op to satisfy the following equation:

  op x (op y z) ==  op y (op x z)

Note that this is a semantic property, and whether it's true or not depends on the definition of op, and also on what notion of equivalence you're interested in. Since deep code analysis is hard, compilers mostly don't do optimizations like this: a few might do it for known functions like +, but certainly not more than that.
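A small illustration of why the equation is a *semantic* condition (sketched in Python; the fold definitions mirror the OCaml ones above): an operator that satisfies it makes the two folds agree, and one that doesn't makes them differ.

```python
def foldl(nil, op, xs):
    """Left fold: op is applied to elements front-to-back."""
    acc = nil
    for x in xs:
        acc = op(x, acc)
    return acc

def foldr(nil, op, xs):
    """Right fold: op is applied to elements back-to-front."""
    acc = nil
    for x in reversed(xs):
        acc = op(x, acc)
    return acc

# (+) satisfies op x (op y z) == op y (op x z), so both folds agree:
add = lambda x, z: x + z
assert foldl(0, add, [1, 2, 3]) == foldr(0, add, [1, 2, 3]) == 6

# cons (list prepend) does not satisfy it, and the folds differ:
cons = lambda x, z: [x] + z
assert foldl([], cons, [1, 2, 3]) == [3, 2, 1]
assert foldr([], cons, [1, 2, 3]) == [1, 2, 3]
```

No amount of purely syntactic analysis of `cons` versus `add` distinguishes these cases; the compiler would have to prove the exchange law for each specific op.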

Once dependently typed programming languages start showing up in earnest, expect to start seeing compilers support optimizations like this, because you'll have a way to reliably reassure the compiler that such transformations are semantics-preserving.


Here are three different (but equivalent) perspectives:

1) Every recursive algorithm can be made into a tail-recursive one via the CPS transform. If you don't want to allow higher-order functions, simply defunctionalize (or closure convert or...)

2) Every algorithm can be expressed with while loops as the only control structure.

3) Languages that don't support recursion (or even "procedures") are nevertheless Turing complete, e.g. Turing machines or your processor's machine language.
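Perspective (1) can be sketched for factorial in Python (which doesn't actually eliminate tail calls, but the shape of the transform is the point): CPS makes every call a tail call by moving the pending work into a continuation, and defunctionalization then turns that continuation into first-order data consumed by a loop.

```python
def fact(n):
    # Direct, non-tail-recursive version: the multiply is pending
    # on the stack across the recursive call.
    return 1 if n == 0 else n * fact(n - 1)

def fact_cps(n, k=lambda r: r):
    # CPS transform: every call is now a tail call; the pending
    # multiplications live in the continuation k, not on the stack
    # (in a language with tail-call elimination, that is).
    if n == 0:
        return k(1)
    return fact_cps(n - 1, lambda r: k(n * r))

def fact_defun(n):
    # Defunctionalized: the continuation becomes a list of pending
    # multipliers, consumed by an ordinary loop.
    pending = []
    while n > 0:
        pending.append(n)
        n -= 1
    r = 1
    for m in reversed(pending):
        r *= m
    return r

assert fact(6) == fact_cps(6) == fact_defun(6) == 720
```

This is exactly the pattern of the explicit-stack Ackermann above: the "interesting" recursion doesn't disappear, it migrates into data.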

For interesting values of

As the point of tail recursion (elimination) is to save stack space, I'd rephrase the question to: What interesting algorithms can't be done in constant space?

For example: tree traversal can be done in constant space if you are allowed to modify the tree nodes during the traversal.
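One well-known instance of this is Morris traversal (a sketch, with my own `Node` type): it temporarily threads each left subtree's rightmost node back to its root, follows the thread instead of a stack, and removes it afterwards, so only O(1) extra space is used and the tree ends up unmodified.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def morris_inorder(root):
    """In-order traversal in O(1) extra space via temporary threads."""
    out = []
    cur = root
    while cur:
        if cur.left is None:
            out.append(cur.val)
            cur = cur.right
        else:
            # Find the in-order predecessor of cur in its left subtree.
            pred = cur.left
            while pred.right and pred.right is not cur:
                pred = pred.right
            if pred.right is None:
                pred.right = cur      # install thread back to cur
                cur = cur.left
            else:
                pred.right = None     # thread followed; remove it
                out.append(cur.val)
                cur = cur.right
    return out

tree = Node(2, Node(1), Node(3))
assert morris_inorder(tree) == [1, 2, 3]
```

Each edge is traversed at most a constant number of times, so the traversal is still O(n) time.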

This is the interesting

This is the interesting question, since any non-tail recursive function can be made tail recursive if we maintain stack data manually.

Modification is cheating

If you modify the tree nodes during your traversal you are in effect using O(n) space, only it is already allocated in the tree nodes. The tree could be stored using smaller nodes if you remove the extra field used for traversals.

What interesting algorithms can't be done in constant space?

Probably most of them.

A constant-space algorithm on input of length n is a bit strange, since you already use O(n) space for the input. You can always turn an algorithm requiring O(n) additional space into one requiring O(1) by changing the representation of the input to include additional scratch space (as in your traversal example).

Things get more difficult when the algorithm/problem is reentrant. For example when you need to traverse a tree again, while a traversal is already in progress.

Stream processing algorithms

A constant-space algorithm on input of length n is a bit strange, since you already use O(n) space for the input.

Not so. Processing a text file line-by-line (cf. grep) need only use constant space in memory, no matter how long the file itself is. Likewise, processing a stream of data from a socket.
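A grep-like sketch in Python: a generator touches one line at a time, so resident memory stays bounded by the longest line rather than the length of the stream.

```python
import io

def grep(pattern, lines):
    """Yield matching lines one at a time; memory use is bounded by
    the longest single line, not by the length of the stream."""
    for line in lines:
        if pattern in line:
            yield line

# Works the same on a file object, a socket wrapper, or an in-memory stream:
stream = io.StringIO("foo\nbar\nfoobar\n")
assert list(grep("foo", stream)) == ["foo\n", "foobar\n"]
```

(Collecting the results into a list, as the test does, is of course O(matches) space; a consumer that prints and discards each line keeps the whole pipeline constant-space.)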

Not quite

Processing a text file line-by-line (cf. grep) need only use constant space in memory ...

What if your file has no line breaks?

More precisely...

What algorithms require more than a constant multiple of (the size of the input + the size of the output) in space?

Alternatively, one could ask for problems that require more space than that for any correct algorithm, but that may not be what you meant.

Space complexity in general

You're effectively asking "What interesting algorithms can't be done in O(n) space?" (cf. Andreas's version of the question above.) This discussion seems to be in danger of turning into a general seminar on algorithmic space complexity :)

Which reminds me of something that crossed my mind recently. I'm no expert on complexity theory, but am I right in thinking that because visiting a given amount of space will take time at least proportional to the space, it trivially follows that no algorithm can have a time complexity better (in the sense of a smaller big-O expression) than its space complexity?

What about multiple

What about multiple processors?

Constant factor

Multiple processors, SIMD, extra gpu pipelines, etc. would just divide the time complexity by some constant factor, which doesn't change big-O (or best-case, or worst-case).

What if you had an

What if you had an essentially unlimited number of processors? I hope this isn't a silly question - we often make the same assumption about space, don't we? Couldn't we talk about thread (or whatever you'd call it) complexity?

This would be equivalent to

This would be equivalent to non-deterministic computation - as in the N in NP.

Parallel Complexity

The complexity class NC comes from asking what happens when the number of processors scales with the input. NC_i contains problems that can be solved in O(log^i n) time with a number of processors polynomial in the input size. (This is one kind of algorithm that takes less time than the size of its input. Also think of MapReduce.) NC is actually defined in terms of boolean circuits with polylogarithmic depth and polynomial total size, which implies something about the communication model (a PRAM machine, I think). It's the kind of complexity you want for any big parallel jobs.

I think allowing lots of threads mostly fits into existing complexity classes, if you are just interested in what algorithms are possible rather than exact performance.

Polynomially many threads running for logarithmic time can probably be mapped onto NC. Polynomially many threads running for polynomial time can be simulated on one processor. It might be interesting to see how the exponent can be divided between more time and more machines, but even n^2 machines is pretty expensive. Exponentially many threads is like NC one level up, somewhere between NP and EXP. I don't see how to fork more than exponentially many threads in polynomial time: even if you pre-create an infinite number of threads with thread ids on an input tape, polynomial time is only long enough to read a prefix exhibiting exponentially many behaviors.

Generalized Chess, Go, and game-theory driven games

Generalized Chess, Go, and other, no-martingale, game-theory driven games.

Some consider that NP is, in many respects, close to the LOG-SPACE class of algorithms: given that solving an NP-complete problem by brute force is equivalent to traversing a search tree (of size exp(n)) of candidate solutions towards the first good one, you'd need to store enough to retain the current branch of the tree while checking the solution given by the oracle. This is unproven, of course. But nevertheless, a LOG-SPACE class of algorithms exists.

One wikipedia article should be a good starter for your quest: PSPACE

Update: some progress has been made since my studies: LOGSPACE is known to be contained in P, but whether that containment is strict (and hence whether NP > LOGSPACE) is still open; if it is, hard NP problems fit your request.

Tree traversal

Depending on the tree representation, it's actually possible to traverse it without requiring any modification and using only enough space to hold a reference to the current tree node.
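One representation that allows this is a tree with parent pointers (a sketch; the `Node` type and helpers are mine): finding the in-order successor needs only the reference to the current node, no stack and no modification.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right
        self.parent = None
        for child in (left, right):
            if child:
                child.parent = self

def leftmost(node):
    while node.left:
        node = node.left
    return node

def next_inorder(node):
    """In-order successor via parent pointers: O(1) space per step,
    and the tree is never modified."""
    if node.right:
        return leftmost(node.right)
    # Climb until we leave a left subtree (or run off the root).
    while node.parent and node.parent.right is node:
        node = node.parent
    return node.parent

tree = Node(2, Node(1), Node(3))
vals, cur = [], leftmost(tree)
while cur:
    vals.append(cur.val)
    cur = next_inorder(cur)
assert vals == [1, 2, 3]
```

Of course, as the "Modification is cheating" comment would note, the parent pointers themselves are O(n) space paid for in the representation.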

Cheating without up

Right. In fact I'm using that algorithm to implement iterators on a map (which is implemented as a balanced tree) to get to the next in-order node.

The cheating (which I concur it is) algorithm works on a structure of Lisp cons cells because at garbage-collection time the O(n) extra space is a bit hard to come by. (You could bet on the assumption that cons-cell structures are usually wider than deep and do the tail recursion on the cdr.)