Why is tail call optimization an issue in GC'd languages?

I have never quite understood the reason (or excuse) for not supporting tail call optimization in environments that support GC. Why is it an issue at all? If a function has nothing left to do, its activation record ought to be simply reclaimable. Why do the .NET IL and LLVM require an explicit 'tail' annotation, and why doesn't Java just support it out of the box?

The ability to debug using a stacktrace is a rather weak argument. Other than that, I can see that varargs would pose a problem in C, but surely not in Java/.NET. What am I missing?

Activation records are often

Activation records are often treated specially and stack-allocated.

Programs that rely on tail

Programs that rely on tail calls need them to be implemented as tail calls for semantic correctness on real machines - they need to be assured that stack usage will be bounded independent of how many calls logically recur. Meanwhile, developers frequently want debug builds, runtime diagnostic logging, code security based on stack analysis, etc., which don't automatically work well with tail call optimization, and that can be surprising to the procedurally-oriented developer.

The optimization is also less important in procedurally-oriented programming, where mutable state and explicit loops are more frequently used. The runtime and R&D work that would go into analyzing for TCO and implementing it could be spent on other optimizations that give a better return on the investment. tail.call on .NET has long been implemented in a relatively simple but slow way (it may have improved in recent years with F# etc., but last I checked it was implemented as a generic assembly routine that unwound the stack, rather than being baked into the JIT codegen).

As pkhuong says, it's very rare for the activation record to be allocated outside the stack in procedurally-oriented VMs like .NET, the JVM, LLVM, etc.; closures, coroutines, etc. that would extend the record's life are rewritten to explicit heap allocation by the front end before they hit the VM layer of abstraction. (Actually, I took it for granted that you knew the GC does not normally manage activation records in industrial VMs; I missed that assumption in your question.)

There is no good reason for it.

There is no good reason for it.

The stack trace issue can be solved by simply emitting notes to a debugging log that supports unwinding.
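
To sketch what I mean (illustrative C only; no VM actually works exactly this way, and every name is invented): the runtime keeps a fixed-size ring buffer of the call sites elided by tail calls, which a debugger could replay when printing a trace:

#include <stdio.h>

/* Illustrative sketch only. Before a frame is reused by a tail call, the
   elided call site is noted in a fixed-size ring buffer that a debugger
   could consult later. Memory stays constant at any tail-call depth. */
#define LOG_SIZE 64

static const char *tail_log[LOG_SIZE];
static unsigned long tail_pos;

static void note_tail_call(const char *site) {
  tail_log[tail_pos++ % LOG_SIZE] = site;  /* O(1) bookkeeping per tail call */
}

static long sum_to(long n, long acc) {
  if (n == 0)
    return acc;
  note_tail_call("sum_to");        /* what a compiler might emit before... */
  return sum_to(n - 1, acc + n);   /* ...the tail call itself */
}

int main(void) {
  printf("%ld\n", sum_to(10000, 0));  /* prints 50005000 */
  printf("last elided site: %s\n", tail_log[(tail_pos - 1) % LOG_SIZE]);
  return 0;
}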

It is essentially a social problem -- I think that Guido exemplifies it nicely, and you can read his reasoning here:

http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html

PS: varargs shouldn't be an issue for TCO in C, either.

The stack trace issue can be

The stack trace issue can be solved by simply emitting notes to a debugging log that supports unwinding.

"Simply" building and maintaining the notes that make up stack trace messages usually only generated when dumping stack frames to I/O may in fact cause "social problems". So preventing them is a good idea, no?

This is what debug builds

This is what debug builds are for.

A Non-Issue

Tail call optimization is just not interesting enough for languages which are predominantly imperative rather than functional.

Guido says it best in his post: 'After all TRE only addresses recursion that can easily be replaced by a loop.' I agree with that. Hard-core Lispers aside, most Pythonistas will just write a loop or rely on (built-in) higher-order constructs, and it's just not worth the effort to implement TRE for the few corner cases where it applies.

Moreover, in functional languages, I find it bad practice to manually introduce tail recursion into programs which, when written naively, are just not tail-recursive.

Guido wrong, film at 11

Guido says it best in his post: 'After all TRE only addresses recursion that can easily be replaced by a loop.'

Calling it "tail recursion elimination" makes it easier to miss the point, as Guido does here. How would you replace a program in CPS with a loop?

That's what I said

There are two parts to the answer. 1) You raise a non-issue. 2) Even as a non-issue, it's possible. But 1. is more important - there simply is no need.

On 1. Only Lispers and other functional programming enthusiasts write programs in CPS. Now, no doubt there are Python examples where reactive programs are written in CPS style, but I am not sure it is a good idiom for Python.

Myriads of Python programs run fine and are well-written without the need for TCO, for CPS style programs, or -heck- monads.

On 2. Anyway, you can always factor out general recursion by making the stack explicit. That also holds for CPS.
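
(A minimal C sketch of the idea, with invented names; the toy Tree type stands in for any recursive structure:)

#include <stdio.h>

/* Sketch: factoring general recursion out into an explicit stack. The toy
   Tree type and all names are invented for illustration. */
typedef struct Tree { int value; struct Tree *left, *right; } Tree;

/* Naive recursion: depth bounded by the height of the tree. */
static int sum_rec(const Tree *t) {
  return t ? t->value + sum_rec(t->left) + sum_rec(t->right) : 0;
}

/* The same control structure, reified into a data structure. */
static int sum_iter(const Tree *t) {
  const Tree *stack[1024];  /* a real version would grow this dynamically */
  int top = 0, sum = 0;
  if (t) stack[top++] = t;
  while (top > 0) {
    const Tree *n = stack[--top];
    sum += n->value;
    if (n->left) stack[top++] = n->left;
    if (n->right) stack[top++] = n->right;
  }
  return sum;
}

int main(void) {
  Tree l = {1, NULL, NULL}, r = {2, NULL, NULL};
  Tree root = {3, &l, &r};
  printf("%d %d\n", sum_rec(&root), sum_iter(&root));  /* prints 6 6 */
  return 0;
}

The control structure hasn't gone away; it has just been reified into a data structure that the program manages itself.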

As an addendum. For functional programming, it's even a question whether you need to implement TCO. My own language relies on an eager combinatorial rewrite semantics which is implemented without a stack; there is no need for TCO. As far as I can see, it's even as fast as most Lisp implementations. It's a trade-off: slower than heavily optimized functional compilers, but you don't run the risk of running out of stack space (which was a major obstacle when using OCaml as a bootstrap language).

I'd like to point out that

I'd like to point out that "making the stack explicit" may in some (I would say, most) case add an interpretative overhead to a program. For example, consider the usual encoding of a fixed automaton as mutually tail-recursive functions, "making the stack explicit" amounts to making the automaton state explicit, and this is an *interpretation* of the automaton, while the tailcalling functions where a machine-level *implementation* of the automaton.

You may claim this example is only of limited use. For example, it only works well for automata whose structure is known at compile time; if one wants to build the automaton from varying parts at runtime, the interpretive overhead comes back quickly. But this may still be an important use in a given application (say, implementing a fixed protocol as a state machine).
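
(To make that concrete, a C sketch with invented names; whether these calls become jumps depends on the compiler, e.g. GCC's sibling-call optimization at -O2, whereas a language with guaranteed tail calls makes the constant stack bound a promise rather than an accident:)

#include <stdio.h>

/* A fixed automaton as mutually tail-recursive functions: a toy
   recognizer, invented for illustration, for strings of the form a*b+. */
static int state_b(const char *s);

static int state_a(const char *s) {
  if (*s == 'a') return state_a(s + 1);  /* tail call: stay in state A */
  if (*s == 'b') return state_b(s + 1);  /* tail call: move to state B */
  return 0;                              /* reject anything else */
}

static int state_b(const char *s) {
  if (*s == 'b') return state_b(s + 1);  /* tail call: stay in state B */
  return *s == '\0';                     /* accept only at end of input */
}

int main(void) {
  printf("%d %d\n", state_a("aaabb"), state_a("aba"));  /* prints 1 0 */
  return 0;
}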

Finally, "making the stack explicit" is also a non-local transformation that I consider a "compilation pass". Asking programmers to compile their programs themselves (with all the verbosity and maintenance cost that it may occur) is often a warning sign of a language's limitations: "Oh, you don't need closures, you can closure-convert your code easily, see?". It may or may not apply here (I'm well aware of the "social" argument which I find interesting, if not totally convincing), but it makes me dubious at least.

(I also have a hard time believing your "combinator rewriting" system actually makes a difference regarding the amount of information you need to keep to resolve pending function calls, however this is formulated in your system. I won't make any further comments before having looked at it more closely.)

I agree, but have different conclusions...

I'd like to point out that "making the stack explicit" may in some (I would say, most) case add an interpretative overhead to a program.

Which is exactly why I don't really buy into the 'we need TCO desperately' outcry that surfaces from time to time.

If you like functional programming, and you like recursion, then a lot of programs have very natural functional implementations. However, most programs are not essentially tail-recursive when written naively, and to make them tail-recursive, most programmers necessarily make the stack explicit in the host language. (Which is work the compiler should do.)

I don't like the latter: it may save conventional stack space at the cost of a few processor cycles, but it also obfuscates programs beyond what I recognize as code. A language which is functional should just support recursion well and should relieve programmers of such tedious manual work.

As far as your analysis goes, you are correct: there is no fundamental difference between what I did and a stack-based eager functional compiler. The compiler I wrote essentially looks at a combinatorial rewrite graph, compiles the eager reduction strategy into the combinators, and emits that as C code. It will make exactly as many calls as any other compiler technique.

But for tail-recursive programs, it should have constant memory requirements, not linear.

[ Another addendum. The major difference is that most people assume that function applications are implemented as calls where the return point at the calling site is saved on the stack. Combinatorial rewriting rewrites a name to its definition, the body of a function. Hence, there is never a need to return, hence, for tail-recursive programs, the memory requirement is constant. ]

[ As a last point. Nice, heh? ;-) ]

The need for tail calls

It's possible that Python doesn't need proper tail calls, and that only Lisp programmers use CPS. However, Guido's claim continues to be false, since CPS cannot be rewritten to use a loop.

It's always possible to factor out everything by writing an interpreter for a Turing machine; however, this tells us nothing about the language other than its ability to compute a particular large set of functions on the integers.

As for your language, either it is in the space classes defined by Clinger, or it isn't. If it is, you've implemented proper tail calls. If it isn't, then your language uses unboundedly more space than it needs to.

Ah well

I think Guido's comment should be read as a snarky remark, as a generalization of the fact that he thinks Python just doesn't need it.

You raise a non-sequitur. If you can factor everything out, then you can also factor out CPS, even if you need to go all the way down to writing an interpreter for a tail-call-extended language. Maybe even a loop will appear somewhere. Semantics...

As far as my language goes: it has been bootstrapped, and I ran a memory profiler on the heap a few times; there are no leaks. I assume I use constant space for tail calls, but it would be more accurate to say that I avoided the whole issue by choosing a rewrite semantics. It wouldn't even make sense to naively tail-convert (most) algorithms; the space requirements would essentially remain the same.

[ Haskell and Clean should behave similarly. ]

[ Now I am a bit annoyed. Of course you can factor out CPS by making the stack explicit and trampolining that in a loop. ]
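
(For concreteness, a minimal trampoline sketch in C, all names invented: each function returns a description of the next call instead of making it, and the driver loop keeps the native stack flat.)

#include <stdio.h>

/* Sketch of trampolining: a tail call is reified as a (function, argument)
   pair that a driver loop invokes, so the native stack never grows. */
typedef struct Thunk {
  struct Thunk (*fn)(long n);  /* next function to call, or NULL if done */
  long n;                      /* its argument */
  int done;
  long result;
} Thunk;

static Thunk bounce(Thunk (*fn)(long), long n) {
  Thunk t = { fn, n, 0, 0 };
  return t;
}

static Thunk land(long result) {
  Thunk t = { NULL, 0, 1, result };
  return t;
}

static Thunk is_odd(long n);

/* Mutually "tail-calling" functions, written in CPS shape. */
static Thunk is_even(long n) {
  return n == 0 ? land(1) : bounce(is_odd, n - 1);
}

static Thunk is_odd(long n) {
  return n == 0 ? land(0) : bounce(is_even, n - 1);
}

static long trampoline(Thunk t) {
  while (!t.done)     /* here is the loop... */
    t = t.fn(t.n);    /* ...but the program above keeps its CPS shape */
  return t.result;
}

int main(void) {
  printf("%ld\n", trampoline(bounce(is_even, 1000001)));  /* prints 0 */
  return 0;
}

Stack use is constant however long the chain of calls; the price is a struct copy and an indirect call per bounce.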

Why Rewriting doesn't need Tail Call Optimization

On second thought, I thought it would help to make concrete 'Why Rewriting doesn't need Tail Call Optimization'.

Assume a combinator Z which counts to zero; the definition is given below.

Z 0 = 0
Z n = Z (n-1)

Now, without going into the specifics of the rewrite machine I use, the next example shows how Z 3 is rewritten to 0 in constant space.

Z 3 => Z (3-1) => Z 2 => Z (2-1) => Z 1 => Z (1-1) => Z 0 => 0

Since rewriting doesn't build a stack of invocations, the memory requirement for tail calls is constant. Of course, I admit, that can be called supporting proper tail calls, but also, more to the point, there just aren't any calls. Stuff just gets rewritten.

[ Last note. The price you pay is that I estimate rewriting is at least an order of magnitude slower than using a stack-based semantics.

On the positive side, you don't run out of stack space but out of memory. Python has a rumored recursion depth of around a thousand, I estimate OCaml at around ten thousand, and I wanted a hundred thousand or more. ]

You demonstrated TCO, not rationale for its absence

"Since rewriting doesn't build a stack of invocations, the memory requirement for tail calls is constant."

For most of us, that's the very definition of "tail call optimization". I understand the term may be poorly named; I think that's why some say "proper tail calls". Regardless of how you spell it, all we care about is that a non-constant chain of tail calls consumes O(1) space.

I think, given that definition, you'd say instead that "a correct term rewriting implementation requires TCO"; that "Z 1e30" should not run out of memory.

Pardon the late reply, but I thought this needed to be said; I think we agree in principle, but suffer confusion of nomenclature.

One problem with your example

One problem with your example is that it does not need stack frames. In an interpreted language such as yours, stack frames are not needed for control flow, as the control flow aspect is handled by your interpreter when it looks for the next part of the term to reduce (in effect, this is the return pointer; it knows where to reduce next after this reduction). So, if I understand correctly, stack frames would only be useful to store locals.

How would you reduce `Z n = let m = n-1 in Z(m)`, for example? (A local variable is not really meaningful here, as there is no sharing; maybe `Z(m+0*m)`, or `if true then Z m else Z m`)

Why is this a problem?

There are many programming models that do not need stack frames: dataflow programming, reactive programming, temporal logics, term rewriting, etc.

Stack frames, and 'activation records' in general, are an artifact of control-flow paradigms.

I would like to see us move away from such paradigms in general. Irrelevancy of stack frames seems to me a feature, not a problem. We can achieve cache-locality by other means.

Methodology

marco is trying to demonstrate that his evaluation strategy is "naturally TCO" in that it doesn't need to do anything specific to reap the space benefits of tail calls. He produces an example and shows that it does not use stack space.

The problem with the example is that, if it used stack space, the frames would be of size 0. Therefore we can't know if its paradigm is naturally TCO as he claims, or if it would exhibit memory usage patterns similar to stack frames in other, more demanding tail-recursive cases.

The question seems relevant as, in his design document, he describes his evaluation strategy as an alternative to the usual virtual machines of eager functional languages, e.g. the SECD machine.

I don't know the implementation methods of the paradigms you suggest, so I won't confirm or contradict your claim that they "do not need stack frames". However, I'm a little dubious that they would be "naturally TCO" in the sense given by marco, a claim that you have not made. As a comparison point, the Pure language, which is rewriting-based, indeed performs specific tail-call optimization.

Not 'naturally TCO'

I don't believe marco was arguing that term rewriting is "naturally TCO". Indeed, his words were: "I admit, that can be called supporting proper tail calls, but also, more to the point, there just aren't any calls. Stuff just gets rewritten."

I grant that a term rewriting system might use an unbounded amount of memory in similar patterns. For example, you can model a stack if you want one, or a left-recursive parser.

The need in the Pure language for tail recursion seems more an artifact of its interpreter than of the programming model. But I suppose one could say the same of any language implementation to which TCO might apply.

I'm quite fond of temporal semantics with respect to GC. You get to 'garbage collect' the past while constructing the future. Reminds me of Langoliers. Sort of region-based collection in the fourth dimension, plays really nice with generational GC.

TCO is useful regardless of

TCO is useful regardless of whether the language is imperative or functional. What's not to like about a goto with arguments? :)

I hate to disagree with Guido seeing how much I like Python, but he's wrong. Lua implements proper tail calls, and Lua programmers make use of them as if they were just normal. As it should be. Here's a state machine example.

Sort of reminiscent of all the noise in the Java world about closures and how they are not for your "average industrial programmer"; meanwhile Ruby and other languages just went ahead with them and nobody seemed to think it was a big deal.

[citation needed]

Where has anybody ever said that GC is a barrier to tail calls? Tail calls first appeared in Scheme, a GCed language. Since then they've appeared in most GCed functional languages (e.g. most Lisps, MLs, and the Miranda/Haskell family).

I suspect miscommunication here

I believe that Sriram's question amounts to "Why aren't proper tail calls trivial to implement, given a GC?".

Taken back but with a different question

Okay, I'm confused about the question. Given that the GCed languages and VMs mentioned tend to put activation records on a contiguous stack and not in a heap, what does GC have to do with tail calls (or their lack) for those languages and VMs?

The reclamation order of

The reclamation order of stack-referred objects reverses.

reclamation order

The OP mentions the JVM and .NET. Neither the JVM nor .NET has a way to observe reclamation of stack-allocated objects. In the JVM, stack-based values are all simple primitives or references, which can't participate in reclamation protocols like finalizers and phantom references. On .NET, "value types" may be allocated on the stack, but there are rules around them, including the fact that they can't have finalizers, that make them act very similarly.

Both VMs do have ways to observe reclamation of heap based objects, and certainly tail calls are going to at least probabilistically move the GC iteration at which unreachable heap objects are picked up. But neither VM makes any promises about reclamation ordering of heap objects so any program that depends critically on a specific reclamation ordering of heap objects is probably going to be broken with or without tail calls.

I know

I never stated it was a good argument - just that TCO might 'casually' change the semantics of a language.

So, you end up implementing something at a substantial compiler cost, at the cost of an opcode, and at the cost of complex stack manipulation which may interfere with GC and native code generation, making debugging more difficult since local scopes suddenly cease to exist, all without an obvious benefit for an imperative programmer and with consequences that may be perceived as weird.

But, as far as GC goes, that's about it.

This is precisely the topic

This is precisely the topic of this post.

As I see it, there are NO changes to the semantics, it adds zero complexity to the compiler, and doesn't make debugging any harder (ask Lua programmers). There are NO weird consequences.

Except for...

(Sorry, I was still editing that previous post.)

There's always the point where GC kicks in to actually do some GC work. You'll see [1] that 'out-of-natural-order' stack manipulation is not as easy as it might look. You need extra bookkeeping, and you might need to change the manner in which the GC works (after every call, or every jump).

And then there's the cost of supporting this in an analyzer, in the debugger, in a hot-spot compiler, in a native compiler, the slightly changed semantics, and the fact that probably fewer than 0.5% of programmers would use this optimization (effectively).

It's a design choice. I think Lua shows that it can be implemented well if the language is designed for that. But it comes at a cost and I don't see why Java or C# might need it.

[ Note that this comment comes from someone who implemented a mostly-pure eager functional language. So, you might guess which side of the fence I am on when it comes to functional programming and the merits of recursion. ;-) ]

[Sorry: deleted

[Sorry: deleted uninformative comment]

Since the activation record

Since the activation record only contains object refs and primitives, my point is that it should be trivial to reclaim the space occupied by the activation record (knowing that one doesn't have to worry about the objects themselves; they will be taken care of by the GC as they are now). Shouldn't this be like a twenty-line change to the JVM code?

Reclaiming space

As far as I can tell, the possibility of additional difficulty in reclaiming space no longer needed after a tail call has never been an argument against tail calls. As you say, there is no additional difficulty.

Arguments raised have involved: stack-based security protocols, developer expectations in stack traces and debuggers, compiler complexity in finding tail calls, and developer understanding of what is and what isn't a tail call, among other things.

I happen to generally be in the pro-tail call camp. I'm just saying that there's no need to attribute a non-argument to the anti-tail call camp when they've never made that argument.

Stack based security

Stack based security protocols: Clements and Felleisen show it is not an issue

Developer expectations in stack traces and debuggers: Not much of a factor IRL; it is easy to live without, as Lua programmers can attest.

Compiler complexity in finding tail calls: Really? A method call node in the CFG without any outgoing edges.

Developer understanding of what is and what isn't a tail call among other things: that's a matter of a few mistakes.

small matter of font-lock-mode

Developer understanding of what is and what isn't a tail call among other things: that's a matter of a few mistakes.

If it comes to that, it would be fairly simple to have the text editor render tail calls in a slightly different colour. It's not as if it requires any kind of deep analysis. (Actually, it would be a lot more useful than the standard fruit salad highlighting lexical elements and reserved words...)

That sounds somewhat like

That sounds somewhat like the .NET tail.call implementation. It's slower than a normal call by perhaps 6x, so it is not frequently used unless it is necessary for semantic correctness (unbounded tail call depth).

I wrote about it here about 5 years ago.

I expect a decent implementation - one that genuinely is "goto with arguments" - would be significantly more work.

We are back to my original

We are back to my original assertion that (a) detecting a tail call is trivial (obvious from the CFG), and (b) adjusting the stack pointer and inserting a jmp instead of a call is really not an issue (in the .NET/Java environment). You and others have said that it is considerably more work; can you be a bit more specific about what makes this problem complex?

The Cost of Software

Lua has tail-calls and may be developed with that in mind.

As languages grow, the cost of retrofitting a new feature into a language becomes prohibitively high. Any change to a language can easily take half a year of discussions and a month of work on the language specification document.

So, say Sun (or whoever) would like to support tail-calls in Java. The cost:

- Half a year of discussions.
- A month to change the language document and get the change out to other compiler developers.
- A month of work to the compiler.
- A month of work to the debugger.
- A month of work to the hot-spot compiler.
- Three months of work to build tests, tests, tests.
- A month or two to get the message out on how to use this effectively.

Your 'simple' change I estimate at at least a year or a year-and-a-half of work, i.e. 100,000.- to 150,000.- Euros (not including any overhead, and assuming a very light-weight process).

And that for something that 80% of developers will perceive as a non-issue.

[ Actually, if I include the fact that you need to involve managers, pull developers off other projects, account for illness and for developers who may have a hard time understanding TCO, the fact that some analyses may not fit the given infrastructure, secretarial support, and the cost of pushing this through the entire engineering process, I might be off by a factor of three.

Worst case, you're asking for half a million bucks, not a feature. Heck, who wouldn't like half a million bucks? ]

Story I once heard Ungar tell

Is that Gosling would not put a feature into Java unless at least two people asked for it, no matter how much sense it seemed to make. As a result, a lot of the nice implementation ideas from Smalltalk and Self never made it into the language, despite the fact that he had people telling him how great the features were. Unless at least two users DEMANDED the feature, he didn't care.

Talk about a "very light-weight process"!

Truth is, I have no feeling

Truth is, I have no feeling for how big the development effort in Java is now. I mean, like most languages, it started as a one- or two-person project. Now, I assume there's a steering committee which includes some people from the field and from co-developers like IBM, and I assume that some people are responsible for the different parts of the compiler, different implementations, libraries, regression tests, etc.

Clueless.

When I left Sun a year and a

When I left Sun a year and a half ago there were roughly 18 people on the HotSpot team, down from an average around 20. The JRockit team was at least as big. IBM had more than 100 people working on J9. That's just VMs!

Thanks for that comment.

Thanks for that comment. Good, that makes me somewhat confident that the numbers in my analysis aren't too far off.

You're assuming: (a) that a

You're assuming: (a) that a CFG will be available and built in all scenarios, in interpretation, naive compilation and deep profile-guided compilation, for methods of arbitrary size; (b) that the marginal cost of scanning nodes looking for viable tail calls is negligible; (c) that calculating the correct shuffling of arguments, local parameters, return location etc. on the stack is trivial, both in R&D and at runtime, in both time and space; (d) that such shuffling will have no repercussions with GC and asynchronous exception handling which may need to traverse the stack at any moment. And don't forget the ongoing maintenance cost, QA cost, etc.

But more important than any of these, is the opportunity cost. What features are not implemented because efficient tail calls are chosen for implementation instead? In a commercial environment, it's not enough that a feature has positive value and is technically possible or easy to implement. It must have the highest risk-adjusted return on investment as evaluated by the customer. The customers of commercial VMs are not crying out for tail calls.

Customers also complain loudly about losing features they take for granted. Full and correct call stacks for diagnostic logging in retail builds are one such feature. An existence proof of a niche language with customers who don't complain isn't sufficient.

All that said, I'm a fan of tail calls. I think they're a boon for expressing certain algorithms, especially state machines. But if you're asking why, in the real world, a particular situation pertains, you cannot argue against it on a CS theoretical basis. You need to take into account business, marketing, education and social factors.

The issue, as I understand

The issue, as I understand it, is that to implement a tail call on a contiguous stack, you must generate the parameters for the new call, which generally requires intermediate values or parameters of the current activation record as arguments. Then you rewrite the stack with the new parameters, which involves some copying. Then you move the program counter to execute the call.

The intermediate step of rewriting the parameter stack can be a minor performance hit, especially if we have mixed-size structures on the stack (in which case the implementation might need to worry about overlapping regions).

Assuming there are no stack-reflection semantics (as seen in Java), there's no reason we couldn't turn every call into a tail call. But many people would object to the overheads.
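
(An interpreter-flavored C sketch of that rewriting step, with invented names; real codegen would do this to machine-level frames. Building the new arguments in a temporary before copying them over the old ones is the simple way to sidestep the overlapping-region problem mentioned above:)

#include <stdio.h>
#include <string.h>

/* Sketch of frame reuse for a tail call: the new arguments are computed
   into a temporary, copied over the current frame's argument slots, and
   control jumps back to the top instead of pushing a new activation
   record. */
#define MAX_ARGS 2

static long run_gcd(long args[MAX_ARGS]) {
  for (;;) {                              /* "goto with arguments" */
    long a = args[0], b = args[1];
    if (b == 0)
      return a;                           /* an ordinary return */
    long next[MAX_ARGS] = { b, a % b };   /* parameters of the tail call */
    memmove(args, next, sizeof next);     /* rewrite the frame in place */
  }
}

int main(void) {
  long args[MAX_ARGS] = { 252, 105 };
  printf("%ld\n", run_gcd(args));         /* prints 21 */
  return 0;
}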

Problems with C

I can see that varargs would pose a problem in C

Varargs can be solved with a calling convention that's a bit different from the usual.

The big complication with C is that pointers are allowed to alias local memory for as long as a call is active. Here's a simple example that would choke under a straightforward tail call optimization.

int baz(int *quux);

int foo(void) {
  int bar = 42;
  return baz(&bar);
}

int baz(int *quux) {
  return *quux;
}

A tail call from foo to baz would throw away bar but baz needs bar's memory to hang around.*

Doing tail calls in C requires enough data flow analysis to preclude such aliasing. It would generally have to conservatively prevent tail calls from being optimized any time it couldn't prove that no local memory addresses escape.
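
(For contrast, a variant of the same toy example, with invented names: pass the value rather than its address, and nothing local escapes, so such an analysis could approve the tail call.)

int baz2(int quux) {
  return quux;
}

int foo2(void) {
  int bar = 42;
  return baz2(bar);  /* no pointer into foo2's frame escapes, so the
                        analysis can approve reusing the frame here */
}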

See Mark Probst's thesis Proper Tail Recursion in C for more than you ever wanted to know on the subject.

* A real C compiler would almost certainly inline my toy baz function, rendering tail calls moot. Ignore that inconvenient fact, please.

All of what you say here is

All of what you say here is valid, but it's also interesting to explore C as a compile target. Dave Tarditi did some nice work on using C efficiently as a target language in cases where the source language requires tail call elimination.

Already discussed...

Thank you. I missed that.

I also missed this discussion about why OO requires tail calls.

It's all about guarantees

Dave Herman raised this point in the OO thread, and I think you'll find it is a theme in almost all of these discussions: When talking about tail calls you need to decide, at a fundamental level, whether they are a guarantee your implementation gives to programmers, or merely an optimization that an implementation might apply.

If you think of tail calls as "just an optimization," then the only question you need to ask is whether it provides enough bang for your buck. I suspect many language implementors are concerned that TCO wouldn't improve the performance of enough programs to be worth the implementation cost (as marco notes above), or else are concerned that an optimization that may change asymptotic space bounds could lead to confusion ("my program runs fine in a release build, but overflows the stack in a debug build?").

If you want "proper tail calls" to be something that the language guarantees, then you need to figure out how you can document those guarantees in your language specification with enough precision that programmers can have confidence that their programs will work as intended. I'm not claiming that specifying this behavior is impossible, just that it is more tricky than one might casually expect. If your specification is loose on this point, then it is really little better than leaving tail calls as an "optimization."

As an interesting example of how difficult it can be to get tail calls "right," consider the following function in Dylan:

define method foo( n :: <integer> ) => ( result :: <integer> )
    let f = foo; // store to a variable of type <object>
    if (n <= 10)
        n;
    else
        f( n - 10 );
    end if
end method

Is the recursive call through f subject to tail call optimization? It depends.

A naive compilation of foo will recognize that it must return a value of type <integer>, and that it might return the result of calling f. But f is statically typed only as an <object>, so we don't know what type it might return when called, and thus a dynamic type-check must be inserted after the call. Thus, for a naive compiler, this is not a tail call.

A more advanced compiler might copy-propagate the value of f and thus recognize that the call is to a known function, which returns a value of the expected type. But now we have a situation where the presence or absence of a tail call depends on implementation details of the compiler, and so a programmer can't really rely on tail calls without detailed knowledge of the language implementation.

I find this question

I find this question of optimisation versus guarantee very important when discussing my language of choice, Common Lisp. In CL as well, determining whether a call is in tail position can be very difficult. Some people have suggested that we add a new special form that would be (semantically) the same as FUNCALL, but would warn at compile time if the compiler has not managed to put it in tail position.

Does anyone have thoughts on this compromise?

Stack Allocated Objects

Good points, but I would be more worried about breaking invariants in the now-very-expensive industrial compilers of Java or C. James Iry's example generalizes to the concern that stack-allocated objects might go out of scope prematurely. So, implementing TCO might also mean that you need to do some difficult liveness analysis, and you might break invariants which make current optimizing (hot-spot) compilers run fast.

So, you end up investing a lot of time and money at the possible cost of throwing away a lot of already invested time and money and breaking current optimizations.

The next release of Java might have it. But then the convincing arguments for doing it would probably be that most hot-spot compilers already implement a form of TCO and that it removes some competition from the market. Not that it is such a nice feature to have (since imperative programming did well for the last fifty years without it).

Imperative programming did well for the last fifty years?

I think that's ... debatable. (Not that I think TCO is a silver bullet)

Well, look at your PC and try to visualize

Well, look at your PC and try to visualize the lines of code running on it. Now, try to guesstimate how much of that code is imperative (maybe OO), logical, or functional.

Yeah, imperative did well. It's a miracle it works, but it did well.

I think how well imperative

I think how well imperative did over the last 50 years will be better judged looking back on it in another 50 years. I certainly hope the consensus will be that it did well in the same sense that rocks and sticks did well for early man. Also, I said only that it was debatable, and I have now debated it with you. Q.E.D.

Essentially contested concept

I am sure we hit an essentially contested concept somewhere. ;-)

Meaning of "Did Well"

Right. Remember that "did well" can mean "was popular" or "served the purpose it was put to well" / "performed admirably".

Missionaries went to the South Seas to do good ...

... and ended up doing very well. :-)

More General

It was more a comment on functional vs imperative, TCO, and in general how discussions on LtU tend to proliferate widely.

I sometimes wonder whether a discussion-style board is a good way of getting a clear picture. It feels to me that Wikipedia, for instance, is way better at implementing a 'Socratic' process.

Ah, but how many of those

Ah, but how many of those 'imperative' lines are trying (through frameworks, libraries, self-discipline, etc.) to be functional, reactive, grammars, or some other model? ;-)

[addendum] We shouldn't measure the success of a paradigm by how many lines of code it takes to get a job done. (Or, at least, we should be inverting that number!) But I'll grant that even FONC is primarily imperative/procedural.

Why restrict TCO to tail recursion?

I agree that foo is not "obviously" recursive, but that is (or should be) irrelevant, because the call is in tail position anyway and should be optimized.
Scheme and such languages (ML, Haskell) guarantee that any tail call consumes no stack space - a common motivation is mutually recursive functions, but your example is another good reason, even if I've never seen it discussed. Defining whether a call is a tail call is much simpler than deciding whether it is recursive.

Now, I know no Dylan, and maybe it optimizes only tail recursion (I don't know, I'm just trying to understand your statement). That would however be broken. Such languages exist, I think, but they are broken (Programming Languages: Application and Interpretation, by S. Krishnamurthi, discusses such broken cases without giving examples, IIRC).
Scala has this problem, but that's due to the JVM not supporting tail calls, so it's excusable.

In this case...

...in Dylan, the type annotations are dynamically-checked. As a result, the return type annotation means that the result has to be checked, which means that something that textually looks like a tail call might not be: there are additional dynamic checks to be done before the return.

There is a similar issue in contract-checking schemes, such as those found in Racket. You could probably ask them to learn more about the pragmatics of programming with contracts and tail calls.

There is no such issue

There is no such issue: Scheme is "garbage collected" and has full tail call optimization.

The problem with Java and tail call optimizations seemed to be about the security model, but this is already solved:

Tail Call Optimization in the Java HotSpot™ VM

There's also this request for enhancement:
http://bugs.sun.com/view_bug.do?bug_id=4726340

And it seems there's work on progress in this area:
http://wikis.sun.com/display/mlvm/TailCalls

Scala and F#; debugging experience

Real-world relevance
The whole discussion forgets that functional languages do run on both the JVM and .NET - Scala, Clojure, F#. F# is the reason why .NET does have tail calls.
I program in Scala, where only tail recursion is (often) optimized (the compiler rewrites the bytecode to produce a loop where possible). And I've coauthored realistic programs where TCO matters: a C parser using parser combinators. The sequencing parser is based on tail calls.

CPS per se is only useful for functional programmers, maybe, but continuations are a useful feature even in real-world cases (e.g. for web programming, as shown e.g. by S. Krishnamurthi), especially when it's the compiler that performs the CPS transformation and not you.

Optimizations must be transparent during debugging
Another point about implementation issues: from Self onwards, virtual machines hide semantic changes during debugging, unlike in C: functions are inlined, yet you don't see that. And debugging has no performance cost until you hit a breakpoint - if you enable remote debugging, the performance is the same until you actually attach the debugger (so you could always have remote debugging active).

If you debug optimized C (possible with GCC, I did that for a living - not possible with most other compilers), that's very apparent - the current line jumps back and forth across lines and inlined functions in a seemingly random way.

Thus, to avoid decreasing implementation quality, the VM authors would HAVE TO implement perfect debugging support - "by simply emitting notes to a debugging log that supports unwinding" - and without performance impact (impossible). Maybe not hard, but it needs work, and it makes debugging more expensive (in terms of performance and memory consumption). And let's face it, if you debug a Scheme program it would sometimes be useful to access the complete stack trace, including removed frames (for the same reason that omniscient debugging is useful for languages with mutation).

The alternative is to distinguish tail-calls from other calls, like in .NET, so that tail-calls are optional. Scala, Clojure etc. would probably always use tail-calls and the programmers would have to be aware of that.

Side Note

CPS per se is only useful for functional programmers, maybe, but continuations are a useful feature even in real-world cases (e.g. for web programming, as shown e.g. by S. Krishnamurthi), especially when it's the compiler that performs the CPS transformation and not you.

JBoss Seam emulates CPS-like state management for web applications through reflection and keeping a heap (I think represented as a stack) of "State" corresponding to the application at various scopes (form, page, session, app, global). My knowledge of JBoss Seam is a little rusty since I haven't touched it in about 4 years now. But Java programmers *are* using these ideas, even if in a bizarre form.

In terms of scaling this in a functional language (e.g., not running out of memory after 13 hours of load), people like Jay McCarthy have done a good job making it pragmatic.

Optimization, or not?

From the point of view of a pragmatic programmer who is actually using a language that guarantees tail call optimization, it can be vitally important to know whether it is expensive or cheap.

It's not uncommon for tail calls to be implemented in an expensive way, or to sit on an underlying runtime that makes them difficult, and to find that your tail calls are ten times more costly in time than standard procedure calls. It's also not uncommon to have a compiler that reliably reduces tail calls to gotos and to have them be ten times as fast as a standard procedure call.

And knowing which environment your program will run in means you have a factor of a hundred in performance in its favor or against it. That's significant. Even when you have written a "correct" program, a factor of a hundred in runtime is often something that can render it useless.