Multi-Return Function Call

Olin Shivers and David Fisher. Multi-return Function Call. In Proceedings of ICFP '04.
It is possible to extend the basic notion of "function call" to allow functions to have multiple return points. This turns out to be a surprisingly useful mechanism. [...] We conclude that multiple-return function call is not only a useful and expressive mechanism, both at the source-code and intermediate-representation level, but is also quite inexpensive to implement.

Especially interesting are the cases where a tail call can actually shrink the stack.

Useful indeed

I believe "Towards Leakage Containment" provides useful additional backround for this paper.
"Events and Continuations" from LtU archive can be relevant, as well (yes, it is a shameless plug).

Disappointed by "Towards Leakage Containment"

I enjoyed "Multi-Return Function Call" and decided to read "Towards Leakage Containment" as advised. I found the paper enjoyable and stimulating, agreed with the premises, but finally found out that their solution doesn't actually solve the problem they claim to solve.

(Remark: while Multi-Return Function Call introduces new concepts into the language, with not-entirely-obvious semantics, this paper, which discusses similar questions, is written entirely in plain Scheme, passing multiple continuation functions to the callee instead of letting it do a multi-return. In some sense Multi-Return Function Call can be seen as a "direct style" translation of the "Towards Leakage Containment" programming style.)

The problem the paper addresses is a duplication of code concerning the passing of control flow from a called function back to its caller. Functions return a value, and the caller decides, based on the returned value, what to do next. In some cases the value returned is actually an encoding of the control flow of the callee, only to be analyzed by the caller and turned back into control flow. For example, a function with a failure case may return, in Haskell terms, a `Maybe a` result (one conditional branch returns `Nothing`, the other `Just result`), and the caller then does a pattern match to branch control flow according to the value. The two-step conversion control flow -> value -> control flow is redundant, and the authors criticize it. Yet, they say, they're not primarily concerned with efficiency, but with clarity of code.
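For concreteness, here is a minimal Haskell sketch of that pattern (the `safeDiv`/`describe` names are my own hypothetical example, not code from either paper): the callee reifies its branch into data, and the caller immediately decodes the data back into a branch.

-- Callee: the internal branch is reified into a Maybe value.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing             -- failure branch becomes data
safeDiv n d = Just (n `div` d)    -- success branch becomes data

-- Caller: the data is immediately turned back into control flow.
describe :: Int -> Int -> String
describe n d =
  case safeDiv n d of
    Nothing -> "division by zero"
    Just q  -> "quotient is " ++ show q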

Their solution is to ask the caller to hand the control flow that will follow the callee's return directly to the callee, in the form of continuations. The callee is therefore responsible for branching control directly instead of returning a value; for example, the function above would accept a "failure" and a "success" continuation (the first with no parameter, the second with one; they emphasize that this is not the always-one-parameter convention of CPS). They provide some more examples where similar techniques get rid of the intermediate value.
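A corresponding sketch of the continuation-passing version they advocate (again hypothetical Haskell names, and Haskell rather than the paper's Scheme): the caller supplies the two return paths up front, and the callee branches control directly. Here the "failure" continuation is a zero-parameter continuation (just a result value), while the "success" continuation takes one parameter.

-- Callee: branches control directly instead of building a Maybe.
safeDivK :: Int -> Int -> r -> (Int -> r) -> r
safeDivK _ 0 onFail _    = onFail              -- "failure": zero-parameter continuation
safeDivK n d _      onOk = onOk (n `div` d)    -- "success": one-parameter continuation

-- Caller: the two branches are passed at the call site; no Maybe is ever built.
describeK :: Int -> Int -> String
describeK n d =
  safeDivK n d
    "division by zero"
    (\q -> "quotient is " ++ show q)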

The flaw, however, is that the initial duplication is still present. It is no longer encoded in the analysis that the caller performs on the return value, but in the calling convention of the callee (which continuations should we pass?). The claimed problem of having the control-flow structure reproduced in two places is not solved; of course, it is a fundamental fact that the caller and callee must agree on some structure for handling the callee's return. Regarding clarity, passing continuations may be clearer than a series of tests on the result values, but it is certainly less readable than ML-style pattern matching; it would be better if continuations were passed as named arguments, or as fields of a named struct/record.

I think the important idea of the paper is that control flow needs to be structured, exactly as data is. Programming languages provide elaborate ways to structure data (algebraic data types), but control flow is still relegated to the "a continuation is just a function" view, the equivalent of Church-encoded data types. So when we want to pass "control structure" when a function returns, we reify it into a data structure, only for the caller to deconstruct it and turn it back into control flow. We need to think about more structure for the control flow. Maybe it would be possible to automatically derive both a data structure and a control structure from the same algebraic definition.

I do not expect it to add clarity, however (pattern matching is pretty damn good at turning structure into control flow), and it would possibly add complexity. But it also removes a layer of inefficiency. As the authors note, this is a manual equivalent of the deforestation transformation. But the big advantage here is that it preserves function boundaries, so it can be applied locally by the programmer, without changing the program structure. Lately, I've been considering the idea of a language where all interesting optimizations, even relatively low-level ones, could be expressed as source-to-source transforms, and I think such work would be interesting from this perspective.

Control Flow

We need to think about more structure for the control flow.

Control flow considered harmful.

In some cases the value

In some cases the value returned is actually an encoding of the control flow of the callee, only to be analyzed by the caller and turned back into control flow. For example, a function with a failure case may return, in Haskell terms, a `Maybe a` result (one conditional branch returns `Nothing`, the other `Just result`), and the caller then does a pattern match to branch control flow according to the value. The two-step conversion control flow -> value -> control flow is redundant, and the authors criticize it. Yet, they say, they're not primarily concerned with efficiency, but with clarity of code.

I've also been wondering about this, although I have not read either paper in detail. The Multi-Return paper devotes a section to sum types, in which they outline a duality between "product types in continuation space" and "sum types in value space". But given that such a duality obtains, it's not clear how the proposed constructs (multiple continuations and multi-return) can improve clarity of code, compared to the use of ML-style sum types.
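For what it's worth, that duality can be written down directly in Haskell (a small sketch with hypothetical names, not the paper's notation): a consumer of a sum in value space is interconvertible with a product of continuations.

-- A handler for a sum in value space ...
toProduct :: (Either b c -> r) -> (b -> r, c -> r)
toProduct k = (k . Left, k . Right)

-- ... corresponds to a pair (product) of continuations, and vice versa.
toSumHandler :: (b -> r, c -> r) -> (Either b c -> r)
toSumHandler (f, g) = either f g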

They don't improve clarity but efficiency

They don't improve clarity, they improve efficiency. Without this solution, you convert control flow into a data structure callee-side, then inspect the data structure to recreate the previous control flow caller-side. This means data allocation, unless your compiler does a worker-wrapper inlining transform and removes the redundancy for you.

I think it is a Good Thing that the language can express, at the source level, the result of an optimization without requiring non-local changes. Section 6 of the Multi-Return Function Call paper, about a "control-flow if%", is particularly interesting in this respect. The authors present a "low-level if" primitive that, instead of returning values depending on a boolean, turns the boolean into control flow. They show that several important local compiler optimizations, usually performed on a lower-level intermediate representation, can easily be expressed at the source level.

It doesn't improve clarity (and therefore the "Towards Leakage Containment" paper is rather misguided, in my opinion) because you actually express the same thing. With the sum-type solution, you write something that says "build this data", then "depending on the data, do this or that". With some form of control-flow-level solution, you have to write something that says "jump to the return point that I was given at call time", then "call the function, and depending on the return point chosen, do this or that". It's really the same thing and could be expressed with similar notation. I would be very interested in an in-language way to convert one into the other.
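As a small hedged sketch of what such an in-language conversion could look like (hypothetical Haskell names, using a Church-style encoding as the "control-flow-level" side of a Maybe):

{-# LANGUAGE RankNTypes #-}

-- value style -> control style: run the caller-supplied return points on the data
toControl :: Maybe a -> r -> (a -> r) -> r
toControl m onNothing onJust = maybe onNothing onJust m

-- control style -> value style: rebuild the data by passing the constructors as return points
toValue :: (forall r. r -> (a -> r) -> r) -> Maybe a
toValue k = k Nothing Just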

This sounds very much like

This sounds very much like Smalltalk's approach of reifying control-flow constructs as methods, i.e. 'if' is a message sent to a Boolean with two blocks for 'then' and 'else'.

Are there advantages to the control-flow approach that cannot be obtained automatically with a deforestation transformation on sums? Seems like these simple cases can be automatically inlined, in which case, why not support the simplest solution + supercompilation.

I might be wrong, but

I might be wrong, but compared to the (pretty lovely) Smalltalk use of blocks (closures) and Boolean methods for control flow, a multi-return function call would, say in a translation to Smalltalk-style code, constrain the treatment of the blocks (closures), allowing for easier optimization via direct jumps to multiple return addresses, also interpretable as constrained multiple continuations.

Thinking about it now, I suppose this is a language simplification and optimization strategy more than any kind of "semantic revolution".

The advantage is that the

The advantage is that the programmer is in control of the precise operational semantics of their code, instead of having to rely on compiler transformations to get the desired effect.

Supercompilation is not so easy to achieve, and it is not reasonable to assume its existence as a language design hypothesis. This is an instance of the "sufficiently smart compiler" bias.

Compiler optimizations, for example, have difficulty crossing module boundaries in the presence of separate compilation, and without breaking encapsulation. Programmer-side control-flow control has no issues with module boundaries.

Finally, even if programmers don't usually use such "low-level detail" language features (e.g. they would only be used deliberately in critical loops and the like), and even though they complicate the language a bit with no net increase in expressivity, one good thing about having them in the source language is that they let those compiler optimizations be performed at the source level. I think the ability to perform a wide range of optimizations at the (desugared) source level is a good thing, as it is a well-specified layer where transformations should be easy to prove correct, and where a lot of semantic information is available. Cf. the optimizations performed on Core in Haskell compilers.

Don't need sufficient smarts, dumb will do

Re: module boundaries, control-flow control would seem to be limited to a single level of optimization, though, where supercompilation-style optimizations could collapse multiple levels. Even if we trivialize the optimization to collapsing only one level, we've already matched the efficiency of control-flow control. This seems eminently doable, and doesn't require a "sufficiently smart compiler", so what am I missing?

Re: separate compilation, this is a legitimate complaint, though increasingly the days of native compilation seem numbered. The optimizations are just too limited given the increasing levels of abstraction, and link-time or run-time optimization seems the way forward (à la LLVM or a JIT).

What is this?

Could someone explain what a multi-return function call is? The paper gives an example, but I still don't really understand what it is.

Explanation

The paper defines a form multi exp r1 ... rn, which evaluates an expression exp, resulting in a value v. The multi form provides one or more possible "return points" for exp, i.e. r1 through rn. The expression exp can specify which of those return points to return to, using a return point specifier of the form #n. If the evaluated expression doesn't specify a return point, then the first return point is used.

A return point can either be a lambda expression, or a return point specifier (#n). If a return point is a lambda expression, it is applied to the value v mentioned above. If a return point is itself a return point specifier of the form #n, then v is returned to the n'th return point in the calling context.

It might be easier to explain it in code than English, although the following requires familiarity with Scheme. Here's a (naive) definition of a multi macro which implements multi-return function calls:

(define ret-mark '(ret-mark))
; The following 'ret' function is overloaded in a hackish way:
; (ret n) returns a return point specifier, can be used in multi expressions
; (ret n v) returns value v to return point n, used in the multi implementation
(define (ret n . v) (cons ret-mark (cons n v)))
(define (ret-mark? m) (and (pair? m) (eq? (car m) ret-mark)))
(define ret-index cadr)
(define ret-value caddr)

; apply-ret: apply a return point to a value
; returns either a plain value, or a new return point containing the value
(define (apply-ret ret-pt v)
  (cond
    ((ret-mark? ret-pt)  ; ret-pt is a return point specifier
     (let ((i (ret-index ret-pt)))
       (if (= i 1) v     ; support non-multi contexts, return plain value
           (ret i v))))  ; return v to i'th return point in caller
    ((procedure? ret-pt) ; ret-pt is a lambda expression
     (ret-pt v))         ; apply it to v
    (else 
     (error (format "Invalid return point specified: ~a" ret-pt)))))

; the multi macro
(define-syntax multi 
  (syntax-rules ()
    ((multi exp r1 ...)
     (let ((v exp)
           (ret-pts (vector r1 ...)))
       (if (ret-mark? v)
           ; the expression returned a value tagged with a return point specifier
           (let ((rn (ret-index v))  ; return point index
                 (rv (ret-value v))) ; returned value
             ; apply return point to value
             (apply-ret (vector-ref ret-pts (- rn 1)) rv))
           ; otherwise, the expression returned a plain value, so apply r1 to it
           (apply-ret (vector-ref ret-pts 0) v))))))
This definition allows us to implement the filter function given in the paper:
; filter example from paper
; code layout matches paper's figure 1 line for line
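; note: match-lambda is not standard R5RS Scheme; it is assumed to come from a
; pattern-matching library (e.g. Andrew Wright's match), which supports the
; quoted-literal and dotted-pair patterns used below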
(define (filter f lis)
  (letrec ((recur (match-lambda ('() '())
                              ((x . xs)
                               (if (f x)
                                   (multi (recur xs)
                                          (ret 1)
                                          (lambda (ans) (multi (cons x ans) (ret 2))))
                                   (multi (recur xs)
                                          (lambda _ (multi xs (ret 2)))
                                          (ret 2)))))))
    (multi (recur lis)
           (lambda _ lis)
           (ret 1))))
Finally, here's a test which demonstrates that the function achieves the desired goal of "[sharing] the longest possible tail between input and output":
(define lis '(3 7 12 4 13 7 8 6 10))
(define ans (filter even? lis))
ans ; => (12 4 8 6 10)
(eq? (list-tail lis 6) (cddr ans)) ; => #t
Of course, implementing multi-return functions with a macro (especially one as naive as the above) won't necessarily achieve some of the machine-level efficiencies described in the paper. However, the macro is useful for experimenting with the concept, without having to modify a compiler!

Alternate Explanation

Here's a different angle that may or may not be helpful: Suppose you had a case statement that chose one of several cases based on the return value of a function. With a multi-return function call you could have the called function jump straight back into the appropriate case, without having to return a value and perform comparisons. If any of the cases just returned on up the stack, you could instead have the called function jump directly up the stack to where it needed to go.

By itself this isn't too special. But as the paper demonstrates, it opens the door to some interesting optimizations, like stack-shrinking tail calls, or non-tail calls that run in constant space. It also allows for "leakage containment" in the sense of the paper that Andris mentioned.

Re: Tuple

Is this essentially a way for the caller of a method call to yield multiple execution paths from a single call, without having to manually select the specific sub-type of the method call's result?

Sounds like you could almost use a tuple of N items designating which specific yield point the return site of the call refers to.

It'd be fairly trivial at that point to enable language support; the difficult part would be optimization, though I guess a simple switch across the potentially yielded return points would do (a return-point indicator versus a type check; switches are pretty quick).

I should bookmark this for later reading, since I don't have time right now (I'm basing my comments off of the basic concept of 'multiple returns'.)

FORTRAN

Equally interesting is that this is essentially FORTRAN's (now obsolescent) alternate return statement.

Just trying to get my head

Just trying to get my head around this: I'm asking myself whether this would be a generalized way to implement control structures such as Icon's concept of failure.

cf. Arrows

Yes, this sort of 'multiple-continuation-passing style' can be used in general to model control structures. I've ended up doing something similar to improve parallelism and efficiency for arrows-based composition:

data A a d r where
  Aprim  :: a d r -> A a d r
  Aident :: A a r r
  Acomp  :: A a r' r -> A a d r' -> A a d r
  Apure  :: (d -> r) -> A a d r
  Aprod  :: A a b c -> A a b' c' -> A a (b,b') (c,c')
  Asum   :: A a b c -> A a b' c' -> A a (Either b b') (Either c c')
  ...

Roughly, 'Acomp' corresponds to sequencing, 'Aprod' corresponds to parallel behavior, and 'Asum' corresponds to statically structured choice (e.g. conditional expressions). However, Aprod and Asum are both optimized forms, avoiding actual construction of pairs or analysis of Left/Right, supporting composition across multiple continuations. I have several more optimized forms (Aconst, Afst, Asnd, etc.) but they aren't critical for the illustration.

Essentially, I can run the following optimization:

simplify (Acomp (Asum f g) (Asum f' g')) = Asum f'' g''
  where f'' = simplify (Acomp f f')
        g'' = simplify (Acomp g g')

or (f' +++ g') >>> (f +++ g) = (f' >>> f) +++ (g' >>> g), if expressed with Haskell's arrow operators.

This particular simplification corresponds closely to the benefits achieved with 'multi-return function calls' - i.e. allowing us to eliminate many intermediate decisions on whether a value is true or false (expressed as 'Left' or 'Right'), and potentially even clear the unused path from memory much earlier (depending on the nature of the arrow).

A similar optimization also applies for parallel composition, Aprod, which was the main reason I pursued this approach (to compose asynchronous distributed behaviors).
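Presumably that clause has the same shape as the sum case (my guess, reusing the constructors from the sketch above). Note that fusing products interleaves the two sides' effects differently than running the two stages one after the other, which is presumably acceptable here precisely because the composed behaviors are meant to run in parallel:

simplify (Acomp (Aprod f g) (Aprod f' g')) = Aprod f'' g''
  where f'' = simplify (Acomp f f')
        g'' = simplify (Acomp g g')

or (f' *** g') >>> (f *** g) = (f' >>> f) *** (g' >>> g), in arrow-operator terms.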

I've used this technique

I've implemented multiple return points and used them to do type dispatch and error handling (which are the same thing if you consider values of type "error" as possible return values from a function) in a hobby lisp implementation.

For example, the real-division function can have five possible return types when dealing with real numbers: Bignum (when both args are bignums), Integer (when one argument evenly divides the other), Rational (exact non-integer returns), Inexact (approximate returns), and Error (division by zero). When the calling function has five different return points and the callee can return to the appropriate point, the caller no longer has to do type dispatch. This is important because type dispatch is often the only thing that the caller has left to do. In that case, if we can pop it off the stack with the callee's call frame, then each type of return becomes an eliminable tail call.
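As a rough sketch of the shape (hypothetical Haskell names, with only three of the five return kinds for brevity): the caller supplies one return point per result kind and never dispatches on a tag afterwards.

import Data.Ratio ((%))

-- Hypothetical sketch: one continuation per possible result kind.
divideK :: Integer -> Integer
        -> (Integer  -> r)   -- integer result: divides evenly
        -> (Rational -> r)   -- exact non-integer result
        -> (String   -> r)   -- error: division by zero
        -> r
divideK _ 0 _     _     onErr = onErr "division by zero"
divideK n d onInt onRat _
  | r == 0    = onInt q
  | otherwise = onRat (n % d)
  where (q, r) = n `divMod` d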

But it's not original with me. I got it (or at least the kernel of it) from Manuel Serrano's STALIN Scheme implementation. He uses this technique to do type dispatch on calls, but not on returns. My implementation uses call frames on the heap instead of call frames on the stack as in STALIN. In Scheme, tail recursion elimination is a language requirement. When you have call frames on the heap, calls and returns have the same semantics and so can be handled by the same code. So it sort of fell out as a mere consequence of optimization.

I hadn't thought about it to the extent that this paper does. It's good work!