Function arity with currying and call-by-push-value

In λ-calculus, all functions are unary, and n-ary functions are encoded through currying. But knowing the arity of a function can be useful for optimization: a curried function starts computing only after it has been applied to at least N arguments, and this N can be useful to know.
It seems that call-by-push-value (CBPV) allows dealing with arities in a convenient way, as discussed below, though that was not its main design goal. My questions are:

  1. Is this true?
  2. Is this already known?
  3. Would you recommend call-by-push-value, or are there downsides which make it unsuitable?

Below I explain the details.

Applications of arity:
(1) In GHC, arity is used to treat partial applications differently: a function body only computes once enough arguments have been supplied (in simple cases, that is, barring some optimizations, as many arguments as appear on the left-hand side of the equations defining the function). See the small example after this list.
(2) In my case, I'm working on a program transformation: transforming a function f gives a function f* that returns f's final result together with all its intermediate results. This can be useful for incremental computation (along the lines of Liu and Teitelbaum (1995)). Here, too, it seems silly to return the intermediate results of a partial application.
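
To make point (1) concrete, here is a small Haskell example (my own, not taken from GHC's documentation). f is defined with two arguments on the left-hand side, so GHC gives it arity 2, and applying it to one argument builds a partial application without running the body:

f :: Int -> Int -> Int
f x y = sum [1 .. x] + y   -- the body runs only once both arguments are present

g :: Int -> Int
g = f 1000000              -- a partial application: no summation happens here

main :: IO ()
main = print (g 1 + g 2)   -- f's body runs (twice) only at this point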

But function arity is a tricky concept in λ-calculus. For instance, id should be unary and Haskell's ($) should be binary, yet ($) is defined to be equal to id:

($) :: (a -> b) -> a -> b
($) = id

So, a notion of arity seems tricky, and one of my colleagues keeps repeating that it's a bad idea.

Today, while reading a paper using CBPV (Hammer et al. 2014), I got a hunch that the call-by-push-value λ-calculus allows for a natural notion of arity. But Googling does not find this spelled out anywhere, maybe because CBPV is typically used in theoretical contexts.

So, why should CBPV help?

(1) It distinguishes values (A) from computations (C): computations are either functions (A -> C), which take values and return computations, or base computations (F A), which wrap values in a "return" expression:

C ::= A -> C | F A

So, the return type of a function is wrapped in the F constructor. In particular, this distinguishes A1 -> A2 -> F A3 (a binary function) from A1 -> F (U (A2 -> F A3)) (a unary function returning another unary function).
U is a type constructor for values representing thunks.
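
To illustrate, here is a minimal Haskell encoding of these constructors (the names are mine; this sketches only the typing discipline, not a faithful CBPV semantics):

newtype F a = Return a   -- a computation returning a value of type a
newtype U c = Thunk c    -- a value: a suspended computation of type c

-- A binary function: no result is returned before both arguments arrive.
type Binary a1 a2 a3 = a1 -> a2 -> F a3

-- A unary function returning (a thunk of) another unary function.
type UnaryTwice a1 a2 a3 = a1 -> F (U (a2 -> F a3))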

(2) Moreover, we can also distinguish partial from full applications: since a full application produces a result of type F A, we need the elimination form for F to use its result.

Thus, after conversion to CBPV, we can distinguish syntactically between partial and full applications.
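
Continuing the sketch above, the elimination form for F (written "M to x. N" in Levy's work) can be mimicked in Haskell, and it makes the distinction visible: a full application must be consumed with `to`, while a partial application is simply passed around:

-- Eliminator for F: run a computation and bind the value it returns.
to :: F a -> (a -> F b) -> F b
Return x `to` k = k x

-- Eliminator for U: run a suspended computation.
force :: U c -> c
force (Thunk c) = c

-- A full application: its result of type F Int is consumed with `to`.
fullApp :: (Int -> Int -> F Int) -> F Int
fullApp f = f 1 2 `to` \r -> Return (r + 1)

-- A partial application: no F is produced, so no elimination form appears.
partialApp :: (Int -> Int -> F Int) -> (Int -> F Int)
partialApp f = f 1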

Questions:
- is this a good way to define a notion of arity?
- would it be better to just introduce multi-argument functions? I suspect not, because those prevent abstracting over arity, while in CBPV you can still write forall A B. (A -> B).
- does the translation have downsides? I imagine that ($) = id transforms to something more complicated in CBPV, involving (the equivalent of) a full eta-expansion first (see the sketch after this list).
- the GHC Core language does have a notion of arity, but as far as I can tell it is not as elegant.
- is there an extension of CBPV to calculi richer than STLC, say to System F? I've heard people say that focusing and polarity run into some form of trouble with System F, but I haven't heard why. Interestingly, Levy's own paper dealing with System F (http://www.cs.bham.ac.uk/~pbl/papers/polynorm.pdf) does not use CBPV, but Jump-With-Argument.
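
On the ($) question: here is a hypothetical sketch (reusing U, F, and force from the sketches above; this is not Levy's official CBV translation) of what ($) = id might become. The translated version is an eta-expansion of id, so its arity of 2 becomes syntactically visible, at the price of no longer literally being id:

-- ($) after a CBPV-style translation: the function argument arrives as
-- a thunk, and both arguments are taken explicitly.
applyC :: U (a -> F b) -> (a -> F b)
applyC f = \x -> force f x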

Bibliography:

Liu, Y. A., and Teitelbaum, T. Caching intermediate results for program improvement. In Proceedings of the 1995 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation (PEPM '95), ACM, 1995, pp. 190-201.
Hammer, M. A., Phang, K. Y., Hicks, M., and Foster, J. S. Adapton: Composable, demand-driven incremental computation. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '14), ACM, 2014, pp. 156-166.


dumb question

I really don't grok this, so pardon the clueless thought: what if it simply was unary but took a list of arguments? Then you could see the length of the list. If you need to restrict it for compile-time type checking, then maybe a form of, e.g., dependent typing could restrict what the list can be.

You can use a tuple there,

You can use a tuple there, yes.

For an intermediate representation it's fine, but then all transformations need to handle multiple arguments. Sometimes that's just a nuisance, sometimes it can be more annoying.

For user programs the matter is different: if that's visible to programs in the language (which it usually is), you lose the standard advantages of currying-by-default; see Scala's Function1 through Function22. In general, many abstractions have to come in variants for different arities. There are various solutions, but currying is a very simple one.
Hiding this from programs, again, takes some thought: if unary and binary functions have different types, it seems obvious that this shows up in the source, right? Unless you transform types to track arity during compilation; but how? CBPV is one clever way of doing it.

Jumbo and L

I think the question of "how many function types are there?" is at the core of Paul Blain Levy's work on "jumbo connectives", but he did not do that work inside CBPV, and I don't think the "optimisation" angle was discussed there. It would also probably not seem surprising to people working with "big synthetic connectives" in focused proof search (e.g. the Imogen papers discuss how focusing efficiently forces us to push a function's arguments into the context all at once), but those don't really talk about effects.

I think the most relevant discussion you'll find is a recent paper by Arnaud Spiwack, A Dissection of L, which details some programming-related aspects that emerge from the work on a family of systems named "System L" (Curien and Herbelin, and recently the PhD thesis of Munch-Maccagnoni) which generalize CBPV. Section 3.5 (page 17) discusses the relevance of (a linear variant of) System L as an intermediate compiler representation, page 22 remarks that chained linear arrows "do not need intermediate closures" (and explicitly notes that this property also appears in CBPV), and page 24 discusses the relation to the efficient handling of curried functions in the ZINC abstract machine.

PS: I'm not sure about your point on "abstracting over arity"; if your system ends up with two different quantifiers for value and computation types, or if those abstractions themselves embed modalities/shifts, that may not work as simply as writing "forall B. A -> B". I would agree that CBPV is a nice theoretical framework to think about these questions, but I wouldn't expect it to work much better in practice than just doing the work carefully in the current GHC representation.

Thanks a lot for the

Thanks a lot for the reference. I guess the link you meant is http://assert-false.net/arnaud/papers/A%20dissection%20of%20L.pdf?

side effects

I think this relates to where side effects happen, and 'F' is effectively Haskell's IO monad.

In functional programming, referential transparency requires that there be no distinction between a nullary function and a value, which this breaks because F A /= A.

More precisely, the paper

More precisely, the paper says that U (F X) (where U is the type constructor for thunk types, which are again value types) can be implemented by M X (if M is a monad).

In functional programming, referential transparency requires that there be no distinction between a nullary function and a value, which this breaks because F A /= A.

Not sure. In CBPV you don't have `v = f`, just `v <- f in rest`, so the fact that you can't inline such a computation doesn't seem a necessary violation of referential transparency (whatever that might be; I understand the term is ill-defined), though I might be wrong here. Also, I'm not sure that's different from monadic code.
Also, CBPV satisfies more eta-rules (all of them) than either CBV or CBN.

Handling arity

Arity can certainly be handled: the broader question is, is the handling good for the kind of analysis you want to do?

A simple trick for the ($) puzzle (i.e., a definition of arity for curried functions that is close to our intuition about the arity of uncurried functions and works for your example): define the arity of a type to be a 'number' in an extension of the natural numbers (in fact, Scott's lazy naturals): * if the type is a type variable, 1 if it is any other non-function type, and 1 + arity(b) for a function type a -> b. Then the arity of ($) is 2+*, while the arity of id is 1+*. Via substitution, any term whose type has arity m+* might be specialised to a term whose type has arity n or n+*, for m <= n. This fits with ($) being a specialised version of id.
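
A minimal Haskell sketch of this measure (the names are mine), with arities as lazy naturals ending either in a concrete 1 or in a *:

-- Types: variables, base types, and arrows.
data Ty = TyVar String | TyBase String | Ty :-> Ty

-- An arity: a stack of successors ending in 1 (a non-variable,
-- non-function type) or * (a type variable, which may later be
-- specialised, including to more arrows).
data Arity = One | Star | Succ Arity deriving Show

arity :: Ty -> Arity
arity (TyVar _)  = Star
arity (TyBase _) = One
arity (_ :-> b)  = Succ (arity b)

-- arity of ($) :: (a -> b) -> a -> b  is  Succ (Succ Star), i.e. 2+*
-- arity of id  :: a -> a              is  Succ Star,        i.e. 1+*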

CBPV simplifies the definition of arity, because the arity of computation types can never be of the form n+*. Using CBPV seems overkill if this is your only concern.

Qs and As

1. Is this true?

Yes.

2. Is this already known?

It's "well-known" in the mathematical sense, meaning that there exist people who know it. :)

3. Would you recommend call-by-push-value, or are there downsides which make it unsuitable?

CBPV and related calculi (like the enriched effect calculus) give a very natural syntax for effectful programs. One thing I really like, but which you may not, is that they end up prohibiting the sequentialization of effectful expressions, so that you can't write things like (read(), read()).

Would it be better to just introduce multi-argument functions? I suspect not, because those prevent abstracting over arity, while in CBPV you can still write forall A B. (A -> B).

Two things. First, if you introduce other computation types (such as lazy pairing A & B), then your computation types can include things like A -> (B & (C -> D)), which requires a more careful treatment of arity and what argument lists are. See Cervesato and Pfenning's A Linear Spine Calculus for insight into this issue.
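
For instance, here is a hypothetical Haskell rendering of that type, reusing F from the sketches in the top post and encoding & as a lazy pair of computations. After one argument, the caller can either stop (project the left side) or supply another argument (project the right side), so no single arity fits:

-- Lazy pairing ("with"): the consumer chooses one projection.
data With b c = With b c

fstW :: With b c -> b
fstW (With b _) = b

sndW :: With b c -> c
sndW (With _ c) = c

-- A -> (B & (C -> D)), with A = Int, B = F Int, C = Int, D = F Int:
type Tricky = Int -> With (F Int) (Int -> F Int)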

Second, if you only allow parametric polymorphism over value types, then the arity is always statically known, since the types make thunk forcing explicit. So this actually works really well, better IMO than treating arity explicitly in the obvious way.

is there an extension of CBPV to calculi richer than STLC, say to System F? I've heard people say that focusing and polarity run into some form of trouble with System F, but I haven't heard why.

The natural way of treating this gives you two foralls, one for values and the other for computations. See Møgelberg and Simpson's Relational Parametricity for Computational Effects, which gives not only the syntax but also the full theory of parametricity for a calculus close to CBPV.

Also, do read the "A Dissection of L" paper that gasche pointed to.

One thing I really like, but

One thing I really like, but which you may not, is that they end up prohibiting the sequentialization of effectful expressions, so that you can't write things like (read(), read()).

Now that's interesting. I always thought languages that don't specify evaluation order should classify possibly effectful expressions that assume an evaluation order as errors. To my knowledge, this has never been done.

Hm

While nice, I don't think this is new. Monadic code also forces you to order effects (e.g. in do-notation) when you compute a pair of values from a pair of computations.

I don't think we're talking

I don't think we're talking about the same thing. I'm talking about argument evaluation order, as in OCaml, where it's left unspecified for optimization reasons. Instead of restricting the class of expressions that can appear as arguments to just variables, or specifying the evaluation order as monads do (which inhibits optimization), I'm talking about identifying possibly effectful expressions and permitting at most one of them in argument position.

Not CBPV

CBPV is very similar to monadic languages in this regard: computations are not values, and an argument must be a value, either the result of a computation (useful when translating from CBV) or a computation suspended by creating its thunk (useful when translating from CBN).

BTW, I think "prohibiting the sequentialization of effectful expressions" from neelk was an excess of negations which confused you — CBPV *forces* "the sequentialization of effectful expressions".

BTW, I think "prohibiting

BTW, I think "prohibiting the sequentialization of effectful expressions" from neelk was an excess of negations which confused you — CBPV *forces* "the sequentialization of effectful expressions".

I wasn't commenting on CBPV; I was commenting specifically on the property that neelk raised. I don't see how I could possibly have misread it: he not only says that it "prohibits the sequentialization of effectful expressions", he also says that you can't write things like (read(), read()). Both claims seem consistent with each other and with how I interpreted them.

Instead of (read(), read()),

First of all, what you describe does sound interesting. So instead of (read(), read()), you'd want to allow (a, read()) and (read(), b)?

Maybe you would indeed like CBPV for whatever reason. But you talk about "identifying possibly effectful expressions and permitting at most one of them in argument position". CBPV allows exactly zero possibly effectful expressions as arguments, exactly like monadic calculi.

Instead of (read(), read()), in CBPV you have to write

x <- read() in
y <- read() in
return (x, y)

like with monads, so you do end up "specifying the evaluation order as with monads (which inhibits optimization)".

Alternatively, you can also write (thunk (read()), thunk (read())), but that triggers no computation (in Haskell, imagine having a value of type (IO (), IO ())).

But I'm not commenting on calculi related to CBPV: I just studied the CBPV paper :-)

Thanks! Doubts: Second, if you

Thanks! Doubts:

Second, if you only allow parametric polymorphism over value types, then the arity is always statically known, since the types make thunk forcing explicit. So this actually works really well, better IMO than treating arity explicitly in the obvious way.

Hm... would this lose expressiveness? I suspect yes, because I can't force a value unless I know its type is a thunk type. Without a forall over computation types, I couldn't even internalize the typing rule of thunk as a type signature.

However, maybe that's not a problem when using CBPV as a translation target, since System F doesn't have explicit thunking. I hope to get there eventually.
I'm finally learning category theory with some success, so this time I might make it through these citations.

More pointers

FWIW: Apparently, Axelsson's Syntactic library (ICFP 2012, http://dl.acm.org/citation.cfm?id=2364573) reuses the same syntax to distinguish operator arities (with a different semantics). The "arity" A of an operator is defined by the following grammar:
A ::= Full T | T :-> A
where T is a metavariable ranging over types. The operator for application combines let and application syntax, and the intended semantics is different (distinguishing partially applied from fully applied operators in a language that might be first-order).
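
For reference, the signature types look roughly like this in Haskell (a sketch following the paper; the real library may differ in detail):

{-# LANGUAGE GADTs, TypeOperators, EmptyDataDecls #-}

infixr 5 :->
data Full t     -- a fully applied operator producing a t
data t :-> sig  -- an operator still awaiting an argument of type t

-- An operator's arity is manifest in its signature:
data Arith sig where
  Lit :: Int -> Arith (Full Int)            -- arity 0
  Add :: Arith (Int :-> Int :-> Full Int)   -- arity 2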
I'm still undecided whether Syntactic's approach is sufficient for dealing with arity. It looks like simply constraining a function's result type to be wrapped in F (with a corresponding introduction form), and relying on currying for multiple arguments, might be enough for such a limited goal. Maybe something like

FT ::= Full T | T -> FT
T ::= FT | ...

But I've not thought about the latter suggestion enough yet :-)

Thanks for answers

I'd like to thank all the answerers. I'll have to do some study before commenting on details.

I am also not sure that CBPV is worth it for arity alone, but there's also "returning intermediate results". Also, I'm trying to do things elegantly, so "there are fewer places where you need to be really careful" counts as an advantage.
I'll have to check, but I suspect the transformation becomes even more obvious after transforming to CBPV.

Moreover, my current calculus allows for unspecified constants, without distinguishing constructors from eliminators. I suspect CBPV would force me to separate the two, and that will help as well: constructors don't return intermediate results, while eliminators certainly do.

Sorry for destroying my top post

Sorry for destroying my top post. I was trying to edit Unicode back in, but the website told me that some other user (me) had edited the post, so I couldn't save my changes. I didn't expect the content to have disappeared, so, stupidly enough, I didn't save the content of the text box. I'd dare say it's not my fault, but whatever.

If there's an easy way to recover the old content, I'd be happy.

Debugging

Posts seem to stop at

Found it in the history, encoding issues found guilty

I luckily found the post in my browser history.

Apparently, Drupal doesn't allow writing posts containing the Unicode character U+1D706 ("mathematical italic small lambda"), probably because it's outside the BMP; posts are truncated at the last character before it (as shown in "Debugging"). So Drupal had cut my post down to the first word. I've switched to the more usual lambda (inside the BMP) instead.