How efficient is partial sharing?

Partial sharing graphs offer a reduction model for the lambda calculus that is optimal in the sense put forward by Jean-Jacques Lévy. Interest in this model has waxed and waned: initially it was thought to offer the most efficient possible technology for implementing the lambda calculus, but then an important result showed that the bookkeeping overheads of any such model could be very high (Asperti & Mairson 1998). This result had a chilling effect on the initial wave of excitement over the technology.

Now Stefano Guerrini, one of the early investigators of partial sharing graphs, has an article with Marco Solieri (Guerrini & Solieri 2017) arguing that the gains from optimality can be very high and that partial sharing graphs come relatively close to the best possible efficiency: within a quadratic factor on a conservative analysis, which counts as close by the standards of elementary recursion. Will the argument and result lead to renewed interest in partial sharing graphs from implementors of functional programming languages? We'll see...

(Asperti & Mairson 1998) Parallel beta reduction is not elementary recursive.

(Guerrini & Solieri 2017) Is the Optimal Implementation inefficient? Elementarily not.


sharing strategies

I rejected the Haskell-style lazy evaluation as a default in my language, though it's still accessible via explicit annotations.

Granted, my language is combinator based, with explicit copy `c :: s * a -> (s * a) * a` and optional substructural type annotations (to restrict copy or drop for some values). Together, these have a significant impact on algorithm design, favouring linear data structures or Huet zippers over copying on access, map, iteration, etc. Consequently, there is rarely unnecessary sharing. My default 'sharing' strategy is just to deep-eval upon copy, unless annotated otherwise. (A sketch of the zipper style follows below.)
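For illustration, here is a minimal Huet-style list zipper, sketched in Haskell rather than my combinator language (the names are purely illustrative). The point is that traversal and update rebuild the structure around a focus, so access and modification never force a copy of the whole list:

```haskell
-- Minimal list zipper: elements left of the focus (stored reversed),
-- the focus itself, and the elements to its right.
data Zipper a = Zipper [a] a [a]

left, right :: Zipper a -> Zipper a
left  z@(Zipper []     _ _ ) = z
left  (Zipper (l:ls) x rs)   = Zipper ls l (x:rs)
right z@(Zipper _      _ []) = z
right (Zipper ls x (r:rs))   = Zipper (x:ls) r rs

-- Update only the focused element; the rest of the structure is reused.
modify :: (a -> a) -> Zipper a -> Zipper a
modify f (Zipper ls x rs) = Zipper ls (f x) rs
```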

A `(lazy)` annotation will mark a value to be copied without evaluation, wrapping it within a single-assignment hole for the evaluated result. I find this feature is rather awkward with respect to serialization and incremental computing. It's also a bad fit for scenarios like exploring a large, procedurally generated model, where storage costs can quickly exceed recomputation overheads; for that, we need something closer to caching.
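As a rough Haskell sketch of that 'single assignment hole' reading (illustrative only, not how my language implements it):

```haskell
import Data.IORef

-- A lazily copied value is a hole that is either an unevaluated
-- computation or the already-computed result; the first force fills
-- the hole, and every later read reuses the stored value.
data Hole a = Unevaluated (IO a) | Evaluated a

newLazy :: IO a -> IO (IORef (Hole a))
newLazy act = newIORef (Unevaluated act)

force :: IORef (Hole a) -> IO a
force ref = do
  h <- readIORef ref
  case h of
    Evaluated v     -> pure v
    Unevaluated act -> do
      v <- act
      writeIORef ref (Evaluated v)
      pure v
```

The awkwardness mentioned above shows up here: a filled-or-unfilled mutable hole is exactly the kind of state that is painful to serialize or to reason about under incremental recomputation.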

I find that `(memo)`, which memoizes computations by indexing an external table (one not guaranteed to keep all data indefinitely), is a lot cleaner and easier to control and tune with respect to memory overheads vs. recompute overheads. It works easily with serialization and durability. It's very suitable for sharing effort in the context of incremental computing: e.g. we can memoize recursive computations over persistent data structures, make a slight change at O(log(N)) cost, and compute again, reusing results for most of the structure. It's also good for caching partial computations of large models, since old resources can be released. That said, it's more expensive than laziness because indexing computations is non-trivial.
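A rough sketch of the `(memo)` idea, again in Haskell with illustrative names: results of a recursive computation over a persistent tree are indexed in an external table keyed by a structural hash (collisions ignored for brevity). After a small change, only the nodes on the changed path miss the table; everything else is reused.

```haskell
import qualified Data.Map.Strict as M
import Data.IORef

-- Persistent tree with a structural hash cached at each node.
data Tree = Leaf Int | Node Int Tree Tree

hashT :: Tree -> Int
hashT (Leaf n)     = n
hashT (Node h _ _) = h

node :: Tree -> Tree -> Tree
node l r = Node (31 * hashT l + hashT r) l r

-- Memoized sum over the tree; the IORef plays the role of the external
-- table, which in practice need not keep every entry indefinitely.
sumMemo :: IORef (M.Map Int Int) -> Tree -> IO Int
sumMemo table t = do
  tbl <- readIORef table
  case M.lookup (hashT t) tbl of
    Just v  -> pure v                       -- reuse a previously computed subtree
    Nothing -> do
      v <- case t of
             Leaf n     -> pure n
             Node _ l r -> (+) <$> sumMemo table l <*> sumMemo table r
      modifyIORef' table (M.insert (hashT t) v)
      pure v
```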

I'd be interested in a performance comparison of default laziness with a strategy of evaluate-on-first-copy, memoization tables, or conventional eager evaluation, ideally in the context of how algorithms are fitted to the evaluation model.

CBV and CBN; PL implementation concepts

The semantics of different evaluation strategies can be captured quite naturally by using alternate embeddings of typed lambda calculi into the proof nets of some flavour of intuitionistic linear logic, and then using an embedding of those proof nets into partial sharing graphs. I would describe PSGs as being 'naturally' CBN, in that I find this embedding to be the simplest.

Linear data structures are a good fit for PSGs. Alan Bawden's PhD thesis and work on connection graphs / linear graph grammars contains a lot of ideas in this vein.

Concepts like a "strategy of evaluate-on-first-copy" or memoization tables don't really have good analogues in PSGs that I'm aware of, although maybe my work on port graph grammars, a generalisation of connection graphs, could be a starting point for thinking about this.

Conversion in type theories

Hi Charles,

I think the most promising application of work in this space (including related things like atomic lambda calculus) is in the implementation of conversion or judgmental equality in dependent type theory. This is a situation where both

a) simple evaluation strategies such as call-by-name, call-by-value, or weak-head reduction frequently hit pathological cases (where you end up evaluating large computations unnecessarily, or end up recomputing the same term too frequently),
b) at the same time, the absolute performance required is not high, so it's fine to spend a lot of effort on bookkeeping to avoid the pathological cases.

Alas, there are only so many hours in the day!

judgmental equality in dependent type theory

Are you talking about Homotopy Type Theory here? I haven't really looked at how complex equality propositions are to prove and how much sharing they might need: it is a thought-provoking idea.

Just plain old MLTT

Hi Charles,

I'm just talking about plain old MLTT. Typechecking requires equality checking between terms. Morally, you're evaluating two open terms to normal form and comparing them for alpha-equality, but you can do better by interleaving evaluation and comparison.
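A minimal sketch of what that interleaving looks like, for an untyped calculus with de Bruijn indices (illustrative only, far from a real implementation): both sides are reduced just to weak-head normal form before their heads are compared, so subterms that cannot affect the outcome are never fully evaluated.

```haskell
data Term = Var Int | Lam Term | App Term Term
  deriving Show

-- Shift free variables at or above the cutoff (needed when a term
-- is moved under a binder).
shift :: Int -> Term -> Term
shift c (Var i)   = if i >= c then Var (i + 1) else Var i
shift c (Lam b)   = Lam (shift (c + 1) b)
shift c (App f a) = App (shift c f) (shift c a)

-- Substitute u for de Bruijn index k.
subst :: Int -> Term -> Term -> Term
subst k u (Var i)
  | i == k    = u
  | i > k     = Var (i - 1)
  | otherwise = Var i
subst k u (Lam b)   = Lam (subst (k + 1) (shift 0 u) b)
subst k u (App f a) = App (subst k u f) (subst k u a)

-- Weak-head normal form: reduce only as far as needed to expose the head.
whnf :: Term -> Term
whnf (App f a) = case whnf f of
  Lam b -> whnf (subst 0 a b)
  f'    -> App f' a
whnf t = t

-- Conversion check interleaved with evaluation: compare heads, then recurse.
conv :: Term -> Term -> Bool
conv t u = case (whnf t, whnf u) of
  (Var i,   Var j)   -> i == j
  (Lam b,   Lam c)   -> conv b c
  (App f a, App g b) -> conv f g && conv a b
  _                  -> False
```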

As usual, if you have a term f (expensive-computation), and f doesn't use its argument, then call-by-name wins and call-by-value loses. On the other hand, if f duplicates its argument, then call-by-value wins and call-by-name loses. Both of these cases arise very frequently in dependently-typed programming, and indeed it's often the case that certain arguments to a function "should" be strict and others "should" be lazy.
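The two cases in the previous paragraph, written out in Haskell purely for illustration:

```haskell
-- "expensive-computation" stands in for any costly term.
expensive :: Integer
expensive = sum [1 .. 10 ^ 7]

-- f ignores its argument: call-by-name never evaluates `expensive`,
-- while call-by-value computes it for nothing.
ignore :: Integer -> Integer
ignore _ = 0

-- f duplicates its argument: naive call-by-name recomputes `expensive`
-- for each occurrence, while call-by-value (or lazy evaluation with
-- sharing) computes it once.
twice :: Integer -> Integer
twice x = x + x
```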

So it seems plausible that one of these fancy evaluation strategies that do a better job of preserving both sharing and laziness could pay off for convertibility checking in dependent types.

Optimal reduction without oracle?

While optimal reduction is known to be efficient for evaluating ELL- and LLL-typable λ-terms, it is an open question whether there exists an efficient implementation of optimal reduction for arbitrary untyped λ-terms without the so-called oracle (consisting of brackets and croissants in BOHM and scope operators in Lambdascope).
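Just to fix terminology, here is a rough Haskell enumeration of the node kinds involved (illustrative only): the abstract algorithm uses only abstraction, application, and indexed fan nodes, while the oracle adds the control nodes mentioned above.

```haskell
data Node
  = Lambda          -- abstraction node
  | Apply           -- application node
  | Fan Int         -- indexed sharing (fan) node
  | Bracket Int     -- oracle control node (bracket, as in BOHM)
  | Croissant Int   -- oracle control node (croissant, as in BOHM)
  deriving (Eq, Show)
```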

Recently, there has been some effort towards a universal modification of Lamping's abstract algorithm, resulting in a possible solution (arXiv:1710.07516) which is still waiting to be (dis)proved. Any help with finding counterexamples or checking the algorithm's correctness would be highly appreciated.