An impure solution to the problem of matching fans

Some time ago, I found an impure solution to the problem of matching fans in Lamping's abstract algorithm. It is described in [1] and implemented in [2], the essential part of source code being available in [3].

My understanding is that the algorithm effectively eliminates the need in bookkeeping nodes (so-called "oracle") for optimal reduction in case of arbitrary untyped λ-terms. Although I have no formal proof for its correctness yet, the amount of testing [4, 5] that have already been done leaves little room for counterexamples.

Questions remaining open are: how to (dis)prove correctness of the algorithm as well as how to simplify and improve the algorithm? Any help would be highly appreciated.

[5] (the "Benchmarks" section)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Amount of testing

(Disclaimer: I am not familiar with optimal reduction and the 'matching fans" problem.)

If I understand correctly, the "amount of testing" you reference is that you wrote a couple (non-trivial) programs and tested that your reducer returns the expected output.

I don't consider that this is enough testing to give confidence of a non-trivial property, as the programs that humans write are typically of a very specific shape. (For example, I am no expert, but your tests may all fall in the fragment where it is known that an oracle is not needed?)

Here is what I would find more convincing:
- a formal proof that the algorithms acts as desired
- or, at least, an exhaustive check that all lambda-terms smaller than a certain bound (say 15) behave as desired

Then there is also the matter of what you are claiming and testing. Are you only testing that the reduction returns the right result, or are you testing that the reduction sequence is "optimal" as claimed? It sounds plausible that an invalid simplification of the algorithm would still compute correct normal-forms, only not in an optimal way -- if I understand correctly, your current tests would not catch this error?

(Have you implemented the standard algorithm to compare the reduction sequences of both? Are you claiming that they are identical, or merely of the same length?)

Still looking for counterexamples

Some other test cases are available in the software package itself [5], including comparison with Lambdascope, matching the number of beta-reductions wherever Lambdascope is able to reach normal form within reasonable amount of interactions (up to ten million interactions).

A formal proof of the algorithm's (in)correctness is one of the things I am asking for help with. Due to different nature of the solution, I am unable to directly apply techniques from [6]. So far, I have not been able to construct a counterexample which would differ in its normal form or number of beta-reductions from the optimal one. Any help in finding one would be highly appreciated.

[5] (the "Benchmarks" section)
[6] Andrea Asperti, Stefano Guerrini. The Optimal Implementation of Functional Programming Languages. Cambridge Tracts in Theoretical Computer Science, v.45, Cambridge University Press, 1998, isbn = 9780521621120

Exhaustive search

Have you tried exhaustively searching for counter-examples of size below 15?


No, I have not. If you know how to arrange such a search, please feel free to use CLI:

$ npm i -g @alexo/lambda
└── @alexo/lambda@0.3.6
$ lambda -p 'S K I' # experimental algorithm
11(7), 2 ms
v1: v1
$ lambda -p -a optimal 'S K I' # Lambdascope
59(7), 4 ms
v1: v1

Some results

I managed to collect some results [1]. First of all, I had to pick a particular definition for "size" of a λ-term, because there are many. I chose the one that is used in A220894 [2]:

size(x) = 0;
size(λx.M) = 1 + size(M);
size(M N) = 1 + size(M) + size(N).

For sizes from 1 to 9, inclusively, there exist 5663121 closed λ-terms. I tested all of them against both "abstract" [3] and "optimal" [4] algorithms in MLC, with up to 250 interactions per term. The process took almost a day of CPU time. Then, I automatically compared them [5] using a simple awk(1) script (also available in [1]), looking for terms for which normal form or number of β-reductions using "abstract" would deviate from "optimal".

No such terms have been found this way. Surprisingly, there have been identified apparent Lambdascope counterexamples instead, the shortest of which is λx.(λy.y y) (λy.x (λz.y)) resulting in a fan that reaches the interaction net interface. I plan to look into this in near future.

As for sizes higher than 9, testing quickly becomes unfeasible. For example, there are 69445532 closed terms of sizes from 1 to 10, inclusively, which takes a lot of time and space just to generate and save them. [6] is a 200MB gzip(1)'ed tarball (4GB unpacked) with all these terms split into 52 files with 1335491 terms each. In my current setting, it is unfeasible to test them.

I may come up with optimizations at some point to make it possible to process terms of sizes up to 10, but 11 and higher look completely hopeless to me.



This looks really nice, thank you! I think that this result gives a lot of confidence into you result. The limit is a bit too low to claim that "a counter-example, if it exists, would have been found" (I would confidently made the claim that if a counter-example exists, there is one under size 1000), but the fact that you found issues in Lambdascope is a proof that the strategy is working.

It should of course be possible to perform the tests in a streaming fashion. You mention that you run in space limitations, but what about time? What is the time that upto-10 checking takes with your setup?


I think that as it is now, the setup will take about two weeks to test two algorithms on terms of size 10. That is comparable with time needed to significantly optimize the setup, so I would postpone launching such a thing for now. Besides, it will be too hard to analyze the results due to their size.

Space issues could indeed be dealt with if I stop collecting data for all the terms, and only output the ones that behave differently between two algorithms. That will take embedding diff.awk into compute.js, and making the latter to test two algorithms at the same time.

Anyway, before proceeding with bruteforce, I first plan to look further into the issues with Lambdascope. In particular, it might possibly be that the embedded read-back mechanism somehow interferes with reduction. Then, it makes sense to test sizes 1-9 using the standard optimal reduction algorithm as well (currently not present in MLC).

Quick update

I added the standard optimal reduction to MLC in version 0.3.7, and conducted the same 250-1-9 testing against it. Unlike Lambdascope, nothing interesting was found. So, in version 0.3.8, I removed Lambdascope as unreliable, and renamed "standard" to "optimal". Results have been shared in the same gist.

Next up, I plan to arrange long-term testing for "abstract" and "optimal" (the standard optimal reduction) at the same time against closed λ-terms of size 10.

Counterexamples found

Among all the 63782411 closed λ-terms tested, there have been found the following three counterexamples for the "abstract" algorithm:

((v0: ((v0 v0) v0)) (v0: (v1: (v0 (v2: (v1 (v0 v2))))))) N/A NO RULES: !print = \fanout_{23}(w1, w2);
((v0: (v0 (v0 v0))) (v0: (v1: (v0 (v2: (v1 (v0 v2))))))) N/A NO RULES: !print = \fanout_{47}(w1, w2);
((v0: (v0 v0)) (v0: (v0 (v1: (v0 (v2: (v1 (v0 v2)))))))) N/A NO RULES: !print = \fanout_{29}(w1, w2);

The standard optimal reduction does not produce any errors on these terms, at least up to ten million interactions (none of these three terms have a normal form). The first two of them are simultaneously counterexamples for Lambdascope. On the third one, Lambdascope behaves the same way as the standard optimal reduction.


So if I understand correctly, those are counter-examples to the fact that your proposed "imperative" rule is correct with respect to the reference optimal-reduction strategy. Is that correct?

(Of course there is still the risk of having just detected a software bug in your implementation, which can only be ruled out by carefully inspecting the reduction behavior of the implementation on this program.)

Have you tried to understand what goes wrong, what is the essence of the problem?

Details unknown

All I know for now is that the "abstract" algorithm in MLC in its current implementation behaves incorrectly on those three terms, while the reference standard optimal reduction passes those particular three test cases. Whether it is the proposed hash table approach or embedded read-back mechanism that causes the error behavior is yet to be investigated. I am still collecting results for the standard optimal reduction against size 10. The latter will take a few more days.

Data collected

I have just finished collecting results for the standard optimal reduction at size 10 and comparing them against "abstract". Nothing interesting found besides those three terms. All the collected data have been shared in the same gist. I was hoping to find an easier counterexample which would at least have a normal form. None of those three terms are suitable for pen-and-paper analysis as they all take about 200 interactions and do not have a normal form. I think my best shot at figuring out the issue is to analyze the shortest Lambdascope counterexample, because it takes only 13 interactions on "abstract" and 23 interactions until the illegal one on Lambdascope, which are bearable.

Lambdascope fixed

The issues with the Lambdascope implementation turned out to be read-back-related and were fixed with

diff --git a/encoding/optimal/template.txt b/encoding/optimal/template.txt
index f8f0a9e..9d054d3 100644
--- a/encoding/optimal/template.txt
+++ b/encoding/optimal/template.txt
@@ -66,7 +66,7 @@;
 } \atom_{M};
-\read_{C}[a] {
+\read_{C}[\scope_{0}(a)] {
        /* Read back abstraction. */;
 } \lambda[\atom_{this.mkid()}, \read_{this.abst(C)}(a)];

In version 0.3.9, Lambdascope implementation is back online.

As for the experimental "abstract" algorithm, I have not been able to figure out what is causing the problems yet.

A better counterexample

I updated samples/counter.mlc with a better counterexample that has a normal form and fails only the abstract algorithm, while causing no problems against closed, optimal, or standard.

This counterexample takes 85 interactions until an illegal one which is better than 200, but still too many for me to figure out what is causing the problems. Any help with analysis and/or simplifying the term would be highly appreciated.