## Binary Lambda Calculus and Combinatory Logic

While Anton was waxing about Church & Turing, I figured that Occam's Razor would be the kind of argument one would invoke when giving the nod to the Lambda Calculus over Universal Turing Machines. This leads inexorably to the question of what is the smallest universal machine (as measured in binary bits) that can possibly be constructed. John Tromp provides an answer to this question in his always fun Lambda Calculus and Combinatory Logic Playground:

Pictured you can see the 210 bit binary lambda calculus self-interpreter, and the 272 bit binary combinatory logic self-interpreter. Both are explained in detail in my latest paper available in PostScript and PDF. This design of a minimalistic universal computer was motivated by my desire to come up with a concrete definition of Kolmogorov Complexity, which studies randomness of individual objects. All ideas in the paper have been implemented in the wonderfully elegant Haskell language, which is basically pure typed lambda calculus with lots of syntactic sugar on top.
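For a sense of how terms become bit strings in the binary lambda calculus: the paper encodes terms in De Bruijn form, with `00` for abstraction, `01` for application, and a variable of index n as n ones followed by a zero. A minimal sketch in Haskell (the constructor names here are my own):

```haskell
-- Lambda terms with 1-based De Bruijn indices (no variable names).
data Term = Var Int          -- De Bruijn index of the bound variable
          | Lam Term         -- abstraction
          | App Term Term    -- application

-- Binary lambda calculus encoding:
--   variable n  ->  n ones followed by a zero
--   abstraction ->  00 <body>
--   application ->  01 <function> <argument>
encode :: Term -> String
encode (Var n)   = replicate n '1' ++ "0"
encode (Lam b)   = "00" ++ encode b
encode (App f a) = "01" ++ encode f ++ encode a
```

For example, the identity λx.x encodes as `0010`, and λx.λy.x as `0000110` - seven bits for the K combinator.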

Interestingly, the version based on the Lambda Calculus is smaller than the one based on Combinators. A statement about programming languages in the paper that I found of interest:

Although information content may seem to be highly dependent on choice of programming language, the notion is actually invariant up to an additive constant.

Not sure if that statement means that PL research is ultimately doomed. :-)
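For the record, the quoted property is the invariance theorem of Kolmogorov complexity: for any two universal machines U and V there is a constant (essentially the length of an interpreter for one machine written for the other) that bounds the difference, independent of the object x being described:

```latex
K_U(x) \;\le\; K_V(x) + c_{U,V} \qquad \text{for all } x
```

So the choice of language shifts an object's measured complexity by at most a fixed, interpreter-sized constant.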


### The catch is...

the notion is actually invariant up to an additive constant.

... the range of additive constants applicable to various languages is VERY large. ;-)

### I guess, the dead give away is...

...that the Haskell code for the lambda calculus interpreter is a lot longer than Penrose's, but the compiled code is significantly smaller. :-)

(Just so used to throwing away constants in the big-O notation.)

Rule 110 post.

### Proviso

I'm guessing that the proposition is that a UTM can be constructed from the pattern of 14 bits (00010011011111), but would those 14 bits constitute an actual Turing Machine in and of themselves?

### Apples and Oranges

Thought an email I received from John Tromp might be of interest:

Those cellular automata are universal only in a very roundabout sense, requiring you to first express your computation as a cyclic tag system, and then providing the CA with a specific infinitely long pattern (of which those 14 bits repeated form the background) as initial state, then running it long enough for certain conditions to arise, and then extracting the answer again from the current pattern.

My machines are universal in a much more useful, direct, sense: they process binary streams, parsing a program from it, that they run on the remainder of the stream. Furthermore, these programs are quite compact, which is of course what I needed for my application to Kolmogorov Complexity.
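For concreteness, the 14 bits are the background "ether" pattern, not the rule itself; Rule 110's rule is an 8-entry table over a cell's neighbourhood. A sketch in Haskell, run here on a cyclic row purely to keep the sketch finite (the real construction needs the infinite background Tromp describes):

```haskell
-- Rule 110's local update table: the next state of a cell depends on
-- its left neighbour, itself, and its right neighbour.
rule110 :: (Int, Int, Int) -> Int
rule110 (1,1,1) = 0
rule110 (1,0,0) = 0
rule110 (0,0,0) = 0
rule110 _       = 1   -- the remaining five neighbourhoods map to 1

-- One synchronous step on a finite row, treated as cyclic so the
-- sketch stays self-contained.
step :: [Int] -> [Int]
step xs = [ rule110 lcr
          | lcr <- zip3 (last xs : init xs) xs (tail xs ++ [head xs]) ]

-- The 14-cell pattern mentioned above; repeated, it forms the
-- ether background of the universality construction.
ether :: [Int]
ether = [0,0,0,1,0,0,1,1,0,1,1,1,1,1]
```

For instance, `step [0,1,0,0]` yields `[1,1,0,0]`; a lone 1 grows to its left under this rule.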

### Fruit

Hmm... If by apples and oranges you mean both are roughly spherical fruits of similar size, then I guess I agree ;-)

John Tromp's examples are universal only in a very roundabout sense, requiring you to first express your computation in the lambda calculus (or combinatory logic), encode that program in binary, and then having an interpreter with an infinitely long stack space to hold intermediate results...

### Infinite stacks...

then having an interpreter with an infinitely long stack space to hold intermediate results...

Anything related to Universal Turing Machines is going to have some piece that is infinite. But by the rules of the game as outlined - assuming they are tenable - the intermediate computations are not part of the information density. That is, we measure the number of bits input (along with the initial bits we start with in the Turing machine) in relation to the number of bits output. (Of course, one can argue that the space/time requirements of the algorithm are part of the complexity of the information as well.)

...encode that program in binary...

Binary bits was the arbitrary measuring stick that I chose for measuring size, so that would be part of the requirement.

...

My argument above was a little different than John's response in that I don't consider a single cell of Rule 110 to be a Turing Machine. Rather, a TM can be constructed from a grid of interacting cells. Drawing an analogy, a single human brain cell is likely not considered to be a TM, but get enough of them together and you eventually end up with something that is Turing complete. So the question for Rule 110 is not whether a TM can be constructed from a single 14-bit cell, but rather: what is the smallest grid that can be constructed to achieve this level of computing? I suspect that the minimum number of cells required to achieve Turing completeness would dwarf the LC and CL implementations - but that's pure conjecture on my part. Something along the lines of the way a TM was constructed in the Game of Life.

And I suppose I should have made it more obvious from my reference in the original post that Occam's Razor is not really a decisive form of logic - being more a technique of persuasion. What is simple is a matter of circumstance and judgement. And since we measure all these forms of analysis in terms of Turing Completeness, the means by which we actually compare the relative merits of each has to be based on the insight they give us, the kinds of problems we are attempting to solve, and the constraints we are given.

But that's not nearly as much fun as saying "mine's smaller". :-)

### And I suppose I should have

And I suppose I should have made it more obvious from my reference in the original post that Occam's Razor is not really a decisive form of logic - being more a technique of persuasion. What is simple is a matter of circumstance and judgement.

Occam's razor is not about "simplicity", but about the axioms of logical arguments; it actually is a valid logical principle.

### Refinement

Occam's Razor is fine when applied to variations of a single system. For example, the SK-Combinator logic is considered to be more fundamental than SKI because I can be derived from S and K.
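That derivation (I = S K K) can be checked with a tiny reducer. A naive sketch in Haskell, with a free-variable constructor `V` added just for testing (no sharing, leftmost-outermost only):

```haskell
infixl 9 :$
data C = S | K | V String | C :$ C deriving (Eq, Show)

-- Reduce to normal form:  K x y -> x,  S f g x -> f x (g x).
nf :: C -> C
nf (K :$ x :$ _)      = nf x
nf (S :$ f :$ g :$ x) = nf (f :$ x :$ (g :$ x))
nf (a :$ b)           = let a' = nf a
                        in if a' == a then a :$ nf b else nf (a' :$ b)
nf t                  = t
```

Then `nf (S :$ K :$ K :$ V "x")` reduces to `V "x"`; that is, S K K behaves as the identity.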

However, using Occam's Razor to compare two disparate competing systems is more tenuous. In the current instance, is Combinator Logic more fundamental than Lambda Calculus (or Turing Machines or Cellular Automata)? And how do we compare the axioms between the systems? I'm not sure that sheer quantity of axioms (or even more arbitrarily, the number of bits) represents a fair cop.

### I'd say that Combinator

I'd say that Combinator Logic's more fundamental in that it doesn't need variables. I think there's a way we can go in (E)DSL design with the knowledge that variables aren't "fundamental" - that they're effectively a UI issue. Not least in EDSLs by handing off binding to the host language!

### Unfree the variables!

I'd say that Combinator Logic's more fundamental in that it doesn't need variables

And I would say the opposite for the same reason. Variable binding (a separate concept from the mere labelling of the variable) is the fundamental concept of computation.

Bringing that concept explicitly into the model by means of labelling, however, is much clearer than trying to hide it as a "UI issue", or throwing explicit variables aside because they present some descriptive problems for the theory.

(We really haven't given enough time to the pointed vs. point-free holy war here on LtU; consider this my pro forma entry. ;-) )

### So would you go to the other

So would you go to the other extreme and suggest A-normal form as more fundamental than the conventional lambda calculus?

### Minds over machines

So would you go to the other extreme and suggest A-normal form as more fundamental than the conventional lambda calculus?

Definitely not. A-normal form is generally conceived as a convenience for compilers, not for humans. (Though there may be exceptions to this...)

My position here is based on the goal of making key theoretical concepts visible and central for a person trying to understand computation by means of the model.

Maybe it's just me, but though I think combinatory logic is cool, for real purposes, I find LC much easier to reason about, thanks to the explicit labelling of bindings.

### ANF's just the extreme form

ANF's just the extreme form though - almost everything gets labelled, including all those temporaries that only get used once. The only values that go unbound are constants. IME it's often actually easier to explain that to people first and then release the restrictions to get plain LC back, even if LC is simpler syntactically.

One way of looking at it is that ANF requires you to bind results as well as parameters - something that in an unsugared LC you can't actually do.

### binding results

One way of looking at it is that ANF requires you to bind results as well as parameters - something that in an unsugared LC you can't actually do.

By "can't actually do", do you just mean that unsugared LC has no 'let'? Otherwise, (λx. M) E seems like a pretty good candidate for binding results in unsugared LC, i.e. binding the result of E to x in M.

From this perspective, in an unsugared LC with CBV evaluation (for simplicity in talking about "results"), results are "required" to be bound because the only thing you can do with a result is apply a function to it, thereby binding it.

It seems to me that ANF only becomes meaningful when you're dealing with languages that have more complex built-in data types (e.g. tuples) and/or functions that take multiple arguments, so that the idea of intermediate values comes into play. ANF requires you to bind intermediate values. [Edit: whereas in an unsugared LC, there aren't any intermediate values.]
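The point that (λx. M) E already binds results is just the familiar desugaring of let; in Haskell terms (a trivial sketch):

```haskell
-- "let x = E in M" is exactly the application (\x -> M) E.
sugared :: Int
sugared = let x = 2 + 3 in x * x

desugared :: Int
desugared = (\x -> x * x) (2 + 3)   -- the unsugared (λx. M) E form
```

Both evaluate to 25.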

### Sure, I'm discussing in the

Sure, I'm discussing in the terms already given - though for pedagogical purposes I don't think using the unsugared version (except to show the desugaring) is exactly helpful in showing results being bound.

Exposing the notion of intermediate values is perhaps entirely the point here. I'm reminded of the result that the lambda calculus offers no opportunities for parallelism relative to an imperative language - a result that strangely disappears when you add things that take multiple parameters.

### Bind the variables!

Variable binding may be how _you_ think about computation, but that doesn't make it fundamental. The fact that combinators can do everything lambdas can proves as much: either one is fundamental, and combinators lack a complex feature lambdas have while still preserving the same functionality.
But this doesn't prove anything about a "pointed vs. point-free holy war"; combinators may be more fundamental, but that doesn't mean all your code should use nothing but combinators. There's SO MUCH else to do with one's life aside from figuring out combinators or primitive lambdas :-).
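The claim that combinators can do everything lambdas can is witnessed by classic bracket abstraction, which compiles any lambda term into S, K, and I. A naive sketch in Haskell (none of the usual optimizations, and the names are mine):

```haskell
infixl 9 :$:
infixl 9 :@:

data T = LVar String | T :$: T | Lam String T   -- lambda terms
data C = S | K | I | CVar String | C :@: C      -- combinator terms
  deriving (Eq, Show)

-- Compile a lambda term to combinators by abstracting out each binder.
compile :: T -> C
compile (LVar x)  = CVar x
compile (a :$: b) = compile a :@: compile b
compile (Lam x b) = abstr x (compile b)

-- abstr x e builds a combinator that, applied to x, behaves like e.
abstr :: String -> C -> C
abstr x (CVar y) | y == x = I
abstr x (a :@: b)         = S :@: abstr x a :@: abstr x b
abstr _ e                 = K :@: e
```

For instance, `compile (Lam "x" (LVar "x"))` gives I, and λx.λy.x compiles to S (K K) I, which behaves as K.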

### There's SO MUCH else to do

There's SO MUCH else to do with one's life aside from figuring out combinators or primitive lambdas :-).

As if. Name one thing that's more important! ;-)

### Contemplating the nature of

Contemplating the nature of the curse from which all recursion originated?

### Convenience is the reason

There are many formal systems (combinatory logic, lambda calculus, Turing machines, etc.) all equally capable of representing computation. Why would we prefer one over the other when studying human-efficient means to formally describe computations, i.e. programming languages?

Simple convenience.

Lots of experience has shown that the lambda calculi, as well as a few other formalisms which introduce naming as a concept, "map" better to many human intuitions of computation--there's too much bookkeeping in other systems. That's not a statement I know how to make formally, let alone prove--just a simple observation.

Likewise, the other formalisms have their places as well--the Turing machine is a good model for the von Neumann computer architecture, for instance.

Saying that one particular model of computation should be "the" preferred model simply makes no sense. Just use the one that is most convenient for your work--in PLT that's usually the lambda calculus or one of the process calculi.

### There can be only one....

Somewhat reminiscent of the discussion on which is more fundamental: Lists or Sets. The answer is the somewhat unsatisfying: it depends on what you want to do.

Since we're focused on PLs and this is not "Tape the Ultimate", we are allowed to root for the home team. :-)

### Other tools

Some points about the tools you mention. Because these are analytic systems, order and steps are ignored, since order is irrelevant to the outcome. But there are reasons to consider state even in an analytic situation. You need state in any information theory context, like the one in the paper. Also, I would argue that you need information in dealing with the world outside any analytic component.

### Chris Okasaki on minimal combinators.

Chris Okasaki has a great paper that deals with one obvious problem with the S and K combinators: you need parentheses. Okasaki shows how to eliminate the parentheses so that a binary expression like 11001 can be read, literally, as the composition 1 (1 (0 (0 1))).
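The "read literally" part amounts to folding the bits into a right-nested application, something like this sketch (`Zero`/`One` here just stand in for whatever Okasaki's two combinators denote):

```haskell
data E = Zero | One | E :@ E deriving (Eq, Show)

-- Read a flat bit string as a right-nested composition, so that
-- "11001" becomes 1 (1 (0 (0 1))) with no parentheses in the input.
parseFlat :: String -> E
parseFlat = foldr1 (:@) . map bit
  where bit '0' = Zero
        bit '1' = One
        bit c   = error ("unexpected character: " ++ [c])
```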

Just in case this looks like pointless playing - Hinze shows how to use the underlying techniques to give a nice embedding of type-safe parsers in a functional programming language.