SF-calculus

SF-calculus builds combinators from two operators S and F that are more expressive than the usual S and K. The reduction rules are

SMNX = MX(NX)
FOMN = M if O is S or F
F(PQ)MN = NPQ if PQ is factorable

where the factorable forms are the partially applied operators, i.e. those of the form S, SM, SMN, F, FM and FMN. The operators themselves are the atoms; the factorable forms that are applications are the compounds.
The calculus is confluent since atoms and compounds are stable under reduction. The ability to factorise can be used to define static pattern matching, in the sense of pattern calculus. In particular, the calculus supports a generic structural equality of arbitrary normal forms.
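
A minimal meta-level sketch of these rules in Haskell may help (illustrative only: the data type and function names are not from the paper, and reduction inside arguments is omitted):

    -- Meta-level model of SF-terms: the atoms S and F, and applications.
    data Term = S | F | App Term Term
      deriving (Eq, Show)

    -- The factorable forms: S, SM, SMN, F, FM and FMN.
    factorable :: Term -> Bool
    factorable S                 = True
    factorable (App S _)         = True
    factorable (App (App S _) _) = True
    factorable F                 = True
    factorable (App F _)         = True
    factorable (App (App F _) _) = True
    factorable _                 = False

    -- One step of reduction at the head, following the three rules above.
    -- If the first argument of F is neither an atom nor factorable, no head
    -- rule applies until that argument has itself been reduced.
    step :: Term -> Maybe Term
    step (App (App (App S m) n) x) = Just (App (App m x) (App n x))   -- SMNX = MX(NX)
    step (App (App (App F o) m) n)
      | o == S || o == F           = Just m                           -- FOMN = M
    step (App (App (App F (App p q)) m) n)
      | factorable (App p q)       = Just (App (App n p) q)           -- F(PQ)MN = NPQ
    step _                         = Nothing

For example, step applied to the term F(FS)SF yields FFS, factorising the compound FS into its components F and S.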

The factorisation combinator cannot be represented within SK-calculus, so that SF-calculus is, in this sense, more expressive than SK-calculus or pure lambda calculus. To fully appreciate this surprising, if not controversial, claim requires a careful definition of representation.

This, and full proofs, are in our paper A combinatorial Account of Internal Structure.

Yours,
Barry Jay
Thomas Given-Wilson


Controversial?

The factorisation combinator cannot be represented within SK-calculus, so that SF-calculus is, in this sense, more expressive than SK-calculus or pure lambda calculus. To fully appreciate this surprising, if not controversial, claim requires a careful definition of representation.

My impression, based on a cursory examination of the paper, is that the specific claim is that it "cannot be represented" in the sense that the F combinator can distinguish SF-calculus terms with identical behavior under reduction but different representations, whereas there is no possible term in the SK-calculus that can distinguish other SK-calculus terms except by behavior.

If the above summary is reasonably accurate, I cannot see how it could possibly be a surprising or controversial claim. A quick tour of esoteric languages--particularly of the "Turing Tarpit" variety--will produce myriad examples of systems that are technically Turing-equivalent but so inexpressive as to be nigh-useless except as curiosities, best admired from a safe distance. The SK-calculus itself is arguably such, as evidenced by the "Turing tarpit" list above containing at least three languages that amount to little more than awkward notations for it.

It thus seems to me that SF-calculus is only "more expressive" than SK-calculus in roughly the same sense that a programmer might describe some languages as being "more expressive" than others.

Who needs lambdas?

My impression, based on a cursory examination of the paper, is that the specific claim is that it "cannot be represented" in the sense that the F combinator can distinguish SF-calculus terms with identical behavior under reduction but different representations, whereas there is no possible term in the SK-calculus that can distinguish other SK-calculus terms except by behavior.

This is correct; the surprising result is that there are Turing computable functions that cannot be represented in SK-calculus (i.e. lambda calculus).

Thus the controversial point is that there are Turing computable functions that cannot be represented in SK-calculus or lambda calculus. That is: Turing machines and SF-calculus can represent computable functions that the lambda calculus cannot, such as equality of arbitrary normal forms.

If the above summary is reasonably accurate, I cannot see how it could possibly be a surprising or controversial claim. A quick tour of esoteric languages--particularly of the "Turing Tarpit" variety--will produce myriad examples of systems that are technically Turing-equivalent but so inexpressive as to be nigh-useless except as curiosities, best admired from a safe distance. The SK-calculus itself is arguably such, as evidenced by the "Turing tarpit" list above containing at least three languages that amount to little more than awkward notations for it.

It thus seems to me that SF-calculus is only "more expressive" than SK-calculus in roughly the same sense that a programmer might describe some languages as being "more expressive" than others.

The SK-calculus is the traditional combinatory logic of Schönfinkel (equivalent to lambda calculus). Its utility as a programming language (and relation to others) is not the focus of the paper. The goal is to show that there are combinatory calculi (logics) that are more expressive than SK-calculus(/logic) and so there are computable functions that are beyond lambda calculus.

This is correct, the

This is correct; the surprising result is that there are Turing computable functions that cannot be represented in SK-calculus (i.e. lambda calculus).
Thus the controversial point is that there are Turing computable functions that cannot be represented in SK-calculus or lambda calculus. That is: Turing machines and SF-calculus can represent computable functions that the lambda calculus cannot, such as equality of arbitrary normal forms.

Is it not possible to write an interpreter for the SF-calculus with lambda terms (Church encoding, etc.)? That is, if we consider SF terms as data, could we not encode all SF operations in lambda-calculus?

I agree that "quoting" an SF term (producing its reification in lambda-terms or SK calculus) is a meta-operation, and that the simulation is done at the meta-level rather than the core level, but that would suggest a different, and probably less controversial, statement: « SF terms can express at the core level computations that, in lambda-calculus or combinatory logic, would require an interpretive layer to be performed. »

(I agree that the meaning of "core level", "meta level" and "interpretive layer" is vague here, but so is your "can represent" notion, and perhaps the "meta" formulation provides a better intuition of the formal result?)

Within not meta

It is possible to write an interpreter for SF-calculus in Lambda Calculus. However, as you identify, this is a meta-level operation.

Similarly, SK-calculus can be factorised (in the sense of the paper) by representing the SK-combinators as symbols on a tape or in a list. This again requires meta-level operations to factorise the SK-combinators. By contrast, SF-calculus can factorise combinators (in factorable/normal form) within the calculus.
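
To make the contrast concrete, here is a hedged Haskell sketch (the names are illustrative) of the meta-level factorisation: once SK-terms are represented as data, factorisation is just pattern matching, but that pattern matching is not itself an SK-combinator.

    -- SK-terms represented as data, i.e. as symbols in a structure,
    -- rather than as combinators applied to one another.
    data SK = S | K | App SK SK
      deriving (Eq, Show)

    -- Factorising a compound into its components is trivial on the data
    -- representation...
    factorise :: SK -> Maybe (SK, SK)
    factorise (App p q) = Just (p, q)
    factorise _         = Nothing
    -- ...but it lives at the meta-level: within SK-calculus itself a term
    -- can only be applied to arguments, never taken apart.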

This is discussed in more detail in Section 3 "Symbolic Computation" of the paper.

So TMs /= untyped LC...?

That is: Turing machines and SF-calculus can represent computable functions that the lambda calculus cannot, such as equality of arbitrary normal forms.

So then you must be claiming that the proof that Turing machines can be translated faithfully into untyped lambda-calculus is wrong...?

In type systems for higher-order abstract syntax, one of the typical problems is that the language can represent functions on terms which distinguish terms with the same denotation. For example, you might be able to do a case analysis which distinguishes (\x.x)y from y. This is usually regarded as a defect, since it breaks the term abstraction and functions on HOAS are not homomorphisms for (the semantics of) terms. After all, representing abstract syntax is trivial; the point of HOAS is to treat more than just the syntax.

I can imagine why, in the context of a calculus for patterns, you would regard a similar but more refined property as an advantage. But I do not think you want to make the claim you appear to be making...

No...

...the trick here is that TMs and lambda-calculus are not necessarily equivalent for higher-order programs.

For example, in LC you can't implement parallel-or (i.e., a program that takes two boolean thunks and returns true if either of them returns true). If you write a function that gets two boolean thunks, you have to evaluate one of them first, and this means an adversary can arrange that the one you run first loops, while the other one returns true.

However, you can implement parallel-or for Turing machines. This is because the "calling convention" for TMs is intensional -- you receive two encodings of TMs on your tape, which means you can use dovetailing/interleaving to run a simulation of the two arguments in parallel, which means that if either one ever returns true, you'll be able to tell.

Note that the calling conventions for higher-order code differ between the two languages, and that's what makes the difference happen. A somewhat dramatic way of putting this is that the Church-Turing thesis fails for higher type programs. John Longley has a gigantic survey article, "Notions of Computability at Higher Type" about this kind of stuff.
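
A step-indexed Haskell sketch of the dovetailing argument may make this concrete (a model only, not TM code; divergence is represented as an infinite stream of steps, the way a non-halting encoded machine looks from outside, rather than as an opaque non-terminating thunk):

    -- A run of an encoded machine: finitely many steps ending in a Bool,
    -- or infinitely many steps (divergence).
    data Run = Done Bool | Step Run

    -- Interleave two runs one step at a time, as a machine could do with two
    -- encodings on its tape. If either run reaches Done True, so does the result.
    por :: Run -> Run -> Run
    por (Done True)  _            = Done True
    por _            (Done True)  = Done True
    por (Done False) r            = r
    por r            (Done False) = r
    por (Step r1)    (Step r2)    = Step (por r1 r2)

The crucial ingredient is intensional access: por can take a single step of each argument because the arguments arrive as inspectable encodings. A lambda term handed two opaque boolean thunks has no analogous way to take one step of an argument without committing to evaluating it.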

Crossed parallels

in LC you can't implement parallel-or ...
you can implement parallel-or for Turing machines

I can't help but feel there is a bit of a shell game going on here.

Both TMs and LC are normally conceived of as deterministic "machines", and this example is introducing non-determinism.

Since TMs are defined in a fairly fine-grained way, it is obvious how to make them simulate non-determinism, i.e. via stepwise interleaving.

But I think we can pull the same trick off with LC. Since we already tolerate different evaluation strategies for LC, why not introduce a non-deterministic evaluation strategy, where every subexpression must reduce by one step if possible, before any expression can take the next reduction step.

This effectively "interleaves" operations in the same way as the TM example, and resolves the "inequivalence" between TMs and LC.

I think this is a case where the details of the formalisms, and the conventions and traditions associated with them can make differences seem greater than they are.

Me too

I had exactly the same reaction. It seems that there's some foul play going on here...

"You can use dovetailing/interleaving to run a simulation of the two arguments in parallel" would seem to suggest that one would need to modify the particular universal TM implemented by the tape reader itself. Of course there is nothing to stop you from doing so, but this seems the same to me as adopting a different evaluation strategy for your LC. In fact, when specifying the evaluator, LC terms are as "intensional" as TM tapes (how else could one meaningfully define substitution, for example?). If one wants to treat terms as opaque entities equipped only with some particular notion of application (say, normal order reduction), then the point is true but trivial... Given a particular universal TM, the tape is no more "intensional" than a lambda term (allowing yourself to mess with the inside of the machine is cheating).

Of course, all kinds of fiddly concrete issues would need to be dealt with: how are the two distinct machines marked on the tape input, how does the interleaving work, etc., etc. But this just strengthens the point: in the TM setting, specifying this kind of interleaving would seem to require substantial collusion between the vendor of the TM and the producer of the tape...

You don't need to appeal to a different evaluation order.

You don't need to appeal to a different evaluation order. Neel says to encode two TMs and interleave the interpretation of them on a TM. You can do the same in LC: implement a TM interpreter and run the two encoded machines in an interleaving fashion, or implement an LC interpreter and interleave the reductions of those.

In fact, I think that to show an equivalence between TMs and LC you should use an LC with a fixed sequential evaluation order, that is a particular sequential operational semantics, like CBV; otherwise you're comparing apples with oranges.

Reflection and Higher Order Programs

Neel is depending upon a property of the TM: that 'programs' are explicitly represented on a 'tape', and are thus available for reflection. Given two TM programs (that should reduce to boolean output), it is feasible to write a third TM program that interleaves them for parallel-'or'. No changes to the TM primitives are needed - excepting that you need to put all three programs on the same machine (i.e. on the same tape).

With LC we could just as easily interleave two TM programs, or even two representations of LC.

But in context of a question about 'higher order' programming, we should assume in LC that our inputs to the parallel-'or' are themselves LC programs (that should reduce to boolean output). LC's computation model lacks TM's reflection facilities. The only means for one LC subprogram to make an observation on another is by application.

Sure, that's comparing apples and oranges, but recognizing the difference between apples and oranges is the whole purpose of this comparison! Just because LC and TMs are 'equivalent' for one set of observations (whether one can write the same 'functions' in each) does not mean they are equivalent for all possible observations (such as higher-order program composition).

I compare the question of Turing equivalence to asking "is it round?". We can't distinguish between apples and oranges because they're both round. We can't even distinguish between apples and grenades. We need better questions.

to show an equivalence between TMs and LC you should use an LC with a fixed sequential evaluation order

For the equivalence questions you've been asking (can M1 and M2 express the same functions?) it is valuable to compare TMs to all sorts of LCs: CBN, CBV, fully-concurrent, etc. It strikes me as odd that you'd select CBV in particular.

For Neel's question of parallel-'or', it doesn't matter even if all lambda applications in the program are concurrent. The problem is that one cannot express an LC function representing 'or' that will reduce both X or True and True or X to True when X is an unknown or divergent LC subprogram.

Not without a new primitive, anyway.

A matter of interpretation

Neel is depending upon a property of the TM: that 'programs' are explicitly represented on a 'tape', and are thus available for reflection.

Wouldn't the "program" be more properly considered the state transition table? The tape is a combination of output and recorded state for the program embodied by the transition table.

If you're feeding "programs" in on the tape, that's a Universal Turing Machine, i.e., a TM interpreter, which can be encoded just as well in λ-calculus or Conway's Game of Life or whatever you want, and everything is trivial again.

Similarly you could feed data describing λ expressions into an evaluator that interleaves them, whether the evaluator is a Turing Machine, another lambda expression, and so on.

But in context of a question about 'higher order' programming, we should assume in LC that our inputs to the parallel-'or' are themselves LC programs (that should reduce to boolean output). LC's computation model lacks TM's reflection facilities. The only means for one LC subprogram to make an observation on another is by application.

Neither has any reflection capabilities. The TM cannot inspect its state table, either. In both cases the input would be a description of a different computation, either as tape cells or Church-encoded data.

The problem is that one cannot express an LC function representing 'or' that will reduce both X or True and True or X to True when X is an unknown or divergent LC subprogram.

I don't see how this would be expressed as a TM state transition table, either. Consider two TMs that may or may not halt starting from a blank tape; rename states until there are no overlaps and combine them. Starting from a blank tape, an initial state not in either sub-table, and any additional states desired, can you construct a machine that will halt if either of the original two does?

Keep in mind that, by construction, no state in the two sub-tables can ever reference states in the other.

UTM

Thank you for the correction. I was considering programs encoded for a UTM.

If you encoded a UTM in λ-calculus, you could compose UTM subprograms within an LC framework. And you'd probably learn something about how UTM programs compose. But that doesn't seem relevant to composition of λ-calculus subprograms within the λ-calculus.

The relevant questions of higher order program composition in a model aren't trivialized by noting that one model can be implemented in another.

you could feed data describing λ expressions into an evaluator that interleaves them, whether the evaluator is a Turing Machine, another lambda expression, and so on

True. But that is not relevant to composition of λ-calculus subprograms. Perhaps you're thinking of an alternative, fexprs-based computation model, where composing "data describing λ expressions" is the basis for program composition.

Consider two TMs that[...]

If we were composing TMs, it would be a problem; Turing's formulation is essentially defined around a tape that never changes independently of the TM, which means putting two TMs on the same tape is only okay if neither of them modify it.

But I speak of composing 'programs', not of machines. And, specifically, I speak of closed, higher order composition: combining two programs (called 'subprograms') for computation model M into a larger program for computation model M, and performing this composition from within computation model M (no meta-model for composition).

The very model of subprograms compositional

The relevant questions of higher order program composition in a model aren't trivialized by noting that one model can be implemented in another.

No, but they are trivialized by only dealing with programs being run on an ambient interpreter, such as a UTM. It's not relevant to composing λ-calculus terms directly just as it's not relevant to composing TMs directly.

What you call "higher-order composition" is only non-trivial when no external interpreter is involved.

If we were composing TMs, it would be a problem; Turing's formulation is essentially defined around a tape that never changes independently of the TM, which means putting two TMs on the same tape is only okay if neither of them modify it.

I'm not sure what it would mean to have a Turing machine use a tape it couldn't write to, that moved independently of the machine's state table. But it doesn't really matter, because...

I speak of composing 'programs', not of machines. And, specifically, I speak of closed, higher order composition: combining two programs (called 'subprograms') for computation model M into a larger program for computation model M, and performing this composition from within computation model M (no meta-model for composition).

Well, you're not going to get that from TMs or λ-calculus, because neither support a concept of composition beyond a black-box, "pass total control to the other computation" approach.

Perhaps some alternative formalization, permitting inspection of unevaluated subcomputations without ceding control to them, might suit your purposes? I believe there was a topic on the LtU forums about that just recently, in fact!

Re: 'External' Interpreters

As I understand Neel's claim about (U)TMs, he is speaking of an internal interpreter that takes advantage of the fact that all the composed programs are necessarily on the same tape as the internal interpreter.

neither [TMs nor λ-calculus] support a concept of composition beyond a black-box, "pass total control to the other computation" approach.

This is not correct.

λ-calculus has no concept of "control" (unless you specialize its definition with a particular reduction-order). LC does support "black box" composition.

TMs, OTOH, have a concept of "control", but have no concept of "black box". Everything on the tape is available. For a UTM, 'everything on the tape' would include all the programs (including the one in control).

Don't know why I bother

As I understand Neel's claim about (U)TMs, he is speaking of an internal interpreter that takes advantage of the fact that all the composed programs are necessarily on the same tape as the internal interpreter.

There is no such thing as an "internal interpreter".

Encoding a machine on tape and feeding it to a UTM is invoking a meta-environment for computation, which is what allows inspection. This is not at all equivalent to applying a λ-calculus term directly and I have no idea why you insist on making the comparison.

Encoding a TM on tape to feed into a UTM is conceptually equivalent to Church-encoding a λ-calculus term and feeding it to a λ-calculus interpreter of some sort. Why do you reject an interpreter in one case but implicitly assume it in the other? You can't have it both ways.

Encoding multiple λ-calculus terms and feeding them into an interpreter has the same compositionality benefits that a UTM does. You can even let one of the interpreted programs pretend to interpret the others if you like.

TMs, OTOH, have a concept of "control", but have no concept of "black box". Everything on the tape is available. For a UTM, 'everything on the tape' would include all the programs (including the one in control).

This is not correct.

You are persisting in confusing computational models vs. meta-models with an internal and external environment.

I am done repeating myself on this point.

Illegal moves, Casey

Encoding a TM on tape to feed into a UTM is conceptually equivalent to Church-encoding a λ-calculus term and feeding it to a λ-calculus interpreter of some sort.

Under some Humpty Dumpty definition of "conceptually equivalent", sure. Yet you insist on ignoring the problem scenario, which is program composition. As a consequence, you keep repeating irrelevant truths.

Let us assume component programs A and B. And there is a machine or bored mathematician named M who can interpret/execute these programs following a simple set of rules. Those rules correspond to the 'model' of computation for which the program is written - such as LC or UTM.

A and B, as defined in the scenario, are programs by themselves. M can interpret A, which we will assume either reduces to True or diverges. M can interpret B, which we will assume either reduces to True or diverges. For 'parallel-or', the job is to produce a program C that reduces to True if A or B reduces to true.

We must not change the encoding of A or B to achieve this, or we would be violating program composition. Additionally, we must not modify M or change to some M'. Thus, in the M = UTM case, we are assuming both A and B are encoded for a UTM. And in the M = λ-calculus case, we are assuming both A and B are encoded as λ-calculus terms. (And we could say similar things for A and B encoded in a fexprs language, or SF, or pick-your-model.)

This restriction against changing encodings defeats your "Church-encoding a λ-calculus term and feeding it to a λ-calculus interpreter of some sort". That is an illegal move - a violation of a program composition scenario.

I suspect you would argue that re-encoding 'TMs' A and B for a 'UTM' is also a violation of this scenario, and you'd be right. But that is irrelevant because, as I noted earlier, I've been speaking about a UTM all along. That is, for M=UTM, A and B are programs for that UTM; they appear on the UTM's tape. They are not re-encoded when composed into program C. Instead, the exact (original) encoding of A and B must each appear somewhere within the definition of program C, which really means: they must appear (in original form) on C's tape.

In the lambda-calculus case, A and B are programs encoded as applicative forms. Within the rules of program composition, these applicative forms must appear within the definition of C, unchanged. However, LC doesn't provide any mechanisms to tell M to peek inside these encodings. Instead, M will blindly interpret these applicative forms as straightforward parts of program C.

Within the restrictions of a 'program composition' scenario, it is claimed that UTM and LC are not equivalent. That it is possible to write C for the UTM, but impossible for LC.

The 'equivalence' that matters in program composition is "did not change machine, model, or program encoding".

Encoding multiple λ-calculus terms and feeding them into an interpreter has the same compositionality benefits that a UTM does.

Indeed, I agree. But the magic happens in that 'encoding' step, which I do not believe is feasible from within LC.

You are persisting in confusing computational models vs. meta-models with an internal and external environment.

Are you trying to say that meta-models are never, themselves, computational models? Perhaps you are insisting on a distinction that is less valid than you believe.

There is no such thing as an "internal interpreter".

Oh? I would say that most programs based on pattern-matching are interpreters - for term rewriting languages, for grammars, for regular expressions, for SF and fexprs. A 'case' statement is the very basis for interpretation. And these aren't the same as the 'ambient' or 'external' interpreter for the model as a whole. So, I wonder, what would Casey call them?

The wrong game

Under some Humpty Dumpty definition of "conceptually equivalent", sure. Yet you insist on ignoring the problem scenario, which is program composition.

I think this is a case where your fixation on distributed computation has over-complicated your understanding of the basics.

The basic premise of the Church-Turing Thesis is normally restricted to the deterministic case: for any partial function from the naturals to the naturals that you encode in one formalism (LC or TM), I can give you one in the other formalism.

The parallel-or case that Neel introduced breaks this by introducing non-deterministic inputs: the input will either be a number (evaluated as a boolean) or ⊥, determined by the termination or non-termination of the evaluation of the input.

Distinguishing between termination and non-termination is par excellence non-deterministic thanks to the Halting Problem, so introducing it has changed the normal game.

Now because we tend to think of UTMs as being "just like" computers, and the way they are defined makes more explicit their meta-awareness possibilities, it is easier to see how seemingly "without making changes" we can slip to handling this new non-deterministic game.

The catch is that many parts of the LC system are just as meta (such as evaluation strategies, etc.) but because we don't normally think about them as explicit parts of the calculus, it seems that we have to make "unfair" changes to the LC to make it work with this new game.

But this is neither here nor there: in both cases the game has changed, and making the comparison between them without rethinking both formalisms is like arguing whether a cricket bat or a baseball bat makes the better hockey stick.

Re: The Basics and Determinism

Program C on the UTM would certainly be deterministic in the above scenario. Nothing I stated involves distributed computations or indeterminism.

The parallel-or case that Neel introduced breaks this by introducing non-deterministic inputs

Properly, the inputs to parallel-or in Neel's scenario are programs within the same model. These programs (and thus the inputs to parallel-or) are deterministic.

When run independently, those programs output a (deterministic) number representing a truth value, or (deterministically) diverge. The functional requirements of 'parallel-or' require obtaining the outputs of those programs only up to obtaining a number representing True. Thus, computing the composed programs is part of the 'parallel-or' definition - but only as necessary to produce output from parallel-or... which, deterministically, shall be a number representing a truth value, or divergence.

We cannot 'reencode' the input programs because, under the rules of program composition, they must retain their properties under independent execution.

Distinguishing between termination and non-termination is par excellence non-deterministic thanks to the Halting Problem

But identifying termination after it has occurred is quite easy. ;-)

And, relevantly, that semi-decision is quite sufficient for implementing a Parallel-or. If you need a rough spec:

  • Parallel-or must output True if at least one of its input programs outputs True
  • Parallel-or must output False if all of its input programs output False
  • Otherwise, parallel-or may diverge

Now because we tend to think of UTMs as being "just like" computers, and the way they are defined makes more explicit their meta-awareness possibilities, it is easier to see how seemingly "without making changes" we can slip to handling this new non-deterministic game.

I've neither assumed nor implied indeterminism. The resulting program will be deterministic.

many parts of the LC system are just as meta (such as evaluation strategies, etc.)

Since I didn't explicitly name one, I'd allow any evaluation strategy for LC whose divergence characteristic will be deterministic. This excludes full-beta reduction, but allows full concurrent reduction (roughly, full-beta reduction with fairness).

Non-deterministically deterministic

But identifying termination after it has occurred is quite easy. ;-)
...
I've neither assumed nor implied indeterminism. The resulting program will be deterministic.

Based on the definition of "deterministic" you seem to have, everything is deterministic, because you can always wait until whatever happens and then say that that was what was supposed to happen.

The way I use the word, deterministic means something like "100% predictable from initial conditions".

Since we know from the Halting Problem that we have no way to reliably predict whether an arbitrary program will halt or not from its initial conditions, any program which must discriminate its behaviour based on the termination or non-termination of an arbitrary other program is by definition non-deterministic.

I sometimes think that many of the problem threads around here could be solved if everybody studied a little bit more philosophy. ;-)

Re: Definition of "Deterministic"

RE: The way I use the word, deterministic means something like "100% predictable from initial conditions".

Usually, we inductively prove a program in a model is big-step deterministic by proving each small-step and composition expression is deterministic. This does not mean we can generally predict the emergent or big-step behavior without actually running the program - not even whether the program halts, and certainly nothing more precise. Determinism, as I understand it, asserts only that the behavior (whatever it might be, divergence included) will depend only upon initial conditions. One might refine determinism for specific properties (e.g. deterministic timing or deterministic output, deterministic-on-success vs. deterministic-failure-modes).

For Halting Problem, we usually use the word 'undecidable'. That is, we cannot write a universal decision algorithm H that will, for arbitrary program P produce in finite time a yes or no answer to "will P halt?" (or H(P); assuming inputs to P are part of P's definition). But, even if we cannot know what H will answer, we may prove that H(P) will have the same behavior every time: divergence or an answer.

If you really feel I'm misunderstanding the words, I'd love to see a second opinion. :-)

Based on the definition of "deterministic" you seem to have, everything is deterministic, because you can always wait until whatever happens and then say that that was what was supposed to happen.

That isn't the case at all. I would say that Actors model, for example, is fundamentally indeterministic because it imposes no ordering on receipt of incoming messages (other than that they arrive after they are sent). Thus, there is no small-step determinism from which we may inductively deduce big-step determinism.

Given an initial set of conditions, an Actors model program is permitted to produce a different observable behavior each run. A program may also be halting for some runs, and non-halting for others.

I sometimes think that many of the problem threads around here could be solved if everybody studied a little bit more philosophy. ;-)

Methinks you should study the philosophy fora. :-)

A different approach

Usually, we inductively prove a machine, program, or model is big-step deterministic by proving each small-step is deterministic, and each composition is deterministic.
This does not mean we can predict the big-step output, but we can claim that we'll have the same output every time we run the program (assuming inputs remain the same).

By the time we are talking about small-step and big-step we have already escaped either formalism. A human being able to look at the details of the system and prove things about it is already at a meta-level from the system, and how easy it is to make such proofs in some particular circumstance is not a measure of the formal power of the system itself.

It is true that the TM definition lends itself to small-step thinking, and that most people who use the LC also rely on big-step evaluation, but these are more by convention and circumstance than necessity.

Another factor is that it is very rare for anyone talking about TMs to really nail down the details of the exact system they mean, whereas LCs are usually laid out in excruciating detail, so I think there is an illusion that the former are more flexible than the latter. ;-)

If you really feel I'm misunderstanding the words, I'd love to see a second opinion. :-)
...
I would say that Actors model, for example, is fundamentally indeterministic because it imposes no ordering upon receipt of incoming messages

Not very surprisingly ;-), you seem to be using "deterministic" in the more restricted sense normally used when talking about concurrent programs. I'm not saying that's wrong, but to understand what I mean, you need to go with the definition I gave.

I think if we can agree that the normal understanding of both TMs and LC is restricted to functions of the form N → N and the parallel-or function is of the form N⊥ → N⊥, and that this difference changes the game being played, we will at least have a shared understanding of our game. ;-)

(For the nitpickers at home, substitute "isomorphic to" for "of the form" in the above paragraph.)

The reason I think this is worth hashing out is simply because of the seemingly endless stream of "discoveries" (including the paper in the original post) that Church-Turing is "inadequate" as a basic model of computation, when most often the discoverer has merely changed the rules of the game without noticing.

Methinks you should study the philosophy fora. :-)

I was already thinking "be careful what you wish for" when I made that last post. ;-)

Re: a different approach?

A human being able to look at the details of the system and prove things about it is already at a meta-level from the system, and how easy it is to make such proofs in some particular circumstance is not a measure of the formal power of the system itself.

I'm not following this argument. Are you referring to a specific formalized power? or to the set of all possible formalizable powers? If the latter, I'm inclined to disagree. If the former, I object to referencing ability to express various classes of computable functions as 'the formal power of a system'. But perhaps you meant something else?

Can you clarify how your argument here is relevant to the earlier discussion on Determinism and Halting?

most people who use the LC also rely on big-step evaluation

Do they really? It has been my impression that most people understand LC in terms of small-steps as well - i.e. individual substitutions and term reductions. Thus, they reduce (\x.(x x) \x.(x x)) one step at a time.

Each step is fully deterministic - deterministic inputs, and deterministic outputs. Therefore, the behavior is deterministic, even if we never see a normal form (which we won't).

you seem to be using "deterministic" in the more restricted sense normally used when talking about concurrent programs

It applies also to stochastic or explicitly random programming models. For example, stochastic programming models explicitly include probabilities for certain operations occurring. And I know at least one esoteric language that executes statements in a random order (but one at a time).

We can combine concurrency with determinism, as for concurrent constraint programming and certain forms of dataflow programming (such as Kahn process networks). In these models, concurrency is necessary to achieve progress of the computation.

I understand concurrency and determinism to be orthogonal properties. I'm really not understanding your definition for it.

to understand what I mean, you need to go with the definition I gave

Ah. There's glory for you.

the parallel-or function is of the form N⊥ → N⊥, and that this difference changes the game being played

Program C holistically is still of the form N → N⊥

But I'll happily grant that the very question of program composition changes the game being played - at least if you had assumed the game being played was representing computable functions (by translation).

the seemingly endless stream of "discoveries" (including the paper in the original post) that Church-Turing is "inadequate" as a basic model of computation

They are inadequate, for many reasonable definitions of "adequate". ;-)

But I agree people should be careful to identify precisely which definition of adequacy they are using, and to distinguish it from that applied in the Church-Turing thesis.

Clarifying, I hope

Thank you for being so concrete and clear. This is a different point than I thought you were making (and it's still not clear to me whether Neel was making your claim or another one.) In any case, I now see what you're saying, and hopefully I can be as clear.

I'll quote you at length, so it's very clear what my terms mean:

Let us assume component programs A and B. And there is a machine or bored mathematician named M who can interpret/execute these programs following a simple set of rules. Those rules correspond to the 'model' of computation for which the program is written - such as LC or UTM.
A and B, as defined in the scenario, are programs by themselves. M can interpret A, which we will assume either reduces to True or diverges. M can interpret B, which we will assume either reduces to True or diverges. For 'parallel-or', the job is to produce a program C that reduces to True if A or B reduces to true.
We must not change the encoding of A or B to achieve this, or we would be violating program composition. Additionally, we must not modify M or change to some M'. Thus, in the M = UTM case, we are assuming both A and B are encoded for a UTM. And in the M = λ-calculus case, we are assuming both A and B are encoded as λ-calculus terms. (And we could say similar things for A and B encoded in a fexprs language, or SF, or pick-your-model.)
This restriction against changing encodings defeats your "Church-encoding a λ-calculus term and feeding it to a λ-calculus interpreter of some sort". That is an illegal move - a violation of a program composition scenario.
I suspect you would argue that re-encoding 'TMs' A and B for a 'UTM' is also a violation of this scenario, and you'd be right. But that is irrelevant because, as I noted earlier, I've been speaking about a UTM all along. That is, for M=UTM, A and B are programs for that UTM; they appear on the UTM's tape. They are not re-encoded for composition of program C. Instead, the exact (original) encoding of A and B must each appear somewhere within the definition of program C (which really means: they must appear on that tape).
In the lambda-calculus case, A and B are programs encoded as applicative forms. Within the rules of program composition, these applicative forms must appear within the definition of C, unchanged.
Within the restrictions of a 'program composition' scenario, it is claimed that UTM and LC are not equivalent. That it is possible to write C for the UTM, but impossible for LC.

I agree that it's possible to write C for UTM. But we need to dig deeper. What exactly is C? What must it do? In order to guarantee interleaving without modifying UTM, C must effectively implement its own UTM which performs the interleaving as well as all the rest of the operations of the underlying UTM. But now it seems to me that you have two problems:

  1. It's difficult to support your claim that the encoding of A and B need not be changed. In particular, either A or B may require unbounded tape for its computation, so the operation of C will need to somehow split the tape between A, B and its own specification and internal state. (For example, we might change "one move left" in A to "ten moves left" or something similar, giving each program infinite tape and allowing them not to interfere.) But clearly A and B will need to be re-encoded to execute in this new environment (e.g., separating each tape marking by 9 spaces).

    But this re-encoding could be performed by C itself, so I concede that it doesn't really undermine your specific claim that A and B could initially be present on the tape in their original form.

  2. It's difficult to support your claim that the situation with UTM is fundamentally different than implementing a new interleaving LC interpreter in LC and providing A and B in whatever representation of lambda terms you like. In either case there is clearly another layer of interpretation occurring.

If you agree so far, then it seems that the only real claim being made is that a tape is a somewhat better (or at least somewhat more obvious) form of general-purpose data representation than lambda terms. I agree entirely, but I don't think it's an especially deep point. The fact that you're overloading the tape to provide input to another interpreter is a sleight-of-hand that has important implications for practical computers, but little theoretical import.

Re: Now you have two problems

;-) I refuse to make any claim that UTM is 'good' or 'better' because it can do this. One can make any number of counter-points about LC's relative benefits for local reasoning, security, and so on.

It's difficult to support your claim that the situation with UTM is fundamentally different than implementing a new interleaving LC interpreter

Let's assume that the 'interleaving LC interpreter' already exists and does what we need, under the name 'ILC'. How do we apply ILC? This question ultimately differentiates the scenarios.

We aren't allowed to re-encode A and B for ILC. Essentially, we'd need some way to 'quote' A and B (or otherwise obtain an encoding of their expression) so that ILC can 'interpret' them. But we aren't allowed to tweak the machine (or bored mathematician) M, thus we cannot add any quoting or fexpr primitives.

This returns to your point (1): you say: "this re-encoding could be performed by C itself". That is true for a UTM. It is not true for LC.

The fact that you're overloading the tape to provide input to another interpreter is a sleight-of-hand that has important implications for practical computers, but little theoretical import.

I disagree. I believe this 'sleight of hand' is both practical and worthy of theoretical exploration - not for the UTM in particular, but rather the exploration of staged and reflective programming in general. Rigorously apply this sleight of hand to your Coq, and you'll emit an Ssreflect library. >;^)

Exploring approaches to reflection and staging, and especially what can be achieved without 'escaping' a given computation model, is of import to both practice and theory. It also remains a major research area today.

Feeling much better

I'm glad we dug deeper. We now agree on the technical details, and I guess we just disagree on the significance of the distinction, which is eventually a matter of opinion, I suppose. Incidentally, I could see a bunch of ways of extending this to a claim that I would find more significant: for instance, if you could show that any LC encoding of A and B suitable for interpretation by C required an asymptotic blowup in space, that would be more interesting. (Even more interesting if you could show that the UTM C did not require a corresponding blowup in time/space to unpack A and B prior to interpretation.) This sounds like a familiar direction, but I can't think of specific work.

Anyway, I will leave it to you to "rigorously apply this sleight of hand to your Coq," and you can feel free to enjoy whatever emissions are produced... ;)

Quelle Bordelle

You shouldn't make fun out of the French. ;)

Grammaire

Actually "bordel" is a masculine noum (coincidence?) so it is "Quel bordel" :-).

BTW, I'm french and I approve making fun of OCaml and Coq :-p.

Right

Postscriptum

I apologize for the tone of my previous comment, above. I was frustrated and unable to sleep at 2:40am and let that unduly color my remarks; I plead forbearance. It is probably best to recuse myself from further discussion on the matter, as well.

Don't worry

I share your frustration. I thought I addressed all of this in my first comment on the subject, but it's since been repeated three or four more times, to little avail. I'm finding this thread extremely frustrating and am trying very hard not to post any more comments...

Those comments

My response was intended as much as a response to your comment as to Frank's. The claim that "one would need to modify the particular universal TM implemented by the tape reader itself" isn't true. One can encode a new UTM within a UTM, and that is what the 'parallel-or' program in Neel's scenario would essentially be - a reflection or simulation of multiple UTMs.

I fully agree that "(allowing yourself to mess with the inside of the machine is cheating)". But changing the encoding of the programs being composed (beyond a trivial renaming of 'main') is also cheating. [Namespace issues are often a concern for program composition!]

I don't get all these arguments

Since the tape of a Turing machine can be seen as a number, and LC can express number arithmetic (thereby simulating a Turing machine), and Turing machines can evaluate LC expressions, are they not just equivalent?

How about simulating TM using untyped LC?

Hi, I had to show the equivalence between TMs and untyped LC as a homework assignment this year. The way I did it was to encode a universal Turing machine using lambda calculus and show the existence of a Turing machine that can reduce any lambda-term that it has on its input tape (written using an appropriate encoding).

The article is available here; I'm sorry it's in French.

So my question is: if I have the encoding of a TM that implements a parallel-or, what would stop me from encoding it in the lambda-term representation I used in my homework and then simulating it in untyped lambda-calculus?
Because if I can do this, then I have implemented parallel-or in LC, and according to you there's a problem with that.

What's wrong with my reasoning?

EDIT: okay, so I just saw the other responses to neelk's post; maybe I'm not wrong after all.

More subtle than that...

It's a bit more subtle than that. No one is claiming that LC and TM are not formally equivalent, in the sense that you can come up with a suitable encoding of input terms so as to allow implementing parallel-or in LC. The question is more to do with what I might call the "native expressivity" of each formalism. Given two arbitrary boolean predicates in LC, encoded in a "natural" way and not according to your whims, is it possible to implement parallel-or?

In other words, the back-and-forth encoding of a TM parallel-or in LC is likely to require a global transformation or another layer of interpretation, which is exactly what we want to avoid.

This is made tricky, though, by the fact that neither formalism really even has a native notion of "boolean predicate," so immediately you have to deal with some minimal questions of encoding.

That said, I still don't really buy Neel's argument above, for the reasons in my previous comment... But maybe there's more to the story. I glanced at the survey he linked, but honestly just don't have time to go through it. Hopefully he can expand or clarify for us?

Still don't see it

My LC knowledge when it comes to boolean encodings is twenty years old. But, as far as I remember the trick was that true and false implement conditionals, so I thought they were already parallel in nature? (Under lazy reduction)

[Ah well, I'll look up the conditionals.]

Not individual conditionals

Neel's challenge is expressing a single `or` function that can reduce both (True `or` _|_) and (_|_ `or` True) to True. The question of parallel reduction of operands is not relevant.
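
For intuition, the same limitation shows up in a lazy functional language (a hedged Haskell sketch, not LC itself):

    -- A sequential 'or' can be non-strict in at most one argument.
    por :: Bool -> Bool -> Bool
    por True _ = True
    por _    b = b

    -- por True undefined  evaluates to True, but
    -- por undefined True  diverges: the first argument must be forced before
    -- the second clause can be selected. Swapping the argument order only
    -- moves the problem; no single sequential definition handles both cases.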

Yeah

I got that after reading the other comments. So, I guess SF is mostly about the introspective qualities of different combinatory calculi.

Nice

Even in French, I could actually read that.

Unconvinced, but grateful

I will have a gander at that article, thanks. But I don't believe your argument; see my post below.

Not quite...

So then you must be claiming that the proof that Turing machines can be translated faithfully into untyped lambda-calculus is wrong...?

No, either system can be translated into the other - but this requires some meta-level operation to factorise the lambda-terms, or to translate the tape and machine state. Further, the translation of a Turing Machine into Lambda Calculus must represent the symbols on the tape in some data structure and not as the application of the symbols to each other.

The claim is that there are computable functions that can be represented within SF-calculus that cannot be represented within SK-calculus. Consider the equality-of-normal-forms combinator equal, which can be applied as follows:

equal X Y

When X and Y have the same normal form, the result is true (K); if X and Y have distinct normal forms, the result is false (KI) [1].

By contrast there is no SK-combinator that can determine equality of normal forms within SK-calculus[2].

Note that none of this conflicts with the traditional proofs that all of these models can compute the same effectively calculable general recursive functions (by definition functions on the natural numbers).

1 - Details/proofs, including confluence, are on pages 8-9 of the paper

2 - Corollary of Theorem 3.2 on page 7.
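
A meta-level sketch (in Haskell, not in SF-calculus) of what equal computes on normal forms: atoms are compared directly, and compounds are factorised and compared componentwise. Inside SF-calculus this recursion is programmed using F; inside SK-calculus the factorisation step is the part that is unavailable.

    -- Normal forms of SF-calculus, modelled as data at the meta-level.
    data Term = S | F | App Term Term

    -- Structural equality of normal forms, mirroring the in-calculus 'equal':
    -- compounds are factorised into components and compared recursively.
    equalNF :: Term -> Term -> Bool
    equalNF S S                   = True
    equalNF F F                   = True
    equalNF (App p q) (App p' q') = equalNF p p' && equalNF q q'
    equalNF _ _                   = False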

Edited to fix title.

Re: Computational Equivalence

Most people will read 'computational equivalence' as 'capable of computing the same functions'. Translations (one in each direction, rarely isomorphic) serve as proof of this 'computational' equivalence. (How else would we compare different models for this equivalence?) For better or worse, you are not going to redefine that phrase; such attempts will be met with as much confusion, hostility, and misunderstanding elsewhere as they are here.

Instead, use some other words - perhaps 'expressive power' and 'subsumption', which you can reference. Identify compositional properties. If necessary, invent and formalize a word-phrase that won't be as heavily overloaded; avoid strong connotations that cause people to argue definitions with you.

Thanks for the comment.

Thanks for the comment. I have fixed the title, my mistake.

OK, but...

I think I understand what you are saying, but I still have problems with the wording of your claim.

You say, "The claim is that there are computable functions that can be represented within SF-calculus that cannot be represented within SK-calculus." Let's be more explicit. The set of "computable functions" is the set of partial recursive numeric (PRN) functions. SKI, LC, TMs and presumably SF-calculus all have models which are surjective on this set. This is the sense in which they "represent" all computable functions.

Now, given this (standard) definition, it cannot be the case that there exists an SF-term whose interpretation does not coincide with the interpretation of some LC- or SKI-term since every SF-term is interpreted by a PRN function and every PRN function is in the image of the other models; in short, SF models cannot be "more surjective".

I suspect what you want to say is something more like this. PRN functions form some sort of F-algebra (for some F which accounts for factorizability or whatever) and so do SF-terms. SF is a classifying model for F, while LC, etc. need to be equipped with extra structure to form F-algebras.

This would be an interesting result, but not controversial.

You say, "The claim is that

You say, "The claim is that there are computable functions that can be represented within SF-calculus that cannot be represented within SK-calculus." Let's be more explicit. The set of "computable functions" is the set of partial recursive numeric (PRN) functions. SKI, LC, TMs and presumably SF-calculus all have models which are surjective on this set. This is the sense in which they "represent" all computable functions.

Let me focus upon this statement and clarify, as when making it I was using it to summarise the statements in the paper.

On page 4 a "symbolic function" is "an n-ary partial function G of some combinatory logic, i.e. a function of the combinators that preserves their equality, as determined by the reduction rules." For example, S and K can be defined as symbolic functions:

S(M,N,X) = MX(NX)
K(X,Y) = X

On page 6 we clarify that we are interested in "symbolic computation", that is a "Turing-computable symbolic function". The "Turing-computable" is in the sense that it can be computed using a Turing machine.

Note that there is no requirement that the "(Turing) computable (symbolic) functions" (TCSF) be the partial recursive numeric functions (PRNF). That said, there is of course a way to "represent" any such TCSF by a PRNF. Part of the discussion (beginning page 7) is that the shift from one to the other is metamathematical in nature, and requires implicit factorisation, say, of SK-combinators.

To return to the claim, I hope this clarifies how "there are [Turing-]computable [symbolic] functions that can be represented within SF-calculus that cannot be represented within SK-calculus"? Specifically, the factorisation combinator F represents (from here used in the sense of the paper) a TCSF on combinators that cannot be represented within SK-calculus/Lambda Calculus.

Of course F can be represented once you encode the SK-combinators into a list or tape or some other structure and compute over that structure. But then the factorisation has already been performed at a meta-level.

Now, given this (standard) definition, it cannot be the case that there exists an SF-term whose interpretation does not coincide with the interpretation of some LC- or SKI-term since every SF-term is interpreted by a PRN function and every PRN function is in the image of the other models; in short, SF models cannot be "more surjective".

I suspect what you want to say is something more like this. PRN functions form some sort of F-algebra (for some F which accounts for factorizability or whatever) and so do SF-terms. SF is a classifying model for F, while LC, etc. need to be equipped with extra structure to form F-algebras.

This would be an interesting result, but not controversial.

I agree that there are (PRN) functions whose interpretation is the same as the interpretation of some SF-term. The claim is that there are (TCS) functions (e.g. F) that are representable within combinatory logic but are not representable by an SK-combinator. However, there are combinatory logics that can represent these TCSFs (e.g. SF-calculus), and so there is more to combinatory logic than lambda calculus.

Who needs anything, then?

This is correct; the surprising result is that there are Turing computable functions that cannot be represented in SK-calculus (i.e. lambda calculus).

Thus the controversial point is that there are Turing computable functions that cannot be represented in SK-calculus or lambda calculus. That is: Turing machines and SF-calculus can represent computable functions that the lambda calculus cannot, such as equality of arbitrary normal forms.

But this is of course only true if you exclude using SK-calculus to implement an interpreter for the SF-calculus. One might as well say that the SK-calculus cannot represent integers because it lacks them as primitives and must use some sort of meta-level encoding, e.g. Church numerals. For that matter, I don't see how either SK- or SF- calculus is equivalent to a Turing machine in this sense; which combinator inspects the color of a cell on a tape?

Many demonstrations of Turing equivalence in fact rely heavily on encoding schemes and simulation of other Turing-equivalent systems, rather than accomplishing anything using "native" constructs.

It is, I think, only the meaning of "represented" here that is surprising or controversial.

The SK-calculus is the traditional combinatory logic of Schönfinkel (equivalent to lambda calculus). Its utility as a programming language (and relation to others) is not the focus of the paper. The goal is to show that there are combinatory calculi (logics) that are more expressive than SK-calculus(/logic) and so there are computable functions that are beyond lambda calculus.

Personally, I would characterize "logical expressivity" as a subset of "utility as a programming language", but that's not terribly relevant. The point was merely to demonstrate that the existence of Turing-equivalent systems that are unable to express some functions "directly" is well-known. In some circles, for instance, this idea would be known as "Greenspun's Tenth Rule".

I hope you don't mind my complaints too much--the SF-calculus looks interesting, but the way expressive power is discussed detracts somewhat from the paper, to my mind. In short:

  • Turing-equivalence is an irrelevant distraction
  • The idea that some systems are more "expressive", in some other sense, is neither surprising nor (I hope) controversial
  • SF-calculus is indeed more "expressive", in some sense, than SK-calculus

I would actually be interested in a formal treatment of "representation", "expressive", and related concepts, as they're used here. In what senses can useful distinctions be drawn between two Turing-equivalent systems? I didn't see a direct definition of the terms in the paper, nor an obvious reference to other material giving such. What would be a good place to look?

The Expressive Power of Programming Languages

The definitive citation is Felleisen 1990, On the Expressive Power of Programming Languages, Science of Computer Programming.

Expressive Power of Programming Languages

Subsumption in Language

Paul Blain Levy in his Call-By-Push-Value: A Subsuming Paradigm offers a nice definition of 'subsumption' in language, which can be used to compare expressiveness.

But what does it mean for one language to subsume another? After all, there are sound, adequate translations from [call-by-name and call-by-value] languages into each other and into other languages such as linear λ-calculus, Moggi's calculus, and others. So we must explain in what sense our translations into CBPV go beyond these "classic" transforms.

We therefore introduce the following informal criterion. A translation α from language L' into language L is subsumptive if every "naturally arising" denotational semantics, operational semantics, or equation in L' arises, via α, from a "similar" denotational semantics, operational semantics, or equation for L.

The importance of such a translation is that the semanticist need no longer attend to L' because its primitives can be seen as no more than syntactic sugar for complex constructs of L. We shall see in Sect 1.2 that the classic translations mentioned above are not subsumptive.

This is informal, but I think it offers a useful intuition for understanding when one language L subsumes L': when individual expressions of language L' are essentially local syntactic sugar for expressions in L, without loss of important semantic properties and with no need for additional frameworks.

But this is of course only

But this is of course only true if you exclude using SK-calculus to implement an interpreter for the SF-calculus. One might as well say that the SK-calculus cannot represent integers because it lacks them as primitives and must use some sort of meta-level encoding, e.g. Church numerals. For that matter, I don't see how either SK- or SF- calculus is equivalent to a Turing machine in this sense; which combinator inspects the color of a cell on a tape?

Of course it is possible to create an interpreter for X in Y (where X and Y may be SK-calculus, SF-calculus, Lambda Calculus, Turing Machines, etc.). However, this requires that the original X-terms (etc.) be factorised into some structure in Y that can then be operated on in Y.

The focus of the paper is upon the ability to look inside the structure of a factorable form. This allows some functions to be represented within SF-calculus that cannot be represented within SK-calculus. For example, the equality of normal forms as discussed on page 9 of the paper.
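
For reference, here is a meta-level sketch (mine, not the paper's in-calculus construction) of the kind of function meant by "equality of normal forms"; the point is that F makes this case analysis available inside the calculus, whereas no SK-combinator can perform it.

-- Structural equality of reified SF-terms, assuming both arguments are
-- already in normal form.
data Term = S | F | App Term Term

eqNF :: Term -> Term -> Bool
eqNF S S                   = True
eqNF F F                   = True
eqNF (App p q) (App p' q') = eqNF p p' && eqNF q q'
eqNF _ _                   = False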

Note that there is no claim that SF-calculus is equivalent to a Turing Machine.

I hope you don't mind my complaints too much--the SF-calculus looks interesting, but the way expressive power is discussed detracts somewhat from the paper, to my mind. In short:

  • (A) Turing-equivalence is an irrelevant distraction
  • (B) The idea that some systems are more "expressive", in some other sense, is neither surprising nor (I hope) controversial
  • (C) SF-calculus is indeed more "expressive", in some sense, than SK-calculus

I don't mind at all. To address your points (labelled above):
A - We didn't mention "Turing-equivalence". Of course one model can simulate another; however, this requires a meta-mathematical operation to convert the terms/combinators/etc. into a form the other model can use.
B - Although this is true in general, I believe it is surprising and unexpected that a combinatory calculus is more expressive than Lambda Calculus, especially as it was thought that all combinatory calculi could be represented by S and K alone and were thus equivalent to Lambda Calculus. SF-calculus disproves this.
C - See B, and consider the equality of normal forms (above).

I would actually be interested in a formal treatment of "representation", "expressive", and related concepts, as they're used here. In what senses can useful distinctions be drawn between two Turing-equivalent systems? I didn't see a direct definition of the terms in the paper, nor an obvious reference to other material giving such. What would be a good place to look?

"Representation" is defined at the bottom of page 4 (of the paper). A discussion of the relation between different models is in Section 3 "Symbolic Computation" (pages 6-8). As discussed above, another good place to look would be "On the Expressive Power of Programming Languages" by Matthias Felleisen.

Intension/Extension and 'shape'

The whole point of the SF-calculus, as far as I am concerned, is that application is not blind. More precisely, before you evaluate a term, you are allowed to look at its shape (its intension) and make a decision about what to do based on that. Extensional calculi explicitly prevent themselves from doing that, basically to get more powerful results about substitution.

But this does not happen just in programs, it is already present in mathematics.

This leads to an interesting, new kind of 'calling convention' that I refer to as call-by-intension or call-by-computation [but would welcome suggestions for better names]. In a language with first-class syntax (i.e. one where you have programs-as-data), a function can ask that its input be given "unevaluated". This is different than call-by-name where you get a thunk: you get an actual value, but it is an AST which, when evaluated, will give you the 'actual value'; of course, the computation could be divergent...

Lest people think this is really esoteric, this calling convention has existed in Maple since its very beginning. It turns out to be quite useful! The SF calculus is the first time I see another language with this feature. [Thomas, Barry, email me, I think we should discuss this further, I've been working on this on-and-off for quite some years].
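
To make the calling convention concrete, here is a small sketch (Haskell, with invented names; not Maple's or SF-calculus's mechanism): the callee receives its argument as an AST and inspects its shape before deciding whether to evaluate it.

-- A toy expression language, handed to functions as data rather than values.
data Expr = Lit Int | Add Expr Expr | Div Expr Expr
  deriving Show

eval :: Expr -> Maybe Int
eval (Lit n)   = Just n
eval (Add a b) = (+) <$> eval a <*> eval b
eval (Div a b) = do
  x <- eval a
  y <- eval b
  if y == 0 then Nothing else Just (x `div` y)

-- A "call-by-intension" consumer: it looks at the shape of its unevaluated
-- argument and rejects any syntactic division by the literal zero, without
-- ever running it.
checked :: Expr -> Maybe Int
checked e
  | divByZero e = Nothing
  | otherwise   = eval e
  where
    divByZero (Div _ (Lit 0)) = True
    divByZero (Div a b)       = divByZero a || divByZero b
    divByZero (Add a b)       = divByZero a || divByZero b
    divByZero (Lit _)         = False

-- e.g. checked (Div (Lit 6) (Lit 0)) == Nothing, decided purely from the syntax.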

Shutt's Kernel Language

John Shutt seems to be working on a similar premise for Kernel. Maude reflection is also similar in purpose and power.

I'm looking for a good balance between the expressive power the reflexive (pass-by-expression?) features offer and the compositional power that object capability models and first-class hidden models can offer. I'd really like to get performance advantages out of both. But my attempts thus far have been unsatisfying (i.e. failing support for Ka Ping's 'Secure Interaction Design' principles that require it be clear at all times which authorities are possessed, and which are shared).

pass-by-expression

Very good, I might adopt that.

Interesting

It is truly remarkable that two such different roads should lead to fexprs at almost exactly the same time. It's also remarkable that you seem to be unaware of the extensive recent discussion on this site of "call-by-intension." I wonder if there's some substantial difference that I'm missing?

Where?

There were some stretches of time where I did not visit LtU all the time, so I may well have missed some important discussions. Can you link to them please?

Link Link

Thanks

I had indeed missed that those discussions contained relevant material.

I would add this one. You

I would add this one. You should be aware that call-by-intension takes you straight into deep holy-war territory, and for better or worse, these discussions are also now inseparable from the (many) difficulties surrounding Scheme's community and standardization process.

Regarding denotational semantics, the Wand paper is worth reading, but something tells me it's no longer the final word on the subject. The implications for source-to-source optimization are by now very well known, but today when this is more or less taken for granted and JITs are increasingly the norm... Well, I will be interested to see a fresh look at these ideas.

(Edit: incidentally, I don't blame you at all for missing the relevance of those discussions. They were more valuable than they appeared, in my opinion, but getting to the good stuff takes some work.)

Call-by-intension

There seems to be something of a spectrum here, of call-by-intension devices. Wand's paper was looking at a device —a kind of fexprs— that was capable of completely dismantling and analyzing any computational term whatsoever in the formal system. You might call that "full reflection". The device I'm working with, also a kind of fexprs, can completely dismantle any source expression (i.e., an AST), but cannot dismantle a computation that is already underway, and cannot completely dismantle certain kinds of encapsulated objects that occur at interpretation time — in particular, it cannot dismantle environments or fexprs. And in SF-calculus, it seems, there is yet another balance struck as to what the combinators can and cannot dismantle. Clearly, when two objects can be distinguished by dismantling them as operands, those two objects can't be formally equated to each other. Wand's system has a completely trivial theory exactly because it can completely dismantle all possible terms. The other systems have nontrivial theories because there are terms that they can't dismantle.

Superb

Very well put. Yes, there is a full spectrum of choices here, and the 'extreme' choice forces pure interpretation. There must be some middle ground designs which are 'useful'. The choice of being able to dismantle any AST but not an arbitrary computation is particularly appealing. [And is indeed what the Chiron logic [PDF] supports].

Cool

Thanks for the reference; I see some serious reading in my immediate future. Of course, vau-calculus is untyped and computation-oriented, rather than typed and logic-oriented... which both explains how the two paths of research might have managed not to cross, and also makes the comparison between them all the more fascinating.

Although vau-calculus is dabbled with slightly (using what by now seems a rather inferior explanation of its relation to Wand's work; you've already seen some of my latest thinking on that) in Appendix C of the Kernel Report [PDF], the definitive reference on vau-calculus would be Part II of my dissertation — which I submitted to my committee a few days ago, and won't be defending until mid-August.

I've been doing this in

I've been doing this in Bling for a long time. I thought it was normal: when you have functions that compute over expression trees rather than values. If you dress the expression trees up to look like normal values, the user doesn't really know the difference.

Computing over expressions

"computing over expressions" has been 'done' for a very long time - LISP (1958) surely claims dibs on that. Hiding the difference is both convenient and very dangerous: 'equivalent' expressions may not give you the same answer anymore. This is a feature which can lead to results which are so surprising that I am quite wary of 'hiding' it. Although I certainly understand the attraction.

What I see as missing is a reasonable theoretical framework that allows one to deal with expressions and their values as co-existing and linked entities in the same language. Having an operational semantics is the easy part; having a sound denotational semantics is what I don't think has been done. Or, if it has, I would love to see it, as I would start using such a denotational semantics!

Lots of people...

...have done this for a long time, but it's not a "normal" operation. The reason is that having unrestricted reflective access to your AST means that equational properties that you might expect your language to have, don't necessarily hold.

E.g., we might expect (λx. x) 5 to be just the same as writing 5, but if you have access to the expression tree, then you can tell these two terms apart and return different results based on the syntax of the input. This is why systems like MetaML don't allow reflection on syntax, even though they have staged computations -- the designers wanted to preserve as strong an equational theory as they could.
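
A tiny sketch of how that equation breaks (Haskell, my encoding): once a context can inspect the expression tree, it can separate terms that the equational theory identifies.

-- Reified lambda terms (plus integer literals for the example).
data Term = Var String | Lam String Term | App Term Term | Lit Int
  deriving (Eq, Show)

-- A syntax-inspecting observer: is the argument a beta-redex?
isRedex :: Term -> Bool
isRedex (App (Lam _ _) _) = True
isRedex _                 = False

-- isRedex (App (Lam "x" (Var "x")) (Lit 5))  ==  True
-- isRedex (Lit 5)                            ==  False
-- so, observed through isRedex, (\x. x) 5 and 5 are no longer interchangeable.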

OTOH, sometimes you really do want to look at the syntax and decide what to do -- e.g., to figure out the right order of queries to a database. Or in your case, I imagine you need to look closely at the source code to figure out how to best split code between GPU and CPU. So this is one of those features which is (a) really useful, but (b) opens the door to screw up modularity unspeakably badly.

We can be polite and call it a "research opportunity". :)

e.g., to figure out the

e.g., to figure out the right order of queries to a database.

How would you imagine using syntax to determine the right order to send queries to a database? And in what sense are you using the word 'right'?

I could perhaps see using syntactic reflection to batch statements together, which is what most object/relational mapping tools do. Googling for a good explanation came up with this search result. There are other kinds of syntactic rewrites one might wish to apply, though, such as protecting the capability to add a column to a table, since in most database engines this will cause a table-level lock while the rows are being materialized with their default values. For example, in PostgreSQL, "Adding a column with a non-null default or changing the type of an existing column will require the entire table to be rewritten. This may take a significant amount of time for a large table; and it will temporarily require double the disk space." Before the syntax is interpreted by the base-level interpreter, a meta-level interpreter (which knows that the user does not have the capability to add the column) rewrites the expression to one with an equivalent intention. Also, statement batching is not equivalent to coalescing statements, which could be another trick (in addition to using quenching to activate/deactivate data sources via mappings).

Another trick, commonly used by Object/Relational Mapping tools with support for "lazy loading", is to recognize when you only want a record count, and to automatically transform the query so that the object materializer is only invoked with a constructor and/or getter/setters that populate the Count field. (Strictly speaking, I dislike this mapping structure, because it is working around the static production rule limitations inherent to a simple object class system, bloating objects with option-type-like fields and all the problems that go along with that, including further call-by-need evaluations that make performance modeling non-compositional. A much better solution is to use abstract factories to build a new static production system; this can be done within the language as an embedding [1] or using a special sublanguage.)
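
As a toy illustration of that kind of syntax-directed rewrite (all types and names below are invented for the sketch), recognising a count-only consumer and rewriting the query before the base-level interpreter sees it might look like:

-- A miniature query AST and one rewrite: when only a row count is wanted,
-- don't ask the materializer for any columns.
data Query = Select [String] String      -- columns to materialize, table name
           | CountRows Query
  deriving Show

optimise :: Query -> Query
optimise (CountRows (Select _ tbl)) = CountRows (Select [] tbl)
optimise q                          = q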

A riskier technique that I've attempted in the past and dropped support for is converting a query that returns a single row on a candidate key into a query that returns a single row based on the primary key. This then allows you to elide the entire query subexpression for cases where you want search results on a single table, removing an entire table's worth of logical reads. A similar technique goes in the opposite direction, converting a primary key by looking for its candidate key value.

We can be polite and call it a "research opportunity". :)

I prefer easy to read, easy to write, maintainable code over a "research opportunity", any day of the week.

Don't ever look at the code

OTOH, sometimes you really do want to look at the syntax and decide what to do -- e.g., to figure out the right order of queries to a database. Or in your case, I imagine you need to look closely at the source code to figure out how to best split code between GPU and CPU. So this is one of those features which is (a) really useful, but (b) opens the door to screw up modularity unspeakably badly.

Dynamically scoped variables were another mechanism fitting into this categorization. Luckily, the choice isn't between dynamically scoped variables and no variables at all. Similarly, I think there are principled ways to get the above features without baking reflection into every function.

simple truths

There have been many posts about encodings, determinism, compilers, ASTs etc. where it is easy to get lost. Let me try to put the main points in a smaller setting. All of this is covered in the paper, too.

Schönfinkel originally had five primitive combinators, but then it was noticed that these can be *represented* using just three: S, K and I. Indeed, I can be *represented* by SKK. There is no need for any encoding here. For *any* SK-combinator M we have

IM = SKKM = KM(KM) = M

Further, SK-calculus is *combinatorially complete* (as defined in Kleene's book) in the sense that it supports lambda-abstraction. So, if one defines a meta-function

G(x) = M

then there is a combinator that *represents* G. None of this involves any encoding, or Turing machines etc. Now the name "combinatorially complete" suggests that any combinator you care to define can be *represented* using S and K, just like the lambda-abstractions. However, consider a pattern-matching function

G (O) = M if O is S or K
G (PQ) = NPQ if PQ is of the form SX or SXY or KX

This is a perfectly good symbolic computation (it respects the equality of combinators), but it cannot be *represented* by an SK-combinator. In this sense, SK-logic is incomplete.

Happily, we can introduce a combinator F that will *represent* all such functions G above. The corresponding reduction rules are Turing-computable, and so can be simulated by an *encoding* of the resulting SF-calculus into SK-calculus. However, this does not repair the weakness of SK-logic noted above. Encoding is weaker than representation, in the sense given above. The function G is meant to work with arbitrary combinators, not merely the encodings of some other calculus.
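
For concreteness, here is a meta-level sketch (Haskell, my encoding) of the root reduction rules with F included, following the rules quoted at the top of this thread; a full evaluator would also need a strategy for reducing subterms, which is omitted.

data Term = S | F | App Term Term
  deriving (Eq, Show)

-- Factorable forms: the operators and their partial applications.
factorable :: Term -> Bool
factorable S                 = True
factorable F                 = True
factorable (App S _)         = True
factorable (App (App S _) _) = True
factorable (App F _)         = True
factorable (App (App F _) _) = True
factorable _                 = False

-- One reduction step at the root, if any rule applies.
step :: Term -> Maybe Term
step (App (App (App S m) n) x) = Just (App (App m x) (App n x))   -- S M N X = M X (N X)
step (App (App (App F o) m) n)
  | o == S || o == F           = Just m                           -- F O M N = M
  | App p q <- o, factorable o = Just (App (App n p) q)           -- F (P Q) M N = N P Q
step _                         = Nothing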

So the main take-home message from all of this is that SK-calculus is not as complete as we thought, and this incompleteness extends to lambda-calculus. This is why we developed pattern calculus. The point is not merely theoretical. SF-calculus is able to support generic versions of the queries addressed to databases, such as generic, structural equality of normal forms. This may seem a disadvantage from a lambda-calculus view, where extensional equality is preferred, but for data structures this is the way to go.

No surprise

Let me start by saying that I've been interested in your pattern calculus work for a while and your SF-calculus is interesting and may be very useful for studying a number of phenomena. There is no dispute on the value and interest of the work.

However:

So the main take-home message from all of this is that SK-calculus is not as complete as we thought, and this incompleteness extends to lambda-calculus.

Even after looking at the SF-calculus, I still think that the SK-calculus is complete in exactly the same way I thought it was before.

Essentially, SF-calculus is an SK-calculus that can inspect its own structure, and it comes as no surprise whatsoever that SK and LC, as they are normally used and defined, don't have this ability.

SF-calculus may allow the more direct and convenient expression of properties that are of interest to the people who use it, thanks to this ability. (i.e. it has a kind of enhanced "expressiveness")

But if we take literally your notion that it is more computationally powerful, then it is equivalent to saying that a PL with macros has more computational power than one that doesn't.

Some people might like the sound of that, but many others around here, including me, would put up a fight. ;-)

This might all be a failure of communication. What might surprise one person might not surprise another based on a different expectation and understanding of how things were before the "revelation".

Since to me there is no surprise, your expression of surprise makes it sound like you are making a bigger claim than the one you may in fact be making.

This doesn't diminish the interest or utility of pursuing the study of SF, it just derails the conversation. ;-)

Someone should come up with a better term

SF-calculus may allow the more direct and convenient expression of properties that are of interest to the people who use it, thanks to this ability. (i.e. it has a kind of enhanced "expressiveness")

I also use the term expressiveness in this manner, i.e. not as a synonym for computational equivalence but for the sententious expressive power of statements in a language.

Someone should come up with a disambiguating term at some point.

yes and no

Hi Marc,

I'm glad that you find SF-calculus interesting but there are still some crossed wires.

Essentially, SF-calculus is an SK-calculus that can inspect its own structure, and it comes as no surprise whatsoever that SK and LC, as they are normally used and defined, don't have this ability.

Great, agreed.

But if we take your notion literally that is is more computationally powerful, then it is equivalent to saying that a PL with macros has more computational power than one that doesn't.

This is not it at all. A macro is a new operator O that is a name for some combinator M that already exists in the host calculus. That is, if the meaning of O is given by some symbolic computation then M represents this computation. Now the operator F cannot be represented by an SK-combinator; it is not a macro.

Once again, we are not denying that SF-calculus can be encoded in a Turing machine, or SK-calculus, but this is a weaker statement. An encoding of SF into SK-calculus gives the power to factorise to the compiler (no surprise there) but SF-calculus gives this power to programmers. This is a big difference.

Let me see...

A macro is a new operator O that is a name for some combinator M that already exists in the host calculus. That is, if the meaning of O is given by some symbolic computation then M represents this computation.

What about macros expanded in an environment where the terms in that macro are shadowed? This is what hygiene is partly for. So if the meaning function of a gensym macro is context-dependent in the general case (which it is), how can an SK-combinator even represent that directly?

Macro macros, not micro macros

A macro is a new operator O that is a name for some combinator M that already exists in the host calculus.

Rather than simple substitution macros, I was thinking more along the lines of Lisp macros, i.e. something where you could rewrite conditionally on the syntactic structure of enclosed code. I think in this case the parallel still holds.

An encoding of SF into SK-calculus gives the power to factorise to the compiler (no surprise there) but SF-calculus gives this power to programmers. This is a big difference.

Which puts us firmly in the territory of "expressiveness" as I understand it.

Thanks for the clarification. I look forward to reading more.

Leibniz equality

Rather than simple substitution macros, I was thinking more along the lines of Lisp macros, i.e. something where you could rewrite conditionally on the syntactic structure of enclosed code. I think in this case the parallel still holds.

I don't know enough about Lisp macros to comment in detail (or about shadowing, hygiene, etc.) but SF-calculus has a good static semantics that is not conditional in any way. Reduction is confluent, and one may substitute equals for equals at any time (Leibniz equality). Nor is there any gensym, though it is tempting to introduce it.

Practical Application

It dawned on me that I have thought about this stuff before.

A nice practical application of an SF-like calculus is using it for algebraic languages.

I.e. normally, you'd encode list types and concatenation operators. From that, you'd know that some invariants hold. For example, where ++ is list concatenation:

(x ++ y) ++ z = x ++ (y ++ z).

Now the above invariant has some merits since the right-hand side is, under the normal trivial definitions for the list operators, faster than the left-hand side.

I guess in SF this law is expressible as a function?

[Looked at my hands. Swapped right for left. ;) ]

expressing laws in SF?

SF-calculus is a computing engine. It may well be used to develop a more powerful theorem-prover one day, as ML was designed to do, but currently it doesn't actually express laws.

I know that..

But some laws, like that one on concatenation, can be seen as rewrites. I.e., (x++y)++z should be rewritten to x++(y++z), for performance. Now normally a programmer will make these optimizations by hand, but in the SF calculus a law like that is, I think, expressible directly as a rewrite on terms - since you can peel those apart. I.e., for a law like that to be representable directly in a calculus you'll need some introspection, since you'd need to recognize specific function applications - which is possible in SF but not in SK - if I understood the above posts.

[I shouldn't have called them laws, I agree. Meta-rules on computations? ]

computing versus proving

But some laws, like that one on concatenation, can be seen as rewrites. I.e., (x++y)++z should be rewritten to x++(y++z), for performance. Now normally a programmer will make these optimizations by hand, but in the SF calculus a law like that is, I think, expressible directly as a rewrite on terms - since you can peel those apart. I.e., for a law like that to be representable directly in a calculus you'll need some introspection, since you'd need to recognize specific function applications - which is possible in SF but not in SK - if I understood the above posts.

For normal programming ([1,2,3] ++ [4,5]) ++ [6] reduces to [1,2,3,4,5,6]. That is, ++ is a function, or operator, that does not appear at the head of a closed normal form (when fully applied). To express the law requires a data type in which the expressions are first-class, so that ++ is a constructor and the indeterminates x,y and z are first-class expressions. Then the laws can be expressed as pattern-matching functions.
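
A sketch of what that looks like (Haskell, my own names): ++ becomes a constructor of a first-class expression type, and the reassociation "law" becomes an ordinary pattern-matching rewrite. The indeterminates mentioned above would be one more constructor; they are omitted to keep the sketch small.

-- List expressions as data: concatenation is a constructor, not a function.
data ListExpr a = Lit [a]
                | ListExpr a :++ ListExpr a
infixr 5 :++

-- The rewrite (x ++ y) ++ z  ~>  x ++ (y ++ z), applied throughout a term.
reassoc :: ListExpr a -> ListExpr a
reassoc ((x :++ y) :++ z) = reassoc (x :++ (y :++ z))
reassoc (x :++ y)         = reassoc x :++ reassoc y
reassoc e                 = e

-- Evaluating an expression back to an ordinary list.
run :: ListExpr a -> [a]
run (Lit xs)  = xs
run (x :++ y) = run x ++ run y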

With these caveats, you are right. To put it another way, much of the program transformation that goes on during compilation can be seen as the application of such laws, and these can indeed be viewed as pattern-matching functions that act on program syntax. Freeman Huang led some preliminary work on this, using bondi to check Java byte code.

http://portal.acm.org/ft_gateway.cfm?id=1188992&type=htm&coll=GUIDE&dl=GUIDE&CFID=15151515&CFTOKEN=6184618

However, some care is required to add indeterminates to the calculus. This will be a key step in building a theorem-prover upon this approach.

Link broken, here's a

Link broken, here's a working one.