β, η, ξ ⊢ α?

I was just struck with a minor revelation. The informal explanation for α-equivalence in λ-calculus is that a function "does the same thing" regardless of the particular names that we might choose for its internal variables. But this seems like just a particular case of the general extensionality principle!

Technically, α-conversion means that a bound variable may be renamed to another one as long as no free variables are captured:

    y ∉ FV(e)
―――――――――――――――――― (α)
λx. e = λy. e[y/x]
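
(For instance, the rule licenses λx. x z = λy. y z, but not λx. x z = λz. z z, since z ∈ FV(x z).)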

And indeed, it turns out that in the presence of the η-rule, which expresses the extensionality principle, the above rule is derivable:

y ∉ FV(e)
―――――――――――――       ―――――――――――――――――― (β)
y ∉ FV(λx. e)       (λx. e) y = e[y/x]
――――――――――――― (η)   ―――――――――――――――――― (ξ)
λx. e  =  λy. (λx. e) y  =  λy. e[y/x]

This is of course quite trivial. However, I've read a fair number of tutorials on the λ-calculus, and I don't recall ever having seen this particular tidbit mentioned anywhere. I'm unsure whether this is because I just haven't read the right papers, because the matter is so trivial that no one bothers mentioning it explicitly, or because there's something wrong with my reasoning.

So, just to ease my mind, could someone please either direct me to some respectable source where the redundancy of α in the presence of η is confirmed, or else point out what I'm missing here?

Thanks.

Never thought about it...

I haven't thought about this much. I may also be missing something, but if not, there is still one reason why people may prefer to state α as a distinct rewrite. The roundabout route through η and β requires us to identify and reduce a redex beneath a λ, something many reduction strategies will not do. Stating α as a separate rewrite lets us rename independently of the overall reduction strategy, without complicating things.

But as a proof that α is valid, it seems reasonable enough.
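
To make the strategy point concrete, here is a minimal call-by-name sketch in Haskell (the Term type and all names are made up purely for illustration): it evaluates to weak head normal form and never looks under a λ, so the β-redex that the η-expansion places under the new binder would never be touched by a strategy of this shape.

    -- A hypothetical Term type for illustration only.
    data Term = Var String | Lam String Term | App Term Term

    -- subst x a b replaces x with a in b; this version is naive about
    -- capture, which is exactly the issue discussed elsewhere in the thread.
    subst :: String -> Term -> Term -> Term
    subst x a (Var y)   = if x == y then a else Var y
    subst x a (App f b) = App (subst x a f) (subst x a b)
    subst x a (Lam y b) = if x == y then Lam y b else Lam y (subst x a b)

    -- whnf stops as soon as the head is a lambda, so it never reduces a
    -- redex sitting beneath a binder.
    whnf :: Term -> Term
    whnf (App f a) = case whnf f of
                       Lam x b -> whnf (subst x a b)
                       f'      -> App f' a
    whnf t = t

Under such a strategy, λy. (λx. e) y is already in weak head normal form, so the (λx. e) y redex inside it is never exposed.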

Trivial?

This is of course quite trivial.

I'm sorry to say, but as it now stands it most certainly is not. I am unable to parse the derivation you have provided: e.g., are the instances of rules η and ξ really both immediate premises of the conclusion? Why doesn't the premise of the instance of rule η have a label associated with the inference of which it is the conclusion? Is the conclusion really meant to contain two equals signs, rather than to look like the conclusion of your α rule?

Your discussion is interesting to me, and maybe if I was less tired I would simply figure out an inference that would serve in the place of the one you provide. But as a formal inference, what you have provided looks like nothing more than suggestive nonsense. You should really fix it.

Cutting corners

Pardon, I was just using a notational shorthand. When proving something by transitivity, especially equations, it's common to write:

a = b = c

in place of

a = b   b = c
――――――――――――― (=-Trans)
    a = c

Since my derivation already seemed far too complicated and formal for such a simple matter (it would probably have sufficed just to say "by β in the body of the η-expansion of λx. e"), I decided to use this shorthand _within_ the derivation in order to avoid adding another level to it and duplicating λy. (λx. e). Sorry if this was confusing.

Similarly, the fragment of the theory of finite sets that's required for reasoning about sets of free variables is commonly left implicit. Its formalization is easy but relatively uninteresting. I thought I was already being far too pedantic by explicitly deriving the premiss of the η from the premiss of the α, but if you want a label for the rule by which it is derived, here's a direct formalization of the "does not occur free in" relation:

    x ≠ y               x ∉ FV(e₁)    x ∉ FV(e₂)
  ――――――――― (FV-Var)    ―――――――――――――――――――――――― (FV-App)
  x ∉ FV(y)                   x ∉ FV(e₁ e₂)

  x ∉ FV(e)
――――――――――――― (FV-Abs)       ――――――――――――― (FV-Bound)
x ∉ FV(λy. e)                x ∉ FV(λx. e)

So the missing label would here be FV-Abs. I hope this clears up any sloppiness in my original post.
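
For completeness, here is the same derivation with the FV-Abs label filled in and the transitivity shorthand expanded into an explicit =-Trans step:

    y ∉ FV(e)
――――――――――――――――― (FV-Abs)          ―――――――――――――――――――――――― (β)
  y ∉ FV(λx. e)                       (λx. e) y  =  e[y/x]
――――――――――――――――――――――― (η)         ―――――――――――――――――――――――――――― (ξ)
λx. e = λy. (λx. e) y                λy. (λx. e) y = λy. e[y/x]
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― (=-Trans)
                      λx. e = λy. e[y/x]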

The higher-order abbreviation of transitivity

OK, with rest and an explanation of the final rule, I understand the inference now, and am less grumpy. I find it plausible that your beta rule doesn't need capture-avoiding substitution, although I haven't spent the time to convince myself of that yet.

You say: When proving something by transitivity, especially equations, it's common to write:

a = b = c

I don't think I have ever seen such a shorthand, and it has a problematic relation to the inferences above it when instantiated. I'm curious to know where this notation comes from.

If we want to achieve conciseness by combining inference rules in this way, I think it is better to use parametric inferences, e.g. a rule Trans(-,-) whose parameters name the inference rules directly justifying the premises, so that your final rule would be, with the middle term omitted, Trans(η, ξ). Such parametric rules are seen in a few places, and became first-class citizens in Schroeder-Heister's Natural deduction calculi with rules of higher levels.

Beta reduction requires alpha equivalence

Beta reduction might require the renaming of bound variables in order to avoid capture. Thus beta reduction is sensible only over alpha equivalence classes. For example, in the derivation you provide, suppose we have e = (λy. x) so that the beta reduction you describe is

(λx. e) y = (λx. (λy. x)) y -> (λz. y)

This reduction must make use of alpha equivalence.

Given this, it does not seem sensible to say that alpha equivalence is redundant if you are using beta reduction (regardless of eta).

More specifically, it's the

More specifically, it's the "capture avoiding substitution" in the consequent of the beta rule.
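
To spell that out, here is a sketch of capture-avoiding substitution over a hypothetical Term type (all names are made up for illustration); the last branch of the Lam case is the implicit renaming, i.e. the α hiding inside β:

    -- A hypothetical Term type for illustration only.
    data Term = Var String | Lam String Term | App Term Term

    freeVars :: Term -> [String]
    freeVars (Var x)   = [x]
    freeVars (App f a) = freeVars f ++ freeVars a
    freeVars (Lam x b) = filter (/= x) (freeVars b)

    -- subst x s t replaces the free occurrences of x in t by s.
    subst :: String -> Term -> Term -> Term
    subst x s (Var y)   = if x == y then s else Var y
    subst x s (App f a) = App (subst x s f) (subst x s a)
    subst x s (Lam y b)
      | y == x                 = Lam y b
      | y `notElem` freeVars s = Lam y (subst x s b)
      | otherwise              =                        -- the hidden α step
          let y' = fresh (x : freeVars s ++ freeVars b)
          in  Lam y' (subst x s (subst y (Var y') b))

    -- fresh just picks some name not occurring in the given list.
    fresh :: [String] -> String
    fresh used = head [v | n <- [0..], let v = "z" ++ show n, v `notElem` used]

In the example above, it is this last branch that renames the inner y before the outer y is substituted in.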

Excellent point

Excellent point. This is the thing I missed.

Not quite

It is true that capture-avoiding substitution involves renaming and thus its validity is at least conceptually justified by a pre-existing notion of α-equivalence.

However, it is not necessary to define a capture-avoiding substitution operation, at least if we're interested in an equational theory and not in a directed, deterministic reduction relation. Instead of saying "substitution renames capturing variables" we can say "before substitution, capturing variables must first be renamed". That is, we let the substitution relation be undefined when capturing would occur. Then many β-equivalences would actually be joint α-β-equivalences. I.e. we can't say

(λx y. x) y  =_β  λz. y

but instead:

(λx y. x) y  =_α  (λx z. x) y  =_β  λz. y

This approach is perhaps a bit less common, but not unheard of.
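
As a sketch (again over a hypothetical Term type with made-up names), such a restricted substitution can be written as a partial function that never renames anything and simply refuses to produce a result when capture would occur:

    -- A hypothetical Term type for illustration only.
    data Term = Var String | Lam String Term | App Term Term

    free :: Term -> [String]
    free (Var x)   = [x]
    free (App f a) = free f ++ free a
    free (Lam x b) = filter (/= x) (free b)

    -- substMaybe x s t is Nothing exactly when the substitution would
    -- capture a free variable of s; no renaming ever happens.
    substMaybe :: String -> Term -> Term -> Maybe Term
    substMaybe x s (Var y)   = Just (if x == y then s else Var y)
    substMaybe x s (App f a) = App <$> substMaybe x s f <*> substMaybe x s a
    substMaybe x s (Lam y b)
      | y == x                             = Just (Lam y b)
      | y `elem` free s && x `elem` free b = Nothing        -- would capture
      | otherwise                          = Lam y <$> substMaybe x s b

With this, (λx y. x) y has no β-reduct at all until the inner binder has been renamed, exactly as in the α-then-β chain above.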

But even in the presence of such primitive β (and η and ξ), α can still be derived! For the example above:

λy. x  =_η  λz. (λy. x) z  =_βξ  λz. x

and, by ξ and left-congruence,

(λx y. x) y = (λx z. x) y

Note that no renamings happen during the β.

When substitution is of this limited variety, the α-rule that was derived above,

λx. e = λy. e[y/x]

only holds when y won't get captured in e. However, we can use the very same α-rule to rename any capturing variables within e first. So primitive β, η, and ξ are sufficient to derive primitive α, and hence the capture-avoiding varieties of α and β are also admissible (though not derivable, for lack of an induction rule within the formal system).
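
For instance, with e = λy. x, the direct renaming of the outer x to y would be captured by the inner binder, so we first rename the inner binder with a capture-free instance of the same rule (under ξ):

λx. λy. x  =_α,ξ  λx. λz. x  =_α  λy. λz. y

After the inner step, y is no longer captured in λz. x, so the outer renaming is itself an instance of the restricted α-rule.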

(A pedantic note: when substitution is a partial function, as above, it is not strictly correct to say

λx. e = λy. e[y/x]

since the right-hand side might not be well defined. Instead, we should define a relation subst(x,e₁,e₂,e₃) standing for "when x is replaced with e₁ in e₂, and it doesn't get captured, the result is e₃", and the above would be properly written:

y ∉ FV(e₁)    subst(x, y, e₁, e₂)
――――――――――――――――――――――――――――――――― (α)
         λx. e₁ = λy. e₂

Apologies to those who find the shorthand confusing.)