Strict data constructors + non-strict application?

Suppose you are defining a language based on the lambda calculus, and for some reason you must have strict data constructors. Are there any compelling reasons to have non-strict application?


Non-strict application

Non-strict application includes laziness, data-parallelism, full-beta reduction. These strategies are very useful for performance. Laziness avoids unnecessary evaluation. Data-parallelism allows you to utilize more compute resources (multi-core, GPGPU). Full-beta reduction is essentially partial evaluation - the mother of all optimizations.
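The laziness point is directly observable in Haskell: an argument that is never demanded is never evaluated. A minimal sketch (`lazyDemo` is my name, not from the thread):

```haskell
-- Laziness avoids unnecessary evaluation: the second component of the
-- pair is never demanded, so the error is never raised.
lazyDemo :: Int
lazyDemo = fst (1, error "never evaluated")

main :: IO ()
main = print lazyDemo   -- prints 1; with strict constructors, building
                        -- the pair would already raise the error
```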

But you can make do without laziness semantics. Instead of infinite lazy lists, you could more explicitly support coinductive or existential streams.

If you have side-effects as part of your evaluation, then non-strict is much more challenging for developers to work with. Is evaluation of lambda applications 'pure' in your language?

See, this is why I post here

See, this is why I post here sometimes. This is a very useful reply.

Some follow-up questions and comments:

Data-parallelism. Are you thinking specifically of lenient evaluation? (See Lenient evaluation is neither strict nor lazy.) In lenient evaluation, the compiler chooses which evaluation strategy to use case-by-case, and can even parallelize it. I get the impression that programmers would mostly have to program as if using a strict language. (I need to finish reading that paper.)

Full beta-reduction. I think of this not as being part of evaluation, but as part of the equational reasoning an optimizer does. Suppose I define a strict, purely functional language with an operational semantics. Can't I use the symmetric, transitive closure of the reduction relation to justify full beta-reduction?

To answer your question explicitly: yes, it's pure.

What do you mean by

What do you mean by transitive closure of the reduction relation? Isn't the resulting structure infinite?

It's infinite, but not a problem

The fact that you know that the result is infinite means you probably understand exactly what I mean. An infinite relation is no problem, as long as you don't plan to compute it.

(Apologies if you know the following already. But I've never read anything that summarizes very well how operational semantics are actually used. I had to figure it out myself, and I'd like to save you and others the trouble.)

Small-step semantics. Say that we define a relation "-->" (read "reduces to") on terms that specifies the meaning of all the terms in a language. Suppose it's a small-step reduction relation for an arithmetic language, so, for example, "(4 * 4) + (4 * 10) --> 16 + (4 * 10)" is true. Then we could implement the language by building an interpreter: a function "interp" such that "interp(A)" returns "B" if and only if "A --> B".

Transitive closure. But this would only compute one step and be horribly slow. We'd have to apply "interp" to its own output repeatedly to get any answers. We really want to write a recursive interpreter that returns the final, unreducible value of an expression. For a standard, we need a relation that says "(4 * 4) + (4 * 10)" ultimately reduces to "56". That's the transitive closure: call it "-->*" (or "transitively reduces to"), so "(4 * 4) + (4 * 10) -->* 56". Here's a possible standard for a recursive interpreter "interp*": if "interp*(A)" returns "B", then "A -->* B".

But that's not enough specification for "interp*". It could legally return "16 + 40" when applied to "(4 * 4) + (4 * 10)", because "(4 * 4) + (4 * 10) -->* 16 + 40" (only two reduction steps). If we want it to output only values, we have to say that if "interp*(A)" returns "B", then "A -->* B" and there is no term "C" such that "B -->* C". If that's true, our interpreter is sound.

But it won't necessarily interpret everything it could. If we want the interpreter to be complete, we need to change the "if" back into an "if and only if". We can do that and be sure that "interp*" is computable because the new condition rules out nonterminating transitive reductions.
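The two standards above can be sketched in Haskell. This is my own toy version for the arithmetic language in the example (all names invented): `step` plays the role of the one-step "interp", and `interpStar` iterates it to a normal form, implementing the "-->*" standard:

```haskell
-- Toy arithmetic terms for the "-->" examples above.
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr deriving (Eq, Show)

-- One small step, leftmost-innermost; Nothing means no redex remains,
-- i.e. the term is a value.
step :: Expr -> Maybe Expr
step (Lit _)               = Nothing
step (Add (Lit a) (Lit b)) = Just (Lit (a + b))
step (Mul (Lit a) (Lit b)) = Just (Lit (a * b))
step (Add a b)             = case step a of
                               Just a' -> Just (Add a' b)
                               Nothing -> fmap (Add a) (step b)
step (Mul a b)             = case step a of
                               Just a' -> Just (Mul a' b)
                               Nothing -> fmap (Mul a) (step b)

-- "interp*": iterate step until no reduction applies, so the result is
-- a normal form (sound, and complete for this terminating language).
interpStar :: Expr -> Expr
interpStar e = maybe e interpStar (step e)
```

For example, `interpStar (Add (Mul (Lit 4) (Lit 4)) (Mul (Lit 4) (Lit 10)))` yields `Lit 56`, matching "(4 * 4) + (4 * 10) -->* 56".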

Symmetric, transitive closure. Now suppose we want a baseline correctness standard for an optimizer: we'd like to be able to prove that when it replaces "A" with "B", it doesn't change the value of any program that contains "A". Using "-->*", we would have to reduce and expand expressions to prove this, which would quickly get tiresome. We might as well create a relation that embodies the idea of both reductions and expansions: the symmetric, transitive closure of "-->".

We might as well make it reflexive as well, and then it'll be an equivalence relation. Call the new relation "==". If our language implements arithmetic faithfully, we could easily prove that "(4 * 4) + (4 * 10) == 4 * (4 + 10)". More generally, we could prove that "(A * B) + (A * C) == A * (B + C)" for all numeric "A", "B" and "C", allowing our optimizer to replace "(A * B) + (A * C)" with "A * (B + C)", without changing the meaning of programs. Even more generally, using "==", we can reason equationally about all language terms, and justify all kinds of interesting transformations.
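That optimizer step can be sketched concretely. Here is a hedged toy version of my own (the AST `E` and the names `factor`, `eval` are invented): `factor` performs the "(A * B) + (A * C) == A * (B + C)" rewrite, and `eval` lets us check that it preserves meaning:

```haskell
data E = N Int | E :+: E | E :*: E deriving (Eq, Show)

-- The rewrite justified by "==": (a*b) + (a*c)  ~>  a * (b+c),
-- applied only when the two left factors are syntactically equal.
factor :: E -> E
factor ((a :*: b) :+: (a' :*: c)) | a == a' = a :*: (b :+: c)
factor e = e

-- Reference meaning, used to check the rewrite is value-preserving.
eval :: E -> Int
eval (N n)     = n
eval (a :+: b) = eval a + eval b
eval (a :*: b) = eval a * eval b
```

`eval ((N 4 :*: N 4) :+: (N 4 :*: N 10))` and `eval (factor ((N 4 :*: N 4) :+: (N 4 :*: N 10)))` both give 56.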

That's the basic idea, though there's more to it than this. (Usually we'd add a notion of alpha equivalence: "lambda x . x == lambda y . y" should be true, for example. Also, in stateful languages, proving equivalence requires proving it "in any context", such as under all the possible contents of memory.)

--

So, here's my original question, restated. Suppose that, for a strict, pure lambda calculus, I define an equivalence relation on program terms as the symmetric, reflexive, transitive closure of a typical reduction relation. Can I use that to justify carrying out full beta-reduction?

How are you going to prove

How are you going to prove (\x.x + 2) == (\x.x + (1+1))? They may be equal as functions (in the sense of always producing the same output for a given input), but they're not equal as terms unless you choose reductions to make them equal.

Need a context to quantify over

To optimize, I'd have to quantify over all possible arguments. So I suppose an optimizer still has to consider context in a pure, strict language, because otherwise it couldn't prove that any optimization done inside a lambda preserves meaning. Is that what you're getting at?

Fortunately, for lambdas it's only immediate application contexts. (There is generally no other way to observe a lambda.) There are two cases:

1. ((\x.x + 2) n) and ((\x.x + (1+1)) n), where n is a number.

2. ((\x.x + 2) y) and ((\x.x + (1+1)) y), where y is not a number.

Case 1: ((\x.x + 2) n) == n + 2 == n + (1+1) == ((\x.x + (1+1)) n).

Case 2: Both terms reduce to the same error, if errors are defined, or to y+2. (I assume +, like all application in this putative language, is strict in its arguments.) Qed.

--

Okay, I see what's going on. Here are some equations that prove (\x.x+(1+1)) == (\x.x+2) in a call-by-name language:

(\x.x+(1+1)) == (\y.(\x.x+y)) (1+1) == (\y.(\x.x+y)) 2 == (\x.x+2)

Fascinating. So you can use call-by-name to pull expressions out of functions, alter them, and put them back in. No need for a context or any notion of observational equivalence to prove function equality.

Thanks for clearing that up.

Beta-reduction

If your strict functional language has non-terminating computations it's not sound in general to do full beta reduction. Consider the expression (\x.y) undefined and its "optimized" form y -- clearly these two expressions are different in a strict language.
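The difference can be simulated in Haskell (itself lazy) by forcing the argument with `($!)`; this sketch is mine, not from the comment:

```haskell
import Control.Exception (SomeException, evaluate, try)

-- (\x -> y) applied to undefined: full beta reduction would say "y",
-- but strict application must evaluate the argument first, and fails.
main :: IO ()
main = do
  r <- try (evaluate ((\_ -> "y") $! (undefined :: ())))
         :: IO (Either SomeException String)
  putStrLn (case r of
    Left _  -> "strict application: undefined propagates"
    Right v -> "got " ++ v)
```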

I'm also confused about the data parallelism comments; the only observable effect in your hypothetical language is non-termination, and that is a commutative effect. If you evaluate function arguments from left to right, right to left, one step at a time on each argument, or in a completely random order shouldn't matter.

Wouldn't its optimized form

Wouldn't its optimized form be "undefined" if the optimizer properly preserved term equivalence, or any reasonable kind of observational equivalence?

I think you're probably right about nontermination, btw. But dmbarbour brought it up before I said the language had no side effects (apart from nontermination).

If your optimizer can detect

If your optimizer can detect non-terminating computations that appear in a strict position it is safe to replace the "entire thing" with just non-termination. I would expect the common case of non-terminating computations that appear in real programs to be better disguised than my example though, so your optimizer needs to be a bit careful with beta reduction. (I am not arguing that you should design a lazy language; laziness has a cost as well, it just appears elsewhere).

I would also like to encourage you to try to design a stratified semantics for your language, just as dmbarbour suggested. Even if you get nothing else out of it, it will at the very least make the job of your optimizer much simpler, since there's quite a lot of wiggle room for pure expressions.

Effectful computations

Strict, lazy, parallel, beta-reduction - the different evaluation forms are not semantic until you introduce effects (such as non-termination, errors, IO). Lenient evaluation, as defined in that paper, for example, can only be distinguished from lazy evaluation by error-raising effects (e.g. a divide-by-zero signal).

In a truly effect-free language, evaluation order is just a performance strategy. It still makes sense to speak of strict, lazy, parallel, beta-reduction, et cetera - but only as different strategies that lead ultimately to the same conclusion. Some strategies will lead to the desired conclusion more quickly or more efficiently than others.

I happen to enjoy that property, so most of my language ideas involve a sublanguage of functions or logics or grammars where evaluation can be guaranteed to progress and terminate. I (very purposefully) make non-termination more difficult to express by requiring that it appear only at the toplevel, where it is less likely to happen by accident. But even within a terminating subprogram, it can be convenient to model an infinite system, which can be achieved using existential or coinductive types.

What effects are you assuming within your 'pure' language? (non-termination? error? FFI integration?) Why must you have strict data constructors? Would terminating semantics below the toplevel better suit your purpose?

In a truly effect-free language, evaluation order is just a performance strategy.

That is a really nice way to look at things. (As long as I remember that nontermination is an effect.) Thanks.

What effects are you assuming within your 'pure' language?

Nontermination, at least. It'll probably have an "undefined" value for errors. No more.

Why must you have strict data constructors?

This is the language I was talking about in my other question here. It's a lambda calculus for expressing set operations, where the sets are pure sets as defined by the Zermelo-Fraenkel axioms plus Choice.

In ZFC, every piece of data is encoded as a set. Numbers are encoded as 0 = {}, 1 = {0}, 2 = {0,1}, and so on. Pairs are encoded as (a,b) = {{a},{a,b}}. A lambda calculus made for manipulating these things has to respect the axioms. I'm fairly sure that the axiom of foundation mandates strict set constructors. At least, I'm sure that having strict set constructors will allow me to make a semantics that respects the foundation axiom.
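Those encodings are easy to write down concretely. A minimal sketch of hereditarily finite sets and the encodings just described, with invented names (`PureSet`, `suc`, `pair`):

```haskell
-- Finite pure sets: a set is just a list of member sets.
newtype PureSet = PureSet [PureSet]

-- Extensional equality: same members, ignoring order and duplicates.
instance Eq PureSet where
  PureSet xs == PureSet ys = all (`elem` ys) xs && all (`elem` xs) ys

-- von Neumann numerals: 0 = {}, n+1 = n U {n}, so n = {0, ..., n-1}.
zero :: PureSet
zero = PureSet []

suc :: PureSet -> PureSet
suc n@(PureSet ms) = PureSet (n : ms)

-- Kuratowski pairs: (a,b) = {{a},{a,b}}.
pair :: PureSet -> PureSet -> PureSet
pair a b = PureSet [PureSet [a], PureSet [a, b]]
```

With these definitions `suc (suc zero) == PureSet [zero, suc zero]` holds, i.e. 2 = {0,1}, and the datatype (used strictly) can only build well-founded trees.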

The foundation axiom says, roughly, if you view sets as roots of trees, with branches representing membership, that every downward path from any node terminates in the empty set. For example, there is no set A such that A = {A}, because that'd be an infinitely deep tree with branching factor 1. But you can define similar things in an untyped, lazy language, such as lists that contain only themselves. More practically, you can encode infinite lists like [x1,x2,x3,...] as (x1, (x2, (x3, ...))), but ZFC proves that set doesn't exist (with the encoding above, or any other encoding into finite sets).
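For instance, in Haskell a lazy constructor can close a cycle, producing exactly the kind of value the foundation axiom forbids (a sketch with invented names):

```haskell
-- A lazy constructor lets a value contain itself. Viewed as a
-- membership tree this is infinitely deep, so it denotes no ZFC set.
data Stream = Cons Int Stream

ones :: Stream
ones = Cons 1 ones   -- a finite definition of an infinitely deep value

headS :: Stream -> Int
headS (Cons x _) = x

tailS :: Stream -> Stream
tailS (Cons _ xs) = xs
```

`headS (tailS (tailS ones))` terminates with 1 even though `ones` has no finite tree underneath it; with strict constructors this definition would simply diverge.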

I want this property: if a program terminates, its value is either a lambda or a well-founded set. Lazy set constructors would allow programs to terminate with provably non-sets.

Would terminating semantics below the toplevel better suit your purpose?

At this point, it suits my purpose to not divide the language at all. That'll only change if there's a compelling theoretical reason.

Under which conditions would

Under which conditions would you like the ability to express a non-terminating program? That is, would it be unreasonable to be able to prove well-founded structure from syntax or a little static analysis?

it suits my purpose to not divide the language at all. That'll only change if there's a compelling theoretical reason.

A two-layer language makes a lot of inherent sense because computation has two major aspects: calculation and communication. A pure functional language handles the calculations, but communication wants its own model to handle commitment, concurrency, indeterminism, time, change, and identity in a consistent and composable way across modules.

If you're only tackling the calculations aspect, then I'll grant you don't need a divided language. But I'm still left wondering where you benefit from non-terminating semantics.

Full-beta reduction is

Full-beta reduction is essentially partial evaluation - the mother of all optimizations.

I am confused by this comment: there are plenty of partial evaluators around for strict (and even impure) languages, so why would you need full beta-reduction in your language for that?

Normal lazy languages ...

... like Haskell are nonstrict in their constructors and strict in their deconstructors (like pattern matching). I don't even know what it would mean to be nonstrict in deconstruction, so a language whose constructors are strict is strict tout court.

Mostly true

In Haskell, pattern-matching can be forced to be non-strict using "~":
    Prelude> let nonstrictfun (x,y) = 0
    Prelude> nonstrictfun undefined
    *** Exception: Prelude.undefined
    Prelude> let nonstrictfun ~(x,y) = 0
    Prelude> nonstrictfun undefined
    0
Thanks to auto-currying, functions can be strict in some arguments depending on the values of earlier arguments:
    Prelude> let weirdfun 0 ~(x,y) = 0; weirdfun 1 (x,y) = x
    Prelude> weirdfun 0 undefined
    0
    Prelude> weirdfun 1 undefined
    *** Exception: Prelude.undefined
Also, "~" works at any level, so we can make functions that are even weirder in their non-strictness:
    Prelude> let weirderfun (0,~(x,y)) = 0; weirderfun (1,(x,y)) = x
    Prelude> weirderfun (0,undefined)
    0
    Prelude> weirderfun (1,undefined)
    *** Exception: Prelude.undefined
Pattern-matching in "let" is non-strict by default:
    Prelude> let (x,y) = undefined in 0
    0
But only in the outermost pattern:
    Prelude> let (z,(x,y)) = (0,undefined) in z
    *** Exception: Prelude.undefined

Still, I regard "a language whose constructors are strict is strict tout court" as a mostly true statement. :) But that's probably because I have a hard time thinking of reasons to have non-strict application with strict constructors.