Lambda the Ultimate seems an Overstatement (not)

As far as I know, no Turing machine and no Chomsky Type 0 or Type 1 language uses lambda expressions. In fact, my analysis indicates that Type 0 and Type 1 languages cannot use lambda expressions, because lambda expressions presume a syntax by design. Doesn't a context-sensitive language determine syntax based on run time state and the content of the expression being processed? If so, that precludes specifying a syntax at design time.

Consequently, lambda expressions are a convenience or optimization of techniques used by Turing machines, reading (i.e., getting) arguments from a program stored on its tape. That's why LtU seems an overstatement to me.


Grammar-based Computation

Well, there are plenty of richer, more expressive models of computation (compared to lambdas) based upon grammars. Two-level or Van Wijngaarden grammars offer straightforward representations of logic programming, for example. I've been studying the concepts recently, so I'll point you to another post for links and discussion.

Lambda expressions are restrictive compared to grammars, which is often a good thing - it's easier to grok lambdas than to grok a Van Wijngaarden grammar, for example. But to say that "no Turing machine and no Chomsky Type 0 or Type 1 language uses lambda expressions" seems wrong. If you're examining languages based upon their power, certainly there are plenty of Type 0 and a few Type 1 languages that 'use lambda expressions' - local substitution-based computation. If you restrict your consideration to grammars with syntax, then you might look at attribute grammars, which effectively bring lambda patterns into the grammars.

Doesn't a context-sensitive language determine syntax based on run time state and the content of the expression being processed? If so, that precludes specifying a syntax at design time.

I disagree. Those are not mutually exclusive prospects. As example: I could use a context-sensitive grammar to specify namespaces grammatically - which names are available would be determined in context. This does not prevent me from specifying in my grammar how I can use those names. Though I could not possibly know at design-time which 'names' would become part of my grammar, I still have precise control over exactly how the grammar is extended with names. That level of control, about how the context is used in general, is entirely sufficient for specification.

Consequently, lambda expressions are a convenience or optimization of techniques used by Turing machines, reading (i.e., getting) arguments from a program stored on its tape.

This claim doesn't make any sense to me.

Lambdas are not specified in terms of Turing machines, so at most you can describe them 'relative to' Turing machines. Once you start doing that, the point is no more valid than arguing 'relative to' INTERCAL or whatever your most-painful-language-of-choice would be. In some weak sense, lambda expressions certainly are a convenience and optimization of techniques seen in INTERCAL.

When expressiveness is different, that is a critical difference, even when computational power is the same.

"I disagree. ..."

It will take some time to read "another post" and understand it. Thanks for the reference.

In the meantime, I'd like to understand precisely why you disagree. Do you disagree with my belief that a context-sensitive language determines syntax based on run time state and the content of the expression being processed?

I disagree with your

I disagree with your asserted conclusion - that syntax based upon run-time state "precludes specifying a syntax at design time".

asserted conclusion

As Leon, below, reminded me, Type 2 and Type 3 grammars are subsets of Type 1. Thus, a Type 1 language can have expressions of Type 1, 2, or 3. And I assume this fact is the basis of your disagreement with my conclusion--rightly so. My original statement was not correct. The devil is in the details; let's drill down.

Let's examine a macro that processes its arguments in a context-sensitive manner. Assume the preceding arguments are α, the succeeding arguments are β, and the macro expansion is γ. The context-sensitive expression is represented by αγβ. The expansion γ is a string of terminals and non-terminals, and α and β are likewise strings of terminals and non-terminals; they form the context surrounding the macro expansion that is processed at run time.
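The αγβ setup can be sketched as a single context-sensitive rewrite step. A toy Python sketch (the context strings and the macro marker `M` are invented for illustration):

```python
def expand(s, alpha, gamma, beta, marker="M"):
    """Context-sensitive production: rewrite alpha marker beta to
    alpha gamma beta. Occurrences of the marker outside the
    alpha..beta context are left untouched."""
    return s.replace(alpha + marker + beta, alpha + gamma + beta)

# The same marker expands only where the surrounding context matches:
print(expand("a M b and x M y", "a ", "add", " b"))
# -> "a add b and x M y"
```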

For this example, do you contend that a lambda expression can be used to process arguments from α and β?

Partial expansions from

Partial expansions from α and β can certainly be used by lambda expressions, assuming a non-strict evaluation and support for fixpoint. Context-based lambdas are abstracted as 'Arrows' in Haskell. There is no doubt that lambdas have the computational power to process context arguments.

You can make a reasonable argument that expressiveness is different, that lambda expressions are less expressive than context-sensitive grammar-based computation. A lot of expressive power that is implicit in grammars would need to be expressed very explicitly, often as boiler-plate such as explicit event-loops or explicit CPS transforms, with lambdas. Haskell manages as well as it does not through the simple substitution logic of lambdas, but rather by relying very heavily upon typeful programming to provide context via type-based dispatch and indexing.

In any case, my objection is that it is not difficult to specify context-sensitive grammars whose precise syntax varies but whose overall shape conforms to a specification, contrary to your claim.

The Original Lambda Papers

Well, if you are interested in understanding where the name "Lambda the Ultimate" comes from, you might be interested in reading the original lambda papers.

And forgive me for not really understanding the context of the conversation you've been having here on LtU, but the statement seems kind of nonsensical to me. I mean, lambdas are a semantic concept, not a syntactic one. You can design a programming language with a type-2, LL(1) syntax that has lambdas without any difficulty, and such languages are a subset of type-0 and type-1 languages.

If you are instead looking at syntax as a computational formalism, most languages (including Scheme) are equivalent to type-0 grammars. And of course, the pure untyped lambda calculus, SK combinators, Turing machines, and type-0 grammars all have equivalent computational power, a result closely associated with the Church-Turing thesis.

So there seems to be some confusion between syntax and semantics; I realize your idea has something to do with extensible syntax, but I don't understand how this relates to your claims.

Lambda Papers ....

Thanks for the references, they will take a while to read and understand.

Context of the conversation
I recognize my limited understanding of the Chomsky hierarchy and language theory. I am disabled and home bound; thus, I am using this thread to try to learn some things I do not fully understand. I have read publications, but find a Q and A helps me really understand things.

Yes, lambdas are a semantic concept, but implemented with a syntax. Yes, a Type 2 grammar is a subset of a Type 1. But what about expressions in the set-theoretic difference Type 1 \ Type 2? What role, if any, can lambda play in such a subset of expressions?

A type 2 Turing complete language can simulate a Turing machine. However, that does not mean a type 2 grammar is equal to a type 0 grammar, because a type 2 grammar is a subset of a type 0 grammar.

I am trying to understand the relationship between a context sensitive grammar and extensible syntax, if there is any.

I believe there is a link between the two. Unfortunately, my primary thought process is not verbal, and I have difficulty translating thoughts into words and vice versa. Thus, my reading is slow and my writing is poor. On the other hand, programming has always been easy for me.

Syntax versus Semantics

Well, to be perfectly honest, the syntax of most programming languages is almost-but-not-quite context-free, often for ultimately boring reasons. An example I have in mind, commonly used in network protocols and file formats, is content-length-delimited strings, such as are found in HTTP responses. Another example, in Haskell and a few other languages, is the need to handle custom precedence declarations for custom infix operators. Or in C: does "foo * x;" declare a variable x that is a pointer to a foo, or does it multiply x by foo and ignore the result?
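A sketch of that length-delimited, context-sensitive flavor of parsing (the `<len>:<body>` framing here is invented for illustration, similar in spirit to netstrings; it is not HTTP's actual syntax):

```python
def parse_framed(data):
    """Parse a stream of length-prefixed strings like '5:hello3:foo'.
    How many characters the body consumes depends on the *value* of
    the prefix -- exactly the kind of rule a context-free grammar
    cannot express."""
    out, i = [], 0
    while i < len(data):
        j = data.index(":", i)          # end of the length prefix
        n = int(data[i:j])              # the prefix's value drives the parse
        out.append(data[j + 1 : j + 1 + n])
        i = j + 1 + n
    return out

print(parse_framed("5:hello3:foo"))  # ['hello', 'foo']
```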

What happens if you refer to a variable that's not in scope? If you consider that a syntax error, then your syntax isn't context free. Most languages consider that some kind of "higher level" error. Depending on the language, that might be a static error, or it might happen at runtime.

Probably the most interesting non-context-free example, from your point of view, is the custom precedence rules.

What about expressions in the set-theoretic difference Type 1 \ Type 2? What role, if any, can lambda play in such a subset of expressions?

This question doesn't seem terribly interesting; I could add a context-sensitive syntax feature to my language that is totally separate from lambda expressions. Or I could be evil and require all lambda expressions be prefixed with a number saying how many characters long they are; thus making the syntax for lambda expressions context-sensitive.

A type 2 Turing complete language can simulate a Turing machine. However, that does not mean a type 2 grammar is equal to a type 0 grammar, because a type 2 grammar is a subset of a type 0 grammar.

Correct, it's the difference between syntax and meaning. It's the distinction I made in my previous post of "grammar as part of a programming language specification" versus "grammar as a computational device". In practice, we can get pretty far with (pretending that) a syntax is context free. There are numerous examples of file formats and network protocols that are, for all intents and purposes, context free. (Except for that niggling issue of length-delimited data.)

I am trying to understand the relationship between a context sensitive grammar and extensible syntax, if there is any.

This is a perfectly reasonable topic. I doubt that a sufficiently interesting extensible syntax is truly context free, even in the looser sense of "context free" often used to describe programming languages, but how big of a deal is it? I doubt you really need the full power of a context-sensitive grammar, though.

Yes, lambdas are a semantic concept, but implemented with a syntax.

I'm not sure "implemented" is quite the right word; a programmer expresses them with a specified syntax, but the compiler or interpreter implements them with closures, or lambda-lifting, or defunctionalization, or maybe even some other technique. Again, you seem to be confusing syntax and meaning.

Thanks for the references, they will take a while to read and understand.

I'm not sure how relevant the lambda papers really are anymore; while they are certainly very important from a historical perspective, they are written in an archaic dialect of Scheme.

There are better resources if you really want to understand lambdas and higher order functions, not least because they are available in modern dialects that you can use directly with easily obtainable implementations. I would start out by learning some combination of Scheme, ML, and Haskell, and then going further in the language(s) you like best.

And of course the best way to learn these languages is to write programs in them; maybe you could solve Project Euler problems, or write interpreters, or implement some parsers or network servers of some kind. It sounds like Essentials of Programming Languages would be a good book for you!

Unfortunately, my primary thought process is not verbal, and I have difficulty translating thoughts into words and vice versa. Thus, my reading is slow and my writing is poor. On the other hand, programming has always been easy for me.

Yep, I'm solidly in the same boat. ;-)

See below Example?

I hope it helps.

+1 on this

+1 on this:

Well, to be perfectly honest, the syntax of most programming languages is almost-but-not-quite context-free, often for ultimately boring reasons. [...] What happens if you refer to a variable that's not in scope? If you consider that a syntax error, then your syntax isn't context free. Most languages consider that some kind of "higher level" error.

if only because that's precisely the same point as I had tried to make earlier, in that other thread on a related topic (I'm just recalling here FWIW).

No lambda calculus uses tape

As far as I know, no Turing machine and no Chomsky Type 0 or Type 1 language uses lambda expressions.

Careful that you don't assume your conclusion. It would be just as meaningful to say "As far as I know, no lambda calculus uses a tape." While both statements may be true, neither implies anything about whether one formalism is more fundamental than another. As Leon pointed out, as models of computation, they're equivalent (the Church-Turing thesis.)

The issue of syntax is a red herring. You can express lambda calculus using diagrams, if you like — the evaluation strategy known as "graph reduction", used e.g. by Haskell, exploits this.

One reason Lambda is a contender for the "ultimate" title is that lambda calculus is a powerful combination of low level and high level semantics in the same simple language: it has a very small number of rules which support a very high level of abstraction. In this respect it's much more practically useful than classic Turing machines, especially with a few natural extensions of the kind you find in the lambda-based functional languages.

Also, decades of research have given us ways to efficiently transform lambda calculus into the quaint, vaguely Turing-machine-like languages used by most semiconductor-based computing devices, so we can write code at a high level but still use the linguistically primitive extant hardware, and we're not stuck using languages whose entire data flow model is based on variations of "peek" and "poke".

Don't assume your conclusion, ....

Your advice is sound.

I agree, lambda is convenient, powerful, practical, expressive, efficient, etc. Math (i.e., f(x)) has used lambdas for a very long time. Lambda is great, and there are many reasons to continue using it forever.

On the other hand, peek and poke seem to have capabilities that lambda does not when used for context sensitive processing. Perhaps I am wrong, but so far no one has convinced me otherwise.

Abstract away the itch

On the other hand, peek and poke seem to have capabilities that lambda does not when used for context sensitive processing.

The issue you have in mind is presumably one of expressiveness or convenience (since it can't be a difference in computational capabilities, by Turing equivalence.)

Specifically, ambient mutable state is always available in the peek/poke model, whereas in a pure low-level lambda-based system, state must be passed around explicitly. Making this state-passing convenient is one of the main practical purposes of monads, though. Since monads are used very effectively to model languages with mutable state, they're guaranteed to be able to express any imperative design you might have in mind.

Furthermore, they usually do so quite elegantly, and capture the programmer's intent more comprehensively than an equivalent imperative program, for a number of reasons: they make handling of state and other effects explicit in the program's types; because of this, they also provide a degree of static checking and control over those effects; and they allow the use of effects to be modularized (e.g. via monad transformers) in ways that are impractical for imperative languages.
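As a rough illustration of that state-passing idea, a hand-rolled state-monad sketch in Python (Haskell's `State` monad is the real thing; `unit`, `bind`, and `tick` are just illustrative names):

```python
# A state 'action' is a function: state -> (result, new_state).
def unit(x):
    return lambda s: (x, s)

def bind(m, f):
    """Run m, feed its result to f, thread the state through."""
    def action(s):
        x, s2 = m(s)
        return f(x)(s2)
    return action

# 'get' and 'put' are the only primitives that touch the state.
get = lambda s: (s, s)
def put(s_new):
    return lambda s: (None, s_new)

# tick: return the current counter and increment it -- purely,
# with no ambient mutable cell anywhere.
tick = bind(get, lambda n: bind(put(n + 1), lambda _: unit(n)))

prog = bind(tick, lambda a: bind(tick, lambda b: unit((a, b))))
print(prog(0))  # ((0, 1), 2)
```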

While there are certainly pros and cons on both sides - powerful features tend to have a cost - I suspect your perspective is likely to be due more to unfamiliarity with the possibilities than with the limitations of lambda.

On an historical note, the "Lambda the Ultimate" papers largely dealt with an extended lambda calculus that supported mutable state. Assuming I understand your concern correctly, it doesn't apply to that original usage, nor does it apply to current impure functional languages like Scheme or the ML family. It's mainly Haskell that's donned the hair shirt of purity and then figured out ways to stop it from itching.

itch, lol

Yes, Turing equivalence permits one to simulate a Turing machine to compute anything computable, but that results in using peek and poke. To me it seems impossible to use lambda directly within a context free expression. Perhaps my impression of lambda expressions is incomplete.

If implemented like Lisp, one cannot get arguments from preceding context, but that can be overcome. The difficulty I have is mapping arguments from context to lambda variables using run time state. Lambda implementations that I know of process the lambda before calling a function. It seems impossible to use a function's run time state to affect the mapping of arguments to lambda variables.

A spoonful of sugar

Yes, Turing equivalence permits one to simulate a Turing machine to compute anything computable, but that results in using peek and poke.

I don't think anyone claims that mutable state should or can be eliminated entirely - the point is that you shouldn't have to pay its rather high costs in the many places where it's not needed. In a lambda-based model, state of the peek & poke kind is just one abstraction available for use when appropriate, rather than the basis of all computation.

To me it seems impossible to use lambda directly within a context free expression.

Without specifics, it's difficult to address your concern. I second Lauri's request for (ideally) a concrete example.

But your mention of "impossible to use lambda directly" makes me wonder if you're imposing unreasonable restrictions. We don't usually program directly on Turing machines, and similarly we don't usually program directly using nothing but pure lambda calculus.

In lambda-based languages, achieving enhanced functionality of the sort you seem to be getting at is often aided by the use of some syntactic sugar. For example, programming with monads directly without syntactic sugar can be a bit unwieldy, but some sugar can make it syntactically equivalent to imperative code.

As an example of this, if you're familiar with Lisp you might find Oleg's Monads in Scheme helpful. It demonstrates the building of a tree whose nodes are numbered, with the counter used to number the nodes being implicit. The function that's listed after the sentence "With these sweets..." in particular relies implicitly on a mutable counter that isn't even mentioned in the code.

This may be relevant to what you're asking about in two ways: first, it demonstrates how some syntactic sugar can recover a natural coding style from a more low-level composition of lambdas; second, it demonstrates this using an example involving ambient mutable state.
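A rough Python rendering of the same node-numbering idea, with the counter threaded explicitly rather than hidden behind monadic sugar (the tree representation here is invented for illustration):

```python
def number_tree(tree, n=0):
    """Replace each leaf with (index, leaf), threading the counter
    through the traversal instead of mutating a global."""
    if isinstance(tree, list):
        out = []
        for child in tree:
            labeled, n = number_tree(child, n)  # counter flows through
            out.append(labeled)
        return out, n
    return (n, tree), n + 1

labeled, count = number_tree(["a", ["b", "c"], "d"])
print(labeled)  # [(0, 'a'), [(1, 'b'), (2, 'c')], (3, 'd')]
print(count)    # 4
```

Monadic sugar, as in Oleg's Scheme example, lets you write the traversal without mentioning `n` at all.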

See below Example?

I hope the example helps.

This sounds interesting, but

This sounds interesting, but I cannot for the life of me understand what exactly you are saying. Could you be a bit more precise, and perhaps give a concrete example of something you want that lambda expressions aren't capable of doing?

But here's a note that may or may not be relevant: the evaluation context of a reduction can be reified in many languages as a continuation. In real-world languages the continuation is usually extensional in nature (you can only call it, not inspect its syntax), but nothing precludes having a calculus reify the evaluation context intensionally. Here's one example by Mendhekar and Friedman that immediately comes to mind, there are probably others.
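The extensional version is easy to sketch in continuation-passing style, where the evaluation context at each step is reified as an ordinary function `k` (a toy arithmetic evaluator, not the Mendhekar and Friedman system):

```python
# A tiny CPS evaluator for arithmetic. At every step, 'k' *is* the
# surrounding evaluation context, reified as a callable value.
def ev(expr, k):
    if isinstance(expr, int):
        return k(expr)
    op, left, right = expr
    # Evaluate left, then right; each lambda captures the context so far.
    return ev(left,
              lambda l: ev(right,
                           lambda r: k(l + r if op == "+" else l * r)))

print(ev(("+", 1, ("*", 2, 3)), lambda v: v))  # 7
```

Here `k` can only be called, not inspected, which is what "extensional" means above; an intensional calculus would let the program take the context apart.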


My point is that I do not understand how a lambda may be used to pass arguments to a context sensitive subroutine, function, method, .... Thus, I am unable to give an example.

For natural languages, context is both verbal and social. An individual keeps the social context as memories, and the verbal context of a word surrounds the word, both preceding and succeeding it.

For computer languages, verbal context is syntax surrounding a subroutine call and social context (memory) is machine state (semantics), including state variables within the subroutine.

A Turing machine reads and writes based on tape position and the character under the read head (syntax) and machine state (semantics).

Lambda expressions in a polymorphic language use machine state external to the subroutine being called, but there is no link between lambda and state variables local to the subroutine.

If a subroutine could get its arguments instead of them being passed by lambda, the following might be done within the subroutine:

if localX=0 then V=getExpressionValue else V=getAddress fi

where getExpressionValue does not merely get the value at the address gotten by getAddress.

I hope this helps.
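For what it's worth, something like the behavior sketched above can be approximated with lambdas by passing each argument as a thunk together with its "address", letting the callee decide at run time which to use. A hypothetical Python sketch (`local_x`, `env`, and all the names are invented for illustration):

```python
def subroutine(local_x, arg_thunk, arg_addr):
    # The callee decides, based on its own run-time state, whether to
    # force the thunk (getExpressionValue) or use the address (getAddress).
    if local_x == 0:
        return arg_thunk()   # evaluate the argument expression now
    return arg_addr          # just hand back where it lives

env = {"y": 41}
print(subroutine(0, lambda: env["y"] + 1, "y"))  # 42 (the expression's value)
print(subroutine(1, lambda: env["y"] + 1, "y"))  # 'y' (the address itself)
```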

Ah Well

My point is that I do not understand how a lambda may be used to pass arguments to a context sensitive subroutine, function, method, ....

[killed off a previous response]

Grammars denote production rules, and with that, languages, collections of sentences.

Some grammars denote regular expressions, some grammars are context-free, some grammars are context sensitive.

Languages, collections of sentences, described by context-sensitive grammars are the hardest to parse, since they may encode meaning; stated otherwise, production rules may depend on, or be sensitive to, surrounding expressions. You need a linear-bounded automaton (a restricted Turing machine) to recognize them.
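For instance, {aⁿbⁿcⁿ} is the textbook context-sensitive (but not context-free) language; a direct recognizer is easy to write even though no context-free grammar can describe the language:

```python
def is_anbncn(s):
    """Accept strings of the form a^n b^n c^n (n >= 0) -- the standard
    example of a context-sensitive, non-context-free language."""
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

print(is_anbncn("aabbcc"))  # True
print(is_anbncn("aabbc"))   # False
```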

The lambda calculus is, as far as I know, not very complex, and is therefore describable by a type-1 grammar. So I don't understand your comment in the main post.

A Turing machine which parses a context-sensitive sentence will store encoded meaning on its tape. A parser for a context-sensitive grammar doesn't give access to that from within a language.

It seems you mix up the meaning of programs (it may change a variable in memory) with whatever is stored on a Turing tape to parse a language. In general, you may derive a context-sensitive grammar which might recognize something like "var x = 5; print(x)". That doesn't mean your recognizer gives you explicit access to the tape where the meaning of x, or of that program, is stored.


It is true that context-sensitive grammars may capture the meaning of programs, i.e., you can actually define grammars which will 'calculate' the meaning, or operational semantics, of expressions of specific languages. I do not know whether that has been done with a type-1 grammar for the lambda calculus, or whether type-0 is needed, though I assume type-1 will suffice.


A lambda expression is an abstraction of a function. That is, we abstract from everything, even the function's name, to denote the meaning of a specific function with an expression. For example, (lambda x . lambda y . x + y) adds its two arguments; it's a summation function.

There's nothing magical about that: in most languages, you can just treat a lambda expression as any other constant. So one doesn't use lambdas to pass arguments to other functions, well, not normally. At best, a lambda is a value which may be passed to another function, just like the constant true, 1, "hello world", or whatever. A lambda expression may hold a collection of bindings by mapping values in a domain to values in a range. Maybe that is what you read somewhere?
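In a language with first-class functions this is easy to see; a minimal Python sketch of the same expression:

```python
# (lambda x . lambda y . x + y): a curried summation function, usable
# like any other constant -- stored, passed around, applied.
add = lambda x: lambda y: x + y

print(add(2)(3))   # 5
inc = add(1)       # partial application yields a new function value
print(inc(41))     # 42
```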

Pure Functions

Languages that eliminate side effects also eliminate the potential conditions that led to my original challenge about Lambda. Pure functional languages are elegant. However, most code in the world is not written in pure functional languages. Perhaps it should be, but it isn't.

Lambda is just a Vehicle

Well, to be honest, I agree with you. I don't think most programs should be written in a pure functional language. However, purity certainly has its merits when it comes to correctness and, often, conciseness, especially for applications where the bulk of the programming is reducible to manipulations on a complex data structure with little I/O.

But you read too much into its widespread adoption by academia.

Lambda calculus is just a great academic intermediate vehicle for, for example, expressing complex type systems in a Turing-complete programming language with very simple operational semantics. There's really not a lot more to it than that. It may not be very good at dealing with side-effects, but any impure calculus with some hand-waving (like used for expressing operational semantics of Lisp or ML) or monads remedy that. Purity is not that big a deal, at least academically.

At some point, when compiling to a concrete machine, lambda calculus is too high-level, but so are Turing machines. For most imperative languages, the intermediate languages used to express operational semantics are a form of abstract assembly, with good reason. But there are no good 'silver bullet' intermediate assembly languages for generic use.

I can follow your sentiment that attribute grammars may sometimes be better at expressing semantics, but they come at the cost of needing to define semantics in terms of some operational domain, whereas most people have now concluded that lambda calculus is good enough for that. (And even so, there are not a lot of formal systems which don't have a problem with side-effects. It's just a property of math in general: math doesn't allow for side-effects unless you plumb it that way, and that's normally not worth the effort.)

People just use lambda since there's no good alternative for expressing some ideas concisely. It's no silver bullet, except for that.

ML effects are not hand-waved

It may not be very good at dealing with side-effects, but any impure calculus with some hand-waving (like used for expressing operational semantics of Lisp or ML) or monads remedy that.

ML operational semantics is certainly not the result of "hand waving" alone. There has been tons of research on the semantics of references, and the semantics of various ML languages has been formalized at all levels (precise explanation in natural language, formalization on paper, mechanization with proof assistants).


Perhaps marco should have instead used the phrase 'hacked inelegantly' when describing how side-effects were jammed down the throat of lambda calculus to create languages such as Lisp or ML - damaging many properties of lambda calculus that would otherwise be advantageous for abstraction, refactoring, and optimization.

Monads, arrows, etc. do the cleaner job here.

Cleaning things right under the rug

The monad-based approach in Haskell isn't really that great either, at least as a way for dealing with side effects. The benefit of a side-effects monad with no loopholes is simply that it provides a reliable way to quarantine the mess, not that it cleans things up.

Since the monad concept was introduced in Haskell a lot of useful things have grown up around it, many of which are nice for their own sake independently of the embarrassing sin bin that is IO. Likewise, having fragments of imperative computation as easily-manipulated first class values is very powerful, but that just lets Haskell be an unusually nice imperative language.

And of course, I say all this as someone who very much enjoys Haskell. For all my complaints, I'm not sure what a better solution might be. It may not exist yet.

Under the rug.

I was careful to use relative terms when I said that monads do a 'cleaner' job. I agree that the Haskell approach leaves much to be desired when it comes to dealing with various communication effects.

Haskell is among my favorite languages, but I have much desire to find 'better' models for in-the-large composition. Haskell isn't providing nearly enough help here - it encourages monolithic applications, as we are forced to build new languages or computation models atop Haskell then reinvent and re-implement domain libraries atop these new languages and models. And typeful programming isn't especially amenable to integration and continuous maintenance of independently developed, mutually distrustful components. But, while a lot of people have ideas, there are very few convincing arguments for 'better' solutions.

One excellent candidate for a 'better' solution is explicit temporal semantics, which supports declarative concurrency, supports 'purity' within any given logical instant, supports declarative temporal behaviors and coordination (e.g., streaming video and audio and synchronization of the streams, in a pure and declarative manner), and allows transforms on these behaviors. But to make this work, it really needs to be pervasive like the language itself, in the sense of being used across most problem domains and libraries, and it must integrate well with the scheduler, networking, and FFI.

Another decent option is single-assignment variables. One can model regular variables as explicit streams. This also ensures 'time' is explicit, though it can be difficult to coordinate behavior relative to temporal semantics.
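A minimal sketch of a single-assignment cell (write-once, read-many; the class and method names are invented for illustration):

```python
class SAVar:
    """A single-assignment variable: bind once, read many times.
    A second bind is an error, which is what makes sharing such a
    variable between concurrent computations benign."""
    _UNBOUND = object()  # sentinel distinct from any user value

    def __init__(self):
        self._v = SAVar._UNBOUND

    def bind(self, value):
        if self._v is not SAVar._UNBOUND:
            raise ValueError("single-assignment variable already bound")
        self._v = value

    def get(self):
        if self._v is SAVar._UNBOUND:
            raise ValueError("unbound single-assignment variable")
        return self._v

x = SAVar()
x.bind(7)
print(x.get())  # 7
```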

I'd like to see better support for incremental computation, for reactive programming, for stability and eventual consistency, for distribution, for graceful degradation and resilience under partial failure, for robust composition of mutually distrustful components, for continuous maintenance and debugging and upgrade, for integration into the vast sea of sensors and effectors and resources. Monads aren't right for this. Nor are arrows. Nor is typeful programming.

I've my own ideas towards a 'better' programming model. But I am implementing them atop Haskell. ;-)

[citation needed]

Really? "jammed down the throat" is how you'd describe the CESK machine? I think perhaps some humility is in order here.


Besides, Lisp got its mutability 40 years before Haskell got its monads, and Scheme was 20 years before. It didn't only seem like a good idea at the time, it was a good idea. Even today, mutable reference cells, just like ML's, still have their place in Haskell, in multiple forms.

As Bob Marley put it:

We're jammin' -
To think that jammin' was a thing of the past;
We're jammin',
And I hope this jam is gonna last.


Felleisen and Friedman's CESK is a fine formalization of a mediocre computation model. Formalization is important and worthy work, but it doesn't change the essential properties of the model being formalized.

CESK's most significant concession to 'improving intellectual costs' is the separation of immutable variables and mutable references. First-class functions can still encapsulate side-effects on free variables. Developers cannot locally reason in CESK about abstraction, refactoring, and confinement the same way they could with lambda-calculus. One cannot reorder, eliminate, replicate, parallelize, or lazily evaluate arbitrary expressions without affecting program semantics in not-locally-obvious ways.

There are two broad classes of interesting properties about any computation model: its expressiveness, and the local reasoning it supports. Stateful mutation is a means to these ends - often a counterproductive one - not an end of its own. Generic λ-calculus isn't especially impressive on either of these, but has been tweaked and twiddled over time, and typing has served as one interesting improvement. CESK's λυ isn't making a good trade, but effect-typing could restore a few of the local reasoning properties it sacrificed.

Anyhow, yes, I would describe the CESK machine with the words I used earlier. Humility is always in order, of course. But, while I recognize that there is a lot of history and discovery, I believe it a mistake to evaluate models based upon that history. They should be evaluated only for their interesting properties.


I should interject a point here about formal calculi modeling imperative semantics. It's a no-brainer that impure computations will be equationally weaker than pure ones, but it isn't necessary that the modeling calculus be non-Church-Rosser, or otherwise fail to fully satisfy Plotkin's correspondence theorems. Lambda-v-CS calculus isn't Church-Rosser, but I worked out a basic strategy for fully well-behaved impure calculi in my dissertation, using a generalization of Klop's CRS's that I've called SRS's (Substitutive Reduction Systems). From the dissertation's point of view this was an incidental byproduct: I wanted to show that fexprs together with impure features would still fully satisfy CR and Plotkin's correspondences, and since the closest thing I was aware of — lambda-v-CS calculus — didn't quite do that for impure features even without fexprs, I modified its strategy.

The core idea is that impure binding constructs make no concessions to function-application-like syntactic structure, bind distinct classes of variables, and use "substitutive functions" that are different from beta-style substitution. For example, when a symbol s is evaluated in an environment e, the evaluation is replaced by a receive construct — essentially, a free variable x waiting to be replaced by a value — surrounded by a "get frame" that binds that x. The get frame then bubbles up until it encounters some state frame that gives a value for s in e, or until it encounters the outer binding frame of e and therefore knows that s is unbound. The get frame is then annihilated while doing a substitution for all occurrences of x within its scope.
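For flavor only, here is a toy Haskell rewriting sketch of the bubbling idea - emphatically not the dissertation's actual SRS machinery, just an invented miniature (the constructors SetF, GetF, and RVar are made up for this sketch): a get frame carrying a fresh receive-variable rises through the AST until a matching state frame annihilates it by substituting the value.

```haskell
-- Toy term language, invented for this sketch only.
data Term
  = Lit Int
  | RVar Int               -- "receive" variable, waiting for a value
  | Plus Term Term
  | SetF String Int Term   -- state frame: symbol s has value n over Term
  | GetF String Int Term   -- get frame: binds receive-variable x for symbol s
  deriving (Show, Eq)

-- Replace every occurrence of receive-variable x by the literal n.
subst :: Int -> Int -> Term -> Term
subst x n t = case t of
  RVar y | y == x -> Lit n
  Plus a b        -> Plus (subst x n a) (subst x n b)
  SetF s v b      -> SetF s v (subst x n b)
  GetF s y b      -> GetF s y (subst x n b)
  other           -> other

-- One rewrite step: a get frame directly under a matching state frame is
-- annihilated (substituting the value); otherwise get frames rise past
-- Plus nodes, and fully evaluated additions reduce.
step :: Term -> Maybe Term
step (SetF s n (GetF s' x b)) | s == s' = Just (SetF s n (subst x n b))
step (Plus (GetF s x a) b)              = Just (GetF s x (Plus a b))
step (Plus a (GetF s x b))              = Just (GetF s x (Plus a b))
step (Plus (Lit m) (Lit n))             = Just (Lit (m + n))
step (SetF s n b)                       = SetF s n <$> step b
step (GetF s x b)                       = GetF s x <$> step b
step (Plus a b) = case step a of
  Just a' -> Just (Plus a' b)
  Nothing -> Plus a <$> step b
step _ = Nothing

normalize :: Term -> Term
normalize t = maybe t normalize (step t)

-- Looking up s under a frame that sets s to 3, then adding 1:
main :: IO ()
main = print (normalize (SetF "s" 3 (Plus (GetF "s" 0 (RVar 0)) (Lit 1))))
```

The deduction happens as locally as the term allows: the get frame moves just far enough up the AST to meet the state frame for s, and no further.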

Single assignment variables

Single assignment variables provide a very nice balance of purity, concurrency/commutativity, and communication - up there with explicit temporal semantics.

My complaints regard primarily multi-assignment variables that observe an implicit progression of time (based on assignments over time). To my understanding, you cannot get Church-Rosser for first-class functions sharing multi-assignment variables (at least without some sort of region typing and partitioning).


My calculi do provide Church-Rosser-ness for first-class functions holding multi-assignment variables. These calculi are untyped. Each control variable, environment variable, and get variable has a binding frame, which can rise ("bubble up") through the AST just as can the side-effect frames generated by imperative actions.

As the post below points out, Felleisen and Hieb also reconcile CR with fcf's and multi-assignment (granting that I also tackled parent environments, which is beyond what they were trying to do since lambda-v-CS isn't an explicit-evaluation calculus); that's why I also mentioned the correspondence theorems. Felleisen and Hieb use a non-bijective correspondence between calculus and operational semantics... although I admit (see below) that's not my greatest concern with their calculus.

I see. You aren't saying the

I see. You aren't saying the calculus is usefully Church-Rosser in reducing (executing) the side-effect frames, but rather in the weaker sense of generating (by lambda evaluation), gathering (by bubble-up), and ordering the side-effect frames (perhaps by CPS). (By 'usefully' Church-Rosser I mean to exclude the trivial case where reduction order is unique.)

It is unclear to me how useful that level of assurance is in practice.

People can and have argued that C is a pure language for transforming (stack,heap) pairs, but doing so doesn't help users reason about their C programs - doesn't help them refactor or abstract, doesn't readily help developers reason about modularity or IO, and certainly doesn't account effectively for concurrency (and related properties such as time and asynchronous IO).

To what extent do we gain real value from asserting Church-Rosser property up to effectful frames? To what extent does this property help developers reason locally when composing the program? Does it hold well for open modules, callbacks, and first-class functions? Does this abstraction hold effectively when developers inevitably add threading and asynchronous or memory-mapped IO?

You stated earlier that: "it's a no-brainer that impure computations will be equationally weaker than pure ones". But, like a lot of 'no-brainer' common-sense things we 'know', that assertion should be challenged. Can we develop 'impure' models that allow very powerful equational reasoning? I believe so. Single-assignment, temporal semantics, reactive dataflow, continuous or long-lived effects (such as signals or constraints), idempotence - each offer promise for modeling and reasoning about communication effects.

I've seen a few models of computation that are 'usefully Church-Rosser' even in the side-effects, in the sense that the program behavior is confluent when expressions of side-effects are reduced in arbitrary order. Compared to those, this notion of bubbly effect-frames is... not impressive.

Formal CR and useful CR

I'm quite concerned with the ability to make local deductions; that isn't actually Church-Rosser-ness as such, of course. I do take your point that there's an important distinction to be made between the purely formal property of CR-ness and the ability to make local deductions. Purely formal CR-ness is trivially easy to provide: just make your rewriting system deterministic, and that automatically makes it CR. What's not so easy is to provide formal CR-ness together with local deduction.

I wanted a calculus whose equational theory would not make imperative devices look more non-local than they actually are; impure devices are by nature equationally weaker than pure ones, without artificially exaggerating the difference. The calculus should not, for example, require that side-effects rise all the way to the top of their scope before deductions can be made about them. If you locally set x to 3, and then you immediately look up x, it should be possible to deduce locally that you're going to get back 3 (and emit a side-effect), without requiring that the local term include the outer extent of the scope of x. By letting each imperative phenomenon (such as a set or get) be a frame that rises through the AST, it should be possible to make each deduction as locally as possible, by moving each imperative frame just far enough up the AST for the particular purpose.

I understand your goals, but

I understand your goals, but I wonder how effectively your designs achieve them in the presence of first-class functions. You mention code containing 'set x to 3', but requiring the developer to know the whole content of the statements doesn't generalize effectively to modular composition or first-class functions.

IIRC, Felleisen and Friedman tackled the local reasoning issue using object capability security (albeit, disguised under the name 'our hygienic property for labels') - which is pretty good, allows you to control which code might affect a variable, allows confinement under controlled circumstances, but is by itself too weak to support reasoning about confluence of side-effects.

impure devices are by nature equationally weaker than pure ones, without artificially exaggerating the difference

I'm not inclined to agree. Certainly, the reasoning one can perform is different, but that does not necessarily make it weaker.

To make a fair judgement about strength of reasoning, you should start by comparing apples to apples - by which I mean: applications to applications.

In their 1986 CESK paper, Felleisen and Friedman start by observing that, in large pure applications, a lot of data gets piped through 'extra' arguments to accommodate the 'deep' components and modules. I have observed that this is especially an issue for dialogs, where a function's behavior requires interaction with its environment, or imposes constraints on when or how the environment must react in order to have a correct program.

The local and equational reasoning offered by pure applications is, in this case, weaker in many ways than that offered by the object capability model (or Felleisen's 'hygienic property for labels'), which can protect against corruption of the interaction stream. This positive difference in local reasoning is accompanied by a reduction of expressiveness: the ability to transform an effect stream is a level of expressiveness that is partially lost in the capability model. (It isn't wholly lost: membranes and indirection can transform access to capabilities.)

Now, of course, if we jump straight from 'pure' to 'state variables with implicit time through multi-assignment', our equational reasoning will be weaker. But that's like choosing between a desert and the ocean. There exists an abundance of intermediate forms of impurity that don't come with the reasoning costs of shared state manipulation and implicit time.

Church-Rosser for λ-v-CS

Felleisen and Hieb (1992) [1] already provide a variant of λ-v-CS which is Church-Rosser and satisfies the Standardization theorem without changing any strategies.

[1] Felleisen and Hieb, The Revised Report on the Syntactic Theories of Sequential Control and State, Theoretical Computer Science, 1992

Felleisen and Hieb

The bibliography of my dissertation recommends this URL for the Felleisen and Hieb paper. My detailed critique of lambda-v-CS, both conceptual and technical, is in Chapter 8. Although Felleisen and Hieb did have to use somewhat weakened correspondence criteria, a more worrisome technical drawback is "bubbling up" schemata that cause significant disarrangement of the AST as frames move upward through it. Although I still use the terminology of "bubbling up", in fact the rising frames in my calculi don't cause that characteristic churning.


Your critique is very long, and appears to contain an attempt at a full(!) history of formalism in mathematics. However, you said in your comment above that λ-v-CS is not Church-Rosser, which isn't correct. I'm happy to believe that your dissertation describes the situation more precisely.

names of calculi

Evidently we're using different names for the calculi. Perhaps Felleisen and Hieb's calculus has come to be commonly called "lambda-v-CS calculus" (in somewhat the same way that "lambda calculus" today refers to lambda-K calculus)? As I recall, the later variants of the calculus from Felleisen's dissertation all originally used different names, carefully keeping them straight; on a quick recheck of Felleisen and Hieb, its full calculus seems to be lambda-v-C(d)S(t) calculus. I was referring to the calculus of Felleisen's dissertation.

names of notions of reduction

The only name given to the merged system is the name 'cs' given to the merged notion of reduction (Section 5 of Felleisen & Hieb). These days, Matthias certainly describes that as "the λ-v-CS calculus".

Verbs are more provocative

But, while I recognize that there is a lot of history and discovery, I believe it a mistake to evaluate models based upon that history. They should be evaluated only for their interesting properties.

That's reasonable, but using highly pejorative words to describe a model can easily be interpreted as an attack on the people behind it, particularly if that description involves actions, like "jammed", since someone had to perform those actions.

It's in that context that ignoring the historical context seems unfair, and can in fact detract from the goal of focusing on the models.

(Besides, I don't think the technical argument here is as one-sided as you seem to believe, but that's a separate discussion.)

If the technical argument

If the technical argument was one-sided, I wouldn't feel any need to use strong words. I have long argued against hybridized computation models, which offer the expressiveness of all the models hybridized... but also achieve the complexity of all the models hybridized - each model damages the local reasoning properties assured in the other models. The question of whether this is reasonable depends on how one weighs expressiveness vs. reasoning, and moves ultimately into arguments about self-discipline, modularity, trust, and automated optimization vs. hand-optimization.

Besides, "jammed" is hardly what most would consider "highly" pejorative. I could use four-letter words. I used "jammed down the throat of lambda calculus" in part for the image it evokes - that of shoving an arm down the head and neck of a lambda (λx→) to add side-effects to the body. In part, I used that phrase for the connotations of harm and feeble protest. I think it quite appropriate, and I don't apologize for those words.

The original use of

The original use of "jamming" that launched this was in reference to Lisp and ML, not any of Felleisen's work. There's an interesting distinction to be made there.

I remarked above that, among calculi, I was dissatisfied with lambda-v-CS (either version :-) in that the schemata cause churning of the AST structure as a frame "bubbles up" through that structure. The source of this churning effect seems to be that impure frames imitate the structural form of function application, even though what they're doing is different from function application. My calculi use impure frames that tend to be declarations enclosing scopes of influence, orthogonal to function application, and (as a result, IMHO) when they "bubble up" through the AST they don't cause churning. This is an orthogonality between imperative features and lambda that I find aesthetically pleasing. (Incidentally, lambda is itself a separately detachable element of the calculi: I consider in ch. 9 that, thanks to explicit evaluation, my calculi could do without lambda and still work. They'd be equationally much weaker, though, because without beta-style variables they would be unable to partially evaluate the body of a compound operative prior to calling it. In my dissertation I refer to the sort of variables lambda binds as "partial-evaluation variables".)

So my concern with calculi is that the impure features should be orthogonal to lambda (and I make some practical suggestions, through my calculi, about how to do that).

But what about languages? (I'll stick with Lisp, as that's my home turf.) It's true that Lisps typically make all variables mutable, including (or even especially) those bound by lambda. But is that a complication, because in the calculi mutable variables are a separate concern from λ, or a simplification, because the language has just one kind of variable rather than two (or more)? My perspective on this is no doubt colored by the fact that, due to my explicit-evaluation approach to calculi for modeling Lisp (by which I detrivialize the theory of fexprs), I do not expect the lambda of Lisp to correspond directly to the λ of λ-calculus. For me, mutability of bindings isn't about how Lisp lambda works at all: it's about how Lisp environments work. There are clearly pros and cons to providing mutable and immutable environments, or (apparently more complicated) mutable and immutable bindings within a single environment. Personally, though, I don't see it as Lisp imposing something on λ-calculus, because Lisp and λ-calculus belong to different worlds.

Because Lisp and λ-calculus belong to different worlds

To be honest, that is why I don't read too many Lisp specific papers. It just takes too much time to recognize what is Lisp specific, or even, specific to a particular concrete machine used to implement Lisp.

As far as I get it now: to implement a lambda evaluator, you usually take a concrete stack machine (or a closure machine, which is then pragmatically implemented on a stack machine, possibly with an accompanying store/heap). Early implementations of Lisp trivially translated lambda variables to slots in stacks/closures. Since they are slots, they can be manipulated, i.e., you end up with an impure lambda calculus where variables can be assigned to.

But that is easily modeled formally. Just add one syntactic rule 'v := E; E', and see what semantics you end up with. I.e., the simplest formal model for an impure calculus, abstractly modelling Lisp, is far simpler than what is presented in the Felleisen papers, or made concrete with CESK operational semantics.
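To make that concrete, here is a minimal sketch of such an impure calculus (all names invented for the sketch): call-by-value lambda terms plus one assignment form, where each lambda-bound variable becomes a mutable slot in a store - roughly the "slots in stacks/closures" picture.

```haskell
import qualified Data.Map as Map

-- Hypothetical miniature impure calculus: lambda terms plus assignment.
data Term = Var String
          | Lam String Term
          | App Term Term
          | Lit Int
          | Add Term Term
          | Seq Term Term          -- E1; E2
          | Assign String Term     -- x := E, mutating x's slot
  deriving Show

data Value = IntV Int
           | Closure String Term Env

type Env   = Map.Map String Int    -- variable -> store location (its "slot")
type Store = Map.Map Int Value     -- location -> current value

-- Big-step, call-by-value evaluator threading the store, so that
-- lambda-bound variables are mutable slots.
eval :: Env -> Store -> Term -> (Value, Store)
eval env st (Lit n)   = (IntV n, st)
eval env st (Var x)   = (st Map.! (env Map.! x), st)
eval env st (Lam x b) = (Closure x b env, st)
eval env st (Add a b) =
  let (IntV m, st1) = eval env st a
      (IntV n, st2) = eval env st1 b
  in (IntV (m + n), st2)
eval env st (Seq a b) = let (_, st1) = eval env st a in eval env st1 b
eval env st (Assign x e) =
  let (v, st1) = eval env st e
  in (v, Map.insert (env Map.! x) v st1)
eval env st (App f a) =
  let (Closure x b cenv, st1) = eval env st f
      (v, st2)               = eval env st1 a
      loc                    = Map.size st2          -- fresh location
  in eval (Map.insert x loc cenv) (Map.insert loc v st2) b

run :: Term -> Int
run t = case eval Map.empty Map.empty t of
          (IntV n, _) -> n
          _           -> error "expected an integer result"

-- ((\x. x := x + 1; x) 1)  evaluates to 2
main :: IO ()
main = print (run (App (Lam "x" (Seq (Assign "x" (Add (Var "x") (Lit 1)))
                                     (Var "x")))
                  (Lit 1)))
```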

So, I don't read (much into) them, since I have no clue what I am reading. Or at least, it seems it all boils down to particular properties derived for particular calculi or machines, all tailored to Lisp. What's so general or interesting about that?

[ This is a great opportunity for all to explain to me why it is interesting, of course. ;) ]

Not if you want to be taken seriously

Besides, "jammed" is hardly what most would consider "highly" pejorative.

I singled out "jammed" because the use of a verb is part of what makes the comment seem as though it might be directed at people, not models. The full phrase, "jammed down the throat of," is what seems highly pejorative to me, and, as I said, more likely to be interpreted as a critique that goes beyond the model itself.

I could use four-letter words.

Not if you want to be taken seriously for the content of your argument. Which brings me to this:

If the technical argument was one-sided, I wouldn't feel any need to use strong words.

That seems strange to me. You feel that using strong words will somehow make your point if you fail to do so otherwise?

Ethos, Pathos, Logos

Strong words don't make a point, but they are useful to carry it. A good point made with feeble words might as well not have been made at all, for all the attention it will gather.

Any attack on a design can be construed as an attack on its creator by those who seek to be offended. I've seen it happen often enough (and not only for arguments I've presented). For example, an attack on Perl's syntax is suddenly a personal attack on Larry Wall in the presence of a Perl fan. I don't mind if people take offense at my argument, because to take offense first requires paying attention to it. I do mind people turning their offense into a massive sub-thread where the only point being made is how offended they are. That is a rather officious way to "jam" or filibuster the argument by decreasing the signal to noise ratio.

Anyhow, I don't believe my words are an attack on the pioneers of the model. My strong words are, rather, directed at the multitudes who blindly clone or revere the model without understanding its costs.

jamming (disambiguation)

Re: comment-62902

… jamming is the (usually deliberate) transmission of radio signals that disrupt communications by decreasing the signal to noise ratio. Unintentional jamming occurs when an operator transmits on a busy frequency without first checking whether it is in use, or without being able to hear stations using the frequency.


Humility still in order

I have no problem with judging systems based on their properties, rather than their history, and I don't think that we should get hung up on who did what.

I do think that random people on the Internet (a class of which posting on LtU automatically makes you a member) should think a few extra times before suggesting that John McCarthy and Robin Milner are the authors of "mediocre computational models".

Humility still in order

Your attitude is quite condescending and prejudicial, assuming that LtU members - including myself in particular, I imagine - have not spent much time weighing their opinions.

humility blah blah

I do think that random people on the Internet (a class of which posting on LtU automatically makes you a member) should think a few extra times before suggesting that John McCarthy and Robin Milner are the authors of "mediocre computational models".

I think it is safe to say that McCarthy would find fault in early Lisp's dynamic scoping and would freely acknowledge that lisp-style side effects are problematic. Strachey, Milner, Scott et al. invented a formulation of store semantics, sure, but their most profound contribution was in demonstrating that a denotational approach was a decent foundation -- not that store semantics were the ultimate anything. If you could engage them in the question today (and, yes, only one is still with us) I think it is a safe bet that none would claim that their models involving mutable state are fully satisfactory as "the right way to think about computation" ... merely that they are OK for some purposes and are among the best ways to describe those historic languages to which they've been applied.

In other words... nothing that McCarthy, Milner, Scott, or Strachey set out to do is in the slightest bit insulted by saying "Yeah, that model of computation in paper X or figure Y of book Z? Mediocre at best." The semanticists / foundationalists are properly honored for giving us a formalism in which to study why those models are mediocre. McCarthy is honored by noting the informed practicality of what he did.

In general, there is really no need for a third party to go around demanding "humility" and classifying some as "to be revered" and others as "random people".

A rule of thumb

In general, I find it useful to assume that when very smart people design something that I don't find appealing, I can still learn something from it, and that probably I'm missing the reason why the system is interesting.

For example, I've never been a fan of the pi-calculus. But I try to assume that it has more to teach me, rather than the other way around.

More metaphorically

If one stands upon the shoulders of giants, it is easy to notice when the giant stumbles--but that doesn't mean the ride isn't still appreciated!

Kay's lament

Re: comment-62914:

I do think that random people on the Internet (a class of which posting on LtU automatically makes you a member) should think a few extra times before suggesting that John McCarthy and Robin Milner are the authors of “mediocre computational models.”

You just can't criticize anything.

Hand Waving not Allowed

precise explanation in natural language

That I would call hand waving, especially since the pure lambda calculus is pretty precise. Just a matter of perspective; I didn't want to imply you can't give precise semantics to impure calculi. But, point taken.

A solution

If I understand correctly, the question is how to provide an ambient context to a lambda, without explicitly passing that context as an argument to the lambda. What I've already written about monads is one answer to that.

In Haskell, your example could be written as follows:

myContextSensitiveFunction localX = do
   v <- if localX == 0
           then getExpressionValue
           else getAddress
   return v

I've made localX an ordinary lambda argument here, but it could also be an internal local variable defined with let or where; it doesn't matter.

Either or both getExpressionValue and getAddress could obtain their values from an implicit context, and depending on the types you choose to use, that context can support mutability, e.g. with something like putAddress or putExpressionValue.

In the typical implementation, this depends on a clever mechanism which involves wrapping values in lambdas that take a context argument, and applying functions to these wrapped "monadic" values using a function called "bind". Bind takes care of the mechanics of propagating the context through the computation. Functions that access the context, such as getExpressionValue or getAddress, return those wrapped monadic values, so that the "bind" machinery can pass the current context to those values when they're used.
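A minimal hand-rolled version of that mechanism (the names Ctx, bindCtx, etc. are invented here for illustration; Haskell's standard State monad is the polished equivalent): each monadic value is a function taking the context, and bind threads the updated context from one step to the next.

```haskell
-- A monadic value: a function from the current context to a result
-- paired with the (possibly updated) context.
newtype Ctx s a = Ctx { runCtx :: s -> (a, s) }

-- bind: run the first computation, then hand its result and the
-- updated context to the rest of the computation.
bindCtx :: Ctx s a -> (a -> Ctx s b) -> Ctx s b
bindCtx m k = Ctx $ \s -> let (a, s') = runCtx m s in runCtx (k a) s'

-- unit/return: inject a plain value without touching the context.
unitCtx :: a -> Ctx s a
unitCtx a = Ctx $ \s -> (a, s)

-- Context-access operations, analogous to getExpressionValue /
-- getAddress and putAddress above.
getCtx :: Ctx s s
getCtx = Ctx $ \s -> (s, s)

putCtx :: s -> Ctx s ()
putCtx s' = Ctx $ \_ -> ((), s')

-- Example: read the context, increment it, read it again - no context
-- argument is ever passed explicitly.
example :: Ctx Int Int
example =
  getCtx         `bindCtx` \x ->
  putCtx (x + 1) `bindCtx` \_ ->
  getCtx

main :: IO ()
main = print (runCtx example 0)
```

Haskell's do-notation is just syntactic sugar over this kind of bind chain.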

The specifics of how this works are described in the "Monads in Scheme" link I gave. There's also no shortage of monad tutorials on the web — sigfpe's You could have invented monads is worth a look.

Note that monads are just one solution to this. The basic idea of composing lambdas to manage a context can be implemented in all sorts of ways. Arrows are another lambda-based abstraction that can do this.


I will read your link on monads, and try to understand your example. ATM it is not clear how Haskell binds arguments processed by getExpressionValue or getAddress to a lambda variable.

In my example, getExpressionValue and getAddress are functions that get an argument for myContextSensitiveFunction. Moreover, the function did not have a lambda because I do not understand how a lambda variable can be controlled by an internal local variable such as localX.

I have read your link on monads and still do not understand the monad mechanism.

In my example, I might have made localX change from 0 to 1 each time myContextSensitiveFunction was called. If I understand correctly, Haskell cannot do that because it is a side effect. Moreover, Haskell functions cannot read data from the outside world to set the value of localX. Thus, Haskell's purely functional nature seems to exclude it from emulating my example.

Is it possible for you to use Lisp, which allows side effects, for your example, to illustrate how monads cause getExpressionValue and getAddress to get an argument for myContextSensitiveFunction?

Mutability in an immutable world

By way of analogy, if you consider that you live in a 3D world, then objects in the world move around and their properties change. If you were to "step out" and look at the world as a 4D space, where time is one of the dimensions, then you no longer require a primitive notion of change. Change is just what happens when you move along the time dimension.

Similarly, in pure languages, change is not a primitive notion. All values are immutable, and a "changing object" is modeled as a sequence of values (states). Monads provide a pure encoding of functions that are sort of indexed by "now", so that they end up looking very close to what you'd write in a language where mutation is baked in and happens during evaluation.

This is all very hand-wavy and not quite right, but maybe it will help you get oriented.
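As a tiny sketch of the analogy (nothing monadic yet, and the names are invented for illustration): a "changing" counter modeled as an immutable sequence of states, where change is just movement along the time index.

```haskell
-- Each state is derived from the previous one; nothing is mutated in place.
states :: [Int]
states = iterate (+ 1) 0

-- "The counter at time t" is just an index into the sequence.
valueAtTime :: Int -> Int
valueAtTime t = states !! t

main :: IO ()
main = print (map valueAtTime [0 .. 4])
```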

Thanks Matt

I accept that a pure functional language with lambda is a solution using lambda to implement a type 1 language. In other words, it is a solution that I could not previously imagine.

Recording history is not the same as operating in the present

Recording history is not the same as operating in the present.

Users of a shared account are not controlled by each other. They independently make deposits and withdrawals that have indeterminate effects. Of course, the history of what happened is immutable.

Concurrency is not computation

Software that drives concurrent hardware is like software that drives audio hardware - beeps aren't computation either. Monads can be used to specify the behavioral response of computational units in a system, but the assignment of those behaviors to the ATM hardware at various bank branches is not computation. Preferring an actor-inspired model of computation because software will be deployed on hardware used by real actors in the world is somewhat analogous to preferring a sound-wave-inspired model of computation when writing an MP3 player.

I disagree. Calculation and

I disagree. Calculation and communication are each significant aspects of computation. A model of calculation alone isn't a computation model: every computation model involves a communications component for advancing a calculation through time - forms of substitution, propagation, flow of control and information. Communication includes such aspects as time, concurrency, and modularity. These issues are no less significant than the calculation aspects.

In any case, you and Hewitt seem to be confusing the critical factors in the 'account' example. It isn't concurrency; the issue is modularity - i.e. the independence of the agencies using the account. We can indeed model time in the manner you described in your earlier post, but expressing an account and modularly expressing multiple agents operating upon it is non-trivial in the FRP model.

And, barring special requirements for integrating with hardware (such as buffered DMA), and assuming equal familiarity with all models involved, I would indeed expect greater productivity from writing an MP3 player in a model of computation that has support for the sort of continuous time and analog signals that might be 'inspired' by sound waves, such as functional reactive programming or Max/MSP. Indeed, certain features such as crossfade or volume regulation would likely be easier in those models. I certainly don't believe Actors model would be ideal, nor would monads.

I consider calculation and

I consider calculation and computation to be synonymous. A model of computation generally involves moving through states until an answer is obtained. References to time in my previous post were metaphorical. I wasn't talking about FRP. Also, I intended to allude to a model of computation based on a simulation of sound waves propagating through space, not just some abstract signal processing. I don't really follow the modularity issue you're raising, but claim that whatever solution you're advancing can be encoded perfectly well with monads :).

I consider any argument

I consider any argument where we express model A in model B to prove we don't need model A to be fallacious. I claim that whatever monadic encoding you're advancing is subject to this fallacy :).

The 'communication' element is that some system needs to be 'moving the calculation through states'. States never move by themselves. Calculation and computation are not synonymous. Calculation is simply one outcome of computation, as is communication.

I'm not arguing that monads

I'm not arguing that monads are the end of language research. This subthread is in response to Carl Hewitt basically claiming "monads can't do that".

Calculation is simply one outcome of computation, as is communication.

As is my speaker beeping, but I don't consider that computation either.

Category Mistake

As is my speaker beeping, but I don't consider that computation either.

An outcome of '10' isn't computation, either.

The question to ask isn't whether a beeping speaker is computation, rather whether it is an intermediate state in a computational process.


I have read your link on monads and still do not understand the monad mechanism.

It's not unusual for it to take some time to understand monads, particularly with a mostly imperative background. If the overused term "paradigm shift" is applicable anywhere, it's here.

That's why there's such a plethora of monad tutorials on the web, to the point where there are now anti-monad-tutorial tutorials.

I mention the latter link not just for fun, but also because it points out "the critical role that struggling through fundamental details plays in the building of intuition," which the author suggests is why the multitude of tutorials fail to magically transfer an understanding of monads into the readers' brains.

That's one reason I linked to sigfpe's "You could have invented monads" - it gives nine exercises, and working through those should improve one's understanding.

If I understand correctly, Haskell cannot do that because it is a side effect.

I take it Matt may have already convinced you otherwise, but in case not, since monads allow side effects to be modeled in pure code, Haskell can do this. Which is good, because it also means we can create tractable mathematical semantic models of programs and programming languages, which is pretty central to the whole business of PL theory.

In my example, I might have made localX change from 0 to 1 each time myContextSensitiveFunction was called.

I might have misunderstood your example slightly, then. You want localX to be the ambient value that can be varied externally? I'm going to assume the answer is yes, since that makes implementing an example in monad-challenged Lisp slightly simpler (just a single state value instead of a composite value), and it makes the goal of the example a bit more obvious (vary localX and watch how the function's behavior changes.)

Is it possible for you to use Lisp, which allows side effects, for your example ...

I've posted a complete working example in Common Lisp (tested with SBCL). The comments should help explain what's going on. The general monadic infrastructure is adapted from Monads in Scheme, which gives a much more detailed treatment of what's going on.

In the example, the local-x value is obtained from ambient state, and used to decide whether to call get-expression-value or get-address. The latter functions have been implemented as stubs. They could, of course, also be implemented to obtain values from the ambient state, but that would complicate the example.
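For comparison, here is a minimal Haskell sketch of the same shape, using the lazy State monad from the transformers library. The names getExpressionValue and getAddress are hypothetical stand-ins for the Lisp stubs, and localX is simply the ambient state value:

```haskell
import Control.Monad.Trans.State  -- lazy State monad from transformers

-- hypothetical stubs, mirroring get-expression-value / get-address
getExpressionValue :: String -> String
getExpressionValue e = "value of " ++ e

getAddress :: String -> String
getAddress e = "address of " ++ e

-- localX is ambient state threaded by the State monad; varying it
-- externally changes which branch the function takes
contextSensitive :: String -> State Int String
contextSensitive expr = do
  localX <- get
  return (if localX == 0 then getExpressionValue expr
                         else getAddress expr)
```

Seeding the state with 0 selects the value branch; seeding it with 1 selects the address branch, i.e. varying localX externally changes the function's behavior.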

[Edit: Please don't mistake this example for a good way to learn about monads. Monads started out as mathematical objects, and are most precisely described by a mathematical definition. The specific implementation of a state monad in a language like Lisp may be somewhat instructive, but certainly will not help understand what monads are about in general.]

Moreover, Haskell functions cannot read data from the outside world to set the value of localX.

I/O is a slightly different subject, but again the answer is not as simple as "can't do that". A fundamental point here is that languages like Haskell implement an abstract model of computation, which must then be mapped onto some underlying computing device in order to execute. (The same can be said of mainstream languages, except that their "abstract model" is unduly influenced and constrained by the nature of extant computing devices.)

In Haskell, the IO monad is used to integrate I/O in such a way that computations still obey some algebraic laws, the monad laws.
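To illustrate (with invented function names, not anything from the Haskell libraries): the left identity law, return a >>= f ≡ f a, guarantees that these two IO computations are indistinguishable:

```haskell
import Data.IORef

-- add n to a counter and return the new total, written directly...
bumpDirect :: IORef Int -> Int -> IO Int
bumpDirect ref n = do
  modifyIORef ref (+ n)
  readIORef ref

-- ...and written via return/>>=; by left identity the two must agree
bumpViaReturn :: IORef Int -> Int -> IO Int
bumpViaReturn ref n =
  return n >>= \m -> do
    modifyIORef ref (+ m)
    readIORef ref
```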


I shall examine it.

Monads don't work for concurrency

Monads don't work for concurrency because they force sequential execution.

See Actor Model of Computation

Re: Monads don't work for concurrency

Monads [...] force sequential execution

Oh really?

Monads don't work for concurrency [...]

That is a pretty absurd assertion in the context of the current GHC implementation. For example, there is the STM monad, Control.Concurrent, MVars, atomicModifyIORef, etc etc etc.
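As a sketch (not production code) of how directly STM addresses the shared-account example from this thread: check makes a withdrawal block and retry until some other thread has deposited enough:

```haskell
import Control.Concurrent
import Control.Concurrent.STM

type Account = TVar Int

-- retries (blocks) until the balance covers the amount
withdraw :: Account -> Int -> STM ()
withdraw acct amt = do
  bal <- readTVar acct
  check (bal >= amt)
  writeTVar acct (bal - amt)

deposit :: Account -> Int -> STM ()
deposit acct amt = modifyTVar' acct (+ amt)
```

Two independent threads can call these against the same account; the transactions compose atomically regardless of arrival order.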

Also, concurrency has been dealt with in monads that can be implemented in the pure lambda calculus numerous times; consider A Poor Man's Concurrency Monad, or my own Iteratees and a (very) Minimal Model of Shared-State Concurrency.

This isn't to denigrate the importance of the Actor Model, which has had many wildly different interpretations over the years. Indeed, you can implement many different models of concurrency using monads.

"Work for"?

Monads tend to provide sequential semantics, even when evaluated lazily or in parallel. There may be some confusion regarding what it means to 'execute' a monad.

Noting that threads (forkIO, forkOS) support concurrency serves as a rather questionable argument regarding whether monads are 'working for concurrency', since it is unclear that any particular monad (other than the hand-wavy IO) is providing concurrency. Perhaps a better argument would be unifying threads and event loops in a monad. I'm currently implementing temporal semantics inside a monad... e.g. multiple events can execute in an 'instant', and updates to variables in the monad only become visible in future 'instants'. [I see you've since added comments to this effect.]

But what does "work for" mean? Certainly, more than 'can be modeled atop', lest we have a Turing Tarpit argument. There should probably be some argument for expressiveness, effectiveness, efficiency, elegance, et cetera.


Well, Hewitt's argument seems silly anyway. There is no decision to be made between actors and monads on the basis of concurrency. Rather, we decide between threads + shared variables or event loops or temporal semantics and actors.

Monads force sequencing

A more precise way to put it is the following:

Because monads force sequencing by passing around state, they limit concurrency.

That's only true if you use

That's only true if you use the state monad! There are certain monads which are *all about* concurrency. Launchbury and Elliott's paper on Concurrent Orchestration in Haskell, for example, implements the Orc DSL as a monad.

Asynchronous workflows in F# are also, in essence, structured monadically.

Yes, monads enforce sequencing by making dependencies explicit. But there's no reason that this sequencing is necessarily linear.

Monads force sequencing because you have to get the state back

In the lambda calculus, monads force sequencing because you have to get the state back that results from sending the message that performs an operation. For example, the state that results from sending a withdraw message to a shared account must be sent back so that it can be sent to the next place that desires to interact with the account.

I don't think this is true.

I don't think this is true. The CPS monad proves that monads can send the program continuation along with the message, and so need not return at all.


I think the critical point, here, was that the account is shared, and Mr. Hewitt wishes to access and manipulate it from different concurrent behaviors. A withdrawal from one concurrent behavior must be observable in another, and vice versa.

It is unclear to me how CPS (in the flavor of Control.Monad.Cont) solves that problem. The 'state' for the account will still be embedded within and synchronized by a hosting monad.

But I find Hewitt's argument questionable for other reasons. Implicitly, he is claiming that because some statements need to be sequenced by shared state dependencies, they all do. But if we control dependencies between statements, then some statements (such as operations on different accounts) may freely execute 'simultaneously' and with concurrency semantics.

I would point at monadic fixpoint as a rather interesting illustration. We could, for example, define a monad that operates on every element of its own output. Different statements in that monad could be understood as executing simultaneously. In Haskell, MonadFix is defined for State, StateT, IO, and others:

mfix :: (MonadFix m) => (a -> m a) -> m a

But we'd need to be very careful about how we express the monad, because it will easily diverge if we are not careful.
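A tiny illustration, using base's MonadFix instance for Maybe rather than State: the function receives its own eventual output, and only laziness keeps the knot from diverging:

```haskell
import Control.Monad.Fix (mfix)

-- knot-tying: xs is the function's own output, shifted by one
nats :: Maybe [Int]
nats = mfix $ \xs -> Just (0 : map (+ 1) xs)
```

Here fmap (take 4) nats yields Just [0,1,2,3]; any variant that inspects xs before producing the cons cell would diverge, which is exactly the care being described.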

In general, we can have concurrency semantics within a monad if we control the dependencies between statements in the monad. That control might take a lot of self-discipline, extra framework-like structure, and typing, so monads seem far from ideal for concurrency, but I don't 'concur' with Hewitt's claims that monads force sequencing of statements.

I asked something similar before....

As Hewitt clarified later, users of a shared account are not controlled by each other. They independently make deposits and withdrawals. I am unclear how this point was addressed in this thread so far. Maybe a concrete example will help?

You have to model (or interact with) 3 or more independent sequential processes (acct holder A, acct holder B, acct manager C, ...), each of which may proceed at a different rate and yet the system as a whole must work. What's more, the immutable history of events w.r.t. this acct, as seen by each of A, B and C can be *different*! A may see her request precede B's request, B may see his request precede A's and so on.

I know how to do this in stateful languages and prove it informally (at least for simple cases like this). As an example, for each acct holder there will be a proxy process that accepts the acct holder's request, gains exclusive access to the accts involved and performs the operation. This must work *regardless* of how fast or slow requests arrive and in any order, so long as the balance (including virtual balance due to overdraft limits) can handle it. I can then exercise it with separate processes modeling acct holders and generating random deposit/withdrawal requests. Or I can prove that every transaction leaves the acct in a consistent state and that all requests eventually get processed (i.e. no deadlocks).

How would one go about doing this in a pure FPL? If you leave the hard concurrency parts to C coded runtime or h/w, that doesn't count!

FP view of time

I'll direct you to a dedicated topic on the subject: A functional programming view of time.

Keep in mind that there are two primary issues here:

  • modularity: the independent expression and maintenance of account holder and account management behaviors
  • concurrency: behaviors that share services or resources - the account - may take place in overlapping time, may need to be coordinated for consistency

Neither of these properties is especially difficult to achieve by itself. It's when you need both, at the same time, that FP starts giving you trouble. You may end up greenspunning a whole concurrency model to solve both, at which point you've destroyed any right to a claim that your program is FP.

The reason I object to Hewitt's claims is that his conclusions only mention the concurrency aspect while his examples additionally require modularity. His argument, by analogy, is: a man and an elephant is quite heavy, therefore men are quite heavy. (Monads and a particular form of independence force sequencing, therefore monads force sequencing.)

Modularity and concurrency plus a lot more

Yes, the Actor Model involves modularity and concurrency plus a lot more. See the following:
Actor Model of Computation

I know what Actors model

I know what Actors model involves. But I am baffled as to why you consider that relevant to your comments about monads and concurrency.

Independent shared resources cannot be implemented by monads

The Actor Model says that concurrent, independent use of shared resources cannot be implemented by monads in the lambda calculus.

Models of computation don't implement resources

In the monadic approach, the role of lambda calculus in a distributed/concurrent system is to specify computational responses of elements. Reasoning about how the whole system will behave requires knowledge of how those computations will be invoked and how any signals they produce will be interpreted/communicated.

The same limitations apply to the actor model if you limit its semantics to modeling computation. As soon as you start prescribing how elements of your actor model correspond to real communicating actors in a distributed system, you're no longer merely specifying a model of computation.

I'll direct you to a

I'll direct you to a dedicated topic on the subject: A functional programming view of time.

I suspect he's seen that.


Indeed. I should have peeked at the OP author. But it remains a better place to ask or expand upon his questions.

Not modularity

I can come up with an example where N threads run the exact same code and interact with each other in a non-deterministic way. This non-determinism has to do with the fact that each thread progresses at a different speed (due to time slicing, running on multiple cores, etc.), or with atomic actions at the hardware level (only one thread can grab a h/w resource). In this example there is no modularity, just N instances of the same code. In other words, this non-determinism is a problem of communicating sequential processes. Modularity has nothing to do with it.

Hoare's CSP deals with this directly. Not sure of Hewitt's Actor model (haven't looked at it in years) but I believe it does too. Or Erlang or Go for that matter. Perhaps it is a failure of imagination on my part but I just don't see how it can be implemented in Haskell.

I am well aware of the other thread! But I felt that we had a failure to communicate deterministically :-) Which seems to be happening here as well. I presented a specific problem in the hope that someone would sketch out how to do it in Haskell, but that has not happened. More and more I feel FPLs simply cannot deal with this nondeterminism by their very nature, but I am happy to be proven wrong.

Indeterminism is unnecessary for Concurrency

You are assuming, incorrectly, that concurrency requires (or implies) non-determinism (or indeterminism).

If you do not concern yourself with modularity, then you are free to group concurrent behaviors into a single expression of the form: concurrently [B1, B2, B3] where Bx is a concurrent behavior. The model for concurrency might be multi-threaded with yield on IO (unified threads and events), single-assignment variables with promise-pipelining, temporal semantics, incremental step-based computations, direct event-loop concurrency, or something else. The implementation of the model would be provided by the concurrently function, and would be deterministic.
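To make that concrete, here is a deliberately tiny sketch of such a concurrently function under the 'incremental step-based' interpretation; the Behavior type is invented for the example. The scheduler is round-robin, so the interleaving is completely deterministic:

```haskell
-- a behavior either finishes or emits an output and continues
data Behavior = Done | Step String Behavior

-- deterministic round-robin scheduler: concurrency as a semantic
-- property, with a completely reproducible interleaving
concurrently :: [Behavior] -> [String]
concurrently []                = []
concurrently (Done      : bs) = concurrently bs
concurrently (Step o b' : bs) = o : concurrently (bs ++ [b'])
```

For example, interleaving a two-step behavior with a one-step behavior always produces the same trace, run after run.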

Even without indeterminism, this concurrency can be leveraged to achieve a high level of parallelism - at least, assuming the points of synchronization between behaviors are relatively rare (so we can achieve a larger compute:synch ratio). Some models of deterministic, semantic concurrency will parallelize more easily than others. (Parallelism is an implementation property. Concurrency is a semantic property.)

If we were to assume modularity for concurrent behaviors, we can understand our languages as expressing a soup of several 'modules' that independently introduce concurrent behaviors to the application. Inherently, these behaviors would interact via shared references (to resources, services, processes, shared state, et cetera). Those 'shared references' imply a need for identity. But FP doesn't directly support identity. FP can emulate identity, but doing so requires a common top-level framework - something like a concurrently function or monad. But, for modularity, we assume that no such function exists. So achieving modularity+concurrency+(pure)FP results in a contradiction, at least for the kind of modularity I am discussing (which might be more strictly described as 'open' modularity - pluggable, extensible, federated).

Indeterminism is another issue entirely. Concurrency doesn't require indeterminism. A few popular models of concurrent computation happen to allow indeterminism, but that does not make indeterminism a requirement (indeed, by nature, any indeterministic model of computation allows for deterministic implementation). All concurrency requires is that ongoing behavior expressions are simultaneous, that they semantically overlap in time.

In other words, indeterminism has nothing to do with it. And modularity (that is, modular expression of concurrent behaviors) remains a significant concern because concurrent behaviors must interact through shared references, which FP does not directly support.

I feel FPLs simply cannot deal with this nondeterminism by their very nature

Well, sure. A pure function is, by nature, a deterministic mapping from input to output. If we needed to model non-determinism, we could use functions from sets to sets where a set represents multiple answers. But, since concurrency doesn't require non-determinism or indeterminism, FP's inability to directly deal with it is a non-issue for the questions you have asked.

Hoare's CSP deals with this directly. Not sure of Hewitt's Actor model (haven't looked at it in years) but I believe it does too. Or Erlang or Go for that matter. Perhaps it is a failure of imagination on my part but I just don't see how it can be implemented in Haskell.

Haskell is not an especially 'pure' language at its top-level; by definition, the 'main' function is always of type IO (), and there are various idioms and frameworks utilizing unsafePerformIO for global state, single-assignment variables, future values, priority scheduling of incremental computations, et cetera. The IO monad provides cheap threads and some useful synchronization primitives (TVar, MVar, atomicModifyIORef). Likely, in your Haskell code, you simply won't bother with a 'pure' model for concurrency.
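A hedged sketch of that pragmatic route: an MVar both holds an account balance and serializes access to it, so independent threads can transact freely, and while the interleaving is indeterministic, the final balance is not:

```haskell
import Control.Concurrent
import Control.Monad (forM_, replicateM_)

type Account = MVar Int

-- modifyMVar_ takes the MVar (gaining exclusive access), applies the
-- update, and puts the result back; concurrent callers are serialized
transact :: Account -> Int -> IO ()
transact acct delta = modifyMVar_ acct (return . (+ delta))
```

Ten threads each making a hundred unit deposits will interleave in some arbitrary order, yet the account always ends at the same total.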

If you really want pure FP model of concurrency, you'd use the top-level function - perhaps a monad or arrow, perhaps temporal semantics of FRP, perhaps unified threads and events - to express concurrent semantics. Concurrent behaviors could simply be provided in a list, similar to the concurrently function mentioned above... but more likely in a dedicated 'concurrency' monad, where statements incrementally build a concurrent program that one can later 'run'.

Either way, Haskell doesn't really provide 'modularity' for concurrent behaviors, not in the same sense that a multi-agent, CSP, FBP, concurrent constraint, temporal logic, dataflow, reactive, actors, blackboard architecture, tuple-space, pub/sub, SOA, or distributed language would - i.e. there is no independent expression of concurrent behaviors within a Haskell language application.

Another way

Another way to look at your point, which I think I agree with, is that there's a basic issue of levels here. Complaining about functional programming providing a poor way of modeling concurrency because it is deterministic, while concurrency may not be, is sort of like complaining about diagrams providing a poor way of modeling control flow, because diagrams are on paper, which is flat, and control flow includes arbitrarily many branches.

FP doesn't "give" you concurrency, but it provides a set of tools that can capture it. Similarly, a pen and parchment don't "give" you Aristophanes' *Frogs*, but they sure provide a set of tools with which to write it down.

cf. IOSpec, which often feels like it was written just to simplify exactly these sorts of arguments.

The real question

Any Type-0 language can 'capture' another. So I tend to think that's a rather trivial and uninteresting point. The difficult and interesting questions regard expressiveness, local reasoning, performance, responsiveness, modularity, composition, robustness, resilience, and various other forms of 'effectiveness'.

Complaining about concurrency being deterministic only makes sense in terms of these other 'effectiveness' goals. For example, fully deterministic concurrency will generally not have the throughput and responsiveness of indeterministic concurrency: the problem is that outputs from a deterministic concurrency model will be deterministically ordered relative to inputs in logical time even if that means progress and response will wait upon inordinately expensive or slow computations in other components. A common solution, here, is to carefully introduce indeterminism to the FP language - e.g. through an oracle whose infinite sequence of inputs will miraculously support decisions based upon actual computation times.

But we must understand the costs of an oracle, also, in terms of local reasoning, composition, et al.

It is difficult to discover the right questions, and difficult to communicate them effectively even after we find them. If Bakul Shah expressed interest in indeterminism as a means to a specific end (performance, responsiveness, etc.) then I'd perhaps seek alternative means to the same end (change expensive computations or IO into incremental ones, for example), but I'd be able to accept the basic position. But just assuming or asserting that indeterminism is necessary for concurrency gets me riled up and ready for combat. ;-)

Similarly, I do not object to Hewitt having many effectiveness requirements (such as scalability and modularity), but he certainly should make them explicit - e.g. "Monads don't work for concurrency where 'work for' = supports open modularity and scalability and ...". A problem is that 'works for' has no widely accepted meaning with respect to non-functional requirements.

Not a complaint

Not a complaint though that may be how my writing came across. I am trying to understand pros/cons/limits of using a language for the purpose of modeling real behaviours. I am also interested in performance, responsiveness, code clarity, cost etc. but all that comes later!

One way I was thinking about

One way I was thinking about this is that we have sequencing so everything doesn't happen at once. In pure FP, however, and in a purely denotational FRP, everything, effectively, does.

The actor model gives you concurrency out of the box, but no more and no less of a model of sequencing than you get in the world at large -- it inherits its semantics, by default, from implementations, and controls sequencing, to the extent it does, by encapsulation/modularity.

Monads let you leverage pure FP into very explicit representation of and control over sequencing, but don't give you concurrency/modularity out of the box.

What some of us, at least, want are ways to capture all these things (I really appreciate the distinction you've made with regards to modularity) such that we can leverage from existing pure FP. So monads on their own don't get you there, and maybe they're part of the picture, but my gut reaction is that scrapping pure FP as a starting point (and by implication, to me, the nice properties we can get from a semantics based on it) seems like the wrong way to go.

Even though programs can be nondeterministic in the states they go through, I'd like, in any program I can envision, at least some determinism at certain points -- i.e. the state space might explode while 300 threads are picking away at a work queue, but I'd like to be able to make some statements about what they'll eventually accomplish, regardless of any number of intermediate instruction orderings.

As I've said elsewhere, I think that the best way to give semantics to concurrency, and tame rather than erase indeterminacy, is through algebras which have many exploitable symmetries.

Nondeterminism is *inherent*

Nondeterminism is *inherent* in the real world. Without it (in a language) you may have a simpler model that is easier to reason about, and that is fine, but it would be missing an important set of applications or behaviours. We've already talked about the shared account as an example. Take Dijkstra's "Dining philosophers" problem as another example (one can indeed express it as concurrently(fork[i], philosopher[i], for i in 0..N-1), basically arrays of N philosopher & N fork processes). But by taking nondeterminism out you will be constraining the solution. In other words, the concurrency you talk about is a subset. And that is fine....

In circuit design one can use RTL (Register Transfer Level) as an abstraction level to design synchronous digital circuits. RTL is basically registers connected via stateless logic, driven by a global clock. Simpler to design, but it suffers from a few problems. FPLs sort of feel like that stateless logic! Asynchronous circuits don't have a global clock and have significant advantages over synchronous logic, but they are harder to design. They can be modeled by process calculi & the Actor model (so we are back to Hewitt!). Perhaps I should just be happy with this analogy and not try to think harder in trying to find a unified model :-)

These discussions *are* helping me in thinking about all this a bit more clearly.... So thank you!

Separation of Concerns

There exists a big advantage in keeping non-determinism or indeterminism separate from the concurrency model: you gain freedom to introduce computation primitives for expressing and managing determinism orthogonally to the concurrency model, without risk of competition between the language features.

There are many means to express non-determinism. You could use an oracle. You could use a probability monad. You could use a backtracking model for branching decisions. Naturally, some techniques will better support modularity and reasoning about performance or responsiveness than would others.

An indeterministic model for concurrency ties the two together. That isn't necessarily a bad choice, since you can always tame indeterminism (e.g. by providing little islands of determinism) but it risks introducing some undesirable 'semantic noise' into your models of concurrent real-world behavior. For example, the Actor model is terrible for describing concurrent behavior in a chemistry or physics simulation - the concurrency provided is simply of the wrong flavor, and tends to interfere with the modeling.

Lately I've been favoring use of two orthogonal concurrency models: temporal values for the domain/decision modeling aspects, and multi-agent concurrency tied to the distribution model. The deterministic concurrency here allows me to achieve eventual consistency for decision models and information distribution even when replicating the computations - that makes it robust for partitioning and more scalable. The multi-agent concurrency needs to be indeterministic because such events as disruption or partitioning can occur at any time. Agents being concurrent allows separated partitions to continue - there is no control-flow between agents.

Most languages don't have distribution models - models that express computation as occurring at different locations. That might be something else to add to your list of concerns for modeling computation in the 'real world', right alongside modularity, concurrency, and indeterminism. You might peek at Kell calculus as offering one of the more formal distribution models, E language as a less formal one. Languages without distribution models rely instead upon the OS's app-level distribution model, which is usually awful, insecure, and inconsistent.

Since you are interested in modeling properties *inherent* in the 'real-world', I suggest you include distribution in your study right alongside concurrency and indeterminism. E.g. your 'account behaviors' may be occurring at different ATMs.

Since you are interested in

Since you are interested in modeling properties *inherent* in the 'real-world', I suggest you include distribution in your study right alongside concurrency and indeterminism.

Indeed! I almost mentioned delay & distribution but refrained in order to keep the post short and to avoid getting into more things, given limited time. Even on a single processor, message buffering and "distribution" due to time slicing imply events might be seen in a different order by different threads. Basically each thread has its own frame of reference (as to when things "happen"). Perhaps one can write each thread's code in an FPL and reason about its individual behaviour but I think we need more tools to reason about the entire composition of a set of threads. I am not saying anything new.

Software Transactional

Software Transactional Memory -- this is actually a canonical use case for it. And one can write STM purely, parametric over some set of possible scheduler choices. This gives the semantics just fine. It just so happens that to execute the semantics with actual concurrency (an operational issue) you need a concurrent runtime.

Transactional memory doesn't scale and has lots of loose ends

Transactional memory doesn't scale and it has lots of loose ends.

Ok. I'm through.

Ok. I'm through trying to chase a moving target.

Re: Monads force sequencing

Because monads force sequencing by passing around state, [...]

Your argument asserts something is true of state monads, and then generalizes to all monads. But, as S. Clover pointed out, there are lots of monads that are not state monads!

Furthermore, to be more explicit about the point I tried to make above, my post entitled "Fun with the Lazy State Monad" demonstrates two non-sequential uses of a state monad. Not even the state monad must be sequential in the ways you think it should be!

[...] monads limit concurrency

Maybe, though I don't have any reason to believe this is true. You should read LtU's policy page, especially point number two: provide context and substantiate claims. Also, you might enjoy reading the Spirit of LtU.

Is Lazy State a counterexample?

It seems to me that a reasonable way to look at this is that if your language supports laziness, then you can do non-sequential things in the sequential monad framework. I think it's reasonable to consider the monad concept itself as "about sequencing," but that doesn't mean that the things being sequenced can't be "about concurrency" or anything else.

Lazy State is a counterexample

Well, "Fun with the Lazy State Monad" doesn't have anything to do with concurrency. I brought it up only to demonstrate that monadic interfaces aren't "about" sequencing any more than they are "about" the linear usage of data structures. Monadic interfaces can enforce linear usage, but they don't have to either. Monadic interfaces can force sequential behavior, but if you execute Jones and Gibbons' breadth-first numbering algorithm by hand, you'll see it is not really sequential in any meaningful sense.

The execution of Jones and Gibbons' algorithm depends on how the consumer of the result wishes to proceed. Actually "stepping" through a few possible executions using pencil and paper is a real chore though; the pointer manipulations get to be quite complicated, and I haven't found an efficient way to do this.

(A chalkboard with a few different colors of chalk might be better than pencil and paper, as it would be easier to write an "update" to what you have, and then carefully erase as you go along.)

My monadic abstraction of Jones and Gibbons' algorithm is faithful; all you have to do is inline all the definitions and simplify a bit to arrive back at the starting point. So yes, I do believe that Lazy State is a counterexample.

As for Albert Y.C. Lai's "head recursion", I can work out the result on paper using algebraic reasoning, but it's clear to me that I don't have an operational understanding of the lazy state monad. And, everybody I've met who thinks they have one has turned out to be wrong so far.

(I'm really quite amazed by Jones and Gibbons' algorithm, by the way. It seems like a compelling "win" for the Squiggolist/Algebra of Programming point of view, as I'm not sure that it could have been arrived at any other way.)

Jones and Gibbons' algorithm

I'm going to think about the monads/co-recursion and get back to you later, but I wanted to comment first on this:

(I'm really quite amazed by Jones and Gibbons' algorithm, by the way. It seems like a compelling "win" for the Squiggolist/Algebra of Programming point of view, as I'm not sure that it could have been arrived at any other way.)

I haven't looked at their paper, but it seems to me the algorithm you attribute to them is equivalent to the one I wrote down based on very simple reasoning. Let ns be the indices (labeled breadth first) of the leftmost nodes at each depth level of a tree. It's straightforward to write a function that converts those numbers to the rightmost indices plus one:

ls2rs :: [Nat] -> Tree a -> [Nat]
ls2rs (n:ns) (Node x) = (n+1):ns
ls2rs (n:ns) (Branch x a b) = (n+1):(ls2rs (ls2rs ns a) b) 

Noting that one plus the rightmost indices gives the leftmost indices one level down, you can "tie the knot" to solve for the leftmost indices:

ls :: Tree a -> [Nat]
ls t = ns where
    ns = 0: (ls2rs ns t) 

Then it's straightforward to write down the renumbering:

renum :: [Nat] -> Tree a -> Tree Nat
renum (n:ns) (Node x) = Node n
renum (n:ns) (Branch x a b) = Branch n (renum ns a) (renum (ls2rs ns a) b) 

lazyRenum :: Tree a -> Tree Nat
lazyRenum t = renum (ls t) t

If you combine ls2rs and renum into a pair, it looks like you should get the same thing you have on your page. (None of this is tested)
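Filling in a plausible Tree type (the shape is a guess, since it isn't defined above, though it matches the pattern matches), the sketch does appear to work:

```haskell
-- assumed shape: labeled leaves and labeled binary branches
data Tree a = Node a | Branch a (Tree a) (Tree a)
  deriving (Eq, Show)

type Nat = Int

-- leftmost indices per level -> rightmost indices plus one
ls2rs :: [Nat] -> Tree a -> [Nat]
ls2rs (n:ns) (Node _)       = (n + 1) : ns
ls2rs (n:ns) (Branch _ a b) = (n + 1) : ls2rs (ls2rs ns a) b

-- tie the knot: leftmost index one level down is rightmost plus one
ls :: Tree a -> [Nat]
ls t = ns where ns = 0 : ls2rs ns t

renum :: [Nat] -> Tree a -> Tree Nat
renum (n:_)  (Node _)       = Node n
renum (n:ns) (Branch _ a b) = Branch n (renum ns a) (renum (ls2rs ns a) b)

lazyRenum :: Tree a -> Tree Nat
lazyRenum t = renum (ls t) t
```

For example, lazyRenum (Branch 'a' (Branch 'b' (Node 'd') (Node 'e')) (Node 'c')) evaluates to Branch 0 (Branch 1 (Node 3) (Node 4)) (Node 2), i.e. the breadth-first numbering.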

Thanks for the comment

Thanks for the comment, I'll meditate on it.

The code I attribute to them appeared in Chris Okasaki's "Breadth-First Numbering: Lessons from a Small Exercise in Algorithm Design", which is where I first learned of it. Chris Okasaki cites Geraint Jones and Jeremy Gibbons' "Linear-time breadth-first tree algorithms: An exercise in the arithmetic of folds and zips". The original handles general relabeling over rose trees, Okasaki simplifies that to sequential renumbering over binary trees.

One clarifying remark I

One clarifying remark I should make about the code above is that ls2rs works on an arbitrary subtree. For example, renum uses it to take the left indices of its left child and move to the left indices of its right child. On the other hand, ls computes the left indices of a standalone tree. Also, I can appreciate that the original paper was more general/powerful than this. I'll stick those papers into my queue.

One concession I should make: I don't really mean that monads are about sequencing (in correspondence with the natural numbers) - I really mean they're about order. In a single thread of the state monad, you can assign a total order to every 'get' and 'put'. Each 'get' observes the state of the immediately preceding 'put' under that ordering - if there is no bound on the preceding 'puts', then 'get' diverges. Does that sound more agreeable?
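A tiny concrete reading of that claim (my own illustrative example, not anything from the thread): in Haskell's state monad, each 'get' observes the nearest preceding 'put' in that total order:

```haskell
import Control.Monad.State (State, evalState, get, put)

-- Each 'get' sees the state left by the nearest preceding 'put'.
prog :: State Int (Int, Int)
prog = do
  put 1
  a <- get          -- observes 'put 1'
  put 2
  b <- get          -- observes 'put 2'
  return (a, b)

-- evalState prog 0 == (1, 2); the initial state 0 is never observed,
-- because the first 'get' already has a preceding 'put'.
```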

What is "sequential"?

Right; I think Okasaki's simplification is a good topic of discussion. It still contains the "meat" of the algorithm, which I'm not sure I quite understand yet. I already knew that the list at any point is the labels for the left spine of the subtree you are at, so your comment seems like a rather plausible starting point for a derivation/explanation.

Well, in my last two comments I've used the word "sequential" in two different ways with rather different meanings, unfortunately. The second use, "sequential renumbering", meant consecutive natural numbers, whereas the first "sequential" was meant as the primary topic of conversation, as I understand Carl Hewitt's intended meaning.

I interpret this to mean sequential in the same sense that assembly language is sequential, or C is sequential (even though parts of C's order of evaluation are undefined, and most modern compilers will take two sequence points and reorder (parts of) them when possible to produce better code).

But clearly the lazy state monad does not satisfy this concept of "sequential"; consider Albert Lai's head-recursion example. There is no "first" get or put action; the lazy state monad starts at the end and works backwards as necessary to get the answer. Which probably explains why the lazy state monad is so inordinately slow when people use it as if it were an imperative mini-language.
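For what it's worth, here is a small sketch of that laziness (my own example, not Albert Lai's, which I don't have at hand): under Control.Monad.State.Lazy a computation can recurse with no base case and still be productive, because demand on the answer, not action order, drives evaluation; the same code under Control.Monad.State.Strict loops forever:

```haskell
import Control.Monad.State.Lazy (State, evalState, get)

-- A stateful computation with no base case: it would never "finish"
-- if its actions truly ran first-to-last. The lazy state monad only
-- threads state as far as the demanded prefix of the answer requires.
echoes :: State Int [Int]
echoes = do
  x  <- get
  xs <- echoes
  return (x : xs)

-- take 3 (evalState echoes 7) == [7,7,7]
-- Swapping the import to Control.Monad.State.Strict makes the same
-- expression diverge: the strict bind forces the whole recursion.
```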

Computational Representation Theorem

The limited parallelism that can be implemented in the lambda calculus using monads can be characterized using the Computational Representation Theorem.

Users of a shared account not controlled by each other

Users of a shared account are not controlled by each other. They independently make deposits and withdrawals.

Could you please work on

Could you please work on putting your replies under the right sub-thread?

Sorry about that

Unfortunately, I used the default, which I didn't immediately notice was incorrect. There doesn't seem to be any way to fix it :-(