## how to design PL support for effects emerging from parallel non-determinism?

Let's make an on-topic place to discuss programming language relevant issues in concurrency and non-determinism. Software development in general isn't really on topic. It really needs to be about an aspect of PL design or implementation, or else LtU isn't suitable.

It seems folks designing a PL would like a language to be suitable for performance work, in a world where scaling better may imply doing more things in parallel, to get more hardware usage per unit of time in lieu of processors actually running faster. The plan is something like: well if cycles have stalled we can just use more cores. In general this seems to lead to unpredictability, which perhaps might be addressed by PL tech, somehow or other.

Something is missing in how developers specify what they think code will do, in the large. It's a common belief it will be obvious. ("It should be obvious what the code does from reading it, so comments are superfluous.") This is slightly like thinking other people will recognize a tune you have in mind when you drum out its rhythm with your fingers, when in fact they will only hear you drumming your fingers. A pattern you expect to be obvious won't be.

Maybe more than one layer of grammar would help. (Just an idea.) In addition to local language syntax, you could specify larger patterns of what will happen, in terms of expected patterns in future inter-agent transactions, whether they be objects or something else. There seems lack of current art in capturing developer intent about shape of event sequences. Sometimes I suspect a few people intend to fix this with determinism in language behavior, which doesn't seem a good fit for fallible distributed internet components.

I have nothing to post unless a remark catches my eye. If I have something interesting to say, I will. (I don't blog any more, and I'm all but unreachable except here, unless you call my phone or email my work address. My online presence is near zero. I don't even check LinkedIn.)

What specifically would you do to a programming language to address the emergent elephant that results from code specifying tail, legs, trunk, etc?

## Comment viewing options

### I'm in favor

of making intent (and algorithms, and configurations and details and plans) all explicit in HUMAN language.

In my opinion, no one should have to read code. You should make so much explicit in English that you can change and expand code without reading it.

### Natural-language programming

"Build a system that allows programs to be written in English, and you'll discover programmers can't write in English." Though clearly there are exceptions, witness the existence of LtU. One might argue it'd be better to turn programming over to people who can write English; but then, you'd still need people who can think, which is depressingly rare in its own right.

### what's a better path to lack of ambiguity in algorithm planning?

That's close to on-topic, touching upon intent and ambiguity, rather than non-determinism per se. The essence of my opening was that adding parallel features to a PL requires tolerance and/or management of ambiguity, to get best performance (lower latency and output value per power consumed by hardware). You can get determinism by more sync and discarding more results from hardware, basically wasting resources to get slight improvement. But this doesn't seem competitive.

(Funny, this reminds me of an awful interview I had a dozen years ago. An engineer — senior to me: a founder/guru type — proposed an idea that search latency might improve by doing N independent searches at once, on different threads, then killing losers after a first finishes. Basically he was willing to waste massive computing resources, because he couldn't think of another valid concurrent use instead. I did not pretend to be impressed.)

In another thread, raould asked about architecture less inclined to be hunch-driven. What causes organic mess requiring hunches? Ambiguity, piled deeply or in long chains. If you take mostly well-defined components, but with a little play in each one, then arrange them in very long chains, each contribution of small ambiguity at each interface has a chance to either amplify ambiguity or take a right angle turn off into the weeds. (Computers make great amplifiers.) You might try different tactics in removing ambiguity. Some favor an "iron rule of law" in mathematical formulation; this doesn't seem to address non-determimism in parallel systems, unless you plan to throw away most of your yield from power consumed. Some form of modules has had favor for decades, but that doesn't seem to gel into something with shorter chains of ambiguous connections. I vaguely favor processes as modules, but I don't have a good justification (everyone likes their own intuition). I expect something more scientific is possible when you can try to falsify advertised contracts of process modules, if you actually swap parts around with intent to provoke failure on edge conditions. You have to ask yourself: how gullible are you? A hard-nosed detective tries to destroy a story before allowing it might be true. Improvisation is great on the stage, where suspension of disbelief is a good tool. But if a work mate one cube over says "I'm an airplane" and makes propeller noises while holding arms akimbo, ask for clarification.

I notice many folks fail to see ambiguity in what they say, in human language and in code. I can't ever seem to work out what a bunch of code does before I find bugs contradicting its apparent plan. Nearly always I find a mixture of similar but not quite identical plans, almost fitting, but not quite. Always holes, parts put in backwards, done in the wrong order, something. Typically it looks like two or three plans with random track changes between, sometimes because of erroneous belief they were equivalent, or refactoring that intended to ditch an old plan that is incomplete.

If it was all done in natural language like English, I would expect something worse, barring a strong AI with an IQ of say 180 to smooth over things. It sounds nice in principle, except that I find it easier to communicate in code with middling engineers than in English. I say a lot of things that take at least a hundred words to express, but a typical listener wants a point in seven words, or ten. (One coworker guesses the rest of my point after seven words, based on what he feels he would say after a sentence starting that way; that I have ideas not already in his mind doesn't occur to him.) It works much better when I point at a sequence of code snippets and ask folks to guess what it means. Actually what works best is code interspersed with English in nice short clear Hemingway style sentences, saying how the dots connect. Without guidance it can look like a Rorschach blot.

Cache invalidation and naming things would still be hard computer science problems, even in English, right? (Yes, Phil Karlton reference.) In English, it is quite hard to attach names to local passages of points, and relate them to one another. Perhaps value might be in forcing everyone to recognize ambiguity is present from the start, that only gets slowly removed by refinement; an illusion one was already at a sufficient level of clarity might be harder to fall for, leading to better specs in parallel designs.

### Counterintuitive: manage nondeterminism by maximizing it.

I think that the big bugaboo in writing reliable software is the fact that we've been building abstractions of behavior we care about, by concatenating and combining behaviors we don't care about. That is, we've been eliminating ambiguity but we've been doing it by overspecifying behavior. And then building further abstractions, not just on the behavior we care about, but also on the specific data structures maintained by the parts of behavior we don't really care about, getting everything tied up in a Gordian knot.

It's very similar to accomplishing everything by means of side effects, and intentionally forcing side effects made for the sake of one calculation then be used in many others as well. One must conceive of side effects in this case as unnecessarily detailed behavioral and sequential and data-structure specifications, not just as unnecessary changing of stateful variables. It is tremendously important to NOT be telling the system to do things we don't actually care about.

Ultimately, we usually care only about an answer - in which case we need to absolutely maximize the ambiguity in methods of calculating that answer, so that it's in principle possible for an automated system to make program transformations that rearrange whatever we don't care about while preserving the behavior that we do care about. Such rearrangement can and should be done in different ways to avoid combinations that interfere with each other, require locks, and so on. Or to arrange calculations in ways that maximize synergies of efficiency and use of common substructures where desirable.

But sometimes, we care about things which are usually mere side effects - in security for example about making sure that every invocation of a calculation takes the same amount of time, reads or writes the same addresses in the same order, has the same instruction cache behavior, does not yield to another process during its operation, takes place in memory no other process or thread can read, and leaves no revealing information hanging around in memory when the calculation is over, regardless of its parameters.

In that case we STILL ought to be specifying the properties we actually want explicitly - constant time, identical variable access pattern, etc - and STILL ought to be maximizing the ambiguity of things we don't care about. For example, we don't really care what the variable access pattern IS, only that it's the same regardless of parameters.

Anyway, I think the way to manage effects from non-determinism, is to explicitly tell the system what aspects of the program we actually care about (which may be at very different levels of abstraction in different programs), and allow it to absolutely maximize all OTHER nondeterminism. We should make as many different algorithms for achieving it as possible, available as supplementary information. And if the system already knows (from a library, or from another program) a different implementation that achieves the specified constraints, it should treat that as a viable substitute for (or addition to) any procedures we happen to write.

At the very least, specifying which behavior we care about (probably with an overspecified reference routine but doing so while pointing out which qualities of its behavior matter to us and are part of the specification, not just part of the implementation), should allow automatic checking that that behavior is actually delivered. And during development, having different algorithms supposed to achieve the same thing, behave differently in any way we actually care about, would expose the fact that at least one of them contains a bug, or that we have failed to express correctly what aspects of behavior we care about. That in itself would be a powerful debugging facility.

### Generic Programming

This is what generic programming is all about, presenting an algorithm in the most general setting possible without loss in efficiency. Every algorithm requires certain properties in order to work, for example the GCD algorithm requires the thing you want the GCD of to be part of a Euclidean Semiring, requiring an integer is over specifying the algorithm.

Now so far this is a deterministic algorithm, but the principles of generic programming apply equally to non-deterministic algorithms. You would start by finding many parallel algorithms and trying to classify them into the types of parallelism they themselves require of the data they use.

### Re: maximizing non-determinism

While the idea sounds good in theory, in practice it doesn't lead to 'reliable software'. This is for a few reasons: (a) we're awful at precisely specifying what is required for correct behavior, (b) the behavior space for non-deterministic programs grows exponentially beyond what any mind (human or AI) can effectively grok (cf. Rice's theorem), (c) software systems are frequently much larger than a single person can read, study, or maintain so we frequently need to grasp software behavior without reading it (d) we tinker, twiddle, test, and tune systems to achieve the behavior we want.

The latter - twiddling, testing, and tuning - is generally limited to a small, finite amount of effort. It occurs in a small, finite number of contexts - e.g. a subprogram is only tested within a few programs, and even those upon just a few machines. It is problematic if tests come up differently every time we run them or if a subprogram will behave differently than we've come to expect when used in a different context or configuration.

Non-determinism can be useful, of course. Probabilistic models, search, etc. are great for a variety of interesting problems. But it is feasible to gain most benefits by modeling non-determinism within a deterministic computation, i.e. such that behavior is deterministic within a given context. The benefits are similar to imperative programming within pure functions.

Meanwhile, the ability to learn and grok the behavior of software components and reuse them in many contexts - together with continuous testing - is a robust, proven approach to 'reliable' software.

### Randomized choice

It is problematic if tests come up differently every time we run them or if a subprogram will behave differently than we've come to expect when used in a different context or configuration.

I agree about the latter part of your sentence, but is it really problematic if tests behave differently on each run? Wouldn't we fix them until we have reasonable confidence that they don't fail anymore (or at least that it is highly unlikely), and could that not result in better software?

For example, if a language does not define an evaluation order for the pairing construction (foo, bar) (which side is evaluated first), I think it is much better for an implementation to make a random choice at each evaluation (or possibly, for performance reason, a random choice for each static program location at compilation time), than pick a fixed implementation-defined order, because it forces users to be more robust to changes in implementation-defined behavior. So I'm saying that randomization can trade the second kind of failure (only when a context change happen (other implementatoin or new version) into the first kind of failures, which is good.

Of course my personal preference goes to specifying more: either choosing an order that we are sure is what users are going to expect, or finding a robust way to reject programs whose semantics depend on the order (that means, specifying that programs should be order-irrelevant, instead of having an unspecified order). This may be done statically, or through runtime instrumentation/monitoring (similar to live detection of parallelism races by clever scheduling choices).

Once you extend the design scope to situations where failure may happen, randomization becomes even more desirable. See for example the work at Netflix and Amazon on developing "Monkeys" that randomly exercise failure conditions for their distributed programs during testing and regular workloads -- shouldn't API integrate such randomized failures to help programmers improve robustness?

### Reproducable non-determinism

It's usually very annoying to know that a test has failed but not be able to reproduce that failure. So at least I think you'd ideally want to save all non-deterministic choices that can affect observable semantics.

### random testing

Determinism doesn't prevent randomized testing, e.g. modeling a scheduler or input program with a seeded PRNG and running with multiple initial seeds or fuzz testing with randomly-generated inputs. With determinism, you further preserve the ability to reproduce a failing test, keep it around for regression testing, twiddle code and see how results change on multiple random futures, etc..

Introducing a uniform random choice in a non-deterministic program, like you described, is unfortunately not representative of optimized implementations. You're not testing the behavior you'll be running.

### It's about controlling WHAT we control.

I'm arguing that a problem is that we don't have any explicit way to tell the system that some particular behavior doesn't matter, and we need one.

A lot of language specifications choose a particular order of evaluation of function arguments, for example. Even when the function being called doesn't need them to be specified. Now the system is laboring under a constraint completely irrelevant to the program. Now someone coming in later never knows that the argument evaluation order is irrelevant to the correctness of the program. The compiler is unlikely to be able to prove that the evaluation order is irrelevant to the correctness of the program. And so opportunities to optimize, opportunities to change optimization when other changes make a different choice better, and later maintainers' understanding of what the required conditions for correctness are, all suffer just because we weren't able to say "THIS DOESN'T MATTER." And in those cases, usually the language allows no way to say it at all, so if a later maintainer determines that it doesn't matter, it is flatly impossible to update the code to say so.

A lot of languages don't specify the order of evaluation. A lot of compilers consistently make a particular choice. A lot of programmers (mistakenly) come to rely on that unspecified choice. And a lot of programs break when they get moved to a different compiler. Ooops, we have code that's not robust. And it's all because we weren't required (or able) to say "THIS MATTERS."

Incidentally, I'm with Gasche in the belief that any time a language specification leaves something unspecified, a good compiler will produce a debug build that absolutely DOES crash in any way the spec allows. That way the programs get debugged properly and become robust for moving between compilers later. And if it can't be sure the program crashes consistently and immediately the first time it reaches unspecified behavior, then it ought to choose between allowed behaviors differently, every time it runs, in the hopes that the crash THAT causes will be, if not perfectly reproducible, at least frequent and immediate enough to find.

Annotations about what does and doesn't matter to program correctness, are IMO a sorely lacking point in language design.

### The problem is formalising

The problem is formalising what it means for a particular property of a program to "not matter". If it really doesn't matter at all then it does not affect equivalence under the set of other properties that do matter. This seems to be quite tricky to formalise. Taking a straight forward example: say the property that does not matter is evaluation order, and the property that does matter is preserving the evaluated result. For this to work in a robust manner the semantics of the language need to guarantee that the family of related programs (all rewrites that only change evaluation order) still forms an equivalence class for the evaluated result. This seems to be quite difficult to do: the complexity of the semantics rises quickly with respect to the size of the language (i.e. the size of the type system; the number of types and operations supported).

Unless there is some clever technique for describing the family of related programs, confluence of the sequence of operations that implement the program is a property that needs to be designed into the semantics explicitly. If any of the operators in the language cannot be inverted (i.e. they lose information as they are not bijective) then the set of related programs is more difficult to explore as it is not clear how to reach all of the points within it.

When the set of operations in the language is small, and each is invertable then confluence of sequences of their applications follows. But when these conditions are not met it is an uphill struggle to try to wrestle a set of confluent reductions out of the language design.

I tried this once, it did not end well. That doesn't suggest that it can't be done, but it did seem quite difficult to me.

### Specifications

I don't think we should aim to specify which properties don't matter. Rather, we should aim to more narrowly specify the properties that do matter. Evaluation order can be specified logically by making side-effects explicit or by being more precise about the contexts in which an expression has a value.

### re: controlling what you control

I do understand what you were saying. Many programs are over-constrained, and this is largely a consequence of modern programming languages.

We can create PLs that are confluent or causal commutative, that don't have these difficulties with evaluation order in particular. This might alleviate the issue a little, though it won't change the fact that many programs would continue to be over-constrained, just at the semantic layer (sticking items into a list or queue instead of a set, or using a free monads to sequence what should be commutative effects) instead of the syntactic layer.

In any case, what I was saying above is that controlling overspecification of accidental properties won't accomplish your initially stated goal of "writing reliable software". Not unless humans change considerably and tinkering/twiddling/testing is no longer a common approach to software development.

If you want to shift goalposts to "optimization"... sure, you might benefit there.

### It becomes a stability issue...

When systems encourage programmers to rely on non-spec behavior. Crap software that fails on every platform except the one it was developed with, and nobody knows WHY.

You can claim the answer is banning all non-spec behavior, but in that case you'll have to define the abstraction level of the machine and I'll find that here is either yet ANOTHER language that doesn't allow me to specify what I need to in order to write security code, or yet another language that doesn't allow me to use high-level abstractions.

You can't have a language that works at more than one of the many different levels of machine abstraction programmers need to work with, if you don't allow them to decide what part of the behavior they specify is important.

### non-spec behavior

Assume, for the sake of argument, that the ordering on evaluation was fully and precisely specified. This wouldn't change that the ordering was over-constrained, as you were describing earlier. It would only mean that people don't shift things around from one compiler to another. Essentially, you'd have a language that is more deterministic in its behavior. A direct consequence would be that programs are more reliably portable.

In a way, the very existence of 'non-spec behavior' in a language is an example of 'maximizing non-determinism' with insufficient concern from programmers. I.e. in the sense that "it is not determined what this statement will return", but they test against that behavior anyway. And what you're really complaining about, just now, is that this non-determinism is undermining reliability. For the same reasons I described earlier.

You can't have a language that works at more than one of the many different levels of machine abstraction programmers need to work with, if you don't allow them to decide what part of the behavior they specify is important.

You're shifting goalposts again (reliability → optimization → working multiple levels of machine abstraction).

And you seem to be assuming you need one homogeneous machine abstraction and a brilliant compiler that groks ad-hoc properties (like 'constant time') to tie everything together. It's easy to work with multiple levels of abstraction via EDSL interpreters or Free Monads with Algebraic Effects Handlers or a number of other means. (In a sense, a DSL is an explicit machine abstraction.) Non-determinism isn't something that needs to be maximized for this, nor even supported. Just modeled in a controlled manner.

### At the risk of moving the goal again...

I guess the goalposts keep moving because I haven't really identified my line of reasoning, or thought deeply enough about explaining it, to articulate why I believe this is directly connected to it. But I continue to think that it is, because I'm thinking about debugging tools that trawl ENORMOUS amounts of code to assist maintainers in identifying bugs in a PARTICULAR piece of code. I think that the power to NOT overconstrain it would allow us to build such tools effectively.

I'm blaming overconstrained code mostly because the extraneous (unimportant) behavior means we cannot easily discover whether instances of given sequences, routines, or programs are equivalent in ways that actually matter to anyone, or what it is that's equivalent about them.

When you're talking about one, or two, or three dozen routines, it's easy to claim that whatever gets done more than a few times ought to be abstracted into libraries - that real code should contain no idioms because there's no point in its existence if it's like some other code.

But our robustness problem arises in the context of millions of routines, not in the context of one, or two, or three dozen. The idioms that ought to be abstracted in bodies of millions of routines - because it's worth it if they get called more than a few dozen times each - run into at least tens of thousands of potential library calls.

Human programmers, and program maintainers, cannot handle the mental load of memorizing, looking up, or finding one of a hundred thousand library calls that SHOULD exist to implement particular "common" things correctly, nor keep distinct in our minds the specific and necessary distinctions between the semantics of the related abstractions that would comprise that set. Instead, when we need one of those abstractions, we reinvent on the spot six lines of easily understood, lower-level idioms stitching together calls to a simpler set of library routines whose number and separate purposes are comprehensible to a human mind.

These half-dozen-line snippets arguably OUGHT to be some library call, but for reasons of human mental limitations CANNOT be. And if a programmer had really memorized the hundred thousand library calls and maximized their use so that there were truly NO idioms in that programmer's code, it might be admirably brief but it would be completely incomprensible to any maintainer who had not (or COULD not) duplicate that awesome feat of rote memorization.

Because these idioms comprise a very significant percentage of our code, I am pretty sure that minor mistakes in reinventing them comprise a very significant percentage of our bugs.

When our code is overconstrained, we cannot easily gather statistics about use of these idioms because overconstraint multiplies them unnecessarily. This results in a half dozen instances each of a million different idioms, instead of six hundred instances each of a hundred thousand different idioms. The numbers of recurrences become too small for statistical significance, so the statistics aren't as strong an indicator when used as information to assist debugging.

The presumption is that we could have debugging tools that tell us how common an idiom actually is, or other ways to express that idiom, once we've written it. We write what we think is a bog-standard loop around some library call, an idiom too trivially distinct from the library call itself to have memorized as a separate entity. But we still expect the idiom to be "normal". Then our IDE notifies us that this is the *only* instance of it in millions of programs. A sidebar turns red indicating that this construction is very rare in code most of which is correct, and if we expected it not to be all that rare then maybe we ought to look at it again.

Or maybe we've just expressed something in an overly-complex way because we didn't think of the simpler way, but when we deconstrain it, our IDE can show us other, possibly shorter and more comprehensible, pieces of code out there in the codebase of millions of routines, that still meet the remaining constraints, making the job of reading and maintaining the code in the future simpler.

This is especially true if you're a code maintainer here trying to correct some particular error and you suspect something's wrong here where the bar you think ought to be green is red instead. With good statistical tools you make a query and discover that, while it's the only instance in the current codebase, this exact idiom has been found a hundred times before, and when debugging it has been corrected to be *this* subtly different idiom fifty times, *that* subtly different idiom twenty-two times, and has been removed from the code in some other way the other twenty-eight times. Is one of these other things correct?

Anyway, while there are also optimization benefits, the impulse against overconstraint in the context of robust code was about big data: enabling statistical tools to assist in simplifying and deduplicating code and to assist in detecting bugs.

The presumptions are:

Humans have limited memorization ability to keep track of potentially hundreds of thousands of entities in a library, and so combine simpler libraries with idioms to write comprehensible code;

Deconstrained idioms are easier to find similarities in than overconstrained idioms.

Maximizing the ability to detect intent (deconstraining) therefore maximizes the ability to detect statistical information about the purpose of common idioms.

Buggy idioms are rare compared to correct idioms in running code and therefore the quality of statistical information about idiomatic frequency can be an important debugging tool.

Detecting similarities in behavior constraints between complex, convoluted code and simpler, clearer code can assist programmers and maintainers in making reductions and simplifications that make (and keep) the job of maintaining it easier.

### big code

I'm thinking about debugging tools that trawl ENORMOUS amounts of code to assist maintainers in identifying bugs in a PARTICULAR piece of code.

I tend to divide one axis of reasoning about software into three qualitative classes:

• local reasoning - properties you can reason about locally to a subprogram in a syntactic sense (you can literally draw a circle around a subprogram and work out the properties). Mostly this is invariants, properties derived from syntactic structure and the syntax-semantics binding, etc..
• compositional reasoning - properties you can reason about with a summary about the dependencies for a subprogram (type, termination properties, etc.), i.e. without looking into the implementation details. These are inductive properties, generally including transitive properties.
• white box reasoning - properties that you can only reason about or determine by knowing deep implementation details of your dependencies. A typical example of white box reasoning is determining whether a program using ad-hoc threads and mutexes might deadlock.

White box reasoning is problematic because, at the scales of real world programmed systems, programmers simply don't have the time to read and grok their dependencies even once much less keep up with ongoing developments. If we want reliability at a large scale, it greatly helps if we can utilize compositional and local reasoning... at least for the features that matter at large scales.

One might reasonably argue that the 'summary' we're using for compositional reasoning should be somehow manifest, such that we can specify the details we actually care about for program correctness and have a compiler or linter tell us when things changed underfoot.

I don't believe 'overconstrained code' is the issue. Even if code is overconstrained, you'd be fine if you could perform assertions or automated tests to verify the properties you assume.

These half-dozen-line snippets arguably OUGHT to be some library call, but for reasons of human mental limitations CANNOT be.

You might be interested in Toomim's Linked Editing - a paper that describes several reasons programmers tend to replicate code.

Then our IDE notifies us that this is the *only* instance of it in millions of programs. A sidebar turns red

I doubt the sort of 'statistical idiom detection' you're describing is a very effective approach to developing reliable software. Statistics does have its place in detecting errors (cf. SHErrLoc). But that place shouldn't be to tell a programmer "fall in line, you hipster!" with a red sidebar on the IDE.

There are ways to reason compositionally about most 'idiomatic' code. Substructural types are among my favorite, able (together with normal types) to model and enforce any idiom you can describe with a context-free grammar (e.g. FileOpen FileOp* FileClose) and freely mingle it with other idioms (e.g. callbacks, continuation passing style) without relying on structured syntax.

### deterministic foreground, nondeterministic background

I like your point about maximizing nondeterminism when it is part of the "don't care" organization of code: what was abstracted away as not mattering. If you over-specify parts that don't matter, they'll interfere with options in achieving those that do. Curiously, dmbarbour seems to have assumed nondeterministic parts are in the intended result set, when you intend answers (results) that were not deterministic. (This is not fun to say, whether or not it's interesting.) Before I can make my point, I need to sharpen up a reader's grip on the word deterministic.

I hate that deterministic has five syllables. (I want to abbreviate it to D.) But shorter words have more than one meaning. The only advantage of a really long word is that no one wants to use it to mean anything else, and thus it's unambiguous due to lack of other usage. The word "fixed" also means the same thing when sense is non-varying, but it unfortunately also means other things like "repaired"; this only confuses when people think of it. (And most folks I know allow themselves to drift sideways to unrelated word senses, as if words are absolutes, so that all associated meanings apply, even when absurd.) Deterministic means having a fixed outcome given the same inputs, due to causal chains from past antecedents. Thus: non-varying results from given inputs. In other words, fixed results. We might say a deterministic algorithm fixes the results. We might say we unfix details that don't matter, like when an OS schedules code. But using fix as a verb this way is probably hopeless since we fix bugs to repair them, not to fasten or attach bugs in place. (Almost all similar short words mean something else too.)

When we run code there are always parts that vary and parts that don't. Usually an intended calculation doesn't vary: it's deterministic. Parts we don't care about, with no plan intended, are nondeterministic in the sense they can vary even if they don't. For example, we usually don't specify exactly when code runs, so how an OS schedules a process is nondeterministic from our view. If there are other threads in the same process, we don't care as long as they stay out of our way. Non-interfering variation is part of the don't-care space. We don't care if a compiler or processor re-orders instructions as long as our intended meaning is not changed. But instruction order has non-deterministic parts. Abstraction in general is a kind of organization of don't-care space. When we design a polymorphic interface used by multiple subtypes, we don't care what else those subtypes do, in addition to the interface, as long as it does not interfere with code using values by the interface.

We do tend to over-specify things we don't care about. Code may only care that phases A, B, and C all get done before starting D, E, and F. But most of the time a specific order for A, B, and C is put into code. You only succeed in making them explicitly unordered by putting them into anonymized todo worklists, done in whatever order was convenient in processing todo lists. When it doesn't matter, we usually pick one. The next person to work on the code, however, has to figure out whether order really mattered. (Because what are chances a comment will say?) Similarly, we order all kinds of things when we don't care, like fields in structures, or integer values of enums, and thus offsets in arrays using them as associative indexes.

(I often optimize code by re-arranging things, without having the slightest idea what the intended plan was. I can only tell I didn't change the deterministic result. This tends to upset people who want me to explain after I improve things. I tell them: sorry, the person who wrote it left no clues. That I can manipulate the don't-care space is hard for some folks to grasp. You aren't supposed to be able to change code without understanding it. But you can add and remove water, change how clean it is, etc, and the fish don't care because it was invisible to them. But what are the fish thinking? I have no idea. I can add hashmaps after seeing how keys are used without understanding what keys mean.)

Anyway, my point boils down to the fact both foreground and background roles are present when computing, where the former tends to be deterministic app code and the latter tends to be operating system, runtime, and standard library stuff. There is more don't-care quality to the background (like the exact format of those data structures over there), and non-determinism in parts where you are at the mercy of the OS. When does this happen? Will reading that file have an error? Hmm. Some disciplines like database architecture have more formal definitions about how non-deterministic parts behave; for example, transactions will appear to have happened in some serial order, we just don't know which order in advance. We operate based on arguments that progress is made if rules are followed, even if parts are non-deterministic in the don't-care space. We don't want to maximize nondeterminism in the objective result space, we want to maximize it in the don't-care background space so it can be varied to suit our optimization projections.

Dex: That's not the way textbooks use deterministic in a jargon sense.
Wil: Don't care, I'm not listening. La, la la.

### (dupe)

(replied in wrong spot)

### Actors

Thinking about this, you need a language where parallelism is fundamental, which leads to the actor model where everything is an actor at the value level. This is because parallelism is not a part of the algorithm requirements. In other words if I write a quick-sort on a list, it says nothing about whether it's inputs must be calculated in parallel or series, it simply doesn't care, except that all the inputs must be available when the algorithm starts or we get undefined results.

To give a parallel in VLSI design you write the code in a programming language like VHDL or Verilog, and then a layout tool lays the gates and wiring out automatically and optimises it. This could be for mask generation, or FPGA upload. If we consider a multi core CPU and and FPGA to be points in a configuration space, it is clear an FPGA offers much higher parallelism at the algorithm level than the CPU.

So you have to write the program in the most parallel way you can, and allow the code-layout-tool to sequence the operations as necessary for the architecture. This would likely be done separately from compilation, so you would have a binary actor format that gets linked into the final version at 'configuration time' which would be when the software is loaded into a new machine or the machine hardware is changed.

So Actors are the answer? Well almost, as I haven't yet found an actor language that meets the requirements for generic programming. ActorScript has a type system that does not have constraints on types (aka traits, type-classes) which makes it a non-starter for me. It also uses parametric-types not universally quantified types, which means you cannot pass an argument polymorphically, which again causes problems (see Rusts issues with this). There are ways around the lack of higher ranked types if you have type-classes, so that's less important, and it may be worth sacrificing to keep the monomorphisation properties.

I personally would want something that could compile to hardware or software, which would mean no unsized objects, no stack or heap, just local variables in each actor. But I think you could provide actors that model memory if that is needed.

So in summary, a PL for parallel non determinism would be actor based, with parametric-types (no subtyping), type-classes, and would assume no memory model.

### re: Actors

I'm in no way a purist, so some words irk me, including fundamental, canon, pervasive, and uniform. Each 'everything is an X' model is too limiting to me. I favor words like mixed, gradual, views, adaptive, and roles. Three key words from my headline are "support parallel effects". Support does not require a pervasive forced canonical parallel model. I need only PL ability to cope with effects, and support of form leading to use.

I see no special problem in adding more parallel support in existing languages. It only requires a concept of interacting with other domains which may or may not be concurrent. Then if those other domains actually are concurrent, it's not a problem in terms of violating language model. Once another domain is less under direct control (from local code's view), that domain can be moved to locations near or far without breaking things. Another domain might be right at hand, like a coroutine; or it might be put in a variety of successively more remote places.

Typical conventional support for other domains is integration with OS api that let's you talk to (say) nodes at a network address. But a PL usually has no idea what it means, so it's all on a developer. And the other domain must really be remote, usually; it must be another process, and the PL is no particular help in distributed semantics. There are exceptions to the rule, of course, like any language with lightweight processes. A LWP (lightweight process) model is necessarily part of runtime, and runtime is partly invisible to a PL as part of the medium in which a PL operates, so it can be extended without directly conflicting with old code. I suggest LWP models can be added to any language, though riskier in unsafe languages without care. Edit: Heavyweight processes are still okay to use too.

So a full-on actor model does not seem required for parallel features. Affordances are enough.

### OpenMP

The point I was making is that how parallel something is, is a configuration issue. If you do not pervasively parallelise, you will not be able to scale as far (for example FPGA targets). The advantage with the actor approach is you can target single process, LWP concurrent, highly parallel multi-cpu architectures, and FPGA targets with the same source code. In this way the PL does not need to know anything about the runtime concurrency as it is abstracted away. It is not all on the developer, because any valid actor program would work equally well on all targets (although it may not produce the same output due to nondeterminism).

OpenMP https://en.m.wikipedia.org/wiki/OpenMP, where you annotate 'C' code seems about as far as you can go starting with sequential code, but allows compile and runtime configuration.

### Appripos of little

I tried to use Visual Studio's new Open MP support.

All that happened was that the compiler crashed and begged me to submit a bug report.

### GCC OpenMP

OpenMP worked well with GCC last time I used it.

Duplicate.

### re: as far as you can go

I still seldom note disagreement unless worth argument (skipping first para).

seems about as far as you can go

So you don't assume you can do better? :-) Or maybe you limit this to canvassing things that exist already. I feel the space is barely plumbed. Starting from first principles, there are other ways to rewrite C than the way they do. For example, looks like OpenMP does not have a lightweight process model, instead spawning a thread per task, which is very heavyweight. (This is essentially no different from current naive threads-in-C practice.) Edit: okay, heavier still would be another new OS process per task.

But, I don't want to talk about it. Our earlier conversations were interesting but something of a grind. I like exploring designs when I learn something about my problem instead of what preconceptions folks hold (which is useful to know for market planning). I only planned to respond to things catching my eye, like your as far as you can go above. To rewrite C, you must plan where rewritten C is going to run, and any plan imposes limits.

To get something interesting you must first assume a kind of result you want, then see if you can get there with sharp edges removed and problems addressed. To turn C programs into LWP tasks, you have to rewrite every call to the outside world, unless you can annotate a call as safely sync, nonblocking, and short-lived. This isn't very on-topic, though, since it doesn't have much to do with PL issues. Designing a runtime environment to host a PL has little to do with what appears in the PL, although some rules might change, like adding TCO (tail call optimization) to C when spaghetti stacks appear after rewrite.

I'm happy to comment on your ideas if they interest me. But I don't want to describe mine since I have not enjoyed results before, and don't expect change. Feel free to talk about extending languages. I would enjoy hearing an idea I haven't had before.

### Same code muliple targets

Whether its LWP or heavy processes or the extreme parallelism of an FPGA or VLSI chip should be irrelevant to the algorithm. For example often a 'C' program is used to model the behaviour of a VLSI chip, then that will get rewitten in high level VHDL as parallel processes, before finally being rewritten again as RTL (register-tranfer-level) code in VHDL or Verilog. Ideally this process should be automated so you simply describe the algorithm, how it gets converted to a parallel target is not your concern.

This is what I mean't when I said "as far as you can go". You take a 'C' program and annotate it with OpenMP directives, so you don't change the underlying algorithm you specify how it can be parallelised. I didn't mean OpenMP is the perfect tool for parallelisation, and it is probably a poor turn of phrase.

There is no reason OpenMP style annotations could not be used for a system that executes using LWP and the ideal would be to have annotations that will allow the same algorithm to be compiled for such diverse targets as single-threaded, LWP, grid-cluster, FPGA etc.

The reason OpenMP targets processes is that it is mostly used in grid-supercomputer programming, where the architecture is usually shared-nothing (each node has its own CPU, memory and non-volatile storage). What OpenMP does well is he size and configuration of the compute-cluster is a runtime parameter of the compiled software, so you can run on a 10 node cluster or a 100 node cluster without recompiling. This is what I was referring to when I said that the configration should be separate from the compilation, and the algorithm. LWP, heavy-processes, supercomputer or FPGA should all be a configuration issue, not a coding issue.

### choices in deployment

Good amplification, though I have trouble seeing a PL angle beyond annotation. (I like annotation, I just think what I say about it is off topic, since it sounds like "now I have annotation I can do these neat non-PL runtime things" which is fun to me but doesn't sound like PL design stuff to some folks.)

I have no objection to talk about parallelizing an algorithm, but usually I am not. So I hope you keep in mind my goal is to untether things that may be quite different from each other, but still have data flow dependencies ... or not, when they merely share resources in a space. Now and then some sub-problem actually has concurrent parts that are all the same thing, but this is an exception. More often "the algorithm" assumes a thing not in evidence in my world. For example, if you service a lot of network flows, there is no "join". Heterogeneous pipelines (varying algorithm in each leg) occur more in my design space targets. Say if you merge multiple daemons into one OS process, there is no join, even though re-partitioning where daemons are located is still a config goal.

Now and then in the past ten years I worked on something that needed an N machine testbed to test, but I didn't have access to N machines. (Resources expected never appeared; this ought to be a familiar story.) Even getting that many VMs was a problem. So obviously, I want to simulate N endpoints on just one more efficiently. But a generalization of this idea is that you don't want resources to dictate your test topology. Whatever gimmick you use to parallelize over more resources should also let you use fewer resources too, with more sharing. Usually sharing is not the objective per se, but a cost-cutting move, and a way to test N node algorithms with fewer than N nodes. Varying resources used should not change test results, but probably will, which is why you test and fix things until finally you get the same results no matter how you distribute it, modulo latency differences and patterns of partial success and failure.

Since this is already long I'll stop, but next would be discussion of different flavors of program you annotate, depending on whether they are primarily organized around the old world or the new, or carefully crafted to straddle the two. This would almost be on topic as a PL item of interest. The new world is where you say, "Hey, since I know it's all getting rewritten, I'll go ahead and depend on things that can only possibly work in the new world." (Let's add exceptions to C now that we have continuations.) The old world is some hunk of legacy code it would kill you to rewrite, but you want to port so it cooperates better with new things, with more co-location. A motivation to straddle these is to test the "same" code in both the old and new world to compare results.

### Circular Reasoning

If you want things to run on any architecture in the same way (and allow simulation on lower specification machines), I keep coming back to the idea that everything needs to be event (message) based, and that you can only have (thread) local storage which is an LWP execution environment, but is also the actor model. Now maybe you don't want actors all the way down (even though that is where you are going to want to go once you start down this path).

So you would add some annotations to 'C' for declaring actors, and build a checker that confirms the invariants for these actors. Initially this would allow you to annotate legacy 'C' code for safety checking. You could then implement code generation from the annotations to remove the need to write boilerplate in new code. So the annotations would serve a dual purpose, if the code they refer to exists it gets checked, if not it gets auto-generated. This is not going as far as a transport, but I don't really see the need as each actor (in the not actors all the way down model) behaves like ordinary sequential code internally, except you cannot share data.

To deal with sharing data you would encapsulate the shared data in its own actor and that actor gets passed to multiple other actors (possibly in different LWPs). The message passing serialises the access. To avoid performance problems there needs to be a 'direct path' for LWP message passing, where the message pass gets converted into an ordinary function call by some heuristics (so that one LWP does not hog the resources).

I am still not sure I have interpreted to topic correctly, but features to manage indeterminance would be: event/message passing based and only local data. Sounds a lot like Erlang. For old code, annotations that allow a linter to check these constraints.

### if wishes were fishes

Don't write for me, say what you think is interesting to you personally. Whose reasoning is circular, yours or mine? (If mine, pick a return insult from a menu, doesn't matter to me.)

Message-based is obviously good, but both "everything" and "needs" are wrong, unless that's rhetoric. I also disagree any path inexorably leads to actors-all-the-way-down. Nothing forces a global actor quality; what looks like an actor is a local perspective, because you see only messages. One point of abstraction is hiding how something works inside, so internals can be imperative with non-actor structure. Fixation on uniformity is weird, and essentially pointless unless writing proofs of global properties; one point of modules is to limit how far something is relevant, so internals do not bear on global proofs. (Failure to obey api contract is just a bug, not a global invariant counter example.)

You can see why I am going to let this sub-thread languish. Arguing has little point unless you learn something, or need to negotiate shared-work priority. This reply is something of a bonus postscript, instead of not replying as I intended. Feel free to add a last remark of your own if needed.

Some things I say may sound like Erlang since, as I've noted here in the past, I started by wondering how to make C more like Erlang a few years ago, in some parts of problems. So it would be hard to point out a new parallel to me in lightweight processes in C. (Warning: change in topic next paragraph.)

Programming languages can aim to enable things as well as to prevent them. Features can be both positive and negative: helping you, and stopping you. Some PL designers focus more on one or the other. Stopping programmers from being bad still swells in popularity, but I don't care at all, unless it is enabling support to stop something from happening in a subset of code, by request and agreement. So for example, it should be possible to declare some data is immutable, and a PL should help. But forcing all data to be immutable is not a worthwhile goal in cases where it impedes devs from being productive.

If code in different fibers, threads, and processes have (and use) access to the same shared mutable state, they are part of the same "actor", even if there is a great deal of complex internal structure. A PL should let devs write imperative code with mutable state where needed, then support sharing immutable data between such components. An actor-like model appears wherever the dataflow is async and immutable. Reasoning at the actor level is not impeded or invalidated by individual actors having complex internal imperative natures. There is no evolution to nirvana.

duplicate.

### Issue independent of determinsm

I think the issue being raised here has nothing to do with concurrency or determinism. Rather the core problem identified is amplified in those contexts, but it remains a core problem for sequential deterministic programs: how to state and validate assumptions on a large scale.

On the very fine scale we have some technology dealing with dependent typing and constraints. At the low to medium level we have type systems. This includes things like Ocaml checking pattern matches are exhaustive, Ocaml's private module interfaces which isolate constructors, in Felix we can also state function pre- and post-conditions, object invariants, and type class axioms.

I would include the ISO C++ Standard STL protocol specifications at this level: stated and manually checkable even if the machine doesn't help much.

On the larger scale .. nothing much! For very large systems, the ground level is specified by networking protocols, for some applications like databases interactions of processes are managed by SQL. But for general programming .. nothing.

In the old days people used to draw flow charts, and these extended to higher levels although with considerable loss of detail.

In principle, Felix chips-and-wire model of fibration also extends, since you can build a circuit board from chips and wires and that becomes itself a new chip. The model corresponds to some extent with category theory so there is some hope the modularity and scalability of the model will extend to reasoning about it. It's interesting because it is highly indeterminate: reading and writing on channels (wires) causes an exchange of control when the reads and writes match up, but who gets control is unspecified. You have to program in a way that does not depend on it. Nope, there's no concurrency involved.

Until we can reason about synchronous synchronisation (sic) I don't see how we can reason when the synchronisation is asynchronous :-)

### SPIN and PROMELA

Maybe a higher level formalism in the language for a state machine, where states are declared, and message transitions written between each state would allow the compiler to check coverage (in the same way as for case statements) so it can warn you "No handler for message type X in state Y."

This could be improved by an interface definition language that let you define protocols (where the states and message types might be asymmetric), where roles in the protocol could be defined ('server' and 'client' being simple examples of possible roles, but these are in effect just names) and the internal states each role is allowed and the messages it can send and receive specified. The compiler can then check the protocol for consistency and check it using randomised testing (without having to look at the method implementations), and finally check all the required methods are implemented by the code for each role.

I don't know of a language that incorporates such techniques, but PROMELA is a modelling language for protocols that allows simulation with SPIN http://spinroot.com/spin/whatispin.html. If a language could verify an actual implementation against a PROMELA model then that would seem a good staring point. Ultimately something allowing formal protocol definition in the language, allowing a verification stage using a tool like SPIN whilst ensuring the model and implementation remain in sync, such that the formal protocol is like the signature for the implementation.

### purely functional processes

Purely functional computations can support high levels of parallelism and explicit models of side effects. At the same time, even.

Haskell style par/seq parallelism isn't ideal here. Too many fine-grained sparks, no ability to amortize 'fork' overheads over a large sequence of operations. Par/seq simply doesn't scale to larger systems where forking and communication is expensive. Actors and Erlang-style processes also aren't ideal: they unnecessarily entangle parallelism with identity, aliasing, state, and non-determinism. Fortunately, we can get most of the benefits of Actors or Erlang-style processes while sticking to a purely functional model.

A "process function" (PF) has type PF = arg → (result, PF). More or less. Ideally, we can give it an affine type, too, so clients cannot accidentally hold onto the old PF and destructive update is feasible.

We can annotate a process function to operate in parallel - in a separate thread, or even a separate heap. The returned process function becomes the next state for that process. Calling the function immediately returns a result - a future we can synchronize to access the whole value. The returned PF may immediately be called again, essentially creating a queue of arguments. Futures can be passed as arguments to other processes.

By orchestrating results as arguments to other functions, we get pipeline parallelism. We can model networks and message passing, e.g. use a lookup table to find the process function for a given argument. Performance can be increased idiomatically, e.g. use batching and staging techniques to amortize parallelism overhead across messages, or hierarchy so a payload of a large message can be produced incrementally (or even as-needed).

In any case, 'effects' would be modeled in terms of both results and the state update to the affine process functions.

To design a composable parallel non-deterministic system requires pervasive non-determinism. Rather than being a problem it is the actor approaches combination of parallelism, non-determinism, state, aliasing and identity that makes it a better model for this purpose.

The above formism PF = arg -> (result, PF) imposes an order on argument processing, which is not non-deterministic, (pipeline parallelism), which fails to address the topic of how PL can help with handling non-determinism. The assumption being that non-determinism is fundamental in the system and we want help ensuring program correctness.

### Non-determinism isn't the goal

If non-determinism is part of your 'purpose', I agree you'll need to introduce it somewhere.

Not 'pervasively' though. There are many ways to introduce controlled composable non-determinism, e.g. via oracles or secure reflective capabilities. There also is no need to tie in state and identity like actors model.

All tha aside, from the OP (not the title, but the wall of text that follows) it seems that non-determinism isn't the goal. Performance and predictability is the actual goal. Parallelism - use of multiple CPU cores, possibly distributed. Ability to compose and comprehend large bodies of code.

Eschewing non-determinism is a valid answer to those goals. We can have performance without sacrificing predictability.

### Network Routing

As soon as the program is distributed over compute nodes on a network, packet routing means arrival is non-deterministic. Note the discussion is about parallelism which implies multiple cores not concurrency, and even with a single node, interrupts can change relative execution times in different cores, resulting in non-deterministic completion orders for parallel tasks.

The methods you introduce for non-determinism like oracles seem difficult to use, and inconvenient from a coding perspective compared to the simplicity and uniformity of actors.

I think there is a real need to tie state and identity, consider a flip-flop in an electronic circuit, it stores state and has identity. The functional approach without state us like trying to design a circuit using only combinatorial logic, and no registers. It works fine for simple situations, like an implied global state, but that severely limits the circuit architectures you can represent.

I interpret the text to be talking about how to improve performance and predictability in the face of non-determinism. In fact it reads to me like more non-determinism equals better performance, so we don't want to over specify. In which case a drawback if your 'no nondeterminsm' approach will be poor performance for the scenarios on the OP.

### Arrival order non-determinism

Non-determinism is a relationship between a program and its potential outputs. Non-determinism of evaluation order is only relevant to the extent it's observable within a program's logic. Arrival order non-determinism is no exception.

From personal experience, I can say that affine capabilities for non-deterministic reflection (e.g. to 'select' a completed future from a list) aren't difficult to use. I haven't tried oracles, but they're basically the same thing syntactically. I eventually chose to eschew non-determinism, but it wasn't for reasons of syntactic convenience or even referential transparency.

Separation of concerns is useful. I said "there is no need to tie in state and identity" with non-determinism, not that there is no need to model them. There is value in not coupling three concepts into a single abstraction like the actor. There are a number of effective ways to model state and identity within purely functional programs that have nothing to do with non-determinism.

The assumption that "more non-determinism equals better performance" is not a valid one. But yes, it does seem the OP makes that assumption.

### re: OP non-determinism assumption

Suppose we say "internal" when program logic can see it, and "external" when it cannot. (Yes, if you model external non-determinism in program logic, it gets confusing; so pretend that is out of scope.) So IND means internal non-determinism, where program logic is non-derministic. Meanwhile END is external non-determism, where evaluation order outside the program varies in OS scheduling and network packet arrival order, etc.

Non-determinism of evaluation order is only relevant to the extent it's observable within a program's logic.

In one of my long comments above, I use other terms: foreground can have IND and background can have END. So I already talked about this distinction, with intent to clarify exactly the point you misstate:

The assumption that "more non-determinism equals better performance" is not a valid one. But yes, it does seem the OP makes that assumption.

The OP makes another similar assumption with a crucial difference: more END fosters opportunity for better performance, if program logic explicitly denotes unordered parts, lining up with api usage so an environment can assign logically concurrent parts to physically parallel hardware. That is, don't overspecify order, use api fitting async parts, and work hard on tools to find better scheduling.

My point is an obvious one, to some of my peers, who would roll their eyes if I "explained" it to them like they don't get it. (Like the guy who worked at Tandem Computers, and the guy who worked at Control Data Corp.) It follows directly from believing Amdahl's law has any meaning. Worst case is where you overspecify order in such a way nearly nothing can be done in parallel, because program logic forces serial progress. Both language and application semantics have a lot to do with avoiding that worst case. But not forcing worst case doesn't make performance better unless opportunity is leveraged.

### determinism and confluence

more [external non-determinism] fosters opportunity for better performance

Granted. Lightweight threading, load balancing, work-stealing, caching, etc. operates on this concept.

if program logic explicitly denotes unordered parts (...) don't overspecify order

I don't read that as talking just about external non-determinism. 'Program logic denotes' strongly suggests internal non-determinism. Unordered parts could very easily be 'unordered lists' or 'unordered streams', etc.. 'Don't overspecify order' reads to me as "specify the minimal order for a program to return a correct result" not "specify the minimal order for a program to be confluent". You will need (somewhat ironically) to specify confluence if that's the property you want.

In general, you cannot achieve or reason about confluence by indicating parts of an arbitrarily constructed program to be 'unordered'. This follows from Rice's theorem. Proving anything non-trivial about arbitrarily constructed programs is not very feasible.

The easiest way to ensure confluent behavior of a program, while enabling flexible evaluation order, is by construction. Pure FP has an advantage here - e.g. the Church-Rosser theorem proves lambda calculus to be confluent. It's also feasible to ensure confluence by analysis of systems constructed to support such analysis. Many term rewrite systems include support for this analysis.

### confluence and dependency

I like that idea of confluence a lot, but I never heard any practicing dev mention it. Perhaps the topic area is inoculated by an old idea, stopping devs from adopting it when thinking about problems. In particular, dependency graphs seem to dominate problem analysis where confluence would be relevant. Devs manage dependencies, but without considering confluence properties as such, beyond intuition about issues. I've been trying, without success, to think of when I would spring confluence on a coworker without them looking at me askance.

Confluence is a really nice idea in the context of expression rewriting. But I know devs who treat problem solving by rewrite-formulation as a kind of dithering. It looks like procrastination instead of "just solving it" because it doesn't sound goal oriented. People like it when you suggest seeing problems a new way, when clarity seems better once offered. But theory of re-arranging views has a fussing flavor.

Dependency is a standard idea. People even know topological sort applies to picking a valid evaluation order for expressions in a dependency graph. (If they remember what it is. If you ask a dev if they recall what topological sort is, they may smile and say "maybe".) I expect dependency is a concept devs have in mind when planning behavior of async operations and any resulting fork and join semantics. That confluence explains why a plan is valid doesn't come up.

I have been thinking about (what amounts to) expression rewriting a lot, so confluence is an idea I was ready to hear. The kinds of problem I find interesting tend to look like converting code in one form to code in another form. :-) This even applies to work problems in optimization, when it amounts to graphs executing more efficiently or with better latency structure. The idea of dependency doesn't explain why you would want to structure code transformation in some ways, especially when goals involve ease of rewriting again for a new reason, instead of just solving it.

Edit: see Topological_sorting for partial order, DAG, scheduling, dependency etc tradition in computing.

### causal commutativity

A related idea to confluence is 'causal commutativity' (cf. causal commutative arrows and their optimization). In particular, this describes confluence with respect to dependency in a flow (along 'arrows' representing computations).

OTOH, that's mostly discussed with respect to arrows. (And sometimes for 'commutative monads'.) If you aren't prepared for the arrow abstraction, it can be rather daunting. General abstract nonsense.

### re: causal commutativity

Googling "confluence causal commutativity" turns up an interesting 2012 book by Luciano Lavagno and Alberto Sangiovanni-Vincentelli: Algorithms for Synthesis and Testing of Asynchronous Circuits. A section titled "interleaving semantics of concurrent actions" (top of page 130) has nice language:

We then choose to model such concurrency by interleaving, i.e., considering all possible alternative total orderings compliant with the partial order between possibly concurrent actions.

Maybe we should adopt similar language to discuss explicit code annotation that conveys intended partial order relation among code fragments. The material is quite mathy and has a lot of confluent definitions. (Thinking of responding soon to Bruce Bell.)

### indeterminate order

I would like to make a different distinction: "indeterminate" versus "nondeterministic" order.

If you want to allow the maximum leeway for your PL implementation to optimize for performance, the cost is that the execution order is entirely up to the implementation. The ordering may be deterministic (chosen arbitrarily for the convenience of the compiler). It may be truly nondeterministic (if the implementation is truly concurrent). It may choose the same ordering 99 times, and the hundredth time choose something different (as a JIT re-optimizes the code path). I call this "indeterminate order".

The problem with indeterminate order is that it imposes itself on any effects associated with the execution: the implementation may insert arbitrary ordering dependencies between any effects whose ordering is left indeterminate. This makes program-visible effects with no better guarantees than indeterminate ordering practically useless! (and, if I understand correctly what you mean by END vs. IND, should be a problem for external as well as internal effects)

I prefer to reserve "nondeterministic order" to describe execution with some kind of fairness property -- typically provided either by truly concurrent execution, or by some form of preemptive multitasking. Fairness removes the implementation's freedom to insert arbitrary ordering dependencies, which allows the programmer to describe effects which are practically useful again.

Event-driven and Actor-type systems seem to prefer some kind of cooperative multitasking, in lieu of a fairness guarantee. The advantage is that cooperative tasking is typically cheaper; the disadvantage is that it requires a guarantee that must be provided by the programmer...

So, my question in this context: is there any theoretical difference between "indeterminate order" and that needed to implement and use cooperative multitasking? Or, is it just a matter of focus? Does the question even make sense?

### flavors of making partial order into total order

I will try to get to your points after I note things you bring to mind. The essential point of your third paragraph mostly eludes me, though context is almost clear. I see some contract focus in your items related to guarantees and fairness. This almost verges into a related QoS (quality of service) topic, that might be orthogonal to concurrency mechanisms, except that need for QoS features can be provoked by worst cases in concurrency support. Let me lay some background. There may be no unifying focus for organization; some paragraphs may be orthogonal.

I mainly want to see some factoring which allows some order to be injected after a program specifies a partial order it needs. Your post suggests a PL always provides additional order according to its whims. While that may occur, I was thinking a dev can inject specific orders once is it factored out, via further metainfo, perhaps as extra partial order constraints. The control given up by a program can be captured in more than one way. Some degrees of freedom would be used to pick ways of servicing requests in alternative ways. For testing a dev might force a total order for reference.

To get any performance improvement from parallelism, typically division of labor must be coarse, to still win after overhead is added by division. The coarser the better, ideally with pieces never needing to coordinate again later: scatter without gather. Load-balancing tends in this direction, where traffic is mostly partitioned into logically non-interacting parts. (Such parts may secretly interact anyway, under the covers, if a shared resource is touched.) Sometimes you almost can't make one sub-program go faster, but you can run N of them at once, making them look faster on average. Start to end latency for one task in isolation may not shrink, but you can interleave a lot of them when hardware would otherwise go idle. It would help if a PL lent itself to thinking of work loads chopped into pieces that never or only rarely talk to each other. Process architecture in an OS is good for this, if the numbers are not too large. But a million such concurrent pieces cannot be done efficiently with off-the-self OS products, unless they have lightweight processes.

In a special case, granularity is still efficient even when very fine instead of coarse. If you want to fire off many async requests, done by someone else (binding provided by dev, or PL, or runtime, or service lookup, etc) letting them run concurrently is better than waiting for each to complete in some arbitrary sequential order. If you need to wait for all to complete, before proceeding, usually something called a barrier manages this. Barrier is an overloaded term; in this case it means a form of batch sync. Anyway, the overhead doesn't go up when there are more, because you had a per-item overhead anyway. (Buffer overhead goes up when there are replies though.) It just seems an exception to a "be as coarse as possible" injunction.

Your distinction between indeterminate and nondeterminism sounds plausible, but it seems like nuance. (I like using more than one term meaning almost the same thing, to avoid repetition dulling the mind.) Jumping to your last question, I don't think there is a theoretical difference, if abstraction is present. If a program says "I know you are injecting some order I didn't specify, but I don't know what it is", then the nature of what provides the order is obfuscated. Things inhabiting dont-care space are all uniformly gray and indistinguishable. When you go outside the program and cause order to be injected in particular ways, then each is a different flavor, and distinctions you want to make are meaningful. Depends on where you stand. When a PL tries to support more than one way to do it, the docs may be irritatingly inconsistent, with some of the goofy quality we associate with quantum wavefronts. The way you use it changes the meaning.

I'm pretty sure I'm not helping, nor quite addressing your points. But maybe this extra grist helps shape use case formulation.

### A modest proposal

A nice proposal I would like to see implemented is something like a parallel merge operator for streams. I.e., if F and G have type IO = I -> (O * IO) then F PAR G gives you back the stream as results arrive in time-dependent order. (Or something like that.)

### Referentially transparent non-determinism

I don't believe you have the type correct for combining streams in parallel. But that isn't essential to your point. For the general idea of ordering values in a stream or list based on order of evaluation:

The RT variant would add an observer parameter - e.g. an oracle or reflection capability provided at the program toplevel and threaded through (perhaps statically). The merge order would thus depend on this opaque third parameter. This would also avoid problems of pervasive non-determinism, enabling programmers to precisely control which subprograms are non-deterministic.

If you're going to use non-determinism, I do recommend avoiding pervasive forms like actors or ad-hoc threading.

### Actor Programming

Before you made that recommendation, how many large systems had you written using actors?

Isn't the success of Erlang used for writing reliable distributed software a strong (evidence based) counter to your assertion?

Edit: Re-reading that sounds a bit confrontational, which was not my intention. It you look back you will see I have argued the other side of this in the past. I realised I had not really written any large systems using actors, so I couldn't really justify my negative opinion. Erlang shows how you can combine functional programming and the actor model to successfully build distributed programs. With distributed programs the non-determinism is there whether you like it or not.

Extending that thought if the functional programming + actor model is better for implementing distributed systems than functional programming alone, then maybe a pure actor environment would be even better.

I think pure functional programming is weak at managing state, and in a distributed system each node clearly has state.

Let me pose an additional question: How do you handle a distributed fault tolerant system where you want to be able to upgrade and replace individual components / nodes without taking down the whole system using functional programming? Is this more complex than the actor model?

### Isn't the success of Erlang

Isn't the success of Erlang used for writing reliable distributed software a strong (evidence based) counter to your assertion?

No more than the success of many C++ projects is 'strong evidence' in favor of expensive compilations, complicated syntax, and undefined behavior. Not every aspect of a successful tool contributes to the success of a project. In many cases, projects are successful despite some hazardous feature of a tool, not because of it.

In 2014, I went to a few local meetups by FP users discussing Erlang (and specifically Elixir). And one thing they noted is that the out-of-order messages - and how that order tends to vary a lot depending on the machines or runtime configuration used - remains one of the more common and consistent sources of bugs. This matches my own experiences with actors model.

Erlang programmers use idiomatic techniques like killing and restarting whole subnets of processes to regain some robustness in the face of errors caused ultimately by non-determinism.

Before you made that recommendation, how many large systems had you written using actors?

Six. I was also developing an actors model based PL in 2007-2010, before I came up with RDP.

I experimented with variations of actors, trying to make them more tolerable. Distributed transactions. Abstract dependencies to simplify error handling (if this actor fails, kill these other actors). Etc.. But it always seemed to complicate things and not help too much.

With distributed programs the non-determinism is there whether you like it or not.

With open system distributed computing, you must deal with non-determinism because the state, connectivity, and latency of components (services, clients, etc.) fundamentally can vary outside control of the local program.

With closed system distributed computing - e.g. high performance scientific simulations on a mesh network or cloud - that is not the case. If there are runtime errors at the physical computing layer, a distributed runtime might be configured to recompute the failed part of the computation.

For many large systems, there is a mix of aspects involved. However, the non-determinism to account for time-varying connectivity of an open system can frequently be isolated to a few subprograms. There is much practical value in separating this non-determinism from the parallelism needed for scalable high performance computing.

In any case, the OP doesn't mention open systems. It focuses on the question of performance, which is all about parallelism - effective resource utilization, scalability, etc..

if the functional programming + actor model is better for implementing distributed systems than functional programming alone, then maybe a pure actor environment would be even better

Pure functional programming has not historically competed in the niche of open distributed programs. It couldn't have back when Erlang was developed - pure FP wasn't yet a thing then. But with techniques like algebraic effects and handlers, I believe the vast majority of open distributed programs could be written functionally (assuming compiler support for splitting the program).

I think pure functional programming is weak at managing state, and in a distributed system each node clearly has state.

Pure FP handles state easily enough. Even the process function formalism PF = arg → (result, PF) includes a simple state abstraction. This is more or less how actors and Erlang processes handle state, too.

If your concern is performance of the state abstractions, use of Pure FP with substructural types or even just linear values by default (with value sharing being the special case) can have all the performance of imperative code via destructive updates. Linear values and destructive updates by default is how I'm implementing my current language runtime, with value sharing being controlled by annotations.

Let me pose an additional question: How do you handle a distributed fault tolerant system where you want to be able to upgrade and replace individual components / nodes without taking down the whole system using functional programming? Is this more complex than the actor model?

The same way one handles anything in pure FP: model it.

Model the system components and the abstract network (including latency and connectivity). If you want to upgrade a component, you can do so either at the network layer (like physically attaching a new component) or at the component layer (e.g. send a message telling it to update something). Aside: Message passing isn't the only option for modeling open distributed systems, e.g. you might favor publish/subscribe or tuple spaces instead.

Upgrade within actors model is complicated by the pervasive ability to fork/spawn new actors in response to a message. This leads to idioms like creating an actor to handle the next message in a sequence. It is infeasible to reliably propagate upgrades through an entire system of anonymous actors. So, if you want upgrades, you're forced to be careful and disciplined, using a minimal subset of the actors model idioms.

### Performance

I thought that was a very good response, and I agree (and have argued the point before) that non-determinism is a source of hard to find bugs. I think there is a good reason why algorithms get defined as a sequence of steps.

However trying to impose order on the messages in a distributed system is going to limit performance as you have to wait for the messages and buffer them to allow re-ordering messages received in the wrong order. You would pervasively pay this cost, even if the order was not relevant for a particular computation.

Regarding pure functional programming, I find the performance is not good compared to imperative languages like Rust. I find your statement "But it always seemed to complicate things and not help too much." applies to handling state in a functional language. The formalism "PF = arg → (result, PF)" requires plumbing PF everywhere, although this could be improved with a monad, it sill seems to make state a second class citizen instead of directly representing state. Haskell has the sequence and parallel monads, which I have used.

I don't think the ability to fork/spawn new actors is good or necessary, so I would leave that out. For me actors are useful because they can represent hardware as well as software, whereas of you can fork/spawn new actors it is no longer modelling hardware, so to upgrade an actor, you can just replace one actor with another that has the same ABI, and that could easily be done at runtime.

At the moment I don't see anything new in the approach you are taking (I may be missing something), compared to programming with Par and Seq in Haskell. I wrote a couple of large parallel programs with Haskell, and did not find it a great experience.

### re: Performance

have to wait for the messages and buffer them to allow re-ordering messages received in the wrong order

In practice, I have never found this to be a significant issue. Consider:

• Physical networks are weakly stable, i.e. they remain stable for a while then settle into a new weakly stable configuration. Out of order messages are the rare case during shifts or hiccups in stability, not the case for which we need to optimize.
• In interest of maximizing utilization, a CPU is frequently given more than one computation to process. Latency of waiting on an out of order message tends to overlap with latency from an M:N thread scheduler.
• In the interest of efficiency, it can be useful to buffer and batch communications not just at the receiver, but also at the sender. Doing so amortizes overheads for network connectivity, routing, encryption, checksums, ordering, data compression, etc.. The cost is a predictable and tunable amount of latency.
• A lot of distributed programming involves connectivity idioms, gateway and database idioms, publish/subscribe or observer patterns, etc.. Out of order processing of messages complicates such code, forcing developers to explicitly order messages. Fine-grained 'order-the-messages' code at the app layer plus the context switching to execute it is a lot less efficient than would be ordering handled in bulk as a general communications layer feature.

I've not been very impressed by the effectiveness of out-of-order message processing for performance. It weakly optimizes for a rare case that I almost certainly don't want to deal with or reason about.

The formalism "PF = arg → (result, PF)" requires plumbing PF everywhere, although this could be improved with a monad, it sill seems to make state a second class citizen instead of directly representing state.

The process function and its state is certainly first class. You can pass one to a function, return one from a function, create new process functions as needed, etc..

But it doesn't entangle 'identity'. If you want something like an actor or Erlang process, you would do so by introducing a process identity abstraction, i.e. a small symbol or number that you can pass around in lieu of the process function. The actual PF value would then be contained behind the identity in a lookup table. The lookup table might be held in a monad or by a parent process function. It wouldn't be difficult to model message passing between processes in terms of (identity, payload) pairs.

This is not so different than what you'd be doing in a with so-called "first-class" processes or actors. It's just a lot more explicit about the distinction between a process's behavior and identity, and about any networking behavior.

actors are useful because they can represent hardware as well as software, whereas of you can fork/spawn new actors it is no longer modelling hardware

I agree with this observation. Though, actors model also fails at representing an important part of that hardware: the network and the messaging.

But functional programming is also good at representing machines and software. The programmer is forced to be more explicit and precise, including the network model and a notion of identity. But that isn't a bad thing.

I don't see anything new in the approach you are taking (I may be missing something), compared to programming with Par and Seq in Haskell.

There's a big scalability difference between PF vs. par/seq. In particular, the PF formalism is more suitable in cases where at least one of the following conditions is true:

• the overhead for parallelism is high
• the overhead for communication is high

These conditions are both true for high performance computing on a mesh network or cloud. They are also true for any scenario where we might want to use separate heap per thread, e.g. for simplified scalable GC (recently discussed in another thread). It's mostly the latter that motivated me to develop PF as an alternative to par/seq.

The PF abstraction is better under these conditions because it amortizes the overhead of constructing a separate process over the series of calls. Further, the PF makes it easy to reason about the maximum amount of information communicated between processes at each step: arg and result are communicated, while the returned PF is not communicated.

Haskell par/seq, by comparison, relies on an assumption of cheap construction of ultra-lightweight threads called 'sparks'. This in turn relies on a shared heap and a sophisticated garbage collector. Those, in turn, further hinder scalability. (And introduce fragmentation risks after some time. Yuck.) Something like par/seq could be implemented in terms of PF, parallelizing a function for a single call.

PF also provides precise control over parallelism. Haskell can achieve similar via 'parallel strategies'. But it's a bit awkward. With PF, it's easy to reason about the how many 'processes' you've created. Each process is internally sequential unless explicitly, hierarchically parallel. PF is a simple, natural fit for most parallelism strategies I'll ever want to apply.

I wrote a couple of large parallel programs with Haskell, and did not find it a great experience.

Parallel Haskell has a lot of factors that make it "not a great experience". Laziness is the big one. In any case, I would certainly say "did not find it a great experience" of both actors model and ad-hoc threading. Pervasive non-determinism isn't fun.

Outside of pure FP parallelism (with minimal laziness), flow-based programming (FBP) and variants are the only widely used parallelism model I've both used and found to be a great experience. And PF is pretty good for modeling FBP.

Are there any open source projects using this approach that are ready for use? A typical application for this kind of distributed programming is a clustered database with multi-master and eventual consistency like CouchDB (written in Erlang)?

How do you think a competitor to couchdb written using PF would perform? How easy would the code be to maintain?

I disagree with some of the statements above, but it's hard to respond to each point-by-point, so just picking one randomly, with scientific grid computation you have no idea of the completion order of work blocks (due to interrupts etc), and you don't want nodes to sit idle, so new blocks get allocated as soon as a node is free. This will result in the block completion order being non-deterministic. If you try and impose an order on block completion it will slow the system down (and I don't see why you would want to).

### I don't see the problem

I also have tried to popularize the idea that 'IO = I -> (O, IO)' should be the canonical type for creating processing tasks. That idea was somewhat related to wanting to be able to express FSMs or be able to express co-algebraic ('like') computation. (I also noted that sometimes typecheckers for FP languages would fall over this definition.)

Somehow people here think, or propose that, since it is FP it should be confluent/deterministic. I disagree, I would propose non-deterministic combinators on this too, and I believe these non-deterministic combinators are fully expressible in most FP languages, like Haskell.

I do agree that losing identity might be a thing; but maybe a good thing? Keeping track of identity in vastly parallel applications might not be worth it too... Better concentrate on the computation than the agents might be worth it.

### Databases and Pure FP

Large scale databases can be written within pure FP. There aren't many examples of this yet. But consider:

LMDB, Sophia, LevelDB, RocksDB and other high performance key-value databases use more or less purely functional data concepts anyway. Persistent trees to support parallel reads. Single writer at a time.

Modeling concurrent writers is possible by having a single writer merge a stream of proposed updates. For example, optimistic transactional concurrency would involve tracking reads and writes against a versioned database value, to form a 'transaction' that can be proposed to the primary writer together with a set of other proposed transactions. (CRDTs would be an interesting option to explore if you want to avoid rejecting some proposed updates.)

The main feature needed for modeling databases as first class values (as opposed to external resources) is the ability to hibernate large values so they aren't part of 'active memory' or touched by fast GC. The approach I favor is to introduce a pair of functions/annotations of the general form:

stow : type → Stowed type
load : Stowed type → type

This needs some runtime support for laziness, such that loading a value shortly after stowing it doesn't actually touch the cold store. I've had a lot of success with a prototype for this idea in Haskell, and using it to model persistent tries. But I'm facing a lot of overheads from poor integration with GC. The runtime I'm currently developing will perform stowage as part GC, after heuristically waiting a few ticks to see if the value is still in use.

In any case, all of this is more or less orthogonal to parallelism. Separation of concerns is preserved. I could thread a database as a first-class value. I could tuck it inside a PF so the hibernation overheads occur in parallel. I could model a database 'service' (a DBMS) via a PF with identity. Etc..

### System Level Programming

I am not sure you got the point, or at least I can't find the relevant part of your reply. CouchDB is a distributed database, you can write to any node, so there is no canonical datastore, the database only exists 'virtually' and the state of data is reconstructed by applying the changes feed from other nodes which may arrive in an non-deterministic order (as nodes may be geographically distributed with different ping times).

You can write this as a bunch of communicating processes, but this does not help us understand and model the whole system including the messages between nodes.

I am thinking about something that would let you describe the whole distributed architecture of the system (not just one node) and prove properties like eventual consistency etc.

If each node is an actor, the messages between nodes are included in the model of the language and the whole system can be reasoned about. We are not working at the program level, but at the system level.

### distributed data and large scale computing

A stow/load mechanism can easily operate with distributed computing and data. A stowed value is, more or less, a specialized future, just like the result value from calling a parallel process function. In place of serializing a large value, a lightweight placeholder for that value is communicated. Perhaps an incremental number together with an HMAC for security purposes. When any node in a distributed computation needs to 'load' that value, it will potentially do so via access to another node in the runtime.

A runtime can move and replicate stowed data as needed. A consequence is that the stow/load mechanism I described above already gives you a distributed database of sorts. It is only for immutable values. But it means that a big tree-structured value that is semantically centralized can be physically sharded/cached/replicated across a bunch of nodes in a runtime.

That covers most of the motivation for a distributed database. You could still model explicit distribution and merge protocols if you want them. I'd probably favor CRDTs and DVCS as an inspiration rather than CouchDB. But for the 99% use case, a simple "a database is a value, and we don't need no management service" concept will serve better.

If each node is an actor, the messages between nodes are included in the model of the language and the whole system can be reasoned about. We are not working at the program level, but at the system level.

Ability to model and reason about systems at the program level is superior to reasoning about and integrating programs at the system level. I'll try to explain. Consider a program specified in three parts:

1. a list of (name, message) pairs describing messages in transit
2. a list of (name, process-function) pairs describing compute nodes
3. a step function that takes the first two lists, does some computing, then returns two new lists

If all three parts are represented in a purely functional language, our behavior will be deterministic and pure. But for real-world integration, a compiler or runtime might offer a default implementation for the third element (similar to Haskell's implicit runIO). Effects might be modeled by messages that use certain names (like "stdout"). The details aren't very relevant for my point.

Ultimately, what you propose for 'system level' reasoning is equivalent to providing just a fragment of the program, leaving the lest undetermined (and hence non-deterministic, from a program behavior standpoint). By itself, that isn't a bad thing. But by pushing this as some sort of language or paradigm-level feature and ideal, you lose the ability to specify the missing component. It becomes invisible, inaccessible. This is the case with actors model, for example.

You will eventually want to support hierarchical models, control effects by wrapping subprograms, support deterministic testing with multiple schedules, add regression tests for specific orderings of events that caused problems before, model environments and mockup effects, etc.. Or if not you, someone wiser.

To do those things, modeling your system as a purely functional program serves very effectively. But it's also necessary to do so without a huge performance loss. So we need a strong separation of concerns - e.g. that parallelism doesn't entangle identity or pervasive non-determinism, and that working with big data doesn't require external resources like databases or filesystems.

By modeling systems at the program level, we can reason about, test, and integrate systems as simple programs. And the problem of high-performance effects without bottlenecks can be addressed by a compiler or runtime that provides a model-specific equivalent to runIO.

### re: readiness and non-deterministic evaluation order

I have not encountered PF as a means of parallelism in any industry-ready language. I haven't really looked for it. I doubt I would find any: most PLs have complicating factors that would hinder transparent use of PF as a means for parallelism. Shared memory resources, non-serializable functions, no control over aliasing, etc..

OTOH, PFs aren't based on any new ideas. They're just simple, old ideas that are known to work well and implement easily: process parallelism, promise pipelining, etc.. So I have confidence in the concept.

Regarding non-deterministic completion order, an equivalent property would still exist for PFs.

Each parallel PF (each 'process') can be understood as having a queue of work to perform - the stream of arguments with which it is called. Work would stall whenever a process waits upon an empty queue or synchronizes a future result from another process. A scheduler might not always wait for work to stall before context switching, instead favoring heuristic 'push back' to avoid building a huge pile of pending requests and unread results. There will generally be an N:M relationship between processes and hardware CPUs. An idle CPU might steal work from a busy one. A runtime for a distributed computation might perform more heuristic load balancing based on communication patterns (i.e. decide whether to move the messages vs. move the processes).

If you need a 'work block' to complete in a fully independent order from all other work, with no long-lived state, you can simply create a process for a single call and drop the returned PF. But that is a rare case. Cases like process pipelines or otherwise structured parallelism are common.

In any case, there will be a lot of 'non-determinism' at the implementation level for evaluation order between processes. I don't usually call that non-determinism, though. That phrase generally refers to program logic. And none of this non-determinism is observable within the program logic. It's just Church-Rosser confluence at work.

Pure functional programming has not historically competed in the niche of open distributed programs. It couldn't have back when Erlang was developed - pure FP wasn't yet a thing then.

I disagree. They're about the same age. About '86-ish. Pure FP has an older tradition but so do Erlang-like languages. They borrow somewhat from each other too.

But Erlang was never developed to scratch an academic itch; it was aimed as a robust language for telephone switching. Hence there are more pragmatic choices in the language I think adding to its popular support.

### Pure FP history

By my accounting, pure FP wasn't a thing before it had idioms to cover the full range of program behavior. Including effects models and concurrency models. I'm curious what you're considering the start of pure FP as a tenable thing. I tend to point not to the start of Haskell and its early exploration of awful ideas for effects, but to the popularization of monads by Wadler c. 95.

Clean programming language, which favors linear types as a basis, is the other source I'm aware of. I haven't paid much attention to its development, but its ideas were still new while Erlang was built on old ideas.

Wadler should be stripped of his position for popularizing a trivial concept. So, I tend to place it earlier, somewhere around APL, Z, Squigol, and Miranda,

I don't particularly care for monads, either. But effects models for 'pure functional programming' to be a complete paradigm weren't really present earlier. Miranda, for example, has an impure concept for reading a file (ref). I know APL isn't better with respect to file manipulation. I don't know anything about Z or Squigol.

### Ten thousand competing manners of looking at monads

There are about ten thousand competing manners of looking at monads, which probably added to the confusion of what a monad is supposed to be. I personally prefer: 'useful idiom.'

Clean had a competing effect model, contentiously more general, with uniqueness types.

Whether IO Monads are that pure in comparison to Miranda? I think that's contentious too. 'runIO' is never made explicit, so you are effectively reasoning over sequences of actions you don't know the semantics of. Arguably, doesn't look much better to me. You shoved the dirty parts under the carpet, so what? (Water under the bridge, and all that.)

I would somewhere place at the heart of modern functional programming the idea that you can use higher-order functions, or combinators, to structure your programs. A monad is just one instance of that, before that, everything was supposed to be some list comprehension (popularized by APL, LISP, and Miranda, no doubt).

I personally don't care much for purity when interacting with the outside world, but -yes- the Haskell compiler certainly does. Not sure I as a programmer should give into the whims of pure functional compiler construction where you're not supposed to break some invariants. Even if that buys me something, at occasions.

### re: runIO

I think it would take a deep misunderstanding of monadic effects models for it to be 'contentious' that modeling file read as an IO request is pure where Miranda's read is not. But the opaque nature of runIO does lend itself to such misunderstandings.

If IO were modeled/implemented instead as a free monad with extensible effects, enabling programmers to freely interpose their own runIO request handlers (e.g. to log all filesystem requests by a subprogram, or mockup the filesystem), then it would be a lot easier to explain and understand how this is pure.

Whether you care about purity or not, it's useful to at least agree on definitions, e.g. that 'purely functional programming' obviously isn't happening insofar as we rely on impure effects models. :D

### Metaprogramming Model

Monads effectively allow pure functional composition of impure program fragments. In a language like ML, the function arrow '->' is impure and any function application can result in side effects. In Haskell the arrow '->' is pure and so side effects cannot happen. Monadic functions (lets consider the IO monad for simplicity) take pure arguments, and return an impure program. "a -> IO b" is a pure function that takes a pure argument and returns an impure program. The impure program is opaque and is effectively hidden in the type deconstructor for IO.

When you think about it like this it is obvious you can have a pure program that manipulates impure program fragments, and outputs an impure program as its result. Then all you need to do is run the final impure program, this but is hidden in Haskell, but because 'main' is type IO (), all Haskell programs result in an opaque impure program that is then run. Monads are therefore used as a kind of metaprogramming.

### Which Effects?

I am wary of all papers with the term 'Monad' in it. I've stated it before, if your operators are solely determined by unit and bind, I cannot trust you're thinking straight. I.e.,

unit: a -> M a                      (* result is in M a *)

bind: M b -> (b -> M a) -> M a      (* result is in M a *)


It is trivial to observe that any result object is a value in M a and, in the absence of observers, you cannot state anything meaningful about that value. (Absurdly, for any a, M a might denote a singleton set - everything is mapped to the same value - for instance.)

So, sure, you can chain effect types but you do not know anything in a formal mathematical sense about the result value.

I have already stated that Haskell's IO paradigm predominantly relies on the/this fact that a Monad acts as a logical sink. Everything is supposedly pure since you cannot even express what an impure computation even does (worse, it is categorical mathematical sloppiness which makes it work.) At least, Clean's uniqueness types gives you that; moreover, that extends to formal efficient treatment of 'impure' data structures.

A combination of a logical sink and no meaningful manner of expressing IO in a fundamental way? I really, really, really can't care much about how Haskell does IO.

So maybe Oleg did something smart, again. But I am not going to read it.

With monadic effects, the monad is itself actually a type-level-set. We could represent a type-level-set as [], and this can contain elements that represent the type of effect like ReadStdIn, WriteStdOut , etc. As such an IO function would have monadic types like this:

readLn :: [ReadStdIn] String
writeLn :: String -> [WriteStdOut] ()


If we write a simple program the monadic effects can be inferred:

test = do
writeLn a
return ()


test would have the type: [ReadStdIn, WriteStdOut] (). You do have to change the type signature of unit and bind to something like:

unit : a -> [] a
bind : [e] b -> (b -> [f] a) -> [e + f] a


Where '+' is the union operation on type-level-sets

So if you have effects for MemoryAllocation and all other side effects, providing the primitive functions have the correct types you can infer the set of effects for the monad. So the type of every function lists precisely which side effects it can cause.

### Extending the Hack

Look. It's another neat hack. But it doesn't change the fact that, formally, you cannot reason about values.

Is that a big problem? Not for programmers, everything works as expected; given some underspecified notion of what kind of values may occur, the operational model gives back expected results.

Is that good enough for a bright future of pure functional programming? Probably, since nobody really cares about functional correctness. But the whole IO Monad hack makes sure you lost thinking about correctness in an algebraic fashion.

Without observers, you cannot hope to prove any correctness property regarding input/output observations (since you very effectively got rid of them in a logical sink.)

But, again, in a sloppy manner it gives back programmers what they expect.

### Essential uncertainty of the future

I don't think monads have to be a "logical sink" if we factor in dependent types. One of the values a type exports can be a guarantee saying that, for instance, a write followed by a read will read the same value that was written.

With enough additional operations/guarantees, a monad can be a completely transparent type. For instance, even without dependent types, the Monad instances State and [] are not "logical sinks." You can fully deconstruct them using runState and the [] and :: patterns, so there's nothing you don't know about them.

An IO monad design can't become 100% transparent (Edit: but see below), but it can get much closer. Its opaqueness is partly inherent in the organizational structure of the system: There's a natural encapsulation barrier blocking information from the future (when an action is deployed to the outside world) from arriving at the present (when the computation is deciding upon its action). We may be responsible for mitigating this source of uncertainty too, but we won't replace it with certainty; we'll address it with things like empirical tests and social learning.

It's also possible there's a much easier way to compose these guarantees than by sticking them on a monad. What I'm talking about seems like it could require the programmer to write proofs using those guarantees, and another approach might make the most commonly needed proofs hold by construction.

(Edit: David's post mentions the IO monad could let clients "intercept each request as a massive GADT" to be 100% transparent. I suppose that works! I should admit there's a qualitative aspect to the concept of "transparent" I'm talking about; if you have access to 100% of the information but you have to choose the right nondeterministic program paths, or write the right classical logic proofs, or find the right private key to break the encryption, then maybe it's not really that transparent.)

### Relies entirely on deconstructors outside the Monad

You can fully deconstruct them using runState and the [] and :: patterns, so there's nothing you don't know about them.

Which is my point exactly. I've seen Wadler try to prove properties solely using the monad laws. You cannot do that, for all we know, 'IO anytype' is isomorphic to the set {0}.

Moreover, you are referring to pure mathematical concepts for which deconstructors exist. My point, IO didn't solve the purity problem, it only shoved it underneath the carpet. Now we lost algebraic reasoning over effects.

I rest my case.

### My problems with monads are different

Which is my point exactly. I've seen Wadler try to prove properties solely using the monad laws. You cannot do that,

If you can do that, it has many applications, right? One person offering general tools that use a small set of assumptions certainly doesn't mean the rest of us have "lost algebraic reasoning."

I see all kinds of additional assumptions being tacked on in algebra. We have a group... but also abelian. A category, but also with a terminal element. A monad, but also commutative.

In this case, a monad, but also with [] and ::.

It seems like this is a widespread part of algebraic reasoning.

for all we know, 'IO anytype' is isomorphic to the set {0}.

I guess you must see something wrong with that. What if the language isn't implemented yet? Then all programs do nothing. :)

Anyway, take whatever property you would like it to have, and add that guarantee as an additional operation in your specification of 'IO anytype'. That way, you can get to the algebra you actually want to work with. I'm not going to say it's easy, though.

---

I very much appreciate and relate to the general sentiment that we should find alternatives to monads, but part of my point is that if we require programs' behavior to follow branching timelines, those timelines are already going to support a kind of monadic composition whether we like it or not.

I've recently been pursuing an approach where expressions don't branch as they evaluate, but they're run under a system which offers state resources that do branch in between expression evaluations. The expressions return actions to perform on the state resources, but those actions themselves only combine by concurrent composition, and they have no result values.

I'm excited to see what I can do with this, but I won't manage to avoid monads altogether unless my programs manage to avoid chatty interactions with the outside world. I hope to minimize the use of monads and chattiness; if I manage to keep programs short along the interaction axis and wide along the concurrent axis, I think there will be a more direct correspondence between the program behavior and its source code, for better control all around.

### do tell

post something? where's your blog? how are you same/different than rdp? etc.

Can you expand on what you mean by "you cannot reason abou values", "functional correctness" and "Without observers, you cannot hope to prove any correctness property regarding input/output observations"?

There clearly are values visible with monads, it's just that the values are impure programs. You could also understand the result of a monadic function as a unique opaque value, in the same way that clean uses unique types.

So a simple monadic function like "readLn" with type IO String, can be seen to return a unique opaque value, in the same way that any constructor hides the value of its contents, and runIO unpacks the contents which is a string, but as the value is unique every "instance" of readLn returns a different value and hence runIO needs to individually map it to a String, you cannot memoise the result.

So Clean uses unique types to force the IO function itself to be revaluated every time, Haskell runs the function once to create a unique value representation, that forces the deconstructor to be revaluated every time.

I don't believe the specific details of Haskell monads are especially germane to any issue we've been discussing.

In the unlikely case you're interested: Adam Megacz did some work a few years back regarding Generalized Arrows, which allow you to specify EDSLs that can be more constrained than the host language. This includes constraining values contained by an arrow. Generalized monads are implicit to these arrows if you have the ArrowLoop instance, and would have the same constraints.

In any case, Haskell IO does not rely on being a "logical sink". That IO is opaque is a coincidental property, not an essential one. If Haskell provided an openIO function that let a client intercept each request as a massive GADT, that wouldn't break the semantics. (Though, using that function might increase binary size. A compiler can easily take advantage of IO being opaque.)

Oleg's free monads and extensible effects model is more or less based on that sort of threaded interpretation for a GADT of commands.

I really, really, really can't care much about how Haskell does IO.

I could respect this alleged apathy more if you also didn't bother having a strong opinion about it.

### Illogical Nonsense

Then start with a mathematical refutation that it isn't a logical sink. I would be really interested.

But I am not going to read up on concepts I deem irrational. Pardon me, but your whole post reads like rubbish if they didn't get the fundamentals right.

### re: Illogical nonsense

Any instance of a free monad (link is lightweight primer) not a logical sink. The commands you stick into it can be observed as a stream of command values. You can to write your own 'effects observer' if you wish to do so.

Haskell does not rely on IO being a logical sink. That is, existing Haskell programs wouldn't work any differently if IO were implemented as a free monad. Haskell would only become more expressive, because we could capture a subprogram's commands to the network, filesystem, FFI, etc.. and model them instead as operating in a mockup environment.

The Oleg paper on freer monads improves and refines the free monad concept. Simpler. More efficient. More extensible. Command values are fully separated from the command stream structure. It's been seriously discussed as a replacement for Haskell IO.

Generalized arrows (and monads based on them) go further. You can reason about them based on the value types involved, not just the structure. Substructural value types can also work in this role. I.e. types that you cannot create out of nothing or destroy arbitrarily can ensure certain behaviors in a subprogram based on its input and output types.

In any case, I have no interest in trying to defend or explain monads. If you can't be bothered to read up on concepts before you deem them irrational, that's your problem.

### Sophistry

Are you Trump? Tossing in more words to win an argument has an academic history, I agree, but I fail to see the relevance.

Just show me a monad isn't a logical sink.

### Your earlier claim is this:

Your earlier claim is this: "Haskell's IO paradigm predominantly relies on the fact that a Monad acts as a logical sink." And by logical sink you refer to "you are effectively reasoning over sequences of actions you don't know the semantics of". Reliance means something should break if that weren't the case. Which is absurd.

The general monad concept and monad laws are certainly general enough to allow for logical sinks. Nobody is arguing against that point. Though use of substructural types or dependent types could enforce quite a few behavioral properties even through a general monad binding because they also work through pure functions. (Aside: Many of your complaints about monads could be presented against functions in general. Just replace unit with id and bind with compose`.) Generalized monad abstractions, based on generalized arrows, can capture structure a lot more precisely (e.g. enforcing a structure matching a DSL).

Anyhow, IO is a specific monad. It is not a general monad concept. Haskell's IO paradigm does not rely on the IO monad being a logical sink. It works just as well if IO were implemented as a free monad, such that you could reason about it (for example) via assertions after interpreting the action sequence in some mockup environments. We are never restricted to just monadic laws when reasoning about specific monads.

TLDR:

If you require a proof that monads in general aren't logical sinks in order to refute your claim that we (somehow) rely on IO specifically being a logical sink... then the sophistry is yours, you sanctimonious ass-hat.

### You need a logical sink

I would argue you need a logical sink. If the type system is sophisticated enough to represent any computation then you in effect write the program twice, or with type inference you infer a second program with the same meaning from the first.

What makes a type system useful is that it loses information: "\x y . x + y" does not have the type: "a -> b -> a + b" for a reason. Likewise with side effects we do not want the type to be the exact operations performed, just a list of the types of side effects. This allows us to use constraints on higher rank types to, for example require a function argument may not allocate memory for example.

So a monad is a logical sink for types, but that is what you want. You could argue that the type "Int -> Int" is a logical sink for integer filters, as any filter with that type could be in there.

Remember the type does not change the program, so just because we give a program a monadic type, does not change the program, you could give the same program a non monadic type, and it would still do the same thing.

To put it another way, I don't understand what extra you expect a type to do.

### a line in the sand

"We will bomb you you B52s if you do not let me specify the exact operations performed!!"

I just mean that one moment we want something that is an equivalence class of operations (e.g. the big wide generic hammer term "IO"), and the next moment we want to break it down further and then we have to invent new types and rework things to get that detail into the static typing. I want that kind of "dialing in" of detail to be really easy and water tight. E.g. "this only writes to the display, not the printer" or "the side effects are not worse than O(n)".

It makes me think of capability approaches to security. I want to have some abstract generic infrastructure that lets me dial in what i want to dis/allow or track and monitor.