Are Monads a Waste of Time?

I was thinking about this yesterday. If the point of functional code is that it is easier to reason about, should we like Monads? Suppose we have an interpreter, and that we construct a sequence of commands in the free monad. Given a program like this:

f = do
   s <- newRef 
   writeRef s 23
   writeRef s 42
   readRef s

Now, I have deliberately not specified the implementation of the interpreter: it could be pure, using state threading; it could be impure, using IORef; or it could be any other valid implementation in Haskell, except that we won't allow "unsafe" code.
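For concreteness, here is a minimal sketch of what one such setup might look like (the command constructors, the use of Int keys for references, the "free" package, and the pure interpreter below are all assumptions for illustration, not a prescribed implementation):

{-# LANGUAGE DeriveFunctor #-}

import Control.Monad.Free (Free (..), liftF)
import qualified Data.Map as M

-- Commands of the embedded language.
data Cmd k
  = NewRef (Int -> k)          -- allocate a reference, returning its key
  | WriteRef Int Int k         -- write a value to a reference
  | ReadRef Int (Int -> k)     -- read a reference
  deriving Functor

newRef :: Free Cmd Int
newRef = liftF (NewRef id)

writeRef :: Int -> Int -> Free Cmd ()
writeRef r v = liftF (WriteRef r v ())

readRef :: Int -> Free Cmd Int
readRef r = liftF (ReadRef r id)

-- One possible interpreter: pure state threading over an explicit map.
-- runPure f == 42 for the program above.
runPure :: Free Cmd a -> a
runPure prog = go prog 0 M.empty
  where
    go (Pure a)                _ _   = a
    go (Free (NewRef k))       n env = go (k n) (n + 1) (M.insert n 0 env)
    go (Free (WriteRef r v k)) n env = go k n (M.insert r v env)
    go (Free (ReadRef r k))    n env = go (k (M.findWithDefault 0 r env)) n env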

The question is: just looking at the do-block for f above, it looks like "s" is mutated. Yes, the implementation in the monad could be pure and use state threading, but how does that help us reason about the above imperative program? How is the above program any better than the following JavaScript program:

function f() {
   var s
   s = 23
   s = 42
   return s
}

Ignoring syntax differences (because we can trivially translate from one syntax to the other), surely any argument about the free-monad-with-interpreter implementation that we can make about the first example can also be made about this code? Surely this means that this second imperative program is no easier or harder to reason about than the first?

Hence my deliberately controversial title: why write an imperative program in a monad? Either we should actually write code in a visibly referentially transparent way like this:

f = let s1 = 23 in let s2 = 42 in s2

Or we may as well just use an impure language to start with. (I am deliberately ignoring the fact that parts of a program could be pure in Haskell).

The second part of this question is: if making a program look as if it had state is as bad as having state from an analysis point of view, how could we prevent this in the type system? What loophole is allowing 's' to behave as if it had state, and how can that be closed?


Embedded Interpreters

Some answers that occur to me, for discussion, are that the first example really is no better than the second. We have in effect implemented an impure imperative language and embedded it in our pure functional language. This is always going to be possible: our pure functional language is Turing-complete, so if we can write a program that interprets a different impure imperative language held in a file or string, and we can compose fragments within the pure functional language that represent parts of this impure imperative language's abstract syntax, you can't stop programmers doing this kind of embedding. The final part of my answer, given my current thinking, is that even though you can do this, it is not a good idea, as you may as well program in an existing impure imperative language, just as it is not a good idea to shoot yourself in the foot, although you can.

Edit: Another way to look at this is we could create an embedding for BASIC into our pure functional language, but this does not make BASIC a better language to program in. Does an embedding of a BASIC interpreter into a pure functional language make it any easier to reason about BASIC programs?

JavaScript is a mess with

JavaScript is a mess with its truthiness, implicit conversions, dubious semicolon insertion, second class references, automatic variable declarations, weird scoping, etc.. You'd need to try harder to make the Haskell imperative monad just as bad as JavaScript. Writing a JavaScript interpreter would do it, though.

I agree with one of your general hypotheses, that reasoning within the embedded language isn't easier than implied by the features of said language.

We do gain some benefits though. Like, JavaScript needs specific language constructs for looping. But with monadic languages, we can use generic loop and control abstractions. And there's also abstraction of the monad type itself, e.g. use of "writeRef" and "readRef" could be abstracted under a typeclass (e.g. using type family for reference type), allowing this little program fragment to be reused for many language feature sets.
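For instance, a rough sketch of that abstraction (the class name, the associated type family, and giving newRef an initial value are assumptions here, not a standard API):

{-# LANGUAGE TypeFamilies #-}

-- A hypothetical class abstracting over mutable references, with an
-- associated type family naming each monad's reference type.
class Monad m => MonadRef m where
  type Ref m :: * -> *
  newRef   :: a -> m (Ref m a)
  readRef  :: Ref m a -> m a
  writeRef :: Ref m a -> a -> m ()

-- The little program fragment, now reusable with any instance
-- (IORef in IO, STRef in ST, a pure simulation, ...).
frag :: MonadRef m => m Int
frag = do
  s <- newRef (0 :: Int)
  writeRef s 23
  writeRef s 42
  readRef s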

I won't posit that monads are the best option. Monadic behavior is an especially awkward fit for concurrency. We could try process networks as an alternative to monadic effects models.

In retrospect, I think monads did not waste my time, because they allowed me to think about my effects models and explore various alternatives in ways that working in concrete languages never did. This is liberating. When my entire programming experience was in imperative languages, I had little cause to even begin thinking about other ways of doing things, and when I did explore, the language's pervasive features would fight me.

Like, what if we take an imperative language but remove only "new" from "newRef" etc.. Instead a space of references will be provided to the program and accessed by URL-like values. This makes saving program state, backtracking, inspecting and rendering state, pausing and resuming or continuously updating and compiling the program, etc.. much easier. I wouldn't easily think about this in languages where fresh variables are cheap and pervasive.

Monads are not a waste of time.

Following Betteridge's law of headlines...

I don't think Monads, as abstract algebraic concepts, are a waste of time. My thinking was more along the lines of unintended consequences. We take something that seems harmless (Turing-completeness and the ability to write interpreters) and something else harmless (the free monad) and put them together to create something that appears to break referential transparency. I know it does not technically break referential transparency because of state threading, but when you read that embedded program it looks like it does, and I don't think it helps make that embedded program any more analysable. It's a loophole that lets us have stateful objects.

Now if we admit we want stateful objects, isn't all this linguistic boilerplate and complexity a problem and not a solution? We may as well build stateful objects into the language directly and design a type system that deals with it nicely.

One baked-in way of doing things?

Even if we want stateful objects, I might not want the design that you're thinking about. There are a lot of approaches to state (and objects) based on whether we want global features like undo, replay, transactions, publish-subscribe, iteration over objects, choice of concurrency model, etc..

You dismiss monads as just "a loophole that lets us have stateful objects." But I think of them more as "a substrate for a sequential DSL" which might not have arbitrary state (e.g. parser combinators, or restricting shared state to CRDTs) or might have non-conventional ideas about state (e.g. STM, ST, or adding logical timestamps to states and aggregating events to compute the next state, etc..).

Of course we can model languages that are difficult to reason about. But we can also model languages that are relatively easier to reason about, where some equational reasoning might still hold. That flexibility has been useful to me.

Algebraic Effects

Algebraic effects seem to be able to do all the things you list as important, without needing a monad as an explicit construct. You could use the argument that Monads are algebraic properties, so something that has the correct properties is still a monad whether you explicitly declare it as such or not (and I myself have made that argument in the past), but it does not require the programmer to understand Monads.

The interesting question for me here is, if you have algebraic effects, what else do you need to have a concise, self-consistent language? Do you need constructor classes? Do you need higher kinded types?

Algebraic effects

Algebraic effects are useful. If you use the operational monad together with some row polymorphic variants (cf. the "freer" monad of Oleg Kiselyov's "Free and Freer Monads") you'll essentially get algebraic effects. Any monad can be implemented this way.
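For reference, a stripped-down sketch of that construction (a single-effect Freer without the row-polymorphic open union of effects described in the paper; the Teletype effect is an invented example):

{-# LANGUAGE GADTs #-}

-- The "freer" monad over an effect signature f; note that f needs no
-- Functor instance, because the continuation is stored separately.
data Freer f a where
  Pure :: a -> Freer f a
  Bind :: f x -> (x -> Freer f a) -> Freer f a

instance Functor (Freer f) where
  fmap g (Pure a)   = Pure (g a)
  fmap g (Bind e k) = Bind e (fmap g . k)

instance Applicative (Freer f) where
  pure = Pure
  mf <*> ma = mf >>= \g -> fmap g ma

instance Monad (Freer f) where
  Pure a   >>= g = g a
  Bind e k >>= g = Bind e (\x -> k x >>= g)

-- An effect signature and its handler are defined separately,
-- much as with algebraic effects.
data Teletype x where
  PutLine :: String -> Teletype ()
  GetLine :: Teletype String

runTeletype :: Freer Teletype a -> IO a
runTeletype (Pure a)             = pure a
runTeletype (Bind (PutLine s) k) = putStrLn s >> runTeletype (k ())
runTeletype (Bind GetLine k)     = getLine >>= runTeletype . k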

However, not all DSLs are sequential. For example, some represent process networks. In these cases, monads may be an awkward fit, as would be a PL that pervasively assumes sequential effects. Instead we might want arrows or applicatives or bifunctors or something else. An advantage of modeling monads explicitly is that you can simply avoid the concept when you don't want it. You could do the same for algebraic effects by marking lots of components pure, which isn't a huge issue but doesn't really leverage the language features.

So you're still making a design choice by baking in algebraic effects. It isn't a bad choice, though. Increasing performance at the cost of awkward expression for some niche use cases might be a worthwhile trade.

I cannot answer your interesting question. Concise language is a verbose topic.

Haskell promotes bad imperative programming?

Who would have guessed..

perish the thought

"I am shocked — shocked — to find that gambling is going on in here!" — Captain Renault, Casablanca, 1942.

Haskell has been called ...

... the world's finest imperative programming language.

Common wisdom

Okay, the preceding was too snarky, but Keean, you're discussing common knowledge here. This has been discussed before at length here on LtU, with people subscribing to various views.

Sure, a monad over a state means you'll end up with stateful reasoning to prove properties correct. That's just a no-brainer. There are some merits, like proper encapsulation of state, and some disadvantages, like poor composability of monads and the fact that this just isn't what pure functional programming set out to be.

About the latter, the hope was that functional programming would lead to a style with trivial correctness through mostly equational reasoning for instance. Monads completely shattered that dream. Now functional programmers are writing imperative programs in a bad manner. That wasn't the aim, far from it.

Values vs programs

Written in monadic style, f is a value that can be introspected and manipulated like any other value. You can write a static analysis function to ensure the value satisfies whatever properties you want, without having to rewrite or extend the actual underlying compiler. Also, the value is built from a fixed set of primitives, so you can easily know what all the possible effects are (e.g. you don't need to worry about exceptions if the EDSL doesn't have them).
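For example, assuming the free-monad Cmd encoding sketched in the opening post, a hypothetical analysis over the program value might count writes without ever running the program (note that, because continuations are ordinary functions, the analysis has to feed them dummy values to walk the structure):

-- Count how many writes the program value performs.
countWrites :: Free Cmd a -> Int
countWrites (Pure _)                = 0
countWrites (Free (NewRef k))       = countWrites (k 0)
countWrites (Free (WriteRef _ _ k)) = 1 + countWrites k
countWrites (Free (ReadRef _ k))    = countWrites (k 0)

-- countWrites f == 2 for the program at the top of the thread.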

The JavaScript program is a black box*: you can't do much with it other than run it to see what it does. You can arguably extend the compiler to do any analysis you want or write extra tools, but you always have to be wary of the full language semantics and possible effects.

* Well actually you can use f.toString() to get the source, but then you need a JavaScript parser, and now you have a yak herd to shave before you can properly analyse the program.

Agree

I don't think the important concept is `monad'. It's having a pure semantics for whatever impure constructs your language provides. To me it's clear that this is a good idea. Advantages are simpler core semantics, type system support for tracking which imperative features are used where, ability to reinterpret or add new impure constructs.

(Though just using monadic style doesn't afford much introspection because continuations are opaque.)

Complete Capitulation

Then don't compare to JavaScript but to an OO language which neatly encapsulates state too. Tracking what goes on isn't much more difficult than observing the member variables.

You're thinking of Haskell when you propose that f can be introspected and manipulated like any other value. You can design OO languages that way too if you feel like it. And you probably shouldn't, for performance reasons.

Haskell can get away with a lot of things because at heart it is a glorified expression evaluator. So far, they get reasonable performance, but I wouldn't compare it to C. Stated differently, you seem to want to reason over and compose 'f'; sure, that's academically nice, but it comes at a performance cost people aren't willing to pay.

Monads, as used by most Haskell programmers, were just a complete capitulation that imperative programming, one might even say OO, is simply more productive, or even 'better'. And these programmers don't think about composability of 'f'.

Apples vs Potatoes

One can write in a monadic style* in any language; Haskell just has a nicer syntax, but one could reasonably add a nicer syntax layer in a Lisp or any language with macros. The syntax, though, isn't the important part; the ability to compose programs as values is.

Usually in FPLs, Haskell particularly, people tend to use this style much more because the language provides simple data definitions, pattern matching, etc., which help write these EDSLs. I wrote such programs in Smalltalk and Java (e.g. monadic parser combinators) too, but that's not the usual style in those languages.

WRT performance, there are many reasons why a free monad interpretation of state in Haskell would have poor performance when compared to the same code written in C, not just because "imperative programming is more productive and/or has better performance".

If a C programmer writes a naive FP interpreter in C, it will perform much worse than the same program written directly in Haskell. That doesn't mean Haskell is inherently better than C, but that the usual Haskell compilers have far more engineering effort in them than whatever hours the C programmer put into his interpreter; ditto for a free monad interpretation of state in Haskell vs. C.

As to monads being a "complete capitulation that imperative programming is more productive", it's complete nonsense when you look at actual Haskell programs written in the wild. Yes, they include sections written in an imperative style, but most of the code isn't in monadic style, and even the parts written in monadic style aren't imperative programs in disguise. It's a stretch to call a parser defined using monadic parser combinators an imperative program (the imperative version wouldn't be syntactically anywhere near the version written in Haskell), and yet it uses monads.

Any decently sized real world program has multiple styles of programming in it, not because the main style "capitulated" that the other styles are more productive for all purposes, but because there is value in specialization. Using SQL for querying instead of the imperative equivalent of walking through the B-trees and indices is more productive, yet that says nothing about the productivity of the programs that use such SQL statements; ditto for regular expressions, neural networks, and such.

No Liberation from the Von Neumann Style

I disagree with your observations, which of course may differ from my own random observations. Imperative Haskell programs seem to be the norm.

It was promised to me I would be liberated from the Von Neumann style of programming. That didn't pan out. That's why I call it a complete capitulation.

Re: Capitulation

You claim that capitulation implies imperative is "more productive, or even 'better'". But as far as I can tell, it's only a capitulation that imperative is the best way to integrate with imperative OS system call and foreign function APIs.

A model of interaction is imposed by the environment. Trying anything else will introduce presumptuous layers of abstraction and overhead. Best leave those layers to the programmers, right? Right.

I wouldn't assume the dominance of imperative is due to intrinsic qualities or alleged benefits of said paradigm. Path dependence and inertia are a sufficient explanation.

Sequencing Effects and Identity with State.

There are two factors that distinguish impure imperative languages: IO requires sequencing by definition, therefore a language that respects the order of effects is necessary; and IO objects have identity and state (read the file from the USB stick in slot 1).

Without both identity/state and effect order you could not store data to a particular object and then remove the object knowing the data is on it.

Re: Identity

We don't need identity, and state does not imply identity. We could use external wiring of processes like flow-based programming or Kahn process networks. We could use topic-based publish-subscribe. We could model a blackboard system. We can have pools of threads that in each step simply propose, receive, or suppress events, like behavioral programming. We can have content addressed memory and networks, or scoped broadcasts. There are many ways to approach IO that don't require a concept of identity.

USB

So how do I know which USB device I have saved my file to?

Context

How do you know what "USB slot 1" means to you and your system?

If you want to save a file to a detachable storage device and you have more than one option, you'll probably examine the data already on that device in order to identify your target. Or perhaps you'll choose the most recently added device. I think you assume we'll indirect through an address such as "USB slot 1", and that's not a bad idea for efficiency reasons. But there is nothing to fundamentally prevent us from representing events of the form "save the file foo.txt to every connected storage device that already has bar.txt". Urbit essentially uses this with its network, broadcasting to all nodes that have a specific public key.

Similarly, if we were focused on external wiring like flow-based-programming, we might instead prefer to externally wire a process's output stream to a USB file. Like use of `myProc > /mnt/usb1/foo.txt` in a shell. Naturally, the human would still need to establish what "usb1" means in context. But `myProc` doesn't need to care or deal with object identity. Local ports/channels like `stdout` are sufficient.

Channels and State Machines

How would you write the operating system that connects the streams to files? Just pushing the problem outside the box seems a poor solution to me.

I agree that channels are sufficient for all IO, but you need to run a protocol over the channel, which means a state machine at each end. I seem to remember a previous discussion where we disagreed about the nature of a state machine. The summary of that discussion is that you can view the 'plan' of a state machine as declarative data, but you still need to respect the order in which events (messages over your channel protocols) arrive. You can describe state machines with pure functional transition functions, and infer the state storage, but such storage must still exist; you still need the states to implement your state machine plan.

These transition functions require naming all the states, which means in a sequential protocol you will have many states A, B, C, D... etc, and transitions A -> B, B -> C, C -> D etc. This seems like a lot of boilerplate, and we could use a monad to sequence it, resulting in the protocol being expressed in an imperative form. Imperative form seems optimal when linear sequences dominate the state machine's state diagram, but an arrow notation seems better when there is a high branching factor from every state.
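A rough illustration of the contrast (the states and events here are invented):

-- Explicit transition function: every intermediate state must be named.
data S = A | B | C | D deriving (Eq, Show)
data Event = Hello | Payload Int | Bye

step :: S -> Event -> S
step A Hello       = B
step B (Payload _) = C
step C Bye         = D
step s _           = s   -- ignore events that don't fit the protocol

-- A linear protocol sequenced monadically: the intermediate states are
-- implicit in the position within the do-block.
protocol :: Monad m => (Event -> m ()) -> m Event -> m ()
protocol send recv = do
  send Hello
  e <- recv
  case e of
    Payload n -> send (Payload (n + 1)) >> send Bye
    _         -> pure ()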

I think the imperative style developed and dominates because these state machines in general have low branching factors and tend towards linear sequences. From the above we could simply view the imperative form as a simpler and more convenient notation where we allow the compiler to infer the state names.

To put it another way, we can view every JavaScript program as being in the IO monad, but there is no need to actually compile it that way when implementing on stateful hardware. There is no benefit to rewriting this program into the Haskell IO monad if we don't take advantage of the pure parts of the language, and adding pure functions to JavaScript would be a viable alternative (ignoring all the problems with the actual JS language which are not relevant to this thought experiment).

In the end I come back to Turing, and Stepanov. State is a wonderful thing that Turing introduced to mathematics. It expanded what mathematics could model. We do not want to go backwards, we should embrace state and strive to further improve the mathematical tools for dealing with it.

State is not identity

I did not say we don't need state. I very specifically said we don't need identity. I'm not sure why your counterargument only mentions state and not once mentions identity? These two concepts are not equivalent and should not be conflated. State is essential complexity. Identity can be avoided.

To avoid equivocation, I'm more specifically referring to the sort of "object identity" common to imperative or OO code, observed through aliasing. I believe this is the identity you intended. Linear objects do not have observable object identity. Because no aliasing. Values do not have object identity. Floaters in cellular automata do not have object identity.

Of course, purely functional systems can indirectly model object identity, e.g. by treating table keys as object references in a monadic context. And identity might even be useful or efficient. I'm not claiming that we should avoid identity at all costs. (It's "new" identities - as in `new Object()` or `newIORef` - that I find troublesome. I frequently use filesystem or URL paths.) But unlike a model of state, we don't "need" a model of identity within our software systems. There exist ways to do without.

Some of those ways: Make all objects linear, and model references/addresses explicitly if needed (like concept-oriented programming). Instead of addressable objects, favor mobile object concepts and locality-relative communication models (e.g. rendezvous, or broadcast to neighbors). Use "content addressed" storage, registry, discovery, and network models, i.e. moving based on intrinsic properties of things (types, data) instead of extrinsic namespace management. Record anonymous data entries, like datalog tuples or topic based publish subscribe.

Even filesystems could be replaced to avoid the object identity of filenames, if you insist on begging the question.

Relational Algebra

Does relational algebra offer a replacement for filesystems that does not have identity? Table rows are just tuples of values?

Not directly

You could certainly take inspiration from a relational algebra. But by itself I imagine it would be awkward and rigid. If that's a direction you choose, I would suggest pursuing something closer in nature to Clojure's Datomic or the Bloom programming language.

On the Origin of Objects

If you haven't already read Brian Cantwell Smith's book "On the Origin of Objects", I think you might enjoy it. A large part of it is about the problematic nature of objecthood and identity.

State, Identity, and Syntax

What does identity look like syntactically:

s.update(43)

Contrast this with no identity:

s2 = s1.update(43)

Here you can see that identity and mutable state are conflated. In fact I don't see how you can separate them. Values are immutable and have no identity (every 2 is the same, so 1 + 1 == 3 - 1); however, as soon as you have something mutable (stateful), you need to know which one you are manipulating. A = 1, B = 3, A += 1, B -= 1, A != B (the value of A may equal the value of B, but they are not the same object; their identity is different).

How can you separate state from identity? How can we have multiple mutable state containers (objects) without knowing which one we are mutating?

Re: state, identity, syntax

I have read about a functional programming language extended with object-oriented concepts. In this language, the syntax `s.update(43)` was logically a linear replacement (or shadowing) of `s` in the environment. By convention, when using this syntax we'd want function `update` to return an object of the same or similar type. We also need a form `result <- s.update(43)` wherein `update` must return a pair. The language in question was briefly developed to describe a bridge between FP and OOP - mutable references were introduced in the next chapter. But it left an impression on me.

We can develop reasonable OOP systems based on mobile linear values/objects that communicate through local rendezvous or through a medium of shared environments. Arguably, this would not be too different from how physical objects interact with their physical environments IRL. Physical objects can be understood as linear structures interacting in physical media or manipulated by pervasive physics rules.

Modeling conventional OOP systems above linear ones is also not difficult: it simply involves explicitly creating a relatively stable addressing and routing model for the objects, a lot like homes or mailboxes for humans, where our objects reside permanently or perhaps occasionally visit. The stable address then serves as a form of object identity. But this identity is not pervasively conflated with object state: we aren't forced to use this addressing model everywhere.

Given how identity and even objecthood is a rather dubious concept (in philosophy), this separation appeals to me. Well, I also appreciate it because it works nicely with linear objects/values, which in turn work nicely with purely functional programming and term-rewriting and various other models.

Consider a functional programming language with a concatenative syntax (such as Joy, Cat, Kitten). In these languages, linearity is an easy fit because `copy` becomes an explicit function, clearly distinct from move-operations like `swap` or `roll`. The environment - the stack - itself is also a linear object. Your two examples would correspond to `s 43 update` versus `s copy 43 update`; the latter results in having two versions of `s` in the linear environment.

IMO, functional programming using concatenative combinators does a lot to make these mechanical and linear aspects more obvious. OTOH, it's trivial to desugar lambdas into a concatenative syntax.

Assuming:
[B][A]a == A[B]   (apply)
[B][A]b == [[B]A] (bind)
   [A]c == [A][A] (copy)
   [A]d ==        (drop)

Defining T(X,E) such that:
   X T(X,E) == E for value type X

   T(X,E) if E does not contain X == d E
   T(X,X) ==
   T(X,[E]) == [T(X,E)] b
   T(X,F G)
     if only F contains X == T(X,F) G
     if only G contains X == [F] a T(X,G)
     if both contain X    == c [T(X,F)] a T(X,G)

Assuming linearity with your latter example involving A and B variables, you'll find that A does indeed equal B. Assume a desugaring like the above, such that `-> Var;` effectively pops data from a stack into a local variable, with sensible shadowing. (Kitten uses this syntax, although it isn't currently implemented by desugaring.) You might express this as:

1 -> A;  
3 -> B; 
A 1 + -> A;
B 1 - -> B;
A B =

This subprogram should desugar without use of the copy operator, and would evaluate to `2 2 =` by the last line. Our stack value at `=` might be represented more conventionally as a tuple `((S,2),2)` (where S is the remainder of the stack). Although the two numbers have clearly different locations (space is indeed an essential concept!), and even different histories/origins (which seems essential to how we think about objecthood and state), the objects themselves (2 and 2) do not have distinct identity. Local stack or parameter or tuple position doesn't strongly correspond to the sort of imperative mutable shared state or object identity you're arguing for in your examples.

Ultimately, it is the implicit, pervasive aliasing of state - such that we have an ad-hoc global address and shared state access model and no easy way to escape it - that conflates state and identity in conventional imperative languages. If we're given freedom in our language about whether, where, and how we model addressing, we can cleanly separate the concepts of state and identity.

How can we have multiple mutable state containers (objects) without knowing which one we are mutating?

Functions on linear objects can take a local view: they manipulate the objects they're given, and may return composed, decomposed, or modified objects. Objects or agents that we model as communicating through shared mediums (e.g. pubsub, tuple spaces, blackboard systems, behavioral programming) or external wiring (e.g. flow-based programming, Kahn process networks, functional-reactive behaviors) don't need any reference to or knowledge of which objects and media they're interacting with (beyond compatible IO types). Term rewrite rules, cellular automata, temporal logics (cf. Dedalus), etc. can apply physics-like rules to manipulate whole sets of values/structures without any concept of referencing or identifying them.

I'm sure there are several more ways to answer your question. I feel you just need to be a bit more creative, more accepting, perhaps more knowledgeable of paradigms outside the few you normally use to think about these subjects. Imperative is not fundamental to physics or philosophy; it's just your favorite paradigm.

OOHaskell

We did something similar, developing a bridge between OO and functional in the OOHaskell paper I worked on with Oleg and Ralf.

I of course see how linear types and shadowing can apply to "s.update(43)" and I have already discussed this elsewhere in the thread.

Assuming linearity with your latter example involving A and B variables, you'll find that A does indeed equal B.

That would in my opinion be an error, because A is not B, they may contain equal values, but they are different objects. If I am 30 years old and Bill is 30 years old, we both have the same age, but we are not the same person. In the A/B example A and B only have one (default) property, but they are still containers not values.

I think you are making an assumption when you say that imperative objects are my favourite paradigm; I am more like a disillusioned logic and functional programmer who finds the simplicity of objects enviable and practical. I wonder if all the time spent on complex type systems and functional modelling of state is useful in the end. It doesn't help me get my day job done; what does is the study and creation of algorithms, and the simpler and more efficient the algorithm the better. If I can express the algorithm most simply and efficiently using a mutable dictionary, then that is likely to result in the easiest implementation to understand and debug. I agree with you that control of aliasing is important.

not containers

That would in my opinion be an error, because A is not B, they may contain equal values, but they are different objects.

I had specified a linear context. Thinking of mutable variables as separate "containers" is useful to understand manipulations through an alias: the different variables may in the future contain different values due to external manipulations, and so value and variable should be distinguished. But there is no aliasing if linear. In general, linear objects are indistinguishable from their values. And hence linear objects have no more object identity than values.

Regarding your "different people, same age" scenario, that's analogous to how we can distinguish a tuple value (30,1) from (30,2), even though both have the same first value. No need to introduce object identity or state.

simplicity of objects enviable and practical

I can agree with your sentiment favoring ease of expression and performance. But I think simplicity is something deeper. For example, most languages make floating point numbers easy to use, which also enables leveraging high performance floating point hardware units. But it isn't very easy to reason about how these floating point representations lose information and accumulate error. In many cases, what appears simple is deceptive. I think the apparent simplicity of imperative or OOP shared state is also deceptive.

Linear Types

I think linear types probably are the best compromise we have right now. However, having programmed in Rust, I can say that they do not reach the level of simplicity I was hoping for. Is Rust doing it wrong? Can we have linear types without having to worry about lifetimes (surely if we have GC we can avoid this and leave it to the compiler)? Can we solve splitting containers in a simple, elegant (and preferably transparent) way? Can we make it look like a simple OO language but have it catch the problems that you think make OO complex?

No linear types

As another opinion, I don't like substructural types. Substructural types describe how you arrived at a value rather than the value itself and I want a focus on values. But I do agree that controlling aliasing is important to having a simple semantics.

Transactions

Surely databases have all these exact problems, with 1000s of operations from different origins all trying to operate on a single shared data instance. Perhaps trying to understand what existing solutions offer and categorising them against the ACID guarantees would help understand the compromises involved with different solutions.

One solution that occurs to me is to return somewhat to the Smalltalk semantic model, and consider each object its own (green) thread, and that all method calls are messages, so that all operations on an object are serialised (one of the strongest transaction guarantees). We could then improve on this by making the object a state machine by insisting the transition functions are pure.
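A rough sketch of that idea in Haskell terms (the helper names and message type are invented; each object is a thread that owns its state and applies a pure transition function to each message in arrival order):

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan

data Msg = Deposit Int | Withdraw Int

type Transition s = s -> Msg -> s

-- Spawn an "object": a green thread serialising all messages sent to it.
spawnObject :: s -> Transition s -> IO (Chan Msg)
spawnObject s0 step = do
  inbox <- newChan
  let loop s = do
        m <- readChan inbox
        loop (step s m)   -- all updates happen in one thread, in order
  _ <- forkIO (loop s0)
  pure inbox

-- Example pure transition function for a bank-account-like object.
account :: Transition Int
account bal (Deposit n)  = bal + n
account bal (Withdraw n) = bal - n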

re: Transactions

consider each object its own (green) thread, and that all method calls are messages, so that all operations on an object are serialised (one of the strongest transaction guarantees)

That's essentially the Actors model. But it isn't truly transactional. Transaction literally means "across actions". True transactions require the ability to send multiple messages to an actor, and even receive responses from those remote actors to decide messages within the transaction. This quickly becomes a very nasty distributed transaction problem!

Distributed Transactions

This quickly becomes a very nasty distributed transaction problem!

It is only a distributed problem if we break certain assumptions. If we have a bank account and a wallet object, and we want to ensure the transactional integrity of transfers between them, we need to have a 'transfer' object via which all requests to transfer between the bank account and the wallet have to go. Using modularity we can package the bank account and wallet as private objects inside a module which only exports the transfer object as its API.

Edit: This approach seems to work well for Erlang.

actors and transactions

It's feasible to work around specific problems, of course. Assuming you're willing to do a fair bit of concurrency management work for every specific problem, on top of solving said problem. Distributed transactions are effectively a 'generalized' solution.

You asked about actors a few years ago. My reply from then: Actors model criticism. In any case, actors won't solve problems related to shared state.

Erlang and Shared State

You would replace shared state with message passing, like in Erlang, which seems to work well.

Of course shared state does require identity, so are you not countering your own argument about alternatives to the stateful identity model?

Message passing to shared stateful objects

Message passing to shared stateful objects/actors/processes is still a form of shared state. You cannot "replace shared state with message passing". You're only moving shared state around, distributing it, not replacing it.

See also: Erlang style actors are all about shared state. This was in my criticisms list.

shared state does require identity

Shared state doesn't require identity. There are models involving shared state that do not use or depend upon a concept of identity. I've mentioned several such models this thread: publish-subscribe, blackboard systems, term rewrite systems, temporal logics like Dedalus, etc.. In these cases, anonymous objects/agents/rules interact indirectly through implicit, anonymous, shared, stateful environments. The components do not need identity to interact.

But just now, we were discussing shared objects or actors. Those models do heavily use identity.

Practical Alternatives

Are you really suggesting those as practical alternatives for programming? Prolog is a kind of term rewriting system, and I really like Prolog, but it has performance problems, so you end up introducing cuts, and it all gets messy and dependent on the particular implementation of resolution. I am not convinced you could use these kinds of solutions for efficient general-purpose systems. I am also not sure they are 'human friendly' solutions. We need programming languages that humans can easily understand, and that normally means the simple and obvious model.

re: Practical Alternatives

Those techniques are indeed practical and have been used for real-world systems.

Communications or interactions involving use of 'identity' fundamentally have a few problematic characteristics: components cannot interact without introduction, components cannot easily observe or enhance existing interactions, and references to transient components can dangle. Systems based on this are thus difficult to extend, control, or upgrade at runtime (by updating software of specific components). They also introduce awkward life-cycle management concerns.

It turns out that techniques avoiding identity, such as publish-subscribe and blackboard systems, can be very human friendly and practical. This is because providing a "human-friendly" interface such as a debugger or dashboard to a software system is equivalent to an extension through which humans may observe, record, and control the system. It's very convenient if such extensions do not require invasive software changes or specialized compilation modes.

The cost of these techniques is usually a moderate performance hit because we cannot optimize for point-to-point communications. But it's a predictable hit, so we can still have real-time systems if we control the set of extensions.

In the real world, I've encountered use of a tuple-space like blackboard system for an unmanned helicopter that had flown over 35 unmanned missions. At the time, they had seven different agents contributing to the blackboard - handling different tasks like high level mission control, obstacle recognition, emergency obstacle avoidance, short-term tactical planning. This was discussed over powerpoint at a conference. The authors describing the system were enthusiastic about how easy the blackboard system architecture made it to extend the system, experiment, debug, upgrade software in real-time, test software changes against recorded or simulated inputs, etc..

I've personally made heavy use of topic-based publish-subscribe systems. They proved very flexible, adaptable, plug-in extensible, and debuggable compared to the old object-oriented systems they were replacing for unmanned systems control. Specifically, we used the DDS model for publish-subscribe not too long after it was standardized by the Object Management Group.

Term rewrite systems and temporal logics I've seen less frequently. I once read a 1980s paper involving a Datalog-based temporal logic used for hard real-time mission critical systems - specifically, fly-by-wire. The Bloom and Eve programming languages could be taken as more modern approaches in this vein. Eve is especially designed around the concern of human friendliness, while Bloom is designed more around the concern of scalability. Like blackboard systems, these techniques are readily extensible and accessible. But we may need to be careful to ensure real-time properties in some contexts.

When you say "the simple and obvious model", I suspect you mean "the model I'm familiar and experienced with". Nothing against you particularly: I suspect similar from anyone who uses either of those two words "simple" or "obvious". It's usually a valid suspicion. Evaluating models isn't easy when we try to be objective and unbiased about it (a rubric of -ilities, cognitive dimensions of notation, complexity metrics, and so on). People rarely think deeply about what they consider simple. People cannot effectively compare what they know with ideas unknown to them. And, frankly, it seems no model is obvious to someone who isn't experienced with it.

In any case, I didn't mention these models to convince you of their virtues. How much you value extensibility and other features depends a lot on your problem domain. I mention the models because you repeatedly conflate identity and state, while I believe it's valuable to distinguish those concepts.

Agreement?

Does this mean that we agree that monads are a waste of time, because they embed state with identity into a functional language, and that we should either just use a stateful language (since the monadic embedding doesn't really help reasoning about the stateful imperative code) or look at some of the identity-free methods you posted?

sequential DSLs, generalizations

monads are a waste of time, because they embed state with identity into a functional language

Well, the ST and IO types are monadic and certainly model state with identity (e.g. via STRef and IORef). But #NotAllMonads. Monads have a great deal of freedom about which features they provide. For example, some monads, such as State and Writer, already support state without a concept/abstraction for identity. Other monads might support backtracking or probabilistic programming or exceptions without any state concept.
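For example, the State monad threads a single value through a computation with no reference or identity abstraction at all; there is nothing to alias:

import Control.Monad.State

counter :: State Int Int
counter = do
  modify (+ 1)
  modify (+ 1)
  get

-- runState counter 0 == (2, 2)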

Two features with real value include embedding for DSLs and generalizing algorithms across similar DSLs. You could achieve this without an explicit monadic abstraction, e.g. by use of algebraic effects. But if you use a first-class data type to model sequential DSLs, you'll eventually discover some value in generalizing that "sequential" adjective, and thereby reinvent the monad.

IMO, the main issue isn't the monad abstraction per se, but rather how the Haskell language and conventions surrounding it tend to force users to confront high level abstractions before they're ready. FP drill instructor: "The Monad! You can't handle the Monad, worm! Write out the while loop for five different DSLs first, refactor, then you'll be prepared to understand and appreciate the Monad!"

This isn't an issue specifically for monads, however. For example, Arrows or Applicatives or Contravariant functors or Fixpoint or high-level zipper/iterator or or or ... abstractions likely wouldn't do any better. It's just that Monad is the first, biggest hurdle that a new Haskell programmer encounters - starting at `main :: IO()` and the explanation of do-notation.

I remember when I first tried Haskell in University, circa 2003. I hadn't yet learned any real FP, only the fake first class procedural programming from Scheme and Lisp. At that time, the Monad was a daunting concept, like diving into the deep end of a pool before learning to swim. I only got past it by telling myself to ignore the abstraction, just do a bunch of concrete work in IO and eventually other concrete monads as needed (such as parser combinators). It was only much later that I had my eureka moment and grasped monads.

So I wouldn't say that monads are a waste of time so much as an undesirable hurdle if encountered too early in a language's programmer experience. And this is a consideration for any abstraction. If we can redesign languages (or IDEs) so programmers can first express themselves concretely then later generalize algorithms automatically, that might be a more human-friendly approach.

HList Paper

I think I was working on the HList paper with Oleg and Ralf around that time. Back then I was totally convinced that Haskell was the future of programming.

Monad Transformers are where it all starts to go wrong. This is what makes me think that Monads, or at least how they are used are the wrong abstraction. Algebraic Effects can be expressed as a row-polymorphic monad (where the constructor is the row), and this has been done in PureScript.

But if algebraic effects provide a way to define DSLs, then why double up with Monads? I'm not saying it should be impossible to define a monad, but I think Monads should be a speciality topic, rarely used, and when they are used it would be specifically for their property of non-composition.

I like the approach to algebraic effects in this paper where they separate the world into computations and values:

An Introduction to Algebraic Effects and Handlers

algebraic effects again

I've already mentioned (here) Oleg Kiselyov's Free and Freer Monads.

That paper essentially argues for one concrete Monad to rule them all - the operational monad (called Freer), modeling algebraic effects via row polymorphic types (in the manner of HList). Assuming you've selected this concrete monad, you don't really need the `typeclass Monad` abstraction anymore. You don't need to inform your programmers that they're using a monad. Instead, you teach them about algebraic effects and how to define their own effects handlers.

I can agree with that position. Obviously many others do as well. PureScript, Kitten, Koka, Idris, Eff, etc..

Also, due to their concrete nature, the operational monad and algebraic effects are a lot more convenient to model in dynamically typed languages or where typeclasses might be awkward.

My only issue with it: I have some doubts about monadic effects models in general. They're a little too sequential for a lot of use cases - a bottleneck for parallelism, awkward for composing concurrent computations, difficult to use for distributed stream processing. These aren't problems that algebraic effects will fix! This is why I'm experimenting with alternatives such as KPNs. I wonder whether baking algebraic effects into a PL syntax and semantics will prove a premature commitment to sequential effects models.

Implicit parallelism with multi-argument handlers

Frank (on arXiv, on LtU) elegantly supports handling several arguments at once, which would let you model parallelism (you can write a program waiting on several outputs at once, and an extension with parallelism would also let you wait for "at least one output", parallel-or style). See "2.7: Handling on multiple ports" in the paper.

(Of course, as I believe Matt M pointed out, this can also be seen as a restricted form of (effect-typed) process calculus, specialized to the "only one output" flavor of the lambda-calculus. Working with effect-typed process calculi may be equally or more convenient, but the syntax wouldn't be as familiar.)

Algorithms

This is a slight aside, but as Monads are about sequencing, it's not totally irrelevant.

Threads are problematic because shared state needs locks. This amplifies all the problems with impure imperative programming.

Implicit parallelism doesn't really work - speculative out of order execution in the CPU already exploits the available implicit parallelism in programs more efficiently than a compiler can due to the ability to schedule based on actual execution time and available resources on any given machine.

The best approach I have encountered is "algorithmic parallelism" which is where the algorithm itself is expressed generalised over a number of "tasks", sort of like a template. This is how GPU programming is done.

In other words we don't want to try and parallelise sequential code, but we do write the kernels in a sequential way (as all compute elements execute sequential code, even in highly parallel architectures like a GPU) over a generalised resource model. Something like a generalised abstract model of OpenCL.
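As a very rough sketch of that shape in Haskell terms (using parMap from Control.Parallel.Strategies merely as a stand-in for an OpenCL-style scheduler; the kernel itself is an arbitrary example):

import Control.Parallel.Strategies (parMap, rdeepseq)

-- The "kernel" is ordinary sequential code written once, parameterised
-- by a task index; parallelism comes from mapping it over the task space.
kernel :: Int -> Double
kernel i = sum [fromIntegral i * x | x <- [1 .. 1000 :: Double]]

runKernels :: Int -> [Double]
runKernels n = parMap rdeepseq kernel [0 .. n - 1]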

Any suggestions on papers/work relevant to this kind of approach?

Actor Model for Parallelism

Thinking about the OpenCL model for GPU and multi-CPU programming, doesn't the actor model fit well here? You have some local state storage on each compute element, which would be the local variables in the actor, and we have data streamed through the actor from various sources. We can also allow addressing external resources, which would be done via message passing to another actor.

Identity tends to work well in the real world

It turns out that techniques avoiding identity, such as publish-subscribe and blackboard systems, can be very human friendly and practical.

Only for summary information. It's a catastrophe inside the big O.

This is because providing a "human-friendly" interface such as a debugger or dashboard to a software system is equivalent to an extension through which humans may observe, record, and control the system. It's very convenient if such extensions do not require invasive software changes or specialized compilation modes.

Very true, but that works well with summary events, and not anything with transactional frequency or higher.

The cost of these techniques is usually a moderate performance hit because we cannot optimize for point-to-point communications.

In effect, they are heavily tailored systems, and not loosely coupled. The cost of building a truly loosely coupled, event-driven system is near-infinite in terms of development and operational capacity.

For example I saw a "blackboard system" 15 years ago that consumed two servers each with 16 CPUs and 128 GB of RAM, and still managed to run like the rotting carcass of a hippo. We replaced it with a few commodity blades, and improved both the throughput and the uptime by a few orders of magnitude. (At the time, the solution that went to production handled over 20% of the world's foreign exchange transactions.)

Again, to be clear, it's not the choice of the blackboard that makes the system suck. It's the religious adherence to the least common denominator, i.e. publishing data with sufficient level of detail required by any conceivable consumer, instead of using domain knowledge to craft the appropriate granularity and/or rate of publishing.

Tuple space solutions work well for a vanishingly small number of problems. I've yet to find many problems that call for such a solution, despite the fact that I love the abstraction.

Global Broadcasts

I can't see anything using global broadcasts being remotely efficient. Yes you can optimise the messages behind the scenes, but how do you write that optimisation if the PL only supports the identity-less model of state?

It seems to me that you would end up having to build an identity-less DSL in a language that supports identity, so you can optimise it for each specific application.

To me this seems similar to Prolog and logic languages (which I really like) in that it seems very promising, looks elegant, but has all sorts of practical problems that stop widespread adoption.

re: global broadcast

Yes you can optimise the messages behind the scenes, but how do you write that optimisation if the PL only supports the identity-less model of state?

There is no such thing as a Turing-complete programming language that only supports identity-less models of state. For example, even in a purely functional language, you can still model addressable IntMaps of stateful 'objects' that address each other using integers. You can even leverage the column-like layout from entity-component-system to model extended attributes. But such models will be explicit! The better question is whether a language has some pervasive or implicit model of identity, beyond control of type systems or APIs.
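A small sketch of what such an explicit model might look like (the names and the (counter, map) representation are assumptions):

import Control.Monad.State
import qualified Data.IntMap as IM

-- Integer keys play the role of identity, but only inside this model.
type Heap v = (Int, IM.IntMap v)

alloc :: v -> State (Heap v) Int
alloc v = do
  (next, heap) <- get
  put (next + 1, IM.insert next v heap)
  pure next

deref :: Int -> State (Heap v) (Maybe v)
deref k = gets (IM.lookup k . snd)

update :: Int -> v -> State (Heap v) ()
update k v = modify (\(n, heap) -> (n, IM.insert k v heap))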

Further, what matters isn't how you implement the optimizations, but rather how effectively you can locally reason about costs/performance while extending or modifying programs that aren't deeply understood by you. I grant that logic programming doesn't make performance reasoning easy (because simple extensions can incur combinatorial costs). But I feel you make an error when trying to generalize identity-less state models from your experiences with Prolog.

re: identity in real world

I feel that you're assuming naive implementations.

Real-time publish-subscribe systems such as DDS support multiple quality-of-service control options for both publishing and subscribing, e.g. so we can match on metadata about where it's acceptable to lose intermediate states, desired and available frequencies, how late the data may be. Subscriptions to topics can also support queries that refine them further. And we can "publish" the metadata about subscriptions, such that publishers subscribe to the subset of relevant subscriptions and avoid publishing if unnecessary. Publishers can precisely compute and publish just the topics of interest to at least one observer, rather than publishing all things at all times.

Ultimately, physical network communications can be reasonably close to point-to-point if there are only a few interested observers. There is some extra setup overhead, registering interests or available topics. But the right performance analogy is closer to "global search" or matchmaking in a shared index/registry rather than "global broadcast".

For blackboard systems, we also have a lot of freedom about the 'patterns' we support. We can take inspirations from various forums, work-order, inventory models, and CSCW patterns for example. Rather than publishing all data at all times, we can publish when some other tuple indicates a requirement. Resources and requests can be modeled explicitly.

And of course there are plenty of intermediate models that only weakly use identity. For example, consider a MOO or MUD. These partition an environment model into multiple "rooms" that then receive local attention. Linear objects can navigate these rooms, and only communicate by altering the state of the room, its neighbors (e.g. by shouting or shooting or moving), and themselves. They do not communicate with each other by identity, and don't even need to hold a reference to the room they're in. But the concept of rooms would naturally require some form of identity due to aliasing in navigation (that is, we can reach the "same" room from multiple paths like north vs. west-north-east). Mobile linear objects navigating a Cartesian coordinate system would operate similarly, with Cartesian coordinates becoming identity and interacting only with "nearby" objects. Intriguingly, a tree-structured world would have no aliasing and could avoid the concept of identity even for rooms. Identity is 'weak' in this design insofar as it depends on the topology of the implicit environment model.

For example I saw a "blackboard system" 15 years ago that consumed two servers each with 16 CPUs and 128 GB of RAM, and still managed to run like the rotting carcass of a hippo.

I've encountered OOP systems that have similar characteristics. :D

For controlling performance, the patterns we use ultimately matter a lot more than the paradigm. But the paradigm does affect which patterns are easy to use. Compared to logic or constraint programming, imperative and FP do have an advantage of keeping the performance for 'primitive' small-step operations and compositions down to O(1). But this doesn't generalize to higher level abstractions - collections-oriented patterns like visitor pattern, observer patterns, entity-component-system, etc..

Substructural types describe

Substructural types describe how you arrived at a value rather than the value itself and I want a focus on values.

Substructural types do describe values. Specifically, they describe functions, and how those functions may arrive at values. But functions are values, too. :D

Substructural types are convenient for describing some "correctness" constraints on functions. Opportunities vs. obligations correspond roughly to affine vs. relevant types. Handshakes and protocols between agents can often be conveniently modeled using token values with substructural types.

However, for the general use case, substructural types should be scoped to subprogram computations, e.g. attached to parameters of a function rather than directly to values. Basically, if a non-linear function with non-linear arguments can return a linear result, something is wrong.

Substructural types describe

Substructural types describe how a function was constructed, not what it does (as a map from inputs to outputs). I'm probably being overly pedantic in this context, though, because I'm ok with linear type systems in a practical sense, but just think they should be recast as structural type systems.

substructural types describe functions

When you say "describe how a function was constructed, not what it does", it seems you assume it's infeasible to describe or constrain what a function does in terms of its construction. But it isn't difficult at all.

I'm unclear on what you envision with "recast as structural type systems".

Substructural types can

Substructural types can certainly constrain what a function does but they also distinguish between functions that I would want to call extensionally equal. If you work with values that are descriptors of the functions, then the violations of linearity (or whatever) become observable / structural.

values describing functions

"values that are descriptors of the functions"

Do you mean something like a DSL or GADT that we can inspect, interpret, or compile?

Edit: I think I see how that would be possible, but the way I imagine it would also be awkward. And I'd want dependent types to statically enforce constraints when composing such functions.

I was thinking more of using

I was thinking more of using them during representation selection. Linear types would be useful for extracting an efficient representation. The representations are the "descriptors". I've mentioned my ideas of rep selection before, but still haven't fleshed them out. It's basically just a way of exposing the compilation process to the programmer.

substructural types for requirements

I can appreciate controlling representations through some programmer-controllable model.

But substructural types aren't just about performance.

As I mentioned, they're useful to model obligations or opportunities. Obligations such as: close the file handle after you've opened it (via linear token for file). Or opportunities like: you only get three chances to guess the number (via an affine lockbox object). This is why I say substructural types describe functions - requirements, not mere performance considerations.
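For instance, here is a rough sketch of the file-handle obligation using GHC's LinearTypes extension; the API is hypothetical and the implementations are elided, but it shows how dropping the token (or using it twice) becomes a type error rather than a runtime bug:

   {-# LANGUAGE LinearTypes #-}

   data File  -- abstract linear token (hypothetical, not a real library type)

   withFile  :: FilePath -> (File %1 -> r) -> r   -- grants the obligation
   withFile  = undefined                          -- implementation elided

   writeLine :: File %1 -> String -> File         -- consumes and returns the token
   writeLine = undefined

   closeFile :: File %1 -> ()                     -- discharges the obligation
   closeFile = undefined

   example :: ()
   example = withFile "log.txt" (\h -> closeFile (writeLine h "hello"))
   -- replacing the lambda body with () (i.e. dropping h) would not typecheck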

I do believe it useful to distinguish selection of unique representations (to allow efficient in-place updates) vs. substructural types (to express protocol requirements).

Effects

With effects modeled at the value level (monads, algebraic effects, your Kahn network idea, etc.), obligations and opportunities are already structural. I think modeling effects with values is a good idea, partly because it means they can be modeled with structural types, but for other reasons, like being able to write handlers.

Effects as Values

Can you elaborate on what you mean by effects as values? Computations have effects, values are data. I actually disagree that functions are values (at least referentially transparent values), functions take time to execute and may not terminate.

Effects as values

I mean that instead of having effects that happen as a side-effect of evaluation, you have evaluation produce effect descriptors that are values that the run-time interprets to produce effects. I prefer considering non-termination to be a bottom value, rather than effect. If your program that was supposed to produce an Integer diverges instead, then it's errant.
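To illustrate (with made-up names, not a library API): evaluation only builds a descriptor value, and a separate run-time walks that value to perform the effects.

   data Effect r
     = Done r
     | PutLine String (Effect r)       -- effect descriptor plus the rest of the program
     | GetLine (String -> Effect r)    -- effect descriptor plus a continuation

   -- pure code merely constructs the descriptor value ...
   greet :: Effect ()
   greet = PutLine "Who are you?"
             (GetLine (\name -> PutLine ("Hello, " ++ name) (Done ())))

   -- ... and the run-time interprets it, producing the actual effects
   run :: Effect r -> IO r
   run (Done r)      = pure r
   run (PutLine s k) = putStrLn s >> run k
   run (GetLine k)   = getLine >>= run . k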

Abstract Syntax

So this is basically composing an abstract syntax for the language, that composes functionally, and is then fed into an interpreter, where the side-effects happen?

An interesting question is what happens if we make the abstract syntax co-data, so the interpreter can stream the values?

vau (of course)

All this is sounding a bit similar to the handling of side-effects in vau-calculi (reasonably accessible blog post, but not going into detail on the more interestingly side-effect-ish and difficult state calculus, yonder; more complete but daunting would be my dissertation).

It is codata

In a monadic style, each effect descriptor typically has a continuation that is invoked with whatever return value the effect has. So every step is under a function call. It's basically codata already.

Lazy Languages

I suppose it is in a lazy language like Haskell.

Even in a strict language,

Even in a strict language, you can model a stream as:

   data S a = S a (() -> S a)

.

Dup.

re: Effects

I'm all in favor of representing intended effects as values.

But it isn't clear to me how to represent obligations in an algebraic effects model. Nor how to represent ad-hoc resource-limited opportunities that affine types can easily model. If you can offer some insight on this, I'd appreciate it.

Very broad strokes

This is probably too abstract to be useful, but if you have a type discipline that you like (offers good power/pain ratio), you should be able to keep it. You just reframe it as being about effect usage rather than arbitrary value usage. Because effects are values, this will convert it to a structural type system.

Rust's linear types

The call-stack is the most efficient automatic GC region. But knowing what we can allocate on the stack is somewhat complicated, especially in context of first-class functions, and even more so if the functions might remember their arguments like stateful object methods. Rust's borrow model essentially makes this concern visible in the type system. White-box compiler optimization is probably less than half as effective. A moderate hit to ease-of-expression for tricky borrow cases seems a reasonable tradeoff in context of Rust's performance goals. It is true that GC could avoid the issue, but you'd certainly pay for it with performance - conservatively allocating a lot more on the heap.

Rust does have some difficulty with borrowing array slices, but apparently some people have addressed this by dropping into unsafe code to implement a split function. See Borrow Splitting. That page should solve your immediate hangups with Rust's linear types, or at least the ones you've mentioned to me.

A borrow model isn't without some benefits. It implicitly releases borrowed fragments when we're done with them, eliminating need to explicitly glue fragments back together. Without this, we instead need to track phantom types regarding adjacency so we know we can safely use the O(1) `append_adjacent` instead of the O(N) `append`. And then we'd actually need to call `append_adjacent` to rebuild our array from the slices.

Rust Unsafe Code

I really don't like having to do this. I have stopped doing so much with Rust because I had to use so much unsafe code to do reasonable things. For me there is something wrong with Rust's type system, the ad-hoc borrow checker being one of the most obvious problems.

re: Unsafe code

Using `unsafe` to express safe things that the type system doesn't know how to handle is exactly what it's for. Essentially, it's a way to add new built-in functions to a language. If you choose to develop a new programming language instead, you'd only be fooling yourself if you imagine adding new built-ins is much safer than permitting `unsafe`.

If you're pushing the bleeding edge of what's possible and efficient in a language, you should expect to see `unsafe` on occasion. I've certainly used the equivalents in Haskell and F#, to implement features such as distributed virtual memory and structure sharing for larger-than-memory values via secure hash binary references, and transactional database interfaces that use these values.

I haven't used Rust enough to speak good or ill of its borrow checker's subtle caveats, especially not in a more specific context of Rust's standard libraries. In general, borrow seems not a bad approach. Just, the developers perhaps overlooked the importance of splitting collections for various divide-and-conquer strategies.

Unsafe

Unfortunately Rust hides unsafe, which is itself "unsafe": really, any code that calls an unsafe function should be marked unsafe, which in effect means all Rust code is unsafe. Haskell does not need "unsafe" code to do the same things; you never need to use unsafePerformIO.

I agree that pass-by-reference (which is what a Rust borrow is) is a good way to efficiently pass data to sub-functions.

Haskell unsafe

Haskell uses unsafePerformIO more than a little bit. Under the hood of Data.ByteString, as one common example. A lot of FFI marshalling integration as another (not all FFI is for impure functions). It's true you don't "need" to use it. Unless you need performance or FFI.

Haskell Safety

I disagree, you can put those external functions into the IO monad with no loss in performance.

The only case where I have ever used unsafePerformIO was to add debug logging to a pure part of a program, because to refactor a whole chunk of code into the IO monad, when the logging would not be in production seems a waste of time.

still unsafe

Haskell has unsafe operations other than `unsafePerformIO`. Such as, when marshalling, we have `peek` and `poke` that operate at byte offsets. It isn't difficult to peek or poke beyond the end of an allocated array.
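For instance, a small sketch of how easily raw Storable access goes wrong; the offsets are not checked against the allocation:

   import Foreign.Marshal.Alloc (allocaBytes)
   import Foreign.Storable (peekByteOff, pokeByteOff)
   import Data.Word (Word8)

   example :: IO Word8
   example = allocaBytes 4 $ \ptr -> do
     pokeByteOff ptr 0 (42 :: Word8)     -- fine: offset 0 is within the 4 bytes
     -- pokeByteOff ptr 9 (0 :: Word8)   -- also typechecks, but writes out of bounds
     peekByteOff ptr 0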

Putting all ByteString accessors and constructors into IO would also make them unusable in practice, e.g. cannot use ByteString keys in a Map, cannot replace inefficient String type by utf-8 encoded ByteString, etc..

In any case, unsafe operations are probably used more than you imagine in Haskell, even if you're not the one writing them. But mostly for edge cases, which is how it should be.

Historic Library Reasons

I suspect the reasons for any unsafe operations are historic and not fundamental. There is no reason you cannot have a pure immutable byte-string, and a mutable byte-string would need to be in a monad. You do not want a mutable byte-string as a hash key, as you can easily mutate it without updating its hash index position (it's definitely unsafe). You should 'freeze' it into an immutable byte-string before using it as a key. This can be done efficiently; in fact there is no reason it cannot be the same underlying buffer, provided we know there are no other references to it.
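For instance, the vector package already exposes this pattern safely (a sketch; ByteString's internals differ, but the idea is the same): build the buffer mutably inside ST, then freeze it into an immutable value that reuses the same memory, with the ST encapsulation guaranteeing there are no other references.

   import qualified Data.Vector.Unboxed as V
   import qualified Data.Vector.Unboxed.Mutable as MV
   import Data.Word (Word8)

   buildBytes :: Int -> V.Vector Word8
   buildBytes n = V.create $ do           -- create freezes the buffer in place, no copy
     mv <- MV.new n
     mapM_ (\i -> MV.write mv i (fromIntegral i)) [0 .. n - 1]
     pure mv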

I find it odd that you are arguing in favour of identity-less implementations of state, and then pointing out flaws in existing languages where people have used such representations as if they are some fundamental problem.

re: Historical

I agree that inertia and path-dependence have a lot to do with things. As I mentioned, unsafe code is basically how we introduce new "built-ins" to a language without waiting on the language designers. The immutable ByteString wasn't a built-in, so we use unsafe code to implement it. Similarly, the accelerate libraries to support high-performance linear algebra on the GPGPU will hide their use of IO because basic math isn't really effectful and the GPGPU is conceptually an implementation detail. Or alternatively, we might want to use LAPACK or BLAS.

There's a lot of history. So, naturally, we end up using a lot of unsafe operations. D:

re: favoring identity-less state

I find it odd that you are arguing in favour of identity-less implementations of state

Arguing "shared-state doesn't require identity" is not the same as "you should avoid identity when modeling shared-state". Much like "people pick their noses" is not the same as "you should pick your nose". So far, I haven't said much at all about what I favor or disfavor.

State and identity are separate concepts that should not be conflated. Keeping these concepts separate in my mind and heart is useful for understanding non-conventional paradigms, useful for thinking about alternative ways to model software systems. Especially at larger scales.

I do disfavor software that internally creates/allocates "new" identities rather than computes/discovers existing identities. The latter is vastly more convenient in context of job control and software system management, transparent durability, caching or mirroring, distribution and sharding, continuous deployment, debugging, etc..

That is, I favor a relatively RESTful approach to associating state with identity, not avoiding identity entirely.

Edit: obviously, conventional OOP systems do the opposite of my preferred use of identity. But it is feasible to model OOP software systems that avoid allocating identities (e.g. concept-oriented programming seems relevant).

Dup

Dup

Clarifying my point.

Let's take this one step further: what if we create an interpreter for JavaScript in Haskell, and then embed this in the free monad?

How is programming JavaScript embedded in Haskell any better than just programming JavaScript? It's the same language so any analysis would be just as difficult.

It's strictly worse

If you put your whole program as a giant JavaScript fragment embedded in Haskell then that's pretty clearly worse to me than just doing it in JavaScript. The tooling is worse, etc. and everything has an added layer of complexity. But that seems like a straw man. Who's proposing that style of programming?

Is arguing about monads a waste of time?

As Daniel Yokomizo said, the important thing is that your imperative program is now a first class value with all the properties that entails. This is a clear advantage over JavaScript. If you are posting code snippets of Haskell and JavaScript and pointing out how similar they look then you've completely missed the point, because regardless of the syntax, JavaScript does not make those computations into values, which is the important part.

Sure, it would be nice to have another paradigm besides imperative programming with better properties. And there are people working on those things. Those new paradigms may or may not make use of similar algebraic structures.

I think it's a little weird that there are so many people who are triggered by the word "monad", and spend so much time trying to debunk the concept. It's just one algebraic structure among many, but for whatever reason it has become a strangely political topic among many developers.

Is arguing a waste of time?

"This is a clear advantage over JavaScript."

No idea why you think so. For an interpreter or compiler, if that would be the general behavior, that's just wasting cycles on something which may be academically neat but isn't an interesting property. For the programmer, if he wants that kind of compositionality, he can just abstract into functions in most languages.

The Value of Values

I'm not sure what "abstract into functions" means. The point is to capture a computation as a value. You can indeed do that in most languages; Haskell is not special besides providing some syntactic sugar and static analysis (e.g. for dispatch on the return type of `return`).

As to why making things values is a good idea in the first place, you might enjoy Rich Hickey's talk, The Value of Values. It's not academic at all, it's extremely pragmatic and is relevant to all developers. "Wasting cycles" is the sort of argument people used to oppose C when it first came out so I'm totally unconvinced by that.

I am not saying you shouldn't

No, you can do it if you want to. But there are trade-offs. Lisp, of course, has had this for ages. Does that make Lisp better than C, or C better than Lisp? Neither, you make a list of pros and cons for both languages and choose one.

If C never adopts the idea that data and computations are interchangeable, I would be fine with that. Even if there are languages where it does make sense. It just depends how high on the abstraction stack you want to go, and what you're willing to pay for that.

The value of state

I glanced at the talk. Yeah, neat. But it is also a prime example of how deceptively simple ideas can be wrong. Sure, it would be nice if referentially transparent programming would pan out. So far it hasn't.

And apart from that, any OO guru can give a similar deceptively convincing talk that it's normal for objects to have state and that that should be encapsulated. Or are you the same person now as you were yesterday? Bank accounts change, there's merit to encapsulated state, it's how we see the world.

Neither talk would be correct. It's never that easy.

False temperance

If you glanced at the talk then you admit you didn't really grok it. He addresses the things you mentioned.

Sure, it would be nice if referentially transparent programming would pan out. So far it hasn't.

Of course it has.

Bank accounts change

The bank account's value yesterday and today are two different values. This is a really simple example that is addressed in the talk. You don't need or want mutable state to model this situation.

Neither talk would be correct. It's never that easy.

This sort of false temperance in PLT is kind of frustrating. "Well, these people over here say this thing, and these other people say this other thing, so the truth must be in the middle." If we want to be able to make progress, we have to be able to say that some things are just bad. You don't see nearly as much equivocation over, say, `goto`. We've accepted that it's rarely what you want. In the same way, this value oriented movement is making the case that, no, it's not about your perspective, or about trade-offs, but it really is the case that you want to be programming with values >90% of the time.

Have it your way

Well, if we are going to say that some things are bad then it is referential transparency. The computer is way more amenable to OO, it leads to better shorter code, you don't need half a PhD, programs are more readable, it easily satisfies most engineering needs, domain modeling is a breeze, tons of efficient algorithms can be easily implemented, and referentially transparent languages even capitulate into that direction.

Have it your way.

If some state has different

If some state has different values over time, it is mutable, even if it is time indexed, represented as a new world replacing an old one, or whatever alternative representation you can think of.

What FP people rail against is implicit mutation and stateful effects, not explicit mutation and stateful effects. And of course, if they want to deal with a changing world, they have to model non-values (call them objects or entities or whatever synonym of object you can come up with) that have state that is changing, unless you are happy with 1970-era interfaces and the like.

Grandfather's Broom

The nature of identity and naming seems something fundamental to programming. I like the terms object and value for the two kinds of things. The real world seems to be made of objects, but objects are also values (as in my grandfather's broom). There is some kind of duality here. The components of a collection (the parts of the broom) appear to behave as values, yet if I carve my name on the head of the broom, this maintains state, so it is also an object.

To me it seems that concrete things are objects, anything physical in the real world. It seems that values only exist in the abstract (I have three oranges, I could swap an orange with someone else and I would still have three oranges).

Likewise in a computer memory locations are like objects. The problem in a computer is that instead of the contents of a location being an object, like in the real world, it's an abstract value.

Personally I think a language needs to put objects and values on an equal footing. The separation into pure and impure in Haskell, where once something is impure it cannot be used in pure computations seems unhelpful to me.

Purity seems to be a property of abstraction, so when I say I have three oranges, I abstract away the individuality of the oranges.

I guess the other interesting thing is that objects can be serialised into values, and values deserialised into objects. This all relates to how things are referenced and copied.

Whether something is an object or a value looks more and more like a property of the observer not the thing itself.

Fermions and bosons?

(Fermions would be objects, I suppose.)

I've long suspected that our programming languages at scale become so dependent on names that it becomes a liability. And yet, the point of the exercise seems to be explaining programs to sapient minds, for which language is the natural communication medium...

It depends on how you define

It depends on how you define things. A value is a point; an object is a speeding bullet whose position (which point it is at) is mutable. Defining values as immutable atoms allows us to define mutable properties as being assigned to different values over time.

A GUID is a value also, of course, but with it you can form an arbitrary object. All you need is a dictionary, even an immutable one, which you can see as a value also. Now throw in some points in time (also values) that can index some more immutable dictionaries, and you easily have objects with mutable properties.
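A sketch of that layering (names are illustrative): a GUID plus time-indexed immutable dictionaries give you an "object" with mutable properties, built entirely from values.

   import qualified Data.Map as Map
   import Data.Map (Map)

   type GUID  = Int
   type Time  = Int
   type Props = Map String String
   type World = Map Time (Map GUID Props)   -- each instant indexes an immutable dictionary

   -- read an object's property "at" a point in time; nothing is ever mutated
   lookupProp :: World -> Time -> GUID -> String -> Maybe String
   lookupProp world t oid key =
     Map.lookup t world >>= Map.lookup oid >>= Map.lookup key

   -- "updating" an object just produces a new World value for a given time
   setProp :: World -> Time -> GUID -> String -> String -> World
   setProp world t oid key val =
     let objs  = Map.findWithDefault Map.empty t world
         props = Map.findWithDefault Map.empty oid objs
     in Map.insert t (Map.insert oid (Map.insert key val props) objs) world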

Objects are simply higher-level constructs that can be made up of value constructions, conflating values and objects takes away that nice layering.

Objects are lower level

My suggestion was that objects are actually lower level, and values are the higher-level constructs. The fundamental building blocks of matter have identity (and location). An electron has spin state, and is a unique object. Values only exist in the human mind as a way of categorising and abstracting the real world.

Off topic

I thought that at the lowest levels, quantum entangled particles don't really have individual identity.

while we're there

Depends what you mean by identity. Particles are of two kinds, fermions and bosons. The difference is that no two fermions can occupy the same quantum state at the same time (the Pauli exclusion principle, which is why you can only have so many electrons in each orbit in an atom), whereas two bosons can occupy the same state (e.g., photons).

You can always think like

You can always think like that, sure.

But when you are implementing a programming language system, it is nice to have values as the foundation to stateful objects, rather than the other way around.

Objects based on values

I don't object to objects based on values, although I don't see it as necessary. I am happy to have two classes of thing, objects and values on equal footing.

What I do think is important is the ability to abstract a pure value-oriented interface from impure object-oriented code. Haskell only permits the reverse; I think you need both. To truly encapsulate state and tame the combinatorial explosion of complexity, being able to define a pure (message passing) API to a module is important. I want module implementations to be written in simple efficient imperative code, but I want the module interfaces to compose easily and encapsulate that implementation.

QFT the Ultimate

The properties that electrons have, like charge and spin, have values, like +1 or -1, +1/2 or -1/2. "Value" is the label we give them, but the values themselves can be measured empirically. As such, they don't only exist in the human mind. (Although you may believe otherwise if you're an anti-realist.)

You can describe a particle by a collection of such properties and their values, much as you do with an object in a programming language. However, the values associated with a fundamental particle don't change without some sort of external interaction. Particle "identity" can only survive such interactions in some cases, and then only if we choose to impose it.

(This becomes even clearer when you consider that even particles aren't fundamental, since there's good evidence that quantum fields are more fundamental, and particles are emergent phenomena arising from the behavior of fields. Assigning identity to a wave packet in a field is an arbitrary operation that depends on choices we make.)

In other words, identity for particles is a property that only exists in the human mind. We construct that identity by assigning it to a series of particles related by interactions over time. We do the same basic thing with "mutable" objects in functional programming, and indeed with all objects in the physical world that we consider to have identity.

It seems clear that at the lowest levels, "values" are in fact fundamental, as far as we know. They can't be decomposed, whereas particles - or wave packets, or fields - can be described by a set of properties and associated values.

Of course as you go higher up the stack of abstractions, the concept of a "value" can be used at any level, and such values can be composite and thus non-fundamental. But in both programming languages and physics, at the bottom of the abstraction stack, values must ultimately be fundamental, since they can't be decomposed, and objects have to be described by or constructed from them.

Quantum Computers

I am not sure I agree with your interpretation. If we take a bunch of particles which can have spin state, and compose them into a group that can have an exponential number of spin states, and set it into a superposition, you cannot copy that superposition, so they do not behave like values.

It appears to me that quantum cryptography also relies on the object nature of photons. If photons behaved like values then only normal mathematical cryptography would be possible.

a little weird

I think it's a little weird that there are so many people who are triggered by the word "monad", and spend so much time trying to debunk the concept.

Not so sure people are "triggered" by the word "monad". Granted people sometimes debunk the concept, but imho that seems like a natural reaction to other people pushing the concept.

The concept is trivial

I'm just confused as to why people have such a strong reaction to an algebraic structure that can be expressed in perhaps >95% of languages in common use. It's a very strange thing to put on a pedestal, to me.

Sure

Tell that to Wadler. Maybe he will stop popularizing trivial notions.

Embedding not Monads

The free monad is trivial in this case; it's basically just a fancy list constructor, and we could write "run [Store 23, Store 42, Get]" instead. It is the embedding of an impure imperative program into a pure functional program that I am trying to discuss here. Although the free Monad's role in embedding is interesting.
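To make the "fancy list" reading concrete, a minimal sketch (names illustrative; a real free monad would give Get a continuation rather than simply returning the final state):

   data Cmd = Store Int | Get

   run :: [Cmd] -> Int
   run = go 0
     where
       go s []             = s
       go _ (Store n : cs) = go n cs
       go s (Get     : cs) = go s cs

   -- run [Store 23, Store 42, Get]  ==>  42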

What interests me is the possibility of a balance ...

... between purity and impurity in a programming language. In particular, the Steelman requirements, which dictated what the language later called Ada had to look like, include this one:

4C. Side Effects. The language shall attempt to minimize side effects in expressions, but shall not prohibit all side effects. A side effect shall not be allowed if it would alter the value of a variable that can be accessed at the point of the expression. Side effects shall be limited to module-level variables. The language shall permit side effects that are necessary to instrument functions and to do storage management within functions. The order of side effects within an expression shall not be guaranteed. [Note that the latter implies that any program that depends on the order of side effects is erroneous.]

(It actually says "own variables of encapsulations" in the third sentence, but module-level variables are what is meant.)

Ada as we know it actually permits arbitrary side effects in expressions, but taking this seriously suggested to me a language design in which subroutines and functions are firmly separated but treated on an equal footing insofar as possible. Subroutines don't return values but can have call-by-value (Ada in), call-by-result (Ada out), and call-by-value-result (Ada inout) parameters, as required by Steelman requirement 7F. Functions have call-by-value parameters and return a single value.

Syntactically, the body of a subroutine is a sequence of declarations of local variables with or without values (specifying a value is equivalent to a declaration plus assignment), primitive commands including assignments and calls to subroutines. The body of a function consists of declarations of local variables with values, commands that update module-level variables (possibly depending on the previous value of the variable), and an expression that provides the value of the function. Subroutines can contain conditional commands, functions can contain conditional expressions.

So the impure part of the program (which includes the main program) is specified by subroutines that may invoke functions in order to compute values. The functions are the pure part of the program and may not invoke subroutines. Pure code may or may not be evaluated lazily; values returned to impure code are forced. So this is something like Haskell, but impurity is not a bag on the side and laziness is not an implementation requirement, just a choice.

Encapsulation of Impurity

I like Ada, I especially like Ada's package and generics system, but I don't like the way they added the OO stuff using tagged objects.

What I am interested in is the encapsulation of Impurity. The following:

f(x, y) = {
   t := 0
   while x < y:
      t += x
      ++x
   return t
}

Has an impure implementation, but has a pure interface. In Haskell this would need to be in the State/IO monad, but what I would like to see is that we look at the interface not the implementation. I am interested in observable purity/impurity.

ST, Linearity

In Haskell, that would be in the ST monad. The impurity is encapsulated by runST. This is guaranteed by the type system by cleverly using a `forall` qualifier. The main disadvantage of ST is that we cannot use it to model continuations or generators (and hence concurrency) with impure implementations. This is ensured by the type system, but the fundamental issue is that the capability to "copy" a continuation value is incompatible with ongoing in-place mutation of captured STRefs. ST is only "safe" because we contained it to the linear context of a monad.
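For example, a rough sketch of the earlier loop with its STRef-based implementation encapsulated behind a pure interface; the rank-2 type of runST (`runST :: (forall s. ST s a) -> a`) is what prevents the references from escaping:

   import Control.Monad.ST (runST)
   import Data.STRef (newSTRef, readSTRef, writeSTRef, modifySTRef')

   f :: Int -> Int -> Int
   f x0 y = runST $ do
     t <- newSTRef 0
     x <- newSTRef x0
     let loop = do
           xv <- readSTRef x
           if xv < y
             then do
               modifySTRef' t (+ xv)
               writeSTRef x (xv + 1)
               loop
             else readSTRef t
     loop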

Another useful technique for efficient in-place mutation in functional programming is to leverage unique references. When we have a unique reference to a value, mutation is not observed. This also allows for "copy on write" approaches when we might have shared references. Linear or affine types can protect uniqueness, but aren't strictly necessary.

You have argued that objects should precede values, or at least be on equal footing. I'd call this goal a failure if we have major issues representing a pure interface to impurely implemented continuations. So ST and similar techniques won't work. (I suspect whatever you're envisioning would qualify as a "similar technique" in this regard.)

But linear values can fulfill this role admirably. We can understand linear structures as objects in the computation, with spatial and temporal properties, a history and trajectory. We can model conventional object oriented systems by explicitly modeling an address space for message routing (separating concept of object from address). But the default is also interesting in its physical analog: mobile objects that have no fixed address and instead only interact with their local environment.

But linear values can

But linear values can fulfill this role admirably.

Linear values don't compose.

composition of linear values

Can you explain your assertion?

In my experience, linear values compose easily enough that I've never considered it a problem. We can use them with normal data structures. We can literally compose two linear functions into a larger linear function. And so on.

I grant this: the presence of linear values in a language pervasively influences how we represent data structures and algorithms. Conventional data composition patterns might not be compatible with linear types.

For example, for record types, we conventionally use `foo.x` syntax to access a value at label `x` and discard the remainder of `foo`. However, this is problematic because there might be a `foo.y` property with a relevant or linear type which cannot be discarded. So instead we must (in general) use a linear destructuring of records, e.g. `match rec with {x=myVar | rem} -> body` like row-polymorphic records. For large, recursively structured composites, the Huet Zipper (data structure) has proven very widely applicable for developing algorithms to safely access, update, and process trees or lists that may contain linear values.
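As a small sketch of the zipper idea (plain Haskell, with the linearity annotations left out): the focus is separated from its context, so an element can be swapped out and back without implicitly discarding anything, and the whole structure is rebuilt afterwards.

   data Zipper a = Zipper [a] a [a]   -- context before (nearest first), focus, rest

   fromList :: [a] -> Maybe (Zipper a)
   fromList []       = Nothing
   fromList (x : xs) = Just (Zipper [] x xs)

   right :: Zipper a -> Maybe (Zipper a)
   right (Zipper before x (y : ys)) = Just (Zipper (x : before) y ys)
   right _                          = Nothing

   -- exchange the focused element; the old value is returned, not dropped
   swapFocus :: a -> Zipper a -> (a, Zipper a)
   swapFocus new (Zipper before old after) = (old, Zipper before new after)

   rebuild :: Zipper a -> [a]
   rebuild (Zipper before x after) = reverse before ++ (x : after)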

Linear Array of Linear Values

Uhm. I need to reconstruct the observation from memory, but the problem was, as I remember, a linear array of linear values. Since you can't reference a component two times, you're stuck with swapping it out, modifying it, and swapping it back in.

Since that can be translated to an in-situ update I imagine that that problem is solvable but hasn't been solved yet. Though it follows the general pattern of 'introduce type system X to solve problem Y to end up with non-trivial problem Z.'

linear arrays

My recommendation is to partition and append arrays. A runtime can implement append very efficiently in cases where we know (statically or dynamically) that the components are adjacent in memory. Also, because arrays are usually language primitives, it is reasonable to introduce primitive features to enforce O(1) logical append of adjacent array fragments, and to track "holes" - array fragments consisting of undefined entries.

An advantage of partitioning compared to swap-and-modify is that partitions work nicely with fork-join parallelism. We can pass each partition to a separate task, update in place, then logically glue the pieces back together.

My own language simply doesn't have arrays as a primitive data type. But I instead optimize some lists to semi-transparently use an array or array-fragment representation, controlling this optimization via annotations. The common list length, access, update, append, fold, scan, etc. functions are transparently replaced by the compiler or interpreter with equivalent implementations for arrays.

Of course, using an intmap or finger tree (or an LSM-tree variant thereof) is also a pretty good option given linear operations on these structures can also feasibly avoid allocations. Having small array fragments near the leaves might support greater efficiency, though.

Hansaplast

Sure, there are some solutions to ease the pain of the most obvious problems. But swapping is going to hurt when you've modeled a matrix as a linear array of linear arrays of linear values.

The problem becomes worse when mixing linear and shareable resources in a data structure. I am not sure you run into Rice. You're looking at some language inclusion problem, at best that means equivalent to deciding FSM membership, maybe intersection. At worst, you're looking at language inclusion.

Arrays are a polite fiction

Arrays are a polite fiction anyway. We don't really get "O(1)" update and access because of physical data layout and speed of transfer. If I were asked which is more fundamental - arrays vs. linear values - I'd say the latter. Hence, I attribute these "pains" you describe more to the fiction of flat arrays and random access memory. (An IntMap or finger tree zipper, for comparison, makes structurally clear which fragments are in "near" memory without relying on a hidden cache effect.)

Regarding a matrix, partitioning works fine. We split one array, then the next, then operate on the value, according to structure. Swap is awkward because it doesn't generalize nicely (due to need for dummy values of the correct type, and the non-uniform overhead for swapping the dummy value for struct types). Partitioning with a gap generalizes easily. It's essentially zippers for arrays. The only issue is that "zippers for arrays" are a little troublesome to get "right" in a simple static type system when compared to zippers for trees or lists. (It would help a lot if the partition type contained the array size, for example, but that gets us into dependent types.)

I've not encountered any difficulty mixing sharable values with linear ones. We can have records with both shared texts and linear components, for example. Can you illustrate the difficulty you're envisioning?

Maybe

Yah, but it's still a band-aid solution to one particular problem. And the problem with band-aid solutions is that you don't know what other problem you're going to hit around the corner.

Moreover, it also depends on the setting. Using linear types in a functional language to allow for efficient in-situ updates is very different from using them for live time analysis in an imperative language.

I am not convinced.

Linear decomposition of structure

Linear decomposition of structure, whether through techniques like zippers or row-polymorphic records, is a general and natural solution. Arguably, it's the symmetric dual to composing structures with linear elements. Partitioning of arrays is just one instance of this more general solution. That's essentially a zipper on the array type.

Perhaps you believe it a "band-aid solution to one particular problem" because I did not show my work, did not explain above how I originally answered that problem. That's understandable. Arrays are popular enough that I've thought about them years ago, and consequently I have an answer available in my memory. But zippers are a concept and solution that can be systematically applied to algebraic data structures. If you've composed a well-behaved data structure with linear values, there is necessarily some way to decompose, access, and modify it linearly.

Using linear types in a functional language to allow for efficient in-situ updates is very different from using them for live time analysis in an imperative language.

For real-time interactive processing, what we need is a model to inject input data and extract required output data in small steps, and a real-time garbage collector. With this much, we can develop real-time systems. The developer needs only to ensure bounds for how much memory the program requires and how much work is performed between input and output. If the bound is smaller than the period between inputs, it can be real-time.

Functional vs. imperative program model isn't a significant distinction here. Requests for effects are essentially a vector for output as far as the external real-time system is concerned; it doesn't matter how we express this within the program. The entire issue of paradigm seems irrelevant in this context.

And introducing linear values is also nearly insignificant. It only offers greater efficiency of avoiding allocations and garbage collection for a significant portion of the work performed. And it makes working with arrays more tolerable, allowing update in a false "O(1)" instead of a naive O(N), so we might suddenly favor arrays instead of an intmap or whatever. The improved efficiency might indirectly enable live update for edge cases where raw performance was previously almost-sufficient to keep up with high frequency live inputs, but there are no guarantees.

I'm curious what you believe is "very different".

Overshoot

Well, I shouldn't have said "very different". I was roughly thinking about the difference between having mutability native or expressed through an interface on abstract functions for in-situ updates. Somewhat in a similar vein as the distinction Keean made between monadic and imperative programming. But when I think about it, I assume that inlining and peephole optimization could take care of a lot of problems.

Linearity

The problem I have with linearity is how it interacts with generic programming. Take for example a very simple algorithm like "swap". We want swap to take two mutable references to values of the same type and swap them. The problem I have encountered in languages like Rust is that it won't allow two mutable references into the same array, which prevents you writing a simple algorithm like an in-place sort using swaps. My view of programming is that algorithms are fundamental, and simple (efficient) algorithms are more valuable than complex or inefficient algorithms. So here linear typing prevents us using the simplest efficient algorithm, which means it is working against the programmer rather than with the programmer. Yes you can work around this by either giving up efficiency or simplicity, but if this happens with every simple algorithm, you will end up with a complex mess. I think there is a complexity budget that limits the most complex system a programmer can work with, so you need to be very careful to not introduce unnecessary complexity or you reduce the maximum size of program that a programmer can reliably work on.

Having established that, the array problem could be a limitation with Rust, and not fundamental to linear types. We could view an array as an index of mutable references, the index being trivially constructed from the range of addresses from the array start to the array end. You can then pass ownership of the whole array by passing a linear reference to the array, and pass individual cells to swap by their 'synthetic' mutable reference. You would of course have to write the sort in a way the type system can confirm the same mutable-cell-reference does not get passed to swap as both arguments.
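As a sketch of that direction (in Haskell's ST rather than Rust, and using plain indices rather than synthetic references): swap is expressed against the single owner of the array plus two indices, so no aliased mutable cell references are ever handed out, and a simple in-place selection sort stays simple.

   import Control.Monad (foldM, forM_)
   import Control.Monad.ST (runST)
   import qualified Data.Vector.Unboxed as V
   import qualified Data.Vector.Unboxed.Mutable as MV

   selectionSort :: V.Vector Int -> V.Vector Int
   selectionSort v0 = runST $ do
     mv <- V.thaw v0                        -- local mutable copy, single owner
     let n = MV.length mv
     forM_ [0 .. n - 1] $ \i -> do
       let pick best j = do                 -- index of the smaller element
             a <- MV.read mv j
             b <- MV.read mv best
             pure (if a < b then j else best)
       best <- foldM pick i [i + 1 .. n - 1]
       MV.swap mv i best                    -- swap by index: no aliased cell references
     V.unsafeFreeze mv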

Artificial problems

I can say from experience that linearity doesn't prevent generic programming. However! This is assuming a language designed for linearity in every primitive feature. Languages with non-composable features are hell on generic programming no matter what those features are.

It's unfortunately easy to introduce features unfriendly to linearity. An example I already described in this thread: conventional record accessors conflict with linearity due to discarding data implicitly, whereas row polymorphic record deconstructors work well by allowing capture of the record's remaining data.

Arrays are another primitive that needs special attention in context of linearity. We'll at least want some way to partition mutable access to arrays and treat partitions of one element as equivalent to other mutable references.

Linear Typing of Impure Imperative Languages

I think I agree. With the right approach to modelling arrays, I think you could design a language that looks like the nice bits of JavaScript, but is linearly typed in the IO monad. If you also added effect types (again without changing the syntax) and effect constraints, you could then selectively restrict effects for parts of the program. We can have a wide range of effect constraints like no-mutation, no-heap-allocation, no-stack-allocation, no-exceptions. The tightest constraint would effectively prevent any runtime failure by not allowing allocation (no out of memory) and not allowing exceptions (no division by zero), maybe even not allowing context switches or interrupts. If we define non-termination as an effect we could even constrain that, or with the right cost function constrain the maximum execution time of a block of code. Such things seem esoteric if you look at current languages, but even interactive UI programs could do with 300ms constraints on blocking the UI.

ST monad is not enough

The ST monad allows encapsulating state, but it's a bad way to do it. For a start, ST monads do not compose, so writing an algorithm that needs access to two different state 'encapsulations' at the same time is clumsy and not generic. Further, we might consider other encapsulations 'safe': for example, if we have exclusive access to a file we might wish to encapsulate that in a pure function. As a simple example, a function that is passed an exclusive capability to access a file and returns the entire contents as an array can be considered pure, because provided the capability was given at the start of the program, the file cannot change.

This leads to an interesting thought, purity can be for a given time-span only. We see this in Haskell that you can call a pure function from impure code. Considering the above we can also allow pure code to call encapsulated impure code, providing immutability is maintained for the scope of the pure code. This allows nested structures where impure code can give exclusive capabilities to a pure block of code, that can in turn call encapsulated impure functions that use the capabilities, with the restriction being that no mutation from the encapsulated impure code must be visible from the pure code, but such mutations can be visible from the outer impure context.

Re: ST not enough, monad composition

That monads do not compose is not a big concern. Although concrete monads don't compose, it's trivial in Haskell to abstract monadic APIs under corresponding type classes, and hence compose feature sets. Writing an implementation for a set of monadic APIs isn't very different from writing a set of handlers for algebraic effects. We essentially can develop generic subprograms of type `(EffectA m, EffectB m, etc.) => args -> m results`. Or even more generically, `m args results` so we can potentially constrain compositions, e.g. use arrows or categories instead of monads.
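Concretely, with real mtl classes standing in for the placeholder EffectA/EffectB, the constraint set names exactly which effects a subprogram may use, and a concrete monad is picked only when we run it:

   import Control.Monad.State.Class (MonadState, get, put)
   import Control.Monad.IO.Class (MonadIO, liftIO)

   tick :: MonadState Int m => m Int                  -- may touch the counter, nothing else
   tick = do
     n <- get
     put (n + 1)
     pure n

   logTick :: (MonadState Int m, MonadIO m) => m ()   -- counter plus IO
   logTick = tick >>= \n -> liftIO (print n)

A pure run for the state-only part can then be supplied with something like evalState, or an IO-backed one with a StateT over IO.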

That Haskellers don't normally pursue this seems to mostly be a community style thing. To write code as generically as possible is not widely accepted as a goal. But there's also some inertia involved: the required extensions to Haskell (type families, rank N types, etc.) were developed later than the dominant style.

We can leverage ST for pure implementations of a `run` method for many sets of effect classes. There is no need to compose ST generically.

I agree that ST is insufficient in general. My reason is because `runST` doesn't let me return a continuation that preserves efficient access to encapsulated arrays. This constrains use of ST in incremental computation contexts. I cannot use ST to model generators that yield or coroutines that interact in small steps with external input from pure contexts. This is why I resort to linearity. ST continuations could be safely modeled if tagged as an affine type.

Regarding your last few sentences, at "such mutations can be visible from the outer impure context", that's probably incompatible with optimizations on pure code. But if you're just building up a set of effects to run upon returning successfully, you could basically use a writer monad.

Shifting Sands

I agree that effects offer a nicer way to do this, and with a large amount of typeclass gymnastics you can simulate row polymorphism (we did it back in 2004 in the HList paper), but doing it took real effort. I even got as far as a complete web framework with runtime plugin modules and relational algebra, but monad transformers really killed it for me. On reflection I could have implemented an effect system on top of HList, but it felt like I was fighting the type system to achieve my goals. I was effectively implementing a new language inside of Haskell that did what I wanted, but I was stuck with the syntax, and having to duplicate code at the value and type level.

They say that all languages that are Turing complete can implement any program, and can duplicate any style, it's all about whether they make that style easy. You can write OO code in 'C' by explicitly constructing the vtables, but that does not mean it is going to be easy to debug and maintain. To me trying to write the code I wanted to write in Haskell reminded me of trying to write OO code in 'C' back in the early days. A clear sign that a new language is needed.

indeed

And people are developing functional programming languages with algebraic effects - e.g. Idris, Eff, and apparently your own ZenScript which I keep hearing about whenever Shelby tags me.

Hacker News discussion

Hacker News has over sixty comments as well. Several are insightful. Here.

Monads, Effects, and Language Design

Just read through some of the HN comments. I think I need to clarify some things. Monads in themselves, as in category theory, are neither good nor bad; they are (as someone on HN pointed out) a design pattern, an abstract interface. What is more interesting, and seems to be missed, is that a monad is there whether you explicitly name it or not. JavaScript code can be considered to be in the IO monad. We can probably give types to most JS statements provided we have some syntactic sugar.

Monads have several problems, the biggest one being that they don't compose, and monad transformers are a pain to work with. Algebraic effects seem a better way to handle this to me. Yes, we can view algebraic effects as a row-polymorphic monad, and that may be how we actually type them.

Instead of starting with a pure functional language like Haskell and adding monads for state and IO, we could start with an impure imperative language and impose constraints on the allowed effects. This might be an interesting way to design a language.

Monadic programming and Schrödinger's cat

It seems very clear to me (but I haven't had much luck explaining this so it makes sense to programmers) that one of the major objections to monadic programming is the same as one of the major objections to quantum mechanics.

In programming: The actual, real, practical, experienced world, the one that we live in, is a sequence of states. It's imperative. Pure functional programming isn't like that; it's declarative, and mathematicians like the formal well-behavedness properties of that, and would like to not have to deal with imperative state. So the pure functional folks came up with this monadic programming idea that lets them, for mathematical purposes, indefinitely postpone applying their pure functions to imperative-stateful reality.

In quantum mechanics: The world we experience at any given moment has a classical state, that is, things are a certain way. However, the rules about how one such classical state relates to the next are probabilistic; moving forward from one classical state is a superposition of possible successor states. The mathematics of how this superposition of states evolves over time is rather elegant (comparatively speaking), a "wave function" propagating according to a straightforward wave equation. However, there is no actual classical successor-state, only a superposition of possible successor states. Sooner or later, the system has to be observed, at which point something happens (in the mathematical description) called wave function collapse, in which the quantum state (superposition of classical states) derives probabilities that various classical states will be observed. From whichever classical state you observe, you then start over with a new propagating wave function. It's possible to get along by using this tool, at least for some cases. The wave functions are nice clean mathematics (if sometimes hopelessly difficult to solve), the wave-function-collapse is really ugly but one hopes to nudge it into a place in one's calculations where it doesn't get in the way of producing a useful prediction. And some physicists are satisfied with this arrangement, or at least settle for it in practice, while others get really worried that this isn't a complete description of reality, we're missing some important piece. Physicists have batted this problem about for the better part of a century without resolving it; they can't even agree on whether or not it's a problem.

A standard demonstration of the problem is the "Schrödinger's cat" thought-experiment. A cat is put into a box, with a device that might kill the cat depending on the outcome of some random quantum event such as a radioactive decay. The box is closed, and we suppose that while the box is closed we cannot tell what happens inside. Is the cat alive or dead? Neither, says quantum mechanics; it's in a superposition of states, essentially both alive and dead. Until we open the box, at which point either we observe that it's alive, or we observe that it's dead. But is it really okay to entertain a superposition of cat?

Hopefully the parallel with monadic programming is visible, here. The pure-functional program with its monadic structure is like the quantum wave function, mathematically clean though not necessarily easily solved, but apparently missing a fundamental aspect of reality that can only be reintroduced by applying the program to an actual imperative state of reality to produce a new imperative state of reality, from which then one starts over. And some people find the arrangement satisfactory while others feel that the whole framework is missing an important part of reality without which the description must be (qualitatively?) flawed.

It figures that this isn't going to be easily resolved, given that theoretical physicists have been at it for generations. About the only thing that might resolve the physics debate is some sort of theory that eliminates the wave-function collapse. Likely the same is true in programming — a whole new "paradigm" would be needed.

The actual, real, practical,

The actual, real, practical, experienced world, the one that we live in, is a sequence of states. It's imperative.

That first sentence I agree with. The second I find ludicrous. There is no particle or field in the real world that we (or any agent, other than a god) can manipulate imperatively. It seems terribly limiting to conflate stateful with imperative.

There are lots of paradigms that involve a sequence of states. The closest I know to the real world is cellular automata. More distant, but much more convenient to program, is a temporal logic (cf. Dedalus programming language) or term rewriting (e.g. Maude, OBJ), which can loosely be understood in terms of physical syntactic structures evolving in-place according to global programmer-defined physics rules.

mechanical computation

I like to say that "computation is mechanical mathematics". To add two numbers requires their physical representations have spatial-temporal proximity. Moving information is not free. Destroying information creates heat. Unlike pure math, we must carefully consider costs and frictions in computation or programming.

These mechanical and physical aspects of computation are most obvious when all states are syntactically representable. For example, it's very clear how cellular automata or term rewriting or even a Turing machine proceed mechanically through a sequence of states. Every step is visible. For those models, we can usefully view concrete syntax and source code itself as a linear object that has a sequence of states that an evaluator manipulates according to a set of rules. Essentially, the evaluation model is a disciplined approach to self-modifying code.

Lambda calculus obscures this somewhat due to the sophisticated automation and optimizations surrounding variable substitution. But purely functional programming can also be done with confluent combinator rewriting, no need for variables or substitution. I favor combinators over lambdas, but we can easily translate between the two. Regardless of representation, purely functional programming is still mechanical in nature. Further, it's a very discrete and predictable mechanics, so I see no similarity with state superpositions.
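As a toy illustration of that mechanical view, here's a sketch of SK combinator rewriting where every intermediate state is a visible syntactic term:

   data Term = S | K | App Term Term deriving Show

   step :: Term -> Maybe Term
   step (App (App K x) _)         = Just x                          -- K x y   ~> x
   step (App (App (App S f) g) x) = Just (App (App f x) (App g x))  -- S f g x ~> f x (g x)
   step (App f x) = case step f of                                  -- otherwise reduce inside
     Just f' -> Just (App f' x)
     Nothing -> App f <$> step x
   step _ = Nothing

   normalize :: Term -> [Term]    -- the full sequence of machine states
   normalize t = t : maybe [] normalize (step t)

   -- normalize (App (App (App S K) K) K) steps S K K K down to K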

Effects are fundamentally interactions between a computation and its context. Like everything else, they must be implemented mechanically. That is, a context must inject or extract data between or during computation steps. This holds even for imperative system calls, which essentially must rendezvous between a process computation and OS context. The OS will extract the call parameters and inject result data.

Monadic effects models are not different or special. We have a term that rewrites to yield some operation and a continuation. The external context, whether the OS or an intermediate interpreter, will rendezvous to extract the operation then inject data for the continuation. Unlike an imperative thread, it's a lot easier to think about a monadic computation as a value or as a stateful linear object. This is because the point of interaction is explicitly modeled as a value type. But the essential nature of state, interaction, and effects remains the same. There is nothing similar to wave-function collapse when we observe a proposed effect, whether monadic or imperative.

The concept of state superposition or wave function collapse is neither useful to understand the state of a monadic computation, nor to understand effectful interactions between a monadic program and its environment.

Well, I grant an exception if we're modeling a monad for probabilistic programming or similar. If we are modeling such a monad at the effects layer, then we should yield a set of proposed effects, each weighted with a probability, each accompanied by a corresponding continuation. It would then be up to our external system to pick one and continue - this choice would be analogous to a "wave function collapse". Although, intriguingly, in some effectful contexts it might be feasible to select multiple effects (e.g. reducing 100 possibilities to 3 instead of 1) or inject multiple probabilistic results for a path. The k paths chosen add up to 100% instead of selecting 1 path with one result at 100%. Partial collapse will naturally lead to anti-causal entanglements where the probability for chosen past effects is refined based on choice of future effects.

There is no concept

There is no concept of state superposition or wave function required.

I feel the point of my post has not come across to you. I drew an analogy between the machinery of programming and the machinery of quantum physics; I wouldn't expect the particulars on either side of the analogy to show up on the other side.

Your analogy relies on

Your analogy relies on ignoring the mechanical, observable, ultimately physical realization of computation. Purely functional programming is inherently restricted by the "programming" aspect to reject non-computable properties, such as the law of excluded middle or the axiom of choice. There is nothing unreal about computation. Hence, I reject your analogy.

I also feel you're making invalid distinctions between imperative effects vs. monadic effects, despite both essentially having the same mechanical realization - a local rendezvous between a computation and its environment (e.g. a runtime or OS). Hence, I reject a major premise upon which your analogy was constructed.

I'll edit the quoted segment to: "The concept of state superposition or wave function collapse is neither useful to understand the state of a monadic computation, nor to understand effectful interactions between a monadic program and its environment." I think this more clearly states my third point, which is that your analogy isn't worth much.

As I mentioned, there are some interesting ways to realize an analogy between quantum physics and monadic programming, such as modeling, composing, and selecting among probabilistic effects. I mentioned this because it's something that interests me. I've contemplated such techniques in the past, in context of improving HCI (here). I hoped you'd be interested in these "particulars" of modeling probabilistic effects with quantum-like properties, but oh well.

Your analogy relies on

Your analogy relies on ignoring the mechanical, observable, ultimately physical realization of computation.

My analogy isn't even related to the physical realization of computation. Nor does it have anything to do with imperative effects versus monadic effects as you mention in your second paragraph. Somewhere between the two of us, each putting thoughts into words interpreted by the other, by the end of the round trip the original thoughts are no longer in evidence (like the urban legend about the automatic translator that started with "The spirit is willing but the flesh is weak", translated to Russian and back to English).

You claim your analogy has

You claim your analogy has nothing to do with the nature of computation and expression of effects. Yet, you based the analogy on a distinction between imperative and purely functional monadic code regarding exactly those things. So I don't believe they are unrelated. The analogy doesn't hold simply because purely functional programming isn't like what you described.

Yet, you based the analogy

Yet, you based the analogy on a distinction between imperative and purely functional monadic code regarding exactly those things.

No, I didn't.

We're failing to communicate. I want to communicate. I have no interest whatever in "fighting", nor in causing offense. Yet I'm not sure how to explain the problem in a way that will help things along. It seems you have come to expect certain kinds of errors of thought from others, you look for them, and once you believe you've found them you look no further. But to me the big concepts are the point of the conversation, and you're pouncing on specific words. It's kind of as if one day I looked up and saw some truly incredible sight splashed across the sky, and pointed it out to my neighbor, only instead of looking up at the sky where I was pointing, they objected to the gesture I was using to point. But the sight really is worth seeing, and I'd still like to share.

There's a difference between

There's a difference between "look at this, I find it interesting" and "purely functional programming with monadic structure is like this thing." The latter involves an assertion about properties of things.

Your earlier assertion, "the pure functional folks came up with this monadic programming idea that lets them, for mathematical purposes, indefinitely postpone applying their pure functions to imperative-stateful reality" is required to accept your analogy, "The pure-functional program with its monadic structure is like the quantum wave function, mathematically clean though not necessarily easily solved, but apparently missing a fundamental aspect of reality that can only be reintroduced by applying the program to an actual imperative state of reality".

Yet it's hard for me to find a phrase I don't object to in your earlier assertion. Applying functions to reality? Effects don't work like that, neither monadic nor imperative. Postpone effects? No more or less than imperative (*), so why single out functional? Imperative-stateful reality? There is nothing imperative about reality's state model, consider cellular automata instead.

(*) Because both have stateful evaluators (e.g. a linear structure in FP is no less "real" a stateful object than any object in context of OOP or imperative), and both can indefinitely delay emitting explicit externally observable effects whether by monadic yield or system call.

I believe it was your intention to share an analogy that you found interesting. After rejecting your premise of similarity, I can't even begin to accept the analogy. But is that really due to miscommunication?

I contribute to LtU because

I contribute to LtU because I hope to be part of interesting, constructive exchanges of ideas; but I reluctantly no longer think that is your goal.

exchange of ideas

I did mention an idea I find interesting and constructive - how probabilistic effects monads and selecting among probable effects give us something very similar to quantum entanglements, superimposed states, and wave-function collapse. You casually dismissed it as irrelevant to your analogy. That's fine. You don't need to be interested in the same things as me. But I would have enjoyed an engagement on it, even if you were critical.

To me, a good exchange of ideas is one where the participants are also willing to give and receive criticism about those ideas.

Without criticism, I suppose another option is a museum - a place dedicated to the muses, a place to share art and artifacts to spark the imagination. I don't consider LtU a museum. Although, it sort of feels that way these days, with the few front-page articles rarely receiving comment.

If someone says, "imperative is like an apple because both are edible", how should an audience respond? Perhaps a clever responder might say, "no, but imperative is like an apple because both are full of wormholes", punning on an occasional name for side-effects that channel data behind the scenes. But there must be some misconceptions involved to create the analogy.

Your analogy is based on your conceptions about purely functional programming, which you described when leading up to your analogy. I consider those conceptions deeply wrong for reasons I've since given. Although I do vaguely understand how you might reach those conceptions, if you have not actually done much purely functional programming and effects modeling.

In your conception of LtU, should I have held my tongue?

You have afaict consistently

You have afaict consistently assumed that you understood my analogy. Your remarks indicate, to me, missing the point; it's only an added irony (or is it meta-irony) that the disconnect between us fits neatly into the pattern I'm describing that you're not seeing. I wondered, should I be trying to explain it a different way, hoping to get it across; but on one hand, it wasn't clear to me what to say that might work better, and on the other hand, you seemed to have become so obsessed with "debunking" what you claimed I was saying that it seemed unlikely you would pay any attention to an attempt to explain what I actually was saying. So it seemed my only options were either to just not respond to you (but that would be giving up hope of useful conversation, and I don't like to give up hope), or to try to politely point out that you'd misunderstood. At that point, the ball would be in your court. Your two likely responses would be either to ask some sort of question about what I had meant — indicating interest in communication — or to double down on your claim that what I was saying was invalid. You doubled down. I concluded, reluctantly, that tearing me down is a higher priority for you than communicating.

To me, a good exchange of ideas is one where the participants are also willing to give and receive criticism about those ideas.

Exactly. I'm uniquely well situated to recognize if you've misunderstood my analogy, yet when I indicate that you have, you prefer to insist my analogy is BS rather than entertain the possibility you might be fallible.

You have afaict consistently

You have afaict consistently assumed I don't understand your analogy. Yet you always STOP there, making no effort to explain or clarify based on what you have observed of my understanding.

IMO, repeating that I misunderstood without a word to explain my misunderstanding or clarify your point is not only unconvincing, it's also condescending. Unconvincing because it seems at least as likely that you misunderstood my counterpoint or what it reveals of my understanding. Condescending because you've assumed I'm either so close-minded as to ignore an explanation if you were to offer one, or that I'm so far beneath you that I don't deserve an explanation.

Seems you hoped for me to ask for said explanation, then assumed I was close-minded when I did not. Clearly we have very different ideas about how a discussion should proceed. Probably some cultural influence there. To me, if someone doesn't explain, it's because they don't want to explain, not because they're secretly hoping for a question. Asking can even be impolite - presumptuous, nosy, annoying. There is no body language or tone of voice in this medium to cover the gap. So if you don't volunteer an explanation for discussion, and I don't have explicit permission/authority to request an explanation, I am certainly not in the habit of asking. It's entirely up to you whether you actually "want to communicate".

Perhaps the meta-discussion has been slightly useful. Or it has potential to be useful, if either of us were to remember these cultural differences next time. (Which seems optimistic.)

So tell me, what do you believe I have misunderstood, and why?

How will you clarify your original point to avoid this misunderstanding in the future?

What other questions do you secretly hope to see on this topic?

Ahem

Monads may or may not be a waste of time, but this discussion most probably is. It may be time for the two of you to cool off, agree to disagree on this analogy, and go back to productive conversations another day on another topic.

(throat lozenge?)

Though I take in good part the caveat, I expect to make one more (careful) comment in the main line of this subthread, when I get a chance to compose such. On one hand, I owe dmbarbour a reply to their question; and on the other, if we're to have productive conversations another day we need the air to be well cleared now.

sigh

I fear gasche is only about one-third right. Probably right that this particular subsubbranch of discussion is no longer accomplishing anything useful, but wrong that we're disagreeing about the analogy, and probably wrong about the prospect of future productive conversations. And I really hate to give up on future prospects, which causes me no end of trouble.

I'm thinking I should save my efforts to explain my thinking for the recent reply by someone else.

Heh

And I really hate to give up on future prospects.

Because it suggests that your language is not expressive enough :-)

Verily

Or not abstract enough.

imperative world

a late comment: I disagree that "The actual, real, practical, experienced world, the one that we live in, is a sequence of states. It's imperative." This is just a very specific way to describe the world. For example it is really difficult to account for sound in such a world.

But I do agree that there is a valid analogy, and also that "a whole new "paradigm" would be needed" - maybe by drawing from different metaphors, or by revising these metaphors fundamentally.

I disagree that "The

I disagree that "The actual, real, practical, experienced world, the one that we live in, is a sequence of states. It's imperative." This is just a very specific way to describe the world. For example it is really difficult to account for sound in such a world.

I could quibble a bit on this. For one thing, in that paragraph I was talking about programming, not physics. And even for a classical physics treatment of sound, one might take "sequence of states" to include a continuous sequence of states. But those are minor quibbles.

More to the point, I think there's a flaw in the way I first presented the analogy (no surprise; I did say I've been having trouble explaining it). If, as you say, we're looking for new metaphors, perhaps the flaw is still worth trying to fix. At heart, what I was trying to describe is a contrast between two different ways of approaching a description of reality, and my description was flawed because, being a participant in one of the two ways myself, I was describing it from the perspective of that side of the contrast. The contrast in how people view things should, perhaps, be emphasized over the technical details (and yet, how can one describe it at all except in terms of the technical details?). This is going to take a bit of space, alas.

(I'm also having trouble with whether to start with the physics side or the programming side; I seem to recall once before trying to express this and suspecting it had fallen flat because I started with the physics side and lost programmers that way. While if I start with programming — monads — it seems that's the less clear side of the analogy. Hm.)

In physics, there was the famous.. or notorious... disparity of view between Einstein and Bohr; though it's not limited to just those two physicists. This is one of those things where people disagree about whether or not they understand what the disagreement is about; it's that fraught. But one of the issues floating around in that has to do with wave-function collapse. Is it really adequate to view the entire universe as a wave function that never collapses? The wave function is traditionally understood as describing a distribution of probabilities of classical states of the universe, but if the universe actually has no classical states... well, some people see that as a problem. [JS] Bell did some thinking on this that I, at least, found quite insightful. Recalling free-hand: In a (now old-fashioned) CRT, an electron gun fires an electron at a screen. It causes a scintillation. A quantum-mechanical wave function predicts the probability distribution; but there is nothing in the wave function that suggests the discrete event of the scintillation. Bell says it better, of course, and discusses various interpretations (such as many-worlds). But there's always going to be this basic contrast between those who see it as a problem that the wave-function doesn't have that discrete event in it, and those who don't see it as a problem.

In programming, a state monad does something similar to a wave function, by providing a pure mathematical description without requiring things to settle on a particular state of reality. The program takes the stateful world as a parameter, and computes the traditional result plus the new stateful world resulting from interaction with the program. Providing a stateful-reality parameter is like collapsing the wave function. And you see, by this point in my account of the analogy, some people who will be screaming to high heaven that this is not a problem, there's nothing "missing" from this monadic description of the program's interaction with stateful reality, while other people see the whole monadic approach as setting up a complicated mathematical device in order to pretend statelessness when reality is really stateful. Basic disagreement about whether there is a problem, and over a mathematical description that suspends choice between states, just as arose in physics.
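
Concretely, the state monad I have in mind here is essentially the standard one (a minimal hand-rolled sketch; real programs would use Control.Monad.State):

newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State mf <*> State ma = State $ \s ->
    let (f, s')  = mf s
        (a, s'') = ma s'
    in  (f a, s'')

instance Monad (State s) where
  State m >>= f = State $ \s -> let (a, s') = m s in runState (f a) s'

Nothing is "settled" until runState is applied to a concrete initial world; until then the whole program remains a pure function awaiting its parameter.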

The flaw in my presentation was, apparently, that one really wants to present it in a way that doesn't require taking sides between the two contrasting views, within physics, or within programming, even though part of what the two sides disagree about is whether or not there's a problem.

And, as for metaphors, well, physics still hasn't figured out what to do with these two contrasting views — they're still disagreeing about whether there's a problem. So we're not likely to borrow a successful solution from physics, although I'm still hoping there might be some sort of insight from connecting the physics and programming sides of the analogy.

[Slipped up on the name; doh.]

forms of analogy and states of reality

There are many forms of analogy. You've argued not just that "state monad is like a wave function" (which arguably has some merit), but rather the form of analogy you chose is "imperative IS TO reality AS state monad IS TO wave function". Hence, there's more to take issue with.

I argue that state monads are just as real (and unreal!) as imperative. Insofar as state monads are like wave functions (due to manipulating a model of program state internally, rather than directly manipulating the greater universe), so is imperative. Insofar as imperative is like reality (due to evaluation requiring mechanical manipulations of concrete, physical representations of program state), so are state monads.

Neither monadic programming nor imperative paradigm is especially close to reality. Physical states seem to be guided from past to future by physical laws. In that sense, temporal logics, term rewrite systems, cellular automata, and other rules-based programming models that simulate "laws" operating on a state representation would be much closer to reality than either imperative or monadic programming.
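
To illustrate what I mean by a rules-based model (a toy example of my own, not drawn from the earlier posts): a one-dimensional cellular automaton, where a fixed local "law" carries the entire state from one generation to the next.

-- Apply a local rule to every cell, with dead cells beyond the boundaries.
nextGen :: (Bool -> Bool -> Bool -> Bool) -> [Bool] -> [Bool]
nextGen rule cells =
  zipWith3 rule (False : cells) cells (drop 1 cells ++ [False])

-- Rule 110, written out as a truth table on (left, centre, right).
rule110 :: Bool -> Bool -> Bool -> Bool
rule110 True  True  True  = False
rule110 True  False False = False
rule110 False False False = False
rule110 _     _     _     = True

There is no imperative mutation and no monadic sequencing here; the "law" simply maps one world state to the next.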

In any case, you seem to misunderstand where the disagreement with your analogy arises. You say state monads are missing something. I say sure, but those things state monads are missing, imperative also is missing. And conversely, those features imperative has, state monads also have. (Well, state monad or variants such as state-continuation monads to model concurrent threads.)

If your primary intention is to present an interesting relationship between program state or effects models and wave collapse, i.e. based on the fact that input to any program is a discrete observation on reality, or any program's model of world state is incomplete and simplified, then I think it would be better to leave out the whole issue of monadic versus imperative. As is, the impression I receive is that you favor imperative expression of programs because it's more "real" in your mind by some unjustifiable premise.

a state monad does something similar to a wave function, by providing a pure mathematical description without requiring things to settle on a particular state of reality. The program takes the stateful world as a parameter, and computes the traditional result plus the new stateful world resulting from interaction with the program.

These phrases "state of reality" and "stateful world" here seem like a potential point of confusion. A subroutine written in a state monad receives a representation of program state. Imperative subroutines also operate on program state - the heap and stack and data structures represented within them. In either case, the "particular" state is effectively an unknown parameter at the time of writing the subroutine.

Program state necessarily has a concrete, mechanical, physical realization during evaluation by a physical computer. In that sense, it could be considered a "state of reality". But also in that sense, it would be a state of reality regardless of whether we favor imperative or monadic expression for the program.

No matter which way I interpret it, I'm unable to see the proposed 'reality' of state as a valid distinction in your analogy. AFAICT, the relevant differences between "state monads vs. imperative" are more along the lines of "explicit vs. implicit" or "precise vs. ad-hoc" rather than "simulated vs. real". And even explicitness and precision are just a matter of degree, based on the language conventions, type systems, or the specific choice/API of the stateful monad.

Implicit Monads

Something is a monad because of its structure, not its name; hence the monad exists even where it is not explicitly named. Most imperative languages are monadic, as the monad expressed is a sequence with variable binding and a return value. It doesn't matter for analysing the program whether the monad is explicit in the type system or not, but obviously language quirks can get in the way of a clean analysis.
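
A small illustration of the point (my own example, nothing language-specific about it): an imperative-looking sequence of statements, with variable binding and a return value, has exactly the shape of monadic bind, as the standard do-notation desugaring shows.

prog :: IO Int
prog = do
  s <- readLn      -- "declare and assign"
  print s          -- "statement"
  pure (s + 1)     -- "return value"

-- The same program after desugaring: sequencing is just >>= and >>.
prog' :: IO Int
prog' = readLn >>= \s -> print s >> pure (s + 1)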

Particle wave duality

We can show by experiment that although you can generate a wave interference pattern with an electron gun firing at a phosphor screen with a double slit in the path, the electron actually only passes through one slit or the other. Hence the idea that the wave describes everything is clearly wrong. The wave is a probability function.

Indeed

I'm inclined to agree, and I take that to be Bell's position.

Monads aren't a waste of

Monads aren't a waste of time in the absence of an algebraic effect system, as others have noted. I haven't read the entire thread, so forgive me if this is essentially a repeat of other comments, but: monadic code, with or without syntactic sugar such as "do-notation," presents us with a well-defined algebraic structure that obeys a few simple laws and can be reasoned about in terms of those laws. In addition, all monads are also functors, applicatives, and so on, so all of the derived functionality from the typeclasses that every monad is also an instance of is available for your monad.

So while some people find the imperative-looking do-notation code disquieting (and some people even incorrectly call code that uses, e.g. IORef "impure"), it remains true that we can reason about such code via the substitution model of evaluation and equational reasoning—exactly the same intellectual tools as for non-monadic functional code.
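
For reference, the laws in question, written as Haskell equations you can use directly in equational reasoning (these are the standard monad laws; the trivial simplification below is my own illustration):

-- Left identity:    pure a >>= f      =  f a
-- Right identity:   m >>= pure        =  m
-- Associativity:    (m >>= f) >>= g   =  m >>= (\x -> f x >>= g)

-- Right identity, for example, justifies this simplification by substitution:
simplifyMe :: Monad m => m a -> m a
simplifyMe m = do { x <- m; pure x }   -- desugars to m >>= pure, which equals m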

A better question is: should we put everything in IO, "the largest monad that can exist?" I think it's fair to say it would be preferable to work in more than one smaller monad and find ways to compose them. This is why we have debates today about MTL vs. monad coproducts vs. Eff vs. other algebraic effect systems...

IO impure

As a long time Haskell programmer, I still think it's reasonable to "call code that uses, e.g. IORef "impure"".

We don't just evaluate an IO value. We expect to run it. Haskell does help us reason purely about correct abstraction and refactoring of IO fragments. But what we're most concerned about getting correct is the behavior when run, e.g. whether we need `atomicModifyIORef` at a given point, or whether asynchronous exceptions are masked correctly, etc. Equational reasoning won't help us there.
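
As a small sketch of the kind of run-time concern I mean (a toy example, not code from this thread): the read-then-write below is perfectly easy to reason about equationally, yet the lost update only shows up when two threads actually run it.

import Data.IORef

-- Racy: two threads may both read the same old value, so one increment is lost.
racyIncrement :: IORef Int -> IO ()
racyIncrement ref = do
  n <- readIORef ref
  writeIORef ref (n + 1)

-- The fix is an operational property of running the code, not an equational one.
safeIncrement :: IORef Int -> IO ()
safeIncrement ref = atomicModifyIORef' ref (\n -> (n + 1, ()))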

Idiom is not a waste of time

One of the things we spent time thinking about in BitC/Coyotos was how we could simplify reasoning about stateful programs. There was never any question that the programs would be stateful; operating system kernels are about as stateful as things get, and the state threading approach really doesn't generalize to multiprocessed multithreading. For such codes, much of the issue of dealing with state has to be handled by whole-program analysis that exploits (checkably enforceable) whole-program design constraints. Language tools and idioms can help a lot, but they aren't the whole answer, and one need not (indeed cannot) lean on the language exclusively.

But...

Whatever the means of expression, reasoning about stateful programming is hard, and limiting the amount of stateful program you need to analyze is overwhelmingly important. To this end, we attempted to focus on several things in BitC:

  1. Clearly identifying relationships between locks and state,
    so that we know when single-threaded reasoning is valid.
  2. Clear segregation and identification of functional sub-programs.
  3. Facilitating functional idiom where possible.
  4. Identification of access-only sub-programs (ones that source, but do not sink, stateful locations) which can be treated as functional in some useful ways when we know the sourced values aren't changing during the window of execution.

Our goal in all of this was to limit the portion of the program that had to be analyzed using stateful methods. It turns out that a large proportion of most stateful code is effectively accessor-only.

Looking back, I would put more effort than I did into state-threading idioms. While it is true that monads do not compose especially well, any idiom that reduces the burden of stateful analysis is helpful.

Analysis

I don't think it does help stateful analysis, though. Look at the two programs in the topic introduction: one is in the IO monad in a Haskell-like language, and one is in a JavaScript-like language. My whole point was that we can analyse the JS program as if it were in the IO monad, and hence there is no advantage to writing that program in Haskell. We can use the same analysis techniques in both cases.

But it does...

Unfortunately that isn't true. The reason it isn't true is that the JVM admits both concurrency and mutability as core concepts with defined semantics, and Java's concurrency semantics are somewhere between horrific and unspeakably bad. The closest thing to a sequence point in Java is the bytecode. Indeed, you cannot assume from one bytecode to the next that a cell of array type points to the same array. Further, Java shares with C# the problem that "readonly" members actually aren't, because of a flawed constructor specification, so you get no benefit from that.

Haskell, by contrast, has neither a problem with sequence points (because it is purely functional) nor a problem with concurrency (because it doesn't admit a notion of concurrency over shared mutable state).

So while I agree that what you say *appears* to be true at first glance, the difference in the respective low-level execution semantics of the two languages is profound, and has a very significant impact on just the sort of analysis you are talking about.

As with C, there are idioms that can be used to mitigate the analytic damage, most notably copying array references to a local (which can therefore be seen to be free from interference on a thread-local basis) and computing from the local reference. But the point is: you actually have to, and even having done this sort of thing, the analysis is anything but straightforward.

Fix Imperative

This may be true of Java, but I'm not a fan of Java partly for the reasons you state.

My point is that we could easily have an imperative language with sound semantics that conforms to a type system based on a pure functional monadic model, yet is compiled like an imperative language directly to assembly.

So we could start with the IO monad, define a C-like syntactic sugar, and have a beginner-friendly, OO-like imperative language with sound semantics.
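
As a rough illustration of the flavour I mean (ordinary Haskell IO, with names of my own choosing; not a real language proposal): the program below reads like C pseudocode, yet it is just a value of type IO Int with the usual Haskell semantics.

import Data.IORef
import Control.Monad (when)

sumTo :: Int -> IO Int
sumTo n = do
  total <- newIORef 0            -- int total = 0;
  i     <- newIORef 1            -- int i = 1;
  let loop = do                  -- while (i <= n) { total += i; i++; }
        iv <- readIORef i
        when (iv <= n) $ do
          modifyIORef' total (+ iv)
          writeIORef i (iv + 1)
          loop
  loop
  readIORef total                -- return total;

With a thin layer of syntactic sugar over this, the surface language could look entirely conventional while the semantics stay those of the IO monad.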