Non-determinism: a sublanguage rather than a monad

A puzzlingly named, exceedingly technical device introduced to structure the denotational semantics has by now achieved cult status. It has been married to effects -- more than once. It is compulsively looked for in all manner of things, including burritos. At least two ICFP papers brought it up without a rhyme or reason (or understanding), as the authors later admitted. I am talking about monads.

In truth, effects are not married to monads and are approachable directly. The profound insight behind monads is the structuring: the separation of `pure' (context-independent) and effectful computations. That structuring can be done without explicating mathematical monads, and especially without resorting to vernacular monads such as State, etc. This article gives an example: a simple, effectful, domain-specific sublanguage embedded into an expressive `macro' metalanguage. Abstraction facilities of the metalanguage such as higher-order functions and modules help keep the DSL to the bare minimum, often first-order, and hence easier to reason about and implement.

The key insight predates monads and goes all the way back to the origins of ML as a scripting language for the Edinburgh LCF theorem prover. What has not been clear is how simple an effectful DSL may be while remaining useful; how convenient it is, especially compared to monadic encodings; how viable it is to forsake the generality of first-class functions and monads, and what benefits that may bring. We report on an experiment set out to explore these questions.

We pick a rather complex effect -- non-determinism -- and use it in OCaml, which at first blush seems unsuitable since it is call-by-value and has no monadic sugar. And yet, we can write non-deterministic programs just as naturally and elegantly as in Haskell or Curry.

The running tutorial example is computing all permutations of a given list of integers. The reader may want to try doing that in their favorite language or plain OCaml. Albeit a simple exercise, the code is often rather messy and not obviously correct. In the functional-logic language Curry, it is strikingly elegant: a mere `foldr insert []`. It is a re-statement of the specification: a permutation is moving the elements of the source list one-by-one into some position in the initially empty list. The code immediately tells that the number of possible permutations of n elements is n!. From its very conception in the 1959 paper of Rabin and Scott, non-determinism was called for to write clear specifications -- and then to make them executable. That is what we shall do.
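
For reference, the same specification can be rendered in plain OCaml with the usual lists-as-non-determinism encoding (a sketch; this is not the article's sublanguage approach):

(* insert returns all the ways to place x somewhere in a list *)
let rec insert x = function
  | [] -> [ [x] ]
  | y :: ys -> (x :: y :: ys) :: List.map (fun zs -> y :: zs) (insert x ys)

(* perm folds insert over the source list, starting from the empty list *)
let perm xs =
  List.fold_right (fun x alts -> List.concat_map (insert x) alts) xs [ [] ]

let () = assert (List.length (perm [1; 2; 3]) = 6)  (* 3! = 6 *)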

welcome to the real world

Hear, hear! Such loads of categorial pretense, neither necessary nor sufficient for the job, emanating from lazy-land!

A correction, though: the mentioned structuring effect (ahem) has to do not with monads, but rather with modalities, specifically the lax modality, as detailed in my Practical Foundations for Programming Languages, wherein I also make the point that it was all present in Algol back in 1960! The much-touted formulation in the Haskell language is neither original nor even properly formulated (modality, not monad).

Will sanity at last prevail?

Is this surprising?

OCaml is not a pure language: its function arrows can contain side effects (like the whole ML family), so of course it doesn't need monads, as every function is implicitly impure. It's practically an imperative language :-)

This code is pure

As far as I can tell, the code in this article is all pure. There are no uses of refs or similar, although the accompanying code provides an alternative implementation of the same signature that uses delimited continuations. The article also makes the case that there are "effects" that can't reasonably satisfy the monad laws.

monad nonsense

Right.

Effects

The effects themselves represent impurity; without Haskell's 'purity' restriction this does not need monads and is trivial.

Implement it in Haskell without Monads, and you will have something interesting.

Did you actually read the article?

This comment implies that you did not. The implementation here could be trivially translated to Haskell and would still not be monadic.

Problems in Haskell

The problem in Haskell is that the DSL operations exported from module S cannot have any state unless they are in a monad, so the whole DSL becomes monadic.

Edit: so the meta-language can be functional because you are "macro" programming the DSL, which you can consider pure-functional composition in both Haskell and ML; but you can only have a functional DSL in ML, because all ML functions can hide impurity, whereas in Haskell the DSL would need to be monadic to enable it to be stateful.

GADTs

GADTs are the easiest way to model arbitrary DSLs in Haskell. And the DSLs can easily be stateful, that just depends on how you interpret the GADT.
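
For instance, here is a minimal sketch in OCaml, which also has GADTs (the comment speaks of Haskell, but the construction carries over); whether the DSL is stateful depends entirely on the interpreter:

(* a tiny typed expression DSL as a GADT *)
type _ expr =
  | Int  : int -> int expr
  | Bool : bool -> bool expr
  | Add  : int expr * int expr -> int expr
  | If   : bool expr * 'a expr * 'a expr -> 'a expr

(* one possible interpretation: a pure evaluator; a different
   interpreter over the same tree could thread state instead *)
let rec eval : type a. a expr -> a = function
  | Int n -> n
  | Bool b -> b
  | Add (x, y) -> eval x + eval y
  | If (c, t, e) -> if eval c then eval t else eval e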

Interpreting?

Are you suggesting that you use the GADT to represent a language that builds an abstract syntax representation of the computation? You still need an interpreter for the language, and that interpreter will be monadic.

We actually did this for SQL back when we were writing the HList paper: we used functional composition in the meta-language (in this case Haskell) to construct abstract representations of SQL queries; however, when you actually wanted the results back from the database you had to run the query, and that was monadic.

I suppose if we had pushed more of the program into the DSL, so it could represent the UI and other IO requirements of the program, we could have rolled the whole thing into a single abstract representation which we then interpret (or compile) in one go when the meta-program has finished running. Interestingly, this is the direction I am going in my own work: using a Prolog-like meta-language to assemble a "DSL" that is imperative. In my case the DSL is intended to be as close to the machine model as possible whilst still being portable, sort of like C--, and the logic language provides the generics, polymorphism, type-classes, etc.

In any case the monadic structure exists whether you call it a monad or not. For example, 'C' is monadic with unit="return" and bind=";". Monads are implicit in any imperative language, that is, one that has a sequence of commands/statements rather than evaluating everything as an expression. A monad is an algebraic structure, so like an 'additive semigroup' it describes properties that concrete entities have anyway. Addition of numbers behaves as a semigroup whether you like it or not. Likewise the monad is there whether you like it or not. If you don't choose to structure your code algebraically, that does not make the monad disappear; it's still there, just like it's there in 'C'.

Hrm

Monads aren't particularly important in Haskell, except for particular monads like IO. Also, the C language isn't monadic, for a number of reasons, and even if it were, bind would not be related to the semicolon, since effect sequencing exists within expressions.

'C' expressions are in the IO monad

Are you sure? Yes, what I have said about ';' being bind is an oversimplification, but I could take a 'C' program, give everything monadic types, and then type-check the 'C' program.

The sequencing of effects in expressions just shows us that even arithmetic operators like '+' in 'C' are in the IO monad.

(+) : Numeric a => (IO a, IO a) -> IO a

Would be the type for the 'C' add operator.

Edit: it is also interesting to see the different views, below gasche argues that ML needs Monads, and here you are saying that Monads are not that important to Haskell. My views are somewhat in the middle, I think you are undervaluing Monads in Haskell, and I think gasche is overvaluing Monads in ML. Like Goldilocks, the middle way seems just right.

Monads, mo' problems

There are a bazillion reasons why the semantics of a fragment of C code isn't just a value in the IO monad. But as a very hand-wavy idea ... sure, I get what you mean.

I don't think gasche argued that ML needs monads. His point was rather that an ML programmer might need monadic constructions (or effect handlers) for the same reasons that a Haskell programmer would. And I agree.

Haskell IO is a DSL.

The reason I think what Oleg has written is obvious is that this is precisely how every Haskell program is written. There is a DSL for IO that you macro-program in a functional meta-language. 'bind' itself is a macro that composes two DSL fragments into a single fragment. 'unit' lifts a DSL fragment into a meta-language fragment.

Oleg's DSL has a 'unit'; in this context 'unit' is like macro quoting. He does not define 'bind' because his DSL is a functional language and does not need it. It is a Monoid because '|||' behaves like 'plus'.

So a monad is part of the algebraic analysis of the DSL. It is either monadic or not, which is a property of the language, whether you choose to call the DSL functions 'unit' and 'bind' or something else. If the DSL had statements as well, then it automatically would have been a monad. A monad is an abstraction over many DSLs: when you have programmed a lot in this way you notice that every DSL needs to be quoted to be handled by a macro, so you can abstract 'unit' as a property every DSL has. You then notice that a lot of DSLs that are imperative have a command sequencing operation, basically "do this _then_ that", so you can abstract 'bind' as a generic operation you perform on all those DSLs. So being monadic is a property DSLs have by merit of being embedded in another language and having an effect sequencing operation.

So 'C' as a DSL would be a monad (that is, we can define a mapping from 'C' to the meta-language operations of unit and bind). Oleg mentions trying to understand what the minimal viable DSL would be; he does not have functions in it, for example. The lack of 'bind' is not because bind is 'bad', but simply because this DSL does not need it. This is part of the exploration of a 'minimal' DSL, not a suggestion that 'bind' is not needed for any DSL. If you want a DSL that involves sequencing commands, you already have bind.

So every Haskell program that does IO has already been doing this since Monads were introduced. The only difference is Haskell's IO DSL supports 'sequencing' so you can sequence fragments using bind.

This sounds different from what you or gasche have been saying, but maybe I just have not understood you. Hopefully you can see why I said it's obvious, as all Haskell IO is already done like this, with the addition that we want to sequence IO, so we can bind DSL fragments together.

I think this approach is very important, and most of my work since the HList paper has been based on this approach of a simple DSL with advanced features provided by the meta-language, except not hosted in another language like Haskell or ML, so the DSL is simply 'the language', and it can have its own purpose-designed type system.

[Note: this is a different point to that above about the interpreter for the DSL needing to be in a monad in Haskell if it has state, which it does not need to be in ML.]

I think you may be confused

I don't know what you mean when you say that Oleg's example isn't monadic because his DSL is functional. Generally, I'm having trouble following your posts in this thread except as fuzzy ideas. The usual approach for non-determinism would have him doing this:

module type NDet = sig

   type 'a ndet  (* non-deterministic value of type 'a *)

   ...
end

This would treat non-deterministic values of any type uniformly, and would be monadic. Instead of doing this, he goes out of his way to handle only two concrete types of non-deterministic values: integers and lists of integers. Because he only has to deal with these two concrete types, he can provide implementations that he couldn't use if he had to implement "the whole monad", and he gives code generation as an example.
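
Concretely, the first-order shape being described looks roughly like this (a sketch with approximate names, not Oleg's exact signature):

module type NDetFO = sig
  type int_t      (* non-deterministic int *)
  type ilist_t    (* non-deterministic list of ints *)
  val int     : int -> int_t                 (* embed a literal *)
  val fail    : unit -> int_t                (* no result *)
  val ( ||| ) : int_t -> int_t -> int_t      (* non-deterministic choice *)
  val run     : int_t -> int list            (* collect all results *)
end

With only int_t and ilist_t, and no single 'a t constructor, there is no type at which a uniform return/bind could even be stated.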

I find the construction to be kind of janky. It kind of reminds me of the Scala embedded DSLs that were posted here a while back, which I had similar concerns about.

Expressions only

What I mean is the DSL consists only of expressions, it has no statements, and hence has no need of bind.

Don't think that's the main point of his construction

Bind isn't just about statements. Look at the type of it. If it were just about statements, the continuation parameter would have type () -> ... And I don't think that's the point when Oleg emphasizes his construction not being monadic. It's rather that monads have to be defined at every type.

Bind is 'do this then that'

You do not need bind to compose expressions. For example, '+' in a monad has the type: m a -> m a -> m a. Here the LHS and RHS monads can be executed in parallel, as there is no dependency between them (and the meta-language is pure, so no side effects).

Looking at the type of bind: m a -> (a -> m b) -> m b, we can see the RHS depends on the result of the LHS, and so we are forced to run the LHS before the RHS. The type signature makes this sequencing explicit. If we contrast it with the signature for '+', the _only_ difference between '+' and 'bind' is this explicit sequencing in bind. So bind is all about sequencing one thing after another, the imperative "do this then that". If the DSL has no side effects then bind is not needed, and everything can be done with '+'-style operators. Looking at Oleg's goal of being minimal, it makes sense to use an example that does not require sequencing and is functional.
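
Restated as an OCaml signature (a sketch, not from the article), the contrast is just this:

module type M = sig
  type 'a m
  val add  : int m -> int m -> int m       (* operands independent: no forced order *)
  val bind : 'a m -> ('a -> 'b m) -> 'b m  (* continuation needs the LHS result first *)
end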

For example '+' in a monad

For example '+' in a monad has the type: m a -> m a -> m a

Where a = Int?

You're missing the point that Oleg could have made his example monadic without adding an explicit bind operator. The point isn't that he didn't need a bind operator. It's that for the example he chose it isn't possible to write one.

No Monads necessary

If it's not in a monad, then plus is just 'a -> a -> a', where 'a' is an abstract type. If there is no 'bind' then you don't need the monadic structure at all; you just need 'unit', which is provided by replacing each type like int with an abstract replacement 'int_t'. You have to 'quote' the DSL in the meta-language somehow; all this does is "un-abstract" unit.

You still need an

You still need an interpreter for the language, and that interpreter will be monadic.

Interpreters will naturally be monadic if the language being interpreted is essentially monadic (that is, associative structure and sequential interpretation). Most stateful DSLs are procedural/imperative and hence fit the monadic structure.

But if you start getting into process networks, term rewriting, stabilizing soft constraint systems, cellular automata and variants, generalized arrows, or highly constrained languages (like a hardware design language with space and power constrained fanout), then the monadic structure doesn't always fit.

Of course you can force it to fit. Monads are quite general. But sometimes they're general in the Turing tar-pit sense. If it's awkward, you don't need to use a monadic interpreter just because you have a stateful language. And monads are awkward for representing and optimizing commutative structure, for communicating substructure, and elsewhere.

I'm recently of the opinion that KPNs (Kahn process networks) would serve as a better 'standard' foundation for stateful, effectful interpreters and objects than monads. KPNs handle a fair bit of natural concurrency, state, and flexible composition, and can hook into multiple input sources and destinations. It isn't difficult to make them reactive by uniformly adding some time/tick messages.

Monads are implicit in any imperative language

But imperative languages are not the only form of stateful languages.

And even when the monad is implicit - i.e. because you have your additive semigroup - a monadic interpreter may be inefficient if your language has much more structure than a simple monad (e.g. with commutative compositions or communicating substructures).

It seems you came into this discussion mostly thinking about imperative DSLs?

General Purpose Language

I think I came to the discussion thinking about general purpose languages. For example you could use a stateful minimal DSL embedded in a functional meta-language, where the meta-language provides all the nice programming facilities (functions, polymorphism, etc). Oleg mentions this in the article, and talks about the origins of ML as a meta-language for a theorem prover.

Embedding a pure functional DSL in a pure functional language seems trivial to me, you just need the 'quote'/'unit' operation. Maybe I'm missing something?

I also suspect there is some difference in our understanding of effects? The whole point was to demonstrate an embedding of effects. To be an effect there must be some implicit state or context that is modified by the effectful operators. Consider logging: we modify the state of the log-file when we encounter "log('something')". Without the implicit context or state, 'log' is not causing an effect; it would just be a pure function.

Missing something

What you seem to be missing is the ability to doubt yourself. I have spent considerable effort trying to convince you that you are misunderstanding what effects are (I think it would have been respectful to at least give a try to the notion that you may be misunderstanding something), Brandon Bloom had the impression that you hadn't even read the article you were replying to, and Andreas Rossberg pointed out that (obviously) you had not read the implementation. At some point there is a limit to the number of different ways someone can tell you that you are mistaken while remaining within the boundaries of polite conversation.

What is an effect?

I did above suggest that I might be missing something, but I haven't seen any suggestions as to what that is. What is the point in making an ad-hominem attack? I did admit that I missed the stateless implementation, but I don't think it undermines my point. It's a bit like dismissing an argument because the spelling is wrong, when that does not affect the actual point being made.

Let's switch tack. What is the difference between a function and an effect?

Edit: from 'An Introduction to Algebraic Effects and Handlers' [Matija Pretnar]:

Algebraic effects are an approach to computational effects based on a premise that impure behaviour arises from a set of operations such as get & set for mutable store, read & print for interactive input & output, or raise for exceptions [16,18]. This naturally gives rise to handlers not only of exceptions, but of any other effect, yielding a novel concept that, amongst others, can capture stream redirection, backtracking, co-operative multi-threading, and delimited continuations [21,22,5].

Note 'impure' behaviour is there in the definition, so it is not just my personal definition of effects you are disputing, but what would appear to be a published consensus.

Oleg has a paper here:
https://icfp17.sigplan.org/event/hope-2017-papers-higher-order-programming-is-an-effect
this explains the approach, and states:

What is an effect then? It is an interaction with the context. An expression is effectful if it needs context to interpret it.

Note "interaction" with the context. Just reading the context is not enough, as we could then package the context with the function, so a closure (absent state, IO etc) is pure and not an effect. Oleg also talks about pure and impure code making the clear assumption that effectful = impure.

What we get though is the notion that if I take an impure imperative DSL with side effects and package it with the context and state (so for example an interpreter for that language), we can be left with a pure functional thing from the outside. For example, a program to calculate the GCD in an impure DSL, along with its interpreter, all packaged into a functional interface like 'gcd(x : int, y : int) : int', can be regarded as pure from the outside. Effects are what link the impure code to its context.
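
A tiny OCaml illustration of that packaging (a sketch): the body is imperative, using refs and a loop, but the mutation cannot leak, so from the outside gcd is a pure function:

(* Euclid's algorithm, written imperatively with local refs *)
let gcd x y =
  let a = ref (abs x) and b = ref (abs y) in
  while !b <> 0 do
    let t = !b in
    b := !a mod !b;
    a := t
  done;
  !a

let () = assert (gcd 12 18 = 6)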

Substituting a particular for a universal

I only glanced at this thread, but it seems you keep substituting a particular for a universal. Compare it with money: dollars can be used to pay for goods, and so can pounds; they're both money. It looks like you keep insisting that every form of money must be in dollars.

Monads simply aren't the only manner of dealing with DSLs or effects.

That's not what I am doing.

I have never said Monads were the only way of dealing with effects. I have said:

- Effects imply impurity (see quotes from Oleg above backing this up).

- In Haskell you have to use Monads for impurity because functions have to be pure (ignoring things like unsafePerformIO).

- That in ML you don't need Monads because functions and modules can encapsulate impurity. Of course you _can_ have Monads in ML, you just don't need them.

Where am I going wrong?

Edit: Thinking about it a bit more, I think you have it backwards. The imperative case is the universal, because the pure functional DSLs that could be implemented with a functional interpreter in Haskell are a subset of the imperative ones.

With an imperative interpreter I can implement any DSL including all the examples like non-determinism as well as IO languages.

The functional interpreter can only implement the pure subset of those, even allowing state threading it can't do IO in Haskell.

So my conclusion is I am discussing the universal, whereas everyone else is focusing on a particular subset.

I'll try!

- Effects imply impurity (see quotes from Oleg above backing this up).

The code is pure so people are likely wondering why you're discussing impurity. If anything, Oleg showed he doesn't need monads for an 'effectful' interpreter. But, hmm, maybe not convincingly so.

- In Haskell you have to use Monads for impurity because functions have to be pure (ignoring things like unsafePerformIO).

You don't. Monads are one manner of dealing with impurity. Granted, they are concise and natural to programmers but likely one can prove there is an uncountable number of equivalent manners.

[Edit: True that Haskell implements Monads but that's a design choice. It's a particular, not a universal. You don't need monads for dealing with the outside world.]

- That in ML you don't need Monads because functions and modules can encapsulate impurity. Of course, you _can_ have Monads in ML, you just don't need them.

True. You often don't need monads in ML where you would need them in Haskell. But I've used them just fine in a functional interpreter of ML. So what?

Why are we discussing impurity? What does a monad have to do with this all? Why are we discussing C?

Oleg just showed an example where he doesn't use monads for a DSL. Yeah, okay, whatever. These examples exist. An eval function for a small arithmetic language usually doesn't need monads either.

Well. I am not sure why we are discussing this. Ask gasche, he seems to know.

Precisely

Oleg just showed an example where he doesn't use monads for a DSL. Yeah, okay, whatever. These examples exist. An eval function for a small arithmetic language usually doesn't need monads either.

That's the point I made right back at the beginning which started the discussion.

In Haskell you have to use

In Haskell you have to use Monads for impurity because functions have to be pure

This is where you go wrong. Monads are also pure values in Haskell.

When you're writing in the IO monad, you don't need to concern yourself with how the Haskell compiler attaches the IO operations to the real world sensors and displays and databases and actuators. You just accept that it will be attached in a reasonable manner, and you're free to focus on your application logic. The same principle holds for representing effectful application logic using any other structure.

You could compile/interpret your structures to IO or just provide them as an export from a Haskell module for unspecified interpretation. You might compile the app to another intermediate language with another effects model, which again might not be monadic. In any case, what matters is your application logic and the application model for accepting inputs and providing observable output and evolving statefully.

The imperative case is the universal

It really isn't. The imperative language is just one more example of an interpreted structure. Even whole 'machines' can be understood this way (e.g. via virtual machines and black box testing).

You could reasonably argue that imperative is "closer to the physical machine", or that imperative allows adding effects to an application in a more pervasive and ad-hoc manner (at the cost of hindering typeful control over effects). But it isn't more universal. Every effects model is ultimately an interaction of structure and context, whether it's imperative system calls or attaching cells in a cellular automaton to a set of IO pins and registers.

Almost

I agree we can functionally compose DSL fragments in the meta-language. This to me is what Oleg was investigating in his example. This is how Haskell's IO already works, 'bind' composes fragments of an IO language, and then the whole IO program is interpreted on destruction.

However, pure is clearly a subset of impure: you can call a pure function from an impure one, but you cannot call an impure function from a pure function.

Edit: Further to this 'bind' is just an emergent property of a language in which there is a sequence of statements.

Sophistry

Cue the endless debate? Haskell can only be called pure due to categorical dishonesty! And what isn't pure about your microprocessor? It picks up voltages and transforms them to other voltages according to some function which can loosely be called mathematically pure.

This is such a tiring discussion.

Trying to establish some inclusion relation on purity based on superficial observation of how one badly designed language does things boils down to sophistry. There are no compelling arguments either way.

context and scope

I think your "pure is clearly a subset of impure" statement is obviously true but also no more significant than an observation that "functions in (a -> a') is clearly a subset of functions in (a,b) -> (a',b)". In both cases, the former can be embedded in the latter. It isn't a truth specific to the nature of effects. It's just a truth about scope control in general.

There are many forms of scope control, of course, and not all of them are typeful. Object capability model is one example. Modal logic is another.

IMO, 'purity' is rather informal and is mostly useful as a statement about the scope of language expressions. Impure languages are essentially those that don't make it easy to control and reason about scope. Impurity is about chaotic scope. Impurity isn't about stateful effects except insofar as state tends to complicate scope (by adding a temporal dimension to scope). A language that has no stateful side-effects but permits ad-hoc reflection on the call stack or the deep implementation of its arguments (e.g. such that we can distinguish `42` from `6 * 7`) could still be considered impure from all practical standpoints.

Not all effects are monads in Haskell

Keean Schupke:

In Haskell you have to use Monads for impurity because functions have to be pure

It is worth noting that not all effects are encapsulated monadically in Haskell either. Specifically, non-termination is not.

Algebraic Effects

Whilst non-termination is a 'side-effect', I am not sure it is an algebraic effect. What is the context the procedure is interacting with via an effect?

Co-hygiene

These days, I wouldn't think of non-termination as a side-effect, because I'm now thinking of side-effectfulness as violation of co-hygiene. Afaics non-termination is orthogonal to co-hygiene.

Re: Effects and State

Your earlier assertion to which I replied was this: "the DSL operations exported from module S cannot have any state unless they are in a monad".

I would first distinguish between "having" state (i.e. modeling a system that evolves statefully as a consequence of inputs, including time) vs. effectfully interacting with external state (a consequence of an external context observing outputs and providing inputs). There are a lot of useful systems, such as simulations or control loops, that may be stateful without specifying effects.

In any case, monadic structure is just one way to present observable outputs to an external context. KPNs are another - e.g. a 'log' output channel could be attached to a real-world console, and a 'keyboard' input channel could also be attached appropriately. It's even possible to insist that specific cells within a cellular automaton shall serve as the observable "outputs" or mutable "inputs" from the context of an external system. These are "effects models", neither of which is monadic.

You have accepted that monads can be "effectful" despite monads being pure values that essentially represent commands and continuations as data to an external context, like `Log("Hello", continuation)` or `Read(register, continuationReceivingValue)`. But I think you haven't yet generalized this to non-monadic structures. It works just as well for integrating KPN inputs and outputs with keyboard and console and GUI, or for integrating cells and time-steps within a cellular automata with real-world sensors and actuators. There isn't anything special about the monadic structure for the purpose of modeling and integrating effects. The only thing "special" about monadic structure regards its ability to conveniently model procedural languages.
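
As a minimal sketch of that commands-as-data view (an OCaml rendering with hypothetical names, in the spirit of a free monad): the value merely describes the interaction, and only an external interpreter attaches it to a real console:

type 'a io =
  | Done of 'a
  | Log  of string * (unit -> 'a io)   (* emit a line, then continue *)
  | Read of (string -> 'a io)          (* receive a line, then continue *)

(* one possible interpretation: attach Log/Read to stdout/stdin *)
let rec interp = function
  | Done v     -> v
  | Log (s, k) -> print_endline s; interp (k ())
  | Read k     -> interp (k (read_line ()))

let () = interp (Log ("Hello", fun () -> Done ()))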

Monadic Structure = Effects + Embedding.

There are a lot of useful systems, such as simulations or control loops, that may be stateful without specifying effects.

I would disagree: effects are happening. If it is stateful then there are effects locally. See Oleg's paper I referenced above. Simply storing data to memory is an effect. At a low level even functional languages have effects (the stack 'push' and 'pop' are effectful operations, and the stack is the context). However, as Oleg points out, what is locally effectful can be seen as pure from the outside, providing the effects do not 'leak' out. If we require procedures to be regular (see Stepanov's definition of a 'Regular Procedure') then we can view them as pure from the outside.

KPNs can have internal state too, and if each node is a process, the input and output channels would be effects to that process (taking the local view of each process). Globally the thing may be pure (in that feeding it the same input sequence results in the same output sequence), but each node is definitely not pure. As Oleg states, pure/impure is all about where you draw the line. We know this because CPUs are all impure, yet we can write a pure functional language to run on them.

So where does that leave Monads? If you have a language with effects, like 'put' and 'get', then you will have semantics that rely on sequencing; therefore you will have an operation that maps to 'bind', and therefore there will be a monad for that language even if you don't call it that, or don't use it. Whether something is monadic is not a choice; it is a property of the language that you observe, just as every rectangle has a width and a height. You cannot have a rectangle without width and height. Likewise you cannot have a DSL with effect sequencing without it being a monad, because embedding the DSL gives you 'unit' and the sequencing can be expressed as 'bind'.

So for each KPN where within each node we have a process that has input and output effects, we have a language with the property of sequencing. All we need to do is embed that KPN process language in a meta-language and we have a monadic structure.

The 'special' thing about monadic structure is that it is an emergent property of any effectful language when embedded in a (possibly pure) meta-language, and effects occur everywhere you need a context to interpret a program.

Edit: a further thought is that the effectful view is the more universal, because we can only have closed pure systems. As soon as the system refers to any external state (say a network) we involve effects. Also if the system had internal state we must be able to reset it, so we can replay inputs to get the same results. I think we can show there are more open systems than closed systems, and that every closed system is a special case of an open system with no external references. For example a KPN which references an external data store, or network. Now you could include that external store inside your model to make it pure again, but you would end up having to put the entire universe into the model.

Thinking about this, what kind of model would include a file-store inside the pure closure? As we require replayability we need to erase the store at the beginning of any call to this function. To me this clearly seems less general, but I admit there is a duality here.

You're still missing something

Likewise you cannot have a DSL with effect sequencing without it being a monad, because embedding the DSL gives you 'unit' and the sequencing can be expressed as 'bind'.

No. To be a monad, unit & bind have to have the correct types. Unit is supposed to have type 'forall a. a -> m a'. If you look at Oleg's DSL, it only supports two embedded types: int & list of int. Yes, you can add unit and bind, but they won't have the correct types, for this reason.

I'm gonna pass on discussing the meaning of "effect".

They Exist

Such functions with the correct types exist, whether you define them or not. My point is, if you _can_ define 'unit' and 'bind' for a language, it is monadic, whether you use those particular functions or not.

Can you?

Care to elaborate? What return do you have in mind for Oleg's example?

Quoting

'unit' is lifting the embedded language into the meta-language; hence it is "quoting". Unit is an abstract operation, that is, it generalises a specific operation. In Oleg's example he 'specifically' quotes 'int' as 'int_t' and 'list' as 'list_t'. Unit would be a generalisation of this operation, for example a macro that replaced '*' with '*_t'. We can probably represent this in ML with a type constructor, say 'a t (ML type application is postfix), so we would represent 'int_t' as 'int t' and 'list_t' as 'int list t'. Now 't' looks more like a monadic structure, as we abstract the embedding of the DSL.
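
As an OCaml sketch of that generalisation (the comment's suggestion, not Oleg's code):

module type Quoted = sig
  type 'a t               (* 'int_t' becomes 'int t'; 'list_t' becomes 'int list t' *)
  val unit : 'a -> 'a t   (* quote a meta-language value into the DSL *)
end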

If you change it to use a type

If you change it to use a type constructor that embeds all types, you've solved the problem. Sometimes you can't do that. When you can't, you don't have a monad.

Sometimes you can't do that?

Seems a bit vague. Precisely when can't you do that? In Oleg's example you could clearly replace the quoting of 'int' and 'list' with an abstract operation. I would suggest you can always do that.

Edit: Besides, the abstract operation still exists even if the meta-language cannot express it. For the structure not to exist, you would require a DSL that _cannot_ be embedded in _any_ meta-language.

Effects

I think your claim "If it is stateful then there are effects locally" is untrue. Cellular automata, synchronous reactive systems, and temporal logic provide fine examples of systems that are stateful without any "local" effects or manipulation of state.
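
For instance, one step of an elementary one-dimensional cellular automaton is just a pure function on the whole configuration - there is no get or set anywhere in the model (a sketch; rule numbering follows Wolfram's scheme):

(* cells is an array of 0/1; neighbours wrap around *)
let step rule cells =
  let n = Array.length cells in
  Array.init n (fun i ->
    let at j = cells.((j + n) mod n) in
    let idx = (4 * at (i - 1)) + (2 * at i) + at (i + 1) in
    (rule lsr idx) land 1)

let _ = step 110 [| 0; 0; 0; 1; 0; 1; 1; 0 |]   (* rule 110, one step *)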

KPNs are interesting because they can be understood as pure values. Although it is certainly most efficient to implement them statefully with an object per process, fundamentally a KPN "in progress" isn't any more stateful than a lambda application expression that we might evaluate by rewriting.

I think you should not simply hand-wave KPNs into monads. There is no single standard way to "bind" two KPNs. Every bind statement would need to include a specification of which output channels to attach to which input channels, which is already beyond the monadic structure. The type is also very complicated, and is not a general value type.

"As soon as the system refers to any external state (say a network) we involve effects" - I agree, but a system doesn't need to refer to external state to be useful as an effectful application model. A KPN could be wired into to a network and database by a third party (based on labeling of channels) without containing any references to a network. This is the scenario I've been assuming for KPNs as alternative to monads.

Capturing a KPN

One thing you get with strict sequencing of operations is the ability to capture continuations. Have you thought about doing something similar for KPNs?

KPNs are just values. The

KPNs are just values. The interesting point is that, due to their monotonic nature, we can easily inject and extract messages in parallel with ongoing evaluation, and we may freely commute injections and extractions on separate ports/channels, which is useful for wiring up to real-world systems or other KPNs. But we can certainly capture any KPN as a plain old value after any given set of updates to it. KPNs as first-class values, evaluating in parallel in the background, make for an interesting OO object model.

From an implementation standpoint, it's easier in a language that makes value copies explicit, so we can efficiently mix long sequences of linear updates with infrequent copies. (This is convenient for both KPNs and in-place array manipulations.)

KPNs can have state.

There is no restriction I can see on the node processes having state in a KPN. Each node could be a CPU with local memory. It is only required that the whole system produce the same output given the same input sequence(s). To say they are just a value seems an oversimplification (or an extreme case of the 'outside' viewpoint).

KPN descriptions

The KPN value is essentially just the document of circles and arrows, a data structure describing the KPN. I sometimes think a better term for it would be "network description", and I originally was careful to use that phrasing. But the literature surrounding KPNs tends to conflate the two (always pointing to a description and calling it a KPN) so I've slowly gravitated towards using the term as I've seen it used. Apologies for the miscommunication.

Evaluating a KPN description (e.g. with some pending input messages) results in an updated KPN description (e.g. with some pending output messages). This evaluation may proceed by constructing some imperative or even physical representation of the network, then running it. If evaluation terminates, we'll have a normal form. We can still monotonically observe and interact with KPNs where evaluation doesn't terminate, albeit at risk of divergence if we try to observe an output message that is never produced.

Anyhow, it makes sense to think about KPN 'nodes' as stateful, even in the context of an immutable network description. But it's important not to confuse layers. The "stateful" here can ultimately be understood in the same sense that a pure lambda-based process model can be understood as "stateful":

newtype Process i o = Process (i -> (o, Process i o))

State is an incremental aggregation of information over time, so such a process is stateful if we supply inputs over time.

Ultimately, KPNs could be understood as a `Process` with a rather more complicated type. (I haven't found a good way to model KPN types in conventional type systems.)

FSM

Even with a Finite State Machine model, with pure transition functions, the state is still there. I actually like this FSM kind of model, which is similar to some formulations of the actor model, where you have something like an object with private local state, and pure transition 'methods'.

You can capture a KPN as a

You can capture a KPN as a value, but I asked specifically about continuations. When events can arise in parallel it's not clear to me how you'd capture a continuation on an event.

Current continuations

Well, there is no equivalent to capturing a "current continuation" for KPNs, if that's what you're asking. But if you want to model delimited continuations, you would do so much as you might model delimited continuations in normal functional programming. You simply need first-class KPN values representing continuations.

Such continuations would be delimited both spatially (only having so many input and output ports) and temporally (being values produced, composed, and consumed at specific times by the parent computation). But they'd preserve a lot of nice properties of KPNs, e.g. we could compose our continuations, push messages into certain input ports, freely compute and distribute our continuation in parallel with computing the outer computation. We could usefully compile programs in a continuation-passing style. Etc.

I see, so you wouldn't

I see, so you wouldn't attempt to capture the continuation of a general KPN event, but would use a special continuation passing style when that's desired.

Affirmative.

But I'd recommend doing the same for conventional functional or imperative continuations, too. It is `call/cc` or `yield` that is "special". :D

Oleg's HOPE 2017 Paper

Did you read Oleg's HOPE 2017 paper I linked to? Do you agree with him? I do, but perhaps he can explain it better than me.

partial agreement

I agree with his overall argument that effects should be understood in terms of interaction between an expression and its context.

That said, for "effectful" to be useful as an adjective describing expressions, it must make useful distinctions. For example, I would not consider `6 * 7` to be "effectful" just because it requires a "context", such as the programming language and libraries, to provide the meanings for `6` and `7` and `*` and even the whitespace.

Also, as someone who favors concatenative combinators over lambda calculus as a foundation for purely functional programming, I find Oleg's argument generalizing to higher-order programming very weak. I do higher-order programming all the time without variables, without any 'context' beyond the language definition. My Awelon language has the following four-combinator basis:

[B][A]a == A[B]   (apply)
[B][A]b == [[B]A] (bind)
   [A]c == [A][A] (copy)
   [A]d ==        (drop)

Evaluation is by local syntactic rewriting, which is confluent. This behaves a lot like a stack language, but we don't require evaluation to result in a stack of values - i.e. we evaluate instead from program to program. Closures and higher-order programming are modeled using explicit bind rather than capturing a lexical environment. There is no need for variables, environments, alpha renaming, etc.

It's Turing complete, and translation from lambda calculus is trivial enough that I can support lambdas as a syntactic sugar. Given `X` as a variable that is known to be a singular value like `[A]`, we can extract it from subprogram E such that `E == X T(X,E)` and `T(X,E)` does not contain `X` as follows:

T(X, X)   =>
T(X, E)   => d E                  (X does not occur in E)
T(X, [F]) => [T(X,F)] b
T(X, F G) => c [T(X,F)] a T(X,G)

We can optimize this to avoid unnecessary copies and bindings. But essentially we get lambdas by simply rewriting `λX.E` to `T(X,E)`.
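
For a tiny worked instance of these rules, extracting X from `X X`:

T(X, X X) => c [T(X,X)] a T(X,X) => c [] a

check:  X c [] a  ==  X X [] a  ==  X X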

In any case, Oleg is generalizing from his own experiences, which seem to be oriented around monadic effects models in lambda-based functional languages. His justification for generalization to higher-order programming was based dubiously upon typical implementations of lambda calculus, rather than something more essential to "higher order programming". Based on my own experiences, I feel he's generalizing too far. The idea that "effects" are an interaction between expression and context is still a useful one. But we need to discriminate what is a relevant context and interaction.

Also, AFAICT, the paper isn't even relevant to your position that "If it is stateful then there are effects locally". Maybe we're using different definitions for 'stateful'. To me, a stateful system is simply any system that accumulates information over time (i.e. "history matters"). Use of effects like get/set to manage state is merely an implementation detail. Even the concept of "expressions" is unnecessary for modeling stateful systems, as demonstrated by cellular automata.

Stateful

Even keeping a history requires memory, with 'get' and 'put' operations. All you do is bunch all the 'gets' at the beginning of the program, and have all the 'puts' at the end.

A cellular automaton still has state; it's implicit in the model that every cell has state storage. The evolution of the automaton is defined by a state transition function, in the same way as a finite state machine, in terms of defining the state at time 't+1' from the state at time 't'. We still need to 'get' the state before we evaluate the transition function, and 'put' it back afterwards.

"get" and "set"

What's interesting about the imperative state model is that it models a separation between an agent (who performs get and put or set) and state (that which is get or set). This separation is especially interesting in context of concurrency or consistency concerns.

Neither FSM nor cellular automata exhibit this separation. FSM and cellular automata state models are fundamentally different from imperative state. There is no get and set by an external agent. Instead, there are baked in transition models.

Trying to understand a model in terms of a typical implementation is a good start. But it's a category error to conflate the two, no matter how typical the implementation.

Insisting that FSM or cellular automata is stateful simply because you can implement them using get and set is outright wrong. They are stateful for more fundamental reasons. And there are also plenty of stateless models that can be implemented with get and set.

FSM

Insisting that FSM or cellular automata is stateful simply because you can implement them using get and set is outright wrong.

I am saying you cannot implement them without state. Having the state implicit in the model is not the same as having no state. I am saying 'get' and 'set' effects are not just some incidental thing, but something fundamental to the nature of finite state machines and cellular automata. They literally could not exist without them.

category error

Perhaps you are imagining an imperative machine as the foundation for computation. I vaguely recall you've mentioned something along those lines before, that you take some abstraction on Von Neumann as the basis.

I could argue instead that you should understand imperative get/set in terms of cellular automata. Fundamentally, our real world's physical state is updated by apparently local rules of physics, just like a cellular automaton is updated by local rules, and imperative get/set is ultimately implemented physically, by pumping electricity into dumb but carefully shaped rocks. Every 'get' and 'set' is performed by bazillions of electrons moving around, none of which know anything at all about get or set or even the laws that guide them. Even the physical CPU or memory or bus can't really be said to get or set anything; they just shape electricity according to physical laws. Cellular automata are in some sense closer to the physical machine than imperative code. We can learn about imperative state, get and set, and computer architecture by creating and observing a big simulation of a PDP11 in Conway's game of life.

This would also be a wrong argument, for the same reason yours is wrong. It's a category error to attribute properties of an implementation to the model being implemented. Assuming a particular foundation for computation is both unnecessary and a logic error.

An FSM is ultimately stateful simply by definition of stateful. Its state is a function of its history. It doesn't matter how I implement the FSM - with imperative cells in memory, with a digital logic circuit, with a Babbage engine, with an FPGA, with a static object per state whose transition method returns the next state-object, with matrix multiplication, etc.. Intriguingly, even the layer of physical or mathematical abstraction at which I represent and implement the FSM doesn't matter. It is sufficient to know it can be implemented, and perhaps some costs/benefits (like, matrix multiplication easily generalizes to non-deterministic or probabilistic finite state machines, but can be quite expensive).

Anyhow, you are going to stubbornly insist on your position even after having your category error pointed out. I won't waste more of my time or energy trying to convince you. Perhaps someday you'll escape that mental trap of framing everything in imperative terms and convince yourself. Perhaps not.

Category Errors?

If I write a state transition function 'S(t) -> S(t + 1)' I need to read the state from somewhere at time 't' and I need to write it back somewhere at 't + 1'. State requires memory, and memory has two fundamental operations, 'store' and 'retrieve'... even human memory has these two operations. I have never seen any memory that does not have these two operations. We don't even need to think about the implementation, the state is right there in the mathematical definition. Even if we implement the FSM with a pencil, we would record the state by 'putting' it on paper, and we would 'get' the state by reading it back.

What we can describe without the state is the definition of the state-transition functions; in other words, not the FSM itself but a 'recipe' for constructing a particular FSM. I think you are making a category error by confusing the recipe for a particular FSM with the actual FSM.

Yes. Category Errors.

You're still making them.

Nothing about the FSM model says you need to read or write anything. You're just inserting your in-the-head computation engine and on-paper memory as an alternative implementation of the Von Neumann architecture at this point.

When I originally came out of imperative programming and started on functional programming, I also focused on "the actual thing" (like 'actual actors model actors' vs. those simulated within a function) as though an imperative or OO simulation for a model was somehow more 'real' than the functional one. With benefit of hindsight and perhaps a little work involving virtual machines, I know that I was wrong. I was trapped by an imperative way of thinking and framing ideas. But I haven't a clue how to grant you a similar aha moment.

Time

Time and events (effects with order) are fundamental and many processes simply cannot be modelled completely without them.

Returning to the FSM model: it does say you have to read and write the state; otherwise it is just a 'power' expression, without storing the state between each step. For example, if we represent the state transition as a function 'f', then without state we have 'f . f . f', which would be f^n, where 'n' can only be infinity, as we have no way of starting and stopping the FSM in the functional model (start and stop are time-based effects). So we end up with an FSM that happens all at once and cannot interact with any other system by input and output effects.

Without effects an FSM is useless.

Folds.

You can fold an FSM over a sequence of inputs. You can preserve just that sequence rather than its current state. You don't need anything beyond pure functions for a Moore or Mealy machine. You could say "without a context to provide input and process the output, even pure functions are useless." And that is true, but it's still useful to distinguish them from effectful expressions, and it is not difficult to find uses for FSMs within the 'context' of pure computations.
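
A minimal sketch of that fold in OCaml (names hypothetical): the Moore-style machine is a pure step function, and 'running' it is just List.fold_left:

type ('s, 'i) fsm = { init : 's; step : 's -> 'i -> 's }

let run m inputs = List.fold_left m.step m.init inputs

(* example: does the input contain an even number of 1s? *)
let even_ones =
  { init = true; step = (fun even bit -> if bit = 1 then not even else even) }

let () = assert (run even_ones [1; 0; 1; 1] = false)  (* three 1s: odd *)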

Time is indeed relevant to being stateful. A stateful system accumulates information over time. History matters. But there aren't any specific constraints on how state, or time for that matter, is modeled and represented.

We do need some unspecified context to provide time and input and observe state or outputs, of course. But you err when you assume a specific context - such as imperative state, where an agent explicitly reads and writes states to external memory - then wrongly attribute to the FSM (or whatever you are modeling) the properties belonging to said context. It's your assumed implementation that reads or writes state, not the FSM.

Even if it were the case that imperative model is the ultimate, fundamental context for all state and time - i.e. such that our real world is implemented by imperative reads and writes in God's computer - your argument would still be formally wrong.

Fatalism

Your argument requires we buy into a fatalistic philosophy: that the sequence of inputs is pre-ordained, and therefore can be known ahead of time and fed into the program at the beginning of time. There is no free choice in your worldview; there is no opportunity to start a program where the outcome of some future event is not yet known. If a program is interactive and has an event loop, the state space is infinite, so you cannot model all possible paths in advance. This means even the simplest tool, like a desk calculator, is intractable as a functional model. I am reminded of Einstein, who said "As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality."

I think the "unspecified" context is the kind of hand-waveyness that causes problems in mathematics. A FSM consists of two components, a stateful storage, and a combinatorial time-gated feedback function. That is the definition, and if you read Kohonen's definative work on this it is all modelled in terms of memory, including neural networks, self organising maps, cellular automata etc.

Not at all

My argument only says I can model and use FSM in many contexts. I can model an FSM within a pure computation. I can model an FSM within cellular automata or neural networks or temporal logic. I can model FSM within imperative memory and languages. I can model it with digital circuit feedback loops that represent state structurally without using separate get and set operations. I can model it with matrix multiplications by encoding states and inputs and edges as vectors or matrices. There is no need to insist on a "fundamental" state model to understand or utilize the FSM model.

I haven't made any assertion that requires fatalism. That's just your straw man or misunderstanding.

You can model anything in mathematics in terms of any Turing powerful substrate, and you can model many things using less powerful substrates. The fact that you can model those many things in terms of memory is not surprising. Regarding how you feel about the "hand waviness" of unspecified context, your argument that FSMs are implemented with reads and writes seems quite analogous to claiming circles are black simply because you can't imagine any other way to draw them and you think it "hand wavy" that math didn't specify a color.

Incomplete Specification

What you are doing is not providing a complete specification of the FSM; you are implying the existence of some unspecified context that lets you interpret the combinatorial part of the specification as an FSM. However, not specifying it does not make it not exist. It is still there whether you write that part down or not; you are just assuming the mathematician reading it has the requisite knowledge to fill in the gaps. Ignoring a problem does not make it go away.

Further, you cannot actually use a pure functional model of an FSM, because you need to provide events to trigger the state transition functions. As these events occur in real time, you cannot pass a list of all events in advance.

You could use a stream, but you would need the concept of suspending the computation when there is no value ready in the stream, and that's not functional anyway; it's cofunctional.

Now maybe we could agree you can model an FSM cofunctionally?

re: incomplete

The FSM model does not specify where the inputs come from. It certainly doesn't specify "real time events". For example, we can use FSM to model regular languages, to match regular expressions. In that case, the inputs to the FSM might come from a string, or might come from stdin. The source of input is simply outside the scope of the FSM. As is where the FSM output is used, or how the FSM is implemented.

If you want to call it 'incomplete', then go ahead.

But every model of computation is incomplete for this reason. Even imperative code can be run inside a virtual machine on a simulated network of other such machines. That entire virtual network could be represented and evaluated within a pure expression whose final result might be `42`. Imperative code cannot know whether all inputs are provided or confined within an expression. Of course, a pure context doesn't make those imperative language subprograms any easier to reason about or refactor.

It's wrong to say "you cannot pass a list of all events in advance". It's also wrong to say "you must pass a list of all events in advance" or "you must provide inputs in real-time". Whether it's FSM or imperative or even pure functions, the origin of inputs and use of output is outside the scope of your model.

You earlier quoted Oleg defining effects in terms of an interaction between expression and context. But I think you might have missed an important aspect of this: the interaction doesn't imply a global or real-time context. A purely functional context might observe an FSM state and decide whether to provide another character input or compute something else. This is still a useful interaction between model and context, and hence an 'effect'.

Objects and Order

If I take an FSM and feed it a string, then give you the FSM and say 'now model this functionally', but I will not tell you what string I have already put in, how will you model it functionally?

Starting state

An FSM does have a starting state. So if you've already "put in" a string, that will just move the starting state for the FSM representation you give me.

When can you pause it?

So if I can pause the FSM and give it to you after any sequence of state transitions, then we must 'store' the current state into the 'starting state' at each of these points in order to transfer it. If we don't know when I am going to give you the FSM, the only solution is to save the current state to the starting state after every transition. We have just proved that we need a 'put' effect after every transition.

Sigh.

You're just saying: "But my FSM is embedded in some concurrent imperative/object context such that I cannot blah blah blah without pause/get/put/etc. Therefore, FSMs in general have these features." You're once again wrongly attributing properties of your envisioned context and implementation to the finite state machine. Same category error as before.

Further, the implementation you're envisioning wasn't even implied by the question you asked. For example, the "I will not tell you what string I have already put in" restriction clearly does not restrict how you manage inputs or represent state on your end, and does not imply the sort of concurrency or state aliasing of the FSM implementation that requires explicitly pausing input or writing intermediate states.

Fold vs FSM

And yet a fold is not a finite state machine. Why bother having two different concepts if they are the same? Certainly your television and most other electronics would not work without finite state machines.

The point is, all implementations have state, so when you abstract the algebraic properties of an FSM, the state is not abstracted away.

All you need is one implementation that does not have any state to prove me wrong. Should be easy right?

Wat?

I do not conflate folds with finite state machines. They aren't at all the same concepts. I've only said they can be used together.

An FSM is stateful in the sense that it records information accumulated over multiple interactions. But its implementation does not need mutable state managed by reads and writes. For example, we can represent an FSM in the pure lambda calculus and use it for regex matching. You might argue that lambda calculus is likely to be implemented statefully. But that use of state in the implementation is abstracted away by the semantics, not observable, which contradicts your claim that "the state is not abstracted away".

Also, it doesn't take any implementations to prove you wrong. Pointing to your category error is sufficient. Convincing you of your error is a more difficult task.

Abstraction and Values

How do you implement an FSM without mutable state and reads and writes, and I mean a real implementation, something I could have on my desk?

Also I don't think lambda calculus abstracts state at all. An abstract operation is one which generalises a category of concrete operations. There are many kinds of memory that can be used to implement state, and they all have differing concrete protocols to read and write data. 'put' and 'get' effects are an abstraction of these concrete operations.

Where is the state abstraction in lambda calculus? A lambda term beta-reduces to a normal form (when one exists), and that normal form is a value, not a state. Values, like the number '3', have no state; they are just values.

Back at you.

You argue FSMs are stateful on the basis that their physical implementations ultimately must be. Unless you have a double standard, the same argument should apply to lambda calculus. How can you implement lambda calculus, a real implementation that I can have on my desk, without some form of mutable state? How can you physically rewrite an expression to normal form? Information and values and data structures are ultimately represented by waves, entanglements, and vibrating particles in physical space-time, are they not? How can even a value be stateless when it has a stateful physical implementation?

Your argument isn't a valid one. Category error invalidates the whole thing. Further, even ignoring the category error, you cannot differentiate between stateful and stateless computation models just by looking at their physical implementations. Computation is fundamentally mechanical, after all, and whether two physical structures have the same observable behavior ultimately depends on which observations are made, potential interference, and other non-local emergent properties of a system. This depends on higher level semantic constraints that aren't visible in the physical implementation. All computations must be considered equally stateful if we judge by observing their mechanical implementations.

Anyhow, I'll review a few points.

First and most importantly, the properties of implementations don't belong to the model. The entire foundation for the argument you're trying to make by considering implementations is invalidated by category error.

Second, I have mentioned we can easily implement an FSM in hardware with a digital logic circuit, such that it is gated and performed by physical laws without an external agent or CPU doing reads or writes of a separate store. You can put a digital logic circuit on your desk. A lot of hardware design code like VHDL can be implemented this way. Arguably, your choice of externally mutable state with reads and writes is too high-level and non-local a state abstraction if your interest is physical implementations. I have the impression that you're willing to get vague and hand-wavy to squeeze all forms of physical state into "reads and writes" to fit your imperative world view, but I think doing so is unwise and will blind you to a vast and interesting world of state models. In any case, physical implementations are irrelevant to whether a computation model is stateful, because of the first point.

Third, we can implement an FSM within lambda calculus, e.g. using it for regular-expression matching. We can similarly use State or ST monads, or even implement networks of virtual machines running imperative code. The existence of such implementations doesn't make FSMs or imperative code pure or stateless, because it would also be a category error to attribute the purity of a functional implementation to the model being implemented. But it nonetheless contradicts your repeated arguments about implementations of stateful models requiring mutable state all the way down, and it demonstrates that non-physical implementations can be useful.

You are simply wrong at multiple levels.

I doubt you'll be convinced. Opening your mind to doubt for your own hypothesis and world view is up to you and your honesty as an intellectual, not up to me. If you want to convince me, you'll first need to avoid that glaring category error. You'll also need to resist your confirmation-bias double standard where you argue in terms of physical implementations only for stateful models.

pure = impure + context

Registers still have read and write operations, so the VHDL argument does not work.

I agree by the way that lambda calculus cannot be implemented without state.

If you look at Oleg's argument, it says that fundamentally everything is an effectful computation + context, but that if we can 'box' those two together, so that we cannot see the effects between the computation and the context, we can call it pure.

So a random-access memory machine implementing lambda calculus is pure only if we cannot see any interaction between the program implementing the lambda calculus and its context.

So given all the input in advance, say a string, and a stateful FSM, you can put both into a black box and call it pure: you feed the input into the FSM as a single step and it "reduces" to its output, after which it must be disposed of and not used again. Each input string must be fed to a fresh FSM. An FSM is a tool you use for managing real-time events, and without the events there are simpler models you could use, rather than packaging an FSM with its initial state and feeding its input all in one go. A fold, for example, seems more useful.

So to fully state my position: I still think an FSM is by definition impure, but if you package the FSM along with an "initial state" you can have a pure function that accepts an input argument and returns a result.

This is exactly what Oleg is stating in his paper I referenced: pure = impure + context.
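A tiny OCaml illustration of that boxing (my own example, not from the paper): the mutation below is confined to the function body, so no effect is visible between the computation and its context, and the function is pure from the outside.

(* Internal mutable state, invisible outside the box. *)
let sum (xs : int list) : int =
  let acc = ref 0 in                        (* local state; never escapes *)
  List.iter (fun x -> acc := !acc + x) xs;  (* impure steps inside the box *)
  !acc                                      (* observably pure: same input, same output *)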

FSM doesn't imply a particular implementation

If you look at the definition of FSM, you won't see anything about how it's implemented.

Pastafarianism?

I know Keean seems to have some imperative-fundamentalist faith regarding state and computation. But linking to the flying spaghetti monster in a discussion about finite state machines seems a bit...

[removed]

Removed sarcastic and dumb comment. Ignore me.

finite STATE machine

It is even in the name :-)

Can you define the 'finite' property of an FSM without referring to states? Can you find published literature that defines a finite state machine that is stateless?

Nitpicking but agreed

I triggered on this:

Fundamentally, our real world's physical state is updated by apparently local rules of physics just like a cellular automata is updated by local rules

So, the nitpicking: Quantum mechanics, Bell's theorem, and nonlocal phenomena. Although the world seems causal and mechanical at the superficial level at which we observe it in our daily lives, we are still pretty clueless about what the physics is at a fundamental level.

For the rest, I agree with your argument.

So, the nitpicking: Quantum

So, the nitpicking: Quantum mechanics, Bell's theorem, and nonlocal phenomena.

The Cellular Automaton Interpretation of Quantum Mechanics by Gerard 't Hooft.

Is that supposed to be evidence?

I hope you don't want to claim that 't Hooft solved the problem and that the generally accepted view in physics is now that it's all cellular automata deep down?

Just pointing out that QM

Just pointing out that QM and apparent non-locality don't preclude an interpretation based on deterministic local physics. Unless your nitpick was meant to imply something else?

Doesn't look to be that easy.

Well, my nitpick was more about the second point: we simply don't know, though it seems non-local. 't Hooft makes a superdeterministic argument based on cellular automata. He can do that, but we still don't know.

big data in closures

what kind of model would include a file-store inside the pure closure?

To me, "file store" brings to mind all sorts of ad-hoc issues like timestamps, permissions, memory mapping, lockfiles, and shared mutable state. But we could consider a simpler model: a persistent trie.

I believe purely functional programming, as a general purpose paradigm, would benefit greatly from more effective support for larger-than-memory data structures, i.e. such that we can easily work with first-class trie values measuring many gigabytes. We don't even need to start with many data structures: intmap and finger trees cover about 99% of use cases (since we can easily build tries and hashmaps above intmap).

This doesn't eliminate the need for 'external state' such as networks and user input/output.

But it does cover a lot of cases where we are currently forced to use 'effects' to access external databases and filesystems for scalability or durability reasons, simplifying our application models, testing, replayability, and so on. Whole databases and filesystems could essentially be captured as first class values. We can ideally 'save' the whole value just by updating a small root reference.

One feasible approach is to make it easy to refer to data structures by the secure hash of a binary representation. I've built this idea into the language I'm developing now, and I'm implementing the storage and GC layer and smart references in F# at the moment, carrying over several performance lessons from an earlier version of the idea from a few years ago: VCache in Haskell.
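As a rough illustration of hash-based references (an invented sketch, not the actual API of my language or of VCache; it uses the stdlib Digest module, i.e. MD5, where a real system would use a secure hash such as SHA-256):

(* Refer to a serialized value by the hash of its bytes. A plain
   Hashtbl stands in for the real storage/GC layer. *)
type ref_by_hash = { hash : string }

let put store (bytes : string) : ref_by_hash =
  let h = Digest.to_hex (Digest.string bytes) in
  Hashtbl.replace store h bytes;        (* equal values share one entry *)
  { hash = h }

let get store (r : ref_by_hash) : string =
  Hashtbl.find store r.hash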

Interesting.

It seems an interesting approach. How would you write interactive software, like an airline online seat-booking service, for example?

Interactive

Obviously you must receive your user inputs from outside of the process, e.g. from HTTP POST requests.

An application could be modeled as an `(Input * State) -> (Output * State)` handler function for POST requests and an `(Input * State) -> Output` function for GET requests, where Input may include the URL, cookies, headers, and any form data -- or a parsed type, abstracting away HTTP in command-pattern style.

This certainly isn't the only option, of course.
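For concreteness, here is a minimal OCaml sketch of that handler shape applied to seat booking; all types and field names are invented for illustration.

(* The (Input * State) -> (Output * State) shape for POST,
   and (Input * State) -> Output for GET. *)
type input = { path : string; form : (string * string) list }
type output = string
type state = (string * string) list   (* seat -> passenger *)

let handle_post (i : input) (s : state) : output * state =
  match List.assoc_opt "seat" i.form, List.assoc_opt "name" i.form with
  | Some seat, Some name when not (List.mem_assoc seat s) ->
      ("booked " ^ seat, (seat, name) :: s)
  | _ -> ("unavailable", s)

let handle_get (i : input) (s : state) : output =
  if List.mem_assoc i.path s then "taken" else "free"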

For my F# implementation, I have an API where developers register durable abstract variables called TVars by providing a key and codec, and may perform transactions that read and potentially update many TVars in STM-style. Ephemeral TVars are also available. It provides snapshot isolation and multi-version concurrency control. So a "handler" could generally be transactional, in this case. We could model command pattern logs and cached state, transactionally.

The important bit is that most of our application logic can be moved into pure computations observing or updating local state models instead of explicit management of external state. Pure application logic with first-class state values is easy to grok, debug, test, replay, replicate/mirror, batch process, correctly cache, etc..

State Transition Only

It seems to me, in a general model there is both state and pure function. The Finite State Machine model seems appropriate. You can make your state transition functions pure, but you still need some state. Functional languages 'cheat' by making that state global and saying it's outside the language.

Or to put it another way: how do you write the web server in your language? By talking about the command pattern you make the state transitions pure, but you have just pushed the state up into the web server.

As nearly every piece of software is interactive, and there is very little batch processing still done, don't you think that interaction should be the focus of software development?

Not cheating

FP programmers aren't impractical. We want to simplify our application logic and refactoring and testing etc. by making it pure where feasible. But feasibility is constrained by practical requirements and knowledge. We'll use an IO monad or alternative to create a web service or integrate GTK windows or access an external database as needed. Ideally, we'll keep it mostly in a separate layer, limiting entanglement with application logic.

Effective support for larger-than-memory, durable, immutable data structures covers scalability and durability requirements. This enables a tight focus of our application model on essential interactions such as user input and queries. That is, we can avoid interaction with a database or filesystem when it would primarily serve the non-functional requirements of scalability and durability. Rather than a "cheat", I call that "progress".

As for batch processing: although it is true that software is highly interactive, batching remains useful for master-slave mirrors, for rolling back a bad input by quickly recomputing state from a set of inputs, and for quick startup from a log of inputs plus an old state snapshot. In imperative systems, we might use append-only logs and the command pattern to achieve something similar. Software architecture design isn't a zero-sum game; better support for batch processing in either case doesn't hurt support for interactive update. For some interactions - like "universal undo" - it might even help.

interaction

don't you think that interaction should be the focus of software development?

FP does a good job of making interaction a major focus of software development simply by making it explicit in the type system and application models. Since we cannot hide interactions, we must give them much of our attention.

You should be asking this instead to imperative programmers, who seem to treat interaction with network or database as just a negligible background thing to be hidden from other programmers, rather than the focus of their development.

FWIW

I believe purely functional programming, as a general purpose paradigm, would benefit greatly from more effective support for larger-than-memory data structures, i.e. such that we can easily work with first-class trie values measuring many gigabytes.

At Morgan Stanley, we have a PL that supports this and it essentially works by adding a type for within-file references that works along with algebraic data types (sums, products, recursives, ...). We define concurrent sequences, btrees, skip-lists, etc in a pretty straightforward way and it integrates well with queries and function definitions so that it's mostly the same whether data is stored in memory or on disk.

We record/process/index about 1.5TB/day with this. The PL is a variant of Haskell (which we also use in other ways on the hot path in low-latency, high-volume applications).

hobbes

I can't tell from the Readme whether these are first-class values or second-class references to external storage. But I'll be perusing the hobbes language for a while now. :D

first-class, second-class, coach

I can't tell from the Readme whether these are first-class values or second-class references to external storage.

Basically, a value of type "a@f" is a 64-bit value that refers to an offset in the file "f" (here "f" represents a _value_ that determines the file in question) where a value of type "a" can be found. There's a function "load" of type "a@f->a" that essentially "dereferences the pointer" (in the implementation this doesn't read from disk but just passes through mmap).

There are also similar functions for allocating out of a file, and files carry their whole type structure with them (so we can have generic readers with type-checked queries).

The whole thing is pretty similar to region allocation for memory and these file reference types are just like region-annotated types. I'm not sure how you feel about that in terms of "class" (ie: first vs second class) but for the most part in our day-to-day query/analysis work we don't need to think about reference types (thanks to overloading cases in common type classes covering them).

We have to think a little about them when implementing data structures (e.g. the full structural btree type looks a bit like an extended curse as "^x.|leaf:[k*v]@f, node:x@f*[k*x@f]@f|" here with "^x.T" representing a recursive type "mu x.T", "|leaf:T,node:T|" representing a labeled variant of two constructors named 'leaf' and 'node', "[T]" an array, "T*T" a product), but I've gotten used to it and the secondary benefits have been sufficient (for us at least) to justify the extra thought.
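If I follow, a rough OCaml analogue of those "a@f" references might be a phantom-typed file offset like the sketch below (my guess at the shape, not hobbes's implementation, which reads through mmap rather than seeking and deserializing):

(* 'f is a phantom type identifying the file, 'a the type stored
   at the offset. *)
type ('a, 'f) fref = { offset : int }

(* "load : a@f -> a", sketched with an explicit reader function. *)
let load (ch : in_channel) (r : ('a, 'f) fref) ~(read : in_channel -> 'a) : 'a =
  seek_in ch r.offset;
  read ch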

Regional values in files. Nice! GC?

I can easily see how this would be quite useful, and would make it easy to build larger than memory data structures. Are these values garbage-collected somehow?

garbage collection

We never delete data in these files, but we do essentially "compact" them at EOD by accumulating/sorting the data that is interesting and we know won't be added to further. This is kind of like a copying garbage collector with a period of 8 hours or so.

Every file has "roots" (data immediately readable from the file via "f.field") so a textbook "copying collector" can work generically this way.

Impure languages may use monads too

Keean, it is not true that "ML family languages, because they are effectful, don't need monads" for the reason you propose.

Monads, effect handlers, or Oleg's more general framing of interpreting computation signatures are about letting users implement their own computational effects. They also often allow those effects to show up in the type signature, but not always; OCaml folks are experimenting with untyped effect handlers, for example.

An ML-family language gives you direct access to *some* effects provided by the runtime, with a fixed interpretation: you get randomness, I/O, very rich mutable state, and exceptions, in a certain order (exceptions do not rewind I/O or state actions), and some runtimes (but not all of them) also provide you with some form of control inversion and some implementations of concurrency.

Even in ML languages it makes perfect sense to use one of these user-defined-effects devices to implement other effects (for example, distributed computation) or to provide different interpretations of some of these effects (for example, to have a randomness-using program compute a probability distribution instead of computing one sample).

Some languages take a maximalist approach of providing every notion of effect you could wish for as part of their runtime. It's part of Racket's philosophy for example. This gives a lot of flexibility, but even then there are interpretations of effects and some combinations that you won't get without doing the work.

(This point can also provide an understanding for some of the evolution of Daan Leijen's Koka language for example. It has a type-and-effect system that has always been able to type-check the use of runtime-provided effects, but recent versions have worked on allowing users to define their own notions of effect (as in Links, Frank, etc.)).

Difference between need and want.

I think it is exactly true that they don't _need_ Monads; that is, I can write code with side effects, code that does IO, without them. I can do this because the ML function arrow "f : a -> b" allows the function to have side effects, and therefore the arrow is impure.

So I can include side-effects without a monad (if I were a lawyer, I would ask you to answer this rhetorical question yes or no).

It's not just some effects: if you can do IO, it is all effects. It is a Turing-complete imperative language.

Haskell needs Monads because the function arrow is pure, and hence there is no other way to include side effects. The side effects happen in deconstruction, so you need both a way to construct something (unit) and a way to chain those constructions together (bind); you construct a program that then gets executed when it is deconstructed at the end of the functional program. These operations (unit and bind) define a monad. This is not mere coincidence: a monad is precisely what you need to functionally construct an imperative program.

Now if you say you _want_ to use Monads to organise your code in ML, I may think you are a bit weird, but you would be correct: you _can_ create a monad in ML and use it as a way to create domain-specific languages, etc. It may be a bit less useful because you cannot inject it into third-party libraries that were not written monadically, but you get the same problem in Haskell when people write directly in a concrete monad like:

f :: IO ()

Instead of using abstraction like:

f :: MonadIO m => m ()

Which would allow you to inject your own monad other than IO.

So to summarise: I never said it is useless to have Monads in ML, so counter-examples do not help here; I said you do not _need_ Monads, you could write that code another way. Monad haters (which I am not) might think you are stupid to use a monad, adding a layer of abstraction and indirection where you do not _need_ it; I just think it's a little weird, like trying to use Monads in C++.
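For what it's worth, this is all it takes: a throwaway sketch of the list monad as an ordinary OCaml module (List.concat_map needs OCaml 4.10+), usable for DSL-style code even though ML never forces it on you.

(* The list monad, defined as a plain module. *)
module ListM = struct
  let return (x : 'a) : 'a list = [x]                   (* unit *)
  let bind (m : 'a list) (f : 'a -> 'b list) : 'b list =
    List.concat_map f m                                 (* bind *)
end

(* e.g. all pairings of two lists: *)
let pairs xs ys =
  ListM.(bind xs (fun x -> bind ys (fun y -> return (x, y))))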

Hm

I think you either have not understood my point or are countering it with a claim that I don't agree with. I don't believe it is true that "if you can do IO it is all effects".

Your operating system and language runtime provides you with a set of interfaces to perform *some* side-effects, but you could conceivably desire other notions of effects not in that set *or* different interpretations of those effect interfaces than those provided by the type system. Again, consider:
- if I want to program with notions of state and exceptional return such that exceptions roll back state update, or
- if I want a notion of non-deterministic choice where the interpretation of an effectful program is not a single run/sample, but the list of all possible results, or a *distribution* of the possible results as a data structure to inspect.

None of these are directly provided by the runtime systems of ML languages that I know of (or by standard operating-system interfaces that you could bind through the FFI), so for these you, the user, may have to implement your side-effects on your own, for example using a monad (or effect handlers, or a final interpretation as discussed here). (It is of course possible to implement those things as a library without realizing that the library can be seen as a new user-defined linguistic construct.)
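To make the first example concrete, here is one way to hand-roll such a rollback semantics in OCaml (a sketch under invented names: computations are state-passing functions, and catching a failure resumes from the pre-failure state):

(* State + exceptions where a caught failure rolls back state updates. *)
type state = int
type 'a comp = state -> ('a, exn) result * state

let get : state comp = fun s -> (Ok s, s)
let put (s' : state) : unit comp = fun _ -> (Ok (), s')
let throw (e : exn) : 'a comp = fun s -> (Error e, s)

let bind (m : 'a comp) (f : 'a -> 'b comp) : 'b comp =
  fun s ->
    match m s with
    | Ok x, s' -> f x s'
    | Error e, s' -> (Error e, s')

(* Rollback: if m fails, run the handler from the ORIGINAL state s. *)
let catch (m : 'a comp) (h : exn -> 'a comp) : 'a comp =
  fun s ->
    match m s with
    | (Ok _, _) as ok -> ok
    | Error e, _ -> h e s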

Module with functions

I can just provide that functionality with a module and a set of functions to manipulate the types I declare for roll-backable expressions.

The point is you can just have state anywhere in ML, a normal function can hide stateful update, so there is no _need_ for a monad.

Last try

This is my last try in explaining this to you in this thread.

Monads, or Oleg's "sublanguage", or effect handlers, are programming techniques that let users define their own notions of side-effects. Let's call them "devices for user-defined effects".

In Haskell, the standard library provides almost no notion of side-effect that is not encapsulated in a monad (except by going through the unsafe C interface), so you or library providers, including the standard library, use one of these devices (not necessarily monads) to provide *any* notion of effects.

In ML languages, the standard library and language definitions provide you with *some* notions of side-effects that can be used directly, without using these devices with user-defined effects: states, exceptions, often randomness, sometimes control, in some fixed order. Let us call those "builtin effects". But there are *other* notions or interpretations of effects that are not provided. If you want to use some of those, it is important and useful to recognize the need for one of these devices for user-defined effects. (How exactly you implement the notion of side-effect you need is up to you, and in particular you may or may not rely on some of the builtin effects of the language -- it is usually a good idea for efficiency.)

It is *correct* to say that ML "does not need monad" (or, in general, does not need any device for user-defined effects) to support any of the builtin effects. But it is *wrong* to say that ML, in comparison to Haskell, "does not need user-defined effects" *in general*, because there are other effects that you may want than those builtin effects -- I have given explicit examples in my previous post.
Haskell and ML users may both need user-defined effects, there is no absolute difference, only a relative difference: ML users can often rely on builtin effects, so they need devices for user-defined effects less often.

Finally, let me come back to your original comment. You are asking whether it is surprising that Oleg can demonstrate, in ML, how to implement user-defined effects without directly relying on a monadic structure. Your point is that ML programmers already write effectful expressions by using builtin effects. However, this does not explain Oleg's work or make it unsurprising, given that this work does not rely on ML builtin effects, and in fact could have been implemented in Haskell just as well.

Maybe it's not surprising to me?

Maybe it's just not surprising to me, having worked with Oleg in the past (on the HList and OOHaskell papers), or maybe I haven't fully understood it.

If you can implement it in Haskell _without_ using Monads, or unsafe trapdoors like "unsafePerformIO" I would be surprised.

Being surprised would be a good thing, though, because it would mean I have learned something new and unexpected. But at the moment I doubt it; I just think there is a difference in how we are explaining things. I don't think effects fundamentally alter the nature of pure functions: you still have to work within the rules.

Edit: I have re-read it all more thoroughly and I still agree with my original position. The DSL would need to be monadic in Haskell because it contains state in module S.

You probably want to read it once more

You probably want to read it once more. There is no mutable state in NDetL which is the example implementation of S.

Limited Application

If there is no mutable state in NDet, is it then only useful for a stateless subset of possible DSLs? Certainly it would preclude any stateful implementations of NDet. Is it even a proper effect, if it is not "effecting some state"?

I am not even sure NDetL is stateless as you claim, as it contains set and get operations on an associative data structure. The problem is that, because ML does not enforce purity, it is hard to tell. Oleg's comment about implementation NDetD:

(* Second implementation, using delimcc
No explicit monads any more
*)

Suggests that implementation NDetL is using monads, even though you don't _need_ them in ML.

The choice of non-determinism was influenced by Oleg wanting to explore the minimal DSL. If we want to look at other DSLs, say for file IO, then the implementation would need to be stateful. As soon as the DSL has the concept of sequencing effects, it would have 'unit' from the meta-language quoting and 'bind' from the composition/sequencing of fragments, and would therefore permit abstraction by a monad.

Er

Er, I don't see any associative data structure nor get or set operations. Also, the code is right there in the article, so what is hard to tell? All it's doing is very basic list manipulation.

I don't think that there is a problem with using a monadic implementation either. The types `int_t` and `ilist_t` can well be represented by a monad (well, they already are in NDetL: the list monad). The only place where it may cause problems in this example is with the observation function -- that obviously requires the monad in question to have `run` functionality, but the ST monad has just that (though IO does not).
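For reference, the list-based representation under discussion looks roughly like this; it is my paraphrase of the idea, with names following the thread, so the actual signatures in Oleg's article may differ (List.concat_map needs OCaml 4.10+).

(* A non-deterministic int is the list of its alternatives; choice
   is concatenation. This is the list monad in thin disguise. *)
module NDetL_sketch = struct
  type int_t = int list
  type ilist_t = int list list

  let int (n : int) : int_t = [n]        (* deterministic injection *)
  let fail : int_t = []                  (* no alternatives *)
  let ( ||| ) a b = a @ b                (* non-deterministic choice *)

  let nil : ilist_t = [[]]
  let cons (x : int_t) (xs : ilist_t) : ilist_t =
    (* the list monad's bind, applied pointwise *)
    List.concat_map (fun n -> List.map (fun l -> n :: l) xs) x
end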

Is it even a proper effect, if it is not "effecting some state"?

Yes, there are many effects that have nothing to do with state.

Running Effects

You are right about NDetL using simple list operations; I must have been reading a different part of the source file without realising it. I don't think this changes anything though.

When I think of effects, I think of memory allocation, random number generation, file IO, channels, etc.

If you were trying to define a language with pure functions and handle IO by effects, you would have to deal with these cases. You could have a pure functional meta-language that assembles the impure imperative fragments into a final program which gets run automatically on deconstruction. This is precisely how the IO monad in Haskell already works.

So what Oleg has done is take the same mechanism and apply it to a simpler (more minimal) DSL that does not need bind (it still needs unit, however). It is not surprising that there are some DSLs that are functional, not imperative.

Copenhagen

I've been struck, in recent times, by the broad structural similarity of monadic programming to traditional quantum mechanics. Consider. Traditionally, quantum mechanics describes a physical system by a wave function, a complex-valued wave across the system's state space which evolves over time according to some equation such as Schrödinger's. Fans of QM have liked, historically, to gush about how beautiful the mathematics is, which... has some truth to it. The wave function evolves in a mathematically clean way, with two main difficulties. One, the clean math may be too difficult to deal with symbolically; and, two -- it's inherently impossible to directly experience the wave function (whatever dubious claim one might make anyway about its ontological status).

When it comes to actual observable reality, it's not a wave function; that's the point of the Schrödinger's cat thought experiment: despite the mathematics, what's actually there is not a wave function, it's a cat. (Meow!) Mathematically, you have this lovely wave function, but you only get to keep it as long as you're not in touch with reality, at which point you have to switch tracks by means of wave function collapse (which mathematically is ugly as sin).

I vividly recall how my professor in introductory quantum mechanics (one of those truly excellent teachers one occasionally has the good fortune to learn from, btw; but I digress), after spending weeks on a single electron in a potential field, introduced a second electron. Each electron had its own wave function, which we were well familiar with by then; and what happened when they met each other? One of the electrons observed the other, the observed electron's wave function collapsed -- and we were able to integrate over all the possible observations and end up with a combined wave function for the two-electron system. Which worked only because we were lucky enough to have a problem where we could work out a symbolic solution to that integral. The whole history of quantum mechanics, afaics, huddles around a few relatively very, very simple problems we're able to solve. In most realistically big systems, the nice elegant wave functions of the parts don't combine into a remotely manageable wave function for the whole.

Set that beside monadic programming. Fans of functional programming gush about the elegance, but there's this pesky detail that observable reality isn't a monad; it has imperatively evolving state. And there are difficulties with composing monads, and with scaling them up. I've come to think of monads as the Copenhagen interpretation of programming.

observable reality

An interference pattern on a screen is also a state. It can't be accounted for by isolated particles - so yes, state change is important, but it might not be individual.

Do you have any suggestions for a Young's slit experiment in computation?

Non-determinism as an effect

People do strange things. One of the strangest is deciding that some particular aspect of a definition is the hill that they personally are willing to die on, and then arguing the point to death... or, in the case of LtU, far far into the right margin of the site. I'm starting to think that the design decision that makes really deeply nested replies disappear into a thin column is looking more and more like a feature rather than a bug. If a thread changes topic (many times), then the understandability of the replies cramped into the thin column starts to model the long chain of assumptions that the reader is being asked to follow. If a point is worth making, then starting afresh forces basic assumptions to be stated clearly, and leads to a more readable discussion.

Some interesting points have been raised in the long, wandering discussion. Rather than simply attack them and point out that they are wrong, is it not more interesting to trace them back to the original source and ask where these misunderstandings have arisen? What can we learn about the original claims from the discussion they have produced?

Here is the claim that seems to have prompted a large part of the discussion: "We pick a rather complex effect -- non-determinism". This would cause me to ask two questions - in precisely what way is non-determinism an effect? How does the example exhibit it?

One way to define non-determinism as an effect is to define the semantics of a language so that it reads some unseen state from the environment (treading perilously close to the right-hand margin) and uses it to pick a value or control path in the execution. It is certainly difficult to do so without implementing some kind of stateful read from the environment in the language semantics. The observable outcome is that when the computation is executed, its result cannot be determined without access to the extra state -- hence the non-determinism.
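For instance, a sketch of such a stateful read in OCaml, using the global PRNG as the unseen environment state (helper names invented):

(* Non-determinism as an effect: read hidden state (the PRNG) to
   pick a value; the result cannot be determined without it. *)
let choose (alternatives : 'a list) : 'a =
  List.nth alternatives (Random.int (List.length alternatives))

let rec remove_one x = function
  | [] -> []
  | y :: ys -> if y = x then ys else y :: remove_one x ys

(* "what is a permutation of this list?" -- one answer per run *)
let rec a_permutation = function
  | [] -> []
  | xs -> let x = choose xs in x :: a_permutation (remove_one x xs)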

That does not seem to be what is shown in the example code. Another way to evaluate non-deterministic code is to ask "what is the deterministic hull of solutions?". This computation is itself deterministic; a classic example would be "what is the set of all permutations of a list?". This purely deterministic question is related to, but distinct from, the non-deterministic question "what is a permutation of this list?".

If we are computing the hull of solutions then it is not surprising that we can do so without access to any hidden state, and so it starts to seem less surprising that we did not require a monad to model the implementation. The paper is a good read, and it stands on its own merits as an interesting use of a tagless final interpreter, but I don't see how it does what it claims. In the example provided:


let _ = let module M = Perm(NDetL) in M.test1
(*
- : NDetL.ilist_t = [[1; 2; 3]; [1; 3; 2]; [2; 1; 3]; [2; 3; 1]; [3; 1; 2]; [3; 2; 1]]
*)

...where is the non-determinism? That looks like the deterministic hull of a non-deterministic function. It is true that the same hull can be rendered into different lists, but only one such list is being produced by the implementation. It is not non-deterministically selecting between them. If there is no non-determinism then what is the effect being modelled?

re: nondeterminism effect

...where is the non-determinism?

Well, it would be equally valid for the returned `ilist_t` to have a different order, or even just one result via backtracking. :D

As I understand Oleg's argument, the non-determinism 'effect' is in the interaction between `Perm` and the `NDet` handlers/context. Our program `Perm` was written against this signature without knowing or caring about order of results. We are similarly free to commute arbitrary `|||` subprograms or eliminate/introduce duplicates without affecting Perm's program semantics (because `x == x|||x` and `x|||y == y|||x` equivalences are implied by our NDet semantics as described in the comments).

This would probably feel closer to a DSL if we avoided passing the `NDet` implementation as an argument to `Perm` and instead reversed that, passing the DSL code to an NDet runtime/interpreter such that the source of non-determinism is never visible in the scope of the program leveraging it. Oleg mentions this in his "exercises" section, e.g. the use of freer monads instead. But the two approaches are essentially equivalent, and ultimately it's only `Perm` by itself that is non-deterministic, not an arbitrary combination of `Perm + NDet implementation` (of course, an NDet implementation could support threads and race conditions and generally be non-deterministic, another 'exercise' left to Oleg's readers).

Similarly, a system using imperative threads would be deterministic if driven by a deterministic scheduler and network. It's the program written using threads that we call non-deterministic because it makes no assumptions about scheduling.

Different handlers

I agree, you could use a handler that has a random generator and produces a single non-deterministic output. I also agree the 'interpreter' model makes things clearer, and it's easier to see that for a random non-deterministic output the interpreter itself would need to be monadic in Haskell, but the effects themselves need not be.

Actually, I'm not sure that's a correct interpretation: the effects themselves are not pure; what we are doing is composing fragments of an impure language, and of course composition is a pure operation. It seems easy to get confused, because the DSL is actually untyped (we do no type checking on the int_t/ilist_t abstract syntax generated), so the whole pure/impure aspect of the effects is hidden from the host language.

What would be interesting is what happens if we try and type the abstract syntax of the DSL in the host language?

Indeed

That's pretty much the original point I was trying to make, only expressed far more eloquently.