Non-determinism: a sublanguage rather than a monad

Non-determinism: a sublanguage rather than a monad

A puzzlingly named, exceedingly technical device introduced to structure the denotational semantics has by now achieved cult status. It has been married to effects -- more than once. It is compulsively looked for in all manner of things, including burritos. At least two ICFP papers brought it up without a rhyme or reason (or understanding), as the authors later admitted. I am talking about monads.

In truth, effects are not married to monads and approachable directly. The profound insight behind monads is the structuring, the separation of `pure' (context-independent) and effectful computations. The structuring can be done without explicating mathematical monads, and especially without resorting to vernacular monads such as State, etc. This article gives an example: a simple, effectful, domain-specific sublanguage embedded into an expressive `macro' metalanguage. Abstraction facilities of the metalanguage such higher-order functions and modules help keep the DSL to the bare minimum, often to the first order, easier to reason about and implement.

The key insight predates monads and goes all the way back to the origins of ML, as a scripting language for the Edinburgh LCF theorem prover. What has not been clear is how simple an effectful DSL may be while remaining useful. How convenient it is, especially compared to the monadic encodings. How viable it is to forsake the generality of first-class functions and monads and what benefits it may bring. We report on an experiment set out to explore these questions.

We pick a rather complex effect -- non-determinism -- and use it in OCaml, which at first blush seems unsuitable since it is call-by-value and has no monadic sugar. And yet, we can write non-deterministic programs just as naturally and elegantly as in Haskell or Curry.

The running tutorial example is computing all permutations of a given list of integers. The reader may want to try doing that in their favorite language or plain OCaml. Albeit a simple exercise, the code is often rather messy and not obviously correct. In the functional-logic language Curry, it is strikingly elegant: mere foldr insert []. It is the re-statement of the specification: a permutation is moving the elements of the source list one-by-one into some position in the initially empty list. The code immediately tells that the number of possible permutations of n elements is n!. From its very conception in the 1959 Rabin and Scott's paper, non-determinism was called for to write clear specifications -- and then to make them executable. That is what will shall do.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

welcome to the real world

Hear, hear! Such loads of categorial pretense, neither necessary nor sufficient for the job, emanating from lazy-land!

A correction, though: the mentioned structuring effect (ahem) is not to do with monads, but rather modalities, specifically the lax modality, as detailed in my Practical Foundations for Programming Languages, wherein I also make the point that it was all present in Algol back in 1960! The much-touted formulation in the Haskell language is neither original, nor even properly formulated (modality, not monad).

Will sanity at last prevail?

Is this surprising?

OCaml is not a pure language, it's function-arrows can contain side effects (like the whole ML family), so of course it doesn't need Monads, as every function is implicitly impure. It's practically an imperative language :-)

This code is pure

As far as I can tell, the code in this article is all pure. There are no uses of refs or similar, although the accompanying code provides an alternative implementation of the same signature that uses delimited continuations. The article also makes the case that there are "effects" that can't reasonably satisfy the monad laws.

monad nonsense

Right.

Effects

The effects themselves represent impurity, without the 'purity' restriction in Haskell this does not need Monads and is trivial.

Implement it in Haskell without Monads, and you will have something interesting.

Did you actually read the article?

This comment implies that you did not. The implementation here could be trivially translated to Haskell and would still not be monadic.

Problems in Haskell

The problem in Haskell is that the DSL operations exported from module S cannot have any state unless they are in a monad, so the whole DSL becomes monadic.

Edit: so the meta-language can be functional because you are "macro" programming the DSL which you can consider pure-functional composition in both Haskell and ML, but you can only have a functional DSL in ML because all functions can hide impurity, an Haskell the DSL would need to be monadic to enable it to be stateful.

GADTs

GADTs are the easiest way to model arbitrary DSLs in Haskell. And the DSLs can easily be stateful, that just depends on how you interpret the GADT.

Interpreting?

Are you suggesting that you use the GADT to represent a language that builds an abstract syntax representation of the computation? You still need an interpreter for the language, and that interpreter will be monadic.

We actually did this for SQL back when we were writing the HList paper, you used functional composition in the meta-language (in this case Haskell) to construct abstract representations of SQL queries, however when you actually wanted the results back from the database you had to run the query and that was monadic.

I suppose if we had pushed more of the program into the DSL so it could represent the UI and other IO requirements of the program we could have rolled the whole thing into a single abstract representation which we then interpret (or compile) in one go when the meta-program has finished running. Interestingly this is the direction I am going in my own work, using a Prolog-like meta language assembling a "DSL" that is imperative. In my case the DSL is intended to be as close to the machine model as possible whilst still being portable, sort of C-- like, and the logic language provides the generics, polymorphism, type-classes etc.

In any case the monadic structure exists whether you call it a monad or not. For example 'C' is monadic with unit="return" and bind=";". Monads are implicit in any imperative language, that is one that has a sequence of commands/statements rather than evaluating everything as an expression. A monad is an algebraic structure, so like an 'additive semigroup' it describes properties that concrete entities have anyway. Addition of numbers behaves as a semigroup whether you like it or not. Likewise the monad is there whether you like it or not. If you don't chose to structure your code algebraically it does not make the monad disappear, it's still there, just like it's there in 'C'.

Hrm

Monads aren't particularly important in Haskell, except for particular monads like IO. Also, C language isn't monadic for a number of reasons and even if it were, bind would not be related to semicolon since effect sequencing exists within expressions.

'C' expressions are in the IO monad

Are you sure? Yes what I have said about ';' being bind is an oversimplification, but I could take a 'C' program and give everything monadic types, and then type check the 'C' program.

The sequencing of effects in expressions just shows us that even arithmetic operators like '+' in 'C' are in the IO monad.

(+) : Numeric a => (IO a, IO a) -> IO a

Would be the type for the 'C' add operator.

Edit: it is also interesting to see the different views, below gasche argues that ML needs Monads, and here you are saying that Monads are not that important to Haskell. My views are somewhat in the middle, I think you are undervaluing Monads in Haskell, and I think gasche is overvaluing Monads in ML. Like Goldilocks, the middle way seems just right.

Monads, mo' problems

There are a bazillion reasons why the semantics of a fragment of C code isn't just a value in the IO monad. But as a very hand wavey idea ... sure, I get what you mean.

I don't think gasche argued that ML needs monads. His point was rather that an ML programmer might need monadic constructions (or effect handlers) for the same reasons that a Haskell programmer would. And I agree.

Haskell IO is a DSL.

The reason I think what Oleg has written is obvious, is this is precisely how every Haskell program is written. There is a DSL for IO that you macro program in a functional meta language. 'bind' itself is a macro that composes two DSL fragments into a single fragment. 'unit' lifts a DSL fragments into a meta-language fragment.

Oleg's DSL has a 'unit' in this context 'unit' is like macro quoting, he does not define 'bind' because his DSL is a functional language and does not need it. It is a Monoid because '|||' behaves like 'plus'.

So a monad is part of the algebraic analysis of the DSL. It is either monadic or not , which is a property of the language, whether you choose to call the DSL functions 'unit' and 'bind' or something else. If the DSL had statements as well, then it automatically would have been a monad. A monad is an abstraction over many DSLs, when you have programmed a lot in this way you notice that every DSL needs to be quoted to be handled by a macro, so you can abstract 'unit' as a property every DSL has. You then notice a lot of DSLs that are imperative have a command sequencing operation, basically "do this _then_ that", so you can abstract 'bind' as a generic operation you perform on all those DSLs. So being monadic is a property DSLs have by merit of being embedded in another language, and having an effect sequencing operation.

So 'C' as a DSL would be a monad (that is we can define a mapping from 'C' to the meta-language operations of unit and bind). Oleg mentions trying to understand what the minimal viable DSL would be, he does not have functions in it for example. The lack of 'bind' is not because bind is 'bad', but simply because this DSL does not need it. This is part of the exploration of 'minimal' DSL, not suggesting that 'bind' is not needed for all DSLs. If you want a DSL that involves sequencing commands, you already have bind.

So every Haskell program that does IO has already been doing this since Monads were introduced. The only difference is Haskell's IO DSL supports 'sequencing' so you can sequence fragments using bind.

This sounds different from what you or gasche have been saying, but maybe I just have not understood you. Hopefully you can see why I said it's obvious, as all Haskell IO is already done like this, with the addition that we want to sequence IO, so we can bind DSL fragments together.

I think this approach is very important, and most of my work since the HList paper has been based on this approach of a simple DSL with advanced features provided by the meta-language, except not hosted in another language like Haskell or ML, so the DSL is simply 'the language', and it can have its own purpose designed type system.

[Note: this is a different point to that above about the interpreter for the DSL needing to be in a monad in Haskell if it has state, which it does not need to be in ML.]

I think you may be confused

I don't know what you mean when you say that Oleg's example isn't monadic because his DSL is functional. Generally, I'm having trouble following your posts in this thread except as fuzzy ideas. The usual approach for non-determinism would have him doing this:

module type NDet = sig
    
   type 'a NDet  (* non-deterministic value of type 'a *)
   
   ...

This would treat non-deterministic values of any type uniformly, and would be monadic. Instead of doing this, he goes out of his way to handle only two concrete types of non-deterministic values: integers and lists of integers. Because he only has to deal with these two concrete types, he can provide implementations that he couldn't use if he had to implement "the whole monad", and he gives code generation as an example.

I find the construction to be kind of janky. It kind of reminds me of the Scala embedded DSLs that were posted here a while back, which I had similar concerns about.

Expressions only

What I mean is the DSL consists only of expressions, it has no statements, and hence has no need of bind.

Don't think that's the main point of his construction

Bind isn't just about statements. Look at the type of it. If it were just about statements, the continuation parameter would have type () -> ... And I don't think that's the point when Oleg emphasizes his construction not being monadic. It's rather that monads have to be defined at every type.

Bind is 'do this then that'

You do not need bind to compose expressions. For example '+' in a monad had the type: m a -> m a -> m a. Here the LHS and RHS monad can be executed in parallel, as there is no dependency between them (and the meta-language is pure, so no side effects).

Looking at the type of bind: m a -> (a -> m b) -> m b we can see the RHS depends on the result of the LHS, and so we are forced to run the LHS before the RHS. The type signature makes this sequencing explicit. If we contrast it with the signature for '+' the _only_ difference between '+' and 'bind' is this explicit sequencing in bind. So bind is all about sequencing one thing after another, the imperative "do this then that". If the DSL has no side effects then bind is not needed, and everything can be done with '+' style operators. Looking at Oleg's goal of being minimal, this makes sense to use an example that does not require sequencing, and is functional.

For example '+' in a monad

For example '+' in a monad had the type: m a -> m a -> m a

Where a = Int?

You're missing the point that Oleg could have made his example monadic without adding an explicit bind operator. The point isn't that he didn't need a bind operator. It's that for the example he chose it isn't possible to write one.

No Monads necessary

If it's not in a monad, then plus is just 'a -> a -> a' where 'a' is an abstract type. If there is no 'bind' then you don't need the monadic structure at all, you just need 'unit' which is provided by replacing each type like int with an abstract replacement 'int_t'. You have to 'quote' the DSL in the meta language somehow, all this does is "un-abstract" unit.

You still need an

You still need an interpreter for the language, and that interpreter will be monadic.

Interpreters will naturally be monadic if the language being interpreted is essentially monadic (that is, associative structure and sequential interpretation). Most stateful DSLs are procedural/imperative and hence fit the monadic structure.

But if you start getting into process networks, term rewriting, stabilizing soft constraint systems, cellular automata and variants, generalized arrows, or highly constrained languages (like a hardware design language with space and power constrained fanout), then the monadic structure doesn't always fit.

Of course you can force it to fit. Monads are quite general. But sometimes they're general in the Turing tar-pit sense. If it's awkward, you don't need to use a monadic interpreter just because you have a stateful language. And monads are awkward for representing and optimizing commutative structure, for communicating substructure, and elsewhere.

I'm recently of the opinion that KPNs would serve as a better 'standard' foundation for stateful, effectful interpreters and objects than monads. KPNs handle a fair bit of natural concurrency, state, flexible composition, and can hook into multiple input sources and destinations. It isn't difficult to make them reactive by uniformly adding some time/tick messages.

Monads are implicit in any imperative language

But imperative languages are not the only form of stateful languages.

And even when the monad is implicit - i.e. because you have your additive semigroup - a monadic interpreter may be inefficient if your language has much more structure than a simple monad (e.g. with commutative compositions or communicating substructures).

It seems you came into this discussion mostly thinking about imperative DSLs?

General Purpose Language

I think I came to the discussion thinking about general purpose languages. For example you could use a stateful minimal DSL embedded in a functional meta-language, where the meta-language provides all the nice programming facilities (functions, polymorphism, etc). Oleg mentions this in the article, and talks about the origins of ML as a meta-language for a theorem prover.

Embedding a pure functional DSL in a pure functional language seems trivial to me, you just need the 'quote'/'unit' operation. Maybe I'm missing something?

I also suspect there is some difference in our understanding of effects? The whole point was to demonstrate an embedding of effects. To be an effect there must be some implicit state or context that is modified by the effectively operators. Consider logging, we modify the state of the log-file when we encounter "log('something')". Without the implicit context or state,'log' is not causing an effect, it would just be a pure function.

Missing something

What you seem to be missing is the ability to doubt yourself. I have spent considerable effort trying to convince you that you are misunderstanding what effects are (I think it would have been respectful to at least give a try to the notion that you may be misunderstanding something), Brandon Bloom had the impression that you hadn't even read the article you were replying to, and Andreas Rossberg pointed that (obviously) you had not read the implementation. At some point there is a limit to the number of different kinds of ways someone can tell you that you are mistaken while remaining within the boundaries of polite conversation.

What is an effect?

I did above suggest that I might be missing something, but I haven't seen any suggestions as to what that is. What is the point in making an ad-hominem attack? I did admit that I missed the stateless implementation, but I don't think it undermines my point. It's a bit like dismissing an argument because the spelling is wrong, when that does not affect the actual point being made.

Let's switch tack. What is the difference between a function and an effect?

Edit: from 'An Introduction to
Algebraic Effects and Handlers' [Matija Pretnar]:

Algebraic effects are an approach to computational effects based on a premise that impure behaviour arises from a set of operations such as get & set for mutable store, read & print for interactive input & output, or raise for exceptions [16,18]. This naturally gives rise to handlers not only of exceptions, but of any other effect, yielding a novel concept that, amongst others, can capture stream redirection, backtracking, co-operative multi-threading, and delimited continuations [21,22,5].

Note 'impure' behaviour is there in the definition, so it is not just my personal definition of effects you are disputing, but what would appear to be a published consensus.

Oleg has a paper here:
https://icfp17.sigplan.org/event/hope-2017-papers-higher-order-programming-is-an-effect
this explains the approach, and states:

What is an effect then? It is an interaction with the context. An expression is effectful if it needs context to interpret it.

Note "interaction" with the context. Just reading the context is not enough, as we could then package the context with the function, so a closure (absent state, IO etc) is pure and not an effect. Oleg also talks about pure and impure code making the clear assumption that effectful = impure.

What we get though is the notion that if I take an impure imperative DSL with side effects and package it with the context and state (so for example an interpreter for that language) we can be left with a pure functional thing from the outside. For example a program to calculate the GCD in an impure DSL along with its interpreter all packaged into a functional interface like 'gcd(x : int, y : int) : int' can be regarded as pure from the outside. Effects are what link the impure code to its context.

Substituting a particular for a universal

I only glanced at this thread but it seems you keep substituting a particular for a universal. Compare it with money, dollars can be used to pay for goods, so can pounds, they're both money. Looks like you keep insisting that every form of money must be in dollars.

Monads simply aren't the only manner of dealing with DSLs or effects.

That's not what I am doing.

I have never said Monads were the only way of dealing with effects. I have said:

- Effects imply impurity (see quotes from Oleg above backing this up).

- In Haskell you have to use Monads for impurity because functions have to be pure (ignoring things like unsafePerformIO).

- That in ML you don't need Monads because functions and modules can encapsulate impurity. Of course you _can_ have Monads in ML, you just don't need them.

Where am I going wrong?

Edit: Thinking about it a bit more I think you have it backwards. The imperative case is the universal, because the pure functional DSLs that could be implemented with a functional interpreter in Haskell are a subset of the imperative.

With an imperative interpreter I can implement any DSL including all the examples like non-determinism as well as IO languages.

The functional interpreter can only implement the pure subset of those, even allowing state threading it can't do IO in Haskell.

So my conclusion is I am discussing the universal, whereas everyone else is focusing on a particular subset.

I'll try!

- Effects imply impurity (see quotes from Oleg above backing this up).

The code is pure so people are likely wondering why you're discussing impurity. If anything, Oleg showed he doesn't need monads for an 'effectful' interpreter. But, hmm, maybe not convincingly so.

- In Haskell you have to use Monads for impurity because functions have to be pure (ignoring things like unsafePerformIO).

You don't. Monads are one manner of dealing with impurity. Granted, they are concise and natural to programmers but likely one can prove there is an uncountable number of equivalent manners.

[Edit: True that Haskell implements Monads but that's a design choice. It's a particular, not a universal. You don't need monads for dealing with the outside world.]

- That in ML you don't need Monads because functions and modules can encapsulate impurity. Of course, you _can_ have Monads in ML, you just don't need them.

True. You often don't need monads where you would need them in ML. But I've used them just fine in a functional interpreter of ML. So what?

Why are we discussing impurity? What does a monad have to do with this all? Why are we discussing C?

Oleg just showed an example where he doesn't use monads for a DSL. Yeah, okay, whatever. These examples exist. An eval function for a small arithmetic language usually doesn't need monads either.

Well. I am not sure why we are discussing this. Ask gasche, he seems to know.

Precisely

Oleg just showed an example where he doesn't use monads for a DSL. Yeah, okay, whatever. These examples exist. An eval function for a small arithmetic language usually doesn't need monads either.

That's the point I made right back at the beginning which started the discussion.

In Haskell you have to use

In Haskell you have to use Monads for impurity because functions have to be pure

This is where you go wrong. Monads are also pure values in Haskell.

When you're writing in the IO monad, you don't need to concern yourself with how the Haskell compiler attaches the IO operations to the real world sensors and displays and databases and actuators. You just accept that it will be attached in a reasonable manner, and you're free to focus on your application logic. The same principle holds for representing effectful application logic using any other structure.

You could compile/interpret your structures to IO or just provide them as an export from a Haskell module for unspecified interpretation. You might compile the app to another intermediate language with another effects model, which again might not be monadic. In any case, what matters is your application logic and the application model for accepting inputs and providing observable output and evolving statefully.

The imperative case is the universal

It really isn't. The imperative language is just one more example of an interpreted structure. Even whole 'machines' can be understood this way (e.g. via virtual machines and black box testing).

You could reasonably argue that imperative is "closer to the physical machine". Or that imperative allows adding effects to an application in a more pervasive and ad-hoc manner (at cost of hindering typeful control over effects). But it isn't more universal. Every effects models is ultimately interaction of structure and context, whether it's imperative system calls or attaching cells in a cellular automata to a set of IO pins and registers.

Almost

I agree we can functionally compose DSL fragments in the meta-language. This to me is what Oleg was investigating in his example. This is how Haskell's IO already works, 'bind' composes fragments of an IO language, and then the whole IO program is interpreted on destruction.

However pure is clearly a subset of impure, you can call a pure function from an impure one, but you cannot call an impure function from a pure function.

Edit: Further to this 'bind' is just an emergent property of a language in which there is a sequence of statements.

Sophistry

Cue the endless debate? Haskell can only be called pure due to categorical dishonesty! And what isn't pure about your microprocessor? It picks up voltages and transforms those to other voltages due to some function which can loosely be called mathematically pure.

This is such a tiring discussion.

Trying to establish some inclusion relation on purity based on superficial observation of how one badly designed language does things boils down to sophistry. There are no compelling arguments either way.

context and scope

I think your "pure is clearly a subset of impure" statement is obviously true but also no more significant than an observation that "functions in (a -> a') is clearly a subset of functions in (a,b) -> (a',b)". In both cases, the former can be embedded in the latter. It isn't a truth specific to the nature of effects. It's just a truth about scope control in general.

There are many forms of scope control, of course, and not all of them are typeful. Object capability model is one example. Modal logic is another.

IMO, 'purity' is rather informal and is mostly useful as a statement about the scope of language expressions. Impure languages are essentially those that don't make it easy to control and reason about scope. Impurity is about chaotic scope. Impurity isn't about stateful effects except insofar as state tends to complicate scope (by adding a temporal dimension to scope). A language that has no stateful side-effects but permits ad-hoc reflection on the call stack or the deep implementation of its arguments (e.g. such that we can distinguish `42` from `6 * 7`) could still be considered impure from all practical standpoints.

Not all effects are monads in Haskell

Keean Schupke:

In Haskell you have to use Monads for impurity because functions have to be pure

It is worth noting that not all effects are encapsulated monadically in Haskell either. Specifically, non-termination is not.

Algebraic Effects

Whilst non-termination is a 'side-effect', I am not sure if it's an algebraic effect? What is the context the procedure is interacting with via an effect?

Co-hygiene

These days, I wouldn't think of non-termination as a side-effect, because I'm now thinking of side-effectfulness as violation of co-hygiene. Afaics non-termination is orthogonal to co-hygiene.

Re: Effects and State

Your earlier assertion to which I replied was this: "the DSL operations exported from module S cannot have any state unless they are in a monad".

I would first distinguish between "having" state (i.e. modeling a system that evolves statefully as a consequence of inputs, including time) vs. effectfully interacting with external state (a consequence of an external context observing outputs and providing inputs). There are a lot of useful systems, such as simulations or control loops, that may be stateful without specifying effects.

In any case, monadic structure is just one way to present observable outputs to an external context. KPNs are another - e.g. a 'log' output channel could be attached to a real world console, and a 'keyboard' input channel could also be attached appropriately. It's even possible to do something like insist specific cells within a cellular automaton shall serve as the observable "outputs" or mutable "inputs" from context of an external system. These are "effects models" neither of which is monadic.

You have accepted that monads can be "effectful" despite monads being pure values that essentially represent commands and continuations as data to an external context, like `Log("Hello", continuation)` or `Read(register, continuationReceivingValue)`. But I think you haven't yet generalized this to non-monadic structures. It works just as well for integrating KPN inputs and outputs with keyboard and console and GUI, or for integrating cells and time-steps within a cellular automata with real-world sensors and actuators. There isn't anything special about the monadic structure for the purpose of modeling and integrating effects. The only thing "special" about monadic structure regards its ability to conveniently model procedural languages.

Mondic Structure = Effects + Embedding.

There are a lot of useful systems, such as simulations or control loops, that may be stateful without specifying effects.

I would disagree, effects are happening. If it is stateful then there are effects locally. See Oleg's paper I referenced above. Simply storing data to memory is an effect. At a low level even functional languages have effects (the stack 'push' and 'pop' are effectful operations, and the stack is the context). However as Oleg points out what is locally effectful can be seen as pure from the outside, providing the effects do not 'leak' out. We require procedures to be regular (see Stepanov's definition of a 'Regular Procedure') then we can view them as pure from the outside.

KPNs can have internal state too, and if each node is a process the input and output channels would be effects to that process (taking the local view of each process). Globally the thing may be pure (in that putting the same input sequence results in the same output sequence), but each node is definitely not pure. As Oleg states pure/impure is all about where you draw the line. We know this because CPUs are all impure, yet we can write a pure functional language to run on them.

So where does that leave Monads? If you have a language with effects, like 'put', 'get', then you will have semantics that rely on sequencing, and therefore you will have an operation that maps to 'bind' and therefore there will be a monad for that language even if you don't call it that, or don't use it. Whether something is monadic is not a choice, it is a property of the language that you observe, like every rectangle has a width and a height. You cannot have a rectangle without width and height. Likewise you cannot have a DSL with effect sequencing without it being a monad, because embedding the DSL gives you 'unit' and the sequencing can be expressed as 'bind'.

So for each KPN where within each node we have a process that has input and output effects, we have a language with the property of sequencing. All we need to do is embed that KPN process language in a meta-language and we have a monadic structure.

The 'special' thing about monadic structure is that it is an emergent property of any effectful language when embedded in a (possibly pure) meta-language, and effects occur everywhere you need a context to interpret a program.

Edit: a further thought is that the effectful view is the more universal, because we can only have closed pure systems. As soon as the system refers to any external state (say a network) we involve effects. Also if the system had internal state we must be able to reset it, so we can replay inputs to get the same results. I think we can show there are more open systems than closed systems, and that every closed system is a special case of an open system with no external references. For example a KPN which references an external data store, or network. Now you could include that external store inside your model to make it pure again, but you would end up having to put the entire universe into the model.

Thinking about this, what kind of model would include a file-store inside the pure closure? As we require replayability we need to erase the store at the beginning of any call to this function. To me this clearly seems less general, but I admit there is a duality here.

You're still missing something

Likewise you cannot have a DSL with effect sequencing without it being a monad, because embedding the DSL gives you 'unit' and the sequencing can be expressed as 'bind'.

No. To be monad, unit & bind have to have the correct types. Unit is supposed to have type 'forall a. a -> m a'. If you'll look at Oleg's DSL, it only supports two embedded types: int & list of int. Yes, you can add unit and bind, but they won't have the correct types for this reason.

I'm gonna pass on discussing the meaning of "effect".

They Exist

Such functions with the correct types exist, whether you define them or not. My point is,if you _can_ define 'unit' and 'bind' for language, it is monadic, whether you use those particular functions or not.

Can you?

Care to elaborate? What return do you have in mind for Oleg's example?

Quoting

'unit' is lifting the embedded language into the meta-language, hence it is "quoting". Unit is an abstract operation, that is it generalises a specific operation. In Oleg's example he 'specifically' quotes 'int' as 'int_t' and 'list' as 'list_t'. Unit would be a generalisation of this operation so for example a macro that replaced '*' with '*_t'. We can probably represent this in ML with a type constructor, say 't a' so we would represent 'int_t' as 't int' and 'list_t' as 't list'. Now 't' looks more like a monadic structure as we abstract the embedding of the DSL.

If you change it use a type

If you change it use a type constructor that embeds all types, you've solved the problem. Sometimes you can't do that. When you can't, you don't have a monad.

Sometimes you can't do that?

Seems a bit vague. Precisely when can't you do that? In Oleg's example you could clearly replace the quoting of 'int' and 'list' with an abstract operation. I would suggest you can always do that.

Edit: Besides the abstract operation still exists even if the meta-language cannot express it. For the structure to not exist you would require a DSL that _cannot_ be embedded in _any_ meta-language.

Effects

I think your claim "If it is stateful then there are effects locally" is untrue. Cellular automata, synchronous reactive systems, and temporal logic provide fine examples of systems that are stateful without any "local" effects or manipulation of state.

KPNs are interesting because they can be understood as pure values. Although it is certainly most efficient to implement them statefully with an object per process, fundamentally a KPN "in progress" isn't any more stateful than a lambda application expression that we might evaluate by rewriting.

I think you should not simply hand-wave KPNs into monads. There is no single standard way to "bind" two KPNs. Every bind statement would need to include a specification of which output channels to attach to which input channels, which is already beyond the monadic structure. The type is also very complicated, and is not a general value type.

"As soon as the system refers to any external state (say a network) we involve effects" - I agree, but a system doesn't need to refer to external state to be useful as an effectful application model. A KPN could be wired into to a network and database by a third party (based on labeling of channels) without containing any references to a network. This is the scenario I've been assuming for KPNs as alternative to monads.

Capturing a KPN

One thing you get with strict sequencing of operations is the ability to capture continuations. Have you thought about doing something similar for KPNs?

KPNs are just values. The

KPNs are just values. The interesting point is due to their monotonic nature, we can easily inject and extract messages in parallel with ongoing evaluation, and we may freely commute injections and extractions on separate ports / channels which is useful for wiring up to real world systems or other KPNs. But we can certainly capture any KPN as a plain old value after any given set of updates to it. KPNs as first-class values, evaluating in parallel in the background, make for an interesting OO object model.

From an implementation standpoint, it's easier in a language that makes value copies explicit, so we can efficiently mix long sequences of linear updates with infrequent copies. (This is convenient for both KPNs and in-place array manipulations.)

KPNs can have state.

There is no restriction I can see on the node processes having state in a KPN. Each node could be a CPU with local memory. It is only required that the whole system produce the same output given the same input sequence(s). To say they are just a value seems an over simplification (or an extreme case of the 'outside' viewpoint).

KPN descriptions

The KPN value is essentially just the document of circles and arrows, a data structure describing the KPN. I sometimes think a better term for it would be "network description", and I originally was careful to use that phrasing. But the literature surrounding KPNs tends to conflate the two (always pointing to a description and calling it a KPN) so I've slowly gravitated towards using the term as I've seen it used. Apologies for the miscommunication.

Evaluating a KPN description (e.g. with some pending input messages) results in an updated KPN description (e.g. with some pending output messages). This evaluation may proceed by constructing some imperative or even physical representation of the network, then running it. If evaluation terminates, we'll have a normal form. We can still monotonically observe and interact with KPNs where evaluation doesn't terminate, albeit at risk of divergence if we try to observe an output message that is never produced.

Anyhow, it makes sense to think about KPN 'nodes' as stateful, even in context of an immutable network description. But it's important to not confuse layers. The "stateful" here can ultimately understood in the same sense that a pure lambda-based process model can be understood as "stateful":

type Process i o = i -> (o, Process i o)

State is an incremental aggregation of information over time, so such a process is stateful if we supply inputs over time.

Ultimately, KPNs could be understood as a `Process` with a rather more complicated type. (I haven't found a good way to model KPN types in conventional type systems.)

FSM

Even with a Finite State Machine model, with pure transition functions, the state is still there. I actually like this FSM kind of model, which is similar to some formulations of the actor model, where you have something like an object with private local state, and pure transition 'methods'.

You can capture a KPN as a

You can capture a KPN as a value, but I asked specifically about continuations. When events can arise in parallel it's not clear to me how you'd capture a continuation on an event.

Current continuations

Well, there is no equivalent to capturing a "current continuation" for KPNs, if that's what you're asking. But if you want to model delimited continuations, you would do so much as you might model delimited continuations in normal functional programming. You simply need first-class KPN values representing continuations.

Such continuations would be delimited both spatially (only having so many input and output ports) and temporally (being values produced, composed, and consumed at specific times by the parent computation). But they'd preserve a lot of nice properties of KPNs, e.g. we could compose our continuations, push messages into certain input ports, freely compute and distribute our continuation in parallel with computing the outer computation. We could usefully compile programs in a continuation-passing style. Etc.

I see, so you wouldn't

I see, so you wouldn't attempt to capture the continuation of a general KPN event, but would use a special continuation passing style when that's desired.

Affirmative.

But I'd recommend doing the same for conventional functional or imperative continuations, too. It is `call/cc` or `yield` that is "special". :D

Olegs HOPE 2017 Paper

Did you read Oleg's HOPE 2017 paper I linked to? Do you agree with him? I do, but perhaps he can explain it better than me.

partial agreement

I agree with his overall argument that effects should be understood in terms of interaction between an expression and its context.

That said, for "effectful" to be useful as an adjective describing expressions, it must make useful distinctions. For example, I would not consider `6 * 7` to be "effectful" just because it requires a "context", such as the programming language and libraries, to provide the meanings for `6` and `7` and `*` and even the whitespace.

Also, as someone who favor concatenative combinators over lambda calculus as a foundation for purely functional programming, I find Oleg's argument generalizing to higher order programming is very weak. I do higher order programming all the time without variables, without any 'context' beyond the language definition. My Awelon language has the following four-combinator basis:

[B][A]a == A[B]   (apply)
[B][A]b == [[B]A] (bind)
   [A]c == [A][A] (copy)
   [A]d ==        (drop)

Evaluation is by local syntactic rewriting, which is confluent. This behaves a lot like a stack language, but we don't require evaluation to result in a stack of values - i.e. we evaluate instead from program to program. Closures and higher order programming are modeled using explicit bind rather than capturing a lexical environment. There is no need for variables, environments, alpha renaming, etc..

It's Turing complete, and translation from lambda calculus is trivial enough that I can support lambdas as a syntactic sugar. Given `X` as a variable that is known to be a singular value like `[A]`, we can extract it from subprogram E such that `E == X T(X,E)` and `T(X,E)` does not contain `X` as follows:

T(X,X) =>
T(X,   ) => d
T(X, [F]) => [T(X,F)] b
T(X, F G) => c [T(X,F)] a T(X,G)

We can optimize this to avoid unnecessary copies and bindings. But essentially we get lambdas by simply rewriting `λX.E` to `T(X,E)`.

In any case, Oleg is generalizing from his own experiences, which seem to be oriented around monadic effects models in lambda-based functional languages. His justification for generalization to higher-order programming was based dubiously upon typical implementations of lambda calculus, rather than something more essential to "higher order programming". Based on my own experiences, I feel he's generalizing too far. The idea that "effects" are an interaction between expression and context is still a useful one. But we need to discriminate what is a relevant context and interaction.

Also, AFAICT, the paper isn't even relevant to your position that "If it is stateful then there are effects locally". Maybe we're using different definitions for 'stateful'. To me, a stateful system is simply any system that accumulates information over time (i.e. "history matters"). Use of effects like get/set to manage state is merely an implementation detail. Even the concept of "expressions" is unnecessary for modeling stateful systems, as demonstrated by cellular automata.

Stateful

Even keeping a history requires memory, with 'get' and 'put' operations. All you do is bunch all the 'gets' at the beginning of the program, and have all the 'puts' at the end.

A cellular automata still has state, it's implicit in the model that every cell has state storage. The evolution of the automata is defined by a state transition function in the same way as a finite state machine in terms of defining the state at time 't+1' from the state at time 't'. We still need to 'get' the state before we evaluate the transition function, and 'put' it back afterwards.

"get" and "set"

What's interesting about the imperative state model is that it models a separation between an agent (who performs get and put or set) and state (that which is get or set). This separation is especially interesting in context of concurrency or consistency concerns.

Neither FSM nor cellular automata exhibit this separation. FSM and cellular automata state models are fundamentally different from imperative state. There is no get and set by an external agent. Instead, there are baked in transition models.

Trying to understand a model in terms of a typical implementation is a good start. But it's a category error to conflate the two, no matter how typical the implementation.

Insisting that FSM or cellular automata is stateful simply because you can implement them using get and set is outright wrong. They are stateful for more fundamental reasons. And there are also plenty of stateless models that can be implemented with get and set.

FSM

Insisting that FSM or cellular automata is stateful simply because you can implementation them using get and set is outright wrong.

I am saying you cannot implement them without state. Having the state implicit in the model is not the same as having no state. I am saying 'get' and 'set' effects are not just some incidental thing, but something fundamental to the nature of finite state machines and cellular automata. They literally could not exist without them.

category error

Perhaps you are imagining an imperative machine as the foundation for computation. I vaguely recall you've mentioned something along those lines before, that you take some abstraction on Von Neumann as the basis.

I could argue instead that you should understand imperative get/set in terms of cellular automata. Fundamentally, our real world's physical state is updated by apparently local rules of physics just like a cellular automata is updated by local rules, and imperative get/set is ultimately implemented physically, by pumping electricity into dumb but carefully shaped rocks. Every 'get' and 'set' is performed by bazillions of electrons moving around, none of which know anything at all about get or set or even the laws that guide them. Even the physical CPU or memory or bus can't be really said to get or set anything; they just shape electricity according to physical laws. Cellular automata is in some sense closer to the physical machine than imperative. We can learn about imperative state, get and set, and computer architecture by creating and observing a big simulation of a PDP11 in Conway's game of life.

This would also be a wrong argument, for the same reason yours is wrong. It's a category error to attribute properties of an implementation to the model being implemented. Assuming a particular foundation for computation is both unnecessary and a logic error.

An FSM is ultimately stateful simply by definition of stateful. Its state is a function of its history. It doesn't matter how I implement the FSM - with imperative cells in memory, with a digital logic circuit, with a Babbage engine, with an FPGA, with a static object per state whose transition method returns the next state-object, with matrix multiplication, etc.. Intriguingly, even the layer of physical or mathematical abstraction at which I represent and implement the FSM doesn't matter. It is sufficient to know it can be implemented, and perhaps some costs/benefits (like, matrix multiplication easily generalizes to non-deterministic or probabilistic finite state machines, but can be quite expensive).

Anyhow, you are going to stubbornly insist on your position even after having your category error pointed out. I won't waste more of my time or energy trying to convince you. Perhaps someday you'll escape that mental trap of framing everything in imperative terms and convince yourself. Perhaps not.

Category Errors?

If I write a state transition function 'S(t) -> S(t + 1)' I need to read the state from somewhere at time 't' and I need to write it back somewhere at 't + 1'. State requires memory, and memory has two fundamental operations, 'store' and 'retrieve'... even human memory has these two operations. I have never seen any memory that does not have these two operations. We don't even need to think about the implementation, the state is right there in the mathematical definition. Even if we implement the FSM with a pencil, we would record the state by 'putting' it on paper, and we would 'get' the state by reading it back.

What we can describe without the state, is the definition of the state-transition functions. In other words not the FSM itself but a 'recipe' for constructing a particular FSM. I think you are making a category error by confusing the recipe for a particular FSM with that actual FSM.

Yes. Category Errors.

You're still making them.

Nothing about the FSM model says you need to read or write anything. You're just inserting your in-the-head computation engine and on-paper memory as an alternative implementation of the Von Neumann architecture at this point.

When I originally came out of imperative programming and started on functional programming, I also focused on "the actual thing" (like 'actual actors model actors' vs. those simulated within a function) as though an imperative or OO simulation for a model was somehow more 'real' than the functional one. With benefit of hindsight and perhaps a little work involving virtual machines, I know that I was wrong. I was trapped by an imperative way of thinking and framing ideas. But I haven't a clue how to grant you a similar aha moment.

Time

Time and events (effects with order) are fundamental and many processes simply cannot be modelled completely without them.

Returning to the FSM model, it does say you have to read and write the state. Otherwise it is just a 'power' expression, without storing the state between each expression. For example if we represent the state transition as a function 'f', then without state we have 'f . f . f' which would be f^n, where 'n' can only be infinity as we have no way of starting and stopping the FSM in the functional model (start and stop are time based effects). So we end up with an FSM that happens all at once and cannot interact with any other system by input and output effects.

Without effects an FSM is useless.

Folds.

You can fold a FSM over a sequence of inputs. You can preserve just that sequence rather than its current state. You don't need anything beyond pure functions for a Moore or Mealy machine. You could say "without a context to provide input and process the output, even pure functions are useless." And that is true, but it's still useful to distinguish them from effectful expressions, and it is not difficult to find uses for FSM within 'context' of pure computations.

Time is indeed relevant to being stateful. A stateful system accumulates information over time. History matters. But there aren't any specific constraints on how state, or time for that matter, is modeled and represented.

We do need some unspecified context to provide time and input and observe state or outputs, of course. But you err when you assume a specific context - such as imperative state, where an agent explicitly reads and writes states to external memory - then wrongly attribute to the FSM (or whatever you are modeling) the properties belonging to said context. It's your assumed implementation that reads or writes state, not the FSM.

Even if it were the case that imperative model is the ultimate, fundamental context for all state and time - i.e. such that our real world is implemented by imperative reads and writes in God's computer - your argument would still be formally wrong.

Fatalism

Your argument requires we buy into fatalistic philosophy, that the sequence of inputs is pre-ordained, and therefore can be known ahead of time and fed into the program at the beginning of time. There is no free choice in your worldview, there is no opportunity to start a program where the outcome of some future event is not yet known. If a program is interactive, and has an event loop, the state space if infinite, so you cannot model all possible paths in advance. This means even the simplest tool like a desk calculator is intractable as a functional model. I am reminded of Einstein who said "As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality."

I think the "unspecified" context is the kind of hand-waveyness that causes problems in mathematics. A FSM consists of two components, a stateful storage, and a combinatorial time-gated feedback function. That is the definition, and if you read Kohonen's definative work on this it is all modelled in terms of memory, including neural networks, self organising maps, cellular automata etc.

Not at all

My argument only says I can model and use FSM in many contexts. I can model an FSM within a pure computation. I can model an FSM within cellular automata or neural networks or temporal logic. I can model FSM within imperative memory and languages. I can model it with digital circuit feedback loops that represent state structurally without using separate get and set operations. I can model it with matrix multiplications by encoding states and inputs and edges as vectors or matrices. There is no need to insist on a "fundamental" state model to understand or utilize the FSM model.

I haven't made any assertion that requires fatalism. That's just your straw man or misunderstanding.

You can model anything in mathematics in terms of any Turing powerful substrate, and you can model many things using less powerful substrates. The fact that you can model those many things in terms of memory is not surprising. Regarding how you feel about the "hand waviness" of unspecified context, your argument that FSMs are implemented with reads and writes seems quite analogous to claiming circles are black simply because you can't imagine any other way to draw them and you think it "hand wavy" that math didn't specify a color.

Incomplete Specification

What you are doing is not providing a complete specification of the FSM, you are implying the existence of some unspecified context that lets you interpret the combinatorial part of the specification as an FSM. However not specifying it does not make it not exist. It is still there whether you write that part down or not, you are just assuming the mathematician reading it has the requisite knowledge to fill in the gaps. Ignoring a problem does not make it go away.

Further you cannot actually use a pure functional model of an FSM, because you need to provide events to trigger the state transition functions. As these events occur in real-time, you cannot pass a list of all events in advance.

You could use a stream, but you would need the concept of suspending the computation when there is no value ready in the stream, and that's not functional anyway, it's cofunctional.

Now maybe we could agree you can model an FSM cofunctionally?

re: incomplete

The FSM model does not specify where the inputs come from. It certainly doesn't specify "real time events". For example, we can use FSM to model regular languages, to match regular expressions. In that case, the inputs to the FSM might come from a string, or might come from stdin. The source of input is simply outside the scope of the FSM. As is where the FSM output is used, or how the FSM is implemented.

If you want to call it 'incomplete', then go ahead.

But every model of computation is incomplete for this reason. Even imperative code can be run inside a virtual machine on a simulated network of other such machines. That entire virtual network could be represented and evaluated within a pure expression whose final result might be `42`. Imperative code cannot know whether all inputs are provided or confined within an expression. Of course, a pure context doesn't make those imperative language subprograms any easier to reason about or refactor.

It's wrong to say "you cannot pass a list of all events in advance". It's also wrong to say "you must pass a list of all events in advance" or "you must provide inputs in real-time". Whether it's FSM or imperative or even pure functions, the origin of inputs and use of output is outside the scope of your model.

You earlier quoted Oleg defining effects in terms of an interaction between expression and context. But I think you might have missed an important aspect of this: the interaction doesn't imply a global or real-time context. A purely functional context might observe an FSM state and decide whether to provide another character input or compute something else. This is still a useful interaction between model and context, and hence an 'effect'.

Objects and Order

If I take a FSM and feed it a string, and then give you the FSM, and say 'now model this functionally', but I will not tell you what string I have already put in, how will you model it functionally?

Starting state

An FSM does have a starting state. So if you've already "put in" a string, that will just move the starting state for the FSM representation you give me.

When can you pause it?

So if I can pause the FSM and give it to you after any sequence of state transitions, then we must 'store' the current state to the 'starting state' at each of these points in order to transfer it with that state. Then if we don't know when I am going to give you the FSM then the only solution is to save the current state to the staring state after every transition. We have just proved that we need to have a 'put' effect after every transition.

Sigh.

You're just saying: "But my FSM is embedded in some concurrent imperative/object context such that I cannot blah blah blah without pause/get/put/etc. Therefore, FSMs in general have these features." You're once again wrongly attributing properties of your envisioned context and implementation to the finite state machine. Same category error as before.

Further, the implementation you're envisioning wasn't even implied by the question you asked. For example, the "I will not tell you what string I have already put in" restriction clearly does not restrict how you manage inputs or represent state on your end, and does not imply the sort of concurrency or state aliasing of the FSM implementation that requires explicitly pausing input or writing intermediate states.

Fold Vs FSM

And yet a fold is not a finite state machine. Why bother having two different concepts if they are the same? Certainly your television and most other electronics would not work without finite state machines.

The point is, all implementations have state, so when you abstract the algebraic properties of an FSM, the state is not abstracted away.

All you need is one implementation that does not have any state to prove me wrong. Should be easy right?

Wat?

I do not conflate folds with finite state machines. They aren't at all the same concepts. I've only said they can be used together.

An FSM is stateful in the sense it records information accumulated over multiple interactions. But its implementation does not need to have mutable state managed by reads and writes. For example, we can represent FSM in a pure lambda calculus and use it for regex matching. You might argue lambda calculus is likely to be implemented statefully. But that use of state in the implementation is abstracted away in the semantics, not observable, which contradicts your claim that "the state is not abstracted away".

Also, it doesn't take any implementations to prove you wrong. Pointing to your category error is sufficient. Convincing you of your error is a more difficult task.

Abstraction and Values

How do you implement an FSM without mutable state and reads and writes, and I mean a real implementation, something I could have on my desk?

Also I don't think lambda calculus abstracts state at all. An abstract operation is one which generalises a category of concrete operations. There are many kinds of memory that can be used to implement state, and they all have differing concrete protocols to read and write data. 'put' and 'get' effects are an abstraction of these concrete operations.

Where is the state abstraction in lambda-calculus? Any lambda term beta reduces to normal form, and it is a value not a state. Values, like the number '3' have no state, they are just values.

Back at you.

You argue FSMs are stateful on the basis that their physical implementations ultimately must be. Unless you have a double standard, the same argument should apply to lambda calculus. How can you implement lambda calculus, a real implementation that I can have on my desk, without some form of mutable state? How can you physically rewrite an expression to normal form? Information and values and data structures are ultimately represented by waves, entanglements, and vibrating particles in physical space-time, are they not? How can even a value be stateless when it has a stateful physical implementation?

Your argument isn't a valid one. Category error invalidates the whole thing. Further, even ignoring the category error, you cannot differentiate between stateful and stateless computation models just by looking at their physical implementations. Computation is fundamentally mechanical, after all, and whether two physical structures have the same observable behavior ultimately depends on which observations are made, potential interference, and other non-local emergent properties of a system. This depends on higher level semantic constraints that aren't visible in the physical implementation. All computations must be considered equally stateful if we judge by observing their mechanical implementations.

Anyhow, I'll review a few points. First and most importantly, the properties of implementations don't belong to the model. The entire foundation for the argument you're trying to make by considering implementations is invalidated by category error. Second, I have mentioned we can easily implement FSM in hardware with a digital logic circuit, such that it is gated and performed by physical laws without an external agent or CPU doing reads or writes of a separate store. You can put a digital logic circuit on your desk. A lot of hardware design code like VHDL can be implemented this way. Arguably, your choice of externally mutable state with reads and writes is too high level and non-local a state abstraction if your interest is physical implementations. I have an impression that you're willing to get vague and hand-wavy to squeeze all forms of physical state into "reads and writes" to fit your imperative world view, but I think doing so is unwise and will blind you to a vast and interesting world of state models. In any case, physical implementations are irrelevant to whether a computation model is stateful because of the first point. Third, we can implement FSM within lambda calculus, e.g. using it for regular expression. We can similarly use State or ST monads, or even implement networks of virtual machines running imperative code. The existence of such implementations doesn't make FSMs or imperative code pure or stateless because it would also be a category error to attribute the purity of a functional implementation to the model being implemented. But it nonetheless contradicts your repeated arguments about implementations of stateful models requiring a mutable state all the way down, and it demonstrates that non-physical implementations can be useful.

You are simply wrong at multiple levels.

I doubt you'll be convinced. Opening your mind to doubt for your own hypothesis and world view is up to you and your honesty as an intellectual, not up to me. If you want to convince me, you'll first need to avoid that glaring category error. You'll also need to resist your confirmation-bias double standard where you argue in terms of physical implementations only for stateful models.

pure = impure + context

Registers still have read and write operations, so the VHDL argument does not work.

I agree by the way that lambda calculus cannot be implemented without state.

If you look at Oleg's argument is says fundamentally everything is an effectful computation + context, but that if we can 'box' those two together, so that we cannot see the effects between the computation and the context we can call it pure.

A random access memory machine implementing lambda calculus is pure only if we cannot see any interaction between the program implementing the lambda calculus and it's context.

Given a stateful FSM plus its initial state, you can put both into a black box and call it pure, providing you feed the input into the FSM as a single step and it "reduces" to its output, after which it must be disposed of and not used again. Each input string must be fed to a fresh FSM.

A FSM is a tool you use for managing real time events, and without the events there are simpler models you could use, rather than packaging an FSM with its initial state and feeding it's input all in one go. A fold for example seems more useful.

To fully state my position, I still think a FSM is by definition impure, but if you package the FSM along with an "initial state" you can have a pure function that accepts an input argument and returns a result.

This is exactly what Oleg is stating in his paper I referenced: pure = impure + context.

It is true that CPU

It is true that CPU registers have read and write operations. Literally, there are operations in the CPU that address, read, and write registers. However, not all hardware memory works that way. If memory is not accessable by a separate agent like a cpu, e.g. if embedded in a control loop, then there literally is no read or write 'operation'.

All models are tools, and even functions are frequently used in real time systems. I don't believe FSMs should receive special attention in this regard. Also, a fold doesn't even serve the same role as an FSM. The FSM would serve the role of an accumulator argument to a fold.

Regarding "package FSM along with initial state" that initial state is literally in the formal definition of most FSMs like Mealy and Moore machine. State isn't something separate from the FSM. It simply doesn't need to be a mutable representation, e.g. `StateEnum` vs `ref StateEnum`.

Control Loops

if embedded in a control loop, then there literally is no read or write 'operation'.

A register is a generic term for an electronic device that has data write lines, data read lines, and a latch trigger, that causes the input to be registered so that the output does not change when the input changes.

This still has a write operation, that may be tied to the clock signal, so that each clock edge the output of some combinatorial logic is written to the register. In effect a register behaves like a single location memory, which is not surprising as a mylti-location memory is made by multiplexing many of these registers so that the address controls the read and write data routing to the correct memory 'cell'. Of course high density DRAM uses transistor gates as capacitors to save space, and also has a 'refresh' operation that must be triggered every so many cycles to prevent data loss.

It is the separation between

It is the separation between "reader" and "that which is read" and similar for writes that lets us meaningfully use the read/write metaphor.

A control loop lacks this separation, and the metaphor quickly falls apart. A clock or other circuit controlling read or write "mode" is not doing any reading or writing. There is no instruction you can point to as a literal "read operation". Even if you choose to use vestigial language like "read and write" in this context, it would be an equivocation to conflate this with imperative reads and writes.

In any case, not all stateful digital circuits and control loops have separate read and write modes. They aren't "registers" by the definition you give. Are you assuming that all stateful circuits expressed in vhdl and similar hardware design languages must use a register?

I admit I've never used vhdl, but only some lesser known hardware design languages like lava. (An old version of lava, embedded in Scheme rather than Haskell.) I mentioned hardware design code "like vhdl" because I assume vhdl is better known and similarly expressive. And since my use was only for a university class, I couldn't even begin to use standard register abstractions for state until after I had basically implemented one and had already developed several simpler stateful circuits and control loops.

VHDL and FSM Redux.

I have written VHDL, and I can say that even the simplest two logic gate implementation of a register has a specific write operation. Infact a J/K flip-flop which is about as simple as you can get has two commands, 'set' and 'reset'.

Yes the output is always on, but the next register does not latch the data until triggered. So in many ways the write operation is also the read operation. This is important because real logic is not immediately stable, when the state of a register changes the signals take different times to propagate through the combinatorial logic due to passing through differing numbers of gates and different wire lengths (even different electron speeds in the case of polysilicon Vs metal layers). So it's important we wait to read the state of any combinatorial block until the settle time has elapsed after the register at the input to the block has changed its state.

So it's more of a "read/write" combined operation in embedded registers.

Going back to the discussion, I think we agree with Oleg's position that you can view the combination of something impure, along with its context, that it interacts with via effects, as pure if you bundle it all up.

So if we agree that "pure = impure + context", then I think our only disagreement is that I am saying the "bottom" of our pile of turtles is an impure thing. It is not "turtles all the way down" :-)

For the FSM, if you follow the rules, which is encapsulate the FSM, and it's initial state, and then feed it a single finite input sequence, and it returns a single finite output sequence, after which you throw it away and never use it again, then you can treat it as pure.

My view is, if you don't follow those rules it is impure, and because those rules are not part of the definition of an FSM the thing itself is impure, but you can treat it as pure in special circumstances, if you are careful.

Purity and context

I think we both agree that an FSM's state may be modeled by a value or by some form of mutable reference. And we both agree that stateful models must have some indirect, information-preserving association with stateful physical representations. And we both agree that directness of representation may be important for performance or real-time requirements.

But an FSM isn't intrinsically impure. It does not specify interactions with context. It assumes a context will exist to provide inputs and observe outputs, but does not itself observe or operate on this context. This scenario is not different from a pure function.

Perhaps a useful analogy is making circles "resizable" with mutable radius. Unlike radius, being resizable cannot be reasonably considered an intrinsic property of a circle. It's an incidental property of some common implementations. But it would be strange if I were to insist "circles are impure" just because I and many others might choose to implement circles as resizable. Similarly, mutable implementation is common but incidental to formal models of an FSM or DFA. I find it strange that you insist FSMs are impure.

There are plenty of intrinsically impure abstractions such as imperative threads or actors model or tuple spaces, where components observe and act upon a shared context. For those things, you really do need to package a context together with the active components to achieve a pure model. But for FSM, all the relevant state is local, internal, so it's sufficient to avoid incidental impurity such a mutable state implementation.

Objects

Well the difference is you can feed many inputs into the same FSM. We can easily define the transition function of a FSM:

let new_state = transition(old_state, inputs)

And the transition function is clearly a pure function. So we can agree that.

This is not however an FSM, because the recursion is open. An FSM would be the fix-point of the transition function, which would require an infinite sequence of inputs to be provided.

What we really want for a FSM is an object model like this:

let outputs = FSM.transition(inputs)

Where FSM is the current state of the FSM.

This makes sense if you look at the OOHaskell paper Oleg and Ralf wrote with contributions from me, it shows that an object is the fix-point of a record with open recursion.

Dup

Dup

Disagree

A typical FSM is formally defined by a state, transition function, and a few types (input alphabet, states, etc.). That's it. And since these few things define an FSM, we have an FSM whenever we package these few things together.

An input stream is not part of the FSM, and does not need to be packaged or integrated with it. There are no special constraints for how input is provided or output is observed. If you want to fold-map an FSM over a stream, you can do that. But it isn't necessary. Use of an FSM by providing one input at a time is not a disqualifier by any formal definition of FSM I've encountered.

I grant there are many issues with naively modeling input as infinite list or stream values. That's basically a failed experiment in FP. Today, you might instead try pipes/conduits/transducers/etc.. But I don't see input modeling as more than peripherally relevant to modeling and use of FSMs.

Machines

But is it a machine? We have a sense of what a machine is from a "random access memory machine" and a "Turing machine".

By unpacking the state so that instead of the fix-point you have an input value, an output value and a transition function, you have lost the machine.

For example nothing forces you to feed the output back into another transition function, you can inspect the input and output.

If you take this stance, then a Turing machine is just an input value, a transition function and an output value.

If a Turing machine is just a function, then what did Turing invent?

If both a Turing machine and an FSM are an input value, a transition function and an output value, what is the difference between them? We cannot describe the constraints that make these machines different within the language of lambda calculus, we have to introduce concepts of state to do this.

re: Machines

Like Turing machines, a finite state machine is an abstract machine, characterized by simple operational behaviors that can feasibly be implemented mechanically. Isn't that the common feature for 'machine'?

Actually, I don't believe pontifications over the connotation of "machine" versus "calculus" or whatever is very relevant. It would be ridiculous to generalize properties of a model based on an assumption about informal naming conventions. If we had a formal taxonomy, it would be a different matter.

By unpacking the state so that instead of the fix-point you have an input value, an output value and a transition function, you have lost the machine.

That's just nonsense. If you want to "interact" with an abstract Turing machine over time, you must compute on an input tape, wait for halt, observe the output to influence context and potentially affect the next input tape (this is the "interaction"), and continue. Almost exactly as you would for an FSM. Whether a tape is represented by mutable state or some form of persistent zipper is irrelevant to both the machine model and valid interactions with it. What have we lost, precisely?

If you want to model systems where outputs and interaction are incremental or even concurrent, and stable identity becomes relevant, there are models and calculi for that. But FSM and Turing machines (without significant modifications) aren't among them.

If a Turing machine is just a function, then what did Turing invent?

Turing invented a general, mechanical, but relatively awkward way to represent all computable functions. Of course, he didn't know this for certain at the time. He only had his hypotheses. Church's lambda calculus was invented around the same time as Turing's abstract machine, but Church-Turing thesis and the relationship to general computability took a few years longer to hammer out.

The Turing machine is not very interesting or profound except in its historical context exploring computability. So don't go making a big deal about it in a context where the history is irrelevant.

If both a Turing machine and an FSM are an input value, a transition function and an output value, what is the difference between them?

The latter has finite input alphabet, finite input, finite state, a finite output set, and trivially halts after each input in bounded time.

We cannot describe the constraints that make these machines different within the language of lambda calculus

Usually, we distinguish implementation from requirements.

If we're "implementing" an FSM or Turing machine in the lambda calculus, then our goal is not to describe or enforce constraints but rather to faithfully represent the machine. Even for physical implementations, hardware cannot locally enforce constraints like: "please avoid strong magnets and gamma radiation and water" that might affect state outside purview of the model. At best, we can attach comments and assume disciplined use where our implementation details might otherwise leak into our abstractions.

So describing constraints doesn't seem relevant in context of this discussion so far.

Further, even if it were relevant, I don't see how introducing mutable state would help describe constraints. You just add one more concept to constrain with respect to aliasing and concurrent access. To describe constraints, you would be much better off trying a type system or intermediate DSL.

Maybe

Whether a tape is represented by mutable state or some form of persistent zipper is irrelevant to both the machine model and valid interactions with it. What have we lost, precisely

I find myself agreeing with this, and yet you still cannot 'interact' with the state machine. All the transitions have to be known ahead of time. So you can do:

let state1 = T(state0, input0)
let state2 = T(state1, input1)

You must know the complete sequence of inputs from the start - unless we allow IO to occur between applications of the transition function, which means we are processing events/effects on each transition. To me that makes it a bit useless, but if that is all you are claiming, then I would have to agree with you.

interaction

We do allow IO between steps. Or put another way, we interact with a context, and in general the relevant context might include a human or database. If it's just a database, perhaps we could represent it as a local trie without doing IO. But if it's a human, we'll certainly need IO, UI, and so on.

Aside: even for closed world contexts, it's incorrect to say that we know up front all the inputs to a machine modeled within the context. Each input may be a result of computations in context, and may depend upon prior outputs. And in general we can't even know up front whether a machine will halt on a given input, much less what the output or consequent next input will be. We can only run the machines and diligently perform the interactions we have modeled. What you could reasonably argue is that all information in a closed world context must already be represented somewhere in that context. As for whether a context is a closed world, that's not something our FSM or Turing machine can know and it ultimately doesn't affect the concept of "interaction" between model and context. (Consider: we humans cannot prove whether or not the real world is "closed" under some macro view of the universe. Based on conservation laws, it might well be closed. Yet we freely speak of our own interactions. So unless we have a double standard...)

Turing machines, well, you can interact with them but they really aren't designed for it. They are designed for one-off computation. They are a poor basis for teaching, learning, or understanding interaction.

To study interaction, a better start is contemplating an interactive fiction modeled via purely functional `type Process i o = i -> (o * Process i o)` with i and o set to a type like String. How would you interact with this model? How might the model be tweaked, while remaining pure, so we can ask "Hello, are you still there?" when the user hasn't responded after some time?

Monadic IO itself can also be understood as a value with which an interpreter or runtime interacts by taking turns. An IO type, if not returning, essentially represents a singular request such as to print a line to console or call a foreign function or input a file, together with a continuation. It's up to the context to observe the request and supply input data to the continuation.

An interesting observation is how features like `newIORef` tend to entangle the model and context through shared reference identity, such that we cannot easily extract and move the model into another context. One of my interests has been avoiding this sort of entanglement, as I see it as a significant source of complexity in distributed systems. But it's also this sort of entanglement that leads to conventional notions of stable object identity within software.

Anyhow, interaction is a deep subject. The idea of interaction as taking turns between context and model is simple and useful. But we aren't limited to taking turns in control flow, e.g. there are models for concurrent interaction, incremental inputs and output, multi agent systems, etc..

Double-disagree

I have to nit-pick your disagreement slightly: a FSM is formally defined by a finite set of states, a finite set of events, and a definition of a transition function.

A typical implementation of a FSM will include some concrete representation of states and the transition function. Execution of the "function" will interact with the events in an implementation-specific manner, e.g. callbacks, access to a stream etc...

How the events are communicated to the transition function is not part of the definition. But all implementations of a FSM include some definition of interaction with the events, although typically this is implicit in the implementing code.

This may seem somewhat pedantic, but it does seem to be the crux of the disagreement. The definition that you are using is suitable as a model of a FSM, but it is not effective. It requires the part that Keean is describing in order to become an effective procedure. Where you refer to "state" in the formal definition - it is actually the set of states that bound the execution of the function. Getting between the model (over the set of states) and the concrete instantiation (where it is "a" state) is exactly where it is argued that some form of state reading/writing is required.

Finiteness and Locality

I agree that the finite nature of both the input alphabet and state set are important properties of the finite state machine. Finiteness is implicit in the "few types (input alphabet, states, etc.)" for the FSM. But I could have been more precise with my words there.

all implementations of a FSM include some definition of interaction with the events

If you accept a transition function like `State -> Input -> State` as "some definition of interaction", then this is trivially true. I assume you weren't trying to say something trivial. By "include", I assume you mean something internal to the FSM implementation, as opposed to external. If you believe all FSM implementations are integrated internally with an event source or sink, I disagree.

Physical implementation of FSM on a chip is a fine example that does not include interaction with events. Due to locality of physical interaction, a physical FSM implementation will only provide a few lines for input and output but leave integration and interaction of those lines to the external system. That is, we can have an FSM on a chip, but how that chip is installed or integrated is absolutely not up to the FSM, and is clearly not "included" in the physical implementation.

Physical implementations are close in nature to the formal definition. The formal definition support interaction by defining a transition function but eschews integration with external event models, and separates interaction from the FSM by moving control into the context.

One of the intriguing aspects of pure FP is its fundamentally local nature, that all 'interactions' must be modeled locally. This is closer to physical structure than the ad-hoc action-at-a-distance from OOP or imperative systems. It's a lot easier to physically implement first order functions and pure state than it is to physically implement the ad-hoc connectivity of imperative code. I find it baffling that people who focus on physical machines seem rarely to appreciate the benefits of computational locality and its natural alignment with physical locality.

Programmers who perform ad-hoc internal integration with event sources and shared state are further removed from both formal definitions and physical implementations of FSM. Meanwhile, focusing on models with local, finite interactions is not any less effective or scalable. Although I grant it requires learning a different paradigm and suitable design patterns.

Where you refer to "state" in the formal definition - it is actually the set of states that bound the execution of the function.

No, I was referring to the "initial state" in the formal definition, which serves equally well as a current state. The "set of states" I instead described among "a few types (input alphabet, states, etc.)". Unless you are modeling a nondeterministic or probabilistic FSM, the current state is not usually itself a set.

some form of state reading/writing is required

We need some sort of physical representation of information held by the FSM. But this is also true for every computing model, including the pure ones. Information in a computation will always, ultimately, have a physical representation whose size is directly related to the amount of information (e.g. number of bits).

A point of contention has not been the presence of such storage, but the direct access and 'mutability' thereof.

If you accept a transition

If you accept a transition function like `State -> Input -> State` as "some definition of interaction", then this is trivially true. I assume you weren't trying to say something trivial. By "include", I assume you mean something internal to the FSM implementation, as opposed to external. If you believe all FSM implementations are integrated internally with an event source or sink, I disagree.

I would accept it as some part of an interaction - but not the whole thing. The Input is the value that arose as a result of the interaction. But it is not the interaction itself. There is something missing from the description of how the FSM is embedded in the environment that it interacts with. Unless the embedding context is purely functional, which is as you point out trivial.

A definition of the state-transition function is sufficient to model what the FSM will do - but there is a small amount of bricolage (in the Turkle/Papert sense of the concrete implementation details) that are required to make it executable in a particular context. It is these details that I refered to as "some definition". It the distinction between a FSM as

  • a definition of a computation [embedded in some environment].
  • a definition of an interaction with an environment.

It can serve in both roles but in the second we need some concrete definition of how the interaction occurs. Typically the interaction will correspond to a state monad (in a synchronous system) or a sequence of transitions.

Transition vs interaction

I agree we need more than the transition function to have an interaction. The context needs to actually provide the input, observe the output, and influence its own future behavior such as deciding the next input (if any).

An FSM does not define interactions with an environment. The FSM model doesn't have any say regarding where inputs come from or whether outputs are observed or decisions made in context. An FSM only serves the role as one part of an interaction.

You can entangle an FSM implementation with its context to an extent where you can argue the composite object "includes" some definition of interaction, yet is still recognizable as an FSM. But you certainly aren't required to do so. Not "all implementations".

Time and reduction

The FSM model does have the concept of time and sequence though, you have state at time t, and a transition to a new state at time t+1.

Functions evaluate through beta-reduction which has no time. In a functional program '1+1' and '2' are identical values. So the result of applying a transition function to a sequence of values in a (pure) functional program is simply the final value, and re-evaluating does not necessarily cause all the intermediate states to happen again, we can simply cache the result.

It seems clear that reduction is not sequencing, as some optimisations are valid for reductions that are not valid for sequencing. Beta reduction semantics allow for eager and lazy evaluation without the result changing, in fact there is no order in beta reduction, which is very different from the sequential t, t+1, t+2 ... t+n approach of the state machine.

Abstract time

We speak of "time" in any scenario where it makes sense to speak of before and after, causality. For example, we can usefully speak of time within an ST or State monad, even though they have pure evaluators. Similarly, we can speak of state and updates and time for an FSM even when we're just using the FSM to recognize regular expressions on an input string within a pure computation.

We could argue an FSM's time is borrowed from its context or input model. But there's no good reason to think too hard about it - time is peripheral to the formal definition, implicit to the sequencing of inputs. An abstract machine like FSM naturally has abstract time.

Monads and Nondeterminism

I agree that time can be abstract in a FSM, however there is not even abstract time in a functional program. The technique used by Haskell to create an IO sequence is the same as that used by Oleg, it composes imperative program fragments and then runs these imperatively when the IO monad is deconstructed.

Now you can cheat in other languages by passing the world, but that relies on the implementation of the language (that it evaluates beta reduction in an eager way). This evaluation order is not part of the model, so you cannot in general rely on it.

Going back to an earlier part of the discussion, yes everything is imperative at the bottom. The CPU has "time". The FSM has abstract time. Lambda calculus/beta reduction has no time. We can run an FSM or lambda calculus on a CPU because we can abstract time, or hide time. What we cannot do is re-introduce time once we have hidden it, because evaluation order, or even whether something gets re-evaluated is non-deterministic.

Not relevant

Time is part of a model, such as FSM or temporal logic or synchronous reactive programming or actors model or a physics simulation. Implicitly or explicitly.

Implementation within lambda calculus, or even running a physics simulation within a pure function to produce a result, doesn't make the model timeless. If you're trying to say that not all contexts use the same "clock", that is true. A context may supply its own model of time.

We cannot compute a future state without first computing representations for inputs and intermediate states. But we aren't relying on evaluation order, and it doesn't matter whether these representations are in normal form. With a pure lambda calculus we don't use "side effects" to model interactions and hence "effects".

Representing Time.

Within lambda calculus, you can have a variable represent time, but you cannot guarantee that all values of that variable will be evaluated in the correct sequence due to the non-deterministic nature of beta-reduction. It is the same in any mathematical calculus, you can integrate a velocity over time to get a position, but this is not the same thing as a moving object. A ship navigating a course starts at the beginning and must pass through each intermediary position to reach the destination. Integrating the velocity function of the ship will give you the final destination, or any point along the way by changing the bounds, but it is not the same as the actual course. The pure approach can model the ships position, but it cannot actually sail the ship.

Still not relevant

Why would I care about order of reduction? I'm not relying on it for side effects or anything. Whether a model's state, inputs, and outputs are represented in a "normal form" of lambda calculus is not something we conditionally observe and is therefore irrelevant to both our model and our interactions with it.

Whether input to an FSM comes from string or console doesn't change the FSM or the basic interactions with it. Similarly, I can use simulated or sensor input to drive a pure model of a ship's position and estimate its position within the limits of that input data. A pure model can even compute outputs designed to control actuator systems of a suitably automated sailing vehicle, such that the context can control a real or simulated ship.

Something to consider: one of my first jobs involved unmanned surface vehicles (USVs), and running great gobs of existing ad-hoc imperative code in a network of virtual machines so we could transparently control a simulated USV (e.g. attaching to video-game-like systems). And a major goal was to support deterministic simulations so we could easily perform repeatable regression tests under slightly different conditions and simulations. (E.g. Control over scheduler could help find concurrency bugs, or see how behavior changes when we have slight variations in scheduling. Similar for simulating network delays and outages. And similar for wind conditions and adding obstacles.)

It's still imperative-paradigm code even when running in a fully deterministic simulated environment. For example, we don't have any benefits of functional purity for refactoring, abstracting, reasoning about, or optimizing the imperative code. As demonstrated by the simulation, imperative doesn't guarantee our interactions or model of time is "real". Imperative only entangles some concerns, making interactions more "implicit" and reducing control over them.

Not the real world?

So it's good for simulating things, but not for actually doing them for real?

'real'

No, it's equally as good for "doing things for real" as imperative. While you replied, I was adding a bit above to help clarify this.

Imperative doesn't guarantee our interactions or model of time are "real". We can still transparently provide simulated inputs to imperative programs via virtual machine sandboxes. Imperative only entangles some concerns - making interactions more "implicit" and reducing control over them.

With a pure model, our interactions are necessarily explicit. We cannot hide interactions as implementation details. But these interactions are still 'real' whenever our inputs and outputs are provided and utilized in context of real world sensors, actuators, UI, etc..

It is the context of use (and corresponding consequences) that makes a computing system "real". Paradigm is completely irrelevant.

How?

I don't see how you can do this. Haskell could not do this and needed to introduce the IO monad. Clean needed to introduce uniqueness types. If you try and do this in pure lambda calculus, it will be unsound due to the non-deterministic nature of beta reduction? Are you saying that languages like Haskell and Clean are trying to solve a non-existent problem? Their authors certainly thought there was a problem. ML can get away with it because it's not actually functional, as it allows side effects everywhere, that makes it more of an imperative language.

IO monad

IO monad is already an example of "doing this". It is a pure model with explicit interactions that are "real" when performed in context of an open network, sensors, etc. as opposed to a simulation sandbox.

IO is a model for interaction with an imperative context. An IO 'thread' is essentially a value representing either a final returned state or a pending request to be performed by the runtime together with a continuation to receive input data. An IO thread is analogous to FSM, which has either a final state or an intermediate state awaiting more input.

IO is not about controlling order of evaluation. For example, it doesn't matter whether you evaluate the IO request in parallel with the continuation. IO controls order of interaction, which is the ordering that matters for reasoning about and modeling state and time. A context can progress to compute the next request in the IO thread only after it provides an input to the continuation whose type depends on the current request.

I'm somewhat baffled as to how you imagine IO was a counter-example to anything I've described. It's one of the most obvious and widely known examples of how interacting with pure models can support real-world programming. IO doesn't require special privileges; it's just a value that happens to be opaque in Haskell like many other abstract data types. Haskell could just as easily have modeled IO as a typeclass or operational monad, such that we could transparently use it in context of pure computations. Doing so would not significantly change how a runtime context interacts with IO.

Functional composition of imperative programs

So I see where our disagreement comes from. The IO monad is not what you think. It does not allow interaction with a functional program. With the IO monad you are composing an imperative program using function composition. What happens is that no IO takes place while the functional program is running. At the and if the functional program you have composed an imperative program. This is then run, and all the IO interaction takes place with this imperative program.

IO can be run within a

IO can be run within a purely functional context. I've already told you how. Either the IO type can be made accessible for simulation (like a GADT or operational monad or typeclass), or we can sandbox arbitrary imperative programs to run in a simulated context.

If you want to say that simulated input-output is not "real world" input-output, that's obvious. But my intention isn't to conflate those. Rather, what makes interaction "real" is context of use, especially having real world consequences for output. Paradigm is irrelevant. Consequently, we must study paradigms in terms of utility and other fitness characteristics, not their 'reality'.

Functional programs model interactions explicitly. That is, we explicitly create interactive models, then represent our application logic within those models. Interactive models include FRP, KPN, FSM, IO, Actors model, Elm's Model-Update-View architecture, Felleisen's big-bang world model, and so on. We can hook these models indirectly up to real world endpoints. What makes programming of these models 'functional' is how the models are constructed, abstracted, and refactored. The functional program is ultimately a pure expression of an immutable value that we can 'run' interactively.

It is useful to distinguish "evaluating" the value from "running" the program. In Haskell, we frequently use terms like `runState` and `runST` and even the trivial `runIdentity`. It seems you're using "run the functional program" to mean "evaluate" instead of "apply a run function appropriate to type of model". Fine. I understand why you might do so, even if it's rather confusing from my perspective.

Removing IO.

What it seems you are saying is we can replace the IO monad with some pure monad and simulate the program in a pure context. This seems obvious if you realise what you are saying is you can run a program that does IO in a pure functional context if you remove all the IO.

However coming back to our FSM discussion, I can see that yes, if all you want is a simulation, then you can have a FSM in a pure context. So this is obviously useful for test frameworks etc.

Simulated IO is still IO

Remember a long time ago when you quoted Oleg?

What is an effect then? It is an interaction with the context.

Whenever we draw a line (or other topological distinction of choice) between a context and an object modeled within that context, and we have any form of interaction between them, we essentially have effects. Interaction, by nature, involves a flow of information - input and output between the modeled object and the context. This interaction isn't something we can avoid even within a closed world computation. In accordance with Rice's theorem, we cannot generally predict non-trivial properties such as whether a computation halts or its result. We can only compute step by step, perform these interactions and IO according to our model, then observe the result after (and if) we get one.

Further, from a practical standpoint, potential confinement within a simulation does not affect the paradigms or language we humans use. Even knowing an imperative program might run against a simulated filesystem and network, we'll still develop, abstract, and explain that program in terms of network and filesystem IO. Whether our interactions are simulated or real doesn't affect how we reason about, discuss, or develop our models. I think that not even you will change the entire language you use to discuss a program based on non-local properties of the context in which it is run.

Simulated IO is still IO, both formally and practically.

So I disagree with characterizing this as "removing IO".

Simulation

If you simulate IO, can it interact with a real person? For example can I type text into the software and have it actually send the message to another network node?

My understanding is the simulation only works if both the user and the network are part of the simulation, which means they are not _real_. Real IO involves getting information in and out of the computer hence the term Input/Output, which we contract to IO.

Real IO isn't Simulated IO. Or is it?

I'm happy to distinguish between "simulated" and "real". That should be pretty obvious from my use of the "simulated" qualifier, I think.

OTOH, I cannot prove that what I call "real" is not a simulation that might or might not ultimately return `42`. Perhaps "real" is just a matter of perspective. To the machine or human at a given level of inception, isn't "real" input simply any input ultimately sourced at the same level or below? And isn't it impossible to determine whether information is sourced below the current level? A solipsist might argue he can't even know whether input is sourced outside his mind.

In any case, whether IO is real or simulated has negligible impact on our software models. An FSM or even an imperative program cannot determine whether its inputs are "real" from the perspective of a human, nor whether the outputs have "real" consequences. So although I'm happy to distinguish "real" from "simulated", I don't see it as a particularly useful or relevant distinction in context of programming and computer science.

What I do see as useful and relevant is the ability to conveniently use the exact same code and model in both 'real' and 'simulated' contexts, such that I can easily test the same code I deploy, or develop adapters between contexts. Purely functional or object capability model languages can help with this. Conventional programming doesn't prevent simulation, but using a big virtual machine sandbox is certainly not convenient.

Interesting

This has been a good discussion from my side, and has raised some interesting points.

I agree with Oleg, that functional = imperative + context. That would mean that code that can be used in both a "real" IO or simulated IO environment would be imperative, however we can take this imperative code and package it with its context for simulation, and get a pure function.

So this shows we can take an imperative control mechanism and package it with its context to get a pure function for simulation, which is good because it gives us referential transparency in the simulation.

Frameworks

My current understanding of the other way around is you can take a functional model, and embed it in some kind of imperative framework to apply the functional model to the real world.

With the previous case we can enclose an imperative model with a context to get a pure simulation, and this has the nice property of modularity.

With this case we cannot enclose the functional model to get it to interact with the real world, instead it needs a firm of dependency injection. We need to "split apart" the functional model around some time interval or events and evaluate the model before and after each event. The whole combination of functional model and events becomes impure. What we have done is lift the pure functional model into the IO monad.

To justify my earlier position, it is clear that lifting the functional model into the IO monad makes it impure/imperative (as it now interacts with a context, the real world, via effects).

However I admit it is useful (and I never meant to imply it was not) to have a model you can use for both simulation and real application.

What seems interesting is that real world interaction involves effects and is impure (we knew that anyway right) and that simulation is pure/effectless (like the State monad for example), and that we can turn one into the other, by either packaging an effectful program with its context for simulation, or lifting a pure function into the IO monad for real world interaction.

"To justify my earlier

"To justify my earlier position" -- you should start using that for your subject lines.

Position not changed.

I have not however changed my position, it is clear lifting a functional model into the IO monad makes it impure. Nothing I said contradicts anything I said previously, rather the language has been refined to make my position understandable. We still end up with needing imperative code to interact with the real world, and a FSM is still an imperative idea, but you can package it with its context and treat it as pure.

If you really want you can take an FSM package it with context to get a pure function, then lift it to the IO monad so that it can interact with the real world, but it should be obvious that this is overcomplicating things, and complexity is the enemy of correctness, so perhaps we are better off just connecting that imperative FSM directly to the real world?

Edit: A further thought is that lambda calculus is only pure if we consider it with a context, even substitution has a state (the current expression, that we apply reductions to). A mathematician evaluating a lambda expression will write down (or use human memory) to store the intermediate expressions. Any rewrite system has implicit storage and implicitly gets the current expression, rewrites it and stores it back. We can only consider it pure if we don't look inside the box. Philosophically mathematics assumes the existence of humans to evaluate expressions, and puts the human into the same box as the expression to consider it pure. A more disciplined approach would have to consider the interaction between the mathematician and the expression, which would introduce effects, maybe even at the brain level, like "associative recall" and "commit to short term memory".

Close enough.

There are several points I'd quibble if we weren't so far into the right hand margins and I wasn't already weary of this discussion. I find irritating that you conflate imperative with effectful with impure. But the broad strokes are agreeable.

FSM doesn't imply a particular implementation

If you look at the definition of FSM, you won't see anything about how it's implemented.

Pastafarianism?

I know Keean seems to have some imperative-fundamentalist faith regarding state and computation. But linking to the flying spaghetti monster in a discussion about finite state machines seems a bit...

[removed]

Removed sarcastic and dumb comment. Ignore me.

finite STATE machine

It is even in the name :-)

Can you define the 'finite' property of an FSM without referring to states? Can you find published literature that defines a finite state machine that is stateless?

Nitpicking but agreed

I triggered on this:

Fundamentally, our real world's physical state is updated by apparently local rules of physics just like a cellular automata is updated by local rules

So, the nitpicking: Quantum mechanics, Bell's theorem, and nonlocal phenomena. Although the world seems causal and mechanical at the superficial level we observe it in our daily lives, we are still pretty clueless what the physics is at a fundamental level.

For the rest, I agree with your argument.

So, the nitpicking: Quantum

So, the nitpicking: Quantum mechanics, Bell's theorem, and nonlocal phenomena.

The Cellular Automaton Interpretation of Quantum Mechanics by Gerard 't Hooft.

Is that supposed to be evidence?

I don't hope you want to claim that 't Hooft solved the problem and now the generally accepted view in physics is that it's all cellular automata deep down?

Just pointing out that QM

Just pointing out that QM and apparent non-locality don't preclude an interpretation based on deterministic local physics. Unless your nitpick was meant to imply something else?

Doesn't look to be that easy.

Well, my nitpick was more the second line in that we simply don't know but it seems non-local. 't Hooft makes a superdeterministic argument based on cellular automata. He can do that but we still don't know.

big data in closures

what kind of model would include a file-store inside the pure closure?

To me, "file store" brings to mind all sorts of ad-hoc issues like timestamps, permissions, memory mapping, lockfiles, and shared mutable state. But we could consider a simpler model: a persistent trie.

I believe purely functional programming, as a general purpose paradigm, would benefit greatly from more effective support for larger-than-memory data structures, i.e. such that we can easily work with first-class trie values measuring many gigabytes. We don't even need to start with many data structures: intmap and finger trees cover about 99% of use cases (since we can easily build tries and hashmaps above intmap).

This doesn't eliminate need for 'external state' such as networks and user input/output.

But it does cover a lot of cases where we are currently forced to use 'effects' to access external databases and filesystems for scalability or durability reasons, simplifying our application models, testing, replayability, and so on. Whole databases and filesystems could essentially be captured as first class values. We can ideally 'save' the whole value just by updating a small root reference.

One feasible approach is to make it easy to refer to data structures by secure hash of a binary representation. I've built this idea into the language I'm developing now, and I'm implementing the storage and GC layer and smart references in F# at the moment. I have several performance lessons learned from an earlier version from a few years ago, VCache in Haskell.

Interesting.

It seems an interesting approach. How would you write interactive software, like an airline online seat booking service, for example.

Interactive

Obviously you must receive your user inputs from outside of the process, e.g. from HTTP POST requests.

An application could be modeled as an `(Input * State) -> (Output * State)` handler function for POST requests and `(Input * State) -> Output` for GET requests, where Input may include our URL, cookies, headers, and any form data. Or a parsed type, abstracting away HTTP in a command pattern style.

This certainly isn't the only option, of course.

For my F# implementation, I have an API where developers register durable abstract variables called TVars by providing a key and codec, and may perform transactions that read and potentially update many TVars in STM-style. Ephemeral TVars are also available. It provides snapshot isolation and multi-version concurrency control. So a "handler" could generally be transactional, in this case. We could model command pattern logs and cached state, transactionally.

The important bit is that most of our application logic can be moved into pure computations observing or updating local state models instead of explicit management of external state. Pure application logic with first-class state values is easy to grok, debug, test, replay, replicate/mirror, batch process, correctly cache, etc..

State Transition Only

It seems to me, in a general model there is both state and pure function. The Finite State Machine model seems appropriate. You can make your state transition functions pure, but you still need some state. Functional languages 'cheat' by making that state global and saying it's outside the language.

Or to put it another way, how do you write the web-server in your language, as by talking about the command pattern you make the state transitions pure, but have just pushed the state up into the web-server.

As nearly every piece of software is interactive, and there is very little batch processing still done, don't you think that interaction should be the focus of software development?

Not cheating

FP programmers aren't impractical. We want to simplify our application logic and refactoring and testing etc. by making it pure where feasible. But feasibility is constrained by practical requirements and knowledge. We'll use an IO monad or alternative to create a web service or integrate GTK windows or access an external database as needed. Ideally, we'll keep it mostly in a separate layer, limiting entanglement with application logic.

Effective support for larger than memory, durable, immutable data structures supports scalability and durability requirements. This enables a tight focus of our application model on essential interactions such as user input and queries. That is, we can avoid interaction with a database or filesystem if it would primarily be used for non-functional requirements of scalability and durability. Rather than a "cheat", I call that "progress".

As for batch processing: although it is true that software is highly interactive, batching remains useful when working with master-slave mirrors or rollback of a bad input that may quickly recompute state from a set of inputs, or quick startup from a log of inputs and an old state snapshot. In imperative systems, we might use append-only logs and command pattern to achieve similar. Software architecture design isn't a zero-sum game; better support for batch processing in either case doesn't hurt support for interactive update. For some interactions - like "universal undo" - it might even help.

interaction

don't you think that interaction should be the focus of software development?

FP does a good job of making interaction a major focus of software development simply by making it explicit in the type system and application models. Since we cannot hide interactions, we must give them much of our attention.

You should be asking this instead to imperative programmers, who seem to treat interaction with network or database as just a negligible background thing to be hidden from other programmers, rather than the focus of their development.

FWIW

I believe purely functional programming, as a general purpose paradigm, would benefit greatly from more effective support for larger-than-memory data structures, i.e. such that we can easily work with first-class trie values measuring many gigabytes.

At Morgan Stanley, we have a PL that supports this and it essentially works by adding a type for within-file references that works along with algebraic data types (sums, products, recursives, ...). We define concurrent sequences, btrees, skip-lists, etc in a pretty straightforward way and it integrates well with queries and function definitions so that it's mostly the same whether data is stored in memory or on disk.

We record/process/index about 1.5TB/day with this. The PL is a variant of Haskell (which we also use in other ways on the hot path in low-latency, high-volume applications).

hobbes

I can't tell from the Readme whether these are first-class values or second-class references to external storage. But I'll be perusing the hobbes language for a while now. :D

first-class, second-class, coach

I can't tell from the Readme whether these are first-class values or second-class references to external storage.

Basically, a value of type "a@f" is a 64-bit value that refers to an offset in the file "f" (here "f" represents a _value_ that determines the file in question) where a value of type "a" can be found. There's a function "load" of type "a@f->a" that essentially "dereferences the pointer" (in the implementation this doesn't read from disk but just passes through mmap).

There are also similar functions for allocating out of a file, and files carry their whole type structure with them (so we can have generic readers with type-checked queries).

The whole thing is pretty similar to region allocation for memory and these file reference types are just like region-annotated types. I'm not sure how you feel about that in terms of "class" (ie: first vs second class) but for the most part in our day-to-day query/analysis work we don't need to think about reference types (thanks to overloading cases in common type classes covering them).

We have to think a little about them when implementing data structures (e.g. the full structural btree type looks a bit like an extended curse as "^x.|leaf:[k*v]@f, node:x@f*[k*x@f]@f|" here with "^x.T" representing a recursive type "mu x.T", "|leaf:T,node:T|" representing a labeled variant of two constructors named 'leaf' and 'node', "[T]" an array, "T*T" a product), but I've gotten used to it and the secondary benefits have been sufficient (for us at least) to justify the extra thought.

Regional values in files. Nice! GC?

I can easily see how this would be quite useful, and would make it easy to build larger than memory data structures. Are these values garbage-collected somehow?

garbage collection

We never delete data in these files, but we do essentially "compact" them at EOD by accumulating/sorting the data that is interesting and we know won't be added to further. This is kind of like a copying garbage collector with a period of 8 hours or so.

Every file has "roots" (data immediately readable from the file via "f.field") so a textbook "copying collector" can work generically this way.

Impure languages may use monads too

Keean, it is not true that "ML family languages, because they are effectful, don't need monads" for the reason you propose.

Monads, effects handlers or Oleg's more general framing of interpreting computation signatures are about letting users implement their own computational effects. They also often allow those effects to show up in the type signature, but not always; OCaml folks are experimenting with untyped effect handlers, for example.

A ML-family language gives you direct access to *some* effects provided by the runtime, with a fixed interpretation: you get randomness, I/O, very rich mutable state and exceptions in a certain order (exceptions do not rewind I/O or state action), and some runtimes (but not all of them) also provide you with some form of control inversion and some implementations of concurrency.

Even in ML languages it makes perfect sense to use one of these user-defined-effects devices to implement other effects (for example, distributed computation) or to provide different interpretations of some of these effects (for example, have randomness-using program compute a probability distribution instead of computing one sample).

Some languages take a maximalist approach of providing every notion of effect you could wish for as part of their runtime. It's part of Racket's philosophy for example. This gives a lot of flexibility, but even then there are interpretations of effects and some combinations that you won't get without doing the work.

(This point can also provide an understanding for some of the evolution of Daan Leijen's Koka language for example. It has a type-and-effect system that has always been able to type-check the use of runtime-provided effects, but recent versions have worked on allowing users to define their own notions of effect (as in Links, Frank, etc.)).

Difference between need and want.

I think it is exactly true that they don't _need_ Monads, that is I can write code with side-effects, code that does IO without them. I can do this because the ML function arrow "f : a -> b" allows this function to have side effects, and therefore the arrow is impure.

So I can include side-effects without a monad (if I were a lawyer, I would ask you to answer this rhetorical question yes or no).

It's not just some effects, if you can do IO it is all effects. It is a Turing complete imperative language.

Haskell needs Monads because the function arrow is pure, and hence there is no other way to include side-effects. The side effects happen in deconstruction, so you need both a way to construct something (unit) and a way to chain those constructions together (bind), so you construct a program that then gets executed when it's deconstructed at the end of the functional program. You can see these operations (unit and bind) define a monad. This is not mere coincidence, a monad is precisely what you need to functionally construct an imperative program.

Now if you say you _want_ to use Monads to organise your code in ML I may think you are a bit weird, but you would be correct, you _can_ create a monad in ML, and use it as a way to create domain specific languages etc. It may be a bit less useful because you cannot inject it into third party libraries because they may not be written monadically, but you get the same problem in Haskell when people write directly in a concrete monad like:

f :: IO ()

Instead of using abstraction like:

f :: MonadIO m => m ()

Which would allow you to inject your own monad other than IO.

So to summarise, I never said it is useless to have Monads in ML, so counter examples do not help here, I said you do not _need_ Monads, you could write that code another way. Monad haters (which i am not) might think you are stupid to use a monad, adding a layer of abstraction and indirection where you do not _need_ it, I just think it's a little weird, like trying to use Monads in C++.

Hm

I think you either have not understood my point or that you are countering it with a claim that I don't agree with. I don't believe it is true that "if you can do IO it is all effects".

Your operating system and language runtime provides you with a set of interfaces to perform *some* side-effects, but you could conceivably desire other notions of effects not in that set *or* different interpretations of those effect interfaces than those provided by the type system. Again, consider:
- if I want to program with notions of state and exceptional return such that exceptions roll back state update, or
- if I want a notion of non-deterministic choice where the interpretation of an effectful program is not the a single run/sample, but the list of all possible results, or a *distribution* of the possible results as a data-structure to inspect.

None of these are directly provided by the runtime systems of ML languages that I know of (or standard operating system interfaces that you could bind through the FFI), so for these you, the user, may have to implement your side-effects on your own, for example using a monad (or effects handlers, or a final interpretation as discussed here). (It is of course possible to implement those things as library without realizing that the library can be seen as a new user-defined linguistic construct.)

Module with functions

I can just provide that functionality with a module and a set of functions to manipulate the types I declare for roll-backable expressions.

The point is you can just have state anywhere in ML, a normal function can hide stateful update, so there is no _need_ for a monad.

Last try

This is my last try in explaining this to you in this thread.

Monads, or Oleg's "sublanguage", or effect handlers, are programming techniques to let users define their own notions of side-effects. Lets call them "devices for user-defined effects".

In Haskell, the standard library provides almost no notion of side-effect that is not encapsulated in a monad (except by going through the unsafe C interface), so you or library providers, including the standard library, use one of these devices (not necessarily monads) to provide *any* notion of effects.

In ML languages, the standard library and language definitions provide you with *some* notions of side-effects that can be used directly, without using these devices with user-defined effects: states, exceptions, often randomness, sometimes control, in some fixed order. Let us call those "builtin effects". But there are *other* notions or interpretations of effects that are not provided. If you want to use some of those, it is important and useful to recognize the need for one of these devices for user-defined effects. (How exactly you implement the notion of side-effect you need is up to you, and in particular you may or may not rely on some of the builtin effects of the language -- it is usually a good idea for efficiency.)

It is *correct* to say that ML "does not need monad" (or, in general, does not need any device for user-defined effects) to support any of the builtin effects. But it is *wrong* to say that ML, in comparison to Haskell, "does not need user-defined effects" *in general*, because there are other effects that you may want than those builtin effects -- I have given explicit examples in my previous post.
Haskell and ML users may both need user-defined effects, there is no absolute difference, only a relative difference: ML users can often rely on builtin effects, so they need devices for user-defined effects less often.

Finally, let me come back to your original comment. You are asking whether it is surprising that Oleg can demonstrate, in ML, how to implement user-defined effects without directly relying on a monadic structure. Your point is that ML programmers already write effectful expressions by using builtin effects. However, this does not explain Oleg's work or make it unsurprising, given that this work does not rely on ML builtin effects, and in fact could have been implemented in Haskell just as well.

Maybe it's not surprising to me?

Maybe it's just not surprising to me, having worked with Oleg in the past (on the HList and OOHaskell papers), or maybe I haven't fully understood it.

If you can implement it in Haskell _without_ using Monads, or unsafe trapdoors like "unsafePerformIO" I would be surprised.

Being surprised would be a good thing though, because it would mean I have learned something new and unexpected, but at the moment I doubt it, I just think there is a difference in how we are explaining things, I don't think effects fundamentally alter the nature of pure function, you still have to work within the rules.

Edit: I have re-read it all more thoroughly and I still agree with my original position. The DSL would need to be monadic in Haskell because it contains state in module S.

You probably want to read it once more

You probably want to read it once more. There is no mutable state in NDetL which is the example implementation of S.

Limited Application

If there is no mutable state in NDet, then it is only useful for a stateless subset of possible DSLs? Certainly it would preclude any stateful implementations of NDet. Is it even a proper effect, if it is not "effecting some state"?

I am not even sure NDetL is stateless as you claim, as it contains set and get operations on an associative data structure. The problem is because ML does not enforce purity it is hard to tell. Oleg's comment about implementation NDetD:

(* Second implementation, using delimcc
No explicit monads any more
*)

Suggests that implementation NDetL is using monads, even though you don't _need_ them in ML.

The choice of non-determinism was influenced by Oleg wanting to explore the minimal DSL. If we want to look at other DSLs, say for file IO then the implementation would need to be stateful. As soon as the DSL has the concept of sequencing effects it would have 'unit' from the meta-language quoting, and 'bind' from the composition/sequencing of fragments and would therefore permit abstraction by a monad.

Er

Er, I don't see any associative data structure nor get or set operations. Also, the code is right there in the article, so what is hard to tell? All it's doing is very basic list manipulation.

I don't think that there is a problem with using a monadic implementation either. The types `int_t` and `ilist_t` can well be represented by a monad (well, they already are in NDetL: the list monad). The only place where it may cause problems in this example is with the observation function -- that obviously requires the monad in question to have `run` functionality, but the ST monad has just that (though IO does not).

Is it even a proper effect, if it is not "effecting some state"?

Yes, there are many effects that have nothing to do with state.

Running Effects

You are right about NDetL using simple list operations, I must have been reading a different part of the source file without realising it. I don't think this changes anything though.

When I think of effects, I think if memory allocation, random number generation, file IO, channels etc.

If you were trying to define a language with pure functions and handle IO by effects, you would have to deal with these cases. You could have a pure functional meta-language that assembles the impure imperative fragments into a final program which gets run automatically on deconstruction. This is precisely how the IO monad in Haskell already works.

So what Oleg has done is taken the same mechanism and applied it to a simpler (more minimal) DSL that does not need bind (it still needs unit however). It is not surprising that there are some DSLs that are functional not imperative.

Copenhagen

I've been struck, in recent times, by the broad structural similarity of monadic programming to traditional quantum mechanics. Consider. Traditionally quantum mechanics describes a physical system by a wave function, a complex-valued wave across the system's state space which evolves over time according to some equation such as Schrödinger's. Fans of QM have liked, historically, to gush about how beautiful the mathematics is, which... has some truth to it. The wave function evolves in a mathematically clean way, with two main difficulties. One, the clean math may be too difficult to deal with symbolically; and, two — it's inherently impossible to directly experience the wave function (whatever dubious claim one might make anyway about its ontological status). When it comes to actual observable reality, it's not a wave function; that's the point of the Schrödinger's cat thought-experiment, that despite the mathematics, what's actually there is not a wave function, it's a cat. (Meow!) Mathematically, you have this lovely wave function, but you only get to keep it as long as you're not in touch with reality, at which point you have to switch tracks by means of wave function collapse (which mathematically is ugly as sin). I vividly recall my professor in introductory quantum mechanics (one of those truly excellent teachers one occasionally has the good fortune to learn from, btw; but I digress), after spending weeks on a single electron in a potential field, introduced a second electron. Each electron had its own wave function, which we were well familiar with by then; and what happened when they met each other? One of the electrons observed the other, the observed electron's wave function collapsed — and we were able to integrate over all the possible observations and end up with a combined wave function for the two-electron system. Which worked only because we were lucky enough to have a problem where we could work out a symbolic solution to that integral. The whole history of quantum mechanics, afaics, huddles around a few, relatively very very simple, problems we're able to solve. In most realistically big systems, the nice elegant wave functions of the parts don't combine into a remotely manageable wave function for the whole.

Set that beside monadic programming. Fans of functional programming gush about the elegance, but there's this pesky detail that observable reality isn't a monad, it has imperatively evolving state. And difficulties with composing monads, and scaling them up. I've come to think of monads as the Copenhagen interpretation of programming.

observable reality

An interference pattern on a screen is also a state. It can't be accounted for by isolated particles - so yes, state change is important, but it might not be individual.

Do you have any suggestions for a Young's Slit-Experiment in Computation?

Non-determinism as an effect

People do strange things. One of the strangest things is deciding that some particular aspect of a definition is the hill that they personally are willing to die on. And then arguing the point to death... or in the case of LtU far far into the right margin of the site. I'm starting to think that the design decision that makes really deeply nested replies disappear into a column is looking more and more like feature, rather than a bug. If a thread changes topic (many times) then the understandability of the replies cramped into the thin column starts to model the long chain of assumptions that the reader is being asked to follow. If a point is worth being made then starting afresh forces basic assumptions to be stated clearly, and leads to a more readable discussion.

Some interesting points have been raised in the long wandering discussion. Rather than simply attack them and point out that they are wrong, is it not more interesting to track them back to the original source and ask where these misunderstanding have arisen? What can we learn about the original claims from the discussion that it has produced?

Here is the claim that seems to have prompted a large part of the discussion: "We pick a rather complex effect -- non-determinism". This would cause me to ask two questions - in precisely what way is non-determinism an effect? How does the example exhibit it?

One way to define non-determinism as an effect is to define the semantics of a language so that it reads some unseen state from the environment (treading perilously close to the right-hand margin) and use it to pick a value or control path in the execution. It is certainly difficult to do so without implementing some kind of stateful read from the environment in the language semantics. The observable outcome is that when the computation is executed its result cannot be determined without access to the extra state - hence the non-determinism.

That does not seem to be what is shown in the example code. Another form of evaluation of a non-deterministic code is to ask "what is the deterministic hull of solutions?". This computation is itself is deterministic, a classic example would be "what is the set of all permutations of a list?". This purely deterministic question is related to, but distinct from, the non-deterministic question - "what is a permutation of this list?".

If we are computing the hull of solutions then it is not surprising that we can do so without access to any hidden state, and so it starts to seem less surprising that we did not require a monad to model the implementation. The paper is a good read, and it stands on its own merits as an interesting use of a tagless final interpreter, but I don't see how it does what it claims. In the example provided:


let _ = let module M = Perm(NDetL) in M.test1
(*
- : NDetL.ilist_t = [[1; 2; 3]; [1; 3; 2]; [2; 1; 3]; [2; 3; 1]; [3; 1; 2]; [3; 2; 1]]
*)

...where is the non-determinism? That looks like the deterministic hull of a non-deterministic function. It is true that the same hull can be rendered into different lists, but only one such list is being produced by the implementation. It is not non-deterministically selecting between them. If there is no non-determinism then what is the effect being modelled?

re: nondeterminism effect

...where is the non-determinism?

Well, it would be equally valid for the returned `ilist_t` to have a different order, or even just one result via backtracking. :D

As I understand Oleg's argument, the non-determinism 'effect' is in the interaction between `Perm` and the `NDet` handlers/context. Our program `Perm` was written against this signature without knowing or caring about order of results. We are similarly free to commute arbitrary `|||` subprograms or eliminate/introduce duplicates without affecting Perm's program semantics (because `x == x|||x` and `x|||y == y|||x` and associative equivalences are implied by our NDet semantics as described in the comments).

This would probably feel closer to a DSL if we avoided passing the `NDet` implementation as an argument to `Perm` and instead reverse that, passing the DSL code to an NDet runtime/interpreter such that the source of non-determinism is never visible in scope of the program leveraging it. Oleg mentions this in his "exercises" section, e.g. use of freer monads instead. But the two approaches are essentially equivalent, and ultimately it's only `Perm` by itself that is non-deterministic, not a combination of `Perm + deterministic NDet implementation` (of course, an NDet implementation could support threads and race conditions and generally be non deterministic, another 'exercise' left to Oleg's readers).

Similarly, a system using imperative threads would be deterministic if driven by a deterministic scheduler and network. It's the program written using threads that we call non-deterministic because it makes no assumptions about scheduling.

Different handlers

I agree, you could use a handler that has a random generator and produces a single non-deterministic output. I also agree the 'interpreter' model makes things clearer, and it's easier to see that for a random non-deterministic output the interpreter itself would need to be monadic in Haskell, but the effects themselves need not be.

Actually I'm not sure that's a correct interpretation, the effects themselves are not pure, what we are doing is composing fragments of an impure language, and of course composition is a pure operation. It seems easy to get confused between the DSL which is actually untyped (as we do no type checking on the int_t/list_t abstract syntax generated) so the whole pure/impure element of the effects is hidden from the host language.

What would be interesting is what happens if we try and type the abstract syntax of the DSL in the host language?

Indeed

That's pretty much the original point I was trying to make, only expressed far more eloquently.