Diagram showing all programming paradigms and their relationships

Here is a diagram showing all the main programming paradigms and their relationships. It is quite different from the usual diagram, which just shows programming languages: this one condenses a lot of information into a compact form. If your favorite language is not mentioned here, make a convincing argument and I may add it. If you find any errors, please let me know. I will take all comments into account to improve the diagram. Note to Unix aficionados: the diagram was made using xfig running on Mac OS X.

Edit: here is the latest updated diagram.


E

The E language sits at the end of a chain that contains (+ thread), but E has no threads.

SQL

Not to mention SQL is not Turing-equivalent.

Imperative SQL

Also, SQL appears as completely non-declarative. What was meant by "shared state" in SQL? Host variables in the various bindings of SQL to imperative languages? Cursors? Transaction isolation?

E and threads

Actually, each vat in E is a thread. Within a vat, one viewpoint is that there is a single thread, but another viewpoint is that there are multiple threads with a deterministic scheduler.

The discussion "E has no threads" uses the term "threads" to refer to shared-state concurrency, which is indeed a disaster unless it is strongly constrained.

Within a vat, one viewpoint

Within a vat, one viewpoint is that there is a single thread, but another viewpoint is that there are multiple threads with a deterministic scheduler.

I don't believe this is true. Within a vat, one object can steal the CPU (infinite loop) and the only way to recover is to kill the entire vat. I'm not sure how you can view this as deterministic scheduling with multiple threads.

Scheduling inside a vat

It's not preemptive. Preemption is a source of nondeterminism, which is bad for security reasons (covert channels).

But stealing the CPU is another security problem: denial of service! I think that the E philosophy is that objects inside a vat should not be considered completely isolated from each other. From the outside, a vat should be considered as a single "aggregate" for security reasoning. True isolation is only possible between different vats.

It's not preemptive.

It's not preemptive. Preemption is a source of nondeterminism, which is bad for security reasons (covert channels).

But stealing the CPU is another security problem: denial of service!

I agree (at least with the last point), but my point is that there is no observable concurrent execution, which seems to contradict your assertion that a vat can be considered multiple threads with a deterministic scheduler. The lack of interleaved execution between objects implies no schedule, simply co-operative yielding. I'm just trying to reconcile your statements about E with what I know of E. Perhaps we have a terminology mismatch on 'threads' and 'schedulers'.

As for preemption, I haven't read anything about covert channels via the scheduler, so if you have any links I'd be very interested. As a final thought, covert channels aren't much of a concern considering we can't even properly isolate overt communication yet (on a mainstream platform at least, so E and friends don't count ;-) ). Joe-E may be the closest to achieving this.

What is being captured?

Not sure what the vertical axis means. Is it just addition of features, or is there some order that's not apparent?

Java does not currently have closures.

It seems that nearly every branch ends with Oz and Alice. Is this diagram intended to promote these languages or completely capture the state of the universe of language paradigms?

Finally, the imperative programming line looks thin. Surely the distinguishing features between the various imperative paradigms are larger than the presence/absence of closures, threads, and logs, wouldn't you say?

To answer questions with questions...

binstock: It seems that nearly every branch ends with Oz and Alice. Is this diagram intended to promote these languages or completely capture the state of the universe of language paradigms?

Are those two goals mutually exclusive? Have you used either Oz or Alice ML?

binstock: Finally, the imperative programming line looks thin. Surely, the distinguishing features between the various imperative paradigms are larger than the presence/absence of closures, threads, and logs, woudln't you say?

Would you? I'm not Peter, but I'll offer my opinion that talking about "various imperative paradigms" is an oxymoron. Some will say that object-orientation is a distinct imperative paradigm, but that's problematic because objects are trivially modeled with closures and don't necessarily require mutation. So from my POV, the imperative programming line looks thin because it is.

I don't think that necessarily implies a value judgment. You can make the point that once you have first-class functions and mutable state you can macro-represent almost anything else, e.g. delimited continuations. Or you can view it somewhat pejoratively: while you might be able to do that, it's difficult and error-prone relative to having a more orthogonal feature set that is easier to reason about, one that treats mutable state as just one more extremely powerful, easily abused, hard-to-reason-about feature.

The diagram as presented supports both interpretations (and more!) equally well, so I think it would be more fruitful to concentrate on some issues of accuracy that have already been raised, e.g. SQL not being Turing-equivalent, or E not having "threads" (though perhaps "threads" is really just shorthand for "concurrency constructs," in which case the characterization of E is accurate).

Update: It's worth reiterating that this diagram is derived from the exposition found in Concepts, Techniques, and Models of Computer Programming, which uses Oz as its vehicle in essentially the same way that Structure and Interpretation of Computer Programs uses Scheme. Unlike SICP, however, CTM provides an extensive analysis of how the paradigms introduced appear and are used in other languages such as Java, Haskell, and Erlang. CTM builds up its definition of Oz by starting with a tiny kernel language and very gradually adding features until the kernel is full Oz, needing only syntactic sugar in order to be the "real" Oz. The result is that each new kernel feature is strongly motivated by a combination of comparison to other languages and the requirement that the new feature not be implementable by other means without requiring non-local changes to the code. In this way, it becomes easy to understand why Oz is what it is and, crucially, why it really is a multiparadigm language with a level of orthogonality that other multiparadigm-in-practice languages, e.g. C++ and OCaml, unfortunately lack. I cannot recommend either the book or the language highly enough.

Well, FWIW, my first

Well, FWIW, my first impression was also, "Is this intended to frame the totality of programming paradigms in such a way as to make Oz and Alice always appear as the ultimate languages?" (Actually, that's a sugar-coating of my initial impression.)

Whether or not that was the intention, I think it's perfectly fair to say that that's how this diagram comes across. At least to me.

I'm still confused...

So what if it is? That would only seem to be problematic if the framing were dishonest, which is why I asked the previous poster if he or she had actually used Oz or Alice.

Smells like propaganda

Not trying to be obnoxious, but "propaganda" was the first word that popped into my head when I saw the chart. It seemed to be implicitly defining "programming paradigm" as "something in Oz/Alice". It's a smell.

I could say "what about X" for any number of X (as others are), and one could reply with "well, X is not a *paradigm*, it's something else". Ok.

As was wisely mentioned by another poster, perhaps leaving out the language names would make it feel less so. (I'd hardly describe Java as C with closures, anyway.)

...and answers with answers

snively: so I think it would be more fruitful to concentrate on some issues of accuracy
You must have missed the second line of my post.

snively: I cannot recommend either the book or the language highly enough.
Awright, it's on my to-explore list. Thx for the heads-up. Back to the mines. :-)

Hasse diagram?

My interpretation of the vertical axis is the reverse of the usual in Hasse diagrams - whenever a < b, then a lies higher than b (or one can define a < b to mean "language a has all features of language b" to restore the usual Hasse diagram convention).

Some explanations of what is being captured

Not sure what the vertical axis means. Is it just addition of features, or is there some order that's not apparent?

The order is idiosyncratic. It is based on the 'creative extension principle' (appendix D in CTM), but the choice of what features to add and in which order is a personal one. Other orders are certainly possible, and I would be very happy to see someone propose another one!

Java does not currently have closures.

Java can be implemented in a language with closures and explicit state. I.e., even though it does not give the programmer full access to closures, it does give some access.

It seems that nearly every branch ends with Oz and Alice.

Yes, that's a bit unusual (although FRP and STM are not supported natively in Oz or Alice), but I don't know of other languages that are multiparadigm. I define a language as multiparadigm if one can program in each paradigm without interference from the other paradigms. It's not sufficient to implement many programming concepts. E.g., Common Lisp contains many concepts, but it is not multiparadigm in this sense.

Note that there are several languages that are dual paradigm. That seems to give very nice results. For example, Prolog implements both logic programming and imperative programming. Modern constraint modeling languages (like Comet and Numerica) improve upon Prolog: they are both constraint languages and object-oriented languages. The constraints are the engine and the object orientation is for structuring problems.

Joy

My favorite language: Joy (http://www.latrobe.edu.au/philosophy/phimvt/joy.html).

Argument:

Whereas all other functional programming languages are based on the application of functions to arguments, Joy is based on the composition of functions... The concatenation of appropriate programs denotes the composition of the functions which the programs denote... Some functions expect quoted programs on top of the stack and execute them in many different ways, effectively by dequoting. So, where other functional languages use abstraction and application, Joy uses quotation and combinators -- functions which perform dequotation. As a result, there are no named formal parameters, no substitution of actual for formal parameters, and no environment of name-value pairs.

...Although I guess this isn't a "main" paradigm... oh well.
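To make the quoted idea concrete, here is a rough, hypothetical Haskell approximation (Joy itself is untyped and stack-based, so this is only a sketch of the flavor; the names push, add, dup and the |> operator are invented for the example): a program is modeled as a stack transformer, and concatenating programs is just composing the functions.

    -- A rough sketch, not Joy: programs as stack transformers.
    -- Concatenation of programs corresponds to composition of functions.
    type Stack   = [Int]
    type Program = Stack -> Stack

    push :: Int -> Program
    push n s = n : s

    add :: Program
    add (x:y:s) = (x + y) : s
    add _       = error "add: stack underflow"

    dup :: Program
    dup (x:s) = x : x : s
    dup _     = error "dup: stack underflow"

    -- Concatenating two programs is composing them (left to right).
    (|>) :: Program -> Program -> Program
    p |> q = q . p

    -- The Joy-style program "2 3 add dup add", built by composition.
    example :: Program
    example = push 2 |> push 3 |> add |> dup |> add

    main :: IO ()
    main = print (example [])   -- prints [10]

Note that there are no named formal parameters anywhere: data flows implicitly through the stack, which is exactly the point of the quoted passage.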

Generic programming?

I think you're missing an important feature that some languages support and some do not: generic programming.

C++ templates, Scheme's hygienic macros, Haskell's polymorphic types. Maybe Java generics, though they're weaker.

Polytypic programming? What, no types?

Then what about polytypic programming (a.k.a. "even more generic")?

This leads me to the following observation: while looking at the diagram, I didn't find any features that require static typing. Might this warrant some qualification of the generality of the diagram?

Interesting point...

I'd say it depends upon whether you can identify a programming paradigm that requires static typing. I can't. "Generic" or "polytypic" programming seems to me to be a kind of cross-cutting concern that can be applied to functional, object-oriented, imperative... programming. Indeed, the fact that the leaves of the tree tend to be occupied both by Oz and Alice ML reinforces this impression.

What's a paradigm?

I'd say it depends upon whether you can identify a programming paradigm that requires static typing.

Which again depends on what we call a programming paradigm. From what you say, I will take the liberty of inferring that none of the following qualifies:

  1. Polytypic programming
  2. Metaprogramming
  3. Aspect-oriented programming

Should we take a step back and define what's in the name? For example, Wikipedia defines it as follows: "A programming paradigm is a fundamental style of programming regarding how solutions to problems are to be formulated in a programming language".

Object Domain Specific languages

What about object domain-specific languages? Maybe this is just an implementation-dependent, Ruby-community-culture thing. Special syntax for common objects such as /RegularExpressions/, [A,r,r,a,y,s], and {H=>a, a=>s, s=>h} tables is encouraged. As Ruby matures, it will be interesting to see whether all objects end up with a domain-specific language attached to them.

Yep

We must define our terms, certainly. Since the diagram is explicitly inspired by CTM, I'll use the definition from CTM. Unfortunately, my copy of CTM is currently on loan, so we'll have to wait for its return, or for some other kind soul to post the definition from CTM, before we can really view the diagram in the context of that definition. That's not to say that Wikipedia's is bad/wrong; I merely wish to relate the diagram to its source of inspiration, at which point we can discuss whether that source is reasonable or not.

Definition from CTM

Quoting from the opening page of the Preface:

"A computation model is a formal system that defines how computations are done. There are many ways to define computation models. Since this book is intended to be practical, it is important that the computation model be directly useful to the programmer. We therefore define it in terms of concepts that are important to programmers: data types, operations, and a programming language. The term computation model makes precise the imprecise notion of "programming paradigm." The rest of the book talks about computation models and not programming paradigms. Sometimes we use the phrase "programming model." This refers to what the programmer needs: the programming techniques and design principles made possible by the computation model."

Paradigms, typing, and aspects

whether you can identify a programming paradigm that requires static typing

Typing is orthogonal to what the diagram expresses. Typically, each paradigm can be done with different styles of typing (varying degrees of static or dynamic typing). The orthogonality is not perfect: typing can affect expressiveness. I would be happy to see a diagram that takes the influence of typing on expressiveness into account!

Aspects are also orthogonal to the diagram. Aspects are parts of a specification (a program does A /\ B /\ C, etc., where A is the functionality, B is an aspect, C is an aspect, etc.), so they apply to all paradigms. Aspects are certainly not limited to the 'aspect' construct in AspectJ!

Minor Point

Minor point: Scheme is not only functional, but imperative as well.

Scheme's all over the place

FWIW, when I think of multi-paradigm languages, the two PLs that have the most coverage of paradigms would be Scheme and Oz.

As for the diagram itself, I think a better perspective is to ignore the particular programming languages that are attached to the nodes, as that tends to evoke passion. The better question is whether PvR has managed to capture the main paradigm forks and the relationships between them. Once the main paradigm tree is mapped out, then you can assign PLs to the nodes at leisure.

Distinction: with a difference or not?

A distinction that I think is important (but others might not) is between languages with support for paradigms that is "built in" or "first class" in some sense that is admittedly hard to define crisply, vs. offering facilities to implement such paradigms if you want them. From that POV, Scheme is a fascinating language, lacking built-in support for objects, concurrency, goal-directed evaluation, laziness, etc. but making it easier than in any other language I know that lacks these features to add them on—which I think is a great testament to the power of lexical closures, delay/force, call-with-current-continuation, and macros.

So I'm not sure that I agree that Scheme is particularly multiparadigm, but I would definitely agree that it deserves calling out distinctly for providing a much better foundation for implementing multiple paradigms than any of its cousins other than Oz and Alice ML.

Foundation for implementing multiple paradigms

This poses the question: what exactly are the foundations that make it possible, or easier, to add paradigms? Is it first-class functions plus macros? Why is Scheme/Lisp particularly good for implementing paradigms? Because of its macros, so that the language itself can be extended?

Needless to say, Scheme/Lisp macros were already discussed at LtU (e.g., see here, here, and here).

My list

IMHO, the crucial things about Scheme are the ones I listed above: first-class functions, lexical closures, delay/force, call-with-current-continuation, and syntactic extensibility. Remove any of them and you do significant harm to Scheme's ability to implement multiple paradigms. IMHO, the real win in terms of expressive power is, of course, call-with-current-continuation, which is known to be able to express any known control structure at all. Strictly speaking, of course, you can live without it and implement a CPS-conversion macro, but I think we've already stipulated that we're talking about making things easy as opposed to making things possible. Historically, I think we can argue that Scheme is important because it deliberately attempts to add extremely little sugar to the lambda calculus, leaving the addition of more sugar to the user. This view is, of course, controversial, as the status of R6RS, I think, underscores.
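As a small illustration of expressing a control structure with continuations, here is the familiar "early exit" pattern, written with Haskell's Control.Monad.Cont and callCC rather than Scheme's call/cc (the helper findFirst is invented for the example, but the idea carries over directly):

    import Control.Monad (when)
    import Control.Monad.Cont

    -- Early exit expressed with callCC: invoking `exit` abandons the
    -- rest of the computation and returns its argument immediately.
    findFirst :: (a -> Bool) -> [a] -> Maybe a
    findFirst p xs = flip runCont id $ callCC $ \exit -> do
      mapM_ (\x -> when (p x) (exit (Just x))) xs
      return Nothing

    main :: IO ()
    main = print (findFirst even [1, 3, 5, 4, 7])   -- prints Just 4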

Tail call optimisation is too easily forgotten

TCO *and* first-class continuations are what makes it possible for so many control operators to be macro-expressible in Scheme.

One issue I have with call/cc is that its uses often don't seem to compose well, nor is the way they should compose obvious, a priori. For example, what should happen when I use a green threading library and a backtracking search one, both implemented with (vanilla, undelimited) continuations?

Cells and call/cc. Or delimited continuations.

call-with-current-continuation, which is known to be able to express any known control structure at all

Assuming the presence of mutable cells, of course.
Also, as we are discussing things that are easy as opposed to possible, call/cc may not be the best atomic unit for building control structures (some delimited continuation structures may be more handy).

Imperative languages with search

The diagram makes me wonder why there are no imperative languages with Prolog-style backtracking (apart from my proposed extension to Intercal). I can't see anything intrinsically difficult about it. You'd just need to keep a stack of choice points, each of which contained a snapshot of program state. Hardly efficient, I admit, but no worse than Prolog already is.

Where are SNOBOL and Icon?

Where are SNOBOL and Icon? Both are imperative languages that have some form of backtracking. And of course both Oz and Alice have those features too, even though I haven't ever seen them used together. But I don't think either of them reverts destructive updates on backtracking. You could use Haskell and combine the list monad with the STM monad or something.
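As a rough sketch of that last suggestion (using the plain State monad rather than STM, for simplicity; the function name search is invented for the example), layering StateT over the list monad in Haskell gives backtracking in which every branch carries its own copy of the state, so "destructive" updates are automatically undone when an alternative is abandoned:

    import Control.Monad.State

    -- StateT Int [] : stateful computation with list-monad backtracking.
    -- Each alternative explored by the list monad carries its own copy
    -- of the state, so updates on one branch never leak into another.
    search :: StateT Int [] (Int, Int)
    search = do
      x <- lift [1, 2, 3]     -- choice point
      modify (+ x)            -- update, local to this branch
      y <- lift [10, 20]      -- another choice point
      s <- get
      return (y, s)

    main :: IO ()
    main = mapM_ print (runStateT search 0)
    -- Each printed pair shows the final state of its own branch only.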

Good additions!

I wholeheartedly agree that Icon (I never used SNOBOL) fits nicely somewhere in the imperative nonmonotonic logic branch.

Smalltalk

LaLonde and Van Gulik implemented backtracking in Smalltalk: http://dl.acm.org/citation.cfm?id=62094

The tl;dr version is that Smalltalk reifies the call stack as a list of context frames, and gives access to the currently executing context through the special variable thisContext. This boils down to "call/cc + mutable cell".

Other things for the chart?

Great-looking chart. A few things which might be interesting to add (I realize that these things are less revealing when it comes to analyzing "paradigms", but these are noteworthy nonetheless).

* Types, as mentioned above.

* Various ways of supporting metaprogramming (excluding higher-order types), such as macros of various types, MOPs, introspection and reflection, aspects, homoiconicity/eval.

* Control structures which are a subset of full continuations, but which go beyond function invocation and return, such as coroutines, exception handling, generators, etc.

* DBC-ish features (preconditions, postconditions, invariants, assertions). In an ideal world, this would integrate seamlessly and be covered by types, but we aren't there yet.

* Types of dispatch: simple static dispatch, static overloading, single-argument polymorphic dispatch ("virtual functions"), arbitrary polymorphic dispatch ("multi-methods"), pattern matching, predicate types, etc.

Labels

A better label for the axis at the bottom of the page would be "analytic" on the left and "synthetic" on the right. This is a better way to understand the difference between these two extremes. State is a consequence of choosing one or the other. Declarative implies analytic but non-declarative doesn't imply synthetic.

Edit: Also, tying the separation of these two extremes to state implies that they can't or should not be combined, which is certainly not the case.

Definition of 'analytic' and 'synthetic'?

Can you define what these terms mean?

I am thinking about the

I am thinking about the common usage of analytic and synthetic. Analytic is the easy one. The ordinary use of the word in math, science and engineering seems appropriate. An analytic system is one in which everything is defined in terms of everything else. This is the accepted way to understand formal logic, Prolog, and pure functional programming. The left side of the chart is certainly analytic.

Now for the hard part. As you probably know there is an unresolved debate in philosophy over the distinction between analytic and synthetic. This is an interesting language issue and it does have something to do with computer language. But as I will argue it is bogus with respect to labeling the chart.

The commonsense use of "synthetic" or synthesis is "putting together" or "acting out". It is synonymous with state. Imperative programmers will understand exactly what it means. This is an extremely practical and useful definition that should not be held hostage by unresolved philosophical issues.

The use of "no state" and "stateful" is confusing and misleading. Functional programs can and do use state, and even when we achieve perfect declarative-ness, the machine that solves the problem is certainly stateful. "Imperative" systems can and must also be both stateful and functional. As you have said clearly the chart is about paradigms, state does not define any paradigm, and paradigms can be used together.
Edit: State defines a system and a computer is a system.
Edit: I also have a problem with the term "declarative". The left side of the chart is certainly declarative, But synthetic systems can also be declarative of their synthesis. Thus both sides are declarative.
Edit: The essence of the link above is that analytic and synthetic are not separate, absolute categories but points of view that tend to overlap. In other words they are separate paradigms that tend to overlap in practice but not as points of view. This should be the implicit meaning of the chart.

State in the diagram

The use of "no state" and "stateful" is confusing and misleading.

The meaning is simple: "no state" means the kernel language has no explicit state and "stateful" means the kernel language has explicit state. Functional languages do not have explicit state (in their intended use as pure languages). Whether the underlying machine has state is irrelevant.

The meaning isn't simple

The meaning is simple: "no state" means the kernel language has no explicit state and "stateful" means the kernel language has explicit state.
Since you are taking a rather technical point of view about this, why not point that out in the chart? On the left, "no state in the kernel language", and on the right, "state in the kernel language".

Clarification

For someone who thinks about systems and state, states do not come and go as we please. States are subject to morphisms; they change but they don't go away. A functional language is simply a morphism of the states to the stack. A system always includes both the language constructs and the stack. People who think this way can be baffled and confused by "language centric" thinking that seems to abolish systems theory. Not that that is anyone's real intention.

not all states are united

For someone who thinks about systems and state, states do not come and go as we please.

This refers to a different definition of "state", from a different context. Peter's use of the term "state" in the diagram can be considered shorthand for "mutable state" or "updateable state". It's standard usage in the PL theory and functional language context, and doesn't imply anything about incompatible uses of the term in other contexts.

Good comments

Thanks to everybody for all the good comments. I will try to take them into account when I make an updated version of the diagram.

When I put a language under a paradigm, it doesn't necessarily mean that there is a perfect fit between the language and the paradigm. It means that part of the language is *intended* (by the language designers) to natively support the paradigm without interference from other paradigms. It is not enough that libraries have been written in the language to support the paradigm. The language's kernel language should support the paradigm. Also, when there is a family of related languages that supports the same paradigm (e.g., Java/C#/C++/..., Scheme/Common Lisp/Logo/...), I tend to mention just a single representative of the family to avoid clutter.

Some paradigms and concepts that are clearly missing are imperative search (SNOBOL and Icon), continuations, coroutining, and exceptions. It's not always clear whether a concept should be explicit or left implicit. For example, I think that exceptions can be left implicit, since many paradigms incorporate them (and they are subsumed by continuations). Coroutining can arguably be considered a form of concurrency that has deterministic scheduling with explicit yield. I think it is important whether a paradigm has observable nondeterminism or not. This is a more general property than support for coroutining.

Axes that are orthogonal to the diagram are typing, aspects, and domain-specificity. For example, it should be the case that any paradigm could be used for a domain-specific language (except when the domain arguably needs a particular concept). Typing is not completely orthogonal: it has some effect on expressiveness. Aspects should be completely orthogonal, since they are part of a program specification.

Metaprogramming is a whole new way of thinking. The term covers many different approaches, from higher-order programming and syntactic extensibility (e.g., macros), to higher-order programming combined with syntactic support (e.g., meta-object protocols and generics), to full-fledged tinkering with the language's kernel language (introspection and reflection). This deserves its own diagram! I tend to think that syntactic extensibility and kernel language tinkering are orthogonal to the diagram: they should be applicable to most paradigms.

The diagram shows just the tip of the iceberg of all possible paradigms. Trying to be more complete without adding clutter is hard (but I will try). In fact, it's possible to take almost any subset of programming concepts and label it as a paradigm. So the full diagram would be a complete graph (but most paradigms in this graph would be practically useless). The diagram chooses those paradigms that have revealed themselves (by use) to be important for practical programming. The diagram can be used as a tool for finding unexplored paradigms: it is definitely possible that there exist useful paradigms that haven't been discovered yet by programmers. (When writing CTM, we rediscovered declarative concurrency in this way.)

Updated diagram

Here is an updated diagram. You can see which paradigms have observable nondeterminism and which don't. For example, the nice thing about FRP is that it regains the determinism that is lost in concurrent logic programming. Continuations are still missing from the diagram; I will add them later.

What about the issue of SQL

What about the issue of SQL being labeled "Turing complete"?

No SQL

There is no SQL in the updated version of the diagram (replaced by "SQL bindings"). That's a bit slippery, as there are many different bindings even for one host language, and a binding is not a language anyway.

I see "Prolog & SQL" on the

I see "Prolog & SQL" on the left hand side of the diagram. Don't you?

SQL fix

I've changed both references to SQL to say "SQL embeddings". The URL now points to the fixed diagram.

S-exps

I'd add S-expressions next to XML as an example of descriptive declarative programming. This hints at some approaches which cross the data structure/Turing equivalence line in ways that XML usually doesn't.

Very nice diagram, btw. I'd love to see it expanded along the various other dimensions that have been raised. Although it might need to be converted to a 3D model... :)

A new updated diagram

I've added an explanatory text to the diagram, "Clarifications", which I hope will make the diagram completely clear. The new updated diagram is at the same URL as the previous update.

Updated diagram showing programming paradigms

The most recent updated diagram (version 1.07) is here. This diagram adds MapReduce, Esterel, Excel, Curry, and other languages. The CTM book follows this diagram, but the diagram goes much further than the book. There was a critique that Oz, Alice (and Curry) are mentioned often and many popular languages are not. I can't do anything about this: these three were *designed* to support many programming concepts. The others were just not designed for this. That makes them more awkward for some programming tasks. This is compensated partially in popular languages by better development environments and more libraries. This kind of compensation only goes so far. For example, concurrent and distributed programming is poorly supported by most of the popular languages and libraries can't fix that. To fix the problem, the language itself has to be changed. For example, the language can take concepts from Erlang, which supports concurrent and distributed programming well.

Explanation of state in the diagram

The updated diagram also has an explanation of state, where I summarize why monads are the weakest form of state and why it's important for state to be named, to be concurrent, and to support nondeterminism. Different languages have state with different combinations of these three properties.

Misleading summary

Haskell as practiced (if not Haskell 98) has had support for cells via the IO and ST monads for a long time. The ST monad is also easy to give a naive implementation in a dynamically-typed language and can be implemented in a statically-typed language with a sufficiently powerful type system (essentially, one that can prove that if you're holding a reference to a cell then you know the type of the value held in it).
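For concreteness, here is a minimal sketch of the kind of cell support described above, using the standard Control.Monad.ST and Data.STRef modules (the function name sumST is invented for the example): the mutation happens inside the ST monad, and runST recovers a pure result.

    import Control.Monad.ST
    import Data.STRef

    -- Sum a list using a mutable accumulator cell, then escape back to
    -- a pure value with runST. The mutation is invisible from outside.
    sumST :: [Int] -> Int
    sumST xs = runST $ do
      acc <- newSTRef 0
      mapM_ (\x -> modifySTRef' acc (+ x)) xs
      readSTRef acc

    main :: IO ()
    main = print (sumST [1 .. 10])   -- prints 55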

It's also possible for monads to support concurrent state (in fact, the IO monad does this) and nondeterminism. The limits are largely on what can be implemented purely with any real kind of efficiency, not on what the monad concept can introduce to a language. This raises the question of whether monadic pure functional programming - or some generalisation thereof - deserves recognition as a separate paradigm, though I appreciate that it would likely make something of a mess of your existing diagram.

The word "monad"

It's also possible for monads to support concurrent state (in fact, the IO monad does this) and nondeterminism.

Well then we're no longer functional. A "monad" is like a magic box into which we can cram anything. The support for concurrency and nondeterminism does not come from the original monad idea. Rather, the monad is used as a foundation on which to add other concepts. The monad is by no means essential for this. These concepts can be added in other ways. What monads give you seems to be a kind of separation of concerns: you can separate the nonfunctional concepts from the purely functional part of the program.

Depends what "monad" means

You could talk about "adding monads" in the sense of monads definable within an existing language. In that sense you might say "monads can provide state or continuations in a pure language".

You could also talk about "adding monads" to a language in the sense of adding a new primitive type constructor that is mathematically a monad. In that sense just about anything can be provided as a monad. If you preserve your favorite semantic equality on non-monadic code you might even claim that your language is still "pure", and say "monads can support concurrent state and nondeterminism".

The interesting thing about adding "non-functional" primitives as monads is that it is relatively easy to do it without spoiling your existing semantic equivalence on pure code.

You can fit just about anything into a monadic interface, because semantic equivalence for pure code with uninterpreted functions can't prove that your monad operations shouldn't be order-dependent, non-idempotent, or otherwise badly behaved. Binding results from monadic computations is syntactically distinct from let-bindings, so you hardly have to worry about breaking referential transparency, if you take that to mean the ability to safely inline let-bindings (if you are providing something like reflection, you might need some extra semantic dodges, like claiming to nondeterministically return code semantically equivalent to the function being inspected).

Monads are a programming technique that provides an interface

You can fit just about anything into a monadic interface

Exactly. Monads provide separation of concerns. They are a programming technique (possibly supported by syntax for ease of use and implementation for efficiency) that provides an interface for adding non-functional concepts to a functional language, so that the functional part of the program is still functional. The monadic interface itself uses functional abstraction to hide the new concepts from the functional part of the program. But that's not the same thing as saying that the monad idea subsumes concurrency, mutable state, nondeterminism, and all kinds of other concepts. Monads, being a programming technique, are not visible in the kernel language. From this vantage point, they are not even a programming concept! In the terminology of CTM, Haskell monads can be called a linguistic abstraction.

I think that people confuse the monad idea with the new concepts that it can interface to. It's not because monads can interface to mutable state that monads subsume mutable state. Monads are just the interface to it.

The term monad is confusing

The term monad is confusing in this context, because the monad structure itself is not relevant. The relevant point is that (e.g.) state can be modeled functionally. This encoding essentially creates an embedded sub-language that supports state, with runST (itself pure) providing the interpreter. I suppose you can argue that using this encoding means you're programming in another (non-pure) language, but I don't really see the point. It's a very shallow encoding that plays nicely with the rest of the language.

Nondeterminism is different - it can be modeled with a powerful type system but not implemented in a pure language (any implementation will be over-constrained).

Two languages are involved

It's a very shallow encoding that plays nicely with the rest of the language.

So you have two languages, L1 and L2. L1 is pure and in L1 you write an interpreter for a non-pure language L2. From the viewpoint of L1, you have added exactly nothing to the language (the kernel language is the same). From the viewpoint of L2, you have a kernel language with a new concept, i.e., named mutable state. What you have achieved is a clean interface between a pure language L1 and a non-pure language L2. The monad mechanism has given you that. But you have not actually added state to the pure language L1. L2 is implemented using a pure language L1, but from L2's point of view that is irrelevant. In CTM I call this the "Fallacy of the Preprocessor".

I don't really see the point

In computer science, it is always important to distinguish carefully the level of abstraction that you are working in. In this example, are you programming in L1 or in L2? If you are programming in L2, then functional reasoning is not sufficient. You have to use stateful reasoning techniques. If you are programming in L1, then you do not get the modularity advantages of named mutable state. At the end of the day, what you have gained is a clean interface between L1 and L2, so that you can use functional reasoning when appropriate and stateful reasoning when appropriate. You have not eliminated the need for stateful reasoning.

Thought experiment

Suppose you take Haskell and move ST from the libraries into the definition of the language (so it still has the same monadic interface). This language has mutable state as a kernel concept, right? I don't see the point of considering this a different language from the one where we implemented ST with equivalent semantics ourselves.

It makes no difference where ST is put

I don't see the point of considering this a different language from the one where we implemented ST with equivalent semantics ourselves.

Exactly! They are both the same language. They are both *stateful* languages. They are the language I call L2. No matter how you turn it, L2 requires stateful reasoning.

Are you trying to say that L2 does not require stateful reasoning, just because it is implemented in a pure language?

That's not the same as L2

That's not the same as L2 as I understood it. L2 was supposed to be the interpreted language, under the metaphor that 'runST' is interpreting an embedded language. This language that exists "inside the monad" is a typical stateful language - there are essentially instructions for load and store that (from the point of view of L2) have side effects.

The language I described isn't the same as L2 - it's the same as Haskell. It's a pure language that includes mutation descriptors as first class values and has a way to convert a mutation descriptor into a function from state to state. So to recap the argument: If you start with Haskell and add mutable state as a kernel concept through a monadic interface you just end up with Haskell again. Hence we might as well say Haskell supports mutable state. I think what you're calling "stateful languages" may just be languages with a particular interface to state - load/store with side effects - which is fine but will exclude the functional modeling approach which is IMO the right thing.

Monads via interpreters

For what it's worth, I've been known to actually implement monads by defining the corresponding algebra (as a GADT) and then writing an interpreter - though strictly the interpreter isn't required to have a monad. The more common implementation style is simply the fused version where the constructors do all the interpreting work as well.
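A minimal, hypothetical sketch of that style (the Counter algebra and all names below are invented for illustration): the constructors of a GADT only build a tree, a Monad instance is given in terms of those constructors, and a separate interpreter gives the algebra its meaning.

    {-# LANGUAGE GADTs #-}

    -- The algebra of a tiny counter language, as a GADT. The
    -- constructors build a tree; they do no interpreting work.
    data Counter a where
      Pure :: a -> Counter a
      Bind :: Counter a -> (a -> Counter b) -> Counter b
      Get  :: Counter Int
      Inc  :: Counter ()

    instance Functor Counter where
      fmap f m = Bind m (Pure . f)

    instance Applicative Counter where
      pure      = Pure
      mf <*> mx = Bind mf (\f -> Bind mx (Pure . f))

    instance Monad Counter where
      (>>=) = Bind

    -- The interpreter gives the algebra its meaning: state threading.
    run :: Counter a -> Int -> (a, Int)
    run (Pure x)   n = (x, n)
    run (Bind m k) n = let (x, n') = run m n in run (k x) n'
    run Get        n = (n, n)
    run Inc        n = ((), n + 1)

    example :: Counter Int
    example = do { Inc; Inc; Get }

    main :: IO ()
    main = print (run example 0)   -- prints (2,2)

(As noted above, strictly speaking the tree itself only satisfies the monad laws up to what run observes.)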

The situation where the "preprocessor" is in fact the language's own evaluation mechanisms in normal use definitely has some interesting properties of its own!

This is not a post

Move along