Philip Wadler: Category Theory for the Working Hacker

Nothing you don't already know, if you are into this sort of thing (and many if not most LtU-ers are), but a quick way to get the basic idea if you are not. Wadler has papers that explain Curry-Howard better, and the category theory content here is very basic -- but it's an easy listen that will give you the fundamental points if you still wonder what this category thing is all about.

To make this a bit more fun for those already in the know: what is totally missing from the talk (understandable given time constraints) is why this should interest the "working hacker". So how about pointing out a few cool uses/ideas that discerning hackers will appreciate? Go for it!


Here's one: Hutton's

Another one I like (slightly

Another one I like (slightly beyond the things in the lecture, though): Y in Practical Programs by Bruce McAdam.

Curly C

The first slide:

curlyC(A,B) = { A -> B ∈ curlyC }

What is curlyC?

Can I assume that this is just really sloppy notation?

Product slide, 20 minutes

What does the "|-" symbol mean? And what does

Gamma |- M : A

mean?

I'll take a shot at that (thus opening myself up to ridicule). Turnstile notation is used for formal logics. If L is a formal logic in which a particular proposition P can be proven, one writes L |- P. If your type system Gamma allows you to conclude that M has type A, you write Gamma |- M : A.

|- means nothing

I give a fairly in-depth answer to this question on the CS StackExchange; it also links to a blog post that connects the notation to code.

Hom-set notation definition

I think what that stuff about curlyC means is "The notation curlyC(A, B), for any category curlyC and objects A and B in that category, shall mean the collection of every morphism f that goes from A to B in curlyC."

This is known as the hom-set between A and B, and it has many notations. According to Wikipedia, "The collection of all morphisms from X to Y is denoted homC(X,Y) or simply hom(X, Y) and called the hom-set between X and Y. Some authors write MorC(X,Y), Mor(X, Y) or C(X, Y)."

So this is Wadler establishing what notation will be used for the hom-set in this presentation.
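To make the hom-set concrete, here is a small sketch in Python of computing curlyC(A, B) for a toy finite category (the objects and morphism names are invented for illustration):

```python
# A tiny category presented as a list of named morphisms with domain and
# codomain. (Objects A and B; the morphism names here are made up.)
morphisms = [
    ("id_A", "A", "A"),   # identity on A
    ("id_B", "B", "B"),   # identity on B
    ("f",    "A", "B"),
    ("g",    "A", "B"),
]

def hom(cat, x, y):
    """curlyC(x, y): every morphism of cat that goes from x to y."""
    return {name for (name, dom, cod) in cat if dom == x and cod == y}

print(sorted(hom(morphisms, "A", "B")))  # ['f', 'g'] -- the hom-set C(A, B)
```

So `hom(morphisms, "A", "B")` is exactly Wadler's curlyC(A, B) for this toy category, and `hom(morphisms, "B", "A")` happens to be empty.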


Meanwhile, "Gamma |- M : A" is type theory notation. It means "In the context Gamma, the expression M has type A."

A context is also known as a scope or environment. It encodes what variables are available and what their known types and/or bindings are. Contexts are often given the name Gamma, for some reason.

"That's what category theory buys you"

At 38 minutes. Eh, what does that mean for actual programming?

He's just described the

He's just described the principle of duality in category theory, which I'm happy to agree is a cool thing about category theory. But whether it has anything to do with actual programming depends on whether you subscribe to the pure-functional agenda for actual programming. Personally, I don't. It's quite true that our mathematical technology does a lousy job of coping with side-effects in programs; Wadler sees in that a problem with side-effects, I see in it a problem with our mathematical technology.

Either way, he's working through the connection between logic and programming called the Curry-Howard correspondence (CH). An important (imho) thing to keep in mind about CH is that it's a connection between formal reasoning about types in programs, and formal reasoning in general. That's important because of what it doesn't say: CH is not a connection between formal reasoning and computation. It's all just about types.

In programming, types are (at best) a means to an end; actually doing computation is what really matters, and types are an incidental device intended to help correctly specify how to do computations. In formal logic, the actual reasoning is the whole point; working through the correspondence, it's as if making sure your program is correctly typed were more important than what happens when you run it.

This leads, of course, to one of the things I deeply disapprove of about CH: I see it as distracting from things in programming that should matter more than typing. My attitude is, of course, so alien from Wadler's mindset that I suspect we may not even be disagreeing so much as thinking past each other.


Yes, I caught that comment about side-effects. My immediate thought was "and how are you going to model side-effects in this formalism?".

Thanks for the clarification. I was actually hoping for a theorem "If your programming model satisfies this duality then you cannot have deadlock" (or whatever other desirable property) and I could then live with a corollary "functional languages satisfy this duality so they don't have deadlock" (or whatever).

I've tried reading up on categories several times now, and have never come across such a theorem that would explain why I would care. It's all nifty, but to me it never leaves the domain of the abstract.


Well, the question "and how are you going to model side-effects in this formalism?" has an answer, of course: monads. (Speaking of type systems run amok.)
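As a concrete illustration of modelling an effect as pure values, here is a minimal state-monad sketch in Python, where a "stateful" computation is just a function from a state to a (result, new state) pair (the names `unit`, `bind`, `get`, `put` follow common convention; nothing here is from the talk):

```python
# A stateful computation is modeled as a pure function: state -> (result, state).

def unit(x):
    return lambda s: (x, s)            # return a value, leave the state untouched

def bind(m, f):
    def composed(s):
        x, s2 = m(s)                   # run m, threading the state explicitly
        return f(x)(s2)                # feed result and new state onward to f
    return composed

get = lambda s: (s, s)                 # read the current state

def put(s_new):
    return lambda s: (None, s_new)     # replace the state

# "increment a counter and report the old value" -- no mutation anywhere
prog = bind(get, lambda n: bind(put(n + 1), lambda _: unit(n)))
print(prog(41))  # (41, 42)
```

The "side effect" has become ordinary data flow: the state is passed in and handed back, which is exactly what the monadic plumbing hides in Haskell.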

Though I don't think we've anywhere near a mathematical handle on side-effects, a small first step is the variant lambda-calculus of Felleisen's dissertation (to which my own dissertation added the notion of specialized classes of mediating variables). A curious result I've heard of: Timothy Griffin showed that the usual Curry-Howard correspondence, which relates lambda-calculus to intuitionistic logic, can be extended to relate Felleisen's side-effect-ful calculus to classical logic. Which doesn't deter my criticisms of Curry-Howard, but seems like a glimmering of... something.


You can model side-effects as monadic sets, where the type constructor becomes a set of 'effect flags'. This means function types effectively list the 'types' of side effects they can produce in their return types.


Even if you can model effects as monads, it's the wrong idea. Functional programming is an incomplete model: it's great for calculating things but incompetent at doing them.

The right approach is to meld calculations with actions in a symmetric way, as hinted at by Andrzej Filinski in his master's thesis and later papers. As it turns out, my Felix programming language is ahead of him: it already implements a system which combines values, functions, and continuations using fibres and synchronous channels. (Similar to Hoare's CSP but without concurrency: fibres are coroutines, not processes.)

This system is desperately in need of theoretical analysis to make it easier to reason about. The key is probably a type system, but not the kind you're used to: active programming is all about control so we need type systems to approximate control models.

The simplest construction is a pipeline, which looks like:

src -> tr1 -> tr2 -> tr3 -> sink

where src plays the role of a value, the trN are transducers which are like functions, and sink is the final continuation. Pipe formation is composition and so associative. It's no accident the arrows look categorical! By adding parens you can see:

(src -> tr1) -> tr2 -> (tr3 -> sink)

that the first group, value + function, is a value, and the last group, function + continuation, is a continuation (you already know that function + function is a function, right? :)

So in this model, the boundary between values, functions, and continuations is notional: it's wherever you want it to be (just slide the parens about).

A simple pipeline therefore subsumes both applicative functional programming and procedural programming. And if you go to more complex flow arrangements it transcends both.
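The pipeline shape can be sketched with Python generators, where each transducer pulls from its upstream stage (a loose analogue only; Felix fibres rendezvous on synchronous channels, while these generators are demand-driven):

```python
def src():
    yield from range(5)               # src plays the role of a value

def tr(f):
    def stage(upstream):              # transducers are function-like stages
        for x in upstream:
            yield f(x)
    return stage

def sink(upstream):                   # the final continuation
    return list(upstream)

tr1, tr2, tr3 = tr(lambda x: x + 1), tr(lambda x: x * 2), tr(lambda x: x - 1)

# src -> tr1 -> tr2 -> tr3 -> sink
print(sink(tr3(tr2(tr1(src())))))  # [1, 3, 5, 7, 9]
```

Because each stage only ever wraps its upstream, the grouping of stages is irrelevant, which is the associativity being claimed above.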

A pipeline of continuously running (or one-shot) fibres is an example of a control model. This one is easy to work with and clearly "correct", whatever that means.

A more difficult model involves a fibre that writes its result to its own input. That can be easily proven to be equivalent to suicide: an extra fibre acting as a buffer is required and then you have a model of tail recursion. (The picture is a bit hard to draw with ascii art).

We all know you can do any procedural code in a functional system, and any functional code in a procedural one. They're known to be equivalent. What has been lacking until now is a way to do BOTH in the same program with a unified model.

Interesting, but misses the point.

This is interesting, but let's first address effects, as what you propose has nothing really to do with effects as far as I can see, although maybe you can correct me.

I came across several examples in Rust where effects would be necessary. One example is if I take a value out of a data structure I have to replace it with another. Due to Rust's safety constraints I cannot leave the value undefined for even a microsecond. It is an atomic swap or nothing at all. The problem is that if any operation causes a panic (Rust's version of an exception) and Rust unwinds the stack, there could be a crash as there is data missing, which is not allowed. If we could guarantee no exceptions, we could allow a function to run and then replace the data, which overall would result in a neater and better performing solution. Other examples would be interrupt handlers that are not allowed to allocate memory; using effects, the caller could ensure that constraint is enforced.

So the purpose of effects is to expose side-effects to the type system in a way that function signatures can impose constraints on those side-effects.

In your examples I have no way of knowing if "tr1" allocates stack or heap memory, opens a file, reads or writes data to message ports, throws exceptions, takes longer than "N" clock-ticks to run, etc.

What is missing in your 'pipelines' is the type of the data that is being transferred, but it would seem trivial to have a constructor for a Pipe, which is effectively a lazy list. You would then have the following signatures, assuming the data passed along the pipe is a stream of integers:

src : Pipe Int
tr1 : Pipe Int -> Pipe Int
tr2 : Pipe Int -> Pipe Int
tr3 : Pipe Int -> Pipe Int
sink : Pipe Int -> ()

That would of course be for functions with no side-effects. Let's assume that tr1, tr2 and tr3 have no side effects, and src reads from a file and sink writes to a file. The type signatures of trN do not have to change, but src and sink might become:

src : [FileRead] (Pipe Int)
sink : Pipe Int -> [FileWrite] ()

and the code would simply be:

sink (tr3 (tr2 (tr1 src)))
or perhaps nicer:

sink ((tr3 . tr2 . tr1) src)

or using Haskell's arrow notation:

sink ((tr1 >>> tr2 >>> tr3) src)

Edit: Even better would be to have a pipe as a type-class, so that a pipe can define an interface. Then you would have:

src : Pipe a => a -> [ReadFile] a
tr1 : Pipe a => a -> a
tr2 : Pipe a => a -> a
tr3 : Pipe a => a -> a
sink : Pipe a => a -> [WriteFile] ()
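A dynamic sketch of this idea in Python: each stage declares its effects, composition takes the union, and a caller can then reject a pipeline whose effect set violates a constraint. The `effects` decorator and the effect names are hypothetical, purely for illustration:

```python
def effects(*names):
    """Attach a declared-effects set to a function (a hypothetical construct)."""
    def wrap(f):
        f.effects = frozenset(names)
        return f
    return wrap

def compose(*stages):
    """Compose stages left to right; the composite's effects are the union."""
    def run(x):
        for s in stages:
            x = s(x)
        return x
    run.effects = frozenset().union(*(s.effects for s in stages))
    return run

@effects("ReadFile")
def src(_):
    return [1, 2, 3]

@effects()
def tr1(xs):
    return [x * 2 for x in xs]

@effects("WriteFile")
def sink(xs):
    return None

pipe = compose(src, tr1, sink)
print(sorted(pipe.effects))              # ['ReadFile', 'WriteFile']
assert "AllocHeap" not in pipe.effects   # a caller-imposed constraint holds
```

A real effect system would do this check statically in the function signatures, as in the `[FileRead]`/`[FileWrite]` annotations above; the runtime version just shows the bookkeeping involved.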

re: FP is an incomplete incompetent model

Functional programming can model those pipelines, no problem. Cf. presentation by Rúnar Bjarnason on machines and the associated library by Bjarnason and Kmett.

FP is a 'complete' programming model in the sense that it can express both arbitrary programs (with arbitrary effects models) and the arbitrary machines to interpret them.

For 'competence' the main question is whether we can achieve competitive performance. Can we leverage a GPGPU to accelerate graphics and vector processing and machine learning? Can we leverage multiple CPUs? How about massive clouds and mesh networks? Can we model incremental inputs and reactions to an input? Can we work effectively with 'big data', e.g. multi-terabyte log-structured merge trees as first-class values?

Pure functions can do these things. For example, accelerate and lambda cube support GPGPU processing. However, such features require either special support from a compiler/runtime (which is absent with unfortunate regularity) or a library hiding unsafePerformIO under the hood to integrate external resources (which is ugly, and frequently involves painful configuration issues). Despite those caveats, I believe FP languages will gradually bridge all relevant performance gaps.

(edited for clarity:)
Separating 'performance' features from 'effects' models seems worthwhile even outside purely functional contexts that would force the issue. Like any separation of concerns, this separation can improve simplicity, security, and accessibility. Examples: If accelerated graphics processing is available as a pure function, we could easily have spreadsheets that present ad-hoc images in cells without worries about side-effects. If databases are first class values rather than requiring sophisticated interactions with a network or filesystem, we can more directly model experiments, tests, and forking/backtracking whole databases.

Effects and performance

I am not sure I get your comment about separating effects models. Personally I want to write imperative code for readability, but have the same advantages as functional languages for type safety and composability. I want to be able to write an efficient string manipulation module that is imperative internally and explicitly manipulates memory for performance, but presents a functional API to the rest of the program, ideally with meta-level proofs of soundness.

Functional languages will never bridge the performance gap in my opinion because optimisations are domain specific. The fastest way to manage memory for a matrix library is different from the fastest way to manage memory for strings. Each application domain needs low level access to write efficient code and optimisations specific to that domain. Hence my objective of a language that lets you write precise imperative code for expert-written libraries, but lets the library user operate in a safe functional way that doesn't need to manage memory.

re: functional will never bridge the performance gap

I understand your opinion on FP performance is a common one. But I've never seen a good argument for that belief. I just see pessimism or post-hoc rationalization for one's commitment to another programming paradigm.

The argument you present above is no exception.

There is nothing about functional languages that prevents domain-specific optimizations. I can support any specific domain I please, e.g. optimize matrix structures and functions to manipulate them. If I want to model efficient imperative-style mutation and explicit memory management, I can introduce a notion of unique/linear vectors.

If I need to optimize a wide variety of domains, I can develop a general system of performance annotations to mark types and functions, and support a modular compiler. There is more than one way to "low level access" and optimize new domains.

Conversely, there is nothing about imperative languages that guarantees precise control over representations, memory management, or ability to usefully hand-implement domain specific optimizations. This is aptly demonstrated by languages such as Basic, Python, and Ruby.

AFAICT, you're conflating too many concerns to present a coherent argument about the potential future limits of performance in functional languages.

That said, I won't discourage you from pursuing your vision of functional-imperative programming. It is certainly a low risk approach, depending only on decades old well proven techniques.

Reinventing with added complexity

You are right, you could provide a library of domain-specific optimisations by perhaps allowing new rewrite rules to be added to the compiler. Let's think about this: now, instead of writing in one language, you are writing in two: a functional implementation in one, and a bunch of rewrite rules in another. This seems a more complex and worse solution than a simple language that provides access to memory layout and mutable bits. It also seems harder to debug, because you cannot see the final code, as you cannot write the result of the optimisations in the source language. This means you also have to understand the target language. This approach seems worse the longer I think about it.

Performance annotations are similar: why not write what you want to do directly and simply, instead of writing what you want and then adjusting it with magic compiler tweaks to control how it is implemented? This seems like trying to drive a car by shouting instructions to a blind man at the wheel.

The argument about Ruby and Python is a straw man, as I am sure you realise.

What you see as conflation, I see as a web of interconnected reasons. That you see it as conflation may be because you don't understand the problem, although it's more likely I am just not explaining myself very well.

In practice, performance

In practice, performance features tend to cover a wide variety of domains. The GPU style of computing, for example, is good for a lot more than graphics, and has been extended to cover machine learning and physics simulations. Even your own position is essentially that "we can hand-implement a lot of domain optimizations given an efficient model for mutable vectors of bits" (of which there are several suitable for FP). Ultimately, it doesn't take tweaks specific to every little domain to get competitive performance.

As far as "writing in two languages" goes, that's not a significant issue. It is not at all uncommon for functional programmers to model a wide variety of mini languages for specific problems. Separating the program from its implementation is convenient outside of performance reasons. I do not consider this separation inherently a "more complex or worse solution" than programming in a more direct style.

This seems like trying to drive a car by shouting instructions to a blind man at the wheel.

Performance annotations can be robust, precise, and predictable. In the sense of not merely being "suggestions", but instead being closer to "language extensions" oriented around performance.

In any case, with regards to whether FP will gradually bridge the performance gaps, concerns of convenience or complexity seem much less critical than simply having a straightforward approach that is known to be workable. Integrating features like par/seq parallelism or GPGPU acceleration of vector math into the language IS straightforward and known to be workable.

Let's not shift goal posts too much here.

The argument about Ruby and Python is a straw man

It's a fine, constructive counter to: "imperative programming gives me performance". I grant that might not be your argument. You seem to be tossing a lot of features under the 'imperative' banner that have nothing to do with expressing programs in the imperative mode. Including efficient models for update-able bit vectors, precise control of representation, and avoidance of GC.


Indeed, but then what is the difference between functional and imperative programming? I think functional programming has several desirable features, but when you include monads and mutable references you have an imperative embedded DSL in any case. It is possible to write pure functions in an imperative language too.

So functional languages can close the gap, but the code will either look like imperative code, or a functional description will get transformed into imperative code by annotations or rewrite rules. In the latter cases what you have is an obfuscated imperative program.

Imperative programming is

Imperative programming is differentiated from purely functional programming, in a technical sense, primarily by aliasing of state. One procedure can update state and another can observe this update without any explicit communication on a call-return path. This becomes the usual basis for side-effects. And a source of implicit entanglement with the environment.

FP can, of course, model imperative aliasing. But must do so explicitly, e.g. using a reference model within a state monad.

OTOH, if what you want is just efficient mutation, you don't need aliasing or effects. You could use affine types or unique representations so your compiler/interpreter can update a representation of data in place without risk of another function observing the change.

Regarding "a functional description will get transformed into imperative code (...) what you have is an obfuscated imperative program", I think by that logic one should conclude all high level programs are ultimately obfuscated machine code. I nonetheless enjoy the properties of pure language during the bulk of software development and maintenance.

Imperative programming is ..

"One procedure can update state and another can observe this update without any explicit communication on a call-return path"

Yes, good definition, and explanation of what's wrong with it (traditional imperative programming), except your statement is too specialised in saying "call return path", since models other than master/slave exist.

But FP is just as bad. Remember lambda doesn't specify order of evaluation!

But with fibres, there IS explicit communication. As mentioned earlier, consider a fibre that reads, acts, then reads again in a loop. You can make a logical statement: the action occurs *between the reads* and this is well defined. It's not lambda calculus: the control flow is explicit.

But wait! Weren't we listening to Wadler talk about Curry-Howard? Isn't a proposition such as the one I just made about read / then act / read again .. isn't that (isomorphic to) a TYPE?


chip tr[D,C] (f:D->C)
  connector io
    pin inp: %<D
    pin out: %>C
  while true do
    var x = read io.inp;
    var y = f x;
    write (io.out, y);
  done

This is a function-adaptor transducer that converts a function of type D->C into a fibre which reads a D on the inp channel, writes a C on the out channel, then loops. The channels are typed, but this is only the data type; I do not know how to specify the control type. (Yes, that's real working Felix code.)
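A rough Python analogue of that adaptor, using a coroutine that is sent values on its input and pushes results to its output (an analogy only: Python's `send` does not give the two-sided synchronisation of a Felix channel, and nothing here is Felix's actual semantics):

```python
def chip_tr(f, out):
    """Adapt function f into a coroutine: read x, write f(x), loop."""
    def fibre():
        while True:
            x = yield            # 'read io.inp'
            out.append(f(x))     # 'write io.out' (the list models the channel end)
    g = fibre()
    next(g)                      # prime: advance the coroutine to its first read
    return g

outbox = []
t = chip_tr(lambda d: d * 10, outbox)
for v in [1, 2, 3]:
    t.send(v)
print(outbox)  # [10, 20, 30]
```

As in the Felix version, the function `f` stays pure; only the adaptor knows about channels and control flow.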

The endpoints of control types are EVENTS such as "read on io.inp", "write on op.out", loop. What's important is that if you join two of these in a pipe, the CONTROL types have to agree somehow, not just the data types.

If you consider a loop which writes first, then reads, and try to plug it into a read-then-write chip in pipeline fashion, it deadlocks, which causes both fibres to suicide provided the connecting channels are otherwise unreachable (due to the garbage collector). So in some sense the combined types equal the proposition "false".

So, the fibre above: it's a function, it's pure, it has a data type. It's also a procedure, and it has a definite control type as well, although I do not know how to express it notationally.

The key to the model is that matching reads and writes are synchronisation events. Order of operation is only defined between such events. The events are totally ordered. In Felix the order is deterministic, but in a more abstract model it is not (on synchronisation which of the two meeting fibres goes first?)

If you throw effects into the loop above, or some more complex chip with multiple I/O channels, you can still reason about the chip and how it can be composed with others.

The Felix compiler actually automates one such piece of reasoning: it gives a warning if all the channels connected together are input ends, or all output ends.

re: call return path, fibres

A 'call return path' may be implemented in a wide variety of ways: tail calls as gotos, continuation passing styles, pipelines, even physical hardware. When I say "call return path", I'm not referring to any call-return implementation so much as the simple dataflow path it represents.

In a pure style, all effects are modeled on this path as values. Even something simple like state is modeled by including a representation for the 'next' state among the return values.

FP is just as bad. Remember lambda doesn't specify order of evaluation!

As mentioned earlier, consider a fibre that reads, acts, then reads again in a loop. You can make a logical statement: the action occurs *between the reads* and this is well defined. It's not lambda calculus: the control flow is explicit.

Pure data flow can represent control flow without loss of expressiveness or generality. For example, by expressing a conditional result, you can know the condition is computed and observed before the result. Further, in absence of side effects, order of evaluation is not relevant outside of performance considerations. (Consequently, controlling evaluation order is a convenient target for performance annotations in pure languages - laziness, parallelism, sequencing.)

AFAICT, you seem to be assessing pure FP as "bad" under a lens that doesn't apply. The whole "reads, acts, reads" loop and "action between reads" discussion is only sensible in a context where you have side effects, actions outside of returning some data. I understand you want to promote Felix-like fibers, but misrepresenting the alternatives isn't the right way to go about doing so.

with fibres, there IS explicit communication

The presence of explicit communication does not imply there is no implicit or aliased communication.

If you model fibres in a purely functional system, you'll make explicit all the communications involved. Primarily, this would require making explicit the communication with 'scheduler'. For example, a fibre awaiting input on a given pin might be modeled by a value of form: `(In pinId, data → fibre')` where `fibre'` represents the state for the fibre after reading.
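A minimal sketch of that encoding in Python: a fibre is a plain value that is either finished, awaiting input on a pin, or offering output on a pin, and a small scheduler steps a pair of fibres joined by one channel (all names and the three-state representation are my own):

```python
# Fibre states as plain values:
#   ("done", result)
#   ("in",  pin, k)        where k : data -> next fibre state
#   ("out", pin, data, k)  where k : () -> next fibre state

def run(a, b):
    """Step two fibre states until neither side can make progress."""
    while True:
        if a[0] == "out" and b[0] == "in" and a[1] == b[1]:
            _, _pin, data, ka = a
            a, b = ka(), b[2](data)     # the write/read rendezvous
        elif b[0] == "out" and a[0] == "in" and b[1] == a[1]:
            _, _pin, data, kb = b
            b, a = kb(), a[2](data)
        else:
            return a, b                 # both done -- or deadlocked

def writer(items):
    # a fibre that writes each item on channel "ch", then finishes
    return ("done", None) if not items else \
        ("out", "ch", items[0], lambda: writer(items[1:]))

def reader(n, acc=()):
    # a fibre that reads n items from channel "ch", then finishes
    return ("done", list(acc)) if n == 0 else \
        ("in", "ch", lambda d: reader(n - 1, acc + (d,)))

a, b = run(writer([7, 8, 9]), reader(3))
print(a, b)  # ('done', None) ('done', [7, 8, 9])
```

Two fibres that both wait to read, e.g. `run(reader(1), reader(1))`, come back still stuck in their "in" states, which is the deadlock-as-"false" situation described in the parent comments.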

Call return path

I'm confused. You're saying a call return path can be "implemented in a number of ways" and say you're not interested in that but in the simple data flow it represents.

But channel I/O is unlike function calls. With functions you send an argument along with control, and get back a result with control. It's a master/slave (or client/server) system.

With channels, you never get back a result. It's not functional. It's procedural. Meaning, the computational unit is a continuation.

Synchronous channels and fibres are a special case of CSP (basically CSP without concurrency). In which I see no lambdas floating about.

And so talk of "side effects" is in some sense insane. Procedures have effects. They're not "side effects". Side effects are evil things that confuse functions. Fibres aren't functions. Procedures sequence effects. That's what they're FOR.

You do get the idea of duality?

I say FP is incomplete, not bad! It is not symmetrical. A good PL must be symmetrical. You have to have peer/peer operations. Functions are slaves so they are not enough. (I'm not against functions!)

re: channel IO, call return path, effects

With functions you send an argument along with control, and get back a result with control. It's a master/slave (or client/server) system.

In the presence of parallel evaluation, call by need, etc.. the idea of control flow doesn't really fit. Instead you might immediately get back a placeholder for a 'future' result, whose computed value you may later wait upon when you need to observe it. This doesn't change the dataflow, however.

talk of "side effects" is in some sense insane. Procedures have effects. They're not "side effects". Side effects are evil things

The phrase "side effects" in computation doesn't mean the same thing as "side effects" in medicine. It isn't about 'intended' vs. 'unintended' effects. I like to think of it this way: if I draw a diagram representing a computation, I might represent a dataflow or a control flow or a state machine. For many programs, I will still have state or communication effects that do not fit within the framework of the chart and instead are annotated to the side of it. Hence, "side" effects.

I would not call side effects 'evil'. (Moral capacity for evil seems to be well beyond any computing primitives.) But regardless of whether the side effects are intended or not, obvious or hidden, they can greatly hinder reasoning and reusability and a number of other nice things.

In context of your fibres, you might model your channel-based communications as a dataflow (if it's simple enough). But you'll still have plenty of effects outside this flow, e.g. to integrate your sources and sinks. Those are side-effects.

You have to have peer/peer operations.

And FP can 'have' them. Via modeling.

The phrase "side effects" in

The phrase "side effects" in computation doesn't mean the same thing as "side effects" in medicine. It isn't about 'intended' vs. 'unintended' effects.

On consideration, I don't altogether agree. There really is a normative implication in the computational terminology "side effects". The terminology is based on a normative assumption that certain kinds of structure are the primary way things should be understood, and things that don't fit that model are therefore "to the side". It's remarkably easy to decide to look at the world in a certain way and then become gradually so acclimated to it that you're no longer able to see anything peculiar about your perspective on things, and start to think your perspective is obviously the most natural. I recall, once upon a time, coming across remarks by someone who had become so acclimated to W-grammars that they seemed genuinely baffled why anyone would think W-grammars difficult to work with.

Value-oriented programming

IMO, the name 'functional programming' really puts too much emphasis on functions, when the more important idea is focusing on immutable values. Functions are certainly important because parameterization is fundamental, and a function is just a parameterized value. But on the other hand it's not so important (IMO) that every value be defined in terms of functions. You can take a value-oriented approach to procedures without representing them as functions. Similarly, concurrent procedures can be modeled as values that produce one of a number of possible outcomes.


Is that really the case? If you program only with immutable values, all your code will have side-effects, namely out-of-memory due to exhaustion of stack or heap space. With procedures that return results by mutable reference you can have 'pure' side-effect free code that cannot fail at runtime.

For me the important property is 'regularity': that is, call the procedure or function with the same arguments, and you get the same results. The calling convention by which results are returned should not matter. Input arguments can be copied or passed as immutable reference, in/out arguments as mutable references, and out arguments as uninitialised mutable references, as long as you get the same result for the same input.

re: out of memory errors

If you program only with immutable values, all your code will have side-effects, namely out-of-memory due to exhaustion of stack or heap space.

That's simply untrue.

First, 'immutable value' doesn't mean the memory for representation of that value cannot be recycled, whether by GC or update-in-place when we know the last reference to a value's representation in memory has been lost.

Second, bounded memory is a logical property of programs or algorithms. It isn't really affected by the whole GC vs. manual memory thing. GC bounds generally won't be as tight, as some allocation buffer is needed for GC efficiency. But it's still bounded. For example, if you can bound 'live' data to 100MB, then a Cheney collector with 600MB address space can trivially make a hard real-time guarantee of 66% recovery of memory on every pass, and 200MB of allocation 'work' per pass (and zero risk of memory fragmentation).
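The arithmetic in that example checks out directly (a trivial sketch; the 600MB address space and 100MB live-data figures are the ones from the comment above):

```python
address_space = 600                   # MB available to the collector
semispace = address_space // 2        # Cheney: two semispaces of 300 MB each
live_bound = 100                      # MB, assumed bound on live data

work_per_pass = semispace - live_bound    # 200 MB of allocation 'work' per pass
recovery = work_per_pass / semispace      # fraction of a semispace recovered

print(work_per_pass, int(recovery * 100))  # 200 66
```

Since live data never exceeds 100MB, each evacuation into a fresh 300MB semispace is guaranteed to leave at least 200MB free, giving the hard real-time bound claimed.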


I think it's clear. Consider the machine code for a function that adds two values. If the arguments are immutable the function must allocate space for the return value. If there is no memory left it will have to fail. I don't see your argument countering this.

Consider an interrupt service routine that must not fail. If we pre-allocate the space for the arguments, return values, and working space and pass this into the routine by reference then we are safe. This can only be done with mutable references.

re: demonstration

Your argument becomes circular upon the phrase: "if there is no memory left". Yes, if we're out of memory, we're out of memory. While circular arguments are certainly 'clear', they're also pointless.

Adding two numbers only requires space for three numbers. Thus, we can bound how much memory is required: three number slots. If you want, you can allocate the third slot ahead of time and pass it to the machine code for 'add'. (That's a common case for return values anyway.) If not, you can allocate the third slot as needed. The bound doesn't change either way.
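The 'third slot allocated ahead of time' idea can be sketched in Rust (the `add_into` name is my own, hypothetical):

```rust
// The output slot is allocated before the call; the addition itself
// performs no allocation and therefore cannot fail for lack of memory.
fn add_into(a: i64, b: i64, out: &mut i64) {
    *out = a + b;
}

fn main() {
    let mut slot = 0i64; // the "third number slot", reserved up front
    add_into(3, 4, &mut slot);
    println!("{}", slot);
}
```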

If you have ISRs with bounded memory, you can include the worst-case for ISRs with the program's memory bound. You may preallocate space for the ISR activation records, or not, either way won't change the bound.

So long as we have bounded memory requirements - whether it's three fixed-width numbers or three hundred megabytes - we can simply provide that much memory and henceforth make a strong guarantee against out-of-memory errors. An implementation language might make a difference for space efficiency (e.g. 200MB vs. 300MB for a program) and in some contexts that matters. But language, even use of mutable references, won't change whether an algorithm or program is fundamentally bounded space or not.

Your belief that "this can only be done with mutable references" is simply wrong. Maybe you should take a look at the work on statically allocated systems by Mycroft and Sharp, and their functional language in particular to get a better grasp on the nature of bounded memory and how the whole 'immutable value' thing is irrelevant.

Hidden mutable references

That's an interesting paper. However, your argument about bounds is also circular: you are saying that if there is enough memory, you don't have to worry (because there is enough memory). This neglects multi-processing systems where more than one program is running at once: another program making similar bounds assumptions may have just used all the free memory. So we are looking for the guarantee that a block of code does not heap-allocate, or, if we want stronger guarantees, does not stack-allocate, as that is the only guarantee in a multi-processing system that can prevent an out-of-memory condition in a block of code (with interrupts we may not know how much stack is left, and need to avoid a double bus fault). So I think we can agree that the only way to prevent this is to pre-allocate the storage needed before the critical section.

So you have to pre-allocate mutable storage for the results and intermediate values, and you still have to pass a mutable reference to that memory into the function that is doing the work.

The version is very

This version is very different from what I responded to above. It isn't a duplicate. Edit: And now I'm also annoyed with whoever deleted my other response.

My argument about bounds isn't equivalent to "if there is enough memory, don't worry". There are plenty of computations that DON'T have bounded memory requirements, but for which you might have enough memory in practice. For the class of bounded memory computations, we can prevent "out of memory". For computations that we cannot place an upper bound on memory use, we generally cannot prevent "out of memory". Mutable refs don't make a relevant difference to determining which class a computation is in.

Multi-processing doesn't change this. (Just allocate and memlock a sufficient arena up front, e.g. as a runtime heap configuration option. Or just control the set of processes on your system.) Efficiency/waste doesn't matter in context of this specific issue, either, so long as it's bounded (though, memory fragmentation can be unbounded). Your assumption that you need to prevent allocation is wrong - to prevent "out of memory" it is sufficient that the allocation succeed. Your repeated attention to interrupts and such seems irrelevant to the general case.

Whether or not you agree or even understand, I'm done arguing for now.

Understanding and Logic.

I edited my response as I wanted to keep more on the topic of this sub-thread, and not introduce distractions. Due to an error in LtU, earlier versions of my post were left behind; however, I don't have permission to delete anything. I apologise for mentioning interrupts; I don't think it affects the main argument, and it is distracting from the key points.

I agree with this statement 'For computations that we cannot place an upper bound on memory use, we generally cannot prevent "out of memory"'. So the first requirement is that pure code must have bounded memory use.

I agree with this statement too: "to prevent 'out of memory' it is sufficient that the allocation succeed." However, you cannot guarantee that the allocation will succeed. There may be sufficient memory when you start the program, but some other process may do a large allocation, meaning that the program can fail in the middle when it tries to allocate. This is clearly a side-effect, and as memory allocation can always fail (which is a side effect), pure code must not allocate. It's that simple (and the functional language you linked to does exactly that). If memory allocation could never fail, then 'malloc' would never return a null. So it is clear the second requirement is that pure code must not allocate.

Put another way, memory allocation is a side effect. You could argue this another way, in that measurable free memory is the side effect. So if a function allocates memory and we can measure memory, then it is 'leaking' information by the side-effect of changing the amount of free memory.

So the final step is purely one of logic. If pure code has no side-effects, then pure code must use bounded memory and not allocate. If a block of pure code cannot allocate then it must be passed a reference for its working space and its results, and by necessity this reference must be to mutable memory, or the pure function will not be able to store its intermediate values or results at all.

When you refer to pre-allocating an 'Arena' you seem to be admitting as much? How can you pass the arena to a function without mutable reference? The result of a pure function must only depend on the values of its arguments, so if we do not pass the arena by mutable reference, we have no way of accessing the arena from within the pure function.


Evaluation of pure code always has side-effects. It consumes space, takes time and energy and generates heat during evaluation, etc.. Potential for allocation failure is not too different from failure on a time quota, battery failure, CPU temp alarm, nor from an impatient human using Ctrl+C to kill an evaluation that takes too long. But pure code does not express or observe these physical computation side effects. This is an important difference from effectful code, where something like printing to console is an essential (and generally explicit) part of the program's behavior.

Elimination of evaluation effects is NOT generally considered a prerequisite for purity of 'the code' being evaluated. Though such effects do limit the scope of equational reasoning. We reason about equivalence between values or behaviors of expressions as observable within the program, not equivalence in heat profile. (While 'purity' is somewhat informally defined, the simple laws for local reasoning is why we care about it. One could formally define purity in terms of such laws.)

The arena I mentioned above can be configurably allocated by the runtime, which subsequently allocates from the arena. You seem to be assuming that runtime allocation is subject to the whims of the OS, and hence affected by multi-processing. But this isn't always the case. Even 'malloc' can work from preallocated arenas if you choose to implement it that way. In any case, I wasn't implying the arena needs be exposed as a first class object within the program.
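A minimal sketch of such an arena-backed allocator in Rust, assuming a simple bump-allocation scheme (the `Arena` type and its API are my invention for illustration, not anything from the thread):

```rust
// A bump allocator working from a preallocated arena: after startup,
// "allocations" either fit in the arena or are rejected up front --
// no OS allocation, hence no surprise OOM from other processes.
struct Arena {
    buf: Vec<u8>, // reserved once, at startup
    used: usize,
}

impl Arena {
    fn new(capacity: usize) -> Arena {
        Arena { buf: vec![0; capacity], used: 0 }
    }

    // Returns a slice of the arena, or None if the bound is exceeded.
    fn alloc(&mut self, n: usize) -> Option<&mut [u8]> {
        if self.used + n > self.buf.len() {
            return None;
        }
        let start = self.used;
        self.used += n;
        Some(&mut self.buf[start..start + n])
    }
}

fn main() {
    let mut arena = Arena::new(1024);
    assert!(arena.alloc(512).is_some());
    assert!(arena.alloc(512).is_some());
    assert!(arena.alloc(1).is_none()); // bound enforced, no abort
    println!("ok");
}
```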

Purity and Functions

If code has side-effects it is not pure. So I guess we have to stop being black and white, and admit that no code is ever pure. I guess we need to talk about more or less pure. So code that does not have a memory-allocation side-effect is more pure than code which has such a side-effect (if all other side-effects are the same).

If the arena is not a first class object then the result of the function is depending on something that is not an argument, and that doesn't seem functional at all.

I guess you could consider the arena an implicit reference to mutable memory? It's still a reference to mutable memory, whether implicit or explicit.

No return vs. Dependent result.

If a function fails, e.g. due to exhausting a time or space quota, you don't get a different result. You just don't get a result.

The result of a pure function does not depend on anything but the arguments. But whether you manage to compute the result ultimately depends on a mechanical, physical computation process with all that entails.

If you argued 'bounded memory' computations to be more pure in some useful sense, controlling against a particular divergence 'effect', I could agree with that. But since you insist on 'does not use allocation' as your metric, I instead just think you're confusing essential semantic properties with specific implementation details.

Failure and Function

Surely failure is a different result? If I try and add 3 and 4, and get "out of memory exception" instead of "7" it is a very different outcome.

I don't think out of memory is a specific implementation detail. In fact, I think "implementation" is part of the problem. You don't implement a function; a function is what it is. You can change it into a different function, but then it's a different function.

This is what category theory says: you have objects and morphisms (values and functions), and you can have functors that map those to objects and morphisms in another category. So there is a category of "computer programs" which has functions, and those have to deal with side effects like memory use. You are working in the category of "hypothetical programs" or something like that. Now whilst you could automate the mapping, the problem is you cannot specify what to do in the case of failure. In the category of hypothetical programs memory cannot be exhausted, so how can you say how to recover from such a condition? Personally I want to write computer programs that run well on computers, and can cope with the full range of behaviours in the category of "computer programs".

Purity would seem to be those functions that can be mapped from the category of "hypothetical programs" to the category of "computer programs" with no change in behaviour. So a function that cannot run out of memory (as it is passed a mutable reference for its results) is more pure than one which allocates, as it is changed less by the functor (no allocations need to be inserted by the functor).

re: functions and failure

Surely failure is a different result? If I try and add 3 and 4, and get "out of memory exception" instead of "7" it is a very different outcome.

Calling that outcome a result of the `(3 + 4)` expression would be misleading. E.g. we don't evaluate `(6 * (3 + 4))` to `(6 * Exception:out-of-memory)`. Further, even attributing an out of memory condition to a specific sub-computation is incorrect. Rather like attributing the camel's broken back to the final straw.

I don't think out of memory is a specific implementation detail.

To clarify, it's your specific approach of avoiding an out-of-memory condition BY MEANS OF "not allocating" that I call an implementation detail.

You don't implement a function, a function is what it is.

There are many ways to 'express' any given function. Even something trivial like the identity function on integers:

λx.((λx.x) x)
λx.((x * 1) + 0)
λx.((x + 1) - 1)
λx.if (x == 1) then 1 else x

I believe a specific expression of a function can reasonably be described as an 'implementation' for that function.

Purity would seem to be those functions that can be mapped from the category of "hypothetical programs" to the category of "computer programs" with no change in behaviour.

That is not how functional programmers define purity. Also, it seems you're attempting to re-define "computer programs" to fit your argument and agenda. I can't be bothered to argue definitions with you.

I want to write computer programs, that run well on computers, and can cope with the full range of behaviours in the category of "computer programs".

For effectful programs, non-localized errors like exhausting a time or space quota generally do not admit reasonable coping behavior. If the quota can be localized to a specific subprogram, that can help, but even then one must be concerned about incomplete interactions with other subprograms through any shared state or environment.

Coping in the absence of effects is relatively straightforward. Provide more resources (or optimize code) and recompute. Cache/snapshot long-running computations to continue them more efficiently. Etc.. But this does come at the cost of an indirection, since we cannot directly express the coping within the code.

I am not inclined to assume either approach is superior to the other. But I personally enjoy not having to write bunches of code outside the happy path to deal with potential non-local errors in my functional programming.

Clarity of definition

In Haskell you can handle the exception by performing the computation in the error monad. Then you really do have "6 * OOM = OOM". So your reductio ad absurdum argument fails, because it is not an absurd thing to do at all.

For your definitions of "identity" I would say you have many different functions. You are relying on the data type being an integer ring; there are types for which these are not identities, for example the type of negative numbers. Only the first function is the identity for all types. Therefore these are different functions.

I think I am providing clarity of definition in the context of category theory. What well defined definition of purity would you give? Using a common definition:

A pure function is a function where the return value is only determined by its input values, without observable side effects.

So as I can observe free memory, that is an observable side effect, therefore any function that allocates is not pure.

In Haskell you can handle

In Haskell you can handle the exception by performing the computation in the error monad. Then you really do have "6 * OOM = OOM".

Nope. Potentially excepting the `IO` monad (which is able to reflect on the runtime), you cannot catch an 'out of memory' error in an error monad.

there are types for which these are not identities

I did specify "identity function on integers". Noting that some expressions of that function might better generalize to other data types is irrelevant. I could just as easily have made my point using multiple expressions of Fibonacci function or something else with obviously specific data types.

I would say you have many different functions.

In context of functional programming, the word 'function' describes the more or less mathematical mapping from a domain of values to a codomain. There are many ways to represent such a mapping, and hence to express a function.

If it is your goal for people to misunderstand, you are of course free to use 'function' to mean whatever you want.

So as I can observe free memory, that is an observable side effect, therefore any function that allocates is not pure.

I wouldn't be surprised if, on occasion, mathematicians have argued that mathematics is impure because mathematicians need to eat and poop and write on dead trees and so on. Purely functional programming can only ever be as 'pure' as mathematics.

I think the standard response is to clarify a distinction between the math and the physical/social/economic process of computation. The math is pure because it does not observe or interact with the world. There is no "out of dead trees" or "too hungry, gone home" exception within mathematical expressions. Likewise, a runtime's "out of memory" exception is not properly part of our purely functional program.

I did touch on this distinction earlier.

In any case, I made some effort to clarify my terminology. But it seems to me that this has only lead to arguing about terminology. At this point I'm just calling our discussion a wasted effort. Good luck with your project.

Standard Definition

Yes, it seems you don't even agree with the standard definition of 'Pure', so there is no point in continuing. Your point about mathematics is "pointless" as all mathematical functions are pure, so why even bother having a definition. The same function can be pure in the category of mathematical functions which are all pure and have no side effects by your own explanation above, or can be impure in the category of computer programs. The problem is you are arbitrarily shifting to the category of mathematical functions when the function is being interpreted by a computer and hence in the category of computer programs.

So as I can observe free

So as I can observe free memory, that is an observable side effect

Observing the free memory is the effectful function because it's non-deterministic. Allocating memory in languages with GC is not effectful. Allocating is only an effect in languages which aim to explicitly track resources.

Similarly, observing the clock is a non-deterministic/effectful function, even though using CPU time to execute code is not an effect.


I don't think something is a side effect only if you can observe it in the language itself. In any case, you can observe memory by having a function that uses more and more memory until it is exhausted, and you can do this differentially, with the function whose memory use you want to observe included or not.

Effectively this is the programming equivalent of the philosophical question "does a falling tree make any sound if there is nobody there to hear it?".

Also observation is the opposite of an effect. An effect is an output from a function, not an input.

In any case you can observe

In any case you can observe memory by having a function that uses more and more memory until it is exhausted

Except you can't observe this within the program because it aborts with an OOM error, so the side-effect doesn't exist as far as the program is concerned. Only in languages in which you can recover from this error might this be a meaningful effect.

Also observation is the opposite of an effect. An effect is an output from a function, not an input.

Exactly, it's the non-deterministic output of a function. An aliased mutable cell can change at any point, making the getter non-deterministic.

You've long advocated for local mutable variables, but since those mutations aren't visible to callers or callees, there should be no effect annotation on the function signature, so it's pure. Effects must be observable to be relevant.

If all effects are encapsulated in a way that isn't observable, then your program is pure. What other possible meaning could we ascribe to a program in Haskell that runs on imperative hardware? Clearly the runtime performs all sorts of effects, but the effects are not observable to Haskell programs, so Haskell programs are pure (modulo modelling effects via monads of course).
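The point about unobservable local mutation can be illustrated with a short Rust sketch (the function is my own hypothetical example): despite the mutable accumulator, the result is determined by the argument alone, so by the "observable effects" criterion above it counts as pure.

```rust
// Internal mutation that no caller can observe: the mutable local
// `acc` never escapes, so the function is deterministic in `n` alone.
fn factorial(n: u64) -> u64 {
    let mut acc = 1u64;
    for i in 2..=n {
        acc *= i; // mutation invisible outside the call
    }
    acc
}

fn main() {
    assert_eq!(factorial(5), 120);
    println!("{}", factorial(5));
}
```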

Effectively this is the programming equivalent if the philosophical question "does a tree falling make any sound even if there us nobody there to hear it".

No one disputes that it makes a sound, just like no one disputes that mutation is an effect, but the only relevant question is whether anyone sees the effect (or hears the sound). If no one will ever see the effect, then it's indistinguishable from a pure computation that produced the same result, and so you might as well treat it as pure.

all abstractions are leaky, some more so than others

e.g. "we programmers" tend to be able to not think too often about things like cpu cache, memory access patterns, tcp/ip flow control and sliding window retries, tunneling of electrons in gallium arsenide, how much our computer is heating up the world and advancing the heat death of the universe, on-cpu reordering of instructions, maintenance backdoors on our cpus that give other's underlying access and control of what we thought was "ours", how many trees it took to print out that gmail note, how many photons it took to print out that gmail note, etc.

by which i am agreeing in that if it isn't something we can or actually bother to really observe, then at some level we are saying it doesn't matter. or we are desperately trying to assume we can say it doesn't matter.


Then the Ariane rocket explodes because an error in a pure function fed bad data into the guidance computer, because everyone forgot about side effects. The large fireball would seem to be easily observable, despite protestations that the language does not allow side effects. (Edit: this isn't quite what happened, but it's similar, and I am sure you get my point.)

Who or what is the "Observer"? If I run a Haskell program and it crashes because it runs out of memory, I can easily observe that side effect. Other programs running on the same computer can observe the side-effect. Like the tree falling, it still happens whether Haskell acknowledges it or not.

Simplicity begets correctness

There's a reason why our axioms for the real numbers don't attempt to model "calculator out of memory". It would be absurd to insist that all rocket scientists use some updated math that accounted for the possibility of overflow. It's a similarly bad idea to code in a style that requires attention to memory concerns in every line of code.


This discussion is evidently at the boundary where functional programming meets error-handling. But error handling seems to be a yawning gap in our ability to express programs. I recall a conference keynote talk years back where the speaker (best guess, Kiczales) said the vision inspiring AOP was about how much more cleanly an algorithm could be described if only you could separate out the error handling so you didn't have to muddy the central description of the algorithm with it; the nowhere that AOP has gone seems a commentary on that. I tangled with error-handling myself when I devised guarded continuations for Kernel, inspired by facets of Java exception handling and Scheme continuations — and the lesson I took away from the experience was that we don't have a clue how to handle errors.

Not Absurd

Actually, I think that is exactly what they should do. For example, Rust accounts for overflow, and you have to use 'wrapping' operators (which are not the normal mathematical symbols) if you want to allow overflow/underflow.
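For the concrete Rust behaviour: wrapping must be requested by name (`wrapping_add`), or the overflow can be surfaced as a value (`checked_add`); a plain `+` on `i32::MAX` panics in debug builds. A small illustration:

```rust
fn main() {
    let x = i32::MAX;
    // Wrapping is explicit, via a differently named operation:
    assert_eq!(x.wrapping_add(1), i32::MIN);
    // Or the overflow can be returned as a value to handle:
    assert_eq!(x.checked_add(1), None);
    println!("{}", x.wrapping_add(1));
}
```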

Then the Ariane rocket

Then the Ariane rocket explodes because an error in a pure function causes bad data to be fed into the guidance computer because everyone forgot about side effects.

Type soundness for a type system ensures that an expression returns the correct type, or it diverges. If you agree that type soundness is useful despite admitting divergence, I don't see how non-observability of mostly benign effects is much different.

Certainly high assurance domains may require tracking and controlling more effects, like overflows and bound checks, but that doesn't mean ignoring effects is necessarily wrong for general software.


This misses the point, which was that the side effects are observable, by other programs, or the user of the computer.

Your point seems to be the user may not care about failure? This seems an odd position to take? I could understand an engineering argument (the cost to fix the bugs is greater than the cost of failure).

My point was that a pure function cannot fail, as failure is an observable (by the user or other programs) side effect (for example in maths itself functions do not fail). So any function that can fail is not pure. I think a computer program cannot ever be a pure function, it can just be more or less pure than some other computer program.

Edit: Trying to summarise, I don't agree that somehow the implementation of a function is separable from the function itself. Addition is a function, it has certain rules, and is a pure function. If a language has an approximation to addition (for example it may sometimes fail due to out-of-memory) it is neither pure nor addition, but a useful approximation of addition. It is however still a function, just a different one from the mathematical definition of 'addition'. To pretend that this 'different' function actually is addition within certain limitations seems reasonable, but to call it a pure function seems to be claiming too much. Just because you pretend it is addition when reading the program code, does not change the fact that it is not addition, and it can fail.

Edit2: You could think of it like this? What is the specification of the programming language? If the language specification does not allow for things to fail, then I guess you could consider the addition function pure, but this language specification could not be implemented on any real computer in the universe, ever. It's not just a limitation of current technology; an infinite computer is impossible. If on the other hand the language specification deals with the realities of computation, then addition cannot be pure and is only an approximation of mathematical addition.

If the language

If the language specification does not allow for things to fail, then I guess you could consider the addition function pure, but this language specification could not be implemented on any real computer in the universe ever.

Right, and everyone knows this. Without explicit resource tracking, treating arbitrary precision integers as proper integers is perfectly sensible even if some programs fail due to resource constraints. The equational reasoning we expect of "pure" functions holds, modulo resource constraints, but "modulo resource constraints" is a caveat that's understood to apply to every physical incarnation of abstract computing. Languages that don't track resources then require empirical validation for every application if failure tolerance must be high, but this will probably be true anyway, even for programs written with theorem provers.

If your language has an abstraction for separate tasks that can fail (like concurrent futures), then you can model purity failures of those tasks and recover via fail-fast techniques, but that's fairly coarse-grained failure handling. "Pure" functions that have such failure modes can still be treated as pure within that task though.

I'm not sure what better balance you think is easily achievable. Annotating most functions with allocation effects is just line noise for probably 99.9% of programs in existence. The type and effects research has concluded that you can't automatically infer optimal resource allocation (see section 6), so what better balance do you think is achievable?

Pure Computer Programs

I was really trying to point out that pure computer programs don't exist and that functions that return results by mutable references can be considered purer than those that return them by value. I think the argument that you cannot observe the out of memory condition is wrong, as it mistakes the language itself for the observer. A language cannot observe anything. My point about the Ariane rocket is that the explosion is clearly observable whether the language spec allows for out-of-memory or not. I don't see how any argument that a function is pure because 'something' cannot observe the error can be correct, because 'something else' can still observe it.

As to what I think a solution looks like, I think resource allocation should be explicit, or predictably implicit (like stack allocation), and that effects can be inferred. If you read above, I pointed out that you could take a simple imperative language like 'C' and give all the functions the appropriate monadic types, and obviously expressions could be given functional types (although it would be better to even have addition in an 'overflow' monad, and allocation in an 'alloc' monad). Despite this language effectively being imperative, what you would end up with would be as purely functional as Haskell. After the language has been type checked, you could compile in a similar way to 'gcc', thus getting all the performance of 'C'. Rust gets close to this, but does not have any effect control.

Edit: This is also an answer to the original question, the ability to get the best of both worlds, imperative performance, readability and understandability with functional safety, aliasing and effect control is why people should be interested in type theory, category theory and logic.



real world out of memory

I am trying out NixOS for fun and learning. What is the very first serious UX problem I encounter? Yeah, running out of memory. Ha hah!

Not in rust

You can't do that in Rust: holding a mutable reference imposes exclusive access to the referenced object. You can have many immutable references, though. Rust has statements and mutation. It has for and while loops. Is Rust imperative? By your definition it is not. By the classical definition (imperative = a command or a sequence of commands), it is.

re: not in rust

Rust still has aliasing of state, e.g. via the Cell (edit: oops! that should be Mutex/RwLock) types. The fact that you must temporarily obtain exclusive write access is not relevant. Additionally, like most imperative languages, rust has plenty of implicit aliasing of state through external resources (like the filesystem).

With imperative code, an infinite loop can be productive via manipulations of aliased state. Hence one might say a loop is a sequence of 'commands' affecting an external environment.

For and while loops are easy to model in purely functional languages. However, one can't use them outside of local manipulations of data. E.g. for 'while : (s->bool) -> (s->s) -> s -> s' we can pass in a whole World as 's' but we still can't do anything but eventually return our updated World. Of course, monadic variants of loops do include ability to return to caller to produce intermediate outputs or await some inputs, so can model imperative code.
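The quoted signature `while : (s->bool) -> (s->s) -> s -> s` can be transcribed directly. Here is a sketch in Rust rather than a functional language (the name `while_fn` is mine): the "loop" is just a function from an initial state to a final state.

```rust
// A while loop as a pure state-transition function:
// while : (s -> bool) -> (s -> s) -> s -> s
fn while_fn<S>(cond: impl Fn(&S) -> bool, step: impl Fn(S) -> S, mut s: S) -> S {
    while cond(&s) {
        s = step(s);
    }
    s
}

fn main() {
    // Sum 1..=10 by threading (i, total) through the loop.
    let (_, total) = while_fn(
        |&(i, _)| i <= 10,
        |(i, t)| (i + 1, t + i),
        (1u32, 0u32),
    );
    assert_eq!(total, 55);
    println!("{}", total);
}
```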

If Rust removed aliasing of state, then I would not call it imperative. It would become a purely functional language. Albeit, one with a lot of support for affine (or uniqueness) types to support efficient in place mutation.

Rust does not allow aliasing of state

Rust does not allow aliasing of state as far as I can see. Only one mutable reference can be held anywhere, so if a function takes a mutable reference as an argument you are guaranteed only local changes. Yes, Cell types exist to allow runtime-checked exclusive access, but I am not sure how this differs from, say, a statically checked mutable reference. Without Cells you cannot create graphs, so the normal references only allow the creation of DAGs, which seems more "functional" than references in many functional languages.

Is ML a functional language? ML allows side effects anywhere in functions?

re: rust aliasing

While it is true that Rust only allows one 'mut' reference to data to exist at a given time, it is possible in Rust to alias shared data then acquire a 'mut' reference temporarily when one needs it. This can be expressed, for example, through use of locks. By use of locks, multiple threads may both reference the data and modify it statefully in a manner observable to the other threads. Consequently, you have stateful aliasing.

I'm not a rust programmer. It seems I found `Cell` when what I was looking for was `RwLock<T>` or `Mutex<T>`. Regardless, I had read enough about rust to know that shared memory can be directly modeled wherever you want it.
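The stateful aliasing being described can be sketched with `Arc<Mutex<...>>` from the Rust standard library (the counter example is my own): several threads alias the same cell, each taking the exclusive `mut` borrow only briefly, inside the lock.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared, aliased, mutable state in safe Rust.
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter); // another alias to the same state
            thread::spawn(move || {
                for _ in 0..1000 {
                    // Exclusive access is acquired only for this statement.
                    *c.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Each thread observed the others' writes: stateful aliasing.
    assert_eq!(*counter.lock().unwrap(), 4000);
    println!("{}", *counter.lock().unwrap());
}
```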

Similarly, actors model has aliasing (two actors share a reference to a third). Actors model has state (an actor may behave differently based on past messages). These two conditions may coexist for a single actor. Hence actors model has stateful aliasing. The fact that each message is handled exclusively is irrelevant. The fact that you don't have direct access to the aliased actor's state is irrelevant.

Is ML a functional language? ML allows side effects anywhere in functions?

I have been using 'functional' to mean 'purely functional' for this discussion. ML is certainly not a purely functional language.


I think even Haskell can have aliased mutability. Take a function that accepts two mutable vectors in the ST monad and does something like copy the contents of the first to the second, reversing the order. Now create a single vector and pass it as both arguments to the reverse function. I think Haskell will allow this, and go wrong. Rust will not allow this, whether with a plain reference (statically checked for a single mutable reference) or a cell (dynamically checked for a single mutable reference). Note that a single mutable reference means no immutable references can be held at the same time; it is totally exclusive.

Here's an example of Haskell going wrong due to aliasing of mutable data:

import Control.Monad.ST
import qualified Data.Vector.Mutable as VM

rev :: VM.MVector s Int -> VM.MVector s Int -> ST s ()
rev u v = rev' l l
  where
    l = min (VM.length u) (VM.length v)
    rev' i n
      | i < 1     = return ()
      | otherwise = do
          x <- VM.read u (i - 1)
          VM.write v (n - i) x
          rev' (i - 1) n

test1 :: ST s String
test1 = do
    u <- VM.new 3
    VM.write u 0 1
    VM.write u 1 2
    VM.write u 2 3
    rev u u
    a <- VM.read u 0
    b <- VM.read u 1
    c <- VM.read u 2
    return $ show a ++ "," ++ show b ++ "," ++ show c

test :: String
test = runST test1

Which outputs:

3,2,3
So due to aliasing the 'rev' reverse function has gone wrong. This cannot happen in Rust, as to pass two mutable references to 'u' into 'rev' would not be allowed.

re: Haskell, rust, ownership types

As I stated above, "FP can, of course, model imperative aliasing. But must do so explicitly, e.g. using a reference model within a state monad." Use of the ST monad is one such scenario.

Rust uses 'ownership' types to constrain aliasing by default, but also supports use of aliasing. You could model the above scenario in Rust. You'd need to pass in a pair of `Mutex<Vector>`s instead of a `mut` reference. And you'd grab the mutex independently for each read and update. This is unlikely to happen by accident if you're working directly with mutexes. OTOH, it might happen by accident if you first wrap your `Mutex<Vector>` behind a `SharedVector` class so you aren't wrestling with all the mutex line noise.
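To make that scenario concrete, here is a toy sketch (my code, not from the thread) of the Haskell example rebuilt around `Mutex<Vec<i32>>`, grabbing the mutex independently for each read and update; passing the same mutex twice reproduces the aliasing bug:

```rust
use std::sync::Mutex;

// Copy `u` into `v` reversed, locking independently for each access.
// If `u` and `v` are the same mutex, the aliasing bug reappears.
fn rev(u: &Mutex<Vec<i32>>, v: &Mutex<Vec<i32>>) {
    let lu = u.lock().unwrap().len(); // each guard is dropped at end of statement
    let lv = v.lock().unwrap().len();
    let l = lu.min(lv);
    for i in (1..=l).rev() {
        let x = u.lock().unwrap()[i - 1];
        v.lock().unwrap()[l - i] = x;
    }
}

fn main() {
    let m = Mutex::new(vec![1, 2, 3]);
    rev(&m, &m); // aliased: the same data behind both parameters
    assert_eq!(*m.lock().unwrap(), vec![3, 2, 3]); // not the correct reverse [3, 2, 1]
}
```

So Rust's static exclusivity rules for `&mut` are bypassed here without any `unsafe`: the aliasing simply moved into a runtime-checked type.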

Rust's ownership types are a variant of substructural type. In particular, they are related to the `affine` type which prevents replication. Support for substructural types is a more or less orthogonal feature to purity or paradigm. I know of several pure languages that have used them (e.g. Mercury, Clean, the DDC Haskell dialect) and I also know of several imperative languages (e.g. Rust, ATS, and Plaid). Rust is perhaps the first language to bring this feature under the lens of mainstream developers, which I find encouraging. (I am fond of substructural types.)

Anyhow, it is true that ownership types enable Rust to constrain aliasing of references in a manner that (normal) Haskell cannot. Haskell allows `∀v. v→(v,v)` as a function, and hence cannot restrict aliasing of references when it models them.


Perhaps you would be happier if I said that Rust makes aliasing explicit, so you have to choose the datatype to enable this? Using a 'Cell' is not the normal way to pass a reference in Rust.

So to take your statement: "FP can, of course, model imperative aliasing. But must do so explicitly, e.g. using a reference model within a state monad".

We can say the same about Rust, it does not normally allow mutable aliasing, but it can model it explicitly, by using a Cell. Due to strong typing, you cannot pass a Cell into a function that does not expect one, nor can you return one where it is not expected. This seems as strong a restriction as a reference in a monad to me.

Note the Haskell example above is using the pure functional 'ST' monad, so the whole thing is pure and does not need to be in the IO monad. The result of runST is not in a monad.

re: explicit aliasing

I'd say that Rust does a lot to discourage aliasing. It's certainly not on the `path of least resistance`. Hence, it's a lot less likely that a Rust programmer is going to make aliasing-related errors than would a C programmer. This is a good thing.

OTOH, nothing prevents a Rust developer from modeling shared mutable maps, multi-writer channels, etc.. Aliasing can quickly become implicit if anyone bothers to write libraries that make it so.

The "you cannot pass a Cell into a function that does not expect one" argument seems weak in the presence of traits, trait objects, implementation hiding, etc.. Regardless, it isn't nearly as strong a restriction as using a reference within a monad. Aliasing within a monad is limited to the scope of the monadic effects interpreter (e.g. a `run` function), and is represented in the type of the computation rather than the available values.


As someone who has programmed in both Haskell and Rust, it seems Rust has better control over aliasing than Haskell. You really cannot subvert Rust's aliasing control in the way that you seem to think. Aliasing cannot become implicit because it always requires a different kind of reference, and even then it is dynamically checked via a mutex. For example a Rust function like "fn f(x: &mut i32) -> i32" cannot be passed a 'Cell', so within the function you know that 'x' cannot be aliased.
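As a sketch of that claim (my code): a function taking `&mut` compiles and is exclusive within its body, while the aliased call is rejected at compile time:

```rust
// Within `f`, the borrow checker guarantees `x` is the only live
// reference to the underlying data.
fn f(x: &mut i32) -> i32 {
    *x += 1;
    *x
}

fn main() {
    let mut a = 0;
    assert_eq!(f(&mut a), 1);
    assert_eq!(f(&mut a), 2); // sequential reborrows are fine
    // The aliased version is rejected at compile time:
    // let r1 = &mut a;
    // let r2 = &mut a; // error[E0499]: cannot borrow `a` as mutable more than once
    // f(r1);
}
```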

Yes you probably could use 'unsafe' code to subvert this, but Haskell has 'unsafePerformIO' as well, so I see no practical difference.

By your definition Rust would seem to be a functional language as it does not suffer from the aliasing problems of imperative languages (this is actually one of the major design features of Rust, aliasing safety), and you should be happy about that because it would contradict my original point. If Rust is a functional language, it shows how functional languages can match imperative language performance.

Putting that aside, what Rust lacks that Haskell has is a split between pure and impure functions. But what is a pure function? Haskell pure code can fail with out-of-memory errors at runtime, so it is not really pure at all. I think you have to look at this as shades of grey, not black and white. So personally I would add an effect system to Rust (it actually used to have one, but it was abandoned).

Really the difference is in the type-system, you could keep the Rust compiler working exactly as it does now, producing efficient 'C' like code, but by adding monadic effects to the type system, you would have the same degree of control as Haskell.

I think this is my point, an imperative language plus an effect system has all the advantages of a pure function language like Haskell (in the pure parts), yet can be as efficient as 'C' in the impure parts, by not modelling the imperative code, but implementing it directly. The trick is to give 'functional' types to the imperative code using monadic effects. Maybe monads are not necessary either and simpler effect system would be better.

re: rust expressiveness

I think you're underestimating Rust's expressiveness. Given you didn't even seem to be aware of mutex before today, I have my doubts about the strength of your experience with Rust concurrency.

What else might you have missed? (glances at the first page of mpsc documentation). Ah, look, it seems Sender<T> is 'cloneable', enabling multi-writer single-reader channels. Hence, actors model style stateful aliasing is effectively a feature of Rust. Oh, and those atomic types seem to be all about aliasing.
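A minimal sketch (my code) of that multi-writer single-reader pattern, built from a cloned `Sender`:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    let tx2 = tx.clone(); // Sender is cloneable: multiple writers, one reader
    thread::spawn(move || tx.send(1).unwrap());
    thread::spawn(move || tx2.send(2).unwrap());
    let mut got: Vec<i32> = rx.iter().take(2).collect();
    got.sort();
    assert_eq!(got, vec![1, 2]); // both writers reached the shared channel
}
```

Both threads observably write to the same channel: stateful aliasing, with no `unsafe` in sight.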

Aliasing might need a 'different kind of reference'. But that's still subject to implementation hiding.

you probably could use 'unsafe' code to subvert this, but Haskell has 'unsafePerformIO' as well, so I see no practical difference.

Use of unsafePerformIO to perform actual IO (as opposed to debugging or performance features) would be a subversion of Haskell's purity. Support for aliasing in Rust, however, does not require subverting any semantics of Rust or its reference model.

By your definition Rust would seem to be a functional language as it does not suffer from the aliasing problems of imperative languages

Imperative is characterized by aliasing of state. Mitigating the problems of aliasing is admirable. But unless you achieved it by eliminating aliasing of state (with a corresponding loss of expressiveness in the direct style), it's still imperative. And not just local state, either. As I mentioned early on, aliasing through the filesystem or FFI also counts.

If Rust is a functional language, it shows how functional languages can match imperative language performance.

Rust is not a functional programming language. But consider a hypothetical language, Rust--. Rust-- is just Rust minus stateful aliasing and FFI. No more Mutex. No Channels. Forking a thread and communicating with it would be tweaked, e.g. based on a process functions concept. Etc.. Rust-- would still be high performance - it could even use the Rust compiler. But it is little more expressive than a purely functional computation.

Channels normally alias a shared message buffer between a writer and a reader. But you could maybe add them or a variant back in. Carefully. In restricted form. No cloneable sender. No 'try_recv'. Even this highly constrained use of stateful aliasing would increase expressiveness beyond that of direct-style pure functions, to something closer to Oz with its logic variables - single-assignment futures. (You might model channels as an effect, of course.)

I think for a lot of the problems you're personally concerned with, Rust-- would probably work fine. But other Rust programmers that use a different subset of Rust's features would be quick to notice the differences and wonder what the hell the Rust-- designer was thinking that made them take away those precious shared mutable bits.

re: imperative plus effects types

I think this is my point, an imperative language plus an effect system has all the advantages of a pure function language like Haskell (in the pure parts), yet can be as efficient as 'C' in the impure parts, by not modelling the imperative code, but implementing it directly. The trick is to give 'functional' types to the imperative effects.

I agree with this overall point. Starting with something like Rust and subtracting/controlling the problematic 'effects' would be a simple and effective approach to high performance functional programming. I encourage you to pursue the idea.

Limits of Rust

Okay, this is a good conclusion. I was aware of Cell and RefCell, but I personally don't use them, as I try to use the static tools (so & and &mut). I am not convinced that the existence of Cell and RefCell is a problem if neither can ever be passed to my code or returned from it without my permission (due to type signatures). I think your point is that I could use some code that internally has a Cell or RefCell and does not expose it in its API, and I agree Rust has this problem, but I don't think it causes many issues in practice (the Rust browser components shipped so far do seem to be free of exploitable bugs).

I think there are two solutions to this. One is that code that uses a cell internally needs to be proved safe; Rust does this by forcing runtime dynamic checking on Cell and RefCell, though you can do worse with 'unsafe' code and basically have a plain 'C' pointer. My approach would be some kind of proof-carrying code: if you can prove the code safe to the compiler, you should be able to use it in 'pure' code. The second, as we have discussed, is an effect system.

Really pure code will look like an imperative procedure where return arguments are passed by mutable reference, with no local variables and no division or other operation that can cause a runtime exception, so it cannot fail at runtime due to out-of-memory errors etc.


I think ATS does a lot of what you're envisioning, e.g. including proof carrying code and types to control aliasing and effects, and might be a closer fit than Rust. It has an atrocious syntax, unfortunately. But I do recommend you experiment with and learn from it.

Aliasing of state

So, what about building a language on serious alias control? Rather than taking Rust's approach of rationalizing C-style semantics, you could view this as starting with a pure-functional/linear typed semantics, and building it into a practical engineering tool.

First, distinguish between pure-functional values/functions, and imperative-type resources/operations. (Operations may destructively update resources passed as arguments, but are otherwise composable pure values, like functions).

Then, add the default assumption that resources used by a given operation may not alias (unless so specified by the type interface).

In most situations, this could be seen as simply systematizing good practice. However, I worry about cases where mutable references really are a natural way to express things...

re: control of aliasing - ss types

I think it's a fine idea to control aliasing. Substructural types are awesome. Everyone should use them. OTOH, it isn't a new idea. There's a lot of experience with these things going back over 30 years that you can review and learn from. Look at Clean, Mercury, DDC Haskell, Plaid, ATS, etc..

An interesting feature with full substructural types is that you can easily enforce protocols (do X at least once, or at most once, or X before Y). This can be simple stuff like making sure you close your open files. Or it can be sophisticated stuff like multi-step handshakes. Structured programming is a lot less relevant because 'structure' is now represented in a set of available data objects.
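As a small sketch of the "at most once" case (my code, with a hypothetical `OpenFile` type): in Rust, move semantics play the affine role, so a consumed handle cannot be closed twice:

```rust
// `close` takes `self` by value, so the handle is consumed:
// "close at most once" is enforced by move semantics.
struct OpenFile {
    name: String,
}

impl OpenFile {
    fn open(name: &str) -> OpenFile {
        OpenFile { name: name.to_string() }
    }
    fn close(self) -> String {
        self.name
    }
}

fn main() {
    let f = OpenFile::open("log.txt");
    assert_eq!(f.close(), "log.txt");
    // f.close(); // error[E0382]: use of moved value: `f`
}
```

The "at least once" direction (you *must* close) needs linear rather than affine types, which Rust does not enforce statically.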

forest and trees

I admit to frustration in reading this thread. It seems to have started out reaching for big, sweeping ideas, and somehow lost track of them as it degenerated into fine details. As I see it: We live in a world where things actually happen. There's an agenda in play to write programs, to be embedded in this "impure" world, using a "pure" mathematical model where things don't happen. Some people reckon it's inevitably clumsy to describe a changing world using an unchanging model. Other people reckon the elegance of the mathematical model is more valuable than matching the model to the "change" aspect of the world in which the program is embedded. I suppose this could get somewhat subtle, as one considers different consequences of using a pure model to describe a program embedded in (not just interacting with) a changeful world. But still, the moment the big ideas are no longer visible, the details are likely to stray from the point.

Side effects in pure FP / Mutation alternatives

I appreciate pure FP with awareness of the side effects it does permit. It does permit computations that consume some nonzero amount of time and other resources. It does permit taking on the nondeterministic risk that cosmic rays, language implementation bugs, or interactive debuggers will interfere in the computation. It does permit code snippet A to make use of its own implementation details, even though code snippet B has no access to them.

In some situations I'm interested in, it's actually pretty interesting to imagine the use of implementation details as a side effect, in which case pure programs are not quite pure enough:

  • I'm interested in being able to distribute code to other agents on an open network so they can run it locally, but I want to treat some code as a well-protected implementation detail that never gets distributed. I don't want to bend my code out of shape whenever I adjust my decision.
  • I'm interested in the (very mainstream) situation where more than one person maintains a single body of code, just working on different pieces, and they want to be sure they're not interfering with each other so they don't have to schedule constant meetings. I think this keeps syntax and modularity goals well-grounded in a use case. The way I view this is that the syntax itself has a sort of coinductive or late-bound structure, and we typically only maintain small pieces of it at a time, like whatever we have scrolled into view or loaded into a typechecker. (This gets implicated in concurrency as well. If systems of interacting agents can be thought of as code maintainers, then maybe we can apply lessons from language syntax and modularity in the design of their shared state resources and vice versa.)

With all this in mind, pure FP does permit quite a few things that could be considered side effects. However, for all that it permits, including a limited form of nondeterminism, it manages to keep programs deterministic up to their own implementation details.

Determinism makes it easier to talk about two subprograms being "the same" after a refactor. Since a (successful) program deterministically results in a value, we can pretty much treat the value space as the semantics, rather than resorting to some more heavyweight denotational semantics like store-passing style or core dumps.

If pure FP is not pure enough for some purpose, that's a good reason to pursue models of computing that are purer in some ways. Maybe they do need to be less pure in other ways, but it's easy to give up too soon, and there seems to be a lot of potential to stay within the bounds. For instance, linear typing's prohibition of aliasing "effects" does nothing to stop us from reading its programs as pure FP programs to discuss their semantics, but it makes it possible to represent first-class outside worlds so that we can do mutation and concurrency without the continuation-passing style of the monadic approach.


If a resource-conscious language needs to have mutation, my gut reaction is that the language might be sticking too close to its predecessors. Mutation threatens to sequentialize the programs that heavily use it, and a single act of mutation can generate garbage and do aliasing at the same time. It seems almost antithetical to efficient use of resources; would a resource-conscious language really use it?

I don't really know enough to seriously discuss alternatives, but speaking casually, I imagine a system of regions that can toggle between modes: a region can be readable, writable, or a container for a resource-bounded, typed computation in progress. The mode-toggling action itself and any other synchronization between computations would occur in the style of transactional writes.

I haven't implemented this, and there really are huge gaps in the design when I try to formalize it, like how much space it takes up to store the transactions before they're processed, or how much scratch area the typechecker needs when we toggle a region from the writable mode to the computation mode. :)

Seems a balanced response

This seems a balanced response to me. I would point out that the refactoring comment is not always true, as with lazy languages like Haskell small changes can move you from bounded memory (consuming a stream in bounded finite space) to unbounded (creating an unbounded heap of thunks for later evaluation).

Further all programs use mutation. Even the purest functional language I have seen uses a stack, which relies on mutation to work.

Languages like Rust avoid aliasing during mutation by using affine types. Would resource-conscious languages use mutation? When writing for very small memory, use of assembly is still common, and yes, use of mutation results in a much smaller program. A single byte can act as an event counter, geo-coordinates get updated in-place by interrupts, etc.

Pure functional languages model mutation (like the ST monad in Haskell). I see no reason you cannot translate that monadic code into an imperative language like 'C', and then compile it. In which case, what advantage do you get from modelling mutation rather than implementing it directly? The difference seems to me to be simply in the type system, where the monad lets you constrain where mutation can happen.

Imagine we take 'C' and implement a type system which gives monadic types to impure 'C' programs, and pure types to pure 'C' fragments like expressions. How is this any different from a pure functional language?

i think yes

isn't that what some commercial / ivory tower projects claim to do with regular old C? either statically check extra cruft, or let you add annotations for static checking, or move it into a C+++ with such features, and then compile down to C. e.g. i dunno ATS or something for random example?

"simply in the type system, where the monad let's [sic] you constrain where mutation can happen" --> errr, yes, would that not be the whole point of how FPs can suck infinitely less than C?! :-)

copy the world on tick

A state monad can be implemented in various ways under the covers, I guess. Anything with a loop can split the available memory in half (2 regions) and then on every loop the current state is passed in read-only and the next state is passed in writeable. At the end of the loop you have your new state; clone and swap back and forth on each loop. No extra allocations happen and nothing gets freed, so you avoid spending extra time on memory but you get to do mutation, at least if you are e.g. writing a video game where you decide up front (BDUF) how many particles maximum you are going to allow. But I am probably talking out of my ass.
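A toy sketch of that two-region scheme (my code, with a placeholder update rule): read the current buffer, write the next, then swap, with no per-tick allocation:

```rust
// Two fixed regions: read the current state, write the next, then swap.
// No allocation or freeing happens per tick.
fn tick(cur: &[i32], next: &mut [i32]) {
    for (i, x) in cur.iter().enumerate() {
        next[i] = x + 1; // toy update rule (placeholder)
    }
}

fn main() {
    let mut a = vec![0; 4];
    let mut b = vec![0; 4];
    for _ in 0..3 {
        tick(&a, &mut b);
        std::mem::swap(&mut a, &mut b); // swap is a pointer exchange, not a copy
    }
    assert_eq!(a, vec![3, 3, 3, 3]);
}
```

Note that Rust's borrow rules make this pattern pleasant: `tick` statically cannot be handed the same buffer as both `cur` and `next`.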

Functional Languages will dominate.

Functional programming is the only reasonable way to make use of scalable multiprocessing, parallel processing, and vector processing.

Whatever language we use in the future, multiprocessing, parallel processing, and vector processing are going to continue to scale bigger and bigger and bigger. Therefore I expect that many if not most of the dominant programming languages in another ten years or so will make functional programming easy. I do not claim they're going to be purely functional like Haskell; I claim that they're going to make functional programming easy, like Scheme or Clojure.

I once implemented a Pascal-subset compiler using Scheme lisp as the implementation language. Scheme is not purely functional. It can mutate variables. It can do direct I/O. But it's constructed so that ordinarily there's just no need to do those things. I realized after the fact that I hadn't used mutation even once in the whole project, that there were no operations affecting output that were done before the whole input was read, and there was no output that was done except as the last step before program exit. A pure functional program just came about naturally by the interaction of my normal, ordinary straightforward programming style and a language that makes functional programming easy.

Screw stateful objects. They are confusing and don't parallelize.

re: functional languages will dominate

Compilers are one of those areas where the program-is-a-function philosophy is an obvious fit. For better or worse, that experience doesn't generalize to problems such as robotics control or multi-player video games.

For general systems programming, a functional program must model effects. These effects are modeled explicitly in the path of normal call-return dataflow, rather than to the side of it. Nonetheless, use of effects with incremental computations does introduce challenges for reasoning about whole-program behavior.

State isn't really avoided, either. State machines might be modeled with the general form `type M i o = i → (o, M i o)`. Our program functions may fold a very large stream of inputs over time, and understanding that state isn't necessarily easy.
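The general form `M i o = i → (o, M i o)` can be sketched in Rust (my encoding, using a boxed closure) and folded over an input stream:

```rust
// Sketch of `type M i o = i -> (o, M i o)`: each step returns the output
// and the next machine. Here, a running total over i32 inputs.
struct Machine(Box<dyn FnOnce(i32) -> (i32, Machine)>);

fn counter(total: i32) -> Machine {
    Machine(Box::new(move |i| (total + i, counter(total + i))))
}

fn main() {
    let mut m = counter(0);
    let mut outs = Vec::new();
    for i in [1, 2, 3] {
        let Machine(step) = m;
        let (o, next) = step(i);
        outs.push(o);
        m = next;
    }
    assert_eq!(outs, vec![1, 3, 6]); // the "state" only shows up by folding the stream
}
```

Everything here is pure: the machine's state is captured in closures rather than mutated in place, yet understanding the accumulated state still requires tracing the whole input history, which is the reasoning challenge described above.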

If FP dominates, it won't be due to avoiding effects or state, but rather due to simplifying and controlling the problems of testing in presence of effects and state, and reducing barriers for reuse of programs in new contexts (e.g. ability to wrap a program that internally uses message passing within an entirely different effects system).

imperative != stateful objects

Screw stateful objects. They are confusing and don't parallelize.

Sure, but traditional bang-on-records-in-place databases parallelize just fine and somehow manage to remain far less confusing than typical ravioli code.

Some problems/solutions are more easily modeled imperatively, even parallel ones! Don't conflate "imperative" with the historical failings of languages centered around mutation & global memory pointers.

There's a separate argument around whether or not forcing all logically-imperative operations through the monad pinhole is a good idea or not.

Fp will dominate?

Hang on there! You mean "at this time it's the only well understood way", do you not?

And I would question that. The fact is most people doing low level high performance programming use C or Fortran or assembler and do not, in fact, understand FP. Furthermore, current FP compilers rely on optimisations such as self-tail-recursion which are incompatible with high-performance concurrent GC due to the need for write barriers (at this time, though, that seems about to change).

You may be right, but it is definitely not well understood. Ocaml, for example, cannot even do pre-emptive multi-tasking at this time. (You can have threads but they interleave via a global lock).

Furthermore, for better or worse, CPUs are built for performance of existing languages like C and Java. Ugghh...

I do agree however in this day and age any language that does not support the common themes of FP is badly broken: pure functions possible, GC, higher order functions, parametric polymorphism, variant types .. to name a few essentials. +1 to Apple for Swift.

Given all the work done on FPLs, it would be mad not to actually USE that research. It just doesn't have to be the end of it.

CPU design

I think the idea that CPUs are designed for 'C' and Java is a bit backwards. CPUs are designed to maximise instruction throughput, and ALUs are designed to maximise operations for a given gate count. CPUs for functional languages have been tried before, but the architecture just does not work as well. In the end a CPU operates by mutating memory very quickly. The idea that there is some kind of functional CPU that could be competitive with modern super-scalar out-of-order speculative CPUs seems to be pie in the sky to me. I would be happy to be proved wrong, but my opinion is it's not possible due to fundamental architectural issues.

"well understood" vs. "no, it's just that way."

FP being the only thing that reasonably scales any single process to unspecified numbers of CPUs or cores available isn't just a matter of it being well studied. It's because pure FP can proceed without locking overhead.

Nothing that relies on access to variables or objects between unsynchronized processing nodes scales without locking overhead unless all those variables are write-once. In fact I should have said bindings then because they don't vary.

Any non-FP thing that parallelizes, parallelizes solely because whatever parts of it are non-FP, are not fighting over variables. Meaning, it's relatively easy to handle a thousand interactive sessions using non-FP threads, where each session has its own set of variables which it doesn't need locks for and doesn't need access to any other session's variables. But if there's any single thing, or any universal realtime-shared things, that all those sessions have to get a lock for and read and write, then there is a scale at which that system breaks. It is simply a fact of physics that locking overhead scales with the square of interprocess bandwidth.

So yeah, you can go non-FP at the program's leaf nodes, if there's nothing that has to be shared around and modified by all those leaf nodes.

But at some scale, where you have to do anything big that requires lots of theads to access the "same" thing, you have to either accept losses, or go FP. It simply isn't possible to write and scale without limit any huge application with globally shared bindings, unless those bindings are immutable. Databases absolutely push the limit on this just as hard as they possibly can, but it's still true - at some scale either the database program breaks, or you absolutely require the data that the different threads access to be disjoint. The indicia that tell sessions where to find this disjoint data, can be widely shared but the wider they are shared through the distributed application, the more rarely they must be written, or the application melts down at some scale.

The core of any app that's big enough, has to use immutable bindings.
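For contrast with the locking discussion, here is a small sketch (my code) of immutable bindings shared freely across threads with no locks at all:

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // Immutable data can be aliased across threads with no locking:
    // only the reference count is touched atomically.
    let data = Arc::new(vec![1, 2, 3]);
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let d = Arc::clone(&data);
            thread::spawn(move || d.iter().sum::<i32>())
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), 6);
    }
}
```

The moment any of those threads needed to *write* the shared vector, a lock (and its contention at scale) would reappear, which is the point being made above.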

monotonic data structures

subjectively agreed, and I feel like the non-ACID distributed data structure stuff is where the future lies.

FP modelling

Yes, of course FP can model it, it can model anything (modellable, if that's a word).

That isn't the point. You can use Haskell to write a program that "models" any C program. And you can write a C program that "models" any Haskell program: on the latter point one may say, more precisely, Haskell can be "modelled" in x86 assembler, and there's a weird tool that does that, called a compiler.

The point is you don't want to. If you want to do something, use a language that is designed to do things .. I hesitate to say "C" but it is a language which emphasises control flow and mutation. If you want to calculate stuff, use an FPL.

If you want to do BOTH what do you do? If you use C, you will find functional code hard to write. If you use Haskell you may have a bit of trouble with array modifications, and you'll need a lot of housekeeping crud to model imperative operations.

What you want is a language that can do BOTH well, and allow you to "swap" between techniques "seamlessly". The issue is about expressiveness, not capability.

Look at Ocaml. It has a functional base but throws in a mutable record field and some mutable data structures in the library. It can do both, but the integration is very bad. If you dare use mutable fields or hashtables your ability to reason about the code with functional arguments is gone.

re: FP modeling

I do understand your opinion on 'doing' vs 'calculating'. I remember expressing a similar opinion on this site not too many years ago, back when I still was trying to grasp FP from a mental framework of imperative experience.

But modeling effects is superior in many respects compared to reaching out and just 'doing' effects during computation. Superior for testing, reasoning, reuse, control, security, etc.. Maybe not superior for performance, but maybe not much worse.

For example, I might model a program that interacts with a filesystem as producing a series of 'Read Filename (Range)' and 'Write Filename (Range) (Data)' requests (perhaps simpler than we'd really want, but sufficient for discussion). By modeling this explicitly: I can test my program in pure, ad-hoc mockup environments. I can control which effects my program uses. I can wrap my program to compute bytes read and written. I can write a trivial adapter to use a database or web service instead. I could limit my program to use specific files.
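A toy sketch of that idea (my names: `Fs`, `run_mock`): the program is just data describing requests, and a pure mock environment interprets them:

```rust
use std::collections::HashMap;

// Effects as data: the program emits requests; a pure mock interprets them.
enum Fs {
    Read(String),
    Write(String, String),
}

// Interpret against an in-memory "filesystem", returning the read results.
fn run_mock(prog: &[Fs]) -> Vec<String> {
    let mut files: HashMap<String, String> = HashMap::new();
    let mut reads = Vec::new();
    for op in prog {
        match op {
            Fs::Read(name) => reads.push(files.get(name).cloned().unwrap_or_default()),
            Fs::Write(name, data) => {
                files.insert(name.clone(), data.clone());
            }
        }
    }
    reads
}

fn main() {
    let prog = vec![
        Fs::Write("a.txt".into(), "hello".into()),
        Fs::Read("a.txt".into()),
    ];
    assert_eq!(run_mock(&prog), vec!["hello".to_string()]);
}
```

Swapping in a real interpreter, a byte-counting wrapper, or a file whitelist only requires a different `run` function; the program itself is untouched.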

I do want to model effects in a pure style. Even when my program is mostly expressing effects or awaiting responses rather than calculating. Perhaps especially then.

Thus rather than 'switching' styles, I'd be more interested in languages that make it easy to model effects easily, efficiently, transparently, and compositionally. Haskell is not the ideal example here, since the nicer effects models (algebraic effects and handlers, freer monads with extensible effects, etc.) weren't developed until more recently and Haskell is dealing with weight of its opaque IO legacy.


FP is only superior for reasoning because that is where the most theoretical research has been done. This is natural due to the connection between mathematics and functional programming.

Now you said "I do want to model effects in a pure style." I agree, but FP does not do that.

In the Felix system, we have an "Algol like" combination of functional and procedural programming. Under the Felix base rules, the functional component may observe effects but may not cause any (which is not an entirely satisfactory situation). If you want to, say, modify the file system, you can't use a function, you have to use a procedure.

Functions may call procedures *provided* their effects are mutations local to the function (i.e. not escaping it). Except for debugging.

Yet, ordinary procedures alone are like gotos: they're open ended, they can have any effects. Felix has an effects management system (in user space only) based on row polymorphism. It's not clear how useful this can be.

So we must fix the procedural system. Enter coroutines. Actually fibres.

A fibre can be pure or not. The important thing is that you have a unit of modularity with documented effects, and *hopefully* a way of calculating the effect of compositions of fibres.

With fibres communicating synchronously using channels, the problem of calculating possible control flows (and thus effects) is basically reduced to calculating ordering of synchronisation events on schannels, which is constrained by the connection topology, so there is a definite model of things to reason about.

I can do this for simple models like pipelines. Now I want a more comprehensive theory.
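The pipeline case can be sketched with Python generators standing in for fibres (a loose analogy only: generators give the suspend/resume behaviour, not real Felix schannels). The point is that the connection topology alone fixes the order of synchronisation events:

```python
# Three "fibres" connected in a pipeline. The downstream reader pulls
# from the upstream writer, so production and consumption of each value
# strictly alternate: the topology determines the synchronisation order.
def source(xs):
    for x in xs:
        yield x                 # "write to the channel", then suspend

def doubler(chan):
    for x in chan:
        yield 2 * x             # read, transform, write

def sink(chan):
    return list(chan)           # read until the writer is finished

result = sink(doubler(source([1, 2, 3])))
```

Here no scheduler choice is visible at all: in a straight-line pipeline the rendezvous order is fully determined, which is the "simple model" case where the reasoning goes through.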

Example of reasoning: A fibre reads (on a synchronous channel) a piece of data, modifies a file, reads again. Deduction: the file got changed between two reads. Knowing the effect isn't the issue. The issue is: when did it happen? I have no exact answer but I DO have an approximation.

So, I have a TYPE. Give me more! An ordering protocol is a type.


To me channels are the important abstraction. I don't see why it matters whether concurrency is achieved by fibres or processes. Haskell's channels, which communicate between processes, seem to offer all the language properties you want.

Note, I am not saying fibres are a bad idea; I appreciate the performance advantage of a stackless model in systems that process many requests in parallel. But the programmer model is the same as with processes, except with fibres you need to worry about not relinquishing the CPU.


The point of fibres is that they are NOT concurrent. With functions (and traditional procedures) you have a single thread of control and a method of decomposition of code into subroutines activated by a master/slave operation (call/return).

Many languages have enhancements because this model is very hard to use: Scheme has call/cc, C++ has exceptions, Felix has non-local gotos. Unfortunately none of these modifications are any good.

Fibres are coroutines, and use a peer/peer control exchange, and I claim this enhancement has more promise because it is simple and appears even more fundamental than call/return. The core control flow is given by the branch and link instruction, in which you jump to an address stored in register A, saving the current continuation in B.

If you load A with a constant, this is precisely a subroutine call. Return then jumps to B (the continuation) without saving anything (more precisely the continuation stored in A is ignored).

So basically C was a design error from the start. The machine it was designed for (PDP-11 basically) had a branch and link but they didn't use it.

Function calls (and procedure calls) are both suboptimal concepts. Exchange of control subsumes them.

In principle synchronous channel I/O is higher level and more abstract, because a rendezvous does not specify which continuation runs next, but despite the indeterminacy, it does specify exactly *one* of them runs next.

My current experiments seem to indicate, paradoxically perhaps, that the indeterminacy is actually essential to reasoning: it simplifies things.

Anyhow my point is, coroutines provide a way to decompose *sequential* code into smaller units which do not require making arbitrary choices about which code is master and which is slave.

Slaves are hard to write. So coroutines are often better than functions. A good example: fold. Folds are easy to write, but the user processing function (the argument) is very hard to write. Control invert that into code that calls an iterator so the user processing is easy to write, and then the iterator (cofold) is hard to write.

Write two coroutines connected by a channel, and BOTH parts are masters and so easy to write. Surprisingly, Higher Order Functions are a good example of what's wrong with functional programming. Of course traditional imperative code (eg C++) just has the dual problem instead. Coroutines solve the problem.
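To make the fold point concrete, here is a small Python sketch (illustrative names, not a real library): the same computation written first as a fold with a state-threading callback, then as a consumer pulling from a generator, where each side keeps its own natural control flow:

```python
from functools import reduce

# With fold, the user code is the "slave": a callback whose state must
# be threaded through an accumulator, inverted from how you'd write it.
def sum_evens_fold(xs):
    return reduce(lambda acc, x: acc + x if x % 2 == 0 else acc, xs, 0)

# With a generator, "both parts are masters": the producer suspends at
# each yield, and the consumer is a plain loop with local state.
def items(xs):
    for x in xs:
        yield x

def sum_evens_coroutine(xs):
    total = 0
    for x in items(xs):
        if x % 2 == 0:
            total += x
    return total
```

Both compute the same result; the difference is purely which side had to be contorted to fit the other's control structure.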

Ideally, you use functional code when it is convenient, procedural code when that is good, and coroutines when that is the best. Procedures are the dual of functions, but coroutines are self dual.

Re: Fibres

I agree with all your points. I don't like that loops can prevent context switching, and I don't know if there is a solution to this. In some ways the process model, where no memory is shared between processes and all data has to go over channels, is cleaner, and it can be implemented using fibres without changing the semantics. I think all the above points are true for processes and channels too.

re: fiber concurrency and loops

I have not read all of skaller's post. (I tend not to be interested in channels per se. I think about messages, streams via pipes or sockets, and cond vars with fibers running in the same local vat. The last must be local, but the other two can also have remote endpoints.)

The point of fibres is that they are NOT concurrent.

They are not parallel in the sense of running simultaneously on more than one CPU, but they are indeed concurrent in the sense of interleaved execution, if one yields to another before exiting. (We need a better word than concurrent, since it means interleaved, whether or not parallel, and thus conveys no information about parallelism beyond hinting it may be present.)

I don't like that loops can prevent context switching, and I don't know if there is a solution to this.

You cannot let a fiber loop without yielding for longer than its target fiber timeslice, perhaps approximated by only p partial iterations of N total loop iterations. For performance you want to let a loop run without yielding for a short while. But eventually you need to construct a continuation that resumes the loop where it yields via park, so it can resume later when unparked.

Letting a user do that is error prone. So I prefer to have code rewriting do it, at the time continuations are represented. I would annotate a fiber with a profile that indicates how many iterations can go uninterrupted before a yield. In this sense a loop cannot prevent context switching longer than some small time span. (It really only matters in "tight loops" calling no async functions, which would yield anyway on timeslice exhaustion.)
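A toy sketch of what such a rewrite might produce (Python generators; the `budget` parameter and round-robin scheduler are purely illustrative assumptions, not anyone's actual runtime):

```python
# A long loop rewritten so it voluntarily yields every `budget` iterations,
# approximating "at most p iterations before a context switch".
def busy_loop(n, budget=3):
    done = 0
    for i in range(n):
        done += 1               # the "real work" of one iteration
        if (i + 1) % budget == 0:
            yield               # park: let some other fibre run
    return done                 # delivered via StopIteration.value

def run_round_robin(fibres):
    """Resume each fibre in turn until all have finished."""
    results = []
    while fibres:
        fibre = fibres.pop(0)
        try:
            next(fibre)
            fibres.append(fibre)          # not done: requeue at the back
        except StopIteration as stop:
            results.append(stop.value)    # fibre finished
    return results

results = run_round_robin([busy_loop(10), busy_loop(5)])
```

With the yield budget inserted mechanically, no loop can starve its peers for more than `budget` iterations, which is the guarantee the compiler-driven approach is after.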

Edit: of course, fibers running in another thread can run in parallel, but that is hidden inside the other thread. That just looks like other threads can run in parallel with the thread hosting fibers. Whether they have more fibers inside is irrelevant.

Process model, fibre execution.

I think having the compiler do the transformation into fibres would work for me. In this case the programmer will be using the process model or thread model in the code they see. Personally I think mutable shared memory should be avoided, so I would go for the process model where no memory is shared, and you can only communicate by message passing over pipes/streams/channels.

re: process model and immutable data

I want tools to offer affordances which let me do things when I want, but catch me breaking rules I adopt (as indicated by things I do in code). So I like immutable data many places, but sometimes I need to instantiate a model of something that has local mutable memory, like cloning a debugged C tool as a lightweight process. There I want mutation. But if that ported code steps on memory others think is immutable, I want it detected. More below.

I think having the compiler do the transformation into fibres would work for me.

I want a model of lightweight threads (aka fibers) to be first class, so code can know they exist and manipulate them, via grouping and spawning, etc. But I want the compiler to manage the messy details involved in what a fiber actually has to be in the runtime. I want to write code that looks like it runs alone, and looks natural. So I want the compiler to do anything I don't want to see unless I review intermediary forms. One transformation might be to turn it into an actual native thread; here making APIs used compatible is hard without an abstraction that looks the same for fibers and threads, even when there are system calls involved for threads.

In this case the programmer will be using the process model or thread model in the code they see.

Yes, you want things to look overtly like processes and threads, but presented as abstractions that might be native or lightweight.

Personally I think mutable shared memory should be avoided, so I would go for the process model where no memory is shared,

I like sharing within a process, but normally not between (lightweight) processes, unless the data is immutable and therefore not subject to sync issues. As an exception, sometimes you want to model something complex as multiple processes, for organization's sake, despite them representing one tightly integrated "program" taken all together. You can think of this as an "app" that could have been just one process, except that would have made things more complex than breaking it up into sub-processes devoted to different things. For example, you might want some of them killable, where others can recover at less cost than restarting the app.

Where processes are not part of one app, you want them to interact only in terms of immutable data: messages, streams, etc.

and you can only communicate by message passing over pipes/streams/channels.

That's the model I like between apps, where an app might be a collection of mutually dependent processes.

I like a fiber runtime to allocate large contiguous blocks of memory from the host environment, from which it sub-allocates to meet both runtime and fiber demand. This makes it easier to associate metainfo with every block of memory allocated, such as refcount, generation number, optional checksum, and immutable status. (Ideally, this is somewhere else in the same large contiguous block, not in the same cache line as the usable memory start. It's better for alignment and makes it hard to step on by accident.)

When a piece of memory becomes immutable, and is marked as such, you can take a checksum then. You would always do this when debugging, but might also do so in release builds if you want a high standard of quality. When you free immutable memory, you verify that the checksum still matches the one captured when the data became immutable. Any time it's wrong you signal "the world is on fire" in a suitable way.

push pull dataflows

I agree that coroutines or fibres can be a useful abstraction. Though, I think it isn't the specific abstraction that matters so much as supporting both 'push' and 'pull' of data. Fibres are pushing data on output channels and pulling it from input channels. But publish/subscribe models, tuple spaces, behavioral programming, etc. offer similar benefits to expressiveness.

Such abstractions can be conveniently expressed in FP assuming a lightweight notation for sequencing small step computations (e.g. do-notation or a sugar for continuation passing style).

re: reasoning

FP is only superior for reasoning because that is where the most theoretical research has been done.

I think it's superior for reasoning mostly because it makes explicit a lot of what is implicit in other models. E.g. in actors model, some implicit things include: actor names, messages in flight, routing. To model actors within a functional computation, you would need to represent these things. With your fibres, implicit things include: scheduling, and any effects outside of reading/writing on pins (such as a filesystem interaction).

you said "I do want to model effects in a pure style." I agree, but FP does not do that

You might take my meaning as more or less equivalent to "model effects in a purely functional style". So pure FP does that pretty much by definition.

Though, I'm not totally on board with the 'opaque' effects models (like Haskell's current IO implementation). Opaque should certainly be an option (data hiding is a thing). But transparent values that I can intercept, rewrite, observe, etc.. offer greater utility.

ordinary procedures alone are like gotos

(Aside) In context of functional languages, I like to think of 'tailcalls' as the equivalent to 'goto'. Better, it's a parameterized goto. :D

The issue is: when did it happen? I have no exact answer but I DO have an approximation.

I hope you have some luck with that. Consider trying to predict which pins a fiber will next read or write. You might need dependent types in general, but you can probably cover a lot of common cases without dependency.

Unringing the bell

I'd be more interested in languages that make it easy to model effects easily, efficiently, transparently, and compositionally

Me too.

I do want to model effects in a pure style.

We might agree here, but I want to clarify: I don't want to have to model effects in a pure style myself, I want to be able to recover a pure modeling when it is convenient to do so. This turns out to be pretty tricky, but I think quite doable in practice.

One challenge with effects is that they have dynamic extent, so if you model them explicitly, you get viral monads in function signatures. If you make effects implicit, then you wind up with accidental effects. It's hard to tell what is an accidental effect because hidden effects are useful.

Consider an Eff style effects/handlers model where you have some data that represents an effectual operation and a special form to "raise" that operation for handling. If you have operation Outbound(stdout, "foo") you want to be able to call it Print("foo") and have that implicitly perform Raise(Outbound(stdout, "foo")).

You can always do something like CaptureOutput{ ...Print("foo")...} to recover a string, but what you really want is something like captured = CaptureOperation{...Print("foo")...} so that you can reraise that effect: Raise(captured)
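A hypothetical sketch of that capture/re-raise idea, using a handler stack in Python (the names `Outbound`, `Print`-style wrappers, and `capture_operations` follow the comment; the mechanism itself is invented for illustration, not the Eff language's actual semantics):

```python
# Effects are plain data; `raise_op` hands them to the innermost handler.
# One handler actually performs operations, another merely records them,
# so captured operations can be re-raised later.
from dataclasses import dataclass

@dataclass
class Outbound:                  # the operation, as a noun
    channel: str
    text: str

_handlers = []                   # innermost handler is last

def raise_op(op):
    _handlers[-1](op)

def print_(text):                # the convenient "verb" wrapper
    raise_op(Outbound("stdout", text))

def capture_operations(thunk):
    """Run thunk, recording raised operations instead of performing them."""
    captured = []
    _handlers.append(captured.append)
    try:
        thunk()
    finally:
        _handlers.pop()
    return captured

# A base handler that "performs" output by appending to a log.
log = []
_handlers.append(lambda op: log.append(op.text))

ops = capture_operations(lambda: print_("foo"))   # nothing reaches the log yet
for op in ops:
    raise_op(op)                                  # re-raise: now it is performed
```

The captured value is ordinary data, so it can be inspected, rewritten, or replayed under a different handler, which is exactly the flexibility being asked for.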

Being forced to explicitly model everything 100% of the time as data has lots of benefits, but it is very verbose. I want to be able to explicitly model things with effects and recover an implicit, pure model when it suits me.

Further, I had a hard time coming up with the name "Outbound" for the operation (noun) to differentiate it from the function (verb). Names like "Print", "Write", etc. don't immediately distinguish operation data from an effectful function. A more implicit modeling of effects would enable greater economy of taxonomy.


I don't have any problem whatsoever with notational sugars or type systems being optimized to keep effects out of one's way during normal programming. Abstraction and composition of effects should certainly be feasible (if not, that's a PL failure).

But I don't think of this as "recovering" a pure modeling. From what would I be recovering it? The model was pure all along.

hard time coming up with the name "Outbound" for the operation (noun) to differentiate it from the function (verb). Names like "Print", "Write", etc. don't immediately distinguish operation data from an effectful function.

Well, you can always take the Haskell route: capitalized is data, lower case is verb.

Or you could take the Forth route and just skip the use of nouns entirely. :D

right on

As a Regular Joe day job programmer, I see lots of ways things can go wrong in code. I have only ever more wanted pure functional style over the years, never less. I want all the stuff to be first class and higher order as much as possible. DCC indicated that the explicitness can quickly go wrong. I dunno about ATS et al. If there were inference or something, or a way to pull out the big-gun tools if/when needed, instead of having them required/on all the time, that'd be cool.


I think ATS can do this. The precise typing and proofs are not required. See the original tutorials where you can write things without bothering with proofs. Exactly how seamless this is (migrating from no proofs to adding them to parts of your code) is not clear to me but I asked Hongwei Xi about this and he was quite definite that the intent is to allow using crude types and then refining them.

Better than random

I would rather have a type system based on logic than something ad-hoc and random. At least you can reason about types in languages like Haskell without falling immediately into unsoundness.

I agree that the limitations of mathematics should not limit programming, and I tend to favour imperative constructs for their inherent readability and understandability. Having spent some time with both Rust and Python recently, I really miss type signatures in Python, especially for return types when the type is a collection. In Python I have to look at the source file (and which import, and where in the filesystem the definition is, is not obvious either), when a simple type signature like "Dictionary<String, Int> heights = get_people()" would normally avoid the need to even look.

I can do without the Category theory, but due to the computational trinity, you can view types as logic or categories. I would like to say I get category theory but every time I try and understand it my large intestine leaps through my neck and tries to strangle my brain. I have no such problems with the logic interpretation. I can only conclude that category theory is the mathematical equivalent to Vogon poetry.

Any type system

Any type system can be understood as a logic. The surprise in Curry-Howard is that basic typing constructions correspond so very closely to familiar basic logical principles. In particular, function application corresponds perfectly to modus ponens: Given an expression with type A->B, and an expression with type A, you can put them together to get an expression with type B. Given a proof that A implies B, and a proof of A, you can put them together to get a proof of B.
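In code, the correspondence is nothing more exotic than function application itself; a tiny typed Python sketch:

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def modus_ponens(f: Callable[[A], B], a: A) -> B:
    """A proof of A -> B plus a proof of A yields a proof of B —
    which at the term level is just function application."""
    return f(a)

# `len` proves str -> int; "abc" proves str; together they prove int.
n = modus_ponens(len, "abc")
```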

Despite my criticisms of category theory (and despite my lol at the Vogon reference), I've found that on occasion, looking at ordinary things through the lens of category theory can provide glimpses of beauty that was invisibly there all along. (I feel inspired to write a blog post, if I can find time and somehow overcome my uncertainty over how to generate and embed 3D animations; not everyone could visualize the twisted hourglass of an adjunction with less than a 3D animation.)


My problem with category theory is probably one of presentation. It is normally presented in a very abstract way, and I am a 'learn by doing' sort of person. I can re-read the same chapter of a book several times and still not get it. I either need someone to discuss it with (by which I mean someone I can explain what I think to, and have them correct me) or I need to do it (use it in a meaningful context). I think category theory is interesting, but I find logic more intuitive, hence my joke about Vogon poetry (and for any purists out there, it was of course Grunthos the Flatulent's small intestine that saved civilisation by strangling his brain before he could recite his 12-book epic, nothing to do with Vogons, except of course the bad poetry).

please, please do

i think this would be a true contribution to the development of humanity. no joke at all. it would be awesome.

I expect my best efforts

I expect my best efforts will fall far short of expectations; but even so it's encouraging to know someone else sees the potential in the idea.

I think type systems are more like unification.

As somebody who worked with logical-inference systems and the unification algorithm right out of college, the logic of type systems has always been familiar from that.

You know the way Horn clauses work in Prolog? Where you have something like

(father, ?A, ?B) :- (parent, ?A, ?B) ^ (male, ?A)
(daughter, ?B, ?A) :- (parent, ?A, ?B) ^ (female, ?B)
(sibling, ?B, ?C) :- (parent, ?A, ?B) ^ (parent, ?A, ?C)
(sibling, ?A, ?B) :- (sibling, ?B, ?A)
(brother, ?A, ?B) :- (sibling, ?A, ?B) ^ (male, ?A)

And so on, with the more-or-less obvious meanings.

Anyway, the basic Curry-Howard logic seems to me to be exactly the same Prolog-ish unification algorithm, and you could write it in exactly the same way as a production system.

(IsType, ?TYPEA, ?PROCA(?TYPEB)) :- (ReturnsType, ?PROCA, ?TYPEA) ^ (Args, ?PROCA, ?B) ^ (IsType, ?B, ?TYPEB)
(IsType, Numeric_Type, ?A) :- (Arg, +, ?A)
(IsType, Numeric_Type, ?B) :- (Returns, +, ?B)
(IsType, Numeric_Type, ?A) :- (IsType, Integer_Type, ?A)
(IsType, Numeric_Type, ?A) :- (IsType, Float_Type, ?A)
falsum :- (IsType, Numeric_Type, ?A) ^ (~ IsType, Float_Type, ?A) ^ (~ IsType, Integer_Type, ?A)

and so on are its basic laws, and all the stuff we did for inference production systems-as-expert-systems back in the day (including hyperoptimizations of implementation like the RETE algorithm and shortcuts like backward-chaining plus forward-chaining to meet in the middle, etc) is applicable to type inference. That which can be deduced about type is simply the transitive closure of all the facts under the set of production rules.

The ability to derive a contradiction (or falsum) corresponds to a type error.


I can confirm it is the same algorithm, having written a Prolog interpreter and an HM type inferencer. Also interesting is that type classes operate exactly like Horn clauses, although in Haskell they are the closest match, with no backtracking. Rust traits have backtracking, so the type system is effectively an interpreter for the positive subset of pure Prolog. In my language project I am using a Prolog dialect as the type system, so the inference algorithm is written in the embedded Prolog, and it is user extensible. There is no specific mechanism for type classes; they are just Horn clauses with associated code (it's very much a work in progress).
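The shared core both comments point at — unification — fits in a few lines. A minimal illustrative Python sketch (no occurs check, not production-grade inference):

```python
# A tiny unifier: the common engine behind Prolog resolution and HM
# type inference. Variables are strings starting with '?'; compound
# terms are tuples.
def walk(t, subst):
    """Chase a variable through the substitution to its binding."""
    while isinstance(t, str) and t.startswith("?") and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst=None):
    """Return an extended substitution, or None on clash (a type error)."""
    subst = dict(subst or {})
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, str) and a.startswith("?"):
        subst[a] = b
        return subst
    if isinstance(b, str) and b.startswith("?"):
        subst[b] = a
        return subst
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None                  # clash of distinct constants: falsum

# Unify the type of the identity function (?a -> ?a) against (int -> ?b):
s = unify(("fun", "?a", "?a"), ("fun", "int", "?b"))
```

The failure case (returning `None`) is exactly the "derive falsum" of the parent comment: an unsatisfiable constraint set is a type error.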

see Shen

Shen has a built-in Prolog that is used to define new types.