A Functional Representation of Data Structures with a Hole (1998)

LtU doesn't seem to have any mentions of this paper by Minamide from 1998.

Here's a simple problem that benefits from the ideas described in the paper. The usual way of writing functions like "map" or "append" in a strict language like ML involves building the result list in reverse and then reversing it, which allocates roughly twice the memory actually needed. Imperative languages like Java can do without intermediate lists, but at the cost of mutating the list nodes after allocating them. To get the best of both worlds, we can extend the idea of tail recursion to also allow "tail allocation", aka "destination passing style", which just means that a recursive call nested within a constructor call should be treated as tail-recursive and compiled to code that uses mutation internally. That way, the naive implementations of functions like "map" and "append" become as efficient as imperative code, allocating only as much memory as needed for the final result.
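For concreteness, here is the usual accumulate-and-reverse idiom in OCaml (a sketch of the status quo, not code from the paper). Destination passing aims to let the naive, non-tail-recursive version compile to code that writes each cons cell's tail in place, so this intermediate list disappears:

  (* the common strict-ML workaround: build the result reversed, then
     reverse it, allocating an intermediate list as long as the result *)
  let map f xs =
    let rec go acc = function
      | [] -> List.rev acc
      | x :: rest -> go (f x :: acc) rest
    in
    go [] xs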

Some folks on the OCaml mailing lists have pointed out that such an approach would break some assumptions of the garbage collector, namely that the old generation never points into the young generation. I don't know if that problem outweighs the potential benefits of the technique.


Interesting

The problem that the natural way to write the "append" function is not tail-recursive is well-known. There are five classic solutions:

- Tail-recursion modulo cons: add a compiler optimization that recognizes these patterns (the non-tail-recursive context is a form of data constructor context that can be turned into partially-defined data and unified later) and make the transformation automatically; I believe Lisp/Scheme compilers have implemented that in the wild, but to my knowledge that is not the case of ML compilers. This is the same kind of theory later used in the Clements and Felleisen work on formalizing stack inspection.

- Explicit destination-passing style: with a linear type system that supports linear mutable state, you can explicitly write programs in destination-passing style. The experimental Mezzo language, for example, lets you do that in a relatively convenient way.

- Laziness by default: if your data constructors are lazy, non-tailness of map/append is not a problem as long as the *consumer* of the result is itself lazy or productive (it starts by forcing the constructors rather than the recursive result).

- CPS-style or CPS compilation. Writing the function in CPS style, or having your compiler use a CPS representation that allocates the call stack on the heap (no OS limit), will diminish the need for tail-recursion. You still have the problem of double allocation, but it may still be faster than the mutating approaches, which have to pay for GC write barriers.

- Difference lists, when you have unification variables.

Minamide proposes a variant of "explicit destination-passing style" where the explicit mutation is replaced by the application of a linearly-used function. The linear invariant guarantees that the corresponding closure can be represented exactly as the partially-defined data of the explicit imperative algorithm -- along with a pointer to the hole.

That is a cute solution in the sense that it may feel more ML-ish than the imperative approach: it is a nicer functional idiom for this need. You still need a linear typing discipline (so, semantically, you pay the cost of the imperative style), but with an interesting functional syntax. One question is whether that is actually as general as the imperative approach (which also helps for e.g. bootstrapping recursive data and all sorts of recursive initialization), or whether it is restricted to special use-cases -- and thus maybe users would be better served by the full imperative route.

Another way to think of hole-functions is that they are a special optimized case of CPS style, when continuations are used linearly and their closures can be represented exactly as a partially-defined prefix of the continuation's result. "Difference continuations", may we say.
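As a rough illustration (plain OCaml, not Minamide's actual hole calculus; the names `build` and `append` are mine), here is append written so that the recursive work produces a "list with a hole" as a linearly-used closure. The observation is that, under the linear-use invariant, such a closure can be compiled as a partially built list plus a pointer to its hole, instead of a chain of real closures:

  (* build xs returns a "difference list": a function that, given the final
     tail (the value to put in the hole), returns the completed list *)
  let rec build xs =
    match xs with
    | [] -> (fun hole -> hole)
    | x :: rest ->
        let fill = build rest in
        (fun hole -> x :: fill hole)

  (* append plugs ys into the hole left at the end of the copy of xs *)
  let append xs ys = build xs ys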

I find it interesting that lazy thunks can be thought of as generalized cases of such "hole functions", without the linear invariant: after a thunk is mutated, future accesses to the thunk (here the hole-punched function) see the fully initialized value. A difference is that thunks do not get passed the value-to-put-in-the-hole at forcing time; they already know what it is. Hole functions do not say what will go in the hole, but in all the examples it looks like we actually know what should go there at hole-function definition time: the flexibility to define the function and the hole-value separately is not really useful. This could suggest that a controlled use of laziness (where forcing happens )

(Note that people who think that destination-passing style should be invisible to the programmer, that is, who favor tail-recursion modulo cons as a compiler optimization, will not care much about the difference between destination-passing style, CPS or hole-functions, except maybe if you want to avoid having mutation in your intermediate compiler language as well.)

One issue with these destination-passing-style approaches is the interaction with garbage collection:

- Mutating values after the fact may need a write barrier which may make this implementation *less* efficient than the usual double-allocation one. I'm not an expert on generational stuff, maybe it's possible to special-case those partially-defined values to avoid that.

- Having a pointer to the hole inside a value may break a GC invariant that interior pointers always go to the beginning of a value (where tag information is present). Minamide actually has a paragraph about that (quoted below), but if I understand correctly the experiment was done in the middle of an experimental TIL compiler with a tag-free GC, so the approach may not be easy to adapt to more realistic current implementations.

Our compiler uses tag-free copying garbage collection. There is one problem in the implementation of garbage collection in presence of hole functions: holes point to the middle of objects in heap memory. Thus we have to treat holes carefully in implementation of garbage collection. Our strategy is updating the pointers to holes after usual garbage collection is finished. This works because if a hole is live, then there is a live object containing the hole.

Futures

Futures and promises provide another option, somewhere between destination-passing style, unification variables, and laziness. For example, tail-recursive append in Alice ML.

However, while it looks neat on paper, I ultimately think that most of the ways to enable top-down construction are not worth their overall cost, either in language/type-system complexity (destination passing), or in runtime overhead (laziness, transparent futures), or both (unification variables). Or they are too narrow a solution for this problem (tail-recursion modulo cons, CPS). In Alice ML, despite the very general mechanism that its futures provide, we rarely found a need, other than in toy examples, to use them for this purpose in sequential programming.

Indeed the general solution

Indeed the general solution for a 1*n tail-recursive append in a language with control operators would need the aforementioned linearity constraint. Oftentimes in Typed Racket, I'll want to "optimize" some construct with mutation, and it's always shot down as unsafe in the presence of control.

Yield

What about pseudo-lazy approaches like 'yield', how do they fit into this?

Languages that support this

The naive append is tail recursive in Mozart/Oz because of its use of single assignment variables. Function calls f(x) actually get compiled as f(x,res) where res is the single assignment variable that will store the result. So if you have cons(a,f(b)), then f is in tail position. Like with laziness, the disadvantage is that you pay for single assignment variables everywhere.

ATS has a type T? for an uninitialized version of T. That way you can allocate a list node with an uninitialized tail, which later gets filled in. The compiler verifies statically that you can only use such a value after it is initialized.

Loosely related note: Tag-free garbage collection isn't that simple if your garbage collector is generational. In a normal generational collector there is a write barrier that records pointers from the old to the young generation, so that when we do a minor collection we know that we have to use that pointer as a starting point to trace the live data. When you are tag-free, you can't just store the pointer, since from a raw pointer you don't know the layout of the data that it's pointing to. So in the write barrier you don't just need to record the pointer itself, but also the layout information for what's behind the pointer, so that when you do a minor collection you know how to traverse it.

Hm

I enjoyed reading CTM years ago, but I'll admit I forgot about that part -- much as I was originally embarrassed to have forgotten about AliceML's promises which indeed would have been relevant to the discussion.

Without more context it's not exactly clear how you derive the fact that cons(a,f(b)) has f(b) in tail position from the return-parameter translation. In fact it seems there are two translations:

  g(Y) =
    new X,
    f(b,X)
    cons(a,X,Y)

  h(Y) =
    new X,
    cons(a,X,Y),
    f(b,X)

Only in the latter translation is `f` in tail position. How do we choose between the two? It looks like doing the destination-parameter translation corresponds to a form of ANF (decomposition of complex expressions) with more ordering possibilities.

If the compiler has to be "clever" about choosing the second translation, I would consider this with suspicion: it's not an absolute fact of the language semantics, it's a heuristic decision much like tail-recursion modulo cons (which is here elegantly expressed).

Also there may be trouble with effects: if `cons(x,xs,Y)` were an arbitrary function rather than a known primitive, it could have side effects, and you would have to choose the translation that corresponds to the evaluation order specified in the base language, which I suspect would be the first one (the effects of `f` are performed first, then those of `cons`).

Yes you're right, as far as

Yes you're right, as far as I know this works because in Oz constructors are different than function calls syntactically, so the compiler knows to order constructors before their arguments.

continuations improve correct compilers

From around the same time period, Erik Meijer's "More Advice on Proving a Compiler Correct: Improve a Correct Compiler" (1994) mentions on p.9 that (in his context of refactoring syntax-directed translation to avoid a direct reinterpretation of everything at runtime)

Invariably every efficiency improving transformation is aimed at making explicit an otherwise implicit evaluation order by the introduction of additional continuation arguments.

These days I'd guess we would say his m.o. is to explicitly unzip bulk structures until he gets to machine-sized chunks of data- and control-flow.

(Note that at the end of that paper he laments the lack of machine theorem proving support in order to confidently construct (or at least synthesize) each transformation step; it doesn't help that even morally local transforms involve rewriting almost every rule in his semantics. My current interpretation is that this process is like using Jacobians to change variables in an integration, but that the terms in the integral calculus are very simple compared to what we need to manipulate when using, say, a Montaguian translation base, to pass between algebrae. Do we have better ways to perform these transformations now?)

You still need a linear

You still need a linear typing discipline (so, semantically, you pay the cost of the imperative style), but with an interesting functional syntax.

A linear type discipline is significantly simpler than a full imperative style. Full imperative programming includes pointers, which give you duplicable references to mutable data. The duplicability of pointers means you now can have a pointer equality test, which brings in all of the complications of names, hidden state, and equivariance (a la the nu-calculus) -- which is where most of the difficulty with reasoning about state comes from.

If your program admits a simple, unfancy linear typing, it will be very easy to reason about (even easier than an ordinary purely functional program, in fact).

Interesting

Of course my comment there was much too coarse-grained, but I'm not sure I'm convinced by your symmetric position here.

By "imperative style" I was thinking of the mental overhead of thinking at the returned value in term of sequence of operations building a result, rather than a direct expression describing the value. The "linear typing discipline" I had in mind is Mezzo, which does not allow arbitrary aliasing and duplicable references, so I agree it is not "full imperative programming".

I was trying to say that if you need to embed a linear typing discipline in your programming language, it's not much more work to add linear references with strong update à la Mezzo, which corresponds to a form of imperative programming. Thanks for pointing out that reasoning in such a setting is not terribly difficult: most of the difficulties come from aliasing indeed.

I'm not completely sure what you mean by your last sentence. I'm not sure it's realistic to have users program only with linear types without a bang modality -- outside specific problem domains where this restriction is helpful for important resource analysis, e.g. your own work on memory usage in FRP. I'm quite sure linear types will invade mainstream languages at some point, as a way to get finer-grained information in specialized subsets of programs, but probably with the usual functional style retained in most other parts -- that would therefore not be a simplification.

Linear types have already

Linear types have already invaded the mainstream a little bit, e.g. non copyable types in C++. Though types that can't be implicitly destroyed haven't yet entered the mainstream as far as I know.

What I mean...

...is that you get much stronger parametricity properties for linear types.

E.g., if you have a term of the type ∀a. a ⊗ a ⊸ a ⊗ a, then by parametricity you know that it is either the identity function, or it swaps the two arguments, and that's it. In fact, this generalizes in a really nice way: linear functions of type ∀a. list a ⊸ list a are necessarily permutations.
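Spelled out, the free theorem being appealed to (my paraphrase, not a quote) is roughly: for any linear f : ∀a. list a ⊸ list a and any length n, there is a single permutation σ of {1, ..., n} such that f [x1, ..., xn] = [xσ(1), ..., xσ(n)] for every choice of the elements xi -- linearity rules out dropping or duplicating elements, and parametricity rules out inspecting them.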

So for example, we could imagine writing a sorting function using linear types. Ordinarily, to prove a sorting routine correct, you have to prove that (a) the result is in increasing order, and (b) the result is a permutation of the input. One thing I'd like to work out is how to discharge (b) purely by observing that the sort routine has a linear type and then appealing to parametricity.

That's the kind of thing I mean when I say linear functions are easier to reason about than ordinary pure functions.

Simple linear types

If we have a linearly typed program in our hands we can reason about it. Is Gan, Tov and Morrisett (2014) a reasonable summary of the current state of the art with respect to actually getting those programs in our hands without unduly burdening programmers?

This, plus...

...Tov and Pucella's Alms language, plus Mazurak, Zhao and Zdancewic's F-pop (see Lightweight linear types in System Fo) pretty much brings you to the state of the art in lightweight linear types. (To bound the length of this comment, I'm ignoring linear capability/typestate-based languages, like Mezzo and Plaid. Rust also has a form of linear (affine) types, but AFAICT it works in a somewhat indirect fashion.)

All of these calculi try to blur the distinction between linear and nonlinear types, distinguishing them from each other by use. My personal preference is to avoid that kind of inference, and use the Benton-Wadler LNL (linear/non-linear) calculus. This lets you add linear types to an ordinary functional language, without seriously modifying the ordinary functional part of the language in any way. (In fact, my coauthors and I have a recent draft which shows that this style of calculus scales all the way up to full dependent types in an unproblematic way.)

lets you add linear types...

This lets you add linear types to an ordinary functional language, without seriously modifying the ordinary functional part of the language in any way.

Could you explain how you'd do that?

Intuitionistic arrow

(I've been tempted to turn this comment into a post about neelk's article, but I've refrained in the past from submitting un-reviewed drafts, so I'll leave it as a sub-discussion for now.)

The usual idea of linear calculi is that the core abstraction construction is the linear arrow (A -o B), and that the usual functional arrow (A -> B), where the argument may be duplicated as many times as you want, can be encoded, for example as (!A -o B) -- there are other encodings and choosing one in particular has interesting implications. In LNL, both (A -> B) and (A -o B) are primitive notions that co-exist.

Mixing linear and dependent types is a hard problem. Roughly, neelk's draft above keeps the linear arrow (A -o B) non-dependent, and independently turns the intuitionistic arrow into a dependent Pi-type. In this way you avoid the bad interactions between linearity and dependency, and you have a language where you can mix both. In fact, you can go a little further, and have dependent pairs and functions where the dependency is on an intuitionistic (duplicable) type, but the rest of the type may be linear.

The draft has nice examples of applications (linearity lets you talk about state/resources nicely, and dependency about program invariants), and a decomposition of Hoare triples in terms of dependent and linear types. There have been similar decompositions before (e.g. Fstar's Dijkstra Monad), but throwing linearity into the mix makes it possible to talk about mutable state in a fine-grained way.

Now for a little personal opinion, with the warning that I haven't read the paper in depth yet. I find the problems that arise when mixing dependent and linear types (or dependent types and sequent calculi, for that matter) rather fascinating. This paper does not solve those problems; it rather side-steps them in a clever way; so if you're looking for "finally a solution to dependent linear calculi", you may not be satisfied with it. What I would rather take away from the paper is that, once more, it's been shown that rich program logics can be decomposed in terms of simpler connectives (and that linearity is excellent at encoding varied modalities); we could possibly go even further (it's a bit surprising that there are both squash-types [A] for irrelevance and, separately, irrelevant dependent pairs and functions).

There have been a lot of papers working on specialized type theories or separation logics to tackle this or that hard problem; we may see some unification coming, with all those concepts re-expressed from powerful primitives. You need strong meta-theoretic tools for this, and it's interesting that this paper takes inspiration from NuPRL in its formal development.

Pointers

Tradeoff. The judicious use of imperative features, including pointers, not only leads to better performance, but can also make code much easier to reason about. Just because I can now create aliasing problems doesn't mean I will. Sometimes types help, but if they force you to write suboptimal, complex code to achieve the same ends, it's not at all clear the ability to reason about the code is improved.

My system has pointers as well as supporting FP. Just because I have pointers doesn't mean, in a particular part of the code, I'm using them. I can still do FP. And I can write imperative code too, and even encapsulate it in functions, so that even though the type system doesn't make such strong promises .. I can still use my brain and reason about the code.

  //$ Concatenate two lists.
  fun join[T] (x:list[T]) (y:list[T]):list[T] =
  {
    if is_empty x do
      return y;
    else
      var z: list[T];
      var last: list[T];
      copy_last (x,&z,&last);
      splice (&last, y);
      return z;
    done;
  }

Destructive imperative code applied to a copy. Close to optimal, and the join function is purely functional. I know it is safe because the join function *owns* the copy. I wrote this before I implemented uniqueness types. copy_last copies the list given as its first argument, storing the copy through its second argument and a pointer to the copy's last node through its third, for splice to modify.

Tail recursion modulo cons

- Tail-recursion modulo cons: add a compiler optimization that recognizes these patterns (the non-tail-recursive context is a form of data constructor context that can be turned into partially-defined data and unified later) and make the transformation automatically; I believe Lisp/Scheme compilers have implemented that in the wild, but to my knowledge that is not the case of ML compilers. This is the same kind of theory later used in the Clements and Felleisen work on formalizing stack inspection.

Got to this thread from an ATS talk at Strange Loop where the speaker mentions this optimization. Further searching turned up this discussion on the OCaml lists where Greg Morrisett discusses a few compilers that implemented it in a couple of different ways. Search for "tail-allocation".

TRMC in OCaml

TRMC has been proposed as a (not-yet-integrated) patch against the OCaml compiler by Frédéric Bour in May 2015: RFC: tail recursion modulo constructors #181.

I read Greg Morrisett's message (thanks for the reference), but I am not sure TRMC is as easy to implement as claimed. It is easy for lists, but it seems to become non-trivial as soon as the recursive datatype built by the function has more than one recursive constructor, and it has subtle interactions with garbage collection and non-linear control-flow effects.

Oleg has a page about the interaction between destination-passing style (or TRMC) and duplicated delimited continuations, A Time-Travel Story.

It's worth noting that a

It's worth noting that a good solution still seems to be an open problem, as just last week I read a thread proposing a new, faster OCaml List.map implementation.

As you've said below, the performant yet elegant solution still seems out of reach, but at least some performant pragmatic solutions are available. Perhaps once supercompilation matures a little, elegance will reproduce the pragmatic results.

Hm

For OCaml itself, I'm not sure whether the TRMC problem is open by difficulty, or because people are not working on it. I asked Frédéric Bour about it recently and he claimed that most of the issues with his TRMC patch have been solved, but he simply hasn't got around to making a new push.

(I haven't had time myself to carefully read Oleg's proposal to solve the time-traveling issues; I made a proposal that is too complex, I think, to be implemented in the general case, and he has a simplification that may be an improvement.)

Regarding efficient List.map, a factor that comes into play is the fact that the OCaml compiler does not perform unrolling. (If it did, then it would be very hard to beat the non-tail-recursive implementation, and people wouldn't have all these reimplementation competitions.) To me that sounds like yet another "nothing hugely difficult but it requires work that nobody is pouring in right now" rather than a fundamental difficulty.

I do agree with you, from a more high-level point of view (beyond TRMC), that the tension between "for performance and compatibility it's important to use the system stack" and "finiteness of the system stack introduces unpleasant concern in reasoning with non-tail-recursive code" is difficult, fundamental and fairly frustrating. Most non-C languages are seeing a lot of churn and sophistication around these issues, often on the occasion of thread model / {co,go}routine implementations.
Explicit TRMC is a partial solution to this problem (it only works in some cases, and being explicit rather than transparent to the user doesn't fully solve the unpleasantness) that I have come to appreciate.

Oleg has a page about the

Oleg has a page about the interaction between destination-passing style (or TRMC) and duplicated delimited continuations, A Time-Travel Story.

It's interesting how much the optimized map looks like a Confused Deputy. The map function has amplified rights in relation to its callers, in that it can see a mutable version internally, and it's tricked into exercising that authority improperly via the stack manipulation of the delimcc library.

Control/state

Some time back I started to write out a list of the important ideas we've isolated in a half-century or so of programming. Stuff one could be pretty sure would still seem truly fundamental in another century or two. The list, I recall, was very short (maybe two or three items?); but one of the things on it was the distinction between data and control (though this dichotomy isn't even specific to programming, but crops up all over: nouns and verbs, matter and energy, space and time). It seems we're still looking for the right balance/relationship between the two. OOP tried to glom control onto data with, it seems to me, some initial success due to being more integrated, but bogs down in the long run because it really isn't a general solution to the integration problem. This sounds like another part of the elephant, hitching a bit of the data side of things onto control. I do wonder what the whole animal looks like.

what were the other 2 then?!

don't leave me hanging on a limb, here...!

regarding the yin/yang of it all: i always wish there were some ultimate and concise solution but so far i ain't heard of it, so i think the wisdom must boil down to the classic retort to all non-math interview questions: "it depends."

my list, fwiw

Heh. Well, fwiw. I said "two or three" because I think maybe I had three on my list, but whenever I think back on it I can only remember two. Control/data. And the broad concept of an "operating system"; roughly, an underlying housekeeping program that provides services. No doubt somebody else might come up with a very different list; but those are the two I remember listing.

I'm reminded of George Pal's rendition of The Time Machine, where at the end George has returned to the far future to rebuild the world from scratch, taking three books with him — but we don't know what three books they were. Any list of three books Pal came up with wouldn't have held a candle to asking us to speculate.

Can we have the beautiful?

Is it reasonable to expect the naive, elegant `append` function to be the Right One?

A few years ago I was distinctly of the opposite opinion: automatic, implicit optimization hides semantics from the surface language in ways that are hard to keep well-specified and orthogonal -- having surprising performance heuristics in your compiler is not the ultimate language design. I wanted my language design to let me be explicit about the efficient way to compute things, instead of having it done by magic.

Recently I've thought again about the issue, and I'm a bit tired of having to ask beginners to write the beautiful version, and then decide whether I'm satisfied with that, or whether I should go further and explain tail-recursion and that other, better, but more cumbersome way to write the function.

This renewed my interest in, for example, tail-recursion modulo cons: can it be specified in a rigorous, robust, predictable way? I'm not satisfied with most of the solutions brought by the above discussion (destination-passing, explicit use of unification variables or promises, one-hole functions), because in the end the code is much uglier than it should be.

Can we have a beautiful semantics where the beautiful version is the final one?

Or maybe we're looking at the wrong problem, strict lists are a bad data structure to use, and we should be using well-engineered sequences that strike the right balance between on-demand production and bulk computation. Does this frustrating problem only happen with lists, or would side-stepping lists just bury it deeper in some less well-known problem domain?

maybe not in a functional language.

Maybe the answer is that functional languages are not the right model. The list is a simple data structure; replacing it with something more complex and byzantine to get good efficiency suggests something isn't right. Surely nothing can beat the single write required in a random access machine to append two singly linked lists? It's also a fairly obvious solution, and that may suggest it is the right model.

It's possible to write a naive, simple and efficient list append in an imperative language, without using a data structure you need a PhD to understand.

beauty is truth, truth beauty

For me, the goal of programming is to write beautiful code, in the sense that it's obvious by looking at it whether or not it's right. I'm mulling over a possible blog post on this; the goal is to arrange that errors can't hide: if something were wrong, it could be readily found by reading the code. It ought to be possible to seriously study the ways that code succeeds and fails by this criterion, and therefore how to maximize it.

This is, I think, why I'm repelled by language features that limit how things can be expressed: I don't expect any mere formal restriction to be able to circumscribe all the really beautiful solutions, in a manner poetically akin to the way no consistent system of logic can circumscribe all consistent reasoning (Gödel's theorem(s)).

Form liberates

Quoth Fred Brooks:

There are many examples from other arts and crafts that lead one to believe that discipline is good for art. Indeed, an artist's aphorism asserts, "Form is liberating." The worst buildings are those whose budget was too great for the purposes to be served. Bach's creative output hardly seems to have been squelched by the necessity of producing a limited-form cantata each week. I am sure that the Stretch computer would have had a better architecture had it been more tightly constrained; the constraints imposed by the System/360 Model 30's budget were in my opinion entirely beneficial for the Model 75's architecture.

Similarly, I observe that the external provision of an architecture enhances, not cramps, the creative style of an implementing group. They focus at once on the part of the problem no one has addressed, and inventions begin to flow. In an unconstrained implementing group, most thought and debate goes into architectural decisions, and implementation proper gets short shrift.

This effect, which I have seen many times, is confirmed by R. W. Conway, whose group at Cornell built the PL/C compiler for the PL/I language. He says, "We finally decided to implement the language unchanged and unimproved, for the debates about language would have taken all our effort."

Sic et non

Sure. I'm a big fan of form. It's rigidly constrained form I'm not keen on. An artist should know the rules, and when and how to bend or break them. There's a difference between sketching an outline on a canvas and then painting over it, and using your pretty crayons to color in the spaces in a coloring book.

This is essentially why in my definition of abstractive power I sought to address not the ability to do stuff (which is expressiveness, in Felleisen's sense) but the ability to make a wide variety of determinations about what will be possible to do later.

destructive append

Surely nothing can beat the single write required in a random access machine to append two singly linked lists?

Well, that's a destructive append. Mutable lists get nasty and ugly in their own use cases, e.g. if you want structure sharing or parallel processing or multi-version concurrency control.

While finger trees are awesome, they're also a fine example of "a data structure you need a PhD to understand". Yet, with O(1) access and update to both ends, O(min(lg(N),lg(M))) append, O(min(lg(K),lg(N-K))) split and read/write access to the middle, and non-destructive updates, I'd be inclined to say they do beat imperative singly linked lists quite soundly in almost every parameter except for being easy to understand. :)

Simple incremental streams, e.g. `type a Stream = µS.(1 → ((a*S)+1))`, are pretty nice and simple, though. They have the same asymptotic append cost as lists - O(N) in the number of elements in the first stream. But that cost is paid incrementally as we process the stream rather than all at once.
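For concreteness, here is one OCaml transliteration of that stream type and its append (the type and constructor names are my own):

  type 'a stream = Stream of (unit -> 'a step)
  and 'a step = Done | Yield of 'a * 'a stream

  (* append does O(1) work now; the O(N) cost of walking the first stream is
     paid one element at a time, as the consumer forces the result *)
  let rec append (Stream next) s2 =
    Stream (fun () ->
      match next () with
      | Done -> let (Stream next2) = s2 in next2 ()
      | Yield (x, rest) -> Yield (x, append rest s2))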

do they?

Do they? If you share a list you expect changes to be visible everywhere - that does not cause problems. I guess you are referring to problems with multi-threading? But maybe multi-threading is the problem. State machines communicating by message passing is one way to solve that by not having threads. If you want threads then using software transactional memory as the memory model for the language might work.

I guess the difficulty is there are two ways to refer to a list, you either want it to behave like a value or a container. In an imperative language you make the choice explicit (pass by value or by reference); in a functional one you limit to only the 'value' - sacrificing expressivity for simplicity.

Maybe ML gets it about right, but people just need to accept programming with references is not 'bad'.

I don't see how finger trees beat a singly linked list, providing you have a tail pointer in the list 'head'. Append and split are O(1)?

Also, I may be misusing the term functional? If we just mean first-class functions, I am not even sure pure functions exclude mutating arguments passed by reference (as the output of the function depends only on the values of its arguments). So I'm not quite sure where the requirement for immutable data comes from? I suppose it's referential transparency, but I wonder about memoising the function as a graph rewrite? Effectively the value of a function can be seen as a constant graph-rewrite operation, recovering referential transparency?

pervasive mutability

Do they? If you share a list you expect changes to be visible everywhere - that does not cause problems.

Pervasive mutability is a cause of a great many bugs in real applications - even when we expect the visibility, even in single-threaded applications. Consider reentrancy, and iterator invalidation, and callbacks, and the growing number of possible state interactions. Concurrency exacerbates state problems by allowing even more interactions. Even if your brain is big enough to track all these, you might be better off spending its finite resources elsewhere.

two ways to refer to a list, you either want it to behave like a value or a container [..] in a functional one you limit to only the 'value'

Not really. Lists are often used to model containers in functional languages. It's just treated as a separate concern, e.g. using explicit `STRef [a]` or `State [a]` or similar. Separation of concerns is usually a good thing, and this case doesn't seem to be an exception.

people just need to accept programming with references is not 'bad'

What does 'bad' even mean in this context?

Is it?

Is pervasive mutability the problem? Hardware design uses pervasive mutability, and has a much lower error rate than software. Could it instead be an architectural problem with failure to componentise properly and limited substitutability and parametricity of components? Again hardware design (essentially VHDL programming) does much better on this front.

Your first paragraph is what I mean by 'bad' :-). I am not sure there is much difference between C where things can be const and ML where things can be a mutable ref. I am not concerned whether things are default mutable or immutable, as long as both are available. I guess the difference is you would have to make your own list out of value 'maybe ref' pairs to gain control over the links in ML.

Does it?

Hardware design uses pervasive mutability

AFAICT, hardware design leaves the vast majority of stateful logic to software. If you just look at what the hardware is doing - e.g. processing sound buffers, message buffers, etc. - it might use some local indices and iterators per message, but not much by way of long-term state.

Computing hardware certainly enables pervasive mutability, but it seems to buck most responsibility for deciding what that state should be or how it should evolve over time.

hardware mpeg encoder.

Chips like the hardware MPEG encoder I worked on are just as complex as software -- and pretty complex software at that, requiring standards compliance. You feed YUV video in one end and an MPEG stream comes out the other.

Seems like an exception to

Seems like an exception to prove the rule. :)

Yes, hardware certainly can implement ad-hoc software, and sometimes this is worth doing - e.g. to save power and reduce latency compared to using a general purpose computer.

But I expect you'll also find that your earlier claim - that hardware has much lower error rates than software - does not readily apply as you shift more logic into the hardware. There is plenty of buggy hardware and firmware.

$500,000 bug

When a new mask set can cost $500,000 (although that was a few years ago now), they make sure there are very few bugs. Certainly a much lower error rate than released software.

I think the issues you listed earlier can be addressed in other ways, for example encapsulating state in objects that are verified against a specification.

Valid conclusion?

Hardware design uses pervasive mutability, and has a much lower error rate than software.

You compare the outcome of two scenarios (mutability in hardware versus software) and observe that one of the scenarios typically fares better than the other.

How can you draw any conclusions at all based on that observation since project budget, team size, programmer skill, the cost of bugs, quality assurance level and a swath of other factors probably differ between the two?

Ignoring those issues, what is the intuition behind mutability not causing problems? There are plenty of research papers on reasoning about stateful programs published every year so there must be something about it that we haven't quite figured out just yet.

python?

Python programmers write fine code. As you say, with so many variables it's hard to say either way. I just don't buy mutability creates more bugs.

The arguments about reasoning seem to be something about mathematics rather than intuition. This seems more to be about lack of adequate mathematics.

I would guess that bugs depend more on overall complexity than mutability. I would suggest that sticking to an immutable model which increases the complexity of the algorithm and data structure would result in more bugs. Of course there are also clear cases where immutable models are a clear win as they result in simpler cleaner code.

Python?

Python programmers write fine code.

Perhaps so, but not in Python. :)

I just don't buy mutability

I just don't buy mutability creates more bugs.

Have you never had to restart your computer or a process because it entered a bad state? Has LtU never been interrupted on you because some table crashed? It seems to me that most software (and hardware) bugs involve entering bad states or failure to keep states consistent.

Ultimately (as you mention), it's complexity that results in bugs.

However, state is fundamentally a complexity multiplier. The number of possible states and paths to reach them in a system is exponential in the number of stateful components (both explicit, like mutable variables, and implicit, like scheduling). You can mitigate complexity growth to a significant degree via encapsulation, architecture, and discipline. But you can only avoid the multiplier by avoiding state.

It turns out that some state is necessary, a natural part of the problem. But in practice we use a lot more state than necessary, resulting in more complexity than necessary, and more bugs.

The arguments about reasoning seem to be something about mathematics rather than intuition.

Human intuition is not always an effective substitute for reasoning. We humans have a lot of biases, and we have almost no intuition at all for exponential functions.

Avoiding state does preserve some simpler intuitions about how different subprograms interact, at a relatively minor cost to expressiveness. There are many ways we can seek to support our intuitions mathematically.

State of computer science.

As you point out state is necessary. A computer with no state would be useless. So rather than avoiding state like an unwanted guest at a party, we should embrace it as the definitive element of computation, as without state all you have is a filter. State should be the center of computing as the unique thing computer science has given to the world that mathematics did not have before. Rather than marginalising state it should be the central focus. Ignoring the difficult bits and pretending they don't exist was never a good solution for anything. Rather we should strive to solve the problems of mathematical reasoning that make it inadequate to reason about state.

The problems with reasoning

The problems with reasoning are merely a direct consequence of the (ever-underestimated) inherent complexity of mutable state. The only scalable way of dealing with that complexity is abstracting it away into something more well-behaved.

Scalable Abstraction

But is the abstraction really scalable? I have seen no proof that a general abstraction does scale. I have no doubt that there are application specific scalable abstractions, but I would hypothesise that there are no generally applicable scalable abstractions.

I should also point out that I am not the only person to think like this; Alexander Stepanov proposes more or less the same idea, except he is a mathematician, and probably knows a good deal more about mathematics than most computer scientists. It is interesting that the other prominent mathematician turned programmer I can think of (Donald Knuth) writes code in a stateful way.

Knives and fire are necessary

Knives and fire are necessary for cooking. That doesn't mean we should leave them exposed and strewn about the kitchen. Using more knives or fire than necessary isn't a good thing. Embracing knives or fire is a painful and ineffective exercise. Similar can be said of state in computing.

(Your "X is necessary, so embrace X" argument is not sensible.)

Nobody has suggested we ignore state or pretend it doesn't exist.

If you can come up with some simple, scalable ways to reason about exponential state spaces, more power to you. I suspect the same technique will unify P with NP.

Learn how to use tool X.

(Functional languages ignore state and pretend it does not exist, or marginalise it into a monad or 'ref'. Admittedly this view is probably overly harsh as I have a lot more experience with Haskell than ML. The irony is I prefer the way Haskell deals with things like functors and type classes due to its purity.)

A master chef knows how to use tool X. They train with it. Many safety gadgets have been proposed, and are found in amateur kitchens, but for the professional there is no substitute for skill with the original.

By the way, metaphoric arguments are weaker than direct arguments, so unless you wish to trade ever more obscure metaphors, you should find a direct argument. If you find it hard to make a direct argument, could it be that intuitions about tool X are not applicable to computing?

I would suggest the trick is to have mutable state but avoid the exponential space part by partitioning. In many ways functional languages do not remove the state, they just hide it or remove it from reasoning. As such what is the advantage for reasoning? Really there is a state machine, and you reason about the state transitions. I would suggest that very rarely do people store state they do not need.

I think the question of over use of state is a very interesting one. I guess we are talking about algorithmic use of state, rather than actual program state. I don't think algorithmic state is particularly a problem in isolation either, the problem is probably to do with side-effects and aliasing? Combining two algorithms that each internally use state should not result in an exponential increase in state space, if any increase at all (as functions the states would not exist at the same time in sequential execution).

The problem is probably also to do with the general level of precision required. Stepanov has a great example where he asks people to write a min and max function. Very few people can even write a correct min and max function first time, so what hope do we have for larger programs?

To be constructive for a moment, I think the answer might be something like mutable state inside pure boxes, like Haskell's state monad, which functionally models state, such that 'runState x' is a pure function. We could treat stateful functions and objects as if they were pure, or as pure operations on an external 'database'. I think all the type system mechanics exist for this kind of thing; it probably just needs a nice restrictive language that turns it from an idiom into a requirement, and a compiler that can compile the inner stateful code directly.

Haskell's pure state monad suffers from the compiler first transforming away the state monad into its internal functional intermediate language, which gets compiled onto the spineless tagless G-machine runtime. I would want to translate the imperative code directly to runtime-less low-level C/machine code. In other words I want to exploit the duality, and type-check it as pure code (for reasoning about it) and compile it as imperative code (for performance), in a similar way to Prolog, where you reason about it as first-order logic but it executes according to Prolog's operational semantics.
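A small OCaml sketch of the weaker half of that idea (local mutation hidden behind an observably pure interface, not the full state-monad-compiled-directly proposal; the function name is hypothetical):

  (* imperative on the inside, observably pure from the outside: the ref
     never escapes, so callers can reason about max_list as a pure function *)
  let max_list (xs : int list) : int =
    let m = ref min_int in
    List.iter (fun x -> if x > !m then m := x) xs;
    !m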

exploiting time

I've been suspecting, lately, that the attraction of imperative programming has something to do with conveniently exploiting the similarity between time as experienced by the programmer, by the compiler, by the running program, and by the world in which the running program is embedded. This may sound like a silly, trivial example, but a while back I found I needed a regexp in javascript (nasty language, but I digress) for a string with matching parentheses nested up to four levels deep. I could write a regexp for a string with no parentheses, then work out a prefix-and-suffix to embed it in that transforms it into an expression with up to one matching set, and do that four times to get what I want. But I'd end up with a humanly unreadable, effectively write-only regexp, and not only would I have to save my notes till I'd tested it and was sure I had it right, but this was going into a public setting where likely somebody else would want to increase the depth, perhaps years later, and wouldn't have my notes to work with.

So what did I do? Instead of writing this unholy monster of a regexp, I set up a string variable and a for loop to build a string representation of the regexp at runtime. With a comment that said "SINGLE POINT TO ADJUST NESTING DEPTH". Debugged it, and then anybody any time in the future who wanted to change the nesting depth would only have to change the upper bound on the for loop. No mess, no fuss. Like I said, this is sort-of a trivial example. But I found it quite interesting, because I'd deliberately pushed this process from programming time to runtime for the express purpose of making it easier to be sure the code was correct and would stay that way.
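Roughly, the construction looks like this (an OCaml sketch with PCRE-style syntax inside the strings, since the original JavaScript isn't reproduced here; `nesting_depth` and `no_parens` are my names):

  let nesting_depth = 4   (* SINGLE POINT TO ADJUST NESTING DEPTH *)
  let no_parens = "[^()]*"

  let regexp_source =
    let re = ref no_parens in
    for _i = 1 to nesting_depth do
      (* wrap the current pattern in one more level of matching parentheses *)
      re := no_parens ^ "(?:\\(" ^ !re ^ "\\)" ^ no_parens ^ ")*"
    done;
    !re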

Amusing

I find your example amusing because that's one way I design (or teach) recursive functions. I work out the closed expression for n=0,n=1,n=2, and then I abstract it into a recursive function that covers the general case. A poster example of this would be a function to enumerate (say, print) all the words of length N for a fixed alphabet; make the beginner write one loop for words of length 1, two loops, and then wonder about how to make the "number of nested loops" a variable itself.
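For instance, a minimal OCaml sketch of the general case (the names are mine):

  (* enumerate (print) all words of length n over a fixed alphabet:
     the recursive function that replaces "n nested loops" *)
  let alphabet = ['a'; 'b'; 'c']

  let rec words n prefix =
    if n = 0 then print_endline prefix
    else List.iter (fun c -> words (n - 1) (prefix ^ String.make 1 c)) alphabet

  let () = words 3 ""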

Loops/jumps are good at expressing repetitions of instructions that linearly follow each other (in time and meaning). Recursion is good at expressing repetitions of contexts/transformations that compose each other.

indeed

Heh. It does seem rather like some sort of exercise in composition, doesn't it. Though, carrying the thought back to where I came in — since my object was to code in a way that could be readily made correct and kept correct (curiously contrary to the idea that state multiplies complexity), recursion would not have served me nearly as well since a separate function would have been much more cumbersome. (For recursion to be the most lucid solution in this case, the language would presumably have to have very lightweight notation for embedded first-class function expressions.)

domain specificity

There is a level of domain specificity in every programming language or style. Even if that level is really general. When the domain which you reason about is fundamentally operational and implies state changes, it might be necessary to account for them as state changes in the language.

I read the example you gave in such a way: programming itself, as an activity, is a state change to the program. So if you want to move fore and back between programming and program, it can be necessary or helpful to make change explicit. Even if you can factor out all state eventually and end in a purely functional system.

It can be a nice constraint having to keep state out, nevertheless, of course.

Time and time again

Kind of a fascinating little example (about which I'm glad to be reminded).

programming itself, as an activity, is a state change to the program. So if you want to move fore and back between programming and program, it can be necessary or helpful to make change explicit.

Yes! In the example I was exploiting the similarity between time experienced by the programmer and time experienced by the running program. Of course, one could imagine writing it in an optimizing compiled language that would shift the regexp construction from runtime back to compile-time — itself exploiting the similarity between time for the running program and time for the compiler.

Even if you can factor out all state eventually and end in a purely functional system.

Hmm. The example was using state to provide clarity in aid of code maintenance into the indefinite future; so it's not clear one would ever want to factor it out.

It can be a nice constraint having to keep state out, nevertheless, of course.

Sometimes, perhaps... and yet, as soon as one wants to do i/o, it seems, there's a temptation to exploit the similarities between time for the external world, for the running program, and for the programmer. Rather as if time is another of those "Platonic solids" the other thread was discussing.

interactive programming and embedding (edited)

From my experience with the more radical approaches to interactive programming (where programming is understood as integral part of the program) I suspect that there is an unexplored spectrum of possibilities of the relation between state change and general law (I find that this relation is best exposed when you can program at runtime). So time is nothing obvious here.

Hmm. The example was using state to provide clarity in aid of code maintenance into the indefinite future; so it's not clear one would ever want to factor it out.

Yes, just as much as I see each paradigm as a productive constraint, I agree very much with you that sometimes you may just not be able to factor out something like reassignment or explicit change.

In this context, it might be interesting to think about how languages are embedded in each other. For example, I usually work in a language that embeds stateless sublanguages in a stateful language. But in Haskell, it is done the other way round. With reference to your work on hygiene, I'd be curious to understand better how to organize the transitions between them.

cross-paradigm linkage

In this context, it might be interesting to think about how languages are embedded in each other. For example, I usually work in a language that embeds stateless sublanguages in a stateful language. But in Haskell, it is done the other way round. With reference to your work on hygiene, I'd be curious to understand better how to organize the transitions between them.

This thought resonates with themes I've tinkered with, going way back. There's the transition between different levels in a syntax tree —cf. RAGs— and the transition between languages induced by any declaration in a program —cf. abstraction theory— but there's something about the linkage between languages of different paradigms that it seems neither of those quite captures.

how to tell the boundary?

There are a number of questions involved, even when the syntactical rules of a host language are not changed by an extension. E.g. in Haskell, a change in signature changes the binding of functions. In Scheme or Smalltalk, the basic constituents are already interpreters, so embedding is rather fine-grained.

When "only" the semantics of the embedded language differ, in how far the semantics of the host language are also changed? And how to tell the boundary between them? This difficulty seems directly related to the smootness conjecture.

In the initial example of interactive programming, an extension of a language can even happen at runtime. So parts of an already running program are retroactively reinterpreted by code that is added or edited.

Somehow this all falls into an intermediate space between calculus/algorithm (branching in) and grammar/algebra (branching out).

Draft

In this context, it might be interesting to think about how languages are embedded in each other.

We have an unpublished draft about "multi-language programming systems" and how to state properties about such systems that I was interested in (the ability to guarantee that, if you only know some of the languages involved, your mental model is not broken by the addition of the other languages): FabULous Interoperability for ML and a Linear Language, Gabriel Scherer, Max New, Nicholas Rioux and Amal Ahmed, 2016. It's quite different from John's approach I suppose (in particular, we use type systems to limit interactions and prevent the abstraction-breaking ones).

(Another interesting line of related work is the work on fine-grained language composition by Laurence Tratt, Lukas Diekmann, Carl Friedrich Bolz and their co-authors.)

The direct argument made by

The direct argument made by dmbarbour is that state is a complexity multiplier. If functional languages manage to remove that from reasoning there is one multiplier less to reason about. Your argument is that you are not prepared to pay the cost of language support because you can already carefully partition your program to avoid the blowup in complexity. Is the same true for all your colleagues? Do you never have a bad day where you make a mistake?

I'm not sure I understand the last paragraph, but it seems to conflate language design and compiler implementation issues. The STG machine is an artefact of the Haskell implementation you are looking at rather than something fundamental that is required for compiling monadic code. A compiler for a monadic version of ML would not suffer from that particular problem.

Language Constraints

If I write a library the language constraints/axioms should prevent users from breaking the assumptions, so once my library is written nothing outside the library will 'multiply' the complexity. Where is the exponential blowup? A validated library is validated.

I think there are too many variables to make simple conclusions about mutability, for example I think strong type systems with parametric polymorphism and first class functions contribute significantly to correctness. My intuition is that they contribute more to correctness than immutability.

As for the STG thing, you are correct that ML would not have that problem, but it is still significantly slower than C. It should be possible to match 'C' with an imperative mini-language (or do better with control over aliasing) that has a pure interpretation using a state monad.

validation and complexity

You seem to deeply misunderstand what it means for state to be a complexity multiplier.

Let's say your library encapsulates and uses state - e.g. ten mutable boolean flags. You've validated your library independently. Yet, the client of your library must now be aware of those flags (and, generally, the state of your library) in order to use your library correctly. Of course, your client also wants to use another library, which might have its own states to track. (Without loss of generality, we could substitute 'objects' if you'd prefer not to allow global state at the library layer.) The state space and resulting complexity burden for using the two different stateful libraries or objects will effectively multiply.

This burden is on the client, not on the developer of each individual library. Of course, as a library developer, you'll be a client of yet other libraries. So, in the general case, you'll also experience this complexity multiplier.

The fact that your library is validated is not relevant except for isolating errors and casting blame. And casting blame on human developers for failing to keep up with an exponential complexity burden arising from use of state is not the right answer. Even machines can't keep up.

Validation only proves your programs meet a specification. It does not prove that said specification is simple, sane, minimally stateful, etc..

OTOH, if you're assuming a library of pure functions, such that internal state is never exposed to your client, then you misunderstand what it means for a software component to be stateful. A pure function is, by nature, not stateful regardless of how it is implemented. Of course, building pure functions from stateful concepts suggests you aren't building them out of pure functions. Thus, your ability to reuse and abstract code is either greatly limited... or you're reusing and abstracting code at the stateful-concept layer, at which point you're back to stateful reasoning and the resulting complexity multipliers.

Misdirection.

I think "state is a complexity multiplier" is a distraction. The state is there whether we like it or not, and the functional program has to represent the same state as the imperative program in order to implement the required functionality. We have to deal with it, and this kind of state is in the worst case exponential.

So the difference between the imperative program and the functional program is not in the essential state, which is the same for both, but in the stateful computation of algorithms, for example computing max:

_max x []         = x
_max x (y : rest) = let m = if x > y then x else y in _max m rest
max = _max 0

vs

max(in list) = {
    mutable m := 0;
    foreach x in list {
        if (x > m) then m := x;
    }
    return m;
}

But if, as we have said, the 'interface' of 'max' looks pure from the outside, then it may as well be pure. If we insist that function arguments are read-only and that return values are read-only, copies, or moved, then we can enforce that 'max' looks pure.

Value semantics deal with a lot of the problems, so that a := b, where b is a list, will create a copy.

So with this in place, the algorithmic state does not really contribute to the global state and would not result in an exponential increase in the state space.

not just essential vs. algorithmic state

What makes you assume those ten variables I mentioned were all essential state? Contrary to your position, there are many sources of non-essential state other than just "algorithmic state". Polling, explicit caching, stateful views, observers and callbacks, dirty flags, delayed initialization, mixing IO with calculations, and various other imperative patterns contribute many sources of non-essential state to imperative code.

Regarding your particular language idea:

You argue for a highly specialized and constrained use of imperative code, e.g. similar to Haskell's ST monad, but local to each function. However, so far, you're using only toy examples.

If you move beyond toy examples, you'll eventually be tempted to abstract that imperative code, e.g. passing mutables as arguments, capturing them within structures, building imperative libraries and object models. OTOH, if you resist temptation, you'll be left with a language that neither abstracts nor decomposes, where you're forced to copy and paste ever larger segments of imperative code.

You'll ultimately be better off allowing developers to abstract the imperative code. Even in Haskell, nearly every distinct `State` monad gets its own small problem-specific library of utility functions, and there are many utilities for IO. The cost will be shifting back to imperative reasoning and patterns, potentially for large imperative subprograms and architectures.

Not Sure.

Just because you have mutable state does not mean you have to use it everywhere. In fact I am not even sure mutable state and imperative programming are the same thing. We could have an imperative language with immutable variables. How would adding mutability to this be different from the 'ref's already in ML?

I don't see a problem with passing mutable arguments, providing assignment is by copying.

I dealt with structures in my response to your other post.

I also disagree with your statement about lack of abstraction composition. Imagine you have a library of low level imperative parametrically polymorphic components and you just write a logic program to express the computation. You can then replace parts of the logic program with higher-level components where performance is a problem.

assignment calculus

We could have an imperative language with immutable variables.

It is my impression that assignment is central to the imperative concept. Indeed, someone even developed an assignment calculus as an analog to the lambda calculus for purely imperative programming.

providing assignment is by copying

Assignment by copying/moving will work. My own language operates similarly - e.g. the increment word ('inc') will remove a number from the stack, add one to it, and put the result back on the stack. While this is implemented as a short sequence of purely functional bytecodes (r#1+l), the language manages a very imperative feel modulo aliasing.

But I'm curious as to whether you'll be able to achieve your performance goals under this 'assignment is by copying' and your earlier 'function arguments are read-only' constraints. My language certainly couldn't implement the imperative singly-linked list with O(1) append, which relies on aliasing (the last node in the list has two references to it). Can you do your singly linked list example? or your favorite union-find?

I urge you to try your ideas with more sophisticated examples than 'max' or simple folds. Your earlier functional max, by someone who knows fold, would simply be: max = foldl Number.max 0. It gains nothing from your algorithmic state.

Performance and Predictability

It gains performance and a predictable memory usage.

Imperative languages without mutable variables

ML is an obvious example, and it's common for Scheme compilers to convert mutable variables to immutable variables holding reference cells ("boxes"). Basically a mutable variable is a second-class reference cell.
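
As a rough illustration of that box transformation (a Haskell analogue with made-up names; the Scheme compilers mentioned do this in their own intermediate representation):

import Data.IORef

-- A "mutable variable" desugared into an immutable binding that holds a box.
-- The binding 'counter' never changes; only the contents of the cell do.
bumpTwice :: IO Int
bumpTwice = do
  counter <- newIORef (0 :: Int)   -- immutable binding to a reference cell
  modifyIORef counter (+ 1)        -- plays the role of "counter := counter + 1"
  modifyIORef counter (+ 1)
  readIORef counter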

Relevance?

I don't see that the specific representation of shared mutable state is significant to any of the discussion in this thread, whether it be a language variable, an aliased actor modeling a 'cell', or a filesystem read/write.

the dose makes the poison

By the way, metaphoric arguments are weaker than direct arguments, so unless you wish to trade ever more obscure metaphors, you should find a direct argument.

I wasn't making a metaphoric argument so much as demonstrating that "X is necessary, so embrace (centralize, pervasively use, etc.) X" is an invalid form of argument. It comes up pretty often.

IMO, a more appropriate adage and metaphor is "the dose makes the poison". Using more state than necessary is less than beneficial.

I would suggest that very rarely do people store state they do not need.

That doesn't match my experience at all. I expect somewhere between 1% and 10% of state in most applications, services, and systems is essential.

Much unnecessary state comes from accumulating events and statefully maintaining 'views' of data - often replicated many times in a system - when functional transforms of canonical views would work well. Other unnecessary state comes from laziness - e.g. 10MB save files in video games (zip the whole world data) when about 100kB would do if you just saved the necessary player data and nearby hostiles and such.

functional languages ignore state and pretend it does not exist, or marginalise it into a monad or 'ref'

Minimizing state and modeling it explicitly (e.g. with monads or references) is not the same as marginalizing it. Just the opposite: making state explicit allows us to more precisely think about and express state whenever it is necessary to do so.

I would suggest the trick is to have mutable state but avoid the exponential space part by partitioning.

If you have a thousand mutable booleans, the state space is exactly of size 2^1000 regardless of how you partition those booleans. Perhaps you can restrict some irrelevant combinations and interactions. In that case, the effective state space might be closer to 1.2^1000. But it's still exponential. State is fundamentally exponential in complexity (not just in combinations; also in the permutations of events and intermediate states needed to reach a given state). I doubt any "trick" you can come up with will avoid this.

I think the question of over use of state is a very interesting one. I guess we are talking about algorithmic use of state, rather than actual program state.

I'm talking mostly about actual program and system state. If we pretend pure algorithms could execute instantaneously, there'd still be state to account for user inputs, sensory data, displays, actuator outputs, simulation, etc..

Algorithms can be stateful to the extent they are applied along a temporal dimension. For example, real-time stream processing of YUV data to MPEG is fundamentally stateful regardless of whether you model this functionally or imperatively. Fortunately, the state for an MPEG encoder is relatively short-lived, e.g. up to the next I-frame, which makes it more resilient to the sort of flaws stateful systems tend to have.

A major reason stateful programming bugs are problematic involves long-lived state; the ability for state errors to propagate, grow, and taint systems over time; the resulting temporal distance between the source of an error and its symptom.

Combining two algorithms that each internally use state should not result in an exponential increase in state space, if any increase at all (as functions the states would not exist at the same time in sequential execution).

Indeed. For algorithms that don't have a temporal aspect, this is the case.

I think the answer might be something like mutable state inside pure boxes, like Haskell's state monad

You should check out the ST monad. ST allows mutable references scoped to a pure function. Of course, given dependent types, you could model ST using State (modulo performance).
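
A minimal sketch of what that looks like in Haskell (illustrative only): the mutable reference is confined to the computation, so the function as a whole stays pure:

import Control.Monad.ST
import Data.STRef

-- A pure function implemented with local mutable state via the ST monad.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0                       -- mutable accumulator, scoped to runST
  mapM_ (\x -> modifySTRef' acc (+ x)) xs -- imperative-style loop
  readSTRef acc                           -- the reference never escapes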

We could treat stateful functions and objects as if they were pure, or pure operations on an external 'database'.

I assume by 'stateful functions' you refer to pure functions implemented using mutable references and such under the hood. I wouldn't consider those stateful. From an external perspective, they aren't stateful. My main concern with your proposed approach regards lack of compositionality. Your functions aren't constructed of other functions, but instead of stateful concepts.

I do like the idea of focusing on pure behaviors and operations with externalized database-like state.

I want to exploit the duality, and type check it as pure code (for reasoning about it) and compile it as imperative code (for performance)

A lazy language isn't the right place to start, if that's your goal. But Haskell's ST monad does seem relevant.

Many small doses increase tolerance.

When I said people do not tend to _store_ unnecessary state, the emphasis was on store. Applications don't tend to store unnecessary state in files and databases. Generally people know what information they need to persist, and store it appropriately. I think state living longer than necessary is a genuine problem.

Functional languages make it hard to work with state. Making something difficult is not the same as minimising it. Making something difficult makes hard-to-debug problems more likely. I think we can have a language that makes state easy to work with, but still controls it appropriately.

I disagree with the 2^N exponential argument. Very few problems truly have exponential structure, as can be evidenced by SAT solvers. Although SAT is NP-complete, many useful real-world problems do not exhibit the worst-case complexity. This is a fundamental problem with 'O' notation: for many problems average- or best-case complexity is more appropriate. Also, with state encapsulation the state spaces cannot interact, so it is not as simple as you make out.

The ST monad from Haskell is precisely what I was thinking about when commenting that you can have a pure functional implementation of an imperative algorithm.

To address your concerns about function composition, if they are externally pure, they can of course be composed. I guess you are talking about 'decomposing' functions? There is nothing stopping declarative implementations being available. In fact I would suggest you would start with a simple declarative implementation of all the functionality (done naively so as to be implemented in the obvious way), and you would optimise by replacing with low-level code. In this way you would start with a program specification, and gradually enhance the performance. I am thinking of a logic language as the specification meta-language.

store

When I said people do not tend to _store_ unnecessary state, the emphasis was on store. Applications don't tend to store unnecessary state in files and databases.

I understood this. And I believe you're overly optimistic.

Very few problems truly have exponential structure, as can be evidenced by SAT solvers.

True. And we have corresponding structured-state in the form of architectures and such. OTOH, even SAT solvers can easily be stymied if we express our problems statefully, e.g. with temporal logic or staging or other sophisticated relationships. Use of non-essential state can easily hinder reasoning about a problem.

State is fundamentally exponential. The fact that we can find some structure is useful and helps in-the-small, but doesn't change this fundamental nature. Ultimately, we cannot reason about precise state of large stateful systems. We can focus on compositional 'laws' - e.g. safety of HTTP GET, or capability security, and so on - and use these to guide reasoning.

with state encapsulation the states spaces cannot interact

Encapsulation of state, e.g. into an OOP object, does not prevent state spaces from interacting. The client of the object must remain aware of its state, and the state of other such objects. (Encapsulating state is not the same as isolating or confining it, such that it is never externally observable.)

you would start with a simple declarative implementation of all the functionality (done naively so as to be implemented in the obvious way), and you would optimise by replacing with low-level code.

You can certainly pursue this route for pure functions. I expect some challenging design and integration aspects will arise as you begin addressing parallelism, real world state, and concurrency.

You'll also need to consider how copies of values in the functional layer are exposed to your imperative layer for performance purposes (deep copy? copy on write? etc.). Conversely, you mentioned appending two singly linked lists in a single operation. How would you preserve this property for linked lists in the functional layer? What performance pressures will push your developers to stick to just one layer or the other? (These questions aren't rhetorical, but I don't mean to engage you in a design discussion here.)

I expect the performance benefits will be rather marginal in most cases, with occasional exceptions like union-find (which I've never needed, but some people swear by). Meanwhile, there are alternative approaches to improving performance for purely functional code, e.g. accelerators for known EDSLs (cf. Haskell's Accelerate). Given a limited complexity budget for a language design, I'd favor some of the alternatives.

Large performance gains

As always, measurement is the important thing; it would have to be significantly faster to be worth it. Strangely, union-find is one of the algorithms I find myself using a lot.

As for the state space of objects, exponential is the worst case, for an object that has no intrinsic structure. It would be pointless to have an object with no structure; you might as well just store the stuff in an array. To make an object worthwhile it should have some structure, so let's take a set object as an example. The exposed state space of a set object obeys predictable mathematical axioms. It does not have exponential complexity, because order is irrelevant; in fact the complexity would be something like N (linear in the number of items in the set). Internally the set may be implemented as a binary tree, with pointers etc. The internal complexity is clearly much greater than the external, and hence validation to 'seal' the object is useful.

state space of objects,

state space of objects, exponential is worst case, for an object that has no intrinsic structure

True. And if all you had was a single object, that thinking would work pretty well. But you'll ultimately be dealing with much more than one object, much more than one structure. And even if each individual object is well behaved (modeling sets, stacks, state machines, etc.), the program as a whole tends to lack a simple lawful structure about the evolution of its state or how said state will affect program behavior.

Well, if you have capability security laws or similar for the program as a whole, those can be useful for reasoning about how state evolves and influences behavior. And there are some declarative state models with nice whole-system guarantees about idempotence, commutativity, and monotonicity. But even with nice laws, it's better (for simple reasoning) to avoid non-essential time-varying state.

Sets, lists and stacks.

I just don't experience these problems in reality. If I have a list of actions, and I put the next action in it, I know I can read the actions out in order. If I have an array of objects and I sort them in place, I now have a sorted array (especially if the sort function is proved correct).

Personally I think the problem is people constructing their own random data structures, rather than combining well-modelled components with a single abstraction (like a list, queue, deque, btree, etc). This even extends to parallelism: I have seen C++ programmers with raw locks all over their code, instead of writing one small synchronised-queue component that can be thoroughly verified with unit tests, and re-using it for inter-thread communication in the rest of the application.

However, although it does not really support my argument, I do agree that it is best to avoid non-essential state, although in my case not at the cost of performance or memory use. Many applications are essentially stateful, however, and the discussion is not about whether to keep the state but about destructive update versus object replacement. I think it is interesting that although pure functional records are replaced with a new copy each time, people think about it as just changing one value.

I think the clear alternative is to allow mutable data, (and allow it in function arguments) but make assignment a 'copy' operation (limiting data to regular objects).

I feel we're talking past

I feel we're talking past one another. I don't see how pulling actions from a set or sorting a list of objects is even relevant to any problem I was describing (or apparently failing to describe). A typical application is bigger than one well-defined data structure. One must reason not only about the selection of an action from a set, but the effects of reading/applying them in all possible orders that the set might provide them (potentially allowing for independent change in the set's implementation), and how to trace problems when something goes wrong.

In any case, it isn't my goal to talk you out of your ideas. I do agree that the constrained use of what you're calling state is unlikely to raise complexity to the same degree more conventional imperative state would.

Conversation is useful

Don't worry, I am paying attention. I don't quite understand the point you are trying to make, but I think the difference comes here: "One must reason not only about the selection of an action from a set, but the effects of reading/applying them in all possible orders". If we are talking about a non-deterministic term-rewrite system, with a set of available actions, then what you say is true, but I would avoid non-determinism if at all possible. In any case that kind of non-determinism seems to be application-specific and unavoidable when you need to use it (in writing a constraint solver, for example).

The problem is we are talking about things that are too abstract. What we would need is a concrete example of where a functional program performs better than an imperative one, with appropriate benchmark timings. I might have a go at some stuff in OCaml, as that seems pretty fast. What do you think the fastest functional language is, if I were going to try implementing some stuff? (I worry that getting comparable performance will take a lot of work with obscure data structures and optimisation hacks.)

The point he is trying to

The point he is trying to make is that reasoning about state is difficult. He's been trying to show it to you by giving concrete examples that are small enough to understand but still prove the point. Your response is some clever way to work around those issues for the particular problem at hand under a closed-world assumption where the example is the only "component" in the program. Closed-world might be a reasonable assumption even in the real world, but a single stateful component is not.

I tried to convince you by pointing out that there are many research papers published on the subject even this year, to which you responded that it is a shortcoming of the state of the art in mathy computer science. We agree that it is a shortcoming. I believe the shortcoming is because reasoning about state is inherently hard, and this was also what Andreas Rossberg hinted at.

Since we haven't been able to show it to you despite numerous attempts there is obviously some kind of miscommunication going on. Perhaps you should sit down and prove something about one of these simple examples that dmbarbour has given you? If that turns out to be easy you can try the same thing on a slightly larger example. I am confident that you will quickly feel the pain that the rest of us have already felt.

Attempts to show that functional languages are faster usually boil down to "but I could have written that program in C/Fortran/assembler as well". That argument is hard to refute so you probably need additional metrics to decide the winner.

If you specify the benchmark on the form "take an array of numbers and return a new array consisting of the elements that are the square of the numbers in the first array and then compute the sum of all elements" the functional language can easily fuse those operations and never create the intermediate array. Despite 20 or 30 years of engineering behind your C compiler it will still not fuse the intermediate array so the functional language will perform much better. You can argue this is an unfair comparison, but the fact is that fusion is much simpler to perform in a pure language. The reason the purity helps is because there are fewer things the compiler must prove before it can perform the transformation.

If you specify the benchmark on the form "compute the square of the numbers in an array" the idiomatic C program will not perform the same amount of work as the idiomatic Haskell program. You might be able to coerce your functional language compiler into generating the same C as you would have written by hand, but that will take hours or possibly days and the input program is not going to look like the idiomatic Haskell program.

Specifics.

To briefly address the array example: if you just want the sum of the squares, why would you even use an array in an imperative language? You would just add the square of each number into an accumulator:

sum2x = 0
foreach x in list {
    sum2x += x * x;
}

The request to 'prove' something about some concrete examples seems to be the heart of the problem. I am not concerned with toy problems; I have real applications with real requirements to implement. I am concerned with how the languages perform against these real requirements (and it was my mistake to use toy examples as well). I have implemented my type inference algorithm in Haskell and C++, and I prefer working with the C++ version. Trying to use graphs in Haskell to represent the type equivalence class graph was painful.

Let's look at a specific problem then: a computer program to play the game of Go using UCT tree search and Monte Carlo simulation. In C and Ada I represent the board as a union-find data structure with one cell per board location (a 19x19 board is usual). Each location contains a 'parent' pointer so that union-find can be used to find the canonical representative of any group of stones on the board. At the same time a 'next' stone pointer creates a ring, allowing enumeration of all the stones in a group. Merging stone groups is a union operation, followed by a swap of next pointers between any two stones in the groups (may as well be the canonical ones). In each cell we keep some statistics: the colour of the stone group, the pseudo-liberties of the group, and an array of the neighbouring stones' colours. We also have an array of available moves to play. The speed is critical, as we need to complete as many Monte Carlo simulations per second as possible, and this directly affects the playing strength of the software. The more simulations per second, the better it will play. I have written versions of the basic core engine in C++ and Ada, and get about the same performance from each (~147,000 simulations per second on a 2.4GHz Intel Core i7).
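
For readers unfamiliar with this representation, here is a rough sketch of the parent-pointer/next-ring idea in Haskell's ST monad. The names are mine, not the C/Ada code being discussed, and union-by-rank and the per-group statistics are omitted:

import Control.Monad.ST
import Data.Array.ST

-- Find the canonical representative of a cell, with path compression.
findRoot :: STUArray s Int Int -> Int -> ST s Int
findRoot parent i = do
  p <- readArray parent i
  if p == i
    then return i
    else do
      r <- findRoot parent p
      writeArray parent i r        -- path compression
      return r

-- Merge two groups: union the parent structure, then splice the 'next'
-- rings by swapping the next pointers of the two canonical cells.
mergeGroups :: STUArray s Int Int -> STUArray s Int Int -> Int -> Int -> ST s ()
mergeGroups parent next a b = do
  ra <- findRoot parent a
  rb <- findRoot parent b
  if ra == rb
    then return ()
    else do
      writeArray parent rb ra      -- union (no union-by-rank, for brevity)
      na <- readArray next ra
      nb <- readArray next rb
      writeArray next ra nb        -- swapping the two next pointers splices
      writeArray next rb na        -- the two rings into one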

Now I won't say the reasoning about the state was easy, but the problems seem just as hard in a functional language. I wouldn't expect it to be any easier, and it could be harder due to the complex data structures you may need to avoid destructive update. I have enough experience with both functional and imperative programming to be able to have an opinion about relative difficulty here. The board has an API, and it needs to store the current play-state to be able to function as a board. So if we had two implementations of the board, one functional and one imperative, they would both have the same external complexity, and the same state space, as they can represent the same game states. Internally the imperative one may be more complex, and use pointers, but once it has been proved correct (via formal verification or unit testing) it can be treated as a sealed box, and has no more state-space complexity than the functional one. We can then proceed to implement the algorithms that play on the boards without ever 'opening' them to the rest of the code. I don't see what is hard to grasp about this. I have a lot of experience with application development, and the architecture can always be split into independently verifiable components with APIs. These APIs are driven by application-domain (or business-process) requirements, and so would be the same in a functional or imperative implementation.

Let's say I wanted to re-implement the Monte Carlo engine in a functional language for comparison. Which language would you recommend for the best performance, and what immutable data structures would you recommend to replace the union-find and cyclic linked lists that are used to keep track of the groups of stones on the board?

Edit: I did not choose this example because it uses union-find; it's a strange coincidence that a lot of the problems I am interested in seem to involve union-find. A hypothesis might be that union-find is fundamental to a class of "interesting" algorithms... I don't really know; suffice it to say that being able to implement union-find simply and efficiently is a key requirement of any language used to solve these kinds of problems.

I am not concerned with some

I am not concerned with some toy problems, I have real applications with real requirements to implement.

Most of us have real applications with real requirements to implement. The reason we keep returning to the toy examples is that they capture the essence of some aspect we're trying to convey.

I don't think I will be very constructive in any further replies on this subject. If you don't like working on toy examples I'd like to encourage you to sit down and prove something about your real application.

Model of Computation

Let's try and return to the original point, as I think I have got side-tracked. The original point was that a functional language may not be the right model for expressing some computations, like for example a list append, because it turns out the naive implementation is inefficient.

It is not my obligation to prove the above, because it should be obvious from the original statement that the naive functional implementation is inefficient, and there would not be ongoing research into trying to make the efficient version more elegant if this were not the case. It is trivial to see the imperative version is efficient, hence I have nothing to prove.

The objections to do with reasoning about code and state space have nothing to do with the applicability of the programming model to the problem, and miss the point. They are a general attack on imperative programming and do not specifically address the point being made. I should not have got side-tracked into a general defence of imperative programming here.

evolution of a programmer

Keean, i suspect most people agree with you up to a point (and even maybe thought the same thoughts), at which time they say, "yes, but!!!" and the stuff following that is what renders the initial world view no longer correct. I mostly do software maintenance and i've seen most bugs come from crappy shared mutable (often not even locked, hah!) state. :-(

* yes mutation (list append example) can be more efficient. that's why functional languages under the covers do mutation. the point is to make sure mutation is used only in the right ways. look at clojure's transients.

* yes simple single threaded programs can mutate and get away with it. bad habit to form, however, in the long run.

* as soon as there is any form of concurrency what-so-ever then you are utterly fubar'd if you have shared mutable state w/ or w/out locking.

* one should keep "imperative" clearly separated from "mutation".

* there's a balance to watch out for between imperative and others, more declarative approaches. personal choices, subjective, nurture and nature play roles in it all. but on the whole i think more declarative is a better default than more imperative.

The wheel turns.

I have thought the same way as most people in the past :-) I've been doing this for a while now. 10 years ago I thought Haskell was the future, but listening to Stepanov changed my mind. "We like bits ... We like real processors ... Etc.". If you haven't watched Stepanov's talk at Stanford about "Elements of Programming", I highly recommend it:

https://www.youtube.com/watch?v=Ih9gpJga4Vc

If you don't have time to watch the whole thing, watch from 15:00 to 20:00 (bits and cpus) and 44:00 to 49:00 (memory).

On parallelism, this assumes shared state and threading is the only way to do parallelism. It's not, and the most parallel machines are shared-nothing. SMP does not scale.

On mutation, I think proving correctness and modularity can help here. The problems with software maintenance occur for many reasons, and the languages you refer to have many faults. To pin it all on imperativity or mutability seems unjustified.

I am a big fan of declarative programming, although recently I have favoured the logic programming approach; it's more declarative than functional. It also integrates well with type theory and type systems, which is why I'm planning an imperative language with a logic meta-language. The logic program will assemble imperative components, replacing the functionality of C++ templates, Ada generics and Haskell type classes (which are all really the same thing).

MORE VIOLENT AGREEMENT!

agreed that even the languages that help deal with crappy bugs due to mutation are not in and of themselves the be all end all. i doubt anybody would claim they are. well, ok, there are always too many kool aid drinking apologists for any language, but i hope people on LtU are wiser than that.

please note that concurrency and parallelism are not the same thing and that most people actually have to worry about the former more often than the latter. (that's a fact in the vein of, "Aw, you can come up with statistics to prove anything, Kent. Forfty percent of all people know that.")

so i dare say nobody is trying to pin it ALL on any single thing. BUT mutation the way it is currently usually unconstrained IS a BIG source of bugs. thank goodness for things like clojure doing experiments with transients etc.

have you looked at shen? i'm ignorant but just based on pattern matching words you used it might be of related interest.

Hypothesis: Update more common than copy

Here's a straw-man: The majority of algorithms (I encounter) update structures more than they copy them (especially if we allow a copy-less swap, which can emulate move when we don't care about the value of one side after the move). Essentially copy is only really needed in the case of backtracking, so a lot of algorithms don't even need it. If we have a record or container of 'values', we want to update the values in the container; this is often the whole point of the algorithm. The idea that we want to keep a reference to the unsorted copy of an array, for example, is a misunderstanding of time in algorithms. We move forward in time: we want to go from unsorted to sorted, and we want to work with the sorted version after sorting. In many (the majority of?) cases holding a copy of the unsorted array is a mistake.

parallelism, concurrency, and copies

If you assume a single-threaded algorithm with exclusive access to data, then I expect your observations hold true and copies are less frequent than updates. Even in functional programming, it is not uncommon to break one data structure down while building another up, discarding access to the older structure.

OTOH, as you introduce parallelism to your algorithms, then you'll often find each thread needs to read the same data structures. Similarly, if you step back from the individual algorithm, you'll often encounter cases where the same data is processed by many algorithms then somehow combined - e.g. one-to-many broadcast patterns. And with concurrency in this higher layer, you might want MVCC where readers are processing an old version/frame of the data structure even while the writer is updating it to build the next version/frame.

(There are many more scenarios - e.g. memoization, caching, incremental computing, backtracking - where copies might be pretty common.)

Modulo MVCC and a few other exceptions, you can cover many cases with just the following two patterns:

  1. your algorithm has exclusive access to data, and may update it
  2. your algorithm has shared access to data, but will only read it

If you address both of these without copies, you'll have most of your bases covered, and you can always make a copy or use some awkward hacks (e.g. modeling update logs) for the exceptions. You might accomplish this by having some way to "freeze" an object, e.g. going from an exclusive read-write pointer to a shared read-only pointer. (There might also be a reverse direction, albeit potentially with runtime failure - e.g. casting a shared pointer back into a unique pointer, but failing if the reference count is greater than one.)
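
A small sketch of that freeze idea, here using Haskell's vector package (my choice of library, purely for illustration); the point is only the exclusive-then-shared life cycle:

import Control.Monad.ST
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

-- Build under exclusive write access (pattern 1), then freeze into a
-- shared, read-only value (pattern 2).
squaresUpTo :: Int -> V.Vector Int
squaresUpTo n = runST $ do
  mv <- MV.new n
  mapM_ (\i -> MV.write mv i (i * i)) [0 .. n - 1]
  V.freeze mv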

At least one programming language, Plaid (developed by Jonathan Aldrich), uses a similar technique - cf. Concurrency by Default, albeit allowing manipulation of shared objects within an 'atomic' context.

...

I find the above to be a lot more sophistication than I want in my language design. Using persistent data structures everywhere, even when I don't need the persistence, is nice and simple. And performance is adequate if not optimal.

Besides that, MVCC and incremental computing are common in the paradigm I'm developing and the architectures I favor. Persistent data structures are often a good fit for performance of the system-as-a-whole, even if not optimal for the individual algorithms in isolation.

Parallel and Sequential

I have two possible approaches to parallelism. One would be more suited to a functional language, and that is using software transactional memory everywhere, but the added runtime would spoil the point of having a low-level imperative language. The other is to not use threads at all, but to favour a vector approach combined with low-overhead message passing.

If performance is not a priority, then I totally agree with your approach - I don't believe one-size-fits-all.

My target is performance critical coding - where every cycle counts, but I don't think such coding has to look very low-level, and it can have elegant abstractions.

purely functional finger trees vs. mutable linked lists

I don't see how finger trees beat a singly linked list, providing you have a tail pointer in the list 'head'. Append and split are O(1)?

Splitting a singly linked list in half is O(N), e.g. you iterate until you have a pointer to the middle of the list. Splitting a finger tree in half is O(lg(N)), and has the extra advantage of being non-destructive. Same for updates.

Appending two singly linked lists is O(1) only with destructive append. Appending two finger tree sequences is O(min(lg(N),lg(M))), which is a bit worse but still very acceptable... and has the extra advantage of being non-destructive.
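
For reference, the non-destructive append and split described above, using Data.Sequence (Haskell's finger-tree-backed sequence):

import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (><))

-- Append two sequences and split the result in half, without destroying the inputs.
appendAndSplit :: Seq Int -> Seq Int -> (Seq Int, (Seq Int, Seq Int))
appendAndSplit xs ys =
  let zs = xs >< ys                                     -- O(min(log n, log m)) append
      halves = Seq.splitAt (Seq.length zs `div` 2) zs   -- O(log n) split
  in (zs, halves)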

I am not even sure pure functions excludes mutating arguments passed by reference

It doesn't, really, so long as the argument is linear (no aliases). If we use linear types, we're free to mutate arguments. It's also quite feasible to functionally model mutation, e.g. using a `Map Int a` to model doubly linked lists by first modeling the heap.
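
A tiny sketch of the 'model the heap first' idea (illustrative names of my own): a doubly linked list whose node 'addresses' are keys into a Map:

import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- A heap of doubly linked nodes, modelled purely as a Map from addresses to nodes.
data Node a = Node { prevAddr :: Maybe Int, payload :: a, nextAddr :: Maybe Int }

type Heap a = Map Int (Node a)

-- "Mutating" a node's next pointer is just updating the map at its address.
setNext :: Int -> Maybe Int -> Heap a -> Heap a
setNext addr n = Map.adjust (\node -> node { nextAddr = n }) addr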

I'm not quite sure where the requirement for immutable data comes from?

We cannot mutate data that might be aliased. E.g. in f(x,g(x)), observable mutation of `x` would be a problem if we tried to understand this expression in terms of mathematical functions.

I wonder about memoising the function as a graph rewrite? Effectively the value of a function can be seen as a constant graph rewrite operation recovering referential transparency?

I understand what you're trying to say. The language I'm developing essentially operates in that manner. It's a natural fit for concatenative functional programming: every function 'rewrites' the stack (or whatever is the tacit input).

But if your rewrite language has any sort of copy operation, e.g. of form `copy :: a → (a * a)`, then you'll need to choose at the implementation layer between deep-copy and aliasing under-the-hood. Aliasing is the nicer option for many performance goals.

non-destructive disadvantage

Whether being non-destructive is an advantage seems subjective. It depends on the measurement criteria. If we care about transient memory use (i.e. the number of allocations and deallocations) then it's a disadvantage. I am not sure which simple metric would score non-destructive update as better.

transient memory use

It depends on the measurement criteria. If we care about transient memory use (i.e. the number of allocations and deallocations) then [non-destructive update is] a disadvantage.

That argument is much too simplistic. Non-destructive updates in many use cases reduce transient memory use in a system by reducing need for deep-copies of structure. Besides that, they also can reduce synchronization overheads and latency, reduce long-term memory costs by sharing or interning common structure, and simplify reasoning.

for some definition of use case?

I agree it's simplistic, and there are some use cases where it (immutable lists) clearly makes sense. It all depends on which operation dominates the algorithm. That's where I like the approach of the C++ STL, where "Concepts" imply axioms that involve limits on complexity. For example, a forward_iterator guarantees an O(1) succ operation. As far as I am concerned algorithms should never directly use data structures, but should require the "concepts" such that the performance guarantees of the algorithm are met.

Hm

I find imperative linked lists (in particular the doubly-linked ones, but already the singly-linked ones) rather complex to manipulate. Implementing insertion or deletion of one element at an arbitrary position is something that is always a bit tricky; I need to draw the three involved cells and their pointers on a paper sheet before writing the code -- in practice I don't do that and I get it wrong the first time or, embarrassingly, the first few times.

I don't have this problem with ML-style algorithms over immutable lists. The non-tail-recursive versions are particularly simple and beautiful to express. But they are not the "final" version, because they are not efficient enough. They are easily rewritten into the good versions, though -- I don't think I have ever introduced a bug by making this transformation (tree algorithms are a bit more tricky as you can have two recursive calls), and I nowadays feel absolutely confident in writing the tail-recursive version directly, on my first try.
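
To make that transformation concrete, here is the usual example as a sketch: map over immutable lists, first the naive version, then the accumulate-and-reverse version:

-- Naive, non-tail-recursive: simple, but the recursion depth is the list length.
mapNaive :: (a -> b) -> [a] -> [b]
mapNaive _ []       = []
mapNaive f (x : xs) = f x : mapNaive f xs

-- Tail-recursive: build the result in reverse with an accumulator, then reverse it.
mapTail :: (a -> b) -> [a] -> [b]
mapTail f = go []
  where
    go acc []       = reverse acc
    go acc (x : xs) = go (f x : acc) xs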

I like writing naive immutable list functions; they're nice and beautiful (much as I was amazed as a math student by the construction of natural numbers using the inductive, unary representation). But as an algorithm-and-data-structure designer I am rather suspicious of them; they're easily overused for bad purposes, in ways that even lazy lists seem to partially avoid.

I think my criticism of the (lack of) simplicity of working with mutable structures only gets stronger with more complex structures. I have mostly erased from my memory whatever balanced trees people tried to teach me in an imperative style -- except maybe the CLRS heap-sort implemented on an array with that delightfully primitive (2*n, 2*n+1) indexing of children. Mutable red-black trees are a nightmare. The Okasaki version is beautiful (but he doesn't talk about deletion!).

Not that those imperative versions are un-implementable: I'm confident anyone could do it with enough time, a very clear specification, and if possible a (hand-written or machine-checked) Hoare-condition derivation of correctness to definitely weed out any remaining mistakes. That would be a thing of patience, not something done "in anger" or to bootstrap a programming course with simple examples.

Are you interested in imperative programming with pervasive mutability because of efficiency reasons? I don't know whether that is your opinion, but it is one I have considered and discarded. I'm sure we can have efficient programming languages that are functional at the core, given enough good design, type systems and program logics to allow us to *safely* write very specific subsets in lower-level variants of those languages, without rejecting the good design that comes from the mathematics. Value types when they're really needed, fine; a linear discipline for strong update, fine. We can still write beautiful programs 90% of the time.

(I'm surprised by the continuing noise of the hordes of programmers that acclaim GC-less languages (or those that "don't make it mandatory") today, in the 2010s. Sure, some kernel hackers or GUI toolkit implementers hate GCs for good domain-specific reasons, but are there so many of them around? Have people been disgusted by the abundance of mediocre GCs implemented in too-young virtual machines? Are so many programmers spending most of their effort building low-level blocks of animated user interfaces?)

Familiarity

How much of this is to do with familiarity? To be honest, when I first started with Haskell I thought it was a wonderful revelation. But through extended use I found the limitations of the abstractions, and things that should just be simple were becoming complex. With lists, you shouldn't really ever need to implement your own; you should use the STL generic implementations in C/C++. You should use the ADT API. The same should be true in any good imperative language: domain specialists can write the low-level code to manipulate lists (or other containers/data structures) efficiently. You concentrate on writing the algorithms that use them. There should be axioms that enforce the assumptions made by the libraries, enabling the compiler/type checker to make sure your code does not violate them.

As for my motivation, I write a lot of code where speed and memory are critical: things like Monte Carlo simulations, where we need fast PRNGs and the amount of data we can fit in memory is critical. Fixed-size arrays and destructive update are essential. We want to fit the largest decision tree possible in the available memory, etc.

Unconvinced

I sometimes use the "only domain specialist should write this" argument myself, about concurrent data structures implemented on top of mutable shared memory -- especially with weak shared memory models. Surely concurrent data structures are becoming just as fundamental as linked lists for many use-cases, but I'm lost at the disconnect between the post I responded to, in which you said that

It's possible to write a naive, simple and efficient list append in an imperative language, without using a data-structure you need a PhD to understand.

and your current position that

With lists, you shouldn't really ever need to implement your own; you should use the STL generic implementations in C/C++. You should use the ADT API. The same should be true in any good imperative language: domain specialists can write the low-level code to manipulate lists (or other containers/data structures) efficiently

Sure, we can ask domain experts to build our fundamental building blocks (but then what do we actually trust us mortals to write? Should we restrict ourselves to plumbing existing APIs?), but my initial request was whether there was a way to fix language semantics or approaches so that the simple, natural code that beginners would write (in a functional programming language) would align with the ultimate code you should write, and use forever, for this data structure. If your answer is to "let experts do the job" (in whichever language/paradigm is used), that doesn't really tell us a lot about language design -- except your fair point that we should have static verification tools to enforce the experts' preconditions, which I fully agree with.

Oversimplifying Intelligence?

I agree with what you say, but I am not optimistic that beginners can ever write the ultimate code. Rather, I think algorithms are in a continuous state of refinement, and they can always be improved (even experts' code will not be the final state). The problem is that you cannot write the abstractions the first time. Rather you implement the same thing over and over, and eventually realise there is an abstraction to be made. Quoting Stepanov on this point:

" I find OOP methodologically wrong. It starts with classes. It is as if mathematicians would start with axioms. You do not start with axioms - you start with proofs. Only when you have found a bunch of related proofs, can you come up with axioms. You end with axioms. The same thing is true in programming: you have to start with interesting algorithms. Only when you understand them well, can you come up with an interface that will let them work."

But of course this applies to the functional approach as well if we are discussing abstraction and interfaces.

To try and explain the apparent disconnect between the two statements you quote: I was conflating two different points. The first is that an efficient implementation of a linked list is simple, and the second is that with a good parametrically polymorphic API you should only really need to implement a linked list once. They are not contradictory, and they stand as points on their own.

I think I am beginning to be able to understand where the different positions come from on this issue. I think there is a big oversimplification of intelligence going on, assuming that all people (programmers) think the same way, and therefore that one paradigm is optimal for everyone. I am beginning to think this is an error. Why do I find writing a singly linked list (without having to draw diagrams) easy, yet you seem to find it hard? Perhaps it is because I have a high aptitude for geometric and spatial problems? (I remember a test where you had to determine the rotation of an output shaft given an input rotation and a complex set of gears.) Perhaps the problem is the assumption that all programmers share the same aptitudes and problems?

yes (at least a little bit)

i aced geometry and it was so easy i should have doubled-up with algebra, but all other non-visual, more symbolic math easily defeats me.

i do wish our systems (including our &@#$%*ing usa public educational system) was all about leveraging different ways of representing and manipulating so people could use the best parts of their brain, whoever they are, whatever it is. (see core talk cubicon although it is a joke it is sorta a nice idea in passing.) pascal genie showed linked lists visually (but really slowly in 1989).

(re: oop yeah well that's why we do things incrementally and iteratively and use refactoring and rewriting and all that jazz. and there's prototype oo so all in all i think that particular complaint about oo is not that big a deal -- there's much much much worse things to be levied against oo.)

nicer data structures

I agree with the sentiment on strict lists. They are a source of much frustration. Using finger trees, à la Data.Sequence, avoids a lot of problems. We should choose our data structures based on the properties we need.

Derivative of Data Structures

I am reminded of this paper by Conor McBride: The Derivative of a Regular Type is its Type of One-Hole Contexts.

yes!

that was in the back of my head, thank you for extracting it! i never really understood that stuff, wonder if it would make sense if i reread it again now...

Maintaining those pesky GC invariants without a write barrier

I believe that the invariants of a generational GC can be maintained during the writes necessary for tail consing without the expense of a write barrier, at some expense in implementation complexity.

The trick is to take advantage of the fact that the object undergoing mutation is guaranteed to be freshly allocated unless a GC runs. A write to a fresh object does not require a barrier, because it does not invalidate heap invariants.

To handle the case where the GC has run and the object is not fresh, in the frame descriptor of the tail-consing function we mark the mutatee with a tag indicating that it should be placed into the remembered set after other GC work is completed. This maintains the invariant that any promoted object that contains a pointer into the nursery must be in the remembered set.

This plus tail-recursion-modulo-cons should allow the "naive" definitions of append, map etc to perform well without double allocation or risk of blowing the stack. The cost is that stack scanning becomes more complicated, and frame descriptors larger.