What is a state?

Here are two versions of the factorial function written in Lisp (MuLisp).
A recursive version:

(DEFUN FACTORIAL (LAMBDA (N)
  ((ZEROP N) 1)
  (* N (FACTORIAL (SUB1 N))) ))

and an imperative version,

(DEFUN FACTORIAL (LAMBDA (N)
  (SETQ M 1)
  (LOOP
    ((ZEROP N) M)
    (SETQ M (* N M))
    (SETQ N (SUB1 N)) ) ))

The imperative version has two mutable states N and M. The way I look at it, the recursive version has a mutable state N. Is there any argument for saying that the recursive version has no state, or no mutable state?



The lack of mutation operations in the first definition should be a clue. ;-)

Slightly more rigorously, note that at all points of reference to N in the first definition, the value is the same for that invocation of FACTORIAL. N rather obviously (if you know Lisp) isn't mutated within the body of FACTORIAL. It's true that N is bound within FACTORIAL, but binding is not mutation; binding is the introduction of a new lexical scope in which an identifier is bound to a value, and if that identifier were bound to a different value in the enclosing lexical scope, that binding is said to be "shadowed" by the new binding. That's the machinery that makes it possible for the first definition to work: N keeps getting rebound to smaller and smaller values until it hits zero, at which point the stack unwinding causes the chain of multiplications to actually be performed, which would give the incorrect result if N got mutated (vs. bound) at every invocation.

So FACTORIAL remains what it has always been: a pure, albeit partial, function.

Update: I should add that the second implementation only differs in taking two arguments and using mutation internally; observationally, it's still a pure partial function. An analogy can be made here to using STRef in Haskell: that is, it's possible, in Haskell, to write a function that uses mutable state internally, but is observationally pure, and the type system guarantees it.
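The STRef point can be made concrete. Here is a minimal sketch (the name factorialST is mine) mirroring the imperative muLisp version with two local mutable cells; the type of runST guarantees the mutation cannot be observed from outside:

```haskell
import Control.Monad.ST
import Data.STRef

-- Observationally pure factorial that mutates two local references,
-- much like the imperative muLisp version. runST's rank-2 type
-- guarantees the mutable state cannot escape.
factorialST :: Integer -> Integer
factorialST n0 = runST $ do
  n <- newSTRef n0
  m <- newSTRef 1
  let loop = do
        k <- readSTRef n
        if k == 0
          then readSTRef m
          else do
            modifySTRef' m (* k)   -- (SETQ M (* N M))
            modifySTRef' n pred    -- (SETQ N (SUB1 N))
            loop
  loop
```

To a caller, factorialST is indistinguishable from the recursive definition: the type Integer -> Integer carries no trace of the internal mutation.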

Second Update: So what's the big deal? Well, at this scale, there really isn't one, but the lurking trouble is twofold: first, if you ever need to reason about the code, all of that reasoning can be done lexically given the first implementation. That's not true of the second; you need something like Floyd-Hoare logic to reason about the second. Second, in this example, the mutation in the second implementation is contained; FACTORIAL remains observationally pure. But start passing around entities that get mutated, or start mutating shared entities, and that's an enormous force multiplier for the problems.

Why doesn't the line (* N

Why doesn't the line (* N (FACTORIAL (SUB1 N))) count as a mutation? It changes the return value of FACTORIAL.
Edit: The mutation takes place when the argument N is evaluated.

It doesn't change the return value

It is not mutation when a value is assigned the first time.
In this case the return value is set only once, in the line (* N (FACTORIAL (SUB1 N))), so that does not count as a mutation.

Because the models say it isn't

The first example can be evaluated by following a lambda-calculus-like substitution model, in which an expression such as (factorial 5) can be beta-reduced by replacing N with 5 throughout the body of the factorial function. This leaves an expression such as (if (zerop 5) 1 (* 5 (factorial (sub1 5)))), in which N has been eliminated. No other machinery is needed to keep track of the values of variables in this situation. (See e.g. SICP, The Substitution Model for Procedure Application.)
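The same trace can be written out in Haskell notation (the function and the step-by-step reduction are illustrative, not from the thread):

```haskell
-- Pure recursive factorial; evaluation can be modeled by substitution alone.
factorial :: Integer -> Integer
factorial n = if n == 0 then 1 else n * factorial (n - 1)

-- Substitution-model trace of (factorial 3), with no store required:
--   factorial 3
-- = if 3 == 0 then 1 else 3 * factorial (3 - 1)
-- = 3 * factorial 2
-- = 3 * (2 * factorial 1)
-- = 3 * (2 * (1 * factorial 0))
-- = 3 * (2 * (1 * 1))
-- = 6
```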

Once assignment (SETQ) is introduced, you can no longer simply substitute identifiers with values, in general. You need to add machinery to keep track of the state of variables. In formal models, this is typically achieved via a "store", which maps locations to values, and thus contains the program's mutable state.
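As a rough sketch of that machinery (setq, deref, and factStore are hypothetical names chosen to echo the Lisp), a store can be modeled as a finite map from locations to values, with assignment as map update:

```haskell
import qualified Data.Map as Map

type Loc   = String
type Store = Map.Map Loc Integer

-- Assignment: update the store at a location.
setq :: Loc -> Integer -> Store -> Store
setq = Map.insert

-- Variable reference: look the location up in the store.
deref :: Loc -> Store -> Integer
deref loc = Map.findWithDefault (error ("unbound: " ++ loc)) loc

-- The imperative factorial as a sequence of store transitions.
factStore :: Store -> Store
factStore s
  | deref "N" s == 0 = s
  | otherwise        =
      factStore (setq "N" (deref "N" s - 1)
                  (setq "M" (deref "N" s * deref "M" s) s))
```

Starting from a store mapping N to 5 and M to 1, factStore ends with M at 120. The point is that evaluation now requires this extra component to be threaded through every step; plain substitution no longer suffices.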

To expand a little on my previous comment, the usage of "state" we're discussing corresponds to "the state of mutable variables", where mutable variables are variables for which the substitution model of evaluation fails (without additional machinery).

In this context, (* N (FACTORIAL (SUB1 N))) does not count as a mutation because no variable is mutated, and evaluation can be modeled with simple substitution. Perhaps you could construct a model in which "changing the return value of FACTORIAL" is considered a mutation (although it's not mutation of a variable), but that's not a model that is used in PL theory, afaik.

BTW, I mentioned SICP above, but it's not a good source for the more formal PL concepts here (e.g. I don't think it mentions "store" in the formal sense). A possible intro text here would be PLAI, which has a set of chapters devoted to state in the sense we're discussing. CTM is also very good on this subject.

Reasoning about programs

Anton points to the crucial issue: mutation breaks the substitution model. The reason this is important is that it is much easier to reason about programs when substitution is allowed. In general, this is why "side effects" are considered problematic.

It is much better to think about the root causes and language design goals than simply to concentrate on rules such as "state is bad" or "mutation is bad". These are only useful when we keep in mind that they are shorthands.

Dangerous games

The message I get from this is that what you are really concerned about is mutation and not state. When it is said that "there is no state", the meaning is that there is no mutation as defined by lambda calculus. The word "state" has been around a long time, long before computer science, and it has meanings outside of computer science. It is dangerous to fool around with meanings this way, and not really necessary.
Edit: A quick check of SICP and I see no mention of state in connection with the substitution model. One mention of state on page 32 is the well known definition of state.

Separation of Church and State (cf. PLAI)

This may be true, but mathematical logic and computer science put the terminology on a formal footing within the context of computation that has served as an effective boundary between the mathematical logic/computer science context and other contexts for over a century: no educated computer scientist gets confused when someone refers to a Haskell program as "stateless" or refers to the HTTP protocol as "stateless." There's a level at which talking about the relationship between binding and mutation is interesting, but it's essentially at the level of implementation of a runtime for the programming language in question; if that implementation makes you, the user of the language, care about bindings-as-state, it's a bug.

A boundary between computer

A boundary between computer science and engineering is not helpful if it hinders communication.

no educated computer scientist gets confused when someone refers to a Haskell program as "stateless"
And this may be why only computer scientists use Haskell. Oh, and what about a state monad?


Hank Thediek: And this may be why only computer scientists use Haskell.

Or folks who use Haskell eventually become computer scientists. :-)

Hank Thediek: Oh, and what about a state monad?

The resulting program might or might not be referred to as "stateless," depending upon observability, and that's the key point: there's still no confusion. A stateful Haskell program uses one or more of the state monads. A stateless one doesn't. Attempting to impose non-mathematical-logical/computer-science notions of "state" onto this model isn't useful.

Update: What I mean by "not useful" in this context is, specifically, not useful in answering the original question, "is there an argument for not calling this first definition of FACTORIAL stateful." That doesn't mean that one couldn't or shouldn't take the engineering definition of "state" and model it in Haskell—on the contrary; Haskell's laziness and clean separation of mutation from pure code would likely lend itself better to such modeling than would other paradigms, in a strong "source language" being distinct from "object language" sense. Think "Structure and Interpretation of Classical Mechanics" implemented in Haskell.

Attempting to impose

Attempting to impose non-mathematical-logical/computer-science notions of "state" onto this model isn't useful.

The notion of state that I would like to apply is chapter one of this, an up-to-date version of systems theory. Note the many examples related to programming.


Thanks for the clarification—I saw this after adding my own update to what I meant by "not useful" in the post you're responding to. Given this, you might find Venanzio Capretta's work interesting, particularly "Recursive Coalgebras from Comonads." But it's all good stuff. It also sounds like you'd appreciate what our own Frank Atanassow has been working on for some time.

Note also that one of the

Note also that one of the first things it says is that it's talking about reactive systems, not algorithms. Smells like potential scope mismatch to me, especially when talking about pure functional languages.

Page 7, item 1:

Systems may indeed run forever, but they may also terminate. An algorithm becomes a terminating system when it runs on a machine. Likewise, the two functions have a system interpretation only when they run. The system includes an interpreter (evaluator) and its environment if it is interpreted; otherwise, all of that is compiled into an executable file. An algorithm is not a very interesting system, but it is within the definition. Systems become really interesting on a grander scale. If you read on you will see that Kurz covers the basic cases.

Yeah, I find the distinction

Yeah, I find the distinction between an algorithm and its implementation on a given machine significant though. My pure Haskell code still doesn't meet the definition until I compile or interpret it, and any state is an implementation artefact.


Another idea implicit here is the notion of "analytic". Pure functional programs like Haskell are analytic in the same sense that a knowledge base is analytic. As it appears in a text file it is simply "knowledge". Knowledge in Haskell or a knowledge base in Prolog are often said to be "stateless" meaning that the order of statements is unimportant. This is also called "declarative". But as you point out one cannot execute such a thing without creating a system with state. This is another reason I prefer "analytic" to "stateless" when describing a knowledge base.

Abstract machine state

The notion of state that I would like to apply is chapter one of [Coalgebras and Modal Logic].

The referenced notion of state seems to be that of state transition systems, which is what I've been assuming (or more generally, that of abstract machines). As we've discussed before (I'd have to dig for the threads), it's fairly straightforward to map a functional PL semantic model onto an abstract machine which has the notion of state you want. See e.g. A functional correspondence between evaluators and abstract machines.

Beyond that, you have to take into account that you're dealing with different definitions in different models.

Context switches:

What I really object to is the association of the word "state" with any sort of variable mutation. If I read this correctly, there seem to be two types from an FP perspective: type one, repeated assignment and bind steps as in unwinding a stack; and type two, the bind and rebind of a variable. A program can have many type two mutations of variables that are not states. In yet other contexts, type two mutations are also called "fluents" to avoid confusion.

The word "state" has a systems theory definition as outlined in the above paper. There are usually only a few of these true states in a program. They are very important and define the behavior of the program when it runs as a system.

The habit of referring to any type two mutation as a state probably comes from the intuition that such a variable might become an "accidental state" and alter the flow of the program. This is often referred to as a "bug" and should be avoided. Conversely there are often many type two mutations, and only a few "states".

The desire to separate the "buggy" from the "pure" probably leads to the idea that functional programs are "stateless" (meaning only type one mutations) even though they have obvious behaviors. From a systems theory point of view you can't have behavior without state. Regardless of context switching I find this to be confusing and misleading.

"is there an argument for

"is there an argument for not calling this first definition of FACTORIAL stateful."
Do you mean stateless?
Yes, the one I used in the other thread. It goes like this: two systems with the same behavior are homomorphic. The two functions above have the same behavior, so it doesn't follow that one has state and the other is stateless. Both are stateful by this argument. They are both systems and they both have the same behavior.
Edit: A stateless system would have no behavior at all in systems theory.


Hank Thediek: Do you mean stateless?

Indeed I did, sorry.

Hank Thediek: Yes, the one I used in the other thread. It goes like this: two systems with the same behavior are homomorphic. The two functions above have the same behavior, so it doesn't follow that one has state and the other is stateless. Both are stateful by this argument. They are both systems and they both have the same behavior.

Quite right, and you'll note that I addressed this in terms of observational equivalence in my original reply. My claim is that treating two functions, one of which uses mutation and the other of which does not, as "systems having the same behavior" is of sharply limited utility in any context other than the one in which the mutation is strictly local. Of course, strictly local mutation can be quite handy in quite a few cases, which is precisely why STRef in Haskell is so useful. :-) But even without that, for small, simple functions, I don't think programmers should necessarily avoid mutation; mutation becomes problematic at some scale.

Hank Thediek: Edit: A stateless system would have no behavior at all in systems theory.

Yes. There's a direct analogue here to Simon Peyton-Jones' observation about Haskell's "embarrassing problem" that, if a program couldn't do I/O, then all it could do would be to make the computer get hotter for a while. :-) So clearly there's a scope at which you do, in fact, need observable state. The question at hand seems to be identifying the appropriate scope for the goal at hand: if I'm talking about reasoning about a function, then calling both of your FACTORIALs "stateful" isn't helpful, as discussed above, because the logic used to reason about each of them must be quite different. If I'm talking about observational equivalence, then it doesn't matter whether the two definitions are "stateful" in the CS sense. If I'm talking about systems theory, then they are indeed systems with the same behavior. It just seems to me that systems theory is best applied to whole programs, vs. individual functions, but conversely, algebraic reasoning probably isn't usefully applicable to whole programs—although you can certainly argue that the whole point behind Haskell's design is to achieve precisely that.


"the key point: there's still no confusion. A stateful Haskell program uses one or more of the state monads. A stateless one doesn't."

I wonder how is that any better than these:


int factless( int n ) {
	if (n > 0) return n*factless(n - 1);
	return 1;
}

int factful( int n ) {
	int i, r = 1;
	for( i = 0; i < n; i++ )
		r = r*(n-i);
	return r;
}

Specifically, what good is being stateless when to do any real work you have to interface with a stateful world? By your account above, I guess most useful Haskell programs are all stateful and chock-full of State monads and IO monads and other fancy new names for good ol' IO underneath... kinda defeats the original purpose of being purely functional and ends up being about the same as regular C: stateful for real work, stateless for short calculations here and there...

In practice there's

In practice there's comparatively little work that has to be done in the IO monad (yes, even if you're writing a web server or an IRC client) - and the thing about having state monads everywhere is that you only give each monad the state it needs to do its job, at which point you're really factoring out state that you'd be passing around manually anyway. In fact, that factoring's how most of the state monads in my code end up appearing - they're there when there really is something inherently stateful about an algorithm.

Non-IO state monads really don't have to be IO underneath - indeed State, Reader and Writer all have pure implementations in the standard libraries and an awful lot of code written using them has a pure interface as a result.
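For the curious, the pure State implementation the parent comment mentions is only a few lines. A sketch (defining a local State type rather than importing Control.Monad.State, so the names here shadow the library's):

```haskell
-- A pure state monad: a "stateful" computation is just a function
-- from an input state to a result paired with an output state.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  State g >>= k = State $ \s -> let (a, s') = g s in runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- An imperative-looking factorial over the threaded state (n, acc);
-- no actual mutation occurs anywhere, just function composition.
factState :: State (Integer, Integer) Integer
factState = do
  (n, acc) <- get
  if n == 0
    then pure acc
    else put (n - 1, acc * n) >> factState

factorial :: Integer -> Integer
factorial n = fst (runState factState (n, 1))
```

There's no IO anywhere in this: the "state" is just an extra argument and result threaded through each step, which is exactly why such code can sit behind a pure interface.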

Guesses are cheap

By your account above, I guess most useful Haskell programs are all stateful and chock-full of State monads and IO monads and other fancy new names for good ol' IO underneath... kinda defeats the original purpose of being purely functional and ends up being about the same as regular C: stateful for real work, stateless for short calculations here and there...

That would be pretty terrible, if it were even remotely true. It's not, though. But describing why is ultimately pointless - as Ehud has put it, you can't learn to swim by debating the color and shape of pools. You need direct experience. What do you need to help you take the plunge?

The Slope Isn't That Slippery

To see why the slope isn't that slippery, I recommend looking at ZipperFS and Simon Peyton-Jones' "A Taste of Haskell," part I and part II, in which he discusses xmonad, an X-Window manager written in Haskell.


SPJ definitely is a very charismatic man. He'd make you willing even to program in assembly if that were his forte. :)

That is proof of useful real-world Haskell coding. But it's not the kind of everyday, commercial application making. It'd do wonders to show ways of modelling business rules and domain objects in terms of Haskell functions and types/classes, how to assemble them together, and how to interact with real-world stateful systems...

Shifting goalposts

I can't help but feel we're being asked for more and more info here.

How would you like to model business rules and domain objects? Odds are it can be done, it's a matter of applying general techniques to build the abstractions you want. The interaction's much the same whatever real world stateful system you're interacting with.

Narrowing Focus

My point is just that you'd think that a multi-user network filesystem with commit, rollback, isolation, copy-on-write, and so on would be a good example of a stateful system. But it turns out not to be. Similarly, you'd think that xmonad, being an X-Windows manager, would be filled with state. But it still isn't. If those examples don't fall into your trap, how likely is it that many other things would?

As Anton and Philippa have pointed out, guessing that the situation is as you believe it to be, and asserting that the development tasks you have in mind are somehow different from those involved in ZipperFS and xmonad aren't very helpful. It sounds like you might want to closely examine a lot more Haskell code to see how the things you want to do are done in Haskell.

Another example

If those examples don't fall into your trap, how likely is it that many other things would?

Here's another nice example, using a zipper to eliminate bugs due to mutability in the implementation of a compiler.

It sounds like...

It sounds like you might want to closely examine a lot more Haskell code to see how the things you want to do are done in Haskell.

To me it sounds more like the classical advice to increase the size of the brain just in order to hunt a mouse isn't really socializable. I'd be more interested in knowing why someone is interested in Haskell when he actually isn't. Maybe it's just the social pressure to have an opinion about everything, or even "smartness narcissism"? More questions than answers.

Humans are curious by

Humans are curious by nature, and Haskell and its paradigms are highly regarded. It certainly is worth investigating, and I'm just wondering how to fit it into everyday tasks in a typical commercial setting.

But thanks everyone for the links. I'll try to appreciate the xmonad source code more closely...

It's a great question

I also recommend hanging out on the #haskell IRC channel and asking questions. They're very good at the support thing. You might also want to consider playing with WASH, since web applications are pretty "real world" these days. :-)

In any case, I didn't mean to imply that curiosity or even skepticism were bad. I'm glad to hear that you're willing to get your hands dirty to answer your questions. Time is a precious thing to invest, but I think the value of it will become apparent—even if you never write a Haskell program in anger.

Motivation for Suggestion

Now might be a good time to remind folks that I'm not a Haskell programmer, so there's an important sense in which I don't have a dog in this fight. I just find it odd that some folks feel perfectly comfortable making claims as to what can and can't be done, and/or how it must be done, or how different the doing of it in Haskell must (not) be from doing it in C/C++, on the basis of no actual experience. The suggestion to gain direct experience isn't meant to put anyone in a position where they're doing something they don't want to; it's to help them better inform their opinions.

I should add that this observation isn't about anyone in particular; most working programmers aren't well-informed as to how many languages other than the ones they use can help them avoid statefulness while doing real work. Consider, for example, that CTM doesn't introduce state until over 400 pages into the book, yet it has developed dozens of programs by then. A large number of people, for reasons that are unclear to me, don't want to believe that, and will argue that the stateless code can't possibly be "realistic." But they won't work through the book or try to build anything they might be interested in with the language, making their claims, effectively, superstitious.

One more try

The message I get from this is that what you really are concerned about is mutation and not state.

It's both. Part of the problem here is that you're using "state" as a shorthand for something like "abstract machine state". The term is used in this way in a variety of contexts, but it's no more universal than the usage we're discussing.

Similarly, I mentioned elsewhere that you can think of "state" in the sense we're discussing as a shorthand for "mutable state", but that was to help explain the term, not to justify why the term "state" is used at all. For that justification, you should also consider the way the term is used in contexts such as e.g. a stateless server, as in the HTTP example Paul mentioned.

The issue in that case, as Paul alluded to, is observability: an operation is said to be stateful if you can observe that its state has changed, e.g. by repeating the same operation with the same arguments but getting different results. Such an operation depends on state which is not provided as part of invoking the operation. If you take such an operation and fix the state on which it depends, i.e. make it constant, then the operation becomes stateless. This is equivalent to substituting the state into the definition of the operation, exactly as with LC substitution.
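The observability criterion can be sketched concretely (mkCounter and countFixed are made-up names): repeating the stateful operation with the same argument yields different results, while passing the hidden state in explicitly yields a pure, stateless function.

```haskell
import Data.IORef

-- Stateful: the result depends on a hidden, mutable counter,
-- so repeating the call with the same argument gives new answers.
mkCounter :: IO (Int -> IO Int)
mkCounter = do
  ref <- newIORef 0
  pure $ \step -> do
    modifyIORef' ref (+ step)
    readIORef ref

-- Stateless: the same dependency, but with the state supplied
-- explicitly (i.e. "substituted in"), so equal inputs always
-- give equal outputs.
countFixed :: Int -> Int -> Int
countFixed fixedState step = fixedState + step
```

Calling the counter twice with the same argument observably changes its answer; countFixed 0 1 is always 1, no matter how often it's evaluated.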

As has been mentioned, if you observe both of the factorial examples in the topic from the outside, they're stateless in the above sense. However, if you're discussing PL semantics, then you're interested in the evaluation of any and all program terms, not just entire functions. The semantic features that have to be introduced to model a stateful function apply equally to the dependence of individual terms on mutable state. We can say that both factorial functions are stateless in one sense, but that the definition of the second one relies on state internally, which becomes clear when looking at formal models. This is observable depending on what level you're examining, and when it's observable, we say it's stateful.

Clarifying the meaning of terms like this can be interesting, but some of the rhetoric in this thread is a bit over the top. Please read points 6(b) and 14 of the LtU policies. A basic lesson driven home by PL theory, or study of formal systems, is that nothing has any meaning outside of a context which gives it meaning. Change the context, and the meaning can change. This applies well beyond PL theory.


It occurred to me that my "not useful" comment could easily be misinterpreted outside the context in which I intended it, so added an update, but perhaps I was too late. In any case, I appreciate both Hank's link that greatly clarified his point, and your executive summary.


I agreed with your "not useful" point, actually, since it goes to the issue of applying definitions in the wrong context.

context frames value

Anton van Straaten: One of the most basic lessons of PL theory is that nothing has any meaning outside of a context which gives it meaning. Change the context, and the meaning can change. This applies well beyond PL theory.

Yes, far beyond PL theory. I like the way you said it. As near as I can tell, it's universal, rarely failing to explain more in studies where a human mind is involved (which seems to be most of them :-).

In the 90's for years I ended online messages with this signature: "All values have meaning only against the context of set of relationships." (That's what I'd finally concluded after more than a decade of reasoning about knowledge representation, leading down so many avenues of gnarly logical regress I hate to talk about it.)

Lately I say "context frames value" to mean about the same thing. [cf]

and beyond

Yes, far beyond PL theory. I like the way you said it. As near as I can tell, it's universal

I'm glad someone caught the metatheoretic understatement. :)

I agree that the point is a universal one, but studying it in a formal context is a great way to understand it. As a formal concept, it goes back at least to Tarski's truth definitions, which led to model theory. PL semantics is closely related.

logic and modeling

Sorry this is offtopic, I know. I'm something of an autodidact. I didn't become interested in PL until some years after I informally studied logic systems and modeling in the early 80's, in university library stacks between one stint in college and another. (Somehow Tarski didn't do anything for me; algebra doesn't fit well with some ideas I had.)

I often meet tech industry folks less familiar with precursors of programming technology. For example, at Netscape I told Guha that RDF was just a graph representation of first order predicate logic, and he unenthusiastically agreed. After that I started seeing RDF described that way. I'm not a big fan of logic as knowledge representation. It does a poor job of capturing emergent ideas. Logic tends to assume context in givens. Actually, I don't want to talk about logic :-), so let's stipulate I'm wrong.

[OT] Tarski and such

I think Tarski's contribution was about as much philosophical as formal, and that can be appreciated separately from the logic. There are probably earlier precursors on the philosophical side (some of them ancient Greeks, no doubt)[*], I'm just not sure who to point to off the top of my head.

Classical logic is kinda old-fashioned </flamebait>. But mathematical logic is a bit different, and from there it's a small step to lambda calculus and some truly interesting formal systems, i.e. programming languages and their semantic models. (Is my prejudice showing yet?)

But really, all I was getting at is that the idea that something only has meaning when we map it to something else is embodied very precisely in these systems, and for me at least, thinking in terms of how meaning works at the formal level greatly clarifies woolier informal questions of meaning in natural language, scientific theories, etc.

[*] Edit: there's Frege and Kant, but Tarski's formalization made a big difference, adding credibility and precision. Perhaps you could say that Tarski was on stronger ground philosophically because he provided a stronger formalism. Again, formal results can have big philosophical ramifications.


Aw heck, I apologize too, but no particular transgression seems to stand out above all the rest, so I just apologize for everything, whatever.

overloaded terminology

You can't get a really complete answer to a question like that in under 3 printed pages (at least). The problem is that "state" has too many meanings -- there's no one answer.

All programs are entirely about mutable state, in one sense, regardless of what language they are written in. Computation is about applying a fixed evaluation rule to a program state in a feedback loop, so that the evaluation maps the input state to the output state and is then fed back the output state as the new input. (We're fixated on such models because CPUs are physical things, and therefore are finite state space-like regions, and therefore to be "Turing complete" we need to connect CPUs to a (theoretically) infinite memory in a feedback circuit.)

Programming languages have to somehow model the whole feedback circuit, including the memory. There is a debate in programming language design:

Should we impose a system of space-like coordinates on memory with the feedback being expressed as reads and writes to those locations? Or should we model memory as a graph of events and causality, such that the current state of memory can be regarded as an "event" which is the causal outcome of the single step rule applied to previous events?

Another way to say it: when the CPU puts new state in memory, does it take over a name previously used for different data (assignment) or does it get a new name (binding)?
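That contrast can be sketched in Haskell, using an IORef to stand in for an assignable location (the function names here are illustrative):

```haskell
import Data.IORef

-- Assignment: one named location, overwritten at each step.
sumAssign :: [Int] -> IO Int
sumAssign xs = do
  total <- newIORef 0
  mapM_ (\x -> modifyIORef' total (+ x)) xs
  readIORef total

-- Binding: each recursive call introduces a fresh binding of acc;
-- old values are shadowed out of scope, never overwritten.
sumBind :: [Int] -> Int
sumBind = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs
```

Both compute the same sum; the difference is whether intermediate results take over an existing name or each get a new one.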

The "binding" side of the debate is attractive for two reasons. First, humans (good engineers, especially) "think algebraically" -- we build small parts, then assemblies of parts, then devices, circuits, etc. "Imagine a beowulf cluster of ....," goes the joke. And the binding-based models of state change make that kind of combinatoric practice easier. Second, problem specifications for programs are often best expressed (or at least best expressible) in terms of things like basic finite-set theory (with relations, and mappings, and all of that). If we stick to a very intuitive (literally, "conservatively intuitionist") realm of math, the "binding"-based programming language solutions and the set-based specifications coincide to a very large degree -- so we can do things like write programs and write proofs about programs at the same time.

The whole time we're still writing programs that talk about how to mutate memory -- we're just expressing those mutations in a coordinate free way, without trying to assign names to space-like regions of memory.

In contrast, the "assignment-based" model of programming languages is a kind of Ptolemaic world-view in which there is a privileged coordinate system for "state". This is physically unrealistic and leads to little besides confusion when engineers start trying to compose parts into circuits or devices. Yet it has two huge advantages: (1) It fits very well with people's built-in intuitions about what it is like for a thing (e.g., a CPU) to operate in space-time. Real CPUs are connected to real memories, and the busses between them impose (of physical necessity) real space-like coordinate systems --- programming languages which directly expose this stuff to programmers allow programmers to apply their spatial intuitions directly. (2) It's comparatively cheap and easy to implement the assignment-based model as long as you aren't trying to do anything too fancy.

The curious thing is, though, that the dichotomy of assignment vs. binding is almost entirely b.s. The two are duals in a very deep way. Binding-based programs take single steps by mutating the "head form" of program state, where those single steps are permitted to pull values just outside the head form into it and push values inside the head form just beyond it. Assignment-based programs take single steps by mutating directly accessible memory and are permitted to make "almost directly accessible memory" into "directly accessible" and vice versa (by, for example, moving values in and out of an index register).

The only difference, at the end of the day, is the choice binding and assignment languages make about the names of the finite number of directly-addressable memory locations. Assembly gives locations absolute names. Fortran/C/etc. give some locations absolute names but permit relative addressing for stack variables and structure fields. Binding-only languages are like a virtual memory system, in this context, giving programmers the illusion of an infinite supply of distinct, directly accessible locations but always (in the implementation) working out a mapping from those virtual location names to real addresses.
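The naming distinction is easy to see concretely. Here is a minimal sketch, assuming Python, where rebinding a name leaves the old value untouched, while mutating a shared "location" (a one-element list standing in for a memory cell) is visible through every name for it:

```python
# Rebinding: the name 'n' is pointed at a new value; nothing is overwritten.
n = 5
n = n - 1            # 'n' now names 4; the value 5 itself was never mutated

# Assignment to a location: a one-element list stands in for a memory cell.
cell = [5]
alias = cell         # a second name for the *same* location
cell[0] = cell[0] - 1
# Both names observe the change, because the location itself was mutated.
```

Under the "binding" view a program only ever introduces fresh virtual locations like each new `n`; under the "assignment" view it reuses named cells like `cell[0]`.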

There is a third way: neither "assignment"- nor "binding"-based. For example, see the oft-mentioned Cat and Joy languages.


I agree with both

We often see the environment/state mapping distinction, with the ability to add to and remove from these mappings happening in very set ways. However, what if we tied an aspect to parameter N of the first example for observation purposes, to create a stream of N's? In what mapping does that belong?

With FRP, we often do such bindings with variables, watching changes to references etc, but it would be fairly simple to do it with other bindings and I've long suspected it would be useful in certain domains. The state relation is normally made in a specific way, but analogs make sense (though what they're called, I don't know).
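One way to make the question concrete is to instrument the recursive version so that every binding of N is pushed onto an observable stream. This is a hypothetical sketch, assuming Python, with the list `observed` playing the role of the FRP-style stream:

```python
# 'observed' collects the successive bindings of n -- a stream of N's.
observed = []

def factorial(n):
    observed.append(n)    # the "aspect": record each fresh binding of n
    return 1 if n == 0 else n * factorial(n - 1)

factorial(3)
# observed now holds [3, 2, 1, 0]: the stream of bindings exists even
# though no single variable was ever mutated inside any one invocation.
```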

If you untie the recursive loop for optimization...

...then N is certainly mutable state.

Here is another definition

Here is another definition of the factorial function:

factorial n == 1                    if n == 0
               n * factorial (n-1)  otherwise

Is there any kind of state here, by any meaning of "state"? ("Ohio" is right out.) I would think that there is not. There is also, as Hank points out, no behavior: it is a mathematical definition; it lies there, looking pretty.

The neat thing about that definition is that I can do things with it, besides computing the value of the factorial of some number. If I know that, for my purposes, n will always be 0, I can replace the use of factorial by 1. If I will need to multiply the result by 5, I can carry the multiplication into the definition, defining a new function with "1" replaced by "5". (Hmmm. How do I know I can do that?) The point being that I can manipulate that definition without evaluating it or without, heaven forbid, "thinking like a computer".
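Those manipulations can be spelled out. A sketch, assuming Python: specializing the definition for n == 0, and carrying a multiplication by 5 into the definition by replacing the base case's 1 with 5:

```python
def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)

# Specialization: if n is known always to be 0, every use of factorial(n)
# may be replaced by the constant 1.

# Carrying a multiplication inward: five_factorial(n) == 5 * factorial(n),
# obtained by replacing the base case's 1 with 5.
def five_factorial(n):
    return 5 if n == 0 else n * five_factorial(n - 1)
```

The second rewrite is justified because multiplication is associative and commutative, so the factor of 5 can be pushed all the way down to the base case -- which answers the parenthetical "How do I know I can do that?"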

The recursive definition of factorial in Lisp is, I would argue, an attempt at a direct translation of the mathematical definition into what amounts to a different notation. As a result, I can perform the same manipulations on the recursive factorial as I can on the mathematical definition. (There are problems with the translation. In particular, if I evaluate it, there will be state and behavior, and likely limitations on the emulation of mathematics by Lisp. But those are only partially relevant when I am looking at the code.)

The imperative definition of factorial is another beast entirely because that definition relies on the existence of things that take on different values at different times. (I wrote "time" there inadvertently, but appropriately. The recursive definition of factorial does not involve any particular notion of time, while building an understanding of the imperative version that does not require at least a rudimentary involvement is a challenge.)

Pure functional programming, as represented by the recursive version, is, in my view, an attempt to bring relatively simple mathematical transparency to programs. There are other approaches, like Hoare triples, which essentially deal with the program as a text. (Me, I come from Dijkstraland.) Or, you can bring in modal logic and attempt to reason directly about programs, but your math won't even look simple anymore.


(DEFUN FACTORIAL (N)          ; muLISP: the function body is an implicit COND
  ((ZEROP N) 1)
  (* N (FACTORIAL (SUB1 N))) )

The answer is simple: with sufficiently smart tail-call optimization this version gets converted to the second one. As you can only hope that this is the case, it does have state.
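The conversion needs an accumulator first. A sketch, assuming Python: once the pending multiplication is moved into an accumulator argument, the recursive call is in tail position, and a compiler with tail-call elimination can reuse the same two slots for n and m on every step -- which is exactly the imperative version's loop:

```python
# Tail-recursive factorial: the recursive call is the last thing done,
# so its frame may overwrite the caller's; n and m become two reused slots.
def factorial(n, m=1):
    return m if n == 0 else factorial(n - 1, n * m)
```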

On the other hand, the computation and compilation, in difference to the mere definition, certainly has state... like parser position and instruction pointer.

/me feels like the guy in the hitchhiker's guide who figured out that to construct the impossible improbability drive one just has to calculate and then feed the exact improbability of an impossible improbability drive into the possible improbability device constructor, and who soon afterwards got killed by a mob of scientists who didn't like smart asses.

So on zaphod's third hand, any definition only has one state, namely that of being defined to this or that. Saying otherwise would be confusing the blueprint with the car or, worse, the race. The confusion starts when you try to use the same words for the full recursive meaning of the above definition and for one single "instance" of the recursion pattern. Aside from being infinite (the wrong kind) for negative numbers, the question arises whether (FACTORIAL 2) is two instances of that pattern or a border case of an infinite recursion pattern. Think term reduction and laziness vs. step-interpreting.

And if zaphod's third hand is my first foot: On the second foot, my definition of state says that none of the versions has state, as both functions only take one parameter, which cannot seriously be considered the state of something. If your program consists of a text entry and output area that are tied together by FACTORIAL, it certainly is state (and you should use a lazy, memoizing version).

You guys think too theoretically. *dodges blows*. The difference between theory and practice is that in theory, there is none. Analyzing the inside instead of the outside of any abstraction defies the very reasons you do abstraction for. But surely, you can use more Greek variables like that. *runs away*

Too many levels, too many definitions

The comments about transforming the recursion into a loop or rewriting in a "pure functional" style are the best examples I can offer that this discussion has mixed various definitions of (and, perhaps, emotional reactions to) the idea of "state". Much of this revolves around the fallacy of mixing levels.

At one level, both of the original definitions define "stateless" functions in the black-box sense that there's no memory or state between invocations; successive evaluations of FACTORIAL are independent of previous evaluations. As a counter-example, of course, one can offer a pseudo-random-number pseudo-function in which each invocation mutates some internal data (state) that persists between invocations.
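The counter-example is easy to exhibit. A minimal sketch, assuming Python: a linear congruential generator whose seed persists and is mutated between invocations, so successive calls with the same (empty) argument list return different results:

```python
# State that persists *between* invocations: the seed of a linear
# congruential generator (the constants are the classic glibc ones).
_seed = 42

def next_rand():
    global _seed
    _seed = (1103515245 * _seed + 12345) % 2**31   # mutate persistent state
    return _seed

# Unlike FACTORIAL, two identical invocations give different answers,
# so the hidden internal state is observable from outside the black box.
```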

Going down one level, one can argue that inside both original definitions there is state, in the sense of association between names and values, but that the first is immutable while the second is mutable. The first has immutable state because within a given evaluation the relationship between the symbol 'N' and a value is established once and never changed. The second has mutable state because within a given evaluation the relationships between 'N' and 'M' and their respective values are modified.

As an aside, one way to see that this is the case is to imagine that we are dealing with a "call by reference" implementation of the language (anathema, of course, to a purely functional mentality, but we are cracking the FACTORIAL black box open to look at its implementation!) If the caller passes a reference to something in its own environment, then the caller's "state" will have been changed after invoking the second version of FACTORIAL, but not the first.
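That thought experiment can be simulated. A sketch, assuming Python, with a one-element list standing in for a by-reference argument: after the call, the mutating version has changed the caller's cell, while the recursive version has not:

```python
# One-element lists simulate call-by-reference cells.

def factorial_rec(n_cell):
    n = n_cell[0]
    return 1 if n == 0 else n * factorial_rec([n - 1])  # fresh cell per call

def factorial_imp(n_cell):
    m = 1
    while n_cell[0] != 0:
        m = n_cell[0] * m
        n_cell[0] -= 1      # mutates the caller's cell in place
    return m

a = [3]
r1 = factorial_rec(a)    # returns 6; a is still [3]
b = [3]
r2 = factorial_imp(b)    # returns 6; b is now [0] -- caller's state changed
```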

The only way I can imagine relating the first FACTORIAL to mutable state is if I assume an implementation where there is a single FACTORIAL entity that is doing some sort of save-and-restore over 'N' when doing a sub-invocation. But that's not the only way the first version could be implemented! Suppose, for example, that each invocation (sub- or otherwise) of FACTORIAL causes the creation of a fresh entity (e.g., activation record, stack frame, etc.) which is simply destroyed on completion (or not, in the case of a really smart caching mechanism!). Then each invocation is a separate entity and its state is never modified. However, this entire paragraph is about what goes on below the level of the language in the original question; at that original language level it is inescapable that the second version expresses changes to the values associated with 'N' and 'M' within a single invocation, while the first does not.

Finally, I'd like to apply this idea of levels to the argument made partway through the discussion:

Two systems with the same behavior are homomorphic. The two functions above have the same behavior; it doesn't follow that one has state and the other is stateless. Both are stateful by this argument. They are both systems and they both have the same behavior.

There's a problem with the phrase "the same behavior". Do we mean "same" in the sense of "a caller cannot tell which one was invoked based solely on examining the returned result"? In that case, as I mentioned above, they're both stateless.

Or do we mean "same" in the sense of "the same language-level actions occur when each is invoked"? In that case, they're certainly not the same, so the whole argument is moot.

Or do we mean "same" in the sense of "the same sequence of machine-code operations are performed when each is invoked"? In that case, we have no idea. It is possible to imagine a compiler that would transform the first into something equivalent to the second before generating machine code (virtual or otherwise), but there's nothing about the language-level definitions as offered that requires (or entitles) us to make any such assumptions or imaginings.

A very large portion of the history of computing (theoretical or otherwise) is about the concept of achieving results that are in some useful sense equivalent, while differing greatly in other details. One of the most useful tools for accomplishing this is to be clear with ourselves about boundaries between logical (or implementational) levels, and to be very careful about what we allow to leak across those boundaries (including our own wandering gaze! ;-)


A very large portion of the history of computing (theoretical or otherwise) is about the concept of achieving results that are in some useful sense equivalent, while differing greatly in other details.

A very large portion of everything is about this... In fact, the very notion of equivalence (and I'd say even the notion of identity) requires that there be some difference between the objects under consideration. Relating two objects is only sensible when, at least at the level of meta-discourse, they are in some sense two objects.

But I guess this is now purely philosophical...

This might be of interest.

This might be of interest.

Monads again?

Wasn't this what category theory is about? Because Aristotelian "is a" or "is" proved itself to be too inflexible (or undefined, take your pick), it says that members of a category (functions, objects, types, whatever) can only be related sensibly by saying which categories the relations apply to, in the sense that a woman is a man and a bitch is a woman (sorry), the first one being equality of species and the second one equality of sexes.
Even identity becomes quite undefined when seen like this: is it the identity as in f x = x, or the (think like Kant for a moment) identity of things/appearances, as in "x is the same electron in the same bit in the same ram chip"?

I would say that that large portion of everything is about whether or not something constitutes nitpicking or being too imprecise, and in which way.


Because I brought up the notion of sameness of behavior, and systems theory I will try to answer your questions about "sameness" in terms of this theory. The link that I gave above provides answers to your three questions. The answers there are the same as what you can find in a classical theory of automata. In fact the entire classical theory as presented by Shields in "An Introduction to Automata Theory" can be seen as a discussion of sameness. There are 219 pages of mathematical results with theorem after theorem on equivalences of various kinds, behavioral, structural, compositions, and decompositions, and so on. The Kurz paper uses a different style of math but otherwise seems to have the same results.

Basically, behavioral equivalence is a matter of comparing output strings. If the strings are equal, the behaviors are the same. In the case of the two functions this could be the final answer or the complete string of results. This says nothing about structural equivalence, but in the example we know that the two systems are different, and this implies that a homomorphism, or nowadays simply a morphism, exists between the two systems.

Do we mean "same" in the sense of "a caller cannot tell which one was invoked based solely on examining the returned result"? In that case, as I mentioned above, they're both stateless.
I don't get the point, but the theory is experimental and we would test each system with the same input and observe the output.

Otherwise I am not qualified to defend the theory philosophically. Sorry.


I suppose I should add that there is a process of converting each function to the automata form. This is another exercise in "sameness": you need to imagine the input and output functions and the transition function appropriate for the experiment. Typically automata theory assumes that there is an output for each state transition. This is important for behavioral equivalence, or for constructing the final system (i.e., the system of all behaviors for a signature). It is remarkable that all this has been formalized so well considering how "commonsensical" it is. There are numerous ways to convert and represent the two functions: rules, input/output boxes, transition diagrams, etc.

As I see it, the automata or system forms are more basic than the code because they carry so much theory and because they are interdisciplinary. A system-theory representation works independently of implementation type. We can compare a mechanical device with an electrical device, with a discrete digital device, with a software device, or with an organic system. These are all exercises in sameness. I know that this is controversial, but you can't argue with the fact that it is done successfully all the time. Every successful design is a sameness between a theory and some tangible thing. Every successful theory is a sameness between the tangible thing and the theory.


I only had time to give a quick glance at the article you referenced, but it appears to me that Kurz describes (much more elegantly, to be sure) just what I meant by the first alternative (in his section 1.1, p.7):

1. Systems are reactive, that is, unlike algorithms, they are not supposed to terminate and announce a result, but are supposed to run, possibly forever, and to communicate, while running, with their environment.

2. The possible communications between a system and the environment are described in an interface. An external observer can observe a system only through the interface.

3. The external observer's view is called the black box view of a system. The black box view is given by the complete observable behavior of the system.

When I attempt to model in these terms either definition of FACTORIAL above, I end up with an FSA with one state--WAITING--and arcs from WAITING to WAITING (labeled as 0/1, 1/1, 2/2, 3/6, 4/24, etc.)

Of course there is the possibility of adding one more state--BROKEN--with arcs from WAITING to BROKEN (labeled as -1/, -2/, -3/, etc.) and no arcs from BROKEN at all. (And if I were Sir Tony Hoare, I would have called it BOTTOM... ;-)

In either case, my use of the term "black box" seems consistent with Kurz's in his point 3. Both definitions of FACTORIAL, viewed only externally, are described by the same (trivial) state diagram. If that's the level at which we're looking at FACTORIAL, then I honestly don't see a place for talking about 'N' and/or 'M', and would be happy to be enlightened on this point.
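For concreteness, that trivial black-box machine can be written down. A sketch, assuming Python: a transducer with states WAITING and BROKEN, where an arc labeled n/f consumes input n and emits output f:

```python
from math import factorial as fact

def step(state, n):
    """One observable transition of the black-box FACTORIAL machine."""
    if state == "BROKEN" or n < 0:
        return "BROKEN", None      # no arcs leave BROKEN; no output
    return "WAITING", fact(n)      # the self-loop arc labeled n / n!

# Both definitions of FACTORIAL, observed only externally, realize this
# same machine -- which is why the black-box view cannot tell them apart.
state, out = step("WAITING", 4)    # the arc labeled 4/24
```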

OK, I am not sure what we

OK, I am not sure what we are arguing about but I addressed 1.1, p7 above here.
Edit: I am not sure why Kurz is dancing around this issue; maybe he is afraid of arguments about statelessness? Anyway, the notion of a terminating system based on an algorithm is classical.
Edit: Also, the strength and significance of the coalgebra approach over the classical is its ability to deal with infinite streams. This may be why Kurz wants to emphasize infinite streams.

Recommended reading

There's a great thought-provoking discussion in the "What is State?" section of Alan Bawden's thesis Implementing Distributed Systems Using Linear Naming.


I thought of that too, when I read Hank's comments. Very well worth reading, and easy to get into. Of course, if people start pushing Alan's ideas here, the level of misunderstanding will only increase...

Just Boxes

For some reason this reminds me of input/output block diagrams, one of my favorite old-fashioned forms of representation. No names at all, just boxes.

Strikingly different

For a strikingly different point of view on "system" and "state" take a look at page 39, "What is a System".


Although I must admit that I'm not struck by the difference in the points of view (between Ashby and Bawden, as I assume you meant), I found Ashby's comments interesting. For some reason his discussion of single-valued predictions and finding the right variables reminds me of Robert Laughlin's A Different Universe, which I highly recommend. One of Laughlin's arguments about the limits of reductionism highlights the essential importance of complexity and the basically abstract nature of complex phenomena. It's been years since I read it, but I recall that one example is the laws of thermodynamics which, while certainly "implemented" by the low-level laws governing particle interaction, are in an important sense independent from those lower-level laws, as we are fundamentally limited in our ability to extrapolate lower-level laws from higher ones, and vice versa. (Importantly, the laws of thermodynamics were understood well before the modern theory of particle interaction, and are consistent with both the current and obsolete theories.)

I guess this is rambling a bit, but there is a salient connection. One of the key themes of this thread is, as Anton said:

A basic lesson driven home by PL theory, or study of formal systems, is that nothing has any meaning outside of a context which gives it meaning. Change the context, and the meaning can change.

Or, as Rys said, "context frames value." I think the Ashby reading makes clear that there is a fundamental relationship between context and abstraction. And I think this is very interesting to LtU, since of course abstraction is another great object of study in PL theory. Is abstraction opaque, or always translucent, so to speak? What are the properties of abstraction? Are there seams "out there" in the world, or do we carve it up according to our whims? Are there better and worse ways to carve it up?

If you want to be quite concrete about it, consider delimited control operators, which in one sense can be understood as converting between context and abstraction.


we are fundamentally limited in our ability to extrapolate lower-level laws from higher ones

Totally off topic, but if this sort of thing interests you you might find this philosophical paper about laws of functional biology interesting.


You are as usual a gold mine of great references! This paper makes exactly the point I was trying to make. I find his emphasis on invariance under counterfactuals novel and interesting, and the relationship between abstraction and context is quite clear. Of course many questions remain open...

If it's not obvious by now, "this sort of thing" interests me quite a bit... Thanks for the great link!