On the Importance of Purity

I've learned a lot from LTU over the past few years, and I'm very grateful for all the discussions and the patience. LTU has inspired me to create my own language, of which I only have an interpreter so far. To help organize my thoughts, and to spread the word about advanced programming languages to a wider audience, I've written a post explaining some poorly understood features (outside of academia or LTU at least) which I think will become increasingly important in the coming years.

On the Importance of Purity:

The benefits of advanced programming languages are sometimes difficult to grasp for everyday programmers. It can be hard to understand the features of such languages and how they relate to industrial software development, especially since the arguments are couched in terms such as "referential transparency", "totality", "side-effect-free", "monads", "non-determinism", "strong static typing", "algebraic data types", "higher-order functions", "laziness/call-by-need", and so on.

Many of these features are attributed to "pure" languages, but purity can be hard to understand in the context of everyday programming. I will explain the importance of a number of these features and how they impact the everyday programmer's life.

The explanations aren't rigorous, and perhaps the distinctions I draw even overlap somewhat in the real world, but I'm hoping it's an accessible intro for newbies; I'm still one myself after all! I won't post any more on the subject here unless others think it's sufficiently on-topic, but I welcome any corrections or suggestions to my post! If it's off-topic for LTU, please post them there instead. This concludes my self-promotional message. :-)

?

Nice article, Sandro; clearly you put a lot of time and thought into it, but I would take issue with this part:

Consider some function that is RT, but, whenever you call this function with the same parameters, it returns a completely different value! [...] Let's revisit the scenario again: some function that is RT, but, whenever you call this function with the same parameters, it returns a completely different value every time. We've all used such functions at one time or another; consider a function that returns the value of the clock, or consider the rand() function.

Those functions aren't referentially transparent. I think that you might confuse your readers by saying this, as even the definition of referential transparency you offer in your article contradicts these statements.

Hmm, yes I can see how it

Hmm, yes I can see how it can be misinterpreted. I'm really trying to say, "assume we're using an RT language; how can we include functions that return different values even with the same inputs? Such functions exist ... clock ... rand(), etc."

Is that re-wording clearer?

non-determinism too

I also cringed a bit at the use of "non-deterministic" in describing functions like rand(), which are quite deterministic, but with hidden state.

Undeterminable determinism = non-determinism

I also cringed a bit at the use of "non-deterministic" in describing functions like rand(), which are quite deterministic, but with hidden state.

I want to take up the defense of that usage. From "inside the system", hidden state is the same as non-determinism. Just because you have "looked behind the curtain" and know that "really" there is a deterministic pattern to the stream generated by repeated calls to rand() doesn't change the view from within the program, where for all intents and purposes it is non-deterministic.

(Think about whether and how you would prove that the pattern was deterministic just from the output, without knowing anything about the algorithm, or even whether it was algorithmic.)

This is the flip side of implementing a purely functional language on a stateful machine: the statefulness of the machine is irrelevant from within the language because it is not visible from within it.

My crypto is showing...

The problem is that it isn't necessarily irrelevant; ask any cryptographer. With a function like rand()[1] it's fairly trivial to determine the hidden state from observing only the output. Even cryptographically secure PRNGs can be exposed this way (given enough resources), and the reason is that the deterministic nature *is* visible, even if you only get a brief peep with each call. The P in PRNG is there for a reason.

[1] Note that I am talking about the common notion of a rand() function. Yes, as renox mentioned, it can be tied to a hardware-based RNG that's spitting out high quality bits, but that implementation just isn't common.

Determinism

Might it be said that at some level all things are deterministic? But, if I gather what Marc says, non-determinism is a Point of View characteristic.

Just as an aside, another source of non-determinism is concurrency. For example, a language like Erlang has immutable variables. But that does not mean that it does not have mutable state. Here's what I came up with recently to mimic mutable state cells in Erlang:

new_cell(X) -> spawn(fun() -> cell(X) end).    % the "cell" is a process holding X
cell(Value) ->
   receive
      {set, NewValue} -> cell(NewValue);                    % loop with the new state
      {get, Pid}      -> Pid!{return, Value}, cell(Value);  % reply, keep the state
      {dispose}       -> {}                                 % let the process die
   end.
set_cell(Cell, NewValue) -> Cell!{set, NewValue}.
get_cell(Cell) ->
   Cell!{get, self()},
   receive
      {return, Value} -> Value
   end.
dispose_cell(Cell) -> Cell!{dispose}.


Nothing particularly new there. It just helped me make the mental connection between state and concurrency.

Omniscient message passing is deterministic

It just helped me make the mental connection between state and concurrency.

Exactly, Chris: I was explicitly thinking of this as I was formulating my thoughts in this thread.

Imagine a message-passing scenario where you knew a priori what messages would be sent, in what order, and in what order they would be received: you would have de facto determinism, i.e. you could reimplement the same scenario as declarative concurrency.

It is exactly because we normally can't know this information a priori that it is non-deterministic.

Maybe and sort of

I honestly don't know the answer to your first question. I'm guessing it's still an open question if there's such a thing as non-determinism in our universe. Though we can certainly model things that appear non-deterministic in a deterministic way.

As for the second bit, I guess it depends on what you mean by point of view characteristic. At the least, it's a characteristic whose absence can be safely ignored in a lot of cases, but whose presence shouldn't be assumed in others. But that's the case for an awful lot of properties, e.g. referential transparency.

Trust In God

Might it be said that at some level all things are deterministic?

Yes, because God has foreseen it all.

God and Determinism

Yes, because God has foreseen it all.

This doesn't belong here. Take it somewhere else.

Well, where does the "state of the universe" variable come from?

I think it's actually pretty fair to use the word God in this conversation.

When trying to explain the Haskell IO monad, I've occasionally said that you can think of your program as one command in the IO monad, in a chain of such functions with an initial state passed in by God, and a result read out by Him at the end.

Given the existence of God, and a deterministic universe, this explanation makes perfect sense. Whether these conditions are met is a metaphysical or philosophical question, but from the inside of a Haskell program, you can assume these things - you can't tell the difference.
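To make that picture concrete, here is a minimal Haskell sketch of the mental model (the names are mine, and this is only the folk "world-passing" picture, not how GHC actually implements IO):

newtype World = World ()            -- stands in for "the state of the universe"
newtype Action a = Action (World -> (a, World))

-- sequencing two actions threads the one-and-only World through both,
-- which forces effects into a single, deterministic order
andThen :: Action a -> (a -> Action b) -> Action b
andThen (Action f) k =
  Action (\w0 -> let { (x, w1) = f w0; Action g = k x } in g w1)

The real IO type is abstract, but every use of >>= behaves as though it were passing this single World along, which is why, from inside the program, the initial state looks like something handed in from outside.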

Epistemology

the reason is that the deterministic nature *is* visible, even if you only get a brief peep with each call.

Yes but only if you know for sure that it is deterministic. ;-) If it is a truly random implementation, it could just be identical to some deterministic sequence up to the n you have sampled, but may suddenly diverge from that sequence at n + 1.

Determinism can be reframed in this way as a problem of knowledge: if the box is black enough, all sequences are non-deterministic, and if the box is clear enough (beyond the actual limits of knowledge), all sequences are deterministic.

Heh

I wouldn't call rand deterministic

After all, it could be implemented by reading the output of a hardware random number generator which gives true randomness.

Exactly what I was thinking

Exactly what I was thinking when I used rand() as an example. I know rand() as currently implemented in libc is not truly random, but the idea of a random number generating function is what I was using. Perhaps I should just use random() to distinguish from libc's rand() to satisfy the nitpickers. ;-)

Another thought on "determinizing"

Another solution to making rand() play nicely with RT is to augment it with a parameter: rand(): N->N.

Even if the implementation of rand() was truly random, you could memoize the results, so that rand(k), for k some particular value, always returned the same value. Then you simply need to thread a counter through to the callers of rand() to get a "random sequence".

In this case, there would be a global, RT implementation of rand().

Furthermore, if you thread the counter implicitly through the subcallers, you have a pretty good model of how Haskell monads "determinize" non-determinism within a program.
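For instance, here is a minimal Haskell sketch of that idea; rand below is a made-up pure formula (a toy linear congruential step, not something you'd want real randomness from), and the counter is threaded explicitly by the callers:

-- same k in, same value out: rand is now referentially transparent
rand :: Integer -> Integer
rand k = (1103515245 * k + 12345) `mod` 2147483648

-- callers thread the counter through and hand the next one on
threeRandoms :: Integer -> ([Integer], Integer)
threeRandoms k = ([rand k, rand (k + 1), rand (k + 2)], k + 3)

Threading that counter implicitly, so that sub-callers pick up where their caller left off, is essentially the service a state monad provides.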

Nice article -- glad to see

Nice article -- glad to see some good propaganda being written! ;-) I like how you try to pick examples from the context of a C/C++/Java programmer.

I've worked some with (sequential and parallel) pseudorandom number generators, and wrote an article (for the class I'm TAing this semester) explaining the "stream(s)" model of computation that they use, in the context of shared-memory concurrency:

http://www.cs.berkeley.edu/~mhoemmen/cs194/Tutorials/prng.pdf

I do agree with the other commenters that it's important to distinguish "nondeterminism" from "having side effects," especially in the context of pseudorandom numbers. My article argues for a "streams" model that makes this distinction while allowing the PRNG to be implemented without side effects (or with side effects, for that matter, as long as different processes don't share state, and the side effects aren't on ambient objects but only on an object which is passed explicitly by handle).

I think your shorter article could make a good start on a longer article too, if you expand out the examples into code fragments, and also expand out the strong static typing section with examples.

"In C, everything inherits from void*. Except for register variables. And bytes. And..." ;-P

Capabilities

Those of you well-versed in the security field might recognize this constraint: it is the same authority propagation constraint underlying capability security.

This is a good observation. Unfortunately, having the IO monad doesn't give you proper capability security. I think if you had a more precise effect system it could act as a capability security layer though.

The type of a computation which writes to a file and a computation which deletes a directory will need to differ - this is something Simon Peyton Jones says he would like to see, in the Haskell retrospective. See The Marriage of Effects and Monads by Wadler, too.
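A rough sketch of what that might look like, using made-up constraint classes (these are not a standard library, just an illustration of effects showing up in types):

-- each class names one kind of effect a computation may perform
class Monad m => CanWriteFile m where
  writeFileM :: FilePath -> String -> m ()

class Monad m => CanDeleteDir m where
  removeDirM :: FilePath -> m ()

-- the type now says: may write files, but can never delete a directory
saveReport :: CanWriteFile m => String -> m ()
saveReport contents = writeFileM "report.txt" contents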

The downside of strong static typing is that the analysis is necessarily conservative, meaning some legitimate programs will produce a type error even though they would not generate an error at runtime.

Since an ill-typed term doesn't have a semantics, isn't it a bit nonsensical to talk about what it would do at runtime?

This is a good observation.

This is a good observation. Unfortunately, having the IO monad doesn't give you proper capability security. I think if you had a more precise effect system it could act as a capability security layer though.

I don't see it. A blatant violation of capability security in Haskell is that the file system is an ambient authority [1] (but not the only violation unfortunately). The file, once reified as an "IO Handle", is a capability though.

Perhaps you have a different notion of capabilities?

And the point I hinted at was that complete referential transparency is at the heart of capability security, not the IO monad. Ambient authorities are antithetical to referential transparency IMO, which is a point I intentionally did not elaborate on.

[1] See "openFile", Section 7.4

And the point I hinted at

And the point I hinted at was that complete referential transparency is at the heart of capability security, not the IO monad.

Ah, ok.

I don't see it. A blatant violation of capability security in Haskell is that the file system is an ambient authority [1] (but not the only violation unfortunately).

This is what I meant - an IO action can cause any effect, it has ambient authority.

This is what I meant - an IO

This is what I meant - an IO action can cause any effect, it has ambient authority.

Just to make sure we're clear: in order to trigger that effect, you must be given the Handle in question. So the effect is not ambient, it is reified in the handle, ie. it's a capability; it's the ability to create that handle that is ambient, ie. openFile is an ambient authority, but writing through a Handle you already hold (hPutStr, say) is not.
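A small Haskell illustration of the contrast, using the standard System.IO API (the surrounding function names are mine):

import System.IO (Handle, IOMode(WriteMode), openFile, hPutStrLn)

-- ambient: any IO code can name any path it likes and conjure up a Handle
grabLog :: IO Handle
grabLog = openFile "/tmp/anything-at-all.log" WriteMode

-- capability-like: this code can only write to whatever Handle it was handed
logTo :: Handle -> String -> IO ()
logTo h msg = hPutStrLn h msg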

Your part on CH is totally jacked up

Your part on CH is totally jacked up. I will provide more constructive criticism when I have more energy and time. Probably tomorrow afternoon.

I welcome any corrections.

I welcome any corrections. Note that the treatment is necessarily brief due to the audience, but if what I've said is simply incorrect in some way, please let me know!

CH

The types correspond to propositions. The type checker doesn't check that they are consistent. I can feed it all kinds of rubbish forall a.a, forall a b. a -> b, forall a b.(a,a -> b). It doesn't care. A type system isn't a theorem prover. What it is -is- a proof -checker-. You provide the proof (program) and the type checker checks that it is valid. Usually, type checkers don't explicitly construct (or at least give you access to) the proof of -that-. So it checks that your proofs are "valid", but it doesn't check if anything is logically consistent. Furthermore, in most languages, you can readily prove "false" propositions and all kinds of "logical contradictions", that is, in most languages the logic of their type systems is inconsistent (true = false, so all things are true). In fact, this has to be the case, somewhere at least, to have Turing-completeness.

A more "powerful" static type system -allows- you to make tighter specifications though usually not for free.

So the types correspond to the "logical specification" (though usually a very loose one, even with fairly sophisticated type systems) and the program is a constructive proof that that specification is satisfiable.
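To make the slogan concrete, a couple of tiny Haskell examples (the names are mine; the idea is completely standard):

-- the type is the theorem, the well-typed program is its proof
identity :: a -> a                  -- "A implies A"
identity x = x

modusPonens :: (a, a -> b) -> b     -- "A and (A implies B) together imply B"
modusPonens (x, f) = f x

-- the checker accepts the "rubbish" type below as a perfectly good type,
-- but (short of cheating with non-termination) you cannot write a proof of it:
-- anything :: a -> b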

I appreciate the

I appreciate the corrections, but I'm not clear on one or two things:

Usually, type checkers don't explicitly construct (or at least give you access to) the proof of -that-. So it checks that your proofs are "valid", but it doesn't check if anything is logically consistent.

Hmm, I'm having trouble drawing an accessible analogy to explain this additional precision; the corrected statement is not that the type checker verifies the consistency of the propositions, but that it verifies the consistency of any deductions (programs) given the propositions (types). So in a way, the analogy I used is still valid; it just requires a different mapping than the current text implies.

So this:

Thus, if your whole program type checks, it is logically consistent: it contains no contradictions or logical errors, and the logic machine constructed a proof of this fact.

should be amended to:

Thus, if your whole program type checks, the use of all types is consistent: it contains no contradictions, and the logic machine verified a proof of this fact.

Is that sufficient? That seems like the only truly contentious line given your comments.

Furthermore, in most languages, you can readily prove "false" propositions and all kinds of "logical contradictions", that is, in most languages the logic of their type systems is inconsistent (true = false, so all things are true). In fact, this has to be the case, somewhere at least, to have Turing-completeness.

Interesting; this is the first I've heard of this connection. Do you have a pointer to something that elaborates specifically on the connection between Turing-completeness and inconsistency?

Given that total languages are not Turing-complete, I suspect that partiality is the source of this inconsistency, and consequently, the existence of terms of type _|_, correct?

Also, I did specify that a static type system in combination with the previously discussed features has these consistency properties. This would imply that the language is total.

A more "powerful" static type system -allows- you to make tighter specifications though usually not for free.

Yes, I did not elaborate on what this involves, I just wanted to mention the extent of what is possible.

Curry-Howard does Curly Howard

Do you have a pointer to something that elaborates specifically on the connection between Turing-completeness and inconsistency?

I think this is something a little bit confusing about Curry-Howard, and interestingly it links up with some other threads around here lately involving determinism and Gödel.

For a particular program to act as a proof of the proposition represented by its type, it must terminate on all inputs specified by the type. The reason for this is simple: proofs must terminate (must be finite) or we don't consider them proofs.

If a type system admits a potentially non-terminating program as a proof of its type interpreted as a proposition, it must be inconsistent when interpreted as a logic. (It proves non-provable things)

This explains why all the well-known CH type systems are strongly normalizing.
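In Haskell, for instance, the following two lines type check at every type you like, so the type system read as a logic "proves" every proposition:

-- a "proof" of absolutely any theorem, accepted because it type checks
loop :: a
loop = loop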

Having said that, I don't think a type-system has to model a consistent logic in this way to make interesting guarantees, but perhaps that's another post. ;-)

PS the joke implied in the title is not original. See this thread.

Just as a point of reference

Just as a point of reference for others, I found the following links illuminating:

neelk's quick and dirty breakdown of Curry-Howard (in the comments of the article you pointed to)

Termination checking with types; the specifics are still over my head, though I get the general idea and the discussion is interesting. I wonder what types of recursion are permissible using this technique, ie. only primitive recursion, or more general forms.

Total functional programming; my first intro to totality, and having re-read it recently, it has a brief discussion on _|_, recursion, partiality. In the comments there is an interesting discussion on encapsulating partiality in a monad, but all I can find are slides on the subject, no papers. Anyone have more info on the subject?

Usually, type checkers don't

Usually, type checkers don't explicitly construct (or at least give you access to) the proof of -that-. So it checks that your proofs are "valid", but it doesn't check if anything is logically consistent.

Hmm, I'm having trouble drawing an accessible analogy to explain this additional precision; the corrected statement is not that the type checker verifies the consistency of the propositions, but that it verifies the consistency of any deductions (programs) given the propositions (types). So in a way, the analogy I used is still valid; it just requires a different mapping than the current text implies.

The type checker checks that your program does actually inhabit that type. It could return a proof of that, namely a typing derivation, but about the only use that would have would be to verify the correctness of the type checker. What it doesn't do is prove that the types are "logically consistent" or meaningful in any way. It also doesn't prove that the theorems expressed by your types are "true." Your program does that. The type checker proves Γ |- E : A, not Γ |- A; that is, it proves that in some environment Γ, E proves A, not that given some assumptions Γ, A is true. For example, the type checker cannot prove that A → B, B → C |- A → C. It can prove that given f : A → B, g : B → C then g . f is a proof of A → C. All the type checker proves is that you used the typing rules correctly.

I wouldn't bother trying to explain what the type checker "proves"; I'd simply say the type checker is a proof checker, the programs are the proofs, and the types are the theorems. true + 3 is not a contradiction or an inconsistency, it is just not a valid proof.

Thus, if your whole program type checks, the use of all types is consistent: it contains no contradictions, and the logic machine verified a proof of this fact.

There's too much ambiguity there for me to tell if that is much better. If the whole program type checks, then all typing rules (corresponding to rules of inference) were correctly used. The programs don't have contradictions. The type system ("logic machine") has proved that the typing rules have been used correctly, which implies (verifies) that the program is a proof of the theorem corresponding to its type.

What may help is to look at what a "type error" corresponds to in a logician's proof. Here's an example: "Via modus ponens (the beta rule), A → B and B implies A." This corresponds to the program λ(f : A -> B, x : B). f(x) : A, i.e. calling a function with a value of the wrong type. Such an error isn't usually called a contradiction or inconsistency.

Given that total languages are not Turing-complete, I suspect that partiality is the source of this inconsistency, and consequently, the existence of terms of type _|_, correct?

Correct. That said, the potential for partiality (via non-termination) is unavoidable in a Turing-complete language.

Also, I did specify that a static type system in combination with the previously discussed features has these consistency properties.

Where?

Ultimately, a lot of my issue does stem from terminology you use. In that particular section you seem to be making things more confusing than necessary. A slightly modified form of the usual CH slogan, "Types are theorems and programs their proofs," is something everyone can understand terminologically. No need to make up "logic machines" (a completely undefined term) and then use it with precisely defined terms like (logical) consistency and contradiction. All you need to do is connect "theorem" to specification and "type checker" to proof checker and the potential benefit of CH should be clear.

Where to draw the line of precision?

There's too much ambiguity there for me to tell if that is much better.

Well, I think it's better in the sense that it's not blatantly wrong, it's just imprecise; I'm actively trying to avoid discussing that level of detail. I did go into more detail in previous sections however, so I will apply some thought to this section as well to see if it can be made more precise/meaningful without harming the accessibility.

Such an error isn't usually called a contradiction or inconsistency.

Ok, good example. I'm trying to place myself in the mindset of someone who isn't familiar with formal logic, but who has an informal grasp of what a "contradiction" is. In your example, the declared type of f is A->B, and the declared type of x is B. Applying f to x is a clear example of how your declarations contradict your usage, and thus either the declarations or the usage must be incorrect.

So in the analogy I drew to a mysterious "logic machine", the dialogue goes: "Expect f:A->B, Expect x:B, Use:f(x)", and the machine produces: "Usage of f and x inconsistent with declarations".

Ideally, I would like my explanation to be correct and precise, so as to educate readers a little on what static typing entails, but I will not use terminology like "terms", "typing judgments", "propositions", etc. because they are meaningless to readers coming from imperative languages, who at best have an informal notion of a "statement". I hope you can appreciate the corner I've backed myself into. ;-)

Where?

Here:

There are many arguments for and against strong static typing, but in combination with the previously discussed features, static typing enables developers to write software that is "almost correct by construction"

"Types are theorems and programs their proofs," is something everyone can understand terminologically. No need to make up "logic machines" (a completely undefined term) and then use it with precisely defined terms like (logical) consistency and contradiction. All you need to do is connect "theorem" to specification and "type checker" to proof checker and the potential benefit of CH should be clear.

So would you be satisfied with "types are specifications, and a successfully type checked program is a proof of that specification"? I might be able to work with that.

Making the discussion too precise harms the accessibility. The reason I think the imprecise and undefined term "logic machine" works is that people have an informal understanding of "logic", and an informal understanding of "machine", so the conjunction "logic machine" actually does have some meaning to them; the ambiguity when discussing the machine is inevitable when invoking the "magic" that most people don't understand.

The problem with "theorem", "proposition", perhaps even "specification", is that those terms hold no meaning for them, and so a lot more explanation and thought would be required. The fact that the reader has no idea how a "logic machine" could be constructed is irrelevant, because at least they can picture an interaction with the black box.

Not more precision, more clarity

Well, I think it's better in the sense that it's not blatantly wrong, it's just imprecise; I'm actively trying to avoid discussing that level of detail. I did go into more detail in previous sections however, so I will apply some thought to this section as well to see if it can be made more precise/meaningful without harming the accessibility.

I didn't say that it was right, I said I didn't really know what you were saying to be able to judge. It's not clear how you are using the terms. In my opinion, there are better words to use. For example, instead of contradiction or inconsistency, why not just "error"?

The problem with "theorem", "proposition", perhaps even "specification", is that those terms hold no meaning for them, and so a lot more explanation and thought would be required. The fact that the reader has no idea how a "logic machine" could be constructed is irrelevant, because at least they can picture an interaction with the black box.

Everyone knows what a theorem is. "Theorem" here means the same thing it did in middle/high school geometry and algebra. Every programmer should know what a specification is. I explicitly avoided proposition (though that's the word I'd prefer) as I will agree most people don't know the technical meaning of that term. "Proof checker" is the only term I'd worry about, though the "obvious" meaning from the words is accurate enough.

So would you be satisfied with "types are specifications, and a successfully type checked program is a proof of that specification"? I might be able to work with that.

There is a reason I suggested saying "types are theorems" and then relating theorems to specifications. Usually you don't speak of "proving" a specification, but rather (correctly) implementing it. So using the terms consistently would lead to: "types are specifications, and a successfully type checked program is a (correct) implementation of that specification," which is tautological to some, though perhaps very revealing to others.

Making the discussion too precise harms the accessibility. The reason I think the imprecise and undefined term "logic machine" works is that people have an informal understanding of "logic", and an informal understanding of "machine", so the conjunction "logic machine" actually does have some meaning to them; the ambiguity when discussing the machine is inevitable when invoking the "magic" that most people don't understand.

The intersection of "logic" and "machine" would for me suggest something that "does" logic. In particular, a logic programming language or (equivalently, via CH no less!) a theorem prover. A theorem prover is precisely what a type checker is not.

CH? Fsck!

What it is -is- a proof -checker-. You provide the proof (program) and the type checker checks that it is valid.

Tsss. I find this a very skewed interpretation which is only true for just some of us that are in the pulp and paper industry. For mere mortals a type system just guarantees that certain constraints on programs are met and provides a convenient manner of expressing the interface of certain constructs. [btw, this is no criticism of Derek's explanation, just a statement that CH has hardly any pragmatic relevance]

Some disagreements

Nice article, but I have some disagreements:

1) You say that

Most developers have experienced the problems of non-determinism when a client calls in with a problem, but the problem is not reproducible even by retracing the exact same steps.

Retracing the exact same steps does not make the state of the client's computer equal to the state of the developer's computer. Repeating the same steps only approximates the previous state.

This is important to understand because the problem of 'but it runs on my machine' is frequent amongst developers.

2) I am still waiting for someone to explain to me why assignment is not good in programming languages. I've read many papers, I am reading LtU continuously, but I haven't found a persuasive explanation.

Before jumping to conclusions, I have to say I agree that referential transparency is important in programming, and global variables should not be allowed.

But what if assignment to local variables was the only assignment allowed?

I think that this approach could be described as referentially transparent, because the output of a function only depends on input.

Assignment of this type could be described as serialization of function application, as one result of the computation depends on the previous results...which is exactly the same as monads.

The reason I am a supporter of the above is that not being allowed to have assignment causes unnecessary complexity and performance issues. Granted, monads can simulate all types of destructive updates, but I wonder why I have to use and learn about them when a simpler solution exists with equal results.

No monads needed

But what if assignment to local variables was the only assignment allowed?

I think that this approach could be described as referentially transparent, because the output of a function only depends on input.

Assignment of this type could be described as serialization of function application, as one result of the computation depends on the previous results...which is exactly the same as monads.

You don't need monads for this kind of assignment, even in a pure language. You're right that mutation of this type can be modeled as serially applied functions, and that's exactly what nested LET does, e.g.:

let x = foo in
  ... let x = bar in
      ...

This kind of code is easy to write, so supporting local mutation doesn't buy you very much. Some people might find it convenient, although that has a lot to do with what they've previously been exposed to.

Even in the local case, it can be useful to have immutable variables because it makes code easier to reason about and less susceptible to errors due to refactoring: i.e. local referential transparency is useful, too.

In a language like Haskell, for which lack of mutable variables is a fundamental design characteristic, it wouldn't make much sense to compromise purity just for this. The only uses of monads it would eliminate would be redundant uses - valid uses of monads wouldn't be affected, because the point about monads is that monadic values carry their state around with them, beyond local scope.

In languages that support mutation, like ML and Scheme, you can of course do local mutation, but for the reasons above that's not the real benefit. The real benefit is being able to mutate variables that have wider than local scope, when needed, and this is what allows you to avoid monads for state manipulation in those languages.
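As a small illustration of state that outlives any single scope, here is a sketch using the standard State monad (from the mtl/transformers packages); the counter example itself is made up:

import Control.Monad.State (State, get, put, evalState)

-- reads and bumps a counter that is threaded along behind the scenes
fresh :: State Int Int
fresh = do { n <- get; put (n + 1); return n }

-- the same counter is visible to every action in the chain, not just one scope
threeLabels :: (Int, Int, Int)
threeLabels = evalState (do { a <- fresh; b <- fresh; c <- fresh; return (a, b, c) }) 0
-- threeLabels evaluates to (0, 1, 2)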

[Edit: added the following]

Granted, monads can simulate all types of destructive updates, but I wonder why I have to use and learn about them when a simpler solution exists with equal results.

I'm sure if you dial 911 (or your local equivalent), they'll come and deal with whoever's holding that gun to your head and forcing you to use Haskell! ;)

But seriously, to recap, the "simpler solution" to monads is mutation that's not restricted to being local, and there are languages like ML that let you do that, so you already have what you want. However, for people who understand why they need it (e.g. including, but not limited to, people working on theoretical applications where the mathematical tractability is an advantage), being able to avoid mutable variables can be invaluable.

[quote]

The real benefit is being able to mutate variables that have wider than local scope, when needed, and this is what allows you to avoid monads for state manipulation in those languages.

When I referred to local assignment, I meant any variable, including parameters passed by reference.

Doesn't that count as referentially transparent? The result of the function depends only on its inputs.

However, for people who understand why they need it (e.g. including, but not limited to, people working on theoretical applications where the mathematical tractability is an advantage), being able to avoid mutable variables can be invaluable.

Why are functions which modify their arguments (but not other variables) not mathematically tractable?

It's not RT because you're

It's not RT because you're not really passing a value any more, you're passing a reference which has to be dereferenced at least once.

Mathematically it's a little better than being able to modify anything, but really all you've altered is some scoping rules. It's a useful discipline if you've got to have complex state, but don't mistake it for offering anywhere near the advantages of purity.

It's not RT because you're

It's not RT because you're not really passing a value any more

But it satisfies the criterion that a function's results depend only on its inputs.

but don't mistake it for offering anywhere near the advantages of purity

Could someone please explain those advantages? I would appreciate it if someone posts an example of RT code and non-RT code (in the way specified above) and shows why one has the advantage over the other. I apologize but I don't understand why one is RT and the other is not.

Could someone please explain

Could someone please explain those advantages? I would appreciate it if someone posts an example of RT code and non-RT code (in the way specified above) and shows why one has the advantage over the other. I apologize but I don't understand why one is RT and the other is not.

The advantages of RT are not obvious from a small snippet of code. RT makes code more compositional, so its true benefits are more apparent in large bodies of code, and particularly while refactoring; even large changes in code structure will not have large ripple effects on the behaviour of code. Without an enforced RT code structure, you can only approach that compositional safety if you are very disciplined in your use of side-effects.

Also, if you're not familiar with capability security, then read up on it a bit, as the RT constraint is similar to that of capability security, as I mentioned in the article.

The advantages of RT are

The advantages of RT are not obvious from a small snippet of code. RT makes code more compositional, so its true benefits are more apparent in large bodies of code

Ok, could you please provide an example of a big snippet of code but with removing the unnecessary parts?

even large changes in code structure will not have large ripple effects on the behaviour of code. Without an enforced RT code structure, you can only approach that compositional safety if you are very disciplined in your use of side-effects.

Even if side effects are limited to local scope?

And how much more safety would RT buy you, if the algorithms are incorrect? Refactoring bad code, even functional code, would result in bugs.

Also, if you're not familiar with capability security, then read up on it a bit, as the RT constraint is similar to that of capability security, as I mentioned in the article.

Thanks for the tip.

That was not so long time

That was posted here not so long ago.

In short, the first version of the compiler's graph manipulation routines was based on mutable graphs. That decision proved to be error-prone.

The second version of the graph manipulation routines is based on inductive graphs, which are referentially transparent. It allowed the developers to reduce the code size to less than 50% of the first version, reduce the number of errors, and implement new optimisations without any noticeable loss of speed.

All examples you need are there.

It sounds like some logical

It sounds like some logical errors took place ("violated invariants"). I suspect (without having read the paper in detail) that what they initially did was to pass a control flow graph around, and temporarily mutate its nodes because they didn't want to copy them or increase the number of parameters of functions (i.e. they wanted to pass around a single graph parameter, instead of two or three).

Is it mutation to blame or an inability to manage the problem?

There are some established practices in computing when large and complex data are to be manipulated: a copy of the data is retrieved, then modified, then merged back into the original data source. For example, when I am editing a set of source files, I do that on the local copy of the project, not directly on the files of the repository. When I am editing data from a database, I edit them separately, then store them in the database...In the same manner, when I want to 'edit' a large part of a graph, I first copy it, then modify it, then merge it back into the graph. Copying the graph should have been their strategy from the start; it's illogical to me why experienced people did not choose copying in the first place.

I am not convinced that mutation is bad. Mutation imposes a sequencing of computations. Sometimes this sequencing is not only necessary, but desirable. Furthermore, the referentially transparent code could have had the same type of violations; for example, instantiation of the wrong type of node (say, with the wrong node id) could have caused a whole bunch of invalid subsequent computations. Whether a pointer is assigned or passed as a parameter to a function is irrelevant, with respect to the need for sequencing of the computations.

As for not noticing a loss of speed, that's specific to this type of problem; it is even specific to the input data (if lots of temporary graphs were thrown out, then performance would certainly be worse). In other types of problems, mutable variables have the advantage (in-place quicksort, for example).

Mathematically, any functional code can be converted to imperative code and vice versa; but modern and foreseeable capabilities of computers dictate that mutation is a required capability. Mutation is not a 'tool of destruction', it just needs some discipline in certain cases. I'd rather have the freedom to avoid mutation than the "freedom" to simulate it (which does not increase the difficulty of complex tasks but makes the simple tasks complex).

In other types of problems,

In other types of problems, mutable variables have the advantage (in-place quicksort, for example). [...] but modern and foreseeable capabilities of computers dictate that mutation is a required capability.

On the contrary, modern computers are becoming more hostile towards mutation due to multicore CPUs. Mutation triggers cache-coherency protocols which is devastating to performance. RT code instead scales almost linearly with the number of CPUs, as locks are generally only required to synchronize access to roots.

No doubt you can simulate the same approach in a mutating language using purely functional data structures, transactions, and so on, but once again, it's up to the individual developer to be sufficiently disciplined to maintain the required invariants. Personally, I'd rather be lazy and let the language enforce these constraints for me so I can focus on more challenging problems.

In other types of problems, mutable variables have the advantage (in-place quicksort, for example).

Whose advantage is quickly lost once you scale a sorting algorithm across multiple CPUs.

On the contrary, modern

On the contrary, modern computers are becoming more hostile towards mutation due to multicore CPUs. Mutation triggers cache-coherency protocols which is devastating to performance.

That's only on code which is written to run specifically on multiple cores, i.e. very little code out there. Modern computers have 2 or 4 cores, but these cores are used to parallelize processes instead of threads.

RT code instead scales almost linearly with the number of CPUs, as locks are generally only required to synchronize access to roots.

RT code has mutation in it, you just don't see it. For example, each time you do a 'cons', there is mutation: the pointer of the head node is set to point to the next node, i.e. memory contents change; the same thing happens when you compute a + b and write the result in variable c.

That's why we don't have automatic parallelization of functional code yet.

it's up to the individual developer to be sufficiently disciplined to maintain the required invariants. Personally, I'd rather be lazy and let the language enforce these constraints for me so I can focus on more challenging problems.

How come the one-size-fits-all approach is good when it comes to FP but it is not good when someone complains about the number of available languages?

Whose advantage is quickly lost once you scale a sorting algorithm across multiple CPUs.

So do you propose that a database (for example) should sort its records using multiple CPUs, if available, and using RT? I don't think that the performance of this will even be comparable to the in-place sort. The best approach would be a hybrid one, i.e. partition the table and give each partition to a CPU to do in-place sorting.

For the average person, true multicore computing (in the sense that each additional core increases a program's performance) is still decades away. Having 2 or 4 cores gives little chance to programs to increase their performance...and most programs are sequential (even FP programs), since each computation usually depends on the previous ones.

Seeing is believing

RT code has mutation in it, you just don't see it.

That's the key point though: you can't observe any side-effects from that mutation within the system of the program.

I'm starting to wonder, Achilleas, if you are actually a plant funded by some shadowy pro-RT think-tank to come here and provoke us all into repeating our arguments in favour of RT. ;-)

If you don't want to reduce statefulness, or choose a less stateful language, don't. I just strongly believe you won't be as effective at building reliable systems if you don't know how to do so.

I just don't believe that

I just don't believe that statefulness is evil. I believe that unmanaged statefulness is evil, i.e. having state without rules.

I also believe that if values were types, statefulness would be just as good as purity.

No one has convinced me so far, no one has presented a real argument why purity is important. Each time they tell me an example, I point out that the same mistakes can happen even with purity. Logical problems are exactly the same in pure and impure languages.

I believe that unmanaged

I believe that unmanaged statefulness is evil, i.e. having state without rules.

But what rules? Referential transparency is a rule, and monads are a way to introduce statefulness. See the recent survey for other approaches to introducing effects. Given your stance, I think you will be most favourably disposed towards type and effect systems.

But note that without an effects system, or some other way to control effects, the rules for languages with state are too weak. For instance, in SML/OCaml you can design a map whose internal logic dictates that only pure functions 'a->'b be applied (perhaps because it memoizes results), but the client can provide it with a side-effecting one, thus resulting in unpredictable behaviour; this property can only be communicated via developer comments in such languages. With an effect system it can be enforced via the (type,effect) signature assigned to each function.
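In Haskell the distinction is already visible in the standard types, which is roughly what an effect system gives you elsewhere:

-- the element function cannot have effects, so the library is free to
-- memoize it, apply it lazily, or reorder calls
pureMap :: (a -> b) -> [a] -> [b]
pureMap = map

-- an effectful element function must say so in its type, and the
-- combinator then fixes an order for those effects
effectfulMap :: (a -> IO b) -> [a] -> IO [b]
effectfulMap = mapM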

global vs local reasoning

I point out that the same mistakes can happen even with purity

You are of course ultimately right. If two languages are universal then either one can simulate any program written in the other - even the incorrect garbage.

So purity can't prevent you from screwing up. But, what it does do is limit the scope of each possible screw up.

Without purity, to be correct you have to implicitly assume that the entire state of your store (and in fact the whole world when you consider I/O) is an input and output of each procedure. That's VERY hard to reason about. In practice nobody does that and we all try very hard to figure out what corner of the world is actually important within a particular procedure call. But that's easy to get wrong.

Impurity creates at least two problems:
1) Aliasing means an apparently local update (e.g. a change to an in/out parameter) may have very far-reaching effects that can't be determined without global analysis to find out where those aliases live.
2) Side effects that occur when you call other procedures may cause I/O or update aliases or something. The result is that moving a procedure call earlier or later in your code can cause incorrect behavior even though it looks fine as far as the obvious local dependencies are concerned.

With purity you can reason about code using a much simpler, locally focused logic. That doesn't automatically mean you'll reason correctly. But it does improve your odds.

A concrete example of refactoring

One of the first higher order functions that people are typically introduced to is map for lists. Map's roots come from functors in category theory. One of the nice properties that falls out is that if xs is a list and f and g are functions with the appropriate types then

map g (map f xs) = map (g . f) xs

In other words, it doesn't matter if we first map with f then map with g or if we compose g and f and map with that composition.

Given that law I can feel comfortable "refactoring" one into the other based on aesthetics, performance, or whatever. In a pure language I can look at such code locally and make decisions.

But in an impure language it's not so simple because the order of operations might matter. On the left f will be applied to each element of the list first and then g will be applied to all the results. On the right, calls to f and g will be interleaved. If f and g have side effects like doing I/O or updating closure variables then the refactoring might lead to incorrect results.

So in an impure language I can only do this refactoring if I know what f and g do. But f and g might be parameters that came from some distant corner of the code - possibly a corner I didn't write. And even if I know what a particular f and g do now, I can't know that in the future some other programmer won't pass in a different f and g where the order matters.

In practice in an impure language the best I can do is document that f and g should not have side effects and hope that the documentation is followed and then hope that testing catches the inevitable mistakes.
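Here's a minimal Haskell sketch of the problem; f and g are made-up effectful functions, and the two pipelines return the same list but print in different orders:

import Control.Monad ((<=<))

f, g :: Int -> IO Int
f x = do { putStrLn ("f " ++ show x); return (x + 1) }
g x = do { putStrLn ("g " ++ show x); return (x * 2) }

twoPasses, onePass :: [Int] -> IO [Int]
twoPasses xs = mapM g =<< mapM f xs   -- on [1,2]: prints f 1, f 2, g 2, g 3
onePass      = mapM (g <=< f)         -- on [1,2]: prints f 1, g 2, f 2, g 3

Both produce [4,6], so the pure law still holds for the values, but the observable behaviour differs, which is exactly what makes the rewrite unsafe once effects are involved.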

That's why we don't have

That's why we don't have automatic parallelization of functional code yet.

Today, I can have GHC link my concurrency-unaware code with the -threaded option, run it with the -Nx RTS option and have it run with x threads processing at once. IIRC for small x this reliably speeds up the execution of GHC itself.

But it depends on the exact

But it depends on the exact type of problem. Some problems can easily be parallelized (especially those where data can be divided into chunks), some can't. We don't have automatic parallelization of arbitrary functional code yet, especially code involving mutable state (which is not going to go away any time soon, because programming is side effects, after all).

His point is that Haskell is

His point is that Haskell is more easily parallelized precisely because effects are so controlled, and the structure largely functional. Thus the GHC compiler can parallelize a great deal more than you think. I already posted a link to the Haskell concurrency paper in this thread.

*cough*

Her, I think you'll find.

Also, I think it's a bit much to complain about not being able to parallelise code that was written with explicit sequential dependencies!

The big problem is working out how to get the granularity right. You can go a long way with only a few changes though, as this blog post from Don Stewart demonstrates.
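For example, a minimal sketch using the parallel package ('expensive' is just a stand-in for real work); granularity is the thing to tune:

import Control.Parallel.Strategies (parMap, rdeepseq)

expensive :: Int -> Int
expensive n = sum [1 .. n]

main :: IO ()
main = print (sum (parMap rdeepseq expensive [100000, 200000 .. 5000000]))
-- build with:  ghc -O2 -threaded Par.hs
-- run with:    ./Par +RTS -N4 -RTS   (to use 4 cores)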

RT and cache coherency

I enjoy most of your posts since they often apply completely realistic intuitions based on experience. I usually agree with the substance of your analysis and only see different tacks in emphasis and direction. It's not my intention to disagree below -- only to expand or tease apart pieces I found interesting because they cleave close to what I'm doing.

Achilleas Margaritis: That's only on code which is written to run specifically on multiple cores, i.e. very little code out there. Modern computers have 2 or 4 cores, but these cores are used to parallelize processes instead of threads.

Yes, I know threaded code I write lately isn't representative practice. Most folks write single threaded code using more cores via processes.

But after correctness of my threaded code, cache-coherency is really my number one concern as a performance matter. The intersection of every pessimistic device I use can easily flush CPU caches constantly unless I look for ways to avoid it as much as possible.

The cost of touching memory has gotten spectacular in recent years, in the sense that trying to get maximum effect from CPU caches has a large constant-factor influence on actual performance. (Just the memory alignment of specific branch targets in bottleneck loops can have a significant impact, which is bizarre -- poor alignment in code can cost more than poor data alignment.)

Achilleas Margaritis: RT code has mutation in it, you just don't see it. For example, each time you do a 'cons', there is mutation:

Yes, trying to make those things threadsafe and still fast in C++ is interesting. But I try to use immutable data as much as possible after it's made visible to other threads. For example, if I can break requests and replies into two pieces, with a reply allocated later instead of mutating an original request (say with space pre-allocated for a reply), this simplifies diagnosis of multiple concurrent replies, especially when a coder doesn't think it can happen and protects nothing.

Achilleas Margaritis: That's why we don't have automatic parallelization of functional code yet.

Sounds about right. More clues about what data sets can interact are needed as annotation to drive optimizations. If all memory allocation pools are just one big soup, then it's hard to separate allocations into different scopes.

Achilleas Margaritis: For the average person, true multicore computing (in the sense that each additional core increases a program's performance) is still decades away.

I was thinking more like five years for suitable async event runtimes, which is still a long time. Twenty years seems too long for newer runtimes, but we'll likely still have code using runtime styles then which don't benefit from multicore.

Achilleas Margaritis: Having 2 or 4 cores gives little chance to programs to increase their performance...and most programs are sequential (even FP programs), since each computation usually depends on the previous ones.

That'll probably stay true of CPU bound programs. But i/o bound programs ought to benefit a lot from more cores, just as soon as folks use async styles to make computation and i/o more independent. Individual computation flows tend to be serial in nature; but we should be able to parallelize flows a lot more.

That's why we don't have

That's why we don't have automatic parallelization of functional code yet.

Seeing is believing. See also Philippa's response to your comment.

So do you propose that a database (for example) should sort its records using multiple CPUs, if available, and using RT?

Depends on the overheads. Still, premature optimization is the root of all evil. Or so they say. :-)

The programmer's chainsaw massacre

I am not convinced that mutation is bad. [...] Mutation is not a 'tool of destruction', it just needs some discipline in certain cases.

Many would agree with that, although "certain cases" may be a bit weak. If you use an impure language that supports functional programming well, like ML or Scheme, you find quite naturally that it isn't necessary to do very much mutation in most programs. The cost of avoiding mutation in most cases is negligible, you get the advertised RT benefits when you avoid it, but you can still use it when you need it. IOW, it's an unadulterated win. (In the pure language case, there's a perception of a more substantial tradeoff; I'm not taking a position on whether that perception is justified.)

However, mainstream languages have a problem in that they don't provide good support for a mostly-pure programming style. They're "Mutation-Oriented LanguagES" (I'll dub these MOLES[*]), which encourage and sometimes even enforce undisciplined, problematic use of mutation.

A concrete example would be the Java and C# standard libraries, which are inexorably geared towards mutating objects. You have to do extra work, such as explicit copying, to avoid potentially dangerous mutation of shared objects. This is a recognized problem, and I noticed recently (on a Microsoftie's blog) that there are plans to improve this in the future. Over time, these languages will evolve to support better discipline when it comes to mutation.

In the meantime, people using these and other mainstream imperative languages are heavily overusing and abusing mutation, because the languages make it difficult not to. Further, most people who don't have experience with functional languages are unlikely to recognize the degree to which the mutation they rely on isn't actually necessary.

I'll close with an obligatory analogy: it's as though we've all been cutting our food with chainsaws. Chainsaws are powerful and efficient at cutting - they "just need some discipline in certain cases", to avoid also cutting through your crockery, your table, and your limbs.

But now there's a new movement towards "functional eating" which involves using a knife and fork (think ML) or chopsticks (Haskell ;) instead of a chainsaw. Its proponents claim that this approach is far superior, but chainsaw fans are skeptical. After all, it can take longer to cut through a steak with a knife than with a chainsaw, so why would you waste the extra time? Sure, there's sometimes some collateral damage with chainsaws, and you might occasionally cut off a finger or two, but that's only if you're not careful with the chainsaw. But for some strange reason, the functional eating fans insist that there are advantages to eschewing chainsaws. Go figure!

[*] This also refers to the tunnels (mole holes) through a program between the site of a mutation and the references to the mutated object.

Unintended espionage

MOLES ... also refers to the tunnels (mole holes) through a program between the site of a mutation and the references to the mutated object.

Not to mention the appropriate connotations of "mole" in the world of espionage. (I thought that those languages were on my side!)

Of course...

try cutting down a tree with a knife and fork. :)

Here's an interesting thought question for you.

Imagine Java was magically changed to have the following alternate semantics, and existing code everywhere were magically transformed to match. (As this change affects only the source and not compiled Java, class files need not be harmed.)

1) By default, all local variables and class attributes are final. (The keyword "final" may still be used in this context, and has no effect). Like today's Java, final locals and class attributes may only be written to once while in scope, and are thereafter read-only until they go out of scope.

2) A new keyword, "var", is introduced, which must be used to get mutable locals and class attributes. Static final class attributes need initialization in the class definition; non-static final class attributes must be initialized in the constructor (or given default initialization);

3) Final's other uses--preventing class inheritance or method overriding--are unchanged.
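
(As an aside, not part of the thought experiment itself: ML-family languages already behave much like points 1 and 2. Bindings are immutable by default, and mutation has to be requested explicitly through a ref cell. A minimal OCaml sketch of that default, purely for comparison:)

(* Bindings are immutable by default; a second "let" shadows, it does not mutate. *)
let () =
  let total = 0 in
  let total = total + 1 in
  (* Mutation must be requested explicitly, the analogue of a "var" keyword. *)
  let counter = ref 0 in
  counter := !counter + 1;
  Printf.printf "%d\n" (total + !counter)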

How would this simple change, were it to occur, affect Java programmers' thinking and behavior?

1) It wouldn't--other than thousands of programmers grumbling about what a stupid language it is that makes them explicitly declare mutable variables.
2) It would lead to moderate improvement of code, as accidental modification of variables-that-should-be-final is reduced and programmers use more idioms which avoid mutation.
3) It would revolutionize the programming world, as programmers learn that "var" means trouble; as organizational coding standards restrict or discourage the use of mutable constructs, and IT departments reap significant productivity gains.
4) People would write in a slightly more functional style, but would still produce mounds of bad code.
5) Mutable locals and class attributes ain't the problem. Mutable locals and class attributes combined with mandatory reference semantics for aggregates, further coupled with promiscuous multithreading, is the problem. :)

It's not the wand, it's the magician

How would this simple change, were it to occur, affect Java programmers' thinking and behavior?

To abuse an old saying: You can lead a programmer to a nice language, but you can't make him think.

It is perfectly possible to write fairly decent, safe, elegant, thread-safe Java code. (At least, I'd like to think I do so every day. ;-))

You just have to understand the design options and limitations of the language, and learn what examples to follow, and which to shun.

We at LtU help by focusing on understanding PLs in general, rather than language advocacy: this shows that better understanding can get better results from (almost) any language.

Just creating a new fashion in languages won't have the desired effect.

Puttin' on the Ritz

You just have to understand the design options and limitations of the language, and learn what examples to follow, and which to shun.

Sure. The same is true for writing business applications in assembly language or COBOL, but the fact that you can do that doesn't make those languages appropriate to continue writing applications in, as superior approaches become available and understood. (I hope I don't drag any COBOL 2000 fans out of the woodwork with that...)

We at LtU help by focusing on understanding PLs in general, rather than language advocacy:

I'm not sure what language advocacy you're seeing here. What I was getting at (and what Scott's experiment seems aimed at) is language design. I was saying that the current heavy reliance on imperative style is, in the majority of cases, an error, in that it creates an unnecessary class of problems — that with the right tools, it's just as easy or easier to do things in a more robust way. I also believe that PL theory supports this conclusion, i.e. it's not just subjective.

Certainly, with skill and care you can avoid the problems, but the same holds true for manual memory allocation and a host of other low-level and "unsafe" things that programming languages have gradually moved away from.

For me, LtU and the subject matter it covers would be much less useful if it didn't offer some basis for assessing claims like the one I'm making. It's exactly this kind of assessment that language designers have to perform, and are performing, in order to improve languages.

Fortunately, IMO, the designers and implementors of a number of mainstream languages have recognized this particular error, so the kind of changes I'm talking about will happen, albeit incrementally because of the huge legacy issue. Then the market will decide, so we'll have to wait and see whether the current mutation-happy approaches are significantly reduced. Based on experience with impure functional languages, I'm predicting that because of the "unadulterated win" that I mentioned, more functional approaches will win in the marketplace once they're available in mainstream languages.

Just creating a new fashion in languages won't have the desired effect.

Fashion implies that there's no objective advantage between different fashions. By that criterion, reducing the emphasis on mutation and providing good support for doing things easily without mutation is not just a fashion.

And just to preempt a possible misunderstanding, I'm not saying all languages should be pure, but that supporting the writing of substantially pure code, and allowing mutation to easily be restricted to cases where the tradeoffs are worth it, are desirable features for most general-purpose languages.

Back to Saltines

Fashion implies that there's no objective advantage between different fashions.

For me the distinction is one of intent and depth: you can do something objectively good because it is the latest big thing, but without having an internalized sense of its importance. (Of course that never happens in the tech world...) In such a case, it is just fashion, and will probably produce anti-results.

It is only if one develops one's own faculties and judgment to see why a particular approach is good (and its weaknesses as well) that it will bring about lasting improvement.

As you know, I would rather see better designed languages, i.e. ones that make it easier and more obvious to reduce the amount of state, become more widely used, but I don't want the fact that we aren't there yet to give anyone an excuse to avoid learning how this would work now even with less well-designed languages.

No matter how we slice it, the programmers have to fix themselves first (perhaps by learning an FP language or studying some PLT) before changing language features is going to make a significant difference to software design quality.

Moreover, we are more likely to get the adoption of good designs when people understand the elements of good (and bad) design, and can better assess the strengths and weaknesses that various languages bring to the table.

When everyone is designing their Java (or Cobol or C++...) to reduce state à la CTM, the allure of languages that make that easier will be obvious and compelling. (And we won't need threads like this one anymore... ;-) )

Don't mutate me, bro!

I mostly agree, except on one fairly significant point: I don't think it's likely that the average programmer will "fix themselves" without some assistance from their languages, and I don't have faith that enough of them will go and learn an FP language or study PLT (although the influence of the web could still prove me wrong on that).

Particularly for non-CS-grad programmers, the programming language(s) they use are one of the primary sources of, and limits to, their understanding of programming. Changing the languages can fix their thinking, or at least create an environment which fosters the fixing of their thinking[*] as we all continue our ongoing collective learning experiment.

You knew this was going to come back to Sapir-Whorf, right?

--
[*] P.S. I'm not really a megalomaniac, but I sometimes play one on the Internet.

Two legs bad, four legs good

You knew this was going to come back to Sapir-Whorf, right?

Well, you know that I prefer the Sapir-Worf Hypothesis: thinking you are a Klingon leads you to want to learn the Klingon language. ;-)

I guess I don't really have an interest in fixing programmers via some Newspeak, but rather sharing what I've learned so that they can, if they so desire, use that information to help themselves become better.

In my professional environment, I get to set the standards of what is acceptable code and design, but I never set up rules that must be blindly obeyed, or arbitrary constraints to be tolerated: I set standards of practical outcomes and discuss and explain the principles that underlie those standards, so that, as motivated professionals, programmers can internalize those principles and use them themselves to be more effective. (They can even disagree with me and win, if they offer a sensible argument!)

I apply the same principles to what I post on LtU: I will expound and defend my principles and positions, but in the end I truly believe it is up to the recipients what they make of them (or even if they make anything of them).

Yes, the world would be a better place if the CTM hierarchy was the default assumption of all PLs, but a revolution is not necessary to start being effective now.

[Fixed typo... for real this time]

A matter of faith...

Obviously this is a matter of our faith in people being able to be trained and disciplined, which is where we get to our beliefs and attitudes. For a good practical estimate of whether or not people are capable, consider history:

Ahhh, the health industry--before medical schools and insurance and research and the melding of academia/industry, there were quacks. Observe: erroneous, illogical practices; no qualifications; cures that killed, etc. Enter new associations and certification standards. The state of the health industry is much more mature now.

As a computer software industry, we have a lot of catching up to do. We see the same symptoms in programming that we saw in early medical practice--lack of foundation, quality, and discipline in the field...

That's our new job as an industry--to heighten the standard of a competent programmer by disciplining the field with qualifications and education in code design. Pretty soon, businesses will not want to hire a programmer without any code-design education, since programmer time costs much more than computation time, and bugs are a huge source of programmer time.

Good start

Attempting to cut down a tree with a knife and fork yields a type error.

How would this simple change, were it to occur, affect Java programmers' thinking and behavior?

This simple change doesn't go far enough. What about all the standard library code which assumes that mutating an object is the appropriate way to sort it or append elements to it, for example? There are plenty more examples like that.

The evolution of these languages will include many changes which, taken in isolation, will make many programmers react as in (1), although in the end (2) might happen anyway.

But programmers who understand the reasons for the changes will be more easily able to achieve (3). Mounds of bad code (4) are a universal constant; I'm not talking about eliminating bad code, I'm talking about supporting the writing of good code, as opposed to discouraging it.

I'm missing the point of (5); if you don't mutate objects, then reference semantics are not a problem, and you eliminate a class of multithreading bugs, but would have to use a different interthread communication mechanism.

Not enough

Imagine Java were magically changed to have the following alternate semantics. [...]

That wouldn't be enough. The ability to do effect-free programming effectively (pun intended) in functional languages is due to a lot of other features. Off the top of my head:

1. Flexible expression language - if all control structures (e.g. conditionals) are just statements then you'll have a hard time avoiding at least temporary local state.

2. Cheap recursion (esp. tail recursion) - to avoid state you must avoid loops, which makes efficient support for recursion a must. (And in fact, you also want to ban/discourage loops.)

3. Lean higher-order functions - to make the use of recursion convenient you have to be able to encapsulate it in nice abstractions.

4. Efficient allocation and GC - you will be allocating small short-lived objects at a much higher rate, something for which mainstream GCs typically aren't optimized. (And of course, automatic memory management itself is an absolute must.)

5. Algebraic datatypes - allow you to "reify" decisions as data in a safe and convenient form, such that you don't have to "execute" them in place.

6. Suitable libraries - obviously.

Some of these have begun to show up in mainstream languages, but I have yet to see the full package. Without them, your suggested change of semantics will almost certainly result in merely "a stupid language".
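
To make points 1-3 concrete, here is a small OCaml sketch (mine, purely illustrative): summing the squares of the even numbers in a list, first as a hand-written tail-recursive loop whose conditional is an ordinary expression, then with the recursion packaged up in a higher-order fold.

(* Point 2: the "loop" is a tail call, with the loop state passed as an argument. *)
let rec sum_even_squares acc = function
  | [] -> acc
  | x :: rest ->
      (* Point 1: the conditional is an expression that produces a value. *)
      let acc' = if x mod 2 = 0 then acc + x * x else acc in
      sum_even_squares acc' rest

(* Point 3: the same computation with the recursion captured by a fold. *)
let sum_even_squares' xs =
  List.fold_left (fun acc x -> if x mod 2 = 0 then acc + x * x else acc) 0 xs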

I'll emphasise point 5 a

I'll emphasise point 5 a little: in pure programming, what GoF Design Patterns fans know as the Interpreter Pattern becomes widespread, and having to use precisely the wrong structure (we're essentially talking AND instead of OR) to implement abstract syntax trees creates a whole world of pain. The saved decisions can be as complicated as you like - they can be an entire plan of action or, to put it another way, a program - but nobody in their right mind really wants to have to embed the thing as objects.

Doing this also offers us the advantages of metaprogramming - who among us really expects their RDBMS to /not/ optimise queries? Isn't it so much easier to make use of our understanding of logic and of the type systems we use every day and perform consistency checks by writing a typechecker?

In short, isn't life so much easier when language manipulation stops being weird voodoo stuff because we're trying to use all the wrong tools?
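
For readers without an ML or Haskell background, here is a small OCaml sketch of the idea (my example, not the poster's): the "saved decisions" are an abstract syntax tree built from an algebraic data type, and interpreting or rewriting it is an ordinary pattern match rather than an Interpreter/Visitor class hierarchy.

(* The AST is plain data: a labelled sum (OR), not a class hierarchy. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr

(* Interpreting the reified decisions is an ordinary pattern match. *)
let rec eval = function
  | Num n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b

(* And the same data can just as easily be rewritten, e.g. a tiny optimizer. *)
let rec simplify = function
  | Add (Num 0, e) | Add (e, Num 0) -> simplify e
  | Mul (Num 1, e) | Mul (e, Num 1) -> simplify e
  | Add (a, b) -> Add (simplify a, simplify b)
  | Mul (a, b) -> Mul (simplify a, simplify b)
  | e -> e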

Friday Afternoon => Batteries Low

I even popped open the Design Patterns book for this one, but I just can't see what you mean by AND instead of OR. The nonterminals on the right are subclasses of the nonterminal on the left in rules with alternation, instance variables otherwise. This looks ok(*) to me. Could you please elaborate?

(*) I still dislike objects in AST's due to the visitor pattern.

Visitor-pattern matching function

I still dislike objects in AST's due to the visitor pattern.

Actually, there is a striking duality of message=algebraic data type, object=pattern matching function: in object terms, the visitor is an object, and the data is some sequence/structure of first-class messages to that object. This completely does away with the need for the visitor pattern in object oriented languages. The paper I link to in that post has a compelling design for first class messages which I'm trying to build on.

The identity is flawed. One

The identity is flawed. Two decades of OO brainwashing hasn't been sufficient to establish the idea that switching over types is something entirely different from type based method dispatch. A few years ago type switches were still considered as imperative 70s style retro computing because they didn't respect subclassification or "ad hoc polymorphism". Now, they got redressed in functional clothes and everyone seems to be excited again. Progress through rebranding. I love fashion.

What I find rather bemusing is that the quoted author doesn't compare pattern matching to the visitor pattern but replaces the pattern by something entirely different. Well, there is something looking remotely similar: an interface named IVisitor. I guess this is meant to prove the assertion.

The identity is not flawed

The identity is flawed. Two decades of OO brainwashing hasn't been sufficient to establish the idea that switching over types is something entirely different from type based method dispatch. [...] because they didn't respect subclassification or "ad hoc polymorphism"

The paper linked from the blog post supports both. For instance, here's a possible example of inheritance:

let baseType = (mfun BaseType `SomeMethod(x) -> (* do something with x *)) in
  mfun SuperType `SomeMethod(x) ->
    if (* some condition *) then (* do new thing *) else baseType(`SomeMethod(x)) end;;

Perhaps you should read the paper.

What I find rather bemusing is that the quoted author doesn't compare pattern matching to the visitor pattern but replaces the pattern by something entirely different.

I discussed the "visitor pattern=pattern matching function" identity in previous posts. The post I linked to discusses how to eliminate all the boilerplate inherent in the visitor pattern by increasing the power of object type systems to include first class messages. This extension aligns the expressiveness with functional languages for these data-oriented problems, while retaining the benefits of objects for operation-oriented problems.

Progress via interparadigm-bouncing

Now, they got redressed in functional clothes and everyone seems to be excited again. Progress through rebranding. I love fashion.

Bouncing between progressively refined alternatives seems to be a major way that real progress occurs. As another example, consider mainframe terminal applications, followed by desktop GUIs, followed by web applications. What happens is that we run into real limits and problems with one approach, so we find something else promising that addresses many of those problems. We embrace that, then run into its limits, and repeat.

In the meantime, our understanding of the problems tends to increase, in part precisely because we're exploring the problems through the lens of more than one paradigm. We start to see that the old approaches had potential that wasn't fully exploited. We also start to see connections between the approaches, as in Haskell's overlooked object system.

There's much more going on in these cases than just rebranding and fashion.

PLT as a fashion

If I knew how to make progress, I'd already be a Turing Award candidate ;)

Unfortunately many of the things I made have only some recreational value, showing that with some stretching an idea can be realized in an unexpected context. For example, I once implemented checked exceptions in Python for an ActiveState cookbook recipe, and yeah, I got 5 out of 5 stars and a nice comment from someone who said he would hate checked exceptions and wouldn't ever use my recipe but found it awesome nevertheless. Life can be so beautiful.

So what does it mean? Is it irrelevant that most research and even most programming paradigms go nowhere (anyone still remember AOP?)? Maybe they just address the wrong problem, or the right problem in the wrong context; or they are overly ambitious, make fast progress, but soon get stuck; or they come up with a solution that is a monstrosity, solving a thousand problems at the same time, yet still too impoverished for the more complicated ones and too complex to express the easy but common use cases. So progress is indeed very hard to recognise from staring at code and publishing papers.

That's also my main concern with the latest hype for theoretical cleverness which came up with the popularity of Haskell - and yes, also LtU! This discussion style of quoting just an arcane, technical paper that "makes my argument" without any precise reading and patient interpretation has become pop. Programmers pretend to be language theorists and instead of using their own idioms they strive to appear like scientists. PLT is the new coolness. Instead of taming the "paradigm warfare", PLT has just become immersed in it.

That's what's going on.

Wearing my designer lambda shades

Programmers pretend to be language theorists and instead of using their own idioms they strive to appear like scientists. PLT is the new coolness.

I'm really not sure what the scope of this comment is, Kay. PLT per se is still pretty much a niche interest among the programmers I know of.

Specific aspects of PLT necessarily form the basis of knowing how to program effectively, and certain concepts, such as state, are vital to understanding how large systems work. However, it's not that trendy (or necessary) to know PLT in all its glory.

But if by some miracle understanding PLT makes us cool, then, hey, I'll take it; I hope there is a nice tee-shirt that comes with it. ;-)

Sausages

To paraphrase a well-known saying, watching progress is like watching sausages being made. If you're squeamish (or impatient, or cynical) it's better not to watch too closely.

Leaving aside the question of to what degree the phenomenon actually exists w.r.t PLT, the spread of "pop" emulation of something doesn't necessarily detract from whatever is being emulated.

Information spreads through a society in many ways, and some of them can seem quite strange. The cargo cult phenomenon, where people emulate something in form without succeeding in emulating its function, can actually be a way of learning: fake it till you make it, as the pop psychs say.

A major aspect of successful human learning is the ability to function with only incomplete knowledge - without that, we'd never be able to bootstrap and trial-and-error our way to understanding anything.

So, to me your observations are simply describing what is to be expected as we make, and experiment with, progress.

Open-minded sausage cult

The cargo cult phenomenon, where people emulate something in form without succeeding in emulating its function, can actually be a way of learning: fake it till you make it, as the pop psychs say.

I'll have to dissent a bit on this.

The most pernicious anti-pattern programmers I've ever seen thought they already understood good practice X (for all the relevant values of X), and therefore didn't need to understand any more.

Another variation on this is the "We tried X here, but it didn't work". Upon further investigation, one discovers that the version of X that was tried was about as far from X as you could imagine. Again, in this scenario, the need for further knowledge is explicitly denied.

If you qualify your "fake it till you make it" philosophy with an awareness of "beginner's mind", I'll go along: being open to refining one's knowledge while just diving in and doing the best you can is a useful and constructive approach.

On the other hand, sometimes after open-minded investigation one is in a position to say that X is just a bad idea, or a good idea, but only in a restricted circumstance.

Whether one is entitled to this position or not, in the end, must be a matter between oneself and one's own conscience. ;-)

Be like water

I agree that the beginner's mind strategy is ideal when you can get it, but I'd say the problems you mention are an inevitable part of the same sausage-making process. When X really does work well enough, it tends to spread despite such obstacles - because some people succeed with it, and then do better in some way than the people who didn't.

Also, the people who resist X may have good reasons in their own context, even if they draw overgeneral conclusions from that context. You might not want too much beginner's mind when the next revision of some critical product depends on you. So it's all part of the struggle - some bits move forward before others.

I'm making a sort of best-possible-world argument: despite all these messy issues close up, given that we're not each perfect learning machines and oracles, it couldn't really happen any other way. At least, not in the big-picture perspective. There might be room for improvement here and there, if one does not subscribe to Panglossian pessimism.

An ever empty cup

When X really does work well enough, it tends to spread despite such obstacles - because some people succeed with it, and then do better in some way than the people who didn't.

The flipside of this position is quite dangerous: X (say FP) hasn't yet succeeded, therefore it is a waste of time.

A key part of what I'm saying is that it is not necessarily X itself that is the cause of the success, it is an understanding of when and how to use X.

Imagine for a moment that X is such that, when it is used appropriately, it produces 10 times the results of not using X, but when it is not used so wisely it causes catastrophic failure. Allowing X to freely propagate in the evolutionary world you propose will not cause it to succeed massively in the general case: it will only take hold if some group that can wisely use X gains some kind of enviable niche in that ecosystem (the species Google googlensis comes to mind as an exemplar, for several values of X ;-)).

Even in this latter case, anyone not starting out in this wise group would have to accept that they need to change themselves to be like that group in their knowledge about X (and possibly their other resources for X) in order to replicate their success.

I will agree with you if you say that it is easier to spread the form of X than the substance of X in most populations, but for many values of X, only the substance can produce any beneficial effect.

Again, I'm not sure that I think everyone needs to use all good Xs for it to be good that some do. I personally strive to understand as many Xs as possible, and invite others so inclined to join me in those Xs: anything more ambitious may not achieve desirable results.

vs. CTM?

I haven't read it (asking for a copy for xmas) but I think I have heard that CTM says one usually good way to approach anything is to build up in layers with declarative at the base? Sorry if I'm butchering the details. The thing I'm getting at is: can you follow CTM and get something good 90% of the time? If there is going to be cargo-cult programming, and there always will be, what is the safest default to teach?

Let's start with 10%... ;-)

can you follow CTM and get something good 90% of the time? If there is going to be cargo-cult programming, and there always will be, what is the safest default to teach?

In some ways my position is akin to Teach Yourself Programming in Ten Years: to get 90% results you need the right ideas plus the right experience.

Certainly, studying CTM (or SICP, etc.) seriously, and mastering the concepts in it will make one better than they are currently. However there is a catch with the word "mastering": one's understanding of ideas and the mastery of them evolves over time as one's experience and understanding deepen.

The philosophy I've been propounding in this sub-thread amounts to suggesting to experienced programmers that they be open to new ideas (i.e. not to assume that their experience covers all the bases) and recommending to novice programmers that they be open to the lessons of experience (i.e. not to assume that because they have learned the right ideas/languages that they already know the "right things" to do).

My "disagreement" with Anton (which is too small to really deserve that name ;-) ) is that I think it is the ideas that various PLs give us that are more important than the PLs themselves.

So learning CTM and Oz (and SICP/Scheme, and SoE/Haskell, etc.) can make you a better Java (or C, Perl, etc.) programmer if you apply the lessons learned, and to me that is more important than trying to get people to "switch allegiance" to those languages for their practical work. (However much better the world might be if those were the dominant languages...)

I agree with you about

I agree with you about this.

You can also make the case that even if you don't apply specific lesson, simply by experiencing more languages/paradigms/ideas you stretch your mind, and become a better programmer as a result.

clarification

i agree with your position (at least as i understand it ;-). i was just wondering how to bandaid the badness. some folks need or want or will only use a pithy slogan, in which case instead of it being "nobody ever got fired for hiring Java" i would like it to be something like "if you want your software to suck less, use CTM" or something along those lines. :-) i'm not saying bandaids are to be lauded. but i hope that a better bandaid will at least have folks be exposed to better things, than if they just do basic enterprisy Java for ever?

Sloganeering

something like "if you want your software to suck less, use CTM" or something along those lines.

OK, how about:

"No one ever became a worse programmer by studying CTM (or SICP, etc.)"

That's the best I can do without talking to a marketing guy. ;-)

I wonder if this

I wonder if this generalization is indeed universally true....

Would bad programmers bother reading either?

And by "bad programmers", I don't mean inexperienced ones who have a desire to get better. I'm referring to the lazy-slob sort of programmers, who view generation of code in the same manner that your typical teenager working at McDonalds views generation of hamburgers. :)

In other words, it might be universally true by virtue of being a tautology.

Managers out there: A good question to ask on interviews is whether or not the candidate has read either. :)

The problem with slogans

I wonder if this generalization is indeed universally true....

Coming back to my theme, I can think of one obvious exception: the programmer who thought he understood them but really didn't. ;-)

Disenhanced agreement

My "disagreement" with Anton (which is too small to really deserve that name ;-) ) is that I think it is the ideas that various PLs give us that are more important than the PLs themselves.

I agree that the underlying ideas are more important, but I think there's some additional subtlety not captured by that claim.

To summarize: many (most?) programmers are primarily exposed to those ideas through the languages they know. On the subject of ideas about mutation, a number of the mainstream languages seem to be heading towards a more disciplined approach to mutation. There are indications that this is an increasing trend. I believe that it's through the resulting language changes that many (most?) programmers will learn about the corresponding ideas.

Of course, that doesn't preclude more ambitious programmers from learning new languages and ideas now, and applying that knowledge to their programming in mainstream languages. If they're commenting on LtU, then they have an obligation to do that. ;)

So learning CTM and Oz (and SICP/Scheme, and SoE/Haskell, etc.) can make you a better Java (or C, Perl, etc.) programmer if you apply the lessons learned, and to me that is more important than trying to get people to "switch allegiance" to those languages for their practical work.

I get the impression that something I wrote may have been misinterpreted. I certainly wasn't arguing that people should switch allegiance for practical work — if anything, it's the opposite. I recognize that most people either don't want to and/or don't feel practically able to switch, which is why I expect that the most predictable upcoming changes will involve the evolution of mainstream languages. I won't try to predict the unpredictable changes, such as eToys taking over the world.

Ouch...

Ouch...

Contravariant coagreement

a number of the mainstream languages seem to be heading towards a more disciplined approach to mutation

Perhaps it is my definition of mainstream (or a temporary failure of imagination), but I can't think of any unambiguous examples of this. What are some you are thinking about?

I expect that the most predictable upcoming changes will involve the evolution of mainstream languages

When you say evolution, do you mean adding features to existing mainstream languages, or changing which languages are mainstream? I had interpreted you to mean the latter, but if you mean the former, we might agree, depending on the examples you give. ;-)

Your MOLEs are already dead

(Channeling Agent Smith, not flamebaiting.)

Perhaps it is my definition of mainstream (or a temporary failure of imagination), but I can't think of any unambiguous examples of this. What are some you are thinking about?

The most prominent example is C#, and ultimately by extension, the whole .NET platform. See e.g. this post from Mads Torgersen, a C# PM (at the time of writing at least), and this PPT presentation for just a couple of examples. There are many more where those came from. This evolution is not remotely complete, but it's well underway and already seems somewhat inexorable. Even though the focus of changes in C# 3.0 is perhaps not so much on controlling mutation directly, it provides a set of related features which naturally move programs away from excessive mutation.

I don't think any other mainstream languages are as far ahead in terms of actually delivering such features, but I said "heading towards". There's pretty widespread acceptance of the advantages of controlling mutability, both amongst language designers and programmers setting and following best practices. There are also many examples of changes being discussed or planned that move in that direction. One factor driving this is the conventional wisdom that languages have to be able to exploit multicore processors.

When you say evolution, do you mean adding features to existing mainstream languages, or changing which languages are mainstream?

Definitely the former. The latter would be more of a revolution — which could and ultimately will happen in some form, but the details are less predictable.

Back to the burrow

The most prominent example is C#, and ultimately by extension, the whole .NET platform.

In which case I will have to plead skeptical agnosticism.

The mainstream langs may enlist PLT "good guys" to help them (as Java did with generics), but this doesn't mean that the features so added are understood or well-used by practitioners.

You might be right that this will ultimately "bait the hook" for improved understanding, but my enthusiasm for this is lukewarm at best: I see too many confused (and otherwise talented) practitioners already.

Burrow fast, they're comin' for ya! ;)

There's much more going on than just the judicious application of a few PLT superheroes. The point is that existing mainstream language designers and developers, and a not insignificant number of users of those languages, seem to be getting the message.

Confused practitioners are just part of the sausage-making process. The spreading of knowledge is a trickle-down process, so depending on where you look, things can seem either better or worse than they are "on average". I'm betting that "good knowledge" will spread and mostly win out over "bad knowledge", in much the same way as modern medicine has mostly displaced witchdoctors.

One difference is that not many witchdoctors made the transition to medical doctor, whereas many programmers, with access to useful information, are able to recognize its usefulness and exploit it, particularly if it's provided in a form they can use, such as in languages they're already familiar with. Add the social web as a means for disseminating and discussing knowledge, stir a few times, and call me in ten years. I think you'll be pleasantly surprised.

See e.g. this post from Mads

See e.g. this post from Mads Torgersen, a C# PM (at the time of writing at least), and this PPT presentation for just a couple of examples

That powerpoint presentation seems inaccurate. C# does not implement GADTs safely, as some programs require runtime casts.

This discussion style of

This discussion style of quoting just an arcane, technical paper that "makes my argument" without any precise reading and patient interpretation has become pop.

I've found most citations on this site to be very readable despite having absolutely no background in computer science when I first signed up here; the gist of any paper is usually quite clear from the abstract and conclusion. If any questions remain, they can be posed here and generally someone will answer. Honestly, what more could one ask for?

If you view objects as

If you view objects as "essentially records" (which isn't an unrealistic description in many of the languages GoF applies to, even if it's an unfair one in Smalltalk) then you're working with "labelled products" in place of "labelled sums" (ADTs) and using subtyping to emulate the "OR". That last bit incidentally means that the subtyping relationship is in one sense backwards - you can add new subtypes at any time, despite visitors not being equipped for it, whereas creating new supertypes is difficult despite being a more valid thing to do.

Ok, we so agree that OR is

Ok, so we agree that OR is emulated by subclassing while products are directly supported. I thought your previous post implied that products were misused to implement OR as well.

They effectively are. We're

They effectively are. We're using a record that acts as a "sum-dispatcher" or an elimination form for sums to simulate a sum. As the subtyping relationship ends up being the wrong one anyway (the variances end up backwards), subclassing and tying methods to specific classes is just a further complication - prototype-based OO is a minor improvement here.

The core of the problem is that we're forced to work with something not unlike a Church encoding in a setting where neither the semantic nor the syntactic costs of the encoding can be dealt with adequately.
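
To make that concrete, here is a hedged OCaml sketch (mine): shape is the direct labelled sum, while shape_visitor plus shape_enc is the record-of-methods encoding that the visitor pattern amounts to - essentially a Church-style encoding of the same sum.

(* The direct representation: a labelled sum, eliminated by pattern matching. *)
type shape =
  | Circle of float
  | Rect of float * float

let area = function
  | Circle r -> 3.14159 *. r *. r
  | Rect (w, h) -> w *. h

(* The visitor-style encoding: a record of "methods", one per case.
   The sum is simulated by a value that dispatches on such a record. *)
type 'a shape_visitor = {
  circle : float -> 'a;
  rect : float -> float -> 'a;
}

type shape_enc = { accept : 'a. 'a shape_visitor -> 'a }

let circle r = { accept = fun v -> v.circle r }
let rect w h = { accept = fun v -> v.rect w h }

let area_enc s =
  s.accept { circle = (fun r -> 3.14159 *. r *. r);
             rect = (fun w h -> w *. h) }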

Many would agree with that,

Many would agree with that, although "certain cases" may be a bit weak. If you use an impure language that supports functional programming well, like ML or Scheme, you find quite naturally that it isn't necessary to do very much mutation in most programs.

I can certainly agree with that. Functional composition is the big plus of functional languages, but it is in no way an exclusive advantage of functional programming. Lack of mutation, on the other hand, can be a double-edged sword.

In the meantime, people using these and other mainstream imperative languages are heavily overusing and abusing mutation, because the languages make it difficult not to

I agree with that as well. In my experience, the real bad thing that complicates programs is mutation of variables not passed as parameters. Having free mutation of only local variables is an approach which requires better initial planning but pays off later, and it does not strangle your mind on how to do even the simplest tasks.

But now there's a new movement towards "functional eating" which involves using a knife and fork (think ML) or chopsticks (Haskell ;) instead of a chainsaw. Its proponents claim that this approach is far superior, but chainsaw fans are skeptical. After all, it can take longer to cut through a steak with a knife than with a chainsaw, so why would you waste the extra time? Sure, there's sometimes some collateral damage with chainsaws, and you might occasionally cut off a finger or two, but that's only if you're not careful with the chainsaw. But for some strange reason, the functional eating fans insist that there are advantages to eschewing chainsaws.

I prefer the middle road. Free mutation of everything is bad, as is not being able to mutate anything at all.

And you need the steak slicer...

...for slicing steaks. You need the butter knife for buttering bread. You need the water gun for cutting stone.

And how much more safety

And how much more safety would RT buy you, if the algorithms are incorrect? Refactoring bad code, even functional code, can result in bugs.

In most cases it shouldn't - it should only propagate them.

Sometimes the bug creates an invariant which the refactoring breaks, but this can't happen in a pure language because there are no side-effects whose order can be changed by refactoring. A refactoring that changes the denotational meaning of a piece of code isn't strictly a refactoring any more.

In most cases it shouldn't

In most cases it shouldn't - it should only propagate them.

If refactoring of functional code shouldn't result in bugs, then what bugs would be propagated?

Anyway, here is an example of functional code introducing a bug. The invariant is that n must not be negative:

int factorial(int n) {
    if (n == 0) return 1; else return n * factorial(n-1);
}

int[] data = new int[]{-1, 0, 1};

int i = 1;

int r = factorial(data[i]);

Then I refactor the code:

int i = 0;

Oops, the refactoring introduced an invariant violation.

Another example: we process an array of objects and assign them an id; we copy the objects into another array. The invariant is that the array should not contain null pointers:

class node {
    int id;
    char data;

    node(int id, char data) {
        this.id = id;
        this.data = data;
    }

    node(char data) {
        this(0, data);
    }
};

node assign_id(node obj, int id) {
    return new node(id, obj.data);
}

node make_node(int n) {
    if (n >= 0) return new node((char)n);
    return null;
}

node make_node_1() {
    return make_node(1);
}

node make_node_2() {
    return make_node(-1);
}

void main() {
    node[] data = new node[]{make_node_1(), make_node_2()};
    node[] new_data = new node[]{assign_id(data[0], 0)};
}

I refactor the code to:

void main() {
    node[] data = new node[]{make_node_2(), make_node_1()};
    node[] new_data = new node[]{assign_id(data[0], 0)};
}

...and then I have a bug. I simply switched the order of invocation of make_node_2() and make_node_1() in order to achieve it.

Changing a value to a

Changing a value to a completely different one is not a refactoring, it's simply a change.

You can use any definition

You can use any definition you like, but even the smallest change is refactoring, in my book... and also according to Wikipedia: http://en.wikipedia.org/wiki/Refactoring

Furthermore, the change I showed in the examples could easily be part of code refactoring.

The problem in programming is not mutation, but undisciplined programming, which can introduce bugs in any language. Having no mutation does not result in fewer bugs in any way, since bugs are logical errors, and logical errors can happen anywhere and in any form.

Terminology

The very first line of the Wikipedia article is: A code refactoring is any change to a computer program's code which improves its readability or simplifies its structure without changing its results. It is and has always been the intent of the people who coined the term refactoring and propagate it that a refactoring is semantics preserving. Fixing (or adding) a bug is not a refactoring.

The point is that trying to

The point is that trying to refactor functional code may result in a bug, just like trying to refactor non-functional code. Being pure or not does not play a role in refactoring vs less bugs.

Experience

Two observations:

First, this flies in the face of my experience—that is, refactoring referentially-transparent code does actually result in fewer application logic bugs than refactoring non-referentially-transparent code does.

Second, any refactoring that results in a bug isn't a refactoring, since, by definition, refactorings are semantics-preserving. :-)

Refuting nihilism

The problem in programming is not mutation, but undisciplined programming, which can introduce bugs in any language. Having no mutation does not result in fewer bugs in any way, since bugs are logical errors, and logical errors can happen anywhere and in any form.

By that argument, nothing but discipline can change the bug rate.

Tell you what, I'll bite: automating a discipline that yields correct results is a big win, as is simplifying one. There are fewer possible errors - fewer ways that discipline can be lax, if you like - in refactoring pure functional code. In particular, pure functional code can be refactored in blissful ignorance of the actual logic involved! It's a strictly syntactic process. Thus the probability of an error in refactoring should be significantly lower.

This goes double for pure code written in a good functional language where the syntax has evolved to aid exactly the kind of reasoning involved in refactoring. The same does not apply to attempts to write pure functional code in a C-like syntax.
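
A concrete illustration (mine) of that "strictly syntactic" claim: in a pure language the two definitions below are interchangeable by simple substitution, no matter what f computes; once f can mutate or perform I/O, the second version runs the effect twice and the "refactoring" silently changes behaviour.

(* Version 1: call f once and reuse the result. *)
let g1 f x =
  let y = f x in
  (y, y)

(* Version 2: the let-binding inlined. Equivalent to g1 if f is pure;
   not equivalent if f has side effects, because f now runs twice. *)
let g2 f x = (f x, f x)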

[quote]

In particular, pure functional code can be refactored in blissful ignorance of the actual logic involved!

I don't understand how you can say that. For example, changing a simple value from 1 to 0 (accidentally, as a result of refactoring) may destroy a pure algorithm, just like a non-pure one.

What part of "that is not a

What part of "that is not a refactoring" do you not understand?

You're changing logic, not refactoring it!

Changing a 0 to a 1 is changing the logic, not refactoring it. Referential transparency with strong static typing almost ensures that a refactoring does not change the logic. It's much harder to make that guarantee with ambient mutation available, because maintaining the original order of operations is not properly checked in such languages. Perhaps it could be checked with a type and effects system, but no such language is in widespread use at the moment.

Other have answered your

Others have answered your other questions, so I'll just finish off this one:

Even if side effects are limited to local scope?

I don't think side-effects limited to local scope break the observable RT of the function, except if those side-effects lead to bugs in the function's implementation of course. So then ask yourself: if RT is good for avoiding bugs in the large, why not apply the RT discipline in the small right down to the individual line of code?

Finally, side-effects in a local scope are local mutable state and exceptions, all of which are caught and handled locally, so such local side-effects are of very limited usefulness in any case. Anything involving I/O is not a local effect.
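
As a hedged example of the benign case: the OCaml function below uses a mutable reference internally, but the reference never escapes, so callers see a function whose result depends only on its argument. It is observably RT even though a line-by-line reading is imperative - and of course List.fold_left (+) 0 xs says the same thing with no mutation at all.

(* Internally imperative, externally pure: the ref never escapes this scope. *)
let sum_list xs =
  let acc = ref 0 in
  List.iter (fun x -> acc := !acc + x) xs;
  !acc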

If RT is good for avoiding

If RT is good for avoiding bugs in the large, why not apply the RT discipline in the small right down to the individual line of code?

Because it makes simple things more complex than they should be and it may hurt performance.

Finally, side-effects in a local scope are local mutable state and exceptions, all of which are caught and handled locally, so such local side-effects are of very limited usefulness in any case.

Not if you can assign values accessible from references.

Because it makes simple

Because it makes simple things more complex than they should be and it may hurt performance.

It only seems complex because you've trained yourself/been trained to think another way.

Further, if a slight increase in complexity in the small produces huge benefits in the large, then isn't it worth it?

Not if you can assign values accessible from references.

But then the effects are not local. You are essentially advocating capabilities, which I'm all for, but this constraint falls just short of referential transparency.

It only seems complex

It only seems complex because you've trained yourself/been trained to think another way.

Not really. It seems simple to you because you have been trained (possibly hard) in the subject.

Take a simple game like Pong, for example. The imperative version is fairly straightforward: the bat's X and Y position is updated at each frame, based on what key the player has pressed. In a purely functional language, the code is not straightforward. You have to use monads or arrows or things like that.

Another example is the model-view-controller approach. In imperative code, it's very easy and straightforward (and most people get it right in the first try). Doing something similar in pure FP is a good puzzle.

There are countless other examples which are very small, trivial in imperative code, but quite complex in FP code.

But then the effects are not local. You are essentially advocating capabilities, which I'm all for, but this constraint falls just short of referential transparency.

But the result of a function depends only on its inputs in this case as well.

In pseudo-OCaml:

The imperative version is fairly straightforward: the bat's X and Y position is updated at each frame, based on what key the player has pressed. In a purely functional language, the code is not straightforward. You have to use monads or arrows or things like that.

Not necessarily. In pseudo-OCaml:

type position = { x: float; y: float }
type frame = { left_bat: position; ball: position; right_bat: position }

let rec update ((c : frame channel), (s : frame)) =
  let { left_bat = lb; ball = b; right_bat = rb } = s in
  let { left_bat = delta_lb; ball = delta_b; right_bat = delta_rb } = read c in
  let u_lb = { x = lb.x +. delta_lb.x; y = lb.y +. delta_lb.y }
  and u_b = { x = b.x +. delta_b.x; y = b.y +. delta_b.y }
  and u_rb = { x = rb.x +. delta_rb.x; y = rb.y +. delta_rb.y } in
  update (c, { left_bat = u_lb; ball = u_b; right_bat = u_rb });;

There are probably better ways to do the above, but I don't see what's so hard about the above algorithm to understand. The above is purely functional except the channel read operation. A type and effects system could ensure this is evaluated correctly. Refactoring this with monads is also not too difficult: the channel is the state threaded through, and 'bind' maps from frame to frame by reading from the channel.

Another example is the model-view-controller approach. In imperative code, it's very easy and straightforward (and most people get it right in the first try). Doing something similar in pure FP is a good puzzle.

I don't see why. Seems to me that functional reactive programming is a natural fit for the same problem as MVC, and it's safer too.

There are countless other examples which are very small, trivial in imperative code, but quite complex in FP code.

And the converse is quite true as well. Try building a persistent, versioned file system in an imperative language.

But the result of a function depends only on its inputs in this case as well.

In order for a function to only depend on its inputs, it cannot be affected by state changes made by any other functions. This means the program must be in CPS form, and the state of the world must be passed to every function.

!RT

It's not RT because you're not really passing a value any more

But it satisfies the criterion that a function's results depend only on its inputs.

That criterion is not the whole story. A better way to think of it is whether you can replace a function call with its result, without changing the meaning of the program. If the function in question performs an externally observable mutation, replacing a call to it with its result will eliminate the mutation, and the behavior of the program may change (and usually will).

Another way to look at it is that whenever you perform a mutation, the effects of that mutation propagate to every place in the program where the mutated location is referenced, which breaks RT potentially for every function which references that location. So a function which mutates a location but otherwise behaves in an RT way can actually be more damaging to RT, globally, than a function which just returns a different result every time it's called.
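
A small sketch of that substitution test (my example, not Anton's): next_id returns an int, but replacing a call to it with the int it happened to return also deletes the increment, so the rest of the program can observe the difference.

let counter = ref 0

(* Not RT: the call both returns a value and mutates shared state. *)
let next_id () =
  incr counter;
  !counter

let () =
  let a = next_id () in   (* a = 1, counter is now 1 *)
  let b = next_id () in   (* b = 2, counter is now 2 *)
  (* Substituting the literal 2 for the second call would also delete the
     increment, so any later read of !counter observes the change in meaning. *)
  Printf.printf "%d %d %d\n" a b !counter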

That criterion is not the

That criterion is not the whole story. A better way to think of it is whether you can replace a function call with its result, without changing the meaning of the program. If the function in question performs an externally observable mutation, replacing a call to it with its result will eliminate the mutation, and the behavior of the program may change (and usually will).

No, because the mutated variable is part of the results.

In other words, if function F has input X, and results R and X', then you can replace F(X) with {X', R} and the meaning of the program will not change.

Another way to look at it is that whenever you perform a mutation, the effects of that mutation propagate to every place in the program where the mutated location is referenced, which breaks RT potentially for every function which references that location.

What do you mean 'every function which references that location'? When a function F is executed, no other function references the memory locations F references at the same time. Functions do not have references, they only have inputs.

If you are talking about closures, closures are not functions, they are objects, and therefore they are input parameters to functions, even if they are implicit.

It's not RT because you're

It's not RT because you're not really passing a value any more

But it satisfies the criterion that a function's results depend only on its inputs.

No, it doesn't. To quote the full sentence again:

It's not RT because you're not really passing a value any more, you're passing a reference which has to be dereferenced at least once.

That dereference requires access to storage, something which you haven't passed in - you've only passed a reference.

Others have addressed the advantages of RT. Anton's point about being able to replace a function with its result is an important one, I use that kind of reasoning - especially while debugging complicated code - all the time.

That dereference requires

That dereference requires access to storage, something which you haven't passed in - you've only passed a reference.

But even in this case, a function's result still depends only on its inputs.

And if the referenced value changes somewhere within the function, it can only be because the reference was passed as an argument to another function.

Anton's point about being able to replace a function with its result is an important one

I think the above is valid in the case of limited assignments as well: you can take a bunch of functions and replace them with the data they have manipulated at any level.

That dereference requires

That dereference requires access to storage, something which you haven't passed in - you've only passed a reference.

But even in this case, a function's result still depends only on its inputs.

No, it doesn't. Imagine for a moment that I place you in an empty universe (ignore life support and communication issues): if I give you a street address from this universe and ask you to get something from inside it, you can't. You need the store (this universe) as well as the reference (street address).

You have to pass in the store and you have to get back a bunch of actions on the store as well as what imperative coders are used to considering the "result". You can in fact do this kind of substitution in some imperative languages if you're careful about the book-keeping, but it's not simple equality any more and it's certainly not just a simple syntactic manipulation.
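
For what that book-keeping looks like in practice, here is a hedged OCaml sketch (mine) of explicit store passing: the store stands in for "the universe", every read takes it as an argument, and every "mutation" returns a new store alongside the ordinary result.

module IntMap = Map.Make (Int)

(* The store plays the role of the universe U; addresses are just ints. *)
type store = int IntMap.t

let read (s : store) (addr : int) : int option = IntMap.find_opt addr s

let write (s : store) (addr : int) (v : int) : store = IntMap.add addr v s

(* A "mutating" operation in this style returns the new store with the value. *)
let bump (s : store) (addr : int) : store * int =
  let v = match read s addr with Some v -> v + 1 | None -> 1 in
  (write s addr v, v)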

And if the referenced value changes somewhere within the function, it can only be because the reference was passed as an argument to another function.

Anton's point about being able to replace a function with its result is an important one

I think the above is valid in the case of limited assignments as well: you can take a bunch of functions and replace them with the data they have manipulated at any level.

Not just syntactically. If you want to limit your side-effects to some local scope that's fine, but you'll have to prove that you've done so before you can validly substitute away - in doing so you'll almost certainly reinvent one or more of the techniques used in pure functional languages to enable this kind of mutation. A personal favourite is the technique used with the ST monad, where values are tied to a scope by a phantom type.

No, it doesn't. Imagine for

No, it doesn't. Imagine for a moment that I place you in an empty universe (ignore life support and communication issues): if I give you a street address from this universe and ask you to get something from inside it, you can't. You need the store (this universe) as well as the reference (street address).

Assuming that the universe is U and the address is A, how isn't the following function only dependent on its arguments?

R foo(U *u, A a) {
    return u->data[a];
}

It does not have any external dependencies.

But the following, which does an assignment, does not have external dependencies either:

R bar(U *u, A a) {
    u->data[a] = 5;
    return foo(u, a);
}

The result is {U, R}, of course, since the universe is modified. But still, there are no external dependencies: both foo's and bar's results depend only on their inputs.

Not just syntactically. If you want to limit your side-effects to some local scope that's fine, but you'll have to prove that you've done so before you can validly substitute away

But if assignment is limited only locally (including parameters passed by reference), then by definition it is not possible for a function to return something that does not depend on its arguments.

You don't have the moon, you

You don't have the moon, you only have the finger.

Dereferencing your U* requires a store to look it up in. Unfortunately you don't have that store, only a pointer.

A U* won't do, you need a U.

Edit to add: You're introducing a concept of 'result' that's different from the one your language uses. As a result, you're not talking about whether the language is referentially transparent but whether your model of it is.

Dereferencing your U*

Dereferencing your U* requires a store to look it up in. Unfortunately you don't have that store, only a pointer.

That was my intention.

You're introducing a concept of 'result' that's different from the one your language uses. As a result, you're not talking about whether the language is referentially transparent but whether your model of it is.

Variables passed by reference are automatically 'out' parameters, i.e. results.

Maps ain't RT 'cos the map ain't the road

Dereferencing your U* requires a store to look it up in. Unfortunately you don't have that store, only a pointer.

That was my intention.

Therefore you can't dereference and examine the value, and thus your code can't in fact use it. The expression u->data[a] which appears in your examples is thus invalid. Unless, of course, you're working in a non-RT language where you already have access to a store.

When that happens your references are no longer transparent - they're rendered opaque by the need for a store lookup mechanism.

Variables passed by reference are automatically 'out' parameters, i.e. results.

Not in any formal sense - you merely happen to be using them that way. They are most certainly not results in the sense used in the definition of RT, nor in the sense used in the C standard.

Something fundamental that you're missing: RT is a syntactic property. If you have to do semantic manipulation, as you're doing to talk about {U,R} above, it's not RT. It may be a useful discipline if you're going to use mutation, but that still doesn't make it RT.

Therefore you can't

Therefore you can't dereference and examine the value, and thus your code can't in fact use it.

I don't see why you say that. Of course I can dereference and examine the value.

Unless, of course, you're working in a non-RT language where you already have access to a store.

I am speaking generally, using a Java-esque syntax for example purposes.

When that happens your references are no longer transparent - they're rendered opaque by the need for a store lookup mechanism.

But a store lookup is a referentially transparent operation: the input is the address, the output is the value. It can be any value, it doesn't have to be a 1:1 mapping between address and value, as long as the value's meaning is correct.

Not in any formal sense - you merely happen to be using them that way. They are most certainly not results in the sense used in the definition of RT, nor in the sense used in the C standard.

Since I am arguing on the formal definition of RT, I have to agree with you.

What I am saying is that this form of only mutating locals is RT.

nor in the sense used in the C standard

I disagree with that. Out parameters in C are implemented via pointers.

Uniqueness types

What I am saying is that this form of only mutating locals is RT.

You seem to be describing a manually-enforced version of the approach used by functional languages such as Clean, which has uniqueness typing to enforce RT, or the related mechanism of linear typing. (The latter page has a broken link to Baker's article on 'Use-Once' Variables and Linear Objects.) Alan Bawden's thesis, Implementing Distributed Systems Using Linear Naming, may also be of interest.

Argumentum ad contradictio

(I've re-added context that Achilleas removed in his previous post)

Dereferencing your U* requires a store to look it up in. Unfortunately you don't have that store, only a pointer.

That was my intention.

Therefore you can't dereference and examine the value, and thus your code can't in fact use it.

I don't see why you say that. Of course I can dereference and examine the value.

You just agreed that you intentionally don't have the store, and implicitly that you require a store to do the dereferencing against. So no, no you can't dereference.

You've also contradicted yourself in order to reassert your original claim, and chosen not to quote the part of the preceding post that made it clear that you're doing so. This is less than astute, and could be considered rather bad form.

I am speaking generally, using a Java-esque syntax for example purposes.

You're also making assumptions about the semantics. For example, your code is invalid in a pure language.

But a store lookup is a referentially transparent operation: the input is the address, the output is the value.

No, in this form it's the textbook example of an operation that is not referentially transparent. You can't replace it with anything but itself given just that input - to replace it with its output you need to know the state of the store as well.

Not in any formal sense - you merely happen to be using them that way. They are most certainly not results in the sense used in the definition of RT, nor in the sense used in the C standard.

I disagree with that. Out parameters in C are implemented via pointers.

Unless the C standard defines the term "result" to include the contents of "out parameters", that doesn't matter. We're discussing the C language as a system with formally named parts, not the concepts you layer on top of it.

That's not pure

When I referred to local assignment, I meant any variable, including parameters passed by reference.

Doesn't that count as referentially transparent? The result of the function depends only on its inputs.

That's an incomplete description. For a function to be pure you must be able to replace any call with the value returned by an earlier call with the same arguments, without changing the results of your code (value returned or observable side effects).

(I said value, something you can stick in a variable, store in a data structure, pass to a function. Your general idea of "result" which includes changes to variables is not a value in C, you can't make an array of them.)

If you have defined this function

int increment (int& arg) { return arg++; }

then changing this code

int x = 0;
int y = increment (x);
int z = increment (x);
return y + z;

into this code

int x = 0;
typeof(increment(x)) result = increment(x);
int y = result;
int z = result;
return y + z;

gives a different return value, so increment is not a pure function.
This is completely standard. If anybody disagrees (with "not pure" or "completely standard") other than Achilleas, please comment.

That's not to say there is anything wrong with talking about functions which can only mutate local variables, or that the idea is not useful (it seems to be the basis of capability security, and quite handy for separation logic), or even that pure functions are any better for software engineering (that's the debate).

But, if you want to claim that functions that can mutate arguments passed by reference are pure, then either you don't understand the definition of purity, or you want to redefine purity. Neither is productive.
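
For contrast, here is roughly what a pure counterpart to increment might look like (a hypothetical sketch in Haskell): the change to the argument becomes part of the returned value. With that shape, replacing a call by the result of an earlier call with the same argument is always valid. To reproduce the original imperative algorithm you have to thread the updated value explicitly, and then the two calls no longer receive equal arguments, so the substitution rule was never applicable to them in the first place:

-- (old value, updated value); the "out parameter" is now part of the result
increment :: Int -> (Int, Int)
increment x = (x, x + 1)

substituted :: Int
substituted =
  let x      = 0
      result = increment x     -- (0, 1)
      (y, _) = result
      (z, _) = result
  in  y + z                    -- 0, the same as calling increment x twice

threaded :: Int
threaded =
  let x0       = 0
      (y, x1)  = increment x0  -- y = 0, x1 = 1
      (z, _x2) = increment x1  -- a *different* argument this time: z = 1
  in  y + z                    -- 1, matching the imperative version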

Why aren't functions which modify their arguments (but not other variables) mathematically tractable?

That was a comparative judgment: pure code is easier to analyze mathematically. Even machine code is not completely intractable (Abstracting Allocation).

Pure code is easier to reason about because variables stay the same once they are defined, so your evaluation rules just have to get that value from the definition to each of the uses.

If values of variables can change, then your evaluation rules have to describe that, and you have to reason with the extra complexity.
If you want to focus on some little subexpression in a pure function you just have to look back to the definition of the variables involved. If your language has mutation you have to look at every earlier subroutine and assignment statement to see whether and how they might have changed a variable your expression uses.

If you can only modify local variables, including arguments passed by reference, and you avoid aliasing somehow, then that should make it a lot easier to tell whether and where some subroutine call might have changed some variable and help reasoning about things. But, the logic isn't going to be quite so simple.

That's an incomplete

That's an incomplete description. For a function to be pure you must be able to replace any call with the value returned by an earlier call with the same arguments, without changing the results of your code (value returned or observable side effects).

(I said value, something you can stick in a variable, store in a data structure, pass to a function. Your general idea of "result" which includes changes to variables is not a value in C, you can't make an array of them.)

But any output of a function, whether returned in a register, on the stack, or on the heap, is by definition something the function has produced. And the output is stored in a variable nevertheless.

then changing this code

int x = 0;
int y = increment (x);
int z = increment (x);
return y + z;

into this code

int x = 0;
typeof(increment(x)) result = increment(x);
int y = result;
int z = result;
return y + z;

gives a different return value, so increment is not a pure function.

But you changed the algorithm.

The first part is the following algorithm:

int x = 0;
++x;
int y = x;
++x;
int z = x;
return y + z;

The second part is the following algorithm:

int x = 0;
++x;
int y = x;
int z = x;
return y + z;

I don't understand how you can compare two different algorithms and then claim that something is not referentially transparent.

If you can only modify local variables, including arguments passed by reference, and you avoid aliasing somehow, then that should make it a lot easier to tell whether and where some subroutine call might have changed some variable and help reasoning about things. But, the logic isn't going to be quite so simple.

It's all about making the compiler writer's life easier, isn't it? While we programmers have to suffer the consequences... :-)

Demonstration by Example

Brandon offered a demonstration by example as to why the increment() function isn't pure: it alters the value of a variable outside of its own scope. increment() is perhaps not a very useful canary in this particular coal mine, since changing the value of a variable outside its own scope is all that it does, but nevertheless it illustrates precisely the point: with a pure function, Brandon's change wouldn't have "changed the algorithm." With an impure function, it does, and that is the (easiest? most useful?) definition of a pure vs. an impure function. It might also help clarify for you why pure functions lead to fewer bugs in refactorings.

By Definition

I was offering a more correct definition, and an example of how to use that definition to decide something is not referentially transparent.

I took a piece of code that had two calls to a function with the same arguments. I changed it to a piece of code that made one call with those arguments, saved the results in a local variable, and used that variable everywhere.

By definition, a function is referentially transparent only if making that change will never change the meaning of a piece of code. Making that change when the function was "Increment" changed the meaning, therefore Increment is not referentially transparent.

I don't see how to make it any clearer.

Are closures referentially transparent?

Are closures referentially transparent?

function a(x)
  return function(y)
    return x + y
  end
end

If a() is called twice with different values for x, are the results the same function, or do they violate the notion that "two equal expressions can substitute each other without affecting the meaning of a functional program" because they are determined by the prior calls to a()? Or are both the call to a() and the function it returns part of the same expression? Or is the inner function only referentially transparent with respect to a(), not necessarily the caller of the function that a() returns?

How are 1 + y and 2 + y

How are 1 + y and 2 + y equal? Of course they're not the same function. However, if you call a() twice with the same value of x, the results will be equal.

Is the identity function referentially transparent?

If id() below is called twice with different values for x, are the results the same value or does it violate... etc?

function id(x)
  return x
end

In a functional language, functions are values, and the answer to the question in both cases is the same.

Coming at it from another perspective, the 'function' or 'lambda' keywords are constructors for functional values. Each time it is evaluated it generates a new functional value. (This raises a slightly more challenging question, which is how calling the original a() twice with the same value for x affects referential transparency.)
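
A small sketch of that point (in Haskell, with illustrative names): the lambda is a constructor of function values, closing over whichever x it was given. Function equality isn't decidable in general, but two results of a() applied to equal arguments cannot be told apart by any use:

a :: Int -> (Int -> Int)
a x = \y -> x + y

f1, f2, g :: Int -> Int
f1 = a 1
f2 = a 1    -- same argument as f1
g  = a 2    -- a different functional value, just as 'a 2' is a different expression

-- f1 and f2 agree on every input; observationally they are the same value.
check :: Bool
check = map f1 [0 .. 10] == map f2 [0 .. 10]    -- True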

This [RT] may not sound very

This [RT] may not sound very significant, but permitting functions which are not RT is literally a nightmare for software developers; this fact is simply not obvious because most mainstream languages are not referentially transparent.

Yes, and it can also be incredibly useful for software developers. I wrote an identity function producing a side effect, looking roughly like this:

def measure(x):
    monitor.measured(x)
    return x

It was actually a bit more complicated and took an extra parameter, generated at compile time, to identify the place of measurement:

def measure(k, x):
    monitor.measured(k, x)
    return x

So the code generator could replace an expression x in the code by measure(k, x) without affecting the intended application logic (unless you are using deep reflection via stack-trace inspection, etc.).
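
For comparison, pure languages usually provide a deliberately impure escape hatch for exactly this kind of instrumentation. Haskell's Debug.Trace.trace, for instance, behaves as an identity function that logs a message, on the understanding that the logging is invisible to the program's semantics. A rough equivalent of the measure function above might look like this (a hypothetical sketch; the key k again identifies the measurement site):

import Debug.Trace (trace)

measure :: Show a => String -> a -> a
measure k x = trace (k ++ ": " ++ show x) x

example :: Int
example = measure "site-42" (2 + 3) * 10    -- evaluates to 50 and logs "site-42: 5"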

As a general rule: those language properties which might turn your code into a nightmare are also those which enable the most ingenious hacks.

As a general rule: those

As a general rule: those language properties which might turn your code into a nightmare are also those which enable the most ingenious hacks.

Which is why there is so much research into metaprogramming to achieve the same ends without compromising on safety.

Metaprogramming

Of course, metaprogramming is one of those language properties which might turn your code into a nightmare but which also enables the most ingenious hacks.

Nightmare code

can be written in any language; it's not a "capability" restricted to things like C. :)

Taking a diverging perspective

As a general rule: those language properties which might turn your code into a nightmare are also those which enable the most ingenious hacks.

I would have to agree, to the point of saying that:

[code C].quality = C.programmer.code_design_discipline * C.language.ability

I tend to think of programming languages in terms of effort magnification--a means to an end. If the means is restricted, the perfect programmer whines and codes a work-around. Therefore, by this argument, the programmer must be able to do what he wants, however he wants to do it. According to the equation above: garbage in, garbage out. The objective of a language is for its 'ability' factor to approach 1. The objective of a programmer is to learn how to make good, situationally adaptive code design decisions. After all, nobody decided to throw out the invention of paper currency because it would make people greedy. People are already of a greedy or selfless disposition; money would just bring that to the surface. Call it the magnifier principle.

On this principle we can see the balance between the frequent-case advantages of functional programming over imperative programming, and of logic programming over functional programming, while also seeing good cases for abstract imperative programming--one being synchronous, action-oriented procedures as a method of simulating processes or state transitions.

Those who use these "harmful" things properly will be more productive (keyword), while those who make themselves "victims of their own freedom" will hurt themselves. "With great power comes great responsibility." The whole case for restriction comes up when we talk about the "industry", and about the many average programmers who (as we all know) are trained to code in programming languages, but who are relatively untrained in designing code systems in those languages.

But many are irresponsible...

...and probably deserved to be restrained. Ergo, Java. :)

Of course, one of the interesting avenues of PLT research is finding ways to give your gurus the ability to develop useful and powerful abstractions, while keeping the inexperienced or lazy programming staff from bringing the entire system to a halt by careless programming.

True; good points;

I think the starting point is to make code design required curriculum material for every programmer seeking a degree. Good design can be learned, and if the average graduate learns it, that should increase the average quality of code. Plus, it would place a greater importance on a degree for career success, and further increase a degree's practical value. Programmers ought to be "academically manufactured" as good systems designers--certain standard levels of competency should be expected of a typical hotshot programmer.

...one of the interesting avenues of PLT research is finding ways to give your gurus the ability to develop useful and powerful abstractions, while keeping the inexperienced or lazy programming staff from bringing the entire system to a halt by careless programming.

Maybe the next-generation programming team will have a new, distinct role within it--"software systems designers", who are responsible for managing/developing the language used, as well as the system design of software.

eyes only -> fingers only?

maybe some day we could have a language where you have to pass some rigorous exams to qualify for different levels of usage, and go through some security steps to get access to those things. ;-)

Jesse James, language implementor

But... if you outlaw advanced features, only outlaws will have advanced features. Although now that I come to think of it, I can imagine certain legislative bodies outlawing the writing of computer languages. I guess the question is whether we have freedom of meta-speech...

If that happens...

it wouldn't be in a true effort to improve code quality. (Though that might be the pretext).

On the other hand, the scenario outlined in RMS's short story The Right to Read is somewhat plausible. There are at least three independent factions of somewhat powerful interests (law enforcement and spooks, multimedia barons, and system SW/HW vendors seeking means to enforce lock-in and prevent "tampering") who might benefit from further legal restrictions on what can be done with computers. Just imagine a PL (or an operating environment which resembles one) with an industrial-strength type system that simply won't let you play that movie or song file without a license... and imagine that low-level PLs (with subvertible type systems), assembly language, and tools such as debuggers and logic analyzers are all outlawed or restricted.

The phrase "bondage and discipline language" will surely have new meaning. :)

Apologies for the political rant...