Haskell for C Programmers

Many people are accustomed to imperative languages, which include C, C++, Java, Python, and Pascal....For [beginning] computer science students,...Haskell is weird and obtuse....This tutorial assumes that the reader is familiar [only] with C/C++, Python, Java, or Pascal. I am writing for you because it seems that no other tutorial was written to help students overcome the difficulty of moving from C/C++, Java, and the like to Haskell.

I write this assuming that you have checked out...the Gentle Introduction to Haskell, but...still don't understand what's going on....

Haskell is not 'a little different,' and will not 'take a little time.' It is very different and you cannot simply pick it up, although I hope that this tutorial will help.

If you play around with Haskell, do not merely write toy programs. Simple problems will not take advantage of Haskell's power. Its power shines most clearly when you...attack difficult tasks....Haskell's tools...dramatically simplify your code....

I am going to put many pauses in this tutorial because learning Haskell hurt a lot, at least for me. I needed breaks, and my brain hurt while I was trying to understand....

Now I'm working on a video game in Haskell...and we've written a short tutorial...on HOpenGL....

Haskell has both more flexibility and more control than most languages. Nothing that I know of beats C's control, but Haskell has everything C does unless you need to control specific bytes in memory. So I call Haskell powerful, rather than just 'good.'

I wrote this tutorial because Haskell was very hard for me to learn, but now I love it...."Haskell is hard!" "You can't write code the way I know how!" "My brain hurts!" "There aren't any good references!" That's what I said when I was in college. There were good references, but they didn't cover the real problem: coders know C.

New explorers might enjoy Eclipse IDE support (version 3.1M7 or later only). Old hands might help improve it.

Haskell compiles to C (cf. JHC) and machine code.

I/O

I am beginning to think that introducing Haskell to C programmers MUST start by showing them a function that does I/O...

I agree!

I'm a Java/Python programmer interested in Haskell. My biggest handicap in my occasional efforts to learn is that writing actual programs that do I/O is always left to the end of the book.

The strategy I want to be able to follow is to learn a little, write some code to exercise that knowledge and play with it, and then learn more, but I can't do that till I know how to do I/O and get a program running, rather than just knowing how to write a function in isolation.
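
For what it's worth, here is roughly the smallest complete program of the kind being asked for - a sketch of my own, not taken from any particular tutorial:

main :: IO ()
main = do
  putStrLn "What is your name?"
  name <- getLine
  putStrLn ("Hello, " ++ name ++ "!")

Put that in a file, compile it with GHC, and you have a real running program to grow while you learn the rest of the language.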

More generally, tutorials don

More generally, tutorials don't show you how to make a whole programme, full stop, until near the end rather than at the beginning, typically. I suspect that showing a whole programme early would aid learning for a lot of programmers, simply because the REPL is an alien way of working for them, and in my experience Haskell REPLs tend not to be quite as friendly as Lisp REPLs because of the whole prelude/package business that you don't have to contend with in Lisp.

easy

"writing actual programs that do I/O is always left to the end of the book"

just read it backwards then! :)

This one does it, imho

Simon Peyton-Jones' Tackling the Awkward Squad does the best job of all the tutorials I've read. It also assumes very little Haskell knowledge.

Speaking of very little Haskell knowledge, I must state that I don't necessarily agree with the claim that "one need not understand monads fully in order to perform IO in Haskell". Fortunately, I don't think monads are that hard, and I suspect that many people who think they are struggling with monads are actually struggling with doing IO through monads.

I also think that Fibonacci numbers don't appeal to many C programmers. If you are coming from another programming paradigm, the first thing you look for in Haskell is "how can I do the stuff that I used to do in Python?". If you don't see that immediately, you are not likely to appreciate the infinite lists, lazy evaluation, etc.

I've been translating some of my Python scripts into Haskell. Nearly 70% of the Haskell code ended up within do-blocks. I think that a tutorial aimed at imperative programmers must have examples with a similar ratio.

Purity


Simon Peyton-Jones' Tackling the Awkward Squad does the best job of all the tutorials I've read. It also assumes very little Haskell knowledge.

Agreed, very good paper.


Speaking of very little Haskell knowledge, I must state that I don't necessarily agree with the claim that "one need not understand monads fully in order to perform IO in Haskell". Fortunately, I don't think monads are that hard, and I suspect that many people who think they are struggling with monads are actually struggling with doing IO through monads.

I think most programmers' reactions are more along the lines that (a) they understand it very fast and then (b) well, uhm, "yeah, so what?" or "how does this IO monad thingy help me?". And after a while, they just realize that being impure inside modules which are thought of as pure sometimes is just plain right. (Ok, I should be concrete: I am thinking of unique identifier generation, randomization, weak hash-maps, or storage modules)

I think a lot of people then start bringing good or silly arguments against monadic IO - I know I did; but that's not really what it's about, is it? It's just that they want to be able to be impure because they know it solves their problems.


I also think that Fibonacci numbers don't appeal to many C programmers. If you are coming from another programming paradigm, the first thing you look for in Haskell is "how can I do the stuff that I used to do in Python?". If you don't see that immediately, you are not likely to appreciate the infinite lists, lazy evaluation, etc.

Most programmers understand Fibonacci; that's just not-understanding/dumbing-down the "opposition" (hmpf, talking about not being willing to play with the rest of the world, duh). Even academics choose other implementation languages than Haskell. For the rest: Good point.


I've been translating some of my Python scripts into Haskell. Nearly 70% of the Haskell code ended up within do-blocks. I think that a tutorial aimed at imperative programmers must have examples with a similar ratio.

This is somewhat opposite to the claim made by your cited paper, which states that most Haskell programs have a little I/O on top and do the rest functionally/purely. Seems like you did some worthwhile experiments. Can you elaborate?

[btw: other thought experiment to run assumptions by "Why was Google implemented in Python, and not in Haskell?"]

Haskell programs aimed at rep

Haskell programs aimed at replacing scripts have a higher IO:pure ratio in terms of the stuff they're supposed to do - typically the data manipulation involved is fairly simple, often to the point it's not worth putting it in a separate function ("oh look, I'm just mapping this over a list then writing out the results").
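
That shape is easy to show concretely. A hypothetical filter-style script, where the only pure "logic" is the function being mapped:

import Data.Char (toUpper)

-- read stdin, apply a trivial pure function to every line, write stdout
main :: IO ()
main = do
  input <- getContents
  mapM_ putStrLn (map (map toUpper) (lines input))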

Yeah

Well, that's true and what I expect. And in fact, even if you do 99% IO in your program, that doesn't make the Haskell way lesser, or better, than the Python way.

Traduttore, traditore

I've been translating some of my Python scripts into Haskell. Nearly 70% of the Haskell code ended up within do-blocks.

This may not tell us anything about Haskell, but about how literal your translation is.

Of course, if you translate into Haskell preserving the WAY that Python does it, you will necessarily end up with big do blocks.

If you specify the problem that the Python solves, and then solve it from a Haskell point of view, you might find that a lot can be done outside those blocks.

Imperative programs (especially "scripting" programs) have a strong tendency to muddle the computation of a solution with the I/O in which it is encoded.

Philippa was correct

It really had nothing to do with "The Python Way" vs. "The Haskell Way". I just had to list directory contents, get modification times of files, etc. and none of what I did with the data (sort, etc.) required me to write much pure code.

Not everybody can write a Haskell compiler to learn Haskell. Often, life is simple and you don't write ray tracers. You just need a GUI up and running. Not all applications are CPU intensive, so why would all Haskell programs be mostly pure?

In fact, when just starting to learn (skimming a tutorial maybe -- will buy book later), most of the example programs people expect are of the impure kind (think "hello world"). And it's going to remain IO-heavy for many people for quite a while. Impure Haskell may be a subset of Haskell, but it deserves a much larger portion of a tutorial/book than it is getting today.

if you want Python

you know where to find it. :)

despite most languages being marketed as "general-purpose", they certainly have their target audience pretty clear...

Personally I've actively enjo

Personally I've actively enjoyed using Haskell for scriptish tasks - Parsec in particular's great for lightweight parsing jobs, so long as you value readability over trimming a few more characters off.
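
For instance, a lightweight parsing job in Parsec can be as small as this sketch (the key=value format and the names here are invented for illustration):

import Text.ParserCombinators.Parsec

-- parse a single "key=value" setting
setting :: Parser (String, String)
setting = do
  key <- many1 letter
  _   <- char '='
  val <- many (noneOf "\n")
  return (key, val)

main :: IO ()
main = print (parse setting "(input)" "colour=blue")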

The Way of Ways

It really had nothing to do with "The Python Way" vs. "The Haskell Way". I just had to list directory contents, get modification times of files, etc. and none of what I did with the data (sort, etc.) required me to write much pure code.

It may not have required it, but you could have done it anyway. ;-)

For example, you could make all of your sorting and other manipulations of the items pure, and only have do blocks for the input and output itself.
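
As a sketch of what that factoring might look like for the directory-listing task mentioned above (names invented; getModificationTime's return type varies by library version, so report is kept polymorphic):

import Data.List (sortBy)
import Data.Ord (comparing)
import System.Directory (getDirectoryContents, getModificationTime)

-- pure: ordering and formatting of the gathered data
report :: (Ord t, Show t) => [(FilePath, t)] -> [String]
report = map describe . sortBy (comparing snd)
  where describe (f, t) = f ++ "  " ++ show t

-- impure: gathering the data and printing the result
main :: IO ()
main = do
  files <- getDirectoryContents "."
  times <- mapM getModificationTime files
  mapM_ putStrLn (report (zip files times))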

For small problems this may not seem worthwhile, but the modularity it allows gains value as you scale up, or if you need to make significant modifications later on to what your program does.

If, however, you only have small problems, then you are already biasing to the Python Way by the problems you are choosing to solve, and the kind of thinking that tends to be used for that class of problems.

Haskell and Python may simply not occupy the same niche in the PL ecology.

The sorting almost certainly

The sorting almost certainly was pure - if it's on the RHS of return or passed in as a parameter, it's pure code. You might end up yanking it out elsewhere and naming it later, but that's no biggie.

And I indeed do that...

Yes, I tend to follow the same principles and isolate the pure sections, but sometimes the code is so trivial that I just leave it inside the do-block; if I move it outside, I need to pick a better name for it so that it doesn't clash with other little functions, etc.

I agree with Philippa (again!) that actually Haskell is not that inappropriate for scripting tasks. It's a bit more verbose, but as you said it also can scale better.

Yes, Haskell and Python don't exactly overlap when it comes to respective strengths. I just don't understand why we need to behave as if there is no overlap whatsoever. More students would be interested in how darcs is written in Haskell than what Pugs is doing. We'd also like to "reason about" GUI code.

Not so far apart as it might seem

I just don't understand why we need to behave as if there is no overlap whatsoever.

By all means, if you and Philippa want to write a tutorial on Haskell scripting, I'm sure many of us (and beyond) would be interested to read it.

There is a lot of room for specialized tutorials in the world; I just don't think it is necessarily a failing of most tutorials if they don't cover all specialized approaches.

We'd also like to "reason about" GUI code.

Have you seen SOE? Most of the book covers doing GUIs and other tasks that are normally considered "not very functional". Highly recommended.

The Microsoft angle

Microsoft marketing battalions will soon enough propagandize monads. From OnMonads,

Our biggest mistake: using the scary term 'Monad' rather than 'Warm Fuzzy Thing.'

Interestingly, Microsoft's new Longhorn command shell, which does use typed pipes, was originally called Monad. Considering they employ Haskell's lead developer, this is probably not a coincidence.

Not that I particularly like monads, but the C hackers who currently bemoan them will laud them when Microsoft PR gives the cue, I bet.

It is always nice to hear the other side of the monad story.

There has been much effort expended to remove the concept of mutable state from programming, to be replaced by immutable values as in mathematics. This is entirely misguided. The idea of a changing world, if not in the nature of reality, is certainly in the nature of humans. To replace this utterly common sense idea with elaborate abstractions is precisely the wrong thing to do to make programming more usable. Monads are a reductio ad absurdum.

Indeed, why is mutable state wrong?

Indeed, why is mutable state wrong? It is considered to be the source of all problems, but clearly it is not. There is nothing stopping a programmer from doing silly things in any language, even functional ones.

The problem is that programming requires a 100% perfect synthetic ability at all levels: the programmer must remember all preconditions and postconditions, at all levels, and code around those pre- and post-conditions. When these are forgotten (because usually there are too many of them), there are bugs.

Immutable state programming makes the promise of not needing to remember pre- and post-conditions, because "there is no way to change something that works". So simplicity and flexibility are traded away to guard against that failure to remember.

I am still not convinced that immutable state programming is of any usefulness, or any different from imperative programming, even after having read quite a lot of articles, posted on LtU and elsewhere. Does anybody have a convincing example where immutable state programming is actually useful? Please note that I am not talking about FP languages (because closures/tail recursion/continuations/good type systems/type deduction are useful constructs that any language should have), but specifically about immutable state programming.

Not Winnable

I guess I've arrived at the conclusion that the value of being stateless is one of those things that's either apparent or isn't, and is either reflected in your experience, or isn't. For myself, having spent decades dealing with problems arising from abuse of shared-state concurrency, cache coherency, inability to replicate/scale, etc. in software, due to the difficulties inherent in keeping state straight, it's obvious that avoiding state manipulation where possible is advantageous. The best explication of the issues and alternatives I'm familiar with remains CTM.

On an even more pragmatic level, though, consider the HTTP protocol and the notion of REST (Representational State Transfer) that underlies it. Some methods in HTTP are idempotent; others aren't. The protocol was carefully designed to be stateless itself but to support the representation of state, and it gets quite interesting once you look beyond GET and POST to the rest of the protocol. The protocol design makes possible caching proxy servers, cheap fault-tolerance via failover, and extreme scalability.

It's been said many times before, but I'll say it again: perhaps the single most significant benefit of stateless programming is the ability to reason about code locally. I find the value of this extremely difficult to overstate, given the mind-boggling amount of time I waste in an effort to understand the context in which a problem with a function occurs in the body of code I face at work.

I'd argue that languages have

I'd argue that languages have been moving toward avoiding state manipulation. In a typical Perl or Python script, I write in a much more functional style than I do in C. And that's good.

There are also different levels of stateless programming. Destructive updates to local variables in a short function are globally pure, even if not locally.
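
Haskell's ST monad captures exactly that observation: destructive updates on the inside, a pure function on the outside. A minimal sketch:

import Control.Monad.ST
import Data.STRef

-- an imperative loop over a mutable accumulator, yet sumST is pure:
-- runST guarantees the local state cannot leak out
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef acc (+ x)) xs
  readSTRef acc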

Too much state is a bad thing...

... but I remain in the camp that believes that sometimes a stateful solution is the most natural solution to a given problem. That said, there is lots of gratuitous mutable state about.

If you look at the history of mainstream languages, there has been a gradual shift from procedural to OO. Most procedural languages make no attempt at data encapsulation; many programs written in them contain a plethora of global variables and functions which silently manipulate them. Reasoning about such spaghetti--either by human or by computer--is difficult, and both academia and industry have long recognized this as an anti-pattern.

OO languages with encapsulation have had one benefit--and one drawback--in this regard. The benefit, of course, is encapsulation. Use of objects replaces many uses of global variables (including lexically scoped variables defined in outer functions but fiddled with in nested functions), and encapsulation allows far better reasoning about where a given variable might be used or modified. (Some OO advocates disagree with this...) The drawback OO has provided is that a sea-of-mutable-objects programming style is encouraged; this style doesn't scale well at all. It's hard to do persistence right, it's hard to do concurrency right, and it's hard to do distributed computing right in the sea-of-objects.

Personally, I like the policy undertaken by Cecil (and a few others)--variables are immutable by default (can only be bound once), but can be made mutable if needed. (When I write code in C++, I find myself declaring most local variables as const; I catch lots of bugs that way.)

replace

"many programs written in them contain a plethora of global variables and functions which silently manipulate them"

replace "global" with "instance" and "functions" with "methods" and you just decribed object-oriented programming. :)

well, at least, instance variables are supposed to be logically integrated into the classes they are defined in and are not supposed to be directly manipulated by external classes/modules...

Yup....

...but in properly-written OO code, analysis of instance variables is tractable. A class/object will have a finite set of methods, which are the only places the instance variables are modified--this makes analysis of the mess tractable.

I assume, of course, that instance variables are indeed "private"; public/protected instance variables are another matter altogether. (Likewise with variables that have direct setters/getters).

At any rate, encapsulated variables are certainly preferable to exposed globals...

Modules can encapsulate, too

"Global" isn't an across-the-board scary term. Before OOP, the common practice was to encapsulate as much as possible in a single module. C is icky in this respect; I'm thinking more along the lines of Modula-2 or Ada or Borland's Turbo Pascal extensions (units, introduces in version TP 4). And so you'd have exactly the same encapsulation as with OOP, except you'd call abstract_data_type_do_something(handle) instead of handle->do_something(), where "handle" is of type abstract_data_type.

I don't see how this is any less tractable than OOP.
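
For the record, Haskell's module system supports the same discipline: export an abstract type and its operations, hide the representation. An invented example:

-- the export list omits the Counter constructor, so clients can only
-- manipulate a Counter through the three exported functions
module Counter (Counter, new, increment, value) where

newtype Counter = Counter Int

new :: Counter
new = Counter 0

increment :: Counter -> Counter
increment (Counter n) = Counter (n + 1)

value :: Counter -> Int
value (Counter n) = n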

An added complexity of state is that...

...any tests that seek to be somewhat all-encompassing will have to hold a copy of the environment (world) before and after each particular test. With the use of state, you can have indirect effects on the state of the world that may have dire consequences for the functionality of the program. So what I have to test for is not just that the expected results were returned, but also that there were no unintended consequences of that action.

One of the top ten causes of Cthulhu invocations.

And this is exactly why so many J2EE programmers end up joining the forces of Cthulhu. At least there's a chance Cthulhu won't fry your brain right away.

Look at the simple architecture of the Cactus framework for example.

Cactus is a simple test framework for unit testing server-side java code (Servlets, EJBs, Tag Libs, Filters, ...).
The intent of Cactus is to lower the cost of writing tests for server-side code. It uses JUnit and extends it.
Cactus implements an in-container strategy, meaning that tests are executed inside the container.

"Simple" in this case is twenty-two thousand lines of Java source.

I did J2EE programming for a year or two.

I very much prefer to use partial application and closures to simulate objects. That gives me faith that my data doesn't change while I'm using it, and allows simpler (re)production of state for testing.
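
A sketch of that style (my illustration, not Shae's code): an immutable "stack object" as a record of functions closing over its own data.

data Stack a = Stack
  { push :: a -> Stack a
  , pop  :: Maybe (a, Stack a)
  }

mkStack :: [a] -> Stack a
mkStack xs = Stack
  { push = \x -> mkStack (x : xs)
  , pop  = case xs of
             []       -> Nothing
             (y : ys) -> Just (y, mkStack ys)
  }

Every "method" returns a new stack, so the data you hold never changes under you, and the state for a test is just a value you construct.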


--Shae Erisson - ScannedInAvian.com

State of the world

Which is true for any program that changes the world; it still doesn't make a pure language a solution to any problem.

If you really need that strong testing then my preferred solution would be some kind of sandbox model - i.e., state for each module which parts of the sandbox it uses. But why put that information into a type?

Or not defendable?

[sigh....]

A lot of programmers know the value of immutable state, but that doesn't make a pure language a panacea, or even a cure for anything.

Think of the following: I want fast comparisons on strings (generalize this to any data structure), of which I expect to have a lot. So I implement a simple StringStore module which stores lists of chars uniquely [in a store] by sharing postfixes, and which exposes one function make_unique :: String -> String; this gives me O(1) instead of O(n) comparison and reduces memory load. Anywhere I want a unique string I just add one function call to the code. Easy, simple, elegant, etc.


In a pure language I have to explicitly weave the store through all functions in modules which need these unique strings.
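
To make the weaving concrete, here is a minimal sketch of my own (with Store as a plain Map) of what the pure version forces on every caller:

import qualified Data.Map as Map

-- every function that wants unique strings must accept and return the store
type Store = Map.Map String String

makeUnique :: Store -> String -> (Store, String)
makeUnique store s =
  case Map.lookup s store of
    Just shared -> (store, shared)           -- reuse the stored copy
    Nothing     -> (Map.insert s s store, s) -- remember this one

Callers must now return a Store as well, and so must their callers - which is exactly the refactoring being objected to.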

One unique connection to a database? One small change to a module so that settings can be restored from a file? One function which needs strong semi-random numbers to do a probabilistic search? One small optimization in a program to do some caching on HTTP requests?

If I were to make a robust implementation in Haskell, I know I would start off with one refactoring of the libraries: parameterize each and every function with the IO monad. I am not going to risk refactoring 30000 lines afterwards because of one requirement.

Nobody makes designs in which one is able to mutate everything. But, &%#!, the world is there, you cannot hide from it.

The world is there, you can

The world is there, you cannot hide from it.

True, but you can at least model it accurately, for example taking into account the time dimension, which is the underlying factor which creates the illusion of mutable state.

I'm not arguing for pure programming languages, but there's a persistent meme that mutable state is some kind of "reality", when it's more like a degenerate view of the world which ignores one of its most important dimensions.

Good fences make good neighbours

But, &%#!, The world is there, you cannot hide from it.

I'm not sure exactly what you are arguing for, marco, but you don't need to hide from the world, just make sure you know where the boundaries with it are.

It is very useful to have a nicely modularized "inner space" of stateless functionality and relegate the messy statefulness to a top-level that is clearly delimited from it. This can be done explicitly (say by dependency injection) or implicitly (for example, by monads).

Knowing where the boundaries are and where you need to worry about state is a big boost for building robust apps.

The only fence against the world is a thorough knowledge of it


I'm not sure exactly what you are arguing for, marco, but you don't need to hide from the world, just make sure you know where the boundaries with it are.

My, my... I am not hiding; purists claim that the program must hide from the world; I find that, well, puristic.


It is very useful to have a nicely modularized "inner space" of stateless functionality and relegate the messy statefulness to a top-level that is clearly delimited from it.

Which is good software engineering practice but sometimes just plainly unwanted as the examples I gave showed. [ok,ok, I'll add it; as the examples showed clearly to me ;-)]

Please, discuss the examples.

Divide the world and conquer

purists claim that the program must hide from the world; I find that, well, puristic.

These sound like straw purists to me. ;-) Who is advocating "hiding from the world"?

Please, discuss the examples.

To be honest, I didn't understand your examples very well.

The one about parameterizing all libraries with IO seems counterproductive to me. Why not do your stateless thing with the library, get your result all nice and tied up, then pass it to IO as needed at the top level?

The string thing didn't seem relevant to statefulness to me, but maybe I just don't understand what you mean.

Strings vs Atoms, Monads vs. Uniqueness

If you store every string in a hash table, you can use a single pointer comparison to check for equality. This is the common technique used to implement atoms in Lisp and Erlang, and it is a substantial performance boost for interpreters and compilers.

The issue is that the mapping of each string to a unique pointer (necessarily!) depends on the state of the computer, but from the perspective of a single execution, it's observationally indistinguishable from a pure function.
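
A sketch of the atom idea in Haskell terms (invented names; comparing the wrapped Int stands in for the pointer comparison):

import Data.IORef
import qualified Data.Map as Map

-- an Atom is just a unique id; equality is a single machine comparison
newtype Atom = Atom Int deriving (Eq, Ord, Show)

type AtomTable = IORef (Map.Map String Atom)

newAtomTable :: IO AtomTable
newAtomTable = newIORef Map.empty

-- the same string always yields the same Atom within one table
intern :: AtomTable -> String -> IO Atom
intern table s = do
  m <- readIORef table
  case Map.lookup s m of
    Just a  -> return a
    Nothing -> do
      let a = Atom (Map.size m)
      writeIORef table (Map.insert s a m)
      return a

Note how intern lands in IO: which Atom a given string maps to depends on interning order, which is exactly the state-dependence described above.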

I understand several of marco's other examples. He sorta gets at one of my minor complaints about Haskell: the gulf between stateful computations and non-stateful computations makes shifting between the two difficult.

For example, I wrote a fairly substantial library in Haskell to manipulate string automata. Many of the algorithms depend on finite maps and sets, which were parameterized over arbitrary Edison implementations. These algorithms use these data structures in a linear fashion, meaning that they can be implemented using mutation. The problem is that mutation requires a monadic interface, which Edison does not provide. Moreover, I certainly don't want to use a monadic interface for persistent data structures!

And this is an eminently practical question. For the data structures I'm familiar with, mutation-based structures like arrays can operate in nearly O(1) time, and persistent structures like trees and random-access lists operate in O(log n) time. This makes a huge difference when dealing with large automata. For me, it was a compelling argument for some kind of linear logic in the type system, such as Clean's uniqueness typing.

Premature optimization is the root of all evil

If you store every string in a hash table, you can use a single pointer comparison to check for equality.

Thanks for the clarification, Leon. I get the issue now.

IMO, this is a completely different type of statefulness than I thought we were discussing. In some ways, I think this is a much more dangerous form of statefulness, since it is statefulness that the programmer is not supposed to have to worry about.

It is not at the level of the language specification or at the level of the application domain (the programmer's natural habitat) but rather at the level of the implementation, where things should be "under the hood" as far as the programmer is concerned.

There are two ways out of this problem: to ignore it or to add a new kind of equality testing to Haskell. Both relate to fundamental design decisions in Haskell that I don't believe you could change without sacrificing some core positive things about the language.

The "ignore it" solution relates to the fundamental trade off that comes from chosing functional programming: sometimes you take a performance hit in exchange for the benefits of FP. How severe this hit is varies quite a bit, and whether it matters or not is subjective and circumstantial.

Adding "identity testing" equality to Haskell, is another approach that would solve this problem, but having seen the three kinds of equality that Scheme provides, I'm not sure that adding these would be a win. ;-) Furthermore, value equality is conceptually fundamental to Haskell: you really just muddy the ontological waters by adding a new, lower-level type of equality.

Ultimately, this is a case where I would say that if you can't live without pointer equality as a language feature, don't use Haskell. Plenty of other languages fill that need, and the needs that Haskell fulfills aren't improved by its addition.

Stateful implementations of pure interfaces

Well... GHC does have unsafePtrCompare, and unsafePerformIO. Their exact semantics are a little sketchy, and subject to change between compiler versions. (Though behavior shouldn't change if the use of these functions is "safe".)

I used unsafePtrCompare to implement atoms for my Tiger compiler. I tried to use unsafePerformIO to create them, but that didn't work. I had to move the atom creation into the IO monad before it would work right. That was a couple years ago, and I never really understood why unsafePerformIO didn't work right.

Perhaps I finally have an answer to my question: atom creation is a "pure function" from the perspective of a single execution, but not all executions. I'm guessing that safe use of unsafePerformIO requires that the IO actions be "pure functions" from the point of view of all executions.

So, if something is observationally indistinguishable from a pure function, what's the problem with implementing it using stateful computations? As I understand it, NHC does a fair bit of work implementing linearly-used values as destructive updates. I don't know what GHC does in this regard, and I really should acquaint myself with the MLKit implementation.

I've wondered from time to time if there is some way that you could create a pure language that allows stateful implementations of pure module interfaces. GHC has ST monads, but this state does not survive across function calls.

Implementation is under the hood

So, if something is observationally indistinguishable from a pure function, what's the problem with implementing it using stateful computations?

In principle, nothing, so long as the statefulness is not observable at the program level.

(In effect, this already happens; computers normally have mutable state registers that implement the "stateless" program, for example.)

Good variables

Perhaps I finally have an answer to my question: atom creation is a "pure function" from the perspective of a single execution, but not all executions.

I vaguely remember an intuition like that; I suspect I got it while (or after) reading A Game Semantics of Local Names and Good Variables (or some paper related to it by citation).

It seems to me one way to do

It seems to me one way to do that might be to have a variant of runST that returned its heap as well, and a continueST that took a heap as a parameter? There's a problem with that, though: the s type parameter would escape the call to runST - in effect, typing this correctly would require a type-level gensym.

Platonic Solids


Moreover, I certainly don't want to use a monadic interface for persistent data structures!

This was one of my reasons to go from Haskell to OCaml. [Hehe, see the other thread. If C is the ball and Haskell is the cube, then I guess OCaml is the dodecahedron for now.]


And this is an eminently practical question.
For the data structures I'm familiar with, mutation-based structures like arrays can operate in nearly O(1) time, and persistent structures like trees and random-access lists operate in O(log n) time. This makes a huge difference when dealing with large automata. For me, it was a compelling argument for some kind of linear logic in the type system, such as Clean's uniqueness typing.

Unfortunately, that won't solve the interface problem. Instead of passing a monad around, you now need to pass a UNQ type around. (Incidentally, I actually switched from Clean to Haskell because of its interactive interface - both for teaching and for my own pet projects. Unfortunately, I don't teach it anymore. ;( )

code refactoring

The example you gave is very valid, but the solution can be one of several, each of which requires a different degree of refactoring the existing code base.

If state passing or a monad sounds like overkill in this case, maybe implicit parameters could be helpful?
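
For instance, GHC's ImplicitParams extension lets a constraint stand in for the threaded argument; a sketch with invented names:

{-# LANGUAGE ImplicitParams #-}

type Config = [(String, String)]

-- ?config is demanded through the type, not passed at every call site
greeting :: (?config :: Config) => String
greeting = case lookup "name" ?config of
             Just n  -> "hello, " ++ n
             Nothing -> "hello, world"

main :: IO ()
main = let ?config = [("name", "marco")] in putStrLn greeting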

Actually, if you have a good refactoring tool that can implement the single change across thousands of files safely and automatically, maybe state passing isn't a bad idea after all?

It takes more courage to face a problem than to run away from it, and that's a price one has to pay for being a purist or an impurist, either way!

The naked purity


The example you gave is very valid, but the solution can be one of several, each of which requires a different degree of refactoring the existing code base.
If state passing or a monad sounds like overkill in this case, maybe implicit parameters could be helpful?

I doubt it. I think purity comes with clear advantages and clear disadvantages. You can try to pass stateful computations around explicitly, stick them into one big Monad, or maybe do some sandbox tricks on the module level, but I don't think purity scales (the last two I actually, being a purist myself, see as ways to blur the distinction between pure and impure solutions - you might as well go all the way).

Actually, if you have a good refactoring tool that can implement the single change across thousands of files safely and automatically, maybe state passing isn't a bad idea after all?

I still can only imagine heavily state-loaded functions everywhere in a major program; I don't think refactoring - even automatic - will solve any problem. Of course, I have nothing to back up my claims except for the given examples, and yeah, I will gladly trim Wadler's beard for free when there are three applications of Haskell of size 10^6 LoC, well, so to speak ;)

It takes more courage to face a problem than to run away from it, and that's a price one has to pay for being a purist or an impurist, either way!

Good for the purists. In that respect, I really liked the (cited somewhere here) paper by Peyton-Jones; he was nicely critical of his own work and, of course, I understand the scientific practice of studying "whether soccer can be played better with a cube instead of a ball" ad nauseam. But to me, it just seems like that: playing soccer with a cube. [Edit: is it time to try a dodecahedron?]

My name is legion...

I don't think purity scales (the last two I actually, being a purist myself, see as ways to blur the distinction between pure and impure solutions

I'm starting to think you are arguing with yourself, there, marco. ;-)

As a "purist", you have defined "purity" in a very extreme way ("the program can't use state in any form") and then proceeded to argue why this is bad.

I think most of us see "purity" as a technical issue, more or less "how thick do you build the firewall between stateful and stateless".

I also think most of us recognize that some kinds of computations are inherently stateful, sometimes in theory, sometimes only in practice; we just want to build a fence between the two so we know which is which, and can choose carefully which items must go on which side of the fence.

Have you read CTM yet? I strongly recommend that you do so as soon as you can. It may clear this schizophrenic argument up for you, and stimulate some new and more fruitful arguments to pursue.

They sacrificed unto devils, not to God

[I will not feed the troll, I will not feed the troll, $!%&*, I WILL FEED THE TROLL]


As a "purist", you have defined "purity" in a very extreme way ("the program can't use state in any form") and then proceeded to argue why this is bad.

The definition stands and the reasoning doesn't look that bad to me.


Have you read CTM yet? I strongly recommend that you do so as soon as you can. It may clear this schizophrenic argument up for you, and stimulate some new and more fruitful arguments to pursue.

Ha! Well, marc, I already have a few bibles in my library; at some point I will add this one. Is this a my ... is bigger than yours discussion?

You say I presented a schizophrenic discussion - so, whatever, I don't think I did and, furthermore, if you can't throw a million opposing views at a problem and then start analyzing what you did, you cannot hope to understand it - that's my own take on things.

Duh, if I were to play your game, then I would say you tried to be part of a highly critical, fruitful discussion you only half understood and are trying to blame others for that.
Look: democracy.... but not the way I like it. Do you?

De Do Do Do, De Da Da Da

[I will not feed the troll, I will not feed the troll, $!%&*, I WILL FEED THE TROLL]

Hey, that was MY line. ;-) Troll is in the eye of the beholder, sometimes.

The definition stands and the reasoning doesn't look that bad to me.

One of the purposes of posting on LtU, IMO, is to test one's assumptions, and to clarify the assumptions and ideas of others. IMO, you were reasoning against a straw man that obstructed more fruitful views of the discussion.

You are of course free to disagree, but then again, you shouldn't be surprised if I continue to disagree with you as well.

Duh, if I were to play your game, then I would say you tried to be part of a highly critical, fruitful discussion you only half understood and are trying to blame others for that.

I admit some of your points did not entirely make sense to me. Is this a fault in my understanding, or in their presentation? A matter of perspective perhaps.

democracy.... but not the way I like it.

I have no idea what democracy has to do with this situation in particular, so I will respond with a favorite quote on the subject:

Democracy is a device that insures we shall be governed no better than we deserve.
- George Bernard Shaw

Please

Can we return to PL issues?

Thanks ;-)

Why is mutable state wrong?

Indeed, why is mutable state wrong?

Before you can fully grasp the answer to that question, you have to ask "when is mutable state right?" To answer that question, you need enough experience programming without mutable state to be able to tell the difference between bad mutable state, and good (or at least acceptable) mutable state. It's very difficult to recognize why mutable state is wrong if you don't have enough experience programming without it to compare.

In practice, it turns out that the places where mutable state is really useful are a tiny fraction of any given program. If you're using a language that discourages mutable state, but still supports it, then this whole issue is mostly moot to the language user - you use mutable state sometimes, when appropriate, and your programs will be the better for it.

Does anybody have a convincing example where immutable state programming is actually useful?

Immutable state is widely relied on, even in imperative languages.

Transaction servers discourage mutation of shared state outside the current transaction. As Paul pointed out, the web's REST model similarly discourages relying on shared state outside of the current request. In both of these cases, mutable state inside the current request isn't particularly necessary, since the usual pattern is for these requests to behave like real functions, converting some input into some output. Introducing mutation here adds little value, but can introduce various classes of problem. Most transactional applications can easily be written to be purely functional, and this makes them easier to reason about, debug, and maintain. That's more than useful, it's critical to developing reliable software.

For another example, accounting applications avoid mutating historical information, instead adding new records to represent changes, even if those changes apply to historical knowledge. The data model in all accounting applications is almost purely functional, with the exception being things like caching of current balances, which is usually considered one of the acceptable uses of mutation.
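
The shape of that data model is easy to sketch (invented types, not from any real accounting system):

-- an append-only ledger: entries are never modified, only added,
-- and the balance is a pure function of the history
data Entry = Entry { account :: String, amount :: Integer }

type Ledger = [Entry]

record :: Entry -> Ledger -> Ledger
record e ledger = e : ledger

balance :: String -> Ledger -> Integer
balance acct = sum . map amount . filter ((== acct) . account)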

These are all "macro" examples in a sense, applying to an application's entire context including database, I/O model, etc. However, the same kinds of reasons that it's good to avoid mutation in all of these cases, applies to programs in general.

Different levels of mutable state badness

Since I seem to be in a rather irreverent mood, here is my completely unscientific, thoroughly informal, and wholly anecdotal list of Dante's circles of mutable state. Not all the circles are populated, but Hell wasn't built in a day, you know...

:)

From least nasty to most nasty

The vestibule: Non-mutable state. Just so we have a frame of reference. (Some may argue that this is Heaven rather than the outermost circle of Hell, but in Heaven all the problems are already solved. :) )

1st circle: Mutable state which is not externally observable (given the programming model). Such things as variables cached in registers and other optimizations performed by the machine.

2nd circle: "Monotonic" mutable state. Discussed a bit in CTM, these correspond to variables which can be bound separately from their definition (but once bound cannot be re-bound), lazy evaluation, etc. In general, these preserve referential transparency and are well-behaved.

3rd circle: Linear mutable state. Mutable variables which are accessed from only one place, and generally aren't subject to aliasing. Examples include local variables in functions (ignoring the issues of lexical closures which might alias such variables, and call-by-reference), "private" variables within objects, etc.

4th circle: Managed mutable state. State in a database or some other system which provides transaction semantics and other means of arbitrating access to the shared variables.

5th circle (getting hotter): Aliased mutable state. Mutable variables/objects which are "used" in numerous different contexts (and therefore aliased)--and thus frequently used as a backdoor communications channel (often unintentionally). Often described as a "feature" of certain OO languages (much the same way pointer arithmetic is trumpeted as a feature of C).

6th circle (even hotter): Global (unmanaged) mutable state. As bad as the previous, but as the state is globally exported, you don't know WHO might be messing with it, or where.

7th and 8th circles: Currently unoccupied, though I await suggestions. :) Feel free to interpose suggestions above too, and we can demote intervening circles to fill in the gap.

9th circle: Concurrent mutable state. The obnoxious practice of mutating shared state from multiple threads of control, leading to a predictable cycle of race conditions, deadlocks, and other assorted misbehavior from which there is no return. And if a correct solution (for synchronization) is found for a given program, chances are any substantial change to the program will make it incorrect again. But you won't find it; instead, your customer will. Despite that, reams of code (and TONS of middleware) have been written to try and make this tractable. And don't get me started on a certain programming language which starts with "J" that saw fit to make EVERY object have its very own monitor....

In general, the more a program(mer) can do to stay in the outer layers, the better off he/she/it will be. I personally have little problem with the outer four circles, and can tolerate the fifth... but the innermost ones are often a code smell in my experience.

main = putStrLn "Hello World"

There are various plateaux in learning Haskell, I think.

Right now I'm having trouble with type classes - specifically, trying to make a version of the State monad that's also an instance of MonadPlus and MonadState.

You would have thought that something like instance MonadPlus State (Maybe s) where... might work, and indeed it might, but I can't get it to.

If I start with a new type,

newtype StatePlus s a = StatePlus { runStatePlus :: s -> Maybe (a, s) }

then I can make it an instance of Monad and MonadPlus, but not MonadState. It's frustrating, because I want to use standard types and type classes as far as possible, but can't seem to extend them to do what I want, and can't quite figure out why not...

Maybe...

you want Monad transformers?
import Control.Monad.State  -- StateT, put, mzero; from the mtl package (formerly the MonadState / MonadTrans modules)

type StatePlusError s a = StateT s Maybe a

bar :: Bool -> StatePlusError Int ()
bar f = do
  if f then mzero else return ()
  put 42

runStateT (bar False) 0 yields Just ((),42). With bar True, you get Nothing, i.e. an exception was thrown, so to speak.

With MonadState and MonadPlus you can also combine non-deterministic with stateful computations:

type StatePlusNondet s a = StateT s [] a

option :: Int -> StatePlusNondet Int Int
option x | odd x     = put x >> return (x + 1)
         | otherwise = mzero

foo :: StatePlusNondet Int Int
foo = option 1 `mplus` option 2 `mplus` option 3
runStateT foo 0 yields [(2,1),(4,3)], i.e. options 1 and 3 succeed, with internal state 1 and result 2, or internal state 3 and result 4, respectively. Option 2 does not succeed at all.

I remember the same post made

I remember the same post made on Slashdot...I wonder if you were the one that posted it.

No

I've never posted to Slashdot and such questions are better handled by private email, having no relevance to the topic.

sorry, just asking.

sorry, just asking.

I am the poster and author

I posted the post you are thinking of on Slashdot. I posted a link to this tutorial at the end of a long post about programming languages in general. I am also the author of this tutorial, and I am happy to hear that you remembered it. If you would like to give any feedback, please look me up here:

http://www.haskell.org/~pairwise/

Thank you for your feedback.

very nice!

very nice tutorial. less academic, more pragmatic... :)

Nice job

... but what I am really waiting for - for some time now - is the Haskell equivalent of Dive Into Python.

*Bookmarks that idea*

*Bookmarks that idea*

As I was saying

Back to the main subject...

We have here an example of something LtU is forever seeking: C programmer epiphany. In spite of monads and other brain-hurts, this guy was so impressed that he wrote an article to help others find enlightenment. Hooray! And thank you.

Hi there!

Hi, I'm the author of the tutorial. Thank you so much for letting people know about it. If you'd like to give any feedback, feel free to go here and look me up:

http://www.haskell.org/~pairwise/

I'd like to hear any suggestions for other tutorial ideas as well.

New Location

This is the author again, commenting in this old thread to give the new location of the tutorial:
http://haskell.org/haskellwiki/Haskell_Tutorial_for_C_Programmers

Enjoy!