Object Oriented Programming + Referential Transparency

After doing a search through some of the archives (although not as extensive as it could have been), I couldn't find any references to combining object-oriented programming with referential transparency. Is such a thing possible? Are there simple solutions, or is it a rather complex operation?

Thank you.

MJ

Certainly possible

It's certainly possible; IIRC, there are object systems for Haskell, for example. It can be done in any OO language, though; all it really takes is treating objects as values rather than containers. For example, rather than modifying a Point, you return a new one.
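For instance, a rough sketch in OCaml (the point type and move function are just illustrative names):

(* An immutable Point: "moving" it builds a new value instead of mutating. *)
type point = { x : float; y : float }

let move p dx dy = { x = p.x +. dx; y = p.y +. dy }

let p1 = { x = 0.0; y = 0.0 }
let p2 = move p1 1.0 2.0    (* p1 is untouched and still denotes (0., 0.) *)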

How complex it is...depends on what you need to do. If you want to build a demonstration app, you can pick a problem where it works smoothly. If you want to build a gigantic GUI where actions have visible side effects, that's harder.

True, but ...

A problem is that, according to OO design philosophy, a program is modeled as a collection of cooperating parts (objects) which communicate over (mostly known) channels by message passing. In principle this can also be done in a natural FP style, and many programs (say, those with tree-like class diagrams) are examples where referential transparency isn't a problem.

If the structure of the class diagram becomes more graph-like, there don't seem to be many good pure FP solutions for modelling complex OO programs, except in those FP languages which support concurrency explicitly (though then you often need to implement your own RPC mechanism on top of the concurrency layer).

It's one of the reasons I like languages like o'caml a lot - my favorite way of designing programs is to keep as much as possible referentially transparent, but at the top level expose objects (in the sense of reinstantiatable modules) which may have state and interact.

(of course, the next step is then to implement combinators which then define the class diagram for the program such that the main function corresponds to instantiating that class diagram - didn't make it that far, yet ;-)...)

Declarative objects

Section 6.4 of CTM shows how to do all four combinations of stateless/stateful and object/ADT, i.e., the usual stateful objects, declarative objects, the usual stateless ADTs, and stateful ADTs. Declarative objects are implemented on page 423.
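For a rough flavour of a declarative object (an OCaml sketch, not the Oz code from CTM): state and behaviour are bundled together, but every "mutation" returns a fresh object instead of updating anything.

(* A declarative (stateless) counter: inc returns a new object
   rather than updating anything in place. *)
type counter = { value : unit -> int; inc : unit -> counter }

let rec make n =
  { value = (fun () -> n);
    inc   = (fun () -> make (n + 1)) }

let c0 = make 0
let c1 = c0.inc ()    (* a new counter; c0 still reports 0 *)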

Possible != Desirable

I think you could achieve referential transparency in, for instance, Java, if you forbade updates to variables outside the current scope. However, that would render Java a severely hamstrung language. Note that it would imply that you can't update member variables in methods! While there is merit to reducing mutable state even in Java, the rest of the language isn't powerful enough to make up for that deficiency. On the other hand, O'Caml is OOP-like and FP-like but also allows mutable state, so I wonder how well it supports referential transparency.

Featherweight Java

It is a subset of Java used as an object calculus to prove typing properties. If I remember right, it is referentially transparent (it does not have updates).
http://homepages.inf.ed.ac.uk/wadler/topics/gj.html#featherweight

Oleg's purely functional OO system

This system is implemented in just 45 lines of code.

and it shows...

Really, defining an object in terms of such esoteric low-level operations as "make-dispatcher" or "new-mmap" is not that exciting and feels like a hack more than anything else.

I know it's amazing for its size, but I'm glad there are more practical and "normal" OO systems for Scheme...

"Practical"

rmalafaia: I know it's amazing for its size, but I'm glad there are more practical and "normal" OO systems for Scheme...

Unfortunately, this is one of those domains where "practical," more often than not, means "wrong." It's worth remembering that Oleg didn't write his purely-functional object system to be concise, but as a solution to this problem.

Of course, another perfectly valid solution to the problem is for the language not to conflate subclassing and subtyping. At least Sather and O'Caml have avoided the mistake.

Are you sure?

It's worth remembering that Oleg didn't write his purely-functional object system to be concise, but as a solution to this problem.

Oleg himself writes:

For example, http://pobox.com/~oleg/ftp/Scheme/index.html#pure-oo implements a purely functional OO system. It supports objects with an identity, state and behavior, inheritance and polymorphism. Everything in that system is immutable. And yet it is possible to define something like a BagV, and derive SetV from it by overriding a put message handler. Acting this way is bad and invites trouble as this breaks the LSP as shown earlier. Yet it is possible. This example shows that immutability per se does not turn object derivation into subtyping.

My Bad!

Right you are; I should have said that Oleg wrote it to explore the question as to whether immutability alone solves the subclassing vs. subtyping problem, and that the answer is "no, not by itself." It's his BRules implementation that actually solves the problem.

Correction, 17 lines

Really, defining an object in terms of such esoteric low-level operations as "make-dispatcher" or "new-mmap" is not that exciting and feels like a hack more than anything else.

That could easily be addressed by a simple macro layer. But perhaps I should have provided a bit more explanation. The original post asked:

Is such a thing possible? Are there simple solutions, or is it a rather complex operation?

Oleg's code shows that it's possible, and that there are simple solutions.

BTW, I miscounted the code, including the definition of the point-2D object in the count. The OO system itself is only 17 lines. Naturally, 17 lines of code isn't going to give you a full-blown competitor to Smalltalk. But even if you don't like it as it stands, you can extrapolate. Imagine what you could do in 100 lines of code! ;)

100 lines

By jove, with 100 lines you could write a program with at least five custom object systems!

Come to think of it, that sounds just like the one serious Scheme program that I wrote :-)

chuckle

chuckle

TOOT

A while ago I created a Tcl extension that attempted to be an OO-like system with pure value semantics (referential transparency) called TOOT: Transparent Object-Oriented Tcl. ("Transparent" means that objects are pure values which can be recreated from their string representation, if needed, as opposed to being opaque handles/references as in most Tcl OO systems). In actual fact, the system is more of a functional programming system than being particularly OO. It just looks a bit like OO in usage.

Lots of Tcl commands are already defined as "ensembles", which are basically commands which take a sub-command (message) as first argument, e.g. [string length $string] or [file size $file]. By noticing that many of these ensembles basically define "types", and take as their second argument the value of that type (i.e., something resembling a "self" parameter), it was a short step to package up an ensemble with an appropriate value and pass that around as an "object". All it takes is a function that swaps its second and third arguments.

# Swap self and method: invokes [$type $method $self ?args...?] in the caller
proc object: {type self method args} {
    uplevel 1 [linsert $args 0 $type $method $self]
}
# Apply a stored command prefix (an "object") to a method and its arguments
proc apply {func args} {
    uplevel 1 $func $args
}

Then you can do things like:

set obj [list object: string "Hello, World!"]
puts "Length of obj = [apply $obj length]"
puts "10th char     = [apply $obj index 10]"

The advantage of this is that many Tcl built-in commands instantly become classes in the scheme, and you get various advantages from using pure values: GC, transparent serialisation, etc. You can then add a framework for creating the ensembles with inheritance, etc (or just use an existing OO system for that part). You can also get rid of that "apply" via some clever tricks (it actually isn't needed in many places anyway). I've found that it works very well. However, whether this is still OO anymore is probably debatable (depends what definition of OO you're using). I've found that it covers most of what I personally found useful about OO, and you can always add back in mutable state by defining a new "ref" class (built on top of variables). It was fun creating it anyway :)

For those who don't know Tcl, here is a similar thing in Scheme:

; Define an "ensemble" for lists
(define (list-type message self)
  (cond
    ((eq? message 'length) (length self))
    ((eq? message 'first) (car self))
    ((eq? message 'rest) (cdr self))
    (#t (error "undefined message"))))
; Package an ensemble and a value into an "object" (a closure)
(define (object: type self)
  (lambda (message)
    (type message self)))

You can then use this in two ways:

(list-type 'length '(1 2 3)) ; yields 3
(define l (object: list-type '(1 2 3 4 5)))
(l 'length) ; yields 5
(l 'first)  ; yields 1

Someone who knows more Scheme than me would have to generalise that. Tcl does variable/symbol lookup in the opposite manner to Scheme: i.e., in Scheme a symbol is replaced with its value unless quoted (with '), while in Tcl symbols are left as names (strings) unless dereferenced (with $).

A conflict between OOP and FP

Inheritance is less useful in the functional paradigm. If a method updates state in place, an inherited class can often reuse it implicitly. But if it returns a new object, it should probably be overridden so that it returns an object of the subclass.
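For what it's worth, O'Caml's functional update syntax {< ... >} addresses exactly this case: it returns a copy of self, so an inherited method automatically yields an object of the subclass. A small sketch:

class point (x0 : int) = object
  val x = x0
  method get_x = x
  method move d = {< x = x + d >}   (* copies self; inherited in a subclass,
                                       it returns the subclass, not point *)
end

class colored_point x0 (c : string) = object
  inherit point x0
  method color = c                  (* (cp#move 1)#color still type-checks *)
end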

What if an object deep in the hierarchy must be updated?

For example, when processing an XML DOM, an attribute of an XML node many levels deep may be changed. How is that supposed to be handled with referential transparency? Does the whole tree need to be copied?

It's one thing that I have not understood how to handle in functional languages.

Does anybody have an answer on this?

Yes

You copy the whole tree, but because referential transparency allows you to share structure, you only reconstruct the parts of the tree which have changed. This means that the path from the root to the changed node is the only thing that's different, so a change costs you about log N in the size of the tree (for a balanced tree). Still more expensive than in-place update in terms of space cost, but I believe that SVN is more or less implemented this way, and it seems to work just fine for that.
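A rough OCaml sketch of that path copying (the tree type and the bool-list path are just for illustration):

type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

(* Rebuild only the nodes along the given path (false = left, true = right);
   every subtree off the path is shared with the original tree. *)
let rec update path v t =
  match path, t with
  | [],            Node (l, _, r) -> Node (l, v, r)
  | false :: rest, Node (l, x, r) -> Node (update rest v l, x, r)
  | true  :: rest, Node (l, x, r) -> Node (l, x, update rest v r)
  | _, Leaf -> failwith "no such node"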

The book to read...

...on this topic is Purely Functional Data Structures by Chris Okasaki. Available at a fine seller of technical books near you, or at amazon.com.

...and the PDF to search

Is the electronic copy of Purely Functional Data Structures vastly different from the paper one? I am all for supporting authors, but sometimes a searchable copy is much better.

There are significant differences

(Not intended as plug, just trying to answer the question...)

The book is an expansion of the electronic version. It adds Haskell source code, exercises, and more introductory material. Perhaps most significantly, I had more freedom in the book to describe data structures that I didn't invent.

There is one part of the electronic copy that didn't make it into the book, and that is the appendix describing the technical details of adding laziness to ML.

Persistence

Still more expensive than in-place update in terms of space cost

On the plus side, it gives you persistence for free, i.e. the new and the original version are both available simultaneously without interference. For some problems this is a huge advantage. With an imperative solution you usually start copying in those situations (with O(n) cost, unless you are doing something clever and complicated).

Data-Intensive Apps

I'm curious to know how FP apps perform on data-intensive problems like transaction processing, data mining, and especially simulation (like finite-element analysis). I'm wondering how much purity and referential transparency becomes a liability when memory footprint becomes a serious issue.

About the same

Have you seen the Great Computer Language Shootout?

In summary, FP languages are about the same speed as everything else.

If you can profile and refactor, nearly anything can be fast enough.

FP and memory issues

This essay on experiences with using Haskell with really large datasets focuses on laziness and how it can really hurt your memory consumption.

Laziness can also reduce memory consumption.

I wrote curryspondence, a mailing list archive searcher and discovered that a lazy pipeline ended up using a maximum amount of memory equal to the largest email message plus the runtime system.

Any strict language would require explicit laziness to do incremental parsing of mbox files.

Laziness also allows amortized algorithms which have a better algorithmic complexity than can be achieved in a strict algorithm. Not all algorithms can be amortized, but those that can will always perform better than their strict equivalents. See Okasaki's research on amortization for details.

Personally I'm still convinced that problems with laziness are a cultural issue. That is, everyone knows how to optimize for strictness, but how many people know how to optimize for laziness? Since this knowledge is so hard to find, laziness causes problems for most users.

I wrote curryspondence, a

I wrote curryspondence, a mailing list archive searcher and discovered that a lazy pipeline ended up using a maximum amount of memory equal to the largest email message plus the runtime system.

I don't know the details of your system, but I would expect a strict implementation of something like that to use a bounded amount of memory (a fixed size read buffer + runtime).

Laziness also allows amortized algorithms which have a better algorithmic complexity than can be achieved in a strict algorithm.

One can always introduce the necessary laziness explicitly in a strict language. I've translated some Haskell combinator libraries (making use of laziness) to SML and adding the laziness explicitly has been trivial. In many cases the laziness can be completely hidden from the user of the library.
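For instance, a sketch of what that explicit laziness can look like in a strict language, here OCaml with the built-in Lazy module (Haskell gets the suspensions for free; here you write the thunks yourself):

(* An explicitly lazy stream in a strict language. *)
type 'a stream = Nil | Cons of 'a * 'a stream Lazy.t

let rec from n = Cons (n, lazy (from (n + 1)))

let rec take k s =
  match k, s with
  | 0, _ | _, Nil -> []
  | k, Cons (x, rest) -> x :: take (k - 1) (Lazy.force rest)

let first_five = take 5 (from 0)    (* [0; 1; 2; 3; 4] *)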

Not all algorithms can be amortized, but those that can will always perform better than their strict equivalents.

In a(n impure) strict language one can use ephemeral data structures that often provide even better time complexity than purely functional amortized data structures. Amortized algorithms (data structures) are not a panacea. They may have higher constant factors and they may not be suitable for applications that require short response times (hard/soft real-time).

Personally I'm still convinced that problems with laziness are a cultural issue. That is, everyone knows how to optimize for strictness, but how many people know how to optimize for laziness? Since this knowledge is so hard to find, laziness causes problems for most users.

My intuition tells me that a major problem with laziness is that it isn't a strictly local phenomenon. The space efficiency of a lazy producer depends on the consumer. So, you can't just fix the space leaks of a producer in a modular fashion. Instead, you have to consider the program as a whole. If my intuition is correct about this, then I believe that the problems with laziness are fundamental and will never go away.

My intuition tells me that a

My intuition tells me that a major problem with laziness is that it isn't a strictly local phenomenon. The space efficiency of a lazy producer depends on the consumer. So, you can't just fix the space leaks of a producer in a modular fashion. Instead, you have to consider the program as a whole. If my intuition is correct about this, then I believe that the problems with laziness are fundamental and will never go away.

Most of the time the dependency on the consumer is about whether the consumer eats and throws away or keeps hold of all the data. Having the option to do either without altering the producer probably boosts rather than limits modularity, but currently we're not used to keeping track of the pattern through a call chain - we're not in the habit of documenting consumption or production behaviour beyond strictness.

Reasoning about space consumption modularly

I probably wasn't clear enough. I meant reasoning about space consumption modularly. In a strict language one can easily statically specify what the space consumption of a particular data structure or algorithm is and one can then statically reason about the overall space complexity of an entire program. Lazy evaluation, or so it would seem to me, complicates matters. You can't just say, for example, that the space consumption of a binary tree is O(N) in the number of items in the tree, because that only holds for the fully evaluated tree. In a lazy language the space consumption of a binary tree is something like O(N+I+D) where N is the number of strict items in the tree, I is the number of delayed insertions, and D is the number of delayed deletions. Some, but not necessarily all, of those delayed operations are evaluated when you perform a lookup on the tree. So, the space consumption of a lazy program depends very much on the dynamic sequence of operations performed rather than just the amount of actual live data.

I bet people have written articles on these matters. Pointers to particularly enlightening articles would be nice...

Sure, production behaviour

Sure, production behaviour includes how much is suspended at any given time as well as how much is computed. Lists are a nice special case, of course, but even then it gets hectic when you've a series of whole-list transformations and a fully-evaluated list going in at the start (which in the general case is likely to be live until at least the first time the entire list's forced).

Garbage collecting space leaks due to laziness?

I had a crazy idea. Has the idea of performing reductions on suspended computations during garbage collection been explored? I tried googling a bit, but couldn't find relevant papers. It is obvious that there would be complications, but I can't immediately say whether the idea would be completely unusable. Basically, in conjunction with garbage collection, reductions (not just on outermost redexes) would be performed (in such a way that non-strict behavior is retained). Naively thinking, the effect could be that the GC would also (possibly over multiple collections) collect space leaks due to laziness?

Fixing some space leaks with a garbage collector

Fixing some space leaks with a garbage collector
Hughes has pointed out that some programs which one expects to run in constant space cannot do so under a sequential evaluator; other researchers have reported similar problems. This paper proposes a modification to the garbage collector as a solution to this problem. The modification eliminates a class of space leaks due to where clauses.
...
The modification to the implementation is relatively simple, namely, the garbage collector is changed to perform a certain class of reductions when possible.

Eager Haskell during GC

Thanks, that was an interesting read! I'll have to check out other related papers tomorrow. Unfortunately, the article didn't have an in-depth investigation on what reductions would be possible or desirable during GC. Perhaps one could use similar techniques as used in Eager Haskell to perform more than just tuple deconstruction during GC.

On modularity and laziness

I was reading the introduction section of Interaction-Safe State for the Web when the following remark reminded me of laziness:

To describe [...] modularly (i.e., without transforming the entire program into a particular idiomatic style) [...]

The above remark, IMO, characterizes the modularity problem (space leaks) with laziness accurately. The problem is that laziness tends to cause space leaks unless you transform your program into a particular idiomatic style. Some time ago I read a bunch of papers on Yampa and I recall that one of the motivations for using arrows was space leaks. So, I'm not sure that I would describe the problems with laziness as a cultural issue. What seems to be necessary in order to really fix space leaks once and for all is to write your program in a particular idiomatic way (e.g. use arrows). This isn't just a simple optimization for laziness that you can add in later.

Yampa's a somewhat odd

Yampa's a somewhat odd example - the big reason for using arrows in it is to prevent code from hanging on to the whole of a signal's history (at least, without very very clearly doing so - you'd need to keep switching in a new signal function parameterised on the history).

This is potentially a general issue with lazy code - to extend my comment about consumption and production behaviour, we need to look at both what a function consumes as it produces and at what it keeps live as its results are themselves consumed. But I don't think it's insurmountable - it's just behaviour we need to understand as we start gluing pieces of code together.

How about Concurrency?

The thing that I've always wondered about purely functional data-types is how well they work with concurrency. Some processors now support 32 simultaneous threads of execution (i.e. Sun's Niagara processor with 8 cores, 4 threads per core) and they say that within ten years the average desktop processor will probably support 100 hardware threads. It seems that for purely functional data-types to work, you would need to be constantly locking the whole graph, and that this would effectively serialize your expensive parallel hardware. In a purely functional world you still have one mutable object, i.e. the one from which all other objects are reachable, so this becomes a parallelization bottleneck / critical point of lock contention. I think that temporal or journal based databases must have the same problem, so I wonder how they do it?

Purely functional data structures...

...are the ideal case for parallelism, because you don't need to lock anything, ever.

If you have a functional binary tree, say, and then you "update" it by adding an element, then you'll have two binary trees available -- the old one and the updated one. This is because the add method will not modify the old one; it will create new nodes along the path to the right place to insert the node:

type tree =
  | Leaf
  | Node of tree * int * tree

(* Insertion rebuilds only the nodes on the path to the new element;
   all other subtrees are shared with the original tree. *)
let rec add (tree, x) =
  match tree with
  | Leaf                            -> Node(Leaf,        x, Leaf)
  | Node(left, y, right) when x < y -> Node(add(left,x), y, right)
  | Node(left, y, right) when x > y -> Node(left,        y, add(right, x))
  | _                               -> tree

Since the two trees share identical subtrees, the incremental space usage for the added node will be O(log n) on average.

But: since you never update the old tree, that means you don't have to lock it while the addition is in progress. So you can do any number of parallel operations you like, and your code will continue to work.

Caveats with laziness

However, it appears that laziness (and hence lazy data structures) is a problem in this setting, because of the ubiquitous updating taking place under the hood. Small-grain synchronisation is required, which introduces quite significant overhead for some examples; see this paper from last year's Haskell Workshop (esp. Section 6).

There's a reason I used ML

There's a reason I used ML in this example. :)

A little more seriously, I wouldn't say laziness itself is the problem, because laziness can imply either call-by-name or call-by-need, and a call-by-name language doesn't need locks any more than a call-by-value language does. Haskell has problems because it's call-by-need, and this means that they have to do updates to fulfil their promise not to re-evaluate expressions.

Laziness vs call-by-need?

Now I got confused. I always took it that laziness is call-by-need, whereas call-by-name would also be non-strict, but not lazy. Is laziness used with a broader meaning somewhere?

Hmm...

Now I wonder if I got my terminology messed up too.

This isn't authoritative, but Wikipedia seems to think that "lazy evaluation" can be call-by-name or call-by-need, and that "non-strict" can include evaluation under lambdas (i.e., any reduction strategy at all that's not call-by-value). I'm surprised by that last definition, though it certainly makes sense.

vs. Foldoc

Foldoc's definition is more restrictive: "combining normal order evaluation with updating". I suspect that "lazy" may originally have had the more restricted meaning described by Foldoc, but as Wikipedia points out, "theoretical presentations of lazy evaluation often use call-by-name for simplicity", so the term ended up covering both.

Someone with ACM access could check out Henderson & Morris' 1976 paper, A lazy evaluator, since Morris apparently coined the term. (My ACM membership expires with inconvenient regularity... once every twelve months or so...)

... Still Requires Locking?

I don't understand why you say that I wouldn't need locking. If I have 32 processors updating the tree at once, then I would end up with 33 trees in the end: the original, plus 32 other trees each containing one of the updates (but sharing sub-trees which they have in common with the original tree). Thirty-three trees to choose from but they would all be wrong! The result I need is one tree with all thirty-two updates applied, not thirty-two trees each with one update applied.

Hi Kevin, I think you're

Hi Kevin, I think you're conceptualizing the problem in a bad way. If you want to do a bulk update and add 32 elements to a set, then giving each processor 1 element to add is fundamentally the wrong way to go about it. That's because now each processor knows one thing that the others don't, and that means each processor has to communicate with the others -- and communication is the source of synchronization.

A better way to do this is to start with an operation like fast union, and then design the algorithm so that you can operate on subtrees independently. This way, you can split the data so that each subtree can be updated independently and their results combined independently. For an example of this, see Guy Blelloch and Margaret Reid-Miller's paper "Fast Set Operations Using Treaps".

The parallelism in the union algorithm (on page 4) relies on the fact that you aren't doing any mutation -- this is the invariant that lets you do the recursive calls to the two subtrees in parallel.

Not Just a Functional Data-Stucture

I understand the advantages of functional data-structures in the face of concurrent updates: namely that you don't need to lock before you read for fear of someone re-balancing, re-hashing, or temporarily corrupting the data-structure while you're using it. I also understand what you're talking about w.r.t. splitting the data into sub-graphs and then providing a merged view. This is a technique that I use all of the time. In Java the standard ConcurrentHashMap is actually composed of many sub maps (16 in JDK 1.4 and 32 in Java 5, but configurable) so that when you perform an update it only has to lock a small portion of the tree.

This is fine, but this isn't my original question. I wasn't concerned about functional data-structures (in a non-purely-functional system) but rather any data-structure in a purely functional system (i.e. with referential transparency). I can see how you can make an individual data-structure work, in Java say, but if your whole language/environment is purely functional then you can't really stop at the top of your data-structure. You need to update the object/structure that references it and then the object/structure that references the referencer,... , all the way up to the top-level world/environment/state/whatever. At this point what do you do? If your top-level state is itself only a sub-tree then only one sub-tree contains your data-structure, in which case you've split too soon. Do you see what I mean? If the top-level state is a collection of sub-trees/sub-states then presumably my data-structure is only in one of them, in which case all updates to my structure are still to the same sub-tree. In this case I haven't achieved anything (with all concurrent updates still going to the same structure). If the top level state is not itself composed of sub-states and my structure is, then whenever I update a sub-graph of my structure I still need to propagate this change up to the top-level state, in which case I've just moved the concurrency bottleneck from my structure up to the top-level instead (which is even worse).

So in the end what you need is either a way to transparently support partial/merged states of partial/merged states (which I can't see how to do), or else use functional structures from within a non-functional environment (or perhaps allow for optimistic updates and then detect and abort conflicts).

Functional -> "updates" are not shared

Basically what the subject line says. When you have functional code, the program doesn't update any data, it allocates modified copies. All the old references keep referencing the original version. So no, there is no need to merge states, since there never is any update. If you need updates, you can implement them by passing the stateful data around, like in a monad, which makes the need for sequential evaluation explicit.

Now, in order to implement call-by-need, the runtime usually uses side-effects (to nodes in a graph or to a hash table, iiuc), but the beauty of being referentially transparent is that, in the worst case, we can simply not update the graph or the hash table and gracefully fall back to call-by-name.

State is on the stack

Re: any data structure in a functional system (short answer: there are no non-functional data structures in a functional system; it's impossible by definition)

Your data structures are ultimately all stored on the stack. The base of the stack, where your program started (i.e. "main"), may have held an empty tree, and it may have called a function to fill that tree; that function will then return the new filled tree (no side-effects, so the tree can't be modified in place).

Any intermediate trees that are in partial states of being filled during the execution of that function are now garbage. Please correct me if I'm wrong, but with no deletions the amount of garbage will likely be very small or zero due to sharing. This function to fill the tree will itself call other functions that return subtrees, etc., and the stack implicitly stores intermediate states and tree portions.

As an aside, this is what supports the local reasoning benefits of purely functional programming, vs the global reasoning that is often necessary with non-functional languages (particularly with shared mutable state and concurrency).

Multiple Processors


Re: any data structure in a functional system (short answer: there are no non-functional data structures in a functional system; it's impossible by definition)

Yes, you either have a functional system or functional data-types in a non-functional system, but not functional data-types in a functional system.

My question isn't on how to use functional data-types to support multi-processing as that is a well understood "solved" problem. My question is on how to do multi-processing in a purely functional referentially transparent system. As you say, all data is ultimately stored on *the* stack, but how do you run multiple processors against a single stack?

As I said in my original post: we already have CPUs which support 32 concurrent threads of execution and this number is expected to increase to ~100 for the average desktop machine within 10 years. What are purely-functional referentially transparent systems going to do to support these types of systems?

Unfortunately, I don't have

Unfortunately, I don't have time to write an extensive answer or dig up references, but there seems to be some confusion in this subthread that I want to try to clear up a bit.

First of all, data structures generally live on the heap, not the stack. Since I assume that you are talking about shared memory multi-processor systems, there is absolutely no problem of sharing data structures or results of parallel computations.

Furthermore, functional computations can (in principle) easily be parallelised automatically. Consider a function call with N arguments (in a strict language). Since there is no interference through state, evaluation of the N argument expressions can freely be performed in parallel on N processors. The call itself synchronises on termination of all these computations (this synchronisation can be delayed even further using the concept of futures).

That is, in a pure language, parallelisation can be made totally transparent. In practice, you might want to rely on some hints from the programmer to achieve better results, but that is semantically irrelevant. In other words, you don't even have to change your programming model.

The problem with "merging state" that you allude to is no different in a parallel setting from the situation in a single-threaded world. As always, you need to gather and thread all state through your call chains, either explicitly, or implicitly via monads.

Note that I talked about parallelisation, since that seems to be what you are interested in. Concurrency (involving non-determinism through explicit synchronisation and communication between threads) is a slightly different issue. Since non-determinism is impure you will have to encapsulate respective primitives into a monad to maintain purity, for example.

Gather and thread?

As always, you need to gather and thread all state through your call chains, either explicitly, or implicitly via monads.

This statement seems to get at something I have been trying to understand for quite a while about monads. But could you explain it a little more, mainly the expression "gather and thread"? I think I understand monads, but how do you gather and thread explicitly?

State passing

You make a data structure to represent the state and then explicitly pass it around in state passing style (functions take and return additionally a "state"). This will necessarily induce synchronization. Finally, just to be certain, state passing /= monads. Monads are a more general idea; state passing is just a single particular instance.
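A tiny sketch of that style in OCaml (the names are just illustrative): every function takes the current state and returns its result paired with the next state, which is what forces the sequencing.

type state = int    (* here the "state" is just a counter *)

let fresh (s : state) : int * state = (s, s + 1)

let two_labels s0 =
  let a, s1 = fresh s0 in
  let b, s2 = fresh s1 in
  ((a, b), s2)

(* two_labels 0 = ((0, 1), 2); the order of the two calls is fixed by the
   data dependency on the threaded state. *)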

They're going to scale

They're going to scale easily and without much trouble; in fact, I think that the spread of parallel machines is going to be a major opportunity for functional programming languages (though not necessarily ML or Haskell) to enter the mainstream.

Loosely, the reason is that purely functional, terminating languages have a property called "confluence" -- that is, a program computes the same value no matter which execution order you choose for it. As a consequence, you can evaluate any two subexpressions in parallel, and still compute the same result. This means that parallelizing a program becomes relatively simple -- you can take any two function calls in the body of a function and invoke them in parallel. For example, if you have a map function over a tree:

type 'a tree = Leaf 
             | Node of 'a tree * 'a * 'a tree

let rec map f tree = 
  match tree with
  | Leaf -> Leaf
  | Node(l, x, r) -> 
      let l' = map f l in
      let r' = map f r in
      let x' = f x in
      Node(l', x', r')

Now, observe that in the Node case we make two recursive calls to map, which have no data dependencies between them. So we can change this to something like:

(* spawn and sync are assumed primitives here, not library functions: *)
val spawn : (unit -> 'a) -> 'a future
val sync  : 'a future -> 'a

let rec map f tree = 
  match tree with
  | Leaf -> Leaf
  | Node(l, x, r) -> 
      let l' = spawn (fun () -> map f l) in
      let r' = spawn (fun () -> map f r) in
      let x' = f(x) in
      Node(sync l', x', sync r')

with no worries at all. In fact, as Andreas points out, you can do this automatically -- and the hardest part is deciding when to stop spawning parallel execution threads, not finding enough opportunities for parallelizing.

How does this hook up with the question you asked about the stack? Well, having "a stack" is a result of choosing a particular evaluation strategy -- namely, you evaluate one function call at a time, and as you go down the program you build a list of function calls you need to do next. When we change the evaluation strategy to parallel execution, then each thread we create will need to keep track of its pending calls and the threads it's waiting on. So each thread has its own stack. However, the observable behavior of the program -- the value it computes -- will remain unchanged, thanks to the confluence property.
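As a rough illustration, the spawn/sync primitives above could be realised with OCaml 5's Domain module, assuming the tree type from the earlier sketch. This is only a sketch: OCaml caps the number of domains, so a real version would stop spawning below some depth and fall back to the sequential map.

let rec par_map f tree =
  match tree with
  | Leaf -> Leaf
  | Node (l, x, r) ->
      (* Domain.spawn plays the role of spawn, Domain.join the role of sync *)
      let l' = Domain.spawn (fun () -> par_map f l) in
      let r' = Domain.spawn (fun () -> par_map f r) in
      let x' = f x in
      Node (Domain.join l', x', Domain.join r')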

Can vs. Do

(W/out doing any research myself I'm asking this question so I might be really lame for asking it...) Are there concrete FP systems that actually do these possible optimizations for parallelism, and do them really well?

So, pure FP, ref. transparency == program state COW?

This sounds like COW (copy-on-write) filesystems.

COW is good for filesystems for many reasons, but is it good for program state in memory? What's the difference between RAM and moving rust anyways? Latency, bandwidth, and cost per-unit.

So the question is, is applying COW to program state economical?

GC

I hear that some beardy weirdies in the 60s invented this thing called garbage collection. You might want to look it up.

What were you replying to?

What were you replying to? What does a GC have to do with whether pure functional programming is economical?

It has a lot to do with it.

The model of computation in pure functional programming kind of assumes that memory is never freed -- without garbage collection your program would simply consume more and more memory for all the intermediate data produced by a computation. Garbage collection is what allows one to free memory anyway once it can be proven to be unreachable. Thus making the purely functional model useful on computers with bounded memory.

(EDIT: "with bounded" not "with unbounded". Doh)

Quantity of memory is not the issue

I don't know why you assumed that I was worried about quantity of memory when I didn't mention that at all. I know about GCs, thank you. Functional programming languages are hardly the only ones to sport GCs.

[EDIT: Certainly FP would not be economical w/o GC (garbage creation + no collection -> sadness), so, yes, GC has a lot to do with making FP economical. Again, I'm not asking about GC.]

My question is specifically about whether copying rather than modifying is economical for program state. Think cache effects.

Unquestionably COW is economical for persistent storage. See ZFS (the Zettabyte FS, not Zipper FS [which I saw here recently]).

I'm not saying that COW is not economical for program state, btw, but asking whether it is. Some techniques that ZFS uses, storing data block checksums with pointers for example, may not be applicable to program state held in memory (e.g., because memory errors not corrected or caught by ECC memory are so rare).

I should just search citeseer, I suppose. Thanks for the non-answers :)

More non-answers

Economics aside, I think equating RT to COW is firstly bad terminology. COW is concerned about state as you say, and copying sounds like an implementation tactic to improve the economy for persistent storage (I am taking your word for it as I know nothing about that.)

The problem is that RT is not at all about state, so looking at it assuming that programmers intend to maintain state and accomplish this via COW is going to be confusing. The programmer here actually wants to stay away from state and utilize equational reasoning. The language implementation makes this possible by copying the necessary bits around and GC'ing.

How economical a particular implementation is, is a question about that implementation. Sorting an array in-place is going to be more cache friendly than sorting via an output array, of course. The kind of papers to read are the likes of the Spineless Tagless G-machine paper.

If you have GC, copying the

If you have GC, copying the state before writing to it will consume (within a constant factor) the same amount of memory as updating the state directly: the old version becomes garbage and the space is recycled at some point. Of course, that's assuming that the program isn't keeping any other reference to the old state; in that case, imperative languages would usually use CoW anyway. In some implementations, the case where there is no sharing has been optimised through the use of linear objects that may not be copied. For such objects, the updating function is guaranteed to hold the only reference to its input object. It can thus bypass the GC and directly update the object without losing RT.

Zippers

For example, when processing an XML DOM, an attribute of an XML node many levels deep may be changed. How is that supposed to be handled with referential transparency? Does the whole tree need to be copied?

As I understand it, part of the raison d'etre of zippers is to try to allow sharing in such cases.

Imagine you want to replace an entire subtree of a DOM and you have a "cursor" to the node at which this replacement is to occur. This cursor has a few operations: move up to the parent, move to the next node and move to the previous node (where "next" and "previous" are as defined by depth-first order). All these operations return new cursors corresponding to the movement. Conceptually the state of a cursor is represented by a (parent, subtree) tuple, so you can see how you can easily replace the subtree: have a function that maps (parent, subtree) to (parent, new_subtree).

This is just the 10,000 ft view of zippers as I grok them. I couldn't find Huet's original paper so the above is mostly based on what I inferred (perhaps falsely) from other sources that mention zippers. I understand there are some Haskell-specific tricks involving lazy evaluation that make zippers even more appealing.
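A minimal sketch of that (parent, subtree) idea in OCaml, for plain binary trees rather than a full DOM (the names are just illustrative):

type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

(* The context records the path of parents back to the root. *)
type 'a ctx =
  | Top
  | InLeft  of 'a ctx * 'a * 'a tree   (* we went left: parent ctx, label, right sibling *)
  | InRight of 'a tree * 'a * 'a ctx   (* we went right: left sibling, label, parent ctx *)

type 'a cursor = 'a ctx * 'a tree      (* (parent, focused subtree) *)

let down_left (c, t) =
  match t with
  | Node (l, x, r) -> (InLeft (c, x, r), l)
  | Leaf -> failwith "at a leaf"

let up (c, t) =
  match c with
  | InLeft (c', x, r)  -> (c', Node (t, x, r))
  | InRight (l, x, c') -> (c', Node (l, x, t))
  | Top -> failwith "at the root"

(* Replacing the focused subtree just pairs the context with a new tree;
   everything off the path back to the root stays shared. *)
let replace (c, _) t' = (c, t')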

Correct!

This is going to turn into the all-Oleg-Kiselyov-all-the-time site: see Oleg's work on SXML-diff: "We give an example of traversing two SXML documents ``side-by-side'' and making pure functional updates to arbitrary encountered nodes. The example illustrates the benefits of zipper -- an update cursor into an immutable data structure."

One of the nice things about the zipper is indeed that anything that can be shared is. Lazy evaluation isn't necessary, or perhaps it's more accurate to say that the laziness necessary is embedded in "shift" and "reset."

Again, see ZipperFS for an extremely striking example of the benefits of the Zipper, and don't be fooled by the fact that it's in Haskell: you can do this in any language that can support delimited continuations, which would appear to be just about any of them, with some effort. I'm currently working on implementing A Monadic Framework for Subcontinuations in O'Caml, then I'll move on to C++, then I'll work on Oleg's Zipper in O'Caml and C++.

OO-Like

The classic definition of OO says you have shared mutable objects. In a pure functional language, you could not have a variable pointing at a shared mutable object because that wouldn't be referentially transparent.

I like keyword syntax (for its engineering advantage), like in Smalltalk, Self, and Slate. I keep thinking along the lines of a language where syntax like that would query immutable objects (values if you will). They could support polymorphism and whatever kind of inheritance or delegation would seem to pay dividends.

Still, for some problems, it helps to be able to think in terms of something like mutable objects, or processes as I prefer to think of them in this context. For example, consider software update. How do we express getting the latest version of some piece of software that's evolving over the months in human hands?

[On edit: The external world is also an example of a process. I/O is communication to that process.]

Here's how I currently think to model processes (these ideas are not original; I'm picking and choosing which giants to stand on). At the center of the idea of a process is the stream of commands the process is going to interpret over its lifetime. Suppose this stream is expressed as a list. To start out simple, let's think of the case where we have only one reference to the process or mutable object. If we add logical variables to the language, we can model the reference as the right to tell the head and tail of the stream of commands [on edit: what I meant is the right to tell the head and tail of the list that represents the stream of commands]. Somebody who wants to send a message to the mutable object asserts the message as the head of the list, and passes the tail of the list on to whoever should next possess the reference. [On edit: passes the right to tell the tail of the list . . .].

We can then add multiple references to the processes by providing a way to merge indeterministically two or more streams to one. The process interprets the merged stream, and the holders of references get the telling end of the streams merged. The indeterministic merger of the streams can be accomplished with a stream of oracles from the environment.

I submit that a framework with all these features would stick to referential transparency but at the same time provide the engineering advantages of OO, except maybe brevity. Maybe similar brevity could be obtained with the right syntactic sugaring or macros. The simplest syntax would denote polymorphic dispatch to functions over immutable values, and a syntax minimally more complex but looking quite similar to the first syntax, could denote sending messages to processes. Maybe Oz/Mozart has something to contribute here by showing the way to make messages themselves first-class values, with a convenient syntax to compose (or "comprehend", as in list comprehension) them. The syntax "a b" could mean b is a message and it should be dispatched to a, interpreted as a value with OO-like ability to field the message (but not to mutate itself (except maybe in a narrowing fashion, using logical variables, or being a logical variable)). So here, b would usually be a message comprehension in the style of Oz/Mozart (where it's viewed as a record, not a message). A message would be a record. Then we add syntax or sugaring to make it convenient to send one of these messages along a process reference. The reference can be consumed with that, or it might be convenient also to have a syntax that would return a new reference for sending the next message. Again, the semantics of these references would be tellers (tell rights) to the logical variables whose eventually determined values would make up the stream of messages to the process.

An issue I haven't quite yet understood how to work out with this framework stems from the necessity to make sure, statically, that no two agents can tell different values to one logical variable. For a while I thought for a quick dirty prototype we could just say that to do so is a runtime error. But I would be deeply unsatisfied that programs with errors in them, not detected until runtime, wouldn't be referentially transparent. If we said OK, I'm going to keep the asker to this logical variable, and you two guys are going to have tellers to it, and either one of you may finish first and tell the teller, and you're going to tell it different values, then I end up with an asker that could have two different values depending on the luck of the draw, and that isn't RT!

[Elaboration on edit: the agent represented by "I" above could produce system output that reflected the choice of value for the variable being asked, and then later the error could occur. Marking the variable as now "erroneous" solves nothing at this late date because the horse (the first value chosen) has already escaped the barn (the system).]

Even though I like dynamic typing, I think that solving this issue requires enough static typing (or moding) to prove statically that all executions, even erroneous ones, will preserve RT. Maybe the answer comes from uniqueness typing (explored in the Clean programming language and in the Mercury programming language). Or maybe it comes from the Janus programming language, which was said to have this property that it statically prevented logical contradiction, but unfortunately I can't seem to find documentation of any of the versions of Janus except what is in the Lucy/Janus paper, which doesn't lay out the proof.

We can then add multiple

We can then add multiple references to the processes by providing a way to merge indeterministically two or more streams to one. The process interprets the merged stream, and the holders of references get the telling end of the streams merged. The indeterministic merger of the streams can be accomplished with a stream of oracles from the environment.

If I remember correctly Oz includes exactly this feature, and CTM describes how this breaks referential transparency, for example by allowing you to implement mutable references.

O'Haskell

Have a look at O'Haskell (and Timber). It combines pure functional programming and OO in a pleasant way. But there are effects, so it's only referentially transparent for a part of the language.

In case you're looking too

Link to the O'Haskell web page.

O'Haskell was superseded by Timber.

which still hasn't come out?

But I guess it looks like it isn't dead, at least?

CTM Does Not Mention Oracles

Oz/Mozart/CTM merges the message streams indeterministically without appealing to oracles as an explanation of the source of the choices.

The same thing is true of ToonTalk. Suppose, starting with a simple bird-nest pair, you use Maggie, the magic wand, to "copy" the bird. You now have two birds that fly to the same nest. Pass the birds to different agents running at different speeds, and the bird-bird-nest trio will implement an indeterministic merge of the streams of messages handed to the two birds.

Come to think of it, though, if I think of Maggie before the "copy" as a different wand than Maggie after the copy, I can conceive of Maggie as an infinite supply of oracles. So an RT account of this aspect of ToonTalk is perhaps not too far-fetched.

Under this conception, however, Maggie has to have invested the nest with half her infinite supply of oracles (keeping the other half for her next use). That's because the nest is going to receive up to infinity messages from each bird, so it may need up to infinity oracles to decide the order of messages received. I'm not sure what the implications of this are for persistence (as on disk) of the nest.

Nondeterminism is a strange beast

Adding threads to a functional language does not make it nondeterministic. Adding mutable state to a functional language does not make it nondeterministic either. But adding both does make it nondeterministic.

Pure Mostly

If O'Haskell and Timber are referentially transparent for only part of the language, then they are not pure functional. Or at least, if they are not purely RT, I think that makes them irrelevant to the discussion about combining OO and RT. It was combining OO and RT, not combining OO and RT-mostly.

[To non-U.S. readers: "pure mostly" is an expression of humorist Garrison Keillor, who makes fun of the culture of the U. S. State of Minnesota as influenced by immigrants from Norway. He says the product he advertises is made from whole wheat grown by "Norwegian bachelor farmers", so it's "pure mostly".]

Purely Functional Languages

What constitutes a purely functional language has been considered before, and page 2 seems to disagree about RT implying purely functional: ".. two of the informal definitions of purity: referential transparency and independence on order of evaluation, do not have universally agreed-upon definitions and are not considered any further".

FP vs OOP on TMR

For more info on the differences, see FP vs OOP by Alistair Bailey from The Monad.Reader issue three.

Equal Rights for Functional Objects

Henry Baker argues that object identity is best modeled as side-effect semantics in Equal Rights for Functional Objects. I think this means non-RT, but I am not clear on the definition.

What about reference based equality?

If objects have an 'identity' of their own, in other words, if the language supports reference-based equality, is referential transparency then even possible?

Imagine a simple function/method which returns a new object. The object is always a different one, so the return value of the function is different with every call, which violates RT. OTOH we could say that the function has to give back a new object, and if that's true for every call with the same parameters the method is RT. Also we have no dependence on execution order, because it doesn't matter in which order new objects are created.

But is this really RT? If not, this would mean that RT is only possible in languages which only allow values.

Comments?

To me it seems that there

To me it seems that there isn't any exact boundary for referential transparency, and that things like this lie on the border.

Definitely not RT

Suppose there is a function that creates object instances. In the presence of object identity and object identity equality this function won't be referentially transparent, so I don't think it's border-line. This is the usual argument that shows that the combination of CONS and EQ? in Scheme is enough to make it impure even if you take away the various setters.
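The same point can be made in OCaml with physical equality; allocation plus an identity test is enough, with no setters in sight (a rough sketch):

let f x = (x, x)                 (* returns a freshly allocated pair *)

let a = f 1 == f 1               (* typically false: two distinct allocations *)
let b = let p = f 1 in p == p    (* true: one allocation compared with itself *)

(* If f were referentially transparent, replacing p by f 1 could never change
   an observable result, yet a and b differ.  (An optimiser is even free to
   share the allocations, which only underlines that == does not denote a
   function of the values alone.) *)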

New Identities from the Environment

The function or method that returns a new object needs to be passed an oracle to consume. The oracle knows the identity of the new object. Alternatively, the function could be passed an infinite collection of oracles, one of which it would consume, and it would return a collection of the (infinity) remaining, unconsumed, oracles (along with the new object).

The ultimate source of the oracles would be the computing environment. The runtime system would pass a collection of them to the "main program".

Effects

Perhaps counter-intuitively, an effects system can treat "new" as pure but identity-based equality as impure. One way to look at that is that "new" is affected by the state of the store, but you can't observe that effect until you use "eq" and cousins.

It's very similar to how Haskell lets you raise an exception anywhere but only catch it in the IO monad. In fact, the two tie together, since "new" can raise exceptions due to heap exhaustion even in pure languages.

Expressing OOP functionally

I'm surprised that this thread has not mentioned Simple Type-Theoretic Foundations for Object-Oriented Programming by Benjamin C. Pierce and David N. Turner (JFP 4(2):207-247, 1994).
As in most type-theoretic accounts of object-oriented programming, we restrict our attention to purely functional objects, in which methods must return a new copy of the internal state instead of updating it in place. (Questions of typing are not affected by this simplification; the model can straightforwardly be extended to include imperative-style object-oriented programming (Bruce & van Gent, 1993), modulo one technical proviso; c.f. Section 10.) The state of an object is represented by a single value.

Abadi & Cardelli's object calculus

Sorry if this has already been mentioned (I haven't had time to read this entire thread, interesting though it is), but probably worth mentioning Abadi and Cardelli's object calculus, which is referentially transparent, although they do present imperative calculi as well in their book.