Pointer-Free Parallel Programming

As part of the ongoing design of the ParaSail programming language, we settled on a "pointer free" approach. Given that pointers (or references in Java parlance) are rather fundamental to almost all modern languages which allow the user definition of structures like trees, lists, sets, maps, graphs, etc., this seems like a risky decision. In fact, we have been very pleased with the result, and in some ways it requires only a slight shift from the familiar programming model, while in other ways it makes for a dramatic simplification in storage management, assignment semantics, analyzability, etc.

Coupled with a few other ParaSail rules, the elimination of pointers makes it as easy to parallelize ParaSail as it is to parallelize a pure side-effect-free functional language, while remaining very much in the familiar territory of the Java/C# class-and-interface-based object-oriented programming world. Overall, ParaSail is a pleasure to program in, and this is in no small part due to the elimination of pointers from the surface syntax and semantics of the language. Of course, pointers are likely to still be used behind the scenes by any ParaSail implementation, but that doesn't affect the simplicity provided at the programmer level of having banished them.

We have written a longer "blog" entry on this topic here:

http://parasail-programming-language.blogspot.com/2012/08/a-pointer-free-path-to-object-oriented.html

Needs examples

It's an interesting idea which needs examples to justify it. Write up some tree algorithms, a scene graph, or a directed acyclic graph as for a PERT chart and see how that looks. How are the relationships addressed? Do you end up with arrays of indices to other arrays, FORTRAN-style? Or arrays which have keys to other arrays, like SQL table joins?

By John Nagle at Wed, 2012-08-01 17:59 | login or register to post comments

trees

Personally, I would not place much emphasis on trees in the context of a highly parallel language.

First, note that you can do just fine using indices for references. You do not get automated garbage collection of your nodes this way, but you do get automated garbage collection of the tree as a whole.

Second, note that a lot of the benefits of using trees have to do with manipulating only one node at a time. This is a good way to waste time on a highly parallel machine -- in many practical cases you could probably rebuild the entire data structure represented by the tree in the time it takes to manipulate a few nodes.

That said, trees can be useful, once the data gets big enough. But in some contexts the best tree will be shallow with fat nodes (node size roughly comparable to some physical structure in your underlying parallel machine).

tl;dr: trees = Streetlight Effect

By rdm at Mon, 2012-08-06 13:49 | login or register to post comments

This looks pretty good. You

This looks pretty good. You use mutable option types in place of pointers, which is pretty much what I expected. But the use of linear-referenced structures and regions to avoid GC is especially nice.

I agree with modeling cyclic structures more explicitly by modeling identifiers, e.g. sort of like Oleg Kiselyov used an IntMap to model pure double-linked lists.

By dmbarbour at Wed, 2012-08-01 18:34 | login or register to post comments

An example of a pointer-free balanced (AA) tree

"Needs examples."

Good point, an example is worth well more than a 1000 words. Here is a blog entry with an example of a pointer-free balanced (AA) tree implemented in ParaSail:

http://parasail-programming-language.blogspot.com/2012/08/a-pointer-free-balanced-aatree-in.html

There was already an earlier blog entry illustrating a pointer-free directed graph:

http://parasail-programming-language.blogspot.com/2012/07/implementation-of-directed-graph-in.html

By sttaft at Wed, 2012-08-01 18:37 | login or register to post comments

Isn't this just pointers?

What you have eliminated is shallow copy, not pointers, right? Every copy operation is a deep copy?

By P Doyle at Wed, 2012-08-01 18:53 | login or register to post comments

Not just pointers.

"Isn't this just pointers?"

Not really. There is no pointer assignment in ParaSail. So there is no sharing of data between conceptual objects. That is pretty fundamental.

By sttaft at Wed, 2012-08-01 19:00 | login or register to post comments

Typically I'd call that "no

Typically I'd call that "no aliasing". The "no pointers" bit refers to the lack pointers as values, no pointer semantics - i.e. no ability to compare pointers or increment them. You have that too, of course.

Can you use the `reply` link near our names when replying to a comment?

By dmbarbour at Wed, 2012-08-01 19:08 | login or register to post comments

No hidden aliasing

[Thanks for the tip about the "reply" link.]

There is aliasing, in that A[I] and A[J] are aliased iff I == J, but none of what might be called "hidden" aliasing, where two different names or two different access paths end up on the same object. The point is that "pointer-free" implies quite a lot, and it isn't just a lack of shallow copying.

By sttaft at Wed, 2012-08-01 19:15 | login or register to post comments

So this is a tree

OK, from the example, this is a tree node:
class AA_Tree is var Value : Element; var Level : Univ_Integer := 0; var Left : optional AA_Tree; var Right : optional AA_Tree; ...

This makes sense as a way to organize certain kinds of data. It maps well to XML. There's a swap operation (atomic swap operations are nice to have available), so you can rearrange the tree in a controlled way.

But what if you want to move from a node towards the root? There's no "parent" relationship available. Of course, if you got to a node by recursive descent, there's an instance of the recursive function which has access to the parent, so you can find it that way.

The example for a directed graph gives every node a number, and then keeps sets of predecessor and successor node numbers in each node. The nodes themselves are simply "The vector of nodes, indexed by node ID." So this substitutes indices for pointers. A long time ago, I had to write things like that in FORTRAN. I've also done it in C++ when I had to represent arbitrary polyhedra in a dynamics simulator. It works well as long as you don't have to delete elements of the graph. Deleting isn't supported in the example given.

It does simplify memory management, though.

By John Nagle at Thu, 2012-08-02 06:39 | login or register to post comments

Deleting from a graph

Deleting is pretty straightforward in the directed graph example provided, given that each node knows all of its predecessors and successors. After removing the node id from the corresponding sets of each successor/predecessor, setting the node to null would release all of its storage back to its region.

By sttaft at Thu, 2012-08-02 16:21 | login or register to post comments

Deleting from a graph

After deletion, the deleted node would still be in the vector of nodes. That node slot will not be reused, because Add_Node just adds a new node at the end of the vector of nodes. This creates a memory leak.

So now there's a need for a storage manager to reclaim nodes. Searching linearly for a node creates an O(n) overhead. Linking dead nodes together through their predecessor and successor sets would work. That's a standard storage allocator technique; it's an exercise in Knuth. Of course, this has to be done for each vector type. Maybe a vector type with reference counts and storage management primitives could be created.

This is the point at which the programmer wishes for a garbage collector.

By John Nagle at Thu, 2012-08-02 18:55 | login or register to post comments

Deleting from a graph

Unfortunately there is no one perfect, general garbage collection algorithm, particularly when talking about a pervasively parallel language. In my experience, most of the time spent in garbage collection is worrying about objects that really ought to be stack temporaries in the first place. There are situations where you do a lot of insertions and deletions, and recycling the nodes is important, but that seems more the exception than the rule. Many data structures are built up, used, and then discarded.

In any case, the AA_Tree recycles nodes, and as you point out, it would be straightforward to have a directed-graph node deletion operation recycle nodes. Adding a set of "free" node-ids to the directed graph structure would be the simplest, allowing the free nodes themselves to be set to simply null on deletion.

By sttaft at Thu, 2012-08-02 20:30 | login or register to post comments

Difference between ParaSail and other functional languages?

I'm not quite sure I understand: what is the difference, wrt lack of pointers, between ParaSail and other languages such as Haskell?

I don't mean to imply that I think there isn't a difference, I just don't quite understand.

By Matt Fenwick at Thu, 2012-08-02 14:28 | login or register to post comments

Pointers and functional languages

Functional languages have pointers, but if the objects are immutable, it doesn't really matter. However many so-called "functional" languages are not "pure," and they do have references to mutable objects (e.g the ML derivatives including OCaml, F#, etc.).

Haskell is to some degree the exception here, where it is a "pure" functional language. But then of course the IO monad is introduced to provide side-effects, and exactly how the IO monad plays with pervasive parallelism is an interesting question.

Bottom line is that ParaSail has many of the advantages of a pure functional language, while still allowing a more imperative style when appropriate. Note that if a ParaSail function has no "var" parameters, then it is essentially a "pure" function (since there are no global variables), so it should be possible (though not required) to write in a purely functional style in ParaSail.

Note that the current (version 3.0.1) implementation has essentially all of the "object-oriented" capabilities implemented, but does not yet include full support for passing operations as parameters. You can pass "polymorphic" objects and rely on dispatching to accomplish some of that (as in Java), but directly passing operations is still on the "soon-to-be-supported" list.

By sttaft at Thu, 2012-08-02 16:08 | login or register to post comments

Won't this force users to

Won't this force users to implement their own GC? For example if you want to represent a graph, you'd probably do so as an array of vertices. When a vertex gets disconnected from the graph, it's still in the array. So essentially you're forced to represent the heap as an array, and a pointer becomes an integer index into the array, and you have to write your own GC...

Perhaps in practice this scenario doesn't come up though.

By Jules Jacobs at Wed, 2012-08-01 20:15 | login or register to post comments

Garbage collection

It clearly depends on how the graph is being managed. In many algorithms, it is quite possible for parts of the graph to be disconnected, and you wouldn't want disconnected parts to be automatically reclaimed. In general there is usually enough information already in the graph to support whatever kind of reclamation strategy is desired. If not, additional information can be added. In any case, setting any node to null will release all of its storage back to the associated region. This is analogous to the requirement to set a pointer to null if you want the GC to reclaim the designated object. And as a fall-back, when the region containing the graph goes out of scope, the entire region is automatically reclaimed.

By sttaft at Wed, 2012-08-01 20:53 | login or register to post comments

I see. Lets hope it doesn't

I see. Lets hope it doesn't come up often enough.

How would you implement a symbolic simplifier or symbolic differentiation? If you do it naively by copying the subtree at each operation, you'll end up with asymptotically worse complexity (because each node is copied O(n) times where n is its depth in the tree). So would you represent a symbolic expression as an array of nodes, where each node again points into the array? How well does that work in practice? (how pretty/ugly is the resulting code?)

By Jules Jacobs at Thu, 2012-08-02 09:21 | login or register to post comments

Symbolic simplification

It is not uncommon to implement a kind of symbolic simplification using a "value numbering" approach, where there is a hash table that encodes computations, after first performing various canonicalizations and simplifications. In this sort of approach, you walk a tree and return a unique "value number" that represents it, where the value number is an index into this "computation table." This sort of technique maps quite directly into ParaSail. In general you don't have to think in terms of an "array of nodes," but rather an indexed container of nodes, which can grow and shrink as appropriate.

By sttaft at Thu, 2012-08-02 16:15 | login or register to post comments

manual GC not a problem here

The main objection to manual memory management is that it's difficult to get right due to non-local interactions (e.g. framework callbacks, concurrency, etc.). It is often difficult to trace a memory leak to its source, or figure out how to correct it. It can also be difficult to manage concurrency, prevent use of deleted objects.

By comparison, management of an isolated structure representing a graph is easy. There are no complex interaction issues [assuming identifiers aren't remembered by the outside program]. It can be domain-specific, e.g. use a compacting copy-collection if that is appropriate (then swap or move). When developers forget to GC a long-lived structure representing a graph, it will be easy to trace the problem, and easy to fix it.

So, even if the scenario does come up, it doesn't cause very much pain. Especially compared to the pain of dealing with a malfunctioning GC...

By dmbarbour at Wed, 2012-08-01 23:46 | login or register to post comments

Show us the parallel-toolbox please

I must say ParaSail is a good and original design.

Sharing mutable data is certainly a sin.
It impedes readibility, composability and provability.
Thus one has two possibilities :
â€¢ the Haskell way is don't mutate the data
â€¢ the ParaSail way is don't share the mutable data
Both way are equally safe.
ParaSail does it while retaining the imperative flavor.
That could eventually help popular acceptance. Fine.

However, how is it fundamentally better than Haskell ?
How does it better address parallelism ?
How far is it a parallel language ?
My answer would be rather pessimistic: it seems no more a parallel language than Haskell already is.
Because to address parallelism is much more than to just address safety.

The real problem is : how do you minimize cpu idle ?
I don't really care what your safety-toolbox is.
Show us your parallel-toolbox please.

By Damien Guichard at Fri, 2012-08-03 00:32 | login or register to post comments

Parallel toolbox -- work stealing

Although you would think parallelism comes naturally to a pure functional language like Haskell, in fact it seems that it is not completely straightforward how to make the best use of parallel hardware in Haskell. The heavy use of lists in most functional languages is not particularly great for parallelism. Also most functional languages depend on underlying optimizations which break the pure side-effect-free model of computing, and so these optimizations can "fight" with the attempt to maximize parallelism. The net effect is that there are still major debates about the best way to handle parallelism in Haskell. The paper by Marlow et al from the Haskell '11 conference in Tokyo in September, "A Monad for Deterministic Parallelism," talks about some of the problems with the use of "par" and "pseq," and introduces a new monad called "Par." It seems clear that the parallel processing story for Haskell has not yet reached its final chapter.

In any case, as far as ParaSail, it adopts the "work stealing" model also used in Intel's Cilk language, and their Threaded Building Blocks (TBB) library. This model involves having a relatively small number of heavier weight "server" processes, each serving their own queue of very light weight threads (which we call "picothreads" in ParaSail, but they also go by other names like "sparks," "microthreads," lightweight threads, etc.). The approach starts with approximately as many server processes as there are physical cores. As a server executes a given picothread, it may spawn off many other picothreads. These are added to the servers own queue, which it serves in a last-in, first-out (LIFO) manner -- that is it serves the most recently spawned picothread first. However, if a server runs out of picothreads to serve, it looks around at the other servers' queues, and then "steals" a picothread from one of those, but in this case, it uses a first-in, first-out (FIFO) discipline, i.e., it steals the picothread that has been languishing longest on its server's queue.

This work-stealing approach does a very nice job of balancing the load, while preserving locality of reference. It works very well in the context of a language like ParaSail where the compiler automatically splits computations up into numerous picothreads, providing plenty of raw material for the work-stealing algorithm to do its thing.

By sttaft at Fri, 2012-08-03 04:06 | login or register to post comments

This work-stealing approach

This work-stealing approach does a very nice job of balancing the load, while preserving locality of reference.

Work stealing is being oversold by various PL designers. The locality of reference is great if you're doing a depth-first traversals with node-local references. As with "par" and other primitives, the program has to be refactored to make this true, or your speedups will be lousy. It is just one piece, and once you talk about locality, a very incomplete one.

By lmeyerov at Sun, 2012-08-05 19:50 | login or register to post comments

Re: Haskell comparison

I'll note right off that Haskell's parallelism doesn't seem to scale to a large number of cores. Now, this is implementation dependent, but the basic issue is that the following elements don't mix in a way that leaves me happy:

stop-the-world collector
Amdahl's law

Even for embarrassingly parallel computations, it can be difficult to get a speedup proportional to a very large number of cores because occasionally we must perform a world collection (which tends to get more expensive with more threads and cores active).

I think that ParaSail's control over memory can be a major contributor to keeping utilization high, albeit sometimes at the cost of a copy for large structures.

By dmbarbour at Fri, 2012-08-03 05:55 | login or register to post comments

Interesting point in the design space

An interesting language design and certainly a design space work exploring. It's similar to the kind of things you could do with strict ownership types - if anyone ever got around to building such a parallel language. Current research - things like Deterministic Parallel Java - uses ownership to delimit regions, but effect disjointness (rather than static confinement) to support parallelism.

On the other side, ScopedTypes uses ownership techniques to handle memory allocation for Realtime Java. But I haven't seen a design as good as ParaSail that combines the two.

By James Noble at Fri, 2012-08-03 07:11 | login or register to post comments

ParaSail article in top 10 for 2012 at embedded.com

An article about ParaSail and multicore programming made it into the top 10 most-read articles at embedded.com in 2012. Here is the article in case you are interested:

ParaSail: Less is more with multicore

The announcement of the top 10 articles for 2012 is here:

Embedded.com top 10 most viewed articles in 2012

ParaSail blog is here:

ParaSail Programming Language

By sttaft at Fri, 2013-01-11 15:58 | login or register to post comments

Lambda the Ultimate

User login

Navigation