The Manticore Project

Status Report: The Manticore Project. Along the lines of Concurrent ML comes a new language design aimed at exploiting the parallelism of multicore CPUs.

The Manticore project is an effort to design and implement a new functional language for parallel programming. Unlike many earlier parallel languages, Manticore is a heterogeneous language that supports parallelism at multiple levels. Specifically, we combine CML-style explicit concurrency with fine-grain, implicitly threaded, parallel constructs. We have been working on an implementation of Manticore for the past six months; this paper gives an overview of our design and a report on the status of the implementation effort.
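For a flavor of the two levels, here is a rough sketch of what Manticore code looks like (syntax approximated from the paper's examples; treat it as a sketch, not something checked against the actual implementation):

    (* Explicit, CML-style concurrency: threads and typed channels,
       assuming some function fib : int -> int. *)
    val ch = channel ()
    val _  = spawn (send (ch, fib 30))   (* producer in its own thread *)
    val r  = recv ch                     (* consumer blocks for the value *)

    (* Implicit, fine-grain parallelism: the compiler and runtime are
       free to evaluate the components of parallel tuples and parallel
       arrays in parallel. *)
    val pair    = (| fib 30, fib 31 |)
    val nums    = [| 1 to 1000 |]
    val squares = [| n * n | n in nums |]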

Interesting material in terms of both approach and implementation, based on an ML-style language (sans mutable variables). But the underlying assumption that language design offers the best path to solving parallelism is probably the key appeal for LtU:

The problem posed by such processors is how to effectively exploit the different forms of parallelism provided by the hardware. We believe that mechanisms that are well integrated into a programming language are the best hope for achieving parallelism across a wide range of applications, which is why we are focusing on language design and implementation.

And as long as we're on the subject, I see that Reppy's book on Concurrent ML is now in paperback form (moving it more into my price range).

Explicit+implicit parallelism, purely functional, but effects?

I was just about to post a question to the LTU forum asking about the tradeoffs of explicit and implicit parallelism in purely functional languages, and how they might be integrated. Excellent timing! It's an interesting paper so far. :-)

I'm still curious about side-effects in such a language, though. The paper mentions using the concurrency primitives for I/O. What impact might this have, i.e., can all important effects be modeled this way?

[Edit: to further the last question, does this mean the language doesn't need something like Monads to handle any effects?]

Weird synchronicity

I just found Manticore yesterday, and now here it is on LtU. Odd.
Anyway, I am confused by both the sequential semantics and the use of exceptions.
1) Don't exceptions ruin the required referential transparency?
2) How are exceptions handled if they are raised on a remote computer?
3) Do sequential semantics really capture the simultaneity of code running on parallel processors/machines? It seems that treating concurrency as a set of compiler hints is missing something...but I just can't figure out what.

Explicit and Implicit

I would have thought it fairly obvious that explicit parallelism is based upon targeting a known number of independent CPU threads. But with all the attention now aimed at massively increasing the number of cores and independent threads on each chip, that known number will only be appropriate for a limited snapshot in time.

Implicit parallelism, though, will require some kind of intelligence in the run-time system in order to know when to fork a process (which of course is much more expensive than a simple function call). It is going to need to know that there are threads/cores available for use, and it has to make intelligent guesses as to when it is appropriate to do so. It could even decide it is more efficient to restart a function call that is taking too long as an independent process - although in this case I would expect that it would need type-system guarantees that the function in question is "pure" and does not rely on any side-effects.
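To make that kind of guess concrete, here is a toy sketch in ML-ish pseudocode. Everything in it (estCost, threshold, numIdleProcs, future, touch) is invented for the example, not taken from the paper:

    (* Hypothetical heuristic: below some cost threshold, or with no idle
       capacity, the forking overhead dominates and a plain sequential
       map wins; otherwise fork a future per element and wait for each. *)
    fun parMap f xs =
        if estCost f * length xs < threshold orelse numIdleProcs () = 0
        then List.map f xs                          (* not worth forking *)
        else List.map touch
                      (List.map (fn x => future (fn () => f x)) xs)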

Perhaps the best people to answer questions like this would be the people who have worked on SEAM, the VM used by Alice ML.

I would have thought it

I would have thought it fairly obvious that explicit parallelism is based upon targeting a known number of independent CPU threads. But with all the attention now aimed at massively increasing the number of cores and independent threads on each chip, that known number will only be appropriate for a limited snapshot in time.

I don't see how this is relevant. Explicit parallelism is programmer-directed concurrency: the programmer spawns a process explicitly, for a reason that can't necessarily be known to the compiler. For instance, a shared web host would explicitly spawn a process for each hosted customer application, so that it can be killed if it misbehaves. That's just one example, and it has little to do with exploiting maximum concurrency.

Further, virtual processors are reified as vproc data types, so the program can scale itself to whatever number of vprocs are available.
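Something like the following, say (a purely hypothetical surface API; the paper describes vprocs at the runtime level, not how they would appear in source code):

    (* Hypothetical: pin one worker thread to each virtual processor, so
       the same program adapts to however many vprocs exist at run time. *)
    val _ = List.map (fn vp => spawnOn (vp, worker)) (listVProcs ())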

Halt on GC?

I've now read the paper (sorry for commenting first!). I would have expected there to be a co-ordinating process that could take charge when an exception is thrown - so why the need to wait for parallel processes to complete before an exception can be handled?

Perhaps more to the point is the heavy use by functional languages of garbage collection. So why does GC halt all other processing when it takes place? Amdahl's Law limits the speedup of a parallel program by its sequential fraction, so why insist that something so important and so fundamental must be sequential?

So why does GC halt all

So why does GC halt all other processing when it takes place?

It doesn't. Most collections are local to a vproc, and only rare global collections block other vprocs from proceeding. There are known ways around this limitation, but they went for a simple approach for the first iteration I guess.

Explicit and Implicit

First, yes, fair point with regards to the current GC strategy.

Second, that is not my understanding of the distinction between explicit and implicit parallelisation - a web host is a specific (and not very good) example, if only because of the level of independence within each process.

A much better example would be mapping a function across a list whose length is not known at compile time. At what point should the decision be made to hand each of the function applications over to independent threads rather than performing the processing within the same thread? Should this be determined explicitly by the programmer who uses domain knowledge that each calculation is relatively lengthy, or implicitly by the run time system dynamically adjusting its strategy based upon the available processing capacity?

That is the distinction between explicit and truly implicit parallelisation. The run-time system itself should make the decision, not the programmer. Remember, "Premature Optimisation is the Root of All Evil" - and that includes assuming how many hardware threads a program will have available as spare capacity.

A more interesting example than map would be fold - because then data dependency, associativity, and so on also come into play. And that's all far too complicated for a programmer to explicitly annotate.
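To see why, consider a rough ML sketch of a fold the runtime is allowed to parallelise (halve is an invented helper, and the (| , |) parallel-tuple syntax is borrowed loosely from the paper):

    (* Only valid when f is associative with unit z: the runtime is free
       to re-bracket the reduction, e.g. as f (f (a, b), f (c, d)), and
       the split points depend on capacity the programmer cannot see. *)
    fun parFold f z []  = z
      | parFold f z [x] = x
      | parFold f z xs  =
          let
            val (l, r) = halve xs     (* split the list into two halves *)
            val (a, b) = (| parFold f z l, parFold f z r |)
          in
            f (a, b)
          end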

Should this be determined

Should this be determined explicitly by the programmer who uses domain knowledge that each calculation is relatively lengthy, or implicitly by the run time system dynamically adjusting its strategy based upon the available processing capacity?

That is the distinction between explicit and truly implicit parallelisation.

Your examples highlight why they added parallelism hints to the language itself (parallel arrays and tuples, etc.), but not why explicit concurrency constructs like processes and channels are necessary; explicit concurrency can certainly be used to solve the same problems, though.

My example highlights a clear-cut case for explicitly managed concurrency constructs, since there's no way to kill an implicit thread/terminate an implicitly concurrent computation early in general.

As to your other point, the compiler and/or runtime doesn't always know enough to be able to automatically split the processing load, or determine when something is fully parallelizable, which is the motivation for parallelism hints.

List is a stream

About mapping a function across a list (rather than an array) -- task queues are natural for that sort of situation, though they are more useful when the amount of work per list element (or group of list elements, perhaps) is enough to justify creating a task. Creating a task is different from creating a thread -- threads are relatively heavyweight, so it's more natural to have a fixed pool of workers which grab tasks.
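In CML terms the pattern is small; a minimal sketch using the standard CML primitives (the pool size and the bare unit -> unit task type are just for illustration):

    (* A shared channel of tasks and a fixed pool of worker threads. *)
    val tasks : (unit -> unit) CML.chan = CML.channel ()

    fun worker () =
        let val task = CML.recv tasks    (* blocks until work arrives *)
        in task (); worker () end

    val _ = List.tabulate (8, fn _ => CML.spawn worker)

    fun submit f = CML.send (tasks, f)   (* enqueue one unit of work *)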

I think of lists as streams -- naturally sequential objects that require transformation in order to be parallelized by data (rather than by task, via a task queue). Arrays are naturally parallel objects, in contrast. So it's not really natural to abstract away the details of a collection data type in the parallel setting; you have to know if something is "array-like" or "list-like."

Terminating an implicit thread

I agree that in general there's no way to kill an implicit thread, but does it matter? We are heading toward a world in which even end-user desktop machines will have thousands of hardware threads available. Just let it run to completion and ignore the result. (This was the rationale behind my comment above on the handling of exceptions.)

I suppose that my argument boils down to always presuming parallelisation unless something has been explicitly sequenced (e.g. by monad laws). You appear to be arguing for the reverse in that you are looking for explicit concurrency primitives, thereby implying that the default behaviour is sequential processing.

And please accept my apologies if I am misreading what you are saying!

I agree that in general

I agree that in general there's no way to kill an implicit thread, but does it matter?

I'd say yes, because processes can be non-terminating. A language is an operating system of sorts, and as with an OS, I want to be able to orchestrate the execution of various entities and isolate them if necessary.

I suppose that my argument boils down to always presuming parallelisation unless something has been explicitly sequenced (e.g. by monad laws)

What are the consequences for the developer's mental model? Sequenced semantics are simple to reason about (or at least, are well-known).

You appear to be arguing for the reverse in that you are looking for explicit concurrency primitives, thereby implying that the default behaviour is sequential processing.

I just want some framework where both are available. I know explicit concurrency primitives are necessary, as evidenced by my example (or at least some form of external termination control -- see Orc's filter expression for example); I want to be able to use this control when necessary, but otherwise, I'd love it if the compiler could parallelize everything it can!

I wonder how explicit and implicit concurrency intermix as pertaining to vprocs, etc. For instance, suppose I launch a process whose expression has many parallel subexpressions. What does the scheduling trace look like for each vproc? Do the sub-computations inherit the process's slice? I'll have to read their schedulers section more carefully, because I didn't really grasp it on the first read-through.

Sequential semantics

At least for scientific computing, we have two or three generations of programmers trained to assume sequential semantics for mathematical expressions. The latter may include operations on vectors or collections which have implicitly parallel semantics, but the mathematics already suggests this kind of collection-oriented meaning. However, if I write y = A * x (for a matrix A and vector x) and afterwards write w = y' * y (where y' is the transpose of y), I expect an implicit sequencing. Furthermore, it's not clear how making these operations implicitly parallel would help performance in the usual situations. Reasoning about how to decouple them is something I'm doing now for a PhD thesis; I don't expect compilers to do it, and I don't _want_ them to, because it changes the rounding error significantly.
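Spelled out, the dependence is:

    y = A x
    w = y^T y = (A x)^T (A x)

so w cannot start before y is available, and any algebraic rewriting of the composition (e.g. to x^T (A^T A) x) reorders the floating-point operations, which is precisely the rounding-error change mentioned above.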

I see implicit dataflow parallel semantics as something that might be useful in some domains, but not in all.