Parallel/Distributed

Milner Symposium 2012

The Milner Symposium 2012 was held in Edinburgh this April in memory of the late Robin Milner.

The Milner Symposium was a celebration of the life and work of one of the world's greatest computer scientists, Robin Milner, and featured leading researchers whose work was inspired by him.

The programme consisted of academic talks by colleagues and past students. The talks and slides are available online.

I particularly liked the interleaving of the personal and human narrative underlying the scientific journey. A good example is Joachim Parrow's talk on the origins of the pi calculus. Of special interest to LtU members is the panel on the future of functional programming languages, consisting of Phil Wadler, Xavier Leroy, David MacQueen, Martin Odersky, Simon Peyton Jones, and Don Syme.

The Trouble with Erlang

Tony Arcieri, author of Reia, a Ruby-like language for the Erlang BEAM platform, wrote a piece in July, The Trouble with Erlang (or Erlang is a ghetto), bringing together a long list of complaints about Erlang and the concepts behind it and arguing that Clojure now provides a better basis for parallel programming in practice.

While the complaints include many points about syntax, data types, and the like, the heart of the critique is two-fold: first, that Erlang has terrible problems managing memory and does not scale as advertised; and second, that these failures partly follow from "Erlang hat[ing] state. It especially hates shared state." He points to the argument made by Goetz and Click in Concurrency Revolution From a Hardware Perspective (2010) that local state is compatible with the Actor model, and he further argues that single assignment (SSA) as used in Erlang is less safe than local state.

Opa

Opa is a new member of the family of languages aiming to make web programming transparent by automatically generating client-side JavaScript and handling communication and session control. Opa is written in OCaml. A hierarchical database and a web server are integrated with the language. The distribution model is based on the notion of a session, a construct roughly comparable to process definitions in the join calculus or to concurrent objects in a number of formalisms.

A good place to start is here. And here you can find several example programs with accompanying source code.

Memory Models: A Case for Rethinking Parallel Languages and Hardware, CACM, August 2010

Memory Models: A Case for Rethinking Parallel Languages and Hardware by Sarita V. Adve and Hans-J. Boehm

The linked article is a pre-print of the published version.

The era of parallel computing for the masses is here, but writing correct parallel programs remains far more difficult than writing sequential programs. Aside from a few domains, most parallel programs are written using a shared-memory approach. The memory model, which specifies the meaning of shared variables, is at the heart of this programming model. Unfortunately, it has involved a tradeoff between programmability and performance, and has arguably been one of the most challenging and contentious areas in both hardware architecture and programming language specification. Recent broad community-scale efforts have finally led to a convergence in this debate, with popular languages such as Java and C++ and most hardware vendors publishing compatible memory model specifications. Although this convergence is a dramatic improvement, it has exposed fundamental shortcomings in current popular languages and systems that prevent achieving the vision of structured and safe parallel programming.

This paper discusses the path to the above convergence, the hard lessons learned, and their implications. A cornerstone of this convergence has been the view that the memory model should be a contract between the programmer and the system - if the programmer writes disciplined (data-race-free) programs, the system will provide high programmability (sequential consistency) and performance. We discuss why this view is the best we can do with current popular languages, and why it is inadequate moving forward. We then discuss research directions that eliminate much of the concern about the memory model, but require rethinking popular parallel languages and hardware. In particular, we argue that parallel languages should not only promote high-level disciplined models, but they should also enforce the discipline. Further, for scalable and efficient performance, hardware should be co-designed to take advantage of and support such disciplined models. The inadequacies of the state-of-the-art and the research agenda we outline have deep implications for the practice, research, and teaching of many computer science sub-disciplines, spanning theory, software, and hardware.
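To make the data-race-free contract concrete, here is a minimal Scala sketch of my own (not from the paper; the class and field names are made up): without the @volatile annotation the two threads race on the flag and the outcome is unpredictable, whereas marking the flag volatile makes the program data-race-free, so the JVM memory model guarantees the simple interleaving-based (sequentially consistent) behaviour the authors describe for disciplined programs.

    // Minimal sketch of the data-race-free contract (hypothetical example,
    // not taken from the paper; names are illustrative only).
    object DrfSketch {
      // With a plain `var done` this program would contain a data race: the
      // reader might spin forever or observe `result` before `done`. The
      // @volatile annotation removes the race, so the Java memory model
      // guarantees sequentially consistent behaviour for this program.
      @volatile private var done: Boolean = false
      private var result: Int = 0 // published safely via the volatile flag

      def main(args: Array[String]): Unit = {
        val writer = new Thread(() => {
          result = 42   // ordinary write...
          done = true   // ...made visible by the volatile write
        })
        val reader = new Thread(() => {
          while (!done) {}  // spin until the volatile read sees the flag
          println(result)   // guaranteed to print 42, never 0
        })
        writer.start(); reader.start()
        writer.join(); reader.join()
      }
    }

Under the agenda the paper argues for, a language and hardware co-designed around disciplined models would enforce this kind of discipline rather than leave the annotation to the programmer.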

Concurrent Pattern Calculus

Concurrent Pattern Calculus by Thomas Given-Wilson, Daniele Gorla, and Barry Jay:

Concurrent pattern calculus drives interaction between processes by comparing data structures, just as sequential pattern calculus drives computation. By generalising from pattern matching to pattern unification, interaction becomes symmetrical, with information flowing in both directions. This provides a natural language for describing any form of exchange or trade. Many popular process calculi can be encoded in concurrent pattern calculi.

Barry Jay's Pattern Calculus has been discussed a few times here before. I've always been impressed with the pattern calculus's expressive power for computing over arbitrary structure. It supports new forms of polymorphism, termed "path polymorphism" and "pattern polymorphism", which are difficult to provide in other calculi. The closest I can think of would be a compiler-provided generalized fold over any user-defined structure.

This work extends the pattern calculus to the concurrent setting by adding constructs for parallel composition, name restriction, and replication, and argues convincingly for its greater expressiveness compared to other concurrent calculi. The authors also address some of the obvious concerns raised by the symmetric information flow of the unification operation.
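To give a flavour of the symmetry described above, here is a small Scala sketch of my own (an illustration of two-way pattern unification, not an implementation of the calculus): both patterns may contain binders, and unifying them produces bindings on both sides, unlike ordinary one-way pattern matching.

    // Tiny illustration of symmetric pattern unification (my own sketch,
    // not the concurrent pattern calculus itself; all names are made up).
    sealed trait Pat
    case class Bind(name: String)          extends Pat // a binder
    case class Atom(value: String)         extends Pat // a constant
    case class Pair(left: Pat, right: Pat) extends Pat // compound data

    object Unify {
      type Subst = Map[String, Pat]

      // Returns bindings for the left pattern's binders and for the right
      // pattern's binders, or None if the patterns cannot be unified.
      def unify(p: Pat, q: Pat): Option[(Subst, Subst)] = (p, q) match {
        case (Bind(x), t)                 => Some((Map(x -> t), Map.empty))
        case (t, Bind(y))                 => Some((Map.empty, Map(y -> t)))
        case (Atom(a), Atom(b)) if a == b => Some((Map.empty, Map.empty))
        case (Pair(a1, a2), Pair(b1, b2)) =>
          for ((l1, r1) <- unify(a1, b1); (l2, r2) <- unify(a2, b2))
            yield (l1 ++ l2, r1 ++ r2)
        case _                            => None
      }

      def main(args: Array[String]): Unit = {
        // Information flows both ways: each side learns from the other,
        // as in an exchange or trade.
        val offer  = Pair(Atom("5 euros"), Bind("goods"))
        val demand = Pair(Bind("price"), Atom("coffee"))
        println(unify(offer, demand))
        // Some((Map(goods -> Atom(coffee)), Map(price -> Atom(5 euros))))
      }
    }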

Language Virtualization for Heterogeneous Parallel Computing

Hassan Chafi, Zach DeVito, Adriaan Moors, Tiark Rompf, Arvind Sujeeth, Pat Hanrahan, Martin Odersky, and Kunle Olukotun describe an approach to parallel DSLs that is a hybrid between external DSLs and internal DSLs in Language Virtualization for Heterogeneous Parallel Computing.

As heterogeneous parallel systems become dominant, application developers are being forced to turn to an incompatible mix of low level programming models (e.g. OpenMP, MPI, CUDA, OpenCL). However, these models do little to shield developers from the difficult problems of parallelization, data decomposition and machine-specific details. Ordinary programmers are having a difficult time using these programming models effectively. To provide a programming model that addresses the productivity and performance requirements for the average programmer, we explore a domain-specific approach to heterogeneous parallel programming.

We propose language virtualization as a new principle that enables the construction of highly efficient parallel domain specific languages that are embedded in a common host language. We define criteria for language virtualization and present techniques to achieve them. We present two concrete case studies of domain-specific languages that are implemented using our virtualization approach.

While the motivation of the paper is parallelization, the proposed design looks like LINQ expression trees dialed to 11.
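To make that comparison concrete, here is a toy polymorphic embedding in Scala, written as my own illustration rather than the paper's actual API: the same DSL program is written once against an abstract interface and then given both a direct evaluator and a staged interpretation that builds an expression tree a back end could analyse, optimise, or map onto parallel hardware.

    // Toy polymorphic embedding (my own sketch, not the paper's API): one
    // DSL program, two interpretations — direct evaluation and AST building.
    trait Arith[Rep] {
      def lit(n: Int): Rep
      def add(a: Rep, b: Rep): Rep
      def mul(a: Rep, b: Rep): Rep
    }

    // Interpretation 1: evaluate immediately in the host language.
    object Eval extends Arith[Int] {
      def lit(n: Int)         = n
      def add(a: Int, b: Int) = a + b
      def mul(a: Int, b: Int) = a * b
    }

    // Interpretation 2: build an expression tree ("expression trees dialed
    // to 11") that a back end could inspect, optimise, and compile.
    sealed trait Exp
    case class Lit(n: Int)         extends Exp
    case class Add(a: Exp, b: Exp) extends Exp
    case class Mul(a: Exp, b: Exp) extends Exp

    object Stage extends Arith[Exp] {
      def lit(n: Int)         = Lit(n)
      def add(a: Exp, b: Exp) = Add(a, b)
      def mul(a: Exp, b: Exp) = Mul(a, b)
    }

    object Demo {
      // The DSL program is written once, against the abstract interface.
      def program[Rep](dsl: Arith[Rep]): Rep = {
        import dsl._
        mul(add(lit(1), lit(2)), lit(10))
      }

      def main(args: Array[String]): Unit = {
        println(program(Eval))   // 30
        println(program(Stage))  // Mul(Add(Lit(1),Lit(2)),Lit(10))
      }
    }

Roughly speaking, language virtualization extends this idea so that host-language constructs such as conditionals and loops can also be captured by the embedded DSL rather than only user-defined operations.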

Scala Team Wins ERC Grant

The Scala research group at EPFL is excited to announce that they have won a five-year European Research Council grant of over 2.3 million euros to tackle the "Popular Parallel Programming" challenge. This means that the Scala team will nearly double in size to pursue a truly promising way for industry to harness the parallel processing power of the ever-increasing number of cores available on each chip.

As you can see from a synopsis of the proposal, Scala will be providing the essential mechanisms to allow a simpler programming model for common problems that benefit from parallel processing. The principal innovation is to use "language virtualization", combining polymorphic embeddings with domain-specific optimizations in a staged compilation process.

This may yet lead to very interesting developments.

Stressed by Distributed Programming? Well, CALM Down.

In Consistency Analysis in Bloom: A CALM and Collected Approach, Peter Alvaro, Neil Conway, Joseph M. Hellerstein, and William R. Marczak of UC Berkeley write:

Distributed programming has become a topic of widespread interest, and many programmers now wrestle with tradeoffs between data consistency, availability and latency. Distributed transactions are often rejected as an undesirable tradeoff today, but in the absence of transactions there are few concrete principles or tools to help programmers design and verify the correctness of their applications.

We address this situation with the CALM principle, which connects the idea of distributed consistency to program tests for logical monotonicity. We then introduce Bloom, a distributed programming language that is amenable to high-level consistency analysis and encourages order-insensitive programming. We present a prototype implementation of Bloom as a domain-specific language in Ruby. We also propose a program analysis technique that identifies points of order in Bloom programs: code locations where programmers may need to inject coordination logic to ensure consistency. We illustrate these ideas with two case studies: a simple key-value store and a distributed shopping cart service.
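As a rough, informal illustration of the monotonicity idea, here is a small Scala sketch of my own (not Bloom code): logic that only accumulates facts gives the same answer however messages are reordered, while a question that depends on nothing more arriving is a point of order where coordination is needed before the answer is safe.

    // Informal illustration of the CALM intuition (my own sketch, not Bloom):
    // monotone, accumulate-only logic is insensitive to message reordering,
    // while a non-monotone question needs a point of order first.
    object CalmSketch {
      // Monotone: the replica only ever adds to its knowledge, so any
      // delivery order of the same messages converges to the same set.
      def cartContents(messages: Seq[String]): Set[String] =
        messages.foldLeft(Set.empty[String])(_ + _)

      // Non-monotone: checkout asks whether nothing more will arrive.
      // A later message could change the answer, so coordination is needed
      // first — crudely modelled here by an orderClosed flag standing in
      // for the coordination logic that Bloom's analysis would flag.
      def readyToCheckout(messages: Seq[String], orderClosed: Boolean): Boolean =
        orderClosed && messages.nonEmpty

      def main(args: Array[String]): Unit = {
        val a = Seq("book", "pen", "mug")
        val b = Seq("mug", "book", "pen")                // same facts, reordered
        println(cartContents(a) == cartContents(b))      // true: order-insensitive
        println(readyToCheckout(a, orderClosed = false)) // false: not yet safe
        println(readyToCheckout(a, orderClosed = true))  // true after coordination
      }
    }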

Semi-implicit batched remote code execution as staging

Oleg Kiselyov has just posted another amazing work: Semi-implicit batched remote code execution as staging.

Batching several remote-procedure or remote-object operations into one request decreases the number of network client/server round-trips, reduces the communication overhead and indeed significantly improves performance of distributed applications. The benefits are offset by the cost of restructuring the code to incite large batches; and by the increase in the difficulty of reasoning about the code, predicting its performance let alone establishing correctness. The overall research goal is to reduce the downside.

We describe a semi-implicit batching mechanism that makes the points of remote server communication explicit yet conceals the proxies, saving the trouble of writing them. The changes to the client code are minimal: mainly, adding calls to force. The type-checker will not let the programmer forget to call force. The remote batch server is simple and generic, with no need to optimize for specific clients.

Our mechanism batches both independent and data-dependent remote calls. Our mechanism is compositional, letting the programmer build nested applications and conditional (and, potentially, iterative) statements using composition, application and naming. Writing a remote program is exactly like writing a typed local program, which is type-checked locally, and can even be executed locally (for debugging).

The key insights are treating remote execution as a form of staging (meta-programming), generalizing mere remote function calls to remote applicative and conditional expressions, and introducing an embedded domain-specific language, Chourai, for such expressions. A batch of dependent remote function calls can then be regarded as a complex applicative expression in the A-normal form. Another key insight is that emulating call-by-value via call-by-need surprisingly makes sense.

Here's an example piece of Chourai code, for deleting albums whose rating is below 5 among the first n albums of an album database (called "large") hosted by the server. get_album, next_album, and similar functions constitute the "RPC" interface to the server.

     let delete_low_rating n =
      let rec loop album i =
        let t = guard (app2 lt (app get_rating album) (int 5)) 
                      (fun () -> app delete_album album) in
        if i >= n then force t else
        loop (app next_album album) (succ i)
      in loop (app get_album (string "large")) 0;;

Amazingly, delete_low_rating 4 requires just one round-trip to the server!

Thorn

Thorn is

a dynamically-typed concurrent language in which lightweight isolated processes communicate by message passing. Thorn includes powerful aggregate data types, a class-based object system, first-class functions, an expressive module system, and a variety of features supporting the gradual evolution of prototype scripts into robust programs.

Thorn is implemented by a compiler targeting the JVM and a Java interpreter, and syntactically resembles Scala, at least superficially.

One of those "features" is a unique (as far as I know) soft type system:

In Thorn, the type dyn (for dynamic) is assumed as default (and never written explicitly). At the other extreme, Thorn supports concrete types, as used in statically typed programming languages. A variable of a concrete type T is guaranteed to refer to a value of that type (or a subtype). [...] While concrete types help with performance and correctness, they introduce restrictions on how software can be used and make rapid development more difficult; scripting languages do not favor them.

As an intermediate step between the two, we propose like types, getting some of the safety of concrete types while retaining the flexibility of dynamic types. Concrete types for var x:T or fun f(x:T) are used in two main places. At a method call x.m(), a static type check ensures that x actually has an m method. At a binding or assignment, like x := y; or f(y), a static type check can ensure that y's value makes sense to assign to x, can reject it entirely, or can inspire a dynamic check. Like types, var x: like T or fun f(x:like T), give the expressive power of concrete type checks on method calls, but do not constrain binding or assignment. They do require runtime checks and thus may cause programs to fail with runtime type errors: sometimes fewer and never more than dynamic types do.
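As a rough analogy for where those checks land (a Scala sketch of my own, not Thorn; Thorn's own syntax appears in the quote above): a "like"-style parameter can be bound to any value at all, only the named type's interface is reachable through it, and each use performs a runtime check that may fail with a type error.

    // Rough analogy only (my own sketch, not Thorn): Like[T] accepts any
    // binding, exposes only T's interface at use sites, and checks at run
    // time that the wrapped value really is a T.
    import scala.reflect.ClassTag

    final class Like[T](raw: Any)(implicit ct: ClassTag[T]) {
      def use[R](f: T => R): R = raw match {
        case t: T  => f(t)   // the use-site runtime check
        case other => throw new ClassCastException(
          s"$other is not like ${ct.runtimeClass.getSimpleName}")
      }
    }

    object LikeSketch {
      final case class Album(title: String, rating: Int)

      // Analogue of `fun rating(x: like Album)`: the binding is not
      // constrained, but only Album's members can be used through x.
      def rating(x: Like[Album]): Int = x.use(_.rating)

      def main(args: Array[String]): Unit = {
        println(rating(new Like[Album](Album("Kind of Blue", 9)))) // 9
        try rating(new Like[Album]("not an album"))        // any binding is accepted...
        catch { case e: ClassCastException => println(e.getMessage) } // ...but the use fails
      }
    }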

Concurrency is also a little odd:

Every component (marked by the keyword spawn) runs in a different JVM. Component handles contain sufficient information to identify the node and port on which the component runs.

A couple of papers are linked from the home page: "Thorn - Robust, Concurrent, Extensible Scripting on the JVM", by Bard Bloom et al., is a general description of the language, from which the quotes above are taken; and "Integrating Typed and Untyped Code in a Scripting Language", by Tobias Wrigstad et al., has more information about like types.

I have not seen Thorn here before. Apologies if I have just missed it.
