Concurrent Pattern Calculus

Concurrent Pattern Calculus by Thomas Given-Wilson, Daniele Gorla, and Barry Jay:

Concurrent pattern calculus drives interaction between processes by comparing data structures, just as sequential pattern calculus drives computation. By generalising from pattern matching to pattern unification, interaction becomes symmetrical, with information flowing in both directions. This provides a natural language for describing any form of exchange or trade. Many popular process calculi can be encoded in concurrent pattern calculus.

Barry Jay's Pattern Calculus has been discussed a few times here before. I've always been impressed with the pattern calculus' expressive power for computing over arbitrary structure. The pattern calculus supports new forms of polymorphism, which Jay terms "path polymorphism" and "pattern polymorphism", that are difficult to provide in other calculi. The closest analogue I can think of would be a compiler-provided generalized fold over any user-defined structure.

This work extends the pattern calculus to the concurrent setting by adding constructs for parallel composition, name restriction, and replication, and argues convincingly for its greater expressiveness compared with other concurrent calculi. The authors also address some of the obvious concerns raised by the symmetric information flow of the unification operation.
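
To get a feel for what symmetric matching means, here is a minimal OCaml sketch of pattern unification (my illustration, not the paper's calculus): because both patterns may contain binding names, each side can extract information from the other in a single interaction.

    (* Toy patterns: binding names, atoms, and compound data. *)
    type pat =
      | Var of string              (* binding name: matches anything *)
      | Atom of string             (* constructor / literal *)
      | Pair of pat * pat          (* compound structure *)

    (* Unify two patterns, returning the bindings each side learns,
       or None if the structures clash. No occurs check; toy only. *)
    let rec unify p q =
      match p, q with
      | Var x, _ -> Some ([ (x, q) ], [])
      | _, Var y -> Some ([], [ (y, p) ])
      | Atom a, Atom b when a = b -> Some ([], [])
      | Pair (p1, p2), Pair (q1, q2) ->
          (match unify p1 q1, unify p2 q2 with
           | Some (l1, r1), Some (l2, r2) -> Some (l1 @ l2, r1 @ r2)
           | _ -> None)
      | _ -> None

    (* A "trade": each process learns the other's datum in one step.
       = Some ([("goods", Atom "tea")], [("currency", Atom "usd")]) *)
    let _ = unify (Pair (Atom "usd", Var "goods"))
                  (Pair (Var "currency", Atom "tea"))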

Parametric Prediction of Heap Memory Requirements

Parametric Prediction of Heap Memory Requirements, by Victor Braberman, Federico Fernandez, Diego Garbervetsky, Sergio Yovine:

This work presents a technique to compute symbolic polynomial approximations of the amount of dynamic memory required to safely execute a method without running out of memory, for Java-like imperative programs. We consider object allocations and deallocations made by the method and the methods it transitively calls. More precisely, given an initial configuration of the stack and the heap, the peak memory consumption is the maximum space occupied by newly created objects in all states along a run from it. We over-approximate the peak memory consumption using a scoped-memory management where objects are organized in regions associated with the lifetime of methods. We model the problem of computing the maximum memory occupied by any region configuration as a parametric polynomial optimization problem over a polyhedral domain and resort to Bernstein basis to solve it. We apply the developed tool to several benchmarks.
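
To make the definition concrete, here is a tiny OCaml sketch (my example, not the paper's analysis) of "peak memory consumption" as the maximum space occupied by live objects along a run, and of why scoped, region-style deallocation keeps that peak small:

    (* Sizes in bytes; a run is a sequence of allocation events. *)
    type event = Alloc of int | Free of int

    (* Peak = maximum live occupancy over all states along the run. *)
    let peak (run : event list) : int =
      let step (live, peak) ev =
        match ev with
        | Alloc n -> let live = live + n in (live, max peak live)
        | Free n  -> (live - n, peak)
      in
      snd (List.fold_left step (0, 0) run)

    (* A loop allocating an 8-byte temporary per iteration, freed when
       each iteration's region dies, peaks at 8 rather than growing. *)
    let _ = peak [Alloc 8; Free 8; Alloc 8; Free 8; Alloc 8]  (* = 8 *)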

We've briefly discussed analyses to predict heap usage here on LtU, but I can't seem to find them. Anyone with a reference handy, please post in the comments!

Language Virtualization for Heterogeneous Parallel Computing

Hassan Chafi, Zach DeVito, Adriaan Moors, Tiark Rompf, Arvind Sujeeth, Pat Hanrahan, Martin Odersky, and Kunle Olukotun describe an approach to parallel DSLs that is a hybrid between external DSLs and internal DSLs in Language Virtualization for Heterogeneous Parallel Computing.

As heterogeneous parallel systems become dominant, application developers are being forced to turn to an incompatible mix of low level programming models (e.g. OpenMP, MPI, CUDA, OpenCL). However, these models do little to shield developers from the difficult problems of parallelization, data decomposition and machine-specific details. Ordinary programmers are having a difficult time using these programming models effectively. To provide a programming model that addresses the productivity and performance requirements for the average programmer, we explore a domain-specific approach to heterogeneous parallel programming.

We propose language virtualization as a new principle that enables the construction of highly efficient parallel domain specific languages that are embedded in a common host language. We define criteria for language virtualization and present techniques to achieve them. We present two concrete case studies of domain-specific languages that are implemented using our virtualization approach.

While the motivation of the paper is parallelization, the proposed design looks like LINQ expression trees dialed to 11.
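
To illustrate the expression-trees analogy, here is a minimal OCaml sketch (my own, not the paper's Scala machinery): virtualized constructs look like ordinary operations to the user, but each merely records itself in an AST that a back end is then free to optimize, parallelize, or compile for other hardware.

    (* A tiny staged DSL: terms are trees, not computations. *)
    type exp =
      | Lit of int
      | Add of exp * exp
      | If  of exp * exp * exp

    (* "Virtualized" syntax: these build AST nodes instead of running. *)
    let ( +! ) a b = Add (a, b)
    let if_ c t e = If (c, t, e)
    let lit n = Lit n

    (* The host program never adds numbers; it stages a DSL program. *)
    let staged = if_ (lit 1) (lit 2 +! lit 3) (lit 0)

    (* One possible back end: a plain interpreter. *)
    let rec eval = function
      | Lit n -> n
      | Add (a, b) -> eval a + eval b
      | If (c, t, e) -> if eval c <> 0 then eval t else eval e

    let _ = eval staged   (* = 5 *)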

Scala Team Wins ERC Grant

The Scala research group at EPFL is excited to announce that they have won a 5-year European Research Council grant of over 2.3 million euros to tackle the "Popular Parallel Programming" challenge. This means that the Scala team will nearly double in size to pursue a truly promising way for industry to harness the parallel processing power of the ever increasing number of cores available on each chip.

As you can see from a synopsis of the proposal, Scala will be providing the essential mechanisms to allow a simpler programming model for common problems that benefit from parallel processing. The principal innovation is to use "language virtualization", combining polymorphic embeddings with domain-specific optimizations in a staged compilation process.

This may yet lead to very interesting developments.

The IO Monad is 45 years old

Oleg Kiselyov wrote a mail to haskell-cafe today titled The IO Monad is 45 years old. I thought LtU readers might like this.

The Semicolon Wars

Some light reading for the holiday season: writing for American Scientist, Brian Hayes says in The Semicolon Wars:

A catalog maintained by Bill Kinnersley of the University of Kansas lists about 2,500 programming languages. Another survey, compiled by Diarmuid Piggott, puts the total even higher, at more than 8,500. And keep in mind that whereas human languages have had millennia to evolve and diversify, all the computer languages have sprung up in just 50 years. Even by the more-conservative standards of the Kinnersley count, that means we've been inventing one language a week, on average, ever since Fortran.

For ethnologists, linguistic diversity is a cultural resource to be nurtured and preserved, much like biodiversity. All human languages are valuable; the more the better. That attitude of detached reverence is harder to sustain when it comes to computer languages, which are products of design or engineering rather than evolution. The creators of a new programming language are not just adding variety for its own sake; they are trying to make something demonstrably better. But the very fact that the proliferation of languages goes on and on argues that we still haven't gotten it right. We still don't know the best notation—or even a good-enough notation—for expressing an algorithm or defining a data structure.

Azul's Pauseless Garbage Collector

Here's Gil Tene on Azul's Pauseless Garbage Collector for the JVM.

One of the key techniques that we use is massive and rapid manipulation of virtual memory mappings. We will change mappings of virtual to physical memory at the rate of Java allocation.

And

The same read barrier I mentioned before will also intercept any attempt to read a reference to an object that has been relocated, and that allows us to lazily relocate references without needing a pause. We compact by moving an entire virtual page worth of objects, we kind of blow it up, moving all the live objects to other places and thereby compacting them. But we don't try and locate and find all the pointers to that page immediately.
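
As a rough illustration of the lazy-fixup idea (a toy sketch, far simpler than Azul's actual collector, which keeps relocation data off to the side), here is how a self-healing read barrier might look: relocated objects leave a forwarding pointer, and each reference load repairs stale references one at a time, with no global pause.

    (* Toy heap object: a payload plus an optional forwarding pointer. *)
    type obj = { mutable payload : int; mutable forwarded : obj option }

    (* Collector side: move the object, leave a forwarding pointer. *)
    let relocate o =
      let copy = { payload = o.payload; forwarded = None } in
      o.forwarded <- Some copy;
      copy

    (* Mutator side: the barrier runs on every reference load, lazily
       healing stale references instead of fixing them all at once. *)
    let read_barrier (slot : obj ref) : obj =
      match (!slot).forwarded with
      | Some copy -> slot := copy; copy   (* heal the stale reference *)
      | None -> !slot

    let () =
      let o = { payload = 42; forwarded = None } in
      let slot = ref o in
      ignore (relocate o);                       (* GC moves the object *)
      assert ((read_barrier slot).payload = 42)  (* reads still work *)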

The challenge seems to be that standard OSes don't currently expose enough hooks for this kind of virtual-memory manipulation, so their runtime must live either on their custom hardware and OS or inside a virtual machine.

Stressed by Distributed Programming? Well, CALM Down.

In Consistency Analysis in Bloom: a CALM and Collected Approach, Peter Alvaro, Neil Conway, Joseph M. Hellerstein, and William R. Marczak of UC Berkeley say:

Distributed programming has become a topic of widespread interest, and many programmers now wrestle with tradeoffs between data consistency, availability and latency. Distributed transactions are often rejected as an undesirable tradeoff today, but in the absence of transactions there are few concrete principles or tools to help programmers design and verify the correctness of their applications.

We address this situation with the CALM principle, which connects the idea of distributed consistency to program tests for logical monotonicity. We then introduce Bloom, a distributed programming language that is amenable to high-level consistency analysis and encourages order-insensitive programming. We present a prototype implementation of Bloom as a domain-specific language in Ruby. We also propose a program analysis technique that identifies points of order in Bloom programs: code locations where programmers may need to inject coordination logic to ensure consistency. We illustrate these ideas with two case studies: a simple key-value store and a distributed shopping cart service.
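
Here is a toy OCaml rendering of the monotonicity idea (my sketch, not Bloom itself, with ordinary sets standing in for distributed facts): a monotone query's answers only grow as new facts arrive, so it can safely be evaluated eagerly on partial input, while a non-monotone query such as a negation or aggregate can retract an answer, marking a point of order where coordination is needed.

    module S = Set.Make (String)

    (* Monotone: once an item is known to be in the cart, it stays
       known; early answers remain valid as facts keep arriving. *)
    let items_in_cart facts = S.filter (fun f -> f <> "") facts

    (* Non-monotone: emptiness can flip from true to false when a
       late add-to-cart fact arrives -- a "point of order". *)
    let cart_is_empty facts = S.is_empty facts

    let () =
      let partial = S.of_list [] in
      let full    = S.of_list [ "tea" ] in
      (* Monotone answers on partial input are a subset of the final ones. *)
      assert (S.subset (items_in_cart partial) (items_in_cart full));
      (* The non-monotone answer is retracted: true early, false later. *)
      assert (cart_is_empty partial && not (cart_is_empty full))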

Semi-implicit batched remote code execution as staging

Oleg Kiselyov has just posted another amazing work: Semi-implicit batched remote code execution as staging.

Batching several remote-procedure or remote-object operations into one request decreases the number of network client/server round-trips, reduces the communication overhead and indeed significantly improves performance of distributed applications. The benefits are offset by the cost of restructuring the code to incite large batches; and by the increase in the difficulty of reasoning about the code, predicting its performance let alone establishing correctness. The overall research goal is to reduce the downside.

We describe a semi-implicit batching mechanism that makes the points of remote server communication explicit yet conceals the proxies, saving the trouble of writing them. The changes to the client code are minimal: mainly, adding calls to force. The type-checker will not let the programmer forget to call force. The remote batch server is simple and generic, with no need to optimize for specific clients.

Our mechanism batches both independent and data-dependent remote calls. Our mechanism is compositional, letting the programmer build nested applications and conditional (and, potentially, iterative) statements using composition, application and naming. Writing a remote program is exactly like writing a typed local program, which is type-checked locally, and can even be executed locally (for debugging).

The key insights are treating remote execution as a form of staging (meta-programming), generalizing mere remote function calls to remote applicative and conditional expressions, and introducing an embedded domain-specific language, Chourai, for such expressions. A batch of dependent remote function calls can then be regarded as a complex applicative expression in the A-normal form. Another key insight is that emulating call-by-value via call-by-need surprisingly makes sense.

Here's an example piece of Chourai code, for deleting albums whose rating is below 5 among the first n albums of an album database (called "large") hosted by the server. get_album, next_album, and similar functions constitute the "RPC" interface to the server.

    let delete_low_rating n =
      let rec loop album i =
        let t = guard (app2 lt (app get_rating album) (int 5))
                      (fun () -> app delete_album album) in
        if i >= n then force t else
        loop (app next_album album) (succ i)
      in loop (app get_album (string "large")) 0;;

Amazingly, delete_low_rating 4 requires just one round-trip to the server!

Continuation-Passing C: Compiling threads to events through continuations

Gabriel Kerneis and Juliusz Chroboczek, "Continuation-Passing C: Compiling threads to events through continuations", arXiv:1011.4558.

In this paper, we introduce Continuation Passing C (CPC), a programming language for concurrent systems in which native and cooperative threads are unified and presented to the programmer as a single abstraction. The CPC compiler uses a compilation technique, based on the CPS transform, that yields efficient code and an extremely lightweight representation for contexts. We provide a complete proof of the correctness of our compilation scheme. We show in particular that lambda-lifting, a common compilation technique for functional languages, is also correct in an imperative language like C, under some conditions enforced by the CPC compiler. The current CPC compiler is mature enough to write substantial programs such as Hekate, a highly concurrent BitTorrent seeder. Our benchmark results show that CPC is as efficient, while significantly cheaper, as the most efficient thread libraries available.
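
As a rough illustration of the threads-to-events idea (a toy OCaml sketch, not CPC's actual C-to-C transformation), here is a blocking-style loop rewritten in continuation-passing style, so that each cooperative yield simply enqueues the rest of the computation on an event loop:

    (* The event loop's run queue holds saved continuations. *)
    let run_queue : (unit -> unit) Queue.t = Queue.create ()

    (* Cooperative yield: park the continuation, return to scheduler. *)
    let yield (k : unit -> unit) : unit = Queue.push k run_queue

    (* Direct style (pseudo): let rec count n = print n; yield (); count (n+1)
       CPS style: everything after the yield becomes an explicit
       continuation, here the partially applied recursive call. *)
    let rec count label n () =
      if n < 3 then begin
        Printf.printf "%s: %d\n" label n;
        yield (count label (n + 1))   (* continuation = rest of the loop *)
      end

    (* The "event loop": run saved continuations until none remain. *)
    let scheduler () =
      while not (Queue.is_empty run_queue) do
        (Queue.pop run_queue) ()
      done

    let () =
      Queue.push (count "a" 0) run_queue;
      Queue.push (count "b" 0) run_queue;
      scheduler ()   (* interleaves a and b without native threads *)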

See also the CPC website.