Parallel/Distributed

Guy Steele on Language Design

(via PLNews)
Thoughts on Language Design

Computers with multiple processors are becoming more and more common. We need programming languages that support the use of multiple threads of control. [...] In such a context, the organizing principles of structured programming may not always be appropriate. I'm not talking about avoiding goto. (Java has no goto statement.) I'm talking about sequencing, if-then-else, and loops.

Accelerator: simplified programming of graphics processing units for general-purpose uses via data-parallelism

Accelerator: simplified programming of graphics processing units for general-purpose uses via data-parallelism. David Tarditi, Sidd Puri, and Jose Oglesby.

GPUs are difficult to program for general-purpose uses. Programmers must learn graphics APIs and convert their applications to use graphics pipeline operations. We describe Accelerator, a system that simplifies the programming of GPUs for general-purpose uses. Accelerator provides a high-level data-parallel programming model as a library that is available from a conventional imperative programming language. The library translates the data-parallel operations on-the-fly to optimized GPU pixel shader code and API calls.

A library provides programmers with a new type of array, a data-parallel array. Data-parallel arrays differ from conventional arrays in two ways. First, the only operations available on them are aggregate operations over entire input arrays. The operations are a subset of those found in languages like APL. They include element-wise arithmetic and comparison operators, reductions to compute min, max, product, and sum, and transformations on entire arrays. Second, the data-parallel arrays are functional: each operation produces an entirely new data-parallel array.

Envisioning a New Language: A Conversation With Victoria Livschitz

Metaphors attempts to address four topics essential to software development that are traditionally handled outside of the programming language. The first is support for distributed runtime environment, which I just talked about. The other three are (a) support for contextual programming, (b) support for autonomous executable entities, and (c) support for software evolution and reuse.

I am not sure what to make of this.

Lock-Free Data Structures using STMs in Haskell

Lock -Free Data Structures using STMs in Haskell. Anthony Discolo, Tim Harris, Simon Marlow, Simon Peyton Jones, and Satnam Singh. Submitted to FLOPS'06.

This paper explores the feasibility of re-expressing concurrent algorithms with explicit locks in terms of lock free code written using Haskell's implementation of Software Transactional Memory (STM). Preliminary experimental results are presented which show that for multi-processor systems the simpler lock free implementations offer competitive or superior performance when compared to their corresponding the lock based implementations.

The ArrayBlockingQueue class from JSR-166 is the basis for this experiment. A selection of representative methods from the three interfaces supported by this class is reimplemented in Haskell. Two implementations are provided: one using mutexes and semaphores from the IO monad and one based on STMs.

The performance of the two versions is compared and discussed.

The X10 Programming Language

X10 is a type-safe, modern, parallel, distributed object-oriented language intended to be very easily accessible to Java(TM) programmers. It is targeted to future low-end and high-end systems with nodes that are built out of multi-core SMP chips with non-uniform memory hierarchies, and interconnected in scalable cluster configurations. A member of the Partitioned Global Address Space (PGAS) family of languages, X10 highlights the explicit reification of locality in the form of places; lightweight activities embodied in async, future, foreach, and ateach constructs; constructs for termination detection (finish) and phased computation (clocks); the use of lock-free synchronization (atomic blocks); and the manipulation of global arrays and data structures.

X10 was mentioned here a couple of times, but I don't think it was ever discussed at length.

Here's the (relatively empty) IBM Research X10 home page.

An Overview of the Singularity Project

Singularity is a research project in Microsoft Research that started with the question: what would a software platform look like if it was designed from scratch with the primary goal of dependability? Singularity is working to answer this question by building on advances in programming languages and tools to develop a new system architecture and operating system (named Singularity), with the aim of producing a more robust and dependable software platform. Singularity demonstrates the practicality of new technologies and architectural decisions, which should lead to the construction of more robust and dependable systems...
Singularity... starts from a premise of language safety and builds a system architecture that supports and enhances the language guarantees.

An interesting overview of what sounds like an intersting project.

The choice of implementation language is also interesting:

Singularity is written in Sing#, which is an extension to the Spec# language developed in Microsoft Research. Spec# itself is an extension to Microsoft’s C# language that provides constructs (pre- and post-conditions and object invariants) for specifying program behavior. Specifications can be statically verified by the Boogie verifier or checked by compiler-inserted run-time tests. Sing# extends this language with support for channels and low-level constructs necessary for system code....integrating a feature into a language allows more aspects of a program to be verified. Singularity’s constructs allow communication to be statically verified.

An interesting aspect is the support for meta-programming, which is implemented in an unusal manner:

Compile-time reflection (CTR) is a partial substitute for the CLR’s full reflection capability. CTR is similar to techniques such as macros, binary code rewriting, aspects, meta-programming, and multi-stage languages. The basic idea is that programs may contain place-holder elements (classes, methods, fields, etc.) that are subsequently expanded by a generator.

Many other intersting design decisions are discussed in the paper (e.g., various DbC facilities), so do check it out.

Multigame A Very High Level Language for Describing Board Games

Multigame - A Very High Level Language for Describing Board Games, John W. Romein, Henri E. Bal, Dick Grune. First Annual ASCI Conference, 1995.

Languages with implicit parallelism are easier to program in than those with explicit parallelism, but finding and efficiently exploiting parallelism in general-purpose programming languages by parallelizing compilers is hard. A compiler for a Very High Level Language, designed for a specific application domain, has more knowledge about its application domain and may use this knowledge to generate efficient parallel code without requiring the programmer to deal with explicit parallelism. To investigate this idea, we designed Multigame, a Very High Level Language for describing board games. A Multigame program is a formal description of the rules of a game, from which the Multigame compiler automatically generates a parallel game playing program.

An amusing DSL, and an interesting investigation of implicit parallelism.

Also see this later paper.

It would be nice to find a downloadable implementation, by the way.

A Type Discipline for Authorization Policies

Cedric Fournet; Andrew D. Gordon; Sergio Maffeis. A Type Discipline for Authorization Policies. ESOP 2005.

Distributed systems and applications are often expected to enforce high-level authorization policies. To this end, the code for these systems relies on lower-level security mechanisms such as, for instance, digital signatures, local ACLs, and encrypted communications. In principle, authorization specifications can be separated from code and carefully audited. Logic programs, in particular, can express policies in a simple, abstract manner. We consider the problem of checking whether a distributed implementation based on communication channels and cryptography complies with a logical authorization policy. We formalize authorization policies and their connection to code by embedding logical predicates and claims within a process calculus. We formulate policy compliance operationally by composing a process model of the distributed system with an arbitrary opponent process. Moreover, we propose a new dependent type system for verifying policy compliance of implementation code. Using Datalog as an authorization logic, we show how to type several examples using policies and present a general schema for compiling policies.

Another "extreme" use of static typing...

RPC Under Fire

Steve Vinoski, RPC Under Fire, Internet Computing.

Nice discussion of the problems associated with the RPC model, which abstracts the network making remote calls look like local calls, even though they exhibit different types of failure.

Web services, JAX, and Cw are also mentioned.

Related links: here.

Threads Cannot be Implemented as a Library

Hans-J. Boehm. Threads Cannot Be Implemented As a Library. Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, June 2005, pp. 261-268. Official version. Technical report version.

In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a general understanding that this is not the right approach. We provide specific arguments that a pure library approach, in which the compiler is designed independently of threading issues, cannot guarantee correctness of the resulting code. We first review why the approach almost works, and then examine some of the surprising behavior it may entail. We further illustrate that there are very simple cases in which a pure library-based approach seems incapable of expressing an efficient parallel algorithm. Our discussion takes place in the context of C with Pthreads, since it is commonly used, reasonably well specified, and does not attempt to ensure type-safety, which would entail even stronger constraints. The issues we raise are not specific to that context.

The paper includes several interesting examples, and reminds us yet again of the importance of formally well defined semantics.

XML feed