Parallel/Distributed

Is Transactional Programming Actually Easier?

Is Transactional Programming Actually Easier?, WDDD '09, Christopher J. Rossbach, Owen S. Hofmann, and Emmett Witchel.

Chip multi-processors (CMPs) have become ubiquitous, while tools that ease concurrent programming have not. The promise of increased performance for all applications through ever more parallel hardware requires good tools for concurrent programming, especially for average programmers. Transactional memory (TM) has enjoyed recent interest as a tool that can help programmers program concurrently.

The TM research community claims that programming with transactional memory is easier than alternatives (like locks), but evidence is scant. In this paper, we describe a user-study in which 147 undergraduate students in an operating systems course implemented the same programs using coarse and fine-grain locks, monitors, and transactions. We surveyed the students after the assignment, and examined their code to determine the types and frequency of programming errors for each synchronization technique. Inexperienced programmers found baroque syntax a barrier to entry for transactional programming. On average, subjective evaluation showed that students found transactions harder to use than coarse-grain locks, but slightly easier to use than fine-grained locks. Detailed examination of synchronization errors in the studentsâ€™ code tells a rather different story. Overwhelmingly, the number and types of programming errors the students made was much lower for transactions than for locks. On a similar programming problem, over 70% of students made errors with fine-grained locking, while less than 10% made errors with transactions.

I've recently discovered the Workshop on Duplicating, Deconstructing, and Debunking (WDDD) and have found a handful of neat papers, and this one seemed especially relevant to LtU.

[Edit: Apparently, there is a PPoPP'10 version of this paper with 237 undergraduate students.]

Also, previously on LtU:

Transactional Memory versus Locks - A Comparative Case Study

Despite the fact Tommy McGuire's post mentions Dr. Victor Pankratius's talk was at UT-Austin and the authors of this WDDD'09 paper represent UT-Austin, these are two independent case studies with different programming assignments. The difference in assignments is interesting because it may indicate some statistical noise associated with problem domain complexity (as perceived by the test subjects) and could account for differences between the two studies.

Everyone always likes to talk about usability in programming languages without trying to do it. Some claim it can't even be done, despite the fact Horning and Gannon did work on the subject 3+ decades ago, assessing how one can Language Design to Enhance Program Reliability. This gives a glimpse both on (a) why it is hard (b) how you can still try to do usability testing, rather than determine the truthiness of a language design decision.

Joe Duffy: A (brief) retrospective on transactional memory

A (brief) retrospective on transactional memory, by Joe Duffy, January 3rd, 2010. Although this is a blog post, don't expect to read it all on your lunch break...

The STM.NET incubator project was canceled May 11, 2010, after beginning public life July 27, 2009 at DevLabs. In this blog post, written 4 months prior to its cancellation, Joe Duffy discusses the practical engineering challenges around implementing Software Transactional Memory in .NET. Note: He starts off with a disclaimer that he was not engaged in the STM.NET project past its initial working group phase.

In short, Joe argues, "Throughout, it became abundantly clear that TM, much like generics, was a systemic and platform-wide technology shift. It didnâ€™t require type theory, but the road ahead sure wasnâ€™t going to be easy." The whole blog post deals with how many implementation challenges platform-wide support for STM would be in .NET, including what options were considered. He does not mention Maurice Herlihy's SXM library approach, but refers to Tim Harris's work several times.

There was plenty here that surprised me, especially when you compare Concurrent Haskell's STM implementation to STM.NET design decisions and interesting debates the team had. In Concurrent Haskell, issues Joe raises, like making Console.WriteLine transactional, are delegated to the type system by the very nature of the TVar monad, preventing programmers from writing such wishywashy code. To be honest, this is why I didn't understand what Joe meant by "it didn't require type theory" gambit, since some of the design concerns are mediated in Concurrent Haskell via type theory. On the other hand, based on the pragmatics Joe discusses, and the platform-wide integration with the CLR they were shooting for, reminds me of The Transactional Memory / Garbage Collection Analogy. Joe also wrote a briefer follow-up post, More thoughts on transactional memory, where he talks more about Barbara Liskov's Argus.

Intel Concurrent Collections for Haskell

Intel Concurrent Collections...provides a mechanism for constructing [programs] that will execute in parallel while allowing the application developer to ignore issues of parallelism such as low-level threading constructs or the scheduling and distribution of computations. The model allows the programmer to specify high-level computational steps including inputs and outputs without imposing unnecessary ordering on their execution... Data is either local to a computational step or it is explicitly produced and consumed by them. An application in this programming model supports multiple styles of parallelism (e.g., data, task, pipeline parallel). While the interface between the computational steps and the runtime system remains unchanged, a wide range of runtime systems may target different architectures (e.g., shared memory, distributed) or support different scheduling methodologies (e.g., static or dynamic).

In short CnC purports to be a "a graph-based data-flow-esque deterministic parallel programming model". An open-source Haskell edition of the software was recently released on Hackage.

A series of blog posts describe the implementation: #1, #2, #3, #4, #5 (the last post links to a draft paper).

Personally, I would have preferred a more concise and down to earth description on the whole thing. If you have experiences to share, please do.

The Resurgence of Parallelism

Peter J. Denning and Jack B. Dennis, The Resurgence of Parallelism, Communications of the ACM, Vol. 53 No. 6, Pages 30-32, 10.1145/1743546.1743560

"Multi-core chips are a new paradigm!" "We are entering the age of parallelism!" These are today's faddish rallying cries for new lines of research and commercial development. ... The parallel architecture research of the 1960s and 1970s solved many problems that are being encountered today. Our objective in this column is to recall the most important of these results and urge their resurrection.

A brief but timely reminder that we should avoid reinventing the wheel. Denning and Dennis give a nice capsule summary of the history of parallel computing research, and highlight some of the key ideas that came out of earlier research on parallel computing. This isn't a technically deep article. But it gives a quick overview of the field, and tries to identify some of the things that actually are research challenges rather than problems for which the solutions have seemingly been forgotten.

Brians functional brain

A rather fun and instructive series of posts about implementing a cellular automaton in idiomatic Clojure, and making it run fast (and in parallel).

Be sure to read the second part, and the errata.

The AI Systems of Left 4 Dead

There's no PL content per se in this presentation, but a PL weenie will surely think of a DSL for almost every slide, so I hope posting this is warranted.

Tim Sweeney has written previously on the programming challenges raised by game development (The Next Mainstream Programming Languages: A Game Developer's Perspective), and I think this presentation is another showcase of the huge problems that need solving in game development.

Finally, I wonder why anyone would use a language that doesn't allow quick and simple syntax extension for driving a game engine? Seriously, the possibilities for ad-hoc DSLs seem endless.

Developer Guided Code Splitting

Google Web Toolkit, which compiles Java to JavaScript for running code in the browser, now includes Code Splitting, for reducing application download time:

To split your code, simply insert calls to the method GWT.runAsync at the places where you want the program to be able to pause for downloading more code. These locations are called split points.

A call to GWT.runAsync is just like a call to register any other event handler. The only difference is that the event being handled is somewhat unusual. Instead of being a mouse-click event or key-press event, the event is that the necessary code has downloaded for execution to proceed.

Tim Bray on Clojure and Erlang

A short comparison (plus some links) of Erlang and Clojure solutions to the simple problem of running a counter in a separate thread.

Parallel Performance Tuning for Haskell

Parallel Performance Tuning for Haskell. Don Jones Jr., Simon Marlow, and Satnam Singh.

Parallel Haskell programming has entered the mainstream with support now included in GHC for multiple parallel programming models, along with multicore execution support in the runtime. However, tuning programs for parallelism is still something of a black art. Without much in the way of feedback provided by the runtime system, it is a matter of trial and error combined with experience to achieve good parallel speedups. This paper describes an early prototype of a parallel profiling system for multicore programming with GHC. The system comprises three parts: fast event tracing in the runtime, a Haskell library for reading the resulting trace files, and a number of tools built on this library for presenting the information to the programmer. We focus on one tool in particular, a graphical timeline browser called ThreadScope. The paper illustrates the use of ThreadScope through a number of case studies, and describes some useful methodologies for parallelizing Haskell programs.

A Java Fork/Join Framework

Doug Lea: A Java Fork/Join Framework, Proceedings of the ACM 2000 conference on Java Grande.

This paper describes the design, implementation, and performance of a Java framework for supporting a style of parallel programming in which problems are solved by (recursively) splitting them into subtasks that are solved in parallel, waiting for them to complete, and then composing results. The general design is a variant of the workâˆ’stealing framework devised for Cilk.

This work is about to be incorporated into Java 7 as jsr166y:

Parallel*Array (often referred to as PA) and its planned follow-ons for sets and maps, provide an easier/better way of routinely programming to take advantage of dozens to hundreds of processors/cores: If you can think about a programming problem in terms of aggregate operations on collections of elements, then we can automate parallel execution. This generally pays off if either you have lots of elements, (in which case, it works well even if the operations are small/cheap), or if each of the operations are time consuming (in which case it works well even if there are not a lot of elements). To take advantage of this though, the aggregate processing must have a regular structure, which means that you must be able to express things in terms of apply, reduce, filter, map, cumulate, sort, uniquify, paired mappings, and so on.

Lambda the Ultimate

User login

Navigation