LVars: monotonic update for deterministic parallel programming

LVars are one outcome of Lindsey Kuper's ongoing PhD work at Indiana University. They generalize existing models for deterministic parallelism by considering a general framework of monotonic read and write operations. They were briefly mentioned on LtU before (along with the strongly related work on Bloom in the distributed systems community), and were recently presented in two distinct and complementary articles.

The first article describes the basic building blocks and ideas of LVars:

LVars: Lattice-Based Data Structures for Deterministic Parallelism
Lindsey Kuper, Ryan R. Newton
2013

Programs written using a deterministic-by-construction model of parallel computation are guaranteed to always produce the same observable results, offering programmers freedom from subtle, hard-to-reproduce nondeterministic bugs that are the scourge of parallel software. We present LVars, a new model for deterministic-by-construction parallel programming that generalizes existing single-assignment models to allow multiple assignments that are monotonically increasing with respect to a user-specified lattice.

LVars ensure determinism by allowing only monotonic writes and "threshold" reads that block until a lower bound is reached. We give a proof of determinism and a prototype implementation for a language with LVars and describe how to extend the LVars model to support a limited form of nondeterminism that admits failures but never wrong answers.

The second relaxes the original model by introducing failure, which widens its applicability:

Freeze After Writing: Quasi-Deterministic Parallel Programming with LVars
Lindsey Kuper, Aaron Turon, Neelakantan Krishnaswami, Ryan R. Newton
2014

Deterministic-by-construction parallel programming models offer programmers the promise of freedom from subtle, hard-to-reproduce nondeterministic bugs in parallel code. A principled approach to deterministic-by-construction parallel programming with shared state is offered by LVars: shared memory locations whose semantics are defined in terms of a user-specified lattice. Writes to an LVar take the least upper bound of the old and new values with respect to the lattice, while reads from an LVar can observe only that its contents have crossed a specified threshold in the lattice. Although it guarantees determinism, this interface is quite limited.

We extend LVars in two ways. First, we add the ability to “freeze” and then read the contents of an LVar directly. Second, we add the ability to attach callback functions to an LVar, allowing events to be triggered by writes to it. Together, callbacks and freezing enable an expressive and useful style of parallel programming. We prove that in a language where communication takes place through freezable LVars, programs are at worst quasi-deterministic: on every run, they either produce the same answer or raise an error. We demonstrate the viability of our approach by implementing a library for Haskell supporting a variety of LVar-based data structures, together with two case studies that illustrate the programming model and yield promising parallel speedup.

Something I personally found surprising and impressive about LVars is that, while I was initially interested in the very formal aspects of providing a theoretical framework for deterministic concurrency, the work very quickly produced a practical library that people can use to write parallel programs -- one that is competitive with existing high-performance approaches. As described in a recent blog post, a Haskell library is available on Hackage -- but surely LVars-inspired libraries could make sense in a lot of other languages as well.


Single assignment, logic unification, and general lattices

Everyone comes to new research with different pre-existing (partial) knowledge and references. My personal reaction when first learning about LVars was to relate it to the use of logic variables for deterministic parallelism in Mozart/Oz, as described in the book Concepts, Techniques, and Models of Computer Programming by Peter Van Roy and Seif Haridi.

I think there is a distinction between this model and the more widely known single-assignment shared variables (e.g., IVars in Haskell). Unification is richer than assignment, as it may be performed several times; we move from an "unset/set" binary perspective to the richer world of partially defined data structures, where an undefined part may be found recursively in any subpart of the structure (and yet readers can work without blocking on the already-defined part). Looking at this from the LVars perspective, I would say that single-assignment variables make for a really boring lattice, while partially defined data already have quite an interesting order structure. (I wonder if for any given lattice you can build an isomorphic one from a partially-defined data domain.)
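To make the "partially defined data structure" point concrete, here is a minimal Haskell sketch of a list whose tail is a single-assignment cell. I am simulating IVars with MVars; the names are mine and not taken from any particular library:

```haskell
-- A minimal sketch, assuming IVars simulated with MVars (illustrative names).
import Control.Concurrent.MVar

-- A single-assignment cell: written at most once, reads block until written.
newtype IVar a = IVar (MVar a)

newIVar :: IO (IVar a)
newIVar = IVar <$> newEmptyMVar

putIVar :: IVar a -> a -> IO ()
putIVar (IVar v) x = do
  ok <- tryPutMVar v x
  if ok then return () else error "IVar written twice"

getIVar :: IVar a -> IO a        -- blocks until the cell has been filled
getIVar (IVar v) = readMVar v

-- A partially defined list: each cons cell's tail is itself a single-assignment
-- cell, so the structure can be "undefined" at any depth.
data Stream a = Nil | Cons a (IVar (Stream a))

-- A reader can walk the already-defined prefix freely, blocking only when it
-- actually needs the first undefined tail.
takeStream :: Int -> Stream a -> IO [a]
takeStream n _ | n <= 0  = return []
takeStream _ Nil         = return []
takeStream n (Cons x tl)
  | n == 1    = return [x]                  -- no need to touch the undefined tail
  | otherwise = do rest <- getIVar tl
                   (x :) <$> takeStream (n - 1) rest
```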

Interestingly enough, the extension of single-assignment variables into logic-programming unification was already explicitly mentioned in the original I-structure paper from Arvind et al. cited in the first LVars paper:

In I-Structures: Data Structures for Parallel Computing
Arvind, Rishiyur S. Nikhil and Keshav K. Pingali
1989

[...]

A component of an I-structure A may be selected (read) using the expression:
A[i]
This expression returns a value only after the location becomes nonempty, i.e., after some other part of the program has assigned the value.

There is no test for emptiness of an I-structure location. These restrictions -- write-once, deferred reads, and no test for emptiness -- ensure that the language remains determinate; there are no read-write races. Thus, the programmer need not be concerned with the timing of a read relative to a write. All reads of a location return a single, consistent value, albeit after an arbitrary delay.

5.1.4 Discussion

Semantically, one can think of each location in an I-structure as containing a logical term. Initially, the term is just a logic variable -- it is completely unconstrained. Assignment to that location can be viewed as a refinement of, or constraint on the term at that location. This is what motivates our calling it a "constraint statement". The single-assignment rule is sufficient to preclude inconsistent instantiations of the initial logic variable. Of course, the single-assignment rule is not a necessary condition to avoid inconsistent instantiations. We could take the view that assignment is really unification, and then multiple writes would be safe so long as the values unify. Id does not currently take this view, for efficiency reasons.
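To make the "assignment is really unification" remark concrete, here is a small Haskell sketch of a cell that tolerates repeated writes as long as they agree (my own illustration, not how Id actually implements I-structures):

```haskell
-- A minimal sketch of "multiple writes are safe so long as the values unify",
-- specialized to ground values where unification is just equality.
import Control.Concurrent.MVar

newtype UCell a = UCell (MVar (Maybe a))

newUCell :: IO (UCell a)
newUCell = UCell <$> newMVar Nothing

-- "Unifying" write: the first write fills the cell; later writes are allowed
-- only if they agree with the stored value, otherwise the instantiation is
-- inconsistent and we fail.
writeUCell :: (Eq a, Show a) => UCell a -> a -> IO ()
writeUCell (UCell v) x = modifyMVar_ v $ \old -> case old of
  Nothing            -> return (Just x)
  Just y | y == x    -> return (Just y)
         | otherwise -> error ("inconsistent writes: " ++ show y ++ " vs " ++ show x)
```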

Another citation that I feel is relevant is a remark from the Ozma project (already discussed on LtU), which tries to import the Oz programming concepts into Scala, purely as a library. In particular, they do not support full unification as in Oz, but only single-assignment variables (more precisely, what they call "shallow unification": you can still unify two undefined variables together). They didn't find this loss of expressiveness problematic in practice:

In Ozma: Extending Scala with Oz Concurrency
Sébastien Doeraene
2011

[...]

Oz supports full unification. That means that the = operator will dive into data structures, and make them structurally equal. In Ozma, structures do not exist: all data structures are objects. Now, the = operator does not dive into objects.

Therefore, Ozma does not support full unification. It only supports variable-variable, i.e. shallow unification. This means that we cannot unify two partially filled lists and expect them to merge: this will fail instead because the two lists are two different instances.

Shallow unification is essential to all Oz programming techniques, while full unification is only rarely needed. We chose to drop full unification in Ozma because it fits better in the object model, while still preserving most of Oz programming techniques.

Note that this shallow unification is still very powerful: many unifications can take place in any order, and the result will always be the same.

The "Related Work" section of the first LVar paper has a very clear description of how to go from these work, and also the notion of monotonicity in Kahn's process networks and children (which were also cited as an inspiration in the CTM book), to the generalization to arbitrary notions of monotonic update.

In data-flow languages like StreamIt, communication takes place over FIFOs with ever-increasing channel histories, while in IVar-based systems such as CnC and monad-par, a shared data store of single-assignment memory cells grows monotonically. Hence monotonic data structures -- those to which information is only added and never removed -- emerge as a common theme of both data-flow and single-assignment models.

Because state modifications that only add information and never destroy it can be structured to commute with one another and thereby avoid race conditions, it stands to reason that diverse deterministic parallel programming models would leverage the principle of monotonicity. Yet there is little in the way of a theory of monotonic data structures as a basis for deterministic parallelism. As a result, systems like CnC, monad-par and StreamIt emerge independently, without recognition of their common basis. Moreover, they lack generality: IVars and FIFO streams alone cannot support all producer/consumer applications, as we discuss in Section 2.

I wonder

I wonder if for any given lattice you can build an isomorphic one from a partially-defined data domain

Given an element of a lattice L, you can identify it with an element of L -> Bool, by x |-> (\y. y <= x). So just treat False as undefined. If the lattice has the property that there are only finitely many elements less than a given element, then these maps L -> Bool could be implemented as finite lists where the missing tail is taken to be const False.
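Here is a small Haskell sketch of that encoding; the names (Lattice, downSet, encodeNat) are mine, purely for illustration:

```haskell
-- A small sketch of the encoding described above.
class Lattice a where
  leq :: a -> a -> Bool   -- the lattice order

-- Identify x with its characteristic function \y -> y <= x; False plays the
-- role of "undefined".
downSet :: Lattice a => a -> (a -> Bool)
downSet x = \y -> y `leq` x

-- For a counter over the naturals (only finitely many elements below any
-- given one), the predicate can be stored as a finite list of Bools whose
-- missing tail is read as False.
encodeNat :: Int -> [Bool]
encodeNat n = replicate (n + 1) True     -- positions 0..n are <= n

decodeNat :: [Bool] -> Int
decodeNat bs = length (takeWhile id bs) - 1
```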

Encoding LVars as I-structures

I wonder if for any given lattice you can build an isomorphic one from a partially-defined data domain.

If I understand the question correctly, then it can be rephrased as: can you express every LVar as an I-structure (that is, an array of IVars)? I think the answer is "yes, although the space complexity is terrible". The canonical example of an LVar that isn't already an I-structure is a simple monotonic natural number counter, where the order is just the usual numeric order on the naturals.
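To illustrate the "terrible space complexity", here is a rough Haskell sketch of a unary encoding of such a counter as an array of write-once cells (the fixed bound and the names are mine, purely for illustration; real LVars would not be built this way):

```haskell
-- A rough sketch: space is proportional to the largest value the counter can
-- reach, which is exactly the problem.
import Control.Concurrent.MVar
import Control.Monad (forM_, void)

-- Cell i is filled iff the counter has reached at least i+1.
newtype Counter = Counter [MVar ()]

newCounter :: Int -> IO Counter
newCounter bound = Counter <$> mapM (const newEmptyMVar) [1 .. bound]

-- Monotonic write "raise the counter to at least n": fill cells 0..n-1.
-- Racing writes just fill overlapping prefixes, so they commute.
writeCounter :: Counter -> Int -> IO ()
writeCounter (Counter cells) n =
  forM_ (take n cells) $ \c -> void (tryPutMVar c ())

-- Threshold read "has the counter reached at least n?": block on cell n-1
-- and return only the fact that the threshold was crossed.
thresholdRead :: Counter -> Int -> IO ()
thresholdRead (Counter cells) n = readMVar (cells !! (n - 1))
```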

So the next interesting question to ask is, what can you do to cut down on space usage, and how does that affect determinism? (Perhaps related.)

IVars plus unification

Yes, indeed! I'm so glad that you're familiar with the 1989 paper. Arvind was my advisor in grad school, and I feel like this earlier work on Id, pH, and dataflow is under-appreciated today.

So, yes, the LVar project grew out of a fascination with this problem of the under-expressiveness of IVars without unification, and speculation about what they can do with unification (as hinted at in the 1989 paper).

Also, at Intel we worked on a system called "CnC" that is based on growing tables, and there's even a Haskell implementation, though the original implementation was replaced by monad-par. But the monad-par implementation loses something precisely because IVars alone cannot implement a table-like structure!

So we started off looking at IVars with explicit unification on empty IVars (like logic or type variables). But there are some asymptotic losses of efficiency with that model, and the lattice-based one is at least as general. Lindsey's recent work has borne this out, but there's a lot left to do in this space to expand what is possible with deterministic parallelism.

Improving Values

An interesting design I first heard of from Conal Elliott (though mentioned only in passing) is 'improving values'. The idea here is that a value is reported as a range of possible values, possibly indicating an error range or confidence interval. Over time, we may incrementally and monotonically tighten this range, i.e. we are improving our knowledge of the value.

Conal used this concept of improving values for timestamps of events. I.e. the FRP system is continuously tightening the lower bound for the 'next' event. A query on such a variable might ask whether the lower bound for the next event was greater than a certain timestamp, and would wait until it resolved. But I think there are quite a few other interesting applications for improving values.

Anyhow, improving values seem like they'd be a natural use case for LVars.
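For concreteness, here is a rough Haskell sketch of an improving value as a monotonically rising lower bound (illustrative only; this is not Conal Elliott's implementation nor the lvish API):

```haskell
-- An improving value as a lower bound that only ever rises.
import Control.Concurrent.STM

newtype Improving = Improving (TVar Double)   -- current lower bound

newImproving :: Double -> IO Improving
newImproving lo = Improving <$> newTVarIO lo

-- Monotonic update: only ever raise the bound (a join in the "at least x"
-- lattice), so concurrent improvements commute and the result is deterministic.
improveTo :: Improving -> Double -> IO ()
improveTo (Improving v) lo = atomically (modifyTVar' v (max lo))

-- Threshold-style query: block until the value is known to exceed t.
knownAbove :: Improving -> Double -> IO ()
knownAbove (Improving v) t = atomically $ do
  lo <- readTVar v
  if lo > t then return () else retry
```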

Burton's papers

Improving values are really interesting. There are two papers that I'm aware of, both from JFP:

F. Warren Burton (1991). Encapsulating non-determinacy in an abstract data type with determinate semantics. Journal of Functional Programming, 1, pp 3-20. doi:10.1017/S0956796800000046.

W. Ken Jackson and F. Warren Burton (1993). Improving intervals. Journal of Functional Programming, 3, pp 153-169. doi:10.1017/S0956796800000678.

Of the two, I've only looked closely at "Encapsulating nondeterminacy", the first, and it sounds as though the FRP work is more related to "Improving intervals", the second. But in any case, it's clear that there's some kind of relationship between improving values and LVars.

From all the notes I took on the first paper, one thing jumps out at me now: Burton defines a "flat" domain as one where values are either completely defined or are bottom. Various "awkward" problems can be avoided by not allowing non-flat domains like lazy lists to be improving values. And if we do avoid non-flat domains, then we can prove certain axioms about the behavior of operations on improving values. The "flat domain" thing sounds rather IVar-ish, to me.

Having said all that, I didn't find these papers easy to read, and they're showing their age a little -- the code examples are in Miranda. Maybe someone here has a reference to a more up-to-date introduction to improving values.

Typing these things

I haven't gotten into the paper deeply yet, but presumably the idea is that it's possible to use these new freezable LVars in such a way that errors can't happen. So what would a type system look like that shows that these race conditions don't happen for a particular program and that things are fully deterministic? Would that be reasonable or impractical?

A minor type-and-effect system

Oh no, that is quite reasonable. As you can see in the POPL paper we have a simple "determinism-level" phantom type argument that parameterizes this Par monad. It provides a very lightweight type-and-effect system where if you use the "freeze" operation, it will have to show up in the type of the computation. (This is the "d" argument to the Par constructor found in the library.)

We should emphasize, however, that while some apps need to use freeze directly, and it is interesting from a research perspective, many, many apps can get away with doing only one freeze at the very end of the runPar. That is supported by the DeepFrz module, documented here.
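To give a feel for the idea, here is a stripped-down illustration of a phantom determinism-level parameter in the spirit of the library; these are not the actual lvish types or names:

```haskell
{-# LANGUAGE DataKinds, KindSignatures, GeneralizedNewtypeDeriving #-}
-- A sketch: a phantom "determinism level" index on a Par-like monad, so that
-- using freeze forces the quasi-deterministic level to appear in the type.

data Determinism = Det | QuasiDet

-- 'd' is a phantom parameter recording whether freeze may have been used.
newtype Par (d :: Determinism) a = Par (IO a)
  deriving (Functor, Applicative, Monad)

-- Writes and threshold reads are available at any determinism level...
incrCounter :: Par d ()
incrCounter = Par (return ())        -- stand-in body for the sketch

-- ...but freezing is only typeable at QuasiDet, so it shows up in types.
freezeCounter :: Par 'QuasiDet Int
freezeCounter = Par (return 0)       -- stand-in body for the sketch

-- A runner that accepts only fully deterministic computations.
runParDet :: Par 'Det a -> IO a
runParDet (Par io) = io
```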

Two more things

Two things I planned to say in the description post:

(1) It's really easy to experiment with the Haskell library: just cabal install lvish. There are code examples in this blog post.

(2) It's easy to overlook the "get" operation on LVars. You can probably guess how "write" behaves, but all the subtlety of LVars is packed into the read construct, which is annotated with a "threshold set" delimiting what can be observed of the value read. It's interesting and non-trivial, so you should probably have a look -- see, for example, page 3 of the second paper.
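To sketch what a threshold read looks like (purely illustrative Haskell, not the lvish API), consider an LVar over the four-point lattice Bot < {T, F} < Top, read with the pairwise-incompatible threshold set {T, F}:

```haskell
-- A sketch of a threshold read on a tiny lattice.
import Control.Concurrent.STM

data State = Bot | T | F | Top deriving (Eq, Show)

newtype Flag = Flag (TVar State)

newFlag :: IO Flag
newFlag = Flag <$> newTVarIO Bot

-- Monotonic write: store the least upper bound of the old and new states.
-- Reaching Top means two conflicting writes raced; that surfaces as an error
-- at read time (quasi-determinism: an error, never a wrong answer).
writeFlag :: Flag -> State -> IO ()
writeFlag (Flag v) new = atomically (modifyTVar' v (lub new))
  where
    lub Bot x = x
    lub x Bot = x
    lub x y   = if x == y then x else Top

-- Threshold read with threshold set {T, F}: block until the state has crossed
-- one of the two thresholds, then return that threshold element only.
-- Because T and F have no common upper bound short of Top, no successful run
-- can observe both, so every successful run sees the same answer.
readFlag :: Flag -> IO State
readFlag (Flag v) = atomically $ do
  st <- readTVar v
  case st of
    T   -> return T
    F   -> return F
    Top -> error "conflicting writes to Flag"
    Bot -> retry
```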