Stressed by Distributed Programming? Well, CALM Down.

in Consistency Analysis in Bloom: a CALM and Collected Approach Peter Alvaro, Neil Conway, Joseph M. Hellerstein, William R. Marczak of UC, Berkeley say

Distributed programming has become a topic of widespread interest, and many programmers now wrestle with tradeoffs between data consistency, availability and latency. Distributed transactions are often rejected as an undesirable tradeoffs today, but in the absence of transactions there are few concrete principles or tools to help programmers design and verify the correctness of their applications.

We address this situation with the CALM principle, which connects the idea of distributed consistency to program tests for logical monotonicity. We then introduce Bloom, a distributed programming language that is amenable to high-level consistency analysis and encourages order-insensitive programming. We present a prototype implementation of Bloom as a domain-specific language in Ruby. We also propose a program analysis technique that identifies points of order in Bloom programs: code locations where programmers may need to inject coordination logic to ensure consistency. We illustrate these ideas with two case studies: a simple key-value store and a distributed shopping cart service.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

now i want some cereal

"tradeos" :-)

(i assume it was that thing where pdfs sometimes in some slices of the muliverse render two consecutive f's as a ligatured super italic ff logo of alien origin, which then doesn't copy and paste well at all. at least, that is what i've run across in my interdimensional travels.)

Impossible to download the feed

Hi,
indeed this causes my RSS feed reader (thunderbird) to classify the feed as not well-formed (check here). This renders impossible to download the new items or register to the feed.

Bye,
Tarch

tl;dr

but given that i just started being forced to use cassandra, and somebody else's app code on top of it, i'm hoping this thing hurries up and pans out. i just can't get behind this whole eventual consistency and no acid approach. :-)

Nobody uses ACID

At least, not full ACID. Even the big SQL DBMS vendors don't provide full ACID by default for transactions - for example, a multiple-query transaction will not have full isolation in Oracle unless you specify it. ACID just costs too much. Besides that, ACID is far too difficult to integrate with actuators and FFI.

I've grown rather fond of BASE (basically available, soft state, eventual consistency) as an alternative to ACID. But it is difficult to verify that programs will will actually exhibit eventual consistency.

CALM achieves BASE by construction, leveraging the fact that monotonic additions to a dataset are order-independent and thus eventually consistent. (Or, at least it does so on a per-instant basis. Temporal logic is used to model non-monotonic updates and indeterministic delay for IO.)

Unfortunately, I don't see this working for distributed programs written in Bloom. Channel IO in Bloom introduces indeterministic delay. If different replicas of a program use different indeterministic delays for channel operations, they would become inconsistent.

My name is nobody

I think "nobody uses ACID" and "ACID just costs too much" are strange claims.

As much as I admire the nice properties of an eventually consistent architecture, I don't think you'd want it for the general, ordinary case.

Take a common CRUD application: the user posts a new item into some list, and the app then presents an updated view of the list. With EC, there's no guarantee that the item, although successfully posted, will appear in the list. This is simply unacceptable from a user experience standpoint.

Two of the major database-as-a-service offerings, Google App Engine and Amazon SimpleDB, both support consistency. GAE from day one, SimpleDB added it later as an option.

GAE also supports "full ACID", albeit limited to "entity groups" - sets of records manually demarcated by the programmer to belong to a transactional scope. I still call that full ACID, but maybe you'll differ.

As to the cost: my hunch is that ACID may "cost more" performance-wise, but should easily make that up through saved development costs.

With EC, there's no

With EC, there's no guarantee that the item, although successfully posted, will appear in the list. This is simply unacceptable from a user experience standpoint.

The counter-argument is that without EC, you can't make your application robust enough to guarantee that your user will be able to access your program at all. A slight delay suddenly seems ok.

Well, not EC specifically, but one of the CAP properties.

Availability doesn't always trump consistency

without EC, you can't make your application robust enough to guarantee that your user will be able to access your program at all

Yes, but availability doesn't always trump consistency. Would you rather not be able to access your e-banking account, or see an inconsistent view of it?

A slight delay suddenly seems ok.

There's no bound on the delay.

Posting an Item

With EC, there's no guarantee that the item, although successfully posted, will appear in the list.

The 'eventual consistency' property by itself does not make this guarantee, true, but your user application could still make that guarantee and be compatible with EC. For example, you could immediately see your post, but it may be that other posts from concurrent users will be injected before and after it over the next few seconds; EC would only ensure that all concurrent users have a consistent view after the updates settle.

Eventual consistency is, in fact, compatible with real-time programming within a given subsystem (such as a user application).

Posting another item

(I was implicitly (and am) making the assumption that the context is plain, stateless web apps. Because it's so common. In other paradigms, this issue may not be as pressing.)

Yes, you could make that work in a simple way for one item: just make it "magically" appear in the list, even though it's not (yet) really there. This is easy because you just received the item from the user, so you have it in memory/"core".

But now imagine the user posts another item. So now you have to make 2 items magically appear in the list, since the first may still not be in the list, due to EC. Again, it's easy to make the new, second item appear in the list. But - assuming the stateless web app model - where do you get the first item from?

This may seem like a contrived example, but I believe it shows how EC pushes complexity into the app.

(Another way this "magic" could break down is if you require some server-side processing of an item (such as assignment of an ID) before you can display it.)

I don't want to push this too hard, but I have tried to write web apps using EC, and found it too painful.

Stateless apps

Given that EC is developed to support partitioning tolerance and availability (for example, the ability to add items to a list while offline then restore consistency once everyone is back online), it would be more idiomatic to assume a stateful application, or at least something like a distributed data store or unum model for the shared 'list'. Even if the app itself is stateless, the 'post' operation or equivalent would certainly hang around until acknowledged.

I'll grant that the HTTP+HTML+JS+DOM+CSS stack with a monolithic RDBMS backing store is ill-adapted to eventual consistency. But we're PL folk! If we would like to use or experiment with EC, and it seems 'painful' to use, we see that as a challenge to develop a programming language and technology stack that solves our problem.

Programmers don't understand accounting systems

All real-world accounting follows an eventually consistent-similar process where you reconcile your books at the end of an accounting "period", for regulatory purposes. Each party can also confirm if the data is consistent at the end of a period, based on the end-of-period report. Similarly, when you POST something to the general ledger, you have a journal entry indicating what was done when and for who. This can be used later on when doing reconcilation. This is similar to designing a RESTful interface with good POST semantics (idempotency). In fact, I am a firm believer accounting systems are the best example of REST for business people. If somehow the POST is submitted twice, the business logic layer has an additional layer and way to check for redundant TXNs.

Also, your bank statement on your debit card will usually fluctuate with missing transactions. Usually, if you check your statement online and use a debit card to pay for gas, you'll see a minimum $1.00 charge on your account with a * next to it. That * indicates that it is a placeholder for a payment, and all self-pay gas pumps charge your card $1.00 just to see you have funds available and that the card works, before they let you pump your gas. Within 3 business days, it is gone.

Useful

The language in this paper, Bloom, is based upon temporal logic programming, specifically upon Dedalus, which adds time to Datalog. Bloom contributes structure to support modularity, networking/FFI (via 'channels'), and timers to standardize persistence and interaction between systems. Bloom uses relations (sets of tuples) rather than a global data space.

Bloom has a surprisingly stateful and procedural 'feel' to it for a language that purports to be fully declarative. Arbitrary 'modules' (basically, objects instanced at compile-time) may introduce state and delay and request periodic timer events. Channels automatically introduce temporal indeterminism, so what you'll have are little islands (or 'vats') of determinism with indeterministic delay between them. I guess I was a little disappointed by this; I'm rather fond of controlling side-effects, and I'm skeptical of my ability to do any real reasoning about behavior of a large program over time.

I do appreciate the collection and dataflow oriented nature of Bloom. As procedural languages go, I suspect Bloom would be a rather nice one - no branches (contingent behavior is possible via logical joins), no explicit loops (looping feasible with time-step functions and periodic events; whole-collection operations), simple semantics.

Other paradigms with similar properties to temporal logic include temporal concurrent constraint programming, functional reactive programming, and reactive demand programming. Temporal databases would fit in there somewhere, too.