
On Understanding Data Abstraction, Revisited

One of the themes of Barbara Liskov's Turing Award lecture ("CS History 101") was that nobody has invented a better programming concept than abstract data types. William Cook wrote a paper for OOPSLA '09 that looks at how well PLT'ers understand their own vocabulary, in particular abstract data types and concepts that, on the syntactic surface, all blend together to look like ADTs. The paper is On Understanding Data Abstraction, Revisited.

In 1985 Luca Cardelli and Peter Wegner, my advisor, published an ACM Computing Surveys paper called “On understanding types, data abstraction, and polymorphism”. Their work kicked off a flood of research on semantics and type theory for object-oriented programming, which continues to this day. Despite 25 years of research, there is still widespread confusion about the two forms of data abstraction, abstract data types and objects. This essay attempts to explain the differences and also why the differences matter.

The Introduction goes on to say:

What is the relationship between objects and abstract data types (ADTs)? I have asked this question to many groups of computer scientists over the last 20 years. I usually ask it at dinner, or over drinks. The typical response is a variant of “objects are a kind of abstract data type”. This response is consistent with most programming language textbooks.

[...]

So what is the point of asking this question? Everyone knows the answer. It’s in the textbooks.

[...]

My point is that the textbooks mentioned above are wrong! Objects and abstract data types are not the same thing, and neither one is a variation of the other.

Ergo, if the textbooks are wrong, then your Dinner Answer to (the) Cook is wrong! The rest of the paper explains how Cook makes computer scientists sing for their supper ;-)

When I’m inciting discussion of this topic over drinks, I don’t tell the full story up front. It is more fun to keep asking questions as the group explores the topic. It is a lively discussion, because most of these ideas are documented in the literature and all the basic facts are known. What is interesting is that the conclusions to be drawn from the facts are not as widely known.
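To make the distinction concrete, here is a sketch in Python (not code from the paper; the class names here are mine) using integer sets, the paper's running example. The key difference: an ADT operation may inspect the hidden representation of *both* of its arguments, while an object may only use its argument's interface.

```python
# ADT style: one hidden representation shared by all values of the type,
# so operations may reach into the representation of both arguments.
class IntSetADT:
    def __init__(self, elems=()):
        self._rep = frozenset(elems)   # hidden concrete representation

    def contains(self, n):
        return n in self._rep

    def union(self, other):
        # Legal for an ADT: inspect other's representation directly,
        # since every value of the type has the same implementation.
        return IntSetADT(self._rep | other._rep)


# Object style: a "set" is anything answering the interface (here,
# just contains). union can only *ask* its argument questions.
class EvensSet:
    def contains(self, n):
        return n % 2 == 0              # an infinite set, no stored rep


class UnionSet:
    def __init__(self, a, b):
        self.a, self.b = a, b

    def contains(self, n):
        # Only the interface of a and b is used, never their reps.
        return self.a.contains(n) or self.b.contains(n)


# Distinct implementations interoperate, which the ADT's union cannot do:
s = UnionSet(EvensSet(), IntSetADT([3]))
print(s.contains(3))   # True
print(s.contains(4))   # True
```

The object version pays a price (union cannot exploit the representation of its argument for efficiency), but gains interoperability across independently written implementations, including infinite sets like `EvensSet`.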

Gilad Bracha on "Atomic Install"

Atomic Install from Bracha's blog.

I'm happy to see people talking about this, mostly because it's a pet topic of mine. I think too much ink has been spilled trying to shoehorn "dynamic languages" (by which I mean the usual suspects: Smalltalk-esque, mostly, and Python, PHP, etc.) into the vocabulary and mindset of type theory. Meanwhile, the operational model of these languages is fundamentally different from that mindset, and they are mostly quite similar to one another: the semantics defines some run-time data model for programs, and then the program updates that model, usually in a fairly straightforward procedural way. In this sense, most of these languages can really be viewed as data description languages in disguise. Ruby is a great example of this: all of its OO constructs are best viewed as syntactic sugar (semantic sugar?) over a fairly simple set of hashtable updates.
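Python makes the "sugar over hashtable updates" claim quite literal: a `class` statement essentially builds a namespace dictionary and hands it to `type`, and methods can be added afterwards by plain attribute assignment. A small illustration (the class names are mine):

```python
# The sugared version:
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def norm2(self):
        return self.x ** 2 + self.y ** 2


# The "desugared" version: explicit hashtable construction, then a
# single call to type() to install it as a class.
def _init(self, x, y):
    self.x, self.y = x, y

def _norm2(self):
    return self.x ** 2 + self.y ** 2

namespace = {"__init__": _init, "norm2": _norm2}
Point2 = type("Point2", (object,), namespace)

# And further "program updates to the data model" are just more
# attribute (i.e. hashtable) updates, performed at any time:
Point2.scale = lambda self, k: Point2(self.x * k, self.y * k)

print(Point(3, 4).norm2())            # 25
print(Point2(3, 4).scale(2).norm2())  # 100
```

Both classes behave identically; the second just makes the run-time data model, and the updates to it, visible.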

(Of course, the runtime data model may be abstract, as in JIT-compiled code, and the runtime may not need to actually build and maintain it in a naive way, but that's hidden from the programmer.)

Anyway, given the above, it has for a long time struck me that a better direction for providing safety and static checking in such languages may be to borrow techniques from transaction processing. One might define suitable notions of data consistency and then insist that at particular semantic checkpoints, those consistency properties are guaranteed to hold. Atomic install is exactly what one such mechanism might look like.
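As a rough illustration of the idea (the API, names, and invariant below are hypothetical, not Bracha's design), one could stage a batch of updates to a class's method table, check a consistency property at the commit point, and install the whole batch all-or-nothing:

```python
class AtomicInstallError(Exception):
    pass

def atomic_install(cls, updates, invariants):
    """Apply `updates` (name -> attribute) to `cls` transactionally:
    either every update lands and all invariants hold, or nothing changes."""
    # Stage: copy the class's current method table plus the new entries.
    # (__dict__/__weakref__ slots must be dropped before rebuilding a class.)
    staged = {k: v for k, v in cls.__dict__.items()
              if k not in ("__dict__", "__weakref__")}
    staged.update(updates)
    # Validate: build a trial class and run every consistency check on it.
    trial = type(cls.__name__, cls.__bases__, staged)
    for check in invariants:
        if not check(trial):
            raise AtomicInstallError(f"invariant failed: {check.__name__}")
    # Commit: the checkpoint passed, so install the updates for real.
    for name, value in updates.items():
        setattr(cls, name, value)


class Counter:
    def __init__(self):
        self.n = 0
    def inc(self):
        self.n += 1

def has_matching_dec(cls):
    # A toy consistency property: inc and dec must come as a pair.
    return hasattr(cls, "inc") == hasattr(cls, "dec")

# This batch adds dec, restoring the invariant, so it commits:
atomic_install(Counter,
               {"dec": lambda self: setattr(self, "n", self.n - 1)},
               [has_matching_dec])
```

A batch that left the invariant violated would raise before any update became visible, which is the transactional flavor of the checkpoint idea above.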

Is there work in the PL literature on this topic?