The Left Hand of Equals

The Left Hand of Equals, by James Noble, Andrew P. Black, Kim B. Bruce, Michael Homer, Mark S. Miller:

When is one object equal to another object? While object identity is fundamental to object-oriented systems, object equality, although tightly intertwined with identity, is harder to pin down. The distinction between identity and equality is reflected in object-oriented languages, almost all of which provide two variants of “equality”, while some provide many more. Programmers can usually override at least one of these forms of equality, and can always define their own methods to distinguish their own objects.

This essay takes a reflexive journey through fifty years of identity and equality in object-oriented languages, and ends somewhere we did not expect: a “left-handed” equality relying on trust and grace.

This covers a lot of ground, not only historical, but conceptual, like the meaning of equality and objects. For instance, they consider Ralph Johnson on what object oriented programming means:

I explain three views of OO programming. The Scandinavian view is that an OO system is one whose creators realise that programming is modelling. The mystical view is that an OO system is one that is built out of objects that communicate by sending messages to each other, and computation is the messages flying from object to object. The software engineering view is that an OO system is one that supports data abstraction, polymorphism by late-binding of function calls, and inheritance.

And constrast with William Cook's autognosis/procedural-abstraction view, which we've discussed here before.

The paper's goal then becomes clear: "What can we do to provide an equality operator for a pure, autognostic object-oriented language?" They answer this question in the context of the Grace programming language. As you might expect from some of the authors, security and trust are important considerations.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

The first link is to

The first link is to LtU.org. I assume it's supposed to point elsewhere?

"Mystical view" doesn't sound very charitable.

Fixed, sorry about that.

Fixed, sorry about that.

Glad to see the paper mentioned EGAL ...

... Henry Baker's equality operator that makes mutable objects EGAL iff they are the same object, and immutable objects EGAL if they are value-equal or their components are EGAL. The nice thing about EGAL is that its truth value, like object identity, never changes. It even gives a meaning to equality of procedures: two closures are EGAL if their code objects are instruction-for-instruction the same, and their immutable variables are EGAL; but if there are any mutable variables in either closure, they are not EGAL.

However, I don't think the authors' definition of value equality (which they say they are uncertain about) as equality of public fields makes much sense. To begin with, who uses public fields nowadays? I think the right approach to value equality in an autognostic system is to have a method 'value' that returns an integer that depends on all the components (not a hash value, because two objects can hash to the same thing even if they are not value-equal). Then if you are asked if you are value-equal to another object, you ask the object for its value, ask yourself for your own value, and compare the two.

Some good points

Part of what makes equality hard is that there are just so many different meanings of the word, and they're not all equal ;-)

Immutable values are easier to reason about, but even with them, there are questions. Take for example two immutable sets of data: Are they equal if they contain the same values? Yes, I would argue, until you point out that one of them is an ordered data structure and the other is not.

In the OO world (e.g. Smalltalk "=", Java "equals()"), we often ask one object if it is equal to the other. Now we are really in a conundrum, because if we ask the unordered set if it is equal to the other, it is only too happy to answer "Yes!!!", while asking the ordered set the same question would result in a resounding "No." Of course, you could always ask each of them in turn, to make sure that they agree, but that seems almost criminally inefficient.

Should equality be reflexive? It isn't in Java (see NaN for example).

Should equality be symmetric? It isn't if you have to ask one of the two objects to evaluate equality.

Should equality be transitive? Ironically, if the above two properties cannot be guaranteed, transitivity seems almost pointless.

A couple points:

* Having two (or even eight) explicit forms of equality (reference versus value) is (IMHO) horrid.

* The David Ungar quote is beautiful, but it is also not reflective of real world use. ("Basically, given two references, A and B, A == B implies that for any message M, you could send M to A or send M to B and there would be no observable change in the future response to messages of the system.") It's a very dogmatic interpretation of equality, and would preclude our sorted set (from the example above) to be used in place of an unsorted one.

* The Andrew Black quote (“An application ...may have its own notion of identity that differs from the underlying system”) is spot on. I think this points to the kernel of the equality problem.

I got excited when they acknowledged that a single equality operator for a language was a good thing, but unfortunately the paper quickly devolved from the abstract perfection of that suggestion to a plethora of equality approaches, and we were right back where we began.

I do think that there is a solution to this problem for strongly typed languages, and I would propose that the solution is shockingly simple: base equality on the compile time type of the objects being compared, not on the runtime type. This is the model that we adopted, and thus far, it has not shown any holes. However, until used in anger, its weaknesses will not be exposed.

up to

It is not surprising that equality has so many variants, because it essentially encodes the notion of equivalence. And because equivalence is formally always "up to" some system of observation, the question becomes how to encode such a system. If this is true, then there should be even many more variants of equality.

Yah, long list

I agree with that. I once made a list. Didn't end. Leibnitz equality, structural equality, confluence equality, intentional and extensional equality, ..

I would be interested in someone doing an analysis of it.

But the lens ...

But the lens that you're looking through when you ask the question "are these two things equal" ... that lens seems to be the type of the things themselves, i.e. the aperture through which you see the things.

How lyrical?

I am not sure I get you. But let's continue! If math fails, maybe mysticism will help!

You say "the lens is the type of things themselves", I say "the type doesn't describe the thing!"

(EDIT: I suck at this. Forget it.)

Sorry, analogies suck

By "lens", I mean the _means_ of interacting with a value, an object, a reference, whatever. In C++, perhaps it's the pure virtual class type that a pointer is defined with. In Smalltalk, it's the messages that the caller is certain that an object can respond to. In Java, it's the interface type that (e.g.) a parameter is declared with.

So while an object may be capable of much more (e.g. a sorted order) than you know of it, the "lens" that you see it through says "this is a set". So when you compare it to another thing through the same lens, you really don't care about their order or the lack thereof, because you are viewing them both simply as "sets".

You seem to argue that an

You seem to argue that an ordered set _should_ answer "No" if it is asked whether it is equal to an unordered set with the same elements. If I am interpreting that correctly, why do you think so? It seems like a reasonable design choice for it to answer "Yes".

public fields

I guess it depends on what you mean by "public fields". I assume they're talking about the kind of public fields that exist in languages like Scala or Kotlin, where using what looks like a public field is really syntactic sugar for a private field with a public getter (and a public setter if the field is mutable), or like ES6, where the distinction between fields and accessor methods exists, but can be safely ignored in most situations.

IMHO C++ and Java are really behind the times in this area. It bugs me because using explicit accessor methods introduces a lot of boilerplate definitions, and it severely limits the usefulness of field-access syntax, forcing developers to use method-call syntax instead. Method-call syntax is more verbose and less visually distinctive than field-access syntax, resulting in a small but significant loss of readability in almost every nontrivial piece of code in those languages.

Good points. I have no

Good points. I have no qualms with C++ fields, since the only "hiding" done from a C++ developer is the encapsulation and hiding done by the C++ developer (i.e. in C++, one hides underlying fields and structures simply to maintain one's own sanity, with varying degrees of success, and nothing hidden can stay hidden in C++ if one chooses to dig for it.)

In a higher level OO language, I'd prefer a model that separates the virtual domain entirely from the non-virtual domain, such that a concrete concept like a field simply disappears. Ideally, the need for storage (struct) for the state (fields) of an object should be infer-able from the design of the object. By doing so, the storage aspects disappear from the interface of the object entirely, leaving a pure behavioral interface, which itself is virtual in composition. (I hope to know soon enough how this approach works out ;-)