First Class Relationships in an Object-oriented Language

First Class Relationships in an Object-oriented Language, by Gavin Bierman and Alisdair Wren, was a paper published at ECOOP 2005.

They show how to add relationships as a first-class mechanism to a Java-like language, where by relationships in something like the UML sense. Cribbing from their examples, you might have a Student class and a Course class, and an Attends relationship, which is a binary relation between Students and Courses. Then, there are mechanisms to dynamically add and remove entries from a relationship, and to query a relationship about whether particular objects are in them or not.

Now my personal random editorialisation: I think this is a really interesting idea, because these kinds of things pops up all the time in OO models, and having relations be first class means that you can move some state out of individual object instances, which is always a good thing, and the type system can guarantee something about what relationships will hold. The things that scare me about this idea are first, that you can potentially add a lot of heap pressure by keeping objects live longer, and second, that you now have much more pervasive aliasing of objects in your program. I don't know if these are real problems though, and anyway the feature is cool enough that I'd be interested in programming in a language with support for it, just to see what it's like.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Link is broken

It just points to the LTU homepage.

EDIT: this seems to be the actual paper.

Thanks, I fixed it.

Thanks, I fixed it.

Interesting. Especially

Interesting. Especially given that I've just spent a significant chunk of the morning grumbling about how broken equals is in Java. :-)

inform 7

having relations be first class means that you can move some state out of individual object instances, which is always a good thing, and the type system can guarantee something about what relationships will hold.

Inform 7 (which may be a little too domain specific for most) has something of this sort. Some samples from the manual (chapter 12):

many to one:

Pet-ownership relates various animals to one person (called the owner).

one to one:

Marriage relates one person to another (called the spouse).

an equivalence relation:

Nationality relates people to each other in groups.

expressing a query concerning current events:

the number of steps via the friendship relation from George Bush to Saddam Hussein

or modeling more classical concerns:

Divisibility relates a number (called N) to a number (called M) when remainder after dividing M by N is 0. The verb to divide (it divides, they divide, it divided, it is divisible) implies the divisibility relation. The verb to be a factor of implies the divisibility relation.

...and its caveats

Some commentary from Nelson on the tradeoffs and potential drawbacks of relation support:

Inform uses a different algorithm for finding routes ("the next step via R from A to B") in each of these cases, and internally it stores relations in different formats in the different cases, because it makes a big difference to the efficiency of Inform to minimise the storage required for a relation and the time taken to explore it.

All the cases are benign except for "various to various" - the most useful - and for its closely related symmetrical version, "relates... to each other". Inform cannot afford to assume that the relation will be "sparse" (that is: that no vertex will have more than a certain number of arrows, or that the total number of arrows will be small), because it has no idea how the arrows will come and go during play. Gratuitous various-to-various relations are therefore not a good idea.

The worst case in finding routes from A to B is when almost every vertex can reach B, some across long trails, but A cannot. In the case of finding routes across the game's map, this must be multiplied further by the number of possible directions - usually 16.

This does not sound too awful, but if one is trying to find (say) "the most distant room from A", that means a further loop and now the running time will be D times N squared. Extension writers will need to be careful of this kind of thing: it is easy to write highly cool prototypes which work terribly slowly on larger, more realistic maps.

Good stuff

Attributes and relationships are different things, yet most languages/technologies only provide one way to represent both. In the OO world, it's composition (including composition of pointers--many OO languages only compose pointers). In the relational world (or more specifically, the SQL world), it's relational tables. The former is not well suited to expressing relations (which are frequently not intrinsic properties of objects); the latter is not well suited for representing attributes. This dichotomy is IMHO one of the two issues causing the OO/relational impedance mismatch.

(The other is how mutable state is managed in a distributed system).

Obvious extensions to this work to consider

1) Addition of Horn clause-style rules and such, ala Prolog. In other words, if one has a ground relation IsParentOf, one can thus define the relations such as IsAncestorOf, IsChildOf, etc. Efficient support of negation is left as an exercise for the reader. :)

2) Application of linear types, and other substructral types, to the issue of restricting multiplicities.

3) Bringing the full relational algebra to bear on this.

horn clauses, à la ADVENT

as the divisibility example above showed, Inform 7 has Horn clause-like things. Another example from the documentation runs as follows:

Nearness relates a room (called A) to a room (called B)
when the number of moves from B to A is less than 3.
The verb to be near implies the nearness relation.

Instead of listening when the location is near the Sundial:
say "You hear a splashing of water."

(EDIT: original example obsoleted by next post)

is .666 a decent batting average?

In implementing Dijkstra's algorithm in Inform 7, it appears this list is not so daunting.

1) By introducing the concept of 'near'

Definition: a node is near if its total distance is 10 or less.

we get 'nearer', and even the universal property of 'nearest', for free.

To continue Dijkstra's algorithm:
        let n be the nearest examined node;
        now n is found;
        repeat with m running through the nodes contacting n begin;
                examine m from n;
        end repeat.

2) By declaring

Following relates various nodes to one node (called the precedent).

the multiplicity is restricted, and saying now m follows n in

To examine (m - an examined node) from (n - a node):
	let d be the extension of n to m;
	if the total distance of m > d begin;
		change the total distance of m to d;
		now m follows n;
	end if.

does what we want when a shorter path is found.
3) is left as an exercise for the motivated LTU-er (or Oxford language-crafter?).

How so?

"Attributes and relationships are different things"

How so? What do you see as differences?

"relational tables [are] not well suited to expressing relations"

How so? I'm assuming you can't mean that relational tables are a poor expression of [mathematical] relations.

It has always been my experience that one of the strengths of SQL is that it supports the use of relations, and that that provides a uniform way of querying and manipulating both attributes and relationships between entities. I don't recall anyone ever claiming there's a hard, theoretical difference between entities and relationships; are you claiming there is one?

As far as the so-called "impedance mismatch" it strikes me that it's more about the differences between SQL's structural type system and most OO languages' nominal ones. Coupled with the fact that OO languages typically have no support for relational abstractions, such as join.

To answer your questions:

I think we agree more than we disagree... but let me explain better.

First of all: Either you misquoted me, or I connected the wrong antecedent to the wrong clause. I met to say that relational tables aren't well suited at modelling attributes, not relationships. What you quoted me as saying is obviously absurd. That's OK, I'll take the blame for not being clear. :)

Second: No, I'm not suggesting a hard line between relationships and attributes, though I'll expand on this below. This is one of those things that is often in the judgement of the programmer or data modeller.

The following isn't very precise--it's a philosophical (sort of) argument, but here goes. Before hand, I'll note that I'll use the term relation in the mathematical sense--a (possibly-partial) binary predicate of various arities. Relationship will be defined as below. I'll also distinguish between entities and data; the former are things with a unique identity, the latter are things which are only important for their value.

Attributes, for purposes of this discussion, are those bits of information which are intrinsic to a particular object. My eyes are blue. My those things which are intrinsic to an object, and or data about an object. Attributes are often (not always) relations where at most one argument is an entity.

Relationships, on the other hand, are information which is external to a particular object; either involving multiple objects or being dependent on a particular context. Frequently, relationships are relations involving two or more entities.

Again, the above is a guideline, and there are obviously cases which violate these principles. I (an instance of some Person type) have a nose, which might be considered a distinct entity. However, the correspondence between me and my nose is one-to-one; the correspondence between me and my fingers is ten-to-one. On the other hand, my salary is not an entity, it's just a number--but I wouldn't consider it an intrinsic part of me. Indeed, one can express the salary relationship as a function with two entity arguments--myself and my employer (were I to take a second job, I'd have two salaries to consider).

Relational tables (as a means of specifying relations) can certainly be used to model attributes. However, it can be argued that doing so--especially in the context of application code rather than datastores--is inconvenient. It also is problematic with non-Liskov subtype polymorphism. Relational tables are excellent (far better than pointers) at modelling relationships between entities, in particular when the relationship is other the binary, ''n''-to-1 variety.

Object composition excels at the case where the item being composed is either data, or is in one-to-one correspondence with the enclosing object. It is barely tolerable in the ''n''-to-1 case (as this introducing aliasing), requires use of containers in the 1-to-''n'' case, and a royal pain in the butt for ''n''-to-''n''. It is also a problem for relationships not anticipated by the designer of the class, is frequently not reversible, and breaks down completely for relationships of higher arity than two. So, I certainly see the value in relational tables and other explicit ways of expressing external properties involving objects, and desire for these to become more commonplace in programming languages and just not in database systems.

Converse-wise, I wish it was more convenient to stuff complex data into a database column. I generally like very much what Date and Darwen propose in TTM (and it's recent revision), and look forward to more unification between the app and data worlds, from both directions.

To sum up--basically what I'm saying is that having more tools in the toolbox is a good thing--especially when I only have to specify what tool I'm using once.

This would appeal to me more, aesthetically

if they went the whole hog and made everything a relation. Doing a lot of relational data modelling does incline one to notice the abitrary nature of any distinction between 'objects' and 'relationships', beyond a blurry human-language semantic distinction.

What I'd like would be an object-oriented version of the relational algebra - a language designed specifically to make the whole object-relational-mapping can of worms work as smoothly and elegantly as possible. An attempt to bring the semantics of either one closer to the other, and have them meet somewhere in the middle. Alongside this a powerful relational database system baked into the language which can persist data, enforce constraints, index things, and makes RAM vs disk storage an omptimization decision.

Make it also a pure functional language with something like haskell's libraries for concurrency with transactional memory, built in and working transpararently with the database; put some real effort into making the syntax as accessible as possible; design the type system around the requirements of the language rather than vice-versa - and you may just have the ideal next generation 'enterprise' language.

In my (of course humble) opinion.

Perhaps this should go into a new "what's your ideal language" thread :)

implementing this in c# extension methods

I have a blog entry on this from last year which shows how easy this is to implement in c# using extension methods.
http://blog.lab49.com/?p=237

A broader context

To me, there is a broader context to this: the idea that languages should grow to support greater "richness" - particularly with regard to common needs of OO systems. There a a number of common needs, including relationships, object lifecyle events, validation, orginal value tracking, databind-ability and so on. Mats Helander lists several here. I've implemented a number of them here, all based on one core abstraction (which was an interesting exercise), but I haven't got round to relationships yet.