## Objective scientific proof of OOP's validity? Don't need no stinkin' proof.

Having just discovered LtU, I've been reading many of the very interesting topics on here. As a reasonably accomplished programmer myself, I find the discussions on the validity of OOP and OOAD particularly thought-provoking, especially those that refer to published articles that speak out against OO.

Having been a programmer in some sort of professional capacity for the better part of a decade, I have made the transition to OO myself, initially with a lot of inertia from my procedural ways.

In my own work and play with programming, I have absolutely no doubt that OO is a wonderful way of designing and writing code, and I have myself reaped the benefits of OO that are its raison d'être. Sure, much of my work is still done procedurally, but that's mostly because of the event-driven nature of modern GUI software. But when I create a complex system, I can now only ever think in terms of modelling system structure and behaviour after the real-life systems it's designed to automate/facilitate. OO techniques are perfect for my type of work, and I could never ever go back to pure procedural ways.

What bothers me is this: For me as a programmer, OO is a wonderful, useful thing that saves me lots of time and mental effort. Why, then, are some so vehement in their critique of it? Does that mean that as a former procedural programmer, my ways were so bad that OO helped to make me better? And if OO is still bad, does it mean that my choice of paradigm brands me as a hopeless monkey?

If OO is so bad, then, is there some other panacea that I am not seeing? Personally, I need no scientific, objective proof that OO is worthwhile... I can see and feel the improvement between the old and the new. If other programmers (as I assume the authors of the aforementioned articles are) are not seeing that improvement, what are they measuring OO against?

Or perhaps I am simply not grasping the critique properly?

### A better explanation of for-loops in ML

The MLton project has a nice page showing how to build ForLoops in SML.

### FP slow

It seems that SML compiled with MLton, OCaml, Clean, and Bigloo Scheme are very fast (as in "C fast"), if what I read has any resemblance to reality. Bigloo has a great IDE too.
And the "slower" ones often beat the scripting languages.

### Fabian Pascal On OO (especially databases)

In "ON DOCUMENT- VS. DATA-BASES", Fabian Pascal states:

> Types are things we can talk about. Relations are sets of statements that we can utter about those things. What can you do without the latter? The problem of OO is that they have just types, no relations. ... What is the atomicity, selectivity, and correctness for a document base [xeo: or OO database]? ... The main objective is inferencing (manipulation) and integrity...

Relational databases have a single logical model; OOP applications and OO databases do not. Consequently OOP database management systems and OOP applications (even in similar application areas) are different in that each has its own query language and its own peculiar API for navigating the database.

### Interesting....

Venturing off the topic of this thread a bit...

While I'm not all that familiar with the plethora of "OODB" products out there, I haven't been impressed with what I've heard. But the usefulness of an OODB has little to do with the usefulness of OO; the relational model is a great way to model data but a horrible way to write application code.

It's often suggested, by Pascal, Date, and others, that OO (when used to model data relationships) commits the flaw of excessive denormalization--of using pointers (explicit as in C++, or implicit as in Java) to model relationships as well as entities. Think of the difference between "has-a" and "uses-a". In Java and most other OO languages, the two are implemented using the same mechanism--object composition. Relationships between entities (the "uses-a" case) are often better modelled externally--which is one of the points of the relational model. The criticisms of the OO approach to relationship modelling--pointers, pointers everywhere--are well-known, and most of them are rather valid.
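A minimal sketch of the point above (all names here are illustrative, not from the thread): in a mainstream OO language, both the "has-a" and the "uses-a" case collapse into the same object-reference idiom.

```python
class Engine:
    def __init__(self, serial):
        self.serial = serial

class Person:
    def __init__(self, name):
        self.name = name

class Car:
    def __init__(self, engine, owner):
        self.engine = engine  # has-a: the engine is a component of the car
        self.owner = owner    # uses-a: ownership is a relationship between
                              # entities, yet it uses the very same pointer idiom

alice = Person("Alice")
car = Car(Engine("E-123"), alice)

# Both facts are reached identically, hiding the has-a/uses-a distinction:
print(car.engine.serial)  # component
print(car.owner.name)     # relationship
```

Note that nothing in the syntax or the types distinguishes the component from the relationship; that conflation is exactly what the relational critics object to.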

Ugh.

Of course, most databases (SQL ones in particular) COMPLETELY throw out the baby with the bathwater here. SQL is infamous for providing an extremely crude type system, in which one must use relations for both relationship modelling (good) and entity modelling (bad). Many relational tables are used to model the "has-a" case, which is probably better handled by a more advanced type system (as is found in OO and FP languages). When mixed with subtype polymorphism on the application side (which is quite useful), the natural results are things like Date and Darwen's "great blunders"--horrible hacks like "table inheritance" and such, wherein polymorphic class hierarchies are mapped directly onto relations.

Double-Ugh.

Date/Darwen seem to understand that introducing complex domains into the relational model (and not mapping application classes onto database relations) is a good idea (though the type system they propose for Tutorial D is still rather crippled). But in their research for TTM, they seem to have lost sight of the forest and are focusing on the trees.

The point (and the forest), at least IMHO, is:

* Objects/classes/types/domains should be the preferred way of describing entities. The four tires on my car are components of the car; the most natural way to model them is to have four instances of Tire in my Car object. In other words, the OO way.

* Relations and the relational model should be the preferred way of describing relationships. The fact that I own a car does not make the car a component of me; it's better (all things being equal) to have a relation somewhere associating cars with owners, which can easily describe the fact that I own three cars, and that my wife is co-owner of the same three cars. Neither my wife nor I should (in a perfect world) have a Car object in our respective Person instances. In other words, the preferred method in this case is the relational way.
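The two bullets above can be sketched together in a few lines (the class and attribute names are hypothetical, chosen to match the car example): components live inside objects, while ownership lives in an external relation, so co-ownership needs no duplicated pointers.

```python
class Tire:
    def __init__(self, position):
        self.position = position

class Car:
    def __init__(self, vin):
        self.vin = vin
        # has-a: the four tires are components, so they live inside the object
        self.tires = [Tire(p) for p in ("FL", "FR", "RL", "RR")]

class Person:
    def __init__(self, name):
        self.name = name

# uses-a: ownership lives in a relation, not inside Person or Car.
# Co-ownership falls out for free: one car, two owner tuples.
me, wife = Person("Me"), Person("Wife")
sedan = Car("VIN-1")
ownership = {(me.name, sedan.vin), (wife.name, sedan.vin)}

def owners_of(car):
    return {p for (p, v) in ownership if v == car.vin}

print(sorted(owners_of(sedan)))  # ['Me', 'Wife']
```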

Too many people (and too many toolsets) make us choose between object modelling OR relational modelling to solve problems. Both OO and relational fans claim that the fact that their methodologies can model both entities and relationships is a strength (and allows each to be used without the other). But the first thing MIS programmers learn in an entry-level database class is how to do E/R analysis, and to keep entity and relationship tables separate, even though their tool (a SQL database) treats them as one and the same. Likewise, OO programmers working on UML diagrams know the difference between has-a and uses-a, but have to code 'em up the same way.

And THAT, more than anything else, is a big reason why the OO-relational impedance mismatch is such a problem.

### Bravo!

* Objects/classes/types/domains should be the preferred way of describing entities.
* Relations and the relational model should be the preferred way of describing relationships.

Couldn't agree more. It's a pity there isn't any multiparadigm language that does a good job of integrating the object and relational paradigms (hoping to be contradicted). Perhaps it could be based on a semantic data model.

(When I saw the chapter on relations in CTM, that's what I was hoping for, but it's about relational constraints rather than relational algebra or data modelling.)

### Hmm, what's wrong with storing

Hmm, what's wrong with storing the relationships in a hashtable or similar data structure, and using an operation on the appropriate classes to chase up the relationships?

### Implementation detail?

Nothing; that sounds to me like an implementation detail.

Note: when I say "relations" and such, I am not necessarily speaking of an RDBMS. Relations are a perfectly good data structure outside the context of a database. For a simple table with one primary key, a relation is equivalent to a map, and a hashtable is a fine implementation of such, especially for large keysets. (Likewise, a relation where the entire record is the primary key is the same thing as a set.)
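That correspondence can be shown directly with plain containers (the data is illustrative): a relation keyed on one attribute behaves like a map, and a relation whose key is the whole tuple behaves like a set.

```python
# Relation over (student_id, name), keyed on student_id -> a dict (map):
students = {1: "Ada", 2: "Grace"}   # primary key maps to the rest of the tuple
print(students[2])                   # lookup by primary key

# Relation where the entire tuple is the primary key -> a set:
enrolled = {(1, "ENGL-101"), (2, "MATH-201"), (1, "MATH-201")}
print((1, "MATH-201") in enrolled)   # membership test is the key lookup
```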

Of course, one advantage of this approach is that such a hybrid relational-OO architecture IS more easily stashed away in a database. The main problem with existing SQL products, of course, is SQL... :) ...something which is a nice encapsulated class in your application still might get flattened into a record when it's stuck in the database.

A few issues for language designers, though:

* What is the access syntax? An advantage of using the OO model for everything (or the relational model for everything) is that the user need not know how a given component/relation of some entity is implemented. In OO it's just `myObject.myField` (or `myObject.get_field()` if you like strict encapsulation). In relational, it's just a SELECT statement (my SQL is a bit rusty, so I won't embarrass myself with a likely incorrect example).

In the hybrid model, it would be a Bad Thing were the user to have to know whether something is a relation or a component. It would be a worse thing if changing your mind on such a matter required application code to be rewritten to reflect the choice. After all, there are many examples where it isn't clear if an attribute is internal or external (but we have to choose when implementing, and we may find we chose poorly); and there may be sound engineering reasons to implement an attribute differently than the default (much as one denormalizes a database for performance reasons). The question is--what syntax do you choose, and what are the semantics if a "conflict" exists?
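One possible answer, sketched in Python with hypothetical names: a property can give callers the uniform `myObject.myField` syntax whether the attribute is stored in the object or looked up in an external relation, so changing your mind later would not force call sites to be rewritten.

```python
# External relation: person name -> employer (illustrative data).
employment = {"Alice": "Acme"}

class Person:
    def __init__(self, name):
        self.name = name  # stored component, ordinary attribute

    @property
    def employer(self):
        # Looks like a plain field to callers, but is actually a lookup in
        # the external relation. Swapping this for a stored attribute (or
        # vice versa) would not change any calling code.
        return employment.get(self.name)

p = Person("Alice")
print(p.name)      # component access
print(p.employer)  # relational access, same syntax
```

This only answers the syntax half of the question; the semantics of a "conflict" (an attribute defined both internally and relationally) is still open.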

### OO : Relational :: Oil : Water

The problem is that the OO model defines entities by their behavior. Any state they have is an implementation detail that should be hidden from the user. On the other hand, the Relational model defines entities by their (data) attributes. Any behavior they have is an implementation detail that is irrelevant to the object's storage and representation. You can convert data from one paradigm to the other, but I don't see how you can coherently combine them into a single system, seeing as how they have diametrically opposing views on representation.


And also read The Third Manifesto, if you haven't.

Entities are things that are properly defined by their behavior, and benefit from encapsulation. Relationships are, essentially, linkages between entities; they benefit from exposure (enabling joins and other relational operations) and seldom need encapsulation.

The use of relations (tables) to model entities in SQL databases is (IMHO, and in the apparent opinion of Darwen and Date) unfortunate. However, SQL (earlier versions especially) provided extremely limited type systems (numerics, strings, BLOBs) with little or no support for building aggregate entities (something that OO excels at); the workaround for the past 30+ years has been to model entities with relations.

Conversely, OO models relationships with pointers and composition, which is useful (and efficient) in the 1-1 and N-1 cases, but obnoxious in the general case.

Your observations about the OO-relational impedance mismatch are correct, today. I readily agree that the current suite of tools available to the enterprise programmer and/or DBA are ill-suited for bridging the gap.

However, my claim is that this is a limitation of today's tools; not a fundamental limitation of OO and relational which must always exist. I believe there is much fruit to be borne by breaking down the barrier between the two worlds.

That said--some things need to occur for this to happen:

* The OO programmer's love of unlimited and indiscriminate side effects has to cease. Functional programming is to some the ideal, though I believe that many of the elements of Tutorial D show a good middle way. D&D discuss quite a bit the difference between variables (which are mutable) and values (which aren't); a database is essentially a collection of relational variables (variables capable of holding values of type relation). Whether Date and Darwen realize this or not (their grasp of type theory seems to be limited), what Tutorial D accomplishes (and what databases themselves accomplish) is essentially a linearization of the mutable parts of the type system.

(Darwen wrote an interesting paper on what he calls the "4 out of 5" rule, which examines the tradeoffs between mutability, subtype polymorphism/subsumption, static typing, specialization by constraint, and object identity. It concludes that 4 of the 5 can exist simultaneously, but not all 5, and proposes abandonment of object identity--essentially, linearization of the type system. The paper can be found in the 2nd edition of TTM.)

* The relational model itself may need some expansion. In particular, it must find a way to better accommodate subtype polymorphism (Tutorial D seems to allow only a limited form of this, based on invariant-strengthening). Another relational constraint that could stand loosening (IMHO) is the rule that the foreign keys in one table must all refer to tuples in the same target table; if this were loosened to a requirement that they refer to tuples of the same "shape" (type), not necessarily in the same table, that would make me happy.

* And, the attitudes among some relational and OO practitioners that the other side is somehow weird and/or wrong, need to cease.

In short, while relational and OO might today be oil and water; I don't believe that this state of affairs need be permanent. Of course, given the billions of dollars worth of data, tool investments, and programming NRE that is invested in the current state of affairs, there are lots of economic hurdles which currently exist.

### Example...

So give us a concrete example of some entities and their relations (and how you would use them) that is not well-served by the OO or Relational model.

### The weaknesses of both models are well-known.

OO is rather lousy at handling N-N (and 1-N) relationships between entities. 1-N can be done internally to the object by using collections, though a common workaround is to invert the sense of the relationship to N-1, which can be done with a scalar pointer but often introduces a dependency in the wrong direction.

N-N is often done with pseudo-relation objects (a big vector of pairs of pointers); otherwise it gets denormalized into a pair of 1-N relationships, which can easily become inconsistent if you're not careful. Plus, reverse lookups can be a pain unless you denormalize further (i.e. introduce redundant pointers).
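The pseudo-relation workaround, and the consistency hazard it brings, can be sketched like this (names are illustrative): the relation itself is a list of pairs, and reverse lookup requires redundant indexes that every insert must keep in sync by hand.

```python
from collections import defaultdict

links = []                      # the pseudo-relation: (student, course) pairs
by_student = defaultdict(list)  # redundant index for forward lookup
by_course = defaultdict(list)   # redundant index for reverse lookup

def enroll(student, course):
    # Every insert must update three structures consistently -- forget one,
    # and the denormalized copies silently disagree with the relation.
    links.append((student, course))
    by_student[student].append(course)
    by_course[course].append(student)

enroll("Ada", "ENGL-101")
enroll("Ada", "MATH-201")
enroll("Grace", "MATH-201")

print(by_student["Ada"])      # ['ENGL-101', 'MATH-201']
print(by_course["MATH-201"])  # ['Ada', 'Grace']
```

In a relational setting, both lookups would be queries over the single `links` relation, with no redundant copies to drift out of sync.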

Modelling entities with relations is generally fragile, due to lack of proper encapsulation. Many SQL products force you to do this however.

### So...

...if you had a magical language that did everything you wanted, what would it look like? How would you spell out the things you want in it?

### I'll take a shot

I haven't read The Third Manifesto yet, but I'll see if I can find a copy. Until then, let's make a concrete example as fodder for discussion. Suppose we have a university course registration system. The entities we are primarily concerned with are Students and Courses. The relationships we want to model are is-taking and has-student (obviously, they are bidirectionally related, but for clarity let's name them separately). Suppose we are developing a library or middleware component that performs all of the fundamental operations on these entities relevant to this relationship. So we want to be able to do things like:

* `student.remove(course)`
* `student.courses`
* `course.students`

However, it would also be nice to do all the query-related searching that you would expect from an RDBMS. The trick is how to define a query language that fits naturally into a PL. How about set syntax? I'm not well-versed in Haskell or *ML, so pardon me if I play a little fast and loose with syntax that comes close to your favorite FP lang. But I have in mind something like:
```
english_courses = { c <- courses | c.name.beginsWith("ENGL") }
```
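For comparison, that set-comprehension query maps almost directly onto a Python set comprehension (the `Course` class and data here are illustrative; `beginsWith` becomes `startswith`):

```python
class Course:
    def __init__(self, name):
        self.name = name

courses = [Course("ENGL-101"), Course("MATH-201"), Course("ENGL-340")]

# { c <- courses | c.name.beginsWith("ENGL") }, rendered as a comprehension:
english_courses = {c.name for c in courses if c.name.startswith("ENGL")}
print(sorted(english_courses))  # ['ENGL-101', 'ENGL-340']
```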


Now, let's talk a bit about implementation. The OOP Way (or at least one of them) would be to implement a vector of Student objects, each containing a vector of references to Course objects. Likewise, each Course object would contain a vector of Student references. This leads to the inefficiency you describe earlier.

The Relational Way would be to create a Student relation, a Course relation, and a StudentCourse relation which joins the two. This is almost certainly how you would have to represent it on the DB side anyway.

Enter The Third Way. In our code, we set up a vector (or list, if you prefer) of Student, a vector of Course, and a hidden vector of StudentCourse. Access to the `student.courses` field would perform a search on the relation vector and return the appropriate sublist. Same goes for the `course.students` field. Alternatively, since the StudentCourse relation has more rows than either the Student or the Course vectors, we could leave it in the DB and perform queries against it dynamically.
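A minimal sketch of this Third Way, assuming the entity and relation names from the example above (the property-based access is one possible realization, not the only one): the StudentCourse relation is hidden, and `student.courses` / `course.students` are computed by searching it on access.

```python
student_course = set()  # the hidden relation: (student_name, course_name)

class Student:
    def __init__(self, name):
        self.name = name

    @property
    def courses(self):
        # Computed on access by searching the hidden relation.
        return {c for (s, c) in student_course if s == self.name}

    def remove(self, course):
        # Mutates the relation, not the objects on either side.
        student_course.discard((self.name, course.name))

class Course:
    def __init__(self, name):
        self.name = name

    @property
    def students(self):
        return {s for (s, c) in student_course if c == self.name}

ada = Student("Ada")
engl = Course("ENGL-101")
student_course.add(("Ada", "ENGL-101"))
student_course.add(("Ada", "MATH-201"))

print(sorted(ada.courses))    # ['ENGL-101', 'MATH-201']
print(sorted(engl.students))  # ['Ada']
ada.remove(engl)
print(sorted(engl.students))  # []
```

Because both directions are computed from the one relation, there are no redundant pointer vectors to keep consistent; the cost is a linear search on each access, which an index (or the DB's query engine) could amortize.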

Here's where it gets tricky. The power of expressing the ad-hoc searches in the PL, and not the QL is that you get to call whatever functions are available in the PL, which is probably going to be a lot richer than what is available server-side. That means that for full generality, you have to perform the searching "in language". Unfortunately, you then lose the power of the query engine available in the RDBMS.

Ideally, we would want to be able to send the search criterion as a function to the query engine. That implies that the query engine should be aware of the client PL, à la PostgreSQL's installable languages. But that brings up the issue of whether the object-relational merger should occur in the PL, the DBMS, or both.

Anyway, thinking about it makes my head hurt, so I'll just leave it at that for now.

### Why not make a prototype in Lisp

Why not make a prototype in Lisp, or some other language with good syntactic extension, to explore such issues as how you transparently switch between a property and a relationship?