Impact of OO language style on code

A few questions come to mind while reading about Salzman & Aldrich's PMD (prototypes with multiple dispatch), which combines multiple dispatch and prototype-based OO while eschewing Java/C#-style class-based OO. Any experience/thoughts with respect to both language design and language use?

  1. Seems like the advantage they claim for the prototype approach is that behaviour really is part of the object, not the class. This allows one to remove complex conditional logic from inside methods, replacing it with specialized methods based on state.
    1. Question: taken to the extreme, would this mean any time you have a conditional which tests member data, you instead make a new object? How crazy does that get for understanding the code? Or for the VM dealing with the combinatorial explosion of objects?
  2. (less directly related to the paper...) Both the network approach of OO and the relational approach can run into broken references.
    1. Question: Is it generally the case that the OO approach leads to deferred runtime exceptions or leaks, whereas the usual approach with constraint/integrity checking in a relational database gives you the errors more "up front"?
    2. Question: Could one have some sort of integrity checking for the network model that would help show errors earlier?

You can see an example of #1

You can see an example of #1 when the "state pattern" is used in an ordinary object-oriented language. While it is possible to have an unbounded number of states, in many cases you have a fixed number of states and it suffices to keep around a few singleton objects representing particular states, rather than creating a new one at each state transition.
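
A minimal sketch of that idea in Scala (the names are hypothetical and the language choice is only for illustration): each singleton state object carries its own behaviour, so the conditional on member data disappears from the context object.

sealed trait ConnectionState {
  def send(msg: String): Unit
}

case object Closed extends ConnectionState {
  def send(msg: String): Unit = println("cannot send: connection is closed")
}

case object Open extends ConnectionState {
  def send(msg: String): Unit = println(s"sending: $msg")
}

class Connection {
  // State transitions swap in a shared singleton; no new object per transition.
  private var state: ConnectionState = Closed
  def open(): Unit = { state = Open }
  // No if/else on member data here: behaviour is dispatched on the state object.
  def send(msg: String): Unit = state.send(msg)
}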

Broken references

By "broken references", are you referring to null pointers (and/or null pointer exceptions) in the OO context, and to foreign keys which do not correspond to a valid candidate key, in the relational context?

The question seems valid, but a bit ill-formed. I'm guessing at what you mean here...

In statically typed OO languages, one can design a type system in which references are required to point to valid instances, rather than NULL. Many production OO languages don't do this--C++ and Java pointers may always be NULL (C++ references are not supposed to be null, but the compiler won't prevent things like Foo &foo = *((Foo *) NULL); that code snippet is officially "undefined behavior", but many implementations will not trap it until you actually try to dereference the pointer behind the reference). Nice is a JVM-based language which has true "guaranteed" pointers, i.e. pointers which are never null. To answer question #2, I would say "yes", assuming that I understand you correctly.

In the relational context, you really don't have the phase distinction between "compile time" and "runtime" that you do in application development. Production RDBMSs are generally kept running 24/7, and the same tools and techniques used for modifying data are generally used for modifying schemata. WRT foreign keys: any transaction which results in a "hanging" foreign key will be bounced when you try to commit it, but that (like most everything else) happens at runtime. SQL complicates things in that any attribute can be set to NULL; NULL is part of the domain of all types in SQL. (At least in the past this was the case...) Rather than producing exceptions or transaction failures, though, nulls in the database are something that application code has to handle.

Thanks for the thoughts

Makes sense. I should look at Nice some time (I think I tried it once but it died on simple test programs for some reason). It does seem like the better default is to avoid null as much as possible, and to work on making the concept more nuanced, explicit, and checkable. Unfortunately, nil still seems to be a recurring issue (Lisp, SQL, etc.).

I wonder if there is a really good, rigorous and yet concise way to approach the issue of nil? One which can be applied across computer science in general. Like, are things such as Java just jerks for ignoring it if such theory exists, or is it just basically a hard problem?

Sure.

Nulls can be modelled as the unit type (but so can other things, like "void").

If a language gives you sum types (disjoint unions), then a "nullable" pointer to T is simply the type (T + null).

Null pointers can also be modelled as a monad (a "Maybe" monad is quite similar to a nullable reference or pointer).
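
A small illustrative sketch in Scala (the names are made up): the sum type (T + null) is exactly Option[T], and the Maybe-style monadic operations replace explicit null checks.

// A "nullable Int" modelled as the sum type (Int + null), i.e. Option[Int].
def lookup(key: String): Option[Int] =
  if (key == "answer") Some(42) else None   // None plays the role of null

// Monadic (Maybe-style) chaining instead of an explicit null check:
val doubled: Option[Int] = lookup("answer").map(_ * 2)   // Some(84)
val missing: Option[Int] = lookup("nope").map(_ * 2)     // None, not a NullPointerException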

In the context of some languages (notably Eiffel), it is even claimed that nulls are like the bottom type--a strange claim, as the bottom type is by definition empty, but the set of nulls is not. (Eiffel refers to the type in question as "NONE" and to its single value as "VOID"...but despite the fact that the type is manifestly not empty, the Eiffel docs claim that NONE := bottom. See page 29).

Some interesting (though often incorrect) discussion at this c2 page.

A discussion of the theoretical aspects of null in Java is here.

Bottom

I'd think Bottom would by definition be "less than" all of the other types, but not necessarily empty. I think NONE meets this definition in Eiffel since all types in Eiffel contain Void (as I recall).

Bottom is uninhabited

Bottom is the type of an infinite recursion, or of a CPS function that never returns normally, or of an error. There is no value of type bottom (it's uninhabited). The null type has one value: null. Instead of saying the null type is the same as bottom, it's more correct to say that, in most of the languages mentioned, the null type is a subtype of most or even almost all types, but never a subtype of bottom, which is a subtype of all types. Many of the languages mentioned don't explicitly have a bottom type, but it's always there, since you can have an infinite loop in any language.

Unit is also distinct. It is usually not a subtype of any type except top. It has one value, but the value is "useless": it indicates successful return from a procedure with no useful result, and it's generally the return type of procedures used only for side effects. In many of these languages it's called "void", and its value can't be assigned to a variable or passed as an argument to a function/method. Void shouldn't be confused with C/C++ void*, which is not a pointer to unit but a pointer to a value of the top type.

Note that Java has no top type in the language, since there's no common supertype for objects, primitives, and unit.

For an example of a hierarchy with all these types, see the Scala spec page 130 with the following translation:
top = Any
unit = Unit
null type = Null
bottom = Nothing

The AnyVal/AnyRef distinction is Scala's way of distinguishing values with reference semantics from those without. As a result, Null is a subtype of AnyRef but not of AnyVal.
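
A few illustrative lines of Scala exercising that translation (just a sketch):

// Nothing (bottom) is a subtype of every type, so a diverging expression fits anywhere:
def diverge: Nothing = throw new RuntimeException("never returns normally")
val n: Int = diverge            // type-checks because Nothing <: Int

// Null is a subtype of every reference type (AnyRef), but not of value types (AnyVal):
val s: String = null            // fine: Null <: String
// val i: Int = null            // does not compile: Int <: AnyVal

// Unit has exactly one, useless value:
val u: Unit = ()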

You're mixing up your

You're mixing up your bottoms, and there's no reason that the only value of the bottom type has to be the bottom value (which is what denotes things like infinite recursion).

How so?

I don't have a copy of Pierce handy, but Wikipedia (which I admit isn't particularly authoritative) agrees with my memory:

The bottom type in type theory is a type that has no values.

and

[the bottom type is used to] signal that a function or computation diverges; in other words, does not return a result to the caller...[or as] an indication of error...

Hence, bottom is a proper subtype of all types, even the null type, since a function that promises to return a null might actually diverge instead.
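
In Scala terms (purely as an illustration), the same observation reads:

// The declared result type is Null, but the function may diverge instead,
// because the bottom type (Nothing) is a subtype even of the null type:
def promisesNull: Null = throw new RuntimeException("diverged instead of returning null")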

What Pierce says

(TAPL section 15.4):

It is natural to ask whether we can also complete the subtype relation with a minimal element - a type Bot that is a subtype of every type.

He goes on to show that we can, and that this bottom type must contain no closed values in the type system under consideration. However, I think the wording here supports that the concept of bottom is borrowed from order theory, and so using it to describe the minimal element of another type system would be appropriate even if it weren't empty.

This is an interesting question--

--must Bot be empty, or if there is a type system with a non-empty minimal type, can we call that type "Bot"?

Regarding the Wikipedia entry, I must admit to working on that page many moons ago. (So take it with a grain of salt; it's only as good as my memory at the time... it may have been edited since then.)

It looks like I was wrong

He goes on to show that we can and that this bottom type must contain no closed values in the type system under consideration.

That's what I remembered, too. However, I don't remember his proof, and I suppose it's possible that bottom is inhabited in some type systems other than the one he described. In fact, in Bounded Quantification with Bottom, Pierce seems to say so himself:

Bot is a natural type for the "uninitialized pointer" value of languages like Java: saying that "null has every type" is just the same as "null [is a member of] Bot."

Of course, he gives the usual caveat when dealing with Java's bifurcated type system:

Strictly speaking, Java's null value has all object types, but not types like integer or array. The language studied here does not capture this refinement.

So I hereby recant: it appears that in some languages (Eiffel?) the null value can be a member of the bottom type; the bottom type doesn't have to be uninhabited. However, I'd add another area besides "primitives" where, in Java and many similar languages, the null type doesn't qualify as bottom. In Java, the following is illegal:

void f1() {
    return null; // won't compile
}

So the null type isn't a subtype of the void (or "unit") type in those languages, while the bottom type certainly is:


void f2() {
    f2(); // diverges: never returns normally
}

It's worth noting...

... that while many type systems include the empty type and call it bot, *no* realistic type system includes the true universal type, top. While java.lang.Object is the supertype of all objects in Java, and GENERAL is the supertype of all terms in Eiffel, neither is the supertype of anything not definable in the type system of the specific language. Java types don't derive from GENERAL, Eiffel types don't derive from Object, and neither type system contains real-world objects such as pink elephants (though both can model pink elephants, but that's another thing).

So if top can be type-system-dependent, so can bottom, I suppose. But it's still worthwhile to carefully distinguish between the empty type and types which are minimal within a given system but non-empty.

I don't get the complexity

I don't get the complexity of the discussion. Java and C# pointers are undisciplined option/maybe types:

type 'a option = None | Some of 'a

where null = None. Every access made by the virtual machine implicitly pattern matches:

x.method()

translates to:

match x with
  None -> raise NullReferenceException
| Some o -> o.method();;

One of the only issues I have with F# is that it doesn't enforce this pattern the way SML.NET does, which would eliminate null reference exceptions. The fact that nulls don't exist in functional programming is a major reason why functional programs don't fail as often as imperative ones.

Slate "on hold"

Slate, the language that used PMD, has been moribund "after loss of interest from the lead implementor." I'm not sure if this reflects on the difficulty of optimizing the dispatch algorithm in a highly dynamic environment. I was initially excited by PMD, but the reality of it ended up seeming to me like an argument in favor of generic functions.

Re: generic functions

Which part of the reality of it do you find troublesome? If the optimization issue were taken care of, would PMD be cool again? Or is prototyping (as in JavaScript, no?) just generally a bad idea? Is it only a bad idea when faced with having to choose between it and multi-methods? Thanks for any thoughts / experiences.

well

Not sure what you mean by a 'good idea', but the next version of ES4 (so JS2) promotes sealed classes by default: to use prototypes in the class system, one must go the extra step of annotating a class as 'dynamic'. This started with ActionScript 3 (Flash), which was motivated by the Flex guys wanting faster dispatch etc. Not sure why type feedback wouldn't have sufficed, but so it goes.

About Slate

I wrote that notice. Lee Salzman is the one who initially lost interest. PMD was not difficult in principle to optimize for; the problem was that we attempted to write a Strongtalk-style setup and succeeded at writing Strongtalk-style libraries while not really making any significant progress on a dynamic inlining framework to optimize away the cost of the heavy message-passing layers (Lee was the compiler-writer and focused on an elaborate and interesting compiler framework that he never used, and the VM we wrote wasn't suitable for movement in this direction).

So, while individual primitive message-sends were fast, average libraries were far slower than their counterparts. The Strongtalk experience explains that in detail.

unexpected scaling costs?

Brian Rice: ... average libraries were far slower than their counterparts.

Did layers add too much indirection? Did the layers or message passing touch too much memory? (I only ask because touching too much memory kills lots of things. It's hard to anticipate cache line behavior when things scale. Testing primitives in isolation can rack up nice looking numbers that don't apply later when cache lines are missed under memory load.)

I like to write methods that are small, and OO style encourages small methods, but this appears to have poor cache line behavior compared to making methods longer (with suitable use of __builtin_expect for likely/unlikely macros). Sorry, I'm just sketching this idea for others: it's a good idea to empirically test what happens if you arrange code to keep going without interruption in key runtime paths. (Sorry if this isn't directly related to Slate.)

Scaling Costs

The scaling costs were not unexpected, per se. The Strongtalk literature spells out that libraries from Strongtalk ran 30 times slower on other Smalltalk systems, but that libraries from other systems ran only 3 times faster on Strongtalk. The fact is that a polymorphic inlining compiler removes the heavy message-passing overhead of an extremely well-factored system, which is what we had. An example was a polymorphic "isZero" method which abstracted calls to "= 0". The library was so abstract and teased apart into separate layers that the code-base was extremely easy to read, but extremely slow as it scaled.

So as far as we could tell (using simple profiling), the costs were entirely about speed, not memory issues. Slate images have been garbage collected with a mark-sweep-compact cycle, which often tried to minimize object motion.

If I had some energy/enthusiasm left for working on the project at this point, I might try to identify/quantify this problem, but the simple fact is that inlining should have bought us a lot of advancements, and we've never had it.

From a point of view,

null means "doesn't exist". For instance, if a linked-list node's next member is null, then the next node doesn't exist. In jargonic terms, null logically negates the existence of an indirectly referenced object. This is a very easy definition to write out in predicate logic. This model holds true as long as there are no memory leaks in the code.
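
A hedged sketch of that reading (Scala, with hypothetical names): a null next member means "the next node doesn't exist", and spelling it as an Option makes that non-existence explicit.

// "next == null" read as "the next node doesn't exist":
class Node(val value: Int, var next: Node)

// The same idea made explicit with Option instead of a null reference:
class SafeNode(val value: Int, var next: Option[SafeNode])

val last  = new SafeNode(3, None)        // None: there is no next node
val first = new SafeNode(1, Some(last))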

Another possible treatment of null is "undefined".