Packaging Data And Methods

While studying OO at university, it was drilled into me that I should "package data with the methods that operate on that data".

In recent years I have come to wonder why so much emphasis is placed on this concept.

For example, most 3D Vector classes I have encountered look something like:

class 3DVector
{
private float X;
private float Y;
private float Z;

public 3DVector Normalize();
public 3DVector Add (3DVector other);
public float Dot (3DVector other);
... etc
}

This appears well organised: the various methods that act on a 3D vector are nicely packaged together. I'm guessing most OO programmers would generally approach class design in this way.

But at the same time, it's very hard for multiple vendors to add their own functionality to the class; they are forced to collaborate and merge their methods into the "current version".

In this instance, would it not be more sensible to have something like:

class 3DVector // Agreed; never going to change
{
public float X;
public float Y;
public float Z;
}

---- Vendor (1)

3DVector Vendor1_3DVector_Normalize (3DVector v)
3DVector Vendor1_3DVector_Add (3DVector a, 3DVector b)
float Vendor1_3DVector_Dot (3DVector a, 3DVector b)

---- Vendor (2)

3DVector Vendor2_3DVector_Cross (3DVector a, 3DVector b)
float Vendor2_3DVector_Length (3DVector v)
float Vendor2_3DVector_Dot (3DVector a, 3DVector b)

Then as a vendor, you are able to release new functions as and when you please.

As an application developer, you are free to pick and choose those functions that fit your requirements, or add your own.

The vendors would have to collaborate on the initial "Data Format" of a class, but once it is settled, an ecosystem of functions that operate on it would build up.
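A minimal C++ sketch of this arrangement, assuming the vendors use namespaces instead of the `Vendor1_`/`Vendor2_` name prefixes (the namespaces `vendor1` and `vendor2` and the application's `scale` function are hypothetical):

```cpp
#include <cmath>

// The agreed "data format": plain public data and no methods.
struct Vector3D {
    float x, y, z;
};

// Vendor 1 ships its own free functions over the shared struct.
namespace vendor1 {
inline Vector3D add(Vector3D a, Vector3D b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline float dot(Vector3D a, Vector3D b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
}  // namespace vendor1

// Vendor 2 releases on its own schedule without touching Vector3D or vendor1.
namespace vendor2 {
inline Vector3D cross(Vector3D a, Vector3D b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline float length(Vector3D v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }
}  // namespace vendor2

// The application mixes functions from both vendors and adds its own.
inline Vector3D scale(Vector3D v, float k) { return {v.x * k, v.y * k, v.z * k}; }
```

With namespaces, two vendors can both ship a `dot` without the names colliding, and the application chooses which one to call.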

I realise this is a simplistic example, and I can see it would be hard to extend the idea to more complex objects with state. But I am curious whether any real-world OO systems leverage this sort of thing, and if so, how far they take it.


Definitely!

The entire Common Lisp Object System works this way: methods belong to generic functions rather than to classes, a deliberate choice that fits the basically functional nature of Lisp better than a traditional class-based object system would. Oleg Kiselyov has also written about why such a functional approach can be useful even in a language such as C++. Note, though, that his BRules system is purely functional, and that's important to its correctness, so the analogy to what you propose is relatively weak.

CLOS

Thanks for the info. I'll take a closer look at CLOS, I'd be interested to see how it scales to more complex objects.

More than just CLOS

This technique is (sort of) used in the 'object systems' of scripting languages such as Perl/Python. The subs/functions (methods) of the packages (classes) by convention take a self (object) reference as their first parameter.

By calling these methods statically and explicitly passing in the self object (instead of relying on the implied self of a method call on an object created via new), these methods are called in a functional style. Indeed, this is how I mentally model method calls when programming in these languages (by assuming all 'methods' are static). Further, this helps to localise the notion of 'state', as in functional programming.
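C++ offers a comparable way to call a member function with the object passed explicitly: take a pointer to the member function and call it through `std::invoke` (C++17). A small sketch; `Counter` and `add_twice` are hypothetical names:

```cpp
#include <functional>

// An ordinary class with one member function.
struct Counter {
    int value = 0;
    int add(int n) { value += n; return value; }
};

// Calling the member function "statically": the object is passed
// explicitly as the first argument, much like Python's Counter.add(c, 5).
inline int add_twice(Counter& c, int n) {
    std::invoke(&Counter::add, c, n);         // same effect as c.add(n)
    return std::invoke(&Counter::add, c, n);  // object passed explicitly again
}
```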

The reason for packaging data and methods together is...

...that originally object orientation was about message dispatching, and there was only one receiver that handled the message. Therefore, it made sense to make the receiver's context implicit (the this/self keyword), so that the programmer did not have to name the current object over and over. It also provided a sense of security, since the fields of the current object were directly accessible, just like local variables.

But, in reality, object orientation could work without packaging data and methods together. In fact, object orientation is nothing more than pattern matching on the data type.

Let's say we have a message M and a receiver R1 and receiver R2. Message dispatch can be written as (in a pseudo-language):

procedure M(R)
    if R is type R1 then
        R1_M(R)
    else if R is type R2 then
        R2_M(R)
    end if
end procedure

or in a pseudo-functional way:

let M R = R is R1 => R1_M R
        | R is R2 => R2_M R

Just like pattern matching is a selection of function based on a predicate, message dispatch is a selection of an implementation based on type; i.e. the predicate is the type.
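In C++17 the same dispatch can be written almost literally with `std::variant` and `std::visit`. This is an illustration of the idea, not how C++ virtual calls work internally; `R1`, `R2`, `R1_M`, `R2_M` and `M` follow the pseudo-code's names, and their contents are hypothetical:

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Two unrelated "receivers"; neither owns a method for the message M.
struct R1 { int n; };
struct R2 { std::string s; };

// Implementations of M for each receiver, written as free functions.
inline std::string R1_M(const R1& r) { return "R1 got " + std::to_string(r.n); }
inline std::string R2_M(const R2& r) { return "R2 got " + r.s; }

using Receiver = std::variant<R1, R2>;

// The message M: select an implementation by the receiver's type,
// i.e. pattern matching where the predicate is the type.
inline std::string M(const Receiver& r) {
    return std::visit([](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, R1>) return R1_M(x);
        else return R2_M(x);  // the only other alternative is R2
    }, r);
}
```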

The above can be implemented in various ways. C++ and Java implement it with virtual tables (that's why a message cannot be sent to just any object and we need interfaces); Smalltalk implements it with message dispatch tables (that's why any object can receive any message); CLOS implements it with a map (from what I know - please correct me if I am wrong), etc.

For me, it would have been better if the transition from C to C++ had kept the functional style. Not only would it make for better code, it also seems more natural and leaves the door open for a) open classes and b) multimethods. It also makes the code look better, because with methods outside the data you need one less indentation level.

For example, instead of writing

class Vector {
public:
    int x;
    int y;
    int z;
    Vector();
    Vector(int x, int y, int z);
    int length();
    Vector &operator = (const Vector &v);
};

one could write:

class Vector {
    int x;
    int y;
    int z;
};

Vector(Vector &v);
Vector(Vector &v, int x, int y, int z);
int length(Vector &v);
const Vector &operator = (Vector &dst, const Vector &src);
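Today's C++ gets fairly close to the second form. A sketch, assuming a `struct` (so the members are public) and a hypothetical factory function `make_vector` standing in for the free-standing constructor, which C++ does not allow; `length` returns `double` here rather than `int`:

```cpp
#include <cmath>

// Plain data; members are public so free functions can reach them.
struct Vector {
    int x = 0;
    int y = 0;
    int z = 0;
};

// C++ has no free-standing constructors, so an ordinary function
// stands in for Vector(Vector &v, int x, int y, int z).
inline Vector make_vector(int x, int y, int z) { return Vector{x, y, z}; }

// Operations live outside the type, one indentation level shallower.
inline double length(const Vector& v) {
    return std::sqrt(double(v.x) * v.x + double(v.y) * v.y + double(v.z) * v.z);
}
```

Copy assignment is the one piece that cannot be written this way in standard C++, because `operator=` must be a member; the compiler-generated one already copies the three fields.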

I bet that the transition from procedural to object-oriented style would be much easier if the functional style had been kept (and the compiler secretly built vtables in the background).
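The "secretly built vtables" idea can be imitated by hand in today's C++ with plain structs and tables of function pointers. This sketch (all names hypothetical) shows the mechanism only, not how any particular compiler lays out its vtables:

```cpp
#include <string>

// Plain data records: no methods, no hidden vtable pointer.
struct Rect { double w, h; };
struct Square { double side; };

// A hand-written "vtable": one function pointer per message.
struct ShapeVTable {
    double (*area)(const void* self);
    std::string (*name)(const void* self);
};

// Ordinary free functions for each data type.
inline double rect_area(const void* s) { auto r = static_cast<const Rect*>(s); return r->w * r->h; }
inline std::string rect_name(const void*) { return "rect"; }
inline double square_area(const void* s) { auto q = static_cast<const Square*>(s); return q->side * q->side; }
inline std::string square_name(const void*) { return "square"; }

// One table per data type (C++17 inline variables).
inline const ShapeVTable rect_vtable{rect_area, rect_name};
inline const ShapeVTable square_vtable{square_area, square_name};

// A reference that pairs the data with the table used to dispatch on it.
struct ShapeRef {
    const void* self;
    const ShapeVTable* vt;
};

inline ShapeRef as_shape(const Rect& r) { return {&r, &rect_vtable}; }
inline ShapeRef as_shape(const Square& q) { return {&q, &square_vtable}; }

// Sending a message = looking the function up in the table and calling it.
inline double area(ShapeRef s) { return s.vt->area(s.self); }
inline std::string name(ShapeRef s) { return s.vt->name(s.self); }
```

Adding a new data type means writing its free functions and one more table; the existing types are left untouched.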

The link about sets and bags really confirms my claims that OOP has (relatively speaking) failed; the conclusions mentioned at that site are in agreement with my experience.

Just like pattern matching

Just like pattern matching is a selection of function based on a predicate, message dispatch is a selection of an implementation based on type; i.e. the predicate is the type.

Interesting, I never thought of it that way.

Classes are values too.

Classes are values too. Each number is a different class. Take integers, for example: each one has a different representation in the computer, i.e. a different structure of bits, i.e. a different record, i.e. a different class. Therefore pattern matching on values is no different from pattern matching on classes.

In fact, the basis of all computing is the two types bit 0 and bit 1. A combination of those yields a new type. Arithmetic operations for these types can be considered overloads: for example, 2 + 1 is an overloading of + for types 2 and 1, respectively. Integers are records of union types of all possible products of 0 and 1. Records are tuples of integers, classes are records, etc.

I personally consider type and value interchangeable concepts. Following this idea, I came to the conclusion that a programming language should have a type system that also allows specific values to be declared as types. Modern mainstream programming languages fail on this point: algorithms often imply subtypes of the types they use, but the languages offer the programmer no way to define those subtypes.
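Mainstream languages can only approximate such value-defined subtypes. One common workaround in C++, sketched here with the hypothetical names `Positive` and `sum_up_to`, is a wrapper type whose constructor enforces the predicate, at run time rather than at compile time:

```cpp
#include <stdexcept>

// A "subtype" of int defined by a predicate on values (n > 0).
// The check happens once, at construction; afterwards the type carries it.
class Positive {
public:
    explicit Positive(int n) : n_(n) {
        if (n <= 0) throw std::invalid_argument("Positive requires n > 0");
    }
    int get() const { return n_; }
private:
    int n_;
};

// An algorithm that assumes a positive argument can state that
// in its signature instead of in a comment.
inline int sum_up_to(Positive p) {
    int n = p.get();
    return n * (n + 1) / 2;
}
```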

I have an idea for a programming language that has no type declarations, where all typing is done with the binary operator 'as' applied over expressions. I think it would be a very good type system, catching far more assumptions than those of mainstream PLs. The compiler would simply check whether the expression on the left side is compatible with the result of the expression on the right side. This has the added benefit of using functions as type declarations, thus making functions executable at compile time.

'as' / types <-> values

(BTW, I had to read that last paragraph a few times before I noticed 'as' was in a slightly different font :)

Could you expand upon this idea? E.g. what would some simple code look like?

This reminds me of a paper on category theory that I've been trying to read -- true "equality" of two given objects is not really interesting, it is rather the two-way transformations/mappings (isomorphisms) that we make between them which are interesting; but sometimes we forget about the original objects, and we lose a lot of information.
(To paraphrase: "Take an apple and cut it into two equal pieces; of course, they are not really equal, because if they were, there would only have been half an apple." The paper is by Baez & Dolan.)

Likewise, aren't types a kind of isomorphism?

example of overloading based on values:

example of overloading based on values:

let factorial(n : n > 0) = n * factorial(n - 1)
let factorial(0) = 1
let factorial(n) = raise error

In the above example we can see the function 'factorial' implemented in 3 versions: one for positive numbers, one for 0, and one for negative numbers.

The style of programming resembles overloading methods, as there are 3 different functions with the same name.

The types of parameters are: positive, zero and negative.

The above is also pattern matching, because the function would actually be coded like this:

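#include <stdexcept>

template <typename T>
T factorial(T n) {
    if (n > 0) return n * factorial(n - 1);  // positive: recurse
    if (n == 0) return 1;                    // zero: base case
    throw std::domain_error("factorial of a negative number");  // negative
}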
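C++ templates can also select an implementation by value at compile time. A sketch of the three-clause factorial as a class template with a specialization for 0 (`Factorial` is a hypothetical name):

```cpp
// Compile-time "overloading on values": the primary template covers
// the general case, and a specialization handles the value 0.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// Negative arguments are ruled out by the parameter type: Factorial<-1>
// does not compile, so the "raise error" clause has no run-time counterpart.
static_assert(Factorial<5>::value == 120, "5! should be 120");
```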

more than one type, without types

OK, but what if there are more than integers?

E.g. if you construct everything out of integers, how can you define '+' differently for anything else (like complex numbers)?
Do you have "primitive" operators (prim+, prim=, etc.) for the low-level, and redefine '+' for higher-level types (representations)?

I'm not sure if the latter part of your previous post,

I have an idea to make a programming language that has no type declarations, and all typing is done with the binary operator 'as' applied over expressions

is referring to something different from your factorial example, or not (you didn't mention 'as').

just to play devil's

just to play devil's advocate - what you lose here is the ability to hide information. in some cases it is useful to have methods with privileged access to information (i.e. information that is otherwise restricted). that implies some kind of association between method and data (although you could imagine a pile of different approaches; it doesn't have to be via monolithic objects).

there's also a rather more pragmatic reason for the advice you were given. if you are restricted to using a language where methods and data are grouped in objects then you have to put your method with some data, and in that case it's easier to maintain if the methods pertain directly to the data being manipulated. for example, "toString" methods should go in the relevant class rather than String so that you don't have to extend String for each new datatype.

but i think you're correct in your implied claim that the arguments i give are over-emphasised in much oo programming. on the other hand, i think there is a change in emphasis, and that this is one of the issues thrown into the confused "dynamic languages" debate.

what you lose here

what you lose here is the ability to hide information. in some cases it is useful to have methods with privileged access to information

Yes, I think I might have taken the thought too far.

I can see that pulling methods out of simple classes, for example 3DVector, Colour or 4x4Matrix as I originally mentioned, is probably a good idea: someone may come up with a new operation you can perform on matrices, and could add and share it without having to submit the change to the original author of the class.

But I can see that as soon as you have a more complex class, say a motorcar, then you would want hidden state, and the ability to privately manipulate that state within the confines of the class (encapsulation).

This could be handled by a

But I can see that as soon as you have a more complex class, say a motorcar, then you would want hidden state, and the ability to privately manipulate that state within the confines of the class (encapsulation).

This could be handled by a more complex type system such as OCaml's. While OCaml uses the object=attributes+methods model, it can easily hide information via interfaces:

Say you create a simple ocaml module named foo.ml:

open Printf;;

class point =
object (self)
  val x = 0
  val y = 0
     
  method private print_y = 
    printf "y: %d\n" y;
    ()

  method print =
    self#print_y; 
    printf "x: %d\n" x;
    ()
end;;

but you want to completely hide everything but the #print method in the foo.mli interface:

class point :
object
  method print : unit
end;;

then if you have another module, bar.ml, you can do this:

let p = new Foo.point in
  p#print;;

running:

> ocamlc foo.mli foo.ml bar.ml
> ./a.out
y: 0
x: 0

but trying to access #print_y will fail with a type error:

bah.ml:

let p = new Foo.point in
  p#print_y;;

running:

> ocamlc foo.mli foo.ml bah.ml
File "bah.ml", line 2, characters 2-3:
This expression has type Foo.point
It has no method print_y

Using this mechanism, one can see how attributes and methods could be separated from the definition site via multiple interfaces to the same object. Unfortunately, I don't think OCaml supports hiding public methods, so the technique can't be fully implemented in it.
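The same idea, several restricted interfaces onto one object, can be sketched in C++ with abstract base classes: a client handed a `Printable&` has no way to name the other operations. Unlike the OCaml `.mli` approach, the interfaces here are declared up front and named in the class definition. All names are hypothetical:

```cpp
#include <string>

// Two views of the same object, each exposing a different subset.
class Printable {
public:
    virtual ~Printable() = default;
    virtual std::string print() const = 0;
};

class Movable {
public:
    virtual ~Movable() = default;
    virtual void move_by(int dx, int dy) = 0;
};

// One concrete class implements both; its data stays private.
class Point : public Printable, public Movable {
public:
    std::string print() const override {
        return "x: " + std::to_string(x_) + ", y: " + std::to_string(y_);
    }
    void move_by(int dx, int dy) override { x_ += dx; y_ += dy; }
private:
    int x_ = 0;
    int y_ = 0;
};

// A client that only receives a Printable& cannot call move_by.
inline std::string describe(const Printable& p) { return p.print(); }
```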

Encapsulation should be a matter of accessibility, not syntax.

Encapsulation should be a matter of accessibility, not syntax. Different parts of a more complex class could be trusted only to certain functions.

sure, but in practice you

sure, but in practice you may want to control who decides this. one way of doing so is to only allow the programmer to grant access to hidden state within the lexical scope of the definition of the data structure.

that and how dispatch is decided seem to be the important issues here. are there any languages that provide both guaranteed (fixed at compile time) access control and multiple dispatch? if so, what syntax do they use? if not, is an elegant syntactical solution possible?

maybe there's an obvious example i'm missing...

Even when one vendor writes

Even when one vendor writes the classes, not all the functionality is in the class. Consider the Math library in Java and .NET. It provides a lot of functionality for the various numeric classes, in addition to the methods built into those classes.

As for picking and choosing methods between vendors, that usually won’t be an option. If for some reason they agree on a common data representation, they will also agree on a common set of methods.

Everything is Design

There are situations where one particular design approach is reasonable, and there are situations where a different approach is better. One example does not make or break a design approach. I'd posit that the real problem with OOD is not OOD itself, it is:

  • The general lack of understanding when it is appropriate.
  • The fact that popular languages do not help you escape out of a narrow style of design.

Overgeneralized drilling

I'm guessing most OO programmers would generally approach class design in this way.

Actually, no (I hope). Anybody who deals with 3D vectors knows that "x" is not an implementation detail; it's a key attribute of the object. In your use cases, you would realize that people will want to access these components (i.e. you can't abstract them away). So you would have:

public float getx();
public float gety();
public float getz();

The other methods may remain. In all likelihood, other vendors will write new (non-member) functions for any new functionality.

I wish I had been taught OOA/OOD in college myself. Just teaching Java/C++/etc. is not enough, because people write some really bad code thinking it's the OO way.

You are in luck, SamK!

You are in luck, SamK! Smalltalk works in exactly this way.

Smalltalk programs are represented as "changesets", which are sets of changes to some reference system ("image"). The changes can not only add new classes (as e.g. Java programs do) but also make changes to existing classes (for example adding new methods). To write a Smalltalk program you just change the existing system and then ask it to create a changeset representing what you did.

You should check out Squeak! I started to recently but my spare time has been less copious than usual so I haven't gotten far.

Smalltalk

I like the idea of changing the state of the system and asking it to create a record of the changes.

I will look more closely at Smalltalk, but is it possible for people to publish modifications without thought, or do all changesets go through some sort of "conflict resolution" before release to the public?

publishing ST changesets

You can create changesets whenever you like, but they aren't automatically published to anyone (you just get a .cs (changeset) file).

There are internal tools for examining/editing changesets (what classes were changed, which lines of which methods, etc.) and I believe for merging changesets, although I haven't used the latter.

Personal computing environment

Changes to Smalltalk classes and methods are automatically recorded in a file called the change log, as snippets of Smalltalk code that can recreate those changes. So if we changed something and as a result lost all our work, we could grab a clean new image and source file, replay the old change log up to just before the bad idea, and get back to where we were.

That's Smalltalk as a personal computing environment - for decades there have been fine-grained version control systems that create a team programming environment for Smalltalk.

Some hint of the incredibly playful personal system they were trying to create comes through in this video - imagine programming a new kind of number, and then having that new kind of number be used to calculate the window coordinates of all the windows Smalltalk displayed...

Great

Thanks for the link. I wish there were a cable channel that played these types of videos 24/7, but Google will do. At least their video section was smart enough to give me a list of similar stuff.

Data oriented PL

I also think that data should be separated from methods. I see the same thing when running several languages on Neko: you don't need a class system to get good interoperability, you only need your different languages to share a contract for the same data structures.

Contracts

Isn't the public (and to some degree protected) interface of a class a contract? I believe Herb Sutter would agree. And therefore aren't contracts themselves a form of information-hiding?

Although I agree with you that in certain domains data should be separated from methods, I myself struggle with the semantic difference between interfaces and contracts.

Encapsulation without owning methods

These are orthogonal in many languages (including some mainstream languages like C++). Here is how encapsulation without owning methods can look in C++:

class Foo {
private:
    int x;
    friend int bar(Foo &foo);
};

int bar(Foo &foo) {
    return foo.x;
}

I didn't realise you could

I didn't realise you could make free functions friends of a class.

...but does it really separate the methods from the class? You still have to modify the class each time you need to add a new friend:

#include <cmath>

class Vector3D {
private:
    float x;
    float y;
    float z;
    // Each new privileged function must be declared here.
    friend float Length(const Vector3D &v);
    friend float Dot(const Vector3D &v, const Vector3D &other);
    friend Vector3D Cross(const Vector3D &v, const Vector3D &other);
};
float Length(const Vector3D &v) {
    return std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
}
float Dot(const Vector3D &v, const Vector3D &other) {
    ...
}
Vector3D Cross(const Vector3D &v, const Vector3D &other) {
    ...
}

The functions are still tied to the class

You are right that the class needs to be modified to add more friend functions. In this example, the option to use friend functions only means that the syntax of calls isn't tied to encapsulation, so you can make decisions about the syntax and encapsulation separately.

AOP

Aspect Oriented Programming tries to handle this. "Introductions" are like Smalltalk changesets (with the added power of private, per-introduction fields); "advice" uses predicates on the program to trigger code. The predicates are called "pointcuts" over "join points"; join points are typically control-flow related.

It's very powerful at modularizing code, so much so that the touted modularity requires whole-program reasoning. For example, one of AspectJ's pointcuts picks out the entire control flow of an arbitrary pointcut: cflow(call(* Foo.method(*))) matches every join point in the control flow of any call to "method" (any return type, one argument of any type) on the Foo class. Precisely determining when this pointcut matches isn't necessarily possible until runtime.

note that clos doesn't

note that clos doesn't restrict access to data in any way, iirc (although presumably you could extend it to do so if you thought of a good solution) [argh - sorry, i keep forgetting that the box at the bottom of the page isn't to reply to the last item!]

sorry

Sorry if i'm missing the deeper meanings of what you're trying to convey, but perhaps what you're asking for is C structs and procedural programming?

If you're going OO, what's the matter with interfaces (or abstract classes) that all vendors agree on, each vendor implementing its own classes with the desired add-on methods? Not to say that the CLOS way doesn't have its merits...

I don't quite get it. But then, i may be just sleepy and shouldn't be posting right now...

the benefits of private are social

A vector is a case where exposing all the data is the right design. That's because a vector is really pure data. It's very unlikely you'll need to make any changes that would break compatibility. Many mathematical concepts have this property, but few other objects do.

Some benefits of private include: you can expose a smaller interface (to help people use your code) and help the object maintain invariants which make it easier to reason about code and make certain bugs impossible. And anything that is hidden can be thrown away and rewritten without breaking code that uses the class.
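As an illustration of the invariant point, here is a hedged C++ sketch of a vector type that, unlike the all-public 3DVector, does benefit from private data; the class `UnitVector` and the function `cos_angle` are hypothetical:

```cpp
#include <cmath>
#include <stdexcept>

// A unit-length direction. Because the components are private and only
// the constructor sets them, every UnitVector has length 1 (up to rounding).
class UnitVector {
public:
    UnitVector(double x, double y, double z) {
        double len = std::sqrt(x * x + y * y + z * z);
        if (len == 0) throw std::invalid_argument("zero vector has no direction");
        x_ = x / len;
        y_ = y / len;
        z_ = z / len;
    }
    double x() const { return x_; }
    double y() const { return y_; }
    double z() const { return z_; }
private:
    double x_, y_, z_;
};

// Code using UnitVector can rely on the invariant without re-checking:
// the dot product of two unit vectors is the cosine of the angle between them.
inline double cos_angle(const UnitVector& a, const UnitVector& b) {
    return a.x() * b.x() + a.y() * b.y() + a.z() * b.z();
}
```

Because callers only see the accessors, the internal representation (say, storing spherical angles instead) could later be rewritten without breaking them.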

All these benefits are huge, huge things in the real world, especially for code that has many users.

java.awt.Font has been completely rewritten since Java 1.0 (probably several times). This wouldn't have been possible if Font were a struct with public data.

private strictly decreases the power of a programming language. But it has social benefits that make it indispensable to programmers in the real world. That said, a lot of people think Java/C++'s private is too restrictive and wish it were more of a guideline than an absolute rule.

private strictly decreases

private strictly decreases the power of a programming language.

I guess I should have said, it decreases the power of code that uses it. It's funny that something so purely negative in its meaning can be of such tremendous value to people, but I don't think it's hard to see why.

Of course

Information hiding is an essential component of software engineering. Indeed, it is one of the few ideas in the field that survived the test of time.

Support for information hiding increases the expressiveness of a programming language (or a module language, if that's more to your liking).