Categories, the homemade object system

Recently I posted a blog entry about some work I was doing on a novel object system, which I called Categories. That post generated a burst of commentary in a couple of places; one relevant comment suggested that I should post something about Categories here.

I thought, "that's a good idea; why didn't I think of it?" I've been reading LtU for years. What better place to find people who will tell me all the things that are wrong with Categories?

The main point of the Categories object system is to clearly and cleanly separate representation, behavior, and taxonomy. I wrote an article entitled "Protocols" eighteen years ago that explained why one might want to do this, and suggested that most object systems conflate at least two of them.

By "representation" I mean the concrete layout of data in storage. Behavior is the set of operations that are defined over some data. Taxonomy is the collection of representations into categories, and the superset/subset relations among those categories. My claim is that these three concepts are logically distinct, and that, although most object systems don't do it, it's possible to treat them distinctly.

Why should anyone care? The article linked above provides one argument. Another way to look at it is suggested by a comment from Joe Armstrong in Peter Seibel's new book, Coders At Work:

“Because the problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.”

The fact that I want the behavior of a shape doesn't necessarily mean that I want the fields that some other programmer used to represent it. Pretty often, I want the behavior of an existing class without the representation; or I want the representation, but not the taxonomy. Haskell gets it right, in my estimation. In Haskell, representation is handled by datatypes; behavior by functions; and taxonomy by typeclasses.

I love Haskell, but I'm a Lisp-hacker by nature. If there's a way to do my work in a Lisp, that's what I'll do.

When I found myself dissatisfied with Clojure's treatment of types and polymorphism, and I started coding around the parts I didn't like, I ended up writing an implementation of types and polymorphic functions that pretty closely tracked the ideas in that old article. This bit of work I named "Categories".

Categories is a library written in Scheme and in Clojure. It implements a treatment of types and polymorphic functions that sharply distinguishes representation, behavior, and taxonomy.

Representation is handled by elements called, oddly enough, "representations". A representation is a concrete specification for storing data. In the Scheme version, for example, fixnums and vectors are representations.

Behavior is handled by functions. A function is an applicable object that accepts zero or more values as input parameters, and computes and returns zero or more values as outputs. Functions are polymorphic. A function examines its inputs and, based on some computed characteristic of them, chooses some concrete piece of code, called a method, to run.
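
The dispatch step described above can be sketched in a few lines of Python. This is only an illustration, not the Scheme or Clojure code of Categories: the names `PolyFunction`, `characteristic`, and `add_method` are hypothetical, and the "computed characteristic" is assumed here to be the tuple of argument classes.

```python
class PolyFunction:
    """A callable that picks a method from a computed characteristic of its inputs."""

    def __init__(self, name, characteristic):
        self.name = name
        self.characteristic = characteristic  # maps the argument tuple to a dispatch key
        self.methods = {}                     # dispatch key -> method (a plain callable)

    def add_method(self, key, method):
        self.methods[key] = method

    def __call__(self, *args):
        key = self.characteristic(args)
        try:
            method = self.methods[key]
        except KeyError:
            raise TypeError(f"{self.name}: no method for {key!r}") from None
        return method(*args)


# Hypothetical example: dispatch on the tuple of argument classes.
describe = PolyFunction("describe", lambda args: tuple(type(a) for a in args))
describe.add_method((int,), lambda n: f"integer {n}")
describe.add_method((str,), lambda s: f"string {s!r}")
describe.add_method((int, str), lambda n, s: f"{n} copies of {s!r}")
```

Because the characteristic is an ordinary function of the arguments, the same shape would also cover dispatch on values, on argument count, or on anything else that can be computed from the inputs.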

Taxonomy is handled by domains. A domain is a description of a set of types, the relations among them, and the rules that functions use to select methods based on the types of their inputs.

Types are sets of values. Concrete types are aliases for representations. Abstract types are nothing more than names, commonly used in domains to collect and organize related concrete types. As an example, Gambit has built-in fixnum and bignum datatypes. The Gambit implementation of Categories has two concrete types that are aliases for these representations, and an abstract type that the default domain defines as a supertype of both.
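
The fixnum and bignum example can be modeled roughly as follows. This is a sketch under stated assumptions rather than the Categories API: `FixnumRep` and `BignumRep` are toy stand-ins for Gambit's representations, and the type names "fixnum", "bignum", and "integer", like the `Domain` methods, are hypothetical.

```python
from dataclasses import dataclass


# Two toy representations standing in for Gambit's fixnum and bignum storage.
@dataclass(frozen=True)
class FixnumRep:
    word: int        # pretend this fits in one machine word


@dataclass(frozen=True)
class BignumRep:
    digits: tuple    # pretend this is a sequence of machine-word digits


class Domain:
    """Taxonomy only: type names and the supertype relations among them."""

    def __init__(self):
        self.representation_of = {}  # concrete type name -> representation class
        self.parents = {}            # type name -> set of direct supertype names

    def alias(self, type_name, representation):
        """Declare a concrete type as an alias for a representation."""
        self.representation_of[type_name] = representation
        self.parents.setdefault(type_name, set())

    def abstract(self, type_name, subtypes):
        """Declare an abstract type: just a name placed above some subtypes."""
        self.parents.setdefault(type_name, set())
        for sub in subtypes:
            self.parents.setdefault(sub, set()).add(type_name)

    def type_of(self, value):
        for name, rep in self.representation_of.items():
            if type(value) is rep:
                return name
        raise TypeError(f"no concrete type for {type(value).__name__}")

    def is_subtype(self, sub, sup):
        # Assumes the supertype relation is acyclic.
        return sub == sup or any(
            self.is_subtype(p, sup) for p in self.parents.get(sub, ())
        )


default = Domain()
default.alias("fixnum", FixnumRep)
default.alias("bignum", BignumRep)
default.abstract("integer", ["fixnum", "bignum"])
```

Note that neither representation class knows anything about "integer"; the supertype relation lives only in the domain.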

This arrangement of elements accomplishes what I wanted: it completely separates representation, behavior, and taxonomy. Okay; so what? Is there anything else good about it? Is separating those concerns actually good in practice? What's it like to use it?

Obviously, I'm biased. It's what I wanted, so I like it. It's also very young; the present code is the result of tinkering around for five months. The API is still moving around some, and the working code is still likely to have significant bugs at any given moment.

Keeping those caveats in mind, I have been using Categories in production code for a few months, and there are definitely things I like about it. I like it well enough that, when I began work on a new project in Scheme, leaving my Clojure work aside for a bit, I missed Categories and ended up porting it from Clojure to Scheme.

It's very easy to build whatever abstract types I want. I simply alias the needed representations and use a domain to collect the resulting types into a convenient arrangement. If that arrangement turns out to be inconvenient later for some other API, nothing is lost; it's quite easy to make another domain that is as similar or different as I need. Once created, domains are invisible, except when creating new functions. The function constructor takes a domain as a parameter, and the new function remembers it. Later rearrangements of the domain's types are automatically visible in all existing functions defined on that domain, but have no effect on functions defined with different domains.
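
The relationship between functions and domains described above might look like this in miniature. Everything here is assumed for illustration (the `Domain` and `Function` classes, their method names, and single-argument dispatch); it is not the actual Categories constructor.

```python
class Domain:
    def __init__(self):
        self.type_of_rep = {}  # representation (a Python class) -> type name
        self.parents = {}      # type name -> set of direct supertype names

    def alias(self, name, rep):
        self.type_of_rep[rep] = name
        self.parents.setdefault(name, set())

    def add_supertype(self, sub, sup):
        self.parents.setdefault(sup, set())
        self.parents.setdefault(sub, set()).add(sup)

    def ancestors(self, name):
        """Yield name, then its supertypes, nearest first (breadth-first)."""
        queue, seen = [name], set()
        while queue:
            t = queue.pop(0)
            if t not in seen:
                seen.add(t)
                yield t
                queue.extend(sorted(self.parents.get(t, ())))


class Function:
    def __init__(self, name, domain):
        # The function remembers the domain it was created with.
        self.name, self.domain, self.methods = name, domain, {}

    def add_method(self, type_name, method):
        self.methods[type_name] = method

    def __call__(self, value):
        start = self.domain.type_of_rep.get(type(value))
        if start is not None:
            # The domain is consulted at call time, so later changes are seen.
            for t in self.domain.ancestors(start):
                if t in self.methods:
                    return self.methods[t](value)
        raise TypeError(f"{self.name}: no applicable method")


numeric, other = Domain(), Domain()
for d in (numeric, other):
    d.alias("int", int)

show_a = Function("show_a", numeric)
show_b = Function("show_b", other)
for f in (show_a, show_b):
    f.add_method("number", lambda v: f"number {v}")

# Rearranging one domain affects only the functions built on it.
numeric.add_supertype("int", "number")
```

After the last line, `show_a(1)` finds the "number" method, while `show_b(1)` still fails, because `other` was never rearranged.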

Like Haskell, Categories enables you to define data layouts, functions, and classes of types completely independently. Like CLOS, it also enables you to arrange for 'inherited' behavior, if that's what you want. A domain can tell you whether a method is applicable to a sequence of argument values, and, given two applicable methods, it can tell you which one is more specific to the particular argument values. Categories can therefore support CLOS-style next-method dispatching--if you want it. The API is there to provide that kind of dispatching; you can use it or not, as you see fit.
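
To make the two domain questions concrete (is a method applicable, and which of two applicable methods is more specific?), here is a small Python sketch of next-method dispatch built on them. The design is an assumption, not the Categories implementation: specificity is measured as the number of supertype steps from each argument's type to the method's declared type, compared left to right, and each method receives a hypothetical `next_method` callable as its first parameter.

```python
class Domain:
    def __init__(self, type_of, parents):
        self.type_of = type_of  # callable: value -> type name
        self.parents = parents  # type name -> list of direct supertype names

    def distance(self, sub, sup):
        """Supertype steps from sub up to sup, or None if sup is not reachable."""
        frontier, steps, seen = [sub], 0, set()
        while frontier:
            if sup in frontier:
                return steps
            seen.update(frontier)
            frontier = [p for t in frontier
                        for p in self.parents.get(t, []) if p not in seen]
            steps += 1
        return None

    def rank(self, signature, args):
        """One distance per argument, or None if the signature does not apply."""
        if len(signature) != len(args):
            return None
        dists = [self.distance(self.type_of(a), s) for a, s in zip(args, signature)]
        return None if None in dists else tuple(dists)

    def applicable(self, signature, args):
        return self.rank(signature, args) is not None

    def more_specific(self, sig_a, sig_b, args):
        # Both signatures are assumed applicable; tuples compare left to right.
        return self.rank(sig_a, args) < self.rank(sig_b, args)


class Function:
    def __init__(self, domain):
        self.domain, self.methods = domain, []  # list of (signature, method)

    def add_method(self, signature, method):
        self.methods.append((tuple(signature), method))

    def __call__(self, *args):
        ranked = sorted(
            (m for m in self.methods if self.domain.applicable(m[0], args)),
            key=lambda m: self.domain.rank(m[0], args),
        )
        if not ranked:
            raise TypeError("no applicable method")

        def call(i):
            def next_method():
                if i + 1 >= len(ranked):
                    raise TypeError("no next method")
                return call(i + 1)
            return ranked[i][1](next_method, *args)

        return call(0)


domain = Domain(
    type_of=lambda v: {bool: "boolean", int: "integer", float: "float"}[type(v)],
    parents={"integer": ["number"], "float": ["number"],
             "number": ["anything"], "boolean": ["anything"]},
)
describe = Function(domain)
describe.add_method(["anything"], lambda next_method, v: "a value")
describe.add_method(["number"], lambda next_method, v: "a number, which is " + next_method())
describe.add_method(["integer"], lambda next_method, v: "an integer, which is " + next_method())
```

A method that never calls `next_method` simply ends the chain, which matches the "use it or not" spirit of the text: the ordering is available from the domain, but nothing forces a function to inherit behavior.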

As I say, Categories is still very young, and it seems like every day I still see another way to improve the API or the implementation. You can't get the newest working code; I'm in the middle of refactoring domains again. Older versions are available, but probably less interesting at this point. It might be a week or so before I have a working release.

Feel free to tell me where I'm out of my mind, though, and if you want a look at the code, ask me in a couple of weeks; by then I'll likely have something packaged that's usable.

By mikel evins at 2009-08-15 12:06 | LtU Forum | 17 comments | other blogs | 7233 reads
