Do we need exactly two binding constructs?

I've recently been thinking about the relation between objects and lexical environments. Objects, especially under a prototype-based object system, look suspiciously similar to lexical environments:

slot-ref <-> environment-ref
slot-set! <-> environment-set!
add-slot! <-> environment-define
parent(s)-of/delegate(s)-of <-> environment-parent(s)
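
To make the correspondence above concrete, here is a minimal Python sketch of a single binding construct that serves as both an environment and a prototype-style object. The class name `Env` and its method names are hypothetical, chosen only to mirror the four pairs listed above. One assumption is worth flagging: `set` mutates the binding where it is found, the way environment-set! usually behaves, whereas many prototype systems would write to the receiver instead.

```python
class Env:
    """One binding construct used as both a lexical environment and an object.

    Hypothetical sketch: names are illustrative, not taken from any real system.
    """

    def __init__(self, parent=None):
        self.parent = parent      # environment-parent / delegate
        self.bindings = {}

    def define(self, key, value):
        # environment-define / add-slot!: always binds in this frame.
        self.bindings[key] = value

    def ref(self, key):
        # environment-ref / slot-ref: walk the parent chain.
        env = self
        while env is not None:
            if key in env.bindings:
                return env.bindings[key]
            env = env.parent
        raise KeyError(key)

    def set(self, key, value):
        # environment-set! / slot-set!: assumption, mutate where found.
        env = self
        while env is not None:
            if key in env.bindings:
                env.bindings[key] = value
                return
            env = env.parent
        raise KeyError(key)


# Used as a prototype and an object delegating to it.
point_proto = Env()
point_proto.define("x", 0)
p = Env(parent=point_proto)
p.define("y", 5)
p.set("x", 3)  # found in the parent, so the parent's binding changes
```

Read as environments, `point_proto` is an outer scope and `p` an inner one. Read as objects, `point_proto` is a prototype and `p` delegates to it. The code is the same either way.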

However, one problem stands in the way of unifying those two binding constructs completely: the slot-scoping problem. If I simply use symbols to designate slots (as environments do), there's nothing to prevent two authors from coming up with the same name (say 'x), and the names can clash, especially in the presence of multiple delegation/inheritance. Therefore I figure I must use slot objects rather than symbols to designate slots:

(within-environment user-1
   (define x (make-slot)))
(within-environment user-2
   (define x (make-slot))
   (make-object x 1 user-1:x 2))

and... now environments bind symbols to slot objects, and objects bind slot objects to values.
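
The two-level arrangement can be sketched in Python as follows. This is only an illustration under assumptions: the `Slot` class is hypothetical, and plain dictionaries stand in for both the environments and the object. The point it shows is that two slots with the same printed name stay distinct because they are compared by identity.

```python
class Slot:
    """A slot key compared by identity; the name is only for printing."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<slot {self.name}>"


# Environments bind symbols (here, strings) to slot objects.
user_1 = {"x": Slot("x")}
user_2 = {"x": Slot("x")}

# The object binds slot objects (not symbols) to values, so the two
# authors' "x" slots coexist without clashing.
obj = {user_2["x"]: 1, user_1["x"]: 2}

print(obj[user_2["x"]])
print(obj[user_1["x"]])
```

Since `Slot` does not override equality or hashing, Python falls back to object identity, which is what keeps the two keys apart.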

This all looks fine, except that it makes me itch that I need to have two almost identical constructs, and I can't unify them into one! Are those two binding constructs exactly what we need, no more, no less?

There is no necessary difference.

I don't think it's necessary to distinguish object scope from environment scope. They are literally the same thing.

To explain why, I'm going to point out that when you call a procedure in most languages, its environment is allocated on the call stack. Then when the procedure returns, its environment is garbage collected by popping it off the call stack.

You're doing the same thing in creating objects, except that you're creating methods that refer to the environment. As long as something refers to the environment, it mustn't be garbage collected. And that environment - the "stack frame" if you think of it that way - IS your object.

Your object is the object constructor's lexical environment. It's allocated when the constructor is called, like any lexical environment. The constructor in turn defines the methods in its own scope, meaning they have access to the constructor's environment because of lexical scope. The constructor then returns your 'object' - a bundle of references to these methods or to variables within the constructor's scope.

So when the constructor returns, its environment cannot be garbage collected. Where most lexical environments are garbage when the function that created them returns, the lexical environment of the procedure that created your object is still 'live' because of the references to things inside it which have been returned to the constructor's caller.
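
A small Python sketch of this idea, with a hypothetical `make_counter` constructor: the 'object' is nothing more than a bundle of closures, and the constructor's environment (the variable `count`) stays alive for as long as those closures are reachable.

```python
def make_counter(start):
    count = start  # lives in make_counter's lexical environment

    def increment():
        nonlocal count
        count += 1
        return count

    def value():
        return count

    # The "object": references to methods defined in the constructor's scope.
    return {"increment": increment, "value": value}


a = make_counter(0)
b = make_counter(10)
a["increment"]()
a["increment"]()
```

Each call to `make_counter` allocates a fresh environment, so `a` and `b` have independent state, exactly as two instances of a class would.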

Object-oriented programming is just a particular way of thinking about using a general garbage collection strategy for your environments instead of simply assuming they are garbage when the function returns and popping them off the stack.

I realize this isn't how objects are actually implemented in many languages that have them, but my "proof" that there's no distinction between object environments and ordinary lexical environments amounts to showing that you get the behavior you expect out of objects simply by having the 'constructor' environment avoid garbage collection for as long as there are still live references to it.

By Ray Dillinger at Sun, 2022-03-27 19:03

The question is about a potential subtle difference

I fully agree, and in fact almost exactly the same reasoning motivates me to unify objects and environments. However, my second paragraph states a pragmatic issue (name clashing/slot scoping), which I realize might be a manifestation of a deeper distinction.

Let me elaborate: name clashing is not a big deal for lexical environments, because their keys are expected to be used lexically locally. Therefore, each lexical environment can safely shadow names arbitrarily, and there is no need for coordinated naming between multiple lexical environments. The same name referring to different concepts in an environment chain does not create any trouble, so we can safely use globally interned symbols as keys.

On the other hand, objects are transported and used in different lexical environments (the technical notion of "lexical scope" is not important here; what is important is that such an object can be used in almost-unrelated code from different authors). For objects, name clashing can create confusion and unexpected behavior. If one author uses "left" to refer to a direction and another uses "left" to refer to political alignment, chaos awaits.

I did come up with a somewhat better solution recently. I can have a "unified environment" that can have arbitrary objects as keys. When used as a lexical environment, keys are by default globally interned symbols, but slot objects with object identity are used for lambda-with-accessors. For objects, slot objects are normally used. This can retain a unified interface to some extent.

Edit: I realize that the above problem might not exist under the scenario you are describing: some class-based system with single dispatch. In such cases, object scopes have the same lexical locality and thus can safely use the same naming mechanism as lexical environment.

On the other hand, I encountered my problem when creating something more dynamic: a prototype-based multiple-dispatch system. In those systems it seems necessary to manipulate objects in different lexical contexts, and object scopes are no longer static, local or even existent.

By Qiantan Hong at Tue, 2022-03-29 04:12

It may be too subtle for me....

I have responded to this before but I noticed today that my response isn't showing on this page, so I'll try again.

I'm not entirely sure I see the problem - so I probably don't understand you. You see a potential for name conflicts here but I don't think there's anything in multiple-dispatch or prototype-based typing that actually makes such a conflict unnecessarily likely.

So let's be sure we're talking about the same thing. When I read "prototype-based typing" I think of datatypes defined as interfaces - a set of accessor and function prototypes that fully define it. And then anything that fulfills every part of that interface can be identified as a member of that type, regardless of whether any of the implementations of those functions has anything to do with the implementations that fulfill the interface for any other member of the type. And a particular object, in a prototype-based system, may be a member of several types simultaneously.

When I read "multiple dispatch" I think of functions (including methods) that can have the same name as long as at least one of their argument types or return types is different, and the compiler or interpreter works out which method to call based on the argument and/or continuation types.
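
As a rough illustration of dispatch on argument types, here is a minimal Python sketch. The `defmethod` and `call` helpers are hypothetical and deliberately simplistic: they pick the first registered method whose parameter types match, they do not rank candidates by specificity, and they ignore return or continuation types entirely.

```python
_registry = {}


def defmethod(name, *types):
    """Register a function under `name` for the given argument types."""
    def register(fn):
        _registry.setdefault(name, []).append((types, fn))
        return fn
    return register


def call(name, *args):
    """Call the first method named `name` whose types match `args`."""
    for types, fn in _registry.get(name, []):
        if len(types) == len(args) and all(
            isinstance(a, t) for a, t in zip(args, types)
        ):
            return fn(*args)
    raise TypeError(f"no method {name!r} for {[type(a).__name__ for a in args]}")


@defmethod("combine", int, int)
def combine_ints(a, b):
    return a + b


@defmethod("combine", str, int)
def combine_str_int(s, n):
    return s * n
```

Both methods share the name `combine`, and the argument types alone decide which one runs.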

I've implemented both of these things, with slightly different levels of satisfaction with the result but no name-clash difficulties that I ever identified.

There is the case of an object declared to be both of two different types when the types had one or more type-indistinguishable methods in their definitions. For example if types 'foo' and 'bar' both have a function named 'frob' that takes two integer arguments and returns two integer results, multiple-dispatch can't sort that out. But there's no name conflict. When something frobs that object, that method will be called regardless of whether something intends to be working on a 'foo' or a 'bar.'

This breaks, however, when either of the two methods indistinguishable by call type is non-pure. If the side effects are not identical, then the methods are of (likely different) effect types, even if they are identical call types. Because I never even attempted to distinguish whether side effect types were identical, I treated any side-effecting methods indistinguishable by call type as conflicting declarations. In that case the compiler would stop you the instant you declare some object to be both of those two types.

Does that address the kind of name conflict you're thinking of, or is it something more subtle that I've missed so far?

By Ray Dillinger at Wed, 2022-03-30 22:03

Self 4.0 ...

... uses the scheme you describe: it is prototype-based, and when you invoke a procedure its stack-frame object is cloned. Multiple parents are allowed, but conflicts are not: if the parent objects have slots of the same name, an error is signaled. That way you don't have to remember which of the bajillions of multiple-inheritance linearization algorithms (Common Lisp, Dylan, C3/Python, etc. etc.) is in effect, and exactly what will happen.
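
The "conflicts are errors" policy can be sketched in Python like this. It is not Self's actual lookup algorithm, only an illustration under assumptions: the `Obj` class and `AmbiguousSlot` exception are hypothetical, and the error is raised at lookup time when two different parents supply the same slot name.

```python
class AmbiguousSlot(Exception):
    """Raised when more than one parent supplies the requested slot."""


class Obj:
    def __init__(self, slots=None, parents=()):
        self.slots = dict(slots or {})
        self.parents = tuple(parents)

    def _holders(self, name):
        # Map id -> object for every object that would supply `name`.
        # An object's own slot shadows anything in its parents.
        if name in self.slots:
            return {id(self): self}
        found = {}
        for parent in self.parents:
            found.update(parent._holders(name))
        return found

    def lookup(self, name):
        holders = self._holders(name)
        if not holders:
            raise KeyError(name)
        if len(holders) > 1:
            raise AmbiguousSlot(name)
        (holder,) = holders.values()
        return holder.slots[name]


direction = Obj({"left": "west"})
politics = Obj({"left": "progressive"})
shape = Obj({"area": 0})

ok = Obj(parents=[shape, direction])       # no shared slot names
clash = Obj(parents=[direction, politics])  # both define "left"
```

Here `ok.lookup("left")` succeeds while `clash.lookup("left")` raises `AmbiguousSlot`. Because holders are keyed by identity, reaching the same parent along two paths (a diamond) is not treated as a conflict in this sketch.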

By John Cowan at Wed, 2022-08-31 17:07

Lambda the Ultimate
