The this argument in dynamic programming languages with OO and first-class functions

I posted this as a comment on Hacker News yesterday, but didn't receive any responses. I hope I might get some comments here, in the more PL-oriented community.

---------------

The handling of the this argument is of crucial importance in the design of programming languages with first-class functions that feature some kind of objects, i.e. data with attached behavior (call them functions or methods). (I use call as in f(x) and invoke as in o.m())

There are basically 3 possibilities that I see:

  1. different invocation syntax, with explicit this parameter

    Lua is an example of a language that uses this approach:

      table.method1(arg1, arg2, arg3)
      table:method2(arg2, arg3, arg4)
    

    Here, method1 is called with 3 parameters (without the this parameter), but method2 is called with 4 (the first parameter is the this object, table).

    This is undoubtedly the approach with the most efficient execution, but it places a huge burden on the programmer, who has to distinguish methods from ordinary functions both at the call site and at the definition (if you attach a normal function to a table and invoke it as a method, you have a problem if the function doesn't expect its first parameter to be this). Also, this approach is the opposite of the one found in traditional OOP languages such as Java and C++.
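The burden described for approach 1 can be sketched outside Lua. The following is a minimal Python emulation, not Lua itself: the helpers `call` and `invoke` and the `account` table are hypothetical names introduced only for illustration, standing in for Lua's dot and colon forms.

```python
# Hypothetical helpers emulating Lua's two call forms on a dict-based "table".
def call(table, name, *args):
    """Like table.name(args): no receiver is passed."""
    return table[name](*args)


def invoke(table, name, *args):
    """Like table:name(args): the table itself is passed as first argument."""
    return table[name](table, *args)


account = {
    "balance": 10,
    "add": lambda a, b: a + b,                                 # plain function
    "deposit": lambda self, amount: self["balance"] + amount,  # method
}

plain_result = call(account, "add", 1, 2)
method_result = invoke(account, "deposit", 5)

# Using the wrong form fails: the plain function gets an unexpected receiver.
try:
    invoke(account, "add", 1, 2)
    wrong_form_failed = False
except TypeError:
    wrong_form_failed = True
```

The caller has to know which of the two forms each entry expects, and nothing in the table itself records that.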

  2. automatic method binding

    Python is an example of a language that uses this approach:

      f = obj.method
      f(arg1, arg2)
    

    In this case, the function f is called with the first parameter being self, the Pythonic version of this. In Python, the self parameter (Pythonic this) is declared explicitly at method definition, but I believe that it could also work if it were implicit.

    This approach is perhaps the most sensible approach from the programmer's point of view, where methods "just work" - you can call them using the normal dot syntax, and you don't need to worry when using methods as first-class functions - they will remember the object they were attached to.

    However, a naive implementation of this technique is quite slow, as the function must be bound to the this object every time it is accessed. It is possible to invoke (access and call at the same time) a method without this overhead (e.g. use some object's internal dictionary to check whether it has a method with the appropriate name, and call it, passing this as the first parameter), but not when accessing attributes - one must check whether the attribute is a method, and bind it to this if it is. However, this argument is irrelevant in a language that supports arbitrary getters/setters (which seems to be the direction that new OO languages are headed in).
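As a small illustration of automatic binding and its per-access cost, here is a Python 3 sketch. The class `Greeter` is a made-up example; the observation that each attribute access yields a fresh bound-method object is how CPython behaves.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self, greeting, punctuation):
        return f"{greeting}, {self.name}{punctuation}"


obj = Greeter("world")

f = obj.greet                 # accessing the attribute binds obj as self
result = f("hello", "!")      # called like an ordinary two-argument function

remembers_receiver = f.__self__ is obj
# Each access builds a new bound-method object (the overhead discussed above),
# although the two objects compare equal.
fresh_each_access = obj.greet is not obj.greet
equal_each_access = obj.greet == obj.greet
```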

    Also, it complicates matters with ad-hoc objects:

      o = {}
      f = (arg1, arg2, arg3) -> arg1 + arg2 + arg3
      o.m = f
    

    When we invoke the method m of the object o, do we pass this as the first parameter (when the function f might not expect it) or not (losing the OO aspect)? Python conveniently avoids this matter by not supporting ad-hoc objects out-of-the-box.
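One way to approximate the ad-hoc object question in Python 3 is with `types.SimpleNamespace` from the standard library. This is only a sketch of how Python treats functions stored on an instance rather than on a class; the names `o`, `f` and `describe` are illustrative.

```python
from types import MethodType, SimpleNamespace


def f(arg1, arg2, arg3):
    return arg1 + arg2 + arg3


o = SimpleNamespace()
o.m = f

# A function stored on the instance is NOT bound: no receiver is passed.
unbound_result = o.m(1, 2, 3)


def describe(self, suffix):
    return f"x={self.x}{suffix}"


o.x = 7
# Binding has to be requested explicitly for instance-level functions.
o.describe = MethodType(describe, o)
bound_result = o.describe("!")
```

So in this reading Python answers "no, do not pass this" for instance attributes, and leaves binding to the programmer.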

  3. implicit this

    JavaScript is an example of a language that uses this approach:

      f = (arg1, arg2) -> print this, arg1 + arg2
      f(1, 2)        // prints "undefined, 3"
      o.m = f
      o.m(1, 2)      // prints "o, 3"
    

    Here, this is an implicit, undeclared parameter of every function, and is passed at every call/invocation (if a function is called like in the second line, this is set to (i) the global object (window) in the browser, or to (ii) undefined in strict mode). Here, there is some overhead at every unbound function call, as an extra parameter (undefined) is passed, and it is not clear how this overhead can be avoided (except by avoiding plain function calls, and sticking to OOP). Ad-hoc object creation is simple, as functions/methods have a simple way of knowing whether they were called (without a this parameter) or invoked (with a this parameter). However, care must be taken when using methods as first class values:

      f = o.m             // f will have the this parameter undefined
      g = o.m.bind(o)     // g will have o passed as the this parameter
    

    A syntax extension would be nice, e.g.

      g = bind o.m
    

    but it might be ambiguous. In my experience, this situation is not encountered very frequently, so the trade-off is acceptable.
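The implicit-this convention can be modelled explicitly. The following Python sketch is an emulation, not JavaScript semantics in full: every function takes a `this` slot as its first parameter, a plain call fills it with `None` (standing in for undefined), and an invocation fills it with the receiver. `call`, `Obj` and `invoke` are hypothetical names.

```python
from functools import partial


def call(fn, *args):
    """Plain call fn(args): the receiver slot is filled with None."""
    return fn(None, *args)


class Obj:
    """Hypothetical ad-hoc object whose slots hold unbound functions."""

    def __init__(self, name):
        self.name = name
        self.slots = {}

    def invoke(self, slot, *args):
        """Invocation o.m(args): the receiver is passed as `this`."""
        return self.slots[slot](self, *args)


def f(this, arg1, arg2):
    receiver = "undefined" if this is None else this.name
    return f"{receiver}, {arg1 + arg2}"


o = Obj("o")
o.slots["m"] = f

called = call(f, 1, 2)
invoked = o.invoke("m", 1, 2)

extracted = o.slots["m"]          # like f = o.m: the receiver is forgotten
lost = call(extracted, 1, 2)
g = partial(o.slots["m"], o)      # like g = o.m.bind(o)
kept = g(1, 2)
```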

    ---------------

    I see all these possibilities lacking, but tend to consider the third solution as the most favorable, mainly because it supports elegant and unambiguous ad-hoc object construction.

    Maybe we could combine solutions 2 and 3, e.g. by using getters that set this on method objects automagically, but to keep ad-hoc objects simple, an implicit this seems to be a must, which means we cannot avoid the plain function call overhead...

    I'm sure I'm missing a very simple, straightforward alternative, and I would be extremely grateful for any comments or constructive criticism.

    Edit: formatting.

Translate the explicit into the implicit

I think when you want to compare an explicit and an implicit method, a good way would be to define a rigorous translation of the more implicit into the more explicit technique. In your case, it would be a translation from solutions 2 and 3 to solution 1 (or possibly a different explicit presentation). It's very useful when your description of the implicit behavior is otherwise informal and prone to confusion ("implicit, undeclared parameter"?), and it's even better if your translation exhibits the performance trade-off of each solution.

On a more personal note, I think that programmers should make a distinction between functions and methods: if you consider something a method (whether it was initially defined as a method, or was independently defined and attached later on in a dynamic language), it *should* have the state of the object in its execution context (by way of an explicit 'this' parameter or some more implicit mechanism). If it was not designed with this object state in mind, it does not make sense to transparently consider it a method; but you could define a wrapper that takes a "non-self-conscious" function (sorry for the bad pun) and returns a "self-conscious" function that just ignores its object context.

So I would choose an approach similar to point 2, accepting `o.m = f` only if `f` does indeed take that context parameter. The solution of point 3 looks like an "everyone loses" proposition to me: in the common case of a "pure function", you have a useless undefined parameter (which is quite bad from a semantics point of view), and if you want to transform it into a method you have to use an additional explicit `bind` operation in any case (while with the point 2 approach you need to transform only the functions that are not directly compatible with the "method call convention").

More generally I would avoid anything like "oh it does not really make sense but let's say that it actually means this" when you have the opportunity to be explicit (possibly using some nice syntactic sugar or designing some inference later). But that depends on your target audience and expected usage: large-scale programs, scripts, something in between?

If you want an object or class to have both "methods" and "static functions", then you get something which is similar to your point 1. This looks very reasonable to me: `table.foo(..)` should be thought of as a method call on instance `table`, and `table:foo(..)` as a static function/method call on *class* `table`. In C++ you also have a syntactic distinction at call sites between static and non-static methods.

Finally, I think the distinction of point 1 (: or .) can also answer your question about point 2: you could write `o.m = f` to bind a self-conscious function `f`, and `o:m = f` to bind a non-self-conscious function. You can also explain that by saying that method calls are just sugared function calls, i.e. `o.m = f` translates to `o:m = fun(*params) -> f(o, *params)`, or that static methods are actually usual methods wrapped in a context-ignoring transform, i.e. `o:m = f` translates into `o.m = fun(o, *params) -> f(*params)`. I would make whatever choice is more implementation-sensible and efficient given the expected common use among the target users of your programming language. The first choice seems more minimalistic and thus better to me, but you really can't tell, e.g. if you target a virtual machine that has fast instructions for this-passing calls.
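The two translations in this comment can be written down as ordinary higher-order functions. This is a minimal Python sketch; `ignore_context` and `bind_context` are names invented here for the "context-ignoring transform" and the "sugared function call" respectively.

```python
def ignore_context(f):
    """Turn a context-free function into a method that drops its receiver."""
    def method(o, *params):
        return f(*params)
    return method


def bind_context(o, m):
    """Turn a method into a plain function with the receiver fixed to o."""
    def bound(*params):
        return m(o, *params)
    return bound


def add(a, b):                 # "non-self-conscious": knows nothing about o
    return a + b


def scaled(o, a):              # "self-conscious": expects the receiver first
    return o["scale"] * a


o = {"scale": 3}
o["add"] = ignore_context(add)     # now callable under the method convention
o["scaled"] = scaled

via_method_convention = o["add"](o, 1, 2)
via_bound_function = bind_context(o, o["scaled"])(5)
```

Either direction costs one extra closure, which is where the performance trade-off mentioned above would show up.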

> It's very useful when your

> It's very useful when your description of the implicit behavior is otherwise informal and prone to confusion

I assumed basic familiarity with Lua, Python and JavaScript. My bad, I probably should clarify the OP so that everyone would understand exactly what each approach is.

> But that depends on your target audience and expected usage: large-scale programs, scripts, something in between?

Yes, I should have emphasized my goal/the expected usage of the language: a scripting/rapid prototyping language that tries to stay out of the programmer's way as much as possible. If you want to write large-scale applications, you can always enhance your dynamic program with static types/contracts/optimized classes/etc later.

> If you want an object or class to have both "methods" and "static functions"

Aren't static functions simply the methods of the class? If you have first-class classes (which seems to be the case in most class-based dynamic OO languages), then a class is just an object, and can thus have methods.

A more interesting case are first-class modules. Then the dot-notation is used simply to denote namespaces, and methods are really just functions that need no this context.

> I think the distinction of point 1 (: or .)

Actually, I'm not considering option 1 at all; I only included it to cover all existing approaches (that I know of). Option 1 seems terrible from many perspectives. The most obvious are the usability issues - the user has to know what kind of "callable object" f is to be able to use it. Also, I consider it an insult to my intelligence and humanity whenever I have to tell the computer something that it should know / could figure out by itself. In most cases, only one of the possibilities is correct, and using the other one would result in wrong results or runtime errors.

Also, it is wrong from an engineering point of view. As I mentioned before, one has to know whether f is a function or a method, and this information might be far away from where you use f. Furthermore, changing the nature of f would mean you have to edit a lot of code (that calls f) in a lot of different places.

The aspect at which point 1 excels is runtime efficiency. However, chasing runtime efficiency at the expense of programmer productivity or, god forbid, correctness, is a path that leads us back to C or even assembly. And LuaJIT, Lua's excellent tracing JIT implementation, has shown us that efficiency can be achieved even in a very dynamic language.

All in all, thank you for your response. You raise interesting points, and while I didn't reply to most of them, I took them into account. I feel I'm getting closer to a solution that would be acceptable to me and my goals.

Generic procedures

I prefer the following way of doing it:

For every message "X" understood by an object (in the Smalltalk sense), there is a procedure also called X, that sends that message. Schematically:

(define X
  (lambda (object . args)
    (send-message object "X" args)))

This means you can use such generic procedures as ordinary procedures (because they are ordinary procedures):

;; Map X across list of two newly created objects
(map X (list (make-object) (make-object)))

Note that there is no "this" - I find "this" to be a Rube Goldberg device.

P.S. This approach has long been used in Lisp and related languages.
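For readers more at home in Python than in Scheme, the generic-procedure idea might be sketched as follows. `generic`, `Dog`, `Cat` and `speak` are illustrative names; `getattr` plays the role of `send-message`.

```python
def generic(message):
    """Build an ordinary procedure that sends `message` to its first argument."""
    def send(obj, *args):
        return getattr(obj, message)(*args)
    send.__name__ = message
    return send


class Dog:
    def speak(self):
        return "woof"


class Cat:
    def speak(self):
        return "meow"


# The generic procedure is a plain function, so it can be passed to map.
speak = generic("speak")
sounds = list(map(speak, [Dog(), Cat()]))
```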

Bi-duality

Note that this "reversal of point of view" is much older than Smalltalk, as it is for example the idea underlying the well-known equivalence of a finite-dimensional vector space and its bidual in linear algebra: a vector (data, here "message") can also be seen as the second-order function that takes a linear function (here "object") and passes it the data.

More generally, in all systems where you have a notion of "data" and a notion of "computation", and are reasonably observational, you can transition freely between "the data" and "the computation of passing this data to a computation". Actually this could be seen as a definition of being "reasonably observational". This is probably, as everything else in the world, related in some way to the Yoneda Lemma.

That is an idea that I

That is an idea that I considered as well, but I dropped it because I believe it brings more problems than advantages.

The main problem with this approach is that the method name has to be available in the current namespace. This could easily result in namespace pollution, or alternatively every name would have to be defined in advance. Also, I'm not sure how this would combine with ad-hoc objects that are created dynamically.

Maybe a better solution is to use some special syntax sugar for this case, such as _.x.

map(_.tostring, [obj1, obj2])
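A rough Python approximation of the `_.x` sugar, assuming a hypothetical `Placeholder` class, can be built on `operator.methodcaller` from the standard library. No method name has to exist in the current namespace; it is looked up on each object when the resulting function is called.

```python
from operator import methodcaller


class Placeholder:
    """Hypothetical stand-in for the proposed `_` syntax."""

    def __getattr__(self, name):
        # _.name becomes a function that calls .name() on its argument.
        return methodcaller(name)


_ = Placeholder()

upper_cased = list(map(_.upper, ["ab", "cd"]))
stripped = list(map(_.strip, [" x ", "y "]))
```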

Now, I know that I (and thus this reply) am influenced by my programming experience, which has mostly been in Python/JavaScript. I should study Lisp and its descendants more; maybe I'd get some ideas on how to unify both approaches.

Actually, the fact that

Actually, the fact that procedures live in a namespace is an advantage, instead of looking up methods dynamically by name at runtime. Sometimes you may need that flexibility, but for nearly all code you don't. In languages like Ruby it's a problem that methods can't live in namespaces. There was some talk about providing a mechanism to make methods live in namespaces to allow different parties to provide different methods with the same name without fear of names clashing. Namespaces already do this for all values (which includes classes). That you need a different system for methods is telling.

Sure, you do lose the ability to create new methods with new names at runtime, but this is good riddance. Meta programming with macros works much better than meta programming by monkey patching.

Once you use function call syntax for methods, you realize that the first argument isn't special, and that you could dispatch on any of the arguments. This leads to multiple dispatch or even to predicate dispatch.
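To make the step from function-call syntax to multiple dispatch concrete, here is a deliberately small Python sketch. `MultiMethod` is a made-up helper that matches exact argument types only (no inheritance, no predicates), so it illustrates the idea rather than a full dispatch system.

```python
class MultiMethod:
    """Toy dispatcher keyed on the exact types of all arguments."""

    def __init__(self, name):
        self.name = name
        self.implementations = {}

    def register(self, *types):
        def decorator(fn):
            self.implementations[types] = fn
            return fn
        return decorator

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        if key not in self.implementations:
            raise TypeError(f"no implementation of {self.name} for {key}")
        return self.implementations[key](*args)


class Asteroid:
    pass


class Ship:
    pass


collide = MultiMethod("collide")


@collide.register(Asteroid, Ship)
def asteroid_hits_ship(a, s):
    return "ship damaged"


@collide.register(Asteroid, Asteroid)
def asteroid_hits_asteroid(a, b):
    return "asteroids shatter"


first = collide(Asteroid(), Ship())
second = collide(Asteroid(), Asteroid())
```

No argument position is privileged here, which is the point made above about the first argument not being special.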

It is entirely possible that I am completely wrong about this :) Could you give a concrete scenario that you have in mind where this brings more problems than advantages?

Actually, the fact that

> Actually, the fact that procedures live in a namespace is an advantage

Yes. Although the interactions of module systems and objects can be confusing. (E.g. if you have a module system that allows renaming of imported identifiers, then you can have generic procedures with names different from the messages they send... Oh, and whether message names should be simple strings or (module, message) pairs themselves is another complicated topic. Previous discussions: Should method names be (module-) scoped?, Namespaces for methods?, How important is language support for namespace management?.)

> Sure, you do lose the ability to create new methods with new names at runtime

Well, you can always use SEND-MESSAGE for those. (Or if you have EVAL, define new generic functions at runtime.)

U Can't Touch `this`

I'm also among those who think the best answer for 'this' is to get rid of it - and model self-reference explicitly when we need it.

If 'this' is an explicit argument to each 'method' that needs it, we have a lot more freedom to model open recursion, develop alternative object models, control reentrant computations, and prevent looping constructs in cases we don't want them.

We can separate the concepts of corecursion and fixpoint.

Similarly, I would say that objects should generally not construct their own dependencies. I.e. if you need an Integer, ask for one in your constructor. This is a valuable separation of concerns, e.g. with respect to persistence, testing, debugging.

Encapsulation can be separated from the 'object' concept. When I was developing a language based on actors model, I used a separate 'configuration' concept, which also handled the fixpoint if desired. (There are also a lot of optimization advantages to constructing objects in one big declarative configuration.)
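One possible reading of "this as an explicit argument", sketched in Python 3 under my own assumptions: methods are plain functions in a record, self is passed by hand, and a separate step (the hypothetical `close`) fixes self once and for all. The records `base` and `derived` are illustrative, and this is only a loose illustration of open recursion versus taking the fixpoint.

```python
# Methods take self explicitly, so a call through self stays open to override.
base = {
    "value": lambda self: 1,
    "double": lambda self: 2 * self["value"](self),
}

# Overriding "value" changes what "double" computes: open recursion.
derived = {**base, "value": lambda self: 21}

base_double = base["double"](base)
derived_double = derived["double"](derived)


def close(record):
    """Fix self to `record`, yielding functions that no longer take it."""
    def bind(fn):
        return lambda *args: fn(record, *args)
    return {name: bind(fn) for name, fn in record.items()}


sealed = close(derived)
sealed_double = sealed["double"]()
```

Because self is just an argument, the caller decides when (and whether) the knot is tied.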

open recursion, develop

> open recursion, develop alternative object models, control reentrant computations, and prevent looping constructs in cases we don't want them.

> We can separate the concepts of corecursion and fixpoint.

Unfortunately, I'm not familiar with these concepts and how the presence of implicit this affects them. Could you please provide some references or examples?

Python is weird


class test(object):
    foo = 2
    bar = lambda x:x

o = test()
print o.foo # prints '2'
print o.bar(2) # error... got 2 arguments instead of 1

If objects can house regular values, they should also be able to house function values without assuming they are methods.
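The same behaviour can be reproduced in Python 3, where the difference between storing a function on the class and on the instance is easy to observe. The names `baz`, `raised` and so on are illustrative.

```python
class Test:
    foo = 2
    bar = lambda x: x      # stored on the class: instance lookup binds it


o = Test()
foo_value = o.foo

try:
    o.bar(2)               # bar receives (o, 2) but accepts one argument
    raised = False
except TypeError:
    raised = True

o.baz = lambda x: x        # stored on the instance: no binding happens
baz_value = o.baz(2)

class_level = Test.bar(2)  # in Python 3, Test.bar is the plain function
```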

This illustrates why Lua has

This illustrates why Lua has the dot and the colon (as in the example in the OP). The Lua user has to explicitly distinguish between functions stored in the instance and methods.

I would argue that this is not the programmer's task, it's the compiler's task (or VM for dynamic languages). This argues for a more implicit method of making the instance values available in the function's environment so that the compiler can make that decision.

Any explicit method means that the programmer has to know the distinction and therefore can get it wrong.

Other options

Lua's scheme is sort of reasonable but I don't like it either, and Python's scheme is too much of a hack for my liking. If object methods are going to seem closed over the object, then they should actually be closed over the object (hooking up self in methods could have been something the class does when an object is created).

Another design similar to the one I'm using is to have dot be a special reverse application that uses the type to resolve overload selection. So instead of obj.foo being a lookup inside obj, it's just an application of foo to obj. If there are multiple foo's in scope the type of obj is used to disambiguate them. Or you could just write foo(obj) and if needed manually qualify foo.

this is not the programmer's

> this is not the programmer's task, it's the compiler's task (or VM for dynamic languages). This argues for a more implicit method of making the instance values available in the function's environment so that the compiler can make that decision.

> Any explicit method means that the programmer has to know the distinction and therefore can get it wrong.

I couldn't agree more.

Maybe function objects could have some property that tells whether the function needs a context (this) or not. This could easily be detected at function definition (if this is a keyword that cannot be used for anything else). Except in presence of eval and similar constructs.

Then, when an object is created or its attribute is assigned a function value, it could detect what kind of function we're using, and how it should be treated.
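A rough Python 3 sketch of detection at assignment time follows. Python has no implicit this keyword, so as a stand-in assumption the sketch treats a first parameter literally named `this` as the marker that a function needs a context; `needs_context` and `Obj` are hypothetical names.

```python
import inspect
from types import FunctionType, MethodType


def needs_context(fn):
    """Assumed convention: a first parameter named `this` marks a method."""
    params = list(inspect.signature(fn).parameters)
    return bool(params) and params[0] == "this"


class Obj:
    """Ad-hoc object that binds context-needing functions on assignment."""

    def __setattr__(self, name, value):
        if isinstance(value, FunctionType) and needs_context(value):
            value = MethodType(value, self)
        object.__setattr__(self, name, value)


obj = Obj()
obj.x = 10
obj.add = lambda a, b: a + b            # left alone: no `this` parameter
obj.scaled = lambda this, a: this.x * a  # bound to obj when assigned

plain = obj.add(1, 2)
method = obj.scaled(4)
```

As noted above, this kind of detection would not see through eval-like constructs.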

Python might be weird at first...

...but it's the default type metaclass that modifies bar into a method. You can work around this problem by using a decorator:

class Test(object):
    foo = 2
    bar = staticmethod(lambda x:x)

Or if you want to go fancy:

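def DontTouchMyLambdas(name, bases, dict):
    # Called in place of type(name, bases, dict) once the class body has run.
    def isLambda(v):
        aLambda = lambda: None
        return isinstance(v, type(aLambda)) and v.__name__ == aLambda.__name__
    
    # Wrap every lambda so that looking it up on an instance does not bind it.
    for k in dict:
        if isLambda(dict[k]):
            dict[k] = staticmethod(dict[k])
    
    return type(name, bases, dict)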

class Test:
    __metaclass__ = DontTouchMyLambdas
    foo = 2
    bar = lambda x: x
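The snippets above use Python 2 syntax (`__metaclass__`). A comparable sketch for Python 3, written here as a `type` subclass and using the `metaclass=` keyword, might look like this; the `baz` method is added only to show that ordinary methods are untouched.

```python
from types import FunctionType


class DontTouchMyLambdas(type):
    """Metaclass that wraps lambdas in staticmethod, leaving defs alone."""

    def __new__(mcls, name, bases, namespace):
        for key, value in list(namespace.items()):
            if isinstance(value, FunctionType) and value.__name__ == "<lambda>":
                namespace[key] = staticmethod(value)
        return super().__new__(mcls, name, bases, namespace)


class Test(metaclass=DontTouchMyLambdas):
    foo = 2
    bar = lambda x: x

    def baz(self):
        return self.foo


o = Test()
bar_result = o.bar(2)     # no receiver is passed to the lambda
baz_result = o.baz()      # regular methods still receive self
```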

Still kinda weird

Thanks for the explanation. I was thinking the magic was on the lookup side, which would have been much worse. It still seems a little weird to me that Python chooses to interpret every function that occurs in a class as a method. It seems to me that rather than having a 'staticmethod' type that doesn't get hooked up, it would have been cleaner to have a 'method' type that does.

I was thinking the magic was

> I was thinking the magic was on the lookup side

Until now, I've never considered that anything else could be possible. Thanks, you gave me a great idea.

> It seems to me that rather than having a 'staticmethod' type that doesn't get hooked up, it would have been cleaner to have a 'method' type that does.

Why not have the compiler figure that out by itself? If it requires the use of the name this for the context, then figuring out whether a function needs the context or not is a simple source code analysis.

'this' is an interesting

'this' is an interesting concept and although I had it in the initial version of my language Babel-17, I removed it from it for a while, only to put it in later again. I think now 'this' just works in Babel-17 as you would expect it to work. It basically uses approach 2 (automatic method binding), but there is no problem with ad-hoc objects and so on:

val o = {}
val f = (arg1, arg2, arg3) => arg1 + arg2 + arg3
o.m = f

just assigns f to o.m, so the following holds:

o.m (3, 4, 5) == 12

'this' doesn't play a role here, because it can only be used in an object definition, like:

val o = object
def m (a, b, c) = this + a + b + c
def plus_ x = x
end

The code

val o = {}
o.m = this

would be rejected as illegal.

This is certainly a step in

This is certainly a step in the right direction. The only thing left is to allow the user to define methods outside of class/object definitions and add them to objects later:

f = (x) -> print this, x

obj = {}
obj.m = f

obj.m(1)              // prints "obj, 1"

Edit:

Of course, my approach complicates things a little bit. One source of a lot of bugs in JavaScript is the following situation:

o1 = {
    x = 1
    f = () ->
      g = () -> print this.x
      o2 = {x = 2}
      o2.m = g
  }

If this is bound at function creation time, then o2.m() == 1. However, this is inconsistent with the above, so maybe this should be bound at method invocation time, which would result in o2.m() == 2.

If using approach (2) (this is bound at function invocation time), then to access the "previous" this, we could use one of the following approaches:

g = (this = this2) -> print this.x, this2.x    // prints 1, 2, as this2 is the self-reference
                                               // now, and this is resolved lexically

or

this2 = this
g = () -> print this2.x, this.x                // prints 1, 2

This is indeed a very complicated problem (pun intended).
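The two binding times can be contrasted in Python 3, where both have to be spelled out. This is a sketch under the assumption that "bound at creation" corresponds to a closure over the enclosing object and "bound at invocation" to an explicit receiver argument; `make_o1`, `creation_time` and `invocation_time` are illustrative names.

```python
from types import SimpleNamespace


def make_o1():
    o1 = SimpleNamespace(x=1)

    def f():
        # Receiver captured lexically when g is created: always o1.
        creation_time = lambda: o1.x
        # Receiver supplied by whoever invokes g.
        invocation_time = lambda this: this.x

        o2 = SimpleNamespace(x=2)
        return creation_time(), invocation_time(o2)

    o1.f = f
    return o1


results = make_o1().f()
```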

This is fundamentally broken

The `this' reference as provided in most OO languages, dynamic or otherwise, is simply broken, starting with the fact that it is a single keyword, which is not that great when you can nest objects/classes/methods. In JavaScript, for example, that is a frequent source of (sometimes subtle and hard to spot) errors. Only a few languages (e.g. Scala) do the right thing and allow binding a self variable per object instead, with proper lexical scoping. (However, Scala still makes an annoying distinction between functions and methods.)

Of course, that prevents not just accidental, but also intentional self-capture, i.e. invoking a method on a different object than it was defined with. But in my experience the latter is almost always a horribly bogus thing to do anyway, and interferes badly with encapsulation. Better make the object argument explicit if you need it to vary.

Edit: for clarity, the only sane semantics I see is the following:

let o1 = {(self) x = 1; m = () -> self.x}
o1.m() == 1
let m1 = o1.m; m1() == 1

let o2 = {(self) x = 2; m = o1.m}
o2.m() == 1

let o3 = {(self) x = 3; i = {(self') x = 4; f = () -> self.x; g = () -> self'.x}}
o3.i.f() == 3
o3.i.g() == 4

Note that all methods are ordinary functions here, and capture `self' (which is an ordinary variable) simply through their lexical context. It's the good old objects-as-records-of-closures model. You can still program "first-class methods" easily:

let mm = (self) -> () -> self.x * 10
o1.m := mm(o1); o1.m() == 10
o2.m := mm(o2); o2.m() == 20
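The objects-as-records-of-closures model translates fairly directly to Python 3. In this sketch `SimpleNamespace` stands in for the record, and the factory functions `make_o1` and `make_o3` are hypothetical; `self` and `inner` are ordinary local variables captured lexically.

```python
from types import SimpleNamespace


def make_o1():
    self = SimpleNamespace(x=1)
    self.m = lambda: self.x          # closes over this particular record
    return self


def make_o3():
    self = SimpleNamespace(x=3)
    inner = SimpleNamespace(x=4)
    inner.f = lambda: self.x         # refers to the outer object
    inner.g = lambda: inner.x        # refers to the inner object
    self.i = inner
    return self


o1 = make_o1()
m1 = o1.m
o2 = SimpleNamespace(x=2, m=o1.m)    # still closed over o1
o3 = make_o3()

before = (o1.m(), m1(), o2.m(), o3.i.f(), o3.i.g())


def mm(self):
    """A 'first-class method': takes the receiver, returns the closure."""
    return lambda: self.x * 10


o1.m = mm(o1)
o2.m = mm(o2)
after = (o1.m(), o2.m())
```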

You should like Babel-17

You should like Babel-17 then, it works exactly as you described it:

val o1 = object def x = 0; def m = this.x end
#assert o1.m == 0
o1.x = 1
#assert o1.m == 1

val m1 = o1.m
#assert m1 == 1

val o2 = object def x = 2; def m = o1.m end
#assert o2.m == 1

val o3 = object
def self = this
def x = 3
def i = object def x = 4; def f = self.x; def g = this.x end
end

#assert o3.i.f == 3
#assert o3.i.g == 4

I think it is not a problem that 'this' is bound to the innermost context; I find it rather convenient. If you need to address 'this' from an outer context, just bind 'this' in this outer context to another name ('self' in the above).

That's great, the only thing

That's great, the only thing I'm missing here is implicitness. Now, of course implicitness could be a bad thing, but in my limited experience a little bit of implicitness in just the right places can be a good thing, especially in scripting/rapid prototyping languages, which are kind-of my focus.

So, I would rewrite your examples like this:

o1 = {x = 1, m = () -> this.x}        // this is the default self-reference
o1.m() == 1
m1 = o1.m; m1() == 1

o2 = {x = 2, m = o1.m}                // equivalent to {x = 2, m = m1}
o2.m() == 1

o3 = {
    x = 3
    i = {
        this = this2                 // just a convenient notation for rebinding
                                     // the self-reference (and unbinds this
                                     // in this lexical context)
        x = 4
        f = () -> this.x             // refers to o3
        g = () -> this2.x
      }
  }
o3.i.f() == 3
o3.i.g() == 4

Do you think this is too confusing, or do you "get it" immediately and could get used to programming like this?

Error-prone

Maybe not "confusing", but in my experience, implicit `this' is rather error-prone in languages where you frequently nest constructs that bind it. See JavaScript.

Also, to be honest, I find your self binding notation far from optimal. It looks like a property definition, which would suggest that (1) it defines `this', not `this2', and (2) it defines it as a property of i, not a lexically scoped variable in the object body.

Well, reconsidering it, I

Well, reconsidering it, I see that it really was an awful idea. It certainly breaks the useful principle of least surprise... I have to find another way.

As far as the implicit/explicit goes... What I'm trying to achieve is to make programming easier - faster, less tedious, less boring, less repetitive, more natural. I really hate it when I have to do something as a programmer/user that could probably be done by the compiler, or possibly by the library designer. That is why I dislike Lua's explicit invocation syntax, and Python's obligatory explicit declaration of the self parameter. Now, I do realize that there are situations where these more explicit approaches turn out to be very useful and much better than implicit ones, but in most situations, they only consume keystrokes.

Personally, I find JavaScript's this almost good - it allows rapid prototyping in many situations, and only misbehaves in a handful of situations. However, for someone not familiar with JS, these situations are very tricky, and the behaviors very unexpected. I'd like to find some combination of features, where in most situations, the programmers would not have to be explicit, and the implicit behaviors would be the intuitive ones. Some situations would still require explicitness.