
The this argument in dynamic programming languages with OO and first-class functions

I posted this as a comment on Hacker News yesterday but didn't receive any responses. I hope I might get some comments here, in the more PL-oriented community.

---------------

The handling of the this argument is of crucial importance in the design of programming languages with first-class functions that also feature some kind of objects, i.e. data with attached behavior (call them functions or methods). Throughout, I use call for f(x) and invoke for o.m().

I see basically three possibilities:

  1. different invocation syntax, with explicit this parameter

    Lua is an example of a language that uses this approach:

      table.method1(arg1, arg2, arg3)
      table:method2(arg2, arg3, arg4)
    

    Here, method1 is called with 3 parameters (without the this parameter), but method2 is called with 4 (the first parameter is the this object, table).

    This is undoubtedly the approach with the most efficient execution, but it places a huge burden on the programmer, who has to distinguish methods from ordinary functions both at the call site and at the definition: if you attach a normal function to a table and invoke it as a method, you have a problem, because the function doesn't expect its first parameter to be this. Also, this approach is the opposite of the one found in traditional OOP languages such as Java and C++.
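
    A rough Python sketch of the distinction, assuming a plain dict stands in for the Lua table. The helpers call and invoke and the sample functions add and scaled_add are made up for illustration; they only model the two invocation forms and are not how Lua is implemented.

```python
# Hypothetical model of Lua's two invocation forms; a dict plays the table.

def call(table, name, *args):
    """Model of table.name(args): no receiver is passed."""
    return table[name](*args)

def invoke(table, name, *args):
    """Model of table:name(args): the table is passed as the first argument."""
    return table[name](table, *args)

def add(a, b):                 # ordinary function, expects no receiver
    return a + b

def scaled_add(self, a, b):    # method, expects the receiver first
    return self["scale"] * (a + b)

t = {"scale": 10, "add": add, "scaled_add": scaled_add}

plain = call(t, "add", 1, 2)
method = invoke(t, "scaled_add", 1, 2)

# Invoking an ordinary function as a method passes one argument too many.
try:
    invoke(t, "add", 1, 2)
    mismatch = None
except TypeError as exc:
    mismatch = type(exc).__name__
```

    Python raises a TypeError on the mismatch. Lua would not complain about the argument count and would instead run the function with the table in the first parameter's place, which is arguably harder to notice.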

  2. automatic method binding

    Python is an example of a language that uses this approach:

      f = obj.method
      f(arg1, arg2)
    

    In this case, f is called with obj as its first parameter, self, the Pythonic version of this. In Python, self is declared explicitly at method definition, but I believe the approach could also work if it were implicit.

    This approach is perhaps the most sensible approach from the programmer's point of view, where methods "just work" - you can call them using the normal dot syntax, and you don't need to worry when using methods as first-class functions - they will remember the object they were attached to.

    However, a naive implementation of this technique is quite slow, as the function must be bound to the this object every time it is accessed. It is possible to invoke (access and call at the same time) a method without this overhead (e.g. by looking the method name up in the object's internal dictionary and calling the result with this as the first parameter), but not when merely accessing an attribute: there, one must check whether the attribute is a method, and bind it to this if it is. However, this performance argument is irrelevant in a language that supports arbitrary getters/setters (which seems to be the direction that new OO languages are headed in), where every attribute access has to be checked anyway.
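
    The per-access binding can be observed directly in Python. The sketch below assumes CPython's usual behavior and uses a made-up Counter class; the last step models the "invoke" shortcut by hand and is not meant to show what the interpreter does internally.

```python
class Counter:
    def __init__(self):
        self.count = 0

    def bump(self, step):
        self.count += step
        return self.count

c = Counter()

# Each attribute access builds a fresh bound-method object.
first = c.bump
second = c.bump
fresh_each_time = first is not second

# The bound method remembers its receiver and the underlying function.
remembers_receiver = first.__self__ is c
underlying = first.__func__ is Counter.__dict__["bump"]

# What the access does, spelled out through the descriptor protocol.
manual = Counter.__dict__["bump"].__get__(c, Counter)
manual_result = manual(5)

# An "invoke" fast path can skip the intermediate object: look the
# function up on the class and pass the receiver directly.
direct_result = Counter.__dict__["bump"](c, 2)
```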

    Also, it complicates matters with ad-hoc objects:

      o = {}
      f = (arg1, arg2, arg3) -> arg1 + arg2 + arg3
      o.m = f
    

    When we invoke the method m of the object o, do we pass this as the first parameter (when the function f might not expect it) or not (losing the OO aspect)? Python largely avoids the question: it has no ad-hoc object literals, and it binds only functions found on the class, so a function assigned to an instance attribute is called without self.
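
    For what it's worth, here is how Python itself treats a function attached to an instance after the fact. SimpleNamespace stands in for the ad-hoc object o, and describe is a made-up method added only to show explicit binding.

```python
from types import MethodType, SimpleNamespace

def f(arg1, arg2, arg3):
    return arg1 + arg2 + arg3

o = SimpleNamespace()
o.m = f

# Stored on the instance, f is not bound: no receiver is passed.
unbound_result = o.m(1, 2, 3)
is_plain_function = o.m is f

# A function that wants the receiver has to be bound explicitly.
def describe(self, suffix):
    return self.name + suffix

o.name = "o"
o.describe = MethodType(describe, o)
bound_result = o.describe("!")
```

    So in Python the answer to the question above is "not passed", and the OO aspect has to be requested by hand for functions attached this way.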

  3. implicit this

    JavaScript is an example of a language that uses this approach:

      f = (arg1, arg2) -> print this, arg1 + arg2
      f(1, 2)        // prints "undefined, 3"
      o.m = f
      o.m(1, 2)      // prints "o, 3"
    

    Here, this is an implicit, undeclared parameter of every function, and it is passed at every call/invocation. If a function is called as in the second line, this is set to the global object (window in the browser) or, in strict mode, to undefined. There is some overhead at every unbound function call, as an extra parameter (undefined) is passed, and it is not clear how this overhead can be avoided (except by avoiding plain function calls and sticking to OOP). Ad-hoc object creation is simple, as functions/methods have a simple way of knowing whether they were called (without a this parameter) or invoked (with a this parameter). However, care must be taken when using methods as first-class values:

      f = o.m             // f will have the this parameter undefined
      g = o.m.bind(o)     // g will have o passed as the this parameter
    

    A syntax extension would be nice, e.g.

      g = bind o.m
    

    but it might be ambiguous. In my experience, this situation is not encountered very frequently, so the trade-off is acceptable.
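
    The calling convention of this third approach can be modelled in a few lines of Python. This is only a sketch under the assumption that every function takes this as an explicit first parameter and that None plays the role of strict-mode undefined; call, invoke and bind are hypothetical helpers, not JavaScript's actual machinery.

```python
# Hypothetical model: every function takes `this` as its first parameter.

def call(fn, *args):
    """Plain call f(...): `this` is passed as None (strict-mode undefined)."""
    return fn(None, *args)

def invoke(obj, name, *args):
    """Invocation o.m(...): the object is passed as `this`."""
    return obj[name](obj, *args)

def bind(obj, name):
    """Model of o.m.bind(o): fix `this` ahead of time."""
    fn = obj[name]
    def bound(_this, *args):
        # Whatever `this` the caller supplies is ignored.
        return fn(obj, *args)
    return bound

def f(this, arg1, arg2):
    label = "undefined" if this is None else this["name"]
    return f"{label}, {arg1 + arg2}"

o = {"name": "o", "m": f}

called = call(f, 1, 2)
invoked = invoke(o, "m", 1, 2)

detached = o["m"]             # like f = o.m
rebound = bind(o, "m")        # like g = o.m.bind(o)
detached_result = call(detached, 1, 2)
rebound_result = call(rebound, 1, 2)
```

    The extra None passed by call is the plain-call overhead mentioned above, and detached shows how a method taken as a first-class value loses its object unless it is bound.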

    ---------------

    I find all of these possibilities lacking, but I tend to consider the third solution the most favorable, mainly because it supports elegant and unambiguous ad-hoc object construction.

    Maybe we could combine solutions 2 and 3, e.g. by using getters that set this on method objects automagically, but to keep ad-hoc objects simple, an implicit this seems to be a must, which means we cannot avoid the plain function call overhead...
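
    One way such a combination might look, sketched in Python. The AdHoc class and greet function are hypothetical, and the sketch leans on a loud assumption: every function stored on the object expects this as its first parameter, which is exactly the implicit-this convention.

```python
from functools import partial

class AdHoc:
    """Hypothetical object whose attribute getter binds `this` on access."""

    def __init__(self, **slots):
        # Write through __dict__ directly; no custom setter is involved.
        self.__dict__.update(slots)

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        # Assumption: every function stored on the object expects `this` first.
        if callable(value) and name in object.__getattribute__(self, "__dict__"):
            return partial(value, self)
        return value

def greet(this, punctuation):
    return "hello from " + this.name + punctuation

o = AdHoc(name="o")
o.greet = greet

g = o.greet                   # bound automatically by the getter
greeting = g("!")
```

    The getter makes first-class use safe, as in approach 2, but a stored function that does not expect this would break, so the convention from approach 3 is still needed.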

    I'm sure I'm missing a very simple, straightforward alternative, and I would be extremely grateful for any comments or constructive criticism.

    Edit: formatting.