Compilation/method resolution with structural subtyping

Nominal types, such as classes and interfaces, give us names to "hang" other names on: methods, data member names, nested types and classes, etc. Interfaces/mixins/traits pose some challenges compared with the well-known single-inheritance "v-table" of virtual methods, but I found an interesting IBM paper on efficient execution of interface-defined methods.

So my question is: what about structural types? The structural types used for (among other things) the "static duck typing" that Scala has most recently brought to general attention?

What is a means of achieving (a) correct and (b) efficient method resolution, data member access, etc.?

To make it interesting, consider two examples.

-- One structurally typed argument
-- One structurally typed argument
fun print-name(thing : { .name(-> Str) }) =
   print(thing.name())

-- Arg is a list of structurally typed things
fun print-names(things : [ { .name(-> Str) } ]) =
   for x <- things in
      print-name(x)

Considering the first case, I thought the compiler could pass a hidden function parameter, or some indices that provide a route to the method in a table associated with the parameter.

But then I considered the second case, and my (relatively speedy) scheme fell apart. In the second case, any given item in the list could:

(1) be an object whose class defines a virtual or final .name method, noting that final methods don't typically show up in vtable/RTTI type data;
(2) be an object that implements a .name method required by an interface;
(3) be an object that gets an actual .name method definition from a mixin/trait;
(4) be an object itself instantiated (via new) directly from some unnamed structural type.

I don't see a way to distinguish this mess of cases before runtime, particularly in the second example (a list of structurally typed objects), nor do I see how to compile such code for efficient execution.

I tried CiteSeerX but didn't turn up any papers directly addressing approaches to analysis and compilation.

Any pointers? Thanks VERY much in advance.


p.s. (I'm very interested in this question because I think an answer might also help me to efficiently compile a set of general, functional record operators and record type operators described in a paper by Cardelli.)



Collections and Static Structural Typing

In a dynamic language, the problem you describe tends to evaporate, due to the large amount of type information readily available in the runtime environment. You also wouldn't need to specify '{.name(->Str)}' for your functions.

In statically compiled languages, however, you're ignoring one of the capabilities you possess: you're free to lose data that isn't leveraged by a client of the object.

This 'collection' you've developed prior to passing it into the 'print_names' method will have a statically determined type just like everything else, and that type might look something like [{.name(->string), .favorite_color(->(red|blue|green)), .quest(->Goal)}].

The necessary type of this collection could be annotated by a programmer or simply inferred from the set of clients that ultimately observe the collection. And, ultimately, each object whose reference is entered into this collection will need to expose the associated members.

If there are no side-effects, it's worth noting that the simple implementation would be for the collection to map to a collection of flat records of possibly lazily computed values. That's what Haskell would do. However, it is clear that you're assuming an object-oriented collection with 'methods' that are statically determined and whose names aren't available to a dynamic dispatcher.

In C++ this sort of thing is sometimes done by inverting templates:

  // ...these usually constructed with macros...
  template<class T, class R> struct get_name { 
    static R v(void* pT) { return ((T*)pT)->name(); } }; 
  template<class T, class R> struct get_quest { 
    static R v(void* pT) { return ((T*)pT)->quest(); } }; 
  template<class T, class R> struct get_favorite_color { 
    static R v(void* pT) { return ((T*)pT)->favorite_color(); } }; 

  class NFCQ {
    void* pObject;
    string (*pfnGetName)(void*);
    Color (*pfnGetFavoriteColor)(void*);
    Goal (*pfnGetQuest)(void*);

  public:
    template<typename T> NFCQ(T& obj) 
      : pObject(&obj)
      , pfnGetName(&get_name<T,string>::v)
      , pfnGetFavoriteColor(&get_favorite_color<T,Color>::v)
      , pfnGetQuest(&get_quest<T,Goal>::v) {}

    string name() { return (*pfnGetName)(pObject); }
    Color  favorite_color() { return (*pfnGetFavoriteColor)(pObject); }
    Goal   quest() { return (*pfnGetQuest)(pObject); }
  };

One could use NFCQ to wrap any object, whether it be final or dynamic, with something that grabs its name, color, quest, etc. One could even make a collection (std::list<NFCQ>). The use of template-computed methods acts a great deal like a typeclass with a default derivation that the programmer can optionally override:

 template<> struct get_name<MyObject,string> {
   static string v(void* pO) { 
     return ((MyObject*)pO)->MyObjectName(); 
   } };

Anyhow, sprucing this solution up a bit, providing greater safety than is achieved by use of 'void*', etc. is entirely possible, and is essentially what you'd end up doing for collections of arbitrary types in structural typing. Properly, what you have is a collection of [(objectRef,typeclassRefA,typeclassRefB,typeclassRefC)] four-tuples that are known by the compiler and accessed in the code.

But before you jump on a solution like this one, you should ask yourself: to what degree do I really need objects with side-effects that don't even know their own method names? Why not just stick a message dispatch table on every object that will potentially receive messages from an unknown source, and be done with it? Am I even taking the right approach here?

You can always map method names to unique integers to reduce lookup times. It has always been my experience that increasing uniformity among components of the language buys you features, and the increased simplicity often makes automated optimizations viable that gain back most of what you lost. If you are developing a new language, go ahead and stick vtables on objects for reference to 'final' fields and such if it simplifies this other stuff... you're still free to optimize the message dispatch if you know the exact type of an object statically, and you won't need to deal with this slow 'typeclass' approach to collections of objects (which buys you back a bit of performance and space).

I can't speak authoritatively...

but my current hobby project is a structurally typed language which is pretty much this sort of thing.

For the collection of things, I don't particularly have a problem with a variety of options. Something in the list is guaranteed to be a sub-type of {.name->string} because, structurally, its type has (at least) a member name that returns string. Whether the source of that member was the type itself, a trait/mix-in, or a super-type... that's all settled before the compiler does parameter checking to let an object into the list.

Granted, most of the mess still can't be resolved until runtime. You don't know specifically what method will be invoked since the runtime type might overload it, just that to get there you do member-access 'name', then invoke. That invocation then looks at the compile-time type and the runtime type to do proper dispatch to the most appropriate method (based on some rule system you set up). That dispatch (at least for me) can be cached if desired, since types are constant entities. For my current [naive] prototype that is... 'fast enough'. It's not among the top targets for optimization, and even when hitting many small methods the dispatch remains a tiny fraction of the processing time.


Morrow (previously mentioned here and here) uses a form of evidence to index the values of a record. Perhaps it'll help you.

Put the hidden parameter everywhere?

I wonder if you could solve your problem by putting the "hidden parameters" everywhere, in addition to function parameter lists. So, in your list, each element would actually be the regular object pointer along with the hidden parameters.

var people : [ { .age(-> Int), .name(-> Str) } ]
people <- ...
   // Each element in 'people' is actually a 3-tuple
   // (object, age_func, name_func)

Width subtyping becomes difficult, but it might be workable:

var things : [ { .name(-> Str) } ]
things <- people
   // Use the same data as 'people', but have an
   // indicator at the beginning of 'things' that
   // says which elements of the three-tuple are
   // actually relevant.  In our case, it's the
   // first and third.

I haven't worked this out past this simple example, so there could be a point at which this scheme fails to work.

Also, Daan Leijen's paper on extensible records seems related.

Xavier Leroy's ZINC paper

For statically type-checked languages, an efficient implementation is outlined in Xavier Leroy's paper on ZINC.

Below is a rough summary from my memory, somewhat generalized to non-ML languages.

Let's assume your language has some notion of a named function signature used for resolving the function being invoked via an interface. This named signature is presumably a (function_name, function_signature) pair or the type-annotated/mangled function name or its moral equivalent. In the case of ZINC, Caml/OCaml doesn't allow function overloading, so the name is sufficient as the named signature. Your language may use something more exotic, but presumably it has some notion of function equivalence classes by which function call sites are annotated for resolution purposes.

As I remember Leroy's scheme, your linker/class loader needs to keep an enumeration of all named function signatures that may be invoked through interfaces. (In your case, this may be all member functions.) When a class is loaded, the linker/loader needs to assign a unique integer to each of the named signatures that may be called via an interface, and find a (preferably the smallest) modulus that maps these integers to distinct small integers via modular division. This modulus and a table of function pointers (similar to a vtable) are then associated with the class by the class loader. (I imagine you would make them the first and second slots in every vtable to avoid bloating each class instance.) The linker/loader also needs to patch every interface call site with the integer corresponding to the named signature of the called function.

Function resolution is as follows: the caller finds the modulus and the function pointer table for the target. The unique integer for the called function equivalence class is then converted into an offset into the table by modular division. The function pointer is retrieved from the table and invoked.

In the worst case, the modulus ends up being equal to one more than the largest unique integer assigned to the functions exported by the class. However, Leroy's paper contains an analysis which estimates that the mean is much much smaller than the worst case. (It may average a small multiple of the number of exported functions. I forget.)

In essence, this scheme is a hash table of function pointers. However, compile-time type checking has determined that an entry will exist, so the call site doesn't have to deal with the case of missing entries in the table. Also, the linker/loader has sized each class's table such that there are no collisions, so the call site doesn't have to deal with this case.

Modular division is slow on most CPUs. Presumably, one could find another parameterized compression function that is more efficiently calculated than modular division, or the linker/loader could restrict itself to moduli that can be efficiently implemented as a small number of fast operations.

If the majority of functions in your language are invoked via interfaces, it may be most efficient to get rid of the traditional vtable entirely and put the modulus at the start of the function pointer table, and store a pointer to the function table where the vtable pointer would normally be stored. If I remember correctly, this approach is what is actually outlined in Leroy's paper.

Not sure?

I use binary search on tuples of types and methods, O(log(N)).

I didn't fully understand your post. As I understand it, the size of the lookup table becomes #(methods) * #(types). Can your comment be summarized as: Leroy gives an O(1) lookup at the expense of growing the lookup table? Or is it a hashed map?

[Forget it, got it]