Static Analysis for Duck Typing

After reading about like and wrap in ECMAScript, I began thinking about the possibility of checking at compile time that the parameters passed to a function are a suitable duck for that function. For example, say you had:

def quack(duck):
    duck.aMethod()
    duck.anotherMethod()

The analysis would see that duck requires the methods aMethod and anotherMethod, and then check that every call to quack passes in a type that has these methods. Is something like this possible? Or am I failing to see something important that prevents static analysis for duck-typed languages?
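As a toy illustration of the first half of this idea (gathering the methods a function requires of a parameter, not the whole-program check), Python's `ast` module can already extract the attribute calls made on a parameter. This is a sketch only, and deliberately ignores aliasing, branching, and higher-order uses:

```python
import ast

# Source of the function above, as a string so we can analyze it.
QUACK_SRC = """
def quack(duck):
    duck.aMethod()
    duck.anotherMethod()
"""

def required_methods(src, param):
    """Toy sketch: collect the names of methods called on `param`.
    Ignores aliasing, branching, and higher-order uses entirely."""
    tree = ast.parse(src)
    names = set()
    for node in ast.walk(tree):
        # Match calls of the form  param.<name>(...)
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == param):
            names.add(node.func.attr)
    return names

print(sorted(required_methods(QUACK_SRC, "duck")))  # ['aMethod', 'anotherMethod']
```

The hard part, as the replies below explain, is not collecting this set but soundly checking it against every call site.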


The difficulty you will run

The difficulty you will run into is getting this scheme to work in the presence of branching on dynamically computed values. Consider:

- Dynamic selection of which object is passed in. In 'quack(duckArray[i])', you either require that every element of duckArray has the same type (restricting your language), or this problem rapidly gets hard.

- Dynamic use of the object inside of the function. What if quack took two parameters, the second a boolean, and only called 'anotherMethod' if the boolean was true? How do you type that?
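The second difficulty can be made concrete with a small Python sketch (class names invented here): whether an argument is duck-enough depends on a runtime value, so no fixed method set for the parameter is both sound and permissive:

```python
class FullDuck:
    def aMethod(self):
        return "quack"
    def anotherMethod(self):
        return "waddle"

class HalfDuck:
    def aMethod(self):
        return "quack"
    # anotherMethod deliberately missing

def quack(duck, both):
    duck.aMethod()
    if both:
        duck.anotherMethod()  # only required when both is True

quack(HalfDuck(), False)      # fine: the missing method is never looked up
try:
    quack(HalfDuck(), True)   # fails only when the branch is taken
except AttributeError:
    print("HalfDuck is not duck-enough when both=True")
```

A static checker must either demand anotherMethod of every argument (rejecting the first, perfectly valid call) or track the correlation between `both` and the required method set, which quickly becomes a dependent-typing problem.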

One of the "basic" methods

One of the "basic" methods for this sort of analysis that seems to have spawned a number of variations is the Cartesian Product Algorithm:
Ole Agesen. The Cartesian Product Algorithm: Simple and Precise Type Inference of Parametric Polymorphism. In Proceedings ECOOP '95, Aarhus, Denmark, August 1995. Springer-Verlag.
http://citeseer.ist.psu.edu/agesen95cartesian.html

You could take a look at some of the improvements to the basic CPA like Starkiller (static analysis of Python) or the DCPA (static analysis of downcasts in Java, but applicable in general to static analysis of duck typing).

Starkiller:
M. Salib. Faster Than C: Static Type Inference with Starkiller.
web.mit.edu/msalib/www/writings/talks/Pycon2004/paper.pdf

Data-polymorphic CPA:
T. Wang and S. F. Smith. Precise constraint-based type inference for Java. Lecture Notes in Computer Science, 2072:99--117, 2001. http://citeseer.ist.psu.edu/wang01precise.html

You can in C++

In C++ it is possible to define a duck-typing interface that fails to compile if the target does not support the expected methods. Basically what happens is that for every type that you want to use as a duck, a table of function pointers is automatically generated at compile-time (kind of like a vtable), and this is stored in a fat interface pointer (i.e. it has a pointer to the target and a pointer to the function table).

See this article at CodeProject.com or this article at DDJ.com.
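The articles do this with C++ template machinery, but the shape of the mechanism (check once, build a table of bound functions, call through the table) can be sketched in plain Python; all names here are invented for illustration:

```python
def make_fat_pointer(obj, method_names):
    """Return a table of bound methods for obj, checking up front
    that every required method exists. In the C++ scheme this check
    happens at compile time; here it happens at wrap time, which is
    still before any call through the interface."""
    missing = [m for m in method_names if not callable(getattr(obj, m, None))]
    if missing:
        raise TypeError(f"{type(obj).__name__} lacks {missing}")
    return {m: getattr(obj, m) for m in method_names}

class RealDuck:
    def aMethod(self):
        return "quack"
    def anotherMethod(self):
        return "waddle"

duck = make_fat_pointer(RealDuck(), ["aMethod", "anotherMethod"])
print(duck["aMethod"]())   # dispatched through the table
```

The fat pointer is exactly this pair: a reference to the target plus the prebuilt function table, so the duck check is paid once per wrap rather than once per call.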

This sounds like

This sounds like dictionary-passing as used with type classes. I was going to suggest the same technique, but the problem is the transitive case, where a higher-order function is called that requires a method/function the calling function did not itself need. How do you make that constraint parametric in the higher-order function parameter?

It is easy to pass

It is easy to pass duck-typed interfaces to other duck-typed interfaces. In Pseudo-code:

duck-interface Cat { void Meow(); const char* Name(); };
duck-interface Dog { void Chase(Cat c); };

class YourCat {
public:
  void Meow() { printf("meow\n"); }
  const char* Name() { return "Sandro's cat"; }
};

class MyDog {
public:
  void Chase(Cat c) { printf("chasing %s\n", c.Name()); }
};

int main() {
  YourCat c;
  MyDog d;
  d.Chase(c);
}

So to solve your problem you represent any single member function using a functor (an object with a single apply function).

  duck-interface BinaryIntFunctor { int Apply(int x, int y); }
  duck-interface Worker { int Work(BinaryIntFunctor f); }  

or using templates:

  template<typename T>
  duck-interface BinaryFunctor { T Apply(T x, T y); }
  duck-interface Worker { int Work(BinaryFunctor<int> f); }  

Does this address the scenario to which you were referring?

I don't think so. Consider:

I don't think so. Consider:

function iter(array, g) {
  foreach(object o in array) {
    g(o);
  }
}

iter traverses the array and applies g to each item. g may invoke any method on o, and we probably cannot know which ones ahead of time. How do you build the dictionaries at the call site in this case? They would have to be parameterized somehow on the methods that g calls, but g is not itself statically known. There are more sophisticated higher-order structures, but this is the simplest case, and I think it illustrates the point.

Where is the duck-typing?

Where is the duck-typing in your example? If you could flesh it out (e.g. use it in context) then we could discuss further.

The duck typing is in g. Say

The duck typing is in g. Say the array is of floats and ints, and g may be a function that uses an arithmetic function, say + or -.

array = { 1, 2.0, 3.0, 4 }; //array of [int,float,float,int]
iter(array, fun x -> print(x + 1))
iter(array, fun x -> print(x - 1))

How can you construct dictionaries at the call site for iter or at the call site for g within iter? Performing a method search is possible, as OCaml does with its objects, but using dictionary-passing doesn't seem feasible.
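For contrast, here is the example running dynamically in Python (a direct transcription of the pseudocode above): each `x + 1` resolves against the runtime type of the element, with no dictionaries constructed anywhere:

```python
def iter_apply(array, g):
    for o in array:
        g(o)

array = [1, 2.0, 3.0, 4]                   # mixed ints and floats
iter_apply(array, lambda x: print(x + 1))  # 2, 3.0, 4.0, 5
iter_apply(array, lambda x: print(x - 1))  # 0, 1.0, 2.0, 3
```

A dictionary-passing compiler would need the per-element operation set at iter_apply's call site, but that set lives inside g, which is only known dynamically; this is the transitive case mentioned above.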

Go take a look at

I'd suggest you go take a look at Interfaces in C++ (a.k.a. the Boost Interface Library, or BIL), which highlights some of the possibilities of duck-typing interfaces.

In the example you give one could parameterize the array and functor over something like "INumeric" which supports addition, subtraction, and printing.

I don't see how interfaces

I don't see how interfaces are duck typing. Those interfaces are defined ahead of time, and the code uses the interface. With duck typing, the function g may use *any* of the methods/functions defined on the objects in the array, and this can't be known ahead of time. Is the method invoked on the array objects inferred somehow, or explicitly specified? I'm not very familiar with C++.

That isn't duck typing

That isn't duck typing though. To even get close to duck typing, you need something like polymorphic extensible records masquerading as objects, and probably some form of equi-recursive function types.

Duck typing has no notion of interfaces like "INumeric". Instead, the rule is simply that if an object responds to a method, it gets invoked.

The class-based approach of something like Haskell or C++ is much more strict. For example, in Haskell, any (+) function must always be an instance of the type 'Num a => a -> a -> a'. This is the equivalent of requiring a '+' method in an OO language to only take one argument that is the same type as the receiver which must inherit from 'Num'.

Languages with duck typing clearly go far beyond this. If I want to define a '+' method that takes three arguments of varying kinds and calls 'foo' on the first, 'bar' on the second, and 'baz' with two arguments that are the results of the 'foo' and 'bar' messages, I can.
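For a concrete instance of that freedom in Python (operator `+` itself is fixed at two operands, so an ordinary method named `plus` stands in for it; all class and method names here are contrived):

```python
class Mixer:
    def plus(self, a, b):
        # Call foo on the first argument, bar on the second, then baz
        # with the two results -- no Num-style constraint anywhere.
        return self.baz(a.foo(), b.bar())
    def baz(self, x, y):
        return f"{x}+{y}"

class A:
    def foo(self):
        return "foo"

class B:
    def bar(self):
        return "bar"

print(Mixer().plus(A(), B()))   # foo+bar
```

Nothing ties the argument types to the receiver's type or to each other; the only requirement is that each object answers the one message sent to it.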

Duby

You might be interested in looking at Duby, a type-inferred Ruby-like language created by Charles Nutter (one of the guys behind JRuby).

Structural Typing

I believe this is known as structural typing, as found, for instance, in the O'Caml object system.

Similar to type inference

First, in type inference, one starts with a function definition and types for all free variables are inferred from it. Second, in Haskell type classes (or traits in other languages), a set of methods can be declared, and these can be used in types to restrict the domain of a function. What you describe sounds like "trait inference" (which I don't know to exist), where not only the type is inferred but also a set of traits. But this assumes that "aMethod" and "anotherMethod" are declared in a trait to begin with; if that is not the case, then it is something completely different.

IT IS NOT DUCK TYPING!!

Hi guys,
What the OP described, as Matt M. stated in the first comment, IS NOT DUCK TYPING.

Sorry for the shout, but maybe the DuckDecoy example in the talk page of the wiki article will help show an example of Duck Typing in which the OP's scheme would fail.

- Paddy.

Matt M stated no such thing.

Matt M stated no such thing.

It's quite clear what the OP was referring to. Furthermore, the notion of 'static duck typing' fairly clearly refers to this. You might want to complain about it being a degenerate example, but in that case the static crowd'd rather like the word 'type' back thankyouverymuch.

In practice, you don't own the terminology and the tone of your post (even with the apology for shouting) isn't appropriate.

But is it duck typing?

Er, I think you are wrong. Is it inappropriate to clarify the terminology, given that "duck typing" is used rather unconventionally in the original post? Don't you think you are sounding a little rude and reactive here?

Giving type back

Hi Philippa,
Tone is a hard thing to judge in this kind of post. It is rude to shout, for which I apologised, but equally I was making the point that there is a difference between the structural typing the OP referred to and the duck typing that MM *was* pointing out, a point ignored by the majority of the posts after him.

Now if you want to call what the OP was referring to 'static duck typing' then I cannot stop you, but others may be more likely to think that it *is* Duck Typing. Those wanting to know more might follow the link.

As for type: Sorry, I'll put it back in your pram :-)

- Paddy.

My guess is that the OP that

My guess is that the OP *was* talking about duck typing, in the sense that I think he was imagining a system of inferred types that would somehow type any valid Python program. And while I see where you're coming from, I think phrasing the situation as "that's not duck typing" could be confusing because the OP didn't actually have a complete structural type system in mind. If he'd sat down and tried to write down all of the typing rules I think he would have realized the problem.

Agreed

That's how I read it, too.

However, once you get past the shouting and confusion, paddy3118 does have a point worth making in this thread, that (dynamic) duck typing is not equivalent to (static) structural typing, as some other comments seem to have been assuming.

[Edit: the difference being that dynamic duck typing allows imperfect ducks, as long as missing features aren't accessed, while static structural typing does not.]
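The edit's distinction in runnable Python form (DuckDecoy as in the wiki example; the other names are invented): the decoy lacks fly, yet every call that only quacks succeeds, while a static structural type requiring {quack, fly} would reject it up front:

```python
class Duck:
    def quack(self):
        return "quack"
    def fly(self):
        return "flap"

class DuckDecoy:
    def quack(self):
        return "quack"
    # no fly(): an imperfect duck

def startle(d):
    return d.quack()   # only quack is ever accessed

print(startle(Duck()))       # quack
print(startle(DuckDecoy()))  # quack -- accepted, since fly is never touched
```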

[Edit: the difference being

[Edit: the difference being that dynamic duck typing allows imperfect ducks, as long as missing features aren't accessed, while static structural typing does not.]

Wouldn't this depend on how refined your typing is? For instance, given extensible records with first-class labels, we can simply ensure the presence of individual fields. I can't think of any dynamic behaviour that would be more refined than this.

Consider a pair where the

Consider a pair where the first element is a boolean and the second element must have a 'doSomething' method only if the first element is true. A perfect type system handling the general case would run afoul of the halting result.

Also I agree with Anton that paddy3118 had a useful point - I was just trying to point out a source of confusion.

Consider a pair where the

Consider a pair where the first element is a boolean and the second element must have a 'doSomething' method only if the first element is true. A perfect type system handling the general case would run afoul of the halting result.

Ah, I get it now. Still, I would think this particular example would be expressible with GADTs for True and False, and the label need only be present under True.

Ya, this example would only

Ya, this example would only require a minor extension. In general though it's undecidable whether or not an object is duck-enough for a given function.

I'll be quack

Does your duck terminate?

GADTS for DuckDecoy?

"I would think this particular example would be expressible with GADTs for True and False, and the label need only be present under True"

I can't follow your terminology, but I am interested in how you could statically type check the DuckDecoy example. Would you expand on this?

Thanks, Paddy.

Maybe something like ESP

ESP: Path-Sensitive Program Verification in Polynomial Time seems to have facilities for proving similar criteria. I found it via Taras' blog, where it's applied to checking ordering and co-occurrence constraints in Mozilla code, which are similar constraints to the decoy duck example.

Yes

I completely agree, and I think this is a common source of confusion when static typists try to account for "duck typing." If I had to summarize the distinction, I'd say that duck typing really requires a form of dependent structural typing. I think that way of saying it highlights the fundamental problem in a way that type aficionados will readily understand.

Apples to oranges

Imperfect ducks are allowed not because they don't have all the features, but because the missing features have a default implementation: throw an exception. It's perfectly possible to have a structural type system that augments records lacking the necessary labels with the missing labels pointing to _|_. Such a system wouldn't catch any errors, but if we mix it with a healthy dose of gradual typing we can have something that lets you have your cake and eat it too.
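That reading can be sketched directly in Python: pad an object so that every label 'exists', with the missing ones bound to a diverging default (a hypothetical wrapper written for illustration, not any real library):

```python
class Padded:
    """Make every attribute of obj 'present'; attributes obj lacks
    become methods whose body is bottom (they raise when called)."""
    def __init__(self, obj):
        self._obj = obj
    def __getattr__(self, name):
        # Called only when normal lookup fails on the wrapper itself.
        try:
            return getattr(self._obj, name)
        except AttributeError:
            def bottom(*args, **kwargs):
                raise NotImplementedError(f"{name} is bottom here")
            return bottom

class Decoy:
    def quack(self):
        return "quack"

d = Padded(Decoy())
print(d.quack())        # the real implementation
print(callable(d.fly))  # True: the label now 'exists' ...
# d.fly()               # ... but forcing it raises NotImplementedError
```

Every record now structurally has every label, so the static side is trivially satisfied; the errors simply move to the moment a bottom label is forced, which is exactly the dynamic duck-typing behavior.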

On a side note, any OO enthusiast would eschew the if-test and create a new polymorphic message, so the code would look like (in Smalltalk) 'duck move; doDuckStuff', with doDuckStuff behaving differently depending on whether the duck is flying or not; for duck decoys it just quacks. Death to everything except message passing! Tell, don't ask!
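The same tell-don't-ask move, transcribed from the Smalltalk snippet into Python (class names invented): the branch disappears into dispatch.

```python
class FlyingDuck:
    def doDuckStuff(self):
        return "takes off"

class DuckDecoy:
    def doDuckStuff(self):
        return "just quacks"

# No if-test on a flag: each duck answers the message its own way.
for duck in (FlyingDuck(), DuckDecoy()):
    print(duck.doDuckStuff())
```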

haXe

Such structural subtyping (with type inference) is already implemented in the haXe programming language (http://haxe.org), which targets JavaScript, Flash and Neko.

Duck typing in C# 3.0

In C# 3.0 there is a way to approximate duck typing using extension methods. Extension methods are a new feature of C# 3.0 and are like mixins in Ruby. If you call an extension method on an object, the compiler will first look at the object itself to see if it has an implementation of that method; if not, the extension method will be called.

So this way we have a statically typed duck typing.

Everything I've read about

Everything I've read about extension methods implies they use purely static method resolution. Are you just trying to exploit C#'s overloading to simulate duck typing?

Statically typed?

It's not really statically typed though, is it? The example you give is adding an extension method to Object with

// ...
public static SomeType TheMethod (this object o, [signature]) { throw new MissingMethodException (); }
// ...

Since TheMethod is defined so that method calls will throw an exception (which is a runtime event) if the target object doesn't have its own version of TheMethod defined, aren't you essentially embedding a kind of dynamic type check within the static type system?

Extension Methods and Class Methods

"An extension method with the same name and signature as an interface or class method will never be called." This is the point (from the official documentation on extension methods).
There is no mechanism that checks for the existence of a class member with the same signature. So yes, it is a runtime event. The type system of C# 3.0 has no representation for this sort of type checking (which could be protocol-based typing, as in Objective-C).
Yet it is not "trying to exploit C#'s overloading to simulate duck typing" (to naasking), because dynamic dispatch (as described in the definition) exists here, choosing between the extension method and the class member, and these methods are not in the same scope. So this cannot be called overloading (which takes place among class methods: members and inherited members).
And you can employ generics to achieve a more type-safe version of this, yet it is still only partly statically typed.

"An extension method with

"An extension method with the same name and signature as an interface or class method will never be called.", this is the point (from official document on extension methods).
There is not a mechanism that check for existence of a class member with the same signature.

That just sounds like scoping to me. The class methods always shadow extension methods, which makes perfect sense. All class methods are known at compile time, and the only extension methods available are those in namespaces that have been imported, so I can't see any reason why extension methods wouldn't be resolved purely at compile time.

All class methods are known

All class methods are known at compile time, and the only extension methods available are those in namespaces that have been imported, so I can't see any reason why extension methods wouldn't be resolved purely at compile time.

What do you mean by 'resolved purely at compile-time'? You still have dynamic dispatch: statically you only know apparent types, not the actual (dynamic) types. I think this is equivalent to just lumping all methods into a class at the top of your hierarchy. Extension methods just let you keep that class open to method addition.

From my understanding, there

From my understanding, there is no dispatch with extension methods. The method is selected purely based on what you called the "apparent type". See here for a brief overview.

For example, if you have two extension methods "ReadAll", one which takes a FileStream and one which takes a base Stream, and you call ReadAll using a FileStream which is locally typed as a Stream, it will call the ReadAll taking only a Stream.

Thus, the method is resolved purely at compile-time, no dynamic dispatch. Extension methods are purely a syntactic convenience for overload resolution of static function calls (to make them look more like method calls).

Ah ok, that actually makes

Ah ok, that actually makes more sense. Thanks :)

Type Definitions and Extension Methods

Consider that extension methods are not included in the definition of the extended type (if they were, they would no longer be extension methods but class methods!).
Here is some simple code, with the extended type restricted to those that implement a specific interface:

public interface IMarker { }

public static class ExIMarker
{
    public static void MethodToBeCalled (this IMarker sortOfIMarker, int arg1, string arg2)
    {
        // throw new MissingMethodException ();
        Console.WriteLine ("MissingMethod");
    }
}

public class Type1 : IMarker
{
}
public class Type2 : IMarker
{
public void MethodToBeCalled (int arg1, string arg2) { Console.WriteLine ("Type2.MethodToBeCalled called"); }
}
public class Type3 : IMarker
{
public void MethodToBeCalled (int arg1, string arg2) { Console.WriteLine ("Type3.MethodToBeCalled called"); }
}

The code for putting this to use:

Type1 t1 = new Type1 ();
Type2 t2 = new Type2 ();
Type3 t3 = new Type3 ();

t1.MethodToBeCalled (100, "100");
t2.MethodToBeCalled (100, "100");
t3.MethodToBeCalled (100, "100");

And the output is:

MissingMethod
Type2.MethodToBeCalled called
Type3.MethodToBeCalled called

As we can see, the calling syntax for all three types is the same, and at compile time we do not know whether Type1 actually has an implementation of MethodToBeCalled.
Another thing that should be considered is the mechanism for comparing method signatures: the extension method's signature is compared to the class method's signature with the extension method's first argument ignored.
Only at runtime will we discover that Type1 has no implementation of MethodToBeCalled.

Only at runtime we will

Only at runtime will we discover that Type1 has no implementation of MethodToBeCalled.

No, all of this information is known at compile time. t1.MethodToBeCalled would not even compile if ExIMarker were not in scope at that point. Finally, only if you changed:

Type1 t1 = new Type1 ();
Type2 t2 = new Type2 ();
Type3 t3 = new Type3 ();

To:

IMarker t1 = new Type1 ();
IMarker t2 = new Type2 ();
IMarker t3 = new Type3 ();

and you got exactly the same output:

MissingMethod
Type2.MethodToBeCalled called
Type3.MethodToBeCalled called

would this be dynamic dispatch of some sort.