Type inference for Python

The subject of type inference for dynamically-checked languages came up in the Buried Treasure thread. A question was raised in that thread having to do with why static type inference in these languages is difficult. Since there's a nascent body of literature which addresses that question, here are a few links to articles and papers about type inference for Python.

A nice overview can be found in Localized Type Inference of Atomic Types in Python, a Master's thesis by Brett Cannon. The whole thesis is relevant, but for an overview of the issues, see Chapter 3, "Challenges of Inferring Types in Python". Chapter 4 summarizes previous attempts involving static inference in Python, including Psyco (previously on LtU) and Starkiller. The limitations of these attempts are briefly addressed.

Type inference solutions for Python invariably involve restrictions to make the problem tractable. The above paper focuses on "inferring atomic types in the local namespace". Another approach is described in Aggressive Type Inference, by John Aycock. Aycock makes an important observation:

Giving people a dynamically-typed language does not mean that they write dynamically-typed programs.

The article offers a type inference approach which exploits this observation. (If the meaning of the above quote isn't clear, I recommend reviewing our mammoth three-part thread on the subject, "Why type systems are interesting", part I, part II, and part III.)

The PyPy implementation of Python in Python (previously on LtU) uses a restricted subset of Python, called RPython, to implement parts of the language. RPython is sufficiently static to be able to support full-program type inference. It is not a "soft" inference approach, and is not designed to be used with ordinary Python programs. The paper Compiling dynamic language implementations covers the approach used for static analysis of RPython. The PyPy Coding Guide, starting at section 1.4 may also be useful.

(It may be interesting to note that the PyPy approach is very similar to that used previously for Scheme 48. The core of Scheme 48 is implemented in PreScheme, a subset of Scheme that supports full-program type inference.)

Finally, Guido van Rossum has a number of blog entries on the subject of adding optional static typing to Python:

If anyone knows of any other good treatments of type inference in Python or similar languages, please post links here.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Type Feedback

Perhaps I'm a little naive here, but many of the points outlined in the linked articles seem to be to (at least for Python and potentially other dynamic languages) better solved by Type Feedback rather than Type Inference and static typing.

Type Feedback

Type feedback is certainly a viable alternative. Both approaches have pros and cons - using both inference and feedback together might be ideal, since inference will tell you some things that feedback can't, and feedback will tell you some things that inference can't.

The paper Type Feedback vs. Concrete Type Inference has a table in section 4 which summarizes the main differences between the approaches.

so that's what it's called

Thanks for the paper explaining what type feedback means. It's like... everything already has a name. :-)

Uncertainty about Type Feedback

It was my understanding that type feedback only benefited the program after it had been run several times, i.e., it works kinda like profile based optimization. Is this correct? If so the type feedback approach would seem to add a great deal of complexity to the compilation and execution of a dynamic language like python. In particular it would require the program to save run time information which would either require a global cache or secondary file either of which pose security concerns.

Of course it is likely that I am just missing something here.

type feedback admin

In another discussion, I suggested type feedback as a way to collect otherwise undeclared type info to augment a DT (dynamically typed) language when an IDE execution environment can be assumed as context for collecting and storing info, and presenting some UI to a developer for using that info.

logicnazi: It was my understanding that type feedback only benefited the program after it had been run several times, i.e., it works kinda like profile based optimization.

When data collected is frequency of use (for optimization), the more times the better. But if you merely want to know what actual types were bound to formal parameters and return values (for type exemplars), one time would be enough. More would be better.

logicnazi: If so the type feedback approach would seem to add a great deal of complexity to the compilation and execution of a dynamic language like python.

The language itself would merely need syntax for declaring types, if you hoped for portable persistence of the type declarations stored inline with the source.

Complexity to gather type feedback data depends on whether you update an existing complex code base, or write a new runtime from a clean slate. I'd not want to add it to an existing Python runtime myself (not on a lark).

logicnazi: In particular it would require the program to save run time information which would either require a global cache or secondary file either of which pose security concerns.

It would only be global in a Python implementation where other things were global too. A new implementation might have, say, profiles under an IDE that collected data to use as type feedback as a developer chooses.

A developer might invoke an IDE feature to explore the ways in which a type feedback profile differed from declared types in the source -- perhaps in order to edit the original sources, or perhaps to bless type declarations stored out of line.

I'd assume one developer's type feedback profiles (in this hypothetical IDE) would affect no one else, unless source shared by everyone was updated. Of course, compiled output from source would need separate IDE storage to avoid global sharing.

This model might not sit well with folks holding a philosophical view that everyone must have an identical view of the code in common (like a school uniform :-).

The Pyton compiler for Common Lisp

There is a compiler for Common Lisp that also does type inference. See http://doi.acm.org/10.1145/141471.141558 and also http://common-lisp.net/project/cmucl/doc/cmu-user/compiler-hint.html#toc152 and http://www.sbcl.org/manual/Handling-of-Types.html#Handling%20of%20Types

Pascal

Clarification

Thanks for the references. Those compilers (for CMUCL and SBCL) have a very practical approach to type inference.

Just to clarify for anyone who may be wondering, "Python" is the name of a compiler for Common Lisp, having nothing to do with the popular programming language of the same name...

Devel::TypeCheck

Perl got some limited type inferencing during a Google Summer of Code project last year, in the form of a module called Devel::TypeCheck. There's a paper about the project by the author.

ShedSkin

The author of Shed Skin has implemented type inference for Python and it is covered in his masters thesis:

http://shed-skin.blogspot.com/

also already implemented in boo

static typing plus type inference with a python syntax: http://boo.codehaus.org/.

PyPy type inference

PyPy coding guide is not the best document to link to... The most through description of PyPy's type inference is located here: Compiling dynamic language implementations.

Thanks

Thanks, I've updated the story.

p.s. python is never getting type inference

Guido has already ruled against it (you failed to link to his follow up article where he responds to the "NIMPY" crowd that was outraged about the possibility of static typing in python).

Instead you'll get optional type checking at runtime (not static type inference), which has the ironic effect of making your python programs run slower, not faster (Guido himself said so). So you'll get all the baggage of static typing but none of the speed benefits.

The syntax will be similar to Pascal, very yucky:

def gcd(self, a: int, b: int) -> int:
     lambda: whatever :)

Guido doesn't get to decide :)

The Python implementation itself doesn't need to do type inference - it can be done by external tools, integrated into IDEs, etc. If the plan is to add type annotations to the language, that just makes it easier for such tools. (Although it would be nice if the runtime checks caused by the annotations could be disabled, when an inferencer can prove that they're not necessary.)

Done properly, this shouldn't even offend those outraged by the idea of static typing in Python, because it won't force them to change the way they write code. The negative reaction to static typing is usually to a certain kind of static checking, in which the compiler rejects programs if there's even a single term that can't be typed. The whole point of "soft" type inference approaches is to be able to get some of the benefits of static type analysis without the requirement that the entire program be explicitly well-typed.

[P.S. I've updated the story with Guido's other blog entries on the subject, thanks for pointing out the omission.]

This is similar to using a

This is similar to using a Dynamic type in haXe, except that in Python+TypeAnnotations, Dynamic is the default, while in haXe there is unification and Dynamic only when you explicitly request it.

Instead you'll get optional

Instead you'll get optional type checking at runtime (not static type inference), which has the ironic effect of making your python programs run slower, not faster (Guido himself said so). So you'll get all the baggage of static typing but none of the speed benefits.

Incorrect. First of all, the types are optional. Secondly, without them you have to do type checking manually, which would look something like this in the above example:

def gcd(self, a, b):
    if not isinstance(a, int):
        raise TypeError, ...
    if not isinstance(b, int):
        raise TypeError, ...
    ...

I'm not particularly a big fan of return types, though. I really don't see the need, especially if you have the ability to give variables types.

The syntax will be similar to Pascal, very yucky:

def gcd(self, a: int, b: int) -> int:
     lambda: whatever :)

I don't know, I think it looks pretty good. But then, I'm biased, considering Guido and I are apparently very similar when it comes to type syntax. And no, I hadn't seen Guido's post before this (ironic, I know).

not incorrect

Incorrect. First of all, the types are optional. Secondly, without them you have to do type checking manually, which would look something like this in the above example:

def gcd(self, a, b):
if not isinstance(a, int):
raise TypeError, ...
if not isinstance(b, int):
raise TypeError, ...
...

Which is exactly what the type declarations will do. Putting "a: int" or whatever will basically tell python to add runtime type checking similar in form to what you wrote out explicitly.

Also I never said the type declarations were mandatory, so I fail to see what was wrong with what I (and Guido himself) said.

Progress?

Hi,

Does anyone know what the current thoughts on this in Python are? I see that PEP-246 (the subject of Guido's most recent linked blog post) failed.