PyPy

PyPy, the Python implementation written in Python, was mentioned here a couple of times in the past. After it was mentioned in a recent LtU discussion, I took another look, and boy did they make a lot of progress when I wasn't looking. PyPy can even compile itself now... You should check it out again if you are interested in this sort of thing.

There's even an introduction to the techniques used by PyPy, including a nice (but very high level) overview of abstract interpretation.
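For anyone who hasn't met abstract interpretation before, here is a minimal toy sketch of the general idea. Instead of computing with concrete numbers, it computes with abstract signs (negative, zero, positive, or unknown). This is a hypothetical illustration of the technique only, not how PyPy's flow object space is implemented.

```python
# Toy abstract interpreter over a sign domain (illustration only, not PyPy code).
NEG, ZERO, POS, TOP = "-", "0", "+", "T"  # TOP means "could be any sign"


def abstract_const(n: int) -> str:
    """Map a concrete integer to its abstract sign."""
    if n < 0:
        return NEG
    if n == 0:
        return ZERO
    return POS


def abstract_mul(a: str, b: str) -> str:
    """Multiply two abstract signs."""
    if ZERO in (a, b):
        return ZERO
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG


def abstract_add(a: str, b: str) -> str:
    """Add two abstract signs, giving up (TOP) when the signs disagree."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    if a == b and a != TOP:
        return a
    return TOP


def evaluate(expr, env):
    """Evaluate an expression tree on abstract values.

    expr is an int, a variable name, or a tuple (op, left, right) with op in {"+", "*"};
    env maps variable names to abstract signs.
    """
    if isinstance(expr, int):
        return abstract_const(expr)
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return abstract_add(lhs, rhs) if op == "+" else abstract_mul(lhs, rhs)
```

For example, evaluate(("+", ("*", "x", "x"), 1), {"x": NEG}) yields POS without ever knowing the concrete value of x; the price is precision, since POS + NEG can only be answered with TOP.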

Ruby?

Is there any similar project for Ruby? (If there is, what are the odds it's called RuRu? :-)

I suppose MetaRuby comes

I suppose MetaRuby comes closest to it. Here is the developer's blog. It's in a much earlier stage though.

Cool.

I haven't really done a lot of Python programming, but it's always cool to see a language develop to this point (I always view an implementation of a language in itself as a major step in its development). Now that it can compile itself, I wonder if there are any future goals of writing a native code compiler in Python. It would be a sort of "coming of age" for Python to be completely self-hosting and natively compiled.

Selfhosting accomplished

Benjamin, PyPy is completely self-hosting (since version 0.7). PyPy no longer needs the CPython runtime to execute the PyPy interpreter implemented in the standard object space and the interpreter package; the interpreter is already translated to low-level code by the C and LLVM backends (I don't know if the latter is complete yet). The next major step will be writing the JIT, i.e. the PyPy incarnation of Psyco.

Kay

LLVM done too

Yes, the LLVM backend has been able to translate all of PyPy since around the end of November. It is lacking some features of the C backend (notably stackless and reference counting). Most of these features are likely to be added via transformations of the flow graph and thus removed from the C backend. Along with the JIT, the next major features will be integrating the garbage collection framework, stackless exposure to the application layer, and abilities to write extension modules for CPython in pure (R)python which will be translated through pypy's tool chain.

Cheers - Richard

(R)Python

[...] abilities to write extension modules for CPython in pure (R)python which will be translated through pypy's tool chain.

Sounds terrific, Richard. Are there any expectations that RPython will soon be a well-defined, proper sublanguage of Python? Currently it seems vague and still approximate. Another related but less important question: is the code in the std objspace, with all the clunky C-ish wrap/unwrap functionality and the type-name magic, just a relic from an early project phase when no flow-graph derivation and no annotator were available, or is it there to stay?

Regards
Kay

RPython a language?

Hi Kay - Unfortunately that's not a very easy question, and answers will differ between PyPy developers. RPython is a subset of Python. The rules as to what the subset is are vague indeed, but in short, writing code that does not use dynamic language features after import time (i.e. the static-ness is based on code objects, not Python source code) should be enough. These rules are fairly well documented and unlikely to change significantly in the future. The problem (IMHO) with promoting it as a language is that the tool chain is extremely user-unfriendly, and it is necessary to have a good understanding of how the flow graphs are created and how the several annotation passes work. All you will get for a compile error is a stack trace in some obscure area of the annotation code. :-) If you take the time to learn how the above works, then you have a pretty useful language at your disposal.
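To make the "static after import time" rule a bit more concrete, here is a rough sketch in plain Python. It has not been run through the RPython toolchain, and the function names are hypothetical; it only contrasts code where every variable keeps one type with code where a variable's type depends on the path taken, which a whole-program annotator would likely reject.

```python
# Illustrative sketch only: ordinary Python, not checked by the RPython toolchain.

# Import time: arbitrary dynamic Python may run here, e.g. building a table.
SQUARES = [i * i for i in range(100)]


def sum_of_squares(n):
    """Static-friendly: 'total' and 'i' are ints on every path."""
    total = 0
    for i in range(n):
        total += SQUARES[i]
    return total


def mixed_types(flag):
    """Not static-friendly: 'result' is an int on one path and a str on the other,
    so a whole-program type inferencer cannot give it a single low-level type."""
    if flag:
        result = 1
    else:
        result = "one"
    return result
```

Both functions run fine under CPython; the difference only matters once a translator has to pick one machine-level type per variable.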

As to the object space question - well, two answers. First, this is how the interpreter is implemented, and it is also fundamental to how abstract interpretation works in the flow object space. Second, RPython code does not need this wrap/unwrap functionality, so possibly one could write a version of the interpreter without it (and the JIT would still work, incidentally). However, I think that this is the core abstraction of the project and everything else has evolved from there. Some of the more core developers could probably give a better answer. Cheers - Richard
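A minimal sketch of why the wrap/unwrap indirection pays off, assuming a much simplified interface: interpreter-level code only touches "wrapped" objects through a space, so the same code can run against a space that really computes or one that merely records the operations, roughly like building a flow graph. The class names and the small method set here are hypothetical simplifications, not PyPy's actual object space API.

```python
class ConcreteSpace:
    """Performs operations on real values (loosely like a standard object space)."""

    def wrap(self, value):
        return value

    def unwrap(self, w_obj):
        return w_obj

    def add(self, w_a, w_b):
        return w_a + w_b

    def mul(self, w_a, w_b):
        return w_a * w_b


class Const:
    """A known constant seen by the recording space."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return repr(self.value)


class Var:
    """An unknown value (a variable in the recorded graph)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class RecordingSpace:
    """Records operations instead of performing them (loosely like a flow object space)."""

    def __init__(self):
        self.operations = []

    def wrap(self, value):
        return Const(value)

    def _record(self, opname, *args):
        result = Var(f"v{len(self.operations)}")
        self.operations.append((opname, args, result))
        return result

    def add(self, w_a, w_b):
        return self._record("add", w_a, w_b)

    def mul(self, w_a, w_b):
        return self._record("mul", w_a, w_b)


def interpret_poly(space, w_x):
    """Interpreter-level code: computes x*x + 1 using only the space interface."""
    w_square = space.mul(w_x, w_x)
    return space.add(w_square, space.wrap(1))


def format_ops(ops):
    """Render recorded operations as 'result = op(args)' strings."""
    return [f"{res} = {name}({', '.join(map(repr, args))})" for name, args, res in ops]
```

Running interpret_poly with ConcreteSpace on wrap(3) gives 10, while running the very same function with RecordingSpace on Var("x") yields the operation list ["v0 = mul(x, x)", "v1 = add(v0, 1)"].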

Linking

I wonder. Could LLVM provide .Net-style linking between different languages ?

Short Answer

No. .NET uses a high level type system in order to provide its seamless integration.

Is JIT in the future?

Yes, I'm sorry, I must not have made myself clear. I understand that PyPy is self-hosting; I was just saying that I think the next logical step in the development of Python would be a native code compiler (whether JIT or not) for Python. That is, to be both self-hosting (which it is now) and natively compiled, hence my wondering whether such a thing was in the works. So a JIT is one of the goals, then? I would gather as much from other comments. I hope so (frankly, I would be surprised if it weren't).

Native compilation

"Native compilation" is not a very interesting or useful goal. The real underlying goal is efficiency, and naive native compilation of Python to C (which has been done probably a dozen times so far) doesn't buy you that.

For a language with the characteristics of Python, dynamic translation in the tradition of Self and Smalltalk is the way to go. The PyPy guys are indeed working on this and have brought their own sizable bag of tricks to the table. For example, Armin Rigo, the developer of Psyco, is one of the head PyPy honchos; I understand that the ObjSpace concept in PyPy was largely inspired by Armin's experiences in implementing Psyco, which "partially specializes" the interpreter at run-time by dynamically outputting and executing x86 code.
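A rough sketch of run-time specialization, for readers who haven't seen it. Psyco does this at the machine-code level, specializing on information observed at run time; the sketch below swaps in the textbook power example and generates Python source with exec instead of x86 code. The names (power_generic, specialize_power) are hypothetical and are not Psyco's API.

```python
from typing import Callable, Dict

# Cache of specialized functions, keyed on the value we specialized for.
_specialized: Dict[int, Callable] = {}


def power_generic(x, n):
    """Generic version: loops over n on every call."""
    result = 1
    for _ in range(n):
        result *= x
    return result


def specialize_power(n: int) -> Callable:
    """Return x**n with the loop over n unrolled, generating the code at run time.

    The generated function is cached, so later requests for the same n reuse it.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n not in _specialized:
        body = " * ".join(["x"] * n) if n > 0 else "1"
        source = f"def power_{n}(x):\n    return {body}\n"
        namespace: dict = {}
        exec(compile(source, f"<power_{n}>", "exec"), namespace)
        _specialized[n] = namespace[f"power_{n}"]
    return _specialized[n]
```

For instance, specialize_power(3) returns a function whose body is just x * x * x, so the loop and the test on n have disappeared from the code that actually runs; a real specializer does the same kind of thing on types and emits machine code.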

Really?

"Native compilation" is not a very interesting or useful goal. The real underlying goal is efficiency, and naive native compilation of Python to C (which has been done probably a dozen times so far) doesn't buy you that.

Oh, I guess I was under the impression that native compilation was compiling directly to machine code (as opposed to compiling to some other language such as C or for another machine).

Now that I think about it, I suppose my wonderings are somewhat irrelevant, because any difference (between using an intermediate language or not) is kind of arbitrary (since intermediate languages can be JIT compiled). So it would be better to compile to an intermediate language and then JIT compile that? Is that because you can do better analysis, and hence perform better optimizations?

So, do you think PyPy will become faster than CPython in the coming months/years because of this work on dynamic translations and optimizations? I would imagine so, as that is the whole point of dynamic translation, right? Well, it's a pretty cool project regardless.

Quux

Oh, I guess I was under the impression that native compilation was compiling directly to machine code (as opposed to compiling to some other language such as C or for another machine).

I don't think the difference is very important in this context. Whether you have your own low-level code generator, or you use C as an intermediate language and rely on the C compiler for low-level code generation, the ultimate artifact is executable machine code. In any case, PyPy now has LLVM back-end support, and can hence generate machine code directly.

So, do you think PyPy will become faster than CPython in the coming months/years because of this work on dynamic translations and optimizations? I would imagine so, as that is the whole point of dynamic translation, right? Well, it's a pretty cool project regardless.

Well, I hope so. I was initially very skeptical about the project when it was still in its infancy, but so far they've managed to put my skepticism to shame. They have some very competent and enthusiastic people involved on both the business and technical sides. So yeah, I've got my fingers crossed, and I see no immediate reason why their success won't continue.

Is self-hosting an accomplishment?

That depends on the size of the language. Ron Garret (aka Erann Gat) once posed the interesting question:

What is the shortest meta-circular interpreter that is actually capable of evaluating itself?

It turns out, the answer is surprisingly small.

My best shot was: A tiny self-evaluating interpreter

Another post contains Info on the interpreter.

Theoretically, no. In

Theoretically, no. In practice, yes. Stepping away from your hosting language and moving over (as much as possible) to the language you're writing (in this case Python) is a good thing, if only because your language is (most likely) more expressive and (in your mind at least) better than the original hosting language (in this case C).

I guess my comment wasn't very interesting from a theoretical standpoint. ;-)

important disclaimer

"Because this invocation of PyPy still runs on top of CPython, it runs around 2000 times slower than the original CPython"

That's a jump 19 years back in Moore's law terms.

When you create a language processor using that same language, you get a slow-down factor of X. If X is 1 (as is the case with compiled languages), you've successfully bootstrapped yourself. However, if the slowdown is greater than 1, the whole exercise makes no sense. There is no fixed point in this loop of self-compilation (where each additional cycle of self-compilation produces an identical binary) -- this one diverges wildly, with a factor of 2000 on each iteration.
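The arithmetic behind both comments can be sketched in a few lines. The doubling period is an assumption: the "19 years" figure corresponds to roughly 1.75 years per doubling of machine speed, and the function names are made up for illustration.

```python
import math


def stacked_slowdown(factor: float, levels: int) -> float:
    """Total slowdown when each level of the interpreter runs on top of the previous one."""
    return factor ** levels


def moore_years(slowdown: float, doubling_years: float) -> float:
    """Years of hardware progress needed to offset a slowdown,
    assuming machine speed doubles every `doubling_years` years."""
    return math.log2(slowdown) * doubling_years
```

Two levels of a 2000x interpreter already cost a factor of 4,000,000, while a factor of 1 stays at 1 no matter how many levels you stack, which is the fixed point described above. With an 18-month doubling period the single 2000x slowdown works out to about 16.4 years rather than 19.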

That's outdated

That's from when the project first started. It no longer runs on top of CPython -- it's fully capable of running itself. If you look at the last release announcement:

http://codespeak.net/pypy/dist/pypy/doc/release-0.8.0.html

It's now only 10-20x slower than CPython. And if you read the coding sprint logs, you'll find they've been able to do things like change 5 lines of code and get a 50% speedup. The project is still at the immature point where very small changes yield big optimizations. And the JIT work hasn't even been done yet.

Everything seems to indicate that PyPy is going to pass up CPython for speed in the near future.

It is all much more exciting!

If I'm correct, PyPy using the LLVM back end runs 5x slower than CPython without the JIT. The goal would be to be faster not only than CPython but also than plain C. Armin Rigo has a nice old post to python-dev concerning Psyco's JIT speedup over here.