Emscripten now (sort of) self-hosting

Summarizing from the Emscripten blog, there is a new eliminator pass, a new parallel optimizer, and a new relooper.

The self-hosting part:

Note that this update makes Emscripten a 'self-hosting compiler' in a sense: one of the major optimization passes must be compiled to JS from C++, using Emscripten itself. Since this is an optimization pass, there is no chicken-and-egg problem: We bootstrap the relooper by first compiling it without optimizations, which works because we don't need to reloop there. We then use that unoptimized build of the relooper (which reloops properly, but slowly since it itself is unoptimized) in Emscripten to compile the relooper once more, generating the final fully-optimized version of the relooper, or "relooped relooper" if you will.

Emscripten seems to be quite an undertaking. I'm never sure whether I should be more impressed or more appalled. I started leaning more toward "impressed" when I found out that Google's Gmail app (the one you run in your browser) is written in Java, with the bytecode then compiled to JavaScript and the resulting JavaScript identifiers rewritten for compression and interpretation efficiency.

Gmail is written in

Gmail is written in JavaScript that is compressed and optimized with Google's Closure Compiler. Emscripten is amazing, though. I wonder whether high-level languages targeting JS would run faster if they compiled down to the low-level JS style that Emscripten uses, with an explicit byte array for the heap, instead of compiling to high-level JS.
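
For readers unfamiliar with that low-level style, here is a minimal sketch of the idea: one ArrayBuffer stands in for linear memory, typed-array views alias it, and "pointers" are plain byte offsets. The bump allocator `malloc`, the `Point` layout, and the helper names are assumptions made up for illustration, not code generated by Emscripten.

```typescript
// One ArrayBuffer plays the role of linear memory; the typed-array views alias it.
const buffer = new ArrayBuffer(64 * 1024);
const HEAP32 = new Int32Array(buffer);
const HEAPF64 = new Float64Array(buffer);

// Address 0 is left unused so it can serve as a null pointer.
let heapTop = 8;

// Hypothetical bump allocator: returns a byte address, keeps 8-byte alignment.
function malloc(bytes: number): number {
  const ptr = heapTop;
  heapTop = (heapTop + bytes + 7) & ~7;
  if (heapTop > buffer.byteLength) throw new Error("out of memory");
  return ptr;
}

// Illustrative layout: struct Point { int32 x; int32 y; double weight; } (16 bytes)
function newPoint(x: number, y: number, weight: number): number {
  const p = malloc(16);
  HEAP32[p >> 2] = x;             // field at byte offset 0
  HEAP32[(p + 4) >> 2] = y;       // field at byte offset 4
  HEAPF64[(p + 8) >> 3] = weight; // field at byte offset 8
  return p;
}

// Reads go through the same views; no JS object is ever created for a Point.
function pointSum(p: number): number {
  return HEAP32[p >> 2] + HEAP32[(p + 4) >> 2];
}
```

A call such as `pointSum(newPoint(3, 4, 0.5))` touches only integers and typed-array slots, which is the kind of code a JS engine can compile without type guards.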

GC

In the case of high-level languages you'd then need to implement your own GC over that heap. In JavaScript. Enjoy. ;)

This may be worth it. You

This may be worth it. You can use better data representations if you manage your own heap (e.g. no tags), and you control locality.
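
As a rough illustration of the "no tags" and locality points, the sketch below stores cons cells as two adjacent 32-bit words in a single Int32Array. It assumes a statically typed source language, so the compiler already knows which word is a value and which is a link, and no runtime tag is stored. The names `cells`, `cons`, and `sumList` are hypothetical.

```typescript
// Each cell occupies two adjacent words: [value, next]. Cell index 0 means nil.
const CELL_CAPACITY = 512;
const cells = new Int32Array(2 * CELL_CAPACITY);
let nextFree = 1; // cell 0 is reserved for nil

// Allocate a cell; consecutive allocations are adjacent in memory.
function cons(value: number, next: number): number {
  if (nextFree >= CELL_CAPACITY) throw new Error("cell space exhausted");
  const c = nextFree++;
  cells[2 * c] = value;    // raw int32 payload, no type tag
  cells[2 * c + 1] = next; // link to the next cell, also untagged
  return c;
}

// Walk the list by following untagged links until nil.
function sumList(list: number): number {
  let total = 0;
  for (let c = list; c !== 0; c = cells[2 * c + 1]) {
    total += cells[2 * c];
  }
  return total;
}
```

Because the allocator hands out neighbouring cells, a list built in one go sits in one contiguous run of the array, which is the locality control a host-managed object graph does not guarantee.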

Tough

You'll have a very tough time doing better on that level than highly tuned contemporary JS VMs operating low-level. But I'd be thrilled to be proven wrong.

GC Perf.

I think in the general case you're right. But for a language that already controls resources (linear types, region types, real-time dataflow models, etc.) I wonder if we could get better performance by controlling GC.

GC

In fact we do have a basic GC written in JS for the Emscripten heap, supporting a Boehm-like API :) (src/library_gc.js)

I agree this does not make sense for all cases - if you can utilize the browser's GC, that is likely faster. But if you need things like finalizers, having your own GC is the only way to go.
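
To make the finalizer point concrete, here is a toy mark-and-sweep collector that runs a finalizer when a block is found unreachable. It is only a sketch of the general technique: it is not the code or the API of src/library_gc.js, it tracks blocks in a JS Map rather than in the Emscripten heap, and every name in it (`gcAlloc`, `gcCollect`, `roots`, `blocks`) is invented for the example.

```typescript
// Hypothetical block record; a real collector would keep this in heap headers.
interface Block {
  refs: number[];                       // ids of blocks this block points to
  marked: boolean;
  finalizer: (() => void) | undefined;  // run once, when the block is swept
}

const blocks = new Map<number, Block>();
const roots = new Set<number>();
let nextId = 1;

// Allocate a block, optionally registering a finalizer for it.
function gcAlloc(finalizer?: () => void): number {
  const id = nextId++;
  blocks.set(id, { refs: [], marked: false, finalizer: finalizer });
  return id;
}

// Mark phase: flag everything reachable from one root, using an explicit stack.
function mark(root: number): void {
  const stack: number[] = [root];
  while (stack.length > 0) {
    const b = blocks.get(stack.pop() as number);
    if (b === undefined || b.marked) continue;
    b.marked = true;
    for (const ref of b.refs) stack.push(ref);
  }
}

// Sweep phase: finalize and drop unmarked blocks, reset marks on survivors.
function gcCollect(): number {
  roots.forEach((r) => mark(r));
  let freed = 0;
  blocks.forEach((b, id) => {
    if (b.marked) {
      b.marked = false;
      return;
    }
    if (b.finalizer) b.finalizer();
    blocks.delete(id);
    freed++;
  });
  return freed;
}
```

The collector decides exactly when a block dies, so it can call the finalizer at that moment, which is what the browser's GC did not expose to scripts.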

Also, for overall performance, there is a tradeoff. Compiling to high-level JS that uses the browser GC will give you fast GC, but slower code than something compiled to low-level, optimizable JS with its own GC. So the question is how much GC performance matters in a given codebase.