Azul's Pauseless Garbage Collector

Here's Gil Tene on Azul's Pauseless Garbage Collector for the JVM.

One of the key techniques that we use is massive and rapid manipulation of virtual memory mappings. We will change mappings of virtual to physical memory at the rate of Java allocation.

And

The same read barrier I mentioned before will also intercept any attempt to read a reference to an object that has been relocated, and that allows us to lazily relocate references without needing a pause. We compact by moving an entire virtual page's worth of objects; we kind of blow it up, moving all the live objects to other places and thereby compacting them. But we don't try to locate all the pointers to that page immediately.

The challenge seems to be that standard OSes don't currently have enough hooks for them to do this kind of thing so their runtime must live in either their custom hardware and OS or a virtual machine.

The virtual machine trick

But if the host operating system doesn't allow fast, fine-grained virtual memory control, how does a virtual machine solve that? Unless I misunderstand how today's virtualization software works, it would have to go through the host operating system for 'kernel mode' operations like this, wouldn't it?

We have hardware (real)

We have hardware (real) virtualisation on x86 nowadays, so no, it doesn't go through the host for every ring-0 operation.

You are correct and this is

You are correct, and this is why Azul have released a bunch of patches to the Linux kernel called the Managed Runtime Initiative. The basic idea is to take control of resources like virtual memory from the kernel and put them into users' hands. The kernel developers don't really get it, from what I've read.

Cliff gave a good talk at VMIL 2010

that summarized, in one session titled "A JVM Does What????", a lot of the things he has been talking about over the years.

Snippet from abstract:

Here's my take on what services could & should be provided by a JVM, what services probably belong to the next layer up (STMs, new concurrency models), and what services belong to the next layer down (fast time management, sane thread scheduling).

This talk was taped and should be available on YouTube or wherever, but I can't find it.

This guy has some version of

This guy has some version of them, broken into parts?

That's it.

Bockisch is the workshop organizer. It is broken into parts due to YouTube's filters that try to block copyrighted material (edit: to clarify, the material is not illegal; YouTube simply doesn't like one-hour-long videos).

It's great to hear this

It's great to hear this happening. CPU-supported memory barriers should be very helpful to all manner of GCs, and it would be a real feather in Linux's cap.

I can understand why kernel developers, or anyone else, wouldn't get it. If you go to the project web site there is practically no documentation on what they are intending and why. They just tell you that it speeds up Azul software, that there are non-Azul members in the initiative, and that the initiative is headed by Azul.

Pager

It seems like, eventually, operating systems will need to have more user-space functionality around the management of memory. External pagers are important for both databases and garbage-collected languages in order for them to perform well.

http://frogchunk.com/documentation/macosx-programming/discardable-pager.pdf

compared to other real-time collectors?

I would have appreciated a comparison to other real-time garbage collectors. What they call pauseless, the literature calls real-time, and they've been studied for a long time now.

Historically, the reason real-time collectors haven't gotten much use is that they are on the order of four times slower than heuristics-based collectors for realistic loads. So why is Azul interested? Possible explanations:

1. This kind of collector has gotten a lot better, and the overhead isn't so bad now.

2. They plan to burn CPUs and run the collectors on side CPUs, thus the overhead turns into extra silicon rather than extra time.

3. They intend it for apps that are so parallel that they can take a 4x sequential slowdown and still win due to the massive parallelism.

I did a quick survey of real-time collectors about a decade ago, and at the time, you had to configure them with your expected rates of basic memory operations. Things like read, write, create object. If you didn't tune it right, then the collector would have to pause after all. However, to tune it right, you had to statically analyze your program to find out what its memory access rates are. It seemed to me that if you can really do that kind of analysis, then you know enough about your program to use a manual allocator. I wonder how Azul addresses this issue, or if they have additionally gotten rid of the need to tune it.

Perhaps other readers of LtU know more about this collector?

So why is Azul interested?

So why is Azul interested?

Apologies for sounding crass, but the frank answer is this: Because banks will write big fat paychecks so that they don't have to do what the London Stock Exchange did, and throw out the CLR VM for some proprietary C/C++ pile of crap that apparently is way over budget.

Azul is in the business of making the JVM a feasible business option for extreme markets like banking. In the talk I link, Click covers one example of timestamp guarantees that financial customers have come to expect despite not really being spec'ed.

Yes

You got it with points 2 and 3: Azul do already create custom hardware, with support for GC. They offer over 100 cores per machine (IIRC). Dunno about 1.

1 is old news

The only reason this article about JavaOne appeared on Artima is because that site has had a dearth of news to report lately (This is a hot button comment, and I will not pursue it any further).

Azul published a whitepaper about their pauseless collector awhile ago. It had a matrix that compared all the various approaches, intended to demonstrate exactly what their new hardware solution does.

On-the-fly GC could meet

On-the-fly GC could meet many needs of real-time GC. The pause time is proportional only to the number of roots (to take a snapshot), and collection happens fully concurrently with the mutator(s). Throughput is roughly competitive with other concurrent GCs.

If your VM runs programs in CPS form, this pause is effectively 0, without 4x the overhead you speak of. CPS overhead is certainly higher than direct form, but it's not 4x higher, and opportunities for parallelism are also higher. Might make a good design tradeoff.

Sigh, a bit lazy?

Why do you ask this question when Azul's marketing explains that they use new x86 instructions (planned initially for virtualisation) to create a hardware-assisted read memory barrier?

That said, I agree that it would be interesting to have real numbers on their performance compared to other real-time GCs.