Press enter to see results or esc to cancel.

Conservative GC: Is It Really That Bad?

There are numerous JVM implementations out there, based on different approaches and concepts. However, some aspects of the JVM architecture are often considered the Universal Truths that everyone should accept. Among them is the statement that a production-grade tracing garbage collector (GC) must compute the initial set of reachable objects precisely, whereas the alternative approach known as conservative GC is deemed not good enough for production. Though we agree with this statement in general, we also have something to say about it.

By definition, a conservative collector is unable to determine, for certain objects, whether they are dead, and thus has to allow them survive a GC cycle. The fact is that we had an experience of using a conservative GC for many years before upgrading it to a precise GC, so we are in a position to judge. We can make a head-to-head comparison of a highly tuned conservative GC with a precise one in the same environment of the same JVM. Now we want to share this knowledge with you.

Precise What?

Let’s start with a brief description of the problem. In a nutshell, the classical mark-and-sweep GC algorithm works like this:

  1. Identify the heap objects that the program can access immediately, e.g. via references stored in local variables, runtime structures, and so on. We call them the root objects or just roots.
  2. Mark all roots, and objects reachable from roots1, as alive.
  3. Now, the unmarked objects are dead and can be recycled (“swept”).

In this post, however, we will only focus on the first step: how to find those root objects?

There are several sources of roots. One is non-primitive static fields of currently loaded classes. They can be easily found at run time, since the JVM has full meta information of those classes at its disposal. Then, some internal JVM mechanisms gather and maintain collections of references to Java objects. For example, the registry of local references to objects from within JNI native methods is also a source of roots. But again, these collections can be scanned at run time without any problem.

The most interesting source of roots is the set of live references local to the Java methods that the JVM is currently executing: parameters, local variables, references returned by other methods, etc. (“Live” here means that these references might be used in the future.)  And to get the full picture, we need to look them up across the current call chains of all Java threads. We call the roots found this way local roots. It is not a big deal to find them when interpreting bytecode, because Java bytecode operates on typed values. However, in the optimized native code produced by a JIT or AOT compiler, values of primitive and reference types are stored in CPU registers and on the machine stack as raw bytes. So we need to somehow identify the stack slots and registers that contain references to Java objects at the moment of GC invocation.

At this point, we start to talk about the precision of GC. Usually, for the GC to know where the references to Java objects reside, the compiler generates special data structures called GC maps2. They contain the information about all register and stack locations of Java references at certain points in the code of each natively compiled method. Such approach is called Precise Garbage Collection, because the GC knows precisely where to look for references to live objects.

However, GC maps would constitute a huge amount of data if they were generated for each instruction in the compiled code. To make it practical, a precise GC would only start working after each thread has parked at a safepoint in Java code, so the number of GC maps is limited to the number of safepoints.

Conservative How?

But wait, we are inside the JVM, right? We know how objects are constructed and where they can be allocated. Perhaps it is possible to figure out which values on the stack are actually references to Java objects without any hints from the compiler? Well, let’s try it!

The idea is pretty simple: we know the values of CPU registers and the stack range for each stopped thread. Let’s iterate over them and filter out the values that are definitely not references:

  1. Values that are not aligned properly or point outside the Java heap (we can check that) are definitely not references.
  2. Otherwise, we consider the value as an address in the heap range, dereference it and check whether a valid Java object header is located there.

Values that have passed through these checks are considered local roots (references to root objects). This is a conservative approach: there can be some values on the stack that have passed all checks by a coincidence. Therefore some “false”-roots can be added to the root set and some dead objects can survive this GC cycle as a consequence.

So in this approach some objects do not get immediately collected when they are dead. This can be rather unexpected, but still doesn’t contradict the JVM specification. After all, any generational GC may also retain dead objects in the old generation for many GC cycles.

That was the trade-off, now let’s see what benefits you get:

  • You avoid the costs associated with the generation and storage of GC-maps altogether. In a handful of real-world applications compiled with Excelsior JET, we’ve observed those maps (in optimized compact representation) taking up to 11% of the .rdata section, or about 4% of the total executable size.
  • You don’t need to consult GC-maps to find roots, so you can suspend a thread for GC almost anywhere, not only at a safepoint. It means that you need not wait for every thread to reach the nearest safepoint to stop the world anymore.
  • We can go even further. Safepoints can be used in many parts and mechanisms of the JVM. Gathering the stack trace of a thread, code deoptimizations, class redefinition, and of course GC — all these operations usually require the thread to be suspended at a safepoint. The problem is that safepoints are rather expensive, despite the JVM’s best efforts to optimize them. The effect is most noticeable in highly optimized code where the compiler fought for each and every instruction trying to achieve the best performance, and then had to slap a safepoint at the end of a tight loop.
  • However, that is just an implementation detail, nothing forces you to add safepoints to your JVM. And with a conservative GC, the main use case for safepoints disappears as you can stop threads anywhere. So… let’s throw them safepoints away! As a reward, you can get a significant performance boost for an average application. In our case, removing safepoints can yield up to 15% speedup, depending on the application.

A conservative GC looks rather cool at this moment, doesn’t it?

Is This Real Life?

However, one can say that all these benefits mean nothing as a conservative GC just can’t be used in production. The arguments are usually as follows:

Conservative root-set building is ok, but you can’t implement the rest of GC effectively. For example, you cannot implement a full copying collector, because after copying an object you have to update all references to it, including those from the stack. With a conservative GC, you can never be sure whether a value on the stack is a reference or not, so you cannot update it and therefore you cannot move the root objects. As a result, heap fragmentation will be devastating and cause an OOM very soon.

Yes, a conservative GC cannot at the same time be a full copying collector, but practice shows that does not preclude using such a GC in production. We had been using a conservative GC for many years and never observed OOM provoked by heap fragmentation. It means that usually our heap compaction algorithms worked well even in the presence of some unmovable root objects. So, OOM due to heap fragmentation in a conservative GC is not such a huge problem as it is commonly deemed.

Checking every stack slot and register is very expensive, so GC pauses will be too long with a conservative GC.

A naïve implementation of a routine that checks whether the given sequence of bits matches the address of a Java object could indeed be rather expensive (O(n), where n is the number of all allocated objects, if you will merely iterate over them). However, the check can be performed much more efficiently: in our implementation it takes O(1) time for each query. At the same time, building the root-set with the help of GC-maps isn’t free either. You need to find the proper map, decode it from its compact representation, update the root-set, and repeat the process for each method found on the call stacks of all Java threads.

There will be a lot of false roots, and because of them applications will consume times more memory and therefore it will be just impossible to use it in production.

More memory consumption? For sure. Times more memory consumption? Not quite. First of all, totally random values passing all of the above validity checks rarely appear on the stack. Moreover, these values will probably be a problem only at some GC cycles, but then the stack will be cleared and those false roots will disappear. So they cause no problems in most cases.

However, there are some cases when memory consumption can get out of control because of false roots. Unfortunately, among them are not only artificial counterexamples, but also real applications. And that is the true reason why we abandoned our beloved conservative GC and replaced it with a precise one.

When A Conservative GC Is Breaking Bad[ly]

Let’s consider the following small sample based on a real application provided by one of our clients. It utilizes the java.util.Timer platform API. Here we have a concrete implementation of the java.util.TimerTask abstract class, which only job is allocate 4MB of memory and store a reference to the corresponding Timer object. Then, in the main method we create 50 Timer instances and schedule a task for every instance.

import java.util.Timer;
import java.util.TimerTask;

public class PreciseGCTest {
    private static class Task extends TimerTask {
        Timer timer;
        int[] array;

        public void run() {
            array = new int[1000000]; // 4MB

        public Task(Timer t) {
            this.timer = t;

    public static void main(String args[]) throws Exception {
        for (int i = 0; i < 50; i++) {     // 200MB max
            Timer t = new Timer();
            t.schedule(new Task(t), 500);
        } // 50mb of heap limit should be enough for this test

Potentially this test can consume 200MB only for the Timer Task threads. However, we expect it to pass with much smaller heap: 50MB should be enough. Let’s dive into the implementation of the Timer class to understand why.

For each Timer instance, a separate thread (TimerThread) with a queue of tasks to handle is created. That thread waits until some task is ready and then dequeues and executes it:

private void mainLoop() {
    while (true) {
        try {
            TimerTask task;
            boolean taskFired;
            synchronized(queue) {
                // Wait for queue to become non-empty
                while (queue.isEmpty() && newTasksMayBeScheduled)

                if (queue.isEmpty())

               task = queue.getMin();
            if (taskFired)  // Task fired; run it, holding no locks

        } catch(InterruptedException e) { }

The thread keeps working until there are no more tasks in the queue and the newTasksMayBeScheduled flag is set to false. The latter happens when the given Timer instance becomes unreachable because there are no more references to it, except maybe from other unreachable objects. To achieve that, there is a special field in the Timer class:

private final Object threadReaper = new Object() {
    protected void finalize() throws Throwable {
        synchronized(queue) {
            thread.newTasksMayBeScheduled = false;
            queue.notify(); // In case queue is empty.

Once the Timer object becomes unreachable, so does its threadReaper object. When the GC comes for the latter, its finalize() method sets the newTasksMayBeScheduled field of the respective timer thread to false. The thread will then quit as soon as all still-enqueued tasks will be executed, if any.

So, in our sample we can expect that the fired tasks will get collected, as well as their timer objects and timer threads. We have a rather long pause between scheduling new tasks with new timers, so the GC should be able to collect the completed tasks when there is a lack of memory and the sample certainly should survive with limited heap:

> javac *.java
> java -Xmx50m PreciseGCTest

But things can go terribly wrong with a conservative GC on board.

Remember that our implementation of TimerTask contains a reference to the respective Timer object. This means that  Timer object can be collected only when all its tasks are already dead. Ok, the last use of a TimerTask is the; line in that mainLoop method, after that it should be dead.

Consider the following situation: we had only one task in the queue and it has just fired. The timer thread goes into the next iteration of its main loop and hangs there in queue.wait() (source). Actually it just waits until the threadReaper comes for it and that will happen only when the GC collects the corresponding Timer object.

The problem here is that a conservative GC is just not going to do that. There is a good chance that a reference to the task handled on the previous iteration is still somewhere on the stack or in the registers of this timer thread. So the task object will be (conservatively) marked as a root and survive a GC cycle along with its timer object. The story will repeat during the next GC cycle, and so on. As a result, the threadReaper will never come and this timer thread will hold a 4MB array on the Java heap forever.

This situation leads to a massive (potentially unlimited) excess memory consumption and unexpected OutOfMemory errors in production. Let’s compile the above example with Excelsior JET and run it with the conservative GC enabled :

> jc =a PreciseGCTest.class
> export JETVMPROP=" -Xmx50m"
> ./PreciseGCTest

Fatal error: out of memory
Uncaught exception: Out of memory

This example illustrates the main defect of any conservative collector: the absence of knowledge about local variable liveness. The variable task still contains the value that was set during the previous iteration of the main loop. Any decent compiler can figure out that that value will never be used again, but without a compiler hint the JVM has to fall on the safe side and add that Task instance to the root-set.

We tried all kinds of workarounds, such as clearing the stack after returning from a method, but all of them either had an unacceptable impact on application performance or failed to cover all problematic cases. So at some point we gave up and finally implemented a precise GC. It was a many man-years task with lots of unexpected challenges and non-trivial solutions. But today it is ready and enabled in Excelsior JET by default:

> export JETVMPROP=-Xmx50m
> ./PreciseGCTest



Conservative Garbage Collector can be alluring: it is (relatively) easy to implement, works fine in most cases despite all the stereotypes and even allows you to achieve better performance. However, sooner or later the absence of knowledge about the liveness of local variables will lead to problems in production that just cannot be handled. So a conservative GC can be a reasonable temporary solution during the development or prototyping of a new managed runtime or compiler, but in the end you will have to implement a precise GC.

That’s all for today! Hope this will be helpful.

Feel free to join the discussion about this post on Hacker News or/and Reddit both on r/java and r/programming subreddits, or just leave your comments here below.  Thanks for reading and stay tuned.


  1. OpenJDK 9 source code:
  2. PreciseGCTest example and running scripts:


  1. By “reachable from roots” we mean that references to these objects are contained in fields of roots or other reachable objects. The JVM knows the type and offset of each field in each Java object, so it can scan all non-primitive fields and find all reachable objects.
  2. “GC maps” is the term we have been using internally at Excelsior. In some articles such structures are called “stack maps”, in the source code of the HotSpot JVM the term “Oop Maps” is used, and so on.