Deferred means you eventually have to do the work. The same goes for reference counting: when you release the last reference, the actual freeing still has to happen. If you've filled up RAM and are allocating objects quickly, you can either crash or do some garbage collection, and if the tree is large that will take more than 1-2 ms.
Yes, you can always overrun an incremental garbage collector, forcing it to start collecting more often to hit the deadline. Each collection may take only a couple of milliseconds, but there's no garbage collector that can say "no more than 1-2 ms per frame" in the case of a game, for example. If you bound max pause times, you may have to pause more often. There's no free lunch.
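To put rough numbers on that trade-off (purely illustrative figures, not measurements): if the mutator produces 100 MB of garbage per second and the collector can process about 1 MB per millisecond, then a 2 ms pause budget reclaims at most ~2 MB per pause, so you need on the order of 50 pauses per second just to keep up. Bounding the pause length trades it for pause frequency.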
> Deferred means you eventually have to do the work.
Not quite. What you save with deferred reference counting is what is generally the biggest cost of reference counting: assigning references to local variables and having local variables go out of scope. Deferred reference counting only requires an RC update if you store a reference in a heap object or in a global variable.
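A minimal sketch of that distinction (not Nimrod's actual implementation; the names and layout are invented for illustration): only stores into heap objects or globals go through a barrier, copying a pointer into a local costs nothing, and the stack is scanned for roots at collection time.

```cpp
#include <cstddef>

struct Obj {
    std::size_t rc = 0;   // counts references from the heap and globals only
    Obj* field = nullptr;
};

// Write barrier for stores into heap objects or globals.
void store_ref(Obj*& slot, Obj* value) {
    if (value) ++value->rc;
    if (slot) --slot->rc;   // rc == 0 does not mean dead: a stack root may remain
    slot = value;
}

void mutate(Obj* heap_obj, Obj* x) {
    Obj* local = x;                    // plain copy into a local: no RC traffic
    store_ref(heap_obj->field, local); // only the heap store pays the barrier
    // At collection time the runtime scans the stack for roots and only then
    // frees objects whose count is zero and that no stack root reaches.
}
```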
You will, of course, still incur the overhead of allocation and deallocation, which one can make cheaper (bump allocation in generational and copying GC, memory regions) but can't eliminate entirely, especially in non-copying/non-compacting schemes.
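For what "cheaper" means here, a toy bump allocator of the kind a copying or generational nursery uses (illustrative only): allocation is a pointer increment plus a limit check.

```cpp
#include <cstddef>
#include <cstdint>

struct Nursery {
    std::uint8_t* cur;    // next free byte
    std::uint8_t* limit;  // end of the nursery

    void* alloc(std::size_t size) {
        size = (size + 7) & ~static_cast<std::size_t>(7); // 8-byte alignment
        if (cur + size > limit) return nullptr;           // full: time to collect
        void* p = cur;
        cur += size;
        return p;
    }
};
```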
But you lose the promptness of deallocation. Furthermore, I think the biggest overhead of reference counting is not the count manipulation (especially if you aren't making it thread-safe); it's the inability to handle cycles, which DRC does not address.
You are correct that you won't get the immediacy of reference counting; that's a problem of all memory-management schemes that aren't naive reference counting. But no, the biggest constant-time overhead of naive reference counting comes from the RC updates for literally any pointer manipulation. It's not just the increments, decrements, and possible branch mispredictions on every single pointer assignment, but also what they do to cache locality. Deferred reference counting essentially only needs a write barrier, not very different from generational or incremental forms of garbage collection. This is why tuned versions of deferred reference counting come pretty close to other tuned GC schemes, while naive reference counting falls short even in the absence of cycles.
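To make that per-assignment cost concrete, here is a stripped-down, non-thread-safe smart pointer of the naive kind (an illustrative sketch assuming the pointee exposes an `rc` field; not any particular library's type): every copy, assignment, and destruction touches the referent's header and branches on the count.

```cpp
template <typename T>   // T is assumed to have a 'long rc' member
struct NaiveRc {
    T* p = nullptr;

    explicit NaiveRc(T* raw = nullptr) : p(raw) { if (p) ++p->rc; }
    NaiveRc(const NaiveRc& other) : p(other.p)  { if (p) ++p->rc; }

    NaiveRc& operator=(const NaiveRc& other) {
        if (other.p) ++other.p->rc;        // increment on every assignment
        if (p && --p->rc == 0) delete p;   // decrement + branch on every assignment
        p = other.p;
        return *this;
    }

    ~NaiveRc() { if (p && --p->rc == 0) delete p; }
};
```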
Cycle detection primarily affects pause time, not overhead.
Pause time is a form of overhead. It reduces both latency and throughput.
Furthermore, the RC updates needed for DRC have to be thread-safe in a concurrent scenario. I understand that Nimrod has chosen not to have a thread-safe GC, but I don't think that'll scale for systems programming: too many algorithms want shared memory, and manual memory management is much more difficult in a concurrent setting. Once you go thread-safe, RC updates become extremely expensive, while concurrent GC is well-studied and performant.
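Concretely, "thread-safe RC updates" means turning every count change into an atomic read-modify-write on a potentially contended cache line, roughly like this sketch (the memory orderings follow the usual shared-ownership pattern; the names are illustrative):

```cpp
#include <atomic>

struct SharedHeader {
    std::atomic<long> rc{1};
};

void retain(SharedHeader* h) {
    // Increments only need atomicity, not ordering against other memory.
    h->rc.fetch_add(1, std::memory_order_relaxed);
}

bool release(SharedHeader* h) {
    // The thread that drops the count to zero must observe all prior writes
    // to the object before freeing it, hence the stronger ordering.
    return h->rc.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```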
> Once you go thread-safe, RC updates become extremely expensive, while concurrent GC is well-studied and performant.
This is simply false. In fact, concurrent versions of deferred reference counting have been known and studied for years [1]. The simple solution is to store the RC updates in a buffer and execute them only during a collection phase.
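A minimal sketch of that buffering idea (not taken from any particular implementation): the write barrier only appends to a per-thread log, and the counts themselves are touched only when the collector drains the log.

```cpp
#include <vector>

struct Obj { long rc = 0; };

struct RcLog {
    std::vector<Obj*> incs;
    std::vector<Obj*> decs;

    // Called from the mutator's write barrier: no atomics on the object itself.
    void log_inc(Obj* o) { incs.push_back(o); }
    void log_dec(Obj* o) { decs.push_back(o); }
};

// Run by the collector once the buffer has been handed off (mutators are at a
// safe point or have swapped in a fresh log), so applying the updates needs no
// per-object synchronization.
void apply(RcLog& log) {
    for (Obj* o : log.incs) ++o->rc;
    for (Obj* o : log.decs) {
        if (--o->rc == 0) { /* free, or queue for further processing */ }
    }
    log.incs.clear();
    log.decs.clear();
}
```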
> Pause time is a form of overhead. It reduces both latency and throughput.
Yes, but for many applications (e.g., HPC) only amortized cost is relevant.
Note also that you can do cycle detection concurrently with the mutator's operation (again, see [1]). Concurrent cycle detection makes the implementation more complicated, but no more so than a concurrent tracing scheme. Moreover, if your program is indeed performance-critical, RC schemes allow you to minimize the impact on cycle detection by designing your code so as to avoid cycles (either by using acyclic data structures exclusively or by using weak pointers). Conversely, tracing schemes make it difficult to programmatically influence worst case pause time.
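As an illustration of the "avoid cycles by design" option, in standard C++ terms rather than anything Nimrod-specific: if the back edge of a parent/child structure is a weak pointer, plain reference counting reclaims everything with no cycle detector involved.

```cpp
#include <memory>

struct Node {
    std::shared_ptr<Node> child;   // owning edge
    std::weak_ptr<Node>   parent;  // non-owning back edge: no strong cycle
};

int main() {
    auto root = std::make_shared<Node>();
    root->child = std::make_shared<Node>();
    root->child->parent = root;    // would leak if this were a shared_ptr
    // When 'root' goes out of scope, both nodes are destroyed promptly.
}
```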
> Note also that you can do cycle detection concurrently with the mutator's operation (again, see [1]). Concurrent cycle detection makes the implementation more complicated, but no more so than a concurrent tracing scheme.
I disagree—it's much more complicated. Concurrent GC is not trivial, but it doesn't have seven different colors and two different tests (sigma and delta test).
A good concurrent reference counting system is just as complex as a good tracing garbage collector, because it requires the same runtime machinery—precise stack maps, write barriers, etc., to perform backup tracing. But it also has the reference counts to deal with. It ends up being more complex overall.
> Moreover, if your program is indeed performance-critical, RC schemes allow you to minimize the impact on cycle detection by designing your code so as to avoid cycles (either by using acyclic data structures exclusively or by using weak pointers).
No, that's my point: even if you use acyclic data structures, there is no guarantee the cycle collector won't run, because you had multiple references to an object and dropped one, making the cycle collector suspect it was part of a cycle. This is the entire reason ForgetSkippable was added to Firefox: this happened a lot in practice. The solution was to add a bunch of ad-hoc code to make the cycle collector drop objects from the purple buffer, much like the "acyclic" pragma does in Nimrod, but now you have the possibility of leaks unless this is done very carefully.
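A rough sketch of the mechanism being described (the names and the `acyclic` flag are illustrative, not Firefox's or Nimrod's actual code): any decrement that doesn't reach zero makes the object a *suspected* cycle root, and only an explicit hint keeps it out of the cycle collector's candidate buffer.

```cpp
#include <vector>

struct Obj {
    long rc = 0;
    bool acyclic = false;   // set by an annotation, e.g. an {.acyclic.}-style hint
};

std::vector<Obj*> purple_buffer;   // candidates the cycle collector will scan

void dec_ref(Obj* o) {
    if (--o->rc == 0) {
        // no references left: free immediately, no cycle collector needed
    } else if (!o->acyclic) {
        // Still referenced, so it *might* be an interior node of a dead cycle;
        // it gets queued for the next cycle-collection pass even if it never is.
        purple_buffer.push_back(o);
    }
}
```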
> Conversely, tracing schemes make it difficult to programmatically influence worst case pause time.
In collectors like Metronome, it's extremely easy: set the pause time to what you want it to be. That's the only solution that's really scalable in my view: working around the collector by manually telling the CC not to scan certain types feels too error-prone.
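In sketch form, the knob a Metronome-style collector exposes is just a time budget on each increment of work (the `mark_some_objects` stub stands in for whatever bounded unit of tracing the collector performs; it is not a real API):

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Placeholder for one bounded unit of collector work (e.g. marking a handful
// of objects); returns false once the current collection cycle is finished.
static bool mark_some_objects() { return false; }

// Work only until the configured pause budget is spent, then hand control
// back to the mutator. Shorter budgets simply mean more frequent pauses.
void incremental_step(std::chrono::microseconds max_pause) {
    const auto deadline = Clock::now() + max_pause;
    while (Clock::now() < deadline && mark_some_objects()) {
        // keep marking while there is both work and budget left
    }
}
```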
I think the slides are saying you have the ability to pass that option to the runtime. You can specify an upper bound for GC time; see here: http://nimrod-lang.org/gc.html