An Exceptional Case
Being too clever can backfire not only on a person or animal, but also on a piece of software, sometimes many years after the respective commit. Here is one such case that I witnessed in the Excelsior JET compiler recently.
This story began when Charles Nutter, one of the authors of JRuby, asked us whether AOT compilation could improve JRuby startup. We set JRuby up for compilation and eventually found out that the answer is “Yes, AOT really helps”. Not everything worked out of the box, though. When compiled with default settings, the jruby.exe -X-C -S gem list command failed with a message like the following:
ERROR: While executing gem ... (NoMethodError) undefined method `' for nil:NilClass
After a few iterations, we discovered that it works as expected with stack trace support enabled, which means that JRuby relies either on the availability of precise call stack information or even on line numbers.[1]
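To make that kind of dependency concrete, here is a minimal sketch of code whose behavior changes with the amount of stack trace information the VM provides. It is not JRuby's code; the class and method names (CallerProbe, describeCaller, entryPoint) are made up for illustration. It relies only on two documented facts: getStackTrace may return fewer frames than expected, even a zero-length array, and StackTraceElement.getLineNumber returns a negative number when the line is unavailable.

```java
public class CallerProbe {
    // Hypothetical helper: describes the method that called describeCaller().
    static String describeCaller() {
        StackTraceElement[] frames = new Throwable().getStackTrace();
        // When frames are available, frames[0] is describeCaller itself
        // and frames[1] is its caller. A VM is allowed to omit frames.
        if (frames.length < 2) {
            return "<unknown>";
        }
        StackTraceElement caller = frames[1];
        int line = caller.getLineNumber(); // negative if line info is unavailable
        return line >= 0
                ? caller.getMethodName() + ":" + line
                : caller.getMethodName();
    }

    static String entryPoint() {
        return describeCaller();
    }

    public static void main(String[] args) {
        // The output depends on how much stack information the VM keeps.
        System.out.println(entryPoint());
    }
}
```

Code that skips the length check and indexes into the array unconditionally works on a VM that always records full traces, and breaks on one that does not.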
So far so good. With that knowledge we continued the investigation, only to realize that while the AMD64 version worked OK with stack trace support enabled, its x86 sibling failed with an assertion in a platform-independent part of the compiler!
That part packs exception tables into a compact representation to be used by the runtime. We have always had such a representation, but we completely reworked its format in recent versions[2], adding some pretty restrictive asserts. Moreover, there are a number of heavy asserts that are disabled by default, with a comment to turn them on in case of emergency.
Our case certainly looked like an emergency, so enabling those checks seemed to be a good idea. And it really was! With the heavy asserts enabled, we got another compiler crash, but this time with an exhaustive explanation of what had gone wrong:
    xSites weren’t merged but have the same offset:
    First:               Second:
    siteOffset: 295      siteOffset: 295
    kind: DIV            kind: NULLCHECK
    inline depth: 6      inline depth: 4
This effectively means that at offset 295 in some method there appeared a single CPU instruction that came from one method (through six inline substitutions), while its operand came from some other method (through four inline substitutions). But that’s not all! Each of them could throw its own Java exception! As the assertion message suggests, one of those possible exceptions might be raised by a division operation and another by a null-check. But how can that be?
Readers familiar with the x86 instruction set and the implicit null-check technique[3] might have already guessed what such an instruction could look like. Still, let’s dive into the generated assembly and look at the problematic instruction:
    mov     ecx, dword [edi+1CH]    ; 0123 _ 8B. 4F, 1C
    cdq                             ; 0126 _ 99
    idiv    dword [ecx+10H]         ; 0127 _ F7. 79, 10
Here, the null-check hides in the pointer dereference dword [ecx+10H], which would trigger a hardware exception if the value of ecx is zero. If the dereference succeeds, but the value at ecx+10H is zero, idiv will raise a different hardware exception signalling a division by zero.[4]
Now, the JVM must convert those low-level exceptions into a NullPointerException and an ArithmeticException, respectively. And that’s the problem: the dereference (representing an object’s field access) and the division originated from two different methods that the compiler managed to reduce to one instruction via inlining. This means that different stack traces should be displayed depending on which of the two exceptions this single instruction raised!
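At the source level, the situation might look roughly like the sketch below. This is a hypothetical reconstruction, not the author's example and not the JRuby code that triggered the assert; the names Box, readDivisor, divide, and compute are invented. The field read and the division live in two separate small methods, so once both are inlined into the caller, a compiler is free to emit a single idiv with a memory operand for them.

```java
public class DivAfterDeref {
    // Hypothetical holder type; reading its field carries the implicit null check.
    static final class Box {
        final int divisor;

        Box(int divisor) {
            this.divisor = divisor;
        }
    }

    // Throws NullPointerException when box is null.
    static int readDivisor(Box box) {
        return box.divisor;
    }

    // Throws ArithmeticException when divisor is zero.
    static int divide(int dividend, int divisor) {
        return dividend / divisor;
    }

    // With both callees inlined, this is effectively "dividend / box.divisor".
    static int compute(int dividend, Box box) {
        return divide(dividend, readDivisor(box));
    }

    // Reports which of the two exceptions, if any, the call produced.
    static String outcome(int dividend, Box box) {
        try {
            return "result " + compute(dividend, box);
        } catch (NullPointerException | ArithmeticException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(42, new Box(6))); // result 7
        System.out.println(outcome(42, null));       // NullPointerException
        System.out.println(outcome(42, new Box(0))); // ArithmeticException
    }
}
```

A precise stack trace for the NullPointerException would end in readDivisor, while one for the ArithmeticException would end in divide, even though both may come from the same machine instruction.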
The Excelsior JET runtime was never prepared for such strange things to happen — storing two inline contexts for one instruction and distinguishing between them wasn’t even thought of, let alone designed for. But that’s not a bad thing, because division+dereference is the only case when such a divergence of inline contexts can appear in a single instruction. Plus, it is not obvious that merging a memory access operation with division would do any good for code performance on modern hardware. And that’s the reason why this optimization is only present in the x86 compiler and not in the newer AMD64 one — we have not fully unified their codebases yet.
OK, now we completely understand the problem we’ve faced, so what could be the solution?
Well, the most correct solution would be to place both possible stack traces in the metadata somehow and teach the runtime to choose the right one when an exception occurs. But that solution is also the hardest to implement and, more importantly, definitely not worth the added complexity: the optimization most probably won’t be ported to the AMD64 compiler, but will instead be dropped when the time comes.
With that in mind, the next possible solution is to wean the compiler from this too-clever optimization. That would have required reworking the yet-to-be-merged part of the codebase, which was not aligned with our development plan.
Thus, we chose the least invasive approach: handle this exceptional case when collecting the metadata by selecting the longest common prefix of the two inline contexts. That still conforms to the specification, because, again, according to the getStackTrace documentation, some frames may be skipped by the VM.
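The idea of taking the longest common prefix can be sketched as follows. This is only an illustration under assumptions: an inline context is modelled here as a list of method names ordered from the outermost caller to the innermost inlined callee, and the names InlineContexts and longestCommonPrefix are hypothetical. The actual metadata format in Excelsior JET is not described in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InlineContexts {
    // Returns the frames shared by both contexts, starting from the outermost caller.
    static List<String> longestCommonPrefix(List<String> first, List<String> second) {
        int limit = Math.min(first.size(), second.size());
        int shared = 0;
        while (shared < limit && first.get(shared).equals(second.get(shared))) {
            shared++;
        }
        return new ArrayList<>(first.subList(0, shared));
    }

    public static void main(String[] args) {
        // Made-up contexts: the division site is nested deeper than the null check.
        List<String> divSite = Arrays.asList("run", "eval", "apply", "divide");
        List<String> nullCheckSite = Arrays.asList("run", "eval", "readDivisor");

        // Frames below the point of divergence are dropped from the merged context.
        System.out.println(longestCommonPrefix(divSite, nullCheckSite)); // [run, eval]
    }
}
```

Whichever exception the instruction raises, the reported trace then starts at the last frame both contexts agree on, and the diverging inlined frames are omitted.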
This fix will be available in the coming-soon Excelsior JET 14: Beta 1 is already available, so you can try this at home.
Stay tuned for more stories to come!
1. See the getStackTrace documentation, and particularly StackTraceElement.toString.
2. As a pleasant side effect, it yielded a 7% decrease in the size of the .rdata section of an average executable produced by Excelsior JET. However, there is a lot more to this new format, maybe just enough for a deep technical longread.
3. To those interested in more details, I can recommend the IBM paper Effective Null Pointer Check Elimination Utilizing Hardware Trap.
4. To illustrate what code the compiler can turn into these instructions, I’ve created a simplified example available in our repository.