Reduction of Additions
Application programmers usually take compilers for granted: whether humming inside the JVM or invoked as a standalone tool, the compiler is that thing that magically translates your high-level code into native instructions.
However, any compiler itself is also a program, oftentimes large and knowledge-intensive, but still a program. Have you ever been curious about how exactly that program, the compiler, works? If yes, we invite you to take a brief look into compiler internals with a story about a tricky bug in the Excelsior JET AOT compiler. Along the way we’ll visit some basic concepts and approaches in compiler theory, as well as classical optimizations and their implementation in a real compiler. Welcome and let’s start!
Consider the following compiler optimization:
x = (y + c1) + c2  =>  x = y + (c1 + c2)

where c1 and c2 are constants and x and y are variables.
The goal of the optimization is to reduce the number of additions in the resulting program from two to one, as c1 + c2 can be evaluated at compile time. Would you expect a compiler to crash with a stack overflow when performing such a simple transformation, presenting a challenge to the entire compiler framework? That was exactly the experience we went through recently. Let us tell you what led to the crash and how we dealt with it.
For starters, let’s describe some properties of the intermediate representation (IR) of programs used in the Excelsior JET AOT compiler. IR is a system of objects and relations between them that express the semantics of a program.
We use SSA (Static Single Assignment) form as our main IR. In a program converted into SSA form, each variable has exactly one operation that defines its value. The SSA form is very convenient for optimizations and has become an industry standard, widely used in contemporary compiler construction [1].
You can use the following two techniques to transform an arbitrary program into SSA form:
- Variable versioning: every assignment to a variable is replaced with the creation of a new version of that variable;
- 𝜑-functions: special operations are inserted into IR of the program at the points where the different values of a variable (different versions) converge, coming along two or more different execution paths. A 𝜑-function represents a new version of a variable, the value of which depends on the execution path.
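As a rough illustration of the first technique (Python used as pseudocode here; the data structures are our own simplification, not the compiler's actual IR), variable versioning over straight-line code can be sketched as:

```python
# Sketch: variable versioning for straight-line code (no branches).
# Assignments are (target, expression) pairs, where an expression is a
# list of tokens (variable names, constants, or operators).
# At control-flow join points, phi-functions would also be required.

def to_ssa(assignments):
    version = {}          # current version number of each variable
    result = []
    for target, expr in assignments:
        # Uses refer to the *current* version of each variable.
        new_expr = [f"{t}{version[t]}" if t in version else t for t in expr]
        # Every assignment creates a fresh version of its target.
        version[target] = version.get(target, 0) + 1
        result.append((f"{target}{version[target]}", new_expr))
    return result

# x = 1; x = x + 1  becomes  x1 = 1; x2 = x1 + 1
print(to_ssa([("x", ["1"]), ("x", ["x", "+", "1"])]))
```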
In the subsequent examples, rectangles represent basic blocks (sequences of operators without branches) and arrows designate control flow. The bold left-to-right arrow corresponds to the accomplished transformation.
Another important concept in compiler theory is dominance. Operation A dominates operation B iff A appears on every possible execution path from the beginning of the function to B. SSA form has a significant property – the definition of a variable dominates all its uses, except possibly uses in 𝜑-functions (in the above loop example, observe the use of x3 in the definition of x2). Notice that programs represented in a non-SSA form do not always have this property, as you can see in the example with the if/then statement – neither of the two definitions of x dominates its use before the conversion into SSA form.
It follows from the dominance property that any loops in the data flow must pass through 𝜑-functions. Therefore, an operation cannot have its own result as a direct or indirect argument unless it is passed through a 𝜑-function.
Let’s go back to the original optimization.
It can be described by the following recursive pattern: if there is an addition operation in the IR, and one of its arguments is also an addition operation, collect all terms into a list; if that list contains more than one constant, compute their sum and build a new sequence of addition operations with just one constant and all the variables from the list; otherwise, do nothing.
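The pattern above can be sketched like this (a toy expression tree with our own `Add`/`Const`/`Var` classes standing in for real IR operations):

```python
# Sketch of the addition optimization over a toy expression tree.

class Const:
    def __init__(self, value): self.value = value

class Var:
    def __init__(self, name): self.name = name

class Add:
    def __init__(self, lhs, rhs): self.lhs, self.rhs = lhs, rhs

def collect_terms(node, terms):
    # Recursively flatten nested additions into a flat list of terms.
    if isinstance(node, Add):
        collect_terms(node.lhs, terms)
        collect_terms(node.rhs, terms)
    else:
        terms.append(node)

def reduce_additions(node):
    terms = []
    collect_terms(node, terms)
    consts = [t for t in terms if isinstance(t, Const)]
    if len(consts) < 2:
        return node                               # nothing to fold
    folded = Const(sum(c.value for c in consts))  # compile-time sum
    result = folded
    for v in (t for t in terms if isinstance(t, Var)):
        result = Add(v, result)                   # one constant, all variables
    return result

# (y + 1) + 2  =>  y + 3
expr = reduce_additions(Add(Add(Var("y"), Const(1)), Const(2)))
```

After the call, `expr` is `Add(Var("y"), Const(3))`: two additions in the source became one at run time.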
So far so good. It seems that nothing can go wrong.
However, testing immediately revealed a case on which the compiler crashed with a StackOverflowError while applying this optimization. Upon closer examination, we discovered that the compiler had tried to apply it to the calculation of x in a template equivalent to:

x = y + 1
y = x + 2

And indeed, an attempt to collect the terms of the addition operation leads to infinite recursion: we add 1 to the list of terms, see that y is defined with an addition operation, so we take its arguments, x and 2, and add them to the list; but x is also defined through addition, so we need to add its arguments, y and 1, to the list as well, and so on, until the stack is exhausted.
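To see the failure mode concretely, here is a deliberately naive sketch where term collection follows variable definitions (the cyclic `defs` dictionary models the broken IR described above; Python raises RecursionError where the JVM throws StackOverflowError):

```python
# defs maps each variable to the operands of the addition defining it.
# This models the invalid IR: x = y + 1 and y = x + 2 define each other.
defs = {"x": ["y", "1"], "y": ["x", "2"]}

def collect_terms(operand, terms):
    if operand in defs:
        # The operand is itself defined by an addition: recurse into it.
        for arg in defs[operand]:
            collect_terms(arg, terms)
    else:
        terms.append(operand)        # a constant: a leaf term

try:
    collect_terms("x", [])           # x -> y -> x -> y -> ... forever
except RecursionError:
    print("stack exhausted, the toy analogue of StackOverflowError")
```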
STOP! How is this even possible?
Such IR violates the defining property of the SSA form: irrespective of the relative order of the operations defining x and y, we can't ensure that those definitions dominate the uses of those variables. Therefore we have a data-flow loop, and an attempt to build its closure naturally leads to infinite recursion. Note that the template used to describe the optimization actually relies on the fact that such IR cannot exist.
So how did those operations emerge anyway?
Those are very unusual operations, as they do calculations without initial values. Look back at the previous examples, for instance, at the calculation of x = x + 1 in the loop represented in SSA form. There are no dependency loops without 𝜑-functions; furthermore, the 𝜑-functions always have an initial value, which comes from before the beginning of the loop. Thus they are free of such problems.
Can this be related to other optimizations? Let’s go deeper and find the origin of those strange operations.
First, consider another simple optimization – if there is a conditional branch of the form

if (true) A else B

where A and B are basic blocks, we can make the branch unconditional and remove the edge entering basic block B:
Of course, such branches are rare in the source code, but they are likely to appear in the IR as a result of other optimizations. This kind of optimization is called Conditional Branch Elimination (CBE).
Note that we do not remove the basic block B from the IR during CBE, we just remove the edge connecting B and the basic block that contains the constant condition. The reason is that there may be edges directed to B from other blocks as well – imagine, for instance, that B is part of a switch statement with several labels. However, what we do have to do during CBE is adjust the 𝜑-functions: when removing one of the incoming edges of B, we also need to remove the respective arguments from all 𝜑-functions in B.
We can also optimize some 𝜑-functions in the process. The concept is simple – if all remaining arguments of the 𝜑-function are the same, it can be replaced with that argument. Indeed, if a variable receives the same value regardless of the execution path, we can simply use that value instead of the 𝜑-function result variable everywhere.
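A minimal sketch of this 𝜑-adjustment step (the representation is our own simplification: each 𝜑 is a dict mapping predecessor block names to the value arriving along that edge):

```python
# Sketch: removing an incoming edge of block B and adjusting its
# phi-functions accordingly.

def remove_incoming_edge(phis, pred):
    """Drop `pred`'s argument from every phi of the block; if a phi is
    left with identical arguments, replace it with that single value."""
    simplified = {}
    for var, args in phis.items():
        args = {p: v for p, v in args.items() if p != pred}
        values = set(args.values())
        if len(values) == 1:
            simplified[var] = values.pop()   # phi degenerates to a copy
        else:
            simplified[var] = args           # still a real phi
    return simplified

# x3 = phi(A: x1, B0: x2); removing the edge from B0 leaves just x1
print(remove_incoming_edge({"x3": {"A": "x1", "B0": "x2"}}, "B0"))
```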
Unreachable Code Elimination
Unreachable Code Elimination (UCE) is a classic optimization technique. In essence, it is very similar to the mark-sweep garbage collection concept: first, we identify the basic blocks that are reachable from the beginning of a function, and then remove the remaining blocks in it.
A basic block can become unreachable for a variety of reasons. In the previous section, we outlined the CBE optimization that can make a basic block unreachable. Similarly, if we can prove that a piece of code doesn’t throw any exceptions, we can safely assume that the respective exception handler is unreachable. Or we can make an inline substitution of a function that contains an infinite loop, which would make all code that followed the original call of that function unreachable, and so on.
Notice that unreachable blocks may stay in the IR for some time, as re-running UCE after each and every optimization that alters edges between basic blocks is not practical (imagine if garbage collection were triggered on every assignment to a field). Thus, unreachable code can legally exist in the IR between UCE calls, which occur at several points of the optimization pipeline.
This guarantees the absence of unreachable code in the compilation output and saves (a lot of) compilation time.
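In the mark-sweep spirit, UCE can be sketched as follows (the control-flow graph is modeled here as a plain dict of successor lists, a simplification of the real IR):

```python
def eliminate_unreachable(entry, successors):
    """Mark phase: find blocks reachable from `entry`;
    sweep phase: keep only those blocks and their surviving edges."""
    reachable, worklist = set(), [entry]
    while worklist:
        block = worklist.pop()
        if block in reachable:
            continue
        reachable.add(block)
        worklist.extend(successors.get(block, []))
    # Sweep: drop unreachable blocks and edges leading into dead code.
    return {b: [s for s in succs if s in reachable]
            for b, succs in successors.items() if b in reachable}

# C and D form dead code: nothing reaches D, so both blocks go away.
cfg = {"A": ["B"], "B": [], "C": ["A"], "D": ["C"]}
print(eliminate_unreachable("A", cfg))      # {'A': ['B'], 'B': []}
```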
A Favorable Set of Circumstances
Here goes another example:
The only path to the loop is guarded by a condition that constantly evaluates to false. Let's apply the CBE optimization:
As you can see, the loop has become unreachable, and its inner calculations have lost their initial values. Now let’s optimize the 𝜑-functions (their remaining arguments are 1-element sets, so we can simply replace them with their arguments) and here it goes, that terrifying IR.
One would think that now it's time for UCE to come and remove the entire loop along with the invalid operations, but the addition optimization squeezes in between CBE and UCE and crashes the compiler with a StackOverflowError.
A question arises: where did we go wrong? Each of the steps looks perfectly logical and appropriate by itself, but their combination leads to incorrect IR. Wait, is it really incorrect?
Let us consider the dominance property of the SSA form again. It feels like the property is violated, but in fact it continues to hold true even in this seemingly defective IR. Let's recall the definition of the dominance property: «A dominates B if it appears on every execution path from the beginning of the function to B.»
But the definitions of x and y are in unreachable code! Vacuously, the definition of x is present on every execution path (out of an empty set of execution paths) from the beginning of the function to the definition of y, and vice versa. It is true that all elements of an empty set are green and orange at the same time or, in our case, that in unreachable code any operation dominates any other one. Therefore the dominance property of the SSA form holds for unreachable code as well, but it does not guarantee the absence of data-flow loops in that code.
Let’s revise our compiler design decisions.
- On the one hand, the absence of data-flow loops without 𝜑-functions enables us to implement many important optimizations in an efficient manner, so we want to maintain that invariant.
- On the other hand, running UCE after each control graph alteration would eat up all that efficiency and then some, so we had to let unreachable code accrue in the IR for some time.
As we have just shown, these decisions are contradictory! And that contradiction is much worse than a single incorrect optimization, which could be debugged and fixed in isolation: the problem lies in the fundamental principles upon which the entire compiler is built, not in any one optimization.
Looks like we have to give up on one of these decisions. For example, we could enforce the absence of unreachable code. Running UCE right after each change of edges between blocks would certainly eliminate the problem, but at a very high cost.
Alternatively, we could patch the addition optimization so that it considers the possibility of data-flow loops without 𝜑-functions. That however won’t help, as that optimization had only uncovered the problem; the subsequent review of other optimizations and analyses revealed similar structures that rely on the absence of data-flow loops without 𝜑-functions in the IR. Further investigation and modification of all that code would have been required.
As neither solution is good enough, we had to figure out a way to let both design decisions stay, with minimal overheads. And we’ve managed to do just that:
Unreachable code can appear in the IR only as a result of removing the last edge directed to a certain basic block. The utility function of our IR responsible for edge removal also removes the arguments of 𝜑-functions, as described above.
Compared to complete UCE, it is relatively easy to check whether a particular basic block has become unreachable when one of its incoming edges is removed. If it has, we replace each of its 𝜑-functions with a special operation, NoValue.
One can observe that if a basic block became unreachable, then no results of its 𝜑-functions can be used in any reachable code. It is not only intuitively understandable, but can be proven formally. After all, the use of a variable is dominated by the definition, but no reachable operation can be dominated by an unreachable one.
Thus we can be sure that sooner or later all uses of NoValue will be removed by UCE, which means that the operation itself will be subject to removal as well. Meanwhile, in the rest of the compiler pipeline, NoValue acts like a legal operation that doesn't require any custom processing.
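The resulting partial UCE can be sketched as follows (a deliberately simplified model: NoValue is a marker string, a block is a dict, and the unreachability check is just "no predecessors left" – the real check must also handle blocks reachable only from other unreachable blocks, e.g. via loop back edges):

```python
NO_VALUE = "NoValue"

def remove_edge(block, pred):
    """Remove the edge pred -> block; if that was the last incoming
    edge, the block has become unreachable, so replace its phis with
    NoValue instead of running a full reachability scan."""
    block["preds"].remove(pred)
    for args in block["phis"].values():
        args.pop(pred, None)          # drop the phi argument for pred
    if not block["preds"]:
        # Partial UCE: a cheap local check instead of complete UCE.
        block["phis"] = {var: NO_VALUE for var in block["phis"]}
    return block

loop_head = {"preds": ["entry"], "phis": {"x3": {"entry": "x1"}}}
print(remove_edge(loop_head, "entry")["phis"])   # {'x3': 'NoValue'}
```

Since no reachable operation can use a NoValue result, the next complete UCE run sweeps these operations away together with their blocks.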
This solution can be considered partial UCE, and it was what helped us to overcome the challenge.
- [1] Cytron, Ron; Ferrante, Jeanne; Rosen, Barry K.; Wegman, Mark N. & Zadeck, F. Kenneth (1991). "Efficiently computing static single assignment form and the control dependence graph". ACM Transactions on Programming Languages and Systems. 13 (4): 451–490.