PDF.js viewer

i=4. On this iteration, TraceMonkey calls T16. Because i=4, the
if statement on line 2 is taken. This branch was not taken in the
original trace, so this causes T16 to fail a guard and take a side exit.
The exit is not yet hot, so TraceMonkey returns to the interpreter,
which executes the continue statement.
i=5. TraceMonkey calls T16, which in turn calls the nested trace
T45. T16 loops back to its own header, starting the next iteration
without ever returning to the monitor.
i=6. On this iteration, the side exit on line 2 is taken again. This
time, the side exit becomes hot, so a trace T23,1 is recorded that
covers line 3 and returns to the loop header. Thus, the end of T23,1
jumps directly to the start of T16. The side exit is patched so that
on future iterations, it jumps directly to T23,1.
At this point, TraceMonkey has compiled enough traces to cover
the entire nested loop structure, so the rest of the program runs
entirely as native code.
3. Trace Trees
In this section, we describe traces, trace trees, and how they are
formed at run time. Although our techniques apply to any dynamic
language interpreter, we will describe them assuming a bytecode
interpreter to keep the exposition simple.
3.1 Traces
A trace is simply a program path, which may cross function call
boundaries. TraceMonkey focuses on loop traces, that originate at
a loop edge and represent a single iteration through the associated
loop.
Similar to an extended basic block, a trace is only entered at
the top, but may have many exits. In contrast to an extended basic
block, a trace can contain join nodes. Since a trace always only
follows one single path through the original program, however, join
nodes are not recognizable as such in a trace and have a single
predecessor node like regular nodes.
A typed trace is a trace annotated with a type for every variable
(including temporaries) on the trace. A typed trace also has an entry
type map giving the required types for variables used on the trace
before they are deﬁned. For example, a trace could have a type map
(x: int, b: boolean), meaning that the trace may be entered
only if the value of the variable x is of type int and the value of b
is of type boolean. The entry type map is much like the signature
of a function.
In this paper, we only discuss typed loop traces, and we will
refer to them simply as “traces”. The key property of typed loop
traces is that they can be compiled to efﬁcient machine code using
the same techniques used for typed languages.
In TraceMonkey, traces are recorded in trace-ﬂavored SSA LIR
(low-level intermediate representation). In trace-ﬂavored SSA (or
TSSA), phi nodes appear only at the entry point, which is reached
both on entry and via loop edges. The important LIR primitives
are constant values, memory loads and stores (by address and
offset), integer operators, ﬂoating-point operators, function calls,
and conditional exits. Type conversions, such as integer to double,
are represented by function calls. This makes the LIR used by
TraceMonkey independent of the concrete type system and type
conversion rules of the source language. The LIR operations are
generic enough that the backend compiler is language independent.
Figure 3 shows an example LIR trace.
Bytecode interpreters typically represent values in a various
complex data structures (e.g., hash tables) in a boxed format (i.e.,
with attached type tag bits). Since a trace is intended to represent
efﬁcient code that eliminates all that complexity, our traces oper-
ate on unboxed values in simple variables and arrays as much as
possible.
A trace records all its intermediate values in a small activation
record area. To make variable accesses fast on trace, the trace also
imports local and global variables by unboxing them and copying
them to its activation record. Thus, the trace can read and write
these variables with simple loads and stores from a native activation
recording, independently of the boxing mechanism used by the
interpreter. When the trace exits, the VM boxes the values from
this native storage location and copies them back to the interpreter
structures.
For every control-ﬂow branch in the source program, the
recorder generates conditional exit LIR instructions. These instruc-
tions exit from the trace if required control ﬂow is different from
what it was at trace recording, ensuring that the trace instructions
are run only if they are supposed to. We call these instructions
guard instructions.
Most of our traces represent loops and end with the special loop
LIR instruction. This is just an unconditional branch to the top of
the trace. Such traces return only via guards.
Now, we describe the key optimizations that are performed as
part of recording LIR. All of these optimizations reduce complex
dynamic language constructs to simple typed constructs by spe-
cializing for the current trace. Each optimization requires guard in-
structions to verify their assumptions about the state and exit the
trace if necessary.
Type specialization.
All LIR primitives apply to operands of speciﬁc types. Thus,
LIR traces are necessarily type-specialized, and a compiler can
easily produce a translation that requires no type dispatches. A
typical bytecode interpreter carries tag bits along with each value,
and to perform any operation, must check the tag bits, dynamically
dispatch, mask out the tag bits to recover the untagged value,
perform the operation, and then reapply tags. LIR omits everything
except the operation itself.
A potential problem is that some operations can produce values
of unpredictable types. For example, reading a property from an
object could yield a value of any type, not necessarily the type
observed during recording. The recorder emits guard instructions
that conditionally exit if the operation yields a value of a different
type from that seen during recording. These guard instructions
guarantee that as long as execution is on trace, the types of values
match those of the typed trace. When the VM observes a side exit
along such a type guard, a new typed trace is recorded originating
at the side exit location, capturing the new type of the operation in
question.
Representation specialization: objects. In JavaScript, name
lookup semantics are complex and potentially expensive because
they include features like object inheritance and eval. To evaluate
an object property read expression like o.x, the interpreter must
search the property map of o and all of its prototypes and parents.
Property maps can be implemented with different data structures
(e.g., per-object hash tables or shared hash tables), so the search
process also must dispatch on the representation of each object
found during search. TraceMonkey can simply observe the result of
the search process and record the simplest possible LIR to access
the property value. For example, the search might ﬁnds the value of
o.x in the prototype of o, which uses a shared hash-table represen-
tation that places x in slot 2 of a property vector. Then the recorded
can generate LIR that reads o.x with just two or three loads: one to
get the prototype, possibly one to get the property value vector, and
one more to get slot 2 from the vector. This is a vast simpliﬁcation
and speedup compared to the original interpreter code. Inheritance
relationships and object representations can change during execu-
tion, so the simpliﬁed code requires guard instructions that ensure
the object representation is the same. In TraceMonkey, objects’ rep-

resentations are assigned an integer key called the object shape.
Thus, the guard is a simple equality check on the object shape.
Representation specialization: numbers. JavaScript has no
integer type, only a Number type that is the set of 64-bit IEEE-
754 ﬂoating-pointer numbers (“doubles”). But many JavaScript
operators, in particular array accesses and bitwise operators, really
operate on integers, so they ﬁrst convert the number to an integer,
and then convert any integer result back to a double.1 Clearly, a
JavaScript VM that wants to be fast must ﬁnd a way to operate on
integers directly and avoid these conversions.
In TraceMonkey, we support two representations for numbers:
integers and doubles. The interpreter uses integer representations
as much as it can, switching for results that can only be represented
as doubles. When a trace is started, some values may be imported
and represented as integers. Some operations on integers require
guards. For example, adding two integers can produce a value too
large for the integer representation.
Function inlining. LIR traces can cross function boundaries
in either direction, achieving function inlining. Move instructions
need to be recorded for function entry and exit to copy arguments
in and return values out. These move statements are then optimized
away by the compiler using copy propagation. In order to be able
to return to the interpreter, the trace must also generate LIR to
record that a call frame has been entered and exited. The frame
entry and exit LIR saves just enough information to allow the
intepreter call stack to be restored later and is much simpler than
the interpreter’s standard call code. If the function being entered
is not constant (which in JavaScript includes any call by function
name), the recorder must also emit LIR to guard that the function
is the same.
Guards and side exits. Each optimization described above
requires one or more guards to verify the assumptions made in
doing the optimization. A guard is just a group of LIR instructions
that performs a test and conditional exit. The exit branches to a
side exit, a small off-trace piece of LIR that returns a pointer to
a structure that describes the reason for the exit along with the
interpreter PC at the exit point and any other data needed to restore
the interpreter’s state structures.
Aborts. Some constructs are difﬁcult to record in LIR traces.
For example, eval or calls to external functions can change the
program state in unpredictable ways, making it difﬁcult for the
tracer to know the current type map in order to continue tracing.
A tracing implementation can also have any number of other limi-
tations, e.g.,a small-memory device may limit the length of traces.
When any situation occurs that prevents the implementation from
continuing trace recording, the implementation aborts trace record-
ing and returns to the trace monitor.
3.2 Trace Trees
Especially simple loops, namely those where control ﬂow, value
types, value representations, and inlined functions are all invariant,
can be represented by a single trace. But most loops have at least
some variation, and so the program will take side exits from the
main trace. When a side exit becomes hot, TraceMonkey starts a
new branch trace from that point and patches the side exit to jump
directly to that trace. In this way, a single trace expands on demand
to a single-entry, multiple-exit trace tree.
This section explains how trace trees are formed during execu-
tion. The goal is to form trace trees during execution that cover all
the hot paths of the program.
1 Arrays are actually worse than this: if the index value is a number, it must
be converted from a double to a string for the property access operator, and
then to an integer internally to the array implementation.
Starting a tree. Tree trees always start at loop headers, because
they are a natural place to look for hot paths. In TraceMonkey, loop
headers are easy to detect–the bytecode compiler ensures that a
bytecode is a loop header iff it is the target of a backward branch.
TraceMonkey starts a tree when a given loop header has been exe-
cuted a certain number of times (2 in the current implementation).
Starting a tree just means starting recording a trace for the current
point and type map and marking the trace as the root of a tree. Each
tree is associated with a loop header and type map, so there may be
several trees for a given loop header.
Closing the loop. Trace recording can end in several ways.
Ideally, the trace reaches the loop header where it started with
the same type map as on entry. This is called a type-stable loop
iteration. In this case, the end of the trace can jump right to the
beginning, as all the value representations are exactly as needed to
enter the trace. The jump can even skip the usual code that would
copy out the state at the end of the trace and copy it back in to the
trace activation record to enter a trace.
In certain cases the trace might reach the loop header with a
different type map. This scenario is sometime observed for the ﬁrst
iteration of a loop. Some variables inside the loop might initially be
undeﬁned, before they are set to a concrete type during the ﬁrst loop
iteration. When recording such an iteration, the recorder cannot
link the trace back to its own loop header since it is type-unstable.
Instead, the iteration is terminated with a side exit that will always
fail and return to the interpreter. At the same time a new trace is
recorded with the new type map. Every time an additional type-
unstable trace is added to a region, its exit type map is compared to
the entry map of all existing traces in case they complement each
other. With this approach we are able to cover type-unstable loop
iterations as long they eventually form a stable equilibrium.
Finally, the trace might exit the loop before reaching the loop
header, for example because execution reaches a break or return
statement. In this case, the VM simply ends the trace with an exit
to the trace monitor.
As mentioned previously, we may speculatively chose to rep-
resent certain Number-typed values as integers on trace. We do so
when we observe that Number-typed variables contain an integer
value at trace entry. If during trace recording the variable is unex-
pectedly assigned a non-integer value, we have to widen the type
of the variable to a double. As a result, the recorded trace becomes
inherently type-unstable since it starts with an integer value but
ends with a double value. This represents a mis-speculation, since
at trace entry we specialized the Number-typed value to an integer,
assuming that at the loop edge we would again ﬁnd an integer value
in the variable, allowing us to close the loop. To avoid future spec-
ulative failures involving this variable, and to obtain a type-stable
trace we note the fact that the variable in question as been observed
to sometimes hold non-integer values in an advisory data structure
which we call the “oracle”.
When compiling loops, we consult the oracle before specializ-
ing values to integers. Speculation towards integers is performed
only if no adverse information is known to the oracle about that
particular variable. Whenever we accidentally compile a loop that
is type-unstable due to mis-speculation of a Number-typed vari-
able, we immediately trigger the recording of a new trace, which
based on the now updated oracle information will start with a dou-
ble value and thus become type stable.
Extending a tree. Side exits lead to different paths through
the loop, or paths with different types or representations. Thus, to
completely cover the loop, the VM must record traces starting at all
side exits. These traces are recorded much like root traces: there is
a counter for each side exit, and when the counter reaches a hotness
threshold, recording starts. Recording stops exactly as for the root
trace, using the loop header of the root trace as the target to reach.