Dynamo: A Transparent Dynamic Optimization System


Bala, Duesterwald, and Banerjia
www.hpl.hp.com/cambridge/projects/Dynamo
What is Dynamo?


Dynamic translation = quick & dirty
virtual-to-native translation + focused
native-to-native optimization of the
most frequently executed portions of
the translated code
Dynamo focuses on the latter
component
Motivation for Dynamo?



Programs typically spend most of their
execution time in a small fraction of program
code
Optimizing all code at translation time can be
counterproductive, since most of that effort goes
to code that rarely runs
Dynamo allows for implementation of a quick
virtual-to-native translator while letting the
native-to-native optimizer clean up the “hot”
translated code at runtime.
Dynamo Implementation

The native-to-native optimizer is
implemented as a software layer tightly
coupled to the CPU hardware
How does Dynamo work?



Dynamo operates at runtime
Interprets an application’s instructions
via a native instruction interpreter
Keeps a counter for each application
address that satisfies a start-of-trace
condition, for example an address
reached by taking a backward branch in
the application
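The counting step above can be sketched in a few lines. This is a minimal illustration, not Dynamo's implementation: the function name `count_trace_heads` and the `(source, target)` branch-log format are assumptions made for the example, and only the backward-branch start-of-trace condition from the slide is modeled.

```python
from collections import defaultdict


def count_trace_heads(branch_log):
    """Count how often each backward-branch target is reached.

    branch_log is a hypothetical input format: an iterable of
    (source_address, target_address) pairs, one per taken branch,
    in execution order.
    """
    counters = defaultdict(int)
    for source, target in branch_log:
        # A taken branch whose target is not above its own address is a
        # backward branch; its target is a likely loop head, so it
        # satisfies the start-of-trace condition in this sketch.
        if target <= source:
            counters[target] += 1
    return dict(counters)


# A loop whose back edge at 0x120 jumps to 0x100 three times,
# interleaved with one forward branch that is not counted.
log = [(0x120, 0x100), (0x120, 0x100), (0x110, 0x118), (0x120, 0x100)]
heads = count_trace_heads(log)
```

Only the loop head at `0x100` receives a counter here; the forward branch to `0x118` is ignored.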
Establishing “Hot” traces



If a start-of-trace address that already has a
fragment in the fragment cache is reached again,
Dynamo jumps into the fragment cache, suspending
itself and resuming native execution of the program.
Key: when an address becomes “hot”, it is
statistically likely that the very next sequence
of interpreted instructions will also be “hot”.
When a counter exceeds the hot threshold, the
interpreter changes state and goes into trace
selection mode, recording the instructions it
interprets next as the hot trace.
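The interplay between counters, trace selection mode, and the fragment cache can be sketched as a toy dispatch loop. Everything here is an assumption made for illustration: the program is modeled as a flat stream of executed addresses, a step to a lower or equal address stands in for a taken backward branch, the threshold value is arbitrary, and a trace is assumed to end at the next backward branch.

```python
HOT_THRESHOLD = 3  # hypothetical value, kept small for the example


def run(address_stream, threshold=HOT_THRESHOLD):
    """Walk an address stream and report trace selection and cache hits."""
    counters = {}        # start-of-trace address -> times reached
    fragment_cache = {}  # trace head -> addresses recorded for that trace
    recording = None     # trace under construction in trace selection mode
    events = []
    previous = None
    for addr in address_stream:
        backward = previous is not None and addr <= previous
        if recording is not None:
            if backward:
                # End of trace in this sketch: store the fragment and
                # recycle the counter that triggered it.
                fragment_cache[recording[0]] = recording
                counters.pop(recording[0], None)
                recording = None
            else:
                recording.append(addr)
                previous = addr
                continue
        if addr in fragment_cache:
            # Real Dynamo would leave the interpreter and run the cached
            # fragment natively; the sketch only logs the event.
            events.append(("native", addr))
        elif backward:
            counters[addr] = counters.get(addr, 0) + 1
            if counters[addr] >= threshold:
                recording = [addr]
                events.append(("select", addr))
        previous = addr
    return fragment_cache, events


# A three-instruction loop executed five times.
cache, events = run([0x10, 0x14, 0x18] * 5)
```

With five iterations of the loop, the head `0x10` is reached by a backward step three times before it becomes hot, the fourth pass is recorded as a trace, and the fifth pass hits the fragment cache.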
What is done with hot traces?



Once selected, the instructions are converted
into a low-level IR (intermediate
representation).
Next, a lightweight optimizer processes the IR
to create a fragment: a single-entry, multi-exit
sequence of instructions laid out
contiguously in memory.
Finally, a linker emits the fragment code to
the fragment cache and links it with other
fragments in the cache.
Continued…



The linker also places a linker stub at the
bottom of each fragment for each branch that
can exit the fragment.
If the target of such a branch cannot be found
in the fragment cache, the branch is set to jump
to its corresponding linker stub, which invokes
Dynamo’s interpretive loop.
Once a fragment is put into the fragment
cache and linked, its hot counter is recycled,
which lets Dynamo bound the amount of counter
storage used for trace selection.
Figure: the trace selection, fragment formation, and fragment linking
processes in terms of application flow.
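The linking behavior described in the preceding slides can be sketched with a small model. The class and method names (`Fragment`, `FragmentCache`, `emit`, `follow_exit`) are hypothetical, and the strings `"stub"` and `"linked"` stand in for what would really be patched branch instructions.

```python
class Fragment:
    """A single-entry, multi-exit trace identified by its head address."""

    def __init__(self, head, exit_targets):
        self.head = head
        # Every exit branch starts out pointing at its own linker stub.
        self.exits = {target: "stub" for target in exit_targets}


class FragmentCache:
    def __init__(self):
        self.fragments = {}

    def emit(self, fragment):
        """Add a fragment and link it with fragments already cached."""
        self.fragments[fragment.head] = fragment
        # Link the new fragment's exits to heads that are already cached.
        for target in fragment.exits:
            if target in self.fragments:
                fragment.exits[target] = "linked"
        # Patch exits of cached fragments that target the new head.
        for other in self.fragments.values():
            if fragment.head in other.exits:
                other.exits[fragment.head] = "linked"

    def follow_exit(self, head, target):
        """Say where control goes when the fragment at head exits to target."""
        if self.fragments[head].exits[target] == "linked":
            return ("fragment", target)
        # An unlinked exit falls into its stub, which re-enters the interpreter.
        return ("interpreter", target)


cache = FragmentCache()
cache.emit(Fragment(0x100, [0x200, 0x300]))
before = cache.follow_exit(0x100, 0x200)  # target not cached yet
cache.emit(Fragment(0x200, [0x100]))
after = cache.follow_exit(0x100, 0x200)   # now linked fragment to fragment
```

Before the second fragment is emitted, the exit to `0x200` returns to the interpreter through its stub; afterwards the two fragments branch directly to each other, while the exit to `0x300` still goes through its stub.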
Optimization Results for Dynamo

Dynamo’s Worst-Case Scenario


Designed to be an adaptive system,
Dynamo can adjust its thresholds based
on the changing behavior of the
program running on top of it.
This allows Dynamo to bail out when its
own overhead is not paying off, letting
the input program run directly on the
machine. (The typical result is break-even.)
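A bail-out check of this kind might look like the sketch below. The slides do not say which signal Dynamo monitors, so the choice here is purely an assumption for illustration: the number of fragments created per monitoring window, with bail-out triggered when that rate stays high for several consecutive windows. The function name and all parameters are hypothetical.

```python
def should_bail_out(fragments_created_per_window, max_rate, patience):
    """Return True if fragment creation stays above max_rate for
    `patience` consecutive windows (an assumed bail-out criterion)."""
    streak = 0
    for created in fragments_created_per_window:
        # A window above the limit extends the streak; any calm window
        # resets it, so short bursts do not trigger a bail-out.
        streak = streak + 1 if created > max_rate else 0
        if streak >= patience:
            return True
    return False


unstable = should_bail_out([2, 9, 9, 9], max_rate=5, patience=3)
settling = should_bail_out([9, 9, 2, 9, 9], max_rate=5, patience=3)
```

The first program keeps producing new fragments and is handed back to the machine; the second has a calm window that resets the streak, so it keeps running under the optimizer.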
What’s Next?


The eventual goal is to run applications
that use Dynamo as a backend to
optimize dynamically generated code,
with virtual machines being the primary
focus.