Smarter Code Generation for Dyninst
Nick Rutar
University of Maryland

Why do we need better code generation?

Dyninst has evolved through its releases
– Originally designed with Paradyn in mind
• Frequent changes to instrumentation

Current code generation requirements
– Must not have adverse effects on the pre-existing program
– Must be tuned to handle future changes to instrumentation

Certain optimizations currently used in compilers can also be used for Dyninst
– Dataflow analysis
– Register allocation

Because it is a dynamic environment, certain modifications need to be performed
Methods to Improve Code Generation

Decrease Register Spills
– No Function Call
• Save only the registers used by a mini-tramp
– Function Call
• Do analysis to see which registers need saving

Merge Base Tramp & Mini-Tramp
– Need to create flag after first instrumentation
– Only one mini-tramp created per site

Dataflow analysis for Dead Registers
– Useful for arbitrary instrumentation points
Current Register Implementation

Base tramp
– Saves/restores all volatile (caller-save) registers

Mini tramp
– Uses volatile registers as needed

Problems
– Many small code snippets will have minimal register usage
– POWER platform
• 11 volatile GPR
• 14 volatile FPR
New Register Implementation

Base Tramp Generation
– Only registers explicitly used within the base tramp are saved
– A series of placeholder no-ops is generated for the registers not saved
– A jump is created at the last save/restore to the end of the no-op group

Mini Tramp Generation
– Keeps track of all volatile registers used

After Mini Tramp Generation
– No-ops within the base tramp are replaced with save(s)/restore(s)
– The jump is updated
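
The three phases above can be sketched as a small model. This is only an illustration under stated assumptions: the `BaseTramp` struct, the string "instructions", and both function names are hypothetical and are not Dyninst's actual data structures. Only the save side is modelled; restores would mirror it.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the save half of a base tramp (not Dyninst's real types).
struct BaseTramp {
    std::vector<std::string> code;  // one string per instruction slot
    std::size_t jumpSlot = 0;       // index of the jump that skips the unused no-ops
    std::set<std::string> saved;    // registers that already have a save
};

// Phase 1: save only what the base tramp itself uses, then emit a jump
// followed by one placeholder no-op per volatile register not yet saved.
BaseTramp generateBaseTramp(const std::vector<std::string>& volatileRegs,
                            const std::set<std::string>& usedByBase) {
    BaseTramp bt;
    for (const auto& r : volatileRegs) {
        if (usedByBase.count(r)) {
            bt.code.push_back("save " + r);
            bt.saved.insert(r);
        }
    }
    bt.jumpSlot = bt.code.size();
    bt.code.push_back("jump");
    for (const auto& r : volatileRegs) {
        if (!usedByBase.count(r)) bt.code.push_back("nop");
    }
    // The jump targets the end of the no-op group.
    bt.code[bt.jumpSlot] = "jump +" + std::to_string(bt.code.size() - bt.jumpSlot);
    return bt;
}

// Phase 3: after the mini tramp is generated, turn placeholders into saves
// for the registers it used and move the jump past the new saves.
void patchBaseTramp(BaseTramp& bt, const std::vector<std::string>& usedByMini) {
    for (const auto& r : usedByMini) {
        if (bt.saved.count(r)) continue;               // already saved
        if (bt.jumpSlot + 1 >= bt.code.size()) break;  // no placeholder left
        bt.code[bt.jumpSlot] = "save " + r;            // save takes the jump's slot
        bt.saved.insert(r);
        ++bt.jumpSlot;                                 // jump moves into the next no-op
    }
    bt.code[bt.jumpSlot] = "jump +" + std::to_string(bt.code.size() - bt.jumpSlot);
}
```

The total size of the tramp never changes in this sketch, which is the point of reserving the no-ops: later mini tramps can add saves without relocating the base tramp.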
Example (from POWER)

Old Base Tramp (saves)
    stu   r1,-540(r1)
  GPR
    st    r12,312(r1)
    st    r11,308(r1)
    st    r10,304(r1)
    st    r9,300(r1)
    st    r8,296(r1)
    st    r7,292(r1)
    st    r6,288(r1)
    st    r5,284(r1)
    st    r4,280(r1)
    st    r3,276(r1)
    st    r0,264(r1)
  FPR
    stfd  f0,152(r1)
    stfd  f1,160(r1)
    stfd  f2,168(r1)
    stfd  f3,176(r1)
    stfd  f4,184(r1)
    stfd  f5,192(r1)
    stfd  f6,200(r1)
    stfd  f7,208(r1)
    stfd  f8,216(r1)
    stfd  f9,224(r1)
    stfd  f10,232(r1)
    stfd  f11,240(r1)
    stfd  f12,248(r1)
    stfd  f13,256(r1)

Mini Tramp
    liu   r12,8192
    l     r12,1416(r12)
    cal   r11,1(r12)
    liu   r10,8192
    st    r11,1416(r10)
    br

Old Base Tramp (restores)
  GPR
    l     r12,312(r1)
    l     r11,308(r1)
    l     r10,304(r1)
    l     r9,300(r1)
    l     r8,296(r1)
    l     r7,292(r1)
    l     r6,288(r1)
    l     r5,284(r1)
    l     r4,280(r1)
    l     r3,276(r1)
    l     r0,264(r1)
  FPR
    lfd   f0,152(r1)
    lfd   f1,160(r1)
    lfd   f2,168(r1)
    lfd   f3,176(r1)
    lfd   f4,184(r1)
    lfd   f5,192(r1)
    lfd   f6,200(r1)
    lfd   f7,208(r1)
    lfd   f8,216(r1)
    lfd   f9,224(r1)
    lfd   f10,232(r1)
    lfd   f11,240(r1)
    lfd   f12,248(r1)
    lfd   f13,256(r1)
    cal   r1,540(r1)
Example (continued)

New Base Tramp
    stu   r1,-540(r1)
    st    r12,312(r1)
    st    r11,308(r1)
    st    r10,304(r1)
    st    r6,288(r1)
    st    r5,284(r1)
    st    r0,264(r1)
    stfd  f10,232(r1)
    b
    nop
    ...
    nop
    brl

Mini-Tramp
    liu   r12,8192
    l     r12,1416(r12)
    cal   r11,1(r12)
    liu   r10,8192
    st    r11,1416(r10)
    br

Reduces Base Tramp by 34 instructions
– Eliminate 18 Saves, 18 Restores
– Add two jumps
Experiments (POWER)

Simple mutatee
– for (a = 0; a < 0xfffff; a++) {
      x = x + a; x += 5*a;
      if (x > 6000)
          x = 2;
      else
          x *= 4;
  }

Instrumentation
– Increments global variable by one
• Mini-tramp is six instructions
– Inserted at every node of the program’s CFG
• Four base tramps for every iteration of loop
Results (POWER)

Instructions Completed
– Version 4.1.1 – 30,393,823
– New Code Generation – 21,485,346

FPU produced a result
– Version 4.1.1 – 4,716,269
– New Code Generation – 1,310,013
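
To put the two pairs of counts above on a common scale, the relative reduction can be computed directly. The helper below is a hypothetical convenience for that arithmetic, not part of Dyninst.

```cpp
#include <cstdint>

// Percentage reduction going from `before` to `after` (assumes before > 0
// and after <= before).
double percentReduction(std::uint64_t before, std::uint64_t after) {
    return 100.0 * static_cast<double>(before - after) / static_cast<double>(before);
}
```

Applied to the figures above, `percentReduction(30393823, 21485346)` comes to roughly 29% fewer completed instructions, and `percentReduction(4716269, 1310013)` to roughly 72% fewer FPU results.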
Dealing with Function Calls

Linear scan over the instructions of a function that is called from a mini-tramp
– Record all registers modified within the function
– Make recursive calls when needed
• At a certain cut-off point, assume all registers were clobbered
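
A minimal sketch of such a scan follows, assuming a heavily simplified instruction representation. `ScanInsn`, `Program`, `clobberedRegs`, the 32-register count, and the cut-off depth of 3 are all assumptions made for illustration; the slides do not say how the cut-off is chosen.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified instruction: the registers it writes, plus the
// name of the function it calls (empty if it is not a call).
struct ScanInsn {
    std::vector<int> writes;
    std::string callee;
};

using Program = std::map<std::string, std::vector<ScanInsn>>;

constexpr int kNumRegs = 32;  // assumption: 32 general-purpose registers
constexpr int kMaxDepth = 3;  // assumption: arbitrary cut-off for this sketch

std::set<int> allRegs() {
    std::set<int> regs;
    for (int r = 0; r < kNumRegs; ++r) regs.insert(r);
    return regs;
}

// Linear scan of `fn`: collect every register it modifies, recursing into
// callees. Past the cut-off, or for unknown code, assume everything is clobbered.
std::set<int> clobberedRegs(const Program& prog, const std::string& fn, int depth = 0) {
    auto it = prog.find(fn);
    if (depth > kMaxDepth || it == prog.end()) return allRegs();
    std::set<int> out;
    for (const ScanInsn& insn : it->second) {
        out.insert(insn.writes.begin(), insn.writes.end());
        if (!insn.callee.empty()) {
            std::set<int> sub = clobberedRegs(prog, insn.callee, depth + 1);
            out.insert(sub.begin(), sub.end());
        }
    }
    return out;
}
```

A self-recursive function hits the cut-off and is treated as clobbering every register, which errs on the side of saving too much rather than too little.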
Merging Base & Mini Tramp

Original design decisions for Dyninst were made to suit
Paradyn’s instrumentation usage pattern
– Large amount of instrumentation changed frequently

Can generate better code for various reasons

One instrumentation point installed … that’s it
– Eliminates noops for registers in base tramp
– Eliminates link register modifications and branches
– Makes assembly more stream-lined
• And readable … if you’re into that kind of thing
– Functionality somewhat limited
• Tradeoff of speed for ease of further instrumentation
– Delete then reinsert (Replace)
How will it work?

Create flag for BPatch class in API
– Once the flag is set, merging is enabled
– When the flag is reset, the system reverts to the old style
– void setMergeTramp(bool x)
• Similar to recursion flag currently in Dyninst

No effect on current Dyninst use
– Default flag set to no merging

Most users will probably leave it at one
setting based on instrumentation needs
Mini-tramp operation comparison

Merging
– Insert
• Only one mini-tramp allowed to be inserted; instrumentation point locked after first mini-tramp generated
– Delete
• Deletes instance of mini-tramp and base tramp
– Replace
• Delete, then Insert new
• Possible to save AST information at the old mini-tramp to be used for new instrumentation

No Merging
– Insert
• Same as before, unlimited per instrumentation site
– Delete
• Deletes instance of mini-tramp
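
The difference between the two modes can be illustrated with a toy model. Only the name `setMergeTramp(bool)` comes from the slides; `ToyInstrumenter`, `ToyPoint`, and every other member below are hypothetical stand-ins and do not reflect the real BPatch interface.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the global flag; merging is off by default.
class ToyInstrumenter {
public:
    void setMergeTramp(bool x) { merge_ = x; }
    bool isMergeTramp() const { return merge_; }
private:
    bool merge_ = false;
};

// Toy instrumentation point; mini-tramps are modelled as strings.
class ToyPoint {
public:
    explicit ToyPoint(bool merged) : merged_(merged) {}

    // Merged: the point is locked once one mini-tramp exists.
    // Unmerged: any number of mini-tramps may be inserted.
    bool insert(const std::string& snippet) {
        if (merged_ && !minis_.empty()) return false;
        minis_.push_back(snippet);
        baseTramp_ = true;
        return true;
    }

    // Merged: the base tramp goes away together with its mini-tramp.
    // Unmerged: only the selected mini-tramp is removed.
    void remove(std::size_t i) {
        if (i >= minis_.size()) return;
        minis_.erase(minis_.begin() + static_cast<std::ptrdiff_t>(i));
        if (merged_) baseTramp_ = false;
    }

    // Replace is simply delete followed by insert.
    bool replace(const std::string& snippet) {
        remove(0);
        return insert(snippet);
    }

    std::size_t count() const { return minis_.size(); }
    bool hasBaseTramp() const { return baseTramp_; }

private:
    bool merged_;
    bool baseTramp_ = false;
    std::vector<std::string> minis_;
};
```

In this model a second `insert` on a merged point fails until the first mini-tramp is deleted or replaced, which is the speed-for-flexibility tradeoff described above.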
Dataflow analysis for Dead Registers

Register use after instrumentation
– Overwritten before accessed
• We are free to use them in tramp without
having to spill them
– Not overwritten
• Spill them to stay on the cautious side

Do analysis before tramp generation
– Dead registers have highest priority
– Currently the same registers are used regardless
Dataflow Analysis Example


Analyze code before and after an arbitrary instrumentation point
Dead registers are given priority for register use in tramps

Uninstrumented Program
    ...
    cal   r11,1(r11)
    cal   r10,3(r12)
    st    r11,1416(r10)
    l     r4,280(r1)
    cal   r3,2(r11)
    st    r10,304(r1)
    **Potential Inst Point**
    cal   r3,2(r11)
    cal   r4,5(r10)
    l     r10,304(r1)
    l     r11,308(r1)
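
The "overwritten before accessed" test can be sketched for straight-line code as a forward scan from the instrumentation point. This is a simplification: a real analysis would follow the control flow graph, and `FlowInsn` and `deadRegsAt` are hypothetical names, not Dyninst's.

```cpp
#include <set>
#include <vector>

// Hypothetical simplified instruction: registers read, then registers written.
struct FlowInsn {
    std::vector<int> reads;
    std::vector<int> writes;
};

// Scan the instructions following the instrumentation point. A register is
// dead at the point if it is written before it is read. Registers the scan
// never sees are conservatively left out (treated as live).
std::set<int> deadRegsAt(const std::vector<FlowInsn>& after) {
    std::set<int> dead;
    std::set<int> live;
    for (const FlowInsn& insn : after) {
        for (int r : insn.reads) {
            if (!dead.count(r)) live.insert(r);  // read first: value is needed
        }
        for (int w : insn.writes) {
            if (!live.count(w)) dead.insert(w);  // written first: old value unused
        }
    }
    return dead;
}
```

Feeding in the four instructions after the potential instrumentation point in the example (reads r11/writes r3, reads r10/writes r4, reads r1/writes r10, reads r1/writes r11) yields r3 and r4 as dead; r10 and r11 are read before they are overwritten, so they would still need to be spilled.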
Other Speed-ups for Dyninst

Partial Parsing of functions
– Grab symbol table and create function objects
– Delay analysis until function is actually accessed
– User can’t see non-symbol-table functions
• Therefore, we don’t have to worry about them
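
The delay-until-accessed idea can be sketched as a function object that is cheap to create from a symbol-table entry and only runs its analysis on first use. `LazyFunction`, `SymbolTable`, and `createFunctionObjects` are hypothetical names, and the "analysis" is a placeholder counter rather than real parsing.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical function object: built from a symbol-table entry, analyzed lazily.
class LazyFunction {
public:
    LazyFunction(std::string name, unsigned long addr)
        : name_(std::move(name)), addr_(addr) {}

    const std::string& name() const { return name_; }
    unsigned long address() const { return addr_; }
    bool parsed() const { return parsed_; }
    int parseCount() const { return parseCount_; }

    // Any query that needs analysis results triggers parsing, at most once.
    void access() {
        if (parsed_) return;
        ++parseCount_;  // placeholder for the real (expensive) analysis
        parsed_ = true;
    }

private:
    std::string name_;
    unsigned long addr_;
    bool parsed_ = false;
    int parseCount_ = 0;
};

using SymbolTable = std::map<std::string, unsigned long>;

// Step 1: create one unparsed function object per symbol-table entry.
std::map<std::string, LazyFunction> createFunctionObjects(const SymbolTable& syms) {
    std::map<std::string, LazyFunction> fns;
    for (const auto& s : syms) {
        fns.emplace(s.first, LazyFunction(s.first, s.second));
    }
    return fns;
}
```

Functions that are never accessed never pay the analysis cost, which is where the speed-up would come from in a program with many functions and few instrumentation targets.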
Status

Completed and in New Release (POWER)
– New Register Spilling for Basic Snippets
• Registers Spilled for Function Calls from a
Mini Tramp
– Partial Parsing (All platforms)

Currently Being Implemented (POWER)
– Linear Code Scan for function calls
– Base Tramp, Mini Tramp Merging
– Data Flow Analysis for Dead Registers

Will eventually be on all platforms
Questions

???