Pin Tutorial
Kim Hazelwood
Robert Muth
VSSAD Group, Intel
Pin2 Tutorial
Pin People
Robert Cohn
Kim Hazelwood
Artur Klauser
Geoff Lowney
CK Luk
Robert Muth
Harish Patil
Ramesh Peri
Vijay Janapareddi
Steven Wallace
Outline
Pin Overview
Instrumentation Basics
Advanced Topics
What is Pin?
• Pin Is Not a TLA
• Pin is a dynamic binary rewriting engine
• Derived from Spike: a static rewriter
• Two versions available:
– Pin2 is the current version
– Pin0 (IPF only) is not covered in this talk
Pin Features
• Rewritten program exists only in memory
• No tool chain dependence
– No issues with code/data mixing, missing relocs, etc.
• Rewrites all user level code including shared libs
• Multi-ISA: Itanium, IA32, EM64T, XScale
• Attach/detach to/from running process (like gdb)
• Transparent: unchanged program behavior
• Efficient: very good performance
Pin Applications
• Optimization
• Security (program shepherding)
• Debugging
• Instrumentation
Instrumentation is our current focus
Uses for Instrumentation
• Profiling for optimization
– Basic block counts, edge counts
– Value profiles, stride profiling, load latencies
• Micro-architectural studies
– Branch predictor simulation
– Cache simulation
– Trace generation
• Bug checking
– Find uninitialized or unallocated data references
Pin Instrumentation Features
• User programmable via plug-ins
– many examples provided
– plug-ins are typically ISA agnostic
• Can take advantage of symtab info
• Automatic register saving/restoring
• Various instrumentation granularities
– Instruction, “Trace”, Routine
• ATOM compatibility mode (AOTI)
Other Dynamic Rewriting Engines
(and what they focus on)
• Dynamo (PA-RISC HPUX)
– Dynamic optimization
• DynamoRIO (IA32 Linux + Win32)
– Originally: Dynamic optimization
– Now: Sandboxing, some instrumentation
• Valgrind (IA32 Linux)
– Originally: Special-purpose instrumentation
– Now: General-purpose instrumentation
Static Instrumentation
(“Atom Style”)
• (Way) Ahead-of-time
• Persistent
• Good but not perfect transparency
• Shared libraries can be a problem
[Diagram: Program → ATOM → Instrumented Program]
Dynamic Instrumentation
(“Pin Style”)
• Execution driven
– Occurs when code is executed
• Original program is NOT modified
– Code is “copied” into code cache
– Only code in code cache is executed
• Instrumentation is not persistent
• Can also instrument libraries
Dynamic Instrumentation
[Diagram: original code (blocks 1-7) beside the code cache, with Pin in control]
Pin has grabbed control before execution of block 1
Dynamic Instrumentation
[Diagram: original code (blocks 1-7); the code cache now holds blocks 1’, 2’, 7’]
Pin fetches trace and allows for instrumentation
Dynamic Instrumentation
[Diagram: original code (blocks 1-7); the code cache holds blocks 1’, 2’, 7’]
Pin transfers control into code cache (block 1)
Dynamic Instrumentation
[Diagram: original code (blocks 1-7); the code cache holds blocks 1’, 2’, 7’ and the new blocks 3’, 5’, 6’]
Pin fetches new trace and ‘links’ it
Dynamic Instrumentation
[Diagram: original code (blocks 1-7); the code cache holds blocks 1’, 2’, 7’ and 3’, 5’, 6’]
Pin transfers control into code cache (block 3)
Running Pin
• Three program images are involved:
1. pin
2. pintool/plug-in
3. Application
• “Shell mode”
$ pin -t inscount -- xclock
• “Gdb mode” - attaching to existing process
$ pin -pid 1067 -t inscount
(can detach and re-attach with different plug-in)
Transparency
Program execution under Pin is transparent:
• Program state is unchanged
– Code/data addresses, memory content
• Will not expose latent bugs
• Instrumentation sees the original program
– Code/data address, memory content
• (But: intentional program state changes
possible, e.g. fault injection)
Transparency (Example)
Original Code:
0x1000 call 0x4000
(Push 0x1006 on stack, then jump to 0x4000)
Code cache address mapping:
0x1000 -> 0x7000 “caller”
0x4000 -> 0x8000 “callee”
Translated Code:
0x7000 Push 0x1006
0x7006 Jmp 0x8000
Stack content remains unchanged
Transparency has a Price
Original Code:
0x4400 ret
(Pop 0x1006 from stack, then jump to 0x1006)
Translated Code:
0x8400 Pop rx
0x84… ry = Translate(rx)
0x84… Jmp ry
• Pin needs to translate program address to code cache address.
• Main reason for slowdowns in dynamic instrumentation systems!
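The Translate(rx) step can be pictured with a small, self-contained sketch. This is not Pin's implementation: CodeCacheDirectory, EmulateRet, and the cache address 0x700b for the return address are hypothetical names and values chosen for illustration, and a hash map merely stands in for whatever lookup structure a real engine uses.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical directory from original program addresses to code cache
// addresses. The value 0 is reserved to mean "not translated yet".
class CodeCacheDirectory {
public:
    void Insert(Addr programAddr, Addr cacheAddr) {
        assert(cacheAddr != 0 && "0 is the 'not translated' sentinel");
        map_[programAddr] = cacheAddr;
    }

    // The Translate(rx) step: one lookup for every executed return.
    Addr Translate(Addr programAddr) const {
        auto it = map_.find(programAddr);
        return it == map_.end() ? 0 : it->second;
    }

private:
    std::unordered_map<Addr, Addr> map_;
};

// Emulates the translated "ret". The stack still holds the ORIGINAL return
// address (that is what keeps the program state transparent), so the
// address must be translated before control can continue in the code cache.
Addr EmulateRet(const CodeCacheDirectory& dir, std::vector<Addr>& stack) {
    assert(!stack.empty());
    Addr rx = stack.back();    // Pop rx (an original program address)
    stack.pop_back();
    return dir.Translate(rx);  // ry = Translate(rx); a real engine jumps to ry
}
```

With the mapping 0x1006 -> 0x700b inserted and 0x1006 on the stack, EmulateRet returns 0x700b; an unmapped address yields 0, which in this sketch marks the point where a new trace would have to be fetched.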
Portability Challenges
 | ARM | IA-32/EM64T | IPF
Type | RISC | CISC | VLIW
Instruction | Fixed length | Variable length, prefixes | Bundled
Memory Instruction | LD/ST | Any, Implicit | LD/ST
Memory op size | Fixed | Variable length | Fixed
Addressing modes | Pre/post/iprel increment | Index/offset/scale/iprel | post
Predication | Cond. codes | None | Predicate regs
Parameters | Registers | Stack/registers | Stacked registers
Pin Instrumentation Query API
• ISA independent part (usually sufficient)
– INS_Address(), INS_Size(), INS_IsRet(),
INS_IsCall(), INS_MemoryReadSize(),
INS_Mnemonic(), etc.
• ISA dependent part (optional)
– INS_GetPredicate(), INS_RegR(),
INS_RegW(), etc.
Performance Comparison: No Instrumentation
[Bar chart: Normalized Execution Time (%) for Pin/IA32, DynamoRIO, and Valgrind on gzip, vpr, gcc, mcf, crafty, parser, perlbmk, gap, vortex, bzip2, twolf, and the mean]
Latest numbers are even better
Performance Comparison: Basic-Block Counting
[Bar chart: Normalized Execution Time (%) for Pin/IA32, DynamoRIO, and Valgrind on gzip, vpr, gcc, mcf, crafty, parser, perlbmk, gap, vortex, bzip2, twolf, and the mean]
Latest numbers are even better
Pin2 Status
• ISAs: IA32, IA32E, XScale, (IPF soon)
• Distros: Debian, Suse, Mandrake,
Red Hat 7.2, 8.0, 9.0, EL3, FC3
• >2500 downloads
• Multithreading support in beta
• Windows support in preparation
Project Engineering
• Automatic nightly testing
– >4 platforms
– >7 Linux distributions
– >8 compilers
– >9000 binaries
• Automatically generated user manual,
internal documentation using Doxygen
Outline
Pin Overview
Instrumentation Basics
Advanced Topics
Instrumentation vs. Analysis
Concepts borrowed from ATOM
• Instrumentation routines define where
instrumentation is inserted
– e.g. before instruction
Occurs at compile time (JIT time)
• Analysis routines define what to do when
instrumentation is activated
– e.g. increment counter
Occurs at runtime
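The difference in timing can be illustrated with a toy model that is independent of Pin. Everything here (MiniJit, Execute, the callback types) is a hypothetical stand-in, not Pin API: the instrumentation callback runs once, when an instruction is first "compiled", while the analysis callbacks it inserted run on every execution.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical miniature of the instrumentation/analysis split.
struct MiniJit {
    using Analysis = std::function<void()>;
    // Decides what to insert before instruction `ins`.
    using Instrument =
        std::function<void(std::size_t ins, std::vector<Analysis>& before)>;

    explicit MiniJit(std::size_t numIns)
        : before_(numIns), jitted_(numIns, false) {}

    void Execute(std::size_t ins, const Instrument& instrument) {
        assert(ins < before_.size());
        if (!jitted_[ins]) {
            // "JIT time": happens only on the first execution.
            instrument(ins, before_[ins]);
            jitted_[ins] = true;
        }
        // Runtime: inserted analysis calls run on every execution.
        for (const Analysis& a : before_[ins]) a();
    }

private:
    std::vector<std::vector<Analysis>> before_;  // inserted calls per instruction
    std::vector<bool> jitted_;                   // already instrumented?
};
```

Running a two-instruction loop five times in this model invokes the instrumentation callback twice (once per instruction) and the analysis callback ten times.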
Instrumentation vs. Analysis (2)
In ATOM:
• Instrumentation and analysis occurred in separate phases
• Code was in separate files
In Pin:
• Difference is somewhat blurred
• Instrumentation and analysis are interleaved
• User plug-in provides code for both
These are difficult terms to remember!
Mental Bridge: Instrumentation → Insertion
Analysis → Action
Instrumentation Routine
• Written in C++
• Invoked by Pin via Callback mechanism
• Invoked when Pin places new code in
code cache (different granularities:
instruction, trace, …)
• Instruments using the Pin API for
– inserting calls to analysis routines
– picking arguments for analysis routines
Analysis Routines
• Written in any language: C, C++, Asm, etc.
• Invoked when surrounding code executes
• Isolated from application by
– separate memory areas
– separate register state
• Automatically optimized by Pin (inlining,
register allocation, etc.)
Example: Instruction Count
Instrumentation:
Insert call to IncCounter() before every instruction
IncCounter();
mov r2 = 2
IncCounter();
add r3 = 4, r3
IncCounter();
beq L1
IncCounter();
add r4 = 8, r4
IncCounter();
beq L2
Analysis:
VOID IncCounter()
{
    icount++;
}
Example: Instruction Count
$ /bin/ls
Makefile atrace.o imageload.out
$ pin -t inscount -- /bin/ls
Makefile atrace.o imageload.out
Count 422838    (output of inscount plug-in)
$
inscount.C
#include <iostream>
#include "pin.H"

UINT64 icount = 0;

// analysis
VOID IncCounter() {
    icount++;
}

// instrumentation
VOID Instruction(INS ins, VOID *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)IncCounter, IARG_END);  // (2)
}

VOID Fini(INT32 code, VOID *v) {
    std::cerr << "Count " << icount << std::endl;
}

// driver
int main(int argc, char * argv[]) {
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);  // (1)
    PIN_AddFiniFunction(Fini, 0);               // (3)
    PIN_StartProgram();
    return 0;
}
ISA independent!
Explanations
1. Register Instruction() to be called back
for every instruction placed into the code
cache
2. Insert call to IncCounter() before each code cache instruction
3. Register Fini() to be called back at the
end
Instrumentation Points
Relative to an instruction (“beq L2”):
1. Before (IPOINT_BEFORE)
2. After (IPOINT_AFTER)
3. On taken branch (IPOINT_BRANCH_TAKEN)
[Diagram: points 1, 2, and 3 marked around “beq L2” in this code:]
mov r4 = 2
beq L2
L2: mov r9 = 4
ret
add r3=8,r9
Example: Instruction Trace
traceInst(ip);
mov r2 = 2
traceInst(ip);
add r3 = 4, r3
traceInst(ip);
beq L1
traceInst(ip);
add r4 = 8, r4
traceInst(ip);
beq L2
Example: Instruction Trace
$ pin -t itrace -- /bin/ls
Makefile atrace.o imageload.out
$ head itrace.out
0x40001e90
0x40001e91
0x40001ee4
0x40001ee5
0x40001ee7
0x40001ee8
…
$
itrace.C
#include <stdio.h>
#include "pin.H"

FILE * trace;

VOID traceInst(VOID *ip) {
    fprintf(trace, "%p\n", ip);
}

VOID Instruction(INS ins, VOID *v) {
    INS_InsertCall(ins, IPOINT_BEFORE,
                   (AFUNPTR)traceInst, IARG_INST_PTR, IARG_END);  // (1)
}

int main(int argc, char * argv[]) {
    trace = fopen("itrace.out", "w");
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_StartProgram();
    return 0;
}
Explanations
1. Insert traceInst() before each code cache instruction; traceInst() takes an extra argument!
(Bad coding practice: we should have closed
the file descriptor using a Fini function)
Analysis Routine Parameters
• IARG_UINT32 <number>
• IARG_REG_VALUE <register name> [*]
• IARG_INST_PTR
• IARG_BRANCH_TAKEN
• IARG_BRANCH_TARGET_ADDR
• IARG_G_ARG0_CALLER
• IARG_MEMORY_READ_EA
• IARG_SYSCALL_NUMBER
• …
[*] Will result in ISA dependent tool
Example: Fast Instruction Count
One IncCounter(n) call per basic block replaces the per-instruction IncCounter(1) calls:
BBL1:
IncCounter(3);
mov r2 = 2
add r3 = 4, r3
beq L1
BBL2:
IncCounter(2);
add r4 = 8, r4
beq L2
inscount.C
#include <stdio.h>
#include "pin.H"

UINT64 icount = 0;

VOID IncCounter(INT32 c) { icount += c; }

VOID Trace(TRACE trace, VOID *v) {
    for (BBL b = TRACE_BblHead(trace); BBL_Valid(b); b = BBL_Next(b)) {
        BBL_InsertCall(b, IPOINT_BEFORE, (AFUNPTR)IncCounter,
                       IARG_UINT32, BBL_NumIns(b), IARG_END);  // (2)
    }
}

VOID Fini(INT32 code, VOID *v) {
    fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
}

int main(int argc, char * argv[]) {
    PIN_Init(argc, argv);
    TRACE_AddInstrumentFunction(Trace, 0);  // (1)
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();
    return 0;
}
Explanations
1. Register Trace() to be called back for every trace placed in the code cache
As a first approximation, a “trace” is a sequence of basic blocks (BBLs)
2. For each trace, walk the BBLs and insert IncCounter() with the appropriate integer parameter at the beginning of each BBL
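Why the per-BBL version is faster can be seen in a small stand-alone model. This is not Pin code: CountResult, CountPerInstruction, and CountPerBbl are hypothetical names, and a program run is reduced to a list of basic-block sizes plus the order in which the blocks execute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CountResult {
    std::uint64_t icount;         // instructions counted
    std::uint64_t analysisCalls;  // how often the analysis routine ran
};

// Models the first inscount example: one analysis call per instruction.
CountResult CountPerInstruction(const std::vector<std::uint32_t>& bblSizes,
                                const std::vector<std::size_t>& executed) {
    CountResult r{0, 0};
    for (std::size_t id : executed) {
        assert(id < bblSizes.size());
        for (std::uint32_t i = 0; i < bblSizes[id]; ++i) {
            r.icount += 1;         // IncCounter()
            r.analysisCalls += 1;
        }
    }
    return r;
}

// Models the fast version: one analysis call per executed basic block,
// passing the block's instruction count as the argument.
CountResult CountPerBbl(const std::vector<std::uint32_t>& bblSizes,
                        const std::vector<std::size_t>& executed) {
    CountResult r{0, 0};
    for (std::size_t id : executed) {
        assert(id < bblSizes.size());
        r.icount += bblSizes[id];  // IncCounter(BBL_NumIns(b))
        r.analysisCalls += 1;
    }
    return r;
}
```

For block sizes {3, 2} (BBL1 and BBL2 from the earlier slide) executed in the order BBL1, BBL2, BBL2, BBL2, both functions count 9 instructions, but the per-BBL version needs 4 analysis calls instead of 9.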
Further Reading
The following material is also covered in
the Pin user manual
Go to
http://rogue.colorado.edu/Pin/
Then follow the “manuals” link
Summary
• Pin instrumentation is:
– Robust
– Transparent
– Easy-to-use
– Efficient
– Portable
• Try it: http://rogue.colorado.edu/Pin
Outline
Pin Overview
Instrumentation Basics
Advanced Topics
Trace vs. Instruction Instrumentation
VOID Instruction(INS ins, VOID *v) {
INS_InsertCall(ins, IPOINT_BEFORE,(AFUNPTR)Cnt, IARG_END);
}
Can be emulated by:
VOID Trace(TRACE trace, VOID *v) {
for (BBL bbl = TRACE_BblHead(trace);
BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
for ( INS ins = BBL_InsHead(bbl);
INS_Valid(ins); ins = INS_Next(ins)){
INS_InsertCall(ins,IPOINT_BEFORE,(AFUNPTR)Cnt,IARG_END);
}
}
}
Definition: Pin Trace (JITI)
• List of instructions that is only entered from
top, but may have multiple exits
• No side entries (Pin duplicates code to
ensure this!)
• Multiple copies of instruction in code cache
Program:
mov r2 = 2
L2: add r3 = 4, r3
add r4 = 8, r4
beq L2
…
Trace 1:
mov r2 = 2
add r3 = 4, r3
add r4 = 8, r4
beq L2
…
Trace 2:
add r3 = 4, r3
add r4 = 8, r4
beq L2
…
Instrumentation Modes
• Just-In-Time Instrumentation (JITI)
– Per instruction, per trace
– “basic block” notion
• Ahead-Of-Time Instrumentation (AOTI)
– Per instruction, per function, per
section/image
– Emulated using JITI
– Functionality similar to ATOM
– Extra startup overhead
– No “basic blocks” notion
Per Image Instrumentation (AOTI)
Hooking Image (Un)Loading
$ pin -t imageload -- /bin/ls
Makefile imageload.o inscount0.o
$ cat imageload.out
Loading /bin/ls
Loading /lib/ld-linux.so.2
…
Unloading /bin/ls
Unloading /lib/ld-linux.so.2
…
…
FILE * T;
VOID ImageLoad(IMG img, VOID *v)
{ fprintf(T, "Loading %s\n", IMG_Name(img).c_str());}
VOID ImageUnload(IMG img, VOID *v)
{ fprintf(T, "Unloading %s\n", IMG_Name(img).c_str());}
VOID Fini(INT32 code, VOID *v) { fclose(T); }
int main(int argc, char * argv[]) {
T = fopen("imageload.out", "w");
PIN_Init(argc, argv);
IMG_AddInstrumentFunction(ImageLoad, 0);
IMG_AddUnloadFunction(ImageUnload, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram(); return 0;
}
“Walking” Images
VOID ImageLoad(IMG img, VOID *v) {
for (SEC sec = IMG_SecHead(img);
SEC_Valid(sec); sec = SEC_Next(sec)) {
for (RTN rtn = SEC_RtnHead(sec);
RTN_Valid(rtn); rtn = RTN_Next(rtn)) {
RTN_Open(rtn);
for (INS ins = RTN_InsHead(rtn);
INS_Valid(ins); ins = INS_Next(ins))
static_count++;
RTN_Close(rtn);
}
}
}
Explanations
• Image->Section->Routine->Instruction
• We are essentially walking the symtab
• For each function symbol:
– Disassemble function (RTN_Open)
– Then walk instructions
– NB: no basic blocks available!
“Walking” And Instrumenting
VOID ImageLoad(IMG img, VOID *v) {
for (SEC sec = IMG_SecHead(img);
SEC_Valid(sec); sec = SEC_Next(sec)) {
for (RTN rtn = SEC_RtnHead(sec);
RTN_Valid(rtn); rtn = RTN_Next(rtn)) {
RTN_Open(rtn);
for (INS ins = RTN_InsHead(rtn);
INS_Valid(ins);
ins = INS_Next(ins)) {
INS_InsertCall(ins,IPOINT_BEFORE,
(AFUNPTR)Cnt,IARG_END);
}
RTN_Close(rtn);
}
}
}
Explanations
• AOTI: instrumentation requests are cached until code is executed
• Effect like 1st instruction count example
• But:
– worse (startup) performance
– higher memory consumption
• Requires symbol table
→ Bad use of AOTI!
“Searching” And Instrumenting
SimpleExamples/malloctrace.C
VOID ImageLoad(IMG img, VOID *v)
{
RTN mallocRtn = RTN_FindByName(img, "malloc");
if (RTN_Valid(mallocRtn))
{
RTN_Open(mallocRtn);
RTN_InsertCall(mallocRtn, IPOINT_BEFORE,
(AFUNPTR)MBefore, IARG_G_ARG0_CALLEE, IARG_END);
RTN_InsertCall(mallocRtn, IPOINT_AFTER,
(AFUNPTR)MAfter, IARG_G_RESULT0, IARG_END);
RTN_Close(mallocRtn);
}
}
Explanations
• Instrument prolog and epilogs of malloc() using RTN_InsertCall
• Instrumentation really happens on instruction level, hence we must call RTN_Open
• Requires symbol table
• Good use of AOTI!
Performance Considerations
VOID count( ADDRINT s, ADDRINT d ) {
    COUNTER *pedg = Lookup( s,d );  // expensive!
    pedg->_count++; }
VOID Instruction(INS ins, void *v) {
...
if ( [ins is a branch or a call instruction] )
INS_InsertCall(ins, IPOINT_BEFORE,
(AFUNPTR)count, IARG_INST_PTR,
IARG_BRANCH_TARGET_ADDR, IARG_END);
... }
Improved Version
VOID count_fast( COUNTER *pedg ) {
    pedg->_count++; }
VOID InstructionFast(INS ins, void *v) {
    …
    if (INS_IsDirectBranchOrCall(ins)) {
        // Lookup now runs once, at instrumentation time
        COUNTER *pedg = Lookup( INS_Address(ins),
                                INS_DirectBranchOrCallTargetAddress(ins) );
        INS_InsertCall(ins, IPOINT_BEFORE,
                       (AFUNPTR) count_fast, IARG_ADDRINT, pedg,
                       IARG_END); }
    else {
        ...
    }
}
Remarks
• If possible, move work from analysis to instrumentation!
• Keep analysis routines small so that they get inlined!
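The first remark can be checked with a stand-alone sketch that mirrors the two count routines without depending on Pin. EdgeTable and its lookups field are hypothetical; COUNTER and _count follow the slides, and a std::map stands in for whatever structure Lookup() really uses.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

using Addr = std::uint64_t;

struct COUNTER {
    std::uint64_t _count = 0;
};

// Hypothetical edge table; `lookups` records how often the expensive
// search was performed.
class EdgeTable {
public:
    COUNTER* Lookup(Addr s, Addr d) {
        ++lookups;
        return &edges_[std::make_pair(s, d)];  // map nodes have stable addresses
    }
    std::uint64_t lookups = 0;

private:
    std::map<std::pair<Addr, Addr>, COUNTER> edges_;
};

// Slow analysis routine: searches the table on every executed branch.
void count(EdgeTable& table, Addr s, Addr d) {
    table.Lookup(s, d)->_count++;
}

// Fast analysis routine: the counter was resolved once at instrumentation
// time, so each execution is a single increment.
void count_fast(COUNTER* pedg) {
    assert(pedg != nullptr);
    pedg->_count++;
}
```

Executing one direct branch 1000 times costs 1000 lookups with count(), but only one lookup (done up front) with count_fast(); both end with the same counter value. The trick only works when source and target are known at instrumentation time, which is why the slide restricts it to direct branches and calls.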
Plug-ins Shipped with Pin2
• Data cache simulation
• Malloc/Free tracer
• Syscall tracer
• Opcode mix profiler
• Register usage profiler
• …
Debugging Pin Plug-ins
Pause Pin for 7 sec to attach with gdb
$ pin -pause_tool 7 -t inscount -- /bin/ls
Pausing to attach to pid 28769
$ gdb
(gdb) attach 28769
…
(gdb) break main
...
(gdb) cont
Summary
• Pin instrumentation is:
– Robust
– Transparent
– Easy-to-use
– Efficient
– Portable
• Try it: http://rogue.colorado.edu/Pin