Instruction-level Tracing: Framework & Applications Sanjay Bhansali Binary Technologies Group Center for Software Excellence (CSE) Microsoft 11/04/2005

Instruction-level Tracing:
Framework & Applications
Sanjay Bhansali
Binary Technologies Group
Center for Software Excellence (CSE)
Microsoft
11/04/2005
Context
• Program analysis and transformation technology
can have huge impact on engineering of
software.
• Center for Software Excellence
– Part of Windows Core OS Division
– Balance research on innovation with focus on
deployment
• Binary Technologies Group
– Binary analysis
– Static and Dynamic approaches
11/6/2015
2
Outline
• Applications of Execution Traces
• Dynamic Translation
• Trace Capture
• Trace Replay
• Applications
• Related Work
• Summary
Applications of Execution Traces
• Debugging
• Regression Analysis
• Bug detection
• Coverage Analysis
• Optimization
• Impact analysis
• Usage analysis
• …
Run Once, Analyze Many
• Complete instruction-level trace
• Deterministic, full fidelity replay of user
mode execution
• Pros
– Run once, analyze multiple times
• Cons
– Trace size, performance
Framework for Instruction-level Tracing and Analysis
• Task and machine independent
• User mode processes
• Modest overhead (space and time)
• On-demand tracing
• Reduce engineering effort for building
analysis tools
Dynamic Binary Translation
• Runtime interpretation/translation of binary
instructions
• Pros
– Requires no static instrumentation, or special symbol
information
– Handle dynamically generated code, self modifying
code
• Cons
– Approximately 5x slower than native execution
Nirvana Architecture
[Figure: architecture diagram with the components Nirvana Client, Nirvana API, JIT translator, Code Cache, Application, and VM monitor, a User/Kernel boundary, and the Nirvana Driver and Operating System.]
JIT Translation Example
Native code
mov eax, [ebp]
Translated code
mov EDX, tls.ebp
mov EAX, [EDX]
JIT Translation Example
Native code
mov eax, [ebp]
Translated code
mov EDX, tls.ebp
mov ECX, tls
call MemReadCallback
mov EAX, [EDX]
Code Cache Management
• Single code cache
– Contention, locality
• Per Thread code cache
– Code bloat
• P+d code caches where
P = number of processors
• Reuse code caches when possible
• Fall back on interpretation
Self modifying code
• Snoop on system calls that flush the hardware cache
• Watch page protection of code bytes
– Mark page if non-writable, and flush code cache on
page protection change
– Insert self-mod instruction check otherwise
• Fall back on interpretation if too many code
cache flushes
Nirvana API
• RegisterEventCallback(event,callback)
• Events:
– Translation
– InstructionStart
– MemRead
– MemWrite
– FlowChange
– Sequencing
Example Nirvana Client
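/* Memory Read Logger */
bool Initialize()
{
    /* Assumes InitializeNirvanaClient() returns true on success. */
    if (!InitializeNirvanaClient())
        return false;
    RegisterEventCallback(MemReadEvent, MemCallback);
    return true;
}
/* Called on every memory read: logs the instruction pointer, address, and size. */
void MemCallback(NirvContext *ctx, void *pAddr, int nBytes)
{
    X86REGS *pRegs = (X86REGS *) ctx->cpuRegs;
    Log(pRegs->InstructionPtr(), pAddr, nBytes);
}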
Tracing & Replay Overview
[Figure: the Record Process (Application running under Nirvana, with a Trace Writer) produces the Trace Log; the Playback Process (Trace Reader, Nirvana, Emulation/Replay) consumes it, possibly on a different machine, on behalf of tools labeled Defect, Debugger, …]
Trace Writer
• Log only what cannot be regenerated by
processor
– Values read from memory
– Values changed by kernel
– Machine and time sensitive instructions (cpuid,rdtsc)
• Everything else can be regenerated
• Trace size is ~4-5 bytes per instruction
Optimization: Trace select reads
• Observation: Hardware caches eliminate
most off-chip reads
• Use same trick to optimize logging:
– Have logger and replayer simulate identical cache
memories
– Only log cache misses
• Average trace size is <1 bit per instruction
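The logging trick above can be illustrated with a small model. This is a minimal sketch, not Nirvana's implementation: it assumes a direct-mapped cache holding one 32-bit value per address, and a log that stores the index of each mispredicted read. The names PredictionCache, TraceWriterSketch, TraceReaderSketch, and LoggedRead are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One logged (mispredicted) read: its position in the read stream and its value.
struct LoggedRead {
    std::uint64_t readIndex;
    std::uint32_t value;
};

// Hypothetical direct-mapped cache: each slot remembers one (address, value) pair.
class PredictionCache {
public:
    explicit PredictionCache(std::size_t slots) : entries_(slots) { assert(slots > 0); }

    // Returns true and sets `out` if a value is cached for `addr`.
    bool lookup(std::uint64_t addr, std::uint32_t& out) const {
        const Entry& e = entries_[addr % entries_.size()];
        if (!e.valid || e.addr != addr) return false;
        out = e.value;
        return true;
    }

    void update(std::uint64_t addr, std::uint32_t value) {
        entries_[addr % entries_.size()] = Entry{true, addr, value};
    }

private:
    struct Entry {
        bool valid = false;
        std::uint64_t addr = 0;
        std::uint32_t value = 0;
    };
    std::vector<Entry> entries_;
};

// Recording side: logs a read only when the cache fails to predict its value.
class TraceWriterSketch {
public:
    explicit TraceWriterSketch(std::size_t slots) : cache_(slots) {}

    // Program writes are regenerated on replay, so they only update the cache.
    void onWrite(std::uint64_t addr, std::uint32_t value) { cache_.update(addr, value); }

    void onRead(std::uint64_t addr, std::uint32_t value) {
        std::uint32_t predicted = 0;
        if (!cache_.lookup(addr, predicted) || predicted != value) {
            log_.push_back(LoggedRead{readCount_, value});
            cache_.update(addr, value);
        }
        ++readCount_;
    }

    const std::vector<LoggedRead>& log() const { return log_; }

private:
    PredictionCache cache_;
    std::uint64_t readCount_ = 0;
    std::vector<LoggedRead> log_;
};

// Replay side: mirrors the writer's cache, so every unlogged read is a cache hit.
class TraceReaderSketch {
public:
    TraceReaderSketch(std::size_t slots, std::vector<LoggedRead> log)
        : cache_(slots), log_(std::move(log)) {}

    void onWrite(std::uint64_t addr, std::uint32_t value) { cache_.update(addr, value); }

    std::uint32_t onRead(std::uint64_t addr) {
        std::uint32_t value = 0;
        if (next_ < log_.size() && log_[next_].readIndex == readCount_) {
            value = log_[next_++].value;  // mispredicted during recording: take it from the log
            cache_.update(addr, value);
        } else {
            bool hit = cache_.lookup(addr, value);  // predicted during recording
            assert(hit);
            (void)hit;
        }
        ++readCount_;
        return value;
    }

private:
    PredictionCache cache_;
    std::vector<LoggedRead> log_;
    std::size_t next_ = 0;
    std::uint64_t readCount_ = 0;
};
```

In this model, a value that the program itself wrote is always predicted, while a value changed behind the program's back (for example by the kernel) shows up as a misprediction and is the only thing written to the log.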
Example
i = 1;
for (j = 0; j < 10; j++)
{
i = i + j;
}
k = i; // value read is 46
System_call();
k = i; // value read is 0 (not predicted)
• The only read not predicted and logged follows
the system call
Sequence points & Checkpoints
[Figure: thread timelines marked with lock xadd, Kernel/User and User/Kernel transitions, Module Load, and Exception events.]
• Tracing uses per-thread streams for performance
• Sequence points used to impose partial order on
instruction executions across threads
• Checkpoint frames for random access into the trace
(every 5 million instructions)
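One way to picture how sequence points order per-thread streams is a k-way merge on a shared counter. The sketch below assumes each sequence point carries a globally increasing number and that each thread's stream is already sorted by it; SequencePoint and mergeStreams are hypothetical names, and the real trace format is not described in these slides.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// A sequence point as assumed here: a global counter value plus the owning thread.
struct SequencePoint {
    std::uint64_t seq;
    int threadId;
};

// Merges per-thread streams (each ascending in seq) into one globally ordered list.
std::vector<SequencePoint> mergeStreams(
        const std::vector<std::vector<SequencePoint>>& streams) {
    using Cursor = std::pair<std::uint64_t, std::size_t>;  // (next seq, stream index)
    std::priority_queue<Cursor, std::vector<Cursor>, std::greater<Cursor>> heap;
    std::vector<std::size_t> pos(streams.size(), 0);

    for (std::size_t s = 0; s < streams.size(); ++s) {
        if (!streams[s].empty()) heap.push(Cursor(streams[s][0].seq, s));
    }

    std::vector<SequencePoint> ordered;
    while (!heap.empty()) {
        std::size_t s = heap.top().second;  // stream holding the smallest pending seq
        heap.pop();
        ordered.push_back(streams[s][pos[s]]);
        if (++pos[s] < streams[s].size()) heap.push(Cursor(streams[s][pos[s]].seq, s));
    }
    return ordered;
}
```

Instructions between two sequence points on the same thread stay unordered relative to other threads, which is why the resulting order is only partial.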
Trace Writer Performance
Application   Simulated      Trace File   Trace File    Native      Execution    Execution
              Instructions   Size         Bits /        Execution   Time While   Overhead
              (millions)                  Instruction   Time        Tracing      (x native)
Gzip          24,097         245 MB       0.09          11.7s       187s         15.98
Excel         1,781          99 MB        0.47          18.2s       105s         5.76
PowerPoint    7,392          528 MB       0.60          43.6s       247s         5.66
IE            116            5 MB         0.50          0.499s      6.94s        13.90
Vulcan        2,408          152 MB       0.53          2.74s       46.6s        17.01
Satsolver     9,431          1300 MB      1.16          9.78s       127s         12.98
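The derived columns of the table can be recomputed from the raw ones. The sketch below assumes that "MB" means 2^20 bytes, which appears to match the reported bits-per-instruction figures more closely than 10^6 bytes, and that overhead is tracing time divided by native time. The function names are hypothetical.

```cpp
// Trace size in bits divided by the number of simulated instructions.
// Assumption: 1 MB = 1024 * 1024 bytes.
double bitsPerInstruction(double traceSizeMB, double instructionsInMillions) {
    double bits = traceSizeMB * 1024.0 * 1024.0 * 8.0;
    return bits / (instructionsInMillions * 1e6);
}

// Slowdown factor: execution time while tracing relative to native execution time.
double executionOverhead(double tracingSeconds, double nativeSeconds) {
    return tracingSeconds / nativeSeconds;
}
```

For the PowerPoint row, for instance, this gives roughly 0.60 bits per instruction from 528 MB and 7,392 million instructions.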
Trace Reader - Replay
[Figure: replay data flow among Nirvana, the Trace Log, and the Prediction Cache, with edges labeled Instruction Fetch, Data Fetch, Data Read Miss, and Data Write.]
• Nirvana requests code & data via the
Fetch operations
• TraceReader uses same prediction
cache as TraceWriter
Trace Reader - Navigation
[Figure: trace timeline with checkpoint frames 1 to 8, showing the Current Position and the Destination.]
• Navigation involves going back to the closest
Checkpoint frame before the destination and
executing forward to the destination from there.
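The navigation rule above amounts to a binary search over checkpoint positions followed by forward replay. This is a minimal sketch under the assumption that checkpoint frames are identified by sorted instruction positions and that a checkpoint exactly at the destination may be used directly; SeekPlan and planSeek are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Where to restart replay and how many instructions to execute forward from there.
struct SeekPlan {
    std::uint64_t checkpoint;
    std::uint64_t stepsForward;
};

// `checkpoints` holds the instruction positions of checkpoint frames in ascending
// order; the first one is assumed to lie at or before any valid destination.
SeekPlan planSeek(const std::vector<std::uint64_t>& checkpoints,
                  std::uint64_t destination) {
    assert(!checkpoints.empty() && checkpoints.front() <= destination);
    // upper_bound yields the first checkpoint strictly after the destination,
    // so the element just before it is the closest one at or before it.
    auto after = std::upper_bound(checkpoints.begin(), checkpoints.end(), destination);
    std::uint64_t start = *(after - 1);
    return SeekPlan{start, destination - start};
}
```

With checkpoints every 5 million instructions, a seek in this model never replays more than about 5 million instructions, whether the destination lies ahead of or behind the current position.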
Time Travel Debugging
• Examine a program as it runs backwards
to figure out root cause of a problem.
– Reverse breakpoint
– Step back
– Search backwards in time
• Used to diagnose bugs in shipped
products
Truscan: Defect Detection Tool
• Scan traces for bugs that “hide”
– memory leaks
– dangling pointers
– uninitialized memory
• Report bugs that really happen – no false
positives
• Debug with time travel debugging
Example: Memory Leak Detection
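eax = HeapAlloc(42);
mov [0x4004], eax
eax = 0;
mov [0x4004], 0
[Figure: heap block record (ADDR = 0x3004, SIZE = 42) referenced from eax and from 0x4004; its reference count is shown as 1, then 2, and ends at REFCOUNT = 0 after both references are overwritten.]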
This example is trivial, but …
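The reference-counting idea in this example can be sketched as follows. This is a simplified illustration, not Truscan's algorithm: it assumes only pointers to the start of a block count as references and that every register or memory location is named by a string. LeakTracker and its methods are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Tracks how many locations (registers or memory words) point at each live heap block.
class LeakTracker {
public:
    // A block starts with no recorded references until its address is stored somewhere.
    void onAlloc(std::uint64_t addr, std::uint64_t size) { blocks_[addr] = Block{size, 0}; }

    // A freed block can no longer leak.
    void onFree(std::uint64_t addr) { blocks_.erase(addr); }

    // Records that `location` now holds `value`, replacing whatever it held before.
    void onStore(const std::string& location, std::uint64_t value) {
        auto blk = blocks_.find(value);
        if (blk != blocks_.end()) ++blk->second.refCount;

        auto old = holds_.find(location);
        if (old != holds_.end()) release(old->second);
        holds_[location] = value;
    }

    // Addresses of blocks whose last reference was overwritten before they were freed.
    const std::vector<std::uint64_t>& leaks() const { return leaks_; }

private:
    struct Block {
        std::uint64_t size;
        int refCount;
    };

    void release(std::uint64_t value) {
        auto blk = blocks_.find(value);
        if (blk != blocks_.end() && --blk->second.refCount == 0) leaks_.push_back(value);
    }

    std::map<std::uint64_t, Block> blocks_;
    std::map<std::string, std::uint64_t> holds_;
    std::vector<std::uint64_t> leaks_;
};
```

Replaying the four instructions of the example through this tracker (allocation into eax, store to 0x4004, then zeroing both) drives the count for block 0x3004 from 1 to 2 and back to 0, at which point the block is reported as leaked.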
Statistics
• A Windows Application (under development)
– 600 million instructions
– 80,000 allocations
– 30 million pointers
– 48 leaks (8 unique bugs)
• Native: ~9 seconds
• Trace: ~44 seconds
• Analyze: ~41 minutes
(3 GHz, single threaded, 1 GB RAM)
Regression Analysis
[Figure: TraceDiff takes Trace 1 and Trace 2 as input (diagram labels: OS1, App 1, OS2) and shows three comparisons.]
Callgraph
Foo, bar, m1, m2, m3, p1
Foo, bar, m1, m4, m3, p1
Instruction Sequence
Mov edi, edi / Push ebp / Mov ebp, esp / Sub esp, 0x54 / Cmp [ebp+24],0 / Jne … / Call …
Mov edi, edi / Push ebp / Mov ebp, esp / Sub esp, 0x54 / Cmp [ebp+24],0 / Jne … / Mov …
Coverage
Foo, Bar, m1, m2, m3, p1
Foo, Bar, m1, m3, p1
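The instruction-sequence comparison can be illustrated with a deliberately naive diff that reports the first position at which two traces disagree. This is only a sketch: it assumes each trace is available as a list of instruction strings, and firstDivergence is a hypothetical name. The slides do not describe how TraceDiff aligns traces, and a real tool would presumably need to resynchronize after a difference.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the first instruction at which the two traces differ,
// or -1 if they are identical. A trace that is a strict prefix of the other
// diverges at its own length.
std::ptrdiff_t firstDivergence(const std::vector<std::string>& trace1,
                               const std::vector<std::string>& trace2) {
    auto diff = std::mismatch(trace1.begin(), trace1.end(),
                              trace2.begin(), trace2.end());
    if (diff.first == trace1.end() && diff.second == trace2.end()) return -1;
    return diff.first - trace1.begin();
}
```

Applied to the two sequences in the figure, the first six instructions match and the divergence is reported at the seventh, where one trace executes a Call and the other a Mov.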
Related Work
• Process Virtualization
– DynamoRIO, Mojo, DELI, ReVirt, Valgrind
• Instrumentation
– ATOM, Vulcan, SHADE, Pin
• Trace Compression
– VPC
• Reverse Debugging
– ReVirt, Traceback, BugNet, Flashback, FDR
• Program/Trace Diffing & Applications
– Zeller, Zhang&Gupta
Summary
• Flexible framework for instruction level
tracing and analysis
• Complete full-fidelity traces
• Run once, analyze multiple times
• Reasonable overhead
• Many useful applications
– Debugging, defect detection, optimization, …
Shameless self promotion!
• Hiring for internships and full-time
positions at all levels
• Contact: [email protected]
Questions