WinDI: Pervasive Tracing in Windows


Fay: Extensible Distributed Tracing
from Kernels to Clusters
Úlfar Erlingsson, Google Inc.
Marcus Peinado, Microsoft Research
Simon Peter, Systems Group, ETH Zurich
Mihai Budiu, Microsoft Research
Wouldn’t it be nice if…
• We could know what our clusters were doing?
• We could ask any question,
… easily, using one simple-to-use system.
• We could collect answers extremely efficiently
… so cheaply we may even ask continuously.
Let’s imagine...
• Applying data-mining to cluster tracing
• Bag-of-words technique (illustrated below)
– Compare documents w/o structural knowledge
– N-dimensional feature vectors
– K-means clustering
• Can apply to clusters, too!
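As a hedged illustration of the bag-of-words analogy (C# fragment with made-up numbers, not data from the talk): each machine becomes one feature vector of per-function call counts, and k-means then groups machines whose vectors lie close together.

// Illustration only (invented counts): machines as "bag of system
// calls" feature vectors, one dimension per traced function.
double[] machineA = { 120, 30,  5 }; // read, write, open counts
double[] machineB = { 118, 29,  6 }; // close to A: same behavior cluster
double[] machineC = {   2, 90, 70 }; // far from A/B: different behavior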
Cluster-mining with Fay
• Automatically categorize cluster behavior,
based on system call activity
– Without measurable overhead on the execution
– Without any special Fay data-mining support
Fay K-Means Behavior-Analysis Code
// Distributed part: per-machine frequency vectors of kernel
// system-call activity, gathered over 3 minutes.
var kernelFunctionFrequencyVectors =
    cluster.Function(kernel, "syscalls!*")
           .Where(evt => evt.time < Now.AddMinutes(3))
           .Select(evt => new { Machine  = fay.MachineID(),
                                Interval = evt.Cycles / CPS,
                                Function = evt.CallerAddr })
           .GroupBy(evt => evt,
                    (k, g) => new { key = k, count = g.Count() });

// Analysis part: standard k-means over the resulting vectors.
Vector Nearest(Vector pt, Vectors centers) {
    var near = centers.First();
    foreach (var c in centers)
        if (Norm(pt - c) < Norm(pt - near))
            near = c;
    return near;
}

Vectors OneKMeansStep(Vectors vs, Vectors cs) {
    return vs.GroupBy(v => Nearest(v, cs))            // assign to nearest center
             .Select(g => g.Aggregate((x, y) => x + y) / g.Count()); // new center = mean
}

Vectors KMeans(Vectors vs, Vectors cs, int K) {
    for (int i = 0; i < K; ++i)                       // K refinement steps
        cs = OneKMeansStep(vs, cs);
    return cs;
}
Fay vs. Specialized Tracing
• Could’ve built a specialized tool for this
– Automatic categorization of behavior (Fmeter)
• Fay is general, but can efficiently do (see the sketch below)
– Tracing across abstractions, systems (Magpie)
– Predicated and windowed tracing (Streams)
– Probabilistic tracing (Chopstix)
– Flight recorders, performance counters, …
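A hedged sketch, in the style of the Fay queries shown in this talk, of predicated, windowed, and probabilistic tracing combined in one query. The "disk!Write" tracepoint, the meaning of Arg(2), and fay.Random() are illustrative assumptions, not the documented Fay API.

// Hypothetical sketch: over a 10-second window, keep only large
// writes (predicate) and sample roughly 1% of them (probabilistic).
var sampledLargeWrites =
    cluster.Function(kernel, "disk!Write")
           .Where(evt => evt.time < Now.AddSeconds(10))  // window
           .Where(evt => evt.Arg(2) > 64 * 1024)         // predicate: >64 KB writes
           .Where(evt => fay.Random() % 100 == 0)        // ~1% sampling
           .GroupBy(evt => fay.MachineID(),
                    (m, g) => new { machine = m, count = g.Count() });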
Key Takeaways
Fay: Flexible monitoring of distributed executions
– Can be applied to existing, live Windows servers
1. Single query specifies both tracing & analysis
– Easy to write & enables automatic optimizations
2. Pervasively data-parallel, scalable processing
– Same model within machines & across clusters
3. Inline, safe machine-code at tracepoints
– Allows us to do computation right at the data source
K-Means: Single, Unified Fay Query
var kernelFunctionFrequencyVectors =
    cluster.Function(kernel, "*")
           .Where(evt => evt.time < Now.AddMinutes(3))
           .Select(evt => new { Machine  = fay.MachineID(),
                                Interval = evt.Cycles / CPS,
                                Function = evt.CallerAddr })
           .GroupBy(evt => evt,
                    (k, g) => new { key = k, count = g.Count() });

Vector Nearest(Vector pt, Vectors centers) {
    var near = centers.First();
    foreach (var c in centers)
        if (Norm(pt - c) < Norm(pt - near))
            near = c;
    return near;
}

Vectors OneKMeansStep(Vectors vs, Vectors cs) {
    return vs.GroupBy(v => Nearest(v, cs))
             .Select(g => g.Aggregate((x, y) => x + y) / g.Count());
}

Vectors KMeans(Vectors vs, Vectors cs, int K) {
    for (int i = 0; i < K; ++i)
        cs = OneKMeansStep(vs, cs);
    return cs;
}
Fay is Data-Parallel on Cluster
• View the trace query as a distributed computation; use the cluster itself for the analysis
• Fay does early aggregation & data reduction, since it knows what's needed for later analysis (see the sketch below)
• Fay builds an efficient processing plan from the query
[Figure: system-call trace events on each machine feeding into the K-Means analysis]
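A minimal sketch of the early-aggregation idea (plain C#, not Fay's actual runtime API): an associative aggregate such as Count() decomposes into per-machine partial tables that are merged centrally, so raw trace events never cross the network.

using System.Collections.Generic;

// Sketch: merge per-machine partial counts into cluster-wide totals.
static Dictionary<string, long> MergeCounts(
    IEnumerable<Dictionary<string, long>> perMachinePartials)
{
    var total = new Dictionary<string, long>();
    foreach (var partial in perMachinePartials)  // one small table per machine
        foreach (var kv in partial)
            total[kv.Key] = total.TryGetValue(kv.Key, out var c)
                                ? c + kv.Value
                                : kv.Value;
    return total;
}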
Fay is Data-Parallel within Machines
• Early aggregation
• Inline, in the OS kernel
• Reduces dataflow & kernel/user transitions
• Data-parallel per core/thread (sketched below)
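A minimal sketch of the per-core aggregation idea (plain C# stand-in; Fay's actual probes are XFI-checked machine code running in the kernel): each core increments its own table, so probes need no locks, and a periodic flush merges the per-core tables for later analysis.

using System.Threading;

// Sketch: per-core counters avoid cross-core synchronization.
sealed class PerCoreCounters
{
    readonly long[][] counts; // counts[core][functionId]

    public PerCoreCounters(int cores, int functions)
    {
        counts = new long[cores][];
        for (int c = 0; c < cores; ++c)
            counts[c] = new long[functions];
    }

    // Called from a probe, on the probe's own core: no contention.
    public void Hit(int core, int functionId) => counts[core][functionId]++;

    // Merge (and reset) all per-core tables into one frequency vector.
    public long[] Flush()
    {
        var total = new long[counts[0].Length];
        foreach (var perCore in counts)
            for (int f = 0; f < perCore.Length; ++f)
                total[f] += Interlocked.Exchange(ref perCore[f], 0);
        return total;
    }
}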
Processing w/o Fay Optimizations
[Figure: K-Means system-call events collected to disk, clustered afterwards]
• Collect data first (on disk)
• Reduce later
• Inefficient, can suffer data overload
Traditional Trace Processing
[Figure: K-Means system-call events logged centrally, clustered afterwards]
• First log all data (a deluge)
• Process later (centrally)
• Compose tools via scripting
Takeaways so far
Fay: Flexible monitoring of distributed executions
1. Single query specifies both tracing & analysis
2. Pervasively data-parallel, scalable processing
Safety of Fay Tracing Probes
• A variant of XFI used for safety [OSDI'06]
– Works well in the kernel or any address space
– Can safely use existing stacks, etc.
– Used instead of a language interpreter (as in DTrace)
– Allows arbitrary, efficient, stateful computation
• Probes can access thread-local/global state
• Probes can try to read any address (see the sketch below)
– I/O registers are protected
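A hedged sketch of the "probes can try to read any address" guarantee (plain C# stand-in; the real mechanism is XFI's inline machine-code guards plus fault handling): reads are permitted unless they would touch a protected region, and faulting reads fail harmlessly instead of crashing the kernel.

// Sketch: allow a probe read only outside protected regions
// (e.g., memory-mapped I/O registers); other reads may be
// attempted, with any fault turned into a default value.
static bool IsReadable(ulong addr, (ulong lo, ulong hi)[] protectedRanges)
{
    foreach (var r in protectedRanges)
        if (addr >= r.lo && addr < r.hi)
            return false; // I/O registers are off limits
    return true;
}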
Key Takeaways, Again
Fay: Flexible monitoring of distributed executions
1. Single query specifies both tracing & analysis
2. Pervasively data-parallel, scalable processing
3. Inline, safe machine-code at tracepoints
Installing and Executing Fay Tracing
[Figure: a trace query arrives at the per-machine tracing runtime in user space, which creates probes; in the kernel, the target's functions are hotpatched to dispatch (in roughly 200 cycles) to XFI-verified Fay probes, and events flow back out via ETW]
• Fay runtime on each machine
• Fay module in each traced address space
• Tracepoints at hotpatched function boundaries (sketched below)
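A hedged pseudo-code sketch of the install flow the diagram shows. Every name here (IFayRuntime, CreateProbe, Hotpatch, SubscribeEtw) is an assumption made for illustration, not Fay's documented API.

using System.Collections.Generic;

// Hypothetical per-machine runtime surface, mirroring the diagram.
interface IProbe { }
interface IFayRuntime
{
    IProbe CreateProbe(string query);         // compile + XFI-verify probe code
    void Hotpatch(string function, IProbe p); // patch tracepoint at function boundary
    void SubscribeEtw(IProbe p);              // receive probe events via ETW
}

static void InstallTracing(IEnumerable<IFayRuntime> machines, string query)
{
    foreach (var runtime in machines)          // Fay runtime on each machine
    {
        var probe = runtime.CreateProbe(query);
        runtime.Hotpatch("syscalls!*", probe); // tracepoints at traced functions
        runtime.SubscribeEtw(probe);           // results flow out via ETW
    }
}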
Low-level Code Instrumentation
Module with a traced function Foo:

Caller:
       ...
       e8 ab 62 ff ff      call Foo
       ...

       ff 15 08 e7 06 00   call [Dispatcher]   ; placed in the padding before Foo
Foo:   eb f8               jmp  Foo-6          ; replaces Foo's 1st opcode
       cc cc cc                                ; int 3 padding
Foo2:  57                  push rdi            ; remainder of Foo, unchanged
       ...
       c3                  ret

Fay platform module:

Dispatcher:
       t = lookup(return_addr)
       ...
       call t.entry_probes      ; XFI-verified Fay probes
       ...
       call t.Foo2_trampoline   ; invokes the traced function's body (via Foo2)
       ...
       call t.return_probes     ; XFI-verified Fay probes
       ...
       return /* to after call Foo */

• Replace 1st opcode of functions
• Fay dispatcher called via trampoline
• Fay calls the function, and entry & exit probes
What’s Fay’s Performance & Scalability?
• Fay adds 220 to 430 cycles per traced function
• Fay adds 180% CPU to trace all kernel functions (i.e., a 2.8x slowdown)
• Both figures are roughly 10x better than DTrace's and SystemTap's
[Charts: null-probe overhead in cycles, and slowdown when tracing all kernel functions: Fay 2.8x, Solaris DTrace 17.2x, OS X DTrace 26.7x; SystemTap on Linux crashed]
Fay Scalability on a Cluster
• Fay traced memory allocations made in a loop:
– Ran workload on a 128-node, 1024-core cluster
– Spread work over 128 to 1,280,000 threads
– 100% CPU utilization
• Fay overhead was 1% to 11% (mean 7.8%)
More Fay Implementation Details
• Details of query-plan optimizations
• Case studies of different tracing strategies
• Examples of using Fay for performance analysis
• Fay is based on LINQ and Windows specifics
– Could build on Linux using Ftrace, Hadoop, etc.
• Some restrictions currently apply
– E.g., a skew towards batch processing, due to Dryad
Conclusion
• Fay: Flexible tracing of distributed executions
• Both expressive and efficient
– Unified trace queries
– Pervasive data-parallelism
– Safe machine-code probe processing
• Often as efficient as purpose-built tools
Backup
A Fay Trace Query
from io in cluster.Function("iolib!Read")
where io.time < Now.AddMinutes(5)
let size = io.Arg(2) // request size in bytes
group io by size/1024 into g
select new { sizeInKilobytes = g.Key,
countOfReadIOs = g.Count() };
• Aggregates read activity in the iolib module
• Across the cluster, both user mode & kernel
• Over 5 minutes
A Fay Trace Query
• Specifies what to trace
– The 2nd argument of the read function in iolib
• And how to aggregate
– Group into KB-size buckets and count
[Chart: counts of read I/Os per kilobyte-size bucket]