Microarchitectural Characterization of Production JVMs and Java Workloads
(work in progress)
Jungwoo Ha (UT Austin)
Magnus Gustafsson (Uppsala Univ.)
Stephen M. Blackburn (Australian Nat’l Univ.)
Kathryn S. McKinley (UT Austin)
Challenges of JVM Performance Analysis
Controlling nondeterminism
  Just-In-Time compilation driven by nondeterministic sampling
  Garbage collectors
  Other helper threads
Production JVMs are not created equal
  Thread model (kernel vs. user threads)
  Types of helper threads
Need a solid measurement methodology!
  Isolate each JVM part
2/22/08
Forest and Trees
What performance metrics explain performance differences and bottlenecks?
  Cache misses? L1 or L2?
  TLB misses?
  Number of instructions?
Inspecting one or two metrics is not always enough
Hardware performance counters can measure only a small number of events at a time
  Multiple invocations are therefore unavoidable for a full measurement
Case Study: jython
Figure: Application performance (cycles)
Figure: L1 instruction cache misses per cycle
Figure: L1 data cache misses per cycle
Figure: Total instructions executed (retired)
Figure: L2 data cache misses per cycle
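The case-study figures plot misses per cycle rather than raw counts. A minimal Python sketch of that normalization follows, assuming each measured iteration records the target event together with a cycle count from the same iteration. The event names and numbers are hypothetical placeholders, not PAPI preset names and not measured jython data.

```python
def per_cycle_rates(raw_counts: dict[str, int]) -> dict[str, float]:
    """Divide each event count by the cycle count recorded in the same iteration.

    `raw_counts` must contain a "cycles" entry; every other key is an event
    name (hypothetical labels such as "l1i_miss").
    """
    cycles = raw_counts["cycles"]
    if cycles <= 0:
        raise ValueError("cycle count must be positive")
    return {
        event: count / cycles
        for event, count in raw_counts.items()
        if event != "cycles"
    }


# Hypothetical counts for one measured iteration.
iteration = {"cycles": 2_000_000, "l1i_miss": 30_000, "l1d_miss": 50_000}
print(per_cycle_rates(iteration))  # {'l1i_miss': 0.015, 'l1d_miss': 0.025}
```

Normalizing by cycles, as the figures do, mixes miss frequency with execution speed; dividing by retired instructions instead separates the two, at the cost of needing the instruction count from the same iteration.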
Project Status
Established a methodology to characterize application code performance
  Large number of metrics (40+) measured from hardware performance counters
  Apples-to-apples comparison of JVMs using standard interfaces (JVMTI, JNI)
Simulator data for detailed analysis
  Limit studies
    What if the L1 cache had no misses?
  More performance metrics
    e.g., uop mix
Performance Counter Methodology
Collecting n metrics
  x warmup iterations (x = 10)
  p performance counters (at most p metrics can be measured per iteration)
  ⌈n/p⌉ iterations needed to measure every metric once
  k redundant measurements for statistical validation (k = 1)
  The workload must be held constant across the multiple measurements
Measurement sequence
  Invoke the JVM y times
  Warm up the JVM: 1st through xth iteration
  Stop the JIT: (x+1)th iteration
  Full-heap GC
  Measured run: (x+2)th through (x+1+⌈n/p⌉k)th iteration, changing the measured metrics after each iteration
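The iteration arithmetic above can be sketched in Python. This is a minimal sketch under stated assumptions: iterations are numbered from 1, the full-heap GC sits between the (x+1)th and (x+2)th iterations and is not an iteration itself, and metrics are grouped in list order. The function names are hypothetical, not part of the authors' harness.

```python
import math


def iteration_schedule(n_metrics: int, p_counters: int,
                       x_warmup: int = 10, k_repeats: int = 1):
    """Return (phase, first_iteration, last_iteration) tuples, 1-based and inclusive."""
    if min(n_metrics, p_counters, k_repeats, x_warmup) < 1:
        raise ValueError("all parameters must be positive")
    # ceil(n/p) iterations cover every metric once; repeat that k times.
    measured = math.ceil(n_metrics / p_counters) * k_repeats
    first_measured = x_warmup + 2  # a full-heap GC precedes this iteration
    return [
        ("warmup", 1, x_warmup),
        ("stop JIT", x_warmup + 1, x_warmup + 1),
        ("measured", first_measured, first_measured + measured - 1),
    ]


def measurement_plan(metrics: list[str], p_counters: int,
                     x_warmup: int = 10, k_repeats: int = 1) -> dict[int, list[str]]:
    """Map each measured iteration number to the (at most p) metrics it records."""
    groups = [metrics[i:i + p_counters] for i in range(0, len(metrics), p_counters)]
    first_measured = x_warmup + 2
    plan = {}
    for rep in range(k_repeats):
        for j, group in enumerate(groups):
            plan[first_measured + rep * len(groups) + j] = group
    return plan


print(iteration_schedule(40, 2))
# [('warmup', 1, 10), ('stop JIT', 11, 11), ('measured', 12, 31)]
```

With the 40 performance-counter metrics and the Pentium-M's 2 counters (without multiplexing), one invocation therefore runs 10 + 1 + 20 = 31 iterations.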
Performance Counter Methodology
Stop-the-world garbage collector
  No concurrent marking
One perfctr instance per pthread
  JVM internal threads run in pthreads separate from the application's
JVMTI callbacks
  Thread start: start counter
  Thread finish: stop counter
  GC start: pause counter (user-level threads only)
  GC stop: resume counter (user-level threads only)
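The callback rules above amount to a small per-thread state machine. The Python sketch below simulates that policy; it does not call perfctr, PAPI, or JVMTI. In a real agent these rules would hang off JVMTI's ThreadStart, ThreadEnd, GarbageCollectionStart, and GarbageCollectionFinish events. Assumptions: events are modeled as one global timestamp `now`, and `pause_during_gc` marks the threads the slide calls user-level; which threads those are is left to the caller. All class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThreadCounter:
    """Hypothetical stand-in for one perfctr instance bound to one pthread."""
    pause_during_gc: bool   # True for the threads the slide calls user-level
    total: int = 0          # events banked while the counter was running
    running: bool = False
    alive: bool = True
    started_at: int = 0

    def start(self, now: int) -> None:
        self.running = True
        self.started_at = now

    def stop(self, now: int) -> None:
        if self.running:
            self.total += now - self.started_at
            self.running = False


class CallbackPolicy:
    """Applies the slide's callback rules to simulated per-thread counters."""

    def __init__(self) -> None:
        self.counters: dict[str, ThreadCounter] = {}

    def thread_start(self, tid: str, now: int, pause_during_gc: bool) -> None:
        counter = ThreadCounter(pause_during_gc=pause_during_gc)
        counter.start(now)
        self.counters[tid] = counter

    def thread_end(self, tid: str, now: int) -> None:
        counter = self.counters[tid]
        counter.stop(now)
        counter.alive = False

    def gc_start(self, now: int) -> None:
        for counter in self.counters.values():
            if counter.pause_during_gc and counter.alive:
                counter.stop(now)  # pause: bank events counted so far

    def gc_stop(self, now: int) -> None:
        for counter in self.counters.values():
            if counter.pause_during_gc and counter.alive:
                counter.start(now)  # resume counting after the collection


policy = CallbackPolicy()
policy.thread_start("app", 0, pause_during_gc=True)
policy.thread_start("helper", 0, pause_during_gc=False)
policy.gc_start(100)
policy.gc_stop(150)
policy.thread_end("app", 300)
policy.thread_end("helper", 300)
print(policy.counters["app"].total, policy.counters["helper"].total)  # 250 300
```

The paused thread loses exactly the 50 events that fall inside the collection, while the unpaused thread counts straight through it.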
Methodology Limitations
Cannot factor out memory barrier overhead
  Use the garbage collector with the least application overhead
If a helper thread runs in the same pthread as the application (user-level thread), it perturbs the measurement
  No evidence of this in J9, HotSpot, or JRockit
Instrumented code overhead
  Must be included in the measurement
Experiment
Performance Counter Experiment
  Pentium-M uniprocessor
  32KB 8-way L1 caches (data & instruction)
  2MB 4-way L2 cache
  2 hardware counters (18 if multiplexed)
  1GB memory
  32-bit Linux 2.6.20 with perfctr patch
  PAPI 3.5.0 library
Simulator Experiment
  PTLsim (http://www.ptlsim.org) x86 simulator
  64-bit AMD Athlon
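The cache parameters above determine how many sets each cache has and how an address splits into tag, index, and offset bits. A minimal Python sketch follows, assuming a 64-byte line size and 32-bit addresses; the slide states neither, so both are parameters with assumed defaults.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int = 64,
                   address_bits: int = 32) -> dict[str, int]:
    """Sets and address-bit split for a set-associative cache.

    line_bytes=64 and address_bits=32 are assumptions, not values from the slide.
    """
    def is_pow2(v: int) -> bool:
        return v > 0 and (v & (v - 1)) == 0

    sets, rem = divmod(size_bytes, ways * line_bytes)
    if rem or not (is_pow2(sets) and is_pow2(line_bytes)):
        raise ValueError("expected power-of-two set count and line size")
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": address_bits - index_bits - offset_bits,
    }


print(cache_geometry(32 * 1024, 8))        # each L1 (instruction and data)
print(cache_geometry(2 * 1024 * 1024, 4))  # L2
```

Under these assumptions each 32KB 8-way L1 has 64 sets and the 2MB 4-way L2 has 8192 sets.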
Experiment
3 production JVMs × 2 versions
  IBM J9, Sun HotSpot JVM, JRockit (perfctr only)
  1.5 and 1.6
Heap size = max(16MB, 4 × minimum heap size)
18 benchmarks
  9 DaCapo benchmarks
  8 SPEC JVM98
  1 PseudoJBB
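The heap-size rule above can be expressed as a short Python sketch. Assumptions: the per-benchmark minimum heap size is already known in MB, the result is rounded up to a whole MB, and both initial and maximum heap are pinned to that size via the commonly accepted `-Xms`/`-Xmx` options. The slide states only the max(16MB, 4 × minimum) rule, and the minimum heap sizes below are hypothetical.

```python
import math


def heap_size_mb(min_heap_mb: float) -> int:
    """Slide rule: heap = max(16MB, 4 x the benchmark's minimum heap size)."""
    return max(16, math.ceil(4 * min_heap_mb))


def heap_flags(min_heap_mb: float) -> list[str]:
    """Assumption: pin initial and maximum heap to the same size."""
    size = heap_size_mb(min_heap_mb)
    return [f"-Xms{size}m", f"-Xmx{size}m"]


# Hypothetical minimum heap sizes in MB (not measured values).
for name, min_mb in [("small", 3), ("large", 25)]:
    print(name, heap_flags(min_mb))
```

The 16MB floor matters only for benchmarks whose minimum heap is under 4MB; everything larger gets four times its minimum.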
Experiment
40+ metrics
  40 distinct metrics from performance counters
    L1 or L2 cache misses (instruction, data, read, write)
    Instruction TLB misses
    Branch predictions
    Resource stalls
  Richer metrics from the simulator
    Micro-operation mix
    Load-to-store ratio
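The two simulator metrics above reduce to simple ratios over per-class micro-op counts. A minimal Python sketch, assuming the simulator output has already been aggregated into a dictionary of counts per uop class; the class names "load", "store", "alu", and "branch" are hypothetical labels, not PTLsim statistic names.

```python
def uop_mix(uop_counts: dict[str, int]) -> dict[str, float]:
    """Fraction of all micro-ops that fall in each class."""
    total = sum(uop_counts.values())
    if total == 0:
        raise ValueError("no micro-ops recorded")
    return {cls: n / total for cls, n in uop_counts.items()}


def load_to_store_ratio(uop_counts: dict[str, int]) -> float:
    """Loads per store; 'load' and 'store' are hypothetical class labels."""
    return uop_counts["load"] / uop_counts["store"]


# Hypothetical per-class counts from one simulated run.
counts = {"load": 300, "store": 100, "alu": 500, "branch": 100}
print(uop_mix(counts))
print(load_to_store_ratio(counts))  # 3.0
```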
Performance Counter Results (Cycle Counts)
Figure panels: PseudoJBB, jython, pmd, jess
Performance Counter Results (Cycle Counts)
Figure panels: jack, compress, hsqldb, db
Performance Counter Results
IBM J9 1.6 performed better than Sun HotSpot 1.6 on average
JRockit has the most variation in performance
Full results: ~800 graphs
  Full jython results are in the paper
  http://z.cs.utexas.edu/users/habals/jvmcmp
  or Google my name (Jungwoo Ha)
Future Work
JVM activity characterization
  Garbage collector
  JIT
Statistical analysis of performance metrics
  Metric correlation
Methodology to identify performance bottlenecks
Multicore performance analysis
Conclusions
Methodology for production JVM comparison
Performance evaluation data
Simulator results for deeper analysis
Thank you!
Simulation Result
Perfect Cache - compress
Perfect Cache - db
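The perfect-cache slides pose the limit-study question from Project Status: what if the cache had no misses? Assuming each comparison pairs the simulated cycle count of a run with the cycle count of the same run under a perfect cache, two numbers summarize it: the speedup and the share of baseline cycles that disappear. A minimal Python sketch with hypothetical cycle counts follows; the function name is illustrative only.

```python
def limit_study(baseline_cycles: int, perfect_cycles: int) -> dict[str, float]:
    """Compare a simulated run with the same run under a perfect cache."""
    if baseline_cycles <= 0 or perfect_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    return {
        "speedup": baseline_cycles / perfect_cycles,
        # Share of baseline cycles that vanish once misses are removed.
        "miss_cycle_share": 1 - perfect_cycles / baseline_cycles,
    }


print(limit_study(1_000_000, 800_000))  # hypothetical cycle counts
```

The speedup is an upper bound on what better cache behavior alone could buy for that benchmark, since a perfect cache cannot exist.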