Influence of Program Inputs on the Selection of Garbage Collectors
Feng Mao, Eddy Zheng Zhang and Xipeng Shen
The College of William and Mary
Introduction
GC determines the efficiency of memory management:
  • Collection time
  • Data locality
Various garbage collectors perform differently on different applications.
GC Selection
Goal: select the best garbage collector for an execution.
Application-specific selection [Fitzgerald & Tarditi: ISMM’00; Soman et al.: ISMM’04; Singer et al.: ISMM’07]:
  • Selects a GC for each application
  • Based on offline profiling
Influence of Inputs
An important but under-explored dimension: it determines the robustness of profiling-based selection.
Only preliminarily covered in previous work [Soman et al.: ISMM’04; Singer et al.: ISMM’07]:
  • Few inputs per application
  • Different observations
Objective
A comprehensive understanding of the influence of inputs on the selection of garbage collectors.
Overview
A systematic measurement:
  • 1580 inputs
  • 10 programs
  • 316,000 executions
  • 5 garbage collectors
  • 4 heap size ratios
A statistical analysis to address indeterminism
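The execution count is consistent with the other dimensions if every input is run under every collector at every heap size ratio, with each configuration repeated 10 times (the repetition count from the statistical analysis). A quick arithmetic check, with constant names chosen only for illustration:

```python
# Dimensions of the measurement space (names are illustrative).
NUM_INPUTS = 1580
NUM_COLLECTORS = 5       # GenCopy, GenMS, MarkSweep, RefCount, SemiSpace
NUM_HEAP_RATIOS = 4      # r = 1, 2, 4, 8
RUNS_PER_CONFIG = 10     # repeated runs per (input, collector, ratio)

total_executions = NUM_INPUTS * NUM_COLLECTORS * NUM_HEAP_RATIOS * RUNS_PER_CONFIG
print(f"{total_executions:,}")  # 316,000
```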
Overview
Findings:
  • Top collectors vary across inputs
  • Cross-input consistency exists
  • Heap size ratio matters
  • Heap size ratio is predictable
Outline
  • Measurement methodology
  • Statistical performance analysis
  • Top collectors vary across inputs
  • Cross-input consistency exists
  • Heap size ratio matters
  • Heap size ratio is predictable
Measurement
Environment:
  • Intel Xeon E5310, Linux 2.6.9
  • Jikes RVM 2.9.1
  • 5 garbage collectors (included in MMTk): GenCopy, GenMS, MarkSweep, RefCount, SemiSpace
Heap Size Ratio
$$r = \frac{\text{heap size}}{\text{min possible heap size}}$$
4 heap size ratios: 1, 2, 4, 8
The min possible heap size differs across applications and inputs.
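Because r is relative, one ratio maps to different absolute heap sizes for different programs and inputs. A minimal sketch, assuming the minimum possible heap size of an input is already known; the function name and the 20 MB value are hypothetical:

```python
# Heap size ratios used in the study.
HEAP_SIZE_RATIOS = (1, 2, 4, 8)


def heap_sizes_mb(min_heap_mb: float) -> dict[int, float]:
    """Map each ratio r to the absolute heap size r * min_heap_mb,
    since r = heap size / min possible heap size."""
    return {r: r * min_heap_mb for r in HEAP_SIZE_RATIOS}


# Hypothetical input whose minimum possible heap size is 20 MB.
print(heap_sizes_mb(20))  # {1: 20, 2: 40, 4: 80, 8: 160}
```

An input with a 98 MB minimum would, by the same formula, need 784 MB at r = 8.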
Benchmarks
Benchmark    Suite  Min heap size (MB)  Number of inputs
Compress     j      20-98               18
Db           j      16-31               100
Mpegaudio    j      16-20               30
Mtrt         j      15-49               100
Bloat        d      22-23               976
Fop          d      72-86               224
Euler        g      16-55               14
MolDyn       g      18-21               15
MonteCarlo   g      39-74               30
Search       g      21-21               8
Suites: j = SPECjvm98, d = DaCapo, g = Java Grande
Metrics
End-to-end execution time:
  • Including start-up time
  • No replay
Challenge: non-determinism in performance, due to
  • JIT compilation
  • Thread scheduling
  • Noise from the environment
Which time to use: average, min, or max?
Statistical Analysis
Following Georges et al. [OOPSLA’07]:
  • 10 repeated runs per configuration
  • Compute a confidence interval using Student's t-distribution
  • 90% confidence interval: the interval contains the true mean running time with 90% probability
  • Interval overlap => not significantly different in performance
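A minimal sketch of the interval computation, assuming 5 or 10 runs per configuration (only those degrees of freedom are in the hard-coded t table). The function name and run times are hypothetical, not taken from the study's tooling:

```python
import math
import statistics

# Two-sided 90% critical values of Student's t (the 0.95 quantile), keyed by
# degrees of freedom n - 1. Only 5 and 10 runs are covered here;
# scipy.stats.t.ppf(0.95, df) gives values for other df.
T_CRIT_90 = {4: 2.132, 9: 1.833}


def confidence_interval_90(times: list[float]) -> tuple[float, float]:
    """Return (lower, upper) of the 90% confidence interval for the mean time."""
    n = len(times)
    mean = statistics.mean(times)
    sem = statistics.stdev(times) / math.sqrt(n)  # standard error of the mean
    half_width = T_CRIT_90[n - 1] * sem
    return mean - half_width, mean + half_width


# Hypothetical run times (seconds) from 10 repeated runs of one configuration.
runs = [10.2, 10.4, 10.1, 10.3, 10.5, 10.2, 10.3, 10.4, 10.2, 10.3]
low, high = confidence_interval_90(runs)
print(f"90% CI: [{low:.2f}, {high:.2f}] s")
```

The interval narrows as the run-to-run spread shrinks or the number of runs grows, which is why repeated runs are needed before two collectors are compared.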
Example
GC1: run times {22, 22.1, 21.9, 22.2, 21.8} (s); mean 22, confidence interval [20.5, 23.5]
GC2: run times {21.1, 20.8, 20.7, 20.7, 22.8} (s); mean 21.2, confidence interval [19.7, 22.8]
The intervals overlap => not significantly different in performance
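The decision rule on this slide reduces to a closed-interval intersection test. A minimal sketch (names are illustrative), applied to the interval bounds shown for GC1 and GC2:

```python
Interval = tuple[float, float]  # (lower bound, upper bound), in seconds


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


gc1 = (20.5, 23.5)
gc2 = (19.7, 22.8)
print(overlaps(gc1, gc2))  # True: not significantly different
```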
Outline: Top collectors vary across inputs
Top Sets of GC
The top set of collectors for an execution contains the best-performing collectors whose confidence intervals overlap with one another.
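One possible reading of this definition, sketched under stated assumptions: rank collectors by the midpoint of their confidence interval (the sample mean), start from the best one, and keep adding the next-ranked collector while its interval overlaps every interval already in the set. The greedy rule, the function names, and the interval values are hypothetical; the paper may form top sets differently.

```python
Interval = tuple[float, float]  # (lower, upper) bounds of a 90% CI, in seconds


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def top_set(intervals: dict[str, Interval]) -> set[str]:
    """Greedy top set: begin with the collector whose interval midpoint is lowest,
    then add collectors in rank order while each new interval overlaps all
    intervals already in the set; stop at the first one that does not."""
    ranked = sorted(intervals, key=lambda gc: sum(intervals[gc]) / 2)
    top = [ranked[0]]
    for gc in ranked[1:]:
        if all(overlaps(intervals[gc], intervals[t]) for t in top):
            top.append(gc)
        else:
            break
    return set(top)


# Hypothetical intervals for one execution (one input, one heap size ratio).
cis = {
    "GenMS": (10.0, 10.6),
    "GenCopy": (10.4, 11.0),
    "MarkSweep": (11.2, 11.9),
    "SemiSpace": (12.5, 13.1),
    "RefCount": (14.0, 15.2),
}
print(sorted(top_set(cis)))  # ['GenCopy', 'GenMS']
```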
Variations of Top Sets
Figure legend (top sets): {GC2}, {GC3}, {GC2, GC3}
Mtrt in Detail
Implication
Profiling-based GC selection is risky: the top collectors found on the profiled inputs may not be top collectors on other inputs.
Outline: Cross-input consistency exists
Coverage of a collector
$$\mathrm{coverage}(GC_i) = \frac{\text{\# of inputs on which } GC_i \text{ is a top collector}}{\text{total number of inputs}}$$
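Given the top set of each input, coverage is a simple count. A sketch with hypothetical top sets for four inputs (the function name is illustrative); collectors that never appear in a top set are omitted from the result, meaning their coverage is 0:

```python
from collections import Counter


def coverage(top_sets: list[set[str]]) -> dict[str, float]:
    """Fraction of inputs on which each collector belongs to the top set."""
    counts = Counter(gc for top in top_sets for gc in top)
    total = len(top_sets)
    return {gc: counts[gc] / total for gc in counts}


# Hypothetical top sets for four inputs of one program at one heap size ratio.
per_input_top_sets = [
    {"GenMS"},
    {"GenMS", "GenCopy"},
    {"GenCopy"},
    {"GenMS"},
]
print(coverage(per_input_top_sets))  # {'GenMS': 0.75, 'GenCopy': 0.5}
```

Because a top set can hold several collectors, the coverages of different collectors can sum to more than 1.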
Coverage
Risk of Using Top Collector
Implication
Profile on a spectrum of inputs and select the top collector (the one with the highest coverage).
Is that enough?
Outline: Heap size ratio matters
Coverage Changes
Implication
Profile on many inputs and at multiple heap size ratios.
Select the top collector for each heap size ratio.
  • The ratio $r = \frac{\text{heap size}}{\text{min possible heap size}}$ is input-sensitive, since the min possible heap size differs across inputs.
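A sketch of per-ratio selection, assuming coverage has already been measured at each profiled heap size ratio. The coverage numbers, the function names, and the run-time policy in `choose_collector` (fall back to the largest profiled ratio not above the actual r) are all hypothetical:

```python
# Hypothetical coverage table: coverage_by_ratio[r][gc] is the fraction of
# profiled inputs on which collector gc is a top collector at heap size ratio r.
coverage_by_ratio = {
    1: {"MarkSweep": 0.7, "GenMS": 0.4, "GenCopy": 0.1},
    2: {"MarkSweep": 0.3, "GenMS": 0.8, "GenCopy": 0.5},
    4: {"MarkSweep": 0.2, "GenMS": 0.6, "GenCopy": 0.9},
}


def select_per_ratio(cov: dict[int, dict[str, float]]) -> dict[int, str]:
    """For each heap size ratio, pick the collector with the highest coverage."""
    return {r: max(gcs, key=gcs.get) for r, gcs in cov.items()}


def choose_collector(heap_mb: float, min_heap_mb: float, selection: dict[int, str]) -> str:
    """Hypothetical run-time policy: compute r = heap / min heap and use the
    selection for the largest profiled ratio that does not exceed r
    (or the smallest profiled ratio if r is below all of them)."""
    r = heap_mb / min_heap_mb
    eligible = [ratio for ratio in selection if ratio <= r]
    ratio = max(eligible) if eligible else min(selection)
    return selection[ratio]


selection = select_per_ratio(coverage_by_ratio)
print(selection)                             # {1: 'MarkSweep', 2: 'GenMS', 4: 'GenCopy'}
print(choose_collector(100, 30, selection))  # prints GenMS (r is about 3.3, so ratio 2 is used)
```

The same heap size can thus lead to different collectors for different inputs, because the min possible heap size, and hence r, changes with the input.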
Outline: Heap size ratio is predictable
Cross-Input Prediction
Machine learning technique: training pairs <input_1, minSize_1>, ...
Regression trees learn minSize = f(input)
Details in our paper.
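The slide only names the technique, so the following is an illustration rather than the authors' model: a minimal CART-style regression tree in plain Python that learns minSize from a numeric description of each input. The single "input size" feature, the training pairs, and the function names are hypothetical. Once minSize is predicted, the heap size ratio for a given heap follows from r = heap size / min possible heap size.

```python
from statistics import mean


def sse(ys):
    """Sum of squared errors of ys around their mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(X, y, depth=0, max_depth=3, min_leaf=2):
    """Grow a regression tree by greedy SSE-minimizing splits.

    X: list of numeric feature vectors (one per input); y: min heap sizes.
    An internal node is (feature, threshold, left, right); a leaf is the mean
    target of the training inputs that reach it.
    """
    if depth == max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    best = None  # (total SSE after split, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None or best[0] >= sse(y):  # no split reduces the error
        return mean(y)
    _, f, t = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf)

    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left_idx), grow(right_idx))


def predict(tree, x):
    """Walk from the root to a leaf and return its predicted min heap size."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


# Hypothetical training pairs <input, minSize>: feature = input size (MB).
X = [[1], [2], [3], [10], [11], [12]]
y = [20, 21, 22, 60, 62, 64]  # min heap size (MB)
tree = build_tree(X, y)
min_heap = predict(tree, [2.5])
print(min_heap, 84 / min_heap)  # predicted minSize 21 MB, so r = 84 / 21 = 4
```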
Prediction Accuracy (%)
Benchmark    Suite  GC1   GC2   GC3   GC4   GC5
Compress     j      99.8  99.8  100   100   99.9
Db           j      98.1  97.4  98.2  97.0  98.2
Mpegaudio    j      100   98.1  96.3  96.0  96.8
Mtrt         j      86.1  90.5  87.4  90.5  90.7
Bloat        d      99.9  100   99.7  99.4  99.9
Fop          d      98.2  97.2  96.6  97.7  98.3
Euler        g      91.3  92.7  91.4  90.4  93.9
MolDyn       g      98.6  99.0  98.1  99.3  98.6
MonteCarlo   g      98.9  99.1  99.4  99.5  99.3
Search       g      100   100   100   100   100
Average             97.1  97.4  96.7  97.4  97.5
Conclusions
The top garbage collector is consistent across inputs for a fixed heap size ratio.
But the heap size ratio is input-sensitive.
Cross-input adaptation is therefore necessary for GC selection.
The promise of such adaptation is suggested by the predictability of min heap sizes, and hence of heap size ratios.
Acknowledgement
Steve Blackburn, anonymous reviewers, NSF CSR & CCF
Questions?
Feng Mao
The College of William and Mary Mar 2009
Cluster Intervals
Figure: confidence interval of execution time for each GC (1-5), clustered into sets:
  • Set1: {3, 1} (top set)
  • Set2: {2}
  • Set3: {4, 5}
FAQ
What is the practical use of this technique?
What about profiling overhead and input coverage?