JILP
The Journal of Instruction-Level Parallelism
1st JILP Workshop on
Computer Architecture Competitions (JWAC-1):
Cache Replacement Championship
International Symposium on Computer Architecture (ISCA 2010)
Submission Requirements
• Cache replacement algorithm
• Code that fits into provided framework
• Maximum of 3 code versions allowed per submission
• 4-page paper
Statistics
• Submissions
– 26 total papers
– 35 distinct code submissions
• Distribution
– Asia: 12
– North America: 11
– Europe: 3
Metrics
• Performance Ranking
• Overall Paper Quality
• Adherence to Competition Rules
• Qualitative Assessment of Logic Complexity
• Intuition provided
Process
• Reviews
– 26 papers
– 3 reviews per paper -> 78 reviews
– 6 reviewers -> ~13 reviews per reviewer
– 8 reviewers -> ~10 reviews per reviewer
• Phone program committee
– Shared Google docs to manage process
• 10 Papers Accepted
Types of Policies
• Cache Replacement Strategies:
• Insertion Policies
• Reuse Distance Prediction
• Dead Block Prediction
• Memory Region Based Prediction
• Counter-based Prediction
• Frequency-based Prediction
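To make the first category concrete, here is a minimal sketch. It is not any submission's algorithm. It models one cache set as a recency stack and compares the LRU baseline (new blocks enter at the MRU position) with an insertion policy that places new blocks at the LRU position. The function name `simulate_set`, its parameters, and the access pattern are assumptions made for illustration.

```python
def simulate_set(accesses, ways, insert_at_mru=True):
    """Count hits for a single cache set.

    stack[0] is the MRU position, stack[-1] is the LRU position.
    Hits always promote the block to MRU; only the insertion
    position of newly filled blocks differs between the policies.
    """
    stack = []
    hits = 0
    for addr in accesses:
        if addr in stack:
            hits += 1
            stack.remove(addr)
            stack.insert(0, addr)      # promote to MRU on a hit
        else:
            if len(stack) == ways:
                stack.pop()            # evict the block in the LRU position
            if insert_at_mru:
                stack.insert(0, addr)  # classic LRU insertion
            else:
                stack.append(addr)     # insert at the LRU position


    return hits


# Hypothetical cyclic pattern: 5 distinct blocks competing for 4 ways.
pattern = [0, 1, 2, 3, 4] * 4
lru_hits = simulate_set(pattern, ways=4, insert_at_mru=True)
lru_insert_hits = simulate_set(pattern, ways=4, insert_at_mru=False)
print("MRU insertion hits:", lru_hits)
print("LRU-position insertion hits:", lru_insert_hits)
```

On a cyclic pattern slightly larger than the set, MRU insertion evicts every block just before it is reused, while LRU-position insertion keeps part of the working set resident.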
Thanks
• Organizing Committee
– Aamer Jaleel, Intel (Chair)
– Alaa Alameldeen, Intel
– Moin Qureshi, IBM
• Sponsorship/Web
– Eric Rotenberg
• Program Committee
– Doug Burger, Microsoft
– Mainak Chaudhuri, IITK
– Aamer Jaleel, Intel
– Gabriel Loh, Georgia Tech
– Moinuddin Qureshi, IBM
– Yan Solihin, NC State
RESULTS
Experimental Framework
• Common framework
• Allows for comparison of competing algorithms
• Trace driven performance model
• 4-way OoO core
• 3-level Cache Hierarchy
• 32KB L1, 256KB L2
• Competition Focus: Replacement Policies for LLC (L3)
• Private Cache: 1MB LLC (single core)
• Shared Cache: 4MB LLC (4-core CMP)
Workloads
• Workload Classes
• SPEC CPU2006 – Reference Inputs (29)
• PC Games and Multimedia (22)
• Enterprise Server (14)
• Tracing Methodology:
• SPEC workload traces captured with Pin (using SimPoint)
• Non-SPEC workloads captured on a HW tracing system
• Simulation Methodology:
• Warm up: 100M instructions
• Detailed Simulation: 100M instructions
• Shorter traces were divided 50/50
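The simulation methodology above can be sketched as a small helper. This is an illustration under stated assumptions, not the framework's code: `split_trace` is a hypothetical name, "shorter" is taken to mean fewer than 200M instructions in total, and an odd leftover instruction is assigned to the measured phase.

```python
WARMUP = 100_000_000    # warm-up instructions, from the slide
DETAILED = 100_000_000  # detailed-simulation instructions, from the slide


def split_trace(length, warmup=WARMUP, detailed=DETAILED):
    """Return (warmup_instructions, measured_instructions) for a trace.

    Traces long enough for both phases use the fixed budgets.
    Shorter traces are divided 50/50 (assumed threshold: warmup + detailed).
    """
    if length >= warmup + detailed:
        return warmup, detailed
    half = length // 2
    return half, length - half


print(split_trace(250_000_000))
print(split_trace(120_000_000))
```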
Experiments
• Single Threaded Workloads
• All 65 traces
• Heterogeneous Multi-Programmed Workloads
• 7 workloads selected from each of the three workload classes (21 in total)
• 4-core combinations created for each class (7 choose 4 = 35 per class)
• 35 random selections created from all 21 workloads
• Total # of Workloads For Shared Caches: 140
• Metrics:
• ST Workloads: Throughput
• Multi-Core Workloads: Weighted Speedup
All workloads kept secret from ALL contestants
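The workload counts and the multi-core metric can be sketched as follows. The trace names are hypothetical, since the real workloads were kept secret. The slides do not define weighted speedup, so the function below uses the common definition (the sum over cores of IPC when sharing the cache divided by IPC when running alone) as an assumption.

```python
from itertools import combinations
import random

CORES = 4
PER_CLASS = 7  # workloads selected from each class

# Hypothetical trace names standing in for the secret workloads.
classes = {
    "spec": [f"spec_{i}" for i in range(PER_CLASS)],
    "games": [f"games_{i}" for i in range(PER_CLASS)],
    "server": [f"server_{i}" for i in range(PER_CLASS)],
}

# All 4-trace combinations within each class: C(7, 4) = 35 per class.
class_mixes = [
    mix
    for traces in classes.values()
    for mix in combinations(traces, CORES)
]

# 35 further mixes drawn at random from the pooled 21 traces.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
pool = [t for traces in classes.values() for t in traces]
random_mixes = [tuple(rng.sample(pool, CORES)) for _ in range(35)]

all_mixes = class_mixes + random_mixes


def weighted_speedup(ipc_shared, ipc_alone):
    """Sum of per-core IPC ratios (shared run over stand-alone run)."""
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))


print(len(class_mixes), len(random_mixes), len(all_mixes))
print(weighted_speedup([1.0, 0.5, 2.0, 1.5], [2.0, 1.0, 2.0, 3.0]))
```

A result "relative to LRU" would then be the ratio of a policy's weighted speedup to the weighted speedup of the same mix under LRU.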
Private Cache Championship Results
[Figure: bar chart, one bar per code submission. Y-axis: Performance Relative to LRU, ticks from 0.95 to 1.03. X-axis order: paper19_v1, paper19_v3, paper19_v2, paper05, paper04, paper08, paper24, paper12, paper18, paper07_v2, paper14, paper07_v1, paper16_v1, paper13, paper16_v2, paper29, paper10, paper23, paper26, paper16_v3, paper27, paper17, paper28, paper22, paper21_v1, paper25, paper20_v1, paper20_v2, paper21_v2, paper15, paper09_v2, paper09_v1, paper06, paper20_v3, paper11.]
Private Cache Championship Awards
• 3rd Place:
• D. Jimenez. Dead Block Replacement and Bypass with a Sampling Predictor
• 2nd Place:
• P. Michaud. The 3P and 4P cache replacement policies
• Champion:
• H. Gao and C. Wilkerson. A Dueling Segmented LRU Replacement Algorithm with Adaptive Bypassing
Shared Cache Championship
[Figure: bar chart, one bar per code submission. Y-axis: Performance Relative to LRU, ticks from 0.95 to 1.08. X-axis order: paper19_v1, paper19_v3, paper19_v2, paper24, paper05, paper16_v3, paper29, paper04, paper22, paper16_v2, paper12, paper08, paper07_v2, paper16_v1, paper18, paper17, paper10, paper07_v1, paper21_v1, paper13, paper26, paper14, paper20_v2, paper21_v2, paper20_v1, paper27, paper25, paper28, paper09_v1, paper06, paper09_v2, paper20_v3, paper23, paper11, paper15.]
Shared Cache Championship Awards
• 3rd Place:
• P. Michaud. The 3P and 4P cache replacement policies
• 2nd Place:
• Y. Ishii, M. Inaba, and K. Hiraki. Map-based Adaptive Insertion Policy
• Champion:
• H. Gao and C. Wilkerson. A Dueling Segmented LRU Replacement Algorithm with Adaptive Bypassing