PACMan - Aamer Jaleel


PACMan: Prefetch-Aware Cache Management for High Performance Caching
Carole-Jean Wu¶, Aamer Jaleel*, Margaret Martonosi¶, Simon
Steely Jr.*, Joel Emer*§
Princeton University¶
Intel VSSAD*
MIT§
December 7, 2011
International Symposium on Microarchitecture
Memory Latency is Performance Bottleneck
• Many commonly studied memory optimization techniques
• Our work studies two:
– Prefetching
• For our workloads, prefetching alone improves performance by an avg. of 35%
– Intelligent Last-Level Cache (LLC) Management
[Figure: IPC performance normalized to LRU with no prefetching (LLC management alone), for LRU, DRRIP [ISCA '10], SDBP [MICRO '10], and SHiP-PC [MICRO '11]]
L2 Prefetcher: LLC Misses
[Diagram: four cores (CPU0 to CPU3), each with private L1I, L1D, and L2 caches, sharing the LLC. Labels: PF requests from the L2 prefetchers, an L2 miss, and an LLC miss]
L2 Prefetcher: LLC Hits
[Diagram: four cores (CPU0 to CPU3), each with private L1I, L1D, and L2 caches, sharing the LLC. Labels: PF requests from the L2 prefetchers, an L2 miss, and an LLC hit]
Prefetching
Intelligent LLC Management
Observation 1:
For Not-Easily-Prefetchable Applications…
[Figure: IPC performance with prefetching (normalized to LRU without prefetching) for LRU, DRRIP, SDBP, and SHiP-PC. Workload categories: Mm./Games, Server, SPEC CPU2006. Benchmarks: doom3, final-fantasy, halflife2, GG, IB, tpc-c, bwaves, gemsFDTD, sphinx3]
Observation 1: Cache pollution causes unexpected performance
degradation despite intelligent LLC Management
Observation 2:
For Prefetching-Friendly Applications
[Figure: IPC performance normalized to LRU on SPEC CPU2006 with no prefetching, for LRU, DRRIP, SDBP, and SHiP-PC; annotated gain: 6.5%+]
Observation 2: Prefetched data in LLC diminishes the performance
gains from intelligent LLC management.
[Figure: IPC performance normalized to LRU on SPEC CPU2006 with prefetching, for LRU, DRRIP, SDBP, and SHiP-PC; annotated gain: 3.0%+]
Design Dimensions for Prefetcher/Cache Management
Approach | Prefetcher Cache Interference | Reduced Perf. Gains from Intelligent LLC Management | Hardware Overhead
Adaptive prefetch filters/buffers | ✔ | ✗ | Some (new hw.)
Prefetch pollution estimation | ✔ | ✗ | Moderate (pf. bit/line)
Perf. counter-based prefetcher manager | ✔ | ✗ | Software
Synergistic management for prefetchers and intelligent LLC management
PACMan: Prefetch-Aware Cache Management
Research Question 1:
For applications suffering from prefetcher cache pollution,
can PACMan minimize such interference?
Research Question 2:
For applications already benefiting from prefetching,
can PACMan improve performance even more?
Talk Outline
• Motivation
• PACMan: Prefetch-Aware Cache Management
– PACMan-M
– PACMan-H
– PACMan-HM
– PACMan-Dyn
• Performance Evaluation
• Conclusion
Opportunities for a More Intelligent Cache
Management Policy
• A cache line’s state is naturally updated when
– Inserting an incoming cache line @ cache miss
– Updating a cache line’s state @ cache hit
Re-Reference Interval Prediction (RRIP) [ISCA '10]
[Diagram: RRIP state machine over re-reference prediction values 0 (immediate), 1 (intermediate), 2 (far), and 3 (distant). Transitions: cache line is inserted; cache line is re-referenced; no victim is found; cache line is evicted]
PACMan treats demand and prefetch requests differently at cache insertion and hit promotion
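The RRIP state machine on this slide can be sketched for a single cache set as follows. This is a minimal illustration, not the simulator used in the talk: the class and method names are hypothetical, and the sketch assumes lines are inserted at "far" (2) and promoted to "immediate" (0) on a re-reference, which is one common RRIP configuration.

```python
from dataclasses import dataclass
from typing import List

RRPV_MAX = 3  # 2-bit value: 0 immediate, 1 intermediate, 2 far, 3 distant


@dataclass
class Line:
    tag: int
    rrpv: int


class RRIPSet:
    """One cache set managed with a 2-bit re-reference prediction value."""

    def __init__(self, ways: int, insert_rrpv: int = 2):
        self.ways = ways
        self.insert_rrpv = insert_rrpv  # assumption: insert at "far"
        self.lines: List[Line] = []

    def access(self, tag: int) -> bool:
        """Return True on a hit; on a miss, insert the line and return False."""
        for line in self.lines:
            if line.tag == tag:
                line.rrpv = 0  # re-referenced: predict immediate reuse
                return True
        if len(self.lines) == self.ways:
            self._evict()
        self.lines.append(Line(tag, self.insert_rrpv))
        return False

    def _evict(self) -> None:
        # Evict the first line predicted "distant". If no victim is found,
        # age every line by one step and search again.
        while True:
            for i, line in enumerate(self.lines):
                if line.rrpv == RRPV_MAX:
                    del self.lines[i]
                    return
            for line in self.lines:
                line.rrpv += 1
```

With a 2-way set, accessing tags 1, 2, 1, 3 leaves tag 1 resident because its hit moved it to "immediate", while tag 2 ages to "distant" first and is evicted.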
PACMan-M: Treat Prefetch Requests Differently at Cache Misses
• Reducing prefetcher cache pollution at cache line insertion
[Diagram: RRIP state machine (0 immediate, 1 intermediate, 2 far, 3 distant) with separate insertion points for demand and prefetch requests]
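The insertion rule of PACMan-M can be sketched as a small function. The exact insertion positions are an assumption based on a reading of the diagram: prefetch fills go to "distant" (3) so they are the first eviction candidates, while demand fills keep whatever position the baseline policy chooses (shown here as "far", 2). The function and constant names are hypothetical.

```python
RRPV_FAR = 2
RRPV_DISTANT = 3


def insertion_rrpv(is_prefetch: bool, baseline_rrpv: int = RRPV_FAR) -> int:
    """Re-reference prediction value for a line filled on an LLC miss.

    Assumption: prefetch fills are inserted at "distant"; demand fills
    follow the baseline policy's own insertion decision.
    """
    if is_prefetch:
        return RRPV_DISTANT
    return baseline_rrpv
```

Passing the baseline value as a parameter keeps the sketch usable with a baseline that varies its insertion position at run time.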
PACMan-H: Treat Prefetch Requests Differently at Cache Hits
• Retaining more “valuable” cache lines at cache hit promotion
[Diagram: RRIP state machine (0 immediate, 1 intermediate, 2 far, 3 distant) with separate hit-promotion transitions for demand hits and prefetch hits]
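The hit-promotion rule of PACMan-H can be sketched the same way. The behavior shown is an assumption based on a reading of the diagram: a demand hit promotes the line to "immediate" (0), while a prefetch hit leaves the line's state unchanged so that prefetch traffic alone does not keep a line resident. The function name is hypothetical.

```python
def rrpv_after_hit(current_rrpv: int, is_prefetch: bool) -> int:
    """Re-reference prediction value of a line after an LLC hit.

    Assumption: only demand hits promote the line; prefetch hits
    leave its prediction value as it was.
    """
    if is_prefetch:
        return current_rrpv
    return 0
```

For example, a line sitting at "far" (2) stays at 2 after a prefetch hit and moves to 0 after a demand hit.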
PACMan-HM = PACMan-H + PACMan-M
[Diagram: RRIP state machine (0 immediate, 1 intermediate, 2 far, 3 distant) with separate transitions for demand misses, prefetch misses, demand hits, and prefetch hits]
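PACMan-Dyn dynamically chooses between static PACMan policies
[Diagram: Set Dueling. Three groups of set dueling monitors (SDMs) run Baseline + PACMan-H, Baseline + PACMan-M, and Baseline + PACMan-HM, each feeding a counter (Cnt policy1, Cnt policy2, Cnt policy3). Policy selection takes the MIN of the counters, and the result is the policy used by the follower sets]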
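The set-dueling selection in this diagram can be sketched as below. It is a simplified illustration under stated assumptions: each monitor group counts misses in its own sets, follower sets adopt the policy with the smallest count, and monitor sets are spread evenly across the cache. The class name, the number of monitor sets, and the even-stride mapping are hypothetical, and real hardware would likely use saturating counters rather than unbounded integers.

```python
from typing import Dict

POLICIES = ("PACMan-H", "PACMan-M", "PACMan-HM")


class SetDuelingSelector:
    """Pick the policy for follower sets from per-policy miss counters."""

    def __init__(self, num_sets: int, sdm_sets_per_policy: int = 32):
        total_sdm_sets = sdm_sets_per_policy * len(POLICIES)
        stride = num_sets // total_sdm_sets
        if stride < 1:
            raise ValueError("not enough sets for the requested monitors")
        self.miss_count: Dict[str, int] = {p: 0 for p in POLICIES}
        # Hypothetical mapping: every stride-th set is a monitor set,
        # cycling through the three policies.
        self.monitor: Dict[int, str] = {
            i * stride: POLICIES[i % len(POLICIES)]
            for i in range(total_sdm_sets)
        }

    def record_miss(self, set_index: int) -> None:
        """Count a miss if it occurred in a monitor set."""
        policy = self.monitor.get(set_index)
        if policy is not None:
            self.miss_count[policy] += 1

    def policy_for(self, set_index: int) -> str:
        """Monitor sets keep their fixed policy; followers use the MIN."""
        fixed = self.monitor.get(set_index)
        if fixed is not None:
            return fixed
        return min(POLICIES, key=lambda p: self.miss_count[p])
```

A simulator would call `record_miss` on every LLC miss and consult `policy_for` before each insertion or hit update.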
Evaluation Methodology
• CMP$im simulation framework
– 4-way OOO processor
– 128-entry ROB
– 3-level cache hierarchy
• L1 inst. and data caches: 32KB, 4-way, private, 1-cycle
• L2 unified cache: 256KB, 8-way, private, 10-cycle
• L3 last-level cache: 1MB per core, 16-way, shared, 30-cycle
– Main memory: 32 outstanding requests, 200-cycle
• Streamer prefetcher – 16 stream detectors
• DRRIP-based LLC: 2-bit RRIP counter
PACMan-HM Outperforms PACMan-H and PACMan-M
While PACMan policies improve performance overall, static PACMan policies can hurt some applications, e.g., bwaves and gemsFDTD
[Figure: Performance normalized to LRU in the presence of prefetching for DRRIP, PACMan-M, PACMan-H, and PACMan-HM. Benchmarks: doom3, final-fantasy, halflife2, GG, IB, tpc-c, bwaves, gemsFDTD, sphinx3; averages for Multimedia, Server, SPEC2K6, and All]
PACMan-Dyn:
Better and More Predictable Performance Gains
[Figure: Performance normalized to LRU in the presence of prefetching for DRRIP, PACMan-M, PACMan-H, PACMan-HM, and PACMan-DYN. Benchmarks: doom3, final-fantasy, halflife2, GG, IB, tpc-c, bwaves, gemsFDTD, sphinx3; averages for Multimedia, Server, SPEC2K6, and All]
PACMan-Dyn performs the best (overall) while providing
more consistent performance gains.
PACMan: Prefetch-Aware Cache Management
Research Question 1:
For applications suffering from prefetcher cache pollution,
can PACMan minimize such interference?
Research Question 2:
For applications already benefiting from prefetching,
can PACMan improve performance even more?
PACMan Combines Benefits of Intelligent LLC Management and Prefetching
[Figure: IPC performance normalized to baseline LRU without prefetching, with prefetching enabled, for LRU, DRRIP, PACMan-HM, and PACMan-DYN across Mm./Games, Server, SPEC CPU2006, and All. Regions labeled "Prefetch-Induced LLC Interference" and "Prefetching Friendly"; annotations: 22% better, 15% better]
Other Topics in the Paper
• PACMan-Dyn-Local/Global for multiprog.
workloads
– An avg. of 21.0% perf. improvement
• PACMan cache size sensitivity
• PACMan for inclusive, non-inclusive, and
exclusive cache hierarchies
• PACMan’s impact on memory bandwidth
PACMan Conclusion
• First synergistic approach for prefetching and
intelligent LLC management
• Prefetch-aware cache insertion and update
– ~21% performance improvement
– Minimal hardware storage overhead
• PACMan’s Fine-Grained Prefetcher Control
– Reduces performance variability from prefetching