Transcript Slide 1

2011 International Symposium on Performance Analysis of Systems and Software (ISPASS)
Characterization and Dynamic Mitigation
of Intra-Application Cache Interference
Carole-Jean Wu and Margaret Martonosi
Princeton University
4/11/2011
1/23
Today’s CMP systems
Operating System
App. 1
App. 2
App. 3
App. 4
Within a single application,
cache interference can stem from…
Operating System
App. 1
HW Prefetch Req.
TLB Miss Handling
Other OS Req.
App. Data ld/st
2/23
Real-System LLC Miss Characterization
[Figure: stacked bar chart of LLC misses. Y-axis: Percentage (%), 0% to 100%. Series: Application LLC Misses; Others (prefetching, page table walks, etc.).]
>50% of LLC misses are due to prefetching, TLB miss handling, other OS references, etc.
3/23
Prior Work for Intra-Application
Cache Interference
• System-induced Cache Interference
– Characterization indicates significant OS/user cache interference
[Agarwal et al. TOCS ’88] [Torrellas et al. ASPLOS ’92]
– Reduce TLB miss handling effects
[Jacob, Mudge ASPLOS ’98][Bhargava et al. ASPLOS ’08] [Barr, Cox, and Rixner ISCA ’10]
• Prefetch-induced Cache Interference
– Prefetch buffer/filter
[Peir et al. ICS ’02] [Hur and Lin MICRO ’06]
– Replacement policies (prefetch bit per cache line)
[Alameldeen and Wood ISCA ’07] [Lin et al. HPCA ’01]
– Prefetching algorithms
[Ebrahimi et al. MICRO ’09] [Nesbit et al. ISCA ’07] [Iacobovici et al. ICS ’04]
But all require hardware modification.
4/23
Contributions of This Paper
1. Cache interference within an application is a problem
– Real-system characterization
– Detailed full-system simulation
2. Dynamic management mechanisms
– System-aware cache management
– Real-system, real-time prefetch manager
5/23
Talk Outline
• Motivation and Prior Work
• Measurement Methodology
• Intra-Application Interference Characterization
• Dynamic Mitigation of LLC Interference
– System-Aware Cache Management
– Real-System Dynamic Prefetch Manager
• Conclusion
6/23
Measurement Methodology
• Real-system infrastructure
– Intel Nehalem-based Core i7 (Bloomfield)
– perfmon2 to access hardware PMCs
• Full-system simulation
– Simics/GEMS
• Benchmarks
– SPEC CPU2006 benchmark suite
7/23
System-Mode Reference Breakdown
[Figure: stacked bar chart, System-Mode Reference Breakdown. Y-axis: 0% to 100%. Series: page table walk references; other system-mode references.]
80% of system references are due to TLB miss handling (details in the paper).
8/23
Memory Reuse Characteristics Analysis
for User References
[Figure: bar chart, User-Mode References. Y-axis: zero-reused cache lines, 0% to 100%. X-axis: mcf, sphinx3, sjeng, bzip2, Avg. Series: zero-reused cache lines [baseline]; zero-reused cache lines [user only]. Cache diagram labels: User, System.]
System cache lines destroy the good data locality of user lines when sharing the cache!
9/23
Memory Reuse Characteristics Analysis
for System References
[Figure: bar chart, System-Mode References. Y-axis: zero-reused cache lines, 0% to 100%. X-axis: mcf, sphinx3, sjeng, bzip2, Avg. Series: zero-reused cache lines [baseline]; zero-reused cache lines [system only]. Cache diagram labels: User, System.]
Majority of system cache lines are not reused.
Bypassing lines?
10/23
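The zero-reuse metric in the two charts above can be illustrated with a toy model. The sketch below is not the authors' measurement setup: it assumes a small fully associative LRU cache, a hypothetical trace of (line address, mode) pairs, and it counts a line as zero-reused when it is evicted without ever being hit after its fill.

```python
from collections import OrderedDict

def zero_reuse_fraction(trace, capacity):
    """Return {mode: fraction of evicted lines that were never hit after fill}.

    trace: iterable of (line_address, mode) pairs, mode is "user" or "system"
           (a made-up trace format used only for this illustration).
    capacity: number of lines in the toy fully associative LRU cache.
    """
    cache = OrderedDict()  # address -> [mode, hit_count]; last entry is MRU
    evicted = {"user": 0, "system": 0}
    zero_reused = {"user": 0, "system": 0}
    for addr, mode in trace:
        if addr in cache:
            cache[addr][1] += 1          # reuse: count the hit
            cache.move_to_end(addr)      # and promote the line to MRU
            continue
        if len(cache) >= capacity:
            _, (old_mode, hits) = cache.popitem(last=False)  # evict LRU line
            evicted[old_mode] += 1
            if hits == 0:
                zero_reused[old_mode] += 1
        cache[addr] = [mode, 0]          # fill at MRU with zero reuses so far
    return {m: (zero_reused[m] / evicted[m] if evicted[m] else 0.0)
            for m in evicted}
```

Running the same function on a trace with the system references filtered out would give a rough analogue of the "user only" bars, under the same toy-cache assumptions.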
System-Aware Cache Management
[Diagram: incoming reference (Refs) 0xEEEA and a cache set drawn as a recency stack with LRU and MRU ends marked.]
11/23
System-Aware Cache Management
[Diagram: references (Refs) and a recency stack with MRU, MID, and LRU positions marked. Addresses shown: 0x001A, 0xDADA, 0xEEAF, 0x1234, 0xDFAE, 0xEEEA.]
12/23
System-Aware Cache Management
SYS-LRUinsert
[Diagram: user and system references (Refs) and a recency stack with MRU, MID, and LRU positions marked. Addresses shown: 0xDADA, 0xEEAF, 0x1234, 0xDFAE, 0xEEEA.]
13/23
System-Aware Cache Management
SYS-MIDinsert
[Diagram: user and system references (Refs) and a recency stack with MRU, MID, and LRU positions marked. Addresses shown: 0xDADA, 0xEEAF, 0x1234, 0xEEEA.]
14/23
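A minimal sketch of the two static policies named on these slides, reading the names literally: user lines are filled at MRU as usual, while system lines are filled at the LRU position (SYS-LRUinsert) or at the middle of the stack (SYS-MIDinsert). The function name, the `ways` parameter, the "baseline" label, and the choice to promote every hit to MRU are assumptions made for illustration, not details taken from the talk.

```python
def access(stack, addr, is_system, policy, ways=8):
    """Update one set's recency stack (index 0 = MRU, last = LRU).

    Returns True on a hit, False on a miss. `policy` is one of
    "baseline", "SYS-MIDinsert", "SYS-LRUinsert".
    """
    if addr in stack:
        stack.remove(addr)
        stack.insert(0, addr)       # assumption: any hit is promoted to MRU
        return True
    if len(stack) >= ways:
        stack.pop()                 # evict the line in the LRU position
    if not is_system or policy == "baseline":
        pos = 0                     # ordinary MRU insertion
    elif policy == "SYS-MIDinsert":
        pos = len(stack) // 2       # system line goes to the middle
    elif policy == "SYS-LRUinsert":
        pos = len(stack)            # system line goes to the LRU end
    else:
        raise ValueError(f"unknown policy: {policy}")
    stack.insert(pos, addr)
    return False
```

With a 4-way set holding user lines a, b, c, d, a system miss under SYS-LRUinsert replaces only d and is itself the next victim, so a, b, and c survive the following user miss. Under the baseline, the same system line lands at MRU and pushes c out one miss later.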
System-Aware Cache Management
SYS-DYNAMIC
[Diagram: user and system references (Refs) and a recency stack with MRU, MID, and LRU positions marked. Addresses shown: 0xDADA, 0xEEAF, 0x1234, 0xBEEF.]
*Set sampling: DIP [Qureshi et al. ISCA ’07]
15/23
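The slide credits DIP-style set sampling for SYS-DYNAMIC. The sketch below shows the general set-dueling idea only: a few leader sets always run one candidate policy, a saturating counter tracks which leaders miss more, and all other sets follow the current winner. Which two policies SYS-DYNAMIC actually duels, the leader-set spacing, and the counter width are not given in the transcript; the values and class name here are hypothetical.

```python
class SetDuelingSelector:
    """Pick an insertion policy per set using DIP-style set dueling."""

    def __init__(self, policy_a, policy_b, leader_stride=32, counter_bits=10):
        self.policy_a = policy_a
        self.policy_b = policy_b
        self.stride = leader_stride               # assumed leader-set spacing
        self.max = (1 << counter_bits) - 1        # saturating counter ceiling
        self.psel = self.max // 2                 # start at the midpoint

    def leader(self, set_index):
        """Return the policy a leader set is pinned to, or None for followers."""
        r = set_index % self.stride
        if r == 0:
            return self.policy_a
        if r == 1:
            return self.policy_b
        return None

    def record_miss(self, set_index):
        """Misses in A-leaders push the counter up, misses in B-leaders push it down."""
        owner = self.leader(set_index)
        if owner == self.policy_a:
            self.psel = min(self.max, self.psel + 1)
        elif owner == self.policy_b:
            self.psel = max(0, self.psel - 1)

    def policy_for(self, set_index):
        """Leaders keep their pinned policy; followers use whichever misses less."""
        owner = self.leader(set_index)
        if owner is not None:
            return owner
        return self.policy_b if self.psel > self.max // 2 else self.policy_a
```

A cache model would call `policy_for(set_index)` on each fill and `record_miss(set_index)` on each miss, so the choice adapts as the workload changes phase.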
IPC Performance Improvement
[Figure: bar chart. Y-axis: aggregate IPC normalized to baseline (higher is better), 0.8 to 1.3. Series: SYS-LRUinsert; SYS-MIDinsert; SYS-DYNAMIC.]
SYS-DYNAMIC improves performance for ALL applications by as much as 10% (avg. of 3%).
16/23
Talk Outline
• Motivation and Prior Work
• Measurement Methodology
• Intra-Application Interference Characterization
• Dynamic Mitigation of LLC Interference
– System-Aware Cache Management
– Real-System Dynamic Prefetch Manager
• Conclusion
17/23
Intra-application cache interference can also stem from hardware prefetching
L1 Instruction & Streamer Prefetchers
Mid-Level Cache (MLC) Spatial & Streamer Prefetchers
18/23
Intra-Application Interference Caused
by Hardware Prefetching
[Figure: bar chart, Application LLC Misses. Y-axis: miss counts normalized to system default (all prefetchers on), 0 to 3.]
MLC prefetcher OFF → fewer LLC misses for libquantum and sphinx3.
19/23
Dynamic Prefetch Management
• Use Nehalem’s Precise Event Based Sampling (PEBS)
• Sample application inst. count periodically.
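[Timeline diagram (time axis): K instructions with MLC prefetchers ON, bracketed by RDTSC reads t0 and t1; then K instructions with MLC prefetchers OFF, ending at RDTSC read t2; then an interval labeled N, shown with MLC prefetchers ON.]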
if (t2 - t1 > t1 - t0)
    Turn ON MLC prefetchers;
else
    Turn OFF MLC prefetchers;
20/23
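The decision rule above can be sketched as one sampling round: time K instructions with the MLC prefetchers on, time K instructions with them off, and keep whichever setting finished faster. The three callables are hypothetical stand-ins; on the real system the timestamps come from RDTSC and the instruction counting from PEBS, and the mechanism for toggling the prefetchers is not described in the transcript.

```python
def choose_prefetch_setting(run_k_instructions, read_cycles, set_prefetchers):
    """Run one ON/OFF sampling round; return True if prefetchers are left ON.

    run_k_instructions: callable that lets the application retire K instructions
    read_cycles:        callable returning a cycle timestamp (stand-in for RDTSC)
    set_prefetchers:    callable taking True/False to enable/disable MLC prefetchers
    """
    set_prefetchers(True)
    t0 = read_cycles()
    run_k_instructions()            # K instructions with prefetchers ON
    t1 = read_cycles()

    set_prefetchers(False)
    run_k_instructions()            # K instructions with prefetchers OFF
    t2 = read_cycles()

    # If the OFF interval took longer than the ON interval, prefetching helps.
    keep_on = (t2 - t1) > (t1 - t0)
    set_prefetchers(keep_on)
    return keep_on
```

Injecting the callables keeps the sketch testable with a fake clock. A real manager would repeat the round periodically, running the chosen setting for the longer interval between rounds.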
Dynamic Management Mitigating
Prefetch-Induced LLC Interference
[Figure: bar chart. Y-axis: application LLC miss counts normalized to system default, 0 to 3. Series: Prefetchers On (System Default); Prefetchers Off; Dynamic Management.]
Dynamic modulation of MLC prefetchers >> static ON/OFF prefetch options.
21/23
Summary
• Dynamic System-Aware Cache Management
– Full-system evaluation (OS effects)
– Performance improvement by as much as 10% (3% on average)
• Real-time Dynamic Prefetch Manager
– Real-system implementation on Nehalem PEBS
– 25% LLC miss count reduction → higher performance, bandwidth and energy savings
22/23
Characterization and Dynamic Mitigation of
Intra-Application Cache Interference
Operating System
App. 1
HW Prefetch Req.
TLB Miss Handling
Other OS Req.
App. Data ld/st
*Intra-application* cache interference from modern hardware prefetching & OS influences app. performance significantly!
23/23
2011 International Symposium on Performance Analysis of Systems and Software (ISPASS)
Characterization and Dynamic Mitigation
of Intra-Application Cache Interference
Carole-Jean Wu and Margaret Martonosi
{carolewu, mrm}@princeton.edu