SHiP: Signature-based Hit Predictor for High Performance Caching
*Carole-Jean Wu, #Aamer Jaleel, #,+William Hasenplaugh, *Margaret Martonosi, #Simon Steely Jr., #,+Joel Emer
*Princeton University, #Intel Corporation, VSSAD, +MIT
IEEE/ACM International Symposium on Microarchitecture (MICRO’2011)
Motivation
• Factors making caching important
• Increasing ratio of CPU speed to memory speed
• Multi-core poses challenges for shared cache management
• LRU has been the standard LLC replacement policy
• However LRU has problems!
Problems with LRU Replacement
• Working set larger than the cache causes thrashing
[Figure: working set of size Wsize larger than LLCsize; every access is a miss]
• References to non-temporal data (scans) discard the frequently referenced working set
[Figure: working set of size Wsize within LLCsize, interleaved with scans; hit and miss markers show the working set being evicted by the scans]
• Scans occur frequently in commercial workloads
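Both problems can be reproduced with a few lines of simulation. The sketch below assumes a fully associative LRU cache and single-letter line addresses; the traces and the 4-line capacity are illustrative choices, not values from the slides.

```python
from collections import OrderedDict

def lru_hits(trace, capacity):
    """Return one bool per access: True for a hit in a fully associative LRU cache."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    outcome = []
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # promote to most recently used
            outcome.append(True)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = True
            outcome.append(False)
    return outcome

# Thrashing: a cyclic working set of 5 lines in a 4-line cache never hits.
thrash = lru_hits(list("abcde") * 4, capacity=4)

# Scan: the working set {a, b} hits until a 4-line scan (w, x, y, z) evicts it.
scan = lru_hits(list("abab") + list("wxyz") + list("ab"), capacity=4)
```

In the second trace the scan lines are never reused, yet LRU keeps them and discards `a` and `b`, so the final two accesses miss.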
Desired Behavior from Cache Replacement
• Working set larger than the cache → Preserve some of working set in the cache
[ DIP (ISCA’07), DRRIP (ISCA’10) achieve this effect ]
[Figure: working set of size Wsize larger than LLCsize; accesses alternate between hit and miss]
• Recurring scans → Preserve frequently referenced working set in the cache
[ SRRIP (ISCA’10) achieves this effect ]
[Figure: working-set accesses interleaved with three scans; every working-set access is a hit]
Dynamic Re-Reference Interval Prediction ( DRRIP )
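[Figure: RRIP state diagram over re-reference predictions 0 (immediate), 1 (intermediate), 2 (far) and 3 (distant). Scan-Resistant insertion (SRRIP) enters at 2 (far); Thrash-Resistant insertion ( BRRIP ) enters at 3 (distant). States 0, 1 and 2 are marked No Victim; state 3 is marked eviction. Arrows labeled re-reference mark the transitions taken when a line is referenced again.]
[ Jaleel et al., ISCA’10 ]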
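The state diagram can be sketched for a single cache set as follows. This is a minimal sketch, assuming 2-bit prediction values, promotion to 0 on every hit, and aging of all lines when no line is at 3; the class name `RRIPSet`, the helper `brrip_insert_rrpv` and its `period` parameter are hypothetical names introduced here for illustration.

```python
FAR = 2      # SRRIP insertion point
DISTANT = 3  # lines at this value are eviction candidates

class RRIPSet:
    """One cache set managed with 2-bit re-reference prediction values (RRPVs)."""

    def __init__(self, ways):
        self.tags = [None] * ways
        self.rrpv = [DISTANT] * ways   # empty ways start as eviction candidates

    def access(self, tag, insert_rrpv=FAR):
        """Return True on a hit. On a miss, fill the victim way with insert_rrpv."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0     # re-reference: predict immediate
            return True
        while DISTANT not in self.rrpv:             # no victim yet: age every line
            self.rrpv = [v + 1 for v in self.rrpv]
        victim = self.rrpv.index(DISTANT)
        self.tags[victim] = tag
        self.rrpv[victim] = insert_rrpv
        return False

def brrip_insert_rrpv(miss_count, period=32):
    """Thrash-resistant insertion: DISTANT on most misses, FAR once per period.
    The deterministic period is an assumption made to keep the sketch testable."""
    return FAR if miss_count % period == 0 else DISTANT

# SRRIP insertion: a and b are re-referenced, then a 4-line scan passes through.
s = RRIPSet(4)
outcome = [s.access(t) for t in "ababwxyzab"]
# a and b hit again after the scan because the scan lines were inserted at FAR.
```

DRRIP chooses between the two insertion policies at run time; that selection logic is left out of this sketch.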
SRRIP Not Always Scan Resistant…
• LONG scans in access pattern
[Figure: access timeline with a “short” scan and a “long” scan, each working-set access marked hit or miss]
• Active working-set MUST be RE-REFERENCED at least ONCE between scans
[Figure: access timeline with three scans, each working-set access marked hit or miss]
Can We Be More Intelligent in Dealing with Scans?
Closer Look at Scan Access Patterns
[Figure: access stream containing two scans, with lines labeled Future Reference or No Future References]
Assuming Perfect Knowledge of Re-Reference Pattern
Improving RRIP on Cache Insertions
[Figure: RRIP state diagram over re-reference predictions 0 (immediate), 1 (intermediate), 2 (far) and 3 (distant), with No Victim at 0, 1 and 2 and eviction at 3. The insertion step is highlighted with the label Improve Insertion, and a scan label sits next to 3 (distant).]
Need to Assign DIFFERENT Re-Reference Predictions on Cache Insertion
Focus of this Paper…
• Goal: Learn re-reference interval of a cache line
[Figure: a cache access feeds a PREDICTOR, which outputs a re-reference prediction: 0: immediate, 1: intermediate, 2: far, 3: distant]
How Best to Learn the Re-Reference Interval?
Learning Re-Reference Behavior
[Figure: access stream with two scans, annotated REFERENCED BY SIMILAR SET OF PCs and REFERENCE SAME MEMORY REGION]
Can We Learn Re-References By Correlating Accesses With Some Other Information?
Using Signatures to Correlate Re-Reference
• Different types of information can serve as a “signature”:
• Memory Region
• Memory Instruction PC
• Instruction Sequence
• Observation: LLC accesses by the same “signature” tend to have similar re-reference patterns
OBSERVE, LEARN and PREDICT Re-Reference Pattern of a Signature
Observe Signature Re-Reference Behavior
• Observe re-reference pattern in the baseline cache
• Each LLC line already holds:
• Cache Tag
• Replacement State
• Coherence State
• Hardware Required (added per-line metadata):
• reuse bit: Was line re-referenced after cache insertion ( 1-bit )
• signature_insert: “Signature” responsible for cache insertion ( 14-bits )
[Figure: Load/Store Address and Signature feeding the LLC, which stores the metadata with each line]
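The slides give the signature width (14 bits) but not how it is formed. The sketch below shows two plausible ways to derive one, plus the per-line metadata. The XOR-folding hash, the 16 KB region granularity, and the names `pc_signature`, `region_signature` and `LineMetadata` are assumptions for illustration, not the authors' design.

```python
from dataclasses import dataclass

SIGNATURE_BITS = 14                        # from the slide: 14-bit signature
SIGNATURE_MASK = (1 << SIGNATURE_BITS) - 1

def pc_signature(pc):
    """Fold an instruction address into 14 bits by XOR (hypothetical hash)."""
    sig = 0
    while pc:
        sig ^= pc & SIGNATURE_MASK
        pc >>= SIGNATURE_BITS
    return sig

def region_signature(addr, region_bits=14):
    """Use the memory region number of a data address, truncated to 14 bits.
    region_bits=14 (16 KB regions) is an assumed granularity."""
    return (addr >> region_bits) & SIGNATURE_MASK

@dataclass
class LineMetadata:
    signature_insert: int     # signature responsible for inserting the line
    reuse_bit: bool = False   # set once the line is re-referenced after insertion

meta = LineMetadata(signature_insert=pc_signature(0x400123))
meta.reuse_bit = True         # what a later hit on this line would record
```

Two data addresses that fall in the same region map to the same `region_signature`, which is what lets accesses to one region share a prediction.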
Learn Signature Re-Reference Behavior
• Learn signature re-reference behavior
• Hardware Required:
• Signature History Counter Table (SHCT) ( 16K, 2-bit counters )
• SHCT Training:
• If evicted line reused:
SHCT [ signature_insert ] ++
• If evicted line NOT reused:
SHCT [ signature_insert ] --
• Counter meaning:
counter = 0, signature NOT re-referenced
counter != 0, signature re-referenced
[Figure: SHCT next to the Last Level Cache (LLC)]
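The training rule can be sketched as a table of saturating counters. This is a minimal sketch: the 16K entries and 2-bit width come from the slide, while saturation at 0 and 3, the initial counter value of 1, and the method names are assumptions.

```python
SHCT_ENTRIES = 16 * 1024   # from the slide: 16K counters
COUNTER_MAX = 3            # largest value a 2-bit counter can hold

class SHCT:
    """Signature History Counter Table with saturating 2-bit counters."""

    def __init__(self, initial=1):           # initial value is an assumption
        self.counters = [initial] * SHCT_ENTRIES

    def train_on_eviction(self, signature_insert, reuse_bit):
        """Update the counter of the signature that inserted the evicted line."""
        i = signature_insert % SHCT_ENTRIES
        if reuse_bit:
            self.counters[i] = min(self.counters[i] + 1, COUNTER_MAX)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)

    def predicts_reuse(self, signature):
        """counter != 0 means lines inserted by this signature were re-referenced."""
        return self.counters[signature % SHCT_ENTRIES] != 0

shct = SHCT()
shct.train_on_eviction(signature_insert=7, reuse_bit=False)  # a scan-like signature
shct.train_on_eviction(signature_insert=9, reuse_bit=True)   # a reused signature
```

After these two evictions, signature 7 has dropped to zero and is treated as not re-referenced, while signature 9 still predicts reuse.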
Signature-based Hit Predictor (SHiP)
• Predict re-reference interval of line using SHCT
[Figure: cache hit/miss and signature feed SHiP, which consults the SHCT and outputs a re-reference prediction: 0: immediate, 1: intermediate, 2: far, 3: distant]
Signature-based Hit Predictor (SHiP)
• Predict re-reference interval using SHCT on CACHE MISS
SHiP Re-Reference Predictions On Miss
if ( SHCT [ signature ] == 0 )
    predict DISTANT (i.e. 3)
else
    predict FAR (i.e. 2)
[Figure: cache miss and signature feed SHiP, which outputs a re-reference prediction: 0: immediate, 1: intermediate, 2: far, 3: distant]
Signature-based Hit Predictor (SHiP)
• Predict re-reference interval on CACHE HIT
SHiP Re-Reference Predictions On Hit
Always predict IMMEDIATE (i.e. 0)
[Figure: cache hit and signature feed SHiP, which outputs a re-reference prediction: 0: immediate, 1: intermediate, 2: far, 3: distant]
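The miss and hit rules from the last two slides fit in one small function. The sketch below takes the SHCT counter value as a plain integer; the function name `ship_prediction` is a hypothetical name chosen for this example.

```python
IMMEDIATE, INTERMEDIATE, FAR, DISTANT = 0, 1, 2, 3

def ship_prediction(is_hit, shct_counter):
    """Return the re-reference prediction for a line.

    is_hit       -- True for a cache hit, False for a miss (insertion)
    shct_counter -- current SHCT value for the accessing signature
    """
    if is_hit:
        return IMMEDIATE           # on a hit, always predict immediate
    if shct_counter == 0:
        return DISTANT             # signature has shown no re-reference
    return FAR                     # signature has been re-referenced before

# A line inserted by a signature with no observed reuse is first in line for eviction.
scan_like = ship_prediction(is_hit=False, shct_counter=0)
reused = ship_prediction(is_hit=False, shct_counter=2)
```

Note that INTERMEDIATE is never produced by these two rules; lines reach that value only through the replacement policy's own state updates.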
SHiP – High Level Architectural Overview
[Figure: block diagram. Inputs: Address, Access Type, Signature. Blocks: SHiP with its SHCT, and the Last Level Cache (LLC). Labeled paths: LLC hit/miss, data, Re-Reference Prediction, and SHCT Training driven by signature_insert and reuse_bit. Annotation: ~6 KB.]
• Per-Line Overhead Can Be Reduced by using Set Sampling ( need only 32 - 64 sets )
Performance Comparison of Replacement Policies
[Figure: bar chart of Performance Relative to LRU (y-axis from 1.00 to 1.15) for SRRIP, DRRIP and SHiP-PC across Mm./Games, Server, SPEC2K6 and All. Configuration: 16-way 2MB LLC, Core i7 Type Hierarchy.]
SHiP Significantly Improves Performance Across All Workload Categories
Performance Comparison of Replacement Policies
CRC Results Comparison
[Figure: bar chart of Performance Relative to LRU (y-axis from 1.00 to 1.15) for SRRIP, DRRIP, Seg-LRU, SDBP and SHiP-PC, Averaged Across PC Games, Multimedia, Enterprise Server, SPEC CPU2006 Workloads. Two configurations: 16-way 1MB Private Cache with 65 Single-Threaded Workloads, and 16-way 4MB Shared Cache with 165 4-core Workloads.]
SHiP Achieves 2X the Performance Improvement of Prior State-of-the-Art Policies
Total Storage Overhead (16-way Set Associative Cache)
• LRU: 4-bits / cache block
• Pseudo-LRU: 1-bit / cache block
• RRIP [ ISCA’10 ]: 2-bits / cache block
• Seg-LRU [ CRC’10 ]: ~8-bits / cache block
• SDBP [ MICRO’10 ]: ~10-bits / cache block
• SHiP [ MICRO’11 ]: ~5-bits / cache block
SHiP Outperforms State-of-the-Art with HW Similar to LRU
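The slides do not break down the ~5-bit figure. One accounting that lands close to it is sketched below: 2-bit RRIP state on every block, a 16K-entry SHCT of 2-bit counters, and the 15 bits of signature and reuse metadata kept only in sampled sets. The 64-byte line size, the choice of a 1 MB cache, the 64 sampled sets, and the function names are assumptions, not figures confirmed by the source.

```python
def lru_bits_per_block(ways=16):
    """Bits needed to store one position in an LRU chain of `ways` entries."""
    return (ways - 1).bit_length()

def ship_bits_per_block(cache_bytes, ways=16, line_bytes=64, rrpv_bits=2,
                        shct_entries=16 * 1024, shct_counter_bits=2,
                        signature_bits=14, reuse_bits=1, sampled_sets=64):
    """Total SHiP storage divided by the number of cache blocks (assumed breakdown)."""
    blocks = cache_bytes // line_bytes
    sets = blocks // ways
    tracked_sets = min(sampled_sets, sets)          # metadata only in sampled sets
    total_bits = (blocks * rrpv_bits                # per-block RRIP state
                  + shct_entries * shct_counter_bits  # the SHCT itself
                  + tracked_sets * ways * (signature_bits + reuse_bits))
    return total_bits / blocks

overhead_1mb = ship_bits_per_block(1024 * 1024)   # 4.9375 bits per block
lru_16way = lru_bits_per_block(16)                # 4 bits per block
```

Under these assumptions the SHCT contributes 2 bits per block and the sampled metadata just under 1 bit, which is why set sampling matters: storing 15 bits on every block would dominate the total.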
Summary
• Scan-resistance is an important problem in commercial workloads
• State-of-the-art policies do not fully address scan-resistance
• Signatures help improve re-reference predictions to address scans
• Need fine-grained re-reference predictions at insertion
• Proposed a Simple and Practical Scan-Resistant Replacement Policy
• SHiP significantly outperforms winner of CRC Championship
• SHiP requires less storage than CRC winner
• HW overhead of SHiP is comparable to LRU
Q&A
Re-Reference Interval Prediction ( RRIP )
[Figure: RRIP state diagram over re-reference predictions 0 (immediate), 1 (intermediate), 2 (far) and 3 (distant), with No Victim at 0, 1 and 2, eviction at 3, and Scan-Resistant insertion. Overlaid question: CAN INSERTION BE MORE INTELLIGENT?]
Using Signatures to Correlate Re-Reference Behavior
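[Figure: access stream a, b, a, c, scan, d, c, scan, with each access tagged by its SIGNATURE. Signatures a and c are grouped under Future Cache Hits; b and d are grouped under No Future Cache Hits.]
• Example Signatures:
• Memory Region
• Program Counter
• Instruction Decode History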
LRU vs. Re-Reference Interval Prediction (RRIP)
[Figure: an 8-way cache set under LRU. Physical Way #: 0 1 2 3 4 5 6 7. Cache Tag: s c b h f d g e. “LRU Chain” position values shown: 2 2 1 3 0 5 4 7 6.]
[Figure: the same set under RRIP. Physical Way #: 0 1 2 3 4 5 6 7. Cache Tag: s c b h f d g e. Re-Reference Prediction values shown: 0 0 2 1 0 2 2 3 3.]
RRIP Outperforms LRU with Storage Less Than LRU
Signature-based Hit Predictor (SHiP)
• Goal: Predict the re-reference behavior of a signature
• Learn Re-Reference Behavior:
[Figure: Access Type, Address and Signature inputs to the LLC, with hit/miss and data outputs]