SHiP: Signature-based Hit Predictor for High Performance Caching
*Carole-Jean Wu, #Aamer Jaleel, #,+William Hasenplaugh,
*Margaret Martonosi, #Simon Steely Jr., #,+Joel Emer
*Princeton University
#Intel Corporation, VSSAD
+MIT
IEEE/ACM International Symposium on Microarchitecture (MICRO’2011)
Motivation
• Factors making caching important
• Increasing ratio of CPU speed to memory speed
• Multi-core makes shared cache management more challenging
• LRU has been the standard LLC replacement policy
• However, LRU has problems!
Problems with LRU Replacement
• Working set larger than the cache causes thrashing
[Figure: working set (Wsize) larger than the cache (LLCsize); every reference is a miss]
• References to non-temporal data (scans) discard the frequently referenced working set
[Figure: working set (Wsize) smaller than the cache (LLCsize); references hit until a scan, and references after each scan miss]
• Scans occur frequently in commercial workloads
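The two failure modes above can be reproduced with a few lines of Python. This is a minimal sketch, assuming a fully associative 4-line cache and made-up traces; the helper name `lru_hits`, the cache size, and the traces are illustrative and do not come from the slides.

```python
from collections import OrderedDict

def lru_hits(trace, capacity):
    """Count hits for a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()              # keys ordered from LRU to MRU
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)    # promote to MRU on a hit
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the LRU line
            cache[addr] = True
    return hits

# Thrashing: a cyclic working set of 5 lines in a 4-line cache.
thrash_trace = list(range(5)) * 10

# Scans: a 3-line working set touched twice per round,
# followed by 4 lines that are never referenced again.
scan_trace = []
for round_no in range(3):
    scan_trace += ["a", "b", "c"] * 2
    scan_trace += [("scan", round_no, i) for i in range(4)]

thrash_hits = lru_hits(thrash_trace, capacity=4)
scan_hits = lru_hits(scan_trace, capacity=4)
```

With these traces `thrash_hits` is 0 out of 50 references, and `scan_hits` is 9 out of 30: each scan pushes the working set out, so the first pass over it after a scan always misses.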
Desired Behavior from Cache Replacement
• Working set larger than the cache: preserve some of the working set in the cache
[Figure: working set (Wsize) larger than the cache (LLCsize); references alternate between hit and miss]
[ DIP (ISCA’07), DRRIP (ISCA’10) achieve this effect ]
• Recurring scans: preserve the frequently referenced working set in the cache
[Figure: working-set references keep hitting across repeated scans]
[ SRRIP (ISCA’10) achieves this effect ]
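Dynamic Re-Reference Interval Prediction ( DRRIP )
[Figure: chain of re-reference predictions: 0 = immediate, 1 = intermediate, 2 = far, 3 = distant. Lines at 0, 1, and 2 are not victims; a line at 3 (distant) is evicted. A re-reference moves a line toward 0. The chain is annotated with the scan-resistant insertion point ( SRRIP ) and the thrash-resistant insertion point ( BRRIP ).]
[ Jaleel et al., ISCA’10 ]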
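The RRIP chain can be sketched as a small Python model of one cache set. This is an illustration under stated assumptions, not the authors' implementation: the class name `RRIPSet` is hypothetical, a hit resets the prediction to immediate (0), and new lines are inserted at far (2), which is the SRRIP behavior described in Jaleel et al. BRRIP, which inserts most lines at distant (3), is not modeled.

```python
RRPV_MAX = 3   # 2-bit re-reference prediction value: 3 = distant

class RRIPSet:
    """One cache set managed by RRIP (hypothetical sketch)."""

    def __init__(self, ways, insert_rrpv=2):
        self.tags = [None] * ways
        self.rrpv = [RRPV_MAX] * ways      # empty ways look distant, so they fill first
        self.insert_rrpv = insert_rrpv     # 2 = far (SRRIP-style insertion)

    def access(self, tag):
        """Return True on a hit, False on a miss (the line is then inserted)."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0    # re-reference: predict immediate
            return True
        victim = self._find_victim()
        self.tags[victim] = tag
        self.rrpv[victim] = self.insert_rrpv
        return False

    def _find_victim(self):
        # Evict the first line predicted distant; if none exists, age every line.
        while RRPV_MAX not in self.rrpv:
            self.rrpv = [value + 1 for value in self.rrpv]
        return self.rrpv.index(RRPV_MAX)

cache_set = RRIPSet(ways=4)
trace = ["a", "b", "a", "b", "s0", "s1", "s2", "s3", "a", "b"]
outcomes = [cache_set.access(tag) for tag in trace]
```

In this trace the re-referenced lines `a` and `b` sit at 0 when the four scan lines arrive, so the scan lines evict each other and the final accesses to `a` and `b` still hit.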
SRRIP Not Always Scan Resistant…
• LONG scans in access pattern
[Figure: access pattern with a “short” scan and a “long” scan, annotated with hits and misses]
• Active working-set MUST be RE-REFERENCED at least ONCE between scans
[Figure: access pattern with repeated scans, annotated with hits and misses]
Can We Be More Intelligent in Dealing with Scans?
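The effect of scan length on SRRIP can be checked with a compact simulation. This is a sketch under assumptions: a single 4-way set, a two-line working set that is re-referenced once before the scan, insertion at far (2), and promotion to immediate (0) on a hit. The function name and all parameters are illustrative.

```python
def srrip_hits_after_scan(scan_len, ways=4, working_set=("a", "b")):
    """Return how many working-set lines still hit after a scan of `scan_len` lines."""
    tags, rrpv = [None] * ways, [3] * ways

    def access(tag):
        if tag in tags:
            rrpv[tags.index(tag)] = 0          # hit: predict immediate
            return True
        while 3 not in rrpv:                   # no distant line: age all lines
            rrpv[:] = [value + 1 for value in rrpv]
        way = rrpv.index(3)                    # evict the first distant line
        tags[way], rrpv[way] = tag, 2          # insert at far
        return False

    for _ in range(2):                         # fill, then re-reference once
        for tag in working_set:
            access(tag)
    for i in range(scan_len):                  # the scan: lines never reused
        access(("scan", i))
    return sum(access(tag) for tag in working_set)

short_scan = srrip_hits_after_scan(4)
long_scan = srrip_hits_after_scan(7)
```

Under these assumptions the working set survives scans of up to 6 lines, because the scan lines age out before the re-referenced lines do. A scan of 7 lines ages the working set all the way to distant and evicts it, which is the long-scan case on the slide.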
Closer Look at Scan Access Patterns
[Figure: access pattern with scans; lines are marked “Future Reference” or “No Future References”]
Assuming Perfect Knowledge of Re-Reference Pattern
Improving RRIP on Cache Insertions
[Figure: RRIP chain (0 = immediate, 1 = intermediate, 2 = far, 3 = distant; only a distant line is evicted) with the insertion point labeled “Improve Insertion” and scan lines marked]
Need to Assign DIFFERENT Re-Reference Predictions on Cache Insertion
Focus of this Paper…
• Goal: Learn re-reference interval of a cache line
[Figure: cache access → PREDICTOR → re-reference prediction ( 0: immediate, 1: intermediate, 2: far, 3: distant )]
How Best to Learn the Re-Reference Interval?
Learning Re-Reference Behavior
[Figure: access pattern with scans; one group of accesses is labeled “REFERENCED BY SIMILAR SET OF PCs”, another “REFERENCE SAME MEMORY REGION”]
Can We Learn Re-References By Correlating Accesses With Some Other Information?
Using Signatures to Correlate Re-Reference
• Different types of information can serve as a “signature”:
• Memory Region
• Memory Instruction PC
• Instruction Sequence
• Observation: LLC accesses by the same “signature” tend to have similar re-reference patterns
OBSERVE, LEARN and PREDICT Re-Reference Pattern of a Signature
Observe Signature Re-Reference Behavior
• Observe re-reference pattern in the baseline cache
• Baseline LLC line state: Cache Tag, Replacement State, Coherence State
[Figure: a load/store address accesses the LLC]
Observe Signature Re-Reference Behavior
• Observe re-reference pattern in the baseline cache
• Hardware Required (per-line metadata in the LLC):
• reuse bit: Was line re-referenced after cache insertion ( 1-bit )
• signature_insert: “Signature” responsible for cache insertion ( 14-bits )
[Figure: a load/store address and its signature access the LLC, which holds the metadata]
Learn Signature Re-Reference Behavior
• Learn signature re-reference behavior
• Hardware Required:
• Signature History Counter Table (SHCT) ( 16K, 2-bit counters )
• counter = 0, signature NOT re-referenced
• counter != 0, signature re-referenced
• SHCT Training:
• If evicted line reused: SHCT [ signature_insert ] ++
• If evicted line NOT reused: SHCT [ signature_insert ] --
[Figure: the Last Level Cache (LLC) updates the SHCT]
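The training rule above can be sketched in Python as a table of saturating counters. Several details here are assumptions that the slides do not state: the XOR-fold hash in `pc_signature`, the initial counter value of 1, and the class and method names. Only the table size (16K entries), the 2-bit counter width, the 14-bit signature, and the increment/decrement rule on eviction come from the slides.

```python
SIG_BITS = 14                      # signature width from the slides
SHCT_ENTRIES = 1 << SIG_BITS       # 16K entries
COUNTER_MAX = 3                    # 2-bit saturating counter

def pc_signature(pc):
    """Fold a PC into 14 bits by XORing 14-bit chunks (hypothetical hash)."""
    sig = 0
    while pc:
        sig ^= pc & (SHCT_ENTRIES - 1)
        pc >>= SIG_BITS
    return sig

class SHCT:
    """Signature History Counter Table (sketch)."""

    def __init__(self, initial=1):             # initial value is an assumption
        self.counters = [initial] * SHCT_ENTRIES

    def train_on_eviction(self, signature_insert, reused):
        """Update the counter of the signature that inserted the evicted line."""
        count = self.counters[signature_insert]
        if reused:
            self.counters[signature_insert] = min(count + 1, COUNTER_MAX)
        else:
            self.counters[signature_insert] = max(count - 1, 0)

shct = SHCT()
scan_sig = pc_signature(0x401A2C)      # made-up PC of a scanning load
reuse_sig = pc_signature(0x4020F0)     # made-up PC of a load with reuse
for _ in range(2):
    shct.train_on_eviction(scan_sig, reused=False)
for _ in range(5):
    shct.train_on_eviction(reuse_sig, reused=True)
```

After this training the scanning signature's counter is 0 (NOT re-referenced) and the reused signature's counter saturates at 3.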
Signature-based Hit Predictor (SHiP)
• Predict re-reference interval of line using SHCT
[Figure: cache hit/miss and signature → SHiP ( SHCT ) → re-reference prediction ( 0: immediate, 1: intermediate, 2: far, 3: distant )]
Signature-based Hit Predictor (SHiP)
• Predict re-reference interval using SHCT on CACHE MISS
SHiP Re-Reference Predictions On Miss
if ( SHCT [ signature ] == 0 )
    predict DISTANT (i.e. 3)
else
    predict FAR (i.e. 2)
[Figure: cache miss and signature → SHiP → re-reference prediction ( 0: immediate, 1: intermediate, 2: far, 3: distant )]
Signature-based Hit Predictor (SHiP)
• Predict re-reference interval on CACHE HIT
SHiP Re-Reference Predictions On Hit
Always predict IMMEDIATE (i.e. 0)
[Figure: cache hit and signature → SHiP → re-reference prediction ( 0: immediate, 1: intermediate, 2: far, 3: distant )]
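SHiP – High Level Architectural Overview
[Figure: an access (address, access type, signature) goes to the Last Level Cache (LLC), which returns hit/miss and data. SHiP takes the signature and the LLC hit/miss outcome, consults the SHCT, and supplies the Re-Reference Prediction to the LLC. The signature_insert and reuse_bit of LLC lines drive SHCT Training. The figure carries a “~6 KB” label.]
• Per-Line Overhead Can Be Reduced by using Set Sampling ( need only 32 - 64 sets )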
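The pieces of the overview can be put together in a small end-to-end model of one cache set. This is a sketch, not the authors' design: the class name `ShipSet`, the initial counter value of 1, filling invalid ways first, and the integer stand-ins for signatures are all assumptions. The rules it follows are the ones on the slides: insert at distant (3) when the SHCT counter is 0 and at far (2) otherwise, predict immediate (0) on a hit, and train the SHCT when a line is evicted. Set sampling is not modeled.

```python
RRPV_MAX = 3            # distant
SHCT_SIZE = 16 * 1024   # 16K counters
COUNTER_MAX = 3         # 2-bit saturating counter

class ShipSet:
    """One cache set managed by RRIP with SHiP-style insertion (sketch)."""

    def __init__(self, ways, shct):
        self.shct = shct                   # counter table shared by all sets
        self.tags = [None] * ways
        self.rrpv = [RRPV_MAX] * ways
        self.sig = [0] * ways              # signature_insert per line
        self.reused = [False] * ways       # reuse bit per line

    def access(self, tag, signature):
        if tag in self.tags:               # LLC hit
            way = self.tags.index(tag)
            self.rrpv[way] = 0             # always predict immediate
            self.reused[way] = True
            return True
        way = self._victim()               # LLC miss
        if self.tags[way] is not None:
            self._train(way)               # learn from the evicted line
        self.tags[way] = tag
        self.sig[way] = signature
        self.reused[way] = False
        # Counter 0 means the signature was not re-referenced: insert at distant.
        self.rrpv[way] = RRPV_MAX if self.shct[signature] == 0 else 2
        return False

    def _victim(self):
        if None in self.tags:              # fill invalid ways first (assumption)
            return self.tags.index(None)
        while RRPV_MAX not in self.rrpv:   # age until some line is distant
            self.rrpv = [value + 1 for value in self.rrpv]
        return self.rrpv.index(RRPV_MAX)

    def _train(self, way):
        s = self.sig[way]
        if self.reused[way]:
            self.shct[s] = min(self.shct[s] + 1, COUNTER_MAX)
        else:
            self.shct[s] = max(self.shct[s] - 1, 0)

WS_SIG, SCAN_SIG = 1, 2                    # stand-ins for hashed PCs
shct = [1] * SHCT_SIZE                     # initial value 1 is an assumption
cache_set = ShipSet(ways=4, shct=shct)
for _ in range(2):                         # fill, then re-reference the working set
    for tag in ("a", "b"):
        cache_set.access(tag, WS_SIG)
for i in range(100):                       # a long scan of never-reused lines
    cache_set.access(("scan", i), SCAN_SIG)
survivors = [cache_set.access(tag, WS_SIG) for tag in ("a", "b")]
```

In this run the first scan line evicted without reuse drives the scan signature's counter to 0. Every later scan line is then inserted at distant and replaced by the next one, so both working-set lines still hit after a 100-line scan.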
Performance Comparison of Replacement Policies
[Figure: performance relative to LRU (y-axis from 1.00 to 1.15) for SRRIP, DRRIP, and SHiP-PC across the Mm./Games, Server, SPEC2K6, and All categories; 16-way 2MB LLC, Core i7 type hierarchy]
SHiP Significantly Improves Performance Across All Workload Categories
Performance Comparison of Replacement Policies: CRC Results Comparison
[Figure: performance relative to LRU (y-axis from 1.00 to 1.15) for SRRIP, DRRIP, Seg-LRU, SDBP, and SHiP-PC, averaged across PC Games, Multimedia, Enterprise Server, and SPEC CPU2006 workloads; one group for a 16-way 1MB private cache (65 single-threaded workloads) and one for a 16-way 4MB shared cache (165 4-core workloads)]
SHiP Has 2X the Performance Improvement of Prior State-of-the-Art Policies
Total Storage Overhead (16-way Set Associative Cache)
• LRU: 4-bits / cache block
• Pseudo-LRU: 1-bit / cache block
• RRIP [ ISCA’10 ]: 2-bits / cache block
• Seg-LRU [ CRC’10 ]: ~8-bits / cache block
• SDBP [ MICRO’10 ]: ~10-bits / cache block
• SHiP [ MICRO’11 ]: ~5-bits / cache block
SHiP Outperforms State-of-the-Art with HW Similar to LRU
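The per-block figures above can be turned into total replacement state for a concrete cache. This is a back-of-the-envelope sketch: it uses the 16-way 2MB LLC from the performance comparison and assumes 64-byte lines, which the slides do not state. The approximate per-block values are taken as exact for the arithmetic.

```python
CACHE_BYTES = 2 * 1024 * 1024      # 16-way 2MB LLC from the comparison slide
LINE_BYTES = 64                    # assumed line size, not stated in the slides
BLOCKS = CACHE_BYTES // LINE_BYTES # 32768 cache blocks

# Bits of replacement state per cache block, as listed on the slide.
bits_per_block = {
    "LRU": 4,          # log2(16) bits to rank 16 ways
    "Pseudo-LRU": 1,
    "RRIP": 2,
    "Seg-LRU": 8,      # approximate on the slide
    "SDBP": 10,        # approximate on the slide
    "SHiP": 5,         # approximate on the slide
}

def state_kib(bits):
    """Total replacement state in KiB for a given number of bits per block."""
    return BLOCKS * bits / 8 / 1024

overhead_kib = {policy: state_kib(bits) for policy, bits in bits_per_block.items()}
```

Under these assumptions LRU needs 16 KiB of state and SHiP about 20 KiB, against roughly 32 KiB for Seg-LRU and 40 KiB for SDBP.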
Summary
• Scan-resistance is an important problem in commercial workloads
• State-of-the-art policies do not fully address scan-resistance
• Signatures help improve re-reference predictions to address scans
• Need fine-grained re-reference predictions at insertion
• Proposed a Simple and Practical Scan-Resistant Replacement Policy
• SHiP significantly outperforms winner of CRC Championship
• SHiP requires less storage than CRC winner
• HW overhead of SHiP is comparable to LRU
Q&A
Re-Reference Interval Prediction ( RRIP )
[Figure: chain of re-reference predictions: 0 = immediate, 1 = intermediate, 2 = far, 3 = distant. Lines at 0, 1, and 2 are not victims; a line at 3 (distant) is evicted. A re-reference moves a line toward 0. The scan-resistant insertion point is marked.]
CAN INSERTION BE MORE INTELLIGENT?
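Using Signatures to Correlate Re-Reference Behavior
[Figure: access pattern with scans; each access carries a SIGNATURE (a, b, c, or d). Signatures a and c are marked “Future Cache Hits”; signatures b and d are marked “No Future Cache Hits”.]
• Example Signatures: Memory Region, Program Counter, Instruction Decode History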
LRU vs. Re-Reference Interval Prediction (RRIP)
LRU
Physical Way #: 0 1 2 3 4 5 6 7
Cache Tag: s c b h f d g e
“LRU Chain” position: 2 2 1 3 0 5 4 7 6
RRIP
Physical Way #: 0 1 2 3 4 5 6 7
Cache Tag: s c b h f d g e
Re-Reference Prediction: 0 0 2 1 0 2 2 3 3
RRIP Outperforms LRU with Storage Less Than LRU
Signature-based Hit Predictor (SHiP)
• Goal: Predict the re-reference behavior of a signature
• Learn Re-Reference Behavior:
[Figure: an access (address, access type, signature) goes to the LLC, which returns hit/miss and data]