Topic7c-323-08F - University of Delaware


Cache Parameters
• Cache size: Scache (lines)
• Number of sets: N (sets)
• Number of lines per set: K (lines/set)
Scache = KN (lines)
       = KN * L (bytes), where L is the line size in bytes
K-way set-associative
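The size relation above can be checked with a few lines of Python. This is only a sketch: the function name `cache_size` and the 4-way, 64-set, 32-byte configuration are illustrative assumptions, not values from the slides.

```python
def cache_size(k, n, line_bytes):
    """Return (lines, bytes) for a K-way set-associative cache with N sets."""
    lines = k * n                        # Scache = KN (lines)
    return lines, lines * line_bytes     # Scache = KN * L (bytes)

# Hypothetical configuration: 4-way, 64 sets, 32-byte lines.
lines, size_bytes = cache_size(k=4, n=64, line_bytes=32)
print(lines, size_bytes)   # 256 8192
```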
2015/7/20
\course\cpeg324-08F\Topic7c
1
Trade-offs in Set-Associativity
Fully-associative:
- Higher hit ratio, concurrent search, but slow access when
associativity is large.
Direct mapping:
- Fast access (if hits) and simplicity for comparison.
- Trivial replacement algorithm.
Problem with hit ratio, e.g. in the extreme case: if a program alternately uses 2
blocks that map into the same cache block frame, “thrashing”
may happen.
Note
Main memory size: Smain (blocks)
Cache memory size: Scache (blocks)
You need to search!
Let P = Smain / Scache. Since P >> 1, the average search length is much greater than 1.
• Set-associativity provides a trade-off between:
  - Concurrency in search.
  - Average search/access time per block.
Number of sets
N = 1: fully associative
1 < N < Scache: set-associative
N = Scache: direct mapped
Important Factors in Cache Design
• Address partitioning strategy (3-dimensional freedom).
• Total cache size/memory size
• Workload
Address Partitioning
Address (M bits), from high-order to low-order bits:
  tag (M - log2 N - log2 L bits) | set number (log2 N bits) | byte address in a line (log2 L bits)
• Byte addressing mode
  Cache memory size (data part) = NKL (bytes), where K is the set size
• Directory size (per entry): M - log2 N - log2 L bits
• Reduce clustering (randomize accesses)
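The address partitioning can be sketched in Python as follows. The function names `split_address` and `directory_bits` are hypothetical, and the sketch assumes N and L are powers of two so that the fields fall on bit boundaries.

```python
def split_address(addr, num_sets, line_size):
    """Split a byte address into (tag, set number, byte offset in line).

    Assumes num_sets (N) and line_size (L) are powers of two.
    """
    offset_bits = line_size.bit_length() - 1      # log2 L
    set_bits = num_sets.bit_length() - 1          # log2 N
    offset = addr & (line_size - 1)
    set_number = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + set_bits)
    return tag, set_number, offset

def directory_bits(addr_bits, num_sets, line_size):
    """Directory (tag) bits per entry: M - log2 N - log2 L."""
    return addr_bits - (num_sets.bit_length() - 1) - (line_size.bit_length() - 1)

# Hypothetical example: 32-bit addresses, N = 64 sets, L = 32-byte lines.
print(split_address(0x1234, num_sets=64, line_size=32))   # (2, 17, 20)
print(directory_bits(32, num_sets=64, line_size=32))      # 21
```

Note that the associativity K does not appear in the split: it only determines how many directory entries are compared within the selected set.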
[Figure: General Curve Describing Cache Behavior. Miss ratio (vertical axis, 0.1 to 1.0, with 0.34 marked) versus cache size (horizontal axis, 8 to 40).]
Note: There exists a knee in the curve.
…the data are sketchy and highly dependent on the
method of gathering...
… designer must make critical choices using a
combination of “hunches, skills, and experience” as
supplement…
Hunch: “a strong intuitive feeling concerning a future event or
result.”
Basic Principle
• Typical workload study + intelligent estimate of others
• Good engineering: a small degree of over-design
• “30% rule”:
  - Each doubling of the cache size reduces misses by 30% (Alan J. Smith, “Cache Memories,” Computing Surveys, Vol. 14, No. 3, Sep. 1982).
  - It is a rough estimate only.
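As a rough illustration of the 30% rule, the sketch below scales a known miss ratio by 0.7 for each doubling of cache size. The function name `estimated_miss_ratio` and the starting figures (a 10% miss ratio at size 8) are assumptions made for the example, and the result is only as good as the rule itself.

```python
import math

def estimated_miss_ratio(base_miss_ratio, base_size, new_size, reduction=0.30):
    """Apply the 30% rule: each doubling of cache size cuts misses by `reduction`."""
    doublings = math.log2(new_size / base_size)
    return base_miss_ratio * (1.0 - reduction) ** doublings

# Hypothetical starting point: 10% miss ratio at cache size 8 (arbitrary units).
for size in (8, 16, 32, 64):
    print(size, round(estimated_miss_ratio(0.10, 8, size), 4))
```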
K: Associativity
• Bigger K: lower miss ratio
• Smaller is better in:
  - Faster
  - Simpler
  - Cheaper
• K = 4 ~ 8 gets the best miss ratio
L : Line Size
• Atomic unit of transmission
• Miss ratio
• Smaller:
  - Larger average delay
  - Less traffic
  - Larger average hardware cost for associative search
  - Larger possibility of “line crossers” (memory references spanning the boundary between two cache lines)
• Workload dependent
• 16 ~ 128 bytes
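A line crosser can be detected by comparing the line index of the first and last byte of a reference. The sketch below is illustrative: the function name `crosses_line` and the 32-byte line size are assumptions.

```python
def crosses_line(addr, size, line_size):
    """True if a reference of `size` bytes at byte address `addr` spans two lines."""
    first_line = addr // line_size
    last_line = (addr + size - 1) // line_size
    return first_line != last_line

# Hypothetical 4-byte references with 32-byte lines.
print(crosses_line(28, 4, 32))   # False: bytes 28..31 stay in line 0
print(crosses_line(30, 4, 32))   # True: bytes 30..33 span lines 0 and 1
```

With a smaller line size, a reference of fixed size has fewer starting offsets that keep it within one line, which is why line crossers become more likely.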
Cache Replacement Policy
• FIFO (first-in-first-out): replace the block loaded furthest in the past.
• LRU (least-recently used): replace the block used furthest in the past.
• OPT (furthest-future used): replace the block which will be used furthest in the future, i.e. do not retain lines whose next occurrence is in the most distant future.
Note: LRU performance is close to OPT for frequently encountered program structures.
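The three policies can be compared on a single set with a short simulation. This is a minimal sketch, not a production simulator: the function name `count_misses`, the policy strings, and the sample trace are assumptions, and OPT is implemented by scanning the remainder of the trace, which is only possible offline.

```python
def count_misses(trace, frames, policy):
    """Count misses in one set of `frames` lines under 'FIFO', 'LRU' or 'OPT'."""
    resident = []   # FIFO: in load order; LRU: least recently used first
    misses = 0
    for i, block in enumerate(trace):
        if block in resident:
            if policy == "LRU":
                resident.remove(block)
                resident.append(block)       # move to the most-recently-used end
            continue
        misses += 1
        if len(resident) == frames:
            if policy == "OPT":
                future = trace[i + 1:]
                # Evict the block whose next use is furthest away (or never occurs).
                victim = max(
                    resident,
                    key=lambda b: future.index(b) if b in future else len(future),
                )
            else:
                victim = resident[0]         # oldest load (FIFO) or least recent use (LRU)
            resident.remove(victim)
        resident.append(block)
    return misses

# Hypothetical block trace, three lines in the set.
trace = [1, 2, 3, 1, 4, 1, 2]
for policy in ("FIFO", "LRU", "OPT"):
    print(policy, count_misses(trace, 3, policy))
```

On this particular trace OPT gives the fewest misses and LRU falls between OPT and FIFO, but the ordering of FIFO and LRU is trace dependent.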
Example: Misses and Associativity
Small cache with four one-word blocks. Sequence: 0, 8, 0, 6, and 8.
A. Direct-mapped cache.
Legend: blue text = data used at time t; black text = data used at time t-1.
Result: 5 misses for the 5 accesses.
Example: Misses and Associativity (cont’d)
Small cache with four one-word blocks. Sequence: 0, 8, 0, 6, and 8.
B. Two-way set-associative cache, LRU replacement policy.
Legend: blue text = data used at time t; black text = data used at time t-1.
Result: 4 misses for the 5 accesses.
Example: Misses and Associativity (cont’d)
Small cache with four one-word blocks. Sequence: 0, 8, 0, 6, and 8.
C. Fully associative cache.
  - Any memory block can be stored in any cache block.
Legend: blue text = data used at time t; black text = data used at time t-1; red text = data used at time t-2.
Result: 3 misses for the 5 accesses.
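The three miss counts in this example can be reproduced with a small simulation. The sketch assumes that a block maps to set (block address mod N) and that LRU is used within each set; the function name `count_misses` is hypothetical.

```python
from collections import OrderedDict

def count_misses(trace, num_sets, ways):
    """Count misses for a block-address trace in an LRU set-associative cache."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        lines = sets[block % num_sets]       # assumed mapping: block mod N
        if block in lines:
            lines.move_to_end(block)         # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)    # evict the least recently used block
            lines[block] = True
    return misses

trace = [0, 8, 0, 6, 8]
print(count_misses(trace, num_sets=4, ways=1))   # direct mapped: 5
print(count_misses(trace, num_sets=2, ways=2))   # two-way set-associative: 4
print(count_misses(trace, num_sets=1, ways=4))   # fully associative: 3
```

All three configurations hold four one-word blocks in total; only the split between number of sets and lines per set changes.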
Program Structure
….
for i = 1 to n
    for j = 1 to n
    endfor
endfor
The last-in-first-out feature makes the recent
past look like the near future.
Problem with LRU
• Not good for sequential/cyclic reference patterns
Example
ABCDEF ABC…… ABC……
Exercise: With a set size of 3, what is the miss ratio,
assuming all 6 addresses map to the same set?
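One way to explore the exercise is to simulate a single LRU set on a cyclic trace. The sketch below assumes the reference pattern is A B C D E F repeated in full, which is one reading of the example string, and uses a hypothetical function name `lru_miss_ratio`.

```python
def lru_miss_ratio(trace, set_size):
    """Miss ratio of a single LRU-managed set holding `set_size` lines."""
    recent = []   # least recently used first
    misses = 0
    for block in trace:
        if block in recent:
            recent.remove(block)
        else:
            misses += 1
            if len(recent) == set_size:
                recent.pop(0)            # evict the least recently used block
        recent.append(block)             # block is now the most recently used
    return misses / len(trace)

# Assumed cyclic pattern: A..F repeated 10 times, all mapping to one set.
cyclic = list("ABCDEF") * 10
print(lru_miss_ratio(cyclic, set_size=3))
print(lru_miss_ratio(cyclic, set_size=6))
```

Comparing the two printed values shows how sharply LRU behaviour changes once the set can hold the whole cycle.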
Performance Evaluation Methods for Workload
• Analytical modeling.
• Simulation
• Measuring
Cache Analysis Methods
• Hardware monitoring:
  - Fast and accurate.
  - Not fast enough (for high-performance machines).
  - Cost.
  - Flexibility/repeatability.
Cache Analysis Methods (cont’d)
• Address traces and machine simulator:
  - Slow.
  - Accuracy/fidelity.
  - Cost advantage.
  - Flexibility/repeatability.
  - OS/other impacts: how to put them in?
Trace Driven Simulation for Cache
• Workload dependence:
  - Difficulty in characterizing the load.
  - No generally accepted model.
• Effectiveness:
  - Possible simulation for many parameters.
  - Repeatability.
Problem in Address Traces
• Representative of the actual workload (hard)
  - Only cover a small fraction of real workload.
  - Diversity of user programs.
• Initialization transient
  - Use long enough traces to absorb the impact of cold misses.
• Inability to properly model multiprocessor effects
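The effect of the initialization transient can be seen by running the same loop-like trace at two lengths. This is a toy sketch under stated assumptions: a fully associative LRU cache, a synthetic trace cycling over four blocks, and a hypothetical function name `miss_ratio`. Real traces are far less regular.

```python
def miss_ratio(trace, num_lines):
    """Miss ratio of a fully associative LRU cache with `num_lines` lines."""
    recent = []   # least recently used first
    misses = 0
    for block in trace:
        if block in recent:
            recent.remove(block)
        else:
            misses += 1                  # includes cold (first-reference) misses
            if len(recent) == num_lines:
                recent.pop(0)
        recent.append(block)
    return misses / len(trace)

working_set = [0, 1, 2, 3]               # synthetic trace: four blocks, repeated
short_trace = working_set * 2
long_trace = working_set * 100
print(miss_ratio(short_trace, num_lines=4))   # 0.5, dominated by cold misses
print(miss_ratio(long_trace, num_lines=4))    # 0.01, cold misses are absorbed
```

Both runs incur the same four cold misses; only the trace length changes how much they weigh in the measured miss ratio.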