Memory , Hierarchical Memory
Systems
Cache memory
Prof. Sin-Min Lee
Department of Computer Science
The Five Classic Components of a
Computer
Input
Output
Memory
Datapath
Control
(The datapath and control together form the processor.)
The Processor
Picture
[Figure: the processor and memory connected by the processor/memory bus, bridged to the PCI bus and the slower I/O busses.]
Technology Trends
            Capacity         Speed (latency)
Logic:      2x in 3 years    2x in 3 years
DRAM:       4x in 3 years    2x in 10 years
Disk:       4x in 3 years    2x in 10 years
DRAM
Year    Size      Cycle Time
1980    64 Kb     250 ns
1983    256 Kb    220 ns
1986    1 Mb      190 ns
1989    4 Mb      165 ns
1992    16 Mb     145 ns
1995    64 Mb     120 ns
Capacity improved 1000:1; cycle time only about 2:1!
Predicting Performance Change:
Moore's Law
Original version: The density of transistors in an integrated circuit will double every year. (Gordon Moore, Intel, 1965)
Current version: Cost/performance of silicon chips doubles every 18 months.
Processor-DRAM Memory Gap (latency)
[Figure: performance on a log scale (1 to 1000) versus time, 1980-2000. CPU performance ("Moore's Law") grows about 60% per year (2x every 1.5 years); DRAM performance grows about 9% per year (2x every 10 years). The processor-memory performance gap grows about 50% per year.]
The connection between the
CPU and cache is very fast;
the connection between the
CPU and memory is slower
There are three methods of block placement:
Direct mapped: if each block has only one place it can appear in the
cache, the cache is said to be direct mapped. The mapping is usually
(Block address) MOD (Number of blocks in cache)
Fully associative: if a block can be placed anywhere in the
cache, the cache is said to be fully associative.
Set associative: if a block can be placed in a restricted set of
places in the cache, the cache is said to be set associative. A set
is a group of blocks in the cache. A block is first mapped onto a
set, and then the block can be placed anywhere within that set.
The set is usually chosen by bit selection; that is,
(Block address) MOD (Number of sets in cache)
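The two MOD mappings above can be checked with a short sketch. The cache sizes (8 blocks, 4 sets) are hypothetical, chosen only for illustration:

```python
# Sketch of the two bit-selection mappings above (cache sizes are
# hypothetical, chosen only for illustration).
NUM_BLOCKS = 8   # direct-mapped cache: 8 block frames
NUM_SETS = 4     # set-associative cache: 4 sets

def direct_mapped_frame(block_address):
    # (Block address) MOD (Number of blocks in cache)
    return block_address % NUM_BLOCKS

def set_index(block_address):
    # (Block address) MOD (Number of sets in cache)
    return block_address % NUM_SETS

print(direct_mapped_frame(13))  # 13 mod 8 = 5
print(set_index(13))            # 13 mod 4 = 1
```

When the number of blocks (or sets) is a power of two, the MOD is just the low-order bits of the block address, which is why the text calls it "bit selection".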
Cache (cont.)
Direct Mapping Cache
Bits 2-4 of the main memory address form the cache index; the upper 5 bits of the address (the tag) are stored in the cache along with the data. If the tag stored at the indexed line matches the tag of the address requested by the CPU, it's a cache hit.
•A pictorial example for a cache with only 4 blocks and
a memory with only 16 blocks.
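The tag/index split described above can be sketched in code. A 10-bit address with a 2-bit block offset is assumed here (5 tag bits + 3 index bits + 2 offset bits); the slide does not state the full address width:

```python
# Splitting a main-memory address as on the slide: upper 5 bits = tag,
# bits 2-4 = cache index. A hypothetical 10-bit address with a 2-bit
# block offset is assumed for illustration.
def split_address(addr):
    offset = addr & 0b11          # bits 0-1: byte offset within the block
    index = (addr >> 2) & 0b111   # bits 2-4: cache line index
    tag = addr >> 5               # bits 5-9: tag stored alongside the data
    return tag, index, offset

def is_hit(cache, addr):
    # A hit occurs when the tag stored at the indexed line matches.
    tag, index, _ = split_address(addr)
    return cache.get(index) == tag

cache = {}                            # line index -> stored tag
tag, index, _ = split_address(0b1011010110)
cache[index] = tag                    # fill the line
print(is_hit(cache, 0b1011010110))    # True: same tag at the same index
print(is_hit(cache, 0b0001010110))    # False: same index, different tag
```

The second lookup shows the direct-mapped conflict case: two addresses that share an index compete for the same line and are distinguished only by their tags.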
Replacement Policies
Whenever there is a “miss”, the information
must be read from main memory. In
addition, the cache is updated with this new
information. One line will be replaced with
the new block of information.
Policies for doing this vary. The three most
commonly used are FIFO, LRU, and
Random.
FIFO Replacement Policy
 First in, first out – Replaces the oldest line in the
cache, regardless of the last time that this line was
accessed.
 The main benefit is that this is easy to implement.
The principal drawback is that you won’t keep any
item in cache for long – you may find that you are
constantly removing and adding the same block of
memory.
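A FIFO policy can be sketched with a queue of arrival order. The capacity and the trace here are hypothetical:

```python
from collections import deque

# Minimal FIFO replacement sketch (capacity and trace are hypothetical).
class FIFOCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.lines = set()      # blocks currently cached
        self.order = deque()    # arrival order, oldest on the left

    def access(self, block):
        if block in self.lines:
            return True         # hit: FIFO order is NOT refreshed
        if len(self.lines) == self.capacity:
            self.lines.discard(self.order.popleft())  # evict the oldest line
        self.lines.add(block)
        self.order.append(block)
        return False            # miss

cache = FIFOCache(capacity=2)
hits = [cache.access(b) for b in ["A", "B", "A", "C", "A"]]
print(hits)  # [False, False, True, False, False]
```

The final miss on A illustrates the drawback from the text: A is evicted as the oldest line even though it was accessed just two references earlier.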
Hit Ratio
 The hit ratio–hits divided by the sum of hits
and misses–is a measure of cache
performance.
 A well-designed cache can have a hit ratio
close to 1.

The number of cache hits far outnumbers the misses,
and this speeds up system performance dramatically.
Example: Total = 14 references, Hits = 4, Hit ratio = 4/14 = 2/7
LRU Replacement Policy
 Least Recently Used – The line that was
accessed least recently is replaced with the
new block of data.
 The benefit is that this keeps the most
frequently accessed lines in the cache.
 The drawback is that this can be difficult and
costly to implement, especially if there are
lots of lines to consider.
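An LRU policy can be sketched with an ordered dictionary that tracks recency. The capacity and trace are hypothetical:

```python
from collections import OrderedDict

# Minimal LRU replacement sketch (capacity and trace are hypothetical).
class LRUCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.lines = OrderedDict()  # least recently used block first

    def access(self, block):
        if block in self.lines:
            self.lines.move_to_end(block)   # refresh recency on a hit
            return True
        if len(self.lines) == self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used
        self.lines[block] = None
        return False

cache = LRUCache(capacity=2)
hits = [cache.access(b) for b in ["A", "B", "A", "C", "A"]]
print(hits)  # [False, False, True, False, True]
```

Unlike FIFO, the hit on A refreshes its position, so when C is inserted the cache evicts B instead, and the final access to A hits.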
Random Replacement Policy
 With this policy, the line that is replaced is
chosen randomly.
 Performance is close to that of LRU, and the
implementation is much simpler.
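Random replacement needs no bookkeeping at all beyond the set of cached blocks, which is the simplicity the text refers to. A minimal sketch (capacity and trace hypothetical):

```python
import random

# Minimal random replacement sketch (capacity and trace are hypothetical).
class RandomCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.lines = set()   # no recency or arrival order is tracked

    def access(self, block):
        if block in self.lines:
            return True
        if len(self.lines) == self.capacity:
            # Evict a victim chosen at random.
            self.lines.discard(random.choice(sorted(self.lines)))
        self.lines.add(block)
        return False

cache = RandomCache(capacity=2)
for b in "ABCD":
    cache.access(b)
print(len(cache.lines))  # 2: the cache never exceeds its capacity
```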
Mapping Technique

The cache mapping technique is another factor that determines how effective the cache is,
that is, what its hit ratio and speed will be. Three types are:
1. Direct Mapped Cache: Each memory location is mapped to a single cache line that it
shares with many others; only one of the many addresses that share this line can use it at a
given time. This is the simplest technique both in concept and in implementation. Using
this cache means the circuitry to check for hits is fast and easy to design, but the hit ratio is
relatively poor compared to the other designs because of its inflexibility. Motherboard-based system caches are typically direct mapped.
2. Fully Associative Cache: Any memory location can be cached in any cache line. This is the
most complex technique and requires sophisticated search algorithms when checking for a
hit. It can lead to the whole cache being slowed down because of this, but it offers the best
theoretical hit ratio since there are so many options for caching any memory address.
3. N-Way Set Associative Cache: "N" is typically 2, 4, 8, etc. A compromise between the two
previous designs: the cache is broken into sets of "N" lines each, and any memory address
can be cached in any of those "N" lines. This improves hit ratios over the direct mapped
cache, but without incurring a severe search penalty (since "N" is kept small). The 2-way
or 4-way set associative cache is common in processor level 1 caches.
Comparison of cache mapping
techniques
1. Direct Mapped Cache
 The direct mapped cache is the simplest form of
cache and the easiest to check for a hit.
 Since there is only one possible place that any
memory location can be cached, there is nothing to
search; the line either contains the memory
information we are looking for, or it doesn't.
 Unfortunately, the direct mapped cache also has the
worst performance, because again there is only one
place that any address can be stored.
Direct mapped cache example
[Table: the reference string A B C A D B E F A C D B G C H I A B is applied, one reference per column, to a direct-mapped cache with lines 0-7; each column shows the cache contents after that reference, and the bottom "Hit?" row marks hits with "Y".]
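The example above can be replayed in code. The letter-to-block numbering (A = 0 through I = 8) is an assumption for illustration, since the slide does not give the block numbers; the 8-line cache and mod-8 mapping follow the table's line indices 0-7:

```python
# Replaying the direct-mapped example: reference string from the slide,
# with letters mapped to memory blocks 0-8 (this numbering is an
# assumption). 8 cache lines, line = block mod 8.
trace = list("ABCADBEFACDBGCHIAB")
block = {ch: i for i, ch in enumerate("ABCDEFGHI")}

lines = {}  # cache line index -> block currently stored there
hits = 0
for ch in trace:
    b = block[ch]
    line = b % 8               # direct mapped: exactly one candidate line
    if lines.get(line) == b:
        hits += 1
    else:
        lines[line] = b        # miss: replace whatever occupies that line
print(f"{hits} hits / {len(trace)} references")  # 8 hits / 18 references
```

With this numbering, only A and I collide (both map to line 0), which causes the extra misses near the end of the trace.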
2. Fully Associative Cache
 The fully associative cache has the best hit ratio because any
line in the cache can hold any address that needs to be
cached.
 This means the problem seen in the direct mapped cache
disappears, because there is no dedicated single line that an
address must use.
 However, this cache suffers from problems involving
searching the cache. If a given address can be stored in any
of 16,384 lines, how do you know where it is? Even with
specialized hardware to do the searching, a performance
penalty is incurred. And this penalty occurs for all accesses
to memory, whether a cache hit occurs or not, because it is
part of searching the cache to determine a hit.
Associative cache example
[Table: the same reference string A B C A D B E F A C D B G C H I A B applied to an 8-line fully associative cache; each column shows the cache contents after one reference, and the "Hit?" row marks hits with "Y".]
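This example can also be replayed in code. LRU replacement and an 8-line cache are assumed here (the slide does not name the replacement policy):

```python
from collections import OrderedDict

# Replaying the fully associative example (LRU replacement is an
# assumption). Any block may occupy any of the 8 lines.
trace = list("ABCADBEFACDBGCHIAB")

lines = OrderedDict()  # least recently used block first
hits = 0
for b in trace:
    if b in lines:
        hits += 1
        lines.move_to_end(b)           # refresh recency on a hit
    else:
        if len(lines) == 8:
            lines.popitem(last=False)  # evict the least recently used block
        lines[b] = None
print(f"{hits} hits / {len(trace)} references")  # 9 hits / 18 references
```

Because no block is tied to a dedicated line, the conflict misses a direct-mapped cache would suffer on this trace disappear, at the cost of searching all lines on every access.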
3. N-Way Set Associative Cache
 The set associative cache is a good compromise between the direct mapped and
fully associative caches.
 Each address is mapped to a certain set of cache locations.
 The address space is divided into blocks of m bytes (the cache line size),
discarding the bottom log2(m) address bits.
 An "n-way set associative" cache with S sets has n cache locations in each set.
Block b is mapped to set "b mod S" and may be stored in any of the n locations
in that set, with its upper address bits as a tag. To determine whether block b is
in the cache, set "b mod S" is searched associatively for the tag.
 In the "real world", the direct mapped and set associative caches are by far the
most common. Direct mapping is used more for level 2 caches on motherboards,
while the higher-performance set-associative cache is found more commonly on
the smaller primary caches contained within processors.
2-Way Set-Associative example
[Table: the same reference string A B C A D B E F A C D B G C H I A B applied to a 2-way set-associative cache with 4 sets; each column shows the contents of both ways of each set after one reference, and the "Hit?" row marks hits with "Y".]
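The 2-way example can be replayed in code as well. Four sets of two lines, set = block mod 4, and LRU within each set are assumed; the letter-to-block numbering (A = 0 through I = 8) is again an assumption for illustration:

```python
from collections import OrderedDict

# Replaying the 2-way set-associative example: 4 sets of 2 lines, block b
# lives in set b mod 4, LRU within each set. The letter-to-block
# numbering 0-8 is an assumption; the slide does not state it.
trace = list("ABCADBEFACDBGCHIAB")
block = {ch: i for i, ch in enumerate("ABCDEFGHI")}
N, S = 2, 4                        # 2-way, 4 sets

sets = [OrderedDict() for _ in range(S)]
hits = 0
for ch in trace:
    b = block[ch]
    s = sets[b % S]                # the one set this block may live in
    if b in s:
        hits += 1
        s.move_to_end(b)           # refresh recency within the set
    else:
        if len(s) == N:
            s.popitem(last=False)  # evict the set's least recently used line
        s[b] = None
print(f"{hits} hits / {len(trace)} references")  # 9 hits / 18 references
```

On this trace the 2-way cache matches the fully associative hit count while searching only 2 lines per access, which is the compromise the summary below describes.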
Summary of mapping
techniques
Cache Type               Hit Ratio                          Search Speed
Direct Mapped            Good                               Best
Fully Associative        Best                               Moderate
N-Way Set Associative    Very good, better as N increases   Good, but gets worse as N increases