Internal Memory
William Stallings
Computer Organization
and Architecture
Chapter 4
Internal Memory
The four-level memory hierarchy
♦Computer memory is
organized into a hierarchy.
♦Going down the hierarchy:
decreasing cost per bit, increasing
capacity, increasing access time,
and decreasing frequency of access
of the memory by the processor
♦The cache automatically
retains a copy of some of the
recently used words from the
DRAM.
Memory Hierarchy
Registers
In CPU
Internal or Main memory
May include one or more levels of cache
“RAM”
External memory
Backing store
4.1 COMPUTER MEMORY SYSTEM
OVERVIEW
Characteristics of Memory Systems
Location
Capacity
Unit of transfer
Access method
Performance
Physical type
Physical characteristics
Organisation
Location
The term location refers to whether memory is
internal or external to the computer.
CPU
The processor requires its own local memory, in the
form of registers.
Internal
Main memory, cache
External
Peripheral storage devices, such as disk and tape
Capacity
Internal memory capacity is typically expressed in terms
of bytes (1 byte = 8 bits) or words.
External memory capacity is expressed in bytes.
Word
The natural unit of organisation
Word length is usually 8, 16, or 32 bits
The size of the word is typically equal to the number
of bits used to represent a number and to the instruction
length. Unfortunately, there are many exceptions.
Unit of Transfer
Internal
Usually governed by data bus width
External
Usually a block which is much larger than a word
Addressable unit
Smallest location which can be uniquely addressed
At the word level or byte level
In any case,
2^A = N, where A is the length in bits of an address and
N is the number of addressable units
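The relation between address width and capacity can be sanity-checked with a short sketch (Python used here purely for illustration; not part of the slides):

```python
# The addressable-unit relation 2**A = N in code: A address bits
# can name N = 2**A distinct addressable units.

def address_bits(n_units: int) -> int:
    """Smallest address length A (in bits) with 2**A >= n_units."""
    return (n_units - 1).bit_length()
```

For example, a byte-addressable 16-MByte memory needs a 24-bit address, since 2^24 = 16M.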
Access Methods (1)
Sequential access
Start at the beginning and read through in order
Access time depends on location of data and previous
location
variable
e.g. tape
Direct access
Individual blocks have unique address
Access is by jumping to vicinity plus sequential search
Access time depends on location and previous location
variable
e.g. disk
Access Methods (2)
Random
Individual addresses identify locations exactly
Access time is independent of location or previous
access and is constant
e.g. RAM
Associative
Data is located by a comparison with contents of a
portion of the store
Access time is independent of location or previous
access and is constant
e.g. cache
Performance Parameters
Access time
For random-access memory
the time it takes to perform a read or write operation.
Time between presenting the address to the memory and getting the
valid data
For non-random-access memory
The time it takes to position the read-write mechanism at the desired
location.
Memory Cycle time
Cycle time is access time plus additional time
Time may be required for the memory to “recover” before next
access
Transfer Rate
Rate at which data can be moved
Physical Types
Semiconductor
RAM
Magnetic
Disk & Tape
Optical
CD (Compact Disk) & DVD (Digital Video Disk)
Others
Bubble
Hologram
Physical Characteristics
Decay
Volatility
In a volatile memory, information decays naturally or is
lost when electrical power is switched off.
In a nonvolatile memory, no electrical power is needed
to retain information, e.g. magnetic-surface memory.
Erasable
Power consumption
Organisation
Organisation means physical arrangement of
bits into words
Obvious arrangement not always used
The Bottom Line
The design constraints on a computer’s memory:
How much?
Capacity
How fast?
Time is money
How expensive?
A trade-off among the three key characteristics of
memory: cost, capacity, and access time.
Hierarchy List
Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape
Hierarchy List
Across this spectrum of technologies:
Faster access time, greater cost per bit
Greater capacity, smaller cost per bit
Greater capacity, slower access time
From top to bottom:
Decreasing cost per bit
Increasing capacity
Increasing access time
Decreasing frequency of access of the
memory by the processor
So you want fast?
It is possible to build a computer which uses
only static RAM (see later)
This would be very fast
This would need no cache
How can you cache cache?
This would cost a very large amount
Locality of Reference
During the course of the execution of a program, memory
references tend to cluster
e.g. loops and subroutines
Main memory is usually extended with a higher-speed,
smaller cache. It is a device for staging the movement of
data between main memory and processor registers to
improve performance.
External memory, called secondary or auxiliary
memory, is used to store program and data files and is
visible to the programmer only in terms of files and records.
4.2 Semiconductor Main Memory
Table 4.2 Semiconductor Memory Types
Types of Random-Access
Semiconductor Memory
RAM
A misnomer: all of the semiconductor memory types
listed in the table are random access, not just the one
labelled “RAM”.
Read/Write
Volatile
A RAM must be provided with a constant power supply.
Temporary storage
Static or dynamic
Dynamic RAM (DRAM)
Bits stored as charge in capacitors
Charges leak
Need refreshing even when powered
Simpler construction
Smaller per bit
Less expensive
Need refresh circuits
Slower
Main memory
Static RAM (SRAM)
Bits stored as on/off switches
No charges to leak
No refreshing needed when powered
More complex construction
Larger per bit
More expensive
Does not need refresh circuits
Faster
Cache
Read Only Memory (ROM)
Permanent storage
Applications
Microprogramming (see later)
Library subroutines
Systems programs (BIOS)
Function tables
Types of ROM
Written during manufacture
Very expensive for small runs
Programmable (once)
PROM
Needs special equipment to program
Read “mostly”
Erasable Programmable (EPROM)
Erased by UV
Electrically Erasable (EEPROM)
Takes much longer to write than read
Flash memory
It is intermediate between EPROM and EEPROM in both cost and
functionality.
Erase whole memory electrically or erase blocks of memory
Organisation in detail
Memory cell
The basic element of a semiconductor memory
Two stable states
being written into to set the state, or being read to sense the state
Chip Logic
One extreme organization: the physical arrangement of cells
in the array is the same as the logical arrangement.
The array is organized into W words of B bits each.
e.g. A 16Mbit chip can be organised as 1M 16-bit words
One-bit-per-chip in which data is read/written one bit at a
time
A bit-per-chip system has 16 chips of 1 Mbit each, with bit 1 of each
word in chip 1, and so on
Chip Logic
Typical organization of a 16-Mbit DRAM
A 16Mbit chip can be organised as a 2048 x 2048 x 4bit
array
Reduces number of address pins
Multiplex row address and column address
11 pins to address (2^11 = 2048)
An additional 11 address lines select one of 2048 columns of
4 bits per column. Four data lines are used for the input and output
of 4 bits to and from a data buffer. On write, the bit driver of
each bit line is activated for a 1 or 0 according to the value of
the corresponding data line. On read, the value of each bit line
is passed through a sense amplifier and presented to the data lines.
Adding one more pin devoted to addressing doubles the
number of rows and columns, and so the size of the chip
memory grows by a factor of 4.
Typical 16 Mb DRAM (4M x 4)
Refreshing
Refresh circuit included on chip
Disable chip
Count through rows
Read & Write back
Takes time
Slows down apparent performance
Chip Packaging
EPROM package, which is a one-word-per-chip,
8-Mbit chip organized as 1M×8
•The address of the word being accessed. For 1M
words, a total of 20 pins (2^20 = 1M) are needed.
•D0~D7
•The power supply to the chip (VCC)
•A ground pin (Vss)
•A chip enable (CE) pin: the CE pin is used to
indicate whether or not the address is valid for this
chip.
•A program voltage (Vpp)
DRAM package, 16-Mbit chip
organized as 4M×4
Because a RAM chip can be updated, its data
pins are input/output, unlike those of a
ROM chip
•Write Enable pin (WE)
•Output Enable pin (OE)
•Row Address Select (RAS)
•Column Address Select (CAS)
Module
Organisation
·If a RAM chip contains only
1 bit per word, clearly a
number of chips equal to the
number of bits per word is
needed.
e.g. How could a memory module
consisting of 256K 8-bit
words be organized?
256K = 2^18, so an 18-bit
address is needed;
The address is presented to
eight 256K×1-bit chips, each of
which provides the
input/output of 1 bit.
Figure 4.6 256-KByte Memory Organization
Module Organisation (2)
Figure 4.7 1-Mbyte Memory Organization
(1M×8 bit) / (256K×8 bit) = 4 = 2^2
As shown in Figure 4.7, 1M words by 8 bits per word is
organized as four columns of chips, each column
containing 256K words arranged as in Figure 4.6.
1M = 2^20
For 1M words, 20 address lines are needed.
The 18 least significant bits are routed to all 32
modules.
The high-order 2 bits are input to a group select logic
module that sends a chip enable signal to one of
the four columns of modules.
Error Correction
Hard Failure
Permanent defect
Soft Error
Random, non-destructive
No permanent damage to memory
Detected using Hamming error correcting code
Error Correcting Code Function
•A function f, is
performed on the data to
produce a code.
•When the previously
stored word is read out,
the code is used to
detect and possibly
correct errors.
•A new set of K code
bits is generated from
the M data bits and
compared with the
fetched code bits.
Even Parity bits
Figure 4.9 Hamming Error-Correcting Code
Figure 4.9 uses Venn diagrams to illustrate the use of Hamming code on 4-bit words
(M=4). With three intersection circles, there are seven compartments. We assign the 4
data bits to the inner compartments. The remaining compartments are filled with parity
bits. Each parity bit is chosen so that the total number of 1s in its circle is even.
Figure 4.8 ErrorCorrecting Code
The comparison logic receives as input two K-bit values. A bit-by-bit
comparison is done by taking the exclusive-or of the two inputs. The
result is called the syndrome word.
The syndrome word is therefore K bits wide and has a range
between 0 and 2^K − 1. The value 0 indicates that no error was
detected, leaving 2^K − 1 values to indicate, if there is an error,
which bit was in error (the numerical value of the syndrome
indicates the position of the bit in error).
An error could occur on any of the M data bits or K check bits,
so
2^K − 1 ≥ M + K
(This equation gives the number of bits needed to correct a single bit error
in a word containing M data bits.)
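Solving the inequality for the smallest K can be sketched in a few lines (Python, for illustration only):

```python
# Smallest K with 2**K - 1 >= M + K: the number of check bits
# needed for single-error correction of an M-bit data word.

def min_check_bits(m: int) -> int:
    k = 1
    while 2**k - 1 < m + k:
        k += 1
    return k
```

This reproduces the word-length increases in Table 4.3: 8 data bits need 4 check bits, 16 need 5, 32 need 6, and 64 need 7.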
Those bit positions whose
position numbers are powers of 2
are designated as check bits.
Each check bit operates on every
data bit position whose position
number contains a 1 in the
corresponding column position.
Bit position n is checked by those
check bits Ci such that Σi = n
(e.g., position 7 is checked by
C1, C2, and C4, since 1 + 2 + 4 = 7).
C8 C4 C2 C1
Figure 4.10 Layout of Data bits and Check bits
The check bits are calculated as follows, where the symbol ⊕
designates the exclusive-or operation:
C1 = M1 ⊕ M2 ⊕ M4 ⊕ M5 ⊕ M7
C2 = M1 ⊕ M3 ⊕ M4 ⊕ M6 ⊕ M7
C4 = M2 ⊕ M3 ⊕ M4 ⊕ M8
C8 = M5 ⊕ M6 ⊕ M7 ⊕ M8
Assume that the 8-bit input word is 00111001, with data bit M1 in
the rightmost position. The calculations give C8 C4 C2 C1 = 0111.
Suppose now that data bit 3 sustains an error and is changed from 0 to 1.
When the new check bits are compared with the old check bits,
the syndrome word is formed:
the result is 0110, indicating that bit position 6, which contains
data bit 3, is in error.
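The whole scheme fits in a short sketch (Python, for illustration; the layout follows the positions described above — data bits M1..M8 at positions 3, 5, 6, 7, 9, 10, 11, 12; check bits at the power-of-two positions):

```python
# Hamming SEC code for M=8 data bits, K=4 check bits (12-bit word).

DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]   # positions of M1..M8
CHECK_POS = [1, 2, 4, 8]                  # positions of C1, C2, C4, C8

def make_checks(data):
    """data[i] is bit M(i+1). Check bit C_i covers every data
    position whose binary form has bit i set (even parity)."""
    checks = []
    for c in CHECK_POS:
        p = 0
        for bit, pos in zip(data, DATA_POS):
            if pos & c:
                p ^= bit
        checks.append(p)
    return checks                         # [C1, C2, C4, C8]

def syndrome(old_checks, data):
    """XOR stored and recomputed check bits, read as C8 C4 C2 C1.
    0 means no error; otherwise the value names the bad position."""
    new = make_checks(data)
    return sum((a ^ b) << i for i, (a, b) in enumerate(zip(old_checks, new)))

# The slide example: word 00111001 with M1 rightmost.
word = [1, 0, 0, 1, 1, 1, 0, 0]           # M1..M8
stored = make_checks(word)                # computed at write time
flipped = word.copy()
flipped[2] ^= 1                           # data bit M3 (position 6) goes bad
```

Running it reproduces the example: the syndrome for the corrupted word is 0110 (decimal 6), naming position 6, which holds data bit 3.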
Figure 4.11 Check Bit Generation
a single-error-correction (SEC) code
More commonly, semiconductor memory is equipped with a
single-error-correcting double-error-detecting (SEC-DED) code.
An error-correction code enhances the reliability of the memory
at the cost of added complexity.
Table 4.3 Increase in Word Length with Error Correction
Figure 4.12 Hamming SEC-DED Code
The sequence shows that if two errors occur (Figure 4.12c), the checking procedure
goes astray (d) and worsens the problem by creating a third error (e). To overcome
the problem, an eighth bit is added that is set so that the total number of 1s in the
diagram is even.
4.3 CACHE MEMORY
Small amount of fast memory
Sits between normal main memory and CPU
May be located on CPU chip or module
Cache operation - overview
Figure 4.14 Cache/Main-Memory Structure (P118)
Cache includes tags to identify which block of main memory is in each
cache slot. The tag is usually a portion of the main memory address.
Figure 4.14 depicts (a) a cache of C lines (numbered 0 to C−1), each holding a tag
and a block of K words, and (b) a main memory of 2^n addressable words, organized
as blocks of K words each.
Figure 4.15 Cache Read Operation (P119)
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from main
memory to cache
• Then deliver from cache to CPU
Typical Cache Organization
•In this organization, the cache
connects to the processor via
data, control, and address lines.
•The data and address lines
attach to data and address
buffers, which attach to a
system bus from which main
memory is reached.
•When a cache hit occurs, the
data and address buffers are
disabled and communication is
only between processor and
cache, with no system bus
traffic
Figure 4.16 Typical Cache Organization
•When a cache miss occurs, the
desired address is loaded onto
the system bus and the data are
returned through a data buffer
to both the cache and main
memory.
Elements of Cache Design
Size
Mapping Function
Direct
Associative
Set Associative
Replacement Algorithm
Least recently used (LRU)
First in first out (FIFO)
Least frequently used (LFU)
Random
Write Policy
Write through
Write back
Write once
Block Size
Number of Caches
Single or two level
Unified or split
Cache Size
A trade-off between cost per bit and access time
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Checking cache for data takes time
Suggested “optimum” cache sizes: between 1K
and 512K words.
Mapping Function
Three techniques
direct, associative, and set associative
Elements of the example
Cache of 64kByte
Cache block of 4 bytes
Data is transferred between memory and the cache in
blocks of 4 bytes each.
i.e. cache is 16K (2^14) lines of 4 bytes
16MBytes main memory
24-bit address (2^24 = 16M)
Main memory (4M blocks of 4 bytes each)
Direct Mapping
Each block of main memory maps to only
one cache line
i.e. if a block is in cache, it must be in one specific place
Address is in two parts
The least significant w bits identify a unique word or
byte within a block of main memory.
The most significant s bits specify one memory block.
The MSBs are split into a cache line field of r bits and a
tag of s − r bits (most significant).
The line field of r bits identifies one of the m = 2^r lines
of the cache
Direct Mapping
Cache Line Table

Cache line    Main memory blocks assigned
0             0, m, 2m, …, 2^s − m
1             1, m+1, 2m+1, …, 2^s − m + 1
…             …
m−1           m−1, 2m−1, 3m−1, …, 2^s − 1

Every row has the same cache line number; every
column has the same tag number.
The mapping is expressed as:
i= j modulo m
where i =cache line number
j = main memory block number
m = number of lines in the cache
No two blocks in the same line have the same Tag field!
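The mapping and the tag-uniqueness claim can be sketched directly (Python, for illustration only):

```python
# Direct mapping: block j of main memory always lands in cache
# line j mod m, and the tag stored with the line is j // m, so
# two blocks that share a line necessarily differ in their tags.

def direct_map(j: int, m: int):
    """Return (cache line, tag) for main memory block j."""
    return j % m, j // m
```

For the running example (m = 2^14 lines), blocks 0, m, 2m, … all compete for line 0 but carry distinct tags 0, 1, 2, ….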
Direct Mapping Cache Organization
The r-bit line number
is used as an index
into the cache to
access a particular
line.
If the (s-r) bit tag
number matches the
tag number currently
stored in that line,
then the w-bit word
number is used to
select one of the 2^w
bytes in that line.
Otherwise, the s bits
tag-plus-line field is
used to fetch a block
from main memory.
Direct Mapping
Address Structure
Tag (s−r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits

24-bit address
w = 2-bit word identifier (4-byte block)
s = 22-bit block identifier
8-bit tag (= 22 − 14)
14-bit slot or line
No two blocks in the same line have the same Tag field
Check contents of cache by finding line and checking Tag
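With the field widths above (8-bit tag, 14-bit line, 2-bit word), checking a line reduces to shifting and masking. A sketch (Python, illustrative only; the sample address is arbitrary):

```python
def split_direct(addr: int):
    """Split a 24-bit address into (tag, line, word) using the
    example layout: 8-bit tag, 14-bit line, 2-bit word."""
    word = addr & 0x3             # low 2 bits: byte within the block
    line = (addr >> 2) & 0x3FFF   # next 14 bits: cache line index
    tag = (addr >> 16) & 0xFF     # top 8 bits: tag to compare
    return tag, line, word
```

For instance, address 16339C splits into tag 16, line 0CE7, word 0 (all hex).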
Direct Mapping Example
• The cache is organized as
16K = 2^14 lines of 4 bytes each.
• The main memory consists of
16 Mbytes, organized as 4M
blocks of 4 bytes each.
• i= j modulo m
i = cache line number
j = main memory block number
m = number of lines in the cache
• Note that no two blocks that
map into the same line number
have the same tag number.
Main Memory Address
Direct Mapping pros & cons
Advantages
Simple
Inexpensive
Disadvantages
Fixed location for given block
If a program accesses 2 blocks that map to the same line
repeatedly, cache misses are very high
Associative Mapping
A main memory block can load into any line of
cache
Memory address is interpreted as a tag and a word
field.
Tag uniquely identifies block of memory
Every line’s tag is examined for a match
Disadvantages of associative mapping
Cache searching gets expensive
Complex circuitry is required to examine the tags of all
cache lines in parallel.
Fully Associative Cache
Organization
Associative Mapping
Address Structure
Tag: 22 bits | Word: 2 bits

22-bit tag stored with each 32-bit (4-byte) block of data
Compare tag field with tag entry in cache to check for
hit
Least significant 2 bits of address identify which byte
is required from the 32-bit data block
e.g.
Address    Tag       Data        Cache line
16339C     058CE7    FEDCBA98    0001
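The example row is easy to verify: with no line field, the tag is simply the 22-bit block number, i.e. the address shifted right by the 2-bit word field. A sketch (Python, illustrative only):

```python
def split_assoc(addr: int):
    """Associative mapping for the example cache: 22-bit tag
    (the block number itself) plus a 2-bit word field."""
    return addr >> 2, addr & 0x3      # (tag, word)
```

Address 16339C indeed gives tag 058CE7 with word offset 0, matching the table above.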
Associative Mapping Example
Main Memory Address
Set Associative Mapping
Cache is divided into a number of sets
Each set contains a number of lines
A given block maps to any line in a given set
e.g. Block B can be in any line of set i
e.g. 2 lines per set
2 way associative mapping
A given block can be in one of 2 lines in only one set
Set Associative Mapping
In this case, the cache is divided into v sets,
each of which consists of k lines.
The relationships are
m=v×k
i = j modulo v
where
i=cache set number
j=main memory block number
m=number of lines in the cache
This is referred to as k-way set associative mapping.
Two Way Set Associative Cache
Organization
The d set bits specify one
of v = 2^d sets. The s bits of
the tag and set fields
specify one of the 2^s blocks
of main memory.
With K-way set associative
mapping, the tag in a
memory address is much
smaller and is only
compared to the k tags
within a single set.
Set Associative Mapping
Example
13-bit set number
Block number in main memory is modulo 2^13
000000, 00A000, 00B000, 00C000 … map to
same set
Set Associative Mapping
Address Structure
Tag: 9 bits | Set: 13 bits | Word: 2 bits
Use set field to determine cache set to look in
Tag+Set field specifies one of the blocks in
the main memory.
Compare tag field to see if we have a hit
e.g.
Address     Tag    Data        Set number
1FF 7FFC    1FF    24682468    1FFF
Two Way Set Associative
Mapping Example
e.g.
Address     Tag    Data        Set number
1FF 7FFC    1FF    24682468    1FFF
02C 0004    02C    11235813    0001
Main Memory Address
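Both example rows can be checked by shifting and masking. Reading the slide's "1FF 7FFC" as the 9 tag bits followed by the remaining 15 bits gives the 24-bit address FFFFFC, and "02C 0004" gives 160004. A sketch (Python, illustrative only):

```python
def split_set_assoc(addr: int):
    """Two-way set-associative example layout: 9-bit tag,
    13-bit set, 2-bit word."""
    word = addr & 0x3              # low 2 bits: byte within block
    s = (addr >> 2) & 0x1FFF       # next 13 bits: set number
    tag = (addr >> 15) & 0x1FF     # top 9 bits: tag within the set
    return tag, s, word
```

Both addresses decode to the tag and set values shown in the table.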
Replacement Algorithms (1)
When a new block is brought into the cache, one
of the existing blocks must be replaced.
Direct mapping
No choice
Each block only maps to one line
Replace that line
Replacement Algorithms (2)
Associative & Set Associative
Hardware implemented algorithm (speed)
Least Recently used (LRU)
Replace that block in the set which has been in the cache
longest with no reference to it. (hit ratio + time)
e.g. in 2 way set associative
Which of the 2 blocks is LRU?
First in first out (FIFO)
replace block in the set that has been in cache longest.
(time)
Least frequently used
replace the block in the set which has had the fewest hits.
Random
(hit ratio)
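LRU within one set can be modelled in a few lines. This is a toy sketch (Python; real caches track recency in hardware, and the class and method names here are invented for illustration):

```python
from collections import OrderedDict

class LRUSet:
    """Toy model of LRU replacement inside one k-line cache set."""
    def __init__(self, k: int):
        self.k = k
        self.lines = OrderedDict()        # tag -> cached block

    def access(self, tag, block=None):
        """Return True on a hit; on a miss, insert the block,
        evicting the least recently used line if the set is full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)   # now the most recently used
            return True
        if len(self.lines) >= self.k:
            self.lines.popitem(last=False)  # drop the LRU line
        self.lines[tag] = block
        return False
```

In a 2-way set, touching tags 1, 2, then 1 again and bringing in tag 3 evicts tag 2, the line unreferenced for longest.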
Write Policy
Must not overwrite a cache block unless main
memory is up to date
Problems to contend with
More than one device may have access to main memory.
Data inconsistent between memory and cache
Multiple CPUs may have individual caches
Data inconsistent among caches
Write Policy
Write through
Write back
Write once
Write through
All writes go to main memory as well as cache
Any other processor–cache pair can monitor main
memory traffic to keep its local (to CPU) cache
updated.
Disadvantages
Lots of traffic
Slows down writes
Write back
Updates initially made in cache only
Update bit for cache slot is set when update occurs
If block in cache is to be replaced, write to
main memory only if update bit is set
Other caches get out of sync
I/O must access main memory through cache
Because portions of main memory are invalid
Approaches to cache coherency
Bus watching with write through
Each cache controller monitors the address lines to detect write
operations to memory by other bus masters.
This strategy depends on the use of a write-through policy by all
cache controllers.
Hardware transparency
Additional hardware is used to ensure that all the updates to
main memory via cache are reflected in all caches.
Noncachable memory
Only a portion of main memory is shared by more than one
processor.
In such a system, all accesses to shared memory are cache
misses, because the shared memory is never copied to the cache.
The noncachable memory can be identified using chip-select logic
or high-address bits.
Line Size
The principle of locality
Data in the vicinity of a referenced word is likely to be
referenced in the near future.
The relationship between block size and hit ratio is
complex, depending on the locality characteristics
of a particular program, and no definitive optimum
value has been found.
A size of from two to eight words seems
reasonably close to optimum.
Number of caches
A single cache
Multiple caches
The number of levels of caches
The use of unified versus split caches
Split caches: one dedicated to instructions and one dedicated
to data
• Key advantage of split caches: eliminate contention for cache
between the instruction processor and the execution unit.
Unified cache: a single cache used to store references to both
data and instructions
For a given cache size, a unified cache has a higher hit rate than
split caches because it balances the load between instruction and
data fetches automatically.
Number of caches
The on-chip cache: cache and processor on the same chip
When the requested instruction or data is found in the on-chip cache,
the bus access is eliminated. Because of the short data paths internal
to the processor, on-chip cache accesses will complete appreciably
faster than would even zero-wait state bus cycles.
Advantages
Reduce the processor’s external bus activity
Speed up execution times
Increase overall system performance
A two-level cache
The internal cache designated as level 1 (L1)
The external cache designated as level 2 (L2)
4.4 Pentium Cache
Foreground reading
Find out detail of Pentium II cache systems
NOT just from Stallings!
4.5 Newer RAM Technology (1)
Basic DRAM same since first RAM chips
Constraints of the traditional DRAM chip:
its internal architecture and its interface to the
processor’s memory bus.
Enhanced DRAM
Contains small SRAM as well
SRAM holds last line read
A comparator stores the 11-bit value of the most recent
row address selection.
Cache DRAM (CDRAM)
Larger SRAM component
Use as cache or serial buffer
Newer RAM Technology (2)
Synchronous DRAM (SDRAM)
Access is synchronized with an external clock, unlike
conventional DRAM, which is asynchronous.
Address is presented to RAM
Since SDRAM moves data in time with system clock, CPU
knows when data will be ready
CPU does not have to wait, it can do something else
Burst mode allows SDRAM to set up stream of data and
fire it out in block
Internal logic of
the SDRAM
• In burst mode, a series of data
bits can be clocked out rapidly
after the first bit has been
accessed.
Burst mode is useful when all
the bits to be accessed are in
sequence and in the same row
of the array as the initial access
•A dual-bank internal
architecture that improves
opportunities for on-chip
parallelism.
• The mode register and
associated control logic
provide a mechanism to
customize the SDRAM to suit
specific system needs.
Newer RAM Technology (3)
Foreground reading
Check out any other RAM you can find
See Web site:
The RAM Guide
Exercises
P143 4.4, 4.6, 4.7, 4.8
P145 4.20
Deadline