Transcript Chapter 3

MICROPROCESSOR
MEMORY ORGANIZATION
1



3.1 Introduction
3.2 Main memory
3.3 Microprocessor on-chip memory
management unit and cache
2
A memory unit is an integral part of any
microcomputer, and its primary purpose is to
hold instructions and data.
 A memory system can be divided into three
groups:
1. Microprocessor memory
2. Primary or main memory
3. Secondary memory

3



Microprocessor memory is a set of
microprocessor registers used to hold temporary
results.
Main memory is the storage area in which all
programs are executed; it includes ROM and RAM.
Secondary memory consists of electromechanical
devices such as hard disks; it is sometimes also
called virtual memory.

The microcomputer cannot execute programs
stored in the secondary memory directly, so to
execute these programs the microcomputer must
transfer them to its main memory by a program
called the operating system.
4
Memory type             Speed         Size
Microprocessor memory   The fastest   The smallest
Main memory             The slower    The larger
Secondary memory        The slowest   The largest
5
8-bit microprocessors:
The memory is divided into a number of 8-bit
units called memory words. An 8-bit unit of
data is termed a byte. Therefore, for an 8-bit
microprocessor, memory word and memory
byte mean the same thing.
6
16-bit microprocessors:
 The memory is divided into words, each of which
contains 2 bytes (16 bits). A memory word is
identified in the memory by an address.
 For example, the Pentium microprocessor
uses 32-bit addresses for accessing memory
words.
 This provides a maximum of 2^32 =
4,294,967,296 = 4 GB of memory addresses,
ranging from 00000000 to FFFFFFFF in
hexadecimal.

7
Intel Pentium microprocessors:
 The memory is divided into segments.
 Segment size = 2^16 bytes = 64 KB, addressed by 16 bits.

8

Intel Pentium microprocessors (1MB):
[Figure: address fields, labeled "High bit for address" and "LOW bit for segment number"]
9


1 MB memory: 2^20 / 2^16 = 2^4 = 16 segments.
No. of segments = size of memory / size of one segment (2^16).
For example, a computer that uses 24 address
pins can address 2^24 = 16 MB of memory
directly, with addresses from 000000 to
FFFFFF in hexadecimal.
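The arithmetic on the last few slides can be checked with a short Python sketch. The function names are hypothetical and chosen only for this illustration; the segment size of 2^16 bytes comes from the slides.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct addresses reachable with the given number of address bits."""
    return 2 ** address_bits


def segment_count(memory_bytes: int, segment_bytes: int = 2 ** 16) -> int:
    """No. of segments = size of memory / size of one segment."""
    return memory_bytes // segment_bytes


print(addressable_bytes(32))            # 4294967296 (4 GB)
print(segment_count(2 ** 20))           # 16 segments in 1 MB
print(hex(addressable_bytes(24) - 1))   # 0xffffff, the highest 24-bit address
```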
10



An important characteristic of a memory is
whether it is volatile or nonvolatile.
The contents of a volatile memory are lost if
the power is turned off.
On the other hand, a nonvolatile memory
retains its contents after power is switched
off. ROM is a typical example of nonvolatile
memory. RAM is a volatile memory.
11



ROMs can only be read and are nonvolatile
memory.
CMOS technology is used to fabricate them.
ROMs are divided into mask ROM, erasable
PROM (EPROM), and EAROM (electrically
alterable ROM), also called EEPROM or E2PROM
(electrically erasable PROM).
12
13




Mask ROMs are programmed by a masking
operation performed on a chip during the
manufacturing process. The contents of mask
ROMs are permanent and cannot be
changed by the user.
EPROMs can be programmed, and their contents
can also be altered by using special equipment,
called an EPROM programmer.
When designing a microcomputer for a particular
application, permanent programs are stored in
ROMs. Control memories used to microprogram
the control unit are ROMs.
14


EPROMs can be reprogrammed and erased.
The chip must be removed from the
microcomputer system for programming.
This memory is erased by exposing the chip
to ultraviolet light.
Typical erase times vary between 10 and 20
minutes.
15




EAROMs can be programmed without removing the memory
from its socket.
These memories are also called read-mostly memories
(RMMs), because they have much slower write times than
read times. Therefore, these memories are usually suited
for operations in which mostly reading rather than writing will
be performed.
Another type of memory, called flash memory (nonvolatile),
is designed using a combination of EPROM and E2PROM
technologies.
Flash memory can be reprogrammed electrically while
embedded on the board. Flash memory is used, for example,
in cellular phones and digital cameras.
16

There are two types of RAM: static RAM
(SRAM), and dynamic RAM (DRAM).
SRAM                          DRAM
Stores data in flip-flops.    Stores data in capacitors.
Does not need to be           Can hold data for only a few
refreshed.                    milliseconds, so it needs to be
                              refreshed.
Has lower density.            Has higher density.
DRAMs are inexpensive, occupy less space, and
dissipate less power than SRAMs.
17



Two enhanced versions of DRAM are EDO
DRAM (extended data output DRAM) and
SDRAM (synchronous DRAM).
The EDO DRAM provides fast access by
allowing the DRAM controller to output the
next address at the same time the current
data is being read.
An SDRAM contains multiple DRAMs
(typically, four) internally. SDRAMs utilize the
multiplexed addressing of conventional
DRAMs.
18

We consider the instruction fetch, memory
READ, and memory WRITE timing diagrams.
19
20
READ timing
1. The microprocessor performs the instruction fetch cycle
as before to READ the opcode.
2. The microprocessor interprets the opcode as a memory
READ operation.
3. When the clock pin signal goes HIGH, the microprocessor
places the contents of the memory address register on the
address pins A0-A15 of the chip.
4. At the same time, the microprocessor raises the READ pin
signal to HIGH.
5. The logic external to the microprocessor gets the
contents of the location in the main ROM/RAM addressed
by the memory address register and places it on the data
bus.
6. Finally, the microprocessor gets this data from the data
bus via pins D0-D7 and stores it in an internal register.

21
22
Write timing
1. When the clock pin signal goes HIGH, the
microprocessor places the contents of the
memory address register on the address pins
A0-A15 of the chip.
2. At the same time, the microprocessor raises the
WRITE pin signal to HIGH.
3. The microprocessor places data to be stored
from the contents of an internal register onto
data pins D0-D7.
4. The logic external to the microprocessor stores
the data from the register into a RAM location
addressed by the memory address register.
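The READ and WRITE sequences can be mimicked with a toy Python model. This is only a sketch of the data flow, not a timing-accurate simulation: the clock is not modeled, the dictionary standing in for the external memory logic is an assumption, and the function names are hypothetical.

```python
ADDRESS_MASK = 0xFFFF  # 16 address pins, A0-A15
DATA_MASK = 0xFF       # 8 data pins, D0-D7


def write_cycle(memory, mar, register_value):
    """Model of the WRITE steps: address out, WRITE HIGH, data out, memory stores."""
    address_pins = mar & ADDRESS_MASK        # MAR contents placed on A0-A15
    data_pins = register_value & DATA_MASK   # internal register placed on D0-D7
    memory[address_pins] = data_pins         # external logic stores the byte


def read_cycle(memory, mar):
    """Model of the READ steps: address out, READ HIGH, memory drives the data bus."""
    address_pins = mar & ADDRESS_MASK        # MAR contents placed on A0-A15
    data_bus = memory.get(address_pins, 0)   # external logic fetches the byte
    return data_bus & DATA_MASK              # microprocessor latches D0-D7


memory = {}                                  # stands in for the ROM/RAM array
write_cycle(memory, 0x2000, 0x5A)
print(hex(read_cycle(memory, 0x2000)))       # 0x5a
```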

23
DRAM Organization
 DRAMs are typically used when memory
requirements are 16K words or larger.
 DRAM is addressed via row and column
addressing.
24
DRAM Organization
 A 1-Mb (one megabit) DRAM requiring 20
address bits is addressed using 10 address
lines and two control lines, RAS (row address
strobe) and CAS (column address strobe).
 To provide a 20-bit address into the DRAM, a
LOW is applied to RAS and 10 bits of the
address are latched. The other 10 bits of the
address are applied next and CAS is then held
LOW.
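The multiplexed addressing can be illustrated with a small Python sketch that splits a 20-bit address into the two 10-bit halves sent over the same 10 address lines. Which half is latched with RAS is an assumption here (the upper 10 bits are treated as the row), and the function name is hypothetical.

```python
ROW_BITS = 10
COLUMN_BITS = 10


def split_dram_address(address):
    """Return (row, column) for a 20-bit DRAM address."""
    if not 0 <= address < 1 << (ROW_BITS + COLUMN_BITS):
        raise ValueError("address does not fit in 20 bits")
    row = address >> COLUMN_BITS                 # assumed: latched while RAS is LOW
    column = address & ((1 << COLUMN_BITS) - 1)  # assumed: latched while CAS is LOW
    return row, column


print(split_dram_address(0xFFFFF))  # (1023, 1023)
print(split_dram_address(0x00401))  # (1, 1)
```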
25

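The addressing capability of the DRAM can be
increased by a factor of 4 by adding one more
address line, because the extra line supplies
one additional row bit and one additional
column bit:
2^20 x 4 = 2^20 x 2^2 = 2^22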

External logic is required to generate the RAS
and CAS signals and to output the current
address bits to the DRAM.
26

DRAM controller chips take care of the
refreshing and timing requirements needed
by DRAMs. DRAMs typically require a 4-ms
refresh time. The DRAM controller sends a wait
signal to the microprocessor if the
microprocessor tries to access memory during a
refresh cycle.
27


Memory array design means
interconnecting several memory chips.
A microprocessor with 16 address lines can
directly address a maximum of 2^16 = 65,536
(64K) bytes of memory.
28

The control line M/IO goes LOW if the
microprocessor executes an I/O instruction; it
is held HIGH if the microprocessor executes a
memory instruction.
29
[Figure: chip select logic using M/IO]
30
[Figure: chip disable]
31

To connect a microprocessor to ROM/RAM
chips, two address-decoding techniques are
commonly used: linear decoding and full
decoding.
32
linear decoding
Suppose we have a 4K SRAM array
made up of the four 1K SRAM chips of
Figure 3.7 (see Figure 3.8).

33

linear decoding
34
35
linear decoding Advantage
It does not require decoding hardware.
 linear decoding Disadvantage
1. If two or more of lines A10-A13 are low at the
same time, more than one SRAM chip is
selected, and this causes a bus conflict.
Solution: software must be written such that it
never reads from or writes to any address in
which more than one of bits A10-A13 is low.

36
linear decoding Disadvantage (cont.)
2. It wastes a large amount of address space.
For example, whenever the address value is
B800 or 3800, SRAM chip I is selected (this
situation is also called memory foldback).
Solution: To resolve problems with linear
decoding, we use fully decoded memory
addressing.

The system of Figure 3.8 can be expanded
up to a total capacity of 6K using A14 and
A15 as chip selects for two more 1K SRAM
chips.
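Both disadvantages of linear decoding can be seen in a short Python sketch. The wiring is an assumption, since Figure 3.8 is not reproduced here: lines A10-A13 are taken to drive active-LOW chip selects of chips I to IV in that order, which is consistent with addresses B800 and 3800 both selecting chip I. The names are hypothetical.

```python
# Assumed wiring: address line number -> SRAM chip number (select is active LOW)
CHIP_SELECT_LINES = {10: 1, 11: 2, 12: 3, 13: 4}


def selected_chips(address):
    """Return the chips whose select line is LOW for this 16-bit address."""
    return [chip
            for line, chip in sorted(CHIP_SELECT_LINES.items())
            if not (address >> line) & 1]


print(selected_chips(0xB800))  # [1]    foldback: same chip as 0x3800
print(selected_chips(0x3800))  # [1]
print(selected_chips(0x3000))  # [1, 2] two chips selected: bus conflict
```

An address such as 0x3000, where both A10 and A11 are low, is exactly the kind of address the software must avoid.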

37


full decoding.
A decoder is used. In Figure 3.9, the decoder output
selects one of the four 1K SRAM chips,
depending on the values of A12, A11, and
A10 (Table 3.3).
38
39


Note that the decoder output will be enabled
only when E3 = E2 = 0 and E1 = 1.
With the 3 x 8 decoder, when any one of the
high-order bits A15, A14, or A13 is 1, the
decoder will be disabled, and thus none of
the SRAM chips will be selected.
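The behavior of the fully decoded scheme can be sketched in Python. This models only what the slides state: A12, A11, and A10 form the decoder input, and the decoder is disabled when any of A15, A14, or A13 is 1. How the enable pins E1 to E3 are wired and which decoder outputs go to which chip (Table 3.3) are not reproduced, so the function simply returns the active decoder output number; its name is hypothetical.

```python
def decoder_output(address):
    """Return the active 3 x 8 decoder output (0-7), or None if the decoder is disabled."""
    high_bits = (address >> 13) & 0b111   # A15, A14, A13
    if high_bits != 0:
        return None                       # decoder disabled: no SRAM chip selected
    return (address >> 10) & 0b111        # A12, A11, A10 pick exactly one output


print(decoder_output(0x0000))  # 0
print(decoder_output(0x0400))  # 1
print(decoder_output(0x2000))  # None
```

Because exactly one output is active for any enabled address, two chips can never be selected at once.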
40
41
Typical 32-bit microprocessors such as the
Pentium contain on-chip memory
management unit hardware and on-chip
cache memory. These topics are discussed
next.
42


Because access to a hard disk is slow, relying
on it alone would reduce system throughput to
unacceptable levels. An obvious solution is to
use a large and fast locally accessed
semiconductor memory. Unfortunately, the
storage cost per bit for this solution is very high.
Therefore, a combination of both off-board disk
(secondary memory) and on-board
semiconductor main memory must be
designed into a system.
43
Memory management unit (MMU):
a device, located between the microprocessor
and memory, that controls accesses, performs
address mappings, and acts as an interface
between the logical (programmer’s
memory) and physical (microprocessor’s
directly addressable memory) address spaces.
44
MMU address translation:


It translates logical program addresses to
physical memory addresses. Note that in
assembly language programming, addresses
are referred to by symbolic names.
These addresses in a program are called
logical addresses because they indicate the
logical positions of instructions and data.
45
MMU address translation:
The MMU can perform address translation in
one of two ways:
1. By using the substitution technique [Figure
3.10(a)].
2. By adding an offset to each logical address
to obtain the corresponding physical address
[Figure 3.10(b)].

46
MMU address translation:
47
MMU address translation:

Address translation using the substitution
technique is faster than translation using
the offset method. However, the offset
method has the advantage of mapping a
logical address to any physical address as
determined by the offset value.
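The two translation styles can be sketched in Python. This is a minimal illustration, not the mechanism of any particular MMU: substitution is modeled here as replacing the high-order bits of the logical address with bits looked up in a table, and the offset method as a plain addition. The 8-bit split point, the table contents, and the function names are all assumptions made for the example.

```python
LOW_BITS = 8  # assumed number of low-order bits that pass through unchanged


def translate_by_substitution(logical, table):
    """Replace the high-order bits of the address with the mapped bits from a table."""
    high = logical >> LOW_BITS
    low = logical & ((1 << LOW_BITS) - 1)
    return (table[high] << LOW_BITS) | low   # a lookup and a concatenation, no adder


def translate_by_offset(logical, offset):
    """Add an offset to the logical address; any physical address can be reached."""
    return logical + offset


print(hex(translate_by_substitution(0x1234, {0x12: 0x7F})))  # 0x7f34
print(hex(translate_by_offset(0x1234, 0x0100)))              # 0x1334
```

The sketch also hints at the trade-off stated above: substitution needs no addition, while the offset method is not restricted to fixed bit boundaries.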
48
MMU address translation:



Memory is usually divided into small
manageable units: pages and segments.
Paging divides the memory into equal-sized
pages; segmentation divides the memory into
variable-sized segments.
It is relatively easy to implement the address
translation table if the logical and main
memory spaces are divided into pages.
49
MMU address translation (mapping):
There are three ways to map logical addresses
to physical addresses:
 paging,
 segmentation,
 and combined paging-segmentation.
50
The paging method
 The virtual memory system is managed
by both hardware and software. The hardware
included in the memory management unit
handles address translation. The memory
management software in the operating
system performs all functions, including page
replacement policies to provide efficient
memory utilization.
51
The Segmentation method
 An MMU uses the segment selector to
obtain a descriptor from a table in memory
containing several descriptors. A descriptor
contains the physical base address for a
segment, the segment’s privilege level, and
some control bits.
52
The Segmentation method
 When the MMU obtains a logical address from
the microprocessor, it first determines
whether the segment is already in physical
memory. If it is, the MMU adds an offset
component to the segment base component of
the address obtained from the segment
descriptor table to provide the physical
address. The MMU then generates the physical
address on the address bus for selecting the
memory.
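The segmentation steps can be sketched in Python. The descriptor fields follow the slides (physical base address, privilege level, control bits, reduced here to a single "present" flag); the table contents, field names, and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int            # physical base address of the segment
    present: bool        # control bit: is the segment in physical memory?
    privilege: int = 0   # privilege level (not checked in this sketch)


def translate(descriptor_table, selector, offset):
    """Return the physical address for (segment selector, offset)."""
    descriptor = descriptor_table[selector]      # fetch the descriptor from the table
    if not descriptor.present:
        raise LookupError("segment is not in physical memory")
    return descriptor.base + offset              # base component + offset component


table = {1: Descriptor(base=0x20000, present=True),
         2: Descriptor(base=0x00000, present=False)}
print(hex(translate(table, 1, 0x0123)))  # 0x20123
```

What happens when the segment is absent (here an exception) is left to the operating system in a real system.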
53
The paged-segmentation method
 Each segment contains a number of pages.
The logical address is divided into three
components: segment, page, and word.
 A page component of n bits can provide up to
2^n pages.
A segment can be assigned one or
more pages, up to a maximum of 2^n pages;
therefore, a segment size depends on the
number of pages assigned to it.
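The three-component logical address can be illustrated with a Python sketch. The field widths below (4-bit segment, 8-bit page, 12-bit word) are assumptions chosen only to make the example concrete; the slides do not specify them.

```python
SEGMENT_BITS, PAGE_BITS, WORD_BITS = 4, 8, 12   # hypothetical field widths


def split_logical_address(address):
    """Return (segment, page, word) fields of a logical address."""
    word = address & ((1 << WORD_BITS) - 1)
    page = (address >> WORD_BITS) & ((1 << PAGE_BITS) - 1)
    segment = (address >> (WORD_BITS + PAGE_BITS)) & ((1 << SEGMENT_BITS) - 1)
    return segment, page, word


# A page component of n bits provides up to 2^n pages per segment.
MAX_PAGES_PER_SEGMENT = 1 << PAGE_BITS

print(split_logical_address(0xA3C123))  # (10, 60, 291)
print(MAX_PAGES_PER_SEGMENT)            # 256
```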
54
The Virtual memory
 The key idea behind virtual memory is to
allow a user program to address more
locations than those available in physical
memory.
 An address generated by a user program is
called a virtual address.
55

The performance of a microprocessor system
can be improved significantly by introducing
a small, expensive, but fast memory between
the microprocessor and main memory.
56
57



A cache memory is very small in size, and its
access time is less than that of the main
memory by a factor of 5. Typically, the access
times of the cache and main memories are
100 and 500 ns, respectively.
A cache hit means that the reference is found
in the cache.
A cache miss means that the reference is not
found in the cache.
58

The relationship between the cache and main
memory blocks is established using mapping
techniques. Three widely used mapping
techniques are direct mapping, fully
associative mapping, and set-associative
mapping.
59
Direct mapping,

Direct mapping uses a RAM for the cache.
The microprocessor’s 12-bit address is
divided into two fields, an index field and a
tag field. Because the cache address is 8 bits
wide (2^8 = 256), the low-order 8 bits of the
microprocessor’s address form the index
field, and the remaining 4 bits constitute the
tag field.
 In general, if the main memory address field
is m bits wide and the cache memory address
is n bits wide, the index field will then require
n bits and the tag field will be (m - n) bits wide.
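The index/tag split can be written as a few lines of Python. The default widths are the ones from this slide (m = 12, n = 8); the function name is hypothetical.

```python
def split_address(address, m=12, n=8):
    """Split an m-bit main memory address into (tag, index) for an n-bit cache address."""
    if not 0 <= address < 1 << m:
        raise ValueError("address does not fit in m bits")
    index = address & ((1 << n) - 1)   # low-order n bits
    tag = address >> n                 # remaining (m - n) bits
    return tag, index


print(split_address(0x100))  # (1, 0)
print(split_address(0x445))  # (4, 69), i.e. tag 4, index 0x45
```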
60
Direct mapping,
61






Direct mapping,
The microprocessor first accesses the cache.
If there is a hit, the microprocessor
accepts the 16-bit word from the cache. In
case of a miss, the microprocessor reads the
desired 16-bit word from the main memory,
and this 16-bit word is then written to the
cache. A cache memory may contain
instructions only (instruction cache), data
only (data cache), or both instructions and data
(unified cache).
62
Numerical example for Direct mapping
63
Example :
The content of index address 00 of the cache is
tag = 0 and data = 013F. Suppose that a
microprocessor wants to access the memory
address 100. The index address 00 is used to
access the cache. Memory address tag 1 is
compared with cache tag 0. This does not
produce a match. Therefore, the main
memory is accessed and the data 2714 is
transferred into the microprocessor. The
cache word at index address 00 is then
replaced by a tag of 1 and data of 2714.
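The example can be replayed with a small Python model of a direct-mapped cache. The dictionaries standing in for the cache RAM and main memory, and the function name, are assumptions for illustration; the addresses and data values are the hexadecimal ones from the example.

```python
INDEX_BITS = 8  # 8-bit cache address, as in the example


def access(cache, main_memory, address):
    """Return (data, hit) and update the cache on a miss."""
    index = address & ((1 << INDEX_BITS) - 1)
    tag = address >> INDEX_BITS
    entry = cache.get(index)
    if entry is not None and entry[0] == tag:
        return entry[1], True            # tags match: cache hit
    data = main_memory[address]          # miss: read main memory
    cache[index] = (tag, data)           # replace the cache word at this index
    return data, False


cache = {0x00: (0x0, 0x013F)}            # index 00 holds tag 0, data 013F
main_memory = {0x100: 0x2714}

data, hit = access(cache, main_memory, 0x100)
print(hex(data), hit)                    # 0x2714 False
print(cache[0x00] == (0x1, 0x2714))      # True: index 00 now holds tag 1, data 2714
```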

64

One of the main drawbacks of direct mapping
is that numerous misses may occur if two or
more words with addresses that have the
same index but different tags are accessed
several times.
65
Fully associative mapping
 The fastest and most expensive cache
memory
 Each element in associative memory contains
a main memory address and its content
(data).
66
Fully associative mapping
When the microprocessor generates a main
memory address, it is compared associatively
(simultaneously) with all addresses in the
associative memory. If there is a match, the
corresponding data word is read from the
associative cache memory and sent to the
microprocessor. If a miss occurs, the main
memory is accessed and the address and its
corresponding data are written to the
associative cache memory.
67
Fully associative mapping
68
Fully associative mapping
Each word in the cache is a 12-bit address
along with its 16-bit contents (data). When
the microprocessor wants to access memory,
the 12-bit address is placed in an address
register and the associative cache memory is
searched for a matching address. Suppose
that the content of the microprocessor
address register is 445. Because there is a
match, the microprocessor reads the
corresponding data 0FA1 into an internal data
register.
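A fully associative lookup can be approximated in Python with a dictionary keyed by the full main memory address. This is only a functional sketch: real associative memory compares all stored addresses simultaneously in hardware, and cache capacity and replacement are not modeled. The function name and the main memory contents are hypothetical; address 445 and data 0FA1 come from the example.

```python
def associative_read(cache, main_memory, address):
    """Return (data, hit); on a miss, store the address and its data in the cache."""
    if address in cache:                  # stands in for the simultaneous comparison
        return cache[address], True
    data = main_memory[address]           # miss: access main memory
    cache[address] = data                 # write the address and its data to the cache
    return data, False


cache = {0x445: 0x0FA1}
main_memory = {0x445: 0x0FA1, 0x123: 0x0042}   # 0x123 entry is made up for the demo

print(associative_read(cache, main_memory, 0x445))  # (4001, True), 4001 == 0x0FA1
print(associative_read(cache, main_memory, 0x123))  # (66, False)
```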
69
Set-associative mapping.
 Set-associative mapping is a combination of
direct and associative mapping.
 Each cache word stores two or more main memory
words using the same index address. Each
main memory word consists of a tag and its
data word. An index with two or more tags
and data words forms a set.
70
Set-associative mapping.
 When the microprocessor generates a
memory request, the index of the main
memory address is used as the cache
address. The tag field of the main memory
address is then compared associatively
(simultaneously) with all tags stored under
the index. If a match occurs, the desired
data word is read. If a match does not occur,
the data word, along with its tag, is read from
main memory and written into the cache.
71
Set-associative mapping.
72
Set-associative mapping.
The size of a set is defined by the number of
tag and data items in a cache word. A set size
of 2 is used in this example. Each index
address contains two data words and their
associated tags. Each tag includes 4 bits, and
each data word contains 16 bits. Therefore,
the word length = 2 x (4 + 16) = 40 bits. An
index address of 8 bits can represent 256
words. Hence, the size of the cache memory
is 256 x 40 bits. It can store 512 main memory
words.
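The size calculation and a two-way lookup can be sketched together in Python. The constants are the ones from the example. The replacement rule used on a miss to a full set (evict the oldest entry) is an assumption, because the slides do not name a replacement policy; the function name and the memory contents are hypothetical.

```python
SET_SIZE, TAG_BITS, DATA_BITS, INDEX_BITS = 2, 4, 16, 8

word_length = SET_SIZE * (TAG_BITS + DATA_BITS)   # 2 x (4 + 16) = 40 bits
cache_words = 2 ** INDEX_BITS                     # 256 cache words
capacity = cache_words * SET_SIZE                 # 512 main memory words


def lookup(cache, main_memory, address):
    """Return (data, hit) for a set-associative cache held as {index: [(tag, data), ...]}."""
    index = address & (cache_words - 1)
    tag = address >> INDEX_BITS
    entries = cache.setdefault(index, [])
    for stored_tag, data in entries:       # hardware compares these tags simultaneously
        if stored_tag == tag:
            return data, True
    data = main_memory[address]
    if len(entries) == SET_SIZE:
        entries.pop(0)                     # assumed policy: evict the oldest entry
    entries.append((tag, data))
    return data, False


print(word_length, cache_words, capacity)  # 40 256 512
```

With this model, addresses 100 and 200 (same index 00, different tags) can reside in the cache together, which is what direct mapping cannot do.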
73
How to write on cache :
 There are two ways of writing into cache: the
write-back and write-through methods.
74
The write-back method


Whenever the microprocessor writes
something into a cache word, a “dirty” bit is
assigned to the cache word. When a dirty
word is to be replaced with a new word, the
dirty word is first copied into the main
memory before it is overwritten by the
incoming new word.
The advantage of this method is that it avoids
unnecessary writing into main memory.
75
The write-through method,


Whenever the microprocessor alters a cache
address, the same alteration is made in the
main memory copy of the altered cache
address.
This policy is easily implemented and ensures
that the contents of the main memory are
always valid. This feature is desirable in a
multiprocessor system, in which the main
memory is shared by several processors.
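The difference between the two write policies can be seen by counting main memory writes in a toy Python model. The class, its attribute names, and the explicit replace step are hypothetical; cache capacity and address mapping are deliberately left out.

```python
class Cache:
    """Toy cache that counts how often main memory is written."""

    def __init__(self, main_memory, write_back):
        self.main_memory = main_memory
        self.write_back = write_back
        self.words = {}          # address -> cached data
        self.dirty = set()       # addresses whose dirty bit is set
        self.memory_writes = 0

    def write(self, address, data):
        self.words[address] = data
        if self.write_back:
            self.dirty.add(address)        # write-back: only mark the word dirty
        else:
            self._store(address, data)     # write-through: update main memory now

    def replace(self, address):
        """Remove a cached word, copying it to main memory first if it is dirty."""
        data = self.words.pop(address)
        if address in self.dirty:
            self.dirty.discard(address)
            self._store(address, data)

    def _store(self, address, data):
        self.main_memory[address] = data
        self.memory_writes += 1


through, back = Cache({}, write_back=False), Cache({}, write_back=True)
for value in (1, 2, 3):
    through.write(0x100, value)
    back.write(0x100, value)
back.replace(0x100)
print(through.memory_writes, back.memory_writes)  # 3 1
```

Three writes to the same address cost three main memory writes under write-through but only one under write-back, at the price of main memory being out of date until the dirty word is replaced.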
76




A valid bit is used to ensure proper utilization
of the cache.
It is an extra bit contained in the tag directory.
When the power is turned on, the valid bit
corresponding to each cache block entry of
the tag directory is reset to zero. This is done
to indicate that the cache block holds invalid
data.
When a block of data is transferred from the
main memory to a cache block, the valid bit
corresponding to this cache block is set to 1.
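The role of the valid bit can be shown with a minimal Python sketch of a tag directory. The class and method names are hypothetical; only the power-on reset and the set-on-transfer behavior come from the text above.

```python
class TagDirectory:
    """Tag directory with one valid bit per cache block."""

    def __init__(self, blocks):
        self.tags = [0] * blocks
        self.valid = [0] * blocks      # power-on: every valid bit is reset to zero

    def load_block(self, block, tag):
        """Called when a block is transferred from main memory into the cache."""
        self.tags[block] = tag
        self.valid[block] = 1

    def is_hit(self, block, tag):
        # A matching tag counts only if the block holds valid data.
        return self.valid[block] == 1 and self.tags[block] == tag


directory = TagDirectory(blocks=4)
print(directory.is_hit(0, 0))   # False: tag matches by accident, but the block is invalid
directory.load_block(0, 0)
print(directory.is_hit(0, 0))   # True
```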
77
Finally, microprocessors such as the Intel
Pentium II support two levels of cache, L1
(level 1) and L2 (level 2) cache memories.
 The L1 cache (smaller in size) is contained
inside the processor chip while the L2 cache
(larger in size) is interfaced external to the
microprocessor.

78


The L1 cache normally provides separate
instruction and data caches. The processor
can access the L1 cache directly, and the L2
cache normally supplies instructions and data
to the L1 cache.
The L2 cache is usually accessed by the
microprocessor only if L1 misses occur. This
two-level cache memory enhances
microprocessor performance.
79